a Dep. of Plant Agriculture, Univ. of Guelph, Guelph, Ontario, Canada N1G 2W1
b Dep. of Agronomy and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40546-0091
c Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Lisboa 27, Apdo. Postal 6-641, 06600 Mexico D.F., Mexico
* Corresponding author (wyan{at}uoguelph.ca)
| ABSTRACT |
|---|
Abbreviations: G, genotypic main effect; GE, genotype x environment interaction; GGE, genotype main effect plus genotype x environment interaction; E, environment main effect; SREGM+1, Mandel's sites regression model with one additional multiplicative term; PC, principal component; SREG2, sites regression model with two multiplicative terms; SVD, singular value decomposition.
| INTRODUCTION |
|---|
Yan et al. (2000) developed a "GGE biplot" methodology for graphical analysis of MET data. "GGE" refers to the genotype main effect (G) plus the genotype x environment interaction (GE), which are the two sources of variation that are relevant to cultivar evaluation. A biplot (Gabriel, 1971) is a plot that simultaneously displays both the genotypes and the environments (or in more general terms, both the row and the column factors). The GGE biplot is a biplot that displays the GGE of MET data. It is constructed by plotting the first two principal components (PC1 and PC2, also referred to as primary and secondary effects, respectively) derived from singular value decomposition (SVD) of the environment-centered data. Models that decompose the environment-centered data are commonly referred to as sites regression models or SREG, and SREG with two PCs is referred to as SREG2. SREG can be used on scaled or non-scaled data. When replicated data are available, SREG on scaled data (Crossa and Cornelius, 1997) is more desirable because it deals with any heterogeneity of within-site error variance.
One unique merit of a GGE biplot is that it can graphically show the which-won-where patterns of the data, as first described in Yan et al. (2000). Briefly, markers of the cultivars furthest from the plot origin (0,0) are connected with straight lines to form a polygon such that markers of all other cultivars are contained in the polygon. To each side of the polygon, a perpendicular line, starting from the origin of the biplot, is drawn and extended beyond the polygon so that the biplot is divided into several sectors and the markers of the test sites are separated into different sectors. The cultivar at the vertex for each sector is the best performer at sites included in that sector, provided that the GGE is sufficiently approximated by PC1 and PC2. Thus, groups of sites that share the same best performers are graphically identified.
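The polygon reading above has a simple numerical counterpart: in a two-dimensional biplot the fitted environment-centered yield of a cultivar at a site is the inner product of their marker coordinates, so the winner at each site is the cultivar with the largest inner product, and that cultivar is necessarily a polygon vertex. The sketch below illustrates this under stated assumptions; the function name `which_won_where` and the score values are hypothetical and are not taken from the trials analyzed here.

```python
import numpy as np

def which_won_where(geno_scores, env_scores):
    """For each environment, return the index of the genotype with the
    largest fitted (two-component) environment-centered yield."""
    geno_scores = np.asarray(geno_scores, dtype=float)  # shape (genotypes, 2)
    env_scores = np.asarray(env_scores, dtype=float)    # shape (environments, 2)
    fitted = geno_scores @ env_scores.T                 # inner products, shape (g, e)
    return fitted.argmax(axis=0)

# Hypothetical biplot coordinates (primary, secondary) for illustration only
geno = np.array([[2.0, 0.5], [0.5, 2.0], [-1.5, -1.0], [0.2, 0.1]])
env = np.array([[1.0, 0.1], [0.8, 0.4], [0.1, 1.0]])

winners = which_won_where(geno, env)
print(winners)
```

Environments that share the same winner index fall in the same sector of the biplot.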
If the which-won-where patterns identified by a biplot are repeatable over years, different mega-environments (subregions) can be defined. By selecting superior cultivars for each mega-environment, both G and GE can be effectively exploited. The GGE biplot is still useful even in cases where the which-won-where patterns are not repeatable over years, which suggests that the tested environments belong to a single mega-environment. It can be used to identify superior cultivars and test environments that facilitate identification of such cultivars, provided that the target mega-environment is sufficiently sampled and that the genotype PC1 scores have near-perfect correlation (say, r > 0.95) with the genotype main effects. Ideal cultivars should have large PC1 scores (higher average yield) and near zero PC2 scores (more stable). Similarly, ideal test environments should have large PC1 scores (more discriminating of the cultivars) and near zero PC2 scores (more representative of an average environment). (Note that a "test environment" refers to a year-site combination; it does not necessarily correspond to a "test site".) Thus, the GGE biplot allows many important questions to be addressed effectively and graphically.
However, the requirement for a near-perfect correlation between genotype PC1 scores and genotype main effects is not always met, which restricts the utility of the SREG2-based GGE biplot. Analysis of the yearly MET data of the Ontario winter wheat performance trials during 1989-1999, and of winter wheat performance trials from several states of the USA (Yan, unpublished) indicates that the genotype PC1 scores are usually highly correlated with the genotype main effect. Poor correlations between genotype PC1 scores and genotype main effects, however, do occur for some years. Moreover, when multiple years of data are analyzed together, this becomes the norm rather than the exception because of large and complex GE interaction (discussed later). In such cases, the genotype PC1 scores cannot be interpreted as representing the same information as the genotype main effects. Consequently, the yielding ability and stability of the genotypes, and the discriminating ability and the representativeness of the test environments cannot be readily visualized.
To avoid these problems, in this paper we report an alternative GGE biplot, which uses Mandel's sites regression on genotype main effects as the primary effect and the first principal component from SVD of the regression residual as the secondary effect. Such a GGE biplot is referred to as a SREGM+1 biplot, with the subscript "M" referring to Mandel's solution. In a SREGM+1 biplot, the primary effects are the genotype main effects per se; it is, therefore, free from the problem discussed above for the SREG2 biplot. However, it is not clear if a SREGM+1 biplot is as effective as the SREG2 biplot in explaining the GGE and in displaying the which-won-where patterns of the data. This study was initiated to answer these questions by comparing the SREG2 biplot and the SREGM+1 biplot applied to several datasets that showed different relations between genotype PC1 scores of SREG2 and the genotype main effects.
| MATERIALS AND METHODS |
|---|
The SREG2 model can be written as

$$Y_{ij} - \beta_j = \sum_{n=1}^{2} \lambda_n \xi_{in} \eta_{jn} + \varepsilon_{ij} \tag{1}$$

where $Y_{ij}$ is the yield of Genotype i in Environment j, $\beta_j$ is the mean yield in Environment j, $\lambda_n$ is the singular value for principal component PCn, $\xi_{in}$ and $\eta_{jn}$ are scores for Genotype i and Environment j on PCn, respectively, and $\varepsilon_{ij}$ is the residual associated with Genotype i in Environment j. The values of $\lambda_n$, $\xi_{in}$, and $\eta_{jn}$ are simultaneously obtained by subjecting the environment-centered yield (i.e., $Y_{ij} - \beta_j$) to SVD. This can be achieved by principal component analysis of the environment-centered yield using the SAS procedure PRINCOMP, which generates $\xi_{in}$ as the genotype scores and ($\lambda_n \eta_{jn}$) as the environment scores. Alternatively, $\lambda_n$, $\xi_{in}$, and $\eta_{jn}$ can be obtained by the SVD function within the SAS procedure IML, which is a basic function in many SAS procedures related to principal component analysis. A SAS program for principal component analysis of MET data is available from the senior author of this paper.
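For readers without SAS, the same decomposition can be sketched with NumPy. This is a minimal illustration and not the authors' program: the function name `sreg2` and the yield table are hypothetical, and balanced data (one value per genotype-environment cell) are assumed.

```python
import numpy as np

def sreg2(yield_table):
    """Decompose environment-centered yields by SVD and keep two components.

    yield_table: genotypes in rows, environments in columns (balanced data).
    Returns singular values, genotype scores, environment scores, residual.
    """
    y = np.asarray(yield_table, dtype=float)
    centered = y - y.mean(axis=0)              # subtract each environment mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    lam = s[:2]                                # singular values of PC1 and PC2
    xi = u[:, :2]                              # genotype scores (unit vectors)
    eta = vt[:2].T                             # environment scores (unit vectors)
    residual = centered - (xi * lam) @ eta.T   # part not captured by PC1 and PC2
    return lam, xi, eta, residual

# Hypothetical yields: 4 genotypes x 3 environments
yields = np.array([
    [5.1, 6.0, 4.2],
    [4.8, 6.4, 3.9],
    [5.5, 5.2, 4.8],
    [4.2, 5.8, 4.1],
])
lam, xi, eta, residual = sreg2(yields)
print(lam)
```

The sign of each component returned by an SVD routine is arbitrary; flipping the signs of a genotype score vector and the matching environment score vector together leaves the fitted values unchanged.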
To display the results of fitting Eq. [1] in a biplot, the singular value $\lambda_n$ has to be absorbed by the singular vector for cultivars, $\xi_{in}$, and that for environments, $\eta_{jn}$. That is, $\xi^*_{in} = \lambda_n^{A_n} \xi_{in}$ and $\eta^*_{jn} = \lambda_n^{1-A_n} \eta_{jn}$. $A_n$ is chosen such that the range of the environment markers is equal to the range of the cultivar markers:

$$\max_i \xi^*_{in} - \min_i \xi^*_{in} = \max_j \eta^*_{jn} - \min_j \eta^*_{jn} \tag{2}$$
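One way to obtain such an exponent is to solve the equal-range condition directly. Writing the ranges of the unscaled cultivar and environment scores as R_xi and R_eta, equal ranges after scaling require lambda^(2A - 1) = R_eta / R_xi, so A = 0.5 * (1 + ln(R_eta / R_xi) / ln(lambda)). This closed form is our own derivation from the stated condition, not a formula quoted from the paper, and it assumes a singular value that is positive and not equal to 1. The names and numbers below are hypothetical.

```python
import numpy as np

def partition_exponent(lam, xi, eta):
    """Solve range(lam**A * xi) == range(lam**(1 - A) * eta) for A.

    Assumes lam > 0, lam != 1, and non-constant xi and eta.
    """
    range_xi = xi.max() - xi.min()
    range_eta = eta.max() - eta.min()
    return 0.5 * (1.0 + np.log(range_eta / range_xi) / np.log(lam))

# Hypothetical singular value and score vectors for one component
lam = 4.0
xi = np.array([0.6, -0.2, -0.4])           # cultivar scores
eta = np.array([0.5, 0.1, -0.3, -0.3])     # environment scores

a = partition_exponent(lam, xi, eta)
xi_star = lam ** a * xi                    # scaled cultivar markers
eta_star = lam ** (1.0 - a) * eta          # scaled environment markers
print(a, xi_star.max() - xi_star.min(), eta_star.max() - eta_star.min())
```

Because the two exponents sum to 1, the product of a scaled cultivar score and a scaled environment score still equals the fitted term of Eq. [1] for that component.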
The SREGM+1 Biplot
Mandel (1961) presented the following model for analysis of non-additivity of two-way data:

$$Y_{ij} - \beta_j = b_j \alpha_i + \varepsilon_{ij} \tag{3}$$

where $\alpha_i$ is the main effect of Genotype i, and $b_j$ is the regression coefficient of the environment-centered yields (i.e., $Y_{ij} - \beta_j$) within Environment j on the genotype main effects ($\alpha_i$). Equation [3] is similar to the well-known model of Finlay and Wilkinson (1963), but the roles of cultivars and sites are exchanged.
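If the first principal component ($\lambda_1 \xi_{i1} \eta_{j1}$) from SVD of the residual from Eq. [3], i.e., ($Y_{ij} - \beta_j - b_j \alpha_i$), is added, then

$$Y_{ij} - \beta_j = b_j \alpha_i + \lambda_1 \xi_{i1} \eta_{j1} + \varepsilon_{ij} \tag{4}$$

For display in a biplot, Eq. [4] is rewritten as

$$Y_{ij} - \beta_j = b^*_j \alpha^*_i + \xi^*_{i1} \eta^*_{j1} + \varepsilon_{ij} \tag{5}$$

where $\xi^*_{i1} = \lambda_1^{A_1} \xi_{i1}$, $\eta^*_{j1} = \lambda_1^{1-A_1} \eta_{j1}$, $b^*_j = B b_j$, and $\alpha^*_i = B^{-1} \alpha_i$, where $A_1$ is defined by Eq. [2], and B satisfies

$$\max_i \alpha^*_i - \min_i \alpha^*_i = \max_j b^*_j - \min_j b^*_j \tag{6}$$

$A_1$ and B are chosen such that the plot space used by the genotypes is the same as that used by the environments. Analogous to PC1 and PC2 in the SREG2 model, $b^*_j \alpha^*_i$ and $\xi^*_{i1} \eta^*_{j1}$ are referred to as the primary and secondary effects, respectively. All analyses were conducted using SAS (SAS Institute, 1996).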
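The two-step fit described in this section (regression of each environment's centered yields on the genotype main effects, followed by SVD of the regression residual) can be sketched as follows. This is an illustration under stated assumptions rather than the SAS code used in the study: the function name `sreg_m1` and the yield table are hypothetical, balanced data are assumed, and the rescaling by $A_1$ and B for plotting is omitted.

```python
import numpy as np

def sreg_m1(yield_table):
    """Mandel-type sites regression plus one multiplicative term.

    yield_table: genotypes in rows, environments in columns (balanced data).
    Returns genotype main effects, per-environment slopes, and the first
    singular value and singular vectors of the regression residual.
    """
    y = np.asarray(yield_table, dtype=float)
    centered = y - y.mean(axis=0)                # environment-centered yields
    alpha = centered.mean(axis=1)                # genotype main effects
    b = centered.T @ alpha / (alpha @ alpha)     # least-squares slope per environment
    resid = centered - np.outer(alpha, b)        # residual from the regression
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    return alpha, b, s[0], u[:, 0], vt[0]

# Hypothetical yields: 4 genotypes x 3 environments
yields = np.array([
    [5.1, 6.0, 4.2],
    [4.8, 6.4, 3.9],
    [5.5, 5.2, 4.8],
    [4.2, 5.8, 4.1],
])
alpha, b, lam1, xi1, eta1 = sreg_m1(yields)
print(alpha, b, lam1)
```

With balanced data the slopes average to 1 across environments, so the primary effect reproduces the genotype main effects on average, and the residual passed to the SVD is orthogonal to the genotype main effects.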
The Data
The data used in this study were from the 1989 to 1999 Ontario winter wheat performance trials (Yan, 1999). Each year, 10 to 33 winter wheat (Triticum aestivum L.) cultivars were tested with four to six replicates at seven to 14 sites representing the Ontario winter wheat growing areas. Previous analysis indicated that the yearly variance components due to environment (E) dominated the total yield variation, ranging from 55 to 91% and averaging 80% of the total variance. The variance component due to G ranged from 1.8 to 28.5%, whereas that due to GE ranged from 7.3 to 15.1% (Yan, 1999). G ranged from 13 to 65% of the total GGE. Analysis with the SREG2 biplot revealed that in all years except 1995 the environmental PC1 scores were of the same sign; and in all years except 1995 and 1996 the genotype PC1 scores showed high correlation with the mean yield of the genotypes (r > 0.93). Thus, in this study the 1995, 1996, and 1998 datasets, representing different types of relations between genotype PC1 scores and genotype main effects, were chosen to compare the GGE biplot based on SREGM+1 with one based on SREG2. In addition, a complete subset of 11 cultivars by 34 environments (year-site combinations) extracted from the 1996 to 1999 trials was also used in the comparison.
| RESULTS |
|---|
In Fig. 1A, the sites fell into three sectors: the winning genotype for sites RN, WE, ID, and NN was Genotype 6; the winning genotype for sites WK, HN, and EA was Genotype 9; and the winning genotype for site OA was Genotype 29. Note that Genotype 9 was the best performer for WK, HN, and EA because markers of these sites were on Genotype 9's side of the perpendicular to the line that connects Genotype 9's marker and that of Genotype 6. Vertex genotypes without any site in their sectors were not the highest yielding genotypes at any site; moreover, they were the poorest genotypes at all or some sites. Genotypes within the polygon, particularly those located near the plot origin, were less responsive than the vertex genotypes. It can be appreciated that the supplementary lines on the biplot are critical for visual analysis of the MET data.
In addition, a near-perfect correlation between genotype primary effect scores and the genotype main effects allows both biplots (Fig. 1A and Fig. 1B) to be used to evaluate cultivars for their yielding ability and stability and to evaluate environments for their discriminating ability and representativeness. Genotypes 6 and 9 gave the highest average yields (largest primary scores) and were relatively stable over the sites (small absolute secondary scores). In contrast, the three non-adapted Genotypes 27, 28, and 31 yielded poorly at all sites, as indicated by their small primary scores (low yielding) and relatively small secondary scores (relatively stable). Cultivars 1 and 20 had below-average yields (primary scores < 0) and were highly unstable (large absolute secondary scores). The biplots show not only the average yield of a genotype (the primary effect), but also how it was achieved. That is, the biplots also show the yield of a genotype at individual sites. For example, Cultivar 6 had the highest average yield because it yielded the highest at sites RN, WE, ID, and NN, and yielded above average at all other sites. On the other hand, the average yield of Cultivar 20 was below average, because it yielded below average at sites OA, EA, HN, WK, and NN, even though it was quite good at RN. A below-average yield is indicated if the virtual line from the origin to the marker of a genotype has an obtuse angle with the virtual line from the origin to the marker of a test site. Likewise, an above-average yield is indicated by an acute angle. Supplementary lines, not presented in the biplots, are required to explicitly determine these relationships.
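The angle rule can also be checked without drawing supplementary lines, because the sign of the cosine of the angle between two vectors is the sign of their inner product. The sketch below applies this to two-dimensional marker coordinates; the function name `relative_yield` and the coordinates are hypothetical and serve only as an illustration.

```python
def relative_yield(genotype_marker, site_marker):
    """Classify the fitted yield of a genotype at a site from biplot markers.

    An acute angle between the two vectors (positive inner product) indicates
    an above-average yield, an obtuse angle a below-average yield.
    """
    dot = (genotype_marker[0] * site_marker[0]
           + genotype_marker[1] * site_marker[1])
    if dot > 0:
        return "above average"
    if dot < 0:
        return "below average"
    return "average"

# Hypothetical (primary, secondary) coordinates
print(relative_yield((1.0, 1.0), (1.0, 0.0)))    # acute angle
print(relative_yield((-1.0, 0.5), (1.0, 0.0)))   # obtuse angle
```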
With respect to the test sites, RN was most discriminating as indicated by the longest distance between its marker and the origin. However, due to its large secondary score, cultivar differences observed at RN may not exactly reflect the cultivar differences in average yield over all sites. Site NN was not the most discriminating, but cultivar differences at NN should be highly consistent with those averaged over sites because it had a near-zero secondary effect score. At a site with a near-zero secondary effect score, the genotypes are essentially ranked according to their primary effect scores (i.e., genotype main effects since they were perfectly correlated in this dataset) and the differences among genotypes are in proportion to the primary effect scores of the sites. Thus, a genotype that yielded well at such a site has a large average yield. In contrast, site OA was neither discriminating (small primary effect score) nor representative (large secondary effect score); therefore, cultivars that had high yield at OA did not necessarily give high average yield over sites. Analysis of multiple year data indicated that OA represented a different mega-environment (eastern Ontario) from the major winter wheat growing regions in Ontario (Yan et al., 2000; Yan, 1999).
1996 Data
As with most datasets, the SREG2 biplot (Fig. 2A) for 1996 indicates that all PC1 scores of the sites were of the same sign, which was arbitrarily assigned positive so that the genotype PC1 scores correlated positively with the genotype main effect. However, as mentioned earlier, the correlation between the genotype PC1 scores and the genotype main effects for this dataset was only 0.85. The relatively poor correlation is associated with the fact that the GGE explained by PC1 is only slightly greater than that by PC2 (29.6 vs. 24.5%). The poor correlation prevents the genotype PC1 scores of the SREG2 solution from being interpreted as representing the genotype main effect; in fact, PC1 alone is not interpretable in known biological and agricultural terms. In such cases, the utility of a SREG2 biplot is limited to investigation of the which-won-where patterns. Based on Fig. 2A, Cultivar 1 was the best performer at sites RN, LN, ID, and WE; and Cultivar 2 was the best performer at sites EA, WK, CA, and OA, and nearly the best at HW.
1995 Data
The 1995 dataset was the only dataset found during the 1989 to 1999 Ontario winter wheat performance trials in which the site PC1 scores of the SREG2 differ in sign (Fig. 3A). Among the 14 test sites, four (Sites 4, 6, 7, and 10) had negative PC1 scores, though their absolute values were small. This led to a poor correlation between the cultivar PC1 scores and the cultivar main effects (r = 0.83). The SREG2 biplot indicates that Cultivar G6 was the best for nearly all sites except Sites 4, 6, and 7, at which Cultivar G4 (and also G10) was better than G6. Cultivar G7 was as good as G6 for Sites 5 and 12. These patterns are similar in the SREGM+1 biplot (Fig. 3B). It indicates that Cultivar G6 was on average the best and Cultivar G12 the second best, and that Sites 5 and 12 were highly discriminating but neither was representative. Interestingly, all sites had positive primary effects in the SREGM+1 biplot, as compared with the site PC1 scores of different signs in the SREG2 biplot.
| DISCUSSION |
|---|
The SREGM+1 biplot was designed to be more interpretable than the SREG2 biplot. First, since the genotypic scores for the primary effect of SREGM+1 are designated to indicate the average yield (general adaptation) of the cultivars, the genotypic scores of the secondary effect must indicate GE interaction associated with the cultivars, which is an indicator of selective or specific adaptation. Thus, the SREGM+1 biplot simultaneously displays both general adaptation and specific adaptation (stability) of the cultivars. The ideal cultivars are those with large primary effect scores but near-zero secondary scores. Second, because the genotypic primary effects indicate general adaptation of the cultivars, the environmental primary effects must indicate the ability of the environments to discriminate among the cultivars in terms of general adaptation. Environments with larger primary effects would thus facilitate identification of cultivars with better general adaptation. Third, analogous to the genotypic secondary effects, the environmental secondary effects must indicate the tendency of each environment to cause GE interaction. Environments with large (absolute) secondary effects should favor the performance of some cultivars, but disfavor others at the same time. Thus, cultivars selected under environments with large secondary effects may be highly specific to these environments but lack general adaptation or stability. Therefore, from the perspective of selection for high yielding and stable cultivars, the ideal test environments should have large primary effects, but near-zero secondary effects.
Why Correlation between Genotype Scores of PC1 in SREG2 and Genotype Main Effects Varies with Datasets
It was concluded that the SREGM+1 biplot is more desirable than the SREG2 biplot for MET data analysis because the interpretability of the latter is impacted by the uncertain relations between its primary effects and the genotype main effects. On the basis of the trials investigated in this study, Fig. 5 indicates that this correlation is strongly determined by the relative importance of G in GGE. Near-perfect correlation occurs when G is 40% or more of GGE (the 1992, 1993, 1997-1999 datasets), and poor correlation occurs when G is 20% or less of GGE (the 1995, 1996 and 1996-1999 datasets). The essence of principal component analysis is to pick up the most important pattern in the data using the smallest number of degrees of freedom. PC1 picks up the largest pattern, PC2 picks up the second largest pattern, and so on. A close correlation between PC1 scores and genotype main effects occurs only when the genotype main effect is large enough to be the most important component of GGE. A poor correlation occurs otherwise, which suggests strong and complex GE interaction in the data. Therefore, it is not surprising that the correlation between PC1 scores of SREG2 and genotype main effect is typically poor when multi-year data are analyzed in a genotype x environment (year-site) fashion, because greater and more complex GE interactions are sampled in a multi-year MET than in a single year MET. Complex GE interaction is usually accompanied by similar amounts of GGE explained by PC1 and PC2 (as for the 1996 and 1996-1999 datasets, Table 1), as opposed to much more GGE explained by PC1 than by PC2 (e.g., the 1998 dataset).
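The quantities compared in this argument (the share of GGE due to G, the shares explained by PC1 and PC2, and the correlation between genotype PC1 scores and genotype main effects) can be computed from a balanced genotype x environment table as sketched below. The function name `gge_summary` and the data are hypothetical; the absolute value of the correlation is reported because the sign of a principal component is arbitrary.

```python
import numpy as np

def gge_summary(yield_table):
    """Return (G share of GGE, PC1 share, PC2 share, |r| between PC1 scores
    and genotype main effects) for a balanced genotype x environment table."""
    y = np.asarray(yield_table, dtype=float)
    centered = y - y.mean(axis=0)                  # G + GE variation
    gge_ss = (centered ** 2).sum()
    alpha = centered.mean(axis=1)                  # genotype main effects
    g_ss = y.shape[1] * (alpha ** 2).sum()         # sum of squares due to G
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc_share = s ** 2 / gge_ss                     # share of GGE per component
    r = np.corrcoef(u[:, 0], alpha)[0, 1]
    return g_ss / gge_ss, pc_share[0], pc_share[1], abs(r)

# Hypothetical yields: 4 genotypes x 3 environments
yields = np.array([
    [5.1, 6.0, 4.2],
    [4.8, 6.4, 3.9],
    [5.5, 5.2, 4.8],
    [4.2, 5.8, 4.1],
])
g_share, pc1_share, pc2_share, r_abs = gge_summary(yields)
print(g_share, pc1_share, pc2_share, r_abs)
```

Since the genotype main effects form one particular rank-one fit of the environment-centered table and PC1 is the best rank-one fit, the G share can never exceed the PC1 share; the two coincide, with a correlation of 1, when the table contains no GE interaction.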
Single-year data may indeed have limited value because of the year-to-year variation. Nevertheless, we believe biplot analysis of single year MET data is worthwhile for the following reasons. First, the GGE biplot is a graphic display of the G and GE of the data, which are relevant to cultivar evaluation and mega-environment identification. Therefore, if the researcher believes that a single year MET is worthy of analysis, and we believe most researchers do, the GGE biplot technique should be the first choice. Although the biplot does not add new information to the data, it does help the researcher quickly view the patterns that are in the data. The biplot gives the researcher the power to "see" what was going on in a particular year. Some may question the usefulness of the single year patterns if they are not repeatable over years. But without knowing the patterns from individual years, how could one know if they are repeatable or not? Second, the biplot can be used to identify research problems. For example, if two cultivars were found to perform the best in two different groups of locations in a particular year, one might want to know what the underlying reasons were, and answers to this question may lead to valuable findings. By relating biplot scores to explanatory variables collected in the trials, Yan and Hunt (2001) were able to reveal that in Ontario, Canada, tall and late winter wheat cultivars tended to be favored in seasons with cold winters and cool summers, whereas early and short cultivars tended to be favored in seasons with warm winters and hot summers. Third, the biplot patterns based on a single year MET can serve as hypotheses, which can be tested using extended data and more critical statistics. For example, biplots based on yearly data from the Ontario winter wheat performance trials led to the hypothesis that two eastern Ontario sites (Ottawa and Kemptville) constituted a mega-environment different from the rest of the Ontario winter wheat growing region, which was subsequently tested and supported by variance component analysis based on pooled data from 11 yr of performance trials (Yan, 1999). Thus, although conclusions from a single year MET may not be decisive, they are valuable suggestions. Fourth, even if the which-won-where pattern is proven to be unrepeatable over years, the researcher would still want to know the average yield and the stability of the cultivars based on each year's MET. These two aspects of cultivar performance are graphically depicted by the abscissa and ordinate of the biplot, respectively. Finally, although a biplot from a single year may not be very informative, biplots constructed from several years can be highly valuable.
Moreover, the biplot technique is not limited to single year MET data analysis. It can also be applied to balanced subsets extracted from multiple years of trials. In Ontario, for example, over 20 winter wheat cultivars are common to three to four years of performance trials, and a balanced subset from such a database should contain valuable information. Furthermore, the biplot technique is not even limited to genotype x environment data analysis. It can also be used in displaying and analyzing other types of two-way data such as genotype x trait data and diallel cross data (Yan, unpublished research). In conclusion, the GGE biplot is a useful tool for, but not limited to, MET data analysis.
Received for publication February 14, 2000.