Eastern Cereal and Oilseed Research Center, Agriculture and Agri-Food Canada, 960 Carling Ave., Ottawa, Ontario, Canada, K1A 0C6
* Corresponding author (wyan{at}ggebiplot.com; yanw{at}agr.gc.ca)
| ABSTRACT |
|---|
Abbreviations: AMMI, additive main effect and multiplicative interaction; GE, genotype x environment interaction; GGE, genotype main effect plus genotype x environment interaction; MET, multi-environment trials; PC, principal component(s); PCA, principal component analysis; PLSR, partial least squares regression; SVD, singular value decomposition
| INTRODUCTION |
|---|
A genotype x trait biplot (Yan and Rajcan, 2002; Yan and Kang, 2003; Lee et al., 2003) graphically approximates a genotype x trait two-way table. Such a biplot can be used to visualize the genetic correlations among traits (breeding objectives), which facilitates a systems understanding of the crop. Understanding the trait relationships also facilitates identification of traits that can be used in indirect selection for a target trait and those that may be redundantly measured. A genotype x trait biplot can also be used to visualize the merits and shortcomings of individual genotypes, which is important for both cultivar evaluation and parent selection.
In spite of their many useful features, genotype x environment biplots have been criticized for not being able to incorporate information on genetic covariables that may explain the GE patterns. To amend this, factorial regression using genetic and/or environmental covariables has been suggested to complement biplot analysis (van Eeuwijk et al., 1996; Vargas et al., 1999; Brancourt-Hulmel and Lecomte, 2003). In this regard, the partial least squares regression (PLSR) plot provides a more attractive approach. It displays genotypes, environments, and genetic covariables in a single PLSR plot (Vargas et al., 1998, 1999; Crossa et al., 1999), which, therefore, may be referred to as a "tri-plot." A PLSR tri-plot appears to combine a genotype x environment biplot and a genotype x trait biplot by using the same set of genotype scores. The genotype scores in a PLSR tri-plot are selected such that the variation from both tables displayed by the tri-plot is maximized. A tri-plot is supposed to have interpretations of both the genotype x environment biplot and the genotype x trait biplot. Moreover, by combining two biplots, the tri-plot brings genetic covariables (e.g., genetic values of some traits) and environments in the same plot, which may allow the genotype x environment patterns for the target trait to be interpreted relative to trait x environment interactions. A genotype x environment biplot displays the most important patterns of the genotype x environment data; a genotype x trait biplot displays the most important patterns of the genotype x trait table; a PLSR tri-plot, however, may fail to adequately display both the genotype x environment patterns and the genotype x trait patterns, and as a result, it may have limited use in interpreting the genotype x environment patterns. This occurs when many irrelevant traits are included in the genotype x trait table. 
Tri-plots can also be constructed from other multivariate analyses such as redundancy analysis and canonical correspondence analysis (A.F. Zuur, http://www.brodgar.com/ordination.htm; verified 6 January 2005).
Yan and Hunt (2001) presented another approach for incorporating genetic and environmental covariables in MET data analysis, in which the observed GGE patterns were explained as interactions between genetic covariables and environmental covariables. This was achieved by relating the genetic/environmental covariables to the genotypic/environmental scores of the first two principal components derived from GGE biplot analysis. These correlation coefficients could be plotted to form a genetic covariable vs. environmental covariable biplot or superimposed on the GGE biplot so that the interactions between them can be visualized.
A full understanding of MET data encompasses (i) the GGE patterns of a target trait, (ii) the genotype x trait patterns in individual environments or across environments, and (iii) whether and how the GGE patterns for a target trait can be explained and exploited using other traits. The purpose of this paper is to describe a "covariate-effect biplot" that can be used to interpret and explore the GGE patterns of the target trait relative to genetic covariables (genetic values of explanatory traits, QTL, or genes). This biplot, together with the previously described GGE biplot and genotype x trait biplot, constitutes an integrated biplot system for MET data analysis.
| MATERIALS AND METHODS |
|---|
$\ldots = \lambda_1 \xi_{i1} \eta_{1j} + \lambda_2 \xi_{i2} \eta_{2j} + \varepsilon_{ij}$ | [1] |
where $\lambda_1$ and $\lambda_2$ are the singular values of PC1 and PC2, respectively; $\xi_{i1}$ and $\xi_{i2}$ are the eigenvectors of genotype i for PC1 and PC2, respectively; $\eta_{1j}$ and $\eta_{2j}$ are the eigenvectors of environment j for PC1 and PC2, respectively; and $\varepsilon_{ij}$ is the residual associated with genotype i and environment j. To generate a GGE biplot, Eq. [1] was reorganized as follows:
![]() | [2] |
![]() | [3] |
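The decomposition and singular-value partitioning described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the GGEbiplot implementation: the function name `gge_biplot_scores`, the partition parameter `f`, and the random data are assumptions introduced here, and the table is assumed to be environment-centered (optionally standardized) before SVD.

```python
import numpy as np

def gge_biplot_scores(table, f=0.0, standardize=False):
    """Return rank-2 genotype and environment scores for a GGE-type biplot.

    table: genotypes (rows) x environments (columns).
    f: share of each singular value assigned to the genotype scores
       (f=1 genotype-focused, f=0 environment-focused). Hypothetical name.
    """
    y = np.asarray(table, dtype=float)
    centered = y - y.mean(axis=0)            # remove environment main effects
    if standardize:
        centered = centered / y.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    geno = u[:, :2] * s[:2] ** f             # genotype scores for PC1 and PC2
    env = vt[:2].T * s[:2] ** (1.0 - f)      # environment scores for PC1 and PC2
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return geno, env, explained

rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))               # made-up 6 genotypes x 4 environments
geno, env, explained = gge_biplot_scores(data, f=0.0)
approx = geno @ env.T                        # rank-2 approximation of the centered table
```

Whatever value of `f` is chosen, the product of the genotype and environment scores is the same rank-2 approximation; the partitioning only changes which set of points has interpretable distances and vector lengths.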
Generating a Genotype x Trait Biplot
The procedure for generating a genotype x trait biplot is the same as described above for the GGE biplot except that environments are replaced with traits. Such a biplot facilitates visualization of the genetic correlations among traits (Yan and Rajcan, 2002; Lee et al., 2003) and evaluation of the genotypes on the basis of multiple traits (Yan and Kang, 2003). Trait-focused singular-value partitioning (Eq. [3]) was used for appropriate visualization of the genetic correlations among traits.
Generating a Covariate-Effect Biplot
The complete dataset used in this study was a two-way table of 145 rows for the genotypes and 21 columns for the explanatory traits plus 25 columns for the yield data in each environment (Table 1). In attempting to explain the GE of yield relative to explanatory traits, a 21 by 25 trait x environment two-way table was first constructed, which contains correlation coefficients between yield and the genetic values of each trait in each of the environments. The correlation coefficients were used as a measure of the effects of the explanatory traits on yield. If the genetic values of the traits are generalized as genetic covariables, the correlation table may be referred to as a table of genetic covariable effects. After the covariate-effect table was constructed, the traits were screened for their relevance; a trait was regarded as relevant if it had a significant association (P < 0.05) with yield in at least one of the environments. The covariate-effect table with eight traits that survived the screening is presented in Table 2.
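The construction and screening of the covariate-effect table can be sketched as follows. The function names and the simulated data are illustrative assumptions. The source does not say which significance test was used, so the Fisher z approximation below is a stand-in rather than the authors' procedure.

```python
import math
import numpy as np

def covariate_effect_table(traits, yields):
    """Pearson correlations between each trait and yield in each environment.

    traits: genotypes x traits (genetic values).
    yields: genotypes x environments.
    Returns a traits x environments array.
    """
    t = np.asarray(traits, dtype=float)
    y = np.asarray(yields, dtype=float)
    tz = (t - t.mean(axis=0)) / t.std(axis=0)
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    return tz.T @ yz / t.shape[0]

def approx_p_value(r, n):
    """Two-sided P for a zero correlation via the Fisher z approximation
    (an assumption; the test used in the paper is not stated)."""
    r = max(-0.999999, min(0.999999, float(r)))   # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def screen_traits(effects, n, alpha=0.05):
    """Keep traits (rows) that are significant in at least one environment."""
    p = np.vectorize(approx_p_value)(effects, n)
    keep = (p < alpha).any(axis=1)
    return effects[keep], keep

rng = np.random.default_rng(1)
traits = rng.normal(size=(145, 3))     # made-up genetic values, 3 traits
yields = rng.normal(size=(145, 4))     # made-up yields, 4 environments
yields[:, 0] += 2.0 * traits[:, 0]     # trait 0 drives yield in environment 0
effects = covariate_effect_table(traits, yields)
relevant, keep = screen_traits(effects, n=traits.shape[0])
```

Here `effects` plays the role of the trait x environment table of correlations, and `relevant` the reduced table after screening.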
$\ldots = \lambda_1 \xi_{i1} \eta_{1j} + \lambda_2 \xi_{i2} \eta_{2j} + \varepsilon_{ij}$ | [4] |
where $\lambda_1$ and $\lambda_2$ are the singular values of PC1 and PC2, respectively; $\xi_{i1}$ and $\xi_{i2}$ are the eigenvectors of trait i for PC1 and PC2, respectively; $\eta_{1j}$ and $\eta_{2j}$ are the eigenvectors of environment j for PC1 and PC2, respectively; and $\varepsilon_{ij}$ is the residual associated with trait i and environment j. To generate a covariate-effect biplot, the singular values ($\lambda_1$ and $\lambda_2$) were partitioned between the trait and the environment eigenvectors so that Eq. [4] could be written as
![]() | [5] |
![]() | [6] |
Congruency between the GGE Pattern and the Covariate-Effect Pattern
Calculating the congruency coefficient between the GGE pattern in the GGE biplot and the covariate-effect pattern in the covariate-effect biplot consisted of two steps. The first step was to calculate the distance matrices among environments in the two biplots. The distance between two environments was calculated as
![]() |
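The distance formula and the second step did not survive in this copy, so the following is only a sketch under stated assumptions: the distance between two environments is taken to be the Euclidean distance between their biplot coordinates, and the congruency coefficient is taken to be the Pearson correlation between the corresponding off-diagonal entries of the two distance matrices. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(scores):
    """Euclidean distances among environments from their biplot coordinates."""
    s = np.asarray(scores, dtype=float)
    diff = s[:, None, :] - s[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def congruency(env_scores_a, env_scores_b):
    """Correlation between two environment distance matrices (assumed definition)."""
    da = pairwise_distances(env_scores_a)
    db = pairwise_distances(env_scores_b)
    upper = np.triu_indices(da.shape[0], k=1)   # each pair of environments once
    return np.corrcoef(da[upper], db[upper])[0, 1]

rng = np.random.default_rng(2)
a = rng.normal(size=(5, 2))                     # made-up coordinates, biplot A
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
b = 3.0 * a @ rotation                          # same pattern, rotated and rescaled
c = congruency(a, b)
```

Because distances are unchanged by rotation and a correlation is unchanged by rescaling, two biplots that show the same environment pattern in different orientations give a coefficient of 1 under these assumptions.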
| RESULTS |
|---|
This biplot explained only 31% of the GGE, implying that the GE for yield in this dataset was complex. A biplot of PC3 vs. PC4 (not shown) did not reveal any discernible patterns, indicating that the GGE biplot of PC1 vs. PC2 adequately displayed the GGE patterns.
Effect of Genetic Covariables on Yield in Different Environments
The covariate-effect biplot based on the trait x environment table of correlations including 21 traits is presented in Fig. 2a. It explained 81% of the total variation of the covariate-effect table and is, therefore, a good approximation of it. Because it is based on trait-focused singular-value partitioning, it is appropriate for visualizing the effects of the traits on yield as well as the similarities among traits in response to the environment. The rays connecting the traits to the biplot origin are referred to as trait vectors. The vector length of a trait measures the magnitude of its effect (positive or negative) on yield. Kernel weight, days to maturity, days to heading, and lodging score had relatively long vectors, suggesting that they had relatively large effects on yield in one or more environments. In contrast, most quality traits such as amylase activity, soluble protein content, soluble/total protein ratio, diastatic power, extraction difference, etc., had short vectors, suggesting that they had little association with yield in all environments. When traits not significantly (P < 0.05) associated with yield in any of the environments were removed before biplot analysis, Fig. 2a was reduced to Fig. 2b. A comparison between the two biplots reveals that it is the traits with short vectors that were removed. This provides an empirical support to the statement that the vector length of a trait is a measure of its effect on yield. The numerical data on which Fig. 2b was based are presented in Table 2.
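The link between vector length and the size of a trait's effect can be illustrated numerically. In this sketch the singular values are assigned entirely to the trait scores, which is how trait-focused partitioning is interpreted here (an assumption), the table is not centered (also an assumption), and the small table of correlations is made up.

```python
import numpy as np

def trait_vector_lengths(effects):
    """Vector lengths of traits in a rank-2 biplot of a trait x environment table."""
    r = np.asarray(effects, dtype=float)
    u, s, vt = np.linalg.svd(r, full_matrices=False)
    trait_scores = u[:, :2] * s[:2]          # singular values assigned to the traits
    lengths = np.linalg.norm(trait_scores, axis=1)
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return lengths, explained

# Made-up correlations of three traits with yield in three environments.
effects = np.array([
    [0.40, 0.35, -0.30],    # trait with large effects
    [0.02, -0.01, 0.03],    # trait with negligible effects
    [-0.25, -0.30, 0.20],   # trait with moderate, opposite effects
])
lengths, explained = trait_vector_lengths(effects)
```

Under this partitioning each vector length is bounded by the root sum of squares of that trait's correlations across environments, so a trait with uniformly small correlations necessarily falls near the biplot origin.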
Megaenvironment Identification Based on the Covariate-Effect Biplot
Figure 2c represents the same biplot as Fig. 2b but is based on environment-focused singular-value partitioning. It is, therefore, more appropriate for visualizing the relationship among environments relative to yield-trait relations. Interestingly, the 25 environments fell into two non-overlapping clusters, whose members happen to be the same as the two megaenvironments revealed in the GGE biplot (Fig. 1). The congruency between the GGE pattern (Fig. 1) and the covariate-effect pattern (Fig. 2c) is 0.904, indicating that the response of trait-yield relations to the environment explained $0.904^2 \approx 81\%$ of the observed GGE pattern. In other words, the observed GGE pattern can be effectively explained by the covariate-effect pattern, which implies that the observed GE in the GGE biplot can be effectively exploited by developing trait-selection strategies specific to each megaenvironment.
Strategies of Indirect Selection for Different Megaenvironments
Dividing the target environments into meaningful megaenvironments and deploying different cultivars for different megaenvironments is the only way that positive GE can be exploited, and negative GE avoided. Evidence of the eastern vs. western megaenvironment differentiation in the GGE biplot (Fig. 1) implies that different cultivars should be selected and deployed for the two megaenvironments. Evidence of the same megaenvironment differentiation in the covariate-effect biplot (Fig. 2c) suggests that different trait-selection strategies can be developed in breeding for higher yield for each megaenvironment.
Figure 3a represents the covariate-effect biplot involving only the eastern locations (Prince Edward Island, Ontario, Quebec, and Manitoba). All traits except powdery mildew susceptibility and days to maturity showed consistent effects on yield across environments. Specifically, kernel weight, test weight, plumpness, and protein content had consistently positive associations with yield, as evidenced by the acute angles between the vectors of these traits and the vectors of all environments except QC92. On the contrary, lodging score and days to heading (not days to maturity, though), showed consistently negative associations with yield, as evidenced by the obtuse angles between their vectors and the vectors of all environments except QC92. Therefore, selection for larger kernel weight, earlier heading, and better lodging-resistance should lead to increased yield in the eastern megaenvironment. The correlation coefficients between yield and these three traits are summarized in Table 3 for each environment and megaenvironment.
Genetic Correlations among Traits
Despite the fact that the covariate-effect patterns differed dramatically in the two megaenvironments, the relationships among traits relative to their effects on yield were more or less similar. That is, in both megaenvironments (Fig. 3a vs. Fig. 3b), as well as across all environments (Fig. 2b), kernel weight, test weight, protein content, and plumpness constitute one group of traits with similar effects on yield (indicated by the acute angles among them); days to heading and lodging score constitute another group. The effects of the two groups were more or less opposite, however, as indicated by the obtuse angles between them. Genetic correlation among traits may be the underlying reason for this. Figure 4 represents the genotype x trait biplot based on the genotype x trait two-way table, which contains the genetic values of the traits for each genotype. This biplot, therefore, approximately displays the genetic correlations among the traits. The eight traits fell into three relatively independent groups. Kernel weight, test weight, protein content, and plumpness constitute one group with positive associations among them. Days to heading, days to maturity, and powdery mildew susceptibility constitute another group of positively associated traits. The latter group can be explained by the QTL mapping results, which reveal a strong QTL for both days to heading and days to maturity in the middle of chromosome 4, and a major gene for powdery mildew resistance ("dMlg") in the same region (Tinker et al., 1996). Lodging score is negatively associated with both groups of traits.
| DISCUSSION |
|---|
GGE Biplot vs. Other Genotype x Environment Biplots
Although we have recommended using a GGE biplot in studying genotype x environment tables, there are other types of biplots that have been used in studying such data (DeLacy et al., 1996). All biplots can be useful as long as they are interpreted correctly. Some comparisons of six types of GE-containing biplots are briefly given below.
Since cultivar evaluation and megaenvironment classification must be based on both G and GE (Gauch and Zobel, 1996), all biplots that display both G and GE can be useful for this purpose. All biplots listed above, except the GE biplot, contain G and some GE. The GE biplot is most powerful for studying GE per se, but cannot be used in selecting superior cultivars because it excludes G. Since a GGE biplot displays the maximum G+GE of all biplots and has many convenient interpretations, it is considered to be the most appropriate biplot for cultivar evaluation (Yan et al., 2000; Crossa et al., 2002). One attractive feature of the GGE biplot in cultivar evaluation is that it graphically shows the which-won-where pattern of a genotype x environment two-way dataset (Yan et al., 2000). All other biplots can at best approximate the GGE biplot in this regard. The AMMI1 biplot, although effective in summarizing the genotype x environment data, cannot show which-won-where because it does not have the inner-product property of a normal biplot. Some graphs (not biplots) based on AMMI analysis do address the "which-won-where" issue (Gauch and Zobel, 1997). For a given dataset, all types of biplots listed above except the SHMM biplot can be readily generated and visualized using the GGEbiplot software.
Four Questions Must Be Asked before Attempting to Interpret a Biplot
Four questions must be asked before trying to interpret a biplot. First, what is the model on which the biplot is based? This determines the type of questions that can be potentially addressed by the biplot. For example, a GGE biplot can be used in cultivar evaluation and recommendation, whereas a GE biplot cannot. Second, what is the singular-value partitioning method used in generating the biplot? This determines whether the biplot is appropriate for visualizing the relationships among genotypes or those among environments (or traits). Third, how much variation is explained by the biplot? This determines the credibility of the biplot; interpretations about entries and testers with short vectors may not be accurate if the biplot explained only a small fraction of the total variation. Last but not least, are the biplot axes drawn to scale? If not, the relations displayed by the biplot are distorted and the interpretations can be misleading. Biplot analysis reported in this paper was conducted using the GGEbiplot software, which explicitly addresses these questions and ensures that correct biplots are generated for particular purposes. GGEbiplot is user-friendly, feature-rich software for biplot analysis; it was developed for researchers with limited training in statistics and computer application.
GGE Patterns vs. GE Patterns
We have referred to the patterns observable from a GGE biplot as GGE patterns. With careful interpretations, however, meaningful statements about G and GE can be made from such patterns while maintaining the advantages of simultaneous display. The patterns that can be visualized in a GGE biplot include patterns regarding the genotypes, patterns regarding the environments, and patterns regarding both genotypes and environments (e.g., the which-won-where pattern). The genotype patterns are attributable to a mixture of G and GE. This is why a GGE biplot allows visualizing both mean performance and stability of the genotypes. Although G and GE are confounded in the GGE biplot, it is possible to distinguish patterns due to G from those due to GE. The best way is to use the average-environment coordination (AEC) view of the GGE biplot (Yan, 2001): the AEC-abscissa represents variation due to G and the AEC-ordinate represents variation due to GE. If there were no GE, all genotypes would fall on the AEC-abscissa, which would be parallel to the PC1 axis, as PC1 would explain 100% of the total variation of the environment-centered or standardized genotype x environment data. Any deviation from this pattern is due to GE. On the other hand, if there is little G in the data, both genotypes and environments would be scattered in the biplot in all directions. Typically, PC1 scores are highly correlated with G if G is >20% of G+GE (Yan et al., 2001). Thus, if G is sizable, PC1 will be dominated by G and all other PC will be dominated by GE. If G is trivial, all PC should be dominated by GE. In either case, anything not explained in the GGE biplot is mostly GE. If the GGE biplot explains a relatively small proportion of the total GGE, the GE in the data must be complex.
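The statement that PC1 explains 100% of the variation when there is no GE can be checked with a small numerical example. The sketch below uses made-up genotype and environment main effects and a hypothetical helper `pc_shares`. With purely additive data, environment centering leaves a rank-1 table.

```python
import numpy as np

def pc_shares(table):
    """Proportion of environment-centered variation explained by each PC."""
    y = np.asarray(table, dtype=float)
    centered = y - y.mean(axis=0)               # remove environment main effects
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / (s ** 2).sum()

g = np.array([1.0, 2.0, 4.0, 7.0])              # made-up genotype main effects
e = np.array([10.0, 12.0, 15.0])                # made-up environment main effects
additive = g[:, None] + e[None, :]              # G + E only, no GE
shares = pc_shares(additive)

interaction = np.array([[ 1.0, -1.0,  0.0],
                        [-1.0,  1.0,  0.0],
                        [ 0.0,  1.0, -1.0],
                        [ 0.0, -1.0,  1.0]])    # made-up GE term
shares_with_ge = pc_shares(additive + interaction)
```

In the additive case the first share equals 1 up to rounding error. Once the interaction term is added, part of the variation moves to the later components.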
It is relevant here to emphasize that the distinction between G and GE may be meaningful only within the scope of environments in which they are estimated. The so-called G estimated across a small set of environments may well be GE if put into a larger scale of environments. Yan and Hunt (2001) demonstrated that G estimated from individual years was actually GE when compared across years. The GGE biplot interprets G as proportionate responses of genotypes to the environment (Yan et al., 2000). Although the patterns of genotypes can be separated into patterns attributable to G and GE, it is neither necessary nor beneficial from the viewpoint of cultivar evaluation.
In contrast to the genotypic patterns, the patterns regarding the environments in a GGE biplot are solely attributable to GE because E has been removed. If there were no GE in the GGE biplot, all environments would fall on a single point on the PC1-axis. Therefore, it is legitimate to discuss GE patterns based on a GGE biplot. With this understanding, the differentiation among environments in the GGE biplot should be consistent with that based on a GE biplot. Indeed, the two-megaenvironment classification based on the GGE biplot (Fig. 1) is also obvious on the GE biplot (Fig. 5). In the latter biplot, the western vs. eastern megaenvironment differentiation is clearly reflected on the interactive PC1. Although the interactive PC2 explains a sizable GE, no meaningful pattern can be found (Fig. 5). In general, the GE biplot is more powerful in environment classification than the GGE biplot because it displays more GE, although the GGE biplot is the single most informative biplot for both genotype and environment evaluation.
Figure 6 represents the which-won-where view of the same GGE biplot in Fig. 1. If which-won-where is used as the sole criterion, there would be three megaenvironments, defined by the winning genotypes 175 (five environments from SK and AB), 826 (three environments from ON), and 829 (17 environments from all locations), respectively. Obviously, this classification is meaningless because, on one hand, niches of line 175 and line 826 were actually part of the line 829 niche and, on the other hand, the line 829 niche covered all locations that were apparently different.
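Grouping environments by their winning genotype relies on the inner-product property of the biplot: the value of genotype i in environment j is approximated by the inner product of their score vectors. The sketch below applies that property directly rather than drawing the polygon used in the which-won-where view. The function name, scores, and labels are made up for illustration.

```python
from collections import defaultdict

import numpy as np

def which_won_where(geno_scores, env_scores, geno_names, env_names):
    """Group environments by the genotype with the largest biplot-approximated value."""
    approx = np.asarray(geno_scores, dtype=float) @ np.asarray(env_scores, dtype=float).T
    winners = approx.argmax(axis=0)             # best genotype in each environment
    groups = defaultdict(list)
    for j, i in enumerate(winners):
        groups[geno_names[i]].append(env_names[j])
    return dict(groups)

# Made-up PC1 and PC2 scores for three genotypes and three environments.
geno_scores = np.array([[2.0, 0.5], [1.0, -1.5], [-1.0, 1.0]])
env_scores = np.array([[1.0, 0.2], [0.3, -1.0], [0.9, 0.1]])
groups = which_won_where(geno_scores, env_scores,
                         ["A", "B", "C"], ["E1", "E2", "E3"])
```

With these numbers genotype A wins in E1 and E3 and genotype B wins in E2. As the text notes, such groups are only candidate megaenvironments and need to be checked against other criteria.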
The western vs. eastern megaenvironment classification is meaningful because it meets the following criteria. First, the apparent differentiation of the two environment clusters on the biplot (Fig. 1 or Fig. 6) meets the common principle of classification that variation between clusters is maximized and that within clusters is minimized. Second, the classification is consistent with the geographical locations of the test environments. Third, the location grouping was relatively consistent across years. And, finally, the two megaenvironments did have different winning genotypes: lines 175 and 829 were the winning genotypes for the western megaenvironment, whereas lines 829 and 826 were the winning genotypes for the eastern megaenvironment. This example illustrates that a megaenvironment may have more than one winning genotype and that even if there exists a universal winner, it is still possible, and beneficial, to divide the target environments into meaningful megaenvironments.
The Covariate-Effect Biplot as a Graphical Tool for Interpreting GE
A covariate-effect biplot was proposed in this paper so that the GGE patterns for yield can be interpreted using explanatory traits. This biplot allows visualizing the effects of each explanatory trait on yield in each of the environments. Eight traits (kernel weight, heading, lodging, maturity, powdery mildew susceptibility, protein content, test weight, and plumpness) had associations with yield in at least one environment, and their relations with yield in different environments explained 81% of the GGE pattern, implying that the GGE pattern can be effectively exploited by developing trait-selection strategies specific to each megaenvironment.
The explanatory traits were used as genetic covariables in our analysis. That is, the correlation coefficients are obtained on the basis of the genetic values of the explanatory traits. One may question whether this is legitimate when the explanatory traits may express GE themselves. The justification is that GE associated with each trait is largely removed when traits are averaged across environments. Traits that strongly interact with the environment would have little variation after being averaged across environments; as a result, their associations with the target trait in individual environments would be trivial. Such traits would not survive the relevance screening, or if they do, they would fall near the origin of the covariate-effect biplot. Thus, all explanatory traits that survived the relevance screening should have relatively small GE relative to G.
A disadvantage of using genetic values of the explanatory traits is that traits with large GE may be regarded as irrelevant even if they are important. This is likely to occur for traits whose interactions with the environment are parallel with the GE of the target trait. To amend this, a covariate-effect table could be generated from phenotypic values of the explanatory traits in each environment. A potential problem of this approach is that the table may contain many missing cells because it is common that not all traits are measured in all environments. Nevertheless, whenever possible, it is advisable to examine both types of covariate-effect tables.
The covariate-effect biplot approach should be applicable when the explanatory traits are replaced by other genetic covariables such as molecular markers and gene sequences. When explanatory traits are replaced by genetic markers, the covariate-effect biplot can be used to identify QTL for the target trait (Yan et al., 2005), and to investigate the QTL x environment interactions (Yan et al., 2005; Yan and Tinker, 2005). The linear correlation coefficients in the covariate-effect table can also be replaced by linear regression coefficients, with response variables used as dependent variables and explanatory variables as independent variables.
The covariate-effect biplot based on a trait x environment table of covariate effects should not be confused with the trait x environment biplot based on a trait x environment table of trait values (Lee et al., 2003). The latter biplot can be used to visualize the environmental correlations among traits.
With minor modifications, the covariate-effect biplot described here can also be used to interpret GE using environmental factors. In this case, the environmental factors are regarded as explanatory variables, whereas the environment-centered or standardized data of the genotypes for a target trait are regarded as response variables (Table 4). The centering or standardization is necessary to remove the environment main effects. On the basis of Table 4, an environmental factor x genotype two-way table of covariate effects can be constructed, which can then be visualized by means of a biplot (not shown).
| ACKNOWLEDGMENTS |
|---|
| NOTES |
|---|
Received for publication February 6, 2004.
| REFERENCES |
|---|