Crop Science Division, Dep. of Plant Agriculture, Univ. of Guelph, Guelph, ON, Canada N1G 2W1
* Corresponding author (wyan{at}uoguelph.ca)
| ABSTRACT |
|---|
Abbreviations: ATC, average tester coordinate; FHB, Fusarium head blight; GCA, general combining ability; GGE, genotype main effect plus genotype x environment interaction; MET, multi-environment trial; PC, principal component; PC1, first principal component; PC2, second principal component; PSB, pink stem borer; SCA, specific combining ability; SNB, Stagonospora nodorum blotch; SREG, site regressions
| INTRODUCTION |
|---|
The concept of biplot was developed by Gabriel (1971) to graphically display a rank-two matrix, which is a matrix resulting from multiplying a matrix with two columns by a matrix with two rows. The significance of this concept is that if a two-way dataset can be sufficiently approximated by a rank-two matrix, then it can be graphically displayed and investigated. Bradu and Gabriel (1978) explored the use of the biplot as a diagnostic tool for choosing an appropriate model for the analysis of two-way data. Since then, the biplot has been used in multi-environment trial (MET) data analysis.
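The rank-two idea can be sketched in a few lines of Python. This is only an illustration with hypothetical numbers (the arrays `entries` and `testers` are not from any dataset in this paper): a matrix with two columns multiplied by a matrix with two rows gives a two-way table of rank two, and every cell of that table is the inner product of one row point and one column point.

```python
import numpy as np

# Hypothetical coordinates, chosen only for illustration.
entries = np.array([[2.0, 1.0],
                    [0.5, -1.0],
                    [-1.5, 0.5]])             # 3 entries x 2 columns
testers = np.array([[1.0, 0.0, -1.0, 2.0],
                    [0.5, 2.0, 1.0, -1.0]])   # 2 rows x 4 testers

table = entries @ testers                     # 3 x 4 two-way table
rank = np.linalg.matrix_rank(table)           # rank of the product

# Each cell is the inner product of an entry point and a tester point,
# which is why the whole table can be read from a two-dimensional plot.
cell_1_2 = float(entries[1] @ testers[:, 2])
print(rank, cell_1_2, table[1, 2])
```

Plotting the rows of `entries` and the columns of `testers` as points in the same plane would give the biplot of `table`.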
In analyzing Ontario winter wheat performance trial data, Yan (1999) and Yan et al. (2000) proposed a GGE biplot, constructed from the first two principal components (PC1 and PC2) derived from PC analysis of environment-centered yield data. It was termed GGE biplot to emphasize that it displays both genotype main effect (G) and genotype x environment interaction (GE), which are two sources of yield variation that are relevant to, and must be considered simultaneously in, cultivar evaluation (Gauch and Zobel, 1996). Such a biplot has been used previously (Cooper et al., 1997), but the methodology formulated in Yan et al. (2000) allows the following issues to be visually addressed: (i) determination of the best performing cultivar in a given environment; (ii) identification of the most suitable environment for a given cultivar; (iii) comparison of any pair of cultivars in individual environments; and (iv) identification of the best cultivars for each environment and differentiation of mega-environments. Later, Yan et al. (2001) developed an alternative GGE biplot that allows two additional questions to be explicitly addressed: (i) average yield and stability of the genotypes; and (ii) discriminating ability and representativeness of the environments, which are important for visual evaluation of cultivars and test environments for a given mega-environment.
Although the GGE biplot methodology was developed for MET data analysis, it should be applicable to all types of two-way data that assume an entry-by-tester data structure. In MET data, genotypes are entries and environments are testers. In diallel data, each genotype is both an entry and a tester. Our first objective is to demonstrate the use of biplot for diallel data interpretation. Another purpose of this paper is to provide a detailed description of the method for constructing and interpreting a GGE biplot.
| MATERIALS AND METHODS |
|---|
When the GGE biplot is applied to diallel data, the terms average yield and stability of the genotypes correspond to GCA and SCA, respectively, of the parents (note that in conventional diallel analyses, SCA is associated with crosses rather than parents).
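The SREG2 model is written as:

$$Y_{ij} - \beta_j = \lambda_1 \xi_{i1} \eta_{j1} + \lambda_2 \xi_{i2} \eta_{j2} + \varepsilon_{ij} \tag{1}$$

where $Y_{ij}$ is the value of the combination of Entry i and Tester j; $\beta_j$ is the mean of all combinations involving Tester j, so that the left-hand side is the tester-centered value; $\lambda_1$ and $\lambda_2$ are the singular values for PC1 and PC2, respectively; $\xi_{i1}$ and $\xi_{i2}$ are the PC1 and PC2 eigenvectors, respectively, for Entry i; $\eta_{j1}$ and $\eta_{j2}$ are the PC1 and PC2 eigenvectors, respectively, for Tester j; and $\varepsilon_{ij}$ is the residual of the model associated with the combination of Entry i and Tester j. Since in diallel cross data each genotype is both an entry and a tester, i and j can refer to the same or different genotypes. When i = j, the combination is a pureline rather than a hybrid.
In some statistical software such as the Statistical Analysis System (SAS Institute, 1996), the singular values are usually combined with their respective row (entry) eigenvectors so that Eq. [1] looks like:

$$Y_{ij} - \beta_j = (\lambda_1 \xi_{i1}) \eta_{j1} + (\lambda_2 \xi_{i2}) \eta_{j2} + \varepsilon_{ij} \tag{2}$$

To display PC1 and PC2 in a biplot, it is rearranged as

$$Y_{ij} - \beta_j = \xi^{*}_{i1} \eta^{*}_{j1} + \xi^{*}_{i2} \eta^{*}_{j2} + \varepsilon_{ij} \tag{3}$$

where $\xi^{*}_{ik} = \sqrt{\lambda_k}\,\xi_{ik}$ and $\eta^{*}_{jk} = \sqrt{\lambda_k}\,\eta_{jk}$, with k = 1 or 2. This singular-value partitioning method is called symmetrical scaling.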
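As a rough illustration of the model and of symmetrical scaling, the following Python sketch obtains the scaled scores from a singular value decomposition of a tester-centered table. It is not the SAS procedure used in this paper; the function name `gge_scores` and the 4 x 4 table `demo` are hypothetical, and the sketch assumes rows are entries and columns are testers.

```python
import numpy as np

def gge_scores(table):
    """Symmetrically scaled PC1/PC2 scores for entries (rows) and testers (columns)."""
    y = np.asarray(table, dtype=float)
    centered = y - y.mean(axis=0)             # subtract each tester (column) mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    root = np.sqrt(s[:2])                     # sqrt of the first two singular values
    entry_scores = u[:, :2] * root            # xi*_ik  = sqrt(lambda_k) * xi_ik
    tester_scores = vt[:2].T * root           # eta*_jk = sqrt(lambda_k) * eta_jk
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return entry_scores, tester_scores, centered, explained

# Hypothetical table (rows = entries, columns = testers), not data from the paper.
demo = np.array([[10.0, 12.0,  9.0, 14.0],
                 [11.0, 15.0,  8.0, 13.0],
                 [ 9.0, 11.0, 12.0, 10.0],
                 [14.0, 10.0, 11.0, 15.0]])
entry_scores, tester_scores, centered, explained = gge_scores(demo)
approx = entry_scores @ tester_scores.T       # rank-two part of the centered table
print(round(float(explained), 3))
```

With this partitioning the entry and tester scores of each PC have the same sum of squares, so entries and testers are drawn on a common scale, and `centered - approx` is the residual term of the model.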
Obtaining PC1 and PC2 Scores for Biplot Construction
Below we show how to obtain the scores for biplots using SAS, using the Fusarium study data as an example. When Table 1 is saved in space delimited text format, the following SAS statements will implement Eq. [2]:
The SAS keyword COV specifies that the variance-covariance matrix calculated from the tester-centered diallel data be used in the principal component analysis. By default, without this specification, the correlation coefficient matrix would be used instead (SAS Institute, 1996). The SAS output of this program includes, among others, (i) the eigenvectors of the first two PCs for each entry ($\lambda_1 \xi_{i1}$ and $\lambda_2 \xi_{i2}$) and each tester ($\eta_{j1}$ and $\eta_{j2}$), which are listed in Table 4, and (ii) the eigenvalues for PC1 and PC2, which are 304.227 and 205.019, respectively. The singular value for a PC is the square root of the sum of squares explained by the PC, which is the product of the eigenvalue multiplied by the number of entries. Therefore, the square root of the singular value for the kth PC is calculated as:

$$\sqrt{\lambda_k} = (\text{eigenvalue}_k \times n)^{1/4}$$

where n is the number of entries. For symmetrical scaling, $\sqrt{\lambda_k}$ is used to divide the entry eigenvectors and to multiply the tester eigenvectors:

$$\xi^{*}_{ik} = \frac{\lambda_k \xi_{ik}}{\sqrt{\lambda_k}}, \qquad \eta^{*}_{jk} = \sqrt{\lambda_k}\,\eta_{jk}$$

In this example, $\sqrt{\lambda_1} = (304.227 \times 7)^{1/4} = 6.793$ and $\sqrt{\lambda_2} = (205.019 \times 7)^{1/4} = 6.155$. These two values are used to divide the entry eigenvectors and to multiply the tester eigenvectors of Table 4 for PC1 and PC2, respectively, resulting in the symmetrically scaled PC1 and PC2 scores listed in Table 5. Values in Table 5 are used to construct the biplot (Yan et al., 2000, Fig. 1).
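The arithmetic above can be checked with a short Python sketch. The eigenvalues and the number of entries are those reported in the text; the function name `root_singular_value` is a hypothetical helper introduced only for this check.

```python
eigenvalues = {"PC1": 304.227, "PC2": 205.019}   # eigenvalues reported in the text
n_entries = 7                                    # number of entries in this example

def root_singular_value(eigenvalue, n):
    """Square root of the singular value, i.e. (eigenvalue * n) ** (1/4)."""
    singular_value = (eigenvalue * n) ** 0.5     # sqrt of the SS explained by the PC
    return singular_value ** 0.5

roots = {pc: round(root_singular_value(ev, n_entries), 3)
         for pc, ev in eigenvalues.items()}
print(roots)
```

Dividing the entry eigenvectors by these values and multiplying the tester eigenvectors by them gives the symmetrically scaled scores.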
| RESULTS |
|---|
... B > A, which is roughly consistent with the order of G ≈ F > E ≈ D ≈ C > B > A, based on entry means (Table 1). Although there are variations, the entries with the largest and smallest GCA effects were correctly identified by the biplot. Because the biplot displays both GCA and SCA, and because GCA and SCA are orthogonal, if projections of the entries onto the ATC abscissa approximate their GCA effects, as just demonstrated, then projections of the entries onto the ATC ordinate must approximate their SCA effects, which represent the tendency of the entries to produce superior hybrids with specific testers. Entries G and A had the highest SCA effects (largest projections onto the ATC ordinate), whereas Entry E had the smallest SCA effect (smallest projection onto the ATC ordinate). Entries G and A had the highest SCA because they interacted positively with Testers B, C, D, and F, but negatively with themselves.
Two heterotic groups are suggested by Fig. 1A: Genotypes A and G as one group, and Genotypes B, C, D, and F as the other. Therefore, eight crosses, that is, [A, G] x [B, C, D, F], are expected to show heterosis, defined as being better than both parents. Entry E was located near the ATC abscissa and did not seem to belong to either group.
Best Testers for General Combining Ability
An ideal tester should be highly discriminating of the entries and highly representative of all testers. It is, therefore, defined as a tester that has the longest vector of all testers (i.e., most discriminating) and zero projection onto the ATC ordinate (i.e., most representative of the testers). Therefore, the closer a tester's marker is to the ideal tester, the better it is as a tester. This ideal tester happens to coincide with Tester E (Fig. 1A). Thus, Genotype E was the best tester in this dataset. The speculation is that the GCA effects of the entries should be reasonably assessed by the performance of their hybrids with Genotype E. Indeed, the entries are in the order of G > F > B ≈ C ≈ D > E > A, based on the actual values of the hybrids with Tester E, which is in rough agreement with the order of G ≈ F > E ≈ C ≈ D > B > A, based on the GCA effects, with the exception that Entry E per se was misplaced (Table 1).
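The projections and the ideal tester described above can be sketched in Python. The sketch assumes, as is usual in the GGE biplot literature, that the ATC abscissa is the line through the biplot origin and the average of the tester markers, and that the ATC ordinate is perpendicular to it through the origin. The scores and the function names `atc_projections` and `closest_to_ideal_tester` are hypothetical.

```python
import numpy as np

def atc_projections(points, tester_scores):
    """Project points onto the ATC abscissa and the ATC ordinate."""
    mean_tester = tester_scores.mean(axis=0)
    axis = mean_tester / np.linalg.norm(mean_tester)  # unit vector, ATC abscissa
    normal = np.array([-axis[1], axis[0]])            # unit vector, ATC ordinate
    return points @ axis, points @ normal, axis

def closest_to_ideal_tester(tester_scores):
    """Ideal tester: longest tester vector length, zero projection on the ordinate."""
    _, _, axis = atc_projections(tester_scores, tester_scores)
    longest = np.linalg.norm(tester_scores, axis=1).max()
    ideal = longest * axis
    distances = np.linalg.norm(tester_scores - ideal, axis=1)
    return int(distances.argmin()), ideal

# Hypothetical symmetrically scaled scores (PC1, PC2), not values from Table 5.
entries = np.array([[3.0, 1.0], [1.0, -2.0], [-2.0, 0.5], [-1.0, 2.0]])
testers = np.array([[2.0, 0.0], [1.0, 1.5], [1.5, -1.0], [0.5, 0.5]])

gca_like, sca_like, _ = atc_projections(entries, testers)
best_index, ideal = closest_to_ideal_tester(testers)
print(np.argsort(-gca_like), best_index)
```

Here `gca_like` ranks the entries along the ATC abscissa, `sca_like` holds their projections onto the ATC ordinate, and `best_index` is the tester whose marker lies closest to the ideal tester.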
Best Hybrids
The polygon view of a biplot provides the best way to visualize the interaction patterns between entries and testers and to interpret a biplot effectively (Fig. 1B). It is drawn by connecting the entry markers positioned furthest from the plot origin using straight lines to form a polygon (or convex hull) such that all other entry markers are contained within the polygon. Lines perpendicular to each side of the polygon, or to its extension, are drawn from the plot origin; these divide the biplot into several sectors, and each tester inevitably falls into one of the sectors. An interesting property of this polygon view is that testers falling into the same sector share the same best mating partner, which is the entry at the vertex of the polygon in that sector (Yan et al., 2000).
All interpretations of a biplot are based on the simple rule: the value of the hybrid between an entry and a tester is visualized by the projection of the entry's marker onto the vector of the tester or its extension (a vector of a tester is the line from the biplot origin towards the marker of the tester). Envision that a tester is located exactly on the perpendicular line to a side of the polygon; in other words, envision a tester whose vector coincides with the perpendicular line to a side of the polygon. Since the two vertex entries connected by this particular polygon side, and all entries located on this polygon side, have exactly the same projections onto the vector of the envisioned tester, they should be equally good mating partners with regard to the envisioned tester. Actually, they should be equally good with regard to all testers located on this perpendicular. They should be different from one another, however, with regard to all other testers. One entry will be better than the other with regard to testers located on its side of the perpendicular, and poorer than the other with regard to testers located on the other side of the perpendicular. Thus, the perpendicular lines divide the entries into groups.
Since the vertex entries have the largest distances from the origin, they are most responsive to the change of testers relative to other entries within respective groups. They are either the best or the poorest mating partners with some or all of the testers. It follows that the perpendiculars to the sides of the polygon also provide a way to group the testers based on their best mating partners. Thus, testers that fall in the same sector share the same best mating partner, which is the entry at the vertex of the polygon in that sector. Testers that fall in different sectors have different best mating partners. Entries located near the biplot origin are less responsive to the change of testers.
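The polygon view can be sketched numerically in Python. For a given tester, ranking the entries by their projection onto the tester vector is the same as ranking them by the inner product of their scores, so the best mating partner can be found without drawing the perpendiculars, and it always turns out to be a vertex of the polygon. The scores below are hypothetical; `convex_hull` uses Andrew's monotone chain algorithm, which is one of several ways to find the vertex entries.

```python
import numpy as np

def convex_hull(points):
    """Indices of the polygon vertices (Andrew's monotone chain)."""
    order = sorted(range(len(points)), key=lambda i: (points[i][0], points[i][1]))

    def cross(o, a, b):
        return ((points[a][0] - points[o][0]) * (points[b][1] - points[o][1])
                - (points[a][1] - points[o][1]) * (points[b][0] - points[o][0]))

    def half(indices):
        chain = []
        for i in indices:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain[:-1]

    return half(order) + half(order[::-1])

def best_partner(entry_scores, tester_scores):
    """For each tester, the index of the entry with the largest inner product."""
    return (entry_scores @ tester_scores.T).argmax(axis=0)

# Hypothetical scores (PC1, PC2); the fifth entry lies near the origin.
entries = np.array([[3.0, 1.0], [1.0, -2.0], [-2.0, 0.5], [-1.0, 2.0], [0.2, 0.1]])
testers = np.array([[2.0, 0.0], [1.0, 1.5], [1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]])

hull = convex_hull(entries)
winners = best_partner(entries, testers)
print(sorted(hull), winners.tolist())
```

Testers with the same value in `winners` fall in the same sector, and an entry that never appears in `winners` corresponds to a sector with no testers.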
To illustrate, the biplot in Fig. 1B was divided into four sectors, with Entries A, C, F, and G as the vertex entries; these are referred to as Sector A, Sector C, Sector F, and Sector G, respectively. No tester fell in Sector A, meaning that Entry A was not the best mating partner with any of the genotypes. Actually, Entry A produced the poorest combination or hybrid with itself and Tester E, which is located on the opposite side of the origin. A single tester, G, fell in Sector C, indicating that Entry C was the best mating partner with G. Moreover, since Genotype C, as a tester, was not in Sector C, the cross C x G must be better than both parents, and the term heterosis is used hereafter to refer to such situations. Had Tester C fallen in Sector C, the combination C x C (i.e., pureline C) would be the best among all crosses involving C, and therefore, heterosis between C and any other genotypes would not be possible. A single tester, Tester A, fell into Sector F, indicating that Entry F was the best mating partner for A, and the cross A x F was heterotic. Testers B, C, D, E, and F fell in Sector G. Since G was not in this sector, all crosses between Genotype G and these genotypes should be heterotic.
To summarize, the biplot predicts the following F1 hybrids to be superior heterotic crosses: C x G in Sector C; F x A in Sector F; and G x B, G x C, G x D, G x E, and G x F in Sector G (Fig. 1B). In addition, Tester E was almost on the perpendicular line that separates Sectors F and G, meaning that Entries F and G were almost equally good as partners for E. Consequently, we have seven superior hybrids: F x [A, E] and G x [B, C, D, E, F]. Interestingly, in Sector G, G was predicted to be the best mating partner for C, and in sector C, C was predicted to be the best partner for G. C and G were, therefore, identified to be the best partners for each other, and C x G must be the best of all possible combinations. Other crosses were also predicted to be heterotic (Fig. 1A), such as A x C and A x D, but they were not predicted to be superior crosses (Fig. 1B).
Most of the above predictions can be verified from the original data (Table 1). Some are not consistent with the data, however. For example, the biplot predicts G to be the best and A the second best partner for F, whereas the data showed A to be the best and G the second best for F (Fig. 1B), even though the observed values for A x F (64.9%) and G x F (63.1%) were quite close. Also, based on Table 1, there were other heterotic crosses such as B x C, B x E, and C x E, which were not predicted to be heterotic by the biplot. (These crosses were apparently inferior to those that are identified to be superior crosses based on Fig. 1B). Such discrepancies are expected because the biplot explained 77 rather than 100% of the total variation. Since all data contain some error, and since the biplot displays and makes predictions on the general pattern of the whole dataset, the predictions are probably more reliable than the individual observations.
Hypotheses Concerning the Genetic Relationships among the Genotypes
An immediate impression of Fig. 1B is that the four vertex entries are apparently different from one another. Comparison of Entries A and G reveals the following information: (i) they were different in genetic responses; (ii) there was no heterosis between them; and (iii) as purelines, G was more resistant than A. This suggests that any dominant resistance genes (resistance genes are defined in this paper as all genes contributing to the apparent disease resistance regardless of their mechanisms) present in Entry A should also be present in Entry G. Similarly, any dominant resistance genes present in C should also be present in F. Assuming that heterosis results from accumulation of different dominant gene loci, Entries A and C each would appear to carry at least one dominant resistance gene since heterosis was observed in their hybrids with F and G, respectively. Thus, F and G each would appear to carry at least two dominant resistance genes.
Because there was heterosis between F and G, they must each have carried a unique dominant resistance gene. These two genes are likely to be responsible also for the heterosis between A and F and that between C and G. Assuming that Genotypes B and D had similar genetics as C, as they were relatively close on the biplot, the heterosis observed between these genotypes and G can also be explained by the same two dominant genes.
Therefore, it would appear that at least three different resistance genes existed in the seven wheat genotypes: one shared by A and G, another shared by C (also B, D, and E) and F, and the third present in G but not in A, which may be the same as the one that is in F but not in B, C, D, or E. Were there no common dominant resistance genes in F and G, the hybrid F x G should be significantly better than A x F or C x G. Entry E may carry both genes from A and C, which predicts E to produce heterotic hybrids with both F and G but not with A or C. Following this reasoning, A x F, C x G, F x G, E x F, and E x G may each have combined all three dominant FHB resistance genes. However, since both the data and the biplot show that C x G was better than F x G and that E x G was better than E x F, there might be a recessive gene or a pair of epistatic genes present in C, E, and G. It follows that a cross combining Genotypes C (or E), F, and G may lead to breeding lines better than all of the parents. These analyses may help narrow down the crosses to be further investigated.
Interpreting the Wheat Stagonospora Nodorum Blotch Data
The biplot for the wheat Stagonospora nodorum blotch data explained 79% (59 and 20% by PC1 and PC2, respectively) of the total variation (Table 2, Fig. 2). Based on projections onto the ATC abscissa (Fig. 2A), Entry A showed the largest and Entry F the smallest GCA effects. The ranking of the genotypes for GCA was A > B ≈ C ≈ D > E > F, which is consistent with the ranking based on entry means (Table 2). Fig. 2A suggests two heterotic groups: A and E as one and C, D, and F as the other. Therefore, hybrids [A, E] x [C, D, F] should show heterosis, as can be verified from Table 2. Entry B did not belong to either group.
With regard to genetic constitutions, the six entries seemed to differ from one another, except that Entries C and D were apparently similar (Fig. 2B). Among the four vertex entries, A and E were different but the two did not produce heterosis, suggesting that any resistance gene in E must also be present in A (since A > E in GCA, Fig. 2A). Likewise, any resistance present in Entry F must also be present in Entry C (According to Table 2, C x F was slightly better than C, but this was not indicated by the biplot). E and F each carried at least one gene due to heterosis of some crosses. Thus A and C each must have carried at least two dominant resistance genes.
Heterosis occurred between A and C and between E and F, suggesting different dominant genes in A and E as one group and C, D, and F as another. Entry B, located near the plot origin, was intermediate between these two heterotic groups. Therefore, Entry B might carry two genes, one being the same as that in C and F, which caused heterosis when crossed with A; the other being the same as that in A and E. This explains the fact that B had better GCA effects than E and F and showed no large heterosis with any of the testers except A. Thus, at least four SNB resistance genes were involved in these six wheat genotypes: one shared by E, A, and B; another shared by F, C, D, and B; the third shared by A and C; and the fourth present only in A. On the basis of these hypotheses, the crosses A x [B, C, D] should have carried all three genes and were equally resistant to SNB (Table 2).
Interpreting the Corn Pink Stem Borer Data
The biplot (Fig. 3) for the corn PSB study (Table 3) explained only 37% (PC1) + 26% (PC2) = 63% of the total variation. A large portion of the total variation was not accounted for by the biplot, reflecting the complexity of the genetics among the 10 corn inbreds in PSB resistance. Nevertheless, the biplot still provides a useful tool for understanding the interactions among the inbreds.
... E ≈ F > A ≈ J > C ≈ D > I > G, which is roughly consistent with the ranking of H > E > A ≈ B ≈ C ≈ F > D ≈ G ≈ I ≈ J based on the entry means (Table 3). On the basis of projections onto the ATC ordinate (i.e., SCA) and along the ATC abscissa, the entries fell into two obvious heterotic groups: Entries C and D as one group, and B and I as the other (Fig. 3A). These two groups interacted negatively, however, to give hybrids inferior to both parents, as can be verified from the original data (Table 3). Negative heterosis is predicted between the two groups because Entries C and D are located on the opposite side of the ATC ordinate from Testers B and I, and vice versa (Fig. 3A). Negative heterosis may suggest involvement of recessive resistance genes, a phenomenon not observed in the two wheat data sets examined previously.
The polygon of Fig. 3B helps explain why Genotype H had the largest, and Genotype G the smallest, GCA effects. It indicates that Entry H was the best or near-best mating partner with seven of the 10 testers (except Genotypes B, F, and H per se). On the contrary, although Genotype G was the best mating partner with H and a good partner with B, it was a poor partner with all other testers. Fig. 3B also indicates that entries C and D were the best partners of Tester F, and Entry B was the best partner of B itself. The latter conflicts with the data and will be discussed further below.
Interestingly, Entries C, D, E, F, and H were all located on the same polygon side that connects Entries C and H, and Tester D was located on the line perpendicular to this polygon side. This suggests that Genotypes C, D, E, F, and H were equally good in crossing with Genotype D. Examination of Table 3 indicates that this was indeed true. This may suggest that D carried epistatic effects with inbreds C, E, F, and H. Similarly, Entries B, I, and G were located on the same polygon side connecting B and G, and Tester B was almost on the line perpendicular to this polygon side. This suggests that Entries B, G, and I should produce similar hybrids with B. This suggestion was only partially true, however. The data indicate that hybrid G x B was much better than hybrid I x B, and B x B was intermediate (Table 3). The failure of the biplot to identify G as the best partner of B may have resulted from the major pattern that G had the lowest GCA (Table 3 and Fig. 3A). The biplot did indicate G to be the best partner of H, probably also because of the major pattern that Genotype H had the largest GCA.
The 10 parents seemed to fall into seven groups, with C and D, E and F, and A and J being pairs of genotypes with apparently similar genetics (Fig. 3B). To understand the genetic relationships among the inbreds, we start from examining the vertex entries. As was previously discussed, Genotypes C and D may carry a recessive resistance gene, and Genotypes B and I may carry another recessive gene, which would explain the fact that hybrids between these two groups were inferior to both parents (Table 3). In contrast, the observed heterosis between Genotype H and Genotype G may suggest different dominant gene loci in H and G. Therefore, at least two recessive genes and two dominant genes may have been involved in the control of resistance to PSB in these four vertex inbreds.
Since there was no heterosis between G and C/D, C/D must carry the dominant gene present in G. Since there was heterosis in G x B, B must carry a dominant gene that is different from the one in G, which may be the same as the one in H due to lack of heterosis in B x H. Entries I, E/F and A/J, all located intermediate among the vertex genotypes, may be some types of the combinations of the dominant and recessive resistance genes. The pattern denoting H as the best mating partner with seven entries (all except F and B) suggests that H carries a dominant gene that is different from all dominant genes that are present in these seven entries.
| DISCUSSION |
|---|
The second advantage of the biplot approach is that it is more interpretative. While the conventional method of diallel analysis was designed to describe the phenotypic performance of the crosses, the biplot approach tries to interpret the phenotypic variation of the crosses by understanding the parents. In the conventional approach, although all variation is accounted for by GCA and SCA, the parents are evaluated only on their GCA effects. The term SCA is associated with crosses and has little impact on the understanding of the parents. Empirical evidence was provided in Yan et al. (2001), which demonstrated that entry PC1 scores had near-perfect correlation with entry main effects (i.e., the GCA effects, in terms of diallel data) if the latter are larger than 35% of the total GGE variation; otherwise, the variation explained by PC1 would be considerably greater than the entry main effects. Because the PCs are least squares solutions, PC1 alone explains at least as much variation as, and typically more than, that by GCA. Thus, a biplot of PC1 vs. PC2 is generally more powerful than the conventional approach in understanding the parents.
In our presentation of the results, the interpretations based on the biplots were frequently compared with the original data to indicate the validity of the biplot approach. The consistency between the biplot predictions and the original data should not be understood as indicating that the biplot approach is a redundant presentation of the data and, therefore, not needed. Rather, it indicates that the biplot is an excellent tool for revealing patterns that may not be noticed otherwise. For example, the biplot revealed for the corn PSB study that Genotype D tended to produce similar hybrids when crossed with Entries C, D, E, F, and H, and that Genotypes C and D had similar genetics (Fig. 3B). This result may be easily overlooked when using conventional methods.
Constraints of the Biplot Approach
A potential constraint of the biplot method is that it may fail to explain most of the variation and therefore fail to display all patterns of the data. This is most likely to occur with large datasets, small entry main effects, and complex entry-tester interactions. Even when this is the case, it can be assured that the biplot of PC1 vs. PC2, as least squares solutions, still displays the most important linear patterns of the data (Kroonenberg, 1995), as demonstrated by the corn PSB dataset. Nevertheless, other biplots, such as one consisting of PC3 vs. PC4, may be needed to fully understand the data. Such options are available in the GGEbiplot software (Yan, 2001). The method of Gauch and Zobel (1996) for estimating patterns vs. noise of the data may be adopted to determine if such biplots are needed. The pattern is estimated by the total SS of the tester-centered data minus the noise, which is estimated by the total treatment degrees of freedom multiplied by the error mean square and can be estimated from replicated data. A biplot of PC3 vs. PC4 is needed only when the pattern SS is considerably greater than that explained by the biplot of PC1 vs. PC2.
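The pattern-versus-noise check described above amounts to simple arithmetic, sketched here in Python. All numbers are hypothetical, and because the text only says "considerably greater", the `margin` threshold (a fraction of the total SS) is an assumption introduced for illustration, not a rule from Gauch and Zobel (1996).

```python
def pattern_ss(total_ss, treatment_df, error_ms):
    """Pattern SS = total SS of the tester-centered data minus the noise SS."""
    noise_ss = treatment_df * error_ms        # noise = treatment df x error mean square
    return total_ss - noise_ss

def needs_higher_pcs(total_ss, treatment_df, error_ms, explained_fraction, margin=0.05):
    """True when the pattern SS exceeds the SS shown by PC1 vs. PC2 by more than margin."""
    biplot_ss = explained_fraction * total_ss
    return pattern_ss(total_ss, treatment_df, error_ms) - biplot_ss > margin * total_ss

# Hypothetical values: total SS = 1000, 48 treatment df, error mean square = 5.
pattern = pattern_ss(1000.0, 48, 5.0)
check_63 = needs_higher_pcs(1000.0, 48, 5.0, explained_fraction=0.63)
check_77 = needs_higher_pcs(1000.0, 48, 5.0, explained_fraction=0.77)
print(pattern, check_63, check_77)
```

In this made-up case a biplot explaining 63% of the total variation leaves a sizable part of the estimated pattern undisplayed, whereas one explaining 77% already covers it.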
Another constraint of the biplot approach is lack of a measure of uncertainty. However, we suggest that the significance of the difference between two entries can be visually assessed from their plot distance relative to the plot size. In many cases, this visual assessment should be sufficient for a reasonable judgment; in other cases, the biplot patterns should be used to generate hypotheses rather than to make decisions.
The third constraint of the biplot approach is complexity of generating and interpreting biplots. This problem is solved, however, by the development of GGEbiplot software (Yan, 2001). GGEbiplot is a Windows application, which reads original data, generates biplots, and provides various perspectives of biplot visualization. It is available upon request with a charge.
Received for publication November 30, 2000.
| REFERENCES |
|---|