|
|
||||||||
Alberta Agriculture, Food and Rural Development, #301, 7000-113 Street, Edmonton, AB, Canada T6H 5T6 and Dep. of Agricultural, Food and Nutritional Science, Univ. of Alberta, Edmonton, AB, Canada T6G 2P5
* Corresponding author (rongcai.yang{at}gov.ab.ca)
| ABSTRACT |
|---|
|
|
|---|
Abbreviations: LR, likelihood ratio MANOVA, multivariate analysis of variance REML, restricted maximum likelihood WLS, weighted least squares
| INTRODUCTION |
|---|
|
|
|---|
In an attempt to determine the nature of genotypeenvironment interactions detected for yield in maize (Zea mays L.), Moll et al. (1978) used an approach similar to those of Robertson (1959) and Cockerham (1963) to partition the interaction sum of squares into two components, one due to heterogeneity of genetic variances across environments and the other due to imperfect genetic correlations of the same trait measured in different environments. Muir et al. (1992) further showed that the partitioning could be made with reference to either genotypes or environments. In either case, lack of perfect correlations between environments or genotypes suggests the presence of crossover interactions, whereas heterogeneity of variances across environments or genotypes reflect noncrossover interactions. However, since the two components of the interaction sum of squares were not distributed as chi-squares, direct tests of significance were not possible.
Using the results from multivariate analysis of variance (MANOVA), Yang and Baker (1991) proposed two heuristic tests for the significance of the two causes of genotypeenvironment interactions. The first test for homogeneity of genetic variances used the synthetic F-test whose degrees of freedom were determined by Satterthwaite's (1946) approximation. The second test for perfect genetic correlations was to compare estimated genetic correlation with the hypothesized value of one, as suggested by Scheinberg (1966). On the basis of these two tests, Yang and Baker (1991) concluded that the interactions observed in two wheat crosses, Potam x Ingal and RL4137 x Ingal, were due to heterogeneity of family and line variances across 2 yr, but not to lack of perfect genetic correlations. However, these are approximate tests based on unwarranted assumptions about the sampling distributions of estimated variance and covariance components. Furthermore, while MANOVA estimates of genetic parameters are unbiased, they have a number of undesirable properties such as nonpositive definite estimates of genetic variancecovariance matrices (leading to negative heritability and out-of-bound estimates of genetic correlation) and large sampling variability (Searle et al., 1992; Liu et al., 1997).
In this paper, I illustrate the use of a restricted maximum likelihood (REML) approach based on Henderson's (1984) mixed models theory to estimate genetic parameters and to test for the significance of different causes of genotypeenvironment interactions in Potam x Ingal and RL4137 x Ingal. While the mixed models theory has been known for decades particularly in animal breeding, its application to plant breeding is a fairly recent phenomenon and focuses on predicting performances of parents and hybrids (e.g., Hill and Rosenberger, 1985; Panter and Allen, 1995; Bernardo, 1996). In analyzing interactions between 11 maize hybrids and four locations, Kang (1998) used the REML analysis to calculate stability variances for estimated hybrid performances. With the recent availability of the likelihood-based estimation methods including REML in SAS PROC MIXED procedure (SAS Institute, 1997), it is now possible to specify and test for different genetic and error covariance structures required for characterizing the observed genotypeenvironment interactions. The power and flexibility of the likelihood-based analysis allow (i) the development of sound statistical tests for the different causes of interactions and (ii) constraining estimated genetic variances and covariances within acceptable ranges, thereby effectively removing the deficiencies associated with the MANOVA estimates as discussed above. Since the data used in this study were previously analyzed by MANOVA approach (Yang and Baker, 1991), the comparison is made of the two analyses.
| MATERIALS AND METHODS |
|---|
|
|
|---|
To estimate genetic parameters and to test for the significance of the different causes of genotypeenvironment interactions by the REML approach, a multiple-trait mixed model (Lynch and Walsh, 1998, Ch. 27) was fitted to the 2-yr data:
![]() |
All vectors of effects except that for the population mean vector were considered random, normally distributed, and independent of each other, with zero mean vectors and variancecovariance matrices being:
![]() |
![]() |
![]() |
![]() |
2b,
2g, and
2bg are block, group, and block x group interaction variances,
2f1 and
2f2 are family variances for years 1 and 2, and
f12 =
f12
f1
f2 is the family covariance between the two years,
2l1 and
2l2 are line variances for years 1 and 2, and
l12 =
l12
l1
l2 is the line covariance between the two years,
2e1 and
2e2 are error variances for years 1 and 2, and
is the Kronecker product (Searle et al., 1992). The estimation and testing were carried out by the SAS MIXED procedure (SAS Institute, 1997). The SAS code for these analyses is given in the appendix. The REML estimation was performed by including METHOD=REML as an option in the PROC MIXED statement. The estimation of family and line covariance structures (variances and correlations) was achieved by including the SUBJECT and TYPE options in the RANDOM statements, whereas the heterogeneity of error variances between the two years was allowed with the GROUP option in the REPEATED statement. Initial values were included by the PARMS statement. The REML estimation guaranteed nonnegative estimates of variance components, but out-of-bound estimates of family and line correlations between the 2 yr (>1) were avoided by including the UPPERB option in the PARMS statement. Different family and line covariance structures were specified by choosing appropriate predefined covariance models (SAS Institute, 1997, Table 18.5) for the TYPE option in the RANDOM statements and/or by constraining covariance parameters to certain values as done in the EQCONS option in the PARMS statement. The PROC MIXED procedure used the iterative technique based on a ridge-stabilizing Newton-Raphson algorithm to solve highly nonlinear REML equations (SAS Institute, 1997, p. 639640). The iteration continued until the convergence criterion of 10-8 was achieved. In a preliminary analysis, the possibility of multiple peaks in the likelihood surface was examined by means of different initial values. The estimates of the variances and correlations from the MANOVA analysis (Yang and Baker, 1991) were found appropriate as the initial values to achieve the global maximum likelihood.
Table 1
lists a full model concerning family, line, and error covariance structures (M0) and seven reduced models with specific constraints being imposed on the covariance structures (M1 M7). Models M1 through M6 were compared with model M0 to test for causes of family x year and line x year interactions. In view of the fact that heterogeneity of experimental error variances may lead to spurious interactions (Cochran and Cox 1957, Ch. 14), a further comparison between models M7 and M0 was also made to test for equality of error variance across the 2 yr. Thus, for each of the five traits in each of the two wheat crosses, eight runs of PROC MIXED were carried out for the full model and the seven reduced models. Log likelihoods (L), derived from these REML analyses for the full model (LM0) and each of the seven reduced models (LMi, i = 1, 2,..., 7), were compared to construct a likelihood ratio (LRi) as the difference of the corresponding values of -2 times the log likelihoods,
![]() |
|
2 with degrees of freedom given by the difference between the numbers of covariance parameters specifying the full and reduced models (Kendall and Stuart, 1979, Ch. 24.7; SAS Institute, 1997, p. 642643). For example, to test for equality of family variances between the two years
, models M3 and M0 were compared. Since there were seven parameters in M0 but six in M3, the
2 test for the equality of family variances between the 2 yr had one degree of freedom. It should be noted that three variance components for design factors (
2b,
2g, and
2bg) were kept the same for all the models and thus did not affect the tests. | RESULTS AND DISCUSSION |
|---|
|
|
|---|
|
|
1.00) was much higher than the corresponding MANOVA estimate (0.87). The discrepancies are likely due to the fact that the REML estimation set the boundary constraints of (i) nonnegativity for variance components and (ii) correlations being within the acceptable range of -1 to 1 to ensure the estimated covariance matrices are positive definite, whereas the MANOVA estimation was not able to control the boundaries of those parameters. In addition, the MANOVA estimates of variance components for design factors (i.e., replication, group, and replication x group) were negative in many cases. Should the estimated family and line covariance matrices be positive definite and should the estimated variance components for the design factors be positive, the REML estimates would be identical to the MANOVA estimates (Searle et al., 1992). Significant family x year interactions were found for seven out of 10 cases; the three insignificant cases were kernel hardness in Potam x Ingal and plant height and kernel weight in RL4137 x Ingal (Table 4) . In Potam x Ingal, three out of four significant interactions were attributable to heterogeneity of family variances across two years, whereas the significant interaction for kernel weight was attributable to both lack of perfect family correlation and unequal family variances. On the other hand, lack of perfect family correlation contributed significantly to all three significant family x year interactions in RL3147 x Ingal. None of five line x year interactions in Potam x Ingal was significant and all three significant line x year interactions (days to heading, kernel weight, and kernel hardness) in RL4137 x Ingal were due to heterogeneity of line variances between the two years. Heterogeneity of error variances was observed in all cases except days to heading in Potam x Ingal and days to heading and plant height in RL4137 x Ingal.
|
Lack of perfect family correlations between the 2 yr in four out of the 10 cases (Table 4) suggests that the family ranks have changed across the 2 yr in these cases (Yang and Baker, 1991; Lynch and Walsh, 1998). This type of crossover interactions is important in selection programs because much of the improvement made in one or one set of environments will not be carried over when the selected genotypes are grown in other environments (Falconer, 1952). Thus, the plant breeder must select one genotype for one set of environments and a different genotype for other environments. In this case, the breeder may choose to use the clustering procedures of Cornelius et al. (1993) and Crossa et al. (1995) to group genotypes or environments so that crossover interactions within groups will be minimized. On the other hand, significant interactions due to heterogeneity of family and/or line variances found in other cases are not important to the breeder because a genotype that is the best in one environment will be the best in all environments. Thus, in the absence of crossover interactions, a single genotype would optimize production in all environments. However, the impact of interaction variation, regardless of the causes, on the estimation of genotypic effects or breeding values using the best linear unbiased prediction (Henderson, 1984) remains to be investigated.
The present REML approach differs from the earlier efforts to estimate genetic parameters and test for genotypeenvironment interactions using the weighted least squares (WLS) analysis (e.g., Hayman, 1960; Nasoetion et al., 1967; Mather and Jinks, 1982, p. 162175). In essence, the WLS estimation is obtained by fitting the observed mean squares and cross-products from MANOVA to their expected values, as determined by genetic structure and experimental designs, weighted by the covariance matrix of the sampling errors including both correlated and unequal errors of the observed mean squares and cross products. The goodness-of-fit to a particular genetic model is evaluated by a chi-square value measuring the deviation of the observed mean squares and cross products from their expectations. A comparison of chi-square values from fitting to two genetic models also provides a test for heterogeneity of genetic or error variances across different environments, one aspect of genotypeenvironment interactions. There are several reasons why the WLS analysis is less preferred than a likelihood-based analysis. First, since the WLS analysis is still based on the MANOVA estimates of mean squares and cross products, the usual problems associated with the MANOVA estimators such as the estimated genetic variancecovariance matrix being nonpositive definite remain. Second, the WLS estimates of genetic parameters are found to be imprecise because of the large standard errors of variance and covariance estimates (Lynch and Walsh, 1998, Ch. 9). I (unpublished) used eight family, line, and error mean squares and cross products obtained from MANOVA to carry out the WLS analysis. Where the comparisons were possible for each trait in each cross, the WLS estimates of genetic and error variances tended to have larger standard errors than the REML estimates. Third, the WLS analysis is based on the estimated second-order statistics (mean squares and cross products), thereby limiting the types of genetic models that can be fitted.
This study was limited to the analysis of data from two environments (years). Extension to the analysis of multiple environments is straightforward. For example, the family covariance structure for n environments under the full model is given by,
![]() |
For multiple environments, a strategy is certainly needed to test for various aspects of the covariance structure in relation to genotypeenvironment interactions. In this example, a logical first step is to have overall tests of (i) homogeneity of family variances across all n environments and (ii) perfect family correlations between all pairs of n environments. The log likelihoods of the reduced models corresponding to conditions (i) and (ii) will be compared with that of the full model to provide appropriate chi-square tests with degrees of freedom being (n - 1) and n(n - 1)/2, respectively. If these reduced models are rejected by the chi-square tests, it is necessary to identify a restricted set of subhypotheses concerning heterogeneity of family variances and lack of perfect family correlations across a particular subset of environments. An exhaustive comparison may not be practical as the number of possible comparisons becomes prohibitive for a large number of environments. Most of these LR tests can be carried out with the predefined covariance structures and options that are currently available in the SAS PROC MIXED procedure (SAS/STAT version 8.1). Because individual comparisons are not statistically independent, significant levels for multiple comparisons need to be adjusted by a Bonferroni procedure (Lynch and Walsh, 1998, Ch. 21). An attractive alternative for model testing and selection in multiple environments is to use Akaike's Information Criterion (AIC) or Schwarz's Bayesian Information Criterion (BIC). Instead of a fixed significance level set for the LR test, AIC or BIC allows for more stringent significance levels with larger differences in the number of parameters between the full and reduced models (J. Crossa, personal communication, 2001). Both AIC and BIC are also part of fit statistics generated by the SAS PROC MIXED procedure.
Despite the superiority of the likelihood-based analysis for estimating genetic parameters and characterizing genotypeenvironment interactions over the MANOVA method (Searle et al., 1992; Liu et al., 1997; this study), its practical use has been limited. While the SAS MIXED procedure is suitable for such analysis, it is computationally very demanding. With a total of 576 observations across the 2 yr and 9 to 11 covariance parameters (three for design factors and six to eight for the eight covariance structures as given in Table 1), it took an average of more than 20 h on a 300 MHz Pentium II machine to complete fitting to the eight models for each trait in each cross (over 250 h was used to complete the analyses for all the traits in both crosses). The number of iterations before the convergence criterion was met ranged from 4 to 48. Computing time will likely increase further when the consideration is shifted from two to multiple environments as discussed above. Other computer programs such as MTDFREML (Boldman et al., 1993), DFREML (Meyer, 1998), and ASREML (Gilmour et al., 1999) may be more efficient with specialized data sets but are not flexible enough to allow for specifying a wide variety of covariance structures required for estimating and testing genotypeenvironment interactions. Computing time with the SAS PROC MIXED or other programs is expected to decrease as faster processors, better compilers, and more efficient computing algorithms become available.
APPENDIX
The following SAS (version 6.12 to 8.10) code was used to fit the full model (M0) and seven reduced models (M1M7) as given in Table 1 to the two-year data for days to heading (HD) in cross Potam x Ingal.
/****************************************************
M0 (Full model): The fitting of full model is achieved by (i) specifying TYPE=UNR in the first and second RANDOM statements to indicate completely general (unstructured) family and line covariance matrices and (ii) allowing for heterogeneous error variances with the option of GROUP=YEAR in the REPEATED statement. The PARMS statement is needed to give initial values for the variances and covariances which are the MANOVA estimates from Yang and Baker (1991) and to constrain family and line correlations be < = 1. The PROC MIXED statement has three options for all the models (M0 to M7). (1) METHOD=REML requests restricted maximum likelihood (REML) as the estimation method for the covariance parameters (this is a default method). (2) CONVH=1E-8 indicates that the relative Hessian convergence criterion is set to be 10-8 (this is a default value). (3) The COVTEST option requests the printouts of asymptotic standard errors and Wald Z-tests for the covariance parameter estimates.
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=unr;
random year /subject=line(family) type=unr;
random rep group rep*group;
repeated / group = year;
parms 1.45.58.9.67.52.96.1.5 1.5.41.58
/upperb=.,.,.9999,.,.,.9999,.,.,.,.,.;
/****************************************************
M1: This reduced model has two constraints. The first constraint is the equality of family variances of two years and is implemented by choosing TYPE=AR(1) specifying a first-order autoregressive structure. The second constraint is the perfect family correlation and is implemented by the EQCON= option in the PARMS statement to hold the value of the second covariance term (i.e., family correlation) constant (i.e., 0.9999) throughout iteration.
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=ar(1);
random year /subject=line(family) type=unr;
random rep group rep*group;
repeated / group = year;
parms 1.1.9999.67.52.96.1.5 1.5.41.58
/eqcons=2 upperb=.,.9999,.,.,.9999,.,.,.,.,.;
/****************************************************
M2: This reduced model is the same as model M1 except that there is one constraint: the perfect family correlation.
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=unr;
random year /subject=line(family) type=unr;
random rep group rep*group;
repeated / group = year;
parms 1.45.58.9999.67.52.96.1.5 1.5.41.58
/eqcons=3 upperb=.,.,.9999,.,.,.9999,.,.,.,.,.;
/****************************************************
M3: This reduced model is the same as model M1 except that there is one constraint: the equality of family variances of two years. Note that there is no need to give initial values if no constraint is made on any of the initial values, but appropriate initial values do lead to a faster convergence.
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=ar(1);
random year /subject=line(family) type=unr;
random rep group rep*group;
repeated / group = year;
parms / upperb=.,.9999,.,.,.9999,.,.,.,.,.;
/****************************************************
M4: This reduced model is the same as M1 except that the two constraints are on the line covariance structure (i.e., perfect line correlation and equality of line variances of the two years).
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=unr;
random year /subject=line(family) type=ar(1);
random rep group rep*group;
repeated / group = year;
parms 1.45.58.9.6.9999.1.5 1.5.41.58
/eqcons=5 upperb=.,.,.9999,.,.9999,.,.,.,.,.;
/****************************************************
M5: This reduced model is the same as model M2 except that the constraint is the perfect line correlation.
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=unr;
random year /subject=line(family) type=unr;
random rep group rep*group;
repeated / group = year;
parms 1.45.58.9.67.52.9999.1.5 1.5.41.58
/eqcons=6 upperb=.,.,.9999,.,.,.9999,.,.,.,.,.;
/****************************************************
M6: This reduced model is the same as model M3 except that the constraint is the equality of line variances of two years.
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=unr;
random year /subject=line(family) type=ar(1);
random rep group rep*group;
repeated / group = year;
parms / upperb=.,.,.9999,.,.9999,.,.,.,.,.;
/****************************************************
M7: This reduced model requires the equality of error variances of two years. This is achieved by simply not including the REPEATED statement in the analysis.
****************************************************/
proc mixed method=reml convh=1e-8 covtest;
class year rep group family line;
model hd = year;
random year /subject=family(group) type=unr;
random year /subject=line(family) type=unr;
random rep group rep*group;
* repeated / group = year;
parms /upperb=.,.,.9999,.,.,.9999,.,.,.,..;
Received for publication September 5, 2001.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
G. Steinheim, J. Odegard, T. Adnoy, and G. Klemetsdal Genotype by environment interaction for lamb weaning weight in two Norwegian sheep breeds J Anim Sci, January 1, 2008; 86(1): 33 - 39. [Abstract] [Full Text] [PDF] |
||||
![]() |
R.-C. Yang Mixed-Model Analysis of Crossover Genotype-Environment Interactions Crop Sci., May 31, 2007; 47(3): 1051 - 1062. [Abstract] [Full Text] [PDF] |
||||
![]() |
R.-C. Yang, S. F. Blade, J. Crossa, D. Stanton, and M. S. Bandara Identifying Isoyield Environments for Field Pea Production Crop Sci., January 1, 2005; 45(1): 106 - 113. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| The SCI Journals | Agronomy Journal | Vadose Zone Journal | |||
| Journal of Plant Registrations | Soil Science Society of America Journal | ||||
| Journal of Natural Resources and Life Sciences Education |
Journal of Environmental Quality |
||||