a Centro de Investigaciones Agrarias de Mabegondo, Apartado 10, A Coruña, Spain
b Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600 Mexico DF, Mexico
c Dep. of Agronomy and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40546-0091
* Corresponding author (j.crossa{at}cgiar.org).
| ABSTRACT |
|---|
s degrees of freedom as a measure of error variance absorption by the principal components (PC) when data are extremely noisy. Shrunken EVP models were generally more predictively accurate than truncated least squares-fitted AMMI models and BLUPs, which was also true for CCC except when error variance was large. The EVP shrinkage method appears to be promising for obtaining improved predictions of cultivar performance in multienvironment cultivar trials.
Abbreviations: AMMI, additive main effects and multiplicative interaction; BLUP, best linear unbiased predictor; CCC, work of Cornelius, Crossa, and associates; COMM, completely multiplicative model; EVP, eigenvalue partition; GEAR, genotype, environment, attribute model; GEI, genotype × environment interaction; GREG, genotype regression model; MSEPM, mean squared error of predicted means; PC, principal component; RCBD, randomized complete block design; RMSPD, root mean squared predictive difference; SEPM, standard error of predicted means; SREG, sites regression model.
| INTRODUCTION |
|---|
The AMMI model for the cell means is

$$\bar{y}_{ij} = \mu + \tau_i + \delta_j + \sum_{k=1}^{p} \lambda_k\,\alpha_{ik}\,\gamma_{jk} + \rho_{ij}$$

where $\bar{y}_{ij}$ is the mean of the ith genotype in the jth environment; $\mu$ is the grand mean; $\tau_i$ and $\delta_j$ are the genotype and environment main effects, respectively; $\lambda_k$ is the singular value (square root of the eigenvalue) for the kth PC axis; the interaction parameters $\alpha_{ik}$ and $\gamma_{jk}$ are elements of the kth singular vector for genotypes and environments, respectively, and are interpretable as scores for the contribution of the ith genotype and the jth environment, respectively, to the kth PC; and $\rho_{ij}$ is the residual, which includes the residual interaction not accounted for by the multiplicative terms and the contribution of experimental error to the cell mean. In the saturated AMMI model, the maximum number of principal components is the smaller of (m − 1) and (n − 1), m and n being the numbers of genotypes and environments, respectively. AMMI models with 1, 2, 3, ... multiplicative PC terms are denoted truncated AMMI1, AMMI2, AMMI3, and so on; the truncated AMMI with only the main effects of genotypes and environments, and no interaction, is called AMMI0. The random data splitting and cross validation procedure (Gauch, 1988; Gauch and Zobel, 1988, 1989) with the RMSPD criterion has been used to select the best truncated multiplicative model. In this procedure, a subset of replicates forms the data used for fitting the model and the remaining replicates make up the validation data (Gauch and Zobel, 1988, 1989; Crossa et al., 1990; Crossa and Cornelius, 1994). Since the choice of the best truncated multiplicative model depends on the number of replicates in the model data (Moreno-González et al., 2003), the RMSPD criterion may not select the best truncated multiplicative model obtainable when all replications are used for fitting. Moreno-González et al. (2003), in the context of an "eigenvalue partition method" (EVP), proposed the root mean square predictive difference ($\mathrm{RMSPD}_{\mathrm{EVP}}$) criterion for selecting the best truncated AMMI model, which can be applied to cell means involving all replications.
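The least squares AMMI fit described above can be sketched with NumPy: double-centre the genotype × environment table of cell means to obtain the additive part and the GEI residuals, then take a singular value decomposition of the residuals. This is a minimal sketch, assuming a complete, balanced table of cell means (singular values are therefore on the cell-means scale); the function name `fit_ammi` and its return layout are illustrative choices, not code from the original study.

```python
import numpy as np

def fit_ammi(cell_means, n_axes):
    """Least squares AMMI fit of an m x n table of genotype x environment
    cell means, keeping the first n_axes multiplicative (PC) terms."""
    y = np.asarray(cell_means, dtype=float)
    m, n = y.shape
    mu = y.mean()
    tau = y.mean(axis=1) - mu            # genotype main effects
    delta = y.mean(axis=0) - mu          # environment main effects
    additive = mu + tau[:, None] + delta[None, :]
    interaction = y - additive           # double-centred GEI residuals
    # SVD: interaction = alpha @ diag(lam) @ gamma.T
    alpha, lam, gamma_t = np.linalg.svd(interaction, full_matrices=False)
    p = min(m - 1, n - 1)                # saturated number of PC axes
    lam, alpha, gamma = lam[:p], alpha[:, :p], gamma_t[:p].T
    k = n_axes
    fitted = additive + (alpha[:, :k] * lam[:k]) @ gamma[:, :k].T
    return additive, lam, alpha, gamma, fitted
```

For example, `fit_ammi(y, 0)` returns the AMMI0 (additive) predictions, and `fit_ammi(y, min(m, n) - 1)` reproduces the observed cell means exactly, because the double-centred table has rank at most min(m − 1, n − 1).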
Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995, 1999) proposed shrinkage factors for multiplicative models as a way to improve the prediction of cell means in a two-way table of GEI data. Shrinkage factors reduce the absolute value of the GEI terms in the multiplicative models because their values always lie within the interval [0,1]. The authors showed that shrinkage estimation of multiplicative models produces better predictions of cultivar performance than truncated multiplicative models and often also better predictions than BLUP based on a two-way random effects model. One advantage of shrinkage factors is that they are computed from the complete data set, whereas the truncated models chosen by the RMSPD criterion of the random data splitting and cross validation procedure are fitted to a modelling subset.
The shrinkage factors defined by Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995, 1999), herein named the CCC method, were constructed by analogy to the shrinkage factors involved in empirical BLUPs in a two-way random effects model with interaction. The shrinkage estimate of the interaction term of the AMMI, GREG, SREG, and COMM multiplicative models was defined as

$$\sum_{k=1}^{p} S_k\,\lambda_k\,\alpha_{ik}\,\gamma_{jk}$$

where $S_k$ is the shrinkage factor for the kth PC axis and $\lambda_k$, $\alpha_{ik}$, and $\gamma_{jk}$ are as previously defined for unshrunken AMMI. Shrinkage estimation of the shifted multiplicative model (SHMM) is more complicated than for the other multiplicative models and requires an iterative algorithm (Cornelius and Crossa, 1995, 1999).
A derivation and theoretical justification for the shrinkage estimators is given in Appendix B of Cornelius and Crossa (1999). The resulting formula for $S_k$ is

$$S_k = 1 - \frac{\nu_k\,\sigma^2_e}{r\,\lambda^2_k} \qquad [1]$$

where $\lambda^2_k$ is the empirical eigenvalue of the kth PC axis; r is the number of replications; $\sigma^2_e$ is the error mean square; and $\nu_k$ is a parameter, roughly analogous to degrees of freedom (df), that multiplies $\sigma^2_e/r$ in an expression for the expectation of the eigenvalue. A first approximation suggested for $\nu_k$ was the df of Gollob's (1968) approximate F-test, i.e., $\nu_k = m + n - 1 - 2k$ (the number of parameters in the kth multiplicative term minus the number of constraints on those parameters). In the companion paper, Moreno-González et al. (2003) developed an EVP method to estimate the contributions of the GEI variance and the error variance to each AMMI PC axis. The EVP method was able to select the same truncated AMMI model as the conventional RMSPD cross validation criterion, but it has the advantage over RMSPD cross validation that it can be applied using all replicates of the trial.
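The CCC shrinkage factors with Gollob's df as $\nu_k$ can be sketched as below. This is a minimal illustration, assuming Eq. [1] in the form $S_k = 1 - \nu_k\sigma^2_e/(r\lambda^2_k)$ with eigenvalues on the cell-means scale, and setting negative estimates to zero as done later in this study; `gollob_df` and `ccc_shrinkage` are hypothetical helper names.

```python
import numpy as np

def gollob_df(m, n):
    """Gollob's (1968) df for PC axes k = 1..p, p = min(m-1, n-1): m + n - 1 - 2k."""
    p = min(m - 1, n - 1)
    k = np.arange(1, p + 1)
    return m + n - 1 - 2 * k

def ccc_shrinkage(eigenvalues, sigma2_e, r, nu):
    """Eq. [1]: S_k = 1 - nu_k * sigma2_e / (r * lambda_k^2); negatives set to 0."""
    lam2 = np.asarray(eigenvalues, dtype=float)
    s = 1.0 - np.asarray(nu, dtype=float) * sigma2_e / (r * lam2)
    return np.maximum(s, 0.0)
```

Summed over all p axes, Gollob's df add up to (m − 1)(n − 1), the df of the whole interaction; for a 16 × 10 table such as Trial 1 they run from 23 down to 7 and total 135.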
A potentially useful alternative strategy for estimating the shrinkage factors is to derive them from the EVP method by determining the contributions of interaction variance and error variance to the AMMI eigenvalues (Moreno-González et al., 2003).
The objectives of the study were (i) to develop shrinkage factors for the multiplicative terms of the AMMI model on the basis of the EVP method of Moreno-González et al. (2003) and (ii) to compare AMMI models fitted by shrinkage factors obtained by the EVP and CCC methods (the latter using Gollob's df as the value of $\nu_k$), unshrunken parsimonious AMMI models fitted by least squares and chosen by cross validation, and BLUP predictions based on a two-way model with the main effects of cultivars and environments considered as fixed effects and the GEI term considered as a random effect. These four estimation methods were compared by cross validation using the RMSPD.
| MATERIALS AND METHODS |
|---|
Since the shrinkage factors should multiply the interaction terms of the model, the shrunken interaction term can be written as
$$z^*_{ijp} = \sum_{k=1}^{p} S_k\,\lambda_k\,\alpha_{ik}\,\gamma_{jk} \qquad [2]$$

where $z^*_{ijp}$ is the shrunken interaction term; $S_k$ is the shrinkage factor for the kth PC axis; and $p \le \min(m-1, n-1)$, m and n being the numbers of rows (genotypes) and columns (environments) of the two-way table, respectively. If the number of sites is greater than the number of genotypes, the genotypes may be taken as columns and the sites as rows for expedient computation.
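Eq. [2] amounts to rescaling each singular value of the GEI table by its own factor before rebuilding the table. A minimal sketch, assuming the input is the double-centred interaction table of cell means and that one factor is supplied for each of the p = min(m − 1, n − 1) axes; `shrunken_interaction` is a hypothetical name.

```python
import numpy as np

def shrunken_interaction(interaction, shrink):
    """Eq. [2]: z*_ij = sum_k S_k * lambda_k * alpha_ik * gamma_jk.

    `interaction` is the double-centred (GEI) table of cell means and
    `shrink` holds one factor S_k per PC axis, k = 1..p, p = min(m-1, n-1).
    """
    z = np.asarray(interaction, dtype=float)
    p = min(z.shape) - 1
    alpha, lam, gamma_t = np.linalg.svd(z, full_matrices=False)
    s = np.asarray(shrink, dtype=float)
    if s.shape != (p,):
        raise ValueError(f"expected {p} shrinkage factors, got shape {s.shape}")
    # scale each singular value by its shrinkage factor, then rebuild the table
    return (alpha[:, :p] * (s * lam[:p])) @ gamma_t[:p]
```

All factors equal to 1 return the unshrunken interaction, all zeros give the additive (AMMI0) fit, and a common factor c simply multiplies the whole interaction by c.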
The expected mean square difference between the shrunken predicted means and the true means over all cells [i.e., the mean square error of predicted means (MSEPM)] can be computed by an approach similar to Eq. [11] of Moreno-González et al. (2003). If their Eq. [11] is squared after substituting $z^*_{ijp}$ and $y_{ijv}$ for the unshrunken interaction term $z_{ijp}$ and $y_{ij}$, respectively, the following is obtained:

$$\mathrm{MSEPM} = \frac{(m+n-1)\,\sigma^2_e}{mnr} + \frac{(m-1)(n-1)}{mn}\sum_{k=1}^{p}\left[S_k^2\,\hat{e}_k\,\frac{\sigma^2_e}{r} + (1-S_k)^2\,\hat{g}_k\,\sigma^2_{GE}\right] \qquad [3]$$

where $\sigma^2_{GE}$ and $\sigma^2_e/r$ are the structural GEI and error variance components associated with cell means in the ANOVA, respectively; r, m, and n are the numbers of replications, genotypes, and environments, respectively; $p = \min(m-1, n-1)$; $\hat{g}_k$ and $\hat{e}_k$ are the estimated adjusted coefficients of the structural GEI and error variance components for the kth PC axis, respectively; the terms $(m+n-1)\sigma^2_e/mnr$ and $(m-1)(n-1)\sum_k S_k^2\hat{e}_k\sigma^2_e/mnr$ are the error components associated with the additive part $a_{ij}$ and the interaction terms $z_{ijk}$, respectively; and the remaining term is the structural GEI lost through shrinkage on each axis.
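As a numerical companion to the MSEPM expression, the sketch below evaluates it assuming Eq. [3] has the form $\mathrm{MSEPM} = (m+n-1)\sigma^2_e/(mnr) + [(m-1)(n-1)/(mn)]\sum_k[S_k^2\hat{e}_k\sigma^2_e/r + (1-S_k)^2\hat{g}_k\sigma^2_{GE}]$. The coefficients $\hat{g}_k$, $\hat{e}_k$ and variance components used in any example are hypothetical inputs; the EVP resampling that estimates them is described in Moreno-González et al. (2003) and is not reproduced here.

```python
import numpy as np

def msepm(shrink, g, e, sigma2_ge, sigma2_e, r, m, n):
    """MSEPM in the form given in the lead-in (reconstructed Eq. [3])."""
    s = np.asarray(shrink, dtype=float)
    g = np.asarray(g, dtype=float)
    e = np.asarray(e, dtype=float)
    err_additive = (m + n - 1) * sigma2_e / (m * n * r)
    # per-axis error variance retained (S^2 term) plus GEI removed by shrinkage
    per_axis = s ** 2 * e * sigma2_e / r + (1 - s) ** 2 * g * sigma2_ge
    return err_additive + (m - 1) * (n - 1) / (m * n) * per_axis.sum()
```

A useful sanity check: with every $S_k = 1$ and $\sum_k \hat{e}_k = 1$, the two error terms add to $[(m+n-1) + (m-1)(n-1)]\sigma^2_e/(mnr) = \sigma^2_e/r$, the variance of an unshrunken cell mean.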
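The $S_k$ parameters can be estimated by minimizing MSEPM. Taking the partial derivative of MSEPM with respect to $S_k$ and equating it to zero gives

$$\frac{\partial\,\mathrm{MSEPM}}{\partial S_k} = \frac{2(m-1)(n-1)}{mn}\left[S_k\,\hat{e}_k\,\frac{\sigma^2_e}{r} - (1-S_k)\,\hat{g}_k\,\sigma^2_{GE}\right] = 0$$

and hence

$$S_k = \frac{\hat{g}_k\,\sigma^2_{GE}}{\hat{g}_k\,\sigma^2_{GE} + \hat{e}_k\,\sigma^2_e/r} \qquad [4]$$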
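The EVP shrinkage factor can be computed directly once the adjusted coefficients are available. A minimal sketch, assuming Eq. [4] in the form $S_k = \hat{g}_k\sigma^2_{GE}/(\hat{g}_k\sigma^2_{GE} + \hat{e}_k\sigma^2_e/r)$, with negative values (from negative $\hat{g}_k$) set to zero; the coefficients in the example are hypothetical, and `evp_shrinkage` is an illustrative name.

```python
import numpy as np

def evp_shrinkage(g, e, sigma2_ge, sigma2_e, r):
    """Eq. [4]: S_k = g_k*s2GE / (g_k*s2GE + e_k*s2e/r), negatives set to 0."""
    gi = np.asarray(g, dtype=float) * sigma2_ge      # GEI variance on axis k
    ei = np.asarray(e, dtype=float) * sigma2_e / r   # error variance on axis k
    return np.maximum(gi / (gi + ei), 0.0)
```

For instance, with $\hat{g} = (0.6, 0.3, 0.1)$, $\hat{e} = (0.2, 0.3, 0.5)$, $\sigma^2_{GE} = 1$ and $\sigma^2_e/r = 1$, the factors are 0.75, 0.5, and 1/6: the leading axis, which carries most of the structural GEI, is shrunk least. A brute-force grid search over $S_k$ on the per-axis loss $S_k^2 E + (1-S_k)^2 G$ lands on the same value.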
Equation [1] was originally constructed by Cornelius et al. (1993) and Cornelius and Crossa (1995) by analogy to the shrinkage factors involved in BLUPs in a two-way random effects model with interaction. Further theoretical considerations were discussed by Cornelius et al. (1996), and a complete theoretical justification was given by Cornelius and Crossa (1999). Equation [4] was based on the EVP theory (Moreno-González et al., 2003). Equation [4] is equivalent to Eq. [1] because

$$\lambda^2_k = (m-1)(n-1)\left(\hat{g}_k\,\sigma^2_{GE} + \hat{e}_k\,\frac{\sigma^2_e}{r}\right)$$

(Moreno-González et al., 2003), and $\nu_k$ can be made equivalent to $(m-1)(n-1)\hat{e}_k$. The similarity of the two equations, which were derived by different approaches, can be regarded as a reciprocal crosscheck of both methods.
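Writing the substitution out makes the equivalence explicit. The short derivation below assumes, as in the EVP partition, that the kth eigenvalue decomposes as $\lambda^2_k = (m-1)(n-1)(\hat{g}_k\sigma^2_{GE} + \hat{e}_k\sigma^2_e/r)$, and sets $\nu_k = (m-1)(n-1)\hat{e}_k$ in Eq. [1]:

$$
S_k = 1 - \frac{\nu_k\,\sigma^2_e}{r\,\lambda^2_k}
    = 1 - \frac{(m-1)(n-1)\,\hat{e}_k\,\sigma^2_e/r}{(m-1)(n-1)\left(\hat{g}_k\,\sigma^2_{GE} + \hat{e}_k\,\sigma^2_e/r\right)}
    = \frac{\hat{g}_k\,\sigma^2_{GE}}{\hat{g}_k\,\sigma^2_{GE} + \hat{e}_k\,\sigma^2_e/r}
$$

The factor $(m-1)(n-1)$ cancels, leaving the form of Eq. [4]. Summed over all p axes, Gollob's df $m + n - 1 - 2k$ also total $(m-1)(n-1)$, which is consistent with reading $\nu_k$ as the share of the interaction df through which error is absorbed by the kth axis.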
No connection between the derivations of Eq. [1] and Eq. [4] was apparent when the senior author developed the previously unpublished Eq. [4]. The merit of Eqs. [1] and [4] depends on the accuracy of the $\nu_k$ and $\hat{e}_k$ estimates, respectively. Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995) used Gollob's df as a first approximation of the $\nu_k$ values. Because Gollob's df is known to be an appropriate measure of error absorption by the multiplicative terms only for terms whose true $\lambda^2_k$ value is very large relative to error variance (Goodman and Haberman, 1990), they then employed a computer simulation scheme (parametric bootstrap) to obtain improved values.
The simulation scheme can be iterated as many times as desired. However, in a subsequent cross validation study (Cornelius and Crossa, 1999) involving five multienvironment cultivar trials, shrinkage estimation using Gollob's df performed virtually as well as the subsequent simulation estimates of the $\nu_k$ values. The most fundamental difference between the EVP and CCC methods is that the EVP method uses a nonparametric data resampling method, rather than Gollob's df or a simulation scheme, to estimate the expected error absorption by the multiplicative terms.
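The standard error of predicted means (SEPM) can be estimated by taking the square root of Eq. [3] after substituting $S_k$ from Eq. [4]:

$$\mathrm{SEPM} = \left[\frac{(m+n-1)\,\sigma^2_e}{mnr} + \frac{(m-1)(n-1)}{mn}\sum_{k=1}^{p}\frac{\hat{g}_k\,\sigma^2_{GE}\;\hat{e}_k\,\sigma^2_e/r}{\hat{g}_k\,\sigma^2_{GE} + \hat{e}_k\,\sigma^2_e/r}\right]^{1/2} \qquad [5]$$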
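A minimal sketch of the SEPM, assuming Eq. [5] in the form $\mathrm{SEPM} = [(m+n-1)\sigma^2_e/(mnr) + ((m-1)(n-1)/(mn))\sum_k G_k E_k/(G_k + E_k)]^{1/2}$ with $G_k = \hat{g}_k\sigma^2_{GE}$ and $E_k = \hat{e}_k\sigma^2_e/r$. The per-axis term $G_kE_k/(G_k+E_k)$ is what $S_k^2E_k + (1-S_k)^2G_k$ reduces to at $S_k = G_k/(G_k+E_k)$. The name `sepm` and any numerical inputs are illustrative.

```python
import numpy as np

def sepm(g, e, sigma2_ge, sigma2_e, r, m, n):
    """SEPM: square root of the MSEPM evaluated at the EVP shrinkage factors."""
    gi = np.asarray(g, dtype=float) * sigma2_ge      # structural GEI on axis k
    ei = np.asarray(e, dtype=float) * sigma2_e / r   # error absorbed by axis k
    per_axis = gi * ei / (gi + ei)                   # S^2*E + (1-S)^2*G at optimum
    mse = ((m + n - 1) * sigma2_e / (m * n * r)
           + (m - 1) * (n - 1) / (m * n) * per_axis.sum())
    return float(np.sqrt(mse))
```

With $\hat{g} = (0.6, 0.3, 0.1)$, $\hat{e} = (0.2, 0.3, 0.5)$, $\sigma^2_{GE} = 1$, $\sigma^2_e = 4$, r = 4, m = 4, and n = 5, the squared SEPM is 0.4 + 0.6 × 0.3833 = 0.63.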
Experimental Data
The shrinkage factors were estimated in yield data from three multienvironment trials that had been used for the EVP method of AMMI models (Moreno-González et al., 2003) and for the GEAR model of Moreno-González and Crossa (1998). Trial 1 is a multienvironment experiment including 16 triticale (× Triticosecale Wittmack) cultivars with four replications in randomized complete blocks (RCBD) evaluated at 10 environments in Spain during 1989 (Royo et al., 1993). Trial 2 is a CIMMYT maize (Zea mays L.) international trial in which eight maize genotypes were arranged in a randomized complete block design with four replications at each of 33 sites scattered across the tropical region in 1987. Trial 3 comprises 11 broad bean (Vicia faba L.) genotypes arranged in a randomized complete block design with three replications grown at 10 environments in southern Spain (Cubero and Flores, 1995).
Simulation Data
To test the applicability of the EVP-based shrinkage factors in a wide range of situations, simulation data sets were generated from the original empirical data by adding to each observed cell mean a random error component for each replication in each trial. The random error effects in the simulation data sets came from normal distributions with mean zero and arbitrary standard deviations of 2500, 1500, 1000, 750, 600, 400, and 240 kg ha-1. Sixty-three simulation data sets were generated for each arbitrary standard deviation in Cases 1 and 2; 42 simulation data sets were generated in Cases 3 and 4; and 84 simulation data sets in Case 5. Cases 1 to 5 will be described in the next section.
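The data-generation step can be sketched with NumPy's random generator. This is a minimal sketch, assuming the simulated plot values are the observed cell means plus independent normal errors with mean zero and the chosen standard deviation (e.g., 2500, 1500, ..., 240 kg ha-1), one per replication; `simulate_trial` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def simulate_trial(cell_means, n_reps, sd, rng):
    """Return simulated plot values of shape (m, n, n_reps): each observed
    cell mean plus an independent N(0, sd^2) error for every replication."""
    mu = np.asarray(cell_means, dtype=float)[:, :, None]
    noise = rng.normal(0.0, sd, size=mu.shape[:2] + (n_reps,))
    return mu + noise
```

For example, `simulate_trial(means, 4, 1500.0, np.random.default_rng(seed))` would produce one four-replicate data set for a 16 × 10 table such as Trial 1; repeating it with different seeds gives the replicated simulation data sets.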
Random Data Splitting
Since all trials were arranged in RCB designs, data were first adjusted to remove the block effects at each site (Cornelius and Crossa, 1999). As in Moreno-González et al. (2003), cross validation was performed by splitting the data into model (rm) and validation (rv) data. Five cases of data splitting were studied for the simulation and empirical data of Trials 1, 2, and 3. In Cases 1 and 2, involving Trials 1 and 2, respectively, three replicates were randomly selected for each genotype at each site to form the model data (rm = 3), and the remaining replicate (rv = 1) was used as validation data. In Cases 3 and 4, involving Trials 1 and 2, respectively, rm = 2 and rv = 2. In Case 5, involving Trial 3, rm = 2 and rv = 1. Four random splitting events were performed on each of the 63 simulation data sets of Cases 1 and 2, in such a way that each of the four replicates of a cell was used once in the four one-replicate validation data sets. Likewise, six random splitting events were done on each of the 42 simulation data sets of Cases 3 and 4, in such a way that each of the four cell replicates was used three times in the six two-replicate validation data sets. Also, three random splittings were made on each of the 84 simulation data sets of Case 5, in such a way that each of the three cell replicates was used once in the three one-replicate validation data sets. For the empirical data, the same random splitting procedure was used, but the random assignment of replicates to each cell was done 63 times in Cases 1 and 2, 42 times in Cases 3 and 4, and 84 times in Case 5. In total, 252 random model data sets were validated for each of the situations and cases studied.
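The splitting scheme can be sketched as follows: within every cell the replicate labels are randomly permuted, and then every combination of rv validation positions is enumerated. With r = 4 and rv = 1 this gives four splits in which each replicate is validated once; with r = 4 and rv = 2 it gives six splits in which each replicate is validated three times, matching the counts above. This is a minimal sketch assuming a balanced (m, n, r) array of block-adjusted plot values; `split_replicates` is a hypothetical helper.

```python
import itertools
import numpy as np

def split_replicates(data, n_valid, rng):
    """Yield (model_means, valid_means) for every choice of n_valid replicate
    positions, after randomly permuting the replicates within each cell."""
    m, n, r = data.shape
    # independent random permutation of the replicate labels in every cell
    perm = np.argsort(rng.random((m, n, r)), axis=2)
    shuffled = np.take_along_axis(data, perm, axis=2)
    for valid in itertools.combinations(range(r), n_valid):
        model = [k for k in range(r) if k not in valid]
        yield (shuffled[:, :, model].mean(axis=2),
               shuffled[:, :, list(valid)].mean(axis=2))
```

For Case 5 (r = 3, rv = 1), `list(split_replicates(data, 1, rng))` yields the three one-replicate validation splits.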
Shrinkage Factors, Cross Validation Procedure, AMMI, and BLUP
Shrinkage factors were computed (i) by substituting the adjusted coefficients of variance components $\hat{g}_k$ and $\hat{e}_k$ estimated from the EVP method (Moreno-González et al., 2003) into Eq. [4]; and (ii) by the CCC method (Eq. [1]) (Cornelius et al., 1993, 1996; Cornelius and Crossa, 1995, 1999), using Gollob's df as the estimate of $\nu_k$. Negative shrinkage factor estimates were set to zero.
The shrinkage factors were studied for different situations in each trial. Shrunken predicted terms $z^*_{ijp}$ were computed from Eq. [2] and added to the additive main component $a_{ij}$ to obtain the shrunken predicted means for the accumulated PC axes of the AMMI model. All PC axes were involved in the computations. The procedure was applied to the cell means of the model data for each case. The RMSPD between the shrunken predicted means and the validation cell means was computed as the square root of the average of the squared predictive differences $(\hat{y}_{ij} - y_{ijv})^2$ across all cells and split data sets, where $\hat{y}_{ij}$ is the cell mean predicted from the modelling data and $y_{ijv}$ is the corresponding observed validation data value. The cross validation RMSPD criterion was used to validate and compare the accuracy of two shrinkage methods: the adjusted EVP method and the CCC method using Gollob's df as the estimate of $\nu_k$. In addition, the RMSPD was computed for the predicted means of the conventional truncated AMMI model, in which the shrinkage factor is unity for each PC axis retained in the model and zero for the truncated (i.e., omitted) PC axes.
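The RMSPD computation, and the way truncated AMMI fits into the same framework (unit factors for retained axes, zero for omitted ones), can be sketched as below. Both helpers are hypothetical names for illustration, and the predictions are assumed to have been computed already from the model data of each split.

```python
import numpy as np

def rmspd(predicted, validation):
    """RMSPD: square root of the mean squared difference between predicted
    and validation cell means, pooled over all cells and all splits."""
    sq = np.concatenate([
        ((np.asarray(p, dtype=float) - np.asarray(v, dtype=float)) ** 2).ravel()
        for p, v in zip(predicted, validation)
    ])
    return float(np.sqrt(sq.mean()))

def truncated_factors(p, n_axes):
    """Shrinkage factors that reproduce truncated AMMI: 1 for the first
    n_axes PC axes and 0 for the omitted ones."""
    return np.array([1.0] * n_axes + [0.0] * (p - n_axes))
```

Pooling the squared differences before averaging, rather than averaging per-split RMSPDs, follows the description of averaging across all cells and split data sets.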
The BLUP from a two-way table with random interaction effects was obtained for each cell mean on the same simulation and empirical data as the shrunken models, using PROC MIXED of SAS (SAS Institute Inc., 1999). The main effects in BLUP were considered fixed for a fair comparison with the other methods. The RMSPD cross validation criterion was also applied to the BLUP estimates. The same model data subset from each random splitting event was used to compare the four methods: CCC, EVP, AMMI, and BLUP.
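For a balanced table with fixed main effects and independent random GEI effects, the BLUP of each interaction reduces to the double-centred residual multiplied by one common factor, $\sigma^2_{GE}/(\sigma^2_{GE} + \sigma^2_e/r)$. The sketch below is not the PROC MIXED analysis used in the study: it assumes balanced data, takes the pooled error mean square as given, and estimates $\sigma^2_{GE}$ by the ANOVA (method-of-moments) estimate $MS_{GE} - \sigma^2_e/r$ on the cell-means scale, truncated at zero, which for balanced data should agree with REML when the estimate is positive. `blup_cell_means` is a hypothetical name.

```python
import numpy as np

def blup_cell_means(cell_means, sigma2_e, r):
    """BLUP of cell means with fixed genotype and environment effects and
    random GEI, for a balanced m x n table of cell means."""
    y = np.asarray(cell_means, dtype=float)
    m, n = y.shape
    additive = (y.mean(axis=1, keepdims=True) + y.mean(axis=0, keepdims=True)
                - y.mean())
    z = y - additive                               # double-centred GEI
    ms_ge = (z ** 2).sum() / ((m - 1) * (n - 1))   # interaction MS, cell-mean scale
    sigma2_ge = max(ms_ge - sigma2_e / r, 0.0)     # moment estimate, truncated at 0
    factor = sigma2_ge / (sigma2_ge + sigma2_e / r)
    return additive + factor * z, factor
```

With no error variance the factor is 1 and the observed means are returned; with a very large error variance the factor goes to 0 and only the additive (AMMI0) part remains.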
Shrinkage factors were also estimated for trials including all replicates. Since no replicate was left for validation, the SEPM was estimated for the EVP method from Eq. [5]. The SEPM for AMMI was estimated from Eq. [11] of Moreno-González et al. (2003) after removing the error variance associated with the validation observation. The BLUP estimates and the standard errors of the BLUP-predicted cell means were obtained from PROC MIXED of SAS (SAS Institute Inc., 1999).
| RESULTS AND DISCUSSION |
|---|
k, for the PC axis must be adjusted because of negative estimates (Moreno-González et al., 2003). Thus, the precision of the EVP shrinkage factors decreases as the error variance increases. BLUP was also better than the EVP method when the simulation standard deviation for Cases 4 and 5 was 1500. In these situations, the ratio of the cell mean error variance to the structural GEI variance was also high, i.e., greater than 4.5.
Truncated AMMI was better than the CCC method, but not better than EVP, in Cases 2, 3, 4, and 5 with simulated data and standard deviation 2500. This is probably because Gollob's df does not accurately express the error absorption by the large eigenvalues when the data are extremely noisy (Cornelius, 1993). The best truncated least squares AMMI model in all four of these situations was the additive model (AMMI0). In general, CCC shrinkage estimates using Gollob's df as the value of $\nu_k$ appear to be better than truncated least squares AMMI, and the EVP method is better than both truncated AMMI and BLUP based on a two-way random effects model, except in situations where the ratio of the variance of a cell mean to the GEI variance is nearly as large as, or larger than, 4.5. Such situations do not occur frequently in multienvironment cultivar trials; when they do, they indicate a need for either more replications or better selection of experimental sites with respect to within-site uniformity of experimental conditions. These results agree with Cornelius and Crossa (1999), who found shrinkage estimates to be better predictors of the validation data than truncated multiplicative models in five multienvironment cultivar trials.
The choice of the best truncated model depends on the experimental error variance and the number of replications used to form the modeling data. In Trial 1, for model data with three replicates, AMMI9, AMMI2, AMMI1, and AMMI0 were selected for standard errors 400, 1000, 1500, and 2500, respectively, whereas for modeling data with two replicates, AMMI9, AMMI1, AMMI1, and AMMI0 were selected for the same standard errors (Table 1). In Trial 2, AMMI6, AMMI1, AMMI1, and AMMI0 were selected for model data with three replicates and standard errors 600, 1000, 1500, and 2500, respectively, whereas AMMI7, AMMI0, AMMI0, and AMMI0 were selected for modeling data with two replicates and the same standard errors (Table 2). In Trial 3, AMMI9, AMMI1, AMMI1, and AMMI0 were selected for standard errors 240, 600, 750, and 1000, respectively. Thus, the best truncated AMMI model for cell means including all replications is unknown, since such models cannot be validated and a choice based on validation with fewer replications may not be correct.
The shrinkage methods therefore have the following advantages over the truncated AMMI model: (i) there is a clear criterion for including all PC axes in the shrunken models, whereas truncated AMMI lacks an adequate criterion for selecting how many of the leading PC axes to retain; and (ii) the shrunken models are better cell mean predictors than the truncated AMMI models, since their validation RMSPD estimates were generally smaller than those of AMMI.
The shrunken CCC and EVP models were generally better cell mean predictors than BLUPs for the three trials and for model data with two and three replicates (Table 2). The shrinkage factor formulas for the CCC and EVP methods and for BLUP have a similar structure (Cornelius et al., 1993, 1996; Cornelius and Crossa, 1995, 1999), but the CCC and EVP methods allow different shrinkage factors for the individual PCs that make up the whole interaction, whereas BLUP based on a two-way random effects model with interaction applies the same shrinkage factor to all interactions. It therefore seems logical that minimizing the error for each single PC should yield better results than minimizing the error for the entire interaction. Application of shrinkage factors to the random additive effects produced a negligible improvement in the BLUP estimates of the three trials (data not shown). The BLUPs were better predictors of cell means than the best truncated AMMI models for all trials, except for the two-replicate model data of Trial 1 with simulation standard errors 600 and 1000 and the empirical data themselves (Case 3 of Table 2), and for the empirical data of Trial 3.
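The contrast between one factor for the whole interaction and one factor per PC can be checked numerically. Assuming the EVP factor has the form $S_k = \hat{g}_k\sigma^2_{GE}/(\hat{g}_k\sigma^2_{GE} + \hat{e}_k\sigma^2_e/r)$ and the balanced-data BLUP factor is $\sigma^2_{GE}/(\sigma^2_{GE} + \sigma^2_e/r)$, the per-axis factors collapse to the single BLUP factor whenever every axis absorbs GEI and error variance in the same proportion ($\hat{g}_k = \hat{e}_k$); they differ only when the leading axes carry relatively more structural GEI. The coefficients in the example are hypothetical.

```python
import numpy as np

def per_axis_vs_single(g, e, sigma2_ge, sigma2_e, r):
    """Per-PC EVP-type shrinkage factors and the single two-way BLUP factor."""
    g = np.asarray(g, dtype=float)
    e = np.asarray(e, dtype=float)
    per_axis = g * sigma2_ge / (g * sigma2_ge + e * sigma2_e / r)
    single = sigma2_ge / (sigma2_ge + sigma2_e / r)
    return per_axis, single
```

With $\hat{g} = (0.6, 0.3, 0.1)$ and $\hat{e} = (0.2, 0.3, 0.5)$, and $\sigma^2_{GE} = \sigma^2_e/r = 1$, BLUP shrinks every interaction by 0.5, whereas the per-axis factors are 0.75, 0.5, and about 0.17, keeping more of the signal-rich first axis and less of the noise-dominated last one.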
Comparisons among Shrunken Models
The adjusted EVP shrinkage method gave smaller RMSPD values than the CCC shrinkage method (with Gollob's df used to estimate $\nu_k$) for all cases and trials except the simulation data of Case 3 with standard deviations 750 and 1000 and the empirical data of Case 4 (Table 2), but the differences between the two methods were rather trivial as long as the error variance was not exceedingly large. Differences between the two methods were largest when the standard deviation was 2500. The failure of CCC to perform as well as EVP in these cases can be explained by the fact, mentioned above in the comparisons with truncated AMMI, that Gollob's df tends to underestimate error absorption by the large eigenvalues when the data are extremely noisy (Cornelius, 1993). Apparently, with extremely noisy data, one should use either the EVP method or the CCC method with $\nu_k$ values estimated by simulation. While the CCC method with $\nu_k$ estimated by Gollob's df is less computationally intensive than the EVP method, the reverse is true if the CCC method is used with $\nu_k$ estimated by simulation. The results obtained here, along with those of Cornelius and Crossa (1999) (who found little difference in RMSPD between the CCC method using Gollob's df and the CCC method with $\nu_k$ estimated by simulation), suggest that there is little or nothing to lose, and in some cases an improvement in accuracy (or precision) to be gained (particularly when data are very noisy), by using EVP shrinkage estimators as an AMMI model fitting method.
Models with All Replications and SEPM
The shrinkage factors based on the adjusted EVP method were estimated for the empirical data of Trials 1, 2, and 3 with all replicates included in the model data. Since no replicate was left for validation, the SEPM (Eq. [5]) is used as the criterion for model comparisons (Table 3). The RMSPD based on the EVP method ($\mathrm{RMSPD}_{\mathrm{EVP}}$) was shown to be a good criterion for selecting the best truncated AMMI models (Moreno-González et al., 2003). The SEPM has the same structure as the $\mathrm{RMSPD}_{\mathrm{EVP}}$ criterion, since both were derived from the same concepts: SEPM equals $\mathrm{RMSPD}_{\mathrm{EVP}}$ with the error associated with the validation observation removed and the GEI effects from the PC analysis replaced by shrunken GEI effects. Again, the shrunken EVP method was better (i.e., it had a smaller SEPM estimate) than BLUP and the best truncated AMMI models for all trials (Table 3), as was seen when model data with an incomplete number of replicates were validated with the RMSPD criterion (Table 2). The best models for the EVP method included all PC axes in all trials, whereas the best truncated AMMI models included the first five, three, and one PC axes in Trials 1, 2, and 3, respectively (Table 3). As discussed above, selection of the best AMMI model depends on the number of replications. The BLUP model was better than the best truncated AMMI model for Trials 2 and 3, but AMMI5 was superior to BLUP for Trial 1.
| CONCLUSIONS |
|---|
In the present study, the EVP-based shrinkage estimators were found to be empirically more predictively accurate than the CCC method [if the absorption of error variance by PC axes was estimated as df defined by Gollob (1968)] in all but three of the comparisons made. The difference was trivially small if error variance was small, but became of greater importance as error variance increased. The study did not reveal any disadvantages with respect to performance of the EVP shrinkage estimators as compared with the other methods studied [which included parsimonious ("truncated") least squares-fitted AMMI models and BLUPs based on a two-way random effects model, in addition to the CCC method using Gollob's df].
A suggested protocol for estimating means in a multienvironment cultivar trial is to (i) compute an analysis of variance of the data and a two-way table of empirical cell means, (ii) obtain the least squares solution for the full AMMI model, (iii) use the eigenvalue partition (EVP) data resampling method to estimate the contributions of error and interaction variances to the AMMI least squares PCs and compute the resulting shrinkage factors, (iv) multiply the least squares estimates of the AMMI interaction singular values by their respective shrinkage factors, (v) estimate the cell means from the resulting shrunken AMMI model, and (vi) use the SEPM defined in this paper to estimate the standard errors of the shrinkage estimates of the cell means.
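Steps (ii) and (iv) to (vi) of this protocol can be chained in one function. This is a sketch under stated assumptions, not the authors' implementation: the EVP resampling of step (iii) is not reproduced, so the adjusted coefficients $\hat{g}_k$ and $\hat{e}_k$ are passed in; $\sigma^2_{GE}$ is taken as the method-of-moments estimate $MS_{GE} - \sigma^2_e/r$ on the cell-means scale (truncated at zero); and the shrinkage factors and SEPM use the forms $S_k = G_k/(G_k + E_k)$ and $\mathrm{SEPM} = [(m+n-1)\sigma^2_e/(mnr) + ((m-1)(n-1)/(mn))\sum_k G_kE_k/(G_k+E_k)]^{1/2}$, with $G_k = \hat{g}_k\sigma^2_{GE}$ and $E_k = \hat{e}_k\sigma^2_e/r$. The name `shrunken_ammi` is hypothetical.

```python
import numpy as np

def shrunken_ammi(cell_means, sigma2_e, r, g, e):
    """Shrunken AMMI predictions, shrinkage factors, and SEPM.

    Step (iii) (EVP resampling) is NOT implemented: g and e, the adjusted
    coefficients for the p = min(m-1, n-1) PC axes, must be supplied.
    sigma2_e is the pooled error mean square on a plot basis.
    """
    y = np.asarray(cell_means, dtype=float)
    m, n = y.shape
    p = min(m - 1, n - 1)
    # (ii) least squares AMMI: additive part plus SVD of the GEI residuals
    additive = (y.mean(axis=1, keepdims=True) + y.mean(axis=0, keepdims=True)
                - y.mean())
    z = y - additive
    u, lam, vt = np.linalg.svd(z, full_matrices=False)
    u, lam, vt = u[:, :p], lam[:p], vt[:p]
    # assumed moment estimate of sigma2_GE on the cell-means scale
    sigma2_ge = max((z ** 2).sum() / ((m - 1) * (n - 1)) - sigma2_e / r, 0.0)
    gi = np.asarray(g, dtype=float) * sigma2_ge
    ei = np.asarray(e, dtype=float) * sigma2_e / r
    tot = gi + ei
    safe = tot > 0
    # (iii, given g and e) per-axis shrinkage factors, negatives set to 0
    s = np.maximum(np.divide(gi, tot, out=np.zeros(p), where=safe), 0.0)
    # (iv)-(v) shrink each singular value and rebuild the cell means
    pred = additive + (u * (s * lam)) @ vt
    # (vi) standard error of the predicted means
    per_axis = np.divide(gi * ei, tot, out=np.zeros(p), where=safe)
    sepm = np.sqrt((m + n - 1) * sigma2_e / (m * n * r)
                   + (m - 1) * (n - 1) / (m * n) * per_axis.sum())
    return pred, s, float(sepm)
```

When the error variance is negligible every factor approaches 1 and the observed cell means are returned; when all $\hat{g}_k = 0$ every factor is 0 and the predictions reduce to the additive (AMMI0) model.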
| ACKNOWLEDGMENTS |
|---|
Received for publication December 18, 2002.
| REFERENCES |
|---|