Crop Science
Published online 31 May 2007
Published in Crop Sci 47:1063-1070 (2007)
© 2007 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING & GENETICS

The Accuracy of Varietal Selection Using Factor Analytic Models for Multi-Environment Plant Breeding Trials

Alison M. Kelly^a,*, Alison B. Smith^b, John A. Eccleston^c, and Brian R. Cullis^b

a QDPI, Biometry, Toowoomba, Queensland, Australia
b NSWDPI, Biometrics, Wagga Wagga Agricultural Inst., Wagga Wagga, NSW, Australia
c School of Physical Sciences, Univ. of Queensland, Brisbane, QLD, Australia

* Corresponding author (alison.kelly@dpi.qld.gov.au).


    ABSTRACT
Modeling of cultivar x trial effects for multi-environment trials (METs) within a mixed model framework is now common practice in many plant breeding programs. The factor analytic (FA) model is a parsimonious form used to approximate the fully unstructured form of the genetic variance–covariance matrix in the model for MET data. In this study, we demonstrate that the FA model is generally the model of best fit across a range of data sets taken from early generation trials in a breeding program. In addition, we demonstrate the superiority of the FA model in achieving the most common aim of METs, namely the selection of superior genotypes. Selection is achieved using best linear unbiased predictions (BLUPs) of cultivar effects at each environment, considered either individually or as a weighted average across environments. In practice, empirical BLUPs (E-BLUPs) of cultivar effects must be used instead of BLUPs since variance parameters in the model must be estimated rather than assumed known. While the optimal properties of minimum mean squared error of prediction (MSEP) and maximum correlation between true and predicted effects possessed by BLUPs do not hold for E-BLUPs, a simulation study shows that E-BLUPs perform well in terms of MSEP.

Abbreviations: BLUP, best linear unbiased prediction • E-BLUP, empirical best linear unbiased prediction • FA, factor analytic • MET, multi-environment trial • MSEP, mean squared error of prediction • US, unstructured variance.


    INTRODUCTION
Methods for the analysis of series of plant cultivar trials, also known as multi-environment trials (MET), are abundant in the literature. Recently, mixed model approaches have become popular (for a review, see Smith et al., 2005), as they provide a flexible framework in which incomplete data (not all cultivars grown in all trials) are easily handled and cultivar × trial effects and within-trial error variation can be appropriately modeled. In this study, we focused on the modeling of cultivar × trial effects. Several researchers have proposed the use of factor analytic (FA) models for this purpose (Smith et al., 2001; Piepho and Mohring, 2006). These approaches target selection of genotypes, and consequently assume that cultivar effects are random and trial effects are fixed. With the assumption of random effects for cultivars, the FA model implies that cultivar effects are correlated between trials. This is consistent with the quantitative genetics approach to the investigation of cultivar × environment interaction in which environments (synonymous with trials) are regarded as traits (Falconer and Mackay, 1996), so that the existence of a genetic variance matrix for traits is an integral part of the analysis. A more comprehensive discussion of the consequences of fitting cultivars as fixed or random can be found in Smith et al. (2005).

There are numerous possible choices for the form of the genetic variance matrix. The most general form is an unstructured matrix that contains t(t + 1)/2 parameters, where t is the number of trials. Although this form has intuitive appeal in terms of attempting to capture the underlying true structure, there may be numerous difficulties with this model. For example, if the number of environments is large, then the number of parameters to be estimated may exceed computational limits. Even with small numbers of environments there may be insufficient information (too few cultivars) to reliably estimate the parameters. Another difficulty is that most computing algorithms do not accommodate situations where either the true variance matrix or its estimate is singular. The alternative proposed by Smith et al. (2001) to overcome these difficulties was to use an FA model. This provides a parsimonious model [if k factors are fitted there are t(k+1) – k(k – 1)/2 parameters to be estimated] and singular matrices are easily accommodated using the algorithm of Thompson et al. (2003). Another model for the genetic variance matrix is the uniform (or compound symmetric) model that evolved from an ANOVA approach for MET data (Smith et al., 2005) and is still used by many researchers.
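
To make the difference in parsimony concrete, the two parameter counts quoted above can be tabulated for a given number of trials. The following Python sketch is illustrative only; the function names are hypothetical and are not taken from any software used in the study.

```python
def n_params_unstructured(t: int) -> int:
    """Number of parameters in an unstructured t x t variance matrix: t(t+1)/2."""
    return t * (t + 1) // 2


def n_params_factor_analytic(t: int, k: int) -> int:
    """Number of parameters in an FA model with k factors: t(k+1) - k(k-1)/2."""
    return t * (k + 1) - k * (k - 1) // 2


if __name__ == "__main__":
    # Compare the unstructured form with FA1 and FA2 for a few trial numbers.
    for t in (7, 10, 30):
        print(t, n_params_unstructured(t),
              n_params_factor_analytic(t, 1),
              n_params_factor_analytic(t, 2))
```

For t = 10 trials, for example, the unstructured form requires 55 parameters whereas an FA2 model requires 29.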

Despite the recommendations in Piepho (1997, 1998) and Smith et al. (2001, 2002a, 2002b), FA models are not widely used outside Australia for the regular analysis of MET data. The aim of this study was to show the superiority of FA models for this purpose, both in terms of goodness-of-fit to the data and in terms of the most common aim of METs, namely the selection of superior cultivars. The focus in this study is on selection in early generation cultivar trials, which typically test a large number of cultivars across a smaller number of environments, and these trials generally consist of a relatively balanced set of most cultivars being tested at each environment. Other studies on the performance of the FA model for trials taken from later stages in the breeding program can be found in Smith et al. (2001) and Thompson et al. (2003).

Within the framework of a multitrait analysis, selection is achieved with the use of a selection index that is a particular combination of all the traits. This concept is also appropriate in the context of MET. In this case, the index is a weighted average of the cultivar effects in individual environments. This is a flexible approach that accommodates selection for net merit across all environments or subsets of environments. The weights may be chosen in numerous ways, for example the concepts in Cooper et al. (1996) may be adopted, in which case the aim would be to give greater weight to trials that are more representative of the target population of environments. The index then provides the basis on which cultivars are ranked for selection. In practice, the components of the index, namely the cultivar effects in individual environments, are predicted from the data. Predictions are sought such that the correlation between the true and predicted index is maximized and the mean squared error of prediction is minimized (Mrode, 2005). Theory shows that these conditions are achieved with the use of best linear unbiased predictions (BLUPs) of cultivar effects in individual environments. The proviso is that the BLUPs are calculated on the basis of the true form for the genetic variance matrix. In practice, we must choose a model to represent this structure. An additional approximation is required for the formation of the selection index since the variance parameters of the assumed model are unknown and so must be estimated from the data. The selection index is then calculated using empirical best linear unbiased predictions (E-BLUPs). It is therefore important to know the impact on the optimality of the predicted index of both the assumed model for the genetic variance structure and the use of E-BLUPs rather than BLUPs.

The impact of model choice on the prediction of cultivar x environment effects has been considered by Piepho (1998), where cross-validation techniques on five MET data sets were used to compare BLUPs based on a range of models in terms of their predictive accuracy for "filling in" the cells in the cultivar x environment table. The models considered included those under investigation here, namely the uniform, FA, and unstructured variance (US) models. Piepho (1998) concluded that the predictive accuracy of BLUPs from the FA models was superior to that of the uniform model, but the results also appear to suggest that they are generally inferior to that of the US model. Note that for the FA model in Piepho (1998), a common variance was assumed for the lack of fit part, while Smith et al. (2001) allowed a separate (so-called specific) variance for each trial. We investigated the Smith et al. (2001) version of the FA model. Piepho (1998) alluded to the issue of the optimality of BLUPs being dependent on the use of known variance parameters and commented that "it can be hoped that the performance of empirical BLUPs is not far from optimal." We formally investigated this.

We applied a general mixed model analysis for MET data, described below, to eight example MET data sets, the results of which are presented. A range of models for the genetic variance matrix was used for each data set. These include the uniform, FA, and US models. Additionally a diagonal form was used implying that all pairwise genetic correlations between trials are zero. This is analogous to conducting separate analyses for each trial, and basing selections on individual trial data in isolation from other trials. Such an approach is still common practice in many plant breeding programs worldwide and is to be contrasted with the multitrait procedure described above. The goodness-of-fit of the models was compared for each data set. The optimality of selection indices based on estimated variance parameters, that is, using E-BLUPs, was investigated via a simulation study. The accuracy of prediction was compared for the range of variance models described above.


    MATERIALS AND METHODS
Analysis of Multi-Environment Trial Data
A General Mixed Model
Consider a series of $t$ trials in which a total of $m$ cultivars has been grown (without necessarily all cultivars tested in all trials). It is assumed that the $j$th trial comprises $n_j$ field plots and we let $n = \sum_{j=1}^{t} n_j$ be the total number of plots. A general mixed model for the $n \times 1$ vector $y$ of individual plot yields combined across trials can be written as

$$y = M\eta + X_p\tau_p + Z_p u_p + e, \qquad \eta = 1_{mt}\,\mu + (I_t \otimes 1_m)\,\tau_e + u_g \qquad [1]$$
where $\mu$ is an overall mean, $\tau_e$ is the $t \times 1$ vector of (fixed) trial main effects, and $u_g$ is the $mt \times 1$ vector of (random) cultivar effects for individual trials (ordered as cultivars within trials). The $n \times mt$ matrix $M$ is an indicator matrix that may contain columns whose elements are all zero if the corresponding cultivar × trial combination is not present in the data. Note that in their review, Smith et al. (2005) considered a range of models for $\eta$ and we have chosen the form used by Smith et al. (2001). The vectors $\tau_p$ and $u_p$ are vectors of fixed and random effects, respectively, with associated design matrices $X_p$ (assumed to have full column rank) and $Z_p$. These vectors represent trial-specific effects that are peripheral to the effects of main interest, namely the cultivar effects for each trial. For example, if individual trials are analyzed using a randomization-based approach, then $u_p$ will contain effects associated with the block structure of each trial. In this study, we used the spatial modeling approach of Gilmour et al. (1997) so that $\tau_p$ and $u_p$ may contain effects for accommodation of nonstationary trend or extraneous variation within individual trials. Finally, the vector $e$ is the $n \times 1$ vector of plot error effects combined across trials.

The random effects in Eq. [1] are assumed to follow a Gaussian distribution with zero mean and variance matrix

Formula 2[2]
The matrix Gp is usually a diagonal matrix of scaled identity matrices. The variance matrix for the plot error effects is assumed to be block diagonal with R = diag(Rj), where Rj is the error variance matrix for the jth trial. Smith et al. (2001) used the spatial modeling approach of Gilmour et al. (1997), so it is assumed that the jth trial comprises a rectangular array of rj rows by cj columns (so that nj = rj cj), and, for data ordered as rows within columns, we have

$$R_j = \sigma_j^2\, \Sigma_{c_j} \otimes \Sigma_{r_j} \qquad [3]$$
where $\sigma_j^2$ is a scale parameter and $\Sigma_{c_j}$ and $\Sigma_{r_j}$ are the $c_j \times c_j$ and $r_j \times r_j$ correlation matrices for the column and row dimensions of the trial, respectively. In the Gilmour et al. (1997) approach, these matrices typically correspond to autoregressive processes of order one.
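
As a sketch of this separable error structure, assuming first-order autoregressive correlation in both directions and the Kronecker ordering implied by data sorted as rows within columns (column matrix first, row matrix second), the error variance matrix for one trial could be assembled as follows. The function names and parameter values are hypothetical.

```python
import numpy as np


def ar1_corr(n: int, rho: float) -> np.ndarray:
    """AR(1) correlation matrix of size n x n: element (i, j) is rho^|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def trial_error_variance(sigma2, rho_col, rho_row, n_col, n_row):
    """sigma2 * (AR1 for columns) kron (AR1 for rows).

    Plots are ordered as rows within columns, so the column index
    varies slowest and the row index varies fastest.
    """
    return sigma2 * np.kron(ar1_corr(n_col, rho_col), ar1_corr(n_row, rho_row))


if __name__ == "__main__":
    # Hypothetical trial: 3 columns by 4 rows, 12 plots in total.
    R_j = trial_error_variance(2.0, rho_col=0.3, rho_row=0.5, n_col=3, n_row=4)
    print(R_j.shape)
```

With this ordering, plots 0 and 1 are neighbouring rows in the same column, while plots 0 and 4 are in the same row of neighbouring columns.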

Many researchers assume that cultivar x environment effects may be represented as a two-way structure. We followed this convention but acknowledge that the extension to higher order tables (for example, when environments comprise the factorial combination of geographic locations and years) is possible. Given a two-dimensional structure, we assumed that the variance matrix for the cultivar effects for individual trials has the separable form

$$\operatorname{var}(u_g) = G_e \otimes G_v \qquad [4]$$
where Ge and Gv are t x t and m x m symmetric positive definite matrices. The matrix Ge is often referred to as the genetic variance matrix with the diagonal elements representing genetic variances for individual environments and the off-diagonal elements representing genetic covariances between pairs of environments. Smith et al. (2001) assumed independence between cultivars, so used Gv = Im. An alternative is to use Gv = A, where A is a known relationship matrix, following the approach of Oakey et al. (2006). This may be a preferred approach but as yet is not common practice for the analysis of MET data in Australia. Our focus was on the choice of models for Ge so, for simplicity, and without loss of generality, we assumed Gv = Im.

There are numerous possible choices for the form of Ge. We considered four basic structures for the matrix, namely, diagonal, uniform, FA, and US. The factor analytic model based on k factors, denoted FAk, is given by

$$G_e = \Lambda\Lambda' + \Psi \qquad [5]$$
where $\Lambda$ is a $t \times k$ matrix of environment loadings, and $\Psi$ is a $t \times t$ diagonal matrix with elements commonly referred to as specific variances. Note that reduced rank models are a special case of the FA model in which more than k of the specific variances are zero.
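
The FA form of the genetic variance matrix is straightforward to build numerically from a loadings matrix and a vector of specific variances. The following Python sketch uses made-up loadings and specific variances for four trials and two factors; the function names are hypothetical.

```python
import numpy as np


def fa_variance(loadings: np.ndarray, specific: np.ndarray) -> np.ndarray:
    """Genetic variance matrix under an FA model: loadings @ loadings' + diag(specific)."""
    loadings = np.asarray(loadings, dtype=float)   # t x k
    specific = np.asarray(specific, dtype=float)   # length t
    return loadings @ loadings.T + np.diag(specific)


def to_correlation(G: np.ndarray) -> np.ndarray:
    """Convert a variance matrix to the corresponding correlation matrix."""
    sd = np.sqrt(np.diag(G))
    return G / np.outer(sd, sd)


if __name__ == "__main__":
    # Hypothetical values, not estimates from any data set in the study.
    Lam = np.array([[1.0, 0.0], [0.8, 0.3], [0.5, -0.4], [0.9, 0.2]])
    psi = np.array([0.1, 0.2, 0.3, 0.0])   # a zero specific variance is allowed
    G_e = fa_variance(Lam, psi)
    print(np.round(to_correlation(G_e), 2))
```

The diagonal of the result gives the genetic variance in each trial, and the off-diagonal elements are determined entirely by the loadings.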

Estimation, Prediction, and Selection
Estimation of the model in Eq. [1] is achieved using two linked processes. First, the variance parameters are estimated, the most common method of estimation being residual maximum likelihood (REML; Patterson and Thompson, 1971). This usually involves an iterative process. In this study, the average information algorithm (Gilmour et al., 1995) as implemented in the software ASReml (Gilmour et al., 2005) and samm (Butler et al., 2003) was used. Given estimates of the variance parameters, we obtained empirical best linear unbiased estimates (E-BLUEs) of the fixed effects and E-BLUPs of the random effects. In particular, E-BLUPs of the cultivar effects in individual environments were calculated as

$$\tilde{u}_g = (G_e \otimes I_m)\, M' P y \qquad [6]$$
where $P = H^{-1} - H^{-1}X(X'H^{-1}X)^{-1}X'H^{-1}$, the design matrix $X = [\,M 1_{mt} \;\; M(I_t \otimes 1_m) \;\; X_p\,]$, the variance matrix $H = \operatorname{var}(y) = M(G_e \otimes I_m)M' + Z_p G_p Z_p' + R$, and all variance parameters have been replaced with their REML estimates. We can write

$$\tilde{u}_g = (\tilde{u}_{g1}', \tilde{u}_{g2}', \ldots, \tilde{u}_{gt}')' \qquad [7]$$
where $\tilde{u}_{gj}$ is the $m \times 1$ vector of E-BLUPs of the cultivar effects for the $j$th trial. Then the vector of (predicted) selection indices is constructed as

Formula 8[8]
where wj (j = 1 ... t) are user-supplied weights. We considered an equally weighted measure of net merit, so chose wj = 1/t for all j. Note that selections for individual trials were obtained using the E-BLUPs from the MET analysis, then setting the weights in Eq. [8] to zero for all trials other than the one of interest.
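
The prediction and indexing steps can be sketched numerically. The following Python example assumes, for simplicity, that the variance parameters are known (so it returns BLUPs rather than E-BLUPs), that there are no peripheral random effects, and that the fixed effects supplied in X have full column rank (for example, trial means only). All function names are hypothetical, and the sketch does not reproduce the average information REML algorithm used in ASReml or samm.

```python
import numpy as np


def blup_cultivar_effects(y, M, X, Ge, m, R):
    """BLUPs of cultivar effects in each trial, ordered as cultivars within trials."""
    G = np.kron(Ge, np.eye(m))        # var(u_g) with independent cultivars
    H = M @ G @ M.T + R               # var(y), no peripheral random terms
    Hinv = np.linalg.inv(H)
    HinvX = Hinv @ X
    # P = H^-1 - H^-1 X (X' H^-1 X)^-1 X' H^-1
    P = Hinv - HinvX @ np.linalg.solve(X.T @ HinvX, HinvX.T)
    return G @ M.T @ P @ y


def selection_index(u, t, m, weights=None):
    """Weighted average of the trial-specific effects for each cultivar."""
    U = u.reshape(t, m)               # row j holds the effects for trial j
    w = np.full(t, 1.0 / t) if weights is None else np.asarray(weights, float)
    return w @ U


if __name__ == "__main__":
    t, m = 3, 5                       # hypothetical: 3 trials, 5 cultivars, 1 plot each
    rng = np.random.default_rng(0)
    Ge = np.array([[1.0, 0.6, 0.3], [0.6, 1.5, 0.5], [0.3, 0.5, 0.8]])
    M = np.eye(t * m)
    X = np.kron(np.eye(t), np.ones((m, 1)))          # trial means
    R = np.diag(np.repeat([0.5, 1.0, 0.7], m))       # trial-specific error variance
    y = rng.normal(size=t * m)
    u = blup_cultivar_effects(y, M, X, Ge, m, R)
    print(np.argsort(-selection_index(u, t, m)))     # cultivars ranked by net merit
```

Setting the weights to zero for all trials but one recovers the predictions for that single trial, which still borrow information from the other trials through the off-diagonal elements of the genetic variance matrix.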

In this study, a key issue was the choice of model for Ge. The goodness-of-fit of two variance models may be assessed using an Akaike information criterion (AIC), or, if the models are nested, a residual maximum likelihood ratio test may be used. In terms of the models under study, the full sequence did not constitute a nested sequence. We note in particular, however, that the uniform model is nested within an FA1 model and an FAk model is nested within an FA(k + 1) model.
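
As an illustration of the AIC comparison, the sketch below assumes the usual definition, AIC = -2 × (residual log-likelihood) + 2 × (number of variance parameters), with all models sharing the same fixed effects so that residual log-likelihoods are comparable. The log-likelihood values are invented for the example and the function names are hypothetical.

```python
def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion from a log-likelihood and a parameter count."""
    return -2.0 * loglik + 2.0 * n_params


def aic_deviations(fits: dict) -> dict:
    """Map model name -> AIC minus the smallest AIC (best model has value 0).

    `fits` maps a model name to a (log-likelihood, number of parameters) pair.
    """
    values = {name: aic(ll, p) for name, (ll, p) in fits.items()}
    best = min(values.values())
    return {name: value - best for name, value in values.items()}


if __name__ == "__main__":
    # Hypothetical fits for a three-trial data set.
    fits = {"diagonal": (-520.0, 3), "uniform": (-510.0, 2), "FA1": (-500.0, 6)}
    print(aic_deviations(fits))
```

Expressing each AIC as a deviation from the best model makes the preferred model easy to identify, since it is the one with a value of zero.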

Example Data Sets
We considered the analysis of eight MET data sets from Australian plant breeding programs. All data sets relate to early generation (Stage 2) trials conducted in 2004. The number of cultivars in each set ranged from 50 for the Victoria green lentil (Lens culinaris Medik) series through to 1160 for the Queensland wheat (Triticum aestivum L.) series (Table 1). All data sets were reasonably complete in the sense of most cultivars being grown in all trials. The percentage of cultivar x trial combinations present ranged from 69 to 100% (Table 1). This high level of balance can be attributed to the fact that the data span a single year only. Most data sets were characterized by heterogeneity of trial mean yield, in particular the Queensland sorghum [Sorghum bicolor (L.) Moench] set. The trial designs for all series except Victoria barley (Hordeum vulgare L.) (in which all cultivars were replicated in all trials) were typical of Australian early generation cultivar trials. The South Australia barley and Victoria lentil series were designed using grid-plot designs (e.g., see Cullis et al., 2006), which, historically have been the most common type of design for this stage of testing. In Australia these designs are now being replaced by the more efficient partially replicated designs of Cullis et al. (2006). These designs were used in the New South Wales barley and all Queensland example data sets. The designs are particularly amenable to MET as it may then be possible to balance replicates across trials by choosing different subsets of cultivars for replication in individual trials. This approach was used in the Queensland barley and sorghum series.


Table 1. Example multi-environment trial data sets: numbers of trials and cultivars, percentage of cultivar (C) x trial (T) combinations present in data, trial design type, and range in trial mean yields.

 
Each data set was analyzed using the methods described above. The first step in this process was to determine appropriate spatial models for each trial. To simplify the computations, this was achieved using a diagonal model for Ge. More complex forms for Ge were then fitted, maintaining the spatial models identified in the first step (but reestimating the parameters). Details of the spatial models are not presented here as the focus in this study was on the comparison of models for Ge. We note, however, that there was spatial heterogeneity in most trials and significant heterogeneity of residual variance across trials within all series. The models fitted to Ge for each data set were the diagonal, uniform, and US models together with the FA model with a single factor through to the maximum number of factors possible for the data set. The latter is dependent on the number of trials and is given by the largest integer satisfying $k \le [2t + 1 - \sqrt{8t + 1}]/2$ (for details, see Smith et al., 2001). The AIC values for each model (expressed as deviations from the best model) are given in Table 2. From this we note that the FA model is the model of choice for six of the eight early-generation data sets. For the remaining two data sets, US is the preferred model, as was the case in the study by Piepho (1998).
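
The maximum number of factors can also be found by direct counting, on the assumption that an FA model is admissible only while it has no more parameters than the unstructured form, that is, while t(k + 1) - k(k - 1)/2 does not exceed t(t + 1)/2. The following Python sketch applies that rule; the function name is hypothetical.

```python
def max_factors(t: int) -> int:
    """Largest k for which an FAk model has no more parameters than the US model."""
    limit = t * (t + 1) // 2                      # parameters in the US model
    k = 0
    while k + 1 <= t:
        candidate = k + 1
        n_fa = t * (candidate + 1) - candidate * (candidate - 1) // 2
        if n_fa > limit:
            break
        k = candidate
    return k


if __name__ == "__main__":
    for t in (3, 7, 10):
        print(t, max_factors(t))
```

For the seven-trial and 10-trial cases this counting rule gives at most three and six factors, respectively.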


Table 2. Goodness-of-fit for the range of genetic variance models† (diagonal, uniform, factor analytic, unstructured) fitted to example data sets. Models can be compared on the basis of Akaike's information criterion (AIC), which is given here as the difference between individual models and the best model for that data set. The best model therefore has a value of zero (underlined). The number of parameters, np, in the genetic variance matrix Ge is also given.

 
Summaries of estimates of the genetic variance parameters from the best model for Ge for each data set are given in Table 3. There was heterogeneity of both genetic variance and genetic correlation in all data sets. Genetic correlations were predominantly positive, but there were occasions where some trials had negative genetic correlations with others, suggesting that there may have been a reversal in cultivar rankings. The final column in Table 3 gives the range across trials in estimated line mean heritability. The heritability (hj2) value for the jth trial was calculated from a generalized formula for unbalanced data following the approach of Cullis et al. (2006), as

$$h_j^2 = 1 - \frac{A_j}{2\sigma_{gj}^2} \qquad [9]$$
where $A_j$ is the average pairwise prediction error variance of cultivar effects for the $j$th trial and $\sigma_{gj}^2$ is the genetic variance for the $j$th trial, taken from the diagonal model for $G_e$.
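
A minimal numerical sketch of this heritability calculation follows. It assumes the generalized form of Cullis et al. (2006), namely one minus the average pairwise prediction error variance divided by twice the genetic variance, and it assumes the average is taken over the variances of all pairwise differences between predicted cultivar effects. The function names and the example matrix are hypothetical.

```python
import numpy as np


def average_pairwise_pev(C: np.ndarray) -> float:
    """Average over pairs i < j of var(u_i - u_j) = C_ii + C_jj - 2 C_ij,

    where C is the prediction error variance matrix of the cultivar effects.
    """
    m = C.shape[0]
    total = m * np.trace(C) - C.sum()     # sum over all pairs i < j
    return 2.0 * total / (m * (m - 1))


def line_mean_heritability(C: np.ndarray, genetic_variance: float) -> float:
    """Generalized heritability: 1 - A / (2 * genetic variance)."""
    return 1.0 - average_pairwise_pev(C) / (2.0 * genetic_variance)


if __name__ == "__main__":
    # Hypothetical case: 50 cultivars, independent prediction errors of variance 0.2.
    C = 0.2 * np.eye(50)
    print(line_mean_heritability(C, genetic_variance=1.0))
```

In the special case of independent prediction errors with common variance, the expression reduces to one minus the ratio of prediction error variance to genetic variance.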


Table 3. Summary of parameter estimates from the best model for the genetic variance matrix Ge for each example data set: range in estimated genetic variances (diagonal elements of Ge) and percentage of non-negative and range in absolute values of estimated genetic correlations.

 
Simulation Study
The analysis of the example data sets detailed above supports our experience in analyzing numerous Australian MET data sets in that FA models regularly provide a good fit to the data. We also investigated whether FA models are better in terms of more accurate cultivar selection. This is best achieved using a simulation study. We considered 12 different data sets comprising the factorial combinations of three values for the number of cultivars (m = 80, 200, and 500), with four underlying genetic variance matrices. The range in number of cultivars used in the simulation covers the sizes found in Australian early generation trials for a range of crops. The four underlying genetic variance matrices were taken from two of the example data sets, chosen to contrast between a best-fit US model (Queensland wheat) and a model with an underlying FA2 best-fit (South Australia barley) (see Table 2). For each of these two data sets, parameters from both the fitted FA (with k = 2) and US structures were taken to form four genetic variance matrices for data generation. They form a complete factorial combination of the two data sets by two variance models: an FA2 and US model.

For simplicity, the data sets were assumed complete with all cultivars in every trial. All trials were designed as partially replicated designs (Cullis et al., 2006), with replicate plots of 20% (for m = 200 and 500) or 25% (for m = 80) of the cultivars and single plots of the remainder.

Individual plot yield data were generated according to a simplified version of the model in Eq. [1], namely

Formula 10[10]
with

Formula 11[11]
For the data set generated from the Queensland wheat example with seven trials, the fixed effects, residual variances, and genetic variances used to generate these data are given in Table 4. Genetic variances taken from the US and FA2 fit to these data were the same to two significant digits, and are reported for the US model. Two corresponding forms for Ge were used, derived from the US and the FA2 fit to this data set, and these are summarized as genetic correlation matrices in Table 5. Likewise, the fixed effects and residual and genetic variances used to generate the data from the South Australia barley example are given for a 10-site scenario in Table 6, with the corresponding form for Ge given as correlations in Table 7, for both the FA2 and US fit to these data.
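
The data generation step can be sketched as follows. This Python example assumes a simple plot-level model (trial mean plus cultivar effect plus independent error with a trial-specific variance), a positive definite genetic variance matrix, and a partially replicated layout in which a random subset of cultivars receives a second plot in each trial. The exact generating model and design construction used in the study may differ, and all names and parameter values here are hypothetical.

```python
import numpy as np


def simulate_met(Ge, trial_means, resid_var, m, rep_fraction, rng):
    """Simulate plot yields for m cultivars in t trials with partial replication.

    Returns the m x t matrix of true cultivar effects and plot-level arrays
    of trial index, cultivar index, and yield.
    """
    t = Ge.shape[0]
    L = np.linalg.cholesky(Ge)                   # requires Ge positive definite
    U = rng.standard_normal((m, t)) @ L.T        # each row has variance matrix Ge
    n_rep = int(round(rep_fraction * m))
    trial, cultivar, y = [], [], []
    for j in range(t):
        # A different random subset of cultivars is replicated in each trial.
        replicated = rng.choice(m, size=n_rep, replace=False)
        plots = np.concatenate([np.arange(m), replicated])
        e = rng.normal(0.0, np.sqrt(resid_var[j]), size=plots.size)
        trial.append(np.full(plots.size, j))
        cultivar.append(plots)
        y.append(trial_means[j] + U[plots, j] + e)
    return U, np.concatenate(trial), np.concatenate(cultivar), np.concatenate(y)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Ge = np.array([[1.0, 0.6, 0.3], [0.6, 1.5, 0.5], [0.3, 0.5, 0.8]])
    U, trial, cultivar, y = simulate_met(
        Ge, trial_means=[2.0, 3.0, 4.0], resid_var=[0.5, 1.0, 0.7],
        m=80, rep_fraction=0.25, rng=rng)
    print(U.shape, y.shape)
```

Keeping the true effects alongside the simulated yields is what allows predictions from each analysis model to be compared with the truth afterwards.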


Table 4. Parameters used for data generation in simulation study: fixed effects, residual and genetic variances for the seven-trial set based on Queensland wheat (Triticum aestivum L.).

 

Table 5. Genetic correlations for the two-dimensional factor analytic model fit (above the diagonal) and the unstructured variance model fit (below the diagonal) used in simulations for the seven-trial set based on Queensland wheat (Triticum aestivum L.).

 

Table 6. Parameters used for data generation in simulation study: fixed effects, residual and genetic variances for the 10-trial set based on South Australia barley (Hordeum vulgare L.).

 

Table 7. Genetic correlations for the two-dimensional factor analytic model fit (above the diagonal) and the unstructured variance model fit (below the diagonal) used in simulations for the 10-trial set based on South Australia barley (Hordeum vulgare L.).

 
As all fixed effects and variance parameters were chosen from representative sets of Australian MET data, they directly capture the heterogeneity observed in these types of early generation trials.

A total of 200 simulations was conducted for each of 12 data generation models: three trial sizes x four variance models. In each simulation run, the data sets were analyzed using six models for Ge, namely diagonal, uniform, a factor analytic model with one (FA1), two (FA2), or three (FA3) factors, and US.

The results from each analysis were used to calculate a mean squared error of prediction (MSEP) for each trial, namely

Formula 12[12]
and an MSEP for an equally weighted selection index, namely

Formula 13[13]
A summary of the ratio of MSEP relative to the US model for each of the five other models for analysis is given in Table 8. A clear feature of the table is the poor performance of the simplistic models (common covariance and diagonal variance models) compared with the US and FA models. Ratios of MSEP relative to US ranged from 1.39 to 2.39 for the common covariance model and from 1.11 to 1.22 for the diagonal variance model across the range of simulation scenarios (Table 8).
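
The MSEP summaries can be illustrated with a short Python sketch. It assumes that MSEP is the squared difference between predicted and true cultivar effects averaged over cultivars (in a simulation study, the result would additionally be averaged over runs), and that the index uses equal weights unless others are supplied. The function names and the small example are hypothetical.

```python
import numpy as np


def msep_per_trial(U_true: np.ndarray, U_pred: np.ndarray) -> np.ndarray:
    """MSEP for each trial; inputs are m x t arrays (cultivars by trials)."""
    return np.mean((U_pred - U_true) ** 2, axis=0)


def msep_index(U_true, U_pred, weights=None) -> float:
    """MSEP of a weighted selection index across trials (equal weights by default)."""
    t = U_true.shape[1]
    w = np.full(t, 1.0 / t) if weights is None else np.asarray(weights, float)
    return float(np.mean((U_pred @ w - U_true @ w) ** 2))


def msep_ratio(msep_model: float, msep_us: float) -> float:
    """Ratio of a model's MSEP to that of the US model; above 1 means worse than US."""
    return msep_model / msep_us


if __name__ == "__main__":
    U_true = np.array([[1.0, 2.0], [0.0, -1.0]])
    U_pred = U_true + np.array([[0.5, 0.0], [-0.5, 1.0]])
    print(msep_per_trial(U_true, U_pred), msep_index(U_true, U_pred))
```

A ratio of 1.2 for a given analysis model, for instance, would correspond to an MSEP 20% larger than that obtained with the US model.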


Table 8. Ratio of mean squared error of prediction for the five analysis models relative to the unstructured variance (US) model for 200 runs of the 12 simulated data sets.

 
Comparisons among FA and US models can be summarized by considering each of their counterpart data generation models in turn. First, we consider data generated using an FA2 model. In this case, the use of an FA2 model for analysis yielded a smaller MSEP than fitting a US model for both types of genetic variance structure (Queensland wheat and South Australia barley) and for all numbers of cultivars studied. Also, given this data generation model, the FA2 analysis was superior to both the FA1 and FA3 analyses. When data were generated using the US model, the results were more complex. With the largest number of cultivars (i.e., 500), the US analysis yielded a lower MSEP than any of the FA models. With fewer cultivars, however, the FA models were very competitive with US. In the case of 80 cultivars, all FA models yielded lower MSEPs than US (except for the FA3 model for Queensland wheat, which was marginally higher than US) and the FA2 model performed particularly well. In the case of 200 cultivars, the FA2 and FA3 models had MSEP values that were lower than or equal to those of US.

Practical difficulties arose when fitting the US model, because updates of Ge sometimes fell outside the parameter space. In these cases, it was necessary to constrain the parameters in some way. In the ASReml package, one option is to invoke expectation–maximization updates rather than average information updates when the latter produce estimates of Ge that are outside the parameter space. Expectation–maximization was invoked in the majority (94%) of simulations when the generated data set involved the smallest number (80) of genotypes. In contrast, it was needed in only 1% of the simulations for the data sets involving 500 genotypes.

The FA1 and FA2 models converged for all simulation runs, provided that sensible starting values were given for the FA parameters. The FA3 model failed to converge for some simulation runs (on average <1% of simulation runs for m = 500 and 200, and 11% of simulation runs for m = 80). Experience in fitting these models indicates that convergence is usually achieved if the user progresses through a sensible sequence of models, beginning with a diagonal model and continuing through increasing dimensions of the FA model.


    RESULTS AND DISCUSSION
This study demonstrates that the FA model is the preferred model for analyzing MET data for trials typical of those occurring in Australian plant-breeding programs. It is a robust model with high predictive accuracy when there is substantial genotype × environment interaction, consisting of heterogeneous genetic variances and crossovers in genotype ranking.

Although it forms a simpler approximation to the fully unstructured variance matrix, a series of simulations revealed that the FA model is generally the preferred model over US, even for a number of data sets where the underlying variance structure was generated from a US model. The superiority of the FA models over US was particularly evident when the data involved a smaller number of cultivars. The smallest number considered in this study was 80 and we are aware that this is still quite high compared with many other examples of cultivar trial data in the literature. Our findings contrast with those of Piepho (1998), who reported that US was generally the superior model, based on a cross-validation study of five data sets. The difference in results may be due to a different underlying variance structure in the data sets used in the study by Piepho (1998), or due to the different form of the FA model he fitted with common specific variances.

More important from the perspective of the breeding program is the predictive accuracy of each analysis model, as this directly impacts on the gains made in the selection process. Our simulation study examined this predictive accuracy when variance parameters are estimated from the model, rather than assumed known, and resulted in the FA model being judged equivalent in predictive accuracy to US for E-BLUPs, and robust across a range of variance models. In the software that we used (ASReml and samm), the FA model is also generally easier to fit than the US model. The FA algorithm in ASReml (see Thompson et al., 2003) deals with cases where the estimated variance matrix is of reduced rank, whereas for the US model these cannot yet be dealt with efficiently using the available software.

A common practice currently used by plant breeders is the independent assessment of results from each trial. This is equivalent to predictions from the diagonal variance model, which allows for variance heterogeneity but no correlation in genotype performance across trials. The FA model is far superior to this approach, as it captures both variance heterogeneity and a more complex covariance structure at the genetic level, resulting in higher predictive accuracy both for individual trials and for net merit across trials. Likewise, the FA approach is far superior to fitting a simple variance component model, as the latter was the model with the lowest level of predictive accuracy in the simulation study.

The question remains as to why the FA methodology has not been widely adopted for the analysis of MET data, given that it is a robust model with high predictive accuracy. In Australia, plant breeding programs benefit from the application of this methodology in a number of ways. Underlying all selections is the assurance that the genetic structure is being modeled with a more realistic variance model, which was shown here to be far superior to an independent-sites analysis or a simple variance component model. Improved predictive accuracy translates directly to heritability and hence to the genetic gain made in the breeding program. The FA methodology results in improvements in terms of MSEP of 10 to 20% over an independent-sites analysis, or >40% over a simple variance component model.

Other practical gains are in the improved understanding of genotype x environment (g x e) interaction, and of how the set of sites in the current year aligns with previous selection environments. Plots of environmental vectors may be used to summarize the g x e pattern graphically, akin to the biplot methodology of Kempton (1984). Of greater importance is the summary of how cultivar response varies across the whole environmental spectrum. A measure of cultivar stability across the set of environments under study is inherent in the methodology, and can be graphically displayed through a regression of cultivar scores against site loadings (Smith et al., 2002a).
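The regression interpretation can be sketched under a simplifying assumption that is ours rather than taken from the text: a single-factor model in which a cultivar's effect at each site equals its score multiplied by the site loading, plus a residual. In that case the cultivar's score is the slope of a regression through the origin of its effects on the site loadings. The function name and the numbers below are hypothetical.

```python
import numpy as np

def score_from_loadings(site_loadings, cultivar_effects):
    """Least-squares slope through the origin of one cultivar's
    effects on the site loadings (single-factor model assumed)."""
    lam = np.asarray(site_loadings, dtype=float)
    u = np.asarray(cultivar_effects, dtype=float)
    return float(lam @ u / (lam @ lam))

# Hypothetical loadings for four sites and effects for two cultivars.
site_loadings = np.array([0.4, 0.7, 0.9, 1.2])
responsive = 1.5 * site_loadings   # effects rise steeply with the loading
flat = 0.1 * site_loadings         # effects change little across sites

print(score_from_loadings(site_loadings, responsive))
print(score_from_loadings(site_loadings, flat))
```

One possible reading is that a cultivar with a slope near zero varies little along the environmental gradient captured by the loadings, whereas a steep slope indicates a cultivar whose performance changes strongly across sites.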

In this study we have demonstrated the superiority of FA methodology for early-generation trials. Further research will investigate the performance of FA models for breeding trials in the later stages of testing in which there are more environments (typically involving a range of seasons as well as site locations) and the data are often less balanced in the sense that a greater percentage of genotype x environment combinations are missing.


    ACKNOWLEDGMENTS
 
We gratefully acknowledge the financial support of the Grains Research and Development Corporation of Australia, and would like to thank Gabriela Borgognone, Colleen Hunt, and Chris Lisle for supplying data sets used in this study.


    NOTES
 
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Received for publication August 24, 2006.


    REFERENCES




This article has been cited by other articles:


Y.-S. So and J. Edwards
A Comparison of Mixed-Model Analyses of the Iowa Crop Performance Test for Corn
Crop Sci., August 7, 2009; 49(5): 1593-1601.

J. Burgueno, J. Crossa, P. L. Cornelius, and R.-C. Yang
Using Factor Analytic Models for Joining Environments and Genotypes without Crossover Genotype x Environment Interaction
Crop Sci., July 1, 2008; 48(4): 1291-1305.
