a Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, México D.F., México
b Wheat Program, CIMMYT
c Dep. of Plant and Soil Sciences and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40546-0312, USA
d Biometrics and Bioinformatics Unit, International Rice Research Institute (IRRI), DAPO Box 7777, Manila, Philippines
e Dep. of Statistics, University of Madras, Chennai 6000 005, India and IRRI
* Corresponding author (j.crossa{at}cgiar.org)
| ABSTRACT |
|---|
|
|
|---|
Abbreviations: MET, multi-environment trials; GE, genotype × environment interaction; SE, standard error; BLUP, best linear unbiased prediction; SREG, sites regression model; AMMI, additive main effects and multiplicative interaction.
| INTRODUCTION |
|---|
|
|
|---|
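The mean response, $\bar{y}_{ij}$, of the ith genotype (i = 1, 2, ..., g) in the jth environment (j = 1, 2, ..., s) with r replications in each of the g × s cells is expressed as

$$\bar{y}_{ij} = \mu + \tau_i + \delta_j + (\tau\delta)_{ij} + \bar{\varepsilon}_{ij} \qquad [1]$$

where $\mu$ is the grand mean, $\tau_i$ is the main effect of the ith genotype, $\delta_j$ is the main effect of the jth environment, $(\tau\delta)_{ij}$ is the interaction of the ith genotype in the jth environment, and $\bar{\varepsilon}_{ij}$ is the mean of the experimental errors contributing to $\bar{y}_{ij}$, assumed to be NID(0, $\sigma^2/r$), where $\sigma^2$ is the within-site error variance, assumed to be constant.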
A detailed history of the development of the fixed effects linear-bilinear models used in plant breeding is given in Crossa et al. (2001). When the GE is modeled bilinearly, i.e., as $(\tau\delta)_{ij} = \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk}$ (Gollob, 1968; Mandel, 1969, 1971), Eq. [1] is re-expressed as

$$\bar{y}_{ij} = \mu + \tau_i + \delta_j + \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk} + \bar{\varepsilon}_{ij} \qquad [2]$$

where $\lambda_k$ is the singular value for the kth bilinear component, with the $\lambda_k$ ordered as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_t > 0$; $\alpha_{ik}$ is the ith element of the left singular vector for the kth multiplicative (bilinear) component representing genotypic sensitivity to hypothetical environmental factors. The effects of these factors are modeled by the right singular vector for the kth bilinear (multiplicative) component, the elements of which are denoted as $\gamma_{jk}$. The $\alpha_{ik}$ and $\gamma_{jk}$ satisfy the orthonormalization constraints $\sum_i \alpha_{ik}\alpha_{ik'} = \sum_j \gamma_{jk}\gamma_{jk'} = 0$ for $k \ne k'$ and $\sum_i \alpha_{ik}^2 = \sum_j \gamma_{jk}^2 = 1$. Zobel et al. (1988) and Gauch (1988) referred to this model as the Additive Main Effects and Multiplicative Interaction model (AMMI).
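The bilinear decomposition in Eq. [2] can be illustrated with a small numerical sketch. The code below is not the authors' software; it assumes a complete g × s table of genotype-by-site means, and the function name `ammi_decompose` and the simulated table `toy_means` are hypothetical. The interaction residuals are obtained by double centering, and their singular value decomposition supplies $\lambda_k$, $\alpha_{ik}$, and $\gamma_{jk}$.

```python
import numpy as np

def ammi_decompose(means, n_terms):
    """Fit a fixed-effects AMMI model to a g x s table of cell means."""
    means = np.asarray(means, dtype=float)
    mu = means.mean()                       # grand mean
    tau = means.mean(axis=1) - mu           # genotype main effects
    delta = means.mean(axis=0) - mu         # site main effects
    # Interaction residuals after removing the additive part (double centering).
    interaction = means - mu - tau[:, None] - delta[None, :]
    # The SVD returns singular values in decreasing order and
    # orthonormal left and right singular vectors.
    u, sing, vt = np.linalg.svd(interaction, full_matrices=False)
    lam = sing[:n_terms]                    # lambda_k
    alpha = u[:, :n_terms]                  # genotype scores alpha_ik
    gamma = vt[:n_terms].T                  # site scores gamma_jk
    fitted = mu + tau[:, None] + delta[None, :] + (alpha * lam) @ gamma.T
    return mu, tau, delta, lam, alpha, gamma, fitted

# Hypothetical table of means for g = 6 genotypes and s = 4 sites.
rng = np.random.default_rng(0)
toy_means = rng.normal(5.0, 1.0, size=(6, 4))
mu, tau, delta, lam, alpha, gamma, fitted = ammi_decompose(toy_means, n_terms=2)
```

With all min(g − 1, s − 1) terms retained, the fitted values reproduce the table of means exactly; truncating to fewer terms gives the least squares approximation of that rank.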
Another linear-bilinear model form, described by Cornelius et al. (1996), is the Sites Regression Model (SREG)

$$\bar{y}_{ij} = \mu + \delta_j + \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk} + \bar{\varepsilon}_{ij} \qquad [3]$$

where $\lambda_k$, $\alpha_{ik}$, and $\gamma_{jk}$ are similar to their definitions in the AMMI model. However, in the SREG model the main effects of genotype are absorbed into the bilinear terms. The SREG model has been used for grouping environments without statistically significant genotypic rank change (Crossa and Cornelius, 1997; Crossa et al., 2004). The interaction parameters $\alpha_{ik}$ and $\gamma_{jk}$ in the AMMI and SREG linear-bilinear models model the behavior of genotypes and environments. When plots of ($\alpha_{i1}$, $\alpha_{i2}$), i = 1, 2, ..., g and ($\gamma_{j1}$, $\gamma_{j2}$), j = 1, 2, ..., s are overlaid as a biplot (Gabriel, 1978), useful interpretations of the relationships between genotypes, environments, and GE are obtained (DeLacy et al., 1996; Yan et al., 2000; Crossa et al., 2002, 2004).

The above statistical models for studying GE have been developed in the context of the two-way fixed effects or random effects models (Cornelius et al., 1996; Cornelius and Crossa, 1999), where the variances within sites are assumed to be equal and all pairwise covariances between sites are zero because sites are assumed to be independent (Crossa et al., 2001; Cornelius et al., 2001). However, heterogeneous variance-covariances usually arise in MET for the following reasons: (i) heterogeneous within-site error variances caused by site-to-site differences in variability among plots with respect to properties that affect realized values of the measured traits, (ii) some sites and/or years are likely to show more genotypic variation than others, and (iii) environmental factors such as soil type, temperature, precipitation, and/or elevation may cause heterogeneous covariances among sites (i.e., some sites are more similar than other sites). Furthermore, traditional statistical analyses of MET have assumed that in each site individual field plot errors are spatially independent and that genotypes are unrelated. These assumptions are not realistic: sites in close geographic locations can be expected to be alike; related genotypes, such as full-sibs, half-sibs, and sister lines, tend to be more alike than unrelated genotypes; and observations made in field plots in close proximity tend to be correlated.
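The SREG fit of Eq. [3] and the biplot coordinates described above can be sketched in the same way. This is an illustrative sketch only: the function name `sreg_biplot_scores` and the simulated table are hypothetical, and the symmetric scaling (each marker multiplied by $\sqrt{\lambda_k}$) is an assumption made here for plotting convenience, since the text plots the unscaled scores.

```python
import numpy as np

def sreg_biplot_scores(means, n_terms=2):
    """Fixed-effects SREG fit of a g x s table of means, with biplot markers."""
    means = np.asarray(means, dtype=float)
    site_means = means.mean(axis=0)           # estimates of mu + delta_j
    # Centering on site means leaves genotype main effects plus GE.
    centered = means - site_means[None, :]
    u, sing, vt = np.linalg.svd(centered, full_matrices=False)
    lam = sing[:n_terms]
    alpha = u[:, :n_terms]                    # genotype scores
    gamma = vt[:n_terms].T                    # site scores
    # Assumed symmetric scaling: sqrt(lambda_k) on both sets of markers,
    # so that geno_xy @ site_xy.T is the fitted bilinear part.
    geno_xy = alpha * np.sqrt(lam)
    site_xy = gamma * np.sqrt(lam)
    return site_means, lam, geno_xy, site_xy

# Hypothetical table of means for g = 6 genotypes and s = 4 sites.
rng = np.random.default_rng(1)
toy_means = rng.normal(5.0, 1.0, size=(6, 4))
site_means, lam, geno_xy, site_xy = sreg_biplot_scores(toy_means, n_terms=2)
```

The first two columns of `geno_xy` and `site_xy` are the coordinates that would be overlaid in a two-dimensional biplot.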
The main feature of mixed linear model methodology, compared with fixed effects modeling, is that it allows modeling not only independent observations, but also heterogeneous and correlated variance-covariance structures. In mixed models, some effects are assumed to have arisen from a distribution (of random effects). This implies that there is (at least conceptually) a broad population of genetic effects and that samples are realized values from that population, and these can be predicted by using best linear unbiased predictors (BLUPs) (Henderson, 1975). Technically, when variance-covariance parameters must be estimated from the data, the resulting BLUPs are actually "empirical" BLUPs (EBLUPs), but in this paper, we will refer to them as BLUPs. Conversely, inferences on fixed effects in the model are restricted only to the observed levels, genotypes or environments, and empirical best linear unbiased estimates (EBLUE) are computed using empirical generalized least squares (EGLS).
Mixed linear models allow accurate prediction of genotypic performance by using covariance structures that consider correlations between sites, years, and plots in the field, as well as genetic associations between relatives. Piepho (1994) used BLUPs of effects of genotypes in an analysis of genotypic adaptability. BLUPs of breeding values of different parental soybean lines were very effective for predicting the performance of crosses using data from past METs (Panter and Allen, 1995a, 1995b); however, they did not include GE in their models.
The first authors to consider the traditional analysis of the regression of the genotypic means on the site means within the framework of a random effects model were Gogel et al. (1995) and Piepho (1997). Piepho (1997) considered a factor analytic variance-covariance structure for modeling a mixed model version of the regression of the genotypic mean on the site means and suggested a mixed model version of AMMI with environments random and genotypes fixed as another variant of factor analytic modeling for GE. Piepho (1998) considered mixed model versions of SREG, genotype regression (GREG) and AMMI with genotypes random. Smith et al. (2002, p. 323–335) described random effects factor analytic models analogous to SREG (Eq. [3]) and AMMI (Eq. [2]) and explained how the multiplicative factor analytic model accommodates random regression coefficients as opposed to scores obtained as elements of eigenvectors derived from the singular value decomposition of the GE matrix of fixed effects. Furthermore, sites may form clusters showing higher genotypic correlations than other site groups, as shown by Crossa et al. (2004), who used factor analytic mixed model theory to confirm clusters of sites formed by SREG with negligible crossover GE. Smith et al. (2005) gave a general formulation of the most common mixed models used for the analysis of MET including the factor analytic model.
In plants, breeding values of genotypes have not been used as extensively as in animal breeding because in plant breeding selection is almost always based on comparisons derived from carefully designed equireplicate experiments of the candidates for selection themselves or on replicated progeny tests of candidates for selection. In animal breeding, selection is based on information from highly unbalanced data sets in which there is a much greater need for BLUPs as a basis for selection. Furthermore, it is only in recent years that there has existed convenient general mixed model or random effects model computing software to accomplish the computation. Some animal breeding computing software would not be very practical in a plant breeding context. In addition, the plant breeder is constrained by time, often requiring the seed of selections to be prepared in time for the following year's nursery and/or yield trials or perhaps even an off-season nursery. These constraints and the lack of expedient software have likely discouraged the conduct of complicated analyses to obtain BLUPs.
BLUPs of breeding values shrink (i.e., adjust) empirical means toward the general mean. BLUPs of breeding values of genotypes are affected not only by the performance of their relatives but also by the number of observations, with the consequence that BLUPs of breeding values of genotypes with few observations shrink (i.e., adjust) the empirical genotype means toward the general mean more than do BLUPs of breeding values of genotypes with more observations. Recently, Bernardo (2002) summarized the interpretation of the genetic effects (additive, dominance, and epistasis) of BLUPs in different crop species. In the context of mixed linear models, BLUPs of breeding values of random genotypes are potentially useful for selecting superior genotypes because BLUPs allow using information from relatives through coefficients of parentage (COP). Information on relatives using COPs or molecular markers is not routinely incorporated in MET analysis, probably for the above mentioned reasons.
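The dependence of shrinkage on the number of observations can be made concrete in the simplest case. The sketch below assumes a one-way random model with unrelated genotypes and known general mean and variance components, which is much simpler than the models fitted in this paper; the function name and the numerical values are hypothetical. Under those assumptions the BLUP of a genotype effect is the deviation of its mean from the general mean multiplied by $n\sigma_g^2/(n\sigma_g^2 + \sigma_e^2)$.

```python
def shrinkage_blup(genotype_mean, general_mean, n_obs, var_g, var_e):
    """BLUP of a genotype effect in a one-way random model (known parameters)."""
    # Weight in [0, 1): approaches 1 as the number of observations grows.
    weight = n_obs * var_g / (n_obs * var_g + var_e)
    return weight * (genotype_mean - general_mean)

# Same empirical deviation (+1.0) observed with few or many observations.
few = shrinkage_blup(6.0, 5.0, n_obs=2, var_g=0.25, var_e=1.0)
many = shrinkage_blup(6.0, 5.0, n_obs=12, var_g=0.25, var_e=1.0)
print(few, many)  # the genotype with 2 observations is shrunk more strongly
```

When relatives are included through the COP, the weights are no longer this simple, but the same principle applies: less information on a genotype means stronger adjustment toward the general mean.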
When individuals inherit copies of the same alleles at a substantial number of loci, the consequence is that they tend to show phenotypic resemblance due to genetic relationship (genetic covariance). The genetic covariance between relatives enhances breeding progress because it allows information from relatives to be optimally combined (or nearly so) with direct information on candidates into a computed selection criterion. The genetic covariance between any pair of related individuals (i and i'), because of their additive genetic effects, is equal to two times the COP between the strains, $f_{ii'}$, multiplied by the population additive genetic variance, $\sigma_a^2$ (Kempthorne, 1969). The COP is also known as the coefficient of coancestry (Falconer, 1989), and the matrix $\mathbf{A} = 2[f_{ii'}]$ is the additive relationship matrix (Henderson, 1976). Thus, $\mathbf{A}\sigma_a^2$ is the variance-covariance matrix of the breeding values (additive genetic effects). Closely related individuals contribute more to the prediction of breeding values of their relatives than do less closely related genotypes. Moreover, when one genotype is missing (either partially or totally), its breeding value can still be predicted from its relatives, albeit less efficiently than if the data were complete.
Henderson's (1975) development of "mixed model equations" (MME) enabled computation of generalized least squares (GLS) estimates of fixed effects and prediction of realized values of random effects (BLUPs) without having to construct and invert the covariance matrix V of the vector of response values (y). Henderson (1976) further obtained a shortcut method for computing the inverse of the relationship matrix involved in the MME in animal and plant breeding problems. These developments provided breeders with an efficient tool for genetic selection. However, breeding values can be expected to differ under different environmental conditions and agricultural production systems. The prediction of breeding values in the context of a MET to simultaneously model GE and the relationship between genotypes through COPs does not seem to have been thoroughly studied. The objective of this study is to show how to use, and to report results from using, different mixed models for modeling the main effects of genotypes and GE using information on related genotypes for predicting breeding values of wheat (Triticum aestivum L.) genotypes evaluated in MET. A CIMMYT international wheat MET is used as an example.
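A small numerical sketch may help to show how the MME use the relationship matrix. It assumes a single-site model $y = Xb + Zu + e$ with $\mathrm{Var}(u) = \mathbf{A}\sigma_a^2$, $\mathrm{Var}(e) = \mathbf{I}\sigma_e^2$, and known variance components; the COP values, the data, and the function name `solve_mme` are hypothetical. For brevity the sketch inverts $\mathbf{A}$ directly rather than using Henderson's (1976) shortcut. Line 2 has no observations, so its breeding value is predicted only through its sister, line 1.

```python
import numpy as np

def solve_mme(y, X, Z, A, var_a, var_e):
    """Solve Henderson's mixed model equations for y = Xb + Zu + e."""
    ratio = var_e / var_a
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + np.linalg.inv(A) * ratio],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return solution[:n_fixed], solution[n_fixed:]

# Hypothetical COP for three inbred lines; lines 1 and 2 are sister lines.
cop = np.array([[1.00, 0.75, 0.00],
                [0.75, 1.00, 0.00],
                [0.00, 0.00, 1.00]])
A = 2.0 * cop                               # additive relationship matrix

# Two plots of line 1, none of line 2, three plots of line 3.
y = np.array([5.2, 5.6, 4.1, 4.4, 4.0])
X = np.ones((5, 1))                         # general mean as the only fixed effect
Z = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
var_a, var_e = 0.3, 0.5                     # assumed known variance components

b_hat, u_hat = solve_mme(y, X, Z, A, var_a, var_e)
print(b_hat, u_hat)
```

In this example the BLUP for the unobserved line 2 equals 0.75 times the BLUP of line 1, the ratio of their additive covariance to the additive variance of line 1.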
| MATERIALS AND METHODS |
|---|
|
|
|---|
Mixed Model 1 with Covariance between Relatives
Combining Main Effects of Genotype and Genotype x Environment Interaction
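Mixed Model 1 (MM1), used for fitting the data from g genotypes, s sites, and r replicates (in each site) and assuming the relationship of the genotypes is measured by the matrix COP = [$f_{ii'}$] (of order g), is

$$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_r\mathbf{r} + \mathbf{Z}_{g1}\mathbf{g}_1 + \mathbf{e}$$

where $\mathbf{b}$ is the vector of fixed effects of sites with design matrix $\mathbf{X}$; $\mathbf{r}$ and $\mathbf{g}_1$ are the vectors of random effects of replicates within sites and of genotypes within sites, with incidence matrices $\mathbf{Z}_r$ and $\mathbf{Z}_{g1}$; and $\mathbf{e}$ is the vector of residuals. The random vectors $\mathbf{r}$, $\mathbf{g}_1$, and $\mathbf{e}$ are assumed to be independently distributed as $N(\mathbf{0}, \mathbf{R})$, $N(\mathbf{0}, \mathbf{G}_1)$, and $N(\mathbf{0}, \mathbf{E})$, respectively. The variance-covariance matrix of $\mathbf{Y}$ is $V(\mathbf{Y}) = \mathbf{Z}_r\mathbf{R}\mathbf{Z}_r' + \mathbf{Z}_{g1}\mathbf{G}_1\mathbf{Z}_{g1}' + \mathbf{E}$.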
A more detailed representation of these matrices, similar to that given by Crossa et al. (2004), is as follows:
![]() |
Here, $\mathbf{R} = \boldsymbol{\Sigma}_r \otimes \mathbf{I}_r$ and $\mathbf{E} = \boldsymbol{\Sigma}_e \otimes \mathbf{I}_{rg}$, where $\mathbf{I}_r$ and $\mathbf{I}_{rg}$ are identity matrices of orders r and rg, respectively, $\boldsymbol{\Sigma}_r$ and $\boldsymbol{\Sigma}_e$ are the replicate and residual variance matrices, respectively, and $\otimes$ is the Kronecker (or direct) product operator. In covariance pattern models, it is assumed that residuals have a multivariate normal distribution with zero means and covariance matrix $\mathbf{E}$. In this study, the structure of $\mathbf{E}$ assumes that the residuals of the field plots at each site (i.e., elements of vector $\mathbf{e}$) are not spatially correlated, that is, $\mathbf{E} = \boldsymbol{\Sigma}_e \otimes \mathbf{I}_{rg}$. However, although we do not do so in this paper, when the field location of the plots is recorded, the matrix $\mathbf{E}$ could be modeled using, for example, the two-dimensional autoregressive procedure in the direction of the rows and columns in the field (Gilmour et al., 1997).
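The variance-covariance matrix $\mathbf{G}_1$, which combines the main effect of genotypes (breeding values) and GE, can be represented as:

$$\mathbf{G}_1 = \boldsymbol{\Sigma}_{g1} \otimes \mathbf{A} = \begin{bmatrix} \sigma_{a_1}^2 & \rho_{12}\sigma_{a_1}\sigma_{a_2} & \cdots & \rho_{1s}\sigma_{a_1}\sigma_{a_s} \\ \rho_{12}\sigma_{a_1}\sigma_{a_2} & \sigma_{a_2}^2 & \cdots & \rho_{2s}\sigma_{a_2}\sigma_{a_s} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1s}\sigma_{a_1}\sigma_{a_s} & \rho_{2s}\sigma_{a_2}\sigma_{a_s} & \cdots & \sigma_{a_s}^2 \end{bmatrix} \otimes \mathbf{A}$$

where the jth diagonal element of the s × s matrix $\boldsymbol{\Sigma}_{g1}$ is the additive genetic variance $\sigma_{a_j}^2$ within the jth site, and the jj'th element is the additive genetic covariance $\rho_{jj'}\sigma_{a_j}\sigma_{a_{j'}}$ between sites j and j'; thus $\rho_{jj'}$ is the correlation of additive genetic effects between sites j and j'. The matrix $\mathbf{A} = 2[f_{ii'}]$ is of order g × g and measures the relationship or covariance between relatives due to additive genetic effects. When genotypes are not related, $\mathbf{A}$ is replaced by $\mathbf{I}_g$ (identity matrix of order g) (Smith et al., 2002; Crossa et al., 2004), and the breeding value of each genotype will be predicted only by the value of the empirical responses of the genotype itself.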
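The Kronecker structure of $\mathbf{G}_1$ can be assembled numerically as follows. The sketch assumes that the elements of $\mathbf{g}_1$ are ordered as genotypes within sites; the site standard deviations, the correlation, the relationship matrix, and the function name `genetic_covariance` are all hypothetical illustrations, not estimates from the wheat trial.

```python
import numpy as np

def genetic_covariance(sd_sites, corr_sites, A):
    """Build Sigma_g1 from site SDs and correlations, then G1 = Sigma_g1 (x) A."""
    sd = np.asarray(sd_sites, dtype=float)
    # Element (j, j') is rho_jj' * sigma_aj * sigma_aj'; the diagonal is sigma_aj^2.
    sigma_g1 = np.outer(sd, sd) * np.asarray(corr_sites, dtype=float)
    return sigma_g1, np.kron(sigma_g1, A)

# Hypothetical values for s = 2 sites and g = 3 genotypes.
sd_sites = [0.6, 0.9]
corr_sites = np.array([[1.0, 0.5],
                       [0.5, 1.0]])
A = np.array([[2.0, 1.5, 0.0],
              [1.5, 2.0, 0.0],
              [0.0, 0.0, 2.0]])

sigma_g1, G1 = genetic_covariance(sd_sites, corr_sites, A)
```

Each g × g block of `G1` is a multiple of `A`: diagonal blocks carry the within-site additive variances and off-diagonal blocks carry the between-site additive covariances. Replacing `A` with an identity matrix gives the structure used when genotypes are treated as unrelated.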
The mixed model equations and the solution for the vector of fixed effects of site means ($\hat{\mathbf{b}}$) and the vectors of random effects ($\hat{\mathbf{r}}$, $\hat{\mathbf{g}}_1$) are obtained following Henderson (1975).
Mixed Model 2 with Covariance between Relatives
Distinguishing Main Effects of Genotype from Genotype x Environment Interaction
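Mixed Model 2 (MM2), for fitting g genotypes, s sites, and r replicates, assuming that the covariance between relatives is twice the COP matrix [$f_{ii'}$] multiplied by the additive genetic variance $\sigma_a^2$, is

$$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_r\mathbf{r} + \mathbf{Z}_g\mathbf{g} + \mathbf{Z}_{ge}\mathbf{ge} + \mathbf{e}$$

where $\mathbf{g}$ and $\mathbf{ge}$ are the vectors of random main effects of genotypes and of GE, with incidence matrices $\mathbf{Z}_g$ and $\mathbf{Z}_{ge}$, and the remaining terms are as defined for MM1. The random vectors $\mathbf{r}$, $\mathbf{g}$, $\mathbf{ge}$, and $\mathbf{e}$ are assumed to be independently distributed as $N(\mathbf{0}, \mathbf{R})$, $N(\mathbf{0}, \mathbf{G})$, $N(\mathbf{0}, \mathbf{GE})$, and $N(\mathbf{0}, \mathbf{E})$, respectively. The variance of $\mathbf{Y}$ is $V(\mathbf{Y}) = \mathbf{Z}_r\mathbf{R}\mathbf{Z}_r' + \mathbf{Z}_g\mathbf{G}\mathbf{Z}_g' + \mathbf{Z}_{ge}\mathbf{GE}\mathbf{Z}_{ge}' + \mathbf{E}$.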
Variance-covariance matrices $\mathbf{R}$ and $\mathbf{E}$ are assumed to have a simple variance component structure, as defined for MM1. The variance-covariance matrix of the main effects of genotypes in MM2 is modeled as $\mathbf{G} = \mathbf{A}\sigma_a^2$, and the variance-covariance of the genotype × environment interaction is modeled as $\mathbf{GE} = \boldsymbol{\Sigma}_{ge} \otimes \mathbf{A}$, where $\mathbf{GE}$ is of the same form as $\mathbf{G}_1$ in MM1 except that, in the notation, $\sigma_{a_1}, \sigma_{a_2}, \ldots, \sigma_{a_j}$ are replaced by $\sigma_{ae_1}, \sigma_{ae_2}, \ldots, \sigma_{ae_j}$, where the ae subscript denotes the additive × environment interaction. The mixed model equations and the solution for the vector of fixed effects ($\hat{\mathbf{b}}$) and random effects ($\hat{\mathbf{r}}$, $\hat{\mathbf{g}}$, $\hat{\mathbf{ge}}$) are similar to those for MM1, but with equations for random effects of genotypes and GE being separately distinguished.
Modeling G1 of Mixed Model 1 and GE of Mixed Model 2
Different structures can be proposed to model matrices $\boldsymbol{\Sigma}_{g1}$ of $\mathbf{G}_1$ and $\boldsymbol{\Sigma}_{ge}$ of $\mathbf{GE}$ for models MM1 and MM2, respectively. This will produce several models within each of the two main mixed model classes, MM1 and MM2. The most restrictive variance-covariance structure is to assume that genetic (in MM1) or GE (in MM2) variances within all sites are equal, and all pairwise correlations between genotypes and between sites are zero. The most liberal structure is the completely unstructured model, which assumes matrices $\boldsymbol{\Sigma}_{g1}$ and $\boldsymbol{\Sigma}_{ge}$ contain s(s + 1)/2 parameters. In this study, we have used two more conservative types of variance-covariance structures for modeling these matrices, namely, compound symmetry and factor analytic models.
Compound symmetry models assume that the correlation between the effects of any one genotype measured in any two different environments is $\rho$, with $-1/(s-1) < \rho < 1$. Furthermore, compound symmetry models can take two forms, differing with respect to whether we assume site-to-site homogeneity or heterogeneity of within-site genetic variances. The form that assumes site-to-site homogeneity of genetic variance (CS model) is such that $\boldsymbol{\Sigma}_{g1}$ or $\boldsymbol{\Sigma}_{ge}$ has the structure $\{\sigma_a^2[(1-\rho)\mathbf{I}_s + \rho\mathbf{J}_s]\}$ (where $\mathbf{J}_s$ is an s × s matrix of ones). Alternatively, the model can incorporate site-to-site heterogeneity of within-environment genetic variance (CSH model), in which case $\boldsymbol{\Sigma}_{g1}$ or $\boldsymbol{\Sigma}_{ge}$ has the structure $\{\mathrm{diag}(\sigma_{a_j})[(1-\rho)\mathbf{I}_s + \rho\mathbf{J}_s]\mathrm{diag}(\sigma_{a_j})\}$. For the CS case, there are two parameters, namely, constant genotypic variance within environments and constant correlation ($\rho$) between effects of a common genotype measured in different sites in the $\mathbf{G}_1$ and the $\mathbf{GE}$ matrices. For the CSH case, there are s + 1 parameters, namely s genetic variances, $\sigma_{a_j}^2$ (j = 1, 2, ..., s), in the diagonal and the homogeneous correlation, $\rho$, in the off-diagonal, so that the covariance for sites j and j' is $\rho\sigma_{a_j}\sigma_{a_{j'}}$.
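The two compound symmetry structures can be written directly from their definitions. The following sketch uses hypothetical variances and a hypothetical correlation; the function names are illustrative and do not correspond to any particular software.

```python
import numpy as np

def compound_symmetry(s, var_a, rho):
    """CS structure: var_a * [(1 - rho) I_s + rho J_s]."""
    return var_a * ((1.0 - rho) * np.eye(s) + rho * np.ones((s, s)))

def heterogeneous_compound_symmetry(sd_sites, rho):
    """CSH structure: diag(sd) [(1 - rho) I_s + rho J_s] diag(sd)."""
    sd = np.asarray(sd_sites, dtype=float)
    s = sd.size
    core = (1.0 - rho) * np.eye(s) + rho * np.ones((s, s))
    return np.diag(sd) @ core @ np.diag(sd)

# Hypothetical parameter values.
cs = compound_symmetry(4, var_a=0.5, rho=0.3)               # 2 parameters
csh = heterogeneous_compound_symmetry([0.5, 1.0, 2.0], 0.3)  # s + 1 = 4 parameters
```

The matrix $(1-\rho)\mathbf{I}_s + \rho\mathbf{J}_s$ has eigenvalues $1-\rho$ and $1+(s-1)\rho$, which is why the correlation must exceed $-1/(s-1)$ for the structure to be a valid covariance matrix.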
The factor analytic structure models covariance among observations in terms of a few hypothetical unobserved factors and can be useful for modeling matrices $\mathbf{G}_1$ and $\mathbf{GE}$ of MM1 and MM2, respectively. MM1 with $\mathbf{G}_1$ modeled using a factor analytic structure is the mixed model counterpart to the SREG model (Eq. [3]). Similarly, MM2 with $\mathbf{GE}$ modeled using a factor analytic structure is analogous to the AMMI model (Eq. [2]). The factor analytic structure with q < s factors or components [FA(q)] is of the form $\boldsymbol{\Lambda}\boldsymbol{\Lambda}' + \mathbf{D}$, where $\boldsymbol{\Lambda}$ is an s × q matrix and $\mathbf{D}$ is an s × s diagonal matrix with s possibly different nonnegative parameters on the diagonal. Each column of $\boldsymbol{\Lambda}$ contains the site scores for one of the multiplicative terms. For q = 1, the model, denoted as FA(1), has one multiplicative term and 2s parameters to be estimated; for q = 2, i.e., model FA(2), the model has 3s − 1 parameters to be estimated; and so on for FA(3), etc. Boundary constraints must be imposed on the solutions to avoid overparameterization (parameter nonidentifiability). This is achieved by fixing some estimates to have values of zero.
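The factor analytic form and its parameter count can be sketched as below. The loadings and specific variances are hypothetical. The general count $sq + s - q(q-1)/2$ is an assumption made here: it is the usual identifiability count for a factor analytic structure and reproduces the 2s and 3s − 1 values stated above, but the number of nonzero parameters actually estimated in a fitted model can be smaller when estimates are fixed at boundary values.

```python
import numpy as np

def fa_covariance(loadings, specific_var):
    """FA(q) structure: Lambda Lambda' + D for an s x q loading matrix."""
    lam = np.asarray(loadings, dtype=float)
    return lam @ lam.T + np.diag(np.asarray(specific_var, dtype=float))

def fa_param_count(s, q):
    """s*q loadings plus s specific variances, minus q(q-1)/2 constraints."""
    return s * q + s - q * (q - 1) // 2

# Hypothetical FA(2) structure for s = 4 sites; one loading is fixed at zero
# as an identifiability constraint.
loadings = np.array([[0.8, 0.0],
                     [0.6, 0.3],
                     [0.5, -0.4],
                     [0.2, 0.7]])
specific_var = [0.10, 0.05, 0.20, 0.00]
sigma_fa = fa_covariance(loadings, specific_var)

print(fa_param_count(16, 1), fa_param_count(16, 2))  # 32 and 47 for s = 16 sites
```

The site-to-site genetic correlations implied by the structure are obtained by scaling `sigma_fa` by the square roots of its diagonal elements.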
Submodels from MM1 and MM2
As previously defined, matrices $\mathbf{G}_1$ of MM1 and $\mathbf{GE}$ of MM2 are each a Kronecker product of two matrices that depend on specific submodels defined below. For simplicity these submodels will be named Model 1 through Model N, and they are described in Table 1. Model 1 forms are the models obtained by regarding vector $\mathbf{g}_1$ in MM1 and vectors $\mathbf{g}$ and $\mathbf{ge}$ in MM2 as fixed effects rather than random, thus absorbing these effects into the $\mathbf{b}$ vector in the two respective models; thus $\mathbf{G}_1$, $\mathbf{G}$, and $\mathbf{GE}$ are not defined in these models. Model 2 comprises the identity matrices $\mathbf{I}_s$ and $\mathbf{I}_g$, such that $\mathbf{g}_1$ of MM1 and $\mathbf{ge}$ of MM2 are fitted as random effects, without including $\mathbf{A}$. Model 3 is the same as Model 2 but replaces $\sigma_g^2$ with $\sigma_a^2$ and matrix $\mathbf{I}_g$ with $\mathbf{A}$. Model 4 includes matrix $\mathbf{A}$ and the diagonal matrix diag($\sigma_{a_j}^2$), where $\sigma_{a_j}^2$ is the additive genetic variance in the jth environment. Model 5 assumes homogeneous compound symmetry (CS). Model 6 assumes heterogeneous compound symmetry (CSH). Model 7 considers a one-component factor analytic structure for the matrix of additive genetic variances and covariances. Model 8 is the same as Model 7 but with a two-component factor analytic structure. Models 9, 10, etc. use a factor analytic structure with three, four, etc. components, until a model is obtained for which information criteria show that the model is overfitted.
The corresponding factor analytic models that use $\mathbf{I}_g$ in place of $\mathbf{A}$ are fitted in the same way until an overfitted model is obtained. The number of covariance parameters in these two model forms [FA(q) $\otimes$ $\mathbf{I}_g$ and FA(q) $\otimes$ $\mathbf{A}$] is the same, provided that the value of q (number of components) and the number of boundary constraints imposed on the solutions are the same for the two models. Variance component estimation and fitting of the different models were done using the Restricted Maximum Likelihood (REML) method and the Average Information algorithm implemented in ASReml (Gilmour et al., 2002). Appropriate selection of initial values for the factor analytic model is important. Our experience with ASReml is that the FA(1) solution provides a good starting point for fitting FA(2), and in general, the FA(k − 1) solution provides a good starting point for fitting FA(k) (k = 2, 3, ...).
Measures of Submodel Adequacy
As the number of parameters in the model increases, the maximum value of the residual log likelihood (RLL) statistic is expected to increase. The residual log likelihood ratio test (Wolfinger, 1993; Brown and Prescott, 1999) and the information criteria given in Table 46.2 of SAS (2004) were used for assessing and comparing models that included the same fixed effects. In this paper we have not considered alternative diagnostics based on cross-validation.
The information criteria included Akaike's Information Criterion, AIC = −2RLL + 2d (where d is the number of independent variance-covariance parameters in the model); Akaike's Information Criterion Corrected for finite sample size, AICC = −2RLL + 2dn/(n − d − 1) (where n is the number of observations); the Hannan and Quinn Information Criterion, HQIC = −2RLL + 2d log[log(n)]; the Schwarz Bayesian Criterion, BIC = −2RLL + d[log(n)]; and the criterion developed by Bozdogan, CAIC = −2RLL + d[log(n) + 1]. For all five criteria, smaller values indicate a better model.
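The five criteria are simple functions of the residual log likelihood, the number of covariance parameters, and the number of observations, as the following sketch shows. The input values are hypothetical and are not taken from the tables of this paper; natural logarithms are assumed.

```python
import math

def information_criteria(rll, d, n):
    """Information criteria from the residual log likelihood (smaller is better).

    rll: maximized residual log likelihood
    d:   number of independent variance-covariance parameters
    n:   number of observations
    """
    m2rll = -2.0 * rll
    return {
        "AIC": m2rll + 2 * d,
        "AICC": m2rll + 2 * d * n / (n - d - 1),
        "HQIC": m2rll + 2 * d * math.log(math.log(n)),
        "BIC": m2rll + d * math.log(n),
        "CAIC": m2rll + d * (math.log(n) + 1),
    }

# Hypothetical model: RLL = -100, 10 covariance parameters, 1392 observations.
crit = information_criteria(rll=-100.0, d=10, n=1392)
for name, value in crit.items():
    print(f"{name}: {value:.2f}")
```

Because the penalty per parameter grows from AIC to CAIC, the criteria can disagree; BIC and CAIC tend to favor models with fewer components than AIC.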
Biplots of Fitted Submodels
Biplots of all fitted models can be obtained from the two-way table of g genotypes and s sites. Biplots of fitted factor analytic models can be obtained directly from the scores of the genotypes and the loadings of the sites. Biplots of Submodel 1 are the usual biplots for the SREG and AMMI fixed effects models.
The Coefficient of Parentage Matrix
The COP between individuals i and i' is the probability that an allele from a randomly selected locus in individual i is identical by descent with an allele randomly selected from the same locus in individual i' (Cockerham, 1971). The matrix of coefficients of parentage (COP) is computed following the Cockerham (1983) approach using the coefficient of inbreeding and the coefficients of parentage between crosses and within crosses. In the case of successive generations of self-fertilization, sister lines have different COP values depending on the generation at which the last common ancestor is present. Therefore, COPs between sister lines tend to be different, and higher if the generation of the last common ancestor is closer to the generation at which the lines are present. The software used for deriving the relationship matrix was the Browse application of the International Crop Information System (ICIS) described in McLaren et al. (2005), which accounts for selection as well as inbreeding and improves the accuracy of breeding value estimation. The Browse application is described at http://cropwiki.irri.org/icis/index.php/TDM_GMS_Browse; verified 27 March 2006.
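The recursive logic behind a COP matrix can be illustrated with the classical tabular method. This sketch is a deliberate simplification and is not the Cockerham (1983) procedure or the ICIS Browse algorithm used in this study: it assumes non-inbred founders, no selfing generations, and no selection, and the pedigree and function name are hypothetical.

```python
def coefficient_of_parentage(pedigree):
    """Tabular method for COP, ignoring selfing and selection.

    pedigree: list of (name, parent1, parent2), with parents listed before
    their progeny and None for an unknown parent.
    """
    names = [entry[0] for entry in pedigree]
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    f = [[0.0] * n for _ in range(n)]
    for j, (_, p1, p2) in enumerate(pedigree):
        a, b = index.get(p1), index.get(p2)
        for i in range(j):
            # COP with an earlier individual is the average COP of that
            # individual with the two parents of j.
            f_a = f[i][a] if a is not None else 0.0
            f_b = f[i][b] if b is not None else 0.0
            f[i][j] = f[j][i] = 0.5 * (f_a + f_b)
        # COP of an individual with itself: 0.5 * (1 + COP between its parents).
        parent_cop = f[a][b] if a is not None and b is not None else 0.0
        f[j][j] = 0.5 * (1.0 + parent_cop)
    return names, f

# Hypothetical pedigree: S1 and S2 are full sibs from P1 x P2; U is unrelated.
pedigree = [("P1", None, None), ("P2", None, None), ("U", None, None),
            ("S1", "P1", "P2"), ("S2", "P1", "P2")]
names, f = coefficient_of_parentage(pedigree)
A = [[2.0 * value for value in row] for row in f]   # additive relationship matrix
```

For highly inbred sister lines derived by selfing, the COP values are larger than those produced by this simplified sketch, which is why the generation of the last common ancestor matters.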
Experimental Data
The data used are from a CIMMYT bread wheat international trial. Twenty-nine lines (1–29) were tested in 16 international sites, namely Mexico (MEX), USA (two sites, USA1 and USA2), Turkey (TKY), Israel (ISR), Bangladesh (BGD), India (IND), Pakistan (PKT), Syria (SYR), Spain (two sites, SPN1 and SPN2), Nepal (NPL), Kenya (KNY), Zimbabwe (ZBW), New Zealand (NZL), and Chile (CHL), in randomized complete block designs with three replications at each site. The response variable analyzed was grain yield (Mg ha⁻¹). There were five sets of sister lines: (5, 6), (4, 19, 20), (14, 15), (21, 23, 24), and (28, 29).
| RESULTS |
|---|
|
|
|---|
Fitting the Submodels from MM1
When fixed Model 1 was fitted, the F test resulted in a significant effect for the combination of line main effect and GE for grain yield (P < 0.05). For Model 1, the $F_R$ test of Cornelius et al. (1992), which assesses the significance of residual variation after fitting the first k − 1 multiplicative components, found no significant residual (P > 0.05) after fitting the seventh multiplicative component, and, similarly, the $F_{GH1}$ test (Cornelius et al., 1996) used for judging the significance of sequentially fitted multiplicative terms found seven significant terms (P < 0.05).
Values for the −2 × logarithm of the restricted likelihood function (−2RLL), five information criteria, standard error of line mean differences, and residual variances for submodels of MM1 for grain yield are shown in Table 2. The model that best fit the data was Model 15 [FA(9) $\otimes$ $\mathbf{A}$] by the AIC, AICC, HQIC, BIC, and CAIC criteria. Model 16 is less parsimonious than Model 15 because it estimated five more nonzero parameters (nep = 116) than Model 15 (nep = 111), and the difference in −2RLL was 208 − 207 = 1, which is not significant by the $\chi^2$ test with 5 degrees of freedom (P = 0.925). On the other hand, Model 14 estimated seven fewer parameters than Model 15, and the difference in −2RLL was 50, which is highly significant by the $\chi^2$ test with 7 degrees of freedom (P = 0.000). This is consistent with the observation that all of the information criteria were poorer (i.e., larger) when going from Model 15 to Model 16 but better (i.e., smaller) when going from Model 14 to Model 15. It is interesting to note that, although the average standard error of the difference between BLUPs of any two genotypes was not used as a model selection criterion, Model 15 had the smallest average standard error of a difference (0.380, Table 2). Among the factor analytic models that exclude COP, the FA(3) $\otimes$ $\mathbf{I}_g$ model gave the best fit, but it performed worse than factor analytic models that included COP in the model definition (e.g., Model 15).
Although the factor analytic model with two components, Model 8 [FA(2)] of MM1, was significantly different from Model 15 [FA(9)] (P = 0.000 for $\chi^2$ = 589 with 66 df), the correlation between the BLUPs obtained from Model 8 and Model 15 was 0.998, indicating the similarity between the two models [FA(2) and FA(9)] in predicting the BLUPs of the breeding values of the lines included in this experiment. Furthermore, all correlations between the BLUPs obtained from pairs of models among Models 8 through 15 were ≥ 0.98, indicating similarities between these models in predicting the breeding value of the lines. Although the estimates of Models 8 and 15 were not identical, a correlation of 0.97 between site covariances from FA(2) and FA(9) was found. Relatively high correlations were estimated, among others, between ZBW and ISR, USA1, USA2, and SPN1, as well as between ISR and USA1, USA2, and MEX.
Standard Errors of BLUPs of Breeding Values of Genotypes in Sites
Standard errors (SE) of BLUPs of breeding values of the lines for grain yield ranged from 0.18 to 0.44 Mg ha⁻¹. In general, SEs were smaller for models using information on relatives and when the main effect of genotypes and GE were modeled using a factor analytic structure for the $\mathbf{G}_1$ variance-covariance matrix (Fig. 1). The fixed Model 1 had the largest SEs, followed by Model 2 (without $\mathbf{A}$) and Model 3 (with $\mathbf{A}$). In contrast, Model 15 had the smallest SE for most of the line-environment combinations and also was the model with the smallest values for all the model selection criteria (Table 2). These results suggest that Model 15 gave more precise BLUPs of breeding values of the lines than the other models.
The benefits of including information on related lines in terms of precision are evident when comparing Models 2 and 3; Model 3 is similar to Model 2 but includes information on relatives. It is clear that the SEs of sister lines were always smaller than the SEs of genotypes that had no relatives in the trial in all cases except for Models 1 and 2, which do not incorporate this information. When modeling the main effects of genotype and GE in conjunction with the information on relatives, the improvement in the precision of the BLUP of the sisters as well as the other lines is evident.
Biplots
The biplots of Model 1 (excluding matrix $\mathbf{A}$) and Model 15 [FA(9) including matrix $\mathbf{A}$] of MM1 are depicted in Fig. 2 and 3, respectively. The biplot of Model 1 (Fig. 2) is the usual biplot, which shows Genotypes 19 and 20 as having a positive response in terms of genotype main effect and GE for most of the sites because they are in the same direction. Genotypes 16, 18, and 25, located on the opposite side of the biplot, have a negative response in all sites. Sites located farther away from the center, such as NZL, USA1, ZBW, ISR, and SPN2, are the ones that discriminate the genotypes the most. Pairs of sister lines 19 and 20 and 14 and 15 are distinct from the others. Sister lines are scattered throughout the two dimensions of the biplot.
Although the general pattern of response in terms of directions and projections of genotypes and sites in the biplot was not altered by using different models, the inclusion of information between relatives in conjunction with modeling the main effects of genotypes and GE by means of a factor analytic structure offers the best option for predicting breeding values of genotypes across different environments and studying main effects of genotypes and GE. Model 15 gave a more realistic view of the relationship between the genotypes themselves, between sites, and between their interactions.
Fitting the Submodels from MM2
For fixed Model 1 of MM2, the $F_R$ test of Cornelius et al. (1992) found no significant residual (P > 0.05) after fitting the sixth multiplicative component, whereas the $F_{GH1}$ test (Cornelius et al., 1996) used for judging the significance of sequentially fitted multiplicative terms found only five significant terms (P < 0.05).
When only the GE was modeled by the various submodels from MM2, the submodel that best fit the data according to the model-fitting criteria was Model 13 [FA(7) $\otimes$ $\mathbf{A}$] for all five information criteria (AIC, AICC, HQIC, BIC, and CAIC) (Table 3). Model 14 [FA(8) $\otimes$ $\mathbf{A}$] estimated six more nonzero parameters (nep = 97) than Model 13 (nep = 91), and the difference in −2RLL was 320 − 316 = 4, which is not significant by the $\chi^2$ test with 6 degrees of freedom (P = 0.677). Model 12 estimated eight fewer parameters than Model 13, and the difference between their −2RLL values was 52, which is highly significant by the $\chi^2$ test with 8 degrees of freedom (P = 0.000) (Table 3). This is consistent with the observation that all of the information criteria were poorer (i.e., larger) when going from Model 13 to Model 14 but better (i.e., smaller) when going from Model 12 to Model 13. Although it is not known whether the average standard error of the difference between BLUPs of any two genotypes is a reliable model selection criterion, Model 13 had the smallest average standard error of a difference (0.383, Table 3).
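The residual likelihood ratio comparisons above reduce to evaluating an upper-tail $\chi^2$ probability for the difference in −2RLL between nested models. The sketch below uses only the Python standard library; the helper names are hypothetical, and the $\chi^2$ reference distribution is treated as given, as in the text.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    half = x / 2.0
    # Closed-form starting values for df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def likelihood_ratio_test(m2rll_reduced, m2rll_full, extra_params):
    """Difference in -2RLL between nested models and its chi-square P value."""
    stat = m2rll_reduced - m2rll_full
    return stat, chi2_sf(stat, extra_params)

# Model 13 versus Model 14 of MM2: -2RLL of 320 and 316, six extra parameters.
stat, p_value = likelihood_ratio_test(320.0, 316.0, 6)
print(stat, round(p_value, 3))

# Model 12 versus Model 13 of MM2: difference of 52 with eight extra parameters.
p_small = chi2_sf(52.0, 8)
```

The first comparison reproduces the reported P = 0.677; the second gives a P value far below 0.001, which is printed as 0.000 in the text.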
Among the models that exclude COP, Model … [FA(1) $\otimes$ $\mathbf{I}_g$] was the best according to the AIC, AICC, BIC, and CAIC information criteria, and Model … [FA(1) $\otimes$ $\mathbf{I}_g$] for HQIC; however, these models were much worse than the best overall Model 13 (Table 3).
Standard Errors of the BLUPs of Breeding Values of Genotypes in Sites
The standard errors of the predicted BLUPs of the breeding values for Models 1 through 3 and Model 13 of MM2 (Fig. 4) followed trends similar to those observed for Models 1 through 3 and Model 15 of MM1. Model 13 was the most precise model in terms of SEs as compared with Models 1 through 3, except for a few site–genotype combinations. In all cases, the SEs of BLUPs of lines that had sister lines in the study were smaller than the SEs of the BLUPs of lines that had no sister lines in the study. These results, together with the finding that Model 13 had the smallest values for all the model selection criteria, indicate the higher precision of Model 13 of MM2 over the other submodels for predicting BLUPs of breeding values of genotypes by modeling GE using the coefficient of parentage together with the factor analytic model. The correlations between the BLUPs obtained from Model 8 [FA(2)] and Model 13 [FA(7)] were all above 0.99.
| DISCUSSION |
|---|
|
|
|---|
In this study, the relationship matrix A was calculated by tracing the ancestries several generations back. However, one of the problems of using A is that selection is not considered; therefore, some relationships may be greater than those estimated in A (especially if several generations of selection have occurred). Information on molecular markers can also be used to relate genotypes and, as pointed out by Bernardo (2002), BLUP analysis of trait and marker data may be helpful for finding chromosome regions associated with quantitative traits.
Results of this study show the sensitivity of the factor analytic structure for simultaneously modeling the complex correlation among environments and among wheat genotypes and their interactions. The factor analytic model with random coefficients is analogous to the reaction norm model used for assessing the response of genotypic traits in several environments in terms of daily milk production. In this instance, the quantitative trait is written as a function of a random regression coefficient representing the average effect of gene substitution and some unobservable environmental factors (De Jong and Bijma, 2002).
From a plant breeder's perspective, Models MM1 and MM2 are useful for studying association between effects of sites and additive effects of genotypes being evaluated. As shown by Crossa et al. (2004), MM1 allows identification of subsets of sites and genotypes with low frequency and/or magnitude of crossover GE interactions. However, since in the MM1 model genotype main effects and GE interaction effects are not separated, specific interactions of particular genotypes with particular environments are not easily studied. On the other hand, the MM2 model is not sensitive to crossover GE interaction, but allows studying specific adaptation of genotypes to environment.
This study shows how main effects of lines and GE can be modeled with different statistical models in conjunction with information about additive covariances between relatives. Results show that modeling the GE using a factor analytic variance-covariance structure and additive genetic covariance between relatives increases the precision of the prediction of the breeding value. Further research is required to study the incorporation of other types of relationships between relatives (such as the covariances between relatives due to epistatic genetic effects) into the statistical models used in this study for modeling the main effects of genotypes and GE.
Received for publication November 17, 2005.
| REFERENCES |
|---|
|
|
|---|