Crop Science Journal of Natural Resources and Life Sciences Education
Published online 20 June 2006
Published in Crop Sci 46:1722-1733 (2006)
© 2006 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING & GENETICS

Modeling Genotype × Environment Interaction Using Additive Genetic Covariances of Relatives for Predicting Breeding Values of Wheat Genotypes

Jose Crossa (a,*), Juan Burgueño (a), Paul L. Cornelius (c), Graham McLaren (d), Richard Trethowan (b), and Anitha Krishnamachari (e)

a Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, México D.F., México
b Wheat Program, CIMMYT
c Dep. of Plant and Soil Sciences and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40546-0312, USA
d Biometrics and Bioinformatics Unit, International Rice Research Institute (IRRI), DAPO Box 7777, Manila, Philippines
e Dep. of Statistics, University of Madras, Chennai 6000 005, India and IRRI

* Corresponding author (j.crossa{at}cgiar.org)


    ABSTRACT
In plant breeding, multienvironment trials (MET) may include sets of related genetic strains. In self-pollinated species the covariance matrix of the breeding values of these genetic strains is equal to the additive genetic covariance among them. This can be expressed as an additive relationship matrix, A, multiplied by the additive genetic variance. Using Mixed Model Methodology, the genetic covariance matrix can be estimated and Best Linear Unbiased Predictors (BLUPs) of the breeding values obtained. The effectiveness of exploiting relationships among strains tested in METs and usefulness of these BLUPs of breeding values for simultaneously modeling the main effects of genotypes and genotype × environment interaction (GE) have not been thoroughly studied. In this study, we obtained BLUPs of breeding values using genetic variance–covariance structures constructed as the Kronecker product (direct product) of a structured matrix of genetic variances and covariances for sites and a matrix of genetic relationships between strains, A. Results are compared with those from traditional fixed effects and random effects models for studying GE ignoring genetic relationships. A CIMMYT international wheat trial was used for illustration. Results showed that direct products of factor analytic structures with matrix A efficiently model the main effects of genotypes and GE. These models showed the lowest standard error of the BLUPs [SE(BLUP)] of breeding values. Genotypes that were related to other genotypes had small SE(BLUP). Related genotypes can clearly be visualized in biplots.

Abbreviations: MET, multienvironment trials • GE, genotype × environment interaction • SE, standard error • BLUP, Best Linear Unbiased Prediction • SREG, Site Regression model • AMMI, Additive Main effect and Multiplicative Interaction


    INTRODUCTION
MULTIENVIRONMENT TRIALS are important in plant breeding and agriculture because genotypes are evaluated in different environmental conditions, their responses are compared, their overall stability and adaptability are assessed, the genotype × environment interaction (GE) is studied, and the best genotypes in specific environments and across environments are selected for further testing. An early approach to the analysis of GE used the fixed effects two-way analysis of variance model where the empirical mean response, $\bar{y}_{ij}$, of the $i$th genotype ($i = 1, 2, \ldots, g$) in the $j$th environment ($j = 1, 2, \ldots, s$) with $r$ replications in each of the $g \times s$ cells is expressed as

$$\bar{y}_{ij} = \mu + \tau_i + \delta_j + (\tau\delta)_{ij} + \bar{\varepsilon}_{ij} \tag{1}$$

where $\mu$ is the grand mean over all genotypes and environments, $\tau_i$ is the main effect of the $i$th genotype, $\delta_j$ is the main effect of the $j$th environment, $(\tau\delta)_{ij}$ is the interaction of the $i$th genotype in the $j$th environment, and $\bar{\varepsilon}_{ij}$ is the mean of the experimental errors contributing to $\bar{y}_{ij}$, assumed to be NID$(0, \sigma^2/r)$, where $\sigma^2$ is the within-site error variance, assumed to be constant.

A detailed history of the development of the fixed effects linear-bilinear models used in plant breeding is given in Crossa et al. (2001). When the GE is modeled bilinearly, i.e., as $(\tau\delta)_{ij} = \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk}$ (Gollob, 1968; Mandel, 1969, 1971), Eq. [1] is re-expressed as

$$\bar{y}_{ij} = \mu + \tau_i + \delta_j + \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk} + \bar{\varepsilon}_{ij} \tag{2}$$

where the constant $\lambda_k$ is the singular value for the $k$th bilinear component with the $\lambda_k$ ordered as $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_t \geq 0$; $\alpha_{ik}$ is the $i$th element of the left singular vector for the $k$th multiplicative (bilinear) component representing genotypic sensitivity to hypothetical environmental factors. The effects of these factors are modeled by the right singular vector for the $k$th bilinear (multiplicative) component, the elements of which are denoted as $\gamma_{jk}$. The $\alpha_{ik}$ and $\gamma_{jk}$ satisfy the orthonormalization constraints $\sum_i \alpha_{ik}\alpha_{ik'} = \sum_j \gamma_{jk}\gamma_{jk'} = 0$ for $k \neq k'$ and $\sum_i \alpha_{ik}^2 = \sum_j \gamma_{jk}^2 = 1$. Zobel et al. (1988) and Gauch (1988) referred to this model as the Additive Main Effects and Multiplicative Interaction model (AMMI).

Another linear-bilinear model form, described by Cornelius et al. (1996), is the Sites Regression Model (SREG)

$$\bar{y}_{ij} = \mu_j + \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk} + \bar{\varepsilon}_{ij} \tag{3}$$

where $\mu_j$ is the mean of the $j$th site and definitions of parameters $\lambda_k$, $\alpha_{ik}$, and $\gamma_{jk}$ are similar to their definitions in the AMMI model. However, in the SREG model the main effects of genotype are absorbed into the bilinear terms. The SREG model has been used for grouping environments without statistically significant genotypic rank change (Crossa and Cornelius, 1997; Crossa et al., 2004). The interaction parameters $\alpha_{ik}$ and $\gamma_{jk}$ in the AMMI and SREG linear-bilinear models model the behavior of genotypes and environments. When plots of $(\alpha_{i1}, \alpha_{i2})$, $i = 1, 2, \ldots, g$ and $(\gamma_{j1}, \gamma_{j2})$, $j = 1, 2, \ldots, s$ are overlaid as a biplot (Gabriel, 1978), useful interpretations of the relationships between genotypes, environments, and GE are obtained (DeLacy et al., 1996; Yan et al., 2000; Crossa et al., 2002, 2004).
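For a complete (balanced) table of cell means, the least squares fit of the bilinear terms in Eq. [2] and Eq. [3] can be obtained from a singular value decomposition of the residuals left after removing the additive part of the model. The following is a minimal sketch of that idea, not the software used in this study; the function name `fit_bilinear` and the random 6 × 4 table are hypothetical and serve only as illustration.

```python
import numpy as np

def fit_bilinear(Y, model="AMMI", t=2):
    """Least squares fit of a linear-bilinear model to a g x s table of means.

    model="AMMI": grand mean, genotype and site main effects are removed first.
    model="SREG": only the site means are removed first.
    Returns the additive part, the first t singular values, the genotype
    scores (left singular vectors) and the site scores (right singular vectors).
    """
    Y = np.asarray(Y, dtype=float)
    site_means = Y.mean(axis=0, keepdims=True)        # 1 x s
    if model == "AMMI":
        geno_means = Y.mean(axis=1, keepdims=True)    # g x 1
        additive = geno_means + site_means - Y.mean()
    elif model == "SREG":
        additive = np.broadcast_to(site_means, Y.shape).copy()
    else:
        raise ValueError("model must be 'AMMI' or 'SREG'")
    residual = Y - additive
    U, lam, Vt = np.linalg.svd(residual, full_matrices=False)
    return additive, lam[:t], U[:, :t], Vt[:t].T

# Hypothetical table of means: 6 genotypes x 4 sites (random numbers, not data).
rng = np.random.default_rng(0)
Y = rng.normal(5.0, 1.0, size=(6, 4))

additive, lam, alpha, gamma = fit_bilinear(Y, model="SREG", t=2)
fitted = additive + (alpha * lam) @ gamma.T   # rank-2 SREG fit

# Symmetrically scaled coordinates for overlaying genotypes and sites in a biplot.
geno_xy = alpha * np.sqrt(lam)
site_xy = gamma * np.sqrt(lam)
```

The symmetric scaling by the square root of the singular values is one common biplot convention among several; the inner product of a genotype point and a site point then reproduces the fitted bilinear term for that cell.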

The above statistical models for studying GE have been developed in the context of the two-way fixed effects or random effects models (Cornelius et al., 1996; Cornelius and Crossa, 1999), where the variances within sites are assumed to be equal and all pairwise covariances between sites are zero because sites are assumed to be independent (Crossa et al., 2001; Cornelius et al., 2001). However, heterogeneous variance-covariances usually arise in MET for the following reasons: (i) heterogeneous within-site error variances caused by site-to-site differences in variability among plots with respect to properties that affect realized values of the measured traits, (ii) some sites and/or years are likely to show more genotypic variation than others, and (iii) environmental factors such as soil type, temperature, precipitation, and/or elevation may cause heterogeneous covariances among sites (i.e., some sites are more similar than other sites). Furthermore, traditional statistical analyses of MET have assumed that in each site individual field plot errors are spatially independent and that genotypes are unrelated. These assumptions are not realistic because sites in close geographic locations can be expected to be alike, related genotypes, such as full-sibs, half-sibs, sister lines, etc., tend to be more alike than unrelated genotypes, and observations made in field plots in close proximity tend to be correlated.

The main feature of mixed linear model methodology, compared with fixed effects modeling, is that it allows modeling not only independent observations, but also heterogeneous and correlated variance–covariance structures. In mixed models, some effects are assumed to have arisen from a distribution (of random effects). This implies that there is (at least conceptually) a broad population of genetic effects and that samples are realized values from that population, and these can be predicted by using best linear unbiased predictors (BLUPs) (Henderson, 1975). Technically, when variance–covariance parameters must be estimated from the data, the resulting BLUPs are actually "empirical" BLUPs (EBLUPs), but in this paper, we will refer to them as BLUPs. Conversely, inferences on fixed effects in the model are restricted only to the observed levels, genotypes or environments, and empirical best linear unbiased estimates (EBLUE) are computed using empirical generalized least squares (EGLS).

Mixed linear models allow accurate prediction of genotypic performance by using covariance structures that consider correlations between sites, years, and plots in the field, as well as genetic associations between relatives. Piepho (1994) used BLUPs of effects of genotypes in an analysis of genotypic adaptability. BLUPs of breeding values of different parental soybean lines were very effective for predicting the performance of crosses using data from past METs (Panter and Allen, 1995a, 1995b); however, they did not include GE in their models.

The first authors to consider the traditional analysis of the regression of the genotypic means on the site means within the framework of a random effects model were Gogel et al. (1995) and Piepho (1997). Piepho (1997) considered a factor analytic variance–covariance structure for modeling a mixed model version of the regression of the genotypic mean on the site means and suggested a mixed model version of AMMI with environments random and genotypes fixed as another variant of factor analytic modeling for GE. Piepho (1998) considered mixed model versions of SREG, genotype regression (GREG) and AMMI with genotypes random. Smith et al. (2002, p.323–335) described random effects factor analytic models analogous to SREG (Eq. [3]) and AMMI (Eq. [2]) and explained how the multiplicative factor analytic model accommodates random regression coefficients as opposed to scores obtained as elements of eigenvectors derived from the singular value decomposition of the GE matrix of fixed effects. Furthermore, sites may form clusters showing higher genotypic correlations than other site groups, as shown by Crossa et al. (2004), who used factor analytic mixed model theory to confirm clusters of sites formed by SREG with negligible crossover GE. Smith et al. (2005) gave a general formulation of the most common mixed models used for the analysis of MET including the factor analytic model.

In plants, breeding values of genotypes have not been used as extensively as in animal breeding because in plant breeding selection is almost always based on comparisons derived from carefully designed equireplicate experiments of the candidates for selection themselves or on replicated progeny tests of candidates for selection. In animal breeding, selection is based on information from highly unbalanced data sets in which there is a much greater need for BLUPs as a basis for selection. Furthermore, it is only in recent years that there has existed convenient general mixed model or random effects model computing software to accomplish the computation. Some animal breeding computing software would not be very practical in a plant breeding context. In addition, the plant breeder is constrained by time, often requiring the seed of selections to be prepared in time for the following year's nursery and/or yield trials or perhaps even an off-season nursery. These constraints and the lack of expedient software have likely discouraged the conduct of complicated analyses to obtain BLUPs.

BLUPs of breeding values shrink (i.e., adjust) empirical means toward the general mean. BLUPs of breeding values of genotypes are affected not only by the performance of their relatives but also by the number of observations, with the consequence that BLUPs of breeding values of genotypes with few observations shrink (i.e., adjust) the empirical genotype means toward the general mean more than do BLUPs of breeding values of genotypes with more observations. Recently, Bernardo (2002) summarized the interpretation of the genetic effects (additive, dominance, and epistasis) of BLUPs in different crop species. In the context of mixed linear models, BLUPs of breeding values of random genotypes are potentially useful for selecting superior genotypes because BLUPs allow using information from relatives through coefficients of parentage (COP). Information on relatives using COPs or molecular markers is not routinely incorporated in MET analysis, probably for the above mentioned reasons.
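The dependence of shrinkage on the number of observations can be illustrated with the simplest special case: unrelated genotypes, known variance components, and a known general mean. In that case the BLUP of a genotype effect is the deviation of its empirical mean from the general mean multiplied by the factor σg²/(σg² + σe²/n). The sketch below uses hypothetical variance values and function names; with related genotypes the predictions also borrow information from relatives and no longer reduce to this formula.

```python
def shrinkage_factor(var_g, var_e, n_obs):
    """Weight applied to (genotype mean - general mean) for unrelated genotypes."""
    return var_g / (var_g + var_e / n_obs)

def blup_unrelated(mean_i, general_mean, var_g, var_e, n_obs):
    """BLUP of a genotype effect when genotypes are unrelated and variances known."""
    return shrinkage_factor(var_g, var_e, n_obs) * (mean_i - general_mean)

# Hypothetical values: genetic variance 0.2, error variance 0.6 (Mg/ha)^2.
few = blup_unrelated(6.0, 5.0, var_g=0.2, var_e=0.6, n_obs=3)    # 3 plots
many = blup_unrelated(6.0, 5.0, var_g=0.2, var_e=0.6, n_obs=48)  # 48 plots
print(f"BLUP with 3 observations:  {few:.3f}")
print(f"BLUP with 48 observations: {many:.3f}")
```

The same empirical deviation of 1.0 Mg ha⁻¹ is pulled much closer to zero when it rests on 3 observations than when it rests on 48.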

When individuals inherit copies of the same alleles at a substantial number of loci, the consequence is that they tend to show phenotypic resemblance due to genetic relationship (genetic covariance). The genetic covariance between relatives enhances breeding progress because it allows information from relatives to be optimally combined (or nearly so) with direct information on candidates into a computed selection criterion. The genetic covariance between any pair of related individuals ($i$ and $i'$), because of their additive genetic effects, is equal to two times the COP between the strains, $f_{ii'}$, multiplied by the population additive genetic variance, $\sigma_a^2$ (Kempthorne, 1969). The COP is also known as the coefficient of coancestry (Falconer, 1989), and the matrix $\mathbf{A} = 2[f_{ii'}]$ is the additive relationship matrix (Henderson, 1976). Thus, $\mathbf{A}\sigma_a^2$ is the variance–covariance matrix of the breeding values (additive genetic effects). Closely related individuals contribute more to the prediction of breeding values of their relatives than do less closely related genotypes. Moreover, when one genotype is missing (either partially or totally), its breeding value can still be predicted from its relatives, albeit less efficiently than if the data were complete.
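How relatives contribute to the prediction of a genotype without data can be sketched with the conditional expectation of a multivariate normal vector: the predicted breeding value of the missing genotype is the covariance with its relatives, times the inverse covariance among the relatives, times their breeding values. The off-diagonal COPs below are those reported later in this article for sister lines 4, 19, and 20; the diagonal value of 1 (fully inbred lines), the additive variance, the breeding values of lines 4 and 19, and the function name are assumptions made only for illustration. In an actual analysis all breeding values are predicted jointly from the mixed model equations.

```python
import numpy as np

# COPs among sister lines 4, 19 and 20 (off-diagonals from the Results);
# f_ii = 1 on the diagonal assumes fully inbred lines (an assumption).
f = np.array([[1.000, 0.792, 0.792],
              [0.792, 1.000, 0.948],
              [0.792, 0.948, 1.000]])
A = 2.0 * f                 # additive relationship matrix
var_a = 0.15                # hypothetical additive genetic variance
G = A * var_a               # covariance matrix of breeding values

def predict_from_relatives(G, observed_idx, missing_idx, g_observed):
    """E[g_missing | g_observed] = G_mo G_oo^{-1} g_observed under normality."""
    G_oo = G[np.ix_(observed_idx, observed_idx)]
    G_mo = G[np.ix_(missing_idx, observed_idx)]
    weights = np.linalg.solve(G_oo, G_mo.T).T   # equals G_mo @ inv(G_oo)
    return weights @ g_observed, weights

# Hypothetical breeding values for lines 4 and 19; line 20 has no data.
g_obs = np.array([0.10, 0.40])
pred_20, w = predict_from_relatives(G, [0, 1], [2], g_obs)
print("weights on lines 4 and 19:", np.round(w, 3))
print("predicted breeding value of line 20:", np.round(pred_20, 3))
```

Line 19, which has the higher COP with line 20, receives by far the larger weight, in line with the statement that closely related individuals contribute more to the prediction.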

Henderson's (1975) development of "mixed model equations" (MME) enabled computation of generalized least squares (GLS) estimates of fixed effects and prediction of realized values of random effects (BLUPs) without having to construct and invert the covariance matrix V of the vector of response values (y). Henderson (1976) further obtained a shortcut method for computing the inverse of the relationship matrix involved in the MME in animal and plant breeding problems. These developments provided breeders with an efficient tool for genetic selection. However, it is expected that breeding values may be different under different environmental conditions and agricultural production systems. The prediction of breeding values in the context of a MET to simultaneously model GE and the relationship between genotypes through COPs does not seem to have been thoroughly studied. The objective of this study is to show how to use, and to report results from using, different mixed models for modeling the main effects of genotypes and GE using information on related genotypes for predicting breeding values of wheat (Triticum aestivum L.) genotypes evaluated in MET. A CIMMYT international wheat MET is used as an example.


    MATERIALS AND METHODS
Depending on how the main effects of genotypes and GE are fitted, we will describe two classes of mixed models when information between relatives is used.

Mixed Model 1 with Covariance between Relatives
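Combining Main Effects of Genotype and Genotype × Environment Interaction
Mixed Model 1 (MM1), used for fitting the data from $g$ genotypes, $s$ sites, and $r$ replicates (in each site) and assuming the relationship of the genotypes is measured by the matrix COP $= [f_{ii'}]$ (of order $g$), is

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_r\mathbf{r} + \mathbf{Z}_{g_1}\mathbf{g}_1 + \mathbf{e}$$

where $\mathbf{X}$ is the incidence matrix of 0s and 1s for the fixed effects of sites, and $\mathbf{Z}_r$ and $\mathbf{Z}_{g_1}$ are the incidence matrices of 0s and 1s for the random effects of replicates within sites and genotypes within sites, respectively. The random effect of genotypes within sites ($\mathbf{g}_1$) combines the main effects of genotypes and GE. Vector $\mathbf{b}$ denotes the fixed effects of sites; vectors $\mathbf{r}$, $\mathbf{g}_1$, and $\mathbf{e}$ contain random effects of replicates within sites, genotypes within sites, and residuals within sites, respectively, and are assumed to be random and normally distributed with zero mean vectors and variance–covariance matrices $\mathbf{R}$, $\mathbf{G}_1$, $\mathbf{E}$, respectively, such that

$$\begin{bmatrix} \mathbf{r} \\ \mathbf{g}_1 \\ \mathbf{e} \end{bmatrix} \sim N\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} \mathbf{R} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{E} \end{bmatrix} \right).$$

The variance-covariance matrix of $\mathbf{y}$ is $V(\mathbf{y}) = \mathbf{Z}_r\mathbf{R}\mathbf{Z}_r' + \mathbf{Z}_{g_1}\mathbf{G}_1\mathbf{Z}_{g_1}' + \mathbf{E}$.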

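A more detailed representation of these matrices, similar to that given by Crossa et al. (2004), is as follows:

$$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_s \end{bmatrix} = \begin{bmatrix} \mathbf{1} & & & \\ & \mathbf{1} & & \\ & & \ddots & \\ & & & \mathbf{1} \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_s \end{bmatrix} + \begin{bmatrix} \mathbf{Z}_{R_1} & & & \\ & \mathbf{Z}_{R_2} & & \\ & & \ddots & \\ & & & \mathbf{Z}_{R_s} \end{bmatrix} \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \vdots \\ \mathbf{r}_s \end{bmatrix} + \begin{bmatrix} \mathbf{Z}_{G_1} & & & \\ & \mathbf{Z}_{G_2} & & \\ & & \ddots & \\ & & & \mathbf{Z}_{G_s} \end{bmatrix} \begin{bmatrix} \mathbf{g}_{1} \\ \mathbf{g}_{2} \\ \vdots \\ \mathbf{g}_{s} \end{bmatrix} + \begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \\ \vdots \\ \mathbf{e}_s \end{bmatrix}$$

where $\mathbf{y}_j$ is the vector of the response variable in the $j$th site ($j = 1, 2, \ldots, s$); $\mathbf{1}$ is a vector of ones, $\mu_j$ is the population mean of the $j$th site, $\mathbf{Z}_{R_j}$ and $\mathbf{Z}_{G_j}$ are design matrices for random effects of replicates and genotypes within the $j$th site, respectively, and $\mathbf{r}_j$, $\mathbf{g}_j$, and $\mathbf{e}_j$ are the corresponding subvectors of $\mathbf{r}$, $\mathbf{g}_1$, and $\mathbf{e}$ for the $j$th site. The variance–covariance matrices $\mathbf{R}$ and $\mathbf{E}$ are assumed to have the simple variance component structure $\mathbf{R} = \boldsymbol{\Sigma}_r \otimes \mathbf{I}_r = \operatorname{diag}(\sigma_{r_1}^2, \ldots, \sigma_{r_s}^2) \otimes \mathbf{I}_r$, and $\mathbf{E} = \boldsymbol{\Sigma}_e \otimes \mathbf{I}_{rg} = \operatorname{diag}(\sigma_{e_1}^2, \ldots, \sigma_{e_s}^2) \otimes \mathbf{I}_{rg}$, where $\mathbf{I}_r$ and $\mathbf{I}_{rg}$ are identity matrices of orders $r$ and $rg$, respectively, $\boldsymbol{\Sigma}_r$ and $\boldsymbol{\Sigma}_e$ are the replicate and residual variance matrices, respectively, and $\otimes$ is the Kronecker (or direct) product operator. In covariance pattern models, it is assumed that residuals have a multivariate normal distribution with zero means and covariance matrix $\mathbf{E}$. In this study, the structure of $\mathbf{E}$ assumes that the residuals of the field plots at each site (i.e., elements of vector $\mathbf{e}$) are not spatially correlated, that is, $\mathbf{E} = \boldsymbol{\Sigma}_e \otimes \mathbf{I}_{rg}$. However, although we do not do so in this paper, when the field location of the plots is recorded, the matrix $\mathbf{E}$ could be modeled using, for example, the two-dimensional auto-regressive procedure in the direction of the rows and columns in the field (Gilmour et al., 1997).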

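The variance–covariance matrix $\mathbf{G}_1$, which combines the main effect of genotypes (breeding values) and GE, can be represented as:

$$\mathbf{G}_1 = \boldsymbol{\Sigma}_{g_1} \otimes \mathbf{A} = \begin{bmatrix} \sigma_{a_1}^2 & \rho_{12}\sigma_{a_1}\sigma_{a_2} & \cdots & \rho_{1s}\sigma_{a_1}\sigma_{a_s} \\ \rho_{12}\sigma_{a_1}\sigma_{a_2} & \sigma_{a_2}^2 & \cdots & \rho_{2s}\sigma_{a_2}\sigma_{a_s} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1s}\sigma_{a_1}\sigma_{a_s} & \rho_{2s}\sigma_{a_2}\sigma_{a_s} & \cdots & \sigma_{a_s}^2 \end{bmatrix} \otimes \mathbf{A}$$

where the $j$th diagonal element of the $s \times s$ matrix $\boldsymbol{\Sigma}_{g_1}$ is the additive genetic variance $\sigma_{a_j}^2$ within the $j$th site, and the $jj'$th element is the additive genetic covariance $\rho_{jj'}\sigma_{a_j}\sigma_{a_{j'}}$ between sites $j$ and $j'$; thus $\rho_{jj'}$ is the correlation of additive genetic effects between sites $j$ and $j'$. The matrix $\mathbf{A} = 2[f_{ii'}]$ is of order $g \times g$ and measures the relationship or covariance between relatives due to additive genetic effects. When genotypes are not related, $\mathbf{A}$ is replaced by $\mathbf{I}_g$ (identity matrix of order $g$) (Smith et al., 2002; Crossa et al., 2004), and the breeding value of each genotype will be predicted only by the value of the empirical responses of the genotype itself.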
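The direct product structure of G1 can be made concrete with a small numerical sketch. All values below (two sites, three genotypes, the standard deviations, the correlation, and the relationship matrix) are hypothetical, and the function name is introduced only for this illustration. The ordering assumed is genotypes within sites, so that block (j, j') of G1 is the (j, j') element of the site matrix multiplied by A.

```python
import numpy as np

def genetic_site_matrix(sd_a, corr):
    """Site matrix with sigma_aj^2 on the diagonal and
    rho_jj' * sigma_aj * sigma_aj' off the diagonal."""
    sd_a = np.asarray(sd_a, dtype=float)
    return np.outer(sd_a, sd_a) * np.asarray(corr, dtype=float)

# Hypothetical additive genetic standard deviations and correlation, two sites.
sd_a = [0.5, 0.8]
corr = [[1.0, 0.6],
        [0.6, 1.0]]
Sigma_g1 = genetic_site_matrix(sd_a, corr)

# Hypothetical additive relationship matrix A = 2[f_ii'] for three inbred lines.
A = 2.0 * np.array([[1.0, 0.9, 0.2],
                    [0.9, 1.0, 0.2],
                    [0.2, 0.2, 1.0]])

G1 = np.kron(Sigma_g1, A)                   # related genotypes
G1_unrelated = np.kron(Sigma_g1, np.eye(3)) # A replaced by the identity matrix

g = A.shape[0]
between_sites_block = G1[0:g, g:2 * g]      # covariance of site 1 with site 2
print(np.round(between_sites_block, 3))
```

With the identity matrix in place of A, every off-diagonal element within a block is zero, so a genotype's prediction draws only on its own records across sites.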

The mixed model equations and the solution for the vector of fixed effects of site means ($\hat{\mathbf{b}}$) and the vectors of random effects ($\hat{\mathbf{r}}$, $\hat{\mathbf{g}}_1$) are obtained following Henderson (1975).
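As a rough numerical illustration of Henderson's mixed model equations for a model of the MM1 type, the sketch below solves them for a toy layout and checks the result against the direct route through V. It is a simplified sketch under several assumptions: the replicate effects are omitted for brevity, the covariance parameters are treated as known rather than estimated by REML, and all numbers (design size, A, site matrix, residual variances, and simulated responses) are hypothetical.

```python
import numpy as np

# Hypothetical toy layout: g genotypes, s sites, r replicates per site.
g, s, r = 3, 2, 2
site = np.repeat(np.arange(s), r * g)   # site index of every plot
geno = np.tile(np.arange(g), s * r)     # genotype index of every plot
n = site.size

# Incidence matrices: X for site means, Z for genotypes within sites.
X = np.zeros((n, s))
X[np.arange(n), site] = 1.0
Z = np.zeros((n, s * g))
Z[np.arange(n), site * g + geno] = 1.0

# Hypothetical covariance parameters, treated as known here.
A = 2.0 * np.array([[1.0, 0.9, 0.2],
                    [0.9, 1.0, 0.2],
                    [0.2, 0.2, 1.0]])
Sigma_g1 = np.array([[0.25, 0.24],
                     [0.24, 0.64]])
G1 = np.kron(Sigma_g1, A)
var_e = np.array([0.30, 0.50])          # residual variance at each site
E_inv = np.diag(1.0 / var_e[site])

rng = np.random.default_rng(1)
y = rng.normal(5.0, 1.0, size=n)        # simulated responses, not real data

# Mixed model equations: only E and G1 are inverted, never V itself.
C = np.block([
    [X.T @ E_inv @ X, X.T @ E_inv @ Z],
    [Z.T @ E_inv @ X, Z.T @ E_inv @ Z + np.linalg.inv(G1)],
])
rhs = np.concatenate([X.T @ E_inv @ y, Z.T @ E_inv @ y])
solution = np.linalg.solve(C, rhs)
b_hat, g1_hat = solution[:s], solution[s:]

# Cross-check through V = Z G1 Z' + E (feasible only because the toy is small).
V = Z @ G1 @ Z.T + np.diag(var_e[site])
V_inv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
g1_direct = G1 @ Z.T @ V_inv @ (y - X @ b_gls)
```

The two routes give the same site means and the same predicted genotype-within-site effects; the mixed model equations simply avoid forming and inverting the n × n matrix V.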

Mixed Model 2 with Covariance between Relatives
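Distinguishing Main Effects of Genotype from Genotype × Environment Interaction
Mixed Model 2 (MM2), for fitting $g$ genotypes, $s$ sites, and $r$ replicates, assuming that the covariance between relatives is proportional to twice the COP matrix $[f_{ii'}]$ multiplied by the additive genetic variance $\sigma_a^2$, is

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_r\mathbf{r} + \mathbf{Z}_g\mathbf{g} + \mathbf{Z}_{ge}\mathbf{ge} + \mathbf{e}$$

where $\mathbf{X}$, $\mathbf{Z}_r$, $\mathbf{Z}_g$, and $\mathbf{Z}_{ge}$ are the design matrices for fixed effects of sites, random effects of replicates within sites, genotypes, and GE, respectively, and $\mathbf{e}$ is the vector of residuals. Vector $\mathbf{b}$ denotes the fixed effects of sites, and vectors $\mathbf{r}$, $\mathbf{g}$, $\mathbf{ge}$, and $\mathbf{e}$ contain random effects of replicates within sites, genotypes, GE, and residuals, respectively, and are assumed to be random and normally distributed with zero mean vectors and variance-covariance matrices $\mathbf{R}$, $\mathbf{G}$, $\mathbf{GE}$, and $\mathbf{E}$, respectively, such that

$$\begin{bmatrix} \mathbf{r} \\ \mathbf{g} \\ \mathbf{ge} \\ \mathbf{e} \end{bmatrix} \sim N\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} \mathbf{R} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{GE} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{E} \end{bmatrix} \right).$$

The variance of $\mathbf{y}$ is $V(\mathbf{y}) = \mathbf{Z}_r\mathbf{R}\mathbf{Z}_r' + \mathbf{Z}_g\mathbf{G}\mathbf{Z}_g' + \mathbf{Z}_{ge}\mathbf{GE}\,\mathbf{Z}_{ge}' + \mathbf{E}$.

Variance-covariance matrices $\mathbf{R}$ and $\mathbf{E}$ are assumed to have a simple variance component structure, as defined for MM1. The variance-covariance matrix of the main effects of genotypes in MM2 is modeled as $\mathbf{G} = \mathbf{A}\sigma_a^2$ and the variance-covariance of the genotype × environment interaction is modeled as $\mathbf{GE} = \boldsymbol{\Sigma}_{ge} \otimes \mathbf{A}$, where $\mathbf{GE}$ is of the same form as $\mathbf{G}_1$ in MM1 except that in the notation, $\sigma_{a_1}, \sigma_{a_2}, \ldots, \sigma_{a_s}$ is replaced by $\sigma_{ae_1}, \sigma_{ae_2}, \ldots, \sigma_{ae_s}$, where the $ae$ subscript denotes the additive × environment interaction. The mixed model equations and the solution for the vector of fixed effects ($\hat{\mathbf{b}}$) and random effects ($\hat{\mathbf{r}}$, $\hat{\mathbf{g}}$, $\widehat{\mathbf{ge}}$) are similar to those for MM1, but with equations for random effects of genotypes and GE being separately distinguished.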

Modeling G1 of Mixed Model 1 and GE of Mixed Model 2
Different structures can be proposed to model matrices $\boldsymbol{\Sigma}_{g_1}$ of $\mathbf{G}_1$ and $\boldsymbol{\Sigma}_{ge}$ of $\mathbf{GE}$ for models MM1 and MM2, respectively. This will produce several models within each of the two main mixed model classes, MM1 and MM2. The most restrictive variance-covariance structure assumes that genetic (in MM1) or GE (in MM2) variances within all sites are equal and that all pairwise correlations between genotypes and between sites are zero. The most liberal structure is the completely unstructured model, in which each of the matrices $\boldsymbol{\Sigma}_{g_1}$ and $\boldsymbol{\Sigma}_{ge}$ contains $s(s+1)/2$ parameters. In this study, we have used two more conservative types of variance-covariance structures for modeling these matrices, namely, compound symmetry and factor analytic models.

Compound symmetry models assume that the correlation between the effects of any one genotype measured in any two different environments is $\rho$, with $-1/(s-1) \leq \rho \leq 1$. Furthermore, compound symmetry models can take two forms, differing with respect to whether we assume site-to-site homogeneity or heterogeneity of within-site genetic variances. The form that assumes site-to-site homogeneity of genetic variance (CS model) is such that $\boldsymbol{\Sigma}_{g_1}$ or $\boldsymbol{\Sigma}_{ge}$ has the structure $\sigma_a^2[(1-\rho)\mathbf{I}_s + \rho\mathbf{J}_s]$ (where $\mathbf{J}_s$ is an $s \times s$ matrix of ones). Alternatively, the model can incorporate site-to-site heterogeneity of within-environment genetic variance (CSH model), in which case $\boldsymbol{\Sigma}_{g_1}$ or $\boldsymbol{\Sigma}_{ge}$ has the structure $\operatorname{diag}(\sigma_{a_j})[(1-\rho)\mathbf{I}_s + \rho\mathbf{J}_s]\operatorname{diag}(\sigma_{a_j})$. For the CS case, there are two parameters, namely, constant genotypic variance within environments and constant correlation ($\rho$) between effects of a common genotype measured in different sites in the $\mathbf{G}_1$ and the $\mathbf{GE}$ matrices. For the CSH case, there are $s + 1$ parameters, namely $s$ genetic variances, $\sigma_{a_j}^2$ ($j = 1, 2, \ldots, s$), in the diagonal and the homogeneous correlation, $\rho$, in the off-diagonal, so that the covariance for sites $j$ and $j'$ is $\rho\sigma_{a_j}\sigma_{a_{j'}}$.
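The two compound symmetry forms can be written out directly. The sketch below builds the CS and CSH site matrices for hypothetical parameter values (the function names and numbers are illustrative only) and shows that the lower limit ρ = –1/(s – 1) is the value at which the matrix becomes singular.

```python
import numpy as np

def cs_matrix(var_a, rho, s):
    """Homogeneous compound symmetry: var_a * [(1 - rho) I_s + rho J_s]."""
    return var_a * ((1.0 - rho) * np.eye(s) + rho * np.ones((s, s)))

def csh_matrix(sd_a, rho):
    """Heterogeneous compound symmetry:
    diag(sd_a) [(1 - rho) I_s + rho J_s] diag(sd_a)."""
    sd_a = np.asarray(sd_a, dtype=float)
    s = sd_a.size
    corr = (1.0 - rho) * np.eye(s) + rho * np.ones((s, s))
    return np.diag(sd_a) @ corr @ np.diag(sd_a)

# Hypothetical values for three sites.
cs = cs_matrix(var_a=0.4, rho=0.5, s=3)          # 2 parameters
csh = csh_matrix(sd_a=[0.5, 0.8, 0.6], rho=0.5)  # s + 1 = 4 parameters

# At rho = -1/(s - 1) the smallest eigenvalue is zero.
boundary = cs_matrix(var_a=0.4, rho=-1.0 / (3 - 1), s=3)
smallest_eigenvalue = np.linalg.eigvalsh(boundary).min()
```

In `csh`, the diagonal holds the squared standard deviations and element (j, j') equals ρ times the product of the two site standard deviations, as in the text.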

The factor analytic structure models covariance among observations in terms of a few hypothetical unobserved factors and can be useful for modeling matrices $\mathbf{G}_1$ and $\mathbf{GE}$ of MM1 and MM2, respectively. MM1 with $\mathbf{G}_1$ modeled using a factor analytic structure is the mixed model counterpart to the SREG model (Eq. [3]). Similarly, MM2 with $\mathbf{GE}$ modeled using a factor analytic structure is analogous to the AMMI model (Eq. [2]). The factor analytic structure with $q \leq s$ factors or components [FA($q$)] is of the form $\boldsymbol{\Delta}\boldsymbol{\Delta}' + \mathbf{D}$, where $\boldsymbol{\Delta}$ is an $s \times q$ matrix and $\mathbf{D}$ is an $s \times s$ diagonal matrix with $s$ possibly different nonnegative parameters on the diagonal. Each column of $\boldsymbol{\Delta}$ contains the site scores for one of the multiplicative terms. For $q = 1$, the model, denoted as FA(1), has one multiplicative term and $2s$ parameters to be estimated; for $q = 2$, i.e., model FA(2), the model has $3s - 1$ parameters to be estimated, and so on for FA(3), etc. Boundary constraints must be imposed on the solutions to avoid overparameterization (parameter nonidentifiability). This is achieved by fixing some estimates to have values of zero.
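A short sketch may help fix the form of the FA(q) structure and its parameter count. The general count used below, sq loadings plus s specific variances minus q(q – 1)/2 loadings fixed at zero, is the usual identifiability count and is stated here as an assumption; it reproduces the 2s and 3s – 1 given in the text for FA(1) and FA(2). The loadings, specific variances, and function names are hypothetical.

```python
import numpy as np

def fa_matrix(loadings, specific_var):
    """Factor analytic site matrix Delta Delta' + D."""
    Delta = np.asarray(loadings, dtype=float)           # s x q site loadings
    D = np.diag(np.asarray(specific_var, dtype=float))  # s x s diagonal
    return Delta @ Delta.T + D

def fa_n_params(s, q):
    """s*q loadings + s specific variances - q(q-1)/2 loadings fixed at zero."""
    return s * q + s - q * (q - 1) // 2

# Hypothetical FA(2) structure for four sites; the second loading of the
# first site is fixed at zero as an identifiability constraint.
loadings = [[0.6, 0.0],
            [0.5, 0.3],
            [0.7, -0.2],
            [0.4, 0.4]]
specific_var = [0.05, 0.10, 0.02, 0.08]
Sigma_fa2 = fa_matrix(loadings, specific_var)

s = 16  # number of sites in the trial analyzed in this article
counts = {q: fa_n_params(s, q) for q in (1, 2, 3)}
unstructured = s * (s + 1) // 2
print(counts, unstructured)
```

For 16 sites, FA(1) and FA(2) require 32 and 47 parameters, compared with 136 for the unstructured matrix. The number of nonzero parameters actually estimated can be smaller when boundary constraints fix some specific variances at zero.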

Submodels from MM1 and MM2
As previously defined, matrices $\mathbf{G}_1$ of MM1 and $\mathbf{GE}$ of MM2 are each a Kronecker product of two matrices that depend on specific submodels defined below. For simplicity these submodels will be named Model 1 through Model N, and they are described in Table 1. Model 1 forms are the models obtained by regarding vector $\mathbf{g}_1$ in MM1 and vectors $\mathbf{g}$ and $\mathbf{ge}$ in MM2 as fixed effects rather than random, thus absorbing these effects into the $\mathbf{b}$ vector in the two respective models; thus $\mathbf{G}_1$, $\mathbf{G}$, and $\mathbf{GE}$ are not defined in these models. Model 2 comprises the identity matrices $\mathbf{I}_s$ and $\mathbf{I}_g$, such that $\mathbf{g}_1$ of MM1 and $\mathbf{ge}$ of MM2 are fitted as random effects, without including $\mathbf{A}$. Model 3 is the same as Model 2 but replaces $\sigma_g^2$ with $\sigma_a^2$ and matrix $\mathbf{I}_g$ with $\mathbf{A}$. Model 4 includes matrix $\mathbf{A}$ and the diagonal matrix where $\sigma_{a_j}^2$ is the additive genetic variance in the $j$th environment. Model 5 assumes homogeneous compound symmetry (CS). Model 6 assumes heterogeneous compound symmetry (CSH). Model 7 considers a one-component factor analytic structure for the matrix of additive genetic variances and covariances. Model 8 is the same as Model 7 but with a two-component factor analytic structure. Models 9, 10, etc. use a factor analytic structure with three, four, etc. components, until a model is obtained for which information criteria show that the model is overfitted.


Table 1. Description of submodels for G1 [from Mixed Model 1 (MM1)] and for GE [from Mixed Model 2 (MM2)].

 
In addition, factor analytic models for $\mathbf{G}_1$ of MM1 and $\mathbf{GE}$ of MM2 (similar to Models 7, 8, 9, etc.), but using the identity matrix $\mathbf{I}_g$ instead of matrix $\mathbf{A}$, were fitted and identified with a tilde. These models are the same as Eq. [21.10] of Smith et al. (2002, p. 323–335) and those used by Crossa et al. (2004). Therefore, Models $\tilde{7}$, $\tilde{8}$, etc. are fitted until an overfitted model is obtained. The number of covariance parameters in these two model forms [FA($q$) $\otimes$ $\mathbf{I}_g$ and FA($q$) $\otimes$ $\mathbf{A}$] is the same, provided that the value of $q$ (number of components) and the number of boundary constraints imposed on the solutions are the same for the two models.

Variance component estimation and fitting of the different models were done using the Restricted Maximum Likelihood (REML) method and the Average Information algorithm implemented in ASReml (Gilmour et al., 2002). Appropriate selection of initial values for the factor analytic model is important. Our experience with ASReml is that the FA(1) solution provides a good starting point for fitting FA(2), and in general, the FA(k – 1) solution provides a good starting point for fitting FA(k) (k = 2, 3, ...).

Measures of Submodel Adequacy
As the number of parameters in the model increases, the maximum value of the residual log likelihood (RLL) statistics is expected to increase. The residual log likelihood ratio test (Wolfinger, 1993; Brown and Prescott, 1999) and the information criteria given in Table 46.2 of SAS (2004) were used for assessing and comparing models which included the same fixed effects in the model. In this paper we have not considered alternative diagnostics based on crossvalidation.
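A residual log likelihood ratio test between two nested covariance models with the same fixed effects can be sketched as follows: the test statistic is the difference in –2RLL, referred to a χ² distribution with degrees of freedom equal to the difference in the number of covariance parameters. The sketch uses only the Python standard library; the closed-form tail probability is valid for integer degrees of freedom, and the function names and the two –2RLL values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1: erfc(sqrt(x/2)) plus a finite series whose kth term
    # is x^(k - 1/2) / (1 * 3 * ... * (2k - 1)), k = 1, ..., m.
    term, total = math.sqrt(x), 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)

def rll_ratio_test(m2rll_reduced, m2rll_full, extra_params):
    """Difference in -2RLL and its chi-square P value."""
    statistic = m2rll_reduced - m2rll_full
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical -2RLL values for a reduced and a fuller covariance model
# differing by seven covariance parameters.
statistic, p_value = rll_ratio_test(258.0, 208.0, extra_params=7)
print(f"difference = {statistic:.1f}, P = {p_value:.2e}")
```

This plain χ² reference distribution takes no account of variance parameters estimated on the boundary of the parameter space, so P values near the significance threshold should be read with some caution.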

The information criteria included Akaike's Information Criterion, AIC = –2RLL + 2d (where d is the number of independent variance–covariance parameters in the model); Akaike's Information Criterion corrected for finite sample size, AICC = –2RLL + 2dn/(n – d – 1) (where n is the number of observations); the Hannan and Quinn Information Criterion, HQIC = –2RLL + 2d log[log(n)]; the Schwarz Bayesian Criterion, BIC = –2RLL + d log(n); and the criterion developed by Bozdogan, CAIC = –2RLL + d[log(n) + 1].
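The five criteria are simple functions of –2RLL, the number of covariance parameters d, and the number of observations n, and can be computed as in the sketch below. AICC is coded with the denominator n – d – 1, which is the form given in the SAS documentation. The function name is hypothetical, and the input values are used purely as an illustration rather than as a reproduction of any table entry.

```python
import math

def information_criteria(m2rll, d, n):
    """Information criteria from -2RLL, d covariance parameters, n observations.

    Smaller values indicate a better compromise between fit and parsimony.
    """
    return {
        "AIC": m2rll + 2 * d,
        "AICC": m2rll + 2 * d * n / (n - d - 1),
        "HQIC": m2rll + 2 * d * math.log(math.log(n)),
        "BIC": m2rll + d * math.log(n),
        "CAIC": m2rll + d * (math.log(n) + 1),
    }

# Illustrative input: -2RLL = 208, d = 111 parameters, n = 1392 observations.
ic = information_criteria(m2rll=208.0, d=111, n=1392)
for name, value in ic.items():
    print(f"{name:>4}: {value:8.1f}")
```

Because the penalty per parameter grows from 2 (AIC) to log(n) + 1 (CAIC), the criteria can disagree; agreement of all five on one model, as reported below for MM1, is therefore fairly strong evidence for that model.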

Biplots of Fitted Submodels
Biplots of all fitted models can be obtained from the two-way table of g genotypes and s sites. Biplots of fitted factor analytic models can be obtained directly from the scores of the genotypes and the loadings of the sites. Biplots of Submodel 1 are the usual biplots for the SREG and AMMI fixed effects models.

The Coefficient of Parentage Matrix
The COP between individuals i and i' is the probability that an allele from a randomly selected locus in individual i is identical by descent with an allele randomly selected from the same locus in individual i' (Cockerham, 1971). The matrix of coefficients of parentage was computed following the approach of Cockerham (1983), using the coefficient of inbreeding and the coefficients of parentage between crosses and within crosses. In the case of successive generations of self-fertilization, sister lines have different COP values depending on the generation at which the last common ancestor is present. Therefore, COPs between sister lines tend to be different and higher if the generation of the last common ancestor is closer to the generation at which the lines are present. The software used for deriving the relationship matrix was the Browse application of the International Crop Information System (ICIS) described in McLaren et al. (2005), which accounts for selection as well as inbreeding and thereby improves the accuracy of breeding value estimation. The Browse application is described at http://cropwiki.irri.org/icis/index.php/TDM_GMS_Browse; verified 27 March 2006.

Experimental Data
The data used are from a CIMMYT bread wheat international trial. Twenty-nine lines (1–29) were tested in 16 international sites, namely Mexico (MEX), USA (two sites, USA1 and USA2), Turkey (TKY), Israel (ISR), Bangladesh (BGD), India (IND), Pakistan (PKT), Syria (SYR), Spain (two sites, SPN1 and SPN2), Nepal (NPL), Kenya (KNY), Zimbabwe (ZBW), New Zealand (NZL), and Chile (CHL), in randomized complete block designs with three replications at each site. The response variable analyzed was grain yield (Mg ha⁻¹). There were five sets of sister lines: (5, 6), (4, 19, 20), (14, 15), (21, 23, 24), and (28, 29).


    RESULTS
The COP of all pairs of sister lines within sets of sister lines, (5–6), (4–19, 4–20, 19–20), (14–15), (21–23, 21–24, 23–24), and (28–29) are (0.917), (0.792, 0.792, 0.948), (0.985), (0.902, 0.902, 0.975), and (0.774), respectively. The COP between lines within sets (4, 19, 20) and (21, 23, 24) are not all alike because lines within these sets do not all have the same relationship to one another. For example, Line 4 had a COP of 0.792 with Lines 19 and 20, whereas the COP between 19 and 20 is 0.948. While Lines 13 and 27 are not strongly related to any of the other lines included in the trial, most of the lines (excluding the sister lines) are moderately related to one another, with COPs ranging from 0.1 to 0.4 (data not shown).

Fitting the Submodels from MM1
When fixed Model 1 was fitted, the F test resulted in a significant effect for the combination of line main effect and GE for grain yield (P ≤ 0.05). For Model 1, the FR test of Cornelius et al. (1992), which assesses the significance of residual variation after fitting the first k – 1 multiplicative components, found no significant residual (P ≤ 0.05) after fitting the seventh multiplicative component, and, similarly, the FGH1 test (Cornelius et al., 1996) used for judging the significance of sequentially fitted multiplicative terms found seven significant terms (P ≤ 0.05).

Values for –2 × the logarithm of the restricted likelihood function (–2RLL), five information criteria, the standard error of line mean differences, and residual variances for submodels of MM1 for grain yield are shown in Table 2. The model that best fit the data was Model 15 [FA(9) ⊗ A] by the AIC, AICC, HQIC, BIC, and CAIC criteria. Model 16 is less parsimonious than Model 15 because it estimated five more nonzero parameters (nep = 116) than Model 15 (nep = 111), and the difference in –2RLL was 208 – 207 = 1, which is not significant by the χ² test with 5 degrees of freedom (P = 0.925). On the other hand, Model 14 estimated seven fewer parameters than Model 15, and the difference in –2RLL was 50, which is highly significant by the χ² test with 7 degrees of freedom (P < 0.001). This is consistent with the observation that all of the information criteria were poorer (i.e., larger) when going from Model 15 to Model 16 but better (i.e., smaller) when going from Model 14 to Model 15. It is interesting to note that, although the average standard error of the difference between BLUPs of any two genotypes was not used as a model selection criterion, Model 15 had the smallest average standard error of a difference (0.380, Table 2). Among the factor analytic models that exclude COP, Model Formula 3 [FA(3) ⊗ Ig] gave the best fit, but it performed worse than factor analytic models that included COP in the model definition (e.g., Model 15).
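The comparison of Models 15 and 16 by information criteria can be illustrated with a short calculation. The sketch below is not the authors' code; it assumes the commonly used definitions of the five criteria and takes the sample size in the penalty terms to be n = 1392, whereas software fitting by restricted likelihood may use a smaller effective n. The function name `information_criteria` is hypothetical. The inputs are the –2RLL and nep values quoted in the text.

```python
import math

def information_criteria(neg2_rll, n_params, n_obs):
    """Return AIC, AICC, HQIC, BIC and CAIC; smaller values indicate a better model.

    Assumed (commonly used) definitions, with d = number of estimated
    covariance parameters and n = number of observations.
    """
    d, n = n_params, n_obs
    return {
        "AIC": neg2_rll + 2 * d,
        "AICC": neg2_rll + 2 * d * n / (n - d - 1),
        "HQIC": neg2_rll + 2 * d * math.log(math.log(n)),
        "BIC": neg2_rll + d * math.log(n),
        "CAIC": neg2_rll + d * (math.log(n) + 1),
    }

# Values quoted in the text: -2RLL = 208 with nep = 111 (Model 15) and
# -2RLL = 207 with nep = 116 (Model 16). Using n = 1392 is an assumption.
model_15 = information_criteria(208, 111, 1392)
model_16 = information_criteria(207, 116, 1392)

for name in model_15:
    print(name, round(model_15[name], 1), round(model_16[name], 1))
```

Under these assumptions every criterion is smaller for Model 15, because a one-unit gain in –2RLL does not offset the penalty for five extra parameters. The absolute values need not match Table 2, since the quoted –2RLL values appear to be rounded.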


Table 2. Values of –2RLL, five information criteria (AIC, AICC, HQIC, BIC, and CAIC; better models have smaller values), average standard error of the difference between the BLUPs of breeding values of any two genotypes in any two sites (SED), and residual variance (E) for the fitted submodels of MM1 for grain yield. The smallest value for each information criterion is underlined. Total number of observations n = 1392.

 
The factor analytic model is very flexible and offers an efficient procedure for reducing the dimensionality and complexity of the G1 matrix to a few factors. Model 15 of MM1 detected more significant multiplicative components for explaining the main effect of lines and GE [FA(9)] than the fixed Model 1, for which the FR and FGH1 tests found seven significant components. Furthermore, Model 15 detected six more components than the best factor analytic model fitted without COP, Model Formula 3.
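The structure FA(k) ⊗ A discussed above can be sketched in a few lines. This is a minimal illustration, not the fitting procedure used in the study: it assumes the usual factor analytic form for the site covariance (loadings times their transpose plus site-specific variances) and combines it with a relationship matrix through a Kronecker product. The loadings and specific variances are invented for illustration, and the 3 × 3 block uses the COP values reported for Lines 4, 19, and 20 with a diagonal of 1, which is also an assumption.

```python
def fa_covariance(loadings, specific):
    """Site covariance of the form Lambda Lambda' + diag(psi)."""
    n_sites = len(loadings)
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (specific[i] if i == j else 0.0)
            for j in range(n_sites)
        ]
        for i in range(n_sites)
    ]

def kronecker(b, a):
    """Kronecker product b (x) a of two matrices given as lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    return [
        [
            b[r // rows_a][c // cols_a] * a[r % rows_a][c % cols_a]
            for c in range(len(b[0]) * cols_a)
        ]
        for r in range(len(b) * rows_a)
    ]

# Hypothetical FA(1) loadings and specific variances for two sites.
site_cov = fa_covariance([[0.5], [0.4]], [0.05, 0.10])

# COP among Lines 4, 19, 20 as reported in the text; unit diagonal is assumed.
relationship = [
    [1.000, 0.792, 0.792],
    [0.792, 1.000, 0.948],
    [0.792, 0.948, 1.000],
]

# 6 x 6 covariance of the line-by-site effects, ordered site-major.
g_matrix = kronecker(site_cov, relationship)
```

With this ordering, `g_matrix[0][4]` is the covariance between Line 4 at the first site and Line 19 at the second site: the between-site covariance scaled by the relationship between the two lines.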

Although the factor analytic model with two components, Model 8 [FA(2)] of MM1, was significantly different from Model 15 [FA(9)] (P < 0.001 for χ² = 589 with 66 df), the correlation between the BLUPs obtained from Model 8 and Model 15 was 0.998, indicating the similarity between the two models [FA(2) and FA(9)] in predicting the BLUPs of the breeding values of the lines included in this experiment. Furthermore, all correlations between the BLUPs obtained from pairs of models among Models 8 through 15 were ≥0.98, indicating similarities between these models in predicting the breeding values of the lines. Although the estimates from Models 8 and 15 were not identical, a correlation of 0.97 between site covariances from FA(2) and FA(9) was found. Relatively high correlations were estimated, among others, between ZBW and ISR, USA1, USA2, and SPN1, as well as between ISR and USA1, USA2, and MEX.

Standard Errors of BLUPs of Breeding Values of Genotypes in Sites
Standard errors (SE) of BLUPs of breeding values of the lines for grain yield ranged from 0.18 to 0.44 Mg ha⁻¹. In general, SEs were smaller for models using information on relatives and when the main effect of genotypes and GE were modeled using a factor analytic structure for the G1 variance-covariance matrix (Fig. 1). The fixed Model 1 had the largest SEs, followed by Model 2 (without A) and Model 3 (with A). In contrast, Model 15 had the smallest SE for most of the line-environment combinations and also was the model with the smallest values for all the model selection criteria (Table 2). These results suggest that Model 15 gave more precise BLUPs of breeding values of the lines than the other models.


Fig. 1. Standard errors of BLUP of 29 wheat genotypes (1–29) within 16 sites for grain yield for Models 1, 2, 3, and 15 of Mixed Model 1. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), and (28,29). For Model 15, each graph line represents one of the 16 sites.

 
The lines in Fig. 1 (from top to bottom) graph the SEs of BLUPs of breeding values of Models 1, 2, 3, and 15 in countries NZL, USA1, TKY, ZBW, CHL, SPN2, MEX, SYR, ISR, SPN1, NPL, KNY, IND, PKT, BGD, and USA2. New Zealand had the largest SE of the BLUPs of the breeding values and USA2 had the smallest. Only the SE of the BLUP of Genotype 26 in NZL from Model 15 was larger than those from Models 2 and 3 (Fig. 1). For several sites, the SEs of the genotypes from Model 15 were half (or less) the SEs obtained from Models 1 through 3. Although not presented here, the patterns of SEs for Model 8 [FA(2)] are very similar to those depicted in Fig. 1 for Model 15, but less compact and more spread out.

The benefits of including information on related lines in terms of precision are evident when comparing Models 2 and 3; Model 3 is similar to Model 2 but includes information on relatives. It is clear that the SEs of sister lines were always smaller than the SEs of genotypes that had no relatives in the trial in all cases except for Models 1 and 2, which do not incorporate this information. When modeling the main effects of genotype and GE in conjunction with the information on relatives, the improvement in the precision of the BLUP of the sisters as well as the other lines is evident.

Biplots
The biplots of Model 1 (excluding matrix A) and Model 15 [FA(9) including matrix A] of MM1 are depicted in Fig. 2 and 3, respectively. The biplot of Model 1 (Fig. 2) is the usual biplot, which shows Genotypes 19 and 20 as having a positive response in terms of genotype main effect and GE for most of the sites because they lie in the same direction as those sites. Genotypes 16, 18, and 25, located on the opposite side of the biplot, have a negative response in all sites. Sites located farther away from the center, such as NZL, USA1, ZBW, ISR, and SPN2, are the ones that discriminate the genotypes the most. Pairs of sister lines 19 and 20 and 14 and 15 are distinct from the others. Sister lines are scattered throughout the two dimensions of the biplot.
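The reading of direction and distance in a biplot described above rests on inner products between genotype and site markers. The sketch below uses invented two-dimensional coordinates (they are not taken from Fig. 2, and the names `G_a`, `G_b`, `S_far`, and `S_near` are hypothetical) to show that a genotype pointing the same way as a site has a positive approximated response there, one on the opposite side has a negative response, and a site far from the origin separates genotypes more than a site near it.

```python
import math

def inner_product(u, v):
    """Approximated genotype-by-site effect from biplot coordinates."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between two markers (direction only)."""
    return inner_product(u, v) / (math.hypot(*u) * math.hypot(*v))

# Hypothetical biplot coordinates, for illustration only.
genotypes = {"G_a": (1.2, 0.4), "G_b": (-0.9, -0.3)}
sites = {"S_far": (1.5, 0.5), "S_near": (0.2, 0.1)}

for g_name, g in genotypes.items():
    for s_name, s in sites.items():
        print(g_name, s_name,
              round(inner_product(g, s), 2), round(cosine(g, s), 2))
```

Here the gap between the two genotypes is much larger at `S_far` than at `S_near`, which is the sense in which sites far from the center discriminate genotypes the most.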


Fig. 2. Biplot of Model 1 of Mixed Model 1 for grain yield. Lines are 1–29. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), and (28,29). The 16 sites are MEX, USA1, USA2, TKY, ISR, BGD, IND, PKT, SYR, SPN1, SPN2, NPL, KNY, ZBW, NZL, and CHL.

 

Fig. 3. Biplot of Model 15 of Mixed Model 1 [G1 = FA(9) ⊗ A] for grain yield. Lines are 1–29. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), and (28,29). The 16 sites are MEX, USA1, USA2, TKY, ISR, BGD, IND, PKT, SYR, SPN1, SPN2, NPL, KNY, ZBW, NZL, and CHL.

 
When the main effect of genotypes and GE are modeled using the factor analytic variance–covariance Model 15 [FA(9)], which includes information between relatives (Fig. 3), sister lines with a strong genetic association are held together, but others are not: Genotype 4 tended to be farther away from its sister lines (19 and 20) because its COP with them was 0.792, and a similar situation occurred with sister lines 28 and 29, which had a COP of 0.774. Groups of sites with higher correlations, such as ZBW, SPN1, USA1, USA2, MEX, and ISR, tended to appear on the right-hand side of the biplot (Fig. 3). The biplot obtained from Model 8 [FA(2)] was almost identical to the one from Model 15 (Fig. 3), reflecting the high correlation between BLUPs obtained from these models.

Although the general pattern of response in terms of directions and projections of genotypes and sites in the biplot was not altered by using different models, the inclusion of information between relatives in conjunction with modeling the main effects of genotypes and GE by means of a factor analytic structure offers the best option for predicting breeding values of genotypes across different environments and studying main effects of genotypes and GE. Model 15 gave a more realistic view of the relationship between the genotypes themselves, between sites, and between their interactions.

Fitting the Submodels from MM2
For fixed Model 1 of MM2, the FR test of Cornelius et al. (1992) found no significant residual (P ≤ 0.05) after fitting the sixth multiplicative component, whereas the FGH1 test (Cornelius et al., 1996) used for judging the significance of sequentially fitted multiplicative terms found only five significant terms (P ≤ 0.05).

When only the GE was modeled by the various submodels from MM2, the submodel that best fit the data was Model 13 [FA(7) ⊗ A] according to all five information criteria (AIC, AICC, HQIC, BIC, and CAIC; Table 3). Model 14 [FA(8) ⊗ A] estimated six more nonzero parameters (nep = 97) than Model 13 (nep = 91), and the difference in the –2 × residual log likelihood (–2RLL) was 320 – 316 = 4, which is not significant by the χ² test with 6 degrees of freedom (P = 0.677). Model 12 estimated eight fewer parameters than Model 13, and the difference between their –2RLL values was 52, which is highly significant by the χ² test with 8 degrees of freedom (P < 0.001) (Table 3). This is consistent with the observation that all of the information criteria were poorer (i.e., larger) when going from Model 13 to Model 14 but better (i.e., smaller) when going from Model 12 to Model 13. Although it is not known whether the average standard error of the difference between BLUPs of any two genotypes is a reliable model selection criterion, Model 13 had the smallest average standard error of a difference (0.383, Table 3).
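The likelihood ratio comparisons of nested models above can be checked with a short calculation. The sketch below is an illustration rather than the authors' procedure: it treats the difference in –2RLL as a χ² statistic with degrees of freedom equal to the difference in the number of estimated parameters, and computes the upper-tail probability with the standard library only. The function name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2,
    starting from Q(1, h) = exp(-h) or Q(1/2, h) = erfc(sqrt(h)).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Model 14 vs. Model 13 of MM2: difference in -2RLL of 4 with 6 df.
p_13_vs_14 = chi2_sf(320 - 316, 97 - 91)

# Model 13 vs. Model 12 of MM2: difference in -2RLL of 52 with 8 df.
p_12_vs_13 = chi2_sf(52, 8)

print(round(p_13_vs_14, 3), p_12_vs_13 < 0.001)
```

The first probability is about 0.677, in agreement with the value reported for the comparison of Models 13 and 14, and the second is far below 0.001. Because some variance parameters may lie on the boundary of the parameter space, the χ² reference distribution is itself an approximation in this setting.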


Table 3. Values of –2RLL, five information criteria (AIC, AICC, HQIC, BIC, and CAIC; better models have smaller values), average standard error of the difference between the BLUPs of breeding values of any two genotypes in any two sites (SED), and residual variance (E) for the fitted submodels of MM2 for grain yield. The smallest value for each information criterion is underlined. Total number of observations n = 1392.

 
Among the factor analytic models without COP, Model Formula 3 [FA(1) ⊗ Ig] was the best according to AIC, AICC, BIC, and CAIC information criteria, and Model Formula 3 [FA(1) ⊗ Ig] for HQIC; however, these models were much worse than the best overall Model 13 (Table 3).

Standard Errors of the BLUPs of Breeding Values of Genotypes in Sites
The standard errors of the predicted BLUPs of the breeding values for Models 1 through 3 and Model 13 of MM2 (Fig. 4) followed trends similar to those observed for Models 1 through 3 and Model 15 of MM1. Model 13 was the most precise model in terms of SEs as compared with Models 1 through 3, except for a few site–genotype combinations. In all cases, the SEs of BLUPs of lines that had sister lines in the study were smaller than the SEs of the BLUPs of lines that had no sister lines in the study. These results, together with the finding that Model 13 had the smallest values for all the model selection criteria, indicate the higher precision of Model 13 of MM2 over the other submodels for predicting BLUPs of breeding values of genotypes by modeling GE using the coefficient of parentage together with the factor analytic model. The correlations between the BLUPs obtained from Model 8 [FA(2)] and Model 13 [FA(7)] were all above 0.99.


Fig. 4. Standard errors of BLUP of 29 wheat genotypes (1–29) within 16 sites for grain yield for Models 1, 2, 3, and 13 of Mixed Model 2. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), and (28,29). For Model 13, each graph line represents one of the 16 sites.

 
Biplots
The biplots of Models 1 and 13 of MM2 are depicted in Fig. 5 and 6, respectively. In Fig. 5, sister lines 14 and 15 and 19 and 20 show positive GE in NZL, but Genotypes 7 and 22, located in the opposite quadrant, had negative GE in NZL. Sister lines 21, 23, and 24 tended to cluster in the lower left quadrant of the biplot, with positive GE in KNY, TKY, IND, NPL, and BGD. Sister lines 28 and 29 are in the upper part of the biplot, with positive responses in USA1 and ZBW, respectively. Sister lines 5 and 6, together with Genotype 4, clustered at the center of the biplot, indicating their relative insensitivity to the prevailing environmental conditions at all sites and thus giving an average response across all sites in terms of GE.


Fig. 5. Biplot of Model 1 of Mixed Model 2 for grain yield. Lines are 1–29. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), and (28,29). The 16 sites are MEX, USA1, USA2, TKY, ISR, BGD, IND, PKT, SYR, SPN1, SPN2, NPL, KNY, ZBW, NZL, and CHL.

 

Fig. 6. Biplot of Model 13 of Mixed Model 2 [GE = FA(7) ⊗ A] for grain yield. Lines are 1–29. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), and (28,29). The 16 sites are MEX, USA1, USA2, TKY, ISR, BGD, IND, PKT, SYR, SPN1, SPN2, NPL, KNY, ZBW, NZL, and CHL.

 
When information on relatives is added and the GE is modeled using a factor analytic structure [FA(7)], the resulting biplot (Fig. 6) shows trends similar to those of the biplot from Model 1 (Fig. 5), but with the sister lines in each of the five sets being closer. For example, the subset consisting of sister lines 21, 23, and 24 formed a clear cluster in the lower left quadrant of the biplot, whereas subsets 14–15 and 19–20 formed two distinct groups in the lower right quadrant of the biplot. Genotypes 5 and 6 remained in the center of the biplot, but closer to one another than before; Genotype 4 remained in the center of the biplot.


    DISCUSSION
Statistical analysis of lines in early generation testing in MET with replicated or unreplicated field design using the relationship between relatives should improve the precision of BLUPs of breeding values. Furthermore, it can be conjectured that if only some lines can be measured in some environments and others in different environments, information among relatives should facilitate estimating the association between environments, as well as modeling the main effects of lines and GE. Bernardo (2002) discusses the use of the different genetic effects for the prediction of hybrid performance. These improvements in precision should further improve the response to selection in the early generations of self-pollinated crop breeding programs. The correlated performance of a line at two sites measures the influence of environmental conditions on the alleles of that genotype. When two or more genotypes are genetically correlated and their response in environments is measured, the correlated performance should be related, at least in part, because of genetic commonality and will influence associations between sites that were not discovered when genetic associations were excluded from the analysis.

In this study, the relationship matrix A was calculated by tracing the ancestries several generations back. However, one of the problems of using A is that selection is not considered; therefore, some relationships may be greater than those estimated in A (especially if several generations of selection have occurred). Information on molecular markers can also be used to relate genotypes and, as pointed out by Bernardo (2002), BLUP analysis of trait and marker data may be helpful for finding chromosome regions associated with quantitative traits.

Results of this study show the sensitivity of the factor analytic structure for simultaneously modeling the complex correlation among environments and among wheat genotypes and their interactions. The factor analytic model with random coefficients is analogous to the reaction norm model used for assessing the response of genotypic traits in several environments in terms of daily milk production. In this instance, the quantitative trait is written as a function of a random regression coefficient representing the average effect of gene substitution and some unobservable environmental factors (De Jong and Bijma, 2002).

From a plant breeder's perspective, Models MM1 and MM2 are useful for studying association between effects of sites and additive effects of genotypes being evaluated. As shown by Crossa et al. (2004), MM1 allows identification of subsets of sites and genotypes with low frequency and/or magnitude of crossover GE interactions. However, since in the MM1 model genotype main effects and GE interaction effects are not separated, specific interactions of particular genotypes with particular environments are not easily studied. On the other hand, the MM2 model is not sensitive to crossover GE interaction, but allows studying specific adaptation of genotypes to environment.

This study shows how main effects of lines and GE can be modeled with different statistical models in conjunction with information about additive covariances between relatives. Results show that modeling the GE using a factor analytic variance–covariance structure and additive genetic covariance between relatives increases the precision of the prediction of the breeding value. Further research is required to study the incorporation of other types of relationships between relatives (such as the covariances between relatives due to epistatic genetic effects) into the statistical models used in this study for modeling the main effects of genotypes and GE.

Received for publication November 17, 2005.


    REFERENCES



