a Biometrics and Statistics Unit of the Crop Informatics Laboratory (CRIL), International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, México, D.F., México
b Dep. of Plant and Soil Sciences and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40546-0312, USA
c Plant Breeding Institute, University of Sydney, PMB 11, Camden NSW 2570, Australia
d Biometrics and Bioinformatics Unit of the Crop Informatics Laboratory (CRIL), International Rice Research Institute (IRRI), DAPO Box 7777, Manila, the Philippines
* Corresponding author (j.crossa{at}cgiar.org)
| ABSTRACT |
|---|
Abbreviations: BLUP, best linear unbiased prediction; COP, coefficient of parentage; FA, factor analytic; MET, multienvironment trials; MM, mixed model.
| INTRODUCTION |
|---|
A factor analytic (FA) variance-covariance structure, with environments random and genotypes fixed, was considered by Piepho (1997) for the regression of within-site genotypic means on site means. Piepho (1998) considered genotypes as random effects in MM versions of linear and linear-bilinear models. Smith et al. (2002) described random effects FA models in the context of random regression coefficients. Mixed model FA structures were used by Crossa et al. (2004) to confirm clusters of sites showing noncrossover genotype x environment interaction formed by the fixed effect sites regression model and shifted multiplicative model. Smith et al. (2005) gave a general formulation of the most common MMs used for analyzing METs, including the FA models.
Mixed models facilitate the use of genetic covariances between relatives by considering genetic values as random variables to be predicted. The genetic covariance between any pair of related genotypes due to their additive genetic effects is two times the coefficient of parentage (COP), $f_{ii'}$, times the additive genetic variance, $\sigma_a^2$ (Kempthorne, 1969; Henderson, 1976). The matrix $A = 2[f_{ii'}]$ is the additive relationship matrix (Henderson, 1976); thus, $A\sigma_a^2$ is the variance-covariance matrix of the breeding values (or additive genetic effects). Assuming loci segregate independently, the general covariance from the epistatic interaction between loci for any type of relatives includes, among others, coefficients $(2f_{ii'})^2$ for additive x additive genetic effects (Falconer and Mackay, 1996).
Conceptually, for any number of loci, the genotypic value of any individual can be expressed as the sum of the average effect of allele substitution of each of its parents at each locus (additive effects or breeding value), plus a deviation due to nonadditive effects (dominance and epistatic interactions) (Falconer and Mackay, 1996; Bernardo, 2002). When genotypes from one breeding population are crossed with those from another breeding population, the mean genotypic value of the progeny is equal to a general mean plus a sum of the additive effects of the two parents (general combining ability of the parents) and nonadditive effects due to specific combinations of alleles at different loci (specific combining ability) (Bernardo, 2002). A genotype with large positive additive effects tends to perform well in crosses with others, whereas other genotypes may perform well only in specific crosses.
The commercial value of a line is measured by the overall genetic effect of that line (additive + additive x additive), whereas its potential for being a good parent is solely due to its additive effect or breeding value (that is, the part of its genetic value that is passed on to its progeny). It is, therefore, of some practical importance to be able to separate additive and epistatic genetic effects. Different mating designs (e.g., diallel crosses) can be used to assess the general and specific combining ability of genotypes, but they have several drawbacks, including (i) cost, because they are established in addition to the usual METs, and (ii) inefficiency, as only a small number of lines can be tested.
In a recent study, Oakey et al. (2006) proposed a statistical MM for self-pollinated species evaluated in a single replicated trial that partitioned total genetic effects, g, into additive effects, a, and non-additive effects, i. The authors incorporated the additive relationship matrix, A, to model the covariance between relatives for the additive part, but used the identity matrix, I, for the non-additive component. By this procedure, Oakey et al. (2006) were able to overcome some of the drawbacks of standard mating systems (i.e., diallel) used for estimating breeding values of lines.
It would be desirable to predict additive effects separated from non-additive effects and to study the interaction of these components with environments using standard MET data. Recently, Crossa et al. (2006) used different MMs with MET data for modeling the main effects of genotypes and genotype x environment interaction using information about related genotypes of wheat. They obtained BLUPs of the genetic effects using genetic variance-covariance structures constructed as the Kronecker product (direct product) of a structured matrix of genetic variances and covariances across sites, and a matrix, A, of genetic relationships between strains. Results showed that the direct product of FA structures with matrix A efficiently modeled the main effects of genotypes and genotype x environment interaction.
This study extends the MMs developed by Oakey et al. (2006) for partitioning total genetic effects into additive effects and additive x additive effects by using the matrix à to model additive x additive genetic covariances and uses the factor analytic models proposed by Crossa et al. (2006) to partition the genotype x environment interaction into additive x environment interaction and additive x additive x environment interaction. Two CIMMYT international wheat METs were used to illustrate the theory.
| MATERIALS AND METHODS |
|---|
The covariance between relatives $i'$ and $i$ is $2f_{i'i}\sigma_a^2 + (2f_{i'i})^2\sigma_{aa}^2$, where $f_{i'i}$ is the COP between individuals i' and i, $\sigma_a^2$ is the additive genetic variance, and $\sigma_{aa}^2$ is the additive x additive genetic variance. Assuming linkage and identity equilibrium, it seems justified to use $(2f_{i'i})^2$, which in matrix notation can be represented by (A#A) = Ã, as the coefficient of the additive x additive component (where # is the element-wise multiplication operator) (Falconer and Mackay, 1996). The elements of the COP matrix, $f_{ii'}$, are expressed as $(1/2)(1 + F_t)$, where $F_t$ is the coefficient of inbreeding between lines i and i' at generation t after crossing. In self-pollinating species, inbreeding accumulates through genes shared by ancestors and through the selfing process. Sister lines, those derived from the same cross, have different COP values depending on the generation since crossing of their most recent common ancestor. Sister lines that share a later common ancestor have higher COPs.
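The relationship between the COP matrix, A, and Ã can be sketched numerically. The snippet below is only an illustration: the function names and the three-line COP matrix are hypothetical and are not taken from the wheat data sets; the COP values assume fully inbred lines, so the diagonal is 1.

```python
import numpy as np

def relationship_matrices(cop):
    """Return (A, A_tilde) for a symmetric matrix of coefficients of parentage."""
    cop = np.asarray(cop, dtype=float)
    if cop.ndim != 2 or cop.shape[0] != cop.shape[1] or not np.allclose(cop, cop.T):
        raise ValueError("COP matrix must be square and symmetric")
    A = 2.0 * cop        # additive relationship matrix, A = 2[f_ii']
    A_tilde = A * A      # element-wise product A # A, coefficients (2 f_ii')^2
    return A, A_tilde

def covariance_between_relatives(cop, var_a, var_aa):
    """Covariance 2f * var_a + (2f)^2 * var_aa for every pair of lines."""
    A, A_tilde = relationship_matrices(cop)
    return var_a * A + var_aa * A_tilde

# Hypothetical COPs: lines 1 and 2 are sister lines, line 3 is distantly related.
cop = np.array([[1.00, 0.75, 0.10],
                [0.75, 1.00, 0.10],
                [0.10, 0.10, 1.00]])
A, A_tilde = relationship_matrices(cop)
cov = covariance_between_relatives(cop, var_a=1.0, var_aa=0.5)
```

Because Ã squares each element of A, close relatives keep a sizeable additive x additive covariance while distant relatives contribute almost nothing to it.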
Routine computation of COP matrices requires information on relatives, as far back in a pedigree as possible, and on generations of selfing after the last cross. The International Crop Information System (ICIS) (McLaren et al., 2005) is designed to manage pedigree information, and the Browse application of ICIS (http://cropwiki.irri.org/icis/index.php/TDM GMS Browse, verified 13 Dec. 2006), which accounts for relationships between sister lines as well as inbreeding, was used to compute the COP matrices; therefore, no matrix adjustments are required.
Mixed Models
Following Crossa et al. (2006), two mixed models are used, which differ in how the additive and additive x additive main effects and the additive x environment and additive x additive x environment interaction effects are modeled. We follow the MM terminology used by Oakey et al. (2006) for describing the overall genetic effects, g, and their components a (additive effects) and i (additive x additive effects), and the MMs of Crossa et al. (2006) for expressing the total genetic x environment interaction effects (ge) and their additive x environment interaction (ae) and additive x additive x environment (ie) components.
Mixed Model 1
Mixed Model 1 combines the genetic main effects and genetic x environment interaction effects for fitting the data from g genotypes (i = 1, 2,..., g), s sites (j = 1, 2, ..., s), and r replicates (in each site) using the A and à matrices (both of dimensions g x g). The MM1 is written as
[Equation 1]

The random effects $r$, $a_1$, $i_1$, and $\varepsilon$ are assumed to be normally distributed, with zero mean vectors and variance-covariance matrices denoted by $R$, $G_{a1}$, $G_{i1}$, and $N$, respectively, such that

[Equation 2]

$$G_{a1} = \Sigma_{a1} \otimes A \qquad [3]$$

where $\otimes$ is the Kronecker product (direct product) operator and $\Sigma_{a1}$ is a site additive genetic variance-covariance matrix with the additive plus additive x environment interaction genetic variance within the jth site, $\sigma_{a1(j)}^2$, on the diagonal, and the additive plus additive x environment interaction genetic covariance between sites j and j', $\rho_{jj'}\sigma_{a1(j)}\sigma_{a1(j')}$, on the off-diagonal; thus $\rho_{jj'}$ is the correlation of the additive and additive x environment interaction effects between sites j and j'. Matrix $G_{i1}$ is similarly defined, with A replaced by Ã and the additive genetic variance-covariance matrix $\Sigma_{a1}$ replaced by an additive x additive epistatic genetic variance-covariance matrix $\Sigma_{i1}$.
Assuming independence between vectors $a_1$ and $i_1$, the total genetic effect, $g_1$, has a normal distribution with mean zero and variance-covariance matrix $G_{g1} = G_{a1} + G_{i1}$. The MM equations and the solution for the vector of fixed effects of site means, $\hat{b}$, and the vectors of random effects $\hat{r}$, $\hat{a}_1$, and $\hat{i}_1$ are obtained following Henderson (1975).
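As a rough illustration of the MM1 covariance structure, the sketch below builds a site matrix from standard deviations and correlations and combines it with A and Ã through Kronecker products. The helper names, the two-site, three-line dimensions, and all numeric values are hypothetical; the site-major ordering of the stacked effects (sites varying slowest) is an assumption chosen to match the Kronecker product of a site matrix with A.

```python
import numpy as np

def site_covariance(sd, corr):
    """Site matrix with sd_j^2 on the diagonal and rho_jj' * sd_j * sd_j' off it."""
    sd = np.asarray(sd, dtype=float)
    return np.outer(sd, sd) * np.asarray(corr, dtype=float)

def mm1_genetic_covariance(sigma_a1, sigma_i1, A):
    """Return (G_a1, G_i1, G_g1) with G_a1 = Sigma_a1 (x) A and G_i1 = Sigma_i1 (x) A#A."""
    A_tilde = A * A                      # additive x additive relationships
    G_a1 = np.kron(sigma_a1, A)
    G_i1 = np.kron(sigma_i1, A_tilde)
    return G_a1, G_i1, G_a1 + G_i1       # a1 and i1 assumed independent

# Hypothetical inputs: 2 sites, 3 lines.
A = np.array([[2.0, 1.5, 0.2],
              [1.5, 2.0, 0.2],
              [0.2, 0.2, 2.0]])
sigma_a1 = site_covariance([1.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
sigma_i1 = site_covariance([0.5, 0.5], [[1.0, 0.2], [0.2, 1.0]])
G_a1, G_i1, G_g1 = mm1_genetic_covariance(sigma_a1, sigma_i1, A)
```

Each s x s entry of the site matrix scales a full g x g block of relationships, so one correlation between two sites governs the covariance of every pair of lines across those sites.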
Unlike Oakey et al. (2006), in this study, the structure of the vector of fixed site effects, b, does not include global field variation such as fixed row or column effects or any possible source of extraneous field variability due to management such as serpentine planting, irrigation, etc. Furthermore, the structure of N does not include any random extraneous or local spatially dependent effects that could have been modeled, for example, by an autoregressive process in the direction of the rows and columns (Gilmour et al., 1997).
Mixed Model 2
Mixed Model 2 distinguishes the total genetic main effects, g, and their additive, a, and additive x additive, i, components, from the total genetic x environment interaction effects, ge, and their additive x environment interaction (ae) and additive x additive x environment components for fitting g genotypes, s sites, and r replicates using the A and à matrices of additive and additive x additive relationships, respectively. The MM2 model is denoted by the following mixed linear model:
[Equation 4]

where the vectors $r$, $a$, $i$, $ae$, $ie$, and $\varepsilon$ contain random effects of replicates within sites, additive, additive x additive, additive x environment interaction, additive x additive x environment interaction, and residuals, respectively, and are assumed to be random and normally distributed with zero mean vectors and variance-covariance matrices $R$, $G_a$, $G_i$, $G_{ae}$, $G_{ie}$, and $N$, respectively, such that

[Equation 5]

with $G_a = \sigma_a^2 A$ and $G_i = \sigma_{aa}^2 \tilde{A}$.
The variance-covariance of the random vector of additive x environment interaction effects, $ae$, is modeled as

$$G_{ae} = \Sigma_{ae} \otimes A \qquad [6]$$

where the jth diagonal element of $\Sigma_{ae}$ is the additive x environment variance $\sigma_{aej}^2$ within the jth site, and the jj'th off-diagonal element is the additive x environment covariance $\rho_{jj'}\sigma_{aej}\sigma_{aej'}$ between sites j and j'; thus $\rho_{jj'}$ is the correlation of additive x environment effects between sites j and j'. Similarly, matrix $G_{ie} = \Sigma_{ie} \otimes \tilde{A}$, where $\Sigma_{ie}$ is the additive x additive x environment variance-covariance matrix of dimensions s x s.
Assuming independence between vectors $ae$ and $ie$, the total ge effect has a normal distribution with mean zero and variance-covariance $G_{ge} = G_{ae} + G_{ie}$. As in the case of MM1, the MM equations and the solution for the vector of fixed site effects, $\hat{b}$, and the vectors of random effects of replicates within sites, $\hat{r}$; additive, $\hat{a}$; additive x additive, $\hat{i}$; additive x environment, $\widehat{ae}$; and additive x additive x environment interaction, $\widehat{ie}$, are obtained following Henderson (1975).
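To see how the MM2 main effects and interactions combine, the following sketch computes the covariance of the stacked genotype-in-site genetic effects, a + i + ae + ie. It is a derivation under stated assumptions rather than a formula given in the text: main effects are taken to be repeated across sites, all four vectors are taken as mutually independent, and the function name and inputs are hypothetical.

```python
import numpy as np

def mm2_stacked_genetic_covariance(var_a, var_aa, sigma_ae, sigma_ie, A):
    """Covariance of the gs x 1 vector of a + i + ae + ie (site-major order)."""
    s = sigma_ae.shape[0]
    J = np.ones((s, s))                  # a main effect is shared by all sites
    A_tilde = A * A
    additive = np.kron(var_a * J + sigma_ae, A)
    epistatic = np.kron(var_aa * J + sigma_ie, A_tilde)
    return additive, epistatic

# Hypothetical inputs: 2 sites, 3 lines.
A = np.array([[2.0, 1.5, 0.2],
              [1.5, 2.0, 0.2],
              [0.2, 0.2, 2.0]])
sigma_ae = np.array([[1.0, 0.3], [0.3, 2.0]])
sigma_ie = np.array([[0.5, 0.1], [0.1, 0.4]])
additive, epistatic = mm2_stacked_genetic_covariance(0.2, 0.2, sigma_ae, sigma_ie, A)

# Check the additive part against the explicit construction Z G_a Z' + G_ae,
# where Z = 1_s (x) I_g repeats each main effect in every site.
Z = np.kron(np.ones((2, 1)), np.eye(3))
explicit = Z @ (0.2 * A) @ Z.T + np.kron(sigma_ae, A)
```

In this form a main-effect variance adds the same constant to every element of the site matrix, which is one way to see why it is hard to separate from the interaction variances when the likelihood is flat.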
Modeling $\Sigma_{a1}$ and $\Sigma_{i1}$ of MM1 and $\Sigma_{ae}$ and $\Sigma_{ie}$ of MM2 Using the Factor Analytic Model
The structure used in this study for modeling the variance-covariance matrices of MM1 and MM2 is the FA model, which models the variance-covariance relationships in these matrices in terms of a small number of unobserved factors. To describe the FA model of the variance-covariance matrices of MM1 and MM2, we follow Smith et al. (2002), adapted to the case where the total genetic effect is partitioned into additive and additive x additive effects.
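For MM1, the Appendix shows that the factor analytic model that defines matrices $\Sigma_{a1}$ and $\Sigma_{i1}$ in $G_{g1}$ when $g_1 = a_1 + i_1$ is

$$G_{g1} = (\Lambda_{a1}\Lambda_{a1}' + \Psi) \otimes A + (\Lambda_{i1}\Lambda_{i1}' + \Psi) \otimes \tilde{A} \qquad [7]$$

Similarly, the factor analytic model that defines $\Sigma_{ae}$ and $\Sigma_{ie}$ in $G_{ge}$ of model MM2 can be derived for $ge = ae + ie$.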
Biplots of Fitted Models MM1 and MM2
Biplots of the fitted factor analytic models can be obtained directly from the scores of the lines and loadings of the sites for the a1 and i1 effects of MM1, and for the additive x environment and additive x additive x environment interaction effects of MM2. Biplots from MM1 and MM2 models for Data Set 1 will be presented.
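How scores and loadings translate into a biplot reading can be sketched as follows. This is a minimal illustration with hypothetical two-factor scores and loadings, not the fitted values for Data Set 1; the fitted effect of a line at a site is taken as the inner product of the line's score vector and the site's loading vector, which is the quantity a biplot displays geometrically.

```python
import numpy as np

def biplot_fitted_effects(scores, loadings):
    """Fitted effects (lines x sites) implied by FA scores and site loadings."""
    scores = np.asarray(scores, dtype=float)      # g x k genotypic scores
    loadings = np.asarray(loadings, dtype=float)  # s x k site loadings
    return scores @ loadings.T

# Hypothetical FA(2) values: line 1 lies in the same quadrant as both sites,
# line 2 lies in the opposite quadrant.
scores = np.array([[1.0, 1.0],
                   [-1.0, -1.0]])
loadings = np.array([[2.0, 1.0],
                     [1.0, 2.0]])
fitted = biplot_fitted_effects(scores, loadings)
```

A line plotted in the quadrant opposite the sites has negative inner products with the site vectors, hence negative fitted effects at those sites; sites near the origin have short vectors and discriminate little among lines.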
ASReml
Variance component estimation and fitting of the FA covariance structures were done using the restricted maximum likelihood (REML) method implemented in ASReml (Gilmour et al., 2002). Obtaining a solution for this model is sometimes cumbersome because the model is complex and because of possible multicollinearity between the matrices A and Ã used to model the additive and additive x additive effects, respectively.
When using ASReml, appropriate selection of initial values for the FA model is important. The DIAG solution provides a good starting point for fitting FA(1), and in general, the FA(k-1) solution provides a good starting point for fitting FA(k) (k = 2, 3, ...). A poor choice of initial values can delay convergence, and the algorithm may even fail to converge; therefore, the use of satisfactory initial values for variance parameters in the FA structure is of paramount importance. Usually the likelihood function is flat and has various local maxima, which forces the user into a detailed model-fitting process of trying out different initial values and modifying the parameters that control the maximization. The problem of identifiability of variance models is similar to the issue of fitting an over-parameterized fixed model. Identifiability problems of the variance components of a and i effects in MM2 were solved by constraining the variance $\sigma_a^2$ of a to be equal to the variance $\sigma_{aa}^2$ of i.
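One informal way to gauge the collinearity between A and Ã is to correlate their off-diagonal elements. The sketch below is an illustrative diagnostic only and is not part of the ASReml fitting procedure described above; the function name and the four-line relationship matrix are hypothetical.

```python
import numpy as np

def offdiag_correlation(A):
    """Pearson correlation between off-diagonal elements of A and of A # A."""
    A = np.asarray(A, dtype=float)
    upper = np.triu_indices(A.shape[0], k=1)   # each pair of lines once
    a = A[upper]
    a_tilde = (A * A)[upper]                   # element-wise square
    return np.corrcoef(a, a_tilde)[0, 1]

# Hypothetical additive relationship matrix for 4 inbred lines.
A = np.array([[2.0, 1.5, 0.2, 0.3],
              [1.5, 2.0, 0.2, 0.4],
              [0.2, 0.2, 2.0, 1.0],
              [0.3, 0.4, 1.0, 2.0]])
r = offdiag_correlation(A)
```

Because Ã is a monotone transformation of A for nonnegative relationships, this correlation is typically high, which is consistent with the difficulty of estimating separate variances for a and i.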
Experimental Data
The data are derived from two CIMMYT bread wheat METs. The variable analyzed was grain yield (Mg ha$^{-1}$). Data Set 1 contains data from 47 genotypes (1–47) arranged in an incomplete block design with two replicates in each of 10 sites. There were six sets of sister lines: the THILI group {17, 18}, KETUPA group {20, 21, 22}, OTUS group {26, 27}, CAZO/KAUZ group {28, 29}, OASIS group {34, 35, 36, 37, 38, 39, 40}, and SERI group {42, 45, 47}. Data Set 2 had 49 lines (1–49) arranged in an incomplete block design, with two replications at each of 15 sites. In this data set there were seven sets of sister lines denoted as the OTUS group {6, 7}, CROC group {15, 16}, WEAVER group {20, 21}, CHEN1 group {28, 29}, PARA group {30, 31}, CHEN group {33, 34}, and BABA group {42, 43}.
| RESULTS |
|---|
For Data Set 1, MM1 gave an estimate of the average of the diagonal elements of $\Sigma_{a1}$ of 2.212, while MM2 gave an estimate of the average of the diagonal elements of $\Sigma_{ae}$ of 2.251. Similarly, models MM1 and MM2 gave similar values for the average of the diagonal elements of $\Sigma_{i1}$ and $\Sigma_{ie}$, 1.877 and 1.872, respectively. Note that the parameters $\sigma_a^2$ and $\sigma_{aa}^2$ were assumed to be equal, and their estimate was 0.002 (Table 1). For Data Set 1, the residual variances for MM1 and MM2 were very similar, 0.242 and 0.241, respectively. For Data Set 2, only model MM1 was fitted; it gave an estimate of the additive plus additive x environment interaction variance (average of the diagonal elements of $\Sigma_{a1}$) of 1.956 and an estimate of the additive x additive plus additive x additive x environment interaction variance (average of the diagonal elements of $\Sigma_{i1}$) of 1.093, with a residual error of 0.250.
From MM1 and MM2, the 10 lines with the highest BLUPs of the total genetic effects for grain yield were lines 30, 2, 5, 22, 4, 43, 16, 37, 38, and 17 (Table 3); for both models, lines 30, 2, 5, 22, 16, and 37 were also within the 10 best with respect to additive genetic effects. Interestingly, lines 44, 40, 25, and 15, which did not perform within the 10 best in terms of BLUPs of total genetic effects as estimated using MM1 and MM2, were within the best 10 performers for the BLUPs of the additive effects for both MM1 and MM2. Line 43 was a good performer in terms of the BLUPs of the total genetic effects (it ranked sixth) and it showed the best BLUP of the additive x additive effects for MM1 and MM2 (it ranked first); however, it ranked 39th and 38th with respect to the BLUPs of the additive effects in models MM1 and MM2, respectively, indicating that line 43 would not be considered a good parent to be used in future crosses (it ranked 36th). However, sister lines 42, 45, and 47 were among the worst 10 performers in terms of overall genetic effects, as well as additive and additive x environment effects for MM1 and MM2. Concerning additive x additive effects, the 10 lines with the highest BLUPs of the additive x additive effects for MM1 were 30, 5, 4, 43, 17, 20, 31, 26, 3, and 6, whereas for MM2, the 10 best lines were 30, 4, 43, 17, 20, 31, 26, 3, 6, and 13 (Table 3). Lines 30, 4, 43, and 17 had high values of BLUPs of the additive x additive effects in MM1 and MM2 and were within the best 10 performers with respect to g1 and a1 in MM1 and also with respect to g and a in MM2. In contrast, lines 20, 31, 26, 3, 6, and 13 had intermediate to low BLUPs of the total genetic effects in both models.
Data Set 2
From the best 10 lines with respect to BLUPs of g1 (MM1) (41, 40, 19, 23, 26, 47, 28, 24, 6, and 34), only lines 41, 40, and 23 were within the 10 highest with respect to BLUPs of a1 (Table 4), and only lines 19, 26, 47, 6, and 34 were within the 10 lines with the highest BLUPs of i1. Other lines, such as sister lines {43, 42} as well as lines 18, 212, and 39, were within the best 10 lines with respect to BLUPs of a1 and therefore can be considered good potential parents despite their low commercial performance, indicated by the low values of BLUPs of g1. Lines 40, 41, and 23 had high commercial values and are good potential parents.
Biplots of the a1 and i1 of MM1 and the Additive x Environment and the Additive x Additive x Environment interaction of MM2 for Data Set 1
The additive and additive x environment interaction (a1) patterns modeled using the factor analytic variance-covariance of model MM1 are depicted in Fig. 1. Subsets of sister lines of the SERI group {42, 45, 47} and the OTUS group {26, 27} tended to have negative additive interaction with most sites included in this analysis, as they are located in the quadrant of the biplot opposite from where the sites are located. Similar responses were shown for lines 1, 6, 13, 43, 46, and 11. The biplot of MM1 shows subsets of lines with promising performance as parents in specific sets of environments. For example, lines 42, 45, 47, 1, 6, 13, 43, 46, and 11 had negative additive interactions with most sites, but specifically with sites S1, S2, and S7, whereas lines 2, 5, 15, 22, and 25 (located in the opposite quadrant) had a positive interaction with most sites, but specifically with sites S1, S2, and S7. As already mentioned, lines 2, 5, 15, 22, and 25 had high performance for additive effects (Table 3) and should be good parents, especially for environmental conditions prevailing in sites S1, S2, and S7. On the other hand, lines 42, 45, 47, 1, 6, 13, 43, 46, 41, and 11 were poor parents in all environments, but especially in sites S1, S2, and S7 (Fig. 1 and Table 3). In terms of environments, S1, S2, and S7 were the sites that most discriminated the lines in terms of additive and additive x environment effects, whereas sites S3, S8, S9, and S10, in the center of the biplot, were more neutral to the additive and additive x environment effects of the lines.
| DISCUSSION AND CONCLUSIONS |
|---|
Therefore, making crosses between good potential parents will require considering sets of parents with high breeding values for certain target populations of environments. If additive effects predominate, as they do in Data Set 1, the plant breeder's task will be relatively easy, as he will simply accumulate those genes and phenotypic expression will improve incrementally. However, if additive x additive effects show some importance, as in Data Set 2, it will be more difficult to progress, given that selection must focus on the best combination of genes, many of which may not individually contribute to phenotype.
In instances where additive x additive effects play an important role in determining commercial yield, progress may not necessarily be linked to crossing parents with high breeding values and may in fact be achieved by crossing parents with lower breeding values, but targeted to specific sets of environments. Nevertheless, both characteristics (good commercial values and high additive effects) can be combined in certain lines with relatively high additive effects at certain sites and other lines that performed well commercially and had high additive effects at other sets of sites. As more and more genes are being tagged using DNA markers, it is now possible to better estimate gene effects, thereby improving our understanding of complex epistatic interaction. This will allow plant breeders to better exploit both additive and additive x additive variation in their breeding programs. Defining a target set of environments and lines for additive as well as additive x additive effects considering additive x environment and additive x additive x environment interactions should help breeders maximize overall genetic gains.
The MMs presented in this research have the advantage over the models of Crossa et al. (2006) that they allow partitioning the total genetic effects into additive x environment interaction effects and additive x additive x environment interaction effects, and over the model of Oakey et al. (2006) that they model the interactions of additive and additive x additive effects with environments. The computational effort when using ASReml is justified by the more insightful understanding that can be obtained of how the additive and additive x additive effects of the lines interact with environments. Applying the models presented in this study using other software packages should be feasible.
Results of this research indicate that lines with high breeding values may not necessarily have high commercial values. However, it is possible to find lines with high overall production and high overall additive effects; they should be used in a crossing program so that lines with high additive effects can be crossed with each other, and crosses may also be made between lines with predominately additive effects and lines with additive x additive effects. This study shows that (i) additive x additive x environment interactions contribute to the overall commercial performance of a line, and (ii) subdivision of environments is required to exploit the positive additive x environment interactions of some lines in specific environments. Since additive x additive x environment interactions also contribute to the overall commercial performance of a line, these two criteria should be considered when selecting potential parents. Defining target sets of environments and lines with positive additive x environment and additive x additive x environment interactions should help breeders maximize overall genetic gains.
In summary, the results of this study show that it is possible to use matrices A and à for modeling a1, i1, ae, and ie through mixed linear models to select wheat lines with high additive effects under some environmental conditions, but that may not have similar responses in other environments. Some environments may favor lines with positive a1 or ae effects and disfavor other lines with positive i1 or ie effects. Wheat lines included in Data Set 1 had higher additive effects than additive x additive epistasis, whereas lines included in Data Set 2 had similar additive and additive x additive effects.
Further research is required to investigate the usefulness of partitioning total genetic effects into additive and additive x additive and their interactions with environments for enhancing the linkage between mapped markers and phenotypic trait observations used in association genetics analyses.
| APPENDIX |
|---|
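Assume first that genotypes are unrelated. The effect of the ith genotype in the jth environment, $g_{ij}$, is expressed as a linear function of latent variables $x_{ik}$ with coefficients $\lambda_{jk}$ for k = 1, 2, ..., t, plus a residual $\delta_{ij}$:

$$g_{ij} = \sum_{k=1}^{t} \lambda_{jk} x_{ik} + \delta_{ij} \qquad [A1]$$

where $\lambda_{jk}$ is the loading of the jth environment (environmental potentiality) in the kth latent factor, $x_{ik}$ is the score of the ith genotype (genotypic sensitivity) in the kth latent factor, and $\delta_{ij}$ is the unexplained residual term. The variables $x_{ik}$ are not observed, but play the role of independent variables in the standard regression theory. The effects of the g genotypes in each of the s environments can be stacked into a vector, $g_1$, of order gs x 1 and equation [A1] can be expressed in matrix form as:

$$g_1 = (\lambda_1 \otimes I_g)x_1 + \cdots + (\lambda_k \otimes I_g)x_k + \delta \qquad [A2]$$

where vector $\lambda_k$ (for the kth latent variable of sites) is of order s x 1 so that matrix $\lambda_k \otimes I_g$ is of order gs x g, vector $x_k$ (for the kth latent variable of genotype) is of order g x 1 so that vector $(\lambda_k \otimes I_g)x_k$ is of order gs x 1, and vector $\delta$ is of order gs x 1. Equation [A2] can be written in a more compact form as:

$$g_1 = (\Lambda \otimes I_g)x + \delta \qquad [A3]$$

where $\Lambda$ is a matrix of order s x k, where the kth column contains the site loadings for the kth latent factor, matrix $(\Lambda \otimes I_g)$ is of order gs x gk, vector $x$ is of order gk x 1 and contains the genotypic scores for the latent factors stacked to conform with $\Lambda$, and therefore $(\Lambda \otimes I_g)x$ is a gs x 1 vector.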
Since we have assumed that genotypes are unrelated, the random effects $x$ and $\delta$ are independent and have a joint normal distribution with mean vector of zero and variances $V(x) = I_k \otimes I_g$ (of order gk x gk) and $V(\delta) = \Psi \otimes I_g$ (of order gs x gs), where $I_k$ is an identity matrix of order k x k, and $\Psi$ is a diagonal matrix $\mathrm{diag}(\psi_1^2, \psi_2^2, \ldots, \psi_s^2)$ of order s x s.

Therefore, the variance-covariance of $g_1$, $G_{g1}$ (of order gs x gs) is

$$G_{g1} = (\Lambda\Lambda' + \Psi) \otimes I_g \qquad [A4]$$

where $\Lambda$ is an s x k matrix of loadings and $\Psi$ is an s x s diagonal matrix with s possibly different nonnegative parameters on the diagonal. When only one factor is considered, k = 1, the model has one multiplicative term and is denoted as FA(1); for k = 2, FA(2) has two multiplicative components, and so on. Thus, FA can be interpreted as the linear regression of genotype and ge on environmental covariates (environmental loadings), with each genotype having a separate slope (genotypic scores) but a common intercept (if main effects of genotypes are not distinguished from ge as in MM1). The slopes of genotypes measure the sensitivity of the genotypes to hypothetical environmental factors represented by the loadings of each site (Smith et al., 2002).
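The FA covariance for unrelated genotypes can be illustrated with a small numeric sketch. The loadings and specific variances below are hypothetical FA(1) values for three sites, and the function name is an assumption made for this example; the code only evaluates the loadings-plus-specific-variance form of the site matrix and its Kronecker product with an identity matrix.

```python
import numpy as np

def fa_site_covariance(loadings, specific_var):
    """Site covariance implied by an FA(k) model: Lambda Lambda' + Psi."""
    psi = np.asarray(specific_var, dtype=float)
    lam = np.asarray(loadings, dtype=float).reshape(len(psi), -1)  # s x k
    return lam @ lam.T + np.diag(psi)

# Hypothetical FA(1) parameters for 3 sites.
loadings = [1.0, 2.0, 0.5]
specific_var = [0.1, 0.2, 0.3]
sigma = fa_site_covariance(loadings, specific_var)

# Unrelated genotypes: covariance of the stacked gs x 1 vector of effects.
g = 4
G_g1 = np.kron(sigma, np.eye(g))

# Genetic correlation between sites 1 and 2 implied by the model.
rho_12 = sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])
```

An FA(k) model for s sites uses sk loadings plus s specific variances instead of the s(s + 1)/2 parameters of an unstructured site matrix, which is what makes it attractive when the number of sites is large.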
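Now assume that genotypes are related such that the covariance between relatives, within sites, is proportional to A for additive effects and proportional to Ã for additive x additive effects and that $g_1$ (of MM1) is partitioned into additive, $a_1$, and additive x additive, $i_1$, effects; then [A3] becomes

$$g_1 = (\Lambda_{a1} \otimes I_g)x_{a1} + (\Lambda_{i1} \otimes I_g)x_{i1} + \delta \qquad [A5]$$

where the subscript a1 for $\Lambda_{a1}$ and $x_{a1}$ denotes the additive main effects and additive x environment interaction, and the subscript i1 for $\Lambda_{i1}$ and $x_{i1}$ represents the additive x additive main effects and additive x additive x environment interaction components of the total genetic effect, $g_1$. Then the random effects $x_{a1}$, $x_{i1}$, and $\delta$ can be assumed to be independent with a joint normal distribution with mean zero and variances $V(x_{a1}) = (I_k \otimes A)$, $V(x_{i1}) = (I_k \otimes \tilde{A})$, and $V(\delta) = \Psi \otimes (A + \tilde{A})$. Thus, the variance of the total random genetic effects, $g_1$, is

$$G_{g1} = (\Lambda_{a1}\Lambda_{a1}' \otimes A) + (\Lambda_{i1}\Lambda_{i1}' \otimes \tilde{A}) + \Psi \otimes (A + \tilde{A}) \qquad [A6]$$

$$G_{g1} = (\Lambda_{a1}\Lambda_{a1}' + \Psi) \otimes A + (\Lambda_{i1}\Lambda_{i1}' + \Psi) \otimes \tilde{A} = \Sigma_{a1} \otimes A + \Sigma_{i1} \otimes \tilde{A} \qquad [A7]$$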
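The step from the stated variances of the scores to the Kronecker form of the covariance can be checked numerically. The sketch below uses randomly generated loadings and a hypothetical relationship matrix; the symbol names (Lambda for loadings, Psi for specific variances) are the notation assumed in this appendix, and the check relies only on the mixed-product property of the Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(0)
s, k, g = 4, 2, 3                       # sites, latent factors, genotypes

# Hypothetical inputs.
A = np.array([[2.0, 1.5, 0.2],
              [1.5, 2.0, 0.2],
              [0.2, 0.2, 2.0]])
A_tilde = A * A                         # A # A
lam_a = rng.normal(size=(s, k))         # additive loadings
lam_i = rng.normal(size=(s, k))         # additive x additive loadings
psi = np.diag(rng.uniform(0.1, 1.0, size=s))

# Variance of (Lambda (x) I_g) x when V(x) = I_k (x) A.
Z_a = np.kron(lam_a, np.eye(g))
var_scores_a = Z_a @ np.kron(np.eye(k), A) @ Z_a.T
kron_form_a = np.kron(lam_a @ lam_a.T, A)

# Three-term form versus the grouped two-term form of the total covariance.
three_terms = (np.kron(lam_a @ lam_a.T, A)
               + np.kron(lam_i @ lam_i.T, A_tilde)
               + np.kron(psi, A + A_tilde))
two_terms = (np.kron(lam_a @ lam_a.T + psi, A)
             + np.kron(lam_i @ lam_i.T + psi, A_tilde))
```

Both comparisons agree to numerical precision, so grouping the specific variances with each relationship matrix changes only the presentation, not the covariance itself.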
Similar development can be obtained for MM2, in which the random main effects of additive and additive x additive and their interactions with environments are separated from the additive x environment and additive x additive x environment interaction effects.
| ACKNOWLEDGMENTS |
|---|
Received for publication September 6, 2006.