a Monsanto Co., 800 N. Lindbergh Blvd., St. Louis, MO 63167
b Univ. of Nebraska, Lincoln, NE 68583-0940
c USDA-ARS, Florence, SC
d Washington State Univ., Pullman, WA. Published as Univ. of Nebraska ARD Journal Series no. 14689
* Corresponding author (keskridge1{at}unl.edu).
| ABSTRACT |
|---|
Abbreviations: CNN, cv. Cheyenne; GEI, genotype-by-environment interaction; KPS, kernels per spike; ML, maximum likelihood; P, precipitation; QTL, quantitative trait locus; RICLs, recombinant inbred chromosome lines; RILs, recombinant inbred lines; SEM, structural equation modeling; SPSM, spikes per square meter; SR, solar irradiance; T, temperature; TKW, thousand kernel weight; YLD, yield.
| INTRODUCTION |
|---|
The most commonly used approaches to understanding GEI involve several challenging steps and are limited in their capabilities of characterizing the complex relationships involved in GEI. The first step is to cross at least two genotypes, molecularly characterize hundreds of progeny representing segregants of the entire genome, and evaluate the progeny phenotypically under different environmental conditions (Tanksley et al., 1982; Edwards et al., 1987; Stuber et al., 1987; Lander and Botstein, 1989). Then, the GEI of a number of traits is evaluated for QTL-by-environment interaction where each trait is quantitatively assessed either by comparing results of QTL analyses conducted in individual environments or by using a univariate statistical model with QTL, environment, and QTL-by-environment variables as predictor variables (Stuber et al., 1992; Beavis, 1994; Beavis and Keim, 1996; Vargas et al., 1998, 1999; Crossa et al., 1999). Molecular characterization of hundreds and possibly thousands of progeny across the entire genome is likely not cost effective and using univariate statistical models is not conducive to understanding the complex relationships (epistasis, pleiotropism, etc.) among a number of traits and variables present when describing QTL-by-environment interaction. Moreover, a single dependent variable quantitative approach cannot describe the complicated relationships between traits, QTL, and environments where some traits function simultaneously as both dependent variables to be predicted by other genetic and environmental factors, and as independent predictor variables of other traits. Novel genetic and quantitative methodologies are needed for understanding how genes interact with environment in complex traits.
Here we use an approach for dissecting GEI of complex traits that combines two distinctive components: chromosome substitution lines and structural equation modeling (SEM). Chromosome substitution lines, though rare in animals (Mackay, 1980), are common in wheat (Triticum aestivum L., 2n = 6x = 42) where they have a long history of use in genetic analysis (Sears, 1953). In a wheat chromosome substitution line, 20 of the 21 chromosome pairs come from one cultivar (recurrent parent) and the remaining chromosome pair comes from a second cultivar (donor parent). By comparing the recurrent parent to its chromosome substitution line, the contribution of genes on the substituted chromosome from the donor parent or the loss of genes from the recurrent parent can be identified. Substitution lines effectively dissect the large hexaploid wheat genome into 1/21 increments with a common background, thus greatly reducing background variability and effects that would be found in segregating progeny populations from two parents. The recurrent parent and chromosome substitution lines are true breeding, hence can be tested in multiple environments. In previous research (Berke et al., 1992), chromosome 3A contained genes affecting anthesis date, grain yield, and yield components.
To dissect the location and relationship of genes on a chromosome, recombinant inbred chromosome lines (RICLs) can be developed (Law, 1966; Berke et al., 1992; Joppa et al., 1997; Shah et al., 1999). These are analogous to recombinant inbred lines (RILs), but in this case 20 chromosome pairs are from the recurrent parent and only the chromosome of interest is allowed to recombine, again reducing the background variability. Since RICLs are true breeding and can be evaluated in numerous environments for the traits of interest, QTL-by-environment interaction can be precisely evaluated, which can give considerable insight into complex gene-environment interactions when used in conjunction with SEM.
Genotype-by-environment interactions are very common in plants, yet understanding the underlying QTL-by-environment interaction is difficult. Most agronomically important traits are the result of a number of genetic, molecular, and physiological mechanisms that affect the trait of interest either directly or indirectly through other intermediate traits (Adams, 1967; Thomas et al., 1970; Hamid and Grafius, 1978; García del Moral et al., 1991; Dofing and Knight, 1992). Depending on the trait, this network among variables can be quite complex, with the relationships among variables depending on a number of factors, foremost of which are the stage of development and the environmental conditions during those stages. In a similar fashion, the GEI of each trait in the system will be influenced, either directly or indirectly, by a number of QTL-by-environment interactions and GEI of other traits, which may, in turn, be influenced by other factors (Campbell et al., 2003). Identifying these complex relationships among variables will likely require considerable precision, and since RICLs are more powerful for QTL analysis than RILs (Kaeppler, 1997), RICLs will provide more precision for understanding QTL-by-environment interaction.
Structural equation modeling is a generalization of path analysis proposed by Wright (1921) and is used to quantitatively analyze the causal structure among a number of variables where each may function as a dependent variable in some equations and an independent variable in others (Bollen, 1989). Because of this, SEM is ideal for characterizing the complex relationships of GEI where many traits function as both dependent variables to be predicted by environmental and genetic factors, but also as independent predictor variables of other traits further downstream. To use SEM to analyze GEI, prior knowledge of the direction of the causal relationships is assumed and specified through a path diagram and the model is then algebraically specified by a system of regression-type equations where each variable is adjusted to contain only GEI effects. A final model is then developed by fitting successive models and retaining significant QTL-by-environment variables which result in a better fitting model. The final model yields path coefficients and a path diagram that contains only significant paths thus giving insight into important relationships between the traits, QTL, and the environmental variables. The purpose of this work is to present a systematic approach for understanding GEI of complex traits by using chromosome substitution lines and a structural equation model that approximates the complicated relationships among QTL, environmental variables, and traits.
| MATERIALS AND METHODS |
|---|
Environmental covariates for the first two developmental periods were used to model SPSM and KPS GEIs, since preanthesis environmental conditions were most critical for those traits (Donmez et al., 2001). In a similar way, environmental covariates for the last two developmental periods were used to model TKW GEI since this trait is mostly influenced by the environmental conditions during reproductive and grain-filling periods (Donmez et al., 2001). Since YLD is affected by all yield components, which are associated with different stages of development, all environmental covariates were used to model YLD GEI. Total precipitation for 3 mo before sowing was also used for modeling all yield and yield component GEIs to consider pre-sowing soil moisture.
Definition of Variables
Structural equation modeling requires that variables be identified as either exogenous or endogenous. Exogenous variables are those determined outside of the system and are always used as independent (predictor) variables, whereas endogenous variables are determined by the system and are used as dependent as well as independent variables in the system of equations. Since our objective was to model yield GEI, observed yield and yield component residuals (YLDGEI, TKWGEI, KPSGEI, and SPSMGEI) were used as the measured endogenous Y variables. These residuals were obtained by subtracting the estimated main effects of genotypes, environments, and blocks within environments from observed values. As an example, for variable Y, $Y_{GEI} = Y - G - E - B(E)$, where G and E are genotype and environment main effects and B(E) is the effect of block nested in environment. The exogenous variables ($X_{ij}$) were the cross-products of the ith genotypic covariate and jth environmental covariate. Since genotype and environment main effects were removed from the endogenous variables, the only feasible exogenous variables were those that were cross-products of genotypic and environmental covariates, and no environmental or genotypic covariates by themselves were considered in the model. In addition, because yield and yield component residuals were modeled, exogenous variables (X) were also adjusted for main effects. For example, let $Y_i$ be a column vector of the ith endogenous variable, and let Z and X be exogenous variables, where Z is the incidence matrix of the main effects for genotype, environment, and block within environment. Then the full linear model is $Y_i = Z\beta_1 + X\beta_2 + e_i$. After fitting the main effects of genotype, environment, and block within environment (Z), the vector of residuals is $Y_i - Z\beta_1^0 = [I - Z(Z'Z)^{-1}Z']Y_i$, where $\beta_1^0$ is the estimate for $\beta_1$, which equals $(Z'Z)^{-1}Z'Y_i$. The resulting reduced model is $(I - P_Z)Y_i = (I - P_Z)X\beta_2 + d_i$, where $P_Z = Z(Z'Z)^{-1}Z'$ and $d_i = (I - P_Z)e_i$.
As our primary interest was to estimate $\beta_2$ in the reduced model, we adjusted variable X with $(I - P_Z)$ before using it as a predictor. This adjustment allowed us to obtain estimates equivalent to those from fitting the full model and was based on standard linear model theory (Searle, 1971; Ravishanker and Dey, 2002; Dhugana, 2004). All plot observations for all Y and X variables used in this study were adjusted by $(I - P_Z)$ to remove main effects of genotypes, environments, and blocks as described in the example.
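The equivalence between the full and reduced models can be checked numerically. The sketch below is illustrative only: the trial layout, marker coding, temperatures, and true coefficient are hypothetical, and it uses the Moore-Penrose pseudoinverse (`np.linalg.pinv`) in place of $(Z'Z)^{-1}$ because the over-parameterized incidence matrix built here is not of full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced layout: 3 genotypes x 4 environments x 2 blocks.
n_gen, n_env, n_blk = 3, 4, 2
gen, env, blk = (a.ravel() for a in np.meshgrid(
    np.arange(n_gen), np.arange(n_env), np.arange(n_blk), indexing="ij"))
n = gen.size

def indicators(codes):
    """One 0/1 column per distinct level of `codes`."""
    return (codes[:, None] == np.unique(codes)[None, :]).astype(float)

# Z: incidence matrix for intercept, genotype, environment, block within environment.
Z = np.hstack([np.ones((n, 1)), indicators(gen), indicators(env),
               indicators(env * n_blk + blk)])

# X: cross-product of a genotypic covariate (hypothetical marker score)
# and an environmental covariate (hypothetical mean temperature).
marker = np.array([0.0, 1.0, 1.0])[gen]
temperature = np.array([10.2, 12.5, 9.8, 14.1])[env]
X = (marker * temperature)[:, None]

# Simulated response: arbitrary main effects + interaction effect + noise.
beta2_true = 0.5
Y = Z @ rng.normal(size=Z.shape[1]) + beta2_true * X[:, 0] \
    + rng.normal(scale=0.1, size=n)

# Full model: Y = Z b1 + X b2 + e (least squares, minimum-norm solution).
coef_full, *_ = np.linalg.lstsq(np.hstack([Z, X]), Y, rcond=None)
beta2_full = coef_full[-1]

# Reduced model: (I - Pz) Y = (I - Pz) X b2 + d.
M = np.eye(n) - Z @ np.linalg.pinv(Z)   # I - Pz
coef_reduced, *_ = np.linalg.lstsq(M @ X, M @ Y, rcond=None)
beta2_reduced = coef_reduced[0]

print(beta2_full, beta2_reduced)  # the two estimates agree
```

Adjusting both sides by $(I - P_Z)$ leaves the estimate of $\beta_2$ unchanged, which is why the adjusted variables can be passed directly to the structural equation model.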
Statistical Model Formulation
Prior knowledge of the directions of causal relationships among the variables, as required by SEM, was based on the causal, unidirectional relationships among yield components for small grains proposed by Dofing and Knight (1992). They proposed these relationships among yield components because (i) yield components develop sequentially, with later-developing components being controlled by earlier-developing ones (Adams, 1967; Thomas et al., 1970; Hamid and Grafius, 1978; García del Moral et al., 1991; Dofing and Knight, 1992), and (ii) the unidirectional model more realistically reflects operative causal relationships among yield components than a bidirectional model. The unidirectional path model has been used in most of the recent yield component analyses for small grains (Donaldson et al., 2001; García del Moral et al., 2003; Maman et al., 2004).
The model was algebraically specified using SEM with observed variables (Model 1) (Bollen, 1989):
$$Y = BY + \Gamma X + \zeta \qquad [1]$$

where $Y$ is the 4 x 1 vector of $(I - P_Z)$ adjusted endogenous variables (YLDGEI, TKWGEI, KPSGEI, and SPSMGEI); $B$ is the 4 x 4 coefficient matrix that shows the causal relationships among the endogenous variables; $X$ is a q x 1 vector of $(I - P_Z)$ adjusted exogenous variables, where q = number of cross-products of genotypic covariates with environmental covariates ($X_{ij}$) retained in the model; $\Gamma$ is the 4 x q coefficient matrix that shows the causal relationship among endogenous and exogenous variables; and $\zeta$ is the 4 x 1 disturbance vector associated with the endogenous variables, assumed to have $E(\zeta) = 0$ and covariance matrix $E(\zeta\zeta') = \Psi$ and to be uncorrelated with the exogenous variables. It is also assumed that $(I - B)$ is nonsingular so that $(I - B)^{-1}$ exists. Generally, $B$ is a triangular matrix with zeroes on its diagonal.
It is important to note that Model 1 is a generalization of factorial regression. After simplification, Model 1 becomes $Y = (I - B)^{-1}\Gamma X + (I - B)^{-1}\zeta$, and letting $\Pi = (I - B)^{-1}\Gamma$ and $f = (I - B)^{-1}\zeta$ results in a multivariate factorial regression model, $Y = \Pi X + f$. Each row of this model represents a factorial regression model for one dependent variable (YLDGEI, TKWGEI, KPSGEI, or SPSMGEI) as a function of $(I - P_Z)$ adjusted cross-product covariates ($X_{ij}$). Here $\Pi$ is a 4 x q coefficient matrix showing the effect of the q cross-product terms on yield and yield component GEIs. By using SEM, we are able to estimate $\Pi$ in terms of $B$ and $\Gamma$, which provides additional insight into how covariates contribute to yield GEI directly and indirectly via yield component GEIs, how yield component GEIs are interrelated, and which yield component GEI explained most of the yield GEI variance. With $\Gamma = 0$, Model 1 becomes a path model for yield and yield component residuals (YLDGEI, TKWGEI, KPSGEI, and SPSMGEI).
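The relationship between the structural form and the reduced (factorial regression) form can be illustrated with a small numerical sketch. All coefficient values below are hypothetical and are not estimates from this study. The ordering SPSM, KPS, TKW, YLD and the symbol names `B`, `Gamma`, and `Pi` are assumptions chosen to mirror the notation of Model 1.

```python
import numpy as np

# Endogenous variables (GEI residuals), ordered by developmental sequence.
names = ["SPSM", "KPS", "TKW", "YLD"]

# Hypothetical B: row = response, column = predictor.
# Lower triangular with a zero diagonal, i.e., a unidirectional path model.
B = np.array([
    [ 0.0,  0.0, 0.0, 0.0],   # SPSM
    [-0.3,  0.0, 0.0, 0.0],   # KPS  <- SPSM
    [-0.2, -0.4, 0.0, 0.0],   # TKW  <- SPSM, KPS
    [ 0.6,  0.5, 0.4, 0.0],   # YLD  <- SPSM, KPS, TKW
])

# Hypothetical Gamma for q = 2 cross-product covariates (columns X1, X2).
Gamma = np.array([
    [0.5,  0.0],
    [0.0, -0.3],
    [0.0,  0.0],
    [0.1,  0.2],
])

I = np.eye(4)
inv_I_minus_B = np.linalg.inv(I - B)

Pi = inv_I_minus_B @ Gamma      # total effects of X on Y (reduced form)
indirect_X = Pi - Gamma         # effects of X transmitted through other traits
total_YY = inv_I_minus_B - I    # total effects among endogenous variables
indirect_YY = total_YY - B      # indirect effects among endogenous variables

print("total effect of X1 on YLD:", round(Pi[3, 0], 3))
print("direct:", Gamma[3, 0], "indirect:", round(indirect_X[3, 0], 3))
```

A factorial regression fitted to yield GEI alone would recover only the row of `Pi` for YLD. The structural form additionally separates that total into the direct path (`Gamma`) and the paths routed through the yield components.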
Estimation
Conceptually, SEM parameters are estimated by minimizing the difference between the observed covariance matrix ($S$ in Eq. [2]) and the model-predicted covariance matrix [$\Sigma(\theta)$ in Eq. [3], where $\theta$ contains the elements of the model parameters $B$, $\Gamma$, and $\Psi$] (Bollen, 1989). The observed covariance matrix ($S$) and the model-predicted covariance matrix [$\Sigma(\theta)$] were defined as

[Equations 2 and 3: images not recovered]
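For orientation, the textbook expressions for an observed-variable structural equation model (Bollen, 1989) are sketched below. They are given as the standard forms and may differ in notation from the equations originally printed here. The symbol $\Phi$, the covariance matrix of the exogenous variables $X$, is an assumption introduced for this sketch, as is the explicit maximum likelihood fit function.

```latex
% Model-implied covariance matrix of the observed variables (Y, X)
\Sigma(\theta) =
\begin{bmatrix}
(I - B)^{-1}\left(\Gamma \Phi \Gamma' + \Psi\right)\left[(I - B)^{-1}\right]' &
(I - B)^{-1}\Gamma \Phi \\
\Phi \Gamma' \left[(I - B)^{-1}\right]' & \Phi
\end{bmatrix}

% Maximum likelihood fit function, with p + q observed variables
F_{ML} = \ln\left|\Sigma(\theta)\right|
       + \operatorname{tr}\left[S\,\Sigma^{-1}(\theta)\right]
       - \ln\left|S\right| - (p + q)
```

When the model reproduces the observed covariances exactly, $\Sigma(\theta) = S$ and $F_{ML} = 0$.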
The $\chi^2$ statistic is $(n - 1)$ times the fit function ($F_{ML}$), where n is the number of observations (Schumacker and Lomax, 1996). A nonsignificant chi-square value (P > 0.05) indicates that the two matrices [$S$ and $\Sigma(\theta)$] are not statistically different. The degrees of freedom (df) associated with $\chi^2$ are deduced as df = $\frac{1}{2}(p + q)(p + q + 1) - t$, where p + q is the number of observed variables analyzed and t is the total number of parameters estimated. The $\chi^2$ test is sensitive to sample size (Schumacker and Lomax, 1996); however, it is the only statistical test available to test the overall fit of the model. Several goodness-of-fit indices, namely the goodness-of-fit index (GFI), the adjusted goodness-of-fit index (AGFI), and the normed fit index (NFI), were used to evaluate the fit of the structural equation model. GFI is the percentage of observed covariances explained by the model-implied covariances (Pugesek, 2003; Schumacker and Lomax, 1996). GFI = $1 - F_{ML}/F_0$, where $F_0$ is the fit function for the null model (i.e., when $B = 0$ and $\Gamma = 0$ in Eq. [1]). When the df is large relative to the sample size, GFI is biased downward (Kline, 1998). The AGFI is a variant of GFI that uses mean squares instead of total sums of squares. Equivalently, the AGFI adjusts the GFI for the df of a model relative to the number of variables (Schumacker and Lomax, 1996; Jöreskog and Sörbom, 1996). NFI reflects the proportion by which the proposed model improves the fit compared to the null model (Bentler and Bonett, 1980; Schumacker and Lomax, 1996) and is given by $(\chi^2_{null} - \chi^2_{model})/\chi^2_{null}$. GFI, AGFI, and NFI values range from zero to one, with one for the best possible fit.
A nonsignificant $\chi^2$ and GFI, AGFI, and NFI values greater than or equal to 0.90 were used to justify the final model. The squared multiple correlation, $R^2$, for each yield and yield component was reported. The results were interpreted using standardized coefficients and by calculating the total direct and indirect effects of the cross-product covariates ($X_{ij}$) and yield-component GEIs on yield GEI.
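The arithmetic behind the degrees of freedom, the test statistic, and the NFI can be sketched as a few helper functions. The function names and every number in the usage lines are hypothetical illustrations, not values reported in this study.

```python
def model_df(p, q, t):
    """df = (1/2)(p + q)(p + q + 1) - t for p + q observed variables
    and t estimated parameters."""
    return (p + q) * (p + q + 1) // 2 - t

def chi_square_statistic(n, f_ml):
    """Test statistic: (n - 1) times the minimized fit function."""
    return (n - 1) * f_ml

def nfi(chi2_null, chi2_model):
    """Normed fit index: proportional improvement over the null model."""
    return (chi2_null - chi2_model) / chi2_null

# Hypothetical example: 4 endogenous and 8 exogenous variables give
# 12 * 13 / 2 = 78 distinct variances and covariances.
print(model_df(p=4, q=8, t=57))                   # 21
print(chi_square_statistic(n=101, f_ml=0.1465))   # about 14.65
print(nfi(chi2_null=200.0, chi2_model=14.65))     # about 0.927
```

A model is over-identified, and therefore testable, only when the df is positive, which is why t must stay below the number of distinct elements in the observed covariance matrix.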
| RESULTS AND DISCUSSION |
|---|
The final model fit the data well, as shown by a nonsignificant $\chi^2$ ($\chi^2_{(21)} = 14.65$, P = 0.84) and by the GFI, AGFI, and NFI goodness-of-fit indices all being considerably greater than 0.95. Exogenous variables retained in the final model had a significant path to at least one of the endogenous variables (Fig. 1). The correlations between the exogenous variables in the final model ranged from 0.64 to 0.84 with six coefficients being significant, P < 0.05 (data not shown).
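The reported P value can be checked from the reported test statistic and degrees of freedom. The sketch below uses only the Python standard library. The helper `chi2_sf` is a hypothetical name, and it evaluates the chi-square upper-tail probability through the standard recurrence for the regularized incomplete gamma function, which is valid for integer df.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df > x)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Starting value of the regularized upper incomplete gamma Q(a, h).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))     # Q(1/2, h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

p_value = chi2_sf(14.65, 21)
print(round(p_value, 2))  # 0.84
```

A P value this far above 0.05 means the model-implied covariance matrix is not statistically distinguishable from the observed one.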
Three marker x environmental covariate interactions (Xtam055 x T1, Xbarc67 x T2, and Xbcd1555 x P2) that were retained in the model were predictors of GEI in SPSM (Fig. 1). The large positive effect of Xtam055 x T1 on SPSMGEI indicated that higher temperatures from seedling emergence to terminal spikelet initiation provided relatively more favorable conditions for WI genotypes at Xtam055 for spike development. Similarly, the positive coefficient between Xbarc67 x T2 and SPSMGEI implied that higher temperatures in the second growth (T2) period were relatively more favorable for the WI genotype at Xbarc67 than CNN in terms of SPSM. The negative coefficient between Xbcd1555 x P2 and SPSMGEI showed that increased precipitation in the reproductive period (P2) was relatively less favorable for the WI genotype at Xbcd1555 than CNN in terms of SPSM. Campbell et al. (2004) also found Xtam055 x T1 to have the largest effect on SPSMGEI, and although they did not find a relationship between Xbarc67 x T2 and SPSMGEI, they did find similar covariates containing markers near Xbarc67 (Xksua6 x P1, Xksua6 x T1, and Xbcd366 x T1) that were related to SPSMGEI.
The negative relationship between Xbcd1555 x P1 and KPS GEI (KPSGEI) suggested that increased precipitation during vegetative growth (P1) was relatively less favorable for the WI genotype for KPS. However, the small significant direct positive effect of Xbcd1555 x P1 on yield GEI nullified the indirect negative effect of this covariate through KPSGEI (Table 2). KPSGEI was also affected by Xksua6 x T1 and Xtam055 x P0, where the positive effect of Xksua6 x T1 on KPSGEI was interpreted in a manner similar to that of Xtam055 x T1 on SPSMGEI. The negative relationship between Xtam055 x P0 and KPSGEI indicated that increased soil moisture during the pre-sowing period (P0) was relatively less favorable for WI genotypes compared to CNN at Xtam055. Similar results were reported in Campbell et al. (2004) in that the two largest covariates associated with KPSGEI were Xksua6 x T1 and Xksua6 x P1. However, they did not find significant covariates containing either Xtam055 or Xbcd1555, possibly because their factorial regression model did not account for SPSMGEI as a predictor variable of KPSGEI, as the structural equation model used here does.
Kernel weight GEI (TKWGEI) was affected by Xbarc86 x T2 and Xtam055 x SR2, both of which were positively related to it. Campbell et al. (2004) found TKWGEI was related to covariates with markers from similar areas on chromosome 3A (Xbarc86 x SR3, Xbcd366 x SR3, Xbarc67 x SR3). They did not find a covariate with markers close to Xtam055, however, which could have been due to their factorial regression model not including KPSGEI or SPSMGEI as other predictor variables.
These significant relationships between yield and yield component GEIs suggest that our understanding of yield GEI can be enhanced by identifying the genotype and environmental factors that contribute to the GEIs in TKW, KPS, and in particular, SPSM. Hence, in future GEI studies, more effort should be focused on finding factors that predict yield component GEIs to improve our understanding of the molecular character of genes that control important traits.
The positive coefficient between Xbarc67 x T2 and SPSMGEI implied that higher temperatures in the second growth (T2) period were relatively more favorable for the WI genotype at Xbarc67 than CNN in terms of SPSM. The higher temperature may have provided favorable conditions to form more spikes or to prevent the formed spikes from being aborted. These results are logical in that WI was developed in Kansas and CNN in Nebraska. The higher temperatures during development would be more similar to the "Kansas" environment where WI genotypes would be favored. Campbell et al. (2004) also identified Xbarc67 x T2 as the largest marker x environmental covariate interaction responsible for yield GEI using individual factorial regression. However, our results demonstrate a deeper understanding of the yield GEI by showing that the effect of Xbarc67 x T2 on yield GEI was partly due to its direct effect and partly due to its indirect effect via KPS GEI and TKW GEI. The negative relationships of Xbcd1555 x P1 with KPSGEI and Xbcd1555 x P2 with SPSMGEI could mean that WI is more drought tolerant than CNN since it was developed in hotter Kansas environments.
In this study, we have demonstrated a flexible method to study GEI using quantitative traits. This approach targets genes on an individual chromosome via chromosome substitution lines and RICLs and utilizes a single, biologically relevant model incorporating causal relationships among interrelated traits and environmental factors via SEM. Unlike traditional QTL and GEI approaches, a biologically meaningful conclusion can be reached within the scope of this approach, since complex direct and indirect relationships among genetic and environmental variables are accounted for. An "all inclusive" model is increasingly important to consider when studying complex traits of interest (e.g., grain yield) in wheat and other higher plants. Grain yield in wheat is known to be controlled by yield-component traits that are interrelated, have compensatory effects, develop sequentially at different growth stages, and have large GEIs. The large wheat grain-yield GEI associated with genes on chromosome 3A has been explained by direct effects of TKW and direct and indirect effects of SPSM and KPS. Moreover, the known GEI associated with the expression of grain yield QTL previously detected on chromosome 3A can be explained by the direct and indirect effects of yield component QTL-by-environmental covariate interactions represented by molecular markers linked to those QTL.
Many of the most popular methods used to analyze GEI, such as the AMMI model, factorial regression, and stability analysis, are based on univariate models that have a single trait as the dependent variable where each trait is analyzed and interpreted separately (Kang and Gauch, 1996; Vargas et al., 1999; Crossa et al., 1999; Campbell et al., 2004). These univariate approaches can provide considerable insight but generally are not helpful in describing the complex relationships among traits and variables present in GEI. In most GEI studies, each trait is separately analyzed without including other traits as predictors with the consequence of not being able to detect the effects of other traits or to obtain a comprehensive understanding of GEI. Factorial regression and partial least squares models as proposed by Crossa et al. (1999) and Vargas et al. (1999) may be used to incorporate other traits as independent variables but even these models are limited in that they are not conducive to depicting complex causal relationships among variables and they cannot account for indirect influences of other traits, QTL, or environmental variables acting through intermediate variables on GEI of the trait of interest. Such indirect influences are well known and expected in yield and yield components analysis and are important in understanding GEI of yield.
Structural equation modeling, unlike most univariate approaches, can be used to develop a succinct, graphic, and comprehensive view of how traits, genetic factors, and environmental variables all work together as a system to affect GEI. Structural equation modeling allows one to decompose relationships among traits, QTL, and environmental variables into direct and indirect causal relationships that aid with understanding how genes interact with environmental factors to affect GEI of important agronomic traits. With SEM, the direction of the causal relationships is described through a path diagram and the model is then algebraically specified by a system of regression-type equations meaning that any GEI method based on linear models, such as factorial regression, can be considered a special case of SEM. In general, SEM can provide insight into processes that can be modeled as systems of linear equations, such as the GEI of yield and yield components in wheat.
This study shows that GEI associated with a complex trait such as grain yield is affected by a multitude of genetic and environmental factors. Given the additional insight provided by our approach, using substitution lines in conjunction with SEM is a reasonable approach for studying GEI in other plants and organisms. As genes are identified that control complex and agronomically important traits such as grain yield (Ashikari et al., 2005), this approach can easily be extended to describe how the genes interact with the environment to create the needed phenotypes with higher and more stable grain yields.
| ACKNOWLEDGMENTS |
|---|
Received for publication June 26, 2006.
| REFERENCES |
|---|