Crop Science Illumina

Published online 2 December 2005
Published in Crop Sci 46:174-179 (2006)
© 2005 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING, GENETICS & CYTOLOGY

Variance Component Estimation Using the Additive, Dominance, and Additive x Additive Model When Genotypes Vary across Environments

Jixiang Wu a, Johnie N. Jenkins b,*, Jack C. McCarty b, and Dongfeng Wu c

a Dep. of Plant and Soil Sciences,
b Crop Science Research Laboratory, USDA-ARS,
c Dep. of Mathematics and Statistics, Mississippi State Univ., Mississippi State, MS 39762

* Corresponding author (jnjenkins@ars.usda.gov).


    ABSTRACT
In addition to additive (A) and dominance (D) genetic effects, additive x additive (A x A) epistatic effects control many quantitative traits and are important for genetic and breeding studies. To estimate these genetic variance components, including those for genotype x environment (G x E) interactions, one usually needs data from parents and at least two generations (e.g., F1 and F2), with the same entries in all environments. Such a design can be difficult to implement in practice. In this study, we performed Monte Carlo simulations to compare the estimated variance components among four partial and two complete genetic designs (GDs) using the mixed linear model approach. Our definition of GD differs from the traditional definitions of genetic mating designs. Simulation results showed that the estimated genetic variance components for A, A x E, A x A epistatic, and A x A x E effects were unbiased under all six designs. Two of the four partial designs gave results for D and D x E effects comparable to those of the complete GDs, with slightly larger mean square errors (MSEs), indicating that some partial GDs can be used when genetic resources are limited.

Abbreviations: $\hat{\sigma}_u^2$, variance component mean • A, additive • D, dominance or dominant • E, environment • G, genotype • GD, genetic design • MINQUE, minimum norm quadratic unbiased estimation • MSE, mean square error


    INTRODUCTION
ESTIMATION OF GENETIC VARIANCE components is an important issue in quantitative genetics, and a genetic mating design is often required to reach this goal. The mating designs most widely used for estimating A and D variance components include the nested mating design (North Carolina I), the factorial mating design (North Carolina II), and various types of diallel mating designs (Griffing's mating designs); Chapters 18 and 19 of Lynch and Walsh (1998) describe these designs in detail. If a full diallel mating design is used, maternal and/or paternal effects may also be detected. These genetic variance components are usually estimated with the ANOVA approach, which has several limitations: (i) the mating design must be balanced (with no missing crosses), (ii) it requires F1 hybrids, and (iii) the genetic model cannot be extended. The ANOVA approach therefore restricts the choice of genetic models, generations, and data sets.

In addition to A and D effects, epistatic effects (gene x gene interactions between different loci) influence many quantitative traits in nature. Several quantitative trait locus mapping studies have detected epistatic effects (Cao et al., 2001; Doebley et al., 1995; Eshed and Zamir, 1996; Lark et al., 1995; Lee et al., 2001; Li et al., 1997; Liao et al., 2001; Wu, 2003). In contrast, relatively few traditional quantitative genetic studies have reported epistatic effects (Goodnight, 1988; Cheverud and Routman, 1996; Xu and Zhu, 1999; Edwards and Lamkey, 2002; McCarty et al., 2004; Saha et al., 2004). Among the many possible epistatic effects, A x A epistatic effects are among the most important for breeding studies. A genetic model containing A effects, D effects, and A x A epistatic effects is called the ADAA model (Cockerham, 1980), and it can be extended to include G x E interaction effects (Zhu, 1994). For the ADAA model, data containing only F1s and parental lines may not be sufficient; on the basis of Zhu's studies (Zhu, 1994, 1998), parents and at least two generations, such as F1s and F2s, are required. Because G x E interactions can strongly influence a quantitative trait, such traits should be measured in multiple environments. Ideally, the same genotypes would be evaluated in every environment (a complete GD), but for practical reasons this is often difficult. Researchers nevertheless want to extract as much genetic information as possible from such unbalanced experiments, for which the ANOVA and GLM methods are inappropriate.
Mixed linear model approaches offer the flexibility of analyzing various types of unbalanced data (Zuo et al., 2000, 2001; McCarty et al., 2004); however, the feasibility of using partial GDs for efficient estimation of genetic variance components for the ADAA genetic model remains unknown and warrants further investigation.

In this study, we define several complete and partial GDs and use simulation to evaluate how well the variance components of the ADAA model are estimated under each. The results provide practical alternatives to guide breeders and experimenters in designing genetic experiments when complete GDs across multiple environments are difficult to implement.


    GENETIC MODELS AND METHODOLOGY
Genetic Model
The genetic model including A, D, A x A effects, and their corresponding G x E interaction effects (ADAA model) was employed for the data analysis (Cockerham, 1980; Zhu, 1994).

The mixed linear models were as follows:

Parents:

[1]

F1:

[2]

F2:

[3]

F3:

[4]




where µ is the population mean, a fixed effect; $E_h$ is the effect of environment h, which can be treated as random or fixed (fixed in this study); $A_i$ (or $A_j$) is the additive effect of parent i (or j); $D_{ii}$, $D_{jj}$, or $D_{ij}$ is the dominance effect; $AA_{ii}$, $AA_{jj}$, or $AA_{ij}$ is the additive x additive epistatic effect; $AE_{hi}$ (or $AE_{hj}$) is the additive x environment interaction effect; $DE_{hii}$, $DE_{hjj}$, or $DE_{hij}$ is the dominance x environment interaction effect; $AAE_{hii}$, $AAE_{hjj}$, or $AAE_{hij}$ is the additive x additive x environment interaction effect; $B_{k(h)}$ is the effect of block k within environment h; and $e_{hijk}$ is the random error.

Equations [1] to [4] can be expressed in the form of vectors and matrices as follows.



[5]

In this model, the E effect is assumed to be fixed; µ is the population mean; 1 is a vector with all elements equal to 1; $e_A$ is the vector of additive effects, $e_A \sim N(0, \sigma_A^2 I)$, and $U_A$ is its incidence matrix; $e_D$ is the vector of dominance effects, $e_D \sim N(0, \sigma_D^2 I)$, and $U_D$ is its incidence matrix; $e_{AA}$ is the vector of additive x additive effects, $e_{AA} \sim N(0, \sigma_{AA}^2 I)$, and $U_{AA}$ is its incidence matrix; $e_{AE}$ is the vector of additive x environment effects, $e_{AE} \sim N(0, \sigma_{AE}^2 I)$, and $U_{AE}$ is its incidence matrix; $e_{DE}$ is the vector of dominance x environment effects, $e_{DE} \sim N(0, \sigma_{DE}^2 I)$, and $U_{DE}$ is its incidence matrix; $e_{AAE}$ is the vector of additive x additive x environment effects, $e_{AAE} \sim N(0, \sigma_{AAE}^2 I)$, and $U_{AAE}$ is its incidence matrix; $e_B$ is the vector of block effects, $e_B \sim N(0, \sigma_B^2 I)$, and $U_B$ is its incidence matrix; and $e_e$ is the vector of random errors, $e_e \sim N(0, \sigma_e^2 I)$.
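As a rough illustration of the matrix form, the following NumPy sketch draws one data set from a model of this type and builds the phenotypic covariance matrix $V = \sum_u \sigma_u^2 U_u U_u^T + \sigma_e^2 I$ implied by the normal-distribution assumptions above. It is a minimal sketch, not the authors' C++ program: the function names, the toy layout (three F1 crosses of three parents in two environments), the single additive term, and the numeric values are all hypothetical, and the real coefficients in the incidence matrices for parents, F2, and F3 follow Equations [1] to [4].

```python
import numpy as np

def simulate_phenotypes(mu, env_effect, env_index, U_list, sigma2_list, sigma2_e, rng):
    """Draw one data set y = 1*mu + E + sum_u U_u e_u + e (hypothetical helper).

    env_effect  : fixed environment effects, one per environment (E is fixed here)
    env_index   : environment code of each observation
    U_list      : incidence matrices U_u of the random effects
    sigma2_list : variance components sigma_u^2, one per matrix in U_list
    """
    env_index = np.asarray(env_index)
    y = mu + np.asarray(env_effect, dtype=float)[env_index]   # fixed part
    for U, s2 in zip(U_list, sigma2_list):
        e_u = rng.normal(0.0, np.sqrt(s2), size=U.shape[1])   # e_u ~ N(0, s2 * I)
        y = y + U @ e_u
    return y + rng.normal(0.0, np.sqrt(sigma2_e), size=len(env_index))  # residual

def phenotypic_covariance(U_list, sigma2_list, sigma2_e):
    """V = sum_u sigma_u^2 U_u U_u^T + sigma_e^2 I implied by the model."""
    n = U_list[0].shape[0]
    V = sigma2_e * np.eye(n)
    for U, s2 in zip(U_list, sigma2_list):
        V = V + s2 * (U @ U.T)
    return V

# Hypothetical toy layout: F1 crosses 1x2, 1x3, 2x3 of three parents,
# each observed once in each of two environments (6 observations).
crosses = [(0, 1), (0, 2), (1, 2)]
env_index = [0, 0, 0, 1, 1, 1]
U_A = np.zeros((6, 3))
for row, (i, j) in enumerate(crosses * 2):
    U_A[row, i] += 1.0   # additive effect of parent i
    U_A[row, j] += 1.0   # additive effect of parent j

rng = np.random.default_rng(1)
y = simulate_phenotypes(50.0, [0.0, 5.0], env_index, [U_A], [20.0], 20.0, rng)
V = phenotypic_covariance([U_A], [20.0], 20.0)
```

In this toy case, two observations of the same cross share both parental additive effects, so their covariance is $2\sigma_A^2$, while two crosses sharing one parent covary by $\sigma_A^2$.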

Variance Component Estimation
The minimum norm quadratic unbiased estimation (MINQUE) approach was proposed by Rao (1971) for estimating variance components. The variance components in the ADAA model can be estimated by solving the following MINQUE normal equations for u, v = 1, 2, ..., 8.

[6]
where tr denotes the trace (the sum of the diagonal elements of a matrix), and

[7]
where $V_\alpha = \sum_{u=1}^{8} \alpha_u U_u U_u^T$ and $V_\alpha^{-1}$ is its inverse, with prior values $\alpha_u$ in place of $\sigma_u^2$ in V. In this simulation study, we set $\alpha_u = 1$ for u = 1, ..., 8, so this approach is called MINQUE(1) (Zhu, 1989).
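For reference, the standard form of Rao's (1971) MINQUE normal equations, which we assume is what Equations [6] and [7] express, is $\sum_{v=1}^{8} \mathrm{tr}(Q_\alpha U_u U_u^T Q_\alpha U_v U_v^T)\,\hat{\sigma}_v^2 = y^T Q_\alpha U_u U_u^T Q_\alpha y$ for u = 1, ..., 8, with $Q_\alpha = V_\alpha^{-1} - V_\alpha^{-1} X (X^T V_\alpha^{-1} X)^{-} X^T V_\alpha^{-1}$. Here X (a name introduced for this sketch) is the design matrix of the fixed effects (µ and E), and the residual term uses $U = I$. The NumPy sketch below solves these equations with all $\alpha_u = 1$, i.e., MINQUE(1). It is an illustrative assumption, not the authors' implementation; generalized inverses are taken with `np.linalg.pinv`, and the toy data are hypothetical.

```python
import numpy as np

def minque(y, X, U_list, alpha=None):
    """MINQUE variance-component estimates (Rao, 1971); alpha = all ones gives MINQUE(1).

    y      : observations, shape (n,)
    X      : fixed-effect design matrix, shape (n, p), e.g. mean and environment columns
    U_list : incidence matrices U_u of the random terms, residual last (identity)
    """
    k = len(U_list)
    alpha = np.ones(k) if alpha is None else np.asarray(alpha, dtype=float)
    G = [U @ U.T for U in U_list]                        # U_u U_u^T
    V = sum(a * g for a, g in zip(alpha, G))              # V_alpha
    Vi = np.linalg.inv(V)
    ViX = Vi @ X
    Q = Vi - ViX @ np.linalg.pinv(X.T @ ViX) @ ViX.T      # Q_alpha
    QG = [Q @ g for g in G]
    S = np.array([[np.trace(QG[u] @ QG[v]) for v in range(k)] for u in range(k)])
    Qy = Q @ y
    q = np.array([Qy @ g @ Qy for g in G])               # y^T Q U_u U_u^T Q y
    return np.linalg.pinv(S) @ q                         # solve S * sigma2 = q

# Hypothetical toy data: 6 F1 crosses of 4 parents, 2 environments x 2 blocks.
rng = np.random.default_rng(7)
crosses = [(i, j) for i in range(4) for j in range(i + 1, 4)]
rows = [(h, b, c) for h in range(2) for b in range(2) for c in crosses]
n = len(rows)
X = np.column_stack([np.ones(n), [float(h == 1) for h, _, _ in rows]])  # mu, E_2
U_A = np.zeros((n, 4))
U_B = np.zeros((n, 4))                                   # block within environment
for r, (h, b, (i, j)) in enumerate(rows):
    U_A[r, i] += 1.0
    U_A[r, j] += 1.0
    U_B[r, 2 * h + b] = 1.0
U_list = [U_A, U_B, np.eye(n)]
y = X @ np.array([50.0, 5.0]) + U_A @ rng.normal(0, np.sqrt(20), 4) + rng.normal(0, np.sqrt(20), n)
est = minque(y, X, U_list)   # estimates of (sigma_A^2, sigma_B^2, sigma_e^2)
```

Because $Q_\alpha X = 0$, adding any fixed-effect term $Xb$ to y leaves the estimates unchanged. Setting every $\alpha_u = 1$ means no prior knowledge of the true variances is needed, which makes MINQUE(1) convenient for repeated simulation.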

Genetic Designs
The six GDs evaluated are listed in Table 1. In this study, a partial GD is one in which the genotypes or generations differ between the two environments, and a complete GD is one in which the genotypes and generations used in the two environments are exactly the same. Our notions of partial and complete GDs therefore differ from the partial and complete mating designs defined by Comstock and Robinson (1948, 1952) and Griffing (1956); both partial and complete GDs could include genetic materials obtained from either a complete or a partial mating design. Designs 1 and 2 are complete GDs because the genotypes and generations were the same in the two environments, whereas Designs 3 to 6 are partial GDs because the genotypes or generations differed between the two environments (Table 1).
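The distinction between complete and partial GDs amounts to a simple check: a design is complete when every environment contains exactly the same entries. The helper below is a hypothetical illustration; its name and the encoding of entries as (generation, cross) pairs are assumptions, not taken from Table 1. It compares multisets with `collections.Counter`, so differing entry numbers also count as partial.

```python
from collections import Counter

def classify_design(entries_by_env):
    """Label a design 'complete' if all environments hold identical entries, else 'partial'.

    entries_by_env maps an environment name to a list of entries; each entry is a
    hypothetical (generation, cross) pair such as ("F1", (1, 2)).
    """
    counts = [Counter(entries) for entries in entries_by_env.values()]
    return "complete" if all(c == counts[0] for c in counts) else "partial"

# Hypothetical example: the same parents and F1s in both environments, but F2s
# in environment 1 and F3s in environment 2, so the design is partial.
parents = [("P", (i, i)) for i in range(1, 4)]
f1 = [("F1", (1, 2)), ("F1", (1, 3)), ("F1", (2, 3))]
env1 = parents + f1 + [("F2", c) for _, c in f1]
env2 = parents + f1 + [("F3", c) for _, c in f1]
print(classify_design({"E1": env1, "E2": env2}))   # partial
print(classify_design({"E1": env1, "E2": env1}))   # complete
```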


Table 1. Six genetic designs for the ADAA model.†

 
Simulation
For simplicity, a randomized complete block design with four replicates in each environment was used in this simulation study. Note that the number of genotypes could differ between the two environments (Table 1). The designs were based on a half-diallel cross of seven parents. On the basis of the six designs in Table 1, we generated phenotypic data with each effect vector following a normal distribution, $e_u \sim N(0, \sigma_u^2 I)$. Variance components were estimated by the MINQUE(1) approach (Zhu, 1989). Each GD was analyzed under three sets of input values, with 500 simulations for each set. For each variance component, we calculated the mean estimate ($\hat{\sigma}_u^2$) and the corresponding bias, $\hat{\sigma}_u^2 - \sigma_u^2$, where $\sigma_u^2$ was the preset value. The MSE for each parameter was calculated as MSE = bias$^2$ + var($\hat{\sigma}_u^2$). All Monte Carlo simulation programs and the program for the actual data analysis were written in C++.
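The bookkeeping in this paragraph is easy to reproduce. A half-diallel of seven parents yields $\binom{7}{2} = 21$ crosses (no selfs or reciprocals), and the bias and MSE of a component follow from its replicate estimates. The sketch below assumes var(·) is the variance across replicates with divisor n, which makes MSE = bias² + var identical to the mean squared deviation from the preset value; the function name is hypothetical.

```python
from itertools import combinations
import numpy as np

# Half-diallel of seven parents: each unordered pair i < j is crossed once.
half_diallel = list(combinations(range(1, 8), 2))

def bias_and_mse(estimates, preset):
    """Bias = mean estimate - preset value; MSE = bias^2 + var(estimates)."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - preset
    mse = bias ** 2 + est.var()   # np.var uses divisor n (ddof=0)
    return float(bias), float(mse)

print(len(half_diallel))                    # 21
print(bias_and_mse([18, 22, 20, 24], 20))   # (1.0, 6.0)
```

In the study, `estimates` would hold the 500 MINQUE(1) estimates of one component under one GD and one set of input values.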


    SIMULATION RESULTS
We ran simulations with several different sets of input values and reached similar conclusions. For simplicity, we report only the influence of the six GDs on the estimation of each variance component under the ADAA model for three cases: (i) all genetic effects present, (ii) no A x A epistatic effects, and (iii) no D effects. The simulation results (bias and MSE) for these cases are summarized in Tables 2, 3, and 4, respectively.


Table 2. Simulation results for the ADAA model with all genetic effects present (500 simulations).

 

Table 3. Simulation results for the ADAA model with no A x A epistatic effects (500 simulations).

 

Table 4. Simulation results for the ADAA model with no D effects (500 simulations).

 
The bias measures the difference between the mean estimate and the preset value: a negative bias indicates underestimation and a positive bias indicates overestimation, so the closer the bias is to zero, the less biased the estimate. The MSE combines bias and sampling variance, so it measures the overall accuracy of an estimate; a smaller MSE indicates a more precise estimate. The estimated genetic variance components for A, A x E, A x A epistatic, and A x A x E effects were unbiased under all six designs (Tables 2, 3, and 4). The variance components of D and D x E effects could be estimated, but with large bias under Designs 3 and 4, whereas the estimates were almost unbiased under Designs 1, 2, 5, and 6. As expected, Designs 5 and 6 gave mean estimates similar to those of the complete Designs 1 and 2, with slightly larger MSE. Designs 3 and 4 gave biased estimates of the D and D x E variances when A x A epistatic effects were present.

Comparing Tables 2, 3, and 4, the presence or absence of D effects or A x A effects did not affect the bias of any variance component under GDs 1, 2, 5, and 6, indicating that variance component estimation under these four GDs is robust across the cases. Under GDs 3 and 4, the variance components for A, A x A, A x E, A x A x E, and residual effects were estimated without bias whether D or A x A effects were absent or present (Tables 3 and 4), indicating that these estimates were not influenced by the absence or presence of D or A x A effects. With all genetic effects present, the D variance was generally underestimated and the D x E variance generally overestimated under GDs 3 and 4 (Table 2), indicating that these two GDs give biased estimates of D and D x E variances when all genetic effects exist. When there were no A x A epistatic effects, the D and D x E variances were estimated with slight bias under GD 3, whereas under GD 4 the D variance was estimated with a slight bias (–4.06) and the D x E variance with a large bias (Table 3). When there were no D effects, the D variance was estimated without bias but with a large MSE under both GDs 3 and 4, while the D x E variance was overestimated under these two GDs (Table 4).

In summary, the partial GDs 5 and 6 provided estimates of variance components that were similar to, and as robust as, those of the complete GDs 1 and 2 in two environments, with slightly larger MSE. This suggests that partial GD 5 or 6 could be a good choice when not enough seeds (or individuals) of F1 and F2, or F2 and F3, are available. The choice of GD did not appear to have a large influence on the estimation of A (including A x E) and A x A (including A x A x E) variances, although the corresponding MSE could vary. However, some partial GDs, such as GDs 3 and 4, had a large impact on the estimation of D and D x E variances in some situations, especially when D effects exist.

Actual Data Example
Ten upland cotton lines, 20 F1 hybrids among them, and the corresponding F2 populations were planted at Zhejiang Agricultural University in 1992 and 1993 (Wu et al., 1995). The mating design was North Carolina Design II: five parents used as females were crossed with the other five (male) parents to produce F1 seeds in 1991 and 1992 at the same university. Only 20 of the 25 possible F1 hybrids had enough seeds for planting, so the genetic mating design was partial. F1 seeds were sent to Hainan Province, China, to produce F2 seeds in the winter of 1991. The experiment was a randomized complete block design with three replicates each year. Lint percentage (lint weight divided by seed cotton weight) was used as the example trait. From this data set, we organized seven data sets (Table 5): Data Set 1 was equivalent to GD 1 in Table 1, Data Sets 2 and 3 were equivalent to GD 3, and Data Sets 4 to 7 were equivalent to GD 5. These data sets were used to estimate the variance components for lint percentage under the ADAA model. The standard error of each parameter was calculated by the jackknife resampling method, removing each replicate within each environment in turn, and an approximate one-tailed t test was used to test the significance of each variance component (Miller, 1974). With three replicates in each of two environments, there were six jackknife samples, so df = 5. The estimated variance components for this trait are summarized in Table 6. The estimated residual variance was consistent across the seven data sets, and the additive variance for lint percentage was significant in all data sets. In general, the results from Data Sets 4, 5, 6, and 7 were closer to those from the full, complete Data Set 1 than were the results from Data Sets 2 and 3, in agreement with the simulation results of this study.


Table 5. Seven combinations from a 2-yr cotton data set (Wu et al., 1995).

 

Table 6. Estimated variance components for lint percentage using ADAA models with Data Sets 1 to 7.

 

    DISCUSSION
The partial GDs evaluated in this study arise from differences in genotypes, generations, or entry numbers across the environments of a multiple-environment experiment. Our definition of a partial GD therefore differs from that of the partial mating designs (Griffing, 1956), and our complete GD is a data set that has the same genotypes in all environments, regardless of the genetic mating design. Data from such partial GDs are unbalanced and usually cannot be analyzed appropriately by ANOVA or GLM methods, especially when a complicated genetic model such as the ADAA model is applied. Mixed linear model approaches can estimate variance components even when genotypes vary across environments, so they provide useful information to researchers whose data come from partial GDs. The ADAA genetic model and the MINQUE approach, one of the mixed linear model approaches, have been available for more than two decades (Rao, 1971; Cockerham, 1980); the main purpose of this study is to provide guidance on using the MINQUE approach to handle various types of genetic data sets when some data points are missing.

A good GD should provide reliable results while saving resources. For the ADAA genetic model, our simulation results showed that the partial GDs 5 and 6 provided estimates of variance components similar to those of the complete GDs 1 and 2 in two environments, with slightly larger MSE. This suggests that partial GD 5 or 6 could be a good choice when sufficient seeds (or progenies) of F1 and selfed F2, or of selfed F2 and F3, are difficult to obtain. Partial GDs 3 and 4 had a large impact on the estimation of D and D x E variances but little impact on A and A x A variance components; these two GDs are therefore not recommended for general use. The results from the actual data example agreed well with the simulation results. This simulation study therefore provides useful guidance for conducting a genetic experiment when a complete GD is difficult to run in practice. Our results also suggest that mixed linear model approaches could be applied with partial GDs to more complicated genetic models, such as plant seed models (Zhu and Weir, 1994a, 1994b).

Although the mating design in our example was partial (with five missing crosses), we found similar results when we carried out 500 simulations using the complete and partial GDs listed in Table 1 (data not presented). Thus, the efficiency of estimating variance components in the ADAA model appeared to depend more on the GD (partial or complete) than on the mating design. In the present simulation study, we chose 20 as the preset value for the A and residual variances, and 20 or 0 as the preset value for the D and A x A epistatic variances. We conducted more simulations than are reported here, including different inputs for G x E interaction effects and A x A interactions, and reached similar conclusions (data not shown). Further research could use other preset values or investigate more complicated cases, such as missing crosses and/or missing plots. Our computer program can evaluate the reliability of a specific GD (not a mating design) before a real data set is analyzed, and it can also be used to compare several potential GDs and choose an appropriate one. The simulation program is available from the authors on request.

In theory, increasing the number of parents leads to higher statistical power and more robust estimation. The number of parents in our simulation study was small compared with other studies (Tang et al., 1993; Wu et al., 1995; McCarty et al., 2004), yet it still provided useful information on how to conduct experiments with various types of GDs when genetic resources such as seed supply are limited.

In this study, we focused on the ADAA genetic model. Other epistatic effects, such as A x D and D x D effects, may also be important for some quantitative traits. Within the mixed linear model approach, the ADAA genetic model can be extended to include these effects; however, which GDs are appropriate for estimating the corresponding variance components remains unknown and needs to be evaluated in another simulation study.

Estimating the standard error of each parameter is important for significance testing. Two resampling methods, the jackknife and the bootstrap (Efron, 1982; Davison and Hinkley, 1997), can be used to estimate standard errors; in this study, we used the jackknife. The jackknife removes one or several measurements at a time, for example, one genotype within each environment, or one replication (or block) within each environment; the latter is called the group-based (here, block-based) jackknife. The mean of the jackknife values can then be tested with an approximate t test. However, the number of cells (genotypes in a replication or block) can differ across environments (see GDs 5 and 6 in Table 1, or Data Sets 4–7 in Table 5). We chose the block-based jackknife because (i) it reduces the computational burden while retaining the maximum number of genotypes in the data set after one replication within an environment is removed, and (ii) it keeps the same degrees of freedom for the t tests.
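The block-based jackknife can be sketched generically: delete one (environment, replicate) group at a time, re-estimate, convert the leave-one-out estimates to pseudo-values, and compare their mean with its standard error. The sketch below uses Tukey's standard pseudo-value form, which we assume is close to what the authors used. `estimate_fn` stands in for a full MINQUE fit, and the data, the function names, and the one-tailed 5% critical value t(0.95, 5) ≈ 2.015 are illustrative assumptions.

```python
import numpy as np

T_CRIT_ONE_TAILED_05_DF5 = 2.015   # t(0.95, df = 5)

def group_jackknife(estimate_fn, records, groups):
    """Delete-one-group jackknife returning (mean, standard error, t statistic, df).

    estimate_fn : maps a list of records to a scalar estimate (e.g. a variance component)
    groups      : one label per record, e.g. (environment, replicate)
    """
    labels = sorted(set(groups))
    g = len(labels)
    full = estimate_fn(records)
    pseudo = []
    for lab in labels:
        kept = [r for r, k in zip(records, groups) if k != lab]   # drop one block
        pseudo.append(g * full - (g - 1) * estimate_fn(kept))     # pseudo-value
    pseudo = np.array(pseudo)
    mean = pseudo.mean()
    se = pseudo.std(ddof=1) / np.sqrt(g)
    return mean, se, mean / se, g - 1

def mean_value(recs):
    """Placeholder estimator: mean of the observed values."""
    return float(np.mean([v for _, _, v in recs]))

# Hypothetical layout: 2 environments x 3 replicates -> 6 groups, df = 5.
rng = np.random.default_rng(3)
records = [(h, r, 35.0 + rng.normal()) for h in range(2) for r in range(3) for _ in range(20)]
groups = [(h, r) for h, r, _ in records]
m, se, t, df = group_jackknife(mean_value, records, groups)
significant = df == 5 and t > T_CRIT_ONE_TAILED_05_DF5
```

Deleting whole blocks rather than single genotypes keeps the number of jackknife samples fixed at (environments x replicates), so the t test has the same df even when the number of genotypes per block differs between environments.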


    REFERENCES