a CIMMYT, Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
b School of Land and Food Sciences, The Univ. of Queensland, Brisbane, Qld 4072, Australia
c Pioneer Hi-Bred International Inc., 7250 N.W. 62nd Avenue, PO Box 552, Johnston, IA 50131, USA
* Corresponding author (jkwang{at}cgiar.org)
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Quantitative genetic theory generally provides much of the framework for the design and analysis of selection methods used within breeding programs. However, various assumptions are made in quantitative genetics to render theories mathematically or statistically tractable (Falconer, 1989; Comstock, 1996; Kearsey and Pooni, 1996). Some of these assumptions can be easily tested or satisfied by certain experimental designs; others, such as the assumptions of no linkage, no multiple alleles, and no genotype by environment interaction, can seldom, if ever, be met. Other assumptions, such as the presence or absence of epistasis and pleiotropy, are statistically difficult to define and test (Holland, 2001). Computer simulation provides an opportunity to lessen the impact of these assumptions by accommodating these factors, thereby improving the validity of genetic models for use in plant breeding (Kempthorne, 1988; Comstock, 1996).
Simulation, using relatively simple genetic models, has been used for many special studies in plant breeding (Casali and Tigchelaar, 1975; Reddy and Comstock, 1976; van Oeveren and Stam, 1992; van Berloo and Stam, 1998; Frisch and Melchinger, 2001). Nevertheless, a tool capable of simulating the performance of breeding and selection strategies for a continuum of genetic models ranging from simple to complex, embedded within an existing practical breeding program, has not been available until recently.
QU-GENE, a simulation platform for quantitative analysis of genetic models, provides the opportunity to develop a general simulation program for actual breeding programs through its two-stage architecture (Podlich and Cooper, 1998). The first stage involves QU-GENE as the central engine, the role of which is to define the genotype and environment system and generate the starting population of individuals and the reference population to estimate genetic variances and error variances. In the second stage, external application modules are developed and linked to the QU-GENE engine to manipulate, investigate, and analyze the starting population of individuals according to crossing and selection approaches set by the user within the genotype and environment system defined by the engine.
QuCim is such a QU-GENE application module and was specifically developed to simulate the wheat breeding programs in the International Maize and Wheat Improvement Center (CIMMYT), gaining its name from the contraction of these two names (software available; refer inquiry to the senior author). However, the breeding strategies defined in QuCim represent the operations of most breeding programs for self-pollinated crops, and hence in principle have wide potential application. The breeding methods that can be simulated by QuCim are mass selection, pedigree system (including single seed descent), bulk population system, backcross breeding, top cross (or three-way cross) breeding, doubled haploid breeding, marker-assisted selection, and many combinations and modifications of these methods. Simulation experiments can therefore be designed to compare the breeding efficiencies of different selection strategies, or various modifications within a selection strategy, for any self-pollinating crop or line-development breeding program, including that for inbred lines in cross-pollinated crops such as corn (Zea mays L.). QuCim has been used to compare two breeding strategies common in CIMMYT's wheat breeding program (Wang et al., 2003a). The objectives of this paper are (i) to explain how a complex crossing and selection strategy is defined in QuCim and (ii) to investigate effects on selection of dominance and epistasis in a breeding program developing inbred lines.
| MATERIALS AND METHODS |
|---|
Number of Generations in Selected Bulk and Number of Selection Rounds in Each Generation
In the breeding program in Fig. 2, the best advanced lines developed from the F10 generation will be returned to the crossing block to be used for new crosses; that is to say, a new breeding cycle starts after F10 leaf rust screening at El Batan. Therefore, the number of generations in one breeding cycle is 10 for the Selected Bulk breeding strategy. The crossing block (viewed as F0) and the 10 generations must first be defined in QuCim. The parameters to define a generation consist of the number of selection rounds in the generation, an indicator for seed source (explained later), and the planting and selection details for each selection round (Fig. 1; Tables 1 and 2). Most generations in this breeding program have just one selection round, e.g., F1 to F6, while some generations have more than one selection round since they are grown simultaneously at different sites or under different conditions, e.g., F7, F8, and F9.
Seed Propagation Type for Each Selection Round
The seed propagation type describes how the selected plants in a retained family from the previous selection round or generation are propagated to generate the seed for the current selection round or generation. There are seven options for seed propagation, presented here in the order of increasing genetic diversity (the F1 excluded): (i) clone (asexual reproduction), (ii) DH (doubled haploid), (iii) self (self-pollination), (iv) backcross (backcrossed to one of the two parents), (v) topcross (crossed to a third parent, also known as three-way cross), (vi) random (random mating among the selected plants in a family), and (vii) noself (random mating but self-pollination is eliminated). The seed for the F1 is derived from crossing among the parents in the initial population (or crossing block). QuCim randomly determines the female and the male parents for each cross from a defined initial population, or alternatively, one may select some preferred parents from the crossing block. The selection criteria used to identify such preferred parents (grouped here as the male and female master lists) can be defined in terms of among-family and within-family selection descriptors (see below for details) within the crossing block (referred to as the F0 generation).
By using the parameter of seed propagation type, most if not all methods of seed propagation in self-pollinated crops can be simulated by QuCim.
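The first three propagation types can be illustrated with a minimal sketch. It is not QuCim code: the function names, the genotype representation (one allele pair per locus), and the assumption of independently segregating loci are introduced here for illustration only.

```python
import random

def gamete(plant, rng):
    """Draw one allele per locus, assuming independently segregating loci."""
    return [rng.choice(locus) for locus in plant]

def propagate(plant, method, rng):
    """Return one offspring genotype of `plant` (hypothetical helper).

    `plant` is a list of (allele, allele) tuples, one tuple per locus.
    Only three of the seven propagation types are sketched.
    """
    if method == "clone":   # asexual: offspring identical to the parent
        return list(plant)
    if method == "DH":      # doubled haploid: one gamete, chromosomes doubled
        return [(allele, allele) for allele in gamete(plant, rng)]
    if method == "self":    # self-pollination: two gametes from the same plant
        return list(zip(gamete(plant, rng), gamete(plant, rng)))
    raise ValueError(f"unsupported propagation type: {method}")

rng = random.Random(1)
f1 = [("A", "a"), ("B", "b"), ("C", "c")]   # fully heterozygous F1 plant
dh_line = propagate(f1, "DH", rng)          # homozygous at every locus
selfed = propagate(f1, "self", rng)         # an F2 plant, may still segregate
```

The ordering by genetic diversity is visible here: a clone reproduces the parent exactly, a DH line is fixed after one step, and selfing leaves some loci heterozygous.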
Generation Advance Method for Each Selection Round
The generation advance method describes how the selected plants within a family are harvested. There are two options for this parameter: pedigree (the selected plants within a family are harvested individually, and therefore each selected plant will result in a distinct family in the next generation) and bulk (the selected plants in a family are harvested in bulk, resulting in just one family in the next generation). This parameter and the seed propagation type allow QuCim to simulate not only the traditional breeding methods, such as pedigree breeding and bulk population breeding, but also many combinations of different breeding methods (e.g., pedigree selection until the F4 and then doubled haploid production on selected F4 plants). The bulk generation advance method will not change the number of families in the following generation if no among-family selection is applied in the current generation, while the pedigree method increases the number of families rapidly if there is weak among-family selection intensity, and several plants are selected within each retained family. For a generation with more than one selection round, the generation advance method for the first selection round can be either pedigree or bulk. The subsequent selection rounds are used to determine which families derived from the first selection round will be advanced to the next generation. In the majority of cases, bulk generation advance is the preferred option for the subsequent selection rounds.
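The effect of the two generation advance methods on family number can be sketched as follows. The function and parameter names are hypothetical, and rounding the retained family count to the nearest integer is an assumption made for this illustration.

```python
def families_next_generation(n_families, retained_fraction, plants_selected, method):
    """Count the families entering the next generation (illustrative only).

    n_families        : families grown in the current selection round
    retained_fraction : proportion of families kept by among-family selection
    plants_selected   : plants selected within each retained family
    method            : "pedigree" or "bulk"
    """
    retained = round(n_families * retained_fraction)
    if method == "pedigree":
        # each selected plant is harvested individually and founds a family
        return retained * plants_selected
    if method == "bulk":
        # selected plants in a family are harvested together as one family
        return retained
    raise ValueError(f"unknown generation advance method: {method}")

# Bulk without among-family selection keeps the family number unchanged.
bulk_count = families_next_generation(100, 1.0, 5, "bulk")
# Pedigree with weak among-family selection multiplies the family number.
pedigree_count = families_next_generation(100, 0.9, 5, "pedigree")
```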
Field Experiment Design for Each Selection Round
The parameters used to define the virtual field experiment design in each selection round include the number of replications for each family, the number of individual plants in each replication, the number of test locations, and the environment type for each test location (Fig. 1 and Table 1). The concept of an environment type within the target population of environments was defined by Podlich and Cooper (1998) to distinguish sets of environmental conditions (e.g., CIMMYT's megaenvironments; Rajaram, 1999) that conditioned different genetic effects, and thus different genetic requirements for adaptation. Each environment type defined in the genotype and environment system has its own gene action and gene interaction, which provides the framework for defining the genotype x environment interaction. Therefore, by defining the target population of environments as a mixture of environment types, genotype x environment interactions are defined as a component of the genetic architecture of a trait (Cooper and Podlich, 2002). An integer number represents the environment type for each test location, and whenever possible, it should be consistent with known features that are defined for the target population of environments of the genotype and environment system. For those locations where the environment types are little understood, QuCim will randomly assign environment types to them with a likelihood based on the frequencies of environment types in the target population of environments. Additional examples demonstrating the application of these procedures to define genotype x environment interactions were given by Podlich et al. (1999) and Cooper et al. (2002).
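The random assignment of environment types to poorly characterized locations might look like the sketch below. The function name and the frequencies are hypothetical; only the idea of sampling in proportion to the frequencies in the target population of environments comes from the text.

```python
import random

def assign_environment_types(known_types, tpe_frequencies, rng):
    """Fill in missing environment types for a list of test locations.

    known_types     : one entry per location, an integer environment type,
                      or None where the environment type is not understood
    tpe_frequencies : {environment type: frequency in the target population}
    """
    types = list(tpe_frequencies)
    weights = [tpe_frequencies[t] for t in types]
    return [
        t if t is not None else rng.choices(types, weights=weights, k=1)[0]
        for t in known_types
    ]

rng = random.Random(7)
frequencies = {1: 0.5, 2: 0.3, 3: 0.2}   # hypothetical frequencies
locations = [1, None, 3, None]           # two locations of unknown type
assigned = assign_environment_types(locations, frequencies, rng)
```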
Among-Family Selection and Within-Family Selection for Each Selection Round
Ten traits have been included as relevant (van Ginkel et al., 2002) for the selection process in the breeding program described in Fig. 2. Among-family selection and within-family selection are distinct processes in a breeding strategy. However, the definition of these two types of selection is essentially the same: the number of traits to be selected is followed by the definition of each trait (Fig. 1).
Apart from the trait code (defined in the genotype and environment system) there are two parameters that define a trait used in selection: selected proportion and selection mode. For among-family selection, the selected proportion is the percentage of families to be retained; for within-family selection, it is the percentage of individual plants to be selected in each retained family. There are four options for trait selection mode: (i) top (the individuals or families with highest phenotypic values for the trait of interest will be selected, e.g., yield, tillering, grains per spike, and kernel weight), (ii) bottom (the individuals or families with the lowest phenotypic values will be selected, e.g., lodging, stem rust, leaf rust, and stripe rust), (iii) middle (individuals or families with medium trait phenotypic values will be selected, e.g., height and heading), and (iv) random (individuals or families will be randomly selected). Independent culling is used if multiple traits are considered for within-family or among-family selection. If there is no among-family or within-family selection for a specific selection round, the number of selected traits is noted as 0 (Table 2). In generations F1 to F6 in the Selected Bulk method, each family is derived from one distinct cross since the method of bulk generation advance is applied from the F2 onwards. Among-family selection from F1 to F6 is, in fact, among cross selection. The traits for both among-family and within-family selections can be the same or different, as is the case for selected proportions (Table 2). The traits for selection may also differ from generation to generation, as may the selected proportions for traits.
Genetic Models Used to Investigate the Effects of Dominance and Epistasis on Selection
Seven agronomic traits and three rust resistances have been used in the simulation of the Selected Bulk breeding method. The gene number and genetic values were derived from discussions with breeders and from analyses of past unpublished experiments. In total, we postulated that 59 independently segregating genes control these traits (Table 3). The genetic effects for traits other than yield were considered fixed. Pleiotropic effects were included to account for trait correlations, and they were also considered fixed. Two kinds of pleiotropic effects were included (Fig. 3), although more complicated pleiotropic interactions can also be defined within the QU-GENE engine. The first kind is positive pleiotropy, such as the pleiotropic effects on lodging from genes for grains per spike (Fig. 3-a). The second kind is negative pleiotropy, such as the pleiotropic effects on kernel weight from genes for grains per spike (Fig. 3-b). As shown in Table 3, at Cd. Obregon the three lodging genes, the five stem rust genes, and the five leaf rust genes have some degree of negative effect on yield, and the five kernel weight genes have a positive pleiotropic effect. Stem rust, leaf rust, heading, tillering, and grains per spike genes all have a negative pleiotropic effect on kernel weight (Table 2 in Wang et al., 2003a). Stripe rust rarely occurs at Cd. Obregon, so there is no selection for stripe rust when the nursery is grown there (Table 2) and the genetic effects of stripe rust genes are considered to be zero in this environment.
(AA+aa)/2, but is between AA and aa], a combination of partial, complete and overdominance (AD2, the genetic values of AA, Aa, and aa are independent), and digenic interaction (ADE). Following the procedures described by Cooper and Podlich (2002), the genetic effects of 20 yield genes in each environment type were sampled from the uniform distribution before the simulation was run. These sampled gene effects are approximations of the distribution of real gene effects (Cooper et al., 2002). Some genes have relatively large effects and can be viewed as major genes, and others have relatively small effects and can be viewed as minor genes (Fig. 4). For Model AD0 (Fig. 4-1a and -1b), gene 19 has the largest additive effect in the Cd. Obregon environment type and the first allele is the favorable allele (AA = 0.91, Aa = 0.47, and aa = 0.03). It explains more than 20% of genetic variance (variation from pleiotropic genes excluded) in the Cd. Obregon environment in the reference population where all gene frequencies were set at 0.5, but only 5 to 8% in the other two environment types. Gene 19 can be viewed as a major gene at Cd. Obregon, but probably not at Toluca and El Batan. Gene 8 has the second largest additive effect in the Cd. Obregon environment type, with the first allele also being favorable (AA = 0.83, Aa = 0.51, and aa = 0.18). This gene explains 12% of the total genetic variation and therefore functions as a major gene in the Cd. Obregon environment. However, it explains only a very small amount of genetic variation (less than 1%) in the Toluca and El Batan environment types and is considered a minor gene in these environments. For Model AD1 (Fig. 4-2a and -2b), genes 19, 9, 2, and 3 each contribute more than 12% to genetic variance, and are therefore major genes in the Cd. Obregon environment. Genes 19 and 3 have a positive dominance effect (a = 0.34, d = 0.22 and d/a = 0.64 for gene 19, and a = 0.23, d = 0.01 and d/a = 0.02 for gene 3, where a and d are the additive and dominance gene effects, and d/a is the degree of dominance). Genes 9 and 2 have a negative dominance effect (a = 0.26, d = 0.22 and d/a = 0.85 for gene 9, and a = 0.28, d = 0.13 and d/a = 0.45 for gene 2). For Model AD2 (Fig. 4-3a and -3b), genes 11, 7, 15, and 17 each contribute more than 10% to the genetic variance in the Cd. Obregon environment. For Model ADE (Fig. 4-4a and -4b), the interaction between genes 1 and 2 resulted in two local peaks and one global peak in the distribution of genotypic values.
The importance of each epistasis network can also be represented by the proportion of genetic variance explained by the network. Among the 10 epistasis networks, the interaction between genes 17 and 18 explains the largest proportion of genetic variance (Fig. 4-4b) in the Cd. Obregon environment type.
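The relation between the three genotypic values of a gene and its additive and dominance effects can be checked with a short sketch. The function names are hypothetical, and the single-gene variance expressions are the standard ones for a random-mating population (Falconer, 1989); whether the QU-GENE reference population is evaluated in exactly this way is an assumption.

```python
def gene_effects(g_AA, g_Aa, g_aa):
    """Additive effect a and dominance effect d from genotypic values."""
    midpoint = (g_AA + g_aa) / 2
    a = (g_AA - g_aa) / 2     # half the difference between the homozygotes
    d = g_Aa - midpoint       # deviation of the heterozygote from the midpoint
    return a, d

def single_gene_variances(a, d, p=0.5):
    """Additive and dominance variance of one gene under random mating.

    VA = 2pq[a + d(q - p)]^2 and VD = (2pqd)^2, with p the frequency of
    the first allele and q = 1 - p.
    """
    q = 1 - p
    alpha = a + d * (q - p)   # average effect of a gene substitution
    return 2 * p * q * alpha ** 2, (2 * p * q * d) ** 2

# Gene 19 in Model AD0 at Cd. Obregon (values quoted in the text).
a19, d19 = gene_effects(0.91, 0.47, 0.03)
va19, vd19 = single_gene_variances(a19, d19, p=0.5)
# Gene 8 in Model AD0 at Cd. Obregon.
a8, d8 = gene_effects(0.83, 0.51, 0.18)
```

With the quoted values, gene 19 has a = 0.44 and d = 0, as expected for a purely additive gene, while the small nonzero d for gene 8 reflects rounding of the reported genotypic values.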
| RESULTS AND DISCUSSION |
|---|
As yields in the various environments in this particular genotype and environment system are positively correlated, selection for yield in one environment may also improve yield in the other environments. For example, the genetic correlation of yield at Cd. Obregon with that at Toluca, in the initial population when all gene frequencies were set at 0.5, was r = 0.56 for Model AD0, 0.61 for Model AD1, 0.56 for Model AD2, and 0.35 for Model ADE. The genetic correlation coefficient between two environment types j1 and j2 for a specific trait is calculated as the genetic covariance between the two environment types divided by the square root of the product of their genetic variances:

$$ r_{j_1 j_2} = \frac{\mathrm{Cov}_G(j_1, j_2)}{\sqrt{V_G(j_1)\, V_G(j_2)}} $$
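As a numerical illustration, the genetic correlation between two environment types can be computed from the genotypic values of the same set of genotypes in each environment. The sketch below uses the ordinary product-moment definition; the function name and the genotypic values are hypothetical.

```python
from statistics import mean

def genetic_correlation(g_env1, g_env2):
    """Correlation of genotypic values of the same genotypes in two
    environment types (covariance over the root of the variance product)."""
    m1, m2 = mean(g_env1), mean(g_env2)
    cov = sum((x - m1) * (y - m2) for x, y in zip(g_env1, g_env2))
    var1 = sum((x - m1) ** 2 for x in g_env1)
    var2 = sum((y - m2) ** 2 for y in g_env2)
    return cov / (var1 * var2) ** 0.5

# Hypothetical genotypic values of five genotypes in two environment types.
env_a = [4.2, 5.1, 6.3, 5.8, 4.9]
env_b = [3.9, 4.4, 5.0, 5.6, 4.1]
r_ab = genetic_correlation(env_a, env_b)
```

A value of r well below 1 indicates that the genotypes rank differently in the two environment types, which is how genotype x environment interaction appears in this statistic.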
Because of different selection modes (e.g., bottom for lodging and stem rust, middle for heading, and top for kernel weight; Table 2), the selection responses of other traits may be distinct. However, similar trends can be identified regardless of the yield genetic models used. In Fig. 5-3a, -3b, and -3c, only the results for a purely additive yield gene model (AD0) are presented. In the Selected Bulk, the individuals and families with the lowest lodging severity (percentage) were selected in each generation. However, because of the related pleiotropic effect of genes for grains per spike and kernel weight, lodging in the selected population hardly decreases, no matter which initial population was used. In contrast, stem rust resistance could be improved as demonstrated by decreasing severity values over cycles, as there was no counteracting pleiotropic effect from other genes. Kernel weight increases with selection, as the individuals and families with highest kernel weights were selected. However, it cannot reach the level of the most desired target genotype because of the negative pleiotropic effects from the other two yield components, tillering and grains per spike, that when actively selected for will reduce kernel weight.
Gene Frequency
The change of gene frequency after one cycle of selection is proportional to the relative size of the additive gene effect (Falconer, 1989). This must then also be true for several cycles of selection and for various genetic models without epistasis. In Model AD0, only additive gene effects influence yield, and hence changes in yield must coincide directly with changes in gene frequency. In response to selection, genes with large additive effects change their gene frequencies faster, while those with small additive effects change more slowly. For partial dominance yield genes (AD1), gene 19 has the largest additive gene effect, followed by genes 9, 2, and 3. The rate of change in allele frequency after selection followed this order when all gene frequencies in the initial population were set at 0.5 (Fig. 6-1b). Gene 12 is an exception when gene frequencies in the initial population are set at 0.1. It has a smaller additive effect than gene 9, but responds with a faster increase in first-allele frequency. The sign of the additive gene effect also indicates the direction of change in gene frequency. For example, genes 6 and 8 in Model AD1 and genes 8 and 17 in Model AD2 have negative additive effects for the nominated first allele. In accordance with these negative additive values, the first-allele frequencies for these loci decrease compared to their frequency in the initial population (Fig. 6-1a, -1b, -1c, -2a, -2b, and -2c).
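The dependence of gene frequency change on the size and sign of the additive effect can be illustrated with the usual first-order approximation for mass selection in a random-mating population, in which the change in first-allele frequency per cycle is roughly i·α·p·q/σP. This is a textbook approximation used here as a sketch, not the procedure QuCim follows; the selection intensity, the phenotypic standard deviation, and the function names are hypothetical.

```python
def delta_p(p, a, d=0.0, intensity=1.0, sigma_p=1.0):
    """Approximate one-cycle change in the frequency p of the first allele."""
    q = 1.0 - p
    alpha = a + d * (q - p)   # average effect of a gene substitution
    return intensity * alpha * p * q / sigma_p

def track(p0, a, cycles, **kwargs):
    """Iterate the approximation, keeping the frequency within [0, 1]."""
    p, path = p0, [p0]
    for _ in range(cycles):
        p = min(1.0, max(0.0, p + delta_p(p, a, **kwargs)))
        path.append(p)
    return path

large = track(0.5, 0.34, cycles=5)      # gene with a large additive effect
small = track(0.5, 0.23, cycles=5)      # gene with a smaller additive effect
negative = track(0.5, -0.20, cycles=5)  # first allele is unfavorable
```

In this sketch the gene with the larger additive effect stays ahead in every cycle, and a negative additive effect drives the first-allele frequency below its starting value.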
Consequently, a quarter of the additive x additive variance is contained within the estimate of additive variance when epistasis is present, and all three kinds of interaction variance are included in the estimate of dominance variance. Therefore the actual interaction variance (VI) is always underestimated. As shown in Fig. 7-1, -2, and -3, the pure additive variance decreases to a low level after 10 cycles of selection. We may therefore suppose that for Model ADE, the additive x additive variance is the largest portion of the additive variance estimate after a few cycles of selection. The higher level of additive variance with epistasis (Fig. 7-4a, -4b, and -4c) means that selection can still be effective; it just takes more cycles to fix the additive x additive interaction. Figure 5-1b indicates that yield under the Model ADE can be further improved with more cycles of selection.
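The bias described above can be reproduced numerically under the standard expectations of the sib covariances for non-inbred parents and unlinked genes: Cov(HS) = VA/4 + VAA/16 and Cov(FS) = VA/2 + VD/4 + VAA/4 + VAD/8 + VDD/16. That the estimators below match the ones used in the study is an assumption, and the variance values are hypothetical.

```python
def sib_covariances(va, vd, vaa=0.0, vad=0.0, vdd=0.0):
    """Expected half-sib and full-sib covariances (non-inbred parents)."""
    cov_hs = va / 4 + vaa / 16
    cov_fs = va / 2 + vd / 4 + vaa / 4 + vad / 8 + vdd / 16
    return cov_hs, cov_fs

def estimate_variances(cov_hs, cov_fs):
    """Estimators that ignore epistasis."""
    va_hat = 4 * cov_hs
    vd_hat = 4 * (cov_fs - 2 * cov_hs)
    return va_hat, vd_hat

# Hypothetical true components: VA = 1.0, VD = 0.4, VAA = 0.8.
cov_hs, cov_fs = sib_covariances(va=1.0, vd=0.4, vaa=0.8)
va_hat, vd_hat = estimate_variances(cov_hs, cov_fs)
# va_hat = VA + VAA/4 and vd_hat = VD + VAA/2 (plus VAD/2 + VDD/4 if present)
```

With these values the additive estimate absorbs a quarter of the additive x additive variance (1.2 instead of 1.0) and the dominance estimate absorbs half of it (0.8 instead of 0.4), so whatever is attributed to interaction is too small.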
The Reality of Genotype and Environment Systems in Simulation
As in field breeding, QuCim conducts within-family selection from individual phenotypic values in each family and among-family selection from family means. The genotypic value of an individual is calculated from the definition of gene actions in the genotype and environment system. The phenotypic value and the family mean are estimated from the genotypic value and its associated error (random environmental deviation). A sensible definition of the genotype and environment system is thus essential to any such simulation, since it determines the phenotypic value of a genotype and then the phenotypic mean of a population to which selection is applied. However, given the current state of our knowledge of gene-to-phenotype relationships for complex traits, it is difficult to comprehensively define a real genotype and environment system. It is therefore not possible to ensure that the genotype and environment systems used in this simulation experiment match the biophysical systems within which CIMMYT's wheat breeding program operates.
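The step from genotypic value to phenotypic value and family mean can be sketched as below. Drawing the random environmental deviation from a normal distribution with a fixed standard deviation is an assumption of this illustration, as are the function names and the numbers.

```python
import random
from statistics import mean

def phenotype(genotypic_value, error_sd, rng):
    """Phenotypic value = genotypic value + random environmental deviation."""
    return genotypic_value + rng.gauss(0.0, error_sd)

def family_mean(genotypic_values, error_sd, n_reps, rng):
    """Mean phenotype of a family over its plants and replications."""
    return mean(
        phenotype(g, error_sd, rng)
        for _ in range(n_reps)
        for g in genotypic_values
    )

rng = random.Random(42)
family = [5.2, 5.6, 4.9, 5.4]   # hypothetical genotypic values of four plants
single_plant = phenotype(family[0], error_sd=0.8, rng=rng)
mean_2_reps = family_mean(family, error_sd=0.8, n_reps=2, rng=rng)
```

Because the family mean averages over plants and replications, its error is smaller than that of a single plant, which is why among-family selection on means is less affected by the environmental deviation than within-family selection on individual plants.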
However, a large amount of data from CIMMYT's historical breeding records has been used to define the genotype and environment systems as realistically as possible. The systems used in the current simulation research result in trait correlations, environment correlations, and trait heritabilities similar to those derived from the field data. From a previous simulation experiment (Wang et al., 2003b), we found that linkage (e.g., recombination frequency 0.05) has only a small effect on selection for a large breeding program, in which a large number of crosses have been made. Therefore, linkage was not included in this study.
As far as a specific cross is concerned, linkage does affect the response to selection. Linkage in repulsion delays the response while linkage in coupling favors the response (Allard, 1960). The following example illustrates the effect of linkage on selection for single crosses. Twenty genes are assumed to be evenly distributed on 4 chromosomes, each having 5 genes. The recombination frequency between two neighboring genes is set at 0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.20, 0.30, 0.40, and 0.50 (Fig. 8). Two crosses are made from two pairs of parents with genotypes
![]() |
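How linkage enters gamete formation in such an example can be sketched for 20 genes on 4 chromosomes of 5 genes each. The sketch assumes independent crossovers between neighboring genes (no interference) and independent assortment of chromosomes; the parental haplotypes are placeholders, not the genotypes used in the study, and the function name is hypothetical.

```python
import random

def make_gamete(hap1, hap2, r, genes_per_chrom, rng):
    """Form one gamete from a plant carrying haplotypes hap1 and hap2.

    r is the recombination frequency between two neighboring genes on the
    same chromosome.
    """
    haplotypes = (hap1, hap2)
    gamete = []
    for i in range(len(hap1)):
        if i % genes_per_chrom == 0:
            current = rng.choice((0, 1))   # new chromosome: pick a strand
        elif rng.random() < r:
            current = 1 - current          # crossover between neighbors
        gamete.append(haplotypes[current][i])
    return gamete

rng = random.Random(3)
coupling_p1 = [1] * 20    # placeholder parent with all favorable alleles
coupling_p2 = [0] * 20    # placeholder parent with all unfavorable alleles
tight = make_gamete(coupling_p1, coupling_p2, r=0.0, genes_per_chrom=5, rng=rng)
loose = make_gamete(coupling_p1, coupling_p2, r=0.5, genes_per_chrom=5, rng=rng)
```

With r = 0.00 each chromosome is inherited as an intact parental block, while r = 0.50 makes neighboring genes segregate independently, which corresponds to the two ends of the recombination frequency series in the example.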
| CONCLUSIONS |
|---|
We used QuCim to investigate some of the possible implications of increasing the genetic complexity of traits on the outcomes from the CIMMYT breeding program. This raises the question of how to introduce a range of simple to complex genetic models in simulation experiments. This paper reports a preliminary investigation in which our objective was to compare some specific models relative to the classical additive model (AD0). The models of interest, selected in consultation with the CIMMYT breeders, were to introduce partial dominance (AD1), partial or overdominance (AD2), and digenic epistasis (ADE). Analyses were then conducted to make specific comparisons among more complex models and AD0, to measure impact on genetic advance, gene frequency, and components of genetic variance when we challenge the assumption of additivity.
The simulation results indicate that the fully additive model gives outcomes for genetic advance similar to those of the partial dominance model when all genes have the same frequency of 0.5 in the initial population. When gene frequency is not 0.5, the initial population is located at different points for different models (Fig. 5), making it more complicated to compare the effects of various models. However, genetic systems consisting of pure additive or partial dominance gene effects generally reach the target genotype faster than systems with considerable overdominance and epistasis. The additive gene effect is an appropriate indicator of the change in gene frequency following selection for genetic models without epistatic effects. The gene with the largest additive effect changes its frequency fastest in response to selection. The positive or negative sign of the additive gene effect indicates the direction in which allele frequency will change. When epistasis is absent, the additive variance decreases rapidly following selection. Nevertheless, additive variance remains at a certain level for many breeding cycles when epistatic effects are active. This suggests that it could take a considerably longer time to fix useful additive x additive interaction in an inbred breeding program. This realization may motivate breeders to take other, swifter measures than further selection, such as producing large doubled haploid populations to identify desired recombinants. The variance from partial dominance is small and therefore hard to detect by using the covariance among half sibs and the covariance among full sibs. The variance from overdominance can be properly identified, but it does not change significantly following selection. This is because overdominance cannot be effectively utilized by an inbred breeding program.
| ACKNOWLEDGMENTS |
|---|
Received for publication November 12, 2003.
| REFERENCES |
|---|