a Crop Research Informatics Lab., and Genetic Resources Enhancement Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
b CSIRO Plant Industry, 306 Carmody Rd, St. Lucia, QLD 4067, Australia
c CSIRO Plant Industry, P.O. Box 1600, Canberra, ACT 2601, Australia
d Institute of Crop Science, and The National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing 100081, China
* Corresponding author (wangjk{at}caas.net.cn or j.k.wang{at}cgiar.org).
| ABSTRACT |
|---|
Abbreviations: DHs, doubled haploid lines; MAS, marker-assisted selection; QTL, quantitative trait locus or loci; RIL, recombinant inbred lines; TCF1, F1 generation of the topcross; TCF2, F2 generation of the topcross.
| INTRODUCTION |
|---|
Marker-assisted selection (MAS) may utilize markers that are closely or completely linked with target genes of interest or markers that are associated with quantitative trait loci (QTL) and explain only part of the variance for a trait that may be under complex genetic control. If markers are not completely linked with the target genes, two flanking markers (on either side of the gene or QTL) may still be useful. Although molecular markers may allow more accurate selection in early generations than conventional phenotypic selection, the large number of individuals needed to recover a target homozygote at multiple loci at this stage can make this approach impracticable and/or too expensive. Conversely, screening in later generations often provides little or no advantage over conventional selection techniques (Bonnett et al., 2005). Considerable efficiency gains can be achieved if plant breeders are able to choose the most appropriate crossing (e.g., single cross, backcross, or topcross) and best MAS methods (Lande and Thompson, 1990; Delphin Koudande et al., 2000; Bonnett et al., 2005; Kuchel et al., 2005). Calculation of the distribution of desirable alleles among an initial set of genotypes can considerably assist the breeder.
Under simplified conditions (i.e., gene-based markers where the association between gene and marker is complete), some general recommendations were given by Bonnett et al. (2005). Using population genetic theory and the QU-GENE (developed at the University of Queensland, Australia) application module QuLine (previously called QuCim) (Podlich and Cooper, 1998; Wang et al., 2003; Wang et al., 2005), we have extended this theory to identify principles for design of efficient selection strategies where there is recombination between marker and gene, and where there is repulsion-phase linkage between desirable alleles. Note that we focus on crosses between "generally adapted" parents and therefore do not consider the process of "background" selection (Frisch and Melchinger, 2005), whereby markers are used to both select for target genes and to maximize recovery of the recurrent parent genome.
In this study, population genetic theory was used to establish general rules for the numbers of markers required, the best crossing strategies, and the level of inbreeding to maximize the efficiency of marker implementation where there was no recombination between marker and allele of interest. When the scenario was extended to linked markers, we adopted simulation analysis to develop rules for selection. A topcross among three Australian wheat lines was used to demonstrate the outcomes from the population genetic theory and simulation models, while considering both completely and incompletely linked markers, as well as linkage between target alleles.
| MATERIALS AND METHODS |
|---|
Calculating Minimum Population Size
Where $\alpha$ is the probability of not having at least one target genotype present in the population sampled, and $f$ is the frequency of the genotype to be selected, the minimum population size ($N$) that ensures at least one target genotype is present in the population with a given level of certainty can be calculated as

$$N = \frac{\log \alpha}{\log(1 - f)} \qquad [1]$$

A value of $\alpha = 0.01$ was used. For strategies with multiple selection stages, population sizes were calculated to achieve a cumulative probability of $\alpha = 0.01$ across all selection stages.
Comparing Biparental, Back-, and Topcrosses
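If $n$ loci differ between two parents, with $n_1$ favorable alleles in the first parent (P1) and $n_2$ in the second parent (P2), so that $n = n_1 + n_2$, then the relative proportions of the target genotype in DHs or recombinant inbred lines (RILs) derived from F1, P1BC1 (backcrossed to P1), and P2BC1 (backcrossed to P2) are

$$f_{F1} : f_{P1BC1} : f_{P2BC1} = 2^{n} : 3^{n_1} : 3^{n_2} \qquad [2]$$

These proportions follow from the absolute frequencies $(1/2)^{n}$, $(3/4)^{n_1}(1/4)^{n_2}$, and $(1/4)^{n_1}(3/4)^{n_2}$, respectively.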
If target alleles are dispersed among three parents, i.e., P1, P2, and P3, a topcross (or three-way cross), e.g., (P1 × P2) × P3, is required to combine all alleles. If each parent carries different alleles, the alleles contributed by parents P1 and P2 in the first cross will be present at frequencies of 0.25 following a topcross with P3, and the alleles contributed by P3 will each have a frequency of 0.5. If $n_1$, $n_2$, and $n_3$ are the numbers of target alleles in the three parents, respectively, under the condition of no selection, the expected proportion of individuals with the target genotype in DHs/RILs is

$$f_{TC} = \left(\tfrac{1}{4}\right)^{n_1 + n_2} \left(\tfrac{1}{2}\right)^{n_3} \qquad [3]$$
Minimizing the Total Number of Marker Assays with Sequential Culling
In a population of N individuals to be screened sequentially with markers at n independent loci, and where only those with the target genotype are retained for screening with the next marker, the total number of assays (M) required to identify the target genotype at all loci can be calculated according to the formula
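$$M = N\,(1 + f_1 + f_1 f_2 + \cdots + f_1 f_2 \cdots f_{n-1}) \qquad [4]$$

where $f_i$ is the proportion of individuals carrying the target genotype at the $i$th locus screened.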
![]() | [5] |
The total number of assays is smallest when the markers are applied in order of increasing retained proportion, so that the marker expected to cull the most individuals is used first. Equations [1] to [5] can be used to address the first two scenarios when no gene linkages exist. Simulation is needed for the other scenarios. The analytic expression for the cost of sequential culling ignores the costs of plant/line handling (tagging, leaf sampling, etc.) and DNA extraction, which are fixed by the total sample size and cannot be reduced by sequential culling. If these fixed costs make up a major part of the genotyping expense, the order of markers used in sequential culling may become less important.
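As a numerical check on the sequential-culling argument, the sketch below counts assays marker by marker: all N individuals are assayed with the first marker, and only the retained fraction is carried forward to each later marker. The function name `total_assays` and the retained proportions in the example are hypothetical, chosen only for illustration, and fixed handling and DNA extraction costs are ignored as in the text.

```python
from itertools import permutations


def total_assays(n_individuals, retained):
    """Expected number of assays under sequential culling.

    retained[i] is the (assumed) proportion of individuals kept after
    screening with marker i; only those individuals see marker i + 1.
    """
    total = 0.0
    surviving = float(n_individuals)
    for proportion in retained:
        total += surviving        # one assay per individual still in play
        surviving *= proportion   # cull everything without the target genotype
    return total


if __name__ == "__main__":
    proportions = (0.75, 0.25, 0.5)  # hypothetical values, not from the study
    for order in permutations(proportions):
        print(order, total_assays(1000, order))
```

With these made-up proportions, screening the most selective marker first needs 1375 assays for 1000 individuals, against 2125 when the order is reversed.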
Genetics and Breeding Simulation Tools
QU-GENE is a simulation platform for quantitative analysis of genetic models. The program generates populations of genotypes and provides a library of subroutines to develop simulation modules for real-world breeding programs (Podlich and Cooper, 1998). QuLine is a QU-GENE application module that was specifically developed to simulate breeding programs developing inbred lines (Wang et al., 2003) and has also been used to predict cross performance for quality traits using known gene information (Wang et al., 2005). The software is available to researchers via arrangements with the International Maize and Wheat Improvement Center (CIMMYT) (contact the corresponding author) or The University of Queensland, Australia (contact Dr. Mark Dieters: m.dieters{at}uq.edu.au).
Use of Simulation Modeling to Examine the Strategies to Minimize Population Sizes while Combining Target Alleles
Equations [1] to [5] do not consider genetic linkage between the marker and target allele, or different target alleles. While the equations can be readily extended to accommodate recombination, they become difficult to evaluate algebraically as gene number increases. To illustrate the effect of linkage, we simulated a topcross among three wheat lines: Sunstate (a commercial Australian line), HM14BS (a backcross derivative of the "long coleoptile" trait that utilizes the Rht8 allele for reduced height), and Silverstar+tin (a modified Australian variety that is a source of the tin "reduced-tillering" trait). Genotypic and marker data at the nine polymorphic loci are shown in Table 1. Alleles at seven of the nine loci are independently inherited, while the target Glu-A3 and tin are linked in repulsion on the short arm of chromosome 1A at a distance of 3.8 cM (r = 0.0366) (Spielmeyer and Richards, 2004). Haldane's mapping function was used to transform the mapping distance into recombination frequency.
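The conversion from map distance to recombination frequency quoted above can be reproduced with Haldane's mapping function, r = 0.5(1 - exp(-2d)), where d is the distance in Morgans. The following is a minimal sketch; the function names are illustrative and are not part of QU-GENE or QuLine.

```python
import math


def haldane_recombination(distance_cm):
    """Recombination frequency for a map distance in centimorgans (Haldane)."""
    d = distance_cm / 100.0  # centimorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_distance(r):
    """Inverse of Haldane's function: map distance in cM for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    # 3.8 cM between Glu-A3 and tin, as given in the text
    print(round(haldane_recombination(3.8), 4))
```

For 3.8 cM this gives a recombination frequency that rounds to 0.0366, the value used in the text.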
| RESULTS AND DISCUSSION |
|---|
$\alpha = 0.01$). If selection is made among homozygous lines (i.e., DHs or RILs) from the same cross, the frequency of the target genotype is $0.5^{5} = 0.03125$, with a minimum population size of only 146 ($\alpha = 0.01$); that is, the target genotype is more readily recovered from a smaller population if selection is delayed until greater homozygosity has been achieved. For more segregating loci, population sizes quickly increase even in DH or RIL populations. For example, in a biparental population with eight unlinked segregating loci, the frequency of the target genotype in a homozygous population is $0.5^{8} = 0.0039$, and the minimum population size is 1177. In these instances, Bonnett et al. (2005) proposed a two-stage selection strategy. The first stage is "F2 enrichment," where F2 individuals carrying the entire set of target alleles in either homozygous or heterozygous form are selected. F2 enrichment takes advantage of the high expected frequency of carriers (either homozygous or heterozygous) at each locus of 0.75. The value of the technique can be seen in a population segregating at 12 loci, where the frequency of genotypes selected in an F2 enrichment step is $0.75^{12} = 0.03168$, giving a minimum population size of 144 F2 individuals (cf. a frequency of $0.25^{12}$ and a population size >77 million to identify a single homozygous individual in the F2). After F2 enrichment, the frequency of each of the 12 target alleles in the selected population is increased from 0.5 to 0.67 (i.e., 2/3). The second step is to generate a population of more or less homozygous lines from the selected F2. The frequency of the target genotype in DHs or RILs generated from the enriched F2 will have been increased from $0.5^{12} = 0.00024$ to $(2/3)^{12} = 0.00771$, resulting in a decrease in minimum population size from 18 861 to 596. Thus, with enrichment, both the F2 and DH/RIL populations are of a more practical size for breeding.
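The population sizes quoted in this paragraph can be checked with a few lines of code. The sketch below assumes that the minimum population size is log(α)/log(1 - f), rounded up to the next whole individual; the function name `min_population_size` is illustrative.

```python
import math


def min_population_size(f, alpha=0.01):
    """Smallest N for which P(no target genotype among N) is at most alpha.

    f is the frequency of the target genotype; the result is rounded up
    because a population must contain a whole number of individuals.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log1p(-f))


if __name__ == "__main__":
    cases = {
        "5 loci, DH/RIL": 0.5 ** 5,
        "8 loci, DH/RIL": 0.5 ** 8,
        "12 loci, F2 enrichment": 0.75 ** 12,
        "12 loci, DH/RIL after enrichment": (2.0 / 3.0) ** 12,
        "12 loci, DH/RIL without enrichment": 0.5 ** 12,
    }
    for label, freq in cases.items():
        print(label, min_population_size(freq))
```

Under that assumption the five cases return 146, 1177, 144, 596, and 18 861, matching the figures in the text.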
The point at which population sizes become unmanageable will vary from one breeding program to another, and for high-value trait combinations, breeders may be prepared to apply molecular screens to larger numbers (say tens of thousands of lines). However, to simplify further discussion, in our studies we set a relatively modest maximum population size of 1000 at $\alpha = 0.01$ at any given selection stage. With this limitation, direct selection of the target genotype in F2 will allow no more than three alleles to be combined. If the target genotype is selected in DHs or RILs, only seven alleles can be combined. Use of F2 enrichment allows target alleles at 12 or 13 loci to be combined in derived homozygous lines. Linkage of alleles in coupling will have a positive effect on the frequency of the target genotype, while linkage in repulsion will have a negative effect related to the level of recombination. Wherever linkage occurs, simulation approaches (see later) can assist in determining optimum selection strategies.
While our initial findings support those of Bonnett et al. (2005) on the benefit of F2 enrichment using conventional formulae, we were able to extend this work to investigate possible benefits of enrichment in later segregating generations or combining F2 enrichment with enrichment in F3 and/or F4 populations. Given that minimum population size is determined by the frequency of the target genotype and that the same genotype frequency can result from different numbers and frequencies of target alleles, it is possible to study the relative efficiencies of different selection methodologies with a single-locus model. If a target allele, M, has the frequency $p$ in an F2 population, then the frequencies of the three marker types MM, Mm, and mm are $p^2$, $2p(1 - p)$, and $(1 - p)^2$, respectively, under Hardy-Weinberg expectations (Falconer and Mackay, 1996). Varying the frequency of M will result in different genotype frequencies for which alternative selection schemes were compared: (i) target genotype, i.e., MM, selected in F2; (ii) target genotype selected in DHs or RILs (say >F5); (iii) target genotype selected in F3 after F2 enrichment; (iv) target genotype selected in DHs or RILs after F2 enrichment; (v) target genotype selected in F4 after F2 and F3 enrichment; and (vi) target genotype selected in DHs or RILs after F2 and F3 enrichment.
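The single-locus model can be made concrete with a short sketch. It computes the Hardy-Weinberg genotype frequencies for an allele frequency p and the allele frequency among carriers (MM and Mm) kept by an enrichment step. The function names are illustrative, and the code covers only a single enrichment step, not the full set of six schemes.

```python
def hardy_weinberg(p):
    """Frequencies of MM, Mm and mm for allele frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_freq_after_enrichment(p):
    """Frequency of M among individuals retained by enrichment (MM or Mm)."""
    mm, mh, _ = hardy_weinberg(p)
    carriers = mm + mh
    # MM individuals carry two copies of M, Mm individuals carry one
    return (mm + 0.5 * mh) / carriers


if __name__ == "__main__":
    p = 0.5  # allele frequency in the F2 of a biparental cross
    print(hardy_weinberg(p))
    print(allele_freq_after_enrichment(p))
```

For p = 0.5 the carrier frequency is 0.75 and the allele frequency after enrichment is 2/3, the values used in the F2 enrichment example.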
The formula for calculating allele and genotype frequencies after selection for each of these methodologies can be readily derived via Eq. [1], based on which we calculated the minimum population size with $\alpha = 0.01$ for each scheme (Fig. 1). For Schemes 3 to 6, with more than one selection stage, the minimum population sizes were summed across stages. The probability used for each stage was $1 - (1 - \alpha)^{1/2}$ when there were two selection stages (Schemes 3 and 4), and $1 - (1 - \alpha)^{1/3}$ when there were three selection stages (Schemes 5 and 6), to have the same cumulative probability of $\alpha$ for each scheme.
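The per-stage probability follows from requiring that every stage succeed: if each of k stages fails with probability a, the overall success probability is (1 - a)^k, and setting this to 1 - α gives a = 1 - (1 - α)^(1/k). The sketch below applies that adjustment and sums the per-stage population sizes. It assumes the stage-wise target frequencies are supplied by the user, and the function names are illustrative.

```python
import math


def per_stage_alpha(alpha, stages):
    """Failure probability allowed per stage so the cumulative risk is alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / stages)


def summed_population_size(stage_frequencies, alpha=0.01):
    """Sum of minimum population sizes over all selection stages."""
    a = per_stage_alpha(alpha, len(stage_frequencies))
    return sum(
        math.ceil(math.log(a) / math.log1p(-f)) for f in stage_frequencies
    )


if __name__ == "__main__":
    # Two stages: F2 enrichment at 12 loci, then selection among DHs/RILs
    stages = [0.75 ** 12, (2.0 / 3.0) ** 12]
    print(per_stage_alpha(0.01, len(stages)))
    print(summed_population_size(stages))
```

Because each stage must meet a stricter probability than 0.01, the summed size is somewhat larger than the sum of the single-stage sizes computed at α = 0.01.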
Enrichment at two selection stages (in F2 and F3) always required greater assay numbers than simple F2 enrichment (Fig. 1). As indicated by Bonnett et al. (2005), F2 enrichment increased the frequency of selected alleles, allowing large reductions in minimum population size for recovery of target genotypes (commonly around 90%) and/or selection at a greater number of loci. So the gain from another cycle of allele enrichment selection in F3 following enrichment in F2 is at best minor and often results in a small net increase in minimum population size.
Comparison of Biparental, Backcross, and Topcross Populations
Backcrossing is an effective method to reduce population size compared with a biparental cross where one parent contributes more target alleles than the other (Bonnett et al., 2005). However, when each parent has a similar number of target alleles, the magnitude of the reduction may not be sufficient to compensate for the added cost, complexity, and time involved in generating a backcross population. If $f_{P1BC1}/f_{F1} = 3^{n_1}/2^{n} > 1$ (Eq. [2]), a backcross will reduce population sizes using P1 as the recurrent parent; if $f_{P2BC1}/f_{F1} = 3^{n_2}/2^{n} > 1$, P2 should be the recurrent parent; otherwise, no backcross is needed. For example, if $n = 5$, $n_1 = 3$, and $n_2 = 2$, then $f_{P1BC1}/f_{F1} = 0.84$ and $f_{P2BC1}/f_{F1} = 0.28$, and backcrossing is not helpful. If $n_1 = 4$ and $n_2 = 1$, $f_{P1BC1}/f_{F1} = 2.53$, and therefore a backcross should be used with P1 as the recurrent parent.
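The decision rule in this paragraph can be written as a small helper. The sketch assumes n = n1 + n2 and uses the ratios 3^n1/2^n and 3^n2/2^n to compare each backcross with the biparental cross; the function names and the returned labels are illustrative.

```python
def backcross_ratios(n1, n2):
    """Ratios f_P1BC1/f_F1 and f_P2BC1/f_F1 for n1 + n2 segregating loci."""
    n = n1 + n2
    return 3.0 ** n1 / 2.0 ** n, 3.0 ** n2 / 2.0 ** n


def recommend_cross(n1, n2):
    """Suggest the cross giving the highest target-genotype frequency."""
    ratio_p1, ratio_p2 = backcross_ratios(n1, n2)
    if ratio_p1 > 1.0:
        return "backcross to P1"
    if ratio_p2 > 1.0:
        return "backcross to P2"
    return "biparental cross"


if __name__ == "__main__":
    for n1, n2 in [(3, 2), (4, 1)]:
        r1, r2 = backcross_ratios(n1, n2)
        print(n1, n2, round(r1, 2), round(r2, 2), recommend_cross(n1, n2))
```

The two ratios cannot both exceed 1, since their product is (3/4)^n, so the order of the two checks does not affect the recommendation.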
If the target alleles are dispersed among three parents, i.e., P1, P2, and P3, a topcross (or three-way cross) is often used, e.g., (P1 × P2) × P3. Equation [3] shows that $f_{TC}$ is maximized when $n_3$ is the greatest number, i.e., when a topcross is required, the parent with the largest number of favorable alleles should be used as the third parent.
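The effect of parent order in a topcross can be illustrated numerically. The sketch assumes, as described in the Materials and Methods, that each target allele from P1 or P2 has a frequency of 0.25 and each target allele from P3 a frequency of 0.5 in derived DHs/RILs without selection. The allele counts in the example are hypothetical.

```python
from itertools import permutations


def topcross_frequency(n1, n2, n3):
    """Expected target-genotype frequency in DHs/RILs from (P1 x P2) x P3."""
    return 0.25 ** (n1 + n2) * 0.5 ** n3


if __name__ == "__main__":
    counts = (1, 2, 4)  # hypothetical numbers of target alleles per parent
    for order in sorted(set(permutations(counts))):
        print(order, topcross_frequency(*order))
```

With these counts, placing the parent with four target alleles last gives a frequency of 1/1024, eight times higher than placing it in the first cross (1/8192).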
Effects of Incompletely Linked Markers on Allele Frequencies following Selection
It takes substantial effort to develop markers that are completely linked to target alleles. The usefulness of incompletely linked markers depends on the level of recombination between the marker and the target allele and the minimum frequency of target genotypes considered acceptable following selection. If the minimum acceptable frequency of target genotypes is taken to be 0.95, a single marker will be suitable if its distance to the gene is less than 5 cM and homozygotes are to be selected in the F2 generation (Table 2). Single markers with a genetic distance of 10 cM will result in a frequency of the target allele of 0.91 (Table 2). However, selection in the F2 for flanking markers at 10 cM results in an allele frequency of 0.99, equivalent to that of a single marker 1 cM from the target gene. Such flanking markers will be better than a single marker at 5 cM in all cases, including where homozygotes are selected in F10 (0.959) or where allele enrichment is applied in F2, followed by selection of homozygotes in F10 (0.963).
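The F2 figures quoted in this paragraph can be approximated from first principles. The sketch assumes coupling-phase linkage in the F1, Haldane's mapping function, no crossover interference, and selection of marker homozygotes in the F2. It does not attempt to reproduce the F10 or enrichment values, and the function names are illustrative.

```python
import math


def haldane(distance_cm):
    """Recombination frequency for a distance in centimorgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def allele_freq_single_marker(distance_cm):
    """Target-allele frequency among F2 individuals homozygous for one marker."""
    return 1.0 - haldane(distance_cm)


def allele_freq_flanking_markers(left_cm, right_cm):
    """Target-allele frequency among F2 individuals homozygous for both flanking markers."""
    r1, r2 = haldane(left_cm), haldane(right_cm)
    keeps_target = (1.0 - r1) * (1.0 - r2)  # no crossover in either interval
    loses_target = r1 * r2                  # double crossover removes the target allele
    return keeps_target / (keeps_target + loses_target)


if __name__ == "__main__":
    for d in (1, 5, 10):
        print(d, round(allele_freq_single_marker(d), 2))
    print("flanking 10 cM", round(allele_freq_flanking_markers(10, 10), 2))
```

Under these assumptions a single marker gives about 0.99, 0.95, and 0.91 at 1, 5, and 10 cM, and flanking markers at 10 cM give about 0.99, in line with the values cited from Table 2.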
Selection in the F1 Generation of the Topcross
In the F1 generation of the topcross (TCF1), Rht-B1, Rht8, Cre1, Glu-B1, and tin are segregating. The target genotypes of Rht-B1aRht-B1a and Glu-B1iGlu-B1i have a frequency of 0.5 in TCF1, and all other target alleles exist in heterozygous form at frequencies of 0.5. Therefore selection of Rht-B1a and Glu-B1i homozygotes and allele enrichment for Rht8, Cre1, and tin can be applied in TCF1, and the theoretical selected proportion in TCF1 is $0.5^{5} = 0.0313$. Considering this high proportion and for simplicity, no other selection option was applied in TCF1.
Selection in the F2 and F2-Derived DH Generation of the Topcross
The target genotype lacks Rht-B1b and Rht-D1b and is homozygous for Rht8, Sr2, Cre1, VPM, Glu-B1i, Glu-A3b, and tin (Table 1, last row). We considered three options for selection in TCF2: (i) no selection in TCF2, (ii) F2 enrichment at all loci except Rht-B1 and Glu-B1 (as Rht-B1a and Glu-B1i have been fixed after selection of the homozygotes in TCF1 at the two loci), and (iii) selection of Rht8 homozygotes and F2 enrichment of all remaining alleles. Selection of homozygotes at two loci in TCF2 was also simulated, but a much larger minimum population size in TCF2 was required (results not shown).
For the three options considered, selection of target homozygotes was conducted in DHs, i.e., the first option (no selection in TCF2) consists of two selection stages, one in TCF1, the other in DHs. The simulation shows the proportion selected in TCF1 is close to the theoretical upper limit of 0.0313 (Table 3). The selected proportion in DHs is about 0.0009, requiring a large DH population to select the target genotype. The second and the third options both consist of three selection stages, one in TCF1, one in TCF2, and one in DHs. For the second option, the selected proportion is 0.1190 in TCF2 and 0.0071 in DHs. The third option has a more evenly distributed selected proportion over stages and requires the smallest number of lines overall (Table 3). In practice, if multistage selection is applied, the general rule to minimize population size would be to minimize differences in selection intensity at the different stages, which will minimize cost if markers are equal in cost. Multiplexing appropriate sets of markers provides further cost savings.
Given that tin and its microsatellite marker are 0.8 cM apart, the estimated allele frequency of tin in the final selected population, 0.77, is lower than expected. This is due to its repulsion-phase linkage with the important glutenin allele, Glu-A3b, in parents Sunstate and Silverstar+tin (Table 1). The haplotype frequencies from the biparental cross between Sunstate and Silverstar+tin illustrate the effect of repulsion-phase linkage on allele frequency. When three linked loci, Glu-A3, tin, and the marker for tin (denoted as Mtin), are considered, there are eight haplotypes (Table 4). When no crossover interference is assumed, the frequency of each haplotype can be calculated from the recombination frequency between Glu-A3 and tin, and between tin and its marker (Table 4, last column). After MAS for Glu-A3b and tin, only Haplotypes 2 and 3 are retained, with a frequency for tin of 0.01488/(0.01488 + 0.00388) = 0.79318, which agrees closely with our simulation results. This frequency of tin may not be sufficient, and therefore the presence of the tin allele following MAS must be confirmed by other methods.
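The haplotype calculation can be sketched for three ordered loci (Glu-A3, tin, Mtin) with no crossover interference. The recombination fractions r1 = 0.03 and r2 = 0.008 used below are assumptions, chosen because they reproduce the two haplotype frequencies quoted above; they are not read from Table 4, which is not reproduced here. Alleles are coded 1 for the selected state (Glu-A3b, tin, marker allele linked to tin) and 0 otherwise.

```python
from itertools import product


def haplotype_frequencies(r1, r2, parent_a=(1, 0, 0), parent_b=(0, 1, 1)):
    """Gamete haplotype frequencies from the F1 of parent_a x parent_b.

    r1 and r2 are the recombination fractions of the two adjacent intervals;
    crossovers in the two intervals are treated as independent.
    """
    parents = (parent_a, parent_b)
    freqs = {}
    for start, x1, x2 in product((0, 1), repeat=3):
        # start: parental strand at locus 1; x1, x2: crossover in each interval
        prob = 0.5 * (r1 if x1 else 1.0 - r1) * (r2 if x2 else 1.0 - r2)
        strands = (start, start ^ x1, start ^ x1 ^ x2)
        hap = tuple(parents[s][locus] for locus, s in enumerate(strands))
        freqs[hap] = freqs.get(hap, 0.0) + prob
    return freqs


if __name__ == "__main__":
    freqs = haplotype_frequencies(0.03, 0.008)  # assumed recombination fractions
    with_tin = freqs[(1, 1, 1)]     # Glu-A3b, tin, marker present
    without_tin = freqs[(1, 0, 1)]  # Glu-A3b, marker present, tin lost
    print(with_tin, without_tin, with_tin / (with_tin + without_tin))
```

Sunstate is coded as (1, 0, 0) and Silverstar+tin as (0, 1, 1), which places Glu-A3b and tin in repulsion. Selecting on Glu-A3b and the tin marker retains only the two haplotypes printed by the example.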
The selected proportion in Table 3 can be used to determine the minimum population size for each selection stage. At this point, the presence of the tin gene needs to be reconfirmed by phenotyping. Currently, laboratory progeny marker screening and field selection experiments are underway with these populations so that we can validate the simulation results.
Identifying the strategy with the smallest minimum population size needed to recover one target genotype does not solve all the problems breeders face when using MAS. Sometimes, breeders may want to know how many target genotypes can be selected at the end of the selection process. This is important if breeders want to select on other segregating traits for which no markers are available. For example, suppose there are 500 individuals in TCF1 and 50 seeds are taken from each individual selected in Step 1. After the selection in Step 2, 50 DHs are developed from each selected TCF2 individual, and the selection of Step 3 is applied to these DHs. From 1000 simulation runs, we found that on average 15.73 individuals were selected in TCF1, 31.43 were selected in TCF2, and 16.50 DHs with the target genotype (Table 1) were selected at the end.
In practice, breeders can seldom repeat a breeding process. Simulation, however, can investigate the outcome of a crossing/selection process over a large number of replications, from which the variation can be estimated. From the 1000 simulation runs, we found the standard errors of selected individuals in TCF1, TCF2, and DHs are 4.00, 10.01, and 11.25, respectively. The frequency distributions of the number of selected individuals in TCF1 and DHs are shown in Fig. 2. The number of selected individuals ranges from 5 to 31 in TCF1, and from 0 to 76 in DHs. Simulation cannot determine the exact number of selected individuals for a single selection experiment but can determine the probability of selecting a certain number of target genotypes. For the selection process previously described, the probability is 0.995 to select one or more target genotypes, 0.645 to select 10 or more, and 0.287 to select 20 or more (Fig. 2). Thus a larger population may be required if breeders want to select at least 20 DHs, on which selection for other important traits can then be applied.
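The variation among replicate runs at the first stage can be illustrated with a simple stand-in for the full simulation. The sketch below is not QuLine: it treats each of 500 TCF1 individuals as an independent draw with a probability of 0.5^5 of carrying the selected genotype, and it ignores linkage and the later TCF2 and DH stages. The function name, the seed, and the use of a plain binomial model are assumptions made for illustration.

```python
import random
import statistics


def simulate_selected(n_individuals=500, p_target=0.5 ** 5, runs=1000, seed=1):
    """Number of selected individuals in each of `runs` replicate populations."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = []
    for _ in range(runs):
        selected = sum(rng.random() < p_target for _ in range(n_individuals))
        counts.append(selected)
    return counts


if __name__ == "__main__":
    counts = simulate_selected()
    print("mean", statistics.mean(counts))
    print("sd", statistics.pstdev(counts))
    print("range", min(counts), max(counts))
    print("P(at least 10)", sum(c >= 10 for c in counts) / len(counts))
```

Under this binomial approximation the expected number selected in TCF1 is 500 × 0.03125 ≈ 15.6, with a standard deviation of about 3.9, close to the TCF1 values reported from the 1000 runs.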
In this article, we give practical guidelines and a specific example of combining alleles related to several traits into the same target genotype. In practice, the breeding program described uses population sizes slightly greater than those given, because it attempts to recover more than a single genotype during recombination. These guidelines are most relevant when the genes of interest are already present in genotypes that have relatively "adapted" backgrounds for other complex agronomic traits, as we have not considered here the effects of random background selection in the donor parents (Frisch and Melchinger, 2005). An extension of this work to optimize selection where a quantitative trait of interest is associated with multiple QTL and has complex gene action (including genotype by environment interaction) is currently underway.
| ACKNOWLEDGMENTS |
|---|
Received for publication May 28, 2006.
| REFERENCES |
|---|