a CIMMYT, Int. Maize and Wheat Improvement Center, Lisboa 27, Apdo. Postal 6-641, 06600 Mexico D.F., Mexico
b Monsanto Life Sciences Research Center, 700 Chesterfield Parkway North, St. Louis, MO 63198
* Corresponding author (j.ribaut{at}cgiar.org)
| ABSTRACT |
|---|
Abbreviations: BC, backcross; cM, centimorgan; MAS, marker-assisted selection; PCR, polymerase chain reaction; N, screened population size; Ni, number of individuals; Nsl, selectable population size; SNP, single nucleotide polymorphism.
| INTRODUCTION |
|---|
During the past several years, simulations to evaluate the efficiency of MAS as a breeding tool have been reported by various groups. These simulations have been quite diverse; for example, MAS has been tested combining phenotypic and genotypic data in a selection index (Lande and Thompson, 1990; Knapp, 1994; Xie and Xu, 1998a), considering different breeding generations (Edwards and Page, 1994), and for different breeding schemes (Xie and Xu, 1998b). Efficiency of MAS has also been evaluated considering the heritability of the target trait (Hospital et al., 1997; Knapp, 1998), the genetic effect at target loci (Van Berloo and Stam, 1998), and by monitoring target genomic regions simultaneously vs. one by one (Hospital and Charcosset, 1997). Most of the theoretical papers related to MAS present complex mathematical models, making it difficult to directly derive a practical MAS experiment. In addition, the implications of using different laboratory strategies that can be considered to achieve the selection, such as different DNA markers, are rarely taken into account in those theoretical papers when comparing the efficiency of different approaches.
The objective of this paper is not to present new genetic models, but rather to provide some guidelines at both the theoretical and practical levels for identifying the most appropriate BC-MAS strategy based on the objectives of different types of applied breeding experiments. To achieve this objective, the relationship between the size of a segregating population that is screened at each generation and the number of BC generations required to achieve the selection was evaluated through simulations that considered different selection models for complete and partial line conversion.
| METHODS |
|---|
The objectives of a BC-MAS strategy are to identify individuals heterozygous for donor and recipient alleles at target loci and homozygous for recipient alleles at nontarget loci. Given these objectives, each generation can be divided into two steps. The first step is to identify the genotypes that are heterozygous at the target loci, reducing the screened population size (N) to the selectable population size (Nsl). The second step is to identify, within the Nsl individuals, those presenting the most suitable genomic composition at the nontarget loci. Assuming no linkage between target genes, the expected Nsl can be obtained as $N_{sl} = (1/2)^{t} N$, where t denotes the number of target genes. For Nsl = 1, the minimal Nsl, no selection pressure can be applied at nontarget loci.
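The relationship between N and Nsl can be checked with a few lines of Python. This is only a minimal sketch that assumes unlinked target genes, as stated above; the function names are illustrative and do not come from the original study.

```python
def expected_selectable_size(n_screened, n_target_genes):
    """Expected Nsl = (1/2)**t * N for t unlinked target genes."""
    return n_screened * 0.5 ** n_target_genes


def required_screened_size(n_selectable, n_target_genes):
    """Screened population size N expected to leave n_selectable carriers."""
    return n_selectable * 2 ** n_target_genes


if __name__ == "__main__":
    # Screened population sizes needed for an expected Nsl of 100
    for t in (1, 3, 5):
        print(t, required_screened_size(100, t))
```

For an expected Nsl of 100, the sketch returns 200, 800, and 3200 individuals for one, three, and five target genes.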
Simulation Experiments
Factors considered in the simulations included N, the number of target genes (one, three, and five), the distance between the flanking markers and the target gene (2 to 20 cM), and the number of genotypes selected in each BC generation (one to eight) used to generate the next screened population. Target genes were assigned randomly to one marker on a chromosome and no more than one gene per chromosome was considered.
Marker-locus genotypes of progeny individuals were simulated based on marker-locus genotypes of parents and rules of Mendelian segregation. Genotypes were simulated as strings of 1 (heterozygous for donor and recipient alleles) or 0 (homozygous for recipient allele). An F1 diploid individual consists of a string of 1's and another string of 0's, and only one string was regenerated for each individual in each BC generation since gametes from the recurrent parent were all the same. Haplotypes were simulated by "random walking" (all randomness being simulated by the computer's pseudorandom number generator) along the marker linkage map. The string for an individual began, with equal chance, on one of the two strings of the individual selected in the previous population and crossed over to read from the other string whenever a random number drawn from the uniform (0, 1) distribution was smaller than the specified recombination probability for that interval. Similar practices are described in Tanksley and Nelson (1996).
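The random walk along the linkage map can be sketched as follows. This is an illustrative reconstruction, not the authors' program: it assumes that a crossover is taken with probability equal to the recombination fraction of each interval, and the function and argument names are hypothetical.

```python
import random


def backcross_offspring(parent, rec_fracs, rng):
    """Simulate the marker string of one BC offspring of a selected parent.

    parent    -- list of 0/1 marker states (1 = heterozygous, 0 = recipient)
    rec_fracs -- recombination fraction of each interval between adjacent
                 markers (len(parent) - 1 values; 0.5 between chromosomes)
    rng       -- random.Random instance (pseudorandom number generator)
    """
    recipient = [0] * len(parent)        # second homolog carries recipient alleles only
    strands = (parent, recipient)
    current = rng.randrange(2)           # start on either strand with equal chance
    child = [strands[current][0]]
    for k, r in enumerate(rec_fracs, start=1):
        if rng.random() < r:             # crossover with probability r
            current = 1 - current
        child.append(strands[current][k])
    return child


if __name__ == "__main__":
    rng = random.Random(2000)
    f1 = [1] * 11                        # F1: heterozygous at all 11 markers
    print(backcross_offspring(f1, [0.2] * 10, rng))
```

Because the gamete of the recurrent parent always carries recipient alleles, a single string per individual is enough, and an offspring can be heterozygous only where its parent was.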
A genome size of 10 chromosomes, each 200 cM in length, was chosen to approximate the genome of maize. All markers were assumed to be evenly distributed with an interval size of 20 cM (11 markers per chromosome), except for the two markers flanking the target genes. Two criteria were used to compare BC-MAS strategies: the number of generations necessary to obtain at least one desired individual, and the proportion of donor genome present after a fixed number of BCs. All simulation results presented in this study represent the average of 1000 repeats for each case, which also allowed us to examine the variability and the distribution of the results. An analytical prediction of the selection advance relies on the tail of the distribution, which is the part most affected when normality is violated. The normality assumption could be satisfied fairly well in BC1 since all loci are segregating, but the approximation becomes poor in BC2 and BC3, when most loci are fixed and only a small proportion of loci are still segregating. For the simulation, we did not assume normality, and the genetic model closely resembles the practical situation except for recombination interference. The effect of interference on allele introgression would be to make the elimination of nontarget alleles easier but the introgression of the target allele more difficult. Thus, the overall effect of interference on gene introgression would be neutral. The results of the calculation and the simulation are very similar, which supports the reliability of the simulation results.
Complete Line Conversion
The objective of a complete line conversion is to develop a line that will have exactly the same genetic composition as the recipient line, except at target loci where the presence of homozygous alleles from a donor line is desired. By definition, such conversion requires strong selection pressure at nontarget regions linked to the target gene, due to the genetic drag generated by the presence of the donor allele at the target loci (Tanksley et al., 1989). Genetic drag is lowest in genotypes with homozygous recipient alleles at the two markers flanking the target gene. Because selection requires identification of recombinations between the target gene and flanking markers, the two flanking markers for each gene involved in the model must be carefully identified. A distance between the target gene and the flanking markers of 2 to 20 cM was employed, depending on the requirement of the conversion.
We used a selection index based on the probability that an individual generates progeny with the desirable gametic type. The desirable gametic type is defined as having recipient alleles at all marker loci except the target gene. Individuals with the highest probability of giving rise to offspring with the desired genotype, the one presenting recombination in both flanking intervals of the target gene, were considered the most desirable. For example, assume that there are two individuals with the marker genotypes on the carrier chromosome (markers on other parts of the genome can be considered in the same way) M1 m2 m3 M4 T5 M6 and m1 m2 M3 M4 T5 M6, where M and m represent alleles from the donor and recipient lines, respectively, and T is the target gene. A recombination is needed in each of the two flanking intervals of T in the desirable gametes. Conditional on these two flanking markers, segregation of the other markers is independent of T and can be treated the same as that of markers on the noncarrier chromosomes. The two individuals carry the same number of heterozygous markers, so selection based on that number alone would not distinguish between them. However, the probability of the gamete with all markers being m except T is $\tfrac{1}{2}(1 - r_{1+2+3})\,r_4 r_5$ for Individual 1 (since m2 and m3 are fixed already, only $r_{1+2+3}$ is relevant here) and $\tfrac{1}{2}(1 - r_3)\,r_4 r_5$ for Individual 2, where $r$ is the recombination fraction in the corresponding interval and $r_{1+2+3}$ represents the recombination fraction between Marker 1 and Marker 4. Because $r_3 < r_{1+2+3}$, the probability of a desired gamete is higher for Individual 2 than for Individual 1. The extension of the algorithm to the whole genome is straightforward, since segregation of markers from different chromosomes is independent. Therefore, the probability was calculated for each chromosome and multiplied over chromosomes. In the following simulations, the logarithm of the probability is used, which changes the multiplication to a simple summation.
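The comparison of the two example individuals can be reproduced numerically. The sketch below is illustrative only: it assumes the Haldane map function (no interference) to convert map distances into recombination fractions and a 20-cM spacing for every interval, neither of which is specified for this example in the text, and the function names are hypothetical.

```python
import math


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def log_prob_desired_gamete(d_upstream_cm, d_left_cm, d_right_cm):
    """Log of 1/2 * (1 - r_up) * r_left * r_right for one carrier chromosome.

    d_upstream_cm -- distance from the outermost heterozygous marker to the
                     left flanking marker (no recombination wanted there)
    d_left_cm, d_right_cm -- distances from each flanking marker to the
                     target gene (one recombination wanted in each interval)
    """
    r_up = haldane(d_upstream_cm)
    return (math.log(0.5) + math.log(1.0 - r_up)
            + math.log(haldane(d_left_cm)) + math.log(haldane(d_right_cm)))


# Individual 1 is heterozygous from Marker 1 on, Individual 2 only from Marker 3 on.
ind1 = log_prob_desired_gamete(60.0, 20.0, 20.0)   # r_{1+2+3}: Marker 1 to Marker 4
ind2 = log_prob_desired_gamete(20.0, 20.0, 20.0)   # r_3: Marker 3 to Marker 4
best = "Individual 2" if ind2 > ind1 else "Individual 1"
```

A whole-genome index would sum such log-probabilities over all chromosomes, which is the summation mentioned above.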
Simulations were performed to investigate the effect of the population size, the size of the marker intervals flanking the target gene, and the number of target genes simultaneously introgressed.
Partial Line Conversion
Partial line conversion means that the conversion is complete when a limited proportion of the donor genome in an individual is found scattered over the genome in addition to the desirable homozygous alleles at the target gene. The selection index in this case is based on the estimated proportion of recipient genome. This is similar to phenotypic selection for a quantitative trait, the method used for the selection index proposed by Hospital et al. (1992). In this case, the preference for individuals with recombination in the flanking regions of the target gene is not necessary because the criterion is the total proportion of the recipient genome.
We considered marker selection on nontarget loci at only one generation while the desired allele at the target locus was selected in all generations. Let $\mu_t$ denote the mean of the proportion of the donor genome of individuals in Generation t, and let $s_t$ denote the mean in the selected individuals in Generation t (t = 1, 2, 3). Based on classical selection theory, assuming normality and a marker density high enough to give the so-called heritability $h^2$ a value close to 1 (Visscher, 1996, showed that all the variance of the genetic composition can be explained by placing three or more markers per chromosome),

$$s_t = \mu_t - i\,\sigma_t,$$

where $i$ is the selection intensity and $\sigma_t$ the standard deviation among individuals. It would be safe to assume that the relative reduction in $\mu_{t+1}$ from $s_t$ due to one more generation of backcrossing depends mainly on the value of $s_t$ and is only slightly affected by the genomic composition of the selected individual. That is, the mean in the next generation can be approximated as $s_t\,\mu_{t+1}/\mu_t = \mu_{t+1}(1 - i\,\sigma_t/\mu_t)$. Then, the response on the mean of BC$_{t_2}$ due to the selection in BC$_{t_1}$, $\Delta_{t_2 \mid t_1}$, can be approximated as

$$\Delta_{t_2 \mid t_1} = \mu_{t_2}\, i\,\sigma_{t_1}/\mu_{t_1}.$$

Therefore, the efficiency of the selections in a generation will depend on the ratio $\sigma_t:\mu_t$. We evaluate the mean and variance for BC1, BC2, and BC3 using the formulas of Stam and Zeven (1981). Simulations were also performed to compare the efficiency of the selection schemes in different generations. The simulation results were compared with the above analytical results.
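The analytical approximation can be turned into a small calculation. The sketch below assumes truncation selection on a normally distributed donor proportion, for which the selection intensity is the standard normal density at the truncation point divided by the selected fraction; the function names and the numerical values in the example are purely illustrative and are not taken from the paper's tables.

```python
from statistics import NormalDist


def selection_intensity(p):
    """Standardized selection differential i for a selected fraction p (0 < p < 1)."""
    z = NormalDist().inv_cdf(1.0 - p)    # truncation point on the standard normal
    return NormalDist().pdf(z) / p


def selected_mean(mu_t, sigma_t, p):
    """Mean donor proportion of the selected individuals: mu_t - i * sigma_t."""
    return mu_t - selection_intensity(p) * sigma_t


def response_later(mu_t1, sigma_t1, mu_t2, p):
    """Reduction of the BC_t2 mean due to selection in BC_t1: mu_t2 * i * sigma_t1 / mu_t1."""
    return mu_t2 * selection_intensity(p) * sigma_t1 / mu_t1


if __name__ == "__main__":
    # Hypothetical values: mean 0.50 and SD 0.05 in the selected generation,
    # unselected mean 0.25 one generation later, best 1 of 100 retained.
    print(selected_mean(0.50, 0.05, 0.01))
    print(response_later(0.50, 0.05, 0.25, 0.01))
```

For a fixed selected fraction, the response grows with the ratio of the standard deviation to the mean in the generation where selection is applied.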
| RESULTS |
|---|
We defined the selection efficiency as the optimal ratio between the resources that have to be invested and the number of selection cycles required to achieve a complete selection. Based on the results presented in Fig. 1, the most efficient selection scheme should consider the screening of an initial population size that will result, after selection at target loci, in a Nsl ranging between 50 and 100 genotypes. Below these values, changes in Nsl still have a major impact on the number of selection generations, while above these values, an increase in Nsl requires more resources for a reduced impact on the selection process. Obtaining Nsl = 100 requires screening 200, 800, and 3200 individuals for one, three, and five target genes, respectively.
Complete Line Conversion
In practice, it is impossible to obtain a strictly complete line conversion, that is, a line in which the only donor material is the homozygous donor alleles at the target gene. Therefore, line conversion is considered complete when, out of the selectable population, a genotype homozygous for recipient alleles at all detected nontarget loci can be identified. This implies that recombination should take place on both sides of the target gene(s), and the donor genome's contribution in the final line would be mostly around the target gene. Assuming no double recombination between two nontarget loci, and a 20-cM distance between the flanking markers and the target gene, the expected line conversion level (proportion of the genome from the recipient parent) is 99% for one target gene and 95% for five genes. For a 2-cM distance, this level is 99.9 and 99.5% for one and five target genes, respectively. Table 1 presents the number of BC generations required to achieve the line conversion, based on simulation results. The simulations considered changes in the recombination frequency between a target gene and flanking markers, one to five target genes, different screened population sizes, and the selection of one or two individuals at each selection generation. Using these results and a manageable sample size (e.g., a selectable population of 50 to 100), the introgression can be completed in three to four generations for a single target gene, four to seven generations for three target genes, and four to nine generations for five genes, depending on the distance of flanking markers to the target gene. Dramatic effects on the number of generations are seen when Nsl < 50, and less so when Nsl > 100, independent of the number of target genes, which conforms to the results in Fig. 1B.
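The conversion levels quoted above can be recovered with a simple calculation. The sketch below rests on our reading of those figures, which is an assumption rather than a formula given in the text: each target gene retains a donor segment spanning both flanking intervals in the heterozygous state, so that the segment counts for half of its length, on the 2000-cM genome used in the simulations. The function name is illustrative.

```python
def expected_conversion(n_target_genes, flank_cm, genome_cm=2000.0):
    """Expected proportion of recipient genome once all nontarget markers are converted.

    Assumes a heterozygous donor segment of 2 * flank_cm around each target
    gene, counted as half donor (an interpretation, not the paper's formula).
    """
    donor_cm = n_target_genes * 2.0 * flank_cm * 0.5
    return 1.0 - donor_cm / genome_cm


if __name__ == "__main__":
    for genes in (1, 5):
        for flank in (20.0, 2.0):
            print(genes, flank, expected_conversion(genes, flank))
```

Under this assumption the function gives 99 and 95% for flanking markers at 20 cM, and 99.9 and 99.5% at 2 cM, for one and five target genes.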
Except for the case where a small Nsl (<50 genotypes) is combined with a reduced recombination frequency between the target gene and flanking markers (2 and 4 cM), the number of generations for selecting two individuals (Ni = 2) in each generation is almost equivalent to the results of selecting one individual (Ni = 1), with a population half that size for most of the cases in Table 1. Therefore, the selection fraction, which is the ratio of the Ni selected vs. the N, is a major parameter in a MAS experiment.
Partial Line Conversion with a Single Generation of Marker-Assisted Selection
The objective of a partial line conversion is to identify a line with donor alleles at target genes and a proportion of donor genome below a desired level. Usually no restriction would be enforced for the donor genome contribution outside the target loci over the genome. Mean and variation of the genome size from the donor of an individual in BC1, BC2, and BC3 populations, without selection at nontarget loci, were calculated separately for carrier and noncarrier chromosomes, and summed based on our 10-chromosome genome of 2000 cM. The ratio of the standard deviation to the mean was then calculated assuming one, three, and five target genes (Table 2). As the backcrossing continues, the ratio of the standard deviation to the mean of the donor genome contribution increases. This implies that the most efficient marker-assisted selection would be in later rather than early generations if only one generation of selection at nontargeted loci is applied. Without selection, the donor genome size in an individual decreases exponentially as the backcrossing proceeds, and most of the donor genome can be reduced through the BC process, especially on the noncarrier chromosomes. However, the difference between generations decreases with the number of genes included in the model. When more genes are involved in the selection model, fewer noncarrier chromosomes are involved; and as previously mentioned, the variation in the donor genome size is mostly from the noncarrier chromosomes. A BC-MAS strategy involving MAS at nontarget loci at a single BC generation induces a larger reduction of the donor genome contribution at nonselected loci compared with BC-MAS selection conducted at an earlier generation, when the allelic introgression is conducted at one or a few target genes rather than several genes.
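The exponential decrease of the donor genome without selection follows from Mendelian expectations: at a locus unlinked to any target gene, the chance of still being heterozygous halves with every backcross. A minimal sketch of this expectation is given below; it applies to noncarrier chromosomes only, since linkage to the target gene slows the decrease on carrier chromosomes, and the function names are illustrative.

```python
def expected_heterozygous_fraction(bc_generation):
    """Expected fraction of unlinked nontarget loci still heterozygous in BC_t."""
    return 0.5 ** bc_generation


def expected_donor_fraction(bc_generation):
    """Expected donor share of the genome at those loci (heterozygous = half donor)."""
    return 0.5 ** (bc_generation + 1)


if __name__ == "__main__":
    for t in (1, 2, 3):
        print(t, expected_heterozygous_fraction(t), expected_donor_fraction(t))
```

For BC1, BC2, and BC3 the expected donor share is 25, 12.5, and 6.25%, respectively.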
50 and 70% of that found in a complete selection conducted only in BC1 and BC2, respectively (Table 3), which conforms to the results presented in Table 2. Differences in the Nsl, when selection is conducted only at a target locus, have the largest impact on the donor genome proportion when the complete MAS is conducted at the most advanced BC. Nevertheless, these differences become small when the Nsl is increased, as there is almost no difference between Nsl = 10 and Nsl = 100, even if the complete selection is conducted at the third BC. One should also consider that the smaller the Nsl, the larger the variation of the donor genome in the individuals in BC3. Therefore, a Nsl of 10 would be sufficient to maintain the population, even if the single complete MAS is performed only at BC3.
| DISCUSSION |
|---|
From the Screened Population Size to the Selectable Population Size
The ratio between N and Nsl depends on the number of target loci. For more than three target loci in the model, the screening of thousands of plants is required to obtain a Nsl of 100 genotypes. Selection at each target locus reduces the Nsl by half. Nevertheless, the screening of the whole population has to be conducted at least once at the beginning of each BC generation. With N equal to thousands of individuals, such screening can be laborious and expensive. However, it can be optimized by using an appropriate combination of DNA markers. If markers can be amplified in the same reaction tube (Ribaut et al., 1997), a tremendous reduction in the number of PCR reactions required to conduct the selection can be achieved (e.g., in one step, duplexing reduces the population size by four, triplexing by eight). The PCR-based primers that amplify target genes could be distinguished in a single separation, because they amplify different fragment sizes. If this is not possible, other PCR markers closely linked to the target genes might be used. Assuming the availability of fluorescent detection, the labeling of the different PCR primers with different dyes allows direct multiplexing of the markers (Karp et al., 1997). Once the sequences of the donor and recipient alleles are known, the use of allele-specific markers such as molecular beacons (Bonnet et al., 1999) and SNPs (Gilles et al., 1999) might be an efficient option. Indeed, the gel step can be eliminated by using these techniques, and direct multiplexing can be obtained using different fluorescent dyes. Considering all these options, multiplexing in a BC-MAS should always be possible. Furthermore, in the context of the overall cost of an experiment, it is important to identify the most suitable set of markers at the target loci.
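The saving from multiplexing can be illustrated with a simple count of reactions. The accounting below is a hypothetical sketch, not a protocol from the paper: it assumes that plants are screened in successive rounds, that each reaction reads `plex` target loci at once, and that only plants heterozygous at every locus tested so far proceed to the next round.

```python
def expected_reactions(n_screened, n_target_genes, plex=1):
    """Expected number of PCR reactions needed to screen all target loci.

    Each round tests up to `plex` loci per reaction; every tested locus is
    expected to halve the number of plants carried to the next round.
    """
    remaining = float(n_screened)
    loci_left = n_target_genes
    total = 0.0
    while loci_left > 0:
        k = min(plex, loci_left)
        total += remaining               # one reaction per surviving plant
        remaining *= 0.5 ** k            # expected carriers after k more loci
        loci_left -= k
    return total


if __name__ == "__main__":
    print(expected_reactions(3200, 5, plex=1))   # one locus per reaction
    print(expected_reactions(3200, 5, plex=5))   # all five loci in one tube
```

With five target genes and 3200 screened plants, this accounting gives 6200 reactions for single-locus screening and 3200 when all five loci are read in one reaction.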
From the Selectable Population Size to the Selected Plants
Once the selectable population is identified, screening of Nsl genotypes with DNA markers should be conducted at nontarget loci in order to reduce the donor genome contribution. The selection response for this second selection step depends on the recombination frequency between the target gene and the flanking markers, and on the densities of the markers on the carrier and noncarrier chromosomes. In our simulation, we considered a fixed number of markers at each generation; therefore, the issue of different marker densities related to the BC generation is not addressed here. With regard to deciding how many unlinked nontarget loci should be screened and how they should be distributed, several strategies have already been advanced. The density of the marker coverage, for example, can be adapted to the inbreeding level of each BC generation. It has been shown that increasing the number of markers to more than three per noncarrier chromosome was not efficient at early generations (Hospital et al., 1992). At each new generation, due to additional crossover probability, an increase in the number of markers should be considered in order to optimize selection. This increase is balanced by the fact that markers that revealed fixed alleles at nontarget loci at one generation need not be screened at the next BC generation.
The Selected Genotypes at Each Generation
As presented, the Ni selected at each generation has a direct impact on the duration of the selection process. Although in BC-MAS the selection of a single individual is the fastest strategy in terms of generations required to achieve the selection, it is, nevertheless, risky from a practical point of view. A mistake at one of the selection steps or an unexpected field problem, such as low germination or poor pollen quality, will have dramatic consequences. Based on these practical considerations, the selection of more than one genotype at each generation should be considered. In practice, the number of individuals selected at each generation can be limited by the propagation ability of the studied crop, that is, the number of selected plants that are necessary to derive the suitable population size at the next selection generation. This limitation is important when several target genes are involved in the selection and the planting of thousands of plants is required. In this respect, maize, and more generally, cross-pollinated plants, offer an advantage. If the best genotype is selected before flowering, the pollen of only one selected plant is sufficient to develop the next large population, using several plants from the recipient line as females. This procedure is not general to all crops, and it may make the selection of only the best individual to create a large population at the next generation unrealistic. If the number of selected plants required at each generation to develop the next population is high, the optimal Nsl should be considered carefully. If this constraint is too great, other BC-MAS strategies may be considered.
Line Conversion
On the basis of several simulation studies, it is clear that BC-MAS is especially efficient when conducted on large segregating populations (Hospital et al., 1997). Nevertheless, the identification of a screened population size, which leads to the most efficient strategy for a line conversion through BC-MAS, has to consider different values for parameters that interactively influence the length of the selection process. Simulations have been widely used to evaluate and compare different strategies for the allelic introgression at single or multiple genes. Among a range of uses, simulations have been used to evaluate the optimal distribution of markers for carrier and noncarrier chromosomes (e.g., Hospital et al., 1992; Visscher, 1996), to optimize the position of the flanking markers (e.g., Frisch et al., 1999b), and to identify the minimum screened population size to obtain a given genomic composition in a given number of generations (e.g., Visscher et al., 1996; Frisch et al., 1999b). Moreover, several software programs, such as QU-GENE (Podlich and Cooper, 1998) and PLABSIM (Frisch et al., 2000), are now available to make selection predictions through simulations.
Obtaining a clear vision of the appropriate BC-MAS strategy is difficult because the implications of changing the values of the parameters involved in the selection are hard to project. As an example, it has been demonstrated in several studies that to minimize genetic drag around selected loci, emphasis should be placed, at an early stage of BC-MAS, on recombination events close to the target gene (Tanksley et al., 1989; Frisch et al., 1999a). Even if the strategy is clear, the choice of the most suitable markers to apply it must be considered carefully. Indeed, the distance between the target gene and the flanking markers has a major impact on the number of BC generations required to achieve the selection, especially when several target genes are considered. On the basis of our results, with five target genes and a selectable population of 100, having the flanking markers at 2 vs. 12 cM almost doubled the length of the complete line conversion (7.9 vs. 4.6 BC generations). The level of conversion also differs between the two cases, 99.5 vs. 97% for 2 and 12 cM, respectively. Therefore, depending on the objective of the BC-MAS experiment, the position of the flanking markers can be quite different. In some cases, the most efficient strategy is less clear, especially when different theoretical approaches might serve the same purpose. For example, an increase of population size when advancing the BC generation reduces the number of required marker data points in comparison with a constant population size across all generations (Frisch et al., 1999a). The same data point reduction might be reached by increasing at each new BC generation the number of markers at nontarget loci while screening the same population size at each generation (Hospital et al., 1992). Different approaches might also be combined to increase the efficiency of the selection.
Frisch et al. (1999b) proposed to identify through simulations the minimum population size that has to be screened to obtain at least one individual with a target genomic composition. This approach might be very relevant at an advanced generation of the strategy proposed in this paper. Indeed, at the end of the selection process, it is appropriate to calculate the Nsl that will allow the completion of the selection in one generation, thereby eliminating the need for an additional generation, even if Nsl > 100 in the last selection generation.
The nature of the germplasm considered in a BC-MAS experiment also has a major impact on the identification of the most suitable strategy. For example, the biological implication of having different levels of line conversion must be considered carefully. As already discussed, the distance between the flanking markers and the target gene has an impact on the final level of conversion, 99.5 vs. 97% for 2 and 12 cM, respectively. The biological implication on the plant phenotype of this 2.5% difference in donor genome contribution outside the target genes is difficult to predict and depends on the agronomic characteristics of the donor line (Lee, 1995). Once a target gene is introduced for the first time into an elite line, flanking markers at 2 cM should be the best option; while in the next phase, the transfer of the same target gene from elite into elite material, the use of flanking markers at 12 cM might be more effective.
Partial Line Conversion
Given the selection of only a few of the best genotypes at each generation, a single BC-MAS may be most efficient when conducted at advanced BC generations. After studying different selection schemes, Hospital et al. (1992) concluded that selection in later generations is better. The strategy of using one generation of selection in an advanced BC generation is an attractive option, especially if allelic introgression at a few target genes is considered concomitantly in a large number of recipient lines. The small population required for the first generations, in which selection is only conducted at target loci, represents a major logistical advantage. Moreover, if one target gene is linked to a phenotypic marker, or is a transgene (with a selectable gene such as herbicide resistance included in the gene construct), selection for this gene can be conducted phenotypically, reducing the cost of the selection. If this is the case, no DNA extraction is required to conduct the selection during the first generations. The "penalty" for this strategy is the retention of some donor genome contribution at nontarget loci, most of it flanking the target genes on the carrier chromosomes. Possible negative impacts from this remnant donor genome on plant performance can be minimized if the donor line is elite germplasm, because the probability of having bad agronomic characteristics dragged into the selection at nontarget loci is reduced.
| CONCLUSIONS |
|---|
| ACKNOWLEDGMENTS |
|---|
Received for publication December 8, 2000.
| REFERENCES |
|---|