Crop Science 40:91-98 (2000)
© 2000 Crop Science Society of America

CROP BREEDING, GENETICS & CYTOLOGY

A Model for Marker-Based Selection in Gene Introgression Breeding Programs

M. Humberto Reyes-Valdés (a)

(a) Universidad Autónoma Agraria Antonio Narro, Departamento de Fitomejoramiento, Buenavista, Saltillo, Coah. C.P. 25315, Mexico

mhreyes@uaaan.mx


    ABSTRACT
Marker-based breeding can be useful to expedite introgression of specific genetic material from a donor parent into the background of an elite variety, through backcrossing. A model is proposed to predict the probability of donor parent genetic material being present in specific regions of the genome, and its proportion at the chromosome-specific or whole genome levels, as a result of marker-based introgression. Furthermore, formulas are provided to calculate the variance of the predicted values. Two kinds of markers are considered: donor parent specific and recurrent parent specific. The first type serves to introgress the desired fraction of donor genome, and the second one to recover the recurrent parent background genome. In all cases, the probabilities and genomic proportions are calculated on a genetic map basis. This model permits any localization of markers through the genome, but requires knowledge of their map positions and the map lengths of the chromosomes. It is robust to mapping functions, and admits any one based on the assumption of coincidence being equal to the kth power of twice the recombination fraction. Two widely used mapping functions gave fairly different predictions of global chromosome introgression. Monte Carlo simulations for several circumstances allowed the testing of the model, and no significant statistical deviations from the theoretical predictions were found. The results indicate that the formulas presented herein can be useful for planning and prediction in a backcross breeding program.

Abbreviations: BC, backcross • cM, centimorgan • GS, genomic selection • MAS, marker assisted selection • QTL, quantitative trait locus • RAPD, random amplified polymorphic DNA • RFLP, restriction fragment length polymorphism


    INTRODUCTION
ONE OF THE MAIN OBJECTIVES of plant breeding is the introgression of one or more genes from a donor parent into the background of an elite variety. This is a way to retain the qualities of a good variety, while adding desirable traits such as resistance to environmental stress factors or nutritional quality from either domesticated or nondomesticated germplasm sources. The classical approach to introgression has been the successive backcrossing to a given elite variety, the recurrent parent.

Although the backcrossing approach has been successful in many instances, one of the main limitations is the number of generations, and thus time, necessary to achieve the introgression objective. Classically, the expected fraction of genome from the recurrent parent in the bth backcross generation is calculated as $1 - (1/2)^{b+1}$. However, this formula ignores "linkage drag" (Brinkman and Frey, 1977), i.e., the persistence of donor genetic material linked to the gene to be introgressed.
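As a small numerical illustration of the classical formula, the following Python sketch tabulates the expected recurrent-parent fraction for the first few backcrosses. The function name `recurrent_fraction` is illustrative only and is not part of the original work; linkage drag is ignored, exactly as in the formula.

```python
def recurrent_fraction(b: int) -> float:
    """Expected recurrent-parent genome fraction in the b-th backcross,
    using the classical formula 1 - (1/2)**(b + 1) (no linkage drag)."""
    if b < 0:
        raise ValueError("b must be a non-negative integer")
    return 1.0 - 0.5 ** (b + 1)


# Tabulate the first five backcross generations.
for b in range(1, 6):
    print(f"BC{b}: {recurrent_fraction(b):.4f}")
```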

Linkage drag was considered by Hanson (1959). He developed predictors of the length of chromosome segments retained around a locus held heterozygous with backcrossing or selfing. Naveira and Barbadilla (1992) derived the theoretical distributions of the lengths of donor chromosome segments. Another approach, taken by Stam and Zeven (1981), predicts the proportion of donor genome in the resulting generation. The approach of Hanson (1959), which is based on average donor chromosome lengths, ignores the presence of donor chromosome segments in places of the genome that are non-adjacent to the gene to be introgressed.

The advent of DNA markers opens many possibilities for backcross-based introgression. For instance, with markers linked to specific quantitative trait loci (QTLs), it is possible to introgress specific regions of the genome that confer desirable quantitative characteristics to an elite variety (Tanksley et al., 1989; Paterson et al., 1991; Dudley, 1993). In tomato (Lycopersicon esculentum Mill.), lines have been created that contain QTLs from the wild species Lycopersicon hirsutum Hub. & Bonpl. Such lines outperform the original elite variety in yield, soluble solids content, and fruit color (Tanksley and McCouch, 1997). This result was accomplished by the "advanced backcross QTL analysis," developed by Tanksley and Nelson (1996), and marker-assisted selection. DNA markers can be useful as well to select for maximum similarity to the recipient line and minimum similarity to the donor line (Hillel et al., 1990). This approach can help to expedite the recovery of recurrent parent background, while retaining the gene or genes to be introgressed.

Several authors have contributed to developing the theory and strategies of marker-assisted selection (MAS). Use of flanking markers tightly linked to the target gene in introgression programs was suggested by Young and Tanksley (1989). Under this scheme, individuals in a segregating backcross population can be scored for target genotype along with flanking RFLP markers aimed at recovering the background genotype. If the markers are very close to the target gene, selection can be applied to one marker in one backcross generation and to the other marker in the subsequent generation, thus allowing a realistic population size.

Lande and Thompson (1990) derived selection indices to maximize the rate of improvement in quantitative traits under different MAS schemes, by combining the information on molecular genetic polymorphism with data on phenotypic variation. This scheme allows variations in selection intensity. The proposed selection indices therein can be applied to sex-limited traits and can use information from relatives. Also, this approach can be applied to marker selection of immatures, in which selection of seedling, embryos, or juveniles is based on molecular marker loci, followed by conventional phenotypic selection of the surviving adults.

The so-called "genomic selection" (GS), proposed by Hillel et al. (1990), employs as a selection criterion the degree of resemblance between DNA fingerprints of a candidate and that of the desired or undesired genome. They presented theoretical distributions and variances of the relative percentage of donor genome without considering information about map positions of markers. The GS scheme permits gene introgression using information from DNA fingerprints to maximize the recipient genome and minimize the donor genome. Visscher et al. (1996) criticized the formulas of Hillel et al. (1990) because their model ignores recombination around the marker loci.

Hospital et al. (1992) studied the effects of time, selection intensity, population size, and number and position of selected markers in introgression breeding programs, on the expected proportion of recipient genome. They focused on the case where only one gene of interest from a donor parent is introgressed into a cultivar. They considered recurrent parent markers surrounding the gene of interest and found that rather distant markers better control the gene neighborhood in terms of recovering recurrent genome, unless high selection intensity can be applied. Additionally, they analyzed the use of recurrent parent markers in chromosomes not carrying the introgressed gene and report that increasing the number of markers to more than three per chromosome is not efficient. A possible limitation of their analytical approach to calculate the expected proportion of recipient genome in non-carrier chromosomes is the assumption of independence among all loci. However, their simulation results are in qualitative accordance with their analytical approach.

Visscher et al. (1996) investigated by simulation the relative gain in a backcross program using only markers, only phenotypes, or an index of markers and phenotypes. They found that markers were efficient in backcross programs for simultaneously introgressing an allele and selecting for the desired background. Marker spacing of 10 to 20 centimorgans (cM) gave an advantage of one to two backcross generations of selection, relative to random or phenotypic selection. In this and all other cases, the cited authors assumed a Poisson distribution of crossovers along the chromosomes; thus they based their calculations on the Haldane (1919) mapping function. Use of other mapping functions and introgression of several genes simultaneously have not been analytically addressed so far; however, the case of several markers and traits for other MAS schemes has been examined by simulation elsewhere, for example by Gimelfarb and Lande (1994).

In this work, I studied both analytically and numerically the outcome of backcrosses with selection on the basis of two types of marker alleles: those linked to the gene(s) to be introgressed ("donor markers") and markers used to recover the background genotype ("recurrent markers"). The objectives were (i) to derive functions for the probability of donor genetic material in a given site of the genome after a given number of backcrosses, and (ii) to derive the expected proportion of donor genome in a given chromosome and the whole genome.

As a starting point, my model considers selection of only "ideal genotypes", that is, those that have the ideal combination of marker alleles. To be realistic, this perspective requires either large population sizes or few markers. However, the model is further extended to the use of different sets of markers in each generation as a way to reduce the population sizes required for the selection process.

The advantage of this model over the previously published ones is that it allows any number of markers with any distribution along the genome; thus permitting predictions to be made in programs where several genes are being introgressed. Furthermore, this model is robust to interference, allowing the use of several mapping functions other than that proposed by Haldane (1919), which is unrealistic in many cases. For example, the Kosambi (1944) mapping function fits most data fairly well (Crow and Dove, 1990) and it can be used in this model.


    Genetic model
Assume that a recurrent parent is crossed with a donor parent that contains one or more genes to be introgressed. The resulting F1 is mated to the recurrent parent, thus generating the first backcross (BC1); this generation is crossed with the recurrent parent, generating the second backcross (BC2) and so on. In each backcross generation, the plants to be crossed with the recurrent parent are selected in such a way that all of them possess the whole set of marker alleles present in the recurrent parent, and the marker alleles associated with the gene(s) to be introgressed. This imposes a restriction on the markers to be used: they have to be codominant, or if dominant, as in the case of RAPDs (Williams et al., 1990), the donor parent must carry the dominant alleles.

The basic predictions in this model relate to the sets of chromosomes coming from the non-recurrent parent to form the selected plants of the bth backcross generation, i.e., they model "selected gametes" produced by the (b-1)th generation.

Let c be the coincidence, i.e., (actual number of double recombinations)/(number expected with no interference), and assume that it is approximately $(2r)^k$, where r is the recombination fraction in an interval between two markers, and k is a constant that depends on the mapping function to be used. For instance, the Haldane (1919) mapping function assumes no interference, thus c = 1 and k = 0; the Kosambi (1944) mapping function assumes that c = 2r, thus k = 1.

Let us denote by δ(d) an inverse mapping function that converts a map distance d, given in morgans, to a recombination fraction r. In the case of the Haldane (1919) mapping function, δ(d) = H(d) is:

$$H(d) = \frac{1}{2}\left(1 - e^{-2d}\right)$$

In the case of the Kosambi (1944) mapping function, δ(d) = K(d) is:

$$K(d) = \frac{1}{2}\tanh(2d)$$

where tanh is the hyperbolic tangent.
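The two inverse mapping functions can be sketched in Python as follows. The standard closed forms r = (1 - e^(-2d))/2 for Haldane and r = tanh(2d)/2 for Kosambi are used, with d in morgans; the function names `haldane` and `kosambi` are chosen here for illustration.

```python
import math


def haldane(d: float) -> float:
    """Haldane (1919): no interference, r = (1 - exp(-2d)) / 2, d in morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d: float) -> float:
    """Kosambi (1944): coincidence c = 2r, r = tanh(2d) / 2, d in morgans."""
    return 0.5 * math.tanh(2.0 * d)


# Compare the two functions at a few map distances given in cM.
for cm in (5, 10, 20, 50, 100):
    d = cm / 100.0
    print(f"{cm:>3} cM  Haldane r = {haldane(d):.4f}  Kosambi r = {kosambi(d):.4f}")
```

Both functions return nearly the same recombination fraction at short distances and diverge as the interval grows, with Kosambi giving the larger value for the same map distance.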

Consider three kinds of chromosome landmarks: chromosome ends (telomeres), recurrent markers, and donor markers. From these kinds of landmarks, six kinds of chromosome intervals are possible: end–end, end–recurrent marker, end–donor marker, recurrent marker–recurrent marker, recurrent marker–donor marker, and donor marker–donor marker. The formulas in Table 1 give the probability of donor genome in a given site of the chromosome, and are specific for each kind of interval. The symbols used herein are defined as follows: x represents a position on a chromosome in morgans, considering an arbitrary chromosome end as the zero position; r represents the position of a recurrent marker and d is the position of a donor marker (which is linked to the target gene); b and k were defined above. The formula for the probability of donor genome in an end–end interval, i.e., in an unmarked chromosome, is well known in plant breeding textbooks, as well as the formula for an end–donor interval (Allard, 1960). The derivations of the remaining formulas are described in Appendix A.


Table 1 Formulas to calculate the probability of donor genome in a given x position of a chromosome, in recurrent parent gametes forming the bth backcross

 
If selection is applied to one generation of selfing after the last backcross, the donor markers must be codominant in order to use the prediction formulas described in Table 1. Furthermore, in all the formulas, b+1 must be considered instead of b if the average probability of donor parent genome in both homologues is to be estimated when selfing and selecting after the bth backcross. If the calculations are intended to obtain the average donor genome fraction in both homologues in a backcross generation, the formulas must be multiplied by 1/2.

It is noteworthy that the recurrent markers are used only in the first generation of selection because they become fixed by the second generation. To state a function that assigns the probability of having donor genome at Position x in a given chromosome, we define a function that indicates whether or not the x value belongs to a given interval of the six types listed above. Let S be a chromosome interval; thus, the indicator function is:

$$I_S(x) = \begin{cases} 1 & \text{if } x \in S \\ 0 & \text{otherwise} \end{cases}$$

Let g(x) be a function that gives the probability of a chromosome having donor genome at Position x, with domain (0, L), where L is the length of the chromosome in morgans:

$$g(x) = \sum_{(z_1, z_2)} I_{(z_1, z_2)}(x)\, f_{z_1 z_2}(x) + \sum_{d} I_{\{d\}}(x)$$

Here the first sum runs over the unmarked segments of the chromosome, $f_{z_1 z_2}$ is the Table 1 formula for the corresponding kind of interval, and the second sum runs over the donor marker positions. The subscripts of I, (z1, z2), are open intervals that represent unmarked chromosome segments bound by two landmarks. Symbols e1 and e2 are the arbitrary beginning (e1 = 0) and the arbitrary end (e2 = L) of the chromosome; r is the position of a recurrent marker; r1 and r2 are recurrent marker positions with r1 < r2; d is the position of a donor marker; d1 and d2 are donor marker positions with d1 < d2. The term $I_{\{d\}}$, added to achieve continuity, means that any x in a donor marker position has a unit probability of having donor genome. Obviously, for the case of an x at a recurrent parent marker, the zero probability does not need to be stated in g(x).

In a way analogous to the work of Stam and Zeven (1981), the expectation of the proportion G of donor genome in the genetic map of a given chromosome of L morgans can be calculated as:

$$E(G) = \frac{1}{L}\int_0^L g(x)\,dx$$

The variance of the proportion G can be written as:

$$V_G = \frac{1}{L^2}\int_0^L\!\!\int_0^L g(x)\,g(y|x)\,dy\,dx - [E(G)]^2$$

where g(y|x) is the conditional probability of having donor genome at Position y given donor genome at Position x (see Appendix A). The general function stated for g(x) can be used to calculate g(y|x) just by adding a donor parent marker at Position x and computing g(y). Although there are no explicit expressions for E(G) and $V_G$ in many cases, the integrals can be numerically solved.
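The numerical integration can be illustrated with the simplest case, an unmarked chromosome, where only the textbook formulas are needed: g(x) = (1/2)^b, and g(y|x) follows the end–donor form (1 - δ(|y - x|))^b once a donor marker is placed at x. The sketch below is an assumption-laden illustration, not the author's Mathematica routine: it uses a plain midpoint rule, the Haldane function, and a hypothetical chromosome of 1 morgan.

```python
import math


def haldane(d: float) -> float:
    """Haldane (1919) recombination fraction for a map distance d in morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def g_unmarked(x: float, b: int) -> float:
    """End-end interval: donor probability (1/2)**b at any position x."""
    return 0.5 ** b


def g_cond_unmarked(y: float, x: float, b: int) -> float:
    """g(y|x): treat x as a donor marker, so y stays donor only if no
    recombination separates it from x in any of the b meioses."""
    return (1.0 - haldane(abs(y - x))) ** b


def mean_and_variance(L: float, b: int, n: int = 200):
    """Midpoint-rule estimates of E(G) and V_G on an n-by-n grid."""
    h = L / n
    mids = [(i + 0.5) * h for i in range(n)]
    e_g = sum(g_unmarked(x, b) for x in mids) * h / L
    second = 0.0  # accumulates the double integral of g(x) * g(y|x)
    for x in mids:
        gx = g_unmarked(x, b)
        second += gx * sum(g_cond_unmarked(y, x, b) for y in mids)
    second *= h * h / (L * L)
    return e_g, second - e_g ** 2


e_g, v_g = mean_and_variance(L=1.0, b=1)
print(f"E(G) = {e_g:.4f}, V_G = {v_g:.4f}")
```

For marked chromosomes the same routine applies once `g_unmarked` and `g_cond_unmarked` are replaced by the piecewise function built from the Table 1 formulas.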

So far the model has treated the case of the expected proportion of donor genome in a single chromosome. To extend this calculation to the whole genome, the following formulas can be used (Stam and Zeven, 1981). First, the expected proportion of donor genome in the n chromosomes can be computed as a weighted mean:

$$E(G_T) = \frac{\sum_{i=1}^{n} L_i\,E(G_i)}{\sum_{i=1}^{n} L_i}$$

where $E(G_i)$ is the expected proportion of donor genome in the ith chromosome, and $L_i$ its length in morgans. Then the variance of $G_T$ is:

$$V_{G_T} = \frac{\sum_{i=1}^{n} L_i^2\,V_{G_i}}{\left(\sum_{i=1}^{n} L_i\right)^2}$$

where $V_{G_i}$ is the variance of the donor genome fraction in the ith chromosome. Stam and Zeven (1981) gave a similar expression to calculate $V_{G_T}$, but with the equivalent of $L_i$ in the numerator instead of $L_i^2$, in what seems to be a typographic mistake.
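A minimal sketch of the whole-genome step, assuming the per-chromosome expectations and variances are already available and that chromosomes are independent. The function name and the two-chromosome input values are hypothetical and serve only to show the length weighting.

```python
def genome_mean_and_variance(lengths, means, variances):
    """Length-weighted mean and variance of the donor fraction over
    independent chromosomes (lengths in morgans)."""
    if not (len(lengths) == len(means) == len(variances)):
        raise ValueError("inputs must have one entry per chromosome")
    total = sum(lengths)
    e_gt = sum(L * m for L, m in zip(lengths, means)) / total
    # Each chromosome contributes with weight (L_i / total)**2.
    v_gt = sum(L * L * v for L, v in zip(lengths, variances)) / total ** 2
    return e_gt, v_gt


# Hypothetical genome: a 2-morgan carrier chromosome and a 1-morgan
# non-carrier chromosome.
e_gt, v_gt = genome_mean_and_variance([2.0, 1.0], [0.2, 0.5], [0.01, 0.04])
print(f"E(G_T) = {e_gt:.4f}, V_GT = {v_gt:.5f}")
```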

Calculation of the probability of donor genome at a Position x can be done in a straightforward manner with a hand calculator by applying the formulas given in Table 1. Computation of E(G) and $V_G$ requires a computer routine for numerical integration, which is included as a built-in command in several commercial software packages. The programs developed during this work in Mathematica (Wolfram Research, Inc., Champaign, IL) are available free of cost from the author.


    Simulation results
The genetic model presented above was tested by Monte Carlo simulation, assuming a random Poisson distribution of crossovers along the chromosome. The simulated chromosomes, with a length of 200 cM, were divided into units with 1 cM in length, in which crossing over was allowed. All the simulations plus model equations were programmed in the Mathematica language. The integrals in the model were solved numerically.

Some marker combinations used in the simulations (Table 2) are unrealistic because a large number of plants would have to be screened in order to recover the desired combination; however, they served to test the model in a wide variety of circumstances. In the first case, a chromosome with four recurrent and two donor markers was assumed. In the second case, only one recurrent and one donor marker were considered. In the third case, two recurrent markers flanking one donor marker were considered. In all cases, the observed average proportions and their variances (in the upper row for each backcross in Table 2) were statistically tested against the expected ones (lower row for each backcross) according to the model presented in this paper. The average proportions of donor parent were compared against their theoretical expectations by two-sided z-tests with α = 0.05. For the case of sampling variances of proportions, they were tested against the theoretical estimates generated by the model presented here, by a bootstrap method with 5000 resamplings in each case and α = 0.05. For all situations, no significant statistical differences were detected in the observed average percentages and their variances, as compared with their theoretical expectations. In the first case, which includes six markers, a second backcross generation did not improve the outcome in terms of percentage of donor parent genome.
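The kind of check described above can be sketched in Python for the simplest marker array, a single donor marker, where the expected donor probability is the end–donor formula (1 - δ(|x - d|))^b. This is not the author's Mathematica program: the crossover probability of 0.01 per 1-cM unit, the marker position at 80 cM, the sample of 2000 gametes, and all function names are assumptions made for illustration.

```python
import math
import random

UNITS = 200    # chromosome of 200 cM, split into 1-cM units
P_XO = 0.01    # assumed crossover probability between adjacent units


def haldane(d: float) -> float:
    """Haldane (1919) recombination fraction, d in morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def meiosis(hom_a, hom_b, rng):
    """Produce one gamete by copying from one homologue and switching
    to the other at every crossover."""
    current, other = (hom_a, hom_b) if rng.random() < 0.5 else (hom_b, hom_a)
    gamete = []
    for i in range(UNITS):
        if i > 0 and rng.random() < P_XO:
            current, other = other, current
        gamete.append(current[i])
    return gamete


def selected_gamete(b, donor_idx, rng):
    """Gamete forming the b-th backcross, selected for the donor marker
    (1 = donor allele, 0 = recurrent allele) in every generation."""
    recurrent = [0] * UNITS
    chrom = [1] * UNITS            # donor chromosome carried by the F1
    for _ in range(b):
        while True:                # resample until the marker is present
            gamete = meiosis(chrom, recurrent, rng)
            if gamete[donor_idx] == 1:
                break
        chrom = gamete
    return chrom


def donor_frequencies(b, donor_idx, n_gametes, seed=1):
    """Observed donor frequency at each 1-cM unit over n_gametes gametes."""
    rng = random.Random(seed)
    counts = [0] * UNITS
    for _ in range(n_gametes):
        for i, allele in enumerate(selected_gamete(b, donor_idx, rng)):
            counts[i] += allele
    return [c / n_gametes for c in counts]


freq = donor_frequencies(b=2, donor_idx=80, n_gametes=2000)
for idx in (60, 80, 100, 150):
    expected = (1.0 - haldane(abs(idx - 80) / 100.0)) ** 2
    print(f"{idx:>3} cM  simulated {freq[idx]:.3f}  expected {expected:.3f}")
```

Because crossovers are drawn per discrete unit rather than from a continuous Poisson process, small systematic deviations from the Haldane-based expectation are to be expected in addition to sampling noise.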


Table 2 Results of global introgression, generated by Monte Carlo simulations of marker-based selection in backcross breeding, with three different marker arrays. Symbol E represents a chromosome end, R a recurrent marker, and D a donor marker. The numbers in marker locations are genetic distances in centimorgans. In the columns for fraction of donor genome and sampling variance, the upper figure is the observed number and the lower figure is the expected one. The observed and theoretically expected donor parent genome fractions are relative to the genetic map length of each chromosome

 
Probabilities of donor genome were plotted against the respective chromosome positions (Fig. 1). The frequencies of donor parent genome for each position, obtained from Monte Carlo simulations of 200 plants in each case, are represented by dots separated by a length of 1 cM. In all cases, the model used to calculate the probabilities fits well with the results generated by simulations. In the first case, donor genome was maintained between the two donor markers near to 100% in Backcrosses 1 and 2. In the second case, the recurrent marker maintained the donor genome in a fairly low percentage to the "left" of the donor marker. In the third case, the two recurrent parent markers, flanking the donor marker (or target gene), were able to keep the donor genome in a fairly low percentage at both sides of the donor marker (or target gene).



Fig. 1 Graphical representation of the introgression results, generated by Monte Carlo simulations of marker-based selection in backcross breeding, with the three marker arrays depicted in Table 2. Each row in the graphic array represents a chromosome with the given set of markers, for Backcrosses 1 and 2. The continuous line represents the theoretical probabilities of donor genome along the given chromosome, as calculated by the model herein presented. The dots show the observed frequencies as obtained by the simulations

 

    Mapping functions
One concern about using the Haldane (1919) mapping function is that the non-interference assumption is unrealistic; e.g., Pascoe and Morton (1987) found a poor fit of Haldane (1919) mapping function when applied to two sets of Drosophila data. Besides its well known mapping function, Haldane (1919) proposed an empirically based one that fit well the Drosophila data presented in his paper; however, its lack of theoretical underpinning may be a reason for its nonuse. Kosambi (1944) proposed a formula in which coincidence relates to recombination between two points as c = 2r. This mapping function fits most data fairly well (Crow and Dove, 1990).

For short map distances, e.g., less than 10 cM, most mapping functions have the same behavior in terms of conversion between recombination frequency and genetic distance; however, when genetic length increases, mapping functions show strong divergences. For example, the Haldane (1919) mapping function tends to overestimate genetic distances and underestimate recombination fractions, as compared with other functions. Thus, it is expected that, when the chromosome intervals between markers are long, the estimations of introgression become sensitive to the underlying assumption about interference, which in previous related works has been assumed to be zero.

Predictions of global donor genome proportions were compared for the Haldane (1919) and Kosambi (1944) mapping functions. Thus, for the first case δ(d) = H(d) (the Haldane mapping function) and k = 0; for the second case δ(d) = K(d) and k = 1 (Table 3). The comparisons were made for the three marker arrays previously treated and Backcrosses 1 and 2. In the first case, both estimations are fairly similar, which may be due to the closeness of the markers. In the second case, the differences are larger, especially for Backcross 2; this must be due to the long distance between the markers and the chromosome ends. In the third case, there is a considerable difference for Backcross 1. As can be seen in the last case (Table 2), the distances between flanking markers and chromosome ends are long. However, there is no consistency in terms of direction of the difference between both estimations.


Table 3 Theoretical fractions of donor genome for two backcrosses, calculated with the aid of two different mapping functions. The calculations are based on the chromosomes and marker arrays represented in Table 2

 
The maximum discrepancy found between the global proportions of donor parent genome was 0.035 (Table 3, third case, Backcross 1), which may not be important in practical terms. However, to avoid bias as much as possible, the Kosambi (1944) mapping function is a better choice than the one of Haldane (1919), in the absence of specific information about interference in the biological material. Other choices are the functions proposed by Carter and Falconer (1951) with k = 3, and Pascoe and Morton (1987) with k = 2.


    Limitations and extensions of the model
There are several limitations in the approach presented here, for instance: (i) it is intended for selection based on markers only, (ii) the placement of the markers in the genetic map must be known, and (iii) it does not allow different selection intensities. The last limitation raises the problem of the number of plants to be screened each generation to assure the recovery of at least a few "ideal genotypes." However, the problem would arise only in the first backcross, because all the recurrent markers will remain fixed in the next generations, thus involving only selection of donor markers or target genes.

As an example, for the first marker array (Table 2), there will be approximately 0.596% of ideal genotypes in Backcross 1. This figure is obtained by assuming a Poisson distribution of chiasmata along the chromosome and multiplying the probabilities of recombination or no recombination in each interval [numerically this is 0.5 × (1 - 0.275) × 0.165 × (1 - 0.165) × 0.165 × (1 - 0.275)]. The first factor is the probability of a gamete with the recurrent marker located at 20 cM. The second factor is the probability of no recombination between the first and the second recurrent marker, i.e., the conditional probability of recurrent marker at 60 cM given recurrent marker at 20 cM, and so on. Thus we need to screen 771 plants to have a probability of 0.99 to recover at least one ideal genotype [this number is calculated by solving for n in $(1 - 0.00596)^n = 0.01$]. For Backcross 2, one expects to recover 41.8% of ideal genotypes, and the same value applies to the next backcross generations. For the case of the second marker array (Table 2), the expected percentages of ideal plants recovered in Backcrosses 1 and 2 are 6.5 and 50%, respectively. For the third marker array, the values are 1.36 and 50%, respectively.

A better strategy, in terms of reducing the number of plants to be screened, is the use of a set of recurrent markers in the first generation, and then a different set in the next generation. For instance, for the third marker array (Table 2), one can use the recurrent marker placed at 60 cM in Backcross 1 and the recurrent marker placed at 100 cM in Backcross 2. This way, it is expected to have 8.2% of "ideal plants" in Backcross 1 and 8.2% in Backcross 2. Approximately 54 plants would have to be screened in each backcross, totaling 108, to have a probability of 0.99 to recover at least one "ideal" genotype in each generation. If all the markers were selected in each generation, 337 plants would be needed in Backcross 1 and 7 in Backcross 2, totaling 344, i.e., more than three times the number required by the first strategy. In terms of reduction of donor genome, there is very little toll to pay with the second strategy. A fraction of 0.222 of donor genome was estimated for Backcross 2, against 0.196 (Table 2) with the first and more expensive strategy.
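The screening arithmetic in the two preceding paragraphs can be retraced with a short Python sketch. The marker positions are not given in the text itself; the layouts used below (R20, R60, D80, D100, R120, R160 for the first array and R60, D80, R100 for the third, in cM) are inferred from the quoted factors and percentages and should be read as assumptions. The helper name `plants_needed` is also illustrative.

```python
import math


def haldane(d: float) -> float:
    """Haldane (1919) recombination fraction, d in morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def plants_needed(p_ideal: float, confidence: float = 0.99) -> int:
    """Smallest n with 1 - (1 - p_ideal)**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_ideal))


r20, r40 = haldane(0.20), haldane(0.40)

# First marker array (assumed layout, see the lead-in).
p_first_bc1 = 0.5 * (1 - r40) * r20 * (1 - r20) * r20 * (1 - r40)
p_first_bc2 = 0.5 * (1 - r20)      # only the two donor markers still segregate

# Third marker array: all markers selected at once ...
p_all_bc1 = 0.5 * r20 * r20
p_all_bc2 = 0.5
# ... versus one recurrent marker per generation; the text quotes the
# same fraction for Backcross 1 and Backcross 2, which is followed here.
p_split = 0.5 * r20

print(f"first array BC1: {100 * p_first_bc1:.3f}% ideal, "
      f"{plants_needed(p_first_bc1)} plants")
print(f"first array BC2: {100 * p_first_bc2:.1f}% ideal")
print(f"third array, all markers: "
      f"{plants_needed(p_all_bc1)} + {plants_needed(p_all_bc2)} plants")
print(f"third array, alternating markers: "
      f"{plants_needed(p_split)} + {plants_needed(p_split)} plants")
```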

The model presented here can be applied in a straightforward way to the already mentioned case of different markers in each generation. Suppose that we want to apply marker-assisted selection in two backcross generations, each one with a different set of markers. In this case, the equation to estimate the expected global percentage of donor genome (G) will have the product g(x) g*(x) instead of g(x). The first function g(x) will have the parameters associated with the marker array in the first backcross and b = 1; the second function, g*(x), will have the set of parameters associated with the marker array to be used in the second backcross and b = 1. In general terms, a product of functions will be used, with each factor corresponding to one backcross generation, and fixing b = 1.
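The product rule described above can be tried numerically on the third marker array. The sketch below is a reconstruction under stated assumptions rather than the author's code: marker positions R60, D80, R100 (cM) on a 200-cM chromosome are inferred from the text, only the Haldane function (k = 0) is used, and the end–recurrent and recurrent–donor formulas follow the derivations in Appendix A as read here. The resulting fractions can be compared with the 0.196 and 0.222 quoted earlier for the two strategies.

```python
import math


def haldane(d: float) -> float:
    """Haldane (1919) recombination fraction, d in morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def f_end_recurrent(x, r, b):
    """Recombination between x and the recurrent marker, halved afterwards."""
    return haldane(abs(x - r)) * 0.5 ** (b - 1)


def f_end_donor(x, d, b):
    """No recombination between x and the donor marker in b meioses."""
    return (1.0 - haldane(abs(x - d))) ** b


def f_recurrent_donor(x, r, d, b):
    """Appendix A form for k = 0: P(R1 | R1 xor R2) * (1 - p2)**(b - 1)."""
    p1 = haldane(abs(x - r))
    p2 = haldane(abs(d - x))
    p = haldane(abs(d - r))        # equals p1 + p2 - 2*p1*p2 under Haldane
    return p1 * (1.0 - p2) / p * (1.0 - p2) ** (b - 1)


def g(x, b, donor, left_r=None, right_r=None):
    """Donor probability at x for one donor marker with optional recurrent
    markers on its left and right (positions in morgans)."""
    if x < donor:
        if left_r is None:
            return f_end_donor(x, donor, b)
        if x < left_r:
            return f_end_recurrent(x, left_r, b)
        return f_recurrent_donor(x, left_r, donor, b)
    if right_r is None:
        return f_end_donor(x, donor, b)
    if x > right_r:
        return f_end_recurrent(x, right_r, b)
    return f_recurrent_donor(x, right_r, donor, b)


def expected_fraction(prob, L=2.0, n=2000):
    """Midpoint-rule estimate of (1/L) * integral of prob over (0, L)."""
    h = L / n
    return sum(prob((i + 0.5) * h) for i in range(n)) * h / L


# Both recurrent markers selected together, two backcrosses (b = 2).
all_markers = expected_fraction(lambda x: g(x, 2, 0.8, 0.6, 1.0))
# R60 in Backcross 1 and R100 in Backcross 2: product g(x) g*(x), b = 1 each.
alternating = expected_fraction(
    lambda x: g(x, 1, 0.8, left_r=0.6) * g(x, 1, 0.8, right_r=1.0))

print(f"all markers: {all_markers:.3f}, alternating: {alternating:.3f}")
```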


    Conclusions
The model presented herein can be useful to predict the global fraction of a backcross genome that is coming from the donor parent and its sampling variance, as well as the probability of donor parent genome in specific sites of the chromosomes. It can be directly applied only in those cases that involve selection of "ideal genotypes," i.e., it is not robust for variation in selection intensity. However, it can be applied to make predictions when a different marker array is used in each generation to reduce the number of plants to be screened.

This model requires knowledge of the marker positions, and estimation is based on the genetic map rather than the genome itself. Therefore, the global fractions of donor genome are actually fractions of the genetic map, which is related to the physical map but is not the same. The model does not distinguish regions of the genome that have the same nucleotide sequence in both donor and recurrent parent; thus, what we call donor genome is the genetic material coming from the donor parent by DNA replication.

Use of different mapping functions gives different results, although the differences found in the cases treated here were not great from the practical point of view. However, use of mapping functions that are more reliable than those traditionally used for estimations in marker-based introgression does not introduce further complications to the model.

When phenotypic and marker selection is considered, along with any chosen selection intensity, a simulation-based method may be a more promising predictive tool than the analytical approach used in this work.


    ACKNOWLEDGMENTS
 
This work was supported by Universidad Autónoma Agraria Antonio Narro, in Mexico. The Mathematica computer programs developed to perform the calculations with the model herein presented are available on request to the email address mhreyes@uaaan.mx.

Received for publication November 30, 1998.
    Appendix A
Derivation of the End–Recurrent Formula
The gamete from the F1 that will contribute to form the BC1 has undergone one meiosis, and the probability of a Position x of the target chromosome coming from the donor parent equals the probability of recombination between the Position x and the location of the recurrent marker (r). This probability will be called p. In a selected BC1 plant, the two homologues will have the recurrent marker, but one will have a probability p of donor genome at x and the other will have zero, so any gamete that will contribute to BC2 will have a probability p/2 of having donor genome at x. Following the same logic, the probability will halve each generation, so for the gamete that forms the bth backcross we obtain:

$$\frac{p}{2^{\,b-1}}$$

Now, the genetic distance between r and x is |x - r|, and the recombination fraction is δ(|x - r|), where δ(d) is a mapping function that converts a genetic distance d to a recombination probability p. The formula in Table 1 is obtained by substituting δ(|x - r|) for p.
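
As an illustration, the end–recurrent probability can be evaluated numerically. The sketch below assumes map positions in Morgans and uses the Haldane and Kosambi mapping functions as two possible choices of δ; the function names are illustrative only.

```python
import math


def haldane(d):
    """Haldane mapping function: map distance d (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d):
    """Kosambi mapping function: map distance d (Morgans) -> recombination fraction."""
    return 0.5 * math.tanh(2.0 * d)


def f_end_recurrent(x, r, b, delta=haldane):
    """Probability of donor genome at position x in the gamete forming
    backcross b (b >= 1), given a recurrent parent marker at position r."""
    p = delta(abs(x - r))     # recombination fraction between r and x
    return p * 0.5 ** (b - 1)  # halves with each further backcross


# Hypothetical example: marker at 0.10 M, position of interest at 0.30 M.
for b in (1, 2, 3):
    print(b, f_end_recurrent(0.30, 0.10, b), f_end_recurrent(0.30, 0.10, b, kosambi))
```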

Derivation of the Recurrent–Recurrent Formula
The gamete from the F1 that will contribute to form the BC1 has undergone one meiosis, and the probability of a Position x of the target chromosome coming from the donor parent equals the probability of a double recombination event flanking the Position x within the interval (r1, r2). The conditional probability of this event, given that no recombination took place between the two recurrent parent markers, is (p1p2c)/(1 - p), where p1 and p2 are the recombination fractions between x and r1, and between x and r2, respectively, and p is the recombination fraction between r1 and r2. The factor c is the coincidence, which is assumed to be (2p)^k, where k is a constant associated with the mapping function. As in the case of the end-recurrent formula, the probability of donor genome will halve each generation. Thus for the bth backcross we have:

$$\frac{p_1 p_2 c}{1 - p}\left(\frac{1}{2}\right)^{b-1}$$

By substitution with a mapping function as in the end-recurrent formula, we obtain the formula in Table 1.
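
The calculation can be sketched numerically as follows. The sketch assumes positions in Morgans with r1 ≤ x ≤ r2, and pairs k = 0 with the Haldane function (coincidence 1) and k = 1 with the Kosambi function (coincidence 2p); the names used are illustrative.

```python
import math


def haldane(d):
    """Haldane mapping function (no interference, k = 0)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d):
    """Kosambi mapping function (coincidence 2p, k = 1)."""
    return 0.5 * math.tanh(2.0 * d)


MAPPING_FUNCTIONS = {0: haldane, 1: kosambi}


def f_recurrent_recurrent(x, r1, r2, b, k=0):
    """Probability of donor genome at x, between recurrent parent markers
    at r1 and r2, in the gamete forming backcross b (b >= 1)."""
    delta = MAPPING_FUNCTIONS[k]
    p1 = delta(x - r1)    # recombination fraction between r1 and x
    p2 = delta(r2 - x)    # recombination fraction between x and r2
    p = delta(r2 - r1)    # recombination fraction between the markers
    c = (2.0 * p) ** k    # coincidence
    return p1 * p2 * c / (1.0 - p) * 0.5 ** (b - 1)


# Hypothetical example: markers at 0 and 0.25 M, position x at 0.10 M.
print(f_recurrent_recurrent(0.10, 0.0, 0.25, b=1, k=0))
print(f_recurrent_recurrent(0.10, 0.0, 0.25, b=1, k=1))
```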

Derivation of the Recurrent–Donor Formula
Any gamete coming from F1 with the recurrent parent marker allele and the donor parent marker allele has undergone recombination between the two markers. To have donor parent genome at Position x, between both markers, recombination must have occurred between the recurrent parent marker and x, but not between the donor parent marker and x. The conditional probability of that event is:

where R1 and R2 denote recombination in the regions recurrent–x and x–donor, respectively. The symbols p1 and p2 denote the respective probabilities of recombination. The expression in the denominator is equivalent to the probability of recombination p in the whole interval recurrent–donor. The symbol v is used as an exclusive "or." In the subsequent backcross generations the probability of donor genome at x will depend on the absence of recombination in the interval x–donor, which has probability (1 - p2)^(b-1). The function fRD is obtained as the product P(R1|R1 v R2)(1 - p2)^(b-1), substituting a mapping function as in the case of the recurrent-recurrent formula.
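
A numerical sketch of fRD follows. It assumes that the conditional probability takes the explicit form p1(1 - c p2)/p, with coincidence c = (2p)^k as in the recurrent–recurrent case; this explicit form is an assumption of the sketch. Positions are in Morgans, x lies between the two markers, and the function names are illustrative.

```python
import math


def haldane(d):
    """Haldane mapping function (no interference, k = 0)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d):
    """Kosambi mapping function (coincidence 2p, k = 1)."""
    return 0.5 * math.tanh(2.0 * d)


MAPPING_FUNCTIONS = {0: haldane, 1: kosambi}


def f_recurrent_donor(x, r, d, b, k=0):
    """Probability of donor genome at x, between a recurrent parent marker
    at r and a donor parent marker at d (r != d), in backcross b (b >= 1)."""
    delta = MAPPING_FUNCTIONS[k]
    p1 = delta(abs(x - r))   # recombination fraction, recurrent marker to x
    p2 = delta(abs(d - x))   # recombination fraction, x to donor marker
    p = delta(abs(d - r))    # recombination fraction between the markers
    c = (2.0 * p) ** k       # coincidence
    # Assumed form of P(R1 | R1 v R2): recombination in recurrent-x only,
    # divided by recombination in exactly one of the two regions.
    cond = p1 * (1.0 - c * p2) / p
    # No recombination between x and the donor marker in later backcrosses.
    return cond * (1.0 - p2) ** (b - 1)


# Hypothetical example: recurrent marker at 0, donor marker at 0.25 M.
for x in (0.05, 0.15, 0.25):
    print(x, f_recurrent_donor(x, 0.0, 0.25, b=3))
```

Under both mapping functions, p1(1 - c p2) + p2(1 - c p1) equals p, which agrees with the statement that the denominator is the recombination probability for the whole interval.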

Derivation of the Donor–Donor Formula
Presence of recurrent parent genome at x requires double recombination, one in the interval donor–x and the other in x–donor. But the selected gamete showed no recombination between the markers. The conditional probability of double recombination given no recombination between the donor parent markers is:

$$\frac{p_1 p_2 c}{1 - p}$$

where p1 and p2 are the probabilities of recombination in the intervals x–donor and donor–x, respectively, p is the probability of recombination in the interval donor–donor, and c is the coincidence, as in the recurrent–recurrent formula. Because the selected gamete must retain donor genome at x in each of the b meioses, the probability of donor genome at x in the bth backcross is:

$$\left(1 - \frac{p_1 p_2 c}{1 - p}\right)^{b}$$

The formula in Table 1 is obtained by substitution with a mapping function.
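
A numerical sketch of the donor–donor case follows. It assumes that donor genome at x must escape double recombination in each of the b meioses leading to the bth backcross, so that the per-generation retention probability is raised to the power b; that exponent is our reading of the derivation. Positions are in Morgans with d1 ≤ x ≤ d2, and the function names are illustrative.

```python
import math


def haldane(d):
    """Haldane mapping function (no interference, k = 0)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d):
    """Kosambi mapping function (coincidence 2p, k = 1)."""
    return 0.5 * math.tanh(2.0 * d)


MAPPING_FUNCTIONS = {0: haldane, 1: kosambi}


def f_donor_donor(x, d1, d2, b, k=0):
    """Probability of donor genome at x, between donor parent markers
    at d1 and d2, in backcross b (b >= 1)."""
    delta = MAPPING_FUNCTIONS[k]
    p1 = delta(x - d1)    # recombination fraction between d1 and x
    p2 = delta(d2 - x)    # recombination fraction between x and d2
    p = delta(d2 - d1)    # recombination fraction between the markers
    c = (2.0 * p) ** k    # coincidence
    q = p1 * p2 * c / (1.0 - p)  # P(double recombination | markers not recombined)
    return (1.0 - q) ** b


# Hypothetical example: donor markers at 0 and 0.20 M, x midway between them.
for b in (1, 3, 5):
    print(b, f_donor_donor(0.10, 0.0, 0.20, b), f_donor_donor(0.10, 0.0, 0.20, b, k=1))
```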

Variance of G
Following Stam and Zeven (1981), we define a Bernoulli random variable as follows

The variance can be written as

The derivation of the last expression can be found in the paper of Stam and Zeven (1981). We already have a formula for the second term. For the first term we have






    REFERENCES