Crop Science 41:1485-1494 (2001)
© 2001 Crop Science Society of America

CROP BREEDING, GENETICS & CYTOLOGY

Marker-Assisted Backcrossing for Introgression of a Recessive Gene

Matthias Frisch and Albrecht E. Melchinger*

Institute of Plant Breeding, Seed Science, and Population Genetics, University of Hohenheim, 70953 Stuttgart, Germany

* Corresponding author (melchinger{at}uni-hohenheim.de)


    ABSTRACT
Molecular markers are used to trace the presence of target genes (foreground selection) and to accelerate recovery of the recurrent parent genome (background selection) in backcross programs. In this study, we present an approach for introgressing a recessive target gene from a donor into the genetic background of a recipient line by foreground selection combined with background selection to reduce the donor chromosome segment around the target gene. The goal of the proposed breeding plan is to generate, with a given probability q2 and by the second backcross generation (BC2) at the latest, at least k ≥ 1 individuals that carry the target gene and are homozygous for the recurrent parent alleles at the flanking markers, using a minimum number of individuals. We provide formulas for calculating (i) the population size required in generation BC1 and (ii) the probability of success q2 of the breeding program in generation BC2. The latter depends on the number and genotype of the BC1 individuals selected for further backcrossing and on the size of their BC2 families. The optimum allocation of individuals to generations BC1 and BC2 was determined by computer simulations for various map distances between the target gene and the flanking markers. Our approach is demonstrated by a numerical example and can assist breeders in the optimum design of breeding programs for marker-assisted introgression of a recessive gene.

Abbreviations: BCt, t-th backcross generation • cM, centimorgan • M, Morgan • QTL, quantitative trait loci


    INTRODUCTION
MANY IMPORTANT GENES in breeding for resistance and quality traits are inherited recessively. In conventional backcross programs for introgression of a recessive target gene, that gene's presence or absence in a backcross individual is determined by a phenotypic assay of progeny generated either by selfing or by crossing to the donor parent (Allard, 1960). As an alternative to this time-consuming method, flanking molecular markers can be used as a diagnostic tool to trace the presence of the target gene (foreground selection) in successive backcross generations. With this approach, the presence of the target gene needs to be tested by selfing or crossing to the donor only at the end of the breeding program. In addition, markers with good coverage of the entire genome can be used to select for rapid recovery of the recurrent parent genome (background selection).

Marker-assisted foreground selection was proposed by Tanksley (1983) and investigated in the context of introgression of resistance genes by Melchinger (1990), who presented an a priori approach for calculating the minimum number of individuals and family size required in recurrent backcrossing. Marker-assisted background selection was proposed by Young and Tanksley (1989) and investigated by various authors (Hospital et al., 1992; Openshaw et al., 1994; Visscher et al., 1996; Frisch et al., 1999a,b). Hospital and Charcosset (1997) investigated combined foreground and background selection for introgression of favorable genes at quantitative trait loci (QTL). They presented an a priori approach to calculate the required population size for the case of several target loci and map positions estimated with varying accuracy. To our knowledge, no previous study exists concerning combined foreground and background selection for introgression of a recessive gene with known map position.

The objectives of this study were to (i) devise a breeding plan for combined foreground and background selection for introgression of a recessive gene, (ii) provide formulas for calculation of the required population size in generation BC1, (iii) derive the probability of success of the breeding program in generation BC2 depending on the number and genotype of the BC1 individuals selected for further backcrossing and family size of their BC2 progeny, and (iv) present simulation results with respect to the optimum allocation of resources in generations BC1 and BC2 for various distances between the target gene and flanking markers. Following Frisch et al. (1999a), we adopted an a posteriori approach in which the design of generation BC2 is determined on the basis of the known marker genotypes of the BC1 individuals selected for further backcrossing.


    MODEL
Assumptions
Under the assumptions that (a) the average number of crossovers formed on a chromatid equals its length in Morgan units and (b) the locations of crossovers are uniformly and independently distributed on the chromatid, the number of crossovers formed in a chromatid segment of length d follows a Poisson distribution with parameter d. Assumptions (a) and (b) imply that neither chiasma interference nor chromatid interference occurs (Stam, 1979). Furthermore, the probability pr of recombination between two loci is related to their map distance d (in Morgan units) by Haldane's (1919) mapping function

$$p_r = \frac{1}{2}\left(1 - e^{-2d}\right) \qquad (1)$$
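A minimal Python sketch of Eq. [1] follows; the function names are our own, not from the paper. It also checks the formula against the crossover model of assumptions (a) and (b) by drawing Poisson crossover counts and recording how often a segment of length d carries an odd number of crossovers. The Poisson sampler is a simple inversion routine written for illustration, not a library call.

```python
import math
import random


def haldane(d: float) -> float:
    """Recombination probability for a map distance d in Morgans (Eq. [1])."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson variate by sequential inversion (adequate for small lam)."""
    k, p = 0, math.exp(-lam)
    cum, u = p, rng.random()
    while u > cum:
        k += 1
        p *= lam / k
        cum += p
    return k


def simulated_recombination(d: float, reps: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated segments of length d with an odd number of crossovers."""
    rng = random.Random(seed)
    odd = sum(poisson(d, rng) % 2 for _ in range(reps))
    return odd / reps


for d in (0.02, 0.10, 0.50):
    print(f"d = {d:.2f} M: Eq. [1] {haldane(d):.4f}, simulated {simulated_recombination(d):.4f}")
```

For short segments pr is close to d (about 0.0196 for d = 0.02 M), while for long segments it approaches 0.5.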

Definitions
We consider a chromosome of length L. Positions on the chromosome are represented by a scale (in Morgan units) ranging from 0 to L. The target locus is located at position x, and two flanking markers used for foreground selection are located at positions ml and mr (Fig. 1). If only one marker is used for foreground selection, we assume without loss of generality that it is located at position mr. Two markers located at positions yl and yr are used for background selection. Let d1 = x - yl, d2 = yr - x, δ1 = x - ml, and δ2 = mr - x, with δ1 < d1 and δ2 < d2. The events A to H refer to the occurrence of recombination (i.e., an odd number of crossovers) between the loci delimiting the intervals [yl, x], [yl, ml], [ml, x], [x, mr], [mr, yr], [x, yr], [yl, mr], and [ml, mr], respectively. The corresponding probabilities pa to ph are obtained from Eq. [1] by inserting the map distance between the loci delimiting the respective interval.
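The probabilities pa to ph follow from the marker positions alone. The sketch below (Python; function names are hypothetical) writes the eight interval lengths in terms of d1, d2, δ1, and δ2 and passes them through Eq. [1].

```python
import math


def haldane(d: float) -> float:
    """Recombination probability for a map distance d in Morgans (Eq. [1])."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def interval_probabilities(d1, d2, delta1, delta2):
    """Recombination probabilities p_a ... p_h of the intervals behind events A ... H."""
    if not (0.0 <= delta1 < d1 and 0.0 <= delta2 < d2):
        raise ValueError("require 0 <= delta1 < d1 and 0 <= delta2 < d2")
    lengths = {
        "a": d1,               # [yl, x]
        "b": d1 - delta1,      # [yl, ml]
        "c": delta1,           # [ml, x]
        "d": delta2,           # [x, mr]
        "e": d2 - delta2,      # [mr, yr]
        "f": d2,               # [x, yr]
        "g": d1 + delta2,      # [yl, mr]
        "h": delta1 + delta2,  # [ml, mr]
    }
    return {name: haldane(length) for name, length in lengths.items()}


p = interval_probabilities(d1=0.10, d2=0.10, delta1=0.02, delta2=0.02)
for name, value in p.items():
    print(f"p_{name} = {value:.5f}")
```

Because Haldane's function implies independent recombination in adjacent intervals, the values are mutually consistent: for example, pa = pb(1 - pc) + (1 - pb)pc, since recombination between yl and x requires recombination in exactly one of the two subintervals.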



Fig. 1. Chromosome of length L with the target locus at position x, two markers for foreground selection at positions ml and mr, and two markers for background selection at positions yl and yr. The map distances of the target locus to the foreground selection markers are denoted by δ1 and δ2 and those to the background selection markers by d1 and d2.

 
Adopting the terminology of Hospital and Charcosset (1997), we denote by c- the genotype of an individual homozygous for the recurrent parent allele and by c+ the genotype of an individual heterozygous (carrying one donor and one recurrent parent allele) at the locus at position c (c ∈ {yl, ml, x, mr, yr}). We further define indicator variables Yl, Ml, X, Mr, and Yr, which take the value 1 if the locus at the respective position is heterozygous and 0 if it is homozygous for the recurrent parent allele.

Let A denote the set of markers employed in a backcross program. For one foreground selection marker, A = {yl, mr, yr}, and for two foreground selection markers, A = {yl, ml, mr, yr}. The set ΩA contains all possible multilocus marker genotypes for A (|ΩA| = 2^3 = 8 for one foreground selection marker and |ΩA| = 2^4 = 16 for two foreground selection markers).

By definition, the subset G ⊆ ΩA contains all multilocus marker genotypes with at least one background selection marker homozygous for the recurrent parent allele and at least one heterozygous foreground selection marker

$$G = \left\{ \omega \in \Omega_A : Y_l + Y_r \le 1,\ M_l + M_r \ge 1 \right\} \qquad (2)$$

The elements of G are listed in Table 1. In addition, we define Genotype 0, consisting of heterozygous F1 individuals, and G0 = G ∪ {0}.


Table 1. Formulas to calculate the probabilities p0,g (probability that a BC1 individual has marker genotype g), pg+|0,g (probability that a BC1 individual with marker genotype g carries the target gene), and pg+,T+ (probability that a BC1 individual with genotype g+ generates a BC2 individual with genotype t+ ∈ T+); see text for detailed definitions of p0,g, pg+|0,g, and pg+,T+.

 
The subset T ⊆ G contains all multilocus marker genotypes with both background selection markers homozygous for the recurrent parent allele and at least one heterozygous foreground selection marker

$$T = \left\{ \omega \in \Omega_A : Y_l + Y_r = 0,\ M_l + M_r \ge 1 \right\} \qquad (3)$$

Let B = A ∪ {x}, and let ΩB be the set of all possible multilocus genotypes with respect to B. Thus, |ΩB| = 16 for one foreground selection marker and |ΩB| = 32 for two foreground selection markers. By the same token as above, we define the following two subsets for carriers of the target gene:

$$G^+ = \left\{ \omega \in \Omega_B : Y_l + Y_r \le 1,\ M_l + M_r \ge 1,\ X = 1 \right\} \qquad (4)$$

$$T^+ = \left\{ \omega \in \Omega_B : Y_l + Y_r = 0,\ M_l + M_r \ge 1,\ X = 1 \right\} \qquad (5)$$

Elements of the sets G, T, G+, and T+ are denoted with the lowercase letters g, t, g+, and t+, respectively.
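The set definitions can be checked by enumeration. This sketch (Python; names hypothetical) encodes each genotype as a dict of the indicator variables (1 = heterozygous) and filters ΩA and ΩB by the conditions stated in the text for G, T, G+, and T+. With one foreground selection marker, Ml does not exist; treating it as 0 is our assumption.

```python
from itertools import product


def genotype_sets(two_foreground: bool = True):
    """Enumerate Omega_A, Omega_B, G, T, G+ and T+.

    Each genotype is a dict of indicators: 1 = heterozygous,
    0 = homozygous for the recurrent parent allele.
    """
    markers = ["Yl", "Ml", "Mr", "Yr"] if two_foreground else ["Yl", "Mr", "Yr"]
    omega_a = [dict(zip(markers, v)) for v in product((0, 1), repeat=len(markers))]
    omega_b = [dict(w, X=x) for w in omega_a for x in (0, 1)]

    def foreground(w):  # heterozygous foreground markers; absent Ml counts as 0 (assumption)
        return w.get("Ml", 0) + w["Mr"]

    def background(w):  # heterozygous background markers
        return w["Yl"] + w["Yr"]

    G = [w for w in omega_a if background(w) <= 1 and foreground(w) >= 1]
    T = [w for w in G if background(w) == 0]
    G_plus = [w for w in omega_b
              if background(w) <= 1 and foreground(w) >= 1 and w["X"] == 1]
    T_plus = [w for w in G_plus if background(w) == 0]
    return omega_a, omega_b, G, T, G_plus, T_plus


for two in (False, True):
    sizes = [len(s) for s in genotype_sets(two)]
    label = "two foreground markers" if two else "one foreground marker"
    print(label, sizes)
```

With two foreground selection markers, G has nine elements, matching the nine genotypes listed for that case in Table 1; T and T+ each have three.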

We define the following probabilities.

  1. p0,g: Probability that a BC1 individual has marker genotype g.
  2. p0,g+: Probability that a BC1 individual has marker genotype g+.
  3. pg+|0,g: Conditional probability that a BC1 individual carries the target gene under the condition that it has marker genotype g.
  4. pg,T+: Probability that a backcross progeny of an individual with genotype g has a genotype t+ ∈ T+.
  5. pg+,T+: Probability that a backcross progeny of an individual with genotype g+ has a genotype t+ ∈ T+.

Equations for calculating the probabilities p0,g, pg+|0,g, and pg+,T+ from pa to ph are given in Table 1. The probabilities p0,g+ and pg,T+ can be calculated as

$$p_{0,g^+} = p_{0,g}\, p_{g^+|0,g} \qquad (6)$$

$$p_{g,T^+} = p_{g^+|0,g}\, p_{g^+,T^+} \qquad (7)$$

For further derivations, we also need the probability p0,T+ that an F1 individual produces, by backcrossing, an individual with genotype t+ ∈ T+. For one foreground selection marker, we have

$$p_{0,T^+} = \tfrac{1}{2}\, p_a (1 - p_d)\, p_e \qquad (8)$$
and for two foreground selection markers

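$$p_{0,T^+} = \tfrac{1}{2}\left[\, p_a (1 - p_d)\, p_e + p_b (1 - p_c)\, p_d (1 - p_e) \right] \qquad (9)$$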
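Under the no-interference model, p0,T+ can be obtained by enumerating the 32 allele patterns that an F1 gamete can carry at yl, ml, x, mr, and yr; each pattern has probability one half times a product of pr or 1 - pr over the four adjacent intervals. Summing the three admissible patterns gives ½[pa(1 - pd)pe + pb(1 - pc)pd(1 - pe)] for two foreground selection markers; with one foreground selection marker only one pattern qualifies, giving ½pa(1 - pd)pe. The sketch compares the enumeration with this closed expression; it is our own derivation under assumptions (a) and (b), not code from the paper, and the function names are hypothetical.

```python
import math
from itertools import product


def haldane(d: float) -> float:
    """Recombination probability for a map distance d in Morgans (Eq. [1])."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def gamete_probability(alleles, distances):
    """Probability that an F1 gamete carries a given allele pattern (1 = donor).

    Loci are listed in map order; adjacent intervals recombine independently.
    """
    prob = 0.5  # either parental allele at the first locus
    for left, right, dist in zip(alleles, alleles[1:], distances):
        r = haldane(dist)
        prob *= r if left != right else 1.0 - r
    return prob


def p0_T_plus_enumerated(d1, d2, delta1, delta2):
    """Sum over F1 gametes yielding a BC1 genotype in T+ (two foreground markers)."""
    distances = [d1 - delta1, delta1, delta2, d2 - delta2]  # yl-ml, ml-x, x-mr, mr-yr
    total = 0.0
    for yl, ml, x, mr, yr in product((0, 1), repeat=5):
        if yl == 0 and yr == 0 and x == 1 and ml + mr >= 1:
            total += gamete_probability((yl, ml, x, mr, yr), distances)
    return total


def p0_T_plus_closed(d1, d2, delta1, delta2):
    """Closed expression obtained by summing the three admissible patterns."""
    pa, pb, pc = haldane(d1), haldane(d1 - delta1), haldane(delta1)
    pd, pe = haldane(delta2), haldane(d2 - delta2)
    return 0.5 * (pa * (1 - pd) * pe + pb * (1 - pc) * pd * (1 - pe))


print(p0_T_plus_enumerated(0.10, 0.10, 0.02, 0.02))
print(p0_T_plus_closed(0.10, 0.10, 0.02, 0.02))
```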

Basic Result on the Minimum Population Size
If a particular genotype occurs with probability p, the number m of individuals of this type in a sample of size n is binomially distributed with probability

$$P(m) = B(n, m, p) = \binom{n}{m}\, p^{m} (1 - p)^{n - m} \qquad (10)$$

The minimum population size n required to find with probability q at least one individual of a genotype that occurs with probability p can be derived from Eq. [10] as

$$n = \left\lceil \frac{\ln(1 - q)}{\ln(1 - p)} \right\rceil \qquad (11)$$
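Eqs. [10] and [11] in code: the smallest n with 1 - (1 - p)^n ≥ q, i.e., 1 - B(n, 0, p) ≥ q. The sketch (Python; function names hypothetical) adds a small correction loop because the ceiling of a floating-point ratio can land one off at the boundary.

```python
import math


def binomial_probability(n: int, m: int, p: float) -> float:
    """B(n, m, p): probability of exactly m individuals of a type with frequency p among n."""
    return math.comb(n, m) * p**m * (1.0 - p) ** (n - m)


def min_population_size(p: float, q: float) -> int:
    """Smallest n such that at least one individual of the type occurs with probability >= q."""
    if not (0.0 < p < 1.0 and 0.0 <= q < 1.0):
        raise ValueError("require 0 < p < 1 and 0 <= q < 1")
    n = max(1, math.ceil(math.log(1.0 - q) / math.log(1.0 - p)))
    # Guard against floating-point rounding at the ceiling boundary.
    while 1.0 - binomial_probability(n, 0, p) < q:
        n += 1
    while n > 1 and 1.0 - binomial_probability(n - 1, 0, p) >= q:
        n -= 1
    return n


print(min_population_size(0.05, 0.99))  # 90
```

For instance, a genotype with frequency 0.05 requires 90 individuals to be observed at least once with probability 0.99.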


    BREEDING PLAN
For introgression of a recessive gene with combined foreground and background selection to reduce the intact donor chromosome segment around the target gene, we propose a breeding plan designed to produce, with minimum expenditure, at least one individual of genotype t+ ∈ T+ by generation BC2 at the latest. Such a breeding plan is fully described by answering the following questions.

  1. What is the necessary population size n1 in BC1?
  2. Suppose the marker genotypes of the n1 BC1 individuals are known. Which marker genotypes and how many individuals of each should be selected as parents for further backcrossing?
  3. What should be the size fg of a BC2 family produced from a selected BC1 individual of genotype g?

Population Size and Marker Analyses in BC1
Our approach for choosing n1 rests on the fact that, even with a population size of several hundred individuals, the chances of finding a BC1 individual of genotype t+ ∈ T+ are small, because this requires double crossovers in a short chromosome region. In most cases, the overall goal is reached by recombination between the target gene and a background selection marker on one side in generation BC1 and an analogous recombination on the other side of the target gene in generation BC2. Hence, a realistic goal for generation BC1 is to produce at least one individual that is (i) heterozygous for at least one foreground selection marker, (ii) homozygous for the recurrent parent allele at at least one background selection marker, and (iii) a carrier of the target gene. These three conditions are fulfilled by any individual with multilocus genotype g+ ∈ G+, but only the first two can be verified by marker assays.

The minimum sample size n1 that assures with probability q1 that at least one individual with genotype g+ ∈ G+ occurs in the BC1 population is derived from Eq. [11] as

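$$n_1 = \left\lceil \frac{\ln(1 - q_1)}{\ln\left(1 - \sum_{g^+ \in G^+} p_{0,g^+}\right)} \right\rceil \qquad (12)$$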
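Table 1 is not reproduced in this text, so the sketch below obtains the sum of p0,g+ over G+ needed for Eq. [12] in a different way: it enumerates the allele patterns of F1 gametes at yl, ml, x, mr, and yr under the no-interference model and keeps those with Yl + Yr ≤ 1, Ml + Mr ≥ 1, and X = 1. It then applies n1 = ⌈ln(1 - q1)/ln(1 - Σ p0,g+)⌉. Two foreground selection markers are assumed, and the function names are hypothetical.

```python
import math
from itertools import product


def haldane(d: float) -> float:
    """Recombination probability for a map distance d in Morgans (Eq. [1])."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def p0_G_plus(d1, d2, delta1, delta2):
    """Probability that a BC1 individual has a genotype in G+ (two foreground markers).

    Sums F1 gamete probabilities over allele patterns at (yl, ml, x, mr, yr)
    with Yl + Yr <= 1, Ml + Mr >= 1 and X = 1 (1 = donor allele).
    """
    distances = [d1 - delta1, delta1, delta2, d2 - delta2]
    total = 0.0
    for pattern in product((0, 1), repeat=5):
        yl, ml, x, mr, yr = pattern
        if yl + yr <= 1 and ml + mr >= 1 and x == 1:
            prob = 0.5  # either parental allele at yl
            for left, right, dist in zip(pattern, pattern[1:], distances):
                r = haldane(dist)
                prob *= r if left != right else 1.0 - r
            total += prob
    return total


def bc1_size(q1, d1, d2, delta1, delta2):
    """Smallest n1 giving at least one G+ individual with probability q1."""
    p = p0_G_plus(d1, d2, delta1, delta2)
    return math.ceil(math.log(1.0 - q1) / math.log(1.0 - p)), p


n1, p = bc1_size(0.99, d1=0.10, d2=0.10, delta1=0.02, delta2=0.02)
print(f"sum of p0,g+ = {p:.4f}; n1 = {n1}")
```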

A generalization of Eq. [12], which allows one to determine n1 such that at least k individuals of genotype g+ ∈ G+ are present with probability q1, is presented in Appendix A.

The BC1 individuals are first analyzed for the presence of the donor allele at the foreground selection marker(s). Individuals carrying the donor allele at at least one foreground selection marker are subsequently analyzed at the background selection markers. All BC1 individuals with marker genotype g ∈ G are potential parents for generation BC2.

Selection of BC1 Individuals and Family Size in Generation BC2
When several individuals with different genotypes g ∈ G are found in the BC1 population, the experimenter must decide which of them, and how many, should be used as parents for producing the BC2 generation. This choice should be subject to the condition that a desired probability of success q2 is reached with a minimum number of individuals. Denote by (i) (og)g∈G the number of individuals with genotype g observed in BC1, (ii) (ig)g∈G the number of BC1 individuals with genotype g used for further backcrossing, and (iii) (fg)g∈G the size of a BC2 family produced from a BC1 individual with genotype g. If only a few BC1 individuals with genotype g ∈ G are found and pg+|0,g or pg+,T+ is small, it may be necessary to back up one generation and backcross F1 individuals to the recurrent parent. We denote the respective parameters by i0 and f0 and set o0 = 1.

A parameter setting for producing the BC2 generation, consisting of the number ig of individuals to be backcrossed and the respective family size fg for each marker genotype g ∈ G0, is denoted by (ig, fg)g∈G0. The set of all admissible parameter settings is determined by (og)g∈G, the maximum possible family size m (which is determined either by the multiplication rate of the species or by the resources of the breeder), and the desired probability of success q2:

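$$\left\{ (i_g, f_g)_{g \in G_0} \;:\; i_g \in \{0, 1, \ldots, o_g\},\ f_g \in \{1, \ldots, m\},\ q \ge q_2 \right\} \qquad (13)$$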

The probability q of recovering at least one progeny of genotype t+ ∈ T+ when using the parameter setting (ig, fg)g∈G0 is calculated as

$$q = 1 - \prod_{g \in G_0} \left[ 1 - q_g(i_g, f_g) \right] \qquad (14)$$
where qg(ig, fg) is the probability of finding at least one individual with genotype t+ ∈ T+ among the ig BC2 families of size fg

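$$q_g(i_g, f_g) = 1 - \left\{ 1 - p_{g^+|0,g} \left[ 1 - \left(1 - p_{g^+,T^+}\right)^{f_g} \right] \right\}^{i_g} \qquad (15)$$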
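The probability of success can be sketched in a few lines. This Python sketch assumes that qg has the form qg = 1 - {1 - pg+|0,g[1 - (1 - pg+,T+)^fg]}^ig: a BC2 family succeeds if its BC1 parent carries the target gene and at least one of its fg progeny has genotype t+ ∈ T+. Independent families are combined as q = 1 - ∏[1 - qg(ig, fg)]. This form reproduces the limits stated in the Discussion (qg = pg,T+ for ig = fg = 1, and 1 - B(ig, 0, pg+|0,g) for large fg). The probabilities in the usage example are hypothetical, not values from Table 1.

```python
def q_g(i: int, f: int, p_carrier: float, p_success: float) -> float:
    """Probability of at least one t+ progeny among i BC2 families of size f.

    p_carrier = p_{g+|0,g}; p_success = p_{g+,T+}. For F1 parents, p_carrier = 1.
    """
    per_family = p_carrier * (1.0 - (1.0 - p_success) ** f)
    return 1.0 - (1.0 - per_family) ** i


def q_total(settings: dict) -> float:
    """Combine independent families; settings maps g -> (i_g, f_g, p_carrier, p_success)."""
    fail = 1.0
    for i, f, p_carrier, p_success in settings.values():
        fail *= 1.0 - q_g(i, f, p_carrier, p_success)
    return 1.0 - fail


# Hypothetical probabilities, for illustration only.
settings = {
    "g1": (1, 30, 0.999, 0.08),
    "g2": (2, 20, 0.50, 0.15),
}
print(round(q_total(settings), 4))
```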

In Appendix A, we give an extension of Eq. [14] that can be used to calculate the probability that at least k individuals of genotype t+ ∈ T+ are found with the parameter setting (ig, fg)g∈G0.

The number of individuals required for the parameter setting (ig, fg)g∈G0 is

$$n_2 = \sum_{g \in G_0} i_g f_g \qquad (16)$$
and the optimum parameter setting (ig, fg)g∈G0 is the one requiring the smallest number of individuals among all admissible parameter settings

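$$\left(i_g^{*}, f_g^{*}\right)_{g \in G_0} = \underset{(i_g, f_g)_{g \in G_0}\ \text{admissible}}{\arg\min} \; \sum_{g \in G_0} i_g f_g \qquad (17)$$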

There is no closed analytical solution for the minimization problem in Eq. [17]. To find a suitable parameter setting, we propose calculating the probability of success q (Eq. [14]) for various parameter settings (ig, fg)g∈G0 and choosing the admissible setting that requires the smallest number of individuals.
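For a handful of observed genotypes, the minimization can be sketched as an exhaustive search over all admissible (ig, fg) combinations, skipping any setting that cannot beat the current optimum. The paper does not describe how its own program searched the parameter space; this brute-force version is an assumption that is practical only when few genotypes are observed and m is small. It assumes qg = 1 - {1 - pg+|0,g[1 - (1 - pg+,T+)^fg]}^ig. Genotype labels and probabilities in the usage example are hypothetical.

```python
from itertools import product


def q_g(i, f, p_carrier, p_success):
    """Success probability of i BC2 families of size f from genotype g (zero if i = 0)."""
    if i == 0:
        return 0.0
    per_family = p_carrier * (1.0 - (1.0 - p_success) ** f)
    return 1.0 - (1.0 - per_family) ** i


def optimum_setting(genotypes, m, q2):
    """Exhaustive search for the admissible setting with the fewest BC2 individuals.

    genotypes maps g -> (o_g, p_{g+|0,g}, p_{g+,T+}); F1 parents enter with o_0 = 1
    and p_{g+|0,g} = 1. Returns (n2, {g: (i_g, f_g)}) or None if q2 is unreachable.
    """
    names = list(genotypes)
    options = [
        [(0, 0)] + [(i, f) for i in range(1, genotypes[g][0] + 1) for f in range(1, m + 1)]
        for g in names
    ]
    best = None
    for combo in product(*options):
        n2 = sum(i * f for i, f in combo)
        if best is not None and n2 >= best[0]:
            continue  # cannot improve on the current optimum
        fail = 1.0
        for g, (i, f) in zip(names, combo):
            _, p_carrier, p_success = genotypes[g]
            fail *= 1.0 - q_g(i, f, p_carrier, p_success)
        if 1.0 - fail >= q2:
            best = (n2, dict(zip(names, combo)))
    return best


# Hypothetical BC1 outcome; probabilities are illustrative only.
observed = {
    "g1": (2, 0.999, 0.08),
    "g2": (1, 0.50, 0.15),
    "F1": (1, 1.0, 0.01),
}
print(optimum_setting(observed, m=40, q2=0.99))
```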

Before calculating q for alternative parameter settings, it is useful to order the marker genotypes observed in the BC1 population with respect to their probability of producing a backcross progeny of genotype t+ ∈ T+ as follows:

$$g \succ h \iff p_{g,T^+} > p_{h,T^+}, \qquad g, h \in G_0 \qquad (18)$$

This provides a clue as to which marker genotypes should preferentially be backcrossed. When preliminary information is available about the number ig of individuals to be backcrossed and the family size fg to be used, a more adequate ordering of the genotypes with respect to their contribution to the total probability of success (Eq. [14]) can be obtained by defining

$$g \succ h \iff q_g(i_g, f_g) > q_h(i_h, f_h) \qquad (19)$$
where ih fh = ig fg.
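Both orderings are easy to compute once pg+|0,g and pg+,T+ are known. This sketch ranks genotypes by pg,T+ = pg+|0,g · pg+,T+, the single-progeny criterion behind Eq. [18], and by qg evaluated at a common (i, f) for every genotype, which is one simple way to satisfy ih fh = ig fg. It assumes the same form of qg as above. The two genotypes and their probabilities are hypothetical, chosen to mimic one heterozygous versus two heterozygous foreground selection markers.

```python
def q_g(i, f, p_carrier, p_success):
    """Success probability of i BC2 families of size f from genotype g."""
    per_family = p_carrier * (1.0 - (1.0 - p_success) ** f)
    return 1.0 - (1.0 - per_family) ** i


def order_by_single_progeny(genotypes):
    """Rank by p_{g,T+} = p_{g+|0,g} * p_{g+,T+}, largest first."""
    return sorted(genotypes, key=lambda g: genotypes[g][0] * genotypes[g][1], reverse=True)


def order_by_q(genotypes, i, f):
    """Rank by q_g at a common (i, f), so that i_h f_h = i_g f_g holds."""
    return sorted(genotypes, key=lambda g: q_g(i, f, *genotypes[g]), reverse=True)


# Hypothetical (p_{g+|0,g}, p_{g+,T+}): "A" has one heterozygous foreground marker, "B" two.
genotypes = {"A": (0.50, 0.20), "B": (0.999, 0.09)}
print(order_by_single_progeny(genotypes))  # ['A', 'B']
print(order_by_q(genotypes, i=1, f=50))    # ['B', 'A']
```

The ranking reverses between a single progeny and a family of 50, which is the kind of behavior discussed for Fig. 2.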

Marker Analysis of BC2 Individuals and Progeny Testing
All BC2 individuals are assayed at the markers that are heterozygous in their BC1 parent. BC2 individuals with marker genotype t ∈ T are either selfed or crossed to the donor parent. The presence of the target gene in a backcross individual is determined by phenotypic evaluation of its progeny obtained either by selfing or by crossing with the donor parent.


    DISCUSSION
Genetic Model
Following earlier studies (Hospital et al., 1992; Visscher et al., 1996; Hospital and Charcosset, 1997), we used Haldane's (1919) mapping function for modeling crossover formation during meiosis. It is well known that this is a simplified model because of the assumption of no interference (Stam, 1979). Since Haldane's pioneering paper, numerous researchers (e.g., Kosambi, 1944; Karlin and Liberman, 1978; Zhao and Speed, 1996; Browning, 2000) have proposed alternative mathematical models that include interference. Most of the resulting map functions can be modeled by a stationary renewal process whose interevent distribution can be approximated by gamma distributions (Zhao and Speed, 1996). McPeek and Speed (1995) compared the fit of various crossover formation models and concluded that the gamma interevent distribution fit the Drosophila data set of Morgan et al. (1935) best.

We used Haldane's (1919) mapping function because of its mathematical simplicity and the stochastic independence of crossover formations in adjacent chromosome regions which allowed us to derive closed analytical formulas for the problems addressed in this paper. Applying gamma interevent distributions would in most instances yield unwieldy formulas which could only be numerically approximated. Moreover, as pointed out by Stam and Zeven (1981), dropping the assumption of no interference would reduce the generality of the presented approach because it would be necessary to know the type and degree of interference for the chromosome region of each target gene.

Under the assumption of positive chiasma interference (Stam, 1979), multiple crossovers in a given chromosome region occur less frequently than under the assumption of no interference. Consequently, if the target gene is located in a region with positive interference, the population sizes obtained from our equations are underestimated. The reverse holds under negative interference. In conclusion, the reader should be aware that the model presented (as with most mathematical models of biological systems) is not capable of capturing every detail of the underlying biological process, and the results presented should be interpreted with this in mind.

Comparison with Earlier Studies
Introgression of a recessive gene with combined foreground and background selection can be regarded as a special case of QTL introgression investigated by Hospital and Charcosset (1997) (one QTL with a zero-length confidence interval, two foreground selection markers, two background selection markers). Their Eq. [A.16] through [A.22] could be used to calculate the required population sizes of a breeding program for introgression of a recessive gene. Our approach differs from that of Hospital and Charcosset (1997) in three respects: (i) the definition of the goal of the breeding program, (ii) the selection strategy, and (iii) calculation of the population size required in each BC generation.

Concerning the goal of the breeding program, we propose to choose n2 such that at least one BC2 individual with genotype t+ ∈ T+ is obtained with probability q2. In contrast, Hospital and Charcosset (1997) use in their Eq. [A.16] through [A.22] the probability of finding at least one individual with marker genotype yl- ml+ mr+ yr-, but they do not include a condition on the presence of the target gene in their criterion. (Note: By modifying the definition of the probability PM used in their paper, their approach could be used to determine n1 and n2 such that at least one individual with genotype t+ ∈ T+ is generated with a certain probability.)

The main differences with respect to the selection strategy are that (i) we propose to select as many promising BC1 individuals as required to reach a desired probability of success q2 in generation BC2, whereas Hospital and Charcosset (1997) based their calculations on selection of a single BC1 individual; and (ii) we consider all BC1 individuals with marker genotype g ∈ G as potential parents for producing generation BC2 and select individuals of one or several genotypes g ∈ G0 on the basis of their effect on the probability q, i.e., depending on the marker distances d1, d2, δ1, and δ2. In contrast, Hospital and Charcosset (1997) propose to select an individual with all foreground selection markers carrying the donor allele, and they do not distinguish between individuals of genotype yl- and yr-, even when d1 ≠ d2.

In our approach, the number of BC1 individuals selected for further backcrossing and the family size of their BC2 progeny are determined once the marker genotypes of the BC1 individuals are known (i.e., a posteriori). In contrast, Hospital and Charcosset (1997) propose to calculate the population size for all backcross generations before starting the breeding program (i.e., a priori). Taking into account the marker genotype of the selected BC1 individual(s) has two advantages: (i) only the number of BC2 individuals actually required to attain a given probability of success q2 is generated, and (ii) the desired probability of success q2 is reached irrespective of the outcome in generation BC1. Both properties follow directly from the Theorem of Total Probability.

The advantages of the a posteriori approach were previously demonstrated for the simpler case of marker-assisted background selection in combination with phenotypic selection for a dominant target gene (Frisch et al., 1999b). We give here only a short numerical example. Assume d1 = d2 = 0.10 M, δ1 = 0.03 M, and δ2 = 0.05 M. With our approach, the optimum population sizes to find with probability q2 = 0.99 at least one BC2 individual with genotype t+ ∈ T+ are n1 = 77 and n2 = 102 (Table 3). Applying the approach of Hospital and Charcosset (1997), the optimum population sizes to find with probability 0.99 at least one BC2 individual of genotype yl- ml+ mr+ yr- are n1 = 106 and n2 = 188. Besides requiring more than 100 additional individuals, an individual with marker genotype yl- ml+ mr+ yr- carries the target gene only with probability (1 - pc)(1 - pd)/(1 - ph) = 0.88.


Table 3. Estimates of the optimum population size n*1 and the respective expected population size E(n2) required to obtain with probability q2 = 0.99 at least one BC2 individual with genotype t+ ∈ T+ in two-generation backcross programs with two foreground and two background selection markers. The values of n*1 and E(n2) depend on the marker distances d1, d2, δ1, and δ2 (see Fig. 1).

 
Direct selection for the presence of a dominant target gene combined with marker-assisted background selection, investigated in a recent study by Frisch et al. (1999b), can be considered a special case of our treatment of combined foreground and background selection, obtained by setting δ1 = δ2 = 0. In this case, the target gene cosegregates perfectly with the marker alleles, and indirect selection simplifies to direct selection. Consequently, only three of the nine genotypes listed for two foreground selection markers in Table 1 can occur; together with Genotype 0, they correspond exactly to Types 1 to 4 defined by Frisch et al. (1999b). Moreover, because pg,T+ is identical to pg+,T+, the ordering of the elements in G0 based on Eq. [18] reduces to the ordering proposed for a two-generation, marker-assisted backcross program for a dominant target gene (Frisch et al., 1999b). Furthermore, any probability of success q2 can be reached with each of these four genotypes with ig = 1 and a suitable family size fg calculated according to Eq. [15]. These family sizes correspond to the numbers n1 to n4 for a two-generation background selection program given by Frisch et al. (1999b).

Rationale of the Breeding Plan
Besides the selection strategy, the core of the proposed breeding plan consists of (i) the definition of the subset G of marker genotypes considered promising parents for producing generation BC2 and (ii) the definition of the subset T of marker genotypes that satisfy necessary conditions for a successful outcome of the backcross program. Marker genotypes with Ml + Mr = 0 are not included in G because with high probability they do not carry the target gene. Furthermore, homozygosity for the recurrent parent allele at both foreground selection markers results in pω,T+ = 0 for all ω ∈ ΩA with Ml + Mr = 0 and, hence, qω(iω, fω) = 0 for arbitrary iω and fω. Likewise, genotypes with Yl + Yr = 2 are excluded from G because, with respect to the goal of reducing the donor genome around the target gene, they show no improvement compared with F1 individuals. In addition, they may have lost the target gene. In Appendix B, we give a mathematical proof that for all ω ∈ ΩA with Yl + Yr = 2 and arbitrary i and f, qω(i, f) < q0(i, f) (i.e., each of these genotypes performs worse than F1 individuals in producing BC progeny with genotype t+ ∈ T+).

The definition of G is also closely related to the question of how to proceed if no individual with genotype g ∈ G is found in generation BC1. In principle, one can either back up one generation and use an F1 individual or backcross a BC1 individual with Yl + Yr = 2. Two aspects must be considered in this choice: (i) F1 individuals carry the target gene with probability 1, whereas for BC1 individuals with δ1 > 0 and δ2 > 0, pω+|0,ω < 1 for ω ∈ ΩA with Yl + Yr = 2; and (ii) on the noncarrier chromosomes, F1 individuals have an expected proportion of the recurrent parent genome of 0.50, compared with 0.75 for BC1 individuals. Hence, with two tightly linked foreground selection markers and a BC1 individual with genotype yl+ ml+ mr+ yr+, the advantage of a higher proportion of recurrent parent genome on the noncarrier chromosomes may be worth the small risk of losing the target gene. However, when only genotypes yl+ ml- mr+ yr+ or yl+ ml+ mr- yr+ are found in BC1, backing up one generation may be more appropriate. In this treatise, we concentrate on the region around the target gene and defined G0 = G ∪ {0}. However, replacing this definition with G0 = G ∪ {yl+ ml+ mr+ yr+} permits using the given framework of equations for breeding programs in which BC1 individuals of genotype yl+ ml+ mr+ yr+ are preferred over F1 individuals.

Individuals with Yl + Yr = 0 and Ml + Mr ≥ 1 form the set T. Homozygosity at both background selection markers guarantees a donor chromosome segment shorter than d1 + d2. Although this also applies to the marker genotypes yl+ ml- mr+ yr- and yl- ml+ mr- yr+, they are excluded from T because heterozygosity at a background selection marker indicates a second donor chromosome segment tightly linked to the target gene; hence, these genotypes do not achieve the ultimate goal of reducing the donor genome around the target gene.

Selection Strategy
The ranking of genotypes g ∈ G according to Eq. [18] guarantees a maximum qg for ig = 1 and fg = 1 because, in this case, Eq. [15] reduces to pg+|0,g pg+,T+, which equals pg,T+. However, for ig > 1 and/or fg > 1, the ranking of the genotypes with respect to the value of qg reached with a given n2g = ig fg does not necessarily remain the same. For a given ig and large family sizes fg, the probability qg converges to 1 - B(ig, 0, pg+|0,g), whereas for a fixed fg and increasing ig, qg converges to 1.

This is illustrated in Fig. 2 for marker distances d1 = d2 = 0.10 M and δ1 = δ2 = 0.02 M. For ig = 1 and increasing fg, the probability qg converges to 0.50 and 0.99 for BC1 individuals with Ml + Mr = 1 and Ml + Mr = 2, respectively, and to 1.00 for F1 individuals. Up to a family size of about 10, the initial ranking of the genotypes guarantees a maximum qg, whereas with larger family sizes the genotypes with Ml + Mr = 2 and Yl + Yr = 1 reach higher qg than genotypes with Ml + Mr = 1 and Yl + Yr = 2. With family sizes larger than about 200, F1 individuals reach larger values of q0 than all genotypes with Ml + Mr = 1. Note that the intersection of the curves for genotypes g, g' ∈ G0 can be obtained algebraically with the aid of Eq. [15]. For fg = 1 and increasing ig, the initial ranking of the genotypes remains unchanged.
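The limits quoted for Fig. 2 can be reproduced from the flanking-marker genotype alone. With ig = 1, qg tends to pg+|0,g as fg grows, and for a BC1 individual derived from an F1 gamete, the probability that x carries the donor allele given the alleles at ml and mr follows from Eq. [1] and Bayes' rule. The sketch assumes two foreground selection markers; under no interference, the background markers carry no further information about x once both flanking markers are observed. The function name is hypothetical, and labels such as "ml+ mr-" follow the c+/c- notation of the Model section.

```python
import math


def haldane(d: float) -> float:
    """Recombination probability for a map distance d in Morgans (Eq. [1])."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def carrier_probabilities(delta1: float, delta2: float) -> dict:
    """P(BC1 individual carries the target gene | genotype at ml and mr)."""
    pc, pd, ph = haldane(delta1), haldane(delta2), haldane(delta1 + delta2)
    return {
        "ml+ mr+": (1 - pc) * (1 - pd) / (1 - ph),  # no recombination in [ml, x] or [x, mr]
        "ml+ mr-": (1 - pc) * pd / ph,              # the single recombination lies in [x, mr]
        "ml- mr+": pc * (1 - pd) / ph,              # the single recombination lies in [ml, x]
    }


for genotype, prob in carrier_probabilities(0.02, 0.02).items():
    print(f"{genotype}: limit of qg for ig = 1 is {prob:.4f}")
```

With δ1 = δ2 = 0.02 M, one heterozygous flanking marker gives 0.50 by symmetry, and two heterozygous flanking markers give about 0.9996.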



Fig. 2. Probability qg that at least one BC progeny with genotype t+ ∈ T+ is generated by backcrossing individuals of Genotype g ∈ G0. In the left diagram (A), the family size fg derived from one backcrossed individual (ig = 1) is increased. In the right diagram (B), the number of backcrossed individuals ig is increased for family size fg = 1. Marker distances are d1 = d2 = 0.10 M and δ1 = δ2 = 0.02 M.

 
For the selection of BC1 individuals, these properties of qg have the following consequences: (i) it may not be possible to reach a desired q2 by backcrossing only one individual; in particular, when individuals with only one heterozygous foreground selection marker are selected, ig must be chosen sufficiently large; (ii) for each genotype g ∈ G, there is a family size beyond which further increments in fg yield only a marginal gain in qg; and (iii) when choosing individuals as parents for generation BC2, the comparison of which genotypes to prefer (Eq. [14] and [15]) must be made with the family sizes that will be employed in the breeding program.

Optimum Allocation of Resources
Because the number of selected individuals and the family sizes for generation BC2 are determined after the outcome of generation BC1 is known, these parameters can be chosen such that any desired probability of success 0 ≤ q2 < 1 is reached, irrespective of the choice of q1. Nevertheless, q1 is one of the key parameters in determining the optimum design of a breeding program. Small values of q1 result in small n1. Consequently, the probability of finding BC1 individuals that produce BC2 individuals of genotype t+ ∈ T+ with high probability is low, and a large number n2 of BC2 individuals must be produced to reach a desired q2. In contrast, large values of q1 result in large BC1 populations and, consequently, a high probability of finding BC1 individuals that require a smaller population size n2 to reach a given value of q2 in generation BC2.

We investigated the effect of the choice of q1 on the expected total number of individuals N = n1 + E(n2) required in two-generation backcross programs with computer simulations (the computer program is available upon request), where E(n2) is the expected population size required in generation BC2. Population sizes n1 were chosen such that q1 ranged from 1 - 10^(-0.5) = 0.683772 to 1 - 10^(-5) = 0.99999 (Eq. [12]). For each value of n1, we generated 20000 BC1 populations with a Monte Carlo method. For each population, the parameter space was searched for the combination of selection parameters (ig, fg), g ∈ G0, requiring the minimum number of individuals n2. The results were averaged over the replicates to obtain an estimate of E(n2).
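The grid of q1 values can be translated into BC1 population sizes. The sketch below assumes that Eq. [12], which is not reproduced in this section, reduces for k = 1 to the smallest n1 with 1 - (1 - p)^n1 >= q1, where p is the probability that a BC1 individual has genotype g+ G+; the value p = 0.05 is hypothetical.

```python
import math


def n1_for_q1(q1: float, p: float) -> int:
    """Smallest BC1 size n with P(at least one g+ G+ individual) = 1 - (1 - p)**n >= q1."""
    return math.ceil(math.log(1 - q1) / math.log(1 - p))


if __name__ == "__main__":
    p = 0.05  # hypothetical frequency of genotype g+ G+ in BC1
    # q1 = 1 - 10**(-e) for e = 0.5, 1.0, ..., 5.0, spanning 0.683772 to 0.99999
    for e in [0.5 * j for j in range(1, 11)]:
        q1 = 1 - 10 ** (-e)
        print(f"q1 = {q1:.6f} -> n1 = {n1_for_q1(q1, p)}")
```

Spacing the exponent evenly gives a grid that becomes denser as q1 approaches 1, which is where the optimum values q*1 reported below lie.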

In a first series of simulations, we determined E(n2) required to reach q2 values of 0.900, 0.990, and 0.999 for marker distances d1 = d2 = 0.10 M and δ1 = δ2 = 0.02 M. For q2 = 0.900 and q2 = 0.990, the optimum values q*1 = 0.987 and q*1 = 0.994 minimizing N were greater than the respective value of q2, whereas for q2 = 0.999 the optimum value q*1 = 0.998 was smaller than q2 (Fig. 3). However, for q2 = 0.990 and q2 = 0.999, the slopes of the graphs were small, and the choice q1 = q2 resulted in a design that required only a few more individuals than the optimum design. This shows that the obvious choice q1 = q2 is in general not optimal with respect to N, but it is fairly close to the optimum for q2 = 0.990 and q2 = 0.999.



 
Fig. 3. Estimates of the expected total number of individuals n1 + E(n2) required in a two-generation backcross program (marker distances: δ1 = δ2 = 0.02 M, d1 = d2 = 0.10 M) to generate with probability q2 = 0.900, 0.990, or 0.999 at least one BC2 individual with genotype t+ T+. The values depend on the probability q1 of obtaining at least one BC1 individual with genotype g+ G+.

 
With a second series of simulations, we determined the optimum values n*1 minimizing N for varying marker distances and probability q2 = 0.99. For all investigated combinations of marker distances d1, d2, δ1, and δ2, the optimum design required larger populations in generation BC2 than in generation BC1, irrespective of whether one or two foreground selection markers were employed (Tables 2 and 3). For constant d1 and d2, the ratio n1:E(n2) increases with increasing δ1 and δ2. Tight linkage between the target gene and the foreground selection markers was important with respect to the total number of individuals required. For example, a breeding program for introgression of a target gene in the center of a 20-cM background selection marker bracket required on average a total of 145 individuals when the target gene was completely linked to the foreground selection marker (δ1 = 0) (Table 2). Almost the same number of individuals (147) is required when two foreground selection markers, each 1 cM from the target gene, are used (Table 3), because the probability of a double crossover in a 2-cM chromosome region is very low. However, with only one foreground selection marker located 5 cM from the target gene, a total of 252 individuals is required (Table 2).
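The contrast between one foreground selection marker 5 cM from the target gene and two markers 1 cM on either side can be made concrete. The sketch assumes Haldane's mapping function (no crossover interference), which this section does not state, and computes the probability that a gamete from a heterozygous parent has lost the donor allele at the target gene although it carries the donor allele at the foreground selection marker(s).

```python
import math


def haldane_r(d: float) -> float:
    """Recombination frequency for map distance d in Morgans (Haldane, no interference)."""
    return 0.5 * (1 - math.exp(-2 * d))


def p_loss_one_marker(delta: float) -> float:
    """P(gamete lacks the donor target allele | donor allele at one marker delta M away)."""
    return haldane_r(delta)


def p_loss_two_markers(delta1: float, delta2: float) -> float:
    """P(gamete lacks the donor target allele | donor alleles at both flanking markers).

    Without interference, losing the target requires a crossover in both intervals.
    """
    r1, r2 = haldane_r(delta1), haldane_r(delta2)
    double = r1 * r2                # donor-recipient-donor gamete
    no_cross = (1 - r1) * (1 - r2)  # donor-donor-donor gamete
    return double / (double + no_cross)


if __name__ == "__main__":
    print(f"one marker at 5 cM:  {p_loss_one_marker(0.05):.5f}")
    print(f"two markers at 1 cM: {p_loss_two_markers(0.01, 0.01):.6f}")
```

Under these assumptions, about 1 in 21 individuals selected on the single marker at 5 cM lacks the target gene, versus about 1 in 10,000 with the flanking pair at 1 cM, which is consistent with the nearly identical totals for complete linkage and for two tightly flanking markers.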


 
Table 2. Estimates of the optimum population size n*1 and the respective expected population size E(n2) required to obtain with probability q2 = 0.99 at least one BC2 individual with genotype t+ T+ in two-generation backcross programs with one foreground and two background selection markers. The values of n*1 and E(n2) depend on the marker distances d1, d2, and δ2 (see Fig. 1).

 
A short background selection marker bracket requires considerably more individuals than a longer one. For example, with one foreground selection marker at distance δ2 = 0.05 M, the optimum design requires a total of 106 individuals for d1 = d2 = 0.20 M, whereas it requires 252 individuals for d1 = d2 = 0.10 M (Table 2). This reflects the high cost of obtaining an individual with a short donor chromosome segment around the target gene.

Instead of using the total number of required individuals as the criterion for optimization, one could also optimize the breeding program such that the total number of marker data points is minimal. We chose the first criterion because, when background selection is used to reduce the donor chromosome segment around the target gene, the difference in the required number of marker data points between alternative parameter settings is small, and DNA extraction is the major cost factor. Furthermore, given new developments in marker technologies, we expect the cost of marker assays to decrease further; hence, optimizing for the required number of individuals is more important from an economic point of view.

Numerical Example
We demonstrate the application of our approach to a breeding program with a numerical, step-by-step example. The first decision concerns the flanking marker distances. In general, small flanking marker distances are advantageous because (i) heterozygosity at tightly linked foreground selection markers implies a high probability that an individual carries the target gene, and (ii) homozygosity at tightly linked background selection markers implies a short donor chromosome segment around the target gene. Here, we use the marker distances and probability of success from the example in the section "Comparison with Earlier Studies" (d1 = d2 = 0.1 M, δ1 = 0.03 M, δ2 = 0.05 M, and q2 = 0.99) and a maximum possible family size of m = 200.

We choose the population size n1 = 77 because this value minimizes the expected total number n1 + E(n2) of individuals required for the gene introgression program (Table 3). Let us assume that we marker-assayed the BC1 population and found the numbers (og), g ∈ G, of individuals with marker genotype g listed in Table 4. We now rank the observed marker genotypes according to Eq. [18] (Ranking 1) and Eq. [19] (Ranking 2). For Ranking 2, we use ig = 1 and fg = 102, because this corresponds to the expected population size E(n2) = 102 (Table 3) under the considered parameter settings. Marker genotype y-lm+lm-ry-r is the most favorable under Ranking 1, whereas marker genotype y+lm+lm+ry-r is the most favorable under Ranking 2. Therefore, we first consider backcrossing individuals of these two genotypes.


 
Table 4. Parameters for generation BC1 in the numerical example (for a detailed description and definitions of symbols, see text). r1 and r2 are the ranks of the observed marker genotypes according to Eq. [18] and [19], respectively.

 
We try to find the smallest family size for which q2 >= 0.99 when exactly one individual of either of these marker genotypes, and no individual of any other marker genotype, is selected. Although for (ig, fg) = (1, 10) the marker genotype y-lm+lm-ry-r yields a higher q2 value than y+lm+lm+ry-r, it is not possible to reach q2 >= 0.62 by backcrossing only one individual of marker genotype y-lm+lm-ry-r, even with the maximum family size m = 200 (Table 5). (Note that for this marker genotype Pg+|0,g = 0.62 [Table 4]; see also the discussion of Fig. 2.) The minimum population size to reach q2 >= 0.99 by backcrossing one individual of marker genotype y+lm+lm+ry-r is n2 = 105. This number is reduced to n2 = 102 when two individuals instead of one of marker genotype y+lm+lm+ry-r are backcrossed.


 
Table 5. Alternative selection parameters (ig, fg) in the numerical example (for a detailed description see text) and the resulting population sizes n2 (Eq. [16]) and probabilities of success q (Eq. [14]).

 
Now we investigate parameter combinations in which individuals of two marker genotypes are selected. The previous calculations showed that for fg = 10 the marker genotype y-lm+lm-ry-r results in greater q2 values than the marker genotype y+lm+lm+ry-r. Therefore, we choose (ig, fg) = (1, 10) for g = y-lm+lm-ry-r. In combination with (ig, fg) = (2, 40) for g = y+lm+lm+ry-r, a probability q2 >= 0.99 is reached with n2 = 90 (Table 5). The combination of (ig, fg) = (1, 9) for g = y-lm+lm-ry-r with (ig, fg) = (1, 81) for g = y+lm+lm+ry-r also reaches q2 >= 0.99 with n2 = 90. For marker genotypes g = y+lm+lm+ry-r and g = y-lm+lm+ry+r, the probabilities pg,T+ are almost identical (Table 4). Hence, the parameter setting (ig, fg) = (1, 9) for g = y-lm+lm-ry-r and (ig, fg) = (1, 81) for g = y-lm+lm+ry+r also reaches q2 >= 0.99 with n2 = 90.
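The search over combinations of two marker genotypes can be sketched as a brute-force scan over family sizes. The sketch assumes that the families of different selected individuals are independent, so that q2 = 1 - (1 - qA)(1 - qB), and uses a per-genotype probability of the form 1 - [1 - p_carrier(1 - (1 - p_target)^f)]^i, which is our reading of Appendix B rather than a transcription of Eq. [14] or [15]. The genotype labels A and B, the function names, and all probability values are hypothetical stand-ins for the Table 4 entries.

```python
def q_genotype(i: int, f: int, p_carrier: float, p_target: float) -> float:
    """P(at least one t+ T+ progeny) from i individuals with f progeny each (closed form)."""
    return 1 - (1 - p_carrier * (1 - (1 - p_target) ** f)) ** i


def best_allocation(geno_a, geno_b, i_a, i_b, target=0.99, m=200):
    """Scan family sizes f_a, f_b in 0..m and return (n2, f_a, f_b, q2) with the
    smallest n2 = i_a*f_a + i_b*f_b that reaches q2 >= target, or None if none does.

    geno_a, geno_b: (p_carrier, p_target) tuples for the two marker genotypes.
    """
    best = None
    for f_a in range(m + 1):
        for f_b in range(m + 1):
            miss_a = 1 - q_genotype(i_a, f_a, *geno_a)
            miss_b = 1 - q_genotype(i_b, f_b, *geno_b)
            q2 = 1 - miss_a * miss_b  # independent families assumed
            n2 = i_a * f_a + i_b * f_b
            if q2 >= target and (best is None or n2 < best[0]):
                best = (n2, f_a, f_b, q2)
    return best


if __name__ == "__main__":
    geno_a = (0.62, 0.25)  # hypothetical: uncertain carrier, favourable recombinant
    geno_b = (0.95, 0.05)  # hypothetical: near-certain carrier, less favourable
    for i_b in (1, 2):
        print(f"i_a = 1, i_b = {i_b}:", best_allocation(geno_a, geno_b, 1, i_b))
```

With one individual of each hypothetical genotype, the target cannot be met, because the carrier probabilities cap q2 at 1 - 0.38 × 0.05 = 0.981, and the function returns None; adding a second individual of genotype B makes q2 >= 0.99 attainable. This mirrors the role of ig in the example above.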

Consequently, an optimum selection strategy is to backcross the BC1 individual with marker genotype y-lm+lm-ry-r with a family size of 9 individuals and to backcross one out of the five BC1 individuals with marker genotypes y+lm+lm+ry-r or y-lm+lm+ry+r with a family size of 81 individuals. These selection parameters combine a high selection intensity with a minimum number of required individuals.


    APPENDIX A
 
Obtaining k Individuals of Genotype g+ G+ or t+ T+
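The probability γ of finding at least k individuals of a genotype that occurs with probability p in a sample of n individuals is

$$\gamma = F_{2k,\,2(n-k+1)}\left(\frac{(n-k+1)\,p}{k\,(1-p)}\right) \tag{20}$$

(Bosch, 1993, p. 296), where F is the cumulative distribution function of the F distribution. By defining c as the q percentile of the F distribution with parameters 2k and 2(n - k + 1),

$$c = F^{-1}_{2k,\,2(n-k+1)}(q), \tag{21}$$

we obtain the minimum population size n required to find at least k individuals with probability q as

$$n = \min\left\{ n : \frac{(n-k+1)\,p}{k\,(1-p)} \ge c \right\}, \tag{22}$$

where c depends on n through the degrees of freedom in Eq. [21].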
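The probability in Eq. [20] can also be evaluated without an F-distribution routine, because it equals the upper tail P(X >= k) of a binomial variable X with parameters n and p. The sketch below computes this tail directly with the Python standard library and increases n until the probability reaches q; it is an equivalent numerical route, not the procedure of Bosch (1993), and the function names and the value p = 0.05 are hypothetical.

```python
from math import comb


def prob_at_least_k(n: int, k: int, p: float) -> float:
    """Probability of at least k individuals of a genotype with frequency p among n."""
    return 1 - sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


def min_population_size(k: int, p: float, q: float, n_max: int = 100_000) -> int:
    """Smallest n with prob_at_least_k(n, k, p) >= q (linear search starting at n = k)."""
    for n in range(k, n_max + 1):
        if prob_at_least_k(n, k, p) >= q:
            return n
    raise ValueError("q not reached within n_max")


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"k = {k}: n = {min_population_size(k, p=0.05, q=0.99)}")
```

For k = 1 the search reproduces the closed form n = ⌈ln(1 - q)/ln(1 - p)⌉. Where SciPy is available, `scipy.stats.binom.sf(k - 1, n, p)` returns the same tail probability.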

This result can be used to generalize the presented approach in order to generate at least k individuals of genotype g+ G+ in generation BC1 and/or at least k individuals of genotype t+ T+ in generation BC2. In generation BC1, the minimum sample size to generate at least k individuals of genotype g+ G+ with probability q1 is obtained by inserting into Eq. [21] and [22]:

(23)

For a given g ∈ G0, the probability of recovering at least k BC2 individuals of genotype t+ T+ is

(24)
where

(25)
and

(26)

Equation [24] can be used instead of Eq. [14] to compare alternative parameter settings.


    APPENDIX B
 
Proof of the proposition: For each genotype ω ∈ ΩA with Y1 + Y2 = 2, the probability qω(i, f) < q0(i, f).


For the remaining two cases, we make use of the fact that B(iω, s, pω+|0,ω) is an increasing function of pω+|0,ω and 1 - B(sfω, 0, pω+,T+) is an increasing function of pω+,T+. Hence, Eq. [15] implies that the proposition holds if pω+|0,ω < 1 and pω+,T+ <= p0,T+.


and

For reasons of symmetry, the proposition also holds for ω = y+lm+lm-ry+r.


Received for publication July 31, 2000.


    REFERENCES