Crop Science Journal of Natural Resources and Life Sciences Education
Crop Science 40:626-630 (2000)
© 2000 Crop Science Society of America

CROP BREEDING, GENETICS & CYTOLOGY

Efficiencies of F2 and Backcross Generations for Bulked Segregant Analysis Using Dominant Markers

I.J. Mackay (a) and P.D.S. Caligari (b)

(a) Oxagen Limited, Milton Park, Abingdon, Oxon 14 4RY, UK
(b) Dep. of Agric. Botany, School of Plant Sciences, The Univ. of Reading, Reading RG6 6AS, UK

p.d.s.caligari{at}reading.ac.uk


    ABSTRACT
Bulked segregant analysis (BSA) is being used increasingly as a screen for quantitative trait loci (QTL), which have been suggested to be more easily detected in backcross (Bc) populations than in the F2. However, for dominant markers the number of loci segregating in the F2 will be double that in the Bc, and the probability of false-positive results differs between F2 and Bc generations. This study was conducted to re-examine the relative value of Bc and F2 populations for use in BSA by using theoretical estimates of the genotypic composition of the selected bulks. It is shown that doubling the number of marker loci segregating in the F2 roughly halves the expected distance from the QTL to the nearest marker, while the bulk size in the F2 can be reduced to nearly one-half that of the Bc and still give the same probability of a false positive. The results show that for the same recombination frequency, the Bc is slightly superior to the F2 in its ability to detect QTL. However, if the likely distance of the nearest marker to the QTL is taken into account, the F2 is the more favorable generation. Overall, for dominant marker systems, the F2 is therefore the best generation in which to conduct BSA.

Abbreviations: a, average effect of an allele • AFLP, amplified fragment length polymorphism • Bc, backcross • BSA, bulked segregant analysis • cM, centimorgan • ISSR, inter-simple sequence repeat • PCR, polymerase chain reaction • QTL, quantitative trait locus • RAPD, random amplified polymorphic DNA


    INTRODUCTION
BULKED SEGREGANT ANALYSIS was initially proposed for screening qualitative traits known to express variation at a single locus of large effect (Giovannoni, 1991; Hill et al., 1998; Michelmore et al., 1991; Lynch and Walsh, 1997). However, the simplicity and low cost of BSA have led to its use for more complex traits, including traits whose genetic control is unknown. This method is often restricted to segregating generations which are simplest and cheapest to produce, such as Backcross (Bc) and F2 generations. Wang and Paterson (1994) concluded that of these two generations, the Bc would generally be the best with which to work. Since BSA is being used increasingly as a first screen for QTLs, we have re-examined their conclusion, to test its validity under a range of different conditions and assumptions outlined below.

First, for a dominant marker system such as RAPDs, ISSRs, and AFLPs, only half the markers are expected to be informative in any particular backcross generation, whereas in the F2 generation all markers will be informative. To compensate for this, the number of markers screened when studying backcross populations could be doubled (Wang and Paterson, 1994). However, the increase in cost and effort of doubling throughput cannot be ignored. If a doubling of the number of markers screened for the backcross generation is practical, it is logical that the same number should also be screened in the F2. For this reason, comparisons of the merit of the F2 and Bc generations should take place with twice the number of markers segregating in the F2.

Second, BSA relies on informative individuals being grouped so that a particular genomic region is studied against a randomized genetic background of unlinked loci (Michelmore et al., 1991). The minimum size of the samples comprising the bulks is generally determined by the frequency with which these unlinked loci might be detected as polymorphic between the two bulked samples. This sample size will differ between backcross and F2 populations and between dominant and co-dominant marker systems. In this paper we refer to the detection of polymorphism between bulks in the absence of linkage to a QTL as a false positive.

Finally, a marker may fail to be detected as polymorphic between the two bulks in spite of linkage to a QTL. Such a failure will occur if the frequency of the rare allele in a bulk is sufficiently high that it can alter the banding pattern to one indistinguishable from the alternative bulk. Following Wang and Paterson (1994), throughout this paper we refer to the rare allele in a bulk as the "contaminant" allele. Wang and Paterson (1994) considered that if the average allele frequency of the common allele within the two bulks (i.e., averaging over different alleles in each bulk) was greater than 0.95, then BSA would generally be successful in detecting the difference between the two bulks, since at this low frequency the rare allele would be undetectable (Gilbert et al., 1999). Allele frequency in a bulk may fall below this threshold for two reasons. First, the phenotypic effect of the QTL may be too small for selection to generate sufficiently large differences in allele frequency between bulks. Second, recombination between the QTL and linked marker may reduce the difference in marker allele frequency between the bulks to an undetectable level. Wang and Paterson (1994) discussed the first possibility, but not the second. Hill (1998) discussed both possibilities, but only considered the Bc generation.

The objectives of the work presented here were to reassess the conditions under which it is advantageous to use F2 as opposed to Bc generations when carrying out BSA and to determine the relative advantages and disadvantages of using these two segregating generations.


    Materials and methods
Distance of Markers from Target QTL
Bishop et al. (1983) derived the following formula for the probability of at least one marker within a specified map distance of a major gene:

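P = 1 - \frac{2C}{N+1}\left[\left(1 - \frac{X}{2L}\right)^{N+1} - \left(1 - \frac{X}{L}\right)^{N+1}\right] - \left(1 - \frac{CX}{L}\right)\left(1 - \frac{X}{L}\right)^{N} \qquad (1)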
in which C is the haploid number of chromosomes, L is the total map length in centimorgans, X/2 is the desired distance within which a marker must fall, and N is the number of segregating markers. This formula is valid only if X is less than the length of the shortest chromosome in the genome. Assuming that, in general, at least one chiasma must occur per chromosome, the minimum map length of a chromosome will be 50 cM, and this formula will be valid for desired distances (X/2) up to 25 cM. Since linkages in excess of 25 cM will be of little use in selecting for a linked QTL, this formula effectively covers the values of linkage of practical interest to plant breeders.
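The calculation behind Eq. [1] can be sketched in a few lines. The closed form below is an assumption of this sketch: it is the expression obtained when the QTL and the N markers are placed uniformly at random on C chromosomes of total length L, with a correction for chromosome ends. The function name and the example values are illustrative only.

```python
def prob_marker_within(C, L, X, N):
    """Probability that at least one of N markers lies within X/2 cM of a QTL.

    Assumed closed form (uniform random placement of the QTL and markers on
    C chromosomes with total map length L cM). Only meaningful while X is
    smaller than the shortest chromosome.
    """
    # QTL far enough from both chromosome ends: the window has full width X.
    interior = (1 - C * X / L) * (1 - X / L) ** N
    # QTL within X/2 of a chromosome end: the window is truncated.
    ends = 2 * C / (N + 1) * (
        (1 - X / (2 * L)) ** (N + 1) - (1 - X / L) ** (N + 1)
    )
    return 1 - interior - ends


# Illustrative genome: 10 chromosomes, 500 cM, marker wanted within 10 cM.
for n_markers in (50, 100, 200):
    print(n_markers, round(prob_marker_within(C=10, L=500, X=20, N=n_markers), 3))
```

With no markers the function returns zero, and the probability rises towards one as N grows, which is the behaviour the tabulated values rely on.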

Genome sizes were chosen to represent a typical range of sizes. For example, the smallest size shown here—10 chromosomes and a total map length of 500 cM—is close to that reported for sugar beet, which has a map length of about 500 cM and a haploid chromosome number of nine (Nilsson et al., 1997). The largest size reported here (5000 cM, 25 chromosomes) is similar to that reported for cotton (26 chromosomes, predicted minimum map length 5125 cM) (Reinisch et al., 1994).

Four target map intervals were chosen: 1 and 2 cM, to represent tight linkage, and 10 and 20 cM, to represent loose linkage.

Probability of False Positives
We define a false positive for a marker as the observation of a difference in banding pattern between the two BSA pools in the absence of linkage between that marker and the QTL.

The probability of a random bulk of sample size n having a band, and a second bulk of the same size drawn from the same population not having that band (the probability of a false positive) can be calculated by the following formulae:

(2)

(3)

(4)

The first of these formulae was given in Michelmore et al. (1991). These formulae do not assume that all individuals in each bulk are identical. For example, for the backcross, we assume that a mix of heterozygotes and homozygotes will give a different banding pattern to that for a pool which is all homozygous. For large pool sizes, a low frequency of contamination (one or at most two individuals) of an otherwise homozygous pool by heterozygotes or the alternative homozygote might not affect the banding pattern. This effect is not taken into account in Eq. [2] to [4]. As a result, comparisons between pool sizes will be biased in favor of the larger pool size, where the true false positive frequency will be relatively higher than calculated.
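As a rough illustration of the dominant-marker case, the sketch below assumes a per-marker false-positive probability of the form 2(1 - q^n)q^n, where q is the probability that a single individual lacks the band: q = 1/4 in the F2 (the form attributed to Michelmore et al., 1991) and q = 1/2 in the Bc. These forms, the function names, and the pool sizes are assumptions of the sketch, and no claim is made about which of them corresponds to which numbered equation.

```python
def false_positive_prob(q_no_band, n):
    """P(one bulk of n shows the band and the other bulk of n does not).

    q_no_band is the chance that one individual lacks the band at an
    unlinked dominant marker. The factor 2 covers either bulk being the
    one without the band.
    """
    all_lack_band = q_no_band ** n
    return 2 * (1 - all_lack_band) * all_lack_band


def fp_f2_dominant(n):
    # F2: only the recessive homozygote (frequency 1/4) lacks the band.
    return false_positive_prob(0.25, n)


def fp_bc_dominant(n):
    # Bc with a segregating marker: half the individuals lack the band.
    return false_positive_prob(0.5, n)


for n in (2, 5, 10):
    print(n, fp_f2_dominant(n), fp_bc_dominant(n))
```

A bulk containing even one band-carrying individual is counted as showing the band, which matches the stated assumption that low-level contamination always changes the banding pattern.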

Average Allele Frequency in a Bulk
Hill (1998) presented an exact method for calculating the genotypic composition of selected pools, given in formula [5] below.

(5)

P(n) is the probability P(n_1, ..., n_j, ..., n_k) that there are n_j individuals of genotype j out of a total of N individuals in the pool selected from a total of M individuals in the population. Genotype j has a frequency in the population q_j and a genotypic value, measured in phenotypic standard deviation units, a_j. Φ(y) denotes the distribution function and φ(y) the density function of the standardized normal distribution.

We have used this formula to calculate genotype and allele frequencies at the QTL for selected bulks from both F2 populations and Bcs. The average allele frequency at marker loci was then calculated using expected values of the allele frequency at the QTL and the specified recombination frequencies between the QTL and the markers. Genotype frequencies at the marker locus were calculated on the assumption that these frequencies followed a multinomial distribution. Hill (1998) also gives a method for exact calculation of genotype frequencies at linked loci, but for the F2 this method becomes complex. In all cases where the exact frequency was checked against the frequency assuming a multinomial distribution, agreement was very good. All calculations were carried out using a computer spreadsheet.
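The step from QTL frequency to marker frequency can be sketched for the backcross, where each locus has only two genotype classes and the multinomial reduces to a binomial. The sketch assumes the marker is linked in coupling to the QTL allele favoured in the bulk; the function names and the values 0.95, 0.05, and 9 are hypothetical inputs, not figures from the tables.

```python
from math import comb


def marker_freq_from_qtl(f_qtl, r):
    """Expected frequency of the coupled marker genotype in a Bc bulk.

    f_qtl: expected frequency of the favoured QTL genotype in the bulk.
    r:     recombination frequency between QTL and marker.
    Non-recombinants keep the QTL genotype's marker; recombinants swap it.
    """
    return f_qtl * (1 - r) + (1 - f_qtl) * r


def prob_contaminants(n, p_common, k):
    """Binomial probability of exactly k contaminant individuals in a bulk of n."""
    return comb(n, k) * (1 - p_common) ** k * p_common ** (n - k)


p_marker = marker_freq_from_qtl(f_qtl=0.95, r=0.05)  # hypothetical inputs
for k in (0, 1, 2):
    print(k, prob_contaminants(n=9, p_common=p_marker, k=k))
```

At r = 0 the marker frequency equals the QTL frequency, and at r = 0.5 it falls to one-half whatever the selection achieved at the QTL.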

An alternative method to calculate the allele frequencies in the selected pools is to calculate the relative proportions falling above or below a specified truncation point on a mixture of normal distributions (Wang and Paterson, 1994). In simulations, we found this method to slightly overestimate the frequency of the common allele in the bulk (results not shown). This was because, in the context of BSA, fixing the truncation point is not exactly equivalent to fixing the proportion selected. A similar effect was highlighted by Young (1976) in a study of sequential selection procedures in plant variety trials. Computer simulations in which truncation points were fixed and numbers selected were allowed to vary gave good agreement with the Wang and Paterson method (results not shown).

Note that both methods assume only that the residual variation within each QTL genotype is normally distributed. It is not a requirement that the total distribution of the segregating generation is normal.
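The truncation approach can be sketched as follows for the upper bulk of a backcross. The sketch assumes two QTL genotypes at equal frequency whose means differ by a, with normally distributed residuals of unit variance within each genotype; the truncation point is chosen so that the expected proportion selected equals the requested one. All names and example values are illustrative.

```python
import math


def normal_cdf(x):
    """Distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def upper_tail(t, means, freqs):
    """Proportion of the genotype mixture lying above truncation point t."""
    return sum(q * (1 - normal_cdf(t - m)) for m, q in zip(means, freqs))


def truncation_point(p_selected, means, freqs, tol=1e-10):
    """Bisection for the point above which a proportion p_selected falls."""
    lo, hi = min(means) - 10.0, max(means) + 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(mid, means, freqs) > p_selected:
            lo = mid  # too many selected, so move the cut-off up
        else:
            hi = mid
    return (lo + hi) / 2


def genotype_freqs_in_high_bulk(p_selected, means, freqs):
    """Expected genotype frequencies among individuals above the cut-off."""
    t = truncation_point(p_selected, means, freqs)
    tails = [q * (1 - normal_cdf(t - m)) for m, q in zip(means, freqs)]
    total = sum(tails)
    return [x / total for x in tails]


a = 1.5  # assumed difference between the two Bc genotype means
print(genotype_freqs_in_high_bulk(9 / 250, means=[a, 0.0], freqs=[0.5, 0.5]))
```

Because the cut-off is fixed and the number selected is left to vary, this corresponds to the fixed-truncation calculation rather than to picking exactly the top N of M individuals.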


    Results
Distance of Markers from Target QTL
The probability of detecting a marker within a specified distance, for a range of marker numbers and four genome sizes, is given in Table 1. Similar results have also been reported in the context of animal breeding (Haley, 1991). To a very good approximation, for the cases considered here, a halving of the number of segregating markers doubles the map interval within which at least one marker is expected to be found with the same probability. This follows since, if the desired interval containing a marker, X, is small relative to the total genome map length (L), and the number of segregating markers (N) is large, and ignoring the correction for chromosome ends, Eq. [1] reduces to

P \approx 1 - e^{-NX/L} \qquad (6)
from which it is clear that a doubling of marker number is exactly compensated by a halving of map interval. Equation [6] is equivalent to assuming that the number of markers falling in the specified interval follows a Poisson distribution with mean NX/L, and corresponds to the formula given by Jacob et al. (1991).
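The compensation between marker number and interval can be checked numerically. The sketch assumes the approximation takes the Poisson form 1 - exp(-NX/L); the function names and the genome size are illustrative.

```python
import math


def prob_marker_poisson(L, X, N):
    """Approximate probability of at least one marker in an interval of X cM."""
    return 1 - math.exp(-N * X / L)


def interval_for_probability(L, N, p):
    """Interval X (cM) at which the approximation reaches probability p."""
    return -L / N * math.log(1 - p)


L_total = 1000.0  # illustrative total map length in cM
for n_markers in (100, 200, 400):
    x = interval_for_probability(L_total, n_markers, 0.95)
    print(n_markers, round(x, 2), round(prob_marker_poisson(L_total, x, n_markers), 3))
```

Under this form the product NX is fixed for a given probability, so each doubling of N halves X exactly.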


Table 1 Probability of finding at least one marker within a desired interval of a randomly distributed QTL for different genome sizes and chromosome numbers

 
Table 2 gives the expected distance from a randomly distributed QTL within which at least one marker will be found for a range of specified probabilities. For any desired probability a doubling of the number of markers roughly halves this distance. Equation [6] gives poor agreement with these results for the larger calculated ranges, but agreement is acceptable where the expected distance is smaller, i.e., < 20 cM.
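Distances of this kind can be obtained by solving the exact expression for X at a chosen probability. The sketch below does this by bisection, assuming the closed form that follows from uniform random placement with end correction and assuming equal chromosome lengths, so that the search stops at X = L/C. Names and inputs are illustrative and the printed values are not the tabulated ones.

```python
def prob_exact(C, L, X, N):
    """Assumed exact probability of a marker within X/2 cM of the QTL."""
    interior = (1 - C * X / L) * (1 - X / L) ** N
    ends = 2 * C / (N + 1) * (
        (1 - X / (2 * L)) ** (N + 1) - (1 - X / L) ** (N + 1)
    )
    return 1 - interior - ends


def distance_for_probability(C, L, N, p, tol=1e-9):
    """Distance X/2 (cM) within which a marker is found with probability p.

    Returns None if p cannot be reached inside the range where the
    expression is valid (X up to one chromosome length, taken as L/C).
    """
    lo, hi = 0.0, L / C
    if prob_exact(C, L, hi, N) < p:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_exact(C, L, mid, N) < p:
            lo = mid
        else:
            hi = mid
    return hi / 2


for n_markers in (100, 200):
    print(n_markers, distance_for_probability(C=10, L=500, N=n_markers, p=0.95))
```

Comparing the two printed distances shows the roughly twofold change on doubling N, with the end correction accounting for the departure from an exact factor of two.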


Table 2 Distance in cM of closest marker to a given QTL for specified probability, given different genome sizes

 
Probability of False Positives
The probability and the average number of false positives increase as the number of markers increases (Table 3). However, for pool sizes of 10 or more, even with very large numbers of markers, the probability of finding a false positive is low. Since, with large numbers of markers, it is impossible for all markers to act independently, these probabilities will be biased upwards; the true probability of a false positive will be lower than those tabulated. It may be possible to calculate more accurate probabilities of false positives by assuming that the genome is saturated with markers and then calculating a genome-wise probability of a false positive. This would be achieved following the methods used to determine the significance of the maximum lod score in linkage analysis with multiple markers (Sham, 1998; Lander and Kruglyak, 1995), but this has not been attempted. Nevertheless, it remains clear that the probability of a false positive is much lower for the F2 than for the Bc generation. For a dominant marker, to a very good approximation, equivalent probabilities of a false positive are given with a pool size of N in the F2 and of 2N - 1 in the Bc. This formula assumes that twice the number of markers is expected to be segregating in the F2.
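The 2N - 1 rule can be recovered with a short calculation. This is a sketch under the assumption that the per-marker false-positive probability for a dominant marker is 2(1 - q^n)q^n, with q = 1/4 in the F2 and q = 1/2 in the Bc; M (markers segregating in the Bc) and n_Bc (the Bc pool size) are symbols introduced here for illustration.

```latex
\begin{align*}
E_{F_2} &= 2M \cdot 2\left[1 - \left(\tfrac{1}{4}\right)^{N}\right]\left(\tfrac{1}{4}\right)^{N}
          \approx 4M \cdot 2^{-2N}, \\
E_{Bc}  &= M \cdot 2\left[1 - \left(\tfrac{1}{2}\right)^{n_{Bc}}\right]\left(\tfrac{1}{2}\right)^{n_{Bc}}
          \approx 2M \cdot 2^{-n_{Bc}}, \\
E_{F_2} = E_{Bc} &\;\Longrightarrow\; 2^{-n_{Bc}} = 2^{\,1-2N}
          \;\Longrightarrow\; n_{Bc} = 2N - 1 .
\end{align*}
```

The approximation drops the bracketed terms, which are close to one for all but the smallest pools; for example, N = 5 in the F2 pairs with a pool of 9 in the Bc.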


Table 3 Probability and expected number of false-positive markers in bulked segregant analysis of Bc and F2 populations with varying pool sizes

 
Average Allele Frequency in a Bulk
Results are presented in Table 4, comparing probabilities of getting 0, 1, and 2 contaminant individuals in a bulk selected from a population of size 250 for different values of a, the average effect of the allele at the QTL. In all cases, pool size for the F2 was 5 and pool size for the Bc was 9, these values being chosen to give equivalent low probabilities of a false positive in the two population types. The QTL was assumed to be completely additive, and the phenotypic variance for each genotype at the QTL, regardless of population type, was taken as 1. It is clear that average allele frequencies are very similar for the Bc and F2 bulks with the different bulk sizes chosen here, although the allele frequencies in the Bc are slightly higher. This is also apparent from the probability of there being 0, 1, or 2 contaminants in the bulk. However, a contaminant allele is more likely to be detected in an F2 bulk of size 5 than in a Bc bulk of size 9.


Table 4 Allele frequencies and probability of contaminants for marker loci under bulked segregant analysis in F2 (pool size 5) and Bc (pool size 9) populations of size 250

 
For dominant markers, the probabilities given in Table 4 can be used directly to compare relative likelihoods of success of BSA in the Bc and F2. This is because, for both generations, a linked marker will be detected if one pool has an allele frequency below the threshold for detection of the dominant allele and the other pool has a frequency of the same allele above the same threshold. Therefore, for the population and pool sizes considered in Table 4, which have been chosen to be realistic, the similarity of probabilities between generations means that BSA using dominant markers to detect additive QTLs is of roughly equal power in both the F2 and the Bc. However, the probabilities in Table 4 are for single marker loci. Twice the number of dominant markers will be informative in the F2, so the expected recombination frequency between the closest marker in the F2 and the QTL will be half that in the Bc. Comparisons of pool composition should therefore take place at a recombination frequency in the Bc of twice that in the F2. In this case, there is a clear advantage to the F2 as the starting generation for BSA. This argument assumes that recombination frequency and map distance are equal, which is acceptable for the range of recombination frequencies (0–0.16) considered in Table 4.


    Discussion
A clear conclusion to come out of this study is that BSA has only a very weak probability of detecting a QTL, unless it is of very large effect. This agrees with statements by both Hill (1998) and Wang and Paterson (1994). Given that BSA is to be used, the results we have presented indicate that the F2 is more likely to detect a QTL, at least for an additive trait with a dominant marker system. This is a result of twice the number of dominant markers segregating in the F2 compared to the Bc, and of the size of the bulk in the F2 being nearly one-half that of the Bc for the same probability of a false positive. For a dominant trait, the F2 will allow detection of QTLs with smaller effect, although the power of the Bc is unchanged (Wang and Paterson, 1994). The data given in this paper can therefore be regarded as giving conservative estimates of the relative merit of the F2. However, for a dominant QTL, the number of informative dominant marker loci is reduced by one-half for both the Bc and the F2; i.e., the dominant marker allele and the dominant QTL allele must be linked in coupling.

For a codominant marker system, the merit of the Bc will improve, since all markers will be informative, so it is probable that the Bc will be the best generation for BSA. However, in practice, BSA is used where there are considerable cost constraints, and the lower cost of PCR-based dominant markers is likely to make these the preferred choice.

If Bc generations can be produced with both parents, BSA can be carried out by comparing a high performing bulk from the high performing Bc with a low performing bulk from the other Bc. There are several advantages to this approach: all markers will be informative, the probability of a false positive for a dominant marker is reduced to (1/2)^n, and for a dominant QTL, there is no risk of choosing the backcross in which the QTL shows no genetic variation. If this approach is not possible, then on balance it would be preferable to use the F2 as a starting generation, simply because for a dominant QTL both phenotypic classes are guaranteed.

An additional factor which will favor the Bc over the F2 as a starting generation is that residual genetic variation will be lower within QTL classes in the Bc. As a result, contamination in the Bc bulks will be reduced compared to the F2, although the magnitude of the reduction will depend on the heritability of the trait. In the extreme, if only one QTL is segregating, all residual variation will be environmental and the comparisons between the F2 and Bc in Table 4 will be correct. With a heritability of 1, however, the Bc will be favored; rescaling residual variances within both generations to a value of 1.0, an additive effect in the Bc would be 1.41 times the additive effect of the F2. Thus values for a = 1 and a = 1.5 in the F2 could roughly be compared with values for a = 1.5 and a = 2.0 in the Bc. There is then a clear advantage to the Bc, even if the expected closer recombination frequency between the nearest marker and the QTL in the F2 is taken into account. However, for any particular trait, the magnitude of the within-QTL variance in the F2 and the Bc will depend on the heritability of that trait. In practice, we believe that in the circumstances where BSA has any chance of succeeding in detecting a QTL, namely one QTL of large effect accounting for the bulk of the variation, with no other QTL of large effect segregating, then the F2 will still compare acceptably with the Bc.
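The factor of 1.41 can be traced with a short calculation. This is a sketch that assumes the residual variation within QTL classes comes entirely from unlinked, purely additive background loci; the per-locus effects a_i and the variance symbols are introduced here for illustration.

```latex
\begin{align*}
\sigma^2_{F_2} &= \tfrac{1}{2}\sum_i a_i^2, &
\sigma^2_{Bc} &= \tfrac{1}{4}\sum_i a_i^2, \\
\frac{a/\sigma_{Bc}}{a/\sigma_{F_2}} &= \frac{\sigma_{F_2}}{\sigma_{Bc}} = \sqrt{2} \approx 1.41 .
\end{align*}
```

On this scaling, a = 1 and a = 1.5 in the F2 correspond to about 1.41 and 2.12 in the Bc, which is why they are paired only roughly with the tabulated values 1.5 and 2.0.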


    ACKNOWLEDGMENTS
 
This work was partly supported by a Royal Society Industry Fellowship to one of us (IJM). Work was completed while the senior author was affiliated with Dep. of Agric. Botany, School of Plant Sciences, The Univ. of Reading, Reading RG6 6AS, UK, and Lion Seeds Ltd., Woodham Mortimer, Maldon, Essex CM9 6SN, UK.

Received for publication February 3, 1999.


    REFERENCES