Dep. of Agronomy, Iowa State Univ., 1208 Agronomy Hall, Ames, IA 50011-1010
* Corresponding author (jjannink{at}iastate.edu)
| ABSTRACT |
|---|
Abbreviations: AIL, advanced intercross line; ANOVA, analysis of variance; cM, centimorgan; MCMC, Markov chain Monte Carlo; MSE, mean squared error; QTL, quantitative trait locus.
| INTRODUCTION |
|---|
To increase the accuracy of QTL mapping in species that do not share these characteristics, I propose taking advantage of the fact that, from a statistical point of view, recombination is a random event. A consequence of this fact is that, in any given sample of recombinant progeny, the number of recombination breakpoints will not be uniform among progeny but will follow a distribution. Identifying progeny with a high number of recombination breakpoints to form a mapping population could therefore allow QTL positions to be mapped more accurately. The identification of such progeny requires genotype information from DNA markers. The idea, therefore, is to genotype a large number of recombinant progeny but retain only the optimal set for phenotyping. For this idea to be feasible, one must assume that phenotyping costs, rather than genotyping costs, are a major experimental constraint limiting the ability of researchers to develop data sets sufficient to identify QTL. This assumption would have been far-fetched only a few years ago, but several considerations now support it. First, biotechnological developments have decreased the cost of genotyping much more rapidly than the cost of phenotyping (e.g., Jobs et al., 2003). Second, for species that are long-lived, phenotyping costs will include the maintenance of the progeny until such time as the traits of interest (e.g., yield and quality of harvestable fruit) can be measured, potentially a substantial cost. On the other hand, genotyping could be performed at an early life stage, allowing many progeny to be culled and thereby avoiding the costs of raising them. Third, quite expensive phenotypes can be envisioned, including obtaining microarray expression data for an individual at a number of time points. Finally, if the mapping population becomes a resource used by many researchers, the cumulative effort invested in phenotyping may be increased manyfold.
I use the term "selective phenotyping" for the use of DNA marker information to select an optimal set of progeny before phenotyping, in reference to the complementary idea of selective genotyping also introduced by Darvasi and Soller (1992). Darvasi (1998) proposed selective phenotyping in relation to a single marker interval known to contain a QTL. In that context, progeny would be genotyped only at flanking markers, and only progeny recombinant in that interval would be phenotyped. Here, I extend this idea to the whole genome. The objective of this study was to evaluate the potential of selective phenotyping to improve the accuracy of QTL mapping in whole-genome scans, and to explore different methods for selecting the optimal set of progeny.
| MATERIALS AND METHODS |
|---|
Two methods for selecting progeny for phenotyping were evaluated. In the maximum number of recombinations method (denoted maxRec below), marker scores are available for M progeny. For each progeny j, define $c_{ij} = 1$ if j is recombinant in marker interval i, and $c_{ij} = 0$ otherwise. The number of recombinant marker intervals for progeny j is counted, $c_j = \sum_i c_{ij}$, and the N progeny with the highest $c_j$ are selected for phenotyping and analysis. While the maxRec method seeks solely to maximize the overall mapping information available among the selected progeny, the uniform number of recombinations method (denoted uniRec) seeks to maximize both the mapping information and the uniformity of its distribution across the genome. The rationale for this approach follows from the assumption that researchers do not know a priori where QTL are located so that it is desirable to have mapping information evenly distributed. In the maxRec method, the distribution of that information depends directly on the distribution of recombination events occurring in that interval in the M original progeny. The uniRec method I propose here is a simple way to meet the two objectives of maximizing total information and its uniformity. Other more optimal methods may exist.
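The maxRec rule described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the function names, the 0/1 coding of backcross marker scores, and the tie-breaking rule (ties go to the progeny listed first) are all assumptions made for the example.

```python
def recombinant_intervals(markers):
    """Return c_ij for one chromosome: 1 where adjacent marker scores differ."""
    return [int(a != b) for a, b in zip(markers, markers[1:])]


def max_rec(progeny, n_select):
    """Pick the n_select progeny with the most recombinant marker intervals.

    progeny: list over progeny; each entry is a list of chromosomes, and each
             chromosome is a list of 0/1 marker scores (assumed coding).
    Returns (selected indices, list of counts c_j for every progeny).
    """
    counts = [
        sum(sum(recombinant_intervals(chrom)) for chrom in chromosomes)
        for chromosomes in progeny
    ]
    # sorted() is stable, so tied progeny keep their original order.
    order = sorted(range(len(progeny)), key=lambda j: counts[j], reverse=True)
    return order[:n_select], counts


if __name__ == "__main__":
    toy = [[[0, 0, 0, 0]], [[0, 1, 0, 1]], [[0, 0, 1, 1]], [[1, 0, 0, 0]]]
    print(max_rec(toy, 2))
```

Because the rule only ranks progeny by a count, it ignores where in the genome the recombinations fall, which is the limitation the uniRec method is meant to address.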
Define $d_{ij} = c_{ij}/m_i$, where $m_i$ is the map distance in centimorgans between the markers flanking interval i. The variable $d_{ij}$ represents the amount of mapping information for interval i conferred by progeny j on a per map unit basis. The uniRec method is iterative. First, the single progeny with the highest $d_j = \sum_i d_{ij}$ is selected to form the initial set S of selected progeny. The following steps are then iterated until N progeny are selected:
1. for each marker interval i, the information already in S is summed, $d_i = \sum_{j \in S} d_{ij}$;
2. […];
3. for each progeny not yet selected ($j \notin S$) a score is calculated, $t_j = \sum_i p^{d_i} c_{ij}$;
4. the unselected progeny with the highest $t_j$ is added to S.
The value $p^{d_i}$ represents the value added to S by including a progeny that is recombinant in interval i. If S already contains much mapping information for interval i ($d_i$ is high) then including a progeny that is recombinant for that interval adds less value than if S contains little mapping information for interval i ($d_i$ is low). The score $t_j$ represents the sum across marker intervals of the value of including j in S.
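A greedy selection in the spirit of uniRec can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: the weight given to interval i is taken to be p raised to the power d_i with a hypothetical constant 0 < p < 1 (the text does not give its value), one intermediate step of the original iteration is not reproduced, and the function name `uni_rec` and the tie-breaking rule (lowest index wins) are invented for the example.

```python
def uni_rec(c, m, n_select, p=0.5):
    """Greedy uniRec-style selection (sketch).

    c: c[j][i] = 1 if progeny j is recombinant in interval i, else 0.
    m: m[i] = map length of interval i in cM.
    p: assumed decay constant in (0, 1); its value is not given in the text.
    """
    n_prog, n_int = len(c), len(m)
    # d_ij: information for interval i from progeny j, per map unit.
    d = [[c[j][i] / m[i] for i in range(n_int)] for j in range(n_prog)]

    # Seed S with the progeny having the largest d_j = sum_i d_ij.
    first = max(range(n_prog), key=lambda j: sum(d[j]))
    selected = [first]
    remaining = set(range(n_prog)) - {first}

    while len(selected) < n_select and remaining:
        # d_i: information already accumulated in S for interval i.
        d_i = [sum(d[j][i] for j in selected) for i in range(n_int)]
        # Intervals that are already well covered get a smaller weight.
        weights = [p ** x for x in d_i]
        # t_j: summed weight of the intervals in which j is recombinant.
        best = max(
            sorted(remaining),
            key=lambda j: sum(w * cij for w, cij in zip(weights, c[j])),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    c = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
    m = [10, 10, 10, 10]
    print(uni_rec(c, m, 2))
```

In the toy data, progeny 0 and 1 are recombinant in the same two intervals, so once progeny 0 is in S the sketch prefers progeny 2, which covers the remaining intervals, whereas a pure count-based ranking would treat progeny 1 and 2 as equally good.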
Simulations of Mapping Populations and Analyses
The potential of the uniRec and maxRec methods to improve the accuracy of estimates of QTL position was tested by simulation. For simplicity, I used progeny from a backcross design, in which two inbred parents were crossed and their F1 progeny were backcrossed to one of the parents to create segregating progeny. The simulated genome contained six chromosomes, each 100 cM long. Molecular marker data were simulated for these progeny assuming evenly spaced markers, with marker spacings of 10 cM on chromosomes 1, 2, and 3 and 20 cM on chromosomes 4, 5, and 6. Recombination across marker intervals was assumed independent, as in Haldane's mapping function (Haldane, 1919). Plant geneticists often use Kosambi's mapping function (Kosambi, 1944). Empirical evidence, however, is lacking for the Kosambi assumption of increasing positive crossover interference as marker spacing becomes smaller. Indeed, negative crossover interference over short distances has been observed (Esch and Weber, 2002; Rong et al., 2004; Sherman and Stack, 1995), a condition under which Haldane's function would fit better than Kosambi's. Given that neither Haldane's nor Kosambi's assumptions appear to be empirically justified, I chose Haldane's, which is the simpler of the two.
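The marker simulation described above can be sketched as below. This is a minimal sketch, assuming 0/1 coding for the two possible backcross genotypes and markers at both chromosome ends; the function names and the use of Python's `random.Random` are choices made for the example and are not taken from the study.

```python
import math
import random


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM under Haldane's function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def simulate_chromosome(length_cm, spacing_cm, rng):
    """Simulate 0/1 marker scores along one chromosome of a backcross progeny."""
    n_intervals = int(round(length_cm / spacing_cm))
    r = haldane_r(spacing_cm)
    scores = [rng.randint(0, 1)]  # first marker: either allele with probability 1/2
    for _ in range(n_intervals):
        # Independent recombination in each interval (no interference).
        recombined = rng.random() < r
        scores.append(scores[-1] ^ int(recombined))
    return scores


def simulate_progeny(rng):
    """Six 100-cM chromosomes: 10-cM spacing on three, 20-cM spacing on three."""
    return [simulate_chromosome(100, s, rng) for s in (10, 10, 10, 20, 20, 20)]


if __name__ == "__main__":
    rng = random.Random(2004)
    print(simulate_progeny(rng))
```

With these spacings each simulated progeny carries 30 intervals of 10 cM and 15 intervals of 20 cM, matching the interval counts used later in the text.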
A single QTL was simulated at the center of the marker interval closest to the center of each chromosome (resulting in positions of 45 cM and 50 cM from the beginning of the chromosome for chromosomes with 10 cM and 20 cM marker spacings, respectively). The QTL explained 7, 11, and 17% of the phenotypic variance for chromosomes 1 and 4, 2 and 5, and 3 and 6, respectively. Thus, the simulation allowed for a full factorial of two marker spacings and three QTL effects. The total genetic variance over the six chromosomes was 70% of the phenotypic variance.
The uniRec and maxRec progeny selection methods were applied to obtain either 100 or 200 progeny for phenotyping and analysis. That is, the value N was set at either 100 or 200. The original number of genotyped progeny M was set anywhere from 100 to 3200 such that the selected fraction N/M of progeny ranged over five values: 1, 1/2, 1/4, 1/8, or 1/16. The selected fraction of 1 corresponds to no selection, that is, all genotyped progeny were also phenotyped and the data analyzed. The other selected fractions correspond to progressively more stringent selection of which progeny to phenotype, with a maximum of 16 times more progeny genotyped than phenotyped. In total there were 20 simulation settings: two selection methods times two levels for the number of progeny analyzed times five levels of selected fractions.
The set of progeny selected was then analyzed using a Bayesian mapping analysis (Satagopan et al., 1996; Sillanpää and Arjas, 1998). The analysis assumed the presence of a single QTL on each of the six chromosomes. The model of the phenotype for progeny j was:

$$y_j = \mu + \sum_{k=1}^{6} \alpha_k x_{kj} + \varepsilon_j$$

where $y_j$ is the phenotype of progeny j, $\mu$ is the mean, $x_{kj}$ is the genotype of progeny j at QTL k, $\alpha_k$ is the effect of QTL k, and $\varepsilon_j$ is a residual distributed as $\varepsilon_j \sim N(0, \sigma^2)$. The Bayesian analysis was implemented using Markov chain Monte Carlo (MCMC) to obtain posterior distributions of the parameters, with particular interest being paid to the posterior distributions of the QTL location parameters, $\lambda_k$. Briefly, the parameters $\mu$, $\sigma^2$, and $\alpha_k$ were updated using a simple random-walk Metropolis-Hastings procedure (Gilks et al., 1996), as described in more detail in Jannink and Wu (2003), assuming positive improper uniform priors. The QTL location $\lambda_k$ and progeny QTL genotypes $x_{kj}$ were blocked and updated jointly as follows. The proposal distribution for a new location, $p(\lambda_k^* \mid \lambda_k)$, was uniform over the interval $[\max(0, \lambda_k - b), \min(100, \lambda_k + b)]$. New QTL genotypes $x_k^* = \{x_{k1}^*, \ldots, x_{kj}^*, \ldots, x_{kN}^*\}$ were then proposed from their full conditional distribution $p(x_k^* \mid y, \mu, \sigma^2, \alpha, x_{-k}, m, \lambda_k^*)$, where $\alpha$ is the vector of all $\alpha_k$, $x_{-k}$ is the matrix of all QTL genotypes except those at QTL k, and m is the matrix of marker genotypes. The joint update of $\lambda_k^*$ and $x_k^*$ was then accepted with probability

[Equation: acceptance probability for the joint update; not recoverable from the source]

where $p(x_k \mid m, \lambda_k)$ is the prior distribution for QTL genotypes, which depends on the QTL location and the marker genotypes. This same update method is given by Yi and Xu (2001) in the context of QTL mapping under complex mating designs. The prior for QTL position was uniform over its chromosome. The Markov chain was burned in for 200 iterations and then run for 5000 iterations. New data sets and complete Markov chains were simulated 2500 times for each of the 20 simulation settings.
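The location proposal in the joint update can be illustrated on its own. The sketch below covers only the uniform window around the current QTL position, truncated at the chromosome ends, together with the log ratio of reverse to forward proposal densities that a Metropolis-Hastings step would need when the window is truncated. Whether the original analysis applied this correction is not stated in the text, the half-width `b` is left as a free parameter because its value is not given, and the function name is hypothetical. The genotype proposal and the acceptance step are not shown.

```python
import math
import random


def propose_location(lam, b, rng, chrom_len=100.0):
    """Propose a new QTL position uniformly within +/- b cM of lam.

    Returns (lam_new, log_q_ratio), where
    log_q_ratio = log q(lam | lam_new) - log q(lam_new | lam).
    """
    lo, hi = max(0.0, lam - b), min(chrom_len, lam + b)
    lam_new = rng.uniform(lo, hi)
    # Window that the reverse move (from lam_new back to lam) would use.
    lo_rev, hi_rev = max(0.0, lam_new - b), min(chrom_len, lam_new + b)
    # Uniform densities are 1/width, so the log ratio is a difference of log widths.
    log_q_ratio = math.log(hi - lo) - math.log(hi_rev - lo_rev)
    return lam_new, log_q_ratio


if __name__ == "__main__":
    rng = random.Random(1)
    print(propose_location(50.0, 5.0, rng))  # interior: windows have equal width
    print(propose_location(2.0, 5.0, rng))   # near the end: windows can differ
```

Away from the chromosome ends the forward and reverse windows have the same width, so the correction term vanishes and the proposal is symmetric.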
The measure of QTL mapping accuracy I chose was the posterior MSE relative to the simulated QTL position. Assuming a single QTL on a chromosome of 100 cM, the posterior MSE is calculated as

$$\mathrm{MSE} = \int_0^{100} (\hat{\lambda} - \lambda)^2 \, \pi(\hat{\lambda}) \, d\hat{\lambda}$$

where $\lambda$ is the simulated QTL position, $\hat{\lambda}$ is the estimate of that position, and $\pi(\hat{\lambda})$ is the posterior density of $\hat{\lambda}$. If there is no information in the data relative to the QTL position, then the posterior is the same as the prior [$\pi(\hat{\lambda}) = p(\hat{\lambda})$]. In that case, for a QTL simulated at the center of a 100 cM chromosome, the posterior MSE would equal 833 cM². The MSE was calculated in practice using the Markov chain samples as

$$\mathrm{MSE} = \frac{1}{5000} \sum_{s=1}^{5000} (\lambda_s - \lambda)^2$$

where $\lambda_s$ was the Markov chain sample for QTL location from iteration s. To ensure that 5000 MCMC iterations were enough to estimate the MSE, I also ran 125 simulations with Markov chains of 100000 iterations for the uniRec method with N = 200 and a selected fraction of 0.5. I found no evidence that short runs were biased relative to long runs: mean MSE for the short runs was not significantly different from mean MSE for the long runs. Variance among short-run MSE was also not significantly different from variance among long-run MSE (data not shown). To determine error in MSE estimates caused by MCMC sampling, I used the "batch means" method (Roberts, 1996), breaking down the long runs into 20 batches of 5000 iterations each. For estimating MSE from a simulated data set, MCMC sampling error as a fraction of the variance among estimates from independently simulated data ranged from less than 1% for QTL of large effect to about 9% for QTL of smaller effect. These small MCMC sampling errors suggested that chains of 5000 iterations were sufficient to adequately estimate the MSE.

Differences among selection methods and the effects of the number of progeny analyzed, the selected fraction, the marker spacing, and the QTL effects were assessed by a full factorial ANOVA. The selected fraction of 1 was not included in this ANOVA because maxRec and uniRec methods were identical for it. The factorial of five main effects led to 96 simple effects: 2 selection methods × 2 progeny numbers × 4 selected fractions × 2 marker spacings × 3 QTL effects. All effects were assumed to be fixed. A log transform was found to best normalize MSE values so that analyses were conducted and averages obtained on a log scale. Averages reported here were backtransformed from the log scale. Variances across replicate simulations in the MSE were calculated, resulting in a single variance for each of the 96 simple effects.
Treatment effects on these variances were assessed by factorial ANOVA, but assuming that all four-way and five-way interactions could be folded into the error term. This led to an ANOVA model with 29 degrees of freedom in the error and 66 model degrees of freedom.
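Two pieces of arithmetic in this part of the methods can be checked with a short sketch: the posterior MSE computed from chain samples, with its value under an uninformative uniform prior, and the degrees of freedom of the ANOVA on variances. The function names are hypothetical, and the closed form for the uniform case is simply the integral of the squared error over the chromosome divided by its length.

```python
from itertools import combinations
from math import prod


def posterior_mse(samples, true_pos):
    """Mean squared deviation of sampled QTL positions from the simulated one."""
    return sum((s - true_pos) ** 2 for s in samples) / len(samples)


def uniform_prior_mse(true_pos, length=100.0):
    """MSE when the posterior equals a uniform prior on [0, length]."""
    return ((length - true_pos) ** 3 + true_pos ** 3) / (3.0 * length)


def model_df(levels, max_order):
    """Degrees of freedom of main effects and interactions up to max_order."""
    return sum(
        prod(n - 1 for n in combo)
        for order in range(1, max_order + 1)
        for combo in combinations(levels, order)
    )


if __name__ == "__main__":
    print(uniform_prior_mse(50.0))           # about 833 cM^2 at the chromosome center
    levels = [2, 2, 4, 2, 3]                 # methods, N, fractions, spacings, QTL effects
    df_model = model_df(levels, 3)           # main effects through three-way interactions
    df_error = prod(levels) - 1 - df_model   # four- and five-way interactions pooled
    print(df_model, df_error)
```

Pooling the four-way and five-way interactions into the error term is what leaves any residual degrees of freedom at all, since there is only one variance per cell.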
To explore the interaction between QTL effect size, marker spacing, and the selected fraction, a set of simulations with a similar genome of six chromosomes but with QTL of the same effect placed at the center of each chromosome was run. Each chromosome, however, had different marker spacings, with markers every 25, 20, 10, 5, 4, and 2 cM over the chromosomes. Analyses were run with 200 phenotyped progeny under no progeny selection versus a selected fraction of 1/16 under the uniRec method. The effects of the QTL were 11.6% of the phenotypic variance in one set of simulations, and 5% of the phenotypic variance in the other set of simulations, for respective total genetic variances of 70 and 30% of the phenotypic variance in these two sets of simulations.
| RESULTS AND DISCUSSION |
|---|
Of interest in QTL mapping is the total number of recombinant intervals over the set S of progeny analyzed, $R_S = \sum_{S} R$. Generally $R_S$ will also not follow a standard distribution, but since it is a sum over many independent random variables, it will be approximately normal. In the case of the simulations performed here, there were 30 intervals of 10 cM each, and 15 intervals of 20 cM each, such that E(R) = 5.19 and var(R) = 4.54. This expected number of detected recombinants of 5.19 is less than the genome size of 6 Morgans because double recombinants in some intervals would not be detected. Thus, if 200 progeny are obtained without selection, E($R_S$) = 200(5.19) = 1038. Given that maxRec is a simple directional selection method for high numbers of recombinations, the number of recombinations in S can be predicted using standard selection theory (Falconer and Mackay, 1996). These predictions are 1378, 1580, 1740, and 1876 for the 0.5, 0.25, 0.125, and 0.0625 selected fractions, respectively. In practice, using maxRec and averaged over the 2500 replicate simulations with N = 200, I obtained 1372, 1600, 1781, and 1945 recombinations for those respective selected fractions (Fig. 1). The deviations from the predictions arise from differences between the actual distribution of $R_S$ and the normal distribution used to obtain the predictions (standard errors for the observed averages were below 1).
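The moments of R and the selection-theory predictions quoted above can be approximately reproduced with the sketch below. It assumes Haldane's mapping function for the per-interval recombination fractions, independent intervals, and the usual truncation-selection formula for a normally distributed trait (mean plus selection intensity times standard deviation); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM under Haldane's function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def recombinant_moments(intervals):
    """intervals: list of (count, spacing_cm). Returns (mean, variance) of R."""
    mean = sum(n * haldane_r(d) for n, d in intervals)
    var = sum(n * haldane_r(d) * (1.0 - haldane_r(d)) for n, d in intervals)
    return mean, var


def selection_intensity(fraction):
    """Standardized selection differential under truncation selection."""
    z = NormalDist().inv_cdf(1.0 - fraction)  # truncation point
    return NormalDist().pdf(z) / fraction


def predicted_total(n_selected, fraction, mean, var):
    """Predicted total recombinant intervals among the selected progeny."""
    return n_selected * (mean + selection_intensity(fraction) * math.sqrt(var))


if __name__ == "__main__":
    mean, var = recombinant_moments([(30, 10), (15, 20)])
    print(round(mean, 2), round(var, 2))
    for frac in (0.5, 0.25, 0.125, 0.0625):
        print(frac, round(predicted_total(200, frac, mean, var)))
```

The results should land close to the values given in the text, with small differences attributable to rounding of the selection intensities.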
The discussion above of the effect of selecting progeny with high numbers of recombinants differs somewhat from the discussion given in Darvasi and Soller (1995), in that they decreased marker spacing with increases in recombination frequency due to additional rounds of random mating. This decrease in marker spacing ensured that the recombination frequency across marker intervals remained constant despite increases in overall recombination frequency. Using selective phenotyping, and at the highest selection intensity examined here, the recombination frequency approaches twice that expected under random selection. Thus, it may be appropriate to compare the MSE for a marker spacing of 20 cM but without selection to the MSE for a marker spacing of 10 cM and selective phenotyping with a selected fraction of 0.0625. The averages for those treatments were 66.3 versus 22.6, respectively, indicating that selective phenotyping reduced the QTL position MSE by 66%.
Further simulations provided information on the value of selective phenotyping relative to decreasing marker spacing for the purpose of accurately mapping QTL. For relatively large effect QTL (11.6% of the phenotypic variance), decreasing marker spacing all the way to 2 cM decreased average QTL position MSE both without selection and for a selected fraction of 0.0625 (Fig. 3A). Consequently, the average QTL position MSE without selection with 2-cM marker spacing (12.9 cM²) was not significantly different from the average QTL position MSE under selective phenotyping with 10-cM marker spacing (14.3 cM²). This result means that, if markers are available for high-density genotyping and if QTL of large effect are of primary interest, selective phenotyping may not provide gains in the accuracy of QTL mapping. In contrast, for QTL of smaller effect (5% of the phenotypic variance), decreasing marker spacing below 5 cM did not decrease QTL position MSE without selection, but did decrease QTL position MSE for selective phenotyping (Fig. 3B). Darvasi et al. (1993), using maximum likelihood QTL mapping, also found that, in the absence of advanced intercrossing, the mapping accuracy of smaller-effect QTL benefited less from decreasing marker spacing than did the mapping accuracy of larger-effect QTL. In the case of selective phenotyping, the consequence of this phenomenon was that the MSE obtained under selective phenotyping at a marker spacing of 10 cM (53.0 cM²) could not be matched without selection, where the lowest observed MSE was 74.3 cM². Furthermore, with decreasing marker spacing, the MSE under selective phenotyping declined further, to a value of 30.7 cM² for the 2-cM marker spacing (Fig. 3B). Selective phenotyping therefore appears to be more useful when marker resources are not available for dense marker mapping and when researchers also seek to accurately map QTL of small effect.
| CONCLUSIONS |
|---|
| ACKNOWLEDGMENTS |
|---|
Received for publication May 4, 2004.