a Groningen Bioinformatics Centre, Inst. of Mathematics and Computing Sci., POB 800, NL-9700 AV, Groningen, The Netherlands
b Dep. of Agronomy, Iowa State Univ., Ames, IA 50011-1010, USA
c NCGR, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
* Corresponding author (r.c.jansen{at}cs.rug.nl)
| ABSTRACT |
|---|
Abbreviations: DH, doubled haploid; haploIM, haplotype interval mapping; haploMQM, haplotype multiple quantitative trait loci model; haploMQM-, reduced haplotype multiple quantitative trait loci model; IM, interval mapping; MQM, multiple quantitative trait loci model; QTL, quantitative trait loci.
| INTRODUCTION |
|---|
$\alpha_{AB}$, $\alpha_{AC}$, and $\alpha_{BC}$, respectively. Jannink and Jansen (2001) showed that a reduction in the number of estimated parameters could be achieved if $\alpha_{AC} = \alpha_{AB} + \alpha_{BC}$. This parameter reduction defined REDUCED vs. FULL models. Differences in the likelihoods of these models provided evidence of epistatic interaction occurring between the locus analyzed and other loci in the genetic background (Jannink and Jansen, 2001). At the same time, assuming additive gene action, the REDUCED model detected QTL with higher power than the FULL model. This method requires multiple crosses in a diallel structure, and its applicability is therefore restricted. In this paper we propose to broaden the applicability of reduced parameter models by focusing on short genome segments, determining the DNA-marker haplotype carried by each parent on such a segment, grouping parents that share common haplotypes, and formulating reduced models in terms of haplotype effects. We call this new approach haploMQM. We apply the method to simulations with multiple related F2:3 populations and highlight possible strengths and weaknesses of the new approach.
| MATERIALS AND METHODS |
|---|
We now reparametrize these multipopulation models in terms of the effects of QTL alleles carried by inbred parents rather than in terms of allele substitution effects within each segregating population. We define two parents as interconnected if, in the system of inbred crosses studied, there exists a path of crosses joining them. For example, in the two-population system {P1 x P2; P2 x P3}, parents P1 and P3 are interconnected. In the system {P1 x P2; P4 x P3}, P1 and P3 are not interconnected. An interconnected system of populations, then, is one in which all parents are interconnected. An important property of such a system is that $N \ge P - 1$, where $P$ is the number of inbred parents and $N$ is the number of derived populations. The maximum number of estimable QTL effect parameters from a system of interconnected populations is $N$. A more parsimonious parametrization that is always possible for such a system is therefore to fix the effect of the allele carried by one arbitrary parent to zero and then to estimate the effects of the alleles carried by the other parents as deviations from the fixed parent. From this point of view, the FULL model described above corresponds to a parametrization in which the interconnectedness of the diallel is ignored, and each cross is considered separately. The REDUCED model, on the other hand, recognizes the interconnectedness of the diallel between parents A, B, and C, arbitrarily fixes the effect of the allele carried by one parent to zero, and estimates two further parameters: the effects of the alleles carried by the two remaining parents. From the perspective of parameter reduction between FULL and REDUCED models, we distinguish strongly and weakly interconnected systems of populations. In the former, $N \gg P$, while in the latter $N \approx P$. The diallel is a good example of strongly interconnected populations because it maximizes $N$ relative to $P$, with $N = P(P - 1)/2$, assuming no reciprocal crosses. Finally, among populations in a set available for analysis, not all will be interconnected. It will therefore be necessary to distinguish interconnected subsets and parameterize each independently.
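The bookkeeping described above can be sketched in a few lines of Python. This is an illustration only, not the implementation used in this study: the function names `interconnected_subsets` and `parameter_counts` are hypothetical, crosses are assumed to be given as pairs of parent labels, and the FULL count is taken as one QTL effect per population while the REDUCED count fixes one parent effect to zero in each interconnected subset.

```python
from collections import defaultdict


def interconnected_subsets(crosses):
    """Group parents into interconnected subsets (connected components).

    crosses: iterable of (parent, parent) pairs, one pair per population.
    """
    neighbours = defaultdict(set)
    for p1, p2 in crosses:
        neighbours[p1].add(p2)
        neighbours[p2].add(p1)

    seen, subsets = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk along paths of crosses starting from this parent.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        subsets.append(component)
    return subsets


def parameter_counts(crosses):
    """Return (FULL, REDUCED) numbers of QTL effect parameters.

    FULL: one effect per population (each cross treated separately).
    REDUCED: one effect per parent, minus one fixed parent per subset.
    """
    full = len(crosses)
    reduced = sum(len(subset) - 1 for subset in interconnected_subsets(crosses))
    return full, reduced


if __name__ == "__main__":
    print(interconnected_subsets([("P1", "P2"), ("P2", "P3")]))
    print(interconnected_subsets([("P1", "P2"), ("P4", "P3")]))
    print(parameter_counts([("A", "B"), ("A", "C"), ("B", "C")]))
```

For the three-parent diallel the sketch returns three FULL parameters and two REDUCED parameters, in line with the description above.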
In the development given above, we have identified QTL alleles according to the inbred parent from which they originate. Consider now two inbred parents that have retained the same genomic block from a common ancestor. Rather than identifying distinct QTL alleles for each parent, it would be more parsimonious to identify the single QTL allele carried by the common ancestor. To be able to benefit from this source of parsimony, we need to detect when inbred parents carry common genomic blocks. Assume that this task can be done so that, among a set of $P$ parents, we now can identify $H$ haplotypes ($H \le P$) for a genomic block in a given region of the genome. Consequently, we are interested in the effects of the QTL alleles carried by the $H$ haplotypes. In a process analogous to the one described above, we can determine interconnected sets of haplotypes as follows. If two haplotypes segregate in a population, they can be said to be crossed. Two haplotypes are interconnected if there is a path of crossed haplotypes joining them. For haplotypes to be interconnected, it is sufficient that they be carried by interconnected parents. However, haplotypes can be interconnected even if they are not carried by interconnected parents. Consider the noninterconnected system of parents {P1 x P2; P4 x P3}. Assume that we have P1(H1), P2(H2), P4(H2), and P3(H3), where the haplotype carried by each parent is given in parentheses. Haplotypes H1 and H3 are then interconnected. Note that in evaluating haplotype interconnectedness, two new situations arise. First, if both crosses P1 x P2 and P1 x P4 were made, haplotypes H1 and H2 would be contrasted twice. Second, if the cross P2 x P4 were made, haplotype H2 would be contrasted to itself. Without loss of generality, haplotypes involved in such replicated or identity contrasts can be considered interconnected haplotypes.
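The example with parents P1 to P4 can be traced with a small sketch. It is illustrative only: `haplotype_contrasts` and `haplotypes_interconnected` are hypothetical helper names, and the mapping from parents to haplotypes at the genomic block is assumed to be known in advance.

```python
from collections import Counter, defaultdict


def haplotype_contrasts(crosses, haplotype_of):
    """Translate parent crosses into haplotype contrasts at one genomic block.

    Returns (counts, identity): counts maps each unordered haplotype pair to
    the number of populations in which it segregates (values above 1 are
    replicated contrasts); identity lists crosses whose parents share a
    haplotype, so that no contrast segregates at this block.
    """
    counts, identity = Counter(), []
    for p1, p2 in crosses:
        h1, h2 = haplotype_of[p1], haplotype_of[p2]
        if h1 == h2:
            identity.append((p1, p2))
        else:
            counts[frozenset((h1, h2))] += 1
    return counts, identity


def haplotypes_interconnected(crosses, haplotype_of, start, goal):
    """True if a path of crossed haplotypes joins start and goal."""
    counts, _ = haplotype_contrasts(crosses, haplotype_of)
    neighbours = defaultdict(set)
    for pair in counts:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    frontier, seen = [start], {start}
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return False


if __name__ == "__main__":
    haplotype_of = {"P1": "H1", "P2": "H2", "P4": "H2", "P3": "H3"}
    crosses = [("P1", "P2"), ("P4", "P3")]
    print(haplotypes_interconnected(crosses, haplotype_of, "H1", "H3"))
```

Although P1 and P3 are not interconnected as parents, the shared haplotype H2 of P2 and P4 joins H1 and H3 in this sketch.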
Given these parametrizations, we distinguish the following models:
(i) Interval Mapping (IM)
$$y_{ij} = \mu_i + x_{ij}\alpha_i + e_{ij}$$

where $y_{ij}$ is the phenotype of individual $j$ in population $i$, $\mu_i$ is the intercept of population $i$, $\alpha_i$ is the effect of the QTL allele carried by an arbitrary parent of population $i$ (the effect of the allele carried by the other parent is set to zero), and $e_{ij}$ is an error residual for individual $j$ (the error variance, $\sigma^2_e$, is assumed equal across populations; it can also be taken unequal in cases of higher heritability). The independent variables $x_{ij}$ depend on the QTL genotype carried by individuals $ij$. Table 1 shows parameterizations for various example populations. In practice, QTL genotypes remain unobserved, the $x_{ij}$ are stochastic, and probabilities for the possible QTL configurations are calculated using flanking markers. For missing QTL or marker information, Jansen and Stam (1994) have shown that maximum likelihood estimates for the parameters $\mu_i$, $\alpha_i$, and $e_{ij}$ can be obtained within each population by an expectation-maximization procedure using weighted regression.
(ii) Haplotype Interval Mapping (HaploIM)
![]() |
The parameters are the same as in IM, save that allele effect $\alpha_{h_1(k)}$ is the effect of the QTL allele defined by haplotype $h_1$ of P1 in interconnected system $k$, and $\alpha_{h_2(k)}$ is that of haplotype $h_2$ of P2. Table 1 shows parametrizations for example populations derived from parents with identified haplotypes at a putative QTL locus; in this example $\alpha_{h_1(k)}$ is set to zero, whereas $\alpha_{h_2(k)}$ and $\alpha_{h_3(k)}$ are free and written in short notation as $\alpha_2$ and $\alpha_3$. At the map location under study and within each population, probabilities for the three possible QTL configurations (MM, MP, PP) are calculated using flanking markers as in IM.
(iii) Haplotype Multiple-Quantitative-Trait-Loci Models (HaploMQM)
The exposition above is in terms of a single QTL only. Multiple-QTL models use multiple regression on marker cofactors to control statistically for genetic background noise caused by QTL segregating elsewhere in the genome (Jansen and Stam, 1994; Jannink and Jansen, 2001).
![]() |
(iv) Reduced Haplotype Multiple-Quantitative-Trait-Loci Model (HaploMQM-)
This model is identical to haploMQM, except that separate intercepts are no longer estimated for each population. Instead, differences in population means are assumed to derive from the different haplotypes segregating in each population. Thus, in this model, differences in population mean contribute to the estimate of haplotype QTL allele effects. Index i is dropped from the µi parameter in the models above and the model becomes:
![]() |
Implementation Issues
To fit a QTL at a certain map position under study, a window around this map position is defined. The different parental haplotypes in this window are identified. If two parents share the same haplotype, then we assume that they transmit the same QTL allele to their offspring. The window can be based either on a fixed map size, say 5 or 10 cM, or on a fixed number of markers, say four. In the latter case, the possible number of haplotypes can be very large (e.g., $2^4 = 16$ for haplotypes of four biallelic markers and $5^4 = 625$ for haplotypes of four five-allelic markers). However, if parents are derived from few ancestors, then the number of different haplotypes will be smaller than the number of parents, leading to the situation of strongly interconnected haplotypes that enables a large reduction in the number of estimated parameters. We here focus on such situations. To determine haplotypes, we use a window of four adjacent markers. Parents are grouped according to their haplotype, and the allelic effects of the parental haplotypes are estimated from the data on the multiple populations using the models described above.
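Grouping parents by their haplotype in a marker window can be sketched as follows. This is a minimal illustration under stated assumptions: the names `possible_haplotypes` and `group_parents_by_haplotype` are hypothetical, each parent is represented by its ordered list of marker alleles on one linkage group, and the window is identified by the index of its first marker (how the window is centered on the map position under study is not specified here).

```python
from collections import defaultdict


def possible_haplotypes(n_alleles, n_markers=4):
    """Upper bound on the number of distinct haplotypes in a marker window."""
    return n_alleles ** n_markers


def group_parents_by_haplotype(parent_markers, first_marker, window=4):
    """Group parents that carry the same alleles at `window` adjacent markers.

    parent_markers: dict mapping parent name to a sequence of marker alleles
    ordered by map position (assumed input layout).
    first_marker: index of the first marker in the window.
    """
    groups = defaultdict(list)
    for parent, alleles in parent_markers.items():
        haplotype = tuple(alleles[first_marker:first_marker + window])
        groups[haplotype].append(parent)
    return dict(groups)


if __name__ == "__main__":
    parents = {
        "P1": [4, 4, 4, 1, 1, 3],
        "P2": [4, 4, 4, 1, 2, 2],
        "P3": [1, 1, 4, 1, 1, 3],
    }
    print(possible_haplotypes(2), possible_haplotypes(5))
    print(group_parents_by_haplotype(parents, first_marker=0))
    print(group_parents_by_haplotype(parents, first_marker=2))
```

Because the grouping is recomputed for every window, two parents may share a haplotype at one position and differ at another, which is why clustering can change along the genome.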
When analyzing multiple small populations, cofactor parametrization in MQMs can be problematic due to the large number of parameters involved (e.g., one cannot fit simultaneously 30 marker cofactors to a population with 10 offspring only; such simultaneous fitting would be possible in populations with 50 offspring, but if 60 such populations were analyzed, the computational burden would be excessive). Haplotype-based combining of cofactor parameters should reduce this problem substantially. Thus, haploIM can be extended to haploMQM by adding haplotype-based marker cofactors. For each marker cofactor, the local haplotypes form the basis for the reduction of parameters associated with that marker, in the same way as shown in Table 1. Since the clustering of parents is based on local haplotypes, clustering is likely to be different for each marker cofactor and different from the clustering at the focal QTL.
Procedure for Quantitative Trait Loci Analysis
We here briefly describe the haploMQM procedure (the approaches for haploIM and for haploMQM- are identical, except that in the former no cofactors are used and in the latter independent intercepts are not estimated for each population). Conceptually, the procedure for haploMQM is identical to the MQM procedure described in detail in Jansen (2001, p. 581–590). Three markers were chosen per chromosome to be candidate cofactors, leading to 3 x 10 = 30 candidate cofactors across the entire genome. Using all candidate cofactors, we calculated a bias-adjusted residual variance that was used for all further estimations on the population. We used backward elimination to retain in the model only those cofactors that explained a significant proportion of the variance. To determine whether to retain a cofactor in the backward elimination procedure, we used a threshold $T$ such that $\mathrm{Prob}(F_{df_1,df_2} > T) = 0.02$, with degrees of freedom $df_1$ equal to the number of parameters of the cofactor and $df_2$ equal to the residual degrees of freedom in the all-cofactor model. To locate a QTL, we then scanned the full genome in 5-cM steps. We calculated the likelihood of the data in the presence of a QTL both without parameter reduction due to haplotyping ($L_{Full}$) and with parameter reduction ($L_{Haplo}$). Procedures without haplotyping used the likelihood ratio to quantify QTL likelihood:
$$2\ln\left(L_{Full}/L_{noQTL}\right)$$
Procedures with haplotyping used the likelihood ratio:
$$LR_{QTL} = 2\ln\left(L_{Haplo}/L_{noQTL}\right)$$

Large values of the $LR_{QTL}$ statistic indicate support for the presence of a QTL using the haplotype model. Note that the no-QTL likelihood is different in the haploMQM and the haploMQM- cases. In particular, since haploMQM- models the population means using QTL and cofactors, the likelihood $L_{noQTL-}$ includes no contribution of the putative QTL to the modeling of population means. In general, then, $L_{noQTL-}$ can be considerably smaller than $L_{noQTL}$.
To determine genome-wide significance thresholds for these statistics, one can either simulate populations without genetic variance or use permutation. In the present discussion, we instead took an ad hoc approach for illustrative purposes and applied a much more stringent threshold per test, $\alpha = 0.001$. The approximate thresholds for QTL detection are then $\chi^2(60; 0.001) \approx 102$ for IM, and $\chi^2(15; 0.001) \approx 38$ and $\chi^2(37; 0.001) \approx 69$ for haploIM and haploMQM in Simulations 1 and 2, respectively. The degrees of freedom used to determine the $\chi^2$ threshold derive from the number of parameters estimated by each model. The IM procedure estimates 60 QTL parameters, one for each family; in Simulation 1, an average of 15 haplotypes were distinguished for any given locus so that 15 allelic effects needed to be estimated; in Simulation 2, the average number of haplotypes distinguished was 37. In our simulations the number of QTL parameters varied little across the genome. In other simulations, where the degrees of freedom for the QTL ($df_{QTL}$) may vary more notably, it can be better to divide QTL likelihoods by the local $df_{QTL}$ for graphical display and use local thresholds $\chi^2(df_{QTL}; 0.001)$, or to determine such thresholds by simulation or permutation.
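Local thresholds of this kind can be approximated without statistical tables. The sketch below uses the Wilson-Hilferty approximation to the upper chi-square quantile, which is a substitute chosen here for convenience and not the method used in this study; the function name `chi2_threshold` is hypothetical. The approximation is close to, but not identical with, exact table values.

```python
from statistics import NormalDist


def chi2_threshold(df, alpha=0.001):
    """Approximate upper-alpha quantile of a chi-square distribution.

    Uses the Wilson-Hilferty cube-root normal approximation:
    chi2 ~ df * (1 - 2/(9 df) + z * sqrt(2/(9 df)))**3,
    where z is the standard normal quantile at 1 - alpha.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


if __name__ == "__main__":
    for df in (15, 37, 60):
        print(df, round(chi2_threshold(df), 1))
```

When $df_{QTL}$ varies along the genome, the same function can be called with the local value at each scan position.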
Two further likelihood ratios could be of interest:
$$LR_{Haplo} = 2\ln\left(L_{Full}/L_{Haplo}\right)$$

Large values of the $LR_{Haplo}$ statistic indicate problems with the haplotype parametrization. These could be due to failure of marker haplotypes to correctly group parents in terms of the QTL allele that they carry, or to strong QTL x genetic background interaction such that the same QTL allele would have divergent effects in the different families in which it segregates. One can also calculate
$$LR_{Means} = 2\ln\left(L_{Haplo}/L_{Haplo-}\right)$$

Large values of $LR_{Means}$ indicate failure of the QTL and selected cofactors to predict differences among population means. Such a failure could be due to the presence of QTL undetected by the mapping procedure or due to epistatic effects on family means not accounted for by the models presented. In the calculation of $LR_{QTL}$, the fact that $L_{Haplo-}$ is smaller than $L_{Haplo}$ is compensated for by the fact that $L_{noQTL-}$ is smaller than $L_{noQTL}$.
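For readers who work with log-likelihoods, the statistics discussed in this section can be collected in one place. The sketch assumes that each statistic is twice the difference between two maximized log-likelihoods, which is the usual form for a likelihood ratio compared against a chi-square threshold; the dictionary keys and function names are hypothetical, and the numbers in the example are made up for illustration.

```python
def likelihood_ratio(loglik_larger_model, loglik_smaller_model):
    """Twice the log-likelihood difference between two fitted models."""
    return 2.0 * (loglik_larger_model - loglik_smaller_model)


def haplo_statistics(loglik):
    """Compute the likelihood ratios from a dict of log-likelihoods.

    Expected keys (assumed naming): 'full', 'haplo', 'haplo_minus',
    'no_qtl', 'no_qtl_minus'.
    """
    return {
        # QTL support under haploMQM and under haploMQM-.
        "LR_QTL": likelihood_ratio(loglik["haplo"], loglik["no_qtl"]),
        "LR_QTL_minus": likelihood_ratio(loglik["haplo_minus"], loglik["no_qtl_minus"]),
        # Cost of grouping parents by haplotype.
        "LR_Haplo": likelihood_ratio(loglik["full"], loglik["haplo"]),
        # Cost of dropping population-specific intercepts.
        "LR_Means": likelihood_ratio(loglik["haplo"], loglik["haplo_minus"]),
    }


if __name__ == "__main__":
    example = {
        "full": -100.0,
        "haplo": -104.0,
        "haplo_minus": -110.0,
        "no_qtl": -125.0,
        "no_qtl_minus": -133.0,
    }
    print(haplo_statistics(example))
```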
Simulating Early Generation Progeny Tests in Maize Breeding
Early generation progeny tests in maize breeding are often referred to as first- and second-year topcross tests. We simulated marker and trait data for 60 related F2:3 populations of size 10 each, with one QTL on each of chromosomes 1 to 5. Note that a basic property of the testcross design is that it eliminates dominance as a source of genetic variance. To mimic the genome of maize, the genome in our simulation consisted of 10 linkage groups, each containing 101 biallelic marker loci with 2-cM map distance between adjacent pairs (using Haldane's mapping function). The genotype and phenotype data were generated in a number of steps as follows.
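The link between the 2-cM marker spacing and the recombination frequency between adjacent markers follows from Haldane's mapping function, $r = \tfrac{1}{2}(1 - e^{-2d})$ with $d$ in Morgans. A small sketch, with hypothetical function names, shows the calculation and the resulting marker positions on one linkage group:

```python
import math


def haldane_recombination(distance_cm):
    """Recombination frequency for a map distance in cM (Haldane)."""
    morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


def marker_positions(n_markers=101, spacing_cm=2.0):
    """Map positions (cM) of equally spaced markers on one linkage group."""
    return [i * spacing_cm for i in range(n_markers)]


if __name__ == "__main__":
    print(round(haldane_recombination(2.0), 4))
    positions = marker_positions()
    print(len(positions), positions[-1])
```

With 101 markers at 2-cM spacing, each simulated linkage group spans 200 cM, and adjacent markers recombine with a frequency of about 0.02.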
Step 1: Generate Base Population of Candidate Parents
The following protocol was used for generating a base population of inbred lines with different (re)combinations of ancestral linkage blocks. First we crossed a hypothetical inbred parent homozygous for all markers (say, 11111 and so on) with a parent also homozygous for all markers but carrying different alleles (say, 22222 and so on), and generated a set of 400 doubled haploid (DH) lines. For Simulation 2, we used a higher recombination frequency of 0.2 between adjacent markers (instead of 0.02) to mimic more generations of recombination. The resulting genotype of a DH line consisted of linkage blocks of 1s and of 2s of different sizes, for example, 111211222, and so on. Next, we generated multiallelic marker genotypes (five alleles per marker) from these linkage blocks. This was accomplished by assigning marker alleles to the linkage blocks, linkage block by linkage block, from a multinomial distribution with frequencies of marker alleles: 0.55, 0.24, 0.12, 0.06, and 0.03, respectively. For example, the original genotype 111221 contains three linkage blocks, 111, 22, and 1. A marker allele is randomly assigned to each linkage block. Thus, after sampling marker alleles for linkage blocks, the original genotype 111221 could become 444113. The multiallelic state of marker loci is similar to the polymorphism index that has been observed in simple sequence repeat markers in maize (Senior et al., 1996). Finally, a QTL allele was placed in the middle of linkage groups 1 to 5. If the linkage block surrounding the middle of the linkage group carried marker allele 1, then a favorable allele (denoted +) was placed there; otherwise an unfavorable allele (denoted -) was placed there. The phenotypic effect of a QTL was such that each QTL was expected to contribute 15% to the total phenotypic variation in a population where all five QTLs were segregating (heritability of 75%; populations where fewer than five QTLs segregate will have lower heritability).
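Step 1 can be sketched for a single linkage group as follows. This is not the simulation code used in this study: the function names are hypothetical, the seed is arbitrary, and we assume that "marker allele 1" in the QTL rule refers to the multiallelic allele assigned to the block at the middle marker.

```python
import random

ALLELE_FREQS = [0.55, 0.24, 0.12, 0.06, 0.03]  # alleles 1 to 5


def simulate_dh_line(n_markers, recomb, rng):
    """Ancestral origin (1 or 2) at each marker of one DH line."""
    origin = [rng.choice((1, 2))]
    for _ in range(n_markers - 1):
        switch = rng.random() < recomb  # recombination between neighbours
        origin.append(3 - origin[-1] if switch else origin[-1])
    return origin


def linkage_blocks(origin):
    """Return (start, end) index pairs of runs of identical origin."""
    blocks, start = [], 0
    for i in range(1, len(origin) + 1):
        if i == len(origin) or origin[i] != origin[start]:
            blocks.append((start, i))
            start = i
    return blocks


def assign_marker_alleles(origin, rng):
    """Draw one multiallelic marker allele per linkage block."""
    genotype = [0] * len(origin)
    for start, end in linkage_blocks(origin):
        allele = rng.choices(range(1, 6), weights=ALLELE_FREQS)[0]
        genotype[start:end] = [allele] * (end - start)
    return genotype


def qtl_allele(genotype):
    """'+' if the block at the middle marker carries allele 1, else '-'."""
    return "+" if genotype[len(genotype) // 2] == 1 else "-"


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed for a reproducible sketch
    origin = simulate_dh_line(101, 0.02, rng)
    genotype = assign_marker_alleles(origin, rng)
    print(len(linkage_blocks(origin)), qtl_allele(genotype))
```

Raising `recomb` from 0.02 to 0.2, as in Simulation 2, shortens the linkage blocks and therefore increases the number of distinct haplotypes among the candidate parents.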
Step 2: Select Sixty Parent Pairs
Pairs of parents for multiple crosses were selected from the base population. The 400 lines in the base population can be crossed amongst each other in various combinations. In Simulation 1, pairs of parents were selected in such a way that the two parents of a cross were 10% related, that is, 90% of the 1010 marker loci are polymorphic in each population. In Simulation 2, pairs of parents were selected so that the two parents of a cross were 45% related.
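One way to express this selection step is to measure relatedness as the fraction of marker loci at which two lines carry the same allele and to accept random pairs whose relatedness lies near a target. The sketch below is an assumption-laden illustration: the function names, the tolerance band around the target, and the rejection-sampling scheme are all hypothetical and are not taken from this study.

```python
import random


def relatedness(line_a, line_b):
    """Fraction of marker loci at which two inbred lines share an allele."""
    shared = sum(a == b for a, b in zip(line_a, line_b))
    return shared / len(line_a)


def select_pairs(lines, n_pairs, target, tolerance, rng, max_tries=100000):
    """Draw random parent pairs whose relatedness is within target +/- tolerance.

    lines: dict mapping line name to its list of marker alleles.
    Returns at most n_pairs pairs; fewer if max_tries is exhausted.
    """
    names = sorted(lines)
    pairs, tries = [], 0
    while len(pairs) < n_pairs and tries < max_tries:
        tries += 1
        a, b = rng.sample(names, 2)
        if abs(relatedness(lines[a], lines[b]) - target) <= tolerance:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    rng = random.Random(3)  # arbitrary seed
    lines = {
        "L1": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2],
        "L2": [1, 2, 1, 1, 2, 2, 2, 2, 1, 1],
        "L3": [2, 2, 1, 1, 1, 1, 2, 2, 3, 3],
    }
    print(relatedness(lines["L1"], lines["L2"]))
    print(select_pairs(lines, 2, target=0.1, tolerance=0.1, rng=rng))
```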
Step 3: Generate Sixty F2:3 Populations
Parents were crossed to generate offspring. The marker profile and genetic value of each progeny was determined based on the markers and QTL segregating between its parents, using the laws of Mendelian segregation and recombination. Each population consisted of 10 offspring in Simulation 1, and 50 offspring in Simulation 2.
Step 4: Randomly Sample Two Hundred Markers Available for Analysis
As a last step in the simulation procedure, the set of 1010 markers was reduced: in each of the simulations, 200 loci were randomly sampled from the genome and only these marker data were available for analysis. This resembles currently available marker density.
| RESULTS |
|---|
| DISCUSSION |
|---|
| ACKNOWLEDGMENTS |
|---|
Received for publication April 17, 2002.
| REFERENCES |
|---|