Crop Science Journal of Natural Resources and Life Sciences Education
Published in Crop Sci. 44:1572-1583 (2004).
© 2004 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING, GENETICS & CYTOLOGY

A Haplotype-Based Method for QTL Mapping of F1 Populations in Outbred Plant Species

Cuauhtemoc Cervantes-Martinez (a) and J. Steven Brown (b,*)

(a) University of Florida, C/O USDA-ARS, SHRS, 13601 Old Cutler Road, Miami, FL 33158, USA
(b) USDA-ARS, SHRS, 13601 Old Cutler Road, Miami, FL 33158, USA

* Corresponding author (miajb{at}ars-grin.gov).


    ABSTRACT
Integrating quantitative trait loci (QTL) analysis into breeding strategies, rather than treating the two as separate processes, has been proposed as a way to increase the power and accuracy of QTL detection. The main objective of this research is to develop a specific scheme for mapping QTL with a high degree of accuracy in actual breeding F1 populations of outbred plant species. The proposed method groups populations by common founders and statistically associates phenotypic expression with founder-origin probabilities that trace the common founder haplotypes in a given region of the progeny genome, using a linear model with a structured covariance matrix. The method was applied to computer-simulated data sets corresponding to five F1 populations of 100 individuals each, obtained by crossing a common founder with several other founders. We are currently using this scheme with cocoa (Theobroma cacao L.) crosses, using selected clones resistant to specific diseases to widen the genetic base of disease resistance. The results indicate that the position and effect of QTLs in the common founder that each explain at least 14% of the phenotypic variance can be estimated with good precision and accuracy. The theoretical assumptions on which this approach was developed render the method appropriate for outbred plant species that are highly heterozygous, which is often the case in tropical tree crops like cocoa, and that have phenotypic traits showing few interlocus interaction effects.

Abbreviations: QTL, quantitative trait loci • REML, restricted maximum likelihood


    INTRODUCTION
Accurate QTL analyses have been developed in recent years to detect and estimate the effects of quantitative trait loci in plant populations with different genetic structures. While high resolution QTL maps can be obtained from large populations of annual plant species developed from crossing inbred lines followed by self-fertilization for two or more generations, the analysis of quantitative trait loci is more difficult in outbred plant species. Some of the difficulties arise when heterozygous, heterogeneous parents are crossed to develop a mapping population, in which parents are differentially informative at different loci. To be informative, a parent must be heterozygous both at marker loci and at a linked QTL. Complications arise if parents have alleles in common at the QTL or marker loci, or if the parents share QTL alleles in different linkage phases with the marker loci (Jansen et al., 1998; Lynch and Walsh, 1998). In addition, the biological properties of some outbred species, like fruit trees and forest trees, impose limiting factors for mapping QTL. The number of generations per time unit and the progeny size per space unit are usually smaller than in annual species, resulting in lower power for QTL detection. Luo (1993), Soller and Genizi (1978), and Weller et al. (1990) have confirmed that very large progeny sizes are needed to detect QTL with good statistical power in outbred populations for different designs. However, extremely large progeny sizes and designs requiring two or more generations are not generally feasible in practice for trees and some other outbred plant species. For example, the most recent QTL maps for yield components, vigor, resistance to Phytophthora palmivora (E.J. Butler) E.J. Butler, bean traits, and ovule number in T. cacao were estimated from F1 populations ranging from 88 to 125 individuals. These were obtained from the cross of a highly homozygous clone (Catongo) with other heterozygous clones (DR1, S52, and IMC78) (Clement et al., 2003a, 2003b). Therefore, alternative accurate methods must be developed for mapping QTL given the conditions and genetic structure of outbred plant species. Beavis (1998) first proposed the integration of QTL analysis into cultivar development to increase the resolution of QTL detection, by integrating mapping analyses across the numerous and large populations typically used by maize (Zea mays L.) breeders.

Reyes-Valdés and Williams (2002) proposed a haplotypic method for QTL analysis in tree species for one population under the granddaughter design. The method uses founder-origin probabilities that trace specific segments of the chromosomes in individual offspring as independent variables, with phenotypic values as the dependent variable, in a simple regression analysis. Their results were similar to those obtained by Haley et al. (1994), who used all marker information. This method requires, however, information from three generations for QTL detection. In contrast, we suggest an approach that uses founder-origin probabilities in several F1 populations obtained in a full-sib mating design and combines the F1 populations that share a selected common founder in a regression-based analysis, using a linear model with a structured covariance matrix (Searle, 1971; Littell et al., 1996). Jannink and Jansen (2001) and Jansen et al. (2003), assuming additive effects, showed that combining related breeding populations for QTL analysis increases the power and accuracy of detection, mainly because of the increased progeny numbers in the combined analysis.

QTL mapping analyses based on linear regression models (Haley and Knott, 1992), such as the one we propose in this study, are approximate methods that generally give results similar to maximum likelihood methods (Lander and Botstein, 1989); however, they are computationally much less demanding and have greater flexibility to implement complex linear models (Piepho, 2000). The objective of this paper is to explain our method in detail and apply it to the analysis of computer simulated data of five F1 populations with a common founder in a manner similar to a currently used breeding scheme and structure of cocoa breeding populations (Clement et al., 2003a, 2003b).


    METHODS
Haplotypic Conditional Probabilities
The founder-origin probabilities that trace specific haplotype segments of the genome of F1 individuals to the haplotypes of their founders are developed here as an extension of the model for marker-based selection in gene introgression shown by Reyes-Valdés (2000) and Reyes-Valdés and Williams (2002). Consider two diploid founders PCF and PSFi of a population i (subscripts CF and SFi stand for common founder and second founder, respectively) and two informative marker loci, A and B, with genotypic array A1B1/A2B2 for founder PCF and Ai3Bi3/Ai4Bi4 for founder PSFi. The chromosome segments A1B1 and A2B2 are the first and second haplotypes of the founder PCF, and the segments Ai3Bi3 and Ai4Bi4 are the first and second haplotypes of the founder PSFi. Let H1x and H2x be specific alleles of the locus at map position x between marker loci A and B in the first and second haplotype of the founder PCF, let Hi3x and Hi4x be specific alleles in the first and second haplotype of the founder PSFi at the same locus, and let d1 and d2 be the map positions of marker loci A and B. The absolute map distance between the markers is |d1 − d2|, and the distances between the locus at position x and markers A and B are |d1 − x| and |d2 − x|, respectively. The map distances are converted to recombination fractions using the inverse of the Haldane mapping function (Haldane, 1919), assuming no interference,

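\[ r_{AB} = \tfrac{1}{2}\left(1 - e^{-2|d_1 - d_2|}\right), \qquad r_{Ax} = \tfrac{1}{2}\left(1 - e^{-2|d_1 - x|}\right), \qquad r_{Bx} = \tfrac{1}{2}\left(1 - e^{-2|d_2 - x|}\right) \]   [1]

where the map distances are expressed in Morgans.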
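As a quick numerical illustration of the inverse Haldane conversion in [1], the following minimal sketch turns a map distance into a recombination fraction. The function name and the choice of centimorgans as the input unit are assumptions made for this example; they do not come from the authors' SAS macro.

```python
import math

def haldane_recombination_fraction(distance_cm):
    """Recombination fraction for a map distance in cM, assuming no interference."""
    d_morgans = abs(distance_cm) / 100.0  # the Haldane function is defined in Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

# Hypothetical example: flanking markers at 55 and 60 cM, putative locus at 59 cM.
r_ab = haldane_recombination_fraction(60 - 55)
r_ax = haldane_recombination_fraction(59 - 55)
r_bx = haldane_recombination_fraction(60 - 59)
```

Under no interference the three values satisfy r_ab = r_ax + r_bx - 2 * r_ax * r_bx, which is a convenient sanity check when the positions are entered by hand.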

The conditional probabilities that F1 individuals have inherited specific founder alleles at a locus at position x, given that they have specific marker haplotypes, are shown in Table 1. These probabilities are given for flanking markers linked in coupling or in repulsion phase in the founders' genotypes. To determine the founder-origin probabilities for each particular F1 individual, the marker haplotypes that trace to either founder genome must be specified for every segment analyzed. Some difficulty arises when founders PCF and PSFi have identical alleles at the marker loci under consideration. If founders PCF and PSFi share one or two alleles at either locus A or locus B, in the same or a different linkage phase, only the F1 individuals with homozygous genotypes for that locus are informative. When the founder PSFi shares one or two alleles with the founder PCF at both loci, in the same or a different linkage phase, only the doubly homozygous F1 individuals are informative. The linkage phase of markers is estimated from the data with linkage analysis methods for full-sib families obtained from crosses of heterozygous parents (Maliepaard et al., 1997; Wu et al., 2002). The haplotypic conditional probabilities are calculated considering only informative F1 individuals, with the equations stated in Table 1, for intervals delimited by informative flanking markers. Consequently, the length of the interval and the number of informative F1 individuals may vary among the intervals and founder haplotypes analyzed.


Table 1. Founder-origin probabilities for each possible haplotypic state of F1 progeny.

 
Marker loci at the ends of the linkage groups are considered in the analysis whether they are informative or not. If the marker loci at the ends are not informative, then founder-origin probabilities are calculated as follows: the founder-origin probabilities of the H1x allele in the locus at the position x between a non-informative marker extreme and an informative marker locus with genotypes AA and B1B2, respectively, are Pr(H1x|B1) = 1 – rBx and Pr(H1x|B2) = rBx. The founder-origin probabilities for the allele Hi3x are obtained in an analogous manner, and the conditional probabilities for the alleles H2x and Hi4x are the corresponding complementary probabilities. The founder-origin probabilities described above are calculated for all informative individuals in F1 populations, in every map position between informative flanking markers. The phenotypic data and the founder-origin probabilities are used for the QTL analysis as outlined below.
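The body of Table 1 is not reproduced in this text, so the sketch below falls back on the standard flanking-marker conditional probabilities under no interference, for a founder in coupling phase A1B1/A2B2. That choice is an assumption and may differ in layout from the authors' table. The end-of-linkage-group case follows the expressions given in the paragraph above. All function names are hypothetical.

```python
import math

def rec_frac(d_cm):
    """Inverse Haldane map function; distance in cM, no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d_cm) / 100.0))

def prob_h1_given_flanking(hap, d1, d2, x):
    """Pr(H1x | marker haplotype) for a founder with phase A1B1/A2B2.

    hap is the marker haplotype transmitted by the founder:
    'A1B1', 'A1B2', 'A2B1' or 'A2B2'. Requires d1 < x < d2.
    """
    r_ax, r_bx, r_ab = rec_frac(x - d1), rec_frac(d2 - x), rec_frac(d2 - d1)
    table = {
        "A1B1": (1 - r_ax) * (1 - r_bx) / (1 - r_ab),  # no recombination A-B
        "A1B2": (1 - r_ax) * r_bx / r_ab,              # recombinant between x and B
        "A2B1": r_ax * (1 - r_bx) / r_ab,              # recombinant between A and x
        "A2B2": r_ax * r_bx / (1 - r_ab),              # double recombinant
    }
    return table[hap]

def prob_h1_given_end_marker(allele, d_b, x):
    """End-of-group case from the text: Pr(H1x|B1) = 1 - rBx, Pr(H1x|B2) = rBx."""
    r_bx = rec_frac(d_b - x)
    return 1.0 - r_bx if allele == "B1" else r_bx
```

The probability of the second allele H2x is the complement of each value, so the coefficients for opposite marker haplotypes (A1B1 versus A2B2, A1B2 versus A2B1) sum to one.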

Linear Model Formulation
Consider a number q of F1 populations with a common founder (PCF) and a second, nonidentical founder (PSFi) for every population i (i = 1, 2, ..., q). Let A and B be marker loci at a given map distance in the linkage group, and x a map position between markers A and B, with founder-origin probabilities PHCFx|ABij and PHSFx|ABij (Table 1) that an F1 individual j from population i with marker haplotype AB has inherited specific founder alleles from PCF and PSFi, respectively, at a locus at position x. A basic linear model can be fit as follows:

\[ y_{ij} = \mu_i + \alpha_{H_{CF}x} P_{H_{CF}x|AB_{ij}} + \alpha_{H_{CF}'x} \left(1 - P_{H_{CF}x|AB_{ij}}\right) + \alpha_{H_{SF_i}x} P_{H_{SF}x|AB_{ij}} + \alpha_{H_{SF_i}'x} \left(1 - P_{H_{SF}x|AB_{ij}}\right) + \varepsilon_{ij} \]   [2]
where yij refers to the phenotypic value of individual j in population i; µi is the mean of population i; αHCFx and αHSFix are the parameters corresponding to the fixed effects of the allele at the first homologs (Jansen et al., 1998; Lynch and Walsh, 1998) of the founders PCF and PSFi for the putative QTL at a locus at position x in population i; αHCF′x and αHSFi′x are the parameters corresponding to the fixed effects of the allele at the second homologs of the founders PCF and PSFi; and εij is a random variable identically distributed with mean zero and variance σ²i that includes the background effect, the environmental variation, and the inadequacy of the model. The model [2] can be reparameterized as

\[ y_{ij} = \mu^{*}_i + \alpha^{*}_{H_{CF}x} P_{H_{CF}x|AB_{ij}} + \alpha^{*}_{H_{SF_i}x} P_{H_{SF}x|AB_{ij}} + \varepsilon_{ij} \]   [3]
Here, µ*i = µi + αHCF′x + αHSFi′x, and α*HCFx = αHCFx − αHCF′x and α*HSFix = αHSFix − αHSFi′x are the allele-substitution fixed effects (Jansen et al., 1998) of the QTL alleles of founders PCF and PSFi in population i, respectively. A second model is also formulated to include the intralocus interaction among QTL alleles at the QTL loci, adding the dominance effects to the additive effects model [2], so that

[4]
where the coefficients δHCFHSFix, δHCFHSFi′x, δHCF′HSFix, and δHCF′HSFi′x represent the fixed dominance effects between the QTL alleles of the locus at position x of founders PCF and PSFi. A reduction is achieved by setting the restriction on the dominance effects δi = δHCFHSFix = –δHCFHSFi′x = –δHCF′HSFix = δHCF′HSFi′x, and using the allele-substitution effects in [4], resulting in

[5]
Here, δi is the dominance effect in population i.
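The displayed forms of models [4] and [5] are not available in this text. As a hedged sketch only, the algebra below shows one way the four dominance terms could collapse into a single coefficient under the stated restriction. It assumes that each dominance effect is weighted by the product of the two founders' origin probabilities (that is, that the two founders transmit haplotypes independently). The shorthand p and s is hypothetical, and the authors' own derivation is in the APPENDIX.

```latex
% Shorthand used only in this sketch:
%   p = P_{H_{CF}x|AB_{ij}}, \quad s = P_{H_{SF}x|AB_{ij}}
\begin{align*}
\text{dominance term}
  &= \delta_{H_{CF}H_{SF_i}x}\,ps
   + \delta_{H_{CF}H_{SF_i}'x}\,p(1-s)
   + \delta_{H_{CF}'H_{SF_i}x}\,(1-p)s
   + \delta_{H_{CF}'H_{SF_i}'x}\,(1-p)(1-s) \\
  &= \delta_i\,\bigl[\,ps - p(1-s) - (1-p)s + (1-p)(1-s)\,\bigr] \\
  &= \delta_i\,(2p-1)(2s-1).
\end{align*}
```

Under these assumptions the single regressor attached to δi would be (2p − 1)(2s − 1), which is +1 or −1 when both haplotypes are known with certainty and shrinks toward 0 as either origin becomes uncertain.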

The details of reparameterization of models [2] and [4] are shown in the APPENDIX. The models [3] and [5] are referred to here as the additive effects model, and the additive and dominance effects model, respectively.
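For a single population with homogeneous residual variance, the additive effects model [3] can be fitted by ordinary least squares. The sketch below is a minimal illustration with NumPy on made-up data; the function name, the simulated probabilities, and the effect sizes are assumptions for the example and are not taken from the article's simulations.

```python
import numpy as np

def fit_additive_model(y, p_cf, p_sf):
    """Least-squares fit of y = mu* + a_cf * p_cf + a_sf * p_sf + error."""
    p_cf = np.asarray(p_cf, dtype=float)
    p_sf = np.asarray(p_sf, dtype=float)
    # Design matrix: intercept, common-founder and second-founder probabilities.
    X = np.column_stack([np.ones_like(p_cf), p_cf, p_sf])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta  # [mu*, alpha*_CF, alpha*_SF]

# Hypothetical data: 100 F1 individuals at one map position.
rng = np.random.default_rng(1)
p_cf = rng.uniform(0.0, 1.0, size=100)
p_sf = rng.uniform(0.0, 1.0, size=100)
y = 10.0 + 2.0 * p_cf - 1.0 * p_sf + rng.normal(0.0, 1.0, size=100)
mu_hat, a_cf_hat, a_sf_hat = fit_additive_model(y, p_cf, p_sf)
```

Repeating the fit at every 1 cM position, with the probabilities recomputed for that position, would give the profile of allele-substitution estimates along the linkage group.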

Linear regression analyses (Reyes-Valdés and Williams, 2002) can be implemented to estimate models [3] and [5] for every F1 population, where µ*i is the intercept and α*HCFx, α*HSFix, and δi are the regression coefficients. For convenience, the additive effects and the additive and dominance effects linear models for the single population analyses are written in matrix notation as

\[ y_i = X_i \beta_i + e_i = X_{1i} a_i + Z_i b_i + e_i \]   [6]
where yTi = [yi1 yi2 · · · yini] is the vector of phenotypic observations of population i, and ni is the number of individuals in population i; X1i = 1i is an ni × 1 vector of ones; ai = [µ*i] is the intercept for population i; Zi = [PHCF/ABi PHSF/ABi] is the founder-origin probability matrix for the additive effects model for population i, where PTHCF/ABi = [PHCFx/ABi1 PHCFx/ABi2 · · · PHCFx/ABini] is the vector containing the haplotypic conditional probabilities for population i corresponding to the common founder PCF, and PTHSF/ABi = [PHSFx/ABi1 PHSFx/ABi2 · · · PHSFx/ABini] is the vector containing the haplotypic conditional probabilities corresponding to the founder PSFi of population i; Zi = [PHCF/ABi PHSF/ABi γi] is the founder-origin probability matrix for the additive and dominance effects model for population i, where γTi = [γi1 γi2 · · · γini] holds the coefficients γij that multiply δi in model [5]; bTi = [α*HCFx α*HSFix] is the vector of fixed parameters for the additive effects model, in which α*HCFx is the allele-substitution parameter of the putative QTL alleles in the common founder PCF and α*HSFix is the allele-substitution parameter of the QTL alleles corresponding to the founder PSFi from population i; bTi = [α*HCFx α*HSFix δi] is the vector of fixed parameters for the additive and dominance effects model, in which δi is the dominance effect for population i; and eTi = [εi1 εi2 · · · εini] is the vector of random deviations for population i. Here Xi = [X1i Zi] and βTi = [aTi bTi]. The random vector ei is assumed to be normally distributed with E(ei) = 0 and Var(ei) = Ri = Iiσ²i, where Ii denotes the identity matrix of order ni and σ²i is the component of the residual variance corresponding to population i. Alternatively, all populations can be analyzed simultaneously using a covariance model with a structured residual covariance matrix (Searle, 1971; Littell et al., 1996). The linear model for the combined analysis is represented in matrix notation by

\[ y = X \beta + e = X_1 a + Z b + e \]   [7]
where yT = [yT1 yT2 · · · yTq] is the vector of phenotypic observations of all populations; X1 = ⊕ X1i (i = 1, ..., q), where ⊕ denotes the matrix direct sum; aT = [aT1 aT2 · · · aTq] is the intercept vector; Z = [PHCF/AB ⊕PHSF/ABi] is the founder-origin probability matrix for the additive effects model, with PTHCF/AB = [PTHCFx/AB1 PTHCFx/AB2 · · · PTHCFx/ABq]; Z = [PHCF/AB ⊕PHSF/ABi ⊕γi] is the founder-origin probability matrix for the additive and dominance effects model; bT = [α*HCFx α*HSF1x · · · α*HSFqx] is the vector of the fixed covariate parameters for the additive effects model, in which α*HCFx is the allele-substitution parameter of the putative QTL alleles in the common founder PCF and α*HSFix is the allele-substitution parameter of the QTL alleles corresponding to the founder PSFi of population i; bT = [α*HCFx α*HSF1x · · · α*HSFqx δ1 · · · δq] is the fixed covariate parameter vector for the additive and dominance effects model, in which δi is the dominance parameter for population i; and eT = [eT1 eT2 · · · eTq] is the vector of random deviations. Here X = [X1 Z] and βT = [aT bT]. The random vector e is also assumed to be normally distributed with E(e) = 0 and Var(y) = Var(e) = R = ⊕ Iiσ²i, given that the vector β only contains fixed-effect parameters.
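To make the block structure of the combined model [7] concrete, the sketch below assembles a design matrix for the additive effects model: one intercept column per population, one shared column for the common founder, and one second-founder column per population. The function name and the column ordering are assumptions made for illustration.

```python
import numpy as np

def combined_design(p_cf_by_pop, p_sf_by_pop):
    """Build X = [X1 | Z] for the combined additive effects model.

    Each argument is a list with one sequence of founder-origin
    probabilities per population (common founder, second founder).
    """
    sizes = [len(p) for p in p_cf_by_pop]
    n, q = sum(sizes), len(sizes)
    X1 = np.zeros((n, q))    # direct sum of the vectors of ones (intercepts)
    Z_sf = np.zeros((n, q))  # direct sum of the second-founder probabilities
    start = 0
    for i, (p_sf, n_i) in enumerate(zip(p_sf_by_pop, sizes)):
        X1[start:start + n_i, i] = 1.0
        Z_sf[start:start + n_i, i] = p_sf
        start += n_i
    # The common founder contributes a single column shared by all populations.
    z_cf = np.concatenate([np.asarray(p, dtype=float) for p in p_cf_by_pop])[:, None]
    return np.hstack([X1, z_cf, Z_sf])
```

With q = 5 populations this gives 2q + 1 = 11 columns, matching the parameter count stated for the additive effects model without cofactors. It is the single shared common-founder column that lets all n = n1 + ... + nq individuals inform one allele-substitution estimate.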

It is assumed that the genetic background effects absorbed by the residual component of the model are independent among individuals within populations in model [6], and among populations in model [7]. This assumption might be unrealistic because individuals within populations are full-sibs and individuals from different populations are half-sibs. To control part of the genetic background by reducing the segregation variance generated by linked and unlinked QTLs when the analysis is performed for a given position in the linkage map, appropriate markers outside the interval analyzed can be fitted as cofactors in models [3] and [5]. The addition of marker cofactors to partially remove the background genetic effect has been shown to increase the sensitivity and precision of QTL mapping (Jansen and Stam, 1994; Zeng, 1994). Since the number of observations in the combined analysis differs from that in the single population analyses, the markers associated with the interval analyzed for a given linkage group may differ between the combined analysis and the single population analyses. Therefore, the selection of cofactor sets should be done separately for each single population analysis and for the combined analysis.

Cofactors are included in models [3] and [5] by adding the term

[8]
where I is the interval of the linkage group analyzed, delimited by two fully informative marker loci (A and B); CHCFk1ij and CHSFk2ij are the known coefficients for the k1th and k2th markers selected as cofactors of the common founder and second founder of individual j from population i, taking the value 1 or 0 depending on the marker haplotype; and vHCFk1i and vHSFk2i are the associated regression coefficients. The model in matrix notation [6] is modified to include the cofactors by redefining the following matrices

where CHCFk1i and CHSFk2i are ni × 1 vectors of known coefficients of the k1th and k2th common founder and second founder cofactors, and m1i and m2i are the numbers of marker cofactors considered from the common founder and second founder haplotypes in population i. The model in [7] is modified by redefining the following matrices as

In this case, CHCFk1 is an ni × 1 vector of known coefficients of the k1th common founder cofactor, and m1 is the number of marker cofactors in the common founder haplotype considered in the combined model.

The estimation of the variance components for both single population QTL analyses and combined analysis can be performed by restricted maximum likelihood (REML) using a ridge-stabilized Newton-Raphson algorithm, which given conditions of regularity of the likelihood function and adequate starting values, produces a quadratic convergence. The best linear unbiased estimators of the fixed effects parameters are obtained by solving the mixed model equations (Searle et al., 1992; Littell et al., 1996). The analyses are performed at each 1 cM position in the linkage group, and the likelihood ratio statistic (LR) is calculated by obtaining the difference between the –2 times the REML log likelihood of the reduced model with no QTL consideration (l0) and the full model with the QTL parameters (l1). Full models are represented by [3], [5], [6], and [7]. The reduced models only include the parameter µ*i and the random deviation {epsilon}ij for the population analyses, and the vectors X1a and e for the combined analysis. The REML log likelihood function for the full model is described as follows

[9]
for the single population analyses, where p = 3 + m1i + m2i in model [3] and p = 4 + m1i + m2i in model [5]. If no cofactors are used, then m1i = m2i = 0. The matrices Xi and Ri and the vector βi are those established for model [6]. Likewise, the REML log likelihood function for the combined analysis is represented by

[10]

The matrices X and R and the vector β correspond to model [7]. In this case, p = m1 + m2i + 2q + 1 for the additive effects model, and p = m1 + m2i + 3q + 1 for the additive and dominance effects model. Note that m1 + m2i = 0 when cofactors are not considered in the model.

The REML log likelihood functions for the reduced models (l0) for the single population analyses and the combined analysis are obtained by substituting X1i for Xi and ai for βi in [9], and X1 for X and a for β in [10], with p = m1i + m2i + 1 in [9] and p = m1 + m2i + q in [10]. The –2 REML log likelihood functions are evaluated at the values of the fixed effects vector and covariance matrix that maximize [9] and [10], as given by the Newton-Raphson algorithm; for example, the REML estimate of R and the generalized least squares (GLS) estimate of β in [10], which are denoted $\hat{R}$ and $\hat{\beta} = (X^{T}\hat{R}^{-1}X)^{-}X^{T}\hat{R}^{-1}y$, respectively. The estimated variance-covariance matrices of the GLS estimates of βi and β are $(X_i^{T}\hat{R}_i^{-1}X_i)^{-}$ and $(X^{T}\hat{R}^{-1}X)^{-}$, respectively (Searle, 1971; Searle et al., 1992; Littell et al., 1996).
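The displayed forms of [9] and [10] are not available in this text. As a hedged sketch, the code below uses the standard REML log likelihood for a linear model with fixed effects only, in the special case of one population with Var(e) = σ²I, where the REML variance estimate has the closed form RSS/(n − p). That is an assumption standing in for the ridge-stabilized Newton-Raphson search performed by the MIXED procedure, and the function names are hypothetical.

```python
import numpy as np

def neg2_reml_loglik(y, X):
    """-2 x REML log likelihood for y = X beta + e with Var(e) = sigma^2 I.

    Assumes X has full column rank and the residual sum of squares is > 0.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / (n - p)                   # REML estimate of sigma^2
    _, logdet = np.linalg.slogdet(X.T @ X)   # log |X'X|
    return (n - p) * np.log(2.0 * np.pi * sigma2) + logdet + (n - p)

def likelihood_ratio(y, X_reduced, X_full):
    """LR statistic: -2 l0 (reduced model) minus -2 l1 (full model)."""
    return neg2_reml_loglik(y, X_reduced) - neg2_reml_loglik(y, X_full)
```

For the single population additive model, X_reduced would be the column of ones and X_full would add the two founder-origin probability columns; the resulting statistic is what gets compared with the χ² threshold at each map position.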

The significance of the putative QTL can be assessed by approximating the distribution of the likelihood ratio test statistic with the χ² distribution (Self and Liang, 1987). The approximate thresholds are $\chi^2_{\alpha/M,\,2q}$ and $\chi^2_{\alpha/M,\,3q}$ for the additive effects model and the additive and dominance effects model, respectively, where q is the number of populations (q = 1 for single population analyses) and M is the number of intervals in the genome. The overall significance level of α/M is discussed by Zeng (1994). Empirical thresholds based on the permutation test would be a more robust alternative (Churchill and Doerge, 1994), although computationally more demanding.

Genetic Considerations
Let us assume first an F1 population developed from the cross of the founder clones, PCF and PSFi, a putative QTL in the locus at position x in the linkage group, with alleles H1x and H2x in the founder PCF, and the Hi3x and Hi4x alleles in the founder PSFi. The regression coefficients of the phenotype on H1x and Hi3x founder-origin probabilities of the locus at position x for F1 individuals in model [3] and [5], represent the fixed allele-substitution effect of H1x by H2x and Hi3x by Hi4x.

Since not all possible QTL allele combinations are obtained in the heterozygous progeny of F1 populations from crossing QTL-informative clones, there are some limitations in estimating dominance deviations. In fact, dominance can be estimated only by a lack of parallelism between the phenotypic values of the genotypic pairs H1Hi3x, H2Hi3x, and H1Hi4x, H2Hi4x. Only these four genotypes are available for estimating both additive and dominance genetic effects, which gives three degrees of freedom: two independent parameters are used to estimate the allele-substitution effects, leaving one degree of freedom for estimating a dominance parameter. Given this constraint, we must impose a restriction on the dominance effects such that δi = δHCFHSFix = –δHCFHSFi′x = –δHCF′HSFix = δHCF′HSFi′x. A value δi ≠ 0 would indicate a lack of parallelism and imply the existence of dominance among QTL alleles at a locus. Conversely, however, a value of δi = 0 would not definitively exclude the existence of dominance, as the dominance effect could be affecting the phenotype equally at all four genotypes, in which case it is not detected.

This principle can be extended to several F1 populations that share a common founder. Such a scenario is seen in the context of breeding populations of fruit tree species, like T. cacao, in which the common founder is a selected clone with some very desirable traits, but also with an undesirable genetic constitution for other traits. This clone is therefore crossed to other selected clones with good complementary responses for other characteristics, for further selection of the superior recombinants in the F1 populations.

Theoretical calculations as well as computer simulation research have shown that, under conditions of regularity, high resolution QTL mapping is dependent on large progeny sizes. The power of QTL detection (the probability of detecting a true marker-trait association) is improved by decreasing the within-marker class variance, or residual variance, which comes with increased sample size (Soller and Genizi, 1978; Lander and Botstein, 1989; Weller et al., 1990; Lynch and Walsh, 1998). In the most general view, the q F1 populations can be considered a set of n = n1 + n2 + · · · + nq individuals containing the common founder PCF haplotype, establishing a very suitable scenario in which to implement a haplotype-based approach for QTL analysis. The power of QTL detection in a combined analysis that includes all populations is expected to increase in a manner proportional to the number of populations included in the analysis, and hence to the increase in sample size, resulting in more accurate QTL maps. When the intralocus interaction (dominance) with the alleles of the second parent of every population is incorporated into the model as shown above, however, an additional assumption of independence from interlocus QTL allele effects (epistasis) must be made. Departure from this assumption might result in lower power of QTL detection and biased estimates of the allele-substitution and dominance parameters. Another consideration is that if the common founder is not QTL informative (that is, it is homozygous at the QTL), then αHCFx = αHCF′x and α*HCFx = αHCFx − αHCF′x = 0 in the single population analysis and in the combined analysis. The utility of the proposed method therefore relies on the assumption that the common founder is, indeed, heterozygous for the putative QTL.

Data Simulation and Statistical Analysis
Four data sets of five F1 populations of 100 individuals each, obtained from the crosses of one heterozygous common founder clone (PCF) with five other founder clones (PSF1, PSF2, PSF3, PSF4, and PSF5), were simulated with a macro written in SAS for Windows Version 9.0 (SAS Institute Inc., Cary, NC). The genome of every individual consisted of two chromosomes of 100 cM in length with one marker locus every 5 cM, and two QTLs, the first located at a position 59 cM distal from the beginning of the first chromosome and the second located at a position 29 cM distal from the beginning of the second chromosome. The genomes of the parental clones were simulated considering different percentages of homozygosity (20, 30, 35, 40, 45, and 50% for founders PCF, PSF1, PSF2, PSF3, PSF4, and PSF5, respectively), represented by non-informative marker loci located randomly along the chromosomes. A frequency of 5% of allele sharing between founder clones for different loci was also included. The recombination probabilities between homologs were obtained under the assumption of no interference among marker loci, selecting the location in the chromosome at random for each recombination. The genotypic model for simulation corresponding to population i is described in Table 2.
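The recombination step described above can be sketched as a crossover process without interference along one 100 cM chromosome. The code below is an illustrative Python stand-in for the authors' SAS macro, not a reproduction of it. The function name is hypothetical, and the marker grid (21 loci at 0, 5, ..., 100 cM) is an assumption consistent with one marker every 5 cM.

```python
import random

def simulate_gamete(marker_positions_cm, length_cm=100.0, rng=random):
    """Return the founder homolog (0 or 1) carried at each marker in one gamete.

    Crossovers are placed with exponential spacing of mean 100 cM
    (a Poisson process, i.e. no interference).
    """
    crossovers = []
    pos = rng.expovariate(1.0 / 100.0)
    while pos < length_cm:
        crossovers.append(pos)
        pos += rng.expovariate(1.0 / 100.0)
    strand = rng.randint(0, 1)  # homolog present at the start of the chromosome
    origins, k = [], 0
    for m in sorted(marker_positions_cm):
        # Switch homolog once for every crossover located before this marker.
        while k < len(crossovers) and crossovers[k] < m:
            strand = 1 - strand
            k += 1
        origins.append(strand)
    return origins

markers = [5.0 * i for i in range(21)]  # 0, 5, ..., 100 cM (assumed grid)
gamete = simulate_gamete(markers, rng=random.Random(0))
```

An F1 individual would combine one such gamete from the common founder with one from the second founder; over many gametes, the fraction of recombinants between adjacent markers approaches the Haldane value for 5 cM (about 0.048).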


Table 2. Genotypic values for the F1 progeny of population i, with founders PCF and PSFi (i = 1, 2, 3, 4, 5).

 
The four phenotypic data sets were simulated with additive QTL effects in both chromosomes, but with QTL dominance effects only for the first chromosome (Table 3), with very specific restrictions on the genetic parameter values, explained below. The allele-substitution coefficients for the first chromosome were set to

[11]
Here, {alpha}*HCFx and {alpha}*HSFix refer to the allele-substitution of the putative QTL in the first chromosome of the common founder and the second founder of population i (i = 1, 2, 3, 4, 5). The dominance effects used for simulation were

[12]
for population 1 to 5, respectively. The restrictions on the allele-substitution effects for the second chromosome were

[13]


Table 3. Genetic parameters used for data simulation of the five populations using the model with additive and dominance effects, for two chromosomes of 100 cM in length each.

 
The residual component was obtained as a random observation from a normal distribution with mean zero and variance set to meet the restrictions in Table 3. The statistical analysis was performed with a macro written in SAS for Windows Version 9.0. The macro estimates the reduced and full models [6] and [7], the likelihood ratio test statistic, and the covariance parameters and their standard errors with the MIXED procedure (Littell et al., 1996) at every 1 cM position of the linkage group. The analyses were performed for the additive model and for the additive and dominance model, the latter both with and without cofactors. The significance of the allele-substitution effects for the different founders and of the dominance effects was tested with a t test. All markers except the flanking markers of the interval being analyzed for the putative QTL were considered candidate cofactors. Cofactors were selected for models [6] and [7] separately by multiple regression using backward elimination (α = 0.05).


    RESULTS
Plots of the likelihood ratio (LR) test statistic against the chromosomal position obtained with the additive and dominance effects model are shown in Fig. 1. Analyses of the first simulated data set, which had the QTLs of smallest effect (Table 3), did not give satisfactory estimates of the QTL position. However, analyses performed on the data sets of the other three simulations, with QTLs of larger magnitude, showed that the QTL position can be estimated with precision by this approach, as described next.



Fig. 1. Likelihood ratio curves vs. genome position of the haplotype-based QTL analyses for two linkage groups of 100 cM in length each, of the simulated five F1 populations (Pop1, ... , Pop5) of 100 individuals each. The single population analyses and the combined analyses (Comb) were performed under the additive and dominance effects model.

 
The LR test statistic was larger when this method, based on founder-origin probabilities, was run as a combined population analysis than when it was run as single population analyses. This was expected, since the difference in the number of parameters between the full and reduced models is not the same for the two types of analysis. The number of extra parameters fitted in the full model represents the expected value of the LR test statistic according to its asymptotic χ² distribution (Self and Liang, 1987). Therefore, we compared the test statistic of both types of analysis to χ² thresholds defined by the significance level and the degrees of freedom determined by the number of extra parameters fitted in the full model. Thresholds using the χ² approximation are $\chi^2_{0.05/40,\,3} = \chi^2_{0.00125,\,3} \approx 15.8$ and $\chi^2_{0.05/40,\,15} = \chi^2_{0.00125,\,15} \approx 37.04$ for the single population analyses and the combined analyses, respectively. The likelihood ratio had larger values in chromosome 1 than in chromosome 2 over the chromosome segment containing the putative QTL, as expected, since chromosome 1 has the larger QTL. Analyses based on single populations showed values of the test statistic larger than the respective threshold in the chromosome segment surrounding the putative QTL. The likelihood ratio obtained with the combined analyses was larger than the threshold for most of chromosome 1. The absolute maximum of the test statistic in the combined analyses was generally in a segment very close to the true QTL position in the linkage group, while the absolute maxima of the likelihood ratio test statistic were slightly distant from this position for the individual population analyses.
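The two χ² thresholds quoted above can be checked with a short standard-library sketch. It evaluates the upper-tail probability of a χ² variable with integer degrees of freedom in closed form and inverts it by bisection. The function names, the search interval, and the iteration count are choices made for this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum.
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def chi2_threshold(alpha, df, lo=0.0, hi=500.0):
    """Value t with P(X > t) = alpha, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

single = chi2_threshold(0.05 / 40, 3)     # single population analyses
combined = chi2_threshold(0.05 / 40, 15)  # combined analysis, 3q with q = 5
```

Here M = 40 corresponds to the two 100 cM chromosomes divided into 5 cM marker intervals.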

Likelihood ratio plots from analyses using the additive effects model only (results not shown) were very similar to the plots in Fig. 1, but with slightly lower values of the test statistic for chromosome 1. Figure 2 shows the plots of the likelihood ratio versus the chromosome position for the analyses from the model with both additive and dominance effects and with cofactors. The number of cofactors retained after backward elimination varied from none to 10 for the single-population analyses and from 1 to 11 for the combined analyses. Adding marker cofactors to the model improved the estimate of the QTL position in chromosome 1 for simulations 2, 3, and 4 by removing substantial residual variance. While the absolute maximum of the test statistic curves tended to be over intervals close to the true QTL position, the inflection points flanking the maximum peaks covered segments larger than 30 cM in length when analyses were performed without cofactors (Fig. 1). These segments were narrowed to approximately 10 cM in length when cofactors were added to the analyses (Fig. 2). The confidence intervals based on the two-LOD rule (Van Ooijen, 1992) are shown in Table 4. The cofactor model seriously misestimated the QTL position in the first simulated data set, which had the smallest QTL effects. However, the misestimation of the QTL position was no larger than 1 cM in the other three simulations, which had QTL alleles of larger magnitude. For the QTL in chromosome 1 in simulations 2, 3, and 4, the model that included cofactors gave shorter confidence intervals, no larger than 6 cM, that contained the true QTL positions.
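As an illustration of the two-LOD rule applied to a likelihood ratio curve, the sketch below converts a drop of two LOD units into LR units (LR = 2 ln(10) × LOD, about 9.21 for two LOD units) and reports the contiguous segment around the peak that stays above that cutoff. The positions and LR values are hypothetical toy numbers, not results from this study, and the function name `two_lod_interval` is an assumption made for the example.

```python
import math

LOD_TO_LR = 2 * math.log(10)  # one LOD unit is about 4.605 LR units

def two_lod_interval(positions, lr_values, drop_lod=2.0):
    """Return (peak position, (left, right)) of the support interval on the scan grid."""
    peak_lr = max(lr_values)
    cutoff = peak_lr - drop_lod * LOD_TO_LR
    peak = lr_values.index(peak_lr)
    # Walk outward from the peak while the curve stays at or above the cutoff.
    left = peak
    while left > 0 and lr_values[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lr_values) - 1 and lr_values[right + 1] >= cutoff:
        right += 1
    return positions[peak], (positions[left], positions[right])

# Hypothetical scan every 5 cM along part of a linkage group.
positions = [0, 5, 10, 15, 20, 25, 30, 35, 40]
lr_curve = [3.0, 8.0, 20.0, 34.0, 41.0, 36.0, 30.0, 12.0, 4.0]

peak_pos, interval = two_lod_interval(positions, lr_curve)
print(peak_pos, interval)  # 20 (15, 25)
```

The interval is only resolved to the scan grid here; a finer grid or interpolation between grid points would give tighter bounds.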



 
Fig. 2. Likelihood ratio curves vs. genome position of the haplotype-based QTL analyses for two linkage groups of 100 cM in length each, of the simulated five F1 populations (Pop1, ... , Pop5) of 100 individuals each. The single population analyses and the combined analyses (Comb) were performed under the additive and dominance effects model. Cofactors were included in the analyses with a window of 5 cM.

 

 
Table 4. Estimated position of the putative QTL and its confidence interval (in parentheses) for the analyses performed using the model with additive and dominance effects.

 
The estimated allele-substitution effects of the QTLs contained in the founder clones, using the additive and dominance effects model, are shown in Table 5. The estimates were obtained with the combined analyses using cofactors, and correspond to the likelihood ratio curves shown in Fig. 2. In the first simulated data set, with small QTL values, the estimated effects of the QTL alleles in both chromosomes were less accurate, and the analysis did not find the correct position of the QTLs.


 
Table 5. Allele-substitution effects of the parental founder clones corresponding to the five simulated populations. The true parameter values are shown on the first row of every combination of chromosome and simulation number, the estimated position and effect below, and the standard error of the estimated effect (in parentheses).

 
For the QTL alleles on chromosome 1, the allele-substitution effect in all four simulations (Table 5) was significantly different from 0 only for the common founder and the second founder of the first population. For the latter three simulations, even though there is an upward bias in the point estimates of the allele-substitution effects of these founders, a 95% confidence interval contains the true parameter value. Zeng (1993) showed that the partial regression coefficients are biased estimates of QTL effects. In this research, the allele-substitution effects of QTL alleles in the common founder were upwardly biased by approximately 15, 5, and 7% for QTLs that explain on average 14, 28, and 35% of the phenotypic variance across populations, respectively (Table 3). Larger bias, with larger standard errors than in the common founder, was observed for the point estimates of the allele-substitution effects for QTL alleles in the second parent of the first population. The average effects of allele substitution in chromosome 1 of founders 3 and 4 were not significant. These were QTLs of smaller effect than those from the common founder and the second founder of the first population, as given by (11). Seventy-five percent of the estimates of the allele-substitution effects in chromosome 2 of founders 2, 3, and 4 were significant, with a relatively good approximation to the true value, as expected for QTL alleles of stronger magnitude.

The estimates of dominance effects are shown in Table 6. Most of the significant dominance coefficient estimates were observed for chromosome 1 of the third and fourth simulations, in which the dominance effect of the QTL had a larger magnitude. As in the case of the allele-substitution effect, there was a tendency among the significant coefficients to overestimate the true parameter value. There were two false positives, one in the allele-substitution effect and one in the dominance coefficient, both on chromosome 2 (Tables 5 and 6).


 
Table 6. Dominance effects for the five simulated populations. The true parameter values are shown on the first row of every combination of chromosome and simulation number, the estimated position and effect below, and the standard error of the estimated effect (in parentheses).

 

    DISCUSSION
 
This study shows the use of breeding populations of outbred plant species to map QTLs based on founder-origin probabilities (Reyes-Valdés, 2000; Reyes-Valdés and Williams, 2002) that trace specific haplotypes from the founders to their progeny. Figures 1 and 2 showed that the test statistic values increase when a set of F1 populations with one common founder is considered in the analysis, and that the absolute peaks of the curves were within approximately 1 cM on either side of the real QTL position in the linkage group for quantitative trait loci that explain, on average, a minimum of 14% of the phenotypic variation. Although the allele-substitution effect of mild and strong QTLs in the common founder was often overestimated (Table 5), a 95% confidence interval contains the real value of the parameter.

The most recent QTL map for cocoa was developed by Clement et al. (2003a, 2003b) from the crosses of three female parental clones, DR1, S52, and IMC78, and the male parental clone Catongo. DR1 and S52 are Trinitario genotypes and IMC78 is an upper Amazon Forastero, with heterozygosity estimates of 37, 27, and 27%, respectively; Catongo is a lower Amazon Forastero clone with a highly homozygous genotype. Each population was analyzed individually using an approximation to a testcross. The numbers of individuals in the populations developed from the female parents DR1, S52, and IMC78 were 96, 94, and 125 for yield components, vigor, and resistance to P. palmivora, and 95, 88, and 124 for bean traits and ovule number, respectively. QTL analyses using a backcross model with a sample size of 100, and assuming an informative marker linked 5 cM from the putative QTL, would be able to detect a QTL whose segregation accounts for a minimum of approximately 23% of the total variance with a power of detection (probability of detecting a true association) of 90% and a significance level of 5% (Lynch and Walsh, 1998). A half-sib design in which every marker-informative male parent is crossed to 100 female parents and a single offspring is scored from each mating would have a power of detection of 44% at a significance level of 5%, assuming a linked informative marker 5 cM away from a QTL that explains 14% of the phenotypic variance. However, power is increased to 75% if three half-sib families of 100 offspring each are used to perform the analysis, and to 90% if five half-sib families of 100 offspring each are used for the calculations (Lynch and Walsh, 1998). The results that we presented using the haplotype-based method give clear evidence that combining F1 populations to perform association analysis would increase the likelihood ratio peaks over the region where the QTLs are located, as discussed by Jansen et al. (2003) for multiple related F2:3 populations, yielding more accurate and precise QTL maps than single population analyses, especially for mild and strong QTLs contained in the common founder. However, more extensive simulation research should be done to test the proposed method, including different numbers of populations and population sizes, unequal sizes among populations, and multiple QTLs in a linkage group. This method was designed to be implemented with fully informative codominant markers, such as restriction fragment length polymorphism (RFLP) and simple sequence repeats (SSR). Markers that are less than fully informative, or dominant markers, cannot be used with the haplotype-based method as described above. Research should be done on the implementation of partially informative markers to estimate QTL as an extension of the approach outlined in this study.

Marker cofactors were successfully used in the models of this research to control genetic background (Jansen and Stam, 1994; Zeng, 1994). Another possibility to explore for controlling genetic background, and thus obtaining more precise QTL estimates, is the use of a structured covariance matrix that would include both components of genetic variance and covariance among individuals (Gianola et al., 2003; Lund et al., 2003; Piepho, 2000), given that individuals of the same population are full-sibs and individuals from different populations with one common founder are half-sibs. Thus, Cov(y_ij, y_ij') = d1 and Cov(y_ij, y_i'j') = d2 are the covariances within and among populations, respectively, due to additive and nonadditive gene action, including possible correlations caused by the environment. Precise prior estimates of the narrow-sense heritability for the analyzed trait would allow separation of the additive component of variance and covariance from the nonadditive components; otherwise they will be pooled together because the F1 progeny are unreplicated. We used a ridge-stabilized Newton-Raphson algorithm in this research. It generally converges in few iterations and makes available asymptotic sampling variances for the estimated parameters; however, it requires matrix inversion in each iteration, which makes it highly demanding of computational resources. A QTL analysis with a structured variance-covariance matrix as described above could also be performed with a derivative-free algorithm, which would require more rounds to reach convergence but would be computationally feasible since matrix inversion is not required (Meyer, 1989; Searle et al., 1992).
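To make the full-sib and half-sib covariance structure described above concrete, the following sketch builds such a matrix for a small hypothetical design. It is an illustration only, not the model fitted in this study: the function name `structured_covariance` and the numeric values of the total variance, d1, and d2 are assumptions chosen for the example.

```python
def structured_covariance(n_pops, n_per_pop, total_var, d1, d2):
    """Covariance matrix for F1 populations that share one common founder.

    Individuals are ordered population by population. Diagonal entries hold
    the total phenotypic variance, pairs within a population (full-sibs) get
    d1, and pairs from different populations (half-sibs) get d2.
    """
    n = n_pops * n_per_pop
    cov = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                cov[a][b] = total_var
            elif a // n_per_pop == b // n_per_pop:
                cov[a][b] = d1  # same population: full-sibs
            else:
                cov[a][b] = d2  # different populations: half-sibs
    return cov

# Hypothetical example: 2 populations of 3 individuals each.
V = structured_covariance(n_pops=2, n_per_pop=3, total_var=1.0, d1=0.3, d2=0.15)
for row in V:
    print(row)
```

The matrix has a block pattern: blocks on the diagonal carry d1 off the main diagonal, and all off-diagonal blocks carry d2. With five populations of 100 individuals the matrix would be 500 × 500, which gives a sense of the cost of inverting it at every iteration.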

The haplotype-based approach proposed in this study requires founders that are both marker-informative and QTL-informative, that is, crosses between founders with heterozygous QTL genotypes and relatively closely linked heterozygous markers. Such a set of conditions indicates that this method is better suited for highly heterozygous outbred plant species, as is the case for many outcrossing perennial plants with high genetic load (Williams, 1998), and for traits with mainly additive gene action, as has been shown for cocoa clones (Clement et al., 2003a). In addition, combining populations by a common founder for QTL analysis implies the assumption of no interaction of the putative QTL with the genetic background. When this approach is used in full-sib F1 populations in which every founder appears in more than one cross, populations can be combined by common founders for QTL analysis (each population will be in more than one combination). This increases not only the accuracy and precision of the QTL position and effect estimates, but also the number of putative QTLs, given the increase in the number of parents considered.


    APPENDIX
 
Reparameterization of Models [2] and [4]
The allele effects are reparameterized in terms of the allele-substitution effects as follows

The reduction in the dominance parameters is achieved as follows

Received for publication November 6, 2003.


    REFERENCES
 

