Published online 25 April 2006
Published in Crop Sci 46:1323-1330 (2006)
© 2006 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING & GENETICS

Association Analysis as a Strategy for Improvement of Quantitative Traits in Plants

Flavio Breseghello^a and Mark E. Sorrells^b,*

a Embrapa Rice and Beans, Santo Antônio de Goiás, GO, Brazil, 75375
b Dep. of Plant Breeding, Cornell University, 252 Emerson Hall, Ithaca, NY, 14853-1902

* Corresponding author (mes12@cornell.edu)


    ABSTRACT
 
Association analysis is a method potentially useful for detection of marker-trait associations based on linkage disequilibrium, but little information is available on the application of this technique to plant breeding populations. With appropriate statistical methods, valid association analysis can be done in plant breeding populations; however, the most significant marker may not be closest to the functional gene. Bias can arise from (i) covariance among markers and QTL, frequently related to population structure or intense selection and (ii) differences in initial frequencies of marker alleles in the population, such that exclusive alleles tend to be in higher association. The potentials and limitations of germplasm bank collections, synthetic populations, and elite germplasm are compared, as experimental materials for association analysis integrated with plant breeding practice. Synthetics offer a favorable balance of power and precision for association analysis and would allow mapping of quantitative traits with increasing resolution through cycles of intermating. A model to describe the association between markers and genes as conditional probabilities in synthetic populations under recurrent selection is proposed, which can be computed on the basis of assumptions related to the history of the population. This model is useful for predicting the potential of different populations for association analysis and forecasting the response to marker-assisted selection.

Abbreviations: LD: linkage disequilibrium • AA: association analysis • QTL: quantitative trait locus/loci


    INTRODUCTION
 
Association analysis (AA), also known as association mapping or linkage disequilibrium mapping, is a method that relies on linkage disequilibrium to study the relationship between phenotypic variation and genetic polymorphisms (reviewed by Flint-Garcia et al., 2003). Linkage disequilibrium (LD) is the nonrandom combination of alleles at two genetic loci, which in random mating populations is mostly generated by mutation and genetic drift, and decays by recombination. Therefore, LD will be observed between two loci if they are in tight linkage or if the haplotype is recent (Hedrick, 2005). The time (t) in generations since one specific mutation occurred is geometrically distributed. Thus, if the mutation rate is µ, the expectation of t is E(t) = 1/µ. Mutations are rare events (µ ≪ 1); hence, it is expected that most mutations happened many generations ago and should be in linkage equilibrium with other loci, unless they are very closely linked.
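As a rough illustration of why old mutations are expected to be in linkage equilibrium with all but closely linked loci, the sketch below combines the expected age E(t) = 1/µ with the textbook decay of LD under random mating, D_t = D_0(1 - c)^t. The decay formula is a standard population-genetics result rather than something stated in this section, and the function names and numeric values are hypothetical.

```python
def expected_age(mu):
    """Expected number of generations since a mutation, E(t) = 1/mu."""
    return 1.0 / mu


def ld_remaining(c, t):
    """Fraction of the initial LD left after t generations of random mating.

    Assumes the standard decay D_t = D_0 * (1 - c)**t, where c is the
    recombination frequency between the two loci.
    """
    return (1.0 - c) ** t


# Hypothetical comparison: a short breeding history (10 generations)
# versus a long natural history (1000 generations).
for c in (0.001, 0.01, 0.5):
    print(c, ld_remaining(c, 10), ld_remaining(c, 1000))
```

With only a few generations of recombination, even loosely linked loci can retain much of their initial LD, which is the situation the following paragraphs attribute to plant breeding populations.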

While significant LD in random mating populations is evidence of tight linkage, population perturbations like migration, inbreeding, and selection can build up LD among loosely linked or even unlinked loci. Therefore, the characteristics of the population under study must be recognized when conducting AA and interpreting its results. Scientific plant breeding is a recent activity that normally involves a narrow genetic pool, such that breeding populations can be traced back to relatively few original parents, normally landraces, within a relatively small number of generations (e.g., Bered et al., 2002; Lu et al., 2005). Under this scenario, mutations play a minor role and most of the observed LD is expected to reflect the haplotypes of the original parents. Moreover, because there were few opportunities for recombination between the time of introduction of a parent and the present, LD in some plant breeding populations may not reliably indicate tight linkage. Between unlinked loci, LD can be caused by simultaneous selection of combinations of alleles at different genes, including epistasis, and by population structure (Hartl and Clark, 1997). Both phenomena should be common in plant breeding populations. Selection should affect LD in parts of the genome related to traits that are relevant for the breeding program. This source of distortion should be taken into consideration in the interpretation of results of AA in a case-specific manner. In contrast, population structure is expected to affect the pattern of LD over the whole genome and must be controlled a priori for correct association analysis (Pritchard et al., 2000b).

Most of the literature on AA refers to human populations or theoretical panmictic populations. There is limited information and discussion about applications of this technique to plant breeding. As the information generated by quantitative trait loci (QTL) studies accumulates, a method is needed to efficiently convert that information into practical tools for plant selection. Association analysis can be an effective approach for closing the gap between QTL analysis and marker-assisted selection.

The objective of this paper is to raise awareness among plant breeders of practical and theoretical aspects related to the application of AA in plant breeding programs. We compare three types of plant populations—germplasm bank collections, synthetic populations, and elite lines—with respect to their potential and limitations as experimental materials for AA and propose that populational breeding represents a favorable setting for AA. We describe a model to make predictions about marker-gene associations in synthetic populations, which can be useful for evaluating the potential of a given population for AA, and forecasting the response to marker-assisted selection.


    Marker Allele Means Are Biased Estimators of Gene Allele Effects
 
Association analysis holds promise as a strategy to implement marker-assisted selection for quantitative traits in plant breeding programs. However, the breeder should be aware of the risk of biased estimation or even false inference resulting from this method.

Significant association between marker and trait depends on differences between phenotypic means of lines carrying different marker alleles as an indication of the effects of a gene in LD with the marker. However, estimation of gene effects through molecular markers is susceptible to errors caused by sampling variance and systematic biases. When the same data set is used for both detection and estimation of effects, sampling errors cause overestimation of gene effects because the errors that help a marker reach significance also inflate the estimated difference between alleles. QTL × environment interaction is another example of sampling error, related to the finite number of locations and years tested. For this reason, before undertaking the effort of transferring a QTL allele, cross validation is advisable to confirm its consistency and to estimate unbiased expectations of genetic gain. Cross validation can be achieved on the basis of independent data sets (Melchinger et al., 1998), preferably in multiple environments, or by resampling from a larger data set (Schön et al., 2004). Previous studies have demonstrated that the cross-validated allele effect was only about half the amount initially detected and that frequently the presence of the QTL was not repeatable (Melchinger et al., 1998; Schön et al., 2004).
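The overestimation that arises when one data set serves for both detection and estimation can be illustrated with a small simulation. This is a minimal sketch under assumed values (true effect, sample size, unit error variance, and a one-sided 1.96 standard error threshold are all hypothetical); it is not the procedure used in the cited studies.

```python
import random
import statistics


def simulate_detected_effects(true_effect=0.2, n_per_allele=50,
                              n_experiments=2000, seed=1):
    """Return marker-allele mean differences that passed a detection threshold.

    Each experiment draws phenotypes for two marker classes (sd = 1) and keeps
    the estimated effect only if it exceeds 1.96 standard errors.
    """
    rng = random.Random(seed)
    std_error = (2.0 / n_per_allele) ** 0.5
    threshold = 1.96 * std_error
    detected = []
    for _ in range(n_experiments):
        with_allele = [rng.gauss(true_effect, 1.0) for _ in range(n_per_allele)]
        without_allele = [rng.gauss(0.0, 1.0) for _ in range(n_per_allele)]
        estimate = statistics.fmean(with_allele) - statistics.fmean(without_allele)
        if estimate > threshold:
            detected.append(estimate)
    return detected


detected = simulate_detected_effects()
mean_detected = statistics.fmean(detected)
print(len(detected), round(mean_detected, 3))
```

Because only estimates above the threshold are retained, their average necessarily exceeds the true effect whenever the threshold does. Re-estimating the effect in an independent sample avoids this selection step.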

Sampling errors are a consequence of limited sample size. However, as demonstrated in the next section, marker allele means could be biased estimators of gene effect even in infinite samples.


    Association between a QTL and a Molecular Marker in Terms of Conditional Probability
 
Marker-based selection consists of selecting individuals on the basis of their molecular profile, justified by evidence that those individuals are more likely to bear a favorable allele at a gene of phenotypic relevance than individuals taken at random. If the evidence is true, the probability of the presence of the favorable allele, given that the associated marker allele is present, is higher than the frequency of the favorable allele in the population. Hence, associations can be treated as conditional probabilities for selection purposes.

To express the association between the marker and the gene as a conditional probability, let us define y as a quantitative trait under control of w QTL in a breeding population, such that the phenotype of a pure line i can be described by the simplified statistical model

$$y_i = \mu + \sum_{q=1}^{w} \alpha_{qi} + e_i$$
where µ is a constant, α_qi is two times the additive effect of the allele carried by the line i at the QTL q (q = 1,...,w), and e_i represents the error associated with the phenotypic evaluation of line i, e_i ~ N(0, σ²). Let L be the QTL under test (q = 1) and g_i be the sum of additive effects of the alleles carried by line i at all other QTL (q = 2,...,w), or "polygenic effect." Then the statistical model can be simplified to

$$y_i = \mu + \alpha_{1i} + g_i + e_i \qquad [1]$$
For simplicity, consider L a biallelic locus in which the allele A has additive effect α_A/2 and allele a has additive effect equal to zero (α_a = 0).

Now consider the phenotypic values associated with a marker locus. Although the molecular marker is considered functionally neutral, it can affect the expectation of phenotypic value by changing the probabilities of the alleles at L and the expectation of the polygenic effect. The expected value of y can be expressed as a conditional expectation, given the allele M at a molecular marker locus J:

$$E(y \mid M) = \Pr(A \mid M)\,E(y \mid AM) + \Pr(a \mid M)\,E(y \mid aM)$$
Because a is the complement of A, Pr(a|M) = 1 – Pr(A|M), and

$$E(y \mid M) = \Pr(A \mid M)\,E(y \mid AM) + [1 - \Pr(A \mid M)]\,E(y \mid aM)$$
On the basis of Eq. [1], and considering α_a = 0, the conditional expectations are E(y|AM) = µ + α_A + E(g|M) and E(y|aM) = µ + E(g|M); consequently,

$$E(y \mid M) = \mu + \Pr(A \mid M)\,\alpha_A + E(g \mid M)$$
The conditional expectation of the polygenic effects g given M can be expressed as a covariance term (through the method of moments) between g and an indicator variable I_M, such that I_M = 1 when J = M; I_M = 0 otherwise.

$$E(g \mid M) = \frac{N}{n_M}\,\mathrm{cov}(g, I_M) + E(g)$$
where N is the sample size and n_M is the number of plants with marker allele M. In the case of a quantitative trait influenced by multiple genes, potentially with multiple alleles per gene, there are many possible combinations of alleles at w – 1 QTL. None of those combinations is of particular interest in this case because the interest is in estimating the effect of the allele A at L. Hence, the polygenic effect can be considered as a random effect, g_i ~ N(0, σ²_g), creating the settings for a mixed effects model. Since E(g) = 0,

$$E(g \mid M) = \frac{N}{n_M}\,\mathrm{cov}(g, I_M)$$
Hence

$$E(y \mid M) = \mu + \Pr(A \mid M)\,\alpha_A + \frac{N}{n_M}\,\mathrm{cov}(g, I_M) \qquad [2]$$
In AA, we estimate α_A from the mean of plants carrying the marker allele M; therefore, it is desirable to maximize Pr(A|M) and to minimize cov(g, I_M). Additionally, Eq. [2] shows that rare alleles (N/n_M ≫ 2) are more susceptible to the biases caused by covariances between the marker and the polygenic effects.
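The covariance term in Eq. [2] can be checked numerically. The sketch below assumes the reading E(y|M) = µ + Pr(A|M)α_A + (N/n_M)cov(g, I_M), with the covariance computed by the method of moments (divisor N); the data values and function names are hypothetical.

```python
def conditional_mean_decomposition(g, has_M):
    """Check mean(g | M) = mean(g) + (N / n_M) * cov(g, I_M) on a sample.

    g      : polygenic values of the lines (hypothetical numbers)
    has_M  : 1 if the line carries marker allele M, else 0 (indicator I_M)
    The covariance uses the divisor N (method of moments).
    """
    N = len(g)
    n_M = sum(has_M)
    mean_g = sum(g) / N
    mean_I = n_M / N
    cov_g_I = sum(gi * Ii for gi, Ii in zip(g, has_M)) / N - mean_g * mean_I
    mean_g_given_M = sum(gi for gi, Ii in zip(g, has_M) if Ii) / n_M
    return mean_g_given_M, mean_g + (N / n_M) * cov_g_I


def expected_marker_mean(mu, alpha_A, pr_A_given_M, N, n_M, cov_g_I):
    """Eq. [2] as read here: mu + Pr(A|M)*alpha_A + (N/n_M)*cov(g, I_M)."""
    return mu + pr_A_given_M * alpha_A + (N / n_M) * cov_g_I


# Hypothetical sample: the two lines carrying M also have above-average g,
# as could happen when M marks one subpopulation.
g = [1.0, 0.8, -0.3, -0.5, -0.4, -0.6]
has_M = [1, 1, 0, 0, 0, 0]
direct, via_cov = conditional_mean_decomposition(g, has_M)
print(direct, via_cov)
```

In this made-up sample the marker allele is carried by one third of the lines, so the covariance is multiplied by three. The same covariance would shift the marker mean less if M were common.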


    Population Structure Must Be Considered for Valid Association Analysis
 
The covariances of polygenic effects observed among relatives represent a source of bias in the estimation of gene effects (Kennedy et al., 1992). Population structure is the presence of subgroups in the sample in which individuals are more closely related to each other than the average pair of individuals taken at random in the population. Substructure is a common cause of covariances of polygenic effects because relatives tend to share marker and gene alleles genomewide. If the candidate marker is tested by fitting the simple linear model y_i = µ + α_1i + e_i, the covariance in Eq. [2] is included in the error term. If the marker J is in linkage equilibrium with the other QTL influencing y, the covariance of polygenic effects with the marker allele M is null [cov(g, I_M) = 0] and creates no biases. However, if QTL alleles are arranged in any systematic way, the error term will no longer be identically and independently distributed, contradicting a basic assumption of the analysis of variance.

The bias can be avoided to the extent that factors related to the covariances among QTL can be identified and included in the model. Races (e.g., indica or japonica rice) or major breeding pools (e.g., spring or winter wheat) represent strong population structuring and must be recognized in the analysis. Secondary subdivisions or hidden population structure can be inferred through unlinked marker data (Pritchard et al., 2000a).

Inclusion of population subdivision as random effects in a mixed model allows for the computation of unbiased estimates of allele effects. Considering a simplified case where individuals are discretely assigned to one of k subpopulations, without admixture, the variance–covariance matrix V(Y) can be represented as a group of submatrices (Littell et al., 1996):

$$
V(Y) = \begin{bmatrix} V_1 & O & \cdots & O \\ O & V_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & V_k \end{bmatrix},
\qquad
V_j = \begin{bmatrix} \sigma_y^2 & \sigma_s^2 & \cdots & \sigma_s^2 \\ \sigma_s^2 & \sigma_y^2 & \cdots & \sigma_s^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_s^2 & \sigma_s^2 & \cdots & \sigma_y^2 \end{bmatrix}
$$

O represents matrices of zeros of suitable dimensions, V_j is the submatrix for the lines of subpopulation j, and σ²_y = σ²_s + σ². Hence, the covariance between two lines would be σ²_s if they belong to the same subpopulation, or zero if they belong to different subpopulations. Comparisons between subpopulations have variance 2(σ²_s + σ²), whereas comparisons within subpopulations have variance 2σ², which compensates for the inflating effect of covariances of polygenic effects with marker alleles, restoring the validity of the hypothesis test (Kennedy et al., 1992). More complex models can accommodate different levels of relationship and admixture of subpopulations (Yu et al., 2006). On the other hand, a gene that is polymorphic among subpopulations, but nearly monomorphic within subpopulations, will have its effect confounded with polygenic effects and is unlikely to be detected by AA (Deng, 2001).

Population stratification can be controlled in different levels of detail, depending on the desired level of confidence. As the population is divided in more subgroups, the probability of false positives is reduced, at the cost of a reduction in statistical power (Cardon and Palmer, 2003). The proportion of residual variance that is captured by population structure can be quantified in the mixed model as the intraclass correlation coefficient, icc = σ²_s/(σ²_s + σ²) (Neter et al., 1996). A high icc indicates that a large proportion of the variance of the trait is observed between subpopulations.
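The block structure of V(Y) and the intraclass correlation can be sketched in a few lines. The example assumes discrete subpopulation labels without admixture, as in the simplified case above; the labels, variance values, and function names are hypothetical, and no variance components are estimated here.

```python
def build_covariance(subpop, var_s, var_e):
    """Variance-covariance matrix V(Y) for lines assigned to subpopulations.

    subpop : list of subpopulation labels, one per line (no admixture)
    var_s  : variance among subpopulations (sigma_s^2)
    var_e  : residual variance (sigma^2)
    """
    n = len(subpop)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                V[i][j] = var_s + var_e      # sigma_y^2 on the diagonal
            elif subpop[i] == subpop[j]:
                V[i][j] = var_s              # shared subpopulation effect
    return V


def variance_of_difference(V, i, j):
    """Var(y_i - y_j) = V_ii + V_jj - 2 * V_ij."""
    return V[i][i] + V[j][j] - 2.0 * V[i][j]


def intraclass_correlation(var_s, var_e):
    """icc = sigma_s^2 / (sigma_s^2 + sigma^2)."""
    return var_s / (var_s + var_e)


# Hypothetical example: five lines in two subpopulations.
labels = ["indica", "indica", "indica", "japonica", "japonica"]
V = build_covariance(labels, var_s=3.0, var_e=1.0)
print(variance_of_difference(V, 0, 1))   # within a subpopulation
print(variance_of_difference(V, 0, 3))   # between subpopulations
print(intraclass_correlation(3.0, 1.0))
```

A marker allele that differs only between the two groups is therefore tested against the larger between-subpopulation variance, which is how the mixed model guards against structure-driven false positives.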


    Choice of Populations for Association Analysis in Plant Breeding Programs
 
In a plant breeding program, three main types of populations could be considered for implementation of AA: germplasm bank collections, elite breeding materials, and synthetic populations. The application of AA differs among those populations in several aspects (Table 1). For efficient integration of AA with other methods currently in use, material that is routinely generated and evaluated should be used for both purposes. In the case of germplasm banks, core collections are expected to represent most of the genetic variability with a manageable number of accessions (Zhang et al., 2000), and thus are suitable for genetic studies. In the case of elite materials, the sample could be composed of lines and checks evaluated in regional trials. For synthetic populations, the evaluation unit should also be the association unit (or closely related to it), whether it is an individual or a family.


Table 1. Comparison of different types of populations for association analysis.

 
Germplasm Bank Core Collections
Samples representing the genetic diversity of a species are attractive for AA because of the wide allele diversity encompassed. Methods of selection of core collections often involve genotyping unlinked markers to compute genetic distances, thus providing information about population structure. The process of selection of a minimum sample with maximum variation has a normalizing effect that is expected to reduce population structure and LD between unlinked loci, thus creating a situation favorable for association analysis (Breseghello and Sorrells, 2005). A difficulty likely to occur in this type of material is related to genetic heterogeneity within samples. Landraces and natural populations often consist of open-pollinated varieties or mixtures of genotypes, and the DNA extraction, genotyping, and phenotyping schemes must account for this variability.

Core collections are useful materials for AA of qualitative traits, such as disease resistance or special quality characteristics (color, aroma, etc.). Studies focusing on domestication-related traits such as seed dormancy, shattering, or inflorescence type also could require wide phenotypic variation, beyond the limits of cultivated germplasm (Clark et al., 2004). Conversely, the broad genetic variability of those collections normally makes them unsuitable for analysis of quantitative traits because some of the accessions would be unadapted to growing conditions and prevalent diseases, resulting in poor precision of trait measurement.

Common ancestors of distantly related individuals occurred many generations ago; therefore, LD is expected to have decayed to short genetic distances. For this reason, AA in core collections will probably require candidate genes or major QTL mapped within narrow confidence intervals (Thornsberry et al., 2001). Compared with linkage-based fine mapping and positional cloning (Yan et al., 2003), the AA approach would offer the advantage of simultaneously detecting the effect and screening the germplasm for useful alleles. Significant markers would be useful for introgression of the new variation into elite germplasm through marker-assisted backcrossing (Frisch and Melchinger, 2005), while markers used for population structure inference could be used to speed up the recovery of the recurrent parent genome. Theoretical projections indicate that the use of two markers per chromosome for selection against the donor genotype could shorten the transfer by about two generations (Hospital et al., 1992).

Elite Lines and Cultivars
Maximum relative efficiency of marker-assisted selection compared with phenotypic selection is expected when heritability is low and markers capture a significant portion of the variation for the trait (Lande and Thompson, 1990). Elite lines are desirable materials for AA of low heritability traits, including yield, yield components, and tolerance to abiotic stresses because elite lines are genetically stable and are well adapted to normal growing conditions.

In plant breeding programs, there is normally a large body of phenotypic data accumulated for elite lines and cultivars from replicated field experiments over locations and years. Use of those data for AA requires statistical models accounting for covariances introduced both by experimental design (years, locations, replicates) and polygenic effects. Moreover, those data are often unbalanced because new lines are included in field trials each year, while other lines are discarded. Maximum likelihood solutions of mixed-effects models yield minimum-variance unbiased estimates of allele effects from unbalanced data, taking into account the correlation structure of the data (Pinheiro and Bates, 2000). Mixed-effects models were used to analyze plant height, disease resistance, and grain moisture in maize (Parisseaux and Bernardo, 2004) and grain size and milling quality in wheat (Breseghello and Sorrells, 2005).

Population structure can be prominent in elite material because it is common for closely related lines to be admitted to advanced trials. If pedigrees are known, the relationships among the lines can be determined (Bered et al., 2002) and used to control for polygenic effects (Zhang et al., 2005). In this case, it is not essential to estimate population structure through unlinked markers, although there may still be interest in marker data as a genetic fingerprint for variety protection (Röder et al., 2002) and for purity control of seed production.

A typical elite plant breeding pool is derived from few founders in the recent past, and is submitted to intense selection. For those reasons, LD is expected to be high in this material, and the first experimental results confirm this expectation (Ching et al., 2002; Tenaillon, 2001). Although AA in elite lines may not offer much improved resolution compared with QTL analysis in biparental mapping populations, there are at least two important advantages: a substantially higher level of polymorphism and detection of favorable alleles directly in the target population.

Elite lines are natural candidates for crossing to generate the next round of breeding, and significant markers could be used for marker-assisted selection in the progeny. However, the breeder needs to confirm whether a given pair of parents differing for the marker indeed differs for the gene, before using the marker as a proxy for selection (Table 2). With a less-than-perfect association between M and A, some lines carrying M may have a, whereas some lines with m may have A. In this way, although a cross M × m for the marker is more likely to be A × a for the gene, it can also be A × A, a × a, or even a × A. Validation could be achieved by demonstrating association between F2 genotypes and F3 phenotypes for the quantitative trait. This test would have high statistical power because the design is balanced, no population structure is expected, and no multiple testing is involved.
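The four possible outcomes of a cross between an M parent and an m parent can be given rough probabilities. The sketch below assumes that the two parents are sampled independently, so that their QTL alleles are independent given their marker alleles; this assumption, the function name, and the numeric values are illustrative and do not come from the source.

```python
def cross_probabilities(pr_A_given_M, pr_A_given_m):
    """Probability of each gene-level cross when parents are M x m at the marker.

    Assumes the two parents are sampled independently, so their QTL alleles
    are independent given their marker alleles (an illustrative assumption).
    Keys list the allele of the M parent first.
    """
    p, q = pr_A_given_M, pr_A_given_m
    return {
        "A x a": p * (1.0 - q),
        "A x A": p * q,
        "a x a": (1.0 - p) * (1.0 - q),
        "a x A": (1.0 - p) * q,
    }


# Hypothetical association strengths.
probs = cross_probabilities(pr_A_given_M=0.8, pr_A_given_m=0.1)
for cross, prob in probs.items():
    print(cross, round(prob, 3))
```

Even with a fairly strong association, a noticeable share of M × m crosses would not segregate for the gene in the expected way, which is why validation within each cross is advisable.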


Table 2. Results from association analysis and interpretation for plant breeding.

 
Synthetic Populations
Although the potential of synthetic populations for AA is largely unknown, they might be the plant breeding materials that best approximate the assumption of random mating because synthetics are normally designed and maintained to minimize inbreeding. Population structure is expected to be mild or absent, which is an important advantage of synthetics for AA. If the experimental material represents a single intermating population, the power of AA is maximized and the risk of false associations is minimized (Cardon and Palmer, 2003). Nevertheless, population structure can still occur because of differences in flowering time, plant height, and other traits that may lead to assortative mating.

Genotypic information could be useful in all phases of population breeding. In the choice of parents to form the population, knowledge of the genetic distance among lines would be useful to achieve a compromise between high means for agronomic traits and high allelic variability. By genotyping samples of subsequent cycles with unlinked markers, breeders can monitor changes in allele diversity, effective population size, and population structure (Courtois et al., 2005; Ramis et al., 2005).

The allele diversity of synthetic populations depends on the number and divergence of parents and the intensity of selection applied. Genetic diversity can be expressed, among other measures, as the effective allele number, A_e = 1/Σp_i², where p_i is the frequency of allele i (Hartl and Clark, 1997). An approximate effective population size can be derived from estimates of LD (r²) among unlinked markers, as N_e = 1/(2r²) (Hedrick, 2005). Reduced effective population size can cause genetic drift. Conversely, allele changes beyond that expected from genetic drift for a given population size indicate genomic regions that were probably affected by phenotypic selection (De Koeyer et al., 2001; Labate et al., 1999).
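Both monitoring statistics are simple to compute once allele frequencies and pairwise r² values are available. The following minimal sketch uses hypothetical function names and input values, and takes the two formulas exactly as quoted in the paragraph above.

```python
def effective_allele_number(freqs):
    """A_e = 1 / sum(p_i^2) for allele frequencies p_i that sum to 1."""
    return 1.0 / sum(p * p for p in freqs)


def effective_population_size(r2_unlinked):
    """Approximate N_e = 1 / (2 * r^2), with r^2 averaged over unlinked markers."""
    return 1.0 / (2.0 * r2_unlinked)


# Hypothetical values.
print(effective_allele_number([0.25, 0.25, 0.25, 0.25]))  # four equally frequent alleles
print(effective_allele_number([0.85, 0.05, 0.05, 0.05]))  # one predominant allele
print(effective_population_size(0.01))
```

The two allele-frequency vectors have the same number of alleles, but the skewed one has a much lower effective allele number, showing how the statistic penalizes rare alleles.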

The level of LD in synthetic populations is expected to be high in the initial generations, such that a genome scan could detect large chromosome segments associated with traits, and trace them back to parental haplotypes. In subsequent generations, the decay of LD by recombination would favor increasingly refined mapping. However, synthetic populations are often submitted to recurrent selection, a breeding scheme consisting of successive cycles of evaluation, selection, and recombination (Fehr, 1987). Intense selection could build up LD by favoring allelic combinations or by promoting genetic drift (Palaisa et al., 2003). For this reason, populations subjected to mild or no selection would be preferred for AA. Laurie et al. (2004) developed a population for association analysis from the Illinois high/low oil populations, with 10 generations of recombination without selection.

In pedigree breeding, significant markers have to be confirmed in each cross, while in populational breeding, they can be included in selection indices, along with phenotypic information (Lande and Thompson, 1990), on the basis of their probabilistic association with the trait. The relative weight attributed to phenotypes and genotypes in the selection index could fluctuate according to the quality of the phenotypic evaluation in each cycle. When traits can be evaluated with precision, selection could be done on the basis of phenotypes, while associations with markers would be established. In cycles when field experiments fail to give precise data, selection could depend more heavily on genotypic data. This scheme represents a carry-on of information from a "good year" to a "bad year" through genetic markers. Selection based exclusively on marker data has been referred to as "genotype building" (Dekkers and Hospital, 2002), and it has been demonstrated by simulation that it could give genetic gains for a few generations following phenotyping, even if the linkage between genes and markers is not very tight (Hospital et al., 1997).

AA in synthetic populations under selection will require intensive genotyping because in each cycle, new progenies have to be tested to reflect the current state of the population and for implementation of marker-assisted selection. On the other hand, information about a population is cumulative over years, allowing a progressively refined genetic analysis of traits of interest to the breeding program.

A Genetic Model for Estimation of Pr(A|M) in Synthetic Populations under Recurrent Selection
We demonstrated that the association between marker and gene can be expressed as conditional probabilities and that synthetics are especially useful for AA in plant breeding. In the context of synthetic populations under recurrent selection, the conditional probability of the gene allele A, given the marker allele M, can be computed on the basis of the history of the population. This model assumes no epistasis, no genetic drift, and constant relative fitness coefficients.

Suppose that a parental line P was used in the synthesis of the population, contributing the exclusive allele A at the QTL L, and the allele M, not necessarily exclusive, at the molecular marker locus J. Let c represent the recombination frequency between L and J, and suppose that P contributed a proportion φ of the genetic base of the population. Additionally, let θ represent the frequency of the marker allele M which was not contributed by P. Under those settings, in the initial generation, the genotypic frequencies are Pr(AM|t_0) = φ, Pr(aM|t_0) = θ, Pr(am|t_0) = 1 – φ – θ, Pr(Am|t_0) = 0, where a and m represent alternative alleles at L and J, respectively (considering two biallelic loci). The frequencies of the gene and marker alleles of interest are Pr(A|t_0) = φ, Pr(M|t_0) = φ + θ; and the linkage disequilibrium between L and J is D_t0 = φ(1 – φ – θ).

Considering a recurrent selection scheme, phenotypic selection can be made for A, marker-based selection can be made for M, or selection can be imposed on a combination of both. The two-locus fitness, without epistasis, can be estimated by the multiplication of the relative fitness of each locus (Hedrick, 2005), such that w_AM.AM = w_AA × w_MM, with 0 ≤ w ≤ 1. Additionally, assume no linkage phase effect, such that w_AM.am = w_Am.aM = w_h. In the plant breeding context, relative fitness is proportional to the chance of the individual of being selected by the breeder.

Applying standard population genetics theory (Hedrick, 2005, p.560), the expected frequencies of gametes carrying each combination marker-gene in the next generation (t + 1) can be computed on the basis of selection and recombination:

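$$\Pr(AM \mid t+1) = \frac{\Pr(AM \mid t)\,w_{AM} - c\,w_h\,D_t}{\bar{w}_t}$$

$$\Pr(Am \mid t+1) = \frac{\Pr(Am \mid t)\,w_{Am} + c\,w_h\,D_t}{\bar{w}_t}$$

$$\Pr(aM \mid t+1) = \frac{\Pr(aM \mid t)\,w_{aM} + c\,w_h\,D_t}{\bar{w}_t}$$

$$\Pr(am \mid t+1) = \frac{\Pr(am \mid t)\,w_{am} - c\,w_h\,D_t}{\bar{w}_t}$$
where the selection coefficient of gametes is given by the weighted average fitness of individuals able to generate that gamete, as follows:

$$w_{AM} = \Pr(AM \mid t)\,w_{AM.AM} + \Pr(Am \mid t)\,w_{AM.Am} + \Pr(aM \mid t)\,w_{AM.aM} + \Pr(am \mid t)\,w_{AM.am}$$

$$w_{Am} = \Pr(AM \mid t)\,w_{Am.AM} + \Pr(Am \mid t)\,w_{Am.Am} + \Pr(aM \mid t)\,w_{Am.aM} + \Pr(am \mid t)\,w_{Am.am}$$

$$w_{aM} = \Pr(AM \mid t)\,w_{aM.AM} + \Pr(Am \mid t)\,w_{aM.Am} + \Pr(aM \mid t)\,w_{aM.aM} + \Pr(am \mid t)\,w_{aM.am}$$

$$w_{am} = \Pr(AM \mid t)\,w_{am.AM} + \Pr(Am \mid t)\,w_{am.Am} + \Pr(aM \mid t)\,w_{am.aM} + \Pr(am \mid t)\,w_{am.am}$$
and the population average fitness at generation t is given by

$$\bar{w}_t = \Pr(AM \mid t)\,w_{AM} + \Pr(Am \mid t)\,w_{Am} + \Pr(aM \mid t)\,w_{aM} + \Pr(am \mid t)\,w_{am}$$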
For each generation, linkage disequilibrium and average fitness are updated on the basis of the new gametic frequencies, with D_t = Pr(AM|t)Pr(am|t) – Pr(Am|t)Pr(aM|t), and the updated values are in turn used to compute the gametic frequencies of the following generation. The conditional probability of the gene allele A given the marker allele M can be computed as Pr(A|M,t) = Pr(AM|t)/Pr(M|t), where Pr(M|t) = Pr(AM|t) + Pr(aM|t).
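The cycle of selection and recombination can be sketched as a short iteration over the four gametic frequencies. This is a minimal sketch of the standard two-locus selection recursion with multiplicative fitness, which is assumed here to correspond to the model described above and may differ in detail from the authors' implementation. The function names, the fitness values, and the scenario are hypothetical.

```python
GAMETES = ("AM", "Am", "aM", "am")


def genotype_fitness(g1, g2, w_L, w_J):
    """Multiplicative two-locus fitness of the genotype formed by gametes g1, g2.

    w_L[k] is the fitness of a genotype with k copies of A at the QTL;
    w_J[k] is the fitness of a genotype with k copies of M at the marker.
    """
    n_A = (g1[0] == "A") + (g2[0] == "A")
    n_M = (g1[1] == "M") + (g2[1] == "M")
    return w_L[n_A] * w_J[n_M]


def next_generation(x, c, w_L, w_J):
    """One cycle of selection and recombination on gametic frequencies x."""
    D = x["AM"] * x["am"] - x["Am"] * x["aM"]
    # Marginal fitness of each gamete: average over the gametes it pairs with.
    w = {g: sum(x[h] * genotype_fitness(g, h, w_L, w_J) for h in GAMETES)
         for g in GAMETES}
    w_bar = sum(x[g] * w[g] for g in GAMETES)
    w_h = w_L[1] * w_J[1]          # fitness of double heterozygotes
    sign = {"AM": -1.0, "Am": 1.0, "aM": 1.0, "am": -1.0}
    return {g: (x[g] * w[g] + sign[g] * c * w_h * D) / w_bar for g in GAMETES}


def pr_A_given_M(x):
    """Pr(A|M) = Pr(AM) / [Pr(AM) + Pr(aM)]."""
    return x["AM"] / (x["AM"] + x["aM"])


# Hypothetical scenario: phi = 0.05, theta = 0.05, c = 0.01,
# mild phenotypic selection for A and no direct selection on the marker.
phi, theta, c = 0.05, 0.05, 0.01
x = {"AM": phi, "Am": 0.0, "aM": theta, "am": 1.0 - phi - theta}
w_L = {0: 0.8, 1: 0.9, 2: 1.0}
w_J = {0: 1.0, 1: 1.0, 2: 1.0}
for t in range(1, 11):
    x = next_generation(x, c, w_L, w_J)
print(round(pr_A_given_M(x), 3), round(x["AM"] + x["Am"], 3))
```

Setting all fitness values to 1 reduces the recursion to pure recombination, in which LD shrinks by the factor (1 - c) each generation. That case is a convenient check before adding selection.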

The Conditional Probability Pr(A|M) Is Maximum for Exclusive Marker Alleles
Using this model, the breeder can better understand the expected association between two loci in actual populations, on the basis of the tightness of linkage between them, the knowledge of the genetic base of the population, number of generations since its synthesis, and the intensity of phenotypic selection applied. Nevertheless, it should be clear that a large variance may be observed in real situations.

The conditional probability Pr(A|M) is a measure of LD between the marker and the gene. Markers at which the allele present in the parent P is exclusive in the population have maximum initial LD [Pr(A|M, t_0) = 1]. In contrast, nonexclusive marker alleles (for which θ > 0) have a lower starting LD. For example, consider a synthetic population formed by intercrossing 20 pure lines in equal frequency, in which the parent P contributed the exclusive gene allele A at the QTL L, located 1 cM from the SSR locus J, and 5 cM from the SSR locus K. If the allele carried by P at J was present in one of the 19 other parents used in the synthesis of the population (θ = 0.05), whereas the allele at K was exclusive (θ = 0), the allele at K is a better predictor of A than the allele at J, until the 17th cycle of recombination (Fig. 1). For a number of generations, θ is the major factor defining conditional probabilities.


Fig. 1. Variation of Pr(A|M) as a function of recombination frequency (c) and frequency of M from other parents (variable θ), over 20 generations, with φ = 0.05 and no selection.
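The 20-line example can be worked through numerically. The sketch below assumes no selection, so that LD decays as D_t = D_0(1 – c)^t under random mating. It also assumes that 1 cM and 5 cM correspond to c = 0.01 and c = 0.05, and that the initial frequency of M is simply the number of parents carrying it divided by 20. These are simplifying assumptions and the function name is hypothetical, so the exact crossover cycle it yields may differ by a cycle or two from the one read from the figure.

```python
def pr_a_given_m(t, c, n_parents=20, n_other_with_m=0):
    """Pr(A|M) after t cycles of random mating with no selection.

    Parent P carries both A and M. n_other_with_m is the number of other
    parents that also carry M (but not A). All parents contribute equally.
    """
    p_a = 1.0 / n_parents                       # Pr(A): contributed by P only
    p_m = (1 + n_other_with_m) / n_parents      # Pr(M): P plus other carriers
    d0 = p_a - p_a * p_m                        # initial LD, since Pr(AM|t0) = Pr(A)
    p_am = p_a * p_m + d0 * (1.0 - c) ** t      # LD decays by (1 - c) per cycle
    return p_am / p_m


if __name__ == "__main__":
    for t in (0, 5, 10, 15, 20, 25):
        locus_j = pr_a_given_m(t, c=0.01, n_other_with_m=1)  # close, nonexclusive
        locus_k = pr_a_given_m(t, c=0.05, n_other_with_m=0)  # farther, exclusive
        better = "K" if locus_k > locus_j else "J"
        print(f"t={t:2d}  J={locus_j:.3f}  K={locus_k:.3f}  better predictor: {better}")
```

In early cycles the exclusive allele at K gives the higher Pr(A|M) despite the looser linkage, and the closer locus J overtakes it only after enough recombination has accumulated.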

 
A short time frame is a fundamental characteristic of plant breeding populations for AA, compared with natural populations. Therefore, in plant breeding populations, the most significant association does not necessarily indicate the position of the gene. In the long term, linkage becomes the major factor defining the association between A and M, and only closely linked markers remain in high LD; however, the time required to achieve this situation is longer than most breeding programs have been in existence. For this reason, AA in plant breeding programs should be considered a method of detection of markers for indirect selection, rather than a method for fine-mapping QTL. To alleviate this problem, the breeder should use methods like recurrent selection, which maximizes the heterozygosity and the opportunities for recombination. In addition, the term association analysis seems more appropriate than association mapping, when the experimental material is a plant breeding population.

Use of Significant Markers for Marker-Assisted Selection
Once a genetic marker has been demonstrated to be associated with a phenotypic trait of interest, it can be used as a selection target to obtain an indirect response in the trait. In recurrent selection, markers could be used to store information acquired from phenotypic evaluations, which can be used for selection in later cycles. Likewise, in pedigree breeding, markers could carry information about yield potential from the phase of replicated field trials to the phase of single-plant selection, when evaluation of yield cannot be made with reasonable precision. If the linkage between A and M is tight, genetic gain can be accelerated by including M in a selection index that considers several traits and markers simultaneously (Falconer and Mackay, 1996; Lande and Thompson, 1990).

Figure 2 shows changes in Pr(A) and Pr(A|M) caused by phenotypic selection, marker-based selection, and a combination of both in a recurrent selection scheme, for three levels of linkage. In this example, each allele substitution was assumed to reduce the relative fitness of the individual by 0.10 in the case of the gene and by 0.25 in the case of the marker. The higher impact of the marker on the relative fitness (here interpreted as the chance of selection by the breeder) is justified by its higher recognizability compared with a gene underlying a quantitative trait. When the marker is closely linked to the gene (c = 0.01), marker-based selection is approximately as efficient as combined selection. For loose linkage (c = 0.10), combined selection is more efficient than either method alone. In all cases, the use of the marker improved selection efficiency. This advantage must be weighed against the additional costs of obtaining genotypic data to evaluate the economic efficacy of marker-assisted selection.


Fig. 2. Effect of selection based on the phenotype, the marker genotype, or a combination of both, for different levels of linkage between marker and gene, on Pr(A|M) (continuous lines) and Pr(A) (dashed lines), with φ = 0.05 and θ = 0. The relative fitness of genotypes AA, Aa, and aa was 1, 0.9, and 0.8, respectively, and of genotypes MM, Mm, and mm was 1, 0.75, and 0.5, respectively.
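The three selection schemes can be explored with a small simulation. The sketch below is not the authors' model: it uses the textbook two-locus recursion for random mating with viability selection, and it assumes that gene and marker fitnesses combine multiplicatively under combined selection. The fitness values are those given in the caption. All function and variable names are hypothetical.

```python
GAMETES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (copies of A, copies of M): AM, Am, aM, am

NEUTRAL = (1.0, 1.0, 1.0)   # no selection on this locus
GENE = (0.8, 0.9, 1.0)      # fitness for 0, 1, 2 copies of A (aa, Aa, AA)
MARKER = (0.5, 0.75, 1.0)   # fitness for 0, 1, 2 copies of M (mm, Mm, MM)


def next_generation(x, c, w_gene, w_marker):
    """One cycle of random mating, selection, and recombination.

    x holds the gametic frequencies [AM, Am, aM, am]. Genotype fitness is
    assumed to be the product of the gene and marker fitnesses.
    """
    w = [[w_gene[gi[0] + gj[0]] * w_marker[gi[1] + gj[1]] for gj in GAMETES]
         for gi in GAMETES]
    marginal = [sum(x[j] * w[i][j] for j in range(4)) for i in range(4)]
    w_bar = sum(x[i] * marginal[i] for i in range(4))   # population mean fitness
    d = x[0] * x[3] - x[1] * x[2]                       # linkage disequilibrium
    flow = c * w[0][3] * d      # recombinants produced by double heterozygotes
    signs = (-1, 1, 1, -1)      # AM and am lose what Am and aM gain
    return [(x[i] * marginal[i] + signs[i] * flow) / w_bar for i in range(4)]


def simulate(c, w_gene, w_marker, phi=0.05, generations=10):
    """Return (Pr(A), Pr(A|M)) after the given number of cycles.

    Starts from an exclusive marker allele: every M gamete carries A.
    """
    x = [phi, 0.0, 0.0, 1.0 - phi]
    for _ in range(generations):
        x = next_generation(x, c, w_gene, w_marker)
    return x[0] + x[1], x[0] / (x[0] + x[2])


if __name__ == "__main__":
    schemes = (("phenotype", GENE, NEUTRAL),
               ("marker", NEUTRAL, MARKER),
               ("combined", GENE, MARKER))
    for c in (0.01, 0.10):
        for label, w_gene, w_marker in schemes:
            p_a, p_a_m = simulate(c, w_gene, w_marker)
            print(f"c={c:.2f} {label:9s} Pr(A)={p_a:.3f} Pr(A|M)={p_a_m:.3f}")
```

Because the way the two fitness components are combined is an assumption here, the output should be read as a qualitative illustration of the trends described in the text rather than a reproduction of Fig. 2.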

 
The breeder may wish to select exclusively the favorable marker allele, trying to achieve fixation of the favorable gene allele in a single generation. This strategy, however, would require either evaluating a larger population or neglecting other traits in that generation, and intense selection for M could produce negative correlated responses. This strategy would probably be advisable only for eliminating low-frequency unfavorable alleles.

When Pr(A|M) ≈ Pr(A), the marker can be considered exhausted as a tool for indirect selection. Further gains could be obtained by phenotypic selection or by shifting to another marker in closer association with the gene. Over time, the expectation is that the causative polymorphism will be discovered and selected directly.

Received for publication September 7, 2005.



