Published online 31 May 2007
Published in Crop Sci 47:1082-1090 (2007)
© 2007 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING & GENETICS

Prospects for Genomewide Selection for Quantitative Traits in Maize

Rex Bernardo (a,*) and Jianming Yu (b)

a Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, 411 Borlaug Hall, 1991 Upper Buford Cir., St. Paul, MN 55108
b Dep. of Agronomy, Kansas State Univ., 2004 Throckmorton Hall, Manhattan, KS 66506

* Corresponding author (bernardo{at}umn.edu).


    ABSTRACT
The availability of cheap and abundant molecular markers in maize (Zea mays L.) has allowed breeders to ask how molecular markers may best be used to achieve breeding progress, without conditioning the question on how breeding has traditionally been done. Genomewide selection refers to marker-based selection without first identifying a subset of markers with significant effects. Our objectives were to assess the response due to genomewide selection compared with marker-assisted recurrent selection (MARS) and to determine the extent to which phenotyping can be minimized and genotyping maximized in genomewide selection. We simulated genomewide selection by evaluating doubled haploids for testcross performance in Cycle 0, followed by two cycles of selection based on markers. Individuals were genotyped for NM markers, and breeding values associated with each of the NM markers were predicted and were all used in genomewide selection. We found that across different numbers of quantitative trait loci (20, 40, and 100) and levels of heritability, the response to genomewide selection was 18 to 43% larger than the response to MARS. Responses to selection were maintained when the number of doubled haploids phenotyped and genotyped in Cycle 0 was reduced and the number of plants genotyped in Cycles 1 and 2 was increased. Such schemes that minimize phenotyping and maximize genotyping would be feasible only if the cost per marker data point is reduced to about 2 cents. The convenient but incorrect assumption of equal marker variances led to only a minimal loss in the response to genomewide selection. We conclude that genomewide selection, as a brute-force and black-box procedure that exploits cheap and abundant molecular markers, is superior to MARS in maize.

Abbreviations: BLUP, best linear unbiased predictions • MARS, marker-assisted recurrent selection • QTL, quantitative trait loci • SNP, single nucleotide polymorphism.


    INTRODUCTION
SINCE THE 1980s, molecular markers have largely been considered as an add-on to cultivar development. Marker applications for quantitative traits have been investigated in the context of the question: "Given the current methods for breeding crops, how can molecular markers enhance breeding progress?" Viewing markers primarily as an aid to selection was a natural consequence of the high cost of genotyping for molecular markers. In the 1990s, for example, the cost of genotyping one sample for one restriction fragment length polymorphism (RFLP) or simple sequence repeat (SSR) marker (i.e., one data point) was more than US$1 (Linkage Genetics, pers. comm.; Biogenetic Services, pers. comm.).

Advances in high-throughput genotyping have markedly reduced the cost per data point of molecular markers. This reduction was mainly the result of three parallel developments (Jenkins and Gibson, 2002; Syvänen, 2005): (i) the discovery of vast numbers of single nucleotide polymorphism (SNP) markers in many species; (ii) development of high-throughput technologies, such as multiplexing and gel-free DNA arrays, for screening SNP polymorphisms; and (iii) automation of the marker-genotyping process, including streamlined procedures for DNA extraction. The aggressive use of markers in a maize breeding program is exemplified by the Monsanto corporation: from 2000 to 2006, the number of data points generated by the company in its breeding programs increased more than 40-fold, while their cost per data point decreased more than six-fold (Eathington et al., 2007).

The availability of cheap and abundant molecular markers changes how markers may be viewed in a breeding program. Instead of viewing markers as an add-on to the breeding process, we can now ask, "How can molecular markers best be used to achieve breeding progress?" without conditioning this question on how breeding has traditionally been done. In this study, we investigated genomewide selection (Meuwissen et al., 2001) as a specific way by which cheap and abundant molecular markers can be exploited in breeding for a quantitative trait in maize.

Exploiting molecular markers in breeding has involved finding a subset of markers associated with one or more traits. Marker-assisted recurrent selection (MARS) refers to the improvement of an F2 population by one cycle of marker-assisted selection (i.e., based on phenotypic data and marker scores) followed by three cycles of marker-based selection (i.e., based on marker scores only) in an off-season nursery (Johnson, 2001, 2004). The marker scores are typically determined from about 20 to 35 markers that have been identified, in a multiple-regression model, as significantly associated with one or more traits of interest (Edwards and Johnson, 1994; Koebner, 2003).

In contrast, genomewide selection refers to marker-based selection without significance testing and without identifying a subset of markers associated with the trait (Meuwissen et al., 2001). If, for example, the candidates are genotyped for 256 SNP markers distributed across the genome, then in genomewide selection the effects on the quantitative trait (i.e., breeding values) of all 256 markers are fitted as random effects in a linear model. Trait values are then predicted as the sum of an individual's breeding values across all 256 markers, and selection is subsequently based on these genomewide predictions. Studies have indicated that genomewide selection leads to high correlations between predicted and true breeding values for a quantitative trait (Meuwissen et al., 2001) and is useful in dairy cattle (Bos taurus) breeding (Schaeffer, 2006). Published information on the use of genomewide selection in maize breeding, however, is unavailable.

Our first objective in this study was therefore to assess, by simulation, the response resulting from genomewide selection compared with MARS, which uses significance tests to identify marker-trait associations. Our second objective was to determine the extent to which obtaining phenotypic data (i.e., phenotyping) can be minimized and genotyping maximized in genomewide selection. If, for example, the cost of genotyping on a large scale is 15 cents per data point and the cost of growing a maize yield-trial plot is US$20, genotyping for 256 markers would then cost less ($38) than conducting yield trials at, say, five locations ($100). We therefore investigated the response to genomewide selection with different population sizes for phenotyping and for subsequent genomewide selection.


    MATERIALS AND METHODS
Genomewide Selection and MARS
We considered genomewide selection and MARS as depicted in Fig. 1. Cycle 0 is evaluated during the regular growing season when phenotypic measurements are meaningful. Cycles 1 and 2 of genomewide selection and MARS are conducted in an off-season (e.g., winter) nursery, where phenotypic evaluations of maize are not meaningful but where three generations can be grown in 1 yr (Koebner, 2003; Johnson, 2004). For Cycle 0, we considered genomewide selection and MARS that involved the production of doubled haploids, as our research has shown that the response to marker-based selection is greater with doubled haploids than with F2 plants (P. Mayor and R. Bernardo, unpublished).


Figure 1. Genomewide selection and marker-assisted recurrent selection in maize.

 
In our simulations, we assumed that two parental inbreds that differed at L quantitative trait loci (QTL) and at NM marker loci were crossed to form an F1 (Fig. 1). The F1 plants were crossed to a haploid inducer, haploids were identified, and chromosome doubling was induced (Bordes et al., 1997; Seitz, 2005). Crossing two inbreds maximizes the potential linkage disequilibrium (Dudley, 1993) and having only one meiosis in generating doubled haploids preserves a high linkage disequilibrium. In Cycle 0, a total of NDH doubled haploids were evaluated for their testcross performance when crossed with an unrelated inbred.

The MARS procedure has involved selection with a Lande and Thompson (1990) index of both phenotypic data and marker data in Cycle 0 (Edwards and Johnson, 1994). But in preliminary studies, we found that at the end of selection (after Cycle 2 in Fig. 1), the response with phenotypic selection in Cycle 0 was about 100 to 102% of the response with combined phenotypic and marker selection in Cycle 0. The best NSel(DH) doubled haploids in Cycle 0 were therefore selected based only on their testcross performance (Fig. 1). The NSel(DH) doubled haploids were intermated to produce F1s. These F1s were random-mated to produce a total of N plants in Cycle 1.

Breeders conceivably would consider genomewide selection or MARS in several populations at a time, but not all populations may perform well enough to warrant further selection. Assuming that the population as a whole had acceptable testcross performance in Cycle 0, the NDH doubled haploids in Cycle 0 were subsequently genotyped for a total of NM markers (Fig. 1). In genomewide selection, best linear unbiased predictions (BLUP) of breeding values associated with each of the NM markers were obtained (Meuwissen et al., 2001). In MARS, a subset of the NM markers with significant effects was identified. Out of the N plants in Cycle 1, a total of NSel plants were then selected before flowering on the basis of their predicted breeding values in genomewide selection or their estimated marker scores (Lande and Thompson, 1990) in MARS. For each of the two procedures, the NSel plants in Cycle 1 were random-mated to produce Cycle 2. Given that selection occurred before flowering, recombination was accomplished in the same generation as selection. The procedures for genomewide selection and MARS were repeated in Cycle 2 to produce Cycle 3. Overall selection responses were based on the mean of Cycle 3.

Values of NDH, N, and NM were chosen on the basis of existing technologies that allow high-throughput genotyping of 48 samples for 64 single nucleotide polymorphisms on a single chip (Biotrove, 2006). We therefore chose values of NDH and N that were multiples of 48 and values of NM that were multiples of 64, although our use of these values does not necessarily imply that we endorse this particular technology. In our comparisons of genomewide selection and MARS (the first objective of our study), we assumed NDH = N = 144, given that the population sizes used in MARS have typically ranged from 100 to 150 (Johnson and Mumm, 1996; Johnson, 2001). In our comparisons involving smaller numbers of plants phenotyped and larger numbers of plants genotyped (the second objective of our study), we assumed NDH values of 48 and 96 and N values of 288, 576, and 1152. The number of markers was NM = 64, 128, and 256 in both genomewide selection and MARS; in addition, NM = 32 (half a chip) was used in MARS, and NM = 512 and 768 were used in genomewide selection. Our previous research has indicated that the response to selection is largest when the number of selected individuals is roughly equal to the number of cycles of selection that will be conducted (Bernardo et al., 2006). We therefore used an NSel of 4 in Cycles 1 and 2. Given that doubled haploids are fully inbred, we used NSel(DH) = 2NSel = 8 in Cycle 0.

Genetic Models and Genotypic and Phenotypic Values
We considered a trait controlled by L = 20, 40, or 100 QTL, each with two alleles. The sizes of the chromosomes and of the entire genome (1749 cM) corresponded to those in a published maize linkage map (Senior et al., 1996). The genome was divided into NM bins, and a marker was assumed to be located at the midpoint of each bin. The L QTL were randomly located among the 10 chromosomes according to a uniform distribution across the total genome.
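The placement of markers and QTL described above can be illustrated with a minimal sketch. It assumes, for simplicity, that the genome is treated as a single concatenated map of 1749 cM with equal-width bins; the chromosome boundaries from the published linkage map are not reproduced here, and the function names are hypothetical.

```python
import random

GENOME_CM = 1749.0  # total map length stated in the text


def marker_positions(n_markers, genome_cm=GENOME_CM):
    """Midpoint (in cM) of each of n_markers equal-width bins.

    Assumption: bins have equal width and chromosome boundaries are ignored.
    """
    bin_width = genome_cm / n_markers
    return [(i + 0.5) * bin_width for i in range(n_markers)]


def qtl_positions(n_qtl, genome_cm=GENOME_CM, rng=None):
    """Draw QTL positions from a uniform distribution over the total genome."""
    rng = rng or random.Random()
    return sorted(rng.uniform(0.0, genome_cm) for _ in range(n_qtl))


markers = marker_positions(128)
qtl = qtl_positions(40, rng=random.Random(1))
print("marker spacing (cM):", markers[1] - markers[0])
print("first three QTL positions (cM):", [round(p, 1) for p in qtl[:3]])
```

With NM = 128, adjacent markers in this sketch are 1749/128 cM apart, so every QTL lies within about half a bin width of a marker.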

At the kth QTL, the testcross effect of the favorable allele was a^k, with a = (L − 1)/(L + 1) as suggested by Lande and Thompson (1990). The testcross effect of the less favorable allele was −a^k. The first QTL therefore had the largest effect, the second QTL had the second-largest effect, and the Lth QTL had the smallest effect. The first parent had the favorable allele at even-numbered QTL and the less-favorable allele at odd-numbered QTL. Coupling and repulsion linkages were therefore generated at random, given that QTL positions were randomly assigned without regard to the magnitude of QTL effects. Epistasis was assumed absent so that the variances at individual QTL (which were needed in later comparisons involving equal-versus-true QTL variances) could be calculated.
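The geometric series of QTL effects can be made concrete with a short sketch. The function name is hypothetical; the effect of the kth QTL is taken to be a raised to the power k, with a = (L − 1)/(L + 1), as defined in the text.

```python
def qtl_effects(n_qtl):
    """Testcross effects of the favorable allele at QTL 1..n_qtl.

    The kth effect is a**k with a = (L - 1) / (L + 1), so effects shrink
    geometrically from the first QTL to the last.
    """
    a = (n_qtl - 1) / (n_qtl + 1)
    return [a ** k for k in range(1, n_qtl + 1)]


effects = qtl_effects(40)
print(round(effects[0], 2), round(effects[-1], 2))  # prints: 0.95 0.14
```

For L = 40, the largest and smallest effects are about 0.95 and 0.14, which agrees with the range of true QTL effects reported for Fig. 2.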

Random nongenetic effects were added to the genotypic values to obtain phenotypic values. Specifically, we assumed that the testcrosses in Cycle 0 were evaluated in six environments. This replication across environments allowed us to estimate testcross genetic variance (VG) and error variance (VE) for BLUP. The random nongenetic effects had a normal distribution with a mean of zero and were scaled so that testcross heritability, on a testcross-mean basis, was H = 0.20, 0.50, or 0.80 in the conceptual base (i.e., F2) population.
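One way to scale the nongenetic effects to a target heritability is sketched below. It assumes that heritability on a testcross-mean basis is H = VG/(VG + VE(mean)), where VE(mean) is the variance of the nongenetic effect on an entry mean; the function names and this particular parameterization are illustrative assumptions, not a description of the original simulation code.

```python
import random


def error_sd_for_mean(v_g, h):
    """Standard deviation of the nongenetic effect on an entry mean.

    Solves H = VG / (VG + VE_mean) for VE_mean (an assumed parameterization).
    """
    return (v_g * (1.0 - h) / h) ** 0.5


def phenotypic_means(genotypic_values, v_g, h, rng=None):
    """Add normally distributed nongenetic effects (mean zero) to genotypic values."""
    rng = rng or random.Random()
    sd = error_sd_for_mean(v_g, h)
    return [g + rng.gauss(0.0, sd) for g in genotypic_values]


for h in (0.20, 0.50, 0.80):
    print(h, round(error_sd_for_mean(1.0, h), 3))
```

Under this assumption, a heritability of 0.20 requires a nongenetic standard deviation twice the genetic standard deviation, whereas a heritability of 0.80 requires only half of it.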

BLUP of Breeding Values in Genomewide Selection
In genomewide selection, the performance of the testcrosses of doubled haploids in Cycle 0 was modeled as

\[ \mathbf{y} = \mathbf{1}\mu + \mathbf{X}\mathbf{g} + \mathbf{e} \]
where y is an NDH x 1 vector of testcross phenotypic means of the doubled haploids in Cycle 0; µ is the overall testcross mean of the doubled haploids in Cycle 0; 1 is an NDH x 1 vector with all elements equal to 1; X is an NDH x NM design matrix, with elements equal to 1 if the doubled haploid is homozygous for the marker allele from the first parental inbred (i.e., of the initial F1) and –1 if the doubled haploid is homozygous for the marker allele from the second parental inbred; g is an NM x 1 vector of breeding values associated with the marker allele from the first parental inbred at each of the marker loci; and e is an NDH x 1 vector of residual effects.

The VG and VE were estimated by analysis of variance from the testcross phenotypic values of doubled haploids in Cycle 0. The variance of breeding values at each of the NM marker loci was assumed equal to VG/NM (Meuwissen et al., 2001). The g was obtained by solving the mixed-model equations (Henderson, 1984), with µ as a fixed effect and g as a random effect. In Cycles 1 and 2 of genomewide selection, the S0 plants were selected according to their predicted breeding values, that is, Zg, where Z was an N x NM design matrix that corresponded to the marker genotypes of the N plants evaluated in Cycle 1 or 2. The elements in Z were zero if the S0 plants in Cycle 1 or 2 were heterozygous at the corresponding marker loci.
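The BLUP step can be sketched as follows, under stated assumptions: the mixed-model equations are written for the model with µ fixed and g random, every marker has variance VG/NM, and the residual variance supplied is that of an entry mean. The function names are hypothetical, and a small pure-Python solver is used only to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def blup_marker_effects(y, x, v_g, v_e):
    """Return (mu_hat, g_hat) from the mixed-model equations.

    x holds 1 / -1 codes for doubled haploids; each marker is assumed to
    have variance v_g / NM, so the shrinkage factor is lam = v_e / (v_g / NM).
    """
    n, n_m = len(y), len(x[0])
    lam = v_e / (v_g / n_m)
    w = [[1.0] + list(row) for row in x]  # augmented design [1 | X]
    p = n_m + 1
    lhs = [[sum(w[i][r] * w[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    for j in range(1, p):  # shrink marker effects only, not the fixed mean
        lhs[j][j] += lam
    rhs = [sum(w[i][r] * y[i] for i in range(n)) for r in range(p)]
    sol = solve(lhs, rhs)
    return sol[0], sol[1:]


def predict(z, g_hat):
    """Predicted breeding values Zg; z uses 1, -1, or 0 (heterozygote) codes."""
    return [sum(zij * gj for zij, gj in zip(row, g_hat)) for row in z]


# Toy data (hypothetical): four doubled haploids, two markers.
x_toy = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y_toy = [13.0, 11.0, 9.0, 7.0]
mu_hat, g_hat = blup_marker_effects(y_toy, x_toy, v_g=2.0, v_e=1.0)
print(mu_hat, g_hat)
print(predict([[1, 0], [-1, 1]], g_hat))
```

In the toy data, the least-squares marker effects would be 2 and 1; the predictions are shrunk toward zero because the marker effects are treated as random.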

Estimation of Marker Effects in MARS
Procedures for estimating marker effects in MARS have been described previously (Bernardo and Charcosset, 2006; Bernardo et al., 2006) but are repeated here for convenience. Markers associated with the trait were identified and their effects were estimated only in Cycle 0. First, multiple regression of phenotypic value on the number of marker alleles (0 or 2 in doubled haploids) from the first parental inbred was performed on a chromosome-by-chromosome basis. Significant markers on each chromosome were identified by backward elimination. Second, multiple regression coefficients were obtained by jointly analyzing all the markers found significant in the per-chromosome analysis. Standard procedures were used to handle any singularities encountered in multiple regression analysis (Press et al., 1992, p. 56). Relaxed significance levels (α = 0.20, 0.30, and 0.40), which have been found to maximize the response to MARS (Hospital et al., 1997; Johnson, 2001), were used. Selection in Cycles 1 and 2 was based on marker scores (Lande and Thompson, 1990), which were calculated for each S0 plant from the multiple regression coefficients for the markers with significant effects.
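The final step, selection on marker scores, might be sketched as below. The sketch assumes that a marker score is the sum, over significant markers, of the regression coefficient times the number of alleles (0, 1, or 2) from the first parental inbred; the marker names, coefficients, and function names are hypothetical, and the backward-elimination step is not shown.

```python
def marker_score(allele_counts, coefficients):
    """Sum of coefficient x allele count over the significant markers.

    allele_counts maps marker name -> copies of the first parent's allele.
    coefficients maps marker name -> regression coefficient per allele copy.
    """
    return sum(b * allele_counts[name] for name, b in coefficients.items())


def select_best(scores, n_sel):
    """Indices of the n_sel plants with the highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_sel]


# Hypothetical coefficients for two significant markers and three S0 plants.
coefficients = {"m3": 0.5, "m7": -0.2}
plants = [
    {"m3": 2, "m7": 0},
    {"m3": 1, "m7": 1},
    {"m3": 0, "m7": 2},
]
scores = [marker_score(p, coefficients) for p in plants]
print(scores, select_best(scores, 2))
```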

True versus Equal QTL Variances in Genomewide Selection
We investigated the effects of the convenient but incorrect assumption, in genomewide selection, that the variances due to each marker were equal (i.e., VG/NM). Suppose the testcross genetic variance at the kth QTL is VG(k) = (1/4)a^(2k) and the total testcross genetic variance across all QTL is ΣVG(k). We simulated the situation in which each of the NM markers corresponded exactly to a unique QTL. We then assessed the selection response when the variance due to each marker was conveniently but incorrectly modeled as ΣVG(k)/L, versus the selection response when the variance at the kth marker was equal to its true value, VG(k).

Data Analysis
Each simulation experiment comprised a combination of NDH, N, NM, L, and heritability. We conducted 1000 repeats of each simulation experiment and averaged the results across the repeats. Each repeat differed in the location of QTL, the genotypes of the individuals sampled, and their phenotypic values. Selection responses were expressed in units of the testcross genetic standard deviation in the base population. The statistical significance (P = 0.05) of differences in selection response was determined with z tests, using the variances of the selection response across the 1000 repeats of an experiment.
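A z test of this kind can be sketched as follows. The test is assumed to be two-sided, with the standard error of each mean response obtained from the variance across repeats; the numerical inputs are hypothetical and serve only to show the calculation.

```python
import math


def z_test(mean1, var1, mean2, var2, n_repeats=1000):
    """Two-sided z test for the difference between two mean selection responses.

    var1 and var2 are variances of the response across repeats, so the
    variance of each mean is var / n_repeats.
    """
    se = math.sqrt(var1 / n_repeats + var2 / n_repeats)
    z = (mean1 - mean2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical means and variances, for illustration only.
z, p = z_test(mean1=3.0, var1=1.0, mean2=2.8, var2=1.0)
print(round(z, 2), p < 0.05)
```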


    RESULTS AND DISCUSSION
Genomewide Selection versus MARS
Genomewide selection led to larger responses than MARS. Consider population sizes of NDH = 144 doubled haploids in Cycle 0 and N = 144 plants in Cycles 1 and 2. At the end of selection (Fig. 1), the responses (in units of the testcross genetic standard deviation in the base population) to genomewide selection ranged from 2.86 to 4.67 across different numbers of QTL (L = 20, 40, and 100), numbers of markers (NM = 64 to 768), and levels of heritability (H = 0.20, 0.50, and 0.80; Table 1). In comparison, the responses to MARS ranged from 2.22 to 4.19. As expected, the responses to both genomewide selection and MARS increased as heritability and the number of QTL increased. Heritability had a larger effect than the number of QTL on selection response.


Table 1. Response to testcross phenotypic selection among doubled haploids in Cycle 0, and to testcross phenotypic selection in Cycle 0 plus two cycles of selection based either on markers with significant effects (marker-assisted recurrent selection, MARS) or on all markers (genomewide selection) in simulated maize populations.

 
For a given number of QTL and heritability, the maximum response across different NM was always greater with genomewide selection than with MARS (Table 1). For a highly heritable trait (H = 0.80) controlled by L = 20 QTL, the maximum responses were 4.10 with genomewide selection (NM = 512) and 3.87 with MARS (NM = 128). These maximum responses represented a 6% advantage of genomewide selection over MARS (i.e., RGS:MARS = 106%). But for a trait controlled by many QTL (L = 100) and with a low heritability (H = 0.20), the maximum responses were 3.31 with genomewide selection (NM = 512) and 2.80 with MARS (NM = 64). These maximum responses represented an 18% advantage of genomewide selection over MARS. The maximum responses with genomewide selection and MARS therefore suggest that genomewide selection is most useful for complex traits that are controlled by many QTL and have low heritability. Furthermore, a selection index comprising several traits would have many component QTL and would likely have low heritability. We speculate that genomewide selection would be more useful for improving an index of several traits than for improving a single trait alone.

Both the genomewide selection and MARS schemes we considered involved selection for testcross performance among doubled haploids in Cycle 0 (Fig. 1). Regardless of the number of QTL, the responses to selection for testcross performance in Cycle 0 were 1.60 for H = 0.20, 2.26 for H = 0.50, and 2.61 for H = 0.80 (Table 1). These results indicated that, in the breeding schemes for genomewide selection and MARS, phenotypic selection in Cycle 0 accounted for more than 50% of the total response to selection. One cycle of testcross phenotypic selection among doubled haploids would require at least 2 yr to complete, even with the aggressive use of off-season nurseries. Thus, as other researchers have noted for MARS (Edwards and Johnson, 1994; Hospital et al., 1997; Koebner, 2003), the advantage of marker-based selection in genomewide selection or MARS is in gain per unit time rather than in gain per cycle.

Given that the gain from phenotypic selection in Cycle 0 is the same for both genomewide selection and MARS, the total responses after Cycle 2 do not solely represent the per se advantage of selection based on all markers versus selection based only on significant markers. When the response due to testcross phenotypic selection in Cycle 0 was disregarded, the advantage of genomewide selection over MARS increased. Consider a highly heritable trait (H = 0.80) controlled by L = 20 QTL. When the response due to phenotypic selection in Cycle 0 (2.61, Table 1) was disregarded, the maximum response was 18% higher with genomewide selection than with MARS (i.e., RGS-PS:MARS-PS = 118%). Likewise, for a trait controlled by many QTL (L = 100) and with low heritability (H = 0.20), the maximum response was 43% higher with genomewide selection than with MARS. The response directly due to marker-based selection in Cycles 1 and 2 was therefore substantially greater with genomewide selection than with MARS.
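The relative advantages quoted above follow directly from the maximum responses and the Cycle 0 responses given in the text. The short sketch below reproduces that arithmetic; the function name is hypothetical.

```python
def relative_response(r_gs, r_mars, r_ps=0.0):
    """Response to genomewide selection as a percentage of the response to MARS.

    r_ps is the response to phenotypic selection in Cycle 0; it is subtracted
    from both responses when only the marker-based cycles are compared.
    """
    return 100.0 * (r_gs - r_ps) / (r_mars - r_ps)


# (maximum genomewide response, maximum MARS response, Cycle 0 response)
cases = {
    "H = 0.80, L = 20": (4.10, 3.87, 2.61),
    "H = 0.20, L = 100": (3.31, 2.80, 1.60),
}
for label, (r_gs, r_mars, r_ps) in cases.items():
    total = relative_response(r_gs, r_mars)
    markers_only = relative_response(r_gs, r_mars, r_ps)
    print(f"{label}: total {total:.1f}%, marker-based cycles only {markers_only:.1f}%")
```

For the second case, the marker-based responses are 1.71 and 1.20, a ratio of 1.425, which corresponds to the 43% advantage reported.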

In genomewide selection, the use of NM = 128, 256, 512, or 768 markers did not lead to significant differences (LSD0.05 ≈ 0.10 in Table 1) when heritability was H = 0.20 or when L = 20 QTL controlled the trait (Table 1). But when heritability was H = 0.50 or 0.80 and when L = 40 or 100 QTL controlled the trait, the responses were larger with NM = 256, 512, or 768 markers than with NM = 128 markers. Regardless of the heritability and number of QTL, responses to genomewide selection were smallest when NM = 64 markers were used. These results indicate that a minimum of NM = 128 to 256 polymorphic markers should be used in genomewide selection in maize and that more markers should be used for complex traits that have, at the same time, a high heritability (e.g., due to extensive phenotyping). These suggested numbers of markers are also likely dependent on the population size used (i.e., NDH = N = 144); with a finite population size, the number of recombinants between closely spaced markers becomes low and the information from adjacent markers becomes redundant as the number of markers increases.

In contrast, responses to MARS were largest with NM = 64 or 128 markers (Table 1). The responses to MARS with NM = 32 markers were smaller because of insufficient coverage of the genome. When the population size is small and heritability is low, exploiting the effects of only the major QTL rather than of all QTL leads to larger responses in MARS (Bernardo and Charcosset, 2006). Using only NM = 64 markers when heritability was H = 0.20 typically led to 20 to 30 markers being declared significant (results not shown). In contrast, using NM = 256 markers led to 75 to 125 markers being declared significant regardless of the number of QTL. The number of significant markers often exceeded the number of QTL, indicating model overfitting when NM = 256 markers were used in multiple regression in MARS.

Genomewide Selection with Minimum Phenotyping and Maximum Genotyping
The standard scheme we considered for genomewide selection involved phenotyping and genotyping NDH = 144 doubled haploids in Cycle 0 and genotyping N = 144 plants in Cycles 1 and 2. Compared with this standard scheme, reducing the number of doubled haploids to NDH = 96 led to similar responses provided that larger values of N were used. Consider a trait controlled by L = 40 QTL and with a heritability of H = 0.50. The maximum response under the standard scheme for genomewide selection was 4.02 (Table 1). A comparable response (4.00) was achieved when NDH = 96 doubled haploids were evaluated in Cycle 0, N = 1152 plants were evaluated in Cycles 1 and 2, and NM = 128 markers were used (Table 2). A comparable response (3.96) was also achieved when NDH = 96 doubled haploids were evaluated in Cycle 0, N = 576 plants were evaluated in Cycles 1 and 2, and NM = 256 markers were used. Both of these alternative schemes involved the same number of data points in Cycles 1 and 2, that is, 147456 data points with N = 1152 plants x NM = 128 markers, or N = 576 plants x NM = 256 markers. Compared with the standard scheme for genomewide selection, these two combinations of N and NM also maintained the selection response for other numbers of QTL (L = 20 or 100).


Table 2. Response to genomewide selection for different numbers of QTL, markers, doubled haploids evaluated in Cycle 0 (NDH), and individual plants in genomewide selection (N) in simulated maize populations.

 
Compared with the standard scheme for genomewide selection, reducing the number of doubled haploids to NDH = 48 led to a decrease in selection response regardless of the number of markers or the number of plants in Cycles 1 and 2 (Table 2). For a trait controlled by L = 40 QTL and with a heritability of H = 0.50, the maximum response with NDH = 48 was only 3.47 (for NM = 256 markers and N = 1152 plants in Cycles 1 and 2). These results therefore suggest that some minimum amount of phenotyping is required for genomewide selection. Specifically, approximately NDH = 100 doubled haploids need to be evaluated for their testcross performance in Cycle 0. Drastically reducing the number of doubled haploids, say to NDH = 50, would severely limit the response to genomewide selection regardless of the population sizes used for subsequent marker-based selection in Cycles 1 and 2.

Modeling Variances at Marker Loci
A BLUP approach and a Bayesian approach to genomewide selection were considered by Meuwissen et al. (2001). A Bayesian approach allows the modeling of variances due to individual QTL. In contrast, the simpler BLUP approach, which we used in this study, assumed that each marker has a variance equal to VG/NM. This assumption of equal, underlying marker variances does not imply that the BLUP of breeding values associated with each of the NM markers are equal; the assumption simply means that the marker effects are drawn from the same distribution with a variance of VG/NM, and that the effects of individual markers are expected to vary. For example, with L = 40 QTL and NM = 128 markers, the predicted effects of the favorable marker allele were unequal across the NM = 128 markers (Fig. 2). The BLUP of marker effects ranged from near zero to 0.25 for H = 0.20, 0.41 for H = 0.50, and 0.61 for H = 0.80. The true effects of the favorable QTL allele ranged from 0.14 to 0.95. These results show how the effects of the L = 40 QTL were jointly explained by all NM = 128 marker loci.


Figure 2. True effects of 40 quantitative trait loci (QTL) and predicted breeding values, for genomewide selection, associated with 128 marker loci at different levels of testcross heritability (H). Predicted breeding values were for a population size of NDH = 144 doubled haploids in Cycle 0 and were averaged across 1000 repeats of the simulation experiment. In each repeat, the 128 markers were ranked in descending order of their effects.

 
The penalty associated with the false assumption of equal QTL variances can be assessed from the difference in response when the true QTL variances are used, versus when the marker variances are conveniently but incorrectly assumed equal to VG/NM. The loss in selection response due to this convenient but incorrect assumption ranged from 8% for a trait controlled by L = 100 QTL and with a heritability of H = 0.20, to 0% for a trait controlled by L = 20 QTL and with a heritability of H = 0.80 (results not shown). The mean loss in selection response, across different numbers of QTL and levels of heritability, was only 2%. This result indicated that, for the population sizes we considered, only a minimal loss in selection response occurs when the variances at all marker loci are incorrectly assumed equal to VG/NM. This result further suggested that, for the genomewide selection schemes we considered, Bayesian approaches to modeling the variance due to individual markers would have little, if any, advantage.

Costs of Genomewide Selection Schemes
The usefulness of different schemes for genomewide selection would depend on the costs involved. Suppose the cost of obtaining testcross performance data is US$100 for each doubled haploid (i.e., yield trials at five locations and US$20 per location), and the cost of genotyping is 15 cents per data point. Ignoring, for simplicity, other associated costs (e.g., production of doubled haploids, recombination, growing Cycle 1 and Cycle 2 plants), the cost of a standard scheme for genomewide selection (NDH = 144 doubled haploids, N = 144 plants in Cycles 1 and 2, and NM = 128 markers) would be US$22,694. Of this total cost, 63% is for phenotyping in Cycle 0 and 37% is for genotyping in Cycles 0, 1, and 2.
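The cost of the standard scheme can be reproduced with a short calculation. The sketch assumes, as in the text, that only phenotyping in Cycle 0 and genotyping in Cycles 0, 1, and 2 are counted; the function and parameter names are hypothetical.

```python
def scheme_cost(n_dh, n, n_m, pheno_cost=100.0, cost_per_point=0.15,
                n_marker_cycles=2):
    """Return (phenotyping cost, genotyping cost) in US dollars.

    Genotyping covers the n_dh doubled haploids in Cycle 0 plus n plants in
    each of the marker-only cycles, each scored for n_m markers.
    """
    phenotyping = n_dh * pheno_cost
    data_points = (n_dh + n_marker_cycles * n) * n_m
    genotyping = data_points * cost_per_point
    return phenotyping, genotyping


pheno, geno = scheme_cost(n_dh=144, n=144, n_m=128)
total = pheno + geno
print(f"total ${total:,.2f}; phenotyping {100 * pheno / total:.0f}%, "
      f"genotyping {100 * geno / total:.0f}%")
```

The standard scheme involves 432 genotyped individuals and 55,296 data points, so genotyping costs about US$8,294 and phenotyping costs US$14,400.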

We have previously mentioned that compared with the standard scheme with NDH = N = 144 doubled haploids or plants and NM = 128 markers, similar responses were achieved with NDH = 96 doubled haploids and either (i) NM = 128 markers and N = 1152 plants in Cycles 1 and 2 or (ii) NM = 256 markers and N = 576 plants in Cycles 1 and 2. For simplicity, assume the cost of growing different numbers of individual plants in Cycles 1 and 2 (e.g., N = 144 to 1152 plants) is negligible compared with the total cost of phenotyping and genotyping. The two aforementioned schemes with minimum phenotyping and maximum genotyping would cost the same as the standard scheme only if the cost per marker data point were reduced to about 2 cents. More generally, the cost of a marker data point would need to be about 100/0.02 = 5000 times lower than the cost of phenotyping one entry. This rough analysis suggests that genomewide selection schemes that minimize phenotyping and maximize genotyping are not yet feasible. But as the costs of phenotyping increase and the costs of genotyping decrease, such schemes would be worth considering.
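The break-even cost per data point can be derived by equating the total cost of the standard scheme with that of an alternative scheme and solving for the genotyping price. The sketch below makes the same simplifying assumptions as the text (only phenotyping and genotyping costs are counted); the function names are hypothetical.

```python
def data_points(n_dh, n, n_m, n_marker_cycles=2):
    """Marker data points across Cycle 0 and the marker-only cycles."""
    return (n_dh + n_marker_cycles * n) * n_m


def break_even_cost(standard, alternative, pheno_cost=100.0):
    """Cost per data point at which two schemes have equal total cost.

    Each scheme is a tuple (n_dh, n, n_m). The saving in phenotyping is
    divided by the number of additional marker data points required.
    """
    saved = (standard[0] - alternative[0]) * pheno_cost
    extra = data_points(*alternative) - data_points(*standard)
    return saved / extra


standard = (144, 144, 128)
for alternative in [(96, 1152, 128), (96, 576, 256)]:
    c = break_even_cost(standard, alternative)
    print(alternative, f"break-even: {100 * c:.1f} cents per data point")
```

Both alternative schemes save US$4,800 in phenotyping but require roughly 250,000 to 265,000 additional data points, which gives a break-even price of just under 2 cents per data point.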

The use of standardized, off-the-shelf SNP chips would decrease the costs associated with genotyping (Jenkins and Gibson, 2002). By this we mean that instead of using different sets of SNP markers for each cross, the same set of, say, 256 or 512 SNP markers could be used for all breeding populations undergoing genomewide selection. Although some of the SNP markers will not always be polymorphic, the economy of scale in a standardized genotyping platform should offset the cost of parental screens and manufacturing custom SNP chips for different breeding populations. On the other hand, costs associated with growing plants in Cycles 1 and 2, collecting leaf tissue, and extracting DNA would remain constant regardless of the number of SNP markers used.

Application in Breeding Programs
The use of doubled haploids rather than F2 plants in Cycle 0 not only leads to a larger selection response (P. Mayor and R. Bernardo, unpublished), but it also has practical advantages. Specifically, in addition to recombining the best doubled haploids in Cycle 0 to produce Cycle 1 (Fig. 1), the best doubled haploids can also be advanced for further evaluation and testcrossing in the breeding program. Producing doubled haploids by crossing F1 plants to a haploid inducer and producing F2 plants by selfing F1 plants both require only one generation (Bordes et al., 1997; Seitz, 2005). Testcross selection among doubled haploids in Cycle 0 therefore does not entail any loss in time compared with testcross selection among F2 plants.

In Cycle 0 we considered selection based on testcross phenotypic data only, instead of selection based on an index (Lande and Thompson, 1990) that combines testcross phenotypic data and marker scores. As already mentioned, in our preliminary studies, the total response with phenotypic selection in Cycle 0 was about 100 to 102% of the response with combined phenotypic and marker selection in Cycle 0. The reasons for this lack of superiority of combined phenotypic and marker selection are unclear. Perhaps the population sizes were too small for obtaining marker scores that were good enough to be useful in selection. Or perhaps the potential gains from combined phenotypic and marker selection were inherently limited because the procedure was postdictive rather than predictive; that is, a data set was used to estimate marker effects, and these marker effects were then used to help identify the best individuals in the same data set.

Regardless of these unclear reasons, selection based only on phenotypic data in Cycle 0 has practical advantages. Specifically, it is unlikely that all populations evaluated in Cycle 0 would be viable candidates for further selection. In other words, breeders would discard Cycle 0 populations with unacceptable performance for one or more traits and would retain only those populations with acceptable performance. In the genomewide selection scheme we considered, the doubled haploid population in Cycle 0 is genotyped only after its testcross performance is deemed acceptable based on field tests. This procedure therefore prevents the unnecessary genotyping of doubled haploid populations that would have been discarded before marker-based selection. Marker data are not needed immediately after Cycle 0 because two generations of recombination are required for the doubled haploids (Fig. 1). Genotyping only after Cycle 0 therefore does not delay genomewide selection in Cycles 1 and 2.

Maize breeding typically involves four steps: (Step 1) choosing the parents of a breeding population, (Step 2) improving the mean performance of the population before inbred development, (Step 3) developing superior inbreds from the population, and (Step 4) finding combinations of inbreds (among all available inbreds) that perform well as single-cross hybrids. A BLUP approach based only on phenotypic and pedigree data has been found useful for evaluating the combining ability of maize inbreds (for Step 1) and predicting the performance of single crosses before field testing (for Step 4; Bernardo, 1996). On the other hand, doubled haploids or other types of progenies from the same biparental cross have the same pedigree. As such, pedigree-based BLUP is not useful in within-population selection (Steps 2 and 3; Bernardo, 2002, p. 234–235) unless strong family structures are developed during selfing. While we found in this study that genomewide selection is useful in population improvement, we speculate that genomewide selection will be less useful in choosing parents of a breeding population and finding pairs of inbreds that perform well as single-cross hybrids.

The promise of genomewide selection obviously does not imply that gene discovery should no longer be done. Several approaches for discovering QTL have been proposed: comparative genomics, association genetics, candidate-gene approaches, and QTL mapping (reviewed by Mackay, 2001). These approaches for gene discovery will continue to be vital for increasing our basic knowledge of the genes underlying quantitative traits. Comparative genomics and association mapping usually focus on diverse germplasm, and the results from these approaches may not be readily applicable to selection in narrow, elite germplasm (Breseghello and Sorrells, 2006; Fig. 3). Candidate-gene approaches, which are not mutually exclusive of comparative genomics and association mapping, utilize biological knowledge to identify a few genes that may be introgressed into elite germplasm to improve quantitative traits (Gebhardt et al., 2007). Genomewide selection, in contrast, does not involve gene discovery. But even though MARS and genomewide selection do not emphasize gene discovery, QTL mapping can and should be done in conjunction with both MARS and genomewide selection (Fig. 3). Although selection is genomewide, the markers with large, highly significant effects may be considered as putatively linked to major QTL.


Figure 3. Strategic positioning of genomewide selection compared to other methodologies for gene discovery and selection for complex traits.

 
We conclude that genomewide selection, which does not involve identifying a set of markers associated with the traits of interest, is superior to MARS, which involves finding a subset of markers with significant effects. Genomewide selection can be described as a black-box procedure (Haley et al., 2006) as well as a brute-force procedure for exploiting markers to improve a quantitative trait. By brute force, we mean that large numbers of markers are used as a surrogate for the phenotype and large numbers of individual plants are evaluated for their marker data. By black box, we mean that the procedure does not involve dissecting the mechanisms underlying the control and inheritance of quantitative traits. Akin to classical quantitative genetics, this black-box property of genomewide selection effectively avoids issues pertaining to the number of QTL controlling a trait, the distribution of effects of QTL alleles, and epistatic effects due to genetic background. Rather, genomewide selection simply aims to increase the mean performance of a particular population by exploiting cheap and abundant molecular markers.


    ACKNOWLEDGMENTS
 
Jianming Yu's contribution to this research was supported in part by a grant from the USDA CSREES-NRI Plant Genome Program.


    NOTES
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Received for publication November 1, 2006.

