a Inst. for Genomic Diversity, Cornell Univ., Ithaca, NY 14853
b Dep. of Soil and Crop Sciences, Texas A&M Univ., College Station, TX 77843
c Dep. of Agronomy, Kansas State Univ., Manhattan, KS 66506
d USDA-ARS Plant Stress and Germplasm Development Unit, Cropping Systems Research Lab., Lubbock, TX 79415
e current address: Nature Source Genetics, Ithaca, NY 14850
f contributed equally to this work
* Corresponding author (sk20{at}cornell.edu).
| ABSTRACT |
|---|
Abbreviations: BIC, Bayesian information criterion; SCP, Sorghum Conversion Program; SSR, simple sequence repeat.
| ACKNOWLEDGMENTS |
|---|
| NOTES |
|---|
Received for publication February 12, 2007.
Association mapping is a powerful strategy for identifying genes underlying quantitative traits in plants. We have assembled and characterized the genetic and phenotypic diversity of a sorghum [Sorghum bicolor (L.) Moench] panel suitable for association mapping, composed of 377 accessions representing all major cultivated races (tropical lines from diverse geographic and climatic regions) and important U.S. breeding lines and their progenitors. Accessions were phenotyped for eight traits, and levels of population structure and familial relatedness were assessed with 47 simple sequence repeat (SSR) loci. The panel exhibited substantial morphological variation, and little genotypic differentiation was observed between the converted tropical and breeding lines. The phenotypic and genotypic data were used to evaluate the performance of several association models in controlling for spurious associations. Our analysis indicated that association models that accounted for both population structure and kinship performed better than those that did not. In addition, we found that the optimal number of subpopulations used to correct for population structure was trait dependent. Although augmentation of the genotypic data with additional SSR loci may be necessary, the association models, genotypic data, and germplasm panel described here provide a starting point for sorghum researchers to begin association studies of traits and markers or candidate genes of interest.
| INTRODUCTION |
|---|
Sorghum bicolor (L.) Moench, a tropical grass probably domesticated in East Africa 3000 to 6000 yr ago (Kimber, 2000), is a staple cereal food for millions of people in the developing world. In the United States, sorghum is grown primarily for animal feed, but recently grain, forage, and sweet sorghum types have received increased attention as potential energy crops (Rooney et al., 2006). In the early 1960s, sorghum breeders recognized that elite U.S. cultivars had experienced strong genetic bottlenecks as a result of breeding practices. To provide a long-term solution for the development of a sustainable sorghum production system, the USDA in cooperation with Texas A&M University initiated the Sorghum Conversion Program (SCP), a strategy to introduce novel genetic variation from exotic, tropical germplasm into modern U.S. cultivars (Stephens et al., 1967). To expedite their use in temperate-zone breeding programs, tropical lines were converted to photoperiod insensitive, early maturing, and short stature phenotypes. This was accomplished by crossing each tropical line to a temperate, elite line and selecting the progeny for day-neutral flowering and reduced height. Progeny were then backcrossed repeatedly to the tropical parent until the resultant lines were fixed for temperate alleles at major loci controlling maturity and height while retaining 90% of the tropical genome (Lin et al., 1995). To date, 850 converted tropical lines have been released by the SCP and this germplasm has allowed breeders to exploit novel variation for insect and disease resistance, drought tolerance, heterosis, and grain quality. As a result, most of the U.S. sorghum hybrids grown today have some tropical germplasm in their pedigrees (Gabriel, 2005). Because the SCP lines contain much of the genetic diversity present in tropical sorghums (Stephens et al., 1967), the use of this material offers a unique opportunity for dissecting the genetic and molecular bases of agriculturally important traits through association mapping.
Association mapping is a powerful tool for high-resolution mapping of loci underlying quantitative traits and depends on the structure of linkage disequilibrium, that is, the nonrandom association of alleles or polymorphisms at different loci (Flint-Garcia et al., 2003). Significant associations between genotypes and phenotypes can arise (i) from marker loci harboring causal polymorphisms, (ii) from marker loci being physically linked to a polymorphism that influences a particular phenotype, and, of greater concern, (iii) from the effects of population structure or familial relationship (kinship) between individuals comprising the test population. Individuals that belong to the same subpopulation or that are related by descent (kin) are more likely both to resemble each other phenotypically and to share common alleles, whether or not these alleles are linked to the causal polymorphism, which leads to spurious associations. Therefore, knowledge of population structure and kinship in association mapping populations is critical. In fact, Yu et al. (2006) have recently shown that controlling for such demographic factors can lead to a significant reduction in the number of spurious associations in maize (Zea mays L.).
Our goal in this research was to assemble and characterize the genetic and phenotypic diversity of a panel of sorghum germplasm suitable for association mapping. We assessed the levels of population structure and familial relatedness with simple sequence repeat (SSR) loci and evaluated the performance of several models in controlling for spurious associations (i.e., Type I error).
| MATERIALS AND METHODS |
|---|
Phenotyping
The sorghum association panel was grown in a randomized complete block design in Weslaco (two replicates), College Station (two replicates), and Lubbock (four replicates), TX, in the summer of 2006 in 6-m rows spaced at either 0.50 or 0.75 m. Eight traits were evaluated on a per-row basis: flag leaf length and width (measured only at Weslaco and College Station), plant height, terminal branch length, flowering time (measured as time to mid-anthesis), panicle length, and flag leaf height and exsertion. A linear model was used to account for the effects of location and replication and derive a single phenotypic value for each genotype.
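The form of the linear model is not given in detail. As a minimal sketch, assuming balanced data in which every genotype is scored once in every location-by-replicate block, the adjustment can be illustrated by removing each block mean before averaging per genotype. The function name, record layout, and values below are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def adjusted_genotype_values(records):
    """Collapse per-row scores into one value per genotype.

    records: iterable of (genotype, location, replicate, value) tuples.
    Assumes a balanced layout (every genotype present in every block).
    """
    records = list(records)

    # Mean of each location-by-replicate block.
    by_block = defaultdict(list)
    for _, loc, rep, value in records:
        by_block[(loc, rep)].append(value)
    block_mean = {block: mean(vals) for block, vals in by_block.items()}

    grand_mean = mean(rec[3] for rec in records)

    # Deviation of each observation from its block mean, grouped by genotype.
    deviations = defaultdict(list)
    for geno, loc, rep, value in records:
        deviations[geno].append(value - block_mean[(loc, rep)])

    return {geno: grand_mean + mean(devs) for geno, devs in deviations.items()}


# Hypothetical plant heights (cm) for two lines in two blocks.
demo = [
    ("G1", "Weslaco", 1, 100.0),
    ("G2", "Weslaco", 1, 110.0),
    ("G1", "Lubbock", 1, 120.0),
    ("G2", "Lubbock", 1, 130.0),
]
values = adjusted_genotype_values(demo)
```

With unbalanced data (e.g., traits scored at only some locations), a fitted linear or mixed model would be needed in place of this shortcut.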
Genotyping and Statistical Analyses
DNA Preparation and Polymerase Chain Reactions
Six to 10 seeds from each accession were germinated in small pots in the dark at 27°C. Total genomic DNA was isolated from pooled etiolated seedlings (7 d old) following a standard CTAB extraction protocol (Doyle and Doyle, 1987). Amplifications were performed in 10-µL volumes using one of two temperature cycling protocols (Supplementary Table 2) depending on how fluorescent amplicons were labeled (either with locus-specific or universal primers). The following reaction components were common between the two protocols: 20 ng of total genomic DNA, 1× polymerase chain reaction (PCR) buffer, 2.5 mmol L⁻¹ MgCl₂, 0.2 mmol L⁻¹ dNTPs, 5% DMSO, and 0.5 U of Taq DNA polymerase. For PCRs with end-labeled primers, 2 pmol each of 5'-labeled forward and unlabeled reverse primers were used. The cycling protocol consisted of 95°C for 3 min; followed by 27 cycles of 95°C for 30 s, 55°C for 20 s, and 72°C for 30 s; and incubation at 72°C for 45 min. Polymerase chain reactions with universal primers were performed using unlabeled locus-specific primers, one of which contained a 5' binding site for the universal primer (i.e., "pigtail"), and a 5'-labeled universal primer (see Supplementary Table 2). Primer concentrations and cycling conditions were as previously described (Schuelke, 2000), and fluorescent SSR detection and allele scoring were performed according to Casa et al. (2005).
Simple Sequence Repeat Loci and Diversity Estimates
A total of 49 SSR loci (Supplementary Table 2) were evaluated. Loci were selected based primarily on their genomic location (i.e., to achieve fairly uniform genome coverage) and secondarily on information content (see Casa et al., 2005). Approximately half of the SSRs assayed were also used in a previous study to characterize 3000 accessions from the world sorghum collection (Billot and Hash, 2006). Summary statistics, including number of alleles, allele frequencies, and polymorphism information content for each locus, were calculated with PowerMarker version 3.0 (Liu and Muse, 2005).
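For readers unfamiliar with these summary statistics, the sketch below computes gene diversity and polymorphism information content for a single locus from its allele frequencies, using the commonly cited definitions. It is assumed, not confirmed here, that PowerMarker applies the same formulas; the function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum over allele pairs of 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Hypothetical allele frequencies, not values from this study.
balanced_biallelic = [0.5, 0.5]
nearly_fixed = [0.995, 0.005]

pic_balanced = pic(balanced_biallelic)
pic_nearly_fixed = pic(nearly_fixed)
```

A locus with two equally frequent alleles gives a value of 0.375, whereas a nearly fixed biallelic locus gives a value close to 0.01, which is the kind of locus found at the low end of the range reported for this panel.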
Population Structure and Kinship
The program Structure, version 2.1 (Pritchard et al., 2000), was used to determine the presence of population structure and assign sorghum lines to subpopulations. This program implements a model-based clustering method for inferring population structure using genotypic data from unlinked markers. We used an ancestry model that allowed population admixture, and allele frequencies among populations were assumed to be correlated (i.e., allele frequencies were likely to be similar due to shared ancestry or migration). We also tested for the optimal number of subpopulations, k (distinct from K, the kinship matrix; see below). We allowed k to vary from 1 to 12, with three independent runs for each value. The optimal k value was determined based on the estimated logarithmic likelihood of the data and its performance in the unified mixed model for association analysis (Yu et al., 2006; and see below). Initially, burn-in lengths and sampling periods of 5 × 10⁴ iterations (Pritchard et al., 2000) were used for each k value, while burn-in and sampling periods of 5 × 10⁵ iterations were used for the optimal k value. Runs that did not meet the convergence criterion were not analyzed. A graphical display of subpopulation composition was generated with DISTRUCT (Rosenberg, 2004).
Relative kinship was estimated from the SSR data as $F_{ij} = (Q_{ij} - Q_m)/(1 - Q_m)$, where $F_{ij}$ is an estimator of the pairwise kinship coefficient between individuals $i$ and $j$, $Q_{ij}$ is the probability of identity by state for random genes from $i$ and $j$, and $Q_m$ is the average probability of identity by state for genes coming from random individuals in the population from which $i$ and $j$ were drawn. Estimates were obtained as previously described (Loiselle et al., 1995; Ritland, 1996; Lynch and Ritland, 1999; Rousset, 2002) using SPAGeDi 1.2 (Hardy and Vekemans, 2002). Confidence intervals (95%) for kinship coefficients were calculated from the standard deviation of $F_i$, where $F_i$ is the average kinship between any given individual in a (sub)population and all other individuals in the same (sub)population, and from $n$, the population size. For each subpopulation identified we also estimated the expected heterozygosity, He (also referred to as unbiased gene diversity, D), and confidence intervals based on 1000 bootstrap replicates using PowerMarker version 3.0.
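The relative kinship ratio $(Q_{ij} - Q_m)/(1 - Q_m)$ can be illustrated with a deliberately simplified sketch. It assumes fully inbred lines scored with one allele per locus, takes $Q_{ij}$ as the plain proportion of loci identical by state, and takes $Q_m$ as the mean of $Q_{ij}$ over all pairs. This is not the allele-frequency-weighted estimator implemented in SPAGeDi, and the line names and allele sizes are hypothetical.

```python
from itertools import combinations


def ibs(a, b):
    """Share of loci at which two inbred lines carry the same allele."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def relative_kinship(genotypes):
    """Return {(i, j): F_ij} with F_ij = (Q_ij - Q_m) / (1 - Q_m).

    genotypes: {line name: list of allele calls, one per locus}.
    Undefined (division by zero) if all lines are identical.
    """
    pairs = list(combinations(sorted(genotypes), 2))
    q = {pair: ibs(genotypes[pair[0]], genotypes[pair[1]]) for pair in pairs}
    q_m = sum(q.values()) / len(q)
    return {pair: (q_ij - q_m) / (1.0 - q_m) for pair, q_ij in q.items()}


# Hypothetical lines scored at four SSR loci (allele sizes in bp).
lines = {
    "A": [120, 200, 310, 98],
    "B": [120, 200, 310, 102],
    "C": [124, 204, 316, 102],
}
kin = relative_kinship(lines)
```

Pairs that share fewer alleles than the panel average receive negative values, which is why the treatment of negative estimates matters when the kinship matrix is built.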
Association Model Testing
In all, we tested the performance of six different association models in controlling for false positives or spurious associations (Type I error). These included (i) a model that did not control for population structure or relatedness (naive), expressed as $y = A\alpha + e$; (ii) a model that accounted for population structure (Q), $y = A\alpha + Q\nu + e$; (iii) a model that controlled for familial relatedness or kinship (K), $y = A\alpha + Zu + e$; (iv) an alternative kinship-based model (K'), $y = A\alpha + Zu' + e$; (v) a mixed model that accounted for both population structure and kinship (QK), $y = X\beta + A\alpha + Q\nu + Zu + e$; and (vi) a mixed model with the alternative kinship estimates (QK'), $y = X\beta + A\alpha + Q\nu + Zu' + e$. Here, $y$ is a vector of phenotypic observations; $\alpha$ is a vector of allelic effects; $e$ is a vector of residual effects; $\nu$ is a vector of population effects; $\beta$ is a vector of fixed effects other than allelic or population group effects; $u$ is a vector of polygenic background effects; $Q$ is the population membership assignment matrix (based on SSR genotypic data and calculated using Structure) relating $y$ to $\nu$; and $X$, $A$, and $Z$ are incidence matrices of 1s and 0s relating $y$ to $\beta$, $\alpha$, and $u$, respectively. The variances of the random effects are expressed as $\mathrm{Var}(u) = K V_g$ and $\mathrm{Var}(e) = R V_R$, where $K$ is the kinship matrix (based on SSR genotypic data and calculated using SPAGeDi), $R$ is a matrix with the off-diagonal elements being zero and the diagonal elements being the reciprocal of the number of observations for which each phenotypic data point was obtained, $V_g$ is the genetic variance, and $V_R$ is the residual variance. Best linear unbiased estimates of $\beta$, $\alpha$, and $\nu$ (fixed effects), and best linear unbiased predictions of $u$ (random effects) were obtained by solving the mixed-model equations, Eq. [5] and [6], above. We should note that the naive, Q, K, and QK models have been described previously (Yu et al., 2006). To our knowledge, this is the first time that the K' and QK' models have been tested.
In this study, we tested models that used two different methods for estimating kinship (the K and K' matrices). In the first round of simulations (K matrix), the negative kinship values were simply set to zero as suggested by Yu et al. (2006). In a second round of simulations, a reference value $F_{\mathrm{ref}}$ was used instead to compute a new kinship matrix (K' matrix), where $F'_{ij} = (F_{ij} - F_{\mathrm{ref}})/(1 - F_{\mathrm{ref}})$, $F_{ij}$ is the raw pairwise kinship coefficient from the SPAGeDi output, and $F_{\mathrm{ref}}$ is the average of the minimum (negative) $F_{ij}$ values from each row (or column) of the untransformed kinship matrix. In the case of K', negative $F_{ij}$ values quantify divergence between individuals belonging to different populations under drift and not under selection or adaptation (which is better accounted for by Q).
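The two treatments of negative kinship estimates can be sketched as follows. This is a minimal illustration on a hypothetical 3 × 3 matrix, assuming the diagonal is set to 1 and is rescaled along with the off-diagonal entries; how the diagonal was handled in the study is not stated, and the function names are ours.

```python
def truncate_negative(f):
    """K matrix: set negative kinship estimates to zero."""
    return [[max(v, 0.0) for v in row] for row in f]


def rescale_to_reference(f):
    """K' matrix: F'_ij = (F_ij - F_ref) / (1 - F_ref),
    with F_ref the average of the row minima of the raw matrix."""
    row_minima = [min(row) for row in f]
    f_ref = sum(row_minima) / len(row_minima)
    scaled = [[(v - f_ref) / (1.0 - f_ref) for v in row] for row in f]
    return scaled, f_ref


# Hypothetical raw kinship matrix (symmetric, diagonal set to 1).
raw = [
    [1.0, 0.625, -0.5],
    [0.625, 1.0, -0.125],
    [-0.5, -0.125, 1.0],
]
k = truncate_negative(raw)
k_prime, f_ref = rescale_to_reference(raw)
```

In this toy case $F_{\mathrm{ref}} = -0.375$, so the pair with $F_{ij} = -0.125$ becomes positive after rescaling while the pair with $F_{ij} = -0.5$ remains slightly negative. Unlike truncation, the rescaling preserves the ordering of the most divergent pairs.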
The Type I error rate was simulated based on the method described by Yu et al. (2006). Because of the low marker density relative to the extent of linkage disequilibrium in sorghum (Hamblin et al., 2005), few, if any, randomly distributed SSRs should associate with particular phenotype(s). Consequently, the random SSRs provide an empirical null distribution with which the models (above) can be tested for their ability to control Type I error. Only alleles with a frequency >10% in our sample were used for the simulation. Because both fixed and random effects were involved, only likelihood-based methods could be used for model comparison. In this study we used two methods, the –2 residual log likelihood (for comparing nested models) and the Bayesian information criterion (BIC) (Schwarz, 1978) (for non-nested models).
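As a reminder of how the BIC trades fit against complexity, the sketch below applies Schwarz's formula, $-2\log L + p\ln n$, to a set of made-up fits. The likelihood values and parameter counts are hypothetical, and the convention for counting $p$ under residual likelihood (here, assumed to be the number of covariance parameters) varies between software packages.

```python
import math


def bic(neg2_log_lik, n_params, n_obs):
    """Schwarz's criterion: -2 log L + p * ln(n); smaller is better."""
    return neg2_log_lik + n_params * math.log(n_obs)


# Hypothetical fits: (model name, -2 residual log likelihood, parameter count).
fits = [("Q", 2110.4, 1), ("K", 2065.2, 2), ("QK", 2058.9, 2)]
n_obs = 377  # number of accessions in the panel

scores = {name: bic(m2ll, p, n_obs) for name, m2ll, p in fits}
best = min(scores, key=scores.get)
```

Because the penalty term is identical for models with the same parameter count, the comparison between such models reduces to the difference in their $-2$ residual log likelihoods.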
| RESULTS AND DISCUSSION |
|---|
Diversity Levels and Partitioning in the Association Panel
Levels of genetic diversity in converted tropical (n = 228) and breeding (n = 149) panels were assessed at 49 SSR loci. Two loci (Xtxp065 and Xtxp287) exhibited >20% missing data and were discarded from further analysis. Summary statistics for the remaining 47 loci are presented in Supplementary Table 2. A total of 553 alleles were detected in the entire panel, with the number of alleles per locus ranging from 2 (Xcup06, 19, 23, and 55) to 50 (Xtxp343). Polymorphism information content values ranged from 0.01 (Xcup19 and Xcup55) to 0.93 (Xtxp067 and Xtxp343), with an average value of 0.56. Although the converted tropical sorghums exhibited both a higher average number of alleles per locus (10.9) and greater diversity (0.56) than the breeding lines (7.5 and 0.51, respectively), analysis of molecular variance indicated that only 2% of the variation was due to differences between panels. As a whole, therefore, the SCP and breeding lines were not significantly differentiated from each other. This result was not surprising, considering that the breeding panel was designed to contain as much diversity as possible and many of the breeding lines had tropical progenitors.
Population Structure and Kinship
To assess population structure, we used a model-based method for determining the number of subpopulations, k, in our panel. An accession was assigned to the subpopulation or group to which it showed the highest probability of membership. When k was varied from 1 to 12, the posterior probability of the data improved steadily for k ≤ 8 and reached a plateau for k ≥ 9 (Fig. 1). Based on this result and results from further tests using our phenotypic data (see below), splitting the panel into either nine (k = 9) or 10 (k = 10) subpopulations best described population structure for testing the association mapping models.
The probability of assignment of individuals to specific groupings was not always consistent. For example, broomcorn, sudanense, and other loose-headed S. bicolor accessions were assigned to different populations in different runs for the same value of k. For k = 9, Subpopulation I consisted mostly of nigricans (caudatum type) and guinea accessions from East Africa and India, whereas Subpopulation VIII contained mostly caudatum and guinea types from West Africa (Table 1). Because racial classification relies on a limited number of morphological traits, it was not surprising that some caudatum and guinea accessions clustered by geographic location rather than by race. Caudatum accessions were also prevalent in two other subpopulations: IV (mostly zerazera working group) and IX (a mix of accessions from different caudatum working groups including "hybrid/intermediate" caudatum types). Subpopulation III consisted primarily of kafir types, whereas durra sorghums were split between two groups: V (accessions from India and Ethiopia) and VI (milo durras). Accessions classified as bicolor were prevalent in Subpopulations II (containing sudanense and broomcorn types) and VII (comprising dochna-bicolors and margaritiferum, a guinea-type sorghum grown primarily in West Africa). The primary differences between groupings in k = 9 and k = 10 were the division of Subpopulation IX into two groups (k = 10, Subpopulations IX and X) and alternative clustering of accessions assigned to poorly defined groups such as k = 9 Subpopulations I, II, and VII (Table 1).
We also calculated the mean kinship and the expected heterozygosity, He, for each population identified for k = 9 and k = 10 (Table 1). Among the well-defined subpopulations (i.e., those that were consistently defined between analyses and for which individuals showed highly correlated probability of assignment across simulations, shaded areas in Table 1), relatively high mean kinship and low He were observed for Subpopulations III and VIII. Here, strong kinship coupled with low diversity suggests that the kafir and West African guinea/caudatum groups have experienced a more severe genetic bottleneck than the other well-defined subpopulations. This observation is consistent with the historical geographic isolation of both groups and temperate adaptation of kafir types (see Casa et al., 2005). As would be expected, a tendency toward weaker kinship and higher gene diversity was observed among the loosely defined subpopulations (i.e., I, VII, and IX/X).
Phenotypic Diversity
This panel contains much of the phenotypic diversity present in tropical sorghums (e.g., variation for plant morphology and panicle architecture), and represents diverse geographic and climatic regions (representatives from the Americas, Asia, and the entire African continent, from high and low elevations, rainy and dry environments). Therefore, the collection presents an excellent source of variability for dissecting the genetic bases of agriculturally important traits and adaptation.
Phenotypic variation for eight traits, organized by subpopulation (k = 9), is shown in Table 2. Except for the sudanense/broomcorn subpopulation, which was considerably taller than the other groups, the amount of variation for both plant height and flowering time (i.e., maturity-related traits) across all subpopulations was similar. Because variation in maturity can confound the phenotypic evaluation of other agriculturally important traits, use of this germplasm panel in association studies should simplify dissection of these traits in sorghum. Perhaps more importantly, this panel should be well suited for association studies in higher latitude environments, such as the United States. Besides plant height and flowering time, all other measured traits exhibited substantial variation both within and among subpopulations (Table 2), particularly for inflorescence-related traits such as panicle and terminal branch lengths.
Optimizing the Number of Subpopulations for Model Testing
Results from our analysis of population structure (see above) indicated that the probability of groupings based on the genotypic data improved steadily for k ≤ 8 but reached a plateau for k ≥ 9 (Fig. 1). Therefore, we refined estimates of the optimal number of subpopulations in our panel by testing the likelihood of each of these k values against the phenotypic data for all measured traits. Figure 2 shows the performance of the QK model for some phenotypic traits, measured by the BIC, as a function of k. While results for only four traits are shown, the lowest BIC values, and therefore the best likelihood for all traits, were obtained for k = 9 and 10 using mixed models QK and QK'. We, therefore, used membership assignment matrices for both k = 9 and k = 10 (Q9 and Q10, respectively; Supplementary Tables 3 and 4) for model testing and further analyses.
Performance of Various Association Models
Simulations of Type I error for all models and all quantitative traits combined are presented in Fig. 3. As expected, the naive model showed the highest inflation of P values (25% of the P values were under the 5% threshold; i.e., P values were not uniformly distributed) and consequently the highest Type I error. Controlling for population structure (Q model) yielded a slight improvement over the naive model, but a considerable inflation of P values (15% of the P values were under the 5% threshold) can still be seen (Fig. 3). On the other hand, all other models (K, K', QK, and QK') showed a good approximation to a uniform distribution of P values, with QK and QK' performing slightly better (4.8% of the P values were under the 5% threshold) than the K or K' models (5.1% of the P values were under the 5% threshold).
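The percentages quoted above are empirical Type I error rates: the share of P values from presumably unassociated markers that fall under the nominal threshold. A minimal sketch of that calculation follows; the P values are hypothetical and the function name is ours.

```python
def type1_error_rate(p_values, alpha=0.05):
    """Fraction of null-marker P values that fall below the threshold."""
    p_values = list(p_values)
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical P values from random markers under two models.
inflated = [0.001, 0.02, 0.04, 0.30, 0.55, 0.61, 0.78, 0.90]
controlled = [0.03, 0.12, 0.25, 0.37, 0.49, 0.62, 0.74, 0.88]

rate_inflated = type1_error_rate(inflated)
rate_controlled = type1_error_rate(controlled)
```

Under a well-calibrated model the P values of null markers are uniformly distributed, so the observed fraction should sit close to the nominal 5%.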
We have developed an additional resource for the sorghum research community, a panel consisting of 377 diverse lines suitable for association mapping. Because genotypic data for this panel along with appropriate statistical models (QK method) for correcting for population structure and kinship are being made available to the entire sorghum community, researchers interested in using this germplasm can collect phenotypic data for their favorite trait or markers and candidate genes without the need for further SSR genotyping. For the short term, requests for a limited number of seeds of the sorghum association panel should be sent to Cleve Franks at the USDA-ARS Plant Stress and Germplasm Development Unit, Cropping Systems Research Laboratory, Lubbock, TX. In the near future, these lines will be maintained and distributed by the U.S. National Plant Germplasm System (www.ars-grin.gov/npgs/). Furthermore, 20 of the diverse lines characterized in this study are now being used to develop recombinant inbred populations in our labs for use in nested association mapping strategies (Yu et al., 2008). With the community resources presently available, S. bicolor is achieving sets of genetic data and genomics tools comparable to those of other important grain commodities such as rice (Oryza sativa L.), maize, and wheat.
Thanks to Charlotte Acharya for assistance with data collection and analysis and to Claire Billot (CIRAD) and Genoplante for making primer sequences available before publication. We also want to express our gratitude to Martha Hamblin for her comments and suggestions. Special thanks to Dr. Darrel Rosenow for assistance in classifying the accessions used in this study.