Crop Science
Published online 1 November 2006
Published in Crop Sci 46:S-27-S-40 (2006)
© 2006 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

ORIGINAL RESEARCH

Evidence for a Selective Sweep on Chromosome 1 of Cultivated Sorghum

Alexandra M. Casa*, Sharon E. Mitchell, Jeffrey D. Jensen, Martha T. Hamblin, Andrew H. Paterson, Charles F. Aquadro and Stephen Kresovich

A.M. Casa, S.E. Mitchell, M.T. Hamblin, and S. Kresovich, Institute for Genomic Diversity, Cornell Univ., Ithaca, NY 14853; J.D. Jensen and C.F. Aquadro, Dep. of Molecular Biology and Genetics, Cornell Univ., Ithaca, NY 14853; A.H. Paterson, Plant Genome Mapping Laboratory and Comparative Grass Genomics Center, Univ. of Georgia, Athens, GA 30602. DNA sequences were deposited in GenBank under accession numbers DQ459071 and DQ462793–DQ463100.

* Corresponding author (amc56{at}cornell.edu).


    ABSTRACT
Recently, a simple sequence repeat (SSR)-based genome-wide diversity scan of Sorghum bicolor (L.) Moench identified several candidate loci with patterns of variation consistent with directional selection in cultivated lines. Data were insufficient, however, to determine if selection had actually occurred at or near candidate SSR loci or if the unusual diversity patterns observed were due to the effects of demographic factors such as population bottlenecks or mating system. In the present study, we collected DNA sequences from 10 segments within a 99 kb region flanking one of the previously identified candidates, SSR locus Xcup15, located near the distal end of chromosome 1. We performed statistical tests both to address alternative hypotheses to selection and to aid in localizing the selection target. Analyses of genomic DNA sequences from a panel of 17 cultivated and 13 wild accessions indicated that cultivated lines had reduced diversity in this region (about one-third of the diversity present in wild sorghums) and a moderate degree of differentiation was observed between cultivated and wild groups (Fst = 0.15). Several features of the data support the hypothesis that recent directional selection shaped diversity patterns around Xcup15, including overall low levels of variation and extensive haplotype structure (a predominant haplotype occurred over the 99 kb region) in cultivated sorghum, and a derived fixed difference at the 5' untranslated region (UTR) of a protein phosphatase 2C (PP2C) gene between cultivated and wild sorghums. Moreover, two of the four tests employed to detect deviations from the neutral, equilibrium model, the Hudson–Kreitman–Aguadé (HKA) and composite likelihood ratio (CLR) tests, indicated that patterns of diversity in the Xcup15 region were consistent with a selective sweep. Although we were unable to rule out demography as a possible explanation for the diversity patterns observed along this region, this study supported previous findings based on SSR diversity and identified candidates for the target of selection, confirmation of which will require functional and association studies.


    INTRODUCTION
DURING THE PAST DECADE, SSRs have been extensively used for quantifying neutral genetic diversity in plant in situ populations and ex situ germplasm collections (Garris et al., 2005). Although a great deal of valuable genetic information for managing both in situ and ex situ collections has been generated, these studies have been retrospective rather than predictive (Kresovich et al., 2006). It would be desirable, therefore, to use information gathered by such analyses to concurrently identify agronomically or horticulturally useful diversity. With this point in mind, we have recently used population genetics-based analysis of genome-wide SSR diversity in S. bicolor to locate candidate genomic regions that may have undergone diversifying and, in particular, directional selection (Casa et al., 2005).

Population genetics theory predicts that intense directional selection, as would be experienced during crop domestication, dramatically reduces variation at the genomic target of selection and at linked neutral loci, a phenomenon known as genetic hitchhiking (Maynard Smith and Haigh, 1974). Following a selective sweep, new mutations arising in the selected region initially result in a skew in the site frequency spectrum (i.e., excess of rare alleles). Selection might also lead to genetic differentiation as a consequence of allele frequency shifts between selected and nonselected populations (e.g., a domestication-associated allele will quickly increase in frequency in the cultivated populations). In a genome-wide scan of diversity at neutral markers such as SSRs, loci that show unusual patterns of allelic variation relative to genome-wide averages (i.e., locus-specific reductions in diversity, excess of rare alleles, or increased population differentiation) may be linked to targets of selection. Genome-wide scans of diversity have been used in this manner to identify candidate genomic regions in organisms such as humans, Drosophila, Arabidopsis, and maize (Zea mays L.) (Vigouroux et al., 2002; Kauer et al., 2003; Kayser et al., 2003; Aranzana et al., 2005). Once a candidate region has been identified, it may be possible to identify the target by surveying adjacent genomic regions and looking for a return to neutral patterns of variation (Schlotterer, 2003).

Sorghum bicolor, a tropical grass probably domesticated in eastern Africa 3000 to 6000 years ago (Kimber, 2000), is the fifth most important grain crop worldwide (FAO, 2004). Because of its ability to tolerate drought, soil toxicities, and temperature extremes more effectively than other cereals including maize, grain sorghum is a pillar of food security in the semiarid zones of western and central Africa. Sorghum's global socioeconomic importance has prompted substantial interest in characterizing levels of genetic diversity using molecular markers (Dje et al., 2000; Grenier et al., 2000; Menz et al., 2004). More recently, studies of DNA sequence diversity (Hamblin et al., 2004, 2005, 2006) have indicated that sorghum has both lower nucleotide diversity and more extensive linkage disequilibrium (LD) than maize. Compared with more distantly related rice (Garris et al., 2003), however, sorghum has less extensive LD.

Recently, an SSR-based genome-wide scan of diversity in S. bicolor identified several loci with patterns of variation deviating from neutral expectations (Casa et al., 2005), but data were not sufficient to determine whether the apparent signal of selection resulted from a true selective event or from demographic factors such as population bottlenecks or mating system. For example, bottlenecks can produce locus-specific effects that resemble the effects of directional selection (Thornton and Andolfatto, 2006), and the degree of genetic differentiation between populations is usually higher in self-pollinating species than in outcrossers, independent of selection (Hamrick and Godt, 1996).

Here, we sequenced a bacterial artificial chromosome (BAC) clone containing the previously identified candidate SSR locus, Xcup15, which exhibited the highest genetic differentiation between wild and cultivated S. bicolor (Fst = 0.76) (Casa et al., 2005). We also collected and analyzed DNA sequence data from a panel of 17 cultivated and 13 wild sorghum accessions to determine if patterns of variation in this region of the S. bicolor genome show evidence of a domestication-related selective sweep.


    Materials and Methods
Comparative Analysis
Identification of BAC Clones Containing Candidate SSR and Sequencing
As reported in an earlier study, primers that amplify candidate SSR locus Xcup15 were developed from the DNA sequence of S. bicolor restriction fragment length polymorphism probe pSB1790 (Schloss et al., 2002). The BAC clones from BTx623, an elite S. bicolor inbred line, were obtained from the Clemson University Genomics Institute (www.genome.clemson.edu/groups/bac/, verified 5 May 2006) and clones containing Xcup15 were identified by hybridization to an overgo probe (SOG0602) derived from pSB1790. Restriction fragment length polymorphism probe pSB1790 maps to S. bicolor chromosome 1 at 227.9 cM (P.J. Brown, 2006, personal communication) on the BTx623 x IS3620C map (Menz et al., 2002) and at 106.9 cM on the S. bicolor BTx623 x S. propinquum map (Bowers et al., 2003). We should note that S. bicolor chromosome 1 corresponds to linkage group (LG) A of Peng et al. (1999) and to LG C of Chittenden et al. (1994).

DNA was extracted from single BAC clones and used as templates in PCRs to confirm the presence of locus Xcup15. From these clones, a single BAC, c0156b06, was selected for complete DNA sequencing because of its central position on the S. bicolor physical map (www.genome.arizona.edu, verified 5 May 2006) relative to the location of Xcup15 and to the other BACs evaluated.

Randomly sheared libraries with inserts ranging from 1.0 to 4.0 kb were constructed and shotgun sequencing was performed with pGEM-T (Promega Corporation, Madison, WI) vector primers by MWG-Biotech (Ebersberg, Germany). Sequence reads were generated to ≈8-fold coverage and assembled using Sequencher (Gene Codes Corporation, Ann Arbor, MI) followed by visual inspection of chromatograms. The DNA sequence of BAC clone c0156b06 was deposited in the National Center for Biotechnology Information (NCBI) nucleotide database (GenBank) under accession number DQ459071.

BAC Annotation and Sequence Comparisons
Genes were predicted using the Rice Genome Automated Annotation System (http://ricegaas.dna.affrc.go.jp/, verified 5 May 2006). This system utilizes several gene prediction programs including FGENESH (trained with monocot sequences), GENSCAN (trained with Arabidopsis or maize sequences), and the Rice Hidden Markov Model (RiceHMM). We annotated genes only where all prediction programs were in agreement. We also queried against the rice genome sequence (www.gramene.org, verified 5 May 2006) and the NCBI protein (nr), nucleotide (nt), and expressed sequence tag (EST) databases. Predicted gene sequences were considered to be expressed if they were at least 99.8% similar to S. bicolor ESTs or EST consensus sequences. This criterion for sequence similarity was derived from the nucleotide diversity (π) observed in pairwise comparisons of coding regions of cultivated sorghum (average π = 0.0020, or about one single nucleotide polymorphism, SNP, every 500 bp) (Hamblin et al., 2006). Repetitive sequences were identified by searching against both the Poaceae RepBase (www.girinst.org, verified 5 May 2006) and the TIGR Gramineae Repeat Database (http://tigrblast.tigr.org/euk-blast/index.cgi?project=osa1, verified 5 May 2006). PipMaker (Schwartz et al., 2000) was used both to align DNA sequences from rice BAC clone OSJNBa0003A09 (GenBank accession AC118132), identified by the similarity searches above, with S. bicolor BAC c0156b06 and to generate sequence identity and dot plots.

Diversity Analysis
Plant Material
DNA sequences around Xcup15 were collected from 30 S. bicolor accessions including both cultivated (subsp. bicolor) (n = 17) and wild (subsp. arundinaceum) (n = 13) lines and a weedy relative, S. propinquum (Table 1). These accessions, comprising all S. bicolor subspecies and races, were chosen to maximize geographic distribution, morphological variation, and genetic diversity as assessed by variation at 74 SSR loci (Casa et al., 2005). This sampling strategy was devised to minimize the effects of population structure on tests of selection. Seeds from cultivated material (landraces) were obtained either from the National Center for Genetic Resources Preservation (USDA-ARS, Ft. Collins, CO) or the Plant Genetic Resources Conservation Unit (USDA-ARS, Griffin, GA), and seeds from wild accessions were provided by Mitchell R. Tuinstra (Agronomy Department, Kansas State University). Sorghum propinquum leaves were obtained from the Plant Genome Mapping Laboratory (University of Georgia). Information on geographic origin and racial classification was gathered primarily from the System-wide Information Network for Genetic Resources database (http://singer.cgiar.org/Search/SINGER/search.htm, verified 5 May 2006).


Table 1. Sorghum bicolor accessions analyzed including cultivated (ssp. bicolor) types, wild (ssp. arundinaceum) and outgroup (S. propinquum).

 
DNA Sequencing and Assembly
Total genomic DNA was isolated from individual seedlings following a standard CTAB extraction protocol and used as template in PCRs following previously established protocols (Casa et al., 2005) except for segments (loci) 9a and 9b (see Results and Discussion), where annealing temperature was 62°C. PCR products were prepared for direct sequencing by treatment with exonuclease I (New England Biolabs, Ipswich, MA) and shrimp alkaline phosphatase (Promega Corporation, Madison, WI) following the enzyme manufacturers' instructions. Single-pass sequencing was performed at the Cornell University BioResource Center. Most individuals were homozygous so double-pass sequences were obtained only when putative heterozygotes were encountered. DNA sequences were assembled using Sequencher and alignments were visually inspected and manually edited. Each set of sequence chromatograms was inspected independently by at least two people. DNA sequences were deposited in the NCBI PopSet database under accession numbers DQ462793-DQ463100.

DNA Sequence Analysis
Summary statistics including levels of diversity based on both the average number of nucleotide differences per site between two sequences (π) and number of segregating sites (θ), interspecific divergence, and Fst, were calculated using DnaSP v. 4.0 (Rozas et al., 2003). Insertion–deletion polymorphisms were excluded from these analyses. Three statistics were employed to evaluate deviations from the neutral, equilibrium model:

(i) The HKA test (Hudson et al., 1987) was used to compare ratios of polymorphism to divergence for sampled regions assuming a neutral model (i.e., no selection). Each locus was tested against a reference locus comprised of pooled data from 204 loci (Hamblin et al., 2006). For intraspecific polymorphism the following parameters were used: S = 1075, N = 16, and L = 138243, where S is the number of variable sites, N is the sample size, and L is the total number of nucleotide sites surveyed in a sample of cultivated sorghum. For interspecific divergence we used K = 1948 and L = 136626, where, K is the average number of differences between cultivated S. bicolor and S. propinquum and L is the number of nucleotide sites evaluated. A Bonferroni correction was applied to account for multiple comparisons.

(ii) Tajima's D (Tajima, 1989) was employed to test for an excess of rare alleles. Following a selective sweep, new mutations arise in the selected region resulting in a skew in the distribution of nucleotide polymorphisms (site frequency spectrum). The population bottleneck associated with sorghum's domestication is, however, expected to affect the site frequency spectrum genome-wide; in particular, the variance of D will be much larger than under a neutral equilibrium model. Critical values of D were obtained from coalescent simulations of a simple bottleneck model that produces the same average number of segregating sites and the same average D as was observed in a genome-wide survey of variation in cultivated sorghum, and in which most of the parameters were estimated based on independent data (Hamblin et al., 2006): the average ancestral population mutation parameter (4Neµ) was fixed at 3.8 based on variation in wild S. bicolor; the population recombination parameter (4Ner) was fixed at 0.01 bp (Hamblin et al., 2005). The time of the bottleneck was 0.025(4Ne) generations ago, which would correspond to about 14000 generations ago if all our assumptions were correct (although this is considerably longer ago than is suggested by archeological data, namely 3000 to 6000 years ago, more recent bottlenecks were incompatible with the observed average value of D). Assuming that the size of the current population and the ancestral population are the same, the intensity of the bottleneck (the size of the bottlenecked population relative to its duration) required to produce the observed value of S was 2.1, equivalent to a 128-fold reduction in population size. The distribution of D values generated by 10000 simulations of this model had a 95% confidence interval of –1.96, +2.35.

(iii) The CLR test (Kim and Stephan, 2002) was employed for detecting directional selection along a recombining chromosome. This test compares the likelihood of observed patterns of DNA sequence variation under a selective sweep model with that under a neutral equilibrium model of evolution. The CLR test was also used to generate maximum likelihood estimates (MLEs) of the location of the putative selected site (X) and the strength of selection (α = 2Nes). The following parameters were used: NCD = 0 (number of coding regions), Rn = 0.023 [scaled recombination rate (4Ner) per nucleotide, where r = 4 × 10⁻⁸ (Hamblin et al., 2005) and 4Ne = 570000 (Hamblin et al., 2006)], θ –1 (Watterson's estimate of θ from data; Watterson, 1978), Nrepl 1 (number of replicates), LBs 1 and RBs 100250 (left and right boundaries on the candidate region where beneficial mutation might be located), and intX 1000 (interval between initial guesses of X). Recombination rate was assumed constant across the region and θ was estimated from the data in order to make the CLR test conservative (Kim and Stephan, 2002). The frequency of the beneficial allele was set to 1. This method assumes the selected site was fixed very recently. Only accessions for which DNA sequences were available for all loci were included in the analysis (see Table 1). Variable sites were coded as either ancestral (0) (if the nucleotide at the variable position was shared with S. propinquum) or derived (1).
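The ancestral/derived coding and the site frequency statistics described in (ii) and (iii) can be illustrated with a minimal Python sketch. It is not the procedure used in this study (summary statistics were calculated with DnaSP, and critical values of D came from coalescent simulations); the function names `polarize` and `tajimas_d` are hypothetical, and the sketch assumes aligned sequences of equal length with no missing data and at least four sampled sequences.

```python
from itertools import combinations
from math import sqrt


def polarize(ingroup, outgroup):
    """Recode aligned sequences as '0' (matches outgroup) or '1' (derived).

    Simplification: any base that differs from the outgroup is called derived.
    """
    return ["".join("0" if base == anc else "1" for base, anc in zip(seq, outgroup))
            for seq in ingroup]


def tajimas_d(haplotypes):
    """Return (pi, theta_w, D) for equal-length haplotype strings.

    pi and theta_w are totals for the region, not per-site values.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    columns = [[h[i] for h in haplotypes] for i in range(len(haplotypes[0]))]
    segregating = [col for col in columns if len(set(col)) > 1]
    s = len(segregating)
    if s == 0:
        return 0.0, 0.0, float("nan")  # D is undefined without variation

    # pi: average number of differences over all pairs of sequences
    n_pairs = n * (n - 1) / 2
    pi = sum(a != b for col in segregating for a, b in combinations(col, 2)) / n_pairs

    # Watterson's theta: S divided by the harmonic-like sum a1
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1

    # Variance constants from Tajima (1989)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d
```

A sample in which every polymorphism is a singleton, as expected shortly after a sweep, gives a negative D because π falls below Watterson's θ. Under a bottleneck such as that associated with domestication, the significance of D would still need to be judged against simulated critical values rather than the standard neutral distribution.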

Distinguishing between Positive Selection and Demographic Factors
A goodness-of-fit (GOF) test (Jensen et al., 2005) was performed for discriminating whether CLR test rejections were due to selection or to nonequilibrium demographic effects. To determine significance, GOF values obtained from our polymorphism data were compared with those estimated from 1000 data sets simulated under a selection scenario using the maximum likelihood parameter estimates of the location and intensity of selection from the CLR test. In this way, given that the dataset has rejected neutrality in favor of selection, the GOF sets the CLR test selection model as the null and determines whether the sweep model explains the data well, or whether the data simply poorly fit a neutral, equilibrium model.
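The comparison of an observed GOF value with values from simulated data sets amounts to an empirical P value. The sketch below shows that calculation in Python under one assumption that the text does not state: that larger GOF values indicate a poorer fit to the sweep model, so the upper tail is the relevant one. The function name `empirical_p_value` is hypothetical.

```python
def empirical_p_value(observed, simulated):
    """Fraction of simulated statistics at least as large as the observed one.

    Assumes (not stated in the source) that large values mean a poor fit.
    """
    if not simulated:
        raise ValueError("need at least one simulated value")
    exceed = sum(1 for value in simulated if value >= observed)
    return exceed / len(simulated)
```

With 1000 simulated data sets, as used here, the smallest nonzero P value this calculation can return is 0.001.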


    Results and Discussion
Gene Annotation and Comparative Analysis
To identify functional regions that might be targets of selection, and to obtain sequence information allowing additional polymorphism surveys in the vicinity of Xcup15, we identified and sequenced S. bicolor BAC clone c0156b06. Within this 112592 bp clone, 20 complete genes were predicted (Table 2). Three of the predicted genes were associated with S. bicolor retrotransposons (either reverse transcriptases or polyproteins). Of the 17 remaining genes, 15 had homologs in rice (P < e–16), 13 of which were collinear in a region on rice chromosome 3 (nucleotides 1991425 to 2070448) (Table 2). Homologous S. bicolor transcripts could be identified for 65% (11/17) of the nontransposable element-related genes. We should note that failure to identify an EST does not necessarily mean that the gene is not expressed. Although there are ≈200000 S. bicolor ESTs in the public domain (www.ncbi.nlm.nih.gov/; verified 5 May 2006), these do not capture all expressed genes.


Table 2. Predicted genes within Sorghum bicolor BAC clone c0156b06.

 
Comparative sequence analyses indicated that gene order and orientation were well conserved between the 112592 bp region containing Xcup15 (on S. bicolor chromosome 1) and an 82915 bp region on rice chromosome 3 (Fig. 1). Areas of noncollinearity were primarily located in two regions corresponding to the S. bicolor retroelements and their associated flanking sequences (Fig. 1).


Fig. 1. Dot plot showing areas of sequence similarity between Sorghum bicolor BAC c0156b06 and rice chromosome 3. The x axis shows the rice genome coordinates (nucleotides), and sorghum coordinates are on the y axis. Note that the sorghum BAC sequence is in opposite orientation to the rice sequence (i.e., lower right line in the diagonal corresponds to sorghum gene 1 on Table 2). Arrows on the right side of the figure span the sorghum retrotransposon insertions.

 
Sequence Diversity Assessment
To assess diversity in cultivated and wild accessions within the Xcup15 region, genomic DNA sequences were collected from 10 segments (loci) ranging in size from 240 to 1964 bp and spanning 99 kb centered on Xcup15 (Supplemental Table 1). Because regions with higher neutral mutation rates (and less selective constraint) provide greater power to detect reductions in variation, we assayed mostly intronic and intergenic sequences (79.2% of the ≈7.7 kb of DNA sequence obtained from each individual was from noncoding DNA). Only at locus 7 did the analyzed sequence consist entirely of coding sequence. The candidate SSR, Xcup15, resides within locus 9b (Tables 3 and S-1).

Levels of within and between species variation (diversity and divergence, respectively) in the sampled region are shown in Table 3. Cultivated sorghums were invariant at six of the 10 loci and average nucleotide diversity (π) was 0.0008 (range was 0.0–0.0071), a considerably lower estimate than obtained in a previous study of other genomic regions in the same sorghum accessions (average π was 0.0023) (Hamblin et al., 2006). In general, levels of diversity based on the number of segregating sites (θ) were lower than those based on π (Table 3). Notably, locus 9-10b was unusually diverse. This locus, from an intergenic region rich in miniature inverted repeat transposable elements (MITEs) between the PP2C gene and a predicted protein, accounted for most (≈90%) of the variation detected within cultivated sorghum.


Table 3. DNA sequence diversity for loci sampled within the Xcup15 region in cultivated and wild Sorghum bicolor.

 
In contrast to the cultivated material, wild accessions were polymorphic at all loci. Average diversity levels based on π (0.0027) and θ (0.0031) were similar and about three times higher than in cultivated sorghum. Accession L-WA15 was heterozygous at three loci and three wild samples had a MITE insertion within locus 9-10a (data not shown). As observed in cultivated lines, locus 9-10b exhibited the highest levels of variation (Table 3). Notably, a ≈1 kb transposon-like insertion was observed within locus 9b (which includes SSR locus Xcup15) of S. propinquum (outgroup). This insertion was absent in all cultivated and wild accessions.

Diversity and divergence trends for cultivated and wild sorghums within the genomic region containing Xcup15 are shown in Fig. 2. Directional selection is expected to reduce levels of diversity in cultivated relative to wild sorghum around the selection target. Previous genome-wide estimates of nucleotide diversity have indicated that cultivated accessions exhibit about two-thirds the diversity observed in wild material (Hamblin et al., 2005). In the Xcup15 region, however, cultivated lines were even less diverse, showing one-third the diversity of wild accessions. The contrast is even more striking when polymorphism data for locus 9-10b, an extreme outlier with similar polymorphism levels in both cultivated and wild sorghums (see above), are excluded from the analysis. In that case, the amount of variation in cultivated sorghum was only 5% of that observed in wild accessions. The magnitude of this reduction in diversity is comparable with that reported for domestication-related genes in maize. In contrast to genome-wide estimates that indicate that maize contains ≈57% of the variability found in its progenitor (Wright et al., 2005), the promoter regions of the maize teosinte branched1 (tb1) (Doebley et al., 1995) and teosinte glume architecture1 (tga1) (Dorweiler et al., 1993) alleles possess 3% (Wang et al., 1999) and 5% (Wang et al., 2005), respectively, of the variation observed in wild relatives, the teosintes. Both tb1 and tga1 have been shown to be targets of domestication-related selection in maize (Wang et al., 1999, 2005).


Fig. 2. Diversity, divergence, and population differentiation (Fst) for 10 loci in a 99 kb region flanking Xcup15 in cultivated and wild Sorghum bicolor lines. The y axis shows levels of diversity (x100) and divergence (x100) and S. bicolor BAC c0156b06 coordinates (bp) are on the x axis (actual numbers were multiplied by 100 to obtain the reported numbers). Solid and dashed trend lines represent diversity and divergence, respectively. Cultivated and wild sorghum accessions are denoted by open squares and circles, respectively. The solid gray line with asterisks shows Fst values, a measure of population differentiation between cultivated and wild lines. Loci sampled are designated by numbers and letters along the lines (see Table 3). The asterisk to the right of the graph denotes average divergence levels between cultivated sorghum and S. propinquum based on genome-wide estimates. The arrow head along the x axis indicates the approximate location of SSR locus Xcup15.

 
Our data indicate that average nucleotide divergence (1.4%) (Table 3) between cultivated S. bicolor and S. propinquum in the Xcup15 region was similar to previous estimates (Hamblin et al., 2004, 2006). Divergence at one locus (9b), however, is twice as high (2.8%) (Fig. 2). This locus contains the Xcup15 SSR and encompasses part of the 5' UTR and upstream region of the PP2C gene. These patterns probably reflect differences in underlying mutation rates and/or functional constraint on these regions.
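As a rough check on the comparison with earlier estimates, the pooled reference-locus values given for the HKA test in Materials and Methods (S = 1075, N = 16, L = 138243 for polymorphism; K = 1948, L = 136626 for divergence) can be converted to per-site quantities. The Python sketch below does this arithmetic; it assumes that N is the number of sampled sequences and uses a hypothetical function name.

```python
def watterson_theta_per_site(s, n, length):
    """Watterson's estimator: S / (a1 * L), with a1 = sum of 1/i for i < n."""
    a1 = sum(1 / i for i in range(1, n))
    return s / (a1 * length)


# Pooled reference locus from Materials and Methods (Hamblin et al., 2006)
ref_theta = watterson_theta_per_site(1075, 16, 138243)  # about 0.0023 per site
ref_divergence = 1948 / 136626                          # about 0.014 per site

print(f"reference theta per site: {ref_theta:.4f}")
print(f"reference divergence per site: {ref_divergence:.4f}")
```

The reference divergence of about 1.4% agrees with the average divergence reported here for the Xcup15 region, whereas the reference diversity of about 0.0023 is roughly three times the average π of 0.0008 observed in cultivated sorghum in this region.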

Fst measures the level of genetic differentiation between populations (here, cultivated and wild sorghums) based on allele frequencies. Under a scenario of directional selection in cultivated sorghum, Fst values are expected to be higher at the selection target and adjacent loci, but diminish with distance as recombination prevents the unusual differentiation associated with selection from occurring. Although the average value of Fst observed across the entire Xcup15 region (0.15) is comparable with a previous estimate based on genome-wide SSR data (Fst = 0.13) (Casa et al., 2005), loci corresponding to the third intron (locus 9a) and the 5' UTR (locus 9b) of the PP2C gene revealed a considerably greater degree of differentiation (0.52 and 0.46, respectively) (Fig. 2 and Table 4). Thus, the Fst analysis suggests that selection may have occurred in or near loci 9a and 9b (the PP2C gene).
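To make the Fst calculation concrete, the Python sketch below implements one common sequence-based estimator, Fst = 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean between populations. The text states only that Fst was calculated with DnaSP, so the choice of this estimator, the unweighted average of the two within-population values, and the function names are all assumptions made for illustration.

```python
from itertools import combinations, product


def differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise differences between sequences of two populations."""
    pairs = list(product(pop1, pop2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def fst(pop1, pop2):
    """Fst = 1 - Hw / Hb (assumed estimator; can be negative in small samples)."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    hb = mean_between(pop1, pop2)
    if hb == 0:
        raise ValueError("no differences between populations; Fst undefined")
    return 1 - hw / hb
```

Two groups separated only by fixed differences give Fst = 1, and any polymorphism segregating within a group lowers the value. This matches the expectation that Fst peaks near a selection target and declines as recombination restores shared variation.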


Table 4. Fst, HKA test P values, and Tajima's D for wild and cultivated sorghums.

 
A Candidate for Directional Selection in Cultivated Sorghum
Several features of the data lend support to the hypothesis that recent directional selection has shaped diversity patterns around Xcup15. This assertion is based in part on previous observations on levels of DNA sequence diversity from S. bicolor and also polymorphism data obtained from other grass species. First, diversity levels across this region in cultivated accessions were very low, about one-third of previous estimates of genome-wide diversity using the same accessions (Hamblin et al., 2006).

Second, simulation studies have shown that LD increases after a selective sweep (Przeworski, 2002; Kim and Nielsen, 2004). A particular haplotype (extending for at least 99 kb) predominated among cultivated sorghums while wild accessions showed no such haplotype structure (Fig. 3). Previous estimates in sorghum have indicated that LD decays, on average, by 15 kb (Hamblin et al., 2005). Although low levels of polymorphism in the Xcup15 region precluded assessment of LD levels, the haplotype structure in cultivated sorghums was unusual and resembled that observed in swept regions of other species. For example, DNA sequence data from maize, a randomly mating outcrossing species (Brown and Allard, 1970), have suggested that selection produces higher LD. In a survey of six genes (1.2–10 kb in length) in a diverse set of tropical and semitropical lines of maize, Remington et al. (2001) found that LD declined rapidly (within 200–1500 bp) for five genes but that it decayed much more slowly (within ≈10 kb) for sugary 1 (su1). Subsequent analysis showed that su1, which encodes an enzyme in the starch biosynthesis pathway, had been under directional selection during either domestication or breeding (Whitt et al., 2002). Extended LD has also been detected around the maize allele of the Y1 gene that confers yellow endosperm (Palaisa et al., 2003). In rice, nucleotide diversity data surrounding the xa5 locus, a bacterial blight resistance gene, showed significant LD between sites 100 kb apart for resistant accessions but no significant association among susceptible types (Garris et al., 2003). Rice, like sorghum, is a predominantly selfing species although outcrossing rates in rice (<1%) (Rong et al., 2004) are much lower than estimates for sorghum (5–30%) (Ollitraut, 1987; Doggett, 1970).


Fig. 3. Haplotypes observed in the Xcup15 region of cultivated and wild sorghum excluding locus 9–10b. Numbers across the top of the figure indicate the site coordinate within the BAC sequence (bp), and the shaded area denotes the position of the derived fixed nucleotide difference between cultivated and wild sorghum accessions. Dots indicate sequence identity to reference sequence. Segregating nucleotide sites are shown only for accessions with no missing data. Insertion–deletion polymorphisms were excluded from the analysis. Mutations unique to S. propinquum are not shown.

 
Finally, support for recent selection in the region we have studied in sorghum comes from our identification of a fixed G→A transition between wild (including S. propinquum) and cultivated accessions at position 56122 bp of the BAC clone (corresponding to the 5' UTR of the PP2C gene) and ≈105 bp upstream of SSR Xcup15 (Fig. 3). Previous analysis of variation across a total of 23174 bp (Hamblin et al., 2005) never yielded a fixed difference between DNA sequences from wild and cultivated sorghums. Moreover, DNA sequence alignment of this region to sequences from sugarcane, maize, and rice indicated that these taxa exhibit the same nucleotide (G) observed in the wild sorghums at position 56122 bp, confirming that the A allele in the cultivated accessions is derived. The serine–threonine phosphatase (PP2C gene) that harbors this fixed transition was most similar to Arabidopsis thaliana gene At3g51370 and belongs to one of the largest gene families described in plants. According to Kerk et al. (2002), Arabidopsis contains 69 such genes. Moreover, the PP2C Arabidopsis homolog is a member of one of the least studied groups of phosphatases, class D (Schweighofer et al., 2004). Serine–threonine phosphatases have been implicated in mechanisms such as abscisic acid (ABA) signal transduction, regulation of flower development (Schweighofer et al., 2004), and seed germination (Yoshida et al., 2006). Two sorghum domestication-related QTLs co-localize with Xcup15, one for plant height (Lin et al., 1995) and the other for primary branch number in the inflorescence (P.J. Brown, 2006, personal communication). Although the prospects are tantalizing, we have no evidence at present that the PP2C gene does or does not influence any of these phenotypes in S. bicolor. The high level of LD (haplotype structure) in the cultivated lines also argues for caution in accepting the PP2C gene as the actual target of selection without additional functional and/or association studies (see below).

Statistical Evidence for Selection
We employed statistical methods to determine if the patterns of diversity observed in cultivated sorghums in the genomic region surrounding Xcup15 differed significantly from an equilibrium neutral model and in a manner consistent with a selection scenario in cultivated sorghum. Directional selection (i.e., fixation of a favorable mutation) will result in decreased variation at linked neutral regions and the size of the affected region is a function of both the regional rate of recombination and the strength of selection. To test if differences among loci in the amount of diversity within species relative to divergence between species were significant we employed the HKA test. Because the amount of DNA sequence variation observed within a species (diversity) is expected to be proportional to the amount of DNA sequence divergence between species at neutrally evolving loci (Kimura, 1983), significant differences in these ratios might suggest the local effects of selection. If a particular locus shows a low ratio of diversity to divergence relative to other loci, for example, directional selection may have been responsible for the reduced diversity and the locus possibly encodes or influences a domestication-related trait. Conversely, higher diversity than expected under a neutral evolution model might indicate the effects of balancing or diversifying selection (the locus could be involved in local adaptation or crop improvement). Results from HKA tests for the 10 loci surveyed are presented in Table 4. Among the comparisons performed for cultivated sorghum (each of 10 loci vs. a "reference locus" composed of genome-wide data) (see Materials and Methods), only locus 9b (the same locus that showed the fixed nucleotide difference between cultivated and wild sorghums) exhibited a significant P value (0.0009) after applying the Bonferroni correction. 
This finding indicates a deficiency of polymorphism in cultivated lines relative to divergence and is consistent with expectations under a model of recent directional selection. None of the HKA tests performed on loci from the wild accessions were significant (Table 4). Although not ideal, comparison of wild data to the cultivated reference locus was carried out due to the lack of an appropriate reference dataset derived exclusively from the wild sorghums, and is conservative for detection of directional selection.
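The Bonferroni criterion applied to the HKA comparisons can be sketched in a few lines. The family-wise error rate of 0.05 is an assumption (a conventional value); the number of comparisons (10 loci) and the P value for locus 9b (0.0009) are taken from the text above.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


FAMILY_ALPHA = 0.05   # assumed family-wise error rate (not stated in the text)
N_TESTS = 10          # one HKA comparison per locus
P_LOCUS_9B = 0.0009   # P value reported for locus 9b

threshold = bonferroni_threshold(FAMILY_ALPHA, N_TESTS)
is_significant = P_LOCUS_9B < threshold
print(f"per-test threshold = {threshold:.4f}; locus 9b significant: {is_significant}")
```

Under these assumptions, the reported P value for locus 9b falls below the per-test threshold, in line with the statement that it remains significant after correction.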

Another feature of the sequence data that can be used to infer the action of selection is the frequency distribution of polymorphisms. Assuming no recombination, a selective sweep of a new mutation or unique variant eliminates all linked neutral variation. With time, as the population recovers from the sweep, new mutations will accumulate, initially at low frequencies. This skew toward an excess of rare variants is measured by Tajima's D, which is based on the difference between two measures of diversity, θw and θπ. The θw estimate (θ in Table 3) is based on the number of segregating sites and is, therefore, affected mostly by low-frequency variants, while θπ (π in Table 3) is based on average nucleotide diversity and is mostly influenced by intermediate-frequency alleles. Because the means of these two estimators are expected to be equal under neutrality (see Fay and Wu, 2005), significantly negative values of D are consistent with directional selection whereas significantly positive values are consistent with balancing selection. Results for Tajima's D (Table 4) indicated a predominance of low-frequency polymorphisms in both cultivated (average D = –0.45) and wild (average D = –0.47) sorghums. Although these results are in the direction expected under a directional selection scenario, none of the loci (D ranged from –1.50 to +1.35) differed significantly from expectations under either an equilibrium neutral model or a simple bottleneck model. Therefore, the Tajima's D results provide no evidence for a recent selective sweep of a single new or unique variant. This result is not surprising considering that the power of this test for detecting a selective sweep is restricted to a fairly narrow time interval following the sweep (Simonsen et al., 1995).
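To make the relationship between the two diversity estimators concrete, the following sketch computes θw, θπ, and Tajima's D from aligned sequences using the standard formulas of Tajima (1989). It is a minimal illustration, not the software used in this study; the function name and the toy alignments are hypothetical, and the calculation is per locus (not per site) and ignores missing data and indels.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Return (theta_w, theta_pi, D) for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    # S: number of segregating (polymorphic) sites
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # theta_pi: mean number of pairwise differences
    pairs = list(combinations(sequences, 2))
    theta_pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # theta_w: Watterson's estimator, S scaled by the harmonic number a1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    # Constants for the variance of (theta_pi - theta_w)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (theta_pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, theta_pi, d


# Toy alignments (hypothetical): only singletons vs. only intermediate-frequency variants.
rare_variants = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]
common_variants = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]
d_rare = tajimas_d(rare_variants)[2]      # negative: excess of rare variants
d_common = tajimas_d(common_variants)[2]  # positive: excess of intermediate-frequency variants
```

Assessing whether an observed D is significant requires comparison against a null distribution (for example, from coalescent simulations), which this sketch does not attempt.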

Unlike the previous tests, in which loci are tested individually, the composite likelihood ratio (CLR) test evaluates the significance of a local reduction of variation along a physically linked but not necessarily contiguous stretch of DNA (see Materials and Methods). Departure from neutrality is, therefore, tested with sequence data from all loci simultaneously. Moreover, the CLR estimates the strength and location of directional selection from DNA sequence data. We tested polymorphism data for the cultivated and wild groups separately and also for the combined dataset to evaluate species-wide patterns. Results from this composite likelihood analysis rejected the neutral equilibrium model in favor of a strong selective sweep or hitchhiking model (MLE of the strength of selection, α = 10087) only in the combined dataset. When population size (Ne) is set to 142500 (see Materials and Methods), the MLE of α suggests a selection coefficient (s) of 0.035. This value of s is similar to those obtained for the tga1 (s = 0.03–0.04) (Wang et al., 2005) and tb1 (s = 0.04–0.08) (Wang et al., 1999) genes of maize. As indicated above, both loci have been shown to be targets of domestication-related selection in maize (Wang et al., 1999, 2005). In addition, the CLR test located the target of selection at position 26107 bp of the BAC clone sequence (between genes 2 and 3) (Table 2) and ≈30 kb upstream of the fixed transition (at 56122 bp) observed between wild and cultivated sorghums (see above). Except for multiple transposable element-related coding sequences, the region containing the predicted target comprises the longest expanse of DNA containing no predicted genes (Table 2). It is worth noting, however, that simulation studies have recently demonstrated that the MLE of the target of selection is less reliable in partially sequenced regions, with a much larger relative mean square error than estimates based on complete sequence (J.D. Jensen, 2006, personal communication). To quantify this uncertainty, 95% confidence intervals were calculated via parametric bootstrap and encompassed ≈39% of the total region, between positions 6487 and 45722. To improve the precision of our localization, therefore, we would need to collect contiguous DNA sequence polymorphism data from across the entire 99 kb sample region (a very significant sequencing effort).
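The conversion from the composite-likelihood estimate of α to a selection coefficient, and the distance between the estimated target and the fixed transition, can be checked with simple arithmetic. The sketch assumes the scaling α = 2·Ne·s, which reproduces the value of s quoted above; the function name is illustrative.

```python
def selection_coefficient(alpha, ne):
    """Selection coefficient s implied by alpha, assuming alpha = 2 * Ne * s."""
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    return alpha / (2.0 * ne)


ALPHA_MLE = 10087          # MLE of the strength of selection (from the text)
NE = 142500                # effective population size used in the text
CLR_TARGET_BP = 26107      # CLR estimate of the selection target (BAC coordinate)
FIXED_SITE_BP = 56122      # position of the fixed G-to-A transition (BAC coordinate)

s = selection_coefficient(ALPHA_MLE, NE)
distance_bp = FIXED_SITE_BP - CLR_TARGET_BP
print(f"s = {s:.3f}; target lies {distance_bp} bp upstream of the fixed transition")
```

Because s scales inversely with Ne under this assumption, a different choice of effective population size would change the implied selection coefficient proportionally.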

Distinguishing Selection from Demographic Factors
Results from the CLR test indicated that patterns of diversity in this region of the sorghum genome are a better fit to a selective sweep model than to an equilibrium neutral model. This test, however, is not robust to undetected population structure or a recent bottleneck (Jensen et al., 2005), processes that can generate large deviations from equilibrium and patterns of sequence variation that resemble those expected under a selection scenario. For example, an alternative interpretation of the diversity patterns observed for cultivated and wild sorghums (Fig. 2) could involve demographic amplification of ancestral stochastic variation via a population bottleneck associated with cultivation. Alternatively, this pattern could represent a preexisting sweep signal (i.e., selection occurred in the wild sorghums and was amplified in cultivated lines through one or more bottlenecks) (see Pool et al., 2006).

To address this issue, we took the maximum likelihood estimates from the CLR test and employed them in the goodness-of-fit (GOF) test, which has been shown to have high sensitivity for discriminating between a hitchhiking model and nonequilibrium demography (Jensen et al., 2005). Results from the GOF test suggest that the hitchhiking model fits the data poorly (P = 0.12; the lower the value the worse the fit) and, therefore, the signal detected by the CLR method cannot be distinguished from demography. We should note, however, that other factors might account for the poor fit observed with the GOF test. First, Jensen et al. (2005) have indicated that deviations from a simple selection model (one that assumes a single, recent, and complete sweep) can generate a large ΛGOF (and therefore a small P value), even if selection has taken place. Additionally, joint analysis of the wild and cultivated data artificially created population structure (see Fst results, Table 4), which has been shown to lead to false positives with the CLR test (Jensen et al., 2005). Furthermore, the sweep model (Kim and Stephan, 2002) assumes that the data are sampled from a random mating population at equilibrium. Sorghum, however, is a predominantly selfing species and is not a population at equilibrium (Hamblin et al., 2005, 2006). While the GOF test appears to be robust to violations of a number of these assumptions in Drosophila (Jensen et al., 2005), the effects of these violations are as yet unexplored in a species such as sorghum. Thus, the results of the CLR test should be viewed only as being consistent with, and not evidence for, recent strong selection in this region of the sorghum genome.

Implications for Identifying Targets of Directional Selection
The power to detect directional selection is directly proportional to the amount of within-species diversity. That is, higher levels of variation provide more power for detecting significant reductions in variation likely associated with selection (Wright et al., 2005; Yamasaki et al., 2005; Hamblin et al., 2006). Cultivated sorghum exhibits one-fourth of the amount of genetic variation observed in a comparable sample of geographically and genetically diverse maize landraces (Hamblin et al., 2004, 2005). Therefore, the low levels of diversity observed within sorghum, coupled with the relatively low divergence to the outgroup (S. propinquum), represent major factors limiting our ability to unambiguously determine the target of selection in this genomic region.

When employing genome-wide scans of diversity to identify signals of selection, there are both advantages and disadvantages associated with having extensive haplotype structure or LD. For example, species with fairly extensive LD such as rice and sorghum require lower marker density for suitable genome coverage compared with species in which LD decays much more rapidly (e.g., maize). Conversely, extensive haplotype structure also hinders exact localization of the selection target. Because one major haplotype was observed along the 99 kb Xcup15 region of cultivated sorghum, at least 12 predicted genes (1, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, and 15; Table 2) should be considered as potential selection candidates. Given that we were unable to establish the precise boundaries of this putative sweep, genes outside this range are possible candidates as well, despite the evidence of a fixed derived mutation at the PP2C noted in the previous section.

The number of genes that one needs to consider as selection candidates will also depend on the interplay between recombination and genome organization. A major difference between the genome organization of maize and sorghum (as well as rice and Arabidopsis) is the interspersion patterns of genes and repetitive sequences. Sorghum and rice have very compact genomes (≈772 and 470 Mb, respectively; Arumuganathan and Earle, 1991; Goff et al., 2002) and gene density tends to be high (Goff et al., 2002; Kim et al., 2005). Gene density in maize, on the other hand, is much lower (SanMiguel et al., 1996; Tikhonov et al., 1999), with genes separated by large blocks of highly methylated repetitive elements (Bennetzen et al., 1994) that are recombinationally suppressed. For example, although LD extends for up to 90 kb upstream of tb1 (Clark et al., 2004), a gene that has played a major role in the morphological transition from teosinte to maize (Doebley et al., 1995) and is the best documented target of strong directional selection in plants (Wang et al., 1999), tb1 is the only gene present within this range. The remaining 90 kb upstream region is composed almost entirely of transposable elements.

Implications for Association Studies and Future Directions
Genome-wide scans of diversity performed in highly diverse panels of maize have yielded dozens of candidates associated with domestication and/or crop improvement (Vigouroux et al., 2002; Wright et al., 2005; Yamasaki et al., 2005). The success of population genetics-based approaches in maize, therefore, prompted us to evaluate this methodology applied to a selfing species as a way of identifying targets of directional selection. As this study reveals, DNA sequence polymorphism data support our initial findings based on SSR genome-wide scans of diversity (Casa et al., 2005) that recent directional selection likely shaped diversity patterns around locus Xcup15. Thus, as has been shown in maize, population genetics-based approaches can also lead to the successful identification of candidate genomic regions in sorghum. However, the domestication process in sorghum may not have been as simple as it apparently has been in maize (see Matsuoka et al., 2002). While we assume a single, recent, and complete sweep, it is possible that the history of cultivated sorghum was complex and involved multiple domestication events and/or postdomestication gene flow between wild and cultivated sorghum.

This study has also revealed that unambiguous identification of the target of directional selection in sorghum might not be as straightforward as it presumably has been in maize, because of the overall low levels of variation, more extensive LD, and other departures from equilibrium in sorghum (Hamblin et al., 2006). This challenge might also be faced when such studies are conducted in species that exhibit genomic characteristics and mating systems similar to those of sorghum. As with the genomic signatures of directional selection, we do not really know what the signal of diversifying selection (pertaining to traits such as flowering time, plant height, and disease resistance) will look like in sorghum. From a practical point of view, however, use of directed (i.e., starting from traits of interest instead of random scans of diversity) and integrated approaches (i.e., combining population development, QTL mapping, and assessment of variation in diversity panels) should pave the way for the successful identification of functionally interesting alleles for crop improvement and line development in S. bicolor.


    ACKNOWLEDGMENTS
 
We thank Hong Sun (Cornell University) for help with collection of DNA sequence data, Baohua Wang (Cornell University) for database assistance, John Bowers (University of Georgia) for the overgo hybridizations, Mitch Tuinstra (Kansas State University) for providing seeds of wild S. bicolor accessions, and Gael Pressoir (Cornell University), Amanda Garris (USDA-ARS, Geneva, NY), and two anonymous reviewers for their comments on the manuscript. This work was funded by NSF grant DBI 0115903 (to AHP, CFA, and SK).


    NOTES
 
Abbreviations: SSR, simple sequence repeat; BAC, bacterial artificial chromosome; EST, expressed sequence tag; HKA, Hudson Kreitman Aguadé; CLR, composite likelihood ratio; GOF, goodness-of-fit; LD, linkage disequilibrium; MITE, miniature inverted repeat transposable element; MLE, maximum likelihood estimate; PP2C, protein phosphatase 2C; UTR, untranslated region.

Received for publication February 1, 2006.






