Published online 16 July 2007
Published in Crop Sci 47:S-125-S-134 (2007)
© 2007 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
ORIGINAL RESEARCH
Sequence Variation at Candidate Loci in the Starch Metabolism Pathway in Sorghum: Prospects for Linkage Disequilibrium Mapping
Martha T. Hamblin (a,*),
Maria G. Salas Fernandez (b),
Mitchell R. Tuinstra (c),
William L. Rooney (d), and
Stephen Kresovich (e)
a Institute for Genomic Diversity, 156 Biotechnology Bldg., Cornell Univ., Ithaca, NY 14853
b Dep. of Plant Breeding and Genetics, Cornell Univ., Ithaca, NY 14853
c Dep. of Agronomy, 2004A Throckmorton Hall, Kansas State Univ., Manhattan, KS, 66506
d Dep. of Soil & Crop Science, Texas A&M Univ., 2474 TAMU, College Station, TX 77843-2474
e Institute for Genomic Diversity, Cornell Univ., Ithaca, NY 14853.
This work was supported by USDA and Hatch funds awarded to MRT and by the Institute for Genomic Diversity. Sequence data are available as GenBank accessions EF089570-EF090160.
* Corresponding author (mth3{at}cornell.edu).
ABSTRACT
Sorghum bicolor, an important component of human diets in many parts of the developing world, has attributes that should make it well suited to linkage disequilibrium (LD) mapping of complex traits: low sequence diversity, moderately extensive LD, and natural homozygosity. Sorghum endosperm quality, which determines the grain's suitability for preparation of local food products, varies greatly among cultivars and is genetically complex; some of the genes involved are likely to be in the starch metabolism pathway. To assess the prospects for LD mapping of variation in endosperm quality, we surveyed DNA sequence variation in 15 genes in the starch metabolism pathway in a panel of cultivars that vary in grain quality traits. Single nucleotide polymorphism (SNP) variation was found at all loci, including a strong candidate for the causal mutation underlying a waxy phenotype. Levels of LD varied but were generally moderate to high (average ZnS = 0.56), while non-singleton SNP frequency was low to moderate (silent-site diversity ranged from 0.00026 to 0.0068). The result is fairly low haplotype diversity that can be captured by a moderate number of "tag SNPs" at most loci. These results suggest that association mapping of candidate genes in sorghum can be done with relatively few markers, resulting in lower costs and a smaller multiple testing problem than in more diverse species such as maize (Zea mays L.).
Abbreviations: AGP, ADP-glucose pyrophosphorylase; BAC, bacterial artificial chromosome; BE, branching enzyme; DBE, debranching enzyme; LD, linkage disequilibrium; QTN, quantitative trait nucleotides; SNP, single nucleotide polymorphism; SS, starch synthases; UTR, untranslated region.
INTRODUCTION
GRAIN SORGHUM (Sorghum bicolor ssp. bicolor) is an important component of human diets in many parts of the developing world, especially in regions where water is limiting (FAO, 1996), and is a promising source of biomass for energy production (Hallam et al., 2001). As the complete sorghum genome sequence becomes available, sorghum functional and comparative genomics will be applied to the goals of increasing sorghum yield and quality, as well as to basic biological questions common to the grasses. It is therefore critical that the sorghum community develop resources and methodologies to efficiently dissect the genetics of important traits and to identify functional allelic variation in the extensive germplasm resources that are available. Population-based methods for mapping complex traits, originally developed in the human genetics community (Clark, 2003), are now being applied to plants (Aranzana et al., 2005; Yu and Buckler, 2006), and have been successful in the analysis of candidate genes in maize (Zea mays L.) (Wilson et al., 2004). Recently, the feasibility of whole-genome association mapping has been demonstrated in wheat (Breseghello and Sorrells, 2006) and in barley (Hordeum vulgare L.) (Rostoks et al., 2006).
Sorghum has attributes that should make it well suited to linkage disequilibrium (LD) mapping of complex traits: sequence diversity one-fourth that of maize (Hamblin et al., 2006), a population recombination parameter 26-fold lower than that of maize (Hamblin et al., 2005), and natural homozygosity. LD in sorghum is, on average, extensive enough that alleles at the 5' and 3' ends of genes may often be non-randomly associated. A moderate or even large number of single nucleotide polymorphisms (SNPs) can then be reduced to a small number of haplotypes, reducing genotyping costs and increasing statistical power (Clark, 2004). To assess the prospects of LD mapping of candidate genes in sorghum, we have undertaken a study of genes in the starch metabolism pathway, hypothesizing that variation at some of these genes underlies differences in endosperm quality related to starch composition.
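Because sorghum accessions are naturally homozygous, each accession contributes a single haplotype, so collapsing a SNP table into haplotypes is a matter of counting. The sketch below is illustrative only. The function name and the eight-accession panel are hypothetical, and haplotype diversity is computed here with Nei's unbiased estimator, H = n/(n - 1) × (1 - Σ p_i²). That choice of estimator is an assumption, not a statement about how the tables in this study were computed.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Return (number of distinct haplotypes, Nei's haplotype diversity).

    haplotypes: equal-length strings, one per homozygous accession, giving the
    allele at each SNP in order. Missing data are not handled in this sketch.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    # Nei's unbiased estimator: H = n / (n - 1) * (1 - sum of squared frequencies)
    diversity = n / (n - 1) * (1 - sum_p2)
    return len(counts), diversity


# Hypothetical panel: six SNPs scored in eight accessions collapse to three haplotypes.
panel = ["AAGTCA", "AAGTCA", "AAGTCA", "AAGTCA",
         "GCATTG", "GCATTG", "GCATTG", "AAGTTG"]
n_hap, h = haplotype_summary(panel)
print(n_hap, round(h, 3))  # prints: 3 0.679
```

In this toy panel, six SNPs carry no more information than three haplotypes. This is the sense in which extensive LD reduces genotyping effort.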
Grain from sorghum landraces (farmer-selected, open-pollinated varieties) from different regions varies in endosperm quality, making different varieties more or less desirable for preparation of particular foods that are consumed locally. For example, stickiness is considered a very undesirable quality in an African porridge called "tô", and many sorghum varieties produce an unacceptably sticky product (Da et al., 1982). The suitability of sorghum varieties for preparation of particular foods is related to the texture of the endosperm, which varies from "floury" to "corneous" depending on the relative sizes of the floury and corneous portions (Rooney and Miller, 1982); a floury endosperm is soft, while a corneous endosperm is hard (Chavan and Salunkhe, 1984). Endosperm texture is, in turn, partly a function of the starch composition of the endosperm (Dombrink-Kurtzman and Knutson, 1997); protein composition is also important (Chandrashekar and Mazhar, 1999). Sorghum endosperms range in starch content from 56 to 73% (Jambunathan and Subramanian, 1987), of which 70 to 80% is amylopectin and the remainder amylose. Variation in amylose content, which has both genetic and environmental bases, affects gelatinization temperature, pasting temperature, swelling power, and solubility. These characteristics determine such traits as grain hardness, dough or porridge stickiness, and rolling properties.
Some differences in endosperm quality are thought to arise because the ratio of linear amylose to branched amylopectin, together with the chain lengths and branching patterns of the amylopectin, affects the higher-order packaging of starch molecules in starch granules. These granules are the semi-crystalline structures where starch synthesis takes place and starch is stored (Jobling, 2004). While the basic enzymology of starch synthesis is fairly well understood, the physical relationship of enzymes to the granule, and the roles of many particular enzyme isoforms, are still unclear.
The key enzymes involved in endosperm starch synthesis are ADP-glucose pyrophosphorylase (AGP), the starch synthases (SS), and the branching (BE) and debranching (DBE) enzymes (James et al., 2003). AGP catalyzes the production of ADP-glucose, the substrate for starch synthesis, from glucose-1-phosphate and ATP. The starch synthases, of which there are ten forms in rice (Oryza sativa L.) (Hirose and Terao, 2004), catalyze the transfer of glucose from ADP-glucose to linear chains of amylose and amylopectin via α-1,4 linkages. Different SS genes are thought to be important in the synthesis of starch molecules with different degrees of polymerization, and are also associated with different branching patterns. For example, in maize, the granule-bound SS (GBSSI, encoded by the waxy gene) is required for the production of amylose, and mutations in SSIII (encoded by du1) lead to forms of amylopectin that are excessively branched (Gao et al., 1998). The production of amylopectin, which has branches formed by α-1,6 linkages, is achieved through the action of the BEs and DBEs, which also exist in multiple isoforms. Mutations in these genes result in various altered structures and properties of starch and starch granules (e.g., Dinges et al., 2003; Rahman et al., 1998; Sato and Nishio, 2003).
Changes in the activities of some of these enzymes, whether through structural or regulatory mutations, are likely to underlie genetic differences in starch composition and grain quality. For example, a substantial portion of the dramatic differences in grain quality between indica and japonica rice can be attributed to alleles at waxy (Yamanaka et al., 2004), SBE1 and SBE3 (Han et al., 2004), and SSIIa (Nakamura et al., 2005). In other cereals, the genetic architecture of starch-related traits may be even more complex, and environmental effects may also be important (Klein et al., 2001; Murty et al., 1982). As a first step toward elucidating the genetic basis of differences in endosperm quality in sorghum, we have undertaken a survey of variation in 15 candidate genes in a panel of 23 S. bicolor accessions chosen to represent a broad range of endosperm phenotypes. Fourteen of these genes encode enzymes involved in the starch metabolism pathway, from synthesis and transport of sugar precursors to modification of branching patterns (Fig. 1). We have also included starch phosphorylase. Its role in starch metabolism is not well understood (Blennow et al., 2002), but it shows a marked increase in activity during early endosperm development and is therefore hypothesized to have a role (Ohdan et al., 2005).
Table 2. Starch metabolism candidate genes surveyed. Zm = Zea mays; Sb = Sorghum bicolor; Os = rice; G = genomic DNA; R = cDNA; P = protein.
Our main goal in this study was to identify SNP variation at candidate genes for endosperm quality and to analyze the data from the perspective of association study design: how many SNPs are observed at these genes, where in the gene are they located, what is the pattern of LD among SNPs, and how many SNPs would need to be scored to capture the variation segregating in a population of unrelated lines? These questions are very timely as researchers on non-model plants consider the issues involved in moving from QTL mapping in experimental populations to association studies.
Because of our interest in the genetics of domestication, we also tested for evidence of directional selection on the starch metabolism pathway. Grain characteristics differ between wild and domesticated sorghum, so it is reasonable to hypothesize that aspects of starch metabolism may have experienced selection during the domestication process, as was found by Whitt et al. (2002) for at least two genes in this pathway in maize. It is of particular interest to determine whether selection acted on the same steps in the pathway in these two closely related crops.
MATERIALS AND METHODS
Sorghum Starch Content Diversity Panel
We surveyed a panel of 23 Sorghum bicolor accessions representing a wide range of diversity in endosperm characteristics (Table 1). Grain hardness and weight were measured with the Single Kernel Characterization System (Bean et al., 2006) at the University of Kansas. The measurement for each sample is based on about 100 independent measurements of individual seeds. Seed was obtained from multiple sources, so accessions were not grown in a common environment, and replicates were not available for most accessions. We used two outgroup taxa, S. propinquum and S. purpureosericeum, both of which are diploid and have the same chromosome number as S. bicolor (Price et al., 2005).
Table 1. Sorghum accessions in single nucleotide polymorphism (SNP) discovery panel. Numbers in parentheses are standard deviations.
Identification of Sorghum Genomic Sequences
We used inferred amino acid sequences for experimentally identified genes in maize and rice to query the sorghum GSS and EST databases by TBLASTN with default parameters. Resulting GSS and EST hits were assembled into contigs using Sequencher 4.0 (Gene Codes Corp., Ann Arbor, MI) at high stringency, i.e., only near-perfect matches were assembled into contigs. When only EST sequence was available, maize or rice sequence was used to infer exon/intron boundaries, and to estimate the size of introns, so that PCR primers spanning introns could be designed. In some cases, amplicons covered the entire coding region; in other cases, partial sequence was obtained. Primers used for amplification are provided in Supplementary Table 1. A summary of the regions surveyed is presented in Table 2.
Sequence Analysis
Amplicons across each gene were sequenced directly after treatment with Exonuclease I and shrimp alkaline phosphatase. Sequencing was performed at Cornell University's Bioresource Center using ABI BigDye chemistry and an ABI 3730 capillary sequencer. Traces were imported into Sequencher 4.0 and edited manually. Alignments were made manually using Se-Al (http://evolve.zoo.ox.ac.uk; verified 13 June 2007) and exported as FASTA files for analysis. DnaSP (Rozas et al., 2003) was used to calculate summary statistics of polymorphism, divergence, and linkage disequilibrium. MEGA (Kumar et al., 2001) was used to generate files of variable sites. Accession numbers of these sequences are provided in Supplementary Table 1.
LD Assessment and Tag SNP Identification
The program Haploview (Barrett et al., 2005) was used to produce "triangle plots" of pairwise LD and to identify tag SNPs, such that every non-singleton SNP was associated with a tag SNP at r² = 1.0.
RESULTS
Gene Identification
On the basis of homology to known genes from other plants, primarily maize, we assembled unannotated Sorghum bicolor genomic sequences into contigs that represent coding sequence for 14 loci (Table 2) encoding enzymes of the starch metabolism pathway. Sequences of mRNA for two of these loci were already known in sorghum (indicated by reference sequence "Sb" in Table 2). In addition, the genomic sequence for AGPss had already been identified in a bacterial artificial chromosome (BAC), for a total of 15 genes in this pathway. We are satisfied that these sequences represent expressed genes, based on high similarity to genes from maize and rice and the presence of near-perfect matches between ESTs and genomic sequence. Since many of these loci belong to gene families whose members are expressed in tissue-specific patterns, we attempted to identify those family members that play a major role in the endosperm, as opposed to leaves or other tissues, by finding sequences that are most similar to endosperm-specific maize genes. However, because the genes were identified in silico, because complete genome sequence was not available for sorghum, and because sorghum endosperm EST sequences were not available, there remains a possibility that one or more of these genes may be a paralogue of the maize gene used to identify it.
One case of possible paralogy is that of the small subunit of ADP-glucose pyrophosphorylase (AGPss). The leaf and endosperm isoforms of this protein are encoded in maize by two different loci, Agpls and Bt2, respectively (Hannah et al., 2001). These proteins are extremely similar except in the amino-terminal region, where the endosperm form has 43 amino acids and the leaf form has a non-homologous sequence of 85 amino acids. The difference is largely due to splicing: the 85 amino acid leaf isoform sequence can easily be aligned to a translation of Bt2 intronic sequence. If our sorghum sequence encodes the Bt2 homolog, it is incomplete at the 5' end, and we have no sorghum EST for this gene, so it is not entirely clear which gene we have identified. However, the protein translation of the sorghum sequence is a better match to the leaf protein than is the maize Bt2 intronic translation, suggesting that the sequence we identified encodes the leaf isoform. Another possibility is that, in sorghum, both forms of this enzyme are encoded by a single gene with alternative splicing, as has been proposed for barley (Thorbjornsen et al., 1996). While alternative splicing of AGPss has been observed (Burton et al., 2002), there is still substantial uncertainty about the complement of AGPss genes and their relationships to the plastidic/cytosolic/leaf/endosperm forms in many species (Johnson et al., 2003).
Another concern relevant to gene families is whether, in those cases where our sequence is partial and discontinuous, the amplicons are all truly from the same locus. Two types of evidence suggest that they are in fact from the same locus. First, we observe significant linkage disequilibrium between regions for most loci (see section Linkage Disequilibrium, Haplotype Diversity, and Tag SNPs below). Second, since the completion of this work, additional sorghum genomic sequence has become available from the Joint Genome Institute (JGI), in the form of unassembled scaffold contigs. In most cases, our discontinuous sequences show near-perfect matches to these scaffolds, and in no case does the new sequence provide evidence that our sequence comes from different loci (see Supplementary Figure 1).
Finally, it is obvious that this study is not exhaustive: for example, rice has a complement of ten SS genes, and we have surveyed variation in only five SS genes in sorghum.
Patterns of Sequence Variation
We surveyed DNA sequence variation at 15 loci (Table 2) encoding enzymes of the starch metabolism pathway in a panel of cultivated grain sorghum accessions chosen to represent the diversity of endosperm quality phenotypes in this crop (Table 1). Table 3 presents summary statistics of sequence polymorphism and the distribution of SNPs by functional category. Levels of total sequence diversity vary 16-fold, while levels of silent (i.e., synonymous plus non-coding) diversity vary 26-fold. (Haplotypes of all individuals at each locus are shown in Supplementary Figure 1.) These data can be analyzed in several different contexts. In one context, we are interested in identifying potentially functional variation that may be associated with grain quality. Eleven of the loci have amino acid variants, which are often considered the best candidates for quantitative trait nucleotides (QTN). However, QTN are at least as likely to be regulatory mutations in introns, UTRs, and promoter regions. Analysis in Drosophila (Andolfatto, 2005) suggests that only synonymous changes are, as a category, unlikely to be functionally significant. Thus, all loci have at least one SNP that is a candidate QTN, and most loci have several.
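To assign a coding SNP to the synonymous or nonsynonymous category, it is enough to translate the reference and alternative codons. The sketch below assumes the standard nuclear genetic code, a coding sequence given in frame from its first codon, and a single substitution per codon. The function name and example sequence are hypothetical. SNPs in introns and UTRs, which Table 3 counts as silent, fall outside its scope.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the first, second, and third positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_coding_snp(cds, pos, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    cds: coding sequence starting at the first base of a codon.
    pos: 0-based position of the SNP in cds.
    alt: alternative base.
    Returns (category, reference amino acid, alternative amino acid).
    """
    start = pos - pos % 3
    offset = pos % 3
    ref_codon = cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    category = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return category, ref_aa, alt_aa


# Hypothetical coding sequence: ATG CAG GCT (Met-Gln-Ala).
print(classify_coding_snp("ATGCAGGCT", 5, "T"))  # prints: ('nonsynonymous', 'Q', 'H')
print(classify_coding_snp("ATGCAGGCT", 8, "C"))  # prints: ('synonymous', 'A', 'A')
```

In the first example, a third-position change converts a glutamine codon (CAG) to a histidine codon (CAT). This shows that a single base change at a "wobble" position can still alter the protein.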
Table 3. Summary statistics of sequence variation at candidate loci. Sil = synonymous and non-coding sites; Non = nonsynonymous sites; Tot = total sites; Syn = synonymous sites; UTR = untranslated region (3' or 5' or both); Int = intron.
Two of the accessions in our panel, BTxArg1 and Tx2907, are waxy mutants: they have a waxy phenotype that maps to a single locus, presumably the homolog of the maize waxy gene, which encodes GBSS (Macdonald and Preiss, 1985). Many waxy mutations characterized in other grasses are large deletions, frameshift mutations, or premature stop codons, but Sato and Nishio (2003) also found single amino acid changes associated with the waxy phenotype in rice. We sequenced the entire coding region of the GBSS gene, including all introns and about 1.3 kb of sequence 5' of the initiation codon. We found a single mutation that distinguishes BTxArg1 from the non-waxy accessions: a change from glutamine to histidine at amino acid 268. This residue is conserved across the grasses, and it also appears to be invariant in several families of dicots in which this region of the GBSS gene has been used to construct phylogenies (Evans et al., 2000; Fig. 2). Tx2907, which does not share this mutation, has a unique haplotype but no unique mutations. However, there is an approximately 600-bp region of missing data in this accession because one set of PCR primers failed to amplify. Since there are no mutations in the primer-binding sequences in Tx2907, this region may contain a large insertion or deletion, which could be responsible for both the PCR failure and the phenotype. The phenotype in either of these accessions could also be due to a mutation in non-coding sequence not surveyed in this study.

Figure 2. Conservation of amino acid 268 in the GBSS gene across grasses and dicots. The Q > H mutation, boxed, is seen only in a sorghum waxy mutant, BTxArg1.
Linkage Disequilibrium, Haplotype Diversity, and Tag SNPs
The pattern of non-random association of SNP alleles (i.e., LD) within a locus is critically important to the design and success of association studies, where markers must be dense enough to capture variation at SNPs that are not directly scored. LD also affects the resolution of these studies, since the effects of SNP alleles that are always found together cannot be distinguished from one another. There are a number of different ways to measure association among SNPs; one is a pairwise measure of LD such as r². This statistic is expected to be higher for tightly linked SNPs, although allele frequency and drift both have a large effect (Pritchard and Przeworski, 2001). Some of the genes in this study are quite large (10 kb or more), a range over which LD may have largely dissipated. Supplementary Figure 1 shows "triangle plots" of r² among all pairs of non-singleton SNPs at each locus. Table 4 shows the number of significant associations among SNPs, together with a summary statistic of LD, ZnS (Kelly, 1997). Eleven of the 15 loci have ≥40% of SNP pairs significantly associated at the 0.05 level, and eight have a significant level of locus-wide LD as assessed by ZnS. Using the four-gamete test of Hudson and Kaplan (1985), we detected evidence of recombination at only five of the loci. These results are consistent with levels of LD observed at other loci in sorghum (Hamblin et al., 2005).
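For readers who want to reproduce these summaries on their own data, the sketch below computes three quantities:

- pairwise r²;
- Kelly's ZnS, the mean of r² over all S(S - 1)/2 pairs of S segregating sites;
- the pairs of sites that fail the four-gamete test.

It assumes biallelic SNPs coded 0/1 on the same homozygous haplotypes, with no missing data. The function names and toy data are hypothetical. The sketch does not include the significance tests behind Table 4, and it lists incompatible pairs rather than Hudson and Kaplan's minimum number of recombination events.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise LD r^2 between two biallelic SNPs coded 0/1 on the same haplotypes."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


def zns(snps):
    """Kelly's (1997) ZnS: mean r^2 over all pairs of segregating sites."""
    pairs = list(combinations(snps, 2))
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


def four_gamete_violations(snps):
    """Index pairs of SNPs at which all four gametes (00, 01, 10, 11) occur.

    Under the infinite-sites model, each such pair implies at least one
    recombination event between the two sites (Hudson and Kaplan, 1985).
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(snps), 2)
        if len(set(zip(a, b))) == 4
    ]


# Hypothetical locus: three SNPs scored on eight homozygous accessions.
snps = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly associated with the first SNP
    [0, 0, 1, 1, 0, 0, 1, 1],  # independent of both
]
print(round(zns(snps), 3))  # prints: 0.333
print(four_gamete_violations(snps))  # prints: [(0, 2), (1, 2)]
```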
In addition to their implications for experimental design, interactions of SNPs may be important to phenotypic expression, such that haplotypes are of direct biological relevance (Clark, 2004). Furthermore, when several SNPs are significantly associated, the number of tests can be reduced considerably by testing for association between the phenotype and a minimal set of SNPs ("tag SNPs") that can serve as proxies, through LD, for all SNPs. In general, the number of tag SNPs required increases with both the number of SNPs and the amount of recombination among them. For each locus we identified a set of tag SNPs that can serve as proxies with r² of 1.0 for all untagged SNPs (called "perfect proxies"). For 10 of the 15 loci in this study, the number of tag SNPs was less than half the total number of non-singleton SNPs (Table 4).
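With complete data, two biallelic SNPs have r² = 1 exactly when they split the haplotypes into the same two groups, possibly with the allele labels swapped. Perfect proxies therefore fall into disjoint bins, and one tag per bin suffices. The sketch below implements this binning. It is a simplification assumed for illustration, not Haploview's own tagging algorithm, and the function name and example data are hypothetical.

```python
def tag_snps(snps):
    """Pick tag SNPs that are perfect proxies (r^2 = 1) for all non-singleton SNPs.

    snps: 0/1 allele vectors over the same haplotypes, with no missing data.
    Returns a dict mapping each tag SNP index to the SNP indices it captures.
    Singletons and monomorphic sites are skipped, as in the text.
    """
    bins = {}
    for idx, alleles in enumerate(snps):
        n_alt = sum(alleles)
        if min(n_alt, len(alleles) - n_alt) < 2:
            continue
        # Polarize so the first haplotype carries allele 0; SNPs with r^2 = 1
        # then have identical allele vectors and share a bin.
        key = tuple(alleles) if alleles[0] == 0 else tuple(1 - x for x in alleles)
        bins.setdefault(key, []).append(idx)
    # Use the first SNP in each bin as its tag (an arbitrary choice).
    return {members[0]: members for members in bins.values()}


# Hypothetical locus: six SNPs on six homozygous accessions.
snps = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same split as SNP 0, alleles swapped
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1],  # singleton, not tagged
    [0, 0, 1, 1, 1, 1],  # same split as SNP 2
    [0, 1, 0, 1, 0, 1],
]
print(tag_snps(snps))  # prints: {0: [0, 1], 2: [2, 4], 5: [5]}
```

Here five non-singleton SNPs reduce to three tags. The more recombination has broken up the haplotypes, the more bins there will be.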
The LD results also support the inference of linkage between discontinuous segments of sequence. Of the loci with discontinuous data, DBEp, SSIIa, BEI, BEIIb, SSIII, and StPh all have significant associations between SNPs in the first and last segments of the data. SSIII has only one such association (SNP2 in intron 1 with SNP9 in the 3' UTR, r² = 1), but this observation would be highly unlikely (p < 0.0001) if the two segments were unlinked. In rice genomic sequence (AP005441), these two positions are about 10 kb apart. The remaining loci with discontinuous data have weak (SSI) or no (DBEi) evidence for linkage. However, genomic (DBEi) or mRNA (SSI) sequence spanning these segments strongly suggests that our PCR products were amplified from the same locus (see Supplementary Figure 1). Confirmation will be possible when the assembled genome sequence of sorghum is available.
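One way to compute such a probability is sketched below, under an assumption that is ours and not stated in the text: both SNPs have the same minor allele count k among n haplotypes. Without linkage, the carriers of one SNP's minor allele are then a random k-subset of the haplotypes. Perfect association requires that subset to coincide exactly with the other SNP's carriers. The allele counts are hypothetical, since they are not given here, and the authors' own calculation may differ.

```python
from math import comb


def prob_perfect_association(n, k):
    """Chance that two unlinked SNPs are perfectly associated (r^2 = 1).

    Assumes both SNPs have minor allele count k among n haplotypes (k < n / 2)
    and that, without linkage, the carriers of one SNP's minor allele are a
    random k-subset of the haplotypes. The probability is then 1 / C(n, k),
    which equals the one-sided Fisher's exact P value for the perfectly
    associated 2 x 2 table.
    """
    if not 0 < k < n / 2:
        raise ValueError("requires 0 < k < n / 2")
    return 1 / comb(n, k)


# Hypothetical minor allele counts in a panel of 23 haplotypes.
for k in (2, 3, 4, 5):
    print(k, prob_perfect_association(23, k))
```

With n = 23, the probability is 1/8855 (about 0.00011) for k = 4 and 1/33649 (about 0.00003) for k = 5. Under these assumptions, a single perfect association between moderately common SNPs is already strong evidence of linkage.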
Tests for a History of Selection
If a locus has experienced directional selection, for example as a consequence of domestication, the level of variation at linked neutral sites is expected to be reduced (Maynard Smith and Haigh, 1974). If several loci had experienced such selection, we might expect average variation in this data set to be unusually low, as was observed by Whitt et al. (2002) in a study of six genes in the starch pathway of maize. The average level of silent nucleotide diversity in these 15 loci, 0.0023, is about 14% lower than that observed in a large sample of randomly chosen loci (Hamblin et al., 2006), but this difference is not statistically significant.
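One common estimator of nucleotide diversity is π, the average number of pairwise differences per site. The sketch below assumes aligned sequences of equal length with no gaps or missing data. Restricting the input to silent positions (synonymous and non-coding sites) gives silent-site diversity. The function name and sequences are hypothetical. DnaSP, which this study used, handles gaps, missing data, and site classes in ways this sketch does not.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei's pi: mean number of pairwise differences per site among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return differences / (len(pairs) * length)


# Hypothetical alignment of four 10-bp haplotypes with two SNPs.
seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print(round(nucleotide_diversity(seqs), 4))  # prints: 0.1167
```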
To test whether variation at individual loci is lower (or higher) than expected under the null hypothesis of neutral evolution, we used the HKA test of Hudson et al. (1987). In this method, polymorphism and divergence data at two or more loci are used to calculate, for each locus, an expected number of segregating sites and of differences between species. The calculation assumes that each locus has its own neutral mutation rate but that all loci share the same neutral population history. Significant differences between observed and expected values indicate that this assumption is violated, possibly because of the action of selection. Power to detect a reduction in variation is greater when the test is conducted on sites with a higher neutral mutation rate. This is particularly true at loci under fairly high selective constraint, such as many of the loci in this study. Higher power can be achieved by testing silent sites or by using an outgroup with a reasonably high level of divergence. We therefore conducted two sets of tests, using (i) silent sites only, with a closely related outgroup, S. propinquum, and (ii) total sites, with a more distant outgroup, S. purpureosericeum. The overall data set was inconsistent with the null model when S. purpureosericeum was the outgroup, and one locus, DBEi, was found to have an excess of segregating sites. This pattern is typically interpreted as suggesting the action of diversifying selection. In this case, however, the excess variation is due to a single accession from China, PI22913, which carries 42 singletons, 30 of which are ancestral alleles shared with S. propinquum (see Supplementary Figure 1). This haplotype configuration is not typical of diversifying selection and could be due to introgression from a wild sorghum after domestication. After removal of PI22913 from the data set, variation at DBEi does not depart from neutrality. Thus we find no evidence of a history of selection at these loci.
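The HKA calculation can be sketched compactly under simplifying assumptions that are ours, not necessarily the authors':

- every locus has the same sample size n within the focal species;
- one outgroup sequence is compared at each locus;
- the same sites are used for polymorphism and divergence;
- the outgroup and focal species have equal effective size (f = 1 in Hudson et al.'s notation);
- no correction is made for multiple substitutions.

Under these assumptions the estimating equations have a closed-form solution. The degrees of freedom for X² are taken as the number of observations (2L) minus the number of estimated parameters (L + 1), that is, L - 1 for L loci. The function name and counts are hypothetical.

```python
def hka(loci, n):
    """Simplified HKA test (after Hudson et al., 1987), under the assumptions above.

    loci: list of (S, D) pairs per locus, where S is the number of segregating
        sites in the focal species and D the number of differences from a
        single outgroup sequence.
    n: number of focal-species sequences sampled at every locus.
    Returns (T, thetas, X2, df); T is divergence time in units of 2N generations.
    """
    c1 = sum(1 / k for k in range(1, n))       # E[S] = theta * c1
    c2 = sum(1 / k ** 2 for k in range(1, n))  # enters Var[S]
    s_total = sum(s for s, _ in loci)
    d_total = sum(d for _, d in loci)
    # Estimating equations: sum(S) = c1 * sum(theta), sum(D) = (T + 1) * sum(theta),
    # and S_i + D_i = theta_i * (c1 + T + 1) for each locus; these solve in closed form.
    t = c1 * d_total / s_total - 1
    thetas = [(s + d) / (c1 + t + 1) for s, d in loci]
    x2 = 0.0
    for (s, d), theta in zip(loci, thetas):
        exp_s = theta * c1
        var_s = exp_s + theta ** 2 * c2
        exp_d = theta * (t + 1)
        var_d = exp_d + theta ** 2  # (theta * (1 + f) / 2) ** 2 with f = 1
        x2 += (s - exp_s) ** 2 / var_s + (d - exp_d) ** 2 / var_d
    df = 2 * len(loci) - (len(loci) + 1)  # observations minus estimated parameters
    return t, thetas, x2, df


# Hypothetical counts for three loci in a sample of 23; the third has excess polymorphism.
t, thetas, x2, df = hka([(10, 30), (12, 35), (40, 40)], n=23)
print(round(t, 2), round(x2, 2), df)
```

If SciPy is available, `scipy.stats.chi2.sf(x2, df)` gives the approximate P value. Hudson et al. (1987) also describe calibrating the statistic by simulation, which is more reliable for small data sets.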
DISCUSSION
A survey of sorghum DNA sequence polymorphism in candidate genes for starch metabolism has identified potentially functional SNP variation at all loci. While this sample of genes is relatively small, it demonstrates a range of levels of polymorphism and patterns of LD that is likely to be representative of the sorghum genome, and provides a basis for designing association studies based on candidate genes. Patterns of polymorphism vary broadly among loci because of differences in neutral mutation rates as well as the stochastic variation associated with the evolutionary process at unlinked loci, but it was possible to capture the haplotypic diversity of these regions with a small number of tag SNPs at most loci. In practice, for some loci, additional SNPs should probably be genotyped in a large sample where LD might be less extensive (see "SNP Tagging Strategy" below). Nonetheless, these results suggest that association mapping of candidate genes in sorghum can be done with a reasonable number of markers, lowering costs and presenting a smaller multiple testing problem relative to a more diverse species such as maize.
The Frequency Spectrum of Variation
The frequency spectrum and haplotype structure of SNP variation have an important influence on strategies for mapping the genetic basis of complex traits, and should be well characterized before design of population-based studies. In sorghum, these features show unusual patterns, presumably due to a complex population history, including the domestication bottleneck: in a genome-wide survey of short, randomly chosen regions, SNP variation in common haplotypes was skewed toward intermediate frequencies, while rare, very divergent haplotypes contributed many singletons at some loci (Hamblin et al., 2006). These same patterns are observed in the sequences of genes coding for metabolic enzymes.
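The simplest summary of the frequency spectrum discussed here is a folded site frequency spectrum: the number of SNPs at each minor allele count, with singletons in the first class. Folding avoids the need for an outgroup to distinguish ancestral from derived alleles. The sketch below assumes 0/1-coded biallelic SNPs with no missing data. The function name and data are hypothetical, constructed to mimic the pattern described: common SNPs at intermediate frequency plus one divergent haplotype carrying several singletons.

```python
from collections import Counter


def folded_sfs(snps):
    """Folded site frequency spectrum: number of SNPs by minor allele count.

    snps: 0/1 allele vectors over the same n haplotypes (no missing data).
    Returns a Counter mapping minor allele count (1 .. n // 2) to SNP count.
    """
    spectrum = Counter()
    for alleles in snps:
        k = sum(alleles)
        minor = min(k, len(alleles) - k)
        if minor > 0:  # skip monomorphic columns
            spectrum[minor] += 1
    return spectrum


# Hypothetical locus scored on eight accessions; accession 8 is a divergent haplotype.
snps = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
print(sorted(folded_sfs(snps).items()))  # prints: [(1, 3), (2, 1), (4, 2)]
```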
In population-based genetic studies, low frequency variants (<5% minor allele frequency) are generally ignored, for two main reasons. First, unless sample sizes are extremely large, association studies have very poor power to detect the effects of low frequency variants (Long and Langley, 1999). Second, individual low frequency variants cannot play a major role in common phenotypic variation at the population level because they are rare. The critical question, however, is whether rare variants, collectively, make an important contribution to quantitative variation, as the debate over the "common disease, common variant" hypothesis in human genetics illustrates (Pritchard and Cox, 2002). Many of the rare variants observed in this study are found together on unique and highly divergent haplotypes, e.g., PI22913 at DBEi, PI267525 at StPh, and KS115 at SSI (see Supplementary Figure 1). Almost all SNPs on these haplotypes are singletons. We hope to determine in future work what the actual frequency of these alleles would be in a larger sample, and whether rare haplotypes make an important contribution to phenotypic variation in sorghum.
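Binomial sampling gives a rough sketch of how often a rare allele is seen at all in a discovery panel. This assumes that haplotypes are drawn independently from a large, unstructured population, which is a strong simplification for cultivated sorghum. The function name and frequencies are hypothetical.

```python
def detection_probs(p, n):
    """Probabilities that an allele at population frequency p is absent,
    a singleton, or seen at least twice among n sampled haplotypes.

    Assumes independent (binomial) sampling of haplotypes.
    """
    absent = (1 - p) ** n
    singleton = n * p * (1 - p) ** (n - 1)
    return absent, singleton, 1 - absent - singleton


# Hypothetical population frequencies for a panel of 23 homozygous accessions.
for p in (0.01, 0.02, 0.05, 0.10):
    absent, singleton, repeated = detection_probs(p, 23)
    print(f"p={p:.2f} absent={absent:.3f} singleton={singleton:.3f} "
          f"seen twice or more={repeated:.3f}")
```

Under these assumptions, an allele at 2% frequency is seen at least twice in only about 8% of 23-accession panels. A singleton in a small panel therefore says little about the allele's frequency in a larger sample.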
SNP Tagging Strategy
In human genetic research, multiple strategies for selection of tag SNPs have been proposed. Some are based simply on LD, while others are based on identification of "haplotype blocks", an approach that is more relevant when sequenced regions are much larger than those in this study (Wall and Pritchard, 2003). We have chosen a tag SNP selection strategy based only on LD: the set of tag SNPs must contain perfect proxies (i.e., r² = 1) for all untagged non-singleton SNPs. Although this criterion is conservative, our SNP discovery panel was somewhat small. SNP diversity is low in sorghum, so larger samples are likely to reveal very few additional non-singleton SNPs. However, they could reveal new haplotypic combinations such that tag SNPs are no longer perfectly associated with their proxies, especially if haplotypes extend over several kb. For example, there are three sets of SNPs in BEII that are completely associated across about 12 kb in our SNP discovery panel. Because of this perfect association, only one SNP from each set has been chosen as a tag SNP (see Supplementary Figure 1). Including an additional SNP from each set would increase the number of tag SNPs to six, out of a total of 18 non-singleton SNPs observed. Whether new, recombinant haplotypes will be observed in a larger sample, and how far r² will consequently be reduced, will depend on the diversity and history of that sample. Because of the non-equilibrium history of cultivated sorghum (Hamblin et al., 2006), standard theoretical predictions do not apply, so the answer to this question remains to be determined empirically.
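A toy calculation shows the risk described above, that a perfect tag-proxy association in a small panel may break down in a larger sample. Adding a single recombinant haplotype to a perfectly associated pair lowers r² well below 1. The SNP vectors are hypothetical, and r² is computed from 0/1 allele vectors with no missing data.

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs (0/1 vectors on the same haplotypes, no missing data)."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


# Hypothetical tag SNP and proxy, perfectly associated in a discovery panel of 10.
tag = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
proxy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
before = r_squared(tag, proxy)

# A larger sample adds one recombinant haplotype carrying the proxy's allele 1 but the tag's allele 0.
after = r_squared(tag + [0], proxy + [1])
print(round(before, 3), round(after, 3))  # prints: 1.0 0.686
```

Keeping a second SNP from each perfectly associated set, as discussed for BEII, hedges against exactly this kind of loss.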
Selection
Even though we found no evidence of directional selection on these starch genes, the results of the HKA tests are interesting when compared with those of Whitt et al. (2002), who performed similar tests on six of these loci in maize. They found strong evidence of selection at three loci: AGPss (maize Bt2), BEII (maize ae1), and DBEi (maize Su1). In our data, AGPss is consistent with this result in that it has less variation than expected, although divergence is also very low and power to detect selection is consequently poor. In contrast, BEII and DBEi both have more variation than expected. This discrepancy with the maize results cannot be attributed to a lack of power, and suggests that these genes have not experienced directional selection during sorghum domestication. This may not be surprising, given the complexity of this pathway and the many possible routes to increased grain size and starch content. Selection during domestication operates largely on standing variation that was either neutral or deleterious in the wild ancestor. Because this variation is generated by a random and slow mutational process, the variants available, and hence the targets of selection, were likely quite different in the wild ancestors of maize and sorghum.
Prospects for LD Mapping in Sorghum
Sorghum bicolor exhibits extensive phenotypic diversity for many traits, yet haplotype diversity at most loci is modest, a result of a low to moderate density of non-singleton SNPs combined with moderate to extensive LD. Thus, it should be possible to design efficient genotyping strategies for the genetic dissection of complex traits through association studies, using fewer markers than are needed in a high-diversity species such as maize. This economy will come at the cost of resolution: because the effects of strongly associated variants cannot be separated, it will often not be possible to identify the causal quantitative trait nucleotide (QTN). For some purposes, however, such as marker-assisted breeding, such resolution is unnecessary. Furthermore, genes identified through genome-wide scans in sorghum could subsequently be analyzed in maize, where higher resolution may allow the identification of functionally important sites and where molecular genetic tools are available for functional studies. Other factors influencing the success of association studies are the genetic architecture of the traits and the genetic structure of the study population. A recent study of population structure in a 377-member panel of diverse sorghum accessions found that population structure is low and that spurious associations are not detected when currently available methods are used (A.M. Casa et al., unpublished data, 2007). This panel will soon be available as a community resource for mapping a large number of complex traits.
ACKNOWLEDGMENTS
We thank Hong Sun for technical help.
Received for publication January 30, 2007.
REFERENCES
- Andolfatto, P. 2005. Adaptive evolution of non-coding DNA in Drosophila. Nature 437:1149-1152.
- Aranzana, M.J., S. Kim, K. Zhao, E. Bakker, M. Horton, K. Jakob, C. Lister, J. Molitor, C. Shindo, C. Tang, C. Toomajian, B. Traw, H. Zheng, J. Bergelson, C. Dean, P. Marjoram, and M. Nordborg. 2005. Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genet. 1:e60.
- Barrett, J.C., B. Fry, J. Maller, and M.J. Daly. 2005. Haploview: Analysis and visualization of LD and haplotype maps. Bioinformatics 21:263-265.
- Bean, S.R., O.K. Chung, M.R. Tuinstra, J.F. Pedersen, and J. Erpelding. 2006. Evaluation of the single kernel characterization system (SKCS) for measurement of sorghum grain attributes. Cereal Chem. 83:108-113.
- Blennow, A., T.H. Nielsen, L. Baunsgaard, R. Mikkelsen, and S.B. Engelsen. 2002. Starch phosphorylation: A new front line in starch research. Trends Plant Sci. 7:445-450.
- Breseghello, F., and M.E. Sorrells. 2006. Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172:1165-1177.
- Burton, R.A., P.E. Johnson, D.M. Beckles, G.B. Fincher, H.L. Jenner, M.J. Naldrett, and K. Denyer. 2002. Characterization of the genes encoding the cytosolic and plastidial forms of ADP-glucose pyrophosphorylase in wheat endosperm. Plant Physiol. 130:1464-1475.
- Chandrashekar, A., and H. Mazhar. 1999. The biochemical basis and implications of grain strength in sorghum and maize. J. Cereal Sci. 30:193-207.
- Chavan, J.K., and D.K. Salunkhe. 1984. Structure of sorghum grain. p. 21-31. In D.K. Salunke (ed.) Nutritional and processing quality of sorghum. Oxford & IBH, New Delhi.
- Clark, A.G. 2003. Finding genes underlying risk of complex disease by linkage disequilibrium mapping. Curr. Opin. Genet. Dev. 13:296-302.
- Clark, A.G. 2004. The role of haplotypes in candidate gene studies. Genet. Epidemiol. 27:321-333.
- Da, S., J.O.N. Akingbala, L.W. Rooney, J.F. Scheuring, and F. Miller. 1982. Evaluation of tô quality in a sorghum breeding programme. p. 11-23. In L.W. Rooney and D.S. Murty (ed.) International Symposium on Sorghum Grain Quality. ICRISAT, Patancheru, India.
- Dinges, J.R., C. Colleoni, M.G. James, and A.M. Myers. 2003. Mutational analysis of the pullulanase-type debranching enzyme of maize indicates multiple functions in starch metabolism. Plant Cell 15:666-680.
- Dombrink-Kurtzman, M.A., and C.A. Knutson. 1997. A study of maize endosperm hardness in relation to amylose content and susceptibility to damage. Cereal Chem. 74:776-780.
- Evans, R.C., L.A. Alice, C.S. Campbell, E.A. Kellogg, and T.A. Dickinson. 2000. The granule-bound starch synthase (GBSSI) gene in the Rosaceae: Multiple loci and phylogenetic utility. Mol. Phylogenet. Evol. 17:388-400.
- FAO. 1996. The world sorghum and millet economies: Facts, trends and outlook. Food and Agricultural Organization of the United Nations, Rome.
- Gao, M., J. Wanat, P.S. Stinard, M.G. James, and A.M. Myers. 1998. Characterization of dull1, a maize gene coding for a novel starch synthase. Plant Cell 10:399-412.
- Hallam, A., I.C. Anderson, and D.R. Buxton. 2001. Comparative economic analysis of perennial, annual, and intercrops for biomass production. Biomass Bioenergy 21:407-424.
- Hamblin, M.T., M.G. Salas Fernandez, A.M. Casa, S.E. Mitchell, A.H. Paterson, and S. Kresovich. 2005. Equilibrium processes cannot explain high levels of short- and medium-range linkage disequilibrium in the domesticated grass Sorghum bicolor. Genetics 171:1247-1256.
- Hamblin, M.T., A.M. Casa, H. Sun, S.C. Murray, A.H. Paterson, C.F. Aquadro, and S. Kresovich. 2006. Challenges of detecting directional selection after a bottleneck: Lessons from Sorghum bicolor. Genetics 173:953-964.
- Han, Y., M. Xu, X. Liu, C. Yan, S. Korban, X. Chen, and M. Gu. 2004. Genes coding for starch branching enzymes are major contributors to starch viscosity characteristics in waxy rice (Oryza sativa L.). Plant Sci. 166:357-364.
- Hannah, L.C., J.R. Shaw, M.J. Giroux, A. Reyss, J.L. Prioul, J.M. Bae, and J.Y. Lee. 2001. Maize genes encoding the small subunit of ADP-glucose pyrophosphorylase. Plant Physiol. 127:173-183.
- Hirose, T., and T. Terao. 2004. A comprehensive expression analysis of the starch synthase gene family in rice (Oryza sativa L.). Planta 220:9-16.
- Hudson, R.R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.
- Hudson, R.R., and N.L. Kaplan. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.
- Jambunathan, R., and V. Subramanian. 1987. Grain quality and utilization of sorghum and pearl millet. p. 133-139. In Grain quality and utilization of sorghum and pearl millet. International biotechnology workshop, 1988 1987. ICRISAT, Patancheru, India.
- James, M.G., K. Denyer, and A.M. Myers. 2003. Starch synthesis in the cereal endosperm. Curr. Opin. Plant Biol. 6:215-222.
- Jobling, S. 2004. Improving starch for food and industrial applications. Curr. Opin. Plant Biol. 7:210-218.
- Johnson, P.E., N.J. Patron, A.R. Bottrill, J.R. Dinges, B.F. Fahy, M.L. Parker, D.N. Waite, and K. Denyer. 2003. A low-starch barley mutant, Risø 16, lacking the cytosolic small subunit of ADP-glucose pyrophosphorylase, reveals the importance of the cytosolic isoform and the identity of the plastidial small subunit. Plant Physiol. 131:684-696.
- Kelly, J.K. 1997. A test of neutrality based on interlocus associations. Genetics 146:1197-1206.
- Klein, R.R., R. Rodriguez-Herrera, J.A. Schlueter, P.E. Klein, Z.H. Yu, and W.L. Rooney. 2001. Identification of genomic regions that affect grain-mould incidence and other traits of agronomic importance in sorghum. Theor. Appl. Genet. 102:307-319.
- Kumar, S., K. Tamura, I.B. Jakobsen, and M. Nei. 2001. MEGA2: Molecular Evolutionary Genetics Analysis software. Bioinformatics 17:1244-1245.
- Long, A.D., and C.H. Langley. 1999. The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9:720-731.
- Macdonald, F.D., and J. Preiss. 1985. Partial purification and characterization of granule-bound starch synthases from normal and waxy maize. Plant Physiol. 78:849-852.
- Maynard Smith, J., and J. Haigh. 1974. The hitchhiking effect of a favourable gene. Genet. Res. 23:23-35.
- Murty, D.S., H.D. Patil, and L.R. House. 1982. Sorghum roti: Genotypic and environmental variation for roti quality parameters. p. 79-91. In L.W. Rooney and D.S. Murty (ed.) International symposium on sorghum grain quality. ICRISAT, Patancheru, India.
- Nakamura, Y., P.B. Francisco, Jr., Y. Hosaka, A. Sato, T. Sawada, A. Kubo, and N. Fujita. 2005. Essential amino acids of starch synthase IIa differentiate amylopectin structure and starch quality between japonica and indica rice varieties. Plant Mol. Biol. 58:213-227.
- Ohdan, T., P.B. Francisco, Jr., T. Sawada, T. Hirose, T. Terao, H. Satoh, and Y. Nakamura. 2005. Expression profiling of genes involved in starch synthesis in sink and source organs of rice. J. Exp. Bot. 56:3229-3244.
- Price, H.J., S.L. Dillon, G. Hodnett, W.L. Rooney, L. Ross, and J.S. Johnston. 2005. Genome evolution in the genus Sorghum (Poaceae). Ann. Bot. (Lond.) 95:219-227.
- Pritchard, J.K., and N.J. Cox. 2002. The allelic architecture of human disease genes: Common disease-common variant...or not? Hum. Mol. Genet. 11:2417-2423.
- Pritchard, J.K., and M. Przeworski. 2001. Linkage disequilibrium in humans: Models and data. Am. J. Hum. Genet. 69:1-14.
- Rahman, A., K. Wong, J. Jane, A.M. Myers, and M.G. James. 1998. Characterization of SU1 isoamylase, a determinant of storage starch structure in maize. Plant Physiol. 117:425-435.
- Rooney, L.W., and F.R. Miller. 1982. Variation in the structure and kernel characteristics of sorghum. p. 143-162. In L.W. Rooney and D.S. Murty (ed.) International symposium on sorghum grain quality. ICRISAT, Patancheru, India.
- Rostoks, N., L. Ramsay, K. MacKenzie, L. Cardle, P.R. Bhat, M.L. Roose, J.T. Svensson, N. Stein, R.K. Varshney, D.F. Marshall, A. Graner, T.J. Close, and R. Waugh. 2006. Recent history of artificial outcrossing facilitates whole-genome association mapping in elite inbred crop varieties. Proc. Natl. Acad. Sci. USA 103:18656-18661.
- Rozas, J., J.C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496-2497.
- Sato, Y., and T. Nishio. 2003. Mutation detection in rice waxy mutants by PCR-RF-SSCP. Theor. Appl. Genet. 107:560-567.
- Thorbjornsen, T., P. Villand, L.A. Kleczkowski, and O.A. Olsen. 1996. A single gene encodes two different transcripts for the ADP-glucose pyrophosphorylase small subunit from barley (Hordeum vulgare). Biochem. J. 313:149-154.
- Wall, J.D., and J.K. Pritchard. 2003. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4:587-597.
- Whitt, S.R., L.M. Wilson, M.I. Tenaillon, B.S. Gaut, and E.S. Buckler IV. 2002. Genetic diversity and selection in the maize starch pathway. Proc. Natl. Acad. Sci. USA 99:12959-12962.
- Wilson, L.M., S.R. Whitt, A.M. Ibanez, T.R. Rocheford, M.M. Goodman, and E.S. Buckler IV. 2004. Dissection of maize kernel composition and starch production by candidate gene association. Plant Cell 16:2719-2733.
- Yamanaka, S., I. Nakamura, K.N. Watanabe, and Y. Sato. 2004. Identification of SNPs in the waxy gene among glutinous rice cultivars and their evolutionary significance during the domestication process of rice. Theor. Appl. Genet. 108:1200-1204.
- Yu, J., and E.S. Buckler. 2006. Genetic association mapping and genome organization of maize. Curr. Opin. Biotechnol. 17:155-160.