Published online 2 December 2005
Published in Crop Sci 46:12-21 (2006)
© 2005 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
GENOMICS, MOLECULAR GENETICS & BIOTECHNOLOGY
Single Nucleotide Polymorphisms and Insertion-Deletions for Genetic Markers and Anchoring the Maize Fingerprint Contig Physical Map
I. Vroh Bi (a,*),
M. D. McMullen (b),
H. Sanchez-Villeda (c),
S. Schroeder (c),
J. Gardiner (d),
M. Polacco (b),
C. Soderlund (e),
R. Wing (f),
Z. Fang (c) and
E. H. Coe, Jr. (b)
a Inst. for Genomic Diversity, Cornell Univ., Ithaca, NY 14853
b Agronomy Dep., Plant Sciences Unit, Univ. of Missouri, Columbia, MO, and USDA-ARS Plant Genetics Research Unit, Columbia, MO 65211
c Agronomy Dep., Plant Sciences Unit, Univ. of Missouri, Columbia, MO 65211
d BIO5 Inst., Univ. of Arizona, Tucson, AZ 85721
e Arizona Genomics Computational Lab., Univ. of Arizona, Tucson, AZ 85721
f Arizona Genomics Inst., Univ. of Arizona, Tucson, AZ 85721
* Corresponding author (biv2{at}cornell.edu)
ABSTRACT
Single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) are becoming important genetic markers for major crop species. In this study we demonstrate their utility for locating fingerprint contigs (FPCs) on the genetic map. To derive SNP and InDel markers, we amplified genomic regions corresponding to 3000 unigenes across 12 maize (Zea mays L.) lines, of which 194 unigenes (6.4%) showed InDel size polymorphisms between B73 and Mo17 on agarose gels. The analysis of these InDels in 83 diverse inbred lines showed that InDels are often multiallelic markers in maize. Single nucleotide polymorphism discovery conducted on 592 unigenes revealed that 44% of the unigenes contained B73/Mo17 SNPs, while 8% showed no sequence variation among the 12 inbred lines. On average, SNPs and InDels occurred every 73 and 309 bp, respectively. Multiple SNPs within unigenes led to an average SNP haplotype genetic diversity of 0.61 among inbreds. The unigenes had previously been assigned to maize FPCs by overgo hybridization. From this set of unigenes, 311 loci (133 SNP and 178 InDel) were mapped on the intermated B73 × Mo17 (IBM) high-resolution mapping population. These markers provided unambiguous anchoring of 129 FPCs and orientation for 30 contigs. The FPC-anchored map of maize will be useful for map-based cloning, for genome sequencing efforts in maize, and for comparative genomics in grasses. The amplification primers for all mapped InDel and SNP loci, the diversity information for SNPs and InDels, and the corresponding overgoes used to anchor bacterial artificial chromosome (BAC) contigs are provided as genetic resources.
Abbreviations: BAC, bacterial artificial chromosome; cM, centimorgan; FPC, fingerprint contig; InDel, insertion-deletion polymorphism; MMP, Maize Mapping Project; PCR, polymerase chain reaction; PIC, polymorphic information content; RFLP, restriction fragment length polymorphism; SNP, single nucleotide polymorphism; SSR, simple sequence repeat.
INTRODUCTION
MAIZE, one of the most important crops in the world, is a diploid species with an estimated haploid genome size of 2500 Mb and a high level of sequence complexity due to the abundance of multiple families of repetitive elements (SanMiguel et al., 1996; Myers et al., 2001). Knowledge of the maize genome will allow the improvement of desired traits by refining gene isolation and molecular breeding strategies. Molecular markers provide a means to estimate diversity, to assess genetic relationships, and to map genes underlying important agronomic traits in crop species. The utility of restriction fragment length polymorphism (RFLP) and simple sequence repeat (SSR) markers for determining relationships and genetic similarity in maize has been reported (Taramino and Tingey, 1996; Smith et al., 1997; Senior et al., 1998; Bernardo et al., 2000; Romero-Severson et al., 2001).
Sequence variants such as SNPs and InDels are the markers of choice for genotyping and mapping because of their abundance and amenability to high-throughput screening. Furthermore, SNPs and InDels can contribute directly to a phenotype (Thornsberry et al., 2001), or they can be associated with a phenotype as a result of linkage disequilibrium (Daly et al., 2001). Despite the extensive use of SNPs to study human genetic disorders, there have been few extensive surveys of SNPs in plants (Tenaillon et al., 2001). Previous studies in maize indicated, however, that SNPs and InDels occur at a relatively high frequency in maize genes (Rafalski, 2002; Bhattramakki et al., 2002; Batley et al., 2003a) and in the flanking regions of maize microsatellites (Matsuoka et al., 2002; Mogg et al., 2002; Batley et al., 2003b). Single nucleotide polymorphism discovery projects routinely identify both InDel polymorphisms, which can be used as diagnostic or mapping tools by analyzing size polymorphisms of polymerase chain reaction (PCR) products on agarose gels, and SNPs, which can be mapped by a number of assay systems (Kwok, 2000).
Integrated genetic and physical genome maps are essential for map-based cloning and comparative genomics, and they serve as templates for genome sequencing efforts. A primary goal of the Maize Mapping Project (MMP) is to develop an integrated genetic and physical map of maize to enable the cloning of maize genes controlling phenotypes and crop productivity. Positioning genes to chromosomal locations is important for establishing correlations with phenotype, thus facilitating our understanding of the biochemical pathways and biological mechanisms controlling agronomically important traits (Davis et al., 1999; Matthews et al., 2001). The MMP has assembled a high-resolution genetic map for the intermated B73 × Mo17 (IBM) population (Lee et al., 2002) consisting of approximately 1000 RFLP and approximately 1000 SSR markers (MaizeGDB, www.maizegdb.org, verified 4 Aug. 2005). Many of the RFLPs and SSRs were generated from EST sequences (Davis et al., 1999; Sharopova et al., 2002) and therefore directly mark the positions of candidate genes for phenotypes and traits. The parents of the IBM population, B73 and Mo17, represent the two major heterotic groups of U.S. maize germplasm, namely Stiff Stalk (B73) and non-Stiff Stalk (Mo17). The IBM population (Lee et al., 2002) includes 302 recombinant inbred lines that underwent four generations of random mating at the F2 stage. The combination of a large number of lines and the map expansion due to the random mating generations results in a genetic map resource with approximately 17 times the resolving power of the prior maize map standard (Coe et al., 2002).
To develop the resources for physical mapping, three BAC libraries representing approximately 27-fold genome coverage were constructed from the inbred line B73. HindIII fingerprinting of all three libraries and contig assembly using FPC software (Soderlund et al., 2000) are being performed at the University of Arizona (Cone et al., 2002; www.genome.arizona.edu/fpc/maize, verified 4 Aug. 2005). The IBM markers have been hybridized to the fingerprinted BACs, and the BAC-marker associations have been entered into FPC. This anchors and orders contigs based on the genetic information, thereby providing an integrated genetic and physical map.
Recently, >10 600 maize unigenes were used as the sequence source for overgo probes that were hybridized to high-density BAC filters (Gardiner et al., 2004). These probes identified BACs corresponding to >9300 unigenes. The mapping strategy adopted by the MMP, which includes in silico correspondence, BAC pooling to anchor mapped markers, and the use of SNPs/InDels as a complementary tool to anchor unmapped unigenes, has been outlined previously (Cone et al., 2002). Although a number of the FPCs containing overgo hybridizations were located on the genetic map by sequence similarity of the unigene sequences to RFLP or SSR loci on the IBM map, or by the BAC pooling strategy (Yim et al., 2002), the majority of the unigenes (www.agron.missouri.edu/files_dl/MMP/Cornsensus/) currently lack a genetic map position. To speed integration of the maize genetic and physical maps, we have undertaken a mapping approach that places unigenes containing B73/Mo17 polymorphisms on the IBM genetic map through SNPs or InDels, thereby anchoring the corresponding contigs.
The goals of the present research are (i) to analyze the utility of InDels identified within the MMP and address how these InDels can serve as general genetic markers for the maize research community; (ii) to conduct physical mapping of unigenes using SNPs and InDels to anchor BAC contigs to the genetic map; and (iii) to disseminate the maize SNP and InDel resources to the scientific community.
MATERIALS AND METHODS
Plant Materials
The genomic DNA from 12 inbred lines was used for SNP and InDel discovery. Among the 12 lines, B73 and Mo17 are the parents of the IBM population (Lee et al., 2002). Four of the lines, Tx303, CO159, T218, and GT119, are the parents of two additional SSR mapping populations (Sharopova et al., 2002). The lines NC7A, Mp708, and Tx501 were chosen to further broaden the germplasm examined. The line W22R-scm2 was included because of its extensive use by many maize geneticists. Illinois High Oil (IHO) and Illinois Low Oil (ILO) were added as part of a collaborative project on agronomic trait analysis. The SNPs and InDels that were polymorphic between B73 and Mo17 were genotyped in 286 individuals of the IBM population (Lee et al., 2002; Sharopova et al., 2002). Eighty-three maize inbred lines that represent much of the diversity in maize (Remington et al., 2001) were used to assess InDel diversity (Supplemental Table 1). The origin and the composition of the 83 lines are described at http://www.maizegenetics.net/germplasm/lines.htm (verified 4 Aug. 2005).
Supplemental Table 1. Origin of the 83 diverse inbreds of maize assayed for 173 insertion-deletion polymorphism markers.
Amplification of Genomic DNA and Sequence Analysis
The unigene set (www.agron.missouri.edu/files_dl/MMP/Cornsensus/, verified 4 Aug. 2005) was generated at DuPont using publicly available maize ESTs to seed their proprietary EST collection (Cone et al., 2002; Gardiner et al., 2004). PCR primers amplifying products of about 300 to 500 bp were designed in the 3'-untranslated regions (3'-UTRs) of the unigenes using Primer3 (Whitehead Institute, Cambridge, MA). All PCR reactions were performed on a PTC-225 thermocycler from MJ Research (Watertown, MA). All unigenes were amplified under the same conditions with the following touchdown PCR program: 10 cycles of 1 min at 94°C, 1 min at 65°C with the annealing temperature lowered by 1°C per cycle, and 90 s at 72°C, followed by 35 cycles of 1 min at 94°C, 1 min at 55°C, and 90 s at 72°C. Amplification of the unigenes was confirmed on 2% SFR agarose gels (Amresco, Solon, OH) stained with ethidium bromide.
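The touchdown program can be written out cycle by cycle. This is a minimal sketch that assumes the 10 touchdown cycles lower the annealing temperature by 1°C per cycle starting at 65°C, so the tenth cycle anneals at 56°C before the 35 cycles at 55°C; the function name and parameters are illustrative only and do not correspond to any thermocycler software.

```python
# Touchdown PCR profile sketch: one (denature_C, anneal_C, extend_C) tuple per cycle.
# Step times are fixed in the program described above: 60 s, 60 s, 90 s.
def touchdown_schedule(start_anneal=65.0, step=1.0, touchdown_cycles=10,
                       final_anneal=55.0, final_cycles=35):
    """Return one (denature, anneal, extend) temperature tuple per cycle.

    Assumption: during the touchdown phase the annealing temperature falls
    by `step` degrees each cycle (65, 64, ..., 56 for 10 cycles).
    """
    cycles = []
    for i in range(touchdown_cycles):
        cycles.append((94.0, start_anneal - i * step, 72.0))
    for _ in range(final_cycles):
        cycles.append((94.0, final_anneal, 72.0))
    return cycles

schedule = touchdown_schedule()
print(len(schedule))                      # total number of cycles
print([c[1] for c in schedule[:11]])      # annealing temperatures of the first 11 cycles
```

With the default arguments the schedule has 45 cycles, annealing at 65°C in the first cycle, 56°C in the tenth, and 55°C thereafter.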
Large InDels between B73 and Mo17 that were visible on 2% SFR agarose gels (Amresco, Solon, OH) were further analyzed to assess the polymorphic information content (PIC) of each InDel in the 83 inbred lines listed in Supplemental Table 1. PIC values were calculated as $\mathrm{PIC} = 1 - \sum_{i=1}^{n} f_i^2$, where $f_i$ is the frequency of the $i$th allele and $n$ is the number of alleles (Smith et al., 1997).
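The PIC formula, PIC = 1 − Σ f_i², can be computed directly from allele counts. A minimal sketch, assuming each inbred contributes exactly one allele call (inbreds treated as homozygous) and that missing data have already been excluded; the counts in the example are made up for illustration.

```python
def pic(allele_counts):
    """Polymorphic information content, PIC = 1 - sum(f_i^2),
    where f_i is the frequency of the i-th allele among scored lines."""
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("no alleles scored")
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)

# Hypothetical three-allele InDel scored in 83 inbreds (counts are invented)
print(round(pic([40, 30, 13]), 3))  # 0.613
```

A two-allele marker at equal frequencies gives 0.5, and a monomorphic marker gives 0, which is why multiallelic InDels can exceed the PIC ceiling of a single biallelic marker.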
Loci showing InDel polymorphisms between B73 and Mo17 that were detectable on agarose gels during primary screening were PCR-amplified in 286 individuals of the IBM population, and genotypes were scored visually, directly from the agarose gels. Primers that did not reveal InDels between B73 and Mo17 and gave consistent, single-product amplification in all 12 lines were advanced to the sequencing phase for SNP and short InDel discovery. The nucleotide diversity parameter π (Tajima, 1983) and haplotype diversity were analyzed using the DnaSP program (Rozas et al., 2003).
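Nucleotide diversity π is the average number of pairwise nucleotide differences per site among the aligned sequences. The sketch below computes it for a small alignment; it is not a reimplementation of DnaSP, and dropping every column that contains a gap or an N in any sequence is an assumption made here for simplicity.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Tajima's pi: mean number of pairwise differences per site.

    `seqs` are aligned sequences of equal length. Columns containing a gap
    ('-') or an ambiguous base ('N') in any sequence are dropped first
    (an assumption of this sketch)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    cols = [c for c in zip(*seqs) if "-" not in c and "N" not in c]
    if not cols:
        return 0.0
    # Rebuild each sequence from the retained columns only
    kept = ["".join(col[i] for col in cols) for i in range(len(seqs))]
    pairs = list(combinations(kept, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(cols)

print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

In the example the three pairs differ at 1, 2, and 1 sites, so the mean is 4/3 differences over 4 sites, or about 0.333 per site.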
Identification of Sequence Variants and Data Management
A single PCR fragment per unigene was sequenced on both strands. To minimize the effect of sequencing errors and PCR artifacts, only SNPs that occurred in at least two of the 12 inbred lines were considered for mapping. Sequence variants were highlighted as described in Fig. 1, and SNP interrogation primers were typically designed with a 50 to 61°C annealing temperature using the Oligo Analyzer (v. 2.0, IDT, San Jose, CA). Oligonucleotide primers were synthesized by MWG Biotech, Inc. (High Point, NC).
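The filtering rule above can be expressed as a scan over alignment columns. This is a minimal sketch, not the project's Perl/Visual Basic code: it assumes "occurred in at least two lines" means the rarer allele is observed in at least two lines, it keeps only biallelic columns, and it ignores gaps and Ns. The line names come from the text, but the sequences are invented.

```python
from collections import Counter

def call_snps(alignment, min_count=2):
    """Return (1-based position, allele counts) for candidate SNP columns.

    `alignment` maps line name -> aligned sequence. A column is kept when it
    has exactly two nucleotides and the rarer one is seen in at least
    `min_count` lines; gaps and 'N' are ignored.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs if s[pos] in "ACGT")
        if len(counts) == 2 and min(counts.values()) >= min_count:
            snps.append((pos + 1, dict(counts)))
    return snps

aln = {"B73": "ACGTAC", "Mo17": "ACTTAC", "Tx303": "ACTTGC", "CO159": "ACGTAC"}
print(call_snps(aln))  # position 5 is a singleton (G in one line) and is filtered out
```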

Fig. 1. Pipeline for polymerase chain reaction amplification of unigenes, sequencing, and data processing for genotyping and mapping single nucleotide polymorphism and insertion-deletion loci.
Bioinformatics routines developed to process data in the SNP project required the integration of existing programs with custom Perl and Visual Basic scripts for sequence alignment and quality evaluation, automated detection of sequence variants and their positions in the sequence, and generation of mapping files (Fig. 1). The SNP discovery and mapping pipeline was integrated into the Laboratory Information Management System (LIMS) of the MMP (Sanchez-Villeda et al., 2003).
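One step of mapping-file generation is translating allele calls for each IBM individual into parental genotype codes. The sketch below assumes a hypothetical convention of A for the B73 allele, B for the Mo17 allele, and "-" for missing or unexpected calls, the kind of letter codes used in Mapmaker raw-data files; the actual file layout produced by the MMP LIMS is not described here.

```python
def code_genotypes(calls, b73_allele, mo17_allele):
    """Translate per-individual allele calls into A/B/- codes for mapping.

    Hypothetical convention: 'A' = B73 allele, 'B' = Mo17 allele,
    '-' = missing (None) or any call matching neither parent.
    """
    codes = []
    for call in calls:
        if call == b73_allele:
            codes.append("A")
        elif call == mo17_allele:
            codes.append("B")
        else:
            codes.append("-")
    return "".join(codes)

print(code_genotypes(["G", "T", "T", None, "G"], b73_allele="G", mo17_allele="T"))
# ABB-A
```

Coding calls relative to the parents, rather than storing raw nucleotides, lets SNP and InDel loci share one genotype format for linkage analysis.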
Multiplex Single Nucleotide Polymorphism Assays
The approach for determining SNP genotypes has seven steps: (i) separate PCR amplification of each unigene, (ii) pooling of up to six different unigene PCR products for simultaneous detection of sequence variants, (iii) exonuclease and phosphatase digestion, (iv) multiplex SNP primer extension reaction, (v) second phosphatase digestion, (vi) resolution of labeled interrogation primers on an ABI PRISM 3700 Sequencer, and (vii) data analysis with GeneScan and Genotyper software (Applied Biosystems, Foster City, CA). Sequence variants detected by sequence alignment were first validated in the parental lines B73 and Mo17 before genotypes were determined in the IBM population, following the manufacturer's protocol for the SNaPshot Kit (Applied Biosystems, Foster City, CA). Each multiplex SNP extension reaction used a 0.5-µL aliquot of the enzyme-treated mix per individual SNP, and the extension products were resolved on an ABI PRISM 3700 Sequencer (Applied Biosystems, Foster City, CA).
Linkage Mapping
Genotypes for loci exhibiting SNPs and/or InDels between B73 and Mo17 were determined for 286 individuals of the IBM population. Linkage mapping was conducted using Mapmaker/EXP version 3.0b (Lander et al., 1987) on a UNIX platform. The SNP and the InDel loci were integrated into a framework IBM map consisting of 247 RFLP and SSR loci. Markers were assigned to chromosomes with a minimum LOD score of 15 using the assign command. The SNP and InDel loci were ordered on the chromosomes with the build (LOD 3) and place (LOD 2) commands. Map distances were calculated using the Haldane mapping function.
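The Haldane mapping function converts a recombination fraction r into a map distance d = −50 ln(1 − 2r) cM, assuming no crossover interference. The sketch shows only this conversion and its inverse; it does not model how recombination fractions observed among intermated recombinant inbred lines relate to per-meiosis values, which Mapmaker handles internally, and the function names are illustrative.

```python
import math

def haldane_cm(r):
    """Haldane map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r(d_cm):
    """Inverse: recombination fraction expected for a distance of d_cm cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

print(round(haldane_cm(0.10), 2))  # 11.16
```

For small r the distance is close to 100r cM, but it grows without bound as r approaches 0.5, reflecting undetected double crossovers.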
RESULTS
Optimization of Single Nucleotide Polymorphism Reactions
At the beginning of the genotyping process, it was necessary to optimize reaction conditions and determine the number of SNPs per multiplex reaction. The DNA of the 286 individuals of the mapping population was arrayed into three 96-well plates. After PCR amplification, the plates of the different unigenes in a given multiplex group were bulked into final plates for enzyme treatment before use in SNP reactions. In a six-plex reaction, the 18 PCR plates (six unigenes × three plates) are bulked into three, so enzyme treatment is avoided for 15 plates per multiplex reaction. To determine the number of SNPs per multiplex reaction, we tested three- to eight-plex reactions. Under our conditions, the maximum number of amplicons per multiplex reaction that still gave good resolution of SNP peaks on an ABI 3700 capillary sequencer was six. In this case, the maximum volume of multiplex reaction analyzed on the ABI 3700 was 3 µL, to keep background noise to a minimum. To assess the accuracy of the genotyping and mapping process, genotype determination was repeated for three unigenes, AY110160 on chromosome 1, AY110063 on chromosome 5, and AY109644 on chromosome 7, each assayed at two different SNPs in two different multiplex reactions. For each unigene, both SNPs mapped to the same genetic location.
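The plate savings from bulking follow from simple arithmetic: 286 individuals need three 96-well plates per unigene, and bulking a k-plex group means one set of plates is enzyme-treated instead of k sets. A minimal sketch, assuming no extra control plates; the function name is illustrative.

```python
import math

def plates_saved(plex, individuals=286, wells_per_plate=96):
    """Enzyme-treatment plates avoided by bulking `plex` unigene PCR plate sets.

    Each unigene is amplified across ceil(individuals / wells_per_plate)
    plates; after bulking, one set of plates is treated instead of `plex` sets.
    """
    plates_per_unigene = math.ceil(individuals / wells_per_plate)
    before = plex * plates_per_unigene   # plates before bulking
    after = plates_per_unigene           # plates actually enzyme-treated
    return before - after

for plex in range(3, 9):                 # the three- to eight-plex range tested
    print(plex, plates_saved(plex))
```

For the six-plex reactions used here this gives 18 − 3 = 15 plates, the figure reported above.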
Maize SNPs and InDels as Genetic Markers
Genomic regions corresponding to 3000 unigenes were screened by PCR amplification across 12 maize lines. Of these unigenes, 194 (6.4%) showed size polymorphisms between B73 and Mo17 on agarose gels due to InDels, herein called large InDels, as opposed to short InDels that were present in the sequences but did not produce visually obvious size polymorphisms on 2% SFR agarose gels (Amresco, Solon, OH). Counting both large and short InDels gave an average of one InDel every 309 bp. Because the first objective of this research was to anchor the maize FPCs, unigenes showing large InDels between B73 and Mo17 were not sequenced. Instead, they were genetically mapped by agarose gel screening of 286 individuals of the IBM population.
We conducted SNP discovery on the first set of 592 unigenes sequenced in the project. The analysis showed that 260 unigenes (44%) contained B73/Mo17 SNPs, while 8% of the unigenes showed no polymorphism among the 12 inbred lines. On average, 5.5 sequence variants occurred per unigene, with SNPs and InDels occurring every 73 and 309 bp, respectively. A previous study in maize found a frequency of one SNP every 70 bp and one InDel every 160 bp (Rafalski et al., 2001). To assess the utility of SNPs as genetic markers, a more detailed analysis was conducted on 470 unigenes with aligned sequences longer than 200 nucleotides across the 12 lines. The genetic diversity (π, the average pairwise difference) in the 470 unigene alignments varied from 0 to 0.016, with an average of 0.009. Individual SNPs are almost always biallelic, so the theoretical maximum genetic diversity of a single SNP marker is 0.5. However, when combinations of SNPs within each of the 470 unigenes were used to derive allele haplotypes, genetic diversity ranged from 0 to 1, with an average of 0.61. This range of SNP haplotype diversity compares favorably with the genetic diversity of RFLP and SSR markers and demonstrates that SNPs can be highly informative across maize germplasm when analyzed and used at the haplotype level. Unigenes with three haplotypes were the most frequent, and every class from 1 to 12 haplotypes was represented (Fig. 2).
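Why haplotypes outperform single SNPs can be seen by computing gene diversity, 1 − Σp², for each SNP and for the haplotypes formed by joining SNP alleles within a unigene. A minimal sketch with invented alleles for 12 lines at three SNPs; it is not the DnaSP calculation. The optional n/(n − 1) small-sample correction (Nei, 1987) lets the value reach 1 when every line carries a distinct haplotype, whereas the uncorrected form caps a biallelic SNP at 0.5.

```python
from collections import Counter

def gene_diversity(alleles, unbiased=False):
    """Gene (haplotype) diversity 1 - sum(p_i^2) over allele labels.

    With unbiased=True the n/(n-1) small-sample correction is applied,
    which lets the value reach 1 when every line is unique.
    """
    n = len(alleles)
    h = 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())
    return h * n / (n - 1) if unbiased else h

# Hypothetical alleles of 12 inbreds at three SNPs within one unigene
rows = ["AGT", "AGT", "AGT", "AGC", "AGC", "ACC",
        "ACC", "TCC", "TCC", "TCC", "TGT", "TGT"]

per_snp = [gene_diversity([r[i] for r in rows]) for i in range(3)]
haplo = gene_diversity(rows)  # each row string is one haplotype
print([round(h, 3) for h in per_snp], round(haplo, 3))
# [0.486, 0.486, 0.486] 0.792
```

Each SNP alone stays below 0.5, but the five haplotypes they define reach about 0.79.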

Fig. 2. Distribution of single nucleotide polymorphism haplotype class among 470 maize unigenes analyzed for sequence diversity.
To assess the discriminatory power of InDels as molecular markers, 173 of the 194 large InDels identified between B73 and Mo17 were PCR-amplified in 83 diverse inbred lines (Supplemental Table 1). Figure 3 shows an example of PCR amplification of unigene AY104252 in a subset of these inbred lines. The 173 markers detected a total of 539 alleles among the 83 inbred lines, an average of about 3.1 alleles per marker. The number of alleles per unigene varied from 2 to 6, with the three-allele class (39.8%) being the largest (Table 1). These results show that InDels are often multiallelic markers in maize. The PIC values of the InDel markers for the 173 unigenes ranged from 0.04 to 0.76, with an average of 0.47 (Fig. 4). For comparison with SSRs in the same germplasm, one SSR was chosen per chromosome and tested on the 83 inbred lines. The number of SSR alleles varied from 2 to 5, and PIC values ranged from 0.22 to 0.73, with a mean of 0.54, only slightly higher than that of the large InDel markers. The allele number, PIC value, overgo identification, and PCR primer sequences for each InDel marker are given in Supplemental Table 2.

Fig. 3. Profile of a B73/Mo17 insertion-deletion marker on a 2% SFR (Amresco, Solon, OH) agarose gel for unigene AY104252, showing three alleles in diverse inbreds of maize.

Fig. 4. Distribution of polymorphic information content values for 173 insertion-deletion markers tested in 83 diverse inbreds of maize.
Supplemental Table 2. Allele number, polymorphic information content (PIC) values, and sequences of polymerase chain reaction primers for 173 B73/Mo17 insertion-deletion polymorphism markers derived from 173 maize unigenes and displaying size polymorphism on 2% SFR agarose gels among 83 diverse inbreds of maize. Accession numbers are sorted in ascending order. Overgo names can be used to view contigs and hybridized bacterial artificial chromosomes in WebFPC (http://www.genome.arizona.edu/fpc/WebAGCoL/maize/WebFPC/, verified 2 Aug. 2005).
Genotyping of SNPs and InDels in the IBM Population
Using one SNP interrogation primer per unigene, we tested 260 unigenes on B73 and Mo17, the parental lines of the IBM mapping population, to validate the SNPs. The B73 and Mo17 genotypes for 246 (95%) SNPs were concordant with the SNPs predicted from the sequencing data. Among the validated SNPs, 133 were used to determine genotypes in 286 individuals of the IBM population by multiplex primer extension assays. The genotypes of 178 unigenes displaying large InDels between B73 and Mo17 were also scored in the IBM population by direct size polymorphism on 2% SFR agarose gels (Amresco, Solon, OH). In total, 311 unigene loci (178 InDel and 133 SNP markers) were thus available for placement on the IBM genetic map via automated data processing and mapping in the LIMS (Sanchez-Villeda et al., 2003).
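The parental validation step amounts to comparing the alleles predicted from sequence with the alleles called by the assay in B73 and Mo17. A minimal sketch with hypothetical SNP names and alleles; a SNP counts as validated only when both parental calls match the prediction.

```python
def validate_snps(predicted, observed):
    """Compare predicted parental alleles with assay calls.

    Both dicts map a SNP name to a (B73_allele, Mo17_allele) tuple.
    Returns the sorted names that agree and the concordance fraction.
    """
    if not predicted:
        raise ValueError("no predicted SNPs")
    agree = sorted(name for name, alleles in predicted.items()
                   if observed.get(name) == alleles)
    return agree, len(agree) / len(predicted)

# Hypothetical names and alleles, for illustration only
predicted = {"snp_1": ("G", "T"), "snp_2": ("A", "C"), "snp_3": ("C", "T")}
observed = {"snp_1": ("G", "T"), "snp_2": ("A", "A"), "snp_3": ("C", "T")}
agree, fraction = validate_snps(predicted, observed)
print(agree, round(fraction, 3))
```

Applied to the counts reported above, 246 concordant SNPs out of 260 tested corresponds to a concordance of about 0.95.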
Anchoring FPCs on the IBM Genetic Map
The 311 maize unigenes genotyped in this study were mapped relative to a framework of 247 genetic markers evenly distributed across the 10 maize chromosomes, thus totaling 558 markers (Fig. 5). The genetic map spans 5630 centimorgans (cM), with the number of mapped unigenes ranging from 21 on chromosome 9 to 54 on chromosome 1.