a CIMMYT, Int, Applied Biotechnology Center, Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
b Facultad de Agronomia, Universidad de la Republica, Ave. Garzon 780, CP 12900, Montevideo, Uruguay
c Institute of Plant Breeding, Seed Science and Population Genetics, Univ. of Hohenheim, 70593 Stuttgart, Germany
* Corresponding author (mwarburton@cgiar.org)
Abbreviations: CIMMYT, International Maize and Wheat Improvement Center; CML, CIMMYT maize inbred line; SSR, simple sequence repeat; RFLP, restriction fragment length polymorphism; PIC, polymorphism information content.
INTRODUCTION
Maize yields have been dramatically boosted by the use of hybrids in many parts of the world. Tropical maize germplasm has not been as fully classified into heterotic groups as has U.S. Corn Belt germplasm, and pedigree information is not available for some of the lines and populations used in tropical maize breeding. Past studies have used restriction fragment length polymorphism (RFLP) markers to place temperate lines into known heterotic groups with considerable success (Dubreuil et al., 1996; Messmer et al., 1992, 1993). RFLPs and microsatellites (SSRs) have also been used to assign tropical Asian maize germplasm to heterotic groups, but these groups are still being verified by traditional field crosses (Yuan et al., 2000).
Genetic diversity in the world's major cereal collections is a critical resource for finding new alleles that will improve yield and help fight world hunger. Unadapted germplasm and wild relatives contain untapped genetic resources for biotic and abiotic stress resistance (Hoisington et al., 1999). Even unadapted parents with poor phenotypes can contribute favorable alleles to their progeny when these alleles are placed in an adapted background, so screening based on phenotype alone may miss much favorable variation. New allelic variation can instead be identified with markers, after which the contribution of the new alleles can be measured phenotypically. Screening must therefore be done with, or complemented by, screening based on genotype (Tanksley and McCouch, 1997).
Much of the CIMMYT tropical maize germplasm from the West Indies and the Americas has been organized into core subsets to help breeders characterize and use this material. Accessions from the CIMMYT maize gene bank were assigned to subsets on the basis of a classification by phenotypic traits (Crossa et al., 1995; Taba et al., 1998). The composition of these core subsets can be refined with molecular markers, or with a combination of molecular markers and phenotypic field data (Franco et al., 1998, 2001).
To begin classifying a collection as large as the CIMMYT maize germplasm collection, together with the hundreds of breeding lines and populations created by CIMMYT breeders, highly efficient marker protocols must be in place and tested. Dillman et al. (1997) suggested the use of RFLPs, but these markers are slow and expensive to run on such a large scale. Microsatellite markers (SSRs) have been suggested in other studies, and good correlations have been found between SSR- and RFLP-based diversity and pedigree-based measurements (Pejic et al., 1998; Smith et al., 1997). In most cases, SSRs have the added advantage of having been mapped, so the genome can be sampled uniformly in SSR-based genetic classification. The efficiency of SSRs can be increased further by running multiplexed reactions with automated electrophoresis, as suggested by Mitchell et al. (1997). A possible impediment to automated scoring is posed by SSR markers that do not follow a stepwise mutation pattern, i.e., whose fragment size differences are not multiples of the repeat unit. In Drosophila, only 7 of 17 SSRs showed stepwise mutation patterns (Colson and Goldstein, 1999); most of the remaining 10 showed small insertions or deletions (indels), and two showed very large indels. Such non-stepwise changes affect the identification of alleles and should be examined before SSRs are used in diversity studies. SSRs may also make it possible to compare diversity studies done in different laboratories, provided the same SSR markers are run under standard conditions and a few standard genotypes are included in all laboratories.
The objectives of this study were to (i) determine the minimum number and identity of the microsatellites best suited for large-scale tropical maize germplasm characterization projects and (ii) analyze the genetic diversity patterns of seven tropical maize populations and 57 inbred lines with the markers identified under Objective (i).
MATERIALS AND METHODS
SSR Primers
Each SSR locus was first amplified separately and then together with other SSR loci under multiplex conditions to find the optimal amplification and electrophoresis conditions. SSR markers were chosen from the MaizeDB database (http://www.agron.missouri.edu/ssr.html, verified June 4, 2002) on the basis of bin location (to maximize genomic coverage) and repeat unit. Dinucleotide repeats were avoided in most cases because alleles that differ by only two bases are difficult to size accurately. Fluorescent oligonucleotides were purchased from commercial suppliers (Operon Technologies, Inc., Alameda, CA, or GIBCO BRL Life Technologies, Inc., Burlington, ON), and forward primers were labeled at the 5' end with 6-carboxyfluorescein (6-FAM), tetrachloro-6-carboxyfluorescein (TET), or hexachloro-6-carboxyfluorescein (HEX). Multiplexed PCR reactions were performed in 10-µL volumes containing 2 µL of template DNA (output of the sap extractor diluted 5x with distilled, deionized H2O), 0.4 to 4.0 pmol of each of 1 to 4 primers, 1x PCR buffer, 0.25 mM dNTPs, 1.5 to 2.5 mM MgCl2, and 0.75 U Taq polymerase. Reactions were run on a Peltier Thermal Cycler (MJ Research, Inc., Watertown, MA) with the following program: 94°C for 2 min; 30 cycles of 94°C for 30 s, X°C for 1 min, and 72°C for 1 min; and a final extension at 72°C for 5 min. X°C refers to the annealing temperature, which is specified for each primer in Table 2.
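To make the structure of the cycling program explicit, the sketch below encodes it as data and sums the hold times. It is an illustration only: the annealing temperature of 56°C is a placeholder standing in for the primer-specific X°C of Table 2, and ramp times between steps, which depend on the thermal cycler, are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    temp_c: float   # block temperature in degrees C
    seconds: int    # hold time at that temperature


def ssr_program(annealing_c: float, cycles: int = 30) -> list[Hold]:
    """Flatten the amplification program into an ordered list of holds.

    annealing_c plays the role of X in the text (primer-specific, Table 2).
    """
    initial = [Hold(94, 120)]                                   # 94 C, 2 min
    cycle = [Hold(94, 30), Hold(annealing_c, 60), Hold(72, 60)]
    final = [Hold(72, 300)]                                     # 72 C, 5 min
    return initial + cycle * cycles + final


def total_hold_minutes(program: list[Hold]) -> float:
    # Only hold times are summed; ramping between temperatures is ignored.
    return sum(h.seconds for h in program) / 60


program = ssr_program(annealing_c=56)  # 56 C is a placeholder, not a Table 2 value
print(len(program), total_hold_minutes(program))  # 92 82.0
```

With 30 cycles of 150 s each plus the 2-min initial and 5-min final holds, the program spends 82 min at temperature before ramping is added.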
Data Entry
Fragment sizes were calculated automatically with GeneScan 3.1 (Perkin Elmer/Applied Biosystems) using the Local Southern sizing method. The GeneScan data were appended to a table with Genotyper 2.1 (Perkin Elmer/Applied Biosystems) and then exported as an Excel file recording the peak sizes for each individual. Peak sizes were converted into a binary (0,1) matrix for subsequent analysis. For population analysis, the binary data from the 48 individuals in each population were converted to allele frequencies for each population.
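A minimal sketch of the two conversions described above, assuming the allele calls have already been exported per individual and locus. The input layout, the locus names, and the rule that one band marks a homozygote (two allele copies) while two bands mark a heterozygote (one copy each) are assumptions for illustration; the text does not state exactly how band presence was turned into frequencies.

```python
from collections import Counter, defaultdict

# Hypothetical input: allele sizes (bp) called per individual and locus.
# Assumption: one band = homozygote, two bands = heterozygote, () = missing.
Calls = dict[str, dict[str, tuple[int, ...]]]


def binary_matrix(calls: Calls):
    """Return the sorted fragment list and a 0/1 vector per individual over it."""
    fragments = sorted({(locus, allele)
                        for loci in calls.values()
                        for locus, alleles in loci.items()
                        for allele in alleles})
    matrix = {ind: [int(allele in loci.get(locus, ())) for locus, allele in fragments]
              for ind, loci in calls.items()}
    return fragments, matrix


def allele_frequencies(calls: Calls) -> dict[str, dict[int, float]]:
    """Per-locus allele frequencies for one population, counting allele copies."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for loci in calls.values():
        for locus, alleles in loci.items():
            if len(alleles) == 1:
                counts[locus][alleles[0]] += 2   # homozygote: two copies
            else:
                for allele in alleles:
                    counts[locus][allele] += 1   # heterozygote: one copy each
    return {locus: {a: n / sum(c.values()) for a, n in sorted(c.items())}
            for locus, c in counts.items()}


calls = {"ind1": {"locusA": (120,), "locusB": (200, 206)},
         "ind2": {"locusA": (124,), "locusB": (200,)}}
fragments, matrix = binary_matrix(calls)
print(matrix["ind1"], allele_frequencies(calls)["locusB"])
# [1, 0, 1, 1] {200: 0.75, 206: 0.25}
```

Each individual gets one 0/1 vector indexed by fragment; stacking these vectors as columns gives the fragment-by-genotype layout used in the diversity analysis.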
Peak sizes were converted to alleles by creating categories in Genotyper, which assigns all peak sizes within a predetermined range to the same allele and thus accounts for error in size calling. Eight primers were chosen for a repeatability test to estimate the error rate in size calling: SSRs with 2-, 3-, and 4-base repeats were amplified singly or under multiplex conditions on 10 inbred lines four separate times (the same DNA was amplified and electrophoresed each time, and the alleles were called by GeneScan). In addition, all primers in this study were tested for repeatability by amplifying them under multiplex conditions at least four separate times on two inbred lines.
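The category approach can be sketched as a nearest-center rule with a fixed tolerance. This is not Genotyper's internal algorithm; the category centers, the 0.8-bp tolerance, and the replicate sizes below are hypothetical values chosen only to show the idea that the tolerance should be wider than the spread of size calls seen in the repeatability test.

```python
from typing import Optional


def call_allele(size: float, bins: dict[int, float], tolerance: float) -> Optional[int]:
    """Assign a sized peak to the nearest allele category, or None if none is close.

    bins maps an allele label (nominal size, bp) to the category center (bp);
    tolerance is the half-width of each category in bp (assumed, not from the study).
    """
    nearest = min(bins, key=lambda allele: abs(bins[allele] - size))
    return nearest if abs(bins[nearest] - size) <= tolerance else None


def sizing_spread(replicate_sizes: list[float]) -> float:
    """Range of the size calls for one allele across replicate runs (bp)."""
    return max(replicate_sizes) - min(replicate_sizes)


bins = {150: 150.3, 153: 153.2, 156: 156.4}     # hypothetical trinucleotide categories
print(call_allele(152.9, bins, tolerance=0.8))   # 153
print(call_allele(151.8, bins, tolerance=0.8))   # None: falls between categories
print(sizing_spread([150.1, 150.4, 150.3, 150.2]))
```

Peaks that fall between categories are left uncalled rather than forced into the nearest allele, which mirrors how out-of-range fragments end up as missing data.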
Alleles of the SSRs were assumed to increase in size in a stepwise manner, in increments equal to the length of the repeat unit. SSRs whose polymorphisms did not follow stepwise mutation patterns, and were consistent with indels of more than one base pair, were eliminated very early in the study because they could not be scored automatically by Genotyper. The remaining SSRs, which for the most part followed a stepwise polymorphism pattern, were scored as though all polymorphisms followed repeat units. Polymorphisms generated by indels were discounted: an indel of only one or two base pairs would be equal to or smaller than the allele-calling error for most of the primers, and so would not prevent the correct scoring of polymorphisms caused by differences in the number of repeats. Very large indels in the amplified region would create fragments far outside the expected size range, which would most likely be scored as missing data. This would cause a loss of information, but unless all fragments outside the expected size range were sequenced, it would not be possible to determine whether they were artifacts or contaminants. Indels whose length equals the repeat unit (or the repeat unit plus or minus the allele-calling error) would cause different polymorphisms to be called the same allele, again causing a loss of information that could not be avoided without sequencing. However, calling two alleles the same when they are actually different is a less serious error than calling two alleles different when they are actually the same. The error of assuming that two alleles of the same size are identical by descent, when one was generated by an indel and the other by the addition (or deletion) of repeat units, should be minimized if a sufficient number of markers is used.
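The stepwise criterion can be written as a simple check on the observed allele sizes at a locus. This is a sketch under stated assumptions: the function name and the 0.5-bp tolerance are hypothetical, and, as the paragraph above notes, an indel exactly one repeat unit long passes this test undetected.

```python
def is_stepwise(sizes: list[float], repeat_unit: int, tolerance: float) -> bool:
    """True if every allele differs from the smallest allele by a whole number
    of repeat units, allowing `tolerance` bp of allele-calling error."""
    base = min(sizes)
    for size in sizes:
        diff = size - base
        steps = round(diff / repeat_unit)        # nearest whole number of repeats
        if abs(diff - steps * repeat_unit) > tolerance:
            return False                         # off-step: consistent with an indel
    return True


print(is_stepwise([100.0, 102.0, 106.1], repeat_unit=2, tolerance=0.5))  # True
print(is_stepwise([120.0, 123.0, 127.0], repeat_unit=3, tolerance=0.5))  # False
```

In the second case the 7-bp difference sits 1 bp away from two trinucleotide steps, which exceeds the tolerance, so the locus would be flagged as non-stepwise.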
Selecting the Minimum Subset of SSRs
All individuals in the study were genotyped with a total of 104 SSRs. For each primer, basic statistics were calculated, including polymorphism information content (PIC) (Powell et al., 1996) and percent heterozygosity in the inbreds. A high percent heterozygosity in the inbreds for one or a few primers indicated a problem with the primer, such as uncorrectable stuttering or amplification of a second locus of similar size. Eleven SSRs were removed from the study for these reasons, and an additional eight were removed because their polymorphisms did not follow the repeat unit and the amplification products could not be assigned automatically to the proper allele category by means of Genotyper, leaving 85 SSRs for analysis.
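The two per-primer statistics can be sketched as follows. The PIC formula used here, 1 minus the sum of squared allele frequencies, is the form commonly attributed to Powell et al. (1996); treating it as the exact formula used in this study is an assumption. Under it, a locus with two alleles cannot exceed 0.5. The allele sizes are hypothetical.

```python
from collections import Counter


def pic(allele_calls: list[int]) -> float:
    """PIC = 1 - sum(p_i^2) over the allele frequencies p_i at one locus.

    allele_calls holds one allele per inbred line (missing calls excluded).
    """
    counts = Counter(allele_calls)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def percent_heterozygous(genotypes: list[tuple[int, ...]]) -> float:
    """Percentage of scored inbreds showing more than one allele at a locus."""
    scored = [g for g in genotypes if g]            # () marks a missing call
    het = sum(1 for g in scored if len(set(g)) > 1)
    return 100.0 * het / len(scored)


print(round(pic([150] * 6 + [153] * 4), 2))                             # 0.48
print(percent_heterozygous([(150,), (153,), (150, 153), (), (150,)]))  # 25.0
```

Because the inbreds are expected to be homozygous, a locus with a high heterozygosity value points to a primer problem rather than to real variation.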
To determine whether the number of markers could be reduced in future studies, loci whose polymorphisms followed the repeat unit and that showed zero (or very low) heterozygosity in the inbreds were analyzed retrospectively. This analysis aimed to determine the minimum number of markers needed to describe adequately the variation present in the study and to identify the most discriminating (informative) markers (Franco et al., 2001). It was done separately for the inbred lines and for each population to see whether the identity and number of markers chosen agreed across cases.
The retrospective analysis classified the individuals using all the alleles of the SSRs used in the study. The first step was to estimate the optimum number of groups (clusters) present in the study by means of the fusion values obtained from the Ward method and the upper-tail rule (Mojena, 1977). The AMOVA procedure (Excoffier et al., 1992) was then used to estimate the "within" and the "among" cluster variance components for different numbers of clusters to determine the number of clusters that produced the largest increment in the among-groups variance (largest reduction in the within-group variance) and/or the maximum average distance among the groups. The generalized linear model (McCullagh and Nelder, 1983) and the GENMOD procedure of SAS (1993) were used to determine the level of significance of each fragment produced by the SSRs in the study. This method ranked the fragments by their ability to discriminate among clusters. Finally, the individuals were again clustered on the basis of a reduced number of fragments, selected by their significance level, to find the minimum number that could be used to generate the same (or very similar) classification as that obtained using all the fragments.
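Mojena's upper-tail rule can be sketched from the list of fusion values alone: stop before the first merge whose fusion level exceeds the mean plus k standard deviations of all fusion levels. The constant k used in this study is not reported, so the default of 1.25 below is a placeholder, and the fusion values in the example are hypothetical.

```python
import statistics


def mojena_upper_tail(fusion_values: list[float], k: float = 1.25) -> int:
    """Number of clusters suggested by Mojena's upper-tail rule.

    fusion_values are the n-1 merge levels of a hierarchical clustering of
    n objects (e.g. Ward), in merge order. k is an assumed constant.
    """
    n = len(fusion_values) + 1
    threshold = statistics.mean(fusion_values) + k * statistics.stdev(fusion_values)
    for merges_done, level in enumerate(fusion_values):
        if level > threshold:
            return n - merges_done     # clusters present before this merge
    return 1                           # no jump large enough: a single group


print(mojena_upper_tail([1, 1, 1, 1, 1, 1, 1, 10]))  # 2
```

Here nine objects merge at low levels until the last step jumps to 10, above the threshold of about 6.1, so the rule keeps the two groups that exist just before that merge.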
Analysis of Genetic Diversity
For diversity analysis, a matrix of binary or allele frequency data was constructed with columns corresponding to genotypes (for the inbred lines) or populations, and rows corresponding to distinct molecular marker fragments (alleles of each primer). For the 57 inbreds, the body of the matrix contains zeros and ones, corresponding to the absence or presence, respectively, of each fragment in each genotype. For the seven populations, the body contains the frequency of each allele in each population. For inbreds, similarity matrices were constructed from the binary data with the Simple Matching similarity coefficient (Kaufman and Rousseeuw, 1990). This is also the coefficient used in the retrospective analysis to find the minimum subset of markers, because it has Euclidean metric properties. Jaccard's and Dice similarity coefficients were also calculated but gave essentially the same results: the correlations between matrices [calculated with the MXCOMP procedure of NTSYSpc 2.01 (Rohlf, 1997)] were high and significant (r(Jaccard, Simple Matching) = 0.95**; r(Jaccard, Dice) = 0.99**; r(Dice, Simple Matching) = 0.95**). Therefore, only the Simple Matching coefficient was used to create dendrograms. Matrices of genetic distance between populations were calculated with Nei's (1972), Rogers', and Modified Rogers' coefficients (SAS, 1993, Cary, NC). Standard deviations were calculated for between-population distances, and standard errors for within-population distances, to test the significance of the differences.
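The three coefficients differ only in how they treat shared absences, which is why they can diverge on sparse band profiles. The sketch below uses the standard definitions from the 2x2 match counts; the two 0/1 profiles are hypothetical, and this is not the NTSYSpc implementation.

```python
def match_counts(x: list[int], y: list[int]) -> tuple[int, int, int, int]:
    """a = shared bands, b = only in x, c = only in y, d = shared absences."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def simple_matching(x: list[int], y: list[int]) -> float:
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)     # shared absences count as matches


def jaccard(x: list[int], y: list[int]) -> float:
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c)               # shared absences ignored


def dice(x: list[int], y: list[int]) -> float:
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c)       # shared bands weighted twice


line1 = [1, 1, 0, 0, 1, 0]   # hypothetical 0/1 profiles of two inbreds
line2 = [1, 0, 0, 1, 1, 0]
print(simple_matching(line1, line2), jaccard(line1, line2), dice(line1, line2))
```

For these profiles a = 2, b = 1, c = 1, d = 2, giving 4/6 for Simple Matching and Dice and 0.5 for Jaccard.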
Dendrograms of the inbred lines were constructed from the similarity matrix by the UPGMA method (Rohlf, 1997) to visualize the patterns of diversity in the set of lines, and a bootstrap analysis was run by SAS (1993) to calculate the confidence intervals associated with each cluster. For the populations, dendrograms were created with the Ward and UPGMA clustering methods of NTSYS.
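UPGMA (average linkage) repeatedly merges the closest pair of clusters and sets the distance from the new cluster to every other cluster to the size-weighted mean of the old distances. The sketch below assumes a dissimilarity matrix (for example, 1 minus the Simple Matching similarity) and hypothetical labels; it illustrates the method and is not the NTSYS or SAS code used in the study, and it omits the bootstrap step.

```python
def upgma(labels: list[str], dist: list[list[float]]) -> list[tuple[str, str, float]]:
    """Average-linkage (UPGMA) clustering of a symmetric dissimilarity matrix.

    Returns the merges in order as (cluster_a, cluster_b, merge_distance).
    """
    clusters = {i: (name, 1) for i, name in enumerate(labels)}   # id -> (name, size)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        name_i, n_i = clusters.pop(i)
        name_j, n_j = clusters.pop(j)
        merges.append((name_i, name_j, dij))
        # New distance = size-weighted mean of distances to the two merged clusters.
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        clusters[next_id] = (f"({name_i},{name_j})", n_i + n_j)
        next_id += 1
    return merges


labels = ["A", "B", "C", "D"]
dist = [[0.0, 0.2, 0.6, 0.7],
        [0.2, 0.0, 0.5, 0.8],
        [0.6, 0.5, 0.0, 0.3],
        [0.7, 0.8, 0.3, 0.0]]
for merge in upgma(labels, dist):
    print(merge)
```

A and B join at 0.2, C and D at 0.3, and the two pairs join at the mean of the four cross distances, 0.65.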
RESULTS AND DISCUSSION
Allelic Diversity
The 85 SSR loci in this study amplified a total of 416 bands in the inbred lines, with an average of 4.9 and a range of 2 to 14 per locus. In the populations, the 85 SSRs amplified a total of 531 bands, with an average of 6.3 and a range of 2 to 16 per locus. Most previous studies of SSR diversity in maize revealed similar allelic diversity in inbreds. For example, Lu and Bernardo (2001) reported that 40 U.S. maize inbreds averaged 4.9 alleles at 83 SSR loci, and Senior et al. (1998) reported an average of 5.0 alleles. However, Pejic et al. (1998) reported an average of 6.8 alleles per locus for 27 SSR loci in 33 U.S. maize inbreds, even higher than the average found in the CIMMYT populations. All marker data for the inbred lines and populations have been submitted to the MaizeDB database.
Minimum Number of Markers
The retrospective analysis of the inbred lines used to choose the minimum number of markers identified 53 SSR loci having one or more alleles that were highly discriminatory and the most informative (data not shown). These 53 SSRs produced the same clusters as produced from the entire set of 85 in the retrospective analysis. The same analysis was run separately on each of the seven populations. Forty-six of the SSR loci were identified in common in most of the populations and the inbreds; 25 were identified as containing alleles that were discriminatory in only one or a very few populations and 14 of the loci were identified in both the populations and the inbreds as being noninformative (data not shown). This indicates that many of the markers were always informative, and a few of the markers were never ranked as highly informative in any of the analyses. Of course, the composition of the population under study influences which markers will be the most discriminatory.
The most discriminatory SSRs were not always those with the highest PIC. For example, Phi046 identified only two alleles and had a PIC value of 0.46, yet it was identified as discriminatory in the inbreds and in five of the seven populations. In contrast, Zcaa391 amplified 14 alleles and had a PIC value of 0.85, but was identified as discriminatory in only one of the populations and not in the inbreds. It is therefore not possible to predict which SSRs will be the most discriminatory on the basis of PIC alone. On the basis of the retrospective analysis, and for reasons of cost efficiency, it was decided that the 53 SSRs identified by analysis of the CMLs (which included almost all of the 46 also identified in the seven populations) would be used routinely in future genetic diversity studies. These 53 markers still provide good coverage of the genome, with three to seven SSRs per chromosome (Table 2). Fingerprinting for proprietary protection of breeding lines could, however, be run with fewer SSRs; conversely, if a particular data set of 53 SSRs shows poor resolution of diversity patterns or low clustering confidence in a bootstrap analysis, up to 85 SSRs (or even more) could still be run.
Populations
Nei's genetic distance calculated on the allele frequencies of the seven populations produced the dendrogram presented in Fig. 1 with the Ward clustering method. At each step, the Ward method combines the two clusters whose fusion leads to the smallest increase in the within-group Euclidean sum of squares, thus maximizing the variance between groups and minimizing the variance within groups. This was particularly important for the populations, because the variance within populations was already very high. Clustering by the UPGMA method produced a very similar dendrogram, which is therefore not presented here. The populations clustered as would be predicted on the basis of pedigree and of known heterotic group as defined by field evaluations with testers. Cluster 1 contains Populations 21, 22, and 29 and Pool 24. Populations 21 and 22 are tropical, late, white dent or semident maize types that originated from Pool 24, which derives from the Tuxpeño race of maize. All these populations belong to heterotic group A. Population 29 is also a tropical, late, white dent maize, with both Tuxpeño and Caribe races in its background; it shows heterosis when crossed to both A and B testers, indicating that it belongs to neither group, and it is the farthest outlier in Cluster 1. Cluster 2 contains only Population 43, a mixture of La Posta elite lines, which is also tropical, late, white, and dent and likewise belongs to neither heterotic group A nor B. Cluster 3 contains Populations 25 and 32, which are tropical to subtropical, intermediate to late white flints belonging to heterotic group B.
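Nei's (1972) standard distance compares two populations through the average, over loci, of their within- and between-population allele identities: D = -ln(Jxy / sqrt(Jx * Jy)). The sketch below assumes that standard form, restricts the averages to loci scored in both populations, and uses hypothetical frequencies; the study itself computed distances in SAS, not with this code.

```python
import math

Freqs = dict[str, dict[int, float]]   # locus -> {allele: frequency}


def nei_1972(x: Freqs, y: Freqs) -> float:
    """Nei's (1972) standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    Jx, Jy, and Jxy are averages over the loci scored in both populations of
    sum(x_i^2), sum(y_i^2), and sum(x_i * y_i) across alleles.
    """
    loci = sorted(set(x) & set(y))
    jx = sum(sum(p * p for p in x[loc].values()) for loc in loci) / len(loci)
    jy = sum(sum(q * q for q in y[loc].values()) for loc in loci) / len(loci)
    jxy = sum(sum(p * y[loc].get(a, 0.0) for a, p in x[loc].items())
              for loc in loci) / len(loci)
    if jxy == 0:
        return math.inf                # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


pop_a = {"locusA": {120: 0.5, 124: 0.5}}   # hypothetical frequencies
pop_b = {"locusA": {120: 1.0}}
print(round(nei_1972(pop_a, pop_b), 4))    # 0.3466
```

Here Jx = 0.5, Jy = 1, and Jxy = 0.5, so the identity is sqrt(0.5) and the distance is 0.5 ln 2; identical populations give a distance of zero.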
Received for publication October 22, 2001.