a USDA-ARS, BARC, PSI, SPCL, 10300 Baltimore Ave. Bldg. 50 BARC-W, Beltsville, MD 20705, USA
b USDA-ARS, Tropical Agric. Research Station, P.O. Box 70, Mayaguez, PR 00681, Puerto Rico
c Genetic Engineering & Biotechnology Research Institute (GEBRI), Sadat City, Minufiya Univ., Egypt
d Molecular Biology, Biochemistry and Bioinformatics (MB3), 360 Smith Hall, 8000 York Road, Towson Univ., Towson, MD 21252, USA
* Corresponding author (ZhangD{at}ba.ars.usda.gov)
| ABSTRACT |
|---|
Abbreviations: ADO, allelic dropout; FA, false allele; PID, probability of identity; SSR, simple sequence repeat.
| INTRODUCTION |
|---|
The development of simple sequence repeats (SSR) markers in cacao (Lanaud et al., 1999) has significantly enhanced the capacity of molecular characterization of cacao germplasm. This technique has been applied for cacao clone identification (Cryer et al., 2006; Saunders et al., 2004), parentage analysis (Schnell et al., 2005), diversity assessment (Lanaud et al., 1999, 2001), and investigation of the origin and dispersal of cacao (Motamayor et al., 2002, 2003). The USDA, together with its collaborating institutions, has undertaken a program to identify cacao genotypes and describe the genetic diversity of living plant germplasm collections that are maintained in 10 to 12 national and international collections located within tropical cacao growing countries of Central and South America. During international forums held in England and in France in 2001, a consortium of scientists and representatives from the cacao industry, academic centers involved in cacao research, and representatives from multiple international, government-sponsored laboratories reached an agreement that a set of standardized SSR primers would be used to characterize all T. cacao germplasm collections (Saunders et al., 2001, 2004). The strategy for the identification of mislabeling is to develop a reference SSR profile for each original accession. Then the reference SSR profiles will be compared with the putatively mislabeled accessions. Corrections will then be made as necessary (Turnbull et al., 2004).
However, the accuracy and reliability of using high throughput microsatellite analysis for cacao clone identification have not yet been rigorously assessed. Several questions must be addressed if this tool is to be applied for large-scale genotyping of cacao germplasm. First, we need to know if the set of proposed SSR loci have enough discriminating power to establish unique multilocus profiles. If too few loci are examined, multilocus genotypes in a population may not be unique; thus, closely related clones will be indistinguishable. The problem of finding different clones with the same SSR profile can be solved by increasing the number of loci examined, so that the probability that two different clones have the same multilocus genotype is small. This probability, often called the "probability of identity" (PID) in forensic science, can be estimated from the allele frequencies in a population (Waits et al., 2001). Probability of identity can also be conditioned on a given relationship (unrelated, parent-offspring, or sibling) between the two individuals (Waits et al., 2001).
Second, microsatellite-based DNA fingerprinting is not an error-free technology. Despite the use of highly robust fingerprinting protocols and the preparation of high quality DNA samples, genotyping errors still occur and can cause duplicate samples of a clone to appear to have different microsatellite profiles. Such errors, even at modest rates, can distort the results of cultivar identification. Repeated genotyping is widely recommended as the most reliable approach to tackle the problem of genotyping error (Taberlet and Luikart, 1999; Bonin et al., 2004). With this approach, several PCRs are performed for a given locus and a consensus genotype is developed on the basis of the results of the multiple PCRs. It is unlikely that a given allele will drop out in every one of the repeated PCRs. This approach has been applied in projects dealing with low quality DNA extracted from noninvasive samples, such as animal hairs and fecal samples (Taberlet et al., 1996; Taberlet and Luikart, 1999; Paetkau, 2003, 2004). Although this approach is reliable, obtaining a consensus profile requires a large number of amplifications, which adds significant experimental cost. Therefore, it is important to understand the magnitude of genotyping error so that the minimum number of essential repetitions can be applied.
There are a number of sources of error in automated microsatellite genotyping. These include human error, sample contamination, and errors occurring during amplification and electrophoresis (Hoffman and Amos, 2005). Whereas human errors and contamination can be avoided by cautious laboratory work or detected by controls, errors in amplification during PCR may not be avoidable through laboratory procedures. Allelic dropouts (ADO: one allele of a heterozygous clone is not amplified during PCR) and false alleles (FA: a PCR-generated allele that results from a slippage artifact during the early cycles of the reaction) are the two sources of error that cannot be easily monitored and thus need to be quantified (Taberlet et al., 1996; Taberlet and Luikart, 1999; Broquet and Petit, 2004).
The present study aimed to examine the accuracy and reliability of the 15 SSR loci currently used to identify cacao accessions and to use this information for development of an optimum genotyping scheme. Using the USDA cacao collection at Mayaguez, Puerto Rico, as a test case, we quantified the genotyping error rate through repeated genotyping, and simulated the impact of the genotyping error on cacao clone identification. We then estimated the PID for the 15 standard microsatellite loci, conditioned on unrelated accessions, as well as on full siblings. On the basis of the results, we propose a genotyping protocol for the identification of mislabeling in cacao germplasm collections.
This study is part of an international collaborative project on DNA fingerprinting of cacao germplasm in the Americas. The resultant information will be used to design an optimum scheme for genotyping cacao germplasm and improve our understanding about the extent of mislabeling in the genebanks, thus facilitating the efficient conservation and use of cacao germplasm.
| MATERIALS AND METHODS |
|---|
Cacao leaves have very high levels of endogenous phenolics, which can interfere with DNA isolation procedures (Griffiths, 1958; John, 1992; Katterman and Shattuck, 1983). Initial investigations of various DNA isolation protocols identified two methods that worked well for cacao microsatellite DNA analysis, and these were used interchangeably to yield consistent results. DNA was isolated from 50-mg samples of T. cacao leaf material using either the DNA Xtract Plus kit (D2 BioTechnologies Inc., Atlanta, GA) or the DNeasy Plant System (Qiagen, Valencia, CA). For either method, the air-dried and frozen leaf samples were cut into small pieces and placed into a 2-mL tube, sandwiched between ceramic spheres, with garnet matrix (Qbiogene, Inc., Carlsbad, CA). Lysis solution was added following the manufacturers' recommendations, except 10 mg/mL of polyvinylpolypyrrolidone (Sigma, St. Louis, MO) had been added to the Qiagen buffer AP1. Samples were homogenized in a Fast Prep instrument (Bio101-Qbiogene, Inc., Carlsbad, CA) as described previously (Saunders et al., 2001).
The DNA Xtract Plus procedure was, in brief, lysis, clarification by centrifugation, and solvent phasing followed by precipitation on ice. DNA was collected by centrifugation, washed in 70% (v/v) ethanol, centrifuged, dried, and resuspended in sterile water or buffer. The DNeasy Plant System isolation procedure included tissue lysis and RNase A treatment at 65°C, followed by centrifugation and precipitation of detergent, proteins, and polysaccharides. Cell debris and precipitates were removed by centrifugation of the slurry through a QIAshredder spin column (Qiagen) assembly. Ethanol was used to precipitate the DNA in the cleared filtrate, which was loaded onto the DNeasy column. DNA was bound to the column's silica membrane by centrifugation, washed with 70% ethanol, and finally eluted from the membrane with preheated buffer. The presence of double-stranded DNA was verified by measuring DNA quantity with PicoGreen (Molecular Probes, Inc., Eugene, OR) using a Fluoroskan Ascent microplate reader equipped with 485/538 excitation/emission filters (Labsystems, Helsinki, Finland).
DNA amplification used primer sets with sequences previously described (Lanaud et al., 1999; Saunders et al., 2004). Primers were synthesized by Proligo (Boulder, CO), and forward primers were 5'-labeled using WellRED fluorescent dyes (Beckman Coulter, Inc., Fullerton, CA). PCR was performed as described in Saunders et al. (2004), using commercial hot-start PCR supermixes that had been fortified with an additional 30 units of the respective hot-start Taq DNA polymerase (Invitrogen Platinum Taq, Carlsbad, CA; Eppendorf HotMaster Taq, Westbury, NY) per milliliter of supermix.
The amplified PCR products were separated by capillary electrophoresis as previously described (Saunders et al., 2004) using a CEQ 8000 genetic analysis system (Beckman Coulter Inc.). Data analysis was performed using the CEQ 8000 Fragment Analysis software version 7.0.55 according to manufacturers' recommendations (Beckman Coulter, Inc.). SSR fragment sizes were automatically calculated by the CEQ 8000 Genetic Analysis System. Allele determination was performed using the binning wizard software, which automatically generated locus tags in the CEQ 8000 Genetic Analysis System.
Key parameters for measuring the informativeness of these 15 loci were calculated using the program PowerMarker (Liu and Muse, 2005). These included mean number of alleles per locus, observed heterozygosity, gene diversity (expected heterozygosity), and polymorphic information content (PIC). The same software was also used to measure the two-locus linkage disequilibrium among the 15 loci. To assess the differentiating power of the SSR loci, we used a conservative estimate of PID, assuming that the Mayaguez collection may contain closely related clones.
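As an illustration of the per-locus statistics named above, the following minimal Python sketch computes observed heterozygosity, gene diversity, and PIC from their textbook definitions. It is not PowerMarker's implementation, which may apply sample-size corrections omitted here; the function names and the input layout (a list of allele frequencies, or a list of two-allele genotypes) are assumptions made for this example.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content at one locus:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def observed_heterozygosity(genotypes):
    """Fraction of genotypes (2-tuples of allele sizes) that are heterozygous."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


if __name__ == "__main__":
    # Hypothetical locus with two equally frequent alleles.
    print(gene_diversity([0.5, 0.5]))  # 0.5
    print(pic([0.5, 0.5]))             # 0.375
```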
We computed the probability of identity among full siblings (PID-sib), which was defined as the probability that two full-sib individuals drawn at random from a natural population have the same multilocus genotype. The overall PID-sib is the upper limit of the possible ranges of PID in a natural population, thus providing the most conservative number of loci required to resolve all individuals, including relatives (Waits et al., 2001). This can be computed using the following equation:
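$$P_{ID\text{-}sib} = 0.25 + 0.5\sum_i p_i^2 + 0.5\left(\sum_i p_i^2\right)^2 - 0.25\sum_i p_i^4 \qquad [1]$$

where p_i is the frequency of the ith allele at a locus (Waits et al., 2001).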
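The sibling probability of identity can be sketched in Python as follows. The single-locus expression used here is the one commonly attributed to Waits et al. (2001): 0.25 + 0.5 Σp² + 0.5 (Σp²)² − 0.25 Σp⁴, with p the allele frequencies at the locus. Multiplying the per-locus values to obtain an overall figure assumes that the loci are independent; the function names are hypothetical.

```python
def pid_sib_locus(freqs):
    """Single-locus PID among full siblings from allele frequencies."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def pid_sib_multilocus(loci):
    """Product of single-locus values; assumes independent loci."""
    result = 1.0
    for freqs in loci:
        result *= pid_sib_locus(freqs)
    return result


if __name__ == "__main__":
    # Hypothetical frequencies: two loci, each with two equally common alleles.
    print(pid_sib_multilocus([[0.5, 0.5], [0.5, 0.5]]))
```

A monomorphic locus returns 1.0, as expected, because it cannot separate any pair of individuals.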
To quantify the genotyping error rate, we performed an independent experiment, using the multiple tube approach (Taberlet and Luikart, 1999). This approach involves dividing the sample among multiple tubes (or wells), then amplifying and typing the products in each tube separately. The results are analyzed by a statistical procedure that determines whether a genotype can be conclusively assigned to the DNA sample. Thirty DNA samples were chosen from the Mayaguez collection and genotyped independently by three different persons who performed amplification and capillary electrophoresis at different times, using the established protocols in our laboratory (Saunders et al., 2004). In addition to the three repetitions, data from an experiment performed 2 yr previously, when this subset of samples was genotyped by a fourth person, were also considered. Hence, a total of four independent PCRs were performed on the 30 DNA samples. The genotypes obtained were analyzed for locus-specific error rate following the method of Broquet and Petit (2004), where the ADO rate for a locus was estimated by the number of heterozygous individuals genotyped as homozygous individuals, divided by the total number of genotyped heterozygous samples. The FA rate for a locus is estimated as the number of amplifications leading to the creation of one or more false alleles at the given locus, divided by the total number of genotyped samples. Since the procedure used here does not take into account any possible source of error before the DNA sample (i.e., from collection of the leaf sample in the field through extraction of the DNA), the estimates of error rate are underestimates of the true error rate, and the extent of the underestimation is unknown.
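The two locus-specific rates described above can be illustrated with a short Python sketch. It assumes that each amplification is recorded as a pair (reference genotype, observed genotype), where the reference is the genotype accepted as true for that sample (for instance, the consensus across repetitions). That data layout and the function name are assumptions for this example, not part of the Broquet and Petit (2004) procedure.

```python
def error_rates(records):
    """Return (ADO rate, FA rate) for one locus.

    records: iterable of (reference, observed) pairs; each genotype is a
    2-tuple of allele sizes, e.g. (100, 104).
    ADO rate = heterozygous references observed as homozygous for one of
               their own alleles / all amplifications of heterozygous references.
    FA rate  = amplifications showing at least one allele absent from the
               reference / all amplifications.
    """
    total = het_total = ado = fa = 0
    for ref, obs in records:
        total += 1
        ref_set, obs_set = set(ref), set(obs)
        if obs_set - ref_set:          # an allele not present in the reference
            fa += 1
        if len(ref_set) == 2:          # reference is heterozygous
            het_total += 1
            if len(obs_set) == 1 and obs_set <= ref_set:
                ado += 1               # one true allele failed to amplify
    ado_rate = ado / het_total if het_total else 0.0
    fa_rate = fa / total if total else 0.0
    return ado_rate, fa_rate


if __name__ == "__main__":
    # Hypothetical amplifications at a single locus.
    demo = [
        ((100, 104), (100, 104)),  # correct
        ((100, 104), (100, 100)),  # allelic dropout
        ((100, 100), (100, 102)),  # false allele
        ((100, 104), (100, 104)),  # correct
    ]
    print(error_rates(demo))
```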
The impact of genotyping error on cacao clone identification was assessed in two ways. First, we evaluated the effectiveness of using a single genotyping for cacao identification. The single genotyping approach assumes that, if the genotyping error is reasonably small, most errors will cause samples from the same individual to differ (mismatch) by a small number of loci, and most frequently only a single locus. If enough loci are scored, each clone in the population will have multilocus genotypes (or a DNA profile) with mismatches at more than one locus. Therefore, if both of these criteria are met, most pairs of samples that mismatch by only a single locus will differ because of genotyping error (Kalinowski et al., 2006; Paetkau, 2004). This shows that if genotyping error is a potential problem, enough loci should be examined so that different clones sampled from the population are likely to have genotypes differing by more than one locus (Kalinowski et al., 2006). To test if this is the case for cacao clone identification in our system, we calculated the probability of mismatch for the 141 cacao accessions in the Mayaguez collection, on the basis of the 15 recommended SSR loci, using the computer program MM-DIST (Kalinowski et al., 2006).
Second, we examined the effectiveness of the multitube approach by simulating how many PCR repetitions would be needed to generate a "consensus genotype." We defined the minimum acceptable standard of a consensus genotype as one that appears at least twice in the repeated genotyping (Valière et al., 2002). The simulation used the allele frequencies obtained from the 141 cacao accessions in the Mayaguez collection and the observed error rate from the repeated genotyping experiment. The simulation was implemented using the program GEMINI (Valière et al., 2002).
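The "appears at least twice" rule can be sketched as a small Python function. This is only an illustration of the acceptance criterion, not the GEMINI program; the function name, the treatment of failed amplifications as `None`, and the decision to return no consensus when two genotypes are tied are assumptions made for this example.

```python
from collections import Counter


def consensus_genotype(calls, min_count=2):
    """Return the genotype seen at least `min_count` times, or None.

    calls: genotypes from repeated PCRs at one locus, each a 2-tuple of
    allele sizes; None marks a failed amplification. Allele order within a
    genotype is ignored.
    """
    counts = Counter(tuple(sorted(c)) for c in calls if c is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    best, n = ranked[0]
    if n < min_count:
        return None                      # no genotype was replicated
    if len(ranked) > 1 and ranked[1][1] == n:
        return None                      # tie: treated as unresolved (assumption)
    return best


if __name__ == "__main__":
    # Hypothetical calls from three repetitions; one shows allelic dropout.
    print(consensus_genotype([(100, 104), (104, 100), (100, 100)]))
```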
To identify duplicates and mislabeling in the Mayaguez cacao collection, a genotyping protocol that combined the multitube approach with the "mismatch tolerant" approach was applied. Pairwise comparison of all the 141 cacao accessions was performed to identify duplicates on the basis of their multilocus profiles (i.e., 15 SSR loci) generated through a single genotyping. Accessions with different names but fully matched at 15 loci were declared duplicates or synonymously mislabeled accessions. Accessions that differed by one, two, or three loci were repeatedly genotyped three times at the disputed loci to generate the consensus genotypes, followed by pairwise matching.
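The pairwise screening step described above can be sketched in Python. The sketch assumes that every accession has been scored at the same loci and that a profile is stored as a mapping from locus name to a two-allele genotype; the function names, the data layout, and the accession and locus labels in the example are hypothetical.

```python
from itertools import combinations


def mismatched_loci(profile_a, profile_b):
    """Loci at which two profiles carry different genotypes (allele order ignored)."""
    return [
        locus
        for locus in profile_a
        if tuple(sorted(profile_a[locus])) != tuple(sorted(profile_b[locus]))
    ]


def triage(profiles, max_recheck=3):
    """Split accession pairs into full matches and pairs needing re-genotyping.

    Pairs matching at every locus are reported as putative duplicates.
    Pairs differing at 1..max_recheck loci are returned with the disputed
    loci, to be genotyped again. Pairs differing at more loci are treated
    as distinct and are not reported.
    """
    duplicates, recheck = [], []
    for name_a, name_b in combinations(sorted(profiles), 2):
        diff = mismatched_loci(profiles[name_a], profiles[name_b])
        if not diff:
            duplicates.append((name_a, name_b))
        elif len(diff) <= max_recheck:
            recheck.append((name_a, name_b, diff))
    return duplicates, recheck


if __name__ == "__main__":
    # Hypothetical two-locus profiles for three accessions.
    demo = {
        "acc1": {"locusA": (100, 104), "locusB": (200, 200)},
        "acc2": {"locusA": (104, 100), "locusB": (200, 200)},
        "acc3": {"locusA": (100, 104), "locusB": (200, 202)},
    }
    print(triage(demo))
```

Only the loci returned in the second list need repeated PCRs, which is what keeps the number of extra amplifications small.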
| RESULTS |
|---|
We then determined how many repeated PCRs would be needed to obtain a consensus genotype at the disputed loci. The minimum acceptable standard of a consensus genotype was defined as a multilocus profile appearing at least twice in the repeated genotyping. The simulation, based on the error rate observed in the repeated genotyping, shows that 98.7% of the consensus genotypes can be identified with a minimum of three PCR repetitions, and 99.7% with four PCR repetitions. Further increases in the number of PCR repetitions offer little improvement in genotyping accuracy (Fig. 4).
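The diminishing return from additional repetitions can be illustrated with a much simpler model than the simulation used here. The Python sketch below assumes that each PCR independently yields the correct genotype with probability `q` and computes the binomial probability that the correct genotype appears at least twice in `n` repetitions. It ignores the chance that an erroneous genotype is itself replicated, does not model ADO and FA separately, and is not the GEMINI procedure; the value of `q` in the example is hypothetical and is not an estimate from this study.

```python
from math import comb


def p_consensus(q, n, min_count=2):
    """P(correct genotype seen at least `min_count` times in n independent PCRs)."""
    return sum(
        comb(n, k) * q ** k * (1 - q) ** (n - k)
        for k in range(min_count, n + 1)
    )


if __name__ == "__main__":
    q = 0.9  # hypothetical per-PCR probability of a correct genotype
    for n in range(2, 7):
        print(n, round(p_consensus(q, n), 4))
```

Under this simplified model the gain from each added repetition shrinks quickly once `n` exceeds three or four, which is the same qualitative pattern reported above.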
| DISCUSSION |
|---|
Progress in developing microsatellite markers in cacao and the availability of high throughput genotyping facilities now enable systematic assessment of genetic identity in the national and international cacao gene banks. In contrast to dominant markers such as AFLP, RAPD and ISSR, identical genotypes can have a 100% match in the multilocus SSR profiles without ambiguity; thus, accuracy of identification is significantly improved. Nevertheless, the effectiveness of clone identification via SSR fingerprints depends on the number of loci used for genotyping, as well as on the rate of genotyping error. Paetkau (2004), for example, recommends genotyping samples one time at six loci and using stringent quality control protocols to avoid genotyping error.
The multitube approach has emerged as a widely accepted protocol for eliminating errors from noninvasive samples (Taberlet and Luikart, 1999). Taberlet et al. (1996) recommended genotyping such samples three to seven times.
However, repeated genotyping will increase the costs. Recently, an alternative approach, the mismatch tolerant approach, was proposed to tackle the problem of genotyping error. This approach accepts that genotyping error might not be completely eliminated, but it proposes that accurate clone identification can be achieved by using a relatively large number of loci (McKelvey and Schwartz, 2004; Kalinowski et al., 2006). The mismatch tolerant approach assumes that the genotyping error is reasonably small; thus, these errors will cause samples from the same clone to differ by a very small number of loci. If enough loci are scored, different clones in the population will have multilocus genotypes that mismatch by more than one locus. In this way, the error mismatch distributions are unlikely to overlap with genotyping difference distributions, which would allow the differentiation of mismatch caused by genotyping error, from that caused by real genotype differences (Kalinowski et al., 2006; McKelvey and Schwartz, 2004). When error rate is small, this method was shown to have the potential to reduce the number of PCR amplifications required in the multitube approach (Kalinowski et al., 2006). For example, McKelvey and Schwartz (2004) recommend using as many as 12 to 15 loci so that genotyping error mismatch distributions are unlikely to overlap with genotyping difference distributions.
Our results showed that a combination of the mismatch tolerant approach with the multitube approach was suitable for cacao clone identification using high throughput genotyping. This scheme used the multilocus profile, generated by single genotyping at 15 SSR loci, to identify clones with unique genotypes (i.e., SSR profiles mismatched at more than three loci). Then the accessions whose mismatches could have been caused by genotyping error (mismatches at three loci or fewer) were genotyped three times to verify their true genotype at the disputed loci. In this manner, the need for PCR replication was limited to a small number of loci. As demonstrated in the case of the USDA Mayaguez cacao collection, reliable clone identification was obtained. This method is practical and cost effective for assessing the genetic identity of a large number of cacao germplasm accessions. Similar strategies can also be applied for cultivar identification in other crops and species.
Among the 15 SSR loci used in the present study, a few of them are less than ideal to serve the purpose of cacao clone identification (i.e., loci Y16986, Y16996, AJ271826, and AJ271942) because of their relatively high error rate. These 15 loci constitute the first set of SSR agreed on by various collaborating institutions for cacao clone identification (Saunders et al., 2004). The selection was made at the time when only limited cacao microsatellite loci were available. Since then, many more SSR sequences have been developed (Pugh et al., 2004). Screening for more informative and less erroneous SSRs is a continuous process in our laboratory. Meanwhile, optimization of the PCR reaction is also ongoing. The genotyping error rate will be reduced with the use of new primers and fully optimized PCR reactions.
Like many other cacao germplasm collections, the USDA Mayaguez cacao germplasm collection suffers from duplicate accessions and other types of mislabeling. In the present study, we only used samples taken from a single tree per accession. We conclude that accessions that have been transferred between cacao germplasm collections are frequently subject to errors in identification. Differences occur among germplasm collections as well as within collections. Therefore, the present study only judged, in cases of full SSR profile matches, which accessions (each comprising a group of trees) had the problem of synonymous mislabeling (or duplication). However, we cannot confirm that the accessions having unique SSR profiles are not mislabeled.
Samples for this research were collected in 2001 from trees in the original cacao collection established in 1960 at Mayaguez. In 2002, a new replicated collection was established at Mayaguez using Amelonado as a common rootstock, and many of the duplicates identified in this study were deleted (Table 3). Work continues with the 15 SSR primers used in this study to discover whether clones with the same accession number are mislabeled.
Germplasm collections invariably contain duplicate accessions, which burden genebank management because these redundancies do not contribute to the diversity in the collection. The present study has provided the first step toward identifying the duplicates in a cacao germplasm collection, allowing the elimination of redundancy to a certain extent. However, this step is not sufficient to fully correct the mislabeling in this collection. All the cacao accessions in the USDA Mayaguez repository were introduced from various original collections in Central and South America. Therefore, the correction of mislabeling in the Mayaguez collection will have to be based on the "reference multilocus" profiles of the original trees in the source genebanks. Currently, we are genotyping original trees in the two international cacao collections, in Trinidad (the ICG,T) and in Costa Rica (the CATIE collection). Further work on assessing intra-accession variation will be necessary to fully sort out the mislabeling in the Mayaguez collection.
| ACKNOWLEDGMENTS |
|---|
Received for publication January 2, 2006.
| REFERENCES |
|---|