Crop Science
Crop Science 42:1361-1364 (2002)
© 2002 Crop Science Society of America

NOTES

Peakmatcher

Software for Semiautomated Fluorescence-Based AFLP

L. R. DeHaan^a, R. Antonides^b, K. Belina^a, and N. J. Ehlke*^a

^a Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, 411 Borlaug Hall, 1991 Upper Buford Circle, St. Paul, MN 55108
^b Apex Motion Control, Inc., 15691 92 Ave., Surrey, BC, Canada V49 3C3

* Corresponding author (ehlke001{at}umn.edu)


    ABSTRACT
Fluorescence-based amplified fragment length polymorphism (AFLP) is becoming a widely used molecular marker technology. The fluorescence-based technique is particularly desirable because it allows for semiautomation of gel scoring. However, semiautomated gel scoring requires manual creation of marker categories, which is time consuming. We have developed Peakmatcher software to automatically create marker categories and generate a binary table for the presence or absence of marker fragments. Peakmatcher generates categories primarily on the basis of the repeatability of markers across replications. The program produces results that are highly consistent with those from a graphical manual method. Furthermore, Peakmatcher required only about 10% as much time to generate comparable data. We conclude that the capability of Peakmatcher to analyze large data sets rapidly will expedite semiautomated fluorescence-based AFLP.

Abbreviations: AFLP, amplified fragment length polymorphism • RMU, relative migration unit


    INTRODUCTION
Semiautomated fluorescence-based AFLP is a widely used molecular marker technology. Based on the AFLP technique developed by Vos et al. (1995), it has recently been used in studies with Onopordum thistles (O'Hanlon et al., 1999), evergreen azaleas (DeRiek et al., 1999), bluebunch wheatgrass (Larson et al., 2000), potato (McGregor et al., 2000), ryegrass (Roldán-Ruiz et al., 2000), barley (Schwarz et al., 1999), and mulberry (Sharma et al., 2000). The fluorescence-based technique is particularly desirable because it expedites the gel-scoring process in comparison with silver staining or radiolabeling.

Fluorescence-based AFLP can be performed with an automated DNA sequencer, such as the ABI Prism 377 (Applied Biosystems, Foster City, CA)^1, to run gels. Output from the sequencer is processed by ABI Prism Genescan Analysis Software, which detects DNA fragments as peaks in an electropherogram. Because a size standard is loaded in each lane, fragment sizing is accurate, but variability remains. Typical differences in fragment sizing are described by Mitchell et al. (1997), who showed that identical simple sequence repeat fragments were sized within a range of 0.17 base pairs within a gel and 0.46 base pairs across gels. Because of this imprecision in fragment sizing, fragments are often sized as intermediate between whole base pairs. Therefore, some researchers have opted for the term relative migration units (RMUs) rather than base pairs (O'Hanlon et al., 1999). We find this terminology to be more accurate, and will therefore use it throughout the paper.

Because of the inherent variability in fragment size calling, categories defined by a midpoint and some level of tolerance (e.g., 55 ± 0.25 RMU) must be created for each fragment. Fragments in lanes can be scored automatically for each category as being present (one) or absent (zero). Creating these categories accurately and rapidly is currently one of the most time consuming steps of fluorescence-based AFLP. This is particularly true in genetic diversity studies where many markers are being scored across numerous individuals.
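As an illustration of category-based scoring, the following Python sketch scores one lane against a set of categories. It is not part of Peakmatcher; the function name, the (midpoint, tolerance) layout, and the peak sizes are assumptions made for the example.

```python
def score_lane(peaks, categories):
    """Return a 0/1 score per category for one lane.

    peaks: fragment sizes (RMU) detected in the lane.
    categories: list of (midpoint, tolerance) pairs, both in RMU.
    A category is scored 1 if any peak lies within midpoint +/- tolerance.
    """
    scores = []
    for midpoint, tolerance in categories:
        present = any(abs(p - midpoint) <= tolerance for p in peaks)
        scores.append(1 if present else 0)
    return scores


# Hypothetical categories and lane data, for illustration only.
categories = [(55.0, 0.25), (61.3, 0.25)]
lane = [54.85, 72.10]
print(score_lane(lane, categories))  # [1, 0]
```

The peak at 54.85 RMU falls inside 55 ± 0.25 RMU, so the first category is scored as present; no peak falls inside the second category.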

Two main approaches to creating categories have been used: graphical and text-based. The graphical approach uses software such as ABI Prism Genotyper or Genographer (Benham et al., 1999). These software packages allow the user to visualize the data collected by the sequencer and manually create categories for fragments. The advantage to this approach is that the researcher is assured that the categories reflect the researcher's best judgment. The disadvantages are that the process is time consuming, the process is difficult to perform with large numbers of genotypes due to limited computer screen sizes, and categories need to be reconstructed when more individuals are added to the data set.

Programs such as Genotyper can create a text output of all the peaks detected in a given lane. Utilizing the text-based data is an attractive alternative to the graphical approach because it should allow for uniform size calling without the need to make visual judgements from gel images. However, setting up accurate categories has proved to be problematic. McGregor et al. (2000) created categories every one RMU by rounding all fragment sizes (peak location in RMUs) to the nearest whole number. The advantages of this approach are simplicity and speed. The disadvantage is that some fragments may actually be centered between two RMUs. When rounding to the nearest whole number, these fragments will then be divided into two categories rather than be included in a single category. Therefore, this approach is detrimental to the repeatability of the data, is seldom used and is advised against by Smith (1995).
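The splitting problem can be seen in a small Python sketch. The four sizes below are invented calls for a single fragment in four lanes, spread over 0.15 RMU; the function names are hypothetical.

```python
def round_to_categories(sizes):
    """Rounding method: assign each size to the nearest whole-RMU category."""
    return [round(s) for s in sizes]


def in_tolerance(sizes, midpoint, tolerance):
    """Midpoint-and-tolerance method: test each size against one category."""
    return [abs(s - midpoint) <= tolerance for s in sizes]


# Hypothetical calls of one fragment centered near 100.5 RMU.
calls = [100.42, 100.48, 100.53, 100.57]

print(round_to_categories(calls))        # [100, 100, 101, 101]
print(in_tolerance(calls, 100.5, 0.25))  # [True, True, True, True]
```

Rounding places the same fragment in two categories (100 and 101), whereas a single category of 100.5 ± 0.25 RMU captures all four calls.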

Another text-based technique for creating categories is the histogram method. With this technique, a list of all the detected peaks (in RMUs) is placed in a spreadsheet such as Microsoft Excel. The histogram function is used to determine how many peaks are in each bin (bin sizes are typically set to 0.1 RMUs). By looking at the histogram, the user can determine the optimal location of categories with a reasonable degree of accuracy. These categories can be created manually, and a binary table can be generated in a software package such as AFLPapp (available from http://hordeum.oscs.montana.edu/software/AFLPapp/; verified February 14, 2002). Although this technique can readily be used with large numbers of genotypes, it still requires time for manual category creation and recreation of categories when genotypes are added, and repeatability can be low because of variability in fragment size calling.
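The binning step of the histogram method can be sketched in Python as follows. This is an approximation for illustration: bins are centered on each 0.1 RMU by rounding, which differs from the bin edges used by the Excel histogram function, and the peak list is invented.

```python
from collections import Counter


def peak_histogram(peaks, decimals=1):
    """Count peaks per bin, with bins centered on each 0.1 RMU (assumed)."""
    return Counter(round(p, decimals) for p in peaks)


# Hypothetical pooled peak list (RMU) from several lanes.
peaks = [54.91, 54.96, 55.02, 55.04, 55.08, 61.27, 61.31, 61.33]
hist = peak_histogram(peaks)

# Text rendering of the histogram; clusters suggest category midpoints.
for bin_center in sorted(hist):
    print(f"{bin_center:6.1f} {'#' * hist[bin_center]}")
```

In this example the clusters near 55.0 and 61.3 RMU would be the candidate category midpoints; choosing the tolerance around each one remains a manual judgment.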

In summary, previous approaches to creating categories for semiautomated fluorescence-based AFLP have been time consuming, difficult to use with numerous genotypes, or have lacked an acceptable degree of accuracy. To deal with these problems, we have developed a software tool (Peakmatcher) that utilizes text-based output from Genotyper software to rapidly create optimal categories and subsequently generate a binary table for the presence or absence of every fragment for each genotype.


    Description of the Program
Peakmatcher is a Visual Basic program that operates as a macro in Microsoft Excel. Data are entered in Excel, and results are returned as Excel worksheets. The program requires Excel 98 or newer with all current updates installed.

The unique approach used in Peakmatcher is to identify the best categories primarily on the basis of repeatability. Therefore, the program requires two or more replications of each genotype. Optimally, the replications would be generated by repeating the AFLP technique and running the products on separate gels. Running the same product in multiple lanes on the DNA sequencer would be an inexpensive means to obtain replications, but would only account for variation in fragment sizing, not variation in the AFLP reactions.

To run Peakmatcher, a list of all detected peaks in each lane must be generated from Genescan files by means of a software package such as Genotyper. Each list is entered into a separate row or column in an Excel spreadsheet and identified by replication number and genotype. When Peakmatcher is run, the user selects the data to analyze, enters values for several user-defined settings, and initiates the analysis.

The internal operation of Peakmatcher consists of two major processes. The first is the generation of marker categories according to user preferences. The user specifies one or more ranges to use and the interval to use between the midpoints of the ranges. Categories of every range requested are then generated with midpoints at the specified interval. Typically, thousands of categories are generated in this step. Within each category, each genotype is scored as having a peak or peaks present in all replications (present), some replications (not repeatable), or no replications (absent).
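The category-generation step might be sketched in Python as below. This is not the Peakmatcher source (which is a Visual Basic macro). It assumes that a "range" is the full width of a category, so that a category spans midpoint ± range/2; the function names, the scan limits, and the data are likewise hypothetical.

```python
def generate_categories(start, stop, interval, ranges):
    """List (midpoint, width) candidates from start to stop (RMU).

    One category is created per midpoint and per requested width.
    """
    n_steps = int(round((stop - start) / interval))
    categories = []
    for i in range(n_steps + 1):
        midpoint = round(start + i * interval, 4)  # avoid float drift
        for width in ranges:
            categories.append((midpoint, width))
    return categories


def score_genotype(replicate_peaks, midpoint, width):
    """Score one genotype in one category across its replications."""
    half = width / 2
    hits = [any(abs(p - midpoint) <= half for p in peaks)
            for peaks in replicate_peaks]
    if all(hits):
        return "present"
    if any(hits):
        return "not repeatable"
    return "absent"


cats = generate_categories(50.0, 51.0, 0.1, ranges=[0.3, 0.5])
print(len(cats))  # 22 candidates: 11 midpoints x 2 widths

# Two replications of one genotype (hypothetical peak lists).
print(score_genotype([[50.42], [50.47]], 50.4, 0.3))  # present
print(score_genotype([[50.42], [50.62]], 50.4, 0.3))  # not repeatable
print(score_genotype([[52.00], [52.10]], 50.4, 0.3))  # absent
```

Scanning a realistic size window at a fine interval with several widths quickly yields thousands of candidates, which is why the filtering stage is needed.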

The second major operation Peakmatcher performs is the elimination of undesirable categories until only categories containing useful, highly repeatable markers remain. Undesirable categories are removed through a linear process of elimination with seven steps:

  1. All categories containing zero peaks are eliminated.
  2. Categories with few peaks that overlap with categories with many peaks are eliminated. This step is necessary to prevent correct categories with many bands but low repeatability from being split into smaller categories, which would be artifacts. Specifically, this step removes any category overlapped by a second category containing a greater number of peaks (the user defines how many more peaks must be present, on a percentage basis, before a category is removed).
  3. Every category with repeatability below a minimum level set by the user is removed. Repeatability is defined as the percentage of genotypes having the same score (present or absent) in every replication.
  4. All categories that overlap with a category of higher repeatability are removed.
  5. Every category that overlaps with a category containing a greater number of peaks is removed.
  6. Every category that overlaps with a category having a smaller range is removed.
  7. Every category that overlaps with a category defined by a lower midpoint (in RMUs) is removed.

After the filtering is completed, all remaining categories are displayed as a binary table in Excel (Table 1).
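The seven filtering steps can be approximated with the Python sketch below. It is one reading of the description, not the Peakmatcher implementation. In particular, it assumes that each overlap step compares all categories simultaneously, that a category spans midpoint ± width/2, and that the percentage setting in Step 2 means "more peaks by at least this margin"; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    midpoint: float       # RMU
    width: float          # full range in RMU (assumed midpoint +/- width/2)
    n_peaks: int          # peaks inside the category, over all lanes
    repeatability: float  # percent of genotypes scored alike in every rep


def repeatability(scores):
    """Percent of genotypes scored 'present' or 'absent' in all reps."""
    consistent = sum(s in ("present", "absent") for s in scores)
    return 100.0 * consistent / len(scores)


def overlaps(a, b):
    return abs(a.midpoint - b.midpoint) < (a.width + b.width) / 2


def drop_dominated(cats, beats):
    """Remove every category overlapped by another for which beats(o, c)."""
    return [c for c in cats
            if not any(o is not c and overlaps(o, c) and beats(o, c)
                       for o in cats)]


def filter_categories(cats, min_repeat, peak_margin_pct):
    margin = 1 + peak_margin_pct / 100
    cats = [c for c in cats if c.n_peaks > 0]                                    # Step 1
    cats = drop_dominated(cats, lambda o, c: o.n_peaks > c.n_peaks * margin)     # Step 2
    cats = [c for c in cats if c.repeatability >= min_repeat]                    # Step 3
    cats = drop_dominated(cats, lambda o, c: o.repeatability > c.repeatability)  # Step 4
    cats = drop_dominated(cats, lambda o, c: o.n_peaks > c.n_peaks)              # Step 5
    cats = drop_dominated(cats, lambda o, c: o.width < c.width)                  # Step 6
    cats = drop_dominated(cats, lambda o, c: o.midpoint < c.midpoint)            # Step 7
    return cats


# Hypothetical candidates, for illustration only.
candidates = [
    Category(55.0, 0.5, 10, 100.0),
    Category(55.1, 0.5, 10, 80.0),   # overlaps a more repeatable category
    Category(55.1, 0.3, 4, 100.0),   # fragment of a larger category
    Category(60.0, 0.5, 0, 100.0),   # empty
    Category(70.0, 0.5, 6, 60.0),    # below minimum repeatability
]
print(filter_categories(candidates, min_repeat=75, peak_margin_pct=50))
```

With these inputs only the category at 55.0 RMU survives. Steps 5 to 7 act as successive tie-breakers, so that exactly one of any set of otherwise equivalent overlapping categories remains.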


Table 1. Sample Peakmatcher output. Data were generated from fluorescence-based AFLP performed on five Illinois bundleflower plants with two replications. Settings were minimum repeatability of 75% and minimum bands present of 60%.
 
Peakmatcher contains several useful options. In the output sheet, the user can display monomorphic markers only, polymorphic markers only, or both. As a diagnostic tool, the user can also display categories that were removed in the seven step filtering process, including a column stating the reason each category was removed (Table 2). The user can also display the full table of categories generated prior to any filtering. For added convenience, the user can save a set of favorite settings as defaults.


Table 2. Sample Peakmatcher output showing categories that were removed in the filtering process and the reason the categories were removed. Categories with zero peaks present are not displayed. Data were generated from fluorescence-based AFLP performed on five Illinois bundleflower plants with two replications. Settings were minimum repeatability of 75% and minimum bands present of 60%.

 

    Evaluation of the Program
To compare Peakmatcher to other methods of processing data from fluorescence-based AFLP, we analyzed two small data sets consisting of five plants each. In both data sets, two replications were obtained by repeating the selective step of the AFLP process. Products from the selective reactions were run on separate gels. Genescan software was set to detect peaks with a minimum fluorescence intensity of 50. The species used in the first data set was Illinois bundleflower [Desmanthus illinoensis (Michx.) MacMillan], and the species used in the second data set was quackgrass [Elytrigia repens (L.) Nevski]. Both data sets were analyzed by Peakmatcher and three other methods: (i) a graphical technique using Genographer; (ii) the histogram technique in Excel as described in the introduction; and (iii) rounding peak sizes in RMUs to the nearest whole number. For the histogram and rounding techniques, the program AFLPapp was used to create the binary table. The manual graphical technique is regarded as the most accurate, and was therefore used as a reference for evaluating the text-based approaches.

Results of the analyses revealed that when time and accuracy are considered, Peakmatcher was clearly superior to the other text-based techniques (Table 3). With Illinois bundleflower, Peakmatcher results were identical to those produced by the graphical method, but required only 11% of the time necessary for graphical analysis. With the quackgrass data set, one fewer marker was obtained by Peakmatcher than by graphical analysis, but the time required for Peakmatcher analysis was only 9% of the time required for graphical analysis.


Table 3. Comparison of four techniques for generating marker categories. Manual category creation by Genographer was used as a reference for three text-based techniques. The data were derived from two replications of a single AFLP primer combination used on five individuals from two plant species.

 
Both the histogram and rounding techniques gave substantially less information than Genographer (Table 3). Fewer repeatable markers were obtained with these methods, and many of the markers obtained were incorrectly scored. These methods produced better results with the Illinois bundleflower data set than with the quackgrass data set. The best explanation for this difference is that the gel-running process created more variability in the quackgrass data set than in the Illinois bundleflower data set.

Although Peakmatcher can expedite analyses with small numbers of individuals as we have demonstrated, its greatest advantage is realized when large numbers of individuals are analyzed. We have used Peakmatcher to successfully analyze data from two separate genetic diversity studies using AFLPs. Both studies included more than 150 individuals, and the analyses were performed in less than one hour. Subsets of the output generated by Peakmatcher from both experiments were crosschecked with Genographer, and the Peakmatcher output was found to be consistently reliable. The speed and accuracy with which Peakmatcher can analyze large data sets should expedite the process of semiautomated fluorescence-based AFLP.


    Availability and Requirements
The current version of Peakmatcher is available for download from http://www.agro.agri.umn.edu/~peak (verified February 14, 2002). The software is distributed free of charge under the GNU General Public License (details are available at http://www.gnu.org/copyleft/gpl.html; verified February 14, 2002). An online help file is distributed with the program.

Peakmatcher requires a personal computer with Microsoft Windows 98 and Excel 97 or newer to operate. The program has processed data sets containing >150 individuals on computers with 128 megabytes of RAM. Larger data sets are likely to require additional memory.


    NOTES
Contribution of the Minnesota Agric. Exp. Stn.

^1 Names are necessary to report factually on available data; however, the Univ. of Minnesota neither guarantees nor warrants the standard of the product, and the use of the name by the Univ. of Minnesota implies no approval of the product to the exclusion of others that may be suitable.

Received for publication April 11, 2001.


    REFERENCES