Crop Science Journal of Natural Resources and Life Sciences Education


     


Published online 6 May 2005
Published in Crop Sci 45:1120-1131 (2005)
© 2005 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING, GENETICS & CYTOLOGY

Identification of Essentially Derived Varieties Obtained from Biparental Crosses of Homozygous Lines

I. Simple Sequence Repeat Data from Maize Inbreds

M. Heckenbergerᵃ, M. Bohnᵇ, and A. E. Melchingerᵃ,*

a Institute of Plant Breeding, Seed Science, and Population Genetics, Univ. of Hohenheim, 70593 Stuttgart, Germany
b Crop Science Dep., Univ. of Illinois, S-110 Turner Hall, 1102 South Goodwin Ave., Urbana, IL, 61801

* Corresponding author (melchinger@uni-hohenheim.de)


    ABSTRACT
Genetic distances (GDs) based on molecular markers such as simple sequence repeats (SSRs) have been proposed as an appropriate tool to assess the genetic conformity between putative essentially derived varieties (EDVs) and their initial varieties (IVs). However, for maize (Zea mays L.) and other crops, no consensus has been reached regarding GD thresholds for identification of EDVs, mainly because reliable benchmark data are lacking. Our objectives were to (i) determine the variation in the parental contribution (p) to the genome of homozygous progeny lines derived in recycling breeding programs; (ii) investigate the power of SSR-based GD estimates for discriminating between progeny lines derived from F2, BC1, and BC2 populations (BC = backcross); (iii) compare theoretical and simulated results of a companion study with our experimental data; and (iv) draw conclusions with regard to various EDV thresholds suggested hitherto. A total of 220 European and U.S. maize inbred lines comprising 163 triplets were genotyped with 100 uniformly distributed SSRs. A triplet consisted of one F2–, BC1–, or BC2–derived progeny line and both of its parental lines. The SSR-based estimates of p varied from 0.25 to 0.74 for F2–derived lines, with a mean (0.49) close to the expectation (0.50), and from 0.51 to 0.80 for BC1–derived lines, with a mean (0.66) significantly smaller than the expectation (0.75). Relative to the variation in p, the GD between progeny lines and parents was little influenced by the variation in the GD between the parents, particularly for BC1–derived lines. Suggested GD thresholds for EDVs resulted in different Type I (α) and Type II (β) errors, depending on the germplasm pool. Considerable overlaps in the GD frequency distributions of F2–, BC1–, and BC2–derived lines indicate that the resolution to discriminate these types of progeny is poor unless a much larger number of markers or a set of extremely polymorphic markers is used.

Abbreviations: AFLP, amplified fragment length polymorphism • ASSINSEL, International Association of Plant Breeders for the Protection of Plant Varieties • BC, backcross • CI, confidence interval • EDV, essentially derived variety • f, coefficient of parentage • GD, genetic distance • IDV, independently derived variety • IV, initial variety • p, parental contribution to the genome of the progeny • PIC, polymorphic information content • PVP, plant variety protection • SSR, simple sequence repeat • T, potential threshold • UPOV, International Union for the Protection of New Varieties of Plants


    INTRODUCTION
LEGAL REGULATIONS for plant variety protection (PVP) should secure the reward for past breeding efforts but also sustain future breeding progress. Registered plant varieties need to be protected against plagiarism and misuse on the one hand, but protected germplasm should remain accessible for the development of new varieties on the other. The latter was ensured by the concept of the breeder's exemption, or breeder's privilege, in the original convention of the International Union for the Protection of New Varieties of Plants (UPOV, 1978).

The advent of new methods such as genetic engineering and marker-assisted backcrossing, however, has opened the way to undermining the original intention of the breeder's exemption. These tools make it possible to add a few new genes to a protected variety, or to select deliberately for lines that are very similar to one of their parents, and then to apply for PVP for the resulting variety. Consequently, the breeder of the plagiarizing variety can exploit the investments made in breeding the original variety without compensating the breeder of the original variety.

The concept of EDVs was implemented in the revised UPOV convention (UPOV, 1991) and in several national PVP acts to cope with this new situation. Accordingly, a variety is deemed to be essentially derived from an IV if it (i) was predominantly derived from the IV, (ii) is clearly distinguishable from the IV, and (iii) genetically conforms to the IV. The genetic conformity between the IV and potential EDVs can result from repeated backcrossing, genetic engineering, reselection of variants within varieties, or other unaccepted breeding procedures.

Nevertheless, the UPOV convention leaves room for interpretation as to how thresholds should be determined. Furthermore, breeding companies have not yet agreed on a catalog of accepted or unaccepted breeding procedures that generally result in independently derived varieties (IDVs) or EDVs, respectively. For instance, some companies argue that developing a line from a BC1 population obtained by using a competitor's line as recurrent parent should be accepted, whereas others claim that even derivation from an F2 population is unacceptable. In addition, no official guidelines or appropriate methods have been established to assess the genetic conformity between IVs and potential EDVs. Hence, crop-specific thresholds for discriminating between EDVs and IDVs have not yet been defined.

In principle, the coefficient of parentage (f) introduced by Malécot (1948) could serve for identification of EDVs because it reflects the degree of relatedness between two genotypes on the basis of their pedigrees. In the case of a suspected EDV, however, pedigree data are usually not available to the breeder of the IV. In addition, f is an indirect measure of genetic similarity (i.e., the expected proportion of alleles identical by descent between two individuals) based on several simplifying assumptions, such as equal parental genome contributions and absence of selection, mutation, or drift (Melchinger et al., 1991).

Molecular markers such as simple-sequence repeats (SSRs) or amplified fragment length polymorphisms (AFLPs) allow determination of the parental origin of the chromosomal segments in a progeny. Therefore, GDs based on molecular markers were proposed as an appropriate tool to determine the genetic conformity between an IV and putative EDVs and, consequently, to distinguish between EDVs and IDVs [International Association of Plant Breeders for the Protection of Plant Varieties (ASSINSEL), 1999; International Seed Federation, 2002a]. In maize, GDs between lines based on AFLP and SSR data were highly correlated with each other and with f estimates (Lübberstedt et al., 2000; Smith et al., 1997), suggesting that the degree of relatedness of two genotypes can be inferred from their GD. However, distributions of GDs for F2– and BC1–derived progenies showed a substantial overlap (Bernardo et al., 1997).

In a companion paper, we proposed a conceptual framework based on principles of statistical test theory for identification of EDVs with molecular markers (Heckenberger et al., 2005, unpublished data). Accordingly, for a progeny line derived from a biparental cross, the GD to each parent depends on the GD between the two parents and on p, the parental genome contribution transmitted to the progeny. Experimental estimates of p for F2– and BC1–derived progenies have been reported (Bernardo et al., 1997, 2000), and formulas for the variance of p for both types of progeny were derived by Wang and Bernardo (2000). Nonetheless, further experimental data are required to verify the approach of Heckenberger et al. (2005, unpublished data) and to quantify the influence of these factors on potential EDV thresholds.

In this study, we investigated a large number of triplets in maize, each consisting of a homozygous progeny line derived from an F2, BC1, or BC2 population and its two parental inbreds, to provide benchmark data for practical implementation of the EDV concept in maize breeding. Our main goal was to test the hypothesis that GDs based on SSR data can be used to identify lines derived with unaccepted breeding methods. In detail, our objectives were to (i) estimate the parental contribution p to the genome of the progeny; (ii) investigate the power of SSR-based GD estimates for discriminating between homozygous lines derived from F2, BC1, and BC2 populations; (iii) compare the theoretical and simulated results of Heckenberger et al. (2005, unpublished data) with our experimental data; and (iv) draw conclusions with regard to various EDV thresholds suggested in the literature.


    MATERIALS AND METHODS
Plant Materials
In total, 220 elite maize inbred lines were analyzed, comprising 89 European flint, 74 European dent, 14 U.S. dent, and 43 introgression lines. These lines originated from the maize breeding programs at the University of Hohenheim (Stuttgart, Germany), Iowa State University (Ames, IA, USA), and three commercial breeding companies in Germany. The 220 lines comprised 163 triplets. Each triplet consisted of one progeny line and both of its parental lines. The materials consisted of 118 intrapool triplets of European flint or dent lines and 45 interpool triplets, each including one European and one U.S. line with an introgression line as progeny. European flint and dent lines as well as the pooled U.S. lines were regarded as separate germplasm pools.

Altogether, 83% of the progenies were derived from F2 populations and 17% were derived from BC1 or BC2 populations (Table 1). Detailed information on all 163 triplets and the 220 maize inbreds included in this study is available as supplemental data on the internet at http://crop.scijournals.org/.


Table 1. Number and type of parent–offspring triplets.

 
Molecular Analyses
All lines were genotyped with a set of 100 SSR markers uniformly covering the entire maize genome, as described in detail by Heckenberger et al. (2002). Briefly, DNA samples were analyzed using an ABI Prism 377 DNA Sequencer with 96-lane polyacrylamide gels. Internal fragment size standards were used in each lane to increase the accuracy of DNA fragment size determination. The size of each DNA fragment was determined automatically with the GeneScan (Applied Biosystems, Foster City, CA) software and assigned to specific alleles with the Genotyper (Applied Biosystems) software. The 100 SSRs were selected on the basis of reliable single-locus amplification, absence of null alleles, high degrees of polymorphism, and high reproducibility of the bands. Seventy of the 100 SSRs contained dinucleotide repeat motifs, whereas the other 30 markers consisted of tri- to octanucleotide repeats. The SSR analyses were performed on a commercial basis. Nonparental alleles were defined as alleles present in the progeny line but absent from both parents. Loci with nonparental alleles were treated as missing for the particular triplet.
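The rule for nonparental alleles can be illustrated with a short Python sketch. The genotype encoding (one tuple of allele sizes per locus, `None` for a missing data point) and the function name `mask_nonparental` are hypothetical choices for illustration, not the pipeline used for the SSR analyses.

```python
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[int, ...]]  # allele sizes (bp) at one locus, or None if missing


def mask_nonparental(p1: List[Genotype], p2: List[Genotype],
                     progeny: List[Genotype]) -> Tuple[List[Genotype], List[int]]:
    """Set progeny loci carrying an allele absent from both parents to missing.

    Returns the masked progeny genotype and the indices of the flagged loci.
    Loci already missing in any member of the triplet are left unchanged,
    because the parental origin cannot be checked there.
    """
    masked: List[Genotype] = []
    flagged: List[int] = []
    for i, (a, b, o) in enumerate(zip(p1, p2, progeny)):
        if a is None or b is None or o is None:
            masked.append(o)
            continue
        if set(o) - (set(a) | set(b)):  # allele(s) found in neither parent
            masked.append(None)
            flagged.append(i)
        else:
            masked.append(o)
    return masked, flagged


# Hypothetical triplet scored at three loci
p1 = [(150,), (200,), (98,)]
p2 = [(152,), (204,), (100,)]
offspring = [(150,), (206,), (100,)]
masked, flagged = mask_nonparental(p1, p2, offspring)
print(masked, flagged)  # [(150,), None, (100,)] [1]
```

Masking is applied per triplet, so the same progeny locus can still contribute to GDs computed against other lines.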

Statistical Analyses
Polymorphic information content (PIC) values were estimated as suggested by Anderson et al. (1993). Malécot's (1948) coancestry coefficient (f) was calculated from pedigree information for all pairwise line combinations following the rules described by Melchinger et al. (1991). Genetic distances between lines based on SSR data were estimated as Rogers' distance (Rogers, 1972). When a value was missing for one of the two inbreds compared, the corresponding alleles of the other line were not used for GD calculation. Standard errors (SEs) of GDs were estimated with the bootstrap procedure, resampling over primer pairs (Tivang et al., 1994). This requires that pairs of molecular markers be independent, which can be taken for granted for the 90% of marker pairs located on different chromosomes. In addition, a further analysis of this germplasm revealed that the extent of intrachromosomal linkage disequilibrium was small (Stich, 2004, personal communication). Therefore, the genotypes at the overwhelming majority of marker pairs can be considered stochastically independent.
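A minimal Python sketch of the three quantities in this paragraph follows: PIC in the form 1 – Σ f_j² of Anderson et al. (1993), Rogers' distance averaged over loci with missing loci skipped, and a bootstrap SE obtained by resampling loci (i.e., primer pairs) with replacement. The genotype encoding, the weighting of k bands at a locus as frequencies 1/k, the number of bootstrap samples, and the seed are assumptions for illustration; the study itself used PLABSIM and R.

```python
import math
import random
import statistics
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[int, ...]]  # allele sizes at one locus; None = missing


def allele_freqs(alleles: Tuple[int, ...]) -> Dict[int, float]:
    """Within-line allele frequencies; k bands at a locus each get weight 1/k."""
    freqs: Dict[int, float] = {}
    for a in alleles:
        freqs[a] = freqs.get(a, 0.0) + 1.0 / len(alleles)
    return freqs


def pic(locus: List[Genotype]) -> float:
    """PIC = 1 - sum_j f_j^2, with f_j the frequency of allele j across lines."""
    lines = [g for g in locus if g is not None]
    totals: Dict[int, float] = {}
    for g in lines:
        for a, f in allele_freqs(g).items():
            totals[a] = totals.get(a, 0.0) + f / len(lines)
    return 1.0 - sum(f * f for f in totals.values())


def rogers_per_locus(x: List[Genotype], y: List[Genotype]) -> List[float]:
    """Rogers' distance sqrt(0.5 * sum (p_k - q_k)^2) at each locus scored in both lines."""
    dists = []
    for gx, gy in zip(x, y):
        if gx is None or gy is None:  # missing in either line: locus skipped
            continue
        fx, fy = allele_freqs(gx), allele_freqs(gy)
        ss = sum((fx.get(a, 0.0) - fy.get(a, 0.0)) ** 2 for a in set(fx) | set(fy))
        dists.append(math.sqrt(0.5 * ss))
    return dists


def rogers_distance(x: List[Genotype], y: List[Genotype]) -> float:
    """Genome-wide GD: mean of the per-locus Rogers' distances."""
    return statistics.mean(rogers_per_locus(x, y))


def bootstrap_se(x: List[Genotype], y: List[Genotype],
                 n_boot: int = 1000, seed: int = 1) -> float:
    """Bootstrap SE of the GD, resampling loci (primer pairs) with replacement."""
    rng = random.Random(seed)
    d = rogers_per_locus(x, y)
    boot = [statistics.mean(rng.choices(d, k=len(d))) for _ in range(n_boot)]
    return statistics.stdev(boot)


# Hypothetical pair of inbreds scored at four loci (third locus missing in line_a)
line_a = [(150,), (200,), None, (98,)]
line_b = [(150,), (204,), (120,), (100,)]
print(rogers_distance(line_a, line_b))  # mean of 0, 1, 1 over the loci scored in both lines
print(bootstrap_se(line_a, line_b))
print(pic([(150,), (150,), (152,), (154,)]))  # 0.625
```

For homozygous lines with a single allele per locus, each per-locus Rogers' distance is either 0 or 1, so the GD reduces to the proportion of loci at which the two lines differ.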

Coefficients of correlation between SSR-based GDs (GD_SSR) and f were calculated as simple correlation coefficients (Snedecor and Cochran, 1980). In addition, a lack-of-fit test was used to test for linear or quadratic relationships between f and GD. GDs were calculated with the PLABSIM software (Frisch et al., 2000); the remaining statistical calculations were performed with the R software package (Ihaka and Gentleman, 1996).

Statistical Framework
Suppose progeny line O is derived from a cross or BC (e.g., F2, BC1, or BC2 generation) between the homozygous parents P1 and P2 and the GDs between P1 and P2 as well as P1 and O are denoted as GD(P1,P2) and GD(P1,O), respectively. When O was an F2–derived homozygous progeny line, P1 was the first parent listed in the pedigree record of O. When O was a BC-derived inbred, P1 was the recurrent parent. If GD is determined by a large number of polymorphic markers with uniform coverage of the entire genome, we obtain the following equation:

$$\mathrm{GD}(P_1,O) = (1 - p)\,\mathrm{GD}(P_1,P_2) \qquad [1]$$
where p denotes the proportion of the genome transmitted from P1 to O, either directly obtained from simulation studies or estimated with SSRs. Both GD and p are regarded as random variables and their estimates from molecular marker data are denoted by adding a hat.

Solving Eq. [1] for p yields the estimate

$$\hat{p} = 1 - \frac{\widehat{\mathrm{GD}}(P_1,O)}{\widehat{\mathrm{GD}}(P_1,P_2)} \qquad [2]$$

Given estimates of GD(P1,O) and GD(P1,P2) based on markers distributed throughout the genome, this formula can be used to estimate p. If the polymorphic markers between two unrelated parents are a random sample of the genome, then p̂ is an unbiased estimator of the true genome proportion. Similar formulas were given by Bernardo et al. (1997) on the basis of the number of common bands between P1 and O or the simple matching coefficient (Sneath and Sokal, 1973). Since the latter is based on single alleles without weighting multiple bands within a marker locus, we chose Rogers' distance for this study.
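For homozygous lines scored with one allele per locus, and assuming no missing data and only parental alleles in O, Rogers' distance reduces to the proportion of loci with different alleles, and Eq. [2] then equals the fraction of P1 × P2 polymorphic loci at which O carries the P1 allele. The Python sketch below illustrates this under that single-allele assumption; the function names and the toy triplet are hypothetical.

```python
from typing import List, Optional

Allele = Optional[int]  # one allele size per locus for a homozygous line; None = missing


def mismatch_gd(x: List[Allele], y: List[Allele]) -> float:
    """Proportion of commonly scored loci at which two homozygous lines differ."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return sum(a != b for a, b in pairs) / len(pairs)


def estimate_p(p1: List[Allele], p2: List[Allele], o: List[Allele]) -> float:
    """Eq. [2]: p_hat = 1 - GD(P1,O) / GD(P1,P2)."""
    gd_parents = mismatch_gd(p1, p2)
    if gd_parents == 0:
        raise ValueError("P1 and P2 do not differ at any scored locus")
    return 1.0 - mismatch_gd(p1, o) / gd_parents


# Toy triplet: 8 loci, the first 4 polymorphic between P1 and P2;
# O carries the P1 allele at 2 of these 4 loci.
P1 = [1, 1, 1, 1, 1, 1, 1, 1]
P2 = [2, 2, 2, 2, 1, 1, 1, 1]
O = [1, 2, 1, 2, 1, 1, 1, 1]
print(estimate_p(P1, P2, O))  # 0.5
```

Monomorphic loci enter both GDs equally and cancel in the ratio, which is why only the polymorphic loci determine p̂ in this simplified setting.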

In the absence of selection, p is a random variable whose distributional properties depend on (i) the degree of relatedness between P1 and O and (ii) the number and length of the chromosomes (Wang and Bernardo, 2000). If P1 and P2 are unrelated [f(P1,P2) = 0], then the expected value of p, µp, corresponds to the coancestry f(P1,O) and, thus, µp = 0.500, 0.750, and 0.875 for F2–, BC1–, and BC2–derived progeny lines of P1, respectively.
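Written compactly, with t denoting the number of backcrosses to P1 (t = 0 for F2–derived lines) and assuming unrelated parents and no selection, the expectations above follow from halving the P2 share of the genome with each backcross; selfing during inbreeding leaves the expectation unchanged:

```latex
\mu_p = f(P_1,O) = 1 - \left(\tfrac{1}{2}\right)^{t+1},
\qquad t = 0:\ 0.500, \quad t = 1:\ 0.750, \quad t = 2:\ 0.875
```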

Formulas for the variance of p for F2 or BC1–derived progeny lines were given by Wang and Bernardo (2000). In addition, numerical values for maize were obtained for F2–, BC1–, or BC2–derived progeny lines from stochastic simulations by Heckenberger et al. (2005, unpublished data). The simulations were based on a genetic model allowing for genetic drift, but neither selection nor mutation. Empirical and simulated frequency distributions of p values were compared with a Kolmogorov-Smirnov test (Lehmann, 1986) to check for significant deviations caused by selection or mutation. Equality of variances of empirical and simulated estimates of p was evaluated with Levene's test (Levene, 1960).
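As a rough illustration of the two tests, the sketch below computes only the test statistics: the two-sample Kolmogorov-Smirnov statistic D and Levene's W with deviations taken from group means, as in Levene (1960). P values are omitted; in practice one would use a statistics package, for instance R's `ks.test`, or `scipy.stats.ks_2samp` and `scipy.stats.levene` with `center='mean'` (SciPy centers on the median by default). The function names and the sample values are hypothetical.

```python
import statistics
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        t = min(xs[i], ys[j])
        # advance past all values <= t in both samples (handles ties)
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def levene_w(*groups: Sequence[float]) -> float:
    """Levene's (1960) W statistic using absolute deviations from group means."""
    k = len(groups)
    z = [[abs(v - statistics.mean(g)) for v in g] for g in groups]
    n = sum(len(g) for g in z)
    z_all = sum(sum(g) for g in z) / n
    z_means = [statistics.mean(g) for g in z]
    between = sum(len(g) * (m - z_all) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    if within == 0:
        raise ValueError("W is undefined when all within-group deviations are equal")
    return (n - k) / (k - 1) * between / within


# Hypothetical estimated vs. simulated p values
p_est = [0.41, 0.47, 0.49, 0.50, 0.52, 0.58]
p_sim = [0.30, 0.42, 0.50, 0.51, 0.60, 0.70]
print(ks_statistic(p_est, p_sim), levene_w(p_est, p_sim))
```

D compares the whole shape of the two distributions (location, skewness, kurtosis), whereas W targets only their spread, which is why the two tests can disagree, as reported for the p values in the Results.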

If a number of specific progeny lines are derived from a large number of biparental crosses between different parents P1 and P2 representative of a germplasm pool, then GD(P1,P2) can be regarded as a random variable with mean µGD and variance σ²GD. Since the value of p for a specific progeny is unrelated to the GD between its parent lines, GD(P1,P2) and p are stochastically independent. Thus, we obtain from Eq. [1] the following equations, using formulas given by Goodman (1960):

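$$\mu_{\mathrm{GD}(P_1,O)} = (1 - \mu_p)\,\mu_{\mathrm{GD}} \qquad [3]$$

$$\sigma^2_{\mathrm{GD}(P_1,O)} = \mu_{\mathrm{GD}}^2\,\sigma^2_p + (1 - \mu_p)^2\,\sigma^2_{\mathrm{GD}} + \sigma^2_p\,\sigma^2_{\mathrm{GD}} \qquad [4]$$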
where µGD(P1,O) and σ²GD(P1,O) are the mean and variance of GD(P1,O), respectively, for a given relationship between O and P1.

By inserting experimental estimates of µGD and σ²GD and estimates of µp and σ²p, determined either from computer simulations (Heckenberger et al., 2005, unpublished data) or from the formulas given by Wang and Bernardo (2000), we were able to calculate predicted values of σ²GD(P1,O) and compare them with estimated values for F2– or BC1–derived progeny lines from unrelated parents. Moreover, Eq. [4] permits comparison of the relative influence of σ²p and σ²GD on the variance of GD(P1,O) for F2– or BC1–derived progeny lines, which is important for the question of EDV thresholds. In addition, simulated GD(P1,O) values were calculated with Eq. [1] for each material group on the basis of simulated p values and estimates of GD(P1,P2) obtained from the observed GD(P1,P2) values of unrelated lines.
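Eq. [4] and the partition of σ²GD(P1,O) into its three terms can be sketched in Python as follows. The input values are hypothetical round numbers of plausible magnitude, not the estimates in Table 2, and the Monte Carlo check (normal draws, fixed seed) only illustrates that Goodman's product formula holds for independent p and GD(P1,P2).

```python
import random
import statistics
from typing import Dict, Tuple


def var_gd_progeny(mu_p: float, var_p: float, mu_gd: float,
                   var_gd: float) -> Tuple[float, Dict[str, float]]:
    """Eq. [4]: variance of GD(P1,O) = (1 - p) * GD(P1,P2), p and GD independent.

    Returns the total variance and the share of each of its three terms.
    """
    terms = {
        "sigma2_p": mu_gd ** 2 * var_p,            # variation in parental contribution
        "sigma2_GD": (1.0 - mu_p) ** 2 * var_gd,   # variation in parental GD
        "product": var_p * var_gd,                 # product term
    }
    total = sum(terms.values())
    return total, {name: value / total for name, value in terms.items()}


# Hypothetical inputs for F2-derived lines: mu_p = 0.5, sd(p) = 0.09,
# mean parental GD = 0.6, sd(GD) = 0.1
total, shares = var_gd_progeny(0.5, 0.09 ** 2, 0.6, 0.1 ** 2)
print(total, shares)

# Monte Carlo check of Eq. [4] with independent normal draws
rng = random.Random(42)
draws = [(1.0 - rng.gauss(0.5, 0.09)) * rng.gauss(0.6, 0.1) for _ in range(200_000)]
print(statistics.variance(draws))  # should be close to `total`
```

Because the σ²GD term is weighted by (1 – µp)², its share shrinks rapidly as µp rises from 0.5 (F2) to 0.75 (BC1), which is the mechanism behind the results reported for BC1–derived lines.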

Simulation Studies
In a companion study (Heckenberger et al., 2005, unpublished data), we performed simulations to obtain the approximate distribution of p. Briefly, each simulation of a breeding program started with crossing parents P1 and P2, which were assumed to be homozygous (i.e., inbreeding coefficient F = 1) and fully polymorphic at all marker loci [i.e., GD(P1,P2) = 1 and σ²GD = 0]. One heterozygous F1 individual was selfed to produce a segregating F2 population or was backcrossed to parent P1 to obtain a BC1 population. A randomly selected BC1 individual was again backcrossed to P1 to obtain a BC2 population. From the segregating populations, randomly selected individuals were advanced to homozygosity by single seed descent. Values of p were determined by dividing the number of loci homozygous for the P1 allele by the total number of loci monitored. Simulations of each breeding program were repeated 50000 times to reduce sampling effects and to obtain estimates of µp and σ²p with high numerical accuracy. The simulations were performed with the PLABSOFT software package (Maurer et al., 2004) to consider the effects of marker density and marker distribution (random vs. equal distribution) on the accuracy of GD(P1,O) estimates.
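The simulation procedure can be approximated with the pure-Python sketch below. It is not PLABSOFT: the genome model (10 chromosomes of 1.5 M, each with 10 evenly spaced markers, Haldane's mapping function without interference), the 500 replicates, the seed, and the function names are assumptions made for illustration. Allele 1 marks the P1 allele, and p is the proportion of marker loci fixed for it after single seed descent.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Haplotype = List[int]               # 1 = allele from P1, 0 = allele from P2
Plant = Tuple[Haplotype, Haplotype]


def recombination_fractions(n_chrom: int = 10, markers_per_chrom: int = 10,
                            chrom_length_morgan: float = 1.5) -> List[float]:
    """Recombination fractions between adjacent markers of a concatenated genome.

    Within a chromosome, evenly spaced map distances are converted with
    Haldane's mapping function; between chromosomes r = 0.5 (independent assortment).
    """
    spacing = chrom_length_morgan / (markers_per_chrom - 1)
    r_within = 0.5 * (1.0 - math.exp(-2.0 * spacing))
    r: List[float] = []
    for c in range(n_chrom):
        r.extend([r_within] * (markers_per_chrom - 1))
        if c < n_chrom - 1:
            r.append(0.5)
    return r


def gamete(plant: Plant, r: Sequence[float], rng: random.Random) -> Haplotype:
    """Sample one recombinant gamete from a diploid plant."""
    strand = rng.randrange(2)
    out = [plant[strand][0]]
    for locus, r_i in enumerate(r, start=1):
        if rng.random() < r_i:  # crossover between locus - 1 and locus
            strand = 1 - strand
        out.append(plant[strand][locus])
    return out


def derive_line(scheme: str, r: Sequence[float], rng: random.Random) -> float:
    """Derive one inbred line from an F2, BC1, or BC2 population; return p."""
    n_loci = len(r) + 1
    p1: Haplotype = [1] * n_loci
    f1: Plant = (p1, [0] * n_loci)
    if scheme == "F2":
        plant: Plant = (gamete(f1, r, rng), gamete(f1, r, rng))  # F1 selfed
    elif scheme in ("BC1", "BC2"):
        plant = (gamete(f1, r, rng), p1)                        # F1 x P1
        if scheme == "BC2":
            plant = (gamete(plant, r, rng), p1)                 # BC1 plant x P1
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    while plant[0] != plant[1]:                                  # single seed descent
        plant = (gamete(plant, r, rng), gamete(plant, r, rng))
    return sum(plant[0]) / n_loci


def simulate_p(scheme: str, n_rep: int = 500, seed: int = 2005) -> List[float]:
    """Simulated p values for n_rep independently derived lines."""
    rng = random.Random(seed)
    r = recombination_fractions()
    return [derive_line(scheme, r, rng) for _ in range(n_rep)]


results: Dict[str, List[float]] = {s: simulate_p(s) for s in ("F2", "BC1", "BC2")}
for scheme, ps in results.items():
    print(scheme, round(statistics.mean(ps), 3), round(statistics.variance(ps), 4))
```

The means should approach 0.500, 0.750, and 0.875; the variances depend strongly on the assumed chromosome number and length, consistent with Wang and Bernardo (2000).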

Threshold Scenarios
To increase the sample size, all GD values in the data set between line pairs with f values of 0.5, 0.75, and 0.875 (as expected for F2–, BC1–, and BC2–derived progeny lines) were used, in addition to the GD values obtained within triplets, to evaluate potential thresholds (T). The frequency distributions of empirical GD(P1,O) values for F2–, BC1–, or BC2–derived progeny lines were approximated by beta distributions (Johnson et al., 1995), with parameters chosen such that the mean and variance of the original distribution were conserved. On the basis of these distributions, we calculated Type I (α) and Type II (β) errors for various EDV thresholds and various types of populations. Here, α corresponds to the probability that a true IDV will be wrongly judged as EDV, whereas β corresponds to the probability that a true EDV will not be recognized as such but judged as IDV (Fig. 1). First, we consider the situation in which an F2–derived progeny is regarded as IDV, but a BC1–derived progeny as EDV. Second, we assume that a BC1–derived progeny is regarded as IDV, but a BC2–derived progeny as EDV.



Fig. 1. Cumulative frequency distributions of GD(P1,O) for F2–, BC1–, and BC2–derived progenies based on simulated data assuming GD(P1,P2) = 1. Type I (α) and Type II (β) errors refer to a threshold of GD = 0.37 for discriminating F2– vs. BC1–derived progeny lines.

 
The SSR- or restriction fragment length polymorphism-based GD values of 0.25, 0.20, 0.15, and 0.10 were suggested as possible EDV thresholds (T values) by the American Seed Trade Association (Smith and Smith, 1989), the International Association of Plant Breeders for the Protection of Plant Varieties (ASSINSEL, 2000), the Chambre Syndicale des Entreprises Françaises de Semences de Maïs (SEPROMA) (Leipert, 2003, personal communication), and Troyer and Rocheford (2002), respectively. For all thresholds, the corresponding α and β values were calculated for homozygous progeny lines derived from F2, BC1, and BC2 populations. In addition, thresholds with fixed α = 0.05 (T0.05) or with α = β (Tα=β) were evaluated; the latter balances the Type I and Type II errors, that is, the risk of declaring a line obtained by an accepted procedure an EDV vs. the risk of declaring a true EDV an IDV.
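The threshold calculations can be sketched as follows, assuming that a line is judged an EDV when GD(P1,O) falls below T, so that α is the lower-tail probability of the IDV distribution (e.g., F2–derived lines) and β the upper-tail probability of the EDV distribution (e.g., BC1–derived lines). Beta parameters are obtained by the method of moments, and the beta CDF is computed by Simpson integration only to keep the example free of third-party packages (`scipy.stats.beta` would be the usual choice). The means and standard deviations are hypothetical.

```python
import math
from typing import Callable, Tuple


def beta_params(mean: float, sd: float) -> Tuple[float, float]:
    """Method-of-moments Beta(a, b) preserving the given mean and variance."""
    common = mean * (1.0 - mean) / sd ** 2 - 1.0
    if common <= 0:
        raise ValueError("variance too large for a beta distribution")
    return mean * common, (1.0 - mean) * common


def beta_cdf(x: float, a: float, b: float, n: int = 2000) -> float:
    """Beta(a, b) CDF by composite Simpson's rule (n even; assumes a, b > 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(t: float) -> float:
        if t <= 0.0 or t >= 1.0:
            return 0.0  # density vanishes at the bounds when a, b > 1
        return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))

    h = x / n
    s = pdf(0.0) + pdf(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return s * h / 3.0


def bisect(f: Callable[[float], float], lo: float = 0.0, hi: float = 1.0,
           tol: float = 1e-7) -> float:
    """Root of an increasing function f on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def errors(t: float, idv: Tuple[float, float],
           edv: Tuple[float, float]) -> Tuple[float, float]:
    """Type I error alpha = P(IDV GD < T); Type II error beta = P(EDV GD >= T)."""
    return beta_cdf(t, *idv), 1.0 - beta_cdf(t, *edv)


def alpha_minus_beta(t: float) -> float:
    a_err, b_err = errors(t, f2, bc1)
    return a_err - b_err


# Hypothetical GD(P1,O) distributions: F2-derived (IDV) vs. BC1-derived (EDV) lines
f2 = beta_params(0.30, 0.07)
bc1 = beta_params(0.15, 0.05)

t_05 = bisect(lambda t: errors(t, f2, bc1)[0] - 0.05)  # T0.05: alpha = 0.05
t_eq = bisect(alpha_minus_beta)                         # T(alpha = beta)
for label, t in (("T0.05", t_05), ("Talpha=beta", t_eq), ("T=0.25", 0.25), ("T=0.20", 0.20)):
    a_err, b_err = errors(t, f2, bc1)
    print(f"{label}: T={t:.3f} alpha={a_err:.3f} power={1 - b_err:.3f}")
```

Bisection works here because α increases and β decreases monotonically in T, so both α – 0.05 and α – β are increasing functions of T on [0, 1].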


    RESULTS
Genetic Variation for Simple Sequence Repeats
A total of 1099 SSR alleles was observed with the 100 SSRs on the set of 220 inbred lines. The number of alleles per marker ranged from 3 to 25, with a mean of 11. The PIC values ranged from 0.10 to 0.88, with a mean of 0.71. Only 3.7% of all marker data points were missing because of amplification failure or null alleles. Approximately 4% of all bands observed in progeny lines were nonparental and were excluded from calculation of GD values in the respective triplets.

Correlations between GD and 1 – f were highly significant (P < 0.01) for all three material groups, being highest for dent lines (r = 0.90, P < 0.01), intermediate for flint lines (r = 0.75, P < 0.01), and lowest for introgression lines (r = 0.58, P < 0.01). In addition, we observed linear but not quadratic relationships between f and GD for all three material groups.

Parental Contributions for F2– and BC1–derived Progenies
The three material groups did not differ from each other in their mean parental contributions p for either F2– or BC1–derived progeny lines. Therefore, the data from all three groups were pooled for further analyses. For F2–derived progenies, SSR-based estimates of p ranged from 0.25 to 0.74, with a mean of 0.49 (Fig. 3), close to the expectation of 0.50. Variances of estimated and simulated values of p did not differ significantly at the 0.05 probability level (Table 2). Frequency distributions of estimated and simulated values of p differed significantly (P < 0.05) because of the higher kurtosis of the former.



Fig. 2. Standard errors of pairwise genetic distances (GDs) and their corresponding GD values observed for 220 maize inbred lines with 100 simple sequence repeats.

 

Table 2. Estimated and simulated or calculated means and variances inserted in Eq. [1] and Eq. [4] to calculate the proportion of σ²GD(P1,O) attributable to σ²GD and σ²p.

 
The SSR-based estimates of p for BC1–derived progenies varied from 0.51 to 0.80, with a mean of 0.66, which was significantly smaller than the expectation of 0.75 (Fig. 3). Variances of estimated and simulated values of p did not differ significantly at the 0.05 probability level (Table 2). Frequency distributions of estimated and simulated values of p differed significantly (P < 0.01) because the distribution of observed values was shifted toward smaller values and had lower skewness and higher kurtosis.



Fig. 3. Histograms of estimated (columns) and simulated (curve) parental contributions (p) for F2– and BC1–derived progeny lines. Means of estimated and simulated [µ(p-sim)] values are indicated by dotted and dashed lines, respectively.

 
Genetic Distances among Unrelated Parental Inbred Lines
The GDs among unrelated [f(P1,P2) = 0] flint lines ranged from 0.23 to 0.79, with a mean of 0.58 (Fig. 4). The GDs for unrelated dent lines varied from 0.25 to 0.85, with a significantly (P < 0.01) larger mean of 0.61. Unrelated parents of introgression lines, consisting of pairs of European and U.S. maize lines, had by far the widest range, from 0.22 to 0.93, and also a significantly higher (P < 0.01) mean of 0.74 than the intrapool pairs of the other two material groups.

Subdivision of the Variance of GD(P1,O) for F2– and BC1–derived Progenies
Observed values of σ²GD(P1,O) obtained from experimental data were in close agreement with the predicted values calculated with Eq. [4] on the basis of simulated values of µp and σ²p as well as experimental estimates of µGD and σ²GD (Table 2). Further analysis revealed that for F2–derived progenies, 65% of σ²GD(P1,O) could be explained by σ²p and 34% by σ²GD. For BC1–derived progenies, 92% of σ²GD(P1,O) was explained by σ²p and only 8% by σ²GD. The contribution of the product σ²p σ²GD to σ²GD(P1,O) was approximately 1% for both F2– and BC1–derived progeny lines. Thus, the proportion of σ²GD(P1,O) explained by σ²p increases in advanced BC generations, and σ²GD(P1,O) can be regarded as nearly independent of σ²GD for BC progenies, but not for F2–derived progeny lines.

Evaluation of Essentially Derived Variety Threshold Scenarios
Observed frequency distributions of GD values for F2–, BC1–, and BC2–derived progenies fitted the approximating beta distributions well for flint and dent lines, but only moderately for introgression lines (Fig. 5). For all three material groups, considerable overlaps were observed between the frequency distributions of GDs for F2– vs. BC1– as well as for BC1– vs. BC2–derived progenies. Within each generation, the mean GD was significantly higher (P < 0.05) for the dent lines than for the flint lines (Fig. 5). In addition, the mean GD for the introgression lines was always significantly higher (P < 0.01) than that for the flint and dent lines. Estimates of the variance of GD within the same generation did not differ significantly at the 0.01 probability level between flint and dent lines but were significantly larger (P < 0.01) for introgression lines.



Fig. 4. Histograms of Rogers' genetic distances calculated from simple sequence repeat data between unrelated (f = 0) European flint lines, European dent lines, and parents of introgression lines. Variables n, µ, and SD refer to the number of values, arithmetic mean, and the standard deviation of GD values for the particular distribution, respectively. Means are indicated by solid lines.

 


Fig. 5. Cumulative histograms (columns) and approximated beta distributions (curves) for genetic distances based on 100 simple sequence repeats for F2–, BC1–, and BC2–derived progenies. Variables n, µ, and SD refer to the number of values, arithmetic mean, and the standard deviation of genetic distance values for the particular distribution, respectively.

 
Given α = 0.05 for F2–derived lines, the power 1 – β to classify a BC1–derived progeny line correctly as EDV amounted to 77, 63, and 15% for the thresholds determined for flint, dent, and introgression lines, respectively (Table 3). Corresponding values of 1 – β for BC2–derived lines, assuming α = 0.05 for BC1–derived lines, were lower for flint and dent lines but higher for introgression lines. For thresholds determined with α = β, the power 1 – β to classify BC1– or BC2–derived progenies as EDVs increased considerably compared with that for α = 0.05. This increase in power, however, comes at the cost of higher α values and, therefore, a considerably higher frequency of F2– or BC1–derived progenies incorrectly classified as EDVs.


Table 3. Evaluation of the Type I (α) and Type II (β) errors for different EDV thresholds (T) based on observed and simulated data, together with the average standard error of observed GDs at the particular threshold level.

 
For T = 0.25, 0.20, or 0.15, the corresponding α levels for F2–derived lines ranged from 0.00 to 0.18 (Table 3). Corresponding values of 1 – β ranged from 7 to 92%. For T = 0.15 and T = 0.10, the power 1 – β to detect a BC2–derived line as EDV varied from 10 to 99%, with corresponding α values for BC1–derived lines ranging from 0.02 to 0.07. For each T, substantial differences in α and 1 – β were observed among flint, dent, and introgression lines.

For α = 0.05 and α = β, T values obtained from simulated data were lower than those from observed data, with the exception of α = 0.05 for introgression lines (Table 3). For all these scenarios, the power 1 – β to classify BC1– or BC2–derived progeny lines as EDVs was similar for thresholds based on observed and on simulated values of GD(P1,P2) for both flint and dent lines. For introgression lines, however, 1 – β was substantially higher for T values based on simulated data than for those based on observed data. Considerable differences between observed and simulated data also existed for α and 1 – β at T = 0.25, 0.20, 0.15, and 0.10.


    DISCUSSION
Our study was initiated by commercial breeding companies to provide guidelines for the derivation of thresholds for EDVs in maize based on scientifically reliable criteria, as requested by UPOV and ASSINSEL. Representative germplasm for each material group was taken from public and private breeding programs. The SSRs were chosen as a suitable marker system due to their known map positions, high degree of polymorphism, and suitability for automated high-throughput analyses. Our results are therefore relevant for conceivable essential derivation scenarios in European maize germplasm and the power of SSRs for discriminating different types of progeny lines.

In our opinion, EDV thresholds should in principle be defined on the basis of accepted vs. unaccepted breeding procedures. This is in contrast to the approach in some crops such as ryegrass (Lolium spp.) (International Seed Federation, 2002b) or lettuce (Lactuca sativa L.) (B. Vosman, 2003, personal communication), where EDV thresholds based on percentiles of the distribution of GD values in a reference set of current germplasm are discussed. Once thresholds based on scientifically reliable criteria are defined, whether a specific line is a putative EDV can be decided on the basis of GD estimates, because pedigree relationships are initially unavailable. Hence, we investigated the possibility of uncovering certain pedigree relationships with molecular markers and explored the consequences of the EDV thresholds suggested hitherto with regard to commonly used breeding procedures.

Use of SSR-Based GDs for Identification of EDVs
The rationale for using SSR-based GD estimates to identify EDVs is their relationship to f. In addition, GDs provide a better estimate of the true genome proportion p than the probabilistic value of f. Correlations between GDs and 1 – f, calculated separately for each material group and across the entire data set (r = 0.77, P < 0.01), were similar to or higher than those reported in previous studies with maize (Lübberstedt et al., 1999; Pejic et al., 1998). This reflects the broad germplasm base in this study, which ranged from unrelated to closely related combinations of lines. Moreover, the linear relationship observed between GD and f in the present data confirmed that SSR-based GDs faithfully reflect the genetic diversity of the germplasm. Despite these high correlations, GD values varied considerably for the same f value, so the frequency distributions of GDs overlapped for f = 0.50, 0.75, and 0.88. Therefore, F2–, BC1–, and BC2–derived progenies could not be distinguished unambiguously by their GD(P1,O).
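That a high overall correlation between GD and 1 – f is compatible with overlapping GD ranges at fixed f can be illustrated with a short Python sketch. The f and GD values below are hypothetical, chosen only for illustration; they are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranges_overlap(a, b):
    """True if the value ranges of samples a and b intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


# Hypothetical line pairs: f values (as used in the text) and SSR-based GD(P1,O)
f = [0.50, 0.50, 0.50, 0.75, 0.75, 0.75, 0.875, 0.875, 0.875]
gd = [0.40, 0.30, 0.22, 0.25, 0.18, 0.12, 0.15, 0.08, 0.05]

r = pearson_r([1.0 - v for v in f], gd)
f2_like = [g for g, v in zip(gd, f) if v == 0.50]
bc1_like = [g for g, v in zip(gd, f) if v == 0.75]
print(round(r, 2), ranges_overlap(f2_like, bc1_like))
```

Here r is above 0.8, yet the GD ranges of the f = 0.50 and f = 0.75 groups still intersect, so a single GD value cannot assign every line to its group.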

Factors Influencing p and GD(P1,O)
According to Eq. [1], GD(P1,O) is affected by three factors: the true but unobservable distribution of the genome contribution p, the accuracy with which p is estimated by GDs based on molecular markers, and the parental genetic distance GD(P1,P2). In the ideal case, in which unrelated parents [f(P1,P2) = 0] show a GD of 1.0 for a set of markers uniformly covering the entire genome, GD(P1,O) yields an estimate of 1 – p, which theoretically gives the highest ability to discriminate between different types of progeny (Heckenberger et al., 2005, unpublished data). Even in this most favorable case, simulations showed overlaps between the frequency distributions of F2– and BC1–derived lines, as well as between those of BC1– and BC2–derived lines (Fig. 1).
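The ideal case can be sketched by simulation in Python. This is not the simulation of the companion study. It assumes independently segregating genome segments (no linkage map), single-seed descent to homozygosity without selection, and the multiplicative form GD(P1,O) = (1 – p)·GD(P1,P2). That form is consistent with the statement that GD(P1,O) estimates 1 – p when GD(P1,P2) = 1, but it is our assumption rather than a reproduction of Eq. [1]. The parameter `n_segments` is hypothetical: fewer independent segments mimic tighter linkage and widen the distributions.

```python
import random


def expected_p(n_backcrosses):
    """Expected P1 genome share: 0.5 (F2), 0.75 (BC1), 0.875 (BC2)."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


def simulate_p(n_backcrosses, n_segments, rng):
    """P1 share of one homozygous line; each segment is fixed for P1
    with probability expected_p (independent segments, no selection)."""
    prob_p1 = expected_p(n_backcrosses)
    return sum(rng.random() < prob_p1 for _ in range(n_segments)) / n_segments


def simulate_gd_p1_o(n_backcrosses, n_lines, n_segments=20, gd_parents=1.0, seed=1):
    """Simulated GD(P1,O), assuming GD(P1,O) = (1 - p) * GD(P1,P2)."""
    rng = random.Random(seed)
    return [(1.0 - simulate_p(n_backcrosses, n_segments, rng)) * gd_parents
            for _ in range(n_lines)]


# Ideal case: GD(P1,P2) = 1, so GD(P1,O) equals 1 - p
f2 = simulate_gd_p1_o(0, 2000, seed=1)
bc1 = simulate_gd_p1_o(1, 2000, seed=2)
bc2 = simulate_gd_p1_o(2, 2000, seed=3)
print("F2/BC1 ranges overlap:", max(bc1) > min(f2))
print("BC1/BC2 ranges overlap:", max(bc2) > min(bc1))
```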

The range of GDs between unrelated lines (0.2 to 0.9) was wider than reported in previous studies (Messmer et al., 1993) and suggested that some lines assumed to be unrelated were in fact related. However, a careful re-examination of the pedigree data revealed no documented common ancestor for the corresponding lines. Furthermore, outlier tests revealed no outliers in the corresponding distributions. Despite their strong influence on the range of GDs, the few low GD values hardly affected the mean and variance of GDs between unrelated lines, owing to the large number of line pairs analyzed in this study (Fig. 3).

Means and variances of the distributions of estimated p values for F2–derived progenies were in close agreement with those of the simulated p values. However, p for BC1–derived progenies was substantially lower than expected (Table 2). This shift toward the distribution of F2–derived progenies is very likely attributable to the selection of the most vigorous BC1 plants during the development of improved progeny lines. Because of heterosis, such BC1 plants tend to be more heterozygous and consequently carry a higher-than-average proportion of donor genome. This selection for more heterozygous plants would increase the overlap in the frequency distributions of GDs between F2– and BC1–derived lines, or between BC1– and BC2–derived lines, compared with the simulated data shown in Fig. 1.
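The proposed explanation, that selecting vigorous and hence more heterozygous BC1 plants lowers p, can be checked with a small Python simulation. It is a hypothetical sketch, not an analysis from this study. It treats heterozygosity as a stand-in for vigor, assumes independent genome segments, and uses an arbitrary selection intensity (`keep_fraction`). The size of the resulting shift depends entirely on these assumptions.

```python
import random


def bc1_plant(n_segments, rng):
    """One BC1 plant: True = heterozygous (P1/P2), False = P1/P1, each with prob. 0.5."""
    return [rng.random() < 0.5 for _ in range(n_segments)]


def p_after_selfing(plant, rng):
    """P1 share of an inbred derived from the plant by selfing to homozygosity:
    P1/P1 segments stay P1; heterozygous segments fix P1 with prob. 0.5."""
    p1 = sum(1 for het in plant if not het or rng.random() < 0.5)
    return p1 / len(plant)


def mean_p(keep_fraction, n_plants=2000, n_segments=20, seed=7):
    """Mean p of lines derived from the most heterozygous BC1 plants."""
    rng = random.Random(seed)
    plants = [bc1_plant(n_segments, rng) for _ in range(n_plants)]
    plants.sort(key=sum, reverse=True)  # hypothetical vigor proxy: heterozygosity
    kept = plants[: max(1, int(keep_fraction * n_plants))]
    return sum(p_after_selfing(pl, rng) for pl in kept) / len(kept)


print("no selection:", round(mean_p(1.0), 3))   # close to the expectation 0.75
print("top 10 %:", round(mean_p(0.1), 3))        # shifted toward the F2 level of 0.5
```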

Deviating from the ideal case, the GD(P1,P2) between unrelated lines was <1.0 and showed a considerable variance (σ²GD). This leads to compressed and flatter frequency distributions of GD(P1,O) values for F2–, BC1–, and BC2–derived progenies and, therefore, to a further increase in the overlaps. The magnitude of the overlaps is determined mainly by the mean and variance (σ²GD) of GDs between unrelated lines. Because the level of genetic diversity in breeding germplasm differs among crops, these two parameters vary considerably among crop species. For example, the GDs between unrelated barley (Melchinger et al., 1994) or tomato cultivars (Grandillo et al., 1999) were substantially lower than those in maize (Messmer et al., 1993). This underlines the necessity of crop-specific thresholds for discriminating between EDVs and IDVs.
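Why a mean GD(P1,P2) below 1.0 and a nonzero σ²GD increase the overlaps can be seen from the moments of a product. Assume (our assumption, not a restatement of Eq. [1]) that GD(P1,O) = (1 – p)·GD(P1,P2) with p and GD(P1,P2) independent. The mean of GD(P1,O) is then E[1 – p]·E[GD(P1,P2)], and its variance is Var(1 – p)·σ²GD + Var(1 – p)·E[GD(P1,P2)]² + σ²GD·E[1 – p]². The Python sketch below uses hypothetical parameter values to compare a standardized separation between F2– and BC1–derived progeny for the ideal case and for a nonideal case.

```python
from math import sqrt


def product_moments(mean_x, var_x, mean_y, var_y):
    """Mean and variance of X * Y for independent X and Y."""
    mean = mean_x * mean_y
    var = var_x * var_y + var_x * mean_y ** 2 + var_y * mean_x ** 2
    return mean, var


def separation(moments_a, moments_b):
    """Difference in means divided by the pooled SD (larger = less overlap)."""
    (mean_a, var_a), (mean_b, var_b) = moments_a, moments_b
    return (mean_a - mean_b) / sqrt(var_a + var_b)


# Hypothetical (mean, variance) of 1 - p for F2- and BC1-derived lines
F2, BC1 = (0.50, 0.010), (0.25, 0.008)

# (label, mean GD(P1,P2), variance of GD(P1,P2)); values are hypothetical
for label, gd_mean, gd_var in [("ideal", 1.0, 0.0), ("nonideal", 0.6, 0.01)]:
    f2 = product_moments(*F2, gd_mean, gd_var)
    bc1 = product_moments(*BC1, gd_mean, gd_var)
    print(label, round(separation(f2, bc1), 2))
```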

Consequences of Various EDV Thresholds
For fixed T = 0.25, 0.20, 0.15, and 0.10, the Type I error α and Type II error β differed substantially among the three material groups. Further analyses revealed that pooling flint and dent data would significantly increase the proportion of flint lines among the lines classified as EDVs (data not shown). Moreover, a joint threshold for intrapool and interpool progenies would result in a substantially greater risk of developing an EDV from intrapool than from interpool crosses. Consequently, a pool-specific approach is fairer in terms of α and β than universal GD thresholds.
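The unequal risk under a joint threshold can be illustrated with normal approximations in Python, using the standard-library `statistics.NormalDist` class. The pool means and standard deviations below are hypothetical. They encode only that intrapool parents are more closely related, so their F2–derived progeny tend to have lower GD(P1,O).

```python
from statistics import NormalDist

# Hypothetical normal approximations of GD(P1,O) for F2-derived (accepted) lines
POOLS = {
    "intrapool": NormalDist(mu=0.28, sigma=0.07),
    "interpool": NormalDist(mu=0.40, sigma=0.08),
}


def alpha_at(t, dist):
    """Type I error: chance that an accepted line has GD(P1,O) <= t."""
    return dist.cdf(t)


def pool_specific_threshold(dist, alpha=0.05):
    """Threshold giving the same Type I error alpha in every pool."""
    return dist.inv_cdf(alpha)


joint_t = 0.20
for name, dist in POOLS.items():
    print(f"{name}: alpha at joint T = {alpha_at(joint_t, dist):.3f}, "
          f"pool-specific T(0.05) = {pool_specific_threshold(dist):.3f}")
```

Under these assumed parameters the joint T = 0.20 gives intrapool progeny a Type I error roughly twenty times that of interpool progeny, whereas pool-specific thresholds hold α at 0.05 in both pools.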

Thresholds calculated for simulated GD(P1,O) values were generally lower than those for observed data. This can be partially explained by the occurrence of nonparental alleles. The most probable reason for this shift, however, is that the simulated GD values were based on the mean and variance (σ²GD) of all pairwise distances between unrelated (f = 0) lines within a material group. Breeders often prefer genetically diverse inbred lines within a gene pool as parents for recycling breeding, so the parental lines used in breeding programs may not be a random sample of all unrelated lines of a germplasm pool. When the simulated GD values were regenerated using the mean and variance of estimated GDs among only those unrelated (f = 0) inbreds actually used as parental lines in this study, rather than among all pairs of inbreds with f = 0, the thresholds shifted toward the corresponding experimental values.
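The effect of drawing GD(P1,P2) from all unrelated pairs versus from the more diverse pairs actually used as parents can be sketched in Python. This is an illustration under stated assumptions, not the simulation used in this study. The sketch takes p of F2–derived lines from independent segments, assumes GD(P1,O) = (1 – p)·GD(P1,P2), and draws GD(P1,P2) from a normal distribution clipped to [0, 1]. The means and SD are hypothetical. T at α = 0.05 is taken as the 5% quantile of the simulated F2–derived values.

```python
import random


def simulated_t(mean_gd_parents, sd_gd_parents, alpha=0.05,
                n_lines=5000, n_segments=20, seed=3):
    """alpha-quantile of simulated GD(P1,O) for F2-derived lines."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_lines):
        # P1 share of an F2-derived homozygous line (independent segments)
        p = sum(rng.random() < 0.5 for _ in range(n_segments)) / n_segments
        # Parental GD drawn from the assumed reference distribution, clipped to [0, 1]
        gd_parents = min(1.0, max(0.0, rng.gauss(mean_gd_parents, sd_gd_parents)))
        values.append((1.0 - p) * gd_parents)
    values.sort()
    return values[int(alpha * n_lines)]


# Same seed in both calls, so the comparison uses common random numbers
t_all_pairs = simulated_t(0.60, 0.08)      # hypothetical: all unrelated pairs in a pool
t_parent_pairs = simulated_t(0.70, 0.08)   # hypothetical: more diverse pairs used as parents
print(round(t_all_pairs, 3), round(t_parent_pairs, 3))
```

A higher mean GD among the parental pairs moves the whole simulated distribution, and hence T, upward.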

Precision of GDs and Number of Markers Required
Apart from Type I and Type II errors, the robustness of GD(P1,O) values to the addition, substitution, or removal of markers is an important factor to consider when developing appropriate thresholds. Standard errors (SEs) of GD estimates based on the 100 SSR markers ranged from 0.01 to 0.06 and were considerable across all scenarios and material groups (Fig. 2). However, SEs decreased with decreasing GDs and thus were not independent of the GD estimates themselves.

Empirical 95% confidence intervals (CIs) for GDs were wider than desirable. For example, the 95% CIs at the threshold value T0.05 ranged from 0.13 to 0.29 (F2 vs. BC1 for flint lines) and from 0.04 to 0.14 (BC1 vs. BC2 for flint lines). Thus, a sample of 100 SSRs seems to be at the lower limit for identification of EDVs, because high SEs of GDs increase the Type I and Type II errors. Hence, we recommend a two-stage procedure for identifying EDVs with SSRs: a set of 100 SSRs uniformly distributed across the genome is analyzed first, and if there are doubts about the EDV status of a new line, a second set of 100 or more SSR markers is analyzed subsequently.
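The width of such intervals, and one possible way to operationalize the two-stage procedure, can be sketched in Python. Resampling markers with replacement (a nonparametric bootstrap) is our assumption here. So is treating markers as independent units, which linked markers violate. The rule that a line is "doubtful" when its 95% interval contains T is likewise a hypothetical choice, not the authors' stated criterion. Per-marker distances are 0 or 1, as for the Rogers' distance between homozygous lines carrying single alleles.

```python
import random
import statistics


def bootstrap_gd(per_marker_d, n_boot=2000, seed=11):
    """GD as the mean per-marker distance, with bootstrap SE and 95% percentile CI."""
    rng = random.Random(seed)
    m = len(per_marker_d)
    gd = sum(per_marker_d) / m
    # Resample markers with replacement and recompute GD for each replicate
    reps = sorted(sum(rng.choices(per_marker_d, k=m)) / m for _ in range(n_boot))
    se = statistics.stdev(reps)
    ci = (reps[int(0.025 * n_boot)], reps[int(0.975 * n_boot) - 1])
    return gd, se, ci


def two_stage_call(ci, t):
    """Stage-1 decision; an interval containing t triggers stage 2."""
    low, high = ci
    if high <= t:
        return "putative EDV"
    if low > t:
        return "not an EDV"
    return "doubtful: genotype a second set of 100 or more SSRs"


# Hypothetical line pair: 20 of 100 SSRs differ between IV and candidate line
d = [1.0] * 20 + [0.0] * 80
gd, se, ci = bootstrap_gd(d)
print(gd, round(se, 3), ci, two_stage_call(ci, t=0.20))
```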

Given an upper limit for the SE of GDs, one can calculate the necessary number of markers depending on the mean number of alleles per marker (Foulley and Hill, 1999). In addition, Heckenberger et al. (2005, unpublished data) used simulations to investigate the size of SEs of GDs as a function of marker density. Fixing the upper bound of the SE of GDs at 0.01 would require substantially more SSRs. For example, a minimum of 260 SSRs would be required to reduce the average SE to 0.01 at a GD level of 0.20. Alternatively, the SE of GDs could be reduced by choosing SSR markers with a higher degree of polymorphism. The effective number of alleles (ne) in our study was 4.2. If it could be doubled to ne = 8.4 by an appropriate choice of highly polymorphic SSR markers, only approximately 120 SSRs would be required to reduce the average SE to 0.01 at a GD level of 0.20. Because such a high degree of polymorphism is unrealistic for markers distributed uniformly across the maize genome, other types of markers seem more promising, such as single nucleotide polymorphisms, which can be determined in large numbers by high-throughput techniques.
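Two quantities in this paragraph can be computed directly. The Python sketch below uses the usual definition of the effective number of alleles at one marker, ne = 1/Σpi², where pi is the frequency of the ith allele. It also scales the number of markers under the assumption that the SE of a GD declines with the square root of the number of markers, as for a mean of independent per-marker distances. This simple scaling ignores linkage and the allele-number dependence described by Foulley and Hill (1999), so it does not reproduce the simulation-based figures quoted above. The SE inputs are hypothetical.

```python
from math import ceil


def effective_alleles(freqs):
    """Effective number of alleles at one marker: 1 / sum of squared allele frequencies."""
    return 1.0 / sum(p * p for p in freqs)


def markers_required(se_current, n_current, se_target):
    """Markers needed to reach se_target if SE scales as 1/sqrt(number of markers)."""
    return ceil(n_current * (se_current / se_target) ** 2)


print(effective_alleles([0.25, 0.25, 0.25, 0.25]))   # 4.0: four equally frequent alleles
print(effective_alleles([0.7, 0.1, 0.1, 0.1]))       # same allele count, lower n_e
print(markers_required(se_current=0.02, n_current=100, se_target=0.01))   # 400
```

Under this scaling, halving the SE requires four times as many markers, which is why raising polymorphism per marker is an attractive alternative to adding markers.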

A further aspect of the accuracy of GD estimates is the correlation between true and marker-estimated genome proportions and GDs. In our companion study (Heckenberger et al., 2005, unpublished data), we investigated this correlation as a function of marker density and marker distribution. For maize genome parameters, and assuming fully polymorphic parental lines [GD(P1,P2) = 1], the correlation was r = 0.95 for the marker set used in this study, but it fell below 0.90 at the level of polymorphism observed among unrelated lines in our study.

Appropriate Distance Measures
It is desirable that the GDs between the progeny and either parent add up to the GD between the parental lines (Melchinger, 1993). Of all commonly used GD measures, this criterion generally holds only for the Rogers' (1972) and the Nei and Li (1979) distances. In addition, a linear relationship to f is desired, which both GD measures fulfill. Coefficients such as Jaccard (1908) or simple matching (Sneath and Sokal, 1973) are based on single bands, irrespective of the marker locus to which they belong; heterozygous loci are thus overweighted. In contrast, the Rogers' distance is based on the allele frequencies at each marker, so every marker receives the same total weight, and each of the two alleles at a heterozygous marker counts half as much as the single allele at a homozygous marker. As the proportion of heterozygous loci may be substantial even in inbred lines in advanced selfing generations (Heckenberger et al., 2002), we recommend the Rogers' distance for identification of EDVs with SSRs. Alternatively, one might consider a distance measure that takes the map positions and linkage of the markers into account, as suggested by Dillmann et al. (1997).
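The additivity property and the frequency-based weighting of the Rogers' distance can be checked with a short Python sketch. Each line is represented, as a hypothetical data layout, by one dictionary per marker that maps allele names to within-line frequencies (1.0 for a homozygous marker, 0.5 for each allele at a heterozygous one). The per-marker distance is the square root of half the summed squared frequency differences, averaged over markers.

```python
from math import sqrt


def rogers_distance(line_a, line_b):
    """Rogers' (1972) distance averaged over markers.

    line_a, line_b: lists with one dict per marker, allele -> frequency.
    """
    total = 0.0
    for locus_a, locus_b in zip(line_a, line_b):
        alleles = set(locus_a) | set(locus_b)
        sq = sum((locus_a.get(x, 0.0) - locus_b.get(x, 0.0)) ** 2 for x in alleles)
        total += sqrt(0.5 * sq)
    return total / len(line_a)


# Hypothetical parents and a progeny line carrying only parental alleles;
# the last marker is still heterozygous in the progeny.
p1 = [{"a": 1.0}, {"c": 1.0}, {"e": 1.0}, {"g": 1.0}]
p2 = [{"b": 1.0}, {"c": 1.0}, {"f": 1.0}, {"h": 1.0}]
o = [{"a": 1.0}, {"c": 1.0}, {"f": 1.0}, {"g": 0.5, "h": 0.5}]

d12, d1o, d2o = rogers_distance(p1, p2), rogers_distance(p1, o), rogers_distance(p2, o)
print(d12, d1o, d2o, d1o + d2o)   # 0.75 0.375 0.375 0.75
```

The heterozygous marker contributes 0.5 to each parent-progeny distance, so the two distances still sum to GD(P1,P2). If the progeny carried a nonparental allele at a marker where the parents differ, that marker would contribute 1.0 to both distances and the sum would exceed GD(P1,P2).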


    CONCLUSIONS
 
Our results showed that SSR-based GDs can help distinguish between progenies derived from F2, BC1, or BC2 source populations, but this discrimination is associated with a considerable error rate because of the overlaps in the distributions of estimated p and GD(P1,O). Nevertheless, we see clear advantages in determining GD thresholds based on accepted vs. unaccepted breeding procedures instead of a threshold based on percentiles of the distribution of GD values in a reference set of current germplasm. First, as requested by ASSINSEL (1999), this approach is based on scientific principles borrowed from statistical test theory. Second, it relates directly to the original intention of Article 14 (5c) of the UPOV convention, where examples of unaccepted breeding methods are listed but GD thresholds are not mentioned. Third, it provides breeders with guidelines regarding the risk of unintentionally developing an EDV with different breeding methods and parental germplasm. Fourth, the reference set of lines for our approach is based on an objective criterion (unrelated lines chosen at random from the material group under consideration), whereas the choice of a reference set of elite varieties in other approaches can be rather subjective and/or arbitrary.

Once a generally accepted agreement on EDV thresholds has been reached, the initial suspicion of a putative EDV could be based on its GD to the IV. Subsequently, the breeder of the putative EDV would bear the burden of proof to demonstrate, for example, by pedigree records, that the new line was developed by an accepted breeding method. Because of the overlaps in the frequency distributions of GD(P1,O) for F2–, BC1–, and BC2–derived progenies, the choice of an appropriate T is crucial with regard to Type I (α) and Type II (β) errors. The EDV threshold suggested by ASSINSEL (T = 0.20) results in fairly balanced α and β values for flint lines but rather low β values for dent and introgression lines; we therefore recommend crop- and germplasm-pool-specific thresholds based on a fixed α level or on α = β. Furthermore, the thresholds should depend on the marker set and distance measure chosen. Implementation of the EDV concept in practical plant breeding requires a standard set of highly polymorphic markers for reliable determination of GDs, with the number of markers based on the expected SEs of GDs at the threshold. In addition, we strongly recommend replicating lab assays to minimize lab errors.


    ACKNOWLEDGMENTS
 
We are indebted to the Gesellschaft zur Förderung der privaten Pflanzenzüchtung (GFP), Germany, for a grant to support M. Bohn and the SSR analyses of the present study. Financial support for M. Heckenberger was provided by a grant from the European Union, Grant No. QLK-CT-1999-01499 (MMEDV). Finally, we wish to thank three anonymous reviewers as well as the Associate Editor for valuable comments on this manuscript.

Received for publication February 23, 2004.


    REFERENCES
 



