a Institute of Plant Breeding, Seed Science, and Population Genetics, Univ. of Hohenheim, 70593 Stuttgart, Germany
b Crop Science Dep., Univ. of Illinois, S-110 Turner Hall, 1102 South Goodwin Ave., Urbana, IL, 61801
* Corresponding author (melchinger{at}uni-hohenheim.de)
ABSTRACT
… (α) and Type II (β) errors, depending on the germplasm pool. Considerable overlaps in the GD frequency distributions of F2-, BC1-, and BC2-derived lines indicate that the resolution to discriminate these types of progeny is poor unless a much larger number of markers or a set of extremely polymorphic markers is used.
Abbreviations: AFLP, amplified fragment length polymorphism; ASSINSEL, International Association of Plant Breeders for the Protection of Plant Varieties; BC, backcross; CI, confidence interval; EDV, essentially derived variety; f, coefficient of parentage; GD, genetic distance; IDV, independently derived variety; IV, initial variety; p, parental genome contribution; PIC, polymorphic information content; PVP, plant variety protection; SSR, simple sequence repeat; T, potential threshold; UPOV, International Union for the Protection of New Varieties of Plants.
INTRODUCTION
The advent of new methods such as genetic engineering and marker-assisted backcrossing, however, has made it possible to undermine the original intention of the breeder's exemption. These tools make it possible to add a few new genes to a protected variety, or to select deliberately for lines that are very similar to one of their parents, and then to apply for PVP for the new variety. Thus, the investments made in breeding the original variety can be exploited by the breeder of the plagiarized variety without compensation to the breeder of the original variety.
The concept of EDVs was incorporated into the revised UPOV convention (UPOV, 1991) and several national PVP acts to cope with this new situation. Accordingly, a variety is deemed to be essentially derived from an IV if it (i) was predominantly derived from the IV, (ii) is clearly distinguishable from the IV, and (iii) conforms genetically to the IV. Genetic conformity between the IV and potential EDVs can result from repeated backcrossing, genetic engineering, reselection of variants within varieties, or other unaccepted breeding procedures.
Nevertheless, the UPOV convention leaves room for interpretation as to how thresholds should be determined. Furthermore, breeding companies have not yet agreed on a catalog of accepted and unaccepted breeding procedures that generally result in independently derived varieties (IDVs) or EDVs, respectively. For instance, some companies argue that developing a line from a BC1 population in which the line of a competitor served as recurrent parent should be accepted, whereas others claim that even derivation from an F2 population is unacceptable. In addition, no official guidelines or appropriate methods have been established to assess the genetic conformity between IVs and potential EDVs. Hence, crop-specific thresholds for discriminating between EDVs and IDVs have not yet been defined.
In principle, the coefficient of parentage (f) introduced by Malécot (1948) could serve to identify EDVs because it reflects the degree of relatedness between two genotypes on the basis of their pedigrees. In the case of a suspected EDV, however, pedigree data are usually not available to the breeder of the IV. In addition, f is only an indirect measure of genetic similarity (i.e., the expected proportion of alleles identical by descent between two individuals) and rests on several simplifying assumptions, such as equal parental genome contributions and the absence of selection, mutation, or drift (Melchinger et al., 1991).
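To make the pedigree-based measure concrete, the following Python sketch computes coancestry coefficients between fully homozygous lines by recursively replacing the younger line with its parents, weighted by their genome contributions. It is a simplified illustration in the spirit of the rules cited above, not the exact procedure of Melchinger et al. (1991): the `PEDIGREE` dictionary, the line names, and the expected contributions (0.5/0.5, 0.75/0.25, 0.875/0.125) are hypothetical, and founders are assumed to be unrelated.

```python
from functools import lru_cache

# Hypothetical pedigree: line -> [(parent, expected genome contribution)], [] for founders.
PEDIGREE = {
    "P1": [], "P2": [], "P3": [],
    "O_F2": [("P1", 0.5), ("P2", 0.5)],        # F2-derived line of P1 x P2
    "O_BC1": [("P1", 0.75), ("P2", 0.25)],     # BC1-derived line, P1 recurrent
    "O_BC2": [("P1", 0.875), ("P2", 0.125)],   # BC2-derived line, P1 recurrent
    "O_F2b": [("O_F2", 0.5), ("P3", 0.5)],     # second-cycle F2-derived line
}


@lru_cache(maxsize=None)
def depth(line):
    """Number of generations back to the founders (0 for a founder)."""
    parents = PEDIGREE[line]
    return 0 if not parents else 1 + max(depth(p) for p, _ in parents)


@lru_cache(maxsize=None)
def coancestry(x, y):
    """Coefficient of coancestry f(x, y) between two fully homozygous lines."""
    if x == y:
        return 1.0  # a homozygous line is identical by descent with itself
    if depth(x) < depth(y):
        x, y = y, x  # expand the younger line; it cannot be an ancestor of the other
    parents = PEDIGREE[x]
    if not parents:
        return 0.0  # distinct founders are assumed unrelated
    return sum(w * coancestry(p, y) for p, w in parents)


for line in ("O_F2", "O_BC1", "O_BC2", "O_F2b"):
    print(line, coancestry("P1", line))
# O_F2 0.5 / O_BC1 0.75 / O_BC2 0.875 / O_F2b 0.25
```

Under these assumptions f(P1,O) is fixed by the pedigree alone, which is exactly the equal-contribution assumption that limits f as a measure of the similarity actually realized in a given progeny line.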
Molecular markers such as simple sequence repeats (SSRs) or amplified fragment length polymorphisms (AFLPs) allow determination of the parental origin of the chromosomal segments in a progeny. Therefore, GDs based on molecular markers were proposed as an appropriate tool to determine the genetic conformity between an IV and putative EDVs and, consequently, to distinguish between EDVs and IDVs [International Association of Plant Breeders for the Protection of Plant Varieties (ASSINSEL), 1999; International Seed Federation, 2002a]. In maize, GDs between lines based on AFLP and SSR data were highly correlated with each other and with f estimates (Lübberstedt et al., 2000; Smith et al., 1997), suggesting that the degree of relatedness of two genotypes can be inferred from their GD. However, distributions of GDs for F2- and BC1-derived progenies showed a substantial overlap (Bernardo et al., 1997).
In a companion paper, we proposed a conceptual framework based on principles of statistical test theory for identification of EDVs with molecular markers (Heckenberger et al., 2005, unpublished data). Accordingly, for a progeny line derived from biparental crosses, the GD to each parent depends on the GD between the two parents and on p, the parental genome contribution transmitted to the progeny. Experimental estimates of p for F2- and BC1-derived progenies were reported (Bernardo et al., 1997, 2000), and formulas for the variance of p for both types of progeny were derived (Wang and Bernardo, 2000). Nonetheless, further experimental data are required to verify the approach of Heckenberger et al. (2005, unpublished data) and to quantify the influence of these factors with regard to consequences for potential EDV thresholds.
In this study, we investigated a large number of triplets in maize, each consisting of a homozygous progeny line derived from an F2, BC1, or BC2 population and its two parental inbreds, to provide benchmark data for the practical implementation of the EDV concept in maize breeding. Our main goal was to test the hypothesis that GDs based on SSR data can be used to identify lines derived by unaccepted breeding methods. In detail, our objectives were to (i) estimate the parental contribution p to the genome of the progeny; (ii) investigate the power of SSR-based GD estimates to discriminate between homozygous lines derived from F2, BC1, and BC2 populations; (iii) compare the theoretical and simulated results of Heckenberger et al. (2005, unpublished data) with our experimental data; and (iv) draw conclusions with regard to various EDV thresholds suggested in the literature.
MATERIALS AND METHODS
Altogether, 83% of the progenies were derived from F2 populations and 17% were derived from BC1 or BC2 populations (Table 1). Detailed information on all 163 triplets and the 220 maize inbreds included in this study is available as supplemental data on the internet at http://crop.scijournals.org/.
Statistical Analyses
Polymorphic information content (PIC) values were estimated as suggested by Anderson et al. (1993). Malécot's (1948) coancestry coefficient (f) was calculated from pedigree information for all pairwise line combinations on the basis of rules described by Melchinger et al. (1991). Genetic distances between lines based on SSR data were estimated as Rogers' distance (Rogers, 1972). If a value was missing for one of the two inbreds compared, the corresponding alleles of the other inbred were not used for GD calculation. Standard errors (SEs) of GDs were estimated with a bootstrap procedure resampling over primer pairs (Tivang et al., 1994). This procedure requires stochastically independent markers, which can be taken for granted for the 90% of marker pairs located on different chromosomes. In addition, a further analysis of this germplasm revealed that the extent of intrachromosomal linkage disequilibrium was small (Stich, 2004, personal communication). Therefore, the genotypes at the overwhelming majority of marker pairs can be considered stochastically independent.
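The three marker statistics named above can be sketched in plain Python, assuming that genotypes are stored as tuples of SSR allele labels per marker (a hypothetical data layout, with `None` for a missing value). PIC is computed as 1 − Σ p_i² from allele frequencies, Rogers' distance is averaged over markers scored in both lines (markers missing in either line are skipped, as described above), and the SE is obtained by resampling markers with replacement. All function names are illustrative; this is not the PLABSIM implementation.

```python
import math
import random
from collections import Counter


def allele_freqs(genotype):
    """Within-line allele frequencies at one marker, e.g. ("a", "a") -> {"a": 1.0}."""
    counts = Counter(genotype)
    return {allele: n / len(genotype) for allele, n in counts.items()}


def pic(freqs):
    """PIC = 1 - sum(p_i^2) over allele frequencies observed in the germplasm set."""
    return 1.0 - sum(p * p for p in freqs)


def rogers_marker(g1, g2):
    """Rogers' distance at a single marker: sqrt(0.5 * sum_a (x_a - y_a)^2)."""
    f1, f2 = allele_freqs(g1), allele_freqs(g2)
    alleles = set(f1) | set(f2)
    return math.sqrt(0.5 * sum((f1.get(a, 0.0) - f2.get(a, 0.0)) ** 2 for a in alleles))


def shared_markers(line1, line2):
    """Markers scored (not None) in both lines; all others are dropped from the GD."""
    return sorted(m for m in set(line1) & set(line2)
                  if line1[m] is not None and line2[m] is not None)


def rogers_distance(line1, line2, markers=None):
    """Mean Rogers' distance over a list of markers (the list may contain repeats)."""
    markers = shared_markers(line1, line2) if markers is None else markers
    return sum(rogers_marker(line1[m], line2[m]) for m in markers) / len(markers)


def bootstrap_se(line1, line2, n_boot=1000, seed=1):
    """Bootstrap SE of the GD, resampling markers (primer pairs) with replacement."""
    rng = random.Random(seed)
    markers = shared_markers(line1, line2)
    boots = [rogers_distance(line1, line2, [rng.choice(markers) for _ in markers])
             for _ in range(n_boot)]
    mean = sum(boots) / n_boot
    return math.sqrt(sum((b - mean) ** 2 for b in boots) / (n_boot - 1))


# Hypothetical SSR genotypes; None marks a missing value.
p1_geno = {"m1": ("a", "a"), "m2": ("b", "b"), "m3": ("c", "c"), "m4": None}
o_geno = {"m1": ("a", "a"), "m2": ("d", "d"), "m3": ("c", "e"), "m4": ("f", "f")}
print(rogers_distance(p1_geno, o_geno))  # (0 + 1 + 0.5) / 3 = 0.5
print(bootstrap_se(p1_geno, o_geno))
print(pic([0.5, 0.25, 0.25]))            # 0.625
```

Marker m4 is ignored because it is missing in `p1_geno`, and the heterozygous marker m3 contributes 0.5, so each marker enters the mean with the same maximum weight.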
Correlations between GDs based on SSRs (GDSSR) and f were calculated as simple correlation coefficients (Snedecor and Cochran, 1980). In addition, a lack-of-fit test was used to test for linear or quadratic relationships between f and GD. GDs were calculated with the PLABSIM software (Frisch et al., 2000); the remaining statistical calculations were performed with the R software package (Ihaka and Gentleman, 1996).
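The correlation and the lack-of-fit check can be sketched in plain Python as follows. This is an illustrative reimplementation on hypothetical data, not the R code used in the study. The lack-of-fit F statistic uses replicated values of 1 − f (line pairs with the same pedigree relationship) to estimate pure error; a P value would come from an F distribution with (df_lof, df_pe) degrees of freedom, and a check for a quadratic relationship follows the same pattern with a second-degree fit in place of the straight line.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_lack_of_fit(x, y):
    """Lack-of-fit F statistic for a straight line; replicated x values give pure error.
    Returns (F, df_lack_of_fit, df_pure_error)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    groups = defaultdict(list)
    for a, b in zip(x, y):
        groups[a].append(b)
    ss_pe = sum(sum((b - sum(g) / len(g)) ** 2 for b in g) for g in groups.values())
    df_lof, df_pe = len(groups) - 2, n - len(groups)
    ss_lof = max(ss_res - ss_pe, 0.0)  # guard against tiny negative rounding error
    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe


# Hypothetical data: 1 - f for six line pairs and their GDs.
one_minus_f = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
gd = [0.10, 0.30, 0.35, 0.55, 0.60, 0.80]
print(round(pearson_r(one_minus_f, gd), 3))  # 0.898
print(linear_lack_of_fit(one_minus_f, gd))
```

In this toy data set the group means lie exactly on a straight line, so the lack-of-fit statistic is essentially zero even though r is well below 1; the scatter is pure error among pairs with the same f, mirroring the variation of GD at a given f described later in the Discussion.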
Statistical Framework
Suppose that progeny line O is derived from a cross or BC (e.g., F2, BC1, or BC2 generation) between the homozygous parents P1 and P2, and let the GDs between P1 and P2 and between P1 and O be denoted as GD(P1,P2) and GD(P1,O), respectively. If O was an F2-derived homozygous progeny line, P1 was the first parent listed in the pedigree record of O; if O was a BC-derived inbred, P1 was the recurrent parent. If GD is determined by a large number of polymorphic markers with uniform coverage of the entire genome, we obtain the following equation:
$$GD(P_1,O) = (1 - p)\,GD(P_1,P_2) \qquad [1]$$
where p is the proportion of the genome of O contributed by P1. Solving Eq. [1] for p yields the estimate
$$\hat{p} = 1 - \frac{GD(P_1,O)}{GD(P_1,P_2)} \qquad [2]$$
Under these conditions, $\hat{p}$ is an unbiased estimator of the true genome proportion. Similar formulas were given by Bernardo et al. (1997) on the basis of the number of bands shared by P1 and O or of the simple matching coefficient (Sneath and Sokal, 1973). Because the latter is based on single alleles without weighting multiple bands within a marker locus, we chose Rogers' distance for this study. In the absence of selection, p is a random variable whose distributional properties depend on (i) the degree of relatedness between P1 and O and (ii) the number and length of the chromosomes (Wang and Bernardo, 2000). If P1 and P2 are unrelated [f(P1,P2) = 0], the expected value of p, $\mu_p$, corresponds to the coancestry f(P1,O); thus, $\mu_p$ = 0.500, 0.750, and 0.875 for F2-, BC1-, and BC2-derived progeny lines of P1, respectively.
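Eq. [2] and the expected values of $\mu_p$ translate directly into code. The sketch below assumes unrelated parents and that O received `n_backcrosses` backcrosses to P1 before inbreeding (0 for an F2-derived line); the function names and the GD values in the example are hypothetical.

```python
def p_hat(gd_p1_o, gd_p1_p2):
    """Eq. [2]: estimated genome contribution of P1 to O."""
    if gd_p1_p2 <= 0:
        raise ValueError("GD(P1,P2) must be positive")
    return 1.0 - gd_p1_o / gd_p1_p2


def expected_p(n_backcrosses):
    """mu_p = f(P1,O) for unrelated parents; 0 backcrosses = F2-derived line."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


print([expected_p(n) for n in range(3)])  # [0.5, 0.75, 0.875]
print(round(p_hat(0.20, 0.58), 3))        # hypothetical GD(P1,O) and GD(P1,P2)
```

Note that $\hat{p}$ is undefined when the parents are monomorphic for the marker set, which is why the estimator relies on a sufficiently large GD(P1,P2).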
Formulas for the variance of p, $\sigma^2_p$, for F2- or BC1-derived progeny lines were given by Wang and Bernardo (2000). In addition, numerical values for maize were obtained for F2-, BC1-, and BC2-derived progeny lines from stochastic simulations by Heckenberger et al. (2005, unpublished data). The simulations were based on a genetic model allowing for genetic drift, but for neither selection nor mutation. Empirical and simulated frequency distributions of p values were compared with a Kolmogorov-Smirnov test (Lehmann, 1986) to check for significant deviations caused by selection or mutation. Equality of the variances of empirical and simulated estimates of p was evaluated with Levene's test (Levene, 1960).
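For illustration, the two test statistics can be computed without a statistics package. This sketch returns only the statistics, namely the two-sample Kolmogorov-Smirnov D and Levene's W in its original mean-centered form; P values would come from the respective reference distributions (for example `scipy.stats.ks_2samp` and `scipy.stats.levene` with `center='mean'`). The example samples are hypothetical.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at one of the pooled sample points
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def levene_w(*groups):
    """Levene's (1960) W: one-way ANOVA F on absolute deviations from group means."""
    z = [[abs(v - sum(g) / len(g)) for v in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    zbar_i = [sum(g) / len(g) for g in z]
    zbar = sum(sum(g) for g in z) / n
    between = sum(len(g) * (m - zbar) ** 2 for g, m in zip(z, zbar_i))
    within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(z, zbar_i))
    return (n - k) / (k - 1) * between / within


# Hypothetical estimated vs. simulated p values.
observed = [0.42, 0.47, 0.49, 0.50, 0.51, 0.53, 0.58]
simulated = [0.38, 0.44, 0.48, 0.50, 0.52, 0.56, 0.62]
print(ks_statistic(observed, simulated), levene_w(observed, simulated))
```

The KS statistic is sensitive to any difference in shape (e.g., kurtosis or a shift), whereas W targets only differences in spread, which is why the two tests answer different questions about the p distributions.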
If a number of specific progeny lines are derived from a large number of biparental crosses between different parents P1 and P2 representative of a germplasm pool, then GD(P1,P2) can be regarded as a random variable with mean $\mu_{GD}$ and variance $\sigma^2_{GD}$. Since the value of p for a specific progeny is completely unrelated to the GD between its parent lines, GD(P1,P2) and p are stochastically independent. Thus, using formulas given by Goodman (1960), we obtain from Eq. [1] the following equations:
$$\mu_{GD(P_1,O)} = (1 - \mu_p)\,\mu_{GD} \qquad [3]$$
$$\sigma^2_{GD(P_1,O)} = \mu_{GD}^2\,\sigma^2_p + (1 - \mu_p)^2\,\sigma^2_{GD} + \sigma^2_p\,\sigma^2_{GD} \qquad [4]$$
where $\mu_{GD(P_1,O)}$ and $\sigma^2_{GD(P_1,O)}$ are the mean and variance of GD(P1,O), respectively, for a given relationship between O and P1.
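Eqs. [3] and [4] can be evaluated directly, which also yields the shares of $\sigma^2_{GD(P_1,O)}$ attributable to $\sigma^2_p$, to $\sigma^2_{GD}$, and to their product. The inputs below are hypothetical round numbers, not the estimates reported in Table 2; the same $\sigma^2_p$ is used for both progeny types only to isolate the effect of $\mu_p$.

```python
def gd_progeny_moments(mu_p, var_p, mu_gd, var_gd):
    """Eqs. [3] and [4] for GD(P1,O) = (1 - p) * GD(P1,P2), p independent of GD(P1,P2)."""
    mean = (1.0 - mu_p) * mu_gd                      # Eq. [3]
    parts = {
        "p": mu_gd ** 2 * var_p,                     # variation in genome contribution p
        "GD(P1,P2)": (1.0 - mu_p) ** 2 * var_gd,     # variation in parental distance
        "product": var_p * var_gd,                   # product term
    }
    return mean, sum(parts.values()), parts          # the sum is Eq. [4]


# Hypothetical inputs (not the Table 2 estimates).
for label, mu_p in (("F2", 0.50), ("BC1", 0.75)):
    mean, var, parts = gd_progeny_moments(mu_p, var_p=0.004, mu_gd=0.60, var_gd=0.006)
    shares = {k: round(100 * v / var, 1) for k, v in parts.items()}
    print(label, round(mean, 3), round(var, 6), shares)
```

Because $\sigma^2_{GD}$ enters with weight $(1-\mu_p)^2$, its share shrinks from F2-derived lines (weight 0.25) to BC1-derived lines (weight 0.0625), the same direction as the variance subdivision reported in the Results.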
By inserting experimental estimates of $\mu_{GD}$ and $\sigma^2_{GD}$, together with estimates of $\mu_p$ and $\sigma^2_p$ determined either from computer simulations (Heckenberger et al., 2005, unpublished data) or from the formulas given by Wang and Bernardo (2000), we were able to calculate predicted values of $\mu_{GD(P_1,O)}$ and $\sigma^2_{GD(P_1,O)}$ and to compare them with estimated values for F2- or BC1-derived progeny lines from unrelated parents. Moreover, Eq. [4] permits comparison of the relative influence of $\sigma^2_p$ and $\sigma^2_{GD}$ on the variance of GD(P1,O) for F2- or BC1-derived progeny lines, which is important for the question of EDV thresholds. In addition, simulated GD(P1,O) values were calculated with Eq. [1] for each material group on the basis of simulated p values and of $\mu_{GD}$ and $\sigma^2_{GD}$, the latter two being estimated from the observed GD(P1,P2) values of unrelated lines.
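The independence argument behind Eqs. [3] and [4] can be checked by Monte Carlo, in the same way that simulated GD(P1,O) values are generated from Eq. [1]. In this sketch, p and GD(P1,P2) are drawn independently from beta distributions matched to assumed means and variances; the beta form and all numeric inputs are assumptions for illustration, whereas the study used simulated p values from the companion paper.

```python
import random


def beta_params(mean, var):
    """Method-of-moments Beta(a, b) for a given mean and variance."""
    k = mean * (1.0 - mean) / var - 1.0
    return mean * k, (1.0 - mean) * k


def simulate_gd_progeny(mu_p, var_p, mu_gd, var_gd, n=100_000, seed=42):
    """Monte Carlo mean and variance of GD(P1,O) = (1 - p) * GD(P1,P2), Eq. [1].
    Independent, beta-distributed p and GD(P1,P2) are an assumption for illustration."""
    rng = random.Random(seed)
    ap, bp = beta_params(mu_p, var_p)
    ag, bg = beta_params(mu_gd, var_gd)
    draws = [(1.0 - rng.betavariate(ap, bp)) * rng.betavariate(ag, bg) for _ in range(n)]
    m = sum(draws) / n
    return m, sum((d - m) ** 2 for d in draws) / (n - 1)


mu_p, var_p, mu_gd, var_gd = 0.50, 0.004, 0.60, 0.006  # hypothetical inputs
sim_mean, sim_var = simulate_gd_progeny(mu_p, var_p, mu_gd, var_gd)
pred_mean = (1 - mu_p) * mu_gd                                          # Eq. [3]
pred_var = mu_gd**2 * var_p + (1 - mu_p)**2 * var_gd + var_p * var_gd   # Eq. [4]
print(f"mean:     simulated {sim_mean:.4f}  Eq. [3] {pred_mean:.4f}")
print(f"variance: simulated {sim_var:.6f}  Eq. [4] {pred_var:.6f}")
```

Agreement between the simulated and predicted moments only tests the algebra under independence; a systematic gap in real data would instead point to a dependence between p and GD(P1,P2), for example through the choice of parents.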
Simulation Studies
In a companion study (Heckenberger et al., 2005, unpublished data), we performed simulations to obtain the approximate distribution of p. Briefly, each simulation of a breeding program started with crossing parents P1 and P2, which were assumed to be homozygous (i.e., inbreeding coefficient F = 1) and fully polymorphic [i.e., GD(P1,P2) = 1 and $\sigma^2_{GD} = 0$] at all marker loci. One heterozygous F1 individual was selfed to produce a segregating F2 population or was backcrossed to parent P1 to obtain a BC1 population. A randomly selected BC1 individual was again backcrossed to P1 to obtain a BC2 population. From the segregating populations, randomly selected individuals were advanced to homozygosity by single seed descent. Values of p were determined by dividing the number of loci homozygous for P1 by the total number of loci monitored. Simulations of each breeding program were repeated 50 000 times to reduce sampling effects and to obtain estimates of $\mu_p$ and $\sigma^2_p$ with high numerical accuracy. The simulations were performed with the PLABSOFT software package (Maurer et al., 2004) and also considered the effects of marker density and marker distribution (random vs. equal distribution) on the accuracy of GD(P1,O) estimates.
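The breeding schemes described above can be mimicked with a small Monte Carlo simulation. This is a simplified stand-in, not PLABSOFT: the genome model (10 chromosomes of 1.5 Morgans with 11 evenly spaced markers each), the Haldane map function without interference, and the number of replicates are all assumptions chosen for a quick run. Each replicate crosses P1 × P2, optionally backcrosses to P1, and then selfs by single seed descent until the line is homozygous; p is the proportion of P1 alleles.

```python
import math
import random

# Assumed genome model (hypothetical, not the PLABSOFT maize setup):
# 10 chromosomes of 1.5 Morgans, 11 evenly spaced markers each, no interference.
N_CHROM, CHROM_LEN, MARKERS = 10, 1.5, 11
N_LOCI = N_CHROM * MARKERS
R_ADJ = 0.5 * (1.0 - math.exp(-2.0 * CHROM_LEN / (MARKERS - 1)))  # Haldane, adjacent markers


def gamete(ind, rng):
    """One recombinant gamete from a diploid individual (haplotype0, haplotype1)."""
    out = []
    for c in range(N_CHROM):
        strand = rng.randrange(2)  # homolog transmitted at the first marker
        for k in range(MARKERS):
            if k > 0 and rng.random() < R_ADJ:
                strand = 1 - strand  # odd number of crossovers in the preceding interval
            out.append(ind[strand][c * MARKERS + k])
    return out


def cross(a, b, rng):
    """Offspring of parents a and b (passing the same individual twice means selfing)."""
    return (gamete(a, rng), gamete(b, rng))


def derive_line(n_backcrosses, rng, max_selfings=40):
    """P1 x P2, n backcrosses to P1, then single seed descent; returns p."""
    p1 = ([1] * N_LOCI, [1] * N_LOCI)
    p2 = ([2] * N_LOCI, [2] * N_LOCI)
    ind = cross(p1, p2, rng)                 # F1
    for _ in range(n_backcrosses):
        ind = cross(ind, p1, rng)            # BC1, BC2, ...
    for _ in range(max_selfings):            # selfing the F1 gives the F2 plant, etc.
        if ind[0] == ind[1]:
            break                            # fully homozygous
        ind = cross(ind, ind, rng)
    return (ind[0].count(1) + ind[1].count(1)) / (2 * N_LOCI)


def simulate(n_backcrosses, reps=300, seed=2005):
    rng = random.Random(seed)
    ps = [derive_line(n_backcrosses, rng) for _ in range(reps)]
    mean = sum(ps) / reps
    var = sum((x - mean) ** 2 for x in ps) / (reps - 1)
    return mean, var


results = {name: simulate(n) for name, n in (("F2", 0), ("BC1", 1), ("BC2", 2))}
for name, (m, v) in results.items():
    print(f"{name}-derived lines: mean p = {m:.3f}, variance of p = {v:.5f}")
```

The simulated means should approach 0.5, 0.75, and 0.875, whereas the variances depend on the assumed genome length and marker density; this is why maize-specific genome parameters matter for realistic estimates of $\sigma^2_p$.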
Threshold Scenarios
To increase the sample size, all GD values of the data set with corresponding f values of 0.5, 0.75, and 0.875 for F2-, BC1-, and BC2-derived progeny lines were used, in addition to the GD values obtained within triplets, to evaluate potential thresholds (T). The frequency distributions of empirical GD(P1,O) values for F2-, BC1-, or BC2-derived progeny lines were approximated by beta distributions (Johnson et al., 1995), with parameters chosen such that the mean and variance of the original distribution were conserved. On the basis of these distributions, we calculated Type I (α) and Type II (β) errors for various EDV thresholds and various types of populations. Here, α corresponds to the probability that a true IDV will be wrongly judged as EDV, whereas β corresponds to the probability that a true EDV will not be recognized as such but judged as IDV (Fig. 1). First, we consider the situation in which an F2-derived progeny is regarded as IDV, but a BC1-derived progeny as EDV. Second, we assume that a BC1-derived progeny is regarded as IDV, but a BC2-derived progeny as EDV.
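The moment-matched beta approximation and the resulting error rates can be sketched as follows. A line is declared an EDV when GD(P1,O) falls below T, so α is the lower-tail probability of the IDV distribution and β is the upper-tail probability of the EDV distribution. The beta CDF is integrated numerically to keep the example dependency-free, and the two sets of GD moments are hypothetical; bisection then locates the threshold with α = 0.05 and the threshold with α = β.

```python
import math


def beta_params(mean, var):
    """Method-of-moments Beta(a, b) matching a given mean and variance."""
    k = mean * (1.0 - mean) / var - 1.0
    if k <= 0:
        raise ValueError("variance too large for a beta distribution with this mean")
    return mean * k, (1.0 - mean) * k


def beta_cdf(x, a, b, steps=2000):
    """Beta CDF by midpoint-rule integration of the density (adequate for a, b > 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log1p(-t))
    return total * h


def error_rates(T, idv, edv):
    """Line judged EDV if GD(P1,O) < T. alpha: IDV judged EDV; beta: EDV judged IDV."""
    return beta_cdf(T, *idv), 1.0 - beta_cdf(T, *edv)


def bisect_increasing(fn, lo=0.0, hi=1.0, iters=40):
    """Root of an increasing function on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical GD(P1,O) moments: F2-derived lines (IDV) vs. BC1-derived lines (EDV).
f2 = beta_params(0.29, 0.004)
bc1 = beta_params(0.19, 0.003)
t_005 = bisect_increasing(lambda T: error_rates(T, f2, bc1)[0] - 0.05)
t_eq = bisect_increasing(lambda T: error_rates(T, f2, bc1)[0] - error_rates(T, f2, bc1)[1])
for T in (0.25, 0.20, t_005, t_eq):
    a, b = error_rates(T, f2, bc1)
    print(f"T = {T:.3f}: alpha = {a:.3f}, power 1 - beta = {1 - b:.3f}")
```

Raising T increases α and decreases β, so a threshold chosen for α = β always lies above the one chosen for α = 0.05 whenever the balanced error rate exceeds 0.05.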
For fixed thresholds T = 0.25, 0.20, 0.15, and 0.10, α and β values were calculated for homozygous progeny lines derived from F2, BC1, and BC2 populations. In addition, other thresholds with fixed α = 0.05 (T0.05) or α = β (Tα=β) were evaluated to balance Type I and Type II errors and, therefore, the risk of obtaining an EDV from an accepted procedure vs. the risk of declaring a true EDV as IDV.
RESULTS
Correlations between GD and 1 − f were highly significant (P < 0.01) for all three material groups, being highest for dent lines (r = 0.90, P < 0.01), intermediate for flint lines (r = 0.75, P < 0.01), and lowest for introgression lines (r = 0.58, P < 0.01). In addition, we observed linear but not quadratic relationships between f and GD for all three material groups.
Parental Contributions for F2- and BC1-Derived Progenies
The three material groups did not differ from each other in their mean parental contributions, $\hat{\mu}_p$, for either F2- or BC1-derived progeny lines. Therefore, the data from all three groups were pooled for further analyses. For F2-derived progenies, SSR-based estimates of p ranged from 0.25 to 0.74, with $\hat{\mu}_p = 0.49$ (Fig. 2), close to the expectation of 0.50. Variances of estimated and simulated values of p did not differ significantly (P < 0.05) (Table 2). Frequency distributions of estimated and simulated values of p differed significantly (P < 0.05) from each other because of the higher kurtosis of the former.
For BC1-derived progenies, the mean estimate $\hat{\mu}_p = 0.66$ was significantly smaller than the expectation of 0.75 (Fig. 3). Variances of estimated and simulated values of p were not significantly different (P < 0.05) from each other (Table 2). Frequency distributions of estimated and simulated values of p differed significantly (P < 0.01) because of the shift toward smaller values, the lower skewness, and the higher kurtosis of the distribution of observed $\hat{p}$ values.
Unrelated flint lines had a mean GD of $\mu_{GD} = 0.58$ (Fig. 3). The GDs between unrelated dent lines varied from 0.25 to 0.85, with a significantly (P < 0.01) larger mean $\mu_{GD} = 0.61$. Unrelated parents of introgression lines, consisting of pairs of European and U.S. maize lines, had by far the widest range, from 0.22 to 0.93, and also a significantly higher (P < 0.01) mean $\mu_{GD} = 0.74$ than the intrapool pairs of the other two material groups.
Subdivision of the Variance of GD(P1,O) for F2- and BC1-Derived Progenies
Observed values of $\sigma^2_{GD(P_1,O)}$ obtained from experimental data were in close agreement with the predicted $\sigma^2_{GD(P_1,O)}$ values calculated with Eq. [4] on the basis of simulated values of $\mu_p$ and $\sigma^2_p$ as well as experimental estimates of $\mu_{GD}$ and $\sigma^2_{GD}$ (Table 2). Further analysis revealed that for F2-derived progenies, 65% of $\sigma^2_{GD(P_1,O)}$ could be explained by $\sigma^2_p$ and 34% by $\sigma^2_{GD}$. For BC1-derived progenies, 92% of $\sigma^2_{GD(P_1,O)}$ was explained by $\sigma^2_p$ and only 8% by $\sigma^2_{GD}$. The contribution of the product $\sigma^2_p\,\sigma^2_{GD}$ to $\sigma^2_{GD(P_1,O)}$ was approximately 1% for both F2- and BC1-derived progeny lines. Thus, the proportion of $\sigma^2_{GD(P_1,O)}$ explained by $\sigma^2_p$ increases in advanced BC generations, and $\sigma^2_{GD(P_1,O)}$ can be regarded as nearly independent of $\sigma^2_{GD}$ for BC progenies, but not for F2-derived progeny lines.
Evaluation of Essentially Derived Variety Threshold Scenarios
Observed frequency distributions of GD values for F2-, BC1-, and BC2-derived progenies fitted the approximated beta distributions well for flint and dent lines, but only moderately for introgression lines (Fig. 4). For all three material groups, considerable overlaps were observed between the frequency distributions of GDs for F2- vs. BC1-derived as well as for BC1- vs. BC2-derived progenies. Within each generation, $\mu_{GD(P_1,O)}$ was significantly higher (P < 0.05) for the dent lines than for the flint lines (Fig. 5). In addition, $\mu_{GD(P_1,O)}$ for the introgression lines was always significantly higher (P < 0.01) than for the flint and dent lines. Estimates of $\sigma^2_{GD(P_1,O)}$ within the same generation were not significantly different (P < 0.01) between flint and dent lines but were significantly larger (P < 0.01) for introgression lines.
Assuming α = 0.05 for F2-derived lines, the power 1 − β to classify a BC1-derived progeny line correctly as EDV amounted to 77, 63, and 15% for the particular thresholds determined for flint, dent, and introgression lines, respectively (Table 3). Corresponding values of 1 − β for BC2-derived lines, assuming α = 0.05 for BC1-derived lines, were lower for flint and dent lines but larger for introgression lines. The power 1 − β to classify BC1- or BC2-derived progenies as EDVs with thresholds determined for α = β increased considerably compared with the values for α = 0.05. This increase in power, however, is associated with higher values of α and, therefore, with a considerably higher frequency of F2- or BC1-derived progenies incorrectly classified as EDVs.
… α levels for F2-derived lines varied between α = 0.18 and α = 0.00 (Table 3). Corresponding values of 1 − β ranged between 7 and 92%. For T = 0.15 and T = 0.10, the power 1 − β to detect a BC2-derived line as EDV varied from 10 to 99%, with corresponding α values for BC1-derived lines ranging from 0.02 to 0.07. For each T, substantial differences in α and 1 − β were observed between flint, dent, and introgression lines.
For α = 0.05 and α = β, T values obtained from simulated data were lower than those from observed data, with the exception of α = 0.05 for introgression lines (Table 3). For all these scenarios, the power 1 − β to classify BC1- or BC2-derived progeny lines as EDVs was similar for thresholds based on observed and on simulated values of GD(P1,P2) for both flint and dent lines. For introgression lines, however, 1 − β was substantially higher for T values based on simulated data than for those based on observed data. Considerable differences between observed and simulated data also existed for the values of α and 1 − β at T = 0.25, 0.20, 0.15, and 0.10.
DISCUSSION
In our opinion, EDV thresholds should be defined primarily on the basis of the distinction between accepted and unaccepted breeding procedures. This contrasts with the approach in some crops, such as ryegrass (Lolium spp.) (International Seed Federation, 2002b) or lettuce (Lactuca sativa L.) (B. Vosman, 2003, personal communication), where EDV thresholds based on percentiles of the distribution of GD values in a reference set of current germplasm are being discussed. Once thresholds based on scientifically reliable criteria are defined, the decision whether a specific line is a putative EDV can be based on GD estimates, because pedigree relationships are initially unavailable. Hence, we investigated the possibility of uncovering certain pedigree relationships with molecular markers and explored the consequences of the EDV thresholds suggested so far for commonly used breeding procedures.
Use of SSR-Based GDs for Identification of EDVs
The rationale for using SSR-based GD estimates to identify EDVs is their relationship to f. In addition, GDs provide a better estimate of the true genome proportion p than the probabilistic value of f. Correlations between GDs and 1 − f, calculated separately for each material group and across the entire data set (r = 0.77, P < 0.01), were similar to or higher than those reported in previous studies with maize (Lübberstedt et al., 1999; Pejic et al., 1998). This reflects the broad germplasm base of this study, ranging from unrelated to closely related combinations of lines. Moreover, the linear relationship observed between GD and f in the present data confirmed that SSR-based GDs faithfully reflect the genetic diversity of the germplasm. In spite of the high correlations, considerable variation was observed among GD values for the same f value; thus, the frequency distributions of GDs for f = 0.50, 0.75, and 0.88 overlapped. Therefore, F2-, BC1-, and BC2-derived progenies could not be distinguished unambiguously by their GD(P1,O).
Factors Influencing p and GD(P1,O)
According to Eq. [1], GD(P1,O) is affected by three factors: the true but unobservable distribution of the genome contribution p, the accuracy of its estimation by GDs based on molecular markers, and the parental genetic distance GD(P1,P2). In the ideal case, in which unrelated parents [f(P1,P2) = 0] show a GD of 1.0 for a set of markers uniformly covering the entire genome, GD(P1,O) yields an estimate of 1 − p, which theoretically results in the highest ability to discriminate between different types of progeny (Heckenberger et al., 2005, unpublished data). Even in this most favorable case, simulations revealed overlaps between the frequency distributions of F2- and BC1-derived lines, as well as between BC1- and BC2-derived lines (Fig. 1).
The range of GDs between unrelated lines, from 0.2 to 0.9, was wider than reported in previous studies (Messmer et al., 1993) and suggested that some lines assumed to be unrelated might in fact be related. However, a careful re-examination of the pedigree data revealed that the corresponding lines have no documented common ancestor. Furthermore, outlier tests revealed no outliers in the corresponding distributions. Despite their strong influence on the range of GDs, the few low GD values hardly affected the mean and variance of GDs of unrelated lines because of the large number of line pairs analyzed in this study (Fig. 3).
Means and variances of the distributions of $\hat{p}$ values for F2-derived progenies were in close agreement with those of the simulated p values. However, $\hat{\mu}_p$ for BC1-derived progenies was substantially lower than the expectation (Table 2). This shift toward the distribution of F2-derived progenies is most likely attributable to selection of the most vigorous BC1 plants during the development of improved progeny lines. Because of heterosis, such BC1 plants are more heterozygous and consequently carry a higher proportion of donor genome than the average. This selection for more heterozygous plants would increase the overlap in the frequency distributions of GDs between F2- and BC1-derived or between BC1- and BC2-derived lines, compared with the simulated data shown in Fig. 1.
Deviating from the ideal case, GD(P1,P2) between unrelated lines was < 1.0 and showed a considerable variance $\sigma^2_{GD}$. This leads to compressed and flatter frequency distributions of GD(P1,O) values for F2-, BC1-, and BC2-derived progenies and, therefore, to a further increase in the overlaps. The magnitude of the overlaps is mainly determined by the parameters $\mu_{GD}$ and $\sigma^2_{GD}$ of unrelated lines. Because the level of genetic diversity in breeding germplasm differs among crops, $\mu_{GD}$ and $\sigma^2_{GD}$ vary considerably among crop species. For example, the GDs between unrelated barley (Melchinger et al., 1994) or tomato cultivars (Grandillo et al., 1999) were substantially lower than those in maize (Messmer et al., 1993). This underlines the need for crop-specific thresholds for discriminating between EDVs and IDVs.
Consequences of Various EDV Thresholds
For fixed T = 0.25, 0.20, 0.15, and 0.10, substantial differences in the Type I error α and the Type II error β were found among the three material groups. Further analyses revealed that pooling flint and dent data would significantly increase the proportion of flint lines among the lines classified as EDVs (data not shown). Moreover, a joint threshold for intrapool and interpool progenies would result in a substantially greater risk of developing an EDV from intrapool than from interpool crosses. A pool-specific approach is, consequently, fairer in terms of α and β than universal GD thresholds.
Thresholds calculated from simulated GD(P1,O) values were generally lower than those from observed data. This can partially be explained by the occurrence of nonparental alleles. The most probable reason for this shift, however, is that the simulated GD values were based on the $\mu_{GD}$ and $\sigma^2_{GD}$ of all pairwise distances between unrelated (f = 0) lines within a material group, whereas breeders often prefer genetically diverse inbred lines within a gene pool as parents for recycling breeding. This implies that the parental lines used in breeding programs may not be a random sample of all unrelated lines of a germplasm pool. When the simulated GD values were regenerated using the mean and variance of the estimated GDs among the subset of inbreds with f = 0 that were actually used as parental lines in this study (rather than among all pairs of inbreds with f = 0), the thresholds shifted toward the corresponding experimental values.
Precision of GDs and Number of Markers Required
Apart from their Type I and Type II errors, the robustness of GD(P1,O) values against addition, substitution, or removal of markers is an important factor to consider when developing appropriate thresholds. SEs of GD estimates based on the 100 SSR markers ranged from 0.01 to 0.06 and were considerable across all scenarios and material groups (Fig. 2). However, SEs decreased with decreasing GDs and thus were not independent of the GD estimates themselves.
Empirical 95% confidence intervals (CIs) for GDs were wider than desirable. For example, the 95% CIs at the threshold value T0.05 ranged from 0.13 to 0.29 (F2 vs. BC1 for flint lines) and from 0.04 to 0.14 (BC1 vs. BC2 for flint lines). Thus, a sample of 100 SSRs seems to be at the lower limit for identification of EDVs, because high SEs of GDs increase the Type I and Type II errors. Hence, we recommend a two-stage procedure for identification of EDVs with SSRs, in which a set of 100 SSRs uniformly distributed across the genome is analyzed first and, if there are doubts about the EDV status of a new line, a second set of 100 or more SSR markers is analyzed subsequently.
Given an upper limit for the SE of GDs, one can calculate the necessary number of markers depending on the mean number of alleles per marker (Foulley and Hill, 1999). In addition, Heckenberger et al. (2005, unpublished data) investigated by simulation the size of SEs of GDs as a function of decreasing marker density. Setting the upper bound of the SE of GDs to 0.01 would require a substantial increase in the number of SSRs. For example, a minimum of 260 SSRs would be required to reduce the average SE to 0.01 at a GD level of 0.20. Alternatively, the SE of GDs could be reduced by choosing SSR markers with a higher degree of polymorphism. The effective number of alleles (ne) in our study was 4.2. If it could be doubled to ne = 8.4 by an appropriate choice of highly polymorphic SSR markers, only 120 SSRs would be required to reduce the average SE to 0.01 at a GD level of 0.20. Because such a high degree of polymorphism is rather unrealistic in combination with a uniform distribution of markers across the maize genome, other types of markers seem more promising, such as single nucleotide polymorphisms, which can be determined in large numbers by high-throughput techniques.
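As a rough rule of thumb, and assuming that the SE of a GD averaged over independent, similarly informative markers scales with $1/\sqrt{m}$, the number of markers needed for a target SE can be extrapolated from a current estimate. This is a simplification, not the Foulley and Hill (1999) calculation or the simulations referred to above, and the input SE in the example is hypothetical. The helper for the effective number of alleles uses the usual definition $n_e = 1/\sum_i p_i^2$.

```python
import math


def markers_needed(se_now, m_now, se_target):
    """Extrapolate marker number assuming SE(GD) is proportional to 1/sqrt(m).
    Rough rule of thumb for independent, similarly informative markers."""
    return math.ceil(m_now * (se_now / se_target) ** 2 - 1e-9)  # tolerance for float noise


def effective_alleles(freqs):
    """Effective number of alleles n_e = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


# Hypothetical: an SE of 0.016 observed with 100 SSRs, target SE of 0.01.
print(markers_needed(0.016, 100, 0.01))             # 256
print(effective_alleles([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

Under this scaling, halving the SE requires four times as many markers, which is why raising the informativeness per marker (higher n_e) is an attractive alternative to simply adding markers.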
A further aspect of the accuracy of GD estimates is the correlation between true and marker-estimated genome proportions and GDs. In our companion study (Heckenberger et al., 2005, unpublished data), we investigated this correlation as a function of marker density and marker distribution. For maize genome parameters, and assuming fully polymorphic parental lines [GD(P1,P2) = 1], the correlation amounted to r = 0.95 for the marker set used in this study, but it decreased to below 0.90 at the level of polymorphism observed among unrelated lines in our study.
Appropriate Distance Measures
It is desirable that the GDs between the progeny and either parent add up to the GD between the parental lines (Melchinger, 1993). Of all commonly used GD measures, this criterion generally holds only for the Rogers' (1972) and the Nei and Li (1979) distances. In addition, a linear relationship to f is desirable, which is fulfilled by both GD measures. Coefficients such as Jaccard's (1908) or simple matching (Sneath and Sokal, 1973) are based on single bands, irrespective of the marker locus to which they belong. Heterozygous loci are, thus, overweighted. In contrast, Rogers' distance is based on the allele frequencies at each marker. Therefore, the alleles of a heterozygous marker are weighted by their frequencies, so that a heterozygous marker carries no more weight than a homozygous marker. As the proportion of heterozygous loci may be substantial even for inbred lines in advanced selfing generations (Heckenberger et al., 2002), we recommend Rogers' distance for identification of EDVs with SSRs. Alternatively, one might consider a distance measure that takes the map positions and linkage of the markers into account, as suggested by Dillmann et al. (1997).
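The additivity criterion can be checked on a toy example: with homozygous parents and a homozygous progeny carrying only parental alleles, Rogers' distances from the progeny to both parents sum to GD(P1,P2), whereas a Jaccard distance computed on (marker, allele) bands does not. The four-marker genotypes and the function names are hypothetical.

```python
import math


def rogers(x, y):
    """Rogers' distance for lines given as lists of genotypes (allele tuples) per marker."""
    total = 0.0
    for gx, gy in zip(x, y):
        alleles = set(gx) | set(gy)
        fx = {a: gx.count(a) / len(gx) for a in alleles}
        fy = {a: gy.count(a) / len(gy) for a in alleles}
        total += math.sqrt(0.5 * sum((fx[a] - fy[a]) ** 2 for a in alleles))
    return total / len(x)


def jaccard_distance(x, y):
    """1 - Jaccard similarity on presence/absence of (marker, allele) bands."""
    bx = {(i, a) for i, g in enumerate(x) for a in g}
    by = {(i, a) for i, g in enumerate(y) for a in g}
    return 1.0 - len(bx & by) / len(bx | by)


P1 = [("a", "a")] * 4
P2 = [("b", "b")] * 4
O = [("a", "a"), ("a", "a"), ("b", "b"), ("b", "b")]  # half of the genome from each parent
print(rogers(P1, O) + rogers(P2, O), rogers(P1, P2))                      # 1.0 1.0
print(jaccard_distance(P1, O) + jaccard_distance(P2, O), jaccard_distance(P1, P2))
```

With Rogers' distance the two progeny-parent distances (0.5 each) add up to the parental distance of 1.0, whereas the Jaccard distances (2/3 each) sum to about 1.33, so a threshold on Jaccard-based GD(P1,O) would not translate directly into a parental genome proportion.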
CONCLUSIONS
… and GD(P1,O). Nevertheless, we see clear advantages in determining GD thresholds on the basis of accepted vs. unaccepted breeding procedures rather than on percentiles of the distribution of GD values in a reference set of current germplasm. First, as requested by ASSINSEL (1999), the approach is based on scientific principles borrowed from statistical test theory. Second, it is directly related to the original intention of Article 14 (5c) of the UPOV convention, where unaccepted breeding procedures are listed as examples but GD thresholds are not mentioned. Third, it provides breeders with guidance on the risk of unintentionally developing an EDV with different breeding methods and parental germplasm. Fourth, the reference set of lines in our approach is based on an objective criterion (unrelated lines chosen at random from the material group under consideration), whereas the choice of a reference set of elite varieties in other approaches can be rather subjective or arbitrary.
Once general agreement on EDV thresholds has been reached, the initial suspicion that a line is a putative EDV could be based on its GD to the IV. Subsequently, the breeder of the putative EDV would bear the burden of proof to demonstrate, for example with pedigree records, that the new line was developed by an accepted breeding method. Because of the overlaps in the frequency distributions of GD(P1,O) for F2-, BC1-, and BC2-derived progenies, the choice of an appropriate T is crucial with regard to Type I (α) and Type II (β) errors. Whereas the EDV threshold suggested by ASSINSEL (T = 0.20) results in fairly balanced α and β values for flint lines but rather low β values for dent and introgression lines, we recommend crop- and germplasm-pool-specific thresholds on the basis of a fixed α level or of α = β. Furthermore, the thresholds should depend on the marker set and the distance measure chosen. Implementation of the EDV concept in practical plant breeding requires a standard set of highly polymorphic markers for reliable determination of GDs, with the number of markers based on the expected SEs of GDs at the threshold. In addition, we strongly recommend replication of lab assays to minimize laboratory errors.
Received for publication February 23, 2004.