Institute of Crop Science and Resource Conservation, Univ. of Bonn, Katzenburgweg 5, D-53115 Bonn, Germany
* Corresponding author (j.leon{at}uni-bonn.de)
| ABSTRACT |
|---|
Abbreviations: BLUP, best linear unbiased prediction; BLUP(E), best linear unbiased prediction considering environmental effects; BLUP(E+A), best linear unbiased prediction considering environmental effects and pedigree information; BLUP(E+GS), best linear unbiased prediction considering environmental effects and genetic similarities; MME, mixed model equations; QTL, quantitative trait locus.
| INTRODUCTION |
|---|
Usually the breeding lines are genetically related. Consequently, the parental lines cannot be considered to be independent of each other. In general, the relationship among lines is calculated by the coefficient of coancestry (Cockerham and Weir, 1983). Calculation of the coefficient of coancestry is based on several assumptions: (i) pedigree data of lines are detailed and accurate; (ii) the base population of ancestors is unrelated; and (iii) effects of selection, mutation, and genetic drift are negligible.
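The pedigree-based calculation can be illustrated with a minimal sketch. It assumes fully inbred lines (coancestry of a line with itself equal to 1), unrelated founders, and progeny that receive on average half of their genome from each parent, that is, exactly the assumptions listed above. The function name `coancestry` and the pedigree layout are hypothetical and chosen only for illustration.

```python
def coancestry(pedigree):
    """Tabular coancestry for fully inbred lines.

    pedigree: list of (name, parent1, parent2); founders use None for both
    parents, and every parent must be listed before its progeny.
    Returns (names, f) where f[i][j] is the coefficient of coancestry.
    """
    names = [entry[0] for entry in pedigree]
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    f = [[0.0] * n for _ in range(n)]
    for k, (_, p1, p2) in enumerate(pedigree):
        f[k][k] = 1.0  # assumption: every line is completely homozygous
        if p1 is None:
            continue  # founders are assumed unrelated to all earlier lines
        a, b = index[p1], index[p2]
        for j in range(k):
            # expected coancestry with an older line: average over both parents
            f[k][j] = f[j][k] = 0.5 * (f[a][j] + f[b][j])
    return names, f


names, f = coancestry([
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "C", "A"),
])
```

In this toy pedigree, line C has a coancestry of 0.5 with each parent, and the backcross-like line D has 0.75 with A. Selection toward one parent, as discussed in the text, is not captured by these expected values.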
However, these assumptions do not hold for self-pollinating crops. Often the relationship among lines is unknown. If the relationship among the ancestor lines is not accounted for in the calculation of the coefficient of coancestry, the true genetic relationship will be underestimated. Additionally, the coefficient of coancestry is usually calculated with knowledge of parents, progenies, and other genetic relationships. Even if parental lines are not randomly chosen, but result from intense selection, the coefficient of coancestry of their F1 progenies may not deviate from 0.5 (St. Martin, 1982). This pattern may change if progenies have undergone high selection pressure between the F2 and later generations, and because selection often favors the elite parent, especially when quality and resistance traits were incorporated into the parental material. Therefore, we suppose that in later generations the selected progenies of a cross are no longer a random sample of the whole population. Consequently, in self-pollinating crops, the assumption that homozygous progenies inherit half the parental genome (of the original cross combination) can hardly be met (Graner et al., 1994).
Since these changes are not accounted for by the usual estimation of the coefficient of coancestry, it does not seem to be an appropriate method for calculating the relationship among lines in self-pollinating crops. Another approach would be to assess the genetic relationship among the lines by means of genetic similarity, which is based on information from molecular markers. Genetic similarity determines the proportion of alleles alike-in-state. Alleles that are alike-in-state, but not identical by descent, are ignored in estimating the coefficient of coancestry.
In several studies, pedigree- and DNA marker-based relationship estimates were compared. The level of association between these two estimators may vary among different crop species (Van Becelaere et al., 2005). Tams et al. (2005) stated that in breeding hybrid maize (Zea mays L.), where pedigrees are more reliable and the simplifying assumptions are more appropriate, tighter associations between pedigree- and DNA marker-based estimates were detected than in cultivars of self-pollinating crops.
Currently, in plant breeding neither coefficient of coancestry nor genetic similarity, nor all available phenotypic information of lines are routinely considered in selecting parental material. An estimation of breeding values, commonly applied in animal breeding, is able to integrate both of these sources of information. The breeding value is defined as the sum over the average effects of all alleles of a line. Thus, the average effect of an allele is due to the difference between the overall population mean and the mean of a progeny population resulting from mating of this allele with a random sample of the population. The breeding value of a line can be estimated from the corresponding phenotypic value by considering nongenetic effects and relationship information among the lines. Consequently, the breeding value represents an estimation of the genotypic value. In this case, selection is based directly on the predicted genotype of a line rather than some function of the phenotype (Saxton, 2004).
Breeding values are estimated by best linear unbiased prediction (BLUP) including the relationship information in a variance-covariance matrix A in the mixed model equations (MME). This genetic relationship matrix A is computed by using the coefficient of coancestry.
However, since coefficient of coancestry is not well adapted to self-pollinating crops, genetic similarities may be used in the prediction of BLUP-breeding values instead of the genetic relationship matrix A. In maize, Bernardo (1993, 1994) demonstrated that restriction fragment length polymorphism (RFLP)-based coefficients of coancestry gave better predictions of single-cross yield than pedigree-based coefficient of coancestry. However, negative estimates of molecular marker similarity were obtained, which were probably caused by an upward bias. Bernardo (1999) compared covariance between single crosses using (i) coefficient of coancestry and (ii) conditional covariance, which was based on marker data. The advantage of marker-based BLUP decreased as the heritability and the number of quantitative trait loci (QTL) controlling the trait increased. This lack of improvement in the predictions is due to identical expectations of the covariance between single crosses in both approaches in the absence of selection. Villanueva et al. (2005) compared, in a simulated animal population, calculation of the genetic relationship matrix based on pedigree information with a relationship matrix containing pedigree and marker information used in BLUP analysis. Integration of marker information in the genetic relationship matrix improves the BLUP values dependent on genome length and the number of markers used. However, this approach requires knowledge of the haplotype phases for the closest informative marker pair of each individual.
Concerning the known literature, it can be concluded that the estimation of BLUP breeding values using genetic similarities, especially in self-pollinating crops, has not been demonstrated as being effective. Hence, the question arises whether in self-pollinating crops the selection efficiency of the BLUP-method can be improved further by using genetic similarities instead of the genetic relationship matrix A in the prediction.
The objectives of our research were (i) to examine whether coefficient of coancestry and genetic similarity differ from each other under conditions that allow an unbiased calculation of both estimators and (ii) to determine whether, in the prediction of BLUP breeding values of self-pollinating crops, the coefficient of coancestry can be replaced by genetic similarity without loss of selection efficiency.
| MATERIALS AND METHODS |
|---|
To create the population, we first simulated 50 lines, which were assumed to be unrelated. These 50 lines represented the base population. To produce progeny, lines were randomly chosen and crossed among each other. The progeny lines were assumed to be homozygous.
Starting from the pedigree a binary matrix B containing 0's and 1's was created. In the binary matrix, Bijk = 1 indicates that the line i carries the allele j at locus k. Accordingly, Bijk = 0 indicates that the line i does not carry the allele j at locus k. Therefore, the binary matrix reflects the specific genotype of each line. As the considered loci represented the QTLs themselves, we did not simulate markers having a loose correlation to a QTL.
The genotypic value of a line was influenced by additive and epistatic effects. As the lines are assumed to be inbred, dominance effects do not exist. The additive effects of alleles were assigned randomly, based on the standard normal distribution with a mean of zero and a standard deviation of 1 (N [0,1]). Normally distributed additive by additive epistasis (N [0,0.5]) was introduced by choosing randomly 50 allele combinations. Genotypic value gi of a line resulted from the sum of the additive effects over all alleles and loci, plus an epistatic effect if present.
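The simulation of genotypes and genotypic values described above might be sketched as follows. Several details are assumptions made only for this illustration and are not stated in the text: each base line carries its own allele at every locus, loci are inherited independently, earlier progeny may themselves be used as parents, and the numbers of progeny and loci are placeholders. Instead of an explicit binary matrix, the sketch stores the allele carried at each locus, so that Bijk = 1 exactly when `geno[i][k] == j`.

```python
import random


def simulate_lines(n_base=50, n_progeny=450, n_loci=100, n_epi=50, seed=1):
    """Simulate homozygous lines and their genotypic values (illustrative only)."""
    rng = random.Random(seed)
    # assumption: base line i carries allele i at every locus
    geno = [[i] * n_loci for i in range(n_base)]
    pedigree = [(None, None)] * n_base
    for _ in range(n_progeny):
        p1, p2 = rng.sample(range(len(geno)), 2)
        # homozygous progeny: one parental allele is fixed at each locus
        geno.append([rng.choice((geno[p1][k], geno[p2][k])) for k in range(n_loci)])
        pedigree.append((p1, p2))
    # additive effect of allele j at locus k, drawn from N(0, 1)
    add = [[rng.gauss(0.0, 1.0) for _ in range(n_base)] for _ in range(n_loci)]
    # additive-by-additive epistasis for randomly chosen allele combinations, N(0, 0.5)
    epi = {}
    while len(epi) < n_epi:
        k1, k2 = rng.sample(range(n_loci), 2)
        pair = ((k1, rng.randrange(n_base)), (k2, rng.randrange(n_base)))
        epi[pair] = rng.gauss(0.0, 0.5)
    g = []
    for line in geno:
        value = sum(add[k][allele] for k, allele in enumerate(line))
        value += sum(effect for ((k1, a1), (k2, a2)), effect in epi.items()
                     if line[k1] == a1 and line[k2] == a2)
        g.append(value)
    return geno, pedigree, g


geno, pedigree, g = simulate_lines(n_base=5, n_progeny=10, n_loci=8, n_epi=3, seed=0)
```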
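Lines were simulated to be tested in five different environments e, each with five replications r. The phenotypic value Pijk for each line was simulated by adding to the genotypic value the overall mean of µ = 20, a normally distributed environmental effect (N [0,10]), a genotype by environment interaction effect (N [0,2]), and a residual effect:

$$P_{ijk} = \mu + g_i + e_j + (ge)_{ij} + \varepsilon_{ijk} \qquad [1]$$

where $g_i$ = genotypic value of line i; $e_j$ = effect of environment j; $(ge)_{ij}$ = genotype by environment interaction; and $\varepsilon_{ijk}$ = residual effect of observation ijk (k = 1,..., 12 500).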
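As a rough illustration of this step, the phenotypic records could be generated as below. The sketch assumes that the second parameter of each N[.,.] is a standard deviation (as stated for the additive effects) and that one interaction effect is drawn per line and environment; the function name and record layout are hypothetical.

```python
import random


def simulate_phenotypes(g, n_env=5, n_rep=5, mu=20.0,
                        sd_env=10.0, sd_ge=2.0, sd_res=15.0, seed=1):
    """Return records (line, environment, replication, phenotypic value)."""
    rng = random.Random(seed)
    env_effect = [rng.gauss(0.0, sd_env) for _ in range(n_env)]
    records = []
    for i, g_i in enumerate(g):
        for j in range(n_env):
            ge_ij = rng.gauss(0.0, sd_ge)  # one interaction effect per line and environment
            for r in range(n_rep):
                residual = rng.gauss(0.0, sd_res)
                records.append((i, j, r, mu + g_i + env_effect[j] + ge_ij + residual))
    return records


records = simulate_phenotypes([0.0] * 500)
```

With 500 lines, five environments, and five replications, the balanced dataset contains 12 500 records, which agrees with the range of k given in the text.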
We simulated three different traits having different heritabilities. These different heritabilities followed from different standard deviations $\sigma_{\varepsilon}$ of the residual $\varepsilon_{ijk}$. Residual standard deviations of $\sigma_{\varepsilon}$ = 15, 46, and 90 resulted in heritabilities of h2 = 0.9, 0.5, and 0.1, respectively.
To examine the influence of an unbalanced information structure on selection, we assumed balanced as well as unbalanced designs. The balanced dataset was characterized by equal numbers of observations on each line. Originating from the balanced dataset, we simulated datasets having a systematic structure of missing values and datasets having a completely random structure of 30, 60, and 90% missing values. In the case of the systematic structure of missing values, 250 lines were assumed to be tested in all environments, 200 lines in two environments, and the remaining 50 lines were assumed to be newly developed lines, which had been tested in one environment only. This systematic structure of missing values represented the typical situation in plant breeding, in which many records are available from established cultivars, but only a few from new but promising lines.
The described simulation procedure for each population was repeated ten times using different seeds. To avoid overlapping streams of random numbers, seeds were generated using the SEEDGEN Macro developed by Fan et al. (2002). Within a replication, only one set of parental lines was simulated and all following unbalanced subsets depended on this dataset. Therefore, sampling effects within a simulation replication were assumed to be absent. The simulation process took 5 d on a Pentium IV 2.4 GHz processor.
Field Data
A total of 152 spring barley (Hordeum vulgare L.) accessions, mostly cultivars, were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben. Cultivars were released between 1895 and 1998. Due to the high degree of self-pollination of barley, it is assumed that all lines were homozygous. As pedigree information was only available from 82 accessions (46%), the BLUP(E+A) was not calculated in this study.
The accessions were evaluated for three traits (Table 1) in a randomized complete-block design, with two and three replications in 2002 and 2003, respectively, at the University of Bonn research station "Dikopshof" near Cologne-Wesseling. The plot size was 3.75 m² with 330 seeds m⁻² sown.
Data Analysis
The statistical analysis of simulated as well as field data was performed using the software ASReml (Gilmour et al., 2002).
The statistical model was as follows:
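$$y_{ijk} = \mu + u_i + e_j + (ge)_{ij} + \varepsilon_{ijk} \qquad [2]$$

where $y_{ijk}$ = observation k of line i in environment j; $\mu$ = overall mean; $u_i$ = random additive genetic effect of line i; $e_j$ = random effect of environment j; $(ge)_{ij}$ = random genotype by environment interaction; and $\varepsilon_{ijk}$ = residual effect.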
$$
\begin{bmatrix}
X'X & X'Z & X'V & X'W \\
Z'X & Z'Z + A^{-1}\lambda & Z'V & Z'W \\
V'X & V'Z & V'V & V'W \\
W'X & W'Z & W'V & W'W
\end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{u} \\ \hat{e} \\ \widehat{ge} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \\ V'y \\ W'y \end{bmatrix}
\qquad [3]
$$

where $\hat{\beta}$ = vector of the fixed effect; $\hat{u}$ = vector of the random additive genetic effect of the lines i; $\hat{e}$ = vector of the random environmental effect j; $\widehat{ge}$ = vector of the random genotype by environment interaction; A = genetic relationship matrix; $\lambda = \sigma^2_{\varepsilon} / \sigma^2_a$; $\sigma^2_a$ = additive genetic variance; $\sigma^2_{\varepsilon}$ = residual variance. X, Z, V and W represent the corresponding design matrices. As in this study there is no fixed effect, X includes only the overall mean µ. For the usual standard BLUP breeding value estimation (BLUP[E+A]), the genetic relationship matrix A was based on coefficient of coancestry using pedigree information. In this procedure, the assumption was met that the ancestor lines in the base population were unrelated. In contrast to Henderson (1976), we considered the inbreeding of the ancestor lines in the genetic relationship matrix A. In addition to this standard BLUP, we introduced a BLUP based on genetic similarities (BLUP[E+GS]). Here we replaced the genetic relationship matrix A in the MME by a matrix containing genetic similarities.
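To make the role of the relationship matrix concrete, the following sketch solves the mixed model equations for a deliberately reduced model that contains only the overall mean and the random line effect; environment and genotype by environment effects are omitted for brevity. It is not the ASReml analysis used in this study. The helper names `solve` and `blup` are hypothetical, the variance ratio `lam` is assumed to be known, and `K` may hold either coancestry-based or similarity-based relationships.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination; b may have several columns."""
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        if abs(m[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def blup(y, line_of, K, lam):
    """Reduced MME: y = mu + u[line] + residual, with var(u) proportional to K.

    y: observations; line_of[k]: line index of observation k;
    K: relationship matrix; lam: residual variance / additive genetic variance.
    Returns (mu_hat, u_hat).
    """
    q = len(K)
    identity = [[float(i == j) for j in range(q)] for i in range(q)]
    k_inv = solve(K, identity)  # fails if two lines are genetically identical
    counts, sums = [0] * q, [0.0] * q
    for value, i in zip(y, line_of):
        counts[i] += 1
        sums[i] += value
    lhs = [[0.0] * (q + 1) for _ in range(q + 1)]
    lhs[0][0] = float(len(y))                               # X'X
    for i in range(q):
        lhs[0][i + 1] = lhs[i + 1][0] = float(counts[i])    # X'Z and Z'X
        for j in range(q):
            diag = counts[i] if i == j else 0.0             # Z'Z is diagonal
            lhs[i + 1][j + 1] = diag + lam * k_inv[i][j]    # Z'Z + K^-1 * lambda
    rhs = [[sum(y)]] + [[s] for s in sums]                  # X'y and Z'y
    solution = [row[0] for row in solve(lhs, rhs)]
    return solution[0], solution[1:]


mu_hat, u_hat = blup([1.0, 3.0], [0, 1], [[1.0, 0.0], [0.0, 1.0]], lam=1.0)
```

With an identity matrix for `K`, as in the usage line, the line effects are simply shrunken deviations from the mean; off-diagonal relationships additionally let records of related lines inform each other.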
To determine if the coefficient of coancestry or genetic similarities used in the MME improved the selection decision, we also analyzed the data considering only environmental effects (BLUP[E]). In this case, matrix A is just an identity matrix I.
Heritability was calculated following Hanson (1963).
Genetic Similarity
Molecular markers were not simulated since the objective of our study was to compare both estimators of genetic relationship under unbiased conditions.
According to Reif et al. (2005), the simple matching coefficient sSM (Sneath and Sokal, 1973) between two simulated inbred lines i and j is linearly related to coefficient of coancestry and particularly adapted to homozygous inbred lines:
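$$s_{SM} = \frac{n_{11} + n_{00}}{n_{11} + n_{10} + n_{01} + n_{00}} \qquad [3]$$

where $n_{11}$ = number of alleles present in both lines i and j; $n_{00}$ = number of alleles absent in both lines; and $n_{10}$ and $n_{01}$ = number of alleles present in only one of the two lines.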
Comparing two lines in our simulated population, the absence of alleles in both lines can be interpreted as a common characteristic of both lines; therefore, the most appropriate estimator for this study is the simple matching coefficient sSM.
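A minimal sketch of the simple matching coefficient for two binary allele vectors follows. It assumes that each line is represented by a flat 0/1 list over all allele-locus combinations; the function names are hypothetical.

```python
def simple_matching(b1, b2):
    """Share of positions at which two binary vectors agree (1/1 or 0/0)."""
    if len(b1) != len(b2):
        raise ValueError("vectors must have the same length")
    matches = sum(x == y for x, y in zip(b1, b2))
    return matches / len(b1)


def similarity_matrix(lines, coefficient=simple_matching):
    """Pairwise similarity matrix for a list of binary vectors."""
    return [[coefficient(a, b) for b in lines] for a in lines]


lines = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
gs = similarity_matrix(lines)
```

Joint absences count as matches here, which is the property the text relies on when choosing this coefficient for the simulated population.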
The lower the number of QTLs, the higher the probability that lines are considered to be genetically identical. However, genetically identical individuals cause the matrix containing genetic similarities to be singular (Nejati-Javaremi et al., 1997), and as a result, its inverse cannot be calculated. Henderson (1984) suggested an approach in which the MME were modified in such a way that the inversion of this matrix is avoided. However, the calculation of these modified MME is not integrated in ASReml; therefore, in the simulated population, a high number of loci were generated to prevent genetically identical individuals.
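The spring barley accessions were analyzed with 23 SSR markers, which were used to estimate the genetic relationship between accessions by genetic similarities. However, using the simple matching coefficient sSM, the matrix of genetic similarity was singular. To overcome this problem we used the Dice coefficient sD (Reif et al., 2005) instead:

$$s_{D} = \frac{2 n_{11}}{2 n_{11} + n_{10} + n_{01}} \qquad [4]$$

where $n_{11}$ = number of alleles present in both accessions, and $n_{10}$ and $n_{01}$ = number of alleles present in only one of the two accessions.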
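For comparison, the Dice coefficient can be sketched in the same way. Again, the 0/1 vector representation and the function name are assumptions made for illustration; the value returned for two vectors without any present allele is an arbitrary choice of this sketch.

```python
def dice(b1, b2):
    """Dice coefficient for two binary vectors; joint absences are ignored."""
    if len(b1) != len(b2):
        raise ValueError("vectors must have the same length")
    n11 = sum(1 for x, y in zip(b1, b2) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(b1, b2) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(b1, b2) if x == 0 and y == 1)
    denominator = 2 * n11 + n10 + n01
    if denominator == 0:
        return 0.0  # assumption: no allele present in either vector
    return 2 * n11 / denominator


s = dice([1, 1, 0, 0], [1, 0, 0, 0])
```

Unlike the simple matching coefficient, positions where both accessions lack an allele do not contribute, so the two coefficients generally yield different similarity matrices for the same marker data.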
Selection Strategies
Simulated inbred lines and the spring barley accessions, respectively, were selected based on (i) BLUP(E), (ii) BLUP(E+A), (iii) BLUP(E+GS), and (iv) line means adjusted to a common environmental effect. The adjusted line mean was calculated by dividing the observation k of genotype i in environment j by its corresponding environmental mean: $y_{ijk} / \bar{y}_{\cdot j \cdot}$. In contrast, due to the large proportion of missing values, the estimation of least square means was not possible.
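The adjustment can be sketched as follows. The text specifies only the division of each observation by its environmental mean; averaging the adjusted observations per line afterwards is an assumption of this sketch, as are the function name and the record layout.

```python
from collections import defaultdict
from statistics import mean


def adjusted_line_means(records):
    """records: iterable of (line, environment, value) tuples.

    Each value is divided by the mean of its environment; the adjusted
    values are then averaged per line (assumed aggregation step).
    """
    records = list(records)
    by_env = defaultdict(list)
    for _, env, value in records:
        by_env[env].append(value)
    env_mean = {env: mean(values) for env, values in by_env.items()}
    by_line = defaultdict(list)
    for line, env, value in records:
        by_line[line].append(value / env_mean[env])
    return {line: mean(values) for line, values in by_line.items()}


adjusted = adjusted_line_means([
    (0, 0, 10.0), (1, 0, 30.0),
    (0, 1, 20.0), (1, 1, 20.0),
])
```

Because lines with missing records are simply averaged over the environments in which they were observed, this approach remains computable for highly unbalanced data.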
Ten percent of the lines/accessions with the largest BLUP(E), BLUP(E+A), BLUP(E+GS), and adjusted line mean were selected. For all selection strategies the mean phenotypic value of the selected lines was computed. Because the genotypic value of the virtual population is known, the mean genotypic value of each selected fraction was calculated.
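The selection step and the evaluation of the selected fraction can be illustrated with a short sketch. The helper names are hypothetical, and rounding the size of the selected fraction to the nearest whole number of lines is an assumption.

```python
from statistics import mean


def select_top(scores, fraction=0.10):
    """Indices of the lines with the largest scores (at least one line)."""
    n_selected = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_selected]


def mean_of_selected(values, selected):
    """Mean genotypic (or phenotypic) value of the selected lines."""
    return mean(values[i] for i in selected)


scores = [float(i) for i in range(20)]        # e.g., BLUP values or adjusted means
genotypic = [2.0 * s for s in scores]         # known only in the simulation
selected = select_top(scores)
response = mean_of_selected(genotypic, selected)
```

Applying the same two functions to each selection criterion gives the quantities that are compared among strategies: the mean genotypic value in the simulation, and the mean phenotypic value where genotypic values are unknown.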
The results of BLUP of the virtual population were analyzed statistically using Proc GLM of SAS 9.1 and Tukey's studentized range test (HSD) to compare the selection strategies. The dependent variable was the mean genotypic value of the selected lines.
| RESULTS AND DISCUSSION |
|---|
Comparing Fig. 1a, 1b, and 1c, it can be seen that the mean genotypic value of the selected fraction decreases with decreasing heritability for all selection strategies. With decreasing heritability, the estimation of the true genotype will be more biased. Consequently, more and more unfavorable genotypes are selected and therefore, the mean genotypic value of the selected fraction decreases.
Selection strategies BLUP(E+GS), BLUP(E+A), BLUP(E), and adjusted line mean do not differ significantly from each other for a highly heritable trait (h2 = 0.9) with a balanced design. This means that when the phenotypic value is not biased by nongenetic effects, all tested selection strategies were able to detect the favorable genotypes. For a low heritability (h2 = 0.1) and a very high frequency of missing values (90%), the selection strategies likewise do not differ significantly. In this case, the decrease follows from the low accuracy of prediction. Remarkably, although the selection strategies do not differ significantly, BLUP(E+A) and BLUP(E+GS) always showed the highest selection response.
In all other simulation scenarios, there are significant differences among selection strategies. We found that in all scenarios, selecting the best parental lines by their BLUP(E+GS) and BLUP(E+A) breeding value leads to a significantly higher mean genotypic value of the selected fraction than by adjusted line means. This indicated that considering relationship information among lines in prediction of BLUP breeding values greatly enhanced the ability to detect favorable genotypes, which increased with decreasing heritability. Accounting for relationship information in prediction was more useful the higher the bias due to nongenetic effects of the trait. With decreasing heritability, the adjusted line means will be highly influenced by nongenetic effects, which reduce selection response. Using relationship information, the performance of related lines can be utilized to predict BLUP breeding values of a line.
In every single case, the consideration of genetic similarities in the prediction of BLUP breeding values (BLUP[E+GS]) resulted in higher genotypic values of the selected fraction compared with the usual BLUP breeding value, which is based on coefficient of coancestry using pedigree information (BLUP[E+A]). However, these differences between BLUP(E+GS) and BLUP(E+A) are not significant. The Pearson's correlation coefficient between coefficient of coancestry and genetic similarity was r = 0.95, which also indicates a close association between the estimators. But especially if the regarded trait has a low heritability (h2 = 0.1), the advantage of BLUP(E+GS) over BLUP(E+A) was noticeable. The small disadvantage of BLUP(E+A) could be traced back to the fact that the ancestor lines in the base population were assumed to be unrelated in the calculation of coefficient of coancestry, which could still cause some bias. To estimate the degree of genetic relatedness among inbred lines, coefficient of coancestry and genetic similarity seem to be suitable in almost the same manner. The close relationship of both estimators in our study is due to our simulation. In real parental populations of self-pollinating crops, the difference between coefficient of coancestry and genetic similarity might be much greater.
In contrast to our study, Graner et al. (1994) detected a low association between coefficient of coancestry and RFLP-based genetic similarity in barley. Since cluster analyses based on coefficient of coancestry and genetic similarity estimates yielded largely different dendrograms, the authors concluded that their results did not support the application of RFLP data for quantifying the degree of pedigree relatedness in barley. These differences could be due to effects of strong selection for quantitative as well as qualitative traits at the same time.
Hence, the use of coefficient of coancestry or genetic similarity should be dependent on the specific situation. Computing genetic similarities requires extensive marker analysis, which is time-consuming and costly. If detailed and reliable pedigree data are available, and the effect of selection on the relationship is negligible, coefficient of coancestry should be appropriate. In the case of missing pedigree data or high selection intensities, genetic similarities are an alternative in the prediction of BLUP breeding values. However, marker-based genetic similarity values have a certain estimation error if marker loci cover only a small percentage of the total genome (Cox et al., 1985; Heckenberger et al., 2005). Moreover, the accuracy of prediction taking genetic similarities into account can be enhanced further if the markers are uniformly distributed across the genome (Heckenberger et al., 2005). This can be confirmed in our study.
Indeed, in several simulation scenarios, BLUP(E) is not significantly different from the adjusted line mean. Taking no relationship information among the lines into account (BLUP[E]) reduces the prediction of BLUP values to an estimation of the performance of a line. In this case, the correlation between the obtained BLUP breeding value of a line and its phenotype is higher than the correlation between the BLUP breeding value and the genotype of this line.
With increasing amount of missing values, the mean genotypic value of the selected fraction decreases. In particular, effects of genotype by environment interaction and epistasis could not be estimated accurately in designs with highly unbalanced data structure.
We generated a dataset where all observations from the tested environments of the "oldest" parental lines were available, but only a few observations of the "youngest" lines. This systematic unbalanced design has on average 24% missing values in the dataset. Table 5 presents the mean genotypic value of the selected fraction for the selection strategies in three different heritabilities.
Field Data
To verify the results obtained in the virtual parental population generated by computer simulation, we used field data of 152 spring barley accessions. Since the genotypic values of the spring barley accessions were not known, the mean phenotypic values of the selected accessions were calculated. We compared the phenotypic value of the simulated parental lines to the mean phenotypic value of the spring barley accessions.
BLUP(E+GS) produced a lower mean phenotypic value of the selected fraction of lines/accessions than BLUP(E) (data not shown). This is in contrast to the mean genotypic value of the selected inbred lines, which was increased when lines were selected by BLUP(E+GS) (Fig. 1). However, considering the overall standard error of difference, the BLUP(E) values had higher standard errors than the BLUP(E+GS) values (see Table 6). Apparently, the prediction efficiency is higher for BLUP(E+GS) than for BLUP(E).
To test this hypothesis we also examined the simulated parental lines. As for the results of the field data, the fraction selected by BLUP(E+GS) had a lower mean phenotypic value than the respective fraction of BLUP(E) selection (data not shown).
To assess whether the results of our simulated virtual parental population were realistic, we compared the relative difference between the mean phenotypic value of the fraction selected by BLUP(E) and that of the fraction selected by BLUP(E+GS) for different heritabilities assuming balanced designs. These differences were compared for simulated and field data (Fig. 2).
Comparing results of field and simulated data, the simulation of plant populations seems to offer a great tool for examination of different selection strategies as the genotypes of the simulated lines are known.
| CONCLUSIONS |
|---|
Received for publication January 12, 2006.