Crop Science Grow Your Career with CSSA
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Published online 6 May 2005
Published in Crop Sci 45:1151-1159 (2005)
© 2005 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Related articles in Crop Science
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via ISI Web of Science (7)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Piepho, H. P.
Right arrow Articles by Möhring, J.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Piepho, H. P.
Right arrow Articles by Möhring, J.
Agricola
Right arrow Articles by Piepho, H. P.
Right arrow Articles by Möhring, J.
Related Collections
Right arrow Germplasm Enhancement
Right arrow Biometrics
Right arrow Data Management

CROP BREEDING, GENETICS & CYTOLOGY

Best Linear Unbiased Prediction of Cultivar Effects for Subdivided Target Regions

H. P. Piepho* and J. Möhring

Bioinformatics Unit, Univ. of Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart, Germany

* Corresponding author (piepho{at}uni-hohenheim.de)


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
Breeding for local adaptation may be economically viable providing there is sufficient genotype x subregion interaction. If the targeted subregion is part of a larger region covered by a testing network, information from neighboring subregions can be exploited to gain more precise estimates for the targeted subregion. For balanced data, the simplest approach is to use genotypic mean estimates for the whole target region, and this has often been shown to yield better predictions than simple means per subregion. A disadvantage of this approach is that it gives equal weight to all neighboring subregions and the targeted subregion, thus ignoring potential heterogeneity in information content. The objective of the present paper is to propose a method that allows a weighted combination of data from several subregions and to compare that method to other estimators. The proposed method is based on best linear unbiased prediction, which employs a weighted mean of subregion means. It follows from the theory of mixed models that the resulting estimator is optimal under the assumed model, minimizing prediction errors and maximizing the expected gain from selection. Using published variance component estimates, we found the resulting predictions to be superior to other approaches. We also show that the estimator is beneficial when selecting for global adaptation.

Abbreviations: ANOVA, analysis of variance • BLUP, best linear unbiased prediction • MSEP, mean squared error of prediction • REML, restricted maximum likelihood


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
PLANT BREEDERS usually seek to develop broadly adapted varieties for a wider target region. If the target region is agroecologically diverse, it may be worthwhile to stratify the target into more homogeneous subregions. Stratification will allow more accurate overall performance estimates for candidate varieties in the target region, thus increasing gain from selection. Alternatively, plant breeders may opt to develop locally adapted varieties for specific subregions. Breeding for local adaptation will be worthwhile only when there is substantial genotype x subregion interaction. Moreover, division of a target region will be accompanied by a division of testing resources. Thus, despite presence of substantial genotype x subregion interaction, it may turn out to be more efficient to breed for broad adaptation, if resources are not sufficient for accurately detecting locally adapted genotypes. This has been lucidly demonstrated by Curnow (1988) and Atlin et al. (2000), who studied the response to selection in subdivided target regions. The authors considered genotypic means in a large target and constituent subregions as correlated traits. They showed that the correlated response to selection for overall performance may outperform the direct response to selection within subregions.

There are two opposing factors that will determine if breeding for local adaptation is worthwhile. On the one hand, subdividing the target tends to increase heritabilities (on a mean basis) within subregions, essentially because the genotype x subregion interaction variance becomes a genetic variance component. On the other hand, subdivision of resources will leave only a limited number of trials per subregion, thus decreasing heritabilities within subregions (Atlin et al., 2000). This is why selection based on subregion means alone is not necessarily a good strategy.

Regional trial networks designed to provide cultivar recommendations present breeders with a similar dilemma. If recommendations are based on overall performance in the target region, cultivars with good local adaptation in one or more subregions may go unnoticed, resulting in suboptimal cultivar recommendations. Conversely, an attempt to detect local adaptation by subdivision of the target may be compromised by a limited number of trials remaining per subregion.

The common view seems to be that there are basically two alternative approaches to estimation, depending on whether one strives for broad or local adaptation: (i) if the objective is broad adaptation, ignore subregions and use all data to make selections or recommendations based on overall performance in the target region; (ii) if the objective is local adaptation, use only data from the targeted subregion for selection or recommendation. An associated notion is that selecting or recommending for local adaptation requires almost the same amount of resources within each subregion as would be needed for assessing broad adaptation with the same accuracy. This notion assumes that subregions are not substantially differentiated and that local adaptation can be detected only on the basis of data from the targeted subregion (Comstock and Moll, 1963; Talbot, 1997; Atlin et al., 2000). More often than not, the result of this common view has been that global adaptation is favored over local adaptation. This is usually a reasonable choice if one considers only the two alternatives described above.

The objective of this paper is to show that accuracy of yield estimates can be increased both for global and for local adaptation, if a slightly modified route of analysis is followed. The suggestion is to (i) always contemplate a subdivision of the target and (ii) always use all the data, employing a suitable weighting scheme, based on genetic variances and covariances among subregions, no matter whether broad or local adaptation is the objective. We show how standard mixed model procedures (best linear unbiased prediction) can be used for this task. The method is developed by initially considering balanced data and a two-step approach. This simplifies the exposition and makes key features easier to appreciate. Subsequently, we will stress that a restricted maximum likelihood (REML)-based mixed model analysis demonstrates its full power with unbalanced data and we will explain how replicate data, balanced or unbalanced, can be dealt with in a single-step analysis. Some easy-to-use SAS code, which also works for unbalanced data, is presented in an appendix.


    THEORY
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
Subdivision for Global Adaptation
If a random sample of locations is used for yield testing in each subregion, the resulting data may be regarded as a random stratified sample, with strata corresponding to subregions. As is well known from the theory of survey sampling (Kish, 1965), stratification of a target region is beneficial providing there is heterogeneity between subregions, governed by environmental factors such as climate, soil type, or topography. Gain in accuracy is largest in the one extreme (but hypothetical) case that all heterogeneity occurs between subregions, while there is complete homogeneity within subregions. In this extreme case, a single location per subregion would suffice to assess the expected cultivar yield per subregion. Conversely, in the other extreme case, where all heterogeneity occurs within subregions and none among subregions, stratification is not beneficial.

In stratified samples, the overall mean is estimated by a weighted mean of the subregion means, with the growing area per subregion used as a weight. To see that stratification is beneficial, again consider the extreme case where there is no heterogeneity within subregions, but considerable heterogeneity between subregions. The weighted mean based on a stratified sample, with growing areas used as weights, will have a variance of zero, while the simple mean based on an unstratified sample will have a variance depending on the heterogeneity between subregions. The only additional prerequisite for a stratified estimate of overall performance is that the growing areas per subregion must be available.

Estimation of Local Adaptation
Local adaptation in a subregion is usually assessed by analyzing data only from the targeted subregion (Talbot, 1997). It is often true, however, that some of the neighboring subregions are agroecologically similar to the subregion of interest. Thus, yield data from neighboring subregions may be exploited to improve yield estimates for the subregion of interest. A natural approach is to compute a weighted mean of mean yields in the targeted subregion and the neighboring subregions, with weights depending on the similarity between subregions and the number of trials per subregion. In fact, with a weighting approach, one could use data from all subregions for estimation in the targeted subregion, providing the availability of weights that are optimal or near-optimal in terms of the error of prediction for the targeted subregion. We will show, subsequently, that best linear unbiased prediction (BLUP; Searle et al., 1992) is the method of choice for this task.

Mixed Model for Subdivided Target Regions
Our basic mixed model is

[1]
where µr = expected value in the rth region, gir = random genotypic value of ith genotype in rth subregion and eirjk = random environmental deviation of the ith genotype in the kth year and in the jth location within the rth subregion. The environmental deviation will be modeled by a standard partition of the form

[2]
where Yk = main effect of kth year, L(S)j(r) = effect of jth location nested within rth subregion, (SY)rk = rkth subregion x year interaction, LY(S)jk(r) = jkth location x year interaction nested within rth subregion, (GY)ik = ikth genotype x year interaction, GL(S)ij(r) = ijth genotype x location interaction nested within rth subregion, (GSY)irk = irkth genotype x subregion x year interaction, GLY(S)ijk(r) = ijkth genotype x location x year interaction nested within rth subregion (includes error of a treatment mean). All effects appearing in eirjk are assumed to be independent homoscedastic normal deviates with zero mean.

The genetic effect may be further partitioned as

[3]
where Gi is a main effect for the ith genotype and (GS)ir is the irth genotype x subregion interaction, assuming that Gi and (GS)ir are independent homoscedastic normal deviates. This model implies that the variance-covariance model for gi = (gi1, gi2, ..., gim)', where m is the number of subregions, has the compound symmetry structure, i.e.,

[4]
where Jm is an m x m matrix of ones everywhere, Im is an m-dimensional identity matrix, and {sigma}2G and {sigma}2GS are the variances of Gi and (GS)ir, respectively. Under the compound symmetry model, genetic variances are the same in each subregion, and genetic covariances (and correlations) are the same for each pair of subregions. While this assumption may be useful in simple settings, more diverse settings call for more refined modeling. Specifically, some pairs of subregions may be more alike than others, requiring heterogeneity of covariances to be allowed for. Also, there may be heterogeneity of genetic variance between subregions. Many extensions of the ANOVA-type model, Eq. [3], have been proposed, which can be used for modeling {Sigma}g (Piepho, 1998, 1999; Smith et al., 2001). Here, we will confine attention to the compound symmetry model in order to facilitate comparison to other methods. It is stressed, however, that quite frequently more complex variance–covariance structures are needed.

Analysis of regional yield trials should be based on replicate data, using the model described above. This allows exploiting regional subdivision for estimation of both local and global adaptation in an optimal way. Following this approach, estimates of gir may be obtained by BLUP using standard procedures (Searle et al., 1992). Some sample code for SAS is given in Appendix A. This single-step analysis may be contrasted to a two-step analysis, in which genotypic means per subregion are estimated in the first step. These means are then subjected to a mixed model analysis to obtain BLUPs of gir in the second step. In balanced settings, both procedures yield identical results, while with unbalanced data, results differ and the REML-based single-step analysis is to be preferred. To study the properties of the BLUP procedure, it is more convenient to use the two-step approach and restrict attention to the balanced set-up, and this will be done subsequently.

Implied Model for Subregion Means
For demonstration purposes, we will assume here that the series of trials in each subregion is balanced in the following sense: On the basis of the mixed model described in the preceding section, taking genotypic effects gir fixed and environmental effects random, the means of all cultivars in a subregion are homoscedastic, i.e., they all have same variance. In addition, the means for all pairs of cultivars have the same covariance. This assumption is made here mainly to simplify the comparison of different estimators and to gain some insight into their statistical properties.

Let yir denote the mean for the ith genotype in the rth subregion. We can assume the model

[5]
where eir is the error associated with yir. We find, conditioning on the genetic effects, that

[6]
where yi = (yi1, yi2, ..., yim)', µ = (µ1, µ2, ..., µm)', gi = (gi1, gi2, ..., gim)' and ei = (ei1, ei2, ..., eim)'. Now taking genetic effects random, we have E(gi) = 0, var(gi) = {Sigma}g, and cov(gi, ei) = cov(gi, ei') = 0. For {Sigma}g, one may assume the compound symmetry structure, Eq. [4], or some more general model.

For illustration, consider the special case that the design is completely balanced, i.e., in each of the m subregions the n genotypes are tested in l locations and y years. In this case, {Sigma}e1 has diagonal elements {sigma}2GY/y + {sigma}2GSY/y + {sigma}2GSL/l + {sigma}2GSLY/ and off-diagonal elements {sigma}2GY/y, while {Sigma}e2 has diagonal elements {sigma}2Y/y + {sigma}2SY/y + {sigma}2SL/l + {sigma}2SLY/ and off-diagonal elements {sigma}2Y/y. It should be stressed, however, that Eq. [6] is more broadly applicable, e.g., when the number of locations differs among years or among subregions or both, when only some locations are used for several years, while others are exchanged every year, or when the number of years is not the same for each subregion. Under the assumed model in Eq. [1] and [2], the only prerequisite for the validity of Eq. [6] is that the same set of n genotypes is tested in each trial.

Estimation—Local and Global BLUP
On the basis of regional subdivision, the genotypic value in the target region can be expressed as

[7]
where ar (r = 1, ..., m) are the relative growing areas in the m subregions (expressed as proportions) and a = (a1, a2, ..., am)'. We wish to either estimate gi (for assessing global adaptation) or gir (for assessing local adaptation in the rth subregion). The estimable function of interest is of the general form

[8]
with v = a for gi (global adaptation) and v = ur for gir (local adaptation), where ur is a unit vector with rth element equal to unity and zeros elsewhere. For example, when there are five subregions, and an estimator is needed for the second region, we set v = ur = (0, 1, 0, 0, 0)'. We consider estimators of Li, which are of the form

[9]
where w are suitably chosen weights and the tilda indicates an estimator. Our preferred estimator is

[10]

[11]

[12]
because it involves an optimal weighting scheme, as detailed below. Estimator [10] is a special case of [9] with weights w' = v'W. These weights are optimal and follow from BLUP theory (Searle et al., 1992). The estimator [11] is essentially the BLUP of genetic effects gi (see Appendix B). When v = ur, we will refer to Eq. [10] as local BLUP, while when v = a, we refer to Eq. [10] as global BLUP.

In practice, variance components in W are unknown and need to be estimated. Plugging in estimates for parameters yields empirical BLUP with added uncertainty because of estimated weights. Providing parameters are estimated by REML using, e.g., a Newton-Raphson or Fisher scoring algorithm, uncertainty can be accounted for when computing approximate standard errors of BLUPs using the asymptotic dispersion matrix (Kackar and Harville, 1984). Generally, when a REML-based mixed model package such as MIXED is employed, the user need not worry about computation of weights W: these will be computed automatically for the BLUP of gi on the basis of the fitted variance–covariance model. In the case of global adaptation, Li = v'gi can be estimated from the BLUP for gi, which may require additional computation following the run of the mixed model routine. Note that the weighting matrix W is analogous to the broad-sense heritability in the case of an undivided target region. Generally, to estimate gir for a particular subregion, information on the ith genotype is used from all subregions. The extent to which the information from neighboring subregions is exploited is determined by W and depends on the heritabilities in the neighboring subregions and on the genetic and environmental correlations with the subregion of interest, as determined by the structures of {Sigma}g and {Sigma}e1, respectively.

Variance-Bias Trade-Off
The BLUP of gi is unbiased in the sense that, across all genotypes, BLUP has the same expected value as gi itself, i.e., E(gi) = E[BLUP(gi)] = 0. Clearly, this expectation is unconditional (Searle et al., 1992, p. 269). By contrast, there is a bias conditional on the genotype, i.e., E(BLUP(gi)|gi) != gi. The bias may increase when incorporating data from neighboring subregions. This may be counterbalanced, however, by the reduced estimation variance because of the use of more data. Thus, it is usually beneficial to exploit data from neighboring subregions.

The purpose of this section is to shed more light on our proposed procedure, explaining how the gain of efficiency comes about. The variance-bias trade-off can be most conveniently studied by considering pairwise differences among genotypic effect estimates. Note that the ranking of genotypes is fully determined by the set of all pairwise differences. The difference of two genotypes i and i' is given by

[13]

For simplicity, we will drop the indices on {delta}, i.e., we set {delta} {equiv} {delta}ii'. The difference {delta} is estimated by BLUP according to

[14]
This estimator is biased, since for given genotypes i and i'

[15]
where w' = v'W. Thus, exploiting information from neighboring subregions introduces a bias. This may be more than offset, however, by a reduction in the estimation variance.

The common criterion for combining bias and variance is the mean squared error of prediction (MSEP), which in the case at hand is

[16]
where

[17]
and

[18]

The subscripted g indicates that expectations are with respect to genotype pairs. It is seen from Eq. [16] that the MSEP depends on both bias and variance. The weights W in w' = v'W are chosen so as to minimize the MSEP. Thus, the optimal weights strike the best balance between bias and variance. Specifically, absence of bias is not a requirement: some bias can be tolerated, providing the MSEP is smaller than for an unbiased estimator. Not only does BLUP minimize the MSEP, but it also maximizes the response to selection in variance component models (Searle et al., 1992; but see Portnoy, 1982).

Gain from Selection
We consider response to selection based on the estimator i. In the special case that w' = v'W, where Li = vgi, this is our proposed optimal estimator opti. For generality, we regard i as an indirect trait and formulate the selection response as a correlated response to selection (Falconer and Mackay, 2001; Atlin et al., 2000). Selection for a direct trait is included as a special case with genetic correlation equal to unity. We have

[19]

[20]

[21]
The genetic correlation between Li and i is

[22]
The heritability of i equals

[23]
The correlated response to selection is

[24]
where i is the selection intensity (Falconer and Mackay, 2001). For evaluation of different methods it is convenient to consider the ratio of R-values, since the selection intensity as well as {surd}[var(Li)] cancel out. Note that all results in this section are rather general in that they do not require a special structure for {Sigma}g or {Sigma}e1, such as compound symmetric.

Variance Component Estimates
To illustrate our procedure, we use published variance component estimates for three multienvironment trials with wheat (Triticum aestivum L.), which are reproduced in Table 1. These estimates were also used by Atlin et al. (2000) to demonstrate their approach for estimation in subdivided target regions. We used the variance components in Table 1 to assign values to the variance components in our mixed model (Eq. [1], [2], and [3]) (Table 2). Following Atlin et al. (2000), the number of replications per trial was three, the total number of locations in the trial network was set equal to 12, and locations were evenly split among subregions. For simplicity, the variance of (GSY)irk was set to zero. Since our model is based on trial means, we set the variance of the residual effect GLY(S)ijk(r) equal to 2GLY + 2E/3, with variance components 2GYL and 2E taken from Table 1.


View this table:
[in this window]
[in a new window]
 
Table 1. Variance component estimates for three multienvironment trials (reproduced from Atlin et al., 2000), expressed as proportion of the phenotypic variance 2P = 2G + 2GL + 2GY + 2GYL + 2E.

 

View this table:
[in this window]
[in a new window]
 
Table 2. Variance components for mixed model given by Eq. [1], [2], and [3] as derived from estimates in Table 1. p is the proportion of 2GL assigned to {sigma}2GS.

 
The compound symmetry structure was used for {Sigma}g (Eq. [4]). Assuming an orthogonal design per subregion, the diagonal elements of {Sigma}e1 were equated to {sigma}2GY/y + {sigma}2GSY/y + {sigma}2GSL/l + {sigma}2GSLY/, where l is the number of locations per subregion and y is the number of years. The off-diagonal elements were set to {sigma}2GY/y.


    RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
Estimation for Global Adaptation
To study the gain from stratification, we assumed that the target can be subdivided into subregions of equal size, i.e., the relative areas are ar = 1/m. Estimation of the overall mean (gi) in the target region by global BLUP was performed by setting v = a = (a1, a2)' and w = v'W. The response to selection based on our method is denoted as R1. For comparison, we computed the response to selection assuming that stratification is ignored and locations are a fully random sample from the target region (R0). Ratios R1/R0 are reported in Table 3. The results show that one can only win by stratification and that the gain is largest when the genotype x subregion interaction variance is large relative to the variance of the genetic main effect. In the most favorable case reported in Table 3, stratification results in a 20% improvement in the response to selection.


View this table:
[in this window]
[in a new window]
 
Table 3. Selection for global adaptation. Ratio R1/R0, where R0 = response to selection for gi ignoring stratification and R1 = response to selection for gi using global BLUP. Analysis based on three replications per trial. p is the proportion of 2GL assigned to {sigma}2GS (see Table 2).

 
We studied the effect of unequal subregion areas ar assuming that the target region is subdivided into two subregions with a1 = q and a2 = 1 – q, where q takes on the values 0.3, 0.1, and 0.01. Estimation of the overall mean (gi) in the target region by global BLUP was implemented by setting v = a = (a1, a2)' and w = v'W. The associated response to selection is denoted as R1. For comparison, the response to selection based on the genotypic main effect Gi using the mixed model in Eq. [1], [2], and [3] was computed as proposed by Atlin et al. (2000). This is denoted as R2. The method corresponds to selection using a simple mean of yields in the two subregions, i.e., w = (0.5, 0.5)'. It should be stressed that both estimators exploit stratification of the target region. Differences in performance are therefore solely due to the contrasting weighting schemes. Of course, when the relative areas are the same (q = 0.5), both methods yield identical results. The ratio R1/R2 is reported in Table 4. The more substantial the genotype x location interaction in the target region and the less equal the relative growing areas, the more pronounced is the gain from accounting for unequal subregion areas by global BLUP.


View this table:
[in this window]
[in a new window]
 
Table 4. Selection for global adaptation. Ratio R1/R2, where R1 = response to selection for gi using global BLUP and R2 = response to selection for genotypic main effect Gi as proposed by Atlin et al. (2000). Analysis assumes l = 6 locations per subregion, s = 2 subregions, and three replications per trial. Relative areas a1 = q and a2 = 1 – q. p is the proportion of 2GL assigned to {sigma}2GS (see Table 2).

 
Estimation for Local Adaptation
Local BLUPs of the subregion mean were obtained by setting v = ur and w = v'W. The response to selection based on local BLUP is denoted as R3. For comparison, selection using the subregion mean was implemented by setting w = v = ur. The response to selection by this method is denoted as R4. The response to selection based on the unweighted global mean using the mixed model in Eq. [1], [2], and [3] was computed as proposed by Atlin et al. (2000). This is denoted as R2. The ratios R2/R4 and R3/R4 are reported in Table 5.


View this table:
[in this window]
[in a new window]
 
Table 5. Selection for local adaptation. Ratios R2/R4 and R3/R4, where R2 = response to selection for genotypic main effect Gi as proposed by Atlin et al. (2000), R3 = response to selection for gir using local BLUP, and R4 = response to selection based on subregion mean alone. Analysis based on three replications per trial. p is the proportion of 2GL assigned to {sigma}2GS (see Table 2).

 
The most important result is that local BLUP always does better than selection based on a subregion mean or than selection based on the global mean as proposed by Atlin et al. (2000). In some cases, the differences are quite marked; in others, differences are minor. When the genotype x subregion interaction is substantial (spring wheat in Australia), the global mean performs poorly, while local BLUP is slightly better than the subregion mean. When the genotype x subregion interaction is small (winter wheat in eastern Canada), the global mean almost always outperforms the subregion mean, but is itself outperformed by local BLUP.

Table 6 shows relative weights for the targeted subregion and the other subregions. The latter are equal for all subregions because of balancedness of the design and the compound symmetry model for {Sigma}g. The larger the number of years and locations, the smaller will be the magnitude of {Sigma}e1, thus increasing the weight for the targeted subregion relative to the other subregions. In the limit as {Sigma}e1 tends to zero, all weight will be on the target subregion, no matter what is the assumed structure for {Sigma}g. Also, the smaller the genetic correlation, i.e., the larger the diagonal elements of {Sigma}g in relation to the off-diagonal elements, the higher the weights for the target subregion. When the genetic correlation is very low, as in the Australian wheat data, negative weights may occur for nontarget subregions. This is a result of the genotype x year interaction component, which introduces a positive environmental covariance in {Sigma}e1. The negative weights are usually small in absolute value, and virtually all the weight lies on the target subregion.


View this table:
[in this window]
[in a new window]
 
Table 6. Selection for local adaptation. Standardized weights w'/(w'1) for local BLUP of gir. Analysis based on three replications per trial. p is the proportion of 2GL assigned to {sigma}2GS (see Table 2).

 

    DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
Plant breeders and extension service personnel frequently consider a subdivision of a target region into smaller subregions, which in themselves are more homogeneous than the overall target region. Subdivision does not necessarily imply, however, that data from one subregion are not informative regarding another targeted subregion. In this paper, we have described a weighting scheme, based on standard mixed model procedures (BLUP), which allows information from different subregions to be efficiently combined. The weighting approach has been shown to be beneficial for two different objectives, the one being estimation of performance in a specific subregion, while the other is estimation of the global mean in the target region. For estimating the mean in a targeted subregion, local BLUP combines yield data from all subregions in an optimal way. If there is little information in the neighboring subregions because of large genotype x subregion interaction, the neighbors will be assigned small weights. Conversely, if the information content is high, weights assigned to neighbors will be relatively high. The BLUP procedure will yield the optimal weights from the variance components in a quasi-automatic fashion. For estimating the global mean, accuracy can be improved by stratification into subregions and using growing areas per subregion as weights to combine local BLUPs for subregions into a global BLUP.

Curnow (1988) and Atlin et al. (2000) have made a very important contribution in demonstrating that information from neighboring subregions may be informative for a targeted subregion. Atlin et al. (2000) considered two alternatives, i.e., use all data to estimate the global mean, giving equal weight to all subregions, or use only the data from the targeted subregion. They showed that the local mean might be outperformed by the global mean if genetic correlations are high between subregions. In these cases, they propose not to subdivide the target region. Our approach obviates the need to choose between these two simple alternatives. The method always uses all data, giving different weight to the targeted subregion and its neighbors, with weights depending on the objective of estimation (local or global adaptation) and on the information provided by each subregion. Also, as opposed to the procedures considered by Curnow (1988) and Atlin et al. (2000), our method will always benefit from a subdivision of the target, providing there is heterogeneity among subregions and variance components are known.

Under the assumed model and providing known variance components, our weighted estimator will be optimal in the sense that it minimizes the MSEP and maximizes the expected response to selection. Specifically, local BLUP will give the best estimate of gir and global BLUP will give the best estimate of gi. If sample estimates are plugged in (empirical BLUP), some loss of accuracy will result, and optimality can no longer be guaranteed, though near-optimality is usually achieved. It may be a worthwhile strategy to use long-term data to obtain more accurate variance component estimates. This gain in accuracy will need to be balanced, however, against a potential long-term shift in variance components due to breeding progress as well as advances in agronomic practices. It would be worthwhile to conduct a simulation study on the effect of estimation error in the variance parameters on the response to selection. This will be the subject of future research.

When considering the relative merits of local and global BLUP, it is useful to distinguish two different objectives: (i) geographical placement of cultivars and (ii) selection of entries in a breeding program. Local BLUP will maximize the gain from selection for local adaptation. Thus, placement of cultivars should be based on local BLUP, if a meaningful subdivision of the target region is available and data permit reliable estimates of all variance components. By contrast, it is not so straightforward to decide, whether local or global BLUP is preferable for breeding purposes. While selecting for global adaptation may reduce the selection gain in a particular subregion, it does have the advantage of addressing a larger growing area. Therefore, breeding for local adaptation will be worthwhile only if the superiority of local response to selection more than compensates the loss from a reduction in growing area where the cultivar can be successfully marketed.

We have shown that global BLUP outperforms the unweighted mean across subregions, when areas per subregion are unequal. Assuming compound symmetry and balanced data, the genetic correlation between the unweighted mean and the true mean effect in the target (gi) is

[25]
This will equal unity only when all subregions are of equal size, so that a'a = m–1. The more heterogeneous the growing areas, the smaller the genetic correlation. This shows why, for unequally sized subregions, the unweighted analysis is suboptimal.

Our conceptual framework assumes that there is a larger target region, which may be subdivided into agroecologically distinct subregions. The demarcation between target regions (broad, global) and subregions (narrow, local) will depend on the type and scale of breeding application. For example, some breeders are part of a multinational (company or IARC) effort, whereas others are part of a national, state or provincial level program. What is a subregion to one is a broad region to another. Thus, the breeder will have to decide on a clear definition of the target region and subregions. In so doing, one will need to balance the geographic and climatological context against the organizational context. For a given latitudinal zone, there may be a very high degree of similarity between distant locations, but across longitudinal or elevational distances, closer subregions may rapidly become quite dissimilar. Subdivision will be most useful if homogeneity within subregions is maximized, i.e., genotype x location interaction within subregions is minimized, while genotype x subregion interaction is maximized. In this case, data from the targeted subregion are very informative, while information from neighboring subregions will be relatively small. Conversely, if subdivision mainly follows administrative boundaries, the subregions may not be very distinct agroecologically, thus increasing the information content from neighboring subregions. Clearly, agroecological factors are usually better criteria for subdivision than administrative boundaries. In either case, optimal weighting of information from targeted and neighboring subregions is crucial, and this may be achieved in a convenient way by BLUP.

Performance of our methodology critically depends on good estimates of the variance–covariance structure for genetic effects. Efficiency will be compromised if subdivision leads to subregions with no more than one or two test locations, especially when heterogeneity of covariance needs to be accounted for. Thus, it is a good strategy to have a larger number of test locations per subregion. The optimal number of locations per subregion will depend on a number of factors, such as the magnitude of and relationship among of variance components, the degree of agroecological differentiation between subregions, the objective of estimation (global or local adaptation), the number of years available for analysis, the design used with individual trials, etc. These factors are best studied by comprehensive simulation, and this will be the subject of future work.

Mixed model analysis of multienvironment trials requires random locations to allow broad inferences with respect to the target region. The requirement of a truly random sample of locations from the whole target may not always be easy to satisfy in practice, particularly when locations are selected to be representative. The notion of representativeness of certain locations usually implies a stratification of the target region. Thus, instead of selecting representative locations, one may subdivide the target region into (representative) subregions and then take truly random samples of locations per subregion. Breeders may be more comfortable with this type of restricted random sampling from subregions than a fully random sample of locations from the whole target region.

Our approach treats genotypes as a random factor. Many trial systems are set up to test elite genotypes, which have undergone several cycles of selection at the later stages of a breeding program or released cultivars. This raises the question concerning the population of entries to which results should apply. One possible view is that the population is defined by the potential set of entries that could have been obtained by the same breeding programs that generated the entries under consideration. The entries in the trials can be considered as a random sample of this hypothetical population. Clearly, the entries are not a random sample from the genotypes available at the beginning of the breeding process. Under a random genotypes model, it is also possible to incorporate pedigree information, using a model allowing for genetic correlation among related genotypes (Piepho and Pillen, 2004). Such models may be useful for multi-environment testing in complex breeding programs. Generally, estimates of genetic effects are typically more efficient under a random genotypes model than under a fixed effects model, providing the genetic variance components can be accurately estimated (Piepho, 1998; Smith et al., 2001).

It is worth mentioning that Atlin et al. (2000) do not explicitly use BLUP, and their development is basically in terms of simple genotype means, although they regard genotypes as random. Under their assumed model and balanced data setting, simple means and BLUPs will be perfectly linearly related, so the selection decision will be the same with either estimator. With unbalanced data and other variance–covariance structures, however, the two estimators will differ, and BLUP will typically be more efficient than simple means in such circumstances (Piepho, 1998).

Our BLUP procedure has two salient statistical features: shrinkage and weighting. For balanced data, shrinkage toward the mean is the same for each genotype, so shrinkage will not affect ranking compared with alternative estimators such as the mean per subregion. Thus, the differences in performance we found in comparison to other estimators were not due to shrinkage. The differences were solely due to the weighting scheme. With unbalanced data, shrinkage will differ among genotypes and so may affect ranking.

In this paper, we mainly focused on balanced data per subregion because this facilitated studying the statistical properties of our procedure. For the same reason, we considered a two-step approach, in which genotypic means per subregion are estimated in the first step and are then submitted to a BLUP procedure in the second step. In practice, a single-step procedure is more convenient and, in fact, easier to implement for routine use. Moreover, data will often be unbalanced. A single-step analysis of possibly unbalanced data is straightforward when a good mixed model package is used. All that needs to be done is to fit the mixed model outlined in Eq. [1] and [2] and let the package compute the BLUPs of genetic effects and linear functions thereof, depending on the choice of the vector v (see Appendix A).

In the example, we have used the compound symmetry structure in Eq. [4] to model the genetic variance-covariance structure {Sigma}g. The model was used to facilitate comparison with the method of Atlin et al. (2000), which is based on this same structure. It should be stressed, however, that the compound symmetry model is rather restrictive in that variances are assumed to be the same for all subregions and that covariances are the same for each pair of subregions. It is now relatively easy to fit more general structures including heteroscedastic and factor-analytic, and experience shows that such models often fit considerably better than compound symmetry (Piepho, 1998; Smith et al., 2001). When fitting more complex models, one needs to balance increased realism against the need to estimate more parameters. In the extreme case of an unstructured model, {Sigma}g has m(m+1)/2 parameters, where m is the number of subregions, and sample size may not even permit fitting of such complex models. According to the principle of parsimony, it may be preferable to fit simpler models, particularly when the sample size is limited, and there are established procedures for striking the balance between overly simplistic and overly complex models (Piepho, 1999).

Departure from the compound symmetry structure implies heterogeneity in genetic correlations, which will affect the weights W in the BLUP equation. Specifically, for estimation of gir in the targeted subregion, the weight of neighboring subregions increases with the genetic correlation, while subregions showing low genetic correlation with the targeted subregion are down-weighted. The dependence of weights on genetic correlations with neighboring subregions, especially in models with heterogeneity in the correlations, is both intuitively appealing and statistically desirable. It is our experience that the proposed method demonstrates its full power when models departing from compound symmetry fit the data well. This is likely to occur when heterogeneity among subregions is pronounced. A detailed account will be published elsewhere.

While modeling of {Sigma}g is certainly the most crucial model selection step, other model components require attention as well. For example, heterogeneity of variance components pertaining to eijkr in Eq. [1] may be expected between subregions. Also, one may model replicate data instead of trial means. This opens several options for more refined modeling at the trial level, depending on experimental design, including fitting of incomplete block effects and spatial modeling (Gilmour et al., 1997; Smith et al., 2001). The optimal properties of BLUP and the near-optimality of empirical BLUP are retained with these more general and more flexible models.

In conclusion, it should be emphasized that our approach can only improve on the prediction of genotype x region interaction, while interactions of genotype x year and genotype x year x location essentially remain unpredictable for any farm in the target region. Unfortunately, these latter effects often dominate total variation. Thus, while the method proposed in this paper is a small step forward, the problem of unpredictable interactions related to years largely continues unabated.


    APPENDIX A
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
We present sample code in SAS for single-step local BLUP. Factors are assumed to be coded as follows: G = genotype, S = subregion, L = location, Y = year. The MIXED code for the response variable YIELD and the compound symmetry structure for {Sigma}g is as follows:

proc mixed;

class G S L Y;

model YIELD = S;

random Y S*L S*Y S*L*Y G*Y G*S*L G*S*Y;

random S/sub = G type = CS solution;

run;

Instead of the compound symmetry structure, other models such as factor-analytic of heteroscedasticity can be used for {Sigma}g by appropriately modifying the type = option in the second "random" statement (Piepho, 1999).

To reduce computation time, one may take all effects not crossed with genotypes as fixed. This will yield identical results for balanced data. With unbalanced data, this analysis will not make use of intertrial information. Since the intertrial information is often small (Patterson, 1997), sacrifice in efficiency will usually be marginal. The reduction in computing time results from taking genotypes as the "subject" for all effects (Littell et al., 1996).

proc mixed;

class G S L Y;

model YIELD = S Y S*L S*Y S*L*Y;

random Y S*L S*Y/subject = G;

random S/sub = G type = CS solution;

run;

If a genotype is missing in a particular subregion, a BLUP can still be computed. To do so, the input dataset needs to have at least one record for the subregion x genotype combination in question, coding the missing observation as a dot.


    APPENDIX B
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 
Let y' = and g' = , where n is the number of genotypes. For balanced data (see Eq. [6]) we find

where In is an n-dimensional identity matrix, Jn is an n x n matrix of ones everywhere, and {otimes} denotes the Kronecker or direct product operator (Searle et al., 1992). The best linear unbiased predictor of g is (Searle et al. (1992), p. 269)

where

The selection decision will depend on ranking of genotypes, which in turn is fully determined by all pairwise differences among genotypes. Any estimator, which always yields the same pairwise differences as BLUP, can be considered essentially equivalent to BLUP with regard to the resulting selection decision. An estimate of the pairwise difference for an estimable function of interest can be obtained by multiplication of BLUP(g) with a contrast vector c {otimes} v, where c is an n-dimensional pairwise contrast vector with ci = 1 and ci' = –1 for the two genotypes to be compared and zeros elsewhere, while v is a coefficient vector for the estimable function of interest. We find

It can be shown that

This follows from the easily established fact that

where A and B are m x m matrices. The proof is as follows:

Thus, we find

with W = {Sigma}g({Sigma}g + {Sigma}e1)–1. The key step in this derivation is to note that c'Jn = 0n, where 0n is a null vector. It follows that, for selection purposes, it is sufficient to compute

This is seen to be the estimator in [11].


    ACKNOWLEDGMENTS
 
We would like to thank the referees, whose comments helped improve the paper.

Received for publication July 2, 2004.


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 REFERENCES
 


Related articles in Crop Science:

THIS ISSUE IN CROP SCIENCE

Crop Science 2005 45: xiii. [Full Text]  



This article has been cited by other articles:


Home page
Crop Sci.Home page
H. G. Gauch Jr., H.-P. Piepho, and P. Annicchiarico
Statistical Analysis of Yield Trials by AMMI and GGE: Further Considerations
Crop Sci., May 1, 2008; 48(3): 866 - 889.
[Abstract] [Full Text] [PDF]


Home page
Crop Sci.Home page
S. Ceretta and F. van Eeuwijk
Grain Yield Variation in Malting Barley Cultivars in Uruguay and Its Consequences for the Design of a Trials Network
Crop Sci., January 16, 2008; 48(1): 167 - 180.
[Abstract] [Full Text] [PDF]


Home page
Crop Sci.Home page
J. B. Holland
Estimating Genotypic Correlations and Their Standard Errors Using Multivariate Restricted Maximum Likelihood Estimation with SAS Proc MIXED
Crop Sci., February 1, 2006; 46(2): 642 - 654.
[Abstract] [Full Text] [PDF]


Home page
Crop Sci.Home page
H.-P. Piepho and J. Mohring
Selection in Cultivar Trials--Is It Ignorable?
Crop Sci., December 2, 2005; 46(1): 192 - 201.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF) Free