Published online 2 December 2005
Published in Crop Sci 46:192-201 (2006)
© 2005 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
CROP BREEDING, GENETICS & CYTOLOGY
Selection in Cultivar TrialsIs It Ignorable?
Hans-Peter Piepho* and
Jens Möhring
Bioinformatics Unit, Institut für Pflanzenbau und Grünland, Universität Hohenheim, 70599 Stuttgart, Germany
* Corresponding author (piepho{at}uni-hohenheim.de)
 |
ABSTRACT
|
|---|
Crop cultivar registration requires multienvironment trials for assessing the value for cultivation and use (VCU). The series of trials usually extends across 3 yr, with some cultivars being discarded each year. Selection gives rise to a missing-data or drop-out pattern that is not completely random. The present paper studies the effect of drop-out on the validity of mixed model procedures such as REML and BLUP. It is shown on the basis of the pertinent statistical theory and simulations that selection is ignorable providing that all data used in the selection process are included in the analysis. Simulations show that REML is preferable to ML and BLUP is preferable to BLUE. It is suggested that cultivar registration authorities can benefit from multivariate mixed model analyses comprising all traits on which selection is based.
Abbreviations: BLUE, best linear unbiased estimation BLUP, best linear unbiased prediction MAR, missing at random MCAR, missing completely at random ML, maximum likelihood MNAR, missing not at random MSE, mean squared error REML, restricted (residual) maximum likelihood VCU, value for cultivation and use
 |
INTRODUCTION
|
|---|
IN MANY COUNTRIES, before a new crop cultivar is released to the market, government authorities usually require it first be evaluated in official registration trials to assess its value for cultivation and use (VCU). Usually, the series of VCU trials extends over 2 or 3 yr and many locations. For example, the authority responsible for cultivar registration in Germany, the Bundessortenamt (Hannover), tests new wheat (Triticum spp.) cultivars at 15 locations in the first and second year and at 32 locations in the third year. Typically, the series starts out with about 100 cultivars in the first year. Of these, about 50% are selected for trials in the second year. Similarly, after the second year, about 50% of the cultivars are selected for testing in the third year. Finally, about 50% of the cultivars tested in the third year are approved for official registration. The result of this selection procedure is a highly unbalanced dataset with a large fraction of empty cells in the year x location x cultivar classification. The structure of early generation plant breeding trials is similar in that a certain proportion of entries is discarded each year. Unbalanced cultivar trial data are commonly analyzed by mixed model procedures such as restricted maximum likelihood (REML) (Patterson, 1997), which allow missing observations to be handled in a straightforward manner. It is important to realize that these procedures require the missing observations to be missing at random (MAR) in the terminology coined by Little and Rubin (1987, 2002). The main reason for worrying about the pattern of data drop-out is the potential bias of variance component estimates and derived quantities (heritability, stability measures). Selection decisions at each stage will be unaffected provided that all cultivars that have been selected up to the current stage were tested by exactly the same experimental design both within and across the series of trials and that simple variance component models are used. This will not generally be the case in practice, e.g., in the presence of missing data, when the design involves incomplete blocking or when geostatistical methods are used for individual trials, and with heteroscedastic models for genotype x environment interaction. In all of these instances, good variance component estimates are needed for estimating cultivar effects, either by best linear unbiased estimation (BLUE), which assumes fixed cultivar effects, or by best linear unbiased prediction (BLUP), which assumes random cultivar effects (Searle et al., 1992). Poor variance component estimates can adversely affect these cultivar effect estimates, possibly causing a change in selection decision. For these reasons it is useful to know under which circumstances the drop-out mechanism should not be ignored.
There is an extensive literature on nonrandom dropout and other missing-data patterns in the context of longitudinal data, mainly focusing on medical applications (Verbeke and Mohlenberghs, 2000). This literature does not seem to have received much attention by plant breeders. One probable reason for this is that the data structures of large cultivar evaluation trials and longitudinal data from medical trials differ in many respects, perhaps the most important one being that for the latter the data can be grouped according to subjects or clusters, which are stochastically independent among themselves (Verbeke and Mohlenberghs, 2000), while with cultivar trials, there is effectively only a single subject if analysis is based on a mixed model with random years and locations, as is common practice. There is also a considerable literature on the validity of mixed model procedures in animal breeding under selection, though somewhat surprisingly most of this literature does not refer to the work of Rubin (1976). On the basis of that work, Im et al. (1989) concluded that the selection process in animal breeding is ignorable when estimation is done by likelihood-based methods, provided that all data employed in making the selection decision are used. This theoretical claim has been confirmed by several simulation studies (Van Tassel et al., 1995; Cantet et al., 2000; Duangjinda et al., 2001; Schenkel et al., 2002). Such evidence appears to be scarce for selection designs used in plant breeding.
The emphasis on ML procedures in Little and Rubin (1987) appears to have led to a preference for ML over REML procedures in medical studies with nonrandom dropout in some fields of application outside of agriculture (Verbeke and Mohlenberghs, 2000). This is in contrast to the preference of REML due to better properties such as smaller bias of variance component estimates in presence of many fixed effects (Diggle and Kenward, 1994; Searle et al., 1992). When taking recourse to the theory of Little and Rubin, it is therefore relevant to know which of the methods is preferable with missing-data patterns commonly found in plant breeding applications. We therefore compare ML and REML methods in plant breeding designs. The notion of ignorability implies that the missing data mechanism does not need to be modeled in deriving the likelihood, which is the same as would apply if the data were missing completely at random (MCAR) (Little and Rubin, 1987). The method of maximum likelihood is known to have optimal asymptotic properties in large samples. It is not obvious, however, to what extent estimators perform similarly in small samples under MAR and MCAR missing-data processes. Thus, it is useful to study the small-sample behavior of common estimation methods under different missing-data patterns. The purpose of the present paper is to explore missing-data patterns common in cultivar testing and their implications for inference on cultivar effects and variance components using likelihood-based procedures. Several settings are investigated by simulation. We compare the fixed versus random cultivars assumption (BLUE vs. BLUP) and REML vs. ML estimation. Our main focus here is on cultivar registration trials for assessing VCU. Results are equally relevant for early-generation selection in plant breeding programs. The methods are illustrated by a large multienvironment trial with oilseed rape (Brassica napus L.).
 |
MATERIAL AND METHODS
|
|---|
The Model
The model described in this section is used for studying different missing-data patterns by theoretical results as well as simulation. We assume for simplicity that a two-stage analysis is performed, in which the first step is to estimate cultivar means for each trial independently. The second step is to subject the resulting cultivar means from each environment to a mixed model analysis across environments. A two-stage analysis has the virtue of computational ease at the cost of some loss of efficiency compared with a fully efficient single-step analysis (Smith et al., 2001). Our assumed model for means is
 | [1] |
where yijk is the mean yield of the ith cultivar in the jth location and the kth year (i = 1, ..., I; j = 1, ..., J; k = 1, ..., K), µ is the overall mean, Gi is the main effect of ith cultivar, Lj is the main effect of jth location, Yk is the main effect of kth year, (GL)ij is the ijth genotype x location interaction, (GY)ik is the ikth genotype x year interaction, (LY)jk is the jkth location x year interaction, and (GLY)ijk is the ijkth genotype x location x year interaction (including the error of a treatment mean). Depending on the objective of the analysis and on the sampling design, several effects in this model may be random. For simplicity, we assume that all random effects are independent homoscedastic normal deviates with zero mean. Classically, in cultivar evaluation trials, cultivars are regarded as fixed factor, while years and locations are random factors (Patterson, 1997). With this assumption, all effects except Gi are random. Estimation of this model by standard procedures such as REML or ML is straightforward in principle, and computing time is relatively moderate if an efficient algorithm such as the average information algorithm (Gilmour et al., 1995, 1999) is used. When a package with special features for repeated measures or clustered data is available, further savings in computing time are possible by taking all environmental effects not involving cultivars as fixed, i.e., the effects Lj, Yk, and (LY)jk (Piepho and Möhring, 2005). The resulting analysis is analogous to that for an incomplete block design, in which the inter-block information is not recovered (Patterson, 1997). By analogy, the proposed analysis does not exploit interenvironment information. This information is usually very small because the variances associated with environmental effects Lj, Yk, and (LY)jk are large. When these effects are taken as fixed, one may regard cultivars as "clusters" or "subjects" in the terminology commonly used for repeated measures data (Schabenberger and Pierce, 2002). The key feature of clustered data is that observations may be correlated within clusters, but not between clusters. In the present context, observations on the same cultivar are correlated because of the presence of random effects (GL)ij and (GY)ik. By contrast, observations from different cultivars are uncorrelated, when environmental effects Lj, Yk, and (LY)jk are fixed, because each cultivar is associated with a new set of realizations of the random effects (GL)ij, (GY)ik, and (GLY)ijk. The main saving from the clustered structure comes from the fact that the variance-covariance matrix of the data, which is needed on the maximization of the likelihood, has a block-diagonal form, so that inversion can be done in block-wise fashion. Conversely, when environmental effects Lj, Yk, and (LY)jk are random, all observations in the full dataset are correlated among one another, and thus a single inversion of the complete variance-covariance matrix is needed, which may be computationally quite demanding for large datasets, for example with PROC MIXED in SAS (SAS Institute, 1999).
Brief Review of Theory for Likelihood-Based Analysis of Missing Data
The relevant theory for missing-data patterns was developed by Rubin (1976) and further elaborated by Little and Rubin (1987, 2002). We here give a cursory characterization of missing data patterns. A more rigorous treatment is given in Appendix A. Let Y denote the data that would occur in the absence of missing values. The data fall into two subsets: the observed data (Yobs) and the missing data (Ymis). Thus, the complete data is given by Y = (Yobs, Ymis). Moreover, define for each element in Y a random variable taking value 1 if the element is observed and 0 if it is missing. These indicator variables may be collected into a random vector R. In the present context, R describes the selection history of a series of trials. Concerning the appropriate likelihood function for inference, Little and Rubin (1987, 2002) popularized the following nomenclature for missing-data patterns.
Missing completely at random (MCAR): The missing-data pattern R is independent of Yobs and Ymis. In this case the missing-data mechanism is ignorable with likelihood-based inference.
Missing at random (MAR): is independent of Ymis, but dependent on Yobs. In this case the missing-data mechanism is also ignorable with likelihood-based inference.
Missing not at random (MNAR) (Verbeke and Mohlenberghs, 2000): R depends on Ymis. In this case, the missing-data mechanism is not ignorable. The missing-data mechanism is then said to be informative.
In a cultivar testing program, where cultivars are selected each year on the basis of the data thus far collected (Yobs), but not on unobserved data (Ymis), the missing-data mechanism is clearly MAR. Also, if the selection decision only depends on the trait being analyzed and, conditionally on the observed data, the decision rule does not depend on parameters related to Y (variances and fixed effects), the missing-data pattern is ignorable in single-trait analyses, providing data are analyzed by ML. REML can be derived as empirical Bayes with a flat prior on the fixed effects (Harville, 1976). On the basis of the results of Little and Rubin (1987, 2002), Bayesian inference, and hence REML-based inference is valid under the same conditions as ML estimation. For a cursory non-Bayesian justification see Appendix B. The interpretation of BLUP as empirical Bayes estimator (Searle et al., 1992) suggests that prediction of random effects based on ML or REML estimates of variance components is also valid under MAR. The important practical consequence of Rubins (1976) theory for cultivar testing is that standard mixed model procedures yield valid results provided that all data used in the selection process are included in the analysis. It also follows that the common practice of omitting all data on cultivars that did not make it to the current year, henceforth referred to as complete-case approach, yields an MNAR missing-data pattern with respect to the starting set or base population of cultivars entering the testing program, so standard mixed model procedures do not yield valid inferences on variance components for the starting set of cultivars.
The literature on drop-out in longitudinal studies seems to favor ML over REML (Diggle and Kenward, 1994; Verbeke and Mohlenberghs, 2000). It is therefore of interest to study the properties of REML-based estimates in the case that data are MAR. Also, the small-sample properties of ML and REML under an MAR mechanism for cultivar trials may be studied by simulation.
Simulation Study
The purpose of the simulation is to study the small-sample behavior of different estimation methods for variance components (ML, REML) and cultivar effects (BLUE, BLUP) under different missing-data patterns (MCAR, MAR, MNAR). We simulated cultivar yields for a series of trials in K = 3 yr and J = 15 locations each year according to Model [1]. In simulating the data, all effects were considered as random. We chose variance component values on the basis of REML estimates from German winter wheat evaluation trials by the Bundessortenamt (Hannover). The values were var(Gi) = 11, var(Lj) = 46, var(Yk) = 6, var[(LY)jk] = 71, var[(GL)ij] = 4, var[(GY)ik] = 2.5, var[(GLY)ijk] = 20. The value of the cultivar x year interaction variance turned out to be very influential, so we also considered the case var[(GY)ik] = 25 for comparison. For each simulation run, simulated cultivar main effects were saved for later comparison with estimated values. The number of cultivars was I = 100 and I = 20. We selected cultivars in the first 2 yr using the following method: Select the best I/2 cultivars in Year 1 and the best I/4 cultivars in Year 2. Selection was based on BLUE computed from the currently available observations of all cultivars. Because of the design, ranking of BLUE and BLUP for the cultivars currently under selection at any stage was identical. The selection scheme yielded a selected unbalanced dataset, in which data were MAR.
To assess the effect of selection, we generated a dataset with the same missing-data pattern as the selected incomplete dataset, but new, unselected observations, using the same Model [1]. This yielded an unselected unbalanced dataset, in which the missing data were MCAR (Little and Rubin, 1987). The simulations were done with 1000 and 10 000 runs per setting, depending on the size of the simulated data sets. All analyses were performed by the SAS System (SAS Institute, 1999).
After the third year, data were analyzed by ML and REML. We also compared analyses with fixed and random cultivar main effects. The variance component estimators were evaluated with respect to bias (%) and the mean squared error (MSE), defined as the mean squared deviation between estimates and true parameter values. Estimators of cultivar effects (BLUP or BLUE) were evaluated by the MSE. Note that because of squaring, the MSE of a parameter may become relatively large compared with true parameter values. BLUE of fixed cultivar main effects were computed on the basis of a sum-to-zero restriction. As BLUE is equivalent to ordinary least squares for the design considered here, estimates did not depend on the variance components. For analysis with random cultivar main effects we evaluated BLUP both for estimated and known variance components.
To investigate the MNAR pattern, we also studied the complete-case approach were analysis in the third year is based only on data on cultivars that are tested up to the third year. The missing-data pattern is MNAR in this case. For brevity, the results for variance components are reported only for I = 100 and var(GY)ik = 2.5.
A Real Example
We consider a series of VCU trials with oilseed rape conducted by the Bundessortenamt in Germany from 1991 to 2000. The example is used to demonstrate the differences among estimation methods in a selected and highly unbalanced data set. The data fall into nine series of two or three trial years each. Selection at each stage was based on a number of criteria, including yield, resistance to diseases, and quality traits. The exact details of the selection procedure were not available to us. The design in each trial was a randomized complete block design. The series had between 35 and 75 cultivars, while the number of locations per series varied between 13 and 40. We analyzed cultivar means per trial according to Model [1] assuming either fixed or random cultivar effects. The inverse of the squared standard error was used as a weight to partition error from cultivar x location x year interaction (Piepho, 1999). BLUP and BLUE were computed by the REML method, with all data available after completion of a series. All analyses were done by PROC MIXED (SAS Institute, 1999).
 |
RESULTS
|
|---|
Simulation
We first compare the complete-case analysis (Table 1) to the analysis of all available data (Table 2) for the case I = 100 and var[(GY)ik] = 2.5, focusing on the variance component estimates. For random cultivars, the results clearly show that a complete-case analysis (Table 1) leads to more severely biased estimates of the variance for cultivar main effects and of cultivar x year interaction compared with the analysis of all available data (Table 2). This confirms the expectation from the theory of Little and Rubin (1987, 2002) that the missing-data pattern in a complete-case analysis is MNAR because it depends on the omitted incomplete cases. When cultivar main effects are fixed, the complete-case approach does comparatively better, but with REML it is still clearly outperformed by the analysis of all available data. Simulation results for different settings on the basis of an analysis of all available data are presented in Tables 2 to 5. The results confirm that ML is considerably more biased than REML when data are MCAR. The difference in bias and MSE between ML and REML was more pronounced than that between MAR and MCAR. The effect of selection became more severe when the cultivar x year interaction variance was increased (Tables 3 and 5). This is because selection operates on cultivar main effects and cultivar x year interaction effects. Selection had the largest impact on variance estimates for the cultivar main effect and the cultivar x year interaction, increasing the MSE notably in most cases. Also, bias of variance component estimates was higher when cultivar main effects were taken as fixed. When cultivar main effects were fixed, REML was clearly preferable to ML in terms of the MSE for variance components, while with random cultivar effects, there was no clear-cut winner among the two methods under an MAR missing-data mechanism. This finding should be related to the fact that both bias and MSE are the net effect of two sources of error: the small-sample behavior under MCAR and the added error incurred from selection. BLUP turned out to be generally preferable to BLUE in terms of MSE. Also, the MSE was smaller under MCAR than under MAR. Finally, the MSE of BLUP was smaller when variance components were known, and the performance under MAR and MCAR missing-data patterns was very similar in this case. The number of cultivars had a considerable effect on performance of both effect estimators and variance component estimation methods. Except for the MNAR case, when the number of cultivars was large (Tables 2 and 3), all methods performed more similarly, as was expected from asymptotic properties of likelihood-based methods (Cox and Hinkley, 1974).
View this table:
[in this window]
[in a new window]
|
Table 1. Comparison of variance components estimated by ML or REML based on 1000 simulation runs where var[(GY)ik] = 2.5 and I = 100 cultivars. Labels MAR and MCAR refer to missing data pattern in dataset with all observations available after selection. From this dataset, only the subset of complete cases was used for analysis reported in this table. The corresponding analysis of all available data is found in Table 2.
|
|
View this table:
[in this window]
[in a new window]
|
Table 2. Comparison of parameters estimated by ML or REML based on 1000 simulation runs where var[(GY)ik] = 2.5 and I = 100 cultivars. Analysis based on all data available after selection. The corresponding analysis for the subset of complete cases is found in Table 1.
|
|
View this table:
[in this window]
[in a new window]
|
Table 5. Comparison of parameters estimated by ML or REML based on 10 000 simulation runs where var[(GY)ik] = 25 and I = 20 cultivars. Analysis based on all data available after selection.
|
|
View this table:
[in this window]
[in a new window]
|
Table 3. Comparison of parameters estimated by ML or REML based on 1000 simulation runs where var[(GY)ik] = 25 and I = 100 cultivars. Analysis based on all data available after selection.
|
|
View this table:
[in this window]
[in a new window]
|
Table 4. Comparison of parameters estimated by ML or REML based on 10 000 simulation runs where var[(GY)ik] = 2.5 and I = 20 cultivars. Analysis based on all data available after selection.
|
|
Oilseed Rape Data
Differences of variance component estimates between all available data and complete cases subsets (Table 6) were not quite as extreme in the real data set as in simulations because selection was based on several criteria such as yield, resistance traits, quality traits, etc. Nevertheless, the downward bias in the MNAR situation is quite apparent, particularly for the cultivar main effect variance, while the bias for the cultivar x environment variance components tends to be smaller. The correlation of BLUP and BLUE from the analysis of all available data was very high in all cases, ranging between 0.994 and just below 1.000. To test the assumption of normally distributed cultivar main effects, we subjected the least square cultivar means to a Shapiro-Wilk test (this test is an approximate one because of some minor heterogeneity of standard errors). The test was nonsignificant for eight out of nine series. The Q-Q-plots (Atkinson, 1985) relative to the normal distribution looked inconspicuous for all series (not shown). We conclude that the normality assumption for random cultivar effects is usually tenable.
 |
DISCUSSION
|
|---|
Statistical theory shows that selection is ignorable in REML and ML analyses of cultivar evaluation trials with data MAR, provided that all data used for selection are included in the analysis. By contrast, the complete-case approach, by which only cultivars making it to the current year are analyzed, does not yield valid variance component estimates with respect to the set of cultivars entering the trial system because the data drop-out follows an MNAR pattern. Simulations as well as the oilseed rape data have shown that variance component estimates are notably biased in this case, and this corroborates similar findings by Little and Raghunathan (1999) for a medical science setting. In summary, selection is not ignorable with the complete-case approach, but it is ignorable when all observed data are included in the analysis.
We also estimated variance components for German VCU trials with wheat (results not shown) and did not find very strong evidence of downward bias for variances of cultivar main effects and of cultivar x year interaction, when a complete-case approach was used. The explanation for this finding is that wheat has five different bread quality classes. Baking quality is negatively correlated with yield, and breeders aim at cultivars in different quality classes. Therefore, selection based on both yield and baking quality for different classes does not usually dramatically decrease the genetic variance in yield for cultivars making it to the third year. It should also be remarked that the overlap of yield distributions in different quality classes is very wide, so the assumption of normally distributed cultivar main effects made in the simulations was reasonable.
We stress that ignorability does not imply identical properties of likelihood-based estimators under MCAR and MAR missing-data mechanisms. For example, MSE of variance components estimates tends to be larger for MAR than for MCAR. The simulation revealed that it is desirable to use as much data as possible for estimating the variance components, for example by using long-term data. REML is preferable to ML because of smaller bias and MSE. Also, the analysis based on a model with random cultivar main effects yields more accurate genotype x environment interaction variance component estimates than analysis based on fixed cultivar main effects. Similarly, BLUP had smaller MSE and bias than BLUE. Clearly, the Achilles heal of BLUP is estimation of the variance components for cultivar main effects and for cultivar x year interaction. Still, in all simulations BLUP of cultivar effects based on variance component estimates from 3-yr data was preferable to BLUE in terms of MSE and bias. This finding is in agreement with results by Panter and Allen (1995) for breeding populations in soybean. Those authors found that BLUP was more efficient than BLUE, especially when pedigree information was available and when data were unbalanced. Eagles and Moody (2004) reported that a REML analysis for gene effects in breeding populations of barley based data from multiple stages of selection minimized bias due to selection. Piepho (1994, 1998) showed BLUP to outperform least squares estimators in unselected datasets. We are not aware of any publications which specifically consider the effect of selection on REML-based estimates in cultivar evaluation trials.
It emerges from the theory of Little and Rubin (1987, 2002) that standard BLUP is valid under an MAR missing-data process. It is important to note that the framework of Little and Rubin (1987, 2002) treats the missing-data pattern as a random variable. This is in contrast to results derived by Henderson (1975), who considered properties under a fixed missing-data pattern. Im et al. (1989) pointed out that this more restrictive conditional framework results in a loss of information. It is shown in Appendix C that in the setup considered in this paper BLUP is also valid under Hendersons (1975) more restrictive assumptions.
The design we studied in simulations is typical for cultivar registration trials. It should be stressed that for the design considered here, BLUE and BLUP lead to identical rankings of cultivars under selection at any stage. The main advantage of BLUP is the smaller MSE of point estimates and thus a more accurate assessment of the cultivars true performances. Cultivar registration trials should be contrasted to early-generation plant breeding trials, where pedigree information is available and genetic covariances can be explicitly modeled. In these circumstances BLUE and BLUP will not only differ in MSE, but also in ranking of genotypes. BLUP then has the added advantage of maximizing the probability of finding the correct ranking among genotypes (Searle et al., 1992). A detailed investigation of the properties of BLUE and BLUP under selection in early-generation breeding trials with different pedigree structures is currently underway and will be published elsewhere.
It should be obvious that both BLUP and BLUE become more accurate as more years are included because variance component estimates become more accurate and because the effect of cultivar x year interaction on the effect estimates is reduced. The conclusion to the contrary by Yan and Rajcan (2003) is refuted by their own data: plotting their mean correlation between BLUPs based on past data and mean yields in current years against the number of past years (Yan and Rajcan, 2003: Table 6), one sees a trend of continuous increase of the correlation with a higher number of years, despite some ups and downs in the scatter plot, which presumably are just due to sampling noise.
The finding that analysis based on the assumption of random cultivars performs better than analysis based on fixed cultivar effects may be related to asymptotic properties of likelihood-based methods. The optimality of maximum likelihood in mixed models for clustered data or repeated measures rests on several assumptions as outlined, e.g., in Longford (1993: Appendix). For example, the usual proof of asymptotic properties requires that the number of clusters or subjects tends to infinity (Longford, 1993) and that the number of parameters is finite (Cox and Hinkley, 1974). In the context of cultivar trials, where the model is formulated such that cultivars correspond to subjects by taking environmental effects as fixed, the requirement is that the number of cultivars tends to infinity. If cultivar main effects are taken as fixed, these effects become parameters of the model, and thus the number of parameters increases to infinity as the number of cultivars increases to infinity, in which case the usual asymptotic theory does not apply. Conversely, when cultivar main effects are random, these are not parameters, but rather part of the within-subjects variance-covariance structure. In this case, standard conditions for asymptotic theory apply. From this, it may be expected that performance of likelihood-based methods is better when cultivar main effects are taken as random, especially when the number of cultivars is large.
Our simulation was favorable to BLUP because cultivar main effects were drawn from a normal distribution. When the effects come from a rather skewed distribution or when cultivars are from different populations, e.g., different baking quality categories of wheat or hybrids vs. lines, the normality assumption may not be tenable, thus reducing the relative merit of BLUP. While BLUP itself does not require normality (Searle et al., 1992), optimality as well as application of the theory of Little and Rubin (1987, 2002) does require normality. In case cultivars come from different populations and normality within populations is a reasonable assumption, the mixed model can be adjusted by replacing the cultivar effect with a sum of a fixed population effect and a random cultivar effect nested within population. In this case, BLUP will maintain its optimal properties.
The important implication of the theory of Little and Rubin (1987, 2002) is that all data employed in selection decisions should be included in the analysis to avoid selection bias. If this is done, the missing data-pattern is MAR and hence selection is ignorable in likelihood-based analyses. With these qualifications, the answer to the question posed in the title of this paper is in the affirmative. A crucial aspect of both multistage cultivar testing and early-generation breeding is that selection decisions are usually based on several traits. Thus, a single-trait analysis may be subject to bias, if the trait under analysis is correlated to another trait on which selection was based (Im et al., 1989). To reduce selection bias, it has therefore become common practice in animal breeding to perform a multivariate mixed model analysis (Henderson and Quaas, 1976; Mrode, 1998, p. 7778), and this approach is also gaining popularity in forestry breeding (Silva et al., 2000; Aleta et al., 2004; Persson and Andersson, 2004) and with perennial crops (Purba et al., 2001; da Costa et al., 2002). Fully fledged REML-based multivariate mixed model analyses including BLUP seem to be rare in annual crops. We conjecture that much is to be gained by applying these techniques in cultivar testing.
If a single-trait analysis is performed despite selection on multiple traits, the resulting analysis is potentially biased, with the amount of distortion depending on a number of factors, such as selection intensity and genetic and environmental correlation among traits. Clearly, in cases where a correlated trait subject to selection is not included in the analysis, the data do not strictly meet the MAR assumption, because selection was effectively based on missing data (Ymis), namely the data of the correlated trait omitted from the analysis (Im et al., 1989). In such cases, one could resort to models allowing for informative drop-out, e.g., selection models and pattern-mixture models (Little, 1995). These models are not as simple to implement in practice as standard mixed model analyses, however, and they rely on a number of relatively strong assumptions, such as a specific model regarding the mode of selection, which are difficult or impossible to verify in practice. There are some useful suggestions for sensitivity analysis of such models (Verbeke and Mohlenberghs, 2000), but again these are not simple to implement in practice. It seems much more straightforward to opt for a multivariate mixed model analysis. In future work this option will be explored in the context of plant breeding programs.
 |
APPENDIX A
|
|---|
The joint density of the random variables R and Y can be factorized as
 | [A1] |
where
contains the unknown mean and variancecovariance parameters and
contains parameters defining the missing-data mechanism via the conditional density f(R|Y,
). In an experiment with missing data, the data available for inference is given by (Yobs,R). Thus, inference on the parameters must be based on the density of (Yobs,R), which is obtained from the joint density of (Y,R) = (Yobs, Ymis,R) by integrating out Ymis:
 | [A2] |
An important practical consideration is to know under which circumstances inference based on f(Yobs,R|
,
), which accounts for the missing-data pattern, is equivalent to inference based on the much simpler marginal density of the observed data,
 | [A3] |
which ignores the missing-data pattern. In the context of mixed linear models, f(Yobs|
) is just a multivariate normal density, which is particularly straightforward to use for inference, if a mixed model package is available. By contrast, direct manipulation of f(Yobs,R|
,
) requires specialized software (Verbeke and Mohlenberghs, 2000), so that likelihood-based inference is less readily accessible.
Concerning the appropriate likelihood function for inference, Little and Rubin (1987, 2002) popularized the following nomenclature for missing-data patterns.
Missing completely at random (MCAR): R is independent of Yobs and Ymis. In this case,
 | [A4] |
sothat [A2] using [A3], becomes
 | [A5] |
If parameters
and
are disjoint (unrelated), this will yield the same likelihood-based inference on
as f(Yobs|
), i.e., the missing-data mechanism is ignorable. For example, maximizing the likelihood function on the basis of [A5] with respect to
, yields the same estimate as maximizing the likelihood based on f(Yobs|
). Disjointness of
and
is usually a realistic assumption. For example this requires that the mean and the missing data mechanism given the data Y are not determined by common parameters.
Missing at random (MAR): R is independent of Ymis but dependent on Yobs. In this case
 | [A6] |
 | [A7] |
If parameters
and
are disjoint, this will yield the same likelihood-based inference on
as f(Yobs|
), i.e., the missing-data mechanism is ignorable.
Missing not at random (MNAR) (Verbeke and Mohlenberghs, 2000]: R depends on Ymis. In this case, the integration in [A2] does not yield a simple factorization as in [A5] and [A7], and so inference must be based on the full density f(Yobs,R|
,
), i.e., the missing-data mechanism is not ignorable. The missing-data mechanism is then said to be informative, and the associated parameter
must be estimated from the observed data along with
for valid likelihood-based inference. Note for clarity that R is denoted as a random variable whether the missing data pattern itself is denoted as "missing (completely) at random" or not.
 |
APPENDIX B
|
|---|
Here, we give a justification why MAR is ignorable with REML. Under a multivariate Gaussian model E(Y) = Xß and var(Y) = V(
), where
are the variance parameters, the density of the data can be factorized as
 | [B1] |
where
= (
, ß), c(X) = mid;X'X|1\2 is a constant not depending on parameters,
= (X'V1X)1X'V1Y, Z = KY, where K is a matrix of n rank(X) error contrasts defined so that KK' = I X(X'X)1X' and KK' = I, where I is an identity matrix, n is the number of observations in Y, g(
) is the density of
, and h(Z|
) is the density of Z, which depends solely on
, but not on ß (e.g., Diggle et al., 1994). Patterson and Thompson (1971) argued that all information on the variance parameters
is contained in Z, while g(
) provides no information on
, so estimation of
can be based solely on h(Z|
). In the same vein, Harville (1977) pointed out that Z is marginally sufficient for
in the sense of Sprott (1975). Estimation based on h(Z|
) amounts to REML, which may therefore be characterized as "maximizing that part of the likelihood which is invariant to the fixed effects" [Thompson, 1962; Searle et al. (1992), p. 250].
Now if the data are MAR and
and
are disjoint, the theory of Little and Rubin (1987, 2002) shows that inference regarding
can be based on f(Yobs|
). This density can be factorized as in [B1], suggesting that MAR is ignorable when
is estimated by REML, i.e., when only the part of f(Yobs|
) that is invariant to the fixed effects is maximized.
 |
APPENDIX C
|
|---|
Regarding BLUP under a selection model, Henderson (1975] showed that selection is ignorable if it is based on a culling variable w = L'y, such that X'L = 0. This requirement is more stringent than that emerging from the theory of Rubin (1976). For a detailed justification in a likelihood framework, see Gianola et al. (1988). The requirement is met if selection is itself based on BLUP of cultivar effects. To see this, consider a two-stage design, where data (y1) are collected up to a certain point in time, and cultivars are selected on the basis of BLUP using y1. In the second stage, more data (y2) are collected, and after data collection, random effects are again estimated by BLUP. The mixed model for y1 can be written as
 | [C1] |
Selection after the first stage is based on
 | [C2] |
where (X1'V11X1) denotes a generalized inverse of X1'V11X1 (Searle et al., 1992). The model for the data collected after both stages needs to be augmented by additional fixed and random effects (ß2 and u2) pertaining to the second stage, yielding the model
 | [C3] |
with y' = (y1',y2'), ß' = (ß1',ß2'), u' =(u1',u2'), and e' = (e1',e2'). The design matrices need to be augmented accordingly. For the fixed effects, this may be written as
 | [C4] |
where 0 is a null matrix of appropriate dimension. Relative to the model for the full data y, the culling vector w = L'y may now be defined (retrospectively, as it were) by L' = (L1',0), since selection was, in fact, based only on y1. Thus,
 | [C5] |
It follows from the properties of generalized inverses (Searle et al. (1992), p. 449) that
 | [C6] |
from which it emerges that X'L = 0. It may be concluded that selection is ignorable for BLUP at the second stage. Extension of this derivation to multi-stage selection is straightforward.
 |
ACKNOWLEDGMENTS
|
|---|
We thank the Bundessortenamt (Hannover, Germany) for providing the wheat and oilseed rape data. We also thank the referees for useful comments. One referee drew our attention to the relationship between REML and empirical Bayes with a flat prior on the fixed effects.
Received for publication April 28, 2005.
 |
REFERENCES
|
|---|
- Aleta, N., A. Ninot, and J. Voltas. 2004. Retrospective evaluation of parental selection in nursery tests of Juglans regia L. using a mixed model analysis. Silvae Genet. 53:2633.
- Atkinson, A.C. 1985. Plots, transformations and regression. Clarendon Press, Oxford, England.
- Cantet, R.J.C., A.N. Birchmeier, M.G. Santos-Cristal, and V.S. de Avila. 2000. Comparison of restricted maximum likelihood and Method R for estimating heritability and predicting breeding value under selection. J. Anim. Sci. 78:25542560.[Abstract/Free Full Text]
- Cox, D.R., and D.V. Hinkley. 1974. Theoretical statistics. Chapman and Hall, London.
- da Costa, R.B., M.D.V. de Resende, P.S. Goncalves, and M.A. Silva. 2002. Individual multivariate REML/BLUP in the presence of genotype x environment interaction in rubber tree (Hevea) breeding. Crop Breed. Appl. Biotechnol. 2:131139.
- Diggle, P., K.Y. Liang, and S.L. Zeger. 1994. Analysis of longitudinal data. Clarendon Press, Oxford.
- Diggle, P., and M.G. Kenward. 1994. Informative drop-out in longitudinal data analysis. Appl. Statist. 43:4993 (with discussion).[CrossRef]
- Duangjinda, M., I. Misztal, J.K. Bertrand, and S. Tsurutu. 2001. The empirical bias of estimates by restricted maximum likelihood, Bayesian method, and Method R under selection for additive, maternal, and dominance models. J. Anim. Sci. 79:29912996.[Abstract/Free Full Text]
- Eagles, H.A., and D.B. Moody. 2004. Using unbalanced data from a barley breeding program to estimate gene effects: Ha2, Ha4, and sdw1 genes. Aust. J. Agric. Res. 55:379387.[CrossRef]
- Gianola, D., S. Im, and R.L. Fernando. 1988. Prediction of breeding value under Hendersons selection model: A revisitation. J. Dairy Sci. 70:27902798.
- Gilmour, A.R., R. Thompson, and B.R. Cullis. 1995. Average information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51:14401450.
- Gilmour, A.R., B.R. Cullis, S.J. Welham, and R. Thomson. 1999. ASREML Reference Manual. NSW Agriculture Biometric Bulletin No.3. NSW Agriculture, Orange, NSW, Australia.
- Harville, D.A. 1976. Extension of the Gauss-Markov theorem to include the estimation of random effects. Ann. Statist. 4:384395.
- Harville, D.A. 1977. Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Statist. Assoc. 72:320340.[CrossRef][Web of Science]
- Henderson, C. R. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423447.[CrossRef][Web of Science][Medline]
- Henderson, C.R., and R.L. Quaas. 1976. Multiple trait evaluation using relatives records. J. Anim. Sci. 43:11881197.[Abstract/Free Full Text]
- Im, S., R.L. Fernando, and D. Gianola. 1989. Likelihood inferences in animal breeding under selection: A missing-data theory view point. Genet. Sel. Evol. (Paris) 21:399414.
- Little, R.J.A. 1995. Modelling the drop-out mechanism in repeated measures studies. J. Am. Statist. Assoc. 90:11121121.[CrossRef][Web of Science]
- Little, R.J.A., and T. Raghunathan. 1999. On summary measures analysis of the linear mixed effects model for repeated measures when data are not missing completely at random. Statist. Med. 18:24652478.[CrossRef]
- Little, R.J.A., and D.B. Rubin. 1987. Statistical analysis with incomplete data. John Wiley & Sons, New York.
- Little, R.J.A., and D.B. Rubin. 2002. Statistical analysis with incomplete data. 2nd ed. John Wiley & Sons, New York.
- Longford, N.T. 1993. Random coefficient models. Oxford Univ. Press, Oxford, England.
- Mrode, R.A. 1998. Linear models for the prediction of animal breeding values. CAB International, Wallingford, England.
- Panter, D.M., and F.L. Allen. 1995. Using best linear unbiased predictions to enhance breeding for yield in soybean. 2. Selection of superior crosses from a limited number of yield trials. Crop Sci. 35:405410.[Abstract/Free Full Text]
- Patterson, H.D. 1997. Analysis of series of variety trials. p. 139161. In R. A. Kempton, and P. N. Fox (ed.) Statistical methods for plant variety evaluation. Chapman and Hall, London.
- Patterson, H.D., and R. Thompson. 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58:545554.[Abstract/Free Full Text]
- Persson, T., and B. Andersson. 2004. Accuracy of single- and multiple-trait REML evaluation of data including non-random missing records. Silvae Genet. 53:135139.
- Piepho, H.P. 1994. Best linear unbiased prediction (BLUP) for regional yield trials: A comparison to additive main effects multiplicative interaction (AMMI) analysis. Theor. Appl. Genet. 89:647654.
- Piepho, H.P. 1998. Empirical best linear unbiased prediction in cultivar trials using factor analytic variance-covariance structures. Theor. Appl. Genet. 97:195201.[CrossRef][Web of Science]
- Piepho, H.P. 1999. Stability analysis using the SAS system. Agron. J. 91:154160.[Abstract/Free Full Text]
- Piepho, H.P., and J. Möhring. 2005. Best linear unbiased prediction for subdivided target regions. Crop Sci. 45:11511159.[Abstract/Free Full Text]
- Purba, A.R., A. Flori, L. Baudouin, and S. Hamon. 2001. Prediction of oil palm (Elaeis guineesis, Jacq.) agronomic performances using best linear unbiased prediction (BLUP). Theor. Appl. Genet. 102:787792.[CrossRef]
- Rubin, D.B. 1976. Inference and missing data. Biometrika 63:581592.[Abstract/Free Full Text]
- SAS Institute, Inc. 1999. SAS/STAT users guide. Version 8. SAS Institute, Cary, NC.
- Schabenberger, O., and F.J. Pierce. 2002. Contemporary statistical models for the plant and soil sciences. CRC Press, Boca Raton, FL.
- Schenkel, F.S., L.R. Schaeffer, and P.J. Boettcher. 2002. Comparison between estimation of breeding values and fixed effects using Bayesian and empirical BLUP estimation under selection on parents and missing pedigree information. Genet. Sel. Evol. (Paris) 34:4159.[CrossRef]
- Searle, S.R., G. Casella, and C.E. McCulloch. 1992. Variance components. John Wiley & Sons, New York.
- Silva, J.C.E., H. Wellendorf, and N.M.G. Borralho. 2000. Prediction of breeding values and expected genetic gains in diameter growth, wood density and spiral grain from parental selection in Picea abies (L.) KARST. Silvae Genet. 49:102109.
- Smith, A., B. R. Cullis, and R. Thompson. 2001. Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57:11381147.[CrossRef][Web of Science][Medline]
- Sprott, D.A. 1975. Marginal and conditional sufficiency. Biometrika 62:599605.[Abstract/Free Full Text]
- Thompson, W.A. 1962. The problem of negative estimates of variance components. Ann. Math. Stat. 33:273289.
- Van Tassel, C.P., G. Casella, and E.J. Pollok. 1995. Effects of selection on estimates of variance components using Gibbs sampling and restricted maximum likelihood. J. Dairy Sci. 78:678692.[Abstract]
- Verbeke, G., and G. Mohlenberghs. 2000. Linear mixed models for longitudinal data. Springer, Berlin.
- Yan, W.K., and I. Rajcan. 2003. Prediction of cultivar performance based on single- versus multiple-year tests in soybean. Crop Sci. 43:549555.[Abstract/Free Full Text]