

     


Published online 21 November 2006
Published in Crop Sci 46:2654-2665 (2006)
© 2006 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING & GENETICS

A Bayesian Approach for Assessing the Stability of Genotypes

José Miguel Cotes (a), José Crossa (b,*), Adhemar Sanches (c), and Paul L. Cornelius (d)

a Dep. de Ciencias Agronómicas, Facultad de Ciencias Agropecuarias, Univ. Nacional de Colombia, Calle 59A No 63– 20 B11 101-07, Medellín, Colombia
b Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600, Mexico DF, Mexico
c Faculdade de Ciências Agrárias e Veterinárias, Univ. Estadual Paulista ’Julio de Mesquita Filho’ Câmpus de Jaboticabal, Via de Acesso Prof.Paulo Donato Castellane s/n 14884-900 Jaboticabal, SP, Brazil
d Dep. of Plant and Soil Sciences and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40546-0312, USA

* Corresponding author (j.crossa@cgiar.org)


    ABSTRACT
 
Several statistical models can be used for assessing genotype x environment interaction (GEI) and studying genotypic stability. The objectives of this research were to show how (i) to use Bayesian methodology for computing Shukla's phenotypic stability variance and (ii) to incorporate prior information on the parameters for better estimation. Potato [Solanum tuberosum subsp. andigenum (Juz. & Bukasov) Hawkes], wheat (Triticum aestivum L.), and maize (Zea mays L.) multi environment trials (MET) were used for illustrating the application of the Bayes paradigm. The potato trial included 15 genotypes, but prior information for just three genotypes was used. The wheat trial used prior information on all 10 genotypes included in the trial, whereas for the maize trial, noninformative priors for the nine genotypes were used. The maize MET, with 20 sites, gave less dispersed posterior distributions of the genotypic means than did the other METs, which included fewer environments. The Bayesian approach allows use of other statistical strategies such as the truncated normal distribution (used in this study). When analyzing grain yield, a lower bound of zero and an upper bound set by the researcher's experience can be used. The Bayesian paradigm offers plant breeders the possibility of computing the probability of a genotype being the best performer. The results of this study show that although some genotypes may have a very low probability of being the best in all sites, they have a relatively good chance of being among the five highest yielding genotypes.

Abbreviations: GEI, genotype x environment interaction • HPD, highest posterior density region • MCMC, Markov Chain Monte Carlo • MET, multi environment trials • ShV, Shukla stability variance


    INTRODUCTION
 
MULTI ENVIRONMENT TRIALS are important in agronomy and plant breeding studies because they aim (i) to estimate and predict yield with high precision, (ii) to determine yield stability and patterns of response across environments, and (iii) to select the best genotype to be sown in future years and at new localities (Crossa, 1990). Several statistical models can be used to assess genotype x environment interaction (GEI) and yield stability (Crossa, 1990; DeLacy et al., 1996); different models are appropriate for exploring different aspects of the data. There is a vast literature on the statistical modeling of GEI and on assessing yield stability in the context of METs in plant breeding.

Edwards and Jannink (2006) recently pointed out, in the context of METs, the importance of modeling heterogeneity of within-site error variances as well as genotypic and genotype x environment interaction variances. The usual regression on the mean stability method of Eberhart and Russell (1966) uses the variance due to deviation from regression as the stability parameter. Genotype x environment interaction variances such as that proposed by Shukla (1972) can be used as a measure of genotypic stability. The statistical models commonly used for computing these variance stability parameters when assessing GEI adopt a frequentist (or classical) approach. Recently, Edwards and Jannink (2006) used a Bayesian approach that considers, within the context of a MET, a linear model with heterogeneity of within-site error variances and GEI variances. Those authors found that considering heterogeneity of variances decreased error when estimating genotypic as well as GEI variances.

Bayesian inference methodology represents a flexible option for modeling GEI and studying yield stability. However, very little Bayesian inference for assessing GEI in the context of METs and for selecting genotypes has been done. A Bayesian approach to the analysis of incomplete genotype x environment x year data in the context of prediction from METs can be found in Theobald et al. (2002). Foucteau and Denis (2001) used a sequential Bayesian approach for cultivar recommendation in the context of integrating different METs. A Bayesian method for determining the number of important interaction components in an Additive Main effects and Multiplicative Interaction (AMMI) model and for deriving conditional posterior means of singular vectors was presented by Viele and Srinivasan (1999).

Bayes' theorem in probability theory states that for two random variables $Y$ and $\theta$, $P(\theta, Y) = P(\theta \mid Y)P(Y) = P(Y \mid \theta)P(\theta)$, where $P(\theta, Y)$ is the joint probability of $\theta$ and $Y$, and $P(Y)$ and $P(\theta)$ are the marginal probabilities of $Y$ alone and $\theta$ alone, respectively. Also, $P(\theta \mid Y)$ and $P(Y \mid \theta)$ are the conditional probabilities of $\theta$ given $Y$ and of $Y$ given $\theta$, respectively. It follows that $P(\theta \mid Y) = P(Y \mid \theta)P(\theta)/P(Y)$. If $Y$ and $\theta$ are continuous random variables, this result can be expressed in terms of their probability density functions, i.e., $f(\theta \mid y) = f(y \mid \theta)f(\theta)/f(y)$, where, by the law of total probability, $f(y) = \int_{\Theta} f(y \mid \theta)f(\theta)\,d\theta$, with the integration taken over the entire parameter space $\Theta$. For any given experimental data set, $f(y)$ is a constant that does not depend on $\theta$. Thus, $f(\theta \mid y)$ is proportional to $f(y \mid \theta)f(\theta)$. In the context of a Bayesian analysis of experimental data, $y$ denotes the response variable, $\theta$ denotes the parameter or vector of parameters on which the distribution of $y$ depends, $f(\theta)$ is the density function for the "prior" distribution of $\theta$, and $f(\theta \mid y)$ is the density function for the "posterior" distribution of $\theta$ given $y$.
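
As a small illustration of how the posterior combines prior and likelihood, the following Python sketch updates a normal prior on a single genotypic mean with new trial data, assuming the error variance is known. The function name and all numbers are hypothetical and are not taken from the trials in this study.

```python
def normal_posterior(prior_mean, prior_var, data_mean, error_var, n_obs):
    """Posterior mean and variance of a normal mean with known error variance.

    Prior:      theta ~ N(prior_mean, prior_var)
    Likelihood: mean of n_obs observations ~ N(theta, error_var / n_obs)
    """
    prior_precision = 1.0 / prior_var
    data_precision = n_obs / error_var
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * data_mean)
    return post_mean, post_var


# Hypothetical example: an informative prior versus a nearly flat prior.
informative = normal_posterior(prior_mean=20.0, prior_var=4.0,
                               data_mean=24.0, error_var=16.0, n_obs=4)
vague = normal_posterior(prior_mean=0.1, prior_var=1e8,
                         data_mean=24.0, error_var=16.0, n_obs=4)
print(informative)  # (22.0, 2.0)
print(vague)        # posterior mean is essentially the data mean
```

With the informative prior the posterior mean lies between the prior mean and the data mean; with the nearly flat prior the data dominate.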

Thus, the information obtained from the data of the current experiment, which is expressed in the likelihood function, is combined with the prior distribution, which reflects the original knowledge on some or all of the parameters, to give the final posterior distribution of the parameters given the data, i.e., $f(\theta \mid y)$. This posterior distribution summarizes all information about the parameters, and marginal distributions for each parameter can be obtained from it. It is reasonable to specify an interval in which most of the posterior distribution lies. The highest posterior density region (HPD) is defined as an interval having the following property: any point inside the interval has a greater probability density than any point outside the interval; it is therefore, at a given probability, the shortest interval (Box and Tiao, 1973). Points inside the HPD thus have a higher posterior density than those outside it.
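
The "shortest interval" property suggests a simple way to approximate an HPD interval from posterior draws. The sketch below is a minimal Python illustration, assuming a unimodal marginal posterior (so that the HPD region is a single interval); the function name and the draws are hypothetical.

```python
import math


def hpd_interval(samples, prob=0.90):
    """Shortest interval that contains a fraction `prob` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of draws the interval must cover (small guard against rounding).
    n_inside = max(1, math.ceil(prob * n - 1e-9))
    best_lo, best_hi = ordered[0], ordered[n_inside - 1]
    # Slide a window of n_inside consecutive order statistics; keep the shortest.
    for start in range(1, n - n_inside + 1):
        lo, hi = ordered[start], ordered[start + n_inside - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Hypothetical right-skewed draws, e.g., of a variance component.
draws = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.7, 1.2, 3.0]
print(hpd_interval(draws, prob=0.90))  # (0.1, 1.2)
```

For skewed posteriors such as those of variance components, this interval is not symmetric around the mean, which is why it differs from an equal-tailed interval.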

A common problem in most of the METs of breeding programs is incomplete data, caused by "natural" loss of experimental units or by the fact that certain genotypes can be evaluated in only some of the MET environments because of seed shortage, etc. (Kang and Magari, 1996; Magari and Kang, 1997; Piepho, 1999). For this reason, alternatives to ordinary least squares estimation have been used, the most popular being the mixed linear model approach with the Restricted Maximum Likelihood (REML) estimation method. This has been shown to be the most adequate method for variance component estimation when the data are unbalanced (Kang and Magari, 1996). However, the estimation of variance components has many associated problems, including negative variance estimates, the difficulty of obtaining confidence intervals, and the invalidity of those intervals when the data are not normally distributed. The Bayesian inference methodology represents an option for the analysis of METs because it avoids some of these problems, such as negative variance component estimates; it deals appropriately with incomplete genotype x environment data sets; and it integrates into the analysis the heterogeneity of within-environment error variances. However, the use of Bayesian inference for assessing GEI in the context of METs and for selecting genotypes has been very limited.

The aim of this research is to show the use of Bayesian methodology for assessing the stability of genotypes from three regional trials with different levels of prior information. The Bayesian methodology computes Shukla's phenotypic stability variance and shows how to incorporate prior information on the parameters for better estimation. This study uses potato, wheat, and maize METs for illustrating the application of the Bayes paradigm.


    MATERIALS AND METHODS
 
Shukla Stability Variance
Shukla stability variance (ShV) measures the consistency of a genotype's performance across several environments; it can be considered equivalent to the variance due to deviations from regression when regressing the mean of a specific genotype on the site means (Shukla, 1972). The ShV partitions the GEI variance into components assignable to each genotype and measures the contribution of each genotype to the GEI. When data are unbalanced, $\mathrm{ShV}_k$ for $k = 1, 2, \ldots, g$ genotypes cannot be computed, but an analogous quantity given by the GEI variance component for each genotype, $\sigma^2_{GEI(k)}$, can be obtained. The degrees of freedom associated with $\mathrm{ShV}_k$ are $a - 1$ (where $a$ is the number of environments). Genotypes that have small ShV values are considered stable.

The GEI analysis and genotypic selection can focus only on highly productive genotypes, only on stable genotypes based on phenotypic performance, or on both criteria simultaneously (high yield and phenotypic stability). The rank sum method proposed by Kang (1988) combines two rankings of the genotypes: one by mean yield, after adjusting for Least Significant Differences (the highest mean receives rank 1), and one by Shukla stability variance (the lowest variance receives rank 1); the genotype with the lowest rank sum is the most suitable. Furthermore, the yield-stability method of Kang (1993) is intended to select simultaneously for both yield and stability.
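
The rank sum idea can be sketched in a few lines of Python. This is a simplified illustration only: it ranks raw means without the Least Significant Difference adjustment, it does not handle ties, and the genotype labels and values are hypothetical.

```python
def rank_sum(mean_yield, stability_variance):
    """Simplified rank sum: rank 1 = highest mean, rank 1 = lowest variance."""
    genotypes = list(mean_yield)
    by_yield = sorted(genotypes, key=lambda g: -mean_yield[g])
    by_stability = sorted(genotypes, key=lambda g: stability_variance[g])
    yield_rank = {g: r for r, g in enumerate(by_yield, start=1)}
    stab_rank = {g: r for r, g in enumerate(by_stability, start=1)}
    return {g: yield_rank[g] + stab_rank[g] for g in genotypes}


# Hypothetical mean yields (Mg/ha) and stability variances.
means = {"A": 5.2, "B": 4.8, "C": 5.0}
shv = {"A": 0.60, "B": 0.30, "C": 0.10}
totals = rank_sum(means, shv)
print(totals)                       # {'A': 4, 'B': 5, 'C': 3}
print(min(totals, key=totals.get))  # C
```

Here genotype C is preferred: it is not the highest yielding, but its combination of yield rank and stability rank is the lowest.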

Mixed Linear Model for Multi Environment Trials
Assume a MET with $a$ environments ($i = 1, 2, \ldots, a$), $r_i$ blocks in the $i$th environment ($j = 1, 2, \ldots, b$, where $b = \sum_i r_i$ is the total number of blocks), $g$ genotypes ($k = 1, 2, \ldots, g$), $n_i$ observations in the $i$th environment, $m_k$ environments in which the $k$th genotype was evaluated, and $n = \sum_i n_i$ total observations. Thus, for the $n \times 1$ vector of observations $y$, the following mixed model is defined:

\[ y = X\beta + Z_1 u_1 + Z_2 u_2 + \sum_{k=1}^{g} Z_{3(k)} u_{3(k)} + e \tag{1} \]
where $\beta = [\mu_1, \ldots, \mu_g]'$, $u_1$, $u_2$, and $u_{3(k)}$ are, respectively, the vectors of fixed effects of genotypes, random effects of environments, random effects of blocks within environments, and random GEI effects involving the $k$th genotype; $X$ ($n \times g$), $Z_1$ ($n \times a$), $Z_2$ ($n \times b$), and $Z_{3(k)}$ ($n \times m_k$) are the corresponding incidence matrices for the effects in these vectors. The residual vector is $e = [e_1', \ldots, e_a']'$ ($n \times 1$), where $e_i$ is the $n_i \times 1$ vector of residual effects (experimental error) of the $i$th environment. In this research, we used normal priors for both the fixed and random effects of mixed model [1] in the Bayesian inference.

Bayesian Prior and Posterior Density of the Parameters of the Mixed Model
The prior distributions for the vectors of random effects $u_1$, $u_2$, $u_{3(k)}$ ($k = 1, 2, \ldots, g$), and $e_i$ ($i = 1, 2, \ldots, a$) are normal distributions with zero mean and variances $\sigma^2_{u_1}$, $\sigma^2_{u_2}$, $\sigma^2_{u_{3(k)}}$, and $\sigma^2_{e_i}$, respectively. The ShV are the variances of the GEI for the genotypes, which we denote as $\sigma^2_{u_{3(1)}}, \ldots, \sigma^2_{u_{3(g)}}$ (Kang and Magari, 1996; Piepho, 1999). Thus, the vector of parameters is $\theta = [\beta', u_1', u_2', u_{3(1)}', \ldots, u_{3(g)}', \sigma^2_{e_1}, \ldots, \sigma^2_{e_a}]'$, and the conditional distribution of $y$ given $\theta$ is the multivariate normal distribution

\[ y \mid \beta, u_1, u_2, u_{3(1)}, \ldots, u_{3(g)}, \sigma^2_{e_1}, \ldots, \sigma^2_{e_a} \sim N\Big( X\beta + Z_1 u_1 + Z_2 u_2 + \sum_{k=1}^{g} Z_{3(k)} u_{3(k)},\; R \Big) \tag{6} \]
where $R = \mathrm{diag}\{\sigma^2_{e_i} I_{n_i}\}$ is block diagonal and $I_{n_i}$ is the identity matrix of order $n_i$, the number of observations in the $i$th environment. We further need to define prior distributions for $\beta$, as well as prior distributions or prior values for the variance components.

  1. The vector of genotypic effects was assumed to have a truncated multivariate normal distribution (all genotype effects nonnegative); the nonnegativity constraint was imposed to ensure that genotypic mean yields are never negative. Thus, the prior for the vector of genotype effects is given by

    \[ \beta \mid B_0, \sigma^2_0 \sim N(B_0, I_g \sigma^2_0), \quad \beta \geq 0 \tag{7} \]
    where $B_0$ is the prior vector of the genotype effects, namely genotypic means, and $I_g \sigma^2_0$ is the prior variance matrix of the genotypic means.

  2. The vectors of environment effects, of effects of replicates within environments, and of GEI effects of each genotype have priors $u_m \mid \sigma^2_{u_m} \sim N(0, I_{q_m} \sigma^2_{u_m})$ for $m = 1, 2, 3(1), \ldots, 3(g)$, where $q_m$ and $\sigma^2_{u_m}$ are the number of levels and the variance of the effects $u_m$, respectively.
    Priors used for the variance components $\sigma^2_{u_1}$, $\sigma^2_{u_2}$, $\sigma^2_{u_{3(k)}}$ ($k = 1, 2, \ldots, g$), and $\sigma^2_{e_1}, \ldots, \sigma^2_{e_a}$ were scaled inverse chi-square distributions (Scaled-Inv.-$\chi^2$) (Gelman et al., 1995). Each Scaled-Inv.-$\chi^2$ has two hyperparameters that can be obtained from experimental data or from the researcher's own experience, i.e.,

    \[ \sigma^2_{u_m} \mid \nu_{u_m}, s^2_{u_m} \sim \text{Scaled-Inv.-}\chi^2(\nu_{u_m}, s^2_{u_m}) \tag{8} \]
    where $\nu_{u_m}$ and $s^2_{u_m}$ are the degrees of belief and the variance (scale factor) for $u_m$, respectively. For $m = 1$, $s^2_{u_1}$ is the prior variance of environments, and for $m = 2$, $s^2_{u_2}$ is the prior variance of replicates within environments; for $m = 3(1), 3(2), \ldots, 3(g)$, $s^2_{u_{3(1)}}, s^2_{u_{3(2)}}, \ldots, s^2_{u_{3(g)}}$ denote the prior ShV for genotypes $1, 2, \ldots, g$, respectively.

  3. The prior for the within-site error variances (residuals) is

    \[ \sigma^2_{e_i} \mid \nu_{e_i}, s^2_{e_i} \sim \text{Scaled-Inv.-}\chi^2(\nu_{e_i}, s^2_{e_i}), \quad i = 1, 2, \ldots, a \tag{9} \]
    where $\nu_{e_i}$ is named, following Wang et al. (1993), the degrees of belief, and $s^2_{e_i}$ is the prior value for the error variance, $\sigma^2_{e_i}$, within the $i$th environment. Edwards and Jannink (2006) have shown another Bayesian approach for modeling heterogeneity of within-site error variance by using a generalized linear model with a natural-log link function.
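
To give a feel for the scaled inverse chi-square priors listed above, the following Python sketch draws from such a distribution using only the standard library. It relies on the standard representation that $\nu s^2 / X$ with $X \sim \chi^2_{\nu}$ follows a Scaled-Inv.-$\chi^2(\nu, s^2)$ distribution; the function name is hypothetical, and the example values (29.1097 with 27 degrees of freedom) are the prior within-site error variance reported for the potato MET.

```python
import random


def draw_scaled_inv_chi2(nu, s2, rng=random):
    """One draw from Scaled-Inv-chi2(nu, s2), i.e., nu * s2 / X, X ~ chi2(nu)."""
    # A chi-square with nu degrees of freedom is a Gamma(shape=nu/2, scale=2).
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return nu * s2 / chi2


rng = random.Random(1)  # fixed seed so the illustration is reproducible
draws = [draw_scaled_inv_chi2(nu=27, s2=29.1097, rng=rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 1))  # close to nu * s2 / (nu - 2), about 31.4
```

Every draw is strictly positive, which is one reason this family is convenient as a prior for variance components.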

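Parameters $B_0$, $\sigma^2_0$, $\nu_{u_m}$, $s^2_{u_m}$, $\nu_{e_i}$, and $s^2_{e_i}$ are called hyperparameters because they are not estimated but rather obtained from other experiments performed during previous stages of the crop improvement breeding program or from the breeder's previous experience and knowledge. Therefore, the joint prior distribution of the parameters is proportional to

\[ f(\beta \mid B_0, \sigma^2_0) \prod_{m} f(u_m \mid \sigma^2_{u_m})\, f(\sigma^2_{u_m} \mid \nu_{u_m}, s^2_{u_m}) \prod_{i=1}^{a} f(\sigma^2_{e_i} \mid \nu_{e_i}, s^2_{e_i}) \tag{10} \]
which can be more completely expressed as

\[ \exp\left\{ -\frac{(\beta - B_0)'(\beta - B_0)}{2\sigma^2_0} \right\} \prod_{m} (\sigma^2_{u_m})^{-(q_m + \nu_{u_m} + 2)/2} \exp\left\{ -\frac{u_m'u_m + \nu_{u_m} s^2_{u_m}}{2\sigma^2_{u_m}} \right\} \prod_{i=1}^{a} (\sigma^2_{e_i})^{-(\nu_{e_i} + 2)/2} \exp\left\{ -\frac{\nu_{e_i} s^2_{e_i}}{2\sigma^2_{e_i}} \right\}, \quad \beta \geq 0 \tag{11} \]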
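The likelihood function of the data is proportional to the following quantity

\[ f(y \mid \theta) \propto |R|^{-1/2} \exp\left\{ -\tfrac{1}{2} \Big( y - X\beta - Z_1 u_1 - Z_2 u_2 - \sum_{k=1}^{g} Z_{3(k)} u_{3(k)} \Big)' R^{-1} \Big( y - X\beta - Z_1 u_1 - Z_2 u_2 - \sum_{k=1}^{g} Z_{3(k)} u_{3(k)} \Big) \right\} \tag{12} \]
Therefore, the posterior joint distribution is proportional to the prior distribution multiplied by the likelihood function, i.e.,

\[ f(\theta, \sigma^2_{u_1}, \sigma^2_{u_2}, \sigma^2_{u_{3(1)}}, \ldots, \sigma^2_{u_{3(g)}} \mid y) \propto f(y \mid \theta)\, f(\beta \mid B_0, \sigma^2_0) \prod_{m} f(u_m \mid \sigma^2_{u_m})\, f(\sigma^2_{u_m} \mid \nu_{u_m}, s^2_{u_m}) \prod_{i=1}^{a} f(\sigma^2_{e_i} \mid \nu_{e_i}, s^2_{e_i}) \tag{2} \]
Note that matrix $R$ comes from the previously defined conditional distribution of $y$ given $\theta$, $f(y \mid \theta)$.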

Multi Environment Trial Data
Three types of METs were used for illustration: a potato trial performed by the Potato Plant Breeding Program of the Universidad Nacional de Colombia, plus a wheat international trial and a maize international trial performed by the CIMMYT Wheat and Maize Programs, respectively. The regional potato MET included 10 environments and 15 genotypes; the response variable was yield, measured in Mg ha⁻¹. Four cultivars were included in only seven environments because of a seed shortage, and 11 genotypes were evaluated in all 10 environments. Genotypes G12 through G15 were used as checks. The mixed linear model [1] allowed for heterogeneity of within-site error variance. Means and variances of check genotypes G12 through G15, which were included in other regional potato trials conducted in previous years, were used as prior information.

The wheat MET had 10 advanced bread wheat lines evaluated in five contrasting environments in two consecutive years; a randomized complete block design with two replicates was used in each of the five environments. This MET was balanced in the sense that all genotypes were planted in the five environments in both years. The response variable was grain yield in Mg ha⁻¹. Prior information for this experiment was the performance of the 10 genotypes evaluated in the five environments of the first year.

The maize MET consisted of nine genotypes evaluated in 20 international environments. A randomized complete block design with four replicates was used in each site. In this experiment no prior information was available.

Considerations for Assigning Values to the Bayesian Prior Distribution
The $B_0$ vector and the $I_g\sigma_0$ matrix contain the prior mean and its standard error for each genotype. When prior information is not available, a low value for $B_0$ (e.g., $B_0 = 0.1$) and a high value for $\sigma^2_0$ (e.g., $\sigma^2_0 = 1 \times 10^8$) may be used to indicate that we have no information on the parameter. We use this setting as a noninformative prior because it remains a proper distribution and therefore always yields a proper posterior distribution.

As previously mentioned, for the $\sigma^2_{u_m}$ parameters, we provided prior information through two hyperparameters, $\nu_{u_m}$ and $s^2_{u_m}$. Values of $\nu_{u_m}$ denote the degrees of belief and give information about the spread of the distribution, i.e., as $\nu_{u_m}$ increases, the prior becomes more concentrated around $s^2_{u_m}$ and its spread decreases, and vice versa. For example, plant breeders can translate their prior knowledge or belief about genotypic means or variances into numerical values that will make the distribution of these means or variances more or less dispersed. When the degree of belief is obtained from experimental data, it corresponds to the degrees of freedom (used in the usual frequentist statistical context). The other parameter of the scaled inverse chi-square distribution is the prior variance value ($s^2_{u_m}$), or scale factor. When prior information on $\sigma^2_{u_m}$ is not available, low values for $\nu_{u_m}$ and $s^2_{u_m}$ should be considered to avoid improper posterior distributions, as reported by Hobert and Casella (1996). A posterior distribution is said to be improper when it integrates to infinity (i.e., does not integrate to unity); improper posterior distributions can occur only when improper priors are used (Box and Tiao, 1973). Similar considerations apply when assigning prior values to the within-site error (residual) variances, $\sigma^2_{e_i}$.
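
For reference, the first two moments of the scaled inverse chi-square distribution make the role of the two hyperparameters explicit. These are standard results for this distribution (not formulas quoted from this article), written here for a generic variance $\sigma^2$ with hyperparameters $\nu$ and $s^2$:

```latex
\[
E(\sigma^2) = \frac{\nu}{\nu - 2}\, s^2 \quad (\nu > 2),
\qquad
\operatorname{Var}(\sigma^2) = \frac{2\nu^2}{(\nu - 2)^2 (\nu - 4)}\, s^4 \quad (\nu > 4).
\]
```

As an illustration, with the potato prior for the within-site error variance ($s^2 = 29.1097$, $\nu = 27$), the prior mean is about 31.4 and the prior standard deviation is about 9.3. With $\nu \le 2$ the prior mean does not exist, and with $\nu \le 4$ the prior variance does not exist, which is the sense in which a prior with 1 degree of belief carries very little information.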

For the potato MET, prior information on the performance of check genotypes G12 through G15 was obtained from seven regional potato trials conducted earlier. Prior information on the means of the checks (G12–G15) and their standard errors is shown in Table 1. The small mean (0.1) and the large standard error (10 000) of nonchecks G1 through G11 reflect the lack of prior information. Prior information on the ShV of the checks (G12 = 4.9498, G13 = 68.0346, G14 = 19.5727, and G15 = 6.0400), with 6 degrees of freedom, was obtained from previous trials, whereas no information was available for the remaining genotypes; for these, the assigned degrees of freedom were 1, and the value for the scale factor (Shukla variance) was small (0.01). The degrees of freedom and variance components for environments and replicates within environments from the previous seven regional trials were used. A homogeneous variance (29.1097 with 27 degrees of freedom) was used as prior information for the within-site errors; however, heterogeneous within-site error variances, as expressed in the mixed model [1], were computed in the posterior density.



 
Table 1. Prior information in the Bayesian analysis of potato, wheat, and maize METs: means of genotypes ($B_0$) and their standard errors ($\sigma_0$), and degrees of freedom (or belief) ($\nu_{u_m}$) and scale factors (or variances) ($s^2_{u_m}$) for environments (Env.), replicates within environments [Rep(Env.)], genotypes (Shukla variances), and error variances within environments.

 
For the wheat MET, prior information consisted of the genotypic means and standard errors (Table 1) of the 10 genotypes across the five environments in the first year of testing, obtained by REML. Furthermore, for $\sigma^2_{u_m}$, the two hyperparameters $\nu_{u_m}$ and $s^2_{u_m}$ contain prior information from the combined analysis of the five environments of the first year of testing, corresponding to the degrees of freedom and variances of environments and replicates within environments, as well as the ShV of each genotype and the within-site error variance for the five environments of the first year. Genotypes G6 through G8 had large ShV values (0.1221, 0.2453, and 0.6047, respectively), whereas the other genotypes had small ShV values (Table 1). The degrees of freedom and variance components for environments and replicates within environments from the first year of testing were used. With this information, the Bayesian analysis was performed on the five environments used in the second year of testing.

For the maize MET, no prior information was available, so proper noninformative prior distributions were used for computing the posterior marginal distribution of the parameters (Table 1). For the mean of the nine maize genotypes, a value of 0.1 was assigned with a standard error of 10 000. The within-site error variance was set at 0.01 with 1 degree of freedom (Table 1).

Evaluating the Posterior Using the Gibbs Sampler Algorithm
Gibbs sampling (Geman and Geman, 1984) is a numerical integration scheme based on Markov Chain Monte Carlo (MCMC) methods that is commonly used in Bayesian inference to evaluate the multiple integrals needed to produce the marginal posterior distribution of each parameter of interest. An MCMC method draws values of $\theta$ sequentially from approximate distributions, where the distribution of each draw depends only on the previous draw; the sequence of distributions improves at each step and tends to converge on the target posterior distribution, $f(\theta \mid y)$. Once the chain reaches its equilibrium (that is, it converges), the draws are samples from the posterior distribution. The Gibbs algorithm was used to generate samples from the joint posterior distribution [2], which were then used to determine the marginal posterior distributions of the parameters. A large number of samples was taken to characterize the parameters' marginal distributions appropriately.

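To implement the Gibbs algorithm, it is necessary to identify the full conditional distributions of all unknown parameters of the joint posterior distribution [2]. For convenience, the following parameterization was considered to simplify the implementation of the algorithm:

\[ y = W\theta + e, \quad W = [X \;\; Z_1 \;\; Z_2 \;\; Z_{3(1)} \cdots Z_{3(g)}], \quad \theta = [\beta', u_1', u_2', u_{3(1)}', \ldots, u_{3(g)}']' \tag{13} \]
and

\[ \theta \mid \theta_0, D \sim N(\theta_0, D), \quad \theta_0 = [B_0', 0', \ldots, 0']', \quad D = \mathrm{diag}\{ I_g \sigma^2_0,\; I_{q_1} \sigma^2_{u_1},\; I_{q_2} \sigma^2_{u_2},\; I_{q_{3(1)}} \sigma^2_{u_{3(1)}}, \ldots, I_{q_{3(g)}} \sigma^2_{u_{3(g)}} \} \tag{14} \]
where $\theta$ now collects only the location parameters (genotypic means and random effects).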
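Thus, the posterior conditional distribution of $\theta$ is

\[ f(\theta \mid y, R, D) \propto \exp\left\{ -\tfrac{1}{2} (\theta - \hat{\theta})' (W'R^{-1}W + D^{-1}) (\theta - \hat{\theta}) \right\} \tag{15} \]
where $\hat{\theta} = (W'R^{-1}W + D^{-1})^{-1} (W'R^{-1}y + D^{-1}\theta_0)$. Therefore, $\theta$ is distributed conditionally as a multivariate normal, i.e.,

\[ \theta \mid y, R, D \sim N\big( \hat{\theta},\; (W'R^{-1}W + D^{-1})^{-1} \big) \tag{3} \]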
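The posterior conditional distribution for $\sigma^2_{e_i}$ is identified as a scaled inverse chi-square, that is,

\[ \sigma^2_{e_i} \mid y, \theta \sim \text{Scaled-Inv.-}\chi^2(\tilde{\nu}_{e_i}, \tilde{s}^2_{e_i}) \tag{4} \]
where $\tilde{\nu}_{e_i} = \nu_{e_i} + n_i$ and $\tilde{s}^2_{e_i} = (\nu_{e_i} s^2_{e_i} + e_i'e_i)/\tilde{\nu}_{e_i}$, with $e_i$ the residual vector of the $i$th environment evaluated at the current value of $\theta$. Similarly, the posterior conditional distribution for $\sigma^2_{u_m}$ is also identified as a scaled inverse chi-square, that is

\[ \sigma^2_{u_m} \mid u_m \sim \text{Scaled-Inv.-}\chi^2(\tilde{\nu}_{u_m}, \tilde{s}^2_{u_m}) \tag{5} \]
where $\tilde{\nu}_{u_m} = \nu_{u_m} + q_m$ and $\tilde{s}^2_{u_m} = (\nu_{u_m} s^2_{u_m} + u_m'u_m)/\tilde{\nu}_{u_m}$.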
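
The alternation between normal and scaled inverse chi-square full conditionals can be sketched in plain Python. The sketch below is not the authors' SAS/IML program; it is a deliberately simplified illustration under stated assumptions: it works on a complete table of genotype-by-environment cell means, omits blocks, heterogeneous error variances, and the truncation of the genotypic means, and updates one effect at a time rather than drawing the whole location vector at once. All function names and data are hypothetical.

```python
import random


def scaled_inv_chi2(nu, s2, rng):
    # nu * s2 / chi-square(nu); chi-square(nu) is Gamma(shape=nu/2, scale=2).
    return nu * s2 / rng.gammavariate(nu / 2.0, 2.0)


def gibbs_stability(y, n_iter=3000, burn_in=500, b0=0.1, var0=1e8,
                    nu=1.0, s2=0.01, seed=0):
    """Single-site Gibbs sampler for y[k][i] = mu_k + u_i + d_ki.

    mu_k ~ N(b0, var0), u_i ~ N(0, var_u), d_ki ~ N(0, shv_k);
    var_u and every shv_k have Scaled-Inv-chi2(nu, s2) priors.
    Returns the retained draws of genotypic means and stability variances.
    """
    rng = random.Random(seed)
    g, a = len(y), len(y[0])
    mu = [sum(row) / a for row in y]  # start at the raw genotype means
    u = [0.0] * a
    shv = [1.0] * g
    var_u = 1.0
    mu_draws, shv_draws = [], []
    for it in range(n_iter):
        # Genotypic means: normal full conditional (precision weighted).
        for k in range(g):
            prec = 1.0 / var0 + a / shv[k]
            resid_sum = sum(y[k][i] - u[i] for i in range(a))
            mean = (b0 / var0 + resid_sum / shv[k]) / prec
            mu[k] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # Environment effects: normal full conditional with prior mean zero.
        for i in range(a):
            prec = 1.0 / var_u + sum(1.0 / shv[k] for k in range(g))
            mean = sum((y[k][i] - mu[k]) / shv[k] for k in range(g)) / prec
            u[i] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # Variances: scaled inverse chi-square full conditionals.
        ss_u = sum(v * v for v in u)
        var_u = scaled_inv_chi2(nu + a, (nu * s2 + ss_u) / (nu + a), rng)
        for k in range(g):
            ss = sum((y[k][i] - mu[k] - u[i]) ** 2 for i in range(a))
            shv[k] = scaled_inv_chi2(nu + a, (nu * s2 + ss) / (nu + a), rng)
        if it >= burn_in:
            mu_draws.append(list(mu))
            shv_draws.append(list(shv))
    return mu_draws, shv_draws


# Hypothetical cell means (rows: three genotypes; columns: six environments).
yields = [
    [4.1, 5.4, 5.0, 6.1, 4.4, 5.0],    # stable, low yielding
    [5.9, 7.6, 7.1, 7.9, 6.6, 6.9],    # stable, intermediate
    [6.5, 11.0, 8.0, 12.0, 7.0, 9.5],  # high yielding but unstable
]
mu_draws, shv_draws = gibbs_stability(yields)
post_mu = [sum(d[k] for d in mu_draws) / len(mu_draws) for k in range(3)]
post_shv = [sum(d[k] for d in shv_draws) / len(shv_draws) for k in range(3)]
print(post_mu)
print(post_shv)
```

In this toy data set the third genotype has the highest posterior mean but also by far the largest posterior stability variance, which is the kind of trade-off the ShV is meant to expose.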

The following routine was used for the three MET examples in this study. The first 5000 vectors generated were considered a "burn-in" period, which is necessary for the Gibbs sampling process to reach the equilibrium distribution (Wang et al., 1993). Graphical monitoring was used to determine when the MCMC chain reached the equilibrium stage. After this period, $1 \times 10^6$ vectors were generated, but only one of every 10 vectors was retained to produce the posterior distribution. Thus, samples from the marginal posterior distributions $p(\theta \mid y)$, $p(\sigma^2_{u_m} \mid y)$, and $p(\sigma^2_{e_i} \mid y)$ were obtained from the $1 \times 10^5$ retained vectors by extracting the $\theta$, $\sigma^2_{u_m}$, and $\sigma^2_{e_i}$ components from each of them, respectively.
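
The burn-in and thinning arithmetic can be checked with a short Python sketch. The function name is hypothetical, and which vector within each group of 10 is kept is an assumption (here, the first one).

```python
def thin_chain(chain, burn_in=5000, thin=10):
    """Drop the burn-in draws, then keep one of every `thin` remaining draws."""
    return chain[burn_in::thin]


# Stand-in for the chain: iteration indices instead of parameter vectors.
total_draws = 5000 + 1_000_000
kept = thin_chain(range(total_draws))
print(len(kept))  # 100000
```

Thinning reduces storage and the autocorrelation between retained draws; it does not by itself guarantee convergence, which is why the chain was also monitored graphically.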

Although marginal posterior density estimates may be obtained directly from Gibbs sampling, as reported by Wang et al. (1993), we used the nonparametric Kernel Density Estimation (KDE) method to obtain smooth marginal posterior densities by means of the SAS/KDE procedure (SAS Institute Inc., 1999a). The KDE method is a very quick smoothing technique that centers a normal kernel on each of the data points submitted and averages these kernels; in this case, it allowed smooth graphical representations of the marginal posterior densities of the parameters.
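
A Gaussian kernel density estimate is simple enough to write out directly. The Python sketch below is only an illustration of the technique, not a reproduction of the SAS procedure; in particular, the normal-reference bandwidth rule is an assumption, and the function names and sample values are hypothetical.

```python
import math


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (an assumed default)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of normal kernels on the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


# Hypothetical posterior draws of a genotypic mean.
samples = [1.0, 2.0, 2.5, 4.0]
h = normal_reference_bandwidth(samples)
density = gaussian_kde(samples, h)
print(density(2.0) > density(10.0))  # True: more mass near the draws
```

In practice the density would be evaluated on a grid of values to draw curves like those in the figures of this article.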

The Highest Posterior Density Region (HPD)
From the posterior marginal distributions, the 90% highest posterior density (HPD) region for each parameter was obtained. The HPD is an interval whose points have greater probability density than any point outside the interval. The SAS macro program, written in SAS v8.2 with the Gibbs sampler algorithm programmed in SAS/IML (SAS Institute Inc., 1999b), is available from the first author (jmcotes@unal.edu.co), together with other routines used for computing the 90% HPD for the genotypic means and ShV of each genotype and for calculating the Bayesian Linear Predictor (BLP) using model [1].

As pointed out by Besag and Higdon (1999), from the MCMC it is possible to count the number of times a particular genotype gives the best mean performance and, therefore, to compute the probability of its being the best performer [P(best)] or one of the best q performers for some integer q > 1 (e.g., for q = 5, the probability of being among the best five performers can be obtained). We compute these probabilities to help plant breeders select the best genotypes.
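
The counting procedure described above can be sketched as follows in Python. The function name and the four draws are hypothetical; in a real analysis each draw would be one retained vector of genotypic means from the Gibbs sampler.

```python
def prob_best_and_top(mean_draws, q=5):
    """Estimate P(best) and P(among the top q) for each genotype from draws.

    mean_draws: list of draws; each draw is a list of genotypic means.
    """
    n_draws = len(mean_draws)
    n_gen = len(mean_draws[0])
    best = [0] * n_gen
    top = [0] * n_gen
    for draw in mean_draws:
        # Genotype indices ordered from highest to lowest mean in this draw.
        order = sorted(range(n_gen), key=lambda k: draw[k], reverse=True)
        best[order[0]] += 1
        for k in order[:q]:
            top[k] += 1
    return [b / n_draws for b in best], [t / n_draws for t in top]


# Four hypothetical draws of the means of three genotypes.
draws = [
    [5.0, 6.0, 4.0],
    [5.5, 5.2, 4.1],
    [4.9, 6.3, 5.0],
    [5.1, 6.1, 3.9],
]
p_best, p_top2 = prob_best_and_top(draws, q=2)
print(p_best)  # [0.25, 0.75, 0.0]
print(p_top2)  # [0.75, 1.0, 0.25]
```

The first genotype is rarely the best but is usually among the top two, which mirrors the distinction between P(best) and the probability of being among the top five reported in this study.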


    RESULTS
 
Potato Multi Environment Trials
The prior and posterior densities of the ShV and the genotypic means of the 15 potato genotypes included in the potato MET are shown in Tables 1 and 2, and in Fig. 1a–1f (for the noncheck genotypes G1–G11) and Fig. 2a–2d (for the check genotypes G12–G15).



 
Table 2. Posterior values of the Bayesian analysis for potato, wheat, and maize METs: mean and lower and upper limits of the 90% highest posterior density region (HPD) of the marginal posterior distribution for genotypic means and Shukla variances, probability of each genotype being the best yielding genotype [P(best)], and probability of its being among the five best yielding genotypes [P(∈ top 5)].

 


 
Fig. 1. Estimated densities for potato METs: (a) Prior density of Shukla's variance for genotypes G1 through G11; (b) Prior density of genotypic means (Mg ha⁻¹) for genotypes G1 through G11; (c) Posterior density of Shukla's variances for genotypes G1 through G5; (d) Posterior density of Shukla's variances for genotypes G6 through G11; (e) Posterior density of genotypic means (Mg ha⁻¹) for genotypes G1 through G5; (f) Posterior density of genotypic means (Mg ha⁻¹) for genotypes G6 through G11.

 


 
Fig. 2. Estimated densities for potato METs: (a) Prior density of genotypic means (Mg ha⁻¹) for check genotypes G12 through G15; (b) Prior density of Shukla's variances for check genotypes G12 through G15; (c) Posterior density of the genotypic means (Mg ha⁻¹) for check genotypes G12 through G15; (d) Posterior density of Shukla's variances for check genotypes G12 through G15.

 
Genotypes with noninformative priors (i.e., G1–G11) had the scaled inverse $\chi^2$ prior distribution of their ShV concentrated around zero (Fig. 1a). The small values given to the degrees of freedom and ShV for genotypes G1 through G11 (i.e., 1 and 0.01, respectively, Table 1) precluded improper posteriors. Because of the low values given to the genotypic means of G1 through G11 and their large scale factors (standard error of the genotypic mean of 10 000), their noninformative prior densities are centered near zero (Fig. 1b). The only prior information available for the potato MET is on the check genotypes G12 through G15; this is reflected in the average performance of the genotypes in Fig. 2a (G12 = 15.181, G13 = 24.317, G14 = 21.378, and G15 = 15.260, Table 1) and the ShV in Fig. 2b (G12 = 4.9498, G13 = 68.0346, G14 = 19.5727, G15 = 7.1193, Table 1).

The posterior genotypic means (Bayesian estimates) and the 90% HPD region for each genotype of the potato MET are shown in Table 2. The posterior means of the check genotypes G12 through G15 were 14.72, 24.32, 18.54, and 16.58, respectively. These values differ from those given by prior information (Table 1) because they also include the effect of the data likelihood. Clearly, genotype G11 had the highest posterior mean and, among the checks, G13 had the highest posterior mean yield and also the highest prior average yield (24.32 Mg ha⁻¹). Genotype G11 had a 0.9994 probability of being the best yielding genotype, while genotypes G3 and G4 had a high probability (0.9998 and 0.9822, respectively) of being among the best five genotypes. The lowest posterior yield was for check G12, which also had the lowest prior yield average (15.18 Mg ha⁻¹) and a yield significantly lower than that of G11. Another check with a similarly low posterior yield was G15, which also had a low prior mean. Genotypes G1 through G10 did not differ much in their posterior means, which should be similar to the mean of the experiment. The prior densities of the genotypic means of the check genotypes (Fig. 2a) are evidently lower and more dispersed than the posterior densities (Fig. 2c). The posterior is higher and less dispersed than the prior because it incorporates data from a new MET. This indicates that the mean yields of the genotypes are estimated with more precision from the posterior density than from the prior density.

Concerning the posterior mean and 90% HPD of the ShV of the 15 potato genotypes, values differ from those given in prior information. This is clear for check genotype G13, which has the highest mean posterior ShV (30.35, Table 2) and the widest 90% HPD interval (13.05–46.99), followed by check G14, with an average posterior ShV of 10.82 and a 90% HPD of 4.37 to 16.67. Data collected in the experiment and transmitted to the posterior means of the ShV through the likelihood caused the prior ShV to change from 68.0346 and 19.5727, to 30.35 and 10.82 for G13 and G14, respectively. Genotypes G1 through G12 and G15 had small posterior ShV values and were the most stable genotypes.

The Bayesian approach is effective in combining prior information from past METs with data from new METs. For the ShV of the check genotypes, the prior density of G12 through G15 is, in general, more dispersed (Fig. 2b) than the posterior density of the ShV of the same genotypes (Fig. 2d). Furthermore, the dispersion of the posterior density of the ShV of the check genotypes is less than that of the noncheck genotypes (G1–G11), where noninformative prior information on the ShV was used (Fig. 1c and 1d).
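Why the posterior is less dispersed than the prior can be seen in the simplest conjugate case, a normal prior for a mean combined with normal data of known variance. The Python sketch below is a deliberately simplified stand-in for the full model of the study, which is fitted by simulation; the function name and all numerical values are hypothetical.

```python
def normal_update(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean and sd of a normal mean with known data variance."""
    prior_prec = 1.0 / prior_sd ** 2   # precision carried by the prior
    data_prec = n / data_sd ** 2       # precision carried by n observations
    post_prec = prior_prec + data_prec  # precisions add
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec ** -0.5

# Informative prior (hypothetical check genotype) combined with 4 new sites.
informative = normal_update(24.0, 3.0, 22.0, 4.0, 4)
# Noninformative prior (mean 0.1, sd 10 000) combined with the same data.
vague = normal_update(0.1, 10000.0, 22.0, 4.0, 4)

print("informative prior -> posterior mean, sd:", informative)
print("vague prior       -> posterior mean, sd:", vague)
```

With the informative prior the posterior standard deviation falls below both the prior standard deviation and the standard error of the data alone, whereas with the vague prior the posterior is driven almost entirely by the data.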

Bread Wheat Multi Environment Trials
Prior information used in the wheat MET was obtained from the combined analysis of the five environments of the first year of testing. The prior and posterior densities of the genotypic means and ShV of the 10 wheat genotypes and their means are in Tables 1 and 2, and depicted in Fig. 3a–3h. Genotype G8 had the highest prior ShV (0.6047, Table 1 and Fig. 3b), followed by G7 and G6, with prior ShVs of 0.2453 and 0.1221, respectively. Genotypes G1 through G5 and G9 through G10 had small prior ShVs, indicating they were stable across the five test environments of the first year. The prior genotypic means obtained from the five environments in the first year ranged from 3.62 Mg ha–1 (for G9) to 5.77 Mg ha–1 (for G1) (Table 1).


Fig. 3. Estimated densities for wheat METs: (a) Prior density of Shukla's variance for genotypes G1 through G5; (b) Prior density of Shukla's variance for genotypes G5 through G10; (c) Prior density of genotypic means (Mg ha–1) for genotypes G1 through G5; (d) Prior density of genotypic means for genotypes G5 through G10; (e) Posterior density of Shukla's variance for genotypes G1 through G5; (f) Posterior density of Shukla's variance for genotypes G5 through G10; (g) Posterior density of genotypic means (Mg ha–1) for genotypes G1 through G5; (h) Posterior density of genotypic means (Mg ha–1) for genotypes G5 through G10.

 
The posterior genotypic means are very similar to those given by the prior density (genotypic means of the first year), indicating that the data from the second year had little effect on the genotypic means of the first year (Table 2, and Fig. 3g and 3h). However, the dispersion of the genotypic means in the posterior density (Fig. 3g and 3h) is less than that in the prior density (Fig. 3c and 3d) and is reflected in the short 90% HPD (Table 2). Prior and posterior yield performance of G1 was the highest and significantly higher than the posterior of the lowest genotype, G9; G1 had a 0.9176 probability of being the best performer in terms of grain yield and a 0.9980 probability of being among the five best yielding genotypes. Genotypes G4 and G10 had a low probability of being the winning genotypes but a relatively high probability of being among the best five yield performers (0.6861 and 0.7298, respectively) (Table 2).

The posterior of the ShV shows differences with respect to the information given by the prior density (Table 2 and Fig. 3e and 3f). Genotype G1 now displays the highest posterior Shukla variance (0.73) with the widest 90% HPD, followed by genotypes G8 (0.64), G7 (0.64), and G6 (0.63). The other genotypes had posterior ShV values much higher than values obtained in the first year, indicating the influence of the data on the information given by the prior density. These results can also be seen in the graphs for the posterior density of the ShV (Fig. 3e–3f), where G1 and G6 through G8 show a more dispersed posterior density than genotypes G2 through G5, G9, and G10, thus indicating a more stable performance of the latter across environments.

Maize Multi Environment Trials
In this case, a noninformative prior for the genotypic means and the ShV of the nine maize genotypes was employed. Therefore, genotypic means of 0.1 were assigned to each maize genotype with standard errors of 10 000 (Table 1). The fact that the stability of the genotypes (ShV), the within-site error variance, and their associated degrees of freedom are unknown is reflected in the values 0.01 (assigned to the ShV and the within-site error) and 1 degree of freedom. Noninformative priors are shown in Fig. 4a and 4b, and posterior genotypic means and means of the ShV are shown in Fig. 4c–4f.


Fig. 4. Estimated densities for maize METs: (a) Prior density of Shukla's variance for genotypes G1 through G9; (b) Prior density of the genotypic means (Mg ha–1) for genotypes G1 through G9; (c) Posterior density of Shukla's variance for genotypes G1 through G5; (d) Posterior density of the Shukla variance for genotypes G6 through G9; (e) Posterior density of genotypic means (Mg ha–1) for genotypes G1 through G5; (f) Posterior density of genotypic means for genotypes G6 through G9.

 
Genotypes G3 through G6 had the highest posterior density means with the certainty that they will be among the best five genotypes (Table 2). However, only G6 is certain to be a winner. Regarding the mean ShV, genotypes G2, G3, G6, and G7 are the most stable, with values for the Shukla variance of 1760, 1256, 1894, and 1975, respectively (Table 2).


    DISCUSSION
 
The Bayesian paradigm for analyzing a series of (or sequential) METs conducted over several years offers the plant breeder the opportunity of including information on some genotypes from previous METs as prior information for the current MET. This was the case for the potato and wheat METs included in this study. Concerning the posterior distribution of the genotypic means, it is observed that the maize MET with 20 sites gives less dispersed posterior distributions of the genotypic means (i.e., more precise estimates of the genotypic means) than do the other METs, which used fewer environments. This also occurred with the mean of the posterior ShV but to a much lesser degree. Therefore, the Bayesian approach with use of prior information gives more precision in the parameter estimates. However, the use of an appropriate number of environments in regional or international trials is also important for obtaining precise estimates of genotypic yield and yield stability.

The priors used in the Bayesian model of this study considered heterogeneity of within-site error variances and heterogeneity of GEI variances represented by individual genotypic variances. As found by Edwards and Jannink (2006), this approach significantly decreased errors in the estimation process. Furthermore, the Bayesian approach through the simulated marginal posterior of each parameter considers the joint posterior of all other parameters in the model and their individual estimation precision. For example, when some parameters had poor prior information, this is reflected in a flat posterior distribution (Fig. 1 and 4), which, in turn, indicates poor precision that will be transmitted to the marginal posterior density of that parameter. The Bayesian approach treats prior information as estimators (not as known values) and incorporates it as such in the posterior density. The authors mention that this property is not considered in the maximum likelihood or Restricted Maximum Likelihood (REML) estimation approaches, which treat estimators as known values.

One of the most important practical issues in plant breeding is to rank the genotypes as closely as possible to their true rank to select the superior and most stable genotypes. The Bayesian paradigm offers plant breeders the possibility of computing the probability of a genotype being the best performer for conditions in a particular experiment. The results of this study show that although some genotypes may have a very low probability of being the best, they have a relatively good chance of being among the five highest yielding genotypes.
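Probabilities of this kind can be computed directly from simulated posterior draws by ranking the genotypes within each draw and counting how often each one comes first or falls within the top k. The Python sketch below shows the counting step only; the function name and the four toy draws for three genotypes are invented for illustration and are not data from the study.

```python
def rank_probabilities(draws, top_k):
    """Estimate P(best) and P(among the top_k) for each genotype.

    draws: list of posterior iterations, each a list of genotypic means
           (one value per genotype, same order in every iteration).
    """
    n_gen = len(draws[0])
    best = [0] * n_gen
    in_top = [0] * n_gen
    for iteration in draws:
        # Genotype indices ordered from highest to lowest mean in this draw.
        order = sorted(range(n_gen), key=lambda g: iteration[g], reverse=True)
        best[order[0]] += 1
        for g in order[:top_k]:
            in_top[g] += 1
    n = len(draws)
    return [b / n for b in best], [t / n for t in in_top]

# Toy posterior draws for three hypothetical genotypes (made-up values).
toy_draws = [
    [3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0],
    [3.0, 1.0, 2.0],
    [3.0, 2.0, 1.0],
]
p_best, p_top2 = rank_probabilities(toy_draws, top_k=2)
print("P(best):", p_best)
print("P(top 2):", p_top2)
```

In the toy example the third genotype is never the best in any draw, yet it is among the top two in half of them, which mirrors the pattern described above.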

In this study, we show the different forms in which Bayesian methodology can be used in the context of METs for assessing yield and yield stability. The Bayesian approach does not change the concept of phenotypic stability, but it facilitates its more precise estimation when prior information is available. The potato MET used prior information for only the four check genotypes (G12–G15), whereas the wheat MET used prior information on all genotypes, and the maize MET used noninformative priors for all genotypes. When partial or complete prior information is available, the Bayesian approach is the optimum statistical analysis for the MET. When no prior information is available, the Bayesian approach produces results comparable to those obtained using frequentist statistical methods. However, the Bayesian approach allows the use of other statistical strategies, such as the truncated normal distribution (used in this study), which do not seem to be easily implemented in the frequentist approach for analyzing METs. For example, when analyzing a trait such as grain yield, a lower bound of zero and an upper bound set by the researcher's experience can be used.
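A minimal Python sketch of sampling from a normal distribution truncated to such bounds is given below, using the inverse-CDF method. The function name, the mean and standard deviation, and the upper bound of 12 Mg ha–1 are hypothetical values chosen for illustration; how the truncation was implemented in the study is not shown here.

```python
import random
from statistics import NormalDist

def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Draw one value from Normal(mean, sd) restricted to [lower, upper]."""
    dist = NormalDist(mean, sd)
    # Probability mass below each bound under the untruncated normal.
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    # A uniform draw between the two masses maps back into [lower, upper].
    u = p_lo + (p_hi - p_lo) * rng.random()
    x = dist.inv_cdf(u)
    # Guard against rounding error at the edges.
    return min(max(x, lower), upper)

rng = random.Random(3)  # fixed seed for reproducibility
yield_draws = [sample_truncated_normal(5.0, 3.0, 0.0, 12.0, rng)
               for _ in range(20000)]
print("min, max:", min(yield_draws), max(yield_draws))
print("mean:", sum(yield_draws) / len(yield_draws))
```

Because more mass is cut from the lower tail than from the upper tail in this example, the mean of the truncated draws sits slightly above the untruncated mean of 5.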

The Shukla variance is a one-dimensional statistic that indicates whether a genotype has a low or a high variance around its mean when genotypes are tested across environments. It can be used to identify stable genotypes for breeding purposes. There are other stability parameters and methods that allow simultaneous selection for yield and yield stability, such as the yield-stability method of Kang (1993), but they were not considered in this Bayesian approach. Furthermore, multiplicative models such as AMMI can also be used for assessing yield stability and adaptability, but their use in the Bayesian context requires further research.
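For reference, the usual moment estimator of Shukla's stability variance from a two-way table of genotype-by-environment means can be sketched in Python as follows. This is the classical, non-Bayesian calculation, shown only to make the statistic concrete; in the Bayesian model of this study the ShV is a parameter with its own prior and posterior. The function name and the toy yield table are hypothetical.

```python
def shukla_variances(y):
    """Moment estimates of Shukla's stability variance, one per genotype.

    y[i][j] is the mean yield of genotype i in environment j.
    Requires at least three genotypes and two environments.
    """
    p, q = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (p * q)
    g_mean = [sum(row) / q for row in y]
    e_mean = [sum(y[i][j] for i in range(p)) / p for j in range(q)]
    # Sum of squared interaction residuals for each genotype.
    w = [sum((y[i][j] - g_mean[i] - e_mean[j] + grand) ** 2 for j in range(q))
         for i in range(p)]
    total = sum(w)
    denom = (p - 1) * (p - 2) * (q - 1)
    return [(p * (p - 1) * w_i - total) / denom for w_i in w]

# Toy table: the first three genotypes respond identically to environments,
# the fourth responds more strongly (made-up values, Mg/ha).
toy_yields = [
    [4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0],
    [6.0, 7.0, 8.0],
    [3.0, 6.0, 9.0],
]
shv = shukla_variances(toy_yields)
print("Shukla variances:", shv)
```

In the toy table only the fourth genotype receives a nonzero stability variance. With real data this estimator can return negative values, which is one practical reason to place a prior with positive support on the ShV.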

Received for publication April 8, 2006.

