Published online 20 June 2006
Published in Crop Sci 46:1711-1721 (2006)
© 2006 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING & GENETICS

A Selection Index Method Based on Eigenanalysis

J. Jesús Cerón-Rojas (a), José Crossa (b,*), Jaime Sahagún-Castellanos (c), Fernando Castillo-González (d) and Amalio Santacruz-Varela (d)

(a) Colegio de Postgraduados, Km 36.5, Carretera México-Texcoco, Montecillo, Edo. de México, México
(b) Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, México DF, México
(c) Universidad Autónoma de Chapingo, CP 56230, Km 38.5, Carretera México-Texcoco, Chapingo, Edo. de México, México
(d) Colegio de Postgraduados, Km 36.5, Carretera México-Texcoco, Montecillo, Edo. de México, México

* Corresponding author (j.crossa{at}cgiar.org)


    ABSTRACT
In plant breeding, selection indices (SI) help select the best individuals for the next breeding cycle on the basis of observed phenotypic values for several traits of each candidate individual. Selection indices, as originally defined by Smith (1936), assign subjective economic weights to each trait and are relatively simple to analyze. Their disadvantages are that they require large amounts of information; economic weights are difficult to assign; sampling error can be large; and the statistical sampling properties of the SI and of its selection response are unknown except in the case of two traits. The objective of this study is to propose and use an SI based on the eigenanalysis method (ESIM), in which the first eigenvector is used as the SI criterion; its elements determine the proportion in which each trait contributes to the SI and are used in computing the selection response. ESIM requires neither the assignment of economic weights nor estimates of the genotypic covariance matrix. Statistical properties of the estimators of ESIM and its response to selection are described. It is shown how ESIM estimates selection gains between selection cycles. A predictive function of the ESIM selection response in future selection cycles is proposed. Three data sets are used to show the properties of ESIM.


    INTRODUCTION
In plant and animal breeding, the best individuals are selected for the next breeding cycle on the basis of observed phenotypic values for several traits in each candidate individual. The point is to choose candidate individuals with high genotypic (g) values for traits (which are not directly observed) related to observable phenotypic scores (p) for those traits. It is assumed that the genotypes' unobserved values and their observed phenotypic scores have a joint probability distribution such that the expected value of g, given p, E(g/p) = g, is the regression of g on p. This regression should be a good predictor of g, so that individuals with the highest values of g can be selected (Bulmer, 1980). A function of the observed phenotypic values p such as E(g/p), which is used to rank and select the candidate individuals, is called a selection index (SI).

Selection indices were originally defined by Smith (1936) as a linear combination of the observed phenotypic values of the expression of traits of interest and are generally used to discriminate among selection units by taking into account both the genetic and statistical structure of the population from which the genotypes originated, as well as the economic importance of the trait(s). Thus, when evaluated, only those individuals predicted to have progeny of superior economic value are reproduced (Quinton and McMillan, 1995).

Selection index applications are of two types. One is single trait improvement, where it is possible to increase selection efficiency by incorporating information into the SIs about traits related to the trait of interest (Wei et al., 1996; Falconer and Mackay, 1997). The other is multiple trait improvement, which requires assignment of relative economic weights to the different traits. Determination of appropriate economic weights for different traits can be difficult; therefore, modified indices, such as the base index, modified base index, and nonweighted multiplicative index, have been proposed (Tallis, 1962; Williams, 1962; Elston, 1963; James, 1968; Baker, 1986).

The Smith index (1936), also called the optimum SI (Bulmer, 1980; Van Vleck, 1993), takes into account both heritability and genetic correlation between traits when assigning economic weights. Among its major advantages are (i) it assigns higher weights to traits whose differences are genetic; and (ii) it is relatively simple to analyze. Its disadvantages are (i) it requires large amounts of information; (ii) economic weights are difficult to assign; and (iii) sampling error can be large (Bulmer, 1980; Bernardo, 2002). Also, the statistical sampling properties of the Smith SI and its response to selection (R) are unknown except in the case of two traits (Haye and Hill, 1980). Even for two traits, the sampling properties of Smith's SI and its R, found by Harris (1964) using the delta method, are not easy to evaluate.

To generalize Smith's methodology (1936), Kempthorne and Nordskog (1959) proposed an index that set restrictions based on a predetermined improvement level. Henderson (1963) proposed a method that not only eliminates economic weights but also uses the SIs in the context of noninbred populations. On the basis of a linear mixed model, Henderson (1963) demonstrated that it is possible to predict the selection criteria, which he defined as an SI's particular realization. One of the problems with Henderson's procedure is the large amounts of information required to estimate the biometric parameters of interest and the complexity of its application in the context of plant breeding. None of the above-mentioned studies considers the possibility of applying Smith's (1936) basic SI concepts to develop a method that would be simple to implement while maintaining mathematical rigor.

Recently, Cerón-Rojas and Sahagún-Castellanos (2005) developed an SI based on principal component analysis of the phenotypic variance–covariance matrix of the traits, where the first principal component is used as the only SI criterion and no assignment of economic weights is required. The elements of the eigenvector associated with the first eigenvalue determine the proportion in which each trait contributes to the SI. However, the specific assumptions that were made led to inconsistencies that produced an overestimation of the selection response.

The objectives of this study were (i) to propose a methodology for constructing an SI based on eigenanalysis of the phenotypic variance-covariance (or correlation) matrix of the traits of interest (hereafter called ESIM, for eigen selection index method) and show that ESIM requires neither economic weights nor genotypic variances-covariances; (ii) to demonstrate that although the methods of Cerón-Rojas and Sahagún-Castellanos (2005) and ESIM both use the first eigenvalue and its associated eigenvector to construct an SI, they rest on different assumptions and therefore produce different estimates of the selection response; (iii) to show the statistical properties of ESIM estimators and the selection response (R) obtained from ESIM; and (iv) to use ESIM for constructing two functions, one for estimating selection gains (or losses) between selection cycles and the second for predicting R in future selection cycles. The main theoretical results are applied to three data sets for practical evaluation of the properties of ESIM estimators and for predicting the selection response and estimating the gains between selection cycles obtained on the basis of ESIM.


    MATERIALS AND METHODS
Theory
Smith's Selection Index Method
Smith (1936) developed a methodology for the simultaneous selection of several traits on the basis of the linear combinations

\[ Y = \beta' p \quad \text{and} \quad Z = \theta' g \tag{1} \]
where Y is the selection index (SI), and p' = [p1 ... pq] and β' = [β1 ... βq] are the vector of phenotypic values and the vector of the coefficients of Y, respectively; Z denotes the breeding value or net genetic component that could be gained through selection, g' = [g1 ... gq] is the vector of genotypic values, and θ' = [θ1 ... θq] is the vector of economic weights of the genotypic values, which, according to Smith (1936), breeders could determine on the basis of their experience and is considered a vector of constants.

Smith (1936) modeled the pj (j = 1, 2, ..., q) phenotypic values as pj = gj + εj where gj is the genotypic value of the jth trait and εj is the environmental component affecting that trait. Smith (1936) assumed that the interactions between gj and εj can be considered a random effect and that gj represents only additive effects such that Z = θ'g denotes the breeding value (Hazel, 1943; Kempthorne and Nordskog, 1959). Under these assumptions, selection based on Y = β'p leads to a selection response (R) equal to βYZ D (Kempthorne and Nordskog, 1959), that is,

\[ R = \beta_{YZ} D = \frac{\theta' \Sigma \beta}{\beta' S \beta}\, D \tag{2} \]
where D is the selection differential and βYZ = θ'Σβ/β'Sβ is the proportion of D that is expected to be realized when selection is applied (Holland et al., 2003); Σ and S are the variance–covariance matrices of genotypic and phenotypic values, respectively; θ'Σβ is the covariance between Y and Z [Cov(Z, Y)], and β'Sβ is the variance of Y (σY²). Another way to write R is

\[ R = k \sigma_Z \rho_{YZ} = k \sqrt{\theta' \Sigma \theta}\; \frac{\theta' \Sigma \beta}{\sqrt{\theta' \Sigma \theta}\,\sqrt{\beta' S \beta}} \tag{3} \]
where k = D/σY is the standardized selection differential, σZ² = θ'Σθ is the variance of Z, and ρYZ is the correlation between Y and Z; θ and β have been defined in Eq. [1]; and Σ, S, β'Sβ and θ'Σβ were defined in Eq. [2].

On the basis of the observed vector p, Smith (1936) proposed maximizing R from Y = β'p by finding partial derivatives of the natural logarithm (ln) of ρYZ [ln(ρYZ)] with respect to β, such that

\[ \frac{\partial \ln \rho_{YZ}}{\partial \beta} = \frac{\Sigma \theta}{\theta' \Sigma \beta} - \frac{S \beta}{\beta' S \beta} = 0 \]
thus

\[ S \beta = \frac{\beta' S \beta}{\theta' \Sigma \beta}\, \Sigma \theta \tag{4.1} \]
from which Smith (1936) obtained the vector that maximizes ρYZ as

\[ \beta_S = S^{-1} \Sigma \theta \tag{4.2} \]
since β'Sβ/θ'Σβ was considered to be constant and the coefficient of correlation is invariant to changes in scale (Kempthorne and Nordskog, 1959). Equation [4.1] is Smith's fundamental result and provides the basis of standard SI theory. Note that because β'Sβ/θ'Σβ is a constant, it can take any value within the domain of the real numbers including 1.0 (Henderson, 1963). Thus, the assumption β'Sβ = θ'Σβ and its consequences are valid.
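As a minimal sketch of Eq. [4.2] and Eq. [3] for two traits, the following computes the Smith coefficients and the resulting response. The matrices `S` and `Sigma`, the weights `theta`, and the helper names are hypothetical values chosen for illustration only; they are not taken from the data sets used in this study.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def smith_index(S, Sigma, theta, k):
    """Return Smith coefficients, response, Var(Y) and Cov(Y, Z)."""
    sigma_theta = matvec(Sigma, theta)
    beta = matvec(inv2(S), sigma_theta)        # Eq. [4.2]: S^{-1} Sigma theta
    var_y = dot(beta, matvec(S, beta))         # beta' S beta
    cov_yz = dot(theta, matvec(Sigma, beta))   # theta' Sigma beta
    response = k * cov_yz / math.sqrt(var_y)   # Eq. [3] after simplification
    return beta, response, var_y, cov_yz

# Hypothetical phenotypic and genotypic matrices and economic weights.
S = [[4.0, 1.0], [1.0, 3.0]]
Sigma = [[2.0, 0.5], [0.5, 1.0]]
theta = [1.0, 2.0]

beta_s, r_s, var_y, cov_yz = smith_index(S, Sigma, theta, k=2.063)
print(beta_s, r_s)
```

With the Smith coefficients, the variance of the index and its covariance with Z coincide, so the response reduces to k times the standard deviation of the index.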

Estimating βS, Σθ, and RS of Smith's Method
The Smith (1936) method estimates βS directly from Eq. [4.2] as β̂S = Ŝ⁻¹Σ̂θ, where Ŝ⁻¹ is the inverse of Ŝ (the estimate of S), Σ̂ is an estimate of Σ, and θ depends on the experience of the researcher. The estimate of RS is R̂S = k√(β̂S'Ŝβ̂S). Since the probability densities of β̂S and R̂S are unknown, their sampling properties cannot be determined.

Eigenanalysis Selection Index Method (ESIM)
Assuming that β'Sβ = θ'Σβ, Cerón-Rojas and Sahagún-Castellanos (2005) showed that Eq. [3] can be written as R = k√(β'Sβ) and that maximizing R is equivalent to maximizing β'Sβ. This led to the equation (S − λI)β = 0 where β and λ are the eigenvector and eigenvalue of S, respectively. Therefore, β'Sβ = λ, and R = k√λ. In contrast, the ESIM does not make the assumption that β'Sβ = θ'Σβ; it only assumes that the genotypic variance–covariance matrix multiplied by the vector of economic weights is equal to the eigenvector of the phenotypic variance–covariance matrix, that is, Σθ = β. This assumption implies (under the restriction β'β = 1.0) that β'Sβ/θ'Σβ = λ and, therefore, Eq. [4.1] takes the form Sβ = λβ, from which

\[ (S - \lambda I)\beta = 0 \tag{5} \]
and, as before, β and λ are the eigenvector and eigenvalue of S, respectively. Another way of obtaining the previous result is as follows. If Σθ = β, β'Sβ/θ'Σβ = λ (or β'Sβ = λθ'Σβ), and ρYZ = θ'Σβ/[√(θ'Σθ)√(β'Sβ)] = √(β'Sβ)/[λ√(θ'Σθ)], then by Eq. [3],

\[ R = k \sigma_Z \rho_{YZ} = \frac{k \sqrt{\beta' S \beta}}{\lambda} \tag{6} \]

Therefore, maximizing R is the same as maximizing the SI variance, β'Sβ. Following Anderson (2003), Cerón-Rojas and Sahagún-Castellanos (2005) differentiated β'Sβ with respect to β and λ, subject to the usual restriction used in eigenanalysis that the normalized eigenvectors have unit length (β'β = 1.0), and found that

\[ (S - \lambda I)\beta = 0 \tag{7} \]
Therefore, Eq. [7] is the same as Eq. [5].
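For two traits, the eigen-equation in Eq. [5] can be solved in closed form. The sketch below does so for a symmetric 2 × 2 matrix and then forms the index score and the ESIM response k/√λ. The matrix `S`, the phenotypic vector `p`, and all function names are hypothetical and serve only as an illustration.

```python
import math

def first_eigenpair_2x2(S):
    """Largest eigenvalue and unit-length eigenvector of a symmetric 2x2 matrix."""
    a, b = S[0]
    _, d = S[1]
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam = mean + radius                      # larger root of |S - lambda I| = 0
    if b == 0.0:
        vec = [1.0, 0.0] if a >= d else [0.0, 1.0]
    else:
        vec = [b, lam - a]                   # solves (a - lam) x + b y = 0
        norm = math.hypot(vec[0], vec[1])
        vec = [vec[0] / norm, vec[1] / norm]  # impose beta' beta = 1
    return lam, vec

def esim_score(beta, p):
    """Index value Y = beta' p for one candidate."""
    return sum(b * x for b, x in zip(beta, p))

def esim_response(lam, k):
    """ESIM selection response k / sqrt(lambda)."""
    return k / math.sqrt(lam)

# Hypothetical phenotypic variance-covariance matrix and one candidate.
S = [[4.0, 1.0], [1.0, 3.0]]
p = [10.0, 5.0]

lam, beta = first_eigenpair_2x2(S)
print(lam, beta, esim_score(beta, p), esim_response(lam, k=2.063))
```

For more than two traits a numerical eigen-solver would be needed; the closed form is used here only to keep the example dependency-free.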

The only assumption made for deriving ESIM, Σθ = β, and its implication β'Sβ/θ'Σβ = λ make the approach of Cerón-Rojas and Sahagún-Castellanos (2005) for SI inconsistent because together with their assumption they imply that β'Sβ = 1.0, which makes the selection response equal to the selection differential D, that is, R = k√(β'Sβ) = k√1.0 = k = D (where k = D/σY).

The aim of the assumption Σθ = β is to facilitate the theoretical development of ESIM. However, β̂ and Σ̂θ are point estimates of β and Σθ, respectively. While β̂ is the maximum likelihood estimate of β, Σ̂θ is an empirical estimate of Σθ because θ depends on the experience of the researcher. Thus, the assumption Σθ = β does not imply that the two estimates coincide exactly but rather that Σ̂θ ≅ β̂.

Relation between the Smith SI and ESIM
Consider Eq. [3] and denote the response to selection and the SI of the Smith method as RS and YS, respectively. Using Eq. [4.2], the variance of YS is βS'SβS = (S⁻¹Σθ)'SS⁻¹Σθ = (S⁻¹Σθ)'IΣθ, but since S and Σ are symmetric matrices, βS'SβS = θ'ΣS⁻¹Σθ. Similarly, the covariance between YS and Z is θ'ΣβS = θ'ΣS⁻¹Σθ. Therefore, the variance of YS and the covariance between YS and Z are the same and the response to selection of the Smith SI is RS = k√(θ'ΣS⁻¹Σθ). If Σθ = β (with β ≠ βS), then β'S⁻¹β = 1/λ, where 1/λ and β are the eigenvalue and eigenvector of S⁻¹, respectively. Thus

\[ R_S = k \sqrt{\theta' \Sigma S^{-1} \Sigma \theta} = k \sqrt{\beta' S^{-1} \beta} = \frac{k}{\sqrt{\lambda}} \tag{8} \]

On the other hand, since β'Sβ = λ, Eq. [6] is R = k√λ/λ, but because β = Σθ, the response to selection of ESIM is

\[ R_{ESIM} = \frac{k \sqrt{\lambda}}{\lambda} = \frac{k}{\sqrt{\lambda}} = k \sqrt{\beta' S^{-1} \beta} = k \sqrt{\theta' \Sigma S^{-1} \Sigma \theta} \tag{9} \]

The equality k/√λ = k√(β'S⁻¹β) is valid because β is the same eigenvector regardless of whether it is obtained from S or from S⁻¹; this is because if Sβ = λβ, then S⁻¹β = (1/λ)β. Thus, when β is computed from S it is associated with the first eigenvalue of S (λ), but when it is obtained from S⁻¹, it is associated with the smallest eigenvalue of S⁻¹ (1/λ).

Equations [8] and [9] show that the responses to selection from Smith's method and ESIM are the same. Also, because βS = S⁻¹Σθ, for Σθ = βESIM, then βESIM = SβS. Results from Eq. [8] and [9] can also be more formally derived by the Cauchy-Schwarz inequality (see Appendix).
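The step linking the eigenvalue of S to that of its inverse can be written out explicitly. The short derivation below is a sketch that assumes only that S is nonsingular, that β is a unit-length eigenvector of S with eigenvalue λ > 0, and that Σθ = β.

```latex
\begin{align*}
S\beta &= \lambda\beta
  && \text{(eigen-equation, Eq. [5])} \\
\beta &= \lambda S^{-1}\beta
  \;\Longrightarrow\; S^{-1}\beta = \tfrac{1}{\lambda}\,\beta
  && \text{(premultiply by } S^{-1}\text{)} \\
\beta' S^{-1}\beta &= \tfrac{1}{\lambda}\,\beta'\beta = \tfrac{1}{\lambda}
  && (\beta'\beta = 1) \\
k\sqrt{\theta'\Sigma S^{-1}\Sigma\theta}
  &= k\sqrt{\beta' S^{-1}\beta} = \frac{k}{\sqrt{\lambda}}
  && (\Sigma\theta = \beta)
\end{align*}
```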

Estimating Eigenvalues, Eigenvectors, ESIM, and the Response to Selection of ESIM
β'Sβ and β'β can be derived for β'β = 1.0 and (S − λI)β = 0 as long as (S − λI) is a singular matrix, that is, the determinant |S − λI| must be a function of λ that equals zero. The equation |S − λI| = 0 generates a polynomial with q roots λ1, λ2, ..., λq, which are its potential solutions. The maximum likelihood estimators of the q eigenvalues and q eigenvectors are the λ̂i's and β̂i's, for which (Ŝ − λ̂iI)β̂i = 0, i = 1, 2, ..., q, where Ŝ is an estimator of S (Anderson, 2003). These results allow writing the estimate of ESIM as Ŷi = β̂i'p and the estimate of the response to selection of ESIM as R̂ESIM = k/√λ̂1.

Properties of λ̂i, β̂i, and the estimator of ESIM (Ŷi)
The maximum likelihood estimators λ̂i and β̂i are asymptotically consistent and unbiased, such that E(λ̂i) = λi and, according to Anderson (2003)

\[ \mathrm{Var}(\hat{\lambda}_i) \approx \frac{2 \lambda_i^2}{n} \tag{10} \]
Mardia et al. (1982) indicate that, asymptotically, E(β̂i) = βi and for j ≠ i,

\[ \mathrm{Var}(\hat{\beta}_i) \approx \frac{\lambda_i}{n} \sum_{j \neq i}^{q} \frac{\lambda_j}{(\lambda_j - \lambda_i)^2}\, \beta_j \beta_j' \tag{11} \]

\[ \mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j) \approx -\frac{\lambda_i \lambda_j}{n (\lambda_i - \lambda_j)^2}\, \beta_j \beta_i' \tag{12} \]
where q is the number of traits and n the number of individuals. For example, suppose that q = 3; then the variance of β̂1 is Var(β̂1) ≈ (λ1/n) Σj=2,3 [λj/(λj − λ1)²] βjβj' = (λ1/n){[λ2/(λ2 − λ1)²] β2β2' + [λ3/(λ3 − λ1)²] β3β3'} and the covariance of β̂1 and β̂3 is Cov(β̂1, β̂3) ≈ −{λ1λ3/[n(λ1 − λ3)²]} β3β1'.
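To make the dependence on sample size concrete, the following sketch evaluates the variance of an estimated eigenvalue. It assumes the standard large-sample approximation Var(λ̂) ≈ 2λ²/n for normally distributed data; the function name and the sample sizes are hypothetical, while 7.301 is the first eigenvalue reported for 1949 in the Results.

```python
def eigenvalue_variance(lam, n):
    """Large-sample variance of an estimated eigenvalue (assumed form 2*lam^2/n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 2.0 * lam ** 2 / n

# Hypothetical sample sizes; the source does not state n for this eigenvalue.
for n in (50, 100, 1000):
    print(n, eigenvalue_variance(7.301, n))
```

Under this approximation the variance shrinks in proportion to 1/n, which is the sense in which the estimator is consistent.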

Concerning the properties of the estimator of ESIM, Ŷi = β̂i'p: because Var(Y) = β'Sβ, an estimator of Var(Ŷi) is V̂ar(Ŷi) = β̂i'Ŝβ̂i = λ̂i. Furthermore, since β̂i is a maximum likelihood estimator, E(Ŷi) = E(β̂i'p) = βi'p = Yi (Mardia et al., 1982).

The correlations between Yi and the elements of the phenotypic vector p, according to Mardia et al. (1982), are

\[ \rho_{Y_i p_j} = \frac{\beta_{ij} \sqrt{\lambda_i}}{s_j} \tag{13} \]
where βij is the jth value of the ith eigenvector (βi) of S (i, j = 1, 2, ..., q), λi is the eigenvalue associated with βi, and sj² is the jth variance in the diagonal of S. Thus ρYipj allows us to estimate the contribution of each trait to ESIM.
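The contribution of each trait can be sketched numerically. The example below assumes the usual principal-component result that the correlation between the ith index and the jth trait is the eigenvector element times √λi divided by the trait's standard deviation; the matrix `S` and the function name are hypothetical.

```python
import math

def trait_correlations(eigvec, eigval, variances):
    """Correlation of the index with each trait: b_j * sqrt(lambda) / s_j."""
    return [b * math.sqrt(eigval) / math.sqrt(s2)
            for b, s2 in zip(eigvec, variances)]

# Hypothetical 2x2 phenotypic matrix S = [[4, 1], [1, 3]].
S = [[4.0, 1.0], [1.0, 3.0]]
lam = 3.5 + math.sqrt(1.25)           # largest eigenvalue of S
raw = [S[0][1], lam - S[0][0]]        # unnormalized first eigenvector
norm = math.hypot(raw[0], raw[1])
beta = [raw[0] / norm, raw[1] / norm]

corr = trait_correlations(beta, lam, [S[0][0], S[1][1]])
print(corr)
```

In this hypothetical case the first trait is more strongly correlated with the index than the second, so it contributes more to the ranking of candidates.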

Properties of the Estimator of the Selection Response of ESIM (R̂ESIM)
To determine the properties of the estimator of the selection response obtained from ESIM, consider the asymptotic expected value [E(ρ̂YZ)] and variance [Var(ρ̂YZ)] of ρ̂YZ. If Y and Z have a joint normal distribution (Rahman, 1968), then

Formula 15[14]
where n is the sample size. Since k and {sigma}Z are fixed, according to Eq. [9] Formula 15 = Formula 15 = k{sigma}ZFormula 15YZ. Thus, from Eq. [14], the expected value and variance of the estimator of the selection response are given in Eq. [15.1] and [15.2], respectively:

Formula 16[15.1]

Formula 17[15.2]
Note that the last part of Eq. [15.1] is the bias of R̂. The E(R̂) and Var(R̂) indicate that R̂ is an asymptotically unbiased and consistent estimator of R.

Additional criteria that can be used to characterize the quality of R̂ are the sampling error (SE), the coefficient of variation (CV), the mean-square error (MSE), and accuracy (AC). The SE is the square root of Var(R̂). The CV is defined as the ratio of the SE to the expected value of the estimator: CV(R̂) = √Var(R̂)/E(R̂). The MSE is defined as the sum of the variance of the estimator and the square of the bias of this estimator. Thus MSE(R̂) = Var(R̂) + [E(R̂) − R]². The AC is the bias divided by the sampling error and indicates how close the expected value of the estimator is to the population parameter. For R̂ the expression of AC is AC(R̂) = [E(R̂) − R]/√Var(R̂).
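The four criteria follow directly from their verbal definitions, as the sketch below shows. The inputs (an expected value, a variance, and a true response) are hypothetical numbers, and the function name is an assumption made for this illustration.

```python
import math

def estimator_quality(expected, variance, true_value):
    """SE, CV, MSE and AC of an estimator from its mean, variance and target."""
    bias = expected - true_value
    se = math.sqrt(variance)
    return {
        "SE": se,                      # square root of the variance
        "CV": se / expected,           # SE relative to the expected value
        "MSE": variance + bias ** 2,   # variance plus squared bias
        "AC": bias / se,               # bias relative to the sampling error
    }

# Hypothetical values for an estimated selection response.
quality = estimator_quality(expected=0.68, variance=0.0004, true_value=0.687)
print(quality)
```

An AC close to zero indicates that the bias is small relative to the sampling error.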

Estimating Selection Gains between Selection Cycles from ESIM
The following question may be of considerable interest to breeders: what is the gain (or loss) from one selection cycle to another or for a period of time involving several selection cycles? This implies estimating the selection gain (or loss) between two or more consecutive selection cycles as a function of the selection response in each cycle. Within the framework of ESIM, this can be answered as follows. Suppose f(λ) = k/√(λ³) is a continuous function of the random variable λ in the interval [λ̂j+1, λ̂j] (for λ̂j > λ̂j+1), where λ̂j and λ̂j+1 denote the estimated eigenvalues of selection cycles j and j+1 and λ̂j+1 ≤ λ ≤ λ̂j. A function F, such that

\[ F_{j,j+1}(\lambda) = \int_{\hat{\lambda}_{j+1}}^{\hat{\lambda}_j} \frac{k}{\sqrt{\lambda^3}}\, d\lambda = -2k \left[ \frac{1}{\sqrt{\hat{\lambda}_j}} - \frac{1}{\sqrt{\hat{\lambda}_{j+1}}} \right] \tag{16} \]
allows computing the selection gain between selection cycles. The magnitude of the Fj,j+1(λ) value indicates the importance of the change from one cycle to the next. Positive numbers denote selection gains, and negative numbers represent selection losses. Since the selection process will tend to reduce (at least partially) the traits' phenotypic variability, it is expected that the eigenvalue, which captures that variability, will also decrease, that the eigenvalues of successive cycles will become more similar, and that Fj,j+1(λ) will tend to zero.
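A minimal sketch of the gain function in Eq. [16] follows, assuming the closed form −2k[1/√λ̂j − 1/√λ̂j+1] used in the Results and checking it against a numerical integral of k/√(λ³). The eigenvalues 4.0, 2.0, and 1.0 are hypothetical; the function names are illustrative.

```python
import math

def selection_gain(lam_j, lam_next, k):
    """Gain between cycle j and j+1: -2k [1/sqrt(lam_j) - 1/sqrt(lam_next)]."""
    return -2.0 * k * (1.0 / math.sqrt(lam_j) - 1.0 / math.sqrt(lam_next))

def selection_gain_numeric(lam_j, lam_next, k, steps=10000):
    """Midpoint-rule integral of k / lam^(3/2) from lam_next to lam_j."""
    width = (lam_j - lam_next) / steps
    total = 0.0
    for i in range(steps):
        lam = lam_next + (i + 0.5) * width
        total += k / lam ** 1.5
    return total * width

k = 2.063                       # 5% selection pressure, as in the text
gain = selection_gain(4.0, 1.0, k)
print(gain, selection_gain_numeric(4.0, 1.0, k))

# Gains over consecutive cycles add up to the gain over the whole period.
print(selection_gain(4.0, 2.0, k) + selection_gain(2.0, 1.0, k))
```

Because the closed form telescopes, the sum of gains over consecutive cycles equals the gain computed directly from the first and last eigenvalues.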

A Function for Predicting the ESIM Selection Response in Future Selection Cycles
A problem that has not yet been considered within the context of Smith's SIs (1936) is how to construct a function to predict the selection response based on results obtained at a given stage of the breeding process. Let us consider the estimator of the selection response Formula 18ESIM = k/Formula 18. Since k is constant, the only term that determines the variation of Formula 18ESIM is Formula 18. When 1.0 < Formula 18 < {infty}, suppose R0 = k–1Formula 18ESIM, then R0 = 1/Formula 18, and we can construct the stochastic matrix T = Formula 18. A function with which to predict future realizations of Formula 18 using Formula 18 can be constructed on the basis of T.

Another way of writing T is T = QFormula 18Q–1, where {lambda}1* and {lambda}2* are eigenvalues of T, and Q is a matrix of eigenvectors ß1* and ß2* of T. The eigenvalues of T are obtained from the determinant |T{lambda}*I| = 0 as

Formula 18

Formula 18

The elements of eigenvectors ß1* and ß2*, associated with {lambda}1* and {lambda}2*, are

Formula 18

According to the above results, the Q matrix is Q = Formula 18. In this case, Q–1 = Q. Thus, for N generations,

Formula 18
Since R0 = k–1Formula 18ESIM = 1/Formula 18, the function for predicting the selection response is

Formula 19[17]

In Formula 19(Formula 19), k is the standardized selection differential defined in Eq. [3]. It is expected that one or two selection cycles will be predicted with more precision than more distant selection cycles because as the value of λ decreases in magnitude, the precision of the prediction will decrease. Note that k should be the same in all selection cycles.

The Sign of the Scores of Eigenvectors of ESIM
The ESIM might assign a negative weight to an important trait such as grain yield; thus, individuals with low grain yield will be selected. In this case, the sign of the score of the first principal component (weight) must be changed so that individuals with high yield performance are selected. Consider any two phenotypic traits; then S = Formula 19 and the eigenvalues of S are obtained as

Formula 20[18]

The elements of the first eigenvector (ß1) associated to the first eigenvalue ({lambda}1) are obtained from Formula 20Formula 20 = 0 where Formula 20Formula 20 = 0. The equation ß1 + Formula 20ß2 = 0 has an infinite number of solutions for ß1 and ß2, thus it is necessary to fix ß1 or ß2. For example, assume that ß2 = 1, then ß1 = –Formula 20 = Formula 20, and thus the numerator of ß1 can be positive or negative. If ß1 = Formula 20 and ß2 = 1, then

Formula 21[19.1]

Formula 22[19.2]

However, if ß1 = –Formula 22 and ß2 = 1 then ß1 = Formula 22 and ß2 = Formula 22. These results show that the sign of the ESIM can be arbitrarily changed without affecting the ESIM (however, the response to selection will be affected).
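One simple way to apply the sign convention described above is to flip the whole eigenvector whenever the weight of a key trait is negative. This is a sketch of that idea only: the function name and the choice of flipping every element (which keeps the vector an eigenvector of unit length) are assumptions for illustration, and the example weights reuse the values b1 = –0.054 and b2 = 0.998 reported for data set 1.

```python
def orient_index(eigvec, key_trait):
    """Return the eigenvector with its sign chosen so the key trait weight is >= 0."""
    if eigvec[key_trait] < 0:
        return [-b for b in eigvec]
    return list(eigvec)

# Weights with the second trait (index 1) treated as the key trait.
already_positive = orient_index([-0.054, 0.998], key_trait=1)
flipped = orient_index([0.054, -0.998], key_trait=1)
print(already_positive, flipped)
```

Both calls return the same orientation, so candidates with high values of the key trait receive high index scores.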

Data Sets
Data Set 1
This data set was used for checking the admissibility of the assumption made when deriving ESIM, that is, {Sigma}{theta} = ß. The data were taken from Becker (1985), where two traits measured in pigs (Sus domesticus) are considered: conversion of feed into weight (p1) and area of eye muscle (p2). Economic weights were given by the author. The vector of economic weights given by Becker (1985) is {theta}' = [–50 12], and the corresponding estimates of the variance-covariance matrices of phenotypic and genotypic values are Formula 22 = Formula 22 and Formula 22 = Formula 22; thus, Formula 22{theta} = Formula 22.

Data Set 2
This data set (from Manning, 1956) was used for computing the estimators of R from Smith's SI (1936) and determining the properties of the λ̂i's, β̂i's, and R from ESIM. It was also used for predicting R, for computing the total selection gains between consecutive selection cycles based on ESIM, and to examine the assumption of ESIM, Σθ = β. The three traits measured in cotton (Gossypium hirsutum L.) were number of cotton bolls per plant (V1), number of seeds per boll (V2), and lint per seed (V3), evaluated in seven annual selection cycles during 1949 through 1955 (Manning, 1956). Estimates of the phenotypic and genotypic (in parentheses) variances of the three traits are given in Table 1.


Table 1. Estimates of the phenotypic and genotypic (in parentheses) variances Formula 22V12, Formula 22V22, and Formula 22V32 and covariances Formula 22V1V2, Formula 22V1V3, and Formula 22V2V3 for three cotton traits: number of cotton bolls per plant (V1), number of seeds per boll (V2), and lint per seed (V3), in each of the seven annual selection cycles from data set 2 (extracted from Manning, 1956).

 
Data Set 3
To measure, in practice, the quality of the estimator of the selection response (R̂) based on ESIM, the data from a CIMMYT maize (Zea mays L.) trial comprising 144 genotypes evaluated in one environment were used. The following 19 traits were measured: germination (ger); number of plants (plt); anthesis date (an); silking date (si); plant height (ph); ear height (eh); moisture (mo); root lodging (rl); stalk lodging (sl); % erect plants (pe); foliar disease ratio (fdr), which measures the general health condition of the leaves on a scale of 1 to 5; number of plants harvested (phv); number of ears harvested (ehv); field weight (we); % ear rot (ert) measured on a scale of 1 to 5; ear rot (erot) measured in percentage; ear appearance (easp); agronomic scale (ags), which indicates the general agronomic condition of the plant measured on a scale of 1 to 5; and grain yield (yld). Ten percent selection pressure was used to estimate the ESIM selection response.


    RESULTS
The Assumption {Sigma}{theta} = ß and the Estimators Formula 22 and Formula 22Formula 22
Using data set 1, Formula 22–1 = Formula 22, and the normalized vector Formula 22{theta} is (Formula 22{theta})' = [Formula 22].

The maximum likelihood estimate of the first eigenvalue of Formula 22 (Eq. [18]) is Formula 221 = 9.025 and the elements of the eigenvector (Eq. [19.1] and [19.2]) associated with Formula 221 are b1 = –0.054 and b2 = 0.998. In this case, Formula 22{theta} indeed equals Formula 22ESIM.

For data set 2, Manning (1956) did not provide economic weights, but they can be estimated from Formula 22S = Formula 22–1Formula 22{theta} as Formula 22 = Formula 22–1Formula 22Formula 22S. Then the estimator of {Sigma}{theta}, in this case, is Formula 22Formula 22 = Formula 22Formula 22–1Formula 22Formula 22S = Formula 22Formula 22S. Normalized and nonnormalized coefficients and estimated values of ßS are shown in Table 2. Differences between Formula 22Formula 22 and Formula 22ESIM are small in some years and large in other years. The greatest discrepancy between Formula 22Formula 22 and Formula 22ESIM occurred in years 1950 and 1953, where the estimate Formula 22Formula 22 assigned the largest weight to variable V2 (and not to variable V1, as Formula 22ESIM did).


Table 2. Estimates of eigenvectors, from ESIM and from Formula 22Formula 22 (in parentheses) (bV1ESIM, bV2ESIM, and bV3ESIM), and normalized and non-normalized (in parentheses) Smith's SI coefficients (bV1S, bV2S, and bV3S) for three cotton traits: number of cotton bolls per plant (V1), number of seeds per boll (V2), and lint per seed (V3), in each of the seven annual selection cycles from data set 2. Eigenvectors transformed from ESIM (bV1ESIM, bV2ESIM, and bV3ESIM) to Smith's SI coefficients (bV1S, bV2S, and bV3S) and from Smith's SI coefficients to ESIM.

 
The vector of coefficients of the Smith SI (βS) is estimated as Formula 22S = Formula 22–1Formula 22θ; therefore the assumption Σθ = β implies that the estimate of the eigenvector of S (Formula 22ESIM) and βS have the relationships Formula 22ESIM = Formula 22Formula 22S and Formula 22S = S–1Formula 22ESIM. This shows that the coefficients from the Smith SI can be transformed to the coefficients of ESIM and vice versa. Therefore, if the previous implications are true, this suggests that the assumption Σθ = β is realistic. To demonstrate the validity of these relationships, consider data set 1 first. For Formula 22θ = Formula 22 and Formula 22–1 = Formula 22 then Formula 22S = Formula 22–1Formula 22θ = Formula 22 and therefore before normalization Formula 22ESIM* = Formula 22Formula 22S = Formula 22θ = Formula 22 so that Formula 22S = Formula 22–1Formula 22ESIM*. A similar procedure can be used for data set 2 where the normalized eigenvector from ESIM can be transformed to the Smith SI coefficients and vice versa. Results from Table 2 indicate that the relationships are true, that is, values of Formula 22Formula 22 (in parentheses) of the first three columns are similar to those obtained when transforming the Smith SI to ESIM coefficients, and the normalized Smith coefficients of the last three columns are similar to those obtained when transforming the ESIM coefficients to the Smith SI coefficients. The advantages of being able to transform the coefficients of ESIM into the Smith SI and vice versa are two-fold. First, the economic weights are not necessary for calculating Formula 22S (Formula 22S = Formula 22–1Formula 22ESIM*). Second, it is unnecessary to estimate the genotypic variance-covariance matrix (Σ).
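The two transformations described above, from Smith coefficients to the non-normalized ESIM-type vector and back, can be sketched for two traits as a round trip through the phenotypic matrix and its inverse. The matrix `S_hat` and the vector `sigma_theta` are hypothetical values, not the estimates from data sets 1 or 2, and the helper names are illustrative.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def esim_to_smith(S_hat, b_esim_star):
    """Smith coefficients from the non-normalized ESIM vector: S^{-1} b."""
    return matvec(inv2(S_hat), b_esim_star)

def smith_to_esim(S_hat, b_smith):
    """Non-normalized ESIM vector from Smith coefficients: S b."""
    return matvec(S_hat, b_smith)

# Hypothetical phenotypic matrix and hypothetical value of Sigma * theta.
S_hat = [[4.0, 1.0], [1.0, 3.0]]
sigma_theta = [3.0, 2.5]

b_smith = esim_to_smith(S_hat, sigma_theta)
b_back = smith_to_esim(S_hat, b_smith)
print(b_smith, b_back)
```

Neither direction uses the economic weights or the genotypic matrix explicitly once one of the two coefficient vectors is available.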

Estimating the Selection Response from ESIM
Denote the response to selection using the eigenvector of Formula 22 and Formula 22Formula 22 as Formula 22ESIM and Formula 22ESIM*, respectively. Using data set 1 and a selection pressure of 5% (k = 2.063), we obtain Formula 22ESIM = Formula 22 = Formula 22 = 0.687 and RESIM* = Formula 22=Formula 22 = 0.687
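The value 0.687 can be checked from the first eigenvalue reported for data set 1 (9.025) and k = 2.063, taking the ESIM response as k/√λ̂ (Eq. [9]). The sketch also shows one common way to obtain k for a given selection pressure, assuming truncation selection on a normally distributed index; that derivation of k and the function names are assumptions, since the text only states the value.

```python
import math
from statistics import NormalDist

def selection_intensity(p):
    """Standardized selection differential when the top fraction p is selected."""
    z = NormalDist().inv_cdf(1.0 - p)   # truncation point of the standard normal
    return NormalDist().pdf(z) / p

def esim_response(lam, k):
    """ESIM selection response k / sqrt(lambda)."""
    return k / math.sqrt(lam)

k_5pct = selection_intensity(0.05)
r_esim = esim_response(9.025, 2.063)
print(round(k_5pct, 3), round(r_esim, 3))  # 2.063 0.687
```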

Estimates of the selection response from Smith's SI (Formula 22S) and ESIM (Formula 22ESIM), and their proportions (Formula 22ESIM/Formula 22S), for data set 2 are shown in Table 3. Formula 22S was computed as Formula 22S = kFormula 22 (with Formula 22S being a normalized vector) and ESIM, Formula 22ESIM = Formula 22 (where Formula 22 is the estimate of the largest eigenvalue of Formula 22). In 3 yr (1951, 1953, and 1954), the estimated selection response obtained from ESIM was greater than the response based on Smith's SI. In 1950, 1952, and 1955, the Smith SI responses were greater than the ESIM response. Estimates of the selection response from Formula 22ESIM and Formula 22ESIM* and their proportions (Formula 22ESIM/Formula 22ESIM*) show that except for year 1953, the ratio Formula 22ESIM/Formula 22ESIM* tends to fluctuate around 1.


Table 3. Year of selection, estimates of the selection response using Smith's (1936) method (Formula 22S) and eigenvalues of the ESIM (Formula 22ESIM and Formula 22ESIM*), and their ratios for seven annual selection cycles using three cotton traits and 5% selection pressure (k = 2.063) from data set 2.

 
These results indicate that the assumption {Sigma}{theta} = ß (and Formula 22 = {lambda}) is generally reasonable, that is, Formula 22 and {Sigma}{theta} are eigenvalues and eigenvectors, respectively, of the phenotypic variance–covariance matrix S. The assumption {Sigma}{theta} = ß places Smith's (1936) result within the context of the eigenvalues and eigenvectors of the phenotypic variance–covariance matrix, and it allows the SI to be estimated without estimating either the genotypic variance–covariance matrix or the economic weights.

Estimating the ESIM parameters {lambda} and ß
From Table 1 we calculated the phenotypic variance-covariance matrix for the three traits in 1949. In this case, the first, second, and third eigenvalues are Formula 221949 = 7.301, Formula 221949 = 0.933, and Formula 221949 = 0.040, respectively, and the three associated eigenvectors are Formula 221' = [0.999 –0.019 0.013], Formula 222' = [0.020 0.955 –0.096], and Formula 223' = [–0.011 0.096 0.955]. From Eq. [10] and [11], the estimated variance of the first eigenvalue Formula 221949 is Formula 22ar(Formula 221949) = Formula 22, while the estimated variance-covariance matrix of its associated eigenvector Formula 221 is Formula 22ar(Formula 221) {approx} Formula 22Formula 22 = Formula 22Formula 22.

Note that as n increases, the variance of the first eigenvalue Formula 221949 and the variance of its associated first eigenvector Formula 221' tend to zero.

Estimating Selection Gain Fj,j+1({lambda}) between Selection Cycles
The eigenvalues and corresponding selection responses for the 7 yr are in Table 4. The selection gain between 1949 and 1950 can be calculated from Eq. [16] using a selection pressure of 5% (k = 2.063): $F_{1949-1950}(\lambda) = -2(2.063)[\sqrt{4.600} - \sqrt{7.301}] = 2.30$. A negative $F_{j,j+1}(\lambda)$, in contrast, indicates a selection loss in selection cycle j+1 with respect to selection cycle j. For example, the selection gain from 1951 to 1952 is $F_{1951-1952}(\lambda) = -2(2.063)[\sqrt{\hat{\lambda}_{1952}} - \sqrt{\hat{\lambda}_{1951}}] = -4.31$, which is negative because $\hat{\lambda}_{1952} > \hat{\lambda}_{1951}$ (Table 4). This means that selection in 1952 was less effective than in 1951. The total selection gain from 1949 to 1955 is $F_t(\lambda) = 6.41$. The cumulated selection gain from 1949 to 1955 can be found as follows: $F_{1949-1955}(\lambda) = -2(2.063)[\sqrt{\hat{\lambda}_{1955}} - \sqrt{\hat{\lambda}_{1949}}] = 6.41$, which means that in 1955 there was 6.41 more selection advance than in 1949.
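The gain calculation can be checked with a few lines of code. This is a sketch under an assumption: Eq. [16] is taken to be $F_{j,j+1}(\lambda) = -2k[\sqrt{\lambda_{j+1}} - \sqrt{\lambda_j}]$. The leading factor $-2k$ appears in the text, but the bracketed term is inferred, because it reproduces the reported 1949 to 1950 gain of 2.30 from the eigenvalues 7.301 and 4.600. The function name `selection_gain` is hypothetical.

```python
import math


def selection_gain(eig_prev: float, eig_next: float, k: float) -> float:
    """Gain between consecutive cycles, ASSUMING the form
    -2 * k * (sqrt(eig_next) - sqrt(eig_prev)).

    A positive value means the eigenvalue decreased from cycle j to j+1;
    a negative value means it increased (a selection loss).
    """
    if eig_prev <= 0 or eig_next <= 0:
        raise ValueError("eigenvalues must be positive")
    return -2.0 * k * (math.sqrt(eig_next) - math.sqrt(eig_prev))


# 1949 -> 1950 with 5% selection pressure (text reports 2.30)
print(round(selection_gain(7.301, 4.600, 2.063), 2))
```

Under this form the gains telescope: the sum of the gains over consecutive cycles equals the gain computed directly from the first and last eigenvalues, which agrees with the total and the cumulated gain both being reported as 6.41.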


Table 4. Year of selection, estimated eigenvalues from ESIM (Formula 22ESIM) for each selection cycle, estimates of the selection response using the estimated eigenvalues for each selection cycle (Formula 22{lambda}ESIM), predicted selection responses using Formula 22ESIM from the previous year of selection (Formula 22 from previous year) (5% selection intensity, k = 2.063), and the selection gains [Fj,j+1({lambda})] between two consecutive cycles using data set 2.

 
Selection effectiveness is related to the magnitude of the eigenvalues and their differences from one year to the next; this is reflected in the positive selection gains from 1949 to 1950, 1950 to 1951, and 1952 to 1953 (Table 4). The selection gain from 1954 to 1955 was small (0.86) because of the small difference between the two corresponding eigenvalues.

Predicting the Selection Response of ESIM in Future Cycles
To illustrate the use of Eq. [17], which was derived to predict the selection response in future selection cycles, we used the eigenvalue for the 1949 cycle (Formula 22 = 7.301) to predict the value of the selection response in 1950, that is, Formula 22(7.301) = Formula 22Formula 22 = 0.96, which is the same as the selection response from the ESIM of that cycle. The prediction for 1951 using Formula 22(4.600) (from 1950) was 1.03, and the actual value was 1.33. Note that in 1953, the eigenvalue was smaller than 1.0; thus, no prediction was possible. The low eigenvalue estimated in 1954 (1.84) did not allow Formula 22(1.84) to be a good predictor of 0.80, the value obtained in 1955.

It should be pointed out that the restriction $1.0 < \hat{\lambda} < \infty$ is necessary for computing $\hat{R}(\hat{\lambda})$. Cases where $0 < \hat{\lambda} \le 1.0$ should be rare and will occur only when phenotypic variability has decreased, so that selection will not be effective. Also, when $\hat{\lambda} = 4.0$, $\hat{R}(\hat{\lambda}) = k/2$, and no prediction is possible.
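The predictions quoted above can be reproduced with a short function, with a strong caveat: Eq. [17] is not legible in this copy, so the form used below, $2k(\sqrt{\lambda} - 1)/\lambda$, is a guess fitted to the reported numbers. It yields 0.96 for the eigenvalue 7.301 and 1.03 for 4.600, equals $k/2$ at an eigenvalue of 4.0, and is positive only for eigenvalues above 1.0, in line with the stated restriction. It should be read as a plausible reconstruction, not as the authors' equation. The name `predicted_response` is hypothetical.

```python
import math


def predicted_response(eig_prev: float, k: float) -> float:
    """Predicted response for the next cycle from the previous eigenvalue.

    ASSUMED form (fitted to reported values, not from the source):
        2 * k * (sqrt(eig_prev) - 1) / eig_prev
    Only defined here for eig_prev > 1.0, matching the stated restriction.
    """
    if eig_prev <= 1.0:
        raise ValueError("prediction requires an eigenvalue greater than 1.0")
    return 2.0 * k * (math.sqrt(eig_prev) - 1.0) / eig_prev


k = 2.063  # 5% selection pressure
print(round(predicted_response(7.301, k), 2))  # prediction for 1950
print(round(predicted_response(4.600, k), 2))  # prediction for 1951
print(predicted_response(4.0, k), k / 2)       # value at an eigenvalue of 4.0
```

Raising an error for eigenvalues at or below 1.0 mirrors the remark that no prediction was possible in 1953, when the eigenvalue was smaller than 1.0.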

Evaluating the Properties of the Selection Response of ESIM ($\hat{R}$)
To measure, in practice, the quality of the estimator of the selection response ($\hat{R}$), data set 3 was used. With a selection pressure of 10% (k = 1.755), the number of selected individuals with the highest ESIM values was n = 15 (Table 5). The estimated eigenvalue was 5.008 and the estimated selection response was 0.784. Using the 15 selected individuals, the expected value, variance, sampling error, coefficient of variation, mean square error, and accuracy of the selection response were computed as $\hat{E}(\hat{R}) = 0.7836$, $\widehat{Var}(\hat{R}) = 1.9552$, $\widehat{SE}(\hat{R}) = 1.3893$, $\widehat{CV}(\hat{R}) = 1.120$, $\widehat{MSE}(\hat{R}) = 1.9552$, and $\widehat{AC}(\hat{R}) = 0.02$, respectively. The bias of the selection response estimator was 0.03. Note how small the values of the estimated bias and the accuracy are, even though the sample includes only 15 individuals. This suggests that the selection response estimator is very accurate. To estimate $\sigma_Z^2 = \theta'\Sigma\theta$, we assumed that $\theta$ was the eigenvector of $\Sigma$, so that $\hat{\sigma}_Z^2 = \hat{\lambda}_\Sigma$, where $\hat{\lambda}_\Sigma = 9.082$ is the eigenvalue of $\hat{\Sigma}$.


Table 5. Values of 15 genotypes selected on the basis of ESIM (from correlation matrix) of the following traits: germination (ger), number of plants (plt), anthesis date (an), silking date (si), plant height (ph), ear height (eh), moisture (mo), root lodging (rl), stalk lodging (sl), % erect plants (pe), foliar disease ratio (fdr), number of plants harvested (phv), number of ears harvested (ehv), field weight (we), % ear rot (ert), ear rot (erot), ear appearance (easp), agronomic scale (ags), grain yield (yld), and eigen selection index (ESIM) from data set 3.

 
A biplot of the principal components shows the relationships between each of the 144 genotypes and the 19 traits (plus ESIM) (Fig. 1). The first two principal components accounted for 49.3% of total variability. Genotypes 21, 4, 34, 113, and 143, located toward the right on the biplot, had the highest values for ESIM as well as for traits mo, ger, plt, ehv, yld, we, and phv. However, these genotypes had low values for traits located on the opposite (left) side of the biplot (i.e., erot, ert, easp, and ags). On the other hand, Genotype 127, located far to the left on the biplot, has very low values for ESIM and for the traits on the right side but high values for traits erot, ert, easp, and ags. The ESIM values are positively and highly correlated with traits ger, plt, ehv, yld, we, and phv (0.7, 0.7, 0.7, 0.9, 0.9, and 0.8, respectively) but negatively and highly correlated with the traits located on the left side of the biplot, erot, ert, easp, and ags (–0.7, –0.7, –0.8, and –0.7, respectively). ESIM values are not correlated with traits such as si, an, pe, fdr, and sl. From the biplot, it can be concluded that the ESIM selects genotypes with high values for traits mo, ger, plt, ehv, yld, we, and phv; low values for traits erot, ert, easp, and ags; and intermediate values for the remaining traits.


Fig. 1. Biplot of the first and second principal component axes (Dimension 1 and Dimension 2) of 144 genotypes (1–144) and the traits eigen selection index (ESIM), germination (ger), number of plants (plt