Published online 19 March 2008
Published in Crop Sci 48:417-423 (2008)
© 2008 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
Breeding Line Selection Based on Multiple Traits
Weikai Yan* and Judith Frégeau-Reid
Eastern Cereal and Oilseed Research Centre (ECORC), Agriculture and Agri-Food Canada (AAFC), Neatby Building, 960 Carling Ave., Ottawa, ON, Canada, K1A 0C6
* Corresponding author (yanw{at}agr.gc.ca).
ABSTRACT
Breeding line selection, either for potential varieties or for useful parents, must be based on multiple breeding objectives (or traits). Varieties cannot have any major defects, while parents must have outstanding levels in at least one trait. Due to undesirable associations among breeding objectives, it is difficult to accomplish both tasks (variety selection and parent selection) through a single selection strategy. Additional complication results when a program is breeding for different end-uses such that both high and low levels of a trait are desirable. The first purpose of this paper was to propose a comprehensive multitrait selection procedure that coherently combines independent selection, independent culling, and index selection so that all the aspects in breeding line selection are taken into consideration. A dataset of 150 oat (Avena sativa L.) breeding lines with values evaluated for four quality traits (groat, oil, protein, and beta-glucan concentrations) was used for illustration. A genotype by trait biplot is a useful tool for exploring multiple trait data and can aid in multitrait selection because it graphically displays the trait associations across, and the trait profiles of, the genotypes. Procedures are outlined to avoid possible misinterpretation of such a biplot when the biplot does not fully display the patterns.
Abbreviations: GT biplot, genotype by trait biplot
INTRODUCTION
GERMPLASM EVALUATION and variety selection must be based on multiple traits or breeding objectives. For most crops, although yield is the number one breeding objective, quality is also very important. Furthermore, quality is not a single trait; rather, it is measured by many characteristics, which can be negatively associated. Also, quality means different things for different end-uses. For example, milling oats (Avena sativa L.) must have superior milling quality and composition quality. Milling quality consists of milling yield (groat content), ease of dehulling, and groat breakage during dehulling; composition quality includes beta-glucan, oil, and protein concentrations in the groat. While high beta-glucan and low oil are desirable for milling oats, the opposite is true for feed oats. In addition to yield and quality, agronomic traits and pest resistance that determine adaptation and performance stability are also essential breeding objectives. Consequently, selection based on multiple traits is an inevitable issue for all breeders.
Three strategies of multitrait selection were outlined by animal breeders as early as the 1940s (Simmonds and Smartt, 1999). These are (i) tandem selection, whereby different traits are selected in different generations; (ii) independent culling, whereby multiple traits are selected simultaneously (concurrently) and independently; and (iii) index selection, whereby multiple traits are selected simultaneously by an index that is a linear combination of various traits, where the traits are treated as compensational. Index selection has been a very important selection strategy and has become increasingly sophisticated in animal breeding (e.g., Villanueva and Williams, 1997). It has also become an important concept in plant breeding and has been widely used, implicitly or explicitly, for the selection of superior varieties as well as for the improvement of a complex breeding objective (Jannink et al., 2000; Ivkovich and Koshy, 2002; Milligan et al., 2003; Sharma and Duveiller, 2003; Long et al., 2006).
Simmonds and Smartt (1999) pointed out that tandem selection was ineffective and that all plant breeding programs employed a blend of independent culling and (inexplicit) index selection. In all breeding programs, selection is repeatedly performed in all generations whenever feasible, and whenever selection is made, a blend of independent culling and index selection is employed. In earlier generations, selection is applied to traits that are of vital importance and are relatively simply inherited, while independent culling and index selection are performed at later generations when multiple traits can be effectively evaluated. Conceptually, independent culling may be more appropriate for traits of more qualitative nature, whereas index selection should consider all of the breeding objectives.
In any breeding program, selection is a dual-purpose task: selection for varieties and selection for parents. The requirement for a new variety is that it meets the minimum criterion for all essential breeding objectives while having a superior package of traits, as measured by the selection index; failure to meet the minimum criterion for any breeding objective will lead to the failure of the variety. This is where independent culling and index selection must be applied. In contrast, the requirement for a parent is that it is outstanding in one or more of the breeding objectives; a higher selection index is desirable but not essential. The most appropriate strategy for selecting superior parents may be "independent selection," as opposed to "independent culling." Therefore, in all breeding programs and at any breeding stage, the selection scheme should consist of three strategies, explicitly or implicitly: (i) independent selection, (ii) independent culling, and (iii) index selection, and in that order.
Yet another factor to consider in developing the selection strategy is that a single breeding program may have to breed for different or contrasting end-uses such that both the high and the low levels of a trait are desirable, depending on the end-use. For example, an oat breeding program may breed for both milling oats and feed oats. While high oil concentration is desirable for feed oats, low oil is desirable for milling oats. Therefore, genotypes with either extremely high or extremely low levels of oil are valuable. Thus, a valid, comprehensive selection scheme is needed to consider all of the following aspects: (i) independent selection for one or both ends regarding a trait, (ii) independent culling for one or both ends regarding a trait, and (iii) index selection to consider all the breeding objectives and their relative importance in the ideotype.
Relationships among breeding objectives impact the choosing of selection strategies. If all breeding objectives were positively correlated, the selection would not be much more difficult than selecting for a single trait. If all breeding objectives were either positively correlated or independently inherited, selection would not be too difficult either. Unfortunately, strong, negative correlations between breeding objectives often exist, either genetically or physiologically, which makes breeding more challenging (Yan and Wallace, 1995; Lewis, 2006).
Therefore, sufficient attention must be paid to undesirable associations among breeding objectives when performing independent culling, because selection for the desired levels or culling for the undesired levels of one trait can mean selection against the desired levels of another trait, which can lead to the loss of useful materials or even render the selection population useless (Yan and Rajcan, 2002). Index selection can also lead to the loss of materials with desirable levels of a trait or to the retaining of materials that have serious defects for some traits. Since the parameters used in selection are largely subjective and determined solely by the researcher's personal experience and judgment, a useful selection scheme should, therefore, have the flexibility for experimenting with various selection parameters before a final selection decision is made.
A genotype by trait (GT) biplot is an effective tool for exploring multitrait data (Yan and Rajcan, 2002). It graphically displays the genotype by trait table and allows visualization of the associations among traits across the genotypes and of the trait profiles of the genotypes. However, there are pitfalls in interpreting a GT biplot when the biplot does not fully approximate the data. As biplots are increasingly used by plant breeders, correct interpretation of GT biplots becomes essential. The purpose of this article is twofold: first, to propose a comprehensive multitrait selection strategy, and second, to propose a procedure for the correct interpretation of a GT biplot.
MATERIALS AND METHODS
A subset of 150 advanced breeding lines from our oat breeding program was evaluated for four quality traits: groat percentage and oil, protein, and beta-glucan concentrations in the groat, using whole grain near-infrared reflectance analysis. The purpose of this exercise was to determine which lines to advance to yield testing based on their values for these four traits. The proposed comprehensive selection procedure includes the following steps:
- The data are first standardized to the range of [0, 1] for each trait, 0 for minimum and 1 for maximum (Table 1).
- A weight is assigned for each trait based on the researcher's expert judgment on the relative importance of the traits. The weight can be anything in the range of [–1, +1]. In our example, groat content was given the maximum weight of 1 as it is the most important of the four traits. Protein was given a weight of 0.3, and beta-glucan 0.5. Oil was given a weight of –0.5 because a lower oil concentration is desirable for milling oats, which is our primary breeding goal. The weights will be used to calculate a selection index, which is a linear combination of all traits, for each genotype.
- For each trait, specify whether high levels, low levels, or both ends are of interest. In our example, high levels of groat and protein are always desirable. High levels of beta-glucan and low levels of oil are desirable for milling oats while the opposite is true for feed oats for maximum energy content. Therefore, the high levels of all four traits and the low levels of oil and beta-glucan were specified.
- Specify whether independent selection is to be performed, and if yes, set the selection rate. In our example, the selection rate was set to 10%. This means that for a trait on which selection will be imposed, genotypes with a relative value of 0.9 or higher are selected if the high levels of the trait are of interest, and genotypes with a relative value of 0.1 or lower are selected if low levels are of interest. This selection rate for each trait is adjusted for the weight given to the trait. For example, a 10% selection rate means 5% for beta-glucan as its weight was set to 0.5.
- Specify whether independent culling is to be performed, and if yes, set the culling rate. In our example, the culling rate was set to 30%. This means that for a trait on which culling will be imposed, genotypes with a relative value of 0.3 or lower will be discarded if high levels of the trait are of interest and genotypes with a relative value of 0.7 or higher will be discarded if low levels are of interest. The culling rate for a trait is also adjusted for the weight given to it.
- Set an overall culling rate. After the selection index is calculated, the genotypes will be sorted by their selection index values. The researcher then specifies an overall culling rate or cut-off point. This cut-off point is largely determined by the stage of selection (preliminary or final) and by the quality of the data. If all important breeding objectives are considered in the selection index and if the data are representative and reliable, the culling rate can be very high so that only a few genotypes are retained; otherwise the culling rate should be sufficiently low so that many genotypes are given the opportunity for future evaluation. The rate was arbitrarily set to 50% in our example.
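The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GGEbiplot implementation: the function names, the tiny data set, and the rounding of the cut-off are hypothetical. In particular, the sketch assumes that independent culling removes the low end of a trait with a positive weight and the high end of a trait with a negative weight, and that the selection and culling rates are scaled by the absolute value of the weight (so a 10% rate becomes 5% for a weight of 0.5, as in the beta-glucan example).

```python
from typing import Dict, List, Set, Tuple


def standardize(values: List[float]) -> List[float]:
    """Rescale one trait to [0, 1]: 0 for the minimum, 1 for the maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant trait carries no information
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def evaluate_lines(
    data: Dict[str, List[float]],   # trait name -> one value per breeding line
    weights: Dict[str, float],      # trait name -> weight in [-1, +1]
    select_high: Set[str],          # traits whose high end is of interest
    select_low: Set[str],           # traits whose low end is of interest
    sel_rate: float = 0.10,         # independent selection rate
    cull_rate: float = 0.30,        # independent culling rate
    overall_cull: float = 0.50,     # culling rate applied to the selection index
) -> Tuple[Dict[str, List[float]], List[float], List[bool], List[bool], List[bool]]:
    traits = list(weights)
    n = len(data[traits[0]])
    rel = {t: standardize(data[t]) for t in traits}

    # Selection index: linear combination of the relative trait values.
    index = [sum(weights[t] * rel[t][i] for t in traits) for i in range(n)]

    selected = [False] * n
    culled = [False] * n
    for t in traits:
        w = abs(weights[t])  # rates are adjusted for the trait weight
        for i in range(n):
            x = rel[t][i]
            if t in select_high and x >= 1 - sel_rate * w:
                selected[i] = True
            if t in select_low and x <= sel_rate * w:
                selected[i] = True
            # Assumption: cull the end that the sign of the weight marks as undesirable.
            if weights[t] > 0 and x <= cull_rate * w:
                culled[i] = True
            if weights[t] < 0 and x >= 1 - cull_rate * w:
                culled[i] = True

    # Lines ranked by index; the top (1 - overall_cull) fraction passes.
    n_keep = n - int(round(overall_cull * n))
    ranked = sorted(range(n), key=lambda i: index[i], reverse=True)
    passes_index = [False] * n
    for i in ranked[:n_keep]:
        passes_index[i] = True
    return rel, index, selected, culled, passes_index


# Hypothetical values for four lines and two of the four traits.
example = {"groat": [70.0, 72.0, 74.0, 80.0], "oil": [5.0, 6.0, 9.0, 7.0]}
result = evaluate_lines(example, {"groat": 1.0, "oil": -0.5},
                        select_high={"groat", "oil"}, select_low={"oil"})
print(result[1])  # selection index of each line
```

Keeping the three flags separate, rather than collapsing them into one decision, makes it easy to experiment with different weights and rates before a final decision is made.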
Table 1. Relative trait values [0, 1], selection index, and decisions made on independent selection, independent culling, and index selection for 150 covered oat lines (not all lines are shown).
Figure 1. The multitrait selection interface that combines independent selection, independent culling, and index selection. The data were first scaled to [0, 1] range (0 for minimum and 1 for maximum) for each trait before the selections were applied. The weights for protein, oil, groat, and beta-glucan were set to 0.3, –0.5, 1.0, and 0.5, respectively. Single trait selection rate for the checked traits was 10% of the trait ranges adjusted by their respective weights. Single trait culling rate was 30% of the trait ranges adjusted by their weights. Overall culling rate based on selection index was 50% of the entries. All traits were selected for high levels, while oil and beta-glucan were also selected for low levels.
Among the three selection strategies in the comprehensive selection scheme, the first priority was given to independent selection, followed by independent culling, and then index selection. As a result, if a genotype is selected by independent selection for any trait, it will be retained regardless of its levels for other traits. If a genotype is culled for a trait and not selected for any trait, it will be discarded even if its overall selection index is above the culling point. This order of priority is necessary to prevent genotypes that may be useful parents from being discarded because of defects in other traits, and to prevent genotypes with serious defects from being advanced as variety candidates, since such genotypes will not become varieties. These steps were performed in a single pass using the "MultiTrait Selection" module of the GGEbiplot software (Yan, 2001; http://ggebiplot.com/biplot-breeder's_kit.htm), as displayed in Fig. 1. Genotype by trait biplots, also generated using the GGEbiplot software, were used to graphically explore the original data as well as the selection results.
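The order of priority described above reduces to a short decision rule. The sketch below is one possible reading of it; the function name and the three boolean inputs are hypothetical and are assumed to have been computed beforehand for each genotype.

```python
def final_decision(selected: bool, culled: bool, passes_index: bool) -> bool:
    """Return True if a genotype is retained, False if it is discarded.

    selected     -- retained by independent selection for at least one trait
    culled       -- flagged by independent culling for at least one trait
    passes_index -- selection index is above the overall culling point
    """
    if selected:      # first priority: possible parents are always kept
        return True
    if culled:        # second priority: a serious defect overrides the index
        return False
    return passes_index  # otherwise the selection index decides


# A genotype with an outstanding trait is kept despite a low index.
print(final_decision(selected=True, culled=False, passes_index=False))
# A genotype with a defect is discarded despite a high index.
print(final_decision(selected=False, culled=True, passes_index=True))
```

Writing the rule as an ordered chain of checks keeps the priority explicit: changing the order of the first two tests would let independent culling discard potential parents.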
RESULTS AND DISCUSSION
Associations among Traits and Trait Profiles of Genotypes
The GT biplot (Fig. 2) displays 64% of the information in the standardized data of the 150 genotypes for the four traits, which is partially presented in Table 1. This biplot can be visualized from two perspectives.

Figure 2. The genotype by trait (GT) biplot of 150 covered oat genotypes for four quality traits. The traits are groat content and oil, protein, and beta-glucan (BGLUCAN) concentrations in the groat.
First, it shows the associations among the traits across the 150 genotypes: (i) a positive correlation (acute angle) between oil and beta-glucan, (ii) a negative correlation (obtuse angle) between beta-glucan and groat, and (iii) a negative correlation (obtuse angle) between oil and protein. These are consistent with the actual correlation coefficients, although only the oil vs. beta-glucan association and the groat vs. beta-glucan association were statistically significant (Table 2). From the viewpoint of milling oat, both associations are undesirable and constitute a challenge to oat improvement. The positive association between oil and beta-glucan is undesirable both for milling oats and for feed oats. Similar relationships among oat quality traits were reported for multiyear multilocation oat yield trials (Yan et al., 2007).
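The reading of angles as associations can be checked numerically. In a biplot of trait-standardized data, the cosine of the angle between two trait vectors approximates their correlation coefficient, and the approximation is close only when the biplot displays most of the variation. The sketch below, with hypothetical function names and coordinates, computes the angle-implied association so that it can be compared with the Pearson correlation calculated from the data.

```python
import math
from typing import Sequence


def angle_cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two 2-D trait vectors from the biplot origin."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))


def implied_association(u: Sequence[float], v: Sequence[float],
                        tol: float = 1e-9) -> str:
    """Translate the angle into the association a reader would infer."""
    c = angle_cosine(u, v)
    if c > tol:
        return "positive (acute angle)"
    if c < -tol:
        return "negative (obtuse angle)"
    return "none (right angle)"


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient computed from the raw trait values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trait coordinates read off a biplot.
oil_xy, bglucan_xy, groat_xy = (0.9, 0.4), (0.8, 0.7), (-0.7, 0.5)
print(implied_association(oil_xy, bglucan_xy))
print(implied_association(bglucan_xy, groat_xy))
```

A large gap between `angle_cosine` for a pair of traits and `pearson` for the same pair is a signal that the two-dimensional display is not representing that relationship well.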
Second, it shows the trait profiles of the genotypes, particularly those that are placed farther away from the biplot origin. For example, it shows that genotypes 722 and 723 had extremely high oil and high beta-glucan but low groat; genotypes 732 and 1155 had extremely high groat but low beta-glucan and low oil; and genotypes 319 and 1247 had extremely high protein but near or below-average levels for other traits. The numerical values of these genotypes for the four traits are presented in Table 1.
Selection Results from the Comprehensive Selection Scheme
Results based on the comprehensive selection scheme specified in the Materials and Methods (Fig. 1) are presented in Table 1. Since the culling rate based on selection index was set to 50%, 75 lines would be promoted and the other 75 discarded. Of the selected lines, five would be discarded if the selection were based on the selection index (genotypes 722, 723, and 855) or independent culling (732 and 1155) (Table 1). However, these genotypes may be useful parents for their outstanding levels in certain traits; it was the independent selection component in the scheme that prevented them from being discarded (Table 1). Figure 3 is the same biplot as Fig. 2 but displays the trait profiles of the genotypes that would be promoted (labeled 1) versus those that would be discarded (labeled 0). Note that the selected genotypes are mostly concentrated on the high groat area because groat was given the highest weight in the index selection scheme (Fig. 1).

Figure 3. The same genotype by trait (GT) biplot as Fig. 2 modified to show the trait profiles of the selected vs. discarded genotypes based on the comprehensive selection scheme (1 for genotypes selected and 0 for genotypes discarded based on the comprehensive selection scheme). BGLUCAN, beta-glucan; SVP, singular value partitioning method.
Considerations in Parameter Specification
It is important to note that the selection results (Table 1, Fig. 3) were determined by the weights given to each of the traits and by the specified selection and culling rates (Fig. 1). These weights and rates are completely subjective and depend on the breeding objectives, the researcher's understanding of the relative importance of the objectives, the breeding stage (preliminary versus final) at which the selection is made, as well as the size of the selection population. At a preliminary stage, data on the most important traits may not be available or reliable (i.e., low heritability), and therefore the culling rate should be relatively low. In the example presented here, for instance, data on the single most important trait, yield, were not even available. Clearly, different weights and rates, which are mainly determined by the researcher's personal experience and judgment, will lead to different decisions.
GT Biplot as an Aid for Independent Selection
On a GT biplot, the vector length of a genotype, which is the distance between the genotype and the biplot origin, is a measure of the genotype's peculiarity (i.e., how it differs from an "average" genotype), where the "average" genotype is a hypothetical genotype that has an average level for all traits and is represented by the biplot origin. Therefore, genotypes with long vectors are those that have extreme levels for one or more traits. Such genotypes may or may not be superior varieties, but they may be useful as parents. Based on the vector length of the genotypes in Fig. 3, the 150 genotypes were arbitrarily stratified into two subsets: a subset of 10 genotypes with vectors longer than 50% of the longest genotype vector (Fig. 4) and a subset of 140 genotypes with vectors shorter than 50% of the longest genotype vector (Fig. 5). Interestingly, but not surprisingly, the six genotypes with the longest vectors (722, 723, 319, 1247, 1155, and 732) in Fig. 4 (as well as Fig. 2) were also among the genotypes that were retained by independent selection (Table 1), confirming that the GT biplot can provide a quick, visual means of identifying genotypes that have extreme and useful trait profiles. Nine of the 10 genotypes (except 142) with long vectors were selected by the comprehensive selection scheme.
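The stratification by vector length can be sketched as follows. The function name, the genotype labels, and the coordinates are hypothetical; the sketch assumes the two-dimensional genotype scores of the biplot are already available, and it places a genotype whose vector is exactly at the threshold in the less extreme subset.

```python
import math
from typing import Dict, List, Tuple


def stratify_by_vector_length(
    scores: Dict[str, Tuple[float, float]],  # genotype -> (PC1, PC2) biplot coordinates
    fraction: float = 0.5,                   # threshold as a share of the longest vector
) -> Tuple[List[str], List[str]]:
    """Split genotypes into extreme and less extreme trait profiles."""
    # Vector length is the distance from the biplot origin.
    lengths = {g: math.hypot(x, y) for g, (x, y) in scores.items()}
    cutoff = fraction * max(lengths.values())
    extreme = sorted((g for g in lengths if lengths[g] > cutoff),
                     key=lambda g: lengths[g], reverse=True)
    typical = [g for g in lengths if lengths[g] <= cutoff]
    return extreme, typical


# Hypothetical coordinates for three genotypes.
demo = {"A": (3.0, 4.0), "B": (1.0, 0.0), "C": (0.0, -3.0)}
extreme, typical = stratify_by_vector_length(demo)
print(extreme, typical)
```

The extreme subset is returned longest vector first, which mirrors how the genotypes farthest from the origin are the first candidates to inspect as potential parents.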

Figure 4. Genotype by trait (GT) biplot of 10 genotypes with extreme trait profiles. BGLUCAN, beta-glucan.
Figure 5. Genotype by trait (GT) biplot of 140 genotypes with less extreme trait profiles (1 for genotypes selected and 0 for genotypes discarded based on the comprehensive selection scheme). BGLUCAN, beta-glucan.
Issues on the Interpretation of a GT Biplot
In contrast to Fig. 2, the biplot in Fig. 4 revealed a much stronger negative association between groat and beta-glucan across the 10 genotypes with extreme trait profiles (also see Table 2), suggesting that it is a difficult task to combine these two traits at a high level. The positive association between beta-glucan and oil, however, became weaker (Fig. 4) and nonsignificant (Table 2), suggesting that high beta-glucan does not always go with high oil or that low beta-glucan does not always go with low oil. However, no genotype in this population was identified to have low oil and high beta-glucan (desirable for milling oat) or high oil and low beta-glucan (desirable for feed oat). The trait associations shown in Fig. 2 and Fig. 4 are consistent with the actual correlation coefficients (Table 2).
The GT biplot with the 140 genotypes of less extreme trait profiles also revealed a strong, positive correlation between oil and beta-glucan (Fig. 5, Table 2). However, these two traits were virtually independent of groat and protein (Fig. 5, Table 2). This suggests that genotypes existed in the current population that combined relatively high groat and relatively high beta-glucan (desirable for milling oat) or genotypes that combined relatively high groat and relatively low beta-glucan (desirable for feed oat).
One important reason for presenting Fig. 5 is to point out that a GT biplot can sometimes show a false trait association. Based on the angle, Fig. 5 appears to suggest that protein and groat were strongly negatively correlated across the 140 genotypes, but the two traits were actually uncorrelated (Table 2). A closer examination of Fig. 5 revealed that the vector of protein was noticeably shorter than those of the other traits. This indicates that information on protein may not have been fully presented in the first two principal components (PC1 and PC2), and therefore its associations with other traits may not have been correctly displayed. Indeed, examination of a spinning three-dimensional biplot (not presented) revealed right angles among three groups of traits: oil + beta-glucan, protein, and groat, suggesting independence among them.
In principle, a two-dimensional biplot can fully display only two (groups of) variables that are independent of each other, which will be at a right angle. When three or more groups of mutually independent variables are displayed in a two-dimensional biplot, some relationships will be distorted. In such cases, the biplot displays the strongest patterns more accurately at the expense of other patterns. In the example, the strongest patterns were the positive correlation between oil and beta-glucan, and their independence of groat and protein (Table 2). These were correctly displayed by the biplot. However, the biplot was not able to display the independence between groat and protein. As a rule, if there is a variable (trait) in the GT biplot whose vector is considerably shorter than the others, it implies that the relationships of the variable with others may not have been properly displayed, and that this variable tends to be independent of the variables with longer vectors.
This example calls for caution in interpreting a GT biplot. The following steps should be followed to avoid drawing false conclusions from such a biplot.
- Check the goodness of fit of the biplot and see if it explained all or most of the information that is to be studied. If yes (e.g., Fig. 4), the patterns in the biplot are accurate.
- If the biplot does not display most of the variation, check if the vectors of the traits are of similar length. If yes (e.g., Fig. 2 and 4), the biplot adequately displays the patterns in the data but the associations among the traits may not be as strong as they appear in the biplot.
- If some traits have apparently shorter vectors (e.g., Fig. 5), the relationships involving these variables may be displayed inaccurately. This can be explored by (i) visualizing a rotating three-dimensional biplot, (ii) examining a biplot of PC1 vs. PC3, PC2 vs. PC3, or PC3 vs. PC4, and/or (iii) examining the actual correlation matrix among the variables.
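The three checks above can be turned into a small diagnostic. The sketch below is illustrative only: the function names and the two thresholds (`good_fit=0.9` for "all or most" of the variation, `ratio=0.6` for a "considerably shorter" vector) are assumptions chosen for the example, not values given in this paper. Goodness of fit is computed as the share of the total sum of squares carried by the first two principal components, from the singular values of the standardized genotype by trait table.

```python
import math
from typing import Dict, List, Sequence, Tuple


def goodness_of_fit(singular_values: Sequence[float], n_axes: int = 2) -> float:
    """Proportion of total variation displayed by the first n_axes components."""
    squares = [s * s for s in singular_values]
    return sum(squares[:n_axes]) / sum(squares)


def short_vector_traits(trait_xy: Dict[str, Tuple[float, float]],
                        ratio: float = 0.6) -> List[str]:
    """Traits whose 2-D vector is shorter than ratio times the longest trait vector."""
    lengths = {t: math.hypot(x, y) for t, (x, y) in trait_xy.items()}
    longest = max(lengths.values())
    return sorted(t for t, length in lengths.items() if length < ratio * longest)


def diagnose(singular_values: Sequence[float],
             trait_xy: Dict[str, Tuple[float, float]],
             good_fit: float = 0.9, ratio: float = 0.6) -> str:
    """Walk through the three interpretation checks in order."""
    if goodness_of_fit(singular_values) >= good_fit:
        return "patterns accurate"
    short = short_vector_traits(trait_xy, ratio)
    if not short:
        return "patterns adequate; associations may be weaker than they appear"
    return ("verify relationships involving: " + ", ".join(short)
            + " (inspect PC3 or the correlation matrix)")


# Hypothetical singular values and trait coordinates.
print(diagnose([2.0, 1.5, 1.4, 0.5],
               {"oil": (0.9, 0.3), "bglucan": (0.8, 0.5),
                "groat": (-0.2, 0.9), "protein": (0.1, -0.3)}))
```

When a trait is flagged, the correlation matrix computed from the data remains the reference against which the displayed angles should be judged.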
ACKNOWLEDGMENTS
The excellent technical support from Klaus D. Jakubinek for running the breeding nursery and Dorothy Sibbitt for conducting the near-infrared analysis is greatly appreciated. We thank two anonymous reviewers and the associate editor for their corrections and suggestions for an earlier version of this manuscript.
NOTES
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.
Received for publication May 3, 2007.
REFERENCES
- Ivkovich, M., and M. Koshy. 2002. Optimization of multiple trait selection in western hemlock (Tsuga heterophylla (Raf.) Sarg.) including pulp and paper properties. Ann. Sci. 59:577–582.
- Jannink, J.-L., J.H. Orf, N.R. Jordan, and R.G. Shaw. 2000. Index selection for weed suppressive ability in soybean. Crop Sci. 40:1087–1094.
- Lewis, R.S. 2006. Identification of germplasm of possible value for confronting an unfavorable inverse genetic correlation in tobacco. Crop Sci. 46:1764–1771.
- Long, J., J.B. Holland, G.P. Munkvold, and J.-L. Jannink. 2006. Responses to selection for partial resistance to crown rust in oat. Crop Sci. 46:1260–1265.
- Milligan, S.B., M. Balzarini, and W.H. White. 2003. Broad-sense heritabilities, genetic correlations, and selection indices for sugarcane borer resistance and their relation to yield loss. Crop Sci. 43:1729–1735.
- Sharma, R.C., and E. Duveiller. 2003. Selection index for improving helminthosporium leaf blight resistance, maturity, and kernel weight in spring wheat. Crop Sci. 43:2031–2036.
- Simmonds, N., and J. Smartt. 1999. Principles of crop improvement, 2nd ed. Blackwell Science Ltd. Press, Oxford, UK.
- Villanueva, B., and J.A. Williams. 1997. Optimization of breeding programs under index selection and constrained inbreeding. Genet. Res. 69:145–158.
- Yan, W. 2001. GGEbiplot: A Windows application for graphical analysis of multienvironment trial data and other types of two-way data. Agron. J. 93:1111–1118.
- Yan, W., and I. Rajcan. 2002. Biplot evaluation of test sites and trait relations of soybean in Ontario. Crop Sci. 42:11–20.
- Yan, W., N.A. Tinker, S. Molnar, J. Fregeau-Reid, and A. McElroy. 2007. Associations among oat traits and their responses to the environment in North America. J. Crop Improve. 20:1–29.
- Yan, W., and D.H. Wallace. 1995. Breeding for negatively associated traits. Plant Breed. Rev. 13:141–177.