Crop Science 42:1493-1497 (2002)
© 2002 Crop Science Society of America
CROP BREEDING, GENETICS & CYTOLOGY
Resource Allocation to Select for Yield in Soybean
T. C. Helms^a,*, J. H. Orf^b, and J. T. Terpstra^a
a Dep. of Statistics, North Dakota State Univ., Fargo, ND 58105
b Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, St. Paul, MN 55108
* Corresponding author (Ted.Helms{at}ndsu.nodak.edu)
ABSTRACT
Breeders need information on the best way to allocate a fixed amount of resources to develop higher-yielding soybean [Glycine max (L.) Merr.] lines. Our objectives were to determine the best resource allocation for first-year yield tests by varying the number of replicates, number of lines per population, and number of populations. Thirty lines developed from each of 10 populations were tested in two selection environments. All 300 lines were evaluated in each of the seven environments. Five environments were used as validation environments to compare the mean of the selected lines to the overall mean. Increasing the number of replicates did not increase the effectiveness of selection. When 30 lines were tested from five populations, the realized selection differential varied from 170 to 310 kg ha⁻¹, depending on which of the five populations were evaluated. We concluded that no more than 15 lines should be sampled from as many different populations as possible. The best allocation of resources to maximize the response to selection for first-year yield evaluation was the use of a single replicate at one location.
INTRODUCTION
THE MEAN AND GENETIC VARIANCE differ among populations of inbred soybean lines when each population is derived from a different pair of parents. Populations with a high mean and large genetic variance would be expected to produce the best new experimental lines. Breeders make many crosses to increase their chances that some of these populations will have both a high mean and large genetic variance. Helms et al. (1997) concluded that the genetic variance of a population could not be predicted using either RAPD markers or the coefficient of parentage.
Weber (1979) reported that it is necessary to make many crosses, even though some crosses are known to have a reduced probability of producing desirable genotypes. On the basis of theoretical considerations, he suggested that one line should be evaluated from as many F2 populations as possible. St. Martin (1985) discussed allocation of resources in a multistage plant breeding program. He concluded that one-half of the available testing resources should be used for the first stage of testing. He provided several reasons why the predicted gain equation may be an unreliable estimator of realized gain.
The amount of testing resources is a fixed quantity. Plant breeders need to compare different allocations of resources to determine the best strategy to maximize genetic gain in the first year of replicated yield testing. Realized gain is a better indicator of the best allocation of resources than predicted gain. Our objectives were to compare the realized response to selection when: (i) the number of replicates per experimental line was varied; (ii) the total number of experimental lines evaluated was varied and the number of yield plots was held constant; and (iii) the number of populations used to develop experimental lines was varied.
MATERIALS AND METHODS
Ten soybean populations were evaluated. Each of the 10 populations comprised 30 experimental lines that had been previously selected. Lines were selected as plant rows, on the basis of early maturity and visual appearance. The 30 selected lines from each of the 10 populations were then evaluated in replicated yield trials at each of seven environments. Soil type and precipitation from 1 May to 30 September were determined for each of the seven environments (Table 1).
The pedigrees of each population included: Merit x Ozzie (Population 1), M84-93 x M81-18 (Population 2), (Ozzie x Sprite) x Sigco KG20 (Population 3), (Ozzie x Sprite) x Ozzie (Population 4), Kato x Maple Ridge (Population 5), Agassiz x Northrup King S07-80 (Population 6), Pioneer 9061 x Evans (Population 7), Bert x M84-93 (Population 8), LMA-82/1 x Maple Ridge (Population 9), and LS-352 x McCall (Population 10). M84-93 and M81-18 are experimental lines derived from the University of Minnesota soybean breeding project. LMA-82/1 and LS-352 are experimental lines obtained from Poland by the University of Minnesota. Populations were considered to be a random effect which represented a conceptual population of elite x elite crosses.
Lines were derived from F3 plants that were individually threshed and F3:4 plant rows were evaluated for populations Pioneer 9061 x Evans and M84-93 x M81-18. For the other eight populations, F4 plants were individually threshed and F4:5 plant rows were evaluated.
Two-row plots were planted at the Casselton and Mantador, ND, locations in 1995 and 1996. At these North Dakota locations, plots were planted 6.1 m long with 0.76-m spacing between rows. The center 4.3 m of the two-row plot was harvested. Planting rate was 41 seeds m⁻¹. Four-row plots were planted at Morris and Rosemount, MN, in 1995 and at Rosemount in 1996. At these Minnesota locations, plots were planted 3.7 m long with 0.25-m spacing between rows. The center 2.4 m of all four rows in each plot was harvested. Planting rate was 34 seeds m⁻¹. The experimental design was an eight by eight square lattice with two replicates per location. The 10 populations were evaluated using a sets-within-replicates arrangement. Six lines from each of the 10 populations were assigned to a set for a total of 60 entries. Four commercially available check cultivars also were included in each set, which gave the 64 entries required by the lattice. The same experimental lines were assigned to the same set in all seven environments. There were five sets at each environment and each set was arranged in a separate lattice design. Each set within each environment was adjusted for lattice effects.
The two selection environments were identified using a random process, and the remaining five environments were considered validation environments. By chance, the two selection environments were the 1995 Casselton and Mantador sites. The selection environments represent a random effect, and inferences about this random effect extend to a conceptual population of environments in the same geographical region. The two selection sites served as two replications for evaluating the effects of selection. The response to selection was the mean yield of the selected lines minus the overall mean yield of the 300 lines evaluated, averaged across the five validation environments. This realized response to selection was defined as the realized selection differential.
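The definition of the realized selection differential can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the six yields are hypothetical and are not data from this study; the sketch only assumes that lines are ranked on selection-environment yield and then compared on validation-environment yield.

```python
from statistics import mean

def realized_selection_differential(selection_yield, validation_yield, n_select):
    """Mean validation yield of the top n_select lines (ranked on selection
    data) minus the mean validation yield of all lines evaluated."""
    # Rank lines by yield in the selection environment(s), highest first.
    ranked = sorted(selection_yield, key=selection_yield.get, reverse=True)
    selected = ranked[:n_select]
    selected_mean = mean(validation_yield[line] for line in selected)
    overall_mean = mean(validation_yield.values())
    return selected_mean - overall_mean

# Hypothetical yields (kg/ha) for six lines; NOT data from the study.
selection = {"A": 3100, "B": 2900, "C": 2700, "D": 2600, "E": 2500, "F": 2400}
validation = {"A": 2900, "B": 3000, "C": 2700, "D": 2650, "E": 2600, "F": 2350}

print(realized_selection_differential(selection, validation, n_select=2))
```

In the study itself the validation value for each line would be its mean across the five validation environments; the sketch collapses that to a single number per line.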
All 300 lines were evaluated at all seven test environments. Lines were selected from each of the five sets on the basis of the data from the selection environment(s). The selected lines were then evaluated at five validation environments that were not the same as the selection environments. The validation environments included Rosemount and Morris in 1995; Casselton and Mantador in 1996; and Rosemount in 1996. The validation environments were used to compare realized selection differentials for various resource-allocation strategies. When the influence of selection was evaluated by testing 150 lines, three lines from each of the 10 populations within each set were randomly identified for yield evaluation. When selection was conducted on the basis of data from 75 lines, 15 lines from each set were randomly identified for evaluation, without regard to the population of origin.
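The two subsampling schemes described above can be sketched as follows. The tuple labels, function names, and random seed are assumptions made for illustration; the original randomization procedure is not described in enough detail to reproduce it.

```python
import random

N_SETS, N_POPULATIONS, LINES_PER_POP_PER_SET = 5, 10, 6

# Label each line as (set, population, index within the set): 5 x 10 x 6 = 300.
lines = [(s, p, i)
         for s in range(1, N_SETS + 1)
         for p in range(1, N_POPULATIONS + 1)
         for i in range(1, LINES_PER_POP_PER_SET + 1)]

def sample_150(lines, rng):
    """Three random lines per population within each set (stratified)."""
    chosen = []
    for s in range(1, N_SETS + 1):
        for p in range(1, N_POPULATIONS + 1):
            cell = [ln for ln in lines if ln[0] == s and ln[1] == p]
            chosen.extend(rng.sample(cell, 3))
    return chosen

def sample_75(lines, rng):
    """Fifteen random lines per set, without regard to population of origin."""
    chosen = []
    for s in range(1, N_SETS + 1):
        in_set = [ln for ln in lines if ln[0] == s]
        chosen.extend(rng.sample(in_set, 15))
    return chosen

rng = random.Random(0)  # arbitrary seed, used only for reproducibility
subset_150 = sample_150(lines, rng)
subset_75 = sample_75(lines, rng)
print(len(lines), len(subset_150), len(subset_75))
```

The first scheme keeps every population equally represented, whereas the second lets the number of lines per population vary by chance.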
The probability of a sample of n lines containing at least one line that is in the highest-yielding 10% of a distribution can be calculated. Let the probability equal the proportion of lines that are equal to or greater than a specified truncation point. Let $p$ equal the probability of success, which is 0.1; $1 - p$ equals the probability of failure; and $n$ equals the number of lines sampled. Then the probability of a sample of n lines containing at least one line that is in the highest 10% of the distribution is

$$P = 1 - (1 - p)^{n} \qquad [1]$$
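As a rough check on the probability just described, the following sketch compares the closed form 1 - (1 - p)^n with a Monte Carlo estimate. The function names, the number of trials, and the seed are assumptions for illustration, and the sketch assumes that the n sampled lines are independent draws from the distribution.

```python
import random

def prob_at_least_one(n, p=0.1):
    """Closed-form probability that at least one of n independent draws
    falls in the top fraction p of the distribution."""
    return 1.0 - (1.0 - p) ** n

def simulate(n, p=0.1, trials=20000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    hits = 0
    for _ in range(trials):
        # A draw is a success when its uniform quantile lands in the top p.
        if any(rng.random() > 1.0 - p for _ in range(n)):
            hits += 1
    return hits / trials

for n in (1, 5, 15, 30):
    print(n, round(prob_at_least_one(n), 3), round(simulate(n), 3))
```

The two columns should agree to within simulation error, which is a quick way to confirm that the formula matches the verbal definition.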
RESULTS AND DISCUSSION
Replication Effect
The effectiveness of selection in first-year yield evaluation was determined when the number of replicates per line was varied. There were 30 lines selected out of a total of 300 lines evaluated. Lines were selected on the basis of data from one replicate at one selection site, two replicates at one selection site, or two replicates at each of two sites. The average realized selection differential for selecting lines based on one replicate per location was 270 kg ha⁻¹ at Casselton, 200 kg ha⁻¹ at Mantador, and 235 kg ha⁻¹ when averaged across the Casselton and Mantador sites (Table 2). The realized selection differential for selecting lines based on the mean, averaged across two replicates and two sites, was 250 kg ha⁻¹. The group of lines selected on the basis of the yield of a single plot yielded as much as the group of lines selected on the basis of the mean of four plots. The maturity of lines selected based on one replicate at one site was 3 d later than the mean of the population. The maturity of lines selected based on the mean, averaged across two replicates and two sites, also was 3 d later than the mean of the population. Because the maturity of the selected lines is the same for both selection criteria, the yield comparison is not confounded with the influence of maturity on yield.
Table 2. Realized selection differential of 30 soybean lines selected from 300 lines on the basis of selection for yield at one location with one replicate, one location with two replicates per site, or two locations with two replicates per site.
The objective of selection in the first year of yield trials is primarily to discard the lowest-yielding lines. Hegstad et al. (1999) reported that when soybean lines were evaluated for yield using only one replicate at a single site, the highest-yielding one-third of the lines yielded more in validation environments than the lowest-yielding two-thirds of the lines. Streit et al. (2001) stated that yield of soybean lines selected, based on one or two replicates of a single row 108 cm long, was not significantly better than random selection. In the present study, when selection was conducted at individual locations with two-row plots of which the center 4.3 m was harvested, the use of one replicate was as effective as two replicates.
Baihaki et al. (1976) reported that when soybean lines were divided equally into three groups on the basis of yield performance, the high-yielding lines as a group contributed 25% of the total genotype x environment interaction component of variance. They selected lines on the basis of three replicates at a single location and evaluated all the lines averaged across six environments. They reported that the highest-yielding lines could be identified on the basis of data from a single location. Their results agree with the results of this study that the highest-yielding lines could be identified on the basis of data from a single site. The genetic variance among unselected lines would be much larger than the genetic variance among lines that had been selected previously. This might explain why it required only one replicate per line to identify a group of 30 high-yielding lines. Increasing the number of replicates at a single site would be expected to have limited value because the soil and weather conditions are similar within a site. As the number of observations per line is increased, the precision of the mean of an individual line is increased. However, increasing the number of replicates within a site only provides additional precision regarding the performance of a line at that site. Genotype x environment interaction between the selection and validation sites would reduce the value of additional replicates at the selection site.
The environmental conditions within a site are usually the same for each block. Helms et al. (1999) found a lack of block x treatment interaction for most of the sites they evaluated. The limited sample of environmental conditions that exists in a single year might provide an explanation as to why selection based on a single replicate was as effective as selection based on the mean of two replicates at each of two sites within a single year.
The phenotypic correlation between the 1995 Casselton and 1995 Mantador sites that were used for selection was r = 0.67 (P < 0.001). The precipitation during the growing season was the same at the 1995 Casselton and Mantador selection sites (Table 1). This is evidence that the environmental conditions were very similar between these two sites. Perhaps this would be a common occurrence for two selection sites in close geographical proximity that share a common year. The two sites are located 100 km apart. Breeders commonly allocate the majority of their first-year yield tests in close proximity to the local breeding station. Jones (1988) stated that low correlation coefficients among test sites indicate that the additional site is reducing the standard error of the mean of a genotype. The high correlation coefficient between the 1995 Casselton and Mantador sites indicates that there is little additional information to be gained by testing at both these sites.
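A phenotypic correlation between two sites of the kind reported above is an ordinary Pearson correlation of line means. The sketch below uses plain Python; the function name and the five pairs of yields are hypothetical and are not the study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical line means (kg/ha) at two sites; NOT data from the study.
site_a = [2500, 2700, 2900, 3100, 3300]
site_b = [2600, 2650, 3000, 2950, 3400]
print(round(pearson_r(site_a, site_b), 2))
```

A value near 1 means the two sites rank the lines almost identically, so the second site adds little independent information about which lines to keep.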
After the first year of selection for yield, the objective is to identify the best individual line out of a group of high-yielding lines. Greater precision is required for the second year of yield evaluation because the differences in yield among the lines at this stage are of a smaller magnitude. For this reason, more testing at a larger number of sites with increased replication at each site is a common practice.
Number of Genotypes Tested is Varied
The realized selection differential in the five validation sites was 270 kg ha⁻¹ when 300 lines were evaluated using one replicate, and 240 kg ha⁻¹ when 150 lines were evaluated using two replicates at Casselton. This difference was not significant (Table 3). The realized selection differential in the five validation sites was 200 kg ha⁻¹ when 300 lines were evaluated using one replicate, and 160 kg ha⁻¹ when 150 lines were evaluated using two replicates at Mantador. This difference was not significant (Table 3). These results are evidence that sampling 30 lines from each of 10 populations with evaluation based on one replicate was as effective as sampling 15 lines from each of 10 populations with evaluation based on two replicates.
Table 3. Realized selection differential for yield with 30 soybean lines selected among 300 lines evaluated on the basis of one replicate per line, 150 lines evaluated using two replicates per line, or 75 lines evaluated using four replicates per line.
The total amount of yield-testing resources was fixed at 300 plots for this comparison. The average realized selection differential for testing 300 lines using one replicate per location was 270 kg ha⁻¹ at Casselton, 200 kg ha⁻¹ at Mantador, and 235 kg ha⁻¹ when averaged across the Casselton and Mantador sites (Table 3). The average realized selection differential for testing 150 lines using two replicates was 240 kg ha⁻¹ at Casselton, 160 kg ha⁻¹ at Mantador, and 200 kg ha⁻¹ when averaged across the Casselton and Mantador sites. This result shows that when the average of the realized selection differential of the two resource-allocation strategies is compared, both strategies had equal merit. A third resource-allocation strategy was to evaluate 75 lines and select on the basis of the mean, averaged across two replicates and two sites. This third resource-allocation strategy resulted in a realized selection differential of 200 kg ha⁻¹ (Table 3). The realized selection differential was the same for each of the three resource-allocation strategies. This is empirical evidence that for a fixed number of plots, the realized gain from selection would be the same when 300, 150, or 75 lines were evaluated. The maturity of the selected lines varied from 2 to 3 d later than the population mean for the three selection criteria. This indicates that yield of lines selected based on different resource allocations can be compared without the confounding influence of maturity.
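The fixed-budget arithmetic behind this comparison can be sketched as follows. The function and variable names are hypothetical; the 300-plot budget and the realized selection differentials are the values quoted in the text (site averages for the 300-line and 150-line strategies).

```python
TOTAL_PLOTS = 300  # fixed testing budget stated in the text

def lines_testable(total_plots, plots_per_line):
    """Number of lines that can be evaluated when each line occupies
    plots_per_line plots (replicates per site x number of sites)."""
    if total_plots % plots_per_line != 0:
        raise ValueError("budget is not evenly divisible by plots per line")
    return total_plots // plots_per_line

# (replicates per site, sites) for the three strategies compared in the text.
strategies = [(1, 1), (2, 1), (2, 2)]

# Realized selection differentials (kg/ha) quoted in the text,
# keyed by the number of lines evaluated.
reported = {300: 235, 150: 200, 75: 200}

for reps, sites in strategies:
    n_lines = lines_testable(TOTAL_PLOTS, reps * sites)
    print(f"{reps} rep(s) x {sites} site(s): {n_lines} lines, "
          f"reported differential {reported[n_lines]} kg/ha")
```

Each doubling of plots per line halves the number of lines that the same budget can accommodate, which is the trade-off the three strategies explore.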
When the total number of plots and number of lines evaluated was varied and the intensity of selection was held constant at 7%, the realized selection differential did not change (Table 4). The lines were selected on the basis of two replicates at each of two sites, but the number of lines evaluated was varied. This result is evidence that increasing the number of lines sampled from the same number of populations does not identify selected lines that as a group are higher in yield. The intensity of selection and phenotypic variance of an individual line on an entry-mean basis did not change as the number of lines sampled was varied. Mean maturity of selected lines varied from 2 to 4 d later than the population mean. This indicates that maturity of the selected lines was similar when different numbers of lines were evaluated. The influence of maturity on yield was not an important factor when comparing groups of selected lines.
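The constant selection intensity can be verified directly from the numbers of lines selected and evaluated given in the caption of Table 4 (20 of 300, 10 of 150, and 5 of 75). The short sketch below uses exact fractions; the variable names are hypothetical.

```python
from fractions import Fraction

# (lines selected, lines evaluated) for the three cases in Table 4.
schemes = [(20, 300), (10, 150), (5, 75)]

# Exact selected proportion for each case.
intensities = [Fraction(selected, evaluated) for selected, evaluated in schemes]

for (selected, evaluated), frac in zip(schemes, intensities):
    print(f"{selected} of {evaluated}: {frac} = {float(frac):.1%}")
```

All three cases reduce to the same fraction, 1/15, which rounds to the 7% stated in the text.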
Table 4. Realized selection differential for 20 soybean lines selected from 300 lines, 10 lines selected from 150 lines, and five lines selected from 75 evaluated lines.
When 30 lines were sampled from each of the 10 populations, based on the mean averaged across two replicates and two sites, the realized selection differential was 250 kg ha⁻¹ (Table 2). When 15 lines were sampled from each of the 10 populations, based on the mean of two replicates and two sites, the realized selection differential was 260 kg ha⁻¹ (Table 5). This is evidence that no more than 15 lines should be sampled from each population. The fewer the lines sampled per population, the greater the number of populations that can be sampled when the total number of lines to be evaluated is a fixed quantity.
If 15 experimental lines were sampled from one biparental cross, the probability that at least one line would have a genotypic value for yield in the top 10% of the distribution would be

$$P = 1 - (1 - 0.1)^{15} \approx 0.79 \qquad [2]$$
When the experimental lines are developed from 10 different crosses and an equal number of lines are sampled from each of the 10 populations, the total number of lines sampled is 10n. The probability of sampling at least one line that had a genotypic value for yield in the top 10% of the pooled population would be

$$P = 1 - (1 - 0.1)^{10n} \qquad [3]$$
When n = 15 and there are 10 populations which are pooled, then P = 0.999. When n = 30 and there are 10 populations which are pooled, then P = 0.999, which is the same result as when only 15 lines were sampled from each population.
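The near-identical values for n = 15 and n = 30 follow from how small the failure term becomes once 150 or more lines are pooled. As a worked illustration, assuming the pooled probability takes the form 1 - (1 - 0.1)^{10n} implied by the definitions in Materials and Methods (success probability 0.1 and 10n lines sampled in total):

```latex
\begin{align*}
n = 15:&\quad P = 1 - (0.9)^{150} \approx 1 - 1.4 \times 10^{-7} \\
n = 30:&\quad P = 1 - (0.9)^{300} \approx 1 - 1.9 \times 10^{-14}
\end{align*}
```

Both values exceed 0.999, so doubling the number of lines per population changes the probability only beyond the sixth decimal place.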
Of the 10 populations evaluated, only two populations had a high mean and large genetic variance (Table 6). Those two populations included lines developed from crossing M84-93 x M81-18 (Population 2), and the three-way cross (Ozzie x Sprite) x Sigco KG20 (Population 3). If we assume that experimental lines that were superior to the check cultivars could be derived only from those two populations, the probability that a sample of 15 lines from each of those two populations would include at least one line that had a genotypic value in the top 10% of the pooled distribution would be

$$P = 1 - (1 - 0.1)^{2(15)} \approx 0.96 \qquad [4]$$
Table 6. Mean and genetic variance for yield of 10 soybean populations with 30 lines per population, averaged across seven environments.
If 30 lines were sampled from each of those two populations, then

$$P = 1 - (1 - 0.1)^{2(30)} \approx 0.998 \qquad [5]$$
This result provides evidence that there is a high probability that at least one line in the highest-yielding 10% of a population would be included in a sample of 15 lines from each population. Evaluation of 15 lines per population, as opposed to evaluation of 30 lines per population, would permit twice as many populations to be evaluated. This strategy would increase the likelihood that some of the populations would have a high mean and large genetic variance.
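The trade-off between lines per population and number of populations can be tabulated with a short sketch. It assumes the pooled form of the probability, 1 - (1 - p)^(lines per population x number of populations) with p = 0.1 and independent sampling; the function name and the grid of values are chosen for illustration.

```python
def prob_top_fraction(lines_per_pop, n_pops, p=0.1):
    """Probability that at least one sampled line falls in the top fraction p
    of the pooled distribution, assuming independent sampling of
    lines_per_pop lines from each of n_pops populations."""
    total_lines = lines_per_pop * n_pops
    return 1.0 - (1.0 - p) ** total_lines

for n_pops in (1, 2, 10):
    for lines_per_pop in (15, 30):
        prob = prob_top_fraction(lines_per_pop, n_pops)
        print(f"{n_pops:2d} population(s), {lines_per_pop} lines each: P = {prob:.4f}")
```

Under these assumptions the gain from sampling 30 rather than 15 lines per population is already small with two populations and negligible with ten, which is consistent with spending the extra plots on additional populations.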
Population Effect
The mean and genetic variance for yield differed among the 10 populations that were sampled (Table 6). When 30 lines were sampled from each of five populations, the realized selection differential depended on which five populations were sampled (Table 5). The realized selection differential of 10 populations with 15 lines sampled per population was 260 kg ha⁻¹. The realized selection differential of different permutations of five populations was 240 kg ha⁻¹, averaged across the five permutations. When only five populations were sampled, the smallest realized selection differential was 170 kg ha⁻¹, and this was significantly smaller than the realized selection differential when 10 populations were used with 15 lines sampled from each population. This is evidence that more than five populations should be developed and sampled. When only five populations are sampled, there is a risk that none of the populations will have a high mean and large genetic variance.
CONCLUSIONS
The information from this empirical experiment does not provide a global solution to the best allocation of resources. It will require further research to determine whether these results are repeatable in other years and geographical regions. The results of all empirical research are, of necessity, limited to the conceptual population of environments and populations that were randomly sampled.
Increasing the number of replicates assigned to each experimental line will increase the total number of plots necessary to evaluate a fixed number of lines. The similarity of environmental conditions between blocks within a single site and between sites within a single year explains why the realized selection differential was the same regardless of whether one, two, or four plots were used to evaluate each line. This result suggests that experimental lines in the first year of yield evaluation need only be evaluated using one replicate.
As the number of different populations increases, there is a greater probability of developing lines from a population with both a high mean and large genetic variance. Sampling 30 lines per population did not increase the realized selection differential compared with sampling 15 lines per population. This experiment provides empirical evidence that for a fixed number of yield plots, no more than 15 lines should be sampled from as many different populations as possible.
Received for publication August 10, 2000.
REFERENCES
- Baihaki, A., R.E. Stucker, and J.W. Lambert. 1976. Association of genotype x environment interactions and performance level of soybean lines in preliminary yield tests. Crop Sci. 16:718-721.
- Hegstad, J.M., G. Bollero, and C.D. Nickell. 1999. Potential of using plant row yield trials to predict soybean yield. Crop Sci. 39:1671-1675.
- Helms, T., J. Orf, G. Vallad, and P. McClean. 1997. Genetic variance, coefficient of parentage, and genetic distance of six soybean populations. Theor. Appl. Genet. 94:20-26.
- Helms, T.C., R.A. Scott, and J.J. Hammond. 1999. Intrablock variance among duplicate treatments for nearest neighbor analyses. Agron. J. 91:317-320.
- Jones, T.A. 1988. A probability method for comparing varieties against checks. Crop Sci. 28:907-912.
- Streit, L.G., W.R. Fehr, and G.A. Welke. 2001. Family and line selection for seed yield of soybean. Crop Sci. 41:358-362.
- St. Martin, S.K. 1985. The application of quantitative genetics theory to plant breeding problems. p. 311-317. In R. Shibles (ed.) World Soybean Res. Conf. III. Westview Press, Boulder, CO.
- Weber, W.E. 1979. Number and size of cross progenies from a constant total number of plants manageable in a breeding program. Euphytica 28:453-456.