Crop Science Journal of Natural Resources and Life Sciences Education
Published online 24 January 2006
Published in Crop Sci 46:456-466 (2006)
© 2006 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

CROP BREEDING, GENETICS & CYTOLOGY

A Gene-Based Model to Simulate Soybean Development and Yield Responses to Environment

C. D. Messina (a,*), J. W. Jones (a), K. J. Boote (b), and C. E. Vallejos (c)

(a) Agric. & Biol. Eng. Dep.
(b) Dep. of Agronomy
(c) Horticultural Sci. Dep., Univ. of Florida, Gainesville, FL 32611

* Corresponding author (Charlie.Messina@pioneer.com)


    ABSTRACT
Translating the potential of agricultural genomics into practical applications requires quantitative predictions of complex traits for different genotypes and environmental conditions. The objective of this study was to develop and test a procedure for quantitative prediction of phenotypes as a function of environment and specific genetic loci in soybean [Glycine max (L.) Merrill]. We combined the ecophysiological model CROPGRO-Soybean with linear models that predict cultivar-specific parameters as functions of E loci. The procedure involved three steps: (i) a field experiment was conducted in Florida in 2001 to obtain phenotypic data for a set of near-isogenic lines (NILs) with known genotypes at six E loci; (ii) we used these data to estimate cultivar-specific parameters for CROPGRO-Soybean, minimizing root mean square error (RMSE) between observed and simulated values; (iii) these parameters were then expressed as linear functions of the (known) E loci. Using the E loci–derived parameters, CROPGRO-Soybean predicted various phenological stages for the same NILs grown in 2002 in Florida with an RMSE of about 5 d. A second evaluation of the approach used phenotypic data from cultivar trials conducted in Illinois. Cultivars were genotyped at the E loci using microsatellites. The model predicted time to maturity in the Illinois variety trials with an RMSE of around 7.5 d; it also explained 75% of the time-to-maturity variance and 54% of the yield variance. Our results suggest that gene-based approaches can effectively use agricultural genomics data for cultivar performance prediction. This technology may have multiple uses in plant breeding.

Abbreviations: cM, centimorgan • DOY, day of year • GC, genetic coefficients • ME, mean error • MEP, mean error of the prediction • NILs, near-isogenic lines • PCR, polymerase chain reaction • PTD, photothermal days • RMSE, root mean square error • RMSEP, root mean square error of prediction • SSR, simple sequence repeat • TD, thermal days


    INTRODUCTION
RECENT ADVANCES in agricultural genomics promise a new era of yield maximization through genetic improvement and optimized crop management (Somerville and Somerville, 1999; Huang et al., 2002). Translating this potential into practical applications will depend on our capacity to make accurate quantitative predictions of agronomic traits for different genotypes under varying environmental conditions. Most agronomic traits are genetically complex (Daniell and Dhingra, 2002; Stuber et al., 2003; Orf et al., 1999; Lark et al., 1995) and often show genotype × environment interactions (Allard and Bradshaw, 1964), making their prediction a challenging task. Dynamic process-oriented crop models such as DSSAT (Jones et al., 2003) and APSIM (Keating et al., 2003) have the potential to link genetic architectures and whole-organism phenotypic expression (Boote et al., 2001; White and Hoogenboom, 2003; Chapman et al., 2003; Cooper et al., 2002).

These crop models incorporate knowledge of environmental and management effects on crop growth and development by dynamically simulating the effects of climate and soils on physiological processes, soil water, and nutrient dynamics. Genotypic variation for morphological and physiological traits is taken into account by introducing cultivar-specific parameters (Hunt and Boote, 1998; Boote et al., 2003). Although these parameters are frequently referred to as "genetic coefficients" (GC), they are usually estimated directly through phenotypic data without considering genetic information. Few such coefficients are directly measured. Rather, their values are estimated using optimization algorithms and large data sets (Hunt et al., 1993; Mavromatis et al., 2001, 2002; Grimm et al., 1993). This limits the applicability of crop models. If instead we could estimate genetic coefficients using genetic information, we could then link the crop's genetic architecture and plant phenotypes through the effects of genes on the regulation of physiological processes and how these respond to the environment. This parameterization scheme could enhance model applicability in crop management optimization, plant breeding and agricultural genomics.

The first attempt to implement this concept was published by White and Hoogenboom (1996) in Genegro, a process-oriented model that incorporated effects of seven genes affecting phenology, growth habit and seed size of common bean (Phaseolus vulgaris L.). Genetic coefficients in Genegro were based on the allelic configuration of a set of loci and a set of linear functions. Genegro accurately predicted bean development but poorly explained yield variations between sites (Hoogenboom et al., 1997). Recent improvements of Genegro included the simulation of the effects of temperature on photoperiod sensitivity regulated by the gene Tip and a new function to predict seed weight (Hoogenboom and White, 2003; Hoogenboom et al., 2004). Similar modeling approaches were used to incorporate quantitative trait loci effects on leaf elongation rate in maize (Zea mays L.) (Reymond et al., 2003); plant height, preflowering duration, carbon partitioning to spike, spike number, and radiation use efficiency in barley (Hordeum vulgare L.) (Yin et al., 2003); and transpiration efficiency in sorghum [Sorghum bicolor (L.) Moench] (Chapman et al., 2003). Collectively, these methods are referred to as top-down methodologies (Hammer et al., 2004).

Photothermal models (e.g., Grimm et al., 1993) that predict time to flowering and flowering duration based on the genetic makeup of E loci were developed for soybean (Stewart et al., 2003; Cober et al., 2001; Upadhyay et al., 1994a; Summerfield et al., 1998). The E loci were shown to regulate time to flowering but also time to maturity: E1 and E2 (Bernard, 1971), E3 (Buzzel, 1971), E4 (Buzzel and Voldeng, 1980), E5 (McBlain and Bernard, 1987), and E7 (Cober and Voldeng, 2001). The dominant alleles lengthen time to flowering and maturity in response to photoperiod (Saindon et al., 1989; McBlain et al., 1987; Cober et al., 1996; Cober and Voldeng, 2001). These loci were mapped with simple sequence repeat (SSR) markers (Cregan et al., 1999; Molnar et al., 2003). Similar studies were conducted in other important agronomic traits (Lark et al., 1994, 1995; Orf et al., 1999; Mansur et al., 1993, 1996). These advances in trait mapping and in the understanding of the genetic basis of agronomic traits suggest the opportunity to develop a gene-based model that predicts growth and development of soybean. The objective of this study was to develop and test a procedure for quantitative prediction of phenotypes as a function of environment and specific genetic loci in soybean.


    MATERIALS AND METHODS
Overview
We developed the procedure for quantitative prediction of phenotypes as a function of environment and specific genetic loci in soybean by combining the ecophysiological framework of CROPGRO-Soybean (Boote et al., 1998) and linear models that predict cultivar-specific parameters in CROPGRO-Soybean as a function of E loci. The procedure involved three steps. First, a field experiment was conducted in Florida in 2001 to generate phenotypic data for a set of near-isogenic lines (NILs) with known genotypes at six E loci (Table 1). Second, we used these data to estimate cultivar-specific parameters for CROPGRO-Soybean (Table 2) by minimizing the RMSE between field observations and simulated values. Third, these parameters were then estimated as linear functions of the E loci and of a variable, NLOCI, that accounts for the total number of dominant alleles at the E loci.


Table 1. List of soybean near-isogenic lines used for model development and for molecular marker evaluation.

 

Table 2. Selected genetic coefficients in CROPGRO-Soybean controlling plant development in soybean, potential associations with E loci, and variables used for parameter estimation.

 
A first evaluation of the procedure was conducted using phenotypic data collected in 2002 in Florida. This data set contains the same phenotypic and genotypic information as the data set used for model development. It is independent, however, since the NILs were grown in a different year.

A second evaluation of the approach used phenotypic data collected in variety trials conducted in Illinois. SSR markers were analyzed to deduce the genotypes at the E loci in a set of cultivars that were grown in Illinois.

Simulation of Soybean Development
In the CROPGRO-Soybean model, the vegetative phase, comprising the time from emergence to flowering, is subdivided into three subphases: (i) juvenile, (ii) inductive, and (iii) flower initiation that ends when the first flower becomes visible (Boote et al., 1998). The duration of the vegetative phase varies between determinate and indeterminate soybeans, and this difference is coded by the time to flowering plus the genetic coefficient FL-VS, which determines the physiological time between flowering and the end of differentiation of nodes in the main stem (Boote et al., 2003). FL-VS generally coincides with the appearance of a seed greater than 3 mm in the upper four nodes in the main stem (R5). The reproductive (R) phase, spanning flowering to physiological maturity, includes (i) the onset of pod addition, (ii) the onset of seed addition, and (iii) the time from the onset of seed addition to physiological maturity. R stages are defined in Fehr and Caviness (1977).

CROPGRO uses a multiplicative function of photoperiod (P) and temperature (T) to model daily relative developmental progress [R(t)] during different growth phases with maximum rate standardized to 1.0 (Grimm et al., 1993, 1994).

\[ R(t) = f(T)\, f(P) \qquad [1] \]

At optimum temperature and photoperiod, the rate of progress in physiological days equals the rate of progress in calendar days. When conditions deviate from the optimum, the rate of development per day decreases, becoming a fraction of a physiological day. The multiplicative model holds for a period beginning after the juvenile phase. Before and during the juvenile phase, the plant is not sensitive to changes in daylength, and development depends only on temperature. Each phase has its own "developmental accumulator" starting at a unique point in time; when it reaches a threshold (genotype characteristic), the phase ends. This nonlinear model has been shown to have high predictive capabilities for soybean (Grimm et al., 1993, 1994; Mavromatis et al., 2001, 2002).
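
The accumulator logic described above can be sketched in Python. This is only an illustration under stated assumptions: the trapezoidal temperature response, the linear short-day photoperiod response, and every numeric default (cardinal temperatures, critical daylength, sensitivity) are hypothetical placeholders, not the functions or values used in CROPGRO-Soybean. The parameter names `csdl` and `ppsen` merely echo the coefficient names CSDL and PPSEN that appear later in the text.

```python
def temp_factor(t_mean, t_base=7.0, t_opt1=28.0, t_opt2=35.0, t_max=45.0):
    """Relative rate (0..1) from mean temperature; hypothetical trapezoid."""
    if t_mean <= t_base or t_mean >= t_max:
        return 0.0
    if t_mean < t_opt1:
        return (t_mean - t_base) / (t_opt1 - t_base)
    if t_mean <= t_opt2:
        return 1.0
    return (t_max - t_mean) / (t_max - t_opt2)


def photoperiod_factor(daylength, csdl=13.0, ppsen=0.3):
    """Relative rate (0..1) for a short-day plant; hypothetical linear
    decline above the critical daylength csdl, with slope ppsen per hour."""
    if daylength <= csdl:
        return 1.0
    return max(0.0, 1.0 - ppsen * (daylength - csdl))


def days_to_complete_phase(weather, threshold, photoperiod_sensitive=True):
    """Calendar days until accumulated physiological days reach threshold.

    weather: iterable of (mean_temperature_C, daylength_h), one per day.
    Returns None if the threshold is not reached within the series.
    """
    accumulated = 0.0
    for day, (t_mean, daylength) in enumerate(weather, start=1):
        rate = temp_factor(t_mean)          # f(T)
        if photoperiod_sensitive:           # juvenile phase ignores daylength
            rate *= photoperiod_factor(daylength)  # f(T) * f(P)
        accumulated += rate
        if accumulated >= threshold:
            return day
    return None


optimal = [(30.0, 12.0)] * 60    # optimum temperature, short days
long_days = [(30.0, 14.0)] * 60  # same temperature, 14 h days
print(days_to_complete_phase(optimal, 20.0))
print(days_to_complete_phase(long_days, 20.0))
```

With these placeholder settings, a 20 physiological-day phase takes 20 calendar days under optimum conditions and 29 calendar days under 14 h days, which illustrates how a daily rate below 1.0 stretches a phase in calendar time.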

Field Experiments
A set of soybean NILs (Table 1) was grown in the field at the University of Florida, Gainesville, FL (29°38' N; 82°22' W), in 2001 and 2002. Soybeans were planted on 22 May and 25 July 2001 and on 23 May and 7 Aug. 2002. Planting was done by hand at a density of 20 seeds m–1 and with a row spacing of 0.56 m. Plots were single rows 3 m long in 2001, and 1.5 m long in 2002. Weeds were controlled by a pre-emergence herbicide application of 1.0 g m–2 of pendimethalin [N-(1-ethylpropyl)-3,4-dimethyl-2,6-dinitrobenzenamine] and by manual control during the growing season. Pests and diseases were controlled by applications of 1.17 mL m–2 of daconil [tetrachloroisophthalonitrile] and 0.07 mL m–2 of permethrin [(3-phenoxyphenyl)methyl 3-(2,2-dichloroethenyl)-2,2-dimethylcyclopropanecarboxylate] as necessary, varying between seasons and planting dates. Fertilizer (14:6:12) was applied in bands at a rate of 16.8 g m–2 when the plants reached the four-leaf stage. Plants were grown under irrigation. The soil of the experimental plots is a loamy, siliceous, hyperthermic Grossarenic Paleudult. Detailed characteristics for this soil are available from DSSAT (Jones et al., 2003). Daily weather data (temperature, rainfall, solar radiation) were measured using an automated weather station 50 m from the experimental plots. Within planting dates and years, NILs were planted in two randomized complete blocks. Time of first visible flower, onset of pod addition, first seed on the upper four nodes (R5), and physiological maturity (R7) were recorded at 2 to 4 d intervals.

Gene-Based Parameterization of CROPGRO-Soybean
A systematic approach was used to estimate the genetic coefficients (Hunt and Boote, 1998; Boote et al., 2003) listed in Table 2 using a procedure similar to Mavromatis et al. (2001). The initial values for the genetic coefficients were those for maturity group 0 (Jones et al., 2003). These values (Table 2) were modified in sequence using selected NILs and planting dates to minimize the root mean square error,

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2} \qquad [2] \]
where n is the number of observations and xi and yi are predicted and observed values for the variables listed in Table 2.
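
As a minimal sketch of this criterion, the Python below computes the RMSE and picks, from a list of candidate coefficient values, the one that minimizes it. The `simulate` callable and the toy "threshold divided by daily rate" simulator are hypothetical stand-ins for a CROPGRO-Soybean run, and the exhaustive search over candidates is an assumption made for illustration; the text does not specify the search strategy.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between paired predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((x - y) ** 2 for x, y in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


def best_coefficient(candidates, simulate, observed):
    """Return the candidate whose simulated output has the lowest RMSE.

    simulate: hypothetical callable mapping a coefficient value to a list
    of simulated values (for example, days to flowering per planting).
    """
    return min(candidates, key=lambda c: rmse(simulate(c), observed))


# Toy stand-in for a crop model: days = threshold / mean daily rate.
daily_rates = [1.0, 0.8, 0.5]
observed_days = [20, 25, 40]


def toy_simulate(threshold):
    return [threshold / r for r in daily_rates]


print(rmse([40, 52, 61], [42, 50, 60]))  # sqrt((4 + 4 + 1) / 3)
print(best_coefficient([18, 19, 20, 21, 22], toy_simulate, observed_days))
```

In the sequential procedure described in the text, a step like this would be repeated for one coefficient (or a small group) at a time, using only the lines and planting dates chosen for that coefficient.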

Soybean NILs carrying recessive alleles for E are impaired in the perception or transduction of the photoperiodic signal. These loss-of-function lines grown under short photoperiod ensure the absence of photoperiodic effects on plant development, allowing us to estimate the thermal components of the photothermal time coefficient. The line L71–920 grown in late plantings (short photoperiod) was used to estimate the thermal component of the photothermal time from emergence to flowering, from flowering to the onset of pod addition, and from flowering to the end of canopy leaf area expansion (FL-LF).

Lines carrying the E alleles grown in early plantings responded to photoperiod. All lines but L71–920 were used to estimate the critical photoperiod and photoperiod sensitivity to predict time to flowering. The coefficient R1PPO, which accounts for increased photoperiod sensitivity after flowering (Piper et al., 1996), was estimated using the date of physiological maturity.

A final step used both planting dates in 2001 and photoperiod-sensitive lines to estimate the photoperiod component of the photothermal times between flowering and last leaf formed in the main stem (FL-VS), between flowering and first seed (FL-SD), and time from first seed to physiological maturity (SD-PM). Photothermal time from flowering to first seed in photoperiod-sensitive lines was estimated as a fraction of the photothermal time FL-VS. We estimated FL-VS through an analysis of observations made for stage R5, which typically coincides with the expansion of the last leaf on the main stem. The average ratio of FL-SD to FL-VS duration is 0.56 for several standard cultivars of different maturity groups in CROPGRO-Soybean (Jones et al., 2003). FL-SD was then estimated as 0.56(FL-VS). Results from reciprocal transplant experiments suggest that the locus E1 is involved in extending the juvenile phase (Upadhyay et al., 1994b). Near-isogenic lines carrying the E1 allele were used to estimate the duration of the juvenile phase.

After obtaining cultivar-specific parameters for CROPGRO-Soybean, using only the data from the 2001 field experiment, we estimated them from E loci information using multiple linear regression. The linear functions used allelic information at the loci E1, E2, E3, E4, E5, and E7. For each locus, a value of 0 or 1 was assigned, depending on whether the allele present in the near-isogenic line was recessive or dominant, respectively. A variable NLOCI was introduced to account for the total number of dominant alleles at the E loci. Stewart et al. (2003) showed a linear relationship between photoperiod sensitivity in time to flowering and the number of dominant alleles at selected E loci.
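
The 0/1 coding and the NLOCI variable can be sketched as follows. The helper names and the numeric intercept and effects are hypothetical and chosen only to show the arithmetic; the fitted associations are the ones reported in Table 5, which is not reproduced here.

```python
E_LOCI = ("E1", "E2", "E3", "E4", "E5", "E7")


def encode_genotype(dominant_loci):
    """Map the set of loci carrying a dominant allele to 0/1 indicators
    plus NLOCI, the total number of dominant alleles."""
    unknown = set(dominant_loci) - set(E_LOCI)
    if unknown:
        raise ValueError(f"unknown loci: {sorted(unknown)}")
    x = {locus: int(locus in dominant_loci) for locus in E_LOCI}
    x["NLOCI"] = sum(x[locus] for locus in E_LOCI)
    return x


def predict_coefficient(x, intercept, effects):
    """Evaluate a linear function of the encoded genotype.

    effects: dict mapping predictor name (a locus or 'NLOCI') to its slope.
    """
    return intercept + sum(slope * x[name] for name, slope in effects.items())


# Hypothetical NIL with dominant alleles at E1 and E3 only.
x = encode_genotype({"E1", "E3"})
print(x)

# Hypothetical linear model for one genetic coefficient (illustrative values).
value = predict_coefficient(x, intercept=12.0, effects={"E1": 0.5, "NLOCI": -0.1})
print(round(value, 3))
```

Because NLOCI is the sum of the six indicators, a regression that included all six loci and NLOCI at once would be perfectly collinear, so each fitted function would presumably use only a subset of these predictors.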

Model Evaluation
Quantitative measurements for model evaluation include RMSE, slope and intercept from the regression between simulated and independently observed values (Hunt and Boote, 1998), and mean error (ME),

\[ \mathrm{ME} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i) \qquad [3] \]
which evaluates model bias.
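
A minimal Python sketch of these evaluation statistics follows. It assumes the mean error is taken as predicted minus observed, so that negative values indicate underprediction, and it leaves the choice of which series goes on which regression axis to the caller, since the text does not state it explicitly.

```python
def mean_error(predicted, observed):
    """Mean of (predicted - observed); negative values mean underprediction."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(x - y for x, y in zip(predicted, observed)) / len(predicted)


def linear_fit(x, y):
    """Ordinary least squares slope and intercept of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


predicted = [40, 52, 61]
observed = [42, 50, 60]
print(mean_error(predicted, observed))
print(linear_fit(observed, predicted))
```

An unbiased model with good sensitivity would give a mean error near zero together with a slope near 1 and an intercept near 0.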

The procedure was evaluated using two datasets. First, phenotypic data from the field experiment conducted in the 2002 season at the University of Florida were used to evaluate the gene-based model developed using data collected during the 2001 season at the same location and for the same NILs.

Second, we used the gene-based model to simulate growth and development of a set of public soybean varieties grown in a variety trial network in Illinois. The trial network consisted of eight locations: Belleville, Urbana, Dekalb, Dixon Springs, Dwight, Monmouth, Carbondale, and Perry, where soybeans were grown between 1995 and 1999. Yield, time to maturity, and crop management data were from the University of Illinois (University of Illinois, 2005). Weather data were from the Midwestern Regional Climate Center (Midwestern Regional Climate Center, 2005). Soil parameters were provided by Dr. T. Mavromatis (T. Mavromatis, personal communication, 2003). Genetic coefficients controlling plant development were estimated as functions of E loci after genotyping (see next section), while the remaining genetic coefficients were from Mavromatis (Table 3) using a calibration procedure described in Mavromatis et al. (2001).


Table 3. Cultivar specific parameters not estimated from genotypic information but required to simulate yield for soybean varieties grown in eight locations in Illinois (1995–1999). Data from Dr. T. Mavromatis (T. Mavromatis, personal communication, 2003).

 
Molecular Marker Analysis
To determine the general applicability of the gene-based model, we performed an independent test to determine how well the model would simulate the Illinois varieties. To run the test, the allelic constitution at the selected E loci had to be determined for these varieties. This procedure was performed in two steps. First, we characterized the En–linked SSR marker loci in the NILs, and then analyzed the same SSR loci in the Illinois varieties to deduce the genotypes at the selected E loci. The deduced E genotypes were used to assign the proper genetic coefficients to each variety.

E1, E2, and E3, along with a number of SSR markers, have been incorporated into the integrated (classical and molecular) soybean linkage map (Cregan et al., 1999). The following flanking markers around E1, E2, and E3 were considered for evaluation and genotyping: Satt557–E1–Satt319; Satt518–E2–Sat_038; and Satt513–Satt229–E3–Satt006 (Cregan et al., 1999; Grant and Shoemaker, 2005). More recently, Abe et al. (2003) mapped E4 5 centimorgans (cM) from Satt496, and Molnar et al. (2003) mapped E4 at locus Satt354, located 11.9 cM from Satt496. Molnar et al. (2003) also mapped E7 at locus Satt319.

Seven NILs were selected to investigate the presence of length polymorphism of SSR at the loci E1, E2, and E3 (Table 1). Plants were grown in a greenhouse for 2 wk. Soybean DNA was isolated from upper node leaves by a modified procedure of Murray and Thompson (1980) as reported in Vallejos et al. (1992). Seven soybean cultivars, Yale, Williams 82, Vinton 81, Savoy, Omaha, Nile, and Linford, were genotyped at E1 to E4 loci. This soybean germplasm is from the GRIN Germplasm-Soybean Collection and was provided by Dr. Randal Nelson. DNA was extracted from 50 mg seed tissue flour in 150 µL TES (Tris 0.1 M, pH 8; EDTA 5 µM; NaCl 50 mM) and 800 µL of 1.25x extraction buffer [125 mM Tris-HCl (pH 7.8); 12.5 mM EDTA-Na (pH 8.0); 1.4 M NaCl; 1.25% CTAB; 0.5% NaSO2]. Samples were incubated for 50 min at 65°C, and then extracted with chloroform–octanol (400 µL). Phases were separated by centrifugation at 13 000 rpm for 15 min. Due to the high concentration of polysaccharides and oils, this step was repeated two to three times. DNA was precipitated in isopropanol (600 µL) for 30 min, incubated for 1 h in a solution containing 76% ethanol and 0.2 M sodium acetate, followed by a 30-s rinse in a solution containing 76% ethanol and 10 mM ammonium acetate, and finally air dried for 1 h. The pellet was resuspended in 800 µL of TE buffer (10 mM Tris-HCl; 1.0 mM EDTA-Na).

Polymerase chain reaction mixes contained 1x PCR buffer [20 mM Tris-HCl (pH 8.4), 50 mM KCl] (Cat. No. 10342–020, Invitrogen, Carlsbad, CA), 1.5 mM MgCl2, 200 µmol of each nucleotide, 0.1 µL of 300 Ci mmol–1 [α-32P]dATP, 0.1 µmol of 3' and 5' end primers, 0.5 unit Taq DNA polymerase, and 30 ng of soybean genomic DNA in a total volume of 20 µL. Thermocycling consisted of a 30 s denaturation at 94°C, a 30 s annealing at 50°C, and a 30 s extension at 72°C for 35 cycles on a Geneamp PCR System 9600 (PE Applied Biosystems, Foster City, CA). The thermocycle had an initial denaturation phase of 2 min at 94°C, and a final extension phase of 5 min at 72°C. Samples were denatured at 72°C in formamide for 5 min and quenched on ice before loading. Polymerase chain reaction products (3 µL) were separated on a DNA vertical gel containing 6% Long Ranger (Cat. No. 50611, Cambrex Bio Science Rockland, Inc., Rockland, ME), 0.5x TBE (0.5 M Tris, 0.445 M boric acid, 10 mM EDTA) and 6 M urea, at 50 W constant power for 90 min. Polymerase chain reaction amplification products were visualized by autoradiography on a Kodak X-OMAT film (Cat. No. 1651512, Kodak, Rochester, NY).


    RESULTS AND DISCUSSION
Parameterization of CROPGRO-Soybean
CROPGRO-Soybean accurately simulated soybean phenology during the 2001 season after calibration. Precision of estimates for the timing of developmental processes varied between 1.4 and 3.2 d according to RMSE calculations (Table 4). These RMSE values are close to the observational error, defined by the interval between field observations (2–4 d). When calibrated GC were used, there were high correlations between observed and simulated values for time to flowering, time to maturity, time to onset of pod addition, and time to last leaf on main stem node for 2001 (Fig. 1, Table 4). However, there was a tendency for the correlation to deteriorate as development progressed toward physiological maturity. This decrease in predictive ability is apparently related to both the propagation of errors as the simulation of later stages progresses and the uncertainty in measuring late stages of development. Comparable observations have been reported by Grimm et al. (1993, 1994). The calibration procedure provided genetic coefficient estimates with little or no bias as reflected in ME, and a slope not significantly different from 1 (Table 4). The model accounted for 88 to 96% of the variance in observed values (Table 4), which is within the range of results reported in other soybean studies (Mavromatis et al., 2001, 2002; Grimm et al., 1994; Elizondo et al., 1994) and in common bean studies (Hoogenboom et al., 1997; White and Hoogenboom, 1996).


Table 4. Comparisons between CROPGRO-Soybean parameterized directly from data and CROPGRO-Soybean parameterized using E loci information using data collected in Gainesville, FL, during 2001.

 

Fig. 1. Relation between observed and predicted (A) time to flowering, (B) time to onset of pod addition, (C) time to last mainstem node (R5), and (D) time to physiological maturity. Soybean development simulated using CROPGRO-Soybean and cultivar-specific parameters calibrated for each near-isogenic line. Data collected in Gainesville, FL, during 2001.

 
Estimation of Genetic Coefficients for CROPGRO-Soybean from E loci
Genetic coefficients were estimated from E loci with different levels of accuracy (Table 5). The proportion of the total variance explained by the linear models varied between 32 and 88%, similar to values reported for Genegro (White and Hoogenboom, 1996). The number of dominant alleles (NLOCI) in a NIL affected the coefficients CSDL, PPSEN, and SD-PM. Because CSDL and PPSEN mediate the influence of photoperiod on rate of development, their relationship with E loci was expected and was consistent with previous models of time to flowering based on E loci (Stewart et al., 2003; Upadhyay et al., 1994a). However, the prediction of SD-PM from E loci had higher RMSE than other phases, indicating that photoperiodic effects indirectly amplify differences in phase duration. Therefore, this study indicates two modes of action of E loci on soybean development. One mode acts by the modulation of the critical photoperiod and photoperiod sensitivity. A second mode regulates the number of physiological days required for phase duration.


Table 5. Associations between E loci and genetic coefficients in CROPGRO-Soybean. Dominant and recessive alleles take values of 1 and 0, respectively. NLOCI denotes the sum of dominant alleles.

 
E1 alone showed major control on PPSEN, EM-FL, V1-JU, and SD-PM and, in interaction with other loci, on PPSEN. This result confirmed the hypothesis that E1 affects the juvenile phase as was inferred from Upadhyay et al. (1994b). It also confirmed previous evidence of epistatic effects between E1 and E3 on the regulation of time to flowering (Upadhyay et al., 1994a). Notably, E1 had a negative effect on physiological days from first seed to maturity, consistent with previous observations indicating that E1 hastened soybean development during the reproductive period (McBlain et al., 1987).

Locus E5 showed an association with CSDL (Table 5). In previous experiments it was shown that E5 has major control over the duration of pod addition (Messina, 2003). This study shows that E5 affects pod addition duration by delaying the transition of the apex from vegetative to reproductive. Longer duration of pod addition would be the result of a longer time to sink development allowing the plant to set more pods (Wardlaw, 1990). A strong interaction was also shown between E3 and E5 in regulating pod number (Messina, 2003). Our results suggest that the interaction between E3 and E5 is via their effects on CSDL. A reduction in CSDL would delay the onset and rate of seed growth under long photoperiods, increasing pod addition duration and seed number.

CROPGRO-Soybean simulated phenology well (RMSE = 2.7–4.0 d) for the 2001 data where cultivar coefficients were estimated from the genetic makeup of the NILs (Table 4). The RMSE increased and the proportion of the explained variance of the observations decreased relative to the calibrated results, which did not use E loci information. This result was expected due to the propagation of errors intrinsic to the linear models used to estimate genetic coefficients, although this characteristic was not observed in the development of Genegro (White and Hoogenboom, 1996).

With the exception of time to flowering, all predictions showed good agreement with observed results, with little deviation from the 1:1 relationship (Table 4). The insensitivity of CROPGRO-Soybean in predicting time to flowering when parameters were estimated from genotypes (b < 1; P < 0.05) was due to one extreme value (50 d); after its removal, the slope of the regression between observed and simulated values was not significantly different from one.

Model Evaluation with Independent Data from 2002
We tested the model's capability to predict crop development using an independent data set collected during 2002. The model accurately predicted reproductive development with low bias. Root mean square error of prediction (RMSEP) ranged between 2.6 and 7.6 d, and mean error of prediction (MEP) varied between –5.9 and 1.1 d (Table 6). These values are slightly higher than the RMSE of the calibration data set (Table 4). Despite the small decrease in model precision relative to the calibration values, these RMSEP are comparable in magnitude with the precision of the measurements (2–4 d) and previous modeling results (Grimm et al., 1993).


Table 6. Evaluation of CROPGRO-Soybean and CROPGRO-Soybean parameterized using E loci information using independent data collected in Gainesville, FL, during 2002.

 
CROPGRO-Soybean, run either with genetic coefficients estimated using data collected during 2001 or estimated from E genotypes (also 2001), showed poor accuracy when predicting physiological maturity. The slope of the regression between observed and simulated values was different from one (P < 0.01) (Table 6). The model consistently underpredicted the time of physiological maturity for long-cycle NILs grown in early plantings. After removing six simulations for early plantings and long life cycle (>105 d), the model predicted physiological maturity without systematic bias (b = 1; α = 0.01). The removed data points had values greater than the upper quartile plus 1.5 times the interquartile range. The simulation error may arise from measurement errors in R7.
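
The screening rule mentioned above (upper quartile plus 1.5 times the interquartile range) can be sketched in Python as follows. The text does not say which quartile convention was used or exactly which variable was screened, so the default method of `statistics.quantiles` and the toy data are assumptions; other conventions would shift the fence slightly.

```python
import statistics


def upper_fence(values):
    """Upper quartile plus 1.5 times the interquartile range."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)


def split_outliers(values):
    """Return (kept, removed), where removed values exceed the upper fence."""
    fence = upper_fence(values)
    kept = [v for v in values if v <= fence]
    removed = [v for v in values if v > fence]
    return kept, removed


# Toy values standing in for the screened variable (not data from the study).
toy = [1, 2, 3, 4, 5, 6, 7, 8, 100]
kept, removed = split_outliers(toy)
print(removed)
```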

SSR Length Polymorphisms Linked to E Loci and Cultivar Genotyping
Polymorphisms were detected at E-linked SSR marker loci in the NILs (Fig. 2). Results obtained from this study support the proposed location of the E loci on the molecular map and were used to genotype soybean varieties (Fig. 2). Because of the linkage between the SSR and E loci positions, we can infer with varying degrees of certainty the presence of the dominant allele in each public soybean cultivar. Due to the close linkage between Satt557 and E1, the expected uncertainty in determining the presence of the dominant allele is on the order of 1%, since Satt557 lies about 1.2 cM from E1 (Cregan et al., 1999; Abe et al., 2003). The uncertainty increases for E4, which was mapped 5.0 cM from Satt496 (Abe et al., 2003), and it is highest for E2, as the distances between E2 and Satt518 and Sat_038 were estimated as 17.2 and 18.3 cM, respectively. Locus E3 was located within a bracket of 14 cM between Satt006 and Satt513. However, within this bracket only Satt229 showed length polymorphism between the near-isogenic lines.
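
To make the link between map distance and genotyping uncertainty concrete, the sketch below converts the distances quoted above into recombination fractions with Haldane's mapping function. Using Haldane's function is an assumption made here for illustration; the text does not state which mapping function, if any, underlies the quoted uncertainty.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Recombination fraction for a map distance in centimorgans (Haldane)."""
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


# Marker-to-locus distances (cM) as quoted in the text.
distances = {
    "E1 / Satt557": 1.2,
    "E4 / Satt496": 5.0,
    "E2 / Satt518": 17.2,
    "E2 / Sat_038": 18.3,
}

for pair, cm in distances.items():
    r = haldane_recombination_fraction(cm)
    print(f"{pair}: {cm} cM -> recombination fraction {r:.3f}")
```

Under this assumption the fractions come out at roughly 0.012, 0.048, 0.146, and 0.153, which agrees with the "on the order of 1%" figure for E1 and shows why the E2 call is the least certain.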


Fig. 2. Electrophoretic separation of SSR markers linked to four E loci. Selected soybean cultivars from Illinois are depicted in panels A–E, and near-isogenic lines with ‘Clark’ background in panels F–H. (A and E) E1-linked Satt557; (B, C, and G) E2-linked Satt581 and Satt038; (D and H) E3-linked Satt229; and (E) E4-linked Satt496.

 
Given the uncertainty in the determination of each allele at a given locus, we inferred from the PCR fragment sizes (Fig. 2) the E loci makeup for each cultivar as follows: Linford, e1E2E3E4; Nile, e1E2E3E4; Omaha, e1E2E3E4; Savoy, e1e2e3e4; Vinton 81, e1e2E3e4; Williams 82, e1E2E3E4; Yale, e1E2E3E4. At locus Satt496 the PCR fragment sizes differ from previous reports (Abe et al., 2003). E4 alleles were determined by comparing the fragment size at the locus Satt496 relative to the NIL for Clark carrying the dominant allele E4. We assume that deviations from this size indicate the absence of E4, hence the cultivar might have the e4 genotype.
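The genotype strings above can be turned into per-locus indicators for use in linear models of cultivar parameters. The sketch below is one possible encoding, with 1 for the dominant allele and 0 for the recessive allele; this coding and the function name `parse_genotype` are assumptions made for illustration, not a description of the authors' software. Loci E5 and E7 are left out because they were inferred separately.

```python
import re

# Genotypes as inferred in the text from PCR fragment sizes (Fig. 2).
CULTIVAR_GENOTYPES = {
    "Linford": "e1E2E3E4",
    "Nile": "e1E2E3E4",
    "Omaha": "e1E2E3E4",
    "Savoy": "e1e2e3e4",
    "Vinton 81": "e1e2E3e4",
    "Williams 82": "e1E2E3E4",
    "Yale": "e1E2E3E4",
}

def parse_genotype(genotype):
    """Map e.g. 'e1E2E3E4' to {'E1': 0, 'E2': 1, 'E3': 1, 'E4': 1}."""
    indicators = {}
    for letter, number in re.findall(r"([Ee])(\d)", genotype):
        # Upper-case 'E' marks the dominant allele (assumed coding: 1).
        indicators[f"E{number}"] = 1 if letter == "E" else 0
    return indicators

indicators = {name: parse_genotype(g) for name, g in CULTIVAR_GENOTYPES.items()}
dominant_counts = {name: sum(v.values()) for name, v in indicators.items()}

for name, count in dominant_counts.items():
    print(f"{name}: {count} dominant allele(s) among E1-E4")
```

With this encoding Savoy is the only cultivar with no dominant alleles among E1 to E4.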

All cultivars had the recessive allele at the marker locus Satt557; therefore, their genotype is e1 and probably e7, since these loci are tightly linked (Cober and Voldeng, 2001). The cultivar Vinton 81, however, has gray pod pubescence. The alternative allele, tawny color, was found to be associated with earlier maturity (Cober and Voldeng, 2001). The locus T controls this trait and is tightly linked to E1 (1.4 cM) and E7 (4.0 cM). Fragment size for Satt557 suggests that the genotype is e1, hence e7. However, from the color of the pubescence we can infer the genotype as being E7. Finally, due to the lack of markers for E5 and based on the maturity groups, we assumed that all cultivars carried the e5 allele at this locus.

Predicting Soybean Yield and Maturity in Variety Trials
We used E loci–linked SSR markers to deduce the genotypes of a set of cultivars for which data were already available from variety trials performed at eight locations in the state of Illinois over a period of 5 yr. The use of variety trials for model evaluation added realism to the test but restricted the genotypic diversity. Nonetheless, the dataset included a wide range of maturity dates and yield variation. The model was able to predict maturity, which varied from about 240 to 275 d of the year, and yields, which varied from 1.5 to 5.0 Mg ha–1 (Fig. 3). The model explained 75 and 54% of the observed variance in maturity date and yield, respectively. These values are within the lower range of values obtained in previous modeling studies predicting yield and development in variety trials (Mavromatis et al., 2001, 2002; Hoogenboom et al., 1997). Prediction errors are also similar to those found in predicting leaf expansion in maize (Reymond et al., 2003) and phenology and yield in beans (Hoogenboom et al., 2004).


Fig. 3. Simulated and observed time to maturity (day of the year) and yield (kg ha–1) of seven soybean public varieties grown at eight locations and 5 yr in Illinois (1995–1999). Regression equations for time to maturity, y = –50.7 (± 9.7) + 0.8x (± 0.04), R2 = 0.75; and yield, y = 689 (± 192) + 0.77x (± 0.07), R2 = 0.54. Time to maturity and yield RMSE were 5.2 (days) and 393 (kg ha–1), respectively. Yield RMSE is 12.3% of the average observed yields.

 
We can identify some causes contributing to the slightly higher prediction errors in our study. This study used genetic coefficients calculated from E loci genotypes. Even though the model based on six loci accounted for as much as 75% of the variance in maturity date, which demonstrates the importance of these loci on the regulation of soybean development, other loci involved in the regulation of soybean development and yield were not included in our model (Mansur et al., 1993, 1996; Orf et al., 1999; Tasma and Shoemaker, 2003; Tasma et al., 2001).

The selection of terms in multiple regression linear models was an iterative process. Errors could arise from the choice of variables that defined the statistical model (Pinheiro and Bates, 2000) and from the estimation of parameters.

Bias and systematic errors in the prediction of time to maturity suggest that these patterns may be due to errors in predicting certain genotypes, locations, or years. Table 7 shows that CROPGRO-Soybean significantly underestimated time to maturity for Savoy, the variety with the shortest predicted life cycle, even though it is classified as maturity group II. This suggests that there is an error in the estimated genotype. Alternatively, other loci may regulate development in Savoy.


Table 7. Prediction errors in time to maturity and yield for soybean public varieties grown in eight locations in Illinois (1995–1999).

 
We determined soybean genotypes based on the linkage between SSR markers and the E loci. Recombination between loci can occur, and there is a risk of inferring the absence of the dominant allele when it is present. If this had been the case for any of the marker loci, the genotype for Savoy would have been more sensitive to photoperiod, and hence the model would not have underestimated the time to maturity and yield (Table 7). In addition, because soybean cultivars were not derived strictly from the Clark genetic background, alleles could have been mistakenly diagnosed, as the sequence surrounding the actual gene could vary among varieties. This is noticeable for Savoy, whose alleles at Satt496 differed from those of the other varieties and the NILs. In contrast, at locus Satt557 all varieties had the Clark allele, and at Satt229 all cultivars except Savoy carried the same allele as Clark (Fig. 2). The prediction error for Savoy illustrates important limitations of the method. However, recent advances in the identification and development of single nucleotide polymorphisms in soybean (Zhu et al., 2003) will help reduce the uncertainties associated with cultivar genotyping and should increase the prediction skill of the model.

The statistics ME and RMSE calculated for time to maturity for all soybean cultivars except Savoy compare well with previous results (Table 4; Table 6; Mavromatis et al., 2001, 2002). Errors in the simulation of time to maturity for cultivar Savoy led to errors in the simulation of yield, which was underestimated (ME = –390 kg ha–1). Simulated yield RMSE for all cultivars except Savoy and Vinton 81 varied from 9 to 15%, and ME varied from –3 to 5% of average yields. It should be noted that yield predictions used a subset of genetic coefficients characterizing growth parameters that were not estimated from E loci but fitted by Mavromatis (T. Mavromatis, personal communication, 2003; Table 3). Thus, yield predictions (Fig. 3) are not totally independent of the observed yields. Model skill in predicting time to maturity and yield is considered acceptable for applications of crop models in agricultural production.
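For readers who want to reproduce this kind of evaluation, the statistics ME and RMSE, and RMSE as a percentage of average observed yield, can be computed as sketched below. The sign convention (simulated minus observed) is an assumption chosen so that underestimation gives a negative ME, in line with the negative ME reported for Savoy; the function names and the sample values are hypothetical and are not data from the study.

```python
import math

def mean_error(observed, simulated):
    """Mean error, simulated minus observed (sign convention assumed)."""
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)

def rmse(observed, simulated):
    """Root mean square error between simulated and observed values."""
    squared = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))

def relative_rmse_percent(observed, simulated):
    """RMSE expressed as a percentage of the mean observed value."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, simulated) / mean_observed

# Hypothetical yields (kg/ha); NOT data from the variety trials.
observed = [3000.0, 3500.0, 2500.0, 4000.0]
simulated = [2800.0, 3600.0, 2900.0, 3700.0]

print("ME:", mean_error(observed, simulated))
print("RMSE:", round(rmse(observed, simulated), 1))
print("RMSE (% of mean):", round(relative_rmse_percent(observed, simulated), 1))
```

A mean error near zero together with a sizeable RMSE, as in this made-up sample, indicates errors that cancel on average rather than a systematic bias.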


    CONCLUSIONS
We developed a gene-based model for soybean, based on CROPGRO-Soybean, which contributes toward linking a crop's genetic architecture and whole-organism phenotypic expression. The model accurately predicted time to flowering and post-flowering development phases and yield. This is a significant advance over previous models, which predicted only time to flowering for soybean from E loci information. More importantly, we tested a gene-based model for its ability to reproduce yields and development at the field scale, under environmental conditions and with genetic materials different from the ones used for model development.

The prediction skill shown by CROPGRO-Soybean linked to E loci was comparable with that of Genegro for dry bean, demonstrating that the potential for implementing gene-based approaches is not limited to a particular species. However, prediction skill can vary with the level of ploidy, genetic background, and zygosity of the organism. Our results also demonstrate the ability of gene-based models implemented on the basis of a top-down methodology to predict overall system behavior. The prediction errors using a gene-based approach are comparable with those using conventional parameter estimation. From a model application perspective, gene-based approaches can help reduce the requirements for expensive and intensive experimentation to provide up-to-date genetic coefficients.

Failure to simulate yield for the Savoy and Vinton 81 cultivars shows that there is potential for improvement, and thus for reducing the uncertainties and errors involved in the development and implementation of gene-based models. Genotyping for Savoy indicated the absence of dominant alleles at the E loci. This is a large difference when compared with the other cultivars, and it is inconsistent with its maturity group. Major gains in predictability depend on the development of allele-specific markers and high-density molecular maps. Advances in agricultural genomics will help the identification and incorporation of novel genes regulating important physiological processes, expanding the capabilities of current gene-based models. Increasing the density of markers around E and other loci will reduce uncertainties during genotyping, thus improving the simulation of yield. Improved gene-based models could enhance the simulation of breeding programs.


    ACKNOWLEDGMENTS
 
The authors are grateful for the valuable comments received from three anonymous reviewers. Special thanks to Jeffrey W. White, Rafael A. Ferreyra, and Associate Editor Elroy Cober for their valuable suggestions during the preparation of the manuscript. Near isogenic lines were provided by Dr. Bernard. Thanks to Dr. T. Mavromatis for providing parameters to run CROPGRO.


    NOTES
Florida Agricultural Experiment Station, Journal Series No. R-11017.

Received for publication June 18, 2004.


    REFERENCES




