a Dep. of Agronomy, Agric. College, GyeongSang National Univ., 900 Gaza-Dong, Chinju City, Geyong-Nam, South Korea, 660-701
b Botany Dep., Iowa State Univ., Ames, IA 50011
c Dep. of Agronomy & Horticulture, Univ. of Nebraska, Lincoln, NE 68583-0915
d USDA, ARS, Bldg. 006, Room 100, BARC-West, 10300 Baltimore Ave., Beltsville, MD
e USDA, ARS, Corn Insect and Crop Genetics Research Unit, Dep. of Agronomy, Iowa State Univ., Ames, IA 50011
* Corresponding author (jspecht1{at}unl.edu)
| ABSTRACT |
|---|
Abbreviations: AFLP(s), amplified fragment length polymorphism(s); BC, backcross; BG, background; bp, base pair; CAPS, cleaved amplified polymorphic sequence; CIM, composite interval mapping; cM, centimorgan; DP, donor parent; LG, linkage group; LOD, base-10 logarithm of the odds ratio (i.e., likelihood of proposed model versus null model); LR, linear regression; MG, maturity group; PI, plant introduction; QTL(s), quantitative trait locus (loci); RAPD(s), random amplified polymorphic DNA(s); RFLP(s), restriction fragment length polymorphism(s); RIL(s), recombinant inbred line(s); RP, recurrent parent; SIM, simple interval mapping; SSR(s), simple sequence repeat(s).
| INTRODUCTION |
|---|
Soybean protein and oil contents in various regions of the USA can deviate significantly from the foregoing national averages. Some of that geographic variability arises from meteorological events that, in any given season, randomly affect some regions but not others. For example, high temperatures during soybean seed development can elevate seed oil (Howell and Cartter, 1958), while severe drought can depress seed protein (Specht et al., 2001). Other geographic variability arises from region-specific climatic parameters. Notably, seed protein is typically lower in the northwestern than in the southeastern soybean-growing states. In 2001, that regional difference spanned 3 percentage points, the largest ever observed in 17 yr of survey data (Hurburgh, 2001). Processors may not be able to derive the valuable 48% soybean meal if the soybean seed has too low a protein content. To offset this geographical disadvantage, breeders developing high yielding cultivars adapted to northern and western production regions must practice coordinate selection for intrinsically greater seed protein.
Among the accessions in the USDA soybean germplasm collection (NGRP, 2001), seed protein ranges from 347 to 552 g kg-1 on a dry seed basis (µ = 435 g kg-1; n = 11779), while seed oil ranges from 65 to 287 g kg-1 (µ = 184 g kg-1; n = 11775). Such wide ranges would suggest that there is adequate genetic variability for breeders to develop cultivars high in both protein and oil. However, only 1750 (about 15%) of the 11775 germplasm accessions have a protein and an oil content at or above their respective population means. Just 78 (0.66%) qualify if the criterion is ≥1.05x both population means, and only one accession qualifies at a ≥1.10x criterion. Even among the multientry trials conducted each year in soybean-producing states, it is rare to find any entry whose seed protein and oil content exceeds both respective constituent trial means (Thompson et al., 2001). Highly negative phenotypic and genotypic correlations between protein and oil are well documented in the soybean genetics and breeding literature (Johnson et al., 1955; Hanson et al., 1961; Hartwig and Hinson, 1972; Shannon et al., 1972; Brim, 1973; Brim and Burton, 1979; Sebern and Lambert, 1984; Burton, 1987; Wehrmann et al., 1987; Wilcox and Cavins, 1995; Wilcox, 1998; Cober and Voldeng, 2000).
Seed protein and oil heritabilities are high, especially if the parental differences are extreme (Brim, 1973; Burton, 1987). Selection to enhance either seed constituent is usually successful. Population means and midparent values for protein and oil tend not to be significantly different (Thorne and Fehr, 1970), suggesting an inheritance governed primarily by additive, rather than dominance, genetic effects. Single cross, backcross, and recurrent selection approaches have been used successfully to improve seed protein content (Hartwig and Hinson, 1972; Shannon et al., 1972; Brim and Burton, 1979; Sebern and Lambert, 1984; Wehrmann et al., 1987; Hartwig and Kilen, 1991; Wilcox and Guodong, 1997; Helms and Orf, 1998; Cober and Voldeng, 2000).
In many studies, only a few genes or perhaps even one major gene seemed to govern protein content. Consider, for example, the data of Wilcox and Cavins (1995), who mated the recurrent parent (RP) cultivar Cutler 71 (408 g protein kg-1, dry weight basis) to the donor parent (DP) accession Pando (498 g kg-1). From that mating, the F4-derived line with the highest seed protein content was selected and backcrossed to the RP, followed by n = 2 cycles of again selecting the highest protein BCnF4-derived line to backcross to the RP. The F4-derived lines selected from the RP x DP, BC1, and BC2 progenies for mating to the RP (i.e., to create the BC1, BC2, and BC3) had near-similar protein contents of 474, 476, and 466 g kg-1, respectively. The high protein allele(s) present in the initial (RP x DP) F4-derived line were apparently recaptured after each BC by simply selecting the highest protein BCnF4-derived line and using it for the next backcross. The oil content (175 g kg-1) of the F4-derived line selected from the RP (204 g kg-1) x DP (148 g kg-1) mating was also nearly identical to the 174 g kg-1 oil content of the final BC3F4 selection. Evidently, low oil allele(s) were cotransmitted with high protein allele(s) between BC cycles by simply advancing the highest protein F4 line to the next backcross. In fact, in each successive backcross progeny, the linear coefficient of regression of protein on oil became more negative and stronger (i.e., a greater R²). Similar results were obtained when Wilcox (1998) subsequently intermated Pando with three cultivars possessing genetic male-sterility, creating a base population for initiating the first of eight cycles of recurrent selection for higher protein. The allele(s) for high protein were mostly fixed by cycle five, such that 66% of the plants in the next cycle produced seed with >479 g kg-1 protein, with no significant improvement in later cycles.
Again, the inverse relationship between protein and oil steepened and strengthened with each selection cycle. A strongly coordinate high protein and low oil response to selection for just higher protein suggests preferential fixation of either (i) high protein allele(s) that were pleiotropic for low oil or (ii) high protein allele(s) linked in repulsion phase with low oil allele(s). If the latter, then such linkage(s) had to be tight enough to preclude the generation of coupling-phased recombinants in populations of the size used in the above studies, otherwise the fixation of any such recombinant would have certainly weakened, if not reversed, the negative correlation between protein and oil.
During the last decade, several research groups, using various population types and different marker-based mapping techniques, have identified many QTLs governing soybean seed protein, oil, and/or yield. Diers et al. (1992) measured the protein and oil content of seed of 60 F2:3 lines derived from the mating of a high yielding G. max breeding line (A81-356022) with a G. soja Siebold & Zucc. accession (PI 468916), then genotyped those 60 lines with 252 marker loci that mapped to 31 linkage groups (LGs). Restriction fragment length polymorphism (RFLP) markers in LG-I (i.e., A144, A407, A688, and K011) and in LG-E (A053, A242, and SAC7) identified a G. soja segment in each LG that increased seed protein but decreased seed oil. Sebolt et al. (2000) attempted to confirm these associations by identifying an F2-derived line homozygous for the G. soja segments in LG-I and LG-E, and then backcrossing it to the RP (A81-356022). The RFLP marker A144 on LG-I and the classical marker Pb on LG-E were used for a marker-assisted introgression of the respective G. soja genomic segments into 53 BC3F4-derived lines that were then tested for protein and oil content in multiple environments. The LG-I G. soja segment conditioning high protein but low oil was successfully introgressed, but the LG-E segment was not. Lines with the introgressed LG-I segment exhibited earlier maturity, taller plants, smaller seed, and lower yields. A BC3F4-derived line homozygous for the LG-I G. soja segment was then mated to the cultivars Parker, Kenwood, and C1914, the latter a high protein breeding line of Pando origin. Segregation of the LG-I G. soja segment was readily detectable in the Parker and Kenwood F2 populations (i.e., 10 g more protein but 4.5 g less oil per kg of seed), but not in the C1914 F2 populations. That result led Sebolt et al. (2000) to suggest that the LG-I high protein allele of the G. soja accession (PI 468916) was probably allelic with the high protein allele of C1914 (and by inference, Pando).
The LG-I protein (or oil) QTL has since been detected by others. Brummer et al. (1997) examined eight different populations of F2-derived lines and identified associations of soybean protein and/or oil content with RFLPs of various linkage groups. One of the strongest associations was with RFLPs A407 and A144 in a population derived from the mating of a breeding line (M82-806) with a high protein line (HHP; 25% G. soja by pedigree). Mansur et al. (1993), using an F5 population from a Minsoy x Noir 1 mating, reported a protein QTL associated with RFLP L048. No association of any LG-I marker with protein was detected in a recombinant inbred line (RIL) population of the same mating (Mansur et al., 1996), although Specht et al. (2001) did report an oil QTL between LG-I SSRs Satt127 and Satt239. Csanádi et al. (2001) detected an LG-I QTL for oil (and for protein in some tests) in a mating of the early-maturing cultivars Proto (462 g kg-1 protein) and Ma Belle (390 g kg-1 protein). Lee et al. (1996), Orf et al. (1999), and Qiu et al. (1999) did not detect an LG-I QTL for protein or oil, but protein variation in their populations was small. Statistical parameters for the protein and oil QTLs detected by each of the foregoing researchers were recently summarized (see Appendix Table I of Olsen, 2001). Map positions of the reported protein and oil QTLs are also available on-line (SoyBase, 2002).
We report here the results of a QTL analysis of seed protein, oil, and yield in a population of 76 F5-derived RILs from a high protein G. max germplasm accession PI 437088A mated to a high yielding G. max cultivar Asgrow A3733. Surprisingly, the RILs segregated for the same LG-I protein QTL that has been detected in other G. soja and G. max germplasm. Because this LG-I QTL seems to be more ubiquitous in high protein germplasm than originally surmised, our objective in this research was to delineate more precisely the map position of this LG-I QTL by using more markers than had been used in prior studies. Our second objective was to obtain more precise estimates of the direction and magnitude of the parental allele effects on RIL seed protein, oil, and yield, and to interpret these allelic effects with respect to the observed genotypic-level correlations among those three traits.
| MATERIALS AND METHODS |
|---|
PI 437088A was mated as a male to Asgrow A3733 in the summer of 1992. Five F1 plants were grown in the greenhouse during the winter of 1992-93. The F2 generation was grown in the summer of 1993. The F3 and F4 generations were grown at the USDA Tropical Agriculture Research Station at Isabela, Puerto Rico, during the November to February and February to May seasons of 1993-1994, and advanced from the F2 to F5 by single-seed descent. The F5 generation was grown at Lincoln, NE, during the summer of 1994. About 100 random F5-derived F6 (i.e., F5:6) progenies were grown at Lincoln, NE, in 1995 for a seed increase, but only 76 generated sufficient seed for inclusion in the subsequent field trials.
Quantitative Trait Measurement and Analysis
Performance trials of the 76 RILs were conducted in the F5:7 (1996) and F5:8 (1997) generations at the Agricultural Research and Development Center near Mead, NE. The soil at the test site is a Sharpsburg silty clay loam (fine, smectitic, mesic Typic Argiudoll). Each year, the experimental design was a randomized complete block with two replicates. The treatment design was a split-plot. The main plots were six seasonal water amounts cumulatively generated by weekly sprinkler irrigation that replaced 0, 20, 40, 60, 80, or 100% of the evapotranspiratory water loss (adjusted for rainfall) since the prior irrigation, a procedure described by Specht et al. (1986, 2001). The 80 subplots consisted of 76 RIL entries plus two entries of each of the two parents. Each subplot comprised two 3.05-m plant rows spaced 0.76 m apart. The field trials were planted on 30 May 1996 and 13 May 1997 at a seeding rate of 370000 viable seed ha-1 and a sowing depth of 3 cm.
The two-row subplots were harvested with a self-propelled plot combine to estimate subplot seed yield (kg ha-1). Two 100-seed samples taken from each subplot were used to estimate 100-seed weight (g). Seed yield and 100-seed weight data were adjusted to the standard 130 g kg-1 seed moisture content. A 75-g seed sample randomly drawn from each subplot was assayed with the standard near-infrared transmittance technique to quantify seed protein and oil content (g kg-1) on a zero percentage moisture basis (Panford, 1987). Maturity (d from planting), plant height (cm from ground to stem tip), and plant lodging (visually scored on a scale of 1 = erect to 5 = prostrate) data were collected before harvest, but only on the 0 and 100% irrigated subplots.
Trait data were subjected to an analysis of variance and covariance by means of the PROC GLM module (with a MANOVA statement) of version 8 of PC SAS for Windows (SAS, 1999). Years, water levels, genotypes, and replicates were treated as random effects. Heritability was estimated on an entry mean basis. The associated confidence intervals were computed as described by Knapp et al. (1985). Genotypic correlations between traits were estimated as described by Mode and Robinson (1959).
Leaf Collection and DNA Isolation Procedures
In early July of 1996, young leaf tissue was collected from 25+ plants of each RIL (and each parental) plot of the first replicate, transported to the laboratory on dry ice, frozen, lyophilized, and then ground to a fine powder before being stored at -20°C. DNA was extracted by means of a protocol modified from that described by Saghai-Maroof et al. (1984). After extraction of DNA and its resuspension in TE, the DNA samples were diluted to 20 ng µL-1 for marker analysis.
Molecular Marker Types and Assay Protocols
The initial genomic characterization of the parents and RILs was performed by means of RAPD markers and a polymerase chain reaction (PCR) protocol (Williams et al., 1990). Each PCR reaction was performed in 10 µL containing 50 mM Tris (pH 8.5), 2 mM MgCl2, 20 mM KCl, 0.5 mg mL-1 bovine serum albumin (BSA), 2.5% (w/v) Ficoll 400, 0.02% (w/v) xylene cyanol, 4 ng µL-1 template DNA, 0.4 µM primer, 100 µM of each dATP, dCTP, dGTP, and dTTP, and 0.6 unit Taq polymerase. To prevent evaporation during PCR, the reaction mixture was covered with a mineral oil drop. PCR was conducted in a thermocycler (Model PT-100, MJ Research, Watertown, MA) programmed to this three-step cycling protocol: (i) two cycles in which the reaction mix tubes were heated to 91.0°C for 75 s to allow DNA denaturation, cooled to 42.0°C for 22 s to allow primers to bind, then reheated to 72.0°C for 70 s to allow DNA synthesis by the polymerase; (ii) 38 cycles of 16 s at 91.0°C, 2 s at 42.0°C, and 70 s at 72°C; (iii) one final cycle in which the extension step lasted 4 min at 72.0°C, before the reaction mixes were cooled to 4°C. The amplification products were loaded on 1.2% (w/v) agarose gels (Amresco, 3:1 agarose, Solon, OH) for electrophoretic separation at 74 V for 4 h. Ethidium bromide staining of the separated amplicons allowed their visualization under UV light. All scored amplicons were well within the 2176- to 154-base pair (bp) sizes of largest and smallest markers of the molecular weight standard VI (Boehringer Mannheim, Indianapolis, IN).
About 1000 10-bp primers differing in sequence were obtained from Operon Technologies (Alameda, CA) to screen the parents for amplicon presence (+)/absence (-), but only 370 of the primers produced an amplicon bimorphism. These primers were used to assay 20 random RILs to determine if the parental +/- amplicon difference actually segregated. About 340 amplicons met this test and were then examined in all 76 RILs. Eleven amplicons whose +/- segregation in the initial screen was not reproduced with complete fidelity in the final screen were dropped. The remaining 329 amplicons were then assigned a marker name consisting of the Operon product code for the primer (e.g., OP_A01), but appended with a small-case letter suffix if more than one +/- amplicon was produced by the primer (e.g., OP_A01a, OP_A01b, etc.).
Codominant SSR markers were not publicly available in sufficient numbers until late 1998 (Cregan et al., 1999). Thereafter, the two parents were screened with about 250 SSR primers. Only 103 of these were parentally polymorphic, but they were distributed over the 20 linkage groups. The SSR-PCR amplification protocols were similar to those used by Akkaya et al. (1992) and Rongwen et al. (1995). The PCR reaction mix contained 50 ng of genomic DNA (parent or RIL), 0.1 µM of each of the paired primers, 5x reaction buffer (250 mM Tris), 0.6 unit DNA polymerase, and 2.5 µM of each of the four dNTPs. Each sample was subjected to 31 cycles of denaturing (25 s at 94°C), annealing (25 s at 47°C), and extension (25 s at 68°C) in a thermocycler, followed by a final extension step (3 min at 72°C) and incubation at 4°C. For 74 of the 103 SSRs, electrophoretic separation of their 32P-labeled amplicons on polyacrylamide gels was contracted to Biogenetics Services Inc. (Brookings, SD). Parental amplicons of the other 29 SSRs were sufficiently different in base pair size to permit their electrophoretic separation on high-resolution 5% (w/v) agarose gels (SFR Agarose, Amresco, Solon, OH) at 74 V for 4 h, followed by visualization via ethidium bromide staining. The SSR amplicons fell within the 2642- to 50-bp size range of the markers in molecular weight standard XIII (Boehringer Mannheim). The 20- to 24-bp sequences of the 103 SSR primer pairs are available on-line (SoyBase, 2002).
Classical Markers
The 76 RILs segregated for the black/brown seed hilum color phenotypes (on a yellow seed coat) governed by the R/r locus on linkage group K (LG-K), and for the brown/tan pod color phenotypes governed by the L2/l2 locus on LG-N (Shoemaker and Specht, 1995). The genotypes of the PI 437088A and Asgrow A3733 parents were thus rrl2l2 and RRL2L2, respectively.
Cysteine Protease CAPS Marker
Endoproteolytic cleavage has been associated with the accumulation of soybean storage proteins (Herman and Larkins, 1999). However, the two parents were not polymorphic for the endopeptidase isozyme locus (Enp) located on LG-I. A search of GenBank (NCBI, 2001) indicated that only one soybean endopeptidase had been cloned to date: a cysteine protease gene (GenBank Acc. No. D28876). To map this gene, a cleaved amplified polymorphic sequence (CAPS) marker was constructed. An oligonucleotide primer pair was synthesized to amplify a unique portion of the published cDNA sequence. When parental PCR products were tested with a battery of restriction enzymes, a HinfI restriction site was detected in the Asgrow A3733 amplicon but not in the PI 437088A amplicon. The 888-bp uncleaved HinfI fragment from PI 437088A was subcloned and sequenced to construct the following forward and reverse primers:
PIHNF1: CAACTTACAAGATGCTCCG
PIHNF2: ATCATCAACCACCCTGAG
This primer pair amplified an 844-bp fragment in each parent. On treatment with HinfI, the Asgrow A3733 amplicon was cleaved into two fragments (580 and 264 bp), whereas the PI 437088A amplicon remained intact. The PIHNF-generated amplicons from each of the 76 RILs, after HinfI treatment, were loaded on 1.2% agarose gels (Amresco, 3:1 agarose, Solon, OH), electrophoresed at 74 V for 4 h, then stained with ethidium bromide for visualization under UV light. This codominant CAPS marker was assigned the marker name CystProt.
Mature Seed Protein Marker
Because the two parents differed substantially in seed protein content, a gel-based search for qualitative protein differences in mature parental seed was conducted. A 50-seed sample, taken from pure RIL and parent seed stock produced in a common environment, was ground into a fine powder that was then stored at -20°C. A sample of soybean flour (i.e., a microcentrifuge tube filled with flour to the 0.5-µL line) was placed into a mortar (on ice) to which 1 mL of cold extraction buffer was added. The extraction buffer consisted of 50 mM Tris (pH 7.5), 10 mM EDTA (pH 8.0), 5 mM dithiothreitol (DTT), 1% (w/v) insoluble polyvinylpolypyrrolidone (PVPP), and 0.5 mM phenylmethylsulfonyl fluoride (PMSF). The flour and buffer were mixed in the mortar with the pestle until a thick slurry formed, which was then poured into a 1.5-mL microcentrifuge tube and spun in a centrifuge at 16000 g for 10 min at 4°C. The supernatant was removed and its protein concentration was determined (Bio-Rad DC Protein Assay, Hercules, CA). The protein samples were then diluted to 3 µg µL-1, mixed with the SDS-PAGE loading buffer, denatured by boiling for 5 min, then cooled. Sample aliquots of 30 µL were loaded into individual wells of a 12% polyacrylamide resolving gel (Bio-Rad, Hercules, CA), with 1x TG buffer in the upper chamber and 0.5x TG buffer in the bottom chamber. Electrophoresis was conducted at 25 mA for about 4 h, when the dye front moved off the gel edge. Each parent exhibited a unique protein band the other lacked, but the two bands segregated in the RILs as codominant alleles at a single locus that was assigned the marker name MatSdPro.
Linkage Map Construction
The observed allelic segregation ratios at each of the 436 loci (329 RAPDs, 103 SSRs, two classical markers, one CAPS marker, and one seed protein marker) were subjected to Chi-square tests for goodness-of-fit to expected ratios. For codominant SSR markers, a 15A:2H:15B segregation ratio was expected in this F5-derived set of RILs, where the letters code for RILs homozygous for the A allele of Asgrow A3733 or the B allele of PI 437088A, or for heterozygous AB RILs. For dominant RAPD markers, the expected segregation ratio was either 17D:15B, with D coding for the indistinguishable AA and AB RILs when the B allele produced no amplicon, or 17C:15A, with C coding for the indistinguishable BB and AB RILs when the A allele produced no amplicon. To obtain a genomewise significance criterion of α' = 0.05 for this large number (436) of 3- or 2-class Chi-square tests, an approximate Bonferroni adjustment was applied to ensure an appropriate testwise significance criterion (i.e., α = α'/436 = 0.0001).
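The expected ratios quoted above follow from heterozygosity halving with each generation of selfing. The minimal Python sketch below assumes a fully heterozygous F1 and no selection; it reproduces the 15:2:15 and 17:15 expectations and applies a goodness-of-fit test at the Bonferroni-adjusted level. The function names and the example genotype counts are hypothetical illustrations, not data or code from this study.

```python
import math

N_TESTS = 436            # number of loci tested (from the text)
GENOMEWISE_ALPHA = 0.05  # genomewise criterion (from the text)

def expected_fractions(generation):
    """Expected A:H:B fractions for lines derived from single F_n plants."""
    het = 0.5 ** (generation - 1)   # F2 = 1/2, F3 = 1/4, F4 = 1/8, F5 = 1/16
    hom = (1.0 - het) / 2.0
    return {"A": hom, "H": het, "B": hom}

def chi_square(observed, fractions):
    """Pearson goodness-of-fit statistic for observed class counts."""
    n = sum(observed.values())
    return sum((observed[k] - n * fractions[k]) ** 2 / (n * fractions[k])
               for k in fractions)

def p_value(stat, n_classes):
    """Upper-tail Chi-square probability (closed forms for 2 or 1 df)."""
    if n_classes == 3:   # codominant marker, 2 degrees of freedom
        return math.exp(-stat / 2.0)
    if n_classes == 2:   # dominant marker, 1 degree of freedom
        return math.erfc(math.sqrt(stat / 2.0))
    raise ValueError("only 2- or 3-class tests are handled in this sketch")

def is_distorted(observed, fractions):
    """True if segregation deviates at the Bonferroni-adjusted testwise level."""
    testwise_alpha = GENOMEWISE_ALPHA / N_TESTS
    return p_value(chi_square(observed, fractions), len(fractions)) < testwise_alpha

codominant = expected_fractions(5)   # 15/32, 2/32, 15/32
dominant = {"D": codominant["A"] + codominant["H"], "B": codominant["B"]}  # 17/32, 15/32

# Hypothetical counts for 76 RILs (illustration only)
print(is_distorted({"A": 36, "H": 4, "B": 36}, codominant))
print(is_distorted({"A": 60, "H": 2, "B": 14}, codominant))
```

Closed-form tail probabilities are used here only to keep the sketch free of third-party dependencies; a statistics library would normally supply them.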
Mapmaker/exp 3.0 software (Lander and Botstein, 1989; Lincoln and Lander, 1993) was used to create a genetic map of the ri self type, which required replacing the few SSR marker H (heterozygote) codes in the raw data file with dashes (i.e., missing data). Mapmaker's Group command, used with a stringent grouping LOD of 5.5 to preclude spurious grouping results, led to 385 of the 436 markers coalescing into 50 LGs. Each of those 50 LGs was then tested one at a time for linkage to the other 49 LGs by means of a relaxed grouping LOD of 3.0. This resulted in an unambiguous fusion of 15 LGs with other LGs, reducing to 35 the total number of LGs. Those 35 LGs were then tested one at a time for linkage with each of the 51 unlinked markers, also by means of a relaxed grouping LOD of 3.0. This resulted in the unambiguous linkage of 31 of the 51 markers with the 35 LGs, leaving 20 markers (of the 436) not linking to any of the 35 LGs. Marker ordering within LGs of eight or fewer markers was accomplished directly by use of Mapmaker's Compare command. Marker ordering within the larger LGs was accomplished by Mapmaker's Order command. A "seed order" was established by means of a minimum grouping LOD of 3.0, a minimum seed order start size of five markers, and a minimum distance of 2 cM between those markers. Each remaining marker was then placed in that order by LOD placement thresholds of 4.0 (strict) and 3.0 (relaxed), and a minimum window size of 3 (i.e., at least one marker on each side of a placed marker). After all markers were placed in the LG, the Ripple command was then used with a LOD threshold of 3.0 and a (marker) window size of 6 to identify/correct any minor marker placement errors.
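The marker counts in the grouping steps above can be cross-checked with simple bookkeeping. The Python sketch below uses only the numbers stated in the text; the function and key names are hypothetical. It also derives the number of marker intervals, on the assumption that each linkage group with k mapped loci contributes k - 1 intervals.

```python
def grouping_summary(total_markers=436, grouped_strict=385,
                     lgs_strict=50, lgs_fused=15, added_relaxed=31):
    """Tally markers and linkage groups through the two grouping passes."""
    unlinked_initial = total_markers - grouped_strict    # after LOD 5.5 grouping
    lgs_final = lgs_strict - lgs_fused                   # after LOD 3.0 fusion
    unlinked_final = unlinked_initial - added_relaxed    # after LOD 3.0 placement
    mapped = total_markers - unlinked_final
    intervals = mapped - lgs_final                       # k - 1 intervals per LG
    return {
        "unlinked_initial": unlinked_initial,
        "lgs_final": lgs_final,
        "unlinked_final": unlinked_final,
        "mapped": mapped,
        "intervals": intervals,
    }

print(grouping_summary())
```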
QTL Analyses
QTL analysis was performed with the linear regression (LR), simple interval mapping (SIM), and composite interval mapping (CIM) modules of QTL Cartographer (Zeng, 1994; Basten et al., 2001). The LOD score criteria for evaluating the statistical significance of QTL effects in each module were estimated by permutation (Churchill and Doerge, 1994). A minimum of 1000 permutations was generated in each module (i.e., LR, SIM, and CIM) for each trait-year data set. Permutation output differences among the trait-year data sets were small, so they were averaged to obtain final LOD score criteria (equivalent to α' = 0.05 genomewise error rates) of 3.4 for LR, 3.2 for SIM, and 3.8 for CIM analyses. A limited number of background (BG) markers for the CIM analysis were identified via the forward/backward stepwise regression option of QTL Cartographer using conservative probability thresholds (Pin = 0.01; Pout = 0.01). A CIM window parameter of 1 cM was chosen to exclude, from the BG marker group, any marker located within 1 cM of the two markers flanking an interval being tested for a putative QTL peak.
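The permutation approach to a genomewise LOD criterion can be illustrated with a small sketch. The Python code below is not QTL Cartographer's implementation. It assumes single-marker regression on a 0/1 genotype code, uses simulated unlinked markers and a simulated trait, and all function names are hypothetical. The trait values are shuffled among lines, the largest LOD across markers is recorded for each shuffle, and the 95th percentile of those maxima serves as the threshold.

```python
import math
import random

def marker_lod(trait, genotypes):
    """LOD for regressing a trait on one marker coded 0 or 1."""
    n = len(trait)
    grand_mean = sum(trait) / n
    rss_null = sum((y - grand_mean) ** 2 for y in trait)
    classes = {0: [], 1: []}
    for y, g in zip(trait, genotypes):
        classes[g].append(y)
    rss_model = 0.0
    for values in classes.values():
        if values:
            class_mean = sum(values) / len(values)
            rss_model += sum((y - class_mean) ** 2 for y in values)
    return (n / 2.0) * math.log10(rss_null / rss_model)

def permutation_threshold(trait, markers, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genomewide maximum LOD."""
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # breaks any trait-marker association
        maxima.append(max(marker_lod(shuffled, m) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return maxima[index]

# Simulated data (illustration only): 76 lines, 20 unlinked markers
rng = random.Random(42)
markers = [[rng.randint(0, 1) for _ in range(76)] for _ in range(20)]
trait = [rng.gauss(0.0, 1.0) for _ in range(76)]
threshold = permutation_threshold(trait, markers, n_perm=200)
print(round(threshold, 2))
```

A real analysis would permute once per trait-year data set with at least 1000 shuffles, as described above; the smaller count here only keeps the example fast.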
The joint map analysis module of QTL Cartographer was applied to the 1996 and 1997 data of each trait to evaluate the significance of G x E interaction, which in the present case is actually the QTL x Y interaction for each trait. Jiang and Zeng (1995) noted the interval maximum for the joint mapping statistic is roughly approximated by a Chi-square distribution with 2m + 1 degrees of freedom, where m = the number of jointly mapped traits (i.e., m = 2 for the 1996 and 1997 values of the trait). To derive a genomewise error rate of α' = 0.05 for the joint trait mapping statistic, the Bonferroni correction was applied: α = 1 - (1 - α')^(1/M), where M is the total number of marker intervals in the genome. In the present case, M = 381 (i.e., 416 loci in 35 LGs, see Fig. 1). Thus, α = 0.000134619 which, with 2 x 2 + 1 = 5 degrees of freedom, translated into a Chi-square value of 25.08, which was equivalent to a LOD score of 5.4 (25.08 x 0.217). The G x E statistic itself was computed only for those few marker intervals whose joint mapping LOD statistic exceeded the 5.4 criterion, and it is asymptotically distributed as a Chi-square with two degrees of freedom (Jiang and Zeng, 1995). Thus, at α = 0.05, the G x E statistic has a Chi-square significance criterion of 5.99, which translates into a LOD score of 1.30 (5.99 x 0.217).
| RESULTS |
|---|
Twenty of the 436 markers failed to group into LGs. Fourteen (13 RAPDs and one SSR) had significantly distorted segregation ratios. The six others (all SSRs) occupy terminal map positions on LGs (Cregan et al., 1999) and did not link because the linkage distance to the next inward marker was >45 cM. These terminal SSRs included Satt371 at the bottom of LG-C2, Satt184 at the top of LG-D1a, Sctt008 at the top of LG-D2, Sat_124 at the top of LG-E, Satt308 at the bottom of LG-M, and Satt487 at the top of LG-O (Fig. 1).
The summed genomic distance of 2943 cM for the RIL map in Fig. 1 compared favorably with the 3003-cM map of RFLP-SSR markers in the F7-derived RIL population documented in Cregan et al. (1999). Yamanaka et al. (2001) recently reported a 2909-cM map based on RFLP-SSRP markers in an F2 population. Ferreira et al. (2000) reported a 3275-cM map of 106 RAPD and 250 RFLP markers in an F6-derived RIL population. Keim et al. (1997) reported a summed distance of 3441 cM for a map of 840 mostly dominant AFLP markers in an F6-derived RIL population. In the latter two reports, marker clustering was evident in the maps. The authors speculated that this clustering was likely due to markers occupying positions in heterochromatic (possibly centromeric) regions of the chromosomes in which recombination was greatly reduced. Some clustering of RAPD markers was also evident in our map (Fig. 1).
The observed segregation of the codominant CAPS marker CystProt, and that of the codominant protein mobility marker MatSdPro, did not differ significantly from the expected 15A:2H:15B ratio. The CystProt marker mapped to LG-D2, about 12.1 cM from Satt154, whereas the MatSdPro marker mapped to LG-K, about 10 cM from Satt260 (Fig. 1).
Phenotypic Data
The analyses of variance revealed that the 0 to 100% irrigation treatments had a modest, though statistically significant, linear impact on all six traits. Seed yield, seed protein, 100-seed weight, maturity, and plant height were enhanced, whereas seed oil was depressed by incrementally greater amounts of seasonal water. Such responses have been observed in prior studies (Specht et al., 1986, 2001). The irrigation x RIL and the year x irrigation x RIL interactions were not significant for any trait. Although the year x irrigation interaction was significant for seed protein and oil, it was simply due to a larger linear effect of irrigation in 1996 than in 1997. These findings indicated that RIL means could be averaged over irrigation treatments.
Significant differences among the RILs were detected for all traits. The RIL x year interaction was significant for yield, protein, oil, and 100-seed weight, indicating that RIL means could not be pooled over years for subsequent QTL analyses. This interaction was not unexpected, particularly for yield, and might have been attributable to a Bean pod mottle virus epidemic that occurred in 1997. One symptom of viral infection was the failure of plant stems to senesce at physiological maturity, resulting in what is commonly termed "green stems" (Schwenk and Nickell, 1980). Pods on green stems eventually matured, but the green stems, when crushed by the combine cylinder, exuded moisture onto seeds and pods, making difficult a clean separation of seed from haulm (i.e., pod walls, stems and branches, plus any nonabscised petioles and leaves). This led to imprecision in the 1997 yield measurement. Variable delays in pod maturity also rendered unreliable the estimates of 1997 RIL maturity.
In both years, the seed protein content of the PI 437088A parent substantially exceeded that of the Asgrow A3733 parent, whereas the inverse was true for seed oil and yield (Table 1). The population means were near the mid-parent values. The trait data were normally distributed.
QTL Analysis
Statistically significant QTLs for soybean seed protein and oil in both years, and for yield and maturity in 1996, were detected in a genomic region located in the upper half of LG-I (Fig. 1), but were not detected elsewhere in the soybean genome (except for 1996 yield and maturity). No significant QTL for 1997 yield was detected anywhere in the genome, probably because of yield variability associated with the difficulty of completely threshing the green-stemmed plants that year. Statistically significant 1996 and 1997 QTLs were detected for plant height, lodging, and 100-seed weight, but none of these QTLs mapped to LG-I (data not shown), so these will not be discussed in this paper. The two parentally allelic seed protein markers, CystProt on LG-D2 and MatSdPro on LG-K, were not associated with the parental difference in seed protein content, since no statistically significant QTLs for seed protein content were detected near those markers.
The LG-I QTL scans generated by the LR, SIM, and CIM modules of QTL Cartographer are presented in Fig. 3. The LOD peaks and valleys detected in the LR and SIM scans suggested that LG-I possessed one major QTL and perhaps two or three (nearby) minor QTLs for protein, oil, and 1996 yield. Other researchers using SIM have reported multiple QTLs in LG-I, and in fact SOYBASE (2002) currently lists 11 protein QTLs (numbered 1-1 to 1-8, 3-12, 10-1, and 11-1) in this small region of LG-I. However, CIM evaluates marker-flanked genomic intervals for evidence of a QTL with greater precision than SIM, by using background (BG) markers to control trait variation arising from the BG genomic segments residing outside of each (window-sized) marker-flanked segment being tested for a QTL (Zeng, 1994). To avoid the well-known problem of identifying too many BG markers for CIM, conservative threshold parameters (i.e., Pin = Pout = 0.01) were used in this study for the forward (in), backward (out), stepwise regression module of QTL Cartographer. The window size parameter was also narrowed to 1 cM (from its default of 10 cM) to enable a more precise positioning of the protein, oil, and yield QTLs on LG-I, and to ensure a more definitive estimate of each allelic effect. In contrast to SIM-identified major and minor peaks on LG-I, the CIM-based scan revealed only one QTL of strong effect (Fig. 3). In effect, when the LG-I marker associated with the primary QTL was used as a BG marker (to control trait variation), the other LG-I markers had no independent effect on the trait. Evidently, the SIM-identified minor QTLs flanking the major QTL were statistical artifacts.
The CIM-estimated additive effects of allele A (i.e., the Asgrow A3733 allele) at the LG-I protein and oil QTL(s) were a respective -1.00 and +0.57 (1996) and -0.89 and +0.59 (1997) percentage points (Table 4). In comparison, Diers et al. (1992) reported that the G. max segment of LG-I segregating in their G. max x G. soja population had protein/oil additive effects of about -1.10/+0.75, respectively, whereas Sebolt et al. (2000) reported values of -0.975/+0.45 for the same segment. In the present study, the ratios of the protein/oil additive effects for the LG-I protein and oil QTL(s) were -1.70 (1996) and -1.52 (1997), which were similar to the -1.77 and -1.51 values computed for the 1996 and 1997 ratios of the protein/oil phenotypic regression coefficients (Fig. 2). The additive effect of allele A at the LG-I yield QTL was a statistically significant +134 kg ha-1 in 1996 (Table 4), but was not detectable in 1997 because of the yield measurement imprecision that year.
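The ratio comparison in the preceding paragraph can be sketched numerically. The short Python example below is illustrative only: it uses the rounded additive effects quoted in the text, enters the protein effects of allele A as negative (an assumption consistent with the negative ratios reported), and uses hypothetical variable and function names. Because the inputs are rounded, the computed ratios should be expected to match the reported ones only approximately.

```python
# Additive effects of allele A in percentage points, as (protein, oil) per year.
# Assumption: protein effects are negative, in line with the negative ratios in the text.
additive_effects = {1996: (-1.00, 0.57), 1997: (-0.89, 0.59)}

# Ratios as reported in the text, kept here only for side-by-side comparison.
reported_ratios = {1996: -1.70, 1997: -1.52}


def protein_oil_ratio(protein_effect, oil_effect):
    """Return the protein/oil ratio of two additive effects."""
    return protein_effect / oil_effect


ratios = {
    year: protein_oil_ratio(*effects)
    for year, effects in additive_effects.items()
}

for year in sorted(ratios):
    print(f"{year}: computed {ratios[year]:.2f}, reported {reported_ratios[year]:.2f}")
```

Any small gap between the computed and reported values presumably reflects rounding of the published effects rather than a different calculation.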
The joint mapping module of QTL Cartographer was applied to the 1996 and 1997 data for protein, oil, and yield to assess the significance of any CIM-based QTL x Y interactions. The QTL x Y interaction for protein was nominally significant, arising from a negative additive effect of QTL allele A on seed protein content that was somewhat greater in 1996 than in 1997. The significant QTL x Y interaction for seed yield was the result of a large positive effect of QTL allele A on yield in 1996, but an essentially zero effect in 1997.
| DISCUSSION |
|---|
|
|
|---|
QTL interval analyses indicated that seed protein was increased by 1.84 percentage points on the substitution of two PI 437088A alleles (i.e., BB) for two Asgrow A3733 alleles (i.e., AA) at OPAW13a, the marker nearest the LG-I protein QTL (Table 4). However, that same substitution coordinately depressed seed oil by 1.14 percentage points, and in one of the trial years, depressed seed yield by 268 kg ha-1 (3.9 bu ac-1). These effects are depicted in Fig. 2, where solid and open symbols identify RILs with respective AA or BB genotypes at the locus OPAW13a. In the 1996 trial, RILs of the BB genotype (i.e., open symbols) fall mainly on the left side of the graph, consistent with their lower yield, but greater protein and lower oil, compared to RILs of the AA genotype (i.e., solid symbols). Only the protein and oil pattern was repeated in the 1997 trial, presumably because of imprecision in the yield measurement that year.
No RIL with a recombinant coupling-phased phenotype of high protein and high oil (or its alternative of low protein and low oil) was clearly identifiable in Fig. 2. This outcome was consistent with the null hypothesis of a single QTL conditioning (inverse) pleiotropic effects on protein and oil. The alternative hypothesis, that a protein QTL is in tight (repulsion phase) linkage with an oil QTL, is less parsimonious. Its acceptance requires one to pose an additional hypothesis of what the true recombination fraction between two such QTLs might be (Hanson, 1959), given that no (i.e., x = 0) recombinants were actually observed in the RIL population (i.e., n = 76). One might reasonably assume the recombination fraction (R = x/n) is less than one recombinant per 76 RILs. Because of additional selfing, the recombination fraction (R) in an RIL population will be greater than that (r) of an F2 population. Using the equation r = R/(2 - 2R) derived by Haldane and Waddington (1931) for selfed (not sib-mated) RIL populations, an RIL recombination fraction (R) of < 1/76 translates into an F2 recombination fraction (r) of < 0.0067. When transformed into Haldane map units, i.e., m = -0.5 x ln(1 - 2r), with m in Morgans, the linkage distance between two LG-I QTLs, one for protein and one for oil, would clearly have to be less than 0.67 cM.
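The arithmetic behind this upper bound can be retraced in a few lines. The Python sketch below is illustrative only: it applies the two equations quoted in the paragraph above to the bound R < 1/76, and the function and variable names are hypothetical, not taken from any software used in the study.

```python
import math


def ril_to_f2_recombination(R):
    """Convert a selfed-RIL recombination fraction R to the F2 (single-meiosis)
    fraction r, using r = R / (2 - 2R) (Haldane and Waddington, 1931)."""
    return R / (2.0 - 2.0 * R)


def haldane_map_distance_cM(r):
    """Haldane mapping function, m = -0.5 * ln(1 - 2r), returned in centimorgans."""
    morgans = -0.5 * math.log(1.0 - 2.0 * r)
    return 100.0 * morgans


n_rils = 76                 # RIL population size given in the text
R_upper = 1.0 / n_rils      # fewer than one recombinant observed among 76 RILs
r_upper = ril_to_f2_recombination(R_upper)
m_upper_cM = haldane_map_distance_cM(r_upper)

print(f"R < {R_upper:.4f}, r < {r_upper:.4f}, distance < {m_upper_cM:.2f} cM")
```

At recombination fractions this small the Haldane distance in centimorgans is nearly equal to 100 x r, which is why the bounds on r and on map distance share the same leading digits.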
A QTL with inversely pleiotropic effects on protein and oil is consistent with the results of Wilcox and Cavins (1995) and Wilcox (1998). In those studies, the linear coefficients for protein regressed on oil strengthened from -1.51 to -1.72 during backcross introgression of the high protein allele from Pando into the high yield recurrent parent, and from -1.55 to -1.75 during cycles five to eight of recurrent selection for high protein. The QTL allelism test conducted by Sebolt et al. (2000) demonstrated that the G. soja (PI 468916) allele for high protein and low oil was allelic with the Pando allele for high protein and low oil. Thus, it is now clear that in the Pando populations, the genotypic-level regression coefficients were simply reflective of the ratio of the additive effects exerted by the Pando (= G. soja) allele on soybean seed protein and oil. We have not yet confirmed the allelism of the Pando allele with the PI 437088A allele. However, both alleles map to the same small segment of LG-I, and the latter allele has a protein/oil additive effect ratio of about -1.51 to -1.77, which is certainly consistent with a hypothesis of allelism.
The 1996 CIM analysis indicated that a QTL for maturity might reside near the QTL(s) for protein, oil, and yield. The map position and strength of this maturity QTL will need to be verified, since only two of the six subplots per replicate were scored for maturity in 1996, and 1997 maturity data were not reliable (i.e., green stems). A physiological coupling of earlier senescence with higher soybean seed protein would not be entirely unexpected, given the "self-destruct" hypothesis of Sinclair and de Wit (1975). These authors hypothesized that soybean, because of its high seed protein content, is unable to meet the massive demand for N (during seed filling) from just soil N uptake and N2 fixation, and must thus remobilize N from its vegetative tissues. An early "shutdown" of carbon assimilation would be expected to hasten senescence.
We conclude with a discussion of the allelic effects of the LG-I QTL and the correlation of yield with seed protein (negative) and seed oil (positive) observed in our study (and in many other studies). In a now classic paper, Hanson et al. (1961) examined the genetic basis of the inversely coordinate genetic and environmental variation in soybean seed protein and oil. The authors noted that the average calorific values for soybean oil and protein were about 9.4 and 4.6 kcal g-1, which translated into an oil/protein energy ratio of about 2.0 (i.e., 9.4/4.6). The synthesis of oil or protein requires energy, which can be obtained by oxidizing (some of the) carbons present in carbohydrate. However, the authors noted that seed carbohydrate changed little on genetic (or environmental) alteration of the seed protein (or oil) content. Indeed, increases in protein were accompanied by coordinate decreases in oil, leading the authors to speculate that the synthetic pathways for protein and oil compete for the same carbon and energy sources. Hanson et al. (1961) computed that, on a nongenetic (environmental) scale, the protein/oil exchange ratio was nearly equivalent to the intrinsic calorific ratio of 2.0 (i.e., 1.92 units of seed protein were gained per 1.0 unit lost in seed oil). However, on a genotypic scale, computed from a large set of soybean genotypes, the calculated protein/oil exchange ratio was substantially less, about -1.5 or -1.6. This finding led the authors to conclude that if breeding lines high in seed oil were to be genetically converted into lines high in seed protein, at a protein/oil exchange ratio less than a calorific ratio of two, then the high protein lines should be higher yielding. Although Hanson et al. (1961) were aware of prior reports of negative correlations of protein with yield, and in fact observed a negative correlation in their own data, they still concluded that high protein and high yield were compatible.
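The energy bookkeeping behind the Hanson et al. (1961) argument can be illustrated with a small calculation. The Python sketch below is a simplification under stated assumptions: it treats the "units" of protein and oil as grams, uses only the calorific values quoted above, considers stored energy rather than full biosynthetic cost, and uses hypothetical names throughout.

```python
KCAL_PER_G_OIL = 9.4       # calorific value of soybean oil quoted in the text
KCAL_PER_G_PROTEIN = 4.6   # calorific value of soybean protein quoted in the text

calorific_ratio = KCAL_PER_G_OIL / KCAL_PER_G_PROTEIN


def energy_surplus_per_g_oil_lost(protein_gained_per_g_oil_lost):
    """Stored energy (kcal) left over when 1 g of oil is exchanged for the
    given mass of protein. Assumption: only calorific content is counted."""
    return KCAL_PER_G_OIL - KCAL_PER_G_PROTEIN * protein_gained_per_g_oil_lost


# Magnitudes of the exchange ratios quoted in the text (signs dropped).
exchange_ratios = {
    "environmental (1.92)": 1.92,
    "genotypic (1.6)": 1.6,
    "genotypic (1.5)": 1.5,
}

surpluses = {
    label: energy_surplus_per_g_oil_lost(ratio)
    for label, ratio in exchange_ratios.items()
}

print(f"oil/protein calorific ratio: {calorific_ratio:.2f}")
for label, surplus in surpluses.items():
    print(f"{label}: surplus of {surplus:.2f} kcal per g of oil exchanged")
```

Under these assumptions the surplus grows as the exchange ratio falls below the calorific ratio, which is the reasoning that led Hanson et al. (1961) to expect high protein lines to be higher yielding.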
Since their report, however, negative genotypic correlations between yield and protein have been repeatedly documented in the soybean literature. We now show that the PI 437088A allele of a LG-I QTL positively affects protein, but negatively affects oil and yield. The energetic cost associated with increased protein deposition (at the expense of oil) in the soybean seed via genetic (but apparently not environmental) means would appear to be seriously underestimated. Cloning this QTL would prove useful, if only to learn how it functionally mediates the allocation of photosynthetic carbon and energy between seed protein, oil, and yield.
| NOTES |
|---|
|
|
|---|
Received for publication June 18, 2002.
| REFERENCES |
|---|
|
|
|---|