a Torrey Mesa Research Institute, Syngenta Research and Technology, 3115 Merryfield Row, San Diego, CA 92121
b Dep. of Crop Sciences, Univ. of Illinois, 334 NSRC, 1101 W. Peabody Blvd., Urbana, IL 61801
c Syngenta Biotechnology, 3054 Cornwallis Rd., Research Triangle Park, NC 27709
d Syngenta Seeds B.V., Westeinde 62, Enkhuizen, the Netherlands
* Corresponding author (mhudson{at}uiuc.edu).
| ABSTRACT |
|---|
|
|
|---|
Abbreviations: EDTA, ethylenediaminetetraacetic acid; FDR, false discovery rate; MES, 2-N-morpholino-ethane-sulfonic acid; MOID, match-only integral distribution; RH, relative humidity; PCR, polymerase chain reaction; RTPCR, reverse transcription polymerase chain reaction; SAM, Statistical Analysis of Microarrays; SAPE, Streptavidin Phycoerythrin
| INTRODUCTION |
|---|
|
|
|---|
Phylogenetic analyses based on DNA sequences have indicated the close relationship between the genera Brassica and Arabidopsis (Yang et al., 1999; Koch et al., 2001). Recent comparative genomic analyses have clearly demonstrated both synteny and microsynteny between Arabidopsis and Brassica, even though Brassica oleracea may lack homologs of some Arabidopsis genes (Hall et al., 2002; O'Neill and Bancroft, 2000). Gene content, gene order, and sequence at both the nucleic acid and amino acid level are closely conserved between Brassica oleracea and Arabidopsis (Quiros et al., 2001).
Several discoveries have been made in Arabidopsis and elsewhere using Affymetrix (Affymetrix, Santa Clara, CA) microarrays as an exploratory tool (Tepperman et al., 2001; Salter et al., 2003; Loguinov et al., 2004; Monte et al., 2004). Used in this context, the microarray guides the design of further, lower-throughput experiments (e.g., gel blots or quantitative reverse transcription polymerase chain reaction [RTPCR]) to find genes that rapidly or strongly change in expression. These genes provide biomarkers for the response at the cellular level and can provide insight into signal transduction and developmental mechanisms. Cross-species exploratory microarray hybridization is a potentially useful technique for applying model organism genomics in related species for which few functional genomics resources are available (Zhu et al., 2001b; Chismar et al., 2002; Lee et al., 2004). Using such resources, it is not possible to monitor the entire transcriptome simultaneously, and it is necessary to confirm any result using a second method of transcript analysis, since the underlying genome is unknown and hence the actual transcript in the target species must be sequenced and verified. For this reason, cross-species hybridizations are necessarily exploratory experiments, and researchers confirm potential biomarkers of the response under investigation using a second technique. For example, Zhu et al. (2001b) used an oligonucleotide microarray designed using the rice (Oryza sativa L. ssp. japonica) genome to assay the barley (Hordeum vulgare L.) transcriptome and confirmed the results by RNA gel blot analysis.
Several issues remain to be addressed regarding the successful application of this technique in Brassica using arrays designed using Arabidopsis genome data, since these two members of the Brassicaceae family are significantly divergent. Sequence distance values of up to 22.9% have been reported for nuclear coding DNA sequence within the Brassicaceae family (Koch et al., 2001). Divergence between the tribe Brassiceae and Arabidopsis has been estimated to have occurred 14.5 to 20.4 million years ago, based on a mitochondrial sequence divergence of 0.71% (Yang et al., 1999); this divergence would be expected to be at least 12-fold higher between nuclear DNA sequences (Wolfe et al., 1987). The sequence differences between the two species may present a barrier in applying Arabidopsis microarrays to Brassica. In a previous study, Girke et al. (2000) tested the feasibility of directly applying Arabidopsis cDNA microarrays to profile developing Brassica seeds. While the correlation coefficients were high between the hybridization signals in the two species, the hybridization signal intensity was approximately 50% lower for Brassica compared to Arabidopsis targets. The reported transcription pattern of selected genes with lower transcript abundance could not be validated on a genomic scale by RNA blot analysis or RTPCR using Brassica-specific probes (O'Hara et al., 2002). Thus, it is necessary to address these issues before applying Arabidopsis GeneChip microarrays to accurately profile gene expression in Brassica.
We describe a system for using microarrays across species boundaries, using genomic DNA hybridization to oligonucleotide arrays and purpose-written analysis software to control for species-specific differences in hybridization. We then perform an exploratory experiment to investigate the reprogramming of the transcriptome during germination of Brassica seeds. Seed germination has been the subject of intense biochemical, molecular, and physiological research (Jacobsen and Beach, 1985; Bewley and Marcus, 1990; Bewley 1997; Bewley et al., 2000). Functional genomic approaches have great promise to reveal more mechanistic details of this process (Ogawa et al., 2003; van der Geest, 2002; Bové et al., 2001) and have already proven their potential to generate results on an unprecedented scale. For example, microarray analyses have been used to identify seed-specific transcripts and promoters (Girke et al., 2000; Zhu et al., 2001a). It was reported that 25% of 2600 genes investigated showed twofold or more increased abundance in seeds than leaves or roots (Girke et al., 2000). The control of transcription during development of embryos and filling of seeds has also been investigated using microarray techniques (Ruuska et al., 2002; Lee et al., 2002; Zhu et al., 2003), as has the response of seed to gibberellin treatment during germination (Ogawa et al., 2003). More recently, the work of Soeda et al. (2005) used mechanically spotted microarrays with probes representing 1437 Brassica cDNAs to investigate the effect of priming, drying, and shelf-life induction treatments on transcript abundance. Here, we assay genome-wide patterns of gene expression using 17 886 probesets homologous to Arabidopsis genes, with the aim of both determining the mechanism of action of pregermination hydration treatments on germination efficiency and validating our genomic DNA controlled approach to cross-species microarray analysis.
| MATERIALS AND METHODS |
|---|
|
|
|---|
To test potential longevity, dried seeds were equilibrated at 75% RH and 20°C for 3 d and then divided over aluminum bags, which were subsequently sealed and placed in a water bath at 46°C. After periods of 0, 2, 3, 4, 5, 6, and 7 d, bags were removed from the water and opened, and the seeds were put to germinate on moistened filter paper. The percentage of seeds that germinated and developed into a normal seedling was determined after 10 d of subsequent incubation at 20°C. Seeds for RNA extraction were processed in parallel with the germination experiment. Statistical significance of the germination responses was evaluated using the chi-square test.
DNA and RNA Extraction
Seeds treated in parallel with the treatments for the seed germination experiments were fixed and stored by transfer to RNAlater buffer (Ambion, Austin, TX). DNA was extracted using CsCl gradient centrifugation (Sambrook et al., 1989). For each sample, total RNA was extracted from approximately 0.75 mL of seeds with phenol:chloroform:isoamyl alcohol (25:24:1), precipitated with ethylene glycol monobutyl ether, and then resuspended in water and further purified through LiCl precipitation (Sambrook et al., 1989). The RNA was then examined by gel electrophoresis for integrity and by spectrometry for purity. To ensure data quality, only samples with A260/A280 ratios of 1.9 to 2.1 were included in the study.
Microarray Labeling and Hybridization
For genomic DNA labeling, 3 µg of genomic DNA directly from the CsCl preparation was mixed with 20 µL of 2.5X random hexamers and heated at 100°C for 5 min. The mixture was cooled on ice immediately and labeled with biotin-dNTPs at 37°C for 2 h in the presence of Klenow DNA fragment using the BioPrime DNA labeling system (Invitrogen, Carlsbad, CA).
The RNA labeling and hybridization were performed as previously described (Zhu et al., 2001a). Briefly, total RNA (5 µg) from each sample was reverse transcribed at 42°C for 1 h using 100 pmol of the oligo dT(24) primer containing a 5' T7 RNA polymerase promoter sequence [5'-GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG-(dT)24-3'], 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol (DTT), 0.5 mM dNTPs, and 200 units of SuperScript II reverse transcriptase (Invitrogen, Carlsbad, CA). The second strand of cDNA was synthesized using 40 units of Escherichia coli DNA polymerase I, 10 units of E. coli DNA ligase, and 2 units of RNase H in a reaction containing 25 mM Tris-HCl (pH 7.5), 100 mM KCl, 5 mM MgCl2, 10 mM (NH4)2SO4, 0.15 mM β-NAD+, 1 mM dNTPs, and 1.2 mM DTT. The reaction proceeded at 16°C for 2 h and was terminated using ethylenediaminetetraacetic acid (EDTA). Double-stranded cDNA products were purified by phenol/chloroform extraction and ethanol precipitation.
Biotinylated complementary RNAs (cRNAs) were transcribed in vitro from synthesized cDNA by T7 RNA Polymerase (Enzo BioArray High Yield RNA Transcript Labeling Kit, Enzo, NY). cRNAs were purified using affinity resin (QIAGEN RNeasy Spin Columns; Valencia, CA) and randomly fragmented by incubating at 94°C for 35 min in a buffer containing 40 mM Tris-acetate (pH 8.1), 100 mM potassium acetate, and 30 mM magnesium acetate to produce molecules of approximately 35 to 200 bases. The labeled DNA and RNA samples were mixed with 0.1 mg mL⁻¹ sonicated herring sperm DNA in a hybridization buffer containing 100 mM 2-N-morpholino-ethane-sulfonic acid (MES), 1 M NaCl, 20 mM EDTA, 0.01% Tween 20, denatured at 99°C for 5 min, and equilibrated at 45°C for 5 min before hybridization. The hybridization mix was then transferred to the GeneChip microarray cartridge and hybridized at 45°C for 16 h on a rotisserie at 60 rpm.
The hybridized arrays were then rinsed and stained in a fluidics station (Affymetrix, Santa Clara, CA). They were first rinsed with wash buffer A (6X SSPE [0.9 M NaCl, 0.06 M NaH2PO4, 0.006 M EDTA], 0.01% Tween 20, 0.005% Antifoam) at 25°C for 10 min and incubated with wash buffer B (100 mM MES, 0.1 M NaCl, 0.01% Tween 20) at 50°C for 20 min, then stained with Streptavidin Phycoerythrin (SAPE) (100 mM MES, 1 M NaCl, 0.05% Tween 20, 10 mg mL⁻¹ SAPE, 2 mg mL⁻¹ BSA) at 25°C for 10 min, washed with wash buffer A at 25°C for 20 min and stained with biotinylated anti-streptavidin antibody at 25°C for 10 min. After antibody staining, arrays were stained again with SAPE at 25°C for 10 min and washed with wash buffer A at 30°C for 30 min. Arrays were scanned according to the manufacturer's instructions, using an Agilent GeneArray Scanner and GeneChip Suite 4.0 scanning software (Affymetrix, Santa Clara, CA). The 4.0 software was used only to generate the DAT and CEL files and not to determine expression metrics, which were calculated using custom Perl software described below.
Microarray Data Processing and Generation of Expression Metrics
The DAT files of raw images generated by the scan of the microarray were used to generate CEL files with Affymetrix MAS 4.0. The CEL files generated in this manner, containing the signals of each oligonucleotide probe feature, were then analyzed using custom scripts written in Perl 5.6. The CEL files generated by genomic DNA hybridization were compared at the individual probe level to generate a database of probes hybridizing to the Brassica genome with 75% or less of the signal generated by hybridization to the Arabidopsis genome. The perfect match probe values from the microarrays hybridized to Brassica RNA were then compared to the database; probes hybridizing to the Brassica genome with 75% or less of the signal from the Arabidopsis genome were removed from the analysis. The 72nd percentile value of the remaining probes was then calculated, according to the match-only integral distribution (MOID) algorithm (Zhou and Abagyan, 2002). These raw signal values were then corrected by background correction (subtracting a background estimate obtained from the fifth percentile of the signal values, and scaling any resultant negative values to zero), and subsequently, a per-array normalization was used that scaled all expression metrics on each array to a target mean of 100. For cluster analysis, the values were normalized by median centering to a target of 1 on a per-gene basis. For Statistical Analysis of Microarrays (SAM) analysis, data were log(2) transformed. All the above operations were performed using scripts written in Perl 5.6. Self-organizing maps were generated using GeneSpring 4.2 (Silicon Genetics, Palo Alto, CA), based on the results of hierarchical, correlation-based clustering also performed with GeneSpring.
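The processing steps described above can be sketched in a few lines. The following is a minimal illustration in Python, not the authors' Perl scripts; the function names, the dictionary-based data layout, and the choice of linear interpolation for percentiles are all assumptions made for the example. It also assumes that the fifth-percentile background is taken over the per-gene metrics of one array.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def usable_probes(brassica_gdna, arabidopsis_gdna, min_ratio=0.75):
    """Probe IDs whose Brassica gDNA signal is more than min_ratio of the
    Arabidopsis gDNA signal (probes at 75% or less are discarded)."""
    return {
        pid
        for pid, at_signal in arabidopsis_gdna.items()
        if at_signal > 0 and brassica_gdna.get(pid, 0.0) / at_signal > min_ratio
    }


def expression_metrics(rna_signal, probesets, usable, pct=72.0, target_mean=100.0):
    """Per-gene metric: 72nd percentile of usable probes, background-corrected
    with the 5th percentile and scaled so the array mean equals target_mean."""
    raw = {}
    for gene, probe_ids in probesets.items():
        kept = [rna_signal[p] for p in probe_ids if p in usable]
        if kept:  # genes with no usable probe are dropped
            raw[gene] = percentile(kept, pct)
    background = percentile(list(raw.values()), 5.0)
    corrected = {g: max(v - background, 0.0) for g, v in raw.items()}
    array_mean = sum(corrected.values()) / len(corrected)
    if array_mean == 0:
        return corrected
    return {g: v * target_mean / array_mean for g, v in corrected.items()}
```

Keeping the probe filter separate from the metric calculation means the same list of usable probes, derived once from the genomic DNA arrays, can be reused for every RNA array in the experiment.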
Replication
Since our major concern was the reproducibility of our probe-level analysis, we performed two technical replicate microarrays for a single RNA sample. We replicated the single RNA sample for each of the five points from the cultivar Lintop with a second RNA sample, also at each of the five points and with two replicate microarrays, for the cultivar Maverick. Hence, our overall experiment contains five treatment points, each with two different genotypes to control for biological and allelic variation between Brassica varieties, and each genotype with two technical replicates to control for technical variation in our procedure, giving 20 microarrays in total. We also performed additional replication using the previous generation, Novartis-designed "8k" Gene Chip (Zhu and Wang, 2000), which gave broadly similar results (data files available from the authors).
Statistical Analysis
Since this was an exploratory experiment, the mean of the two technical replicate hybridizations for each genotype, for every transcript, was used for initial analysis and clustering, without filtering genes for statistical significance. However, at this stage genes were filtered based on the statistical likelihood that true expression was detected, hence excluding the noisy probes with low signal that can confound clustering studies. The coefficient of variation of all the negative control probes was used to calculate the 95% confidence interval for undetected transcripts. This 95% confidence threshold, 25, gave the threshold of detectability (the arrays are each normalized to a target mean of 100). Any gene where no two replicate data points of the 10 microarrays used reached a signal of 25 was excluded from the analysis (all genes where all the probes were eliminated were also removed from the analysis at this step). In the case of this analysis, genes with one or more probes remaining and called "good" were allowed to proceed past the filter; however, this can readily be changed for other experimental designs using our scripts. Among 26 367 mRNAs with Arabidopsis probesets, 17 886 putative orthologs were considered "present" in Brassica using these cutoffs.
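The detectability filter can be illustrated with a short Python sketch. This is an assumption-laden reading of the rule rather than the authors' implementation: it interprets the criterion as requiring at least two data points at or above the threshold of 25, and the function name and data layout are hypothetical.

```python
def detectable_genes(expression, threshold=25.0, min_points=2):
    """Keep genes with at least min_points values at or above threshold.

    expression maps a gene ID to its list of normalized signal values
    (arrays scaled to a target mean of 100).
    """
    return {
        gene: values
        for gene, values in expression.items()
        if sum(1 for v in values if v >= threshold) >= min_points
    }


example = {
    "geneA": [30.0, 26.0, 1.0],  # two points reach the threshold: kept
    "geneB": [30.0, 1.0, 1.0],   # only one point: excluded
    "geneC": [1.0, 1.0, 1.0],    # never detected: excluded
}
kept = detectable_genes(example)
```

Both the threshold and the minimum number of points are parameters, which mirrors the statement that the filter can be changed for other experimental designs.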
To determine which individual genes were statistically significant, the d statistic test from Statistical Analysis of Microarrays (SAM; Tusher et al., 2001) was used, as implemented in the Bioconductor package for the R programming language. Eight hundred thirty-three genes were classed as significant at a 5% false discovery rate (FDR) (delta = 2.3, Q = 0.05).
Real-Time Quantitative RTPCR
The total RNA was treated with DNase I from DNA-free Kit (Ambion, Austin, TX) before RTPCR to ensure no gDNA contamination. The treated RNA (100 ng) was reverse transcribed at 37°C for 1 h using M-MLV reverse transcriptase kit (Sigma Aldrich, St. Louis, MO) with antisense primers (0.5 mM each) for the gene of interest and actin 1. The latter was used as an internal control to eliminate any differences in reverse transcription efficiency or RNA concentration.
Brassica-specific PCR primers and probes were designed from Brassica gene sequences homologous to Arabidopsis probeset sequences using Primer Express software (Applied Biosystems, Foster City, CA). Probes were labeled at their 5' end with a reporter fluorophore (tetrachloro-6-carboxyfluorescein [TET] for the internal control [actin 1] and fluorescein [FAM] for the gene of interest) and at the 3' end with the quencher fluorophore tetramethylrhodamine (TAMRA). Sequences of primers and probes used in this work are shown in Table 1.
|
Raw data of the assay results were obtained by using the software SDS2.0 (Applied Biosystems, Foster City, CA), according to the manufacturer's instructions. Expression levels relative to actin were calculated from the cycle threshold (Ct) values using the formula $2^{-\Delta C_t}$, as described by the manufacturer (Applied Biosystems, 1997). To compare the expression pattern of measured genes between each sample and reduce any effect of polymorphism between the genotypes, the expression level of the five replicates of each sample was normalized to the mean expression value.
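As an illustration of the relative quantification step, the following Python sketch assumes that the formula referred to is the conventional comparative form, in which expression relative to the internal control is 2 raised to the negative difference between the Ct of the gene of interest and the Ct of actin. The function names are hypothetical, and the sketch assumes equal amplification efficiency for both amplicons.

```python
def relative_expression(ct_gene, ct_actin):
    """Expression relative to actin, assuming 2^-(Ct_gene - Ct_actin)."""
    return 2.0 ** -(ct_gene - ct_actin)


def normalize_to_mean(values):
    """Divide each value by the mean of the series."""
    mean_value = sum(values) / len(values)
    return [v / mean_value for v in values]


# A gene crossing threshold two cycles later than actin is at one quarter
# of the actin level under these assumptions.
quarter = relative_expression(ct_gene=22.0, ct_actin=20.0)
```

Normalizing each series to its own mean puts genes with very different absolute abundance on a common scale, so their patterns across samples can be compared directly.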
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
We expect, based on the data of Koch et al. (2001), that the sequence divergence between Brassica RNA samples and the Arabidopsis sequences on the chip will be sufficiently high to cause mismatches between most of the oligonucleotide probes and the target RNA. This problem invalidates the assumption, inherent in most Affymetrix-type oligonucleotide microarray experiments, that a "mismatch" probe hybridization value may be safely subtracted from a "perfect match" probe hybridization value. Using standard analysis tools, this will cause the transcripts to be called "absent" (as described by Chismar et al., 2002) since the mismatch hybridization levels will be comparable to those of the "match." However, substantial information about transcript levels will remain in the perfect match signal, as demonstrated by Zhu et al. (2001a), who used a custom chip design without a mismatch probe. These authors validated cross-species microarray analysis without mismatch probes and the technique of genomic DNA hybridization, using the relatively large genomes of rice and barley. In the work described here, a similar custom array design is used, although the techniques are also valid for use in a conventional match-mismatch design if the mismatch data is discarded and the match probe signal is considered alone.
An additional concern is that some Arabidopsis probes will not hybridize measurably to Brassica DNA, or will give much lower signal intensity. The generally lower signal intensity is dealt with here at the data processing level by normalizing all the data from each microarray (Brassica or Arabidopsis) to a target mean of 100. The potential for nonhybridizing probes to cause skewed or noisy data is addressed here by detecting such probes by hybridization of genomic DNA to the microarray, and excluding these probes from the data analysis step.
Finally, a distinct problem exists in differentiating between closely related members of the same gene family in a cross-species hybridization. This is a problem that we are unable to address without the sequence of all the Brassica orthologs of the Arabidopsis genes in every paralogous gene family. This is an equivalently large problem for the use of a Brassica cDNA spotted microarray (Soeda et al., 2005). For this reason we present our results as valid only at the gene-family level, unless confirmed by analysis of a Brassica ortholog. We validate the expression patterns of examples from the gene-family members of selected transcripts with strongly differential expression, using real-time PCR with specific probes designed to cloned Brassica orthologs of Arabidopsis genes.
Removal of Hybridization Data for Nonhybridizing Oligonucleotide Probes Increases Data Quality and Simplifies Interpretation
The main experiments described in this paper were conducted using a custom-designed oligonucleotide microarray (Zhu, 2003). This microarray contains no mismatch probes and 382 166 perfect-match oligonucleotide probes of 25 base pairs each, designed to the TIGR2 version of the Arabidopsis genome annotation (January, 2002). These sequences were designed to match 25 996 unique Arabidopsis genes (representing over 99% of the predicted exons in the Arabidopsis genome at the time of design). Consequently, each gene is represented by a probeset of, on average, 15 oligonucleotide probes. The probeset hybridization data, of 15 intensity values on average for each gene ID, is used to calculate the expression value for that gene, which is taken as the 72nd percentile of the normalized probe cell intensity values for each probeset (the MOID algorithm; Zhou and Abagyan, 2002), a comparable metric to the standard Affymetrix expression intensity value, but one that does not require subtraction of a mismatch. The same expression metric was used to calculate expression values from the match probes of the standard match-mismatch Arabidopsis genome array and compared with the results generated by Affymetrix MAS 4.0 software. The correlation coefficients between the Affymetrix match-mismatch algorithm and the match probe only (72nd percentile) algorithm used here were between 0.92 and 0.96 for all the arrays used in this experiment.
To assess the utility of the Arabidopsis array for Brassica gene expression analysis, we estimated the rate of nucleotide substitution between Brassica oleracea and Arabidopsis thaliana. Three full-length Brassica coding sequences belonging to different functional groups and with clear Arabidopsis orthologs were compared using pair-wise Martinez/Needleman-Wunsch alignment (Chlorophyllase I, Arabidopsis = At1g19670, Brassica = GB: AF337544; Ethylene receptor ETR2, A. = At3g23150, B. = EMBL: AB078598; 40S ribosomal protein, A. = At3g02560, B. = EMBL: AF144752). The nucleotide substitution rates between these sequences were 20.5, 19.7, and 20%, respectively, for the coding regions to which the probe oligonucleotides were designed.
Assuming the median of these measurements is close to that for all genes (20% substitution rate or 80% sequence identity), the probability of any 25-bp probe designed to match an Arabidopsis coding region being 100% identical to Brassica can be estimated at $0.8^{25} \approx 0.0038$, so 99.62% of probes will be expected to contain a mismatch. Despite this, 75% of the probes on an array hybridized to Brassica DNA were found to be within 50% of the signal intensity of the identical probes hybridized to Arabidopsis DNA. We found that probes with extensive mismatches, but 12 bp or more of contiguous, identical sequence, were often capable of producing hybridization signals from Brassica genomic DNA that were within 75% of the Arabidopsis values after hybridization (data not shown). Since the 72nd percentile is used to estimate the expression value, it is likely that the expression patterns of closely homologous genes are being monitored in most, if not all, cases.
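The perfect-match estimate above follows from treating each of the 25 positions as independently identical with probability 0.8. A minimal Python sketch of that arithmetic is given here; the independence assumption is a simplification (substitutions are not uniformly distributed along real coding sequences), and the function name is hypothetical.

```python
def perfect_match_probability(identity, probe_length=25):
    """Probability that every base of a probe is identical, assuming each
    position matches independently with probability `identity`."""
    return identity ** probe_length


p_perfect = perfect_match_probability(0.80)
percent_with_mismatch = 100.0 * (1.0 - p_perfect)

print(f"{p_perfect:.4f}")              # 0.0038
print(f"{percent_with_mismatch:.2f}")  # 99.62
```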
Labeling efficiency differences due to the genome size were normalized by subtracting a background measurement (the fifth percentile of the cell values) and then scaling the corrected cell values for each microarray to a target mean of 100. The cell intensity values for each oligonucleotide probe in two replicated arrays hybridized to Arabidopsis ecotype Columbia genomic DNA were then compared. In the first replicate Arabidopsis array, 91.5% of probes had at least 50% of the signal intensity of the probes on the second array. The comparison of Brassica and Arabidopsis genomic DNA hybridized arrays was then used to define a list of useful probes. The intention here is to remove probes from the analysis for which the Brassica target gene is not clearly orthologous to a gene from Arabidopsis, while retaining probes for genes that may have two or more Brassica orthologs for the Arabidopsis gene. Since the arrays contain several redundant probes for each gene, it is possible to remove a substantial number of probes from the analysis and still obtain a robust result. Consequently, all the probes where Brassica genomic hybridization was less than 75% of the signal from Arabidopsis (143 361 probes, or 37.5%) were designated unusable; the remainder were processed using a purpose-written expression analysis algorithm (see "Materials and Methods" section). The figure of 62.5% of probes that hybridized to Brassica genomic DNA with 75% or more of the Arabidopsis signal is close to the 68.7% of probes we estimate to have 12 bp or more of identical contiguous sequence. It is likely therefore that these probes are recognizing related sequences in Brassica with substantial homology, but significant mismatches, to the Arabidopsis gene to which the probes were designed. 
For this reason we cannot attribute the behavior of individual genes in Brassica to the data from probes designed to their direct Arabidopsis orthologs; in most cases the sequence of the Brassica genes themselves is not known. Since the nucleotide level similarity between members of a gene family in Arabidopsis may be greater than the similarity between the Arabidopsis and Brassica orthologs, we can only describe the Brassica gene expression data in terms of the behavior of gene families. However, the ability to do this on a whole-genome scale can still provide insight into the physiology of transcriptional regulation, in the case of gene families that are coregulated.
Figure 1 shows the effect of using only the probes designated "usable" on the correlation of expression values from control genomic DNA hybridization, using a fourth microarray also hybridized to Brassica genomic DNA. Although the average probe number per probeset was reduced from 15 to 9 after removal of the unusable probes, the correlation coefficient between Brassica and Arabidopsis genomic hybridization improved from 0.69 from the unmodified analysis to 0.75 by the use of only the "homologous" probes. These data support the assumption that the expression values we infer from our analysis of microarrays hybridized to Brassica mRNA are likely to represent measurements of the expression level of genes with high nucleotide-level identity to those in Arabidopsis. We compared the reproducibility of RNA detection by comparing the same Brassica RNA sample hybridized to two microarrays, analyzing both arrays with both methods and comparing the correlation between replicates for each method. The correlation in both cases was very high (0.99) with no loss of reproducibility by exclusion of probes. Consequently, we do not expect (and have not observed) the use of a restricted probeset to strongly affect the relative expression values between Brassica RNA samples.
|
|
|
Figure 2 shows that viability of untreated seeds is reduced to half its original value after 4 to 6 d of incubation at 46°C in both cultivars. For primed seeds this reduction is reached after approximately 2 d. Seeds partially dried with heating (shelf-life induction) after priming showed similar longevity to untreated seeds. In summary, primed seed germinated more rapidly but deteriorated more quickly, and this was reversed by the shelf-life treatment of gradual dehydration with heating. These data are fully consistent with previous studies on seed germination responses and indicate that our treatments are equivalent to those in comparable studies (Bruggink et al., 1999; Powell et al., 2000). The responses of germination viability to priming and to shelf-life treatment are both significant at the P < 0.001 level.
Exploratory Analysis: Correlation of Transcriptional Profile with Functional Category
An exploratory analysis was then performed on the results of the expression microarrays from the seed germination experiments. The RNA used for microarray analysis was extracted from Brassica seeds of two genotypes (Lintop and Maverick), which were either untreated or subjected to four successive treatments corresponding to established procedures designed to optimize germination rates. In the data shown in Fig. 3, 4, and 5, point 1 represents untreated seed. Point 2 represents seed treated by partial hydration to 54% moisture or "priming." After this treatment, seeds for point 3 were dried to 6% moisture (priming followed by drying). Seeds for point 4 were treated as for point 2, but instead of drying, they were exposed to the partial dehydration with heating referred to as shelf-life induction (Bruggink et al., 1999). Subsequently, RNA for point 5 (seeds partially hydrated, partially dried with heating [shelf-life induced] and then dried fully) was harvested. The continuous lines connecting the points in Fig. 2 are used only to demonstrate the pattern used to map the gene clusters. Therefore, points 1, 2, and 3 are continuous, as are the points 1, 2, 4, and 5, but there is no continuity between points 3 and 4.
|
|
|
In total, 17 886 probesets met our preliminary filter criteria; we concluded that the remaining genes are likely not expressed in Brassica seed at high enough levels to detect with confidence, or are substantially polymorphic between Arabidopsis and Brassica. This cutoff eliminated most of the probesets with unacceptable noise levels. The expression profiles of all 17 886 remaining transcripts, with values normalized to the median on a per-gene basis, are shown in Fig. 3A for the five treatment points described, to demonstrate the global patterns within the total data set. Values represent the mean of the signal values from two independent replicate microarray experiments. It is apparent from Fig. 3A that there are three clearly visible generic patterns within these lines, an observation that was confirmed using hierarchical clustering analysis, which was consistent with three global clusters of expression patterns (analysis not shown).
Having found three distinct expression patterns that were clearly visible among these genes with detectable mRNA levels in seed, we used clustering of the median-centered data using self-organizing maps to separate the three clusters, which are shown in Fig. 3C. The first, and largest, group of 7731 transcripts (cluster 1) is induced by the hydration step, relatively unchanged by the drying step, repressed by the gradual dehydration (shelf-life induction) step and again relatively unchanged by drying. This behavior is consistent with genes that are involved in the mediation of seed responses to hydration, and whose response to priming is reversed by the gradual dehydration induction treatment. Since this group represents 43% of all the genes detected in seed, the response to seed hydration probably occurs on a genomic scale. The second cluster shows no dramatic alteration in gene expression and serves to provide an indication that the large changes seen in the other two clusters are unlikely to be due to systematic error. The third cluster has almost the inverse response pattern of the first, in that most of the genes are strongly repressed by the hydration treatment. However, any of the genes repressed by hydration in this group are not reversed by a subsequent induction in response to gradual dehydration. Many genes in this cluster change only slightly, but several transcripts are strongly hydration-repressed, by more than an order of magnitude in some cases. Compared with other studies that have, for example, shown that 10% of investigated transcripts respond strongly to light in etiolated seedlings (Tepperman et al., 2001), this result may indicate global changes in the seed or seedling genome during early developmental transitions such as germination and de-etiolation. However, without thorough statistical analysis of a Brassica-specific Affymetrix array, it is not possible to prove this conclusion on a whole-genome scale.
We used the MIPS classification system (http://mips.gsf.de/proj/thal/; verified 11 June 2007) to determine which functional classes of proteins are encoded by the transcripts in each of the three expression pattern clusters. This system has the advantage of providing a putative functional category for most genes in the Arabidopsis genome. Figure 3B shows the breakdown by functional category of all of the detected transcripts. The detected transcripts closely mirror the overall proportion of the genome, with the exception of a reduction in the percentage of unknowns (probably many of these genes are misannotated or expressed at low levels) and an increase in the proportion of metabolic genes (a well-characterized and highly expressed group). The relative abundance of each functional category in each of the coregulated groups is shown as a bar graph (Fig. 3C, right-hand panel). The cluster of genes that are upregulated by the hydration treatment (cluster 1) is enriched in proteins involved in translation and protein synthesis (361 of the detected transcripts fell into this category, 253 of them in cluster 1; significant by chi-squared at $P = 2.5 \times 10^{-25}$). The result that genes related to protein synthesis are transcriptionally activated during hydration was anticipated, since respiration and amino acid metabolism are processes activated very early in seed germination (Bové et al., 2001). They include ribosomal proteins and elongation factors, genes involved in ribosomal RNA biosynthesis, amino acid biosynthesis, protein complex formation and tRNA biosynthesis and assembly. A similar conclusion was reached recently by Soeda et al. (2005) using a Brassica cDNA microarray.
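The enrichment of the translation and protein synthesis category in cluster 1 can be checked with a short Python sketch. The text does not state which variant of the chi-squared test was used, so this example assumes a 2 x 2 contingency test (category membership against cluster membership) without continuity correction; the resulting P value is therefore only expected to be of the same order as the reported one. The cluster and total counts (7731 and 17 886) are taken from the text; the function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


detected, cluster1 = 17886, 7731
category, category_in_cluster1 = 361, 253

a = category_in_cluster1                    # in category, in cluster 1
b = category - category_in_cluster1         # in category, other clusters
c = cluster1 - category_in_cluster1         # other categories, in cluster 1
d = detected - category - c                 # other categories, other clusters

chi2 = chi_square_2x2(a, b, c, d)
p = p_value_1df(chi2)
```

With one degree of freedom the tail probability has a closed form in terms of the complementary error function, which avoids any dependency beyond the standard library.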
Proteins involved in energy production and protein degradation and folding were also overrepresented in this cluster (372 of 752 degradation and folding, P = 5.4 × 10⁻⁴; 118 of 246 energy, P < 0.05). These observations are consistent with previous results at the single-protein or activity level that protease activities (Muntz, 1996) and protein synthesis (Bray, 1995; Bewley, 1997) are activated when seeds are hydrated, but demonstrate that a global effect is seen on genes in these classes.
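The kind of enrichment test reported above can be illustrated with a 2 × 2 chi-squared calculation. This is a sketch under stated assumptions rather than the authors' exact procedure: it uses the Pearson statistic without continuity correction, and the total number of detected transcripts is not given directly, so it is approximated here from the statement that the 7731 transcripts of cluster 1 are 43% of all detected genes. The resulting P values therefore need not match the published ones.

```python
import math

def chi2_2x2(cat_in_cluster, cat_total, cluster_total, detected_total):
    """Pearson chi-squared (1 d.f., no continuity correction) for the
    association between category membership and cluster membership."""
    a = cat_in_cluster                      # in category, in cluster
    b = cat_total - a                       # in category, outside cluster
    c = cluster_total - a                   # outside category, in cluster
    d = detected_total - cluster_total - b  # outside both
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

CLUSTER_1 = 7731
# ASSUMPTION: total detected transcripts estimated from "43% of all detected".
DETECTED = round(CLUSTER_1 / 0.43)

synthesis = chi2_2x2(253, 361, CLUSTER_1, DETECTED)  # translation / synthesis
folding = chi2_2x2(372, 752, CLUSTER_1, DETECTED)    # degradation and folding
print(f"protein synthesis: chi2 = {synthesis[0]:.1f}, P = {synthesis[1]:.1e}")
print(f"degradation/folding: chi2 = {folding[0]:.1f}, P = {folding[1]:.1e}")
```

Under these assumptions the protein synthesis category is far more strongly enriched than the degradation and folding category, which agrees in direction with the reported values.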
The second and third clusters show no strong enrichment in any of the general MIPS functional categories, although the general MIPS categories do not separate storage proteins (discussed below). The third cluster contains fewer protein synthesis components and more plant-specific proteins (a group that includes storage proteins) than the previous two.
Soeda et al. (2005) divide their transcripts into three groups, which correspond to transcripts repressed during priming–germination, transcripts induced during priming, and transcripts induced during the later stages of germination (not investigated here). We also observed transcripts whose levels appear to be induced (cluster 1) and repressed (cluster 3) during the process of seed hydration. The classes of genes in these groups match well those described by Soeda et al. (2005), including global changes in expression of ribosomal proteins, elongation factors, histone genes and a number of transcription factors and metabolic enzymes (see Supplemental Table 1 for details).
Gene by Gene Statistical Analysis Provides Support for Changes in Gene Regulation
The SAM package (Tusher et al., 2001) was used to analyze the significance of the expression values on a gene-by-gene basis. At an FDR of 5%, 833 genes were classed as significant, and these statistically significant probesets were used for further analysis. In contrast, 16 941 genes were called significant at an FDR of 50%. Although the genes called at this high FDR were not used further, this result is evidence that a large number of genes (>8000) change in expression during the process of seed hydration.
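The inference that more than 8000 genes change follows from reading the FDR as the expected proportion of false calls among the genes called significant. A minimal sketch of that arithmetic, using the gene counts reported above (the function name is illustrative, and the calculation treats the nominal FDR as exact, whereas SAM only estimates it):

```python
def expected_true_calls(n_called: int, fdr: float) -> float:
    """Expected number of genuine changes among n_called genes at a given FDR."""
    return n_called * (1.0 - fdr)

strict = expected_true_calls(833, 0.05)      # gene set used for further analysis
relaxed = expected_true_calls(16941, 0.50)   # exploratory, high-FDR gene set

print(f"FDR 5%: about {strict:.0f} of 833 calls expected to be genuine")
print(f"FDR 50%: about {relaxed:.0f} of 16941 calls expected to be genuine")
```

Even if half of the calls at the relaxed threshold are false, the remaining half is still more than 8000 genes.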
The 833 statistically significant Arabidopsis homolog genes included large numbers in the same functional categories as those correlated with particular gene clusters shown in Fig. 3. Fifty-nine of the 833 genes are involved in protein synthesis, folding, modification, destination, translation, or ribosome biogenesis, and strikingly, fifty-two are ribosomal proteins. Six are involved in energy generation, 4 in amino acid biosynthesis, and 50 in general metabolism. These genes and their expression patterns are summarized in Supplemental Table 1.
Storage Protein mRNA Levels Decline in an Irreversible Manner after Hydration Treatment
We broke down the data shown in Fig. 3 into the genes contained within the set of 833 statistically significant genes representing gene families found to correlate with particular clusters (Fig. 4). The mRNAs of the super-abundant seed storage proteins, induced strongly during seed development, are known to persist in dry seeds, forming 50 to 60% of the total mRNA by mass in desiccated soybeans (Goldberg et al., 1981). These mRNA levels are known to decline rapidly during germination; this is thought to be necessary to curtail the synthesis of storage proteins, which are synthesized from these messages in the immediate period after seed imbibition (Kermode and Bewley, 1985; Kermode et al., 1985; Bewley et al., 1989; Misra and Bewley, 1985). The levels of the messages for oleosins, cruciferins, and legumins, and their homologs and isoforms, are displayed in Fig. 4 and Supplemental Table 1. Two oleosin probesets (At4g25140 and At5g51210), one legumin (At2g28680), and two cruciferins (At1g03880 and At4g28520) were found to be statistically significant at the FDR 0.05 level. These messages all decline sharply following hydration treatment, with a more than fivefold reduction in the case of At4g25140, and a more than 10-fold reduction in the case of At4g28520. The sharp decline in oleosin message levels mirrors the 10- to 15-fold induction of these messages observed in Arabidopsis seeds 8 to 10 d after fertilization (Ruuska et al., 2002). Interestingly, the message levels of the storage proteins and oleosins do not return to the values in the dried seed after the gradual dehydration but remain low. The removal of "legacy" transcripts of embryogenesis and seed filling is therefore not reversible by gradual dehydration. These results correlate well with those of Soeda et al. (2005), who also demonstrated that transcripts for storage proteins such as cruciferin, oleosin, and napin, and the late embryogenesis abundant (LEA) transcripts, are repressed by seed priming.
Levels of an Amylase Transcript Are Reduced in Response to Hydration
The mRNAs for amylases are strongly induced in aleurone tissue during cereal seed germination, demonstrating that at least part of the induction of degradative enzymes is controlled at the level of transcription (Rogers, 1985; Livesley and Bray, 1991). Contrary to the behavior of amylase genes in cereal aleurone cells, we did not observe a strong increase in the expression of any messages for alpha- or beta-amylase during seed hydration (Fig. 4 and 5). In fact, we observed a substantial reduction in the level of the mRNA for one highly statistically significant beta-amylase probeset (Arabidopsis gene ID At3g23920) in response to the hydration–priming treatment. This unexpected result was not detected by the limited microarray used by Soeda et al. (2005), and since our microarray experiment used a cross-species hybridization approach, there remained a possibility of error. We therefore validated this observation using RTPCR on the orthologous Brassica transcript (Fig. 5).
Levels of Transcripts for Proteolytic and Lipolytic Enzymes Are Differentially Regulated during Priming
We found that one protease gene (At5g67360, a cucumisin-like serine protease) was significantly induced during the initial hydration treatment, and that two other protease messages (At5g58870, an FtsH protease, and At4g39910, a ubiquitin-specific protease) were significantly repressed. It is expected that not all proteases will be induced, as many proteases have cellular roles other than the mobilization of seed reserves. However, the lack of a globally strong induction of genes in this category could indicate that the general increases in proteolytic activity observed during germination (Muntz, 1996) are mediated partly at the post-transcriptional level.
We observed a strong and statistically significant increase in expression of messages for omega-3 and omega-6 fatty acid desaturases in the partially hydrated seed (At2g29980 and At3g12120). Interestingly, a delta-9 desaturase, At3g15850, was significantly repressed in seeds subjected to the same treatments. The stronger effect on lipid metabolism than on starch metabolism may reflect the balance of lipid to polysaccharide reserves in cruciferous seed, or a tendency to utilize lipid resources preferentially in this phase of germination (lipid filling gene activation occurs in a different phase of seed development than the induction of starch filling transcripts [Ruuska et al., 2002]). While the changes in message levels for the storage proteins themselves (note particularly the behavior of oleosin messages) are not reversed at point 4 (shelf-life induction treatment), the changes in the omega-3 fatty acid desaturase, the beta-amylase, and the serine protease in response to the initial priming hydration treatment are reversed by the shelf-life treatment. Therefore, the removal of "legacy" storage transcripts is not tied to the mobilization of the reserves stored in the products of these genes, and this mobilization may be partly reversed by the shelf-life treatment.
Analysis of Transcriptional Responses of Putative Biomarkers from the Two Cultivars by RTPCR
We selected five Arabidopsis probesets from the microarray experiment as potential biomarkers for the quality of primed and shelf-life treated seed: At4g28520 (cruciferin), At3g02560 (40S ribosomal protein), At4g25140 (oleosin), At2g29980 (omega-3 desaturase), and At3g23920 (beta-amylase). All showed statistically significant changes in expression except for the 40S ribosomal protein, which was chosen for its close sequence similarity to the only available Brassica ribosomal gene, since many other ribosomal proteins were statistically significantly altered in expression. We assayed the expression of the five homologous Brassica genes by quantitative RTPCR. All five genes were present in GenBank and are clearly orthologous to the Arabidopsis probes that gave a strong signal response to seed priming treatments. To address the biological variations between the Brassica cultivars, we again replicated these experiments in another cultivar, B. oleracea L. convar botrytis L. Alef. var. botrytis cv. Lintop, as used for the priming experiment shown in Fig. 1 and for the microarray experiments. We were able to clearly validate the expression patterns of the representative storage protein messages we examined, an oleosin (accession X61937) and a cruciferin (accession AAK07609). Both were substantially reduced by the priming treatment in both cultivars. We were also able to reproduce the unexpected behavior of the beta-amylase gene, using the B. napus ortholog AF319168 of the Arabidopsis gene ID At3g23920 to design the primers. We also investigated the behavior of a fatty acid desaturase, AF056571. We replicated the behavior of this gene in the cultivar Maverick; however, the response pattern in cultivar Lintop was different and, similar to the microarray result (Fig. 4), shows noisy data, possibly a result of allelic variation between the cultivars.
We were unable to reproduce the generalized behavior of the components of the translation machinery using primers designed to a 40S ribosomal protein from B. oleracea (Acc. AF337544) (data not shown). However, the microarray data indicated that ribosomal proteins are induced on a large scale after the priming treatment, and we believe this discrepancy is likely to be attributable to aberrant behavior of the particular Brassica 40S gene selected (the Arabidopsis homolog of which was not one of the 52 that were statistically significant).
| CONCLUSIONS |
|---|
The extent of alteration in mRNA (and presumably, protein) synthesis activated by seed priming, while the process of germination can still be reversed by drying, indicates that priming treatment alone is sufficient to cause genome-scale transcriptional remodeling. We suggest that the effect of the priming–heating–gradual dehydration process on the induction of protein synthesis could be a contributing factor to seed longevity. One effect of hydration treatment that is not reversed by gradual dehydration is the reduction in the abundance of the messages of seed storage proteins, and certain other seed-specific messages. It is well known that the mRNAs for storage proteins persist in large quantities in the viable dried seed and that these messages must be removed before the synthesis of the proteins required for germination can proceed. Since the removal of seed storage protein messages seems to be the strongest effect of partial hydration, and is not fully reversed in either cultivar by slow drying (Fig. 4), the marked difference between the transcript profile of treated and untreated seeds may be the key to the effectiveness of the pregermination treatments described here. Primed seeds, which lack seed-specific transcripts such as those for storage proteins, may, on germination, immediately activate a system of protein synthesis unencumbered by the legacy mRNAs generated during seed development.
The data set we generated using cross-species hybridization controlled by genomic DNA hybridization gave us useful biological insight into the processes potentially controlling the seed hydration viability phenomenon. We were able to show that four of five orthologous Brassica transcripts were in broad agreement with the results of our exploratory microarray, as were the results of small-scale hybridizations with Brassica-specific arrays. We believe therefore that we have described a useful and robust method for exploratory, cross-species use of model-organism microarrays.
| ACKNOWLEDGMENTS |
|---|
Received for publication December 1, 2006.
| REFERENCES |
|---|
-amylase and rRNA genes in barley aleurone protoplasts by gibberellin and abscisic acid. Nature 316:275–277.
-amylase production and protein synthesis by wheat aleurone layers. Ann. Bot. (Lond.) 68:69–73.