Crop Science

Published online 16 July 2007
Published in Crop Sci 47:S-83-S-95 (2007)
© 2007 Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

ORIGINAL RESEARCH

Distribution of Genes, Recombination, and Repetitive Elements in the Maize Genome

Kevin Fengler*, Stephen M. Allen, Bailin Li and Antoni Rafalski

DuPont Crop Genetics Research, DuPont Experimental Station Building E353, Wilmington, DE 19880-0353

* Corresponding author (kevin.a.fengler{at}cgr.dupont.com).


    ABSTRACT
A high-density genetic map of maize (Zea mays L.), integrated with a high-resolution bacterial artificial chromosome (BAC) contig map compiled from two inbred lines, B73 and Mo17, was used to determine gene density and recombination frequency along chromosomes and to compare the distribution of these features as a function of physical distance. Recombination frequency is well correlated with gene density for each chromosome (r = 0.767–0.960), although in centromeric regions, recombination is more depressed than a strict correlation with gene density would indicate. In addition, the distribution of major retrotransposon classes and of Mu and Helitron insertion sites was examined. Overall retrotransposon density is nearly the same on all maize chromosomes (r = 0.996 between retrotransposon count and chromosome physical length), although different classes of retrotransposons display preferential insertion into different chromosome regions. Cinful, Zeon, and Prem are more common near the centromere, while Opie, Ji, and Huck preferentially reside at the subtelomeric segments of chromosomes. Grande does not show preference in distribution relative to physical location on the chromosome. Comparison of the distribution of genic overgo probe hybridization-positive BACs from maize inbred line B73 to those of the inbred Mo17 indicates that nonshared genic segments, many of which correspond to pseudogenes carried by the Helitron elements, tend to be less common in the centromeric region. As previously reported, Mu transposons seem to preferentially insert in the vicinity of genes, as indicated by their paucity in centromeric regions and by their moderate correlation (r = 0.759) with gene density.

Abbreviations: BAC, bacterial artificial chromosome • EST, expressed sequence tag • HICF, high-information-content fingerprinting • LTR, long-terminal repeat • RN, recombination nodule • TIGR, The Institute for Genomic Research


    INTRODUCTION
MEIOTIC RECOMBINATION occurs more frequently in some regions (hot spots) of the eukaryotic genome than in other regions (cold spots), and there is no simple linear relationship between genetic and physical distances (Lichten and Goldman, 1995; Petes, 2001). Because the amount of nuclear DNA (physical map distance) in eukaryotes varies greatly, while the total length of their genetic maps and number of genes are fairly constant, it has been hypothesized that meiotic recombination is largely restricted to genes (Thuriaux, 1977). Indeed, several genic regions in maize (Zea mays L.), including the bronze (bz), a1, and waxy genes, have been identified as recombination hot spots (Civardi et al., 1994; Dooner, 1986; Okagaki and Weil, 1997). Recombination is evenly distributed within the bz gene (Dooner and Martinez-Ferez, 1997), and recombination at one side of bz (gene-dense) is almost two orders of magnitude higher than at the other side of bz, which is gene-poor and contains a large cluster of methylated retrotransposons (Fu et al., 2002). It was concluded that repetitive retrotransposon DNA in maize contributes little to genetic length and recombination breakpoints are mostly located in genic or low-copy, highly conserved regions of chromosomes (Schnable et al., 1998).

At the chromosomal level, it has been observed that recombination is often suppressed near centromeres and elevated near telomeres (Copenhaver et al., 1999; Nachman, 2002; Yan et al., 2005). In general, genes are relatively rare in the centromere regions of eukaryotes. Mapping of wheat (Triticum aestivum L.) expressed sequence tags (ESTs) into chromosome bins indicated that the majority of EST-dense regions are in the distal parts of the chromosomes, and that relative gene density and recombination rate increase with the relative distance from the centromere (Akhunov et al., 2003; Qi et al., 2004). These observations suggest a correlation between recombination and gene density.

To determine and compare the rate of recombination and its correlation with gene distribution along the chromosomes, genetic distance and gene density need to be compared with the physical distance between markers. Genomewide comparison has been difficult in maize due to the lack of an integrated genetic and physical map (genetic markers anchored to a physical map) or whole genome sequences.

Previous attempts to characterize the sequence composition of the maize genome have focused either on specific regions of the genome or on genomic bacterial artificial chromosome (BAC) inserts (Haberer et al., 2005), BAC ends (Messing et al., 2004), and gene-enriched sequences (Palmer et al., 2003; Whitelaw et al., 2003; Yuan et al., 2003) without knowledge of their physical location in the genome. One way to directly examine recombination at the whole genome level and to estimate the physical positions of genetic markers or ESTs is through the cytological maps of crossing over based on recombination nodules (RNs) (Anderson et al., 2006, 2004, 2003). With this system, gene density and recombination rate are found to be lower near the centromere regions. A strong correlation between gene density and recombination rate is observed, although the crossover frequencies for telomeric intervals are much higher than was expected from their EST frequencies.

With the recent availability of an integrated genetic and physical map in maize, with thousands of genetic markers and genic sequences anchored to the genome (BAC contigs), it is possible to compare the distribution of recombination breakpoints, genes, and other elements along the maize chromosomes at a much higher resolution than with cytological maps. Here, we confirm the uneven distribution of genes and recombination along each chromosome and a strong correlation between gene density and recombination frequency. However, the recombination rates near centromeres are lower than the gene density would have predicted. Retrotransposon counts are strictly proportional to chromosome length, although specific classes of retroelements show clear bias toward centromere or telomere regions. By comparing the distribution of shared and nonshared genic overgo probes between BAC contigs of maize inbred lines B73 and Mo17 (with many of the nonshared overgos corresponding to pseudogenes carried by Helitron elements), and the distribution of Mutator insertion sites, we propose that the lower-than-expected frequencies of recombination and of Mutator and Helitron insertions near centromeres may be the result of a shared mechanism, such as a requirement for relatively open chromatin and the formation of double-strand breaks.


    Materials and Methods
Identification of Low Copy and Genic BAC Ends
Maize B73 BAC end sequences were downloaded from NCBI's GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html; verified 2 April 2007). The 473640 sequences (306442474 bp) were deposited in GenBank as part of the Maize BAC End Sequencing Project from the Plant Genome Initiative at Rutgers (http://pgir.rutgers.edu/Sequencing/STMG_Page_7Dec06.html; verified 2 April 2007) and Arizona Genomics Institute (http://www.genome.arizona.edu/fpc/maize; verified 2 April 2007). The sequences were downloaded in December 2004 and processed using cross_match (www.phrap.org/phredphrap/general.html). Files were first masked for Escherichia coli and vector sequences and then masked for repetitive elements using The Institute for Genomic Research's (TIGR) characterized (characterized_02202004.fasta) and uncharacterized (uncharacterized_02202004.fasta) maize repeat datasets (TIGR, Rockville, MD); stringencies are available on request. BAC end sequences with fewer than 150 unmasked base pairs were removed, resulting in a dataset of 217991 sequences (144127458 bp). This dataset was blasted and then clustered against itself, using BioFacet software's version of Blast2 (Gene-IT, Westborough, MA). Clusters were formed based on an alignment window greater than or equal to 150 bp and greater than or equal to 80% identity; clusters with seven or more members were removed. As a final step, remaining BAC ends that had more than 100 consecutive Ns were removed from the dataset. The resulting dataset, the low-copy BAC ends, contained 76217 sequences (45310818 bp).
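The three numeric filters in this paragraph can be expressed as a single predicate. The following is a minimal sketch, not the pipeline actually used: it assumes masked bases are written as X (the usual cross_match screening convention), assumes that the size of each sequence's self-Blast cluster has already been computed elsewhere, and uses function names that are purely illustrative.

```python
import re

MIN_UNMASKED_BP = 150   # sequences with fewer unmasked bp were removed
MAX_CLUSTER_SIZE = 6    # clusters with seven or more members were removed
MAX_N_RUN = 100         # sequences with more than 100 consecutive Ns were removed


def unmasked_length(seq: str) -> int:
    """Count bases that are not masked (assumption: masked bases are 'X')."""
    return sum(1 for base in seq.upper() if base != "X")


def longest_n_run(seq: str) -> int:
    """Return the length of the longest run of consecutive N characters."""
    runs = re.findall(r"N+", seq.upper())
    return max((len(run) for run in runs), default=0)


def is_low_copy(seq: str, cluster_size: int) -> bool:
    """Apply the three thresholds stated in the text to one BAC end.

    cluster_size is a hypothetical input: the number of members in the
    self-Blast cluster that contains this sequence.
    """
    return (
        unmasked_length(seq) >= MIN_UNMASKED_BP
        and cluster_size <= MAX_CLUSTER_SIZE
        and longest_n_run(seq) <= MAX_N_RUN
    )
```

In the text the filters are applied sequentially (masking, clustering, then the N-run check); combining them in one predicate selects the same sequences only if cluster membership is computed on the post-masking dataset, as described.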

Genic content of the low-copy BAC ends was determined by Blast comparison to a dataset of assembled maize EST sequences (66469 sequences, 46860691 bp). The assembled ESTs were filtered via keyword by referencing the top blast hit of each assembled sequence to a list of repetitive element names. The assembled EST dataset was then masked for additional repetitive sequence using cross_match, and the TIGR characterized set polyA was masked. Finally, sequences with fewer than 80 unmasked base pairs were removed, resulting in a dataset of 61346 ESTs (43518765 bp). The low-copy BAC end sequences were blasted against this dataset using BioFacet. BAC ends with Blast hits having an alignment window greater than or equal to 75 bp and greater than or equal to 98% identity were considered genic (8527 BAC ends).

Identification of Major Retrotransposon Families
A set of 473645 B73 BAC ends was scanned for the major retrotransposons Ji, Opie, Prem, Cinful, Grande, Huck, and Zeon as follows. The repeat elements were captured from TIGR's characterized repeat set (characterized_02202004.fasta) by a keyword search (187 sequences, 1036754 bp). These were blasted against the BAC ends using BioFacet. Blast hits with greater than or equal to 70% identity and an alignment window greater than or equal to 200 bp were collected. The highest-scoring repeat/BAC end alignments (based on a 3:1 scoring matrix) were reported and mapped as described above.
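The hit filtering and best-hit selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the BioFacet workflow itself: the `RepeatHit` record, its field names, and the `score` value (standing in for the alignment score from the 3:1 scoring matrix) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RepeatHit:
    bac_end: str         # hypothetical identifier of the BAC end
    family: str          # retrotransposon family, e.g. "Opie"
    align_len: int       # alignment window in bp
    pct_identity: float  # percent identity over the alignment
    score: float         # alignment score (assumed to be precomputed)


def best_family_per_bac_end(
    hits: Iterable[RepeatHit],
    min_identity: float = 70.0,  # threshold stated in the text
    min_len: int = 200,          # threshold stated in the text
) -> Dict[str, str]:
    """Keep hits passing both thresholds; report the top-scoring family per BAC end."""
    best: Dict[str, RepeatHit] = {}
    for hit in hits:
        if hit.pct_identity < min_identity or hit.align_len < min_len:
            continue
        current = best.get(hit.bac_end)
        if current is None or hit.score > current.score:
            best[hit.bac_end] = hit
    return {bac_end: hit.family for bac_end, hit in best.items()}
```

Ties in score are resolved here in favor of the first hit seen; the text does not say how ties were handled.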

Alignment and Ordering of Allelic Contigs
Allelic BAC contigs from the B73 (http://www.genome.edu/fpc/maize/WebAGCoL/WebFPC/) and Mo17 (DuPont, Wilmington, DE) physical maps were identified via shared overgo probes. The two physical maps were compared against each other and further coalesced into "supercontigs" using reciprocal complementary information. Genetic markers anchored to either physical map served to order and orient the contig pairs along each chromosome. The mapped and oriented set of supercontigs consists of 294 allelic pairs that comprise 95.8% (B73) and 88.4% (Mo17) of the fingerprinted bands of their respective physical maps. Based on a genome size of 2365 Mbp for B73 and an average of 4.9 kb per FPC band (Arizona Genomics Institute, 2005), at least 87% of the genome was considered in this analysis.

Determination of Physical Distance, Map Position, and Definition of Bins
Physical distance was computed as a function of the number of fingerprinted bands for contigs positioned on the B73 map. Bands from consecutive contigs were added to assign each band an absolute position along a given chromosome. Physical distance between contigs was assumed to be zero bands. Most of the unassigned contigs are likely to reside near the centromere because of the relative paucity of genetic markers in that region, and physical distance will therefore be underestimated across this region. Genetic markers and overgo probes were given a physical map position based on the midpoint of positive overlapping BAC clones, while BAC ends were placed at the center of the particular BAC. The total physical length of each chromosome was subdivided into 10 equal-size bins. The closest IBM2 framework marker (Cone et al., 2002) to a bin junction was used to calculate the genetic distance covered by each bin. IBM is a multi-meiotic map with expanded genetic distances, relative to a single meiosis map.
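The positioning and binning scheme in this paragraph can be illustrated with a short sketch. It assumes contigs are already ordered along the chromosome and that lengths are given in fingerprinted bands; the function names are illustrative, and the choice to place a feature lying exactly on a bin junction in the higher bin is an assumption, since the text does not specify it.

```python
from typing import List, Sequence

N_BINS = 10  # each chromosome was divided into 10 equal-size physical bins


def contig_offsets(contig_lengths: Sequence[int]) -> List[int]:
    """Start offset (in bands) of each ordered contig.

    Gaps between consecutive contigs are treated as zero bands, as in the text.
    """
    offsets, total = [], 0
    for length in contig_lengths:
        offsets.append(total)
        total += length
    return offsets


def absolute_position(contig_index: int, band_in_contig: float,
                      offsets: Sequence[int]) -> float:
    """Absolute chromosome position of a feature located within a contig."""
    return offsets[contig_index] + band_in_contig


def bin_index(position: float, chromosome_length: float,
              n_bins: int = N_BINS) -> int:
    """Return the 1-based physical distance bin containing a position."""
    if not 0 <= position <= chromosome_length:
        raise ValueError("position lies outside the chromosome")
    idx = int(position * n_bins / chromosome_length)
    return min(idx, n_bins - 1) + 1  # the chromosome end falls in the last bin
```

For example, with three ordered contigs of 300, 500, and 200 bands, a marker whose midpoint is 50 bands into the second contig sits at absolute position 350 and falls in bin 4 of that 1000-band chromosome.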

Counting Shared and Nonshared Genes
Shared genes were defined as low-copy overgo probes that showed positive hybridization to fewer than five contigs and were common to both B73 and Mo17. Nonshared genes were defined by overgo hybridizations to at least two BACs from either B73 or Mo17 (to avoid counting false hybridizations). Some overgos were counted on multiple contigs (3146 = 1 ctg, 573 = 2 ctgs, 108 = 3 ctgs, and 19 = 4 ctgs). To circumvent the problems associated with comparing the supercontigs, only ungapped regions flanked by shared overgos were compared. Although the Mo17 and B73 physical maps were aligned together, overgos have distinct positions on their respective maps and, to be counted, needed to be given positions on the same map. While the shared and nonshared B73 genes could be directly positioned on the B73 map, the Mo17 nonshared genes were placed relative to the shared genes on the B73 map and were counted in the B73 physical distance bins.

Defining the Centromere and Subtelomere Regions
The cores of maize centromeres are known to be composed of a blend of CentC and CRM (Jin et al., 2004). BACs from the Mo17 map were hybridized with probes for these centromere indicators. Three overgo probes were used to detect CentC (ggttccggtggcaaaaactcgtgctttgtatgcaccgaca, gaatgggtgacgtgcgacaacgaaattgcgagaaaccacc, and gttttggacctaaagtagtggattgggcatgttcgttgtg) while a central part of CRM from 940 to 6622 bp labeled with a random primer was used to detect this repeat. Contigs containing BACs positive for both of these centromeric sequences were identified for chromosomes 1, 2, 3, 4, 5, 7, 9, and 10. The physical location for these centromere cores was in close agreement with their genetic position on the IBM2 Neighbors map (Cone et al., 2002). Physical distance bins that contained a centromere core were designated as centromeric bins. For chromosomes 1, 3, 4, 7, and 10, the centromere core was localized to the middle of a physical bin, which made it convenient to pool data from these five chromosomes for analysis. The first and last physical distance bins for each chromosome were designated as subtelomeric bins.

Identification of RescueMu-Flanking Sequences
A set of ~190,000 RescueMu-flanking GSS sequences (Fernandes et al., 2004; Maize Gene Discovery Project, http://www.mutransposon.org/project/RescueMu) were BLASTed against the shared overgo parent sequences to place these insertion events on the physical map. The 505 matching hits were given the physical position of the corresponding shared overgo on the B73 map. In addition, RescueMu-flanking sequences were identified in 802 physically mapped B73 BAC end sequences. However, only 559 of these were present on the B73 agarose physical map used in this study. The remaining 243 BACs were given physical positions on the agarose map by extrapolating their position from the high-information-content fingerprinting (HICF) physical map based on the closest common BACs on the two physical maps.


    Results
Gene Density and Recombination along the Length of Maize Chromosomes
Two approaches were taken to determine the distribution of genes along the physical length of the chromosomes. The first was to identify on the integrated physical–genetic map BACs that hybridize with overgo probes derived from EST unigenes (Gardiner et al., 2004). To eliminate false positive hybridization signals, as well as avoid counting pseudogenes unique to either B73 or Mo17 (Brunner et al., 2005), only overgo probes shared by allelic contig pairs between inbreds Mo17 and B73 were considered. The second approach was to identify and locate on the integrated physical–genetic map BAC ends containing gene sequences. Thus, the chromosomal locations of 21846 genes from 16145 shared overgos and 5701 genic B73 BAC ends were identified (Table 1). This constitutes roughly 37 to 52% of the genic component of maize based on sample sequence estimations from BAC clones (Haberer et al., 2005) and BAC ends (Messing et al., 2004). The distribution of recombination along the chromosomes was determined by identifying the chromosome locations (cM) of genetic markers from the IBM2 map anchored to the physical map. In total, 1117 framework markers were given a precise position on the integrated physical–genetic map (Table 1).



 
Table 1. Distribution of genic sequences identified by overgo hybridization and by analysis of DNA sequences of B73 bacterial artificial chromosome (BAC) ends and of genetic markers anchored to the physical map on maize chromosomes. The physical length of each chromosome was determined by counting the number of fingerprinted contig (FPC) bands of the integrated genetic–physical map. The genetic length of each chromosome is taken from the IBM2 map. Correlation values between genes and recombination were derived by comparing the percentage of total genes and genetic distance in each of 10 equal physical distance bins for each chromosome. The genome-wide correlation between genes and recombination (r = 0.866) was determined by comparing the values from all 100 physical distance bins.

 
The cumulative counts of genes and genetic distances (cM) were plotted against cumulative physical distance to depict gene density and recombination frequency along each chromosome (Fig. 1). In some cases, the plots of genes and recombination follow a very similar path along the chromosome (chromosomes 1 and 6), indicating that gene density and recombination frequency are highly correlated (r = 0.960 and r = 0.916, respectively). However, for all 10 chromosomes, both plots follow a distinctively nonlinear path, indicating uneven distribution of gene density and recombination frequency along the physical length of the chromosomes. The cumulative gene and genetic distance counts at first rapidly increase starting from the top of the short arm but then level off toward the centromere, eventually becoming almost completely flat. Around the centromere, but well beyond the centromere core, in the region encompassing close to the middle one-third to one-fourth of the chromosome, there is little or no recombination. Gene density (gene/unit physical distance) in the subtelomeric regions is 6 to 7.6 times higher than around centromeres. Beyond the broad centromeric region, both curves steadily increase again toward the bottom of the long chromosome arm, at an increasing rate. In some chromosomal regions, a cumulative increase in genes was not matched by an increase in cumulative genetic distance. For example, several chromosomes display a steep increase in cumulative recombination greater than the cumulative gene count in the short arm (chromosomes 4, 5, 7, and 10), or the number of genes increases steadily while recombination does not (chromosomes 2, 3, 5, 8, 9, and 10). Not surprisingly, the plot for chromosome 6, which has a diminutive short arm, is unusually flat at the beginning. Similar observations, although at a lower resolution, were made by Anderson et al. (2004).



 
Figure 1. Distribution of genes and recombination along the chromosomes. The cumulative gene count (small black squares) and centimorgan counts (empty diamonds) were plotted against the physical length, expressed as number of fingerprinted bands for each chromosome. On average, 1000 fingerprinted contig (FPC) bands correspond to 4.9 megabases. Large black squares on the physical distance axis correspond to the location of the centromeres identified on the physical map. For chromosomes 6 and 8, the centromere cores could not be placed in the integrated physical–genetic map, and their genetic positions are based on the IBM2 map and indicated by a large black square on this axis.

 
It was also noted that the location of the centromere on chromosome 8 does not correspond to the recombination minimum of that chromosome. For this chromosome, the centromere core could not be placed on the integrated physical–genetic map by hybridization of centromeric repeats. Therefore, the centromere position was based on the IBM2 genetic map (Cone et al., 2002).

Another way to examine the relationship between genes and recombination is to subdivide each chromosome into 10 equal-size physical distance bins and count the number of genes and centimorgans in each bin. This type of analysis sheds light on the distribution of these features in different chromosomal regions. Overall, gene density and recombination frequency were well correlated (r = 0.767–0.960, Table 1). However, there were significant differences in the number of genes and the amount of recombination along the chromosomes. For example, on chromosome 1, the outermost bins (bin1 and bin10) representing the subtelomere region contained 15.3 and 16.4% of the genes, while the centromere bin (bin5) had only 3.7%. Similarly, the subtelomeric bins of chromosome 1 contained 18 and 21.7% of the recombination compared to 1.1% in the centromere. Similar values were obtained from the pooled data from five chromosomes (Table 2). As expected, fewer genes and less recombination were observed in the centromere, but relative to the number of genes present, there was also proportionally less recombination in the centromere than in the subtelomere region. There were nearly six times more genes per centimorgan in the centromere than in the subtelomere.
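The two quantities used in this paragraph, the per-bin correlation and the genes-per-centimorgan comparison, can be sketched as below. The `pearson_r` helper is a generic Pearson correlation, offered as an assumption about how the reported r values could be reproduced from binned percentages; the worked numbers use only the chromosome 1 percentages quoted in the text, not the pooled Table 2 data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series (e.g. % genes and % cM per bin)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def genes_per_cm_ratio(gene_pct_a: float, cm_pct_a: float,
                       gene_pct_b: float, cm_pct_b: float) -> float:
    """Ratio of genes per cM in region A to genes per cM in region B.

    Percentages of the chromosome total can be used directly because the
    chromosome-wide totals cancel in the ratio.
    """
    return (gene_pct_a / cm_pct_a) / (gene_pct_b / cm_pct_b)


# Chromosome 1 values quoted in the text.
subtel_gene_pct = (15.3 + 16.4) / 2   # mean of bin1 and bin10
subtel_cm_pct = (18.0 + 21.7) / 2     # mean of bin1 and bin10
ratio_chr1 = genes_per_cm_ratio(3.7, 1.1, subtel_gene_pct, subtel_cm_pct)
```

For chromosome 1 alone, `ratio_chr1` comes to roughly 4.2; the nearly sixfold figure in the text refers to the pooled data for chromosomes 1, 3, 4, 7, and 10 in Table 2, which are not reproduced here.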



 
Table 2. Genes and recombination in the subtelomere and centromere regions. The number of genes (shared overgos plus B73 bacterial artificial chromosome [BAC] ends) and genetic length values represent the per-chromosome average of chromosomes 1, 3, 4, 7, and 10. The percentage of total genes and recombination confined to the subtelomere and centromere regions was determined by dividing the average values by the totals for these five chromosomes. Subtelomere values are the average of physical distance bin1 and bin10. Centromere values are taken from the physical distance bin that contains the centromere. Average genetic length is a function of the specific genetic cross in which it was determined, in this case, B73 × Mo17.

 
Distribution of Genic Sequences Unique to Either B73 or Mo17 Inbreds
In addition to genes that were shared between Mo17 and B73, the distribution of nonshared genic sequences, most likely representing pseudogene-carrying Helitrons (Lai et al., 2005; Brunner et al., 2005), was examined as well. There were 4692 nonshared genic sequences that were identified by counting hybridization positives with overgo probes observed in allelic contig pairs (Table S-1).

In general, the distribution pattern of nonshared genic sequences along the chromosomes closely parallels that of shared genes (Table S-1) (r = 0.884–0.981). For example, on chromosome 1 (Fig. S-1), there are more nonshared genes localized to the distal ends of the chromosome (16–18%) than in the centromere region (2%). However, toward the centromere, the number of nonshared genic sequences declines more rapidly than the number of shared genes.

The observed differences in the distribution of overgo hybridization positives shared between the two inbreds versus those specific to either inbred can be expressed as a ratio of shared to inbred-specific genic sequences (Fig. 2). The genic sequence content between the experimental inbreds is most similar near the centromere, as evidenced by a higher ratio of shared to nonshared genic sequences (7.8:1) compared with the subtelomere region (3.4:1), when two telomeric bins are compared with the centromeric bin for chromosomes 1, 3, 4, 7, and 10. Overall, the differences are subtle, except for chromosomes 1 and 9, which have very few nonshared genic sequences in the centromeric regions (Fig. 2).



 
Figure 2. Distribution of Mu-flanking sequences, recombination, and genic similarity along the chromosomes. Genic similarity is defined here as the ratio of the count of shared overgo positives to the count of inbred-specific overgo positives (see also Fig. S-1). The number of shared and inbred-specific overgo positives was counted in each of 10 bins, defined by physical size, along the maize chromosomes. The percentages of total recombination (cM) and the percentages of the total number of Mu-flanking sequences per chromosome were tallied in the physical distance bins.

 
The genomic locations of 1307 RescueMu-flanking sequences (Fernandes et al., 2004) were determined by matches to shared overgo parent sequences (n = 505) and to B73 BAC ends (n = 802). Peculiarly, there was a tendency for RescueMu-flanking sequences to be located distally on the short arm of the chromosomes. Physical distance bin1 contained 25% of the total sequences overall, and as much as 32.9 to 37.1% on individual chromosomes (1, 5, and 10). Overall, the distribution pattern of RescueMu-flanking sequences was similar to that of shared and nonshared genes in that 19.8% of the sequences were located in the subtelomere region and only 1.9% were found in the centromere region for chromosomes 1, 3, 4, 7, and 10. However, the correlation between these sequences and either shared genes or nonshared genes was moderate (r = 0.759 and r = 0.751, respectively) (Table S-2).

Genomic Distribution of Retrotransposons
The numbers of long-terminal repeat (LTR) retrotransposons belonging to major families identified in B73 BAC ends (Table 3) were similar to previous findings (Messing et al., 2004). Between 50 and 65% of the BAC ends identified for each retrotransposon family were present on genetically mapped B73 BAC contigs and included in this analysis, comprising a total of more than 120000 mapped elements. Occurrences of each retrotransposon type were tallied for each of the chromosomes (Table 3). Overall, the retrotransposon counts are strongly correlated (r = 0.995) with physical map length of the chromosomes. To determine the distribution of retrotransposons along the chromosomes, the number of occurrences of each type was counted separately in the physical distance bins. Three distinct general patterns of LTR-retrotransposon distribution emerged from this analysis: (i) centromere-preferred: relatively low abundance in the subtelomere region with a gradual increase in abundance that culminates near the centromere (Prem, Cinful, and Zeon); (ii) centromere-depleted: relatively low abundance in the centromere with a gradual increase in abundance that culminates in the subtelomere region (Opie, Ji, and Huck); and (iii) uniform distribution along a chromosome (Grande). Chromosome 1 (Fig. 3) displayed results typical of the other chromosomes (Table S-4). The uneven distribution of certain retrotransposon types between gene-rich (subtelomere) and gene-poor (centromere) regions is further demonstrated by combining data from several chromosomes (Fig. 4).
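The three patterns named above can be made concrete by comparing a family's share of elements in the centromere bin with its average share in the two subtelomere bins. The sketch below is only an illustration: the text does not state a numerical rule for assigning families to patterns, so the `tolerance` threshold and the function name are assumptions, and the example counts in the usage note are invented.

```python
from typing import Sequence


def classify_distribution(bin_counts: Sequence[int], centromere_bin: int,
                          tolerance: float = 0.25) -> str:
    """Label a per-bin count profile as one of the three patterns in the text.

    bin_counts: element counts in consecutive physical distance bins.
    centromere_bin: 1-based index of the bin containing the centromere.
    tolerance: hypothetical margin; shares within +/-25% count as uniform.
    """
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("no elements to classify")
    shares = [count / total for count in bin_counts]
    subtelomere = (shares[0] + shares[-1]) / 2   # mean of first and last bin
    centromere = shares[centromere_bin - 1]
    if centromere > subtelomere * (1 + tolerance):
        return "centromere-preferred"
    if centromere < subtelomere * (1 - tolerance):
        return "centromere-depleted"
    return "uniform"
```

With invented counts, a profile such as `[5, 7, 9, 12, 17, 14, 12, 10, 8, 6]` with the centromere in bin 5 is labeled centromere-preferred, whereas a flat profile is labeled uniform.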



 
Table 3. (A) Numbers of retrotransposons belonging to major LTR-retrotransposon families identified in BAC end sequences, and the total placed on the B73 integrated genetic and physical map. (B) Distribution of seven major retrotransposons across the maize genome.

 


 
Figure 3. Distribution of retrotransposons along chromosome 1. Relative abundance of each retrotransposon on chromosome 1 in physical distance bins (each spanning 10% of the chromosome). Bin1 and bin10 constitute the subtelomere region and bin5 contains the centromere. (A) centromere-preferred distribution, (B) centromere-depleted distribution, (C) uniform distribution.

 


 
Figure 4. Distribution of seven major retrotransposons. Percentage of each retrotransposon type localized to the subtelomere or centromere regions. Values represent the pooled average for chromosomes 1, 3, 4, 7, and 10. Subtelomere values are the average of physical distance bin1 and bin10. Centromere values are taken from the physical distance bin that contains the centromere.

 

    Discussion
An integrated genetic, physical, and genic map, with genetic markers and genic sequences placed on BAC contigs, was used for high-resolution comparison of the distribution of genes, transposons, and retrotransposons, as well as of recombination frequency along maize chromosomes. All of the features studied, at the chromosomal level, are highly correlated with the physical length of each chromosome when examined as a percentage of the genome total: recombination (r = 0.913), shared overgo-detected genes (r = 0.967), unique overgo-detected genes (r = 0.894), genic BAC ends (r = 0.942), Mu-flanking sequences (r = 0.948), and the cumulative count of retroelements (r = 0.996). In contrast, most of these features display a distinctly nonuniform distribution along each chromosome.

The distributions of both genes and crossovers are not uniform. Gene density and recombination frequency are much higher near telomere regions than near the centromere regions (Fig. 1). These results are consistent with lower-resolution cytogenetic (RN-cM map)–based analysis (Anderson et al., 2004; Anderson et al., 2003). A much better correlation between gene density and recombination frequency was observed in our study than in the cytogenetically based work (R² = 0.866 vs. R² = 0.47). This difference is likely due to the 18-fold larger number of genes mapped and the more precisely estimated physical and genetic distances between markers used in this study. This result strongly supports the hypothesis that recombination occurs mostly in genic or low-copy regions (Fu et al., 2002; Schnable et al., 1998; Thuriaux, 1977).

Although there is a close correlation between gene density and recombination frequency along each chromosome in maize, not all genic regions are equally active in recombination. The recombination frequency is lower near centromere regions and higher near telomere regions than would be predicted by a perfect correlation with gene density (Table 2). The centromeric regions are almost completely silent recombinationally but contain a significant number of genes (Fig. 1 and 2, Table 2). Similarly, in rice (Oryza L.), a recombination-free region spanning the centromere of chromosome 8 was observed, but the lack of recombination could not be explained by the lack of genes (Yan et al., 2005). It has been proposed that genic regions may adopt a less-condensed chromatin configuration and therefore be more accessible to the recombination machinery during meiosis (Fu et al., 2002). The heterochromatic regions near the centromere are therefore highly condensed despite the presence of genes. The recombination suppression near the centromere could also be an epigenetic feature (Yan et al., 2005), and some of the genes present there may be silenced. At present we have little information about the expression levels of the maize genes in centromeric regions. In rice it has been demonstrated that genes embedded within centromeric regions are transcribed normally and maintain euchromatic histone modification patterns (Yan et al., 2006). If the relationship between gene density and recombination is causal, it can also be postulated that a certain minimum gene density is required before recombination is enabled, and that saturation is reached with increasing gene density, which would explain departures from a perfect relationship.

The recent development of sperm-typing and linkage disequilibrium-based analyses in humans has made it possible to study recombination rates at a very fine scale (Lynn et al., 2004; Kauppi et al., 2004; McVean et al., 2004; Myers et al., 2005). Sperm-typing experiments provide direct estimates of recombination rates in small physical regions. Regions surveyed by this technique show extensive heterogeneity in fine-scale recombination rates, with hot spots clustered in narrow regions (1–2 kb) that are surrounded by recombinationally inert DNA sequences (Kauppi et al., 2004). Close to 80% of the recombination occurs in 10 to 20% of the sequence, and more than 25,000 recombination hot spots were identified in the human genome (Myers et al., 2005). These recombination hot spots seem to occur preferentially near genes but outside the transcribed domains. To what extent large-scale differences in recombination rate are due to differences in the number of hot spots or to differences in their intensity is unknown. Local variation in recombination rate over small physical intervals has also been observed in maize (Fu et al., 2002) and Arabidopsis (Singer et al., 2006). Although teosinte glume architecture (tga1) is located near the centromere of maize chromosome 4, where there is very little recombination, enough recombination breakpoints were identified in a moderate-size population to delimit the tga1 locus to a 1-kb interval during its positional cloning (Wang et al., 2005). This demonstrates that despite clear overall patterns of recombination along the length of the chromosome, extreme local variations in recombination rate may be common in maize. Whether the fine-scale pattern of recombination in maize and other plant species is similar to that in humans is not addressed by our work and remains to be determined.

In each of the inbred lines, B73 and Mo17, approximately 12 to 13% of the genes detected by hybridization were not detected in the syntenic position in the other inbred line. Many such nonallelisms are likely to represent pseudogene-carrying Helitrons (Brunner et al., 2005; Morgante et al., 2005), although some may be true genic differences between inbreds (Lai et al., 2004; Xu and Messing, 2006). Interestingly, the proportion of nonshared (B73- or Mo17-specific) genic loci is lower near the centromeres than in the subtelomeric regions (Fig. 2 and 5, Table S-3). The distribution of shared genes between the subtelomeric and centromeric regions differs significantly from that of nonshared genes (χ² = 21.17, p ≤ 0.001) (Table S-3). The reason for this difference is not known, although two difficult-to-distinguish mechanisms could be proposed: a preference of the Helitron elements for insertion in the telomeric regions, or a reduction in diversity in recombination-poor regions as a result of background selection, a phenomenon that has been predicted theoretically and observed in a number of species (Begun and Aquadro, 1992; Dvorak et al., 1998; Stephan and Langley, 1998). The effect of background selection on diversity has been investigated in maize (Tenaillon et al., 2002), but it was observed only for rapidly evolving markers (simple sequence repeats) and not for single nucleotide polymorphisms. The Helitrons have been active in the maize genome recently, and, therefore, the signature of background selection may still be observable.
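
A comparison of this kind can be sketched as a test on a 2 × 2 table of gene class (shared, nonshared) by region (subtelomere, centromere). The following minimal Python example assumes a Pearson χ² test without continuity correction and one degree of freedom; the exact test design used for Table S-3 is not restated here, so this is an assumption, and the counts in the example are hypothetical placeholders rather than values from Table S-3.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
        [[a, b],
         [c, d]]
    e.g. rows = shared / nonshared genes, columns = subtelomere / centromere.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    With 1 df the statistic is the square of a standard normal variate,
    so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return erfc(sqrt(chi2 / 2.0))


# HYPOTHETICAL counts, for illustration only (not taken from Table S-3).
shared_subtel, shared_centro = 400, 200
nonshared_subtel, nonshared_centro = 90, 15

stat = chi_square_2x2(shared_subtel, shared_centro,
                      nonshared_subtel, nonshared_centro)
print(f"chi-square = {stat:.2f}, p = {p_value_df1(stat):.2g}")
```

Under the same one-degree-of-freedom assumption, `p_value_df1(21.17)` falls well below 0.001, in line with the threshold reported above.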


Figure 5. Distribution of genes, transposons, and recombination. Values represent the pooled average for chromosomes 1, 3, 4, 7, and 10. Subtelomere values are the average of physical distance bin1 and bin10. Centromere values are taken from the physical distance bin that contains the centromere.
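
The summary described in the caption can be sketched as follows. This is a minimal Python illustration that assumes ten physical-distance bins per chromosome and interprets "pooled average" as the simple mean of the per-chromosome values; both the interpretation and the numbers below are assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import mean


def region_summary(bin_values, centromere_bin):
    """Return (subtelomere, centromere) values for one chromosome.

    bin_values     -- one value per physical-distance bin, ordered bin1..bin10
    centromere_bin -- 1-based index of the bin that contains the centromere
    """
    if not 1 <= centromere_bin <= len(bin_values):
        raise ValueError("centromere_bin is outside the range of bins")
    # Subtelomere: average of the first and last bins (bin1 and bin10)
    subtelomere = (bin_values[0] + bin_values[-1]) / 2
    # Centromere: the single bin that contains the centromere
    centromere = bin_values[centromere_bin - 1]
    return subtelomere, centromere


def pooled_summary(chromosomes):
    """Average the per-chromosome summaries (ASSUMED meaning of 'pooled')."""
    summaries = [region_summary(values, cen) for values, cen in chromosomes.values()]
    return mean(s[0] for s in summaries), mean(s[1] for s in summaries)


# HYPOTHETICAL per-bin percentages for two chromosomes.
example = {
    "chr1": ([12.0, 10.0, 9.0, 6.0, 4.0, 5.0, 8.0, 10.0, 11.0, 14.0], 5),
    "chr3": ([11.0, 9.0, 7.0, 3.0, 5.0, 8.0, 9.0, 10.0, 12.0, 13.0], 4),
}
subtelomere_avg, centromere_avg = pooled_summary(example)
print(subtelomere_avg, centromere_avg)
```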

 
It is well documented that insertions of the Mutator transposon, or its derivatives, are biased toward low-copy or genic regions (Bennetzen and Freeling, 1993; Cresse et al., 1995; Fernandes et al., 2004). In proportion to shared genes, RescueMu insertions, like putative Helitron insertions, seem to be less frequent near the centromere than near the telomere, relative to what gene density would predict (Fig. 3). Moreover, RescueMu insertions are more strongly excluded from centromeric regions than shared genes (χ² = 5.63, p ≤ 0.025) and behave similarly to nonshared genes (Table S-3). These observations appear to indicate that recombination and transposon insertion share a common mechanism (e.g., formation of double-strand breaks), that they are similarly affected by chromatin structure, or that transposon insertions occur during recombination.

Large differences in the distribution of retrotransposons along the chromosomes were found (Fig. 4). Only 6% of Cinful, Zeon, and Prem elements reside in the subtelomere region, while up to 20% of these elements can be found in the centromere. The CRR retrotransposons of rice and the related CRM elements of maize have been reported to have an even stronger preference for the centromeric repeat regions of the genome (Nagaki et al., 2005). The opposite trend is observed for Opie, Ji, and Huck, which have ~11% of occurrences in the subtelomere and ~5% of occurrences around the centromere, similar to the rice Tos17 element, which has a genic region preference (Miyao et al., 2003). Among the elements studied here, only Grande displays an equal distribution, as the subtelomere and centromere regions each contain 10% of this element. Yet these two contrasting regions of the chromosome harbor similar total numbers of retrotransposons. The distribution pattern does not coincide with the class of retrotransposon: Ji, Opie, and Prem are Ty1/copia-like and the others are Ty3/gypsy-like. Such drastic differences in insertion site preference nevertheless seem to imply mechanistic differences in retrotransposition. Differences in the sequence preferences of the nicking endonuclease have been implicated in the specificity of L1 insertion (Cost et al., 2001).


    ACKNOWLEDGMENTS
 
The authors thank Evgueni Ananiev for sharing hybridization data from centromeric probes and Ying Zhang for bioinformatics support. Michele Morgante and Scott Tingey contributed generous advice and ideas. Howie Smith and Barbara Mazur read the manuscript and helped improve it.

Received for publication November 28, 2006.


    REFERENCES




