Abstract

LTR retrotransposons may be important contributors to host gene evolution because they contain regulatory and coding signals. In an effort to assess the possible contribution of LTR retrotransposons to C. elegans gene evolution, we searched upstream and downstream of LTR retrotransposon sequences for the presence of predicted genes. Sixty-three percent of LTR retrotransposon sequences (79/124) are located within 1 kb of a gene or within gene boundaries. Most gene-retrotransposon associations were located along the chromosome arms. Our results are consistent with the hypothesis that LTR retrotransposons have contributed to the structural and/or regulatory evolution of genes in C. elegans.

Introduction

The relative abundance of transposable elements (TEs) in eukaryotic genomes varies considerably among species. For example, an estimated 3% of the S. cerevisiae genome (Kim et al. 1998) and 6% of the C. elegans genome (C. elegans Sequencing Consortium 1998; Kidwell 2002) are composed of TEs, whereas TEs make up larger fractions, up to 90%, of the genomes of many higher eukaryotes (e.g., Drosophila, 10% to 20% [Adams et al. 2000; Hoskins et al. 2002; Kaminker et al. 2002]; Arabidopsis, 10% [The Arabidopsis Genome Initiative 2000]; Mus, 37% [Smit 1999; Waterston et al. 2002]; Homo sapiens, 43% [Li et al. 2001]; Pinus, 90% [Flavell 1986; Pearce et al. 1996]). For many years, TEs have been viewed as either neutral or deleterious components of genomes (e.g., Orgel and Crick 1980; Charlesworth, Sniegowski, and Stephan 1994). According to this view, TEs located in or near genes (“gene” as used in this paper refers to the transcriptional unit, including introns and exons) are likely to be detrimental to gene function and will be removed by natural selection. Alternatively, TEs may be beneficial to genes and contribute to adaptive evolution (e.g., McDonald 1993, 1995; Brosius 1999; Kidwell and Lisch 2001).

Genomic sequence analysis has proved to be a useful tool in efforts to understand the possible adaptive significance of TEs in gene and genome evolution. One group of TEs, the retrotransposons, has been studied in this regard. Retrotransposons are the most abundant group of TEs in the human genome and have a life cycle analogous to that of infectious retroviruses (Boeke et al. 1985). Retrotransposon sequences are transcribed by host transcription complexes, and these transcripts are reverse transcribed by element-encoded reverse transcriptase (RT). As a consequence, retrotransposons contain many cis-regulatory components typical of eukaryotic genes, including promoter and enhancer sequences as well as termination and polyadenylation signals (fig. 1). The effects of these regulatory sequences are not always limited to the retroelements in which they are contained; they may also influence the expression of adjacent genes (e.g., Kapitonov and Jurka 1999; Mager et al. 1999; Baust et al. 2000; Llorens and Marin 2001; Medstrand, Landry, and Mager 2001; Stokstad 2001; Jordan et al. 2003). In addition to regulatory effects, retrotransposons may also contribute to the coding regions of genes. For example, in a preliminary study of the human genome, Nekrutenko and Li (2001) found that about 4% of human genes have a retrotransposon component in the coding region. Thus, retrotransposons are a significant source of regulatory and coding region variation and a potentially important factor in gene evolution.

To date, whole-genome analyses of the impact of retroelement sequences on gene structure and function have been limited primarily to the human genome (e.g., Nekrutenko and Li 2001; Medstrand, van de Lagemaat, and Mager 2002; Jordan et al. 2003). In this paper, we report the results of a comprehensive genomic study of the contribution of retrotransposon sequences to gene structure and function in the nematode C. elegans. Seventy genes are located within 1 kb of a retrotransposon sequence, that is, within regions believed to be capable of exerting cis-regulatory effects on C. elegans gene expression (McGhee and Krause 1997). An additional 40 genes contain a retrotransposon sequence within their boundaries, that is, in their exons or introns. Further, we show that the number of retrotransposon sequences located 500 to 1,000 bp upstream of genes is greater than expected by chance. Our results are consistent with the hypothesis that retrotransposons have contributed to the evolution of gene structure and function in C. elegans.

Methods

Data Collection

A flat file annotating the chromosomal position of each retrotransposon was created for use with the Wormbase Genome Browser (www.wormbase.org/db/seq/gbrowse [Stein et al. 2002]) using a previously defined data set of 124 LTR retrotransposons in C. elegans (Ganko, Fielman, and McDonald 2001). Next, the 5,000-bp regions upstream and downstream of each retrotransposon were searched visually in the Genome Browser for the nearest predicted gene. For each association, the distance was recorded from the retrotransposon coordinate closest to the gene to the nearest open reading frame (ORF) coordinate of that gene. Where more than one gene was located within 5 kb upstream or downstream of a given retrotransposon, only the most proximal gene on each side was scored as an association. The 5-kb window size was chosen on the basis of estimates of average intergenic distances in the C. elegans genome and potential regulatory region size (C. elegans Sequencing Consortium 1998).
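The search was performed visually, but the scoring rule can be stated programmatically. The following is a minimal Python sketch, assuming gene and TE coordinates for a single chromosome given as 1-based inclusive intervals and measuring distance as the coordinate difference between the facing ends. The Feature class, the flanking_genes function, and the example coordinates are hypothetical and are not part of the original workflow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WINDOW = 5_000  # bp searched on each side of a TE, as in the text


@dataclass
class Feature:
    """A TE or gene on one chromosome (hypothetical record; 1-based, inclusive)."""
    name: str
    start: int
    end: int


Hit = Optional[Tuple[Feature, int]]  # (gene, distance in bp) or None


def flanking_genes(te: Feature, genes: List[Feature], window: int = WINDOW):
    """Score the genes associated with one TE.

    Returns (internal, left, right): the genes overlapping the TE, plus the single
    closest gene on each side whose nearest coordinate lies within `window` bp.
    """
    internal = [g for g in genes if g.start <= te.end and g.end >= te.start]
    # Distance runs from the TE end facing the gene to the gene's nearest coordinate.
    left = [(g, te.start - g.end) for g in genes
            if g.end < te.start and te.start - g.end <= window]
    right = [(g, g.start - te.end) for g in genes
             if g.start > te.end and g.start - te.end <= window]
    # Only the most proximal gene on each side is kept.
    nearest_left: Hit = min(left, key=lambda hit: hit[1], default=None)
    nearest_right: Hit = min(right, key=lambda hit: hit[1], default=None)
    return internal, nearest_left, nearest_right


# Hypothetical coordinates for illustration only.
genes = [Feature("geneA", 1_000, 3_000), Feature("geneB", 3_500, 4_000),
         Feature("geneC", 9_500, 12_000), Feature("geneD", 20_000, 21_000)]
internal, left, right = flanking_genes(Feature("Cer_x", 4_400, 5_000), genes)
print([g.name for g in internal], left[0].name, left[1], right[0].name, right[1])
```

Here geneB (400 bp away) is kept on the left rather than geneA, and geneD is ignored because it lies beyond the 5-kb window.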

Information regarding the function, expression, and homologs of each gene was collected from various sources. For most genes, information on function and size was available from NCBI and Wormbase gene reports (Spring 2002 data releases). EST data were obtained through BLAST searches of the NCBI “est” database. Exon boundaries were based on reports in Wormbase and NCBI. Conserved domains were predicted with the NCBI CDD-Conserved Domain Database. C. briggsae homology data were obtained from Wormbase (WABA predictions, Kent and Zahler 2000) and directly from the Washington University C. briggsae BLAST server (http://genome.wustl.edu/projects/cbriggsae, Spring 2002 data). MacVector version 7.0 was used to annotate and collate gene information from all sources and to provide graphical representations of gene/retrotransposon association regions.

Statistical Analysis

The goal of the statistical analysis was to determine whether the distribution of TEs in the genome deviates from the random expectation and, in particular, whether TEs tend to lie near genes. We test two null hypotheses: (1) the location of TEs follows a uniform distribution in the nongenic genome (the term “nongenic genome” refers to the nontranscriptional regions upstream and downstream of genes), and (2) the location of TEs follows a uniform distribution throughout the entire genome.

To test the first hypothesis, we define windows of length 1,000 bp upstream and downstream of each gene. By construction, each window contains only nongenic genome: the window is shortened if the distance to the next gene is less than 1,000 bp. A TE is considered to lie in the window if its end nearest the gene falls within the window. The following discussion is framed in terms of a window at the 5′ end of the gene, but identical arguments apply to the 3′ window.

Under the null hypothesis, the probability p that a particular TE is located in an upstream window is simply the length of nongenic genome within 1,000 bp of a 5′ end divided by the total length of nongenic genome. The probability that x out of N total TEs are located in upstream windows is then given by the binomial distribution:

$$P(x) = \binom{N}{x}\, p^{x} (1-p)^{N-x}. \qquad (1)$$

The use of the binomial distribution assumes (1) that the initial insertion point and subsequent survival of each TE are independent of other TEs and (2) that the probability of insertion in a window and subsequent survival is the same for all TEs. Because the density of TEs relative to the entire genome is low, these assumptions seem reasonable. Given the observed number x of TEs located in the window, the probability under the null hypothesis can be calculated using equation (1).
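Equation (1) gives the probability of exactly x TEs in the windows; an excess of TEs is usually judged from the upper tail, P(X ≥ x). The following minimal Python sketch assumes a one-sided upper-tail test, since the text does not state whether point or tail probabilities were reported. The function names are hypothetical.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Equation (1): probability that exactly x of n TEs lie in the windows."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x): probability of observing x or more TEs in the windows by chance."""
    return sum(binomial_pmf(k, n, p) for k in range(x, n + 1))
```

In use, n would be the number of TEs tested (124 here), p the window fraction of nongenic genome, and x the observed count, for example `upper_tail(x_observed, 124, p)`.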

The quantity p was calculated as

$$p = \frac{1}{G}\sum_{j} I_{j,j+1}, \qquad (2)$$
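where j is over all genes, G is the total nongenic genome length, and I_{j,j+1} is given by

$$I_{j,j+1} = \min\left(1000,\ d_{j,j+1}\right),$$

with d_{j,j+1} the length of the nongenic region separating gene j from gene j+1, so that the window is shortened when the neighboring gene is closer than 1,000 bp.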
One complication arises when a TE is upstream of two different genes, that is, when it lies between two divergently transcribed genes. Equation (2) does not consider the orientation of genes and overcounts the amount of genome within the 1,000-bp window of the 5′ end. In such cases, the TE was counted only once, which makes the test conservative.

We also consider a window consisting of nongenic genome that lies between 500 and 1,000 bp from the nearest 5′ end. The quantity p for this window is found by repeating the calculation in equation (2) with 500 bp instead of 1,000 bp and subtracting the result from the original p. To keep the calculation conservative, a TE is not counted as being in the window if it is upstream of two genes. The calculations for the second null hypothesis defined above are carried out in a similar manner, with the obvious modifications.
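The window fractions can be computed directly from gene coordinates. The following Python sketch rests on several assumptions: genes are non-overlapping intervals with known strand on a single chromosome, G is the chromosome length minus genic bases, and windows that overlap (for example, between two divergently transcribed genes) are merged so that each base is counted once. Merging windows is a swapped-in alternative to the count-the-TE-once correction described above, not a reproduction of equation (2). All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical gene record: (start, end, strand), 1-based inclusive coordinates.
# Genes are assumed to lie on one chromosome and not to overlap one another.
Gene = Tuple[int, int, str]
Interval = Tuple[int, int]


def upstream_windows(genes: List[Gene], chrom_len: int, w: int) -> List[Interval]:
    """5' windows of up to w bp, shortened where they reach the neighboring gene."""
    genes = sorted(genes)
    windows = []
    for i, (start, end, strand) in enumerate(genes):
        if strand == "+":  # 5' end is the start coordinate; window lies to the left
            limit = genes[i - 1][1] + 1 if i > 0 else 1
            lo, hi = max(start - w, limit), start - 1
        else:  # minus strand: 5' end is the end coordinate; window lies to the right
            limit = genes[i + 1][0] - 1 if i + 1 < len(genes) else chrom_len
            lo, hi = end + 1, min(end + w, limit)
        if lo <= hi:
            windows.append((lo, hi))
    return windows


def covered_length(intervals: List[Interval]) -> int:
    """Bases covered by closed intervals, counting overlapping stretches once."""
    total = 0
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi + 1:
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


def window_probability(genes: List[Gene], chrom_len: int, w: int) -> float:
    """p: nongenic bases within w bp of a 5' end divided by all nongenic bases (G)."""
    nongenic = chrom_len - sum(end - start + 1 for start, end, _ in genes)
    return covered_length(upstream_windows(genes, chrom_len, w)) / nongenic


def annulus_probability(genes: List[Gene], chrom_len: int) -> float:
    """p for the 500-1,000 bp window: the 1,000-bp result minus the 500-bp result."""
    return (window_probability(genes, chrom_len, 1_000)
            - window_probability(genes, chrom_len, 500))
```

Because every 500-bp window lies inside the corresponding 1,000-bp window, the subtraction in `annulus_probability` yields the fraction of nongenic sequence in the 500 to 1,000 bp band, mirroring the subtraction described in the text. For a whole genome, covered and nongenic lengths would be summed over chromosomes before dividing.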

Results

Many Retrotransposon Sequences Are Closely Associated with Genes in C. elegans

The search for potential host gene/retrotransposon associations began with a defined data set of 124 C. elegans LTR retrotransposons (Ganko, Fielman, and McDonald 2001). One-third of the retrotransposons in C. elegans are gypsy-like elements (families Cer1 to Cer6); the remaining elements are members of the Bel clade of retrotransposons (families Cer7 to Cer20). On average, full-length Bel-like elements are larger than gypsy-like elements, and Bel-like fragments are more numerous in the C. elegans genome (Ganko, Fielman, and McDonald 2001). The 82 Bel-like element sequences constitute 356,195 bp (83%) of the retrotransposon component of the C. elegans genome, whereas the 42 gypsy-like element sequences constitute only 71,728 bp (17%).

The retrotransposon data set was used to create an annotation file readable by the Wormbase genome browser (Stein et al. 2002), which was used to visualize the location of retrotransposons, genes, and other genomic features within a given chromosomal region. Analysis of genomic sequence from a 5-kb window on either side of each retrotransposon identified 190 gene/retrotransposon associations (tables 1 and 2). Forty retrotransposon sequences were associated with a single gene, and 75 were associated with genes both upstream and downstream of the element (40 + 2 × 75 = 190 associations). Only nine retrotransposon sequences were not located within 5 kb of any gene.

Solo LTRs are the most abundant retrotransposon sequence in the C. elegans genome (Ganko, Fielman, and McDonald 2001), and we found them to be the retrotransposon sequence most frequently associated with genes. However, there was no detectable bias for or against the location of fragments or full-length elements near genes (table 3). The chromosomal distribution of gene/element associations is correlated with the overall distribution of retrotransposon sequences in the genome; that is, most retrotransposon sequences (Ganko, Fielman, and McDonald 2001) and most gene/element associations are located along the chromosome arms (fig. 2).

Most C. elegans cis-regulatory regions have been shown to extend approximately 1 kb upstream of transcriptional start sites (McGhee and Krause 1997). We find 70 instances in which a retrotransposon sequence lies within 1 kb of a gene (table 1 and fig. 3A and B). The number of retrotransposon sequences located within 1 kb upstream of genes is significantly greater than expected (P < 0.025). Further examination of the 1-kb upstream window revealed that retrotransposon sequences are overrepresented 500 to 1,000 bp upstream of genes (P < 0.0029). This enrichment is notable: approximately 4% of the C. elegans genome lies within this 500- to 1,000-bp band of intergenic sequence near genes, whereas 12% of Cer elements are found within the same region.

We also investigated the strand orientation of each retrotransposon in relation to its associated gene. Sense and antisense associations were equally abundant: in 97 associations, the gene and the retrotransposon sequence are in the same orientation (sense configuration), and in the remaining 93 associations, they are in opposite orientations (antisense configuration). This nearly 50/50 ratio holds for all upstream and downstream associations in which the gene and element are 4,000 bp or less apart. Of the nine associations in which the gene and element are more than 4,000 bp apart, eight are in the sense configuration.
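One way to check the claim that sense and antisense associations are equally abundant is an exact two-sided sign test of 97 sense versus 93 antisense associations against a 50/50 expectation. The following is a minimal Python sketch; the choice of test and the function name are assumptions made for illustration rather than the analysis used in the study.

```python
from math import comb


def two_sided_sign_test(k: int, n: int) -> float:
    """Exact two-sided binomial test of k 'successes' in n trials against p = 0.5.

    Sums the probabilities of every outcome no more likely than the observed one.
    """
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# 97 sense versus 93 antisense associations (190 in total), as reported above.
print(two_sided_sign_test(97, 190))
```

For these counts the function returns a P value well above 0.5, consistent with the near 50/50 ratio described above.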

Associated Gene Function and Homology

Functional information for each gene associated with a retrotransposon was analyzed to assess whether the predicted genes are genuine. Several studies have addressed the quality of gene identification and prediction in C. elegans (e.g., Harrison, Echols, and Gerstein 2001; Reboul et al. 2001; Mounsey, Bauer, and Hope 2002). The consensus of these studies is that 80% to 90% of predicted C. elegans genes are “real” or functional, whereas the remainder are likely pseudogenes or false predictions. We find that 125 of the 190 genes associated with retrotransposon sequences have one or more identifiable functional domains or are members of established homolog families. In addition, about half (49%) of the associated genes have medium to high identity with C. briggsae homologs as defined by Wormbase (93 C. briggsae homologs/190 total associations). Pooling these findings, we conclude that at least 172 of the 190 genes (90.5%) found to be associated with retrotransposon sequences in our study have functional or phylogenetic support.

Some Cer Elements Are Within Genes

We discovered 40 genes containing a Cer retrotransposon component, meaning that a retrotransposon sequence was identified within predicted gene boundaries (hereafter an “internal association”). Because some retrotransposon sequences lie within two genes, these 40 internal associations involve 35 of the 124 retrotransposons. Genic regions represent approximately 52% of the C. elegans genome, so about 64.5 Cer sequences would be expected within genes if insertion sites were random; the observed number is significantly lower (chi-squared test, P < 4.28 × 10⁻⁸).
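The expected count follows from 124 × 0.52 ≈ 64.5. The following is a minimal Python sketch of a one-degree-of-freedom chi-squared goodness-of-fit test, assuming the comparison is between the 35 sequences inside genes and the 89 outside, against a 52% genic expectation. The exact configuration of the published test is not stated, so this sketch is not expected to reproduce the reported P value; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_inside_outside(observed_inside: int, total: int, genic_fraction: float):
    """One-df chi-squared goodness-of-fit test for TEs inside versus outside genes.

    Returns (expected_inside, statistic, p_value) under random insertion, where a
    TE lands inside a gene with probability genic_fraction.
    """
    expected_inside = total * genic_fraction
    expected_outside = total - expected_inside
    observed_outside = total - observed_inside
    statistic = ((observed_inside - expected_inside) ** 2 / expected_inside
                 + (observed_outside - expected_outside) ** 2 / expected_outside)
    # With one degree of freedom, P(chi-squared > s) = erfc(sqrt(s / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return expected_inside, statistic, p_value


# 35 of 124 Cer sequences lie inside genes; genic regions cover about 52% of the genome.
expected, stat, p = chi_square_inside_outside(35, 124, 0.52)
print(round(expected, 2), round(stat, 2), p)
```

With these inputs the expected count is 64.48, matching the 64.5 quoted above.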

The frequencies of solo LTRs (18), fragments (nine), and full-length elements (eight) within genes are consistent with the frequencies observed for all associations. As with all gene/element associations in C. elegans, sense (21) and antisense (19) internal associations are equally abundant. More than three times as many Bel-like (27) as gypsy-like (eight) element sequences are located within the boundaries of genes, whereas Bel-like element sequences outnumber gypsy-like sequences only about two to one in the C. elegans genome as a whole. Cer9, a Bel-like element, accounts for nearly a quarter of all internal associations (table 4).

Thirty-five percent (14/40) of internal associations involve an element located exclusively within an intron, and 23% (9/40) involve an element located exclusively within an exon. Element sequences that extend into both intron and exon regions account for the remaining 43% (17/40) of internal associations. Internally associated retrotransposons account for 0.4% to 87.3% of the DNA of their host genes (table 4); the mean contribution of retrotransposon sequences to internally associated genes (including intron and exon regions) is 23%. Eleven internally associated genes (28%) have EST support, and 18 genes (45%) show homology to C. briggsae genomic sequences.

Discussion

C. elegans is an attractive model system for studying the contribution of TEs to genome evolution. The nematode has a tractable, sequenced genome (100 Mb) with an active annotation database (C. elegans Sequencing Consortium 1998; Stein et al. 2001). Additionally, the genome of its sister species, C. briggsae, is currently being sequenced, and the results will be available for comparative genomics in the near future. Using these resources, along with a data set of C. elegans LTR retrotransposons (Ganko, Fielman, and McDonald 2001), we have conducted a comprehensive study of the potential contribution of LTR retrotransposons to gene evolution in C. elegans.

A total of 124 Cer retrotransposon sequences (full-length elements, fragmented elements, and solo LTRs) account for 0.4% of the C. elegans genome. Searching a 5-kb window both upstream and downstream of each Cer element sequence identified 190 gene/retrotransposon associations. Interestingly, 79 (63%) LTR retrotransposons map within 1 kb of a gene. Within this group, retrotransposons are overrepresented upstream of genes, specifically in the intergenic region 500 to 1,000 bp from genes. This is significant because most cis-regulatory sequences are believed to lie within 1 kb of the transcriptional start site of C. elegans genes (McGhee and Krause 1997). An additional 21.1% of all associations involve retrotransposon sequences located within introns, exons, or both.

Reports of TE content in humans indicate that more than 40% of the genome is composed of retroelement sequences (Li et al. 2001), and an estimated 4% of human protein-coding genes have been found to contain retrotransposon sequences (Nekrutenko and Li 2001). Additional studies suggest that the role of retrotransposon sequences in the regulation of human gene expression may also be significant. For example, it was recently estimated that approximately 24% of identified human promoter regions contain retrotransposon sequences (Jordan et al. 2003). Our results indicate that 190 of the 19,000 genes (1.0%) identified in the C. elegans genome (C. elegans Sequencing Consortium 1998; Reboul et al. 2001) are associated with retrotransposon sequences and that 28% (35/124) of all Cer element sequences are located within genes.

In a recent study of the distribution of retrotransposon sequences within the human genome, Medstrand et al. (2002) noted a significant decrease in the density of LTR retrotransposon sequences within 5 kb of genes. Moreover, the retrotransposon sequences located near human genes are relatively recent insertions and are most often in an antisense configuration with respect to the adjacent gene. The authors interpret these results to suggest that most retrotransposon insertions proximal to human genes, especially those in a sense configuration, are nonadaptive and selected against. In contrast to the pattern observed in humans, our results show that well over half of all gene/retrotransposon associations in C. elegans (110/190, 57.9%) involve retrotransposon sequences located in or within 1 kb of genes, with no bias against sense associations. At least two hypotheses may help account for these differences.

Protection from deletion or recombination may explain why TEs are found close to genes in C. elegans. The relatively small size of the C. elegans genome has been attributed, in part, to a significantly higher rate of deletion than in humans and other animals (Kent and Zahler 2000; Robertson 2000). In addition, C. elegans is estimated to have up to a 1440-fold higher rate of genome rearrangement than humans and other mammals (Coghlan and Wolfe 2002). Recombination breakpoints in C. elegans are typically associated with repetitive sequences, including retrotransposon sequences (Coghlan and Wolfe 2002). Deletion or recombination events involving retrotransposon sequences in or near genes may have adverse effects on those genes and thus be selected against. Such a scenario might help explain why retrotransposon sequences that are not otherwise deleterious cluster in or around genes.

Another possible explanation for the abundance of retrotransposon sequences in or near C. elegans genes is that they are of adaptive benefit. Indeed, there is a growing body of evidence from a number of systems (Makalowski 2000; Medstrand, Landry, and Mager 2001; Nigumann et al. 2002) that retrotransposon sequences have contributed to adaptive changes in gene structure and regulation.

The central regions of C. elegans chromosomes are the general location of “housekeeping” genes and other essential genes that display homology to genes even in distantly related species (C. elegans Sequencing Consortium 1998). In contrast, many nematode-specific genes are located along the chromosomal arms. Interestingly, C. elegans transposons and other repeats also tend to cluster on the chromosomal arms (Surzycki and Belknap 2000; Ganko, Fielman, and McDonald 2001). The chromosomal arms of C. elegans are regions of high insertional polymorphism, duplications, and intrachromosomal rearrangements (C. elegans Sequencing Consortium 1998). Insertions, duplications, chromosome rearrangements, and TEs may all have a role in the evolution of novel genes (Long 2001; Betrán and Long 2002). For these reasons, regions of the chromosomal arms of C. elegans might be viewed as an “evolutionary laboratory” where new genes are created and tested by natural selection. Low-mobility species such as C. elegans may require a diverse group of specialized genes to successfully exploit their environment (Hodgkin 2001), and an ability to rapidly evolve new genes or new regulatory structures may be particularly important to these organisms. The fact that nearly all of the C. elegans genes that we found in close association with retrotransposon sequences are located on the chromosome arms suggests that retrotransposon sequences may play a role in the evolution of new nematode genes. It will be interesting to determine whether newly evolved genes in other species, including humans, show a preference for close association with retrotransposon sequences.

Supplementary Material

Please see the laboratory Web site (http://www.genetics.uga.edu/retrolab/data.html) and the journal's Web site for the table of associations, listing each retrotransposon name and associated gene ID, and for the representative Cer family sequences.

Pierre Capy, Associate Editor

Fig. 1.

Potential gene/retrotransposon association schemes. Retrotransposons may provide regulatory and/or coding regions to a gene. (A) Element acts as an enhancer of the host gene. (B) Element acts as a polyadenylation or promoter signal within the host gene. (C) Element contributes exon material. The schemes are not mutually exclusive; for instance, a TE could provide both a promoter and exon material.

Fig. 2.

Distribution of gene/retrotransposon associations in the C. elegans genome. A genomic coordinate value for each Cer retrotransposon was calculated and plotted at its previously reported chromosomal location (see Ganko, Fielman, and McDonald 2001). Chromosomes were divided into three regions (left, centric, and right) marked by vertical hash marks. Open circles represent retrotransposons with an associated gene, and closed circles mark retrotransposons located inside a gene. Retrotransposons lacking a gene within 5 kb are marked by the symbol ×.

Fig. 3.

Distance distributions between LTR retrotransposons and associated genes. (A) The 190 gene/LTR retrotransposon associations within a 5-kb window, sorted into 1-kb bins. Light gray shading denotes internal associations (top rectangle of the 0- to 1,000-bp column). (B) Distribution of retrotransposon associations upstream and downstream of a gene within a window of 1 to 2,000 bp. A model gene is represented by the small gray bar in the center and is not drawn to scale.

Table 1

Distribution of Distances Between Genes and Cer Retrotransposons in C. elegans.

Distance Gene/Element Associations % of Total 
Internal (within gene) 40 21.1 
1–1000 bp 70 36.8 
1001–2000 bp 33 17.4 
2001–3000 bp 25 13.2 
3001–4000 bp 14 7.4 
4001–5000 bp 8 4.2 
Total 190  

Table 2

Gene/Retrotransposon Associations per Cer Family.

TE Family Internal (0 bp) External (1–5000 bp) Total Associations 
Cer1 
Cer2 10 
Cer3 15 20 
Cer4 
Cer5 21 21 
Cer6 
Cer7 
Cer8 
Cer9 10 19 
Cer10 
Cer11 
Cer12 19 24 
Cer13 
Cer14 
Cer15 12 
Cer16 21 24 
Cer17 
Cer19 14 16 
Cer20 10 
Total 40 150 190 
Table 3

Gene/Retrotransposon Associations for Full-Length, Fragmented, or Solo LTR Retrotransposons.

Associations Full-length Fragmented Solo LTR Total 
10 23 40 
15 16 44 75 
Table 4

Genes with a Cer Retrotransposon Component.

LTR Family Gene ID TE/Exon % TE/Intron % TE/Gene % 
Cer2 f53e10.5 11.9 5.4 7.8 
Cer2-1 k08d10.5 61.7 100.0 62.6 
Cer3 f58h7.7 6.0 5.7 5.9 
Cer3-1 k09h9.7 0.0 42.9 28.0 
Cer3-1 y39b6a.b 0.0 9.6 6.5 
Cer3-1 y75b8a.27 0.0 9.0 6.3 
Cer3-1 y23h5b.7a 20.8 1.4 3.2 
Cer6 y73f8a.11 0.0 4.1 3.2 
Cer7 h08m01.2 0.0 2.3 1.8 
Cer8 c03a7.12 32.3 76.6 65.4 
Cer8 c03a7.13 11.3 0.0 7.7 
Cer9 f07b7.14 32.8 39.2 37.0 
Cer9 c40a11.1 1.9 0.0 1.5 
Cer9 b0047.4 23.4 0.0 21.7 
Cer9 f15a2.4 0.0 29.0 19.7 
Cer9 f07b7.8 2.3 0.0 1.3 
Cer9 k06c4.1 0.8 0.0 0.4 
Cer9 c33c12.4 22.8 7.9 12.6 
Cer9 c56g3.2 24.8 0.0 19.9 
Cer9 y57a10a.30a 15.8 22.5 18.3 
Cer12 y60a3a.5 21.4 0.8 3.6 
Cer12 w03g1.9 10.4 0.0 4.6 
Cer12 c04g6.7 14.6 0.0 6.3 
Cer12-1 zc15.5 41.4 46.1 44.6 
Cer12-1 zc15.2 4.9 19.7 15.5 
Cer13 c09b9.3 0.0 19.4 12.6 
Cer15-1 y40h7a.6 36.1 14.1 17.4 
Cer15-1 f19b2.1 36.7 82.5 72.9 
Cer15-1 f19b2.8 1.3 0.0 0.4 
Cer16-1 y71h2am.3 0.0 15.4 11.4 
Cer16-1 f20b4.6 0.0 16.0 11.4 
Cer16-2 6r55.2 33.6 25.8 27.6 
Cer17 r52.10a 0.0 92.3 78.8 
Cer19 t06a10.2 86.6 87.5 87.3 
Cer19 f35h10.3 11.2 41.3 34.8 
Cer20 y87g2a.11 0.0 40.1 35.3 
Cer20 f41b5.5 0.0 83.1 69.7 
Cer20 r11g10.1b 0.0 14.2 8.5 
Cer20 t28d6.2 0.0 12.8 7.7 
Cer20 k01d12.3 3.5 54.5 21.8 

Note. TE/exon % is the number of TE nucleotides within predicted exon boundaries divided by the total number of nucleotides within the exons of a given gene, expressed as a percentage. TE/intron % is calculated in the same way using intron boundaries. TE/gene % combines the exon and intron calculations.
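The definitions in the note can be written as a short Python sketch. It assumes exon coordinates for one gene and one TE, treats every non-exon base of the gene span as intronic, and interprets TE/gene % as TE bases inside the gene span divided by gene length. The interval helpers are hypothetical and are not the MacVector procedure used to build the table.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two closed intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def percent(part: int, whole: int) -> float:
    """Percentage, returning 0.0 when the denominator is empty (e.g., no introns)."""
    return 100.0 * part / whole if whole else 0.0


def te_contribution(te: Interval, gene: Interval, exons: List[Interval]):
    """Return (TE/exon %, TE/intron %, TE/gene %) for one TE and one gene.

    Assumes the exons lie inside the gene span and do not overlap each other.
    """
    gene_len = gene[1] - gene[0] + 1
    exon_len = sum(end - start + 1 for start, end in exons)
    te_in_gene = overlap(te, gene)
    te_in_exons = sum(overlap(te, exon) for exon in exons)
    return (percent(te_in_exons, exon_len),
            percent(te_in_gene - te_in_exons, gene_len - exon_len),
            percent(te_in_gene, gene_len))


# Hypothetical gene with two exons and a TE spanning an exon/intron boundary.
print(te_contribution((151, 400), (1, 1000), [(1, 200), (801, 1000)]))
```

A TE with 0.0 in the exon column and a nonzero intron value corresponds to the "exclusively within an intron" class described in the Results.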

Our laboratory is supported by grants from the National Institutes of Health. We thank Eileen Kraemer (University of Georgia, Computer Science) for statistical advice and helpful discussion.

Literature Cited

Adams, M. D., S. E. Celniker, and R. A. Holt, et al. (195 co-authors).
2000
. The genome sequence of Drosophila melanogaster.
Science
 
287
:
2185
-2195.
The Arabidopsis Genome Initiative.
2000
. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.
Nature
 
408
:
796
-815.
Baust, C., W. Seifarth, H. Germaier, R. Hehlmann, and C. Leib-Mosch.
2000
. HERV-K-T47D-Related long terminal repeats mediate polyadenylation of cellular transcripts.
Genomics
 
66
:
98
-103.
Betrán, E., and M. Long.
2002
. Expansion of genome coding regions by acquisition of new genes.
Genetica
 
115
:
65
-80.
Boeke, J. D., D. J. Garfinkel, C. A. Styles, and G. R. Fink.
1985
. Ty elements transpose through an RNA intermediate.
Cell
 
40
:
491
-500.
Brosius, J.
1999
. Genomes were forged by massive bombardments with retroelements and retrosequences.
Genetica
 
107
:
209
-238.
C. elegans Sequencing Consortium.
1998
. Genome sequence of the nematode C. elegans: a platform for investigating biology.
Science
 
282
:
2012
-2018.
Charlesworth, B., P. Sniegowski, and W. Stephan.
1994
. The evolutionary dynamics of repetitive DNA in eukaryotes.
Nature
 
371
:
215
-220.
Coghlan, A., and K. H. Wolfe.
2002
. Fourfold faster rate of genome rearrangement in nematodes than in Drosophila.
Genome Res.
 
12
:
857
-867.
Flavell, R. B.
1986
. Repetitive DNA and chromosome evolution in plants.
Philos. Trans. R. Soc. Lond. B Biol. Sci.
 
312
:
227
-242.
Ganko, E. W., K. T. Fielman, and J. F. McDonald.
2001
. Evolutionary history of Cer elements and their impact on the C. elegans genome.
Genome Res.
 
11
:
2066
-2074.
Harrison, P. M., N. Echols, and M. B. Gerstein.
2001
. Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome.
Nucleic Acids Res.
 
29
:
818
-830.
Hodgkin, J.
2001
. What does a worm want with 20,000 genes?
Genome Biol.
 
2
:
comment2008
.
Hoskins, R. A., C. D. Smith, and J. W. Carlson, et al. (16 co-authors).
2002
. Heterochromatic sequences in a Drosophila whole-genome shotgun assembly.
Genome Biol.
 
3
:
RESEARCH0085–0085
.
Jordan, I. K., I. B. Rogozin, G. V. Glazko, and E. V. Koonin.
2003
. Origin of a substantial fraction of human regulatory sequences from transposable elements.
Trends Genet.
 
19
:
68
-72.
Kaminker, J. S., C. M. Bergman, and B. Kronmiller, et al. (12 co-authors).
2002
. The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective.
Genome Biol.
 
3
:
RESEARCH0084–0084
.
Kapitonov, V. V., and J. Jurka. 1999. The long terminal repeat of an endogenous retrovirus induces alternative splicing and encodes an additional carboxy-terminal sequence in the human leptin receptor. J. Mol. Evol. 48:248–251.
Kent, W. J., and A. M. Zahler. 2000. Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res. 10:1115–1125.
Kidwell, M. G. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63.
Kidwell, M. G., and D. R. Lisch. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Int. J. Org. Evol. 55:1–24.
Kim, J. M., S. Vanguri, J. D. Boeke, A. Gabriel, and D. F. Voytas. 1998. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8:464–478.
Li, W. H., Z. Gu, H. Wang, and A. Nekrutenko. 2001. Evolutionary analyses of the human genome. Nature 409:847–849.
Llorens, C., and I. Marin. 2001. A mammalian gene evolved from the integrase domain of an LTR retrotransposon. Mol. Biol. Evol. 18:1597–1600.
Long, M. 2001. Evolution of novel genes. Curr. Opin. Genet. Dev. 11:673–680.
Mager, D. L., D. G. Hunter, M. Schertzer, and J. D. Freeman. 1999. Endogenous retroviruses provide the primary polyadenylation signal for two new human genes (HHLA2 and HHLA3). Genomics 59:255–263.
Makalowski, W. 2000. Genomic scrap yard: how genomes utilize all that junk. Gene 259:61–67.
McDonald, J. F. 1993. Evolution and consequences of transposable elements. Curr. Opin. Genet. Dev. 3:855–864.
McDonald, J. F. 1995. Transposable elements: possible catalysts of organismic evolution. Trends Ecol. Evol. 10:123–126.
McGhee, J. D., and M. W. Krause. 1997. Transcription factors and transcriptional regulation. Pp. 147–184 in D. L. Riddle, T. Blumenthal, B. J. Meyer, and J. R. Priess, eds. C. elegans II. Cold Spring Harbor Laboratory Press, Plainview, N.Y.
Medstrand, P., J. R. Landry, and D. L. Mager. 2001. Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J. Biol. Chem. 276:1896–1903.
Medstrand, P., L. N. van de Lagemaat, and D. L. Mager. 2002. Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res. 12:1483–1495.
Mounsey, A., P. Bauer, and I. A. Hope. 2002. Evidence suggesting that a fifth of annotated Caenorhabditis elegans genes may be pseudogenes. Genome Res. 12:770–775.
Nekrutenko, A., and W. H. Li. 2001. Transposable elements are found in a large number of human protein-coding genes. Trends Genet. 17:619–621.
Nigumann, P., K. Redik, K. Matlik, and M. Speek. 2002. Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics 79:628–634.
Orgel, L. E., and F. H. C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284:604–607.
Pearce, S. R., G. Harrison, D. Li, J. Heslop-Harrison, A. Kumar, and A. J. Flavell. 1996. The Ty1-copia group retrotransposons in Vicia species: copy number, sequence heterogeneity and chromosomal localisation. Mol. Gen. Genet. 250:305–315.
Reboul, J., P. Vaglio, and N. Tzellas, et al. (20 co-authors). 2001. Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans. Nat. Genet. 27:332–336.
Robertson, H. M. 2000. The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. Genome Res. 10:192–203.
Smit, A. F. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657–663.
Stein, L. D., C. Mungall, and S. Shu, et al. (11 co-authors). 2002. The Generic genome browser: a building block for a model organism system database. Genome Res. 12:1599–1610.
Stein, L., P. Sternberg, R. Durbin, J. Thierry-Mieg, and J. Spieth. 2001. WormBase: network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res. 29:82–86.
Stokstad, E. 2001. Entomology: first light on genetic roots of Bt resistance. Science 293:778.
Surzycki, S. A., and W. R. Belknap. 2000. Repetitive-DNA elements are similarly distributed on Caenorhabditis elegans autosomes. Proc. Natl. Acad. Sci. USA 97:245–249.
Waterston, R. H., K. Lindblad-Toh, and E. Birney, et al. (222 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.