Abstract

We traced the sequence evolution of the active lineage of LINE-1 (L1) retrotransposons over the last ∼25 Myr of human evolution. Five major families (L1PA5, L1PA4, L1PA3B, L1PA2, and L1PA1) of elements have succeeded each other as a single lineage. We found that part of the first open-reading frame (ORFI) had a higher rate of nonsynonymous (amino acid replacement) substitution than synonymous substitution during the evolution of the ancestral L1PA5 through the L1PA3B families. This segment encodes the coiled coil region of the protein-protein interaction domain of the ORFI protein (ORFIp). Statistical analysis of these changes indicates that positive selection had been acting on this region. In contrast, the coiled coil segment hardly changed during the evolution of the L1PA3B to the present L1PA1 family. Therefore, selective pressure on the coiled coil segment has changed over time. We suggest that the fast rate of amino acid replacement in the coiled coil segment reflects the adaptation of L1 either to a changing genomic environment or to host repression factors. In contrast, the second open-reading frame and the nucleic acid–binding domain of the first open-reading frame are extremely well conserved, attesting to the strong purifying selection acting on these regions.

Introduction

L1 (LINE-1) non–long terminal repeat (LTR) retrotransposons (fig. 1A) constitute the major family of autonomously transposing elements in mammals (Smit 1999). The human genome contains ∼500,000 L1 elements that account for 17% of its mass (Lander et al. 2001), attesting to the profound effect that L1 replication has had on mammalian genomes. L1 retrotransposition generates mostly defective copies which remain in the genome and accumulate mutations at the pseudogene rate (Voliva et al. 1983; Hardies et al. 1986; Pascale et al. 1993). Novel replication-competent L1 variants are also produced, and can subsequently generate a family of several hundreds or thousands of copies that share the diagnostic features of their progenitor (or group of closely related progenitors) (reviewed in Furano 2000). This process is exemplified in humans, where five major families (L1PA5, L1PA4, L1PA3B, L1PA2, and L1PA1) have succeeded each other as a single lineage over the last 25 Myr in the primate ancestors of modern day humans (fig. 1B) (Boissinot, Entezam, and Furano 2001).

About 70% of the full-length (potentially active) members of the ancestral L1PA5–L1PA2 families have apparently been cleared from the genome (Boissinot, Entezam, and Furano 2001). As this loss was more pronounced in recombining regions of the genome than in regions of low or no recombination (e.g., the Y chromosome), we suggested that the full-length elements have been deleterious enough to be subjected to purifying selection. This has not yet occurred for L1PA1, which is actively transposing in humans, generating new inserts that cause both neutral polymorphisms and genetic defects (reviewed in Kazazian and Moran 1998). Although these latter events would reduce the fitness of the host, we have proposed that a more global (dysgenic) effect of L1 activity accounted for the selection against the full-length L1 elements (Boissinot, Entezam, and Furano 2001).

Whatever the case, the fact that L1 activity is deleterious enough to be subject to purifying selection suggests that control of L1 transposition would be important for maintaining host fitness. Aside from evidence suggesting that the general transcriptional inhibition imposed by DNA methylation (Jones 1999) could also silence L1 transcription (Nur, Pascale, and Furano 1988; Thayer, Singer, and Fanning 1993; Hata and Sakaki 1997; Woodcock et al. 1997), neither the regulation of L1 replication nor any other possible role played by the host has been examined in detail. Host factors that specifically repress or reduce L1 activity would be highly advantageous. In turn, such factors would constitute a selective pressure on L1 to evade repression. Thus, L1 evolution may in part reflect interactions between the element and its host.

To identify regions of L1 that could be involved in host-L1 interactions, we examined the changes that occurred during the evolution of the active lineage of L1 elements from the ancestral L1PA5 family to the currently active L1PA1 family. In particular, we identified a region of the first open-reading frame (ORFI) that uniquely shows a high rate of nonsynonymous (amino acid replacement) substitutions, which is the typical signature of positive selection. The fact that this region of ORFI encodes a coiled coil domain that has been shown to mediate protein-protein interaction (Hohjoh and Singer 1996; Martin, Li, and Weisz 2000) suggests that the ORFI protein (ORFIp) could be involved in host-L1 interaction.

Materials and Methods

Sequence Alignment

Full-length elements belonging to the five most recent human L1 families (L1PA5, L1PA4, L1PA3B, L1PA2, and L1PA1, following the nomenclature in Smit et al. 1995; Boissinot, Chevret, and Furano 2000; Boissinot, Entezam, and Furano 2001) were collected from the GenBank public database (table 1). These families encompass the last ∼25 Myr of L1 evolution in humans. We limited our analysis to these five families because they are less likely to be interrupted by more recent internal insertions than older families. The selected full-length elements were part of our previous work in which their classification was confirmed by phylogenetic analysis (Boissinot, Entezam, and Furano 2001). L1 elements differ from each other by the family-specific mutations that they inherit from their progenitors, and by the mutations that they have accumulated at the neutral rate since insertion in the genome. As we were interested in the evolution of the active lineage, we focused only on the family-specific differences. We eliminated the mutations that arose after insertion by deriving a consensus sequence for each family. Mutations in the highly mutable CpG dinucleotides were eliminated from the consensus except when they corresponded to fixed differences among families. The alignment and consensus sequences were created using the GCG program (Wisconsin Package, Version 10.0, Genetics Computer Group, Madison, Wis.). The alignment is available from the EMBL-ALIGN database (accession number ALIGN_000165).
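The idea of removing post-insertion mutations by taking a family consensus can be illustrated with a minimal majority-rule sketch. This is not the GCG procedure used here; the function name `majority_consensus` and the toy alignment are hypothetical, and the special handling of CpG dinucleotides described above is deliberately left out.

```python
from collections import Counter


def majority_consensus(aligned_seqs):
    """Return the majority-rule consensus of equal-length aligned sequences.

    Each alignment column contributes its most frequent character.
    Ties are resolved in favor of the character seen first in the column.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column)
        most_frequent_base, _ = counts.most_common(1)[0]
        consensus.append(most_frequent_base)
    return "".join(consensus)


# Hypothetical toy family: two copies each carry one private mutation,
# which the consensus discards because it is a minority state in its column.
toy_family = ["ACGTAC", "ACGTAT", "ACATAC", "ACGTAC"]
print(majority_consensus(toy_family))  # ACGTAC
```

Because mutations acquired after insertion are mostly private to single copies, they rarely reach a majority in any column, whereas family-specific differences inherited from the progenitor are shared by most members and survive in the consensus.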

Sequence Analysis

Phylogenetic reconstructions were performed using the maximum-likelihood (ML) method as implemented in the PAML program package (Yang 2000). Variations in the rate of evolution along a sequence can result from either selection or recombination. We used the program PLATO 2.0 (Grassly and Holmes 1997) to identify such regions. First, an ML tree based on the entire coding region of L1 was calculated with PAML (Yang 2000). Using the ML tree as null hypothesis, PLATO employs a sliding window through which deviations from the ML-generated branch lengths are calculated, thereby identifying regions that differ in evolutionary rate from the complete coding sequence.

ORFIp was analyzed for coiled coil domains using the program COILS (Lupas, Van Dyke, and Stock 1991) at http://www.ch.embnet.org/software/COILS_form.html. COILS compares a given sequence to a database of sequences which are known to form coiled coil structures and calculates the probability that the sequence of interest will adopt a coiled coil conformation.

Test for Selection

The effect of selection on a coding sequence can be estimated by comparing the synonymous (dS) and nonsynonymous (dN) substitution rates (for a review, see Yang and Bielawski 2000). The value of the ratio ω = dN/dS is an indicator of the type and strength of selection. If nonsynonymous mutations have no effect on fitness, they will be fixed at the same rate as synonymous mutations and a value of ω = 1 is expected. If nonsynonymous mutations are deleterious, they will be fixed at a lower rate than synonymous mutations (i.e., negative or purifying selection) and ω will be <1. If nonsynonymous mutations are advantageous, they will be fixed faster than synonymous mutations (i.e., positive or adaptive selection) and ω will be >1. The parameter ω was estimated using the ML method of Goldman and Yang (1994). In this method, parameters of a model of codon substitution are estimated from the data by ML and are used to calculate dN and dS. To test if dN is significantly different from dS, ω was fixed at 1 in the null model (i.e., neutrality), whereas ω was estimated as a free parameter in the alternative model (Yang 1998). Twice the log-likelihood difference between the two models is compared with a χ² distribution with one degree of freedom to test whether ω is different from 1. All these calculations were performed using the codeml program of the PAML package (Yang 2000).
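The likelihood ratio test described above can be sketched in a few lines. The log-likelihoods would come from the two PAML runs (ω fixed at 1 versus ω free); the function names and the numerical values below are hypothetical and serve only to show the arithmetic. For one degree of freedom, the upper tail of the χ² distribution equals erfc(√(x/2)), so no statistics library is needed.

```python
import math


def lrt_omega_equals_one(lnl_null, lnl_alt):
    """Likelihood ratio test of omega = 1 (null) against omega free (alternative).

    Returns (statistic, p_value), where the statistic is twice the
    log-likelihood difference and the p-value is the upper tail of a
    chi-square distribution with one degree of freedom.
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def selection_regime(omega, p_value, alpha=0.05):
    """Interpret an omega estimate given the outcome of the test above."""
    if p_value >= alpha:
        return "not distinguishable from neutrality"
    return "positive selection" if omega > 1.0 else "purifying selection"


# Hypothetical log-likelihoods and omega estimate, for illustration only.
statistic, p_value = lrt_omega_equals_one(lnl_null=-1002.0, lnl_alt=-1000.0)
print(f"2*delta(lnL) = {statistic:.2f}, P = {p_value:.4f}")
print(selection_regime(omega=3.2, p_value=p_value))
```

Note that a significant test only establishes that ω differs from 1; the direction (ω > 1 or ω < 1) is read from the ML estimate itself.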

Results

Consensus sequences for the L1PA5 (13 elements), L1PA4 (13), L1PA3B (14), L1PA2 (22), and L1PA1 (14) families were derived from the elements listed in table 1. We used the Ta-1d subset of L1PA1 to derive the L1PA1 consensus because it is the most recently evolved version of L1PA1 (Boissinot, Chevret, and Furano 2000). The alignment (see Materials and Methods) contains 282 nt substitutions and seven indels (table 2). Figure 1B shows that the five families have succeeded each other and that a single lineage of L1 families best describes the evolution of the active lineage of L1.

Positive Selection on the Coiled Coil Domain of ORFI

The PLATO analysis identified a single region of ORFI that has undergone nucleotide substitutions at a statistically different rate than the complete coding sequence (Z = 9.65, P < 0.05, sliding window size = 5). Increasing the window size or modifying the parameters of the analysis identified the same approximate region and yielded similar Z values, all of which were statistically significant. This region starts 153 nt from the beginning of ORFI and is 217 nt long. It coincides almost perfectly with the coiled coil (C-C in fig. 1A) encoding region of the protein-protein interaction domain of ORFI. All but one of the 29 substitutions that differentiate L1PA5 from L1PA3B in this region are nonsynonymous (fig. 1C and table 2). This fast rate of amino acid replacement indicates that this region is evolving either under relaxed selection (i.e., neutrally) or under positive selection.
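What it means for a substitution to be synonymous or nonsynonymous can be made concrete with a short sketch based on the standard genetic code. This only illustrates the definition for single-nucleotide codon changes; it is not the codon-model estimation of dN and dS used in this study, and the function name `classify_substitution` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Classify a single-nucleotide codon change as synonymous or nonsynonymous."""
    before, after = codon_before.upper(), codon_after.upper()
    if len(before) != 3 or len(after) != 3:
        raise ValueError("codons must be three nucleotides long")
    differences = sum(a != b for a, b in zip(before, after))
    if differences != 1:
        raise ValueError("codons must differ at exactly one position")
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "nonsynonymous"


print(classify_substitution("GTT", "GTC"))  # synonymous (Val -> Val)
print(classify_substitution("ATG", "AAG"))  # nonsynonymous (Met -> Lys)
```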

We distinguished between these possibilities by analyzing the nonsynonymous to synonymous rate ratio (ω, see Materials and Methods). The parameter ω was calculated independently for the coiled coil encoding region (from codon 51 to 148) and for the entire coding sequence (ORFI and ORFII), excluding the coiled coil domain (table 3). Because some of the pairwise comparisons in the coiled coil domain include very few if any synonymous substitutions, ML estimates of the parameter κ (the transition-transversion rate ratio for synonymous substitutions) were first obtained from the complete coding sequence. Values of κ for the entire L1 range from 1.8 to 3.1, and several values within this range were incorporated into the ML model to calculate ω. These analyses gave congruent results (only values with κ = 2.5 are shown in table 3). Pairwise comparisons among L1PA5, L1PA4, and L1PA3B give values for ω significantly higher than 1, indicating that nonsynonymous mutations have been fixed at a faster rate than synonymous mutations (i.e., faster than if they had been neutral). Thus, by these criteria, the coiled coil domain of ORFI has evolved under positive selection during the evolution of L1PA5 to L1PA3B. Although higher than 1, values of ω are lower when L1PA5 or L1PA4 are compared with L1PA2 or L1PA1 because of purifying selection acting during the evolution of L1PA3B to L1PA1 (see later). Purifying selection increases the relative number of synonymous substitutions between the older (L1PA5 or L1PA4) families and the younger (L1PA2 or L1PA1) families. As the number of nonsynonymous mutations between the older and younger families has hardly changed, the value of ω derived from these comparisons is lower.

In contrast to the fast rate of amino acid replacement from L1PA5 to L1PA3B, the coiled coil domain has remained almost unchanged from L1PA3B to L1PA1. Indeed, the coiled coil domains of L1PA2 and L1PA1 are identical although they differ in other parts of the sequence (fig. 1C and table 3). The ratio, ω, is lower than 1 although not significantly so. It is difficult to determine if the conservation of recent L1 families is because of purifying selection, the absence of selection, or recombination that would have homogenized the sequence of different families. In any case, it seems that positive selection has not played a significant role in the evolution of the L1 families (i.e., L1PA2 and L1PA1) derived from the L1PA3B family. The alternation of a high (L1PA5 to L1PA3B) and a low (L1PA3B to L1PA1) amino acid replacement rate indicates that the nature of the selective pressure acting on the coiled coil domain has changed over time.

By comparison, the amino acid sequence outside the coiled coil domain has always been highly conserved (table 3, above diagonal); in all comparisons ω is significantly lower than 1. This low rate of amino acid replacement indicates that strong purifying selection has been acting on most regions of the L1 proteins. In ORFII, sequence conservation is not limited to the endonuclease (EN) and reverse transcriptase (RT) encoding domains (fig. 1C). The segments that separate EN, RT, and the 3′ terminal region of ORFII are also highly conserved, suggesting that these regions are functionally important, either because they encode some yet to be described function or because they play a role in the conformation of the ORFII protein.

Pattern of Amino Acid Replacements in the Coiled Coil Domain

Coiled coil structures are formed by the intertwining of two or more α-helical peptide chains that have a repeating arrangement of nonpolar side chains (reviewed in Lupas 1996). Typically, domains that can form coiled coil structures consist of seven-residue repeats (heptads), with nonpolar or hydrophobic residues in the first (a) and fourth (d) positions of the heptad (fig. 2). The coiled coil domain of ORFIp ranges from amino acid 52 to 131 and consists of a first group of four or five heptads (depending on the family) separated from a group of six heptads by three amino acids (fig. 2). The COILS program indicates a 90%–100% probability that these heptads will adopt a coiled coil conformation (Lupas, Van Dyke, and Stock 1991).

Because the probability that a sequence will form a coiled coil structure depends on the position of hydrophobic or nonpolar residues, a change in the polarity of an amino acid can affect the conformation of the protein. Figure 2 shows the distribution of the amino acid replacements that occurred between L1PA5 and L1PA1. Conservative changes (indicated by a + in fig. 2) are substitutions between two polar or two nonpolar amino acids (following the classification in Li 1997, pp. 13–17). The replacement of nonpolar amino acids by C or Y (usually considered polar) at position (a) or (d) of the heptad is also considered conservative because these two amino acids are found preferentially at positions (a) or (d) of known coiled coils (Lupas, Van Dyke, and Stock 1991). Out of the 28 amino acid replacements that differentiate L1PA5 from L1PA1, 26 are conservative and therefore not likely to affect the potential coiled coil conformation of the protein (fig. 2). Only the two amino acid substitutions, V83T and M96K, are not conservative with regard to polarity. Substitution M96K is at position (g) of heptad VI, which is not as critical to the coiled coil conformation as positions (a) and (d). On the other hand, V83T is a radical change at position (d) of heptad V and this change does disrupt heptad V in L1PA1 (fig. 2). Indeed, T is very rarely found at position (d) of any known coiled coil (Lupas, Van Dyke, and Stock 1991) and substitution V83T could affect the coiled coil conformation of the protein. However, probabilities of forming a coiled coil (as calculated by COILS, with a window of 21 or 28) are not significantly affected by any of the 28 amino acid changes, including these two radical changes.
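The rule used to score replacements as conservative or radical can be written out as a short sketch. The assignment of amino acids to the polar and nonpolar sets below is an assumption made for illustration and should be checked against the classification in Li (1997); the function name `classify_replacement` is hypothetical.

```python
# Assumed polarity classes (illustrative; verify against Li 1997, pp. 13-17).
NONPOLAR = set("GAVLIPFMW")
POLAR = set("STYNQCKRHDE")

# C and Y are tolerated at the core heptad positions (a) and (d).
CORE_POSITIONS = {"a", "d"}
CORE_TOLERATED = {"C", "Y"}


def classify_replacement(old_residue, new_residue, heptad_position):
    """Score an amino acid replacement as 'conservative' or 'radical' for polarity."""
    old, new = old_residue.upper(), new_residue.upper()
    position = heptad_position.lower()
    for residue in (old, new):
        if residue not in NONPOLAR and residue not in POLAR:
            raise ValueError(f"unknown amino acid: {residue}")
    if position not in "abcdefg" or len(position) != 1:
        raise ValueError("heptad position must be one of a-g")

    # Same polarity class on both sides: conservative.
    if (old in NONPOLAR) == (new in NONPOLAR):
        return "conservative"
    # Nonpolar residue replaced by C or Y at a core position: also conservative.
    if position in CORE_POSITIONS and old in NONPOLAR and new in CORE_TOLERATED:
        return "conservative"
    return "radical"


print(classify_replacement("V", "T", "d"))  # radical (V83T)
print(classify_replacement("M", "K", "g"))  # radical (M96K)
print(classify_replacement("L", "C", "a"))  # conservative (core-position exception)
```

Under these assumed classes, the two replacements singled out in the text, V83T at position (d) and M96K at position (g), are the kind of change the rule scores as radical.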

Discussion

Our results demonstrate that the coiled coil domain of ORFI has been subjected to an episode of intense positive selection (L1PA5 → L1PA4 → L1PA3B) resulting in a high rate of amino acid replacement. Since then (L1PA3B → L1PA2 → L1PA1), the coiled coil domain has been highly conserved. Thus, either the strength or nature of the selective pressure on this region has changed over time. As the major amplification of L1PA2 occurred only in the African apes (subfamily Homininae, i.e., gorillas, chimpanzees, and humans; unpublished data), the change in selective pressure on ORFI occurred after the divergence between Asian (e.g., orangutan) and African apes. Interestingly, the same region of ORFI is hypervariable in murine rodents and galagos (prosimians) as indicated by a high rate of amino acid replacement, duplications, and deletions (Kolosha and Martin 1995; Cabot et al. 1997; Mayorov, Rogozin, and Adkison 1999; Furano 2000). However, whether these changes are the result of positive selection is yet to be determined (Mayorov, Rogozin, and Adkison 1999; Furano 2000).

Because many aspects of L1 biology are still unknown, we can only speculate about the possible causes of positive selection. As the 5′UTR (untranslated region) of L1 evolves at a very high rate, including its wholesale replacement (Adey et al. 1994; Furano 2000), adaptive changes in ORFI could be a response to changes in the 5′UTR. Although the numbers of base changes between the 5′UTR of L1PA4 and L1PA3B and between L1PA2 and L1PA1 are roughly the same, ORFI has evolved under positive selection between L1PA4 and L1PA3B but remained very conserved between L1PA2 and L1PA1. Thus, positive selection in ORFI is not correlated with the global rate of base substitution in the 5′UTR. Although changes in the amino acid sequence of ORFI may be a response to particular sequence changes in the 5′UTR, the selective pressure on ORFI may also lie elsewhere.

Most of the genes for which positive selection has been documented are involved in interactions between the organism and its environment (see Yang and Bielawski 2000). By analogy, we propose that positive selection in L1 may reflect an interaction between the L1 element (the organism) and the host (its environment). For instance, the rapid evolution of the coiled coil domain could have been driven by L1 adaptation to a host factor required by L1 for replication. Rapid evolution of the putative host factor might have occurred for a number of reasons, including avoidance of recruitment by L1. Alternatively, rapid evolution of the coiled coil domain could have resulted from the evasion by L1 of a host-encoded repressor of L1 replication. This would be similar to positive selection in pathogen genes that evade a host's immune system (Zanotto et al. 1999; Haydon et al. 2001).

In both cases, the alternation between periods of positive and purifying selection on ORFI can be correlated with changes in L1 activity. In rodents (Pascale, Valle, and Furano 1990) and primates (unpublished data), L1 activity (amplification) is episodic, and therefore its deleterious effect on the host changes over time. Possibly, very active (deleterious) families would induce a strong response by the host, leading to intense positive selection for both the host and the element. Conversely, families that generate just enough copies to persist in the genome, but not enough to cause serious damage, would probably be ignored by the host, and the action of positive selection would be very limited.

Figure 2 shows that positive selection in ORFI resulted in substitutions among amino acids that share similar physicochemical properties. Therefore, the effects of positive selection on the coiled coil domain have been limited by structural constraints, i.e., the ability to form a coiled coil structure. This suggests that the potential to form a coiled coil structure is an important functional feature of ORFIp. This conclusion is supported by the fact that, although the N-terminal one-third of ORFI shows no sequence homology among murine rodents (old world rats and mice), rabbits, galagos, and humans (Kolosha and Martin 1997), all possess the potential to form coiled coil structures (data not shown; Martin, Li, and Weisz 2000). The ability of ORFIp to form a coiled coil structure is also shared by nonmammalian L1-like elements, such as the Xenopus Tx1L (cited in Pont-Kingdon et al. 1997), the teleost Swimmer (Duvernell and Turner 1998), and the bird CR1 elements (Haas et al. 1997; unpublished data). Coiled coils often mediate protein-protein interactions with themselves or other proteins. In mouse and human L1, the coiled coil domain mediates ORFIp binding to itself (Hohjoh and Singer 1996; Martin, Li, and Weisz 2000), but the possibility of interactions with other proteins has not been explored. The ORFIps of two divergent mouse L1 families (Tf and L1MdA) readily interact (Martin, Li, and Weisz 2000), suggesting that conservative changes in the coiled coil domain would not significantly affect ORFIp interaction with itself. Thus, interactions between ORFIp and other proteins could well be responsible for positive selection on ORFI.

Thomas Eickbush, Reviewing Editor

Keywords: L1/LINE-1, human, retrotransposon, positive selection

Address for correspondence and reprints: Anthony V. Furano, NIH, Building 8, Room 203, 8 Center Dr., MSC 0830, Bethesda, Maryland 20892-0830. avf@helix.nih.gov.

Table 1. List of the Full-length L1 Elements. Coordinates of the Elements Were Obtained Using RepeatMasker, Version 3.0 (A. F. A. Smit and P. Green, http://ftp.genome.washington.edu/RM/RepeatMasker.html) and Correspond Approximately to the Beginning and the End of the Elements

Table 2. List of the Mutations Differentiating the Five L1 Families. C 123 A Means that an Ancestral C at Position 123 Mutated to an A. Syn Means a Synonymous Mutation; id Means Indel. The Sequence Coordinates Are from the Alignment of the Consensus Sequences of the L1PA5-L1PA1 Families (see Materials and Methods)

Table 3 Maximum-likelihood Estimates of ω, dN, and dS for the C-C Domain (below diagonal) and for the Non–C-C Part of L1 (above diagonal)

Fig. 1. Differences among L1 families. (A) Structure of a human full-length element. A typical full-length element has a 5′ untranslated region (5′UTR), two open-reading frames (ORFI and ORFII), and a 3′UTR (for a review, see Furano 2000). The 5′UTR has a regulatory function. ORFI encodes a 40-kDa protein with an N-terminal domain involved in protein-protein interaction (P-P) and a C-terminal domain with nucleic-acid binding and chaperone activity (N-P) (Martin, Li, and Weisz 2000; Martin and Bushman 2001). The N-terminal domain contains a potential coiled coil domain (C-C) and a leucine-zipper domain (Z). The C-terminal domain contains seven (1–7 in fig. 1) regions that are highly conserved among human, rat, mouse, and rabbit L1 (Hohjoh and Singer 1996). ORFII encodes endonuclease (EN) and reverse transcriptase (RT) domains and a Cys-rich region (C). The RT domain contains 10 (0–9) regions that are highly conserved among most non-LTR retrotransposons (Malik, Burke, and Eickbush 1999). In addition, the transcript of ORFII contains two regions (A and B) to which ORFIp is thought to bind (Hohjoh and Singer 1997). The 3′UTR contains a conserved G-rich motif (G) (Howell and Usdin 1997). (B) Maximum-likelihood tree built on the consensus sequences of the 3′UTR of six L1 families. The tree is rooted with L1PA6. The numbers above the nodes indicate the percentage of 1,000 bootstrap replicates of the data in which the labeled node was present. (C) Nucleotide substitutions among L1 families, from L1PA5 to L1PA1. Each bar represents a substitution in the indicated pairwise comparison. Bars above the ORFs represent nonsynonymous substitutions; bars below the ORFs represent synonymous substitutions. Short bars represent substitutions located in noncoding regions.

Fig. 2. Structure of the coiled coil domain of L1PA5 and L1PA1. Heptads are numbered I to XI and amino acid positions within each heptad are indicated a to g. Only amino acids at positions a and d (in bold) and amino acids that differ between L1PA5 and L1PA1 are shown. Conservative changes for polarity (see text) are represented by + and radical changes by −.

References

Adey N. B., S. A. Schichman, D. K. Graham, S. N. Peterson, M. H. Edgell, C. A. Hutchison III. 1994. Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences. Mol. Biol. Evol. 11:778-789.

Boissinot S., P. Chevret, A. V. Furano. 2000. L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol. Biol. Evol. 17:915-928.

Boissinot S., A. Entezam, A. V. Furano. 2001. Selection against deleterious LINE-1-containing loci in the human lineage. Mol. Biol. Evol. 18:926-935.

Cabot E. L., B. Angeletti, K. Usdin, A. V. Furano. 1997. Rapid evolution of a young L1 (LINE-1) clade in recently speciated Rattus taxa. J. Mol. Evol. 45:412-423.

Duvernell D. D., B. J. Turner. 1998. Swimmer 1, a new low-copy-number LINE family in teleost genomes with sequence similarity to mammalian L1. Mol. Biol. Evol. 15:1791-1793.

Furano A. V. 2000. The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons. Prog. Nucleic Acid Res. Mol. Biol. 64:255-294.

Goldman N., Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

Grassly N. C., E. C. Holmes. 1997. A likelihood method for the detection of selection and recombination using sequence data. Mol. Biol. Evol. 14:239-247.

Haas N. B., J. M. Grabowski, A. B. Sivitz, J. B. Burch. 1997. Chicken repeat 1 (CR1) elements, which define an ancient family of vertebrate non-LTR retrotransposons, contain two closely spaced open reading frames. Gene 197:305-309.

Hardies S. C., S. L. Martin, C. F. Voliva, C. A. Hutchison III, M. H. Edgell. 1986. An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol. Biol. Evol. 3:109-125.

Hata K., Y. Sakaki. 1997. Identification of critical CpG sites for repression of L1 transcription by DNA methylation. Gene 189:227-234.

Haydon D. T., A. D. Bastos, N. J. Knowles, A. R. Samuel. 2001. Evidence for positive selection in foot-and-mouth disease virus capsid genes from field isolates. Genetics 157:7-15.

Hohjoh H., M. F. Singer. 1996. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 15:630-639.

Hohjoh H., M. F. Singer. 1997. Sequence specific single-strand RNA-binding protein encoded by the human LINE-1 retrotransposon. EMBO J. 16:6034-6043.

Howell R., K. P. Usdin. 1997. The ability to form intrastrand tetraplexes is an evolutionarily conserved feature of the 3′ end of L1 retrotransposons. Mol. Biol. Evol. 14:144-155.

Jones P. A. 1999. The DNA methylation paradox. Trends Genet. 15:34-37.

Kazazian H. H. Jr., J. V. Moran. 1998. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19:19-24.

Kolosha V. O., S. L. Martin. 1995. Polymorphic sequences encoding the first open reading frame protein from LINE-1 ribonucleoprotein particles. J. Biol. Chem. 270:2868-2873.

Kolosha V. O., S. L. Martin. 1997. In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition. Proc. Natl. Acad. Sci. USA 94:10155-10160.

Lander E. S., L. M. Linton, B. Birren, et al. (100 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

Li W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

Lupas A. 1996. Coiled coils: new structures and new functions. Trends Biochem. Sci. 21:375-382.

Lupas A., M. Van Dyke, J. Stock. 1991. Predicting coiled coils from protein sequences. Science 252:1162-1164.

Malik H. S., W. D. Burke, T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793-805.

Martin S. L., F. D. Bushman. 2001. Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol. Cell. Biol. 21:467-475.

Martin S. L., J. Li, J. A. Weisz. 2000. Deletion analysis defines distinct functional domains for protein-protein and nucleic acid interactions in the ORF1 protein of mouse LINE-1. J. Mol. Biol. 304:11-20.

Mayorov V. I., I. B. Rogozin, L. R. Adkison. 1999. Characterization of several LINE-1 elements in Microtus kirgisorum. Mamm. Genome 10:724-729.

Nur I., E. Pascale, A. V. Furano. 1988. The left end of rat L1 (L1Rn, long interspersed repeated) DNA which is a CpG island can function as a promoter. Nucleic Acids Res. 16:9233-9251.

Pascale E., C. Liu, E. Valle, K. Usdin, A. V. Furano. 1993. The evolution of long interspersed repeated DNA (L1, LINE 1) as revealed by the analysis of an ancient rodent L1 DNA family. J. Mol. Evol. 36:9-20.

Pascale E., E. Valle, A. V. Furano. 1990. Amplification of an ancestral mammalian L1 family of long interspersed repeated DNA occurred just before the murine radiation. Proc. Natl. Acad. Sci. USA 87:9481-9485.

Pont-Kingdon G., E. Chi, S. Christensen, D. Carroll. 1997. Ribonucleoprotein formation by the ORF1 protein of the non-LTR retrotransposon Tx1L in Xenopus oocytes. Nucleic Acids Res. 25:3088-3094.

Smit A. F. A. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657-663.

Smit A. F. A., G. Tóth, A. D. Riggs, J. Jurka. 1995. Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J. Mol. Biol. 246:401-417.

Thayer R. E., M. F. Singer, T. G. Fanning. 1993. Undermethylation of specific LINE-1 sequences in human cells producing a LINE-1-encoded protein. Gene 133:273-277.

Voliva C. F., C. L. Jahn, M. B. Comer, C. A. Hutchison III, M. H. Edgell. 1983. The L1Md long interspersed repeat family in the mouse: almost all examples are truncated at one end. Nucleic Acids Res. 11:8847-8859.

Woodcock D. M., C. B. Lawler, M. E. Linsenmeyer, J. P. Doherty, W. D. Warren. 1997. Asymmetric methylation in the hypermethylated CpG promoter region of the human L1 retrotransposon. J. Biol. Chem. 272:7810-7816.

Yang Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.

Yang Z. 2000. PAML (phylogenetic analysis by maximum-likelihood), Version 3.0. University College, London.

Yang Z., J. P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.

Zanotto A. M. d. A., E. G. Kallas, R. F. de Souza, E. C. Holmes. 1999. Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153:1077-1089.