Transposable elements have been shaping the genome throughout evolution, contributing to the creation of new genes and sophisticated regulatory network systems. Today, most genomes (animal and plant) allow the expression, and accommodate the transposition, of a few transposon families. The potential genetic impact of this small fraction of mobile elements should not be underestimated. Although new insertions that happen in germ cells are likely to be passed to the next generation, mobilization in pluripotent embryonic stem cells or in somatic cells may contribute to the differences observed in genetic makeup and epigenetic gene regulation during development at the cellular level. The fact that these elements are still active, generating innovative ways to alter gene expression and genomic structure, suggests that the cellular genome is not static or deterministic but rather dynamic. In this short review, we collect a set of recent observations that point to a new appreciation of transposable elements as a source of genetic variation.
Long dismissed as selfish or ‘junk’ DNA, retroelements are frequently thought to be mere intracellular parasites from our distant evolutionary past. Transposons and transposon-derived repetitive elements account for more than 40% of the human genome. However, only a small proportion of these retroelements, <0.05%, remains able to transpose (Fig. 1). Recent evidence indicates that among the non-long terminal repeat (non-LTR) retrotransposons, only some long interspersed nuclear element-1 (LINE-1, or L1) and short interspersed element (SINE) (Alu and SVA) subfamilies continue to be mobile in mammals today (1, 2).
LINE-1 comprises ∼20% of mammalian genomes ( 3–5 ). On the basis of reverse transcriptase (RT) phylogeny, L1 elements are most closely related to the group II introns of mitochondria and eubacteria ( 6 , 7 ). These studies revealed that the RT enzyme is extremely old and that retroelements can be viewed as relics or molecular fossils of the first primitive replication systems in the progenote. The origin of retroelements possibly traces back to the conversion of RNA-based systems, the RNA World, to modern DNA-based systems. Current models suggest that mobile introns of eubacteria were transmitted to eukaryotes during the initial fusion of the eubacterial and archaebacterial genomes or during the symbiosis that gave rise to the mitochondria, generating the modern-day spliceosomal introns ( 8 ). Further acquisition of an endonuclease enzyme and a promoter sequence certainly represented important steps in the evolution of L1 retrotransposons, providing autonomy for L1s to insert into many locations throughout the genome. Nevertheless, the L1 machinery is frequently hijacked by other non-autonomous elements.
SINEs are known to occur throughout the genomes of eukaryotes. They originate from retrotransposition events of small RNAs such as 7SL RNA, tRNA or their derivatives featuring internal RNA polymerase III promoters (9–12). One specific type of SINE, the Alu element, is the most abundant repetitive element in the human genome; Alu elements emerged 65 million years ago and amplified throughout the human genome by retrotransposition to reach the present number of more than one million copies. Several lines of evidence have demonstrated that these elements modulate gene expression at the post-transcriptional level in at least three independent ways: they have been shown to be involved in alternative splicing, RNA editing and translation regulation (10). In normal growth conditions, Alu RNAs are present at very low levels in the cytosol (10³–10⁴ molecules per cell), but numerous stressful conditions, such as viral infection, cycloheximide exposure or heat shock, can transiently increase their level of expression, which rapidly decreases upon recovery (13, 14). This precisely controlled regulation raised the attractive possibility that Alu RNAs might serve a specific function in cell metabolism that is required during stress conditions. This hypothesis was supported by two independent studies showing that an overexpressed Alu RNA was able to stimulate translation of a co-transfected reporter gene in mammalian cells (13, 14). These data suggested for the first time that transcribed Alu elements might have a particular function in translation regulation.
SVA is a hominid-specific element that was derived from three other repeats (SINE-R, VNTR and Alu). It is the youngest retrotransposon family in primates and is also mobilized by L1 (15). Recent studies have demonstrated that this element is capable of transducing genomic sequences upon its mobilization and may also act as a promoter (16). The relationship between Alu and SVA remains unclear, but it has been proposed that Alus activate the SVA transcript for entry into the trans-mechanism of L1 via a portion of SVA that is homologous to Alu (2).
OPOSSUM GENOME SEQUENCE SURPRISES
The recent publication of the first marsupial genome to be sequenced (that of the short-tailed opossum, Monodelphis domestica) called attention not so much to the protein-coding regions of its genome as to the non-coding portion. Interspersed elements cover ∼52% of the opossum genome, compared with ∼44% for human and ∼38% for mouse (17, 18). That increased amount of transposon-derived sequence was probably generated by the combined activity of three different non-LTR transposons (L1; CR1, a repetitive element more common in birds and reptiles; and RTE, first described in C. elegans), along with a suggested reduced rate of genome recombination, promoting the fixation of the transposed locus (17).
A comparison of opossum and eutherian (placental mammals) genomes revealed a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. Approximately 20% of eutherian conserved non-coding elements are recent inventions that postdate the divergence of eutheria and metatheria (marsupials), most of which arose from sequences inserted by transposable elements. Such data point to transposons as a major creative force in the evolution of mammalian gene regulation ( 17 ).
THE IMPACT OF MOBILE ELEMENTS IN THE GENOME
The proliferation and evolution of retrotransposons have had multiple impacts on the genome, which are only gradually being discovered. Classically, numerous reports have shown that retrotransposition can destabilize the genome, shaping genomic landscapes by insertional mutagenesis, deletions and gene rearrangements. L1s and Alus may also introduce intragenic polyadenylation signals, influencing evolutionary pressures (19). In addition, transposable elements in high copy number can provide a substrate for illegitimate homologous recombination (also called ectopic recombination), causing rearrangements that may be deleterious, advantageous or null (20, 21). The extent of the genomic deletions generated by recombination between Alu elements in the human lineage since the human–chimpanzee divergence, approximately 6 million years ago, was recently estimated (22). The authors identified 492 human-specific deletions (∼400 kb) attributable to Alu-derived ectopic recombination. The majority of the deletions (295 of 492) coincided with known or predicted genes, suggesting that a substantial portion of the genomic differences between humans and chimpanzees was caused by Alu recombination (22). This work indicates that this recombination process has had a much higher impact on the genome than was inferred from previously identified isolated events.
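The figures quoted for Alu-mediated deletions can be put in proportion with a short calculation. The sketch below uses only the numbers given in the text (492 deletions, 295 overlapping genes, ∼400 kb in total); treating ‘∼400 kb’ as exactly 400 000 bp is an assumption made for illustration, so the mean deletion size is approximate.

```python
# Figures quoted in the text for human-specific Alu recombination-mediated deletions (22).
deletions_total = 492
deletions_in_genes = 295
# Assumption: "~400 kb" is treated as exactly 400,000 bp for this rough estimate.
deleted_bp_total = 400_000

# Share of deletions that coincide with known or predicted genes.
fraction_in_genes = deletions_in_genes / deletions_total

# Approximate mean size of a single deletion, in base pairs.
mean_deletion_bp = deleted_bp_total / deletions_total

print(f"Deletions overlapping genes: {fraction_in_genes:.1%}")
print(f"Approximate mean deletion size: {mean_deletion_bp:.0f} bp")
```

Under these assumptions, about three in five deletions overlap a gene, and a typical deletion is well under 1 kb.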
Transposons have also been notably involved in the emergence of new genes during mammalian evolution, mainly by generating pseudogenes. The estimated number of pseudogenes in the human genome is approximately 8000 and the vast majority has been generated by L1 retrotransposition activity ( 23–25 ). Also known as processed pseudogenes or retrogenes (usually lacking intronic regions), they typically contain a 3' poly-A tract and short direct repeats of varying lengths flanking the two ends. These hallmarks can be used to distinguish L1-derived pseudogenes from other types of pseudogenes generated by tandem duplication or unequal crossing-over ( 26 , 27 ).
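As an illustration of how these hallmarks might be screened for computationally, the following minimal sketch checks a candidate insertion for a 3' poly-A tract and for a short direct repeat shared by its two flanks. The function names, the length thresholds (at least 10 adenines; repeats of 7–20 bp) and the toy sequences are hypothetical choices made for this example, not values taken from the cited studies, and the absence of introns is not tested here.

```python
import re


def poly_a_tail_length(insert_seq: str, min_run: int = 10) -> int:
    """Return the length of the A-run at the 3' end, or 0 if shorter than min_run."""
    match = re.search(r"A+$", insert_seq.upper())
    run = len(match.group(0)) if match else 0
    return run if run >= min_run else 0


def flanking_direct_repeat(upstream: str, downstream: str,
                           min_len: int = 7, max_len: int = 20) -> str:
    """Return the longest sequence that both ends the upstream flank and
    starts the downstream flank (an exact direct repeat), or '' if none."""
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


def has_retrotransposition_hallmarks(upstream: str, insert_seq: str,
                                     downstream: str) -> bool:
    """True if the insert ends in a poly-A tract and is flanked by direct repeats."""
    return (poly_a_tail_length(insert_seq) > 0
            and flanking_direct_repeat(upstream, downstream) != "")


# Toy example (made-up sequences): a 10-bp direct repeat flanks the insert.
upstream_flank = "GGCTTACCGT" + "AAGCTTGACA"
candidate = "ATGGCC" + "A" * 15
downstream_flank = "AAGCTTGACA" + "TTGGCC"

print(flanking_direct_repeat(upstream_flank, downstream_flank))
print(has_retrotransposition_hallmarks(upstream_flank, candidate, downstream_flank))
```

Real flanking repeats vary in length and are often imperfect, so an exact-match test such as this one would miss diverged copies; an alignment-based comparison would be the more realistic choice.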
Extensive retrotransposition activity has been proposed to cause the presence in the genome of a vast number of pseudogenes derived from highly expressed embryonic stem (ES) cells and also germline-specific genes ( 25 , 28 , 29 ). Two distinct observations could explain such bias. First, L1 intermediates (mRNA and protein) have been shown to be increased in those cells ( 30 ). Secondly, retrotransposons that emerge in either ES or germline cells have the advantage of becoming bona fide structures to be transmitted to the next generation ( 29 ).
Transcription factors that are essential for the maintenance of ES pluripotency, such as Oct4, Nanog and Stella, have been said to generate more pseudogenes than non-ES cell-specific genes ( 29 ). Likewise, the expression of at least 10 retroposons was detected only in testis-derived tissue, including primary spermatocytes. The fact that those pseudogenes are conserved in mouse and human genomes and the evidence that some of them could actually be transcribed ( 31 ) suggest a functional and evolutionary relevance. Moreover, a potential ‘pseudogene-signature’ of retrotransposition activity for ES and germline cells can be envisioned.
Using a database developed for human processed pseudogenes (www.pseudogene.org), combined with the decay rates for human mRNAs, Pavlicek et al. (26) proposed that not only germline expression but also RNA stability, translational competence and localization (in free ribosomes outside the endoplasmic reticulum) likely contributed to retrotransposition of processed pseudogenes. Moreover, similar to embryos and oocytes, ES cells may also lack interferon response to long, double-stranded RNA, allowing the expression of long reverse-transcribed mRNAs (32, 33). The combination of these features may explain why few genes are actually capable of giving rise to many pseudogenes and ∼90% of the transcribed genome is not (23, 24).
SINEs can also generate novel insertions when hijacking the L1 RT enzyme. It was recently shown that the entire AMAC gene was duplicated three times in the human genome through an SVA-mediated transduction event, creating a hybrid SINE-pseudogene ( 16 ). Surprisingly, two copies of the retroposed AMAC gene can be actively transcribed in different human tissues. The authors proposed that the LTR region from the SVA element could be acting as a promoter for the newly integrated gene duplicate, proposing an explanation for how processed pseudogenes are expressed ( 16 ).
Another intriguing role attributed to transposable elements is a process termed ‘exaptation’, in which relics of transposable elements have acquired a regulatory function, providing the source of some of the most conserved vertebrate-specific genomic sequences (Fig. 2 ). Ultra conserved SINE sequences have recently been proposed to act as distal cis -regulatory elements ( 12 ). Also, there is strong evidence that a SINE-derived conserved region situated 0.5 million bases from the neuro-developmental gene ISL1 can act as an in vivo enhancer for a reporter gene. Moreover, the reporter expression recapitulates the expression pattern of islet1 ( 34 ).
MECHANISMS FOR CONTROLLING RETROTRANSPOSITION ACTIVITY
Given the deleterious nature of many retrotransposon insertions and rearrangements, evolution has generated multiple pathways to inhibit retrotransposition; these pathways are now being elucidated. Recent evidence has linked miRNA and siRNA pathways with retrotransposon inhibition. In Drosophila, a novel class of repeat-associated small interfering RNAs (rasiRNAs) homologous to the retrotransposons' antisense strand has been shown to prevent the detrimental retrotransposition of the gypsy endogenous retrovirus (35, 36). This pathway is Dicer-independent and functions through Piwi, a member of the Argonaute family, which acts as slicer for the rasiRNAs, thereby inhibiting retrotransposition in germline cells (37). In addition, recent reports have indicated that AGO3, another member of the Argonaute family, associates with and processes sense-strand rasiRNAs, leading to a model in which formation of the rasiRNA 5′ terminus is guided by rasiRNA transcripts from the opposite strand and by the activity of PIWI (38). Mammalian PIWI proteins (MIWI and MILI) have also been identified and demonstrated to associate with PIWI-interacting RNAs (piRNAs) to regulate spermatogenesis (39–41). Moreover, Mili-null mice express higher amounts of L1 and intracisternal A particle (IAP) mRNA and lose the DNA methylation of L1 elements. On the basis of this evidence, the authors (41) propose that Mili regulates L1 and IAP elements in mouse.
In plants, DNA methylation can be induced by double-stranded RNA through the RNAi pathway, a response known as RNA-directed DNA methylation. This process requires a specialized set of RNAi components, including AGO4. This member of the Argonaute family binds to small RNAs originated from transposable and repetitive elements, cleaving target RNA transcripts ( 42 ). The model suggests that AGO4 can function at target loci through two distinct and separable mechanisms. First, AGO4 can recruit components that signal DNA methylation in a manner independent of its catalytic activity. Secondly, AGO4 catalytic activity can be crucial for the generation of secondary siRNAs that reinforce its repressive effects. A similar mechanism has not yet been found for silencing mammalian transposable elements.
In humans, evidence that the RNAi pathway can silence L1 retrotransposon activity was recently found in cultured cell lines (43, 44). The double-stranded RNA generated by the sense and antisense activity of the L1 5′-UTR promoter region is subject to processing, generating siRNAs that may subject neighboring L1 elements to epigenetic regulation (44). Recent research has also indicated that miRNAs can act to silence retrotransposon elements. Unlike the rasiRNAs, which are processed from long, double-stranded RNA precursors, these miRNAs arise from conventional hairpins. Four miRNAs have been shown to target MIR/LINE-2 elements (45), and 30 miRNAs, a remarkably large number, have exhibited seed matches to Alu elements (46).
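To make the notion of a ‘seed match’ concrete, the sketch below scans a target sequence for sites complementary to a miRNA seed. It assumes the common convention that the seed spans miRNA nucleotides 2–8 and that a match is a perfect Watson–Crick complement of that heptamer; neither the convention nor the function names come from the studies cited above, and the sequences are made up for the example.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def seed_match_positions(mirna: str, target: str,
                         seed_start: int = 2, seed_end: int = 8) -> list[int]:
    """Return 0-based start positions in `target` that perfectly pair with
    the miRNA seed (nucleotides seed_start..seed_end, 1-based, inclusive).
    DNA input is accepted: T is converted to U before searching."""
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)  # sequence expected in the target
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


# Hypothetical miRNA and target; the seed ACGGAUC pairs with GAUCCGU.
toy_mirna = "UACGGAUCCAUGGCUAACGUAA"
toy_target = "GGCCGAUCCGUAAUUGAUCCGUCC"

print(seed_match_positions(toy_mirna, toy_target))
```

A seed match found this way is only a candidate site; it says nothing about whether the miRNA actually regulates the transcript in vivo.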
Other cellular inhibitors of retrotransposon action include the APOBEC family of proteins. Originally shown to have intrinsic antiretroviral ability against HIV and other retroviruses, several members of the APOBEC family have since been shown to inhibit retrotransposon action ( 47–50 ). In humans, APOBEC 3A inhibits L1 retrotransposition as much as 85% ( 47 ), whereas APOBEC 3B, 3C and 3F all decrease the rate of L1 retrotransposition by ∼75% ( 47 , 51 ); in contrast, APOBEC 3D, 3G and 3H have little effect on L1 activity ( 47 , 51 ). The effect on L1 seems to be independent of APOBEC deaminase activity and may occur through a novel but still unknown mechanism ( 51 , 52 ). APOBEC 3G has been shown to inhibit the activity of Alu elements ( 53 ) by sequestering Alu RNAs in cytoplasmic high-molecular-weight complexes away from L1 elements.
Rather than viewing retrotransposons as only a destabilizing genomic entity, recent reports indicate that the co-evolution of retrotransposons with their host genomes has led to the incorporation and integration of retrotransposons into complex genomic processes. For instance, given that Alu is the most prominent human repeat and is found in >5% of all human 3′-UTRs (54), new evidence for miRNA targeting of Alu sequences suggests the intriguing prospect of Alu participation in gene expression regulation on a global scale. The relationship between miRNAs and retrotransposons is nevertheless complex: recent genomic analysis provides evidence that, in addition to being miRNA targets, Alu elements can drive miRNA expression in the chromosome 19 miRNA cluster through RNA polymerase III (55).
Recent evidence from Morrish et al. (56) suggests that L1 retrotransposition may represent an ancestral mechanism for telomere repair. The authors utilized cell lines that were deficient in non-homologous end-joining and had telomere capping defects induced either through the loss of DNA-dependent protein kinase catalytic subunit activity or through overexpression of dominant negative TRF2. They found that L1s containing a disabled endonuclease domain (ENi) could use the 3′OH overhang present at a dysfunctional telomere as a substrate to initiate retrotransposition, and they demonstrated orientation-specific retrotransposition events at the terminal telomere repeat in ∼30% of ENi events. This seemingly convoluted appropriation could represent a mechanism for RNA-mediated DNA repair associated with non-LTR retrotransposons before the development of the endonuclease domain, illustrating distinct similarities between ENi retrotransposition and the action of telomerase.
There is now abundant evidence for the role of retrotransposons in modulating post-transcriptional gene expression through influences on alternative splicing, RNA editing and translation regulation. Retrotransposons contain an internal polymerase II promoter, allowing them to influence transcription of neighboring genes and to act as alternative promoters (9). In addition, L1 and Alu contain numerous internal splice donor and splice acceptor sites that act to influence splicing patterns and expression of many genomic ESTs (57). In a computational comparison of eight vertebrate species, the majority of new exons were found to contain repetitive sequences, especially Alus (58), suggesting that only very few mutations are necessary to convert a repeat to a novel exon. There is also evidence to indicate that repeat elements, especially Alus, are abundant targets of adenosine-to-inosine RNA editing (59). Although the consequences of this editing are as yet unknown, it may act as a mechanism for introducing new intronic splice sites, thereby affecting new exon formation. Lev-Maor et al. (60) have shown that primate-specific Alu-exon formation depends on RNA editing for exonization in a tissue-specific fashion, and that only a very few mutations are needed to exonize an Alu. Retrotransposons have also recently been shown to contain transcription factor-binding sites, and most binding sites are for factors known to be early markers of development (61). Lastly, new research implicates retrotransposons in protein translation: Alu elements can act to increase translation initiation and, in complex with SRP9/14 protein, can inhibit translation (10).
UNCOVERING SOMATIC RETROTRANSPOSITION
The magnitude of endogenous retrotransposition in somatic cells is still unknown, and retrotransposons are often assumed to be active only in germline or early ES cells. This assumption is mainly based on the silencing of transposon expression by DNA methylation in somatic cells. However, methylation can be regulated, especially during the differentiation process in several tissues. Also, the fidelity of DNA methylation is not perfectly reproduced after each cell division, allowing a window of opportunity for retrotransposition to happen. Moreover, non-phenotypic integrations are likely to be overlooked, causing a detection bias, where only visible mutants (usually leading to diseases) are scored.
The notion that somatic cells are free of new retrotransposition events is slowly changing. The development of better detection tools, such as synthetic L1 elements carrying reporter genes, is revealing the likelihood of retrotransposition in several somatic tissues (62, 63). If the behavior of artificial L1s used in these studies is similar to that of endogenous elements, each individual can be considered to be a living library of de novo insertions, allowing stochastic gene expression in every single cell of the body. Of interest, we recently showed that L1 retrotransposition might have an insertional preference for neuronal genes when active in progenitor cells during the neuronal differentiation process (64). Newly generated L1 insertions can affect gene expression and influence neuronal fate. Such an observation indicates that neurons are a genetic mosaic for L1 content and that this phenomenon may contribute to neuronal diversity (65). The actual amount of endogenous mobilization remains to be determined, as does whether such genetic mosaicism can influence neuronal networks and, as a result, have a functional consequence on behavior (66).
Following a detailed characterization of chimpanzee-specific L1 subfamily diversity and a comparison with their human-specific counterparts, it has been reported that L1 elements have experienced different evolutionary fates in humans and chimpanzees within the past approximately 6 million years ( 67 ). Although the species-specific L1 copy numbers are on the same order in both species (1200–2000 copies), the number of retrotransposition-competent elements appears to be much higher in the human genome than in the chimpanzee genome ( 67 ).
Mobile elements are highly polymorphic in human populations (68–70). Taking advantage of that characteristic, Witherspoon et al. (71) used L1 and Alu insertion polymorphisms to analyze human population structure. After genotyping 75 recent, polymorphic L1 insertions in 317 individuals from 21 populations in sub-Saharan Africa, East Asia, Europe and the Indian subcontinent, they obtained congruent results that support the recent African origin model of human ancestry. Collectively, L1 insertion polymorphisms represent useful markers for population diversity and evolutionary studies. Also, as shown for Alu elements, L1 insertion polymorphisms have potential use in forensics (72).
Such polymorphisms are probably generated in germ cells or during early embryonic stages. In support of this assertion, we recently showed that human ES cells (HESC) express endogenous L1 elements and can accommodate L1 retrotransposition in vitro . The resultant retrotransposition events can occur into genes and can result in the concomitant deletion of genomic DNA at the target site. These data suggest that L1 retrotransposition events are likely to occur during early stages of human development, contributing to the genome fluidity and variability of HESC ( 30 ). In fact, an endogenous L1 retrotransposition in the CHM gene of a patient with choroideremia, an X-linked progressive eye disease, revealed that the insertion occurred very early during the embryonic development of the patient's mother ( 73 ).
It is estimated that only 100 or so L1 copies in a given genome are capable of retrotransposition; roughly 10% of these active elements are classified as ‘hot’ or highly active in an artificial culture system ( 74 ), probably due to allele-specific differences in retrotransposition capability ( 75 ). Interestingly, L1 activity seems to vary depending on the population analyzed, indicating that there is substantial individual variation in retrotransposition capability ( 75 ). Of course, the ability of individual L1 elements to retrotranspose is related to the chromatin context in which the element is located in the genome, independent of being a ‘hot’ element in culture. The different levels of retrotransposition could be dependent not only on the population, but also on the epigenetic regulation that the full-length element is subjected to at a given time. We predict that such variation may be directly or indirectly able to influence individual personalities and predisposition to certain diseases.
CONCLUDING REMARKS AND FUTURE DIRECTIONS
It has been more than 20 years since autonomous transposable elements were suggested to be selfish sequences, parasites in the genome that increased the amount of ‘junk’ DNA in the different living organisms ( 76 , 77 ). In the interim, considerable evidence suggesting how transposable elements can shape the genome through evolution has challenged this idea. As more and more genomes are being sequenced, our understanding of the impact of transposable elements in the genome increases. The literature holds several examples that illustrate how transposons can serve as a reservoir of genetic innovation. Such innovation includes the creation of new coding or non-coding genes, with beneficial functions to the cells and/or modifying regulatory cis elements that may affect the balance of bound and free regulatory factors. Moreover, it is becoming clear, due to mechanistic similarities, that the epigenetic regulation of certain genes is derived from the defense mechanism against the activity of ancestral transposable elements.
Most of these modifications must have happened in germ cells or progenitor germ cells to be passed to the next generation, and only then be shaped by evolution. However, a completely new paradigm may be anticipated. ES cells are the perfect incubator for the transposable genetic toolbox to act, contributing to differences in gene regulation in a tissue-specific manner (Fig. 3 ). Furthermore, the realization that active transposons are mobile in somatic tissues brings a new dimension to the understanding of individual development and challenges the idea of a single and static genome per individual. Organisms are likely genetic mosaics influenced by environmental conditions to elicit de novo insertions from transposable elements. It remains to be seen how such genetic mosaicism contributes to individual differences, from cognition to disease predisposition.
A.R.M. is supported by the Rett Syndrome Research Foundation; M.C.N.M. is supported by the George E. Hewitt Foundation for Medical Research; F.H.G. is supported by the Lookout Fund, the National Institutes of Health (National Institute on Aging and National Institute of Neurological Disorders and Stroke) and the Picower Foundation. The authors would like to thank M.L. Gage for editorial comments. We regret that, owing to space limitations, we have been unable to include all the recent exciting literature about transposable elements in the different organisms.
Conflict of Interest statement. None declared.