Discrete subcellular partitioning of human retrotransposon RNAs despite a common mechanism of genome insertion

Despite the immense significance retrotransposons have had for genome evolution, much about their biology is unknown, including the processes of forming their ribonucleoprotein (RNP) particles and transporting them about the cell. Suppression of retrotransposon expression, together with the presence of retrotransposon sequence within numerous mRNAs, makes tracking endogenous L1 RNP particles in cells problematic. We overcame these difficulties by assaying in living and fixed cells tagged RNPs generated from constructs expressing retrotransposition-competent L1s. In this way, we demonstrate for the first time the subcellular colocalization of L1 RNA and proteins ORF1p and ORF2p, and show their targeting together to cytoplasmic foci. Foci are often associated with markers of cytoplasmic stress granules. Furthermore, mutation analyses reveal that ORF1p can direct L1 RNP distribution within the cell. We also assayed RNA localization of the nonautonomous retrotransposons Alu and SVA. Despite a requirement for the L1 integration machinery, each manifests unique features of subcellular RNA distribution. In nuclei Alu RNA forms small round foci partially associated with marker proteins for coiled bodies, suborganelles involved in the processing of non-coding RNAs. SVA RNA patterning is distinctive, being cytoplasmic but without prominent foci and concentrated in large nuclear aggregates that often ring nucleoli. Such variability predicts significant differences in the life cycles of these elements.


INTRODUCTION
Until 40 years ago a basic tenet of biology was the unidirectional flow of information from DNA to RNA and thence to production of protein (1). This notion seems quaint now that we understand that almost 40% of human DNA is a consequence of reverse transcriptases copying RNA to cDNA, which is thence inserted back into the genome. The master element responsible for the majority of these mobile DNA insertions is the LINE-1 (L1) retrotransposon. L1s alone comprise 17% of the genome, although most of their 500,000 copies are mutated, truncated, rearranged and otherwise incapable of further retrotransposition. Nevertheless, at least 100 potentially retrotransposition-competent L1s are estimated to reside in the average diploid genome (2). L1s have also been responsible for genomic insertion of approximately 8000 human processed pseudogenes and over a million SINEs, notably Alus and SVAs (3)(4)(5). There are 65 known human disease-causing insertions of L1s, Alus and SVAs (reviewed in 6).
Despite the immense significance retrotransposons have had for genome evolution, much about their biology is unknown, including the processes of forming their ribonucleoprotein (RNP) particles and moving them about the cell. The 6.0 kb full-length human L1 (Fig. 1B) has a 900-nucleotide 5′ untranslated region (UTR) that functions as an internal promoter, driving expression of two open reading frames (ORF1 and ORF2) that are separated by a short intergenic spacer (IGS). ORF2 encodes a 150 kDa protein (ORF2p) with endonuclease and reverse transcriptase activities. Most attention has focused on the 40 kDa RNA-binding ORF1 protein (ORF1p), especially that of the mouse, but while it is absolutely essential for L1 retrotransposition, its precise function remains unclear (7).
Primate Alus are about 300 bp in length and dimeric in structure, consisting of two related but not identical right and left arms joined by an A-rich linker (Fig. 1G). The arms originated over 55 million years ago from 7SL RNA of the signal recognition particle (SRP) (see 8 for review). Although the Alu terminates in a poly(A) tail, it is transcribed by polymerase III, which is forced to find its T-rich terminator downstream of the element in flanking DNA. Alus are non-autonomous retrotransposons and, since they encode no protein, depend upon the L1 machinery for retrotransposition, a symbiotic relationship that has been recapitulated in a cell culture assay (9). While not essential, L1 ORF1p enhances Alu retrotransposition (10). Pol III-transcribed Alu RNAs are rare and may exist as only 100-1000 copies per cell (11,12). However, Alu sequences are very common within non-coding regions of mature mRNAs and within the introns of heterogeneous nuclear RNAs (hnRNAs).
Hominid genomes also contain a composite nonautonomous retrotransposon termed SVA (Fig. 1H), an acronym for its component parts: (i) 5′-terminus (CCCTCT)n hexamer repeats, (ii) two antisense Alu-homologous fragments separated by unrelated sequence, (iii) a region of 30-50 bp variable number tandem repeats (VNTR) and (iv) 490 bp of SINE-R sequence, derived from the long terminal repeat (LTR) of a human endogenous retrovirus (HERV-K(HML2)). Curiously, SVAs lack obvious promoters at their 5′ ends, suggesting that active elements may be transcribed from promoters fortuitously located in flanking upstream DNA. Like L1s, SVAs terminate with a poly(A) signal and tail, are flanked by short target site duplications (TSDs), and are frequently truncated, altogether strong evidence that they are mobilized by L1s. Despite a small copy number (about 3000 in humans), SVAs are retrotranspositionally quite active, being highly polymorphic and the cause of six known cases of human disease (13,14).
Transposons can be the stuff that genes are made of. Retrotransposition occasionally generates target site deletions, some quite large, or adds non-retrotransposon DNA to the genome by processes termed 5′- and 3′-transduction. Recombination between non-homologous retrotransposons causes deletions, duplications or rearrangements of gene sequence. Ongoing retrotransposition salts genomes with novel splice sites, adenylation signals and promoters, and thus builds new transcription modules. Retrotransposon sequence may be captured and incorporated into mRNAs, a phenomenon called 'exonization'. It has been estimated that 5% of human alternatively spliced internal exons originate in Alus, a phenomenon fostered by 'pseudosplice sites' and occasionally by RNA editing. A recent comprehensive study by Zhang and Chasin (15) confirms that high copy number repeats are the most important sources of new genes in humans and mice, and by implication novel proteins. Nekrutenko and Li (16) noted that 4% of human protein-coding regions contain transposable element sequence (mostly L1s and Alus).
In turn, the cell has evolved an array of defenses to stand against unfettered retrotransposition. Among these are DNA methylation, chromatin remodeling, RNA-induced silencing and the activity of APOBEC3 and ADAR nucleic acid deaminases (reviewed in 17). Transcription and translation inhibition is also important. As a consequence, endogenous full-length Alu and L1 RNA transcripts are hard to detect in most cell types, although expression levels, especially for Alus, increase under some conditions of stress and viral infection. L1 proteins, especially ORF2p, are suppressed in most cells, but elevated levels have been detected in human testicular vascular endothelial and Leydig cells (18). We are only beginning to clearly understand the interweaving of these various mechanisms of transposable element control.
Suppression of retrotransposon expression, together with the presence of retrotransposon sequence within so many mRNAs, means that tracking endogenous L1 RNPs in cells is very difficult. We previously reported that L1 ORF2p overexpressed alone in a modified vaccinia virus Ankara/T7 RNA polymerase (MVA/T7RP) hybrid system is predominantly cytoplasmic, and that both ORF1p and ORF2p are detected in nucleoli of a small percentage of cells. A nucleolar localization signal was mapped within the endonuclease domain of ORF2p (19). We also demonstrated that both endogenous and overexpressed L1 ORF1p colocalize with proteins that mark cytoplasmic stress granules (20). Stress granules are discrete cytoplasmic aggregates which can be induced by a range of stress conditions, including heat shock, hypoxia, osmotic shock, oxidative stress, viral infection and the overexpression of some cellular proteins. Stalled pre-initiation complexes are found in stress granules and include the small but not the large ribosomal subunit bound to translation initiation factors such as eIF2 and eIF3 (reviewed in 21). ORF1p foci form constitutively in the absence of exogenous stress, suggesting that the cell may recognize overexpression of the L1 itself as a stress and target L1 RNPs to granules where they can be sequestered or targeted for degradation. However, although ORF1p foci contain polyadenylated RNA, we previously failed to detect full-length functional L1 RNA in the granules (20).
We sought further understanding of the retrotransposition life cycle by assaying cytologically in both living and fixed cells the subcellular distribution of RNPs of the three active human retrotransposons. We overcame inherent difficulties in detecting endogenous retrotransposon RNA by assaying RNPs generated from transfected constructs expressing L1s, Alus and SVAs. For the first time, we can report that L1 RNA, ORF1p and ORF2p colocalize together in the cytoplasm in granules. We show that ORF1p with an intact RNA-binding domain can direct L1 RNP distribution within the cell, including entry into the nucleus. Furthermore, we compare the subcellular partitioning of L1 RNA with that of Alu and SVA elements and discover significant differences, despite a presumed common mechanism of retrotransposition insertion. Each retrotransposon's RNP is associated with discrete and different subcellular compartments: the L1 RNA with cytoplasmic granules, Alu RNA with nuclear bodies and SVA RNA with novel nuclear aggregates. It is anticipated that data, reagents and techniques described here will help lay groundwork for more detailed studies of L1 RNP expression, assembly, and movement in cells.

L1 RNA colocalizes with ORF1p in cytoplasmic foci
To track L1 RNA in both living and fixed cells we took advantage of the MS2-NLS-GFP system (22, reviewed in 23). The two components of this system are bacteriophage MS2 coat protein (CP) fused with GFP (Fig. 1A), and an mRNA tagged with a tandem array of 19 bp stem loops able to bind MS2 CP. When expressed alone in cells, a nuclear localization signal (NLS) directs MS2-NLS-GFP fusion protein to the nucleus (Fig. 2A). However, when cotransfected with an episome expressing tagged RNA, MS2-NLS-GFP protein is tethered to the stem loops and marks the RNA in the cytoplasm or at sites of concentration in the nucleus by GFP fluorescence. We cloned six CP-binding stem loops into the 3′-UTR of a retrotransposition-competent, full-length L1 (L1-MS2, Fig. 1B) and assayed for localization of its RNA. MS2-NLS-GFP protein in the presence of tagged L1 RNA was concentrated in cytoplasmic foci in cotransfected HEK 293T cells (Fig. 2B) (20). This pattern of cytoplasmic RNA was reproduced in fixed 293T cells (Fig. 2C), human breast cancer MCF7 cells and human osteosarcoma U2OS or 143Btk- cells by fluorescent in situ hybridization (FISH) employing a DNA probe (Cy3-MS2) to the MS2 stem loops in the 3′-UTR of the L1 RNA. MS2-FISH analysis also detected L1 RNA in the nucleus, but generally excluded from nucleoli. Overnight treatment with RNaseA abolished most cytoplasmic signal, although foci were partially resistant, suggesting a densely packed protein-RNA composition (data not shown). A different probe, Cy3-SV40-1, situated upstream of the vector's SV40 poly(A) signal, detected a similar pattern of RNA derived from MS2 stem loop-tagged or untagged L1s (data not shown). In addition, northern blot analysis demonstrated that the presence of MS2 stem loops did not affect levels of L1 RNA expression (Fig. 1I).
Human Molecular Genetics, 2010, Vol. 19, No. 9
Employing a previously characterized polyclonal antibody recognizing the C-terminus of exogenously expressed ORF1p (19), we detected ORF1 protein in cytoplasmic foci together with its RNA (Fig. 2D). On the other hand, L1 RNA lacking MS2 stem loops failed to tether coexpressed MS2-NLS-GFP protein and no RNA foci could be detected in the cytoplasm (Fig. 2E). We previously reported that a double point mutation, N157A/R159A, in the conserved middle domain of ORF1 almost completely abolished ORF1p cytoplasmic foci (20). Introducing the same mutation into full-length L1 similarly abrogated RNA foci formation, but in only 60% of transfected 293T cells (data not shown). This suggests that although ORF1p is important for directing L1 RNA to granules, additional factors may also help target RNA to foci.
Both exogenous and endogenous ORF1 protein colocalizes in unstressed and, to greater degree, in stressed cells with proteins that mark cytoplasmic stress granules (20). Using sequential MS2-FISH and protein immunofluorescence, L1 RNA is seen overlapping with the stress granule marker proteins T-cell intracellular antigen (TIA-1, Fig. 2F and G) and eukaryotic initiation factor 3 (eIF3h, not shown) in some cells. In other cells, protein and RNA foci were in juxtaposition (Fig. 2H), although treatment with thapsigargin (a compound that blocks calcium uptake into the ER lumen and triggers the unfolded protein stress response) increased the incidence of coalignment. Nevertheless, in many cells L1 RNP granules were present alone, unmarked by stress granule proteins. The reason for this ambiguous patterning is so far unclear, although it would suggest that not all L1 RNP foci form bona fide stress granules (at least as defined by these marker proteins). As previously reported for ORF1 protein (20), processing (P) body marker proteins did not generally overlap, although they occasionally juxtaposed with L1 RNA foci (Fig. 2I and J). P-bodies and stress granules are often found in close proximity (24).

ORF1p mediates the subcellular distribution of ORF2p
Unfortunately, ORF2 protein is expressed at a much lower level than ORF1p, making it very difficult to image (19,25). Using an affinity purified polyclonal antibody (α-ORF2-C) that recognizes an epitope (residues 1259-75) at the very end of ORF2p, we successfully detected the protein in cells transfected with a vector containing CMV promoter, L1.3 5′-UTR and ORF2 (Fig. 1C; 26). Expressed alone in this manner, ORF2 protein was typically uniformly distributed throughout the cytoplasm, without foci and faintly evident in nuclei (Fig. 3A). When ORF2 protein was transiently overexpressed by the MVA/T7RP hybrid system (19), a single prominent band of about 150 kDa was detected by western blotting and α-ORF2-C (Fig. 3B).
As previously reported, ORF1p expressed alone can induce foci formation (19). To examine its behavior in association with ORF2p, we transfected 293T cells with a vector (ORF1-EGFP L1-RP WT; Fig. 1D) containing a CMV promoter, ORF1 C-terminally tagged with enhanced green fluorescent protein (EGFP), followed by intact downstream L1 sequence (IGS, ORF2, 3′-UTR). This construct is active in the cell culture retrotransposition assay (27) at 30% activity of full-length L1-RP. Although Alisch et al. (26) determined that an L1 construct with ORF1 replaced entirely with GFP remained competent for ORF2 translation, interestingly α-ORF2-C was able to detect the protein in only a small fraction of ORF1p-EGFP positive 293T cells. Previously, we showed that a point mutation in the EN domain, known to abolish both nicking and retrotransposition in cell culture (27,28), increased expression of ORF2p. A second mutation in the reverse transcriptase domain further increased protein levels (19). By introducing both mutations (ORF1-EGFP L1-RP EN RT MT), we boosted the number of ORF2p-positive cells (without changing the cellular pattern), although still to only 1 or 2% of transfected cells.
Importantly, for the first time we are able to visualize ORF2 protein in cells. We can image L1 RNA, ORF1p and ORF2p, all expressed from the same L1 construct, and report their colocalization in the cytoplasm and their targeting to granules. This same pattern was seen for construct ORF1-EGFP L1-RP WT (Fig. 3C), as well as an L1 construct with ORF1 modified by a C-terminal T7-epitope tag and detected with anti-T7 antibody (not shown). Combined colocalization of ORF1p, ORF2p and L1 RNA to foci was also detected in MCF7 cells.
Our data suggest that ORF1p must bind the L1 RNP in order to cause ORF2p to enter cytoplasmic granules. We altered ORF1 in ORF1-EGFP L1-RP EN RT MT by point mutation or truncation (Fig. 1E and F). Neither truncated ORF1p fragment 1-130 nor full-length ORF1p with middle domain point mutations could induce foci formation. Coexpressed ORF2p similarly remained diffusely cytoplasmic without granules (Fig. 3D), the same pattern seen when the protein is expressed alone in the absence of ORF1p (Fig. 3A). On the other hand, ORF1p residues 131-339 span a recently discovered novel RRM1 RNA-binding domain and a C-terminal domain (CTD) that together are necessary and sufficient to bind RNA (29). Interestingly, this GFP-tagged ORF1p 131-339 fragment is predominantly nuclear, but retains foci remnants in some cells. When coexpressed from L1 RNA, ORF2p and the ORF1p 131-339 fragment track together, both found in cytoplasmic foci in some cells and colocalized in nuclei (Fig. 3E). Together the evidence shows that ORF1 protein having an intact RNA-binding domain can escort L1 RNPs about the cell, carrying along ORF2p also bound within the RNP.
A majority of Alu RNA is nuclear but can be sequestered by L1 ORF1p in cytoplasmic granules
Dewannieux et al. (9) assayed in cell culture an Alu retrotransposon of the young, active Ya5 subfamily, joined with a 7SL gene upstream pol III enhancer and tagged with the neo TET reporter cassette (a neomycin phosphotransferase gene interrupted by a self-splicing Group I intron) and discovered that the Alu could be mobilized in trans at high frequency by a cotransfected episomal L1. A minimal pol III terminator (a TTTT motif 28 nts downstream of the Alu poly(A) tract) marked the 3′-end of the element. We removed the C-terminal reporter cassette from this construct and either replaced it with an MS2 stem-loop array or reconstructed the wild-type sequence in order to assay Alu RNA localization in living and fixed cells (Fig. 1G). In addition, a member of the older AluSx subfamily, 93% identical to the Ya5 element and differing in its terminator sequence (isolated from a BC1 SINE), was cloned downstream of the 7SL enhancer and assayed for RNA localization in cells (Fig. 1G). This Alu is also active for retrotransposition in cell culture (30). Full-length Ya5 Alu RNA was detected by MS2-FISH in both nuclei and cytoplasm of human HEK 293T, HeLa and U2OS cells, with nuclear predominance in a majority of cells (especially 293T cells, Fig. 4A and B). In nuclei of living or fixed cells, both MS2-NLS-GFP binding and the RNA FISH Cy3-MS2 probe showed Alu RNA concentrated in typically fewer than 20 bright, round nuclear foci. This pattern was not restricted to tumor cells since fluorescing foci were also seen in nuclei of transfected primary human fibroblasts (Fig. 4C). The degree of general nucleoplasm staining differed between cells, and in 25% of cells nuclear signal resided almost exclusively in foci (Fig. 4B). The reason for this stochastic variability is unknown.
A FISH probe (Cy3-Alu) targeting wild-type Alu RNA lacking MS2 repeats also detected a similar pattern in cells transfected with either the Ya5 or AluSx construct (not shown). In non-transfected cells, this Alu probe generated nucleolar-excluded, uniform nucleoplasm staining that disappeared upon RNase treatment. However, it is likely that hnRNAs containing Alu sequence, rather than true pol III-transcribed Alus, were the source of this endogenous signal.
Unlike L1 RNA, targeting to prominent cytoplasmic foci is not an obvious feature of Alu RNA distribution. However, Alu RNA expressed in the presence of GFP-tagged ORF1p was bound and sequestered by ORF1p in cytoplasmic granules, without any disruption of Alu nuclear foci (Fig. 4D). On the other hand, coexpressed ORF2 protein had no effect on the cytoplasmic distribution of Alu RNA, and colocalization in the nucleus was not apparent (Fig. 4E).

Alu RNA can associate with markers of Cajal bodies
We sought to determine the nature of the prominent Alu RNA-rich nuclear foci of 293T cells by incubating MS2-NLS-GFP-tagged cells with antibody markers of known nuclear bodies. Foci were not costained by antibodies recognizing nucleolar proteins (Fig. 4F). In addition to nucleoli and chromosomes, the eukaryotic nucleus is compartmentalized into a number of small, membrane-less organelles (summarized in 31). Alu RNA foci were not marked by SC35, the canonical protein of nuclear speckles, presumed sites of storage for mRNA splicing and processing factors (Fig. 4G) (32). A subset of polymerase III transcripts, including RNaseP RNA, MRP RNAs and some hY RNAs, concentrate in the perinucleolar compartment (PNC), an irregularly shaped structure of unknown function capping the nucleoli of mostly transformed cells (33,34). Wang et al. (35) reported endogenous Alu RNA in the PNC of HeLa cells. However, we failed to detect obvious colocalization of Alu RNA foci of 293T cells with polypyrimidine tract binding protein (PTB), the main marker of PNCs, and only a minor percentage of Alu foci were seen adjacent to nucleoli (Fig. 4H). Furthermore, Alu foci were not associated with the proliferating cell marker antigen Ki67, which enters nucleoli and also forms nuclear foci not coincident with coiled bodies (36, Fig. 4I).
Coiled or Cajal bodies (CBs) are involved in maturation and cycling of certain classes of snoRNPs and snRNPs and are typically defined by the presence of the protein p80-coilin (37,38). Survival of motor neurons (SMN) protein is also found in CBs, as well as in the functionally and structurally related bodies called gems (gemini of Cajal bodies) (39). Mutations in SMN can lead to Spinal Muscular Atrophy, a neuromuscular disorder characterized by loss of motor neurons (40). In some cell lines, such as the HeLa PV strain and many fetal tissues, SMN protein separates from CBs to form discrete bodies or 'gems' (41). Most cultured cells have 10% separate gems and CBs (42). CBs themselves are dynamic structures, assembling and disassembling, and differing in number per cell (typically 0-10). A subset of Alu RNA foci did not directly overlap, but were closely juxtaposed with SMN- and, somewhat less frequently, coilin-rich bodies (Fig. 4J-L). Not all Alu RNA foci in a single cell associated with these marker proteins, and in a significant number of 293T cells Alu foci existed without CBs being present (Fig. 4K). Nevertheless, their association was evident.
In a similar manner, but less frequently, Alu foci associated with promyelocytic leukemia (PML) bodies, as revealed by costaining with PML protein (Fig. 4M). The function of PML bodies is unknown, although they have been linked with various cellular functions, including transcriptional regulation, storage, DNA repair and cell cycle regulation. There is often partial overlap between a PML body and a CB in a given cell, mediated in part by interaction between coilin and PIASy, a PML body protein (43,44).
hTR, the RNA component of telomerase, passes through CBs. Its 3′ region shares structural features with H/ACA box snoRNAs and scaRNAs (small Cajal body RNAs that guide modifying enzymes to their targets). Two hairpins are connected by a single-stranded hinge and are followed by a tail containing conserved H (consensus AnAnnA) and ACA boxes, respectively (45). The terminal loops of both hairpins of scaRNAs and the 3′ hairpin of hTR contain Cajal body box (CAB) motifs (consensus UGAG) that direct accumulation of these RNAs in CBs (46,47). It struck us that dimeric Alu RNA also consists of two hairpins, the right and left monomers, connected by a poly(A) hinge and ending in a tail having sequence similar to an ACA box in some elements. Moreover, terminal loops of both monomers contain a variant CAB box (CGAG) motif (for a detailed representation of the secondary structure of Ya5 Alu RNA see Fig. 4 in ref. 48). However, when the invariant AG residues of the putative CAB boxes of Alu RNA were altered (A80U/G81C and A233U/G234C), nuclear foci in 293T cells were not visibly disrupted (data not shown).
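The motif comparison in this paragraph can be illustrated with a minimal sketch that scans an RNA string for the consensus sequences quoted above (CAB box UGAG, variant CAB box CGAG, H box AnAnnA, ACA box) and then applies an AG-to-UC substitution of the kind used in the A80U/G81C mutant. The function names, the regular-expression encodings and the toy sequence are illustrative assumptions, not part of the study; the toy sequence is not a real Alu or hTR sequence, positions are 0-based, and a sequence match says nothing about whether a motif sits in a terminal loop.

```python
import re

# Consensus motifs as quoted in the text (n = any nucleotide).
# The regex encodings and all names here are illustrative assumptions.
MOTIFS = {
    "CAB box (UGAG)": "UGAG",
    "variant CAB box (CGAG)": "CGAG",
    "H box (AnAnnA)": "A[ACGU]A[ACGU]{2}A",
    "ACA box": "ACA",
}


def find_motifs(rna):
    """Return {motif name: [0-based start positions]} for an RNA string."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() for m in re.finditer("(?=" + pattern + ")", rna)]
    return hits


def mutate_ag_to_uc(rna, pos):
    """Swap the AG dinucleotide at 0-based pos for UC (an A->U / G->C change)."""
    if rna[pos:pos + 2] != "AG":
        raise ValueError("expected AG at the given position")
    return rna[:pos] + "UC" + rna[pos + 2:]


# Hypothetical toy sequence, NOT a real Alu or hTR sequence.
toy = "GGCGAGAAUGAGCC"
before = find_motifs(toy)
after = find_motifs(mutate_ag_to_uc(toy, 4))
print(before)
print(after)
```

In the toy sequence the substitution removes the CGAG match while leaving the UGAG match in place, which mirrors the logic of mutating only the invariant AG residues of a putative CAB box.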
Distribution of SVA RNA is distinct from that of L1 or Alu RNA
Hassoun et al. (49) reported insertion of novel mobile DNA into the α-spectrin gene (SPTA1) of a patient with hereditary elliptocytosis. Subsequently, we ascertained that this insertion was the result of an SVA-mediated transduction event and isolated and characterized its full-length SVA precursor, SVA SPTA1 (13). We cloned SVA SPTA1 upstream of six MS2-repeats in vectors containing or lacking a CMV promoter (Fig. 1H) and assayed for subcellular localization of SVA RNA by MS2-FISH (Cy3-MS2 probe).
Subcellular distribution of SVA SPTA1-MS2 RNA in 293T or U2OS cells was distinct from that of L1 or Alu: cytoplasmic without prominent foci, and concentrated in large nuclear aggregates (Fig. 5A). Although apparently randomly scattered about the nuclei of some cells, in others SVA SPTA1 aggregates ringed nucleoli and were not obviously associated with coiled or PML bodies (Fig. 5C and D). SVA RNA lacking MS2 repeats displayed a similar pattern when assayed with the Cy3-BGH FISH probe (not shown).
SVA RNA aggregates were more numerous and irregularly shaped than Alu nuclear foci. However, dual-color FISH labeling revealed Alu RNA foci to be typically abutting, but generally excluded from SVA RNA aggregates (Fig. 5E). Intriguingly, these two species of SINE RNA appear to be targeted to associated but discrete subnuclear compartments.
A luciferase reporter assay failed to detect promoter activity within 700 bp of upstream sequence flanking SVA SPTA1, or within its hexamers and a portion of the Alu-like region (M.C. Seleme and H.H. Kazazian, unpublished data). On the other hand, deleting the CMV promoter from either a pcDNA6 vector (Fig. 1H) or from a pCEP4-based vector (99-PUR) containing SVA SPTA1 did not alter FISH detection of SVA RNA in 293T cells. Although we cannot exclude the possibility of transcription from a cryptic promoter within these CMV-deleted constructs, the evidence argues for element transcription initiated from a promoter within the SVA. This is also supported by northern blot analysis, in which dominant bands indicate transcription beginning near the 5′-end of SVA SPTA1, even in the absence of an upstream CMV promoter (Fig. 5B).

DISCUSSION
L1s, Alus and SVAs are the active human retrotransposons, a triumvirate responsible for generating a significant portion of the genome's junk and, to lesser degree, genic baggage and involved with its packing, unpacking and reorganization. Progress in understanding retrotransposons has been in large part due to in silico analyses of mammalian genome sequences, an assay for retrotransposition in cultured cells, and transgenic rodent models that allow analysis of retrotransposition in vivo (reviewed in 6,17,50). Nevertheless, we still have a rudimentary understanding of the biochemistry of insertion, the movements of retrotransposons within the cell, and their interaction with host factors. In this paper, we demonstrate significant differences in the localization of RNAs of the three active human non-LTR retrotransposons. We also describe effects of L1 ORF1 protein on the distribution of ORF2 protein and the L1 RNP, reveal an association of Alu RNA with nuclear coiled bodies, show both nuclear and cytoplasmic residency of SVA RNA and present evidence that at least a subset of SVA elements harbor an internal promoter. The ability to detect and manipulate the L1 RNP particle is critical to better understanding of the complex process of retrotransposition. Early studies identified mouse or human L1 ORF1p in large RNPs fractionated from cytoplasmic extracts (51,52). The observation that only intact retrotransposons generated new inserted copies suggested cis preference of L1 proteins for their own encoding RNA (53). Kulpa and Moran (54) subsequently provided the first biochemical evidence that L1 RNA, ORF1p and ORF2p colocalize in a putative RNP retrotransposition intermediate (coenriched by differential centrifugation), and that ORF2p reverse transcriptase preferentially transcribes its own RNA.
Here we directly visualize, for the first time, colocalization of L1 RNA, ORF1p and ORF2p in the cytoplasm where these macromolecules concentrate as RNPs in microscopically phase-dense foci. Significantly, mutations in ORF1p that alter its subcellular localization similarly redirect ORF2p that is coexpressed from the same construct.
We previously reported that both ectopically expressed and endogenous ORF1p intensely accumulate in cytoplasmic foci that colocalize with protein markers of stress granules under both stressed and unstressed conditions. Chemical induction of stress increases the intensity of costaining (20). Based on current knowledge of the domain structure of ORF1p, we conclude that RNA-binding, but not protein multimerization, is most important for targeting its RNP to cytoplasmic foci. Three structured regions have been defined in L1 ORF1p: an N-terminus coiled-coil (C-C), a middle domain and a CTD. In mice, the C-terminal basic third of the protein, but not the CTD alone, is responsible for low sequence specificity and high affinity binding of RNA and single-stranded DNA. In humans, a recently discovered non-canonical RNA-recognition motif (RRM) within the middle domain cooperates with the CTD in RNA-binding in vitro (29,55,56).
In humans, the predicted C-C structure extends from residues 52-152, and includes a leucine zipper motif (56,57). C-terminal-deleted ORF1p mutants that only include the C-C domain fail to form cytoplasmic foci. These truncation mutants include a protein trimerization domain but lack RNA binding (55,58). Therefore, although simple multimerization of some proteins, such as G3BP and TIA-1, is sufficient to induce stress granule formation (59,60), apparently this is not the case for L1 ORF1p.
Patterns of localization seen when ORF1p is expressed alone are replicated for the ORF1p-bound RNP. For example, the double point mutation N157A/R159A, which almost completely abolishes foci formation of ORF1p, similarly attenuates localization of full-length L1 RNA to foci. Khazina and Weichenrieder (29) recently reported that these same residues are important for RNA-binding, forming conserved side chains of an RRM domain that spans the central region of ORF1p, with N157A linking the N- and C-termini of the domain. Altering R159 diminished RNA-binding as determined by size-exclusion chromatography.
Although the polyclonal antibody, α-ORF2-C, clearly detects coexpressed ORF2p in only a limited number of ORF1p-EGFP-positive cells, it reveals that only ORF1p with intact RNA-binding domain can direct ORF2p and RNPs to cytoplasmic granules. Using this antibody, we provide visual evidence for previous biochemical and cell culture assays suggesting that both ORF1p and ORF2p reside together in an RNP (54,61). α-ORF2-C will be a valuable resource for further study of L1 biology.
Full-length dimeric Alu RNAs have a short half-life, but some can be processed to a longer-lived left monomeric cytoplasmic form, termed scAlu, that lacks a poly (A) terminus and is incapable of retrotransposition (62). Alu sequences are also contained within 10% of Pol II-derived hnRNAs, exceeding much rarer Pol III-transcribed Alu transcripts by many-fold. Moreover, some Alu hybridization probes may cross-hybridize with abundant 7SL RNA complicating interpretation of northern blot analyses (12). Understandably, studies of the subcellular distribution of full-length Alu RNA are few, limited generally to cell fractionation assays and visualization of the Alu domain of the SRP ancestral RNA, sequence that is 85% identical to Ya5 RNA. Some results have been contradictory, as described below.
Following microinjection of Alu constructs into Xenopus oocytes, Perlino et al. (63) reported almost exclusive nuclear retention of full-length transcripts. On the other hand, using primer extension analysis of poly(A)+ HeLa cell extracts, Liu et al. (12) detected low-copy, full-length endogenous Alu transcripts in nuclear and cytoplasmic fractions, but predominantly in the cytoplasm. Jacobson and Pederson (64) injected canine SRP RNA into nuclei of epithelial cells and found it rapidly localized to nucleoli, from which signal was progressively lost to the nucleoplasm and thence partially to the cytoplasm, where it formed a faint 'patched pattern'. Transient nucleolar localization of SRP RNA was associated with both the Alu domain and helix 8 of the S domain.
Cell fractionation experiments by He et al. (65) found that the Alu domain facilitated export of Xenopus SRP RNA to the cytoplasm following its injection into oocyte nuclei.
We found full-length Alu RNA, exogenously expressed from members of two different Alu subfamilies that differ in their terminator sequences, to be predominantly nuclear but not concentrated in the nucleoli of transfected cells of different lines. A couple of cell fractionation studies have reported full-length Alu RNA transcripts mainly in the cytoplasm (12,66, as data not shown). It is conceivable that overexpression of exogenous Alu RNA might overwhelm a cell's natural ability to export it efficiently to the cytoplasm. However, it is also possible that loss of Alu RNA to the cytoplasm might be an artifact of cell fractionation experiments. We would expect that La protein, which binds strongly to 3′ terminators of Alu and other pol III transcripts, would facilitate nuclear retention of Alu RNA (67,68). La resides predominantly in the nucleus, but moves rapidly to the cytoplasm upon cellular stress (69,70) and perhaps also upon cell membrane lysis. As for cytoplasmic scAlus, these RNAs are 3′-end processed, are not bound by La, and fail to be retained in nuclei (62).
Although distributed throughout the nucleoplasm, we also detected Alu RNA concentrated in small, round, intensely stained nuclear foci, a significant number of which juxtaposed with Cajal bodies. The biogenesis of cellular RNAs is significantly compartmentalized into subnuclear domains that have been called nuclear organelles. Evidence suggests that nuclear Cajal bodies (CBs) are involved with post-transcriptional assembly of snRNPs and snoRNPs, base modification of spliceosomal snRNAs, and the biogenesis of the telomerase RNP (reviewed in 71). Newly synthesized U RNAs are transported to the cytoplasm and incorporated into snRNPs, aided by the SMN complex. The SMN complex then delivers the snRNPs into the nucleus and CBs, a process that appears to depend upon direct interaction between SMN protein and coilin (72). From CBs small RNAs are passed to other nuclear structures, such as nucleoli, nuclear speckles, and spliceosomal complexes. The Cajal body is the site for modification of not only U snRNAs, but also 2′-O-ribose-methylation and pseudouridylation of snoRNAs prior to transit to nucleoli. Telomerase RNA also passes through Cajal bodies, possibly to be modified by methylation (47,71,73). Here, perhaps, Alu RNAs also undergo some form of maturation.
We also assayed the subcellular distribution of SVA RNA. SVA elements are the youngest family of non-LTR retrotransposons in primates. They may also be the most active, evidenced by the relatively large percentage of SVAs associated with disease compared with L1- and Alu-associated insertions, a greater number of human-specific than chimpanzee-specific SVA insertions, and high levels of insertion polymorphism (14,74,75). Unfortunately, beyond sequence and numbers, little is known of SVA biology. All evidence points to their mobilization by the L1 retrotransposition apparatus. However, associated but discrete subcellular compartmentalization of SVA RNAs and Alu RNAs (Fig. 5E), which are also inserted by the L1 machinery, predicts differences in details of their mobilization.
Due to its poly(A) tail, its length, and the presence of several internal pol III termination signals, it is presumed that SVA is transcribed by pol II. However, the nature and location of its promoter are unknown, knowledge that is essential for understanding the SVA life cycle. The detection of SVA inserts with additional 5′ flanking genomic sequence of significant length affords strong evidence that some SVA source elements coopt their promoters from upstream flanking DNA (76,77). This is not obviously the case for canonical full-length SVA elements, such as SVA SPTA1, whose 5′ ends are within the hexamers, suggesting that transcription initiates within this region. The SINE-R of the SVA appears to retain at least some of the promoter activity of the HERV LTR from which it derives (J.L. Goodier and H.H. Kazazian, unpublished data). Whether it can initiate transcription from the 5′ end of the SVA remains to be determined.
L1s, Alus and SVAs are believed to share a common mechanism of genomic insertion, termed target-primed reverse transcription (TPRT), a 'copy and paste' process best characterized for insect non-LTR retrotransposons. According to this model, L1 ORF2-encoded endonuclease nicks the bottom strand of target DNA to expose a 3′-hydroxyl that primes reverse transcription of retrotransposon RNA. Second-strand DNA synthesis follows and the integrant is resolved in a manner still poorly understood (78,79). Differences in the subcellular distribution of L1, Alu and SVA RNAs, their relative abundances in nuclei versus the cytoplasm, and their targeting to different subcellular compartments predict variable fates within the L1 RNP. Differences likely relate in part to L1 RNAs being mobilized in cis by their own encoded proteins (61,80), while Alu and SVA RNAs retrotranspose in trans and so likely enter the L1 RNP after its initial formation. Furthermore, although ORF1p enhances Alu retrotransposition in cell culture (10) and, as we have shown, can bind Alu RNA (Fig. 4D), Alu, unlike L1, does not strictly require ORF1p for retrotransposition and so may bypass association with this protein in the cytoplasm (9).
One goal of retrotransposon research is to track retrotransposon RNPs in cells from their time of transcription to their point of insertion in the genome. We have moved in this direction by developing techniques for directly visualizing retrotransposon RNAs not only in fixed but also in living cells. However, we still cannot associate these RNAs with specific stages of the retrotransposon life cycle. For example, we cannot distinguish transcripts in the process of being inserted by retrotransposition from the total pool of RNAs. Nor are we yet able to ascertain whether L1 transcripts or proteins are in a pathway productive for a new insertion or in the process of being eliminated from the pool of active elements. By applying steadily improving techniques for tracking movements of RNA particles in real-time to the study of retrotransposon RNPs, we should be able to better understand their respective fates in cells.

Cloning of plasmid constructs
A construct containing the MS2-NLS-GFP fusion protein with pPol II promoter was a gift of S. Janicki (Wistar Institute, Philadelphia). To generate the L1-MS2 construct, six tandem MS2 CP binding sites were excised from plasmid The neo TET reporter cassette of the pAluYa5-neo TET plasmid provided by T. Heidmann (9) was removed and replaced with EcoRI/BamHI restriction sites for insertion of MS2-6 repeats, or restored to wild-type Alu sequence by QuikChange mutagenesis (Stratagene).
The full-length SVA precursor that created the α-spectrin insertion, SVA SPTA1, was isolated from BAC clones RP11-166N6 and RP11-34J4 (Research Genetics) and cloned into pBluescript (Stratagene) to create pBS-SVA (13). SVA SPTA1 was recloned by PCR into the KpnI/SalI sites of vector pcDNA6myc/hisB (Invitrogen). The MS2-6 array was inserted in an XbaI (blunted) site immediately downstream of the SVA.
To generate ORF1-GFP L1-RP WT, ORF1-GFP L1-RP ENRT MT, and ORF1 truncation mutants, ORF1 sequence was cloned into the BamHI/SalI sites of pEGFP-N3 (Clontech). Sequence downstream of ORF1p in L1-RP was amplified by PCR and cloned into the NotI site immediately downstream of the EGFP gene. The L1 insert of ORF1-EGFP L1-RP WT was excised, recloned in the construct JM101 containing neoI reporter cassette, and tested for retrotransposition in cell culture as previously described (27).
For sequential immunofluorescence and RNA FISH, cells were blocked with 2% BSA Fraction V, incubated with primary and secondary antibodies for 1-2 h, post-fixed with paraformaldehyde, rehydrated in 2X SSC and incubated overnight for RNA FISH.

RNA FISH
Cells grown on poly-L-lysine-treated cover slips were washed twice with Dulbecco phosphate buffered saline (DPBS, Invitrogen) for 5 min and fixed with cold 4% DPBS/paraformaldehyde (pH 7.3) for 15 min at 4°C. Fixed cells were incubated overnight or longer in 70% ethanol at 4°C, and either dehydrated through an 80, 90, 100% ethanol series and dried on a heat block, or rehydrated in 2X SSC buffer. Cover slips were incubated overnight at 37°C with 300 ng of a fluorophore-conjugated antisense DNA probe in a humid chamber. Hybridization buffer contained 10% dextran sulfate, 2 mM vanadyl ribonucleoside complexes, 40 µg E. coli tRNA, 25 µg salmon sperm DNA, 2X SSC, 30% formamide, and 1 µl RNasin (Promega). Following hybridization, cells were washed twice for 30 min in 2X SSC/30% formamide at 37°C, and cover slips were mounted with Mounting Medium (KPL Laboratories).
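As an illustration only, the concentration-defined components of the hybridization buffer can be converted to pipetting volumes with the usual dilution relation C1·V1 = C2·V2. The following minimal Python sketch assumes a 100 µl final volume and stock concentrations of 50% dextran sulfate, 200 mM vanadyl ribonucleoside complexes, 20X SSC and 100% formamide; none of these values is given in the protocol above, and the function and variable names are hypothetical.

```python
def stock_volume(final_conc, stock_conc, final_volume):
    """Return the stock volume needed to reach final_conc (C1*V1 = C2*V2)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * final_volume / stock_conc


# ASSUMPTION: final buffer volume in microliters (not stated in the protocol).
FINAL_VOLUME_UL = 100.0

# Final concentrations come from the protocol; stock concentrations are ASSUMED.
# Each entry is (final concentration, assumed stock concentration) in the same unit.
COMPONENTS = {
    "dextran sulfate (%)": (10.0, 50.0),
    "vanadyl ribonucleoside complexes (mM)": (2.0, 200.0),
    "SSC (X)": (2.0, 20.0),
    "formamide (%)": (30.0, 100.0),
}


def buffer_recipe(components, final_volume):
    """Compute per-component stock volumes plus the volume left for everything else."""
    volumes = {
        name: stock_volume(final, stock, final_volume)
        for name, (final, stock) in components.items()
    }
    remainder = final_volume - sum(volumes.values())
    if remainder < 0:
        raise ValueError("stocks are too dilute to fit in the final volume")
    # The remainder holds water plus the additives given as fixed amounts
    # (tRNA, salmon sperm DNA, RNasin, probe).
    volumes["water and remaining additives"] = remainder
    return volumes


RECIPE = buffer_recipe(COMPONENTS, FINAL_VOLUME_UL)

if __name__ == "__main__":
    for name, volume in RECIPE.items():
        print(f"{name}: {volume:.1f} µl")
```

Components specified as fixed amounts (tRNA, salmon sperm DNA, RNasin and the probe) are deliberately left out of the calculation, because their volumes depend on stock concentrations that the protocol does not report.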
Slides were examined with either a Leica DR RE microscope with TCS SP Scanning Confocal Imaging Spectrophotometer using Leica software ver. 2.6.1, or a Nikon E600 microscope equipped with brightfield, Nomarski and fluorescent optics, Fast 1394 Camera (QICAM, Canada), and IP Laboratory imaging software (Scanalytics Corp., BD Biosciences). The Nikon E600 was also equipped with a Z-axis drive for three-dimensional imaging and subsequent deconvolution.