This review presents detailed information about the structure of triplet repeat RNA and addresses the simple sequence repeats of normal and expanded lengths in the context of the physiological and pathogenic roles played in human cells. First, we discuss the occurrence and frequency of various trinucleotide repeats in transcripts and classify them according to the propensity to form RNA structures of different architectures and stabilities. We show that repeats capable of forming hairpin structures are overrepresented in exons, which implies that they may have important functions. We further describe long triplet repeat RNA as a pathogenic agent by presenting human neurological diseases caused by triplet repeat expansions in which mutant RNA gains a toxic function. Prominent examples of these diseases include myotonic dystrophy type 1 and fragile X-associated tremor ataxia syndrome, which are triggered by mutant CUG and CGG repeats, respectively. In addition, we discuss RNA-mediated pathogenesis in polyglutamine disorders such as Huntington's disease and spinocerebellar ataxia type 3, in which expanded CAG repeats may act as an auxiliary toxic agent. Finally, triplet repeat RNA is presented as a therapeutic target. We describe various concepts and approaches aimed at the selective inhibition of mutant transcript activity in experimental therapies developed for repeat-associated diseases.
In the early 1990s, the identification of a new class of disease-causing mutations caused considerable excitement in the community of human molecular geneticists. The mutations were inherited trinucleotide repeat (TNR) expansions, and the associated disorders became known as Trinucleotide Repeat Expansion Diseases (TREDs) (1). Over 20 neurological diseases have now been assigned to this group. Each disease is associated with a single defective gene, which triggers the process of pathogenesis through aberrant expression or toxic properties of mutant transcripts or proteins [reviewed in (2–4)]. Although researchers have been making efforts to develop treatments for TREDs for nearly two decades, they remain incurable.
TREDs include spinal and bulbar muscular atrophy (SBMA) (5), fragile X syndrome (FXS) (6), myotonic dystrophy type 1 (DM1) (7), Huntington's disease (HD) (8) and a number of spinocerebellar ataxias (SCA) (9,10). The first years of research on pathogenic mechanisms in TREDs resulted in clear mechanistic separation among different groups of the disorders. However, recent studies have begun to reveal that mutant RNA and mutant protein can act in parallel and exert their toxicities independently in some TREDs (11–13). Mutant transcripts may contribute to the pathogenesis of diseases driven by mutant proteins (11,12), and mutant proteins may contribute to the pathogenesis of disorders known to be driven by toxic RNA (13). Thus, the long-standing borders between distinct pathomechanisms in TREDs are beginning to be crossed, and this crossing occurs in both directions.
Much of the recent excitement brought to the field of TREDs may be attributed to the rapid progress of research on various approaches to treat these diseases (14–16). All the approaches discussed here are aimed at targeting triplet repeat RNA sequences with the goal of disrupting their pathogenic interaction with sequestered proteins, inhibiting translation from the mutant allele or destroying mutant transcripts. In some of these approaches, detailed information on the structure of the target RNA is essential for the rational design of potent reagents that may become useful therapeutic tools in the future.
In this review, we summarize the results of detailed structural studies of triplet repeats present in transcripts of TRED genes, in either non-coding or protein coding regions. Relevant structural information is given to illustrate involvement of RNA structure in the mechanism of pathogenesis triggered by expanded repeats. Important recent findings are also presented in the context of TNR genomics. The genomic and transcriptomic perspectives are shown to better understand the abundance of various triplet repeats, i.e. their presence in the cells in which pathology develops and where selective targeting by various reagents must occur. The characteristics of interactions between TRED transcripts and specific proteins are also presented, as these interactions determine the downstream adverse effects of TNR mutations.
TRIPLET REPEATS ARE FREQUENT MOTIFS IN HUMAN TRANSCRIPTS
TNRs belong to simple sequence repeats (SSRs), also known as short tandem repeats or microsatellites, and are common motifs in the genomes of humans and many other species (17). The repeats mutate at a very high rate and are often polymorphic in length, and the functions proposed for them are related to this length variability (18). They are copious not only in genomes but also in transcriptomes, and their abundance may be higher than originally thought due to the presence of bidirectional transcription across the majority of human genes and intergenic regions (19,20). Importantly, in translated sequences, TNRs are selected preferentially over dinucleotide or tetranucleotide repeats, because the length variation of TNRs does not change the reading frame (21).
Twenty different TNR motifs may potentially occur in RNAs if homotrinucleotide motifs are excluded and different phases of individual motifs are combined. The great abundance of some TNRs in cells raises questions about what roles these sequences might play in transcripts (22). TNRs differ in length, and the expression levels of their host transcripts vary greatly. The structures formed by TNRs in transcripts depend not only on the repeat motif, and the number of its units, but also on the presence of interruptions breaking the homogeneity of the repeat tract (23). TNRs that have beneficial structural features and functional properties are positively selected during evolution, and TNRs with deleterious features are selected against (24).
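The count of 20 motifs follows from simple arithmetic: 64 possible triplets minus 4 homotrinucleotides leaves 60, and merging the three phases of each motif gives 60/3 = 20. The minimal sketch below reproduces this count; the function names and the choice of the alphabetically smallest phase as the representative are illustrative assumptions, not a convention taken from the cited studies.

```python
from itertools import product


def canonical_phase(motif: str) -> str:
    """Return the alphabetically smallest rotation (phase) of a repeat unit."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def trinucleotide_motif_classes(alphabet: str = "ACGU") -> list[str]:
    """List one representative per triplet repeat motif, phases combined."""
    classes = set()
    for triplet in ("".join(p) for p in product(alphabet, repeat=3)):
        if len(set(triplet)) == 1:  # exclude homotrinucleotides such as AAA
            continue
        classes.add(canonical_phase(triplet))
    return sorted(classes)


motifs = trinucleotide_motif_classes()
print(len(motifs))  # 20
```

With this representation, CUG, UGC and GCU all map to the same class, which is why a repeat tract can be named after any of its three phases.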
A number of studies have been performed and many resources developed to characterize the frequency and location of SSRs (including TNRs) in the human genome (17,25–27) and exome (28–33). The main questions that have been addressed are the following: how many human mRNAs contain TNRs? At what frequency do certain types of TNRs occur? What is the length distribution of various TNRs in transcripts? What is the preferred location of TNRs in mRNA? And what are the known and putative functions of these sequences?
Three independent studies have provided the answers to these questions by identifying TNRs in the human genome reference sequence (27,29,34). In the most recent study, 32 448 tracts of uninterrupted TNRs composed of six or more repeated units were identified using the BLASTn algorithm (29). The relative frequencies of different TNR types were similar to those reported in earlier genome-wide surveys that used different repeat length and purity thresholds (25–27,34). As many as 1030 TNRs were identified in the exonic sequences of 878 genes. The TNRs that are strongly overrepresented in exons are CNG (where N is any nucleotide), CGA and AGG, whereas CTT, CAT, CAA, TAA and TTA are robustly underrepresented (Figure 1A). The shortest tracts are most prevalent, and the frequency of TNRs decreases roughly exponentially with their length. For the majority of TNR types, the longest tracts are <20 repeated units. However, for some TNRs such as AAG, several tracts >30 units have been identified (29). Of the 1030 exonic TNRs, 59% are located in the open reading frame (ORF), 28% are in the 5′-UTR and 13% are in the 3′-UTR. The CCA, CAG, CTG, CCT, AGG, AAG and ATG TNRs occur most frequently in the ORF (~80%); AT-rich TNRs are more frequent in the 3′-UTR, whereas CCG and CGG repeats are most frequent in the 5′-UTR (52% and 62%, respectively) (29).
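The search criterion used in such surveys (uninterrupted tracts of six or more repeated units) can be illustrated with a short sketch. Note that the cited study used the BLASTn algorithm; the regular-expression scan below is a simplified stand-in, and the function name, the example sequence and the left-to-right reporting of the first phase encountered are assumptions made for illustration.

```python
import re


def find_tnr_tracts(seq: str, min_units: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, motif, units) for each uninterrupted triplet repeat tract.

    A tract is a triplet followed by at least min_units - 1 exact copies.
    Homopolymer runs (e.g. AAA repeated) are skipped.
    """
    seq = seq.upper()
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    tracts = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:  # homotrinucleotide, not a TNR
            continue
        units = (match.end() - match.start()) // 3
        tracts.append((match.start(), motif, units))
    return tracts


# Hypothetical sequence: a 7-unit CAG tract and a 5-unit CGG tract.
example = "TT" + "CAG" * 7 + "TTGACC" + "CGG" * 5 + "AT"
print(find_tnr_tracts(example))  # [(2, 'CAG', 7)]
```

Only the CAG tract is reported because the CGG tract falls below the six-unit threshold. Changing `min_units` mimics the different length thresholds used in earlier surveys.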
To better characterize TNR sequences, their occurrence and genetic polymorphism have been investigated (35–38). Detailed information about triplet repeat length distribution in specific genes has been gathered experimentally for CAG and CTG repeats (36). A population genotyping study was conducted on 100 human genes selected to contain the longest runs of these repeats. The results demonstrated that very long and highly polymorphic repeat tracts are rare in genes not known to be associated with TREDs, which is in agreement with the results of a previous bioinformatics survey (23).
Functional association studies have been performed to gain some insight into the roles played by TNR-containing genes. It was found that genes coding for (i) proteins with transcription-related functions, (ii) proteins that interact with nucleic acids and (iii) proteins with nuclear localization are generally overrepresented among TNR-containing genes (27,29,37). These results as well as other lines of evidence suggest that the functions of TNRs can be expressed not only through proteins, but also at the DNA (24,35,39,40) and RNA (29,41–43) levels. Furthermore, TNR functions in RNA may strongly depend on the structures adopted by these sequences.
FOUR STRUCTURAL CLASSES OF TRIPLET REPEAT RNAs
To provide a basis for a functional analysis of triplet repeats in RNAs, the solution structures of oligoribonucleotides (ORNs) composed of the reiterations of specific triplets have been investigated under different experimental conditions using various methods. The ORNs that were first analyzed were four CNG motifs (N = A, C, G or U) reiterated 17 times (44). All these ORNs were found to form hairpin structures as demonstrated by chemical and enzymatic structure probing. The stem of the CNG repeat hairpin was shown to be composed of periodically occurring C–G and G–C base pairs and single N–N base mismatches. The hairpin loop was formed by either four or seven nucleotides. With the exception of the CGG repeat hairpin, the repeated CNG motifs form ‘slippery’ hairpins (i.e. tend to form alternative alignments unless they are fixed in one conformation by a G–C clamp) (44). Recently, the structures of all 20 different triplet motifs repeated either 17 or 20 times were subjected to a comparative analysis using biochemical structure probing as well as gel mobility analysis, and these structures were assigned to four classes (45). As shown in Figure 1B, AGG and UGG repeats form the most stable G-quadruplex structures; CGA, CGU and all CNG repeats form hairpins that are more stable than those of UAG, AUG, UUA, CUA and CAU repeats, whereas CAA, UUG, AAG, CUU, CCU, CCA and UAA repeats do not form any higher order structure. Further analysis of the hairpin and G-quadruplex structure-forming repeats was pursued using biophysical methods. UV-monitored structure melting revealed the following order of stability for CNG repeat hairpins: CGG > CAG > CUG > CCG; the stabilities of the AGG and UGG G-quadruplexes are roughly similar. CD spectra have shown that both G-quadruplexes are formed by parallel RNA strands (45). A shorter version of the G-quadruplex forming AGG RNA repeats (GGA)4 was also analyzed by NMR and CD (46,47). The intramolecular G-quadruplex was shown to be formed by a G:G:G:G tetrad plane and a G(:A):G:G(:A):G hexad plane (47). Other studies of triplet repeat ORN structures have included UV melting and/or CD studies of all CNG repeats or selected sequences of this group (48,49), gel mobility analysis of CGG repeats (50,51) and NMR studies of CGG repeats (52).
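The four-class assignment described above can be captured as a small lookup table that accepts a motif in any phase and in either DNA or RNA notation. This is a minimal sketch: the class labels ("stable hairpin", "semistable hairpin" and so on) are shorthand introduced here, and the seventh unstructured motif is taken to be CCA, which is the only one of the 20 motif classes not otherwise listed in the text.

```python
def canonical_phase(motif: str) -> str:
    """Return the alphabetically smallest rotation (phase) of a repeat unit."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


# Class membership as listed in the text (labels are illustrative shorthand).
STRUCTURE_CLASSES = {
    "G-quadruplex": ["AGG", "UGG"],
    "stable hairpin": ["CGA", "CGU", "CAG", "CCG", "CGG", "CUG"],
    "semistable hairpin": ["UAG", "AUG", "UUA", "CUA", "CAU"],
    "unstructured": ["CAA", "UUG", "AAG", "CUU", "CCU", "CCA", "UAA"],
}

# Index every motif by its canonical phase so that any phase can be looked up.
CLASS_BY_MOTIF = {
    canonical_phase(motif): label
    for label, members in STRUCTURE_CLASSES.items()
    for motif in members
}


def structure_class(motif: str) -> str:
    """Classify a triplet repeat motif given in any phase, as DNA or RNA."""
    rna = motif.upper().replace("T", "U")
    return CLASS_BY_MOTIF[canonical_phase(rna)]


print(structure_class("CTG"))  # stable hairpin
print(structure_class("GGA"))  # G-quadruplex
print(structure_class("GAA"))  # unstructured
```

The table contains exactly 20 entries after phase normalization, which is a quick consistency check on the motif lists.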
Crystal structures have been determined for short CUG (53,54), CAG (55) and CGG (56) repeats, which form intermolecular duplexes. The X-ray structures revealed details of the molecular architecture of these duplexes that are considered representative for stem portions of the CUG, CAG and CGG repeat hairpins. From a structural biology perspective, the nature and consequences of the periodic U:U, A:A and G:G mismatches were the most interesting findings. In the CUG repeat crystal structure, the U:U mismatches form stretched wobble interactions having only one hydrogen bond between the carbonyl O4 atom of one uracil residue and the N3 imino group of the opposite U residue (53). Similarly, in the CAG repeats, only one hydrogen bond is formed between the opposing adenine residues. This is an unusual and weak C2-H2•N1 bond. All the adenine residues are in the anti-conformation and serve as both hydrogen bond donors and acceptors (55). In the non-canonical G:G pairs found in CGG repeat duplexes, one guanosine residue is always in syn and the other is in anti conformation, and they form two hydrogen bonds, O6•N1H and N7•N2H. The helical structures of CGG repeats are more stable than those formed by CAG and CUG repeats (56).
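The periodic pattern of C–G and G–C pairs separated by single N:N mismatches follows directly from aligning a CNG repeat strand antiparallel with an identical strand. The sketch below lists the resulting pairs for a fully aligned duplex; it is a schematic illustration only (real duplexes and hairpins may adopt shifted alignments or carry overhangs), and the function name and pair notation are assumptions.

```python
# Canonical Watson-Crick pairs; anything else is reported as a mismatch.
# G-U wobble pairs are not treated specially because they do not arise here.
WATSON_CRICK = {frozenset("CG"), frozenset("AU")}


def duplex_pairs(motif: str, units: int) -> list[str]:
    """Pair a repeat strand with an identical antiparallel strand.

    Returns one string per position: 'X-Y' for a Watson-Crick pair and
    'X:Y' for a non-canonical (mismatched) pair.
    """
    strand = motif * units
    pairs = []
    for i, base in enumerate(strand):
        partner = strand[len(strand) - 1 - i]  # antiparallel partner
        symbol = "-" if frozenset((base, partner)) in WATSON_CRICK else ":"
        pairs.append(f"{base}{symbol}{partner}")
    return pairs


print(duplex_pairs("CUG", 2))
# ['C-G', 'U:U', 'G-C', 'C-G', 'U:U', 'G-C']
```

Substituting CAG or CGG for CUG yields the same pattern with A:A or G:G in place of U:U, matching the mismatches described for the crystal structures.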
Finally, an interesting correlation was found on comparing the occurrence of different triplet repeats in exons (described in a previous section) with their structures. As presented in Figure 1, TNRs that are strongly overrepresented in exons belong to the hairpin forming repeats, whereas most of the repeat types that do not form any stable structure are underrepresented. The positive selection of repeats capable of forming hairpin structures in transcripts may suggest their importance in the regulation of gene expression, but the selection may also be acting at the level of amino acid repeats in proteins.
TRIPLET REPEAT EXPANSION DISEASES AND MECHANISMS OF PATHOGENESIS
Over 20 different genes containing unstable TNRs have been implicated in the pathogenesis of human neurological diseases collectively named TREDs [reviewed in (1)]. Expanded CTG, CGG, GAA and CAG repeats are sources of degenerative changes leading to symptoms associated with DM1, FXTAS, Friedreich's ataxia (FRDA) as well as HD and a series of SCAs (Figure 2). These are typically late-onset inherited disorders, and their causative repeat mutations are located in different parts of genes; this location primarily determines the number of potential toxic entities. The adverse effect of non-coding mutations in DM1 and FXTAS is principally determined by the expression of a mutant transcript (57–61), whereas the toxicity of coding mutations arises from the presence of both RNA and protein, which harbor abnormally lengthened repeats (11,62). In two other non-coding repeat expansion disorders, FRDA and FXS, it is the diminished expression of specific proteins which triggers pathogenesis as a result of inhibited or abortive transcription across, respectively, expanded GAA and CGG repeats (63–67).
Among well-studied mechanisms underlying TREDs are: (i) toxic RNA gain-of-function caused by transcripts harboring expanded CUG, CAG or CGG repeats (2,61,68–70); (ii) toxic protein gain-of-function through expression of polyglutamine (polyQ) tract encoded by mutant CAG repeats (71,72); and (iii) aberrant loss-of-transcript and loss-of-protein function caused by GAA and CGG expansions (63–67). However, considering the results of the most recent reports one can speculate that the mechanistic complexity of pathogenesis in TREDs is higher and more variable. The presence of non-ATG initiated translation was recently reported by Ranum and colleagues (13,73), and bidirectional transcription through repeat regions was shown for several genes of TREDs (74–76). The bidirectional transcription is not only a source of sense and antisense transcripts, but also leads to the generation of triplet repeat-derived siRNAs targeting transcripts containing complementary repeats as shown by the groups of Bonini (77) and Richards (78). These results indicate the existence of novel toxic entities that may give rise to new potential pathomechanisms. Further studies will evaluate their importance to the pathogenesis of specific TREDs.
STRUCTURES OF TRIPLET REPEATS IN TRED TRANSCRIPTS
The involvement of mutant transcripts in the pathogenesis of DM1, and its possible contribution to the pathogenesis of FXTAS and polyQ diseases prompted researchers in the past decade to take on the detailed structure examination of triplet repeat regions in numerous TRED transcripts. A further argument for undertaking that effort was the conviction that the treatments of TREDs aimed at the allele-specific inhibition of mutant transcript or its destruction by direct repeat targeting will benefit from having a deeper insight into the structure of the target. The structural information gathered for ORNs (44,45) and described earlier in this review was insufficient for this purpose, as it did not provide answers to questions such as the following: what is the effect of repeat length on structures formed by repeats? What is the contribution of sequences flanking repeats to the structure of the repeat region? And what are the structural roles of various repeat interruptions?
The first study aimed at answering these questions was performed on the DMPK transcript implicated in DM1 pathogenesis (79). The study design included: (i) a selection of representative normal alleles of TRED genes based on population genotyping results; (ii) size selection of the repeat region based on RNA structure prediction; and (iii) PCR synthesis of DNA templates for in vitro transcription, RNA synthesis, end labelling and structure probing with the use of nucleases (80) and lead ions (81,82). Normal length transcripts containing 5, 11 and 21 CUG repeats revealed the conversion of a single-stranded repeat region into semi-stable slippery hairpins upon increasing the repeat length, whereas stable hairpins were formed by expanded 49 CUG repeats (79). The finding that double-stranded-like structures are formed by CUG repeats in expanded transcripts was instrumental to the later discovery that muscleblind-like 1 (MBNL1) protein is sequestered by mutant DMPK transcripts in DM1 patient cells (83).
Similar in vitro structural analysis was further conducted for the triplet repeat regions of the majority of TRED transcripts and revealed their structural diversity. The contribution of sequences flanking the repeats to repeat hairpin stabilization was shown for ATXN1 (84), CACNA1A (85) and FMR1 (86) transcripts (Figure 3A). In contrast, flanking sequences did not influence the structures formed by repeats in DMPK (79), ATXN2 (87), ATXN3 or ATN1 transcripts (85). In HTT and AR transcripts, neighbouring repeats CCG and CUG, respectively, interact with CAG repeat tracts to form hairpins that have a unique composite architecture (88). It should be recognized, however, that the structures determined for triplet repeats in vitro may not fully recapitulate folding that occurs inside cells in the context of full-length RNA and various RNA binding proteins. The intracellular RNA structures and interactions, which were out of reach for a long time, are now amenable to investigation, including for triplet repeats and on a transcriptome-wide scale. Such methods as global RACE (89), transcriptome-wide RNA structure probing (90) and HITS-CLIP (HIgh-Throughput Sequencing of CrossLinking and ImmunoPrecipitation products) (91) allow detecting products of RNA cleavages by endogenous nucleases or exogenous reagents, and determining high-resolution maps of RNA–protein interaction in vivo.
In four TRED-related genes, i.e. FMR1, ATXN1, ATXN2 and TBP, the majority of the normal alleles contain specific interruptions located within the repeat tracts. These are AGG triplets disrupting CGG repeats in the FMR1 gene, CAT triplets within CAG repeats in the ATXN1 gene and CAA interruptions within CAG repeats in both the ATXN2 and the TBP genes (87). Such repeat interruptions in DNA have been shown to function as protective elements, preventing pathogenic repeat expansion (92). But what could be their functions in transcripts? RNA structure probing revealed that AGG interruptions within the CGG repeat of the FMR1 transcript prevent single hairpin structure formation by the repeats (Figure 3B). Instead, branched hairpins are formed that have the substituted base either in the side loop or in an enlarged terminal loop, depending on the location of the interruption (86). Similar structural roles were demonstrated for CAU and CAA interspersions in the ATXN1 (84) and ATXN2 (87) transcripts (Figure 3B) and were predicted for the TBP transcript (87). It was hypothesized that the AGG interruptions may protect some premutation carriers from being prone to FXTAS by shortening the length of the hairpin composed of pure CGG repeats (86). In the cases of SCA1 and SCA2, rare carriers of expanded interrupted repeats have not developed any disease, and the RNA structure of the repeat region was found to be better correlated with pathogenesis than the length of the polyQ tract (84,87). These correlations suggested that RNA hairpin structure plays a more general role in the pathogenesis of TREDs.
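The idea that interruptions shorten the pure repeat stretch available for hairpin formation can be made concrete by comparing the total tract length with the lengths of its uninterrupted runs. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the tract is read in frame from its first nucleotide, and the example allele (three runs of nine CGG units separated by single AGG triplets) is illustrative rather than taken from the cited data.

```python
def pure_runs(tract: str, motif: str) -> list[int]:
    """Return the lengths (in units) of uninterrupted runs of `motif`.

    The tract is split into in-frame triplets starting at position 0;
    any triplet that differs from `motif` counts as an interruption.
    """
    usable = len(tract) - len(tract) % 3  # ignore a trailing partial triplet
    units = [tract[i:i + 3] for i in range(0, usable, 3)]
    runs, current = [], 0
    for unit in units:
        if unit == motif:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Illustrative alleles with the same total length of 29 triplets.
interrupted = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 9
pure = "CGG" * 29

print(pure_runs(interrupted, "CGG"))  # [9, 9, 9]
print(pure_runs(pure, "CGG"))         # [29]
```

Both tracts encode the same number of triplets, yet the longest pure CGG run drops from 29 to 9 units in the interrupted allele, which is the quantity the hypothesis above links to hairpin length.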
TRIPLET REPEAT RNA INTERACTION WITH PROTEINS
The protein binding properties of TNR sequences have been mostly studied in relevance to the toxic features of mutant transcripts rather than in the context of the putative normal functions of TNR RNA (83,93,94). These studies took advantage of various methods to identify proteins that bind repeats and to characterize these interactions structurally. Most of this research has dealt with CUG repeats. First, the CUGBP1 protein was identified on the basis of its specific binding to single-stranded (CUG)8 incubated in HeLa cell nuclear extract (95). CUGBP1 is a member of the CELF (CUGBP and ETR-3-like factors) protein family, which regulates a number of post-transcriptional RNA processing steps including alternative splicing (96,97). The electron microscopy examination of the in vitro binding of recombinant CUGBP1 to transcripts containing expanded CUG repeats revealed that the protein localizes only to the base portion of the CUG repeat hairpin and its binding is not proportional to CUG repeat length (98). Studies have shown that CUGBP1 does not co-localize with mutant transcripts in DM1 cells (99,100).
Swanson and colleagues succeeded in identifying MBNL1, an RNA-binding protein homologous to the Drosophila mbl proteins, which binds to CUG repeats in a length-dependent manner and regulates alternative splicing (83). This protein was later shown to co-localize with mutant DMPK transcripts in a variety of DM1 patient cells and model organisms (99,101,102). Using the yeast two hybrid system, MBNL1 was shown to bind not only to CUG repeats but also to other types of repeated sequences, including CAG repeats (93). A filter-binding assay revealed a very similar in vitro binding affinity of MBNL1 to mutant CUG and CAG repeats (103). Moreover, the analysis of fluorescence recovery after photobleaching (FRAP) indicated that the affinity of the GFP-MBNL1 fusion protein to long CUG and CAG repeats is very similar in transfected cells (104). Most recently, HeLa and neuroblastoma cells expressing 5, 30, 70 or 200 exogenous CUG or CAG repeats were analyzed for colocalization of repeat-containing transcripts with endogenous MBNL1 (12). Mutant transcripts with 70 or 200 CUG or CAG repeats were found to form nuclear inclusions that overlap with MBNL1. Other studies investigated the ability of CUG and CAG repeats to activate RNA-dependent protein kinase (PKR), the known cellular sensor of long dsRNA (105). CUG repeats of lengths with pathological consequences showed some activation of the kinase in vitro, and mutant CAG repeats of the HTT transcript were shown to activate PKR in human HD brain tissues (106).
The architecture of a CUG repeat hairpin of mutant length (54 repeats) in complex with MBNL1 was investigated using chemical and enzymatic structure probing and electron microscopy (103). MBNL1 multimers were shown to form ring-like structures that bind to the stem portion of the CUG repeat hairpin. The structures of very short oligomers containing CUG motifs in complex with MBNL1 were determined by crystallography, and the results suggested that MBNL1 may efficiently bind single-stranded CUG repeats (107). Very recently, MBNL1 was shown to bind (CUG)17 and (CAG)17 ORNs with similar affinity, whereas non-hairpin forming repeats of the same lengths composed of AUG or UUA repeats did not bind MBNL1 under the same assay conditions (12).
Interactions between CGG repeats and proteins have been recently investigated in cellular systems. Various cell lines were transfected with plasmids expressing 20, 40, 60 or 100 CGG repeats, and only the expression of long repeats supported the formation of nuclear aggregates in some but not all of the cell lines tested (108). These aggregates were shown to recruit the RNA-binding proteins SAM68, hnRNP-G and MBNL1. Earlier studies identified a number of other proteins that co-localize with long CGG repeats. Hagerman and colleagues used fluorescence-activated flow sorting, mass spectroscopy and immunohistochemistry to analyze the protein composition of RNA-containing intranuclear inclusions formed in astrocytes and neurons of FXTAS patients (109). These authors identified >20 proteins, including Lamin A/C, vimentin, hnRNP-A2/B1 and MBNL1. Furthermore, Pur alpha, hnRNP A2/B1 and CUGBP1 were shown to bind CGG repeats in the Drosophila model of FXTAS (94,110). The results of the protein binding studies suggest that MBNL1 sequestration by expanded CUG and CAG repeat transcripts is likely caused by direct RNA–protein interactions (103), and that SAM68 sequestration by expanded CGG repeats depends rather on indirect interaction (108).
RNA-MEDIATED PATHOGENESIS TRIGGERED BY TRIPLET REPEAT EXPANSION
The ability of mutant CNG repeats to form long hairpin structures (44,79,98) is a significant determinant of toxic RNA-dependent pathogenesis in a number of TREDs. TNRs of mutant length gain a toxic function by binding essential proteins and consequently diminishing their functional cellular levels (Figure 4). A characteristic feature of the RNA–protein interactions in mutant TNR-expressing cells is the formation of nuclear foci [reviewed in (111)] in which sequestered proteins, such as MBNL1-3 and SAM68, become immobilized. These aggregates cause cells to develop degenerative changes that are manifested through misregulated alternative splicing and embryonic splicing patterns in adult tissues (108,112–114). In the following sections, we characterize the toxic effects of nuclear aggregates in cells expressing mutant CUG, CGG and CAG repeat RNA focusing on their adverse effect on alternative splicing.
Cellular toxicity mediated by expanded CUG repeats in DM1 and SCA8
DM1 was the first neurological disorder in which the nuclear retention of transcripts containing expanded CUG repeats was detected (57,58). Although the reduced expression of both DMPK and the neighboring gene SIX5 accompany the mutant transcript retention (115), the main pathogenic mechanism is a deleterious gain-of-function of mutant RNA harboring CUG repeats. Over the years, as the number of DM1 model organisms has increased, the evidence for an RNA-dominant mechanism has grown stronger. The characteristic changes associated with DM1 pathogenesis in skeletal and cardiac muscles have been reproduced in transgenic mice and flies by expressing expanded CUG repeats (97,116,117). These similarities have been correlated with an adverse influence of mutant transcripts on RNA-binding proteins, i.e. MBNL1 and CUGBP1, that causes misregulated alternative splicing in DM1 (Figure 4B) (83,101).
To date, several aberrantly spliced transcripts have been identified that explain some of the phenotypic features of DM1 pathogenesis. For example, skeletal muscle hyperexcitability (myotonia) and weakness have been associated with the mis-splicing of the chloride channel (CLCN1) (113,118–120) and the bridging integrator 1 (BIN1) (121) transcripts. Additionally, splicing alteration of the insulin receptor (INSR) contributes to the insulin resistance in DM1 muscle fibers (112,122). Defects in the cardiac functions are thought to be associated with the aberrant splicing of the troponin T type 2 (cTNT) transcript (123–125). Spliceopathy also features a number of central nervous system (CNS) transcripts including those for the glutamate receptor, ionotropic N-methyl d-aspartate 1 (NMDAR1/GRIN1), amyloid beta precursor protein (APP) and microtubule-associated protein tau (MAPT) (68,126). Studies have shown that the DM-specific aberrant splicing pattern is reproduced not only in mice (102,127) and flies (128,129) expressing the DM1 mutation, but also in Mbnl1ΔE3/ΔE3 knockout mice (102) and in CUGBP1 overexpressing mice (127,130,131). These findings provide further evidence for the prominent roles played by these two splicing factors in DM1 pathogenesis.
An RNA gain-of-function mechanism by mutant CUG repeats is also implicated in SCA8 pathogenesis. This neurodegenerative disease primarily affects the cerebellum and is caused by a CTG•CAG repeat expansion that is transcribed in both directions and gives rise to the antisense non-coding CUG-harboring transcripts and translated CAG-bearing transcripts (Figure 2). The latter transcripts undergo conventional translation that results in polyQ protein and non-ATG translation (13) that occurs across expanded CAG repeats in all reading frames and gives rise to the homopolymeric proteins of long polyglutamine, polyserine and polyalanine tracts [reviewed in (73,132)]. Whereas protein products of the sense transcripts build up as intranuclear inclusions that are detected in human brain autopsy tissue and in the brains of transgenic SCA8 mice (69), the nuclear retention of mutant CUG transcripts is manifested by the formation of RNA inclusions that colocalize with MBNL1 in selected neurons in the cerebellum. This event is thought to affect the alternative splicing of a number of CNS transcripts including APP, MAPT, NMDAR1 and MBNL1, which mimics aberrations observed in DM1 (68). In addition, the SCA8 CUG repeat expansion transcripts trigger splicing changes and increased expression of the GABA-A transporter 4 (GAT4/Gabt4) due to the dysregulation of MBNL/CELF regulated pathways in the brain (69).
Toxicity mediated by expanded CGG and CAG repeat RNAs
Strong evidence has been provided for the contribution of an RNA gain-of-function mechanism also in the pathogenesis of FXTAS and some of the polyglutamine disorders (11,109). In these diseases, specific proteins are sequestered into nuclear foci containing expanded CGG and CAG repeats that cause the deviated splicing of specific transcripts, which resembles DM1 pathogenesis (Figure 4B). In FXTAS, the mis-splicing might affect only transcripts regulated by proteins recruited early to CGG repeat foci as shown by Charlet-Berguerand and colleagues (108). In fact, in mutant CGG-expressing cells, the only foci protein whose functional activity is compromised is splicing factor SAM68, which is recruited to RNA foci before hnRNP G and MBNL1. Consequently, the misregulation of pre-mRNA alternative splicing controlled by SAM68 is observed in CGG-transfected cells and in FXTAS patients. An analysis of alternative splicing of the phospho-type-4 ATPase-11B (ATP11B) pre-mRNA showed a splicing misregulation in SAM68-depleted cells as measured by a significant decrease of exon-28B inclusion in comparison with control samples (108).
Experimental evidence has shown that expansions of CAG repeats in coding exons may convey pathogenic potential not only to polyQ proteins, but also to transcripts that harbor the repeat mutation. Initially, it was demonstrated by Cooper and colleagues that in COSM6 cells the transient expression of 960 CAG repeats causes nuclear retention of the expanded repeats (104). Subsequent experimental evidence provided support for the toxic capacity of mutant CAG repeat RNA in transgenic mice (11), worms (133) and the SCA3 Drosophila model (62) as well as in human SCA3 and HD fibroblasts (12). However, because mutant CAG repeat-triggered alternative splicing abnormalities were discovered very recently, the scale of these effects and their roles in pathogenesis have yet to be determined (Figure 4B). The most recent results from our laboratory show that in human HD and SCA3 fibroblasts, the expression of endogenous HTT and ATXN3 mutant transcripts causes the formation of CAG repeat foci and MBNL1 sequestration (12,88), which in turn trigger the aberrant splicing of endogenous sarco/endoplasmic reticulum Ca2+ ATPase 1 (SERCA1) and INSR transcripts. Similarly, expanded but exogenous CAG repeats mimic CUG repeats in the misregulation of alternative splicing, and in HeLa and neuroblastoma cells both types of repeat RNA cause defects in the processing of SERCA1, INSR, LIM domain binding 3 (LDB3) and CLCN1 pre-mRNAs (12). These results show that transcripts containing expanded CAG repeats may contribute to the pathogenesis of HD, SCA3 and perhaps other polyglutamine diseases.
EXPERIMENTAL THERAPIES DIRECTED AGAINST EXPANDED REPEAT RNA
To date, three main strategies which use expanded repeat RNA as a target have been tested as experimental therapies for TREDs. The first is based on degrading mutant transcripts with RNA interference (RNAi) tools or antisense oligonucleotides (AON); the second utilizes repeat hairpin-specific small compounds or antisense oligomers to inhibit pathogenic interactions between repeat RNAs and nuclear proteins; and the third is aimed at blocking toxic protein synthesis by binding chemically modified antisense oligomers to repeats or by targeting repeats with mutant siRNAs acting as miRNAs (Figure 5). These strategies are aimed at destroying the toxic agents of pathogenic pathways associated with TREDs which involve either RNA, protein or both.
An important issue in therapies for TREDs that involve repeat-targeting reagents is the requirement for gene and allele selectivity. Although numerous mRNAs in the human transcriptome contain triplet repeats, the specific inhibition of mutant gene expression by targeting repeat regions is a promising therapeutic strategy for diseases caused by CAG and CUG repeat expansion. Among the features used in designing selective reagents are differences between normal and mutant transcripts related to their repeat sequence lengths and structural properties as well as, in the case of DM, cellular localization. The intended targets, i.e. expanded repeats, are likely to form hairpin structures in cells, which makes them both more prone (increased repeat length) and more resistant (more stable structure) to interaction with targeting reagents in comparison with normal repeats. The necessity of selective inhibition of mutant allele expression may not be equally important for all polyQ diseases; nevertheless, preserving the expression of the protein from the normal allele seems to be an advantageous feature of any therapeutic strategy.
Below, we describe the use of AON and RNAi reagents as well as that of small compounds that specifically bind to some repeat hairpins. According to their mechanism of action, antisense reagents can be divided into two categories: ‘cutters’, which bind to complementary targets and induce their cleavage taking advantage of RNaseH or Argonaute 2 (AGO2) activities (AON or siRNA, respectively), and ‘blockers’, which are oligomers that bind to complementary targets, either alone or within protein complexes, but do not induce the cleavage of their complementary target. Examples of ‘blockers’ include peptide nucleic acids (PNAs), locked nucleic acids (LNAs), morpholino and miRNA-like acting duplexes (Figure 5). The exact mechanisms by which different reagents described below exert their activity are not yet known in detail.
The degradation of mutant transcripts induced by antisense oligonucleotides and RNAi triggers
Therapies for polyQ expansion diseases are aimed primarily at reducing the level of the mutant protein. In addition, mutant transcript downregulation may be desirable because of its involvement in pathogenesis. RNAi reagents require ~20 nt of complementary sequence for efficient silencing, which corresponds to only 7 CAG repeats. While the normal alleles of CAG-bearing transcripts usually have 10–20 repeats, their mutant versions contain 40–100 CAG repeats, meaning that transcripts from both alleles can be targeted by triplet repeat siRNA duplexes. Moreover, among annotated human mRNAs alone, there are at least 50 transcripts containing CAG repeats and 30 containing CUG repeats (29), all of which are potential targets for complementary AONs and RNAi reagents.
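The length arithmetic above can be made concrete with a minimal sketch. It assumes a 21-nt guide strand (the length of the siRNA duplexes discussed in this section) and counts only perfectly complementary, in-register positions within an uninterrupted repeat tract; RNA structure, flanking sequences and steric exclusion between neighboring complexes are deliberately ignored. The function names are illustrative and do not come from any cited study.

```python
def repeats_needed(guide_nt: int = 21, unit_nt: int = 3) -> int:
    """Smallest number of whole repeat units that spans a guide of guide_nt nucleotides."""
    return -(-guide_nt // unit_nt)  # ceiling division


def binding_registers(n_repeats: int, guide_nt: int = 21, unit_nt: int = 3) -> int:
    """Number of in-register positions at which a repeat guide pairs fully within a tract.

    Assumption: a pure, uninterrupted tract of n_repeats units; shifting the
    guide by one repeat unit gives another fully complementary site.
    """
    tract_nt = n_repeats * unit_nt
    if tract_nt < guide_nt:
        return 0
    return (tract_nt - guide_nt) // unit_nt + 1


if __name__ == "__main__":
    print("repeats spanned by a ~20-nt site:", repeats_needed(20))
    # Repeat numbers taken from the ranges quoted in the text.
    for n in (10, 20, 40, 100):
        print(f"{n} repeats: {binding_registers(n)} in-register sites")
```

Under these assumptions, even the shortest normal tracts offer several fully complementary sites, which is consistent with the observation that both alleles can be targeted, while expanded tracts simply offer more of them.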
RNA duplexes composed of CAG and CUG repeat strands have been tested in cell culture for their ability to silence HTT, AR, ATXN1 and ATXN3 transcripts. The reagents used included 81-bp long synthetic CAG/CUG RNA duplexes (134), 21-bp siRNA duplexes (135–137) and shRNA (138), and all showed only a slight silencing preference for mutant allele versus normal allele. Both strands of CAG/CUG siRNA were found to be active and silenced other transcripts also containing complementary repeats causing a considerable loss of the viability of human fibroblasts (137). The CAG/CUG siRNA induced some toxicity also in two Drosophila models co-expressing elevated levels of expanded CTG and CAG repeats (77,78). Interestingly, the cell culture experiments showed that hairpin structures which are likely formed in cells by expanded CAG repeats did not inhibit the activity of RNAi machinery directed at the repeat region (88,137). As a result, mutant HTT transcripts were efficiently targeted by CAG/CUG repeat siRNAs and the repeated CAG sequence was cleaved at numerous positions by AGO2 loaded with CUG repeat guide strand (88).
In DM1, the degradation of expanded CUG repeat transcripts has been considered as a possible therapy (139). To destroy toxic RNA in cells, several types of antisense reagents directed against CUG repeats have been successfully tested. Retroviral expression of a 149-nt RNA complementary to part of the DMPK 3′-UTR in DM1 myoblasts resulted in 80% silencing of the mutant DMPK transcript and a 50% reduction of the normal transcript, leading to the partial restoration of some normal myoblast functions (140). Recently, Furling and colleagues (141) engineered AONs by the covalent linking of RNA sequences composed of 7, 11 or 15 CAG repeats to human U7 small nuclear RNA for their delivery exclusively to the cell nucleus. The lentiviral transduction of DM1 muscle cells with such constructs caused the specific silencing of ~60–70% of mutant DMPK mRNA without affecting normal DMPK transcripts. In the treated cells, the number of nuclei containing toxic mutant CUG repeat foci was reduced and the alternative splicing of several DM1-affected genes was significantly corrected (141).
In another study, Wansink and colleagues (142) designed a short single-stranded (CAG)7 2′-O-methyl phosphorothioate oligonucleotide (PS58) to degrade transcripts containing expanded CUG repeats. Such chemical modification of antisense oligomers is known to stabilize the duplex formed with a target sequence and to increase the resistance of AONs to cellular nucleases. PS58 decreased the level of mutant DMPK transcripts by 50–90% in patient myoblasts and in two DM1 mouse models, DM500 and HSALR. Interestingly, the levels of other transcripts containing short CUG repeats remained almost unchanged. Local administration of this AON into mouse skeletal muscles reduced CUG repeat foci formation, partially restored the alternative splicing of DM-specific exons and significantly reduced myotonia (142).
Antisense ‘blockers’ targeting triplet repeat RNA
Another straightforward therapeutic strategy is targeting expanded CAG repeats in mutant transcripts to decrease the production of toxic polyQ protein. Corey and colleagues have shown the allele-selective inhibition of toxic protein translation in HD and SCA3 cells, using antisense PNA and LNA oligomers composed of 7 CTG repeats (136). After strong binding to complementary sequences, these oligomers formed an impassable translational barrier only in mutant transcripts (Figure 5). REP19N, containing 19 PNA residues and modified by addition of lysine residues at both termini, was the best allele-discriminating reagent, with a specificity of inhibition at least 10 times greater for mutant HTT protein translation than for normal protein. Moreover, the translation of other transcripts containing CAG repeats was not inhibited significantly in PNA- and LNA-treated cells. PNA oligomers were also tested for mutant ATXN3 translation inhibition; however, the maximum selectivity achieved (≤5) was lower than that observed for HTT (136). Oligomers with other chemical modifications such as 2′, 4′-constrained ethyl (cEt), carba-LNA, 2′-O-methoxyethyl (2′-MOE) or 2′-fluoro have also been shown to be promising blockers of mutant huntingtin translation (143).
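Allele selectivity of the kind quoted above can be expressed as a simple ratio. The sketch below assumes that selectivity is defined as the IC50 for inhibiting the normal protein divided by the IC50 for inhibiting the mutant protein; this definition and the numerical values in the usage lines are illustrative assumptions, not data from the cited study.

```python
def allele_selectivity(ic50_normal: float, ic50_mutant: float) -> float:
    """Fold selectivity for the mutant allele.

    Assumed definition: IC50(normal) / IC50(mutant). Values above 1 mean the
    reagent inhibits the mutant protein at lower concentrations than the
    normal protein. Both inputs must use the same concentration unit.
    """
    if ic50_normal <= 0 or ic50_mutant <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_normal / ic50_mutant


if __name__ == "__main__":
    # Hypothetical concentrations (arbitrary but identical units), for illustration only.
    print(allele_selectivity(ic50_normal=5.0, ic50_mutant=0.5))  # 10-fold
    print(allele_selectivity(ic50_normal=2.0, ic50_mutant=0.5))  # 4-fold
```

A ratio defined this way is independent of the concentration unit, which makes reagents tested against different genes, such as HTT and ATXN3, directly comparable.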
Another approach to RNA repeat-targeted inhibition of mutant protein synthesis is the application of CAG/CUG siRNA duplexes containing base substitutions at specific positions that cause mismatches with their mRNA targets. This approach was tested in cultured HD fibroblasts (137,144), where several duplexes caused the efficient translational inhibition of mutant huntingtin without downregulation of its transcript. CAG/CUG duplexes that form one or more mismatches with target sequences most likely act as miRNAs despite the fact that their target is located in the ORF. In the most efficient and selective duplexes, the mismatched bases were present in the central positions of the duplex (144) or in its 3′ half (137). Interestingly, transcriptional activation of the normal HTT allele triggered by some of these duplexes occurred concomitantly with the silencing of the mutant allele (137). This approach was also successfully used to target the CAG repeat region of the mutant ATXN3 transcript, although the selectivity of the reagents was lower than that observed for HTT inhibition (145). Efforts are continuing to develop this strategy further and to obtain even more selective reagents.
The primary pathogenic effect of expanded CUG repeats in DM1 is the sequestration of nuclear proteins, including MBNL1. Therefore, preventing pathogenic RNA–protein interactions is another approach to reducing the toxic effects of mutant RNA. Proof of principle for this concept came from two independent studies (146,147). Swanson and colleagues showed that the overexpression of MBNL1 in HSALR mice, which express expanded CUG repeats, can reverse the DM-like phenotype (146). Thornton and colleagues, using an antisense morpholino oligomer composed of 25 CAG repeats (CAG-25), inhibited the MBNL1/RNA interaction and disrupted the complexes preformed in vitro (147) (Figure 5). The morpholino modification confers resistance to cellular nucleases, yields high-stability duplexes with complementary targets and does not induce the cleavage of targeted RNA. These features made it a promising agent for in vivo therapy. The local administration of CAG-25 into the skeletal muscles of DM1 HSALR mice expressing (CUG)250 RNA resulted in molecular and phenotypic changes manifested through the significant reduction or elimination of nuclear CUG repeat foci and a nearly complete reversion of abnormal splicing. These changes were driven by the release of MBNL1 protein from sequestration and the restoration of ion channel function, leading to a significant reduction of myotonia. Moreover, in treated muscles, the mutant transcript was efficiently exported from the nucleus and translated in the cytoplasm. Importantly, the researchers found that the beneficial molecular effects in vivo persisted for as long as 14 weeks after a single injection and that the expression of other transcripts containing short CUG repeats was not affected in skeletal muscle treated with CAG-25 (147).
Among the advantages of using repeat-targeting reagents, both ‘cutters’ and ‘blockers’, the most important is their potential application in several expansion-driven diseases. In contrast, widely exploited gene-specific and SNP-based strategies are applicable to only a fraction of patients suffering from specific disorders. On the other hand, the main challenge for repeat-targeting strategies is directing the reagent specifically to the mutant allele and leaving other repeat-containing transcripts intact. At this time, all potential mRNA off-targets are known (29), as discussed in the ‘Triplet Repeats are Frequent Motifs in Human Transcripts’ section of this review, and their representatives may be tested alongside the allele selectivity of silencing disease-causing genes.
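As an illustration of how candidate off-target transcripts might be flagged, the following sketch scans a sequence for uninterrupted tracts of a given triplet. The function name, the default threshold of 7 repeats (taken from the ~20-nt complementarity requirement mentioned earlier) and the example sequences are assumptions made for illustration; the sketch does not reproduce the analysis reported in (29).

```python
import re


def find_repeat_tracts(seq: str, unit: str = "CAG", min_repeats: int = 7):
    """Return (start, n_repeats) for each uninterrupted tract of `unit`.

    Assumptions: `start` is a 0-based offset, only pure tracts with at least
    min_repeats units are reported, and U is treated as T so that RNA and
    cDNA sequences can be scanned the same way.
    """
    seq = seq.upper().replace("U", "T")
    unit = unit.upper().replace("U", "T")
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [
        (m.start(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    # Synthetic example sequences, not real transcripts.
    example = "AAA" + "CAG" * 8 + "TTT" + "CAG" * 3
    print(find_repeat_tracts(example))                   # [(3, 8)]
    print(find_repeat_tracts("GG" + "CUG" * 7, "CUG"))   # [(2, 7)]
```

Interrupted tracts, which are common in normal alleles, would be reported as separate shorter tracts or missed entirely by this simple pattern, so a real survey would need a more tolerant definition of a repeat tract.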
Small compounds that bind specifically to repeat RNA hairpins
Another strategy considered for the treatment of some TREDs is based on the identification of small compounds that interfere with pathogenic interactions between expanded RNA repeats and proteins. Different groups have used various approaches to identify ligands that specifically bind CUG and CAG repeat hairpins (148–153). Screening a library of ~11 000 compounds yielded a few molecules that showed selectivity for binding to either short or expanded CUG repeat hairpins (148). These ligands were able to prevent the CUG repeat/MBNL1 interaction in vitro with a low micromolar inhibition constant. In another work, pentamidine and neomycin B were shown to inhibit the interaction of short CUG repeat RNA with MBNL1 in vitro (152). However, only the former drug reversed the mis-splicing of two DM1-affected transcripts and released MBNL1 from nuclear inclusions in cells expressing 960 CUG repeats. Additionally, high doses of pentamidine administered by intraperitoneal injection into HSALR mice partially reversed the mis-splicing of Serca1 and Clcn1 pre-mRNAs (152). In the most recent work, a d-amino acid hexapeptide (ABP1) was selected from a combinatorial peptide library screen in a DM1 Drosophila model (153). In vitro, ABP1 induced a switch of CUG hairpins to a single-stranded conformation, whereas in Drosophila it reduced CUG foci formation and suppressed CUG-induced lethality. An intramuscular injection of ABP1 into DM1 HSALR mice reversed muscle histopathology and the mis-splicing of Serca1 and Tnnt3.
An important issue regarding such compounds is their binding specificity for RNA structures formed by repeated sequences. To address this issue, Disney and colleagues designed several multivalent ligands containing between two and five Hoechst 33258 or kanamycin A derivatives attached to a peptoid backbone (Figure 5) (150,151,154). The modularly assembled ligands differed in the distance between their RNA-binding modules. All ligands were tested both for their binding to CAG and CUG repeat transcripts and for the inhibition of repeat RNA/MBNL1 interactions. The best multivalent ligands inhibited the formation of RNA–protein complexes with inhibition constants falling within the low nanomolar range. These promising compounds have not yet been tested in cellular or animal systems expressing expanded repeats; however, their cell permeability has been demonstrated (151).
FINAL REMARKS AND FUTURE PERSPECTIVES
In this article, we reviewed several aspects of research on triplet repeat RNA. The occurrence of TNRs in human transcripts was presented and the structural features of both normal and pathogenic repeats were described. The role of mutant TNRs as triggers of disease pathogenesis and targets in experimental therapies for TREDs was discussed wherever relevant, from the perspective of RNA structure.
At present, all annotated human mRNAs containing triplet repeat tracts have been identified. However, only a fraction of these transcripts is relatively well characterized in terms of TNR length polymorphism and tissue expression. There is a much larger gap in our knowledge of TNR RNAs if we take into consideration various antisense transcripts from human genes as well as sense and antisense transcripts from intergenic regions. Nevertheless, even this gap may be filled soon, as relevant data may be mined by bioinformatics from several recently completed and ongoing large-scale sequencing and expression profiling projects. We foresee that a clearer and nearly complete view of the human TNR RNA repeatome is just on the horizon and that this information may soon become available for more focused research.
The efforts of several laboratories over the past decade have resulted in a highly advanced structural characterization of TNR RNA, which was achieved using biochemical and biophysical methods. TNRs may now be considered a part of the human RN-ome that is among the best characterized structurally. We know which TNR types form G-quadruplex structures, which tend to form more stable and less stable hairpins and which are reluctant to form any higher order structures. We also know that repeats with the potential to form hairpin structures are overrepresented in exons and are therefore likely implicated in some specific biological functions. At present, however, the normal functions of TNRs in transcripts are very poorly understood.
The specific functions of TNRs in RNA need to be expressed via interactions with proteins. However, only a few proteins that interact directly with normal length TNR RNA have been identified thus far and had their interactions characterized. As described in this review, the length of TNR RNA influences its protein binding properties and only mutant TNRs are efficient in specific protein sequestration. This raises a more general question about the protein binding potential of normal TNR sequences. One possible answer is that this potential is low and only a few proteins are capable of engaging in such interactions. The alternative answer is that this potential is higher, but no systematic study has been conducted thus far to disclose it. In our opinion, further research focused on finding additional proteins that act on relatively short normal repeats and on expanded pathogenic repeats may help in better understanding the normal roles of these sequences, and possibly in identifying new pathways of pathogenesis in TREDs. High-throughput analyses of triplet repeat RNAs associating with proteins are also encouraged to identify their transcriptome-wide interaction maps in cells.
Looking forward to any future structural studies of TNR RNA, we predict that recent successes at resolving crystal structures of short TNR duplexes may stimulate efforts to determine high-resolution structures of longer repeat TNRs and their complexes with various biologically and therapeutically relevant ligands. Further progress in this direction may help to better understand the RNA-triggered pathogenic mechanisms in TREDs, and promote a rational design of repeat-targeting therapeutic agents.
Recent years have witnessed rapid progress in developing experimental therapies for TREDs in various cellular and animal model systems. Several different concepts and approaches have been successfully tested, making clinical trials on humans a realistic prospect. Repeat-targeting antisense reagents and RNA interference reagents acting in miRNA fashion seem to be the most promising treatment options. Efforts are now under way to better characterize such reagents and optimize their gene selectivity and allele selectivity, and minimize sequence-specific and non-specific off-target effects.
FUNDING
European Regional Development Fund within Innovative Economy Programme (POIG.01.03.01-00-098/08 to W.J.K.); Ministry of Science and Higher Education (N N301 569340 to W.J.K.; N N302 278937 to P.K.; N N302 260938 to K.S. and N N401 572140 to M.W.). Funding for open access charge: European Regional Development Fund within Innovative Economy Programme (POIG.01.03.01-00-098/08).
Conflict of interest statement. None declared.