Higher order RNA structures can mask splicing signals, loop out exons or constitute riboswitches, all of which contribute to the complexity of splicing regulation. We identified a G to A substitution between the branch point (BP) and the 3′ splice site (3′ss) of the Saccharomyces cerevisiae COF1 intron that dramatically impaired its splicing. RNA structure prediction and in-line probing showed that this mutation disrupted a stem in the BP-3′ss region. Analyses of various COF1 intron modifications revealed that the secondary structure reduced the effective BP to 3′ss distance and masked potential 3′ss. We demonstrated the same structural requirement for splicing of the UBC13 intron. Moreover, RNAfold predicted stable structures for almost all distant BP introns in S. cerevisiae and for selected examples in several other Saccharomycotina species. The use of intramolecular structure to localize the 3′ss for the second splicing step suggests the existence of a pre-mRNA structure-based mechanism of 3′ss recognition.
Ribonucleic acid is considered to be both the earliest and the most versatile information polymer: it carries sequence information and forms higher order structures. Precursor mRNAs (pre-mRNAs) of protein-coding genes are processed by the spliceosome to remove introns; this maturation step occurs, at least for some transcripts, in all eukaryotic cells studied so far. Spliceosomal introns are marked by four splicing signals at the nucleotide sequence level: the 5′ splice site (5′ss), branch point (BP), polypyrimidine tract (pY-tract) and 3′ splice site (3′ss). These signals alone, however, are not sufficient to predict a splicing event in the Metazoa. Additional inputs, including the propensity of the pre-mRNA to adopt a thermodynamically stable fold, are required ( 1 ). Recognition of the 3′ss is the most context dependent, because it is defined only by the dinucleotide AG, whereas the positions of the 5′ss and BP are demarcated by seven- and five-nucleotide sequences, respectively. The 3′ss is also the last signal to be recognized in the splicing cycle.
The ability of certain pre-mRNAs to form intramolecular secondary structures that affect the outcome of splicing was recognized more than 25 years ago ( 2 , 3 ). Various types of such structures have since been shown to affect both constitutive and alternative splicing in many species ( 4 , 5 ). They can act in several ways:
(A) Splicing signals or enhancer/silencer motifs can be base paired with complementary regions in stems/helices, which blocks their recognition by snRNAs or RNA-binding proteins. In the human SMN2 gene, a secondary structure involving the 5′ss of exon 7 hinders its interaction with U1 snRNA and leads to exon exclusion ( 6 ).
(B) Distant splicing signals of long introns, which are at the threshold of recognition, can be brought into proximity and thus made available to the spliceosome. In the two tandem introns of the YL8A gene, both of which form stems between the 5′ss and BP, swapping the complementary sequences between the introns causes exon skipping ( 7 ).
(C) Interactions over even longer ranges may loop out whole exons and induce complex patterns of alternative splicing. A conserved stem structure was responsible for the alternative exclusion of exon 5 in the Drosophila Nmnat gene ( 8 ).
(D) Higher order structural epitopes may bind regulatory proteins or, as in riboswitches, small metabolites ( 9 ). The Saccharomyces cerevisiae RPL30 transcript folds into a structure that binds the gene's own product, the L30 protein. L30 then blocks the spliceosomal rearrangements required for U2 snRNP-mediated recognition of the BP region and hence inhibits splicing of its own transcript ( 10 ).
Yeasts of the Saccharomyces genus ‘sensu stricto’ ( 11 ) are intron-poor organisms, with splicing limited to only ∼5% of their genes ( 12 ). Their introns, mostly one per gene, reach up to ∼1000 nt in length. Some introns with a long 5′ss to BP distance, e.g. that of RPS17B, require a secondary structure within the pre-mRNA for efficient BP recognition and assembly of the spliceosome for the first catalytic step ( 13–15 ). Recognition of the 3′ss also depends on the number of nucleotides separating the BP and 3′ss, but this dependence is not well understood at present. Artificially extending the BP to 3′ss distance of the ACT1 intron in S. cerevisiae to ∼120 nt completely abolishes splicing ( 16 ). However, 22 introns in S. cerevisiae have a BP to 3′ss distance longer than 60 nt (Saccharomyces Genome Database), and in these the spliceosome must rely on additional mechanisms for 3′ss recognition.
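To make the BP to 3′ss distance concrete, the following minimal Python sketch locates the canonical S. cerevisiae branch point heptamer UACUAAC in an intron sequence and counts the nucleotides separating the branch adenosine from the terminal 3′ss AG. It assumes that the intron is given from 5′ss to 3′ss, that the branch adenosine is the sixth nucleotide of UACUAAC and that only the last consensus match matters. The counting convention and all function names are illustrative assumptions, so the numbers may differ by a nucleotide or two from published distances; the 60 nt flag simply mirrors the cut-off quoted above.

```python
BP_CONSENSUS = "UACUAAC"  # canonical S. cerevisiae branch point heptamer
BRANCH_A_OFFSET = 5  # 0-based index of the branch adenosine within UACUAAC


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def branch_adenosine(intron: str) -> int:
    """0-based index of the branch adenosine in the last UACUAAC match."""
    seq = normalize(intron)
    pos = seq.rfind(BP_CONSENSUS)
    if pos == -1:
        raise ValueError("no UACUAAC branch point consensus found")
    return pos + BRANCH_A_OFFSET


def bp_to_3ss_distance(intron: str) -> int:
    """Nucleotides strictly between the branch adenosine and the terminal AG."""
    seq = normalize(intron)
    if not seq.endswith("AG"):
        raise ValueError("intron is expected to end with the 3'ss AG")
    ag_start = len(seq) - 2
    return ag_start - branch_adenosine(seq) - 1


def is_distant_bp(intron: str, threshold: int = 60) -> bool:
    """Flag introns whose BP-3'ss spacing exceeds threshold (60 nt in the text)."""
    return bp_to_3ss_distance(intron) > threshold


# Hypothetical toy intron: 5'ss, spacer, BP heptamer, 20 nt, 3'ss AG
toy = "GUAUGU" + "A" * 10 + "UACUAAC" + "C" * 20 + "AG"
print(bp_to_3ss_distance(toy), is_distant_bp(toy))  # 21 False
```

Applied to real introns, a sequence would be flagged as a distant BP intron only if its spacing, under this convention, exceeds 60 nt.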
Here, we compare the splicing efficiencies of wild-type and modified COF1 and UBC13 introns, both of which have a long BP to 3′ss distance. Our data suggest that a stable stem-loop forms between the BP and 3′ss in these introns. We show that the secondary structure is essential for recognition of the proper 3′ss, both by shortening the structural distance between the BP and 3′ss and by masking BP-proximal cryptic 3′ss. As RNA structure analysis tools also predict structures in other long Saccharomycotina introns, we reason that these and perhaps other organisms use a pre-mRNA structure-based mechanism of 3′ss recognition.
MATERIALS AND METHODS
Yeast strains, media and growth conditions
Primer extension experiments were performed using S. cerevisiae strain EGY48 ( MATα his3 trp1 ura3 LexAop(x6)-LEU2 ) ( 17 ). Strain 46ΔCup ( MATa ade2 cup1Δ::ura3 his3 leu2 lys2 trp1 ura3, GAL+ ) ( 18 ) was used for an in vivo copper sensitivity splicing assay. Cells were grown at 30°C in YPD plus adenine or in synthetic complete drop-out media supplemented with the required amino acids. To test Cu2+ resistance, cells expressing a CUP1 fusion reporter were grown to an OD600 of approximately 0.4, concentrated to an OD600 of 4, spotted in 8-fold serial dilutions on plates containing the indicated concentrations of CuSO4 and incubated for 3 days.
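The nominal cell densities in the spot assay follow from simple arithmetic. The sketch below assumes that every spot is a further 1:8 dilution of the OD600 4 suspension; the number of spots is not stated in the text, so it is left as a parameter (five is used here purely for illustration).

```python
def dilution_series(start_od: float, fold: int, n_spots: int) -> list[float]:
    """Nominal OD600 of each spot in a serial dilution series starting at start_od."""
    return [start_od / fold ** i for i in range(n_spots)]


# Concentrated suspension at OD600 = 4, serially diluted 8-fold; 5 spots assumed
for spot, od in enumerate(dilution_series(4.0, 8, 5), start=1):
    print(f"spot {spot}: OD600 ~ {od:g}")
```

With five spots, the last one carries 8^4 = 4096-fold fewer cells than the first.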
Construction of splicing reporters
All CUP1-based reporters were expressed from the replicative vector p423GPD. Plasmid constructs used in this study are listed in Supplementary Data ( Supplementary Table S2 ). The COF1-CUP1 reporter was constructed as follows: a 221 bp fragment of the COF1 gene (exon 1 including 13 nucleotides upstream of the translation start codon, the intron and 15 nucleotides of exon 2) and the complete coding sequence of the CUP1 gene were amplified from genomic DNA by polymerase chain reaction (PCR) using primer pairs OG44/OG45 and OG46/OG47, respectively ( Supplementary Data, Supplementary Table S3 ), and TOPO-TA cloned into the pCR®II-TOPO® vector (Invitrogen). The BamHI/EcoRI fragment of COF1 and the EcoRI/SalI fragment of CUP1 were inserted into p423GPD, resulting in a 416 bp COF1-CUP1 fusion expressed from the TDH3 promoter. COF1-CUP1 reporters with single-nucleotide substitutions were generated in p423GPD by site-directed PCR-based mutagenesis using the QuikChange® II Site-Directed Mutagenesis Kit (Stratagene) and the primers listed in Supplementary Table S3. DNA fragments encoding COF1-CUP1 reporters with internal intron deletions and all UBC13-CUP1 reporters were synthesized commercially by GeneArt (Germany) and inserted into BamHI/SalI-digested p423GPD. The UBC13-CUP1 reporter contained a UBC13 fragment (exon 1 including 19 nucleotides upstream of the translation start codon, the intron and 23 nucleotides of exon 2) and the complete coding sequence of CUP1.
Primer extension analysis
Cells harboring a reporter plasmid were grown to an OD600 of approximately 0.5–0.8 and harvested. Total RNA was isolated with the MasterPure™ Yeast RNA Purification Kit (Epicentre Biotechnologies). Primer extension reactions were performed with RevertAid™ M-MuLV Reverse Transcriptase (Fermentas) on 3–4 µg of total RNA. The reactions were primed with oligonucleotide YAC6, which anneals to the 5′ end of the CUP1 ORF, and YU14, which is complementary to U14 snoRNA. Primers were radiolabeled at their 5′ ends by phosphorylation with T4 Polynucleotide Kinase (Fermentas) and [γ-32P]ATP (3000 Ci/mmol; MP Biomedicals). The products were separated on 8% polyacrylamide/7 M urea gels and visualized with a phosphorimager. The identities of selected bands (see ‘Results and Discussion’ section) were confirmed using the 5′-RACE System for Rapid Amplification of cDNA Ends (Invitrogen).
RNA in-line probing
Information on the generation of DNA templates for in vitro transcription, preparation of RNA, RNA end-labeling and generation of RNA ladders is provided in Supplementary Data. In-line probing was carried out essentially as described in ( 19 ). To monitor the stability of the various intronic sequences, RNAs were incubated for 45 h at 10°C, 20°C, 30°C or 37°C. The incubations were terminated by adding an excess of gel loading buffer. The products of spontaneous RNA degradation were separated, together with untreated RNA and the products of RNase T1 digestion, on denaturing polyacrylamide gels containing 7 M urea. Gels were run at a limiting current of 25 mA for at least 8 h. Visualization was carried out by phosphorimaging.
RNA structure predictions
Secondary structures of introns were predicted with the RNAfold ( 20 ) and RNAshapes ( 21 ) algorithms. Free energies of secondary structures were calculated using RNAfold with default settings, except that the temperature was set to 30°C. Sequences of the analyzed S. cerevisiae introns were downloaded from the Saccharomyces Genome Database ( http://www.yeastgenome.org/ ).
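RNAfold computes a minimum free energy structure from nearest-neighbor energy parameters, and its command-line version rescales those parameters to another temperature with the -T option (e.g. RNAfold -T 30 for the 30°C used here). Reimplementing that energy model is beyond a short example, so the sketch below swaps in the much simpler Nussinov algorithm, which maximizes the number of nested Watson-Crick and G-U pairs by dynamic programming. It only illustrates the fill-and-traceback idea shared by secondary structure predictors: it is not how RNAfold scores structures and will not reproduce its predictions, and the 3 nt minimum hairpin loop is an assumption.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired nucleotides in a hairpin loop


def nussinov(seq: str) -> tuple[int, str]:
    """Maximize nested base pairs; return (pair count, dot-bracket structure)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of nested pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, best[i + 1][k - 1] + 1 + right)
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == best[i + 1][k - 1] + 1 + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return best[0][n - 1], "".join(structure)


count, db = nussinov("GGGAAAUCCC")
print(count, db)
```

Maximizing pair count ignores stacking energies and loop penalties, which is exactly why energy-based tools such as RNAfold, Mfold or MC-Fold were used for the actual predictions.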
RESULTS AND DISCUSSION
Efficient splicing of COF1 intron requires the formation of a stem between BP and 3′ splice site
While screening UV-mutagenized S. cerevisiae cells for splicing-defective mutations, we identified a G to A transition 31 nt upstream of the 3′ss in the COF1 intron (referred to as G149A; Figure 1 A). The mutation caused an approximately 6-fold increase in pre-mRNA and a 2-fold decrease in mRNA levels compared with wild-type cells (data not shown), suggesting a splicing defect. COF1 belongs to the subset of budding yeast genes with an exceptionally long distance between the BP and 3′ss. This is intriguing because, according to current literature, a 3′ss in S. cerevisiae should lie no further than 55 nt from the BP to be efficiently identified as an acceptor site ( 16 ). We therefore analyzed the sequence of the COF1 intron ( COF1i ) using available algorithms for RNA structure prediction.
Models based on RNAfold ( 20 ) and RNAshapes ( 21 ) predicted the formation of a long, stable stem structure between the BP and 3′ss in wild-type COF1i. When the G149A substitution was introduced, an additional internal loop appeared within the stem, destabilizing the predicted structure ( Figure 1 B). Other secondary structure prediction tools, including Mfold ( 22 ), which uses different physical parameters, and the knowledge-based MC-Fold ( 23 ), gave qualitatively similar results. All the algorithms applied to the wild-type intron sequence predicted two double-stranded regions, hereafter referred to as the ‘inner’ and ‘outer’ stem (highlighted in Figure 1 B). Importantly, the same stem was predicted regardless of the length of the flanking sequences at the 5′ and/or 3′ end. We therefore set out to test the prediction that the stem forms between nucleotides 75 and 153 of COF1i ( Figure 1 B) and to study the splicing of wild-type and mutant COF1i versions in more detail.
To assess splicing efficiency, we performed primer extension analysis of COF1-CUP1 fusion reporters expressed in S. cerevisiae strain EGY48. Pre-mRNA containing unmodified COF1i was spliced efficiently, whereas the G149A mutation caused a severe splicing defect, with barely detectable amounts of spliced mRNA ( Figure 2 A, lanes 1 and 2). Because G149A destabilized pairing in the inner stem of the predicted structure ( Figure 1 B), we asked whether the G149A defect could be suppressed by substituting the predicted complementary nucleotide (C80U). As expected, the G149A+C80U double mutant regained wild-type stability and was spliced efficiently ( Figure 2 A, lane 3). The C80U single mutation affected neither the stability predicted by RNAfold nor splicing ( Figure 2 A, lane 4), presumably because the resulting G-U pair in the stem is stable enough to maintain wild-type properties. To disrupt the structure by an independent mutation, we altered the nucleotide adjacent to G149. The A148U ( Figure 2 A, lane 5) and A148C (not shown) substitutions reduced both the predicted structure stability and splicing, similarly to G149A, whereas A148G, which allows alternative G-U pairing, had no effect ( Figure 2 A, lane 6). In all cases tested, the calculated structure stability correlated with the experimentally measured splicing efficiency. To test whether the secondary structure or the sequence of the predicted inner stem region is crucial for efficient COF1i splicing, we randomized all 10 base pairs of the inner stem such that the stability of the modeled structure remained the same (this variant is referred to as COF1 (hel); Figure 2 B, left panel). Although the sequence was extensively altered, splicing of this intron was not impaired, as documented by primer extension ( Figure 2 B, middle panel).
Efficient splicing was also demonstrated by the resistance of cup1-Δ cells expressing the COF1 (hel)-CUP1 reporter to increased Cu2+ concentrations ( Figure 2 B, right panel).
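The logic of the compensatory mutations can be summarized as a base-pair compatibility check. The sketch below treats Watson-Crick pairs and G-U wobble pairs as compatible and everything else as a mismatch. The G149:C80 pair is taken from the text; the U partner of A148 is an assumption inferred from the statement that A148G allows G-U pairing, and the table is an illustration rather than a reanalysis of the data.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(a: str, b: str) -> str:
    """Classify two nucleotides as 'Watson-Crick', 'wobble' or 'mismatch'."""
    pair = (a.upper(), b.upper())
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


# (variant, nucleotide on the mutated arm, nucleotide on the opposite arm)
VARIANTS = [
    ("wild type (G149:C80)", "G", "C"),
    ("G149A", "A", "C"),
    ("C80U", "G", "U"),
    ("G149A+C80U", "A", "U"),
    ("A148U (partner U assumed)", "U", "U"),
    ("A148C (partner U assumed)", "C", "U"),
    ("A148G (partner U assumed)", "G", "U"),
]

for name, x, y in VARIANTS:
    print(f"{name:28s} {x}-{y}: {pair_type(x, y)}")
```

In this table the mismatch rows (G149A, A148U, A148C) coincide with the poorly spliced variants, while the Watson-Crick and wobble rows coincide with efficient splicing.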
We used in-line probing ( 19 ) to compare the relative stabilities of 5′-3′ phosphodiester bonds in wild-type and G149A RNAs. The method reports on secondary structure because base-paired nucleotides cannot adopt the in-line conformation required for spontaneous RNA cleavage, whereas nucleotides in single-stranded regions can. For wild-type COF1i, degradation started to appear at 30°C and increased slightly at 37°C (wild-type; Figure 2 C). The G149A structure differed from the wild type in several respects. First, there were strong additional in-line cuts in the regions A147-A151 and C80 (G149A; Figure 2 C), which were predicted to form the complementary arms of the inner stem ( Figure 1 B). Second, the G149A intron was considerably less stable than the wild type at 37°C; in some experiments, we observed its almost complete degradation (data not shown). When the compensatory C80U mutation was introduced into the G149A intron, the in-line probing pattern reverted to that of the wild type (G149A+C80U; Figure 2 C), indicating that the secondary structure was re-stabilized. Taken together, these results demonstrate that a secondary structure that reduces the BP to 3′ss distance, rather than any particular sequence motif between the two splicing signals, is critical for efficient COF1i splicing.
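The interpretation of in-line probing rests on a simple rule: base-paired nucleotides are expected to be protected, whereas unpaired ones are candidate cleavage sites. As a sketch of how such an expectation can be derived from a predicted structure in dot-bracket notation (the format RNAfold prints), the functions below parse the brackets into a pair table and return the unpaired positions. The structures in the example are hypothetical, not the COF1i model, and the rule is a simplification: flexible paired nucleotides at helix ends or next to loops can also be cleaved.

```python
def pair_table(dot_bracket: str) -> dict[int, int]:
    """Map each paired position (0-based) to its partner; reject unbalanced input."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def predicted_inline_sites(dot_bracket: str, first_position: int = 1) -> list[int]:
    """Unpaired positions, numbered from first_position, expected to be cleavage-prone."""
    pairs = pair_table(dot_bracket)
    return [i + first_position for i in range(len(dot_bracket)) if i not in pairs]


# Hypothetical hairpin and a variant in which one stem pair is disrupted (1x1 internal loop)
wild_type = "((((....))))"
variant = "((.(....).))"
wt_sites = predicted_inline_sites(wild_type)
variant_sites = predicted_inline_sites(variant)
print(wt_sites)  # [5, 6, 7, 8]
print(sorted(set(variant_sites) - set(wt_sites)))  # [3, 10]
```

The new sites appear on both arms of the stem, which is the qualitative signature described above for G149A (additional cuts around A147-A151 and at C80).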
Secondary structure within COF1 intron masks potential 3′ splice sites
We generated a set of internal COF1i deletions ( Figure 3 A) and tested their splicing efficiency using CUP1 fusion reporters. Deletion of nucleotides 91 to 140, which are predicted to form the outer stem, did not detectably affect splicing efficiency ( cof1 (Δ91-140); Figure 3 B, lanes 1 and 2). However, concomitant destabilization of the inner stem led to a dramatic decrease in the mRNA signal and the appearance of an additional product [ cof1 (Δ91-140, G149A); Figure 3 B, lane 3]. Using 5′-RACE, we found that this product corresponds to mRNA spliced to an AAG located 27 nt upstream of the regular 3′ss. As expected, adding the C80U mutation, which is complementary to G149A and should stabilize the stem, partially restored the use of the annotated 3′ss ( cof1 (Δ91-140, G149A+C80U); Figure 3 B, lane 4). We then deleted the whole stem, obtaining an intron with 56 nt between the BP and 3′ss [ cof1 (Δ76-152)]. This variant was spliced to a CAG positioned 23 nt upstream of the regular 3′ss ( Figure 3 B, lane 5). Thus, both cof1 (Δ91-140, G149A) and cof1 (Δ76-152), which are predicted (RNAfold) to lack stable structures, were spliced to the first acceptor AG downstream of the BP. Crucially, a variant with a BP–3′ss distance of 31 nt (similar to the S. cerevisiae median) was spliced as efficiently as full-length COF1i, generating wild-type mRNA [ cof1 (Δ76-176); Figure 3 B, lane 6].
In summary, we demonstrated that the spliceosome requires neither the complementary sequences nor the stem–loop structure per se. Rather, the secondary structure formed between the BP and 3′ss of the COF1 intron masks sequences that might otherwise serve as acceptor sites and thereby ensures proper 3′ss choice. Similar conclusions were reached when the distant branch point (dBP) intron of the Kluyveromyces lactis ACT gene was analyzed as a heterologous construct in S. cerevisiae ( 24 ).
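The deletion results fit a simple qualitative picture in which the acceptor is the first AG downstream of the BP that is not sequestered in a helix. The sketch below encodes that picture as a toy model. It is an illustrative assumption, not a mechanism established by the data: it ignores the AG sequence context, the minimal spacing needed downstream of the BP and the role of the stem in shortening the BP–3′ss distance. The sequence, structure and function names are hypothetical; the structure uses the dot-bracket notation printed by RNAfold.

```python
from typing import Optional


def paired_positions(dot_bracket: str) -> set[int]:
    """Positions marked as paired in a (balanced) dot-bracket string."""
    return {i for i, char in enumerate(dot_bracket) if char in "()"}


def first_accessible_ag(seq: str, dot_bracket: str, bp: int) -> Optional[int]:
    """0-based index of the A of the first AG after bp whose A and G are both unpaired."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    masked = paired_positions(dot_bracket)
    for i in range(bp + 1, len(seq) - 1):
        if seq[i:i + 2] == "AG" and i not in masked and i + 1 not in masked:
            return i
    return None


# Hypothetical intron 3' region: BP heptamer, a 5 bp stem hiding a proximal AG,
# then a short tail ending in the downstream AG.
seq = "UACUAAC" + "CAGGG" + "AAAA" + "CCCUG" + "UUUAG"
structure = "......." + "(((((" + "...." + ")))))" + "....."
bp = 5  # branch adenosine of UACUAAC

print(first_accessible_ag(seq, structure, bp))       # 24: distal AG, proximal AG masked
print(first_accessible_ag(seq, "." * len(seq), bp))  # 8: proximal AG when unstructured
```

Removing the helix from the model exposes the proximal AG, which parallels the use of the first AG downstream of the BP in the structure-less cof1 (Δ91-140, G149A) and cof1 (Δ76-152) variants.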
Secondary structure within UBC13 intron aids splicing in a temperature-dependent manner
We modeled the structures of all S. cerevisiae introns with a BP to 3′ss distance longer than 50 nt (RNAfold). The vast majority of these introns were predicted to fold into a structure resembling the stem–loop characterized in COF1i (RNAfold; Figure 4 A, C and Supplementary Table S1 ). To test further whether S. cerevisiae dBP introns depend on secondary structure between the BP and 3′ss for splicing, we designed a CUP1-based splicing reporter for the UBC13 gene, which has the second longest BP–3′ss sequence in S. cerevisiae (155 nt). We introduced multiple substitutions into one arm of the presumed stem of the UBC13 intron ( UBC13i ), which destabilized the predicted structure (‘disordered’ in Figure 4 A and Supplementary Figure S1 ). The disordered UBC13i did not produce wild-type mRNA but was instead spliced to a CAG located 38 nt downstream of the BP (confirmed by 5′-RACE); this site was apparently masked by the stem structure in the wild-type intron ( Figure 4 A, lanes 1 and 2). When base pairing (but not the original sequence) in the predicted stem was restored through a set of complementary mutations ( Figure 4 A and Supplementary Figure S1 ), the wild-type splicing pattern was observed ( Figure 4 A, lane 3). Clearly, the requirement for a secondary structure to overcome a long BP–3′ss distance and to mask BP-proximal sequences is not limited to COF1i.
The point mutation G232A, which caused only a mild decrease in the stability of the predicted structure at 30°C (data not shown), resulted in splicing at several AGs, including the annotated 3′ss ( Figure 4 B, lane 3). At 39°C, however, aberrant 3′ss were preferentially used ( Figure 4 B, lane 4). The compensatory C148U mutation suppressed, in a temperature-dependent manner, the use of the additional 3′ss ( Figure 4 B, lanes 5 and 6). Because the stability of folded RNA is temperature dependent, these findings further support the hypothesis that the stem structure is responsible for proper 3′ss selection.
Long BP–3′ss sequences encode secondary structures in several Saccharomycotina species
We demonstrated that higher order structures are required for splicing of dBP introns in S. cerevisiae. We also found such structures in other intron-poor Saccharomycotina species (hemiascomycetes) ( 11 ). RNAfold predicted stem-loops downstream of the BP in the COF1 introns of five species of the Saccharomyces ‘sensu stricto’ genus ( S. cerevisiae, S. paradoxus, S. kudriavzevii, S. mikatae and S. bayanus; Supplementary Figure S2B ). A multiple alignment of the intron sequences revealed high conservation in the regions predicted to base pair ( Supplementary Figure S2A ). For most of the nucleotides that are not conserved, base pairing is preserved: a change in one strand either still pairs with the nucleotide on the other strand (e.g. an A-U pair is changed to G-U) or is matched by co-evolution of the opposite strand (e.g. a G-C pair is replaced by A-U). Thus, there appears to be selection against mutations that destabilize the secondary structure. Notably, the AGs between the BP and the physiological 3′ss do not seem to be readily usable (they would not give rise to translatable mRNAs). We found the same conservation at the level of both primary and secondary structure for the UBC13, YDR381C-A and UBC12 introns (data not shown). In fact, all other dBP introns we examined within the genus were similarly conserved. Outside the genus, COF1i was conserved in position and secondary structure in Candida glabrata and Kluyveromyces lactis. In C. glabrata, all dBP introns tested, e.g. that of RPS4A, were structured. A long, structured intron was previously found in the K. lactis ACT gene ( 24 ), but it is not conserved in the Saccharomyces genus.
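The comparative argument can be made operational by classifying each predicted base pair across aligned sequences. The sketch below compares one stem pair in a reference species with the aligned pair in another species and reports whether it is identical, kept by a one-sided change (e.g. A-U to G-U), kept by a compensatory change on both strands (e.g. G-C to A-U) or broken. The categories follow the text; the function and its input format are hypothetical, and real covariation analyses also handle alignment gaps and assess statistical support.

```python
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def classify_pair_change(ref_pair: tuple[str, str], other_pair: tuple[str, str]) -> str:
    """Compare a stem base pair in a reference species with the aligned pair in another."""
    ref = (ref_pair[0].upper(), ref_pair[1].upper())
    other = (other_pair[0].upper(), other_pair[1].upper())
    if other not in CANONICAL:
        return "broken"
    changed = sum(r != o for r, o in zip(ref, other))
    if changed == 0:
        return "identical"
    if changed == 1:
        return "pairing kept (one-sided change)"
    return "pairing kept (compensatory change)"


examples = [
    (("A", "U"), ("G", "U")),  # A-U -> G-U: one strand changes, wobble pair remains
    (("G", "C"), ("A", "U")),  # G-C -> A-U: both strands co-vary
    (("G", "C"), ("A", "C")),  # G-C -> A-C: pairing lost
    (("G", "C"), ("G", "C")),  # unchanged
]
for ref, other in examples:
    print("-".join(ref), "->", "-".join(other), ":", classify_pair_change(ref, other))
```

Counting how often the "broken" category occurs relative to the two "pairing kept" categories is one simple way to express the selection against destabilizing mutations described above.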
A previous analysis of the phylogenetic distribution of BP–3′ss distances within Saccharomycotina revealed two groups of species ( 12 ): those with a constrained BP to 3′ss distance (e.g. Debaryomyces hansenii; 7–8 nt) and those with unconstrained BP–3′ss spacing (the Saccharomyces genus, C. glabrata and K. lactis; distances reach up to 166, 471 and 185 nt, respectively; http://genome.jouy.inra.fr/genosplicing/index.html ). Outside the Saccharomyces ‘sensu stricto’ genus, conservation of an intron's position within a gene did not imply conservation of its BP–3′ss length. Also, the complementary regions of the dBP introns we analyzed showed no homology to transposons ( 24 ). Importantly, however, in every Saccharomycotina intron we checked, a BP–3′ss sequence longer than 60 nt was predicted by RNAfold to fold into a stable structure.
Mechanisms of 3′ss recognition
In S. cerevisiae, splice site consensus sequences are recognized repeatedly during across-intron spliceosome assembly and the subsequent rearrangements through both catalytic steps ( 25 ). For lariat formation, the pre-mRNA substrate must contain 23 nucleotides downstream of the BP, regardless of their sequence ( 26 , 27 ). This type of splicing, typical of S. cerevisiae, where the branch site is strictly conserved, is called AG-independent. In so-called AG-dependent splicing, which is typical of some mammalian introns with weak pY-tracts, an acceptor YAG trinucleotide must be present for the first step of splicing to proceed ( 28 ). However, the YAG seems to be required for spliceosome assembly rather than for exon ligation; experiments with 3′ substrates supplied in trans clearly showed that these two phases can be separated ( 29 ). It therefore seems that, for both S. cerevisiae and mammalian introns, the acceptor YAG must be correctly positioned only before the second step. Consistent with this, we observed accumulation of the lariat–exon 2 intermediate in all cases where disruption of the secondary structure inhibited mRNA formation ( Figures 2–4 ).
In mammalian cells, recognition of the 5′ss and the BP_pY-tract_3′ss regions of long introns involves the cotranscriptional formation of complexes across the flanking short exons (exon definition complex) ( 30 , 31 ). dBP introns, which comprise around 0.6% of all human introns, pose the additional problem of bridging the separation between the BP_pY-tract and the 3′ss. Long sequences between the BP and 3′ss (>40 nt) are usually devoid of AG dinucleotides (AG exclusion zone, AGEZ) ( 32 ) and are presumed to be scanned by the spliceosome (leaky scanning model) ( 33 ). An example is the human serotonin receptor 4 gene ( HTR4 ), in which the dBP introns 3, 4 and 5 contain AGEZs of 149–291 nt ( 34 ). We examined these and other human BP_pY-tract_3′ss regions and found them to be unstructured (RNAfold; data not shown).
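The AG exclusion zone can be measured directly from sequence: it is the stretch downstream of the BP that contains no AG dinucleotide, and under the leaky scanning model the first AG after it is the candidate acceptor. The sketch below computes that length, assuming the index of the branch adenosine is already known and counting from the next nucleotide up to (but not including) the A of the first AG. This counting convention is an assumption, the >40 nt cut-off quoted in the text is used only as an illustrative flag, and the example sequence is hypothetical.

```python
def agez_length(seq: str, bp: int) -> int:
    """Length of the AG-free stretch that starts just after the branch adenosine.

    Counts nucleotides from bp + 1 up to (not including) the A of the first AG,
    or to the end of the sequence if no AG follows.
    """
    seq = seq.upper().replace("T", "U")
    start = bp + 1
    first_ag = seq.find("AG", start)
    end = first_ag if first_ag != -1 else len(seq)
    return end - start


def has_agez(seq: str, bp: int, min_length: int = 40) -> bool:
    """True if the AG-free stretch downstream of the BP is longer than min_length nt."""
    return agez_length(seq, bp) > min_length


# Hypothetical BP region: UACUAAC, 50 nt of CU repeats, then the first AG
region = "UACUAAC" + "CU" * 25 + "UUUAG"
print(agez_length(region, bp=5), has_agez(region, bp=5))  # 54 True
```

A long AGEZ is what makes scanning unambiguous in human dBP introns; the yeast introns discussed here instead contain proximal AGs that must be masked by structure.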
In contrast, recognition of a distant 3′ss in S. cerevisiae does not obey the scanning model ( 35–37 ) and depends on the formation of a secondary structure. We reason that the requirement for a secondary structure in splicing of dBP introns, together with the presence of silent proximal AGs within that structure, confirms that the S. cerevisiae spliceosome cannot use a processive scanning mechanism to locate a distant 3′ss.
Splicing signals must be recognized both over distance and among competing sequences. We demonstrated experimentally in S. cerevisiae that the COF1 and UBC13 introns, both of which have distant branch points, are spliced with the aid of intramolecular structure within the pre-mRNA. This dependence of splicing on intron structure may have evolved during the reductive evolution of hemiascomycetes ( 12 , 38 ). It remains to be seen whether pre-mRNA structure-mediated recognition of the 3′ss is confined to intron-poor yeasts or also exists in higher eukaryotes, which employ more complex networks of splicing regulation. Nascent RNA secondary structure offers exciting possibilities for the discovery of novel regulatory mechanisms, as introns with long BP–3′ss distances may have acquired additional functions that proved advantageous to the organisms.
Supplementary Data are available at NAR Online.
Czech Ministry of Education, Youth and Sports grants (MSM0021620858; LC07032); Grant Agency of the Charles University grant (398811); Heisenberg stipend by the Deutsche Forschungsgemeinschaft (HA 3459/5 to C.H.). Funding for open access charge: Czech Ministry of Education, Youth and Sports grant LC07032.
Conflict of interest statement. None declared.
Anne Kalweit is acknowledged for generating the DNA templates for in vitro transcription.