Christian Twittenhoff, Vivian B Brandenburg, Francesco Righetti, Aaron M Nuss, Axel Mosig, Petra Dersch, Franz Narberhaus, Lead-seq: transcriptome-wide structure probing in vivo using lead(II) ions, Nucleic Acids Research, Volume 48, Issue 12, 09 July 2020, Page e71, https://doi.org/10.1093/nar/gkaa404
Abstract
The dynamic conformation of RNA molecules within living cells is key to their function. Recent advances in probing the RNA structurome in vivo, including the use of SHAPE (Selective 2′-Hydroxyl Acylation analyzed by Primer Extension) or kethoxal reagents or DMS (dimethyl sulfate), provided unprecedented insights into the architecture of RNA molecules in the living cell. Here, we report the establishment of lead probing in a global RNA structuromics approach. In order to elucidate the transcriptome-wide RNA landscape in the enteric pathogen Yersinia pseudotuberculosis, we combined lead(II) acetate-mediated cleavage of single-stranded RNA regions with high-throughput sequencing. This new approach, termed ‘Lead-seq’, provides structural information independent of base identity. We show that the method recapitulates secondary structures of tRNAs, RNase P RNA, tmRNA, 16S rRNA and the rpsT 5′-untranslated region, and that it reveals global structural features of mRNAs. The application of Lead-seq to Y. pseudotuberculosis cells grown at two different temperatures unveiled the first temperature-responsive in vivo RNA structurome of a bacterial pathogen. The translation of candidate genes derived from this approach was confirmed to be temperature regulated. Overall, this study establishes Lead-seq as a complementary approach to interrogate intracellular RNA structures on a global scale.
INTRODUCTION
RNA is a versatile biomolecule with diverse functions, ranging from carrying genetic information to regulatory as well as catalytic activities. Interactions between non-adjacent nucleotides (nt) establish secondary structures, whereas long-range interactions within the RNA molecule (e.g. kissing loops, tetraloop:receptor interactions or pseudoknots) and inter-molecular interactions (e.g. RNA–RNA or RNA–protein interactions) contribute to the formation of a three-dimensional (3D) structure. Structured regions in RNAs often have an impact on transcription, transcript maturation and degradation as well as translation, and thereby modulate gene expression. Non-coding RNAs (ncRNAs), such as tRNAs, rRNAs, sRNAs and ribozymes, typically fold into stable defined secondary structures. In contrast, many mRNAs harbor regulatory structural motifs that undergo dynamic conformational alterations in response to physical or chemical stimuli. Two known cis-encoded RNA regulators are ligand-binding riboswitches (1,2) and RNA thermometers (RNATs) (3). RNATs are frequently located in the 5′-UTR (5′-untranslated region) of bacterial mRNAs, regulating translation of heat shock- or virulence-related genes in response to temperature (4,5). The thermosensor adopts a thermo-labile secondary structure that prevents binding of the ribosome to the Shine-Dalgarno sequence at low temperatures. Upon increasing temperature, the RNAT gradually melts, allowing for ribosome binding and translation initiation (3). The reversibility of the zipper-like mechanism precisely adjusts translation efficiency to the cellular demand (6).
Since the structure and function of RNA molecules are interconnected, the secondary and tertiary architectures of RNAs are of great interest. Several methods are available to inspect folded RNAs. Secondary structures can be predicted computationally (7) by comparative sequence alignment or by predicting the thermodynamically most favorable structure (8). Experimentally, RNA structures can be solved by biophysical methods, such as nuclear magnetic resonance spectroscopy, X-ray crystallography, or cryo-electron microscopy (9–12). A widely used experimental approach is the probing of RNA structures in solution with the help of enzymes or chemicals (13). Recently, powerful global or transcriptome-wide structure probing methods have been established. These couple RNA structure probing techniques to next generation sequencing and uncover the RNA structurome of any given organism, even at different physiological conditions (14–16). The probes used in these approaches are either nucleases that digest the ribose-phosphate backbone, or chemicals that modify the RNA bases or ribose moiety. Due to their biological and physical properties, both enzymatic and chemical probes have their advantages and limitations (17,18). Enzymatic probes cleave nt either in a sequence-specific (e.g. RNase T1 that preferentially cleaves at unpaired guanines) or in a sequence-unspecific (e.g. Nuclease S1) manner (19). RNase V1 is the only probe that permits mapping of double-stranded nt (20). Due to their bulky structure, RNases may not be able to reach all single- or double-stranded regions and are not able to penetrate cell membranes, which limits their application to in vitro approaches (19). In contrast, chemical probes are suitable for in vivo probing due to their small size. All presently available RNA modification probes exclusively react with the bases or the backbone of single-stranded regions. 
For example, DMS (dimethyl sulfate) alkylates the Watson–Crick face of adenine and cytosine residues, leading to termination of reverse transcription (21–24). Complementary to DMS-mediated probing, small carbodiimides form adducts with exposed Gs and Us of RNA structures. While the use of the carbodiimide CMCT (N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide metho-p-toluene sulfonate) is restricted to in vitro applications (19,25), the EDC (1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide) reagent was recently established for probing of RNA molecules in intact cells (26,27). A recent report describes the use of N3-kethoxal for labeling of single-stranded guanine bases in live cells (28). SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension) reagents are less base-specific (29). They acylate the 2′-hydroxyl group of ribose in all four nt, if they are single-stranded, and are applicable under cellular as well as cell-free conditions (30–33).
Divalent metal cations, such as lead(II), are another option for RNA structure probing (34). Lead cleavage of single-stranded nt has been used to determine RNA structures in vitro (35,36) and in vivo (37–39). Recently, lead probing was applied to resolve temperature-sensitive regions in an RNAT that mediates translational regulation of a cyanobacterial heat shock gene (6). Lead induces RNA cleavage by two distinct mechanisms. The first mechanism is rather strong and highly specific. It occurs near or at metal ion binding sites but is rarely observed (40,41). The second, weaker cleavage activity is more frequently detected and rather unspecific. Here, lead ions preferentially attack bulges, loops, and other single-stranded RNA regions, but do not cleave stacked nt or those involved in higher-order interactions (42–44). Both cleavage events are based on the same mode of action. Lead ions interact with the 2′-OH group of the ribose moiety to induce a nucleophilic attack of the 2′-oxygen on the vicinal phosphodiester, which results in 2′,3′-cyclic phosphate and 5′-hydroxyl groups as cleavage products (45,46).
To interrogate the dynamic RNA structurome of the extraordinarily temperature-responsive human pathogen Yersinia pseudotuberculosis, we recently applied the parallel analysis of RNA structure with temperature elevation (PARTE) approach (47). Nuclease-based probing at 25°C, 37°C and 42°C mimicked environmental, host body and heat shock temperatures, respectively (48). Since this procedure relies on the structure-specific nucleases S1 and V1 (cleaving single- and double-stranded nt, respectively), it is limited to in vitro applications. Here, we describe a new method named Lead-seq that is able to chart bacterial RNA structuromes in vivo. Lead(II) acetate-induced RNA cleavage was followed by cDNA synthesis and next-generation sequencing, allowing a transcriptome-wide view on the structural RNA-landscape in Y. pseudotuberculosis.
MATERIALS AND METHODS
Strains and plasmids
All utilized bacterial strains, oligonucleotides and plasmids are summarized in Supplementary Tables S1–3, respectively. Correctness of nucleotide sequences was confirmed by automated sequencing (Eurofins Genomics, Martinsried, Germany). Enzymes for cloning were obtained from Thermo Scientific (St. Leon-Rot, Germany). Polymerase chain reaction (PCR), DNA manipulations, DNA restrictions and transformations were performed as described before (49).
Plasmid construction
Plasmids pBO7300 and pBO7301 were constructed as follows: the 5′-UTR of groES (YPK_3349) or clpB (YPK_3823) including 30 nt of the respective coding region (CDR) were PCR-amplified using primer pairs YPK_3349_fw/YPK_3349_rv or YPK_3823_fw/_YPK_3823_rv, respectively. Afterward, the resulting fragments were ligated into pBAD2-bgaB-His via NheI and EcoRI restriction sites, resulting in reporter plasmids pBO7300 or pBO7301. Repressing and derepressing point mutations were introduced by site-directed mutagenesis.
Determination of ideal lead acetate concentration
To achieve single-hit kinetics for in vivo RNA probing, we treated Escherichia coli cells (Supplementary Figure S1) with different concentrations of lead(II) acetate at 25°C and 37°C. All lead(II) acetate solutions were prepared freshly. A 1 M lead(II) acetate solution was mixed with 2 ml prewarmed (to 25°C/37°C) 4× LB medium (4 % (w/v) NaCl, 4 % (w/v) tryptone, 2 % (w/v) yeast extract) and distilled water to a final volume of 8 ml. Lead solutions of different concentrations (such as 175, 350 or 525 mM), corresponding to a final concentration of 50, 100 or 150 mM, respectively, were prepared. To determine appropriate lead(II) acetate concentrations for lead-mediated in vivo probing, cultures of E. coli DH5α (20 ml) were grown at 25°C and 37°C in LB medium (1 % (w/v) NaCl, 1 % (w/v) tryptone, 0.5 % (w/v) yeast extract) to an OD600 = 0.5. The lead solutions were added to the culture and cells were incubated with shaking for 7 min. Lead-induced RNA cleavage was stopped by addition of 10 ml cold 500 mM ethylenediaminetetraacetic acid (EDTA) and cultures were placed on ice. Cells were pelleted, resuspended in 250 μl cold resuspension buffer (10 mM Tris–HCl, 100 mM NaCl, 1 mM EDTA, pH = 8.0) and 250 μl lysis buffer (50 mM Tris–HCl, 8 % (w/v) sucrose, 0.5 % (w/v) Triton X-100, 10 mM EDTA, 4 mg/ml lysozyme, pH = 8.0) was added. After incubation at 65°C for 90 s, RNA was obtained from samples by four extractions with acid phenol followed by three extractions with chloroform/isoamylalcohol (24:1). After ethanol precipitation, RNA pellets were resuspended in 50 μl ddH2O. RNA concentration was measured with a NanoDrop spectrophotometer ND-1000 (peqlab, Erlangen, Germany). First, total RNA was analyzed by agarose gel electrophoresis (Supplementary Figure S1) and degradation of 23S as well as 16S rRNA was determined by densitometry (AlphaEaseFC program).
Second, to determine the extent of lead cleavage-induced degradation of the groES 5′-UTR:gfp mRNA, we performed primer extension (primer gfp_rev_seq; Supplementary Table S2). Cultures of E. coli DH5α (20 ml), harboring plasmid pBO2267 (Salmonella enterica serovar Typhimurium M556 translational 5′-UTR groES:gfp) were grown as described above. For primer extension, 20 μg RNA from each sample were mixed with 2.5 μl dNTPs (4 mM), 2 μl 5′-labeled (32P) primer (gfp_rev_seq) as well as RNase inhibitor (RiboLock RI; Thermo Scientific) and heated to 80°C for 2 min. Subsequently, the sample was slowly cooled to 50°C followed by addition of 4 μl 5× first strand buffer, 1 μl DTT (100 μM), 1 μl RNase inhibitor (RiboLock RI; Thermo Fisher Scientific) and 1 μl SuperScript III reverse transcriptase (Invitrogen). After incubation at 52°C for 60 min, the reaction was stopped by addition of 2.7 μl NaOH (1 M) and heating to 70°C for 15 min. The samples were ethanol precipitated and reaction products were analyzed on an 8 % polyacrylamide gel. The DNA sequencing reaction was performed with the same primer using the Thermo Sequenase cycle sequencing kit (USB) and plasmid pBO2267 (Supplementary Table S3) as template. Degradation of the full-length groES 5′-UTR:gfp mRNA was determined by densitometry (AlphaEaseFC program). Together, the two experiments allowed us to determine the lead(II) acetate concentrations that equalize the extent of partial RNA digestion at 25 and 37°C.
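The relation between the concentration of the prepared lead solution and the final concentration in the culture can be checked with a short dilution calculation. The sketch below assumes that the complete 8 ml lead solution was added to the 20 ml culture, which is how we read the volumes given above; the function name and default arguments are illustrative only. The 245 mM solution used for Y. pseudotuberculosis at 37°C is included under the same volume assumption.

```python
def final_concentration_mM(solution_mM, solution_volume_ml=8.0, culture_volume_ml=20.0):
    """Final lead(II) acetate concentration after mixing (C1 * V1 = C2 * V2).

    Assumption: the whole 8 ml lead solution is added to a 20 ml culture.
    """
    total_volume_ml = solution_volume_ml + culture_volume_ml
    return solution_mM * solution_volume_ml / total_volume_ml


# Lead solutions named in the text (mM)
for solution in (175, 350, 525, 245):
    final = final_concentration_mM(solution)
    print(f"{solution} mM lead solution -> {final:.0f} mM final concentration")
```

Under these assumptions, the calculation reproduces the stated final concentrations of 50, 100, 150 and 70 mM.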
In vivo probing of the Y. pseudotuberculosis transcriptome
Cultures of Y. pseudotuberculosis YPIII (Supplementary Table S1) were grown at 25°C and 37°C in LB medium to an OD600 = 0.5. Immediately, a 525 mM (final concentration 150 mM) and a 245 mM (final concentration 70 mM) lead acetate solution (2 ml 4× LB + 6 ml lead(II) acetate in distilled water) were added to the cultures grown at 25 and 37°C, respectively. The cells were incubated under continuous shaking for 7 min, the reaction was stopped by addition of 10 ml cold 500 mM EDTA and cultures were placed on ice. Samples treated the same way but omitting lead(II) acetate treatment (2 ml 4× LB + 6 ml distilled water) served as internal control. RNA isolation was carried out by applying a hot phenol extraction protocol (49). Briefly, harvested cells were resuspended in 250 μl resuspension buffer and 250 μl lysis buffer prior to incubation at 65°C for 90 s. Afterward, 500 μl prewarmed (65°C), water-saturated phenol was added, samples were vortexed intensively and incubated at 65°C for 3 min. Tubes were frozen in liquid nitrogen for at least 30 s and subjected to centrifugation (13,000 rpm, 10 min, room temperature). The withdrawn aqueous phase was again extracted four times with phenol, followed by addition of 300 μl chloroform:isoamylalcohol (24:1) and vigorous vortexing for 30 s. After a centrifugation step (15,682 × g, 10 min, room temperature), the aqueous phase was transferred into a new tube and the chloroform extraction was repeated. Finally, the RNA was precipitated by addition of 1/10 volume sodium acetate (pH 4.5) and 2.5 volumes ice-cold 95 % ethanol and incubation at −20°C for 1 h. DNA was digested using TURBO DNase (Ambion) according to the manufacturer's instructions, the RNA was purified with phenol:chloroform:isoamyl alcohol, and RNA concentrations were measured using a NanoDrop spectrophotometer ND-1000 (peqlab). The RNA quality was verified using the Agilent RNA 6000 Nano Kit on the Agilent 2100 Bioanalyzer (Agilent Technologies).
Library preparation
First, rRNA was depleted from 8 μg of total RNA using the Ribo-Zero rRNA removal Kit (Gram-Negative Bacteria; Illumina). Subsequently, the RNA (1.5 μg) was fragmented by sonication, using a Covaris Adaptive Focused Acoustics device (Covaris), to a median fragment size of 200 nt. Resulting fragments were phosphorylated at the 5′-end via T4 polynucleotide kinase (Thermo Fisher Scientific). Afterward, fragments were ligated to sample-specific 5′- and 3′-adapter oligonucleotides (Supplementary Table S4) using T4 RNA ligase (Thermo Fisher Scientific). Next, RNA samples were subjected to reverse transcription (RT) using SuperScript III reverse transcriptase (Invitrogen) as well as a primer complementary to the 3′-adapter. To enrich for correctly ligated and reverse transcribed fragments, cDNA was amplified by 15 cycles of polymerase chain reaction using PCR primers identical to corresponding Illumina adapters (Supplementary Table S4). The PCR products were separated on a 2 % agarose gel, fragments (150–500 bp) were cut out and purified using the QIAquick gel extraction kit (QIAGEN). The Illumina cluster station was used to perform cluster generation. Single-end sequencing on the HiSeq2500 and Genome Analyzer IIx followed a standard protocol. The fluorescent images were processed to sequences and transformed to FastQ format using the Genome Analyzer Pipeline Analysis software 1.8.2 (Illumina).
Sequencing data analysis and lead score calculation
To evaluate the quality of the sequencing output, we used the program FastQC (Babraham Bioinformatics). 5′- and 3′-adapter sequences were removed with cutadapt (50). All sequenced libraries were mapped to the Y. pseudotuberculosis transcriptome with bowtie (version 0.12.8) (51). The transcriptome was prepared according to the genome (NC_010465.1) and a transcriptome annotation published earlier (48).
Visualization of lead scores
To make the visualized lead scores easier to read, scores < 0.5 were set to zero when plotting explicit scores. Plots were generated with the Python package matplotlib v2.2.2. VARNA (v3-93) (55) was used to visualize lead scores mapped to secondary structures.
Testing of reliability of lead scores
Location and secondary structure of tRNAs were identified with tRNAScan-SE (56). Identical sequences as well as transcripts without lead scores were filtered out, resulting in a test dataset of 32 tRNAs. Since lead scores result from counting 5′-ends of mapped reads, and we restricted the read length to a minimum size of 20 nt during read mapping, lead scores are missing for the last 20 nt of each mapped transcript. Consequently, those nt were excluded from the test dataset, and the final test dataset consisted of 1429 nt (sum of ∼45 nt from each of the 32 tRNAs), of which 764 are in a paired and 665 are in an unpaired state.
To test the reliability of lead scores, we calculated the Positive Predictive Value (PPV) as well as the sensitivity for each tRNA, considering only scores above different thresholds. The PPV is defined as the proportion of unpaired nt with a lead score above the threshold (true positives) among all nt with a lead score above the threshold. It ranges between 0 and 1, with 1 meaning that all nt above the threshold are unpaired in the reference structure. The sensitivity can assume values between 0 and 1 as well. It is defined as the proportion of true positives among all nt in an unpaired state in the reference structure, and reflects how many of the actual unpaired nt can be found via the lead score (57).
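The two measures can be illustrated with a minimal sketch. It assumes that a nt counts as "called" when its lead score lies strictly above the threshold, and that the reference structure is supplied as one boolean per nt (unpaired or not); the function name and the toy hairpin are hypothetical and not part of the original analysis.

```python
def ppv_and_sensitivity(scores, unpaired, threshold):
    """Return (PPV, sensitivity) for lead scores above a threshold.

    scores   : lead score per nt, or None where no score is available
    unpaired : True where the nt is unpaired in the reference structure
    """
    called = [s is not None and s > threshold for s in scores]
    true_positives = sum(c and u for c, u in zip(called, unpaired))
    n_called = sum(called)
    n_unpaired = sum(unpaired)
    ppv = true_positives / n_called if n_called else float("nan")
    sensitivity = true_positives / n_unpaired if n_unpaired else float("nan")
    return ppv, sensitivity


# Toy hairpin in dot-bracket notation; "." marks unpaired nt
structure = "(((....)))"
unpaired = [ch == "." for ch in structure]
scores = [0.1, 0.0, 0.9, 1.2, 2.0, 0.4, 1.5, 0.2, 0.0, 0.7]

ppv, sens = ppv_and_sensitivity(scores, unpaired, threshold=0.5)
print(f"PPV = {ppv:.2f}, sensitivity = {sens:.2f}")
```

In this toy case five nt exceed the threshold, three of which are unpaired, so raising the threshold trades sensitivity for PPV in the way described above.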
Secondary structure prediction with lead scores
For the data-guided structure prediction, we used the soft-constraint folding-based approach by Washietl et al. (58), implemented in the programs RNAfold and RNApvmin of the ViennaRNA package v2.3.5 (59). To use the lead scores for structure prediction, they were first rescaled such that the 96th percentile corresponded to a lead score of 2. Scores above this percentile (the top 4%) were then capped at the rescaled value of 2. All rescaled scores < 1 were excluded from the structure prediction. The resulting values were passed to RNApvmin, with the ratio of the weighting factors tau and sigma set to 0.6. The obtained perturbation energies were then further processed by RNAfold, which predicted the lead score-driven structure. The implementation of lead scores into RNA structure prediction was evaluated using the same test dataset of 32 tRNAs as in the testing of the reliability of scores. Comparisons between secondary structures were based on the coarse-grained structure and calculated with RNAdistance.
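The rescaling step can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the percentile is computed with the nearest-rank method (the method actually used is not stated), the rescaling is assumed to be linear, and the helper names are hypothetical. How the resulting values are handed to RNApvmin is not covered here.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile, q in 0..100 (assumed method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def rescale_lead_scores(scores, q=96.0, cap=2.0, min_keep=1.0):
    """Linearly rescale so that the q-th percentile maps to `cap`.

    Values above the percentile are capped at `cap`; rescaled values
    below `min_keep` are excluded (returned as None).
    """
    observed = [s for s in scores if s is not None]
    reference = percentile(observed, q)
    if reference <= 0:
        raise ValueError("percentile must be positive for rescaling")
    factor = cap / reference
    rescaled = []
    for s in scores:
        if s is None:
            rescaled.append(None)
            continue
        value = min(s * factor, cap)
        rescaled.append(value if value >= min_keep else None)
    return rescaled


# Toy scores 0.1, 0.2, ..., 10.0
toy = [i / 10 for i in range(1, 101)]
result = rescale_lead_scores(toy)
print(sum(v is not None for v in result), "of", len(toy), "scores retained")
```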
Identification of RNA thermometer candidates
Following our previous work on the Y. pseudotuberculosis in vitro structurome (48), we applied the Lead-seq method at different temperatures (25 and 37°C) to identify new RNAT candidates. For this purpose, a Δscore was calculated for each nt by subtracting the lead score at 37°C from the lead score at 25°C. For each transcript, the top 70% of all positive Δscores as well as the bottom 70% of all negative Δscores were used to form the set Δsadj. Each value of Δsadj was mapped back to its position in the transcript. A sliding window of 31 nt length was moved across the entire transcript, forming the set Δsadj(local). After each movement of the window, the Mann–Whitney U test was performed for the two groups Δsadj(local) inside the window and Δsadj of the entire transcript, using scipy.stats.mannwhitneyu. Afterward, the corresponding P-value was log10-converted. If the average Δscore across the window was smaller than the average Δscore across the entire transcript, the log10(P-value) was additionally multiplied by −1. If <21 nt were included in Δsadj(local), the output of the test was set to 0.
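The windowing and sign logic described above can be sketched as follows. To keep the example self-contained, the statistical test is passed in as a function `pvalue_fn`, which is an illustrative design choice and not part of the original pipeline. The selection of Δsadj is assumed to have been done beforehand, with positions outside Δsadj represented as None. Function names are hypothetical.

```python
import math


def delta_scores(scores_25, scores_37):
    """Δscore per nt: lead score at 25°C minus lead score at 37°C."""
    return [
        None if a is None or b is None else a - b
        for a, b in zip(scores_25, scores_37)
    ]


def signed_window_values(dsadj, pvalue_fn, window=31, min_n=21):
    """Signed log10(P) for each window position along one transcript.

    dsadj     : Δsadj mapped back to positions (None = not in Δsadj)
    pvalue_fn : callable(local_values, transcript_values) -> P-value
    """
    whole = [d for d in dsadj if d is not None]
    n_windows = max(0, len(dsadj) - window + 1)
    if not whole:
        return [0.0] * n_windows
    whole_mean = sum(whole) / len(whole)
    out = []
    for start in range(n_windows):
        local = [d for d in dsadj[start:start + window] if d is not None]
        if len(local) < min_n:
            out.append(0.0)  # too few values inside the window
            continue
        p = max(pvalue_fn(local, whole), 1e-300)  # guard against log10(0)
        value = math.log10(p)
        if sum(local) / len(local) < whole_mean:
            value *= -1  # window mean below transcript mean
        out.append(value)
    return out


# Toy transcript with a constant stand-in P-value of 0.01
toy = [1.0] * 31 + [-1.0] * 31
values = signed_window_values(toy, lambda local, whole: 0.01)
print(values[0], values[-1])
```

With SciPy available, the stand-in could be replaced by something like `lambda a, b: scipy.stats.mannwhitneyu(a, b).pvalue`; which options of the test were used in the original analysis is not stated in the text.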
β-galactosidase activity assay
For the β-galactosidase activity assays, E. coli DH5α cells carrying the 5′-UTR:bgaB fusion plasmids were grown overnight in LB with ampicillin at 25°C. Before being inoculated with an overnight culture (OD600 = 0.2), LB medium supplemented with ampicillin was pre-warmed to 25°C. After growth to an OD600 = 0.5, transcription was induced with 0.01% (w/v) L-arabinose. The culture was split up and shifted to pre-warmed 100 ml flasks (temperatures indicated in the respective figure). Subsequently, the cultures were incubated for 30 min and 200 μl samples were taken for the β-galactosidase assay. The β-galactosidase assay was carried out as described previously (48). Means and standard deviations were calculated from at least three biological replicates with three technical replicates each.
Alignment-based structure comparison of RNAT sequences and secondary structure visualization
Enterobacterial genome sequences were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/genome/microbes/). The local sequence and structure alignment was carried out using the LocARNA program (60,61) with default settings. The consensus structure was computed using RNAalifold (59) with default settings and the respective LocARNA-derived alignment file.
RESULTS
Lead(II) acetate maps the in vivo RNA structurome of a bacterium
The goal of this study was to establish the use of lead(II) ions as a structure-specific probe in a transcriptome-wide structure probing approach. It was previously demonstrated that lead(II) acetate is suitable for the identification of single-stranded RNA regions in vitro and in vivo (37). To assess the influence of temperature on lead(II) acetate-mediated RNA cleavage, we first adjusted the lead(II) acetate concentrations, such that a similar amount of RNA is cleaved at 25 and at 37°C. It has been reported that single-hit kinetics are achieved when around 10–20% of the RNA is cleaved (52). In pilot experiments with E. coli, we grew cells to mid-logarithmic growth phase at 25 or 37°C and added lead(II) acetate to obtain different final concentrations of lead(II) ions. Isolated total RNA from these lead-treated cultures was run on agarose gels and inspected for intact 16S and 23S rRNAs by densitometry (Supplementary Figure S1A). Additionally, we analyzed cleavage of a single plasmid-expressed mRNA (groES 5′-UTR:gfp mRNA) by primer extension and densitometry (Supplementary Figure S1B). At 25°C, treatment of cells with 150 mM lead(II) acetate led to degradation of 17% of the rRNA and 22% of the groES mRNA. At 37°C, about 22% of the rRNA and 27% of the mRNA were degraded when cells were treated with 70 mM lead(II) acetate. Based on these results, 150 mM and 70 mM lead(II) acetate were deemed suitable to approach single-hit kinetics and equal lead(II) reactivity at 25 and 37°C, respectively.
To map the entire RNA structurome of living Y. pseudotuberculosis cells, we coupled lead(II) ion-based structure probing (37,45) with next-generation sequencing (Lead-seq, Figure 1A). RNA was isolated from lead(II) acetate-treated Y. pseudotuberculosis cells that were grown at 25 and 37°C. After rRNA depletion, the remaining RNA was subjected to random fragmentation. 5′-adapters were ligated to the 5′-phosphate of RNA fragments, which were generated by T4 polynucleotide kinase treatment (T4 PNK). The 5′-adapter comprised a 6-nt barcode at the 3′-end of the adapter sequence (Supplementary Table S4). An adapter ligated to the 3′-end of the RNA fragments served as primer binding site for RT. Subsequently, cDNA products were amplified by PCR using primers complementary to the 5′- and 3′-adapter sequences. The cDNA libraries were subjected to high-throughput sequencing, generating 8.8–10.5 million cDNA reads for each library (Supplementary Table S5). The data analysis pipeline comprised trimming of the adapter sequences and mapping of the reads to the Y. pseudotuberculosis YPIII transcriptome. To ensure unambiguous mapping of the sequenced reads, we only considered reads with a size >20 nt. As only the 5′-ends of mapped reads carry structural information (i.e. correspond to unpaired or lead-cleaved nt), the size-filtering leads to a ‘blind 3′-end’ corresponding to the last 20 nt of each transcript. RT run-off events were counted in the libraries generated with RNA isolated from cells treated with or without (control) lead(II) acetate. Finally, we computed the lead score for each nt of the transcriptome. The lead score is the logarithmic ratio of RT run-off counts in the lead(II) acetate-treated library over RT run-off counts in the control library. To examine whether Lead-seq possesses a cleavage bias toward certain nt, we counted the number of adenines, cytosines, guanines or uracils that were cleaved.
We found almost equal cleavage of all four nt at 25 and 37°C (Figure 1B). To explore the transcriptome's overall folding behavior, we inspected the distribution of the lead scores of all 3 million nt of the transcriptome at both tested temperature conditions (Figure 1C). We found a slight but significant increase of lead scores at the elevated temperature indicating melting of the transcriptome from 25 to 37°C (mean 25°C = 0.13; mean 37°C = 0.23; P = 1e-13).
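The idea of the lead score as a log ratio of RT run-off counts, together with the ‘blind 3′-end’, can be illustrated with a minimal sketch. The base of the logarithm, the pseudocount that avoids division by zero, and the absence of any library-size normalization are assumptions made for this illustration; they are not taken from the text, and the function name is hypothetical.

```python
import math


def lead_scores(treated_counts, control_counts, pseudocount=1.0, blind_3prime=20):
    """Per-nt log ratio of run-off counts (treated over control).

    Assumptions: log base 2 and a pseudocount of 1; the last
    `blind_3prime` nt of the transcript receive no score (None).
    """
    n = len(treated_counts)
    scores = []
    for i, (treated, control) in enumerate(zip(treated_counts, control_counts)):
        if i >= n - blind_3prime:
            scores.append(None)  # blind 3'-end: no read 5'-ends map here
            continue
        ratio = (treated + pseudocount) / (control + pseudocount)
        scores.append(math.log2(ratio))
    return scores


# Toy transcript of 23 nt: only the first three positions can carry a score
treated = [7, 1, 0] + [0] * 20
control = [1, 1, 3] + [0] * 20
print(lead_scores(treated, control)[:3])
```

A positive value indicates more run-off events in the lead-treated library than in the control at that position, a negative value the opposite.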

Figure 1. Mapping lead cleavage sites by high-throughput sequencing. (A) RNA was isolated from lead(II) acetate-treated Yersinia pseudotuberculosis cells grown at 25 or 37°C. Lead(II) ion-mediated attack of single-stranded nucleotides generates 2′,3′-cyclic phosphate (cP) and 5′-hydroxyl groups as cleavage products (45). After RNA isolation and rRNA depletion, total RNA was randomly fragmented and treated with T4 PNK. The RNA fragments were ligated to 5′- (yellow) and 3′- (green) adapters and subjected to RT. The cDNA products were PCR-amplified prior to Illumina high-throughput sequencing. Cleavage events per nt were counted in the lead-treated and control library. The lead score was calculated as the log ratio between lead-induced and spontaneous cleavage events. (B) Pie charts displaying the ratio of lead-cleavage preference toward adenines (As), cytosines (Cs), guanines (Gs) and uracils (Us) at 25 and 37°C. Moreover, pie charts representing these ratios corrected for the genomic abundance of each nt are shown. (C) Distribution of lead scores at 25°C and 37°C (n = 3009815). Over all sampled nt, the mean lead score is significantly higher at 37°C than at 25°C (mean score at 25°C = 0.13, mean score at 37°C = 0.23, P = 1.0e-13, D = 0.71).
Lead-seq is capable of mapping well-characterized RNA structures
To assess the accuracy of the Lead-seq approach, we examined the lead score profiles of well-studied and stable RNA molecules with conserved secondary structures. The profiles of tRNAAsp measured at 25 and 37°C largely reflected the expected secondary structure (Figure 2). Most of the enhanced lead reactivity corresponded to unpaired nt in the D and anticodon loops or in single-stranded nucleotides between two stems (8UA9), not only in tRNAAsp but also in other tRNAs (Supplementary Figure S2). As previously mentioned, the last 20 nt of each transcript are excluded from the analysis due to technical reasons, resulting in a lack of lead scores in the 3′-CCA region and the TψC loop (Figure 2A and B; Supplementary Figure S2). Several paired nt featured lead scores as well, e.g. 6GG7 in tRNAAsp at 25°C (Figure 2B). Enhanced accessibility of nt in the first position of a stem has previously been reported for other probing agents like SHAPE (62). Moreover, the cleavage specificity and efficiency of lead(II) ions has been reported to depend on the structural context of the flanking base-pairs (63). Cleavage of paired nt might suggest a dynamic tRNA structure ensemble comprising mature, premature, or misfolded tRNAs. Enhanced cleavage by lead(II) ions seems to correspond to physical exposure on the outside of the 3D structure of tRNAAsp (Figure 2C, (64)). The lead scores of the D stems of tRNAAsp as well as tRNAHis indicated an unfolding at higher temperature (Figure 2A and B; Supplementary Figure S2), which is in line with the findings from temperature-dependent SHAPE probing of tRNAAsp from E. coli (65).

Figure 2. Lead score profile of tRNAAsp matches the expected structure. (A) Lead score profiles of tRNAAsp (YPK_R0043) obtained at 25 and 37°C. A positive lead score indicates that the nt is in an unstructured conformation. (B) The expected secondary structure of tRNAAsp (taken from tRNAScan-SE (56)) was superimposed with lead scores obtained at 25 and 37°C. Lead score values are marked according to the indicated color code. (C) 3D representation of tRNAAsp with lead-accessible nucleotides colored in blue.
We tested the performance of the Lead-seq approach using a reference dataset including 32 tRNAs present on the Y. pseudotuberculosis YPIII chromosome (for construction of the test dataset: see ‘Materials and Methods’ section). With this test dataset, we investigated whether the calculation of lead scores enhances the predictive power of structure predictions. Higher lead scores should provide more reliable information about the pairing state of a nt. To verify this, we applied different minimum thresholds, and excluded all nt with a lead score below these thresholds from the dataset. We then calculated the PPV and sensitivity for the remaining nt in the test dataset. Likewise, we calculated PPV and sensitivity for the raw counts (RT run-off events) from control- (untreated) and treatment- (lead(II) acetate-treated) experiments. This test was performed for the raw counts and lead scores at 25°C (Figure 3A), as well as 37°C (Figure 3B). At both temperatures, the lead scores improved the detection of single-stranded regions in tRNA structures. In general, we found that unpaired nt exhibited significantly higher lead scores at both temperatures (Figure 3C). Thus, Lead-seq is able to map single-stranded nt.

Figure 3. Lead-seq performance on a tRNA reference structural dataset. The structure of 32 tRNAs was used to assess the reliability of lead scores, including 764 paired and 665 unpaired nt. After excluding scores smaller than a certain threshold from the dataset, PPV and sensitivity were calculated from lead scores at 25°C (A) and 37°C (B). This was repeated using different thresholds. The distribution of lead scores at unpaired and paired sites at 25°C (mean unpaired = 0.84, mean paired = 0.26, P < 2.2e-16, D = 0.22) and 37°C (mean unpaired = 1.41, mean paired = 0.69, P < 2.2e-16, D = 0.22) is illustrated as boxplots (C).
To test whether Lead-seq is capable of mapping structures of longer RNA molecules, we inspected the lead score profile of the RNase P RNA (Figure 4A). The secondary structure of the catalytic RNA of RNase P is known to be conserved and consists of 18 stem-loop structures as well as several tertiary interactions (66,67). The Lead-seq profiles of the RNase P RNA measured at 25°C and 37°C were well correlated (Pearson correlation = 0.7), consistent with the expected thermostability of the molecule (Figure 4A). Overall, the lead scores recapitulated the proposed secondary structure of the Yersinia RNase P RNA, and many high lead scores coincided with cleavage sites from previous lead(II) acetate-based structural investigations (Figure 4B) (36–37,68).

Figure 4. Lead score profiles of RNase P RNA match the validated structure. (A) Lead score profiles of M1 RNA obtained at 25 and 37°C. (B) The lead scores of the M1 RNA (25 and 37°C) were overlaid with its expected secondary structure (105). Nt with positive lead score values, indicating a single-stranded conformation, are indicated in color. Proposed pseudoknots are indicated. Black arrows indicate prominent sites of lead(II) cleavage that were also observed by (36,38,68).
Additionally, we inspected the lead score profiles of the tmRNA at 25 and 37°C (Supplementary Figure S3A). The tmRNA rescues stalled ribosomes on defective mRNAs with the helper protein SmpB in a process called trans-translation (69). Overall, the obtained lead scores fit the expected secondary structure (70) (Supplementary Figure S3B). Nt 16UU17 (as well as nt 333GAC335, which are not visible in our approach) are supposed to be involved in SmpB binding as revealed by standard in vivo lead probing of tmRNA from wild type and ΔsmpB E. coli strains (71). Our finding that 16UU17 are cleaved by lead at both temperatures suggests that the majority of tmRNAs in Y. pseudotuberculosis is not bound by SmpB under the tested conditions. Based on the data evaluation for tRNAs as well as larger RNA molecules, we conclude that Lead-seq is a suitable method to map RNA structures in vivo.
Lead-seq complements presently available RNA structure probing techniques
We inspected the 5′-UTR of rpsT encoding the ribosomal protein S20, which was previously analyzed in E. coli by SHAPE probing and is supposed to harbor a structured RNA motif with unknown function (72). With the exception of only one nt, which is positioned in an internal loop, the sequences of this region are identical in Y. pseudotuberculosis and E. coli (Supplementary Figure S4A). The RNA motif constitutes a single hairpin, in which single-stranded nt are prone to lead cleavage as well as SHAPE modification (Supplementary Figure S4B). The reliability of both methods seems comparable for the small RNA structure, and some overlap of enhanced lead- and SHAPE-reactivity can be found.
To further validate Lead-seq by other probing methods, we extended our analysis to larger RNAs and compared the results of structure probing experiments on the 16S rRNA with DMS (73) and SHAPE (74) with the lead scores derived from our experiments (Figure 5). Consistent with an intricate structure within the 30S ribosome, only a few cuts/modifications were found by all three methods. A 3D representation of the 16S rRNA alone or the 30S ribosome shows that the lead cleavage occurs at surface-exposed nucleotides (Supplementary Movies S1 and 2). As in the 5′-UTR of rpsT, some intersection between lead scores and the reactivities from other methods occurs. Strikingly, several unpaired sites are highly reactive only toward one of the used chemical probes. In the future, the combination of several probes in one study should be suitable to yield more comprehensive structural information.

Lead-seq can compete with other global structure probing approaches. Secondary structure of the 16S rRNA. Nucleotides which are cleaved by lead(II) ions at 25°C, 37°C or at both temperatures are marked in blue, red or orange, respectively. Structural information from transcriptome-wide in vivo studies of 16S RNA using either DMS (106) or SHAPE reagents (74) as probe are indicated by blue, red or orange arrow heads, respectively.
Lead-seq data improve secondary-structure prediction
The structures of all tRNAs were predicted with and without guidance by lead scores. The distance between the resulting MFE structures and the reference structure was determined based on the coarse-grained structures (Figure 6A). The initial prediction without incorporation of lead scores was correct for about half of all tested tRNAs (Figure 6B). For 15 of the 32 tRNAs, the classic approach predicted MFE structures not in accordance with the reference structures. For nine tRNAs (and thus 60% of all initially incorrectly predicted tRNAs), the inclusion of our experimental Lead-seq data improved the prediction. Strikingly, the lead scores had a correcting influence on the MFE structure when the initial prediction was rather inaccurate, while the structure often remained unchanged when the initial structure was already close to the reference structure. In total, the predictions of 28% of the tRNAs (9 of 32) were positively influenced by the lead scores (Figure 6C).

Comparison between RNA structure predictions with and without incorporation of Lead-seq data. (A) Examples of minimum free energy structures (MFE structures). The MFE structure of tRNAAsp (center) is not in accordance with the reference structure (left). This could be improved by using lead scores as soft constraints (right). (B) Distance to the reference structure of MFE structures of 32 tRNAs predicted without (left) and with (right) incorporation of Lead-seq data, based on coarse-grained representations of the structures. (C) Fraction of tRNAs whose predicted MFE structure could be improved by lead scores.
Metagene analysis of the Y. pseudotuberculosis in vivo structurome
Although all mRNA structures are distinct (or in other words, have their own structural ‘footprint’ (72)), metagene analysis can provide interesting insights into overall structural features. We used our Lead-seq dataset to examine the temperature-responsiveness of the coding RNA structurome (mRNAs possessing a 5′-UTR > 14 nt, n = 1553) at 25 and 37°C. On average, mRNAs seem to be less structured at 37°C compared to 25°C (mean 25°C = 0.126, mean 37°C = 0.238, P = 6.3e-11), suggesting that coding RNAs tend to unfold at higher temperatures (Supplementary Figure S5A). To assess the differential melting behavior within mRNAs, we compared average lead scores of 5′-UTRs (Supplementary Figure S5B) as well as CDRs (Supplementary Figure S5C) and found the majority of both to melt from 25°C to 37°C. Transcript-wise examination of the 5′-UTR’s and CDR’s respective Δscore (25–37°C) revealed only a moderate correlation (Supplementary Figure S5D, Spearman's correlation = 0.68). Moreover, the mean Δscore of 5′-UTRs (mean Δscore = −0.133) is more negative than the corresponding value of CDRs (mean Δscore = −0.110), suggesting stronger melting of 5′-UTRs (Supplementary Figure S5E, P = 1.1e-07).
Next, we had a closer look at structural features across coding transcript regions by analyzing an average mRNA lead score profile (Figure 7A and B). All 1016 mRNAs with 5′-UTRs longer than 50 nt were aligned at the translation start codon and the average lead score of the last 48 nt of the 5′-UTRs as well as the first 48 nt (16 codons) of the CDR was calculated. We found 5′-UTRs to be more structured than CDRs at 25°C (mean 5′-UTR = 0.145, mean CDR = 0.163, P = 5.6e-16) (Figure 7A). At 37°C, we found nearly equal average lead scores for 5′-UTRs and CDRs (mean 5′-UTR = 0.265, mean CDR = 0.257, P = 0.003) (Figure 7B). In line with the results described above (Supplementary Figure S5A–E), this finding suggests that 5′-UTRs unfold at 37°C to a larger extent than CDRs. We also observed a local maximum of unstructuredness at the translation start codon at both temperatures (Figure 7A and B), which coincides with findings from the in vivo probing of the E. coli transcriptome (72). In line with the E. coli study but in contrast to the meta-analysis from our in vitro probing approach (48), we did not find a local minimum around the Shine–Dalgarno (SD) sequence (−10 ± 4 nt from the start codon) (Figure 7A and B).

Structural analysis of the average lead profile across and within protein-coding transcripts. Average lead score for each nt across the last 50 nt of the 5′-UTR and the first 50 nt of the CDR at 25°C (A), and at 37°C (B). Average lead score for the first, second, and third base of each codon in the 5′-UTR (last 48 nt) and the CDRs (first 48 nt) at 25°C (C), and 37°C (D). Statistical significance determined by unpaired two-sided t-test (asterisks abbreviation: ns P > 0.05; * P ≤ 0.05; ** P ≤ 0.01; *** P ≤ 0.001; **** P ≤ 0.0001). (A and B) n = 1016, (C and D) n = 16 256.
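The start-codon-aligned average profile can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for Figure 7: each transcript is assumed to be given as a list of per-nt lead scores (None where no score exists) together with the index of the first nt of the start codon, and transcripts that do not cover the full window are skipped. The function name and input layout are hypothetical.

```python
def metagene_profile(transcripts, flank=48):
    """Average lead score per position from -flank to +flank-1 around the start codon.

    transcripts -- list of (scores, start_index) pairs; start_index is the
                   0-based index of the first nt of the start codon
    """
    sums = [0.0] * (2 * flank)
    counts = [0] * (2 * flank)
    for scores, start in transcripts:
        # Skip transcripts whose 5'-UTR or CDR is too short for the window.
        if start < flank or len(scores) < start + flank:
            continue
        window = scores[start - flank:start + flank]
        for i, score in enumerate(window):
            if score is not None:
                sums[i] += score
                counts[i] += 1
    return [sums[i] / counts[i] if counts[i] else None
            for i in range(2 * flank)]


# Hypothetical toy input with a 2-nt flank on each side of the start codon.
toy = [
    ([0.0, 1.0, 2.0, 3.0, 4.0], 2),
    ([2.0, 1.0, 0.0, 1.0], 2),
    ([0.9, 0.9, 0.9], 1),        # 5'-UTR too short, skipped
]
profile = metagene_profile(toy, flank=2)
print(profile)
```

The first `flank` entries of the returned list correspond to the 5′-UTR and the remaining entries to the CDR, so region means such as those quoted above can be taken over the two halves.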
Finally, we analyzed the structuredness of single nt within base triplets. Whether a three-nt periodicity in CDRs exists or not is controversial (72,75). To address this issue in Yersinia, we computed the average lead score of the first, second and third nt within the first 16 base triplets of all CDRs as well as the last 16 base triplets of all 5′-UTRs. At both temperatures, the first nt of the triplet was slightly less structured within CDRs (P 25°C = 0.01, P 37°C = 0.009) (Figure 7C and D), while this differential periodicity was absent in 5′-UTRs.
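The per-codon-position averaging can be sketched in a few lines. The sketch assumes that each CDR is given as a list of lead scores starting at the first nt of the start codon; the function name and the toy values are hypothetical, and the t-test used for the significance values is not reproduced here.

```python
def codon_position_means(cdr_scores, n_codons=16):
    """Mean lead score of the first, second and third nt over the first n_codons triplets."""
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for scores in cdr_scores:
        for i, score in enumerate(scores[:3 * n_codons]):
            if score is not None:
                sums[i % 3] += score      # i % 3 is the position within the triplet
                counts[i % 3] += 1
    return [sums[k] / counts[k] if counts[k] else None for k in range(3)]


# Hypothetical toy CDR with two codons.
means = codon_position_means([[1.0, 0.0, 0.5, 3.0, 0.0, 0.5]], n_codons=2)
print(means)
```

With 1016 transcripts and 16 triplets per region, each codon position contributes 1016 × 16 = 16 256 values, which is consistent with the n given for Figure 7C and D.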
Discovery and validation of new RNA thermometers (RNATs)
Yersinia pseudotuberculosis adapts to virulence conditions by taking advantage of the temperature-controlled thermodynamic characteristics of RNA molecules (48,76). Since our study provides information on RNA structures at two temperatures relevant in this context (25°C as proxy for environmental conditions and 37°C mimicking virulence conditions), one should be able to extract known and potentially new RNATs from our datasets. This was indeed the case. We used a sliding-window approach to identify local structural rearrangements of protein-coding transcripts (Figure 8A). This method identifies transcript regions with significant local melting events compared to the general unfolding of the whole transcript. Following earlier studies from our lab, we additionally factored in the difference between average lead scores at 37 and 25°C (Δscore) by implementing a sign-conversion. RNA regions with a strongly negative plog,sign can be interpreted as regions undergoing temperature-dependent conformational changes. mRNAs exhibiting a plog,sign < −3 around the SD region (−10 ± 4 from the start codon) were considered promising RNAT candidates (Figure 8B). We found that the SD region upstream of ailA, trxA, sodB, pepN and cysK-2 showed significant unfolding from 25 to 37°C (Table 1). The respective 5′-UTRs are known RNATs, which were identified and experimentally validated in our previous PARS-based (parallel analysis of RNA structure) study (48). Moreover, we identified several other potential RNA-based thermosensors that melt at elevated temperature, such as the 5′-UTRs upstream of groES and clpB both coding for heat shock chaperones (Table 1; Figure 9A and B). To corroborate that Lead-seq is capable of detecting novel RNATs, we experimentally measured temperature-dependent translation mediated by the groES or clpB 5′-UTRs. The respective 5′-UTRs were translationally fused to bgaB, a gene coding for a heat-stable β-galactosidase (Figure 8C). 
In this well-established plasmid-based reporter gene system (48), transcription is driven by the arabinose-inducible PBAD promoter. Reporter gene activity in E. coli was several-fold induced after a shift from 25 to 37°C (Figure 8D). This temperature-controlled activity was comparable to the Y. pseudotuberculosis katA 5′-UTR, a known RNA-based thermoregulator (48).

Detection of RNATs via lead scores and their experimental validation. (A) Schematic representation of the intra-transcript comparison method. First, the scores from experiments at 25°C (blue) and 37°C (red) of a transcript (top) form the basis for the calculation of Δscores at the single nt level (middle). From the set of all Δscores of a transcript, the percentiles P30(Δs+) and P70(Δs−) are calculated, indicated by dashed lines. Score Δsadj is based on all Δscores of the transcript being greater than P30(Δs+) or smaller than P70(Δs−) (bottom left). Next, a window is moved over the transcript, and Δsadj of nt inside the window form Δsadj(local) (bottom right). Scores Δsadj and Δsadj(local) are subsequently compared using the Mann–Whitney–U rank test (mwu). (B) Relation between general unfolding (Δscore) and local unfolding (plog,sign) at Shine-Dalgarno regions sampled at 25 and 37°C. Significance levels of plog,sign < −3 or plog,sign > 3 indicate a significantly lower or higher structuredness at 37°C than at 25°C, respectively, and are indicated by dashed lines. n = 902. (C) Schematic representation of reporter gene constructs used to study translational thermoregulation of potential thermosensors. The 5′-UTRs upstream of clpB and groES were translationally fused to bgaB encoding a heat-stable β-galactosidase, while transcription is under control of an arabinose-inducible promoter (PBAD). (D) Escherichia coli DH5α cells harboring the corresponding plasmid (technical triplicates per construct) were grown to an OD600 = 0.5 at 25°C. Afterward, transcription was induced with 0.01% (w/v) L-arabinose and the cultures were split immediately: one half remained at 25°C while the other was transferred into pre-warmed flasks at 37°C. After 30 min incubation, samples were taken for subsequent β-galactosidase assay. The displayed results represent the mean activities from five independent experiments (biological replicates).
Mean standard deviations are indicated as error bars.
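The idea behind the sliding-window comparison can be illustrated with a simplified sketch. It is not the implementation used in this study: the percentile-based adjustment (P30/P70) is omitted, the Mann–Whitney U test uses a normal approximation without tie or continuity correction, and Δscore is assumed to be the score at 25°C minus the score at 37°C, so that melting regions have negative values as in Table 1. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided p-value (normal approximation, no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def signed_log_p(delta, start, width=9):
    """-log10(p) of window vs. whole transcript, signed by the window's mean delta."""
    local = delta[start:start + width]
    p = max(mann_whitney_p(local, delta), 1e-300)   # guard against log10(0)
    sign = math.copysign(1.0, sum(local) / len(local))
    return sign * -math.log10(p)


# Hypothetical transcript of 40 nt in which one 9-nt stretch melts strongly.
delta = [0.1] * 10 + [-1.0] * 9 + [0.1] * 21
melting = signed_log_p(delta, start=10)
flat = signed_log_p(delta, start=0)
print(round(melting, 2), round(flat, 2))
```

In this toy case only the melting window falls below the cutoff of −3 used for RNAT candidates; the 9-nt window width mirrors the SD region (−14 to −6 from the start codon).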
Locus (a) | Gene name (a) | Lead-seq (b): Δp log,sign | Lead-seq (b): Δscore | PARS (c) | Thermal control (d) | Reference |
---|---|---|---|---|---|---|
YPK_1268 | ailA | −7.39 | −1.40 | 0.52 | Y | (48) |
YPK_1429 | cysK-2 | −3.44 | −0.27 | 0.11 | Y | (48) |
YPK_1863 | sodB | −4.78 | −0.45 | −0.22 | Y | (48) |
YPK_2645 | pepN | −6.27 | −0.49 | −0.62 | Y | (48) |
YPK_3349 | clpB | −3.26 | −0.54 | −0.29 | Y | This study |
YPK_3823 | groES | −3.56 | −1.66 | −0.45 | Y | This study |
YPK_4035 | trxA | −3.16 | −0.19 | −0.85 | s:Y l:N | (48) |
(a) Locus ID and gene name of the downstream gene potentially thermoregulated by the putative RNAT.
(b) Average values of the SD region (−14 to −6 from the start codon) calculated from lead scores at 25 and 37°C.
(c) Difference in the PARS scores of the SD region from 37 to 25°C. PARS scores were taken from Righetti et al. (48).
(d) Experimental confirmation of temperature-dependent translational regulation in reporter gene studies from this study or studying the in vitro transcriptome of Y. pseudotuberculosis (48). (Y) indicates thermal control, (N) no thermal control, (s) and (l) short and long variants of the 5′-UTR, respectively.

Lead-seq data indicate melting of the groES and clpB 5′-UTR’s secondary structures. Lead scores obtained at 25 or 37°C are depicted in the secondary consensus structures (see Supplementary Figure S6) of the enterobacterial groES 5′-UTR (A) and clpB 5′-UTR (B). The SD sequence as well as the translation start codon are marked and the lead scores are represented by the indicated color code. Stabilizing (REP) as well as destabilizing mutations (DEREP) are depicted within the structures colored with lead scores obtained at 25 or 37°C, respectively. (C) Point mutations within the 5′-UTRs of groES and clpB impair RNAT functionality. WT and mutated 5′-UTRs (REP and DEREP variants; for details see panels A and B) were translationally fused to the bgaB reporter gene and BgaB activity was measured as described in Figure 8. The displayed results represent the mean activities from three independent experiments (biological replicates). Mean standard deviations are indicated as error bars.
It is interesting to note that the newly identified thermosensors appear to be conserved structural elements. The groES 5′-UTR of enterobacteria is composed of two hairpins containing conserved as well as covariant nt (Supplementary Figure S6A). The consensus structure masks the SD region and AUG codon by base-pairing interactions (Supplementary Figure S6B) and is known to mediate translational control and differential expression of the groESL operon in Salmonella (77). A high sequence conservation is also evident in the clpB 5′-UTR (Supplementary Figure S6C). In the consensus structure, SD region and start codon are involved in a base-pairing interaction (Supplementary Figure S6D). Superimposing the consensus structures of both new RNATs with the respective lead scores at 25 and 37°C revealed higher lead scores for the groES 5′-UTR at 37°C compared to 25°C consistent with the thermolability of both hairpins (Figure 9A). In particular, the SD region is resistant to lead(II) acetate-mediated cleavage at 25°C but highly accessible at 37°C. Temperature-induced melting is also evident in the clpB 5′-UTR (Figure 9B). At 25°C, the 5′-UTR constitutes a stable structure as only few nucleotides are susceptible to lead cleavage. The attacked nucleotides are located within single-strand loop regions within the secondary structure model. At 37°C, the accessibility to lead(II) ions increased. To further explore the functionality of the newly discovered thermosensors, we designed stabilizing (called REP) as well as destabilizing point mutations (called DEREP) that were supposed to either clamp or loosen the structure of the 5′-UTRs (Figure 9A and B). The obtained variants were tested for temperature-dependent translation control using the bgaB reporter gene system (Figure 9C). As expected, the REP mutations led to full repression of reporter gene activity indicating a stable RNA structure that impairs translation initiation both at 25 and 37°C. 
In contrast, the destabilized variants produced two- to three-fold higher BgaB activity relative to the wildtype 5′-UTRs already at 25°C. This result confirms the identity of these RNA elements as labile RNAT structures, whose destabilization by strategic mutations immediately facilitates ribosome access even at the low temperature. Overall, these results show that the Lead-seq approach can uncover dynamic RNA structures when applied at two different temperatures.
DISCUSSION
Transcriptome-wide RNA structure probing has been developed only recently and various methods have been established for probing in vitro and in vivo. Mapping of RNA structures in living cells is limited by the ability of the probing agent to penetrate the cell wall or membrane, which in turn is determined by the probe's physical properties such as size or charge. Furthermore, the electrostatic properties of the RNA environment determine the reactivity of chemical probes (32,78). Like DMS or SHAPE reagents, lead(II) acetate is a suitable reagent for RNA structure determination in living cells. Hitherto, this probe was successfully used to determine the structure of single RNA molecules (37,38) and to identify metal ion binding sites within RNA (79).
In this proof-of-concept study we took advantage of the ability of lead(II) acetate to pass bacterial membranes and to attack ribonucleotides without sequence bias (Figure 1B). The parallel analysis of RNA from untreated cells identifies background signals derived from spontaneous RT stops, e.g. caused by stable RNA secondary structure, covalent modification and RNA degradation or processing (80). Such correction against untreated samples has commonly been used, for example in Mod-seq (81) or Structure-Seq2 (82). We generated lead scores by calculating the logarithmic ratios between RT run-offs in the lead treatment library and the control library. The magnitude of the lead score corresponds to the enhanced accessibility of nt, indicating a higher probability for a nt to be in a single-stranded conformation. Computation of logarithmic ratios as quantitative representation for structuredness has also been used in previous approaches, such as PARS (53,54), FragSeq (83) and CIRS-seq (25).
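The principle of a log-ratio score can be illustrated with a minimal sketch. The exact normalization used for the lead scores is described in the ‘Materials and Methods’ section and is not reproduced here; the pseudocount, the normalization to library totals and the base-2 logarithm below are assumptions made for illustration only, and the function name is hypothetical.

```python
import math


def log_ratio_scores(treated, control, pseudocount=1.0):
    """Per-nt log2 ratio of normalized RT run-off counts (treated vs. control).

    A pseudocount (assumed here) avoids division by zero and log of zero.
    Positive values mean more run-offs after lead treatment, i.e. higher
    accessibility of the nt.
    """
    t_total = sum(treated) + pseudocount * len(treated)
    c_total = sum(control) + pseudocount * len(control)
    return [
        math.log2(((t + pseudocount) / t_total) / ((c + pseudocount) / c_total))
        for t, c in zip(treated, control)
    ]


# Hypothetical run-off counts for four nt.
toy_scores = log_ratio_scores(treated=[7, 1, 3, 1], control=[1, 1, 1, 1])
print(toy_scores)
```

In this toy example only the first nt receives a positive score, because its share of run-offs is enriched after treatment relative to the control library.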
In earlier studies, it was reported that lead(II) acetate is able to map complex RNA structures (44,46). Our data show that Lead-seq has the potential to interrogate the genome-wide RNA structure landscape within living cells. The lead score profiles of tRNAs (Figure 3), RNase P RNA (Figure 4) and tmRNA (Supplementary Figure S3) were consistent with their accepted secondary structures. DMS or SHAPE reagents are alternative probes to investigate higher-order structures of RNA molecules (84). These compounds were used for comparison of in vitro and in vivo structures of single RNAs (85,86) as well as on a genome-wide scale (72,87–89). Although the folding conditions in dilute buffer and in a crowded intracellular environment differ, genome-wide SHAPE approaches barely found noticeable differences in mRNA structures between in vivo and in vitro data from mouse embryonic stem cells (87) and E. coli (72). Future comparative studies are needed to learn whether this is also true for other organisms. Since lead is applicable both to cellular RNA after extraction and refolding and to RNA within living cells, the Lead-seq approach has great potential to address this question in bacteria.
It was previously discussed that combining multiple structure probing techniques will provide the most definitive structure of certain RNA molecules (90). Thus, Lead-seq in combination with SHAPE and/or other methods might eliminate structural information gaps. An example for the additive value of two approaches is the mapping of the rpsT leader region by Lead-seq in Y. pseudotuberculosis (this study) and by SHAPE in E. coli (72) (Supplementary Figure S4). The comparison of Lead-seq, DMS (73) and SHAPE probing (74) on the well-known structure of the 16S rRNA (Figure 5) reveals complementary but also idiosyncratic patterns indicative of the different resolution power of these methods.
The final goal of all RNA structuromics methods is to generate detailed information on the folding state of cellular RNAs. At present, all experimental techniques are limited to the identification of single-stranded residues. These results provide valuable information that can be utilized to improve the quality of predictions by classical RNA secondary structure prediction algorithms (91). We show that this is also possible with Lead-seq data by combining secondary structure prediction with lead scores via the method developed by Washietl et al. (58). The prediction of 28% of the tested tRNAs could be improved (Figure 6) and the benefit was particularly strong for those RNAs that initially showed a higher distance to the reference structure.
In addition to the analysis of selected individual RNAs to validate the approach, we used the Lead-seq method to investigate global structural features of coding transcripts. We found the CDR to be less structured (higher lead scores) than the upstream 5′-UTRs at 25°C (Figure 7A). The same trend was reported for mRNAs from metazoans (92), HIV-1 virus (93), rice (94) and human (54). Consistent with the results from the metagene analysis of the Y. pseudotuberculosis in vitro structurome (48) as well as the in vivo structurome of E. coli (72), 5′-UTRs and CDRs seem to be equally structured at 37°C (Figure 7B). Consequently, 5′-UTRs undergo stronger unfolding from 25 to 37°C than CDRs. Similar observations were made by DMS-based in vivo probing of Oryza sativa L. (rice) seedlings grown at 22 or 42°C (94). We observed a slight but significant structural pattern within codons, such that on average the third position in a codon was less structured (i.e. more prone to lead cleavage) than the preceding ones (Figure 7C and D). A similar structural bias in codons was reported in Arabidopsis thaliana (95), yeast (53), E. coli (75) and mouse (25). It is believed that this structural feature serves as a hidden code in RNA structure, which influences and regulates translation efficiency (25,95). Another explanation might be co-translational mRNA decay following the last translating ribosome, which can lead to a 3-nt periodicity pattern in the CDR (96).
Lead-seq applied to Y. pseudotuberculosis grown at two different temperatures allowed us to scrutinize its temperature-responsive in vivo structurome. We observed a general unfolding of mRNA structures (Supplementary Figure S5), in particular in 5′-UTRs. Thus, we attempted to localize structural melting processes, especially around the SD region (Figure 8). This strategy not only retrieved RNATs known from previous studies, but also uncovered novel thermosensors upstream of groES and clpB (Table 1). Both genes encode chaperones known to rescue proteins during the heat shock response (97–100). Strikingly, the homologous groES 5′-UTR found in Salmonella enterica also harbors a dynamic temperature-sensitive RNA structure. It drives differential translation of the groESL operon, reducing the level of GroES protein over GroEL at low temperatures (77). Interestingly, many base pairs of the groES-RNAT of Salmonella and Yersinia are either covariant or conserved among the species, especially in the hairpin comprising the SD region (Figure 9). The alignment-based structure is further supported by the observed lead reactivity at 25°C, and it suggests a functional role of this structural element. The newly identified clpB RNAT also bears high sequence and structure identity among enterobacteria (Supplementary Figure S6). Structure and sequence conservation was also found in the shuA/chuA thermosensors from Shigella dysenteriae/E. coli regulating genes involved in iron acquisition (101). In contrast, less conserved leader regions with high structural similarity were found upstream of cnfY or cnf1 from Y. pseudotuberculosis or uropathogenic E. coli, respectively (102).
Recent reports suggest a remarkable difference between the outcome of temperature-induced RNA melting in prokaryotes and eukaryotes. While unfolding of UTRs of bacterial mRNA facilitates ribosome loading and translation initiation, such melting rather induces mRNA degradation by the exosome in eukaryotic organisms (47,94). Lead-seq or other global structure probing techniques might be applied to address how temperature or other environmental parameters affect the RNA structurome in a variety of organisms. The potential of such methods is exemplified by a recently discovered thermoswitch controlling expression of a cold shock protein in Listeria monocytogenes (103).
The Lead-seq approach was able to map RNA structures well in line with previous findings. Thus, we believe that global lead probing has the potential to become a valuable method to map the dynamic RNA structuromes in vitro and in vivo. The established protocol is fairly straightforward and direct since it does not rely on modification-induced reverse transcription termination of chemically modified nt. Therefore, it should be adaptable to various applications. For example, one should be able to use it for footprinting of RNA-binding proteins by comparison of the in vivo structuromes from wild type strains and strains lacking a certain RNA-binding protein (81). Another potential of global structure probing is the identification of ligand-binding sites in RNA molecules. Changes in SHAPE reactivity were observed near the ligand-binding site in the thiM TPP riboswitch aptamer domain in the presence and absence of TPP in vitro (85). In combination with ribosome profiling, Lead-seq should be able to correlate the effect of RNA structures and translation efficiency, as previously accomplished by other structure probing approaches (72,104).
DATA AVAILABILITY
Data is available at the Gene Expression Omnibus (GEO) database under accession number GSE140649 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140649).
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
German Research Foundation [DFG NA 240/10-2]. Funding for open access charge: DFG [NA 240/10-2].
Conflict of interest statement. None declared.
REFERENCES
Author notes
The authors wish it to be known that, in their opinion, the first three authors should be regarded as Joint First Authors.