Long-read single-molecule RNA structure sequencing using nanopore

Abstract
RNA molecules can form secondary and tertiary structures that can regulate their localization and function. Using enzymatic or chemical probing together with high-throughput sequencing, secondary structure can be mapped across the entire transcriptome. However, a limiting factor is that only population averages can be obtained, since each read is an independent measurement. Although long-read sequencing has recently been used to determine RNA structure, these methods still use aggregate signals across the strands to detect structure. Averaging across the population also means that only limited information about structural heterogeneity across molecules or dependencies within each molecule can be obtained. Here, we present Single-Molecule Structure sequencing (SMS-seq), which combines structural probing with native RNA sequencing and novel analysis methods to provide non-amplified structural profiles of individual molecules. Our new approach, based on mutual information, enables structural interrogation of single molecules. Each RNA is probed at numerous bases, enabling the discovery of dependencies and heterogeneity of structural features. We also show that SMS-seq can capture tertiary interactions, dynamics of riboswitch ligand binding, and mRNA structural features.
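As an illustration of the mutual-information idea mentioned above, the following minimal sketch computes plain mutual information between the modification states of two bases across individual reads. It is not the dAMI metric used by SMS-seq, which is not defined in this section; the function name, the toy read matrix, and the binary encoding (1 = modified, 0 = unmodified) are assumptions made only for this example.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of (state at base i, state at base j)
    px = Counter(x)             # marginal counts for base i
    py = Counter(y)             # marginal counts for base j
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical toy data: each row is one read, each column one base
# (1 = called modified, 0 = called unmodified).
reads = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]


def column(j):
    """Modification states of base j across all reads."""
    return [read[j] for read in reads]


mi_01 = mutual_information(column(0), column(1))  # bases 0 and 1 always agree
mi_02 = mutual_information(column(0), column(2))  # bases 0 and 2 are independent
```

In this toy matrix, bases 0 and 1 are modified on exactly the same reads and therefore share one bit of information, while bases 0 and 2 share none. A population-averaged profile would report a 50% modification rate at all three bases and could not tell these cases apart.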

Testing of different RNA structure probing reagents. E. coli 16S rRNA was treated with DEPC, DMS, or NAI, or left untreated as a control. The IGV read coverage plot over the full-length 16S rRNA shows that NAI- and DMS-treated RNA yielded short, low-quality reads, whereas DEPC treatment yielded full-length reads comparable to the control. The y-axis is the number of reads covering a given position and the x-axis is the base position along the 16S rRNA. Each bar represents a nucleotide. Gray bars represent an exact match; colored bars indicate that at least 10% of the reads have mismatches to the reference sequence.

Raw data for synthetic hairpin RNA. The normalized signal mean and standard deviation carry information that distinguishes modified (red) from unmodified (black) bases and, in turn, identifies which regions are open (loop) and closed (stem). The x-axis displays the normalized mean of the raw nanopore signal, while the y-axis displays the normalized standard deviation of the raw nanopore signal.

Nanopore raw current signal and alignment for the synthetic hairpin and enzymatic assay.
A) Distribution of the resquiggled nanopore current signal from control (black) and DEPC-treated (red) hairpin RNA. Predicted accessible regions are labeled as "loop". B) Modification frequency per base over the synthetic hairpin for three selected RNA structures with different minimum free energy (MFE). C) Enzymatic cleavage of hairpin RNA by RNase T1, which cuts after G, shows the predominant structure (black arrow) and alternative structures with an opening of the stem (white arrow).

Supplementary Figure 6. FMN riboswitch. A) The FMN riboswitch RNA structure. B) Log2 fold change in accessibility between bound and unbound states of FMN. Red bars indicate significant change. Adding the ligand induces conformational changes, revealed by changes in the accessibility of some of the nucleotides at the ligand-binding pocket. C) FMN riboswitch RNA in a stick-and-ribbon representation (PDB ID: 3F2Q (2)), rendered with PyMOL (v 1.5) (3), with tertiary interactions coloured red. L, P, and J stand for loop, stem, and junction, respectively. D) dAMI dependencies between pairs of bases for the bound (upper) and unbound (lower) states of the FMN riboswitch, which reveal inter-domain dependencies and tertiary interactions: the ligand essentially locks up the riboswitch, with more dependencies among different domains (the large gray box at the top). Nucleotide reorientation upon ligand binding induces a change in the dependencies among different domains. Some of the tertiary interactions that have been described using X-ray crystallography (2) are depicted in the dependencies plot by boxes with different line styles, as designated in panel C. E) Log2 fold change in accessibility between bound and unbound states of FMN in 0 mM, 1 mM, and 15 mM MgCl2. Red bars indicate significant change. Different Mg2+ concentrations induce heterogeneous conformations, thereby changing the accessibility of some of the nucleotides (4).
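The log2 fold change in accessibility shown in panels B and E can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the per-base modification rates are invented, the pseudocount is an assumption added to avoid division by zero, and the significance test behind the red bars is not specified in this caption, so it is not reproduced here.

```python
import math


def log2_fold_change(bound_rates, unbound_rates, pseudocount=1e-3):
    """Per-base log2(bound / unbound) of modification rates.

    The pseudocount is an assumption for numerical safety; it is not
    taken from the source.
    """
    if len(bound_rates) != len(unbound_rates):
        raise ValueError("both states must cover the same bases")
    return [
        math.log2((b + pseudocount) / (u + pseudocount))
        for b, u in zip(bound_rates, unbound_rates)
    ]


# Hypothetical per-base modification rates for three bases.
bound = [0.20, 0.05, 0.10]
unbound = [0.05, 0.20, 0.10]

lfc = log2_fold_change(bound, unbound)
lfc_exact = log2_fold_change(bound, unbound, pseudocount=0.0)
```

Positive values mark bases that become more accessible in the bound state, negative values mark bases that become less accessible, and values near zero mark bases whose accessibility does not change.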

Supplementary Figure 7.
Comparison between RING-MaP data and the dAMI metric for TPP. RING-MaP correlation (left side) and SMS-seq dAMI (right side). RNA interaction groups among regions in TPP. J, P, and L refer to junction, stem, and loop regions, respectively. dAMI is sensitive enough to pick up, for example, the known TPP structural interaction between nucleotides in the P3 and L5 regions in the presence of ligand (5), while not highlighting the false-positive RNA interaction groups of L5 with P2 and L5 with P4.

Relation between coverage and correlation for two yeast replicates. Correlations are calculated from the z-score of the modification rate, normalized per gene per replicate. We selected 50 as the minimum coverage for inclusion in further analysis, which results in correlations above 0.6 between replicates for modification rate.
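The replicate comparison described above can be sketched as follows: modification rates are z-scored within one gene and one replicate, and the Pearson correlation between replicates is computed only when the coverage threshold of 50 is met. Treating coverage as a per-gene read count, the use of the population standard deviation, and the choice of Pearson correlation are assumptions for this example; all function names and the data are hypothetical.

```python
import math
from statistics import mean, pstdev

MIN_COVERAGE = 50  # minimum coverage stated in the caption


def zscores(values):
    """Z-score values within one gene and one replicate."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else float("nan")


def replicate_correlation(rates1, rates2, cov1, cov2, min_coverage=MIN_COVERAGE):
    """Correlate z-scored modification rates of one gene between two replicates.

    Returns None when either replicate falls below the coverage threshold
    (coverage is assumed here to be a per-gene read count).
    """
    if cov1 < min_coverage or cov2 < min_coverage:
        return None
    return pearson(zscores(rates1), zscores(rates2))


# Hypothetical per-base modification rates for one gene in two replicates.
rep1 = [0.1, 0.2, 0.3, 0.4]
rep2 = [0.2, 0.4, 0.6, 0.8]

r_kept = replicate_correlation(rep1, rep2, cov1=120, cov2=80)
r_dropped = replicate_correlation(rep1, rep2, cov1=120, cov2=10)
```

Because z-scoring removes per-gene and per-replicate differences in overall modification level, the correlation reflects whether the pattern of low and high modification along the gene is reproduced, not whether the absolute rates match.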

Supplementary Figure 10.
Modification rate along highly expressed mRNAs. The z-score of the modification rate correlates well within individual genes between replicates. The pattern of low and high z-scores is well preserved.

Supplementary Figure 11.
Comparison of SMS-seq to PARS-assisted folding and SMS-seq results for a known mRNA structure. A) The SMS-seq modification frequency for regions that are classified as either open or closed according to PARS-assisted folding in yeast mRNAs (6). B) Modification signal of a known hairpin structure in mRNA. The structure is from RPL33A (YPL143W) mRNA (chromosome XVI:282824-282900) and agrees with previous in vitro DMS structural probing (7). SMS-seq modification signal is color-coded proportional to modification level and plotted onto the RNAfold structure prediction (right). Vertical broken lines define the loop region.

Supplementary Table 1.
RTA adapters, TPP oligos, and RNA sequences ligated at the 5′ end of the hairpin and riboswitches.