Distributed Biotin-Streptavidin Transcription Roadblocks for Mapping Cotranscriptional RNA Folding

RNA molecules fold cotranscriptionally as they emerge from RNA polymerase. Cotranscriptional folding is important for proper RNA structure formation because the order of folding can determine an RNA molecule’s structure, and thus its functional properties. Despite its fundamental importance, the experimental study of RNA cotranscriptional folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structures at nucleotide resolution during transcription. We previously developed cotranscriptional selective 2’-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all of the intermediate structures an RNA molecule transitions through during transcription elongation. Here, we broaden the applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent streptavidin roadblocking strategy that simplifies the preparation of roadblocking transcription templates. We determine the fundamental properties of streptavidin roadblocks and show that randomly distributed streptavidin roadblocks can be used in cotranscriptional SHAPE-Seq experiments to measure the Bacillus cereus crcB fluoride riboswitch folding pathway. Comparison of EcoRI E111Q and streptavidin roadblocks in cotranscriptional SHAPE-Seq data shows that both strategies identify the same RNA structural transitions related to the riboswitch decision-making process. Finally, we propose guidelines to leverage the complementary strengths of each transcription roadblock for use in studying cotranscriptional folding.


Introduction
The capacity for RNA to fold into sophisticated structures is integral to its roles in diverse cellular processes including gene expression, macromolecular assembly, and RNA splicing (1,2). Because RNA folding can occur on a shorter timescale than nucleotide addition by RNA polymerase (RNAP) (3-5), a nascent RNA can transition through multiple intermediate structural states as it is synthesized (6). Pioneering experimental studies of RNA cotranscriptional folding used biochemical methods to characterize these RNA structural intermediates (6-8), and more recently, single-molecule force spectroscopy has been used to directly observe RNA folding during transcription by measuring changes in RNA extension in real time (9). However, the lack of a robust method to directly interrogate RNA structure at nucleotide resolution during transcription has so far limited our ability to fully investigate the principles of RNA cotranscriptional folding and its impact on generating the functional RNA structural states that govern fundamental biological processes.
We recently addressed this technological gap by developing cotranscriptional SHAPE-Seq to measure nascent RNA structures at nucleotide resolution (10). SHAPE-Seq combines chemical RNA structure probing with high-throughput sequencing to simultaneously characterize the structure of RNAs in a mixture (11-13). Chemical modification of a target RNA is accomplished using any of the available SHAPE probes, which react with the RNA 2'-hydroxyl at 'flexible' regions of the molecule, such as unpaired nucleotides in single-stranded regions and loops (14). After reverse transcription (RT), modified nucleotides can be detected as truncated RT products using high-throughput sequencing. The resulting sequencing reads are then used to generate a 'reactivity' value for each nucleotide in each RNA (15). SHAPE-Seq reactivities represent the relative flexibility of each nucleotide of an RNA: highly reactive nucleotides tend to be single-stranded, whereas nucleotides with low reactivities tend to be constrained by base-pairing or other intra- or intermolecular interactions (11,16).  (10) which are generated by constructing a DNA template library containing a promoter, a variable length of the target RNA template, and an EcoRI site. Within 30 s of the start of in vitro transcription, transcription elongation complexes (TECs) are blocked by a catalytically dead EcoRI E111Q mutant (Gln111) (17,18) bound to the EcoRI sites (10) and treated with either the fast-acting SHAPE reagent benzoyl cyanide (BzCN, t1/2 of 250 ms) (19) or dimethyl sulfoxide (DMSO) as a control. RNAs are then quickly extracted and processed for paired-end sequencing to identify transcript length and SHAPE modification position as described previously (11).
RNA structural states that persist on the order of seconds are interrogated to provide "snapshots" of kinetically trapped intermediates that reveal key transitions within RNA folding pathways (10).
The cotranscriptional SHAPE-Seq experiment requires that stalled TECs be generated for all intermediate lengths within a target RNA sequence at once. Gln111 was initially selected as a roadblock because its ability to halt Escherichia coli RNAP is both robust and well characterized (18). However, the use of Gln111 comes with a number of drawbacks. First, constructing the Gln111 DNA template library requires a unique primer set that encodes every stop for each RNA target and therefore contributes substantially to experimental costs. This cost is exacerbated during mutational analysis because many additional primers are required to preserve the mutation in the DNA template library. Second, intermediate lengths that are poorly amplified create the potential for gaps in the experimental data, which is particularly problematic for highly repetitive sequences. Last, RNA sequences that contain an internal 'GAATTC' must be mutated to access structural information downstream, or an alternative transcription roadblock (20,21) must be used. Thus, the development of a sequence-independent roadblocking strategy is highly desirable in order to reduce both experimental costs and time, thereby facilitating a broad application of cotranscriptional SHAPE-Seq to the study of how RNA folding directs RNA function.
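The internal 'GAATTC' constraint noted above can be checked before choosing a roadblock. The following is a minimal sketch, not part of the published workflow; the function name and the example sequence are hypothetical. Because GAATTC is palindromic, scanning one strand is sufficient.

```python
from typing import List

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic)


def find_internal_ecori_sites(target: str) -> List[int]:
    """Return 0-based start positions of every GAATTC in the target.

    Accepts DNA or RNA letters; U is treated as T. Overlapping matches
    cannot occur for this motif, but the scan tolerates them anyway.
    """
    seq = target.upper().replace("U", "T")
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(ECORI_SITE, start + 1)
    return positions


# Hypothetical sequence, for illustration only.
sites = find_internal_ecori_sites("AAGAATTCGGGAATTC")
print(sites)
```

A non-empty result indicates that a Gln111 template library would stall RNAP internally at those sites unless the sequence is mutated or a different roadblock is used.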
Here we develop a sequence-independent method for halting TECs at all positions across a DNA template using streptavidin (SAv) as a transcription roadblock and combine this method with SHAPE-Seq to characterize cotranscriptional RNA folding pathways at nucleotide resolution (Fig. 1). We start by characterizing the robustness of SAv transcription roadblocks in the context of in vitro transcription. We then implement SAv roadblocking in the cotranscriptional SHAPE-Seq framework using randomly biotinylated DNA templates to capture TECs across all transcript lengths in a general workflow that can be applied to any RNA sequence. A comparison of the SAv and Gln111 roadblocking strategies using the Bacillus cereus crcB fluoride riboswitch (22) as a model system allowed us to identify technical distinctions between SAv and Gln111 roadblocking, and we propose experimental strategies that leverage the complementary strengths of each approach. The robust and sequence-independent nature of SAv roadblocking is a powerful addition to the cotranscriptional SHAPE-Seq method: it uses only commercially available reagents, reduces experimental costs, and simplifies materials preparation. Together, these improvements increase the accessibility of cotranscriptional SHAPE-Seq to a broader user base to study cotranscriptional RNA folding.
Figure 1. Following open complex formation and incubation with SAv, single-round transcription is initiated to generate stalled transcription elongation complexes. After 30 s, nascent RNAs are treated with either BzCN (+) or DMSO (-), extracted, and processed for paired-end sequencing using the SHAPE-Seq v2.1 protocol (11). Transcript length and SHAPE modification position are identified by paired-end sequencing and used to calculate a SHAPE reactivity profile for each transcript. Individual reactivity profiles are stacked to generate a reactivity matrix that contains nucleotide-resolution structural data about the cotranscriptional folding pathway. The 'SHAPE-Seq 2.1', 'Paired-End Sequencing', 'Calculate Reactivities', and 'Construct Reactivity Matrix' panels are adapted from Watters et al., 2016 (10).
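The stacking of individual reactivity profiles into a reactivity matrix can be illustrated with a short sketch. This is an assumption-laden illustration rather than the published analysis pipeline: the function name, the input layout (a mapping from transcript length to a per-nucleotide reactivity list), and the use of None for positions not yet transcribed are all hypothetical choices.

```python
from typing import Dict, List, Optional


def build_reactivity_matrix(
    profiles: Dict[int, List[float]]
) -> List[List[Optional[float]]]:
    """Stack per-length reactivity profiles into a rectangular matrix.

    Rows are ordered by increasing transcript length; columns are
    nucleotide positions. Positions beyond a transcript's 3' end are
    filled with None (assumed placeholder for "not yet transcribed").
    """
    if not profiles:
        return []
    width = max(profiles)  # longest transcript length sets the column count
    matrix = []
    for length in sorted(profiles):
        profile = profiles[length]
        if len(profile) != length:
            raise ValueError(
                f"profile for length {length} has {len(profile)} values"
            )
        matrix.append(list(profile) + [None] * (width - length))
    return matrix


# Toy values, for illustration only.
toy = {2: [0.1, 0.2], 3: [0.3, 0.0, 0.5]}
for row in build_reactivity_matrix(toy):
    print(row)
```

Reading down a column of such a matrix follows one nucleotide's reactivity as the transcript lengthens, which is how folding transitions are located at specific transcript lengths.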

Results
The efficiency of streptavidin transcription roadblocking is DNA strand dependent
SAv roadblocking (23,24) has previously been shown to prevent E. coli RNAP from "running off" the DNA template during in vitro transcription (9). Typically, SAv roadblocks are introduced into a DNA template to prevent run-off transcription by

Collision of RNAP with a streptavidin roadblock does not dissociate TECs
Because cotranscriptional SHAPE-Seq aims to probe RNAs in the context of a TEC, it is critical that collision of RNAP with a roadblock does not dissociate the TEC and release the RNA. To assess the stability of TECs stalled at a SAv roadblock, we immobilized each biotinylated template using SAv paramagnetic beads and separated RNAs in complex with a stable TEC from free RNAs by magnetic pull-down (Fig. 2B).
We observed that 95-98% of the RNAs in stalled TECs were still attached to beads after the pull-down, indicating that the vast majority of RNAs remain stably associated with RNAP in stalled TECs regardless of the strand to which the SAv roadblock is tethered (Fig. 2B).

Design of randomly biotinylated DNA templates for cotranscriptional SHAPE-Seq
Having established that SAv roadblocking can be used to halt RNAP in stable TECs, we next sought to validate its use in the cotranscriptional SHAPE-Seq experimental framework. Randomly biotinylated DNA templates were prepared by enzymatic incorporation of biotin-11-dNTPs during PCR amplification. Vent Exo- was selected for template amplification as it is particularly tolerant of biotin-11-dNTPs (28).
Figure 3. (C) Distribution of read alignments over transcript lengths from cotranscriptional SHAPE-Seq with SAv roadblocking. Cotranscriptional SHAPE-Seq was performed on the Bacillus cereus crcB fluoride riboswitch with 10 mM or 0 mM fluoride using DNA templates with a targeted biotinylation rate of one, two, or four modifications per template. %Aligned Reads is calculated by dividing the unmodified RNA reads (SHAPE (-) sample) aligned at each transcript length by the total unmodified reads aligned. 'Full length' terminally roadblocked TECs are plotted separately from internal stops to provide a clear view of the accumulation of both classes of stops. Fluoride riboswitch function is evident in the distribution of RNAs transcribed without fluoride, which accumulate as terminated products at positions 80-82, compared to those transcribed with fluoride, which are distributed at positions beyond the terminator or accumulate at the terminal roadblock. (D) Comparison of read alignment distribution for SAv and Gln111 cotranscriptional SHAPE-Seq of the Bacillus cereus crcB fluoride riboswitch. Reads aligning to terminally roadblocked transcripts were not included when calculating the percentage of reads aligned to each transcript length, since Gln111 roadblock run-through transcripts do not align due to the presence of an EcoRI site in the DNA template (10). Gln111 cotranscriptional SHAPE-Seq data were downloaded from the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) (Table S1).
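The %Aligned Reads calculation described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the input layout (a mapping from transcript length to SHAPE (-) read count), and the optional exclusion argument, which mirrors the omission of terminally roadblocked transcripts in panel D, are hypothetical. The read counts are toy numbers, not data from the study.

```python
from typing import Dict, Iterable


def percent_aligned_reads(
    counts: Dict[int, int], exclude_lengths: Iterable[int] = ()
) -> Dict[int, float]:
    """Percentage of unmodified (SHAPE (-)) reads aligned at each length.

    counts maps transcript length -> number of aligned unmodified reads.
    Lengths listed in exclude_lengths (e.g. the terminally roadblocked
    length) are dropped before the total is computed.
    """
    excluded = set(exclude_lengths)
    kept = {length: n for length, n in counts.items() if length not in excluded}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no aligned reads to normalize")
    return {length: 100.0 * n / total for length, n in sorted(kept.items())}


# Toy counts, for illustration only.
toy_counts = {80: 50, 81: 30, 100: 20}
print(percent_aligned_reads(toy_counts))
print(percent_aligned_reads(toy_counts, exclude_lengths=[100]))
```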

Validation: Analysis of streptavidin and Gln111 transcript length alignments
proportionally shifts the distribution toward shorter lengths as it becomes increasingly likely for RNAP to encounter a SAv roadblock (Fig. 3C). As an alternative to increased template biotinylation, enrichment for stalled complexes could also be achieved by omitting the terminal roadblock and using immobilized DNA templates to remove run-off transcripts before the transcription reaction is stopped (Fig. 2B).
In all samples the transcript lengths are distributed unevenly in a distinct pattern of peaks and troughs, representing high and low abundance, respectively. Interestingly, Because the transcript length distribution of cotranscriptional SHAPE-Seq libraries produced using SAv roadblocking approximates that of libraries produced using Gln111, we concluded that SAv roadblocking provides a sufficient distribution of TECs for reliable cotranscriptional SHAPE-Seq measurements.
Consistent with the interpretation that the collision of RNAP with a SAv roadblock produces backtracked complexes, RNA folding transitions associated with aptamer folding and terminator nucleation are displaced downstream by 1-4 transcript lengths and appear to be more gradual when TECs are stalled with SAv (Fig. 5). The first such major structural transition that is observed earlier is the fluoride-independent decrease  Fig. 5A, S1E, S2E). Furthermore, reactivity changes at nucleotides A10 and A22 that were previously shown to be associated with aptamer folding (10)

Discussion
We have developed a sequence-independent method for distributing stalled TECs across a DNA template using SAv roadblocking and characterized the basic properties of SAv as an internal transcription roadblock. We also benchmarked the use of SAv roadblocks in cotranscriptional SHAPE-Seq against data that were previously generated using Gln111. We found that cotranscriptional SHAPE-Seq results were largely independent of the roadblock strategy used and propose that the differences depend on the relative propensity of each transcription roadblock to induce backtracking. Indeed, the overall cotranscriptional SHAPE-Seq reactivity landscape of the fluoride riboswitch folding pathway is largely independent of the stalled TEC distribution strategy (Figs. 4B-E, S1A-B, S2A-B). We therefore suggest that while the use of Gln111 roadblocking for cotranscriptional SHAPE-Seq may provide greater accuracy than SAv roadblocking in mapping folding events to specific transcript lengths, the simplicity and reduced cost of SAv roadblocking make it better suited for generation of full cotranscriptional SHAPE-Seq reactivity profiles.   (10). Analysis of crcB aptamer folding using SAv roadblocking clearly demonstrates that coordinated reactivity changes remain associated even when the transcript length to which they are mapped is shifted (Fig. S3).
Distributing TECs across all positions of a DNA template presents a technical challenge for which numerous solutions with unique advantages and disadvantages exist. Protein-based roadblocking strategies (9,18,20,21)

Amplification of Biotinylated DNA Templates
Randomly biotinylated DNA templates were prepared by PCR amplification and gel extraction as described above except that instead of supplying a dNTP mixture, each dNTP was added individually to a total of 100 nmol combined dNTP and biotin-11-dNTP. Assuming equal probability of incorporating a biotinylated or non-biotinylated dNTP, for 1x biotin incorporation, the nmol quantity of each biotin-11-dNTP included in the reaction was determined using the formula where dNTP_bio is the nmol quantity of biotin-11-dNTP for base N included in the reaction, N_count is the number of occurrences of base N in the template and non-template strands of the DNA sequence that encodes the target RNA (not including the reverse primer sequence), and dNTP_comb is the combined nmol quantity of biotinylated and non-biotinylated dNTP for base N included in the reaction. For higher levels of biotin modification, dNTP_bio was then multiplied by the desired number of biotin modifications per template.
The quantity of each non-biotinylated dNTP included in the reaction was determined by subtracting dNTP_bio from dNTP_comb. Biotin-11-dATP and biotin-11-dGTP were purchased from Perkin Elmer. Biotin-11-dCTP and biotin-11-dUTP were purchased from Biotium. Amplification of randomly biotinylated B. cereus crcB fluoride riboswitch DNA templates (Table S4) was directed by oligonucleotides F and G (Table S3).
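The bookkeeping described above can be sketched in code. The formula itself is not reproduced in this text, so the sketch below rests on an explicit assumption: that each of the four bases is assigned an equal share (one quarter) of the expected modifications per template, giving a biotinylated fraction of 0.25 / N_count for base N at 1x incorporation. That assumption, the function name, and the example numbers are illustrative and should be checked against the original formula. The scaling by the desired number of modifications and the subtraction from dNTP_comb follow the description above.

```python
from typing import Tuple


def biotin_dntp_nmol(
    n_count: int,
    dntp_comb_nmol: float,
    mods_per_template: float = 1.0,
    share_per_base: float = 0.25,  # ASSUMPTION: equal split across four bases
) -> Tuple[float, float]:
    """Return (biotin-11-dNTP nmol, non-biotinylated dNTP nmol) for one base.

    n_count: occurrences of base N in both strands of the target region.
    dntp_comb_nmol: combined biotinylated + non-biotinylated dNTP for base N.
    """
    if n_count <= 0:
        raise ValueError("n_count must be positive")
    fraction = mods_per_template * share_per_base / n_count
    if fraction > 1.0:
        raise ValueError("requested modification level exceeds available dNTP")
    dntp_bio = dntp_comb_nmol * fraction
    # Non-biotinylated amount is the remainder of the combined quantity.
    return dntp_bio, dntp_comb_nmol - dntp_bio


# Hypothetical numbers: 50 occurrences of the base, 25 nmol combined dNTP.
print(biotin_dntp_nmol(50, 25.0))
print(biotin_dntp_nmol(50, 25.0, mods_per_template=4))
```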

In vitro transcription (Radiolabeled)
For each sample, 0.125 pmol of biotinylated DNA template was pre-incubated

Sequencing Library Processing
An RNA linker was adenylated using the 5' DNA adenylation kit (New England Biolabs), purified by TRIzol extraction, and quantified using a Qubit fluorometer as described previously (10). Extracted RNAs were ligated to an RNA linker using T4 RNA Ligase 2 truncated KQ (New England Biolabs) by incubation at room temperature as described previously (10). Reverse transcription of the linker ligation products was performed using SuperScript III Reverse Transcriptase (Life Technologies) as described previously (10). Ligation of an Illumina A_b adapter fragment was performed using CircLigase I ssDNA ligase (Epicentre) as described previously (10). ssDNA libraries were used to generate fluorescently labeled dsDNA libraries for library quality control as described previously (10). The resulting dsDNA libraries were analyzed by capillary electrophoresis using an ABI 3730xl, and the resulting traces were used to evaluate library length distribution and the presence of adapter dimer prior to sequencing.
Sequencing libraries were generated as described previously (11).
These abbreviations were used for compatibility with the Integrated DNA Technologies ordering notation.