Distributed biotin–streptavidin transcription roadblocks for mapping cotranscriptional RNA folding

Abstract RNA folding during transcription directs an order of folding that can determine RNA structure and function. However, the experimental study of cotranscriptional RNA folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structure at nucleotide resolution. To address this, we previously developed cotranscriptional selective 2΄-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all intermediate RNA transcripts during transcription by stalling elongation complexes at catalytically dead EcoRIE111Q roadblocks. While effective, the distribution of elongation complexes using EcoRIE111Q requires laborious PCR using many different oligonucleotides for each sequence analyzed. Here, we improve the broad applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent biotin–streptavidin (SAv) roadblocking strategy that simplifies the preparation of roadblocking DNA templates. We first determine the properties of biotin–SAv roadblocks. We then show that randomly distributed biotin–SAv roadblocks can be used in cotranscriptional SHAPE-Seq experiments to identify the same RNA structural transitions related to a riboswitch decision-making process that we previously identified using EcoRIE111Q. Lastly, we find that EcoRIE111Q maps nascent RNA structure to specific transcript lengths more precisely than biotin–SAv and propose guidelines to leverage the complementary strengths of each transcription roadblock in cotranscriptional SHAPE-Seq.


INTRODUCTION
The capacity for RNA to fold into sophisticated structures is integral to its roles in diverse cellular processes including gene expression, macromolecular assembly, and RNA splicing (1,2). Because RNA folding can occur on a shorter timescale than nucleotide addition by RNA polymerase (RNAP) (3)(4)(5), a nascent RNA can transition through multiple intermediate structural states as it is synthesized (6). Pioneering experimental studies of RNA cotranscriptional folding used biochemical methods to characterize these RNA structural intermediates (6)(7)(8) and more recently, single-molecule force spectroscopy has been used to directly observe RNA folding during transcription by measuring changes in RNA extension in real time (9). However, the lack of a robust method to directly interrogate RNA structure at nucleotide resolution during transcription has so far limited our ability to fully investigate the fundamental principles of RNA cotranscriptional folding and its impact on generating functional RNA structural states that govern biological processes.
We recently addressed this technological gap by developing cotranscriptional SHAPE-Seq to measure nascent RNA structures at nucleotide resolution (10). SHAPE-Seq combines chemical RNA structure probing with high-throughput sequencing to simultaneously characterize the structure of many RNAs in a mixture (11)(12)(13). Chemical modification of a target RNA is accomplished using any of the suite of SHAPE probes available that react with the RNA 2′-hydroxyl at 'flexible' regions of the molecule, such as unpaired nucleotides in single-stranded regions and loops (14). After reverse transcription (RT), modified nucleotides can be detected as truncated RT products using high-throughput sequencing. The resulting sequencing reads are then used to generate a 'reactivity' value for each nucleotide in each RNA (15). SHAPE-Seq reactivity represents the relative flexibility of each nucleotide of an RNA: highly reactive nucleotides tend to be single-stranded, whereas nucleotides with low reactivity tend to be involved in base-pairing or other intra- or intermolecular interactions (11,16).
Cotranscriptional SHAPE-Seq combines the ability of SHAPE-Seq to characterize complex mixtures of RNAs with in vitro transcription in order to simultaneously interrogate the structure of all intermediate lengths of a target RNA. Each intermediate length is probed in the context of stalled transcription elongation complexes (TECs) (10), which are generated by constructing a DNA template library containing a promoter, a variable length of the target RNA template, and an EcoRI recognition site. Within 30 s of the start of in vitro transcription, TECs are blocked by a catalytically dead EcoRI E111Q mutant (Gln111) (17,18) bound to the EcoRI recognition sites (10) and treated with either the fast-acting SHAPE reagent benzoyl cyanide (BzCN, t1/2 of 250 ms) (19) or dimethyl sulfoxide (DMSO) as a control. RNAs are then quickly extracted and processed for paired-end sequencing to identify the transcript length and SHAPE-modification position as described previously (10,11). RNA structural states that persist on the order of seconds are interrogated to provide 'snapshots' of kinetically trapped intermediates that reveal key transitions within RNA folding pathways (10).
The cotranscriptional SHAPE-Seq experiment requires stalled TECs to be generated for all intermediate lengths within a target RNA sequence at once. Gln111 was initially selected as a roadblock because its ability to halt Escherichia coli RNAP is both robust and well characterized (18). However, the use of Gln111 comes with a number of drawbacks. Constructing the Gln111 DNA template library requires a unique primer set that encodes every stop for each RNA target and therefore contributes substantially to experimental costs. This is exacerbated during mutational analysis as many additional primers are required in order to preserve the mutation in the DNA template library. Furthermore, intermediate lengths that are poorly amplified create the potential for gaps in the experimental data, which is particularly problematic for highly repetitive sequences. Lastly, RNA sequences that contain an internal EcoRI recognition site 'GAATTC' must be mutated to access structural information downstream or an alternative transcription roadblock (20,21) must be used. Thus, the development of a sequence-independent roadblocking strategy is highly desirable in order to reduce both experimental costs and time, thereby facilitating a broad application of cotranscriptional SHAPE-Seq to the study of how RNA folding directs RNA function.
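The two sequence-dependent constraints described above (an internal 'GAATTC' site, and one primer per stop) can be checked before an experiment is planned. The following is a minimal sketch only; the function names and the one-primer-per-stop counting rule are illustrative assumptions and are not part of the published tools.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition site named in the text (palindromic)


def internal_ecori_sites(target_seq):
    """Return 0-based start positions of every EcoRI site in the target.

    RNA input is accepted by converting U to T. Because the site is
    palindromic, scanning one strand is sufficient.
    """
    seq = target_seq.upper().replace("U", "T")
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(ECORI_SITE, start + 1)
    return positions


def reverse_primers_needed(target_length, first_stop=1):
    """Count reverse primers under the assumption of one primer per stop.

    `first_stop` is a hypothetical parameter: the shortest transcript
    length for which a roadblocked template is wanted.
    """
    return max(0, target_length - first_stop + 1)


if __name__ == "__main__":
    example = "AAGAATTCGGGAATTC"  # toy sequence, not a real target
    print(internal_ecori_sites(example))
    print(reverse_primers_needed(100, first_stop=20))
```

A non-empty result from `internal_ecori_sites` flags a target that would need mutation or an alternative roadblock under the Gln111 strategy.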
Here we develop a sequence-independent method for halting TECs at all positions across a DNA template using streptavidin (SAv) as a transcription roadblock and combine this method with SHAPE-Seq to characterize cotranscriptional RNA folding pathways at nucleotide resolution (Figure 1). We start by characterizing the robustness of biotin-SAv transcription roadblocks in the context of in vitro transcription and perform a rigorous analysis of how collision with a roadblock influences RNAP position. We then implement biotin-SAv roadblocking in the cotranscriptional SHAPE-Seq framework using randomly biotinylated DNA templates to capture TECs across all transcript lengths in a general workflow that can be applied to any RNA sequence. A comparison of the SAv and Gln111 roadblocking strategies using the Bacillus cereus crcB fluoride riboswitch (22) as a model system allowed us to determine how technical distinctions between biotin-SAv and Gln111 roadblocking can influence cotranscriptional SHAPE-Seq data. Finally, we propose experimental strategies that leverage the complementary strengths of each approach. The robust and sequence-independent nature of biotin-SAv roadblocking is a powerful addition to the cotranscriptional SHAPE-Seq method that uses reagents that are all commercially available, reduces experimental costs, and simplifies materials preparation. Together, these improvements increase the accessibility of cotranscriptional SHAPE-Seq to a broader user base to study cotranscriptional RNA folding.

DNA template preparation
Preparation of J23119 DNA templates. DNA templates for in vitro radiolabeled transcription experiments were prepared by PCR amplification. 500 µl reactions included 411.25 µl H2O, 50 µl 10× ThermoPol Buffer (New England Biolabs), 6.25 µl 10 mM dNTPs, 12.5 µl 10 µM forward primer (Supplementary Table S1), 12.5 µl 10 µM reverse primer (Supplementary Table S1), 2.5 µl template plasmid DNA and 5 µl of Vent Exo- DNA polymerase (New England Biolabs). 100 µl aliquots were amplified with a thermal cycling program consisting of 30 cycles using an annealing temperature of 55 °C. After thermal cycling, reactions were pooled into two 250 µl aliquots, mixed with 50 µl 3 M sodium acetate (NaOAc) pH 5.5 and 1 ml 100% ethanol (EtOH) each, and stored at −80 °C for 30 min. After centrifugation, precipitated pellets were washed with 1.5 ml cold 70% EtOH and dried using a SpeedVac. Dried pellets were pooled by dissolving in 30 µl H2O, fractionated by gel electrophoresis with a 1% agarose gel, and extracted using the QIAquick Gel Extraction Kit (Qiagen). Purified template was quantified using a Qubit Fluorometer (Life Technologies). Amplification of SRP DNA templates (Supplementary Table S2) with a biotin modification at positions 33 and 42 relative to the transcription start site was directed with oligonucleotides A and B or C (Supplementary Table S1), respectively.
To generate a DNA template containing a single nontemplate strand biotin modification, 10 µM oligonucleotide D and 10 µM oligonucleotide E (Supplementary Table S1) in 50 µl of 1× ThermoPol buffer (New England Biolabs) were incubated at 95 °C for 5 min and annealed by incubating at 37 °C for 20 min. After chilling annealed oligonucleotides at 4 °C for 1 min, 10 U of ExoI was added and the sample was incubated at 37 °C for 30 min to remove excess oligonucleotides. The resulting DNA templates were first purified using the QIAquick Purification Kit (Qiagen) and then run on a 1% agarose gel and gel extracted using the QIAquick Gel Extraction Kit (Qiagen). Purified DNA template concentration was measured using a Qubit Fluorometer (Life Technologies).
Preparation of λ PR templates containing precisely positioned roadblocks. Linear DNA templates were generated by PCR amplification of pIA226 plasmid with Taq1 DNA  Table S1, oligonucleotide J). The EcoRI site and internal biotin-dT were positioned to roadblock RNAP at the same nucleotide. For ExoIII footprinting, the template (bottom) strand primers were end-labeled with [γ-32P]-ATP using PNK (New England Biolabs) and purified using G-50 spin columns (GE Healthcare).
Preparation of biotinylated DNA templates. Randomly biotinylated DNA templates were prepared by PCR amplification and gel extraction as described above except that instead of supplying a dNTP mixture, each dNTP was added individually to a total of 100 nmol combined dNTP and biotin-11-dNTP. Assuming equal probability of incorporating a biotinylated or non-biotinylated dNTP, for 1× biotin incorporation, the nmol quantity of each biotin-11-dNTP included in the reaction was determined using the formula: where dNTP_bio is the nmol quantity of biotin-11-dNTP for base N included in the reaction, N_count is the number of occurrences of base N in the template and nontemplate strands of the DNA sequence that encodes the target RNA (not including the reverse primer sequence), and dNTP_comb is the combined nmol quantity of biotinylated and non-biotinylated dNTP for base N included in the reaction, fixed by the PCR condition. For increased biotin modification, dNTP_bio was then multiplied by the desired number of biotin modifications per template. The quantity of each non-biotinylated dNTP included in the reaction was determined by subtracting dNTP_bio from dNTP_comb. Biotin-11-dATP and biotin-11-dGTP were purchased from PerkinElmer. Biotin-11-dCTP and biotin-11-dUTP were purchased from Biotium. Amplification of randomly biotinylated B. cereus crcB fluoride riboswitch DNA templates (Supplementary Table S2) was directed by oligonucleotides F and G (Supplementary Table S1).
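The formula itself did not survive in this copy of the text, so the sketch below is not the published expression. It is one calculation consistent with the stated definitions, under two labeled assumptions: biotinylated and unmodified dNTPs are incorporated with equal probability (stated in the text), and the target number of biotins per template is split equally among the four bases (an assumption made here). Under those assumptions the expected number of biotins from base N is N_count × dNTP_bio / dNTP_comb, which gives dNTP_bio = k × dNTP_comb / (4 × N_count) for k biotins per template.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def base_counts_both_strands(nontemplate_seq):
    """Count each base over the template and nontemplate strands.

    A base on the template strand is counted as its complement on the
    nontemplate strand. T stands for the position filled by dTTP or
    biotin-11-dUTP.
    """
    seq = nontemplate_seq.upper()
    return {b: seq.count(b) + seq.count(COMPLEMENT[b]) for b in "ACGT"}


def biotin_dntp_plan(nontemplate_seq, dntp_comb_nmol, biotins_per_template=1.0):
    """Return nmol of biotinylated and unmodified dNTP for each base.

    ASSUMPTION: the biotin target is shared equally by the four bases
    (factor of 1/4). The published formula may differ.
    """
    plan = {}
    for base, n_count in base_counts_both_strands(nontemplate_seq).items():
        if n_count == 0:
            bio = 0.0
        else:
            bio = biotins_per_template * dntp_comb_nmol / (4 * n_count)
        bio = min(bio, dntp_comb_nmol)  # cannot exceed the combined amount
        plan[base] = {"biotin_nmol": bio, "plain_nmol": dntp_comb_nmol - bio}
    return plan


def expected_biotins(nontemplate_seq, plan, dntp_comb_nmol):
    """Expected biotins per duplex implied by a plan (equal incorporation)."""
    counts = base_counts_both_strands(nontemplate_seq)
    return sum(counts[b] * plan[b]["biotin_nmol"] / dntp_comb_nmol for b in counts)
```

For a toy sequence such as "ACGTACGTAA" with 100 nmol combined dNTP per base (one reading of the 100 nmol figure in the text), the plan returns an expected one biotin per duplex at 1×, and scales linearly for 2× or 4×.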

In vitro transcription (radiolabeled)
For each sample, 0.125 pmol of biotinylated DNA template was pre-incubated with 12. times with SAv binding buffer (0.5 M sodium chloride (NaCl) and 20 mM Tris-HCl pH 7.5) before incubating with 0.125 pmol template DNA per sample in SAv binding buffer for 30 min. Bound templates were pulled down and washed three times with 1× transcription buffer. In vitro transcription reactions were mixed as described above. After 30 s, reactions were placed on a magnetic stand and allowed to separate for 30 s before the supernatant was removed and added to 125 µl of stop solution and the pellet was resuspended in 125 µl of stop solution and 25 µl 1× transcription buffer.
Following in vitro transcription, RNA was purified by the addition of 150 µl of phenol/chloroform/isoamyl alcohol (25:24:1), vortexing, centrifugation and collection of the aqueous phase. RNA was precipitated by adding 450 µl of 100% EtOH and storage at −20 °C. Following precipitation, RNA was resuspended in transcription loading dye (1× transcription buffer, 80% formamide, 0.05% bromophenol blue and xylene cyanol). RNAs were fractionated by electrophoresis using 12% denaturing polyacrylamide gels containing 7.5 M urea (National Diagnostics, UreaGel). Reactive bases were detected using an Amersham Biosciences Typhoon 9400 Variable Mode Imager. Quantification of bands was performed using ImageQuant. For all experiments, individual bands were normalized for incorporation of [α-32P]-UTP by dividing band intensity by the number of U's in the transcript. Biotin-SAv roadblocking efficiency was calculated by dividing the sum of roadblocked RNAs by the sum of all roadblocked and run-off products. Aborted/paused products are not included in this calculation.
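The normalization and efficiency calculation described above can be sketched as follows. The function names and the representation of a band as an (intensity, transcript sequence) pair are illustrative assumptions; the band values in the usage example are toy numbers, not measurements.

```python
def normalized_intensity(band_intensity, transcript_seq):
    """Divide a band intensity by the number of U's in the transcript."""
    u_count = transcript_seq.upper().replace("T", "U").count("U")
    if u_count == 0:
        raise ValueError("transcript contains no U; band cannot be normalized")
    return band_intensity / u_count


def roadblocking_efficiency(roadblocked_bands, runoff_bands):
    """Roadblocked signal over roadblocked plus run-off signal.

    Each argument is a list of (intensity, transcript_seq) pairs.
    Aborted or paused products are simply not passed in, matching the
    exclusion stated in the text.
    """
    roadblocked = sum(normalized_intensity(i, s) for i, s in roadblocked_bands)
    runoff = sum(normalized_intensity(i, s) for i, s in runoff_bands)
    total = roadblocked + runoff
    if total == 0:
        raise ValueError("no roadblocked or run-off signal")
    return roadblocked / total


if __name__ == "__main__":
    # Toy values for illustration only.
    stalled = [(300.0, "GAUUCU"), (200.0, "GAUUCUAU")]
    run_off = [(250.0, "GAUUCUAUGU")]
    print(roadblocking_efficiency(stalled, run_off))
```

Normalizing before summing matters because longer transcripts incorporate more labeled UTP and would otherwise be over-weighted.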

GreB cleavage assay
Linear DNA templates (

Exonuclease footprinting
Linear radiolabeled DNA templates (140 nM), holo-RNAP (160 nM), ApU (100 µM) and 5 µM each UTP, GTP and ATP were incubated in TB-50 for 10 min at 37 °C to form a halted A26 TEC. 500 nM Gln111 or 1.5 µM SAv were added to DNA-RI and DNA-B, respectively, and incubated for 8 min at 37 °C. Transcription was restarted by the addition of all NTPs at 200 µM, and samples were aliquoted at 5 µl/tube. ExoIII (New England Biolabs) was diluted in TB-50 and added to the stalled complexes at a final concentration of 4 U/µl for 3 min at 21 °C. Reactions were quenched with the Stop buffer and analyzed on a 6% polyacrylamide-7 M urea gel.

Electrophoretic mobility shift assay
Randomly biotinylated DNA templates were prepared as described above except that an unmodified reverse primer (Supplementary Table S1, oligonucleotide K) was used. 1 pmol of DNA template was incubated in the absence or presence of 10 pmol SAv for 15 min at room temperature. 6× Purple Gel Loading Dye (no SDS) (New England Biolabs) was added and samples were fractionated using a 1.5% agarose gel containing 1× GelRed (Biotium). Bands were detected by UV transillumination using a ChemiDoc (Bio-Rad).

In vitro transcription (cotranscriptional SHAPE-Seq)
Reaction mixtures containing 100 nM randomly biotinylated DNA template and 4 U of E. coli RNAP holoenzyme (New England Biolabs) were incubated in transcription buffer, 0.2 mg/ml BSA, and 500 µM NTPs for 7.5 min at 37 °C to form open complexes. When present, NaF was included to a final concentration of 10 mM. Following open complex formation, SAv monomer (Promega) was added to 40 µM and incubated for another 7.5 min. Single-round transcription reactions were initiated by addition of MgCl2 to 5 mM and rifampicin to 10 µg/ml. After 30 s, RNAs were SHAPE modified by splitting the reaction and mixing half with 2.78 µl of 400 mM BzCN dissolved in anhydrous DMSO ((+) sample) or anhydrous DMSO only (Sigma Aldrich; (−) sample) for ∼2 s before addition of 75 µl of TRIzol solution (Life Technologies) and extraction. Extracted RNAs were dissolved in 20 µl 1× DNase I buffer (New England Biolabs) containing 1 U DNase I (New England Biolabs) and incubated at 37 °C for 30 min. After DNA digestion, 30 µl of H2O and 150 µl of TRIzol were added and the RNA was extracted and dissolved in 10 µl 10% DMSO.

Sequencing library processing
An RNA linker was adenylated using the 5′ DNA adenylation kit (New England Biolabs), purified by TRIzol extraction, and quantified using a Qubit Fluorometer as described previously (10). Extracted RNAs were ligated to an RNA linker using T4 RNA Ligase 2 truncated KQ (New England Biolabs) by incubation at room temperature as described previously (10). Reverse transcription of the linker ligation products was performed using Superscript III Reverse Transcriptase (Life Technologies) as described previously (10). Ligation of an Illumina A_b adapter fragment was performed using CircLigase I ssDNA ligase (Epicentre) as described previously (10). ssDNA libraries were used to generate fluorescently labeled dsDNA libraries as described previously (10). The resulting dsDNA libraries were analyzed by capillary electrophoresis using an ABI 3730xl and the resulting traces were used to evaluate library length distribution and the presence of adapter dimer prior to sequencing. Sequencing libraries were generated as described previously (11).
Nucleic Acids Research, 2017, Vol. 45, No. 12, e109

Sequencing and analysis
Sequencing of 1× (replicate 1), 2× and 4× biotin fluoride riboswitch cotranscriptional SHAPE-Seq libraries was performed on the Illumina HiSeq2500 in Rapid Run mode using 2 × 36 bp paired end reads and 20% phiX. 1× biotin fluoride riboswitch replicates 2-4 were performed in a completely different laboratory setting using completely different reagents and sequenced on the Illumina NextSeq500 using 2 × 37 bp paired end reads and 30% phiX. All cotranscriptional SHAPE-Seq computational tools used in this study can be found on GitHub at: https://github.com/LucksLab/Cotrans_SHAPE-Seq_Tools/releases/ and https://github.com/LucksLab/spats/releases/. Target FASTA files were prepared using the Cotrans_targets.py script as described previously (10). Reads were mapped and processed with Spats v1.0.1 as described previously (11). In general, between 4 and 6 M reads (samples run on HiSeq2500) or between 9 and 16 M reads (samples run on NextSeq500) were mapped and used to calculate the reported cotranscriptional reactivity matrices.
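Because every intermediate transcript length is treated as its own mapping target, a target file needs one record per length. The sketch below illustrates that idea only; it is not the Cotrans_targets.py script, and the function names, header format, and `min_length` parameter are hypothetical. The actual tool may differ, for example in how linker or adapter sequence is handled.

```python
def intermediate_targets(name, full_seq, min_length=1):
    """Return (header, sequence) records for every 5' prefix of an RNA.

    One record is produced for each transcript length from `min_length`
    up to the full length, mirroring the set of nascent intermediates.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    seq = full_seq.upper()
    return [
        ("{}_{}nt".format(name, length), seq[:length])
        for length in range(min_length, len(seq) + 1)
    ]


def to_fasta(records):
    """Format (header, sequence) records as FASTA text."""
    return "".join(">{}\n{}\n".format(header, seq) for header, seq in records)


if __name__ == "__main__":
    # Toy sequence for illustration only.
    print(to_fasta(intermediate_targets("toy", "ACGUAC", min_length=4)))
```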

The efficiency of streptavidin transcription roadblocking is DNA strand dependent
Biotin-SAv roadblocking (23,24) has previously been shown to prevent E. coli RNAP from 'running off' the DNA template during in vitro transcription (9). Typically, biotin-SAv roadblocks are introduced into a DNA template to prevent run-off transcription by including a 5′-biotinylated reverse primer during template amplification (25). Recently, a terminal biotin-SAv roadblock was used to halt a TEC to facilitate SHAPE probing of a single RNA transcript in complex with RNAP (26). Cotranscriptional SHAPE-Seq, however, requires the distribution of stalled TECs across all DNA template positions. Thus, the use of biotin-SAv roadblocking in cotranscriptional SHAPE-Seq requires random biotinylation of the DNA template during PCR, which inherently biotinylates both the template and nontemplate strands of the DNA duplex. Because each DNA strand makes distinct interactions with RNAP, a biotin-SAv roadblock in the template strand may not have an equivalent effect on RNAP compared to the nontemplate strand.
To measure the efficiency of template vs. nontemplate strand biotin-SAv roadblocks, we performed in vitro transcription using DNA templates that were biotinylated at a single internal position downstream of the promoter in the template strand (positions +33 or +42) or in the nontemplate strand (position +33) (Figure 2A). Template strand biotin-SAv roadblocks stall TECs with 80-87% efficiency as a cluster of stops 7-13 nucleotides (nts) upstream of the biotinylation site. In contrast, nontemplate strand biotin-SAv roadblocks stall TECs as a more defined stop, but with only 30% efficiency (Figure 2A). The superior roadblocking efficiency of the template strand biotin-SAv roadblocks is  by magnetic pull-down (Figure 2B). We observed that 95-98% of the RNAs in stalled TECs were still attached to beads after the pull-down, indicating that the vast majority of RNAs remain stably associated with RNAP in stalled TECs independent of the strand to which the biotin-SAv roadblock is tethered (Figure 2B).

Collision with a streptavidin roadblock, but not Gln111, induces extensive RNAP backtracking
The observation that template strand biotin-SAv roadblocks stall RNAP at a range of positions suggests that the enzyme may reverse translocate (backtrack) (28) upon collision with a flexibly attached roadblock. By contrast, RNAP may remain stationary after running into a rigid Gln111 roadblock. Knowing the RNAP position on the nascent RNA is essential for interpreting the RNA modification patterns. To compare the structures of TECs stalled by the two roadblocks, we designed templates on which RNAP was stalled at the same position by a biotin in the template strand or Gln111 bound to GAATTC (Figure 3A). On these templates, RNAP that initiates transcription from a strong bacteriophage PR promoter in the absence of CTP is halted after incorporating the A residue at position 26. In the presence of all NTPs, RNAP resumes elongation and is efficiently stalled by either Gln111 or biotin-SAv linked to the template DNA strand (Figure 3B). An EcoRI site and internal biotinylated dT were positioned to achieve RNAP stalling at the same position on the respective templates (after the addition of U45 on templates used in these experiments). As was observed in Figure 2, collision of RNAP with a biotin-SAv roadblock yields a cluster of roadblocked TECs across several transcript lengths.
To test if RNAP was backtracked after running into each roadblock, we used GreB, an E. coli elongation factor that stimulates the intrinsic endonucleolytic cleavage of the nascent RNA in backtracked TECs (29). We observed several GreB-induced cleavages in SAv-stalled complexes, consistent with RNAP backtracking by up to 4 nt upstream of the primary roadblock position at +45 (Figure 3B). Because some TECs transcribe to +48 by 'pushing' the biotin-SAv prior to reverse translocation, the maximum possible backtrack in this sequence context is 7 nt (from +48 to +41); however, such extensive backtracking is likely to be infrequent because only ∼25% of roadblocked TECs have transcribed to +47 or +48 and these complexes may backtrack to any position between +41 and +45. By contrast, Gln111-stalled TECs did not backtrack by more than 1 nt (Figure 3B). These results suggest that RNAP is shifted backwards in biotin-SAv-stalled complexes, as compared to those stalled by Gln111.
We next used exonuclease III (ExoIII) to map the back border of RNAP in stalled TECs (Figure 3C). ExoIII is a processive 3′-5′ exonuclease used extensively to determine the RNAP translocation register (30). The biotin-SAv-stalled RNAP protected 17-19 nts of the template DNA strand upstream of A45 from ExoIII cleavage, whereas Gln111-stalled RNAP protected only 14-15 nts upstream of A45 (Figure 3C). We conclude that when RNAP runs into a biotin-SAv roadblock, an additional 3-4 nts of the nascent RNA become protected inside the backtracked enzyme, as compared to the enzyme stalled by Gln111 (Figure 3D).

Design of randomly biotinylated DNA templates for cotranscriptional SHAPE-Seq
Having established that biotin-SAv roadblocking can be used to halt RNAP in stable TECs, we next sought to validate its use in the cotranscriptional SHAPE-Seq experimental framework. Randomly biotinylated DNA templates were prepared by enzymatic incorporation of biotin-11-dNTPs during PCR amplification. Vent Exo- was selected for template amplification as it is particularly tolerant of biotin-11-dNTPs (31). To prevent biotinylation of the promoter nontemplate strand, which could interfere with promoter open complex formation, the forward PCR primer comprised positions -45 to -1 relative to the transcription start site (Figure 4A). Optionally, the reverse primer can include a 5′ biotin modification as a terminal roadblock to prevent template run-off during the cotranscriptional SHAPE-Seq experiment (Figure 4A). We prepared randomly biotinylated DNA templates for the B. cereus crcB fluoride riboswitch (22) with targeted biotinylation levels of one, two, or four modifications per DNA template (Figure 4B) and performed cotranscriptional SHAPE-Seq in the presence and absence of fluoride. This choice of model system allowed us to compare the general characteristics of the biotin-SAv roadblocking strategy with our previous Gln111 approach (10), and to perform a detailed comparison of the reactivity profiles obtained from each method.
[Figure caption fragment: Table S5]

Validation: Analysis of streptavidin and Gln111 transcript length alignments
To obtain a complete reactivity matrix for a target RNA, it is necessary to interrogate the structure of all intermediate length transcripts. Thus, it is critical that all RNA intermediates are well represented in cotranscriptional SHAPE-Seq libraries. To assess coverage of the intermediate transcript lengths when biotin-SAv roadblocks are randomly incorporated, we examined the B. cereus crcB fluoride riboswitch (22,32) with cotranscriptional SHAPE-Seq using varying degrees of biotinylation (1, 2 or 4 biotins/template) as described above. We then compared the distribution of unmodified transcript lengths to the number of biotins expected to be present in the DNA template (Figure 4C). Increased DNA template biotinylation proportionally shifts the distribution toward shorter lengths as it becomes increasingly likely for RNAP to encounter a biotin-SAv roadblock (Figure 4C). As an alternative to increased template biotinylation, enrichment for stalled complexes could also be achieved by omitting the terminal roadblock and using immobilized DNA templates to remove run-off transcripts before the transcription reaction is stopped (Figure 2B).
In all samples, the transcript lengths are distributed unevenly in a distinct pattern of peaks and troughs, representing high and low abundance, respectively. Interestingly, comparison of the alignment distributions produced by biotin-SAv and Gln111 roadblocking reveals remarkable consistency between both methods (Figure 4D). One plausible explanation for the similarity of biotin-SAv and Gln111 transcript distributions is that the linker ligation step used to facilitate reverse transcription in the SHAPE-Seq v2.1 strategy (11) influences the representation of transcript lengths in cotranscriptional SHAPE-Seq libraries. There is well-documented structure- and sequence-dependent bias (33,34) in RNA-RNA ligations. While such bias does not influence cotranscriptional SHAPE-Seq reactivity calculation, as reactivity profiles are calculated internally for each length, reduction of ligation bias is a target of future development as it would reduce the sequencing depth necessary to adequately cover all intermediate transcript lengths by flattening the transcript length distributions. Because the transcript length distribution of cotranscriptional SHAPE-Seq libraries produced using biotin-SAv roadblocking approximates that of libraries produced using Gln111, we conclude that biotin-SAv roadblocking provides a sufficient distribution of TECs for reliable cotranscriptional SHAPE-Seq measurements.
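The qualitative trend described above, in which more biotins per template leave fewer TECs to reach downstream positions, can be illustrated with a simple probabilistic model. This is a sketch under strong assumptions and not an analysis from the study: biotins are taken to be spread uniformly and independently over both strands, each roadblock is assumed to stall RNAP at the modified position (the 7-13 nt upstream offset is ignored), and the efficiencies of 0.85 (template strand) and 0.30 (nontemplate strand) are taken from the single-site measurements reported earlier. Ligation bias, which shapes the observed peaks and troughs, is not modeled.

```python
def stall_distribution(length, biotins_per_template,
                       eff_template=0.85, eff_nontemplate=0.30):
    """Return (stalled, runoff) for a uniform random-biotinylation model.

    stalled[i] is the fraction of TECs halted at position i + 1, and
    runoff is the fraction that passes every position.
    ASSUMPTIONS: uniform, independent modification of both strands;
    per-strand stall efficiencies are constant along the template.
    """
    # Probability that a given position on one strand carries a biotin.
    p_site = biotins_per_template / (2.0 * length)
    # Probability of stalling at one position from either strand.
    p_stop = 1.0 - (1.0 - p_site * eff_template) * (1.0 - p_site * eff_nontemplate)
    stalled = []
    surviving = 1.0
    for _ in range(length):
        stalled.append(surviving * p_stop)
        surviving *= 1.0 - p_stop
    return stalled, surviving


if __name__ == "__main__":
    for k in (1, 2, 4):
        stalled, runoff = stall_distribution(100, k)
        print(k, round(sum(stalled), 3), round(runoff, 3))
```

In this model the run-off fraction falls and the stalled population becomes increasingly weighted toward promoter-proximal positions as the number of biotins per template rises.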

Validation: analysis of streptavidin and Gln111 cotranscriptional SHAPE-Seq reactivity measurements
We next compared the cotranscriptional SHAPE-Seq reactivity measurements made for the crcB fluoride riboswitch using biotin-SAv roadblocks to previous measurements made using Gln111 roadblocks (10) (Supplementary Table S6). Our previous characterization of the crcB fluoride riboswitch revealed key signatures of aptamer folding and fluoride binding as well as the fluoride-dependent bifurcation of the RNA folding pathway to produce the riboswitch 'ON' and 'OFF' regulatory states (10) (Figure 5A). The same molecular signatures and their associated transitions are readily observable in reactivity matrices produced using biotin-SAv roadblocking (Figure 5B-E and Supplementary Figures S1A-B, S2A-B and S3A-B), indicating that overall, cotranscriptional SHAPE-Seq uncovers the same RNA structural information regardless of whether a biotin-SAv or Gln111 transcription roadblock is used.
Figure 5 caption (partial): (F and G) Reactivity differences (Δρ) between biotin-SAv and Gln111 roadblocking data with 10 mM NaF (F) and 0 mM NaF (G). Regions of low reactivity tend to have values that are close to 0, whereas regions with moderate or high reactivity exhibit greater values. Results shown in (B) and (C) are n = 1 and are representative of four biological replicates (Supplementary Figure S1).
We then compared biotin-SAv and Gln111 cotranscriptional SHAPE-Seq reactivity profiles for all RNA intermediates by calculating reactivity differences (Δρ) (Figure 5F-G and Supplementary Figures S2C-D and S3C-D). In agreement with the analysis of precisely positioned roadblocks (Figure 3D), Δρ analysis reveals that when biotin-SAv roadblocking is used, the RNAP footprint can protect an additional ∼4-7 nt upstream from SHAPE modification, as can be seen by the presence of a stripe of lower reactivity adjacent to the RNAP position compared to when Gln111 roadblocks are used (Figure 5F-G and Supplementary Figures S2C-D and S3C-D). This behavior is remarkably consistent, and is visible in all biotin-SAv cotranscriptional SHAPE-Seq experiments, both in the presence and absence of fluoride.
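A reactivity difference matrix of this kind can be computed position by position for each transcript length shared by the two datasets. The sketch below assumes a simple representation (a dictionary mapping transcript length to a list of per-nucleotide reactivities) and a sign convention of biotin-SAv minus Gln111; both are assumptions made for illustration and are not taken from the published analysis code.

```python
def reactivity_difference(rho_sav, rho_gln111):
    """Return per-nucleotide differences for each shared transcript length.

    Inputs map transcript length -> list of reactivities (5' to 3').
    ASSUMED sign convention: biotin-SAv minus Gln111, so a negative
    value marks a nucleotide that is less reactive with biotin-SAv.
    """
    delta = {}
    for length in sorted(set(rho_sav) & set(rho_gln111)):
        sav, gln = rho_sav[length], rho_gln111[length]
        if len(sav) != len(gln):
            raise ValueError("profile lengths differ at transcript length %d" % length)
        delta[length] = [s - g for s, g in zip(sav, gln)]
    return delta


if __name__ == "__main__":
    # Toy profiles for illustration only; not experimental values.
    sav = {3: [0.5, 1.0, 0.25], 4: [0.5, 1.0, 0.25, 0.0]}
    gln = {3: [0.5, 0.5, 1.25], 5: [0.0] * 5}
    print(reactivity_difference(sav, gln))
```

Under this convention, the extra protection by backtracked RNAP would appear as a band of negative values just upstream of the 3′ end of each transcript.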
Consistent with the interpretation that the collision of RNAP with a biotin-SAv roadblock produces backtracked complexes in different sequence contexts, RNA folding transitions associated with aptamer folding and terminator nucleation are displaced downstream by 1-4 transcript lengths and appear to be more gradual when TECs are stalled with biotin-SAv ( Figure 6 and Supplementary Figures S1C, S2E-F and S3E-F). The first such major structural transition that is observed earlier is the fluorideindependent decrease in P1 loop (nt 11-16) reactivity as it pairs with the lower 6 nt of the terminator stem (nt 42-47) to form the pseudoknot PK1 ( Figure 5A) Figure S1). PK1 folds abruptly as P1 loop reactivity decreases sharply over transcript lengths 57-59, independent of fluoride concentration. With biotin-SAv, P1 loop reactivity decreases more gradually over transcript lengths 57-63, with an even more gradual drop in the absence of fluoride ( Figure 6A and Supplementary Figures S1C, S2E and S3E). Furthermore, reactivity changes at nucleotides A10 and A22 that were previously shown to be associated with aptamer folding (10) are also displaced downstream in the biotin-SAv dataset such that they remain coordinated with PK1 folding (Supplementary Figure S4). Following aptamer folding, the crcB fluoride riboswitch directs transcription termination or antitermination in the absence or presence of fluoride, respectively. In the absence of fluoride, terminator nucleation is observed as a coordinated reactivity decrease in the upper terminator stem (nt 52-55) and reactivity increase in the P1 loop as the terminator hairpin winds and disrupts PK1 (10). Terminator folding (35) in the absence of fluoride is consistently displaced downstream by 1 transcript length when biotin-SAv roadblocking is used, occurring across lengths 76-79 with Gln111 and 77-80 with streptavidin ( Figure 6B and Supplementary Figures S1C, S2F and S3F). 
In the presence of fluoride, partial folding of the upper terminator stem is delayed until RNAP has traversed the poly-U tract and the fluoride-bound aptamer sequesters the base of the terminator stem so that only a partial terminator hairpin can form (10). In contrast to termination in the absence of fluoride, partial terminator nucleation in the presence of fluoride is more sensitive to the roadblock used, occurring at length 88 with Gln111 and length 92 with biotin-SAv ( Figure 6B and Supplementary Figures S1C, S2F and S3F). The relative insensitivity of terminator hairpin folding to roadblock type when nucleation occurs in close proximity to RNAP is consistent with observations that nascent RNA structure can prevent RNAP backtracking (36). Importantly, all RNA structural states associated with the crcB fluoride riboswitch termination and antitermination transitions are identified by cotranscriptional SHAPE-Seq regardless of the transcription roadblock used (Supplementary Figure S5).
The final noteworthy distinction between cotranscriptional SHAPE-Seq reactivity profiles produced with biotin-SAv and Gln111 roadblocking is observed at the transcription termination sites in the presence of fluoride. Because crcB fluoride riboswitch antitermination efficiency in cotranscriptional SHAPE-Seq conditions is close to 100% (10), we do not expect to see reactivity signatures of transcription termination in the presence of fluoride. Nonetheless, previously we observed high P1 loop reactivity at the termination sites (80-82) when Gln111 is used to stall TECs (Figure 6A and Supplementary Figures S2E and S3E). In contrast, P1 loop reactivity remains low at the termination sites when biotin-SAv roadblocking is used, suggesting that TECs stalled by SAv are resistant to this effect.
Our cotranscriptional SHAPE-Seq analysis of the crcB fluoride riboswitch with biotin-SAv and Gln111 roadblocking demonstrates that the choice of transcription roadblock can shift the transcript length at which a transition is observed by 1-4 nt. This observation is consistent with measurements of RNAP location relative to the RNA 3′ end made using precisely placed roadblocks (Figure 3), arguing that these measurements are generalizable to diverse sequence contexts. Thus, despite subtle distinctions in reactivity measurements, both roadblocking strategies capture the same reactivity signatures associated with aptamer folding and the riboswitch regulatory decision (Supplementary Figures S5 and S6).
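As a rough cross-check of the 1-4 nt range, the transition lengths quoted in the text above can be tabulated and compared directly. The sketch below is only illustrative: the dictionary layout and the function name `completion_shift` are our own, and taking the last length of each reported range as the point at which a transition "completes" is a simplifying assumption rather than the analysis used to generate the figures.

```python
# Transcript lengths (nt) at which each transition is reported in the text.
# Ranges are stored as (first length, last length); single lengths are repeated.
reported = {
    "PK1 folding": {"Gln111": (57, 59), "biotin-SAv": (57, 63)},
    "terminator nucleation, no fluoride": {"Gln111": (76, 79), "biotin-SAv": (77, 80)},
    "partial terminator nucleation, fluoride": {"Gln111": (88, 88), "biotin-SAv": (92, 92)},
}


def completion_shift(transition):
    """Downstream shift (nt) in the last length of a transition.

    Assumption: the last length of the reported range marks completion.
    """
    return transition["biotin-SAv"][1] - transition["Gln111"][1]


shifts = {name: completion_shift(t) for name, t in reported.items()}
for name, shift in shifts.items():
    print(f"{name}: shifted {shift} nt downstream with biotin-SAv")
```

Under this assumption, every tabulated shift falls within the 1-4 nt range stated above.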

DISCUSSION
We have developed a sequence-independent method for distributing stalled TECs across a DNA template using biotin-SAv roadblocking and characterized the basic properties of biotin-SAv as an internal transcription roadblock. We have also benchmarked the use of biotin-SAv roadblocks in cotranscriptional SHAPE-Seq against data that were previously generated using Gln111. We found that cotranscriptional SHAPE-Seq results were largely independent of the roadblock strategy used as long as the relative propensity for each transcription roadblock to induce backtracking was taken into account. Indeed, the overall cotranscriptional SHAPE-Seq reactivity landscape of the fluoride riboswitch folding pathway is independent of the stalled TEC distribution strategy (Figure 5B-E and Supplementary Figures S1A-B, S2A-B and S3A-B). We therefore suggest that while the use of Gln111 roadblocking for cotranscriptional SHAPE-Seq provides greater accuracy than biotin-SAv roadblocking in mapping folding events to specific transcript lengths, the simplicity and reduced cost of biotin-SAv roadblocking make it better suited for generation of full cotranscriptional SHAPE-Seq reactivity profiles.
The use of biotin-SAv to halt TECs at all positions across a DNA template provides several advantages over Gln111 in the context of cotranscriptional SHAPE-Seq that both simplify and reduce the cost of generating high-resolution profiles of RNA folding pathways. First, amplification of DNA template libraries for biotin-SAv roadblocking requires only two primers, while Gln111 roadblocking requires a large primer set whose cost exceeds that of the biotin-11-dNTPs required for randomly biotinylated DNA template preparation, especially for long RNA targets. Second, whereas Gln111 is not commercially available, the use of SAv as a transcription roadblock requires only readily available reagents, thereby improving the accessibility of the method.
While biotin-SAv roadblocking provides several experimental advantages for cotranscriptional SHAPE-Seq, in some cases the use of Gln111 may prove advantageous because of its properties as a transcription roadblock. For example, while distributed biotin-SAv roadblocks provide a simple and effective method for elucidating full RNA folding pathways, Gln111 roadblocking is better suited for the precise mapping of RNA structural rearrangements over small subsets of intermediate transcripts. Furthermore, a useful property of Gln111 roadblocking in the context of cotranscriptional SHAPE-Seq is that it facilitates the identification and removal of reads from roadblock run-through products. Because the incorporation of an EcoRI site into a DNA template introduces non-native sequence, any TECs that run through the roadblock will synthesize an RNA containing aberrant sequence, and the resulting reads can therefore be discarded during read alignment. In contrast, every DNA template used for biotin-SAv roadblocking contains the entirety of the target RNA sequence and therefore, reads generated by TECs that run through the first roadblock they encounter and stop at a subsequent roadblock are indistinguishable from those that were successfully halted by the first roadblock encountered. Because non-template strand biotin-SAv complexes halt transcription at low efficiency and template strand biotin-SAv complexes halt transcription with high, but imperfect, efficiency, it is important to consider how roadblock run-through might influence a particular experimental outcome. This distinction does not influence cotranscriptional SHAPE-Seq results in the context of the B. cereus crcB fluoride riboswitch, but is an important consideration when studying RNAs that may be sensitive to transcription pausing (37,38) because run-through of a roadblock could function as a non-native pause.
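The difference in how run-through products can be recognized may be easier to see in a toy example. The sketch below is not the read alignment procedure used in this work; it assumes a made-up target sequence, a hypothetical helper `is_native_prefix`, and a simplified Gln111 template consisting of native sequence followed directly by the EcoRI recognition site (GAATTC) and arbitrary downstream filler. Transcripts are written in the DNA alphabet for simplicity.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def is_native_prefix(transcript, target):
    """True if the transcript contains only native target sequence from +1."""
    return target.startswith(transcript)


# Hypothetical sequences for illustration only (not the crcB riboswitch).
target = "ACGTTGCAAGGCTTAACCGGTTAGC"
native_len = 12  # native nucleotides present in the toy Gln111 template
gln111_template = target[:native_len] + ECORI_SITE + "CTAGCA"

# A transcript from a stalled TEC ends within native sequence on either template.
stalled = target[:10]
# Run-through on the Gln111 template extends into non-native sequence.
run_through_gln111 = gln111_template[:16]
# Run-through on a biotinylated template is still a prefix of the target.
run_through_biotin = target[:16]

for label, rna in [("stalled", stalled),
                   ("Gln111 run-through", run_through_gln111),
                   ("biotin-SAv run-through", run_through_biotin)]:
    verdict = "keep" if is_native_prefix(rna, target) else "discard"
    print(f"{label}: {verdict}")
```

In this toy case only the Gln111 run-through product fails the check, which mirrors the point above: a sequence-based filter cannot separate biotin-SAv run-through products from transcripts halted at the first roadblock.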
There are multiple strategies that could be applied to circumvent or minimize this limitation of biotin-SAv roadblocking. The simplest approach would be to limit DNA template biotinylation such that the population of DNA templates containing more than one biotin modification is virtually nonexistent so that roadblock run-through almost always results in run-off transcription. However, because this strategy will yield a high proportion of nonbiotinylated DNA templates, it would be important to enrich for stalled complexes by purifying away run-off transcription products (Figure 2B) in order to maximize the number of reads from RNAs within roadblocked TECs. Alternatively, enrichment for high-efficiency template-strand roadblocks should be achievable by adapting a previously published strategy for the formation of heteroduplex DNA templates (39). The specific published strategy (39) is incompatible with the preparation of randomly biotinylated DNA templates because the desired DNA heteroduplex is purified by biotin-SAv mobility shift. However, the general approach can be adapted to the formation of heteroduplexes in which only the template strand is biotinylated through the use of an alternative affinity tag, such as digoxigenin. The primary disadvantage of this strategy is that the essential protocol adaptations require additional costly reagents and an increased template preparation scale. In experiments where roadblock run-through is a major concern, the most effective solution is likely to use Gln111 or an alternative sequence-specific transcription roadblock (20,21) to characterize specific intermediate transcript lengths of interest after an initial characterization of the entire folding pathway using biotin-SAv.
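The trade-off in the first strategy described above, fewer multiply biotinylated templates at the price of more unmodified templates, can be illustrated with a short calculation. The sketch below assumes that the number of biotin modifications per template follows a Poisson distribution; that model, the function name `biotin_count_fractions`, and the example mean values are our own assumptions for illustration and are not measurements from this work.

```python
import math


def biotin_count_fractions(mean_biotins):
    """Return template fractions with (zero, exactly one, two or more) biotins.

    Assumption: biotins per template are Poisson distributed with the given mean.
    """
    p_none = math.exp(-mean_biotins)
    p_one = mean_biotins * p_none
    p_multi = 1.0 - p_none - p_one
    return p_none, p_one, p_multi


# Illustrative mean biotin counts per template (hypothetical values).
for mean in (0.1, 0.5, 1.0, 2.0):
    p_none, p_one, p_multi = biotin_count_fractions(mean)
    # Share of biotinylated templates that carry more than one biotin.
    multi_among_modified = p_multi / (1.0 - p_none)
    print(f"mean={mean:.1f}  unmodified={p_none:.3f}  single={p_one:.3f}  "
          f"multiple={p_multi:.3f}  multiple among modified={multi_among_modified:.3f}")
```

Under this model, lowering the mean makes multiply modified templates rare but leaves most templates unmodified, which is why enrichment for stalled complexes becomes important.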
The development of a second strategy for distributing halted TECs across a DNA template afforded us the opportunity to examine the influence of transcription roadblock properties on cotranscriptional SHAPE-Seq data. The broad agreement of B. cereus crcB fluoride riboswitch reactivity matrices generated with each method is indicative of a high degree of experimental reproducibility, even across roadblocking strategies (Figure 5 and Supplementary Figure S6). Specifically, our analysis of the B. cereus crcB fluoride riboswitch revealed key molecular signatures depicting aptamer folding, fluoride binding, and the fluoride-dependent bifurcation of the riboswitch folding pathway into the terminated and antiterminated functional modes regardless of the roadblock used. The primary distinction between cotranscriptional SHAPE-Seq measurements made using biotin-SAv and Gln111 roadblocking is that observed structural transitions are shifted downstream by 1-4 nt when SAv is used to roadblock TECs. This distinction corresponds to a degree of uncertainty in the location of RNAP relative to the 3′ end of the nascent transcript and is a direct consequence of using the RNA 3′ end to indicate RNAP location. While the RNA 3′ end generally provides a close approximation of the location of RNAP on a DNA template, it is not a direct measurement of RNAP position due to inherent variability of RNA 3′ end position relative to RNAP (40). Because RNA must emerge from RNAP (41) before it can fold, changes in the location of RNAP relative to the RNA 3′ end by processes such as backtracking would sequester RNA proximal to the RNAP exit channel while simultaneously displacing the RNA 3′ end from the active center, resulting in an apparent downstream shift in structural transitions.
Given the inherent flexibility of biotin-SAv roadblocks due to the necessity of tethering biotin to the DNA duplex by a linker, it is not surprising that RNAP position would be less defined following collision with biotin-SAv than following collision with the relatively static Gln111 roadblock. Indeed, precise measurement of RNAP footprint indicates that biotin-SAv-stalled TECs are more prone to backtracking than Gln111-stalled TECs. The use of biotin-dNTPs with a shorter linker could reduce the flexibility of the roadblock; however, to our knowledge only biotin-11-dNTPs have complete commercial availability.
The distribution of TECs across all positions of a DNA template presents a technical challenge for which numerous solutions with unique advantages and disadvantages exist. Protein-based roadblocking strategies (9,18,20,21) are advantageous because they do not rely on the efficient incorporation of a modified nucleotide during transcription. Instead, protein roadblocks leverage the ubiquitous challenge of traversing a physical barrier along a DNA template and should therefore be generalizable to RNA polymerases beyond E. coli RNAP. The availability of biotin-SAv and Gln111 as transcription roadblocking strategies in cotranscriptional SHAPE-Seq allows users to tailor the method to specific experimental needs and expands the utility of a powerful RNA structure probing method.

DATA AVAILABILITY
Raw sequencing data that support the findings of this study have been deposited in the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra) with the BioProject accession code PRJNA374354. Individual BioSample accession codes are available in Supplementary Table S3. SHAPE-Seq Reactivity Spectra generated in this work have been deposited in the RNA Mapping Database (RMDB) (http://rmdb.stanford.edu/repository/) (42). Accession codes and sample details are available in Supplementary Table S4. Source data for Figures 4-6 and Supplementary Figures S1-S6 are available with the paper online. Full versions of cropped gel images in Figures 2A, 3B and C are available with the paper online. All other data that support the findings of this paper are available from the corresponding author upon reasonable request.