Snapshots of pre-rRNA structural flexibility reveal eukaryotic 40S assembly dynamics at nucleotide resolution

Ribosome assembly in eukaryotes involves the activity of hundreds of assembly factors that direct the hierarchical assembly of ribosomal proteins and numerous ribosomal RNA folding steps. However, detailed insights into the function of assembly factors and ribosomal RNA folding events are lacking. To address this, we have developed ChemModSeq, a method that combines structure probing, high-throughput sequencing and statistical modeling, to quantitatively measure RNA structural rearrangements during the assembly of macromolecular complexes. By applying ChemModSeq to purified 40S assembly intermediates we obtained nucleotide-resolution maps of ribosomal RNA flexibility revealing structurally distinct assembly intermediates and mechanistic insights into assembly dynamics not readily observed in cryo-electron microscopy reconstructions. We show that RNA restructuring events coincide with the release of assembly factors and predict that completion of the head domain is required before the Rio1 kinase enters the assembly pathway. Collectively, our results suggest that 40S assembly factors regulate the timely incorporation of ribosomal proteins by delaying specific folding steps in the 3′ major domain of the 20S pre-ribosomal RNA.


HiSeq library preparation and sequencing
Reverse transcription reactions (20 µl final volume) were performed using Superscript III (Invitrogen), 30-70 fmol of purified ribosomal RNA and 2.5 µM of RT oligo (Supplementary Table S4). Samples were incubated at 45°C for 30 min. Subsequently, 10 U Exonuclease I and 25 U RNase If (New England Biolabs) were added and the reactions were incubated at 37°C for 30 min.
The cDNAs were subsequently phenol/chloroform extracted, ethanol precipitated, resuspended in water and ligated to a 5' adapter sequence (see Supplementary Table S4) using CircLigase II (EpiCentre) according to the manufacturer's instructions. After PCR amplification, DNA products were resolved on 2% MetaPhor agarose gels (Lonza) and 150-800 bp fragments were gel-purified using the MinElute kit (Qiagen) according to the manufacturer's instructions. Purified libraries were analyzed and quantified on a 2100 Bioanalyzer (Agilent) using a High-Sensitivity DNA assay. Individual libraries were pooled based on concentration and barcoding, and paired-end sequencing was performed on a HiSeq 2000 system by Edinburgh Genomics (Edinburgh, UK) and BGI (Hong Kong).

Sequencing data analyses
Raw data processing was carried out using tools from the pyCRAC software package version 1.1.9 (https://bitbucket.org/sgrann/pycrac; (6)). The pySolexaBarcodeFilter tool was used to split reads based on barcoded indices. The pyFastqDuplicateRemover tool was used to remove potential PCR duplicates using random nucleotide information in the 5' adapter sequence. Reads were mapped to the Saccharomyces cerevisiae 35S gene (RDN37-1) using Novoalign version 2.07 (www.novocraft.com). Only properly paired reads that mapped to a single position and completely overlapped with the 20S or 18S reference sequence were considered for further analyses.
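As a rough illustration of duplicate removal with random adapter nucleotides, the sketch below collapses reads that share both the random nucleotides and the insert sequence. It is not the pyCRAC implementation; the `(random_barcode, sequence)` tuple layout and the function name `collapse_duplicates` are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical read representation: (random adapter nucleotides, insert sequence).
Read = Tuple[str, str]


def collapse_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen for each (random barcode, sequence) pair.

    Reads with the same sequence but different random nucleotides are
    treated as independent molecules and are all retained.
    """
    seen = set()
    unique: List[Read] = []
    for barcode, seq in reads:
        key = (barcode, seq)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Identical sequences that carry different random nucleotides are kept, since they most likely derive from distinct cDNA molecules rather than from PCR amplification of a single one.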
Read counts for the 18S and 20S-coding sequences, generated using pyPileup, were used to calculate RT drop-off rates.
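The text does not spell out how the drop-off rate is defined. A minimal sketch, assuming that the rate at each position is the number of cDNAs terminating there divided by the number of reads covering that position, could look as follows. The function name and the input layout (two equally long lists of counts) are hypothetical.

```python
from typing import List, Sequence


def dropoff_rates(dropoffs: Sequence[int], coverage: Sequence[int]) -> List[float]:
    """Per-position RT drop-off rate, assumed here to be drop-offs / coverage.

    Positions with zero coverage are given a rate of 0.0 (an arbitrary
    choice for this sketch; they could equally be masked out).
    """
    if len(dropoffs) != len(coverage):
        raise ValueError("dropoffs and coverage must have the same length")
    return [d / c if c > 0 else 0.0 for d, c in zip(dropoffs, coverage)]
```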

The two-channel Poisson Expectation Maximization (TCP EM) algorithm
The ChemModSeq protocol produces two channels of data: one series of read counts per nucleotide position on the rRNA for the chemically modified sample, and another for the control. In both channels, there are nucleotide positions where the polymerase is more likely to drop off (high) and positions where drop-off is less likely (low). Assuming that the high and low drop-off rates are approximately constant along the rRNA, we assigned each position to one of three categories: high drop-off in both channels (no assignment), low drop-off in both channels (unmodified), or high drop-off in the modified channel and low drop-off in the control (modified). The drop-off rates (λ1, λ2) and the probabilities of a position belonging to a category (p_i1, p_i2, p_i3, p_i4) were used to calculate the probability of the observed number of drop-offs (d_i) in the modified and control channels at position i, given the observed read count (c_i) and the (inferred) values of λ1 and λ2, using a Poisson model. We calculate the likelihood as the product of these probabilities over all positions and all categories. For example, for the first category, where the drop-off rate in both channels is λ1, we compute the Poisson probability of the observed drop-offs in each channel using λ1 as the rate.
The assignment of positions to categories and the calculation of λ1 and λ2 are performed iteratively using an expectation maximisation algorithm that maximises the likelihood of the data (7). In practice, the TCP EM algorithm requires, on average, 22 iterations to converge (between 5 and 52), and on convergence the assignment of positions to categories is crisp: we use thresholds of 1.0 and 0.9 to decide class membership (from the range 0-1).
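The iteration described above can be sketched as a small Poisson mixture model. This is an illustrative reconstruction under stated assumptions, not the published TCP EM code: it assumes that the expected number of drop-offs at a position is the rate multiplied by the read count, that λ1 is the high rate and λ2 the low rate, that the category weights are shared across positions, and that read counts are greater than zero. It covers only the three categories listed in the text. All function and variable names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (modified-channel rate, control-channel rate); index 0 = high, 1 = low.
CATEGORIES = [(0, 0), (1, 1), (0, 1)]
CATEGORY_NAMES = ["no assignment", "unmodified", "modified"]


def log_poisson(d: int, mu: float) -> float:
    """Log probability of observing d events under a Poisson with mean mu."""
    if mu <= 0.0:
        return 0.0 if d == 0 else float("-inf")
    return d * math.log(mu) - mu - math.lgamma(d + 1)


def two_channel_em(
    d_mod: Sequence[int], c_mod: Sequence[int],
    d_ctl: Sequence[int], c_ctl: Sequence[int],
    init_rates: Tuple[float, float] = (0.1, 0.01),
    max_iter: int = 200, tol: float = 1e-9,
) -> Tuple[List[float], List[float], List[List[float]]]:
    """Return (rates, category weights, per-position responsibilities)."""
    n, k_cat = len(d_mod), len(CATEGORIES)
    rates = list(init_rates)
    weights = [1.0 / k_cat] * k_cat
    resp: List[List[float]] = []
    prev_ll = float("-inf")
    for _ in range(max_iter):
        # E-step: posterior probability of each category at each position.
        resp, ll = [], 0.0
        for i in range(n):
            logs = [
                math.log(weights[k])
                + log_poisson(d_mod[i], rates[a] * c_mod[i])
                + log_poisson(d_ctl[i], rates[b] * c_ctl[i])
                for k, (a, b) in enumerate(CATEGORIES)
            ]
            top = max(logs)
            exps = [math.exp(x - top) for x in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
            ll += top + math.log(total)
        # M-step: weights are mean responsibilities; each rate is the
        # responsibility-weighted ratio of drop-offs to read counts.
        weights = [max(sum(r[k] for r in resp) / n, 1e-12) for k in range(k_cat)]
        num, den = [0.0, 0.0], [0.0, 0.0]
        for i in range(n):
            for k, (a, b) in enumerate(CATEGORIES):
                num[a] += resp[i][k] * d_mod[i]
                den[a] += resp[i][k] * c_mod[i]
                num[b] += resp[i][k] * d_ctl[i]
                den[b] += resp[i][k] * c_ctl[i]
        rates = [max(num[j] / den[j], 1e-12) if den[j] > 0 else rates[j]
                 for j in range(2)]
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return rates, weights, resp


def classify(position_resp: Sequence[float], threshold: float = 0.9) -> Optional[str]:
    """Name of the category whose probability reaches the threshold, else None."""
    best = max(range(len(position_resp)), key=lambda k: position_resp[k])
    return CATEGORY_NAMES[best] if position_resp[best] >= threshold else None
```

The `threshold` default of 0.9 mirrors one of the two thresholds quoted in the text; how the two thresholds are combined in the original method is not described here, so `classify` applies a single cut-off.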

Sample preparation and quantitative label-free LC-MS
In-solution digest was performed in a similar manner as described previously (8). Nano-UPLC-MS/MS analysis was performed using an on [...]

Shown are average drop-off rates (n ≥ 2) for nucleotides called modified by the TCP EM algorithm in the 18S rRNA. Solvent accessibilities were calculated using the yeast 80S crystal structure data (10) using a probe size of 1.4 Å².
The R² value indicates the correlation coefficient between the solvent accessibility and drop-off rate.
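How the R² value was obtained is not stated. A minimal sketch, assuming it is the squared Pearson correlation between per-nucleotide solvent accessibility and average drop-off rate, is shown below; the function name `r_squared` is hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```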

Figure S4. Rio1-HTP transiently interacts with pre-40S complexes.
Extracts prepared from Rio1-HTP and Rio1-TAP strains were fractionated by sucrose gradient (10%-50%) centrifugation. Twenty fractions were manually collected and analyzed by western blot using antibodies that recognize the [...]

[...] Table S4). The analyses showed that 20S pre-rRNA purified from these Nob1-depleted complexes contained all the hallmarks of late pre-40S complexes: high levels of acp modification at U1191 and a reduction in the flexibility of nucleotides in H37 and H35. This indicates that late restructuring steps can take place in the absence of Nob1.