Structural transitions in the RNA 7SK 5′ hairpin and their effect on HEXIM binding

Abstract

7SK RNA, as part of the 7SK ribonucleoprotein complex, is crucial to the regulation of transcription by RNA polymerase II, via its interaction with the positive transcription elongation factor P-TEFb. The interaction is induced by binding of the protein HEXIM to the 5′ hairpin (HP1) of 7SK RNA. Four distinct structural models have been obtained experimentally for HP1. Here, we employ computational methods to investigate the relative stability of these structures, transitions between them, and the effects of mutations on the observed structural ensembles. We further analyse the results with respect to mutational binding assays, and hypothesize a mechanism for HEXIM binding. Our results indicate that the dominant structure in the wild type exhibits a triplet involving the unpaired nucleotide U40 and the base pair A43-U66 in the GAUC/GAUC repeat. This conformation leads to an open major groove with enough potential binding sites for peptide recognition. Sequence mutations of the RNA change the relative stability of the different structural ensembles. Binding affinity is consequently lost if these changes alter the dominant structure.

Comparison of the fluctuations of the experimental structure Exp2 for a fully atomistic MD simulation. Left: superposition of simulated structures for the model with protonated C71, C74 and A77; right: superposition of simulated structures for the model with all neutral bases.

S2.1 Equilibration

The following protocol was used for both standard MD simulations and for H-REX simulations. All the runs were performed using Gromacs 5.1.2 [2]. Each RNA system was first neutralized with potassium cations, and then solvated in a cubic box with 6 nm sides containing a 0.2 M KCl solution. All bonds involving hydrogen atoms were kept rigid using the LINCS algorithm [3]. Temperature was maintained at 300 K using the stochastic velocity rescaling algorithm with a time constant of 0.1 ps [4], and pressure was fixed at 1 bar using the Parrinello-Rahman barostat [5] with a time constant of 0.5 ps. A cut-off of 1.1 nm was employed for van der Waals and real-space electrostatic interactions, while long-range electrostatics was treated using PME [6] with a Fourier grid spacing of 1 Å⁻¹. All simulations were then equilibrated as follows: first, the potential energy was minimized until convergence; the solvent molecules were then relaxed for 80 ps at the target temperature, while the macromolecule was held fixed. The whole system was then heated to 300 K over 400 ps, and later equilibrated for another 400 ps at 300 K before subsequent production runs were performed.

Figure S2: Comparison of the RMSD (top) and the overall bending (bottom) of the experimental structure Exp2 for a fully atomistic MD simulation of the protonated and the neutral system.
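As a rough check on the solvation step of the equilibration protocol, the number of KCl ion pairs implied by a 0.2 M solution in a cubic box with 6 nm sides can be estimated from the box volume alone. This is a minimal sketch, not the setup procedure actually used: the function name `ion_pairs` is hypothetical, and the estimate ignores the volume excluded by the RNA and the extra potassium ions added for neutralization.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def ion_pairs(conc_molar: float, box_side_nm: float) -> float:
    """Estimate the number of salt ion pairs in a cubic box.

    Assumption: the whole box volume is available to the solution
    (the solute volume is neglected).
    """
    # 1 nm = 1e-8 dm, and 1 dm^3 = 1 litre
    volume_litres = (box_side_nm * 1e-8) ** 3
    return conc_molar * AVOGADRO * volume_litres


n_pairs = ion_pairs(0.2, 6.0)
print(f"Estimated KCl ion pairs: {n_pairs:.1f}")
```

Under these assumptions the estimate comes to about 26 ion pairs, in addition to the neutralizing potassium cations.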

For selected systems, enhanced sampling of the RNA conformational space was achieved using a Hamiltonian replica exchange scheme. The REST2 [7-9] algorithm was employed, as it can be successfully applied to biomolecules in explicit solvent. Instead of performing simulations of several replicas at different physical temperatures, as in the more standard temperature replica exchange approach, in the REST2 approach only the Hamiltonian of the biomolecule is rescaled, and the number of replicas thus scales with the size of the biomolecule, and not with that of the entire simulation box. Details about this algorithm and its application to biomolecules can be found elsewhere [7-9]. Briefly, in a given replica n, all RNA-RNA interactions are rescaled by $\lambda_n$ and all RNA-solvent interactions by $\sqrt{\lambda_n}$, with $0 < \lambda_n \le 1$. In our REST2 setup, we employed 24 replicas and a distribution of $\lambda_n$ such that for the n-th replica (n from 1 to 24) $\lambda_n = (1/2)^{n/24}$.
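The scaling ladder described above can be tabulated with a few lines of Python. This is an illustrative sketch only: the function name `rest2_lambdas` is hypothetical, and the "effective temperature" column uses the usual REST2 interpretation T_eff = T_0 / λ with T_0 = 300 K, which is an assumption here rather than a quantity reported in the text.

```python
import math


def rest2_lambdas(n_replicas: int = 24, base: float = 0.5) -> list[float]:
    """Return lambda_n = base**(n / n_replicas) for n = 1..n_replicas."""
    return [base ** (n / n_replicas) for n in range(1, n_replicas + 1)]


T0 = 300.0  # K, reference temperature (assumed equal to the MD thermostat setting)
lambdas = rest2_lambdas()

for n, lam in enumerate(lambdas, start=1):
    rna_rna_scale = lam                 # factor applied to RNA-RNA interactions
    rna_solvent_scale = math.sqrt(lam)  # factor applied to RNA-solvent interactions
    t_eff = T0 / lam                    # effective solute temperature (assumed interpretation)
    print(f"replica {n:2d}: lambda={rna_rna_scale:.4f} "
          f"sqrt(lambda)={rna_solvent_scale:.4f} T_eff={t_eff:6.1f} K")
```

With this geometric ladder the ratio between neighbouring values of λ is constant, and the most strongly scaled replica (n = 24) has λ = 1/2.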

REST2 simulations were propagated for 200 ns using Gromacs 5.

Figure S3: Projection of the path sampling database of native sequences onto the variables ξ and the angle U44-A65-U63, which describes the inward or outward position of U63. Structures included in the heat plots are the minima accessible from M1 or M2 within a threshold of 15 kcal/mol from the bottom of the respective free energy funnels. This threshold means that we only include minima that can be reached from the respective structure on a timescale of milliseconds [40]. The projection only represents the density of minima; no occupation probabilities or energies are used to weight the result.

Figure S4: Hydrogen bonds formed between peptide residues and RNA. We measure the presence of a bond as a percentage over the full MD trajectory (dark blue 100%, white 0%). Dashed boxes highlight the interactions of U40 (green) and of the GAUC/GAUC motif (red).
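The link between the 15 kcal/mol threshold in the caption of Figure S3 and a millisecond timescale can be made plausible with a back-of-the-envelope transition state theory estimate. This sketch is not the procedure of the cited reference: it assumes a single barrier of 15 kcal/mol, a temperature of 300 K, and the simple Eyring prefactor k_B T / h; the function name `eyring_timescale` is hypothetical.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)
K_B_SI = 1.380649e-23    # Boltzmann constant in J/K
PLANCK = 6.62607015e-34  # Planck constant in J s


def eyring_timescale(barrier_kcal: float, temperature_k: float = 300.0) -> float:
    """Mean waiting time (s) to cross a barrier, assuming k = (kB T / h) exp(-dE / kB T)."""
    prefactor = K_B_SI * temperature_k / PLANCK  # attempt frequency in 1/s
    rate = prefactor * math.exp(-barrier_kcal / (K_B_KCAL * temperature_k))
    return 1.0 / rate


tau = eyring_timescale(15.0)
print(f"Estimated crossing time for 15 kcal/mol at 300 K: {tau * 1e3:.1f} ms")
```

Under these assumptions the estimate is of the order of 10 ms, consistent with the millisecond range quoted in the caption; a different prefactor would shift the number but not the order-of-magnitude argument.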

S5 Mutations: Data and figures
Here, we present more detailed results for the mutants. Figures S6, S7 and S8 show the corresponding energy landscapes. In Table S1 a summary of structural descriptors is given, with labels indicating their position in the disconnectivity graphs provided.
Figure S6: Potential energy disconnectivity graphs for the U40C, U40C+U41C and U41C mutants of the RNA 7SK HP1 hairpin. The labels A to G correspond to distinct structural ensembles, which are characterised in detail in Table S1. This figure otherwise corresponds to Figure 7 in the main text.

Figure S7: Potential energy disconnectivity graphs for the delU63, the doubleCG and doubleUA mutants of the RNA 7SK HP1 hairpin. While the deletion of U63 still allows for the formation of some M2 structures, the changes in the 7SK motif in the doubleUA and doubleCG mutants, which destabilise the T2 triplet, lead to an increase in the number of extended structures. The labels A to D correspond to distinct structural ensembles, which are characterised in detail in Table S1.
Figure S8: Disconnectivity graphs for the potential energy landscapes of the A39G, A39G-U68C, A39U and A39U-U68A mutants. The labels A to F correspond to distinct structural ensembles, which are characterised in detail in Table S1.

Table S1: Summary of discrete path sampling results for all mutants. For distinct funnels on the energy landscapes shown in Figs. S6-S8 we report: the potential energy difference between the global minimum and the funnel bottom, ∆V gmin; the depth of the funnel (potential energy from the lowest minimum in the funnel to the lowest energy transition state connecting it to the global minimum), ∆E funnel; the classification as M1, M2 or E structures, indicating whether we detect the artificial kink of the lowest stem folding back on the rest of the structure (Ek); the position of U63; the average number of hydrogen bonds made by the nucleotides 40 and 41, including bonds made by the base as well as by the phosphate group; and the percentage of structures exhibiting a U40-X68 base pairing.

Figure S9: Projection of the path sampling database of native and mutated sequences onto the variables describing the behaviour of U40 and U41 with respect to the triplets T1 and T2 (T2: angle U40-U41-G42 ≤ 40°; T1: angle U40-U41-G42 ≥ 160°) and the behaviour of U63 (inward: angle U44-A65-U63 ≤ 90°; outward: angle U44-A65-U63 ≥ 90°). These projections simply present densities of minima with respect to the two chosen parameters. These plots are not weighted by occupation probabilities or energies.

Table S2: Temporal trajectories: autocorrelation times (in ns) of the variable ξ from the 24 reconstructed continuous trajectories of the H-REX for the wild-type system, starting from either M1 or M2 as initial configurations. The last line gives the average and the standard deviation over all replicas.

Figure S11: Block analysis of the distribution of ξ for the full Hamiltonian trajectories starting from M1 (top) and M2 (bottom). The trajectory has been divided into 5 blocks of 40 ns each.
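A block analysis of this kind can be sketched as follows: the time series of ξ is cut into five contiguous 40 ns blocks, and the distribution of ξ is histogrammed separately in each block so that the blocks can be compared. The example is illustrative only. The ξ values are synthetic random numbers, the frame spacing of 0.1 ns and the histogram range are assumptions, and the helper names are hypothetical.

```python
import random
from statistics import mean


def split_into_blocks(series, n_blocks):
    """Split a time series into contiguous blocks of equal length (remainder dropped)."""
    size = len(series) // n_blocks
    return [series[i * size:(i + 1) * size] for i in range(n_blocks)]


def normalised_histogram(values, lo, hi, n_bins):
    """Histogram of values in [lo, hi), returned as fractions that sum to 1."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


random.seed(0)
frames_per_ns = 10  # assumed output frequency, not taken from the text
xi = [random.gauss(0.0, 1.0) for _ in range(200 * frames_per_ns)]  # synthetic 200 ns series

blocks = split_into_blocks(xi, 5)  # 5 blocks of 40 ns each
histograms = [normalised_histogram(b, -4.0, 4.0, 16) for b in blocks]

for i, (block, hist) in enumerate(zip(blocks, histograms), start=1):
    print(f"block {i}: {len(block) / frames_per_ns:.0f} ns, "
          f"mean xi = {mean(block):+.3f}, most populated bin = {hist.index(max(hist))}")
```

If the per-block histograms agree with each other within statistical noise, the distribution of ξ can be regarded as converged on the timescale of a single block; systematic drift from block to block indicates that the sampling has not yet equilibrated.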