Reweighting of molecular simulations with explicit-solvent SAXS restraints elucidates ion-dependent RNA ensembles

Abstract Small-angle X-ray scattering (SAXS) experiments are increasingly used to probe RNA structure. A number of forward models that relate measured SAXS intensities and structural features, and that are suitable to model either explicit-solvent effects or solute dynamics, have been proposed in the past years. Here, we introduce an approach that integrates atomistic molecular dynamics simulations and SAXS experiments to reconstruct RNA structural ensembles while simultaneously accounting for both RNA conformational dynamics and explicit-solvent effects. Our protocol exploits SAXS pure-solute forward models and enhanced sampling methods to sample an heterogenous ensemble of structures, with no information towards the experiments provided on-the-fly. The generated structural ensemble is then reweighted through the maximum entropy principle so as to match reference SAXS experimental data at multiple ionic conditions. Importantly, accurate explicit-solvent forward models are used at this reweighting stage. We apply this framework to the GTPase-associated center, a relevant RNA molecule involved in protein translation, in order to elucidate its ion-dependent conformational ensembles. We show that (a) both solvent and dynamics are crucial to reproduce experimental SAXS data and (b) the resulting dynamical ensembles contain an ion-dependent fraction of extended structures.


INTRODUCTION
RNA molecules accomplish a plethora of functional roles in the cell, their function being dictated not only by their sequence and structure but also, to a large extent, by their dynamical behavior (1,2). Molecular dynamics (MD) simu-lations have nowadays reached the status of a standard tool to explore the dynamics of biomolecular structure at the atomistic level (3,4). Nevertheless, inaccuracies in the force fields and limitations in terms of the accessible timescales often make their capability to reproduce and to predict experimental results limited (4). The combination of MD simulations with experimental data is thus emerging as a robust asset to characterize the conformational dynamics of relevant biomolecules (5)(6)(7)(8)(9) including RNAs (10)(11)(12)(13)(14)(15). Here, MD simulations can be seen as a powerful tool that complements experimental data making it possible to add dynamical information to experiments that report ensemble averages. This can be even more important when using lowresolution experiments such as small-angle X-ray scattering (SAXS) (16,17). Here, the synergy between MD and experiment allows faithful structural ensembles at atomistic resolutions to be generated (12,13,(18)(19)(20)(21)(22)(23). SAXS experiments are particularly valuable in capturing the structural impact of changes in the ionic conditions, that are highly relevant for RNA but poorly described by force fields (4). Importantly, in the small-angle regime, most of the contribution to the overall SAXS is expected to originate from the solute. Thus, from a computational standpoint, it is common practice to compute SAXS spectra using forward models, i.e. equations to back-calculate the experiment from the simulated structures, based on the solute atomic coordinates only and including corrections to implicitly account for the solvent (24)(25)(26). This choice reflects a compromise with the intensive effort that is typically involved to include the solvation explicitly in the computation. In recent years, methods have been devised that allow computing SAXS spectra including the solvent contribution through relatively efficient implementations, aiming at predicting SAXS spectra as accurately as possible (18,20,(27)(28)(29)(30). This can be particularly critical when dealing with highly charged biomolecules, such as RNA, whose effect on the surrounding solvent and on the ionic cloud can be sizable up to a distance of several nanometers (31,32). A possible route to combine MD and experimental data is to enforce the refer-ence experimental data during the MD simulations. However, in the context of SAXS data, this option is hindered by practical limitations in the forward models. Indeed, whereas the on-the-fly estimate of the SAXS spectra is affordable for the pure solute, explicit solvent estimators have been used sparsely in this context (20).
In this work, we introduce a protocol exploiting puresolute forward models (13) and enhanced sampling (33) during MD simulations, to favor sampling of an heterogenous ensemble, without explicitly using the experimental data on-the-fly. The ensemble is then reweighted (9) to match experimental data using accurate explicitsolvent forward models (18). We apply this framework to the GTPase-associated center (GAC), a 57-nucleotides-long RNA molecule of the 23S ribosomal subunit that is involved in protein translation (34)(35)(36). Recently, SAXS experiments reported on GAC structural flexibility in response to different ionic conditions in the buffer solution, noticing that Mg 2 + can stabilize the folded state, while K + favored less compact and more extended conformations (37). Through our protocol, we notice that explicit-solvent SAXS spectra are necessary to correctly reconstruct the ion-dependent structural ensembles and to obtain radii of gyration through Guinier fit that are compatible with the experiments (Figure 1). In particular, in the case of K + , we observed that a mixture of compact and extended structures is necessary to generate a structural ensemble that is in agreement with the experimental SAXS spectra.

Simulation details
All systems were prepared from the available crystal structure of GAC in its folded state (PDB ID: 1HC8 (38), see Figure 2) after removing the bound protein. We notice that a recent RNA-only crystal structure is virtually identical (39). The system was described using the AMBER force field for nucleic acids (40)(41)(42), the 4-point optimalpoint-charge (OPC) model for water (43), and compatible ion parameters (44,45). 4-point water models have been reported to improve the accuracy of simulated hydration effects in molecular systems. This is particularly critical for biomolecules exploring heterogeneous ensembles including more extended structures, where an accurate representation of solute-water interactions is crucial to avoid overly compact conformations (20,(46)(47)(48). The OPC model has been shown to improve the agreement of simulation and experiment for unstructured RNA tetramers (49,50). All simulations were performed using GROMACS 2018.4 (51). Four ionic conditions were considered with an increasing number of Mg 2 + ions, namely: (i) none, (ii) crystallographic Mg 2 + , (iii) crystallographic Mg 2 + plus half the amount needed to neutralize the system and (iv) crystallographic Mg 2 + plus the amount needed to neutralize the system (a representative sketch of the four systems is provide in Figure 3A). In all cases, the crystal K + was retained, a 100 mM concentration of KCl was set for the buffer, and neutralization, where needed, was achieved by adding K + accordingly. The systems were initially minimized applying soft position restraints on RNA and water heavy atoms and on the crystal ions. A multi-step equilibration was then conducted: three short simulations lasting 200 ps at 100, 200 and 300 K in the NVT ensemble using the velocity-rescaling thermostat (52), with soft position restraints on RNA heavy atoms and on crystal ions; in further 900 ps in NPT ensemble using the Parrinello-Rahman barostat (53), these restraints were gradually removed to relax the system. Production runs were performed in the same ensemble. Short plain MD simulations (10 ns) were run using a large rhombic dodecahedron simulation box containing approximately 170 000 atoms, with edges distant 3 nm from all RNA atoms, using soft position restraints on solute heavy atoms to exclude solute dynamics effects. Longer plain MD simulations (1 s) were performed using a smaller rhombic dodecahedron simulation box containing approximately 100 000 atoms, with edges distant 2 nm from all RNA atoms, with no position restraints, for the ionic conditions 1 and 4 (Figure 3 A) described above.

Enhanced sampling
Enhanced-sampling MD simulations (33) were conducted on the system with no Mg 2 + ( Figure 3A) using the larger rhombic dodecahedron simulation box (∼170 000 atoms). The Hamiltonian replica exchange (HREX) method (54) with scaling applied to selected residues (55) (non-stem residues, Figure 2, left panel in grey) was used to relax the contacts of regions outside of the stems. The lambda scaling factors were in the range 0.7-1.0 and distributed on 16 replicas according to a geometric progression. To ensure charge neutrality at each replica, the missing charge was spread over the dummy atom of all water molecules. Exchanges between neighboring replicas were attempted every 400 MD steps. HREX was combined with metadynamics (56)(57)(58), where two collective variables (CVs) were biased: the Ratio between peak (q = 0.1Å −1 ) and shoulder (q = 0.2Å −1 ) of the pure-solute SAXS spectrum in the Kratky form computed using the MARTINI model (13) and an additional variable (Diff) purposely designed to estimate the degree of formation of tertiary contacts in the RNA molecule. Diff is defined as the root mean square of the G-vectors introduced in (59) and can be computed in practice taking the rootsquare difference of the eRMSD (59,60) with respect to an arbitrary structure with no contacts formed, computed using either the whole sequence or only the stem regions (see Supplementary Figure S1). Gaussians with height 2.09 kJ mol −1 and of 0.035 and 0.05 were deposited every 400 steps with a bias factor of 10 (61). In order to decrease the computational overhead of computing collective variables, the bias was applied every second step (62). Stem regions were restrained setting an upper wall on the eRMSD with respect to the initial state at 0.7 using a force constant of 41.84 kJ mol −1 . Upper (at 25Å) and lower (at 16Å) wall restraints were also applied to the R g with force constant 41.84 kJ mol −1 nm −2 . Finally, an upper wall on the Ratio CV was placed at a value of 2.7 with a force constant of 41.84 kJ mol −1 . The whole setup was achieved through the open-source, community-developed PLUMED library (63,64) version 2.5. The simulation was conducted in the NVT ensemble for 180 ns/replica. Weights were computed a posteriori using the final bias (65). Representative sam- Figure 1. Schematic pipeline of the protocol introduced in this work. Enhanced-sampling MD simulations with explicit solvent are initially employed to sample RNA conformations of different structural compactness. At this stage, pure-solute forward models for SAXS are used as collective variables in a metadynamics protocol to guide the sampling, with no information from experiments. We then compute a posteriori the full SAXS spectra for the sampled structures, to be used as input in the subsequent reweighting procedure. Notably, the SAXS spectra can either be computed from the pure solute only (top orange arrow) or including the explicit solvent (bottom sky-blue arrow). Experimental data are finally enforced through the maximum entropy principle to reweight the original ensemble (prior, blue rounded rectangle) and thus identify the least-modified ensemble (posterior, green rounded rectangle) that is in agreement with the experimental reference. Remarkably, we observe that the inclusion of the solvent in the computation of the SAXS spectra is a critical factor to achieve a successful reweighting (bottom green arrow). pled structures were selected taking advantage of the Quality Threshold (QT) clustering algorithm (66).

Backcalculation of SAXS spectra
The SAXS spectra for all the structures sampled during the MD simulation were computed in the q range 0-0.3Å −1 with a pace of 0.01Å −1 . Pure-solute spectra for the short (10 ns), long (1 s), and enhanced sampling simulations were computed on the all-atom structures with PLUMED, relying on a MARTINI bead representation of the system as recently introduced (13). Notably, the method has been shown to produce coincident results in the small angle regime that is relevant here (0 < q < 0.3Å −1 ) when compared with the all-(solute)-atom calculation (see also Supplementary Figure S2). Crysol spectra were computed for the short simulations using a maximum order of harmonics of 20 and default parameters, with the software version 2.8.4 (24). Waxsis spectra were computed for the short simulations with an envelope constructed at a distance of 7Å from the RNA molecule and using default parameters, as implemented in the modified version of GRO-MACS 4.6.2 (29). To compute the SAXS spectra with Waxsis, 1000 frames from the 10 ns simulations were used, along with 1000 frames from an independent pure-solvent simulation with the same salt concentrations and lasting 5 ns. Capriqorn spectra were computed for the short (10 ns), long (1 s), and enhanced sampling simulations through the method introduced by Köfinger and Hummer (18) implemented in the Capriqorn software (https://github.com/ bio-phys/capriqorn) version 1.0.0. The method first computes separately the radial distribution function from a pure-solvent simulation with the same salt concentrations, for which purpose we used the same 5 ns pure-solvent simulations as for Waxsis. As the excluded volume contribution was negligible in the q range considered herein, we did not include it in the computation of the spectra. For the short and long simulations, a sphere geometry with radius 40Å was used along with a shell for solvent matching of width 7 A; for the enhanced sampling simulations, a sphere radius of 54Å and shell width of 10Å was employed (see Supplementary Figure S3 for a schematic depiction). R * g was computed from the SAXS spectra through the standard Guinier fit procedure. Specifically, the linear fit was conducted in the q range 0.02-0.06Å −1 of the SAXS spectra in the ln I vs q 2 form, using scipy (67); R * g is then computed as √ −3m, where m is the slope of the fitted line.

Ensemble reweighting
We then used ensemble reweighting to enforce the experimental spectrum for four chosen values of q. Two values (q =0.03 and q =0.05Å −1 ) were chosen inside the range used to compute R * g through Guinier fit, while other two values (q =0.1 and q =0.2Å −1 ) correspond to peak and shoulder of the SAXS spectrum in the Kratky form, respectively, which are more sensitive to variations in the structural compactness of the RNA solute. The standard Schematics representing the four tested setups. All setups were used in short (10 ns) simulations with restrained solute. Setups 1 and 4 were also used in longer (1 s) unrestrained simulations. Setup 1 was used in enhanced sampling simulations. (B) Comparison of SAXS spectra computed using different available methods on a 10 ns simulation (K + only, no Mg 2 + ) with soft restraints on RNA heavy atom positions. The spectra are displayed in the Kratky form as Iq 2 vs q and normalized by their peak (q = 0.1Å −1 ) values. (C) Probability density function for the geometric gyration radius R g and RMSD from the starting conformation for a 1 s long plain MD simulation (K + only, no Mg 2 + ), with explicit solvent, initiated from the crystal structure (PDB ID: 1HC8). (D) SAXS spectrum for the 1 s long plain MD run computed from the pure solute. The frame-by-frame spectra are shown as a light orange shade and display a root-mean-square deviation from the resulting average, in orange, of 0.01 in the displayed units. The colored dashed line indicates the Guinier fit performed at lowq values of the average spectrum to compute the effective gyration radius R * g . In the top right corner of the plot, the small sketch indicates that the spectra are computed from the pure solute (RNA, in gray) only. (E) SAXS spectrum from the 1 s plain MD run computed including the explicit solvent. The frame-by-frame spectra are shown as light sky-blue shade and display a root-mean-square deviation from the resulting average, in skyblue, of 0.03 in the displayed units. In the top right corner, the small sketch indicates that the spectra are computed from the whole system (RNA, in gray, water, in sky-blue, and ions, in red). Spectra in panels D) and E) are normalized by the peak of the average spectrum. maximum-entropy reweighting procedure (5,9) consists in determining a set of Lagrangian multipliers i so that averages computed along the entire trajectory with weights pro- k B T are equal to the reference values s exp i . Here, s i (t) is the value of the ith observable in the tth frame, V b (q(t)) is the metadynamics bias at the final time recomputed for the coordinates of the t-th frame, and k B T the thermal energy. However, when used to directly enforce SAXS intensities, this procedure requires an arbitrary scaling factor to be fixed a priori or estimated during the analysis (21). To overcome this issue, we here enforce the weighted average of the quantity to be equal to zero, where I(q, t) is the SAXS intensity at wavevector q for the t-th frame. By straightforward manipulation it can be seen that s(q, t) = 0 implies I(q, t) ∝ I exp (q). Since by construction i s(q, t) = 0, the resulting ensemble is invariant with respect to a uniform shift of the four multipliers i . It is then possible to arbitrarily set their sum to zero, i.e. i i = 0. The reweighting was performed using a custom python script taking advantage of scipy minimization algorithms (67). Two sets of experimental datapoints reported in Ref. (37) were used to reweight our samples: one set was obtained using 100 mM KCl concentration, and the other one contained 100 mM KCl and 1 mM MgCl 2 (species concentrations: 100 mM K + , 102 mM Cl − and 1 mM Mg 2 + ). Both experiments also included 10 mM MOPSO as a buffer.
Since weights can be uneven, only the structures with the highest weights effectively contribute to the ensemble. An approximate estimate of the number of structures with a significant weight can be obtained by computing the Kish effective sample size, defined as

Error calculations
Statistical errors were computed with bootstrap (69), after dividing the enhanced sampling simulation in 10 blocks and the unrestrained plain MD simulations in 5 blocks. When computing the error on the SAXS spectra back-calculated from the reweighted simulation, the Lagrangian multipliers were kept constant. These errors thus report on the statistical reliability of the spectra due to the finite simulation length. Statistical errors on the population of extended structures were instead computed by fitting the Lagrangian multipliers again at every bootstrap iteration, so as to always enforce agreement with experiment. In order to model error in the experimental data, for each iteration of the bootstrap a new sample from the experimental datapoints in a range of values of q was used (see the Jupyter notebook at https://github.com/bussilab/saxs-md-gac for details). This procedure thus reports on the statistical reliability of the population due to both the finite simulation length and the experimental error.

Spectra from MD simulations of crystal structure
Short plain MD simulations with restraints on the RNA molecule, lasting 10 ns, were performed with varying ionic conditions (see Figure 3A) on GAC RNA starting from the crystal structure, where GAC is in its folded state. Specifically, in the four considered setups, the concentration of Mg 2 + ions was gradually increased. Soft position restraints were placed on the RNA heavy atoms to exclude relevant solute dynamics while guaranteeing adequate sampling for the solvent. SAXS spectra were then computed from the MD sampled structure through different available software, namely PLUMED (13,63), Crysol (24), WAXSiS (29) and Capriqorn (18). A substantial difference between these methods is how they include the contribution of solvent to the overall SAXS spectra (see Supplementary Figure S4 for a schematic depiction). In particular, while PLUMED purely relies on the solute coordinates and CRYSOL models the solvent implicitly, both WAXSiS and Capriqorn include explicitly the solvent species in the computation of the spectra. Consistently, a comparison of the predicted SAXS spectra (Figure 3 B) for the same simulation displays differences between the methods, which are more marked in the region at q >0.1Å −1 where the contribution of the solvent becomes more significant. Most notably, WAXSiS and Capriqorn, both including explicitly the solvent contribution, provide compatible results. Additionally, no remarkable dependence on the different ionic conditions explored was observed when comparing results from the different setups (see Supplementary Figure S5).
To investigate the role of RNA dynamics and potentially observe extended conformations, we performed long plain MD simulations of 1 s with no restraints (setups 1 and 4 in Figure 3 A). Interestingly, no remarkable structural changes were observed on this time scale (see Figure  3C), although the simulation with a pure K + buffer had a slightly larger displacement from the starting structure and a slightly larger gyration radius than the simulation including Mg 2 + (see Supplementary Figure S6). Average SAXS spectra from the 1 s simulations were then computed using the pure solute ( Figure 3D) or including the solvent contribution ( Figure 3E). It is worth noticing that, when using a method that explicitly takes into account the solvent contribution ( Figure 3E), the frame-by-frame spectra exhibit significant variations from the spectrum obtained by averaging over the whole trajectory. On the contrary, when a pure-solute method is employed ( Figure 3D), the frame-byframe spectra are closer to their average. Consistently, the fluctuations in the corresponding R * g , as computed through the Guinier fit procedure (see Materials and Methods) in the low-q values of the SAXS spectra, are wider when the solvent is included explicitly. Standard deviation of R * g was 0.1 A and 1.3Å for the pure solute and explicit solvent calculations, respectively. Most notably, for the same simulation, a greater value of R * g as computed from the full simulation is observed when including the solvent, namely 16.8Å from SAXS with solvent vs 16.2Å from pure-solute SAXS, with statistical errors <0.1Å in both cases.

Spectra from enhanced-sampling MD simulations
In order to reproduce SAXS experimental spectra associated with diverse ionic conditions, enhanced sampling methods (33) were used to generate conformations with a broad range of compactness. Representative sampled structures are given in the supplementary material. Figure 4 reports the free-energy surfaces (FES) reconstructed from enhanced sampling simulations started from the crystal conformation. The system was biased with metadynamics through one variable that favored disruption of tertiary contacts and another that guided the sampling towards more extended structures and relaxation was enhanced by a replica exchange protocol (see Materials and Methods).
We stress here that no information to guide towards the experiment was used in the sampling procedure and that only the no-Mg 2 + setup (setup 1 in Figure 3A) was simulated. The resulting FES displayed a unique global minimum in correspondence of the crystal conformation (star label in the leftmost panel of Figure 4). Correspondingly, more extended conformations were located in higher free-energy regions, up to about 25 kcal mol −1 from the minimum. SAXS spectra computed a posteriori using sets of frames with different compactness are also reported, both from the pure solute and the explicit solvent calculation (Figure 4, panels with SAXS spectra). For FES regions where GAC structure was significantly expanded compared to the initial state of the simulation, the shape of the SAXS spectra in the Kratky form changed remarkably. In all cases, a significant difference can be observed between the SAXS spectra computed from the pure solute and including the solvent explicitly (Figure 4, orange and sky-blue lines in the SAXS spectra).

Enforcing experimental spectra
We conducted a reweighting procedure to possibly identify GAC structural ensembles underlying reference experimental SAXS spectra among those collected in the enhanced sampling simulation. As a first step, the SAXS spectra were computed for each of the obtained structures. Importantly, the spectra were computed for both the pure solute and for the whole system, i.e., with the explicit inclusion of the solvent (Figure 4, orange and sky-blue lines). Subsequently, a reweighting procedure was conducted using the maximum entropy principle (5,9). In particular, the ensemble generated in the K + -only simulation was reweighted so as to match experimental SAXS spectra obtained in presence of Mg 2 + or only K + (Supplementary Figure S7). To this end, the agreement with the experiment was enforced in four q points of the spectrum that included the Guinier fit region and the peak and shoulder points of the Kratky plot. In this way, both the R * g and the shape of the Kratky plot were taken into account. Interestingly, while employing as prior observables the spectra computed with the contribution from the solvent allowed to identify posterior ensembles consistent with the experimental ones (Supplementary Figure S8), this was not the case when performing the procedure using the pure-solute spectra (Supplementary Figure  S9, Table S1, and corresponding spectra in the Jupyter notebook at https://github.com/bussilab/saxs-md-gac). This can be explained by the different distributions of the R * g in the prior ensemble ( Figure 5, left panels). While R * g values were confined in the range 15-24Å in the pure-solute case, the range explored was broader when the spectra included the solvent contribution. In other words, despite they were applied to the same pool of structures, the two different ways of computing the spectra (pure-solute vs solute + solvent) produced different effects in terms of the resulting R * g . In particular, R * g was greater than the corresponding R g computed from the only solute coordinates, with the effect being more marked on compact structures and becoming milder for more extended ones (Supplementary Figure S10). . Free-energy surface (FES) and SAXS spectra from the enhanced-sampling MD simulation. The FES is shown as a function of the biased collective variables, namely the ratio between peak (q = 0.1Å −1 ) and shoulder (q = 0.2Å −1 ) of the pure-solute SAXS spectrum in the Kratky form, and a variable (Diff) estimating the degree of formation of tertiary contacts in the RNA molecule (see the Materials and Methods section for its definition). The star indicates the crystal structure, from which the simulation was initiated. For representative regions of the FES (dashed rectangles), the corresponding SAXS spectra are shown (panels indicated by the dashed arrows). Specifically, both the average spectrum computed from the pure solute (RNA only) and from the whole system (RNA and solvent) are displayed in orange and sky-blue, respectively. The corresponding frame-by-frame spectra are also reported as light orange and light sky-blue shades. From left to right panels, their root-mean-square deviations from the corresponding averages are 0.01, 0.07, 0.14 for the pure solute and 0.03, 0.06, 0.12 for the whole system, in the displayed units. For each region, a representative structure of GAC is shown, color coded consistently with Figure 2. Figure 5. Probability distribution of R * g and SAXS spectra from the ensembles reweighted to match Mg 2 + experimental data (top panels) and K + experimental data (bottom panels). In the left panels, R * g is computed through a Guinier fit on the SAXS spectra of the original (prior) ensemble (dashed lines) and of the reweighted (posterior) ensemble (solid lines). Results for both the pure solute (orange lines) and the whole system (green lines) are shown. Note that both solid lines are based on the same reweighted ensemble, i.e. the one matching experiments and SAXS spectra computed including the solvent. In the right panels, the SAXS spectra from the reweighted ensembles are shown in the Kratky form and also reported are the corresponding R * g and the associated statistical error (in green). The statistical errors for selected q values of the spectra are displayed as error bars. The reference experimental value for R * g , which was computed from the experimental SAXS spectra (in light gray and violet for Mg 2 + and K + , respectively), is also reported in gray for comparison.
The refined distribution obtained when enforcing the Mg 2 + experimental data revealed a prevalence of compact structure complemented by a small population (about 1%) of extended structures. The population of extended structures was significantly larger (about 42%) when the refinement was done using K + experimental data ( Figure 5, upper and lower left panels, respectively). It is important to notice that this reweighting stage resulted in very low Kish sizes (68,70) (15.9 and 2.9 for Mg 2 + and K + , respectively, to be compared with 36 000 structures used in the analysis), indicating that only a very limited number of structures contributed to the final spectrum. This is reflected in a high statistical error for the final spectrum, as it can be appreciated in Figure 5, especially when using K + experimental data. This effect can be ascribed to a limited sampling of more extended GAC conformations in the enhanced sampling simulations. In order to quantify our confidence in the population of extended structures in absence of Mg 2 + , its statistical error was computed by enforcing agreement with experiments at every bootstrap iteration (see Materials and Methods). The resulting error was 16%, indicating that if the predicted compact and extended structures and the back-calculation of the SAXS spectra are assumed to be correct, their population can be determined with a relatively low error given the experimental spectra. The corresponding reweighted SAXS spectra were compatible with the experimental reference for Mg 2 + and K + ( Figure 5, upper and lower right panels, respectively). As expected based on the low Kish size, the agreement was poorer for the case of K + . Nevertheless, the predicted R * g values were in agreement with the experimental ones (compare green and gray values in the right panels of Figure 5).
We also tested if adding a rigid shift to the experimental data could increase the Kish size or result in a Guinier radius in better agreement with experiment (71). The analysis suggests that no shift is necessary (see Supplementary Figure S11).

DISCUSSION
In this work we use atomistic molecular dynamics simulations, coupled with enhanced sampling methods, to generate a conformational ensemble for GAC RNA. The resulting ensemble is then reweighted so as to enforce agreement with recently published SAXS data (37). SAXS spectra are back-calculated comparing approaches that include solvent effects to a different extent.
The introduced approach combines two crucial ingredients, namely conformational dynamics and solvent effects. The former is enhanced by using replica exchange and metadynamics acting on a proxy of the SAXS intensities, so as to encourage the exploration of extended structures. The latter is achieved by using an accurate a posteriori calculation of the spectra, where the presence of the solvent is PAGE 7 OF 10 Nucleic Acids Research, 2021, Vol. 49, No. 14 e84 explicitly modeled (18), in conjunction with the maximum entropy principle. The approach is summarized in Figure 1. We note here that all the steps involved in the procedure are conducted through freely available software. When enforcing experimental data obtained in presence of Mg 2 + , our procedure suggests a low population of extended structures to contribute to the spectrum. Albeit low, this contribution is necessary to explain the difference observed between the experimental SAXS data and the spectra back-calculated using the explicit solvent approach on the native structure. When enforcing experimental data obtained in absence of Mg 2 + , instead, the fraction of extended structures is significantly higher.
We first addressed the role of solvent contribution to the SAXS intensities. We thus ran long MD simulations starting from the crystal structure and analyzed them using a range of methods that neglect solvent contribution or include it at different levels of approximation and compare the back-calculated SAXS spectra with the available experimental data. From our results it emerges that (a) an explicit inclusion of the solvent has a measurable effect on the apparent radius of gyration of the molecule as obtained with the Guinier fit and (b) the overall shape of the spectrum is affected also at relatively small scattering vectors (q < 0.3 A −1 ). Although none of the employed methods results in a SAXS spectrum in agreement with the available experimental data, irrespective of the ionic condition in which the experiments were performed and the simulations were run, the explicit-solvent methods lead to values of the observed gyration radius that were closer to the experimental ones. This suggests that the role of solvent contribution in the estimation of SAXS spectra for nucleic acids should not be neglected. We hypothesize this to be related to the high charges that make the solvation shell structured and of critical importance.
We then addressed the role of RNA dynamics. Plain MD simulations in the s timescale with no restraints stably maintained the compact, crystal conformation of GAC and no alteration of the structure towards more extended states was observed, irrespective of the simulated ion conditions. This suggested that the disruption of the initial conformation would require remarkably extensive sampling. Indeed, the crystal structure comprises a complex network of tertiary contacts between the four separate stem regions of GAC, which overall results in a significant structural stability. We thus employed a combination of two enhanced sampling methods, namely metadynamics and replica exchange. The former was used to improve the sampling of extended conformation by biasing dedicated CVs. Specifically, a first CV was designed to quantify the amount of tertiary contacts in the structure, so as to facilitate its rupture and to explore different sets of contacts, while a second CV was designed to reflect salient features of the SAXS spectra (peak and shoulder of the Kratky form), so as to encourage the exploration of structures with heterogeneous spectra and thus of varying structural compactness. At the same time, the replica exchange method was exploited to soften energetic barriers, thus accelerating local reconformations.
It is important to underline that no experimental data were used while performing MD simulations, and that SAXS intensities used for the metadynamics simulations were estimated using a pure-solute coarse-grain approach, that can be effectively used on-the-fly. This was possible here since the estimated intensities were only used to enhance sampling, and not to reproduce the experimental data. We notice that also in (72) SAXS intensities calculated in a pure-solute approximation were used to enhance sampling. Interestingly, also in the extended structures we found discrepancies between the pure-solute calculation of the SAXS intensities and their explicit-solvent counterparts. The effect of the solvent is not trivial and cannot be reduced to an increased effective gyration radius. In fact, the estimated Guinier radius for extended structure appeared not to be dependent on the inclusion of the solvent.
Although enhanced sampling simulations were capable to generate extended structures with SAXS spectra compatible with experiment, the weight of these structures in the Boltzmann ensemble was very low. This happened even if our simulations were performed in absence of Mg 2 + , and thus in conditions that are expected to favor a significant population of extended structures. This can be explained to be a consequence of (a) the choice of initializing our simulations from a compact structure and (b) a possible bias in the employed force field towards compact structures. Only by using a reweighting procedure based on the maximum entropy principle we were able to reproduce the experimental spectra. The standard maximum-entropy-based reweighting was here extended so as to allow fitting experimental data that are known up to an a priori undetermined prefactor. Notably, the employed maximum-entropy reweighting only succeeded when spectra were calculated including the solvent, confirming that both solvent contribution and dynamics are crucial in this system. We observe that it is possible to extend the maximum entropy principle so as to model experimental errors by including a regularization term (9). Error models are also explicitly included in the metainference method (6). For simplicity, we didn't include any regularization or error model here, since the statistical error on the intensities due to the limited sampling dominates on the experimental error. When attempting a fit based on the pure-solute spectra, instead, adding a regularization term was required for the minimization to converge but then lead to spectra in remarkable disagreement with experiment.
The decision to avoid an explicit usage of the available SAXS data during the simulation was here made so as to generate a generic ensemble that spans a wide range of conformations and can be then, a posteriori, reweighted in order to match experimental data obtained with different ionic conditions. In this specific case, it allowed for obtaining reference ensembles for two different datasets from the same simulation, thus at half of the computational cost. Using the same simulation for fitting two separate experiments has also the advantage that statistical errors and biases towards the initial conditions will be identical in the two cases, thus making the comparison of the resulting ensembles more robust. In general, this can be of critical importance when intensive MD simulations are involved. In fact, in case of a variation in the reference experimental data, such as the availability of additional data, the procedure can be easily renewed using the same prior ensemble, with no necessity of repeating the MD simulations (73). The usage of approximate back-calculated SAXS intensities in the enhanced sampling stage, however, helped us not to incur in the difficult situation where the initial ensemble is so poor that it cannot match experimental data when reweighted (9,70). Our comparison of the back-calculated SAXS intensities obtained from the simulation of crystal structures using different ion concentrations suggests that, for a fixed RNA structure, the effect of the ions present in the solution on the spectrum is limited. This provides a rationale for our choice to reweight our KCl simulation using both KCl and MgCl 2 data and, indirectly, justifies the neglection of the 10 mM MOPSO present in the original experiment. We also notice that the reweighted ensembles are relatively poor and show a very limited Kish size. Longer simulations might allow for better converged ensembles. However, the reported calculation is at the limit of what is currently feasible in terms of system size and statistical sampling. In particular, the investigated system is significantly larger (57 nucleotides) than other RNA systems where maximum-entropy or related methods have been used so far to integrate solution experiments with atomistic simulations (10-15) (29 nucleotides at most) and contains more heterogeneous structural motifs.
The Mg 2 + -dependent shift in the relative populations of compact and extended structures that we observe here is specific for the GAC RNA, but given that Mg 2 + is the predominant divalent ion in cells, it is likely to be a common phenomenon among folded RNA molecules. We found, surprisingly, that there is a relatively large population of compact structures of the GAC RNA even in the absence of Mg 2 + . We discern this population from the combination of experimental SAXS data and our SAXS spectra computed from our simulation structures. This estimated population is robust with respect to both the estimated experimental error and to the statistical error of the simulations. Although we cannot further characterize these compact conformations, we do have some insight into their role in the biology of the GAC. We note that recent measurements (39) showed that, in the absence of Mg 2 + , GAC RNA has negligible affinity for the L11 protein. In prokaryotes, L11 protein binding to the GAC rRNA in the large ribosomal subunit is necessary for efficient translation. Thus, even though the GAC can form compact conformations in the absence of Mg 2 + , it appears that those conformations cannot be bound by the protein. Perhaps they are misfolded tertiary structures, or there is conformational exchange on a timescale that prevents L11 protein binding. In any case, these data indicate that specific association of Mg 2 + to the GAC RNA is required to form its binding competent conformation, and thus ensure optimal translation.
To the best of our knowledge, this is the first report on a procedure to reconstruct the conformational ensemble of a functional RNA molecule that combines experimental SAXS spectra and MD simulations while fully accounting for the solvent contribution, which is included explicitly in the computation, and RNA conformational dynamics. Previous works have mostly addressed separately the two issues. For instance, a common practice used when computing SAXS spectra with explicit solvent methods is to apply soft position restraints to the backbone atoms of the solute biomolecule (18,29), although other studies were conducted where unrestrained solutes were examined (19,20). If used to analyze relatively short plain MD simulations where, by necessity, the solute dynamics is limited, these methods would mostly report the solvent fluctuations. An opposite scenario is offered by the pure-solute restraints implemented in PLUMED (13), that can be naturally combined with efficient enhanced sampling methods so as to properly characterize solute dynamics (74). In this case, however, solvent contributions are neglected by construction. In this respect, our protocol takes the best from both worlds to reconstruct conformational ensembles that are compatible with reference experimental SAXS data, by (a) enforcing approximate solute-only spectra during the simulation and (b) reweighting the resulting ensemble using the more accurate explicit solvent models.

DATA AVAILABILITY
Preprocessed data and a Jupyter notebook that can be used to generate all the figures can be downloaded at https: //github.com/bussilab/saxs-md-gac. PLUMED input files are available in the PLUMED-NEST (https://www.plumednest.org), the public repository of the PLUMED consortium (64), as plumID:21.016. Full trajectories are available at https://doi.org/10.5281/zenodo.4646262.