Watching ion-driven kinetics of ribozyme folding and misfolding caused by energetic and topological frustration one molecule at a time

Abstract Folding of ribozymes into well-defined tertiary structures usually requires divalent cations. How Mg2+ ions direct the folding kinetics has been a long-standing unsolved problem because experiments cannot detect the positions and dynamics of ions. To address this problem, we used molecular simulations to dissect the folding kinetics of the Azoarcus ribozyme by monitoring the path each molecule takes to reach the folded state. We quantitatively establish that Mg2+ binding to specific sites, coupled with counter-ion release of monovalent cations, stimulate the formation of secondary and tertiary structures, leading to diverse pathways that include direct rapid folding and trapping in misfolded structures. In some molecules, key tertiary structural elements form when Mg2+ ions bind to specific RNA sites at the earliest stages of the folding, leading to specific collapse and rapid folding. In others, the formation of non-native base pairs, whose rearrangement is needed to reach the folded state, is the rate-limiting step. Escape from energetic traps, driven by thermal fluctuations, occurs readily. In contrast, the transition to the native state from long-lived topologically trapped native-like metastable states is extremely slow. Specific collapse and formation of energetically or topologically frustrated states occur early in the assembly process.


INTRODUCTION
Many functional RNA molecules fold to specific tertiary structures, in which divalent cations play crucial roles (1)(2)(3)(4)(5)(6)(7).Their f olding pathwa ys often consist of multiple steps covering a spectrum of time scales, and traversal through multiple pathways ( 8 , 9 ).The structural ensemble of folding intermediates is, ther efor e, highl y hetero geneous ( 10 ).In spite of advances in experimental methods, there are still limitations to the spatial and temporal resolution in dissecting the folding of large RN A molecules, especiall y the mechanisms by which divalent cations modulate the folding landscape.
Gr oup I intr on is a self-splicing ribozyme, that has been widely used in studies of RNA folding, typically either from purple bacteria Azoarcus or ciliates Tetrahymena ( 11 ).In the presence of divalent cations (Mg 2+ ), the RNA molecule folds to a specific tertiary structure (Figure 1 A-B) ( 12 ).It is known that the folding takes place in a hierarchical manner ( 13 , 14 ).Mg 2+ is indispensable not only for its catalytic activity but also for the formation of the nati v e conformation The secondary structure map shows that se v eral helices are ordered along the sequence.The helices of the Azoarcus ribozyme are conventionally denoted as P2 through P9 (labeled in black).P2, P5, P6, P8, and P9 are hairpin structures in which adjacent segments form the double strands, whereas P3, P4, and P7 are double strands formed by non-local pairs of segments.Se v eral key elements involving tertiary interactions are shown in red: TH, triple helix; SE, stack exchange; G site, Guanosine-binding site; TL2-TR8, tetraloop 2 and tetraloop-receptor 8; TL9-TR5, tetraloop 9 and tetraloop-receptor 5; and AS, the acti v e site.( B ) Tertiary structure, taken from PDB 1U6B ( 12 ).The same colors are used as in (A).( C ) Time dependence of the radius of gyration ( R g ) averaged over 95 trajectories (thick red line).Thin lines show individual trajectories.At the beginning of the simulations t = 0, the average R g is R g ≈ 7.8 nm, corresponding to the value at equilibrium in the absence of Mg 2+ .The fit using three exponential functions (see the main text) is shown by the blue dotted line.Inset: Coefficient of variation (C.V.), calculated as 2 / r g ( t) where <> is the av erage ov er the trajectories.( D ) The middle to late stages of compaction are magnified from the data in (c).
The three time constants are indicated by blue vertical lines.( E ) Distance distribution functions reveal the stages in the collapse kinetics.The distribution functions were calculated using snapshots from the 95 trajectories at t = 0, 0.15, 5 and 30 ms.The dotted lines are experimental tSAXS data from ( 14 ).( 15 , 16 ).The double-strand helices in the cor e r egion can fold at Mg 2+ ∼0.2 mM, whereas the complete tertiary structur e r equir es at least ∼2 mM Mg 2+ .Woodson and coworkers re v ealed, by time-resolv ed Small Angle X-ray Scattering (tSAXS) experiments, up to 80% fraction of unfolded Azoarcus ribozyme reach compact structures in less than 1 ms upon the addition of 5 mM Mg 2+ ( 14 ).The overall folding time has been estimated to be 5-50 ms for the major fraction of RNA in ensemble experiments, which is faster in Azoarcus than Tetrahymena ribozyme ( 17 ), because Azoarcus ribozyme has smaller and simpler peripheral domains.On the other hand, it has also been known that a certain fraction of the molecule is trapped in an intermediate state, presumably because of misfolding.This persistent intermedia te sta te needs times on the order of minutes to hours to fold to the nati v e conformation ( 18 ).
Here, we simulate the multistep folding kinetics of Azoarcus gr oup I intr on by e xtensi v e Brownian dynamics simulations using a coarse-grained RNA with explicit ions ( 19 ).In previous studies, we showed that the model reproduces Mg 2+ -concentration dependence of the Azoarcus ribozyme folding and corr ectly pr edicts binding sites of Mg 2+ in equilibrium simulations ( 19 , 20 ).Here, we focused on the kinetics of the same RNA.We conducted 95 folding simulations triggered by adding 5 mM Mg 2+ to unfolded ribozyme pr epar ed in the absence of divalent cations.Among them, simulation time.We found that a certain fraction of simulated trajectories were trapped in misfolded states.The folding reaction took place through multiple phases as monitored by the time-dependent changes in the overall size ( R g ), and formation of key interactions.Most ( ∼80%) of secondary structures folded ra pidl y within the first phase.In contrast, about half of tertiary interactions formed gradually during the first and the middle phases, and the other half folded in the last phase.Non-nati v e base pairs contributed to a manifold of metastable states comprising of a combination of mispaired helices, which slowed the folding reaction.Howe v er, one of the misfolded states mostly consisted of nati v e interactions without mispaired helices (a topological trap).Thus, not only non-nati v e base pairs but also the topology of the chain are relevant in characterizing the rugged RNA folding landscape.We also analyzed the dynamics of Mg 2+ ions, and showed that Mg 2+ ra pidl y replaces K + when the folding reaction is initiated.Nearly 90% of the number of Mg 2+ ions were condensed onto the RNA in the earliest phase, in which most tertiary interactions and some helices were still unfolded.Comparison of our results, with se v eral e xperimental data, including timedependent R g from tSAXS experiments and hydroxyl radical footprinting, shows near quantitati v e agreement.This allows us to investigate the detailed structural changes triggered by Mg 2+ as the Azoarcus ribozyme folds, e v ents that cannot be accessed by ensemble or single molecule experiments.

Thr ee-inter action-site (TIS) model with explicit ions
In order to simulate the long time scale needed to fold the ribozyme, we used the TIS model in the presence of Mg 2+ as well as K + that is in the buffer ( 19 ).The TIS model for nucleic acids has three interaction sites for each nucleotide, corresponding to the phosphate, sugar, and base moiety (Supplementary Figure S1) ( 21 ).All the ions are explicitl y treated, w hereas water is modeled implicitly using a temperature-dependent dielectric constant.The force field in which the physicochemical nature of RNA was carefully considered is gi v en as U TIS = U bond + U angle + U EV + U HB + U ST + U ele .The first two terms, U bond and U angle ensure the connectivity of the bases to the ribose backbone with appropriate bending rigidity.The next term, U EV , accounts for excluded volume effects, which essentially pre v ent ov erlap between the beads.Hydrogen-bonding and stacking interactions are gi v en by U HB and U ST , respecti v ely.We consider hydrogen bonds for all possible canonical pairs of bases (any G-C, A-U or G-U base pairs can be formed), as well as tertiary hydrogen bonds that are formed in the crystal structure (PDB 1U6B).The stacking interactions are applied to any two bases from consecuti v e nucleotides along the sequence, as well as tertiary base stacking in the crystal structur e. P arameters in these terms ar e optimized so that the model reproduces the thermodynamics of nucleotide dimers, se v eral types of hairpins, and pseudoknots ( 22 ).It should be emphasized that no parameter in the energy function was adjusted to achie v e agreement with experiments on the simulated ribozyme.Thus, the results are emergent consequences of direct simulations of the transfer ab le TIS model .The detailed functional forms for these terms are gi v en in the Supplementary Methods and Supplementary Table S1 in the Supplementary Data.The optimized parameters and a list of tertiary interactions can be found in the main text and supplemental information gi v en elsewhere ( 19 ).We showed previously that the model reproduced the experimental thermodynamics data for se v eral RNA motifs, such as hairpin and pseudoknot, thermodynamics of Azoarcus ribozyme as a function of Mg 2+ concentrations, and more recently, the thermodynamics of assembly of the central domain of the ribosomal RNA ( 23 ).A crystal structure of Azoarcus group I intron (PDB 1U6B ( 12 ), Figure 1 B) was used as the reference structure for the nati v e conformation.The nucleotides are numbered from 12 through 207 following the convention in the literature of Azoarcus group I intron.

Simulation protocol
To generate the unfolded state ensemble, we performed equilibrium simulations in the absence of Mg 2+ with 12 mM KCl, corresponding to the concentration in the Tris buffer.To enhance the efficiency of sampling the configurations of the system, we employed under-damped Langevin dynamics ( 24 ) simulations by setting the friction coefficient to 1% of water viscosity.We recorded the conformations of RNA and ions e v ery 10 8 steps, to be used as initial coordinates for the folding simulations.The duration of the equilibrium simulation is sufficiently long that all the initial structur es ar e well separ ated in the configur ational space.To generate 95 initial configurations, we ran over 3 × 10 11 steps constituting equilibrium simulations.Brownian dynamics simulations ( 25 ) were performed to trace the folding reactions starting from an initial state in which the ribozyme is devoid of tertiary structures.The viscosity was set to the value of water, 8.9 × 10 −4 Pa s.The simulations were performed in a cubic 35 nm box containing ions and RNA.To minimize finite-size effects, we used periodic boundary conditions.The temperature was set to T = 37 • C. Additional details are described in Supplementary Methods.
Starting from the initial conformations, pr epar ed at [Mg 2+ ] = 0, we triggered the folding reaction by adding 5 mM Mg 2+ .Both experiments and previous simulations ( 19 ) have shown that a solution containing 5 mM Mg 2+ and 12 mM K + dri v es Azoarcus ribozyme to structures that are catal yticall y acti v e ( 14 ).We generated 95 folding trajectories until the ribozyme reaches the folded state, or the simulation time is ≈30 ms.We assessed whether the ribozyme is folded to the correct nati v e state by calculating the rootmean-square-deviation (RMSD) from the crystal structure.If the RMSD to the nati v e structure is less than 0.6 nm, then the ribozyme is folded.We confirmed that if RMSD < 0.6 nm, all the secondary and tertiary interactions are correctly formed.Note that the experimentally (tSAXS) accessible quantity, the radius of gyration ( R g ), alone is not an accurate order parameter to distinguish the nati v e structure from the misfolded structures because some of the nonnati v e structures hav e R g values that are close to the nati v e structure.We use se v eral measur es, as described her e and in the Supplementary Methods, to monitor the order of events during the folding process.

Clustering analysis
To classify the compact conformations of the ribozyme and examine if there are any non-nati v e (misfolded) conformations, we performed a clustering analysis.First, from all conformations in the 95 folding trajectories, we collected compact structures whose R g ≤ 3.5 nm, regardless of their similarities to the nati v e conformation.This criterion generated 2 162 299 structur es.After r educing the number of structures to 10 811 (1 / 200) by random selection, a clustering analysis was done by Ward's method using the Distribution of Reciprocal of Interatomic Distances (DRID) as the similarity measure ( 26 ).

Ion condensation and binding
To make a quantitati v e comparison of condensation of monovalent (K + ) and divalent (Mg 2+ ) cations, we counted the number of ions condensed onto the RNA at each time frame.We consider that an ion is condensed when it is in the vicinity of any phosphate site in the RNA.To compare Mg 2+ and K + on equal footing, we used the Bjerrum length ( l B = 0.73 nm) as the cutoff distance for ion condensation.At the distance l B , the thermal energy balances the Coulomb attraction between a cation and an anion.
To detect tightly-bound Mg 2+ ions, in a manner consistent with the previous study ( 19 ), we computed the contact Mg 2+ concentration , c * , as follows.For e v ery phosphate site in each snapshot of the simulations, we counted the number of Mg 2+ located in the range, r 0 − r < r < r 0 + r , where r is the distance from the phosphate, r 0 = R P + R Mg = 0.44 nm is the sum of the excluded-volume radii of the phosphate and Mg 2+ ion, and r = 0.15 nm is a tolerance margin for contact (Supplementary Figure S2a).The contact Mg 2+ concentration was then calculated by dividing the number by the spherical shell volume (Supplementary Figure S2).The definition of Mg 2+ binding rate using the same criterion can be found in the Supplementary Methods.

Footprinting analysis
It is known that experimental footprinting using hydroxyl radical is highly correlated with the Solvent Accessible Surface Area (SASA) of the sugar backbone (27)(28)(29).To compare our simulation results with experimental footprinting da ta, we calcula ted SASA using FreeSASA version 2.0 ( 30 ).Considering that hydroxyl r adicals prefer ab ly cleav e C4' and C5' atoms of RNA backbone ( 28 ), we took the larger value of the SASA of C4' and C5' atoms for each nucleotide.From the SASA data, we computed the 'protection factor' ( 31 ) of the i th nucleotide (see the Supplementary Methods for the detail), for the nati v e and misfolded states, respecti v el y, w here the bracket indicates the ensemb le av erage.We used the initial conformations pr epar ed in the absence of Mg 2+ to compute the average SASA of the unfolded state, SASA Unfolded .In order to perform the SASA calculation, we reconstructed atomically detailed structures of the ribozyme from coarse-grained coordinates using an in-house tool (TIS2AA https://doi.org/10.5281/zenodo.581485, Supplementary Figure S1), which employs a fragment-assembly approach ( 32 ), followed by minimization by Sander in Am-ber16.In Supplementary Figure S3, we show some examples of atomically detailed structures that were obtained from the conformations generated using the TIS model.

RESULTS
Ther e ar e two parts to the Mg 2+ -induced folding of the ribozyme.One pertains to the time-dependent changes in the conformations of the RNA as it folds.The second is related to the role that Mg 2+ plays in driving the ribozyme to the folded state.It is easier to infer the major conformational changes of the ribozyme using experiments.In contrast, it is currently almost impossible to monitor the fate of the time-dependent changes in many Mg 2+ ions as they interact with specific sites on the RNA.Both the time-dependent changes in the RNA structures and, more importantly, how correlated motions involving multiple divalent cations lead to folding, cannot be sim ultaneousl y be measured in experiments.Simulations, provided they are reasonably accurate, are best suited to probe the finer details of RNA conformational changes, and the mechanism by which Mg 2+ dri v es the ribozyme to fold as a function of time.After demonstra ting tha t our simula tions nearly quantita ti v ely reproduce the measured time-dependent changes in the size of Azoarcus ribozyme, we focus on the effect of Mg 2+ on the folding reaction.

Unfolded state ensemble is heterogeneous
We pr epar ed the unfolded state structural ensemble in the absence of Mg 2+ at 12 mM KCl concentration, which is the concentration in the Tris buffer ( 33 ).The average radius of gyration, in the absence of Mg 2+ , is R g = 7.8 nm (Figure 1 C), which is in excellent agreement with experiments ( ≈7.5 nm measured by SAXS experiments in the limit of low [Mg 2+ ]) ( 33 ).Although the tertiary interactions are fully disrupted, se v eral secondary structures, helix domains P2, P4, P5 and P8, are almost intact (Supplementary Figure S4).Ne v ertheless, globally the Azoarcus ribozyme is unstructured, with most of the characteristics of the nati v e structure being absent (see the unfolded structures in Figure 2 ).The unf olded conf ormational ensemble is highly heterogeneous, containing a mixture of some secondary structural elements and fle xib le single-stranded regions (Supplementary Figure S5).

Ribozyme collapse occurs in three stages
We first report how the folding reaction pr oceeds fr om the ensemb le perspecti v e by av eraging ov er all the folding trajectories in order to compare with the tSAXS experiments ( 14 ).Following the experimental protocol, we monitored the folding kinetics using the time-dependent changes in the radius of gyration, R g , describing the overall compaction of the ribozyme (Figure 1 C).At each time step, R g was averaged over all the trajectories.At t → 0 (see the limit of small t in Figure 1 C), the average is R g ≈ 7.8 nm, corresponding to the unfolded state.The time-dependent changes in the average R g ≡ R g ( t ) are fit using, where R g U and R g F are the average R g of the unfolded and folded state, respecti v ely, and i and i are the time constants and the amplitudes ( 1 + 2 + 3 = 1) associated with the i th phase, respecti v ely.The data could not be fit accurately using a sum of two exponential functions (Supplementary Figure S6).The best fit parameters are 1 = 0.76, 2 = 0.11, 3 = 0.13, with the corresponding time constants, 1 = 0.15, 2 = 1.6, 3 = 33 ms (Figure 1 D).
The three time scales describe the multi-step folding e v ents if the radius of gyration is a reasonable order parameter for the folding reaction: (i) rapid collapse from the unfolded state ( R g ≈ 7.8 nm) to an intermediate state in which the RNA is compact with R g ≈ 4 nm ( c = 1 = 0.15 ms); (ii) Further compaction to an intermediate state, I c , which has R g ≈ 3.5 nm; and (iii) finally folding to the nati v e structure with R g = 3 nm.Clearl y, the maxim um extent of compaction occurs in the earliest stage of the folding reaction.
It is important to compare the simulation results with experimental SAXS data ( 14 ) in order to validate our model.The tSAXS results also show the three stages, with an initial decrease to R g ≈ 4 nm occurring in τ exp 1 < 0 . 2 ms.This is close to the time scale observed in the simulations, 1 = 0.15 ms.In the tSAXS experiment, further compaction to the I c state ( R g ≈ 3.5 nm) occurs in τ exp 2 ≈ 17 ms , which is an order of magnitude larger than predicted in the simulations, 2 = 1.6 ms.In the experiments ( 14), there is no data for R g for time less than 0.6 ms, which might influence the estimate of τ exp 2 .The R g of the I c state is similar in the simulations and experiments (Figure 1 D).
By fitting R g in the simulations to the three-stage kinetics, we obtained the time constant of the final transition leading to R g ≈ 3 nm, 3 = 33 ms, that is orders of magnitude smaller than the experimental estimate ( τ exp 3 ≈ 170 s ).The most likely reason for this discrepancy is that, for computational reasons, we terminated the folding trajectories at 30 ms regardless of whether the RNA is folded or not.Ther efor e, the estimated value of 3 from simulations is a lower bound to the time constant reported in experiments.It is worth noting that analysis of the experimental data emphasized only τ exp 1 and τ exp 2 , spanning times on the order of 200 ms (see Figure 4 A in ( 14 )).
The trajectories that do not reach the nati v e state in 30 ms undergo non-specific collapse into structures that require a longer time to reach the folded state.Ne v ertheless, it is clear that the sim ulations ca pture the multistage collapse of the ribozyme observed in tSAXS experiments, with near quantitati v e agreement for the first two phases (Figure 1 D).We confirmed that distance distribution functions at se v eral different stages in the folding are also consistent with experiments ( 14 ) (Figure 1 E).
The amplitudes of all the three stages in the decay of R g ( t ) in the simulations are in a near quantitati v e agreement with fits to the measured R g ( t ) (see Figure 4  The le v el of agreement is remar kab le, considering that no parameter in the model was adjusted to obtain agreement with any observable in the experiment.If R g is a good reaction coordinate, then this would imply that nearly 80% of Azoarcus ribozyme folds ra pidl y.
The ensemble average R g hides the high degree of heterogeneity in the Mg 2+ -induced compaction of the RNA.
The results in the inset of Figure 1 C show that the dispersion in R g as a function of t is considerable even in the late stages of folding.This is the first indication of the importance of pathway heterogeneity in the folding process, which we further substantiate below.

Dynamics of folding and misfolding
In Figure 2 , the fate of the trajectories and schematic folding routes ar e pr esented along with some r epr esentati v e structures.After the initial compaction, most trajectories (87 out of 95 ≈ 92%) show a further reduction in R g in the second phase, in which compact structures form ( R g < 3.5 nm).Among them, 55 trajectories reach the folded state in 30 ms with structural features that are the same as in the crystal Two key interactions in the peripheral regions, TL2-TR8 and TL9-TR5, formed early in the folding process ( t < 1 ms), resulting in the junction J8 / 7 with an incorrect topology (See Figure 7 ).Because the incorrect chain topology cannot be resolved unless both of the peripheral interactions unf old, the RNA sta ys topologically trapped for the rest of the simulation time.See Supplementary Figure S14 for trajectories of R g , fractions of secondary and tertiary elements, Supplementary Figure S15 for another example of a kinetic trap, and Supplementary Movie 2 for watching the trajectory.( B ) Specific collapse leads to rapid folding.From the distributions of the first passage times to the folded state, P F F P ( τ F ) , we calculated, F ( t) = t 0 P F F P ( s) ds .Similarly, M ( t) = t 0 P M F P ( s) ds , where P M F P is the distribution of mean first passage times to the trapped sta te. ( C ) Fa te of the ribozyme immediately following the initial collapse.The probability distributions of the structure overlap ( ) with respect to the nati v e structure; = 0 indicates no similarity to the crystal structur e, and = 1 corr esponds to the nati v e state.(Inset) The distribution of immediately after collapse ( t < 150 s, green line, 'Just-collapsed') compared with distributions of the equilibrium unfolded state (blue, [Mg 2+ ] = 0 mM) and folded state (orange, [Mg 2+ ] = 5 mM).The distribution of the 'just-collapsed' ensemble in the main figure is decomposed into four distributions depending on the fate of each trajectory: Folded ra pidl y, trajectories reached the correct folded state within 5 ms; Folded slowly, trajectories reached the correct folded state after 5 ms but within the maximum simulation time (30 ms); Trapped, trajectories where the RNA was trapped in the major misfolded state; trajectories labeled Other were neither folded nor misfolded.structure.To identify the distinct conformations that appear in the compact structural ensemble, we performed a clustering analysis (Supplementary Methods and Supplementary Figure S7).Although there are se v eral misfolded elements in the ensemble structures, along with nati v e-like structures, we identified two types of misfolding mechanisms, namely energetic traps and topological traps.The energetic trap consists of misfolded states that contain non-nati v e helices (Supplementary Figure S8).These non-nati v e helices can be e v entually disrupted (two strands dissociate by stochastic thermal fluctuations), and undergo transition either to the nati v e-like state or to one of the second type of misfolded states.
The second type of misfolded conformations, which we classify as topological traps, are due to frustration arising from chain connectivity ( 34 ).In the topological trap, most of the helices are correctly formed, as in the nati v e structure.Howe v er, the spatial arrangement of helices in some regions differs from the nati v e structure (e.g., a part of the chain passes either on one side or the other of another strand).The incorrect topology is stabilized by se v eral tertiary interactions, especially those involving peripheral motifs such as TL2-TR8 and TL9-TR5 (Figure 1 ).As a consequence, once RNA is topologically trapped, it cannot easily escape and fold to the correct nati v e state at any reasonable time.We identified two distinct topological traps.In the major topolo gical tra p, 'Misfold (J8 / 7)', the J8 / 7 junction passes through an incorr ect r elati v e location with respect to the strands of the P3 helix (Figure 7 B).We describe this major topolo gical tra p in detail in the following sections.Although 'Misfold (J8 / 7)' resembles the nati v e structure, the lifetime of this state is so long that it cannot be resolved even on the experimental time scale.In the minor topological trap labeled as 'Misfold (P2)', a part of the chain leading to the P2 helix is incorrect.This structure is stabilized by TL9-TR5 peripheral tertiary contact (see Supplementary and Supplementary Figures S9-S11).
In summary, within the simulation time of 30 ms, 55 out of 95 trajectories folded correctly to the nati v e structure.In 14 trajectories, the ribozyme was trapped in the misfolded (J8 / 7) state, which is a nati v e-like topological trap.In 18 out of the remaining 26 trajectories, compact structures formed ( R g < 3.5 nm) ra pidl y but did not fold further, partly due to the energetic or topological frustration.Because folding is a stochastic process, the times at which each trajectory (or molecule) reaches the nati v e state vary greatly.

Visualizing folding events in a single folding trajectory
In Figure 3 , we show a r epr esentati v e trajectory in which folding is completed (see also Supplementary Movie 1).In this particular case, the ribozyme reached the nati v e structure ra pidl y in t ∼ 2.4 ms.Helices P2, P4, P5, P8 and P9 wer e alr ead y formed a t t = 0, and remained intact during the folding pr ocess.Fr om t = 0 to ∼0.5 ms, global collapse occurred, with a rapid decrease in R g to ∼4 nm.Along with the global collapse, P7 formed at an early stage, t ∼ 0.15 ms.He-lix P6 and the G-site tertiary interaction formed around t ∼ 0.4 ms.Other key interactions formed in the order of Triple Helix (TH), TL9-TR5 and P3.Note that the order of formations of these key interactions depends on the trajectory, and is by no means unique.After the formation of the P3 helix, it took a relati v ely long time ( ∼1.5 ms) for P2 (orange domain in Figure 3 ) to find the counterpart P8 (blue).At t ∼ 2.4 ms, the formation of tertiary interactions between the two domains (TL2-TR8) results in the nati v e conformation.Supplementary Figures S12 and S13 show two other examples in which the folding was completed but on a longer time scale.In the trajectory in Supplementary Figure S12, a mispaired helix in P6 formed early ( t < 1 ms), pre v enting it from folding to the nati v e state smoothl y.Eventuall y ( t ∼ 25 ms), the misfolded P6 was resolved, leading to the correct nati v e fold.
Figure 4 A shows a series of snapshots from a trajectory where the ribozyme is topologically trapped in the Misfold (J8 / 7) state.In this trajectory, two key interactions in the peripheral regions, TL2-TR8 and TL9-TR5, formed early ( t < 1 ms).Howe v er, the junction J8 / 7 was in the wrong position with respect to the P3 helix strands, causing a misfolding (Figure 7 B).Because the incorrect chain topology cannot be resolved unless both of the peripheral interactions unf old, the RNA sta ys in this misf olded topological trap for an arbitrarily long time.
From all the folding and misfolding trajectories, we calculated the distributions of first passage times to either the folded or the trapped state.The time-dependent fractions in these two states, shown in Figure 4 B, re v eal that trajectories that reach nati v e-like structures (i.e.either folded or topolo gicall y tra pped states) by ∼5 ms fold correctly.In contr ast, tr ajectories that take a longer time to be nati v elike ( > 5 ms) are kineticall y tra pped.The former type of trajectories would be the consequence of the specific collapse in the earliest stage of the folding.

Kinetics of secondary-and tertiary-structure formation
The folded RNA structures are composed of secondary structural motifs, which are often independently stable and are consolidated by tertiary contacts to render the ribozyme compact.The most abundant elementary structural unit is the double-stranded helix (Figure 1 A).In Figure 5 , timedependent formations of secondary and tertiary interactions, r epr esented by average energies stabilizing these motifs, are plotted.Each interaction type is further categorized into two main chemical components, hydrogen bonding (H-bond) and base stacking.Figures 5 A, B show that most secondary structures are ra pidl y formed in the first phase ( t 0.15 ms), although certain secondary interactions form only in the late stages.In contrast, the formation of most tertiary interactions occurs in the middle ( t ∼ 1.5 ms) and the last phase ( t 10 ms) (Figure 5 C, D).These findings illustrate the hierar chical natur e of RNA folding kinetics in the ensemble pictur e, wher e formations of secondary structur es are followed by tertiary contacts ( 35 , 36 ).
We then investigated the kinetics of individual helix formations by calculating time-dependent fractions of helix formations in the folding trajectories (Figure 5 E).In summary, all helices except P3 and P7 fold earl y, typicall y in t < 0.1 ms.Kinetics of involving helices P3 and P7 are particularly slow because the two strands of P3 and P7 are far apart in the sequence, and thus it takes substantial time to search  A-C) The major misfolded structure (J8 / 7) and its intermedia te ( B ) ar e compar ed the nati v e intermediate and the folded structures ( A ).The spatial arrangements of J8 / 7 and other strands at the cor e ar e depicted on the right in blue and r ed cir cles.In the misfolding intermedia te (B , left), the strand J8 / 7 (colored in red) passes through an incorrect location relati v e to the other two strands of the P3 helix (cyan), resulting in a topolo gicall y tra pped state.This incorrect topolo gy cannot be resolved unless other tertiary contacts such as TL2-TR8 (contact between domains P2 (orange) and P8 (blue)) disengage, which is unlikely to occur under the folding condition.As a consequence, it leads to a metastable state that is as compact as the folded state but with an incorrect chain topology.In panel ( C ), the same region in the nati v e (upper, PDB 7EZ0 ( 41 )) and misfolded (lower, PDB 7UVT ( 42 ) and PDB 7XSK ( 43)) Tetrahymena ribozyme, solved by cryo-EM are shown for comparison.( D ) Comparison of experimental footprinting data with SASAs calculated from simulations.Protection factor ( F p ) are calculated for the nati v e (middle, b lue) and misfolded ensembles (bottom, red) fr om simulations.Pr otected nucleotides, indicated by gr een-filled r ectangles on top, are from data in three experiments, as labeled on the right ( 17 , 31 , 44 ).The secondary structure is shown on top for r efer ence.Note that protections of nucleotides in the P2 helix were not resolved in two experiments for technical reasons.each other.Supplementary Discussion contains further details on the folding of individual secondary structures.
Equilibrium ensemble simulations ( 19 ) identified several key tertiary interactions (shown in red symbols in Figure 1 ) b y v arying the Mg 2+ concentration.In Figure 5 F, formations of those tertiary interactions are shown as time averages over the folded trajectories.Here, we find that the triple helix (TH) and Guanosine binding site (G site) form earlier than the other key elements.This is consistent with the results of equilibrium simulations ( 19 ) that reported that TH and G sites are formed at lower Mg 2+ concentrations compared to other key interactions.The time range of the formation corresponds to the second phase in Figure 1 C. Following the TH and G site formation, other key interactions form, mainly in the last phase ( t > 10 ms).
Interestingly, the kinetics of G site formation resembles the formation kinetics of the P7 helix (Figure 5 E).From the secondary structur e (Figur e 1 ), this can be explained by noting that the G site is formed with a part of the P7 helix.Since the two strands of P7 are separated along the sequence, the formation of the P7 helix is a rate-limiting step f or the f ormation of the G site.This is reminiscent of the diffusion-collision model proposed for protein folding ( 37 ).

Counterion release kinetics
We now turn to the role of the cations, K + and Mg 2+ , in dri ving the assemb ly of Azoarcus ribozyme.The interplay between the unbinding of K + ions and the association of Mg 2+ to the ribozyme as it folds is vividly illustrated in Figure 6 A. The figure shows the time dependence of the number of cations condensed onto RNA, averaged over all the trajectories (see Materials and Methods for the definition).At t = 0, on average, 40 K + are condensed onto the ribozyme (see Supplementary Figure S16 for snapshots).The monovalent K + cations are ra pidl y replaced by Mg 2+ in t 0.02 ms, which shows dramatically the counterion release mechanism anticipated by the application ( 1 , 20 ) of the Oosawa-Manning theory ( 38 ).In this time scale, nearly 90% of Mg 2+ ions in the final nati v e state are already condensed, e v en though most of the tertiary interactions and some helices are still unfolded.The replacement of K + ions by Mg 2+  shows that e v en the initiation of folding r equir es the r eduction in the effecti v e charge on the phosphate groups, which is accomplished efficiently by divalent cations.Howe v er, in this rapid process, some of the structures that form are topolo gicall y or energeticall y frustra ted, thus grea tly increasing the folding time.The pr ematur e condensation of Mg 2+ has been found to produce kineticall y hetero geneous structures that rearrange slowly both in Holliday junctions ( 39 ), and gr oup II intr ons ( 40 ).A fraction of unfolded RNAs undergoes specific collapse, adopting compact structures that reach the nati v e-like fold ra pidl y.

Fingerprint of Mg 2+ associations
Among the condensed Mg 2+ ions, some bind at specific sites as seen in the crystal structure ( 12 ).In Figure 6 B-D, we show the time-dependent Mg 2+ densities, c * , at each nucleotide site.At many of the specific binding sites, Mg 2+ ions associate with the ribozyme in the early stage of folding.For instance, at t = 0.15 ms, although most tertiary interactions are not formed (Figure 5 F), there are se v eral peaks in the Mg 2+ densities (Figur e 6 B).Inter estingly, nucleotides 15, 29, 127 and 181 are the Mg 2+ binding sites in the crystal structure.This shows that some of the specific binding sites are occupied by Mg 2+ at the earliest stage of folding.Coordina tion of Mg 2+ a t these sites is r equir ed to r educe the electrostatic penalty for subsequent tertiary structure for mation.In the inter mediate time scale, t = 1.6 ms, additional nucleotides bind Mg 2+ , as shown in Figure 6 C. At t = 30 ms (Figure 6 D), we confirmed that Mg 2+ binding sites are consistent with the crystal structure ( 12 ), which is another indication that the model is accurate.
We next investigated if there is a correlation between Mg 2+ -binding kinetics and the thermodynamics of ion association.Using the same distance criterion for Mg 2+ binding to the RNA, we computed the mean first passage time (MFPT) of Mg 2+ coordination to each phosphate site.The binding rate was calculated as the inverse of the MFPT and is shown in Figure 6 E. The nucleotides that have greater binding rates ( k b ) correspond to those that have higher Mg 2+ densities in the equilibrium simulations ( 19 ), including the positions found in the crystal structure ( 12 ).This shows that the association of the ions to high-density Mg 2+ sites occurs ra pidl y in the early stage of the folding.These results show that there is a remar kab le consistency between the order of accumulation of Mg 2+ ions at specific sites of the ribozyme and the conclusions reached based on equilibrium titration involving an increase in Mg 2+ concentrations.The coordination of Mg 2+ at early times to nucleotides are also the ones to which Mg 2+ ions bind at the lowest Mg 2+ concentration ( 19 ), thus linking the thermodynamics and kinetics of ion association to the ribozyme.

Specific Mg 2+ binding drives formation of tertiary interactions
From the kinetic simulation trajectories, we further anal yzed w hether Mg 2+ binding e v ents directly guide the formation of tertiary interactions.One of the key tertiary elements that folds in the early stage is the G site comprising the G-binding pocket and G206 (Figure 1 ).There are two peaks corresponding to A127 and G130 in the Mg 2+ fingerprint both in kinetics (Figure 6 E) and thermodynamics ( 19 ) simulations, consistent with a Mg 2+ ion bound between the two nucleotides in the crystal structure ( 12 ).We found a strong correlation between the first passage times of Mg 2+ binding at these nucleotides, and the formation of the G site.For example, in the same trajectory, as shown in Figure 3 , Mg 2+ binding to those nucleotides are noticeable as an increase in contact Mg 2+ concentration ( c * ) at the same time as G site formation (Figure 6 F, G).
The scatter plots in Figure 6 I, J show the correlation between the first passage time for Mg 2+ binding ( 127 and 130 ) and the first passage time for G-site formation ( G ) from all the trajectories.For both A127 and G130, there is a distinct temporal corr elation, r eflected as dense data points in the diagonal region.In the majority of the trajectories (74% for G130 and 58% for A127), the G-site formation and Mg 2+ binding occurred sim ultaneousl y within 0.2 ms (shown as red points in Fig 6 ).Overall, Mg 2+ binding to G130 exhibited a higher correlation (Pearson coefficient = 0.92), wher eas the corr elation for A127 was lower ( = 0.32) because G-site formation occurred later than the Mg 2+ binding in some trajectories (data points on the upper left triangle in Figure 6 I).Ne v ertheless, for both the nucleotides, the G-site formation does not precede Mg 2+ binding, indicated by the absence of data points on the lower right triangle.We conclude that Mg 2+ binding to these nucleotides is a necessary condition for the formation of the G site.
Strikingly, ther e ar e also noticeable incr eases in the local Mg 2+ concentra tion a t nucleotides A127 and G130 in the later stage, corresponding to the time when another tertiary element, Triple Helix (TH), forms (also indicated by arrows in Figure 6 F, G).Because the G site and TH are spatially close to each other, the formation of TH further stabilizes the Mg 2+ associations at the G site, especially A127, which is located at the end of the strand that constitutes the TH.The temporal correlations between these nucleotides and both the G site and TH show that Mg 2+ binding to some nucleotides in the core of the ribozyme contribute to more than one tertiary contact.
Figure 6 K, L that U51 and U126 also bind Mg 2+ ions upon folding of TH, which is consistent with the observation of two Mg 2+ ions in the crystal structure.The scatter plots show that, in some trajectories, TH forms before Mg 2+ ions bind to U51 and U126, indicating that Mg 2+ binding to these nucleotides are not needed for the formation of the TH.This supports the finding that TH formed in 60% of the population e v en at submillimolar Mg 2+ concentration in the equilibrium simulation study ( 19 We found similar correlations for other key tertiary elements, Stack Exchange, TL2-TR8 and TL9-TR5.See Supplementary Discussion and Supplementary Figures S17-S20.

Non-native base pairs impede folding both to the native and topologically-trapped states
Because our model allows any combination of canonical (G-C and A-U) and Wobble (G-U) base pairs to form, we found a number of non-nati v e base pairs in the folding process.Some of these base pairs formed only transientl y, w hereas others have relati v ely long lifetimes, suggesting a link between the formation of non-nati v e base pairs and misfolded states.By counting the frequencies of each non-nati v e base pair, we identified fiv e frequently mispaired str and-str and combinations (Supplementary Figure S8).These strands are parts of helices, P3, P6, P7 and junction J8 / 7.For instance, each strand of P3 forms a double-strand helix with another strand from P7, leading to the formation of mispaired helices P3u-P7u and P3d-P7d (Supplementary Figure S8 right top).Here, we introduced the notation, u and d , to distinguish between the two strands for nati v e helices, the upstream (5'-end) strand u and the one downstream d .Helices, P3, P6, P7 are unfolded in the absence of Mg 2+ (Supplementary Figure S4).Gi v en that these mispaired helices consist of at least four consecuti v e base pairs, except P3d-J8 / 7, it is reasonable that such incorrect pairings are formed in the process of the unfolded strands searching for their counterparts.In addition, we found alternati v e secondary structures of P6 helix, alt-P6 (Supplementary Figure S8, bottom right).
In Supplementary Figure S21a, we show the timedependent fractions of mispaired helices.One of such mispairing, P3d-J8 / 7, formed earlier than others, and the fraction is also higher.Inter estingly, ther e is a significant fraction of such mispaired helices e v en in folding trajectories that reach the nati v e state.Indeed, the time-dependent fraction for folded and misfolded trajectories resemble each other (Supplementary Figure S21B, C).In both cases, the fractions of mispaired helices increase between 0.1 < t < 1 ms, and then decrease to nearly zero at t ∼ 30 ms.In contrast to expectation, the persistent misfolded states do not have a significant amount of mispaired helices.We conclude that the mispaired helices (energetics traps) often form regardless of whether folding is on the pathway to the nati v e state or kinetically trapped.

Footprinting data is consistent with the formation of misfolded states
Hydroxyl radicals are often used to study hierarchical structure formations in footprinting experiments.In hydroxyl radical footprinting (analogous to hy drogen ex change experiments using NMR for proteins), one can detect the extent to which nucleotides in RNA are pr otected fr om cleavage reactions b y hy droxyl radicals.It is known that the degree of protection is highly correlated with the solventaccessible surface area (SASA) of the backbone sugar atoms, wher eas ther e is no dir ect r elationship between protection and its sequence and secondary structure (27)(28)(29).
Consequently, the technique is useful for assessing the regions that are densely packed.In the context of ribozyme folding, it is often reported as 'protections', which indicate r egions wher e tertiary interactions form to a greater extent compared to some reference state, which is typically the unfolded state before folding is initiated.For the Azoarcus ribozyme, se v eral footprinting data are available in the literature ( 17 , 31 , 44-49 ).Because it is difficult to characterize the molecular details of the misfolded conformations in experiments, our simulations provide the needed quantitati v e insights into the protection factor at the nucleotide le v el, thus filling in details that cannot be resolved experimentally.
We calculated the protection factors (f ootprints) f or the nati v e and the trapped ensembles based on SASA values of the simulated structures, and compared them with the experimental data.The two sets of footprints from the nati v e and misfolded simulation ensemb le hav e a similar pattern (Figure 7 D) but with differing amplitudes, which is an indication of the extent of protection.Even though the two structural ensemb les hav e differ ent chain topologies, ther e are many well-packed regions in common.This also implies that the ensemble of misfolded conformations shares many common characteristics with the nati v e structures, as implied by the kinetic partitioning mechanism (KPM) ( 34 , 50 ).
Experimental and calculated footprints are mostly consistent with the positions in both the protection factor profiles (Figure 7 D).There are 28 nucleotides commonly protected in the three experimental data compared ( 17 , 31 , 44 ).Among these 28 nucleotides, 23 nucleotides are protected in the nati v e ensemb le (sensiti vity (true positi v e rate) 0.82 and specificity (true negati v e rate) 0.79), whereas 24 nucleotides are protected in the misfolded ensemble from the simulation (sensitivity 0.86 and specificity 0.83), using a threshold protection factor, F p = 1.2.
The analysis based on Figure 7 D quantitati v ely show that our simula tion da ta and experiments are consistent with each other.Howe v er, the comparison also shows that footprinting analyses may not uniquely distinguish the nati v e structure from the misf olded conf ormation unless the amplitudes are quantitati v ely compared.Chauhan and Woodson ( 17 ) (their footprinting data is shown in Figure 7 D) reported that about 20% of the population was misfolded, although these experiments cannot provide the molecular details of the misfolded structures.Gi v en the good agreement found in Figure 7 , we predict that the misfolded state identified in the simulations contributes to the 20% fraction identified in experiments.The topolo gicall y-tra pped states ar e mor e fle xib le than the nati v e structure, which is reflected in the decreased amplitude in the protection factor (Figure 7 D).

DISCUSSION
We performed coarse-grained simulations to re v eal the structural details and the mechanisms by which specific and correla ted associa tion of Mg 2+ with nucleotides dri v e the multistep folding kinetics of Azoarcus gr oup-I intr on RN A. The sim ulated colla pse kinetics ( R g versus time) and the tSAXS experimental data ( 14 ) are in excellent agreement with each other.Ther e ar e thr ee major phases in the folding kinetics: (1) Ra pid colla pse from the unfolded ( R g ≈ 7.8 nm) to an intermedia te sta te in which the RNA is compact ( R g ≈ 4 nm).(2) The second phase involves the formation of the I c state that is almost as compact as the nati v e structure ( R g ≈ 3.5 nm).(3) In the final phase, there is a transition to the nati v e structure with R g ≈ 3 nm.Interestingly, most (about 80%) of the secondary structures are formed within the first phase.In contrast, only about half of the tertiary interactions form incrementally during the first and second phases, and the remaining tertiary contacts form in the last phase.This suggests that the folding transition state is close to the folded state, which confirms the conjecture made previously ( 51 ).
A surprising finding in our simulations is that Mg 2+ ions condense onto the ribozyme over a very short time window, t < 0.05 ms, which results in the release of K + , an entropically favorable event.Nearly 90% of Mg 2+ ions are condensed in this time frame, e v en though most of the tertiary interactions and some helices are disordered.These findings show that Mg 2+ condensation, in conjunction with K + r elease, pr ecedes the formation of major iondri v en rearrangements in the ribozyme.We belie v e that this is what transpires in ribozymes and compactly folded RNAs.

Kinetic partitioning
The initial ribozyme collapse could be either specific, which would populate nati v e-like structures that would reach the folded state ra pidl y, as predicted by the KPM ( 34 , 52 ), or it could be non-specific.In the latter case, the ribozyme would be kinetically trapped in the metastable structures for arbitrarily long times.In either case, theory has shown that the collapse time, c ≈ 0 N ␣ ( N is the number of nucleotides and ␣ ≈ 1) with the prefactor, 0 , that is on the order of (0.1 −1) s ( 52 ).Taking 0 ≈ 0.5 s leads to the theoretical prediction that for the 195-nucleotide Azoarcus ribozyme, c ≈ (0.02 − 0.2) ms, which is in accord with both simulations and experiments.
To determine if the structural variations at the earliest stage of folding affect the fate of the RNA, we analyzed the ensemble of conformations immediately after the initial collapse ( t < 150 s). Figure 4 C shows the probability distributions of the structur al over lap function ( ) calculated for the just-collapsed ensemble.The order parameter, , measures the extent of structural similarity to the nati v e structure (0 for no similarity and 1 if it matched the folded state).The data is decomposed into four categories depending on the fate of each tr ajectory, r a pidl y folded, slowl y folded, tr apped, and tr ajectories that are neither folded nor kineticall y tra pped.Ther e ar e two major findings in this plot: (1) Trajectories that result in either folded or trapped states have higher structural similarity to the folded state compared to those that do not reach these sta tes a t an early folding stage.(2) The distribution of the ra pidl y folded ensemble has a long tail with substantial similarity to the nati v e structure, indica ting tha t rapid folding to the folded structure arises from specific collapse.Although this result was expected on theoretical grounds ( 53 ), which has been established for RNA molecules whose folding rates vary over 7 orders of magnitude ( 54 ), there has been no direct demon-stration of specific collapse and the associated structures until this study.

Mispaired secondary structures impede folding
We observ ed se v eral incorrect base parings en route to either the nati v e or the misfolded state (Supplementary Figures S8 and S21).RNA sequences, in general, tend to form di v erse secondary structures because there are likely multiple pairs of partially complementary regions.For instance, in mRNAs that do not have specific tertiary structures, many dif ferent pa tterns of secondary structures have been observed for a single sequence ( 55 ).It is clear that e v en for a well-e volv ed sequence that has a specific tertiary structure, like the ribozyme, non-nati v e complementary pairs can not be entirely avoided ( 56 ).Such mispaired helices ine vitab ly slo w do wn the f olding ( 57 ) and have to unf old, at least partially, before the RNA reaches the nati v e state ( 58 ).
For group I intron ribozyme, it has been suggested that helix P3 has an alternati v e paring pattern (alt-P3) ( 59), which we observe as P3d-J8 / 7 in the simulations.The alt-P3 is thought to be a major reason for the slow folding rates of group I intron ribozyme.In accord with this proposal, it was shown that a point mutation in alt-P3, which stabilizes the corr ect pairing, incr eased the folding rate by 50 times in Tetrahymena ribozyme ( 60 ).In line with the same reasoning, Candida ribozyme, which does not have stable alt-P3, folds ra pidl y without kinetic tra ps ( 61 ).To our knowledge, other combinations of mispaired helices reported here (Supplementary Figure S8) have not been detected in experiments, e v en though such mispaired helices are permissible.Ther e ar e a variety of metastable structur es that r ender the folding landscape of RNA rugged, besides alt-P3.

Topological frustration causes trapping in long-lived metastable states
We also found that the intermediate state, I c , consists of not only on-pathway conformations but also misfolded structures.The Azoarcus ribozyme folds correctly in a shorter time than topolo gicall y similar but larger ribozymes such as group I intron from Tetrahymena .Ne v ertheless, se v eral e xperiments hav e shown the e xistence of metastab le states ( 14 , 18 , 47 ).These metastable states slowly transition to the nati v e state, which can be accelerated by urea ( 8 , 62 ).We did not see refolding e v ents from the misfolded state to the nati v e state within our simulation time (30 ms).This is also consistent with experiments, which showed that the misfolded state is long-li v ed and remained stable after 5 minutes of incubation with Mg 2+ ( 18 ).The estimated refolding rate was 0.29 min −1 at 32 • C with 5 mM Mg 2+ .
In our simulations, the ribozyme misfolded to a topologically frustrated state when one of the peripheral contacts, TL2-TR8 and TL9-TR5, formed early before the majority of the other tertiary contacts were fully formed (Figure 4 ).The reason for the more pronounced involvement of TL2-TR8 and TL9-TR5 in misfolding is that topological entanglement can easily occur when contacts in peripheral regions form first.We surmise that this mechanism is common to the folding of larger ribozymes.
Very fe w e xperiments hav e reported on the details of the misfolded structure due to the difficulty in distinguishing heterogeneous and transient structures.In Tetrahymena ribozyme, it was suggested that the misfolded state has a similar topology to the nati v e state, but with less non-nati v e pairing ( 63 ).The misfolded state is mostly stabilized by nati v e-like interactions, but there is 'strand-crossing', by w hich the RN A conformation is tra pped and unable to recover the native structure.Our results show that this is also the case in the smaller and faster-folding Azoarcus ribozyme.The metastable states found here consist of correct base pairs and tertiary interactions, which imply that they are nati v e-like topolo gical kinetic tra ps.
Recentl y, two groups independentl y reported cryo-EM structures of the topolo gicall y tra pped state of the Tetrahymena gr oup I intr on ( 42 , 43 ), that shares much in common with the ar chitectur e of Azoarcus .In both these structur es, as pr edicted her e, the J8 / 7 strand of the central cor e r egions is in an incorrect position relati v e to the P3 and P7 helices.In Figure 7 , a comparison of these structures with our simulated misfolded structure (J8 / 7) shows remar kab le agreement in the strand arrangement.We surmise that our simulations, conducted without any prior knowledge of the topolo gicall y tra pped states, predict the major structures of the metastable states that are populated during Azoarcus ribozyme folding.Our simulations allow us to trace the process of this misfolding back to the intermedia te sta te (Figure 7 B), thus establishing a kinetic basis for their formation.Compared to the intermediate state in the correct folding pathway (Figure 7 A), we now see how the subtle difference in the folding trajectory of the J8 / 7 strand ultimately leads to the nati v e and topolo gicall y tra pped nati v e-like states, both of which are stabilized by the peripheral contacts in a similar way.

Role of mono v alent ions
Azoarcus ribozyme folds into a compact structure at high concentrations of monovalent ions e v en in the absence of Mg 2+ ( 64 ).Howe v er, Mg 2+ is essential for splicing activity ( 45 ).Ne v ertheless, both pre vious e xperiments ( 64 ) and the recent empirical observation ( 65 , 66 ) that roughly 1 mM Mg 2+ is equivalent to 80 mM of monovalent cations raise the possibility that high monovalent cations could substitute f or Mg 2+ .Theref ore, it is interesting to pose the following question.Are the pathways explored by the ribozyme at high monovalent concentrations or when folding is dri v en b y Mg 2+ equiv alent?This question can only be answered using simulations of the kind r eported her e.Howe v er, such simulations are demanding because large system sizes are needed to obtain reliable results.Despite the difficulties, this would be a problem worth investigating in the future.

DA T A A V AILABILITY
The model structures are available in the online supplementary material.The simulation code and parameter files were deposited in Zenodo https://doi.org/10.5281/zenodo.8246270 .

SUPPLEMENT ARY DA T A
Supplementary Data are available at NAR Online.

Figure 1 .
Figure 1.Structure and Mg 2+ -mediated collapse of the Azoarcus ribozyme.( A ) The secondary structure map shows that se v eral helices are ordered along the sequence.The helices of the Azoarcus ribozyme are conventionally denoted as P2 through P9 (labeled in black).P2, P5, P6, P8, and P9 are hairpin structures in which adjacent segments form the double strands, whereas P3, P4, and P7 are double strands formed by non-local pairs of segments.Se v eral key elements involving tertiary interactions are shown in red: TH, triple helix; SE, stack exchange; G site, Guanosine-binding site; TL2-TR8, tetraloop 2 and tetraloop-receptor 8; TL9-TR5, tetraloop 9 and tetraloop-receptor 5; and AS, the acti v e site.( B ) Tertiary structure, taken from PDB 1U6B ( 12 ).The same colors are used as in (A).( C ) Time dependence of the radius of gyration ( R g ) averaged over 95 trajectories (thick red line).Thin lines show individual trajectories.At the beginning of the simulations t = 0, the average R g is R g ≈ 7.8 nm, corresponding to the value at equilibrium in the absence of Mg 2+ .The fit using three exponential functions (see the main text) is shown by the blue dotted line.Inset: Coefficient of variation (C.V.), calculated as

Figure 2 .
Figure 2. Kinetic partitioning of the trajectories.The blue and red arrows show the fate of the 95 folding trajectories initiated from the unfolded state (top left).The numbers beside the arrows indicate the fraction of trajectories in different pathways.In the compact structural ensemb le, se v eral r epr esentati v e misfolded structures, obtained from a clustering analysis, are shown.The structures labeled 'Misfold (P2)' (shown in the middle of the panel in the box) and 'Misfold (J8 / 7)' (lower left) are topological traps.Within the simulation time, 58% of the trajectories reached the folded (nati v e) state (right bottom), whereas 15% were topolo gicall y tra pped in the J8 / 7 misfolded state (left bottom).The r emaining trajectories (27%) ar e trapped in the compact-structur e ensemble, partly because helices are mispaired (energetic trap).

10742Figure 3 .
Figure 3.A r epr esentati v e r apid folding tr ajectory.The ribozyme folds in t ∼ 2.4 ms without being kinetically trapped.( A ) Time dependent changes in R g .(B-D) Fraction of helix f ormations f or ( B ) P3, ( C ) P6 and ( D ) P7. (E-J) Fraction of major tertiary interactions ( E ) TL2-TR8, ( F ) TL9-TR5, ( G ) Triple Helix, ( H ) Stack Exchange, ( I ) G site and ( J ) Acti v e Site.See Figure 1 (A), (B) for the locations of these structural elements.Thin lines, with light colors are raw data, and thick lines with dark colors are averaged over 50 s window.(Bottom) Eight r epr esentative structur es at severaldifferent time points.Major conformational changes are indicated by blue circles with labels.See Supplementary Movie 1 to watch the trajectory.

Figure 4 .
Figure 4. Propensity to fold correctly is determined early.( A ) A r epr esentati v e misfolding trajectory showing the formation of long-li v ed topolo gicall ytrapped states.Two key interactions in the peripheral regions, TL2-TR8 and TL9-TR5, formed early in the folding process ( t < 1 ms), resulting in the junction J8 / 7 with an incorrect topology (See Figure7).Because the incorrect chain topology cannot be resolved unless both of the peripheral interactions unf old, the RNA sta ys topologically trapped for the rest of the simulation time.See Supplementary FigureS14for trajectories of R g , fractions of secondary and tertiary elements, Supplementary FigureS15for another example of a kinetic trap, and Supplementary Movie 2 for watching the trajectory.( B ) Specific collapse leads to rapid folding.From the distributions of the first passage times to the folded state, P F F P ( τ F ) , we calculated, F ( t) =

Figure 5 .
Figure 5. Hierarchical formation of secondary and tertiary structures.(A-D) Time-dependent formation of structural elements r epr esented by the potential energies associated with ( A ) secondary hydrogen bond (H-bond), ( B ) secondary stacking, ( C ) tertiary H-bond, and ( D ) tertiary stacking.The thin lines are the results for the 95 individual trajectories, and the thick line in each panel is the average over all trajectories.( E ) Time dependence of helix formation for helices P2 to P9. ( F ) Formation of six key elements associated with tertiary interactions (see Figure 1 ).

Figure 6 .
Figure 6.Fingerprints of Mg 2+ association and correlation with tertiary contact formation.( A ) Number of cations condensed onto the RNA averaged over all the trajectories.At t = 0, on average, there are 40 K + ions in the neighborhood of the ribozyme.K + cations are ra pidl y replaced by Mg 2+ in t 0.02 ms.(B-D) Mg 2+ fingerprints, measured using local nucleotide-specific concentra tion a t ( B ) t = 0.15 ms, ( C ) 1.6 ms, and ( D ) 30 ms.Symbols indicate nucleotides that are either in direct contact with Mg 2+ (gr een cir cles) or linked via a water molecule (cyan squares) in the crystal structure ( 12 ).The red crosses are nucleotides predicted to bind Mg 2+ in the equilibrium simulations ( 19 ).( E ) Binding rates calculated from the mean first passage times of Mg 2+ binding e v ent at each nucleotide.(F, G) Trajectories displaying contact Mg 2+ concentration at nucleotides.( F ) A127 and ( G ) G130 taken from the folding trajectory shown in Figure 3 .The folding times of the G site and Triple Helix (TH) are indicated by black arrows.( H ) A snapshot, at t = 0.11 ms, when the G site is formed in the trajectory in (F) and (G).Nucleotides A127 and G130 are shown in stick, and Mg 2+ ions are spheres in red.(I, J) Scatter plots of the first passage times of Mg 2+ binding to (I) A127 and ( J ) G130 versus the first passage time of the G site formation.If the Mg 2+ binding and the G site formation occurred concurrently (| G − i | < 0.2 ms), the two e v ents are regarded as strongly correlated and plotted in red; otherwise it is shown in blue.In each panel, the fraction of strong corr elations (r ed points) is indicated by the percentage in red, associated with the Pearson correlation ( ) calculated using all the data points.(K, L) Same as (I, J) except for Mg 2+ binding to ( K ) U51 and ( L ) U126 versus the first passage time of Triple Helix (TH) formation.

Figur e 7 .
Figur e 7. Topolo gical frustra tion in the persistent metastable sta te.(A-C) The major misfolded structure (J8 / 7) and its intermedia te ( B ) ar e compar ed the nati v e intermediate and the folded structures ( A ).The spatial arrangements of J8 / 7 and other strands at the cor e ar e depicted on the right in blue and r ed cir cles.In the misfolding intermedia te (B , left), the strand J8 / 7 (colored in red) passes through an incorrect location relati v e to the other two strands of the P3 helix (cyan), resulting in a topolo gicall y tra pped state.This incorrect topolo gy cannot be resolved unless other tertiary contacts such as TL2-TR8 (contact between domains P2 (orange) and P8 (blue)) disengage, which is unlikely to occur under the folding condition.As a consequence, it leads to a metastable state that is as compact as the folded state but with an incorrect chain topology.In panel ( C ), the same region in the nati v e (upper, PDB 7EZ0 ( 41 )) and misfolded (lower, PDB 7UVT( 42 ) and PDB 7XSK (43)) Tetrahymena ribozyme, solved by cryo-EM are shown for comparison.( D ) Comparison of experimental footprinting data with SASAs calculated from simulations.Protection factor ( F p ) are calculated for the nati v e (middle, b lue) and misfolded ensembles (bottom, red) fr om simulations.Pr otected nucleotides, indicated by gr een-filled r ectangles on top, are from data in three experiments, as labeled on the right( 17 , 31 , 44 ).The secondary structure is shown on top for r efer ence.Note that protections of nucleotides in the P2 helix were not resolved in two experiments for technical reasons.