Hybridization kinetics of out-of-equilibrium mixtures of short RNA oligonucleotides

Abstract Hybridization and strand displacement kinetics determine the evolution of the base paired configurations of mixtures of oligonucleotides over time. Although much attention has been focused on the thermodynamics of DNA and RNA base pairing in the scientific literature, much less work has been done on the time dependence of interactions involving multiple strands, especially in RNA. Here we provide a study of oligoribonucleotide interaction kinetics and show that it is possible to calculate the association, dissociation and strand displacement rates displayed by short oligonucleotides (5nt–12nt) that exhibit no expected secondary structure as simple functions of oligonucleotide length, CG content, ΔG of hybridization and ΔG of toehold binding. We then show that the resultant calculated kinetic parameters are consistent with the experimentally observed time dependent changes in concentrations of the different species present in mixtures of multiple competing RNA strands. We show that by changing the mixture composition, it is possible to create and tune kinetic traps that extend by orders of magnitude the typical sub-second hybridization timescale of two complementary oligonucleotides. We suggest that the slow equilibration of complex oligonucleotide mixtures may have facilitated the nonenzymatic replication of RNA during the origin of life.


2-AMINOPURINE EFFECT ON RNA THERMODYNAMICS
In this work we used 2-aminopurine as a probe for paired/unpaired state on RNA oligonucleotides. One of the main concerns regarding this approach has been whether 2-aminopurine could have heavily altered the behavior in such a way that it could have not been extended to adenine. In order to test this hypothesis, we have performed UV melting experiments of a short 5nt oligonucleotide (GCGUG) paired on a template with either adenine or 2-aminopurine in the second position. The results obtained from Van't Hoff analysis are as follows: In agreement with literature data on DNA(1) we find a modest destabilizing effect of 2-aminopurine at 37°C, which becomes negligible approaching room temperature. This observation reinforces the validity of our approach and of our results. Moreover, no significant difference could be measured by changing the buffer. Data used for the analysis are shown below: Adenine -60.20 0.1664 -10.58 Tris-HCl 5mM, 1M NaCl, pH 7

COMPARISON OF BUFFER CONDITIONS
To check whether our main buffer condition (100mM Tris Buffer pH 8.0 200mM MgCl2) could produce results comparable with a more standard condition (5mM Tris Buffer pH 7.0 1M NaCl) we performed a subset of measurements in the latter buffer. In Supplementary Section 1 we show that the thermodynamics is not affected. Here we show a global fit for the hybridization of a 8nt long oligo that clearly shows how kon, koff and KD are identical. Oligonucleotides used are A: UACAAGAUUC2ApU and B: AUGAAUCU. Parameters obtained through this characterization are kon 1.09e7 M -1 s -1 , koff 1.35s -1 and KD 1.24e-7 M -1 .

CONCENTRATION OF 2-AMINOPURINE CONTAINING OLIGORIBONUCLEOTIDES
Since there are no published values for the extinction coefficients of 2Ap-containing RNA oligoribonucleotides, we tested whether the values calculated using the parameters from Xu and Nordlund (2) and IDT oligo analyzer for 2Ap-DNA were also accurate for our molecules. To do so, we performed a series of binding experiments titrating a probe RNA sequence containing 2Ap with a target complementary sequence of known concentration (determined through UV-Vis absorbance). In the regime of [Target] >> KD, every molecule of the Target will immediately bind to the probe, quenching the 2Ap signal in a linear fashion. This process goes on until the probe is completely saturated, switching to a Target-independent signal. The transition between these two regimes marks the true concentration of the Probe. The oligonucleotide extinction coefficient at this point can be simply calculated by dividing the absorbance measured at 260nm in a 1cm cuvette for the probe concentration determined by titration. Figure 3. (a) Example of extinction coefficient determination for a 2Ap-containing RNA oligonucleotide (probe) titrated with a complementary target. The transition between linear behavior and saturating regime marks the concentration of the probe. (b) Comparison between experimentally determined extinction coefficients and the ones predicted using Xu & Nordlund parameters or IDT oligo analyzer.

ANALYSIS OF STOPPED-FLOW DATA
Measurements of association kinetics were performed using a Jasco FP-8500 Spectrofluorometer equipped with the SFS-852T Stopped-Flow accessory monitoring the fluorescence emission of a 2Apcontaining oligonucleotide. Settings used for acquisitions are the following: • Ex bandwidth 5 nm • Em bandwidth 10 nm • Ex wavelength 300 nm • Em wavelength 370 nm For long oligonucleotides studied in this work the reaction goes to completion in roughly one second with complete binding. In this regime our data are only sensitive to an upper limit of koff since the dissociation rate is fundamentally negligible. We can thus fit our data [A](t) with a simple second order reaction and extract kon. As a rule of thumb, the data are sensitive to koff when it is roughly comparable to kon[B]010 -2 , which corresponds to ≈ 10 -1 s -1 and a binding affinity KD equal to 10 -8 M. For short oligonucleotides having KD > 10 -8 M (G > -11 kcal/mol), dissociation is not negligible, and binding is not complete in the micromolar range. In this case a series of measurements at increasing concentrations of oligonucleotide B have been fit altogether with shared kon and koff:

1) d[A]/dt = -[A]⋅[B]⋅kon + [AB]⋅koff 2) d[B]/dt = -[A]⋅[B]⋅kon + [AB]⋅koff 3) d[AB]/dt = [A]⋅[B]⋅kon -[AB]⋅koff
For such cases, the MATLAB fminsearch function has been employed to minimize the residuals from the experimentally measured [A](t) and that calculated from the set of differential equations solved with ODE15s. The best fit parameters error has been determined from the sum square error as described in literature (3) using a threshold calibrated on our dataset as equal to 0.9.

COMPARISON WITH NN CALCULATIONS
Once we determined that the 2Ap effect on RNA duplex stability at room temperature is negligible, we asked how our measurements compare with predictions from the NN database. We gathered G values from multiple techniques, finding in every case a good agreement with NUPACK predictions as shown in Supplementary

COMPARISON OF UV MELTING AND FLUORESCENCE MELTING
We decided to test whether some discrepancy from the NN-determined G at room temperature could be due to differences between fluorescence melting and UV melting results. To evaluate this, we followed the melting of a 2Ap oligonucleotide through fluorescence while changing temperature and compared it to the regular adenine-containing oligonucleotide measured through UV melting. Results are shown below for sequences 12a (or 12a*) and 8b, and suggest that no methodological difference is present. Figure 5. Comparison of melting temperatures measured tracking either fluorescence emission (blue) in a 2Ap-containing oligonucleotide or UV absorbance (orange) in a regular adeninecontaining oligonucleotide with the same sequence.

HYBRIDIZATION KINETICS
To derive the formula for the hybridization rate in RNA we proceed as follows: from the reaction scheme in Eq. 5 (main text), under the assumption of a nucleation-limited process, we can define kon (see Supplementary Table 1 for list of names and definitions) as the following summation for every i-th initial contact over the total number of initial contacts (N), with N equal to the length of the shortest of the two strands that attempt annealing: where ri is the probability of complete zippering after the first i-th contact has been established. Ideally the contact-dependent probability function r should take into account every possible trajectory that the two oligonucleotides can follow in order to move from the initial state to the completely annealed state. Because this is impossible to calculate analytically, we approximated it here by considering only the addition and removal of adjacent base pairs with no bulges or internal loops (4,5). One important consequence of this approximation is that all nucleation events considered must be in-register (i.e. the initial base pairs will also be present in the final duplex), since all off-register events have no way of correcting their trajectory to align as they zipper.
This simplified the problem, allowing us to define this probability function r as the product of two terms, pn and pz. We first defined the probability (pn) of going from an AB1 to an AB2 state from a given ith initial contact according to the reaction scheme previously shown, and considering that for every initial binding site except the ones occurring at the termini there are two possible zippering directions: 2) This function determines the probability of reaching the state AB2, after which the zippering can proceed as a random walk with probability (pz) of undergoing a new step equal to: 3) with G calculated as the energy difference between the paired and unpaired states as previously defined. We computed (based on NN thermodynamic parameters) two extreme pz values for the probability of propagating the zippering by a single base pair, finding values approximately equal to 0.84 for fCG = 0 and 0.99 for fCG = 1. Given the high chance of this event, the overall success in hybridizing two strands will be heavily dominated by the probability pn of formation of the second base pair after the initial contact. However, if we assume that the probability of successful zippering is controlled uniquely by pn, we will overestimate r, since we would neglect the probability of any failing trajectory after the state AB2 has been reached. We calculate this error as follows: we define as the overall chance of failing the zippering and eventually falling back to the AB1 state after states AB2, AB3, AB4 etc. have been reached. If is close to zero, we can effectively neglect pz and consider that the overall success rate is controlled uniquely by pn. This value could be calculated through the Gambler's Ruin analysis (6), resulting in a probability of ~16% of falling back to AB1 after the second nucleobase is bound for the fCG = 0 case (for fCG = 1 this value is lower than 0.1%), plus an additional smaller probability of ~3% of falling back to AB1 from AB3, AB4 and so on. It follows that by considering the annealing as successful once AB2 is reached, we will incur a maximum error in estimating r which is lower than 19% in the worst-case scenario, and specifically equal to • (1 − ). Since we have established that the formation of a second adjacent base pair dominates the overall probability of complete zippering, it follows that the formation of two adjacent base pairs can be effectively defined as the limiting nucleation event. We therefore approximate pz to 1 and redefine kon using the following equation: When dealing with the stochastic simulations, we built up the success probability r for the annealing of two strands by resampling hybridization trajectories many times. Since the trajectories were solved using the Gillespie algorithm, we could compute an accurate kon that accounts for the time spent by the two strands in solution before colliding and the time spent in each zippering process (7), leading either to success with an average time spent or to failure with an average time spent . We should define first a collision time, which is the time spent by two oligonucleotides in solution before forming one among N possible initial contacts: (Supplementary Eq. 5) From this we could compute the kon taking into account the time spent by the strands to form the initial base pair and the time spent in failed attempts before their successful annealing: where conc is the oligonucleotide concentration (in our case we use 1 ), N the total number of initial binding sites and r the average success rate over all initial interactions.

FOUR-WAY STRAND EXCHANGE
To rule out the contribution of four-way strand exchange in our studies, we prepared 4 equimolar mixtures of pre-annealed duplexes (8a-12b* and 8b-12a) with 4nt-long complementary overhangs, at different total concentrations. In all these mixtures the expected concentration of free single stranded oligonucleotides during the reaction varies by less than a factor of two, while the total concentration of duplexes varies by a factor of 60. If an equilibration pathway going through the direct interaction of the two duplexes does exist, we would expect to see a concentration-dependence of our time traces. The concentration-independence of the measured time traces points at a negligible four-way strand exchange, with an upper limit for the bimolecular rate approximately equal to 10 2 M -1 s -1 , a value that would otherwise lead to a measurable acceleration of our reactions.

PREDICTION OF ANNEALING IN MIXTURES
To determine the evolution over time of a mixture of RNA oligonucleotides, we define a system of differential equations to describe the behavior of each component. For a symmetric system as the one studied in this work, where the length and sequence of the bound stretches in ALBS, ASBL and ASBS is the same, we can describe the reactions occurring as follows (association phenomena are highlighted in blue and strand displacement processes are highlighted in red for better clarity): To calculate the respective contributions of each pathway to reaching the equilibrium state, we introduced two extra differential equations in our system: Direct Hybridization Pathway: AL . BL . k L on Strand Displacement Pathway: ASBL . AL . kdispl These will produce proxy values of [ALBL] over time as monotonically increasing functions, and the relative amplitude of the two can be used to assess the contribution to the total [ALBL] formation. It should be noted that these functions are decoupled from the other differential equations, meaning that these are only used as readouts, and do not affect the concentration of the different species participating in the reaction.
All parameters have been calculated as described in the main text: • kdispl is computed as ks⋅KA, with ks = 8.9 s -1 . Two displacement rates should be considered here, one for AL displacing ASBL and one for BL displacing ALBS. For a symmetric system the two numbers are going to be extremely close and can be approximated as a single rate. • Association kinetics k L on and k S on have been calculated using Eq.7 and Eq.8 from main text.
• Dissociation kinetics k L off and k S off have been calculated as kon⋅e ΔG/(R⋅T), where ΔG is the binding energy of the two associated oligonucleotides.
To determine a prediction interval, we varied the parameters within their typical experimental error found in the context of this work: 3% for ΔG, 10% for kon and 5% for ΔGtoehold.