Understanding DNA interactions in crowded environments with a coarse-grained model

Abstract Nucleic acid interactions under crowded environments are of great importance for biological processes and nanotechnology. However, the kinetics and thermodynamics of nucleic acid interactions in a crowded environment remain poorly understood. We use a coarse-grained model of DNA to study the kinetics and thermodynamics of DNA duplex and hairpin formation in crowded environments. We find that crowders can increase the melting temperature of both an 8-mer DNA duplex and a hairpin with a stem of 6-nt depending on the excluded volume fraction of crowders in solution and the crowder size. The crowding induced stability originates from the entropic effect caused by the crowding particles in the system. Additionally, we study the hybridization kinetics of DNA duplex formation and the formation of hairpin stems, finding that the reaction rate kon is increased by the crowding effect, while koff is changed only moderately. The increase in kon mostly comes from increasing the probability of reaching a transition state with one base pair formed. A DNA strand displacement reaction in a crowded environment is also studied with the model and we find that rate of toehold association is increased, with possible applications to speeding up strand displacement cascades in nucleic acid nanotechnology.

OxDNA and its interaction potentials have been described in detail elsewhere [1][2][3][4] . In this work, we use the averagestrength parameterization of the model from reference 4 . The model represents DNA as a 1D chain of nucleotides, where each nucleotide (sugar, phosphate and base group) is a rigid body with three interaction sites. The potential energy of the system can be decomposed as where the first sum is taken over all nucleotides that are nearest neighbors on the same strand and the second sum comprises all remaining pairs. The interactions between nucleotides are schematically shown in Fig. S1. The backbone potential b.b. is an isotropic spring that imposes a finite maximum distance between backbone sites of neighbors, mimicking the covalent bonds along the strand. The hydrogen bonding ( HB ), cross stacking ( cr.st. ), coaxial stacking ( cx.st. ) and stacking interactions ( stack ) are anisotropic and explicitly depend on the relative orientations of the nucleotides as well as the distance between the relevant interaction sites. This orientational dependence captures the planarity of bases, and helps drive the formation of helical duplexes. The coaxial stacking term is designed to capture stacking interactions between bases that are not immediate neighbors along the backbone of a strand. Base and backbone sites also have excluded volume interactions exc and ′ exc . Finally, we treat the electrostatic interactions using the Debye-Huckel approximation ( DH ), with effective charges parameterized to reproduce the stability of short duplexes at sodium concentrations ranging from 0.1 to 1 M 4 . Hydrogen-bonding interactions are only possible between complementary (A-T and C-G) base pairs. In the averagestrength parameterization that we use for all simulations, the strengths of interactions stack and HB are set to be the same for all types of nucleotides, parameterized to reproduce melting thermodynamics of short duplexes and hairpins with average-sequence content as predicted by SantaLucia's nearest-neighbor model 5 .
To account for the presence of the crowders, we introduce a new interaction potential crow. exc. into the oxDNA model, that consists of the following terms: crow. exc. = crowder crowder + crowder back + crowder base (S2) where the RHS of Eq. S2 correspond to the excluded volume interaction between two crowders, between a crowder and a nucleotide's backbone site, and between a crowder and the nucleotide's base site respectively. The functional form of the crowder excluded volume interaction potential is given by otherwise. (S3) which consists of a Lennard-Jones potential function that is truncated using the quadratic smoothing function ensuring that the potential is a differentiable function that is equal to 0 after a specified cutoff distance cut . We set * = 2 c (where c is crowder radius) for the crowder-crowder interaction, and to * = c + b for the interaction with the backbone site or the base site, where b corresponds to the radius of the backbone site or the base site in the oxDNA model respectively.
The parameters for each of the respective terms of crow. exc. potential are listed in Table S1 for the crowder radii 0.85, 1.7 and 2.56 nm respectively. For interaction potentials shown in Table S1, crowder−back corresponds to the distance between the center of the crowder sphere and the backbone site on the nucleotide in the oxDNA model, and crowder−base is the distance between the center of the crowder sphere and the base site of the nucleotide. Finally, crowder−crowder is the distance between the centers of two crowder spheres.
As discussed in the main text, oxDNA has been extensively tested for other DNA properties and systems to which it was not fitted. Our success in describing all these phenomena gives us confidence to use it to study the dynamics of hybridization, hairpin formation, and strand displacement in the presence of crowders.

SII. Simulation Methods
A. Thermodynamics

Virtual Move Monte Carlo
A standard approach for calculating the thermodynamic properties of computational models is the Metropolis algorithm 6 . A drawback with this approach is that only moving single particles at a time results in slow equilibration for systems with strong attractions. This is true for DNA strands, where collective diffusion is strongly suppressed if nucleotides are moved individually. Simulations can be made more efficient by using the Virtual Move Monte Carlo (VMMC), which allows for collective diffusion using cluster moves of particles 7 . Specifically, we use the variant presented in the appendix of reference 7. Initially, a particle is selected, and a move is chosen at random as in the Metropolis algorithm. The particle's neighbors are then added to a co-moving'cluster' with probabilities determined by the energy changes that would result from the move. Consequently, multiple particles tend to move at once. To use VMMC, we must select 'seed' moves of a single particle. For all VMMC simulations reported here, the seed moves were: • Rotation of a nucleotide about its backbone site, with the axis chosen uniformly on the unit sphere and the angle drawn from a normal distribution with a mean of zero and a standard deviation of 0.22 radians.
• Translation of a nucleotide, where the displacement along each Cartesian axis is drawn from a normal distribution with a mean zero and a standard deviation of 0.15 simulation units of length (0.1277 nm).

Umbrella Sampling
An important concept is that of a reaction coordinate (or order parameter) , which groups together microstates of a system that share some macroscopic property (for example, all configurations of strands with a certain number of base pairs). The free-energy profile as a function of can provide useful information about the reaction, provided an appropriate choice has been made. Free-energy barriers can make certain regions of configuration space hard to reach, which prevents efficient sampling of all of the states of interest. The free-energy landscape can be artificially flattened by weighting states with different values of appropriately, a technique known as umbrella sampling 8 . Thermodynamic properties of the system can then be extracted from simulations by unweighting the resulting distributions.
In particular, for an unweighted simulation a particular microstate with coordinates q and energy (q ) is sampled with probability The equilibrium average of some variable (q ) is then given by the sum over all states, weighted by their Boltzmann factors: By applying a weighting = ( (q )) to each value of the order parameter, we change the sampling frequency to where the subscript indicates a property of the weighted system. So we can artificially ensure that our simulation samples all states equally by making constant for all microstates. Equilibrium thermodynamic properties are then obtained by unbiasing afterwards, as can be seen by rewriting Eq. (S7) as follows: Throughout this article, it makes sense to use the number of base pairs in our definition of . This is the usual choice for studying hybridization processes. The weights used in our simulation were found iteratively, first by making an educated guess for the approximate values and then iteratively adjusting them so that the simulation would sample all values of and spend approximately the same amount of time in each state corresponding to sampled values of . The final values are available in the github repository along with our code.
B. Kinetics

Molecular Dynamics
Kinetic simulations were performed using an Anderson-like thermostat, similar to the one described in appendix A of reference 9. The Newtonian equations of motion for the system are integrated by Verlet integration 10 with a discrete time-step , so that the positions, velocities, orientations, and angular velocities of the nucleotides are recalculated at each time-step. This alone would give the DNA strands constant energy and cause ballistic motion. In reality, DNA in a solvent is being bombarded by water particles and thus undergoes Brownian motion. To model Brownian motion, the velocity of each nucleotide is resampled with a probability = 0.02 from a Maxwell-Boltzmann distribution at the temperature of the solvent every 103 time steps. The algorithm also resamples angular velocities with a different probability = 0.0067. The solvent thus acts as a large heat bath at a fixed temperature, ensuring that the simulated system samples from the canonical ensemble. On time scales longer than / , where is the integration time step, the dynamics is diffusive. We choose = 1.52 × 10 −14 s for all dynamics simulations in this study. In oxDNA this time step gives a diffusion constant for a 14-mer duplex that is about 100 times higher than experimental measurements 11 of = 1.19 × 10 −10 m 2 s −1 . This artificial increase in is a common procedure for coarse-grained models where higher diffusion constants can be used to accelerate diffusion. Accelerated diffusion can also speed up certain processes by smoothing out, on a microscopic scale, energy profiles 12 . This can be advantageous because it means simulations utilizing coarse-grained models can be used to study more complicated systems. In a previous study using oxDNA, the hybridization kinetics of a non-repetitive sequence was considered 13 . In that study, it was shown that using higher friction constants (smaller diffusion coefficients) in simulations utilizing Langevin dynamics at 300K slowed down hybridization, but did not otherwise qualitatively affect the results. In particular, the tendency for initial base pairs to melt away rather than lead to a fullduplex was found to be preserved. Our systems are similar to those studied in reference 13, possessing similar numbers of total base pairing between the strands, and using a similar simulation temperature. Additionally, many approximations of real DNA have already been made in the construction of the oxDNA model, and we expect that running simulations with a diffusion coefficient that is larger than the experimentally measured value should preserve the effects that hairpins in single strands have on the relative hybridization and dissociation rates.

Forward Flux Sampling
'Brute force' dynamics simulations using an Anderson-like thermostat are not efficient enough to generate a representative ensemble of trajectories that start in the single-stranded state and end in the duplex state. Thus, we resorted to using Forward Flux Sampling (FFS) to more efficiently calculate fluxes between local free-energy minima as well as sample the transition pathways. The term 'flux' from (meta)stable state A to state B has the following definition: Given an infinitely long simulation in which many transitions are observed, the flux of trajectories from A to B is where is the number of times the simulation leaves A and then reaches B, is the total time simulated, and is the fraction of the total time simulated for which state A has been more recently visited than state B.
FFS requires use of an order parameter, , which provides a descriptive measure of the extent of the reaction between states A and B. Additionally, the order parameter must be chosen such that non-intersecting interfaces −1 can be drawn between consecutive values of . At the beginning of an FFS implementation, a brute force simulation is run starting from states described by = -2, and the flux of trajectories crossing the surface 0 −1 is measured. The total flux of trajectory from = -2 to another free-energy minimum = can be calculated as the flux of trajectories crossing 0 −1 multiplied by the probability that trajectories subsequently reach = max , all before returning to = -2. The probability can be factorized as The first term in the product on the right-hand size of Eq. S11 is calculated by loading random configurations that have just crossed 0 −1 , which are used to estimate ( 1 0 | 0 −1 ). The process is then iterated for successive interfaces, and the flux as well as the trajectories that successfully reach from the distribution of pathways can be measured.

SIII. Simulation Protocols
In this section, we discuss the implementation of the algorithms of Section SII for both single-stranded and duplex systems. We simulated the hairpin formation, duplex formation, and toehold-mediated strand displacement systems using FFS and VMMC simulations. We used a 'bonds' order parameter that measures the total number of base pairs, which can be specified to be intra-(for hairpins) or inter-strand (for duplexes and stand displacement) base pairs. The definition of a bonded base pair in our simulations is two bases with a hydrogen bonding energy below 0.596 kcal mol −1 . This value for the selected cutoff corresponds to about 15% of typical hydrogen-bond energy. In FFS simulations, we additionally used a 'distance order parameter' to measure the minimum distance between hydrogen-bonding sites overcorrect pairs of bases in the two strands.

Duplex and Hairpin Thermodynamics
For the duplex and hairpin formation, we defined order parameters as the number of native bonds present in the stem for hairpins, and between two duplexes. The weights were manually adapted in order to assure proper sampling of all values of the order parameter. For each condition studied (as defined by crowder radius c and volume fraction occupied by crowders), we run at least three separate replicas of a freeenergy calculation. The unbiased number of states for each value of the order parameter (using the procedure outlined in Sec. SII A 2) were used to plot the free-energy diagram. Histogram re-weighting was used to extrapolate free-energy profiles to other temperatures that the one at which the VMMC umbrella sampling simulations were run.
For each of the studied crowding conditions, we ran at least three independent umbrella sampling simulations. The error bars shown in the free-energy profiles in the main text in in the Supporting Information correspond to the standard deviation for the given value from all replicas for a given system. Each VMMC umbrella sampling simulation was run for at least 10 9 steps.
For the melting temperature calculation in duplex formation simulations, care must be taken in extrapolating from a simulation of two strands to a bulk solution with many more strands, because fluctuations in local concentrations play an important role. If Φ is the ratio of bound to unbound states in a simulation of two molecules, the yield of a nonself-complementary duplex in a bulk solution (with the same average concentration of reactants) is given in reference 14 as dim bulk = The melting temperature occurs when dim bulk = 1/2, which corresponds to a simulation yield of = 2.

FFS Simulation Details
In this section, we discuss the implementation of the FFS algorithm, discussed in Section SII. As mentioned in Section SII B 1, we simulated the duplex, hairpin, and strand displacement systems using molecular dynamics at = 37 ∘ ∘ C. We also use the same definition of a bonded base pair that was discussed in Section SIII.

Order Parameter Used in FFS Simulations
The order parameter used in the simulations are detailed in Tables S3, S2, and S4 for hairpins, duplex formation, and strand displacement respectively. Specifically, we use a combination of distance and base pairing criteria as outlined in Section SIII. Distance criteria are used to define states = −2 → 2, and bonding criteria for states =≥ 0. In all the FFS simulations, the strands are designed to only allow for base pairs between correctly aligned native base pairs. The flux is measured using a brute force simulations, counting the number of states per unit time that pass through interface = 0 coming from state = −2. The transi-  16   TABLE S4. The order parameter used in FFS simulations of strand displacement. The parameter is the minimum distance between any intended base pair on the two strands, is the number of base pairs between the two strands. A base pair is taken to be present if the hydrogen-bonding energy is less than −0.596 kcal mol −1 for x tion probabilities that we measure for hairpin (or duplex or strand displacement) correspond to transition through interfaces +1 for = 0 and = 1 (and also = 2 for strand displacement).

SIV. Scaled Particle Theory
For reactions of the type P(solution) ↔ P(aggregate), which include here the unimolecular hairpin folding and the bimolecular duplex formation reactions, the thermodynamic solubility constant absent crowders may be written 15 where is the activity, is the activity coefficient, and is the concentration of ssDNA. When crowders are present ( > 0), the equilibrium constant is taken to be proportional to that absent crowders as ≡ 0 . (S15) In the main text we showed that both the thermodynamics and kinetics results results obtained from oxDNA simulations for a variety of crowder sizes and concentrations confirm that crowders do not significantly influence dissociation of hairpins or duplexes. As a result we may write where 0 , are the rate constants when crowders are absent and present at finite , respectively.

A. Hairpin Formation
In the case of the unimolecular reaction, an unstructured single-strand of DNA (said to be in state A) folds into a hairpin (said to be in state * ). Both the crowder and the ssDNA are assumed to be hard spheres. For this reaction type, SPT may be used to compute the parameter 1 as 15 (S17) where 1 = 3 + 3 2 + 3 (S18) 2 = 3 3 + 4.5 2 (S19) where ≡ / , is the radius of a ssDNA ( may be ℎ for the strand that may fold into the hairpin, or 8-mer for either of the two strands that come together to form the duplex), and is the crowder radius.

B. Duplex Formation
In the 8-mer system, two unstructured, complementary strands of DNA (taken to be states and * ) come together to form a duplex (taken to be the state * ). The two unbound ssDNA in states or * are again initially modeled as hard spheres having the same radii . The duplex in state * is taken to be a hard spherocylinder comprised of two adjacent (hard-sphere) DNA strands with radius along the z-axis of the cylinder denoted , and the ratio of the axial length to the cylinder diameter is denoted as . In reality, a 'perfect' DNA duplex, where strand is the reverse complement of * and vice versa, is cylindrical in shape, but does not have capped ends, and is only 'hard' for lengths well below the persistence length of DNA (about 150 base pairs). The duplexes investigated here were only 8 bases long and as such were extremely rigid.
For both hairpin or duplex formation reactions investigated here, the relative free-energy difference between crowder-free and crowded environments, ∆∆ , relates to the equilibrium constants through which can be rewritten for the case of duplex formation as ln where the left hand side of Eq. S32 is computed according to SPT, while the right hand side is computed by estimating the free-energy terms using oxDNA simulations.
Following the approach taken in reference 18, we fit the SPT model to the oxDNA data by using the radius of the unfolded single-strand state in each hairpin and duplex systems as a fit parameter, and the spherocylinder radii in the duplex system. The former measurement represents the "effective" radius of the unfolded states as the crowder size and volume fraction varies while the latter is the "effective" spherocylinder radius. We also ran long MD simulations to obtain representative trajectories from which we measured the end-to-end separation ee in the single strand states ( Fig. 4(a)(i) and (b)(i)), and the radius of the folded hairpin state (Fig. 4(a)(ii)) to compare the oxDNA predicted radii with the fitted effective radii. The sphero-cylindrical radius of the duplex, , is illustrated in Fig. 4(b)(ii). For the unfolded single strands in Fig. 4(a)(i) and (b)(i), we measured the end-to-end distance ee between terminal bases for the configurations in a trajectory to obtain an average measurement. For the hairpin shown in Fig. 4(a)(ii), we measured the average separation between a terminal stem base and a base in the middle of the loop, that is we measured the radius two times per configuration, then averaged them to obtain the effective radius of the folded structure.
In Table S5, we list the fitted radii used to compute the LHS of Eq. S32 and the computed radii from oxDNA simulations. The results of the thermodynamics comparison between SPT and oxDNA are shown in Figure 4 in the main text. As can be seen in Table S5, we observed no dependence of oxDNA-computed ee on either the investigated values ℎ (*) ℎ (oxDNA) ℎ (oxDNA) of or for both the hairpin and the duplex systems, in either folded or unfolded states. Additionally, the Table S5 also shows that as the effective radius of the unfolded single strands increases somewhat from smallest crowder to largest crowder, and is also independent of (not shown). Finally the sphero-cylindrical radius of the duplex was observed to be nearly independent of crowder size or volume fraction, (*) = 0.85 nm, significantly less than that measured previously with oxDNA where (oxDNA) = 1.15 nm. 1 We rationalize that the radii measurements obtained by oxDNA yield volumes for hard spheres that overestimate the true volume occupied by the DNA. This is because, in oxDNA, the ssDNA behaves as a freely jointed chain with excluded volume and does not have a fixed geometry. Similarly in the case of the duplex being modeled as a spherocylinder, DNA, in reality, is cylindrical with blunt ends and not capped-ends and so likely overestimates the true volume occupied by the duplex.