Improved nearest-neighbor parameters for the stability of RNA/DNA hybrids under a physiological condition

Abstract The stability of Watson–Crick paired RNA/DNA hybrids is important for designing optimal oligonucleotides for ASO (Antisense Oligonucleotide) and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)–Cas9 techniques. Previous nearest-neighbour (NN) parameters for predicting hybrid stability in a 1 M NaCl solution, however, may not be applicable for predicting stability at salt concentrations closer to physiological condition (e.g. ∼100 mM Na+ or K+ in the presence or absence of Mg2+). Herein, we report measured thermodynamic parameters of 38 RNA/DNA hybrids at 100 mM NaCl and derive new NN parameters to predict duplex stability. Predicted ΔG°37 and Tm values based on the established NN parameters agreed well with the measured values with 2.9% and 1.1°C deviations, respectively. The new results can also be used to make precise predictions for duplexes formed in 100 mM KCl or 100 mM NaCl in the presence of 1 mM Mg2+, which can mimic an intracellular and extracellular salt condition, respectively. Comparisons of the predicted thermodynamic parameters with published data using ASO and CRISPR–Cas9 may allow designing shorter oligonucleotides for these techniques that will diminish the probability of non-specific binding and also improve the efficiency of target gene regulation.


INTRODUCTION
The hybridization of RNA with complementary DNA occurs in many key steps of various significant biological processes, including in Okazaki fragmentation during DNA replication (1), in the transcription bubble (2), and during the reverse transcription process in the life cycle of human immunodeficiency virus (HIV) (3). RNA/DNA hybrids are also important for their therapeutic applications using ASO (Antisense Oligonucleotide) and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas9 techniques. ASO (4) is based on RNase H mediated degradation of RNA in hybrid duplexes (5) and has been proved very beneficial in the treatment of gene expression instigating diseases (6,7). Recently, a genome editing CRISPR-Cas9 technology has been developed that is expected to be a useful tool for the treatment of genetic disorders (8)(9)(10). The efficiencies of both RNase-H and Cas9 endonucleases are largely affected by the sequences and stabilities of hybrid duplexes composed of the target gene with applied complementary oligonucleotides (11,12). Moreover, telomerase activity, which is related to the existence of prolonged genes in cancer cells, can be regulated by the stability of hybrid duplexes (13). Therefore, to elucidate the mechanism of different biological processes, as well as improve therapeutics, the precise prediction of the stability of RNA/DNA hybrids under physiological conditions is required.
An approach to predict the stability of nucleic acid secondary structures from their sequences was established using the nearest-neighbor (NN) model by Tinoco et al. (14,15). According to this model, the thermodynamic stability of the duplex with Watson-Crick base pairing can be determined by the summation of the free energy changes due to all adjacent NN base pair formations in the sequence and helix initiation. Following this model, our group previously introduced NN parameters for RNA/DNA hybrid duplexes (16) in a solution containing 1 M NaCl. Prediction of duplex stabilities was extensively studied in a standard 1 M NaCl solution (16)(17)(18)(19); however, duplex stability in a lower salt concentration (∼100 mM) was also reported to be closer to that observed at physiological conditions (20)(21)(22)(23).
To derive the stability of nucleic acid duplexes (RNA, DNA and hybrid duplexes) in 100 mM NaCl solution from the estimated stability in 1 M NaCl solution, a simple method was reported by our group (21). The linear relationship between duplex stabilities in 1 M and 100 mM NaCl buffer solution (21) was proved useful for the rough estimation of duplex stability independent of sequence, length, or structure. However, several reports suggested that the effect of salt concentration on duplex stabilities depends on the sequence components (24,25). Equations for calculating the stabilities of DNA (22,26) and RNA duplexes (23) in 100 mM salt concentration have already been developed in which correction terms for sequences were incorporated. However, these equations are not applicable for RNA/DNA hybrids as the interactions of RNA/DNA hybrids with cations are different from that of RNA and DNA duplexes due to the different flexibilities and structures of hybrid duplexes (27). Recently, the prediction of the stability of only hybrid sequences in 100 mM NaCl solution was reported (28), where parameters were derived using theoretical optimization of melting temperatures (T m ). This method is very useful and adequate for calculating the T m of hybrid duplexes, however the method is not satisfactory for estimating other thermodynamic parameters ( H • , S • and G • 37 ). Therefore, an improved prediction method is needed for hybrid duplexes under the physiological salt concentration.
In this study, we have measured the thermodynamics of 38 hybrid sequences in a buffer containing 100 mM NaCl and verified the accuracy of previous methods for predicting the stability of all sequences. We observed that several sequences specifically rG−dC-or rC−dG-rich hybrid sequences containing purine-rich RNA, as well as rA−dTor rU−dA-rich hybrid sequences containing pyrimidinerich RNA, could not be predicted well using the previous NN parameters (21). We found marked average differences between the measured and predicted values for G • 37 and T m (10.7% and 4.9 • C, respectively). To improve the prediction, we developed new NN parameters that significantly reduced the average differences for the measured and predicted G • 37 and T m (2.9% and 1.1 • C, respectively). Improved parameters can be used to predict not only the melting temperatures, but also the estimation of all thermodynamic parameters remarkably well for all hybrid sequences in 100 mM NaCl solution. For computing duplex stability of any sequence in 100 mM or in 1 M NaCl solution with ease, an open access website was created. We showed that hybrid stability under typical physiological salt conditions with various monovalent and divalent cations such as Na + , K + , Mg 2+ , Ca 2+ (29) can be precisely predicted using the new parameters. Moreover, the individual effects of the monovalent and divalent cations on the stability of hybrid duplexes were discussed. We also demonstrated that the prediction of hybrid stabilities using our new parameters could be very useful for further developing of ASO and CRISPR-Cas9 techniques by enhancing the specificity and efficiency of the drugs.

Materials and reagents
The RNA oligomers listed in Table 1 and their complementary DNA oligonucleotides were purchased from Japan Bio Service Co. at High Performance Liquid Chromatography (HPLC) grade. RNA and the complementary DNA sequences of different lengths and base compositions were used as model sequences in several experiments. TOF-MS (Time Of Flight Mass Spectrometer) and HPLC (High Performance Liquid Chromatography) spectra of the four RNA and DNA oligomers of the model hybrid sequences are included in Supporting Information (Supplementary Figure S1) to exhibit the purity of purchased oligonucleotides. Concentration of each oligonucleotide was determined from average absorbance in a higher temperature range (80-90 • C) at 260 nm using the single-strand extinction coefficient (30). The RNA oligomers of Table 1 and their complementary DNA strands were mixed at an equimolar concentration to prepare corresponding RNA/DNA hybrids for this study and the RNA/DNA hybrid sequences are written here as rGCCGUGAG/dCTCACGGC for 5 rGCCGUGAG3 /5 dCTCACGGC3 . The chemicals like sodium chloride (NaCl), potassium chloride (KCl), magnesium chloride (MgCl 2 ), calcium chloride (CaCl 2 ), disodium hydrogen phosphate (Na 2 HPO 4 ) and dipotassium hydrogen phosphate (K 2 HPO 4 ) were purchased from Wako Pure Chemical Industries (Japan). Disodium ethylenediaminetetraacetate (Na 2 EDTA) and dipotassium ethylenediaminetetraacetate (K 2 EDTA) were purchased from Dojindo Molecular Technologies (Japan). All the chemicals were used as received.

Circular dichroism (CD) measurements
CD spectra were obtained using a JASCO J-820 spectropolarimeter equipped with a temperature controller. All CD spectra were collected from 350 to 200 nm wavelengths with a scan rate of 50 nm min −1 in 0.1 cm path-length cuvettes. The cuvette holding chamber was continuously flushed with a stream of dry N 2 gas to avoid water condensation forming on the cuvette exterior. All the spectra were measured at 20 M total strand concentration in the phosphate buffer solution containing either 1 M or 100 mM NaCl.

UV measurements
UV absorbance was measured using a Shimadzu 1800 spectrophotometer equipped with a thermoprogrammer. The melting curves (absorbance versus temperature curves) of the hybrid duplexes at 10-12 different oligonucleotide concentrations (varying ∼100-fold) were obtained at 260 nm using annealing from 90 • C to 0 • C, followed by heating from 0 • C to 90 • C at a rate of 0.5 • C min −1 . Water condensation on the cuvette exterior at a low temperature was avoided by flushing with a constant stream of dry N 2 gas.

Determination of thermodynamics for hybrid duplex formations
The thermodynamic parameters ( H • , S • ) for the formation of RNA/DNA hybrids were determined from T m −1 versus ln(C t /4) plot as described in a previous studies (16,31) using the following Equation (1) .  All experiments were carried out in solutions containing 100 mM NaCl, 10 mM Na 2 HPO 4 , and 1 mM Na 2 EDTA (pH 7.0). c The melting temperatures were determined at a total oligomer strand concentration of 8 M. d The hybrid stability in 100 mM NaCl buffer was predicted using Equations (4) and (5) (21) at the respective stability in 1 M NaCl buffer, determined using NN parameters (16). e The hybrid stability in 100 mM NaCl buffer was predicted using new nearest-neighbor parameters ( Table 2). The average prediction error G • 37 (measured G • 37 − predicted G • 37 ) which is represented as percentage of error with respect to predicted G • 37 and T m (measured T m − predicted T m ) were calculated for old prediction as 10.7% and 4.9 • C, respectively and for new prediction as 2.9% and 1.1 • C, respectively.
where R is gas constant and C t is total strand concentration of the oligonucleotides. The parameters ( H • , S • ) were assumed to be temperature independent and determined using the van't Hoff analysis as the transition equilibrium involves only two states (single strand and duplex). The RNA oligomers of Table 1 and their complementary DNA oligomers were previously reported to form hybrids showing two-state transition (16,32). Moreover, model RNA/DNA hybrids were examined here to show the two-state transition using CD spectra and UV melting profiles. The CD spectra were obtained using temperature induced unfolding CD assays carried out from 10 • C to 90 • C with an interval of 10 • C for hybrid duplexes, and the CD spectra for each duplex were found to be passing through an isodichroic point. The CD assays of the six model hybrid duplexes of various lengths and base compositions are shown in Supplementary Figure S2. The presence of an isodichroic point in the temperature induced CD assays confirmed the existence of only two conformational states (33,34) for each sequence used in the study. The two-state transition for any duplex can also be extrapolated from the identical denaturation and renaturation profiles of UV melting experiments. The lack of hysteresis between the denaturation and renaturation UV melting curves of the hybrid duplexes (Supplementary Figure S2) were another indication of two-state transition between the sin-gle strands and the duplex (35,36). Therefore, the difference in heat capacities ( C p ) of the single strand and duplex state was assumed to be zero as per standard practice (16)(17)(18)20,37). The melting temperature (T m ) range of each duplex was maintained between 20-55 • C to minimize extrapolation to 37 • C. Although finite value of C p was also reported during the folding of nucleic acids (38), especially for the oligonucleotides with dangling ends (39), the temperature-dependent changes in H • and S • can counterbalance each other (38). Therefore, G • 37 and T m values are relatively insensitive to the C p compared to H • and S • (40). The G • 37 were derived from H • and S • using equation (2).
where T represents a temperature of 310.15 K used to derive free energy change at 37 • C ( G • 37 ). The standard error values for H • and S • ( H • and S • , respectively) were estimated here from the linearity of the T m −1 versus ln(C t /4) plots and those for G • 37 ( G • 37 ) were calculated using the following equation (21).

Calculation of nearest-neighbor parameters
According to the NN model, the thermodynamic parameters ( H • , S • and G • 37 ) for non-self-complementary duplex formation consists of two terms: (i) a parameter for the helix initiation with the first base pair formation in the duplex; (ii) a parameter for helix propagation that is the summation of parameters for all subsequent base pairing. All 16 NN base pairs are presented in abbreviated form. For example, 5 rAG3 /5 dCT3 is represented as rAG/dCT. The NN parameters were determined (Table 2) using a linear leastsquares computer program (Supporting Information) (41). To determine appropriate parameters, the selection of the hybrid sequences were done carefully so that all the potential 16 NN base pairs present in an almost equal number of frequencies (16). The T m at 8 M total strand concentration was predicted from the parameters obtained ( H • and S • ) using Equation (1). A website was developed to calculate the stability of any duplex by applying its sequence and salt concentration. In this website, we used reported parameters for DNA duplex (20), RNA duplex (18), and RNA/DNA hybrid in 1 M NaCl solution (16) for evaluating respective stability in 1 M salt concentration and the new NN parameters derived here ( Table 2) for determining the stability of hybrid duplex in a physiological condition. To expand the scope of using additional parameters, we designed an option named 'Manual' in the website. The details about the website is given in the Supporting Information.

Measurement of thermodynamic parameters for RNA/DNA hybrids
A total of 38 RNA sequences with different base compositions and oligomer lengths (6−14 mers) and their complementary DNA sequences were designed (Table 1) here to study thermodynamics of RNA/DNA hybrids. Parameters of H • , T S • , G • 37 and T m of all the RNA/DNA hybrids were measured in a buffer solution containing 100 mM NaCl, 10 mM Na 2 HPO 4 , and 1 mM Na 2 EDTA (pH 7.0). The measured stabilities ( G • 37 and T m ) were compared with the corresponding stabilities, which were predicted using linear Equations (4) and (5), as proposed by Nakano et al. (21) (Table 1).
The prediction errors, which represented the differences between the measured and predicted values ( Table 1) The average G • 37 and T m values for all the hybrid duplexes (10.7% and 4.9 • C, respectively) were significantly higher than the average prediction error reported (5.7% and 2.4 • C, respectively) (21). Fifteen sequences (1, 3a, 3b, 4a, 4b, 7, 12, 16, 17a, 17b, 20, 22a, 26, 29b and 30) among the 38 sequences (Table 1) showed large prediction errors for both G • 37 and T m (≥10% and ≥5 • C, respectively), along with several sequences that showed large prediction errors for either G • 37 (8, 15, 18a, 18b, 24 and 25) or T m (6, 11, 13 and 27). The large values of prediction errors for several hybrid sequences will need reconsideration for the prediction method. The duplex stabilities at a salt concentration of 100 mM were estimated using Equations (4) and (5) (21) based on the stabilities calculated in 1 M NaCl using the previous NN parameters (16). Therefore, the discrepancies between the measured and the predicted stabilities may lie either in the first step of prediction using the NN parameters in 1 M NaCl or in the second step of calculation using the linear Equations (4 and 5). For the first step, we compared the experimentally observed and predicted stabilities ( G • 37 and T m ) in 1 M NaCl solutions (Supplementary Table S1) of the several model hybrid duplexes of different lengths and base compositions selected from Table 1. As a result, the experimental and predicted stabilities ( G • 37 and T m ) of the selected sequences displayed a high similarity (Supplementary  Table S1) with low values of average prediction errors (4.6% and 1.7 • C, respectively), indicating a high accuracy of the stability prediction using the previous NN parameters (16) in 1 M NaCl. Reductions in duplex stabilities were prominent from the G • 37 and T m values measured in 1 M (Supplementary Table S1) and 100 mM NaCl (Table 1) solutions. The thermodynamic parameters of several hybrid duplexes in 1 M and 100 mM salt concentration reported previously and also obtained by our experimental measurements indicated that both H • and S • were affected by salt concentration. However, the destabilization of hybrid duplexes resulting from the reduction of salt concentration from 1 M to 100 mM were induced by the greater entropic effect. This observation also supports previous reports about the effect of salt on duplex stability (42,43). Although the error of measurements for H • and S • were much higher with respect to that for G • 37 and T m . Thus we compared here the measured and predicted values of G • 37 and T m only. The stability reductions from 1 M to 100 mM salt concentration are demonstrated by the typical melting curves of the two model hybrid duplexes rGCCAGUAGG/dCCTACTGGC (8) and rAGGAUGACCG/dCGGTCATCCT (10) in 1 M and 100 mM NaCl solutions ( Figure 1A and B). However, the CD spectra of the two model hybrid sequences (8 and 10) and also other sequences (Table 1) obtained at 4 • C in both salt concentrations indicated that there were no evident changes in the spectra between the 1 M and 100 mM NaCl solutions ( Figure 1C, D, and Supplementary Figure  S3). The results suggested that the deviations in the calculated stabilities in 100 mM NaCl were not due to any unexpected conformational changes of the hybrid duplexes caused by the reduction of NaCl concentration. The prediction in the 1 M salt concentration is highly accurate, but the accuracy is low in the 100 mM salt condition, so there is scope for improvement in the second step of the prediction.
To determine the limitations of Equations (4) and (5), we especially chose eight model RNA/DNA hybrids from Table 1 containing different base compositions. The T m s for the selected hybrid duplexes at 8 M strand concentration were measured in solutions with varying NaCl concentrations (from 1 M to 50 mM) and plotted against log [Na + ] (Supplementary Figure S4). The T m s of each hybrid duplex showed a linear dependency on log [Na + ] at the range of 50 mM to 1 M. The slope and intercept values of each straight line varied from sequence to sequence. The large slope values indicate that the destabilization of hybrid due to reduction of Na + concentration is significant. For example, the duplexes rGCCAGUAGG/dCCTACTGGC (8) and rAUGGCUCCAA/dTTGGAGCCAT (14b) had almost the same T m in 1 M salt concentration (Supplementary Figure S4); however, they showed different slopes and hence different T m at a salt concentration of 50 mM due to the presence of different base compositions. Similar results were observed for the duplexes rCGCUUGUUAC/dGTAACAAGCG (11) and rUAUCUUCCGAAU/dATTCGGAAGATA (18a) in Supplementary Figure S4. The hybrid duplex rGCCGUGAG/dCTCACGGC (3a) showed a lower slope, such that the order of the stabilities between 8 and 3a were reversed at salt concentrations of 50 mM and 1 M (Supplementary Figure S4). These results clearly indicate that the decreasing order of stability for hybrid duplex at a salt concentration from 1 M to 50 mM is dependent on the sequence components (the rG−dC or rC−dG and rA−dT or rU−dA contents in hybrid duplex, as well as the purine and pyrimidine bases in the RNA and DNA strands). Notably, with an increase in the rG−dC or rC−dG content in the hybrid duplexes and the purine base in RNA, the slope values of the linear plots tended to be significantly small (Supplementary Figure S4). Since we only considered short oligonucleotides (8-12 mer), the effect of length was assumed to be insignificant. These results suggested that hybrid stability in 100 mM NaCl solution can not be estimated with significant accuracy using the linear equations due to the impact of base composition on the hybrid stability. Therefore, an improved prediction method is required for hybrid stability including the effect of base composition.

Improvement of prediction for hybrid stability by linear equations
The effect of base components on the stability of hybrid duplexes in 100 mM NaCl solution were thoroughly studied here to improve prediction method of hybrid stability. As shown in Table 1, the measured stabilities for the rG−dC-and rC−dG-rich duplexes containing purine-rich RNA strands were much higher than the corresponding calculated stabilities. On the contrary, the measured stabilities of the rA−dT-or rU−dA-rich hybrids containing pyrimidine-rich RNA strands were lower than their predicted stabilities. Previous reports have suggested that the stabilities of hybrid duplexes in a low salt concentration is very sensitive toward purine and pyrimidine base asymmetries in the RNA or DNA strands (44,45). It was also suggested that rG−dC-or rC−dG-rich duplexes are much more stable than rA−dT-or rU−dA-rich duplexes at lower salt concentrations (44,45). To determine the individual effect of each sequence component on the hybrid stability more clearly, we defined two sequence factors, excess rG−dC or rC−dG content in duplex Nucleic Acids Research, 2020, Vol. 48, No. 21 12047 N total is the oligomer length. Based on the sequence compositions, we selected two sets of sequences (Set 1 and Set 2) using the sequences designed in this study and sequences from previous reports (Supplementary Table S2 (4) and (5) do not contain correction terms for the sequence factors. Therefore, an improvement of these linear Equations (4 and 5) is required in order to predict the hybrid stability in 100 mM NaCl with significant accuracy.
Initially, we categorized the hybrid sequences into three groups depending on the prediction errors G • 37 and T m (Table 1) and the sequence factors f (G−C) and f (rPu) (Supplementary Table S2). The hybrid sequences with measured stabilities that were higher than their predicted stabilities (Table 1) generally contained higher levels of rG−dC or rC−dG content in the duplex ( f (G−C) > 0) and higher levels of purine bases in the RNA ( f (rPu) > 0), such as the oligonucleotides rGGCAGGAAUCCG /d CGGATTCCTGCC (17a); we categorized these most stable hybrid sequences as Group A. In contrast, the hybrids that had measured stabilities that were lower than the predicted stabilities (Table 1) generally contained lower levels of rG−dC or rC−dG in the duplex ( f (G−C) < 0) as well as lower levels of purine bases in the RNA ( f (rPu) < 0), such as rUAUCUUCCGAAU/dATTCGGAAGATA (18a); these less stable hybrid sequences were allocated to Group B. Finally, Group C was comprised of all of the other mixed sequence contents, such as a higher rG−dC or rC−dG content in the duplex and fewer purine bases in the RNA, i.e. rCGGAUUCCUGCC/dGGCAGGAATCC G (22a), a lower rG−dC or rC−dG content in the duplex with more purine bases in the RNA, i.e. rAAUGGAUUA CAA /dTTGTAATCCATT (19a), or equal distributions of rG−dC or rC−dG and rA−dT or rU−dA content in the duplex and the same purine and pyrimidine bases in the RNA ( f (G−C) = 0 and f (rPu) = 0), i.e. rAUGGC UCCAA/dTTGGAGCCAT (14b). The stabilities of the hybrid duplexes under Group C (14b, 19a, and 22a) could be predicted more precisely using Equations (4) and (5) (Table 1) than for Group A (17a) or Group B (18a). The measured G • 37 in 100 mM NaCl versus predicted G • 37 in 1 M salt were plotted separately for the hybrid sequences of all three groups (Group A, Group B, and Group C). The obtained independent plots showed a good linear correlation ( Figure 2). The linear equations corresponding to the hybrid sequences of the above-mentioned groups are as follows (Equations 6-8): Group C : When we plotted the measured G • 37 in 100 mM NaCl versus predicted G • 37 in 1 M salt using all the hybrid sequences (Supplementary Table S2) as suggested in previous method (21), a correlation was found ( Supplementary Figure S6), although that was not satisfactory. A similar trend was also observed for the plot of measured T m in 100 mM NaCl versus predicted T m in 1 M salt (data not shown). The linear equation corresponding to Group C (Equation 8) was very similar to Equation (4), hence the stability of the sequences from Group C could be predicted well using the previous linear equation. However, the other two linear equations, corresponding to Group A and Group B (Equations 6 and 7, respectively), were different, mainly due to the large differences between the intercept values. Therefore, we assumed that the effect of salt concentration on the stability of hybrid duplexes belonging to Group C is less dependent on the sequence factors compared to that of Group A and Group B. Lesnik and Freier proposed a relationship between the helix conformation and thermodynamic stability of hybrid duplexes (45). Comparing the CD spectra (Supplementary Figure S3) with the reported structural analysis (45,46), it was observed that the hybrid duplexes of Group A generally show A-like conformation which is very similar to RNA duplexes. The conformations of hybrid duplexes belonging to Group B are significantly deviated from the A-like conformation. Hence, it is possible that the two different conformations were affected by the salt concentration in different manners due to the differences in their Na + ion uptake during duplex formation. As hydration has a significant role in duplex formation, it is also possible that changes in water activity due to the change of salt concentration could affect duplex stability. However, the change of water activity by changing salt concentration from 1 M to 100 mM was reported to be small (22). Furthermore, the hybrid stability of the sequence which belongs to these groups could not be predicted by one linear equation, therefore at least three different linear equations for the three groups of hybrid duplexes would be needed. The three linear Equations (6)- (8) can be used to calculate the hybrid stabilities with a better accuracy than that with Equation (4), although no correction term for the sequence was included. The incorporation of correction factor could improve the prediction ability; however, the equation will no longer be linear and will become more complicated. Therefore, we required  Table 1 and Supplementary Table S2. a simple and precise prediction method for hybrid stabilities under physiological conditions able to evaluate the effect of the sequence on stability. The NN model can explain the individual contribution of changes in enthalpy, entropy, or free energy for the formation of each subsequent base pair. Therefore, problems arising in the stability prediction due to the effect of sequence could be solved easily, such that the free energy change as well as the melting temperature for duplex formation could be evaluated more precisely.

Improvement of prediction for hybrid stability using new nearest-neighbor parameters
The NN hypothesis assumes that duplexes with identical NN base pairs have identical melting behaviour and similar thermodynamic parameters for duplex formation. Therefore, to derive NN parameters for RNA/DNA hybrids in 100 mM NaCl solution, it is essential to verify the validity of the NN model under the same condition. For this purpose, two hybrid duplexes (3a and 3b) with identical NNs were chosen from Table 1. rGCCGUGAG/dCTCACGGC (3a) = rGC /dGC + rCC/dGG + rCG/dCG + rGU/dAC + rUG/dCA + rGA /dTC + rAG/dCT = rGAGCCGUG/dCACGGCTC (3b) The melting curves and the plot of T m −1 versus ln (C t /4) were compared for the pair of hybrid duplexes 3a and 3b (Supplementary Figure S7), which showed similar melting behavior and a similar T m −1 versus ln(C t /4) plot. A total of eight pairs of hybrid duplexes with identical NNs were given as Xa and Xb (X = 3, 4, 14, 17, 18, 19, 22 and 29) in Table 1. The average differences between the measured parameters ( H • , T S • , G • 37 and T m ) for all the pairs (Xa and Xb) were 8.3%, 9.2%, 3.5% and 1.1 • C, respectively. Similar range of differences in the measured thermodynamic parameters (7.7%, 8.2%, 6.5% and 2.3 • C, respectively) for the hybrid sequences with identical NNs were reported as a result of the experimental error (16) in the testing of the validity of NN model for hybrid duplexes in 1 M NaCl solution. Therefore, the hybrid duplexes follow the NN model even in 100 mM NaCl solution.
The NN parameters of H • , S • , and G • 37 ( H • NN , S • NN and G • 37NN , respectively) for the RNA/DNA hybrid in 100 mM NaCl buffer solution were obtained using the linear least square computer programming and the  derived parameters for the 16 NN base pairs and two initiation for rG−dC or rC−dG and for rA−dT or rU−dA formations are provided in Table 2. In the improvement of NN parameters for hybrids, we have tried to incorporate an extra term for terminal rA−dT or rU−dA since terminal A−U base pair was introduced previously to improve the stability prediction of duplexes (19), and the term was applied to define the difference in the number of H-bonding between G−C and A−U base pairs at the terminals. However, the incorporation of the terminal rA−dT or rU−dA did not result any further improvement in the predictions here. The two different initiation terms corresponding to the rG−dC or rC−dG and the rA−dT or rU−dA formations are perhaps already compensating for the effect of extra Hbonding of terminals in hybrid stability. The 16 NN parameters derived in this study have a major enthalpy contribution in helix propagation and a major entropy contribution in helix initiation. For example, in the formation of the subsequent base pairs rAA/dTT in 100 mM NaCl buffer at 37 • C, the enthalpy contribution ( H • NN ) is −7.8 kcal mol −1 , while the entropy contribution (T S • NN ) is −7.1 kcal mol −1 , which is less than the enthalpy contribution. To derive the initiation parameters, we assumed that the helix initiations were negatively controlled by the entropy factor, due to the restriction of individual translational and rotational movements of the single strands by converting two particles into one. Hence, the changes in the enthalpy for both initiation terms were assumed to be zero (20), which also minimizes the error for all other H • NN parameters. The comparison between the NN parameters in 1 M (Supplementary Table S3) and 100 mM NaCl ( Table 2)  These observations confirmed the sequence-dependent reduction in stability as a result of a decrease in salt concentration, which correlated well with previous reports (47)(48)(49)(50). To reduce the effort for calculating hybrid stability in 100 mM NaCl condition by our parameters (Table 2), we designed the website located at https://drive.google.com/ open?id=1xF0TianMpQ6rwszN9gsweu7C5Wnzq0kF. We can also use the website to evaluate hybrid stability and the stability of DNA and RNA duplexes in 1 M NaCl solution using the reported parameters (16,18,20).
Our parameters were able to predict the stability of designed 38 RNA/DNA hybrids (Table 1) adequately, with average values of prediction error for H • , S • and G • 37 of 9.0%, 10.1% and 2.9%, respectively, and for T m of only 1.1 • C. The prediction of stability for the hybrid duplex was much improved by the obtained parameters compared to the linear Equations (4 and 5) (21). Even the new NN parameters can better predict all thermodynamic parameters of hybrid duplex than the recently published parameters by Barbosa et al. (28). The parameters obtained from the T m optimization method (28) can be used to accurately estimate the T m of our 38 designed sequences with the prediction error value only 2.2 • C, however the prediction error for H • , S • and G • 37 came out much higher (20.3%, 22.2% and 6.5%, respectively) compared to the prediction error by our parameters. The reason behind the large prediction errors for H • , S • and G • 37 by the computational method may be due to the only optimization of T m values reported in previous studies (21,45,46), while no optimization was done for other thermodynamic parameters. Although melting temperature is sensitive to the strand concentration, thermodynamic parameters ( H • , S • and G • 37 ) are not. We could predict the T m values at any strand concentration for a duplex using these parameters. Therefore, thermodynamic parameters have significant importance and optimization of all the parameters is required. Moreover, large differences were observed between our parameters (Table 2) and the parameters derived by Barbosa et al. For example, our parameters showed the increasing order of G • 37NN for NNs as rGC/dGC < rCC/dGG < rCU/dAG, which was expected and observed for the parameters in 1 M NaCl (16). However, according to the parameters by Barbosa et al., the G • 37NN values of these three NNs were the same. Besides the formation of the subsequent base pairs rGA/dCT with purine-rich RNA has been reported by Barbosa et al. as less stable than that of rCA/dGT which is also not relevant based on our experimental observations. In that theoretical study (28), Barbosa et al. derived parameters via optimization of T m of reported sequences (21,45,46) containing an unequal number of frequencies for 16 NNs. By contrast, in our study, to derive new parameters, we selected sequences such a way that the number of all NN base pairs were as equal as possible. Therefore, the new parameters can accurately predict all the thermodynamic parameters of hybrid duplexes under a physiological condition.
Since the NN parameters were derived here using the measured thermodynamic parameters of the 38 sequences (Table 1), the new NN parameters would clearly be able to predict the stabilities of the 38 hybrid sequences with more accuracy. Therefore, we compared the stabilities of a new set of 26 hybrid duplexes previously measured by other groups (21,45,46) with the stabilities predicted using the new parameters (Supplementary Table S4). Comparison showed that the new parameters can predict well the hybrid stabilities, as the average values of prediction error ( G • 37 and T m ) for the 26 sequences came out at 8.5% and 2.7 • C, respectively. Previously, NN parameters were also improved which estimate duplex stabilities with similar prediction errors (20,37). The new NN parameters can only predict hybrid stability of oligonucleotides like the previous NN parameters that were reported for stability prediction of different nucleic acid duplexes (16,20). We have tested the parameters by predicting the stabilities of a total of 64 hybrid sequences with lengths between 6 and 16 mer and obtained reasonable average prediction errors for G • 37 and T m (4.7%, and 1.6 • C, respectively). Oligomers shorter than 6 mer generally give a T m much lower and too far from 37 • C which violates the assumption for determining H • and S • , and since the determination of T m depends on the upper and lower baselines of the melting curves, obtaining a reliable T m from the melting curves was in some cases difficult for much shorter and longer sequences. On the other hand, longer oligomers have a higher probability to form intramolecular folding that induces a non-two-state transition from single strand to duplex (51).

Prediction of hybrid stability in physiological salt condition
In the living cell, apart from Na + ions, monovalent cations like K + and various divalent cations are also present. The concentrations of Na + and K + ions are reversed for the intracellular and extracellular condition. Potassium ion concentration is dominating inside a resting mammalian cell, whereas extracellular fluid mainly contains Na + ions (29). As both mentioned monovalent cations are dominating in respective environments and they play important functions and regulations, it is also important to know the effect of KCl along with the NaCl salt concentration on the stability of hybrid duplexes. Therefore, we also studied hybrid stabilities in 100 mM KCl. Sequences from Group A (10, 17a), Group B (18a) and Group C (9) were selected from Table  1 to verify the efficacy of the new parameters in predicting hybrid stabilities even in 100 mM KCl solution. We determined the thermodynamic parameters of the hybrid sequences (9, 10, 17a and 18a) in a buffer solution containing 100 mM KCl, 10 mM K 2 HPO 4 , and 1 mM K 2 EDTA (pH 7.0), and compared them with the stabilities predicted using the new parameters (Table 2) in Supplementary Table  S5. The stabilities ( G • 37 and T m ) measured in the buffer containing 100 mM KCl were plotted ( Supplementary Figure S8) against that measured in the buffer containing 100 mM NaCl (Table 1). All four points in the plots of G • 37 and T m in Supplementary Figure S8 appeared on the diagonal lines suggesting a similar effect of monovalent Na + and K + cations on hybrid stabilities. This observation indicates that the duplexes were stabilized to the same extent in the presence of both cations (Na + and K + ) in the same concentration as reported earlier (21,26,52). A comparison (Supplementary Table S5) of the measured stabilities of the hybrid duplexes in 100 mM KCl and the corresponding stabilities predicted using the new parameters clearly displayed that the new parameters (Table 2) can predict the stability ( G • 37 and T m ) of hybrid duplexes in physiological conditions even in the presence of K + ions with average prediction errors of 7.3% and 2.4 • C, respectively.
Living cell also contains various divalent cations. To evaluate hybrid stability in realistic physiological salt condition, we further measured the stabilities of hybrids in a solution containing a composition of different cations similar to mammalian intracellular environment (29). A solution containing 120 mM KCl, 10 mM NaCl, 1 mM MgCl 2 , 0.2 M CaCl 2 and 10 mM K 2 HPO 4 (pH 7.0) was selected to mimic the intracellular conditions of a typical mammalian cell. We selected three sequences from three different groups (Group A, Group B, and Group C) to measure hybrid stabilities under the above mentioned physiological condition.
To examine the improvement of accuracy of the stability prediction by our parameters, we compared hybrid stabilities measured in a typical physiological condition with that obtained for the corresponding sequences in 1 M and 100 mM NaCl solutions using melting curves (Supplementary Figure S9). The comparison indicated that hybrid stability measured in a typical physiological condition is close to that obtained in 100 mM NaCl, however far from that obtained in 1 M NaCl solution (Supplementary Figure S9), and thus can be better predicted from the improved parameters determined in 100 mM NaCl solution. Excluding monovalent cations like Na + and K + , Mg 2+ is the most abundant cation present in cells with a varying concentration depending on cell type and the specific compartment of the cell. The concentration of free Mg 2+ is maintained between 0.4 to 1.2 mM (53). The effect of Mg 2+ ion on the stability of DNA duplex is dependent on the concentration of not only Mg 2+ ion but also Na + ion in the solution (21,26). In a solution containing 100 mM NaCl, if the concentration of Mg 2+ ion increases above 1 mM, it can stabilize duplexes as only Mg 2+ can stabilize in the absence of NaCl. On the contrary, if Mg 2+ concentration remains below 1 mM in a solution of 100 mM NaCl, it can stabilize duplexes as only Na + can stabilize in absence of Mg 2+ ions (21,26). Therefore, a mixture of 1 mM MgCl 2 and 100 mM NaCl in 10 mM Na 2 HPO 4 buffer solution (pH 7.0) was chosen here to study the effect of Mg 2+ ions in the presence of Na + ions on the stability of different groups of hybrid sequences. We measured the thermodynamic parameters of the three selected hybrid sequences (17a, 23, and 21 from Table 1) from the three different groups (Group A, Group C and Group B, respectively) and compared their stabilities in 100 mM NaCl buffer solution with and without 1 mM Mg 2+ ion concentration (Supplementary Table S6). The stabilities ( G • 37 and T m ) measured in both buffer solutions were plotted against each other (Supplementary Figure S10). The stabilities of all hybrid duplexes were close to the diagonal line of the plots, revealing that the new parameters can be applied to predict hybrid stability in a solution of 100 mM NaCl containing up to 1 mM of Mg 2+ ions. However, the stabilities of the sequence (17a in Supplementary Table S6) from Group A was observed to deviate slightly (Supplementary Figure S10A). In the presence of Mg 2+ ions, the sequence from Group A is slightly more stabilized compared to the other sequences from Group C and Group B. Interestingly, the hybrid duplexes of Group A contain rG-dC or rC-dG rich hybrid sequences with purine rich RNA. The stronger stabilization of the hybrid sequences of Group A by Mg 2+ ions can be explained by a previous report which suggests that Mg 2+ preferentially binds with RNA duplexes (54), as well as with the N7 of purine bases (55). We already observed that an RNA/DNA hybrid having purine-rich RNA and belonging to Group A obtains a more A-like conformation, similar to an RNA duplex (46,56), compared to sequences belonging to Group B and Group C. The preferential binding of Mg 2+ to the deeper major groove of the A-form or A-like conformation over the wider major groove of the B-form or B-like conformation was already known (54). Thus the differences in stabilities for different groups of hybrid sequences in the presence of Mg 2+ can be attributed to the different helical structures of hybrid duplexes. However, in the presence of 1 mM Mg 2+ ion the differences were not considerable. Since the hybrid duplexes from the three groups exhibited similar stabilities in the absence and presence of 1 mM Mg 2+ ion in 100 mM NaCl solution, our parameters can be applied to predict the stability of hybrid duplexes in physiological salt conditions even in the presence of divalent cations.

Prediction of hybrid stability in therapeutic application
Improvements in stability prediction of hybrid sequences could be used in therapeutic applications. Hybrid sequences and their stabilities are important factors for inducing cleav-Nucleic Acids Research, 2020, Vol. 48, No. 21 12051 age efficiency in the RNase H and Cas9 enzymes used in the ASO and CRISPR techniques, respectively (11,12). To enhance the activity of RNase H and Cas9, different strategies have been applied (57)(58)(59) and the use of truncated ASO (antisense oligonucleotide) or sgRNA (single guide RNA) has been proven as an important technique for reducing offtarget binding without changing on-target efficiency. Zhang et al. proposed in vitro pre-screening of sgRNA showing that truncated sgRNA enhanced target specificity and 16 mer sgRNA was found to be the most efficient one for the used target site (60). Different groups published in vivo studies where reducing the length of sgRNA from 20 to 17 nucleotides resulted in an improvement of target specificity for the GFP reporter gene, however, significant shortening of sgRNA (below 17 mer) can also cause unsatisfactory cleavage of target DNA in CRISPR-Cas9 technique (61,62). Similar studies have also been published in association with ASO drug design. The target specificity of ASO can be improved by shortening the length of modified ASO from 20 mer to 13 mer for targeting Apolipoprotein B (63). The length factor has been continuously considered as an important parameter to improve not only the on-target specificity but also cleavage efficiency. Lloyd et al. reported the cleavage efficacies of RNase H using oligonucleotides of different lengths (8-20 mer) upon TNFα mRNA target sites and a decreasing order of cleavage efficacy was found as 16 mer > 12 mer > 20 mer > 8 mer (64). Moreover, an extensive theoretical-experimental comparison study revealed a strong correlation between the structure and stability of hybrid duplexes and the cleavage efficiency of Cas9, which suggested that hybrid duplexes formed by 8-17 mer sgRNA provided a major contribution for the higher efficiency (65). Previous studies have clearly indicated that oligonucleotides of optimal length hybridize with the target site with optimum binding affinity that probably results in maximal specificity and efficiency (60,64,65). Although optimal length varies depending on the target site or other modifications, the optimal binding affinity could be the same. If more than one potential target sites have similar binding affinity, the most favorable target site should be the one that having minimum off target binding sites. The sequence of potential off target binding sites is also important, however the length shortening up to the optimal length was only developed here using the optimized binding affinity. Since free energy change ( G • 37 ) is directly proportional to the logarithm of the binding affinity which is quantified by dissociation constant between the oligonucleotide and the target, the optimum value of G • 37 can give an indication of maximum cleavage efficiency and target specificity. A kinetic study also found supporting evidence for the hypothesis that the increase of cleavage efficiency of RNase H results from binding affinity rather than length (66). Therefore, prediction of G • 37 with significant accuracy for hybridization in physiological conditions is essential for drug design in oligonucleotide based therapeutics like ASO and CRISPR-Cas9.
Here we present a correlation between the G • 37 predicted by the derived parameters (Table 2) and their knockout efficiency from the previous reports (60)(61)(62)67) in Supplementary Table S7. At first, the predicted G • 37 for hybrid duplexes of truncated sgRNAs (17 mer) and their knockout efficiencies were compared (Supplementary Table S7). As a result, we found that not all the truncated sgRNAs (17 mer) have the maximal efficiency. However, the cleavage efficiencies of all 17 mer sgRNAs were highly correlated with the hybrid stabilities predicted by our parameters. For example, the sgRNA binding target sites with predicted G • 37 above −20 kcal mol −1 according to our parameters showed maximum knockout efficiency and the efficiency decreased with lowering the value of predicted G • 37 below −20 kcal mol −1 . When the predicted G • 37 value remained between −18 and −20 kcal mol −1 , the truncated sgRNA could even show significant efficiency. However, truncated sgRNA which can hybridize with less stability than −18 kcal mol −1 displayed a sudden decrease in efficiency. Thus, the stability predicted by our parameters can explain the differences in knockout efficiencies for sgRNA of the same length. We also included other sgRNA with different lengths (13-20 mer) studied by several groups (60)(61)(62)67) in the comparison (Supplementary Table S7). The comparison similarly supported the correlation between hybrid stabilities estimated by our parameters and the Cas9 cleavage efficiencies. Therefore, to design sgRNA of maximum specificity and efficiency, we optimized the predicted value of G • 37 in a physiological condition to be −20 kcal mol −1 using our parameters, whereas previous parameters (16) could not be used for this optimization. When the knockout efficiency of each sgRNA was compared with the stability ( G • 37 ) in the physiological salt condition calculated using the old parameters applying the linear Equations (4) in Supplementary Table S7, significant correlation was not found. Initially, large differences were observed between the hybrid stabilities for each sgRNA predicted by old and new parameters in the physiological condition. As we demonstrated that the new parameters can better predict hybrid stability compared to the old ones in the physiological salt condition, the stability calculated by the new parameters can be considered more reliable. The predicted stabilities determined in physiological condition using old parameters were much lower which can mislead the assessment of optimal binding affinity of hybridization as well as optimal length of sgRNA. The hybrid stabilities predicted using old parameters cannot explain the phenomenon observed in the knockout experiments that the new parameters can, such as the differences seen in the knockout efficacies when applying truncated sgRNA (17 mer). Therefore, the new NN parameters will be more applicable not only for the precise prediction of hybrid stability in physiological salt condition but also to help in the design of oligonucleotide based therapeutics.
Moreover, an enhancement of binding affinity and hence also cleavage efficiency were observed as a result of using a purine rich sgRNA sequence which targeted a pyrimidine rich site (12), and the observation can be also explained by our NN parameters. In physiological condition, the rG−dC or rC−dG rich NN base pairs and also the NNs containing RNA with higher purine content are much more stabilized than others, which induce a higher binding affinity. Therefore, we can apply truncated sgRNA with much shorter length, thereby inducing higher specificity and efficiency. For example, the pyrimidine-rich DNA 20 mer replicates of the MALAT1 oncogene ( Supplementary Fig-ure S11) were reported to be cleaved the fastest among all replicates with a changing sequence content according to the quantitative CRISPR PCR performed by Terrazas et al. (12). Herein, the stability ( G • 37 ) of the corresponding hybrid duplex formation with the DNA 20 mer replicate was calculated using the old and new parameters simultaneously (Supplementary Figure S11). The predicted G • 37 was −20.1 kcal mol −1 according to the old parameters. However, the new parameters can estimate the stability with more accuracy and the predicted G • 37 obtained using the new parameters was −24.6 kcal mol −1 , indicating a much higher stability than that calculated using old parameters applying the linear Equations (4) (Supplementary Figure S11). Therefore, to attain the optimum stability required for maximum cleavage efficiency and specificity in gene editing by Cas9, sgRNA length can be shortened from 20 mer to 16 mer for the target site (Supplementary Figure S11). Although continuous 'G'-rich residue in sgRNA can be expected to hybridize with much higher stability, the efficiency was sometimes observed to be unexpectedly low (60)(61)(62)67). The formation of intramolecular stable conformations by 'G'-rich oligomers could explain the exceptional correlation seen between the efficiency and the G • 37 predicted by our parameters. The precise prediction of hybrid stability in physiological condition can also be used to design truncated ASO having optimal binding affinity for any target site inducing higher specificity and efficiency of RNase H cleavage. However, modified DNA is generally used as ASO drugs to improve bio-stability and cellular uptake of the oligonucleotides. The stability of hybridization with ASO is reported higher than that with unmodified DNA (68,69). Therefore, truncated ASO with a much shorter length than the sgRNA can be used to obtain optimal binding affinity. A recent report clearly suggested that duplex stability of modified ASO can be correlated with the stability of unmodified duplex in 100 mM NaCl solution (68). The stability changes of hybrid duplexes in physiological salt concentration resulting from replacing DNA with different kinds of chemically-modified ASO were published earlier (68,69). Increases in melting temperature ( T m ) per modification were reported for several modified ASO in comparison to unmodified DNA. Therefore, we can predict the stability (T m ) of several RNA/ASO hybrids from the stability of RNA/DNA predicted by our new NN parameters in the physiological salt condition. The optimal stability predicted by our parameters can help in the selection of different target sites and in shortening the sgRNA and ASO up to the optimal length to improve target specificity and cleavage efficiency (70).
In summary, we derived a set of NN parameters to calculate hybrid stability in 100 mM NaCl buffer with significant accuracy. The effect of different cations like Na + , K + and Mg 2+ which are the major components occurring in physiological conditions on hybrid stability were investigated here. The applicability of our new parameters in predicting stability of hybrids even in 100 mM KCl and 100 mM NaCl in the presence of 1 mM Mg 2+ was verified. Therefore, the improved parameters can be applied to predict stability of hybrid duplexes in different physiological conditions of extracellular and intracellular fluids. We also demonstrated that the derived NN parameters can be used in shortening of sgRNA and ASO up to an optimal length to enhance the target specificity and efficiency in CRISPR-Cas9 and antisense technique, respectively.