Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes

NMR chemical shift predictions based on empirical methods are nowadays indispensable tools during resonance assignment and 3D structure calculation of proteins. However, owing to the very limited statistical data basis, such methods are still in their infancy in the field of nucleic acids, especially when non-canonical structures and nucleic acid complexes are considered. Here, we present an ab initio approach for predicting proton chemical shifts of arbitrary nucleic acid structures based on state-of-the-art fragment-based quantum chemical calculations. We tested our prediction method on a diverse set of nucleic acid structures including double-stranded DNA, hairpins, DNA/protein complexes and chemically modified DNA. Overall, our quantum chemical calculations yield highly accurate predictions with mean absolute deviations of 0.3–0.6 ppm and correlation coefficients (r²) usually above 0.9. This will allow for identifying misassignments and validating 3D structures. Furthermore, our calculations reveal that chemical shifts of protons involved in hydrogen bonding are predicted significantly less accurately. This is in part caused by insufficient inclusion of solvation effects. However, it also points toward shortcomings of current force fields used for structure determination of nucleic acids. Our quantum chemical calculations could therefore provide input for force field optimization.


Interchanging of Stereospecific Assignment in O1 Operator DNA
Spatial distribution of the errors in the calculated 1H NMR chemical shifts in the complex of two lac repressor molecules (yellow and green surface) bound to their natural operator O1 (PDB entry 1L1M, BMRB entry 5345): (upper part) original assignment stored in the BMRB, (lower part) optimized assignment obtained by interchanging the stereospecific chemical shifts of the amino groups in guanine and cytosine. The improvements can clearly be seen for all bases except DG8, for which no stereospecific assignment is available.
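The interchange of stereospecific assignments described in this caption can be illustrated with a minimal sketch. It assumes that, for each amino group, the two experimental shifts are compared with the two calculated shifts in both possible orders and the order with the smaller summed absolute error is kept. The function name and the numerical values are hypothetical and are not taken from BMRB entry 5345; the criterion actually used for the optimized assignment may differ.

```python
def best_pair_assignment(calc_pair, exp_pair):
    """Choose the order of two experimental amino-proton shifts (ppm)
    that agrees best with the two calculated shifts.

    Assumption: the summed absolute error decides between the orders.
    Returns (assigned_exp_pair, was_swapped, summed_abs_error).
    """
    c1, c2 = calc_pair
    e1, e2 = exp_pair
    direct = abs(c1 - e1) + abs(c2 - e2)    # assignment as deposited
    swapped = abs(c1 - e2) + abs(c2 - e1)   # interchanged assignment
    if swapped < direct:
        return (e2, e1), True, swapped
    return (e1, e2), False, direct


if __name__ == "__main__":
    # Hypothetical shifts for one amino group (not real data).
    assigned, was_swapped, err = best_pair_assignment((8.3, 6.9), (6.8, 8.4))
    print(assigned, was_swapped, round(err, 3))
```

A base without stereospecific assignment, such as DG8 in the figure, offers no pair to interchange and would simply be skipped.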

Double Helices: Small Palindromic DNA Duplexes
Two small palindromic double-helical DNA structures were analyzed. The first is the d(CGTACG)2 sequence (PDB entry 1K2K, BMRB entry 5339) investigated by Lam and Ip (1) at low temperature.
The second sequence, d(TGATCA)2 (PDB entry 1SY8, BMRB entry 6186), was investigated by Barthwal et al. (2) as a model system that includes the 5' d-TpG 3' element, which is involved in the regulation of gene expression and is targeted by several intercalating anticancer drugs. The correlation between the calculated isotropic NMR chemical shieldings and the experimental chemical shifts is shown in Figure S6. For 1SY8, we discuss only the first of the 10 model structures stored in the PDB file; the results obtained with the others show very similar behavior (data not shown). Very good correlation coefficients and slopes close to the optimal value of -1 are obtained for both systems. The slight advantage of the 1K2K structure over 1SY8 can probably be explained by the low temperature of the measurement and the resulting reduced thermal motion. In the original publication, structures at 5, 10, and 15°C were compared; our calculations are based on the 15°C structure. For further analysis of the errors in the calculations, we converted the chemical shieldings into calculated chemical shifts as described in the Material and Methods section. For the chemical shielding of the standard, values of 32.29 and 32.15 ppm were obtained for 1K2K and 1SY8, respectively, reasonably close to the quantum chemical shielding of 32.24 ppm determined for TMS with the same level of theory/basis set combination. When looking at the spatial distribution of the errors in 1K2K (Figure S7), a concentration of the errors at the termini becomes evident. This can be explained by the experimental finding that at higher temperature the stacking interactions among the central base pairs improve, facilitated by fraying of the termini (1). Larger flexibility leads to larger uncertainties in the calculated chemical shifts, as shown by the large improvements that can be obtained when including conformational averaging as described in our previous publications (3;4).
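The conversion of shieldings into shifts and the quality measures quoted here (slope, correlation coefficient, shielding of the standard, mean absolute deviation) can be sketched as follows. The exact procedure is given in the Material and Methods section, which is not reproduced here, so this sketch rests on assumptions: the shift is taken as δ = σ_ref − σ, and σ_ref is the value that minimizes the squared error with the slope fixed at -1. The function names and example numbers are hypothetical.

```python
from statistics import fmean


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus r^2."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def shieldings_to_shifts(sigma_calc, delta_exp):
    """Convert calculated shieldings (ppm) to shifts (ppm).

    Assumption: delta = sigma_ref - sigma, with sigma_ref chosen as the
    least-squares optimum for a slope fixed at -1, i.e. the mean of
    (sigma + delta_exp).
    """
    slope, _, r2 = linear_fit(sigma_calc, delta_exp)
    sigma_ref = fmean([s + d for s, d in zip(sigma_calc, delta_exp)])
    delta_calc = [sigma_ref - s for s in sigma_calc]
    mad = fmean([abs(c - e) for c, e in zip(delta_calc, delta_exp)])
    return {"slope": slope, "r2": r2, "sigma_ref": sigma_ref,
            "delta_calc": delta_calc, "mad": mad}


if __name__ == "__main__":
    # Hypothetical shieldings and experimental shifts (not real data).
    sigma = [24.10, 26.35, 28.90, 30.45]
    delta = [8.15, 5.90, 3.30, 1.85]
    result = shieldings_to_shifts(sigma, delta)
    print({k: v for k, v in result.items() if k != "delta_calc"})
```

Under these assumptions, a fitted σ_ref close to the shielding computed for TMS at the same level of theory, together with a slope near -1, indicates that no systematic scaling error dominates the prediction.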
In 1SY8, the errors are distributed more evenly (see also Figure S7), which agrees with the experimentally confirmed larger flexibility of the complete structure, especially of the sugar-phosphate backbone, with rapid interconversion between two different conformations at the TpG/CpA base pairs (2). To cover this, a multi-conformation approach is essential (3;4). There is, however, an additional explanation for the large errors at the sugar moiety of T1. The terminating phosphate group is not correctly represented in the PDB file: one oxygen atom is missing, and the three remaining substituents and the phosphorus are arranged in a plane, which has a significant influence on the calculated chemical shifts in close proximity.
[...] acceptance as substrates by RNase H1. More specifically, they solved two structures of modified DNA/RNA hybrids using a large number of experimental constraints, including RDCs and backbone torsion angles, and found helical properties between those of an A and a B helix. We predicted the NMR chemical shifts for one of these (central stereospecific Rp boranophosphate linkage, PDB entry 2LAR, BMRB entry 17535). The results are very comparable to those for 2L8P, with excellent overall correlation but larger errors for the protons involved in the central hydrogen bond between the base pairs (see Figure S8). Removing those from the correlation results in an almost perfect slope of -0.983. To test our hypothesis that the modelling method produces hydrogen bonds that are too long and too weak during the structure-determination procedure, we plotted the error in the predictions versus the distance of the central hydrogen bond (see Figure S9). Although we expected some correlation, the almost perfect linear correlation came as a surprise. Only one of the termini (U18) shows a larger deviation from the correlation line. A somewhat larger distance and, thus, a smaller relative error is expected due to fraying of the termini.
This definitely does not imply that one should scale the distances linearly to the errors to get to the "real" system. It should, however, be checked if distance constraints between the base pairs or improved force-field parameters forcing them closer together could help to improve the predictions of these and other nuclei close in space.
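The analysis of prediction error versus hydrogen-bond distance can be sketched as below. This is a minimal illustration, not the procedure used for Figure S9: it assumes that the hydrogen-bond distance is measured between the proton and the acceptor atom taken from the model coordinates, and that the strength of the relationship is judged by the Pearson correlation coefficient. All names and numbers are hypothetical.

```python
from math import dist, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def hbond_error_correlation(records):
    """records: list of (proton_xyz, acceptor_xyz, shift_error_ppm).

    Assumption: the H...acceptor distance (in the coordinate units,
    typically angstroms) represents the hydrogen-bond length.
    Returns (distances, pearson_r of distance vs. error).
    """
    distances = [dist(h, a) for h, a, _ in records]
    errors = [e for _, _, e in records]
    return distances, pearson_r(distances, errors)


if __name__ == "__main__":
    # Hypothetical coordinates and errors (not taken from 2LAR).
    records = [
        ((0.0, 0.0, 0.0), (1.8, 0.0, 0.0), 1.0),
        ((0.0, 0.0, 0.0), (1.9, 0.0, 0.0), 1.5),
        ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), 2.0),
        ((0.0, 0.0, 0.0), (2.1, 0.0, 0.0), 2.5),
    ]
    distances, r = hbond_error_correlation(records)
    print([round(d, 2) for d in distances], round(r, 3))
```

A high correlation in such a plot identifies which hydrogen bonds deserve tighter distance constraints; it does not justify rescaling the distances by the errors.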

Non-Canonical Structures: Riboswitch N1 with Bound Ribostamycin
We now turn to the very important class of riboswitches. The riboswitch N1, at the time of writing the smallest riboswitch functional in vivo, was investigated by NMR spectroscopy (6). It contains only 27 nucleotides in a bulged hairpin secondary structure and responds to the antibiotics neomycin B and ribostamycin. In

Nucleic Acid / Protein Complexes: Antennapedia Homeodomain-DNA Complex
One of the first DNA-protein complexes solved by NMR spectroscopy is the Antennapedia homeodomain-DNA complex. It was published in 1993 by the Wüthrich group (7). We again used the first model to predict the chemical shifts. The results are of the same high accuracy as for the double helix structures described above (see Figure S11). Besides the high-field shift of polar protons forming hydrogen bonds in the base pairs, some protons of the bases forming hydrogen bonds to the solvent are also predicted with relatively large errors. Other than that, no large deviations are seen, even at the interface to the protein (Figure S11). This speaks, on the one hand, for the excellent work done by Billeter et al. (7) more than 20 years ago and, on the other hand, for the general applicability of our chemical shift calculations. In 1998, Fraenkel and Pabo compared the NMR structure to the newly determined X-ray structure of the same complex (8). They identified larger differences in the interactions of the three amino acids ARG43, GLN50, and ASN51 with A20, C7, and A19/A20, respectively. Since the predictions of the chemical shifts in these regions do not deviate more than in other regions, our calculations support the original NMR solution structure, or at least do not disprove it. Calculations are under way to analyze the agreement between the X-ray structure and the NMR chemical shifts. If these show large deviations in the regions around the three amino acids, the structural differences have to be caused by crystal packing effects.