Consecutive non-natural PZ nucleobase pairs in DNA impact helical structure as seen in 50 μs molecular dynamics simulations

Abstract

Little is known about the influence of multiple consecutive ‘non-standard’ (Z, 6-amino-5-nitro-2(1H)-pyridone, and P, 2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one) nucleobase pairs on the structural parameters of duplex DNA. P:Z nucleobase pairs follow standard rules for Watson–Crick base pairing but have rearranged hydrogen bonding donor and acceptor groups. Using the X-ray crystal structure as a starting point, we have modeled the motions of a DNA duplex built from a self-complementary oligonucleotide (5′-CTTATPPPZZZATAAG-3′) in water over a period of 50 μs and calculated DNA local parameters, step parameters, helix parameters, and major/minor groove widths to examine how the presence of multiple, consecutive P:Z nucleobase pairs might impact helical structure. In these simulations, the PZ-containing DNA duplex exhibits a significantly wider major groove and greater average values of stagger, slide, rise, twist and h-rise than observed for a ‘control’ oligonucleotide in which P:Z nucleobase pairs are replaced by G:C. The molecular origins of these structural changes are likely associated with at least two differences between P:Z and G:C. First, the electrostatic properties of P:Z differ from G:C in terms of density distribution and dipole moment. Second, differences are seen in the base stacking of P:Z pairs in dinucleotide steps, arising from energetically favorable stacking of the nitro group in Z with π-electrons of the adjacent base.

The most comprehensive study of natural nucleic acid helix parameters was conducted by Pasi et al., who simulated natural nucleic acids for a full microsecond. Our study followed the same protocols as theirs, with the following exceptions. They filtered structures with broken Watson-Crick hydrogen bonds out of their statistics; we performed no filtering of the data. We used a cutoff of 8.5 Å for non-bonded interactions; they used 9.0 Å. Our calculation ran for 50 μs, whereas theirs ran for 1 μs. All other aspects of the GC simulations are identical to their work. We chose not to filter out structures with broken Watson-Crick base pairs so that our results can be compared to experiment in the future. They applied this filtering in order to convey more numerically realistic/stable helix parameters; we wish to compare to experiment (and true base pairs do occasionally lose their hydrogen bonds within the helix, as rare events, albeit on the ms time scale). We believe the difference in non-bonded interaction cutoff to be trivial.

The authors did not report their SEM values, and thus we cannot perform a fair comparison of statistical agreement (we do not know their uncertainty windows). We can take their values as the "known" benchmark against which to compare (although our increased sampling might argue that our numbers are a better benchmark). We use a 95% confidence interval from a two-sided t-test over our 150 samples (150 independent trajectories were run and analyzed individually), giving t = 1.96. We follow the two-significant-figure convention for intervals.

We discuss here the logic of the digits reported in the main paper text. In the vast majority of cases, the precision of each number (as judged by the 95% confidence interval t-test) is far greater than the accuracy of these numbers could possibly be. Consequently, we have arbitrarily truncated all lengths to 0.01 Å and all angles to 0.1 degrees.
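The interval convention described above (one mean per independent trajectory, a two-sided 95% multiplier of t = 1.96, and two significant figures in the interval) can be sketched as follows. This is a minimal illustration only: the function names `round_sig` and `mean_with_ci` and the synthetic input values are hypothetical and are not taken from the analysis scripts used for this work.

```python
import math
import random
import statistics

T_95 = 1.96  # two-sided 95% multiplier quoted in the text


def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))


def mean_with_ci(trajectory_means, t=T_95):
    """Return (mean, half-width of the confidence interval).

    `trajectory_means` holds one value per independent trajectory.
    The half-width is t * SEM, rounded to two significant figures.
    """
    n = len(trajectory_means)
    mean = statistics.fmean(trajectory_means)
    sem = statistics.stdev(trajectory_means) / math.sqrt(n)
    return mean, round_sig(t * sem, 2)


# Synthetic stand-in for 150 per-trajectory means of one parameter
# (hypothetical numbers, for illustration only).
rng = random.Random(0)
synthetic = [rng.gauss(-0.9, 0.05) for _ in range(150)]
mean, half_width = mean_with_ci(synthetic)
print(f"mean = {mean:.4f}, 95% CI half-width = {half_width}")
```

The half-width measures the precision of the mean; it is distinct from the standard deviation of the parameter distribution, which describes how much the parameter varies over time.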
We list the full precision of our numbers here for statistical due diligence, but numbers in the main text are truncated for ease of reading. As may be seen, we are in statistical agreement with Pasi et al. for all parameters, given that our data must be rounded to one digit in both the average and the standard deviation. The one exception is slide, for which both the average and the standard deviation disagree. However, the means of -0.9 Å and -1.1 Å are very close, as are the standard deviations of 0.5 Å and 0.6 Å.

Table S5. Comparison of all local parameters for the PZ- and GC-containing oligonucleotides.
The left-hand column gives the Watson-Crick pair as well as its rung position in the DNA helix; each rung in the helix is given twice, once for the variable PZ helix and once for the control GC helix. Each table entry gives the mean value of the parameter in question as well as its standard deviation, reflecting the spread of the distribution of the parameter over time. We emphasize that this is not the precision error; it is the distributional variation. The precision errors are based on a 95% confidence interval with respect to the standard error of the mean; this is true for all data analyzed in this paper. In each case, the standard error of the mean is at most 1° for angular quantities and at most 0.2 Å for length quantities. Additionally, we note that there is a symmetry to all data. Helix rung position 6 is equivalent to position 11, as 7 is to 10 and 8 is to 9. When discussing the data, we restrict the analysis to positions 6-8 for simplicity, since the same conclusions apply to their symmetry counterparts. The fact that the calculated data match so closely, even to within ±0.1°, speaks to the precision of the data due to complete phase space sampling. We emphasize that reporting a mean and standard deviation is not meant to imply a Gaussian distribution; they serve merely as descriptors of the data set. It is also of interest to note that the values in the middle of the helix are always slower to converge than those at the outer rungs (i.e., position 8 converges more slowly than positions 7 and 6 in all parameters, not just the local ones). It is of interest to see the particular influence of the helix rung position on each nucleotide/parameter; as there are at least 5 nucleotides before the helix end, we do not interpret this as an edge effect.

Figure S1. Representative histograms for selected helical parameters for the PZ-containing (blue) and the control (red) oligonucleotide in the MD simulations.
Density of state is defined as the number of structures for which the parameter falls in a defined range divided by the total number of structures sampled in each trajectory.

Figure S2. RMSD fluctuations over a 400 ns period in the MD trajectories computed for the (top) PZ and (bottom) GC duplexes. RMSD values are computed relative to an equilibrated structure at the beginning of the production part of the simulation. These data show that there is substantial interconversion between members of the two populations observed in the bimodal distributions for slide, twist, and htwist in PZ (see main text). In each of these plots, values for the specified parameters in the PZ helix rapidly oscillate between two extreme values rather than staying in one extreme or the other for any given length of time. The tendency of the GC-rich duplex to adopt similar conformations is also evident in these data.

Figure S3. Representative PZ and GC structures from MD simulations. Stick models are shown for ZP in an extended conformation (far left), ZP in an A-like conformation (center), and GC in a B-like conformation. Views are shown parallel (upper panels) and perpendicular (lower panels) to the helical axes.

Pulled from cc-cd-cd-no GAFF param
CA-CT-CT-HC   1    0.156      0.000    3.000    same as X -c3-c3-X
CA-CT-CT-CT   1    0.156      0.000    3.000    same as X -c3-c3-X
CA-CT-OS-CT   1    0.383      0.000    3.000    same as X -c3-os-X
C -CA-CA-CA   1    6.650    180.000    2.000    same as X -c2-c2-X
C -CA-CA-HA   1    6.650    180.000    2.000    same as X -c2-c2-X
C -CA-CT-H1   1    0.000      0.000    2.000    same as X -c2-c3-X
C -CA-CT-CT   1    0.000      0.000    2.000    same as X -c2-c3-X
C -CA-CT-OS   1    0.000      0.000    2.000    same as X -c2-c3-X
C -NA-CA-CA   1    0.625    180.000    2.000    same as X -c2-na-X
C -NA-CA-N2   1    0.625    180.000    2.000    same as X -c2-na-X
CA-CA-CA-HA   1    6.650    180.000    2.000    same as X -c2-c2-X
CA-CA-NO-OD

Coordinates of the gas-phase optimized structure of a G nucleobase

Coordinates of the "shift" conformer for two stacked P:Z nucleobase pairs (Fig. 9 in the main text)

Coordinates of the "slide" conformer for two stacked P:Z nucleobase pairs (Fig. 8 in

Comment on X- and Y-displacement distributions for the ZP-containing oligonucleotide.

Although we find very narrowly peaked distributions for X- and Y-displacement values of the ZP-containing oligonucleotide during the trajectory (shown below), we also find structures that exhibit very large values resulting from definitions of the standard helix reference frame.
The following coordinates are for one structure in which the X-displacement of the central dinucleotide step is measured to be 66 Å.