Flexibility and structural conservation in a c-KIT G-quadruplex

A quadruplex sequence from the promoter region of the c-KIT gene forms a stable quadruplex, as characterized by crystallographic and NMR methods. Two new crystal structures are reported here, together with molecular dynamics simulation studies on these quadruplex crystal structures and an NMR structure. The new crystal structures, each in a distinct space group and lattice packing arrangement, together with the existing structures, demonstrate that the c-KIT quadruplex fold does not change with differing environments, suggesting that quadruplex topological dynamism is not a general phenomenon. The single and dinucleotide loops in these structures show a high degree of conformational flexibility within the three crystal forms and the NMR ensemble, with no evidence of clustering to particular conformers. This is in accord with the findings of high loop flexibility from the molecular dynamics studies. It is suggested that intramolecular quadruplexes can be grouped into two broad classes: (i) those with at least one single-nucleotide loop, often showing singular topologies even though loops are highly flexible, and (ii) those with all loops comprising at least two nucleotides, leading to topological dynamism. The loops in this second class can have more stable and less dynamic base-stacked secondary structures.


INTRODUCTION
Quadruplex DNA and RNA structures are formed from repeats of guanine tracts that are interspersed by mixed sequences ('loops'). These serve to connect together the basic building-blocks of these four-stranded arrangements, the G-quartets (1,2). Sequences encoding quadruplex nucleic acids have been found in a variety of locations in the human (3,4) and other genomes (5), principally in eukaryotic telomeres (6,7), in promoter regions (8–10), in untranslated regions, especially 5′- and 3′-UTRs (11,12), and in a number of breakpoint regions (13–16). Their biological functions remain to be fully elucidated, although the demonstration of the existence of quadruplexes in cancer cells (17,18) and of enhanced levels in human cancer tissues (19) is strongly supportive of the concept that the presence of quadruplexes may be deleterious when they cannot be fully unwound, as in cancer cells having, for example, deficiencies in the FANCJ helicase.
The concept that sequences within a promoter comprising repeats of short G-tracts could form higher-order quadruplex structures was first suggested for the c-MYC gene (9,20). Such sequences have subsequently been found to be widely prevalent in the human genome, with a particular over-representation in genes involved in proliferation, notably in oncogenes such as k-RAS (21), b-RAF (22), SRC (23), BCL-2 (24,25) and c-KIT (26–32). Promoter quadruplexes can be stabilized by the binding of appropriate small molecules and it has been suggested that this can be an effective approach to transcriptional inhibition at the single-gene level (9,10). A large number of studies have reported downregulation of expression of one or other of these genes in cell-based systems when quadruplex-binding small molecules have been used (see (10,33,34) for recent reviews). Correlations have been suggested in a number of these studies between the in vitro affinity of a small molecule for a particular promoter quadruplex and expression changes in that particular gene, although this is not always the case (35). Even though there is no definitive evidence to date of cause and effect in cells for a particular promoter quadruplex gene target, the attractiveness of the concept has led to its current emphasis as a novel drug discovery strategy.
The c-KIT gene codes for the KIT tyrosine kinase gene product, which is implicated in a number of human cancers, notably gastro-intestinal cancer (GIST), where dysregulation of c-KIT expression is the primary causative event of this disease (36–38). Two quadruplexes have been identified in the promoter of this gene, one positioned between −87 and −109 bp (termed here c-KIT G4) (26) and [...] (39–44) and it has been suggested that quadruplex targeting may be a novel approach to the therapy of GIST.
The topology and molecular structures of the c-KIT (28,31) and c-KIT1 (29,30) G4s have been characterized by nuclear magnetic resonance (NMR) studies and both show unique features compared to other known quadruplexes. In addition, a crystal structure of the c-KIT G4 has been determined (32), with topology identical to that reported in the NMR study (on the identical sequence), albeit with differences in groove dimensions. The quadruplex has a parallel fold with two single-nucleotide propeller (double chain-reversal) loops and a long five-nucleotide lateral stem loop; unusually, one non-G-tract guanine participates in the core of stacked G-quartets (Figure 1). The equivalence of solution and crystal structures suggests that the dynamics of the c-KIT quadruplex are limited and that the observed topology represents the global minimum structure. We report here on two further c-KIT crystal structures that have also been determined on the identical sequence, but crystallize in distinct space groups (P3₁21 at 1.82-Å resolution and H3 at 2.73-Å resolution, compared to space group P2₁2₁2 for the previously reported brominated form, which was determined at 1.62-Å resolution). These new crystal structures also show the unique c-KIT topology, providing added credence to this hypothesis. We further report on a series of molecular dynamics (MD) simulations, on a total of five distinct c-KIT quadruplex starting-points from the two high-resolution crystal structures and the NMR structure. This has enabled the flexibility of the c-KIT quadruplex to be examined in detail, and implications for the structure-based design of specific small molecules to be addressed.

Crystallography
The c-KIT DNA quadruplex sequence d(AG₃AG₃CGCTG₃AGGAG₃) was synthesized and purified as described previously (32). A 2-mM DNA solution containing 20-mM potassium cacodylate buffer at pH 6.5 and 50-mM KCl was heated to 358 K before annealing by slow cooling to room temperature. Crystals were grown by the hanging-drop vapor diffusion method. A 1-μl volume of a premixed drop solution containing 10% MPD, 10-mM MgCl₂, 50-mM KCl and 50-mM sodium cacodylate at pH 6.5 was added to 1 μl of c-KIT DNA solution at a concentration of 2 mM. The drop solution was equilibrated against a well solution containing 50% MPD, 20-mM MgCl₂ and 100-mM NaCl at 283 K. Hexagonal and rod-like crystals appeared after 2 weeks. The rod-like crystals were assigned the alternative hexagonal H3 setting for the rhombohedral space group R3 and the hexagonal crystals were assigned the trigonal space group P3₁21. Data sets were collected at 105 K on a single flash-frozen crystal of each type at the Diamond synchrotron facility (UK).
Molecular replacement methods were used in solving these two crystal structures using the G-quartets extracted from the earlier BrU c-KIT DNA crystal structure (PDB code 3QXR). Other parts of the quadruplex structures were progressively built on the basis of 2Fo-Fc and Fo-Fc difference electron density maps. This was followed by the location of K⁺ ions in the central ion channels. A total of 162 water molecules were identified in the 1.82-Å (space group P3₁21) crystal structure, and three water molecules were located in the 2.73-Å (space group H3) crystal structure. Final R and Rfree values are listed in Table 1. Coordinates and structure factors are available from the PDB as entries 4WO2 and 4WO3, respectively. Diffraction data were processed by the SCALE program in the CCP4 package (45). Programs COOT (46), PHASER (47) and REFMAC5 (48) were used in the processes of structure building, model fit and refinement, respectively. Programs CHIMERA (49) and PYMOL (50) were used for visualization and analysis. Table 1 details the crystallographic data.

MD simulations
Coordinates for the two higher-resolution crystal structures and the NMR structure of the c-KIT quadruplex DNA were obtained from the Protein Data Bank (PDB). These were: (i) the crystal structure of BrU c-KIT G4 (PDB 3QXR) at 1.62-Å resolution (32); (ii) the crystal structure of the native c-KIT G4 at 1.82-Å resolution (PDB 4WO2); (iii) the NMR structure of monomeric c-KIT G4 (PDB 2O3M) (28).
Both of these X-ray structures comprise two intramolecular quadruplexes per asymmetric unit (chains A and B), and all four crystallographically independent quadruplexes were used in the simulations, together with model 1 from the NMR structure. These intramolecular 22-base quadruplexes all have the identical native sequence, 5′-d(AG₃AG₃CGCTG₃AG₂AG₃)-3′, which has not been modified from that occurring in the c-KIT gene promoter sequence. Nucleotide numbering in this paper is sequential from the 5′-adenosine.
The sole exception to this is the sequence in the brominated X-ray structure (32), which differs at the 12th position.
[Table 1 footnote: The data on the BrU c-KIT structure have been previously reported (32); these are shown here for comparison purposes.]
The five simulated systems are designated as follows: #1 chain A and #2 chain B, both derived from the BrU c-KIT crystal structure; #3 chain A and #4 chain B, derived from the high-resolution native c-KIT crystal structure reported here; and #5 model 1 from the native c-KIT NMR structure.
Since the individual conformers available in the NMR ensemble have an average deviation in their structural alignment of <0.5 Å, it was considered that any one of them would be equally suitable to be used as a starting point for the MD study. The NMR structure was used for comparison purposes, to explore the dynamic behavior of c-KIT structures of identical sequence but determined by two different techniques (X-ray and NMR).
Consecutive K⁺ ions vertically aligned within the central core of the quadruplexes and mid-way between each G-quartet were retained at their respective crystallographic positions, as were the structural K⁺ ions present in the loops of quadruplexes #1, #2 and #3. Two additional Mg²⁺ ions observed in the loops of quadruplex #1 were also retained for the simulations. In the NMR system, the missing structural K⁺ ions were manually added into the central core of the G4, at the positions reported in the X-ray structures. The explicit water molecules present within the crystallographic structures were retained for the simulation setups, even though they only represent a small fraction of the total numbers of water molecules in these structures.
All of the MD simulations were all-atom ones, performed with the GROMACS v.4.6 program (51), employing the AMBER parmbsc0 force field (52) ported into GROMACS. The TIP3P water model was used, and AMBER parm99sb parameters were applied for the ions, both the structural ions and the counter-ions. The simulation protocols, reported elsewhere (53,54), were consistent for all five c-KIT quadruplex systems. The production step of each of the five MD simulations was carried out in triplicate for 250 ns in order to improve sampling and statistical reliability, resulting in 750 ns of simulation data per quadruplex for subsequent analysis. Altogether, 3750 ns of MD trajectories at 300 K were generated. All the simulations were performed on an in-house AMD CPU-based Linux cluster (IBM Blade Center H; 16 CPUs per simulation) at the Italian Institute of Technology (Genoa, Italy). The trajectories were analyzed with a robust clustering algorithm (55), adapted to cope with MD data (56), that automatically provides the desired number of clusters, in order to identify the most prevalent conformations sampled throughout the simulation time (see the Supplementary data for further information). The advantage of using this particular clustering algorithm is that multiple non-consecutive trajectories of an individual system can be clustered together, while transitions between individual clusters are counted and reported only if the frames belong to the same trajectory, which ensures that no artificial transitions between clusters are formed.
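The trajectory-aware counting of transitions described above can be illustrated with a minimal sketch. It is not the published algorithm (55,56); it only shows, under the assumption that each frame carries a cluster label and a trajectory identifier, how a change of cluster across the boundary between two concatenated trajectories can be excluded. The function name and input layout are hypothetical.

```python
from collections import Counter


def count_transitions(cluster_labels, trajectory_ids):
    """Count cluster-to-cluster transitions between consecutive frames.

    A transition is counted only when both frames belong to the same
    trajectory, so joining independent runs end to end never creates
    an artificial transition at the join (hypothetical helper).
    """
    if len(cluster_labels) != len(trajectory_ids):
        raise ValueError("one trajectory id is required per frame")
    transitions = Counter()
    for i in range(1, len(cluster_labels)):
        if trajectory_ids[i] != trajectory_ids[i - 1]:
            continue  # boundary between two independent runs: skip
        previous, current = cluster_labels[i - 1], cluster_labels[i]
        if previous != current:
            transitions[(previous, current)] += 1
    return transitions
```

With labels `[0, 0, 1, 1, 2, 2, 0]` and trajectory ids `[0, 0, 0, 0, 1, 1, 1]`, the change from cluster 1 to cluster 2 falls on the boundary between the two runs and is therefore not counted.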
Clustering of the MD trajectories (3 × 250 ns) was carried out for each of the five systems, employing a total of 150 000 frames per system, saved at 5-ps intervals. The maximum number of clusters was set to 10 (this was found to be an optimal number, based on preliminary clustering analysis where various numbers of resulting clusters were identified, ranging from 5 to more than 20).
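The simulation totals quoted in this section follow directly from the run parameters, as the short check below shows. The constant and function names are illustrative only; the values are those stated in the text.

```python
# Run parameters as stated in the text.
N_SYSTEMS = 5    # quadruplex systems #1 to #5
N_REPLICAS = 3   # independent production runs per system
RUN_NS = 250     # length of each production run, in ns
FRAME_PS = 5     # interval between saved frames, in ps


def total_simulation_ns(n_systems=N_SYSTEMS, n_replicas=N_REPLICAS, run_ns=RUN_NS):
    """Aggregate simulation time over all systems and replicas, in ns."""
    return n_systems * n_replicas * run_ns


def frames_per_system(n_replicas=N_REPLICAS, run_ns=RUN_NS, frame_ps=FRAME_PS):
    """Number of frames clustered per system (1 ns = 1000 ps)."""
    return (n_replicas * run_ns * 1000) // frame_ps
```

These give 3750 ns in total and 150 000 frames per system, in agreement with the figures above.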

Crystal structures
The c-KIT quadruplex in both the 1.82-Å and the 2.73-Å structures (Table 1) forms head-to-head dimers in the crystals, closely analogous to that previously observed in the BrU form. Each monomeric intramolecular quadruplex in a dimer has been designated as chain A or B. Thus the two new crystal structures comprise four crystallographically independent c-KIT quadruplexes. The quadruplex topology in each instance is identical to that previously reported for c-KIT in the crystal and in NMR solution. In each instance the 22-nucleotide DNA sequence folds into a parallel four-stranded G-quadruplex, and the isolated non-G-tract guanine (G10) is embedded in the G-quartet core. Two single-residue linkers (A5 and C9) form two propeller (double-chain reversal) loops that bridge three G-quartet layers. C11 and T12 form the third loop, which connects two G-quartet corners (G10 and G13). The stem-loop formed by the five-residue sequence A16-G17-G18-A19-G20 allows the terminal G21-G22 to be inserted back into the G-quartet core. The fourth loop, the long lateral stem loop, retains its particular secondary structural features of two A:G hydrogen bonds. Potassium ions were located in the central channel of both crystal structures (Figure 2). No additional potassium or magnesium ions were observed in the grooves or loops of these quadruplexes. This contrasts with their location in the higher-resolution BrU form (strong electron density was observed in the 1.82-Å crystal structure at the boundary between two stacked asymmetric units, which is consistent with a potassium ion rather than a water molecule).
The bases at the dimer stacking interfaces (A1, C11 and T12) show the most striking difference between the four individual c-KIT G-quadruplexes, the ones reported in the earlier BrU crystal structure, and the NMR structure (Figures 3a and b and 4). In the NMR c-KIT structure, the two bottom bases A1 and T12 form a Watson-Crick base pair and stack onto the bottom quartet surface (Figure 4a). In the earlier BrU (orthorhombic) G4 crystal structure, there is hydrogen bonding in chain A between N6 of A1 and O2 of BrU12, and a water molecule bridges between N1 of A1 and N3 of BrU12 (Figure 4b). A1 of the BrU G4 in chain B is oriented away from the structure, and bases C11 and BrU12 stack on the quartet surface, beneath the contact between a potassium ion and a water molecule (Figure 4c).
In the two new c-KIT crystal structures, two quadruplexes stack on each other through the bottom G-quartet, and the interface bases A1, C11 and T12 are oriented away from the common surface (Figure 4d and e). The two A1 bases in each asymmetric unit form a stacking pair in both crystal structures, while C11 and T12 have different roles. In the trigonal crystal form, C11 and A5 from different asymmetric units form a long stacking unit. Two C11 bases in the 2.73-Å hexagonal crystal structure form a hydrogen bond with a G7 phosphate oxygen atom in the other asymmetric unit, and the T12 base stacks with the G18 base from another asymmetric unit.
These differences between the total of six independent c-KIT quadruplexes in the three crystal structures are also seen in the diverse conformations of several others of the external, non-hydrogen-bonded bases (Figure 3a and b). Backbone torsion angles for the extra-helical nucleotides adopt a wide range of values (see Table 2 and Supplementary Table S1), with very few torsion angles being conserved across the six independent quadruplex chains (values for the nucleotides comprising the stacked G-quartet core are in general highly conserved; see the Supplementary data for a full listing of torsion angles). Loop angles δ and to a lesser extent [...] have conserved values, the most notable exception being in the quadruplex A chain of the 2.73-Å crystal structure. In the BrU form, bases A1, A5 and C9 are directed out from the quadruplex core, base C11 from chain B is tucked into the core, under the G10 deoxyribose, and the BrU at position 12 is also tucked in, partially stacking onto G13. In the high-resolution 1.82-Å structure bases A1, A5 and C9 are also oriented outward, as is C11. Base T12, however, adopts a distinct, partially swung-out conformation, with the edge of the thymine coming closer to the backbone at G22. By contrast, these bases in the ensemble of NMR structures adopt a significantly narrower range of conformations (Figure 3c). The most pronounced difference between the crystal structures and the NMR models involves the large cleft between the stem-loop (G18, A19 and G20) and the adjacent G-quartet. The net effect of these differences is that the large cleft is consistently narrower in all of the crystal structures.

MD simulations
Multiple-trajectory MD simulations of the intramolecular 22-mer c-KIT G4 in five distinct systems were carried out for 250 ns, each repeated in triplicate, for a total of 3750-ns simulation time. The systems were those designated #1 to #5 above. The multiple-trajectory approach enabled improved sampling of the conformational space to be obtained for a given set of conditions/local minima, resulting in more statistically robust conclusions than analyses based on a single-trajectory approach. The structural convergence of the five systems over the course of the multiple 250-ns simulations is graphically represented as RMSD plots in Figure 5 and summarized in Supplementary Table S2.
[Table 3]
[...] here, but during preliminary 250-ns simulations the NMR structure became unwound and unstable in the absence of K⁺ ions).
The flexibility of the bases within these systems was explored prior to starting the MD simulations, by plotting the averaged normalized B factor values per residue that were obtained from the X-ray structures (Figure 6a). Although experimental B factors are purely crystallographic quantities, theoretical B factors (per residue) were also calculated, and normalized, for the c-KIT G4 NMR structure, employing all the conformations within the NMR bundle. These theoretical B factors, and the RMSD values averaged per residue over all of the models within the NMR structure, were calculated and normalized by means of the GROMACS g_rmsf tool (Figure 6a). The data show a consistent pattern of increased B values for those nucleotides with extrahelical bases, especially at and around A1, A5, C9, C11 and T12. Nucleotide A19 is also flexible, even though the adenine base is stacked onto G18. Comparison of the crystallographic and NMR-derived fluctuations shows that the flexibility of residues A5, C9, C11, T12 and A19 is apparent both in the experimental and simulated structures.
The RMS fluctuations of individual bases were calculated on completion of the MD simulations from all three trajectories available for each of the five systems, and were normalized (Figure 6b). There is consistently good agreement between the experimental and simulation data, with the same regions and nucleotides in all of the c-KIT quadruplexes showing equivalent flexibility and stability.
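To make the comparison between crystallographic B factors and calculated fluctuations concrete, the sketch below uses the standard isotropic relation B = (8π²/3)·RMSF² to convert a root-mean-square fluctuation (in Å) into a B factor (in Å²). The normalization scheme used for Figure 6 is not specified in the text; the z-score shown here is an assumption chosen only for illustration, and both function names are hypothetical.

```python
import math


def rmsf_to_b_factor(rmsf_angstrom):
    """Isotropic B factor (in A^2) from an RMS fluctuation (in A).

    Uses B = (8 * pi^2 / 3) * RMSF^2, where RMSF^2 is the
    three-dimensional mean-square positional fluctuation.
    """
    return (8.0 * math.pi ** 2 / 3.0) * rmsf_angstrom ** 2


def normalize(values):
    """Z-score normalization (an assumed scheme, not taken from the paper)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

Normalizing both the experimental and the calculated per-residue profiles in the same way places them on a common dimensionless scale, so that the positions of the flexible residues can be compared even when the absolute magnitudes differ.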
The three 250-ns MD runs per system were clustered together, resulting in 10 clusters per system. Each of the identified clusters is represented by a medoid structure (i.e. a representative structure/object of the cluster whose average dissimilarity to all other structures/trajectory frames in the cluster is minimal). Each of the medoids corresponds to a physical frame from the trajectory. The probability of formation (i.e. the cluster population) of each cluster was also calculated and is reflected in the graphical representation of the clusters (Figures 7 and 8).
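The definitions of a medoid and of a cluster population given above can be written out as a minimal sketch. It assumes that a pairwise dissimilarity matrix between frames (for example, pairwise RMSD values) and a cluster label per frame are already available; how these were obtained in the present work is described in the Supplementary data and is not reproduced here. The function names are hypothetical.

```python
from collections import Counter


def medoid_index(distance_matrix, members):
    """Return the member frame with minimal average dissimilarity
    to all frames in the same cluster (the medoid)."""
    best, best_mean = None, float("inf")
    for i in members:
        mean_d = sum(distance_matrix[i][j] for j in members) / len(members)
        if mean_d < best_mean:
            best, best_mean = i, mean_d
    return best


def cluster_populations(labels):
    """Fraction of all frames assigned to each cluster."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: n / total for cluster, n in counts.items()}
```

Because the medoid is selected from among the frames themselves, it always corresponds to a physical frame of the trajectory, unlike a coordinate average.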
Although clustering was performed solely for the nucleic acid atoms, the complete systems, including the structural ions, were retrieved for visualization. The structural K⁺ ions within the central core in between the quartets always remained at their respective positions (Figures 7 and 8), as did the Mg²⁺ ions, which remained within the loops of the #1 G4 structure (Figure 7). However, the K⁺ ions initially present in the loops of systems #1, #2 and #3 did not remain at their sites throughout the course of the MD runs.
Clusters of chains #1 and #3 (Figure 7) from both X-ray structures displayed a common pattern. Nine of the 10 clusters are highly interconnected, i.e. undergoing multiple transitions between structural conformations, but one cluster is isolated (cluster no. 8 for #3, and cluster no. 3 for #1; the population in both cases is between 12 and 15%), and is only connected to one of the other clusters. Superimposition of all the 20 representative structures of #1 and #3 shows that their conformations are overall in good agreement, with the exception of bases C9, C11 and T12 where significant base flipping occurs (Figure 7).
Figure 9. Different positions/orientations of bases identified in cluster representatives sampled throughout the MD runs. Bases C9, C11 and T12 were found to be oriented up (green), down (red) and away from the backbone (blue). The bases were also found to be stacking on the bases of other G-quartets (not shown here).
Superimposition of the cluster representatives (medoid structures) of #2 and #4 G4s reveals more conformational flexibility/variability compared to #1, in particular in the loop region formed by bases A16-G20. Clear base-flipping occurs at the corresponding bases C9, C11 and T12 (Figure 7).
In terms of medoid structures obtained from the NMR model MD simulations (Figure 8), very good agreement was observed for the conformations, in accord with high stability of the simulation data (trajectories). The clusters are also very well interconnected with high frequencies of transitions between them (Figure 8a). Bases C9 and C11 were the only two where some conformational variability of the bases was found (Figure 8b). However that was to a much smaller extent compared to the X-ray structures.
The flipping (and change of orientation) of bases C9, C11 and T12, determined from the clustering data of all five systems, was further examined and quantified, and is summarized in Table 4. Viewing the structures with the G-quartets held in plane, orthogonal to the axis of K⁺ ions in between the G-quartets, has enabled four orientations of these bases to be identified with respect to the DNA backbone: (i) up, (ii) down, (iii) oriented out, (iv) stacking, as shown in Figure 9. These positions correspond to those described in Table 4.
Base C9 remained predominantly in the same orientation in all five simulated systems, as in all three experimental structures: it is oriented away from the DNA backbone. This base did sometimes adopt other orientations, although not in structure #2 or the NMR model. Base C11 has the highest degree of conformational variability, in both #3 and #4 G4s, as well as #1. T12, on the other hand, was found to be stacking on base G13 in the NMR model as well as in both #1 and #2 structures. In #3 and #4, T12 was also found to be oriented away from the DNA backbone.

The short c-KIT loops are very flexible
The consistent experimental observations of only very limited conformational flexibility for the core of the c-KIT quadruplex in all the crystal and NMR structures are in overall excellent accord with the MD data, as shown in particular in Figure 6, comparing experimental B factors with calculated RMSD values. Similarly, the conclusions from the simulations that the loop nucleotides on the exterior of the structures are conformationally flexible and show extensive base flipping are in good accord with the experimental observations from the three separate crystal structures and the NMR structure.
The trinucleotide loops in human telomeric quadruplexes have been extensively explored by MD simulation methods (57–63) and these have also been found to be conformationally mobile (60,63). Although a range of conformations have been observed in these simulations, dependent on loop type, base-base stacking in the loops does stabilize some of them for at least part of the simulation time. Loop flexibility has also been reported in a previous c-KIT quadruplex simulation (64), and the present study extends this finding by showing that single or dinucleotide loops are less likely to form discrete transiently stabilized base-stacked loops than are trinucleotide ones, as in human telomeric quadruplexes. By contrast, the pentanucleotide AGGAG loop in the c-KIT quadruplexes is rather stable, with experiment and simulation again in broad agreement.

The potential ligand binding sites of the c-KIT quadruplex
The flexibility of the external unpaired bases, in particular those forming the two single-nucleotide loops A5 and C9 [...] This feature also has 27% occurrence in the simulated chain A of the 1.82-Å c-KIT G4 trigonal form (Table 4), even though the stacking is not found in the initial starting point for this particular structure. Some ligands may stack on top of the in situ T12-A1 stack rather than onto this terminal G-quartet, which would involve displacing the T12-A1 base pair.
The large cleft in the c-KIT quadruplex is conserved in all the structures, experimental and theoretical, although its dimensions do have some variability (see, in particular, Figure 8). The cleft diameter can be defined by P···P inter-strand separation distances across the cleft. Distance P(G8)···P(G20) is 14.8 Å in the NMR structure, 8.1 Å in the BrU crystal structure and 8.7 Å in the 1.82-Å crystal structure reported here. The cleft is of sufficient size and nature to be a suitable small-molecule binding site. Even though in silico docking methods need to take the flexibility of cleft dimensions into account, the nature of the site is well-suited to experimental fragment-based approaches (53,65) to discover novel c-KIT quadruplex-selective small molecules capable of down-regulating c-KIT expression.
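The P···P separation used above to define the cleft width is simply the Euclidean distance between the two phosphorus atoms. A minimal sketch follows; the helper names are hypothetical, the coordinates must be supplied by the reader (for example, from the ATOM records of the relevant PDB entry), and the tabulated widths are the values quoted in the text rather than recomputed ones.

```python
import math


def interatomic_distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) in Angstroms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# P(G8)...P(G20) distances as reported in the text, in Angstroms.
REPORTED_CLEFT_WIDTH = {
    "NMR structure": 14.8,
    "BrU crystal structure": 8.1,
    "1.82-A native crystal structure": 8.7,
}


def cleft_width_range(widths):
    """Spread between the widest and the narrowest reported cleft."""
    values = list(widths.values())
    return max(values) - min(values)
```

The spread of about 6.7 Å between the NMR and crystal values gives a sense of the variation in cleft dimensions that a docking protocol would need to accommodate.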

The c-KIT quadruplex core topology is conserved
Quadruplex DNA and RNA nucleic acids can in principle adopt a wide range of topologies, and X-ray and NMR methods have to date almost certainly only accessed a small cohort of them. The potential for a large, but as yet undefined, number of folds is due to several factors, including (i) the inequality of G-tract lengths in many quadruplexes coupled with (ii) the diversity of loop lengths and sequence. It is apparent, for example, that when a guanosine nucleotide occurs in a loop region, it can become directly involved in quartet formation and thus induce more complex arrangements than is the case with simple quadruplex-forming sequences containing non-guanosine loops. The c-KIT quadruplex is an exemplar of this complexity of structure.
The present study addresses the question of whether the c-KIT quadruplex shows conformational diversity in differing environments. The earlier observations were that its fold as found by NMR in solution was also observed in the crystalline state. These findings have been reinforced by the determination of the two further c-KIT crystal structures reported here, each of which occurs in a distinct space group. These thus in total provide visualizations of the c-KIT quadruplex in three distinct crystal lattice environments. The differing lattices have not imposed restrictions on the conformations of those external bases of the c-KIT quadruplex, which are unencumbered by hydrogen bonding or internal stacking. Instead these bases are found in a diversity of conformations, comparable to those in the NMR ensemble (compare Figure 3b and c). The CD spectrum of the c-KIT DNA quadruplex in K⁺ solution shows the features of a typical parallel folded quadruplex (data not shown), with a prominent positive signal at around 260 nm and a negative signal at around 240 nm, fully consistent with the fold observed in the crystallographic and NMR c-KIT quadruplex structures.
It can thus be concluded that the core fold of the c-KIT quadruplex is highly conserved, stable and is not environmentally sensitive, at least in K⁺ conditions. This is in striking contrast to the high topological variability shown by human telomeric quadruplex sequences (see, for example (66–76)). This suggests that topological variability is not inherent to quadruplexes as a general class of nucleic acid structures. Rather, only those quadruplexes that have loops containing at least two nucleotides can readily show such variability, since single-nucleotide loops are generally constrained to form propeller loops and are unable to refold to form other types of loop, for stereochemical reasons (a single nucleotide is of insufficient dimensions to form a diagonal or lateral loop). The findings here are in accord with single-molecule studies of the c-KIT quadruplex (77) embedded in a duplex, which show that its dynamics are much reduced compared to the human telomeric quadruplex, and reinforce the view that the stability of the c-KIT topology makes it a suitable druggable quadruplex target for small molecules. The majority of promoter and other quadruplexes for which there is experimental folding data, from CD studies and in some instances from NMR studies, have at least one single-nucleotide loop, and parallel folds have mostly been assigned to these structures (see, for example (24,30,78–81)). We speculate from this that intramolecular quadruplexes can be grouped into two broad classes: (i) those with at least one single-nucleotide loop, which often show singular topologies even though loops are highly flexible, and (ii) those with all loops comprising at least two nucleotides, which is likely to lead to folding dynamism, and with loops having more stable and less dynamic base-stacked secondary structures.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.

ACKNOWLEDGMENTS
We are grateful to the Diamond Light Source for access to Synchrotron facilities, to Ambrose Cole (Birkbeck College) and Gavin Collie (CRUK Group, School of Pharmacy) for expert assistance with data collection, to Tony Reszka for help with oligonucleotide synthesis and spectroscopy and to Gary Parkinson for general discussions.