ASC-G4, an algorithm to calculate advanced structural characteristics of G-quadruplexes

Abstract ASC-G4 is an algorithm for the calculation of the advanced structural characteristics of G-quadruplexes (G4). It allows the unambiguous determination of the intramolecular G4 topology, based on the oriented strand numbering. It also resolves the ambiguity in the determination of the guanine glycosidic configuration. With this algorithm, we showed that the C3’ or C5’ atoms are more appropriate than the P atoms for calculating the groove width in G4, and that the groove width does not always reflect the space available within the groove; the minimum groove width is a better measure of that space. The application of ASC-G4 to 207 G4 structures guided the choices made for the calculations. A website based on ASC-G4 (http://tiny.cc/ASC-G4) was created, where users upload a G4 structure and obtain its topology, the types and lengths of its loops, the presence of snapbacks and bulges, the distribution of guanines in the tetrads and strands, the glycosidic configuration of these guanines, their rise, the groove widths, the minimum groove widths, the tilt and twist angles, the backbone dihedral angles, etc. It also provides a large number of atom-atom and atom-plane distances that are relevant to evaluating the quality of the structure.

CONTENTS
Figure S1. A G4 structure made of 4 tetrads separated into two blocks.
Figure S2. Structural issues for the detection of tetrads.
Figure S4. Structural issues for the detection of strands.
Figure S5. Output file of stacking nucleotides of structure 2KPR.
Figure S6. The two extreme guanosines with the C1' atom out of the base plane.
Figure S8. Part of the results of ASC-G4 for 7D5E.
Figure S9. The torsional twist angle and the tilt angle.
Figure S10. Output file of the minimum groove widths.
Figure S11. Output of the main chain and sugar torsion angles.
Table S1. Distribution of the topologies of the 192 one-block and interlaced structures.
Table S2. Distribution of the topologies of the 15 two-block structures.
Figure S13 (part 1). Investigation of the difference in the topologies with both clockwise and anticlockwise orientations of the strand numbering.
Figure S13 (part 2). Same as S13 (part 1) but for hybrid topologies.
Figure S15. Histogram of GBA depicted according to two different ranges.
Figure S16. Illustration of the discrepancies between the different characteristics of the configuration.

Figure S1. A G4 structure made of 4 tetrads separated into two blocks. Left panel: structure PDB ID 2MS9, with the following color code: 1st tetrad in light green, 2nd tetrad in pink, 3rd tetrad in purple, and 4th tetrad in bright green; the rest is a gray tube. Right panel: schematic representation of 2MS9 where the tetrads are represented as gray planes and the strands as blue arrows for down and orange arrows for up. The bold numbers at the top are the strand numbers and the topologies below are those of the two blocks. In this case, the two blocks are parallel but in opposite directions. Block 1 consists of tetrads 1-2 and block 2 of tetrads 3-4.

Figure S2. Structural issues for the detection of tetrads. The example of 2LPW illustrates the case where a guanine of a tetrad (here dG24) is closer to the facing guanine from another tetrad (dG3) than to the facing guanine of the same tetrad (dG4). The color code of atoms is as follows: C (cyan), N (blue), O (red), and P (tan). (C) The two tetrads of 148D are drawn, the first in light green and the second in pink. As observed, they are not planar. (D) In 6QJO, chain B, the distance between the planes of dG11 and dG6 is greater than the distance between dG11 and dG8, although dG6 and dG11 are part of the same tetrad (light green), and dG8 (yellow) is in a loop. The solid blue line represents the base plane and the black dashed line the distance of O6 to this plane. Hydrogen atoms are omitted for clarity.

[i,j]. The latter corresponds to the distance from O6 of the H-bond acceptor [j] to the plane of the H-bond donor [i] defined by atoms N1, O6, and N7. Lower part: list of the facing guanines and their C3'-C3', C5'-C5', and P-P distances. This file corresponds to structure 2KPR.

Figure S4. Structural issues for the detection of strands. (A) Two adjacent strands (the penultimate and ultimate ones) of 5J05 are represented as yellow and blue sticks. The C1' atoms are orange and red spheres.
Normally, the C1'-C1' distance between successive guanines of the same strand (dashed black lines) should be smaller than between two adjacent strands (dashed gray line), whereas here, d(C1'(15)-C1'(18)) < d(C1'(15)-C1'(16)). (B) Two stacking guanines in a strand. They are expected to be parallel, but the plane-plane angle is equal to 50.5°. (C) Structure 6B3K consists of two tetrads. Each strand is colored and numbered differently. As can be observed, in strands 1 and 2 (purple and green, respectively) the guanines of the top tetrad stack well over those of the bottom tetrad, whereas in strands 3 and 4 (orange and red, respectively) the upper guanines are shifted relative to the lower ones. Therefore, the upper G of strand 3 stacks over nothing, and that of strand 4 stacks over the lower G of strand 3, leaving the lower G of strand 4 without an upper stack.

Figure S5. Output file of stacking nucleotides of structure 2KPR. CG is the center of gravity of the purine heavy atoms of G, A, and I, and of the pyrimidine heavy atoms of C, T, and U. Cos(base-plane-angle) is the cosine of the angle between the base planes of the two stacking nucleotides. The rise is the distance between the two base planes.
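The two stacking descriptors of Figure S5 can be illustrated with a minimal sketch. It is not the ASC-G4 implementation: the function name `stacking_descriptors`, the use of the first three atoms of each base to define its plane, the absolute value taken on the cosine, and the measurement of the rise as the distance from the center of gravity of the upper base to the plane of the lower base are all assumptions made for this example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(x / norm for x in v)

def centroid(points):
    """Center of gravity (unweighted mean) of a list of 3D points."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))

def plane_normal(atoms):
    """Unit normal of the plane through the first three atoms (assumption)."""
    return unit(cross(sub(atoms[1], atoms[0]), sub(atoms[2], atoms[0])))

def stacking_descriptors(lower_atoms, upper_atoms):
    """Return (cos of base-plane angle, rise) for two stacked bases.

    Each argument is a list of heavy-atom coordinates of one base.
    """
    n_lower = plane_normal(lower_atoms)
    n_upper = plane_normal(upper_atoms)
    # abs() removes the dependence on atom ordering (assumption).
    cos_angle = abs(dot(n_lower, n_upper))
    # Rise: distance from the upper base's CG to the lower base plane.
    rise = abs(dot(sub(centroid(upper_atoms), lower_atoms[0]), n_lower))
    return cos_angle, rise

# Hypothetical coordinates: two parallel bases 3.4 Å apart.
lower = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
upper = [(0.0, 0.0, 3.4), (1.0, 0.0, 3.4), (0.0, 1.0, 3.4)]
cos_angle, rise = stacking_descriptors(lower, upper)
```

When the two bases are far from parallel, as in panel (B) of Figure S4, a single rise value depends on which point of the upper base is used, which is why the choice made here is flagged as an assumption.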

Figure S6. The two extreme guanosines with the C1' atom out of the base plane. (A) 5NYT:dG4 has the smallest distance (C1'-base plane) visually detectable: distance of C1' to plane (N9-C4-C8) = 0.15 Å. (B) 6SUU:dG4 has the largest distance (C1'-base plane) of our set of G4 structures: distance of C1' to plane (N9-C4-C8) = 0.80 Å. The atoms' color code is C (cyan), N (blue), O (red), and P (tan). Hydrogen atoms are omitted for clarity.

Figure S7. Output file of configurations. For each stem guanosine, the χ angle is given, followed by the configuration based on its value, then the distances d(N3-O5') and d(H1'-H8) followed by their respective predicted configurations, the distance of C1' to the base plane represented by atoms (N9-C4-C8), and finally the retained configuration based on all these elements. This file corresponds to structure 2KPR. Note that dG12 has an undetermined configuration based on its χ angle and that this indeterminacy is removed thanks to the distance calculations.
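The atom-to-plane distances quoted in these captions, such as the distance of C1' to the plane (N9-C4-C8), follow from the standard point-to-plane formula. The sketch below only illustrates that geometry: the function name `point_plane_distance` and the coordinates are hypothetical, and no cutoff is applied because none is stated here.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def point_plane_distance(point, a, b, c):
    """Unsigned distance from `point` to the plane through atoms a, b, c."""
    normal = cross(sub(b, a), sub(c, a))
    norm = math.sqrt(dot(normal, normal))
    if norm == 0.0:
        raise ValueError("the three plane atoms are collinear")
    return abs(dot(sub(point, a), normal)) / norm

# Hypothetical coordinates (Å): N9, C4, C8 lie in the z = 0 plane and
# C1' sits 0.8 Å above it.
n9, c4, c8 = (0.0, 0.0, 0.0), (1.3, 0.0, 0.0), (0.0, 1.3, 0.0)
c1_prime = (-1.2, -0.8, 0.8)
distance = point_plane_distance(c1_prime, n9, c4, c8)
```

The same function applies unchanged to the distance from O6 of an H-bond acceptor to the plane (N1-O6-N7) of the H-bond donor.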
Figure S8. Part of the results of ASC-G4 for 7D5E. The table gives the distribution of the stem guanines in the tetrads and strands, as well as the glycosidic configuration. In the columns, A refers to the chain name, followed by the guanine identification number. As observed, this 4-tetrad structure shows large discontinuities between tetrads 2 and 3 in strands 1, 2, and 3, which separate the two blocks. In the second block (tetrads 3 and 4), strand 4 consists of two nucleotides: A15 and the snapback A26. Therefore, the direction of the strand is not obvious. However, since all glycosidic configurations in this block are anti (without considering the snapback), indicating a parallel topology, and the direction of strands 1, 2, and 3 is up, it was deduced that the direction of strand 4 in block 2 is also up, resulting in a parallel topology. In the two blocks, the strands are connected by 7 loops. A linker, which is also a bulge between A13 and A15, connects the two blocks.

Figure S9. The torsional twist angle and the tilt angle. (A) The torsional twist angle is the pseudo-dihedral angle C1'(i)-CG(i)-CG(i+1)-C1'(i+1), where the first C1' and CG belong to tetrad i (in light green) and the second C1' and CG to tetrad i+1 (in pink). CG is the center of gravity of the four O6 atoms of a tetrad. The torsional twist angle is directional. (B) The tilt angle is the tetrad-strand angle calculated between two vectors, the former corresponding to the tetrad and the latter to the strand. CG and C1' are shown as hard spheres.
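The pseudo-dihedral angle C1'(i)-CG(i)-CG(i+1)-C1'(i+1) of panel (A) can be sketched with the usual signed-dihedral formula. This is an illustrative sketch, not the ASC-G4 code: the function names, the sign convention (the common atan2 form) and the coordinates are assumptions, since the sign convention used by the authors is not stated here.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def centroid(points):
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central axis b1.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def torsional_twist(c1_i, o6_tetrad_i, c1_next, o6_tetrad_next):
    """Pseudo-dihedral C1'(i)-CG(i)-CG(i+1)-C1'(i+1); CG = mean of four O6."""
    return dihedral_deg(c1_i, centroid(o6_tetrad_i),
                        centroid(o6_tetrad_next), c1_next)

# Hypothetical tetrads: the upper one is rotated by 30° and lifted by 3.4 Å.
o6_i = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
o6_next = [(x, y, 3.4) for x, y, _ in o6_i]
angle = math.radians(30.0)
c1_i = (5.0, 0.0, 0.0)
c1_next = (5.0 * math.cos(angle), 5.0 * math.sin(angle), 3.4)
twist = torsional_twist(c1_i, o6_i, c1_next, o6_next)
```

Using atan2 rather than acos keeps the sign of the angle, which is needed for a directional quantity.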

Figure S10. Output file of the minimum groove widths. The groove widths are calculated from C3' atoms (upper part) and C5' atoms (lower part). For each groove and each tetrad, the minimum distance between guanine i of strand n (first guanine identification) and guanine j of strand n+1 (second guanine identification) is given. The interest of this file is not the minimum distance itself, which is also found in the final output file, but the nucleotide identification of the closest guanines in each groove. This file corresponds to structure 2KPR.

Figure S12. The two extreme non-coplanar Hoogsteen base-pairs of guanines. (A) 6CCW has the smallest distance between atom O6 and the facing base plane in our range of visually detectable non-coplanar pairs. (B) 1OZ8 has the largest distance from O6 to the facing base plane in our set of G4 structures. The atoms' color code: C (cyan), N (blue), O (red), and P (tan). The solid blue line represents the base plane and the black dashed line the distance of O6 to this plane. Hydrogen atoms are omitted for clarity. Panel annotation: distance of dG12:O6 to dG16:plane (N1-O6-N7) = 1.0 Å.

Figure S13 (part 1). Investigation of the difference in the topologies with both clockwise (green arrows) and anticlockwise (red arrows) orientations of the strand numbering. This is a schematic representation of G4 viewed from the top of the tetrad that contains the first stem-guanine. Each circle represents a strand. The first strand (in purple), which is always down, is indicated with the number 1. The strands are colored according to their position in the stem. The directions of the strands (down, d, and up, u) are indicated near them. On the right of each G4, the strands are laid out linearly according to the direction of their numbering (clockwise, green arrow, and anticlockwise, red arrow). The number attributed to each strand is written above the laid-out strands. This is followed by the direction of the strands (d and u) and the resulting topology.
As observed, in four cases, numbering the strands in the clockwise or the anticlockwise orientation yields the same, unambiguous topology: parallel, antiparallel-chair, hybrid1, and hybrid4 (see next page for the hybrid topologies). In the four other cases, however, numbering in one orientation or the other yields two different topologies that are currently confused: antiparallel-basket with antiparallel-basket2, and hybrid2 with hybrid3. These topologies are grouped using black square brackets. Please note that here, the clockwise and anticlockwise orientations are only used to help understand the drawing, whereas, in ASC-G4, the orientation of the strand numbering is based on the configuration of the first stem-guanine: it follows the H-bond donors for an anti-G and the H-bond acceptors for a syn-G. For most G4s this orientation corresponds to a clockwise direction.
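The four-plus-four count described above can be checked by brute force. The sketch below assumes only what the caption states: four strands, strand 1 always down, and the two numbering orientations visiting the remaining strands in opposite cyclic order. The function names are hypothetical, and no topology name is attached to any direction pattern, because that mapping is not given in this section.

```python
from itertools import product

def numbering_readouts(spatial_directions):
    """Read the strand directions in both numbering orientations.

    `spatial_directions` lists 'd'/'u' for the four strands in one cyclic
    order around the stem, starting with strand 1.
    """
    s1, s2, s3, s4 = spatial_directions
    one_way = "".join((s1, s2, s3, s4))
    other_way = "".join((s1, s4, s3, s2))  # same circle, opposite sense
    return one_way, other_way

def classify_direction_patterns():
    """Split the patterns with strand 1 down into unambiguous and confused."""
    unambiguous = []
    confused_pairs = set()
    for rest in product("du", repeat=3):
        one_way, other_way = numbering_readouts(("d",) + rest)
        if one_way == other_way:
            unambiguous.append(one_way)
        else:
            confused_pairs.add(frozenset((one_way, other_way)))
    return unambiguous, confused_pairs

unambiguous, confused_pairs = classify_direction_patterns()
```

Under these assumptions, a pattern is orientation-independent exactly when strands 2 and 4 have the same direction, which gives four unambiguous patterns and four patterns forming two confusable pairs, in agreement with the counts in the caption.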