WebTetrado: a webserver to explore quadruplexes in nucleic acid 3D structures

Abstract Quadruplexes are four-stranded DNA/RNA motifs of high functional significance that fold into complex shapes. They are widely recognized as important regulators of genomic processes and are among the most frequently investigated potential drug targets. Despite interest in quadruplexes, few studies focus on automatic tools that help to understand the many unique features of their 3D folds. In this paper, we introduce WebTetrado, a web server for analyzing 3D structures of quadruplexes. It has a user-friendly interface and offers many advanced features, including automatic identification, annotation, classification, and visualization of the motif. The program applies to the experimental or in silico generated 3D models provided in the PDB and PDBx/mmCIF files. It supports canonical G-quadruplexes as well as non-G-based quartets. It can process unimolecular, bimolecular, and tetramolecular quadruplexes. WebTetrado is implemented as a publicly available web server with an intuitive interface and can be freely accessed at https://webtetrado.cs.put.poznan.pl/.


INTRODUCTION
Quadruplexes are four-stranded DNA and RNA motifs that form in genomic regions rich in guanine. They are involved in many genomic processes, including transcription, replication, and epigenetic regulation (1). Numerous studies point to their association with the growth and progression of cancer and other diseases. All this makes quadruplexes promising targets in drug design and interesting subjects of structural studies (2)(3)(4)(5).
In 2020, Popenda et al. proposed a classification scheme derived from base pairing patterns in tetrads (6). They defined three classes, O, N and Z, named after the shape observed in the tetrad visualizations. Each class has clockwise and anticlockwise progression, indicated with + and −, respectively. Next, they proposed classifying a quadruplex as O, N or Z if all of its tetrads are of this type, or M (mixed) otherwise. In addition, they added a suffix p, a or h for the parallel, antiparallel, or hybrid orientation of the strands, respectively.
The topologies underlying the classification of quadruplexes and other parameters of their structures can be analyzed using a few computational tools. DSSR (7) was the first to target the detection of G-quadruplexes in 3D structure data saved in PDB and PDBx/mmCIF files and to describe their features. It runs systematically on all entries in the Protein Data Bank and collects motifs found in the DSSR-G4DB database. ElTetrado (8) can identify and analyze G4s and other kinds of tetrads and quadruplexes, classify them, and compute their parameters. It is the core of the computation pipeline running within the ONQUADRO database system (9). The most recent tool for processing atom coordinates in the search for quadruplexes is ASC-G4 (10). It calculates more features than DSSR and ElTetrado, but is limited to unimolecular quadruplexes and supports only the PDB format.
In this paper, we introduce WebTetrado, a web server for analyzing 3D structures of quadruplexes. It has a user-friendly interface and offers many new features compared to its command-line predecessor, ElTetrado. Novelties include dedicated visualizations thanks to tight integration with our advanced tool DrawTetrado (11).
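The quadruplex-level rule described above can be illustrated with a minimal Python sketch. The function name and the string encoding of the classes (for example "O+" or "Z-") are assumptions made for this illustration and are not taken from ElTetrado or WebTetrado; the sketch also assumes that the +/− progression sign does not affect the quadruplex letter.

```python
def classify_quadruplex(tetrad_classes, orientation):
    """Combine per-tetrad ONZ classes into a quadruplex class such as 'Op' or 'Mh'.

    tetrad_classes: list of strings like 'O+', 'N-', 'Z+' (hypothetical encoding).
    orientation: 'p' (parallel), 'a' (antiparallel) or 'h' (hybrid).
    """
    if not tetrad_classes:
        raise ValueError("a quadruplex needs at least one tetrad")
    if orientation not in {"p", "a", "h"}:
        raise ValueError("orientation must be 'p', 'a' or 'h'")

    # Assumption: the progression sign (+ or -, ASCII or U+2212) is ignored here.
    letters = {cls.rstrip("+-\u2212") for cls in tetrad_classes}
    if not letters <= {"O", "N", "Z"}:
        raise ValueError(f"unexpected tetrad classes: {sorted(letters)}")

    # A single shared letter is kept; any mixture becomes 'M' (mixed).
    letter = letters.pop() if len(letters) == 1 else "M"
    return letter + orientation


# Example: two Z tetrads and one O tetrad with hybrid strand orientation.
print(classify_quadruplex(["Z+", "Z-", "O+"], "h"))
```

A quadruplex whose tetrads are all O with parallel strands would come out as "Op" under the same assumptions.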

METHOD OUTLINE
The first step in the WebTetrado pipeline (see Figure 1) is to read the input data and the configuration parameters. The front-end feeds these data to the back-end on the basis of the input form on the main page. The validation protocol ensures that the main input is a correct PDB or PDBx/mmCIF file and that all other analysis parameters have viable values. The back-end stores successfully validated inputs in a database and enqueues a computing task. This step involves the generation of a unique identifier, which the front-end embeds in a URL. Initially, the URL displays a loading page with the option to turn on browser notification upon the task's successful completion. Later, the same URL shows the results for the next seven days, after which it expires.
The WebTetrado engine supports parallel processing, so it can handle multiple requests at the same time. The central part of its pipeline starts with reading the configuration metadata from the database. The 3D structure is then loaded and interpreted in terms of its chain, residue, and atom composition. This includes the calculation of the glycosidic bond angle and the classification of each nucleobase as anti or syn. Next, WebTetrado applies geometrical rules (i.e. constraints on atomic distances, planar and (pseudo)torsion angles) to find stacking and base-base interactions together with their Leontis-Westhof classification. The result of this step allows for the building of a directed graph of nucleotide interactions in which cycles of length four correspond to tetrads in the analyzed structure. This leads to the next step in which the stacking information determined previously is applied on top of the tetrads to locate the N4 helices. Based on chain composition rules, these are divided into distinct quadruplexes, for which WebTetrado traces loop progression and strand connectivity. Moreover, the engine recognizes cations, which play significant roles in quadruplex stability, and proceeds to analyze their proximity to tetrad centers or external sites. Next, the engine classifies the tetrads and quadruplexes according to all its supported schemes and computes quadruplex-related features such as inter-tetrad twist, rise, or planarity deviation. Finally, the quadruplex motif is represented in the two-line dot-bracket format.
These results are stored in the WebTetrado database and a separate drawing task is added to the queue for each supported visualization tool, VARNA (12), R-Chie (13) and DrawTetrado (11). This approach allows for the parallel preparation of all static visualizations. Each drawing task starts by reading the metadata and the computing task's results from the database.
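As a rough illustration of the identifier and expiry logic described above, the following Python sketch generates a unique task identifier, embeds it in a URL, and checks the seven-day lifetime. The base URL, the `/result/` path, and all function names are hypothetical; the actual WebTetrado back-end may organize this differently.

```python
import uuid
from datetime import datetime, timedelta, timezone

RESULT_LIFETIME = timedelta(days=7)   # lifetime stated in the text
BASE_URL = "https://example.org"      # placeholder, not the real service URL


def create_task():
    """Create a task record with a unique identifier and a result URL."""
    task_id = str(uuid.uuid4())
    # The '/result/<id>' path is an assumption made for this sketch.
    return {"id": task_id, "url": f"{BASE_URL}/result/{task_id}"}


def is_expired(completed_at, now=None):
    """Return True once more than seven days have passed since completion."""
    now = now or datetime.now(timezone.utc)
    return now - completed_at > RESULT_LIFETIME


task = create_task()
print(task["url"])
```

Passing `now` explicitly keeps the expiry check deterministic, which is convenient when testing it.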
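The graph-based step above, in which cycles of length four correspond to tetrads, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ElTetrado implementation: the edge list, the nucleotide labels, and the function name are hypothetical, and the real engine also uses Leontis-Westhof classes and geometry that are omitted here.

```python
def find_tetrads(pairs):
    """Return the node sets of all 4-cycles in a directed graph.

    pairs: iterable of (source, target) tuples, one per base-base interaction.
    """
    graph = {}
    for src, dst in pairs:
        graph.setdefault(src, set()).add(dst)

    tetrads = set()
    for a in graph:
        for b in graph.get(a, ()):
            for c in graph.get(b, ()):
                for d in graph.get(c, ()):
                    # Close the cycle d -> a and require four distinct nodes.
                    if a in graph.get(d, ()) and len({a, b, c, d}) == 4:
                        # frozenset removes duplicates caused by cycle rotation.
                        tetrads.add(frozenset((a, b, c, d)))
    return tetrads


# Hypothetical interactions: one G-tetrad plus one unrelated pair.
edges = [("G1", "G5"), ("G5", "G9"), ("G9", "G13"), ("G13", "G1"), ("G1", "C20")]
print(find_tetrads(edges))
```

The nested loops are adequate here because each nucleotide has only a handful of interaction partners, so the search stays small even for large structures.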
The VARNA-based procedure uses a set of in-house modifications on top of VARNA software to apply custom coloring and Leontis-Westhof visual annotation, making the quadruplex visualization clear. WebTetrado precomputes four variants of VARNA-based visualization: (i) with interactions constituting tetrads only, (ii) with the addition of canonical pairs outside tetrads, (iii) with all noncanonical interactions and (iv) with all canonical and noncanonical interactions.
The R-Chie-based visualization draws arcs above and below the sequence to display two simultaneous interactions for every in-tetrad nucleotide. This is necessary because G-quartets are based on multiplet base pairing patterns (i.e. each in-tetrad nucleotide has two interacting partners). WebTetrado precomputes two R-Chie-based variants, with and without canonical base pairs outside the tetrads. Unlike tetrad-involved interactions, which use distinct colors for every ONZ class, the arcs representing canonical base pairs are black.
The last tool, DrawTetrado, is the most tightly coupled with WebTetrado, as the computing task's results directly influence its operation. DrawTetrado prepares a 2.5D view of each G4-helix and quadruplex, showing stacking information, anti/syn conformation, and loop progression.
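One simple way to place two interactions per nucleotide on opposite sides of the sequence is to alternate sides while walking around the tetrad cycle. The Python sketch below illustrates that idea only; it is an assumption for illustration, not the R-Chie API (R-Chie is an R package) and not necessarily how WebTetrado assigns arcs. The function name and the use of integer sequence positions are hypothetical.

```python
def assign_arcs(tetrad_cycle):
    """Assign each of the four tetrad interactions to an arc above or below.

    tetrad_cycle: four sequence positions listed in pairing order, so that
    consecutive entries (and the last with the first) interact.
    Returns a list of (left_position, right_position, side) tuples.
    """
    if len(tetrad_cycle) != 4:
        raise ValueError("a tetrad has exactly four nucleotides")

    arcs = []
    for k in range(4):
        i, j = tetrad_cycle[k], tetrad_cycle[(k + 1) % 4]
        # Alternating sides along an even cycle gives every nucleotide
        # exactly one arc above and one arc below the sequence.
        side = "above" if k % 2 == 0 else "below"
        arcs.append((min(i, j), max(i, j), side))
    return arcs


for arc in assign_arcs([1, 7, 13, 19]):
    print(arc)
```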

WEB APPLICATION
WebTetrado consists of three modules designed to provide flexibility and stability. The service core (engine) is responsible for processing user requests. It is built on top of the lightweight Flask server framework and integrates the ElTetrado tool (8) to identify and process quadruplex data. The next module, the back-end, uses database-driven middleware to manage, queue, and store user requests. It uses the Django web server framework (version 4.1) and the Redis task queue broker, enabling fast processing of concurrent workloads. The engine and the back-end use a Python 3.10 environment with dedicated bioinformatics libraries. They communicate via an OpenAPI-specified interface, which allows automatic validation. The web-accessible front-end is based on TypeScript's React 17 framework, extended with ant-design components. It provides a series of structure visualizations prepared with four incorporated graphical tools: VARNA (12), R-Chie (13), DrawTetrado (11) and Mol* (14). We designed WebTetrado to work on any modern web browser, either mobile or desktop. It is hosted and maintained by the Institute of Computing Science, Poznan University of Technology, using the Docker container service.

Input and output description
The input for WebTetrado is the tertiary structure of the nucleic acid given as the atomic coordinates in a PDB or PDBx/mmCIF file. Users upload the file from a local drive or provide the PDB id of a structure. In the latter case, the back-end automatically downloads the corresponding file from the Protein Data Bank (15). Six ready-to-use examples are also available in the system to familiarize users with the tool's capabilities. Additional settings condition the identification of tetrads and quadruplexes in the input structure and their classification. We provide sensible defaults, but optionally users can modify their values.
Users can select a particular model to analyze in the case of a multi-model input file. Next, they can instruct the system to turn off the highlighting of G-tetrads, i.e. canonical tetrads composed of exactly four guanines. By default, WebTetrado does not make assumptions about nucleotide composition and finds all types of quartet, but it highlights the canonical G4s among them. This behavior can be disabled. The next setting controls whether tetrads are detected with cWH pairings only. Again, these pairings are present in the usual G4 tetrads, but by default WebTetrado generalizes the search for quartets and looks for all kinds of pairs between in-tetrad nucleobases. In addition, users can set how many nucleotides to accept for stacking mismatch. This setting controls how sensitive WebTetrado should be to inherent uncertainty in stacking interaction detection. In a perfect, canonical quadruplex, each tetrad pair contains four pairs of stacked nucleobases. However, for several reasons, this might not be detected as such. For example, if the structure resolution is low or if it is an intermediate stage taken from the molecular dynamics trajectory, then most likely not all four nucleobase pairs will be recognized as stacked.
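To make the PDB id input concrete, here is a small Python sketch that checks a four-character identifier and builds a download address on the RCSB file server. It is an illustration only: the function name is hypothetical, the pattern covers just the classic four-character ids, and the text does not state which mirror or URL scheme the WebTetrado back-end actually uses.

```python
import re

# Classic PDB ids: one digit followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def pdb_download_url(pdb_id, fmt="cif"):
    """Return a download URL for a PDB entry in 'cif' or 'pdb' format."""
    if not PDB_ID_PATTERN.match(pdb_id):
        raise ValueError(f"not a valid four-character PDB id: {pdb_id!r}")
    if fmt not in {"cif", "pdb"}:
        raise ValueError("fmt must be 'cif' or 'pdb'")
    # Assumption: files are fetched from the RCSB download endpoint.
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


print(pdb_download_url("6h1k"))
```

Validating the identifier before any network request keeps malformed input from reaching the download step.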
To alleviate this issue, WebTetrado makes it possible to set a mismatch threshold. By default, at least two pairs of nucleobases stacked between tetrads allow them to be treated as part of the same quadruplex. Finally, users can disable chain reordering, required to classify bi/tetramolecular quadruplexes, which is enabled in default runs. Keeping the original order of chains, as given in the PDB or PDBx/mmCIF file, depends on the input settings.
W610 Nucleic Acids Research, 2023, Vol. 51, Web Server issue
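The effect of the stacking threshold can be sketched as follows. The Python code counts stacked nucleobase pairs between every two tetrads and merges tetrads that reach the minimum count, using a small union-find structure. All names and the data layout are hypothetical, and the default of two stacked pairs mirrors the default stated above; the actual ElTetrado logic may differ in detail.

```python
from itertools import combinations


def count_stacked(tetrad_a, tetrad_b, stackings):
    """Count nucleobase pairs (one from each tetrad) that are stacked."""
    return sum(1 for x in tetrad_a for y in tetrad_b
               if frozenset((x, y)) in stackings)


def group_tetrads(tetrads, stackings, min_stacked=2):
    """Group tetrad indices that are linked by at least min_stacked stackings."""
    stackings = {frozenset(pair) for pair in stackings}
    parent = list(range(len(tetrads)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(tetrads)), 2):
        if count_stacked(tetrads[i], tetrads[j], stackings) >= min_stacked:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(tetrads)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


tetrads = [("a1", "a2", "a3", "a4"), ("b1", "b2", "b3", "b4"), ("c1", "c2", "c3", "c4")]
stackings = [("a1", "b1"), ("a2", "b2"), ("b1", "c1")]
print(group_tetrads(tetrads, stackings))
```

Lowering `min_stacked` makes the grouping more tolerant of missed stacking interactions, which matches the motivation given for low-resolution or molecular dynamics structures.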
The result page has a dedicated, bookmarkable URL that allows users to return up to 7 days after completing the task. It displays all gathered quadruplex-related information and visualizations: (i) metadata concerning the structure (PDB id, molecule type, experimental method), (ii) the sequence of the input molecule and its secondary structure in a two-line dot-bracket with colored G-tracts, (iii) quadruplex description (sequence, number of tetrads, type by number of strands, loop description, tetrad combination, rise, twist, type by strand orientation, ONZM class), (iv) tetrad description (sequence, nucleotides, planarity, angles, base pairs with Leontis-Westhof classification, ONZ class) and (v) visualizations of the secondary and tertiary structures with ONZ-related coloring (classical, arc and layer diagrams, a cartoon model).
Users can download the results in CSV format for tabular data and SVG or PNG formats for 2D and 3D structure visualizations. Figure 2 shows screenshots of the WebTetrado service. Panel 2A shows the screen of the submission form, which allows specifying structure calculations. Submitting a task redirects to a self-refreshing waiting page, allowing users to enable browser notifications. If enabled, the browser will show a message when WebTetrado finishes processing the request.

User interface
The remaining panels are the main parts of the result page. Panel 2B shows a table with general information about a quadruplex. Above the table, two tab selectors make it possible to show a different N4 helix or a different quadruplex. Panel 2C shows the content of the result page. It includes several tables with details about tetrads, loops, angles, tetrad pairs, base pairs and nucleotides.

Analysis of the major G-quadruplex form of HIV-1 LTR
G-quadruplex-forming sequences are widespread in genomes, including viral ones. Human immunodeficiency virus 1 (HIV-1) has a 5'-LTR (long terminal repeat) promoter, which plays an important role in the viral replication cycle and is regulated by G-quadruplexes (16). In particular, the LTR-III fragment forms the most stable G-quadruplex. [...] have been actively investigated due to their features and potential applications in medicine and biotechnology (18). Furthermore, the HIV-1 LTR-III quadruplex includes a V-shaped loop, which occurs when the 5'-endmost tetrad lies in the middle of the G-quartet stack (see Figure 3C). In addition, it has a hybrid pattern of strand orientations and a combination of 1 nt propeller, 3 nt lateral and 12 nt diagonal loops (see Figure 3C).
The VARNA and R-Chie visualizations are semi-interactive: the user may reconfigure them using switches placed above them in the user interface. These switches change the visibility of base pairs outside the tetrads. In particular, for the HIV-1 LTR-III quadruplex, the visualization of the duplex fragment can be disabled to focus only on the quadruplex part. All four visualizations are color-coded according to the ONZ scheme, which makes it easier to understand the tetrad features in different contexts.
WebTetrado automatically finds all the confirmations of the unique quadruplex topology in the 6H1K PDB structure. In addition, it classifies the tetrads and quadruplex according to Webba da Silva (19) and the ONZ scheme (6). According to the latter, the HIV-1 LTR-III structure contains two Z and one O tetrad, making it an Mh (mixed hybrid) class quadruplex. The mixed class encompasses the rarest and most complex quadruplex topologies. WebTetrado also computes several quantitative features of the G4 and shows the structural data: nucleobase conformations and base-pairing information both in the tetrads and in the stem-loop motif.

CONCLUSIONS
WebTetrado is a new web server for analyzing structures containing quadruplexes, four-stranded DNA/RNA motifs of high functional significance that fold into complex shapes. It supports automatic identification and advanced analyses of all types of quadruplexes based only on atomic coordinates. WebTetrado provides a wealth of data computed from the given input file, including classification schemes recognized by the G4 community. In addition, it shows visualizations specially designed to represent quadruplexes. The tool is free and open to anyone interested in the analysis of DNA/RNA structures that include quadruplex motifs.

DATA AVAILABILITY
WebTetrado is implemented as a publicly available web server with an intuitive interface and can be freely accessed at https://webtetrado.cs.put.poznan.pl/.