DNAmoreDB, a database of DNAzymes

ABSTRACT
Deoxyribozymes, DNA enzymes or simply DNAzymes are single-stranded oligo-deoxyribonucleotide molecules that, like proteins and ribozymes, possess the ability to perform catalysis. Although DNAzymes have not yet been found in living organisms, they have been isolated in the laboratory through in vitro selection. The selected DNAzyme sequences have the ability to catalyze a broad range of chemical reactions, utilizing DNA, RNA, peptides or small organic compounds as substrates. DNAmoreDB is a comprehensive database resource for DNAzymes that collects and organizes the following types of information: sequences, conditions of the selection procedure, catalyzed reactions, kinetic parameters, substrates, cofactors, structural information whenever available, and literature references. Currently, DNAmoreDB contains information about DNAzymes that catalyze 20 different reactions. We included a submission form for new data, a REST-based API system that allows users to retrieve the database contents in a machine-readable format, and keyword and BLASTN search features. The database is publicly available at https://www.genesilico.pl/DNAmoreDB/.


INTRODUCTION
DNAzymes, also known as deoxyribozymes, are synthetic single-stranded DNA molecules able to catalyze chemical reactions. Discovered in 1994 (1), this family of catalysts possesses desirable properties such as low cost, ease of synthesis, robustness, and adaptability and applicability to a wide range of processes, alone or in combination with nanomaterials (2,3). DNAzymes known to date have been isolated by a process termed in vitro selection or SELEX (Systematic Evolution of Ligands by Exponential enrichment) (4,5). This laboratory procedure mimics natural evolution at an accelerated pace and consists of rounds of selection that iteratively isolate DNAzymes from a random pool of sequences. The isolated sequences are amplified by the polymerase chain reaction after each round of isolation, so that the population becomes progressively enriched in sequences that exhibit the selected activity. Once the desired level of catalytic activity is reached, the selection experiment is concluded, the active sequences are cloned and, typically, the most representative DNAzymes are characterized through functional assays.
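The enrichment described above can be illustrated with a toy calculation. The sketch below is not a model of any published selection experiment: it assumes that every sequence is either active or inactive, that each class survives a selection round with a fixed probability (hypothetical parameters `active_survival` and `inactive_survival`), and that PCR amplification rescales the pool without changing the proportions.

```python
def enrich(initial_active_fraction, active_survival, inactive_survival, rounds):
    """Return the active fraction of the pool after each selection round.

    Toy model: all parameters are illustrative assumptions, not measured values.
    """
    fractions = []
    f = initial_active_fraction
    for _ in range(rounds):
        kept_active = f * active_survival
        kept_inactive = (1.0 - f) * inactive_survival
        # Amplification is assumed to preserve the active/inactive proportions.
        f = kept_active / (kept_active + kept_inactive)
        fractions.append(f)
    return fractions


def rounds_needed(initial_active_fraction, active_survival, inactive_survival,
                  target_fraction, max_rounds=50):
    """Return the first round at which the active fraction reaches the target,
    or None if it is not reached within max_rounds."""
    fractions = enrich(initial_active_fraction, active_survival,
                       inactive_survival, max_rounds)
    for round_number, f in enumerate(fractions, start=1):
        if f >= target_fraction:
            return round_number
    return None
```

Under these assumptions the odds of drawing an active sequence are multiplied by `active_survival / inactive_survival` in every round, which is why a very rare activity can come to dominate the pool after only a few rounds.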
More than 1500 different DNAzyme sequences have been reported in the literature, accounting for at least 20 different catalytic activities. While deciphering the basis of DNA catalysis at the molecular level may still seem a distant goal, the wealth of biochemical data gathered over more than 25 years would serve the scientific community better if it were presented in an online resource that makes it easy and intuitive to draw comparisons, stimulates new ideas, and fosters data exchange and new collaborations. For instance, DNAzyme sequences are not normally available in public sequence databases such as GenBank, with the exception of those that have had their 3D structure determined (17-21). Moreover, publications typically report one or two active sequences in the main text of the article, whereas additional catalytic sequences are often reported only in the supplementary information and, for this reason, are hard for non-experts to retrieve. We have developed the DNAmoreDB database as a single online resource to store and organize information about DNAzymes at different levels, to facilitate their study, to identify DNAzymes that already exist, and to enable the comparison of newly selected sequences with those in the database.

DATABASE CONTENT
The current (as of 2020.09.21) version of DNAmoreDB contains 1782 sequences drawn from 116 published references (Supplementary data), and this dataset is expected to grow with the planned updates.
The Home page of DNAmoreDB introduces the database and provides quick links to several of its features. For instance, it is possible to query the database by keywords, see which pages can be browsed, and access the sequence search tool. The toolbar displays all the database's pages and a search bar to query the database contents.
The DNAzymes present in the database are classified according to the reaction that they catalyze, namely: RNA cleavage (1,7), DNA cleavage (9,22), RNA ligation (6,23-25), DNA ligation (24,26), DNA site-specific depurination (27), Porphyrin metalation (28,29), DNA phosphorylation (30,31), DNA capping (32), amino acid sidechain modification (9,33,34), thymine dimer repair (35,36), Copper-mediated Azide-Alkyne Cycloaddition (CuAAC) (37), Dephosphorylation (38), Diels-Alder (39), Tyrosine azido-adenylylation (40), Modification of Phosphorylated Amino Acid Side Chains (41,42), Tyrosine Phosphorylation (43,44), Glycosylation (45), Reductive amination (46), Amide hydrolysis (47,48), and Ester hydrolysis (48). This classification allows users to apply filters while browsing the DNAzymes page, which can be accessed from the toolbar and from the quick links on the Home page. DNAmoreDB recapitulates the data on DNAzymes available both in the main text of a publication and in its supplementary information. To avoid flooding the user with data, the 'default' option in the DNAzymes page shows only the DNAzymes reported in the main text of the article, unless otherwise specified by the user. The entries can be sorted according to different criteria, such as the length of the catalytic region or their name. To narrow down the results displayed, filters can be applied on the catalyzed reaction, the cofactor requirements, and whether or not the particular DNAzyme has been kinetically characterized. Moreover, a sequence displayed in bold characters indicates that the single entry page contains information about the reaction yield and/or rate constants.
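The browsing logic described above (a default view restricted to main-text entries, optional filters, and a choice of sort key) can be sketched as follows. This is a minimal illustration, not the database's actual implementation: the `Entry` class, its field names, and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified stand-in for one DNAzyme record."""
    name: str
    reaction: str
    cofactors: Tuple[str, ...]
    catalytic_length: int
    rate_constant: Optional[float]  # None: not kinetically characterized
    in_main_text: bool


def browse(entries, reaction=None, cofactor=None, characterized_only=False,
           include_supplementary=False, sort_by="name"):
    """Filter and sort entries in the way the text describes for the DNAzymes page."""
    selected = []
    for e in entries:
        if not include_supplementary and not e.in_main_text:
            continue  # default view: main-text entries only
        if reaction is not None and e.reaction != reaction:
            continue
        if cofactor is not None and cofactor not in e.cofactors:
            continue
        if characterized_only and e.rate_constant is None:
            continue
        selected.append(e)
    return sorted(selected, key=lambda e: getattr(e, sort_by))


# Invented placeholder records, not taken from DNAmoreDB.
EXAMPLES = [
    Entry("dz-B", "RNA cleavage", ("Mg2+",), 15, 0.1, True),
    Entry("dz-A", "RNA cleavage", ("Zn2+",), 40, None, True),
    Entry("dz-C", "RNA cleavage", ("Mg2+",), 23, 0.5, False),
    Entry("dz-D", "DNA ligation", ("Mg2+",), 40, 0.02, True),
]
```

For example, `browse(EXAMPLES, reaction="RNA cleavage")` keeps only the two main-text RNA-cleaving placeholders, while passing `include_supplementary=True` also brings in the supplementary-only record.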
By clicking on a deoxyribozyme's name, the user is redirected to the single entry page (Figure 1), where the following information is displayed in a tabular format: DNAzyme's name, catalyzed reaction, substrate(s), reaction product(s), functional groups or residues taking part in the reaction, buffer conditions, yield and rate constant (if available), cofactors, the in vitro selected sequence and a Notes section. In addition, the primary reference is displayed, along with related publications, each showing first and last authors, title of the publication, PubMed ID, DOI, and the reaction for which the DNAzyme was selected. Within the single entry pages, the items displayed in blue are linked to other pages of the database or to external pages, while those containing an asterisk (*) point the user to the Notes section of the page. The internal pages of the database that can be accessed from single entry pages are Help, Reactions (Figure 1A), Structures (Figure 1B), and Publications (Figure 1C), while the external pages that can be accessed are those of the RCSB database (49), if structural information is available for a given entry, PubMed, and scientific journals.
To have a broader view on selected DNAzymes, the user may browse the database from the Publications page, where the datasets reported in particular articles can be accessed by clicking on the publication title. In contrast to the DNAzymes' single entry pages, in a publication's page, the available information comprises the publication's abstract, the authors list, the PubMed ID, the DOI, the name of the catalyzed reaction, the pool description, and the list of DNAzymes reported in it (Figure 1C). The Publications page and the single publication pages are also linked to internal and external resources, as are the DNAzymes and DNAzymes' single entry pages.
A feature of DNAmoreDB useful for users interested in chemistry is the Reactions page. Accessible through the toolbar and by clicking on reaction names (whenever they are displayed in blue on other pages), it illustrates the reaction chemistries catalyzed by the DNAzymes contained in DNAmoreDB (Figure 1A). This page is not meant to be an exhaustive chemistry book, but rather to provide representative examples of how the deoxyribozyme's substrates may be activated for the reaction to take place.
Structural knowledge is critical to our understanding of DNA catalysis and crucial for advancing the research on DNAzymes and their possible technological applications. Many of the DNAzymes in DNAmoreDB have been thoroughly characterized from a biochemical and functional point of view; however, to date only the structures of two DNAzymes in functionally relevant conformations have been determined (18,19). The structures of these DNAzymes can be visualized under the Structures tab. For each DNAzyme, the PDB accession codes take the user to a single structure page (Figure 1B), where the DNAzyme 3D model is displayed along with relevant information such as the resolution at which the structure was determined, the method of structure determination, and the publication reporting the structure. Additionally, the PDB file can be downloaded directly from DNAmoreDB, although the user may also follow the link to the original source of structural information (49).
The Help page walks the user through items linked to question mark icons. Whenever an item appears next to a blue question mark, the user can click on it and be redirected to the relevant section of the Help page, where additional information is provided. Moreover, within this page, the user can consult a number of selected review pa-
The database is updated as new papers on the in vitro selection of DNAzymes become available. In particular, we continue to include literature references as they are brought to our attention. DNAmoreDB users are encouraged to use the contact form under the Submit tab to provide us with DNAzymes not yet included, with the aim of making DNAmoreDB as complete as possible.

IMPLEMENTATION
DNAmoreDB has been implemented in the Python programming language v.3.6.9 (https://www.python.org/), coupled with the Django web framework v.3.0.5 (http://www.djangoproject.com/) and the Apache2 HTTP server (https://httpd.apache.org/). The web server uses a PostgreSQL (https://www.postgresql.org/) relational database to store data and leverages several open-source JavaScript libraries, such as DataTables (http://datatables.net), to make the tables under the DNAzymes and Publications pages sortable, interactive and searchable. Moreover, 3Dmol.js (50), a molecular data viewer, is used within the single structure pages to show an interactive 3D rendering of DNAzyme structures as deposited in the PDB. The website is HTTPS-enabled, which means that the data exchange between the user and the DNAmoreDB server is secured by an encrypted connection. In addition, the website is mobile-friendly, adapting itself to the user's screen size and device, making DNAmoreDB easily accessible from tablets and smartphones.
A REST-based API was implemented to make the database contents available to external resources, such as other databases, in a machine-readable format. By simply manipulating the URL of DNAmoreDB it is possible to retrieve information about a single DNAzyme, a group of DNAzymes (e.g. RNA-ligating DNAzymes, or Mg2+-dependent DNAzymes), or all the DNAzymes stored in the database. The API system flattens the existing relations stored in the database, describing the DNAzyme data by using specific identifiers (Supplementary data Table S1). The data can be retrieved formatted as a JSON object (by default) or as CSV (comma-separated values) text. A detailed explanation of how to make use of the API, along with several examples, is available under the API tab of DNAmoreDB.
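As a rough illustration of how a client might consume such an API, the sketch below builds a request URL and parses a JSON or CSV response. Only the base URL comes from the text; the path `api/dnazymes`, the `format` parameter, and the filter name `reaction` are placeholders invented for this example, so the real paths and identifiers should be taken from the API tab and Supplementary Table S1.

```python
import csv
import io
import json
from urllib.parse import urlencode, urljoin

BASE_URL = "https://www.genesilico.pl/DNAmoreDB/"
API_PATH = "api/dnazymes"  # hypothetical path, not confirmed by the source


def build_url(fmt="json", **filters):
    """Compose a query URL; filter names and 'format' are assumed, not documented."""
    params = dict(sorted(filters.items()))
    if fmt != "json":  # JSON is described as the default format
        params["format"] = fmt
    url = urljoin(BASE_URL, API_PATH)
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def parse_payload(text, fmt="json"):
    """Turn a JSON or CSV response body into a list of flat records."""
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")
```

The response body itself could be fetched with `urllib.request.urlopen(build_url(...))` from the standard library. Keeping URL construction and parsing separate from the network call makes the sketch easy to adapt once the actual endpoint names are known.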
The advanced search page offers a possibility to query the database by keyword, sequence, DOI or PubMed ID. In addition to the content directly linked to textual matches (e.g. a DNAzyme name, a publication title, a reaction, etc.), the keyword search returns contents that are contextually related to the query. For example, if the user would like to retrieve information about the DNAzyme 9DB1, the keyword search '9DB1' would not only retrieve the DNAzyme's single entry page, but also the article in which it was first reported, other relevant publications, and structural data. It is also possible to query the database by using the length of the random region used during the in vitro selection process. For example, using 'N40' as a keyword returns a list of DNAzymes that have a 40-nucleotide-long catalytic region.
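One way to picture the different query types accepted by the search page is as a small classifier that decides how a query string should be interpreted. The sketch below is an assumption about how such routing could be done, not the database's actual logic: the regular expressions, the minimum sequence length, and the category names are all illustrative.

```python
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
PMID_RE = re.compile(r"^\d{1,8}$")
POOL_RE = re.compile(r"^N(\d+)$", re.IGNORECASE)   # e.g. 'N40'
SEQ_RE = re.compile(r"^[ACGTUN]+$", re.IGNORECASE)
MIN_SEQUENCE_LENGTH = 10  # arbitrary cut-off chosen for this sketch


def classify_query(query):
    """Guess which kind of search a query string represents."""
    q = query.strip()
    if DOI_RE.match(q):
        return "doi"
    if PMID_RE.match(q):
        return "pubmed_id"
    if POOL_RE.match(q):
        return "random_region_length"
    if len(q) >= MIN_SEQUENCE_LENGTH and SEQ_RE.match(q):
        return "sequence"
    return "keyword"


def random_region_length(query):
    """Extract the number from an 'N<number>' keyword, or return None."""
    match = POOL_RE.match(query.strip())
    return int(match.group(1)) if match else None
```

With these rules, '9DB1' is treated as a plain keyword, whereas 'N40' is recognized as a random-region length of 40 nucleotides.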
The sequence search functionality has been implemented using NCBI Nucleotide-Nucleotide BLAST 2.4.0+ (BLASTN) (51,52), which allows querying the database with sequences in FASTA or RAW formats. The sequence search can be fine-tuned by selecting the E-value threshold below which results are presented (default is 1e-02), as well as the desired strand directionality (default is plus). The displayed results include the hits below the chosen threshold, listed according to the reactions they catalyze, their sequences, and the classical BLASTN output. The raw BLASTN results can be downloaded in text or JSON format.
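A comparable search could be run against a local nucleotide BLAST database. The sketch below assembles a `blastn` command line with the two defaults mentioned above and parses tabular output (`-outfmt 6`). It is an assumption about a local workflow, not the server's code: the file and database names are placeholders, and the example hit line used in testing is invented.

```python
def blastn_command(query_fasta, database, evalue=1e-2, strand="plus"):
    """Build an argument list for the BLAST+ 'blastn' program.

    The defaults mirror those stated in the text; paths are supplied by the caller.
    """
    if strand not in {"plus", "minus", "both"}:
        raise ValueError("strand must be 'plus', 'minus' or 'both'")
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-strand", strand,
        "-outfmt", "6",  # tabular output with the 12 standard columns
    ]


# Standard column order of '-outfmt 6'.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(text, max_evalue=1e-2):
    """Parse tabular BLAST output, keep hits below max_evalue, best hit first."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        record = dict(zip(COLUMNS, line.split("\t")))
        record["evalue"] = float(record["evalue"])
        record["bitscore"] = float(record["bitscore"])
        if record["evalue"] < max_evalue:
            hits.append(record)
    return sorted(hits, key=lambda r: r["evalue"])
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and the captured standard output handed to `parse_tabular`.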

DISCUSSION
DNAmoreDB is dedicated to DNAzymes and it offers a single entry point for data that so far could be obtained only by meticulously analyzing many different sources, often difficult to browse, such as supplementary materials of published papers. Our database provides users with an easy-to-use interface with flexibility to browse from lists of DNAzymes and publications, filter the results according to different criteria, and choose the order in which the results are displayed. Additionally, advanced search options are available through keyword search, and by sequence.
DNAmoreDB is open for feedback from users to ensure that all DNAzymes' published data are added, all errors are corrected, and up-to-date links to external databases are maintained. We encourage users to use the contact form under the Submit tab in the toolbar to report any error or malfunctioning of the database so that it can be fixed as soon as possible.
Future updates of the DNAmoreDB database will include a Contributors page, acknowledging all the groups contributing to the discovery and engineering of DNAzymes, and an Applications page, which will make it possible to browse DNAzymes and DNAzyme-based systems for practical applications. Although the current version of DNAmoreDB includes some reports in which dXTPs have been used, X being any nucleobase that has been modified to harbor a protein-like functionality, we have not included any DNAzymes with modified backbones (XNAs). XNAs and possibly other nucleic acid-based catalytic molecules will be included in future releases of DNAmoreDB. In the next major update, we also plan to include predicted structures and sequence alignments for DNAzymes predicted to exhibit similar structures.

DATA AVAILABILITY
The web interface to the database is available at https://www.genesilico.pl/DNAmoreDB/. This website is free and open to all users, and no login or password is required.