DIANA-microT 2023: including predicted targets of virally encoded miRNAs

Abstract DIANA-microT-CDS is a state-of-the-art miRNA target prediction algorithm that has been serving the scientific community since 2009. It is one of the first algorithms to predict miRNA binding sites in both the 3′ Untranslated Region (3′-UTR) and the coding sequence (CDS) of transcripts, with increased performance. Its current version, DIANA-microT 2023 (www.microrna.gr/microt_webserver/), brings forward a significantly updated set of interactions. DIANA-microT-CDS has been executed utilizing annotation information from Ensembl v102, miRBase 22.1 and, for the first time, MirGeneDB 2.1, yielding more than 83 million interactions in human, mouse, rat, chicken, fly and worm species. Additionally, this version delivers predicted interactions of miRNAs encoded by 20 viruses against host transcripts from human, mouse and chicken species. Numerous resources have been interconnected into DIANA-microT, including DIANA-TarBase, plasmiR, HMDD, UCSC, dbSNP, ClinVar, as well as miRNA/gene abundance values for 369 distinct cell-lines/tissues. The server interface has been redesigned, allowing users to apply smart filtering options, identify abundance patterns of interest, pinpoint known SNPs residing on binding sites and obtain miRNA-disease information. The contents of the DIANA-microT webserver are freely accessible and can also be downloaded locally without any login requirements.


INTRODUCTION
microRNAs (miRNAs) are short non-coding RNAs that post-transcriptionally target miRNA recognition elements (MREs) in the 3′ untranslated region (3′-UTR) (1) and the coding sequence (CDS) (2) of transcripts, primarily inducing transcript degradation and/or stalling of protein synthesis. The many-to-many relationship that miRNAs exhibit with messenger RNAs and other RNA species (e.g. long non-coding RNAs (3), circular RNAs (4)) places them at the center of the post-transcriptional regulatory nexus and makes them important players in fine-tuning potentially all biological processes.
Despite the relative wealth of available experimentally verified interactions (13, 14), miRNA target prediction remains relevant; for a number of species, states and annotation schemes (e.g. the reference miRNA database MirGeneDB (15)) there is limited to no experimental support for miRNA targets. In such scenarios, target prediction may be the only way to guide downstream experimental and computational investigations of miRNAs' function and roles.
The rules that dictate effective (host and viral) miRNA binding are still being extensively studied. Evidence accumulated during the past two decades delineates potent features that can be utilized to detect robust and efficacious MREs (16-20). These include characteristics regarding (a) the site accessibility of the candidate MRE, (b) thermodynamic properties of the MRE and its flanking sequence, (c) the MRE position within the 3′-UTR/CDS, (d) the sequence composition of the MRE, as well as of its flanking regions, (e) the miRNA-MRE duplex stability, described in thermodynamic terms, biochemical stability terms, as well as through the number of matches, mismatches, wobble pairs and bulges that characterize the miRNA seed binding (positions 2-7 from the miRNA's 5′ end) and the entire binding, as well as (f) conservation metrics of the binding sites.
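To make the seed notion in feature (e) concrete, the following is a minimal sketch that locates perfect Watson-Crick matches to the miRNA seed (positions 2-7 from the 5′ end) on a target sequence. It is not the microT-CDS model, which combines many more features; the function names and the toy target sequence are assumptions made for illustration only.

```python
# Illustrative only: perfect 6-mer seed matching, NOT the microT-CDS algorithm.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna, start=2, end=7):
    """Return the seed, i.e. positions start..end (1-based, inclusive) from the 5' end."""
    return mirna[start - 1:end]


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, target):
    """0-based start positions on the target of perfect seed-complementary 6-mers."""
    site = reverse_complement(seed(mirna))
    positions = []
    i = target.find(site)
    while i != -1:
        positions.append(i)
        i = target.find(site, i + 1)  # allow overlapping hits
    return positions


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # let-7a-5p-like sequence, used for illustration
    target = "AAACUACCUCAGGG"          # hypothetical target fragment
    print(seed(mirna), find_seed_sites(mirna, target))
```

Wobble pairs, mismatches and bulges, which the text lists as informative, are deliberately ignored here; a realistic scorer would need an alignment step rather than exact string search.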
DIANA-microT-CDS (21) was one of the first target prediction algorithms to integrate PAR-CLIP (photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation followed by high-throughput sequencing)-derived data during its training and testing, and to be composed of two distinct models trained separately for the 3′-UTR and the CDS sequences. The previous version of the DIANA-microT webserver (22) focused on updating the resources used to perform predictions and providing a number of services that are currently covered by other DIANA-Tools online applications (23, 24).
In this version we deliver an extensive set of microT-CDS-predicted interactions based on updated annotations of miRNAs and genes. DIANA-microT 2023 provides interactions utilizing miRBase (9) and MirGeneDB (15) as sources for miRNA sequences for human, mouse, rat, chicken, fly and worm species. Additionally, a specialized set of v-miRNA interaction predictions against host genes is also provided for viruses that infect human (14 viruses), mouse (2 viruses, 57 miRNAs) and chicken (4 viruses, 100 miRNAs), which have been annotated by miRBase as miRNA-encoding. The positions of known SNPs from the reference resources dbSNP (25) and ClinVar (26) have been overlapped with predicted MREs, to indicate instances where predicted miRNA targeting efficacy could be altered by variants. We integrated miRNA abundance estimates regarding 60 tissues and 210 cell-lines derived from the miRNA Tissue Expression Database, DIANA-miTED (27), and gene expression for 99 human and mouse contexts from the Genotype-Tissue Expression project (GTEx) (28), The Cancer Genome Atlas (TCGA) (29) and the reference mouse publication by Sollner et al. (30), to make possible the incorporation of abundance information in returned interactions. Finally, human miRNAs are annotated regarding their causal associations with diseases from the HMDD resource (31), as well as with their capacity to function as circulating miRNA biomarkers, from plasmiR (32). The DIANA-microT 2023 webserver interface has been upgraded, and useful filtering and query functionalities have been set up to facilitate browsing through the enhanced content.

Supplementary data collection
Conservation. The corresponding phastCons BigWig files were retrieved from UCSC for each species. Mean phastCons scores per MRE were calculated using bigWigAverageOverBed from the UCSC utilities (34, 36, 37).
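As a rough illustration of what a per-MRE mean conservation score involves, the sketch below averages per-base scores over one interval in pure Python. It mimics the two averages reported by bigWigAverageOverBed (`mean0`, where uncovered bases count as zero, and `mean`, over covered bases only); the input dictionary and the function name are assumptions, and which of the two averages was used for the webserver is not stated in the text.

```python
def interval_means(scores, start, end):
    """Summarize per-base scores over the half-open interval [start, end).

    scores: dict mapping 0-based genomic position -> phastCons score
            (positions without data are simply absent).
    """
    size = end - start
    covered = [scores[p] for p in range(start, end) if p in scores]
    total = sum(covered)
    return {
        "size": size,
        "covered": len(covered),
        "sum": total,
        # uncovered bases contribute 0 to mean0
        "mean0": total / size if size else 0.0,
        # mean is taken over covered bases only
        "mean": total / len(covered) if covered else 0.0,
    }


if __name__ == "__main__":
    toy_scores = {10: 1.0, 11: 0.5, 13: 0.5}   # hypothetical values; position 12 has no data
    print(interval_means(toy_scores, 10, 14))
```

In practice the same numbers would come from running the UCSC utility on a BED file of MRE coordinates rather than from hand-built dictionaries.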

Abundance information. Gene expression data for human and mouse were obtained from GTEx (28), TCGA (29) and the Sollner et al. mouse expression atlas (30), resulting in 99 distinct tissue states. Raw read counts were transformed to transcripts-per-million (TPM). The median TPM across replicates was estimated for each state and log2-transformed after adding one. z-scores, denoting the distance in standard deviations of the gene's expression from the mean of expressions within each state, were calculated and annotated to genes. Similarly, summarized miRNA abundance estimates were obtained through DIANA-miTED (27) as Reads-Per-Million (RPM) values. They correspond to 270 cell-line/tissue states in human (miRBase annotation only; non-viral miRNAs). Log2-transformation and z-score scaling were again applied for each state.
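The abundance transformation described above (median across replicates, log2 after adding one, z-score within a state) can be sketched as follows. This is a minimal illustration for a single state; the input layout and the use of the population standard deviation are assumptions, since the text does not specify them.

```python
import math
import statistics


def state_z_scores(tpm_by_gene):
    """z-scores of log2(median TPM + 1) across all genes of ONE state.

    tpm_by_gene: dict mapping gene -> list of TPM values, one per replicate.
    """
    log_expr = {
        gene: math.log2(statistics.median(values) + 1.0)
        for gene, values in tpm_by_gene.items()
    }
    all_values = list(log_expr.values())
    mu = statistics.mean(all_values)
    sd = statistics.pstdev(all_values)  # assumption: population SD
    if sd == 0:
        # all genes equally expressed: no gene deviates from the mean
        return {gene: 0.0 for gene in log_expr}
    return {gene: (x - mu) / sd for gene, x in log_expr.items()}


if __name__ == "__main__":
    toy_state = {"geneA": [0, 0, 0], "geneB": [1, 1, 1], "geneC": [3, 3, 3]}  # hypothetical TPMs
    print(state_z_scores(toy_state))
```

The same function applies unchanged to miRNA RPM values, as the scaling steps are identical for both data types.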
Disease information. Causal associations of miRNAs with diseases were retrieved from HMDD 3.2 (31). Circulating miRNA biomarker information was derived from plasmiR (32). For each miRNA, the number of HMDD or plasmiR entries supporting the association or biomarker capacity, respectively, was tallied and utilized to create miRNA disease clouds, implementing active hyperlinks towards these external resources.
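The tallying step behind the disease clouds amounts to counting supporting entries per miRNA-disease pair. A minimal sketch is given below, assuming the entries are available as (miRNA, disease) tuples; this record layout and the example records are hypothetical and do not reflect the actual HMDD or plasmiR file formats.

```python
from collections import Counter, defaultdict


def disease_cloud_weights(records):
    """Count supporting entries per miRNA and disease.

    records: iterable of (mirna, disease) tuples, one per database entry.
    Returns {mirna: {disease: number_of_entries}}, usable as word-cloud weights.
    """
    clouds = defaultdict(Counter)
    for mirna, disease in records:
        clouds[mirna][disease] += 1
    return {mirna: dict(counts) for mirna, counts in clouds.items()}


if __name__ == "__main__":
    toy_records = [  # hypothetical entries, not taken from HMDD or plasmiR
        ("miR-a", "disease X"),
        ("miR-a", "disease X"),
        ("miR-a", "disease Y"),
        ("miR-b", "disease X"),
    ]
    print(disease_cloud_weights(toy_records))
```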

Implementation
The microT-webserver utilizes the Model-View-Controller (MVC) software architecture as its basis through use of the Laravel 8 PHP framework, and a RESTful interface to communicate with the Angular-based frontend. The webserver is hosted on an Apache 2.4 HTTP server while data are stored in a relational database managed by a PostgreSQL 11.8 server. The PHP framework Laravel 8 (https://laravel.com/) (PHP 7.2) handles the back-end logic including the connection to the PostgreSQL server for the storing and retrieval of the data. The front-end is designed as a one-page website using Angular 14 (https://angular.io/), employing the Angular Material UI library (https://material.angular.io/) and the ngx-bootstrap (https://valor-software.com/ngx-bootstrap) framework for its visual and functional components. Finally, database statistics are presented using the Chart.js (https://www.chartjs.org/) library, while AnyChart (https://www.anychart.com/) is utilized for the word-cloud visualizations provided to portray the frequency of diseases associated with specific miRNAs and provide active hyperlinks towards HMDD and plasmiR servers.

Interface and functionality
The previous DIANA-microT version featured a minimal interface to assess miRNA target predictions, consisting of a miRNA/gene input service and a filter to manually set the prediction score threshold. This philosophy is inherited in the interface of the DIANA-microT 2023 webserver, enabling fast and easy retrieval of predictions of interest (Figure 1A). The basic menu requires a target species to be set, a miRNA annotation source to be selected and any number of miRNAs to be provided. Yet, if required, users may unfold a supplemental menu that allows more sophisticated queries to be performed (Figure 1B). Apart from the minimum prediction score, the capacity to derive interactions only supported by highly confident miRNA annotations, output only MREs on the CDS or 3′-UTR, or output only interactions that are also supported by TarBase and/or TargetScan is offered. Importantly, two dedicated controls enable selection among the available tissues and cell-lines, annotating the results with abundance information on them.

Figure 1. [...] The microT score threshold can be manually set, while (5) the option to limit output to miRNA entries annotated as 'Highly confident' by miRBase (MirGeneDB miRNAs are all annotated as 'Highly confident') can be employed. Users may also (6) retain only interactions and MREs predicted on the 3′-UTR or the CDS of transcripts, (7) require that output has supplemental experimental (DIANA-TarBase) or predicted (TargetScan) support, and (8) select among available tissues and cell-lines to derive additional information regarding the abundance of miRNAs and/or genes specifically there.
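The query filters described above can be pictured as successive predicates applied to a list of predicted interactions. The sketch below assumes a hypothetical in-memory record type; the field names, region labels and the default threshold are illustrative and do not reflect the webserver's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical record layout, for illustration only.
    mirna: str
    gene: str
    score: float             # prediction score
    regions: frozenset       # regions with predicted MREs, e.g. {"UTR3", "CDS"}
    high_confidence: bool    # miRNA annotated as 'Highly confident'
    in_tarbase: bool         # experimental support
    in_targetscan: bool      # support from another predictor


def filter_interactions(interactions, min_score=0.7, region=None,
                        high_confidence_only=False,
                        require_tarbase=False, require_targetscan=False):
    """Keep interactions passing every requested filter (default threshold is arbitrary)."""
    kept = []
    for it in interactions:
        if it.score < min_score:
            continue
        if region is not None and region not in it.regions:
            continue
        if high_confidence_only and not it.high_confidence:
            continue
        if require_tarbase and not it.in_tarbase:
            continue
        if require_targetscan and not it.in_targetscan:
            continue
        kept.append(it)
    return kept


if __name__ == "__main__":
    toy = [
        Interaction("miR-a", "GENE1", 0.95, frozenset({"UTR3"}), True, True, False),
        Interaction("miR-a", "GENE2", 0.80, frozenset({"CDS"}), False, False, True),
        Interaction("miR-b", "GENE3", 0.40, frozenset({"UTR3", "CDS"}), True, True, True),
    ]
    for it in filter_interactions(toy, min_score=0.7, region="CDS"):
        print(it.mirna, it.gene, it.score)
```

On a real server such filters would more likely be expressed as database query conditions; the list-based form is used here only to keep the example self-contained.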
The generated interactions table has been refurbished to provide all associated information in an intuitive hierarchical schema (Figure 2). The primary layer provides interaction-level details; each miRNA-gene pair is accompanied by its interaction score, notation to indicate further predicted/experimental support, a link pointing to a dedicated UCSC track with all interaction-specific MRE positions and z-scores of abundance metrics in case specific expression contexts have been selected for genes and/or miRNAs. Abundance metrics can be used to provide context-specific experimental support, e.g. hsa-miR-143-3p in the example is highly expressed relative to other miRNAs (6.3 standard deviations higher than the miRNAs' mean) and PSG4 appears to be moderately repressed (1.25 standard deviations lower than the genes' mean). Gene identifiers and miRNA names can be clicked to reveal gene- and miRNA-level details, including causal association and biomarker disease-clouds for each miRNA. Supplemental information on the MRE level is provided by expanding an entry of interest. The region (3′-UTR or CDS), MRE coordinates on the respective transcript/genome, primary binding type and MRE score are the basic MRE details. Additionally, the average phastCons conservation of the predicted MRE is provided and pop-up buttons enable retrieving a schematic of the MRE binding area and information about SNPs overlapping the MRE sequence.
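Flagging SNPs that overlap an MRE reduces to an interval query on genomic coordinates. A minimal sketch follows, assuming single-nucleotide variants on one chromosome given as a sorted list of 1-based positions and an MRE given by inclusive start and end coordinates; the function name and the coordinates are hypothetical, and the actual overlap procedure used for the webserver is not described in the text.

```python
from bisect import bisect_left, bisect_right


def snps_in_mre(snp_positions, mre_start, mre_end):
    """Return SNP positions falling inside [mre_start, mre_end] (1-based, inclusive).

    snp_positions must be sorted in ascending order and refer to the same
    chromosome as the MRE.
    """
    lo = bisect_left(snp_positions, mre_start)
    hi = bisect_right(snp_positions, mre_end)
    return snp_positions[lo:hi]


if __name__ == "__main__":
    snps = [100, 150, 155, 300]          # hypothetical SNP coordinates
    print(snps_in_mre(snps, 150, 156))   # SNPs inside a hypothetical MRE
```

Binary search keeps each lookup logarithmic in the number of variants, which matters when millions of MREs are checked against large variant catalogs.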

CONCLUSION
Figure 2. DIANA-microT 2023 output format. The provided output is organized into a paginated-expandable list of results. The first layer of information includes (1) the interacting miRNA and gene, which can be selected to reveal/hide supplemental details. For genes, these include the Gene Description, the representative transcript ID, the Ensembl version and hyperlinks towards TarBase and Ensembl. For miRNAs, the miRNA sequence, links towards miRBase/MirGeneDB, plasmiR, DIANA-miTED, and informative word-clouds based on causal disease associations (HMDD) and miRNA biomarkers (plasmiR) are offered. (2) Information regarding interaction score and supplemental support from other sources, (3) a hyperlink towards the UCSC Genome Browser and, if available, (4) abundance metrics in a specific context are available. Expanding the view, MRE-level details, including (5) the transcript region, site coordinates, binding type, (6) average conservation of the MRE, (7) its overlap with known SNPs, as well as (8) a text-based depiction of each binding area can be viewed. (9) Interaction- or MRE-level results may be retrieved locally in tab-delimited format, while the entire set of interactions per species is available for download in a separate dedicated tab.

As miRNA research progresses, novel miRNA annotation efforts are made available and gradually become accepted by the community. Similarly, even though viral miRNAs have been discovered and annotated in the past, their targetomes remain elusive and have been mostly studied experimentally for a limited set of prominent viruses, such as EBV and KSHV, and v-miRNAs. These valuable miRNA sets can have limited utility if in silico and wet-lab approaches fall short of integrating them.
miRNA target prediction constitutes a missing link towards the effective assessment of their interactomes, their further functional investigation and their proper incorporation into downstream experimental studies. Importantly, miRBase and MirGeneDB can exhibit sequence differences (both on the 5′- and the 3′-end) even in mature miRNAs that they both provide, with immediate effects on the corresponding targeting repertoires. The DIANA-microT 2023 webserver bridges this gap by delivering a service of miRNA interactions for both miRBase and MirGeneDB, as well as predictions of v-miRNA interactions with host transcripts. This major upgrade is enveloped into a newly designed interface offering new functionalities, interconnections with other tools and unrestricted retrieval capacity.
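Why an end difference between two annotations of the same mature miRNA matters can be shown with a small example: a single-nucleotide shift at the 5′ end changes the seed (positions 2-7), whereas trimming the 3′ end leaves the seed intact and can only affect pairing outside it. The variant sequences below are hypothetical and are not taken from miRBase or MirGeneDB.

```python
def seed(mirna, start=2, end=7):
    """Seed region: positions start..end (1-based, inclusive) from the 5' end."""
    return mirna[start - 1:end]


def compare_annotations(seq_a, seq_b):
    """Report whether two mature sequences of one miRNA share the same seed."""
    return {
        "seed_a": seed(seq_a),
        "seed_b": seed(seq_b),
        "same_seed": seed(seq_a) == seed(seq_b),
    }


if __name__ == "__main__":
    reference = "UGAGGUAGUAGGUUGUAUAGUU"   # illustrative mature sequence
    shifted_5p = reference[1:]             # hypothetical variant lacking the first 5' nucleotide
    trimmed_3p = reference[:-1]            # hypothetical variant lacking the last 3' nucleotide
    print(compare_annotations(reference, shifted_5p))
    print(compare_annotations(reference, trimmed_3p))
```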

DATA AVAILABILITY
The DIANA-microT 2023 server is accessible freely and without login requirements (www.microrna.gr/microt_webserver, www.microrna.gr/webServer). Query results, as well as the entire set of interaction predictions, are also available for local retrieval through the application.