miEAA 2023: updates, new functional microRNA sets and improved enrichment visualizations

Abstract MicroRNAs (miRNAs) are small non-coding RNAs that play a critical role in regulating diverse biological processes. Extracting functional insights from a list of miRNAs is challenging, as each miRNA can potentially interact with hundreds of genes. To address this challenge, we developed miEAA, a flexible and comprehensive miRNA enrichment analysis tool based on direct and indirect miRNA annotation. The latest release of miEAA includes a data warehouse of 19 miRNA repositories, covering 10 different organisms and 139 399 functional categories. We have added information on the cellular context of miRNAs, isomiRs and high-confidence miRNAs to improve the accuracy of the results. We have also improved the representation of aggregated results, including interactive UpSet plots that help users understand the interaction among enriched terms or categories. Finally, we demonstrate the functionality of miEAA in the context of ageing and highlight the importance of carefully considering the miRNA input list. miEAA is free to use and publicly available at https://www.ccb.uni-saarland.de/mieaa/.


INTRODUCTION
MicroRNAs (miRNAs) are a class of small non-coding RNAs of about 21-23 nucleotides that play a crucial role in post-transcriptional regulation of gene expression, mainly by binding to 3'UTR regions in target mRNAs to reduce or prevent translation (1). As one of the most studied classes of non-coding RNAs, miRNAs have been described to be involved in virtually all biological processes (1), including pathologies such as cancer (2) or Parkinson's disease (3), and their expression correlates with aging (4). Their deletion in experimental models frequently leads to severe developmental consequences. This is not surprising, considering that as many as 60% of all coding genes are potentially regulated by one or several miRNAs (5). Although miRNAs typically target more than one gene in the same pathway to regulate its activation (6), each miRNA gene can interact with several hundred transcripts spread across distinct cell functions (7). This intricate network complicates straightforward evaluation of the processes potentially affected by alterations or expression changes in a particular set of miRNAs. In this context, enrichment analysis has become a key tool for gaining global insights into miRNA research, particularly in the hypothesis-free setups that are characteristic of NGS-driven studies.
W320 Nucleic Acids Research, 2023, Vol. 51, Web Server issue
Enrichment analysis is applied to sets of genes or miRNAs, frequently downstream of differential expression, to interpret the functional aspects of an experiment. This process is necessary to extract relevant information from large lists of genes or miRNAs that would be infeasible to annotate manually.

Some tools are specifically tailored to work with lists of miRNAs. They work either by direct annotation of miRNA genes and their mature transcripts or indirectly, by providing enrichment analysis on the corresponding miRNA targets. One prominent example of the former approach is TAM (8), a webserver that relies on direct annotation of miRNAs from the Human MicroRNA Disease Database (HMDD) (9) and on its own literature curation to include functional and disease-associated terms. In a more recent work, Cui et al. also introduced importance-based weighting of miRNAs to annotate them in an essentiality-informed manner (10). Other widely used tools like DIANA-miRPath (11) and miRWalk (12) indirectly annotate miRNAs by deriving Gene Ontology and pathway terms from their target genes, although this approximation is not without caveats (13). Additionally, many mainstream gene-centric enrichment webservers like GeneCodis (14) or DAVID (15) have included a miRNA module that accepts miRBase identifiers as input.

Finally, a third group of tools combines both strategies, including direct and indirect annotations to achieve consistent miRNA enrichment analysis on a repertoire of terms augmented by gene annotations. In that respect, miEAA (16) stands out by accepting mature and precursor identifiers from 10 different species as input and by compiling a total of 139 399 terms from 19 data sources. Furthermore, its API and command-line implementation make it very flexible and attractive for use in pipelines and other automated high-throughput applications where a graphical interface is not needed.
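To make the set-based (overrepresentation) side of enrichment analysis concrete, the minimal Python sketch below computes a one-sided hypergeometric P-value for a single term. It also computes the observed/expected ratio for that term. The sketch assumes a common formulation in which the background is the set of all annotated miRNAs. The function names and example counts are hypothetical. miEAA inherits its statistics from GeneTrail and may differ in details such as the background definition or the multiple-testing correction.

```python
from math import comb


def ora_pvalue(universe: int, category: int, query: int, hits: int) -> float:
    """One-sided hypergeometric P-value, P(X >= hits).

    universe: number of miRNAs in the background
    category: background miRNAs annotated with the term
    query:    size of the input miRNA list
    hits:     input miRNAs annotated with the term
    """
    upper = min(category, query)  # X can exceed neither the term size nor the list size
    tail = sum(
        comb(category, k) * comb(universe - category, query - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(universe, query)


def observed_expected(universe: int, category: int, query: int, hits: int) -> float:
    """Ratio of observed hits to the number expected under random sampling."""
    expected = query * category / universe
    return hits / expected


# Hypothetical example: 30 input miRNAs, 5 of which fall into a 50-miRNA term,
# against a background of 2000 miRNAs.
print(ora_pvalue(2000, 50, 30, 5), observed_expected(2000, 50, 30, 5))
```

Exact integer arithmetic via `math.comb` avoids underflow in the tail sum for moderate list sizes. A library routine such as a hypergeometric survival function would serve equally well.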
Three developments call for regular updates and improvements to miEAA: rapidly growing knowledge on miRNAs, new databases that store this information, and updates to the repositories in the miEAA data warehouse. In this article we present the most recent version of miEAA, our miRNA Enrichment Analysis and Annotation framework based on GeneTrail (17). The current update focuses on further expanding the miRNA sets available in the tool. Most notably, the new additions include:
- high-confidence miRNAs and their evolutionary origin obtained from MiRGeneDB (18);
- tissue-specific and alternatively processed miRNAs from isomiRdb (19);
- cell type-specific miRNAs determined from the human cellular microRNAome (20).

Moreover, previously available datasets have been updated to reflect their latest additions. Finally, new plots have been designed to help users better understand the processes underlying the provided miRNA list, as well as the interaction among the enriched terms or categories. miEAA is free to use and publicly available at: https://www.ccb.uni-saarland.de/mieaa/.

New datasets integrated
The current update introduces new data from three resources: MiRGeneDB, isomiRdb and the human cellular microRNAome.

From MiRGeneDB (v2.1) we retrieved bona fide, high-confidence mature and precursor microRNA identifiers. We also retrieved the corresponding sets describing the locus or family of origin for the supported species (bta, cel, dme, dre, gga, hsa, mmu, rno).

We used isomiRdb to calculate tissue-specific miRNAs in human, defining as tissue-specific those miRNAs with a tissue specificity index (TSI) > 0.75. We also compiled sets of miRNAs that are confidently modified by TUTases or display alternative processing by DROSHA.

Finally, we calculated cell type-specific mature microRNAs from the human cellular microRNAome in the same way as described for tissue-specific miRNAs. We lifted over expression values to bta, mmu and rno using MiRGeneDB identifiers for orthology conversion. For example, hsa-miR-22-3p was converted to bta-miR-22-3p because both share the same MiRGeneDB identifier.

External datasets already present in miEAA were updated to their new releases using our Snakemake pipeline, as described previously (26). These include the mammal ncRNA-disease repository (MNDR v3) (21), miRTarBase (v9.0) (22), RNAlocate (v2.0) (23), Gene Ontology (24) and KEGG (25).
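The text does not spell out the TSI formula. A minimal sketch, assuming TSI follows the widely used τ-style definition, is shown below: TSI = Σ(1 − xᵢ/max x)/(N − 1) over N tissues. Under this definition, uniform expression scores 0 and expression restricted to one tissue scores 1. The function names and the example profiles are hypothetical. The sketch then applies the threshold stated above (TSI > 0.75).

```python
def tissue_specificity_index(expression: list[float]) -> float:
    """Tau-style specificity: 0 for uniform expression, 1 for a single tissue."""
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # not expressed anywhere: treated here as non-specific (a choice of this sketch)
    return sum(1 - value / peak for value in expression) / (len(expression) - 1)


def tissue_specific(profiles: dict[str, list[float]], threshold: float = 0.75) -> dict[str, float]:
    """Keep miRNAs whose TSI is strictly greater than the threshold."""
    result = {}
    for mirna, values in profiles.items():
        tsi = tissue_specificity_index(values)
        if tsi > threshold:
            result[mirna] = tsi
    return result


# Hypothetical expression profiles across four tissues
profiles = {
    "miR-A": [10.0, 0.0, 0.0, 0.0],  # one tissue only
    "miR-B": [5.0, 5.0, 5.0, 5.0],   # uniform
    "miR-C": [8.0, 4.0, 2.0, 2.0],   # intermediate
}
print(tissue_specific(profiles))
```

In this example only miR-A passes the threshold; miR-C scores about 0.67.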

Results visualization
miEAA already offered a term word cloud and a heatmap of enrichment P-values. Several new graphs now complement them, giving users a better overview of:
- the most significant terms;
- the P-value distribution across categories;
- the overlap between different sets.

Several dropdown menus have also been added to allow subselection of categories of interest.
Bar chart. This graph lists enriched terms sorted by significance, number of hits or observed/expected ratio, and represents the values as bars.
Categories summary and categories P-values. These two plots show, respectively, the count of enriched terms per category and boxplots of their P-values. Besides giving a general overview of all enriched terms, they support quality control. For example, they can rule out that all enriched terms belong to the same category, or check whether P-values are uniformly distributed across categories.
UpSet plot. This chart visualizes the intersections between different enriched sets, which can reveal a core set of miRNAs present in multiple relevant categories. Such a core set can point in two directions. It may hint at a potentially interesting network worth further downstream investigation. Alternatively, it may reveal an overlapping subset of miRNAs that causes these terms to be co-detected as significantly enriched whenever the subset is present in the input list.
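The exclusive intersections that an UpSet plot draws can be computed by grouping each miRNA under the exact combination of enriched terms it belongs to. The sketch below assumes each enriched term is represented by the set of input miRNAs that hit it. The term names and miRNAs are hypothetical, and this is not miEAA's plotting code.

```python
from collections import defaultdict


def upset_intersections(term_hits: dict[str, set[str]]) -> dict[frozenset, set[str]]:
    """Map each exact combination of terms to the miRNAs found in exactly those terms."""
    membership = defaultdict(set)  # miRNA -> set of terms it hits
    for term, mirnas in term_hits.items():
        for mirna in mirnas:
            membership[mirna].add(term)
    groups = defaultdict(set)  # frozenset of terms -> miRNAs with that exact membership
    for mirna, terms in membership.items():
        groups[frozenset(terms)].add(mirna)
    return dict(groups)


# Hypothetical hits for three enriched terms
term_hits = {
    "term A": {"miR-1", "miR-2", "miR-3"},
    "term B": {"miR-2", "miR-3"},
    "term C": {"miR-3", "miR-4"},
}
groups = upset_intersections(term_hits)
core = groups.get(frozenset(term_hits), set())  # miRNAs shared by all terms
print(core)
```

A large group shared by many terms corresponds to the "core set" described above. Such a group may be biologically meaningful, or it may be an overlapping subset driving co-detection.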

Use case dataset
From the original publication of the non-coding RNA atlas in aging (27), we downloaded the supplementary table containing correlation coefficients. For each tissue, the Spearman rank correlation with age was calculated for every miRNA expressed above 1 rpmm (reads per million mapped) in at least 10% of the samples from that tissue.

Overrepresentation analysis (ORA) was performed using positively correlated miRNAs (r > 0.5). miRNA Set Enrichment Analysis was performed on miRNAs sorted in decreasing order by Spearman rank correlation coefficient (Supplementary Table 1).

For the aggregated analysis, the miRNAs considered for overrepresentation in each tissue were intersected. ORA was then performed on miRNAs positively correlated in at least 3, 4 and 5 different tissues.
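The selection criteria above can be sketched as follows: the expression filter, the Spearman correlation with age, the r > 0.5 cut-off, and the "at least k tissues" aggregation. The study took its correlations from the published supplement, so this sketch recomputes them from raw values purely for illustration. It assumes "over 1 rpmm" means strictly greater than 1, and it handles ties with average ranks. All function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var) if var > 0 else float("nan")  # constant input: undefined


def is_expressed(values: list[float], min_rpmm: float = 1.0, min_fraction: float = 0.1) -> bool:
    """Expressed above min_rpmm in at least min_fraction of the samples."""
    return sum(v > min_rpmm for v in values) >= min_fraction * len(values)


def positively_correlated(ages: list[float], expression: dict[str, list[float]], r_min: float = 0.5) -> set[str]:
    """Expressed miRNAs whose correlation with age exceeds r_min (NaN never passes)."""
    return {
        mirna for mirna, values in expression.items()
        if is_expressed(values) and spearman(ages, values) > r_min
    }


def in_at_least(per_tissue: dict[str, set[str]], k: int) -> set[str]:
    """miRNAs positively correlated in at least k tissues."""
    counts: dict[str, int] = {}
    for mirnas in per_tissue.values():
        for m in mirnas:
            counts[m] = counts.get(m, 0) + 1
    return {m for m, c in counts.items() if c >= k}
```

Calling `positively_correlated` once per tissue and passing the resulting dictionary to `in_at_least(..., 3)` reproduces the "at least 3 tissues" input list under these assumptions.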

Updates and new functionality facilitate a broader application spectrum
The latest release, miEAA 2023, focuses on two main objectives: improving the underlying data warehouse, and providing new graphical representations for rapid interpretation of results across categories. Additionally, we have improved the backend: a more functional API now makes it easier to submit hundreds of requests.
The data warehouse is a core strength of miEAA, integrating numerous resources developed by us and others. The previous version included 16 different databases. In the current release, we have added three more repositories and updated the existing ones (Figure 1A).

To make them easier to interpret, we grouped the 19 repositories into three sets according to how many categories they contribute:
- very large repositories, with at least 5000 categories;
- medium-size repositories, with at least 500 categories;
- small repositories, with fewer than 500 categories.

Of note, this size classification reflects neither a database's overall size nor its functionality. It reflects only the number of categories that the resource adds to miEAA. Grouping the databases in this way shows that the updates in miEAA 2023 concern the large and medium-sized databases, while the newly added resources are medium-sized to small (Figure 1A). This matches the current trend towards more specific and functional miRNA research.
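The size classes above are defined purely by category counts. The small sketch below makes the boundary cases explicit: exactly 5000 categories counts as very large, and exactly 500 counts as medium. The repository names and counts are hypothetical placeholders, not the actual miEAA figures.

```python
def size_class(n_categories: int) -> str:
    """Thresholds as stated in the text: >= 5000 very large, >= 500 medium, otherwise small."""
    if n_categories >= 5000:
        return "very large"
    if n_categories >= 500:
        return "medium"
    return "small"


def group_repositories(category_counts: dict[str, int]) -> dict[str, list[str]]:
    """Group repository names by the number of categories they contribute."""
    groups: dict[str, list[str]] = {"very large": [], "medium": [], "small": []}
    for name, count in sorted(category_counts.items()):
        groups[size_class(count)].append(name)
    return groups


# Hypothetical category counts
print(group_repositories({"repo_a": 12000, "repo_b": 500, "repo_c": 499}))
```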
We have observed that users' queries are becoming increasingly complex in number and frequency, a challenge aggravated by the growing number of available terms, particularly in human. To address this, we have added functionality for aggregated analysis, which lets users interpret the results of all categories in a single run. During the testing phase, UpSet plots proved to be among the most useful representations we explored because they allow interactive comparison of an arbitrary number of intersections. With them, a user can identify relevant sets of miRNAs shared across several genes, pathways or diseases. Together, the updated datasets, the newly supported repositories and the enhanced tools for interpreting significant categories and their overlap greatly broaden the application scope of miEAA.
Ageing research as use case to demonstrate the performance of miEAA
To demonstrate the functionality of miEAA, we used miRNA expression patterns in ageing mice as a use case. In short, the source study profiled 771 samples by miRNA-seq and reported several miRNAs that correlated with ageing in different tissues (27).

We first performed overrepresentation analysis on the list of positively correlated miRNAs for each tissue. Assessing each list individually revealed no general trend, apart from the occasional appearance of Parkinson's disease, which is strongly associated with age. Nevertheless, some interesting terms were significantly enriched in specific tissues. For instance, the Gene Ontology term 'Negative regulation of fat cell differentiation' was significantly overrepresented for marrow adipose tissue. Hdac4, a gene involved in bone and muscle development, was overrepresented in miRNAs from bone. In fact, some studies have proposed different Hdac4 inhibitors as potential anti-ageing therapies.

We then performed miRNA Set Enrichment Analysis on the miRNAs of each tissue, ranked by their correlation coefficient, and identified some interesting enrichments. For instance, the disease term inflammation from the mammal ncRNA-disease repository (MNDR) was significantly enriched in the brain (P-value = 6.04e-4, Figure 1B), an organ chronically affected by systemic inflammation (28). Inflammation is a healthy physiological response, but it tends to become chronic with age, a process in which the influence of adipose tissue is well known (29). Surprisingly, however, this term was not significantly enriched in subcutaneous, brown or marrow adipose tissue (Figure 1C-E). Other disorders that typically increase with age, such as diabetes or metabolic disease, were also absent from these results.
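miRNA Set Enrichment Analysis works on the full ranked list instead of a cut-off. As a hedged illustration, the sketch below computes the classic unweighted Kolmogorov-Smirnov-style running-sum enrichment score. It is a generic instance of that technique, not necessarily the exact statistic or significance test used by miEAA/GeneTrail, and no P-value is computed here. The ranked list and term set are hypothetical. The list is assumed to be sorted by decreasing correlation, so a positive score means the term's miRNAs concentrate among the most positively age-correlated ones.

```python
def running_sum_score(ranked: list[str], term_set: set[str]) -> float:
    """Maximum-deviation running-sum score (unweighted, KS-style).

    Walk down the ranked list, stepping up by 1/n_hit for term members and
    down by 1/(n - n_hit) otherwise; return the signed maximum deviation from 0.
    """
    hits = [mirna in term_set for mirna in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("term must contain some, but not all, ranked miRNAs")
    up, down = 1 / n_hit, 1 / (n - n_hit)
    running = best = 0.0
    for is_hit in hits:
        running += up if is_hit else -down
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking (most positively correlated first)
ranked = ["miR-a", "miR-b", "miR-c", "miR-d"]
print(running_sum_score(ranked, {"miR-a", "miR-b"}))
```

In practice, significance would then be assessed, for example by permuting the ranking or by an analytic test. That step is omitted here.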
We next analysed the enrichment of the overlapping miRNAs in an aggregated manner (i.e. miRNAs correlated with aging in at least n tissues) to obtain global insights. For the 30 miRNAs correlated in at least 3 different tissues, 171 entries were significantly overrepresented. These included several disease terms that are more frequent in aged individuals, such as sarcopenia, sensorineural hearing loss, heart disease, cardiovascular disease, Parkinson's disease and several types of cancer. This is not surprising, since the input miRNAs were positively correlated with age in several tissues.

Using the interactive UpSet plot (Figure 1F), we found that one or more miR-29 family members (miR-29a-3p, miR-29c-3p, miR-29c-5p, miR-29a-5p, miR-29b-2-5p) were present in each of the five most significantly enriched diseases. The original study had already described and validated a prominent role of this miRNA family in the ageing process, particularly of miR-29a-5p, which was positively correlated with age in eight different tissues. This work also found a potential EXOmotif in the miR-29 family (30). This is consistent with the detected overrepresentation of miRNAs located in extracellular vesicles according to RNAlocate (P-value = 0.0015).

Restricting the input list to miRNAs associated with age in at least four tissues reduced the enriched entries to 43. Most diseases from the previous analysis disappeared, except Parkinson's disease and hepatocellular carcinoma. Notably, five of the six significantly overrepresented diseases had one or more hits from the miR-29 family (Figure 1G). Requiring 5 different tissues strengthened this effect further: only Parkinson's disease remained from the previously described set.
Given the strength of the miR-29 signal in the input list, we explored whether its inflated presence biased the results. To this end, we collapsed all miRNAs of this family into miR-29a-3p and miR-29c-5p, the most frequently correlated miRNAs from each precursor arm. Annotations for these miRNAs overlap heavily, particularly those derived from seed-based computational predictions, so we expected collapsing to reduce the family's influence to some extent. Collapsing just these few miRNAs reduced the total number of overrepresented terms from 171 to 22. Despite this strict filtering, many previously identified terms, such as hyperalgesia or Parkinson's disease, remained overrepresented. The enrichment in at least some categories therefore goes beyond the influence of the miR-29 family.

In summary, our use case shows how the current version of miEAA provides a sensible overview of the processes potentially connected to given sets of miRNAs. The results also show that small changes in the input list can substantially alter the enrichment analysis, which calls for careful choice of inclusion criteria and parameters.
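The family-collapsing step can be sketched as a simple renaming pass. Each miR-29 member is mapped to the representative of its arm (miR-29a-3p for 3p, miR-29c-5p for 5p), and the list is deduplicated while preserving order. This sketch makes two assumptions. First, names are given without a species prefix, as in the text. Second, family membership is matched with a hypothetical regular expression that deliberately excludes unrelated names such as miR-290a-5p, a different mouse miRNA family.

```python
import re

# Hypothetical pattern: miR-29a/b/c, optional paralog number (e.g. -2), then the arm
FAMILY = re.compile(r"^miR-29[a-c](?:-\d)?-(3p|5p)$")
REPRESENTATIVES = {"3p": "miR-29a-3p", "5p": "miR-29c-5p"}


def collapse_mir29(mirnas: list[str]) -> list[str]:
    """Replace miR-29 family members by one representative per arm, keeping order."""
    seen: set[str] = set()
    collapsed: list[str] = []
    for name in mirnas:
        match = FAMILY.match(name)
        new_name = REPRESENTATIVES[match.group(1)] if match else name
        if new_name not in seen:
            seen.add(new_name)
            collapsed.append(new_name)
    return collapsed


print(collapse_mir29(["miR-29a-3p", "miR-29c-3p", "miR-29c-5p", "miR-29a-5p",
                      "miR-29b-2-5p", "miR-290a-5p", "miR-21a-5p"]))
```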

The enlarged repertoire of sets provides new insights
Beyond the new visualizations for interpreting results, the newly generated and updated datasets also add value. To illustrate this, we reanalysed previous studies that relied on miEAA and explored insights and hypotheses that only the most recent version can provide.

For instance, a study proposing miRNA-based biomarkers to diagnose hypertrophic cardiomyopathy identified several enriched KEGG entries for its list of altered miRNAs, including lipid and amino acid metabolic pathways (31). With the new version, we identified 49 additional significantly enriched terms, including miRNAs that display TUT4/7-uridylated isoforms. Interestingly, defective miRNA uridylation has previously been linked to myotonic dystrophy, which can lead to hypertrophic cardiomyopathy (32).

We also compared the previous and current output for a study describing how the miRNA cargo of extracellular vesicles (EVs) from mesenchymal stromal cells reduced inflammation in a dry eye disease model (33). The authors used miEAA on the 10 most abundant EV miRNAs to dissect the signalling processes that might drive the observed effects. They concluded that these miRNAs targeted several important immune-related pathways, such as NF-κB. Along the same lines, we found one new enriched target gene from the latest miRTarBase release, AKT1. AKT1 is highly expressed in immune cells and participates in several inflammatory pathways (34,35).

Finally, we compared the enrichment results for dysregulated lung miRNAs in a mouse Cryptococcus neoformans infection model (36).
The current version revealed several enriched GO terms associated with defence responses:
- GO:0002357 defense response to tumor cell;
- GO:0051607 defense response to virus;
- GO:0042742 defense response to bacterium;
- GO:0050830 defense response to Gram-positive bacterium.

The previous release found only one of these (GO:0051607 defense response to virus). C. neoformans, a fungal pathogen, does not belong to any of the agents listed above. Still, it is plausible that it elicits a similar cellular reaction. In the original work, the authors analysed the enriched KEGG pathways and eventually concluded that similar responses were activated. The same conclusion can now be drawn directly from the input miRNA list.

DISCUSSION
The version of miEAA presented here is a comprehensive miRNA enrichment tool, owing to three features:
- a broad compilation of direct and indirect terms from a variety of updated or newly added resources;
- new visualizations, such as the UpSet plot for exploring set intersections;
- an API service for automated high-throughput querying using ORA and GSEA.

We showcased some of the new functionality with an example and by comparing the enrichment results of three studies with those of the previous version. These analyses warn against overrepresenting miRNAs from the same family in the input and show that even slight changes in the list matter. The new visualizations are therefore especially useful for spotting overlapping sets of miRNAs across different enriched categories.
Admittedly, the current expansion of the term repertoire was only possible through orthology conversion of human entries and derivation from target gene annotations. These transfer approaches are limited by the existence of orthologous miRNAs, besides other previously described shortcomings, and may introduce false-positive entries. Ideally, most terms would be based on species-specific direct annotations from literature review. For many species, however, this is challenging if not infeasible: it is harder to automate, and far more research is published on human than on other species. For instance, Gallus gallus has 30 times fewer entries than human, and most of its categories come from miRBase, miRTarBase and KEGG, because other resources do not support this species. In this context, robust indirect annotation methods are a valuable supplement to enrichment analysis, and even a complete replacement where no better alternative exists.
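The orthology-based transfer discussed above can be sketched as follows, in the spirit of the hsa-miR-22-3p to bta-miR-22-3p example given earlier. Human term sets are mapped to another species through a shared MiRGeneDB identifier. Human miRNAs without an ortholog are dropped, and so are terms left empty. The identifier strings ("ID-22_3p" and so on) and the term names are hypothetical placeholders, not real MiRGeneDB identifiers. The sketch is not miEAA's pipeline.

```python
def transfer_annotations(
    human_sets: dict[str, set[str]],
    human_to_gene_id: dict[str, str],
    target_by_gene_id: dict[str, str],
) -> dict[str, set[str]]:
    """Transfer human term -> miRNA sets to a target species via shared identifiers."""
    transferred: dict[str, set[str]] = {}
    for term, mirnas in human_sets.items():
        mapped = set()
        for mirna in mirnas:
            gene_id = human_to_gene_id.get(mirna)
            if gene_id is not None and gene_id in target_by_gene_id:
                mapped.add(target_by_gene_id[gene_id])  # ortholog exists in target species
        if mapped:  # terms without any ortholog are not transferred
            transferred[term] = mapped
    return transferred


# Hypothetical mappings (placeholder identifiers)
human_sets = {"term 1": {"hsa-miR-22-3p", "hsa-miR-21-5p"}, "term 2": {"hsa-miR-9999-5p"}}
human_to_gene_id = {"hsa-miR-22-3p": "ID-22_3p", "hsa-miR-21-5p": "ID-21_5p"}
target_by_gene_id = {"ID-22_3p": "bta-miR-22-3p"}
print(transfer_annotations(human_sets, human_to_gene_id, target_by_gene_id))
```

The dropped entries make the limitation above concrete: terms whose miRNAs lack orthologs shrink or vanish in the target species.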

Rapid changes in miRNA research and growing numbers of user requests call for regular web server releases. New functionality planned for upcoming releases includes:
- cross-dataset comparison of results (e.g. for time series data);
- cross-species comparison of results (e.g. comparing results for a cancer in humans and in mouse models);
- inter-species regulation (e.g. small RNAs from bacteria targeting eukaryotic genes (37)).

These examples show that, even after 8 years of development, new analysis functionality is needed to keep pace with the miRNA research field. Current and upcoming features position miEAA as a tool that supports the functional and mechanistic understanding of miRNA roles. miEAA already stands out in this respect as a versatile solution. Users can restrict the categories used for enrichment analysis to direct or indirect annotations, depending on the target species or their specific needs. Its visualizations help spot biologically interesting subsets and distinguish them from potentially misleading annotation artifacts.

DATA AVAILABILITY
miEAA 2.1 is freely available at https://www.ccb.uni-saarland.de/mieaa/. No login is required. All datasets included in the tool are available on the downloads page.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.