The DOMINO web-server for active module identification analysis

Abstract
Motivation: Active module identification (AMI) is an essential step in many omics analyses. Such algorithms receive a gene network and a gene activity profile as input and report subnetworks that show significant over-representation of accrued activity signal ('active modules'). Such modules can point to key molecular processes in the analyzed biological conditions.
Results: We recently introduced a novel AMI algorithm called DOMINO and demonstrated that it detects active modules that capture biological signals with a markedly improved rate of empirical validation. Here, we provide an online server that executes DOMINO, making it more accessible and user-friendly. To help interpret solutions, the server provides GO enrichment analysis, module visualizations and accessible output formats for customized downstream analysis. It also enables running DOMINO with various gene identifiers of different organisms.
Availability and implementation: The server is available at http://domino.cs.tau.ac.il. Its codebase is available at https://github.com/Shamir-Lab.


Introduction
High-throughput omics data analysis frequently utilizes biological networks. In these networks, each node represents a cellular subunit (e.g. a gene or its protein product) and each edge represents a relationship between two subunits (e.g. a physical interaction between two proteins). Integrated analysis of a biological network and a molecular profile measuring gene activity levels under a certain condition can greatly boost the functional interpretation of the data (Mitra et al., 2013). Activity levels can be derived by measuring differential expression between two conditions or samples (Chuang et al., 2007; Ideker et al., 2002), from a set of disease-associated genes inferred from a GWAS (Chang et al., 2015; Fernández-Tajes et al., 2019; Nakka et al., 2016), or by estimating the mutation load of genes in cancer patients (Cerami et al., 2010). Active module identification (AMI) methods seek 'active modules', i.e. connected subnetworks that show a marked over-representation of high activity levels (Ideker et al., 2002; Leiserson et al., 2015). Such modules can reveal biological processes involved in the probed condition. A popular way to infer these processes is to perform Gene Ontology (GO) enrichment analysis on each module (The Gene Ontology Consortium, 2019).
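To make the definition of an active module concrete, the minimal sketch below checks its two ingredients for a candidate gene set: that the genes induce a connected subnetwork, and that the module's fraction of active genes exceeds the network-wide fraction. It is an illustration only, not DOMINO's algorithm or scoring function; the function names and the simple fold-enrichment measure are assumptions made for this example.

```python
from collections import deque


def is_connected(module: set[str], edges: set[tuple[str, str]]) -> bool:
    """True if the genes in `module` induce a single connected subnetwork."""
    if not module:
        return False
    # Adjacency restricted to edges whose two endpoints both lie in the module.
    adj = {g: set() for g in module}
    for a, b in edges:
        if a in module and b in module:
            adj[a].add(b)
            adj[b].add(a)
    # Breadth-first search from an arbitrary module gene.
    start = next(iter(module))
    seen = {start}
    queue = deque([start])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == module


def activity_fold_enrichment(module: set[str], active: set[str],
                             network_genes: set[str]) -> float:
    """Active fraction in the module divided by the active fraction in the network."""
    if not module or not network_genes:
        raise ValueError("module and network must be non-empty")
    background_rate = len(active & network_genes) / len(network_genes)
    if background_rate == 0:
        raise ValueError("no active genes are present in the network")
    module_rate = len(active & module) / len(module)
    return module_rate / background_rate
```

For example, with edges A-B, B-C and D-E and active genes {A, B}, the set {A, B, C} is connected and has a fold enrichment of (2/3)/(2/5), while {A, D} is not connected and therefore cannot be an active module.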
Recently, we evaluated six popular AMI algorithms and analyzed the GO terms enriched in their modules. Most algorithms produced a high rate of non-specific GO enrichment calls, calling into question their capacity to illuminate processes that are specifically relevant to the probed conditions (Levi et al., 2021). In that work, we also introduced DOMINO, a novel AMI algorithm with a markedly higher rate of empirically validated calls (Levi et al., 2021). Notably, a recent independent benchmark study reached similar conclusions, reporting that DOMINO's modules carried substantially stronger biological signal than modules found by other AMI algorithms (Lazareva et al., 2021).
The original DOMINO tool must be downloaded and installed on the user's machine. To make DOMINO more accessible to researchers, we provide an online service that requires no installation. The server also performs GO enrichment analysis on each module, visualizes modules, provides standard output formats for downstream analysis, and supports running DOMINO on data from other organisms.

Input files: active gene sets and network file
The input for DOMINO is a set of active genes and a network. Note that DOMINO uses only binary gene scores (active or not active under the probed condition), not real-valued scores. For the network, the user can upload a custom network file or choose a pre-loaded network. The available networks are DIP (Xenarios et al., 2002), HuRI (Luck et al., 2020), STRING (Szklarczyk et al., 2017) restricted to edges with confidence score > 900, and PCNet (Huang et al., 2018). The pre-loaded networks use a cache mechanism (detailed in Levi et al., 2021) for faster runtime. Runs on the DIP, STRING and HuRI networks (<250K edges and <20K nodes) typically take up to 2 minutes, and runs on PCNet (2.5M edges and 20K nodes) take about 10 minutes. For custom networks (<10 MB; <300K edges), the runtime can be up to 5 minutes. In addition, the user can provide several active gene sets (e.g. for different diseases) to analyze and compare the results in a single execution.
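Because DOMINO takes binary scores, real-valued results such as differential expression statistics must first be thresholded into an active gene set. The sketch below shows one way to do this and to check how many active genes are actually present in a network before uploading. The 0.05 cutoff on adjusted p-values and the two-column, tab-separated edge-list layout are assumptions for illustration; the server's own documentation specifies the exact file formats it accepts.

```python
import csv
import io


def binarize_scores(adj_pvalues: dict[str, float], cutoff: float = 0.05) -> set[str]:
    """Mark a gene as active if its adjusted p-value is at or below `cutoff` (assumed rule)."""
    return {gene for gene, q in adj_pvalues.items() if q <= cutoff}


def read_edge_list(text: str) -> set[tuple[str, str]]:
    """Parse an assumed two-column, tab-separated edge list into undirected edges."""
    edges = set()
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed or comment lines
        a, b = row[0].strip(), row[1].strip()
        if a and b and a != b:  # drop empty fields and self-loops
            edges.add(tuple(sorted((a, b))))  # store each undirected edge once
    return edges


def network_coverage(active: set[str],
                     edges: set[tuple[str, str]]) -> tuple[set[str], set[str]]:
    """Split active genes into those present in the network and those missing from it."""
    nodes = {gene for edge in edges for gene in edge}
    return active & nodes, active - nodes
```

Active genes that are missing from the network cannot contribute to any module, so a low coverage is a hint to check the identifier type or to try another network.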

Resulting modules
After the user provides the input files and clicks the 'execute' button, a request is sent to the server to run DOMINO and perform additional analyses. The resulting modules are ranked by DOMINO's internal scores and visualized using Cytoscape.js (Franz et al., 2016). Nodes are labeled with gene symbols, and right-clicking a gene's node opens its GeneCards page (Stelzer et al., 2016). The list of genes in each module is shown alongside its visualization. The user can navigate between different modules and solutions.
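For readers who want a similar view locally, the sketch below converts a module's genes and the network edges into the element list that Cytoscape.js accepts (node objects with `data.id`, edge objects with `data.source` and `data.target`). It is an assumed, simplified reconstruction, not the server's rendering code; only edges with both endpoints in the module (the induced subnetwork) are kept, and the edge-id scheme is arbitrary.

```python
import json


def module_to_cytoscape_elements(module: set[str],
                                 edges: set[tuple[str, str]]) -> list[dict]:
    """Build a Cytoscape.js `elements` list for the subnetwork induced by `module`."""
    # One node element per module gene, labeled by its identifier.
    elements = [{"data": {"id": gene}} for gene in sorted(module)]
    for a, b in sorted(edges):
        if a in module and b in module:
            # Edge ids only need to be unique; "a__b" is a hypothetical convention.
            elements.append({"data": {"id": f"{a}__{b}", "source": a, "target": b}})
    return elements


if __name__ == "__main__":
    els = module_to_cytoscape_elements({"A", "B", "C"}, {("A", "B"), ("B", "D")})
    print(json.dumps(els, indent=2))
```

The resulting JSON can be passed as the `elements` option of `cytoscape({ container: ..., elements: ... })` in a web page.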

GO enrichment analysis
GO enrichment analysis is performed on each module using the GOATOOLS library (Klopfenstein et al., 2018), and the results are FDR-corrected for multiple testing. The genes of the input network serve as the background for this analysis. A list of GO terms and their enrichment scores is reported in a table alongside the visualized module.
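The statistics behind such an analysis can be sketched as a one-sided hypergeometric test per GO term, with the input network's genes as the background, followed by Benjamini-Hochberg FDR correction. The sketch below uses only the Python standard library and is an illustration of that standard approach, not GOATOOLS' implementation, which also handles GO file parsing and further options. The GO terms and annotations in the test are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(module: set[str], background: set[str],
                  go_annotations: dict[str, set[str]]) -> list[tuple[str, float, float]]:
    """Return (term, p-value, FDR) for terms hit by the module, sorted by p-value."""
    module = module & background
    N, n = len(background), len(module)
    terms, pvals = [], []
    for term, genes in go_annotations.items():
        term_genes = set(genes) & background  # restrict annotations to the background
        k = len(module & term_genes)
        if k == 0:
            continue  # a term with no module genes cannot be over-represented
        terms.append(term)
        pvals.append(hypergeom_sf(k, N, len(term_genes), n))
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(terms, pvals, qvals), key=lambda row: row[1])
```

With a 10-gene background, a 3-gene module and a hypothetical term annotated to exactly those 3 genes, the p-value is 1/C(10,3) = 1/120.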

Downloading results
To enable further use of the solution, the user can download the results. Each module can be downloaded in two forms: (i) an HTML file with the visualization and other results as they are shown on the DOMINO website, and (ii) text files containing the list of genes in the module and a table summarizing the results of the GO enrichment analysis. These files enable additional customized downstream analyses of modules and enriched GO terms.
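As an example of such downstream use, the sketch below parses several downloaded module gene lists and builds a gene-to-module index, which makes it easy to spot genes shared between modules. It assumes, for illustration, that each gene list is a plain-text file with one identifier per line; the module names used in the test are hypothetical.

```python
from collections import defaultdict


def parse_gene_list(text: str) -> list[str]:
    """Return identifiers from an assumed one-per-line text file, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def gene_to_modules(modules: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert {module name: genes} into {gene: module names containing it}."""
    index = defaultdict(list)
    for name, genes in modules.items():
        for gene in dict.fromkeys(genes):  # de-duplicate while keeping order
            index[gene].append(name)
    return dict(index)


def shared_genes(index: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only genes that appear in more than one module."""
    return {gene: names for gene, names in index.items() if len(names) > 1}
```

With files saved in a local folder, the input dictionary could be built as `{p.stem: parse_gene_list(p.read_text()) for p in Path(folder).glob("*.txt")}` using `pathlib.Path`, where `folder` and the `.txt` naming are assumptions about how the downloads were stored.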

Analyzing other organisms and gene identifiers
By default, DOMINO expects human ENSEMBL gene IDs or gene symbols. The website provides two options for running DOMINO with non-human gene identifiers: (i) if the active gene list contains mouse ENSEMBL identifiers and one of the pre-loaded networks is chosen, the genes in the active gene set are converted to their corresponding human orthologs, and GO enrichment analysis is applied to the resulting modules. (ii) If the user supplies a custom network, DOMINO matches the gene identifiers in the active gene list to those in the network. In this case, the identifiers need not be ENSEMBL IDs and may come from any species, but GO enrichment analysis is not executed.
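The conversion in option (i) amounts to a lookup in a mouse-to-human ortholog table. The sketch below shows the idea, assuming such a table is available as a dictionary (for example, built from an ortholog export from Ensembl BioMart). Keeping all human orthologs of a one-to-many mapping and reporting unmapped genes separately are assumptions of this sketch, not a description of the server's exact rules, and the identifiers in the test are placeholders rather than real Ensembl IDs.

```python
from typing import Iterable


def looks_like_mouse_ensembl(gene_id: str) -> bool:
    """Mouse Ensembl gene IDs start with the prefix 'ENSMUSG'."""
    return gene_id.startswith("ENSMUSG")


def map_to_human(mouse_ids: Iterable[str],
                 orthologs: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Map mouse gene IDs to human orthologs; return (human IDs, unmapped mouse IDs)."""
    human, unmapped = set(), set()
    for mouse_id in mouse_ids:
        hits = orthologs.get(mouse_id, [])
        if hits:
            human.update(hits)  # keep every human ortholog of a one-to-many mapping
        else:
            unmapped.add(mouse_id)
    return human, unmapped
```

Reporting the unmapped genes makes it visible how much of the original active set is lost in the conversion.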

API calls for automated pipelines
To allow scripts to call the server automatically, we expose a web-API for executing DOMINO. Details are provided on the landing page of the website.
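The endpoints and parameters of the web-API are documented on the landing page and are not reproduced here. Independently of those details, an automated pipeline typically submits a job and then polls until results are ready; the helper below sketches that polling loop with a timeout. The `check` callable is a hypothetical stand-in for whatever status request the documented API requires, and the default 15-minute timeout is an assumption chosen to exceed the roughly 10-minute runtime reported for PCNet.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check: Callable[[], Optional[dict]],
    interval_s: float = 10.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call `check` until it returns a result dict, sleeping between attempts.

    `check` (hypothetical) should return None while the job is still running.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the helper independent of any particular HTTP client and lets it be exercised with fake responses.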

Supported browsers
We extensively tested the website on Firefox (version 94.0.1) and Chrome (version 96.0.4664.110). It can also be run on other browsers.

Results
As a showcase, we uploaded an input set of 155 genes related to autism spectrum disorder (ASD) [taken from The SPARK Consortium (2018)] and ran the tool with the pre-loaded STRING network. DOMINO detected eight modules in this run. Figure 1 shows the two largest modules. Reassuringly, they correspond to two distinct fundamental biological processes that are known to be severely disrupted in the brains of ASD patients (Satterstrom et al., 2020): (i) chromatin remodeling and regulation of transcription (Cunniff et al., 2020; LaSalle, 2013) (Fig. 1A and C) and (ii) neuronal trans-synaptic signaling (Guang et al., 2018) (Fig. 1B and D).
We repeated the analysis with three other networks: DIP, HuRI and PCNet. The results are available in the project's GitHub repository. Several modules obtained with different networks showed high overlap in genes and GO terms. A recent systematic evaluation of different networks reported that interactions appearing in more than one network are informative for module identification, and recommended using PCNet (Huang et al., 2018). Since the optimal network also depends on the input set of active genes, users may wish to run DOMINO with several biological networks.
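One simple way to quantify the gene overlap between modules obtained with different networks is the Jaccard index of their gene sets, matching each module from one run to its most similar module from another. The sketch below assumes each run's modules are available as a list of gene sets (e.g. parsed from the downloaded gene lists); it is an illustrative measure and not necessarily the one used for the comparison in the repository.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(run_a: list[set[str]],
                 run_b: list[set[str]]) -> list[tuple[int, int, float]]:
    """For each module in run_a: (its index, index of best match in run_b, Jaccard)."""
    if not run_b:
        return []
    matches = []
    for i, module in enumerate(run_a):
        # max() keeps the first module in run_b when several tie for the best score.
        j, score = max(((j, jaccard(module, other)) for j, other in enumerate(run_b)),
                       key=lambda pair: pair[1])
        matches.append((i, j, score))
    return matches
```

A best-match score close to 1 indicates that essentially the same module was recovered with both networks; the same comparison can be applied to sets of enriched GO terms instead of genes.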