Online bias-aware disease module mining with ROBUST-Web

Abstract

Summary: We present ROBUST-Web, which implements our recently presented ROBUST disease module mining algorithm in a user-friendly web application. ROBUST-Web features seamless downstream disease module exploration via integrated gene set enrichment analysis, tissue expression annotation, and visualization of drug–protein and disease–gene links. Moreover, ROBUST-Web includes bias-aware edge costs for the underlying Steiner tree model as a new algorithmic feature, which allows correcting for study bias in protein–protein interaction networks and further improves the robustness of the computed modules.

Availability and implementation: Web application: https://robust-web.net. Source code of the web application and of the Python package with the new bias-aware edge costs: https://github.com/bionetslab/robust-web, https://github.com/bionetslab/robust_bias_aware.

The resulting module is shown in Supplementary Figure 1. In addition to the eleven seeds, it contains 60 newly discovered proteins. Running DIGEST (Adamowicz et al., 2022) on the newly discovered targets to assess the functional coherence of the computed module, we obtained highly significant empirical P-values, indicating that the discovered targets might indeed be involved in a joint mechanism (Supplementary Figure 2).
Subsequently, we ran the TrustRank algorithm available via ROBUST-Web's "Drug Search" function to uncover potential drug repurposing candidates targeting the newly discovered proteins. Among the top 20 returned drugs, six drugs target the tissue-type plasminogen activator (PLAT): ximelagatran, melagatran, dabigatran, dabigatran etexilate, argatroban, and aminocaproic acid. Except for aminocaproic acid, all of these drugs also target prothrombin (F2), which is one of the input seeds.
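To illustrate how a TrustRank-style ranking propagates relevance from proteins to drugs, the following minimal sketch runs a personalized-PageRank-like power iteration on a toy graph. It is an assumption-laden illustration and not ROBUST-Web's actual implementation: the function name `trustrank`, the damping factor, the iteration count, and the choice of PLAT and F2 as trusted seed nodes are hypothetical. The only edges used are the drug–target links stated above for dabigatran and aminocaproic acid.

```python
def trustrank(adjacency, seeds, damping=0.85, iterations=100):
    """Propagate trust from seed nodes over an undirected graph.

    adjacency: dict mapping each node to a list of its neighbors.
    seeds: set of trusted nodes that receive the restart probability.
    Assumes every node has at least one neighbor (no dangling nodes).
    """
    nodes = list(adjacency)
    seed_weight = 1.0 / len(seeds)
    restart = {n: (seed_weight if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        # Each node keeps a share of the restart mass ...
        new_scores = {n: (1.0 - damping) * restart[n] for n in nodes}
        # ... and receives an equal share of each neighbor's current score.
        for node in nodes:
            share = damping * scores[node] / len(adjacency[node])
            for neighbor in adjacency[node]:
                new_scores[neighbor] += share
        scores = new_scores
    return scores


# Toy graph (illustrative): two proteins and two of the drugs named above.
toy_graph = {
    "PLAT": ["dabigatran", "aminocaproic acid"],
    "F2": ["dabigatran"],
    "dabigatran": ["PLAT", "F2"],
    "aminocaproic acid": ["PLAT"],
}
scores = trustrank(toy_graph, seeds={"PLAT", "F2"})
drug_ranking = sorted(
    ["dabigatran", "aminocaproic acid"], key=scores.get, reverse=True
)
```

In this toy setting, a drug connected to both trusted proteins accumulates more score than a drug connected to only one of them, which is the intuition behind ranking repurposing candidates by network proximity to the module.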
PLAT is associated with the breakdown of blood clots. Zuo et al. (2021) have reported strong correlations between elevated PLAT levels and COVID-19-related hospitalizations, worse respiratory status, mortality and ex vivo clot lysis, and spontaneous fibrinolysis. The protein prothrombin encoded by F2 is associated with blood coagulation in humans (Royle et al., 1987; Degen and Davie, 1987). A closer look at the five drugs that target both PLAT and F2 further strengthens the link to thrombosis and coagulation: Dabigatran etexilate is an FDA-approved oral thrombin inhibitor administered for the prevention of stroke in patients with atrial fibrillation (Legrand et al., 2011; Connolly et al., 2009). Ximelagatran is an oral thrombin inhibitor mostly used for the prevention of venous thromboembolism after hip or knee replacement (Evans et al., 2004; Eriksson et al., 2002b,a; Heit et al., 2001). Argatroban is a direct thrombin inhibitor used to treat a wide range of thrombotic disorders (McKeage and Plosker, 2001; Dhillon, 2009; Lewis et al., 2003; Yeh and Jang, 2006).
We performed gene set enrichment analysis (GSEA) via ROBUST-Web's inbuilt g:Profiler (Raudvere et al., 2019) interface on the seven neighboring nodes of PLAT and F2 in the computed module (selected nodes with black border in Supplementary Figure 1). Among the top three most significantly enriched terms, two denote pathways related to coagulation (see Supplementary Figure 3). Together, these results suggest that ROBUST-Web can identify potentially actionable coagulation disease mechanisms shared by severe COVID-19 and comorbid disorders.
Results of GO GSEA for COVID-19 disease modules M_b and M_u generated by running ROBUST with, respectively, bait-usage-based (γ = 1) and uniform edge costs are presented in Supplementary Figure 4, along with GSEA results for their set differences M_b \ M_u and M_u \ M_b. Significantly (adjusted P < 0.05) enriched terms were obtained using the GSEApy interface of the Enrichr API (Kuleshov et al., 2016). While there is a rather large overlap between the significantly enriched terms found for M_b and M_u, significantly enriched terms obtained for genes found exclusively with, respectively, bait-usage-based and uniform edge costs are close to disjoint. For instance, the GO Molecular Function term "endopeptidase inhibitor activity" was found only with bait-usage-based but not with uniform edge costs. Abdel-Aziz et al. (2021) have shown a correlation between high expression of endopeptidases and COVID-19 severity (especially in patients with asthma) and various studies have investigated the use of endopeptidase inhibitors for COVID-19 treatment (Luan et al., 2020; Bojkova et al., 2020; Redondo-Calvo et al., 2022). On the other hand, the top ten GO Molecular Function terms obtained upon performing GSEA on the M_u \ M_b genes include very generic terms such as kinase and phosphatase binding.
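The module comparison described above boils down to simple set operations. The sketch below shows one way to derive the shared genes and the two set differences, and to quantify term overlap with a Jaccard index. The helper names `split_modules` and `jaccard`, the use of the Jaccard index as an overlap measure, and the toy gene and term identifiers are all hypothetical and serve illustration only; they are not taken from the analysis itself.

```python
def split_modules(m_b, m_u):
    """Return the shared genes and the two set differences of two modules."""
    m_b, m_u = set(m_b), set(m_u)
    return {
        "shared": m_b & m_u,   # genes found with both edge cost variants
        "only_b": m_b - m_u,   # M_b \ M_u: bait-usage-based costs only
        "only_u": m_u - m_b,   # M_u \ M_b: uniform costs only
    }


def jaccard(a, b):
    """Jaccard index of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy modules with placeholder gene names (not real results).
toy_m_b = {"g1", "g2", "g3", "g4"}
toy_m_u = {"g3", "g4", "g5"}
parts = split_modules(toy_m_b, toy_m_u)

# Toy enriched-term sets for the two set differences (placeholders).
terms_only_b = {"term_a", "term_b"}
terms_only_u = {"term_c"}
overlap = jaccard(terms_only_b, terms_only_u)
```

A Jaccard index close to 0 for the term sets of the two set differences corresponds to the "close to disjoint" situation described above, whereas a value close to 1 would indicate near-identical enrichment results.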

Case study into precocious puberty
Precocious puberty (PP) is a condition where the onset of puberty occurs prematurely in children (before the age of 8 in girls and 9 in boys, according to Oerter Klein (1999)). The cause of PP is unknown, and treatment is largely symptomatic (Carel and Léger, 2008).
Supplementary Figure 5. Precocious puberty disease module computed by ROBUST-Web together with targeting drugs.
Starting with eight PP-associated proteins obtained from OMIM (Amberger et al., 2019) and DisGeNET (Piñero et al., 2020) (UniProt IDs: P63092, P35354, P01229, P01148, Q5JWF2, P84996, O95467 and P05019), we ran ROBUST-Web using the in-built BioGRID protein-protein interaction (PPI) network (Oughtred et al., 2019), bait-usage-based study bias scores and all hyper-parameters set to their default values. The computed module together with targeting drugs is shown in Supplementary Figure 5. Of the eight input seeds, only three are contained in the computed module, since ROBUST filters seed nodes that are too weakly connected in the PPI network (see Bernett et al. (2022) for details). The module contains six newly discovered proteins, including insulin-degrading enzyme (IDE, UniProt ID: P14735). IDE is responsible for the degradation of insulin and natriuretic peptides (Affholter et al., 1990; Ralat et al., 2011) and multiple studies (Chen et al., 2013; Burstein et al., 1987; Sørensen et al., 2012; Hur et al., 2017) have reported elevated levels of both insulin and natriuretic peptides in PP. This leads us to hypothesize that an impairment of IDE might be causally involved in the development of PP.
While most drugs uncovered by ROBUST-Web target the seed protein prostaglandin G/H synthase 2 (UniProt ID: P35354), we find two drugs targeting IDE: biotin and thonzonium. Biotin (vitamin B7) is obtained naturally from different dietary sources. Biotin deficiency, although rare, is connected to several clinical conditions including insulin pathway impairment (Salvador-Adriano et al., 2014). Thonzonium is known to play a vital role in bone anti-resorption (Zhu et al., 2016) and Hur et al. (2017) have reported a decline in bone mineral density and bone strength and an increase in bone resorption markers in PP. This suggests that both biotin and thonzonium are potential drug repurposing candidates in PP, which encourages further investigation into their effects on insulin homeostasis through IDE control.
Subsequently, we performed enrichment analysis on the six newly discovered targets, using ROBUST-Web's in-built g:Profiler interface (Raudvere et al., 2019). The analysis returned endoplasmic reticulum (ER) organization as the only significantly enriched term (see Supplementary Figure 6). In mice, Linz et al. (2015) have linked ER stress during puberty to impaired bone development, which further corroborates the functional relevance of the computed disease module.
For the PP use case, the modules M_b and M_u (obtained with bait-usage-based and uniform edge costs, respectively) were computed as for the COVID-19 use case, and significantly enriched terms were again obtained using GSEApy. Again, the significantly enriched terms for M_u and M_b overlap substantially, whereas the significantly enriched terms for M_b \ M_u and M_u \ M_b are close to disjoint. An interesting example of a term that was found only with bait-usage-based edge costs is the GO Cellular Component term "mitochondrial intermembrane space bridging (MIB) complex". MIB-1, which is part of the MIB complex, is one of the main markers of cell proliferation (Spyratos et al., 2002; Querzoli et al., 1996; Tortori-Donati et al., 1999; Ramsay et al., 1995; Scalzo et al., 1998; Diebold et al., 2017), which is a critical component of puberty (particularly relating to testicular growth in males and breast development in females) (Naccarato et al., 2000; Koskenniemi et al., 2017; Marshall and Plant, 1996). On the other hand, the top ten most significantly enriched GO Cellular Component terms for M_u \ M_b genes include very generic terms such as "transferase complex, transferring phosphorus-containing groups", "axon", "intracellular membrane-bounded organelle", "protein kinase complex", "nuclear envelope lumen", and "endoplasmic reticulum lumen". That is, genes obtained only with uniform edge costs lead to very generic enrichment results.

Further supplementary information
Association between bait usage scores and functional enrichment. In both the COVID-19 and the PP case study (see Supplementary Figures 4 and 7), we obtained far fewer significantly enriched GO terms when running ROBUST with bait-usage-based rather than with uniform edge costs. A likely explanation for this is that genes with large bait usage scores are over-represented in gene annotation databases, as suggested by Haynes et al. (2018). To assess the plausibility of this explanation, we collected the 20 genes with the largest bait usage scores f(u) and the 20 genes with the smallest bait usage scores f(u). From each of the two sets, we then drew 100 random subsets of size 10 and carried out GO GSEA for all of them. The distributions of the numbers of significantly enriched terms obtained are shown in Supplementary Figure 8.
Supplementary Table 1. Details on data used for functional relevance validation. Gene expression data were obtained from the Gene Expression Omnibus (GEO) (Barrett et al., 2013), using the GEO2R R interface (https://www.ncbi.nlm.nih.gov/geo/geo2r/). DisGeNET (v7.0) associations were obtained using NDEx (Pratt et al., 2015).
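The subsampling procedure described under "Association between bait usage scores and functional enrichment" can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `subsample_extremes`, the fixed random seed, and the toy scores are hypothetical, and the subsequent GO GSEA step for each subset is deliberately left out.

```python
import random


def subsample_extremes(scores, n_extreme=20, n_subsets=100, subset_size=10, seed=0):
    """Draw random subsets from the highest- and lowest-scoring genes.

    scores: dict mapping gene identifiers to bait usage scores f(u).
    Assumes len(scores) >= 2 * n_extreme so that the two groups are disjoint.
    Returns two lists of n_subsets subsets, each of size subset_size.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:n_extreme]       # genes with the largest scores
    bottom = ranked[-n_extreme:]   # genes with the smallest scores
    top_subsets = [rng.sample(top, subset_size) for _ in range(n_subsets)]
    bottom_subsets = [rng.sample(bottom, subset_size) for _ in range(n_subsets)]
    return top_subsets, bottom_subsets


# Toy scores with placeholder gene names (not real bait usage data).
toy_scores = {f"gene_{i}": i for i in range(200)}
top_subsets, bottom_subsets = subsample_extremes(toy_scores)
```

Each of the resulting subsets would then be passed to an enrichment tool, and the number of significantly enriched terms per subset would be recorded to obtain the two distributions being compared.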