MODifieR: an Ensemble R Package for Inference of Disease Modules from Transcriptomics Networks

Introduction
Various algorithms have been proposed to infer groups of disease-associated genes using networks, i.e. disease modules. Recently, the DREAM community compared different module inference methods that were based on network topologies to identify disease-associated genes (Choobdar et al., 2019), which led to the tool MONET (Tomasoni et al., 2019). However, no similar tools exist for extracting disease-specific modules by integrating gene expression differences between patients and controls. In addition, related work demonstrates the benefits of a consensus approach that integrates results from individual network-based methods for determining gene-disease associations (Navlakha and Kingsford, 2010). We therefore propose the R package MODule IdentiFIER (MODifieR), which lets the user easily run nine popular module inference methods within a unified framework and inspect the results. Six methods are integrated from previous packages: DIAMOnD (Ghiassian et al., 2015), DiffCoEx (Tesson et al., 2010), MCODE (Bader and Hogue, 2003), MODA (Li et al., 2016), ModuleDiscoverer (Vlaic et al., 2018) and WGCNA (Langfelder and Horvath, 2012), while three methods have been packaged from our previous publications, namely Clique-Sum (Bruhn et al., 2014; Gustafsson et al., 2014) and Correlation-Clique (Gawel et al., 2019; Hellberg et al., 2016).
MODifieR processes transcriptomic data from RNA-Seq or microarrays into differentially expressed genes (DEGs) and normalized expression matrices, encapsulating them in standardized input objects. We illustrate the use of the tool by applying it to public gene expression datasets from the asthmatic cohorts of the U-BIOPRED project (Supplementary Material S1).

Software implementation
The main concept of MODifieR is illustrated in Figure 1. The general workflow is centered around two main objects: MODifieR_input objects, which contain preprocessed data from microarrays or RNA-Seq, and MODifieR_module objects, which contain inferred modules and related data.

Input
Input objects are created from microarray or RNA-Seq transcriptomic data. Differential expression analysis of microarray data is performed at the probe level using LIMMA (Ritchie et al., 2015). When multiple probes map to the same gene, the lowest P-value among them is assigned to that gene. For co-expression analysis, probes are collapsed into genes using the WGCNA collapseRows function (Miller et al., 2011). RNA-Seq differential expression analysis follows the standard edgeR workflow (Robinson et al., 2010). The preprocessing for co-expression-based analysis consists of the variance-stabilizing transformation from the DESeq2 package (Love et al., 2014) and an optional quantile normalization. The objects contain DEGs, an expression matrix, the grouping of samples and a list of the parameters used when creating the object.
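The rule of keeping the lowest P-value per gene can be illustrated with a minimal sketch. It is written in Python for brevity and is not MODifieR code; the function name `lowest_p_per_gene`, the probe identifiers and the P-values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple


def lowest_p_per_gene(
    probe_results: Iterable[Tuple[str, str, float]]
) -> Dict[str, float]:
    """Collapse probe-level P-values to one value per gene (the minimum)."""
    best: Dict[str, float] = {}
    for _probe_id, gene, p_value in probe_results:
        # Keep the smallest P-value seen so far for this gene.
        if gene not in best or p_value < best[gene]:
            best[gene] = p_value
    return best


# Hypothetical probe-level results: (probe ID, gene symbol, P-value).
probes = [
    ("probe_1", "IL13", 0.04),
    ("probe_2", "IL13", 0.002),
    ("probe_3", "GATA3", 0.2),
]
gene_p = lowest_p_per_gene(probes)
```

Here the two probes mapped to the same gene are reduced to a single entry that carries the smaller of the two P-values, while a gene with a single probe keeps its value unchanged.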
To work with a pre-computed list of DEGs or a co-expression matrix, the data can be enclosed in an input object. If a method requires DEGs, it also requires a protein-protein interaction network on which to overlay the DEGs; this network is provided separately when calling the module inference function. For details, see the Inference methods section of the vignette.

Module inference
All module inference methods take the input object described in Section 2.1 as their primary input. The same input object can be used for all inference methods.
The output is always a MODifieR_module object. All MODifieR_module objects contain at least the disease module genes and a list of parameters used to generate the object. Several inference methods offer post-processing functions, such as splitting the module into sub-modules.
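The design described above, one shared input object and one standardized output type for every method, can be sketched as follows. This is a Python illustration of the pattern only, not the package's R implementation: the classes `InputObject` and `ModuleObject`, both toy methods and all gene names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class InputObject:
    """Stand-in for a standardized input: gene-level P-values plus settings."""
    deg_p_values: Dict[str, float]
    settings: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ModuleObject:
    """Stand-in for a standardized output: module genes plus settings used."""
    module_genes: List[str]
    settings: Dict[str, Any]


def toy_threshold_method(data: InputObject, cutoff: float = 0.05) -> ModuleObject:
    """Toy method (not a real algorithm): keep genes below a P-value cutoff."""
    genes = sorted(g for g, p in data.deg_p_values.items() if p < cutoff)
    return ModuleObject(genes, {"method": "toy_threshold", "cutoff": cutoff})


def toy_top_n_method(data: InputObject, n: int = 1) -> ModuleObject:
    """Toy method (not a real algorithm): keep the n genes with lowest P-value."""
    ranked = sorted(data.deg_p_values, key=data.deg_p_values.get)
    return ModuleObject(ranked[:n], {"method": "toy_top_n", "n": n})


# The same input object is passed to every method, and every method
# returns the same kind of output object with its parameters recorded.
shared_input = InputObject({"IL13": 0.002, "GATA3": 0.2, "TSLP": 0.03})
methods: List[Callable[[InputObject], ModuleObject]] = [
    toy_threshold_method,
    toy_top_n_method,
]
results = [method(shared_input) for method in methods]
```

Because every result has the same shape and records the parameters that produced it, downstream steps such as consensus building or export can treat the outputs of different methods uniformly.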

Consensus modules
MODifieR provides two approaches for generating a consensus module by integrating individual modules. The first is a simple count-based method in which a consensus module is derived from a set of MODifieR modules by counting how often each gene occurs across the modules produced by the individual methods. Supplementary Material S1 provides an example of the inference of an asthma count-based consensus module. The second is a network-based approach called S2B (double specific betweenness), which relies on betweenness centrality and was originally developed to find the overlap between gene sets associated with two diseases (Garcia-Vaquero et al., 2018). The integration methods also return a MODifieR_module object.
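A minimal sketch of the count-based idea follows. It is written in Python rather than with the package's own functions, and it assumes that a gene enters the consensus when it appears in at least `min_frequency` modules. That threshold rule, the function name `count_based_consensus`, the method labels and the gene lists are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_based_consensus(
    modules: Dict[str, Iterable[str]], min_frequency: int
) -> Set[str]:
    """Return genes found in at least `min_frequency` of the given modules."""
    counts: Counter = Counter()
    for genes in modules.values():
        # Convert to a set so each method counts a gene at most once.
        counts.update(set(genes))
    return {gene for gene, n in counts.items() if n >= min_frequency}


# Hypothetical modules from three inference methods.
modules = {
    "method_a": ["IL13", "GATA3", "TSLP"],
    "method_b": ["IL13", "TSLP", "CCL11"],
    "method_c": ["IL13", "POSTN"],
}
consensus = count_based_consensus(modules, min_frequency=2)
```

Raising `min_frequency` yields a smaller consensus module supported by more methods, while lowering it moves the result towards the union of all modules.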

Exporting objects
All objects can be exported to either a series of CSV files or an Excel workbook with multiple sheets.

Conclusion
MODifieR is a novel R package that provides a set of common disease module inference methods using standardized input and output. It helps to derive more robust disease-specific modules by combining results from these methods. MODifieR provides a valuable complement to existing tools for module-based biomarker discovery and thus contributes to understanding complex diseases.

Fig. 1. Workflow of MODifieR. (A) Creating input objects. (B) Disease modules are inferred by individual methods. (C) A consensus module is derived by integrating modules from individual methods.