decoupleR: ensemble of computational methods to infer biological activities from omics data

Abstract

Summary: Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor and Python package containing computational methods to extract these activities within a unified framework. decoupleR allows us to flexibly run any method with a given resource, including methods that leverage mode of regulation and weights of interactions, which are not present in other frameworks. Moreover, it leverages OmniPath, a meta-resource comprising over 100 databases of prior knowledge. Using decoupleR, we evaluated the performance of methods on transcriptomic and phospho-proteomic perturbation experiments. Our findings suggest that simple linear models and the consensus score across top methods perform better than other methods at predicting perturbed regulators.

Availability and implementation: decoupleR’s open-source code is available in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/decoupleR.html) for R and in GitHub (https://github.com/saezlab/decoupler-py) for Python. The code to reproduce the results is in GitHub (https://github.com/saezlab/decoupleR_manuscript) and the data in Zenodo (https://zenodo.org/record/5645208).

Supplementary information: Supplementary data are available at Bioinformatics Advances online.


Univariate Decision Tree
Univariate Decision Tree (UDT) fits a single decision tree for each regulator and sample. As a unique covariable, UDT uses the associated weights of a given regulator to estimate the molecular readouts of all molecular features in a sample. Target features with no associated weight are set to zero. The obtained feature importance from the fitted model is the activity of the regulator.

Multivariate Decision Trees
Multivariate Decision Trees (MDT) fits an ensemble of decision trees, known as a random forest, to infer regulator activities. Contrary to UDT, MDT uses all regulators of a given network to estimate the molecular readouts of all molecular features in a sample. As in UDT, target features with no associated weight are set to zero. The feature importances extracted from the fitted model are the regulator activities.
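This step can be illustrated with a short sketch (not decoupleR's implementation) that assumes the network is encoded as a features × regulators weight matrix and uses scikit-learn's random forest:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def mdt_activities(y, W, seed=0):
    """Sketch of MDT for one sample: y holds the sample's molecular
    readouts, W is a (features x regulators) weight matrix in which
    targets without an associated weight are already set to zero.
    The per-regulator feature importances are returned as activities."""
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(W, y)
    return rf.feature_importances_
```

A regulator whose weight column explains the readouts well receives most of the impurity-based importance, and hence a high activity.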

Fast Gene Set Enrichment Analysis
Fast Gene Set Enrichment Analysis (FGSEA) (Sergushichev, 2016) estimates regulator activities using a GSEA implementation based on an adaptive multi-level split Monte Carlo scheme. In GSEA, molecular features are first ranked per sample. Then, an enrichment score (ES) is calculated by walking down the list of features, increasing a running-sum statistic when a feature in the target feature set is encountered and decreasing it when it is not. The magnitude of the increment depends on the correlation of the molecular feature with the regulator being evaluated. The final ES is the maximum deviation from zero encountered in the random walk. Finally, a normalized ES (NES), called norm_fgsea in decoupleR, can be calculated using permutations.
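The running-sum ES described above can be sketched as follows; this is the classic weighted GSEA statistic only, while FGSEA's actual contribution, the adaptive multi-level split Monte Carlo scheme for the permutation p-values, is not reproduced here:

```python
import numpy as np

def enrichment_score(stats, in_set, p=1.0):
    """Classic GSEA running-sum enrichment score (sketch).
    stats: per-feature statistics for one sample; in_set: boolean
    mask marking the regulator's target features; p: weighting
    exponent controlling how strongly the statistic magnitudes count."""
    order = np.argsort(-stats)                       # rank features, descending
    hits = in_set[order]
    w = np.abs(stats[order]) ** p
    inc = np.where(hits, w / w[hits].sum(), 0.0)     # increase on target features
    dec = np.where(hits, 0.0, 1.0 / (~hits).sum())   # decrease otherwise
    running = np.cumsum(inc - dec)
    return running[np.argmax(np.abs(running))]       # max deviation from zero
```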

Gene Set Variation Analysis
Gene Set Variation Analysis (GSVA) (Hänzelmann et al., 2013) starts by transforming the input molecular readouts matrix to a readout-level statistic using Gaussian kernel estimation of the cumulative distribution function. Then, readout-level statistics are ranked per sample and normalized to up-weight the two tails of the rank distribution. Afterwards, an enrichment score (ES) is calculated as in GSEA, using the running-sum statistic. Finally, the ES can be normalized by subtracting the largest negative ES from the largest positive ES.
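The first step, the kernel estimate of the cumulative distribution, can be sketched as below; the bandwidth choice of a quarter of each feature's standard deviation follows the GSVA publication and is an assumption here, not decoupleR's or GSVA's exact code:

```python
import numpy as np
from scipy.stats import norm

def kcdf(expr):
    """Gaussian-kernel estimate of each feature's cumulative
    distribution across samples (sketch of GSVA's first step).
    expr: features x samples matrix of molecular readouts.
    Bandwidth h_i = s_i / 4, an assumption following the GSVA paper."""
    h = expr.std(axis=1, ddof=1, keepdims=True) / 4.0
    # For feature i and sample j: mean over samples k of Phi((x_ij - x_ik) / h_i)
    z = (expr[:, :, None] - expr[:, None, :]) / h[:, :, None]
    return norm.cdf(z).mean(axis=2)
```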

Weighted Sum
Weighted Sum (WSUM) infers regulator activities by first multiplying each target feature by its associated weight and then summing these products into a final enrichment score (ES). It can be defined as:

ES = \sum_{i=1}^{n} M_i L_i x_i

Where n is the number of targets for a given regulator, M_i is the associated mode of regulation (either positive or negative), L_i is the likelihood of that event happening and x_i is a molecular feature statistic such as gene expression. In case M or L are not present, these are set to one. Furthermore, permutations of random target features can be performed to obtain a normalized score (NES), called norm_wsum in decoupleR, with D being the obtained random null distribution:

NES = \frac{ES - \mathrm{mean}(D)}{\mathrm{sd}(D)}

A corrected enrichment score (CES), called corr_wsum, is also obtained:

CES = -\log_{10}(p) \cdot ES

Where p is the empirical p-value defined as:

p = \frac{k}{K}

Here, k is the number of times D was bigger than the absolute value of ES and K is the number of random permutations. NES and CES are alike, but CES can handle zero-inflated distributions better, since NES requires a high sd(D) value to avoid having a denominator equal to zero.
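A minimal sketch of these three scores in Python follows; the 1/K floor on the empirical p-value, used to avoid log10(0), is our assumption rather than decoupleR's documented handling:

```python
import numpy as np

def wsum(x, M, L=None, n_perm=1000, seed=0):
    """WSUM sketch. x: molecular statistics for all features;
    M: modes of regulation (zero for non-targets); L: likelihoods
    (set to one when absent). Returns ES, permutation-based NES and
    CES. The 1/K floor on the empirical p-value is an assumption."""
    w = M if L is None else M * L
    es = float(w @ x)
    rng = np.random.default_rng(seed)
    D = np.array([w @ rng.permutation(x) for _ in range(n_perm)])
    nes = (es - D.mean()) / D.std()
    p = max((D > abs(es)).mean(), 1.0 / n_perm)  # empirical p-value, floored
    ces = -np.log10(p) * es
    return es, nes, ces
```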

Weighted Mean
Weighted Mean (WMEAN) is similar to WSUM but divides the obtained ES by the sum of the absolute values of the weights. It can be defined as:

ES = \frac{\sum_{i=1}^{n} M_i L_i x_i}{\sum_{i=1}^{n} |M_i L_i|}

Like in WSUM, a NES (norm_wmean) and a CES (corr_wmean) can be calculated if random permutations of target features are performed. It is worth mentioning that norm_wmean and norm_wsum converge to the same scores since their null distributions are the same.
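As a sketch, the only change relative to WSUM is the normalizing denominator:

```python
import numpy as np

def wmean(x, M, L=None):
    """WMEAN sketch: the WSUM enrichment score divided by the summed
    absolute weights. x: molecular statistics; M: modes of regulation
    (zero for non-targets); L: likelihoods (ones when absent)."""
    w = M if L is None else M * L
    return float(w @ x) / np.abs(w).sum()
```

Because WMEAN is WSUM rescaled by a constant per regulator, and z-scores are invariant to rescaling, the permutation z-scores norm_wmean and norm_wsum coincide.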

Over Representation Analysis
Over Representation Analysis (ORA) measures the overlap between the target feature set and a list of the most altered molecular features in the input matrix. The most altered molecular features can be selected from the top and/or bottom of the molecular readout distribution. ORA first builds a contingency table and then runs a one-tailed Fisher's exact test to determine if a regulator's set of features is enriched in the selected features from the data. The resulting score is:

score = -\log_{10}(p)

Where p is the obtained p-value from the test.
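A sketch of the contingency table and test, using scipy's Fisher's exact test:

```python
import math
from scipy.stats import fisher_exact

def ora_score(selected, targets, background):
    """ORA sketch: one-tailed Fisher's exact test on the overlap
    between the selected (most altered) features and a regulator's
    target set; the score is -log10(p)."""
    sel, tgt, bg = set(selected), set(targets), set(background)
    a = len(sel & tgt)        # selected and in the target set
    b = len(sel - tgt)        # selected only
    c = len(tgt - sel)        # in the target set only
    d = len(bg - sel - tgt)   # neither
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    return -math.log10(p)
```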

Univariate Linear Model
Univariate Linear Model (ULM), adapted from (Teschendorff and Wang, 2020), like UDT, uses as a unique covariable the weighted mode of regulation of a single regulator to estimate the molecular readouts of all molecular features in a sample. Target features with no associated weight are set to zero. The obtained t-value from the fitted model is the activity of the regulator.
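A sketch of ULM with scipy's simple linear regression, where the slope's t-value becomes the activity:

```python
import numpy as np
from scipy import stats

def ulm_activity(x, w):
    """ULM sketch: regress one sample's molecular readouts x on a
    single regulator's weights w (zero for non-targets); the slope's
    t-value is returned as the activity."""
    res = stats.linregress(w, x)
    return res.slope / res.stderr   # t-value of the slope
```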

Multivariate Linear Model
Multivariate Linear Model (MLM), contrary to ULM and similar to MDT, uses all regulators of a given network to estimate the molecular readouts of all molecular features in a sample. Same as ULM, target features with no associated weight are set to zero and the obtained t-values from the fitted model are the activities of the regulators.
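A sketch of MLM as an ordinary least-squares fit of one sample on all weight columns at once, returning per-regulator t-values (a plain OLS illustration, not decoupleR's code):

```python
import numpy as np

def mlm_activities(x, W):
    """MLM sketch: OLS of one sample's readouts x on all regulators'
    weight columns W plus an intercept; the per-regulator t-values
    are the activities."""
    n, k = W.shape
    X = np.column_stack([np.ones(n), W])
    coef, *_ = np.linalg.lstsq(X, x, rcond=None)
    resid = x - X @ coef
    sigma2 = resid @ resid / (n - k - 1)            # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return (coef / se)[1:]                           # drop the intercept's t-value
```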

VIPER
Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER) (Alvarez et al., 2016) estimates biological activities by performing a three-tailed enrichment score calculation. First, a ranking is performed for the absolute value of the molecular statistics in the input matrix per sample. The value closest to zero in the matrix is given a ranking of one and the most extreme value is given a ranking of N. Then, these rankings are quantile transformed. The one-tailed enrichment score is computed as:

ES_1 = \sum_{i=1}^{n} M_i L_i x'_i

Here, n is the number of targets for a given regulator, M_i is the associated mode of regulation, L_i is the likelihood of that interaction and x'_i is the quantile-transformed ranking of molecular statistics. Next, molecular targets inside each regulator are ranked again, now based on the mode of regulation, either positive or negative:

R = \mathrm{rank}(M * x)

M is a vector indicating the mode of regulation for each target feature and x is a vector containing the molecular statistics from a given sample. Ranks are also quantile transformed and the two-tailed ES is calculated as:

ES_2 = \sum_{i=1}^{n} M_i L_i x''_i

Where x''_i is the two-tailed quantile-transformed ranking of molecular statistics. Then, the three-tail score is defined as:

ES_3 = s \left( |ES_2| + ES_1 \, [ES_1 > 0] \right)

Where s is the sign of ES_2. Finally, a normalized enrichment score is estimated by:

NES = ES_3 \sqrt{\sum_{i=1}^{n} L_i^2}

Which is an analytical approximation to random permutations.

Consensus
A consensus score is generated when more than one method is run with decoupleR. For each method, the obtained activities are transformed into z-scores, separately for positive and negative values. These two sets of z-score-transformed activities are computed by subsetting the values greater or smaller than zero, mirroring the selected values into their opposite sign, and finally calculating a classic z-score on the mirrored set. This transformation ensures that values across methods are comparable and that they keep their original sign (active or inactive). The final consensus score is the mean across the different methods.
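The two steps above can be sketched as follows (an illustration of the described transformation, not decoupleR's code):

```python
import numpy as np

def signed_zscore(acts):
    """Consensus sketch, step 1: z-score positive and negative
    activities separately. Each subset is mirrored to the opposite
    sign, which makes its mean zero, before a classic z-score is
    taken, so the original signs are preserved."""
    a = np.asarray(acts, dtype=float)
    out = np.zeros_like(a)
    for mask in (a > 0, a < 0):
        vals = a[mask]
        if vals.size:
            mirrored = np.concatenate([vals, -vals])
            out[mask] = vals / mirrored.std()   # mean of the mirrored set is 0
    return out

def consensus(activities_per_method):
    """Consensus sketch, step 2: mean of the transformed scores."""
    return np.mean([signed_zscore(a) for a in activities_per_method], axis=0)
```

Because z-scores are scale-invariant, two methods that agree up to a rescaling produce identical transformed activities.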

Benchmark design
We used decoupleR to evaluate the performance of individual methods at recovering perturbed regulators in transcriptomic and phospho-proteomic perturbation experiments. The obtained activities were further compared by computing the Spearman correlation between the concatenated scores of all samples from one method to another. We also checked the overlap of regulators with a high absolute score between methods by computing the Jaccard index for each pair of experiments. The final Jaccard index is the median across experiments.
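The two pairwise comparisons can be sketched as below; the cutoff k on the number of top regulators is an illustrative choice, not a value stated in the text:

```python
import numpy as np
from scipy.stats import spearmanr

def compare_methods(scores_a, scores_b, k=10):
    """Sketch of the two method-to-method comparisons: Spearman
    correlation of the score vectors, and the Jaccard index of the
    k regulators with the highest absolute score in each method
    (k is an illustrative assumption)."""
    rho = spearmanr(scores_a, scores_b).correlation
    top = lambda s: set(np.argsort(-np.abs(np.asarray(s)))[:k])
    A, B = top(scores_a), top(scores_b)
    return rho, len(A & B) / len(A | B)
```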
Furthermore, to evaluate the robustness of the methods to noise, we added or deleted a percentage of edges (25%, 50% and 75%) for every regulator in the prior knowledge networks. When random edges were added, their mode of regulation and weight were set to 1. For every mode (addition or deletion) and percentage, we generated five different networks, which we ran through the benchmarking pipeline of decoupleRBench. With the inferred regulator scores, for every percentage and mode we measured robustness as the correlation of the scores with those from the unperturbed networks, and as the difference in AUROC and AUPRC performance relative to the unperturbed networks.
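The edge perturbation can be sketched per regulator as follows; the dictionary encoding of a regulon is our assumption for illustration:

```python
import numpy as np

def perturb_edges(edges, frac, mode, candidate_targets, seed=0):
    """Robustness sketch: add or delete a fraction of one regulator's
    edges. edges maps target -> (mode_of_regulation, weight); added
    edges get both values set to 1, as in the benchmark description."""
    rng = np.random.default_rng(seed)
    out = dict(edges)
    n = round(frac * len(edges))
    if mode == "delete":
        for t in rng.choice(list(edges), size=n, replace=False):
            del out[t]
    else:  # "add"
        pool = [t for t in candidate_targets if t not in edges]
        for t in rng.choice(pool, size=min(n, len(pool)), replace=False):
            out[t] = (1, 1)
    return out
```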