Abstract

Summary

Diseases can be caused by molecular perturbations that induce specific changes in regulatory interactions and their coordinated expression, also referred to as network rewiring. However, the detection of complex changes in regulatory connections remains a challenging task and would benefit from the development of novel nonparametric approaches. We develop a new ensemble method called BoostDiff (boosted differential regression trees) to infer a differential network discriminating between two conditions. BoostDiff builds an adaptively boosted (AdaBoost) ensemble of differential trees with respect to a target condition. To build the differential trees, we propose differential variance improvement as a novel splitting criterion. Variable importance measures derived from the resulting models are used to reflect changes in gene expression predictability and to build the output differential networks. BoostDiff outperforms existing differential network methods on simulated data evaluated in four different complexity settings. We then demonstrate the power of our approach when applied to real transcriptomics data in COVID-19, Crohn’s disease, breast cancer, prostate adenocarcinoma, and stress response in Bacillus subtilis. BoostDiff identifies context-specific networks that are enriched with genes of known disease-relevant pathways and complements standard differential expression analyses.

Availability and implementation

BoostDiff is available at https://github.com/scibiome/boostdiff_inference.

1 Introduction

Gene regulation is a fundamental biological process that underlies various cellular functions, including developmental, environmental, and disease contexts. The regulatory relationships in a biological sample can be represented by gene regulatory networks (GRNs), where two gene nodes with a regulatory relationship are connected by an edge (Karlebach and Shamir 2008). GRN inference remains a challenging task because of the inherent complexity of transcriptional regulation, as well as the high dimensionality and noise in biological datasets. Furthermore, GRNs are dynamic and context-specific (Ideker and Krogan 2012, Nagy-Staron et al. 2021), i.e. some regulatory processes are active only in certain cell types, tissues, conditions, or in response to specific stimuli. Changes in these pairwise dependencies have been associated with the development of complex diseases (de la Fuente 2010). Differential network analysis, which aims to detect altered connectivity between different conditions or disease states, has recently emerged as a powerful complement to standard differential expression (DE) analysis and is more suitable for detecting context-specific GRNs (de la Fuente 2010, Marbach et al. 2012). Exploring how GRN structures are rewired between two different states can reveal molecular mechanisms that drive disease development and progression and identify more relevant therapeutic targets.

Various approaches for deriving differential networks have been the focus of recent studies (Bhuva et al. 2019, Basha et al. 2020, Baur et al. 2020). Representative methods are shown in Table 1. The z-score method performs Fisher transformation of Pearson’s correlation coefficients between two conditions. The resulting z-scores are modeled as a normal distribution, followed by a z-test to detect significant pairwise edges (Zhang et al. 2007). Diffcoex, a framework based on weighted gene co-expression analysis (WGGNA), first builds a correlation matrix for each condition being compared and then calculates their difference. From the difference matrix, a dissimilarity matrix is calculated based on a topological overlap measure and hierarchical clustering is applied to find the differentially co-expressed gene clusters (Tesson et al. 2010). Another approach, a Gaussian graphical model (GGM)-based method, learns the differential network from conditional associations (Chu et al. 2011). EBcoexpress relies on empirical Bayes’ estimation to calculate the posterior probability that an edge is differentially co-expressed (Dawson and Kendziorski 2012, Dawson et al. 2012).

Table 1.

Overview of methods used to derive differential networks and patient-specific networks.a

MethodAlgorithmic approachTestDirectionalityNo. of conditionsNetwork typeReference
BoostDiffTree-basedYesTwoDifferential networkThis paper
z-scoreCorrelation-basedz-testNoTwoDifferential networkZhang et al. (2007)
EBcoexpressEmpirical Bayes; correlationNoMultipleDifferential networkDawson and Kendziorski, (2012)
DiffcoexCorrelation-basedPermutation testNoMultipleDifferential networkTesson et al. (2010)
GGM-basedGGM; posterior oddsPermutation testNoTwoDifferential networkChu et al. (2011)
chNetGGM; differential expressiont-testNoTwoDifferential networkTu et al. (2021)
LIONESSLinear interpolationNoPatient-specificKuijjer et al. (2019)
Tanaka et al. (2020)Nonparametric Bayesian network; Δ Edge contribution valuet-testYesPatient-specificTanaka et al. (2020)
DEVC-netDifferential expression variance covariance (DEVC)t-test; Wilcoxon rank sum testNoTwoDifferential network; patient-specificYu et al. (2015)
MethodAlgorithmic approachTestDirectionalityNo. of conditionsNetwork typeReference
BoostDiffTree-basedYesTwoDifferential networkThis paper
z-scoreCorrelation-basedz-testNoTwoDifferential networkZhang et al. (2007)
EBcoexpressEmpirical Bayes; correlationNoMultipleDifferential networkDawson and Kendziorski, (2012)
DiffcoexCorrelation-basedPermutation testNoMultipleDifferential networkTesson et al. (2010)
GGM-basedGGM; posterior oddsPermutation testNoTwoDifferential networkChu et al. (2011)
chNetGGM; differential expressiont-testNoTwoDifferential networkTu et al. (2021)
LIONESSLinear interpolationNoPatient-specificKuijjer et al. (2019)
Tanaka et al. (2020)Nonparametric Bayesian network; Δ Edge contribution valuet-testYesPatient-specificTanaka et al. (2020)
DEVC-netDifferential expression variance covariance (DEVC)t-test; Wilcoxon rank sum testNoTwoDifferential network; patient-specificYu et al. (2015)
a

Adapted from Bhuva et al. (2019).

Table 1.

Overview of methods used to derive differential networks and patient-specific networks.a

MethodAlgorithmic approachTestDirectionalityNo. of conditionsNetwork typeReference
BoostDiffTree-basedYesTwoDifferential networkThis paper
z-scoreCorrelation-basedz-testNoTwoDifferential networkZhang et al. (2007)
EBcoexpressEmpirical Bayes; correlationNoMultipleDifferential networkDawson and Kendziorski, (2012)
DiffcoexCorrelation-basedPermutation testNoMultipleDifferential networkTesson et al. (2010)
GGM-basedGGM; posterior oddsPermutation testNoTwoDifferential networkChu et al. (2011)
chNetGGM; differential expressiont-testNoTwoDifferential networkTu et al. (2021)
LIONESSLinear interpolationNoPatient-specificKuijjer et al. (2019)
Tanaka et al. (2020)Nonparametric Bayesian network; Δ Edge contribution valuet-testYesPatient-specificTanaka et al. (2020)
DEVC-netDifferential expression variance covariance (DEVC)t-test; Wilcoxon rank sum testNoTwoDifferential network; patient-specificYu et al. (2015)
MethodAlgorithmic approachTestDirectionalityNo. of conditionsNetwork typeReference
BoostDiffTree-basedYesTwoDifferential networkThis paper
z-scoreCorrelation-basedz-testNoTwoDifferential networkZhang et al. (2007)
EBcoexpressEmpirical Bayes; correlationNoMultipleDifferential networkDawson and Kendziorski, (2012)
DiffcoexCorrelation-basedPermutation testNoMultipleDifferential networkTesson et al. (2010)
GGM-basedGGM; posterior oddsPermutation testNoTwoDifferential networkChu et al. (2011)
chNetGGM; differential expressiont-testNoTwoDifferential networkTu et al. (2021)
LIONESSLinear interpolationNoPatient-specificKuijjer et al. (2019)
Tanaka et al. (2020)Nonparametric Bayesian network; Δ Edge contribution valuet-testYesPatient-specificTanaka et al. (2020)
DEVC-netDifferential expression variance covariance (DEVC)t-test; Wilcoxon rank sum testNoTwoDifferential network; patient-specificYu et al. (2015)
a

Adapted from Bhuva et al. (2019).

The differential network methods described above measure linear relationships or rely on joint normality assumptions, which may not hold in practice (Shojaie 2021). Nonlinear regulatory dependencies are common mechanisms that occur in biology (Brunel et al. 2010, Costello et al. 2014) and these complex relationships from real datasets may be difficult to detect using correlation- or GGM-based methods. As discussed in a recent review, new methods for differential network analysis for non-Gaussian data are needed (Shojaie 2021). Some methods for inferring patient-specific networks have also been developed. To measure the sample-specific edge contributions, DEVC-net (Yu et al. 2015) uses a method that calculates an integrated score called DEVC that considers the differential expression of a gene, differential expression variance of a gene, and covariance of gene pairs. This method uses a template network to identify differential network features from gene expression data and can be used to obtain sample-specific networks. Linear Interpolation to Obtain Network Estimates for Single Samples (LIONESS) reverse engineers sample-specific networks from an aggregate network by applying linear interpolation on the edge weights from the aggregate networks (Kuijjer et al. 2019). This approach can be used in combination with existing network inference methods to also capture nonlinear dependencies. Tanaka et al. (2020) infer nonparametric Bayesian networks using B-spline regression. From the model parameters, the method derives a value representing the contribution of each edge to the expression in a particular sample, called the edge contribution value (ECv). The absolute differences in ECvs (ΔECv) between certain samples are then used to build sample-specific networks. An alternative to the abovementioned approaches are tree-based strategies, which offer the advantage of more relaxed model assumptions. While examples such as GENIE3 and derived tools continue to be successfully applied in various biological settings (Huynh-Thu et al. 2010, Moerman et al. 2019, Van de Sande et al. 2020), they cannot be used to directly compare two different biological conditions.

We introduce BoostDiff, a nonparametric tree-based approach for reconstructing directed differential networks (Fig. 1). The main component of BoostDiff is the differential tree, a modified version of the standard regression tree that can be used to identify gene pairs exhibiting changes in regulatory dependencies between two biological conditions. To build the differential trees, we use a novel splitting criterion called the differential variance improvement (DVI), which measures the difference in predictive value of a feature on the gene expression levels between two conditions. We demonstrate that boosting the differential trees with respect to samples belonging to a specific target condition is an important step for promoting condition specificity of the output networks. Tree-based variable importance measures are then calculated to obtain a ranking of regulators and to derive the differential network. Using simulated gene expression data, we evaluate the performance of BoostDiff in comparison to baseline methods and existing differential network tools. We further apply BoostDiff to multiple transcriptomics datasets, including COVID-19, Crohn’s disease, prostate adenocarcinoma, and breast cancer data, as well as stress response data in the bacterium Bacillus subtilis.

Overview of the BoostDiff algorithm. As input, we require two gene expression matrices corresponding to a target condition (e.g. disease) and a baseline condition (e.g. control). For each of p total genes, a learning subsample (LS) is drawn from the two datasets, after which an AdaBoost ensemble of differential trees is built to identify the features that are more predictive of the gene expression levels in the target condition. By setting a target condition, BoostDiff can be used to identify regulatory relationships that are more pronounced in condition A (e.g. disease state) and condition B (e.g. control/healthy), thereby providing a differential network capturing context-specific regulatory changes. In the overall workflow, the BoostDiff algorithm is run twice, once with condition A as target condition and subsequently with B as target condition. The results are then combined to obtain the final differential network. Most notably, while existing approaches aim for the reconstruction of whole genome-scale GRNs, BoostDiff concentrates on maximizing the precision for those parts of the regulatory network that actually characterize the difference between the two phenotypes.
Figure 1.

Overview of the BoostDiff algorithm. As input, we require two gene expression matrices corresponding to a target condition (e.g. disease) and a baseline condition (e.g. control). For each of p total genes, a learning subsample (LS) is drawn from the two datasets, after which an AdaBoost ensemble of differential trees is built to identify the features that are more predictive of the gene expression levels in the target condition. By setting a target condition, BoostDiff can be used to identify regulatory relationships that are more pronounced in condition A (e.g. disease state) and condition B (e.g. control/healthy), thereby providing a differential network capturing context-specific regulatory changes. In the overall workflow, the BoostDiff algorithm is run twice, once with condition A as target condition and subsequently with B as target condition. The results are then combined to obtain the final differential network. Most notably, while existing approaches aim for the reconstruction of whole genome-scale GRNs, BoostDiff concentrates on maximizing the precision for those parts of the regulatory network that actually characterize the difference between the two phenotypes.

2 Methods

2.1 Overview of the differential network inference approach

The overall strategy of BoostDiff is inspired by the popular network inference tool GENIE3 (Huynh-Thu et al. 2010). GENIE3 aims to reconstruct a network using as input a gene expression matrix derived from a single biological condition. For this, GENIE3 fits an ensemble of regression trees, using the expression levels of the target genes as the output variable and the expression levels of potential regulators as the features. The splitting criterion for a node in the regression tree is the reduction
in variance (impurity) of the target gene’s expression caused by the split. Here, S denotes the set of samples that reach a node to be split, SL and SR are the subsets of samples in the left and right branches of the split, respectively, and Var(S) is the variance of the output variable in a set of samples S{S,SL,SR}. The importance of one variable in a single regression tree is calculated by summing up the reductions in variance of all tree nodes where this variable is used as a splitting feature. The overall importance of a feature in the ensemble is then averaged across all trees. One ensemble model is learned for each target gene in the dataset. The tree-based feature importances based on variance reduction are interpreted as edge weights of candidate regulator-target gene links. A high edge weight indicates that a regulator has a large contribution in predicting the expression level of a target gene. The edges with the highest weights are returned as the inferred GRN. Biologically, the top-ranked edges inferred by GENIE3 represent the relationships that are dominant in the condition of interest.
Similarly, the differential network inference problem can be decomposed into p independent regression subproblems, where p is the total number of genes in the expression data. Our strategy assumes that, in a given biological context, the expression level of a gene can be modeled as a function of the expression levels of other genes (Fig. 1). The crucial difference between BoostDiff and GENIE3 is that we simultaneously take into account two datasets for inferring a differential network. More precisely, our approach requires the availability of (1) gene expression data matrix
for ND measurements from a disease condition and (2) the matrix
for NC measurements from a control condition. Both matrices have p columns (total number of genes). That is, xi,gD contains the expression of gene g in measurement (patient) i from the disease condition.

The inference task is modeled as a feature selection problem that aims to find features that are more predictive of expression levels in a target condition than in the baseline condition. For this, differential network analysis is performed by solving the regression problem while taking into account information from two distinct labels. We use the AdaBoost algorithm (Drucker 1997) using differential trees as base learners to drive the improved prediction of expression levels in the target condition. The trained model provides a ranking of the edges by computing a feature importance weight for each regulator (see details below).

For each target gene g=1,,p, we define regression problems LSD(g)=(XgD,X,gD) and LSC(g)=(XgC,X,gC). The design matrices XgD and XgC are obtained by deleting the gth columns X,gD and X,gC from XD and XC, respectively. The target variables are set to the deleted columns X,gD and X,gC. The inference of our differential network is then performed as follows:

  1. For each target gene g=1,,p:

    • Generate the design matrices and target variable vectors for the regression problems LSD(g) and LSC(g) for gene g for the disease and the control conditions.

    • Train an ensemble of differential regression trees with the disease selected as target condition, i.e. tree ensembles that yields a good fit for LSD(g) but not for LSC(g).

    • From the ensemble of differential trees, extract weights for all potential regulators of g which are large if the expression of the regulator is highly predictive of g in the disease condition but not in the control condition.

  2. Aggregate and sort the p gene rankings to obtain a global ranking of all differential regulatory edges.

  3. Repeat the steps 1 and 2 with reversed roles of D and C to identify regulatory links that are present in the control but not in the disease condition.

In the following, we describe the details of the algorithm.

2.2 Step 1a: Growing a differential tree

We first explain how to build a single differential tree, the key ingredient of step 1a of the algorithm outlined above. We describe how to build the differential tree for the case where the disease has been selected as the target condition. The workflow with control as target condition is analogous.

A differential tree is built through binary recursive partitioning. The key difference to standard regression trees is that, to determine the features (i.e. genes) used for splitting the samples at the inner nodes of our trees, we use a novel split criterion called differential variance improvement (DVI) instead of variance reduction. At each node of the differential tree, we maintain subsets SD{1,,ND} and SC{1,,NC} of the disease and control samples, respectively. Given a possible split feature (i.e. candidate predictor gene) g, we define DVI(g) as follows:
For a fixed g and splitting threshold τ, the variance reduction for the disease samples is given by
where MSE(·) is the mean squared error from the sample mean, X,gD[S]={xi,gD|iS} is the restriction of the target variable to the disease samples (rows) contained in a set of samples S, and SLD(g,τ)={iSD|xi,gDτ}, and SRD(g,τ)={iSD|xi,gD>τ} are the subsets of disease samples SD available at the current node whose split gene expression levels are, respectively, smaller or larger than the threshold τ. Variance reduction for the control samples is defined analogously. A positive value of the DVI hence means that the gene g is more predictive of g’s expression level in the disease condition than in the control condition, whereas a negative DVI value indicates that the opposite is the case.

When building our differential trees, we use genes g maximizing DVI(g) as split features (see details below). If disease has been selected as the target condition, these genes correspond to regulators of g which are much more active in disease samples than in control samples. If control has been selected as the target condition, they correspond to regulators in healthy individuals that have lost their regulatory activity in the disease.

Given the regression problems LSD(g) and LSC(g) for the disease and control conditions, respectively, we construct a differential regression tree whose nodes are 5-tuples
where g is the split gene, τD is the split threshold for the disease samples, and τC is the split threshold for the control samples. Note that we use two different thresholds, since using a single threshold for both conditions while optimizing the DVI leads to a skewed expression distribution in each side of the split, with one side favoring disease samples and the other side favoring control samples. The construction is done as follows:
  1. Initialize root as

    where ‘’ is a placeholder for not yet defined split genes and thresholds.

  2. Starting at the root, recursively construct a differential tree via binary partitioning as follows:

  3. At the current node v=(SD,SC,,,) of the tree under construction, do the following:

    • If a suitable termination criterion (the maximum depth of the tree has been reached or the number of target or baseline samples reaching in a node has been reached) has been satisfied or maxgDVI(g)0, label v as leaf and traceback.

    • Otherwise, set the node v’s split gene to
      its disease threshold to
      and its control threshold to
    • Initialize v’s left child as
      its right child as

      and continue with processing vL and vR.

In the regression trees described by (Breiman et al. 1984), the prediction for a sample is determined by traversing the tree until a leaf node is reached. Here, we are more interested in predicting the expression values of the samples in the target condition; thus, prediction is performed only for target samples using the identified splitting thresholds τD. The final prediction is calculated as the expected value of the expression levels of the target samples assigned to the leaf nodes after fitting the differential tree.

2.3 Step 1b: Boosting the differential trees

Inspired by GRNBoost2 (Moerman et al. 2019), we implement a boosting algorithm that derives a strong prediction model by sequentially training a pool of differential trees as the weak learners. AdaBoost for regression is typically used for solving problems where the output is a continuous variable (i.e. expression levels) without explicitly considering the class of the samples. Here, we adapted the AdaBoost.R2 algorithm (Drucker 1997) to handle the regression problem, given labels from two classes (i.e. conditions). Using the differential trees as base learners, the modified algorithm performs the boosting with respect to samples belonging to the specified target condition. In this way, BoostDiff attempts to find a model that is more predictive of the target condition compared to the baseline condition. In each tree, only the target samples are re-weighted in subsequent boosting iterations, while samples from the baseline condition retain uniform weight. In particular, target samples that are more difficult to predict are selected with higher weights during the bootstrapping step and will always be compared to a uniform sample from the baseline condition. To avoid overfitting, we set a low number of trees and in practice find that 50–100 differential trees in the ensemble are sufficient for real transcriptomics datasets. The algorithm is described in detail in the Supplementary Methods section.

2.4 Step 1c: Computing variable importances to rank the candidate predictors

Tree-based methods allow for the calculation of a variable importance measure that can be used to rank the features according to their relevance for predicting the output. In GENIE3, the importance of a predictor gene g is calculated as the sum of the variance reduction across all nodes where g is used as the splitting feature, averaged over all trees in the ensemble. For BoostDiff, we derive a similar measure by considering the samples belonging to the target condition (step 1c in the high-level description of BoostDiff given above). If disease is selected as target condition, the importance VIM(g) attributed to a candidate predictor gene g is calculated as the weighted variance reduction for the disease samples across all M trees in the ensemble:

Here, m is the boosting iteration, αm is the weight of the differential tree returned by AdaBoost, Vm(g) is the set of nodes in tree m where g was used as the split feature, and VarRed(g,τvD,SvD) is the variance reduction for the disease samples at a node v=(SvD,,g,τvD,)Vm(g). Since each node in a differential tree has two independent thresholds, interpreting the tree becomes more abstract with increasing depth. Boosting using shallow differential trees (e.g. differential tree stumps) thus favors greater interpretability of the variable importance measure.

2.5 Steps 2 and 3: Edge ranking and filtering from boosted differential trees

Each modified AdaBoost model yields a separate ranking of the regulators. However, simply ordering the regulatory links according to the weights leads to a bias toward highly variable target genes. To avoid this, we first scale the expression levels of each target gene to unit variance, similarly implemented in GENIE3 (Huynh-Thu et al. 2010).

Boosting with respect to a target condition does not necessarily produce a model that predicts a gene’s expression in the target condition better than its expression in the baseline condition. To illustrate, sample plots of the training progression are shown in Supplementary Fig. S1. To restrict the results to differential edges, we recommend examining the distributions of the mean difference in prediction error. Sample distributions of these values from the simulated and real transcriptomics data are shown in Supplementary Figs S2 and S3, respectively. Based on these generated plots, users can filter for target genes with lower mean prediction error in the target condition than the baseline condition by applying a threshold. Alternatively, users can select the top edges with the lowest mean difference in prediction error or input a user-defined percentile. After filtering, the edges are re-ranked based on the variable importance measure used as the edge weight. The procedure is performed for each of the two sub-analyses (disease or control as target condition). The top edges (user-specified parameter) from each sub-analysis are then merged to obtain the final network.

Given the ensemble nature of BoostDiff where each differential tree is constructed differently from randomly selected predictors, deriving an explanation of the model’s predictions is challenging. To obtain more mechanistic and biologically meaningful results, we can rely on the topology of the differential network and use community detection algorithms that aim to find highly connected subnetworks that could be potentially associated with the disease of interest.

2.6 Compared methods

We evaluate the performance of BoostDiff against baseline methods and existing differential networks methods. To verify the condition specificity of the networks output by BoostDiff, we compare its performance to those of two GRN inference methods, GENIE3 and ARACNE (Margolin et al. 2006). GENIE3 is run using the corresponding R package. ARACNE is run using the implementation provided in the R package minet (Meyer et al. 2008). For both GENIE3 and ARACNE, only the gene expression matrix from the disease condition is used as input. We also compare BoostDiff to two additional baseline methods: (i) random forests of differential trees (‘RF Difftrees’) and (ii) a ‘Differential GENIE3’ approach, where we separately infer the networks from each condition using GENIE3 and subtract the calculated edge weights derived from the control condition from the weights derived from the disease condition and vice versa. For details, an AIMe report is available at https://aime.report/656I3Z/2 (Matschinske et al. 2021).

Next, we compare the performance of BoostDiff to other differential network methods from the benchmarking study in Bhuva et al. (2019). These methods include z-score, EBcoexpress, Diffcoex, and a GGM-based method, which are implemented in the dcanr package in R. Additionally, we run the chNet algorithm (Tu et al. 2021), which considers significant changes in both partial correlations of edges and differential expression. To facilitate comparability and given that only the tree-based methods provide directionality information among the methods examined here, we convert directed edges to undirected edges (Bhuva et al. 2019).

2.7 Evaluation using simulated datasets

Gene expression data for disease and control conditions are simulated by adapting the SimulatorGRN approach (Bhuva et al. 2019), a dynamical systems modeling tool to simulate differential co-expression patterns. SimulatorGRN induces differential co-expression by perturbing a reference network via gene knockdown to reduce their expression levels and models activation/repression of a gene by a set of regulators using normalized Hill equations. Knockdown leads to complex nonlinear differences between the generated disease and control samples (Supplementary Fig. S4). By default, known regulatory interactions in Saccharomyces cerevisiae are used to create a reference GRN. To obtain the ‘true’ differentially co-expressed edges and allow performance evaluation, genes in the network whose abundance are affected by the knockdown perturbation and thus exhibit regulatory dependency are determined via sensitivity analysis.

In the original SimulatorGRN framework, a sample can have multiple genes knocked down, even though the evaluation considers each knockdown gene separately. To explicitly eliminate the confounding effect of additional knockdown genes in our experiments, we generate the expression data in the perturbed condition such that exactly one randomly selected input gene is knocked down. We evaluate the different tools across four scenarios: (i) 50 nodes; 500 simulations, (ii) 150 nodes; 500 simulations, (iii) 300 nodes; 500 simulations, and (iv) 500 nodes; 250 simulations. In each simulation, 100 samples are generated per condition. The final disease samples are those which have the gene knockdown, whereas the control samples are ‘wild type’ (Bhuva et al. 2019). The edges from the ‘association network’ of each simulation are used as the ground truth edges for evaluating performance. The parameters for generating the simulated data are shown in Supplementary Table S1.

In the analyses on simulated data, all genes except the target gene are considered as potential regulators. The z-score method, chNet, Diffcoex, and the GGM-based method are run with the default parameters. The parameters used for the RF Difftrees and BoostDiff are provided in Supplementary Table S2. For each simulation, we filter for the target genes belonging to the 3rd percentile based on the mean difference in prediction errors. The parameters for the EBcoexpress runs are shown in Supplementary Table S3.

The different tools have different statistical methods and cutoffs for determining the differentially coexpressed edges depending on how the algorithm works. To facilitate comparability, we show the top k predicted edges output by each method (except for chNet, wherein the number of predicted differential edges depends on the tuning parameter and is variable for each simulation; thus, extracting the top k edges cannot be consistently applied across simulations). For visualization, we show the results based on the top 100 predicted edges output by each method for the simulated data. We report the performance using precision, recall, and F1 score as the evaluation metrics.

2.8 Evaluation using real datasets

For evaluation on real datasets, BoostDiff, Diffcoex, EBcoexpress, the z-score method, and Differential GENIE3 are run on gene expression data for four human diseases (COVID-19, Crohn’s disease, breast cancer, and prostate adenocarcinoma) and for the stress response in B.subtilis. For the human diseases, we extract the top 1000 edges to generate the final network. For the B.subtilis data, we use the top 300 edges.

For the COVID-19 RNA-Seq dataset, raw gene counts are downloaded from the Gene Expression Omnibus (GEO) database under the accession number GSE156063 (Mick et al. 2020). We use data generated from nasal swab samples from COVID-19 (n=93) and uninfected patients (n=100). Count data are normalized using the DESeq2 package in R with the variance stabilizing transformation (vst}) function. Differentially expressed genes (DEGs) are obtained using DESeq2 (Love et al. 2014).

For the Crohn’s disease (CD) dataset, normalized microarray data are downloaded from the GEO database under the accession number GSE126124 (Palmer et al. 2019), which were generated from colon biopsies of individuals with CD (n=37) and healthy controls (n=19). Differential expression analysis is performed using limma (Ritchie et al. 2015).

To evaluate BoostDiff on data from The Cancer Genome Atlas (TCGA), preprocessed RNA-Seq gene expression data and DEGs from primary tumors and normal solid tissues are obtained from https://xenabrowser.net/ (Goldman et al. 2020). We use gene expression data for two cancer types, namely, prostate adenocarcinoma (PRAD) and breast cancer (BRCA), both of which contain at least 50 samples for the normal tissues.

For the human datasets, Illumina IDs are converted to HGNC symbols using the R package biomaRt where applicable (Durinck et al. 2009). Expression levels corresponding to probes mapped to the same gene symbol are averaged. The top 20% of genes with the lowest variance are filtered out. To generate the final GRN and facilitate comparability with BoostDiff, we filter the outputs for edges containing a TF, using the list of human transcription factors downloaded from http://humantfs.ccbr.utoronto.ca/ (Lambert et al. 2018). These TFs are also used as the candidate regulators for BoostDiff. Genes whose mean difference in prediction error of the models are more extreme than the threshold identified from the 3rd percentiles of the distributions are retained. For the CD dataset, a more stringent filtering step is performed because of the small sample size available for inference. All the outputs from the different methods are filtered for the top 1000 edges (except for chNet). For BoostDiff, the final network is thus comprised of the combined top 500 edges from the run where the disease condition is set as the target condition and the top 500 edges from the run where the control condition is set as the target condition. To identify more specific candidate network modules and facilitate the interpretability of the resulting networks, we apply the Louvain community detection algorithm using the python-louvain package (https://github.com/taynaud/python-louvain). The enrichr module of the gseapy v. 1.1.0 package is used to identify enriched KEGG pathways in the output networks (Subramanian et al. 2005, Kuleshov et al. 2016). KEGG terms are considered to be significantly enriched in a network if the overlap with the gene set is at least three genes.

We further test the performance of BoostDiff using transcriptome data related to the well-characterized general stress response in B.subtilis (Nicolas et al. 2012). We examine three contexts, namely, response to salt stress, response to heat stress, and response to ethanol stress, all of which are known to induce the SigB regulon. Expression data for all protein-coding genes and RNA features (S1 to S1583) of B.subtilis from the corresponding samples were downloaded from the GEO database (accession number GSE27219). Considering that bacterial adaptation is characterized by smaller variation across samples than in animal experiments or experiments with human specimens, gene expression datasets for bacterial adaptation generally have low number of replicates. Therefore, we generate additional synthetic samples using Synthetic Minority Over-sampling TEchnique (SMOTE) (https://github.com/akazs/pySMOTE/) to obtain 48 samples per condition to be available for inference. We note that a large sample size from experimental data still remains ideal for using BoostDiff; SMOTE is used as a workaround in order to be able to run the compared network inference methods on the B.subtilis datasets. Genes with low variance are filtered out using a variance cutoff of 0.4. For the BoostDiff runs, the final network is obtained by filtering the differentially predicted target genes based on the 3rd percentile and combining the top 150 edges in the differential network for each of the two sub-analyses. The results of the compared methods are filtered for the top 300 edges to obtain the differential network. Enrichment analyses for the B.subtilis outputs is performed using the ShinyGO v. 0.77 (Ge et al. 2020).

3 Results and discussion

3.1 BoostDiff outperforms existing methods on complex simulated settings

As expected, BoostDiff better identifies the differential edges than GENIE3 and ARACNE, both of which infer a static network from the disease condition alone (Fig. 2). The Differential GENIE3 method shows similar performance to those of the baseline methods, despite using the expression data from both disease and control conditions. These findings indicate that the feature importances calculated from the trees built using DVI as the splitting criterion are useful for identifying the differential edges. In addition, in all but the smallest simulation scenarios, the boosting scheme performs significantly better than the random forest of differential trees (see results for BoostDiff and RF Difftrees in Fig. 2): While RF Difftrees outperformed all the other methods in the 50-gene setting, BoostDiff is the best performing methods in the more complex 150-, 300-, and 500-gene settings. Results are similar for varying cutoffs of the number of top edges (k{50,100,150,200}) (Supplementary Fig. S5).

Performance of differential trees and BoostDiff compared to baseline methods and other differential network tools using simulated data comprising (a) 50 genes; (b) 150 genes; (c) 300 genes; (d) 500 genes. For all methods, the top 100 edges were considered (except for chNet). The one-sided Mann–Whitney U test is used to calculate the P-values.
Figure 2.

Performance of differential trees and BoostDiff compared to baseline methods and other differential network tools using simulated data comprising (a) 50 genes; (b) 150 genes; (c) 300 genes; (d) 500 genes. For all methods, the top 100 edges were considered (except for chNet). The one-sided Mann–Whitney U test is used to calculate the P-values.

We then determine the rank of each method in terms of the F1 score across all simulations in the four simulation settings. The results show that BoostDiff is the most frequently top-ranked method in the majority of simulations, more notably in the simulation settings with larger numbers of genes (Supplementary Fig. S6). In addition, except for when a high number of base learners is used in the ensemble, the performance of BoostDiff is consistent across different hyperparameter combinations (no. of features, maximum tree depth) (Supplementary Fig. S7).

3.2 Results on real datasets

3.2.1 COVID-19

The network inferred by BoostDiff shows stronger enrichment of COVID-19-related pathways compared to those of Diffcoex and the z-score method, whereas all edges in the EBcoexpress output are assigned zero posterior probabilities (Fig. 3 and Supplementary Tables S4–S7). In particular, the Diffcoex and z-score networks are enriched in less specific terms, such as ‘Influenza A,’ ‘Measles,’ and ‘Human T-cell leukemia virus 1 infection.’ Differential GENIE3 returned ‘Coronavirus disease,’ ‘Neutrophil extracellular trap formation,’ and ‘Phagosome.’ In addition to terms related to various pathogenic infections, the BoostDiff network is significantly enriched in COVID-19-relevant pathways, such as ‘Coronavirus disease,’ ‘TNF signaling pathway,’ ‘Th17 cell differentiation pathway,’ ‘NF-κB signaling pathway,’ ‘NOD-like receptor pathway,’ ‘Toll-like receptor signaling pathway,’ and ‘Viral protein interaction with cytokine and cytokine receptor’ (Fig. 3). However, similar to BoostDiff, ‘Herpes simplex I infection’ is a top enriched term in the networks output by both methods, which could indicate the presence of shared genes activated in response to viral infection. For the BoostDiff network, examining the differential edges when considering the two sub-analyses separately shows generally similar results, indicating the enrichment of infection-related pathways (Supplementary Fig. S8). We also compare the BoostDiff network to the list of DEGs. While the overlap between the differential network nodes and DEGs is significant, it is quite low (Jaccard similarity=0.129). Further, removing the DEGs from the genes in the differential network retains the enrichment of COVID-19-related pathways (Supplementary Fig. S9), indicating that these dysregulated genes identified by BoostDiff are missed by standard DE analysis.

Enriched KEGG pathways in the networks inferred by BoostDiff for the COVID-19 dataset. The resulting differential networks were strongly enriched in inflammation-related and other relevant pathways, consistent with known disease pathophysiology.
Figure 3.

Enriched KEGG pathways in the networks inferred by BoostDiff for the COVID-19 dataset. The resulting differential networks were strongly enriched in inflammation-related and other relevant pathways, consistent with known disease pathophysiology.

To further examine the differential networks, we apply the Louvain community detection algorithm (Blondel et al. 2008). BoostDiff, Diffcoex, z-score, and Differential GENIE3, identifies modules that are enriched in viral infection terms such as ‘Influenza A,’ ‘Coronavirus disease,’ ‘Hepatitis C,’ as well as signaling pathways ‘NOD-like receptor signaling pathway,’ ‘RIG-1-like receptor signaling pathway,’ and ‘Toll-like receptor signaling pathway’ (Supplementary Figs S10–S12 and Supplementary Tables S8–S11). However, the BoostDiff module is additionally enriched in the terms ‘Cytosolic DNA-sensing pathway’ and ‘Chemokine signaling pathway’ (Fig. 4 and Supplementary Tables S10 and S11). Notable COVID-19-related genes in the BoostDiff module include CXCL10, DDX58, STAT1, STAT2, EIF2AK2, and ISG15. Other additionally known genes involved in pathogen response include IFIT1, IFIT2, IFIT3, CXCL11, and CXCL9. Chemokines are produced in response to a range of viral infections. In COVID-19, chemokine signaling has been linked to acute respiratory distress syndrome (Zhou et al. 2020). DDX58 (RIG-1) is involved in the production of interferons in response to COVID-19 (Yamada et al. 2021). Interferon signaling mediated by STAT1 and STAT2 is a key antiviral defense mechanism, and SARS-CoV-2 was demonstrated to inhibit nuclear translocation of STAT1 and STAT2 (Mu et al. 2020). OASL, OAS2, and OAS3 are members of the oligoadenylate synthetase family of proteins, which are involved in the antiviral activity of interferons and have been reported to exacerbate cardiac effects of COVID-19 (Gao et al. 2023). JAK inhibitors have been explored for the treatment and management of COVID-19 (Levy et al. 2022) and is known to regulate the release of pro-inflammatory cytokines and chemokines. For instance, the chemokines CXCL9, CXCL10 and CXCL11 are known to be upregulated in the COVID-19 response (Callahan et al. 2021). IFIT1, IFIT2, and IFIT3 form a functional complex and participate in interferon-induced broad viral response (Mears and Sweeney 2018, Singh et al. 2022). ISG15 encodes a ubiquitin-like protein whose activation triggers the release of various pro-inflammatory cytokines and chemokines (Cao 2021).

Dysregulated module identified from the COVID-19 differential network inferred by BoostDiff using the Louvain algorithm. Some notable genes in the module include JAK2, CXCL9, CXCL10, CXCL11, DDX58, STAT1, STAT2, IFIT1, IFIT2, IFIT3, IFITM3, OASL, OAS2, OAS3, and ISG15.
Figure 4.

Dysregulated module identified from the COVID-19 differential network inferred by BoostDiff using the Louvain algorithm. Some notable genes in the module include JAK2, CXCL9, CXCL10, CXCL11, DDX58, STAT1, STAT2, IFIT1, IFIT2, IFIT3, IFITM3, OASL, OAS2, OAS3, and ISG15.

In addition to the enrichment of COVID-19-related genes that are consistent with literature, BoostDiff also identified edges between TREX1 and IFIT1, IFIT3, IFITM3, ISG15, and OAS3, all of which have been implicated with COVID-19. While TREX1 has not been directly linked to COVID-19 pathogenesis, it encodes an exonuclease that is involved in host defense and has also been implicated in autoimmune diseases (Stetson et al. 2008). In particular, TREX1 regulates interferon-mediated signaling in antiviral response (Hasan et al. 2013). Interestingly, a recent genome-wide screen identified TREX1 as a pro-viral factor in viral replication during influenza infection (King et al. 2023). Given that nucleic acid detection is a key component of the immune response, TREX1 could be a candidate player involved in immune response to viral pathogens, including SARS-CoV-2, that can be examined in future investigations.

3.2.2 Crohn’s disease

Notably, the BoostDiff network shows stronger enrichment of CD-related pathways than the networks inferred by the other methods (Fig. 5 and Supplementary Tables S12–S15). The network returned by Differential GENIE3 does not show any significant enrichment. In addition to infection pathways, the network output by BoostDiff for the CD dataset is enriched in CD-relevant pathways, including ‘TNF signaling pathway,’ ‘NOD-like receptor signaling pathway,’ ‘Th17 cell differentiation,’ ‘IL-17 signaling pathway,’ and ‘NF-κB signaling,’ as well as ‘Inflammatory bowel disease.’ NOD-like receptors play important roles in host defense and homeostasis by acting as sensors of microbial pathogens in the intestinal mucosa. IBD has been associated with abnormal gut microbiota composition, which promotes NF-κB signaling and downstream inflammatory responses (Hug et al. 2018). The TF NF-κB functions in maintaining intestinal homeostasis, and dysregulation of the NF-κB pathway leads to sustained inflammatory state characteristic of IBD patients (Zaidi and Wine 2018). NF-κB signaling activation has been associated with more severe clinical manifestations in CD patients (Han et al. 2017, Zaidi and Wine 2018). The Th17 subset of CD4+ T cells have well recognized roles in IBD pathogenesis. In CD, IL-17 signaling mediates the activation of Th17 cells, which further drive pro-inflammatory cascades via the production of IL-21, IL-22, IFN-γ, and TNF (Schmitt et al. 2021). The differential edges obtained from the two BoostDiff sub-analyses where the disease or control states are used as the target condition also show enrichment of CD-relevant pathways (Supplementary Fig. S13). The nodes in the BoostDiff network have a small overlap with the list of DEGs (Jaccard similarity=0.008). Enrichment results after removal of DEGs are similar to those of the original network (Supplementary Fig. S14). The Diffcoex network is enriched in only the term ‘Herpes simplex virus 1 infection,’ while the z-score network is enriched in many cancer-related terms (Supplementary Tables S12 and S13). The EBcoexpress network is also enriched in CD-related terms, such as ‘IL-17 signaling pathway,’ ‘NOD-like receptor signaling pathway,’ ‘TNF signaling pathway’ and ‘NF-κB signaling pathway.’ (Supplementary Table S15).

Enriched KEGG pathways in the networks inferred by BoostDiff for the CD dataset. The resulting differential networks are strongly enriched in inflammation-related and other relevant pathways, consistent with known disease pathophysiology.
Figure 5.

Enriched KEGG pathways in the networks inferred by BoostDiff for the CD dataset. The resulting differential networks are strongly enriched in inflammation-related and other relevant pathways, consistent with known disease pathophysiology.

The Louvain algorithm is then applied on the differential networks to refine the results. For Diffcoex and Differential GENIE3, none of the modules show significant CD-related enrichment. For z-score, one module is enriched in metabolism-related genes (Supplementary Fig. S15 and Supplementary Table S16). The EBcoexpress network contains a module that is enriched in inflammation-related terms ‘Natural killer cell mediated cytotoxicity,’ ‘NF-κB signaling pathway,’ and ‘TNF signaling pathway’ (Supplementary Fig. S16 and Supplementary Table S17). On the other hand, the BoostDiff network contains a module enriched in the terms ‘Necroptosis,’ ‘Apoptosis,’ and ‘TNF signaling pathway’ (Fig. 6 and Supplementary Table S18). CD is characterized by inflammation in the intestinal mucosa leading to necroptosis, autophagy, and apoptosis. TLRs, such as TLR3, are central players in microbial defense in the gut that trigger downstream inflammatory signaling pathways, such as NF-κB (Lu et al. 2018). The complex roles of caspases, including CASP1, CASP8, and CASP10, are also known to drive gut inflammation and tissue damage (Mandal et al. 2018). CHUCK is a member of the NF-κB signaling pathway that functions in chemokine signaling (Awane et al. 1999). Aberrant STAT1 and STAT3 signaling in IBD pathogenesis has been previously established (Nieminen et al. 2013). The protective role of chemokine-like receptor 1 (CMKLR1) against microbiota-driven colitis has been reported (Lin et al. 2022). TRIM40 drives IBD pathogenesis by targeting cell-cell junctions in the mucosal epithelial barrier and promoting inflammation in the gut (Noguchi et al. 2011, Kang et al. 2023). Disturbances in the circadian rhythm are also linked to IBD manifestations (Giebfried and Lorentz 2023); for instance, CLOCK expression is disrupted in IBD patients (Weintraub et al. 2020, Giebfried and Lorentz 2023). Interestingly, BoostDiff identifies an edge between CLOCK and STAT3, which could point to a candidate regulatory mechanism where the circadian clock influences STAT3, a known driver of the CD phenotype. Thus, this dependency could point to further research in this direction.

Dysregulated module identified from the CD differential network output by BoostDiff. Notable genes include CLOCK, TRIM40, IL7, TLR3, CASP1, STAT3, STAT1, CASP8, CASP10, CMKLR1, and CHUK.
Figure 6.

Dysregulated module identified from the CD differential network output by BoostDiff. Notable genes include CLOCK, TRIM40, IL7, TLR3, CASP1, STAT3, STAT1, CASP8, CASP10, CMKLR1, and CHUK.

3.2.3 Stress response in Bacillus subtilis

BoostDiff also outperforms its competitors on the B.subtilis data (Supplementary Tables S19–S28). The networks inferred by BoostDiff are found to be enriched in stress-related pathways for all three conditions (Supplementary Fig. S17). On the other hand, only the Differential GENIE3 network in the salt stress condition is enriched in stress-related pathways. For Diffcoex, only one of the networks (heat treatment) is significantly enriched in stress-related pathways.

Next, we examine the networks based on the top hub genes. For the salt stress experiment, all the top ten hub genes in the differential network (csbD, S536, ohrB, yuzA, ywjC, S1199, S1379, yfhK, yycD, ywmE, and ybyB) are part of the SigB regulon (Fig. 7). Similarly, for the heat stress experiment, BoostDiff identifies members of the SigB regulon, namely, csbD, ohrB, S536, yuzA, S1199, yycD, ywjC, yfhK, ywmE, and nhaX, as hub genes. For the ethanol stress experiment, BoostDiff idebtifies yuzA, ydaD, yycD, ydaE, yocK, yjgB, yhdF, S1477, and ytaB as the top ten hubs. Consistent with the applied stressors, the abovementioned genes encoding response proteins are part of the SigB regulon. The general stress response is controlled by the alternative sigma factor SigB, which induces the transcription of >100 genes across a wide range of stress stimuli (Nannapaneni et al. 2012). On the other hand, the other methods identify fewer relevant hub genes or do not produce meaningful outputs. EBcoexpress assigns a total of 68 532 edges and 1 080 620 edges with a posterior probability of 1.0 for the salt stress and heat stress experiments, respectively, thereby hindering biological interpretation. For the ethanol stress experiment, EBcoexpress did not run even after stringent filtering because of strong correlations in the dataset. The z-score method only identifies ctc as the relevant hub gene in the salt stress condition and no relevant genes in the heat stress condition; for the ethanol experiment, the z-score method identifies the ribosomal gene rplJ as a hub, which is not directly involved in the stress response (Supplementary Fig. S18). For Diffcoex, only three members of the SigB regulon (ydbD, yqjL, and yvgO) are identified as hubs in the heat stress condition and one relevant hub (S1290) is identified in the salt stress condition; no relevant hub gene is identified in the ethanol stress condition (Supplementary Fig. S19). Differential GENIE3 identifies only yflD as a hub in the salt stress condition (Supplementary Fig. S20).

The differential networks identified by BoostDiff for the stress response experiments in B.subtilis. (a) The differential network inferred with the salt stress versus Spizizen’s minimal medium control comparison. (b) The differential network inferred with the heat stress versus Belitsky Minimal Medium (BMM) control comparison (c). The differential network inferred with the ethanol stress versus BMM control comparison. In all three conditions, the top ten hub genes identified by BoostDiff are members of the SigB regulon, which is induced in diverse stresses in B.subtilis.
Figure 7.

The differential networks identified by BoostDiff for the stress response experiments in B.subtilis. (a) The differential network inferred with the salt stress versus Spizizen’s minimal medium control comparison. (b) The differential network inferred with the heat stress versus Belitsky Minimal Medium (BMM) control comparison (c). The differential network inferred with the ethanol stress versus BMM control comparison. In all three conditions, the top ten hub genes identified by BoostDiff are members of the SigB regulon, which is induced in diverse stresses in B.subtilis.

3.2.4 TCGA—breast cancer

Again, the network inferred by BoostDiff shows stronger enrichment of disease-related pathways (including ‘Transcriptional misregulation in cancer,’ ‘Pathways in cancer,’ ‘Proteoglycans in cancer,’ and ‘Signaling pathways regulating pluripotency of stem cells’) than networks inferred using Diffcoex, EBcoexpress, Differential GENIE3, or the z-score method (Fig. 8 and Supplementary Tables S29–S32). Only the Diffcoex network is enriched in inflammatory and cancer-related pathways. Ebcoexpress assigns a posterior probability of 1 to 375 873 edges, while the z-score network is enriched in one term, ‘Tight junction’ (Supplementary Tables S30 and S31). The network inferred by Differential GENIE3 is enriched in terms related to olfactory transduction and metabolism (Supplementary Table S32). For BoostDiff, the results of enrichment analysis when considering the two sub-analyses separately shows generally similar results (Supplementary Fig. S21). The nodes in the BoostDiff network have a small overlap with the list of DEGs (Jaccard similarity=0.126). Enrichment results after removal of DEGs are similar to those of the original network (Supplementary Fig. S22).

Enriched KEGG pathways in the networks inferred by BoostDiff for the TCGA BRCA dataset.
Figure 8.

Enriched KEGG pathways in the networks inferred by BoostDiff for the TCGA BRCA dataset.

No BRCA-related module is found in the Differential GENIE3 network. Module analysis for Diffcoex identifies a module that is primarily enriched in generic inflammation-related pathways, such as ‘IL-17 signaling pathway,’ ‘TNF signaling pathway,’ and ‘JAK-STAT signaling pathway.’ (Supplementary Fig. S23 and Supplementary Table S33). BoostDiff results are more specific: Here, we identify a module that is enriched in the terms ‘PPAR signaling pathway,’ ‘Insulin resistance,’ ‘AMPK signaling pathway,’ ‘Adipocytokine signaling pathway,’ ‘Pyruvate metabolism,’ and ‘Regulation of lipolysis in adipocytes’ (Fig. 9 and Supplementary Table S34). Adipose tissue plays an important role in tumor progression because it provides nutrients, metabolites, and adipokines to the tumor microenvironment of breast cancer cells (DeBerardinis et al. 2008, Guaita-Esteruelas et al. 2017). PPAR is a ligand-activated TF that is involved in fatty acid oxidation and lipid metabolism (Lee et al. 2019). In the context of BRCA, PPAR signaling is known to contribute to metastasis and promote survival in harsh metabolic conditions (Saez et al. 2004, Wang et al. 2016).

Dysregulated Louvain module for the TCGA BRCA dataset. Notable genes include PPARG, PLIN1, PLIN4, CD36, ESR2, ADIPOQ, FABP4, PCK1, CEBPA, MEOX1, and MEOX2.
Figure 9.

Dysregulated Louvain module for the TCGA BRCA dataset. Notable genes include PPARG, PLIN1, PLIN4, CD36, ESR2, ADIPOQ, FABP4, PCK1, CEBPA, MEOX1, and MEOX2.

The BoostDiff results not only make sense in light of the enriched pathways, but also when zooming-in on the individual edges and hub nodes: For instance, BoostDiff identifies a relationship between PPARG and PLIN4, which are already known to be associated in the context of adipogenesis (Nielsen et al. 2008). Perilipins comprise lipid droplet-associated proteins that modulate lipid homeostasis. In BRCA, PLIN4 upregulation is associated with poor prognosis (Kim et al. 2015, Zhang et al. 2021). BoostDiff also predicts an edge between PPARG and PCK1, which is also part of the PPAR pathway. PCK1 encodes a protein kinase that has multiple known metabolic roles, including lipogenesis and tumorigenesis (Tang et al. 2021). BoostDiff identifies CD36 as a hub gene in the module. In BRCA, CD36 has been reported to be involved in metabolic rewiring and reprogramming favoring fatty acid oxidation and has been proposed as a therapeutic target in breast cancer (Feng et al. 2019). BoostDiff further identified an edge between CD36 and FABP4. In BRCA, CD36 was shown to directly interact with FABP4 to influence transport and metabolism of fatty acids (Gyamfi et al. 2021). Further, CD36 is upregulated along with other lipid metabolism genes PLIN1 and ADIPOQ (adiponectin) in breast cancer (Marino et al. 2020).

The correlation between ESR2 and ADIPOQ in adipose tissue is also established (Ahmed et al. 2022). While this specific relationship has not been described in breast cancer, this could be investigated in future studies, given the established role of estrogens and adiposity in breast cancer (Bhardwaj and Brown 2021). Along with PPARG, CEBPA is a known master regulator of adipogenesis that influences lipid accumulation (Lee et al. 2019). Here, CEBPA is associated with ADIPOQ, another adipocyte marker (Matulewicz et al. 2017). Other identified TFs include MEOX1 and MEOX2, which are involved in angiogenesis and vascular development (Schönrock et al. 2022, Wang et al. 2022, Zeng et al. 2023). For instance, MEOX1 expression was found to be associated with tumor growth triple-negative BRCA (Zeng et al. 2023). Based on their emerging roles in neoplastic diseases, MEOX1 and MEOX2 could be candidate targets in breast cancer. Overall, our results identify genes that can be further explored and reinforce the important role of adipogenesis in breast cancer.

3.2.5 TCGA—prostate adenocarcinoma

For PRAD, the EBcoexpress network is too large to analyze, as 375 873 edges are assigned with posterior probability of 1, while the z-score network is not significantly enriched in any terms. The Diffcoex network is enriched in infection- and cancer-related terms, while the Differential GENIE3 network is enriched in terms related to metabolism. On the other hand, the BoostDiff network shows overall stronger enrichment of cancer-related terms (Fig. 10 and Supplementary Tables S35–S37). The nodes in the BoostDiff network have a small overlap with the list of DEGs (Jaccard similarity=0.114). The results of enrichment analysis when considering the two sub-analyses separately shows generally similar results (Supplementary Fig. S24). Enrichment results after removal of DEGs are similar to those of the original network (Supplementary Fig. S25).

Enriched KEGG pathways in the networks inferred by BoostDiff for the TCGA PRAD dataset.
Figure 10.

Enriched KEGG pathways in the networks inferred by BoostDiff for the TCGA PRAD dataset.

For Diffcoex, we identify a cancer-associated module enriched in terms such as ‘Cell cycle,’ ‘Cellular senescence,’ and ‘Bladder cancer’ (Supplementary Fig. S26 and Supplementary Table S38). None of the modules in the z-score and Differential GENIE3 networks show significant enrichment of cancer-related terms. For BoostDiff, we identify a module enriched in the terms ‘Transcriptional misregulation in cancer,’ ‘Adrenergic signaling in cardiomyocytes,’ ‘Dilated cardiomyopathy,’ ‘Thyroid hormone signaling pathway,’ ‘Cardiac muscle contraction,’ and ‘Pathways in cancer’ (Fig. 11 and Supplementary Table S39). The module contains MEF2C, a TF known to be involved in oncogenesis and tumor proliferation in many types of cancers (Bai et al. 2015, Di Giorgio et al. 2017). TEAD4 was found to be an important prognostic marker in PRAD and is a known effector of oncogenic pathways (Chen et al. 2021). The potential involvement of MYH7 in PRAD has been recently described; in particular (Liang et al. 2022), found a higher frequency of MYH7 variants in prostate cancer patients. CAMK2A is known to promote tumor progression in prostate cancer (Rokhlin et al. 2007, He and Li 2021). In an earlier study, protein-protein interaction network-based analyses similarly identified ACTN2, MYL1, MYL2, and MYH7 as hub genes associated with PTEN-associated prostate cancer (Sun et al. 2019). The dysregulation of ACTN2 and MYH7 has been described in the context of other cancers (Ju et al. 2023). Our results suggest that candidate genes in the identified module can be further examined in the context of prostate cancer.

Dysregulated Louvain module for the TCGA PRAD dataset. Notable genes include MEF2C, MYL1, MYH6, MYH7, MYL2, ACTN2, CAMK2A, and TEAD4.
Figure 11.

Dysregulated Louvain module for the TCGA PRAD dataset. Notable genes include MEF2C, MYL1, MYH6, MYH7, MYL2, ACTN2, CAMK2A, and TEAD4.

3.2.6 Expression distributions of top-ranked edges identified by BoostDiff reflect shifts in correlation distributions

We additionally examine the Pearson correlations of the top edges from the differential networks identified by BoostDiff using the original expression data. This procedure is performed separately for the results of the two sub-analyses, namely, when the disease condition is used as the target condition, and when the control condition is used as the target condition. For the COVID-19 and CD datasets, for the same set of edges, we observe a unimodal distribution of correlation values in the nontarget condition and a bimodal distribution in the target condition, where strong positive correlation values suggest target-condition-specific activating regulator-target relationship, while negative values indicate target-condition-specific inhibitory relationships (Fig. 12). Similar observations can be derived for the B.subtilis stress adaptation and TCGA datasets (Supplementary Fig. S27). That is, expression levels of genes incident with differential edges returned by BoostDiff tend to exhibit highly condition-specific correlation patterns, indicating that BoostDiff can indeed uncover disease-specific gene dysregulation. Importantly, this striking observation does not hold for the set of all pairwise edges whose endpoints are DEGs or for randomly selected edges. For Differential GENIE3, similar patterns were observed for the salt stress and ethanol stress networks (Supplementary Fig. S28). However, no consistent patterns are observed when the same analysis is performed on the networks inferred from the human datasets (Supplementary Fig. S29).

Violin plots showing that the top 500 edges in the differential network predicted by BoostDiff for each sub-analysis tend to exhibit changes in correlation distributions between the disease and control expression data, indicating dysregulation in pairwise relationships. Correlations between predicted differential edges are compared to correlations between all pairwise combinations of DEGs, as well as randomly selected edges. Results are shown for (a) COVID-19, (b) CD, (c) TCGA—BRCA, and (d) TCGA—PRAD datasets.
Figure 12.

Violin plots showing that the top 500 edges in the differential network predicted by BoostDiff for each sub-analysis tend to exhibit changes in correlation distributions between the disease and control expression data, indicating dysregulation in pairwise relationships. Correlations between predicted differential edges are compared to correlations between all pairwise combinations of DEGs, as well as randomly selected edges. Results are shown for (a) COVID-19, (b) CD, (c) TCGA—BRCA, and (d) TCGA—PRAD datasets.

4 Conclusions

Gene regulation is a complex process that changes under different biological contexts. Differential network biology explores the rewiring of these regulatory interaction landscapes that are fundamentally distinct from the static networks that are inferred in most standard GRN inference methods (Ideker and Krogan 2012). By additionally considering the regulatory dependencies from a baseline condition, we can uncover a more refined picture underlying the molecular processes that are perturbed in a condition of interest, such as disease.

Inference of networks from biological expression data is a challenging task. The novelty of BoostDiff is 2-fold: (i) We use differential variance improvement as the splitting measure in a tree-based algorithm that can explicitly compare two datasets with a continuous output variable; (ii) BoostDiff adapts the AdaBoost algorithm to use differential trees as the base learner. Boosting the differential trees with respect to samples belonging to the target condition is a crucial step that significantly improves the detection of differential edges.

BoostDiff outperforms existing differential network methods on simulated data in more complex settings with higher dimensionality. BoostDiff yields biologically meaningful results when evaluated on five real-world transcriptomics datasets. We show that BoostDiff infers differential networks that are consistent with the pathophysiology of COVID-19, Crohn’s disease, and two types of cancer, namely, breast cancer and prostate adenocarcinoma, using TCGA data. We further validate BoostDiff using three different stress conditions evaluating the response of B.subtilis to salt, heat, and ethanol stressors, where BoostDiff correctly identifies members of the SigB regulon as hub genes in the differential networks. On the other hand, the compared methods recover few stress-related response genes. BoostDiff is also scalable and has reasonable runtime since it builds one model for each gene and can hence easily be parallelized. To facilitate biological interpretation and further refine the results, various graph-based algorithms can be used to find highly connected subnetworks. In this study, we apply the Louvain community detection algorithm, although other graph-based tools can also be used to find perturbed modules.

Nevertheless, our method has several limitations. First, BoostDiff can only compare two conditions at a time. Moreover, BoostDiff is similar to GENIE3 in that it does not perform statistical testing. Instead, scores are assigned to individual edges by calculating tree-based variable importance measures; thus, only the ranking of the edge weights is considered. Further, the AdaBoost algorithm can be prone to overfitting. In our experiments, BoostDiff performance decreased when setting the number of trees to 300. Thus, to avoid overfitting, we recommend users to set a low number of base differential trees (e.g. 50 trees).

The application of BoostDiff is not limited to gene expression data; the proposed feature selection approach can be generalized to other omics datasets. For instance, BoostDiff can be applied to proteomics or metabolomics studies that aim to detect changes in dependencies of proteins or metabolites. Moreover, the simple but effective strategy implemented in BoostDiff is an algorithmic advancement that can be further extended to other problems that aim to extract differentially predictive features. Adapting BoostDiff for analyzing time-series datasets is also a promising research direction.

Acknowledgements

We are grateful to Julian Beier for technical support. We thank the anonymous reviewers for their valuable suggestions.

Author contributions

G.G.: Conceptualization, Formal analysis, Methodology, Software, Writing—original draft. M.L.: Conceptualization, Writing—Review & Editing. J.B.: Conceptualization, Writing—Review & Editing. U.V.: Validation, Writing—Review & Editing. U.M.: Validation, Writing—Review & Editing. D.B.B.: Conceptualization, Methodology, Supervision, Writing—Review & Editing. T.K.: Conceptualization, Methodology, Supervision, Writing—Review & Editing.

Supplementary data

Supplementary data are available at Bioinformatics Advances online.

Conflict of interest

None declared.

Funding

This work was supported by the German Federal Ministry of Education and Research (BMBF) [031L0309A to D.B.B.]; and the Open Access Publication Funds of Technische Universität Braunschweig.

Data availability

The data used in this article are publicly available. Transcriptomics data used for analyzing the COVID-19, Crohn’s disease, and Bacillus subtilis stress response can be downloaded from Gene Expression Omnibus under the accession numbers GSE156063, GSE126124, and GSE27219, respecively. The TCGA datasets are available at https://xenabrowser.net/datapages/. The simulated gene expression data are deposited at https://zenodo.org/records/10214950. Our BoostDiff implementation, as well as sample scripts and tutorials are available on GitHub: https://github.com/scibiome/boostdiff_inference.

References

Ahmed
F
,
Hetty
S
,
Vranic
M
et al.
ESR2 expression in subcutaneous adipose tissue is related to body fat distribution in women, and knockdown impairs preadipocyte differentiation
.
Adipocyte
2022
;
11
:
434
47
.

Awane
M
,
Andres
PG
,
Li
DJ
et al.
Nf-κb-inducing kinase is a common mediator of il-17-, tnf-α-, and il-1β-induced chemokine promoter activation in intestinal epithelial cells
.
J Immunol
1999
;
162
:
5337
44
.

Bai
XL
,
Zhang
Q
,
Ye
LY
et al.
Myocyte enhancer factor 2C regulation of hepatocellular carcinoma via vascular endothelial growth factor and Wnt/β-catenin signaling
.
Oncogene
2015
;
34
:
4089
97
.

Basha
O
,
Argov
CM
,
Artzy
R
et al.
Differential network analysis of multiple human tissue interactomes highlights tissue-selective processes and genetic disorder genes
.
Bioinformatics
2020
;
36
:
2821
8
.

Baur
B
,
Shin
J
,
Zhang
S
et al.
Data integration for inferring context-specific gene regulatory networks
.
Curr Opin Syst Biol
2020
;
23
:
38
46
.

Bhardwaj
P
,
Brown
KA.
Obese adipose tissue as a driver of breast cancer growth and development: update and emerging evidence
.
Front Oncol
2021
;
11
:
638918
.

Bhuva
DD
,
Cursons
J
,
Smyth
GK
et al.
Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer
.
Genome Biol
2019
;
20
:
236
.

Blondel
VD
,
Guillaume
J-L
,
Lambiotte
R
et al. Fast unfolding of communities in large networks. J Stat Mech
2008
;
2008
:P10008.

Breiman
L
,
Friedman
J
,
Olshen
R
et al.
Classification and regression trees
.
Wadsworth Int Group
1984
;
37
:
237
51
.

Brunel
H
,
Gallardo-Chacón
J-J
,
Buil
A
et al.
MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis
.
Bioinformatics
2010
;
26
:
1811
8
.

Callahan
V
,
Hawks
S
,
Crawford
MA
et al. The pro-inflammatory chemokines CXCL9, CXCL10 and CXCL11 are upregulated following SARS-CoV-2 infection in an AKT-dependent manner. Viruses
2021
;
13
:
1062
.

Cao
X.
ISG15 secretion exacerbates inflammation in SARS-CoV-2 infection
.
Nat Immunol
2021
;
22
:
1360
2
.

Chen
C-L
,
Hsu
S-C
,
Chung
T-Y
et al.
Arginine is an epigenetic regulator targeting tead4 to modulate oxphos in prostate cancer cells
.
Nat Commun
2021
;
12
:
2398
.

Chu
J-H
,
Lazarus
R
,
Carey
VJ
et al.
Quantifying differential gene connectivity between disease states for objective identification of disease-relevant genes
.
BMC Syst Biol
2011
;
5
:
89
.

Costello
JC
,
Heiser
LM
,
Georgii
E
et al. ;
NCI DREAM Community
.
A community effort to assess and improve drug sensitivity prediction algorithms
.
Nat. Biotechnol
2014
;
32
:
1202
12
.

Dawson
JA
,
Kendziorski
C.
An empirical Bayesian approach for identifying differential coexpression in high-throughput experiments
.
Biometrics
2012
;
68
:
455
65
.

Dawson
JA
,
Ye
S
,
Kendziorski
C.
R/EBcoexpress: an empirical Bayesian framework for discovering differential co-expression
.
Bioinformatics
2012
;
28
:
1939
40
.

DeBerardinis
RJ
,
Lum
JJ
,
Hatzivassiliou
G
et al.
The biology of cancer: metabolic reprogramming fuels cell growth and proliferation
.
Cell Metab
2008
;
7
:
11
20
.

de la Fuente
A.
From ‘differential expression’ to ‘differential networking’ – identification of dysfunctional regulatory networks in diseases
.
Trends Genet
2010
;
26
:
326
33
.

Di Giorgio
E
,
Franforte
E
,
Cefalù
S
et al.
The co-existence of transcriptional activator and transcriptional repressor MEF2 complexes influences tumor aggressiveness
.
PLoS Genet
2017
;
13
:
e1006752
.

Drucker
H.
Improving regressors using boosting techniques
. In:
ICML
, July 8–12, 1997; Nashville, Tennessee, USA: Morgan Kaufmann, Vol.
97
,
1997
,
107
15
.

Durinck
S
,
Spellman
PT
,
Birney
E
et al.
Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomart
.
Nat Protoc
2009
;
4
:
1184
91
.

Feng
WW
,
Wilkins
O
,
Bang
S
et al.
CD36-Mediated metabolic rewiring of breast cancer cells promotes resistance to HER2-Targeted therapies
.
Cell Rep
2019
;
29
:
3405
20.e5
.

Gao
L-J
,
He
Z-M
,
Li
Y-Y
et al.
Role of OAS gene family in COVID-19 induced heart failure
.
J Transl Med
2023
;
21
:
212
.

Ge
SX
,
Jung
D
,
Yao
R.
ShinyGO: a graphical gene-set enrichment tool for animals and plants
.
Bioinformatics
2020
;
36
:
2628
9
.

Giebfried
J
,
Lorentz
A.
Relationship between the biological clock and inflammatory bowel disease
.
Clocks Sleep
2023
;
5
:
260
75
.

Goldman
MJ
,
Craft
B
,
Hastie
M
et al.
Visualizing and interpreting cancer genomics data via the xena platform
.
Nat Biotechnol
2020
;
38
:
675
8
.

Guaita-Esteruelas
S
,
Bosquet
A
,
Saavedra
P
et al.
Exogenous FABP4 increases breast cancer cell proliferation and activates the expression of fatty acid transport proteins
.
Mol Carcinog
2017
;
56
:
208
17
.

Gyamfi
J
,
Yeo
JH
,
Kwon
D
et al.
Interaction between CD36 and FABP4 modulates adipocyte-induced fatty acid import and metabolism in breast cancer
.
NPJ Breast Cancer
2021
;
7
:
129
.

Han
YM
,
Koh
J
,
Kim
JW
et al.
NF-kappa B activation correlates with disease phenotype in Crohn’s disease
.
PLoS One
2017
;
12
:
e0182071
.

Hasan
M
,
Koch
J
,
Rakheja
D
et al.
Trex1 regulates lysosomal biogenesis and interferon-independent activation of antiviral genes
.
Nat Immunol
2013
;
14
:
61
71
.

He
Q
,
Li
Z.
The dysregulated expression and functional effect of CaMK2 in cancer
.
Cancer Cell Int
2021
;
21
:
326
.

Hug
H
,
Mohajeri
MH
,
La Fata
G.
Toll-Like receptors: regulators of the immune response in the human gut
.
Nutrients
2018
;
10
:
203
.

Huynh-Thu
VA
,
Irrthum
A
,
Wehenkel
L
et al.
Inferring regulatory networks from expression data using tree-based methods
.
PLoS One
2010
;
5
:
203
.

Ideker
T
,
Krogan
NJ.
Differential network biology
.
Mol Syst Biol
2012
;
8
:
565
.

Ju
G
,
Yao
Z
,
Zhao
Y
et al.
Data mining on identifying diagnosis and prognosis biomarkers in head and neck squamous carcinoma
.
Sci Rep
2023
;
13
:
10020
.

Kang
S
,
Kim
J
,
Park
A
et al.
TRIM40 is a pathogenic driver of inflammatory bowel disease subverting intestinal barrier integrity
.
Nat Commun
2023
;
14
:
700
.

Karlebach
G
,
Shamir
R.
Modelling and analysis of gene regulatory networks
.
Nat Rev Mol Cell Biol
2008
;
9
:
770
80
.

Kim
S
,
Lee
Y
,
Koo
JS.
Differential expression of lipid metabolism-related proteins in different breast cancer subtypes
.
PLoS One
2015
;
10
:
e0119473
.

King
CR
,
Liu
Y
,
Amato
KA
et al.
Pathogen-driven CRISPR screens identify TREX1 as a regulator of DNA self-sensing during influenza virus infection
.
Cell Host Microbe
2023
;
31
:
1552
67.e8
.

Kuijjer
ML
,
Tung
MG
,
Yuan
G
et al.
Estimating Sample-Specific regulatory networks
.
iScience
2019
;
14
:
226
40
.

Kuleshov
MV
,
Jones
MR
,
Rouillard
AD
et al.
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
.
Nucleic Acids Res
2016
;
44
:
W90
7
.

Lambert
SA
,
Jolma
A
,
Campitelli
LF
et al.
The human transcription factors
.
Cell
2018
;
175
:
598
9
.

Lee
J-E
,
Schmidt
H
,
Lai
B
et al.
Transcriptional and epigenomic regulation of adipogenesis
.
Mol Cell Biol
2019
;
39
:
e00601-18
.

Levy
G
,
Guglielmelli
P
,
Langmuir
P
et al.
JAK inhibitors and COVID-19
.
J Immunother Cancer
2022
;
10
:
e002838
.

Liang
Y
,
Chiu
PK-F
,
Zhu
Y
et al.
Whole-exome sequencing reveals a comprehensive germline mutation landscape and identifies twelve novel predisposition genes in Chinese prostate cancer patients
.
PLoS Genet
2022
;
18
:
e1010373
.

Lin
Y
,
Cai
Q
,
Luo
Y
et al.
Epithelial chemerin–CMKLR1 signaling restricts microbiota-driven colonic neutrophilia and tumorigenesis by up-regulating lactoperoxidase
.
Proc Natl Acad Sci USA
2022
;
119
:
e2205574119
.

Love
MI
,
Huber
W
,
Anders
S.
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
.
Genome Biol
2014
;
15
:
550
.

Lu
Y
,
Li
X
,
Liu
S
et al.
Toll-like receptors and inflammatory bowel disease
.
Front Immunol
2018
;
9
:
72
.

Mandal
P
,
Feng
Y
,
Lyons
JD
et al.
Caspase-8 collaborates with caspase-11 to drive tissue damage and execution of endotoxic shock
.
Immunity
2018
;
49
:
42
55.e6
.

Marbach
D
,
Costello
JC
,
Küffner
R
et al. ;
DREAM5 Consortium
.
Wisdom of crowds for robust gene network inference
.
Nat. Methods
2012
;
9
:
796
804
.

Margolin
AA
,
Nemenman
I
,
Basso
K
et al.
ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context
.
BMC Bioinformatics
2006
;
7
:
S7
.

Marino
N
,
German
R
,
Rao
X
et al.
Upregulation of lipid metabolism genes in the breast prior to cancer diagnosis
.
NPJ Breast Cancer
2020
;
6
:
50
.

Matschinske
J
,
Alcaraz
N
,
Benis
A
et al.
The AIMe registry for artificial intelligence in biomedical research
.
Nat Methods
2021
;
18
:
1128
31
.

Matulewicz
N
,
Stefanowicz
M
,
Nikolajuk
A
et al.
Markers of adipogenesis, but not inflammation, in adipose tissue are independently related to insulin sensitivity
.
J Clin Endocrinol Metab
2017
;
102
:
3040
9
.

Mears
HV
,
Sweeney
TR.
Better together: the role of IFIT protein–protein interactions in the antiviral response
.
J Gen Virol
2018
;
99
:
1463
77
.

Meyer
PE
,
Lafitte
F
,
Bontempi
G.
minet: a R/Bioconductor package for inferring large transcriptional networks using mutual information
.
BMC Bioinformatics
2008
;
9
:
461
.

Mick
E
,
Kamm
J
,
Pisco
AO
et al.
Upper airway gene expression reveals suppressed immune responses to SARS-CoV-2 compared with other respiratory viruses
.
Nat Commun
2020
;
11
:
5854
.

Moerman
T
,
Aibar Santos
S
,
Bravo González-Blas
C
et al.
GRNBoost2 and arboreto: efficient and scalable inference of gene regulatory networks
.
Bioinformatics
2019
;
35
:
2159
61
.

Mu
J
,
Fang
Y
,
Yang
Q
et al.
SARS-CoV-2 N protein antagonizes type I interferon signaling by suppressing phosphorylation and nuclear translocation of STAT1 and STAT2
.
Cell Discov
2020
;
6
:
65
.

Nagy-Staron
A
,
Tomasek
K
,
Caruso Carter
C
et al.
Local genetic context shapes the function of a gene regulatory network
.
Elife
2021
;
10
:
e65993
.

Nannapaneni
P
,
Hertwig
F
,
Depke
M
et al.
Defining the structure of the general stress regulon of Bacillus subtilis using targeted microarray analysis and random forest classification
.
Microbiology (Reading)
2012
;
158
:
696
707
.

Nicolas
P
,
Mäder
U
,
Dervyn
E
et al.
Condition-dependent transcriptome reveals high-level regulatory architecture in Bacillus subtilis
.
Science
2012
;
335
:
1103
6
.

Nielsen
R
,
Pedersen
TA
,
Hagenbeek
D
et al.
Genome-wide profiling of PPARgamma: RXR and RNA polymerase II occupancy reveals temporal activation of distinct metabolic pathways and changes in RXR dimer composition during adipogenesis
.
Genes Dev
2008
;
22
:
2953
67
.

Nieminen
JK
,
Niemi
M
,
Sipponen
T
et al.
Dendritic cells from Crohn’s disease patients show aberrant STAT1 and STAT3 signaling
.
PLoS One
2013
;
8
:
e70738
.

Noguchi
K
,
Okumura
F
,
Takahashi
N
et al.
TRIM40 promotes neddylation of IKKγ and is downregulated in gastrointestinal cancers
.
Carcinogenesis
2011
;
32
:
995
1004
.

Palmer
NP
,
Silvester
JA
,
Lee
JJ
et al.
Concordance between gene expression in peripheral whole blood and colonic tissue in children with inflammatory bowel disease
.
PLoS One
2019
;
14
:
e0222952
.

Ritchie
ME
,
Phipson
B
,
Wu
D
et al.
limma powers differential expression analyses for RNA-sequencing and microarray studies
.
Nucleic Acids Res
2015
;
43
:
e47
.

Rokhlin
OW
,
Taghiyev
AF
,
Bayer
KU
et al.
Calcium/calmodulin-dependent kinase II plays an important role in prostate cancer cell survival
.
Cancer Biol Ther
2007
;
6
:
732
42
.

Saez
E
,
Rosenfeld
J
,
Livolsi
A
et al.
PPAR gamma signaling exacerbates mammary gland tumor development
.
Genes Dev
2004
;
18
:
528
40
.

Schmitt
H
,
Neurath
MF
,
Atreya
R.
Role of the IL23/IL17 pathway in Crohn’s disease
.
Front Immunol
2021
;
12
:
622934
.

Schönrock
A
,
Heinzelmann
E
,
Steffl
B
et al.
MEOX2 homeobox gene promotes growth of malignant gliomas
.
Neuro Oncol
2022
;
24
:
1911
24
.

Shojaie
A.
Differential network analysis: a statistical perspective
.
Wiley Interdiscip Rev Comput Stat
2021
;
13
:
e1508
.

Singh
DK
,
Aladyeva
E
,
Das
S
et al.
Myeloid cell interferon responses correlate with clearance of SARS-CoV-2
.
Nat Commun
2022
;
13
:
679
.

Stetson
DB
,
Ko
JS
,
Heidmann
T
et al.
Trex1 prevents cell-intrinsic initiation of autoimmunity
.
Cell
2008
;
134
:
587
98
.

Subramanian
A
,
Tamayo
P
,
Mootha
VK
et al.
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
.
Proc Natl Acad Sci USA
2005
;
102
:
15545
50
.

Sun
J
,
Li
S
,
Wang
F
et al.
Identification of key pathways and genes in PTEN mutation prostate cancer by bioinformatics analysis
.
BMC Med Genet
2019
;
20
:
191
.

Tanaka
Y
,
Tamada
Y
,
Ikeguchi
M
et al.
System-based differential gene network analysis for characterizing a sample-specific subnetwork
.
Biomolecules
2020
;
10
:
306
.

Tang
K
,
Zhu
L
,
Chen
J
et al.
Hypoxia promotes breast cancer cell growth by activating a glycogen metabolic program
.
Cancer Res
2021
;
81
:
4949
63
.

Tesson
BM
,
Breitling
R
,
Jansen
RC.
DiffCoEx: a simple and sensitive method to find differentially coexpressed gene modules
.
BMC Bioinformatics
2010
;
11
:
497
.

Tu
J-J
,
Ou-Yang
L
,
Zhu
Y
et al.
Differential network analysis by simultaneously considering changes in gene interactions and gene expression
.
Bioinformatics
2021
;
37
:
4414
23
.

Van de Sande
B
,
Flerin
C
,
Davie
K
et al.
A scalable SCENIC workflow for single-cell gene regulatory network analysis
.
Nat Protoc
2020
;
15
:
2247
76
.

Wang
H
,
Tang
Y
,
Yang
X
et al.
A crucial angiogenesis-associated gene meox2 could be a promising biomarker candidate for breast cancer
.
Front Oncol
2022
;
12
:
759300
.

Wang
X
,
Wang
G
,
Shi
Y
et al.
PPAR-Delta promotes survival of breast cancer cells in harsh metabolic conditions
.
Oncogenesis
2016
;
5
:
e232
.

Weintraub
Y
,
Cohen
S
,
Chapnik
N
et al.
Clock gene disruption is an initial manifestation of inflammatory bowel diseases
.
Clin Gastroenterol Hepatol
2020
;
18
:
115
22.e1
.

Yamada
T
,
Sato
S
,
Sotoyama
Y
et al.
RIG-I triggers a signaling-abortive anti-SARS-CoV-2 defense in human lung cells
.
Nat Immunol
2021
;
22
:
820
8
.

Yu
X
,
Zeng
T
,
Wang
X
et al.
Unravelling personalized dysfunctional gene network of complex diseases based on differential network model
.
J Transl Med
2015
;
13
:
189
13
.

Zaidi
D
,
Wine
E.
Regulation of nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κβ) in inflammatory bowel diseases
.
Front Pediatrics
2018
;
6
:
317
.

Zeng
G
,
Liu
X
,
Su
X
et al.
The role of MEOX1 in non-neoplastic and neoplastic diseases
.
Biomed Pharmacother
2023
;
158
:
114068
.

Zhang
J
,
Ji
Y
,
Zhang
L.
Extracting three-way gene interactions from microarray data
.
Bioinformatics
2007
;
23
:
2903
9
.

Zhang
X
,
Su
L
,
Sun
K.
Expression status and prognostic value of the perilipin family of genes in breast cancer
.
Am J Transl Res
2021
;
13
:
4450
63
.

Zhou
Z
,
Ren
L
,
Zhang
L
et al.
Heightened innate immune responses in the respiratory tract of COVID-19 patients
.
Cell Host Microbe
2020
;
27
:
883
90.e2
.

Author notes

David B Blumenthal and Tim Kacprowski Equal contribution.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Yoshihiro Yamanishi
Yoshihiro Yamanishi
Associate Editor
Search for other works by this author on:

Supplementary data