StabJGL: a stability approach to sparsity and similarity selection in multiple-network reconstruction

Abstract

Motivation: In recent years, network models have gained prominence for their ability to capture complex associations. In statistical omics, networks can be used to model and study the functional relationships between genes, proteins, and other types of omics data. If a Gaussian graphical model is assumed, a gene association network can be determined from the non-zero entries of the inverse covariance matrix of the data. Due to the high-dimensional nature of such problems, integrative methods that leverage similarities between multiple graphical structures have become increasingly popular. The joint graphical lasso is a powerful tool for this purpose; however, the current AIC-based selection criterion used to tune the network sparsities and similarities leads to poor performance in high-dimensional settings.

Results: We propose stabJGL, which equips the joint graphical lasso with a stable and well-performing penalty parameter selection approach that combines the notion of model stability with likelihood-based similarity selection. The resulting method makes the powerful joint graphical lasso available for use in omics settings, and outperforms the standard joint graphical lasso, as well as state-of-the-art joint methods, in terms of all performance measures we consider. Applying stabJGL to proteomic data from a pan-cancer study, we demonstrate the potential for novel discoveries the method brings.

Availability and implementation: A user-friendly R package for stabJGL, with tutorials, is available on GitHub: https://github.com/Camiling/stabJGL.


Introduction
Network models have in recent years gained great popularity in many areas. In statistical omics, networks can be used to decode aspects of unknown structures, and hence study the relationships between genes, proteins, and other types of omics data. In health data sciences, rich data sets are more and more frequently encountered, enabling the development of models integrating a variety of biological resources. In the high-dimensional setting commonly found in omics, sharing information between data sources with shared structures - which could be different tissues, conditions, patient subgroups, or different omics types - can give a valuable increase in statistical power while elucidating shared biological function. A key question is how to combine the different data sources into a single model.
If a Gaussian graphical model is assumed, a conditional (in)dependence network can be estimated by determining the non-zero entries of the inverse covariance (precision) matrix of the data. With its good performance in numerical studies, the graphical lasso of Friedman et al. [2008] is a state-of-the-art method for precision matrix estimation in the setting of Gaussian graphical models. The method combines L1 regularization with maximum likelihood estimation. Other notable methods include the neighborhood selection approach of Meinshausen and Bühlmann [2006] and the graphical SCAD [Fan et al., 2009]. Notable Bayesian methods include the Bayesian graphical lasso [Wang et al., 2012], Bayesian spike-and-slab approaches [Wang, 2015] and the graphical horseshoe [Li et al., 2019a]. If multiple related data sets are available, there are several ways to leverage common network structures. If focusing on one data type's network structure, data from other types can enhance inference via weighted graphical lasso methods [Li and Jackson, 2015, Lingjaerde et al., 2021]. However, to compare network structures across data sets, such as patient subgroups, a joint approach that leverages common information while preserving the differences can increase statistical power and provide interpretable insight.
In the area of multiple Gaussian graphical models, existing methods include the group extension of the graphical lasso to multiple networks of Guo et al. [2011], the Bayesian spike-and-slab joint graphical lasso [Li et al., 2019b] and the Markov random field approach of Peterson et al. [2015]. The widely used joint graphical lasso (JGL) of Danaher et al. [2014] extends the graphical lasso to a multiple network setting and provides a powerful tool for inferring graphs with common traits. It employs two different penalty functions - group (GGL) and fused (FGL) - with the latter recommended for most applications. From this point forward, any mention of the joint graphical lasso will imply the fused version, unless otherwise specified. The method needs tuning of two regularization parameters for controlling (i) the number of non-zero effects, and (ii) the similarity between networks, respectively. However, the default parameter selection routine based on the AIC [Akaike et al., 1973] often results in severe over-selection in high-dimensional data, potentially impacting performance negatively [Liu et al., 2010, Foygel and Drton, 2010]. In such settings, selection approaches based on model stability have demonstrated competitive performance [Liu et al., 2010, Angelini et al., 2022].
We propose a stable and accurate penalty parameter selection method for the joint graphical lasso, combining the model stability principle of Liu et al. [2010] with likelihood-based selection for high-dimensional data [Foygel and Drton, 2010]. The resulting method inherits the powerful traits of the joint graphical lasso while mitigating the risk of severe under- or over-selection of edges in high-dimensional settings. We provide an R package, stabJGL (stable sparsity and similarity selection for the joint graphical lasso), which implements the method.
The paper is organized as follows. In Section 2, we first describe the Gaussian graphical model framework and the penalized log-likelihood problem we aim to solve. We then describe our proposed algorithm. In Section 3, we demonstrate the performance of our proposed method on simulated data and apply it to proteomic data from a pan-cancer study of hormonally responsive cancers. Finally, we highlight possible extensions in Section 4.

Gaussian graphical models
In a gene network model, genes are represented by nodes and associations between them are represented by edges. Given measurable molecular units each corresponding to one gene (e.g., the encoded protein or mRNA), a network, or graph, can be constructed from their observed values. Consider n observed values of the multivariate random vector x = (X_1, ..., X_p)^T of node attributes, with each entry corresponding to one of p nodes. If we assume multivariate Gaussian node attributes, with an n × p observation matrix X with i.i.d. rows x_1, ..., x_n ∼ N(0, Σ), a partial correlation network can be determined by estimating the inverse covariance matrix, or precision matrix, Θ = Σ^{-1}. Specifically, the partial correlation between nodes i and j, conditional upon the rest of the graph, is given by

ρ_{ij | V\{i,j}} = -θ_{ij} / √(θ_{ii} θ_{jj}),

where the θ_{ij}'s are the entries of Θ and V is the set of all nodes [Lauritzen, 1996]. The partial correlations coincide with the conditional correlations in the Gaussian setting. Because a correlation (resp. partial correlation) equal to zero is equivalent to independence (resp. conditional independence) for Gaussian variables, a conditional independence graph can thus be constructed by determining the non-zero entries of the precision matrix. To ensure invertibility, the precision matrix is also required to be positive definite, Θ ≻ 0.
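To make the mapping from precision matrix to partial correlation network concrete, the following minimal numpy sketch (function name is ours, not part of the stabJGL package) converts a precision matrix Θ into partial correlations:

```python
import numpy as np

def partial_correlations(theta):
    """Convert a precision matrix into a matrix of partial correlations.

    Off-diagonal entry (i, j) is -theta_ij / sqrt(theta_ii * theta_jj);
    a zero entry corresponds to conditional independence of X_i and X_j
    in the Gaussian setting.
    """
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)  # convention: unit "self-correlation"
    return rho

# Small example: a tridiagonal precision matrix implies a chain graph,
# so nodes 1 and 3 are conditionally independent given node 2.
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
rho = partial_correlations(theta)
```

Here `rho[0, 1]` equals 0.5 while `rho[0, 2]` is exactly zero, reflecting the missing edge between nodes 1 and 3.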
In high-dimensional settings, the sample covariance matrix S = (1/(n-1)) X^T X is rarely of full rank and thus its inverse cannot be computed directly. It is common to assume a sparse network, meaning the number of edges in the edge set E is small relative to the number of potential edges in the graph (i.e., the sparsity measure 2|E|/(p^2 - p) is small). Penalized methods such as the graphical lasso [Friedman et al., 2008] are well established for sparse Gaussian graphical model estimation. If multiple related data sets are available, such as from different tissue types, much statistical power can be gained by sharing information across networks through a joint approach rather than estimating each network separately.

Penalized log-likelihood problem
Assume a network inference problem with K groups. We let {Θ} = (Θ^(1), ..., Θ^(K)) be the set of their (unknown) precision matrices, and assume that the combined set of Σ_{k=1}^K n_k observations are independent. We aim to solve the penalized log-likelihood problem [Danaher et al., 2014]

max_{{Θ}} Σ_{k=1}^K n_k [log det(Θ^(k)) - tr(S^(k) Θ^(k))] - P({Θ}), subject to Θ^(k) ≻ 0, (1)

where S^(k) is the sample covariance matrix of group k and P(·) is a penalty function. In (1), det(·) denotes the determinant and tr(·) denotes the trace. The joint graphical lasso employs the fused penalty function

P({Θ}) = λ1 Σ_{k=1}^K Σ_{i≠j} |θ_ij^(k)| + λ2 Σ_{k<k'} Σ_{i,j} |θ_ij^(k) - θ_ij^(k')|, (2)

where λ1 and λ2 are positive penalty parameters and |·| denotes the absolute value. This penalty applies L1 penalties to each off-diagonal element of the K precision matrices as well as to the differences between corresponding elements of each pair of precision matrices. As for the graphical lasso, the parameter λ1 controls the sparsity. The similarity parameter λ2 controls the degree to which the K precision matrices are forced towards each other, encouraging not only similar network structures but also similar precision matrix entries. The current penalty parameter selection approach for λ1 and λ2 is based on the AIC [Danaher et al., 2014]. While suitable for determining network similarities, likelihood-based selection criteria can lead to severe under- or over-selection and thus poor performance in high-dimensional settings [Liu et al., 2010].
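As an illustration of the objective being maximized, the following numpy sketch evaluates the fused penalized log-likelihood (1)-(2) for a given set of precision matrices. This is an evaluation helper only (the actual optimization in stabJGL is done with ADMM), and all names are ours:

```python
import numpy as np

def fused_jgl_objective(thetas, covs, ns, lam1, lam2):
    """Penalized log-likelihood of (1) with the fused penalty (2), to maximize.

    thetas: list of K precision matrices; covs: K sample covariance matrices;
    ns: K sample sizes; lam1/lam2: sparsity and similarity penalties.
    """
    K = len(thetas)
    # Gaussian log-likelihood term: n_k * (log det - trace) for each group
    ll = sum(n * (np.linalg.slogdet(th)[1] - np.trace(S @ th))
             for th, S, n in zip(thetas, covs, ns))
    offdiag = lambda M: M - np.diag(np.diag(M))
    # L1 penalty on off-diagonal entries of each precision matrix
    sparsity_pen = lam1 * sum(np.abs(offdiag(th)).sum() for th in thetas)
    # fused penalty on differences between all pairs of precision matrices
    fused_pen = lam2 * sum(np.abs(thetas[k] - thetas[l]).sum()
                           for k in range(K) for l in range(k + 1, K))
    return ll - sparsity_pen - fused_pen
```

For identity precision and covariance matrices the penalties vanish and the objective reduces to -Σ_k n_k p, which makes the helper easy to sanity-check.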

The stabJGL algorithm
To improve the performance of the joint graphical lasso with the fused penalty for omics applications and other high-dimensional problems, we propose the stabJGL algorithm for stable sparsity and similarity selection in multiple network reconstruction. Below we outline the algorithm, in which we first select the sparsity parameter λ1 in the fused penalty (2) based on the notion of model stability, and then the similarity parameter λ2 based on model likelihood.
StabJGL jointly estimates multiple networks by leveraging their common information, and gives a basis for deeper exploration of their differences, as shown in Figure 1.

Selecting λ1
We select λ1 by extending the framework introduced by Liu et al. [2010] in their Stability Approach to Regularization Selection (StARS) to a multiple network setting. The aim is to select the least amount of penalization that makes the graphs sparse as well as reproducible under random subsampling. This is done by drawing many random subsamples from each of the K data types and using them to construct joint graphical lasso graphs over a range of λ1 values. The smallest parameter value for which a given graph estimation variability measure does not surpass a specified threshold is then selected. We use a measure of edge assignment instability across subsamples to quantify the variability.
Specifically, we consider a grid of λ1 values in a suitable interval, e.g., (0, 1], and keep the similarity parameter λ2 fixed at some small value, such as 0.01, in the first instance. For η = 1, ..., N_sample, we draw a random subsample from each group k's set of n_k observations without replacement, each of size b_k < n_k. Liu et al. [2010] show that in a single network setting, b_k = ⌊10√n_k⌋ retains the theoretical guarantee of containing the true graph with high probability as well as high empirical performance, and this is the value we use. For each value of λ1 under consideration, we then construct the corresponding set of joint graphical lasso graphs {Ĝ_η^(k)(λ1)}_{k=1}^K from these K sets of subsamples, using the fused penalty (2).
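The subsampling step can be sketched as follows in numpy (illustrative helper names; the stabJGL package itself implements this in R):

```python
import numpy as np

def stars_subsample_size(n):
    # Subsample size b_k = floor(10 * sqrt(n_k)), following Liu et al. (2010),
    # capped below n so that subsampling without replacement is possible.
    return min(n - 1, int(np.floor(10 * np.sqrt(n))))

def draw_subsamples(X, n_sample=20, rng=None):
    """Draw n_sample subsamples (without replacement) of one group's n x p data."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    b = stars_subsample_size(n)
    return [X[rng.choice(n, size=b, replace=False)] for _ in range(n_sample)]
```

For n_k = 150 observations this gives a subsample size of ⌊10√150⌋ = 122.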
The following is then done for each value of λ1 we consider. For each group k = 1, ..., K and all possible node pairs (i, j), we estimate the probability of an edge between the nodes over the N_sample inferred sets of graphs,

ψ̂_{ij}^k(λ1) = (1/N_sample) Σ_{η=1}^{N_sample} 1[edge (i,j) ∈ Ĝ_η^(k)(λ1)], (3)

where 1[·] is the indicator function. Using this estimated probability, we find

ξ̂_{ij}^k(λ1) = 2 ψ̂_{ij}^k(λ1) (1 - ψ̂_{ij}^k(λ1)), (4)

which is an estimate of two times the variance of the Bernoulli indicator of the edge (i, j) in group k. It lies in [0, 0.5] and can be regarded as an estimate of the fraction of times two inferred graphs for group k found with the joint graphical lasso with the given λ1 value will disagree on the presence of the edge (i, j). Due to the L1 penalty in (2), the number of inferred edges will decrease as λ1 is increased. For a given λ1, ξ̂_{ij}^k(λ1) can be regarded as a measure of the variability of the edge (i, j) in group k across subsamples, and the total variability of graph k can be measured by averaging over all node pairs, yielding the estimate

D̂_k(λ1) = (2/(p(p-1))) Σ_{i<j} ξ̂_{ij}^k(λ1). (5)

For each value of λ1, the total variability of the whole set of graphs found by the joint graphical lasso is then found by averaging the variability over all K networks,

D̂(λ1) = (1/K) Σ_{k=1}^K D̂_k(λ1). (6)

For sufficiently large λ1, all edges are excluded from the model and so the variability D̂(λ1) will be 0. The variability will in general increase as the penalty λ1 decreases; however, for small enough λ1 the graphs will become so dense that the variability starts to decrease again. As sparse network inference is the aim, we monotonize the variability function by letting D̄(λ1) = sup_{t ≥ λ1} D̂(t), so that it is non-increasing in λ1. Finally, for a given variability threshold β1, the optimal penalty is chosen to be λ̂1 = inf{λ1 : D̄(λ1) ≤ β1}, i.e., the least amount of penalization for which the (monotonized) variability stays below the threshold. As opposed to λ1, β1 is an interpretable quantity, and we propose a default threshold of β1 = 0.1 as suggested by Liu et al. for the original StARS algorithm, reflecting an acceptance of 10% variability in the edge assignments.
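The variability computation and the selection of λ1 can be sketched as below, with the monotonization implemented as a running maximum over the sorted penalty grid. This is a numpy illustration with hypothetical names, not the package's R code:

```python
import numpy as np

def total_instability(adjacencies):
    """Mean edge-assignment instability across K groups of subsampled graphs.

    adjacencies[k] is an (N_sample, p, p) array of 0/1 adjacency matrices for
    group k, all fitted with the same lambda_1. Returns the estimate D(lambda_1).
    """
    d_ks = []
    for A in adjacencies:
        psi = A.mean(axis=0)              # edge selection frequency psi_ij
        xi = 2.0 * psi * (1.0 - psi)      # per-edge instability, in [0, 0.5]
        iu = np.triu_indices(psi.shape[0], k=1)
        d_ks.append(xi[iu].mean())        # average over the p(p-1)/2 node pairs
    return float(np.mean(d_ks))           # average over the K networks

def select_lambda1(lambdas, instabilities, beta1=0.1):
    """Smallest lambda_1 whose monotonized instability is at most beta1."""
    order = np.argsort(lambdas)
    lam = np.asarray(lambdas, dtype=float)[order]
    d = np.asarray(instabilities, dtype=float)[order]
    # monotonize: running max from the right, i.e. sup over t >= lambda_1
    d_bar = np.maximum.accumulate(d[::-1])[::-1]
    ok = np.where(d_bar <= beta1)[0]
    return float(lam[ok[0]]) if ok.size else float(lam[-1])
```

With instabilities (0.3, 0.05, 0.02) on the grid (0.1, 0.2, 0.3), the rule returns 0.2: the least penalization whose instability (and that of every heavier penalty) stays below the 0.1 threshold.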

Selecting λ2

With λ1 fixed to the value selected in the previous step, we select λ2 using a multiple-network extension of the extended BIC (eBIC) of Foygel and Drton [2010],

eBIC_γ(λ1, λ2) = Σ_{k=1}^K [ -n_k log det(Θ̂_{λ1,λ2}^(k)) + n_k tr(S^(k) Θ̂_{λ1,λ2}^(k)) + |E_k| log(n_k) + 4 |E_k| γ log(p) ], (7)

where Θ̂_{λ1,λ2}^(k) is the estimated precision matrix of network k obtained with the penalty parameters λ1 and λ2, and |E_k| is the size of the corresponding edge set. A grid of λ2 values is considered, with λ1 fixed to the value selected in the previous step. The value of λ2 that minimizes (7) is selected. Like for the standard eBIC, the additional edge penalty parameter γ ∈ [0, 1] must be chosen. However, since we are using the eBIC for similarity selection rather than sparsity selection, the choice of γ is less important, because we are comparing graphs with the same value of λ1 and hence similar levels of sparsity. We typically use γ = 0, which corresponds to the ordinary BIC, for most applications. Our implementation includes the eBIC generalization to give the user the option of additional penalization in extremely high-dimensional cases.
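A sketch of the eBIC computation in (7), using numpy and counting edges as the non-zero upper-triangular entries of each estimated precision matrix (illustrative names; the numerical tolerance is an assumption of this sketch):

```python
import numpy as np

def ebic_multiple(thetas, covs, ns, gamma=0.0, tol=1e-8):
    """Extended BIC of (7), summed over K estimated networks (lower is better).

    thetas: estimated precision matrices at some (lambda_1, lambda_2);
    covs: sample covariance matrices; ns: sample sizes.
    gamma = 0 recovers the ordinary BIC used by default in stabJGL.
    """
    total = 0.0
    p = thetas[0].shape[0]
    for th, S, n in zip(thetas, covs, ns):
        n_edges = int((np.abs(np.triu(th, k=1)) > tol).sum())  # |E_k|
        total += (-n * np.linalg.slogdet(th)[1] + n * np.trace(S @ th)
                  + n_edges * np.log(n) + 4 * n_edges * gamma * np.log(p))
    return total
```

Because the γ term scales with the edge count, it only acts as an extra complexity penalty; for a fixed λ1, and hence near-constant sparsity, it barely changes the ranking of λ2 values, which is why γ = 0 is a reasonable default.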

Algorithm
The full stabJGL algorithm is given in Algorithm 1, where JGL(·) indicates that the joint graphical lasso with the fused penalty is applied. The output of the JGL function can be a set of graphs, a set of precision matrices, or an edge set, depending on what is required at each step of Algorithm 1.

Implementation details
StabJGL is implemented in R and available as an R package at https://github.com/Camiling/stabJGL. The subsampling routine is implemented so that it can be run in parallel. The joint graphical lasso fittings are done as in Danaher et al. [2014], using an ADMM (alternating direction method of multipliers) algorithm [Boyd et al., 2011] for general penalty functions to solve the penalized log-likelihood problem (1). By default, 20 subsamples are used and we evaluate 20 values each of λ1 ∈ [0.01, 1] and λ2 ∈ [0, 0.1]. As in StARS, we use a subsample size of ⌊10√n_k⌋ for group k = 1, ..., K [Liu et al., 2010]. The additional penalty parameter γ in the eBIC for similarity selection is set to 0 by default, corresponding to the standard BIC. We found this value to be suitable in most applications but leave the option to increase the penalization. We employ a default variability threshold of β1 = 0.1.

Simulated data
We first assess the performance of stabJGL on simulated data. We compare the network reconstruction ability of stabJGL to that of state-of-the-art methods, including the joint graphical lasso with the fused penalty (FGL) and group penalty (GGL) with penalty parameters selected with the default AIC-based criterion [Danaher et al., 2014]. To assess the performance of another selection criterion specifically designed for high-dimensional graph selection, we also consider FGL with penalty parameters tuned by the extended BIC for multiple graphs (7) with a moderate value of γ = 0.2 [Foygel and Drton, 2010]. We further include the Bayesian spike-and-slab joint graphical lasso (SSJGL) of Li et al. [2019b], as well as the graphical lasso (Glasso) of Friedman et al. [2008] tuned by StARS [Liu et al., 2010]. The latter estimates each network separately. We generate data that closely resembles our omics application of interest, featuring partial correlations between 0.1 and 0.2 in absolute value, while also exhibiting the scale-free property - a typical assumption for omics data where the degree distribution (i.e., the distribution of the number of edges connected to each node) adheres to a power law [Chen and Sharp, 2004]. In the main simulation scenario, we simulate K = 3 networks with p = 100 nodes, manipulating the degree of similarity in their "true" graphical structures to assess the performance of the methods over a wide range of scenarios. We maintain a sparsity of 0.02 across all networks and generate data sets from the corresponding multivariate Gaussian distributions with n_1 = 150, n_2 = 200 and n_3 = 300 observations. We then apply different network reconstruction techniques to determine the networks from the data. For FGL and GGL, the two penalty parameters are chosen in a sequential fashion with the default AIC-based criterion proposed by Danaher et al. [2014], with 20 values each of λ1 ∈ [0.01, 1] and λ2 ∈ [0, 0.1] being evaluated. We consider the eBIC criterion on the same grid of values for FGL. We consider the same set of λ1 and λ2 values in the stabJGL algorithm and let γ = 0 in the eBIC criterion for similarity selection. For stabJGL and the graphical lasso tuned by StARS, we use a variability threshold of 0.1 and 20 subsamples. For the Bayesian spike-and-slab joint graphical lasso, all parameter specifications are as suggested by Li et al. [2019b].
In addition to the above setup, we consider additional settings with K ∈ {2, 4} graphs and p = 100 nodes. We only show a summary plot of these additional results; the full tables for these simulations, as well as for additional scenarios with other values of K and p, are given in the Supplement. We also investigate the effect of the variability threshold β1 in stabJGL on the results in a setting with p = 100 nodes and K = 2 networks. Finally, to compare the scalability of the respective methods, we consider the time needed to infer networks for various p and K. Further details and code for the simulation study can be found at https://github.com/Camiling/stabJGL_simulations.
Estimation accuracy is assessed with the precision (positive predictive value) and the recall (sensitivity). The precision gives the fraction of predicted edges that are correct, while the recall is the fraction of edges in the true graph that are identified by the inference. Because the sparsity of the estimated networks varies between methods, the precision-recall trade-off should be taken into consideration. In general, the recall will increase with the number of selected edges while the precision will decrease. Since sparsity selection is a main feature of our proposed method, we do not consider threshold-free comparison metrics such as the AUC. We therefore emphasize the following characteristics in our comparative simulation study: (i) suitable sparsity level selection, (ii) utilization of common information at any level of network similarity, i.e., inference that improves with increased network similarity, and (iii) a suitable precision-recall trade-off that does not overly favour either measure.
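These two measures can be computed from adjacency matrices as follows (a small numpy helper of our own; the conventions for empty graphs are assumptions of this sketch):

```python
import numpy as np

def precision_recall(A_true, A_est):
    """Precision and recall of an estimated adjacency matrix vs. the truth."""
    iu = np.triu_indices(A_true.shape[0], k=1)   # undirected: upper triangle
    t, e = A_true[iu].astype(bool), A_est[iu].astype(bool)
    tp = np.sum(t & e)                           # correctly recovered edges
    precision = tp / e.sum() if e.sum() else 1.0 # convention: no predicted edges
    recall = tp / t.sum() if t.sum() else 1.0    # convention: empty true graph
    return float(precision), float(recall)
```

For instance, if the true graph has edges {(1,2), (2,3)} and the estimate has {(1,2), (1,3)}, one of two predicted edges is correct and one of two true edges is found, so precision = recall = 0.5.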

Simulation results
The results are summarized in Table 1. First, we observe that the fused and group joint graphical lasso with the default AIC-based penalty parameter selection strongly over-select edges in all cases. This leads to high recall, but very low precision. Second, they do not appear to sufficiently utilize network similarities; the performance of the two methods, particularly GGL, differs little between completely unrelated and identical networks. Notably, in all cases the selected value of λ2 is smaller for FGL and GGL tuned by AIC than it is for stabJGL. Consequently, similarity is not sufficiently encouraged even in settings where the networks are identical. The AIC criterion does not seem to provide sufficient penalization to encourage suitable sparsity and similarity. On the other hand, we observe that the alternative eBIC criterion gives extremely sparse FGL estimates, resulting in high precision but very low recall. In half of the cases, it selects an empty graph, i.e., no edges. Although the extended BIC was developed specifically for graphical model selection, likelihood-based criteria for sparsity selection tend to perform poorly in high-dimensional settings and risk both severe under- and over-selection [Foygel and Drton, 2010]. This issue is avoided in the stabJGL algorithm, as the eBIC is only used to select similarity and not sparsity.
The Bayesian spike-and-slab joint graphical lasso tends to select very few edges, leading to high precision but low recall. Its performance deteriorates drastically as the network differences increase, leading to extremely low recall. This implies a lack of flexibility to adapt to varying network similarity levels, as has previously been observed [Lingjaerde et al., 2022]. Of all the joint methods, stabJGL gives the most accurate sparsity estimate. This ensures that we neither get very low precision like FGL and GGL tuned by AIC, nor very low recall like SSJGL and FGL tuned by eBIC. StabJGL also appears to adapt well to the similarity between networks, with the prediction accuracy increasing with the number of shared edges. As a result, the method either outperforms the graphical lasso tuned by StARS for highly similar networks or performs comparably to it for unrelated networks. The similar performance for unrelated networks can be explained by the fact that the sparsity-controlling penalty parameters of both methods are tuned with a stability-based approach. The results suggest that stabJGL can be used agnostically in settings where there is no prior knowledge about the level of network similarity, and does not run any risk of decreased accuracy should the networks have nothing in common.
The results for K = 2 and K = 4 networks are summarized in Figure 2. The results for FGL tuned with eBIC are not shown, as it did not select any edges in any of the settings. The findings from the K = 3 case are echoed here, with FGL and GGL having high recall but very low precision, and particularly GGL exhibiting a lack of adaptation to increased network similarity. On the contrary, SSJGL selects very few edges and thus has high precision but very low recall, with its performance quickly deteriorating for less similar networks. StabJGL achieves a balanced precision-recall trade-off and adapts well to the level of network similarity. Consequently, stabJGL performs comparably to or better than the graphical lasso, depending on the degree of similarity between the networks.
A key question is whether stabJGL can achieve as high precision as the methods that give sparser networks (i.e., SSJGL) by using a lower variability threshold. Similarly, we want to see if stabJGL can achieve as high recall as the methods that infer more edges (i.e., FGL and GGL). To investigate this, we consider the same setting as in Figure 2 with K = 2 networks, focusing specifically on the case where the two networks have 20% edge agreement. Table 2 compares the performance of stabJGL for different values of the variability threshold β1 to that of the other methods. For β1 = 0.01, stabJGL gives very sparse estimates and obtains comparable precision and recall to SSJGL. For the higher threshold β1 = 0.2, stabJGL selects a larger number of edges and obtains comparable recall to FGL and GGL while retaining a higher precision level. A complete comparison for all levels of edge agreement is given in the Supplement (Figure S3), where we similarly find that by varying the variability threshold β1 we can obtain at least as high precision and/or recall as the other methods at any level of similarity. The fact that stabJGL allows the user to obtain higher or lower sparsity by changing the variability threshold means that the method can be adapted to reflect the priorities of the user (i.e., concern for false positives versus false negatives). For most applications, a middle-ground value such as 0.1 yields a good balance between false positives and false negatives, as demonstrated in the simulations.

Runtime profiling
Figure 3 shows the CPU time used to jointly infer networks for K ∈ {2, 3, 4} networks and various numbers of nodes p, with n ∈ {100, 150} observations, for the joint graphical lasso with the fused penalty (FGL) tuned with the AIC and for stabJGL with the same parameter specifications as in the previously described simulations. Due to an efficient parallelized implementation, stabJGL has an almost identical run time to FGL when the same number of λ1 and λ2 values are considered. Thus, the increased estimation accuracy of stabJGL does not come at a computational cost. It is important to note that because the generalized fused lasso problem has a closed-form solution in the special case of K = 2 [Danaher et al., 2014], stabJGL is substantially faster for two networks than for K > 2. As stabJGL uses the fused penalty this comparison is the most relevant, but a run time comparison of all methods considered in our simulation study can be found in the Supplement (Figure S1). In the Supplement, we also demonstrate that stabJGL can be applied to problems with p > 1,000 nodes and K > 2 networks within reasonable time (Figure S2).

Pan-cancer data
We perform a proteomic network analysis of Reverse Phase Protein Array (RPPA) data from The Cancer Genome Atlas (TCGA) across different pan-cancer tumor types [Cancer Genome Atlas Network and others, 2012]. In a large proteomic pan-cancer study of 11 TCGA tumor types, Akbani et al. [2014] identified a major tumor super cluster consisting of the hormonally responsive "women's cancers" (Luminal breast cancer, ovarian cystadenocarcinoma, and uterine corpus endometrial carcinoma). Our objective is to map the proteomic network structure of the respective tumor types, in order to better understand the common mechanisms at play in these hormonally responsive tumors, as well as to highlight their differences. We consider mature RPPA data from Luminal breast cancer (BRCA, n = 273), high-grade serous ovarian cystadenocarcinoma (OVCA, n = 412), and uterine corpus endometrial carcinoma (UCEC, n = 404). All data were downloaded from the UCSC Xena Browser [Goldman et al., 2020]. The data are measured with p = 131 high-quality antibodies that target (phospho-)proteins. To alleviate batch effects, the RPPA data are normalized with replicate-based normalization [Akbani et al., 2014]. We use stabJGL to jointly estimate the proteomic networks of the respective tumor types and interpret the results and their implications. We compare the output with that obtained with the fused joint graphical lasso (FGL) of Danaher et al. [2014] with the default AIC-based penalty parameter tuning, as described in Subsection 3.1. Further details and code for the analysis are given at https://github.com/Camiling/stabJGL_analysis.

Estimated proteomic networks
The resulting stabJGL proteomic networks of the three tumor types are shown in Figure 4, where we observe plenty of common edges as well as network-specific ones. The sparsity and the selected penalty parameter values of the resulting stabJGL and FGL networks are shown in Table 3. The tendency of FGL tuned by the AIC to over-select edges, as observed in the simulations, is also evident here. With more than two thirds of all potential edges determined as present by FGL, the results are challenging to interpret and to derive meaningful conclusions from. From a biological standpoint, we would not expect a proteomic network to be this saturated with associations, given the expected scale-free property of the degree distribution [Barabasi and Oltvai, 2004]. While the degree distributions of the sparse stabJGL networks all follow a power law with many low-degree nodes and fewer high-degree ones (hubs), an expected trait for omics data [Chen and Sharp, 2004], the degree distributions of the FGL networks do not. The full degree distributions are shown in the Supplement (Figure S4). In terms of penalty parameters, we see that, just as for the simulated data, the AIC selects very small penalty parameters for FGL, resulting in little sparsity and similarity encouragement. Given the findings of Akbani et al. [2014] about the presence of a super cluster consisting of the three hormonally responsive cancer types, it is not unreasonable to expect at least some proteomic network similarity to be encouraged by a joint method. This is achieved by stabJGL, which selects a large enough value of λ2 to encourage similarity. A comparison of the pairwise similarities of the proteomic networks is given in Figure 5, where similarity is measured by the Matthews correlation coefficient (MCC), a discretized version of the Pearson correlation coefficient that can be used to quantify pairwise network similarities [Matthews, 1975]. StabJGL finds the networks of the three tumor types to be more similar than FGL does, in accordance with the findings of Akbani et al. [2014].
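The MCC between two estimated networks can be computed by treating each node pair's edge indicator as a binary outcome, as in this numpy sketch (helper name is ours):

```python
import numpy as np

def network_mcc(A1, A2):
    """Matthews correlation coefficient between two adjacency matrices,
    treating each node pair's edge indicator as a binary classification."""
    iu = np.triu_indices(A1.shape[0], k=1)   # undirected: upper triangle only
    x, y = A1[iu].astype(bool), A2[iu].astype(bool)
    tp = np.sum(x & y); tn = np.sum(~x & ~y)
    fp = np.sum(~x & y); fn = np.sum(x & ~y)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return float(tp * tn - fp * fn) / denom if denom else 0.0
```

Identical networks give an MCC of 1, unrelated networks give values near 0, and systematically disagreeing networks give negative values.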

BRCA OVCA UCEC
Figure 4: Proteomic network structure identified by stabJGL for the breast cancer (BRCA), ovarian cystadenocarcinoma (OVCA) and uterine corpus endometrial carcinoma (UCEC) tumors. The blue nodes represent proteins; edges common to all three networks are marked in red, while the remaining edges are grey.

Edge validation in STRING
To compare the level of literature support for the edges detected by stabJGL and by FGL tuned with the AIC, we conduct edge validation using the STRING database of known and predicted protein-protein interactions [Szklarczyk et al., 2019]. To ensure the reliability of the validation process, we only consider the experimentally validated interactions in STRING as evidence, with the default confidence score threshold of ≥ 0.4. The fraction of edges with supporting evidence in the STRING database is computed for the respective stabJGL and FGL networks and shown in Table 4. The analysis reveals that, for all three tumor types investigated, a higher proportion of the edges detected by stabJGL had supporting evidence in the STRING database than of those identified by FGL.

Findings consistent with literature
StabJGL successfully identifies protein-protein interactions known from the literature. To highlight the findings of the proposed methodology, we only discuss edges and central proteins identified by stabJGL but not by FGL. One example is the edge between activated (S345-phosphorylated) Checkpoint kinase 1 (Chk1) and DNA repair protein RAD51 homolog 1 (Rad51) in ovarian and breast cancer. The complex between Rad51 and the tumor suppressor BRCA2, whose loss manifests predominantly in ovarian and breast cancer, is mediated by the DNA damage checkpoint Chk1 through Rad51 phosphorylation [Nair et al., 2020, Bahassi et al., 2008]. It is also reassuring that stabJGL identifies many relevant tumor type-specific proteins as hubs in the relevant tumor type only, such as mammalian target of rapamycin (mTOR), Tuberous Sclerosis Complex 2 (Tuberin) and Ribosomal protein S6 in BRCA, all of which are involved in or up-/downstream of the PI3K/AKT/mTOR pathway, known to frequently be deregulated in Luminal breast cancer [Miricescu et al., 2020]. Lists of the top hubs in the respective stabJGL and FGL networks of the different tumor types, and their node degrees, are given in the Supplement (Tables S5 and S6).
StabJGL also captures edges that we expect to be present in all three tumor types, such as the known interaction between the transcription factor Forkhead box O3 (FOXO3a) and 14-3-3-epsilon, which facilitates cancer cell proliferation [Tzivion et al., 2011, Nielsen et al., 2008]. This common interaction is documented in the STRING database. Figure 6 shows the network structure identified by stabJGL that is common to all three tumor types. Central proteins in this common network structure include Oncoprotein 18 (Stathmin), which is known to be relevant in all three hormonally responsive cancers due to its role in the regulation of cell growth and motility [Bieche et al., 1998, Trovik et al., 2011, Belletti and Baldassarre, 2011].

Potential candidate hubs
The recovery of documented links in the protein networks estimated by stabJGL highlights its capability to detect numerous relevant proteins and interactions. The potential for new discoveries is, however, an important aspect of stabJGL, as suggested by its good performance on simulated data. For example, stabJGL identifies phosphorylated epidermal growth factor receptor (EGFR) as a central hub protein in all three tumor types. While known to be relevant in ovarian cancer [Zhang et al., 2016, Yang et al., 2013], the role of activated EGFR in uterine corpus endometrial carcinoma and Luminal breast cancer is not yet clarified. Our findings suggest it could be relevant in all three hormonally responsive tumor types. Further, Platelet endothelial cell adhesion molecule (CD31) is found to be a central protein in UCEC only. The protein is important for angiogenesis and has been implicated in other tumor types such as haemangioma [Bergom et al., 2005]. Its prominence in the proteomic UCEC network suggests it may play a crucial role in this tumor type as well. Overall, these results showcase how stabJGL can aid in generating hypotheses by identifying central proteins and associations.

Discussion
Suitable sparsity and similarity selection is key for capturing and studying multiple related biological networks. We have proposed the stabJGL algorithm, which determines the penalty parameters in the fused graphical lasso for multiple networks based on the principle of model stability. StabJGL demonstrably avoids the under- or over-selection of edges observed in state-of-the-art selection methods based on information criteria, and succeeds at leveraging network similarities to a suitable degree. Consequently, the method can be employed in situations where the actual degree of similarity is uncertain, with marked benefits and minimal risks associated with its use. StabJGL offers a fast parallelized implementation, and is particularly fast for K = 2 networks, for which a closed-form solution exists. We successfully apply the method to problems with p > 1,000 nodes and K > 2 networks.
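The model-stability principle for the sparsity penalty can be sketched as follows: for each candidate penalty, re-estimate the graph on many subsamples of the data, measure how unstable each edge's selection is across subsamples, and take the smallest (i.e., densest) penalty whose mean instability stays below a threshold β1. The sketch below is illustrative only, with a stand-in single-network estimator (`corr_graph`, which thresholds sample correlations) in place of the joint graphical lasso; all function names are hypothetical, not the stabJGL package API.

```python
import numpy as np

def edge_instability(adjacencies):
    """Mean over node pairs of 2*theta*(1-theta), where theta is the
    fraction of subsamples in which the pair was selected as an edge."""
    adjacencies = np.asarray(adjacencies)
    theta = adjacencies.mean(axis=0)      # selection frequency per pair
    xi = 2 * theta * (1 - theta)          # instability per pair
    iu = np.triu_indices(theta.shape[0], k=1)
    return xi[iu].mean()

def corr_graph(X, lam):
    """Stand-in estimator: threshold absolute sample correlations at lam
    (stabJGL would fit the joint graphical lasso here)."""
    A = (np.abs(np.corrcoef(X, rowvar=False)) > lam).astype(float)
    np.fill_diagonal(A, 0)
    return A

def select_sparsity_penalty(X, lambdas, beta1=0.1, n_sub=20, seed=0):
    """Smallest penalty whose edge-selection instability over subsamples
    does not exceed beta1 (instability shrinks as the penalty grows)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    b = min(n, int(10 * np.sqrt(n)))      # subsample size
    for lam in sorted(lambdas):           # densest candidates first
        adjs = [corr_graph(X[rng.choice(n, b, replace=False)], lam)
                for _ in range(n_sub)]
        if edge_instability(adjs) <= beta1:
            return lam
    return max(lambdas)
```

In stabJGL this stability-based choice of the sparsity parameter is combined with likelihood-based selection of the similarity parameter, and the subsampled fits are what the parallelized implementation distributes across cores.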
With our novel approach, we can identify both common and distinct mechanisms in the proteomic networks of different types of hormonally responsive women's cancers. The results obtained with stabJGL are in line with known biology and complement those of Akbani et al. [2014] by offering additional understanding of the underlying mechanisms in action. By recognizing various proteins as highly central in the proteomic networks, stabJGL suggests their possible involvement in driving the diseases. The method identifies both proteins that are central in all three hormonally responsive cancers (e.g., phosphorylated EGFR) and proteins of tumor-specific relevance (e.g., CD31 in UCEC).
Future extensions of the method could include alternative measures of variability, such as the entropy (see, e.g., Lartigue et al. [2020]). Further, while the method is formulated specifically for the joint graphical lasso with the fused penalty, it can in principle be used for any joint network approach requiring the tuning of sparsity- and similarity-controlling parameters. One potential application is JCGL [Huang et al., 2017], which is based on a group lasso penalty and currently fixes the penalty parameters according to theoretical results.
To conclude, stabJGL provides a reliable approach to joint network inference of omics data. The output can provide a better understanding of both common and data type-specific mechanisms, which can be used for hypothesis generation regarding potential therapeutic targets.

B Timing comparisons
We only consider the AIC selection for FGL, as the eBIC considers the same grid of values and hence has identical running time. All methods are run with the same parameter specifications as in the main simulation study. The simulated networks are set to have 50% of their edges in common, generated with the same approach as in the main simulation study. As discussed by Danaher et al. [2014], the group joint graphical lasso is faster than its fused counterpart. The Bayesian spike-and-slab joint graphical lasso is substantially slower than the other methods, taking around ten times longer than the fused joint graphical lasso and stabJGL. Figure B.2 shows the time used by stabJGL to infer K ∈ {2, 3} networks with various numbers of nodes p and 50% of their edges in common. We see that for K = 2 networks, inference for p = 1,400 nodes is feasible within half an hour, while for K = 3, inference for p = 1,000 nodes is feasible within about eight hours. As discussed by Danaher et al. [2014], there is an explicit solution to the fused joint graphical lasso problem for K = 2, and hence inference is much faster for stabJGL as well in that case.

C Additional simulation scenarios
Additional simulation studies are conducted to assess a wider range of scenarios and compare the performance of the graphical lasso (Glasso), the fused joint graphical lasso (FGL) and the group joint graphical lasso (GGL) tuned by AIC, the fused joint graphical lasso tuned by eBIC, the Bayesian spike-and-slab joint graphical lasso (SSJGL), and stabJGL. Table C.1 shows the network reconstruction performance of the methods in a K = 3 network setting with p = 200 nodes and various similarity of the true graph structures, averaged over N = 100 simulations. Similarly, Table C.2 shows the results for a K = 4 setting with p = 100 nodes. In Table C.3, the results are shown for a K = 2 network setting with p = 100 nodes. Finally, the results from a K = 2 network setting with p = 300 nodes are shown in Table C.4. In the latter case, due to the longer run time of SSJGL demonstrated in Section B, this method is omitted to make the simulation study feasible within reasonable time (< 48 hours). The results from the additional simulations are in line with those from the main simulation study; stabJGL succeeds at capturing both the sparsity level and the similarity between the networks to a better degree than FGL and GGL, while either outperforming the standard graphical lasso for highly similar networks or obtaining comparable results for unrelated networks. FGL with the alternative eBIC selection mostly selects empty graphs. Finally, SSJGL selects very few edges, leading to high precision but very low recall in all cases.

D Choice of variability threshold
Figure D.3 compares the performance of stabJGL for different values of the variability threshold β1 to the graphical lasso (Glasso), the fused joint graphical lasso (FGL), the group joint graphical lasso (GGL) and the Bayesian spike-and-slab joint graphical lasso (SSJGL). The results for FGL tuned with eBIC are not shown, as it selected an empty graph in all settings. The settings considered have K = 2 networks with p = 100 nodes of various similarity. As in the setting considered in the main manuscript, we find that by varying the variability threshold β1 we can obtain at least as high precision and/or recall as the other methods at any level of similarity.

E.1 Degree distributions
Figure E.4 shows the degree distributions of the proteomic networks identified by stabJGL and FGL. While the stabJGL networks all have degree distributions that follow clear power laws, in line with biological expectations, the FGL networks have degree distributions that strongly contradict a power law, with most nodes having node degree > 60.
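A simple qualitative check of the power-law claim is to examine the empirical degree frequencies on a log-log scale, where a power law appears as a roughly straight line with negative slope. A rough sketch of such a check (illustrative only, not the analysis code used for the paper):

```python
import numpy as np

def degree_sequence(A):
    """Node degrees of a symmetric 0/1 adjacency matrix."""
    return np.asarray(A).sum(axis=0).astype(int)

def loglog_slope(degrees):
    """Least-squares slope of log(count) versus log(degree); a clearly
    negative slope on a roughly straight line is the qualitative
    signature of a heavy-tailed, hub-dominated degree distribution."""
    degrees = np.asarray(degrees)
    degs, counts = np.unique(degrees[degrees > 0], return_counts=True)
    slope, _ = np.polyfit(np.log(degs), np.log(counts), 1)
    return slope

# Toy example: a star graph (one hub, nine leaves) is maximally hub-dominated.
A = np.zeros((10, 10))
A[0, 1:] = A[1:, 0] = 1
print(loglog_slope(degree_sequence(A)))  # slope ≈ -1.0
```

A rigorous assessment would instead fit and test the power-law exponent directly; the log-log slope is only a quick visual diagnostic of the pattern described above.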

E.2 Top hubs
Table E.5 shows the node degrees of the proteins with degree larger than the 90th percentile in the respective stabJGL networks of the different tumor types. The corresponding table for the FGL networks is shown in Table E.6.

Figure 1: The workflow of stabJGL, where the network structures of different data types or conditions are jointly estimated and can then be compared.

Figure 2: Performance of the Glasso, FGL and GGL tuned by AIC, SSJGL and stabJGL, reconstructing K ∈ {2, 4} graphs with p = 100 nodes and various similarity of the true graph structures. The similarity between the graphs is shown as the percentage of edges they have in common. The results are averaged over N = 100 replicates and show the precision and recall for the first estimated graph in each setting, reconstructed from n ∈ {100, 150} observations for K = 2 and n ∈ {150, 200, 250, 300} observations for K = 4. Standard deviation bars are shown for all methods. All graphs have true sparsity 0.02.

Figure 3: CPU time in seconds on a logarithmic scale used to jointly infer K ∈ {2, 3, 4} networks with various numbers of nodes p, from n ∈ {100, 150} observations, for FGL tuned with AIC and stabJGL. The computations were performed on a 16-core Intel Xeon CPU, 2.60 GHz.

Figure 5: Pairwise Matthews correlation coefficient between the proteomic network structures of the breast cancer (BRCA), ovarian cystadenocarcinoma (OVCA) and uterine corpus endometrial carcinoma (UCEC) tumors, identified by FGL tuned by the AIC and by stabJGL, respectively.
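The pairwise comparison in Figure 5 amounts to computing the Matthews correlation coefficient between the edge indicators of two estimated graphs. A self-contained sketch (not the analysis code), treating the upper-triangle entries of the two adjacency matrices as binary classifications of node pairs:

```python
import numpy as np

def pairwise_mcc(A, B):
    """Matthews correlation coefficient between two network structures,
    comparing their upper-triangle edge indicators; 1 means identical
    edge sets, -1 means complementary edge sets."""
    iu = np.triu_indices(np.asarray(A).shape[0], k=1)
    a = np.asarray(A)[iu].astype(bool)
    b = np.asarray(B)[iu].astype(bool)
    tp = float(np.sum(a & b)); tn = float(np.sum(~a & ~b))
    fp = float(np.sum(~a & b)); fn = float(np.sum(a & ~b))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

# A path graph on four nodes compared with itself gives MCC = 1.
path = np.diag(np.ones(3), k=1) + np.diag(np.ones(3), k=-1)
print(pairwise_mcc(path, path))  # 1.0
```

Applied to each pair of tumor-type networks, this yields the similarity matrix visualized in Figure 5.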

Figure 6: The proteomic network structure identified by stabJGL common to all three tumor types (BRCA, UCEC and OVCA). The node size indicates the degree in the common network structure, with proteins with more edges being represented by larger nodes.
Figure B.1: CPU time in seconds on a logarithmic scale used to jointly infer K ∈ {2, 4} networks with various numbers of nodes p, from n ∈ {100, 150} observations, for the fused and group joint graphical lasso with AIC penalty parameter selection (FGL and GGL), the Bayesian spike-and-slab joint graphical lasso (SSJGL) and stabJGL. The computations were performed on a 16-core Intel Xeon CPU, 2.60 GHz.

Figure B.2: CPU time in seconds on a logarithmic scale used by stabJGL to jointly infer (a) K = 2 networks with various numbers of nodes p from n1 = n2 = 500 observations and (b) K = 3 networks with various numbers of nodes p from n1 = n2 = n3 = 500 observations. The computations were performed on a 16-core Intel Xeon CPU, 2.60 GHz.
[Continuation of the caption of Table C.1:] The results are averaged over N = 100 simulations and show the sparsity, precision, and recall of each of the K = 3 estimated graphs, with the corresponding standard deviations. The graphs are reconstructed from n observations. All graphs have sparsity 0.01. The average selected values of the penalty parameters λ1 and λ2 for the relevant methods are shown as well.
[Continuation of the caption of Table C.2:] The results are averaged over N = 100 simulations and show the sparsity, precision, and recall of each of the K = 4 estimated graphs, with the corresponding standard deviations. The graphs are reconstructed from n observations. All graphs have sparsity 0.02. The average selected values of the penalty parameters λ1 and λ2 for the relevant methods are shown as well.

Figure E.4: Histogram of the node degrees of the proteomic network of each tumor type, for the (a) stabJGL and (b) FGL networks.

Table 1: Performance of the different graph reconstruction methods in simulations, reconstructing graphs with p = 100 nodes from K = 3 networks with various similarity of the true graph structures. The methods included are Glasso, FGL and GGL tuned by AIC, FGL tuned by eBIC, SSJGL and stabJGL. The similarity (percentage of edges that are in common) of the graphs is shown. The results are averaged over N = 100 simulations and show the sparsity, precision, and recall of each of the K = 3 estimated graphs. The corresponding standard deviations are shown in parentheses. The graphs are reconstructed from n1 = 150, n2 = 200 and n3 = 300 observations. All graphs have sparsity 0.02. The average selected values of the penalty parameters λ1 and λ2 for the relevant methods are shown as well.

Table 2: Performance of stabJGL for different values of the variability threshold β1 on simulated data, compared to other graph reconstruction methods. The methods are used to estimate graphs with p = 100 nodes from K = 2 networks, both of sparsity 0.02, of which 20% of their edges are in common. The performance of stabJGL is compared to that of Glasso, FGL, GGL and SSJGL. The results are averaged over N = 100 simulations and show the sparsity, precision, and recall of each of the K = 2 estimated graphs. The corresponding standard deviations are shown in parentheses. The graphs are reconstructed from n1 = 100 and n2 = 150 observations. The average selected values of the penalty parameters λ1 and λ2 for the relevant methods are shown as well.

Table 3: Network analysis results for stabJGL and FGL tuned by the AIC, applied to data from breast cancer (BRCA), ovarian cystadenocarcinoma (OVCA) and uterine corpus endometrial carcinoma (UCEC) tumors.

Table 4: Comparison of evidence for edges in the respective FGL (tuned by AIC) and stabJGL proteomic networks of breast cancer (BRCA), ovarian cystadenocarcinoma (OVCA) and uterine corpus endometrial carcinoma (UCEC) tumors, considering experimentally determined protein-protein interactions documented in the STRING database. The highest percentage of edges with evidence is in bold.

Table C.1: Performance of the different graph reconstruction methods in simulations, reconstructing graphs with p = 200 nodes from K = 3 classes with various similarity of the true graph structures.

Table C.3: Performance of the different graph reconstruction methods in simulations, reconstructing graphs with p = 100 nodes from K = 2 classes with various similarity of the true graph structures.

Table E.5: The genes with node degree larger than the 90th percentile in the respective stabJGL networks of the different tumor types. The genes that have node degree in the upper 10% in all three tumor types are marked in bold. The genes that only have node degree in the upper 10% in one tumor type are marked in red.

Table E.6: The genes with node degree larger than the 90th percentile in the respective FGL networks of the different tumor types. The genes that have node degree in the upper 10% in all three tumor types are marked in bold. The genes that only have node degree in the upper 10% in one tumor type are marked in red.