-
PDF
- Split View
-
Views
-
Cite
Cite
Wessel N. Wieringen, Carel F. W. Peeters, Renee X. Menezes, Mark A. Wiel, Testing for Pathway (in)Activation by Using Gaussian Graphical Models, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 67, Issue 5, November 2018, Pages 1419–1436, https://doi.org/10.1111/rssc.12282
- Share Icon Share
Summary
Genes work together in sets known as pathways to contribute to cellular processes, such as apoptosis and cell proliferation. Pathway activation, or inactivation, may be reflected in varying partial correlations between the levels of expression of the genes that constitute the pathway. Here we present a method to identify pathway activation status from two-sample studies. By modelling the levels of expression in each group by using a Gaussian graphical model, their partial correlations are proportional, differing by a common multiplier that reflects the activation status. We estimate model parameters by means of penalized maximum likelihood and evaluate the estimation procedure performance in a simulation study. A permutation scheme to test for pathway activation status is proposed. A reanalysis of publicly available data on the hedgehog pathway in normal and cancer prostate tissue shows its activation in the disease group: an indication that this pathway is involved in oncogenesis. Extensive diagnostics employed in the reanalysis complete the methodology proposed.
1 Introduction
Gene sets, known as pathways, are involved in many cellular processes, such as apoptosis and cell proliferation. When a pathway is observed to be activated in one group of samples, but not in another, it indicates that pathway activation status may help to explain differences between these two groups. Many pathways, such as deoxyribonucleic acid damage repair and TP53 signalling, are known to be involved in oncogenesis (Weinberg, 2006). Changes in activation status of such pathways between tumour and normal samples are thus indicative of cancer. Although individual study of genes in such pathways is possible, it is more efficient to look at changes in pathways as a whole, as these are representative of the entire process.
Transcription levels of genes in a pathway are often described by a Gaussian graphical model (GGM) (see for example Dobra et al. (2004)). Such a model represents interactions between genes in a pathway as conditional dependences. All genes and their interactions amount to a conditional independence graph, in which nodes represent the genes and the presence of edges corresponds to conditional dependences (Whittaker, 1990). Under the assumption of multivariate normality, the conditional dependences between the genes in the pathway can be inferred from the scaled inverse covariance matrix (Whittaker, 1990). This scaled inverse covariance matrix contains the minus partial correlations, i.e. the correlations between a pair of genes conditionally on all other genes in the pathway. The size of the partial correlations reflects the strength of the conditional dependences: the smaller the absolute partial correlation, the weaker the conditional dependence between the associated gene pair. In case a partial correlation is negligible, the two genes can be considered conditionally independent.
The pathway activation status can be captured within a GGM, as the strength of the conditional dependences reflects the partial correlations. Indeed, within an inactive pathway genes typically display weaker conditional dependences compared with when the pathway is active. The conditional dependence structure can be visualized as a graph, where genes are represented by nodes and conditional dependences of a gene pair are symbolized by the presence of an edge between the nodes of this pair. Similarly, conditional independence is represented by the absence of an edge between the genes. Fig. 1 portrays the graph of an active and an inactive pathway, with edge width representing the strength of the conditional dependences. The narrower width of the edges in the inactive pathway indicates weakened conditional dependence compared with its active counterpart.

A pathway in the two conditions (a) ‘active’ and (b) ‘inactive’: in both conditions the pathway is represented by its conditional independence graph, in which an edge represents a conditional dependence between a gene pair; the edge width indicates the strength of the conditional dependence, as reflected in the partial correlations absolute value; the global difference, strong (active) versus weak (active), between these strengths distinguishes the two conditions
Here we present a test to compare pathway activation status between two groups of samples. In Section 2 we propose the use of a joint GGM for both samples, in which differential conditional dependence is represented by a single parameter. In Section 3, we describe how the model parameters can be estimated to accommodate various sample-to-pathway sizes. In Section 4, it is outlined how differential conditional dependence is tested within the model proposed. The performance of estimating the differential conditional dependence parameter is investigated by using a simulation study in Section 5. Finally, in Section 6 we use the method to reanalyse in vivo prostate gene expression data, corroborating the previously in-vitro-found activation of the hedgehog signalling pathway in cancer. The reanalysis introduces several diagnostical techniques that complement the method.
1.1 Related work
Reconstruction of differential GGMs is an active field of research. Published methods can roughly be categorized on the basis of the cross-tabulation of two criteria:
- (a)
are network differences identified by estimation or testing?;
- (b)
are the to-be-detected differences local (e.g. an edge or subnetwork) or global (e.g. some aggregated property of the whole network)?
Methods that focus on identification of local differences joint precision estimation in penalized fashion can be found in Guo et al. (2011), Danaher et al. (2014), Zhao et al. (2014), Price et al. (2015), Bilgrau et al. (2015) and Saegusa and Shojaie (2016). Ha et al. (2015) and Xia (2017) augmented such approaches by a post hoc test procedure for differential edge identification. Alternatively, Peterson et al. (2015) and Mitra et al. (2016) described Bayesian approaches to estimate groupwise precision matrices jointly and to derive the posterior probability of an edge differing between networks. Testing methods for local differences (under a sparsity assumption) are given in Xia et al. (2015) and Städler and Mukherjee (2015). There also exist ‘global plus estimation’ methods that are more heuristic in the sense that these construct the network for each group separately followed by calculation and comparison of various network metrics (Fuller et al., 2007). The method proposed tests for a global difference between two (saturated) networks. Related is the omnibus-type approach of Schott (2007) which tests for a difference in the Frobenius norm of both sample covariance matrices and is a natural competitor to the work that is presented here (see Section 5). For more related methods and an overview of the field refer to the review of Cai (2017).
2 Model
Consider an oncogenomics study with an observational experimental design involving n samples. The transcriptome of each sample in the study has been profiled. The interest lies in the expression levels of a set of p genes that constitute a pathway. Let Yi, j be the expression level of gene j in sample i. The expression profile of sample i of all p genes that belong to the pathway is denoted by Yi,* and the collection of n expression profiles by Y*,*.
The n samples originate from two groups, which we take to represent two different stages of cancer. One group is thought of as being further advanced in the disease. With the progression of cancer, the pathway under consideration is assumed to be dysregulated, so that the pathway activation status will differ between the groups.
The gene expression data Y*,* are modelled as follows. Let for and for . The covariance matrix Σ1 is parameterized as Σ1 = ϒ1Ω−1ϒ1 with ϒ1 a diagonal matrix such that Ω is the standardized precision matrix (related to the partial correlation matrix up to the sign of the off-diagonal elements). The covariance matrix Σ2 is parameterized as Σ2 = ϒ2{(1−γ)Ω + γIpp}−1ϒ2 with ϒ2 a diagonal matrix and γ ∈ [0, 1]. If γ = 0, there is no difference between the two groups. If γ = 1, all conditional dependences in the second group have vanished. Other values of γ, i.e. γ ∈ (0, 1), represent intermediate degrees of differential conditional dependence. Hence, any γ ∈ (0, 1] corresponds to pathway inactivation in the more advanced stages of cancer. Note that γ represents an aggregation of differential conditional dependence, saying little about the conditional dependence of a particular gene pair. The difference in distribution between the groups when γ > 0 is illustrated in the on-line supplementary material (SM) section I.
The joint model describing both groups is identifiable. This is obvious for the parameters of the first group's GGM. However, the GGM of the second group is overparameterized. However, by coupling the standardized precision matrices between both groups, all its parameters become identifiable. As Ω is estimable from the first group and ϒ2 from the marginal variances of S2, there remain degrees of freedom for the estimation of the scalar γ.
3 Estimation
In this section we describe how the parameters of the two-sample GGM are estimated in a classical context in Section 3.1, and in a penalized context in Section 3.2. Finally, in Section 3.3 we outline the choice of the penalty parameter.
3.1 Basics
- (a)
2γp1 > 0,
- (b)
the elements of are real,
- (c)
is semipositive definite,
- (d)
is positive definite and
- (e)
the sum of a positive definite and a semipositive definite matrix is positive definite (see lemma 14.2.4 of Harville (2008)).
Note, however, that the estimator need not have a unit diagonal. To achieve this, is standardized to have a unit diagonal, which warrants the interpretation of as a standardized precision matrix.
The estimators of the individual parameters γ, Ω, ϒ1 and ϒ2 are combined into a single estimation procedure. This gives an iterative procedure in which at each iteration the individual estimators are updated, keeping the others constant. The estimation is terminated if the absolute difference between the (penalized) log-likelihood of two successive iterations is smaller than a user-specified threshold. A full pseudocoded description of the algorithm producing the ML estimates can be found in the on-line SM sectionI I.
3.2 Penalized estimation
- (a)
unique, which follows from the concavity of the two summands that constitute the penalized log-likelihood, and
- (b)
shrinks to the identity matrix as λ → ∞.
Instead of the penalty employed we may consider an l1-penalty. The resulting estimate would be sparse, i.e. have many off-diagonal elements equalling 0. The sparsity of such an estimate appeals to the parsimony principle that prefers simple models. However, a simple model should not oversimplify. The l1-methods generally assume a severe degree of sparsity to warrant theoretical results. Depending on the context the sparsity required may be too stringent. Here the focus is on molecular biology, where the validity of a stringent sparsity assumption is doubtful (see Boyle et al. (2017)). The method proposed, making no assumption on sparsity, seems then to be more appropriate for the application in mind. Moreover, a sparse estimate may hamper the estimation of the differential conditional dependence parameter. It can then only be learned from the few non-zero elements of the sparse standardized precision matrix estimate, as it is unidentifiable from the 0s.
Note that, although the regularization of the estimation of Ω leaves the estimating equation of γ unchanged, it affects the estimate of γ. This is because an increase in either γ or λ reduces the size of the off-diagonal elements of the standardized precision matrix (pari passu the partial correlation matrix) of the second group. Consequently, the larger λ is, the closer the penalized estimate of Ω to Ipp, and the smaller the change in the partial correlations that can be assigned to γ. In particular, it is clear from equation (4) that γ cannot be estimated in the limit λ → ∞. In conclusion, for large λ the γ-parameter is likely to be underestimated and thus the differential correlation between the two conditions.
At the other end of the penalty scale, as λ approaches 0, the opposite may happen: the estimate of γ could be biased upwards. To see this, consider the setting in which is (close to being) of deficient rank, i.e. . When λ↓0 the largest eigenvalue of tends to ∞ (because of the rank deficiency of ). Consequently, the right-hand side of γ's estimating equation (4) is dominated by the contribution of this eigenvalue to the trace. In contrast, on the left-hand side of equation (4) its contribution is {(Dω)1, 1−1}/{(1−γ)(Dω)1, 1 + γ}, where (Dω)1, 1 corresponds to the first and thus largest eigenvalue of . For small γ and large (Dω)1, 1 the latter contribution is (relatively) small and does not balance the right-hand side of equation (4). Then, to satisfy its estimating equation, γ must compensate for the ill-conditioned , irrespectively of the ‘true’ value of γ. In particular, if the true γ is small, its estimate will shy away from 0. Hence, in undersampled situations too little penalization may bias the estimate of γ upwards.
Penalized estimation procedures have a Bayesian counterpart. To identify the latter for the case at hand assume a Wishart prior π(Ω), with , on Ω. The Bayesian mode a posteriori estimator will then coincide with the proposed penalized estimator. This is immediate after noting that
- (a)
the posterior is proportional to the likelihood times the prior,
- (b)
the location of the maximum of the posterior is that of the log-posterior and
- (c)
substitution of the prior's parameter in the Wishart density reveals that and the logarithm of the prior is thus proportional to the employed penalty.
Put together, the log-posterior and the penalized log-likelihood are proportional and share the same maximizer. In particular, the mode a posteriori and penalized estimator share the same λ-limiting behaviour: as λ increases π(Ω) becomes more concentrated around zero with the posterior in its trail.
3.3 The penalty parameter
The optimal penalty parameter is chosen by means of conventional leave-one-out cross-validation (LOOCV). LOOCV leaves out the samples one by one, refits the model for the remaining samples and evaluates the performance (here operationalized as the unpenalized log-likelihood) of the refitted model on the left-out sample. The penalty parameter that yields on average (over the left-out samples) the best performance is considered optimal.
4 Testing for pathway activation status
Here we describe how to evaluate whether there is a difference in the ‘average’ edge strength between the groups, represented by the parameter γ: in other words, how to test the null hypothesis H0:γ = 0 against its alternative Ha:γ > 0. A natural test statistic for this is the estimate of γ itself. The null distribution of this test statistic is generated through permutation of the group labels. Denote for permutations k = 1, …, K the test statistic, the estimated γ, by Tk. The set of test statistics that is obtained through permutation forms the null distribution associated with H0:γ = 0 from which the p-value can be estimated by . When interest lies only in the p-value exceeding the significance level, the efficient permutation scheme that was proposed by Van Wieringen et al. (2008) may be used. Its efficiency gain is due to preliminary termination of the permutation scheme when the probability that the p-value is small is quite low (under the assumption of a binomial process). To decide on this, the 99.9% confidence interval of the current p-value estimate is calculated after a limited number of permutations. When the lower confidence bound does not exceed the significance level, the true p-value is unlikely to be smaller and the permutation scheme may be terminated.
The test statistic employed is estimated in a penalized fashion. Consequently, inference is conditional on the penalty parameter λ. As the choice of λ may affect the inference, we ensure its representativeness under H0 through two modifications of the scheme that was outlined in Section 3.3:
- (a)
fix γ = 0 (as prescribed by H0) in the estimation procedure and
- (b)
permute the group labels before the selection of the penalty.
The latter warrants that λ is not only chosen under the null model, but also under the ‘null data’. Finally, to eliminate artefacts that are introduced by permutation of the group labels before penalty selection, the selection procedure may be repeated.
- (a)
diagonalizing simultaneously A, B and C and the current approximation Z of W,
- (b)
updating the error E through formula (8) by using the diagonalized matrices that are acquired in the previous step,
- (c)
updating Z by using the latest estimate of the error E and
- (d)
repeating until convergence.
With an estimate of W at hand, those of X and, consequently, ϒ are readily acquired.
5 Simulation
We now evaluate the performance of the above estimation and testing procedure by means of a simulation study. Here, several parameters are fixed throughout, whereas a full factorial design is set up for the remaining parameters.
In all simulations the standarized precision Ω is chosen to be symmetric with unit diagonal and three non-zero off-diagonal bands: (Ω)j, j + 1 = 0.5 = (Ω)j + 1, j for j = 1, …, p−1, (Ω)j, j + 2 = 0.25 = (Ω)j + 2, j for j = 1, …, p−2 and (Ω)j, j + 3 = 0.1 = (Ω)j + 3, j for j = 1, …, p−3. Similarly, the diagonal matrices ϒ1 and ϒ2 are set as equal to the identity matrix, (ϒ1)j, j = 1 = (ϒ2)j, j for j = 1, …, p, and kept constant throughout. While Ω, ϒ1 and ϒ2 are fixed, a full factorial design is built with the remaining free parameters γ, p and . The values of the remaining parameters are set to γ ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}, p ∈ {25, 50, 75, 100} and n ∈ {25, 50, 75, 100, 125, 150}. The ranges of the last two parameters are motivated from practice. The range of p represents realistically sized and sufficiently specific pathways, whereas that of n covers the typically observed sample sizes of genomics studies. With the parameters at hand, data are drawn in accordance with the model that was outlined in Section 2: for i = 1, …, n and for i = n + 1, …, 2n.
To assess the behaviour of the estimation procedure, the model parameters γ, Ω, ϒ1 and ϒ2 are estimated from the generated data by using the estimation procedure in Section 3, for a given choice of the penalty parameter λ in a grid covering the interval [10−2, 1]. Each design point of the factorial layout is repeated 100 times. The 100 resulting estimates (or statistics derived thereof) of the parameters are then aggregated (by means of the 10%, 50% and 90% quantiles) and plotted against the penalty parameter λ.
The simulation results on the estimation of γ for all settings (combinations) of the factorial design are displayed in Figs 2(a) and 2(b) and in the on-line SM section III. The obvious conclusion from these plots is that for a reliable and unbiased estimate of γ a reasonable (or large) number of samples (and little to no penalization) is required. In all other cases, the estimate of γ will be biased. Nevertheless, estimates are concordant with the true value of the parameter, so long as the data are not dramatically undersampled and appropriate penalization is obtained.

Simulation results: (a), (b) estimated γ plotted against the (logarithm of the) penalty parameter λ for γ ∈ {0.0 (), 0.1 (
), 0.2 (
), 0.3 (
), 0.4 (
), 0.5 (
)} (samples sizes n = 25 and n = 150 both with a (25 × 25)-dimensional Ω) (
, penalty parameter at which the estimates of Ω (on average over all 6 × 100 simulations) have a condition number equal to the true Ω); (c), (d) estimated power functions versus significance level α (parameters n and p and γ-colouring are as in (a) and (b))
To assess the power of the proposed test for H0:γ = 0 versusHa:γ > 0, parameter choices for γ, Ω, ϒ1 and ϒ2, as well as values for n and p, were as in the scheme described. Here the combination (n = 25, p = 100) is excluded, as the estimates of γ did not reveal a concordant pattern with the true values of γ. Note that the significance α is k/100 with k = 1, …, 20.
The power is assessed as follows. For iteration l,
- (a)
draw from the null model (H0:γ = 0),
- (b)
determine λl for iteration l,
- (c)
obtain the permutation distribution of the test statistic for by using 1000 permutations, given λl,
- (d)
find critical value cα, l for γ (using various choices of the level of significance α) and
- (e)
draw 1000 times from the null model, estimate γ for each draw and assess whether this estimate exceeds the critical value cα, l: the fraction exceeding estimates is the power estimate.
This procedure is repeated for 100 iterations to obtain 100 power estimates. The 100 resulting power estimates are then aggregated by means of the 10%, 50% and 90% quantiles, and plotted against the significance level α. To evaluate the power under non-zero γs, we also draw from Ha rather than from H0.
The simulation results on the power of the test for all combinations of the factorial design are displayed in Figs 2(c) and 2(d) and in the on-line SM section IV. We display the results by fixing three of the parameters and examining the relationship between the remaining quantities on the power. The following conclusions can be drawn from these plots. If H0:γ = 0 is true, the power–significance relationship (i.e. 1−β = f(α)) nicely follows the line x = y, indicating that the test exhibits proper behaviour under H0. Furthermore, the power of the test increases with both γ and the sample size n. An increase in the dimension p also improves the power. Hence, for larger values of p the test is not hampered by the biased estimation of γ; rather it still has power to distinguish between different values of γ on the basis of data because of the concordancy of the estimation procedure. These results provide sample size recommendations. For instance, for a study involving a p = 25-sized pathway where it is sought to detect a differential conditional independence of γ = 0.5 at significance α = 0.05 and power 1−β = 0.80, sample sizes of suffice.
The performance of the proposed testing procedure is evaluated under a misspecified alternative Ha (see the on-line SM section V for details and results). This reveals that the test proposed still has power under a misspecified alternative, but either the sample size should be relatively large, the pathway should be reasonably sized or the proportion of the pathway exhibiting differential conditional dependence should not be too small.
We also compare the proposed testing procedure with the omnibus-type test of Schott (2007). The competing test employs the test statistic , where both and are ridge covariance estimates (Van Wieringen and Peeters, 2016) with a common penalty parameter chosen to be the average of their individual LOOCV penalty parameters. The p-value is then calculated by permutation. The power curves for this test can be found in the on-line SM section VI. The omnibus-type test performs competitively to the proposed test for large sample sizes, strong signals and large levels of significance but is outperformed by our test for smaller sample sizes and weaker signals. For instance, when n = 75, p = 25 and α = 0.05 the power to detect differential conditional independence of γ = 0.2 is approximately 0.9 and 0.99 for the omnibus-type and the proposed test respectively. Moreover, where a significant result from the omnibus-type test gives no clue about the reason for the difference, the test proposed provides a clear pointer for the difference detected.
6 Application
We applied our proposed test to reanalyse gene expression data from several prostate cancer studies. The focus is on the hedgehog signalling pathway, which is known to contribute to cellular proliferation in cancer (Evangelista et al., 2006). Both Karhadkar et al. (2004) and Sheng et al. (2004) have reported the in vitro activation of the hedgehog signalling pathway in cancer, based on significant differences in mean expression levels of hedgehog-related biomarkers. This is corroborated by a simple differential expression analysis, which reveals multiple significant gene families in the six in vivo data sets (Table 1). Here we use the proposed methodology to investigate whether this activation is also noticeable in the conditional independences among the pathway's genes.
Results of the test of differential conditional dependence in the hedgehog signalling pathway from six gene expression omnibus prostate cancer data sets (with in both the second and the third data set one outlying sample removed)†
Data set . | Number normal . | Number cancerous . | Number of families . | Number significant . | λLOOCV . | . | p-value . |
---|---|---|---|---|---|---|---|
GSE6919_1 | 81 | 90 | 13 | 7 | 3.36 × 10−8 | 0.19659 | 0.00100 |
GSE6919_2 | 76 | 91 | 13 | 4 | 1.48 × 10−8 | 0.57153 | 0.00100 |
GSE6919_3 | 75 | 89 | 8 | 4 | 1.57 × 10−8 | 0.36269 | 0.00400 |
GSE29079 | 48 | 47 | 17 | 14 | 3.09 × 10−8 | 0.01481 | 0.88212 |
GSE68907 | 50 | 52 | 13 | 1 | 4.45 × 10−8 | 0.01977 | 0.72827 |
GSE3933 | 41 | 71 | 16 | 7 | 1.12 × 10−8 | 0.23647 | 0.00100 |
Data set . | Number normal . | Number cancerous . | Number of families . | Number significant . | λLOOCV . | . | p-value . |
---|---|---|---|---|---|---|---|
GSE6919_1 | 81 | 90 | 13 | 7 | 3.36 × 10−8 | 0.19659 | 0.00100 |
GSE6919_2 | 76 | 91 | 13 | 4 | 1.48 × 10−8 | 0.57153 | 0.00100 |
GSE6919_3 | 75 | 89 | 8 | 4 | 1.57 × 10−8 | 0.36269 | 0.00400 |
GSE29079 | 48 | 47 | 17 | 14 | 3.09 × 10−8 | 0.01481 | 0.88212 |
GSE68907 | 50 | 52 | 13 | 1 | 4.45 × 10−8 | 0.01977 | 0.72827 |
GSE3933 | 41 | 71 | 16 | 7 | 1.12 × 10−8 | 0.23647 | 0.00100 |
Under the alternative hypothesis the conditional dependences are smaller in the normal group. The columns (from left to right) contain the omnibus data set identifier, the number of normal samples, the number of cancer samples, the number of gene families comprising the pathway, the number of gene families exhibiting differential expression (based on the Wilcoxon test and after Hochberg multiplicity correction), the LOOCV optimal penalty parameter, the estimated differential conditional independence parameter and the p-value.
Results of the test of differential conditional dependence in the hedgehog signalling pathway from six gene expression omnibus prostate cancer data sets (with in both the second and the third data set one outlying sample removed)†
Data set . | Number normal . | Number cancerous . | Number of families . | Number significant . | λLOOCV . | . | p-value . |
---|---|---|---|---|---|---|---|
GSE6919_1 | 81 | 90 | 13 | 7 | 3.36 × 10−8 | 0.19659 | 0.00100 |
GSE6919_2 | 76 | 91 | 13 | 4 | 1.48 × 10−8 | 0.57153 | 0.00100 |
GSE6919_3 | 75 | 89 | 8 | 4 | 1.57 × 10−8 | 0.36269 | 0.00400 |
GSE29079 | 48 | 47 | 17 | 14 | 3.09 × 10−8 | 0.01481 | 0.88212 |
GSE68907 | 50 | 52 | 13 | 1 | 4.45 × 10−8 | 0.01977 | 0.72827 |
GSE3933 | 41 | 71 | 16 | 7 | 1.12 × 10−8 | 0.23647 | 0.00100 |
Data set . | Number normal . | Number cancerous . | Number of families . | Number significant . | λLOOCV . | . | p-value . |
---|---|---|---|---|---|---|---|
GSE6919_1 | 81 | 90 | 13 | 7 | 3.36 × 10−8 | 0.19659 | 0.00100 |
GSE6919_2 | 76 | 91 | 13 | 4 | 1.48 × 10−8 | 0.57153 | 0.00100 |
GSE6919_3 | 75 | 89 | 8 | 4 | 1.57 × 10−8 | 0.36269 | 0.00400 |
GSE29079 | 48 | 47 | 17 | 14 | 3.09 × 10−8 | 0.01481 | 0.88212 |
GSE68907 | 50 | 52 | 13 | 1 | 4.45 × 10−8 | 0.01977 | 0.72827 |
GSE3933 | 41 | 71 | 16 | 7 | 1.12 × 10−8 | 0.23647 | 0.00100 |
Under the alternative hypothesis the conditional dependences are smaller in the normal group. The columns (from left to right) contain the omnibus data set identifier, the number of normal samples, the number of cancer samples, the number of gene families comprising the pathway, the number of gene families exhibiting differential expression (based on the Wilcoxon test and after Hochberg multiplicity correction), the LOOCV optimal penalty parameter, the estimated differential conditional independence parameter and the p-value.
The data originate from six prostate cancer studies that are publicly available from the gene expression omnibus repository (Edgar et al., 2002). Each study comprises normal and cancer samples (Table 1). For five of these studies (studies 1, 2, 3, 4 and 6), preprocessed data were available. A sixth study (study 5) was preprocessed as detailed in the on-line SM section VII. Of each study, probes interrogating genes which entrezID map to the Kyoto Encyclopedia of Genes and Genomes definition (Ogata et al., 1999) of the hedgehog signalling pathway were retained. Expression levels of selected probes that contribute to the same family (as defined by the Encyclopedia and for completeness given in the SM section VII) are summarized by their median. The number of gene families (see Table 1) may differ between studies because of the employed microarrays interrogating different sets of genes. Finally, the second and third data sets (GSE6919_2 and GSE6919_3) both contain an outlying sample (as can be seen from their principal component plots, which are not shown). These samples are removed, even though results were not qualitatively affected (see the SM section VII for results with outlying samples).
In each data set, we test whether the hedgehog signalling pathway is activated (i.e. has stronger partial correlations) in the cancer group. For this, the optimal penalty parameter is determined by LOOCV under H0 (see Section 4). Given the penalty parameter selected, the test statistic γ is estimated and its likelihood under H0:γ = 0 (versusHa:γ > 0) is evaluated by using 1000 permutations.
Table 1 contains the results (i.e. λLOOCV, and p-value). In each data set, λLOOCV was rather small. This is due to the reasonable number of samples in these studies in comparison with the pathway size. In four of the six data sets, the estimated γ hinted at stronger conditional dependences in the cancer group and γ significantly different from 0, whereas the other two are virtually 0. Test results on Gaussianized data, obtained through gene familywise application of the probability integral transformation that preserves conditional independences (Liu et al., 2009), confirmed conclusions in Table 1 (the SM section VII). This suggests that removal of the two outlying samples is acceptable.
Considering now the GSE29079 data set, for which the test result was not significant, we noted that it involves many differentially expressed gene families. Therefore, an explanation for its non-significance may be the constitution of the tumour group, comprising two subgroups: TMPRSS2-ERG fusion positive and negative (and several unclassified) tumours (Börno et al., 2012). Heterogeneity often leads to the dilution of partial correlations (Van Wieringen and Van der Vaart, 2015) and, as such, may obscure the activation of the hedgehog signalling pathway. In the case of the GSE68907 data set, also with a not-significant test result, no explanation can be offered except that it also exhibits little differential expression.
The results that are presented in Table 1 are not unanimous about the activation of the hedgehog signalling pathway. We therefore proceeded further exploring analyses results. We started by looking into the possibility of the inactivation of the hedgehog signalling in cancer. For this, the roles of the normal and cancer groups were simply reversed and the analyses repeated. This revealed (see the on-line SM section VII) that all estimated γs were either 0 or virtually 0, suggesting that inactivation is generally unlikely.
The test results implied strengthened partial correlations in the cancer group in several of the data sets. To verify this, the precision matrices in both groups were estimated by means of the fused ridge ML approach of Bilgrau et al. (2015), in which LOOCV-chosen penalty parameters were again rather small (not shown). The partial correlations that were obtained were compared between the groups by using violin plots (Fig. 3(a) for the GSE6919_3 data set—results for the remaining data sets are in the on-line SM section VII). These violin plots show a comparable median partial correlation in both groups, but the larger quartile box and a heavier skewed tail confirm the significant test result for GSE6919_3.

(a) Violin plot of estimated partial correlations against group, (b) scatter plot of partial correlations from the estimated standardized precision against those obtained from the fused ridge groupwise precision matrix estimates (, normal;
, cancer), (c) empirical cumulative distribution functions of the absolute partial correlations from the fused ridged groupwise precision matrix estimates (
, normal;
, cancer) and (d) marginal variance estimates ( and ) of both groups plotted against each other: all figures relate to the GSE6919_3 data set only
The presence of differential conditional dependence can be assessed visually in various ways. The estimated partial correlations can be plotted against those obtained from the fused ridge estimates of the groupwise precision matrices. To emphasize the salient features, the latter were regressed (in line with the model presented without intercept) on the former. A steeper line indicates strengthened conditional dependences and is indeed observed for most data sets (see Fig. 3(b) for the GSE6919_3 data set, and the on-line SM section VII for the remaining data sets).
Subsequently, we examined the absolute partial correlations distributions to check whether these changed between groups. For this we used the empirical cumulative distribution function which, when shifted to the right (away from zero), indicates stronger conditional dependences. The empirical cumulative distribution functions of both groups in each data set were plotted (see Fig. 3(c) for the GSE6919_3 data set, and the on-line SM section VII for the remaining data sets). For data sets exhibiting significantly differential conditional dependence, these plots revealed a right-shifted empirical cumulative distribution function for the cancer group.
Another important check is that of adequacy of the global likelihood maximization, via the joint loss function of parameters from both groups. Here the inclusion of the differential conditional independence parameter need not lead to a better fit in the normal group. This was assessed by evaluating the Frobenius and quadratic loss of and , compared with the fused ridge standardized precision estimate of the normal sample. We found that a precision matrix with globally weakened conditional dependences—i.e. inclusion of a non-zero γ—generally leads to an improved description (not shown), in terms of fit criteria other than the (joint) likelihood, of the data from the normal sample.
The equal variance assumption (ϒ1 = ϒ2) under which the null hypothesis H0:γ = 0 is also examined. For this, the unconstrained model is fitted to the data by using the penalty parameter from the original constrained analysis. For each data set, the resulting are plotted against the . The plots (see the on-line SM section VII) do not give substantial reason to doubt the equal variance assumption. Finally, the thus—without the equal variance assumption—re-estimated γs are compared with those reported in Table 1. The two estimates do not differ substantially.
Once it has been concluded that the hedgehog signalling pathway is implicated in cancer, interest shifts towards identification of its constituents that are most likely to be responsible for this activation. To assess the contribution of each gene family, we evaluated the effect of its ‘removal’ from the pathway (through conditioning) on the differential condition dependence parameter. This was done by replacing a group sample covariance matrix, e.g. S1, by its conditioned counterparts: . The estimated differential conditional dependence parameter is denoted by , making use of the penalty parameter from the original analysis. Their average approximately equals the original parameter estimate: . We observed that the ‘removal’ of gene families with positively deviating from resulted in stronger conditional dependences among the remaining families. Similarly, when removing gene families for which negatively deviated from resulted in weaker conditional dependences for remaining families. The WNT, PKA and Ci (also called GLI) gene families exhibit the largest positive deviations (consistently in significant data sets), whereas the CK1 (also called CSNK1), Gas1 and GSK3B families deviate most consistently in the opposite direction. Strikingly, the first three are known to be implicated in oncogenic processes, whereas the last three are associated with tumour suppressing functionalities (e.g. Del Sal et al. (1992) and Fuja et al. (2004)). With the effect of their ‘removal’ in mind, it may be suggested that ‘oncogenic families’ have a dysregulating effect on the pathway, reflected in weaker conditional dependences. Conversely, ‘tumour suppressing families’ appear to stimulate cohesion among the pathway's constituents.
7 Discussion
We have addressed the problem of detecting differences in pathway activation status between two groups. For each group, we posed a GGM with different marginal variances but proportional partial correlations. The proportionality is parameterized by a single parameter representing differential systematic conditional dependence. Parameters of both GGMs are estimated jointly by an iterative procedure maximizing the likelihood. The estimation procedure is extended beyond the realm of low dimensional studies by augmenting the likelihood with a non-informative concave penalty on the standardized precision matrix. The associated penalty parameter is chosen by means of cross-validation and shown to have an interpretation as the mixing parameter between the data and a non-informative sample. The capacity of the proposed estimation procedure to estimate the differential conditional dependence parameter was studied in simulation, revealing good performance in low dimensional studies (), yielding useful estimates in medium dimensional studies (up to ), but breaking down for higher dimensional studies. Next, a testing procedure for differential systematic conditional dependence by means of a permutation scheme was described. To warrant exchangeability under the null hypothesis, the estimation was modified to accommodate the assumption of equal marginal variances of the groups. Finally, with the method proposed at hand, the in vitro activation of the hedgehog signalling pathway was confirmed in various in vivo gene expression data sets. The soundness of this conclusion was assessed by a wide range of diagnostics that complement the methodology proposed.
An alternative motivation for a test of pathway inactivation can be found in an emerging paradigm in cancer research (Teschendorff and Severini, 2010; Van Wieringen and Van der Vaart, 2011). Central to this paradigm is the notion of the entropy of a system. Here a system refers to the regulatory network of the pathway. Entropy is then a measure of concentration of the transcription levels of this system and can be interpreted as a measure of dysregulation (see Van Wieringen and Van der Vaart (2015)). The paradigm suggests that the transcriptomic entropy generally (in most pathways) increases as cancer progresses, with the exception of processes that are essential to the viability of the cancer cell. Previously, in Van Wieringen and Van der Vaart (2015), we have proved (under the assumption of a GGM) that the entropy increase or decrease may be caused by weakening or strengthening of conditional dependences within the system. However, other causes may underlie the surge in transcriptomic entropy (Van Wieringen and Van der Vaart, 2015). A test for pathway (in)activation may enable us to distinguish between causes of entropy increase or decrease.
Future work involves extending the presented work in several directions. Firstly, an alternative procedure for choosing the penalty parameter may exploit the mixture interpretation of the penalized log-likelihood. The mixing proportion could be endowed with a prior, but indirectly to avoid an informative prior. This could be done through a prior on the condition number of the standardized precision matrix (following Won et al. (2013)). A large condition number would result from too little penalization (i.e. a small mixing proportion) in the medium-to-high dimensional case. A unit condition number is the result of too much penalization (and a unit mixing proportion). An ideal prior balances between these extremes and is flexible to accommodate the various study dimensionalities. A potential benefit of this approach may be computational gain with respect to the LOOCV procedure described.
Secondly, we could allow the strength of the differential conditional dependence to be associated with a linear covariate. For instance, very small ribonucleic acid (RNA) molecules (micro-RNAs) with expression levels also measured on a continuous scale either contribute to the degradation of messenger RNA or probihit its translation. As micro-RNAs are believed to play an important role in oncogenesis, it is important to be able to detect micro-RNA-activated pathways.
Thirdly, the approach proposed may need to be robustified. In the application two outlying samples were removed from the analysis. It would be more elegant to leave every sample in and, if the data give rise to it, to downweight their influence, thus robustifying the method against outliers. Pragmatically, one may estimate the groupwise sample covariance matrices—the form in which the data enter the method—robustly by using methodology described in for example Campbell (1980). Preferably, an improved version of the method that is presented here would then complement such pragmatics by in-built safeguards against extreme observations.
A final inroad for improvement would be to stengthen the method's performance beyond the low and medium dimensional domain. This may require the use of a different penalty on the standardized precision matrix, for instance, a combination of the l1- and trace norm, that would yield a sparse and low rank matrix which has been shown to preserve the correlation structure reasonably (see Richard et al. (2012)). Although the sparsity of such an estimate is not the objective of the method presented, its low rank suggests that it would be stable in the high dimensional case. However, care should be applied as the sparsity of the estimate may hamper the estimation of the differential conditional dependence parameter (see Section 3.2).