Capturing dynamic relevance in Boolean networks using graph theoretical measures

Abstract Motivation Interaction graphs are able to describe regulatory dependencies between compounds without capturing dynamics. In contrast, mathematical models that are based on interaction graphs allow to investigate the dynamics of biological systems. However, since dynamic complexity of these models grows exponentially with their size, exhaustive analyses of the dynamics and consequently screening all possible interventions eventually becomes infeasible. Thus, we designed an approach to identify dynamically relevant compounds based on the static network topology. Results Here, we present a method only based on static properties to identify dynamically influencing nodes. Coupling vertex betweenness and determinative power, we could capture relevant nodes for changing dynamics with an accuracy of 75% in a set of 35 published logical models. Further analyses of the selected compounds’ connectivity unravelled a new class of not highly connected nodes with high impact on the networks’ dynamics, which we call gatekeepers. We validated our method’s working concept on logical models, which can be readily scaled up to complex interaction networks, where dynamic analyses are not even feasible. Availability and implementation Code is freely available at https://github.com/sysbio-bioinf/BNStatic. Supplementary information Supplementary data are available at Bioinformatics online.


Preliminary analysis of static measures and model selection
In a preliminary analysis, 8 different static measures have been investigated for their ability to predict nodes with high dynamic impact. In addition to vertex betweenness (VB, (Freeman, 1977)) and determinative power (DP, (Heckel et al., 2013;Matache and Matache, 2016)) (both descirbed in the main manuscript), we evaluated total node degree (Connectivity, (Guimera and Amaral, 2005)), resistance distance (RD, assigning a node the average of resistances to all other nodes (Klein and Randić, 1993)), coreness (kC, i.e. k-core, (Giatsidis et al., 2013)), eigenvector centrality (evc, (Newman, 2008)), eccentricity (ecc, (Hage and Harary, 1995)) and the Shimbel index (si, (Shimbel, 1951(Shimbel, , 1953Rodrigue, 2016)). These measures as calculated as follows for a network of n nodes: A is the adjacency matrix of the interaction graph, which has an entry of 1 at index (i, j) if node j is regulated by node i and 0 otherwise.

Connectivity:
The connectivity (see e.g. (Guimera and Amaral, 2005)) sums up the output and input degree of a node. Self-loops are only counted once.
Resistance distance: The resistance distance of nodes in a network can be obtained from the Laplacian matrix L and the adjacency matrix A (Klein and Randić, 1993). L is obtained as the difference of the degree matrix D and A. The degree matrix contains the total degree of node i at index (i, i) and 0 otherwise. This matrix can be inverted to obtain the symmetric resistance distance matrix R, which describes how much resistance there is to the flow of information between nodes v and v . These values are averaged for every node.
A node is in the k-core of a graph if it has at least k inputs and k outputs. The highest k for which this is fulfilled is chosen (see (Giatsidis et al., 2013) for further work expanding on this measure).
Eigenvector centrality: The values of this measure are obtained by solving for the eigenvectors x belonging to the largest eigenvalue λ, using the adjacency matrix for the corresponding undirected graph of the network (Newman, 2008).

Eccentricity:
The eccentricity of a node is its shortest path distance minlength from the farthest other node in the graph (Hage and Harary, 1995).
Shimbel Index: The Shimbel Index of a node is given by the sum of lengths of shortest paths minlength from node v to all reachable nodes, given by the number of nodes in these paths, as used by e.g. (Shimbel, 1951(Shimbel, , 1953Rodrigue, 2016).
Like the 3 presented dynamic measures and their average, all of these static measures assign values to every node in a network. The dynamic measure of Hamming distance is normalised by the number of nodes in the network while the measure of attractor gain is normalised by the number of attractors in the unperturbed network. For the measure of attractor loss, the fraction of retained attractors is used which is thus in the range [0,1]. For static measures, the values for connectivity, eccentricity as well as the Shimbel Index have been normalised by the number of nodes in their respective network in order to make these measures more comparable across networks of different sizes.
In total, there are 684 nodes across 35 networks. For every fold in every run of e.g. a 10x10 crossvalidation, a test set is determined which is then separated from the remaining training set. All static and dynamic measures across all nodes are then transformed into z-scores, using the mean and standard deviation across nodes in the training set for normalisation. The z-scores for the three dynamic measures are averaged by node, creating a measure called 'DynAvg' analogously to the averaged dynamic impact measure I. This serves as the basis for labelling nodes as having high or low dynamic impact. Nodes are classified as having high impact if their z-score value is above the median of the distribution of 'DynAvg'. This median and the resulting dynamic labels are determined anew for each test set.
To determine reclassification accuracy, normalisation is performed across all nodes without the use of a separate test set.
In the following Figures S1 and S2, the results of multiple classification algorithms are shown, presenting the best combination of static features for the prediction of this dynamics-based label. Here, every sample corresponds to a single node, yielding a total of 684 samples and 8 static measures. For all algorithms, both a 10x10 as well as a leave-one-subset-out (LOSS, leaving out one network) cross-validation have been performed. We ran classification using all possible combinations of the graph-based features over all nodes of all networks. For classification we used kNN with k = {1, 3, 5}, a random forest and SVMs with linear and RBF kernels. For the 10x10-CV, results show a cross-validation accuracy ranging between minima of 0.461 and 0.545 across algorithms for the worst-performing feature combination and maxima between 0.691 and 0.749 for the best performing combination. Likewise for the LOSS-CV, cross-validation accuracy ranged from minima of 0.335 to 0.494 while its maxima ranged from 0.665 to 0.722. Analogously, the reclassification accuracy ranges between minima of 0.469 and 0.569 to maxima between 0.705 and 1.0.
Given the preferential selection of VB and DP among well-performing feature combinations across algorithms, we further focus our investigation on these two measures. We expect a combination of both static features to perform best. This will be compared to the accuracy of the individual measures.

Determination of threshold
Depending on whether a gene is correctly classified by our static ranking as having high/low impact or not, measures of sensitivity and specificity are used to quantify rates of false positives (FP) and false negatives (FN) in relation to true positives or negatives (TP or TN). These are defined as 4 Sensitivity-specificity curves of averaged dynamic impact for all static measures Figure S3 shows the mean and standard deviations for sensitivity and specificity across networks for all four static measures (VB, DP, VB∪DP and VB∩DP).
For every T , the nodes are ranked according to their scores on VB and DP. The top T = T 100 n nodes are then selected and compared to the corresponding selection based on a dynamic measure. For VB∪DP, the top T nodes are chosen for both VB and DP and the resulting sets are unified. If a node is chosen by both measures, their rankings are averaged. If it is only selected by one of the two, its ranking according to this measure is used. In case of VB∩DP, the intersection of the top T nodes in VB and DP is formed, averaging their rankings.  Figure S4 shows sensitivity-specificity values for any given threshold for the three individual dynamic measures of attractor loss, attractor gain and Hamming distance. The obtained thresholds are demonstrated to vary by 1-3% for the value of T at which sensitivity and specificity are equal, indicating that the results are independent of the chosen dynamic measure.

Bootstrap test for selection threshold
In order to determine the stability of a chosen threshold for the selected static measure, a bootstrap analysis is performed. That is, out of the N = 35 networks, a random sample of N = 35 networks is chosen (with replacement) and the selection threshold of the static measure is recalculated for the three individual dynamic measures as well as their average. This process is repeated 10 4 times. The distributions of obtained thresholds are shown in Figure S5. For the average of the three dynamic measures, a median threshold of 73.0% was obtained with an interquartile range for 4.0%. This indicates that the chosen threshold value is not dependent on the data set used for the calculation of sensitivity and specificity. Figure S5: The distribution of selection thresholds where sensitivity and specificity are equal for every dynamic measure as well as their average, obtained using a bootstrap analysis with 10 4 repetitions. For the average of dynamic measures, the median of the distribution is 73.0% with an interquartile range of 4.0%.
7 Accuracy of selection by network size Figure S6 shows the overlap of the sets of nodes ranking highest on static measures and the average of dynamic measures at the determined threshold of T = 73%, normalised by the number of dynamically relevant nodes. Figure S6: Overlap of sets determined by selecting the highest ranking nodes according to VB and DP and dynamic measures at the previously determined selection threshold of T = 73% (y-axis) versus network size (x-axis).
8 Case study: Classification of nodes 8.1 Network by Cohen et al.
Among the 32 nodes in the network presented by (Cohen et al., 2015), the following classifications are obtained using the measure of VB∩DP at T = 73%:  9 Run time differences An advantage of the use of static methods for intervention target screening is that their run times are orders of magnitude smaller than for the corresponding attractor measures. Across networks, it was found that the suggested static measure of vertex betweenness could be calculated on a time scale of milliseconds while the average time to calculate the determinative power of all nodes in a network was 1.3 seconds using a single core of a desktop computer with a 3.2 GHz Intel Core i5-6500 processor and 16 GB of memory. For comparison, the presented dynamic measures required on average 13.3 seconds to calculate. For a larger network consisting of 131 nodes and 313 edges (Madrahimov et al., 2013), determining the system's set of attractors using BoolNet's getAttractors(SBML, method="sat.exhaustive") function required 48.5 hours of run time. Given this size, the state transition graph has a size of 2 131 ≈ 10 39 nodes. This prohibits the use of the proposed dynamic measures which require a repeated execution of this function. If the expression of a single gene is fixed, the size of the state transition graph will be cut in half. However, two modifications (OE and KO) are performed for every gene. Thus the run time of the dynamic measures is estimated at 131 · 48.5 = 6353.5 hours ≈ 265 days for this network.
In contrast, the measure DP could be determined in 84.8 seconds for this system while VB was still computable in milliseconds.
10 Analysis of selected nodes 10.1 Classification for different z-score definitions We quantified the connectivity C of a gene g using a z-score for their total in-and output degree δ given as Alternative, a robust z-score C g based on the median can be used, which is defined as MAD = median(|δ g − median(δ)|) The following Table S2 shows the classification of nodes for all 35 networks for both of these definitions. We found that the choice of z-score does not affect which nodes are classified as PM (gatekeeper) or NM. However, for the networks (Méndez and Mendoza, 2016) and (Mendoza and Xenarios, 2006), the MAD became zero, which is why the robust z-score could not be calculated in these cases.

Comparison of selection based on static measures to other methods of capturing dynamic relevance
We compared our method to other common methods to detect nodes with high dynamic impact. These other methods include which are canalysing variables for some Boolean functions in their network (Murrugarra and Dimitrova, 2015), nodes making up the feedback vertex set (FVS) of a network (Zañudo et al., 2017), as well as nodes which appear in common network motifs such as feedforward-loops or bifans (Milo et al., 2002;Albergante et al., 2014).
The following Table S3 shows the number of nodes across 684 nodes in all networks which function as canalysing nodes for at least F Boolean functions in their respective network. These results are compared to the nodes selected by our method (PM and NM) and the set of nodes which was not selected by out method. A distribution of nodes in the FVS across the classes PM, NM and NS across networks is shown in Figure S8.
Given their relevance and stability as mentioned by (Albergante et al., 2014), we investigated feedforward loops (both the coherent C1FFL and the incoherent I1FFL) as well as the bifan. We counted the number of motifs of a given type a node participates in and plotted the distribution of these occurrences depending on their classification as shown in Figure S7. Across networks, there were a total of 1086 C1FFLs, 112 I1FFLs and 1197 bifans.
F Total nodes PM (Gatekeepers) NM  NS  selected  non-selected  1  543  222  179  142  2  283  119  108  56  3  137  55  62  20  4  71  26  38  7  5  34  9  22  3  6  17  7  10  0  7  9  3  6  0  8  6  1  5  0  9  6  1  5  0  10  4  1  3  0  11  1  0  1  0   Table S3: Occurrence of nodes which canalyse at least F Boolean functions in their respective network across all 684 nodes, as well as their distribution across the classes PM (positive mismatch, gatekeeper), NM (negative mismatch), NS (non-selected) and hubs. As F increases, the remaining canalysing nodes tend to belong to the more highly connected NM class.  Distribution across all classes PM/Gatekeepers, NM and NS (right). FVS is calculated using the feedback vertex set function in SageMath9.2 (Stein and Joyner, 2005). Figure S9: Venn diagrams showing the overlap between gatekeepers (227 in total), FVS nodes (138 in total), nodes which canalyse at least one Boolean function in their network (543 in total) and nodes which participate in at least one motif (423 in total) (left). The participation in motifs is further visualised as follows: 381 nodes participate in at least one C1FFL motif, 129 in at least one I1FFL and 345 in at least one Bifan (right).

Biological interpretation of selected nodes
The STRING (Szklarczyk et al., 2016) analysis was performed by parsing our networks into the STRING database and checking for co-expressed genes, with the high confidence given by the tool (confidence level of high, 0.7, see Table S4) class co-occurence co-expression PM (Gatekeepers) 1 1 NM 0 3 NS 0 1 Table S4: Number of co-occuring or co-expressed compounds in each class of the predicted nodes across all 35 measured networks according.
Next, we used cBioportal to screen for mutually exclusive pairings compounds across all 22 human networks among the set of analysed networks.