Abstract

Biological network-based strategies are useful in prioritizing genes associated with diseases. Several comprehensive human gene networks such as STRING, GIANT and HumanNet were developed and used in network-assisted algorithms to identify disease-associated genes. However, none of these networks are disease-specific and may not accurately reflect gene interactions for a specific disease. Aiming to improve disease gene prioritization using networks, we propose a Disease-Specific Network Enhancement Prioritization (DiSNEP) framework. DiSNEP first enhances a comprehensive gene network specifically for a disease through a diffusion process on a gene–gene similarity matrix derived from disease omics data. The enhanced disease-specific gene network thus better reflects true gene interactions for the disease and may improve prioritizing disease-associated genes subsequently. In simulations, DiSNEP that uses an enhanced disease-specific network prioritizes more true signal genes than comparison methods using a general gene network or without prioritization. Applications to prioritize cancer-associated gene expression and DNA methylation signal genes for five cancer types from The Cancer Genome Atlas (TCGA) project suggest that more prioritized candidate genes by DiSNEP are cancer-related according to the DisGeNET database than those prioritized by the comparison methods, consistently across all five cancer types considered, and for both gene expression and DNA methylation signal genes.

Introduction

Gene networks capture existing knowledge of biochemistry, functional and statistical interactions between genes, where nodes represent genes and edges represent interactions between genes [1]. Human gene networks have been widely used to identify genetic or genomic factors of diseases from gene expression profiles [2, 3], genome-wide association studies (GWAS) [4, 5] and epigenome-wide association studies (EWAS) [6, 7]. Many network-assisted algorithms were developed, which utilize gene network topologies to help prioritize disease-associated genes. These algorithms can be broadly grouped into two categories: direct neighborhood methods that prioritize candidate genes through utilizing directly connected neighbors on a network [5, 8] and network diffusion methods that prioritize candidate genes through propagating disease association signals on an entire network [9, 10]. As emphasized in a review paper, prioritizing candidate genes helps maximize the biological relevance of follow-up validation experiments or functional studies [11]. Using network-assisted prioritization algorithms, novel candidate omics features have been identified in a wide spectrum of diseases, such as Cerebral Cavernous Malformations [3], Crohn’s disease [4] and different types of cancers [5–7], when many of these omics features may be missed by standard statistical tests [2–7]. In these network-assisted algorithms, biological networks help prioritize disease-associated genes through their proximity to genes within networks that have strong disease association signals [12].

To better prioritize candidate genes for a disease using a network-assisted algorithm, a gene network that accurately represents gene interactions for the disease is critical. A number of comprehensive human gene networks were developed, such as STRING [13], HumanNet [4], GIANT [14] and ConsensusPathDB [15]. Also growing rapidly is the number of interactions in these networks [1]. As none of these comprehensive human gene networks are disease-specific, they may not accurately reflect gene interactions for a specific disease. Huang et al. [1] showed that the choice of a gene network greatly affects identification of downstream candidate genes. Nam et al. [16] proposed a disease-specific network-assisted prioritization method, but their method only works for immune diseases and binary protein–protein interaction networks. In this article, we propose a general Disease-Specific Network Enhancement Prioritization (DiSNEP) framework that enhances a comprehensive human gene network for a specific disease in order to improve prioritization of genes associated with the disease.

The proposed DiSNEP framework first enhances a general gene network into a disease-specific gene network for a disease. This is done through a random walk process using a gene similarity matrix derived from disease omics data where each element of the gene similarity matrix is a correlation between two genes. The enhanced disease-specific gene network can then be used to prioritize gene association signals by diffusion. We conducted simulation studies to show the better performance of the DiSNEP framework compared to methods that prioritize candidate genes using a general human gene network or without prioritization. In all simulation scenarios considered, the genes prioritized by DiSNEP include more true disease signals than those by the comparison methods with the original gene network or without prioritization. We applied the DiSNEP framework to prioritize cancer-associated gene expression signal genes using gene expression-enhanced cancer-specific networks and to prioritize DNA methylation signal genes using DNA methylation-enhanced cancer-specific networks for five cancer types from The Cancer Genome Atlas (TCGA) project: kidney renal papillary cell carcinoma (KIRP), thyroid carcinoma (THCA), stomach adenocarcinoma (STAD), liver hepatocellular carcinoma (LIHC) and prostate adenocarcinoma (PRAD). Our results show that (1) consistently across all five cancer types considered and (2) for both gene expression signal genes and DNA methylation signal genes, more candidate genes prioritized by DiSNEP are cancer-related than those by a general gene network or without prioritization according to the DisGeNET database [17]. We also applied DiSNEP to independent 450K DNA methylation data of hepatocellular carcinoma (HCC) tumor and adjacent normal tissues from the Columbia University Irving Medical Center (CUIMC) GES54503 [44] for a replication analysis and concluded that the prioritization results are reproducible if a different dataset is used. Additional pathway enrichment analyses across the five TCGA cancers suggest that the prioritized genes by DiSNEP are indeed cancer-specific.

Methods

The proposed DiSNEP framework has three steps: (1) enhance a general gene network S into a disease-specific gene network SE, (2) diffuse gene association signals g into gE on the enhanced disease-specific gene network SE and (3) prioritize and select top ranked candidate genes based on gE. Figure 1 shows the workflow of the proposed DiSNEP framework.

The DiSNEP framework workflow. We first diffuse a general gene network S on a gene similarity matrix E, which is generated using a type of disease omics data. We then remove ‘noise’ edges and make the network binary. The resulted disease-specific network SE better reflects the true interactions between genes for the disease. SE is then used to prioritize gene association measures through a diffusion process, and the top-ranked prioritized genes by the prioritized gene association measures are selected.
Figure 1

The DiSNEP framework workflow. We first diffuse a general gene network S on a gene similarity matrix E, which is generated using a type of disease omics data. We then remove ‘noise’ edges and make the network binary. The resulted disease-specific network SE better reflects the true interactions between genes for the disease. SE is then used to prioritize gene association measures through a diffusion process, and the top-ranked prioritized genes by the prioritized gene association measures are selected.

Step 1: Enhance a general gene network S

Generate a gene similarity matrix using disease omics data

We use disease omics data, for example gene expression data of a disease from both case and control samples to generate a gene similarity matrix E that reflects correlations/interactions between gene–gene pairs for the disease by calculating absolute values of pairwise correlations between genes. This could be similarly applied to DNA methylation data where the largest absolute value of correlation between any two CpGs on two different genes is selected to represent the correlation between the two genes.

Enhance a general human network S into SE

We first conduct column normalization on the gene network E in order to have a stochastic matrix [9] and denote the resulted matrix as Enorm. Here we use the general gene network STRING V11 [13], denoted as S0. Entries of S0 are edge weights between pairs of genes that represent confidence of their functional interactions. Note that S0 are symmetric with more than 90% of edges having 0 weights and the rest of weights ranging 0–999. We enhance S0 into a disease-specific network using Enorm through a random walk process as follows:
(1)

Here α is a regularization parameter representing weights for signal sources where α = 0 means no disease-specific enhancement, i.e. no data from the disease are used to enhance S0, and t is iteration steps. Simulation studies suggest that the optimal α value is 0.75, while different values of α > 0.5 do not have much impact on prioritization results. This is consistent with previous observations using similar random walk processes [18, 19]. We set iterations to stop when L1 matrix norm of St + 1St is smaller than 1 × 10−6. Typically, the threshold is reached after 10–15 iterations.

Note that after convergence, St is no longer symmetric. To build SE, we further remove ‘noise edges’ in St, i.e. for top 10% of edges ordered by edge weights, we set their edge weights to be 1. We set the rest 90% edge weights to be 0. The 10% was chosen based on a previous study that investigated a set of gene ontology-derived gold standards [18]. Since St is not symmetric, in order to have an undirected network, i.e. symmetric, we then set edge weights between any pairs of genes as 1 if at least one of the two weights between two genes is 1 [18]. The resulted network is then denoted as SE.

Step 2: Update gene association signals g

Disease-specific gene association signal scores g

Association P-values could be obtained from appropriate statistical tests based on study designs, e.g. t-test comparing mean gene expression levels between cases and controls. Denote pi as the association P-value for gene i, we calculate the association signal score gi = Φ−1 (1 − pi), where Φ is the standard normal distribution function, and denote the association signal scores of n genes as g0 = (g1, g2, …, gn). This could be similarly applied to DNA methylation data where the largest association signal score of all CpGs on a gene represents the association signal score for the gene.

Update gene association signals g into gE

With the enhanced disease-specific network SE, we first conduct column normalization on SE in order to have a stochastic matrix [9] and denote the resulted matrix as SEnorm. We then update association signals g0 through a random walk based on SEnorm as follows:
(2)

Here β is a regularization parameter representing weights for signal sources where β = 0 means no prioritization using SE. Similar to α, β has an optimal value at 0.75 from simulation studies and different values of β > 0.5 do not have much impact on prioritization results, which is consistent with previous observations [18, 19]. We set iterations to stop when L1 vector norm of gt + 1gt is smaller than 1 × 10−6. The final updated association signal scores are denoted as gE.

Step 3: Prioritize candidate genes based on gE

We prioritize genes ranked by their gE scores and choose m top-ranked genes as the prioritized candidate signal genes, where m is usually under a few hundred [20].

Note that the same set of disease omics data is used twice for two purposes in the DiSNEP framework, first to enhance a general gene network into a disease-specific network and second to generate disease association measures to be diffused for prioritization.

Results

Simulation studies

We conducted simulation studies to investigate the performance of the proposed DiSNEP framework, which enhances a general gene network into a disease-specific network and then prioritizes gene associations on the enhanced network. We evaluated the performance using true positive rates (TPR) and false positive rates (FPR) and constructed receiver operating characteristic (ROC) curves with areas under the ROC curves (AUCs). We compared the performance of DiSNEP to that of (1) the GeneWanderer method that prioritizes candidate genes by diffusion on the column-normalized STRING general gene network [9] and (ii) the t-test method that generates disease association measures similarly as in DiSNEP and GeneWanderer without using network information.

Simulation settings

Generating a general human gene network

We downloaded the human gene network from the STRING database v11, which has 19 354 genes and 11 759 454 edges [13]. We trimmed the STRING network to keep genes that are also in the TCGA KIRP gene expression dataset after quality control steps (see Real data applications). We ended with 14 066 genes and 4 056 937 edges. Using KIRP as an example to prioritize KIRP-associated genes, we downloaded a list with 499 renal carcinoma-associated genes from the DisGeNET database and considered these 499 genes as the true KIRP-associated genes. The DisGeNET database V6.0 was developed by expert curated repositories, GWAS catalogues, animal models and literatures [17]. We set the general gene network S in simulation studies to have 2000 genes, with 50 genes being renal carcinoma-associated signal genes randomly selected from the 499 renal carcinoma-associated genes and the remaining 1950 genes being noises randomly selected from the 13 567 non-renal-carcinoma-associated genes on the STRING network. We kept the edges of these 2000 genes selected from the STRING network.

Simulating gene expression data

In each simulation, we simulated gene expressions of the 2000 genes for 300 tumor samples and 50 independent normal samples mimicking real data applications using a multivariate normal distribution. For the 50 signal genes of tumor samples, we set 25 genes to have large association signals with means generated from N (μ1, 1) and the other 25 to have medium association signals with means generated from N (μ2, 1). The 1950 noise genes of tumor samples and all 2000 genes of normal samples have means generated from N (8, 1). For correlations among the 2000 genes, we set strong pairwise correlations among the 25 large signal genes from Uniform (0, 0.1) and median correlations among the other 25 median signal genes from Uniform (0, 0.02). We set weak correlations among the 1950 noise genes from Uniform (0, 0.01). Finally, we denote standard deviations of expression level of all genes as σ. We set parameters (μ1, μ2,σ) to have values (10, 9, 3, 10, 9, 2, 10, 9, 1, 11, 10, 1, 12, 11, 1) for different effect sizes. We then simulated gene expression data of 2000 genes for 300 cases and 50 controls from a multivariate normal distribution with parameters generated above and calculated the association signal scores g and the gene similarity matrix E using simulated gene expression data in each simulated dataset. We simulated 100 datasets and applied the proposed DiSNEP framework and the comparison methods, GeneWanderer and two-sample t-test.

Simulation results

With the selected top-ranked m candidate genes by signal scores, we calculated TPR and FPR and constructed ROC curves with AUCs when we ranged m from 1 to 2000 with a grid of 1. As typically a few hundred candidate genes are considered in the follow-up analyses [20], we compared parts of AUCs of the three methods when the minimum top-ranked m prioritized genes were chosen so that FPR = 0.1. Across the 100 simulated datasets, the mean of the minimum m’s selected to have FPR = 0.1 is 225 when α and β were set at 0.75. Figure 2A displays AUC results from the 100 simulated datasets when α and β were set at 0.75, and (μ1, μ2,σ) were set at (10, 9, 1), respectively. Under the same parameter setting, Figure 2B displays the average numbers of true signal genes out of top-ranked m prioritized genes when m increases from 0 to 500. Table 1 summarizes averaged AUCs when FPR ≤ 0.1 under different α and β settings when (μ1, μ2) were set at (10, 9) and σ varying from 1, 2 to 3. Additional simulation results are included in Supplementary Table S7. It is clear that in all simulation scenarios, the proposed DiSNEP framework that enhances a general gene network for a specific disease achieves higher AUCs than that using a general gene network or without network-assisted prioritization. This demonstrates that the enhancement of a general gene network for a specific disease does improve the prioritization of candidate genes for the disease. Among different values of α and β, the optimal values are both 0.75 when different α and β values ranging from 0.5 to 1 do not have much impact on the performance of DiSNEP and GeneWanderer. Previous studies using similar random walk processes also reported similar findings [18, 19].

Evaluations of the three methods in simulation studies. (A) Boxplot of AUCs when FPR ≤ 0.1; α and β were set at 0.75 and σ, μ1 and μ2 were set at 1, 10 and 9. (B) The number of true signal genes out of top ranked 0–500 genes increased by 50.
Figure 2

Evaluations of the three methods in simulation studies. (A) Boxplot of AUCs when FPR ≤ 0.1; α and β were set at 0.75 and σ, μ1 and μ2 were set at 1, 10 and 9. (B) The number of true signal genes out of top ranked 0–500 genes increased by 50.

Table 1

Simulation results with mean AUCs for t-test, GeneWanderer and DiSNEP when FPR ≤ 0.1 with different values of α and β in different simulation scenarios

MethodsMean AUC
t-test0.019
Scenario I
μ1 = 10a
μ2 = 9b
σ = 1c
β
0.50.751
GeneWanderer0.0280.0290.027
DiSNEPα0.50.0340.0360.036
0.750.0370.0380.037
10.0350.0360.035
Scenario II
μ1 = 10
μ2 = 9
σ = 2
MethodsMean AUC
t-test0.018
β
0.50.751
GeneWanderer0.0260.0270.025
DiSNEPα0.50.0310.0320.027
0.750.0320.0330.027
10.0300.0300.024
Scenario III
μ1 = 10
μ2 = 9
σ = 3
MethodsMean AUC
t-test0.016
β
0.50.751
GeneWanderer0.0250.0260.024
DiSNEPα0.50.0280.0280.025
0.750.0290.0300.025
10.0260.0270.023
MethodsMean AUC
t-test0.019
Scenario I
μ1 = 10a
μ2 = 9b
σ = 1c
β
0.50.751
GeneWanderer0.0280.0290.027
DiSNEPα0.50.0340.0360.036
0.750.0370.0380.037
10.0350.0360.035
Scenario II
μ1 = 10
μ2 = 9
σ = 2
MethodsMean AUC
t-test0.018
β
0.50.751
GeneWanderer0.0260.0270.025
DiSNEPα0.50.0310.0320.027
0.750.0320.0330.027
10.0300.0300.024
Scenario III
μ1 = 10
μ2 = 9
σ = 3
MethodsMean AUC
t-test0.016
β
0.50.751
GeneWanderer0.0250.0260.024
DiSNEPα0.50.0280.0280.025
0.750.0290.0300.025
10.0260.0270.023

aMeans of the 25 large association signal genes were generated from N(μ1, 1).

bMeans of the 25 medium association signal genes generated from N(μ2, 1).

cσ is the standard deviation of gene expression data.

Table 1

Simulation results with mean AUCs for t-test, GeneWanderer and DiSNEP when FPR ≤ 0.1 with different values of α and β in different simulation scenarios

MethodsMean AUC
t-test0.019
Scenario I
μ1 = 10a
μ2 = 9b
σ = 1c
β
0.50.751
GeneWanderer0.0280.0290.027
DiSNEPα0.50.0340.0360.036
0.750.0370.0380.037
10.0350.0360.035
Scenario II
μ1 = 10
μ2 = 9
σ = 2
MethodsMean AUC
t-test0.018
β
0.50.751
GeneWanderer0.0260.0270.025
DiSNEPα0.50.0310.0320.027
0.750.0320.0330.027
10.0300.0300.024
Scenario III
μ1 = 10
μ2 = 9
σ = 3
MethodsMean AUC
t-test0.016
β
0.50.751
GeneWanderer0.0250.0260.024
DiSNEPα0.50.0280.0280.025
0.750.0290.0300.025
10.0260.0270.023
MethodsMean AUC
t-test0.019
Scenario I
μ1 = 10a
μ2 = 9b
σ = 1c
β
0.50.751
GeneWanderer0.0280.0290.027
DiSNEPα0.50.0340.0360.036
0.750.0370.0380.037
10.0350.0360.035
Scenario II
μ1 = 10
μ2 = 9
σ = 2
MethodsMean AUC
t-test0.018
β
0.50.751
GeneWanderer0.0260.0270.025
DiSNEPα0.50.0310.0320.027
0.750.0320.0330.027
10.0300.0300.024
Scenario III
μ1 = 10
μ2 = 9
σ = 3
MethodsMean AUC
t-test0.016
β
0.50.751
GeneWanderer0.0250.0260.024
DiSNEPα0.50.0280.0280.025
0.750.0290.0300.025
10.0260.0270.023

aMeans of the 25 large association signal genes were generated from N(μ1, 1).

bMeans of the 25 medium association signal genes generated from N(μ2, 1).

cσ is the standard deviation of gene expression data.

To further evaluate the performance of DiSNEP, for each gene, we defined an interaction ratio score, which is the ratio of the number of interactions of this gene to the average number of interactions of all genes. We then calculated correlations between interaction ratio scores and prioritization ranks of signal genes and noise genes separately. Supplementary Table S4 shows that across all simulation scenarios considered, interaction ratio scores of signal genes are larger on average using DiSNEP than those using GeneWanderer, while interaction ratio scores of noise genes are smaller on average using DiSNEP than those using GeneWanderer. Correlations between interaction ratio scores and prioritization ranks for signal genes are on average larger using DiSNEP than those using GeneWanderer, while correlations for noise genes are on average smaller using DiSNEP than those using GeneWanderer. This indicates that signal genes, which should rank higher, indeed have larger interaction ratio scores using DiSNEP, i.e. DiSNEP improves prioritization through utilizing gene interaction information specifically for the disease as designed.

Real data applications

Prioritize cancer-associated genes using omics data from TCGA for five cancer types

We applied DiSNEP to prioritize cancer-associated gene expression (or DNA methylation) signal genes using gene expression-enhanced (or DNA methylation-enhanced) cancer-specific networks for five cancer types from TCGA V16: KIRP, THCA, STAD, LIHC and PRAD. We used gene expression data (DNA methylation data) of tumor and adjacent normal samples to enhance a general gene network into a gene expression-enhanced cancer-specific network (DNA methylation-enhanced cancer-specific network) and then prioritizing cancer-associated gene expression signals (DNA methylation signals) using the enhanced cancer-specific network.

We conducted the same quality control steps for gene expression data of the five cancers where we removed samples with more than 30% missing in gene expression measures and removed genes with more than 30% samples missing. We then imputed missing values using KNN [21] and corrected batch effects using Combat [22]. For DNA methylation data, if both Infinium HumanMethylation27 and HumanMethylation450 BeadChips are available, we used the overlapping probes. If only HumanMethylation450 BeadChip is available, we used all probes of the HumanMethylation450 BeadChip. We conducted the same quality control steps for all five cancers where we removed CpGs on sex chromosomes and overlapping with single-nucleotide polymorphisms. We removed samples with more than 30% missing in methylation measures and removed CpGs with more than 30% samples missing. We then imputed missing values using KNN [21], corrected batch effects using Combat [22] and corrected type I/II probe bias using wateRmelon [23]. Table 2 summarizes gene expression data and DNA methylation data of the five cancer types. After these quality control steps, we restricted the omics data and the general STRING gene network to have the same set of genes, which are used for all three methods.

Table 2

Summary of gene expression data and DNA methylation data of the five TCGA cancer types

KIRPTHCASTADLIHCPRAD
GeneExpressionNo. of tumor samples290501274367497
No. of tumor/adjacent normal pairs3058274952
No. of genes16,72916 84220 39616 08517 139
DNAMethylationNo. of tumor samples291503443337498
No. of tumor/adjacent normal pairs5056275050
No. of CpG sites22 485337 49722 488337 483337 491
No. of genes13 15723 23213 15823 23223 233
KIRPTHCASTADLIHCPRAD
GeneExpressionNo. of tumor samples290501274367497
No. of tumor/adjacent normal pairs3058274952
No. of genes16,72916 84220 39616 08517 139
DNAMethylationNo. of tumor samples291503443337498
No. of tumor/adjacent normal pairs5056275050
No. of CpG sites22 485337 49722 488337 483337 491
No. of genes13 15723 23213 15823 23223 233
Table 2

Summary of gene expression data and DNA methylation data of the five TCGA cancer types

KIRPTHCASTADLIHCPRAD
GeneExpressionNo. of tumor samples290501274367497
No. of tumor/adjacent normal pairs3058274952
No. of genes16,72916 84220 39616 08517 139
DNAMethylationNo. of tumor samples291503443337498
No. of tumor/adjacent normal pairs5056275050
No. of CpG sites22 485337 49722 488337 483337 491
No. of genes13 15723 23213 15823 23223 233
KIRPTHCASTADLIHCPRAD
GeneExpressionNo. of tumor samples290501274367497
No. of tumor/adjacent normal pairs3058274952
No. of genes16,72916 84220 39616 08517 139
DNAMethylationNo. of tumor samples291503443337498
No. of tumor/adjacent normal pairs5056275050
No. of CpG sites22 485337 49722 488337 483337 491
No. of genes13 15723 23213 15823 23223 233

TCGA KIRP cancer

Using TCGA KIRP as an example, we investigated KIRP candidate genes prioritized by DiSNEP and the comparison methods.

With gene expression data, after quality control steps, there are 16 729 genes for 290 tumor samples and 30 adjacent normal samples, among which there are 30 tumor and adjacent normal pairs. All 320 samples were used to generate the gene similarity matrix E. The 30 tumor and adjacent normal pairs were used to generate association signal scores g. Of those 16 729 genes, 14 066 are in the STRING network with 4 056 937 edges. We chose the top 100 genes ranked by prioritized association signal scores using DiSNEP and GeneWanderer and the top 100 genes ranked by association signal scores using paired t-test. Table 3 shows that among the top-ranked 100 genes by paired t-test, GeneWanderer and DiSNEP, 7, 30 and 35 genes are associated with renal carcinoma according to DisGeNET [17]. This suggests that the enhanced KIRP-specific gene network more accurately reflects gene interactions for renal carcinoma than STRING, a general gene network. Therefore, more renal carcinoma-associated genes are prioritized. The pathway enrichment analyses according to KEGG [24] show that the top 100 genes by DiSNEP also achieve the most significant enrichment P-values among the three methods not only for two general cancer pathways, ‘pathway in cancer’ and ‘proteoglycans in cancer’, but also for the renal cancer specific pathway, ‘renal cell cancer’ (Table 3).

Table 3

Results for TCGA KIRP cancer when gene expression signal genes and DNA methylation signal genes are prioritized using gene expression-enhanced KIRP-specific network and DNA methylation-enhanced KIRP-specific network, respectively. Displayed are the number of renal carcinoma-associated genes according to DisGeNETa and the enrichment P-values of ‘pathway in cancer’, ‘proteoglycans in cancer’ and ‘renal cell cancer’ from KEGG among top-ranked 100 signal genes for the three methods

Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of renal carcinoma-associated genes7303583440
Enrichment P-value of ‘pathway in cancer’NS1.4 × 10−202.4 × 10−23NS3.5 × 10−243.4 × 10−34
Enrichment P-value of ‘proteoglycans in cancer’NS6.1 × 10−211.6 × 10−27NS1.7 × 10−222.1 × 10−24
Enrichment P-value of ‘renal cell cancer’NS3.4 × 10−62.4 × 10−10NS6.8 × 10−71.7 × 10−11
Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of renal carcinoma-associated genes7303583440
Enrichment P-value of ‘pathway in cancer’NS1.4 × 10−202.4 × 10−23NS3.5 × 10−243.4 × 10−34
Enrichment P-value of ‘proteoglycans in cancer’NS6.1 × 10−211.6 × 10−27NS1.7 × 10−222.1 × 10−24
Enrichment P-value of ‘renal cell cancer’NS3.4 × 10−62.4 × 10−10NS6.8 × 10−71.7 × 10−11

NS, not significant.

aThere are 499 renal-carcinoma-associated genes according to DisGeNET [17].

Table 3

Results for TCGA KIRP cancer when gene expression signal genes and DNA methylation signal genes are prioritized using gene expression-enhanced KIRP-specific network and DNA methylation-enhanced KIRP-specific network, respectively. Displayed are the number of renal carcinoma-associated genes according to DisGeNETa and the enrichment P-values of ‘pathway in cancer’, ‘proteoglycans in cancer’ and ‘renal cell cancer’ from KEGG among top-ranked 100 signal genes for the three methods

Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of renal carcinoma-associated genes7303583440
Enrichment P-value of ‘pathway in cancer’NS1.4 × 10−202.4 × 10−23NS3.5 × 10−243.4 × 10−34
Enrichment P-value of ‘proteoglycans in cancer’NS6.1 × 10−211.6 × 10−27NS1.7 × 10−222.1 × 10−24
Enrichment P-value of ‘renal cell cancer’NS3.4 × 10−62.4 × 10−10NS6.8 × 10−71.7 × 10−11
Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of renal carcinoma-associated genes7303583440
Enrichment P-value of ‘pathway in cancer’NS1.4 × 10−202.4 × 10−23NS3.5 × 10−243.4 × 10−34
Enrichment P-value of ‘proteoglycans in cancer’NS6.1 × 10−211.6 × 10−27NS1.7 × 10−222.1 × 10−24
Enrichment P-value of ‘renal cell cancer’NS3.4 × 10−62.4 × 10−10NS6.8 × 10−71.7 × 10−11

NS, not significant.

aThere are 499 renal-carcinoma-associated genes according to DisGeNET [17].

Among the top 100 gene expression signal genes by DiSNEP only but not the other two methods, there are several important KIRP genes based on DisGeNET, such as CUL1 [26], PIK3CA [26], CD44 [27], MMP9 [28], MDM2 [29] and KDR [30]. We examined the gene CUL1 as an example. CUL1 has an association signal score 4.34 from paired t-test and ranks #3945. After prioritization using GeneWanderer, CUL1 ranks #202 with an interaction ratio score of 3.45. After prioritization using DiSNEP, CUL1 ranks #56 with an interaction ratio score of 6.29. This suggests that the KIRP-enhanced network SE more accurately reflects the strong interactions between CUL1 and other genes, which helps boosting the CUL1 gene signal.

For DNA methylation data, both HumanMethylation27 and HumanMethylation450 BeadChips are available for KIRP samples. After quality control steps, there are 13 157 genes with 22 485 CpGs for 291 tumor samples and 50 adjacent normal samples, among which there are 50 tumor and adjacent normal pairs. All 341 samples were used to generate the gene similarity matrix E. The 50 tumor and adjacent normal pairs were used to generate association signal scores g. Of those 13 157 genes, 10 104 are in the STRING network with 2 359 788 edges. We similarly chose the top 100 genes ranked by the association signal scores. Among these top 100 genes by paired t-test, GeneWanderer and DiSNEP, 8, 34 and 40 genes are reported to be associated with renal carcinoma according to DisGeNET [17] (Table 3), suggesting that the enhanced KIRP-specific gene network more accurately reflects gene interactions for renal carcinoma. The pathway enrichment analyses [24] also show that DiSNEP achieves the most significant enrichment P-values not only for two general cancer pathways, ‘pathway in cancer’ and ‘proteoglycans in cancer’, but also for the renal cancer-specific pathway, ‘renal cell cancer’.

We similarly investigated DNA methylation signal genes prioritized by DiSNEP only but not the other two methods. There are several important KIRP genes based on DisGeNET, such as PIK3CA [25], CUL1 [26], KDR [30], PIK3R1 [31], CXCL12 [32], NFKB1 [33] and PLK1 [34]. Using gene PIK3R1 as an example, PIK3R1 has an association signal score 1.82 from paired t-test and ranks #5062. After prioritization using GeneWanderer, PIK3R1 ranks #549 with an interaction ratio score of 2.54. After prioritization using DiSNEP, PIK3R1 ranks #95 with an interaction ratio score of 6.09. Again, this suggests that the KIRP-enhanced network SE more accurately reflects the strong interactions between PIK3R1 and other genes, which helps boosting the PIK3R1 gene signal.

TCGA THCA cancer

We similarly investigated prioritized THCA-associated genes as the second example.

Table 4 shows that among the top 100 gene expression (DNA methylation) signal genes by paired t-test, GeneWanderer and DiSNEP, 8 (8), 49 (52) and 54 (58) genes are associated with thyroid carcinoma according to DisGeNET [17]. The pathway enrichment analyses also show that DiSNEP-prioritized top genes achieve the most significant enrichment P-values among the three methods not only for two general cancer pathways, ‘pathway in cancer’ and ‘proteoglycans in cancer’, but also for the thyroid cancer specific pathway, ‘thyroid cancer’. Among the top 100 gene expression signal genes by DiSNEP only, there are several important THCA genes, such as ATM [35], FYN [36], PIK3CA [37], FGF2 [38] and NRAS [39]. We similarly examined gene PIK3CA. PIK3CA has an association signal score 1.99 from paired t-test and ranks #10428. After prioritization using GeneWanderer, PIK3CA ranks #148 with an interaction ratio score of 4.08. After prioritization using the THCA-enhanced network, PIK3CA ranks #71 with an interaction ratio score of 6.29. This suggests that the THCA-enhanced network SE more accurately reflects the strong interactions between PIK3CA and other genes, which helps boosting the PIK3CA gene signal. Similarly, among the top 100 DNA methylation signal genes by DiSNEP only, there are important THCA genes, such as ATM [35], PIK3CA [37], NRAS [39], CXCR4 [40], STAT1 [41], MMP9 [42] and PPARG1 [43]. We examined the gene CXCR4. CXCR4 has an association signal score 3.98 from paired t-test and ranks #7320. After prioritization using GeneWanderer, CXCR4 ranks #172 with an interaction ratio score of 3.78. After prioritization using the THCA-enhanced network, CXCR4 ranks #82 with an interaction ratio score of 6.28. This similarly suggests that the THCA-enhanced network SE more accurately reflects the strong interactions between CXCR4 and other genes, which helps boosting the CXCR4 gene signal.

Table 4

Results for TCGA THCA cancer when gene expression signal genes and DNA methylation signal genes are prioritized using gene expression-enhanced THCA-specific network and DNA methylation-enhanced THCA-specific network, respectively. Displayed are the number of thyroid carcinoma-associated genes according to DisGeNETa and the enrichment P-values of ‘pathway in cancer’, ‘proteoglycans in cancer’ and ‘thyroid cancer’ from KEGG among top-ranked 100 signal genes for the three methods

Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of thyroid carcinoma-associated genes8495485258
Enrichment P-value of ‘pathway in cancer’NS1.8 × 10−239.7 × 10−28NS4.5 × 10−287.8 × 10−39
Enrichment P-value of ‘proteoglycans in cancer’NS1.1 × 10−231.9 × 10−26NS3.3 × 10−268.2 × 10−29
Enrichment P-value of ‘thyroid cancer’NS6.9 × 10−91.2 × 10−11NS5.9 × 10−91.1 × 10−11
Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of thyroid carcinoma-associated genes8495485258
Enrichment P-value of ‘pathway in cancer’NS1.8 × 10−239.7 × 10−28NS4.5 × 10−287.8 × 10−39
Enrichment P-value of ‘proteoglycans in cancer’NS1.1 × 10−231.9 × 10−26NS3.3 × 10−268.2 × 10−29
Enrichment P-value of ‘thyroid cancer’NS6.9 × 10−91.2 × 10−11NS5.9 × 10−91.1 × 10−11

NS, not significant.

aThere are 720 thyroid carcinoma-associated genes according to DisGeNET [17].

Table 4

Results for TCGA THCA cancer when gene expression signal genes and DNA methylation signal genes are prioritized using gene expression-enhanced THCA-specific network and DNA methylation-enhanced THCA-specific network, respectively. Displayed are the number of thyroid carcinoma-associated genes according to DisGeNETa and the enrichment P-values of ‘pathway in cancer’, ‘proteoglycans in cancer’ and ‘thyroid cancer’ from KEGG among top-ranked 100 signal genes for the three methods

Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of thyroid carcinoma-associated genes8495485258
Enrichment P-value of ‘pathway in cancer’NS1.8 × 10−239.7 × 10−28NS4.5 × 10−287.8 × 10−39
Enrichment P-value of ‘proteoglycans in cancer’NS1.1 × 10−231.9 × 10−26NS3.3 × 10−268.2 × 10−29
Enrichment P-value of ‘thyroid cancer’NS6.9 × 10−91.2 × 10−11NS5.9 × 10−91.1 × 10−11
Gene expression signalsDNA methylation signals
Paired t-testGeneWandererDiSNEPPaired t-testGeneWandererDiSNEP
Number of thyroid carcinoma-associated genes8495485258
Enrichment P-value of ‘pathway in cancer’NS1.8 × 10−239.7 × 10−28NS4.5 × 10−287.8 × 10−39
Enrichment P-value of ‘proteoglycans in cancer’NS1.1 × 10−231.9 × 10−26NS3.3 × 10−268.2 × 10−29
Enrichment P-value of ‘thyroid cancer’NS6.9 × 10−91.2 × 10−11NS5.9 × 10−91.1 × 10−11

NS, not significant.

aThere are 720 thyroid carcinoma-associated genes according to DisGeNET [17].

Table 5

Reproducibility results using TCGA LIHC DNA methylation data and CUIMC HCC DNA methylation data

Paired t-testGeneWandererDiSNEP
No. of overlapping genes between the two sets of top-ranked 100 prioritized genes using the TCGA LIHC and CUIMC HCC data239176
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the TCGA LIHC data24346
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the CUIMC HCC data34046
No. (%) of overlapping liver cancer-associated genes using the TCGA LIHC and CUIMC HCC data0 out of 23 (0%)38 out of 91 (42%)36 out of 76 (47%)
Paired t-testGeneWandererDiSNEP
No. of overlapping genes between the two sets of top-ranked 100 prioritized genes using the TCGA LIHC and CUIMC HCC data239176
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the TCGA LIHC data24346
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the CUIMC HCC data34046
No. (%) of overlapping liver cancer-associated genes using the TCGA LIHC and CUIMC HCC data0 out of 23 (0%)38 out of 91 (42%)36 out of 76 (47%)
Table 5

Reproducibility results using TCGA LIHC DNA methylation data and CUIMC HCC DNA methylation data

Paired t-testGeneWandererDiSNEP
No. of overlapping genes between the two sets of top-ranked 100 prioritized genes using the TCGA LIHC and CUIMC HCC data239176
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the TCGA LIHC data24346
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the CUIMC HCC data34046
No. (%) of overlapping liver cancer-associated genes using the TCGA LIHC and CUIMC HCC data0 out of 23 (0%)38 out of 91 (42%)36 out of 76 (47%)
Paired t-testGeneWandererDiSNEP
No. of overlapping genes between the two sets of top-ranked 100 prioritized genes using the TCGA LIHC and CUIMC HCC data239176
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the TCGA LIHC data24346
No. of liver cancer-associated genes out of the top-ranked 100 prioritized genes based on DisGeNET using the CUIMC HCC data34046
No. (%) of overlapping liver cancer-associated genes using the TCGA LIHC and CUIMC HCC data0 out of 23 (0%)38 out of 91 (42%)36 out of 76 (47%)

TCGA STAD, LIHC and PRAD cancers

We obtained similar results and reached similar conclusions for STAD, LIHC and PRAD. Details are included in Supplementary Tables S1–S3. The analysis results of interaction ratio scores for five cancer types are included in Table S5.

Reproducibility of DiSNEP

To investigate whether the prioritization results of DiSNEP are reproducible, we applied DiSNEP and the two comparison methods to independent 450K DNA methylation data of tumor and adjacent normal tissues of HCC from CUIMC GES54503 [44]. After applying the same quality control steps, the CUIMC HCC DNA methylation data have 412 287 CpGs from 23 809 genes for 66 matched tumor and adjacent normal pairs. Note that the TCGA LIHC DNA methylation data have 337 483 CpGs from 23 233 genes for 50 matched tumor and adjacent normal pairs. We compared the two sets of top-ranked 100 prioritized genes using TCGA LIHC and CUIMC HCC DNA methylation data and summarized results in Table 5. Among the two sets of top-ranked 100 prioritized genes, there are 91 overlapping genes using GeneWanderer and 76 overlapping genes using DiSNEP. It is expected that network-assisted gene prioritization methods with a general network will reproduce more. When we further examined the 76 overlapping prioritized genes using DiSNEP based on the DisGeNET database [3], 36 (47%) are liver cancer-associated. While out of the 91 overlapping prioritized genes using GeneWanderer, 38 (42%) are liver cancer-associated. This suggests that the prioritization results by DiSNEP and GeneWanderer are reproducible, when DiSNEP reproduces slightly more cancer-associated genes based on DisGeNET.

Specificity of DiSNEP

We also investigated if genes prioritized by DiSNEP for a disease are indeed disease-specific.

To do so, we conducted additional enrichment analyses using all terms of the Disease Ontology (DO) database (May 2020 release) [46].

We first used top 100 DiSNEP-prioritized genes for enrichment analysis on all DO terms and selected top 20 most significantly enriched DO terms. We then extracted these 20 DO terms’ enrichment P-values using the top 100 GeneWanderer-prioritized genes. Supplementary Tables S8–S12 summarize these enrichment results of the top 20 enriched DO terms for five TCGA cancers. We can clearly see that most (≥80%) of the top 20 enriched DO terms are cancer terms. For all the enriched cancer DO terms, DiSNEP achieves the most significant enrichment P-values among the three methods. Since different cancer DO terms share many common genes, these results suggest that genes prioritized by DiSNEP are disease-specific or, in this case, cancer-specific, although specific cancer DO terms such as ‘Kidney cancer’ or ‘Thyroid cancer’ are not in the top 20 enriched DO terms.

We then used top 100 GeneWanderer-prioritized genes for enrichment analysis on all DO terms and selected top 20 most significantly enriched DO terms. We then extracted these 20 DO terms’ enrichment P-values using the top 100 DiSNEP-prioritized genes. Supplementary Tables S13–S17 summarize these enrichment results of the top 20 enriched DO terms for five TCGA cancers. We can clearly see that DiSNEP achieves the most significant enrichment P-values among the three methods for all enriched cancer DO terms but not for some of the enriched non-cancer DO terms, such as ‘connective tissue disease’, ‘musculoskeletal system disease’, ‘immune system disease’, ‘hypersensitivity reaction disease’ and ‘bone disease’ for KIRP. This also suggests that genes prioritized by DiSNEP are disease-specific, or cancer-specific in this case.

Next, we examined the DO terms with the names of the TCGA cancers we investigated: ‘Kidney cancer’, ‘Thyroid cancer’, ‘Stomach cancer’, ‘Liver cancer’, and ‘Prostate cancer’. Using the top 100 DiSNEP-prioritized genes, the enrichment P-values for these five DO cancer terms are the most significant among the three methods (Supplementary Table S18).

Finally, using the top 100 DiSNEP-prioritized genes, the top 100 GeneWanderer-prioritized genes and the top 100 genes by t-test, we examined the number of enriched non-cancer DO terms out of the top 20 most significantly enriched DO terms for five TCGA cancers (Supplementary Table S19). It is clear that there are fewer non-cancer DO terms out of the top 20 enriched DO terms using top 100 DiSNEP-prioritized genes than those enriched using top 100 GeneWanderer-prioritized or top 100 genes by t-test. These results again suggest that genes prioritized by DiSNEP are disease-specific, or cancer-specific in this case.

Discussion

In this article, we proposed DiSNEP, a framework to enhance a general human gene network into a disease-specific network in order to improve prioritization of candidate genes associated with this disease. Network-assisted methods have been proven to be useful in prioritizing candidate disease genes. Several comprehensive gene networks were developed. However, these networks are not for specific diseases, thus interactions among genes in these networks are not necessarily reflecting true interactions between genes for specific diseases. This may lead to a reduced accuracy when prioritizing candidate disease genes. Our proposed DiSNEP framework addresses this problem by first enhancing a general gene network into a disease-specific network using a gene similarity matrix generated from disease omics data through a random walk process. The enhanced disease-specific network is then used to prioritize candidate genes associated with the disease by diffusion. In simulation studies, the proposed DiSNEP framework has improved performance than that using a general gene network. We applied DiSNEP to prioritize gene expression signal genes and DNA methylation signal genes associated with five different cancer types from TCGA using gene expression-enhanced cancer-specific networks and DNA methylation-enhanced cancer-specific networks, respectively. Results suggest that more prioritized genes by DiSNEP are cancer-related than those by a general gene network or without prioritization consistently across the five cancers considered when we used the DisGeNET database as the gold standards for cancer-associated genes.

We also considered more restricted criteria to define gold standards for cancer-associated genes based on the DisGeNET database. We investigated three DisGeNET criteria, namely (1) the global disease association (GDA) score, (2) the number of reference papers and (3) the disease-specific index (DSI) to see how changes in DisGeNET gold standards affect prioritization results. We found that DiSNEP consistently prioritizes more cancer-associated genes out of the top 100 prioritized genes with different DisGeNET criteria to define true cancer-associated genes as gold standards. Detailed results are included in Part E in the Supplementary Material and Supplementary Figure S2. Moreover, the replication analysis to independent 450K DNA methylation data of CUIMC HCC tumor and adjacent normal tissues suggests that the prioritization results by DiSNEP are reproducible. One limitation is that the enhanced disease-specific network is based on STRING, while evaluations of DiSNEP are based on the DisGeNET database. It is possible that some disease associations are included in both STRING and DisGeNET, making it easy for DiSNEP to identify these associations. More rigorous evaluation procedures may be taken to evaluate the performance of the DiSNEP method.

We demonstrated that using either gene expression data or DNA methylation data to capture correlations/interactions among genes for a specific cancer type can help improving prioritization genes associated with the cancer. DiSNEP can be readily applied to use other types of omics data or multiple types of omics data jointly, which potentially could better capture correlations/interactions among genes for a disease to further improve the enhancement of a general gene network into a disease-specific network. Note that biological implications of multi-omics data should be accounted carefully when using multi-omics data for enhancement. In both simulation studies and real data applications, we used the same omics data to enhance a general gene network and to define gene signal scores. We also investigated using one type of omics data to enhance a general gene network, with which to enhance signal scores obtained using a different type of omics data. Supplementary Table S6 shows the gene expression signal genes prioritized using gene expression-enhanced cancer-specific networks or DNA methylation-enhanced cancer-specific networks and the DNA methylation signal genes prioritized using DNA methylation-enhanced cancer-specific networks or gene expression-enhanced cancer-specific networks. We can see that the difference is modest. More investigations are needed when trying to use multi-omics data to enhance a general gene network into a disease-specific gene network, with which to prioritize association signals. Since a small subset of cancer genes, such as those in the Cancer Gene Census (CGC) [45], have been causally implicated using experiments, and dysfunctions of these genes explain how they drive cancers, we could take advantages of these information and further connect these genes in an enhanced cancer-specific network after enhancing with disease omics data.

Although we focused on cancers and enhanced a general network into cancer-specific networks to prioritize genes associated with different cancer types, the DiSNEP framework is general and can be readily applied to other diseases. The developed DiSNEP framework is implemented in an R package, which is freely available at https://github.com/pfruan/DiSNEP.

Key Points
  • The proposed DiSNEP framework enhances a general human gene network into a network for a specific disease that better reflects true gene interactions for the disease, which subsequently improves network-assisted candidate gene prioritization.

  • The proposed DiSNEP framework is able to prioritize more cancer-related genes than those by network-assisted methods with a general network.

  • The proposed DiSNEP framework consistently improves cancer gene prioritization across all five cancer types considered.

  • The prioritization results by the proposed DiSNEP framework are reproducible and are indeed disease-specific.

Peifeng Ruan was a doctoral student at the George Washington University, USA and is a postdoc associate at Yale University, USA.

Shuang Wang is a Professor at Columbia University, USA.

References

1.

Huang
JK
,
Carlin
DE
,
Yu
MK
, et al.
Systematic evaluation of molecular networks for discovery of disease genes
.
Cell Syst
2018
;
6
:
484
95
.

2.

Jiang
P
,
Wang
H
,
Li
W
, et al.
Network analysis of gene essentiality in functional genomics experiments
.
Genome Biol
2015
;
16
:
239
.

3.

Gwinner
F
,
Boulday
G
,
Vandiedonck
C
, et al.
Network-based analysis of omics data: the LEAN method
.
Bioinformatics
2017
;
33
:
701
9
.

4.

Lee
I
,
Blom
UM
,
Wang
PI
, et al.
Prioritizing candidate disease genes by network-based boosting of genome-wide association data
.
Genome Res
2011
;
21
:
1109
21
.

5.

Jia
P
,
Zheng
S
,
Long
J
, et al.
dmGWAS: dense module searching for genome-wide association studies in protein–protein interaction networks
.
Bioinformatics
2011
;
27
:
95
102
.

6.

Ruan
P
,
Shen
J
,
Santella
RM
, et al.
NEpiC: a network-assisted algorithm for epigenetic studies using mean and variance combined signals
.
Nucleic Acids Res
2016
;
44
:
e134
4
.

7.

Jones
A
,
Teschendorff
AE
,
Li
Q
, et al.
Role of DNA methylation and epigenetic silencing of HAND2 in endometrial cancer development
.
PLoS Med
2013
;
10
:
e1001551
.

8.

Guala
D
,
Sjolund
E
,
Sonnhammer
EL
, et al.
MaxLink: network-based prioritization of genes tightly linked to a disease seed set
.
Bioinformatics
2014
;
30
:
2689
90
.

9.

Kohler
S
,
Bauer
S
,
Horn
D
, et al.
Walking the interactome for prioritization of candidate disease genes
.
Am J Hum Genet
2008
;
82
:
949
58
.

10.

Leiserson
MD
,
Vandin
F
,
Wu
HT
, et al.
Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes
.
Nat Genet
2015
;
47
:
106
14
.

11.

Moreau
Y
,
Tranchevent
LC
.
Computational tools for prioritizing candidate genes: boosting disease gene discovery
.
Nat Rev Genet
2012
;
13
:
523
36
.

12.

Tabor
HK
,
Risch
NJ
,
Myers
RM
.
Candidate-gene approaches for studying complex genetic traits: practical considerations
.
Nat Rev Genet
2002
;
3
:
391
7
.

13.

Szklarczyk
D
,
Gable
AL
,
Lyon
D
, et al.
STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
.
Nucleic Acids Res
2018
;
47
:
D607
13
.

14.

Greene
CS
,
Krishnan
A
,
Wong
AK
, et al.
Understanding multicellular function and disease with human tissue-specific networks
.
Nat Genet
2015
;
47
:
569
76
.

15.

Herwig
R
,
Hardt
C
,
Lienhard
M
, et al.
Analyzing and interpreting genome data at the network level with ConsensusPathDB
.
Nat Protoc
2016
;
11
:
1889
907
.

16.

Nam
Y
,
Jhee
JH
,
Cho
J
, et al.
Disease gene identification based on generic and disease-specific genome networks
.
Bioinformatics
2018
;
35
:
1923
30
.

17.

Piñero
J
,
Ramírez-Anguita
JM
,
Saüch-Pitarch
J
, et al.
The DisGeNET knowledge platform for disease genomics: 2019 update
.
Nucleic Acids Res
2020
;
48
:
D845
55
.

18.

Hofree
M
,
Shen
JP
,
Carter
H
, et al.
Network-based stratification of tumor mutations
.
Nat Methods
2013
;
10
:
1108
15
.

19.

Bersanelli
M
,
Mosca
E
,
Remondini
D
, et al.
Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules
.
Sci Rep
2016
;
6
:
34841
.

20.

Hwang
S
,
Kim
CY
,
Yang
S
, et al.
HumanNet v2: human gene networks for disease research
.
Nucleic Acids Res
2018
;
47
:
D573
80
.

21.

Troyanskaya
O
,
Cantor
M
,
Sherlock
G
, et al.
Missing value estimation methods for DNA microarrays
.
Bioinformatics
2001
;
17
:
520
5
.

22.

Johnson
WE
,
Li
C
,
Rabinovic
A
.
Adjusting batch effects in microarray expression data using empirical Bayes methods
.
Biostatistics
2007
;
8
:
118
27
.

23.

Pidsley
R
,
Wong
CC
,
Volta
M
, et al.
A data-driven approach to preprocessing Illumina 450K methylation array data
.
BMC Genomics
2013
;
14
:
293
.

24.

Kanehisa
M
,
Goto
SKEGG
.
Kyoto encyclopedia of genes and genomes
.
Nucleic Acids Res
2000
;
28
:
27
30
.

25.

Guo
H
,
German
P
,
Bai
S
, et al.
The PI3K/AKT pathway and renal cell carcinoma
.
J Genet Genomics
2015
;
42
:
343
53
.

26.

Ping
JG
,
Wang
F
,
Pu
JX
, et al.
The expression of Cullin1 is increased in renal cell carcinoma and promotes cancer cell proliferation, migration, and invasion
.
Tumor Biol
2016
;
37
:
12823
31
.

27.

Li
X
,
Ma
X
,
Chen
L
, et al.
Prognostic value of CD44 expression in renal cell carcinoma: a systematic review and meta-analysis
.
Sci Rep
2015
;
5
:
13157
.

28.

Mikami
S
,
Mizuno
R
,
Kosaka
T
, et al.
Expression of TNF-α and CD 44 is implicated in poor prognosis, cancer cell invasion, metastasis and resistance to the sunitinib treatment in clear cell renal cell carcinomas
.
Int J Cancer
2015
;
136
:
1504
14
.

29.

Noon
AP
,
Vlatković
N
,
Polański
R
, et al.
p53 and MDM2 in renal cell carcinoma: biomarkers for disease progression and future therapeutic targets?
Cancer
2010
;
116
:
780
90
.

30.

Puerto-Nevado
LD
,
Rojo
F
,
Zazo
S
, et al.
Active angiogenesis in metastatic renal cell carcinoma predicts clinical benefit to sunitinib-based therapy
.
Br J Cancer
2014
;
110
:
2700
7
.

31.

Lin
Y
,
Yang
Z
,
Xu
A
, et al.
PIK3R1 negatively regulates the epithelial-mesenchymal transition and stem-like phenotype of renal cancer cells through the AKT/GSK3β/CTNNB1 signaling pathway
.
Sci Rep
2015
;
5
:
8997
.

32.

Schrader
AJ
,
Lechner
O
,
Templin
M
, et al.
CXCR4/CXCL12 expression and signalling in kidney cancer
.
Br J Cancer
2002
;
86
:
1250
6
.

33.

Morais
C
,
Gobe
G
,
Johnson
DW
, et al.
The emerging role of nuclear factor kappa B in renal cell carcinoma
.
Int J Biochem Cell Biol
2011
;
43
:
1537
49
.

34.

Zhang
G
,
Zhang
Z
,
Liu
Z
.
Polo-like kinase 1 is overexpressed in renal cancer and participates in the proliferation and invasion of renal cancer cells
.
Tumor Biol
2013
;
34
:
1887
94
.

35.

Choi
M
,
Kipps
T
,
Kurzrock
R
.
ATM mutations in cancer: therapeutic implications
.
Mol Cancer Ther
2016
;
15
:
1781
91
.

36.

Zheng
J
,
Li
H
,
Xu
D
, et al.
Upregulation of tyrosine kinase FYN in human thyroid carcinoma: role in modulating tumor cell proliferation, invasion, and migration
.
Cancer Biother Radiopharm
2017
;
32
:
320
6
.

37.

García-Rostán
G
,
Costa
AM
,
Pereira-Castro
I
, et al.
Mutation of the PIK3CA gene in anaplastic thyroid cancer
.
Cancer Res
2005
;
65
:
10199
207
.

38.

Eggo
MC
,
Hopkins
JM
,
Franklyn
JA
, et al.
Expression of fibroblast growth factors in thyroid cancer
.
J Clin Endocrinol Metab
1995
;
80
:
1006
11
.

39.

Xing
M
.
Clinical utility of RAS mutations in thyroid cancer: a blurred picture now emerging clearer
.
BMC Med
2016
;
14
:
12
.

40.

Zhu
X
,
Bai
Q
,
Lu
Y
, et al.
Expression and function of CXCL12/CXCR4/CXCR7 in thyroid cancer
.
Int J Oncol
2016
;
48
:
2321
9
.

41.

Hwang
ES
,
Kim
DW
,
Hwang
JH
, et al.
Regulation of signal transducer and activator of transcription 1 (STAT1) and STAT1-dependent genes by RET/PTC (rearranged in transformation/papillary thyroid carcinoma) oncogenic tyrosine kinases
.
Mol Endocrinol
2004
;
18
:
2672
84
.

42.

Zarkesh
M
,
Zadeh-Vakili
A
,
Akbarzadeh
M
, et al.
The role of matrix metalloproteinase-9 as a prognostic biomarker in papillary thyroid cancer
.
BMC Cancer
2018
;
18
:
1199
.

43.

Raman
P
,
Koenig
RJ
.
Pax-8–PPAR-γ fusion protein in thyroid carcinoma
.
Nat Rev Endocrinol
2014
;
10
:
616
.

44.

Shen
J
,
Wang
S
,
Zhang
YJ
, et al.
Exploring genome-wide DNA methylation profiles altered in hepatocellular carcinoma using Infinium HumanMethylation 450 BeadChips
.
Epigenetics
2013
;
8
:
34
43
.

45.

Sondka
Z
,
Bamford
S
,
Cole
CG
, et al.
The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers
.
Nat Rev Cancer
2018
;
18
:
696
705
.

46.

Schriml
LM
,
Mitraka
E
,
Munro
J
, et al.
Human Disease Ontology 2018 update: classification, content and workflow expansion
.
Nucleic Acids Res
2018
;
47
:
D955
62
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)