Abstract

Summary: Correlating disease mutations with clinical and phenotypic information such as drug response or patient survival is an important goal of personalized cancer genomics and a first step in biomarker discovery. HyperModules is a network search algorithm that finds frequently mutated gene modules with significant clinical or phenotypic signatures from biomolecular interaction networks.

Availability and implementation: HyperModules is available in the Cytoscape App Store and as a command line tool at www.baderlab.org/Sofware/HyperModules.

Contact: Juri.Reimand@utoronto.ca or Gary.Bader@utoronto.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Establishing functional links between genetic variation and human disease is a key goal of cancer genome sequencing (Gonzalez-Perez et al., 2013) and genome-wide association studies (Hardy and Singleton, 2009). Complex diseases like cancer are often driven by infrequent changes in multiple genes in pathways (Vogelstein et al., 2013). Network analysis helps interpret mutations in systems context and find disease genes, pathways and biomarkers for precision medicine (Barabasi et al., 2011).

Discovery of modules (subnetworks) in biological networks helps isolate systems with disease-related properties and reduces interactome complexity. A growing number of methods are available for this purpose. A landmark paper combines gene expression signatures with protein–protein interactions (PPI) to find predictive modules of cancer outcome (Chuang et al., 2007). The NETBAG method studies genetic associations and copy number variants to find autism-related modules (Gilman et al., 2011). HotNet detects frequently mutated pathways in networks (Vandin et al., 2011, 2012). Net-Cox builds prognostic cancer signatures in network analysis of gene expression data (Zhang et al., 2013). The Reactome FI Cytoscape plugin uncovers prognostic gene modules from networks and gene expression data (Wu and Stein, 2012). Network-based stratification predicts tumor subtypes from mutations in network regions (Hofree et al., 2013). Such modules maximize a feature of genes such as differential expression, disease mutation frequency or enrichment of interactions.

Because clinical profiles of patients are increasingly available in cancer genomics efforts such as The Cancer Genome Atlas (TCGA) pan-cancer project (Weinstein et al., 2013), new methods are needed to discover multivariate biomarkers in networks. We recently analyzed cancer mutations in phosphorylation signaling and found that kinase–substrate networks are informative of patient survival and therapy response (Reimand and Bader, 2013; Reimand et al., 2013). In particular, we found network modules with rare mutations in ovarian cancer patients with improved prognosis. We created the HyperModules method to systematically discover clinically correlated modules from gene and protein networks (Reimand and Bader, 2013) based on our earlier work on functional subnetwork discovery (Altmae et al., 2012; Reimand et al., 2008). Here we present the previously unavailable software in open-source Java as a command line tool for automated work and a Cytoscape app for interactive graphical analysis.

2 SOFTWARE

HyperModules assumes that clinically informative mutations of complex disease occur in systems of closely interacting genes. The greedy network search algorithm focuses on a local network area, defined by a central seed node (a mutated gene) and its surrounding subnetwork. All mutated genes are sequentially considered as seeds in module discovery. Search starts from the seed and grows the module by adding connected genes that best improve clinical significance. This objective is driven by statistical tests in which patients defined by the module are compared with other patients. Categorical clinical variables are studied with Fisher’s exact test and survival times with the log-rank test. Cox regression is currently not supported; however, we plan to add this feature in the future. To establish statistical significance of detected modules, we build a null distribution by searching networks with permuted gene names. Each module of the true network is quantified with an empirical P-value reflecting the fraction of seed-specific modules from shuffled networks exceeding the significance of the true module. This removes artifacts of the greedy strategy and corrects for topological features such as highly connected nodes.
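The greedy growth step for a categorical variable can be illustrated with a minimal Python sketch. It is not the HyperModules implementation (which is written in Java); the function names, the one-sided form of Fisher's exact test, the stopping rule and the toy data are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` outcome-positive patients inside the module.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def module_pvalue(module, mutations, positives, all_patients):
    """Compare patients mutated in any module gene with all other patients."""
    inside = set()
    for gene in module:
        inside |= mutations.get(gene, set())
    a = len(inside & positives)                  # in module, outcome present
    b = len(inside - positives)                  # in module, outcome absent
    c = len(positives - inside)                  # outside, outcome present
    d = len(all_patients - inside - positives)   # outside, outcome absent
    return fisher_exact_greater(a, b, c, d)


def grow_module(seed, network, mutations, positives, all_patients):
    """Greedily add the neighbouring gene that most improves the P-value."""
    module = {seed}
    best_p = module_pvalue(module, mutations, positives, all_patients)
    while True:
        neighbours = set()
        for gene in module:
            neighbours |= network.get(gene, set())
        neighbours -= module
        scored = [
            (module_pvalue(module | {g}, mutations, positives, all_patients), g)
            for g in sorted(neighbours)
        ]
        if not scored:
            break
        p, gene = min(scored)
        if p >= best_p:  # assumed stopping rule: no neighbour improves significance
            break
        module.add(gene)
        best_p = p
    return module, best_p


# Toy data, invented for illustration only.
network = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B"}, "D": {"A"}}
mutations = {"A": {"p1"}, "B": {"p2", "p3"}, "C": {"p5", "p6"}, "D": {"p4"}}
all_patients = {f"p{i}" for i in range(1, 9)}
positives = {"p1", "p2", "p3", "p4"}  # e.g. patients alive at last follow-up

module, p_value = grow_module("A", network, mutations, positives, all_patients)
print(sorted(module), p_value)
```

In this toy network, gene C is rejected because its mutations occur only in outcome-negative patients, so adding it would dilute the signal of the module.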

The analysis pipeline is outlined in Figure 1. Interaction networks are loaded into Cytoscape using standard features (Shannon et al., 2003). HyperModules requires gene mutations and patient clinical information in two tables. The user selects the type of clinical analysis and the corresponding columns in the data table (survival time or a categorical variable such as tumor relapse). Survival analysis requires follow-up time and vital status of patients. Detected modules are studied further with network visualization, survival curves and data export. We tested HyperModules on protein networks to find survival modules with cancer mutations. We extracted three human PPI networks of variable size from iRefWeb (Turner et al., 2010), and five cancer mutation datasets from the International Cancer Genome Consortium portal version 12 (Supplementary Fig. S1). For example, network analysis of 30 000 interactions with 121 liver cancer patients, 686 mutated genes and 10 000 permutations takes 10 min on an 8-core computer with 16 GB RAM. HyperModules is thus applicable to a range of networks and mutation datasets.
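To make the survival comparison concrete, the following Python sketch splits patients by a candidate module using the two input tables and applies a two-group log-rank test to follow-up time and vital status. The table layouts, function names and toy values are assumptions for illustration and do not reflect the file formats or code of HyperModules itself.

```python
from math import erfc, sqrt


def split_by_module(module, mutation_rows, clinical_rows):
    """Partition patients into those mutated in any module gene and the rest.

    mutation_rows: (patient, gene) pairs; clinical_rows: (patient, time, died).
    """
    mutated = {patient for patient, gene in mutation_rows if gene in module}
    inside = [(t, died) for patient, t, died in clinical_rows if patient in mutated]
    outside = [(t, died) for patient, t, died in clinical_rows if patient not in mutated]
    return inside, outside


def logrank_pvalue(group_a, group_b):
    """Two-group log-rank test on lists of (follow_up_time, died) pairs."""
    both = group_a + group_b
    event_times = sorted({t for t, died in both if died})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)   # at risk in group A
        n_b = sum(1 for time, _ in group_b if time >= t)   # at risk in group B
        d_a = sum(1 for time, died in group_a if died and time == t)
        d_b = sum(1 for time, died in group_b if died and time == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom


# Toy tables, invented for illustration only.
mutation_rows = [("p1", "G1"), ("p2", "G1"), ("p3", "G2"), ("p4", "G2"), ("p5", "G2")]
clinical_rows = [
    ("p1", 1, True), ("p2", 2, True), ("p3", 3, True), ("p4", 4, True), ("p5", 5, True),
    ("p6", 10, True), ("p7", 11, True), ("p8", 12, True), ("p9", 13, True), ("p10", 14, True),
]

inside, outside = split_by_module({"G1", "G2"}, mutation_rows, clinical_rows)
print(logrank_pvalue(inside, outside))
```

Censored patients (vital status alive at last follow-up) enter the at-risk counts until their follow-up time and never count as events, which is why both columns are needed.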

Fig. 1. HyperModules requires three inputs: (1) mutated genes in patients, (2) patient clinical information and (3) a protein or gene network. Search is performed for all mutated genes as seeds (4). Network visualization, clinical variable statistics and data export facilitate further analysis (5).

3 EXAMPLE ANALYSIS

An example dataset is provided in Supplementary File S1. It comprises 183 ovarian cancer patients from the TCGA study (Cancer Genome Atlas Research Network, 2008) and the network of 4823 kinase–substrate interactions from our earlier study (Reimand and Bader, 2013). The ovarian cancer mutations are restricted to 163 proteins with single nucleotide variants affecting protein phosphorylation sites or kinase domains. Two sets of modules were computed with 10 000 network permutations and are shown in Supplementary Figures S2–S3. First, the search for survival correlations in the kinase–substrate network with the log-rank test identified 19 modules, where associated patients have significantly different survival rates compared with other patients in the cohort (empirical P ≤ 0.05). Second, the categorical variable search with Fisher’s exact test revealed five modules with significant enrichment of alive patients (empirical P ≤ 0.05). The modules are also summarized in Supplementary Table S1.
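The empirical P-values quoted above come from network permutations. A minimal Python sketch of that idea follows, assuming that gene labels are shuffled over a fixed topology and that the empirical P-value is the fraction of permuted-network results at least as significant as the true one. The `search` callable, the toy scoring function and the handling of ties are hypothetical; the actual tool may differ in detail.

```python
import random


def shuffle_gene_names(network, rng):
    """Return a copy of the network with gene labels permuted, topology kept."""
    genes = sorted(set(network).union(*network.values()))
    shuffled = genes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(genes, shuffled))
    return {relabel[g]: {relabel[n] for n in nbrs} for g, nbrs in network.items()}


def empirical_pvalue(true_p, null_ps):
    """Fraction of null P-values at least as small as the observed P-value."""
    hits = sum(1 for p in null_ps if p <= true_p)
    return hits / len(null_ps)


def permutation_test(seed_gene, network, search, n_permutations=1000, rng_seed=0):
    """Score one seed on the true network and on label-shuffled networks.

    `search` is any callable (seed_gene, network) -> module P-value.
    """
    rng = random.Random(rng_seed)
    true_p = search(seed_gene, network)
    null_ps = [
        search(seed_gene, shuffle_gene_names(network, rng))
        for _ in range(n_permutations)
    ]
    return empirical_pvalue(true_p, null_ps)


def toy_search(seed_gene, network):
    """Stand-in scoring function (hypothetical): better-connected seeds score lower."""
    return 1.0 / (1 + len(network.get(seed_gene, ())))


# Toy star network, invented for illustration only.
star = {"H": {"a", "b", "c", "d"}, "a": {"H"}, "b": {"H"}, "c": {"H"}, "d": {"H"}}
result = permutation_test("H", star, toy_search)
print(result, result <= 0.05)  # the second value mirrors the empirical P <= 0.05 cut-off
```

Because every permuted network keeps the same degree distribution, a seed that scores well only by sitting on a hub is matched by many permutations and receives a large empirical P-value.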

4 DISCUSSION

HyperModules is a biological network-mining algorithm that reveals modules of interacting genes with clinically informative disease mutations. Diverse biomolecular interaction networks can be analyzed, including PPI networks, gene regulatory networks and curated biological pathways. Disease mutations are also broadly defined. Although we initially studied cancer point mutations, other types of alterations such as copy number and gene expression changes can be used. HyperModules finds correlations with groups of genes where mutations may be infrequent but the signature strengthens through network integration. Such modules are not often directly usable as biomarkers because of small sample size; however, we believe that our approach helps discover genes and pathways as potential multivariate biomarkers for further experiments.

ACKNOWLEDGEMENTS

The authors thank Jason Montojo and Harold Rodriguez for Cytoscape help.

Funding: NRNB (US NIH NCRR grant P41 GM103504) and Google (Google Summer of Code 2013).

Conflict of Interest: none declared.

REFERENCES

Altmae S, et al. Research resource: interactome of human embryo implantation: identification of gene expression pathways, regulation, and integrated regulatory networks, Mol. Endocrinol., 2012, vol. 26 (pg. 203-217)
Barabasi AL, et al. Network medicine: a network-based approach to human disease, Nat. Rev. Genet., 2011, vol. 12 (pg. 56-68)
Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma, Nature, 2008, vol. 474 (pg. 609-615)
Chuang HY, et al. Network-based classification of breast cancer metastasis, Mol. Syst. Biol., 2007, vol. 3 pg. 140
Gilman SR, et al. Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses, Neuron, 2011, vol. 70 (pg. 898-907)
Gonzalez-Perez A, et al. Computational approaches to identify functional genetic variants in cancer genomes, Nat. Methods, 2013, vol. 10 (pg. 723-729)
Hardy J, Singleton A. Genomewide association studies and human disease, N. Engl. J. Med., 2009, vol. 360 (pg. 1759-1768)
Hofree M, et al. Network-based stratification of tumor mutations, Nat. Methods, 2013, vol. 10 (pg. 1108-1115)
Reimand J, Bader GD. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers, Mol. Syst. Biol., 2013, vol. 9 pg. 637
Reimand J, et al. GraphWeb: mining heterogeneous biological networks for gene modules with functional significance, Nucleic Acids Res., 2008, vol. 36 (pg. W452-W459)
Reimand J, et al. The mutational landscape of phosphorylation signaling in cancer, Sci. Rep., 2013, vol. 3 pg. 2651
Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Res., 2003, vol. 13 (pg. 2498-2504)
Turner B, et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence, Database (Oxford), 2010, vol. 2010 pg. baq023
Vandin F, et al. Algorithms for detecting significantly mutated pathways in cancer, J. Comput. Biol., 2011, vol. 18 (pg. 507-522)
Vandin F, et al. Discovery of mutated subnetworks associated with clinical data in cancer, Pac. Symp. Biocomput., 2012, vol. 1 (pg. 55-66)
Vogelstein B, et al. Cancer genome landscapes, Science, 2013, vol. 339 (pg. 1546-1558)
Weinstein JN, et al. The Cancer Genome Atlas Pan-Cancer analysis project, Nat. Genet., 2013, vol. 45 (pg. 1113-1120)
Wu G, Stein L. A network module-based method for identifying cancer prognostic signatures, Genome Biol., 2012, vol. 13 pg. R112
Zhang W, et al. Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment, PLoS Comput. Biol., 2013, vol. 9 pg. e1002975

Author notes

Associate Editor: Alfonso Valencia

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
