JAMI: fast computation of conditional mutual information for ceRNA network analysis

Hornakova, Andrea; List, Markus; Vreeken, Jilles; Schulz, Marcel H

doi:10.1093/bioinformatics/bty221

Abstract

Motivation

Genome-wide measurements of paired miRNA and gene expression data have enabled the prediction of competing endogenous RNAs (ceRNAs). It has been shown that the sponge effect mediated by protein-coding as well as non-coding ceRNAs can play an important regulatory role in the cell in health and disease. Therefore, many computational methods for the identification of ceRNAs have been suggested. In particular, methods based on Conditional Mutual Information (CMI) have shown promising results. However, the currently available implementation is slow and cannot be used to perform computations on a large scale.

Results

Here, we present JAMI, a Java tool that uses a non-parametric estimator for CMI values from gene and miRNA expression data. We show that JAMI speeds up the computation of ceRNA networks by a factor of ∼70 compared to currently available implementations. Further, JAMI supports multi-threading to make use of common multi-core architectures for further performance gain.

Requirements

Java 8.

Availability and implementation

JAMI is available as open-source software from https://github.com/SchulzLab/JAMI.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

MicroRNAs (miRNAs) are ∼23 nt long RNAs that play an important role in the regulation of transcript abundance in mammalian cells. They are estimated to regulate at least half of the genes in the human genome (Friedman et al., 2009) and thus affect important biological processes and show deregulation in many diseases (Jiang et al., 2009). Several miRNAs often regulate the same transcript in a combinatorial fashion and many transcripts are regulated by the same miRNAs, leading to complex genome-wide networks of co-regulation (Tsang et al., 2010). In these competing endogenous RNA (ceRNA) networks, ceRNA genes that carry binding sites for the same miRNA(s) compete over the limited pool of available miRNA molecules (Arvey et al., 2010; Salmena et al., 2011; Tay et al., 2014). Several examples of ceRNA crosstalk have already been verified, including many genes involved in cancer such as PTEN (Poliseno et al., 2010). This evidence has sparked interest in developing systematic methods for inferring ceRNA interactions from gene and miRNA expression data, as reviewed by Le et al. (2017).

With the emergence of large-scale studies providing gene and miRNA expression data for hundreds of samples, it has become possible to infer ceRNA interactions computationally and several approaches have been suggested to achieve this. Sumazin et al. proposed the use of conditional mutual information in their method HERMES (Sumazin et al., 2011), which was later implemented as part of the CUPID software package (CUPID step III) (Chiu et al., 2015). While this method was applied successfully for inferring ceRNA networks for approximately 450 000 gene pairs (Chiu et al., 2017), the current implementation is very slow and poses a bottleneck for the construction of large-scale networks.

This issue has motivated other researchers to design faster alternative approaches, for example methods based on linear correlation (Liu et al., 2017; Paci et al., 2014; Wang et al., 2015). However, in contrast to CUPID, the linearity assumption limits the accuracy of these methods (Le et al., 2017). We thus sought to speed up the computation of CMI values, the only known non-linear alternative, to facilitate the efficient construction of large-scale ceRNA networks.

2 Results and discussion

Here, we present JAMI, a novel implementation of the CMI computation step of CUPID (Chiu et al., 2015). Like CUPID, JAMI uses adaptive partitioning for estimating CMI values (Darbellay and Vajda, 1999). This non-parametric estimator is consistent and makes no assumption on the distribution of the data and can thus be used with expression data from any technology. JAMI uses efficient data structures in Java to implement the three-dimensional data partitioning for the computation of CMI values. In contrast to CUPID, JAMI was carefully designed to support multi-threading (Supplementary Fig. S1). In Figure 1, we show that JAMI achieves a substantially better single-threaded runtime compared to CUPID implemented in either Matlab or Java. For the latter comparison, we carefully re-implemented the original CUPID method in Java.
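To make the quantity being estimated concrete, the following is a minimal sketch of a plug-in CMI estimate, I(X;Y|Z) = Σ p(x,y,z) log[p(z) p(x,y,z) / (p(x,z) p(y,z))], computed from discretised samples. It is not JAMI's code: for brevity it assumes a fixed equal-frequency binning of ranked values instead of the adaptive partitioning of Darbellay and Vajda (1999), and the class and method names (BinnedCmi, equalFrequencyBins, cmi) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: fixed-grid plug-in CMI estimator, NOT JAMI's adaptive partitioning. */
final class BinnedCmi {

    /** Assigns each value to one of `bins` equal-frequency bins based on its rank. */
    static int[] equalFrequencyBins(double[] values, int bins) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Stable sort: tied values keep their original relative order.
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        int[] bin = new int[n];
        for (int rank = 0; rank < n; rank++) {
            bin[order[rank]] = (int) ((long) rank * bins / n);
        }
        return bin;
    }

    /** Plug-in estimate of I(X;Y|Z) in nats from discretised samples of equal length. */
    static double cmi(int[] x, int[] y, int[] z) {
        int n = x.length;
        Map<String, Integer> xyz = new HashMap<>();
        Map<String, Integer> xz = new HashMap<>();
        Map<String, Integer> yz = new HashMap<>();
        Map<String, Integer> zOnly = new HashMap<>();
        // Count occupancy of every joint and marginal cell.
        for (int i = 0; i < n; i++) {
            xyz.merge(x[i] + "," + y[i] + "," + z[i], 1, Integer::sum);
            xz.merge(x[i] + "," + z[i], 1, Integer::sum);
            yz.merge(y[i] + "," + z[i], 1, Integer::sum);
            zOnly.merge(String.valueOf(z[i]), 1, Integer::sum);
        }
        double sum = 0.0;
        for (Map.Entry<String, Integer> cell : xyz.entrySet()) {
            String[] k = cell.getKey().split(",");
            double nXyz = cell.getValue();
            double nXz = xz.get(k[0] + "," + k[2]);
            double nYz = yz.get(k[1] + "," + k[2]);
            double nZ = zOnly.get(k[2]);
            // p(z) p(x,y,z) / (p(x,z) p(y,z)) reduces to a ratio of counts.
            sum += (nXyz / n) * Math.log(nZ * nXyz / (nXz * nYz));
        }
        return sum;
    }
}
```

In this sketch, two identical variables with a constant conditioning variable yield their entropy, whereas variables that are fully determined by the conditioning variable yield zero. A fixed grid is sensitive to the chosen number of bins, which is one reason an adaptive partitioning scheme is preferable in practice.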

Fig. 1. Performance comparison between JAMI, CUPID (Matlab) and CUPID (Java). (a) Process user time in seconds. (b) Peak memory usage

Both JAMI and CUPID rank expression values before the CMI computation. In CUPID, all expression values of 0 are assigned different ranks. This introduces bias and results in positive CMI values even if genes are not expressed in any sample. To avoid this, we extended JAMI to be zero expression aware, and demonstrate that this has a considerable effect on the results (Supplementary Figs S2–S5).
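The difference between the two ranking strategies can be illustrated with a short sketch. The manuscript does not spell out how JAMI treats zeros internally, so the code below is only one plausible interpretation under stated assumptions: expression values are non-negative, all zeros share a single rank of 0, and non-zero values receive distinct ascending ranks. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: one possible zero-aware ranking, not necessarily JAMI's. */
final class ZeroAwareRanks {

    /** Returns sample indices sorted by ascending value (stable for ties). */
    private static Integer[] sortedIndices(double[] values) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        return order;
    }

    /** Naive ranking: every sample gets a distinct rank 1..n, so tied zeros are split by position. */
    static int[] distinctRanks(double[] values) {
        Integer[] order = sortedIndices(values);
        int[] ranks = new int[values.length];
        for (int rank = 0; rank < order.length; rank++) {
            ranks[order[rank]] = rank + 1;
        }
        return ranks;
    }

    /** Zero-aware ranking: all zeros share rank 0; non-zero values get ranks 1..m. */
    static int[] zeroAwareRanks(double[] values) {
        Integer[] order = sortedIndices(values);
        int[] ranks = new int[values.length];
        int next = 1;
        for (int idx : order) {
            ranks[idx] = (values[idx] == 0.0) ? 0 : next++;
        }
        return ranks;
    }
}
```

For the expression vector {0.0, 2.5, 0.0, 1.0, 0.0}, the naive ranking spreads the three zeros over ranks 1 to 3 and thereby creates artificial variation, whereas the zero-aware variant keeps them tied. A gene that is not expressed in any sample then has a constant rank vector and cannot contribute spurious dependence.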

Preparing the input for CUPID is tedious and requires separate expression and miRNA interaction files as input for every gene pair of interest. In contrast, JAMI accepts two expression matrices as input, one for gene and one for miRNA expression, and filters these automatically for the data needed. In addition, JAMI offers great flexibility with regard to defining the triplets of interest, making it much more convenient to use JAMI in settings where several genes are of interest. JAMI output files can be directly imported in network analysis tools such as Cytoscape (Shannon et al., 2003). Moreover, JAMI does not require an expensive Matlab® license like CUPID, making it available to a broader audience. To make sure that JAMI can also be used conveniently in a scripting language, we implemented the RJAMI wrapper package for R (http://github.com/SchulzLab/RJAMI).
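As an illustration of what defining triplets of interest can involve, the sketch below derives gene–gene–miRNA triplets from a mapping of genes to the miRNAs that target them. This is not JAMI's input handling: the data structure, the treatment of gene pairs as unordered, and the names (TripletEnumerator, sharedMirnaTriplets) are assumptions made for this example; the supported input formats are described in the JAMI user manual.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: enumerates candidate triplets from gene-to-miRNA annotations. */
final class TripletEnumerator {

    /**
     * Returns one {geneA, geneB, miRNA} entry for every unordered gene pair
     * and every miRNA that both genes are annotated with.
     */
    static List<String[]> sharedMirnaTriplets(Map<String, Set<String>> geneToMirnas) {
        // Sort gene names so that the output order is deterministic.
        List<String> genes = new ArrayList<>(new TreeSet<>(geneToMirnas.keySet()));
        List<String[]> triplets = new ArrayList<>();
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                // Intersect the two miRNA sets on a copy to leave the input untouched.
                Set<String> shared = new TreeSet<>(geneToMirnas.get(genes.get(i)));
                shared.retainAll(geneToMirnas.get(genes.get(j)));
                for (String mirna : shared) {
                    triplets.add(new String[] {genes.get(i), genes.get(j), mirna});
                }
            }
        }
        return triplets;
    }
}
```

Gene pairs without a shared miRNA produce no triplet, so only the expression rows that occur in at least one triplet need to be retained from the two input matrices.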

We illustrate the potential of JAMI by constructing a ceRNA interaction network from the TCGA breast cancer data set (TCGA, 2012) for known ceRNAs (Tay et al., 2014) (Supplementary Fig. S6, see user manual for a step by step guide). The resulting network appears to be much denser than what is reported in the literature, emphasizing the importance of robust tools for ceRNA network inference from widely available expression data.

An open question in the field is whether linear or non-linear methods are better suited for ceRNA network inference (Le et al., 2017). Answering this question was thus far impeded by the lack of a fast tool for computing CMI values. JAMI overcomes this research barrier and facilitates comparisons with correlation-based methods such as sensitivity correlation (Paci et al., 2014) (Supplementary Fig. S7).
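For comparison, a linear measure of this kind can be sketched in a few lines. The code below assumes the commonly used definition of sensitivity correlation as the difference between the Pearson correlation of two genes and their partial correlation given the miRNA; readers should consult Paci et al. (2014) for the exact formulation. The class and method names are hypothetical.

```java
/** Illustrative only: sensitivity correlation as Pearson minus first-order partial correlation. */
final class SensitivityCorrelation {

    /** Pearson correlation coefficient of two equally long samples. */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** First-order partial correlation of x and y given z. */
    static double partial(double[] x, double[] y, double[] z) {
        double rXy = pearson(x, y);
        double rXz = pearson(x, z);
        double rYz = pearson(y, z);
        return (rXy - rXz * rYz) / Math.sqrt((1.0 - rXz * rXz) * (1.0 - rYz * rYz));
    }

    /** Assumed definition: correlation of the gene pair minus its partial correlation given the miRNA. */
    static double sensitivity(double[] geneA, double[] geneB, double[] mirna) {
        return pearson(geneA, geneB) - partial(geneA, geneB, mirna);
    }
}
```

When the miRNA is uncorrelated with both genes, the partial correlation equals the plain correlation and the sensitivity is zero. Because both terms are linear, a non-linear dependence mediated by the miRNA can go undetected, which is the limitation that motivates CMI-based approaches.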

In conclusion, JAMI is a fast, freely available and well-documented (http://jami.readthedocs.io/) tool primarily targeted at the inference of ceRNA networks. However, its implementation is general and may be used to study other modulators of gene–gene interactions, e.g. transcription factors (Flores et al., 2013).

Funding

This work was supported by the Cluster of Excellence on Multimodal Computing and Interaction (EXC284 to M.S. and J.V.) of the German National Science Foundation (D.F.G.).

Conflict of Interest: none declared.

References

Arvey A. et al. (2010) Target mRNA abundance dilutes microRNA and siRNA activity. Mol. Syst. Biol., 6, 363.

Chiu H.S. et al. (2015) Cupid: simultaneous reconstruction of microRNA-target and ceRNA networks. Genome Research, 25, 257–267.

Chiu H.-S. et al. (2017) High-throughput validation of ceRNA regulatory network. BMC Genomics, 18, 418.

Darbellay G.A., Vajda I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Trans. Information Theory, 45, 1315–1321.

Flores M. et al. (2013) Gene regulation, modulation, and their applications in gene expression data analysis. Adv. Bioinformatics, 2013, 1.

Friedman R.C. et al. (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res., 19, 92–105.

Jiang Q. et al. (2009) miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res., 37, D98–D104.

Le T.D. et al. (2017) Computational methods for identifying miRNA sponge interactions. Brief. Bioinformatics, 18, 577–590.

Liu C. et al. (2017) Cancer-Related Triplets of mRNA-lncRNA-miRNA Revealed by Integrative Network in Uterine Corpus Endometrial Carcinoma. BioMed Res. Int., 2017, 3859582.

Paci P. et al. (2014) Computational analysis identifies a sponge interaction network between long non-coding RNAs and messenger RNAs in human breast cancer. BMC Syst. Biol., 8, 83.

Poliseno L. et al. (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature, 465, 1033–1038.

Salmena L. et al. (2011) A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell, 146, 353–358.

Shannon P. et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res., 13, 2498–2504.

Sumazin P. et al. (2011) An extensive MicroRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell, 147, 370–381.

Tay Y. et al. (2014) The multilayered complexity of ceRNA crosstalk and competition. Nature, 505, 344.

TCGA, T. C. G. A. C. (2012) Comprehensive molecular portraits of human breast tumours. Nature, 490, 61–70.

Tsang J.S. et al. (2010) Genome-wide dissection of microRNA functions and co-targeting networks using gene-set signatures. Mol. Cell, 38, 140–153.

Wang P. et al. (2015) Identification of lncRNA-associated competing triplets reveals global patterns and prognostic markers for cancer. Nucleic Acids Res., 43, 3478–3489.

Author notes

The authors wish it to be known that, in their opinion, Andrea Hornakova and Markus List should be regarded as Joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
