MIIC online: a web server to reconstruct causal or non-causal networks from non-perturbative data

Sella, Nadir; Verny, Louis; Uguzzoni, Guido; Affeldt, Séverine; Isambert, Hervé

doi:10.1093/bioinformatics/btx844

Abstract

Summary

We present a web server running the MIIC algorithm, a network learning method combining constraint-based and information-theoretic frameworks to reconstruct causal, non-causal or mixed networks from non-perturbative data, without the need for an a priori choice on the class of reconstructed network. Starting from a fully connected network, the algorithm first removes dispensable edges by iteratively subtracting the most significant information contributions from indirect paths between each pair of variables. The remaining edges are then filtered based on their confidence assessment or oriented based on the signature of causality in observational data. The MIIC online server can be used for a broad range of biological data, including data with possible unobserved (latent) variables, from single-cell gene expression data to protein sequence evolution, and it outperforms or matches state-of-the-art methods for either causal or non-causal network reconstruction.

Availability and implementation

MIIC online can be freely accessed at https://miic.curie.fr.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

The reconstruction of graphical models has become ubiquitous in the analysis of the rapidly expanding, information-rich data of biological interest. However, to date, all available network reconstruction servers are restricted to specific types of data and make an a priori choice on the causal or non-causal nature of the underlying model. Examples include BNW (Ziebarth et al., 2013), which performs Bayesian network reconstructions, LEGUMEGRN (Wang et al., 2013), which learns directed regulatory networks, and EVFOLD (Marks et al., 2011) and DCA (Morcos et al., 2011), which predict undirected protein contact interactions from homologous amino acid sequences.

The MIIC online server aims to fill this gap by learning the most appropriate causal, non-causal or mixed graphical model from the available data. MIIC can be used for a broad range of biological data, from single-cell transcriptomics or genomic alterations in tumor progression to the long-term evolution of proteins and genomes (Fig. 1; Supplementary Figs S1–S4; Verny et al., 2017). The MIIC online server is outlined below; more detailed information is available in the Supplementary Material and in the online Tutorial and User Guide documentation at https://miic.curie.fr.

Fig. 1.


View of the MIIC online output page with a network visualization. It shows a zoomed view of a regulatory network reconstructed from single-cell expression data on haematopoietic stem cell differentiation (Hamey et al., 2017); see the full network in Supplementary Figure S2.

2 Materials and methods

2.1 MIIC algorithm

The Multivariate Information-based Inductive Causation (MIIC) algorithm relies on a novel information-theoretic method that combines a constraint-based learning approach with a maximum likelihood framework (Affeldt and Isambert, 2015; Affeldt et al., 2016; Verny et al., 2017). Starting from a fully connected graph, MIIC iteratively removes dispensable edges by uncovering significant information contributions from indirect paths, and orients the remaining edges based on the signature of causality in observational data. MIIC also provides an edge-specific confidence assessment of retained edges. The approach outperforms traditional search-and-score and constraint-based methods on a broad range of benchmark networks (Affeldt and Isambert, 2015; Affeldt et al., 2016; Verny et al., 2017). It achieves significantly better results with far fewer samples and is typically ten to a hundred times faster than existing methods that take into account the causal effects of unobserved latent variables (Verny et al., 2017).
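The edge-removal idea can be illustrated with a deliberately simplified sketch. It is not the MIIC implementation: it conditions on a single third variable at a time and uses a fixed hypothetical `threshold` instead of MIIC's complexity criterion, and the function names and toy dataset are assumptions made for illustration. It only shows how an edge X–Y becomes dispensable once the information passing through an indirect path X–Z–Y is accounted for.

```python
import math
from collections import Counter
from itertools import combinations, product


def mutual_information(rows, i, j, cond=()):
    """Empirical I(X_i; X_j | X_cond) in nats, from a list of discrete tuples."""
    n = len(rows)

    def key(r):
        return tuple(r[k] for k in cond)

    c_xyz = Counter((r[i], r[j], key(r)) for r in rows)
    c_xz = Counter((r[i], key(r)) for r in rows)
    c_yz = Counter((r[j], key(r)) for r in rows)
    c_z = Counter(key(r) for r in rows)
    # Sum of p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]; the 1/n factors cancel.
    return sum(
        (c / n) * math.log(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
        for (x, y, z), c in c_xyz.items()
    )


def prune_edges(rows, names, threshold=1e-9):
    """Toy pruning: drop X-Y if a single third variable explains it away."""
    kept = []
    for i, j in combinations(range(len(names)), 2):
        residual = mutual_information(rows, i, j)
        for k in range(len(names)):
            if k not in (i, j):
                residual = min(residual, mutual_information(rows, i, j, (k,)))
        if residual > threshold:
            kept.append((names[i], names[j]))
    return kept


# Chain X -> Z -> Y, built so that X and Y are exactly independent given Z.
rows = []
for z in (0, 1):
    noisy_copy = [z, z, z, 1 - z]  # equals z in 3 cases out of 4
    rows += [(x, z, y) for x, y in product(noisy_copy, noisy_copy)]

edges = prune_edges(rows, ["X", "Z", "Y"])
print(edges)
```

In this toy chain the X–Y edge is removed because the conditional mutual information given Z vanishes, while X–Z and Z–Y survive.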

2.2 MIIC online input and main options

The MIIC online pipeline (Supplementary Fig. S1) is a web interface for the MIIC algorithm. The Workbench is used to upload the user’s dataset, formatted as a table with variable names specified as column names (or row names). This is the only input required to reconstruct a network using the basic (default) settings.
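As a rough illustration of such a table, the sketch below writes samples as rows under a header of variable names. The tab delimiter, the helper name `to_table` and the example gene names are assumptions made here; the file formats actually accepted by the server are described in its online documentation.

```python
import csv
import io


def to_table(variables, samples, delimiter="\t"):
    """Return a delimited text table: one header row, then one row per sample."""
    if len(set(variables)) != len(variables):
        raise ValueError("variable names must be unique")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(variables)
    for sample in samples:
        if len(sample) != len(variables):
            raise ValueError("every sample needs one value per variable")
        writer.writerow(sample)
    return buffer.getvalue()


# Hypothetical discretized expression levels for two genes in three cells.
table = to_table(["GATA1", "GATA2"], [[0, 1], [1, 1], [1, 0]])
print(table)
```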

2.2.1 Algorithm advanced parameters

This section allows the user to specify the following parameters: (i) $N_{eff}$, the effective number of independent samples in the submitted dataset (also estimated by MIIC itself, see below), (ii) Complexity criterion, either Minimum Description Length/Bayesian Information Criterion (MDL/BIC) or Normalized Maximum Likelihood (NML; Affeldt et al., 2016), (iii) Orientation step (optional), orienting ‘v-structures’, (iv) Propagation step (optional), if the orientation step is performed and (v) Latent variables, to take into account or ignore the causal effects of unobserved (latent) variables (Verny et al., 2017).
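To give an intuition for the orientation step in (iii), the sketch below tests a candidate v-structure X → Z ← Y through the sign of the three-point information I(X;Y;Z) = I(X;Y) − I(X;Y|Z), which is negative when conditioning on Z creates a dependence between X and Y. This is a minimal illustration under stated assumptions, not the server's code: the function names and the two toy datasets are hypothetical, and no finite-sample correction or orientation probability is computed.

```python
import math
from collections import Counter


def mutual_information(rows, i, j, cond=()):
    """Empirical I(X_i; X_j | X_cond) in nats, from a list of discrete tuples."""
    n = len(rows)

    def key(r):
        return tuple(r[k] for k in cond)

    c_xyz = Counter((r[i], r[j], key(r)) for r in rows)
    c_xz = Counter((r[i], key(r)) for r in rows)
    c_yz = Counter((r[j], key(r)) for r in rows)
    c_z = Counter(key(r) for r in rows)
    return sum(
        (c / n) * math.log(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
        for (x, y, z), c in c_xyz.items()
    )


def three_point_information(rows, i, j, k):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z); negative values hint at a collider Z."""
    return mutual_information(rows, i, j) - mutual_information(rows, i, j, (k,))


# Columns are (X, Y, Z). Collider: X and Y independent, Z = X xor Y.
collider = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
# Common cause: X and Y are both exact copies of Z.
common_cause = [(z, z, z) for z in (0, 1)]

for name, data in (("collider", collider), ("common cause", common_cause)):
    i3 = three_point_information(data, 0, 1, 2)
    verdict = "orient X -> Z <- Y" if i3 < 0 else "no v-structure"
    print(f"{name}: I(X;Y;Z) = {i3:+.3f} nats, {verdict}")
```

Only the collider dataset yields a negative three-point information, so only there would the unshielded triple be oriented as a v-structure.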

2.2.2 Supplementary files

Supplementary files can optionally be uploaded, in particular to exclude specific edges based on prior knowledge, to provide a user-defined network layout (x, y coordinates of nodes) or to provide an order of categorical (non-numerical) variables for assigning edge signs based on correlations or causal effects between variables.

2.2.3 Confidence cut

A threshold can be provided (default $10^{-2}$) to filter retained edges based on their confidence, estimated over a number of randomizations of the available data (default 100 randomizations).
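The sketch below shows one way a randomization-based confidence could be computed for a single edge. It assumes a ratio between $e^{-N I}$ for the observed data and its average over shuffled copies of one variable, with small ratios indicating confident edges; the precise definition used by the server is given in Verny et al. (2017) and may differ. The function names, the seed and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    c_xy = Counter(zip(xs, ys))
    c_x, c_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (c_x[x] * c_y[y]))
        for (x, y), c in c_xy.items()
    )


def confidence_ratio(xs, ys, n_random=100, seed=0):
    """Assumed ratio exp(-N I_observed) / mean(exp(-N I_randomized))."""
    n = len(xs)
    rng = random.Random(seed)
    observed = math.exp(-n * mutual_information(xs, ys))
    shuffled = list(ys)
    total = 0.0
    for _ in range(n_random):
        rng.shuffle(shuffled)  # breaks any X-Y dependence, keeps the marginals
        total += math.exp(-n * mutual_information(xs, shuffled))
    return observed / (total / n_random)


threshold = 1e-2  # default confidence cut quoted in the text

dependent_x = [0, 1] * 20
dependent_y = list(dependent_x)      # Y is an exact copy of X
independent_x = [0, 0, 1, 1] * 10
independent_y = [0, 1, 0, 1] * 10    # every (x, y) pair equally frequent

keep_dependent = confidence_ratio(dependent_x, dependent_y) < threshold
keep_independent = confidence_ratio(independent_x, independent_y) < threshold
print(keep_dependent, keep_independent)
```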

3 Results

3.1 MIIC online output: visualization and analysis

Once the learning process is finished, the web server redirects the user to the results page, which contains several sections displaying the network and reporting analyses. The Advanced visualization tab uses a web-embedded Cytoscape display to visualize and interactively rearrange the resulting network. It is also possible to open a new browser tab (‘Go!’ button) to visualize and filter network edges based on their residual (partial) correlation or edge-specific confidence and to save the graph visualization in various formats. The Summary tab contains all the processed information on each retained or deleted edge. The Probabilities tab contains information on the orientation and propagation steps of the network reconstruction and is relevant for a quantitative analysis of the causal relations in the data (Supplementary Material). The Centrality measures tab provides a topological analysis of the network using graph theoretical measures. The Cross correlation tab displays a plot of the sample cross-correlation decay along with an exponential test. In the presence of correlation biases between successive samples, a warning message is displayed on the results page and an estimate of the effective number of independent samples, $N_{eff}$, is used to reconstruct a more reliable network model (Verny et al., 2017). The Download tab allows the user to download all the results associated with a network reconstruction.

3.2 Examples of causal versus non-causal networks

3.2.1 Gene regulatory network in haematopoiesis

This first example concerns the reconstruction of a regulatory network from 2167 single-cell gene expression profiles of blood stem cells from Hamey et al. (2017) (see Supplementary Material). Figure 1 displays the MIIC online results page with a zoomed view of the regulatory network including 34 transcription factors (see the full network in Supplementary Figure S2). The network predicted by MIIC exhibits a number of known central regulators such as MECOM $|$ EVI1, GATA1 and GATA2, with regulatory interactions documented in the literature, such as MECOM $\to$ PBX1, MECOM $\to$ GATA2 and GATA2 $\to$ TAL1 $|$ SCL (see Supplementary Material for details). Note, in particular, that nearly all predicted edges are directed, as expected for a transcriptional regulatory network, with red edges indicating gene activation and blue edges indicating gene repression.

3.2.2 Protein undirected contact map

In contrast, the second example concerns an inherently non-causal network: the physical contact map of amino acid residues within a protein structure, reconstructed from 12 533 aligned homologous sequences of an abundant protein domain family, the response regulator receiver domain (Pfam code PF00072). MIIC contact prediction results are presented in Supplementary Figures S3 and S4. Without any a priori choice or bias on the causal or non-causal class of reconstructed networks, MIIC predicts the protein contact map with an accuracy similar to that of the state-of-the-art method for protein contact prediction, PlmDCA (Ekeberg et al., 2013). However, MIIC is found to be less accurate on protein structures with fewer homologous sequences (Supplementary Figure S5).

Funding

This work has been supported by the Labex celtisphybio and Region IdF.

Conflict of Interest: none declared.

References

Affeldt S. et al. (2016) 3off2: a network reconstruction algorithm based on 2-point and 3-point information statistics. BMC Bioinformatics, 17, 12.

Affeldt S., Isambert H. (2015) Robust reconstruction of causal graphical models based on conditional 2-point and 3-point information. In: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, Amsterdam, pp. 42–51.

Ekeberg M. et al. (2013) Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys. Rev. E, 87, 012707.

Hamey F.K. et al. (2017) Reconstructing blood stem cell regulatory network models from single-cell molecular profiles. Proc. Natl. Acad. Sci. USA, 114, 5822–5829.

Marks D.S. et al. (2011) Protein 3D structure computed from evolutionary sequence variation. PLoS One, 6, e28766.

Morcos F. et al. (2011) Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl. Acad. Sci. USA, 108, E1293–E1301.

Verny L. et al. (2017) Learning causal networks with latent variables from multivariate information in genomic data. PLoS Comput. Biol., 13, e1005662.

Wang M. et al. (2013) LegumeGRN: a gene regulatory network prediction server for functional and comparative studies. PLoS One, 8, e67434.

Ziebarth J.D. et al. (2013) Bayesian Network Webserver: a comprehensive tool for biological network modeling. Bioinformatics, 29, 2801–2803.

Author notes

Present address: LIPADE, Department of Mathematics and Computer Science, University of Paris Descartes, 45 rue des Saints Pères, Paris, France

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)


