SLIDE – a web-based tool for interactive visualization of large-scale -omics data

Ghosh, Soumita; Datta, Abhik; Tan, Kaisen; Choi, Hyungwon

doi:10.1093/bioinformatics/bty534

Abstract

Summary

Data visualization is often regarded as a post hoc step for verifying statistically significant results in the analysis of high-throughput datasets. This common practice leaves behind a large amount of raw data from which more information can be extracted. However, existing solutions neither provide capabilities to explore large-scale raw datasets using biologically sensible queries nor allow real-time customization of graphics through user interaction. To address these drawbacks, we have designed an open-source, web-based tool called Systems-Level Interactive Data Exploration, or SLIDE, to visualize large-scale -omics data interactively. SLIDE’s interface makes it easier for scientists to explore quantitative expression data at multiple resolutions on a single screen.

Availability and implementation

SLIDE is publicly available under the BSD license, both as an online version and as a stand-alone version, at https://github.com/soumitag/SLIDE.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Effective visual presentation of large-scale molecular datasets is challenging. A global visualization of the entire raw data can unveil overall trends across samples and reveal useful patterns that may be overlooked after statistical filtering. Moreover, user-driven exploration of functionally related features (genes) allows users to quickly identify gene sets and biological functions relevant to their question. Although there are numerous open-source computational tools to visualize -omics data (Perez-Llamas and Lopez-Bigas, 2011; Saeed et al., 2003; Saldanha, 2004; Xia et al., 2013), the data volume that can be handled and the level of user interactivity remain limited (Supplementary Material).

Here, we present SLIDE, a web-based, interactive tool for visualizing -omics datasets. It allows users to interactively navigate through the entire dataset, visualized using heatmaps, at multiple resolutions. Hierarchical clustering in SLIDE uses a highly optimized implementation (Müllner, 2013) that can scale up to very high-dimensional datasets (e.g. 50 000 genes with hundreds of samples, with data clustered within a few minutes). SLIDE comes integrated with biological pathway and Gene Ontology information for subsequent gene selection. Users can quickly create and maintain customized lists of genes, create independent sub-analyses or test enrichment of particular biological functions. These functionalities help users interpret the data at the quantitative expression level and at the function/pathway level simultaneously. The tool has a convenient interface to input sample meta-data and can be used for various study designs such as group comparisons and time-course analysis.
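To make the clustering step concrete, the following is a minimal sketch of agglomerative hierarchical clustering on a small expression matrix. It is not the optimized routine of Müllner (2013) that SLIDE relies on: it is a naive, cubic-time illustration, and the choice of Euclidean distance and average linkage is an assumption made here, not a setting stated in the text. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows):
    """Naive agglomerative clustering with average linkage.

    Returns a nested-tuple dendrogram: a leaf is a row index (int) and an
    internal node is a (left, right) tuple. Runs in cubic time or worse, so
    it is only suitable for toy inputs (assumption: illustration only).
    """
    if not rows:
        raise ValueError("need at least one row")
    # Each cluster is stored as (subtree, list of member row indices).
    clusters = [(i, [i]) for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                members_a, members_b = clusters[a][1], clusters[b][1]
                total = sum(
                    euclidean(rows[i], rows[j])
                    for i in members_a
                    for j in members_b
                )
                dist = total / (len(members_a) * len(members_b))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = (
            (clusters[a][0], clusters[b][0]),
            clusters[a][1] + clusters[b][1],
        )
        # Delete the higher index first so that index a stays valid.
        del clusters[b]
        clusters[a] = merged
    return clusters[0][0]


# Hypothetical expression profiles for four genes across two samples.
toy_rows = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
print(average_linkage(toy_rows))
```

On the toy input, rows 0 and 2 and rows 1 and 3 are merged first because their profiles are closest. For datasets of the size discussed above, an optimized library implementation would be needed instead of this sketch.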

SLIDE was designed as a Java-driven web application to address cross-platform compatibility issues. SLIDE’s interface can be accessed from standard computers using a modern web browser. A detailed description of SLIDE’s architecture and implementation is provided in Supplementary Section S2.

2 Interactive visualization

The components in SLIDE can be grouped into feature-level modules and group-level modules. Feature-level modules visualize quantitative data for individual features (e.g. mRNA or protein level expression of genes), whereas group-level modules visualize biological functions or pathways enriched in user-selected gene sets. The two kinds of modules are bridged by connecting queries, allowing users to move smoothly from one level to the other. We illustrate these capabilities using a whole lung mouse transcriptomics dataset from an influenza virus infection study with a complex time-course design (Brandes et al., 2013) in Supplementary Section S3.

2.1 Feature-level visualization

SLIDE offers simultaneous views at multiple resolutions that can be used to navigate through the data. SLIDE’s web-based interface is shown in Figure 1. The control panel lists parameters for visualization and clustering that can be dynamically applied. The global view visualizes the result of agglomerative hierarchical clustering performed on the entire whole lung mouse transcriptomics dataset. The graphics can be customized in real time by adjusting parameters such as the heatmap color binning range, which sets a suitable range of colors for intuitive graphics.
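The effect of a color binning range can be sketched as follows. This is an illustrative assumption about how such a parameter might behave (equal-width bins with clamping of out-of-range values), not a description of SLIDE's actual code; the function names and the toy values are hypothetical.

```python
def bin_color_index(value, low, high, n_bins):
    """Map a value to a color-bin index in [0, n_bins - 1].

    Values outside [low, high] are clamped to the first or last bin.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    clamped = min(max(value, low), high)
    fraction = (clamped - low) / (high - low)
    # A fraction of exactly 1.0 would index one past the end, so cap it.
    return min(int(fraction * n_bins), n_bins - 1)


def bin_matrix(matrix, low, high, n_bins):
    """Apply bin_color_index to every cell of a row-major matrix."""
    return [
        [bin_color_index(v, low, high, n_bins) for v in row]
        for row in matrix
    ]


# Hypothetical values for two genes across three samples.
expression = [[-3.0, 0.0, 1.2], [0.4, 2.0, 5.0]]
print(bin_matrix(expression, low=-2.0, high=2.0, n_bins=4))
```

Narrowing the range spends the available colors on the bulk of the data, at the cost of saturating extreme values into the first and last bins.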

Fig. 1.


Feature-level visualization interface of SLIDE on a web browser. The global view heatmap visualizes the entire expression data matrix after hierarchical clustering of the features. The search panel on top of the global view allows real-time search and tagging of features. Search tags highlight features with horizontal (green and brown) stripes alongside the heatmaps, while the search terms are displayed in the search results panel. The detailed view heatmap gives a zoomed-in view of a portion of the entire data, while in the interactive dendrogram view, the branches of the tree can be clicked to visualize a subset of the clustered data. See Supplementary Figure S1B for the group-level visualization interface.

Once the global view is optimized with visualization parameters, users can perform multiple operations from there, such as searching for specific genes by gene identifiers. Users can also search for genes associated with different pathways/ontologies using the search bar. The results are shown in the search results panel and simultaneously marked as colored stripes next to all heatmaps. Clicking on a search keyword highlights the associated search tags (see the search key ‘response to virus’ in Figure 1) and displays its details in the information panel.
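The search-and-tag behaviour described above can be illustrated with a small sketch. It assumes a simple case-insensitive substring match against gene identifiers and annotated functional terms; the matching rule, the function name and the toy annotation table are all hypothetical and are not taken from SLIDE's implementation.

```python
def tag_rows(row_genes, annotations, query):
    """Return indices of heatmap rows whose gene matches the query.

    A row matches when the query is a case-insensitive substring of the
    gene identifier or of any functional term annotated to that gene.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for index, gene in enumerate(row_genes):
        terms = annotations.get(gene, ())
        if needle in gene.lower() or any(needle in t.lower() for t in terms):
            hits.append(index)
    return hits


# Toy data (hypothetical): heatmap row order and a small annotation table.
row_genes = ["Actb", "Ifit1", "Gapdh", "Mx1"]
annotations = {
    "Actb": ["cytoskeleton organization"],
    "Ifit1": ["response to virus"],
    "Mx1": ["response to virus", "defense response"],
}
print(tag_rows(row_genes, annotations, "response to virus"))
print(tag_rows(row_genes, annotations, "mx1"))
```

The returned row indices are what a front end would need in order to draw a colored stripe next to each matching row of every heatmap.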

The detailed view shows the quantitative data of individual genes in a zoomed-in view. A slider attached to the heatmap in the global view allows users to scroll through the entire data and select the portion of the data to be visualized in the detailed view.

In the interactive dendrogram view, the result of hierarchical clustering is shown as a dendrogram alongside the heatmap. The branches of the dendrogram can be clicked to visualize a subset of features, allowing further exploration of the data in smaller, closely related clusters of genes. Additionally, users can maintain multiple lists of individual genes, called feature lists, and create sub-analyses on separate tabs. Each sub-analysis creates visualizations in a new browser tab, where further querying and clustering can be performed. Multiple sub-analyses can be recursively created from existing ones.
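Selecting a dendrogram branch amounts to collecting the leaves beneath the clicked node. The sketch below assumes a dendrogram stored as nested tuples, where a leaf is a row index and an internal node is a (left, right) pair; this representation and the function names are hypothetical choices for illustration, not SLIDE's internal data structure.

```python
def leaves_under(node):
    """Collect leaf row indices under a dendrogram node, left to right.

    A leaf is an int (row index); an internal node is a (left, right) tuple.
    An explicit stack is used so deep trees do not hit the recursion limit.
    """
    leaves = []
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, tuple):
            left, right = current
            # Push right first so that the left child is processed first.
            stack.append(right)
            stack.append(left)
        else:
            leaves.append(current)
    return leaves


def subset_rows(matrix, node):
    """Return the rows of the matrix that fall under the clicked branch."""
    return [matrix[i] for i in leaves_under(node)]


# Hypothetical five-gene dendrogram and expression matrix.
tree = ((0, 2), (1, (3, 4)))
matrix = [[0.1], [2.0], [0.2], [2.5], [2.6]]
clicked = tree[1]  # the branch containing rows 1, 3 and 4
print(leaves_under(clicked))
print(subset_rows(matrix, clicked))
```

The subset returned for a clicked branch could then seed a feature list or a new sub-analysis, which is the workflow described above.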

2.2 Group-level visualization

In -omics data, systems-level interpretation often requires analysis of the enrichment of biological pathways/ontologies in selected gene sets. In SLIDE, this is referred to as group-level analysis. Users can initiate this analysis from feature-level visualization through the user-created feature lists. SLIDE uses the hypergeometric test to evaluate the statistical significance of function enrichment in the selected feature lists. In group-level visualization, columns in the heatmap represent feature lists and rows represent functional terms. Enrichment analysis can be performed for biological pathways from the ConsensusPathDB (Kamburov et al., 2011) and Gene Ontology terms (Ashburner et al., 2000). Supplementary Figure S1B shows enrichment levels of biological pathways in four user-created feature lists. SLIDE also allows users to customize group-level visualization in real time by specifying various filtering parameters to remove irrelevant functional terms. In group-level visualization, search and tagging functionalities are also available for the biological functions.
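As an illustration of the hypergeometric test mentioned above, the following minimal sketch computes a one-sided (upper-tail) enrichment p-value for a single functional term and a single feature list. Using the upper tail and treating all measured features as the background are assumptions made here; the text does not specify these details, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, list_size, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population: number of background features (N)
    annotated:  background features carrying the functional term (K)
    list_size:  size of the feature list (n)
    overlap:    features in the list carrying the term (k)
    """
    if not (0 <= annotated <= population and 0 <= list_size <= population):
        raise ValueError("annotated and list_size must lie in [0, population]")
    max_overlap = min(annotated, list_size)
    if not (0 <= overlap <= max_overlap):
        raise ValueError("overlap is outside the feasible range")
    total = comb(population, list_size)
    # math.comb returns 0 when the second argument exceeds the first,
    # so infeasible terms contribute nothing to the sum.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 background genes, 4 annotated to a term,
# a feature list of 3 genes, 2 of which carry the term.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A small p-value indicates that the term occurs in the feature list more often than expected by chance. With many terms tested per list, some form of multiple-testing adjustment would normally be applied as well.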

Funding

The authors thank Hiromi Koh for critical reading of the manuscript and testing of the implementation. This work was supported in part by the Institute of Molecular and Cell Biology, Agency for Science, Technology and Research; the Singapore Ministry of Education (MOE2016-T2-1-001); and the Singapore National Medical Research Council (NMRC-CG-M009).

Conflict of Interest: none declared.

References

Ashburner M. et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.

Brandes M. et al. (2013) A systems analysis identifies a feedforward inflammatory circuit leading to lethal influenza infection. Cell, 154, 197–212.

Kamburov A. et al. (2011) ConsensusPathDB: toward a more complete picture of cell biology. Nucleic Acids Res., 39, D712–D717.

Müllner D. (2013) fastcluster: fast hierarchical, agglomerative clustering routines for R and Python. J. Stat. Software, 53, 1–18.

Perez-Llamas C., Lopez-Bigas N. (2011) Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One, 6, e19541.

Saeed A.I. et al. (2003) TM4: a free, open-source system for microarray data management and analysis. Biotechniques, 34, 374–378.

Saldanha A.J. (2004) Java Treeview—extensible visualization of microarray data. Bioinformatics, 20, 3246–3248.

Xia J. et al. (2013) INVEX—a web-based tool for integrative visualization of expression data. Bioinformatics, 29, 3232–3234.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

