OmicsAnalyst: a comprehensive web-based platform for visual analytics of multi-omics data

Abstract Data analysis and interpretation remain a critical bottleneck in current multi-omics studies. Here, we introduce OmicsAnalyst, a user-friendly, web-based platform that allows users to perform a wide range of well-established data-driven approaches for multi-omics integration, and visually explore their results in a clear and meaningful manner. To help navigate complex landscapes of multi-omics analysis, these approaches are organized into three visual analytics tracks: (i) the correlation network analysis track, where users choose among univariate and multivariate methods to identify important features and explore their relationships in 2D or 3D networks; (ii) the cluster heatmap analysis track, where users apply several cutting-edge multi-view clustering algorithms and explore their results via interactive heatmaps; and (iii) the dimension reduction analysis track, where users choose among several recent multivariate techniques to reveal global data structures, and explore corresponding scores, loadings and biplots in interactive 3D scatter plots. The three visual analytics tracks are equipped with comprehensive options for parameter customization, view customization and targeted analysis. OmicsAnalyst lowers the access barriers to many well-established methods for multi-omics integration via novel visual analytics. It is freely available at https://www.omicsanalyst.ca.


INTRODUCTION
The rapid development and increasing accessibility of various omics profiling technologies such as massive parallel sequencing and mass spectrometry have made multi-omics data collection more routine practices in recent years.These multi-omics studies promise to provide more holistic pictures to enable comprehensive understanding of complex diseases and biological processes (1,2).As a result, the last few years have witnessed a growing number of bioinformatics tools and statistical methods developed for multiomics integration (3,4).These computational approaches can be largely classified as either knowledge-driven or datadriven strategies.The knowledge-driven strategy is well established.A typical example is to map genes and metabolites of interest into known metabolic pathways or networks and then visually explore the results for hypothesis generation (5)(6)(7).A key limitation of this strategy is its dependency on a prior knowledge base.Data analysis and interpretation will be conducted within the confines of this knowledge domain, making it unsuitable for novel discoveries and applications to non-model organisms.The data-driven strategy, on the other hand, depends primarily on the datasets themselves, and can be applied in a more general and unbiased manner (8).
Many different data-driven approaches have been proposed and practiced for multi-omics integration.They can be loosely put into three categories based on their main themes, including (i) Feature correlation analysis -this theme aims to identify features that are correlated across different omics layers and/or co-vary under the conditions of interest.These correlated features provide more detailed delineations of underlying biological processes than those obtained from a single omics layer; (ii) Sample clustering analysis -this theme aims to leverage multiple molecular profiles to improve sample characterization, such as to identify subsets of cancer patients for more targeted treatments (9); (iii) Understanding global structure -this theme aims to gain a high-level overview of multi-omics data by extracting and examining their shared structural variations and local patterns.Compared to the knowledge-driven strategy where many user-friendly tools are available, most datadriven methods are in the form of complex multivariate statistics or machine learning algorithms, available mainly in the form of command line programs (10)(11)(12)(13)(14).For most researchers, they are harder to use and the results are harder to interpret.User-friendly bioinformatics tools supporting data-driven strategy are urgently needed to help convert the complex multi-omics data into meaningful patterns and insights.
Here, we introduce OmicsAnalyst, a web-based visual analytics platform dedicated for data-driven multi-omics integration.It currently supports more than a dozen wellestablished methods through three visual analytics tracks -correlation network analysis, cluster heatmap analysis, and dimension reduction analysis.These three visualization tracks are equipped with comprehensive functions and menus to allow users to perform parameter customization, visual exploration and interactive analysis.To help users navigate the tool, we have compiled a comprehensive list of frequently asked questions (FAQs), four different screenshot tutorials, and a case study.The main features of Omic-sAnalyst are described below.

Overview of omicsanalyst
The workflow of OmicsAnalyst is shown in Figure 1.It consists of three main phases to help users to navigate the complex procedures of multi-omics analysis.In the Phase 1 (data processing), users go through the conventional single omics data analysis workflow including data upload, annotation, missing value estimation, data filtering, and identification of significant features.After basic quality check and optional data normalization for multi-omics integration, users enter the Phase 2 (method selection).OmicsAnalyst offers a wide array of approaches organized under three categories: correlation network analysis, cluster heatmap analysis, and dimension reduction analysis.After method selection, users are presented with an overview and diagnostic plots to decide whether the default parameters (if any) should be updated.Finally, users enter the Phase 3 (visual analytics) and explore the results through interactive visualization coupled with various statistical and functional analysis.

Data processing
Data upload and annotation.OmicsAnalyst accepts data tables containing feature abundance values (raw or normalized) generated from different omics platforms.They must share the same sample names and metadata information.For data from human and mouse, users can further perform feature annotation for transcriptomics, proteomics, metabolomics and miRNA.The annotation is required for enrichment analysis in the visual analytics stage.Missing value estimation.Omics data often contain missing values which could cause potential issues in downstream analysis.Users can exclude features with too many missing values or perform missing value estimation based on several widely used methods.Data filtering.Given the high-dimensional nature of omics data, it is strongly recommended to perform unspecific data filtering to exclude features that are unlikely to be useful in downstream analysis.In particular, features that are relatively consistent can be safely excluded based on their inter-quantile ranges (IQRs) or other variance measures.Features that are of very low abundance should also be excluded, as they contribute little to the overall variancecovariance structure in multi-omics integration.Differential analysis.Users can perform conventional statistical comparisons to identify significant features within individual omics data.These features will be available for correlation network creation or highlighted in heatmaps or scatter plots.Quality checking and normalization/scaling.The goal is to make different omics data more 'integrable' by sharing similar distributions.Users can visually examine the distribution of individual omics data through density plot, principal component analysis (PCA) plot, and t-distributed stochastic neighbor embedding (t-SNE) plot.Based on the visual assessment, users can choose among a variety of data transformation, centering and scaling options to improve the integrability.

Correlation network analysis track
The objective of the correlation network analysis is to identify and visualize relationships between key features from two omics datasets.It consists of three main steps, detailed below.
Network creation.This step involves selecting the key features and computing their pairwise correlations.By default, significant features identified by differential analysis during the data processing phase will be used for network creation.However, users can also select top features based on the loading scores from the multivariate dimension reduction methods.Details on the dimension reduction techniques can be found in the 'Dimension Reduction Analysis Track' section.The next step is to compute pairwise similarities between selected features.Due to their simplicity and widespread familiarity, univariate methods, such as Pearson correlation, are usually computed as a first line of analysis.However, these methods can produce many false connections due to presence of highly collinear features in omics data.Partial correlation, a multivariate method that measures the correlation between two variables while controlling for all others, has been successfully applied to omics data to detect connections between features that are more likely to represent true dependencies (15).

Network customization.
Networks with a large number of nodes and edges are too complex and overwhelming for vi-W478 Nucleic Acids Research, 2021, Vol.49, Web Server issue sualization and interpretation.OmicsAnalyst partially addresses this issue by allowing users to control network sizes based on the strengths of correlations.However, applying a single threshold can often produce networks with the majority of edges existing between nodes of the same omics type.This is because in many cases, correlations between features of the same omics type are categorically higher than those of different omics types, likely due to technical differences between platforms.To address this issue, OmicsAnalyst offers two filters to control correlation strengths, one for within-omics and the other for between-omics, with a more stringent default threshold for the former.In addition, users can also apply degree or betweenness filters to control network size based purely on the topological properties of the nodes.
Network visual analytics.In addition to providing different filters to allow users to refine the nodes and edges that comprise the network, OmicsAnalyst offers a variety of simple and advanced functions to facilitate visual identification of important network structures.For instance, binary edge coloring is used to differentiate positive and negative correlations, and edge thickness is used to reflect strengths of the correlation to enable quick identification of feature pairs that are highly correlated.OmicsAnalyst also offers 3D network visualization for a deeper perspective of the relationships.Advanced graph layout algorithms, for example edge bundling can be applied to aggregate similar edges into groups to reduce clutter in visualization.Other features such as the concentric circular layout facilitate the evaluation of focal nodes and hierarchical relationships within network.When features are annotated during data processing, users can perform enrichment analysis on a group of nodes selected either manually or through automatic module detection algorithms.

Cluster heatmap analysis track
The objective of the track is to identify and visually explore relationships between samples and key features in side-byside heatmaps, each displaying data from one omics type.It consists of two main steps, detailed below.

Sample cluster detection.
In multi-omics data, each omics type is a separate representation of the same samples, making it suitable for multi-view clustering (4).One main advantage of multi-view clustering is that it tends to reduce spurious correlations that are due to random noise or platformspecific technical artifacts, as it is highly unlikely that exact same erroneous effects are present across multiple datasets.OmicsAnalyst currently supports three multi-view clustering algorithms: spectral clustering (14), perturbation-based clustering (16) and similarity network fusion (13).The distinguishing features of these three methods are as follows.Spectral clustering makes use of eigenvalues derived from a similarity matrix to perform clustering based on fewer dimensions, which greatly increases the speed (17).Omics-Analyst employs the Spectrum R package, which combines the advantages of spectral clustering with several other advanced techniques (14).Perturbation clustering assumes that reliable clusters are robust to small alterations to the data (4).OmicsAnalyst uses the perturbation clustering for data integration and disease subtyping (PINSplus) R package to support this approach (16).The similar network fusion (SNF) method involves fusing individual sample similarity matrices together using a rapid nearest neighbour approach (13).Since the associated SNFtool package does not support cluster detection, the spectral clustering is applied to the learned status matrix for this purpose.

Heatmap visual analytics.
The results of clustering analysis can be intuitively explored via heatmaps, which use visual cues to show how samples are clustered and how fea-ture abundances vary across samples.OmicsAnalyst implements an interactive joint-heatmap viewer where two different omics datasets can be visualized and analyzed simultaneously.The interactive visualization was implemented based on the INVEX heatmap viewer (18).It is organized into two main views consisting of an overview and a focus view for each omics data.The overview heatmap displays the overall abundance patterns for all features.Users can click-and-drag to select a region of interest to be displayed in the focus view for a more detailed inspection.The annotation bars along the top indicate the original group memberships as well as the cluster memberships based on the selected multi-view clustering algorithm.Similar to the correlation network analysis, users can perform enrichment analysis on the features displayed in the focus view for each omics type, when features are annotated during data processing.

Dimension reduction analysis track
The objective of this track is to perform dimension reduction, and then visually explore corresponding scores, loadings and biplots in interactive 3D scatter plots to understand high-level trends and associated key features.It consists of two main steps, detailed below.

Multi-omics dimension reduction. Many standard multivariate dimension reduction techniques do not perform
well on multi-omics datasets, which typically have many more features than observations (p >> n) and a multicollinear structure.Multivariate regression, the foundation of many multivariate dimension reduction techniques, performs poorly in these cases and so special care has been taken to develop more robust techniques for multi-omics data integration (19)(20)(21).OmicsAnalyst provides five different methods including multiple co-inertia analysis (MCIA), consensus PCA (CPCA), projection to latent structures (PLS), Procrustes analysis, and data integration analysis for biomarker discovery using latent components (DIA-BLO) (10,11,22,23).In general, these algorithms aim to identify sets of components that capture maximum variance within individual datasets and maximum association across datasets.They can be distinguished by individual optimization and constraint criteria used to identify component sets across the omics datasets.More detailed information and comparisons on these methods are provided in our FAQs under the 'Dimension Reduction Analysis' tab.
Visual analytics based on 3D scatter plots.OmicsAnalyst offers an interactive 3D scatter plot viewer that can display sample space (score plot), feature space (loading plot), as well as a 'merged' space (biplot) that overlays sample and feature spaces in the same plot to showcase contributions of key feature to the overall patterns.The 3D scatter plot viewer is divided into four different sections.The left panel contains a top section ('Settings') for controlling the overall visual environment of the scatter plots.The middle section ('Overall Pattern') allows users to change the grouping of nodes based on different meta-data or clustering analysis.It offers extensive options such as colors, shapes, and highlighting effects for group visualization.The bottom section displays information related to the current selections.The main scatter plot viewer in the center displays the current view -score plot, loading plot or biplot which allow users to specify features of interest to be shown as arrows on top of sample space.Users can also overlay different metadata groups as ellipsoids on top of the feature space.The right panel is divided into top ('Comparison Test') and bottom ('Enrichment Analysis') sections to allow users to perform targeted statistical and functional analysis on the current selected groups or clusters, respectively.Click a row of the result tables, the corresponding feature(s) will be displayed as arrows in the current score plot.

Case study: multi-omics analysis of human pregnancy
To facilitate users to explore different features of Omic-sAnalyst, three example multi-omics datasets have been provided including one from the Cancer Genome Atlas (TCGA, https://www.cancer.gov/tcga),one from the STATegra (24) and one from a recent multi-omics study on human pregnancy (25).Here, we provide a case study using the proteomics and metabolomics datasets from the pregnancy study.
Various physiological systems are known to change predictably throughout pregnancy (26).This study was conducted to collect comprehensive molecular data (repeated samples from the first three trimesters and 6 weeks postpartum for baseline levels; n = 17 women) to build a predictive model for gestational age (25).Here, we re-analyze the proteomics and metabolomics data sets as a case study.Differential analysis was performed using ANOVA/t-tests with thresholds chosen to give ∼30% significant features (|log 2 FC| > 1; adjusted P-value < 0.005), and datasets were auto-scaled before integration.All three visual analytics tracks were used to gain complementary perspectives of the data.First, we used the 'Free Exploration' mode of the Heatmap Visual Analytics track to understand patterns present in individual omics.While the baseline samples form a weak cluster, samples from the three trimesters are very mixed.Next, we computed the multi-dimensional components that best separated the sample groups using DIABLO and explored the results with 3D scatter plots.The global structure confirms what we expect, with baseline samples well distinguished from those collected during pregnancy, and samples collected during later trimesters located further away from the baseline (Figure 2A).The biplot overlays the sample space with the top features that most contribute to the separation (Figure 2B), in this case highlighting several proteins and metabolites that are consistent with the biology of pregnancy.Three out of the five top metabolites are associated with hormones that are elevated during pregnancy (thyroxine, pregnanediol-3-glucuronide, and cortisol).One of the top proteins (ADAM12) is a serum marker for pregnancy, two (GDF15 and GPC3) are encoded by genes that have high expression in placenta relative to other tissues, and one is angiotensin (AGT), a hormone known to be elevated during pregnancy (27).All feature arrows point in the same direction, except for the DL-2- aminooctinoic acid metabolite.Finally, we used correlation networks to visualize relationships between key features from the top three DIABLO components.The network has a central cluster of proteins that are positively correlated with the proteins and metabolites on the left, and negatively correlated with the metabolites on the right (Figure 2C).Inspecting several individual features shows that the structure is consistent with Figure 2B: the central proteins and positively correlated metabolites contain many of the previously highlighted biplot features (ADAM12, Cortisol, Sunitinib, and Pregnanediol-3-glucuronide) while one of the negatively correlated metabolites is DL-2-aminooctinoic acid, the lone biplot feature that pointed in the opposite direction.Network module analysis with the 'Walk-Trap' algorithm resulted in three modules, all of which contained both proteins and metabolites (Figure 2D).The blue module was statistically significant, and enrichment analysis revealed that it is significantly enriched for the Reactome pathway 'Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs)'.IGF is known to be ele-vated during pregnancy (28).This case study has illustrated the improved insights and rich biological context when multi-omics data and visual analytics are used together.More details and figures from the case study are available from the 'Tutorial' page (under the 'Case Study' tab) of OmicsAnalyst.

Comparison with other web-based tools
Table 1 shows the comparisons between OmicsAnalyst and three other web-based tools dedicated for multi-omics integration and analysis, including 3Omics (29), MiBiOmics (30) and OmicsNet (7).The 3Omics supports analysis of transcriptomics, proteomics and metabolomics data from human.It includes modules for correlation analysis, coexpression profiling, phenotype mapping and functional enrichment analysis.MiBiOmics tackles multi-omics integration through correlation analysis using WGCNA-based approach and dimension reduction analysis using MCIA and Procrustes analysis.Finally, OmicsNet uses a priori interaction information to construct multi-omics networks for genes, proteins, metabolites, miRNA, and transcription factors.The resulting network is interactively visualized in 3D space.OmicsAnalyst distinguishes itself by bringing together multivariate, data-driven feature selection and inte-gration with innovative visual analytics for unbiased exploration and interrogation of complex multi-omics datasets.

CONCLUSIONS
The motivation for OmicsAnalyst was to create an intuitive, web-based platform for multi-omics integration that allows researchers to fuse statistical and visual streams of evidence together to make more informed judgements.In particular, we implemented three distinct visual analytics tracksfeature correlation analysis coupled with networks, sample clustering analysis coupled with heatmaps, and dimension reduction analysis coupled with 3D scatter plots.In doing so, OmicsAnalyst enables users to dissect large and complex multi-omics datasets by facilitating pattern recognition and cognitive reasoning through powerful yet intuitive visual analytics.

Figure 1 .
Figure 1.Overall workflow of OmicsAnalyst.Multi-omics integration is divided into three main phases -data processing, method selection and visual analytics.Each phase contains multiple steps and options to allow comprehensive analysis and customization.

Figure 2 .
Figure 2. Example outputs from the case study.Dimension reduction was performed with DIABLO and results visualized with (A) 3D scatter plot of score plot, and (B) 3D biplot with elliptical summaries of sample groups (red = baseline, green = first trimester, dark blue = second trimester, light blue = third trimester) and the contributions of top five differentially expressed proteins and metabolites (red arrows).Correlation networks of features selected from the top three DIABLO components in (C) concentric circular layout, and (D) linear bipartite/tripartite layout, with modules detected by the 'WalkTrap' algorithm.