Motivation: Gene expression analysis with microarrays has become one of the most widely used high-throughput methods for gathering genome-wide functional data. Emerging -omics fields such as proteomics and interactomics introduce new information sources. With the rise of systems biology, researchers need to concentrate on entire complex pathways that guide individual genes and related processes. Bioinformatics methods are needed to link the existing knowledge about pathways with the growing amounts of experimental data.
Results: We present KEGGanim, a novel web-based tool for visualizing experimental data in biological pathways. KEGGanim produces animations and images of KEGG pathways using public or user-uploaded high-throughput data. Pathway members are coloured according to experimental measurements, and animated over experimental conditions. KEGGanim visualization highlights dynamic changes over conditions and allows the user to observe important modules and key genes that influence the pathway. The simple user interface of KEGGanim provides options for filtering genes and experimental conditions. KEGGanim may be used with public or private data for 14 organisms, and a large collection of public microarray data is readily available. Most common gene and protein identifiers and microarray probesets are accepted for visualization input.
High-throughput methods such as microarrays have changed the research pace in molecular biology. Thousands of genes and proteins are now routinely studied under experimental conditions (Buck et al., 2004; Eads et al., 2000; Schena et al., 1995), with results stored in public microarray databases like GEO (Barrett et al., 2007) and ArrayExpress (Parkinson et al., 2007), and protein databases like PRIDE (Jones et al., 2006).
As genomic and proteomic data accumulate, researchers envisage complex systems behind biological processes and functions. Genes and proteins rarely operate alone in the cell, but are regulated by elaborate mechanisms and bound into networks (Alon, 2007). Systems biology approaches are applied to view these networks in detail. Well-studied parts of networks called pathways have roles in cell signaling, gene regulation and metabolism as well as human disease. Pathways are described in databases like KEGG (Kanehisa et al., 2006) and Reactome (Vastrik et al., 2007).
Knowledge of pathways stored in databases is often far from complete. Bioinformatics methods are needed that combine various experimental data to verify existing knowledge and propose new hypotheses. Visualization has a key role in understanding complex and dynamical phenomena of pathways, proteomics and gene expression. Several efforts have been made in this area, but there is still a need for interactive web-based pathway resources. For example, KEGG allows the user to colour genes on the pathway. Reactome Skypainter and PathwayExpress (Khatri et al., 2007) link genes to pathways using overrepresentation analysis. However, these tools have no means for directly incorporating experimental data. Also, fixed images fail to deliver the temporal and spatial dynamics behind pathways and gene expression. GenMAPP (Dahlquist et al., 2002) and BioCyc Pathway Tools (Paley et al., 2006) produce user-defined pathways and cellular wiring diagrams, and allow inclusion of expression data with some animation capabilities. No such systematic visualization functions are available for the comprehensive KEGG resource.
KEGGanim is a novel web-based visualization tool that links manually curated pathway maps from KEGG with experimental data from sources like gene expression and proteomics. KEGGanim shows animated figures of pathways with genes and proteins depicted as coloured rectangles. Pathway members are painted red or green according to their experimental values in the given dataset. Animation changes the colour values of these rectangles while looping over experimental conditions in the dataset, for instance moments in a timeseries (Spellman et al., 1998), healthy and diseased samples (Alon et al., 1999), or samples of healthy tissues (Ge et al., 2005).
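The colouring and animation loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the KEGGanim implementation: the function names, the `max_abs` saturation threshold, the white neutral colour and the convention that positive values are red and negative values are green are all assumptions made for the example.

```python
def value_to_rgb(value, max_abs=2.0):
    """Map an experimental value to an (R, G, B) tuple.

    Assumed convention (not stated in the text): positive values shade
    towards red, negative values towards green, zero stays white.
    Values beyond +/- max_abs are clamped to full saturation.
    """
    clamped = max(-max_abs, min(max_abs, value))
    fade = 255 - round(255 * abs(clamped) / max_abs)
    if clamped > 0:
        return (255, fade, fade)   # shades of red
    if clamped < 0:
        return (fade, 255, fade)   # shades of green
    return (255, 255, 255)         # neutral


def animation_frames(matrix, conditions):
    """Yield one (condition, {member: rgb}) frame per experimental condition.

    `matrix` maps each pathway member to a list of values, one per
    condition, in the same order as `conditions`.
    """
    for idx, condition in enumerate(conditions):
        colours = {member: value_to_rgb(values[idx])
                   for member, values in matrix.items()}
        yield condition, colours


# Hypothetical two-gene, two-timepoint example.
example_matrix = {"geneA": [2.0, -2.0], "geneB": [0.0, 2.0]}
frames = list(animation_frames(example_matrix, ["t0", "t1"]))
```

Each frame holds the colours for one condition; drawing the frames in sequence onto the pathway map gives the looping animation.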
KEGGanim allows a researcher to observe expression and protein production dynamics in the context of pathway dependencies. Animating a pathway over consecutive timepoints reflects the behaviour of master regulatory genes, propagation of signals in the pathway over time, and the avalanche of up- and downregulation caused by the master regulator. When analysing a set of conditions or tissues on a microarray, KEGGanim allows the user to reason about tissue specificity or the influence of global conditions on the pathway and its components. Figure 1 shows an example of KEGGanim output.
KEGGanim combines KEGG pathway data with a matrix of experimental values of genes and proteins. First, the user needs to select a pathway of interest from a dropdown menu in the web interface, which corresponds to a graphical map downloaded from the KEGG database.
The second input is a matrix containing experimental values for genes and proteins. A number of gene expression datasets from GEO and ArrayExpress are available in KEGGanim for immediate analysis. KEGGanim automatically fetches all associations to the genes in the pathway from the g:Profiler software (Reimand et al., 2007), and creates an animation of the related experimental values over different conditions. If several probesets or proteins match a pathway member, the corresponding node is split into smaller coloured areas to reflect different experimental values. Users can upload their own data for analysis and visualization. Most common gene and protein IDs and microarray probesets are accepted as input, for instance standard names, RefSeq, Entrez, Affymetrix, UniProt, EnsEMBL as well as species-specific IDs. Uploaded data is optionally centred and normalized, and missing values may be replaced with fixed values or via the kNN method (Troyanskaya et al., 2001) implementation in GEPAS (Montaner et al., 2006).
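As a rough illustration of the preprocessing and node-splitting steps mentioned above, the following Python sketch fills missing values with a fixed value, centres and scales one row of the matrix, and divides a node rectangle into equal areas. All names are hypothetical. The text does not specify how data are normalized or how nodes are divided, so the unit-variance scaling and the vertical strips are assumptions; kNN imputation is not shown.

```python
import math


def fill_missing(row, fill_value=0.0):
    """Replace missing measurements (None) with a fixed value."""
    return [fill_value if v is None else v for v in row]


def centre_and_scale(row):
    """Centre a row on its mean and scale it to unit standard deviation.

    Assumption: normalization here means division by the population
    standard deviation; a constant row is only centred.
    """
    mean = sum(row) / len(row)
    centred = [v - mean for v in row]
    sd = math.sqrt(sum(c * c for c in centred) / len(row))
    if sd == 0:
        return centred
    return [c / sd for c in centred]


def split_node(x, y, width, height, n_parts):
    """Divide a node rectangle into n_parts equal vertical strips.

    Returns (x, y, width, height) tuples, one per matching probeset or
    protein. The strip layout is an assumption for illustration.
    """
    strip = width / n_parts
    return [(x + i * strip, y, strip, height) for i in range(n_parts)]


# Hypothetical probeset row with one missing measurement.
cleaned = fill_missing([1.0, None, 3.0], fill_value=2.0)
scaled = centre_and_scale([1.0, 3.0])
areas = split_node(0, 0, 46, 17, 2)
```

Each strip returned by `split_node` would then be coloured with the value of one of the probesets or proteins matching that pathway member.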
Additional options help to interpret the animations and concentrate on specific conditions or components. KEGGanim tooltips display names and descriptions of genes and proteins when the user hovers over corresponding pathway members. Lineplots display the amounts of proteins or the gene expression levels. The user can narrow down the study by selecting a subset of conditions to view. Experimental values for irrelevant pathway members and related probesets may also be excluded from the animation. The cinefilm feature overcomes the technical difficulty of including animations in printed materials by allowing the user to extract pathway snapshots of timepoints or conditions into a separate image (Fig. 1). These features are especially useful in visualizing timeseries data. The GIF animations produced by KEGGanim do not require special software packages for viewing, and can easily be inserted into presentation slides, web pages, tutorials, etc.
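The filtering options and the cinefilm layout can be sketched as follows. This is a minimal Python illustration under stated assumptions, not KEGGanim code: the function names are hypothetical, and the cinefilm is assumed to place snapshots side by side in a single horizontal strip.

```python
def select_conditions(matrix, conditions, keep):
    """Keep only the chosen conditions, preserving their original order.

    Returns the reduced matrix and the list of retained condition names.
    """
    idx = [i for i, name in enumerate(conditions) if name in keep]
    kept = [conditions[i] for i in idx]
    reduced = {member: [values[i] for i in idx]
               for member, values in matrix.items()}
    return reduced, kept


def exclude_members(matrix, excluded):
    """Drop pathway members or probesets the user considers irrelevant."""
    return {member: values for member, values in matrix.items()
            if member not in excluded}


def cinefilm_offsets(n_frames, frame_width, gap=0):
    """Horizontal pixel offsets for tiling snapshots into one image.

    Assumption: frames are laid out left to right in a single row.
    """
    return [i * (frame_width + gap) for i in range(n_frames)]


# Hypothetical timeseries with three timepoints and two genes.
data = {"geneA": [0.1, 0.5, 0.9], "geneB": [1.0, 0.0, -1.0]}
reduced, kept = select_conditions(data, ["t0", "t1", "t2"], {"t0", "t2"})
filtered = exclude_members(reduced, {"geneB"})
offsets = cinefilm_offsets(len(kept), frame_width=400, gap=10)
```

Filtering before rendering means that both the animation and the cinefilm image are built from the same reduced matrix.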
Advancing high-throughput technologies allow researchers to gather information about organizational, functional and physical layers of the cell. There is an increasing need for ideas that successfully integrate layers of data and explain the elaborate mechanisms responsible for creating the observed measurements. With the development of KEGGanim, we wish to contribute to the data integration goal and provide methods that take advantage of the powerful human visual analysis skill.
KEGGanim is a simple web-based visualization tool that links manually curated KEGG pathway maps with high-throughput data. The tool creates animations that allow intuitive visual analysis of condition or tissue-specific changes in gene expression or protein levels within the selected pathway. KEGGanim is already actively used in several research initiatives, e.g. in functional profiling of mouse embryonic stem cell development.
This research has been supported by the EU FP6 grants ENFIN LSHG-CT-2005-518254, FunGenES LSHG-CT-2003-503494 and Estonian Science Foundation ETF5724. The authors would like to thank Dr N. Billon, M. Kull, J. Hansen and the reviewers of this manuscript.
Conflict of Interest: none declared.