Florian P Breitwieser, Steven L Salzberg, Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification, Bioinformatics, Volume 36, Issue 4, February 2020, Pages 1303–1304, https://doi.org/10.1093/bioinformatics/btz715
Abstract
Pavian is a web application for exploring classification results from metagenomics experiments. With Pavian, researchers can analyze, visualize and transform results from various classifiers (such as Kraken, Centrifuge and MetaPhlAn) using interactive data tables, heatmaps and Sankey flow diagrams. An interactive alignment coverage viewer helps validate matches to a particular genome, which can be crucial when using metagenomics experiments for pathogen detection.
Pavian is implemented in the R language as a modular Shiny web app and is freely available under GPL-3 from http://github.com/fbreitwieser/pavian.
1 Introduction
Microbiome research has seen enormous growth over the last decade, especially in the areas of health and disease, microbial ecology and infectious diseases. With metagenomics data becoming more ubiquitous, secondary data analysis and visualization methods are pivotal (Breitwieser et al., 2017). For example, when using metagenomics sequencing for the diagnosis of infections, finding a single pathogen in a complex background (which might include non-pathogenic microbial communities, contamination and false positives) can be extremely hard without the ability to compare results across multiple samples. Interactive exploration and visualization can help users understand the data and find the needle in the haystack.
With Pavian, we provide a novel interface to explore metagenomics results from the Kraken (Wood and Salzberg, 2014), KrakenUniq (Breitwieser et al., 2018), Centrifuge (Kim et al., 2016) and MetaPhlAn (Truong et al., 2015) classifiers. Pavian enables researchers to visualize and understand the species present in a single sample as well as compare identifications across multiple samples. Similar tools include Shiny-phyloseq (McMurdie and Holmes, 2015) and metaviz (Wagner et al., 2018), which are geared towards community analysis, while Pavian has unique features such as its Sankey flow diagram and genome alignment viewer.
2 Pavian software
Pavian implements a straightforward interface to analyze and compare complex metagenomics datasets. Figure 1 shows how Pavian displays data from multiple brain biopsies of patients with suspected neurological infections (Salzberg et al., 2016).

Fig. 1. (A) Main interface with sample classification summary shown in a Sankey diagram. The width of the flow corresponds to the number of reads, and hovering over a species node brings up a barchart with the number of reads for the species across the sample set (inset). (B) The sample comparison module provides a tabular overview of the reads, percentages or z-scores over all samples at any taxonomic rank with interactive filtering. (C) The alignment viewer displays the coverage of a particular genome based on BAM alignment files.
Sample view. The sample view provides an overview of the classifications (see Fig. 1A). Pavian's default visualization choice is a Sankey diagram, a type of graph that is often used to map energy flows. Pavian uses this diagram to display the flow of reads from the root of the taxonomy to more specific ranks. The width of the flow is proportional to the number of reads. The Sankey diagram puts a visual emphasis on the major flows within the system and thus provides a clear visual summary of the classifications.
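To make the flow idea concrete, the following minimal sketch derives Sankey links (parent, child, number of reads) from a classification report. It is an illustration only and not Pavian's own code. It assumes a Kraken-style report with six tab-separated columns (percentage, clade reads, directly assigned reads, rank code, taxonomy ID, name) in which the name is indented by two spaces per taxonomy level; the function name `sankey_links` and the example report are hypothetical.

```python
def sankey_links(report_lines):
    """Turn Kraken-style report lines into (parent, child, clade_reads) links.

    Assumption: the sixth column holds the taxon name, indented by two
    spaces for every level below the root.
    """
    links = []
    stack = []  # ancestor names, one entry per indentation depth
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        clade_reads = int(fields[1])
        raw_name = fields[5]
        name = raw_name.strip()
        if name == "unclassified":
            continue  # unclassified reads are not part of the taxonomy tree
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        del stack[depth:]  # drop ancestors that are not above this taxon
        if stack:
            # The link value is the clade read count, which sets the flow width.
            links.append((stack[-1], name, clade_reads))
        stack.append(name)
    return links


# Hypothetical report for a single sample.
report = [
    "10.00\t10\t10\tU\t0\tunclassified",
    "90.00\t90\t0\tR\t1\troot",
    "90.00\t90\t5\tD\t2\t  Bacteria",
    "60.00\t60\t60\tS\t562\t    Escherichia coli",
    "25.00\t25\t25\tS\t1280\t    Staphylococcus aureus",
]
links = sankey_links(report)
for parent, child, reads in links:
    print(f"{parent} -> {child}: {reads}")
```

Each link carries the clade-level read count, so a Sankey renderer that draws widths proportional to the link value gives flows that narrow from the root towards the species.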
Sample comparison. The sample comparison view provides a tabular view of identification results from multiple samples (Fig. 1B) to identify which microbes are commonly observed and which are present only in one or a few samples. The query-able table displays taxa as rows and samples as columns. By default, identifications at all taxonomy levels are shown, but the table can be filtered to display identifications at specific ranks, e.g. species, genus or phylum ranks, with one click. Additional columns with transformations of the data—such as percentage, z-score and rank—provide further ways to sort and filter the data.
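The transformations mentioned above can be sketched on a small taxa-by-samples table. This is a hedged illustration rather than Pavian's implementation: the text does not define the exact formulas, so the sketch assumes that percentages are taken relative to each sample's column total and that z-scores are computed per taxon across samples using the population standard deviation. The function names and the example counts are hypothetical.

```python
from statistics import mean, pstdev


def column_percentages(table):
    """Convert read counts to percent of each sample's (column's) total.

    table maps a taxon name to a list of read counts, one per sample.
    """
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[i] for row in table.values()) for i in range(n_samples)]
    return {
        taxon: [100.0 * v / t if t else 0.0 for v, t in zip(row, totals)]
        for taxon, row in table.items()
    }


def row_zscores(table):
    """Z-score of each count relative to the same taxon in all samples."""
    out = {}
    for taxon, row in table.items():
        mu, sd = mean(row), pstdev(row)
        # A taxon with identical counts everywhere has no spread; report 0.
        out[taxon] = [(v - mu) / sd if sd else 0.0 for v in row]
    return out


# Hypothetical read counts: rows are taxa, columns are four samples.
table = {
    "Escherichia coli": [10, 10, 10, 50],
    "Staphylococcus aureus": [30, 30, 30, 30],
}
percent = column_percentages(table)
zscore = row_zscores(table)
print(percent["Escherichia coli"])
print(zscore["Escherichia coli"])
```

In this example the fourth sample stands out for Escherichia coli with a clearly positive z-score, while Staphylococcus aureus is flat across samples. Sorting by such a column is one way to surface taxa that are present in only one or a few samples.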
Alignment viewer. A high read count for a particular species does not always mean that the microbe is present; e.g. contaminated genomes and low-complexity sequences can produce spurious matches between reads and genomes. Pavian implements an alignment viewer for BAM files that shows the read distribution over both the whole genome and in selected regions, which can give additional evidence on whether an identification is credible or not (Fig. 1C). The interface further provides links to download RefSeq genome assemblies.
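The reasoning behind inspecting coverage can be illustrated with a small sketch that contrasts breadth of coverage with mean depth. It is not Pavian's code and does not read BAM files; it assumes alignments are already available as hypothetical (start, length) pairs with 0-based start positions on a single genome of known length.

```python
def coverage_summary(genome_length, alignments):
    """Return breadth (fraction of positions covered) and mean depth.

    alignments is an iterable of (start, length) pairs, 0-based.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, length in alignments:
        if start < 0 or start >= genome_length:
            continue  # ignore reads that start outside the genome
        end = min(start + length, genome_length)
        diff[start] += 1
        diff[end] -= 1

    covered = 0      # positions with depth >= 1
    total_depth = 0  # sum of depth over all positions
    depth = 0
    for i in range(genome_length):
        depth += diff[i]
        if depth > 0:
            covered += 1
        total_depth += depth
    return {
        "breadth": covered / genome_length,
        "mean_depth": total_depth / genome_length,
    }


# 50 reads piled onto one 100 bp region versus 50 reads spread over the genome.
piled = coverage_summary(1000, [(0, 100)] * 50)
spread = coverage_summary(1000, [(s, 100) for s in range(0, 1000, 20)])
print(piled)
print(spread)
```

Both hypothetical read sets contain the same number of reads, yet the piled set covers only a tenth of the genome. A pattern like this is what one would expect from a low-complexity or contaminated region rather than from a microbe that is truly present.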
3 Implementation
Pavian is written in R (https://www.R-project.org) as a modular Shiny app. It incorporates several interactive JavaScript tables and D3 plots (Bostock et al., 2011), such as the Sankey diagram, which is based on D3's sankey code and the networkD3 library. The alignment viewer uses the Rsamtools package to load BAM files. The backend can be launched from an R environment or installed on a server, and the interface is accessed through a web browser.
4 Conclusion
Pavian is a novel tool for visualizing and analyzing metagenomics data. Its functions help microbiome researchers as well as clinical microbiologists to gain a better understanding of their data through Sankey flow diagrams, multiple comparison tables and a genome alignment viewer.
Funding
This work was supported in part by grants R35-GM130151 and R01-HG006677 from the National Institutes of Health, and by grant number W911NF-14-1-0490 from the U.S. Army Research Office.
Conflict of Interest: none declared.