Abstract

Summary

Pavian is a web application for exploring classification results from metagenomics experiments. With Pavian, researchers can analyze, visualize and transform results from various classifiers, such as Kraken, Centrifuge and MetaPhlAn, using interactive data tables, heatmaps and Sankey flow diagrams. An interactive alignment coverage viewer helps validate matches to a particular genome, which can be crucial when using metagenomics experiments for pathogen detection.

Availability and implementation

Pavian is implemented in the R language as a modular Shiny web app and is freely available under GPL-3 from http://github.com/fbreitwieser/pavian.

1 Introduction

Microbiome research has seen enormous growth over the last decade, especially in the areas of health and disease, microbial ecology and infectious diseases. As metagenomics data become more widespread, secondary data analysis and visualization methods are pivotal (Breitwieser et al., 2017). For example, when using metagenomics sequencing for the diagnosis of infections, finding a single pathogen in a complex background (which might include non-pathogenic microbial communities, contamination and false positives) can be extremely hard without the ability to compare results across multiple samples. Interactive exploration and visualization can help users understand the data and find the needle in the haystack.

With Pavian, we provide a novel interface to explore metagenomics results from the Kraken (Wood and Salzberg, 2014), KrakenUniq (Breitwieser et al., 2018), Centrifuge (Kim et al., 2016) and MetaPhlAn (Truong et al., 2015) classifiers. Pavian enables researchers to visualize and understand the species present in a single sample as well as compare identifications across multiple samples. Similar tools include Shiny-phyloseq (McMurdie and Holmes, 2015) and Metaviz (Wagner et al., 2018), which are geared towards community analysis, whereas Pavian offers distinctive features such as its Sankey flow diagram and genome alignment viewer.

2 Pavian software

Pavian implements a straightforward interface to analyze and compare complex metagenomics datasets. Figure 1 shows how Pavian displays data from multiple brain biopsies of patients with suspected neurological infections (Salzberg et al., 2016).

Fig. 1. (A) Main interface with sample classification summary shown in a Sankey diagram. The width of the flow corresponds to the number of reads, and hovering over a species node brings up a barchart with the number of reads for the species across the sample set (inset). (B) The sample comparison module provides a tabular overview of the reads, percentages or z-scores over all samples at any taxonomic rank with interactive filtering. (C) The alignment viewer displays the coverage of a particular genome based on BAM alignment files.

Sample view. The sample view provides an overview of the classifications (see Fig. 1A). Pavian's default visualization choice is a Sankey diagram, a type of graph that is often used to map energy flows. Pavian uses this diagram to display the flow of reads from the root of the taxonomy to more specific ranks. The width of the flow is proportional to the number of reads. The Sankey diagram puts a visual emphasis on the major flows within the system and thus provides a clear visual summary of the classifications.
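To illustrate how a flow from the taxonomy root to more specific ranks can be derived, the following is a minimal Python sketch, not Pavian's own code. It assumes a Kraken-style tab-separated report with six columns (percentage, clade reads, taxon reads, rank code, taxonomy ID, name) in which each taxonomy level indents the name by two spaces and depth never increases by more than one level between consecutive lines. The function name `sankey_links` and the example data are hypothetical.

```python
from typing import List, Tuple


def sankey_links(report_lines: List[str]) -> List[Tuple[str, str, int]]:
    """Turn Kraken-style report lines into (parent, child, clade_reads) links.

    The link value is the number of reads assigned to the child's clade,
    which is what determines the width of the flow into that node.
    """
    links: List[Tuple[str, str, int]] = []
    stack: List[str] = []  # stack[d] = most recent taxon name seen at depth d
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip empty or malformed lines
        clade_reads = int(fields[1])
        raw_name = fields[5]
        name = raw_name.strip()
        if name == "unclassified":
            continue  # unclassified reads are not part of the taxonomy tree
        # Two leading spaces per taxonomy level (assumed report layout).
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        del stack[depth:]  # leave only the ancestors of the current taxon
        if stack:
            links.append((stack[-1], name, clade_reads))
        stack.append(name)
    return links


# Hypothetical, heavily shortened report used only for illustration.
example_report = [
    "10.00\t10\t10\tU\t0\tunclassified",
    "90.00\t90\t0\tR\t1\troot",
    "90.00\t90\t0\tD\t2\t  Bacteria",
    "60.00\t60\t60\tS\t562\t    Escherichia coli",
    "30.00\t30\t30\tS\t1280\t    Staphylococcus aureus",
]

for parent, child, reads in sankey_links(example_report):
    print(f"{parent} -> {child}: {reads} reads")
```

Each resulting triple corresponds to one flow in the diagram; a plotting library would then draw the link width in proportion to the read count.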

Sample comparison. The sample comparison view provides a tabular view of identification results from multiple samples (Fig. 1B) to identify which microbes are commonly observed and which are present only in one or a few samples. The queryable table displays taxa as rows and samples as columns. By default, identifications at all taxonomy levels are shown, but the table can be filtered with one click to display identifications at specific ranks, e.g. species, genus or phylum. Additional columns with transformations of the data, such as percentage, z-score and rank, provide further ways to sort and filter the data.
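The percentage and z-score transformations can be sketched for a single taxon (one table row) as follows. This is an illustrative Python example under stated assumptions, not Pavian's implementation: the z-score here is the standard one using the population standard deviation across samples, and Pavian's exact definition may differ. The function names and example counts are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def percentages(taxon_reads: List[int], total_reads: List[int]) -> List[float]:
    """Reads of one taxon as a percentage of the total reads of each sample."""
    return [
        100.0 * reads / total if total else 0.0
        for reads, total in zip(taxon_reads, total_reads)
    ]


def z_scores(values: List[float]) -> List[float]:
    """Standard z-score of each sample's value relative to all samples.

    Assumption: population standard deviation; a taxon with identical
    values in every sample gets a z-score of 0 everywhere.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical read counts for one taxon in three samples.
taxon_reads = [10, 20, 30]
total_reads = [100, 100, 100]

pct = percentages(taxon_reads, total_reads)
print(pct)
print(z_scores(pct))
```

A taxon whose z-score is high in one sample and near zero elsewhere is the kind of outlier that a comparison across samples is meant to surface.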

Alignment viewer. A high read count for a particular species does not always mean that the microbe is present; e.g. contaminated genomes and low-complexity sequences can produce spurious matches between reads and genomes. Pavian implements an alignment viewer for BAM files that shows the read distribution over both the whole genome and in selected regions, which can give additional evidence on whether an identification is credible or not (Fig. 1C). The interface further provides links to download RefSeq genome assemblies.
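The reasoning behind inspecting read distribution can be made concrete with a small Python sketch. It is a simplified illustration rather than the viewer's actual code: alignments are assumed to be given as 0-based, half-open `(start, end)` intervals already extracted from a BAM file, and all function names are hypothetical. A genome with many reads but a low breadth of coverage (reads piled up in a few regions) is a hint that the matches may be spurious.

```python
from typing import List, Tuple


def per_base_depth(alignments: List[Tuple[int, int]], genome_length: int) -> List[int]:
    """Read depth at every genome position from (start, end) intervals."""
    # Difference array: +1 where an alignment starts, -1 just past its end.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depths: List[int] = []
    depth = 0
    for pos in range(genome_length):
        depth += diff[pos]
        depths.append(depth)
    return depths


def windowed_mean_depth(depths: List[int], window: int) -> List[float]:
    """Mean depth in consecutive windows (the last window may be shorter)."""
    means = []
    for i in range(0, len(depths), window):
        chunk = depths[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


def breadth_of_coverage(depths: List[int]) -> float:
    """Fraction of genome positions covered by at least one read."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d > 0) / len(depths)


# Hypothetical toy genome of 10 bases with three reads in the first half.
depths = per_base_depth([(0, 5), (0, 5), (2, 4)], genome_length=10)
print(depths)
print(windowed_mean_depth(depths, window=5))
print(breadth_of_coverage(depths))
```

In this toy example all reads fall in the first window, so the second window has zero depth and only half of the genome is covered.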

3 Implementation

Pavian is written in R (https://www.R-project.org) as a modular Shiny app. It incorporates several interactive JavaScript tables and D3 plots (Bostock et al., 2011), such as the Sankey diagram, which is based on D3's sankey code and the networkD3 library. The alignment viewer uses the Rsamtools library to load the BAM file. The backend can be launched from an R environment or installed on a server, and the interface is accessed through a web browser.

4 Conclusion

Pavian is a novel tool for visualizing and analyzing metagenomics data. Its functions help microbiome researchers as well as clinical microbiologists to gain a better understanding of their data through Sankey flow diagrams, multiple comparison tables and a genome alignment viewer.

Funding

This work was supported in part by grants R35-GM130151 and R01-HG006677 from the National Institutes of Health, and by grant number W911NF-14-1-0490 from the U.S. Army Research Office.

Conflict of Interest: none declared.

References

Bostock M. et al. (2011) D³ data-driven documents. IEEE Trans. Vis. Comput. Graph., 17, 2301–2309.

Breitwieser F.P. et al. (2018) KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol., 19, 198.

Breitwieser F.P. et al. (2017) A review of methods and databases for metagenomic classification and assembly. Brief. Bioinform. doi: 10.1093/bib/bbx120.

Kim D. et al. (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res., 26, 1721–1729.

McMurdie P.J., Holmes S. (2015) Shiny-phyloseq: web application for interactive microbiome analysis with provenance tracking. Bioinformatics, 31, 282–283.

Salzberg S.L. et al. (2016) Next-generation sequencing in neuropathologic diagnosis of infections of the nervous system. Neurol. Neuroimmunol. Neuroinflamm., 3, e251.

Truong D.T. et al. (2015) MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods, 12, 902–903.

Wagner J. et al. (2018) Metaviz: interactive statistical and visual analysis of metagenomic data. Nucleic Acids Res., 46, 2777–2787.

Wood D.E., Salzberg S.L. (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol., 15, R46.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Associate Editor: Russell Schwartz