Thomas Naake, Emmanuel Gaquerel, MetCirc: navigating mass spectral similarity in high-resolution MS/MS metabolomics data, Bioinformatics, Volume 33, Issue 15, August 2017, Pages 2419–2420, https://doi.org/10.1093/bioinformatics/btx159
Abstract
Among the main challenges in metabolomics are the rapid dereplication of previously characterized metabolites across a range of biological samples and the structural prediction of unknowns from MS/MS data. Here, we developed MetCirc to comprehensively align and calculate pairwise similarity scores among MS/MS spectral data and visualize these across a range of biological samples. MetCirc comprises functionalities to interactively organize these data according to compound familial groupings and to accelerate the discovery of shared metabolites and hypothesis formulation for unknowns. As such, MetCirc provides a significant advance to address biological questions in areas where chemodiversity plays a role.
MetCirc, implemented in the open-source R language, and its vignette are available in the Bioconductor project and at https://github.com/PlantDefenseMetabolism/MetCirc.
Supplementary data are available at Bioinformatics online.
1 Introduction
One of the often neglected bottlenecks in the high-throughput profiling of metabolites is the dereplication of identification knowledge. Critical to this procedure is the ability to confidently assign metabolite identity and to score similarities across metabolite-derived analytical signals. Because such a procedure requires discriminating previously known from unknown metabolites, it is also expected to provide insights for the annotation of unknowns. Tandem mass spectrometry (MS/MS) of metabolite-derived fragmentation patterns has become the method of choice to accomplish this task. Innovative methods are needed to organize the large amount of data generated by modern MS/MS analytics, especially when the analysis is performed across a wide range of samples. Recently, the comprehensive analysis of MS/MS spectral similarity has emerged as a translational dereplication approach in metabolomics (Li et al., 2015; Watrous et al., 2012). Ideally, this approach should be employed to generate data-rich visual outputs that can be interactively navigated by non-bioinformatics users. Such data exploration should help to detect metabolites that are shared across the focal biological classes, as well as more ‘specific’ ones for which an a priori classification can be provided. To our knowledge, interactive visualizations fulfilling these criteria have not yet been developed. Inspired by the Circos visualization developed to align gene and syntenic blocks across genomes (Krzywinski et al., 2009), we developed MetCirc for MS/MS similarity-based interactive analysis of within- and between-sample metabolic relatedness.
2 Available functionality/implementation
MetCirc is an open-source R tool for the analysis and interactive visualization of high-resolution MS/MS data. It is especially designed to organize data from comparative cross-organism or cross-tissue experiments. The comprehensive calculation of pairwise similarities of MS/MS spectra within MetCirc is based on a normalized dot product (NDP; Watrous et al., 2012). The circular visualization in MetCirc is based on the circlize (Gu et al., 2014) and shiny (https://CRAN.R-project.org/package=shiny) packages. The tool represents a novel application, since the original Circos visualization approach is currently restricted to the analysis of genomics and transcriptomics data and is not yet amenable to MS/MS metabolomics data. MetCirc is available via the Bioconductor project (https://bioconductor.org).
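For orientation, a weighted normalized dot product between two spectra S1 and S2 is commonly written as below. This is a sketch rather than a statement of the exact scoring used by the package: the symbols are introduced here for illustration, the sum runs over the m/z bins i shared by the two spectra, I denotes fragment intensity, and the weighting exponents m and n are assumed tuning parameters that the text above does not specify.

```latex
\[
\mathrm{NDP}(S_1, S_2) =
\frac{\left(\sum_i W_{S_1,i}\, W_{S_2,i}\right)^2}
     {\sum_i W_{S_1,i}^2 \;\sum_i W_{S_2,i}^2},
\qquad
W_{S,i} = I_{S,i}^{\,m}\,\bigl[(m/z)_i\bigr]^{\,n}
\]
```

Under this form the score lies between 0 (no shared fragments) and 1 (proportional weighted spectra).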
3 A typical pipeline
The pipeline includes the creation of an MSP object based on fragmentation data, binning of fragment ions (mass-to-charge ratios referred to as m/z features), the calculation of the MS/MS similarity score (NDP) for assignment to a similarity matrix and the interactive visualization of this information using the circlize framework (Gu et al., 2014).
Creation of the MSP object. The MSP object can be created from MSP files, typically used for MS/MS library building, or from other sources. The Supplementary file and the accompanying vignette describe several ways to create an MSP object.
Binning. Due to expected technical variations in m/z values across measurements (<1-2 ppm with modern high-resolution MS), these values are first binned according to a tolerance parameter. A matrix is then created with binned fragment m/z values as columns, MS/MS descriptors as rows and intensities in % as entries, via the following command:
binnedMSP <- binning(MSP)
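To make the binning step concrete, the following base-R sketch groups fragment m/z values that lie within a tolerance of each other and builds a spectrum-by-bin intensity matrix. It does not call MetCirc and is not the package's implementation: the toy data, the absolute tolerance `tol` and the gap-based grouping rule are all assumptions chosen for illustration.

```r
# Toy fragment table: one row per fragment ion (all values are made up).
frags <- data.frame(
  spectrum  = c("A", "A", "B", "B", "C"),
  mz        = c(100.0001, 150.0500, 100.0002, 200.1000, 150.0501),
  intensity = c(100, 40, 100, 25, 100),
  stringsAsFactors = FALSE
)

tol <- 0.01  # hypothetical absolute m/z tolerance (assumption)

# Sort fragments by m/z and open a new bin wherever the gap to the
# previous fragment exceeds the tolerance.
ord <- order(frags$mz)
bin_id <- integer(nrow(frags))
bin_id[ord] <- cumsum(c(TRUE, diff(frags$mz[ord]) > tol))

# Label each bin with the mean m/z of its members.
bin_mz <- tapply(frags$mz, bin_id, mean)

# Rows: MS/MS spectra; columns: binned m/z; entries: intensity in %.
# Note: if one spectrum had two fragments in the same bin, the later
# one would overwrite the earlier one in this simple sketch.
spectra <- unique(frags$spectrum)
binned <- matrix(0, nrow = length(spectra), ncol = length(bin_mz),
                 dimnames = list(spectra, format(round(bin_mz, 4), nsmall = 4)))
binned[cbind(match(frags$spectrum, spectra), bin_id)] <- frags$intensity

print(binned)
```

Here the two fragments near m/z 100 and the two near m/z 150 each collapse into one column, so spectra that share a fragment have non-zero entries in the same column.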
Visualization. Interactive visualization is started by running:
similarity <- createSimilarityMatrix(binnedMSP)
selectedFeatures <- shinyCircos(similarity, MSP)
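The similarity matrix passed to the visualization holds one NDP score per pair of spectra. A minimal base-R sketch of how such a matrix can be filled from a binned intensity matrix is given below. It is an illustration under stated assumptions, not the package's code: the helper `ndp()`, the toy matrix, and the weighting exponents `m = 0.5` and `n = 2` are assumed here.

```r
# Weighted normalized dot product between two binned spectra.
# x, y: intensity vectors over the same m/z bins; mz: m/z value of each bin.
# m and n are weighting exponents (assumed values, not taken from the text).
ndp <- function(x, y, mz, m = 0.5, n = 2) {
  wx <- x^m * mz^n
  wy <- y^m * mz^n
  sum(wx * wy)^2 / (sum(wx^2) * sum(wy^2))
}

# Toy binned matrix: rows are spectra, columns are m/z bins (made-up data).
mz <- c(100, 150, 200)
binned <- rbind(A = c(100, 40, 0),
                B = c(100, 0, 25),
                C = c(0, 100, 0))

# Fill a symmetric matrix of pairwise scores; the diagonal is 1 by definition.
n_spec <- nrow(binned)
sim <- matrix(1, n_spec, n_spec,
              dimnames = list(rownames(binned), rownames(binned)))
for (i in seq_len(n_spec - 1)) {
  for (j in (i + 1):n_spec) {
    sim[i, j] <- sim[j, i] <- ndp(binned[i, ], binned[j, ], mz)
  }
}

print(round(sim, 3))
```

Spectra B and C share no bin, so their score is 0, while A shares one bin with each of the others and receives intermediate scores.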
One of the key features of the interactive framework is that pre-defined biological classes can be compared (see Fig. 1). Classes specify the affiliation of a given MS/MS feature to any biological identifier relevant to the experiment conducted, e.g. organism, cell/tissue type, treatment or sampling time. shinyCircos, which combines the circlize and shiny frameworks, is used to visualize pairwise MS/MS similarities within and/or between biological classes. Upon selection of an MS/MS feature, the displayed auxiliary information can be updated by the user within the browser. NDP MS/MS similarity scores, visualized as edges, can be thresholded. Additionally, the type of biological inference link to be displayed can be selected on the sidebar panel of the browser. The MS/MS ordering within a biological class (according to precursor m/z, retention time or clustering by MS/MS similarity) can be rapidly interchanged.
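The thresholding and link-type selection described above can be pictured as a filter over the similarity matrix. The base-R sketch below is only an illustration of that idea and does not reproduce the shiny application: the feature names, the `class_id` naming convention, the toy scores and the threshold value are all hypothetical.

```r
# Toy similarity matrix for three MS/MS features (made-up scores).
# Hypothetical naming convention: "<class>_<id>".
ids <- c("leaf_1", "leaf_2", "root_1")
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.4,
                0.1, 0.4, 1.0),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Biological class of each feature, derived from the hypothetical prefix.
classes <- setNames(sub("_.*", "", ids), ids)

threshold <- 0.3  # assumed NDP cut-off for drawing an edge

# Keep each pair once (upper triangle) if its score passes the threshold.
idx <- which(upper.tri(sim) & sim >= threshold, arr.ind = TRUE)
edges <- data.frame(from = ids[idx[, "row"]],
                    to   = ids[idx[, "col"]],
                    ndp  = sim[idx],
                    stringsAsFactors = FALSE)

# Label each edge as linking features within or between classes.
edges$type <- ifelse(classes[edges$from] == classes[edges$to],
                     "within", "between")

print(edges)
```

Subsetting `edges` by `type` then corresponds to showing only within-class or only between-class links.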
4 Conclusion
In summary, MetCirc is a novel open-source package available via the R/Bioconductor project to make biological sense of mass spectral similarities from metabolomics data. MetCirc provides a dedicated data analysis infrastructure and visualization interface to explore small molecules that mediate functionally important phenotypes. Possible applications range from MS/MS-based metabolic pathway elucidation (in which biochemical conversions of metabolic intermediates are tracked in an unbiased manner by spectral similarity) to the diagnosis of metabolic markers in clinical metabolomics projects. Finally, stepwise data mining via MetCirc can be used to pinpoint and formulate first structural hypotheses on previously non-characterized metabolites associated with a given phenotype.
Acknowledgements
We thank the Metabolomics Core Technology Platform (University of Heidelberg) for technical support and Dapeng Li (Max Planck Institute for Chemical Ecology) for help with compiling the MS/MS data.
Funding
Research of E.G. and T.N. in Heidelberg is supported within the framework of the Deutsche Forschungsgemeinschaft Excellence Initiative to the University of Heidelberg.
Conflict of Interest: none declared.
References