Mass spectrometry is widely used in bioanalysis, including metabolomics and proteomics, to measure large numbers of molecules simultaneously in complex biological samples. Contaminants routinely occur in these samples, originating, for example, from solvents or plasticware. Identifying these contaminants is crucial so that they can be removed before data analysis, in particular to preserve the validity of conclusions drawn from univariate and multivariate statistical analyses. Although efforts have been made to report contaminants observed in mass spectra, this information is fragmented and not readily accessible. In response to the needs of the bioanalytical community, we report the creation of an extensive, manually annotated database of currently known small-molecule contaminants.
Availability: The Mass spectrometry Contaminants Database (MaConDa) is freely available at http://www.maconda.bham.ac.uk and can be accessed through all major web browsers or via the MaConDa web service.
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
Our understanding of biological systems has improved considerably through recent developments in mass spectrometry (MS)-based metabolomics (Dettmer et al., 2007; Patti et al., 2012). Continuous efforts have been made to improve the quality of metabolome measurements, including sample preparation (Villas-Boas et al., 2005), data collection (Dunn et al., 2011; Weber et al., 2011) and data analysis (Dunn et al., 2012; Weber et al., 2011). Nonetheless, sample preparation and MS analysis can introduce contaminants such as plasticizers, additives and solvents (Keller et al., 2008). Such contaminants of laboratory origin can obscure or even falsify the biological interpretation of the data. For example, when univariate or multivariate statistical analyses are used for biomarker discovery, the conclusions of a study can be fundamentally flawed if signals are left unidentified and later turn out to be exogenous chemicals. Several analytical methods have been reported to minimize the interference caused by MS contaminants, yet contaminants remain a major problem in MS experiments, and improved methods to identify them and then treat them appropriately are urgently needed. Although identified contaminants should in most cases be removed from datasets, they can occasionally be beneficial, for example when used for internal mass calibration of spectra (Scheltema et al., 2008).
Here we present a manually annotated database of currently known MS contaminants to assist both the metabolomics and the bioanalytical chemistry communities with data processing.
2 METHODS AND IMPLEMENTATION
MaConDa contains more than 200 contaminant records detected across several MS platforms. Most records include both theoretical and experimental MS data. In a few cases, experimental data were included without rigorous identification as defined by Sumner et al. (2007). Most experimental data reported in the literature were acquired in positive ion mode, and the database reflects this bias. MS/MS data for contaminants are also currently limited, although the database can store such data as the community reports more of them. To the best of our knowledge, MaConDa is the first database of mass spectral contaminants that is publicly accessible, readily searchable, easily integrated into automated computational pipelines and readily expandable.
MaConDa offers the following features:
Database access via a SOAP web service;
Database access via a user-friendly web interface;
Batch processing of peak lists;
Searching for contaminants using additional ion forms;
Exporting results into different formats (e.g. tab-delimited and CSV);
Multiple database identifiers (e.g. PubChem Compound Identifier and Standard InChI) for each contaminant, allowing cross-referencing with other resources and databases;
The complete database is freely available in several formats (e.g. tab-delimited, CSV, XML and SQL).
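The batch peak-list search with additional ion forms can be illustrated with a minimal Python sketch. It is not MaConDa's own search code and does not call its SOAP web service; the contaminant entries and their neutral masses, the 5 ppm tolerance, the four positive-ion adducts and all function names are hypothetical choices for illustration. Each observed m/z is compared with the theoretical m/z of every ion form of every contaminant, and a match is kept when the mass error falls within the tolerance; the hits are then written out as tab-delimited text.

```python
import csv
import io
from dataclasses import dataclass

# Commonly tabulated monoisotopic mass shifts (Da) from the neutral mass M
# to a few positive-ion adducts. Illustrative subset only.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


@dataclass
class Contaminant:
    name: str
    neutral_mass: float  # monoisotopic neutral mass in Da


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(peaks, contaminants, tol_ppm=5.0):
    """Return (observed m/z, contaminant, ion form, ppm error) for every hit."""
    hits = []
    for mz in peaks:
        for c in contaminants:
            for ion_form, shift in ADDUCT_SHIFTS.items():
                err = ppm_error(mz, c.neutral_mass + shift)
                if abs(err) <= tol_ppm:
                    hits.append((mz, c.name, ion_form, round(err, 2)))
    return hits


def to_tsv(hits) -> str:
    """Serialize hits as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["observed_mz", "contaminant", "ion_form", "ppm_error"])
    writer.writerows(hits)
    return buf.getvalue()


# Hypothetical reference entries and peak list (not MaConDa records).
contaminants = [
    Contaminant("compound_A", 390.2770),
    Contaminant("compound_B", 200.0000),
]
peaks = [391.2843, 413.2662, 500.0000]

hits = match_peaks(peaks, contaminants)

if __name__ == "__main__":
    print(to_tsv(hits), end="")
```

With these made-up inputs, the peak at m/z 391.2843 is assigned to compound_A as [M+H]+ and the peak at m/z 413.2662 as [M+Na]+, while m/z 500.0000 remains unassigned. A relative (ppm) tolerance is used because instrument mass accuracy is usually specified in ppm; a real search would also need to handle negative-ion forms, multiply charged ions and fragments.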
MaConDa is an extensive, manually annotated database that provides a useful and unique resource for the MS community. Analytical techniques used in metabolomics and proteomics are continually being improved to increase their sensitivity, and as a result new contaminants continue to appear in the experimental pipeline. Continued submission of newly observed contaminants by the MS community and by our own laboratory will further increase the value of MaConDa as a resource.
We are grateful to our colleagues (David Watson, University of Strathclyde; John Draper, Aberystwyth University; John Langley, University of Southampton; John Newman, University of California Davis; Warwick Dunn, University of Manchester; William Griffiths, Swansea University) and to the instrument manufacturers (Thermo Fisher Scientific and Bruker Daltonics) who provided us with MS contaminant data. We thank Cheng Cao for his contribution to the website.
Funding: We thank both the British Heart Foundation (PG/10/036/28341) and UK Engineering and Physical Sciences Research Council (EP/J501414/1) for support, as well as the University of Birmingham’s Systems Science for Health initiative.
Conflict of Interest: none declared.