Accessible and reproducible mass spectrometry imaging data analysis in Galaxy

Abstract:
Background: Mass spectrometry imaging is increasingly used in biological and translational research because it has the ability to determine the spatial distribution of hundreds of analytes in a sample. Being at the interface of proteomics/metabolomics and imaging, the acquired datasets are large and complex and often analyzed with proprietary software or in-house scripts, which hinders reproducibility. Open source software solutions that enable reproducible data analysis often require programming skills and are therefore not accessible to many mass spectrometry imaging (MSI) researchers.

Findings: We have integrated 18 dedicated mass spectrometry imaging tools into the Galaxy framework to allow accessible, reproducible, and transparent data analysis. Our tools are based on Cardinal, MALDIquant, and scikit-image and enable all major MSI analysis steps such as quality control, visualization, preprocessing, statistical analysis, and image co-registration. Furthermore, we created hands-on training material for use cases in proteomics and metabolomics. To demonstrate the utility of our tools, we re-analyzed a publicly available N-linked glycan imaging dataset. By providing the entire analysis history online, we highlight how the Galaxy framework fosters transparent and reproducible research.

Conclusion: The Galaxy framework has emerged as a powerful analysis platform for the analysis of MSI data with ease of use and access, together with high levels of reproducibility and transparency.


The Galaxy framework for flexible and reproducible data analysis
In essence, the Galaxy framework is characterized by four hallmarks: (1) a graphical front-end that is web browser based, which removes the need for advanced IT skills or for locally installing and maintaining software tools; (2) access to large-scale computational resources for academic users; (3) provenance tracking and full version control, including the ability to switch between software and tool versions and to publish complete analyses, thus enabling full reproducibility; (4) access to a vast array of open-source tools with the ability to pass data seamlessly from one tool to another, thus generating added value through interoperability.
Multiple Galaxy servers on essentially every continent provide access to large computing resources, data storage capabilities, and hundreds of pre-installed tools for a broad range of data analysis applications through a web browser based graphical user interface [28][29][30].
Additionally, more than one hundred public Galaxy servers are available that offer more specific tools for niche application areas. For local usage, Galaxy can be installed on any computer, ranging from private laptops to high-performance computing clusters. So-called "containers" facilitate a fully functional one-click installation independent of the operating system. Hence, local Galaxy servers are easily deployed even in "private" network situations in which these servers remain invisible and inaccessible to outside users. This ability makes Galaxy suitable for the analysis of sensitive and protected data, e.g. in a clinical setting.
In the Galaxy framework, data analysis information is stored alongside the results of each analysis step to ensure reproducibility and traceability of results. The information includes tool and software names and versions together with all parameters [31].
We propose that MSI research can greatly benefit from the possibility to privately or publicly share data analysis histories, workflows, and visualizations with collaboration partners or the entire scientific community, e.g. as online supplementary data for peer-reviewed publications.
The latter step easily fulfills the criteria of the suggested MSI minimum reporting guidelines [6,16]. The Galaxy framework is well suited for the analysis of multi-omics studies as it facilitates the integration of software of different origins into one analysis [32,33]. The possibility to seamlessly link tools of different origins has outstanding potential for MSI studies, which often rely on different software platforms to analyze MSI data, additional MS/MS data (from liquid chromatography coupled tandem mass spectrometry), and (multimodal) imaging data. More than one hundred tools for proteomics and metabolomics data analysis are readily available in Galaxy due to community-driven efforts [34][35][36][37][38]. Increasing integration of MSI with other omics approaches such as genomics and transcriptomics is anticipated, and the Galaxy framework offers a powerful and future-proof platform to tackle complex, interconnected data-driven experiments.

The newly available MSI toolset in the Galaxy framework
We have developed 18 Galaxy tools that are based on the commonly used open-source software packages Cardinal, MALDIquant, and scikit-image and enable all steps that commonly occur in MSI data analysis (Figure 1) [20,21,24]. To integrate those tools deeply into the Galaxy framework, we developed bioconda packages and biocontainers as well as a so-called 'wrapper' for each tool [31,39]. The MSI tools consist of R scripts that were developed based on Cardinal and MALDIquant functionalities and extended with more analysis options and a consistent framework for input and output of metadata (Additional File 1). The image co-registration method uses scikit-image for image processing. All tools are deliberately built in a modular way to enable highly flexible analysis and to allow a multitude of additional functionalities by combining the MSI-specific tools with already available Galaxy tools. Typical MSI data analysis steps include quality control, file handling, preprocessing, ROI annotation, supervised and unsupervised statistical analysis, visualization, and identification of features. Due to the variety of MSI applications, tools from all or only a few of these categories are used, and the order of usage is highly flexible. To serve a broad range of data analysis tasks, we provide 18 tools that cover all common data analysis procedures and can be arbitrarily connected to allow customized analysis.

Data formats and data handling:
We extended the Galaxy framework to support open and standardized MSI data files such as imzML, which is the default input format for the Galaxy MSI tools. Nowadays, the major mass spectrometer vendors directly support the imzML standard and several tools exist to convert different file formats to imzML [40]. Data can be easily uploaded to Galaxy via a web browser or via a built-in file transfer protocol (FTP) functionality. Intermediate result files can be further processed in the interactive environment that supports RStudio and Jupyter or downloaded for additional analysis outside of Galaxy [41].
To facilitate the parallel analysis of multiple files, the Galaxy framework offers so-called "file collections". Numerous files can be represented in a file collection, allowing simultaneous analysis of all files while the effort for the user is the same as for a single file. MSI metadata such as spectra annotations, calibrant m/z, and statistical results are stored as tab-separated values files, thus enabling processing by a plethora of tools both inside and outside the Galaxy framework. All graphical results of the MSI tools are stored as concise vector graphic PDF reports with publication-quality images.

Quality control and visualization:
MSI Quality control: Quality control is an essential step in data analysis and should be used not only to judge the quality of the raw data but also to control processing steps such as smoothing, peak picking, and intensity normalization. Therefore, we have developed the 'MSI Qualitycontrol' tool, which automatically generates a comprehensive PDF report with more than 30 different plots that give a global view of all aspects of the MSI data, including intensity distribution, m/z accuracy, and segmentation maps. For example, spectra of poor quality, such as those with a low total ion current or a low number of peaks, can be directly spotted in the quality report and subsequently removed by applying the 'MSI data exporter' and 'MSI filtering' tools.
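As an illustration of the kind of check the quality report supports, the following minimal Python sketch flags spectra with a low total ion current (TIC) or a low number of peaks. It is not the code of the 'MSI Qualitycontrol' tool; the function name `flag_low_quality`, the thresholds, and the simplified definition of a peak (any intensity above a noise floor) are assumptions made for this example.

```python
# Hypothetical sketch: flag low-quality spectra by TIC and peak count.
# Thresholds and the peak definition are illustrative assumptions only.

def flag_low_quality(spectra, min_tic, min_peaks, noise_floor=0.0):
    """Return the indices of spectra whose TIC or peak count is too low.

    spectra: list of intensity lists, one list per pixel (spectrum).
    """
    flagged = []
    for index, intensities in enumerate(spectra):
        tic = sum(intensities)  # total ion current of this spectrum
        n_peaks = sum(1 for value in intensities if value > noise_floor)
        if tic < min_tic or n_peaks < min_peaks:
            flagged.append(index)
    return flagged


spectra = [
    [10.0, 0.0, 25.0, 5.0],
    [0.5, 0.0, 0.0, 0.2],   # very low TIC, candidate for removal
    [8.0, 12.0, 0.0, 30.0],
]
low_quality = flag_low_quality(spectra, min_tic=5.0, min_peaks=2)
```

The resulting indices could then be written to a tab-separated values file and passed to a filtering step that keeps only the remaining pixels.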
MSI mz image: The 'MSI mz image' tool automatically generates a publication-quality PDF file with distribution heat maps for all m/z features provided in a tab-separated values file. Contrast enhancement and smoothing options are available, as well as the possibility to overlay several m/z features in one image.
MSI plot spectra: The 'MSI plot spectra' tool displays multiple single or average mass spectra in a pdf file. Overlay of multiple single or averaged mass spectra with different colors in one plot is also possible.
The Galaxy framework offers various visualization options for tab-separated values files, including heatmaps, barplots, scatterplots, and histograms. This enables a quick visualization of the properties of tab-separated values files obtained during MSI analysis.
A large variety of tools that allows for filtering, sorting, and manipulating of tab-separated values files is already available in Galaxy and can be integrated into the MSI data analysis. Some dedicated tools for imzML file handling were newly integrated into the Galaxy framework.
MSI combine: The 'MSI combine' tool combines several imzML files into a merged dataset. This is especially important to enable direct visual and statistical comparison of MSI data derived from multiple files. With the 'MSI combine' tool, individual MSI datasets are either placed next to each other in a coordinate system or can be shifted in x or y direction in a user-defined way. The output of the tool contains a single file with the combined MSI data and an additional tab-separated values file with spectra annotations, i.e. each spectrum is annotated with its original file name (before combination) and, if applicable, with previously defined annotations such as diagnosis, disease type, and other clinical parameters.
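The side-by-side placement described for 'MSI combine' can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the tool's implementation: the function `combine_side_by_side`, the file names, and the choice to shift only along the x axis are hypothetical.

```python
# Hypothetical sketch: place datasets next to each other along x and
# annotate every pixel with the file it came from.

def combine_side_by_side(datasets):
    """Shift each dataset to the right of the previous one.

    datasets: dict mapping a file name to a list of (x, y) pixel coordinates.
    Returns a list of (x, y, file_name) tuples with 1-based x coordinates.
    """
    combined = []
    x_offset = 0
    for name, pixels in datasets.items():
        min_x = min(x for x, _ in pixels)
        max_x = max(x for x, _ in pixels)
        for x, y in pixels:
            combined.append((x - min_x + 1 + x_offset, y, name))
        x_offset += max_x - min_x + 1  # width of the dataset just placed
    return combined


datasets = {
    "treated.imzML": [(1, 1), (2, 1), (1, 2)],
    "control.imzML": [(1, 1), (2, 1)],
}
annotated = combine_side_by_side(datasets)
```

Each returned tuple corresponds to one row of a spectra annotation table (x, y, original file name).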
MSI filtering: The 'MSI filtering' tool provides options to filter m/z features and pixels (spectra) of interest, either by applying manual ranges (minimum and maximum m/z, spatial area as defined by x / y coordinates) or by keeping only m/z features or coordinates of pixels that are provided in a tab-separated values file. Unwanted m/z features such as pre-defined contaminant features can be removed within a preselected m/z tolerance.
MSI data exporter: The 'MSI data exporter' can export the spectra, intensity and m/z data of an imzML file together with their summarized properties into tab-separated values files.
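The m/z range and contaminant filters described for 'MSI filtering' can be illustrated with a short Python sketch. The function `filter_features` and the example values are hypothetical, and expressing the tolerance in ppm is an assumption; the text only states that a preselected m/z tolerance is used.

```python
# Hypothetical sketch: keep m/z features inside a range and drop features
# that lie within a ppm tolerance of a known contaminant.

def filter_features(mz_values, mz_min, mz_max, contaminants=(), tol_ppm=0.0):
    """Return the m/z features that pass the range and contaminant filters."""
    kept = []
    for mz in mz_values:
        if not (mz_min <= mz <= mz_max):
            continue  # outside the manual m/z range
        is_contaminant = any(
            abs(mz - c) / c * 1e6 <= tol_ppm for c in contaminants
        )
        if not is_contaminant:
            kept.append(mz)
    return kept


features = [799.5, 1257.42, 1257.45, 1663.58, 3200.0]
kept = filter_features(
    features, mz_min=800.0, mz_max=3000.0,
    contaminants=[1257.42], tol_ppm=50,
)
```

In this example the two features near m/z 1257.42 are treated as contaminant signal, and the features at 799.5 and 3200.0 fall outside the requested range.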

Region of interest annotation:
For supervised analysis, spatial regions of interest (ROI) can be defined. However, annotation of these ROIs is infeasible on the MSI images. Therefore, the ROIs are annotated on a photograph or histological image of the sample. We extended and developed six new
A multitude of statistical analysis options for tab-separated values files is already available in Galaxy; the most MSI-relevant tools are from the Workflow4metabolomics project and consist of unsupervised and supervised statistical analysis tools [44]. For the specific purposes of spatially resolved MSI data analysis, we have integrated Cardinal's powerful spatially aware statistical analysis options into the Galaxy framework.
MSI segmentation: The 'MSI segmentation' tool enables spatially aware unsupervised statistical analysis with principal component analysis, spatially aware k-means clustering and spatial shrunken centroids [45,46].
MSI classification: The 'MSI classification' tool offers three options for spatially aware supervised statistical analysis: partial least squares (discriminant analysis), orthogonal partial least squares (discriminant analysis), and spatial shrunken centroids [47].
Analyte identification: m/z determination on its own often remains insufficient to identify analytes. Compound fragmentation and tandem mass spectrometry are typically employed for compound identification by mass spectrometry. In MSI, the required local confinement of the mass spectrometry analysis severely limits the compound amounts that are available for fragmentation. Hence, direct on-target fragmentation is rarely employed in MSI. A common practice for compound identification includes a combinatorial approach in which LC-MS/MS data is used to identify the analytes while MSI analyses their spatial distribution. This approach requires assigning putative analyte information to m/z values within a given accuracy range.
Join two files on a column allowing a small difference: This newly developed tool matches numeric columns of two tab-separated values files on the smallest distance, which can be absolute or in ppm. This tool can be used to identify the m/z features of a tab-separated values file by matching them to already identified m/z features of another tab-separated values file (e.g. from a database or from an analysis workflow).
Community efforts such as Galaxy-M, Galaxy-P, Phenomenal, and Workflow4Metabolomics have led to a multitude of metabolomics and proteomics analysis tools available in Galaxy [34][35][36][37][38]. These tools allow analyzing additional tandem mass spectrometry data that is often acquired to aid identification of MSI m/z features. Databases to which the results can be matched, such as UniProt and LIPID MAPS, are directly available in Galaxy [48,49]. The highly interdisciplinary and modular data analysis options in Galaxy render it a very powerful platform for MSI data analyses that are part of a multi-omics study.
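The matching principle of such a join can be sketched as follows. This Python example is a simplified illustration and not the code of the Galaxy tool: the function `join_on_nearest`, the analyte names, and the choice to compute the ppm distance relative to the reference value are assumptions.

```python
# Hypothetical sketch: match each query m/z to its nearest reference m/z
# and keep the match only if the distance is within the tolerance.

def join_on_nearest(query_mz, reference, tolerance, unit="ppm"):
    """Return (query_mz, reference_mz, name) for every accepted match.

    reference: list of (mz, name) tuples, e.g. identified analytes.
    unit: "ppm" for a relative distance, anything else for an absolute one.
    """
    matches = []
    for mz in query_mz:
        ref_mz, name = min(reference, key=lambda entry: abs(entry[0] - mz))
        distance = abs(ref_mz - mz)
        if unit == "ppm":
            distance = distance / ref_mz * 1e6
        if distance <= tolerance:
            matches.append((mz, ref_mz, name))
    return matches


reference = [(1257.42, "analyte_A"), (1663.58, "analyte_B")]  # made-up names
query = [1257.45, 1500.00, 1663.50]
matches = join_on_nearest(query, reference, tolerance=50, unit="ppm")
```

The query value 1500.00 has no reference within 50 ppm and is therefore left unmatched, whereas the other two values are assigned to their nearest reference entries.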

Accessibility & training
All described tools are easily accessible and usable via the European Galaxy server [29].
Furthermore, all tools are deposited in the Galaxy Toolshed, from where they can be easily installed into any other Galaxy instance [50]. We have developed bioconda packages and biocontainers that allow for version control and automated installation of all tool dependencies; those packages are also useful outside Galaxy to enhance reproducibility [31,39]. For researchers who do not want to use publicly available Galaxy servers, we provide a pre-built Docker image that is easy to install independent of the operating system.
For a swift introduction to the analysis of MSI data in Galaxy, we have developed training material for metabolomics and proteomics use cases and deposited it in the central repository of the Galaxy Training Network [51,52]. The training materials consist of a comprehensive collection of small example datasets, step-by-step explanations, and workflows that enable any interested researcher to follow the training and understand it through active participation.
The first training explains data upload in Galaxy and describes the quality control of a mouse kidney tissue section in which peptides were imaged with an old MALDI-TOF [53]. The dataset contains peptide calibrants that allow the control of the digestion efficiency and m/z accuracy. Export of MSI data into tab-separated values files and further filtering of those files is explained as well.
The second training explains the examination of the spatial distribution of volatile organic compounds in a chili section. The training roughly follows the corresponding publication and explains how average mass spectra are plotted and only the relevant m/z range is kept, as well as how to automatically generate many m/z distribution maps and overlay several m/z feature maps [19].
The third training determines and identifies N-linked glycans in mouse kidney tissue sections with MALDI-TOF and additional LC-MS/MS data analysis [54,55]. The training covers combining datasets, preprocessing as well as unsupervised and supervised statistical analysis to find potential N-linked glycans that have different abundances in the PNGase F treated kidney section compared to the kidney section that was treated with buffer only. The training further covers identification of the potential N-linked glycans by matching their m/z values to a list of N-linked glycan m/z that were identified by LC-MS/MS. The full dataset is used as a case study in the following section.

Case study
To exemplify the utility of our MSI tools, we re-analyzed the N-glycan dataset that was recently made available by Gustafsson et al. via the PRIDE repository with accession PXD009808 [55,56]. The aim of the study was to demonstrate that their automated sample preparation method for MALDI imaging of N-linked glycans successfully works on formalin-fixed paraffin-embedded (FFPE) murine kidney tissue [54]. PNGase F was printed on two FFPE murine kidney sections to release N-linked glycans from proteins, while in a third section one part of the kidney was covered with N-glycan calibrants and another part with buffer to serve as a control. We downloaded all four imzML files (two treated kidneys, control, and calibrants) from PRIDE and uploaded them into Galaxy with the composite upload function. To obtain an overview of the files, we used the 'MSI Qualitycontrol' tool. We resampled the m/z axis, combined all files, and ran the 'MSI Qualitycontrol' tool again to directly compare the four subfiles. Next, we performed TIC normalization, smoothing, and baseline removal. Spectra were aligned to the stable peaks that are present in at least 80 % of all spectra [57]. Spectra in which fewer than two stable peaks could be aligned were removed. This mainly affected spectra from the control file. Peak picking, detection of monoisotopic peaks, and binning were performed on the average spectra of each subfile. The obtained m/z features were extracted with Cardinal's 'peaks' algorithm from the normalized, smoothed, baseline-removed, and aligned file. Next, principal component analysis with four components was performed (Figure 2). To find potential N-linked glycans, the two treated tissues were compared to the control tissue with the supervised spatial shrunken centroids algorithm. Spatial shrunken centroids is a multivariate classification method that was specifically developed to account for the spatial structure of the data (Figure 3a) [45].
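Two of the preprocessing ideas in this workflow, TIC normalization and the selection of stable peaks present in at least 80 % of all spectra, can be illustrated with a simplified Python sketch. It is not the Cardinal or MALDIquant implementation used in the Galaxy tools: the function names are hypothetical, TIC normalization is shown as scaling each spectrum to a sum of one (one common convention), and peaks are assumed to be already binned to identical m/z values, so no tolerance-based alignment is performed.

```python
# Hypothetical sketch of TIC normalization and stable-peak selection.

def tic_normalize(intensities):
    """Scale a spectrum so that its intensities sum to one."""
    tic = sum(intensities)
    if tic == 0:
        return list(intensities)  # leave empty spectra unchanged
    return [value / tic for value in intensities]


def stable_peaks(peak_lists, min_frequency=0.8):
    """Return the m/z values found in at least min_frequency of all spectra.

    peak_lists: one collection of (already binned) peak m/z values per spectrum.
    """
    counts = {}
    for peaks in peak_lists:
        for mz in set(peaks):
            counts[mz] = counts.get(mz, 0) + 1
    n_spectra = len(peak_lists)
    return sorted(mz for mz, n in counts.items() if n / n_spectra >= min_frequency)


normalized = tic_normalize([2.0, 6.0, 2.0])

peak_lists = [
    {1000, 1500},
    {1000, 1500},
    {1000},
    {1000, 1500, 2000},
    {1000, 1500},
]
stable = stable_peaks(peak_lists, min_frequency=0.8)

# Keep only spectra that contain at least two of the stable peaks.
kept_indices = [
    index for index, peaks in enumerate(peak_lists)
    if len(set(peaks) & set(stable)) >= 2
]
```

In this toy example the third spectrum contains only one stable peak and would be removed, mirroring the removal of spectra in which fewer than two stable peaks could be aligned.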
The supervised analysis provided us with 28 m/z features that discriminated between the two PNGase F treated kidneys and the control kidney with a spatial shrunken centroids p-value <
In Gustafsson's own terms from a recent publication, our results show that their results are reproducible, because we, as another group, have followed their data analysis procedure as closely as possible and arrived at similar results [16]. The reproducibility of the results shows the capacity of our pipeline. To enable what Gustafsson has described as "methods reproducibility", we provide the complete analysis history and the corresponding workflow.
With this in hand, any other researcher can use the same tools and parameters in Galaxy to obtain the same result as we did. Publishing histories and workflows from Galaxy requires only a few clicks and provides more information than requested by the minimum reporting guidelines MSI MIAPE (Minimum Information About a Proteomics Experiment) and MIAMSIE (Minimum Information About a Mass Spectrometry Imaging Experiment) [6,16]. The Galaxy software itself, as well as the shared histories and workflows, fulfils the FAIR principles, which stand for findability, accessibility, interoperability, and reusability [27].

Summary
With the integration of the MSI data analysis toolset, we have incorporated an accessible and reproducible data analysis platform for MSI data in the Galaxy framework. Our MSI tools complement the multitude of already available Galaxy tools for proteomics and metabolomics that are maintained by Galaxy-M, Galaxy-P, Phenomenal, and Workflow4Metabolomics [34][35][36][37][38].
We are in close contact with those communities and would like to encourage developers of the MSI community to join forces and make their tools available in the Galaxy framework. We


Findings: Background:
Mass spectrometry imaging (MSI) is increasingly used for a broad range of biological and clinical applications as it allows the simultaneous measurement of hundreds of analytes and their spatial distribution. The versatility of MSI is based on its ability to measure many different kinds of molecules such as peptides, metabolites or chemical compounds in a large variety of samples such as cells, tissues, fingerprints or human made materials [1][2][3][4][5].
Depending on the sample, the analyte of interest, and the application, different mass spectrometers are used [6]. Due to the variety of samples, analytes, and mass spectrometers, MSI is suitable for highly diverse use cases ranging from plant research to (pre-)clinical and pharmacologic studies and forensic investigations [2,[7][8][9]. On the other hand, the variety of research fields hinders harmonization and standardization of MSI protocols. Recently, efforts have begun to develop optimized sample preparation protocols and show their reproducibility in multicenter studies [10][11][12][13]. In contrast, efforts to make data analysis standardized and reproducible are in their infancy.
Reproducibility of MSI data analyses is hindered by the common use of software with restricted access, such as proprietary software, license-requiring software, or unpublished in-house scripts [14]. Open source software has the potential to advance accessibility and reproducibility in data analysis but requires complete reporting of software versions and parameters, which is not yet routine in MSI [15][16][17].  [18]. Yet, many of these tools necessitate steep learning curves, in some cases even requiring programming knowledge to make use of their full range of functions [19][20][21][22][23].
To overcome problems with accessibility of software and computing resources, standardization, and reproducibility, we developed MSI data analysis tools for the Galaxy framework that are based on the open source software suites Cardinal, MALDIquant, and scikit-image [20,21,24]. Galaxy is an open source computational platform for biomedical research that was developed to support researchers without programming skills in the analysis of large data sets, e.g. in the field of next generation sequencing. Galaxy is used by hundreds of thousands of researchers and provides thousands of different tools for many different scientific fields [25].

Aims:
With the present publication, we first aim to raise awareness within the MSI community of the advantages offered by the Galaxy framework with regard to standardized and reproducible data analysis pipelines. Secondly, we present newly developed Galaxy tools and offer them to the MSI community through the graphical front-end and "drag-and-drop" workflows of the Galaxy framework. Thirdly, we apply the MSI Galaxy tools to a publicly available dataset to study N-glycan identity and distribution in murine kidney specimens in order to demonstrate usage of a Galaxy-based MSI analysis pipeline that facilitates standardization and reproducibility and is compatible with the principles of FAIR (findable, accessible, interoperable, and re-usable) data and MIAPE (minimum information about a proteomics experiment) [26,27].


Figure 1: Typical MSI data analysis steps and associated Galaxy tools.
Typical MSI data analysis step include Quality control, file handling, preprocessing, ROI annotation, supervised and unsupervised statistical analysis, visualizations and identification of features. Due to the variety of MSI applications, tools of all or only a few of these categories are used and the order of usage is highly flexible. To serve a broad range of data analysis tasks, we provide 18 tools that cover all common data analysis procedures and can be arbitrarily connected to allow customized analysis.

Data formats and data handling:
We extended the Galaxy framework to support open and standardized MSI data files such as imzML, which is the default input format for the Galaxy MSI tools. Nowadays, the major mass spectrometer vendors directly support the imzML standard and several tools exist to convert different file formats to imzML [40]. Data can be easily uploaded to Galaxy via a web browser or via a built-in file transfer protocol (FTP) functionality. Intermediate result files can be further processed in the interactive environment that supports R Studio and Jupyter or downloaded for additional analysis outside of Galaxy [41].
To facilitate the parallel analysis of multiple files, the Galaxy framework offers so-called "file collections". Numerous files can be represented in a file collection allowing simultaneous analysis of all files while the effort for the user is the same as for a single file. MSI meta data such as spectra annotations, calibrant m/z, and statistical results are stored as tab-separated values files, thus enabling processing by a plethora of tools both inside and outside the Galaxy framework. All graphical results of the MSI tools are stored as concise vector graphic PDF reports with publication-quality images.
Quality control and visualization: MSI Quality control: Quality control is an essential step in data analysis and should not only be used to judge the quality of the raw data but also to control processing steps such as smoothing, peak picking, and intensity normalization. Therefore, we have developed the 'MSI Qualitycontrol' tool that automatically generates a comprehensive pdf report with more than 30 different plots that enable a global view of all aspects of the MSI data including intensity distribution, m/z accuracy and segmentation maps. For example, spectra with bad quality, such as low total ion current or low number of peaks can be directly spotted in the quality report and subsequently be removed by applying the 'MSI data exporter' and 'MSI filtering' tools. A large variety of tools that allows for filtering, sorting, and manipulating of tab-separated values files is already available in Galaxy and can be integrated into the MSI data analysis. Some dedicated tools for imzML file handling were newly integrated into the Galaxy framework.
MSI combine: The 'MSI combine' tool allows combining several imzML files into a merged dataset. This is especially important to enable direct visual but also statistical comparison of MSI data that derived from multiple files. With the 'MSI combine tool', individual MSI datasets are either placed next to each other in a coordinate system or can be shifted in x or y direction in a user defined way. The output of the tool contains a single file with the combined MSI data and an additional tab-separated values file with spectra annotations, i.e. each spectrum is annotated with its original file name (before combination) and, if applicable, with previously defined annotations such as diagnosis, disease type, and other clinical parameters.
MSI filtering: The 'MSI filtering' tool provides options to filter m/z features and pixels (spectra) of interest, either by applying manual ranges (minimum and maximum m/z, spatial area as defined by x/y coordinates) or by keeping only the m/z features or pixel coordinates that are provided in a tab-separated values file. Unwanted m/z features such as pre-defined contaminant features can be removed within a preselected m/z tolerance.
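A minimal Python sketch of such an m/z filter is shown below. It assumes that the tolerance is given in ppm relative to the contaminant m/z; the function name, parameters, and example values are hypothetical and not taken from the tool itself.

```python
def filter_mz(features, mz_min, mz_max, contaminants=(), tol_ppm=0.0):
    """Keep m/z features inside [mz_min, mz_max] that are not contaminants.

    A feature counts as a contaminant if it lies within tol_ppm of any
    listed contaminant m/z (assumption: ppm relative to the contaminant).
    """
    kept = []
    for mz in features:
        if not (mz_min <= mz <= mz_max):
            continue
        if any(abs(mz - c) / c * 1e6 <= tol_ppm for c in contaminants):
            continue
        kept.append(mz)
    return kept


# Made-up example values.
features = [850.2, 1257.42, 1257.70, 2100.9, 3500.0]
kept = filter_mz(features, 1000.0, 3000.0, contaminants=[1257.4], tol_ppm=200.0)
print(kept)
```

In this example, the feature at 1257.42 lies about 16 ppm from the contaminant and is removed, whereas 1257.70 lies about 239 ppm away and is kept.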
MSI data exporter: The 'MSI data exporter' can export the spectra, intensity and m/z data of an imzML file together with their summarized properties into tab-separated values files.

Region of interest annotation:
For supervised analysis, spatial regions of interest (ROI) can be defined. However, annotation of these ROIs is infeasible on the MSI images. Therefore, the ROIs are annotated on a photograph or histological image of the sample. We extended and developed six new
A multitude of statistical analysis options for tab-separated values files is already available in Galaxy; the most MSI-relevant tools are from the Workflow4Metabolomics project and consist of unsupervised and supervised statistical analysis tools [44]. For the specific purposes of spatially resolved MSI data analysis, we have integrated Cardinal's powerful spatially aware statistical analysis options into the Galaxy framework.
MSI segmentation: The 'MSI segmentation' tool enables spatially aware unsupervised statistical analysis with principal component analysis, spatially aware k-means clustering, and spatial shrunken centroids [45,46].
MSI classification: The 'MSI classification' tool offers three options for spatially aware supervised statistical analysis: partial least squares (discriminant analysis), orthogonal partial least squares (discriminant analysis), and spatial shrunken centroids [47].
Analyte identification: m/z determination on its own often remains insufficient to identify analytes. Compound fragmentation and tandem mass spectrometry are typically employed for compound identification by mass spectrometry. In MSI, the required local confinement of the mass spectrometry analysis severely limits the compound amounts that are available for fragmentation. Hence, direct on-target fragmentation is rarely employed in MSI. A common practice for compound identification includes a combinatorial approach in which LC-MS/MS data is used to identify the analytes while MSI analyses their spatial distribution. This approach requires assigning putative analyte information to m/z values within a given accuracy range.
Join two files on a column allowing a small difference: This newly developed tool matches numeric columns of two tab-separated values files on the smallest distance, which can be absolute or in ppm. This tool can be used to identify the m/z features of a tab-separated values file by matching them to already identified m/z features of another tab-separated values file (e.g., from a database or from an analysis workflow).
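The matching principle can be sketched in Python as follows. This is a simplified illustration, not the code of the Galaxy tool: the function name is hypothetical, the ppm distance is assumed to be computed relative to the reference value, only the single nearest match per feature is reported, and the m/z values are made up.

```python
def join_on_nearest(left, right, max_distance, unit="ppm"):
    """Match each value in `left` to its nearest value in `right`.

    Returns (left_value, right_value, distance) tuples for all matches whose
    distance is at most max_distance. With unit="ppm" the distance is
    expressed relative to the reference value in `right` (an assumption).
    """
    matches = []
    for a in left:
        best = None
        for b in right:
            d = abs(a - b)
            if unit == "ppm":
                d = d / b * 1e6
            if d <= max_distance and (best is None or d < best[2]):
                best = (a, b, d)
        if best is not None:
            matches.append(best)
    return matches


# Made-up m/z values: MSI features and reference features (e.g. from LC-MS/MS).
msi_mz = [1257.45, 1663.60, 1900.00]
reference_mz = [1257.42, 1663.58, 2012.70]

matches = join_on_nearest(msi_mz, reference_mz, max_distance=300.0, unit="ppm")
for msi, ref, ppm in matches:
    print(f"{msi:.2f}\t{ref:.2f}\t{ppm:.1f} ppm")
```

The third MSI feature has no reference value within 300 ppm and therefore remains unmatched.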
Community efforts such as Galaxy-M, Galaxy-P, PhenoMeNal, and Workflow4Metabolomics have led to a multitude of metabolomics and proteomics analysis tools available in Galaxy [34][35][36][37][38]. These tools allow analyzing additional tandem mass spectrometry data that are often acquired to aid identification of MSI m/z features. Databases to which the results can be matched, such as UniProt and LIPID MAPS, are directly available in Galaxy [48,49]. The highly interdisciplinary and modular data analysis options in Galaxy render it a very powerful platform for MSI data analyses that are part of a multi-omics study.

Accessibility & training
All described tools are easily accessible and usable via the European Galaxy server [29].
Furthermore, all tools are deposited in the Galaxy Tool Shed, from where they can be easily installed into any other Galaxy instance [50]. We have developed Bioconda packages and BioContainers that allow for version control and automated installation of all tool dependencies; those packages are also useful outside Galaxy to enhance reproducibility [31,39]. For researchers who do not want to use publicly available Galaxy servers, we provide a pre-built Docker image that is easy to install independent of the operating system.
For a swift introduction to the analysis of MSI data in Galaxy, we have developed training material for metabolomics and proteomics use cases and deposited it in the central repository of the Galaxy Training Network [51,52]. The training materials consist of a comprehensive collection of small example datasets, step-by-step explanations, and workflows that enable any interested researcher to follow the training and understand it through active participation.
The first training explains data upload in Galaxy and describes the quality control of a mouse kidney tissue section in which peptides were imaged with an old MALDI-TOF [53]. The dataset contains peptide calibrants that allow the control of the digestion efficiency and m/z accuracy. Export of MSI data into tab-separated values files and further filtering of those files is explained as well.
The second training explains the examination of the spatial distribution of volatile organic compounds in a chili section. The training roughly follows the corresponding publication and explains how average mass spectra are plotted and only the relevant m/z range is kept, as well as how to automatically generate many m/z distribution maps and overlay several m/z feature maps [19].
The third training determines and identifies N-linked glycans in mouse kidney tissue sections with MALDI-TOF and additional LC-MS/MS data analysis [54,55]. The training covers combining datasets, preprocessing as well as unsupervised and supervised statistical analysis to find potential N-linked glycans that have different abundances in the PNGase F treated kidney section compared to the kidney section that was treated with buffer only. The training further covers identification of the potential N-linked glycans by matching their m/z values to a list of N-linked glycan m/z that were identified by LC-MS/MS. The full dataset is used as a case study in the following section.

Case study
To exemplify the utility of our MSI tools, we re-analyzed the N-glycan dataset that was recently made available by Gustafsson et al. via the PRIDE repository with accession PXD009808 [55,56]. The aim of the study was to demonstrate that their automated sample preparation method for MALDI imaging of N-linked glycans works on formalin-fixed paraffin-embedded (FFPE) murine kidney tissue [54]. PNGase F was printed on two FFPE murine kidney sections to release N-linked glycans from proteins, while in a third section one part of the kidney was covered with N-glycan calibrants and another part with buffer to serve as a control. We downloaded all four imzML files (two treated kidneys, control, and calibrants) from PRIDE and uploaded them into Galaxy with the composite upload function. To obtain an overview of the files, we used the 'MSI Qualitycontrol' tool. We resampled the m/z axis, combined all files, and ran the 'MSI Qualitycontrol' tool again to directly compare the four subfiles. Next, we performed TIC normalization, smoothing, and baseline removal. Spectra were aligned to the stable peaks that are present in at least 80% of all spectra [57]. Spectra in which fewer than two stable peaks could be aligned were removed. This mainly affected spectra from the control file. Peak picking, detection of monoisotopic peaks, and binning were performed on the average spectra of each subfile. The obtained m/z features were extracted with Cardinal's 'peaks' algorithm from the normalized, smoothed, baseline-removed, and aligned file. Next, principal component analysis with four components was performed (Figure 2). To find potential N-linked glycans, the two treated tissues were compared to the control tissue with the supervised spatial shrunken centroids algorithm. Spatial shrunken centroids is a multivariate classification method that was specifically developed to account for the spatial structure of the data (Figure 3a) [45].
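Two of the preprocessing steps mentioned above, TIC normalization and the selection of stable peaks, can be illustrated with a short Python sketch. It is a conceptual simplification and not the code used by the Galaxy tools or Cardinal: here each spectrum is scaled so that its intensities sum to one (the actual tools may use a different scaling convention), peaks are assumed to be already binned to common m/z values, and all function names and example values are hypothetical.

```python
def tic_normalize(spectrum):
    """Divide every intensity by the total ion current of the spectrum."""
    total = sum(spectrum)
    if total == 0:
        return list(spectrum)  # leave empty spectra unchanged
    return [intensity / total for intensity in spectrum]


def stable_peaks(peak_lists, min_fraction=0.8):
    """Return peaks that occur in at least min_fraction of all spectra.

    peak_lists: one collection of (already binned) peak m/z values per spectrum.
    """
    counts = {}
    for peaks in peak_lists:
        for mz in set(peaks):
            counts[mz] = counts.get(mz, 0) + 1
    n_spectra = len(peak_lists)
    return sorted(mz for mz, c in counts.items() if c / n_spectra >= min_fraction)


# Made-up example: five spectra with binned peak m/z values.
peak_lists = [
    [1000, 1200, 1500],
    [1000, 1200],
    [1000, 1200, 1500],
    [1000, 1200],
    [1000],
]

print(tic_normalize([2.0, 6.0, 2.0]))
print(stable_peaks(peak_lists, min_fraction=0.8))
```

In this example, the peaks at 1000 and 1200 occur in at least 80% of the spectra and would serve as alignment references, whereas the peak at 1500 would not.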
The supervised analysis provided us with 28 m/z features that discriminated between the two PNGase F treated kidneys and the control kidney with a spatial shrunken centroids p-value <
In Gustafsson's own terms from a recent publication, our results show that their results are reproducible because we, as another group, followed their data analysis procedure as closely as possible and arrived at similar results [16]. The reproducibility of the results shows the capacity of our pipeline. To enable what Gustafsson has described as "methods reproducibility", we provide the complete analysis history and the corresponding workflow.
With this in hand, any other researcher can use the same tools and parameters in Galaxy to obtain the same result as we did. We could identify 16 N-linked glycans by matching the m/z features of the MSI data (column 1) to the identified m/z features of the LC-MS/MS experiment (column 5). We allowed a maximum tolerance of 300 ppm and multiple matches. Only single matches occurred with an average m/z error of 46 ppm (column 6).
Publishing histories and workflows from Galaxy requires only a few clicks and provides more information than requested by the minimum reporting guidelines MSI MIAPE (Minimum Information About a Proteomics Experiment) and MIAMSIE (Minimum Information About a Mass Spectrometry Imaging Experiment) [6,16]. The Galaxy software itself but also the shared histories and workflows fulfil the FAIR principles that stand for findability, accessibility, interoperability, and reusability [27].

Summary
With the integration of the MSI data analysis toolset, we have incorporated an accessible and reproducible data analysis platform for MSI data into the Galaxy framework. Our MSI tools complement the multitude of already available Galaxy tools for proteomics and metabolomics that are maintained by Galaxy-M, Galaxy-P, PhenoMeNal, and Workflow4Metabolomics [34][35][36][37][38].
We are in close contact with those communities and would like to encourage developers of the MSI community to join forces and make their tools available in the Galaxy framework. We