SPUTNIK: an R package for filtering of spatially related peaks in mass spectrometry imaging data

Abstract

Summary: SPUTNIK is an R package consisting of a series of tools to filter mass spectrometry imaging peaks characterized by a noisy or unlikely spatial distribution. SPUTNIK can produce mass spectrometry imaging datasets characterized by a smaller but more informative set of peaks, reduce the complexity of subsequent multivariate analysis and increase the interpretability of the statistical results.

Availability and implementation: SPUTNIK is freely available online from the CRAN repository and at https://github.com/paoloinglese/SPUTNIK. The package is distributed under the GNU General Public License version 3 and is accompanied by example files and data.

Supplementary information: Supplementary data are available at Bioinformatics online.


Pixel count based filter
The pixel count based filter selects peaks whose signal pixels are connected, forming groups larger than a given threshold. The threshold value is related to the physical size of the smallest expected subregion of interest.
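As a worked illustration of how the threshold relates to physical size, the following minimal sketch converts the area of the smallest expected subregion into a pixel count. The function name, the units and the numbers are hypothetical and are not part of the package; the sketch only assumes square pixels.

```python
import math


def min_num_pixels(region_area_um2, pixel_size_um):
    """Number of whole pixels covered by a region of the given area.

    Assumes square pixels of side `pixel_size_um` (hypothetical helper,
    not a SPUTNIK function).
    """
    pixel_area_um2 = pixel_size_um ** 2
    return math.floor(region_area_um2 / pixel_area_um2)


# A 300 um x 300 um subregion imaged with 100 um pixels spans 9 pixels.
print(min_num_pixels(300 * 300, 100))
```

With these illustrative numbers, the smallest subregion of interest corresponds to a 3 x 3 block of pixels.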

Arguments:
1. "roiImage": binary image representing the region of interest (ROI). This can be calculated using one of the methods described in "Reference image calculation";
2. "minNumPixels": the minimum number of connected pixels required to select the peak;
3. "aggressive": level of "aggressiveness" (0, 1 or 2).

Algorithm:
1. for each peak image, binarized using Otsu's thresholding:
   a. if aggressive = 0:
      i. measure the largest size N of connected regions (number of pixels in each cluster);
      ii. if N is larger than minNumPixels:
         1. retain the peak;
   b. if aggressive = 1:
      i. measure the largest size N1 of connected regions within the ROI and the largest size N2 of connected regions outside the ROI, as defined by roiImage;
      ii. if N1 is greater than minNumPixels AND N1 is greater than or equal to N2:
         1. retain the peak;
   c. if aggressive = 2:
      i. measure the largest size N1 of connected regions within the ROI and the largest size N2 of connected regions outside the ROI, as defined by roiImage;
      ii. if N1 is greater than minNumPixels AND N2 is smaller than minNumPixels:
         1. retain the peak.
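The decision rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the package's R implementation. It makes several assumptions: images are lists of rows, connectivity is 4-neighbour, regions "within the ROI" are measured on ROI pixels only, level 1 requires N1 to be at least N2, and level 2 requires N2 to be smaller than minNumPixels. The function names are hypothetical.

```python
from collections import deque


def otsu_threshold(image):
    """Otsu's threshold: maximize the between-class variance over observed intensities."""
    values = [v for row in image for v in row]
    levels = sorted(set(values))
    best_t, best_var = levels[0], -1.0
    for t in levels[:-1]:  # the highest level would leave the upper class empty
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w0, w1 = len(low) / len(values), len(high) / len(values)
        between = w0 * w1 * (sum(low) / len(low) - sum(high) / len(high)) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def largest_component(binary, allowed):
    """Size of the largest 4-connected group of signal pixels lying where `allowed` is True."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not (binary[r][c] and allowed[r][c]):
                continue
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and binary[ny][nx] and allowed[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def keep_peak(image, roi, min_num_pixels=9, aggressive=0):
    """Return True if the peak image passes the pixel count rule (assumed reading)."""
    threshold = otsu_threshold(image)
    binary = [[v > threshold for v in row] for row in image]
    if aggressive == 0:
        everywhere = [[True] * len(row) for row in roi]
        return largest_component(binary, everywhere) > min_num_pixels
    n1 = largest_component(binary, roi)  # largest region within the ROI
    n2 = largest_component(binary, [[not m for m in row] for row in roi])  # outside
    if aggressive == 1:
        return n1 > min_num_pixels and n1 >= n2
    if aggressive == 2:
        return n1 > min_num_pixels and n2 < min_num_pixels
    raise ValueError("aggressive must be 0, 1 or 2")
```

For example, a peak image with a 4 x 4 block of signal inside the ROI passes at every level, while a checkerboard of isolated pixels is rejected already at level 0.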

Complete spatial randomness filter
The complete spatial randomness filter selects peaks whose signal distributions reject the null hypothesis of complete spatial randomness.
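To make the null hypothesis concrete, the sketch below computes the Clark-Evans index and a two-sided p-value for a set of signal pixel coordinates. It is an illustration only, not the package's implementation. It assumes the classical normal approximation without edge correction, treats each signal pixel as a point, and takes the window area as an input. The function name is hypothetical.

```python
import math


def clark_evans(points, area):
    """Clark-Evans index R and two-sided p-value (normal approximation, no edge correction).

    R close to 1 is compatible with complete spatial randomness; R < 1
    suggests clustering and R > 1 suggests a regular pattern.
    """
    n = len(points)
    # Observed mean nearest-neighbour distance.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    observed = total / n
    density = n / area
    # Expected mean distance and its standard error under randomness.
    expected = 1.0 / (2.0 * math.sqrt(density))
    std_err = 0.26136 / math.sqrt(n * density)
    z = (observed - expected) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return observed / expected, p_value


# A perfectly regular 6 x 6 grid is far from random, so the p-value is tiny.
grid = [(x, y) for x in range(6) for y in range(6)]
index, p = clark_evans(grid, area=36.0)
print(index, p < 0.05)
```

A peak whose signal pixels yield a small p-value rejects complete spatial randomness and would be a candidate for retention.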

Algorithm:
1. if method = "KS":
   a. use the reference image, calculated using the methods described in "Reference image calculation", as covariate density;
   b. for each peak:
      i. define a point pattern process (Baddeley and Turner, 2005) from the Otsu-binarized peak image;
      ii. apply the Kolmogorov-Smirnov test to calculate the p-value.
2. if method = "ClarkEvans":
   a. for each peak:
      i. define a point pattern process (Baddeley and Turner, 2005) from the Otsu-binarized peak image;
      ii. apply the Clark-Evans test to calculate the p-value.
3. Correct the p-values using a multiple testing correction method.
4. Peaks are selected by setting a threshold on the p-values.

Figure S1 - Effect of each single filter applied to the two example datasets provided with the package. The filters were applied with the default parameters. The results confirm that DESI-MSI peak images are less scattered than MALDI-MSI ones, since the reference similarity based filter (RSF) returns a smaller dataset than the pixel count based filter (CPF).

Reference similarity. Rationale: Most MSI samples consist of a background region, which contains solvent/matrix and contaminant signals, and a tissue region, which contains signals of biological nature. The provided set of filters exploits this assumption, removing all the ions whose spatial distribution is not confined within the provided reference mask, which represents the spatial distribution of the tissue.
Pixel count. Method: Tests for disconnected ion signal pixel patterns or connected pixel regions smaller than a user-defined threshold. Rationale: Noise signals are characterized by scattered random patterns. These patterns can be associated with detector noise, when the signal is not present in the source or when the source signal has low intensity (close to the detection limits). These issues make such signals unreliable for statistical analysis. The smallest allowed connected region is data-dependent (spatial resolution, prior knowledge of the expected granularity of the spatial signal patterns). By default, this threshold is set to 9 pixels.
Complete spatial randomness. Method: Tests for the randomness of the spatial distribution of the signal pixels. Given a reference, it tests whether the (even disconnected) pixel patterns covary with it. Rationale: Complete spatial randomness tests allow the identification of (even disconnected) pixel patterns that are statistically non-random or that are characterized by a spatial density that reflects an external reference heatmap. This set of filters can be used when scattered patterns can still represent sample related signals. In that case, these filters should be used instead of the pixel count based filters.
Table S1 - Scheme of the three main filter families provided with SPUTNIK. Each family measures different properties of the spatial distribution of the peaks (Method column) and addresses specific characteristics expected in the noise/uninformative signals (Rationale column).
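The last two steps of the complete spatial randomness algorithm (p-value correction and thresholding) can be sketched as follows. The text does not name a specific correction method, so the Benjamini-Hochberg procedure used here is an assumption, as is the 0.05 threshold. The function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_peaks(p_values, threshold=0.05):
    """Indices of peaks whose adjusted p-value rejects spatial randomness."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < threshold]


# One p-value per peak; only peaks that survive the correction are kept.
print(select_peaks([0.01, 0.04, 0.03, 0.5]))
```

In this example only the first peak remains below the threshold after correction, which shows why thresholding uncorrected p-values would retain more peaks than the corrected procedure.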