Delineating regions of interest for mass spectrometry imaging by multimodally corroborated spatial segmentation

Abstract Mass spectrometry imaging (MSI), which localizes molecules in a tag-free, spatially resolved manner, is a powerful tool for understanding the biochemical mechanisms underlying biological phenomena. When analyzing MSI data, it is essential to delineate regions of interest (ROIs) that correspond to tissue areas with different anatomical or pathological labels. Spatial segmentation, obtained by clustering MSI pixels according to their mass spectral similarities, is a popular approach to automating ROI definition. However, how to select the number of clusters (#Clusters), which determines the granularity of segmentation, remains unresolved, and an inappropriate #Clusters may lead to ROIs that are not biologically real. Here we report a multimodal fusion strategy that enables an objective and trustworthy selection of #Clusters by utilizing additional information from corresponding histology images. A deep learning-based algorithm is proposed to extract “histomorphological feature spectra” across an entire hematoxylin and eosin (H&E) image. Clustering is then similarly performed to produce a histology segmentation. Since ROIs originating from instrumental noise or artifacts would not be reproduced cross-modally, the consistency between histology and MSI segmentation becomes an effective measure of the biological validity of the results. Accordingly, the #Clusters that maximizes this consistency is deemed the most probable. We validated our strategy on mouse kidney and renal tumor specimens by producing multimodally corroborated ROIs that agreed excellently with ground truths. Downstream analysis based on these ROIs revealed lipid molecules highly specific to tissue anatomy or pathology. Our work will greatly facilitate MSI-mediated spatial lipidomics, metabolomics, and proteomics research by providing intelligent software to automatically and reliably generate ROIs.

An accurate definition of ROIs allows the extraction of tissue-type-specific molecular abundances, which are essential for statistically discovering molecular alterations between different ROIs of a tissue section, or between different specimens at the same ROI (for example, differentially expressed molecules at a specific anatomical structure between diseased versus healthy samples).

Figure 1. Overview of tissue segmentation based on (a) MSI data and (b) an H&E-stained histology image. In (a), mass spectra acquired by MSI can be formatted as a hyperspectral data cube. In (b), a high-resolution histology image was divided into an array of small tiles of size 200 × 200 pixels. A set of quantitative histomorphological features (HF) was then computed from each tile by a DCNN-based feature extractor. In this way, another hyperspectral data cube similar to the MSI data was generated, with the only difference being that the depth corresponded to HF rather than m/z. Clustering analysis in the spectral domain resulted in segmentation in the spatial domain. Segmentation/ROI validation was achieved by comparing the MSI- and histology-based results.
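The validation step in Figure 1, comparing MSI- and histology-based segmentations, can be illustrated with a minimal Python sketch. It assumes a simple consistency measure: the fraction of pixels whose cluster labels agree after the best one-to-one relabeling, since cluster numbering is arbitrary in each modality. The measure actually used in this work may differ. The function names `agreement` and `select_num_clusters` and the toy label maps are hypothetical, and the two label maps are assumed to be already co-registered on the same pixel grid.

```python
from itertools import permutations


def agreement(msi_labels, hist_labels, k):
    """Fraction of pixels labeled consistently by both modalities.

    Labels are integers in range(k); None marks background pixels.
    Cluster numbering is arbitrary, so every one-to-one relabeling of the
    histology clusters is tried and the best match is kept (fine for small k).
    """
    # counts[a][b]: number of pixels with MSI label a and histology label b
    counts = [[0] * k for _ in range(k)]
    total = 0
    for msi_row, hist_row in zip(msi_labels, hist_labels):
        for a, b in zip(msi_row, hist_row):
            if a is None or b is None:
                continue  # skip background pixels
            counts[a][b] += 1
            total += 1
    if total == 0:
        return 0.0
    best = max(
        sum(counts[a][perm[a]] for a in range(k))
        for perm in permutations(range(k))
    )
    return best / total


def select_num_clusters(segmentations):
    """Pick the k whose two segmentations agree best.

    `segmentations` maps k -> (msi_labels, hist_labels).
    """
    scores = {k: agreement(m, h, k) for k, (m, h) in segmentations.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy 2 x 4 label maps (hypothetical data, not from the study).
toy = {
    2: ([[0, 0, 1, 1], [0, 0, 1, 1]],
        [[1, 1, 0, 0], [1, 1, 0, 0]]),  # same partition, labels swapped
    3: ([[0, 0, 1, 2], [0, 0, 1, 2]],
        [[1, 1, 0, 0], [1, 2, 0, 2]]),  # partitions differ at two pixels
}
best_k, scores = select_num_clusters(toy)
print(best_k, scores)
```

Raw pixel agreement tends to favor small #Clusters (a single cluster agrees trivially), so a chance-corrected or otherwise normalized measure would likely be preferable in practice.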

There are two commonly used methods for defining ROIs. The first is manual annotation guided by the staining microscopy of a tissue specimen section (or its consecutive slice) [9]. Manual annotation requires solid expert knowledge of histology and is prone to human bias. It can also be rather time-consuming when the sample size and/or the quantity of ROIs is large. Spatial segmentation [10] is the alternative approach for ROI definition, which is data-driven in nature and substantially automated. For a set of mass spectral profiles collected at different positions (i.e. pixels), a clustering algorithm separates them into several groups (i.e. clusters) in such a way that spectra in the same group are more similar to each other than to those in other groups. Pixels of identical tissue types are expected to be similar in their chemical content and thus in their mass spectral profiles. So, by labeling each pixel with the color assigned to its cluster, we can obtain spatial segmentation along the image domain as shown in Figure 1 [...]

In this article, we report a multimodal fusion strategy between histology microscopy and MSI to enable an objective and trustworthy selection of #Clusters. First, an in-house algorithm was proposed to generate histology-segmentation from an H&E histol- [...]

[...] for any pixel whose MSI- and HF-based cluster labels were inconsistent, its transparency was set to 20%. The holes in the middle right portion of the tissue section were the renal vein and artery. By marking those pixels that were labeled differently by different imaging modalities, we could visualize the confidence levels across the tissue segmentation: solid color indicated that the two modalities had consensus, so we could be more confident about the labeling, whereas transparent pixels indicated that the two modalities had dissensus, so we had to be cautious about the labeling. In Figure 3, integrated segmentation maps are shown for #Clusters equal to (d) 4, (e) 2, (f) 3, and (g) 5. In (b), pixels whose cluster label = 1 were assigned red, label = 2 blue, label = 3 orange, and label = 4 green. In (g), magenta was added to represent cluster label = 5. In line with Figure. Figure S10).

[...] following standard data preprocessing protocols, including total-ion-count normalization, spectral smoothing, baseline reduction, peak picking, peak alignment, peak binning, and peak filtering.
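As a concrete illustration of the first preprocessing step listed above, total-ion-count (TIC) normalization, a minimal Python sketch follows. It assumes the common convention of dividing each pixel's spectrum by the sum of its intensities; the exact protocol and software used in this work are not stated here, and the function name `tic_normalize` and the example spectra are hypothetical.

```python
def tic_normalize(spectra):
    """Scale each pixel's spectrum so that its intensities sum to 1.

    `spectra` is a list of per-pixel intensity lists (one value per m/z bin).
    Pixels with zero total signal are returned as all zeros (an assumption;
    a real pipeline might instead flag them as background).
    """
    normalized = []
    for spectrum in spectra:
        tic = sum(spectrum)  # total ion count of this pixel
        if tic > 0:
            normalized.append([x / tic for x in spectrum])
        else:
            normalized.append([0.0] * len(spectrum))
    return normalized


# Two pixels with the same relative composition but different overall
# signal, plus one empty pixel (hypothetical values).
example = [[2.0, 6.0, 2.0], [1.0, 3.0, 1.0], [0.0, 0.0, 0.0]]
result = tic_normalize(example)
print(result)
```

After normalization the first two pixels become identical, which is the point of the step: clustering then compares relative molecular composition rather than overall signal intensity.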

Pixels were classified as "foreground" or "background" (i.e. located [...]

Registration was a prerequisite to fusing multimodal imaging data sets, where the MSI and HF data cubes were spatially aligned. We [...] into two sets of score maps, (2) we manually selected two score maps whose spatial expression best matched each other (see Figure S3) and used them as the inputs of the following intensity-based registration algorithms. DESI-MSI preferred a section thickness of >20 µm to produce intense ion signals, which was above the recommended thickness for H&E staining microscopy (<10 µm).

So, we had to use serial sections, one for MSI and one for H&E. This [...]

This study was financially supported by National Natural Sci- [...]

I am writing to submit our manuscript entitled "Delineating Regions-of-interest for Mass Spectrometry Imaging by Multimodally Corroborated Spatial Segmentation" for consideration as a research article in GigaScience.
Mass Spectrometry Imaging (MSI) is a tag-free molecular imaging technique for biological samples. It is capable of simultaneously localizing tens to hundreds of biomolecules in a single experiment by acquiring a full mass spectrum at each pixel of a virtual grid. In MSI data analysis, spatial segmentation, obtained by clustering MSI pixels according to their mass spectral similarities, is a popular approach to delineating Regions-of-Interest (ROIs) that correspond to tissue areas with different anatomical or pathological labels. However, there is currently no good strategy for selecting the number of clusters (#Clusters), which is the most important parameter for clustering algorithms and determines the granularity of the segmentation. A wrong #Clusters may lead to ROIs that are not biologically real. Here, we propose a novel multimodal fusion strategy that uses information from the segmentation of a histopathology image to inform the segmentation of an MSI image. Specifically, (i) a Deep Learning-based method is proposed to produce histology-segmentation from H&E images; and (ii) the most probable #Clusters is determined by using the consistency between histology- and MSI-segmentations as a quantitative measure of biological validity. Our strategy is objective and rigorous, and it ensures that the ROIs it produces are biologically relevant and supported by both MSI and histology.
Given that MSI is being widely used in various areas of life sciences and that segmentation-based ROI delineation is an important step of MSI data analysis, we believe that the method presented in our paper will appeal to a broad range of readers who are current or potential MSI users. In addition, more general readers, who are interested in interdisciplinary research that applies Artificial Intelligence in life sciences, may also find our work valuable.
We posted a subset of our findings on the preprint server bioRxiv in 2020 (doi: https://doi.org/10.1101/2020.07.17.208025). Part of the MSI data has been used in another article ("Multimodal Coregistration and Fusion between Spatial Metabolomics and Biomedical Imaging", in submission) but for entirely different research purposes.
Each named author has approved the contents of this paper, agreed to GigaScience's submission policies, and substantially contributed to conducting the underlying research and drafting this manuscript. Additionally, to the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.

Sincerely,
