Quantifying and correcting slide-to-slide variation in multiplexed immunofluorescence images

Abstract Motivation Multiplexed imaging is a nascent single-cell assay with a complex data structure susceptible to technical variability that disrupts inference. These in situ methods are valuable in understanding cell–cell interactions, but few standardized processing steps or normalization techniques of multiplexed imaging data are available. Results We implement and compare data transformations and normalization algorithms in multiplexed imaging data. Our methods adapt the ComBat and functional data registration methods to remove slide effects in this domain, and we present an evaluation framework to compare the proposed approaches. We present clear slide-to-slide variation in the raw, unadjusted data and show that many of the proposed normalization methods reduce this variation while preserving and improving the biological signal. Furthermore, we find that dividing multiplexed imaging data by its slide mean, and the functional data registration methods, perform the best under our proposed evaluation framework. In summary, this approach provides a foundation for better data quality and evaluation criteria in multiplexed imaging. Availability and implementation Source code is provided at: https://github.com/statimagcoll/MultiplexedNormalization and an R package to implement these methods is available here: https://github.com/ColemanRHarris/mxnorm. Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
Single-cell assays are increasingly valued for their ability to provide information about the cell micro-environment and cell population interactions in healthy and cancerous tissues (Islam et al., 2020;McKinley et al., 2019;Shrubsole et al., 2008). Multiplexed imaging methods such as multiplexed immunofluorescence (MxIF) (Gerdes et al., 2013), multiplexed immunohistochemistry (IHC) (Tsujikawa et al., 2017) and CODEX (Goltsev et al., 2018) are in situ analyses of multiple marker channels over a large number of cells within a given tissue sample. These methods build upon dissociative singlecell analysis methods like flow cytometry (Bradford et al., 2004) and single-cell RNA sequencing (Chen et al., 2019) to allow scientists to better understand spatial cell-cell interactions in biological samples.
One significant issue in multiplexed imaging data is the presence of systematic noise at a variety of levels, related to batch and slide effects, imaging variables and optical effects (Berry et al., 2021;Chang et al., 2020). A single experiment may contain hundreds of slides and terabytes of data across which a researcher seeks to make inference (Maric et al., 2021). However, this data complexity and the within-slide dependencies induce complex effects that can disrupt inference. This technical variability can be compounded through the complex image pre-processing pipeline and may contribute to biases that increase type 1 or type 2 error. Furthermore, it V C The Author(s) 2022. Published by Oxford University Press.
is difficult to develop a standardized pre-processing pipeline because of substantial variability in the markers used across different studies, as target proteins differ across organs and cancer types (Schapiro et al., 2021;Yapp et al., 2021).
Image normalization is a technique used to adjust the input pixel-or image-level values of an image to remove noise and improve image quality. Due to the nascent development of multiplexed imaging, there are few established statistical tools that address challenges related to technical variation in this dataset (Chang et al., 2020). Normalization methods may improve similarity across images by removing the unknown effect of technical variability. Moreover, statistical methods for batch correction and image normalization can be modified to fit this complex data structure to ultimately reduce systematic noise and improve statistical inference.
Extensive work has been done in other fields to adjust for batch effects and systematic noise, particularly with regards to neuroimaging and genetic sequencing data. One primary method employed in both of these fields is the ComBat method, introduced for genetic micro-array data (Johnson et al., 2007) and then adapted to neuroimaging in the analysis of magnetic resonance imaging (MRI) data (Fortin et al., 2017;Yu et al., 2018). The ComBat method is a location-scale model that implements an empirical Bayes algorithm to adjust for batch effects and is robust to outliers in small sample sizes. Curve registration, a non-parametric tool from functional data analysis (FDA), has been used in recent work to adjust for systematic variability in accelerometry and MRI data (Marron et al., 2015;Wrobel et al., 2020Wrobel et al., , 2019. In the neuroimaging context, curve registration is used to normalize the imaging data by non-linearly transform the image intensity domain so that it is similar across images from different subjects, potentially collected on different scanners. Multiplexed imaging data are further complicated because it is non-negative, which other groups have remarked upon in similar imaging applications like spatial transcriptomics (Elosua-Bayes et al., 2021)-this requires unique derivations and/or applications of normalization methods to ensure no contradictions arise from negative marker intensities.
While adaptable, existing methods for normalizing data from other domains cannot be directly applied within multiplexed imaging due to the unusual format of the data (cell populations can differ substantially across samples), and the heavy skewness of the image histogram. The few algorithms adapted specifically for normalizing multiplex imaging data still could benefit from upstream normalization using algorithms adapted from other domains (Chang et al., 2020;Raza et al., 2016). For example, the RESTORE algorithm is a method developed for multiplexed imaging that uses negative control cells to remove unwanted variation across slides (Chang et al., 2020). However, this method relies on clustering mutually exclusive marker pairs using cell-level labels that are defined using unnormalized marker intensities and thus embed biases as detailed in this article. Raza et al. (2016) also introduced normalization methods in the multiplexed imaging that implement a procedure of image filters and transformations. These methods show improvements at the pixel and image level, but do not correct for slide or batch effects that are prevalent as detailed in this work. Hence, the normalization methods proposed here can be applied early in the image processing pipeline to reduce bias in subsequent steps like phenotyping and spatial correlation analyses.
In this article, we introduce and compare normalization and data transformation methods for multiplexed imaging data. These techniques combine transformations of the scale of the data from its raw form with algorithms (namely, ComBat and functional data registration) adapted to remove slide effects from the data. We further develop multiple novel metrics to quantify and measure the removal of technical variation in these data, where cell populations can differ across slides. We use data from the Human Tumor Atlas Network to evaluate the methods we compare here (Rozenblatt-Rosen et al., 2020). While we apply the methods here to segmented and quantified single-cell data from multiplexed imaging, they can also be applied at the pixel level.

Implementation
We compare three data transformations: log 10 , mean division (division by the slide-level mean) and mean division with log 10 , and three normalization procedures: no normalization, ComBat and functional data registration, for a total of nine potential multiplex image normalization algorithms (Table 1).

Transformations
Let Y ic ðuÞ denote the raw intensity of unit u on slide i for marker channel c (here u corresponds to segmented cell intensities). We consider the following transformations: the log 10 transformation, log 10 ðY ic ðuÞ þ 1Þ, where the addition of 1 follows since Y ic ðuÞ is integer-valued; the mean division transformation: YicðuÞ l ic , where l ic is the mean intensity value on slide i for channel c; and the mean division log 10 transformation, log 10 , where again l ic is the mean intensity value on slide i for channel c. Here, the data are no longer integer-valued, and the addition of 1 2 ensures values greater than 1 2 are positive and less than 1 2 are negative to properly adjust this scale of data. Other transformations, including a Z-score transformation, can be found in the Supplementary Material.

Combat normalization
We adapted the empirical Bayes framework of the ComBat algorithm (Fortin et al., 2017;Johnson et al., 2007) for multiplexed imaging data. We parameterize mean and variance of the slide-level batch effects, with the location-scale model where we define Y ic ðuÞ as the intensity of unit u on slide i for marker channel c and a c as the grand mean of Y ic ðuÞ for channel c. Though in principle, units can be at the pixel or cell level, in our application, Y ic ðuÞ is the median cell intensity (or its transformed counterpart) of a selected marker for a given segmented cell on a specific slide in the dataset. Here c ic is the mean batch effect of slide i for channel c and we assume c ic $ Nðc c ; s 2 c Þ; d 2 ic is the variance batch effect of slide i for channel c and we assume d 2 ic $ IGðx c ; b c Þ, and we assume the random errors e ic ðuÞ $ Nð0; 1Þ. We use the data to estimateâ c and then estimateĉ ic ¼ 1 Uic P u Y ic ðuÞ, or the sample mean intensity on slide i for channel c. We further definer c ¼ 1 N P ic ðY ic ðuÞ Àâ c À c ic Þ 2 and let: where we assume Z ic ðuÞ $ Nðc ic ; d 2 ic Þ. Based on the posterior conditional means, we find the following empirical Bayes estimators of the two batch effect parameters (a detailed derivation of these estimators can be found in the Supplementary Material): Where we define U ic as the number of quantified cells present on a particular slide i for a given channel c. We calculate the hyperparameter estimates of b c ; x c ; s 2 c ; c c using the method of moments and iterate between estimating the hyper-parameters and batch effect parameters until convergence (Dempster et al., 1977;Johnson et al., 2007). Upon convergence, we use these batch effects to adjust the data, This model adjusts the Z-normalized intensity data, Z ic ðuÞ, by the mean and variance batch effects, and re-scales back to the initial scale of the data with the mean and variance of the raw marker intensity values. Note that zeroes were left in the data prior to the ComBat normalization, since for each scale transformation we perform on the data the zeroes are meaningful rather than an absence of signal.

Functional data registration
For the second normalization algorithm, we implemented functional data registration using the fda R package (Ramsay et al., 2020;Ramsay and Silverman, 2005). This approach uses FDA methods to approximate the histograms for each slide and channel as smooth densities, and uses functional registration to align the densities to their average at the slide-level. Functional registration is performed by estimating a monotonic warping function for each density that stretches and compresses the intensities such that densities are aligned. These warping functions are then used to transform the marker intensity values in the images so that non-biological variability is reduced across slides.
Here, let our observed cell intensity values Y ic ðuÞ have density Y ic ðuÞ $ f ðyji; cÞ. Our goal is to remove technical variation related to the slide by estimating a warping function, / ic ðyÞ, which is a monotonic transformation of the intensities. We first use a 21 degree of freedom cubic B-spline basis to approximate the densities of the median cell intensities for each slide and marker, f ðyji; cÞ % b T gðyÞ where b 2 R 21 is an unknown coefficient vector and g(y) is a vector of known basis functions. We then register the approximated histograms to the average, restricting the warping function to be a 2 degree of freedom linear B-spline basis for some functions h 1 ðyÞ and h 2 ðyÞ and for constants C 0 and C 1 to be estimated from the data, such that the transformation is monotonic (Ramsay and Silverman, 2005). Unknown parameters b 1ic and b 2ic are estimated to minimize, ð y kf ic ð/ ic ðyÞÞ À f ðyÞk 2 dy Where f(y) is the average density across slides. We then use / ic ðyÞ to calculate the normalized intensity values, Y Ã ic ðuÞ: Note that the warping function / ic ðyÞ is a map that takes in the raw median cell intensity value and outputs a new, normalized intensity value. Images are then normalized by taking the original intensity values in the image, and transforming them using the map defined by the warping function. This combined process can be summarized as first taking the raw data, smoothing the histogram of these data using a B-spline basis expansion, and then calculating a warping function to transform the smoothed data so that densities across slides within marker channel c are approximately aligned.

Evaluation framework
There is no accepted gold standard for evaluating normalization methods in multiplexed imaging because the same tissue sample cannot be imaged twice and there is substantial heterogeneity across samples (Nadarajan et al., 2019;Rozenblatt-Rosen et al., 2020). Here, our evaluation framework relies on the two following conditions to be deemed successful: (i) reduction in slide-to-slide variance in the cell intensity data and (ii) preservation (and potential improvement) of existing biological signal in the data.

Alignment of marker densities
To determine if between-slide noise is visible when comparing densities, we visually inspect the changes in density curves for each transformation method. A priori, we expect that a successful transformation method will align the density curves across slides, and subsequently we inspect the placement of slide-level Otsu thresholds, a commonly used thresholding algorithm used in imaging analysis (Otsu, 1979), to confirm a reduction in variability between slides. To quantitatively measure the alignment of marker densities, we implement the k-sample Anderson-Darling statistic to quantify the likelihood that each slide is drawn from the same population (Scholz and Stephens, 1987). A higher value of this test statistic indicates greater evidence that the k-samples are drawn from different distributions.

Threshold discordance and accuracy
Otsu thresholding is a commonly used thresholding algorithm that defines an optimal threshold in gray-scale images and histograms, maximizing the between-class variance of pixel values to separate the data into two classes (Otsu, 1979). In this use case, we define Otsu thresholds at the slide-level for each of the markers in the study, where a cell with intensity value greater than the Otsu threshold is deemed marker positive. We then compare this to a global Otsu threshold, combining all slides, for each marker to calculate a mean discordance score across all slides for a given marker. For some marker channel c, slide i, and set of marker intensity values Y ic ðuÞ, define the indicator function for a given Otsu threshold o as O ic ðu; oÞ ¼ IðY ic ðuÞ > oÞ. Here, O ic ðu; oÞ indicates which cells are in the expressed category using threshold o. The discordance metric is then defined as: Where U ic is the number of quantified cells present on a particular slide i for a given channel c, o ic is the slide and channel-specific Otsu threshold, and o c is the threshold estimated across all slides for a given channel. Here we calculate a slide-level discordance score, e.g. the proportion of cells misclassified on each slide, and take an average of the score across slides for each marker channel. This measures the slide-to-slide discordance across all markers and transformation methods, to determine how similar Otsu thresholds are across slides following transformation. In this framework, a lower value of the threshold discordance score indicates better agreement across slides in the data.
We further implemented Otsu thresholding across slides to compare definitions of a marker positive cell with the manual labels of CD3 and CD8 as marker positive cells (see Section 2.3). This metric quantifies the accuracy of the Otsu thresholding method in recapitulating the bronze standard labels for each transformation method. Note: Transformations (rows) and normalization (columns) performed on the data. Here, y is the median cell intensity values for an arbitrary marker channel c, and l ic is the slide mean for slide i of the median cell intensity values for marker channel c.

Proportions of variance
To further assess the removal of slide-related variance following each transformation of the data, we fit a random effects model using the lme4 R package (Bates et al., 2015) with a random intercept for slide to assess what proportion of variance is present at the slidelevel for each marker. A successful normalization algorithm will reduce the slide-level variance, ultimately removing technical variability to improve the quality of the data.

UMAP embedding
The Uniform Manifold Approximation and Projection (UMAP) is a technique for dimension reduction (McInnes et al., 2018) commonly used in the biological sciences to distinguish differences in cell populations between single-cell data (Becht et al., 2019). Here we reduce the data into two UMAP embeddings for each of the transformation methods using only four markers in the dataset: vimentin, collagen, pan-cytokeratin and Na þ /K þ -ATPase. These markers were chosen for their ability to easily distinguish epithelial and stromal cells. We expect the UMAP embeddings to yield clear separation of the data when using the epithelium label in our dataset (see Section 2.3). To quantify this separation of groups, we implement a k-means clustering model on the UMAP embeddings to predict the class label, and use the adjusted Rand index to measure the similarity with the true labels (Hartigan and Wong, 1979;Hubert and Arabie, 1985). Larger values of this index indicate better agreement between two sets of labels, adjusted for the chance grouping of elements. Note that across each slide in the dataset, approximately 10% of the data was used to derive the UMAP embeddings to reduce computational and visualization time.

Dataset
The data were collected from human colorectal cancer tissue samples from the Human Tumor Atlas Network We used epithelial and stromal cell labels and manually labeled marker positive cells as biological variables in order to quantify loss or improvement of biological signal due to each normalization method. The epithelial labels were created for each slide at the image level using a random forest trained on all of the markers included in the dataset (for a complete list, see the Supplementary Material). A cell was labeled as being in a particular cell class if that was the most likely class probability within the segmented cell area. We defined marker positive cells by first manually thresholding the immune marker images to create marker positive image masks. Then, for each segmented cell, the cell was defined as marker positive if more than 30% of its area contained marker-positive pixels. We refer to these as manual labels for CD3 and CD8. We also used a tumor image mask to denote whether a cell is in a tumorcontaining region.

Alignment of marker densities
Density curves of the marker vimentin for each transformation algorithm and corresponding slide-level Otsu thresholds, along with test statistics from the k-sample Anderson-Darling test were compared to determine alignment of curves across slides after transformation (Fig. 1, Table 2). Beginning with the unnormalized transformed values, the log 10 transformation produces density curves that are somewhat well-aligned (AD Test: 130.08), while the mean division and mean division log 10 methods both compress the scale of the data and align well across slides (AD Test: 125.45, 89.90). Furthermore, each ComBat method performs poorly at aligning and reducing noise in the data, yielding the largest statistics from the Anderson-Darling test and visually noisy density curves. This is likely due to the Gaussian assumptions of the ComBat model that are not met in either the bi-modal ( log 10 , mean division log 10 ) or right-skewed (mean division) methods. The functional data registration aligns the log 10 and mean division log 10 well, and the algorithm yields marginal improvements for some of these transformations.
The best performing methods for this metric are the mean division, mean division log 10 , and mean division log 10 combined with the functional data registration algorithm: the data is well-aligned across slides and when averaging Anderson-Darling statistics across all marker channels (Table 2), we see these methods yield the lowest values presenting stronger evidence these values are derived from the same parent distribution. We also compared density curves of the markers CD3 and CD8 for each transformation algorithm, which largely present the same results (Supplementary Figs S1 and S2).

Threshold discordance score
In order to quantify how the normalization methods impact cell classification, we compared Otsu thresholding estimated at the slide level and across slides for each method to generate a discordance score and compare this to raw data ( Fig. 2A). Compared to the epithelium/stromal markers in the dataset, less identifiable markers like CD3 and CD8 yield the worst performance across nearly all methods, with large increases in the discordance score. Most methods increase the mean discordance score relative to the unadjusted data, with the exception of the mean division, mean division log 10 and the mean division log 10 with functional data registration. This evaluation again aligns with earlier assessments and suggests that these methods present improvements in the slide-to-slide agreement across all markers compared to the unadjusted data. We also observe that when comparing threshold discordance scores across all markers, these three methods yield the lowest values and are the only methods to reduce this rate relative to the raw data (Table 2).

Proportions of variance
To understand how well each method removes slide-related variability, we fit a random effects model on the median cell intensities after applying each combination of transformation and normalization. The ComBat algorithm, by design, removed all of the variability related to slide across all methods (Fig. 3, Table 2). The only other method that entirely removes all slide-to-slide variance across all marker channels is the mean division method-for the mean division log 10 and mean division log 10 with functional data registration methods, we also observe reduction in variance (though not completely removed) relative to the unnormalized data. And while ComBat reduces slide variability, it completely removes slide effects that may include biological differences. In short, the results of this metric suggest the utility of the mean division methods in removing slide-level variance across marker channels.  Table 1, where each column also matches with the normalization algorithms in Table 1. The ticks on the x-axis represent the Otsu thresholds for each slide for that transformed data, where the color again corresponds to the slide (such that the colors are one-to-one between threshold and density plot). Anderson-Darling test statistics for the marker vimentin are presented for each method in the top right corner Note: Results from the k-samples Anderson-Darling test statistic, the threshold discordance score, and the variance proportion at the slide level from the random effects modeling, all averaged across marker channels, as well as the adjusted Rand index for the slide identifiers comparing the raw data to the normalized data. For each of these metrics, small values indicate better performance for a given method.

Marker-positive accuracy using Otsu thresholds
We further utilized Otsu thresholding to identify marker positive cells and compared these to the manual labels for CD3 and CD8 to determine which normalization methods most accurately recapitulate the raw data (Fig. 2B). Results suggest that the scale of the data is pivotal in whether a method maintains marker-positive accuracy, with each of the methods on the log 10 scale demonstrating dramatic reductions in marker-positive accuracy compared to the raw data, while the mean division method performs the best across all methods. The methods that have performed well in the aforementioned evaluation metrics perform well here, namely the mean division method and the mean division log 10 with functional data registration. This continues to suggest these methods reduce the slide-to-slide variation present in the data while accurately capturing marker-positive cells after transformation.

UMAP embedding
We compared UMAP embeddings of four related markers across normalization methods to compare the separation of epithelium and stromal tissue labels. In the raw data, the embeddings separate well (Adj. Rand Index: 0.82); however, the data includes the presence of outliers that suggest mixing of the tissue classes in the UMAP embedding space (Fig. 4A). Nearly all methods implemented improve upon the separation of groups based on the adjusted Rand index, yet many of these methods present co-localization that does not clearly depict separation as desired. We do observe distinct separation of the aforementioned methods of interest: mean division Fig. 2. Threshold discordance and accuracy. (A) Otsu thresholds were calculated at the slide-level for each marker and compared to a global Otsu threshold for each marker to calculate a discordance score to compare transformation methods. The mean difference of the slide-level Otsu thresholds and the global Otsu threshold is then calculated for each marker, and presented as a point for each of the nine markers, with the white diamond representing the mean discordance score across all markers for a given method. Given that this is a discordance score, lower values indicate better agreement across slides. (B) Otsu thresholds were calculated across slides for each marker to determine marker positive cells, which were then compared to the manual labels for the markers CD3 and CD8 to determine the accuracy of defining a cell as marker positive. This is presented as the accuracy rate of recapitulating the ground truth labels-given that this is a measurement of accuracy, higher values indicate better agreement between the normalized data and labels. Note also that for each of these plots, the top row indicates the results from the raw, unadjusted data Fig. 3. Proportion of variance present at slide-level in random effects model. Scatter plots that denote the proportion of variance at the slide-level for each normalization method for each of the marker channels in this dataset. Variance proportions were calculated using a random effects model with a random intercept for slide-methods that perform well should reduce the slide level variance. Note also that the top row indicates the results from the raw, unadjusted data (Adj. Rand Index: 0.94), mean division log 10 (Adj. Rand Index: 0.95) and the mean division log 10 with functional data registration (Adj. Rand Index: 0.97)-each of these UMAP embeddings presents distinct groups that suggests these methods are improving the separation of these two tissue classes.
We also compared the distribution of the unique slide identifiers in the UMAP embeddings of these four markers, which in the raw data (Adj. Rand Index: 0.033) points to specific slide co-localization in the data (Fig. 4B, Table 2). In this case, we desire low values of the adjusted Rand index, which suggest poor prediction of slide labels and indicate the removal of slide-level variance. Many of the methods, particularly those implementing the ComBat algorithm, worsen the distribution of these slide identifiers and increase the adjusted Rand index, suggesting additional slide-to-slide noise added to the data. This suggests that ComBat removes both biological signal and slide-to-slide effects that are exaggerated in the UMAP embedding space. In contrast, there is reduced slide-to-slide clustering in the UMAP embeddings for each of the following methods: mean division (Adj. Rand Index: 0.01), mean division log 10 (Adj. Rand Index: 0.01) and mean division log 10 with functional data registration (Adj. Rand Index: 0.02). These methods appear to both reduce the observed slide-to-slide variation noted here and in the aforementioned results, while maintaining the necessary biological signal of interest.

Discussion
In this article, we derived the ComBat algorithm for a new modality and employed a novel use of functional data registration to align histograms of multiplexed imaging data. In the absence of a gold standard for comparison in multiplexed imaging data, validating any normalization procedure is challenging. The suggested evaluation framework introduced here can be used to assess the presence and reduction of slide effects in multiplexed imaging data, which we implemented to evaluate 12 combinations of transformations and normalization methods. Furthermore, our framework can be applied in the absence of a ground truth by quantifying the amount of sliderelated variability and comparing to manually labeled biological features, providing a foundation for further development of evaluation criteria in the multiplexed domain. Also note that since the proposed methods are applied within a given marker channel, this work can be extended into other imaging domains like IHC that do not involve multiplexing.
Similarly, the use of Otsu thresholding in this article is the standard procedure for imaging domains like IHC (Trinh et al., 2017;Tsujikawa et al., 2019). However, markers like the phosphorylated epidermal growth factor receptor are typically categorized into multiple groups based on staining intensity (Hashmi et al., 2018;Shan et al., 2017). While the Otsu threshold may not capture this categorization, it remains a reasonable proxy for these quantitative markers in the absence of a pathologist, and other metrics implemented here like the Anderson-Darling statistic may be more appropriate. Furthermore, future methods development could focus on implementing multi-Otsu thresholding methods into the threshold discordance score, or adapt marker-specific thresholding methods that better capture variability in the quantitative markers. Notably, the correspondence between a marker positive cell defined by an Otsu threshold and biological signal is not necessarily one-to-one. For example, the log 10 transformation non-linearly compresses the domain, such that a larger proportion of the x-axis is allotted to cells that are marker negative (background and unexpressed cells), which may have led to greater variability in the Otsu thresholds.
We find that the raw data scale has clear slide-to-slide variation present, and that normalization methods can reduce slide level variation while preserving and improving biological signal relative to the raw, unadjusted data. These findings suggest that the mean division transformation method reduces slide variability and improves the biological signal. In addition, the mean division log 10 scale (unnormalized) performs well across all evaluation metrics, with the noted exclusion of results for the marker CD8. This discrepancy is remedied with the functional data registration, which is a limitation of the mean division log 10 transformation but points to the robustness of the registration algorithm to maintain and improve the quality of the data.
However, note that the registration algorithm does not perform well with skewed data, suggesting that improvements we see in data that appears bi-modal (e.g. better suited to the non-parametric assumption of functional data) is not necessarily transferable to rightskewed data that violates assumptions of smoothness in the B-spline basis-future work could explore this result. The ComBat method performs adequately, but appears to over normalize the data and relies heavily on a Gaussian assumption that is violated in this skewed-right dataset. The clear limitation of this normalization method and others is that when applied to whole tissue slides, any between slide variability is confounded with biological variability. Recent adaptations of ComBat like ComBat-seq for RNA-seq data may provide a better framework to implement in the multiplexed imaging space (Zhang et al., 2020), including future work that could address how the algorithm handles zeroes. Note also that recent advances applying deep learning in fluorescence microscopy analysis combine information across heterogeneous combinations of markers to ameliorate similar problems that we address in this article, namely technical variation and comparing disparate data sources (Gomariz et al., 2021)-this could be a valuable avenue for future normalization approaches.
In practice, the mean division method is 'good enough'-it is simple, computationally efficient, and appears the least likely to introduce error while still reducing slide-to-slide variation and maintaining biological signal. The mean division log 10 method may be necessary in the case of statistical modeling, since skewed distributions are not suitable for many statistical models, but may not be the best way to represent cell intensities as a predictor variable (as appears the case for the mean division method). We see that in the case of mean division log 10 data, it may be necessary to use the registration algorithm to remedy discrepancies like those visible for the marker CD8.