GammaGateR: semi-automated marker gating for single-cell multiplexed imaging

Abstract Motivation Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, face challenges in achieving consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data. Results To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and soft clustering of marker-positive cells. The model incorporates user-specified constraints that provide a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, employing two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces highly similar results to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals, achieved by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input for machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results, while reducing the time of manual segmentation. Availability and implementation The R package is available at https://github.com/JiangmeiRubyXiong/GammaGateR.


Introduction
Multiplexed immunofluorescence (mIF) imaging is a recently developed spatial proteomic assay that allows the investigation of a tissue microenvironment for many markers and high spatial resolution (Gerdes et al. 2013, Lin et al. 2015, Goltsev et al. 2018, Wu et al. 2022).mIF images are obtained through cyclic imaging with up to 60 marker channels, each identifying a specific protein.mIF is advantageous over other single-cell assays that disaggregate the tissue because it can reveal insights about the spatial interactions between tissue and cell types in situ and offers a subcellular spatial resolution that is higher than other spatial assays.mIF has already revealed valuable spatial insights into the tumor microenvironment (Chen et al. 2021, Lewis et al. 2021, Heiser et al. 2023, Lin et al. 2023).For example, recent findings demonstrate differences in immune cell infiltration of colorectal precancerous polyps such as a higher concentration of CD8þ Tcells in the epithelium of sessile serrated lesions (SSL) than that of conventional adenomas and higher CD68þ macrophages concentration at the luminal surface in SSL, but more randomly distributed in adenomas (Chen et al. 2021).
Before important spatial insights can be gleaned using statistical methods (Edsg€ ard et al. 2018, Keren et al. 2018, Chervoneva et al. 2021, Wilson et al. 2021), mIF images undergo an intensive preprocessing pipeline to obtain singlecell measurements.While there are various steps included in the pipeline such as image registration, single-cell segmentation, quantification, and batch correction (Graf et al. 2022, Harris et al. 2022, Hunt et al. 2022, McKinley et al. 2022), cell phenotyping is typically the final step before downstream analyses on the cell-level data, similarly to other single-cell assays.Cell phenotyping identifies individual cell phenotypes from the measured marker expression values of the cell and directly affects the subsequent cell population analysis results.
The two most common approaches for cell phenotyping in mIF are manual gating and graph-based multivariate clustering.In manual gating, each sample is visualized separately to determine a threshold, and super-threshold cells are labeled as marker-positive.This procedure is repeated for all marker channels and slides, and the phenotypes are determined by combining combinations of marker-positive cells (Chen et al. 2021, Phillips et al. 2021, Lin et al. 2023).Alternatively, multivariate graph-based clustering is adapted from other singlecell assays (Menon 2018, Kiselev et al. 2019).This approach performs cell clustering, then an expert assigns a phenotype manually to each cell group based on their average expression profile.There are two limitations of graph-based clustering: (i) it can be hard to identify phenotypes based on the average expression of the clusters because they may not be representative of all cells in the cluster due to heterogeneity, and (ii) some algorithms (e.g.Louvain clustering) can be slow to fit in very large datasets.Multivariate graph-based clustering is implemented with various modifications across many software packages (Aghaeepour et al. 2013, Levine et al. 2015, Schapiro et al. 2022).Unfortunately, both methods are labor intensive, and their accuracy suffers from image noise and spatial artifacts in mIF images that cause marker expression histograms to appear continuous or uni-modal.When the histograms appear continuous or uni-modal, there is no clear separation between expressed and unexpressed cells, which leads to high variability in marker-gating results.As a result, both phenotyping methods possess shortcomings that cannot be ignored.On one hand, manual gating can be subjective.On the other hand, graph-based clustering results are prone to over-clustering and producing poor separation between clusters (Hickey et al. 2021, Seal et al. 2021).
The challenges described above are well recognized and there are a few methods and software developed that attempt to automate cell phenotyping for mIF images (Sullivan et al. 2018, Viratham Pulsawatdi et al. 2020, Pratapa et al. 2021, De Le� on Rodr� ıguez et al. 2022, Schapiro et al. 2022).For example, CellSighter is a recently proposed supervised deeplearning algorithm for cell phenotyping that requires a "gold standard" training dataset (Amitay et al. 2023).Another recent solution, ASTIR (Automated assignment of cell identity from single-cell multiplexed imaging and proteomic data), is a fast unsupervised approach that defines cell phenotypes from segmented cell-level data by using a neural networkbased mixture model assuming a multivariate log-normal distribution (Geuenich et al. 2021).Instead of binary outputs like in classification methods, ASTIR returns posterior probabilities of different cell types for each cell.This type of output is advantageous because it offers more information than nominal cell types and leaves cell labeling to the clinician's discretion.Lastly, Ahmadian et al. treat the analysis as a pixel classification problem and design a single-step framework for mIF phenotyping that is integrated with other preprocessing steps (Ahmadian et al. 2022).
Nevertheless, inconsistencies persist in the results rendered by these learning-based methods when applied across markers, slides, batches, and datasets.These inconsistencies result from the immense variation in the cell-level distribution of phenotyping markers that are often too nuanced to be removed by existing batch correction methods (Graf et al. 2022, Harris et al. 2022, Hunt et al. 2022).For example, batch correction procedures often adjust for the location and scale of the marker distributions, but do not address the systematic differences in their shape.For these reasons, it is difficult to fully automate the cell phenotyping process, despite the availability of automated tools, and manual gating is still used to perform cell phenotyping because it is easy to visualize and evaluate the quality of the phenotype (Chen et al. 2021, Lin et al. 2023).
Because automated methods cannot be run without evaluation and supervised methods require a gold-standard dataset, no method is truly fully automated.Here, we develop an explicitly semi-automated algorithm cell-level phenotyping algorithm called GammaGateR.GammaGateR allows the user to easily perform cell phenotyping, visualize results, and conduct interpretable quality control while reducing manual labor.Based on a novel closed-form Gamma mixture model (cfGMM), GammaGateR is a probabilistic model that is fitted to each channel and slide separately, and outputs positive-component probabilities for each marker.These can then be easily thresholded and combined for semi-automated marker gating or input directly into downstream analysis.GammaGateR has important technical advantages, including (i) improved computation time and model convergence due to its novel closed-form property, and (ii) high consistency and reproducibility for phenotyping results across mIF data batches due to incorporation of parameter boundary constraints.In applications on real-world mIF data, we find that GammaGateR has fast and consistent results across many slides and markers.We provide an open-source implementation of our method in the new GammaGateR R package (https://github.com/jiangmeirubyxiong/gammagater).
In this paper, we describe the cfGMM model, and evaluate its computational performance and statistical properties.We then compare the accuracy of the GammaGateR with the current state-of-the-art unsupervised cell-level phenotyping tool, ASTIR, in three datasets.Finally, we use GammaGateR outputs to compare spatial features of immune cell populations in tissue compartments between adenoma and serrated colon polyps.

Datasets and preprocessing
We use three single-cell imaging datasets to evaluate model performance and demonstrate the use of the GammaGateR analysis pipeline (Table 1): the Colorectal Molecular Atlas Project (Colon MAP) dataset (Chen et al. 2021), the Spatial Colorectal Cancer (CRC) Atlas dataset (Heiser et al. 2023), and Ovarian Cancer dataset (Steinhart et al. 2021, Wrobel andGhosh 2022).Dataset-specific acquisition and processing are described below and in prior work (Chen et al. 2021, Steinhart et al. 2021, McKinley et al. 2022).After processing and prior to analysis, cell expression values were normalized by first mean division then log10 transformation to reduce slide-to-slide variation (Harris et al. 2022).Data collection for the Colon MAP and CRC atlas was approved by the Institutional Review Board (IRB) at Vanderbilt University Medical Center and collection of the ovarian cancer dataset was approved by the IRB at the University of Colorado.

Colon MAP data
The Colon MAP data consists of precancerous samples including conventional adenomas and SSL.SSLs are more often found in the proximal colon, represent only 10%-20% of all polyps, and exhibit higher cytotoxic immune cell infiltration (Chen et al. 2021).The goal of this study was to characterize differences in the microenvironment of these two types of polyps.Data collection and processing are as described in (Chen et al. 2021).Briefly, imaging was performed on a GE IN Cell Analyzer 2500 using the Cell DIVE platform and acquired at x200 magnification, with exposure times determined manually for each antibody.Details on antibodies, staining sequence, and exposure times are given in their previous report (Chen et al. 2021).Single-cell segmentation for the colon datasets was performed using MIRIAM, a multichannel machine learning single-cell segmentation algorithm designed and evaluated for human colon and human breast carcinoma (McKinley et al. 2022).Single-cell channel quantification was performed by taking the median value within each cell region and these values were normalized and used for subsequent analyses (Harris et al. 2022).Epithelial and stromal tissue compartments were estimated using Leiden clustering and the tumor region of each sample was manually identified.

CRC atlas data
The CRC atlas dataset includes 16 samples that represent various stages of human colon cancer (Heiser et al. 2023).The goal of this study was to investigate the co-evolution of tumor and immune microenvironments in microsatellite-stable and chromosomally unstable tumors.A molecular profiling assay was performed using mIF with 33 marker channels.Single-cell segmentation, channel quantification, and normalization are performed as in the Colon MAP dataset.

Ovarian cancer data
The Ovarian Cancer data includes multiplexed immunohistochemistry (IHC) images from a tissue microarray of 128 patients with high-grade serous carcinoma (Jordan et al. 2020, Steinhart et al. 2021).The data were collected using Vectra Automated Quantitative Pathology Systems (Akoya Biosciences) at 20x resolution and stained with antibodies specific for CD8, CD68, cytokeratin, CD3, and CD19.Preprocessing including single-cell segmentation was performed using inForm software version 2.3 (Akoya Biosciences).Cell-level expression values were computed as the mean of the pixel values in each cell.The data are freely available through the VectraPolaris Bioconductor Package (Wrobel and Ghosh 2022).

Overview
The GammaGateR algorithm is unique to existing methods for its focus on parsimoniously modeling cell-level marker expression densities.This approach yields tailored-to-slide model estimation in cell-level mIF data where marker expression distributions can vary substantially across slides.The algorithm uses a zero-inflated two-component GMM to model marker expression for each slide.The Gamma mixture model naturally identifies marker-positive and marker-negative cell distributions and returns the probability of belonging to the marker-positive cell distribution for each cell.The returned probabilities can either be used directly in subsequent analysis or combined and dichotomized to define cell phenotypes.GammaGateR incorporates user-specified constraints to provide consistent model fit across a large number of slides.The model evaluation methods included in GammaGateR allow the user to evaluate the constraints and quality check results.The power source of GammaGateR is the closed-form Gamma mixture model, which is a novel approach to phenotyping for mIF data that makes it more computationally efficient than traditional GMMs.

Closed-form Gamma mixture model estimation
For mIF data, we use the GMM to fit cell marker expression values as a weighted sum of different probability distributions that represent unique cell populations (McLachlan et al. 2019).The Gamma distribution is an excellent model for marker values because the domain of the Gamma distribution is strictly positive and it has the flexibility to model the varying skewed densities seen in mIF marker values (Fig. 1.5.a).However, GMMs are not scalable for mIF image data, because they rely on computationally inefficient numerical methods to obtain the maximum likelihood estimator (MLE).The slow convergence of the MLE for the GMM makes it prohibitive to apply across a large number of channels, slides, and cells.As a solution, we develop a closed-form GMM (cfGMM; https://github.com/jiangmeirubyxiong/cfgmm)estimation procedure based on a recently developed estimator for the Gamma distribution (Ye and Chen 2017).In addition, to improve computational efficiency, the cfGMM has the benefit of allowing prior constraints on model parameters.With the cfGMM in GammaGateR, we enable the flexibility to include a biologically meaningful range for the mode of each component in the Gamma mixture model.This way, users of GammaGateR can restrict estimation to biologically meaningful values.
For the GammaGateR model, we assume nonzero marker values belong to two components representing markerpositive and negative cells separately.Because mIF data can include zero values due to autofluorescence adjustment, we use a zero-inflated two-component Gamma mixture model: where X denotes the marker value for one cell, λ 0 represents the proportion of zeros, λ 1 is the proportion of marker-negative cells, and λ 2 is the proportion of markerpositive cells.I � ð Þ denotes the indicator function and f ðx; a k; b k Þ is the Gamma density function corresponding to the kth component, with a k and b k as its shape and scale parameters for the kth component and m k ¼ ða k À 1Þb k as its mode.We assume that m 2 > m 1 , i.e. the component λ k ; a k and b k , the posterior probability of a given cell being markerpositive is where Z is a random variable indicating the component membership of the given cell (0,1,2).In the extreme tails, the marker-positive probability might not always be higher than that of marker-negative, due to different variances of the two components' Gamma distributions.Therefore, we apply a correction to the posterior probabilities to force them to be monotonic with respect to the marker values.Specifically, after the first crossing of the density curves of the two components, the density curve of the first component was forced to be nonincreasing, and the density curve of the second component was forced to be nondecreasing.More details regarding the monotonicity adjustment can be found in the Supplementary Material Section S4.In addition to the posterior probability, GammaGateR also outputs the marginal probability of the observed marker value for the markerpositive component The marginal probability is monotonically increasing in the marker intensities and represents the probability that a marker-positive cell is less than the given value, x.
We improve computation time of the classical GMM using a recently developed closed-form estimator for the Gamma distribution (Ye and Chen 2017).We replace the Gamma distribution, f , in Equation ( 1) with the generalized gamma and derive the closed-form estimators for the GMM using the approach used for the Gamma distribution by Ye and Chen (2017),   6) Expression probabilities can be extracted for downstream analysis from the model objects and setting γ k ¼ 1 in the final expression.This approach works because a k and b k have closed-form solutions to this system of equations, whereas they do not when differentiating the Gamma distribution function with respect to a k and b k .Ye and Chen (2017) showed that these closed-form estimators are consistent for the true parameters.Derivation of the cfGMM parameter estimator can be found in the Supplementary Material Section S1.Note that the closedform estimators are not identical to the MLE of the Gamma distribution, but they approach the MLE in large samples seen in mIF datasets (see Supplementary Material Section S3).Sometimes, the local maxima of unsupervised clustering algorithms are not biologically reasonable because the marker-positive and marker-negative populations overlap.To address this issue, we add constraints on the mode of the components of the cfGMM in order to restrict the markerpositive cells to the right of the marker distribution.The constraints can be selected based on a percentile of the marker distribution or by visualizing the "elbow" of the expression distribution of the marker (see GammaGateR pipeline).To incorporate this biological information, we constrain the mode, m k , for component k, which is a function of parameters a k and b k , so that it lies within the interval ðl k ; u k Þ: Because the log-likelihood is strictly concave with respect to a k and b k , the constrained maximum must lie on the boundary if the global maximum is outside the bounds, so finding the maximum reduces to finding the solution incorporating the boundary constraint.In this case, the expected log-likelihood of the component k is maximized on the line corresponding to the lower or upper constraint, e.g.ða k À 1Þb k ¼ l k .The parameters are estimated following the Expectation-Maximization algorithm (Dempster et al. 1977) (See Supplementary Material Section S2).The closed-form estimators are used in the maximization step in place of numerical optimization that is required to obtain the MLEs for the gamma distribution.

GammaGateR pipeline
The analysis pipeline is illustrated for the CD4 marker channel (Fig. 1).GammaGateR takes a single-cell image dataset, with each row corresponding to an individual cell, and each column as the normalized intensity of a marker channel for each cell (Fig. 1.1).The first step is selecting biological constraints for model fitting by visualizing overlay histograms for each marker channel (Fig. 1.2).The constraints are not manual thresholds, but represent boundaries for the mode of each component of the fitted distribution across all slides in the dataset.Because marker-positive cells often are a small proportion of all cells and have higher expression values, we limit the mode of the higher component to be no lower than the "elbow" of the overlay histograms (Fig. 1.2, e.g.0.45).While GammaGateR can be fit without constraints, the constraints provide more consistent model estimation across many slides.Examples of initial constraints for the three datasets presented here are given in Supplementary Tables S3  and S4.Given the data and constraints, GammaGateR generates the parameter estimates of the Gamma mixture model including modes and proportion of each component, and the posterior and marginal probabilities of each cell being marker-positive.
To ensure accurate model fitting, GammaGateR includes functionality for users to evaluate model fit and modify the fit when needed.Diagnostic plots for the fitted GammaGateR model object consist of a scatter plot of all the slides fitted model modes and lambda (marker-positive probability), and the fitted density curve over the cell expression histogram for each slide in the dataset (Fig. 1.5.a).The x-axis of the scatter plot is the mode for the marker-positive component, and the y-axis is the proportion of the corresponding component.The scatter plot is useful for identifying slides that are outliers with respect to where the mode of marker-positive cells lies or the estimated proportion of marker-positive cells.The histograms can be used for visually evaluating model fit for one slide.A good model fit shows an approximate fit of the smooth density line to the histogram with a marker-positive cell distribution sitting to the right (Fig. 1.5.a).If there is poor model fit, users can compare fitted models between two different constraints to check how different boundaries affect fitted values (Fig. 1.5.b). Figure 1.5.bcompares the model fit for CD4 with no constraints (the set of density curves overlaps in the left ) to the model fit with an initial constraint with a lower bound of 0.45 for the marker-positive component (the set of density curves with a good separation).The model without constraints places the marker-positive distribution directly over the marker-negative distribution.Users can adjust the parameter boundaries and fit again until satisfactory fittings are rendered.Finally, the output of the fitted models is easily accessible in the GammaGateR model object (Fig. 1.6).The vignette in the GammaGateR R package provides a guide to fitting the GammaGateR model in the lung cancer dataset from the VectraPolaris dataset available on Bioconductor.

Model estimation
GammaGateR is fitted to all datasets following the procedure described in Fig. 1.Among them, both the colon map data and the CRC atlas data go through an additional modification of the thresholds by increasing the thresholds after visually evaluating the model fit (Supplementary Table S4).The ovarian cancer data shows good results in the first round of fitting.There are five slides that have one channel that did not converge in the colon map data and the CRC atlas data.These slides are discarded in the subsequent analyses.The markers, thresholds, adjusted thresholds, and number of slides that do not converge are all listed in Supplementary Table S4.
We compare GammaGateR results to ASTIR, another unsupervised cell-level phenotyping software developed for mIF data (Geuenich et al. 2021).ASTIR uses a combination of neural networks within a multivariate normal mixture model to obtain cell phenotypes from marker channel expression data and returns a vector of phenotype probabilities that sums to one for each cell.ASTIR takes the marker expression of each cell and an XML file as input to specify phenotypes based on prespecified marker combinations.The phenotypes input for ASTIR for the three datasets are given in Supplementary Tables S1 and S2.

Methods and phenotyping evaluation
To compare the methods in determining cell phenotypes we assess the accuracy of each method relative to a "silver standard" manual phenotyping in the Colon MAP and CRC atlas datasets and evaluate the efficacy in predicting survival in the GammaGateR ovarian cancer data.We refer to the manual labels as a silver standard because they do not necessarily represent a cell's true underlying cell type as they are an estimate obtained using the manual procedure described below.We compare phenotyping results obtained using GammaGateR and ASTIR to silver standard manual phenotyping using the Adjusted Rand Index (Hubert and Arabie 1985).Adjusted rand index typically takes values between 0 and 1, where a larger value indicates a better alignment between two categorical variables.The silver standard is obtained by gating the raw images based on visual inspection and defining marker-positive cells as those that contain more than 50% marker-positive pixels within each cell.Semi-automated marker gating results are obtained using GammaGateR as described in Section 2.3, with monotonically adjusted posterior probability and marginal probability thresholded at 0.5 to define marker-positive cells.The same phenotype definitions used for ASTIR are used to define phenotypes from marker-positive labels using GammaGateR and manual marker gating as well (Chen et al. 2021, Heiser et al. 2023).Each cell belongs to a given phenotype if it is marker-positive for combinations of markers for that phenotype (Supplementary Tables S1 and S2).ASTIR phenotypes are determined by selecting the cell type with the maximum probability for each cell.All methods use the same combinations of markers to define phenotypes (Supplementary Table S1 and S2).
Because the Ovarian Cancer dataset does not include manual cell phenotypes, we instead compare the prediction accuracy of survival time data for each method across all patients in the study to determine if one method has greater biological sensitivity than the other.The original study shows that survival of ovarian cancer patients is significantly correlated with B-cell and CD4 T-cell, as well as spatial interaction between CD4 T-cell and macrophage.Therefore, we evaluate the methods using this dataset by fitting a survival model with age, cancer stage, B-cell proportions, CD4 T-cell proportions, and the spatial interaction between Macrophage and CD4 T-cells estimated by Ripley's H with r ¼ 50 (Kiskowski et al. 2009).Ripley's H is a geospatial index that describes spatial attraction/repulsion.We fit a model for each method, where the cell phenotypes are determined using the given method, and compare all models, as well as a base model that includes only age and cancer stage.We use a random forest survival model to be sensitive to complex nonlinear relationships (Ishwaran et al. 2008, Ishwaran andKogalur 2023).To estimate variability in the out-of-bag performance error, we compare the methods across 100 bootstrap samples.Performance error quantification is based on C-index (Harrell et al. 1982), where a low performance error means that the model is a good fit, and a performance error of 0.5 is a random chance.

Evaluation of GammaGateR using simulations
We built simulations to evaluate GammaGateR's performance with batch effect, and varying proportions of markerpositive cells.To simulate batch effects, we assumed that the proportion of expressed cells across all samples were equal, but all other features of the distribution could vary across slides: the proportion of zeros and the parameters of the gamma distributions for the unexpressed and expressed cells (six parameters total) all varied.The proportion of expressed cells was fixed at 0.05 or 0.001 across all slides and we evaluated GammaGateR's ability to estimate the proportion of expressed cells and identify cell phenotypes.To generate realistic differences in the gamma parameters in the simulations, we sampled 100 sets of the gamma mixture model parameters for each from the fitted results of 16 slides in Colon MAP dataset using bootstrapping.For each slide in the sample, the proportion parameters were rescaled with the expression proportion being a fixed number: for a set of parameters for a slide with zero-expression proportion being λ 0 , and the nonexpression proportion being λ 1 , the simulated slide will have an expression proportion of 0.05, a zero-expression proportion of 0:95λ 0 λ 0 þ λ 1 , and an unexpressed nonzero proportion of 0:95λ 1 λ 0 þ λ 1 ,.The gamma distribution parameters α 1 ; α 2 ; β 1 ; β 2 are as estimated in the bootstrapped slide.A gamma mixture model was then used to simulate cell expression for each slide with the rescaled proportion and the fitted model parameters.
We constructed three sets of threshold scenarios to investigate their impact on the GammaGateR fitting process.The first scenario used "accurate" constraints where the fixed thresholds are chosen according to Fig. 1.2, and the quantile thresholds are the proportion of expressed cells per simulation setup.The second scenario, used "inaccurate" constraints that were intentionally chosen to be higher than the reasonable thresholds.The last scenario, "none," did not include boundary constraints.
Finally, GammaGateR was fitted on each of the simulated slides to estimate the unknown parameters.Because the proportion of expressed cells is known in this simulation, we could evaluate the bias estimates obtained across the 100 simulated samples relative to the true proportion.We also evaluated how well the model identifies cell types using ARI.These simulation analyses were performed for the CD11B, Lysozyme, CD68, CD8, CD4, CD3D, and CD20 markers to assess bias of the estimates for different marker distributions.

Spatial analysis in colon MAP
We use GammaGateR to perform analyses of spatial features of the tumor microenvironment in precancerous colon polyps using the Colon MAP dataset (Chen et al. 2021).Specifically, we summarize immune cell population proportions (CD3þ) and MUC5AC expression in the manually annotated tumor region of each sample, as well as quantify the spatial interaction between MUC5ACþ cells and CD68þ cells.To estimate cell population proportions, instead of dichotomizing cell populations, we average the posterior marker-positive probabilities across all cells in the tumor region.For example, the proportion of CD3þ cells in the tumor epithelium is quantified as where x i is the CD3 marker value for cell i, in the tumor epithelial region, n s is the total number of cells segmented in the tumor epithelial slide s, and PðZ ¼ 2 j X ¼ x i Þ is as described in Equation (2).To study differences in cell proportions between tumor types we fit a linear regression model on the logit transformed probabilities with regions clustered within samples and weights proportional to the number of cells in each region.We report hypothesis tests using robust standard errors and a robust effect size index (RESI) with 95% confidence intervals to quantify the effect of tumor types on marker channels (Jones et al. 2022, Kang et al. 2023).
To quantify spatial clustering, we threshold the posterior probabilities and use Ripley's H with r 2 ð0; 1000Þ to quantify spatial attraction or repulsion of MUC5ACþ and CD68þ cells within each region of each tissue sample (Kiskowski et al. 2009).After estimating Ripley's H, we average the index across all radii and test the difference between groups using a Welch t-test.

Performance evaluation
We compare the ARI of each method relative to a silver standard manual phenotype in the Colon Map and CRC atlas datasets.For both datasets and all cell types in these datasets, the posterior probability by GammaGateR yields higher Median ARI (Fig. 2a and b).This means that the posterior probability has consistently greater similarity to the silver standard than marginal probability and ASTIR (Fig. 2a and  b).However, for some cell types (e.g.Macrophage, B-cells, Myeloid, Regulatory T cells), all methods have low performance.This is an indication of systematic difference in how the algorithms identify positive cells relative to the manual labels.
To evaluate the methods in the Ovarian cancer data in the absence of manual phenotypes, we use proportions and spatial characteristics of the phenotypes from each method as predictors in a random forest survival model.For all methods, incorporating the cell-level information reduces out-ofbag error performance by approximately 0.075, over the base model that includes only age and cancer stage.This indicates that spatial cell phenotype covariates are useful in predicting survival outcomes, consistent with the original findings (Steinhart et al. 2021).The posterior probability slightly outperforms other methods, having the lowest prediction error in 46% of the bootstrap samples, compared to 36% with ASTIR and 18% with the marginal probability (Fig. 2c).

Simulation results
We used simulations to evaluate GammaGateR's performance in the context of batch effects and noise.When the true proportion was 0.05, the phenotyping accuracy (ARI) was highest when the boundary constraints were accurately specified, but still high without boundary constraints, and lowest when the boundary constraints were inaccurately selected (Fig. 3a).The results evaluating the smaller expressed cell proportion (0.001) had lower accuracy; however, the accurate boundary constraints still had the smallest bias and variance (Fig. 3b).With much smaller expressed cell proportion, failure to specify a boundary constraint led to much lower accuracy in the ARI and bias, but accurate boundary constraints performed well (Fig. 3c and d).Inaccurate boundary specification caused 279 convergence failures when estimating the models when the true proportion was 0.05, and 420 convergence failures when the true proportion was 0.001.Among the three types of fittings, the variance of no boundary fitting in proportion estimation is the most significant, and the smaller expression portion (0.001) increased this variance.Overall, with accurate boundaries, GammaGateR is minimally affected by batch effects.However, under inaccurate or no boundaries, GammaGateR performs less well, especially when expressed cells are very rare.

Cell spatial interaction in Colon MAP
We use marker gating results from GammaGateR to compare spatial features across adenoma and SSL samples.We hypothesize that SSLs, which are precursors to CRCs with high microsatellite-instability that have better prognosis, have greater immune cell infiltration in the epithelial tissue (greater proportion of CD3þ cells), greater expression of MUC5AC in the tumor region, and that cells expressing MUC5AC spatially attract macrophages (CD68þ cells) in SSLs, but not in adenoma samples (Chen et al. 2021).
From the results, we observe that SSLs are associated with enriched immune cell populations (CD3þ cell proportions) in the epithelial region of the tumor tissue [Fig.4a 4c).The concordance between quantified measurements generated by GammaGateR and qualitative observation conclusions shows that GammaGateR can help solidify observed spatial pattern.

Summary and discussion
We introduced GammaGateR, a semi-automated marker gating tool.Driven by a novel cfGMM estimation framework, GammaGateR generates reproducible and evaluable marker gating for mIF data.In addition, cfGMM enables computationally feasible model estimation for large-scale datasets like mIF.The marker gating output of GammaGateR can be used to define phenotypes and as input to downstream analysis.GammaGateR implements interactive visualization to quality check the clustering results (see vignette in Supplementary Material) and allows users to modify the constraints to improve model results when needed.Consequently, GammaGateR provides more consistent results with silver standard labels than ASTIR, the existing state-of-the-art method for automated phenotyping of cell-level mIF data.In the examples shown in this paper, GammaGateR phenotypes had slightly improved ovarian cancer survival prediction accuracy compared to ASTIR.This paper also compares the posterior and the marginal probabilities returned by GammaGateR.The marginal probabilities only use the marker-positive cell distribution to determine cell phenotypes, whereas the posterior probabilities take into account the distribution of the marker-negative cells.Using posterior probabilities was almost always better indicating the importance of accounting for the full distribution of the marker intensities when identifying marker-positive cells.
Despite being an effective tool with a high level of implementation, GammaGateR could still use a few improvements to broaden its usage.First, the accuracy of GammaGateR might further improve with pixel-level information.GammaGateR is applied to segmented cell-level data and does not rely on patterns of pixel intensities within and around each cell.While this makes GammaGateR easy to   et al. 2022et al. , Liu et al. 2023)).Second, GammaGateR focuses on obtaining a highly accurate fit to the marginal distribution of each marker channel for each slide and is applied separately to each slide and channel, making it highly parallelizable.One disadvantage of this approach is that it does not explicitly account for the joint distributions of markers, which could be used for obtaining more accurate phenotypes.
Instead, multiple markers can be incorporated by combining the posterior probabilities to define phenotypes as was performed above in the Colon MAP and CRC atlas datasets.Future work could model the multivariate distributions.However, it should be noted that this approach risks phenotypes being driven, in part, by channels that might not be relevant for a given cell type.Third, more thorough evaluation of batch effect could render more reasonable marker-gating output.We applied normalization prior to running GammaGateR to reduce the influence of batch effects.Although GammaGateR does not explicitly model and remove batch effects, because the model is fit separately for each slide, it may help to mitigate batch effects.When using the posterior probabilities in downstream analysis, an implicit assumption is that differences in the shape and location of the marker-positive component do not represent biological differences.Finally, in our applications, five models had too few cells to estimate or did not converge and were excluded from the analysis.GammaGateR parameter estimates for the failed models could be imputed by combining estimates across slides, which might also help with the estimation and removal of batch effects.GammaGateR is the first semi-automated marker gating method developed specifically for mIF data and is useful to define quality-controlled marker-positive cells by flexibly modeling marker distributions across cells and channels.GammaGateR has demonstrated consistency with manual labels and sensitivity to biological information which makes it another useful method for the multiplexed imaging scientist.
Equation (1) is equal to the gamma distribution with shape a k and scale b k .Closed-form estimators of a k and b k are obtained by differentiating Equation (1) with respect to b k and γ k , setting the derivative to 0, solving for a k and b k ,

Figure 1 .
Figure 1.Overview of GammaGateR analysis pipeline for the CD4 marker channel.(1) GammaGateR takes segmented cell-level data as input.(2) Density polygons are used to visualize all slide level histograms and select constraints for model fit (3).(4) After model estimation (5.a), diagnostic plots are used to evaluate the model fit.(5.b)New constraints can be selected and the refitted model can be compared to a previous model.(6) Expression probabilities can be extracted for downstream analysis from the model objects

Figure 2 .
Figure 2. Performance evaluation for GammaGateR on the three datasets."Posterior" and "marginal" refer to the posterior and marginal probabilities from GammaGateR, respectively.Cell phenotyping performance comparing GammaGateR to ASTIR in the (a) Colon MAP and (b) CRC Spatial atlas.(c) Survival prediction performance error in the ovarian cancer dataset, measured by 1-C-index."Base" indicates the survival model including only age and cancer stage variable apply in many datasets, it obviously does not leverage the information available at the pixel level around each cell, which may be useful for identifying doublets and improving cell phenotype identification as in recent work (Lopez de Rodas

Figure 3 .Figure 4 .
Figure 3. Simulation results with known ground truth and realistic batch effects generated from the Colon MAP dataset across 100 simulated slides.(a) Adjusted Rand index (ARI) and (b) bias for simulated data with a known expressed cell proportion of 0.05.(c) ARI and (d) bias for simulated data with a known expressed cell proportion of 0.001.ARI compares cell phenotypes identified thresholding GammaGateR posterior probabilities to the ground truth generated from the simulation.The dashed lines (b and d) indicate the true expressed cell proportion, which is the same across markers, boundary types, and slides

Table 1 .
Sample characteristics for each dataset.