ICAT: a novel algorithm to robustly identify cell states following perturbations in single-cell transcriptomes

Abstract Motivation The detection of distinct cellular identities is central to the analysis of single-cell RNA sequencing (scRNA-seq) experiments. However, in perturbation experiments, current methods typically fail to correctly match cell states between conditions or erroneously remove population substructure. Here, we present the novel, unsupervised algorithm Identify Cell states Across Treatments (ICAT) that employs self-supervised feature weighting and control-guided clustering to accurately resolve cell states across heterogeneous conditions. Results Using simulated and real datasets, we show ICAT is superior in identifying and resolving cell states compared with current integration workflows. While requiring no a priori knowledge of extant cell states or discriminatory marker genes, ICAT is robust to low signal strength, high perturbation severity, and disparate cell type proportions. We empirically validate ICAT in a developmental model and find that only ICAT identifies a perturbation-unique cellular response. Taken together, our results demonstrate that ICAT offers a significant improvement in defining cellular responses to perturbation in scRNA-seq data. Availability and implementation https://github.com/BradhamLab/icat.


Introduction
From deconstructing tumor cell compositions (Schelker et al. 2017), to reconstructing developmental pathways (Wagner et al. 2018), single-cell RNA sequencing (scRNA-seq) has revolutionized scientists' ability to explore complex tissues and systems. Clustering of individual cells is central to the analysis of scRNAseq data, in which cell states are identified by grouping cells with similar gene expression profiles (Luecken and Theis 2019). Whether to ascertain the effect of drug treatment on cancer viability or to infer the mechanistic role of a given gene in tissue specification, perturbation experiments have long been used to dissect and understand complex biological systems. Oftentimes, this analysis is formalized by identifying changes in the abundance of cell states between treatments (Haber et al. 2017;Kang et al. 2018;Luecken and Theis 2019). Inherently, such analyses are based on the assumption that cell states are readily identified and matched between treatments. Recently, graph-based community detection algorithms such as the Louvain and Leiden methods have become best practice for identifying cell states in singlecondition scRNA-seq datasets (Luecken and Theis 2019). However, these methods often fail to match identities in datasets containing perturbing treatments, which present particular difficulties due to dominating treatment effects ( Supplementary Fig.   S1), where it is typical that scRNA-seq datasets segregate by treatments rather than cellular identities ( Supplementary Fig. S1), similar to the impact of batch effects (Kang et al. 2018;Luecken and Theis 2019;Kagohara et al. 2020;Lähnemann et al. 2020). This challenge presents significant difficulties in correctly evaluating cellular responses to perturbation in scRNA-seq experiments.
Common techniques for identifying cell states in scRNAseq perturbation experiments rely on either reference datasets to map cells onto known cell states (Regev et al. 2017;Schaum et al. 2018), identifying cell states via known marker gene expression (Kang et al. 2017), or the use of an integration algorithm to minimize differences between control and perturbed cells by transforming cells into a shared space prior to clustering cells (Luecken and Theis 2019;Perillo et al. 2020). The first scenario is problematic for nonmodel systems, where such atlases may not exist, and along with marker gene analysis, is inappropriate when the goal of a given experiment is discovery of novel cell states or subtypes; for these reasons, the latter approach is often preferred. Toward that end, scRNA-seq algorithms have been developed that broadly integrate heterogeneous datasets (Hie et al. 2019;Luecken and Theis 2019;Stuart and Satija 2019). Originally developed to remove batch effects (Haghverdi et al. 2018), many of the most widely used integration algorithms work by matching mutual nearest neighbors between batches in order to learn nonlinear transformations that minimize neighbor distance in an embedded space (Haghverdi et al. 2018;Hie et al. 2019;Korsunsky et al. 2019;Stuart and Satija 2019). While useful for removing technical variation from datasets, integration algorithms are unable to discriminate technical from biological noise: thus, their use may be inappropriate when biological differences exist between samples, such that a one-to-one correspondence between cell states cannot be assumed. This difficulty is reinforced by recent findings which show that integration methods often smooth over and eliminate biologically meaningful signals, leading to incorrect cell state identification and possibly wholesale erasure of real cell states (Bü ttner et al. 2019;Luecken and Theis 2019;Tyler et al. 2021;Luecken et al. 2022).
We therefore developed a two-step algorithm to robustly and sensitively Identify Cell states Across Treatments (ICAT) in scRNA-seq data. In contrast to integration approaches that minimize differences between samples, ICAT uses a novel approach that relies on self-supervised feature selection and control-guided clustering to identify cell states using biologically relevant features. While other methods exist to either identify shared clusters between treatments (Barron et al. 2018) or to leverage sparse features (Hua et al. 2020), to our knowledge, ICAT is the only method capable of doing both while also handling multiple experimental conditions. Our results show that by emphasizing cell state-defining genes, ICAT reliably identifies cell states across scRNA-seq perturbation experiments with high accuracy. Unlike marker-based approaches, ICAT is able to discover novel cell states, making perturbation analysis in under-studied systems feasible. Importantly, ICAT does not require prior knowledge of marker genes or extant cell states, is robust to perturbation severity, identifies cell states with higher accuracy than leading integration workflows with both simulated and real scRNAseq perturbation experiments, and operates reliably at low signal resolutions.

ICAT algorithm overview
To overcome the inability of current clustering algorithms to accurately detect cell states in perturbation experiments ( Supplementary Fig. S1), we developed a two-step algorithm (Fig. 1A). The purpose of the first step is to determine the cluster-defining features and weight them among the control cells alone. The goal of the second step is to use those feature weights to co-cluster the controls and the perturbed cells together, effectively assigning the perturbed cells to control clusters. Perturbed cells may also be optionally clustered separately to determine their feature weights to detect cell states that are asymmetrically present in the perturbed condition. Using the Louvain method (Blondel et al. 2008) by default, the first step separately clusters control cells to produce initial cluster labels for each cell. Cluster labels are then used to weight genes by their ability to predict the previously generated labels using Neighborhood Component Feature Selection (NCFS) (Yang et al. 2012). NCFS weights genes by using gradient ascent to maximize a regularized leave-one-out classification accuracy metric. By the nature of regularization, most gene weights converge to zero, leaving only the most predictive genes with weights >1. Applying the learned gene weights to the original expression matrix with both control and treated cells transforms the matrix into a "cluster-defined" space, where the distances between cells are dominated by highly weighted marker genes. After this transformation, ICAT retains previously identified cluster labels for control cells, then, in the second step, performs semisupervised Louvain community detection, whereby control labels are held immutable (Fig. 1A) (see Supplementary Methods for details).
This strategy achieves three goals: (i) Because NCFS is a sparse method, it overtly separates informative and noninformative genes, making it possible to cluster directly on cellstate defining genes while simultaneously removing likely sources of biological noise. (ii) ICAT produces easily To identify cell states across treatments, ICAT first performs self-supervised feature weighting to find genes that discriminate cell identities among control cells alone, followed by semisupervised clustering using the newly transformed expression matrix. To learn feature weights, ICAT clusters control cells using Louvain community detection, then uses the newly generated cluster labels as input into NCFS to weight genes by their predictiveness. After applying the learned gene weights to the original gene expression matrix, ICAT clusters both treated and control cells together using a semisupervised implementation of Louvain community detection. During this process, ICAT holds the previously identified cluster labels for the control cells immutable. (B) The schematic illustrates the ICAT C þ T implementation, which expands feature weighting to treated cells to identify asymmetrical populations between treatments. Cells are split along treatments and independently clustered using the Louvain method; then, cluster labels are used to learn gene weights using NCFS in each treatment set independently. To retain asymmetrically informative genes, weights for each treatment are concatenated row wise and subsequently reduced to the maximum weight using a row-wise maxpool function. The reduced weight vector is then used to transform the original count matrix.
interpretable results by clustering on the expression patterns of a few, highly weighted genes. This is a noted benefit over typical integration methods, where transformations are obscured behind neighbor-based approaches (Hie et al. 2019;Stuart and Satija 2019). (iii) ICAT makes comparing treatment effects on cell states straightforward. By implementing semisupervised clustering with immutable control labels, ICAT ensures cell states discovered in control cells are retained when treated cells are introduced, alleviating concerns of artifact-induced cell states upon integration due to erroneously grouping cells. These three features together produce interpretable results that ease compositional analysis between treatments.
While similar in approach, ICAT provides several advantages compared with identifying shared cell states via marker gene mapping: first, ICAT does not require previously known marker genes, making ICAT appropriate for situations with previously uncharacterized cell states. Second, by clustering control and treated cells together, ICAT takes advantage of the structure in both datasets compared with clustering both independently. This alleviates issues with imperfect matches between marker genes. Third, by foregoing common marker gene presence-absence comparisons to map cell types, ICAT accurately identifies activated states, which is otherwise difficult. Finally, by generating cell state labels through a clustering step rather than a classification approach, ICAT is able to group cells with treatment-specific phenotypes and identify out-of-sample states.
One caveat of our approach is that if asymmetrical populations exist in treated cells, but not controls, their identifying genes will likely be collapsed by the control-only featureweighting step. Depending on the preferred cell-state resolution, this may or may not be desirable. To account for both resolution levels, researchers can choose to perform featureweighting in all treatment types to identify asymmetrical populations in each condition (Figs 1B,2A,and 3A). To reduce gene weights to a single gene-weight vector, only the maximum gene weight among the conditions is retained. During semisupervised clustering, labels for control cells are retained during label initialization, while treated cells are reinitialized as singleton clusters.

Data simulation
To assess ICAT, we initially tested performance in simulated scRNA-seq data . We simulated data using a zero-inflated Negative Binomial distribution to model gene counts per Bü ttner et al. (2019) (see Supplementary Methods for details). To simulate perturbations and apply system-wide disruptions to gene expression patterns, we randomly selected genes as perturbation targets and subsequently sampled scalar multipliers from a Cð2; 2Þ distribution to alter average expression values for each disrupted gene. To generate distinct cell types, we randomly selected a separate set of genes to act as markers for each population. Selected marker genes had their average expression shifted by a scalar multiplier following a Cð3; 3Þ distribution to create population substructure.

ICAT analysis in real datasets
To assess ICAT's ability to correctly cluster cells in real perturbation experiments, we made use of three publicly available datasets (Kang et al. 2018;Tian et al. 2019;Kagohara et al. 2020). While no true benchmark datasets currently exist for defining cell states in perturbation experiments, we selected datasets that either contain genotyped cell lines, or are from well-studied cell types with confidently labeled cells. Because NCFS feature-weighting is a computationally intensive task, and since the Kagohara and Kang datasets each contain >20 000 cells, it was necessary to subselect cells for feature weighting ( Supplementary Fig. S2). To this end, ICAT selects a representative sample of cells via submodular optimization using the apricot (Schreiber et al. 2020) Python package. Submodular optimization via apricot was able to effectively represent the data space using only a small number of cells ( Supplementary Fig. S3).
We further validate ICAT's predictions in vivo using a newly generated sea urchin scRNA-seq dataset consisting of one control and two experimental conditions. Treatmentunique cell states were first identified using ICAT; then, empirically assessed using quantitative single-molecule fluorescence in situ hybridization (smFISH) to evaluate both the cell state-defining expression patterns and the predicted changes in relative cell state abundance between treatments.

Benchmarking ICAT's performance
To benchmark ICAT against other general workflows for identifying cell states in multicondition scRNA-seq experiments, we compared four general approaches: (i) ICAT, (ii) integrating datasets across treatments followed by Each dot represents a cell with circles representing control cells and crosses denoting treated cells. Dots are colored by ground truth identity (left column), cluster label produced by performing Louvain community detection on the raw count matrix (middle column), and clusters labeled produced by ICAT (right column). (B) Average agreement between ground truth label and cluster labels produced by clustering the raw data (blue) and ICAT (orange) as measured by the ARI. Error bars represent the 95% CIs for the mean ARI for each method. Five different cellular composition conditions were simulated: "All same," both control and treated cells share the same two cell states; "Rx Unique," treated cells contain a treatment-unique cell state; "Control Unique," control cells contain a unique cell state; "Both Unique," both treated and control cells contain treatment-specific cell states; and "None Same," no shared cell states between treated and control cells. Each condition was simulated 15 times (n ¼ 15). Simulations were evaluated using the ICAT C implementation.
clustering cells via Louvain clustering, (iii) integrating datasets across treatments followed by clustering cells via ICAT, and (iv) naive clustering where we tested the "No Integration" scenario in which cells were clustered using Louvain clustering without accounting for treatment status ( Supplementary Fig.  S4). Each workflow focuses on identifying cell states, making them intrinsically comparable despite algorithmic differences.
In our evaluation, we tested three state-of-the-art integration methods: Seurat (Satija et al. 2015;Stuart and Satija 2019), Scanorama (Hie et al. 2019), and Harmony (Korsunsky et al. 2019). These methods were chosen due to their ubiquity in single-cell analysis, as well as their top performance in rigorous benchmarks comparisons (Luecken et al. 2022). Accordingly, we also evaluated clustering performance for three extended workflows: ICAT Seurat , ICAT Scan , and ICAT Harm whereby data from cells were first integrated using the respective integration algorithm prior to clustering with ICAT. While ICAT is relatively robust to parameter choice ( Supplementary Fig. S5), we used standardized nearest neighbor and resolution parameters for each method during clustering for appropriate comparisons (Supplementary Methods).
When evaluating cell state labels produced in simulated experiments, we considered three criteria reflecting reliable clustering between control and treated cells: (i) Global label conservation: to ensure cluster labels accurately reflect the underlying biology, we used the Adjusted Rand Index (ARI) to measure the extent of mapping between cluster labels and known cell states (Hubert and Arabie 1985). (ii) Treatment mixing: the local homogeneity of shared cell types without undo isolation by treatment status was assessed with the Local Inverse Simpson's Index (LISI; Korsunsky et al. 2019); (iii) Unique label separation: the accurate detection of populations of interest was measured using population-specific F1 measures (Luecken et al. 2022). Such metrics were used to measure the ability to accurately detect asymmetric populations that exist only in single conditions (F1-Unique), the ability to discriminate stimulated cell states from nonstimulated analogs (F1-Stim), and the identification of rare cell types (F1-Rare). ARI, LISI, and all F1 metrics were standardized between 0 and 1, where 1 indicated stronger performance.
Because cell state is resolution dependent and likely imperfect in real datasets (Luecken and Theis 2019; Lä hnemann et al. 2020), when evaluating performance using real data, we also calculated the label-free Davies-Bouldin (DB) metric to assess the density of clusters produced by each method (Davies and Bouldin 1979). The DB metric measures similarity between a cluster and the next most similar cluster. To facilitate comparison, each score was scaled between 0 and 1 such that 1 represents better clustering, whereas 0 represents worse clustering.

Availability of data and materials
ICAT, ncfs, and sslouvain Python packages are freely available on pip and github (github.com/BradhamLab). Scripts to perform simulated data generation and benchmark comparisons are available as a SnakeMake workflow at (github.com/ BradhamLab/ICAT_manuscript). Archived data files for both simulated and real data are also available at the same repository. The pipeline to align and quantify SMART-seq2 reads is freely available at (github.com/BradhamLab/scPipe), and the raw sequencing data are available at GEO accession GSE164240. Likewise, all scripts to segment primary mesenchyme cells (PMCs) and quantify HCR FISH signal are available at (github.com/BradhamLab/hcr_quantify) and the image data are available upon request.

ICAT accurately clusters cells in perturbation experiments regardless of compositional asymmetries
To validate the general ICAT workflow (Fig. 1), we simulated data to compare ICAT to naively clustering cells using Louvain community detection without accounting for treatment status ( Fig. 2A). To initially validate the general approach of ICAT, we first performed a set of simple experiments consisting of equal size populations to compare ICAT to naive Louvain clustering ( Fig. 2A, top row). Subsequently, we tested four different simulated experiments with each condition consisting of a different number of shared and unique populations between control and treated cells (Supplementary Table S1). ICAT exhibited near-perfect clustering with ARI scores near 1 across five different simulated experiments (Fig. 2B). Without accounting for the systematic noise introduced by perturbations, naive Louvain clustering erroneously separates cell states by treatment status (Fig. 2A) and only produced accurate cluster labels when no shared cell states between control and treated cells existed (Fig. 2B). These results not only demonstrate the necessity of accounting for treatment status while identifying cell states in perturbation experiments, but also provide validation for ICAT's novel approach.
3.2 ICAT is robust to perturbation severity and better resolves cell states at lower resolutions than leading integration workflows To further challenge ICAT's ability to robustly detect cell states in more complex scenarios, we simulated five populations of cells: three that are present in both control and treated cells, with one of these exhibiting elevated marker gene expression upon treatment and referred to as "stimulated" (population C1 and P(C1)þ), and two treatment-only populations (populations P4 and P5) ( Fig. 3A and B). These populations were simulated under varying conditions to assess ICAT's robustness to perturbation severity, degree of cell state separation, and different population abundances.
We first assessed ICAT's robustness against increasing perturbation severity, testing five conditions with varying proportions of perturbed genes, ranging from 1% to 25% ( Fig. 3C and Supplementary Table S2). Next, we assessed performance with changing resolutions between cell states by simulating 10 different experimental conditions with a varying number of average marker genes per cell type. Mean marker gene numbers ranged from 10 to 105 per population (0.67%-7.0% of total genes) ( Fig. 3D and Supplementary  Table S3). In both cases, each experiment was simulated 15 times, for a total of 90 independent datasets in the first case ( Fig. 3C) and 150 in the second case (Fig. 3D). Finally, to test performance as cell type proportions deviate from equality, we simulated three different experimental conditions with varying distributions of each cell type. Experiments ranged from the most similar, with all cell types having 100 cells per treatment, to the most disparate where the minimum and maximum population size per treatment were 50 and 150, respectively ( Fig. 3E and Supplementary Table S4). To isolate the effect of differential cell abundance, proportion simulations did not include an activated cell type, and instead were composed of three shared populations and two treatmentunique populations (Fig. 3E).
When we tested perturbation severity or variable numbers of marker genes ( Fig. 3C and D), ICAT outperformed both Scanorama and Seurat across all metrics while exhibiting highly stable performance across both increasing perturbation intensities and at varying cell-state resolutions (Fig. 3F). ICAT provides near-perfect treatment mixing, with LISI scores nearing 1 for all signal levels. In contrast, Scanorama and Seurat only approach 0.5 LISI scores at the highest signal levels. Harmony generally produces higher LISI scores than both Scanorama and Seurat, but displays significant performance loss as perturbation severity increases, while failing to match ICAT performance across cells state resolutions, indicating that ICAT exhibits a substantially improved ability to correctly and robustly mix cell states across treatments, even at low signal levels at which the standard integration algorithms fail (Fig. 3F).
Scanorama offered comparable performance to ICAT in global label conservation (ARI) and in the ability to detect asymmetrical populations (F1-Unique) (Fig. 3F), while ICAT was uniformly better at accurately distinguishing the stimulated population from its nonstimulated analog (Fig. 3F, F1-Stim). Seurat underperformed both Scanorama and ICAT in label conservation and asymmetrical population detection (ARI, F1-Unique), and offered comparable treatment-mixing performance to Scanorama (LISI), indicating that Seurat does not accurately separate known populations relative to Scanorama or ICAT (Fig. 3F). Harmony displayed inconsistent performance, showcasing dependence on both perturbation severity and marker gene abundance to correctly identify asymmetrical populations (F1-Stim, F1-Unique).
For each of the perturbation severity tests and with low marker gene numbers, Seurat and Harmony were surprisingly outperformed by the no-integration scenario for ARI (Fig. 3F) which can be explained by both methods' poor identification of asymmetrical populations (F1-Unique and F1-Stim, respectively) reflecting a general under-clustering of cells ( Supplementary Fig. S6). Scanorama exhibited a similar drop in ARI performance at low marker numbers, but consistently outperformed no integration (Fig. 3F). These results show ICAT more accurately and robustly resolves cell states across experimental conditions than either Seurat or Scanorama integration workflows.

ICAT most accurately detects rare populations
ICAT was unaffected by differences in cell state proportions as assessed by global and rare population label conservation via ARI and the F1-Rare metric, respectively, while each integration approach exhibited decreased performance as the disparity in proportions increased (Fig. 3G, left). The performance of ICAT was robust and stronger than Scanorama, Seurat, and Harmony in each measure.

ICAT improves and stabilizes the performance of integration workflows
Clustering cells in integrated datasets with ICAT, in lieu of traditional Louvain community detection, produced overall higher quality clusters compared with standard integration workflows. Compared with the typical Seurat and Harmony workflows, ICAT Seurat and ICAT Harm led to significant improvements in all three simulated experiments, leading to greater robustness to perturbation (Fig. 3F1, left), increased signal sensitivity (Fig. 3F2, left), and better isolation of rare cells (Fig. 3G, right). When compared with Scanorama, ICAT Scan better isolated rare cells (Fig. 3G, right), and improved performance in some metrics in perturbation and signal experiments. Gains were more modest compared with ICAT Seurat and ICAT Harm , primarily since Scanorama with Louvain clustering already exhibits strong performance ( Fig. 3F1 and F2, left). In particular, ICAT Scan substantially improved treatment-mixing (LISI) across conditions compared with Scanorama with Louvain (Fig. 3F1, right). Interestingly, at both signal extremes, stand-alone ICAT scored higher in the F1-Stim metric than integration methods whether combined with Louvain or ICAT for clustering (Fig. 3F2), indicating that ICAT preserves discriminatory signals better than either integration algorithm.
In summary, the results from simulated data demonstrate that ICAT offers robust and accurate cell state identification that exceeds three current state-of-the-art integration algorithms, across a range of simulated conditions including severe perturbations, low marker gene numbers, and variations in relative cell abundance. Further analysis shows that identifying cell states with ICAT following integration produces higher quality clusters compared with current integration workflows alone. Thus, the novelty of using learned feature weights combined with control-guided clustering offers a significant performance improvement over state-of-the-art integration approaches in accurately defining the cellular compositions of single-cell datasets; moreover, the results show that, unlike integration approaches, ICAT's performance is robust across a range of conditions.

ICAT produces higher quality clusters in real datasets compared with integration methods
To ensure ICAT produced clusters that accurately reflect known biological cell states, we evaluated ICAT in three separate real scRNA-seq datasets. We first used the CellMix dataset generated by Tian et al. (2019). The dataset was produced by creating various mixtures of nine cells from combinations of H2228, HCC827, and H1975 adenocarcinoma cell lines, sequencing the cell mixtures, then downsampling the resulting counts to produce pseudocells that are in-line with true singlecell data. While the Tian dataset was developed to test integration across batch effects, we reasoned that variance introduced by batch effects offers an imperfect substitute for a perturbation. To better mimic distinct phenotypes, we selected only four cell "types," consisting of pure H2228, HCC827, and H1975 cells, along with a single mixture with equal proportions of each cell line, denoted "Mixed" (Supplementary Fig. S7 and Supplementary Table S5). The second dataset from Kagohara et al. (2020) offers a simple perturbation experiment in which three genotyped cell lines of head and neck squamous carcinoma cells were treated with either PBS (control) or the chemotherapeutic cetuximab (Supplementary Table S5). Finally, we assessed performance in a peripheral blood mononuclear cell dataset collected by Kang et al. (2018) with eight cell types in control and interferon-ß (IFN-b)-stimulated conditions (Supplementary Table  S5). Both the Tian and Kagohara datasets were chosen for having gold-standard cell labels via genotyping, while the Kang dataset was selected to include a more complex, but well studied, system with silver-standard cell labels identified via known marker genes. Prior to analysis, each dataset was preprocessed by normalizing cell counts and selecting highly variable genes using Scanpy. To ensure each tested algorithm analyzed the same input data, we provided the same Scanpy preprocessed data to each method.
In the Tian dataset, ICAT outperformed Seurat and Scanorama in LISI and DB metrics and was comparable to both in ARI, where Harmony and Seurat scored highest ( Fig. 4A and B and Supplementary Table S6). ICAT outperformed all three integration methods in the Kagohara and Kang datasets (Fig. 4A and B). Aside from ARI in the Kagohara dataset, in which Scanorama and ICAT scored equally well (Fig. 4B), ICAT produced the highest scores for ARI, LISI, and DB metrics (Fig. 4B). Overall, ICAT generated higher quality clusters in real datasets compared with existing integration algorithms (Fig. 4A and B and Supplementary  Table S6).
Compared with Scanorama, ICAT Scan led to greater ARI, LISI, and DB scores across the three analyzed datasets ( Fig. 4C and Supplementary Table S6), aside from ARI in the Kagohara dataset where Scanorama and ICAT Scan scored equally well (Fig. 4C). For Seurat and Harmony, ICAT Seurat and ICAT Harm resulted in improvements that exhibit dataset and metric dependence. ICAT Seurat and ICAT Harm more readily identified cell states in the Kagohara dataset, shown by the dramatic increase in ARI scores, and clusters were tighter and better separated as shown by large increases in DB scores (Fig. 4C). In contrast, ARI scores were maintained or marginally reduced compared with traditional Seurat þ Louvain or Harmony þ Louvain workflows for the Tian and Kang datasets. The results demonstrate that ICAT offers a marked improvement in cell state identification in multicondition . ICAT outperforms current integration methods in identifying cell states across treatments in real datasets. scRNA-seq data from Tian, Kagohara, and Kang studies is compared. (A) Spider plots compare the performance for each algorithm within each dataset for ARI, LISI, and DB quality metrics. (B) Lollipop plots highlight the differences in the metrics for each method across the same datasets. (C) Spider plots comparing Seurat to ICAT Seurat , Scanorama to ICAT Scan , and Harmony to ICAT Harm , respectively. All metrics are scaled from 0 to 1, where 1 is best (closest to the apex of each corner). DB, Davies-Bouldin metric, and otherwise as in Fig. 2. ICAT performance for the Tian dataset was evaluated using the ICAT C þ T implementation, whereas Kagohara and Kang datasets were evaluated using the ICAT C implementation.
scRNA-seq experiments compared with the current integration workflows Seurat, Harmony, and Scanorama. Thus, the performance of ICAT with real data is substantially improved compared with conventional integration þ clustering workflows, in agreement with ICAT's performance in simulated data.

ICAT alone accurately predicts subpopulation response to perturbation in vivo
To empirically test ICAT predictions on cell state compositions following perturbation, we generated scRNA-seq data using the SMART-Seq2 protocol (Picelli et al. 2014). Specifically, we isolated and sequenced the skeletogenic lineage in sea urchins known as primary mesenchyme cells (PMCs) at 18 h postfertilization to compare cells from control embryos and embryos subjected to chemical treatments that perturb skeletal patterning, specifically, MK-886 (MK) and chlorate (Pidgeon et al. 2007;Piacentino et al. 2016;Simionato et al. 2021). PMCs are of interest because this population of cells receives patterning cues during development that cause their diversification into subsets; since MK and chlorate each perturb distinct patterning cues, we reasoned that these differences would be reflected in PMC subpopulation composition (Lyons et al. 2014;Sun and Ettensohn 2014;Piacentino et al. 2015Piacentino et al. , 2016. After quality control and cell filtering, we retained expression data for 435 cells (147 control,134 MK886,. ICAT clustered the combined PMC expression profiles into five distinct cell states (Fig. 5A1). The impacts of MK exposure were more dramatic than those of chlorate (Fig. 5A2), we therefore focused on the effects of MK. Two clusters showed stark compositional differences between control and MK-treated cells, in that Cluster 3 lacks MK cells compared with controls, while Clusters 2 and 4 show significant enrichment for MK cells (G-test FDR < 0.05, post hoc Fisher's Exact test FDR < 0.05; Fig. 5A2 and Supplementary Tables  S7 and S8). Since MK-treated cells dominated Cluster 4 such that it appears to be an asymmetric population, we more closely examined Clusters 3 and 4. Investigating highly predictive genes identified by ICAT found that Clusters 3 and 4 were distinguished by inverse expression of genes pks2 and sm50 ( Fig. 5A3 and A4), such that Cluster 3 displayed a sm50þ/pks2À phenotype, while cluster 4 exhibited a reciprocal sm50À/pks2þ pattern.
To empirically test ICAT's predicted enrichment of the sm50À/pks2þ cell state and depletion of sm50þ/pks2 cells in MK886-treated PMCs, we performed smFISH to quantify expression levels for both genes in DMSO (control) and MKtreated embryos (Choi et al. 2018; Fig. 5B and Supplementary  Fig. S8). To accurately map FISH signals to PMCs, embryos were stained for PMCs using the PMC-specific 6a9 antibody. Using Ilastik (Berg et al. 2019), we trained a random forest classifier (out-of-bag error ¼ 0.14) to predict and segment individual PMCs in each confocal image stack ( Fig. 5B1 and B2 and Supplementary Fig. S9). Following methods for relative FISH quantification from the original paper (Choi et al. 2018), we calculated cell-level expression values for each PMC by first restricting fluorescence signal to PMC-labeled voxels, then calculated the average relative intensity for each gene in every PMC (Fig. 5B3 and B4 and Supplementary Fig.  S8). Across embryos, we labeled individual PMCs as sm50þ or sm50À if their standardized sm50 expression values were above the 75th or below the 25th percentiles of all PMCs, respectively ( Supplementary Fig. S10). The procedure was repeated for pks2 to determine pks2þ and pks2À PMCs. We found that MK-treated embryos displayed both a significant loss of the sm50þ/pks2À phenotype, as well as a significant enrichment of the sm50À/pks2þ cell state relative to controls Figure 5. ICAT most accurately defines subpopulation response to perturbation. SMART-seq2 was performed on cells isolated from controls or from embryos treated with the perturbants chlorate or MK-886. (A) ICAT predicts five clusters from the combined data (A1), with treatmentdependent subpopulation compositions (G-test likelihood test, FDR < 0.05; ad-hoc pairwise Fisher's Exact test, FDR < 0.05) (A2). MKinduced differences are defined by reciprocal expression patterns of ICATidentified genes, sm50 and pks2, in subpopulations 3 (sm50þ/pks2À) and 4 (sm50À/pks2þ) (A3). (B) ICAT predictions for control (B1, B3) and MK-866 (B2, B4) are validated by HCR FISH analysis (B5, B6). Automated detection and segmentation of PMCs in vivo from 3D image data are shown as various colors; each color represents an individual cell (1-2). The expression of sm50 (red) and pks2 (cyan) transcripts are shown in the same embryos; arrowheads indicate PMCs that express only sm50 or only pks2 (3-4). MK-866 treated embryos exhibit statistically significantly fewer sm50þ/pks2À PMCs (B5) and more sm50À/pks2þ PMCs (B6) than controls, consistent with the predictions from ICAT (A). Dot plot error bars denote 95% CIs of the mean with each dot representing an individual embryo [n DMSO ¼ 26, n MK886 ¼ 37; B5 (Binomial GLM, l DMSO ¼ 3.59%, l MK886 ¼ 1.91%; b MK886 ¼ À0.71, P < 10 À3 ; B6 Binomial GLM, l DMSO ¼ 0.19%, l MK886 ¼ 2.71%; b MK886 ¼ 2.60, P < 10 À4 )]. (C) Seurat (C1) fails to find any treatment effect on PMC subpopulation composition, while Scanorama (C2), and Harmony (C3) predict one and two disrupted populations, respectively (G-test likelihood test, FDR < 0.05; ad hoc pairwise Fisher's Exact test, FDR < 0.05).
(adjusted P < .05; Fig. 5B5 and B6 and Supplementary Tables S9 and S10). These results empirically validate the ability of ICAT to correctly identify affected cell states in scRNA-seq datasets from perturbed samples.
Integrating the PMC scRNA-seq data using Seurat, Scanorama, or Harmony failed to identify both sm50/pks2defined cell states affected by MK-886 treatment ( Fig. 5C and Supplementary Fig. S11). In fact, Seurat failed to identify any dependence of cell state composition on treatment status (G-test FDR > 0.05; Fig. 5C1; Supplementary Fig. S11A 1 and B1 and Supplementary Table S11). Scanorama, on the other hand, correctly identified the depletion of sm50þ/ pks2À PMCs in MK embryos ( Supplementary Fig. S11B 2 and B3 and Supplementary Tables S12 and S13; G-test FDR < 0.05, post hoc Fisher's Exact test FDR < 0.05), but importantly, failed to identify sm50À/pks2þ enrichment in MK embryos ( Supplementary Fig. S11B 2 and B3 and Supplementary Tables S12 and S13; G-test FDR > 0.05). Conversely, Harmony correctly identified the enrichment of sm50À/pks2þ PMCs in MK embryos but lacked the sensitivity to identify the depletion of sm50þ/pks2À cell states ( Supplementary Fig. S11B 3 and Supplementary Tables S14 and S15; G-test FDR < 0.05, post hoc Fisher's Exact test FDR < 0.05). ICAT is the only method among these four to correctly identify the cell-state specific sm50 and pks2 responses to MK exposure, which accounts for most of the differences in cell groupings between the methods (Jaccard similarity; Supplementary Fig. S11C). Of the compared methods, Seurat failed to produce even qualitatively correct results since it did not identify any treatment-specific effects on PMCs ( Supplementary Fig. S11A 2), contradicting the known disruptions to PMCs in response to disruption of the ALOX signaling pathways (Piacentino et al. 2016).

ICAT-selected marker genes reflect underlying biology
By leveraging weighted expression of learned cell statedefining genes, ICAT is able to more correctly label cells in mixed-condition scRNA-seq experiments compared with traditional integration workflows. A latent assumption in this process is the relative stability of marker genes in nontreatment-affected cell states between conditions. To test this assumption in real data, we performed differential expression (DE) analysis between treatments using MAST (Finak et al. 2015; Supplementary Fig. S12). In the Tian and Kagohara datasets, none of the highly weighted genes were found to be DE (log 2 fc > 1, FDR < 0.05; Supplementary Fig.  S12), while only a single gene in the Kang dataset was considered DE when a looser log 2 fc threshold was applied (log 2 fc > log 2 (1.5), FDR < 0.05; Supplementary Fig. S12). These results are consistent with the assumption of marker gene stability across perturbations. However, importantly, four marker genes were considered DE in the PMC dataset (log 2 fc >1, FDR < 0.05; Supplementary Fig. S12); these genes contributed to ICAT's ability to sensitively identify the verified sm50/pks2 perturbed cell states. These outcomes underline the robustness of ICAT when challenged by marker genes whose expression is variable. Together, these results demonstrate that ICAT's novel combination of learned feature weighting combined with control-guided clustering significantly improves the computational capability to accurately detect and describe cellular responses to perturbation in scRNA-seq data compared with conventional integration workflows.

Discussion
One of the key promises of scRNA-seq technology is to more accurately define complex biological systems by offering a more finely resolved perspective on transcriptional landscapes than was previously possible. However, due to technical and biological noise, taking advantage of such a powerful tool in diverse biological conditions has proven to be challenging. We therefore developed the ICAT algorithm to improve the accuracy of cell state identification from scRNA-seq data that includes perturbation treatments. By combining sparse feature weighting with semisupervised Louvain community detection, ICAT not only correctly identifies shared and disparate cell states, but also produces interpretable results that aid in understanding analytical outcomes and selecting the most suitable downstream analyses.
When comparing the performance of ICAT to state-of-theart integration algorithms in simulated datasets, we found that ICAT performs more robustly in response to perturbation severity, readily identifies cell states at lower signal levels, better isolates asymmetrical and activated cell states, and more readily identifies rare cell populations when compared with Seurat, Scanorama, and Harmony. ICAT also outperforms the integration workflows in three benchmark datasets, further solidifying ICAT's ability to better identify cell states across biological conditions compared with either integration method alone. By testing ICAT on datasets with >20 000 cells (Kang et al. 2018;Kagohara et al. 2020), we demonstrated ICAT's ability to analyze the large datasets, necessary for any practical scRNA-seq tool. Depending on the experimental system, researchers may still elect to use integration algorithms. Excitingly, the choice between ICAT and integration methods is not mutually exclusive; we show that following Seurat, Scanorama, or Harmony with ICAT produces higher quality clusters compared with either of these integration workflows alone.
By experimentally validating two uniquely predicted cell states in a developmental model using single-molecule FISH, we empirically demonstrated that the capability of ICAT to accurately identify perturbed cell states surpasses current state-of-the-art integration workflows. ICAT was the only method among the four tested that correctly identified these distinct states and their perturbation responses, whereas integration workflows lacked the sensitivity to properly isolate both sm50/pks2 defined subpopulations, highlighting ICAT as a substantial advancement in identifying and describing cell state behaviors across biological conditions and perturbations.
With the growing ubiquity of scRNA-seq experiments and the development of ever more sophisticated experimental designs, we anticipate that ICAT will play an important role in highly resolving cell state-specific responses to treatments and differing biological conditions. In the future, the ICAT framework can easily be extended to operate on other singlecell modalities, such as scATAC-seq, and will offer a potential route to intelligently cluster multimodal data from heterogeneous conditions.