Topological and geometric analysis of cell states in single-cell transcriptomic data

Abstract Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.


Introduction
Single-cell RNA sequencing (scRNA-seq) provides a high-throughput measurement of gene expression profiles of individual cells, which enables dissecting cellular heterogeneity at an unprecedented resolution. 1 Many computational methods have been developed for scRNA-seq data and have revealed numerous novel cell types and differentiation trajectories. 2 Clustering and trajectory inference are two main analysis tasks. They are often performed on a reduced dimensional space in which there is a metric to describe similarity among cells. In clustering, each cell cluster potentially represents a cell type and is often annotated by confirming marker genes with prior knowledge. In trajectory inference, a graph is often constructed by connecting cells with similar gene expression profiles, upon which minimal spanning trees or graph coarsening can be performed to summarize the trajectory structures.
Identifying transition cells between states is crucial for inferring the local transitions between stable cell states. Compared to cells within a single cell type, transition cells between cell types are often not as effectively captured in scRNA-seq data due to the instability of transition states. Moreover, the biological properties of cell types such as developmental potency are mostly annotated using prior knowledge. 3 Predicting the developmental potency can annotate the global temporal directionality in a dataset. Computational methods for exploring transition states and unsupervised analysis of developmental potency remain understudied.
Recently, several methods have been developed to study the transition states between cell types. QuanTC 4 and scRCMF 5 perform non-negative matrix factorizations on the cell-by-cell similarity matrix and cell-by-gene expression matrix, respectively, with each factor representing a cell type. The entropy of the assignment scores of each cell to the cell types is used as an indicator of transition cells. Soft clustering algorithms such as SOUP, 6 DBCTI 7 and scTite 8 can also derive soft assignment scores to separate pure and transition cells, using predefined criteria or entropy. MuTrans 9 models scRNA-seq data as a dynamical system based on a cell-fate dynamical manifold determined from clustering and identifies transition cells using the entropy of assignment scores of cells to sinks. Capybara 10 utilizes the vast reference databases of annotated bulk and single-cell transcriptomic data to assign cell types to the single cells and identifies cells predicted to have hybrid cell types as transition cells. These methods depend either on clustering of the data, which often requires a predefined number of clusters, or on classification of the data using reference training data. Here, we aim to explore the rich structures underlying single-cell data to infer transition cells without depending on clustering or classification results.
Analyzing pluripotency or developmental potency of cell types is valuable for refining structures and assigning global directions to the pseudo-temporal trajectories inferred from scRNA-seq data. With the accumulation of large-scale networks such as gene regulatory networks and protein-protein interaction networks, and computational methods for inferring large-scale gene networks from scRNA-seq data such as the correlation-based ones, 11,12 there exists an opportunity to infer the pluripotency by examining the structures of gene networks. For example, entropy 13,14 and curvature 15 on gene networks have been used to reflect the pluripotency of cells. These methods use global summaries of the local properties of the gene networks. Here, we aim to further use topological methods for multiscale exploration of both local and global structures of the gene networks.
The high-dimensional scRNA-seq data assembles a complex heterogeneous manifold, while the emerging field of topological and geometric data analysis (TGDA) 16 particularly aims to systematically extract structural information from such complex structures. Mapper 17 is one of the major tools in TGDA that derives a structural abstraction of often high-dimensional data and has been applied to scRNA-seq data for extracting a simplified manifold underlying the data. 18 Another major tool, persistent homology, [19][20][21] systematically examines topological features of different dimensions and at various geometric scales. Persistent homology has found applications in various biological fields such as analyzing neural activity data 22 and structure-based biomolecular property predictions. 23,24 Persistent homology is generally applicable to different types of data including point clouds, volumetric data, 25 and networks. 26 Its application in scRNA-seq data, however, remains unexplored.
Here, we aim to explore the usage of TGDA tools, specifically graph curvature and persistent homology, for establishing structure-function relationships in scRNA-seq to predict cell properties from the underlying structures of the data. We focus on two types of structures, a network of cells with cells connected based on their gene expression similarities and gene networks associated with each cell. Based on the cell network, we use Ollivier-Ricci curvature, a discretization of Ricci curvature on graphs, local persistent homology and relative persistent homology to identify transition cells independently of clustering or classification of cell types. For gene networks, we use vertex-based clique complex and edge-weighted Vietoris-Rips complex-based persistent homology to characterize node-weighted knowledge-based gene networks and edge-weighted cell-specific gene networks, respectively. The topological summaries are then related to the pluripotency or developmental potency of the cells. In a more general case, we also explore the usage of topological summaries as additional features in the task of cell type classification. These utilities are demonstrated on several real datasets with ground truth from scRNA-seq data on real time points or expert annotations.

Method Overview
To explore the structure-function relationship underlying single-cell data, we develop scGeom, a tool that characterizes the geometric and topological properties of cell networks and gene networks and relates them to the biological properties of cells (Fig. 1). The single-cell data is first preprocessed following the common pipelines of normalization, highly variable gene selection and PCA dimension reduction. A cell network denoted by $G_c = (V_c, E_c)$ is then constructed by building a k-nearest neighbor graph with respect to Euclidean distance of the PCA embeddings. On $G_c$, a graph curvature is computed for each edge using a discretization of Ricci curvature on graphs, the Ollivier-Ricci curvature (ORC), 27 which measures the divergence of local geometry from the Euclidean space. Specifically, for the edge $e_{ij} \in E_c$, ORC examines the difference between the edge length $e_{ij}$ and the distance between neighborhoods of $v_i$ and $v_j$ by optimal transport with shortest path distance as the ground cost. The curvatures on nodes are then defined by summing over the corresponding edges. A cell within a community or between communities is likely to have a positive or negative graph curvature, respectively, analogous to scalar curvature in Riemannian geometry.
The topological structures are characterized by persistent homology. 19,20 The structure of the network is represented by a growing sequence of simplicial complexes that generalize graphs to higher dimensions. This sequence is called a filtration, which captures the structural features at various scales compared to using one fixed graph or simplicial complex. Along the filtration, persistent homology tracks the appearances and disappearances of $k$-dimensional holes and their persistence through the filtration. For example, the 0-, 1-, and 2-dimensional holes correspond to connected components, loops, and voids. Given a graph or point cloud, persistent homology outputs collections of persistence intervals, also called persistence diagrams, $\{[b_i, d_i)\}_{i=1}^{n}$, representing the filtration values corresponding to the appearance ($b_i$) and the disappearance ($d_i$) of the $k$th homology groups associated to the $k$-dimensional holes. Details of graph curvature and persistent homology are discussed in Sections 3.1 and 3.2. For each cell in the cell network, persistent homology is computed for its local neighborhood. The graph curvature $\kappa$ and featurizations of the persistence diagram such as the total persistence $\sum_i (d_i - b_i)$ are used to distinguish cells in the transition and stable states (Fig. 1b).
A cell-specific gene network, 11 $G_g$, is constructed to reveal higher-order properties of cells in addition to the first-order gene expression levels. Persistence diagrams are computed for $G_g^i$ of cell $i$ using filtrations such as the edge-weighted Vietoris-Rips complex, resulting in persistence diagrams $\mathrm{Dgm}_k(G_g^i)$. We then turn the persistence diagrams into features including persistence images that fit kernels to the $\mathrm{Dgm}_k$ regarded as point clouds, Betti curves that count the number of homology groups at every filtration value, and various statistics of $\mathrm{Dgm}_k$ such as total persistence and longest persistence.
These features are then used to analyze the pluripotency or developmental potency of cells and fed to machine learning methods together with gene expression features for predicting cell types (Fig. 1c).
Details of the methods and preprocessing of data can be found in Section 3.

Identifying transition cells with curvature and local topology
We first analyzed an scRNA-seq dataset of myelopoiesis, which captures several transitional intermediate states during differentiation of blood cells. 28 In this dataset, two relatively unstable states, a multilineage state and a monocyte intermediate state, were identified by the original study using the ICGS approach 28 and by another analysis of the dataset using a physics-based modeling tool, MuTrans 9 (Fig. 2a). Further, the MuTrans analysis constructed a differentiation landscape and identified transition cells between states, depicted by the entropy of the probability scores assigned to each state (Fig. 2a).
Here, based on k-nearest neighbor graphs of cells using the PCA embedding, we computed the Ollivier-Ricci curvature for each cell. We also computed topological features of each cell including local persistent homology on a neighborhood graph centered at the cell and the relative persistent homology of the global structure relative to a small neighborhood of the cell. The local persistent homology captures the multiscale and multidimensional structural characteristics of the local structure centered at each cell, and the resulting persistence diagrams are turned into features by computing the total persistence and the persistence entropy. 29 The relative persistent homology examines the significance of a cell in defining the global structure of the dataset. The resulting persistence diagrams are described by computing the Wasserstein distance between the relative persistence diagram and the regular persistence diagram of the whole dataset. These geometric and topological features were able to highlight both the relatively unstable cell states and the transition cells between states (Fig. 2b).
We then analyzed a single-cell dataset of induced pluripotent stem cells (iPSCs) taken at several real time points 30 (Fig. 2c). This dataset depicts two major transition events, epiblast (EPI) to primitive-streak (PS) cells and PS to mesenchymal (M) and endodermal (En) cells. Utilizing biological knowledge, the original study 30 projected these two transition events to happen around day 1.5 and day 2.5, respectively. Here, we perform an unbiased analysis without using any prior knowledge. We computed the graph curvature and local persistent homology on the PCA embedding and found a significant decrease in curvature and an increase in total persistence at day 1.5 and day 2.5. A smaller curvature indicates bridges between stable states. For topology, more significant topological features, for example, higher total persistence, reflect divergence from trivial structures and thus reveal the transition processes.

Topological signature reflects developmental potential
In addition to examining the structures of cell networks, we further explore the relation between cell states and the structures of gene networks. A prior knowledge-based gene network 13 was assigned to each cell, where the node weights were determined by the gene expression levels in the corresponding cell. For each cell, persistent homology was computed on this node-weighted gene network using a vertex-based clique complex filtration, where the filtration value of an edge is the smaller of the weights of its two nodes.
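To make the node-weighted filtration concrete, the following is a minimal sketch of its 0-dimensional part only, assuming the filtration sweeps from the largest node weight down to zero and that the resulting pairs are reported after subtracting from the maximum weight, as described in the Methods. The function name `node_weighted_h0` and the toy network are hypothetical and are not part of scGeom; higher-dimensional homology would need a dedicated library.

```python
import math


def node_weighted_h0(weights, edges):
    """H0 pairs of a node-weighted graph, reported as [d_max - b, d_max - d)."""
    delta_max = max(weights)
    # Transformed entry value of each vertex: heavy nodes enter first.
    birth = [delta_max - w for w in weights]
    parent = list(range(len(weights)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # An edge enters once both endpoints are present, i.e. at the smaller node weight.
    valued_edges = sorted(
        (delta_max - min(weights[u], weights[v]), u, v) for u, v in edges
    )
    pairs = []
    for value, u, v in valued_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # Elder rule: the component that appeared later dies at the merge.
        young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
        if value > birth[young]:
            pairs.append((birth[young], value))
        parent[young] = old
    # Components that never merge persist to the end of the filtration.
    roots = {find(v) for v in range(len(weights))}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)
```

For a three-gene path with expression weights 3, 1 and 2, the two highly expressed end genes form separate components until the weakly expressed middle gene joins them, which yields one finite pair and one essential pair.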
We first analyzed an scRNA-seq dataset of human definitive endoderm development. 31 In this dataset, the human embryonic stem cells (hESC) are pluripotent cells that differentiate into several lineage-specific progenitors. Evaluating the persistent homology of cells at each state, we observe an increase in $H_0$ total persistence and a decrease in $H_1$ total persistence along the differentiation progress (Fig. 3a). The $H_0$ and $H_1$ persistent homology captures connected components and loop-like structures, which indicates that the gene network of pluripotent cells tends to have fewer isolated components (shorter $H_0$ persistence) and that differentiated cells tend to be active in a localized part of the gene network (shorter $H_1$ persistence). Comparing hESC cells with all other cells also shows significantly lower $H_0$ persistence and higher $H_1$ persistence in hESC cells (Fig. 3a). The Betti curves summarize the number of connected components ($H_0$) and 1-dimensional holes or loops ($H_1$) at every filtration value, and also show that hESC cells have fewer disconnected parts and larger-scale loops with more coverage of the gene network (Fig. 3b).
We also analyzed an scRNA-seq dataset of mouse pancreatic α cell maturation tracking along one cell lineage. 32 In this dataset, scRNA-seq experiments were performed for pancreatic α cells at different developmental stages including embryonic day 17.5 (E17.5) and postnatal days P0, P9, P15, P18, and P60 (Fig. 3c). During the maturation, we also observed a similar pattern with increasing $H_0$ persistence and decreasing $H_1$ persistence (Fig. 3d). The observation is further confirmed in the Betti curves and persistence barcodes of example cells from each maturation stage (Fig. 3e). Together, these examples demonstrate that the topological signatures of gene networks reflect the developmental potential of cells.

Topological machine learning improves cell type classification
Having shown the utility of topological and geometric structures in single-cell data for analyzing transition cells and developmental potential, here, we explore the usage of these structures in the general task of cell type annotation. We used a mouse brain dataset and a mouse kidney dataset from the cell type annotation subtask in a benchmarking resource 33 with pre-defined train/test splits. In this task, a predictive model is trained on annotated data to predict cell types from their gene expression profiles.
We performed two topological characterizations of the gene networks for each cell. First, a cell-specific gene network (CSN) is constructed using a correlation-based approach, 11 which results in an edge-weighted gene network for each cell. The CSNs were generated on processed data using the preprocessing pipeline of SingleCellNet 34 to select the marker genes of each cell type. Then, we computed persistent homology using an inverse Vietoris-Rips complex-based filtration, which adds 1-simplices, and subsequently the higher-dimensional simplices, with large edge weights first. Second, we utilize a prior knowledge-based gene network (SCENT 13 ) on the single-cell datasets without gene filtering. The same gene network structure is assigned to each cell but with different node weights assigned from gene expression levels. In this case, persistent homology was computed based on a vertex-based clique complex filtration where 0-simplices, and subsequently the higher-dimensional simplices, with higher weights are added first. The total persistence and persistence entropy 29 of the resulting persistence diagrams were used as topological features for the cells.
In both benchmarks, the classification performance is improved with the additional topological features, as evaluated by accuracy, balanced accuracy, and macro-average precision and recall (Fig. 4). The Betti curves and persistence barcodes of several example cells demonstrate the differences in the topological signature of gene networks across different cell types (Fig. 4c,f). In the brain dataset, interestingly, we observe significantly longer persistence in the $H_1$ persistence barcodes of neuron cells compared to other brain cells, indicating a broader coverage of the gene network and diverse functions. This observation agrees with the diverse signals sent and functions controlled by neuron cells. 35

Methods

Curvature on graphs
Given a metric, the Ricci curvature describes how much the local geometry on a manifold differs from that of the ordinary Euclidean space. It measures intrinsic local properties of manifolds such as divergence of geodesics and meeting probabilities of coupled random walks. Several constructions have been introduced to define Ricci curvature on graphs, such as the Ollivier-Ricci curvature. 27,36,37 For an edge connecting two nodes, Ollivier-Ricci curvature (ORC) measures the difference between the edge distance between the nodes and the optimal transport distance between the nodes' neighborhoods.
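Let $G = (V, E)$ be an undirected graph with vertices $V = \{v_i\}_{i=1}^{n}$ and edges $E = \{e_{ij}\}_{1 \le i,j \le n}$. A measure is defined for each node describing its local neighborhood such that
$$m_i^{\alpha}(v) = \begin{cases} \alpha, & v = v_i, \\ (1 - \alpha)/|N(v_i)|, & v \in N(v_i), \\ 0, & \text{otherwise}, \end{cases}$$
where $N(v_i)$ is the set of nodes connected to $v_i$ in $G$ and $\alpha \in [0, 1]$ is the parameter setting the weight on the center node. The ORC $\kappa_{ij}$ between $v_i$ and $v_j$ is then defined to be
$$\kappa_{ij} = 1 - \frac{d_W(m_i^{\alpha}, m_j^{\alpha})}{d(v_i, v_j)},$$
where $d(\cdot, \cdot)$ is the shortest path distance on $G$ and $d_W$ is the Wasserstein distance with $d(\cdot, \cdot)$ as the ground metric. Specifically, an optimal transport problem is solved,
$$d_W(m_i^{\alpha}, m_j^{\alpha}) = \min_{\pi \ge 0} \sum_{k,l} d(v_k, v_l)\, \pi_{kl} \quad \text{s.t.} \quad \sum_{l} \pi_{kl} = m_i^{\alpha}(v_k), \quad \sum_{k} \pi_{kl} = m_j^{\alpha}(v_l).$$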
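The edge curvature can be illustrated with a small self-contained sketch. It assumes an unweighted graph stored as an adjacency dictionary, a neighborhood measure that puts mass α on the center node and spreads 1 − α evenly over its neighbors, and solves the transport problem as a linear program with SciPy. The function names are hypothetical and this is not the implementation used in scGeom.

```python
from collections import deque

import numpy as np
from scipy.optimize import linprog


def shortest_path_lengths(adj, source):
    """Breadth-first search hop distances from `source` on an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def neighborhood_measure(adj, i, alpha):
    """Mass alpha on node i, the remaining 1 - alpha spread evenly over N(i)."""
    measure = {i: alpha}
    for v in adj[i]:
        measure[v] = (1.0 - alpha) / len(adj[i])
    return measure


def ollivier_ricci(adj, i, j, alpha=0.5):
    """Curvature of edge (i, j): 1 - W1(m_i, m_j) / d(i, j)."""
    m_i = neighborhood_measure(adj, i, alpha)
    m_j = neighborhood_measure(adj, j, alpha)
    src, dst = list(m_i), list(m_j)
    # Ground cost: shortest path distance between the supports of the two measures.
    cost = np.zeros((len(src), len(dst)))
    for r, s in enumerate(src):
        hops = shortest_path_lengths(adj, s)
        for c, t in enumerate(dst):
            cost[r, c] = hops[t]
    n_s, n_t = cost.shape
    # Marginal constraints: row sums equal m_i, column sums equal m_j.
    a_eq = np.zeros((n_s + n_t, n_s * n_t))
    for r in range(n_s):
        a_eq[r, r * n_t:(r + 1) * n_t] = 1.0
    for c in range(n_t):
        a_eq[n_s + c, c::n_t] = 1.0
    b_eq = np.array([m_i[s] for s in src] + [m_j[t] for t in dst])
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return 1.0 - result.fun / shortest_path_lengths(adj, i)[j]
```

With α = 0.5, an edge inside a triangle has curvature 0.75, an interior edge of a path has curvature 0, and the bridge joining two triangles has curvature −1/3, which matches the intuition that within-community edges are positively curved and bridges are negatively curved.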

Persistent homology
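Persistent homology [19][20][21] provides a comprehensive multiscale topological characterization by tracking the appearance and disappearance of homology groups through a filtration, which is a growing sequence of simplicial complexes defined on the data. An abstract $k$-simplex is a set of $k + 1$ vertices, and a simplicial complex $K$ is a collection of simplices such that every face of any simplex in $K$ is also in $K$ and the intersection of any pair of simplices is either empty or a common face of the two. A filtration of a simplicial complex $K$ is a nested sequence of its subcomplexes, denoted by $\emptyset = K_0 \subset K_1 \subset \cdots \subset K_n = K$. A $k$-chain $c_k$ is a formal sum of $k$-simplices in $K$ with coefficients from a chosen set, for example, $\mathbb{Z}_2$. The $k$-chains of $K$ form a group called the $k$th chain group, denoted by $C_k(K)$. The chain groups are connected by a linear boundary operator $\partial_k : C_k(K) \to C_{k-1}(K)$, which acts on a $k$-simplex as
$$\partial_k [v_0, \ldots, v_k] = \sum_{i=0}^{k} (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k],$$
where $[v_0, \ldots, \hat{v}_i, \ldots, v_k]$ is the $(k-1)$-simplex obtained by removing vertex $i$. Based on the boundary operator, two groups are defined, the kernel of $\partial_k$, $Z_k(K) = \ker(\partial_k)$, whose elements are called $k$-cycles, and the image of $\partial_{k+1}$, denoted by $B_k(K)$, also called the $k$th boundary group. The $k$th homology group is then defined as the quotient group $H_k(K) = Z_k(K)/B_k(K)$, whose rank is also the $k$th Betti number of $K$, representing the number of $k$-dimensional holes in $K$. On the filtration of $K$, the $p$-persistent $k$th homology group is defined to be
$$H_k^{i,p} = Z_k(K_i) / \left( B_k(K_{i+p}) \cap Z_k(K_i) \right),$$
which intuitively represents a topological feature observed at filtration step $i$ that persists through step $i + p$. A feature that appears at $K_i$ and disappears at $K_j$ results in a persistence pair conveniently represented as the interval $[x_i, x_j)$, often called a birth-death pair, where $x_i$ and $x_j$ are the filtration values of $K_i$ and $K_j$, respectively. Persistent homology characterization of a dataset results in a collection of such birth-death pairs and is often visualized as persistence barcodes (plotting each pair as a horizontal bar whose two endpoints correspond to the birth and death values) or persistence diagrams (plotting each pair as a point in 2D).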
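As a small worked illustration of birth-death pairs, the sketch below computes only the 0-dimensional persistent homology of a Vietoris-Rips filtration on a point cloud, where every point is born at filtration value 0 and a component dies when an edge first merges it into another. It uses a union-find structure rather than a persistent homology library, and the function name `h0_persistence` is hypothetical.

```python
import itertools
import math


def h0_persistence(points):
    """H0 birth-death pairs of a Vietoris-Rips filtration on a point cloud."""
    n = len(points)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Edges enter the filtration in order of increasing Euclidean length.
    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in itertools.combinations(range(n), 2)
    )
    pairs = []
    for length, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Two components merge: one of them dies at this edge length.
            parent[ra] = rb
            pairs.append((0.0, length))
    # The last surviving component never dies.
    pairs.append((0.0, math.inf))
    return pairs
```

For three collinear points at positions 0, 1 and 5, the two nearby points merge at scale 1 and the distant point joins at scale 4, so the barcode has two finite bars and one infinite bar.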

Curvature in single-cell data
The raw single-cell data was first preprocessed by normalizing total counts in every cell followed by a log1p transformation ($\log(1 + x)$). Principal component analysis is then performed with the selected highly variable genes. A k-nearest neighbor graph is constructed based on the Euclidean distance in the space of top principal components. The ORC is computed on this cell network with the $\alpha$ parameter (the portion of mass assigned to the center cell when determining a mass distribution representing the neighborhood of the cell) set to 0.5. The preprocessing was performed using the Scanpy package. 38
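The preprocessing steps above can be sketched with NumPy alone, as a stand-in for the Scanpy pipeline rather than a reproduction of it. The sketch assumes a dense cells-by-genes count matrix, skips highly variable gene selection, and uses a hypothetical target sum of 10,000 counts per cell. The function name `knn_cell_graph` and its defaults are illustrative only.

```python
import numpy as np


def knn_cell_graph(counts, n_pcs=2, k=2, target_sum=1e4):
    """Normalize, log1p-transform, embed by PCA, and build a kNN cell graph."""
    # Scale every cell to the same total count, then apply log(1 + x).
    x = counts / counts.sum(axis=1, keepdims=True) * target_sum
    x = np.log1p(x)
    # PCA through the SVD of the gene-centered matrix.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    embedding = u[:, :n_pcs] * s[:n_pcs]
    # Pairwise squared Euclidean distances in PCA space, self-distances excluded.
    sq = ((embedding[:, None, :] - embedding[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)
    neighbors = np.argsort(sq, axis=1)[:, :k]
    # Undirected edge set: i and j are linked if either is among the other's k nearest.
    edges = {tuple(sorted((i, int(j)))) for i in range(len(x)) for j in neighbors[i]}
    return embedding, sorted(edges)
```

The returned edge list can be converted to an adjacency structure for curvature computation, and the embedding supplies the Euclidean distances used by a Vietoris-Rips filtration.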

Topology of cell networks
To characterize the topological structures of the cell network for each cell, we use two approaches, a local persistent homology and a relative persistent homology. The local persistent homology aims to capture the local structure surrounding a cell in the cell network. Here, we adopt a simple approach by computing the regular persistent homology on a sub-network surrounding a cell defined by either the top k nearest neighbors or a distance cutoff. This approach has been shown to be effective in capturing local topological structures in various applications such as biomolecular structure analysis. 24 The relative persistent homology, on the other hand, captures the significance of a cell or the neighborhood of a cell in assembling the global structure of the dataset. Let $\emptyset = K_0 \subset K_1 \subset \cdots \subset K_n = K$ be a filtration of the whole dataset. For cell $i$, we define a subcomplex $L_i$ which contains only this cell or its local neighborhood on the network. Then, relative persistent homology examines the homology on the relative chain groups, which are the quotient groups $C_k(K_j)/C_k(K_j \cap L_i)$. The impact of cell $i$ on assembling the global dataset structure is then quantified by computing the Wasserstein distance between the relative persistent homology diagram and the regular persistent homology diagram of the whole dataset. For both approaches, we used the Vietoris-Rips filtration with the Euclidean distance between cells in their PCA embeddings.

Topology of gene networks
Here, we consider two types of gene networks, a cell-specific gene network in the form of edge-weighted networks and a prior knowledge-based gene network in the form of node-weighted networks.
The package CSN 11 is used to construct a cell-specific gene network for each cell on the top highly variable genes in the dataset. The core statistic in this method evaluates the local association between each gene pair for every cell. Specifically, for cell $k$ and genes $x$ and $y$,
$$\rho_{xy}^{(k)} = \frac{n_{xy}^{(k)}}{n} - \frac{n_x^{(k)}}{n} \cdot \frac{n_y^{(k)}}{n},$$
where $n$ is the total number of cells. The parameter $n_x^{(k)}$ is predefined and induces an interval $I_x^{(k)}$ of gene $x$ in the top $n_x^{(k)}$ cells whose gene $x$ expression is the closest to cell $k$. Similarly, an interval for gene $y$, $I_y^{(k)}$, is determined by $n_y^{(k)}$. Then $n_{xy}^{(k)}$ counts the number of cells whose expressions of gene $x$ and gene $y$ both fall in the two intervals, respectively. Here, we used the top 1000 highly variable genes and used the default parameter values in the CSN method 11 with $n_x^{(k)} = n_y^{(k)} = 0.1n$ and the significance level set to 0.01. The constructed cell-specific gene networks are edge-weighted networks. Denoting the network of a cell by $G = (V, E, W^{(e)})$, we compute persistent homology with the Vietoris-Rips filtration
$$VR(\delta) = \{\sigma : \forall \sigma^{(1)} \subseteq \sigma,\ W^{(e)}(\sigma^{(1)}) \ge \delta\},$$
where $\sigma^{(1)}$ denotes a 1-dimensional face (edge) of the simplex $\sigma$. The filtration is computed from $\delta = \delta_{\max} = \max\{W^{(e)}\}$ to $\delta = 0$. The resulting persistence pairs $[b_i, d_i)_i$ are transformed to $[\delta_{\max} - b_i, \delta_{\max} - d_i)_i$.
For the prior knowledge-based gene network, a base network is first assigned to each cell and the gene expression levels in each cell are assigned as node weights. Denoting the network of a cell by $G = (V, E, W^{(v)})$, persistent homology is computed using the vertex-based clique complex filtration
$$K(\delta) = \{\sigma : \sigma \text{ is a clique of } G,\ \forall v \in \sigma,\ W^{(v)}(v) \ge \delta\}.$$
Similar to the edge-weighted network, the filtration is computed from $\delta = \delta_{\max} = \max\{W^{(v)}\}$ to $\delta = 0$. The resulting persistence pairs $[b_i, d_i)_i$ are also transformed to $[\delta_{\max} - b_i, \delta_{\max} - d_i)_i$. The log1p-transformed gene expression levels are used to assign node weights, and the full dataset without gene filtering is used with the knowledge-based gene network.
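The local association statistic can be sketched for a single gene pair and a single cell. The sketch assumes the statistic takes the form n_xy/n − (n_x/n)(n_y/n), with the two neighborhoods defined as the n_x and n_y cells whose expression is closest to that of cell k. It omits the normalization and significance test of the CSN package, and the function name `csn_statistic` is hypothetical.

```python
import numpy as np


def csn_statistic(expr_x, expr_y, k, frac=0.1):
    """Local association of genes x and y around cell k (unnormalized sketch)."""
    n = len(expr_x)
    # Neighborhood sizes n_x = n_y = frac * n, at least one cell.
    n_x = n_y = max(1, int(round(frac * n)))
    # Cells whose expression of gene x (resp. y) is closest to that of cell k.
    near_x = np.argsort(np.abs(expr_x - expr_x[k]), kind="stable")[:n_x]
    near_y = np.argsort(np.abs(expr_y - expr_y[k]), kind="stable")[:n_y]
    # Cells falling in both neighborhoods at once.
    n_xy = len(np.intersect1d(near_x, near_y))
    return n_xy / n - (n_x / n) * (n_y / n)
```

Under this form, the statistic is largest when the two neighborhoods coincide and drops toward zero or below when cells close to cell k in gene x are scattered in gene y.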
The computation of filtration and persistent homology was based on the packages Gudhi, 39 Dionysus2, 40 and Ripser. 41

Featurization of persistence diagrams
For dimension $k$, persistent homology computation results in a collection of $n^{(k)}$ persistence pairs $\{[b_i^{(k)}, d_i^{(k)})\}_{i=1}^{n^{(k)}}$, where $b_i^{(k)}$ and $d_i^{(k)}$ are the filtration values corresponding to the birth and death of a topological feature ($k$-dimensional hole). Several summaries and features are derived from the persistence pairs. Total persistence describes the overall significance of topological features in the data and is computed as $\sum_i \left( \min\{d_i^{(k)}, \delta_{\max}\} - b_i^{(k)} \right)$. Persistence entropy 29 describes the heterogeneity of persistence similar to Shannon entropy and is stable with respect to small perturbations in the input space. Specifically, it is computed as $E = -\sum_i \frac{l_i}{L} \log \frac{l_i}{L}$, where $L = \sum_i l_i$ and $l_i$ is the persistence of the $i$th pair. In addition to global summaries, Betti curves describe the structural changes along the filtration and are convenient for illustrating the average behavior of a group of persistence barcodes. For a filtration value $\delta$, the $k$-dimensional Betti curve is computed as $\beta_k(\delta) = |\{i : \delta \in [b_i^{(k)}, d_i^{(k)})\}|$. The Betti curve is computed on a discretization of the filtration interval $[0, \delta_{\max}]$. The Gudhi package 39 was used to compute the persistence entropy and Betti curves.
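These three summaries are simple enough to write out directly. The sketch below is a plain-Python illustration, not the Gudhi implementation. It assumes persistence pairs are given as (birth, death) tuples, that infinite deaths are capped at δ_max, and that the entropy uses the natural logarithm. The function names are hypothetical.

```python
import math


def total_persistence(pairs, delta_max):
    """Sum of bar lengths, with deaths capped at delta_max."""
    return sum(min(d, delta_max) - b for b, d in pairs)


def persistence_entropy(pairs, delta_max):
    """Shannon entropy of the normalized bar lengths."""
    lengths = [min(d, delta_max) - b for b, d in pairs]
    lengths = [length for length in lengths if length > 0]
    total = sum(lengths)
    return -sum((length / total) * math.log(length / total) for length in lengths)


def betti_curve(pairs, grid):
    """Number of bars alive at each filtration value in `grid`."""
    return [sum(1 for b, d in pairs if b <= t < d) for t in grid]
```

Two bars of equal length give an entropy of log 2, and the entropy decreases as one bar comes to dominate the barcode, which is why it serves as a measure of heterogeneity of persistence.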

Machine learning and evaluation metrics
In the application of cell type classification, the random forest implementation in the scikit-learn package 42 was used with 5000 trees and the "class_weight" parameter set to "balanced". All other parameters are set to default values. The classification results on the testing set were evaluated using accuracy: $(1/N) \sum_i \mathbb{1}(\hat{y}_i = y_i)$; balanced accuracy (adjusted): $\left[ \frac{1}{C} \sum_i \frac{\mathbb{1}(\hat{y}_i = y_i)}{N_{c(i)}} - \frac{1}{C} \right] / \left( 1 - \frac{1}{C} \right)$; precision (macro-average): $\frac{1}{C} \sum_{c} \frac{|Y_c \cap \hat{Y}_c|}{|\hat{Y}_c|}$; and recall (macro-average): $\frac{1}{C} \sum_{c} \frac{|Y_c \cap \hat{Y}_c|}{|Y_c|}$. Here, $N$ is the total number of samples, $N_{c(i)}$ is the number of samples of the same class as sample $i$ in the ground truth, $y_i$ is the true label of sample $i$, $\hat{y}_i$ is the predicted label of sample $i$, $C$ is the number of classes, $Y_c$ is the set of samples of class $c$ in the ground truth, $\hat{Y}_c$ is the set of samples predicted to be class $c$, and $\mathbb{1}(\cdot)$ is the indicator function.
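The four evaluation metrics can be checked by hand on a toy example. The sketch below is a plain-Python illustration that assumes the adjusted balanced accuracy rescales the mean per-class recall so that chance level (1/C) maps to 0 and perfect prediction maps to 1. The function name `classification_metrics` is hypothetical, and at least two classes are required.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, adjusted balanced accuracy, and macro precision and recall."""
    classes = sorted(set(y_true))
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    recalls, precisions = [], []
    for c in classes:
        true_c = sum(t == c for t in y_true)
        pred_c = sum(p == c for p in y_pred)
        hit_c = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hit_c / true_c)
        # A class that is never predicted contributes zero precision.
        precisions.append(hit_c / pred_c if pred_c else 0.0)
    chance = 1.0 / len(classes)
    balanced = sum(recalls) / len(classes)
    return {
        "accuracy": accuracy,
        "balanced_accuracy_adjusted": (balanced - chance) / (1.0 - chance),
        "precision_macro": sum(precisions) / len(classes),
        "recall_macro": sum(recalls) / len(classes),
    }
```

With true labels a, a, a, b and predictions a, a, b, b, the accuracy is 3/4, the macro recall is 5/6, the macro precision is 3/4, and the adjusted balanced accuracy is 2/3.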

Conclusion
To exploit the underlying complex structures in scRNA-seq data, we developed scGeom, a tool to derive topological and geometric signatures from the network of cells and gene networks associated with each cell.It utilizes Ollivier-Ricci curvature, local persistent homology, relative persistent homology and persistent homology filtrations for edge-weighted and node-weighted networks.The utilities of these structural characterizations have been demonstrated on real scRNA-seq datasets for identifying transition cells, quantifying pluripotency or developmental potency of cells, and assisting in the classification of cells.
Persistent homology is used as a structural descriptor in this work without tracing back to individual cells or genes from the topological signatures. Recently, several methods were proposed to connect the topological features with the input data, [43][44][45] which could help interpret the topological features in terms of cells or genes. The topological structures are compared as topological summaries between large-scale gene networks in this work. When comparing small-scale networks in the future, such as specific pathways, two networks could have the same structure but different arrangements of genes, which will result in identical topological summaries. This could be addressed by a very recent work that compares topological summaries while considering the differences in the original data. 46 This work presents one of the initial endeavors to apply persistent homology to scRNA-seq data.
The methods are also potentially applicable to other single-cell omics data 47 given some similarity measurement between cells or association scores between features.With the recent developments of multiparameter persistent homology, 48 different metrics can be considered simultaneously in complex data such as single-cell multi-omics data with multiple similarity measurements and spatial transcriptomics data with both spatial distance and gene expression similarities.

Data and code availability
All datasets used are publicly available. 1) The myelopoiesis data 28 is available on GEO with accession number GSE70245; 2) The iPSC data 30 is available in the Supplementary Data of the original publication; 3) The endoderm development data 31 is available on GEO with accession number GSE75748; 4) The pancreatic α cell data 32 is available on GEO with accession number GSE87375; 5) The mouse brain and kidney data with annotated cell types and predetermined train/test splits were downloaded using the Dance package. 33 The package scGeom is available at https://github.com/zcang/scGeom.
on gene networks have been used to reflect the pluripotency of cells. These methods use global summaries of the local properties of the gene networks. Here, we aim to further use topological methods for multiscale exploration of both local and global structures of the gene networks.

Figure 1: Overview of scGeom. a The structure of scRNA-seq data is often represented as cell networks, where cell-specific gene networks can be inferred for the cells. b The local structure of each cell is described by curvatures and local topology, which are correlated to cell states. c The structures of cell-specific gene networks are characterized by various topological descriptors that are used to link to cell properties such as pluripotency and cell types.

Figure 2: Analysis of transition states. a A single-cell dataset of myelopoiesis. Applying MuTrans results in the entropy measuring the uncertainty of cluster assignment of a cell, which indicates transition cells. b The structural features in the myelopoiesis dataset computed by scGeom on the cell network for each cell, including curvature, local persistent homology, and relative persistent homology. The local persistent homology summarized as persistence entropy and total persistence, and the relative persistent homology described by the Wasserstein distance between the relative persistence diagram and the regular persistence diagram, are shown. c A single-cell dataset of induced pluripotent stem cells (iPSC) taken at several temporal points from day 0 to day 3, where two transition events happen at day 1.5 and day 2.5. d,e The curvature on the cell network and the total persistence of the local persistent homology output. ** indicates p-value less than 1e-10 by Wilcoxon test.

Figure 3: Topological analysis of developmental potential. a For an scRNA-seq dataset of human definitive endoderm development, the total persistence of H0 and H1 persistence barcodes was computed from a vertex-based clique complex of the prior knowledge-based gene network. (hESC: H1 and H9 human embryonic stem cells, NPC: neuronal progenitor cells, DEC: definitive endoderm cells, TB: trophoblast-like cells, HFF: human foreskin fibroblasts, EC: endothelial cells) * and ** indicate p-value less than 0.05 and 1e-10, respectively, for Wilcoxon tests between hESC and other cell types. b Average H0 and H1 Betti curves for the detailed cell types and for hESC versus all other cell types. For the latter, 95% confidence intervals for the mean curve are shown. c An scRNA-seq dataset of pancreatic α cell maturation, where the arrows show the ground truth developmental trajectory. d The total persistence of H0 and H1 persistence barcodes computed from the vertex-based clique complex of the prior knowledge-based gene network. * and ** indicate p-value less than 0.05 and 1e-10, respectively, for Wilcoxon tests between α-cell E17.5 and other cell states. e Average Betti curves for each cell state with 95% confidence intervals of the curve mean, and persistence barcodes of example cells from each state.

Figure 4: Topology-assisted cell type annotation. a An scRNA-seq dataset of kidney with expert-annotated cell types. b The classification performance with or without using topological features. The performance is evaluated by accuracy (ACC), adjusted balanced accuracy (Balanced ACC), precision (PRE) and recall (REC), both with macro-average. c The average H0 and H1 Betti curves for each cell type and persistence barcodes of two example cells for the Vietoris-Rips filtration on the cell-specific gene networks. d,e The classification performances on an scRNA-seq dataset of the brain. f The average H0 and H1 Betti curves for each cell type and persistence barcodes of two example cells for the vertex-based clique complex filtration on the prior knowledge-based gene network.
of gene $x$ in the top $n_x^{(k)}$ cells whose gene $x$ expression is the closest to cell $k$. Similarly, an interval for gene $y$, $I_y^{(k)}$, is determined by $n^{(k)}$
by removing vertex $i$. Based on the boundary operator, two groups are defined, the kernel of $\partial_k$, $Z_k(K) = \ker(\partial_k)$, whose elements are called $k$-cycles, and the image of $\partial_{k+1}$, denoted by $B_k(K)$, also called the $k$th boundary group. The $k$th homology group is then defined as the quotient group $H_k$