Assembling spatial clustering framework for heterogeneous spatial transcriptomics data with GRAPHDeep

Abstract Motivation Spatial clustering is essential and challenging for spatial transcriptomics’ data analysis to unravel tissue microenvironment and biological function. Graph neural networks are promising to address gene expression profiles and spatial location information in spatial transcriptomics to generate latent representations. However, choosing an appropriate graph deep learning module and graph neural network necessitates further exploration and investigation. Results In this article, we present GRAPHDeep to assemble a spatial clustering framework for heterogeneous spatial transcriptomics data. Through integrating 2 graph deep learning modules and 20 graph neural networks, the most appropriate combination is decided for each dataset. The constructed spatial clustering method is compared with state-of-the-art algorithms to demonstrate its effectiveness and superiority. The significant new findings include: (i) the number of genes or proteins of spatial omics data is quite crucial in spatial clustering algorithms; (ii) the variational graph autoencoder is more suitable for spatial clustering tasks than deep graph infomax module; (iii) UniMP, SAGE, SuperGAT, GATv2, GCN, and TAG are the recommended graph neural networks for spatial clustering tasks; and (iv) the used graph neural network in the existent spatial clustering frameworks is not the best candidate. This study could be regarded as desirable guidance for choosing an appropriate graph neural network for spatial clustering. Availability and implementation The source code of GRAPHDeep is available at https://github.com/narutoten520/GRAPHDeep. The studied spatial omics data are available at https://zenodo.org/record/8141084.

Spatial clustering constitutes a foundational and indispensable procedure within spatial omics research (Zeng et al. 2022).It profoundly influences downstream analyses' accuracy and performance, spanning spatial deconvolution, gene imputation, spatial organization reconstruction, cell-cell communication, and spatial trajectory inference.Diverse methodologies have emerged to address spatial clustering.For instance, Louvain, KMeans, and Leiden (Traag et al. 2019) are traditional approaches for handling gene expression features in situ sequencing data.Bayesian statistical models underlie BayesSpace (Zhao et al. 2021) and SC-MEB (Yang et al. 2022) to integrate spatial coordinates and gene expression.Leveraging morphological similarity and neighborhood smoothing, stLearn (Pham et al. 2020) segregates clusters within morphological images.By aggregating contiguous pixels within digital images, MULTILAYER (Moehlin et al. 2021) elucidates biological substructures.
Recent advancements in graph neural networks (GNNs) have unveiled unprecedented potential for integrating spatial position and gene expression information, facilitating spatial clustering tasks.GNNs, particularly when embedded within robust graph deep learning (GDL) modules, hold promise for discriminating spatial groups.Notably, SpaGCN (Hu et al. 2021) employs the traditional graph convolutional network (GCN) to capture cluster heterogeneity, integrating spatial location, gene expression, and histology.SEDR (Fu et al. 2021) and STAGATE (Dong and Zhang 2022) utilize graph autoencoders for managing spatial location profiles, wherein derived latent embeddings enhance clustering accuracy.CCST (Li et al. 2022) extends the unsupervised deep graph infomax (DGI) module to maximize mutual information between global summaries and local representations, creating latent embeddings.Furthermore, DeepST (Xu et al. 2022) utilizes variational graph autoencoders (VGAE) and denoising autoencoders for processing features from spatial transcriptomes and generating latent representations.However, existing spatial clustering architectures lack compatibility with heterogeneous spatial omics data.Clustering accuracy is influenced by sequencing quality and spatial data resolution.These frameworks lack adequate justification for selecting specific GDL module and GNN.Moreover, extensibility in modules, GNNs, and data diversity remains insufficient.
To address these limitations, this manuscript introduces GRAPHDeep, a novel approach for accomplishing spatial clustering tasks by employing an array of graph learning modules and neural networks.GRAPHDeep integrates two robust GDL modules, VGAE (Kipf and Welling 2016) and DGI (Veli ckovi c et al. 2018), utilizing 20 GNNs as encoders and decoders.This encompasses a total of 40 distinct GNN-based frameworks, each contributing to the spatial clustering objective.These frameworks are applied across various spatial omics data types, including 10X Visium (Prakrithi et al. 2023), Slide-seqV2 (Stickels et al. 2021), 4i (Gut et al. 2018), MERFISH (Lu et al. 2021), among others, to generate latent representations within an embedding space.Consequently, a versatile GDL architecture for spatial clustering is assembled, catering to diverse data types.Furthermore, comparative experiments are conducted against state-of-the-art methods [STAGATE (Dong and Zhang 2022), spaGCN (Hu et al. 2021), DeepST (Xu et al. 2022), CCST (Li et al. 2022), SEDR (Fu et al. 2021), stLearn (Pham et al. 2020), BayesSpace (Zhao et al. 2021), and Scanpy (Wolf et al. 2018)] to demonstrate the efficacy of the GNN-based clustering methodologies.The goal of GRAPHDeep is to showcase that GNN-based clustering methods can achieve improved performance by selecting the right GNN.GRAPHDeep exhibits remarkable extensibility across the GDL module, GNNs, and spatial omics data, rendering it highly versatile for comprehensive analyses of spatially resolved transcriptomes.

Overall architecture of GRAPHDeep
Figure 1 illustrates the comprehensive schematic diagram of GRAPHDeep.The core segment encompasses the GDL module (Fig. 1C).This study examines and contrasts two pivotal modules, VGAE and DGI.Surrounding the GDL modules, three distinct constituents emerge: input, output, and encoder.The spatial omics data serve as GRAPHDeep's input (Fig. 1A).Specifically, four spatial datasets are analyzed for spatial clustering.Each individual entity within spatial omics data is interpreted as a cell, pixel, or spot (referred to as "spot" for brevity).The feature matrix is formulated based on gene expression profiles, while the adjacent matrix is derived from the spatial location information (Supplementary Fig. S1).
In GRAPHDeep, the encoders comprise diverse GNNs (Fig. 1B).Given the current literature's lack of consensus on GNN selection, GRAPHDeep assesses 20 GNNs (Supplementary Table S1) within two GDL modules.These GNNs encompass prominent convolutional neural networks like auto-regressive moving average (ARMA), graph attention network (GAT), hypergraph convolution, molecular fingerprint (MF), selfsupervised GAT, topology adaptive graph (TAG), among others (Wu et al. 2021) (Supplementary Table S1).To integrate GNNs and spatial omics data, the optimal GNN for each dataset is identified.These findings inform GNN selection in the dynamic landscape of spatial clustering research.
Post-training, spatial features yield latent representations in the embedding space (Fig. 1D).Quality of latent embeddings significantly influences segmentation accuracy and performance.Subsequent to clustering, downstream analysis would be conducted (Fig. 1E).For instance, co-occurrence probability between two clusters may elucidate tissue development and disease progression.Neighbor enrichment scores across groups shed light on tumor infiltration and gene-gene interaction.We identify superior GNNs for spatial clustering based on performance across datasets and calculate average rankings.Comprehensive exposition of GRAPHDeep's capabilities is detailed in the subsequent experimental results.

Deep graph infomax
As shown in Fig. 1A, the DGI module (Veli ckovi c et al. 2018) takes the feature matrix X and adjacent matrix A as inputs.In specific spatial omics data, the feature matrix arises from normalized variance of highly variable genes.The adjacent matrix depends on spot-to-spot distances, computed via the radius neighbor model or k-nearest neighbor model (Wang et al. 2020).
The DGI module (Fig. 1C) comprises four essential elements: the corruption function C, encoder E, readout function R, and discriminator D. Initially, the corruption function generates negative samples as ðX; AÞ ¼ CðX; AÞ and is achieved by the torch.randpermfunction in Pytorch.Subsequently, the encoder transforms the feature and adjacent matrices to yield latent embeddings: (1) where encoder implies the chosen GNN.h and h are the latent embeddings with respect to the positive and negative samples.
The readout function is further employed to create the global summaries: Finally, the mutual information between the local representations and global summaries is signified by the discriminator as a probability score Dðh; SÞ and Dðh; SÞ.Without a doubt, the training objective is associated with these two mutual information.Following the original DGI module, a standard binary cross-entropy loss function (Veli ckovi c et al. 2018) is formulated:

Variational graph autoencoder
The critical reformation in VGAE is applying a two-layer GNN structure to generate the latent embedding.The firstlayer GNN decides a lower-dimensional feature matrix X $ as (Kipf and Welling 2016): where A $ is the symmetrically normalized adjacency matrix.ReLUð:Þ ¼ maxð0; :Þ is the rectified linear activation function.W 0 is the weight parameter for this layer GNN.Here, GNN has the sames meaning as encoder.Assembling spatial clustering framework for heterogeneous spatial transcriptomics data with GRAPHDeep In the second-layer GNN, the mean and variance vectors (Fig. 1C) of feature matrix are computed with weight parameters W 1 : (5) These two values share the same weight parameters.Then, the latent representations are deduced via the reparameterization trick (Kingma and Welling 2014): where 2 Normð0; 1Þ is the standard normal distribution.In the VGAE module, the incoming and predictive graphs are expressed with probability distributions, i.e. qðZjX; AÞ and pðAjZÞ.The adjacency matrix is reconstructed by decoder (inner product) as: The loss function encompasses two error categories.The initial component quantifies reconstruction error, assessing input-output graph resemblance.The second minimizes discrepancies between distributions q and p.This loss is expressed as follows: where KL(.) indicates Kullback-Leibler divergence between two probability distributions.This study assesses the encoder in DGI and VGAE modules across 20 GNNs (Supplementary Table S1), utilizing Python's PyTorch_pyG (Fey and Lenssen 2019).These recently published neural networks generate latent embeddings for spatial domain identification.Within benchmarking methods, one GNN is incorporated, potentially not being the optimal selection.The hardware configuration and hyperparameters in GNNs are described in Supplementary Note S1.1.For various spatial omics datasets with different genes, the hyperparameters remain unchanged for a fair comparison purpose.

Spatial omics data
Four spatial omics datasets are involved in this article.Human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics data are accessible via the 10X Visium platform.The spatialLIBD project (Pardo et al. 2022) presents an extensive analysis, encompassing spot-level, layer-level data, and spatial marker genes.These data comprise 12 samples, each spanning multiple neuronal layers and white matter.Among these, eight samples exhibit seven clusters, while the remaining four samples manifest five groups.To assess GRAPHDeep, Slice 151673, containing 3639 spots and 33 538 genes, is utilized for evaluation, albeit without universal application.
Iterative indirect immuno-fluorescence imaging (4i) (Kramer et al. 2023) constitutes a high-throughput method for spatial protein detection.Leveraging immuno-fluorescence microscopy, over 40 heterogeneous proteins are discernible across varied spatial scales within biological samples.Employing 4i technology facilitates quantification of expression levels and captures subcellular distribution.The 4i approach enables quantification of protein abundance, phosphorylation, and sub-compartmentalization.Squidpy (Palla et al. 2022) houses the 4i dataset comprising 270 876 cells and 43 proteins.This study utilizes a subset of the 4i dataset, featuring 16 742 cells, 43 proteins, and 10 cell types, for spatial clustering experiments.
Slide-seqV2 (Stickels et al. 2021) stands as a near-cellular resolution spatial transcriptomics dataset, capturing mRNAs' localization in mouse hippocampal neurons at 10 lm spatial resolution.This method thoroughly reproduces the initial Slide-seq protocol, encompassing library generation, bead synthesis, and array indexing.Focusing on GNN-based spatial clustering approaches, this article directly employs the Slide-seqV2 dataset from Squidpy.The pre-processed mouse hippocampus data, intended for spatial clustering study, comprise 26 146 cells, 4000 genes, and 14 groups.
MERFISH represents a massively multiplexed singlemolecule imaging technology (Lu et al. 2021), capable of quantifying RNA species' copy numbers and spatial positions within individual cells.The Vizgen data program includes multiple MERFISH datasets, notably the MERFISH mouse liver map and MERFISH mouse brain receptor map.This study concentrates on the pre-processed MERFISH mouse liver dataset from Squidpy (Palla et al. 2022).Similarly, a subset of the original data is extracted for spatial domain analysis to enhance computational efficiency.This subset encompasses 13 806 cells, 347 genes, and 28 groups.CCST (Li et al. 2022) merges the DGI module and GCNs, deriving latent embeddings from a weighted hybrid adjacency matrix.stLearn (Pham et al. 2020) employs tissue morphology, gene expression, and spatial distance for spatial clustering.BayesSpace (Zhao et al. 2021) applies Bayesian statistics to integrate spatial coordinates and gene expression, but its performance on simulated data is unsatisfactory.Scanpy (Wolf et al. 2018) serves as a Python toolkit for single-cell gene expression data analysis, including preprocessing, visualization, and clustering protocols suitable for spatial omics data.Louvain is the standard clustering algorithm in Scanpy.

Benchmarking methods
Benchmarking methods are implemented according to their specifications and default parameters for equitable comparison.Supplementary Note S1.1 details spatial omics data preprocessing, neural network hyperparameters, and hardware configuration.The adjusted Rand index (ARI) serves as the spatial clustering metric (Supplementary Note S1.2).

Clustering performance on DLPFC data
GRAPHDeep is employed to construct the GNN-based spatial clustering framework on DLPFC dataset.GRAPHDeep is executed across 12 samples, with Slice 151673 serving as an illustrative example.The ground truth, featuring seven clusters (six layers and white matter), is illustrated in Fig. 2C.ARI values for 20 GNNs in DGI and VGAE modules are presented in Fig. 2A and B, respectively.The highest ARI in the DGI module is 0.65 (SGC), and in the VGAE module, it is 0.60 (TAG).The optimal spatial clustering framework for this sample emerges as SGC in the DGI module.SGC and TAG are the top two accurate GNNs for the DGI module (Fig. 2A).Interestingly, the worst-performing GNNs differ between the two GDL modules (FiLM for DGI, 0.21; MF for VGAE, 0.13), suggesting that GDL influences GNN performance.The spatial clustering results of the top five GNNs analyzed on the tissue sample are compared with the ground truth, as depicted in Fig. 2D and E. Unfortunately, none of these clustering frameworks could accurately identify Layer_4 in the ground truth.The highest achieved clustering accuracy only reaches an ARI of $0.6, primarily due to the unsupervised nature of the task.It is worth acknowledging that SGC with the DGI module comes closest to matching the ground truth (Fig. 2E).Comparing GNNs with similar ARI levels reveals interesting insights.For instance, TAG demonstrates an ARI value of 0.60 in both the DGI and VGAE modules.However, upon closer examination of spot distributions (Fig. 2D and  E), it becomes evident that the spot distribution in the DGI module appears to be more favorable than that in VGAE.This observation aligns with the conclusion that the DGI module is better suited for this specific sample.Even under the same GDL condition (FeaSt sharing an identical ARI of 0.51 with GCN in DGI module), the spot distributions exhibit significant variations.
Figure 3A and B depicts ARI values using boxplots for 12 DLPFC data samples.The top five GNNs in these modules are SGC, TAG, FeaSt, SAGE, and ARMA, and Hypergraph, SGC, GCN, GATv2, and SAGE.In the DGI module, the top five ARI values surpass those in VGAE conditions, indicating DGI's superior applicability to this DLPFC dataset.For this human DLPFC data, the recommended combination is DGI and SGC.Interestingly, the top five GNNs determined across 12 samples closely match those for the single 151 673 sample, such as SGC, TAG, FeaSt, GCN in the DGI module, and SGC, GATv2, SAGE, UniMP in the VGAE setting.

Computational efficiency evaluation
GRAPHDeep is applied to two spatial protein datasets, 4i and MIBI-TOF, to establish an optimal spatial clustering structure and validate GNN candidate efficacy.Protein profiles inform the feature matrix derivation.Fewer proteins lead to an immature matrix and consequently inferior model performance, with calculative efficiency being a concern.The 4i subset comprises 16 742 cells/spots, 43 proteins, and 10 groups (Fig. 4A).Clusters are denser than MIBI-TOF data (Supplementary Fig. S3), with evident groups like ER_mitochondria_1, Endosomes_golgi_1, Nucleolus, and Nucleus.Additionally, 4i data boasts more proteins (43 versus 36 in MIBI-TOF), indicating potential for enhanced clustering performance.
Figure 4D and E presents ARI values of each GNN, with DGI slightly outperforming VGAE in this dataset.The spatial clustering results of top four GNNs in DGI module are depicted in Supplementary Fig. S2.The highest ARI in two modules are 0.49 and 0.44, respectively.Some GNNs share the same ARI values in this protein data, while MF and EG emerge as the least effective options.Despite the increased spot count in 4i data, ARI values surpass those in MIBI-TOF data (Supplementary Fig. S3D and E), credited to protein/ gene enhancement.This underscores the significance of selected informative genes or proteins in spatial clustering tasks.
Furthermore, the increasing spots and genes exacerbate computational load.4i consumes more time than MIBI-TOF (Fig. 4F and Supplementary Fig. S3F).The same top five timeconsuming GNNs appear in both GDL modules: FiLM, UniMP, GEN, MF, and SuperGAT (Fig. 4F and G), aligning with good ARI performance.For instance, VGAE includes GEN, TAG, and UniMP (Fig. 4D), and DGI features SuperGAT, GATv2, and UniMP (Fig. 4E).Consequently, GRAPHDeep can select GNNs based on distinct goals.For clustering accuracy alone, top GNNs from Fig. 4D and E are chosen.When considering performance-efficiency trade-offs, GNN candidates are detailed in Fig. 4C.Merely four GNNs overlap between the top 10 accurate and efficient GNNs in each module.SGC and GCN are the further overlaps between modules (Fig. 4C).In conclusion, DGI module is advisable for these data, and GNN selection should align with research goals, be it clustering accuracy or computational efficiency.

Benchmarking methods comparison
This section compares GRAPHDeep with two types of benchmarking clustering methods.Scanpy, BayesSpace, and stLearn are non-GNN clustering techniques.Five GNN-based clustering approaches include STAGATE, DeepST, CCST, spaGCN, and SEDR.Slide-seqV2 is a high-resolution spatial transcriptomics technology and the relevant number of sequencing genes is 4000, and thus the top 3000 highly variable genes are screened to constitute the feature matrix (Supplementary Note S1.1).MERFISH is an image-based single-cell resolution technology and can interrogate a low number of genes than Slide-seqV2.Its related number of sequencing genes is 347 and all genes are used to construct feature matrix for this data.Since stLearn and spaGCN rely on morphological images, they are excluded from analysis in Slide-seqV2 and MERFISH datasets.Comparison between GRAPHDeep and six baselines (STAGATE, DeepST, CCST, Scanpy, BayesSpace, and SEDR) is performed on Slide-seqV2 and MERFISH, with a specific focus on Sample 151673 of the DLPFC dataset.
Figure 5A depicts a segment of Slide-seqV2-based mouse hippocampus data, encompassing 26 146 cells, 4000 genes, and 14 clusters.Among these, only CA1_CA2_CA2_Subiculum and DentatePyramids groups show centralized distribution, while others exhibit dispersion.The ARI distributions for the two GDL modules are presented in Fig. 5C and D. VGAE is better than DGI module for this data and the spots' distributions of top four GNNs in VGAE module are described in Supplementary Fig. S4. Figure 5C includes DeepST, STAGATE, and SEDR, while Fig. 5D involves CCST, BayesSpace, and Scanpy.STAGATE's integration of a GAT yields accuracy akin to GATv2.SEDR's performance (ARI ¼ 0.39) closely aligns with STAGATE's (ARI ¼ 0.41).The best performance among the six benchmarks is seen in DeepST (ARI ¼ 0.54), yet some GNNs in VGAE surpass DeepST.BayesSpace (ARI ¼ 0.42) approximates CCST (ARI ¼ 0.43), both surpassing Scanpy (Fig. 5D).In this dataset, VGAE outperforms the DGI module,  Figure 5B illustrates the intersectional GNNs in VGAE and DGI modules.Chebyshev demonstrates optimal precision with the shortest computation time (Supplementary Fig. S5A).Familiar GNN overlaps from previous experiments, Cluster, SAGE, and UniMP, are evident in Fig. 5B.Supplementary Figure S5A and B displays the elapsed times for the six benchmarking techniques.Similarly, DeepST's duration surpasses STAGATE and SEDR (Supplementary Fig. S5A) due to its multiple neural networks.Comparatively, BayesSpace requires more time than CCST and Scanpy (Supplementary Fig. S5B).
Notably, the top time-consuming GNNs are consistent across both modules, GEN, SuperGAT, MF, and FeaSt.The calculative time in DGI takes longer than VGAE.This observation suggests that GDL module choice would significantly affect computational time.For MERFISH data, we observe the similar results that GRAPHDeep outperforms the six benchmarking methods (Supplementary Fig. S6).Ultimately, the recommended module and GNN for both mouse hippocampus and MERFISH data are VGAE and Chebyshev.
To encompass the additional baselines (spaGCN and stLearn), we compare GRAPHDeep and the original eight benchmark algorithms on Sample 151673 of human DLPFC   Assembling spatial clustering framework for heterogeneous spatial transcriptomics data with GRAPHDeep dataset with available morphological images.spaGCN is integrated into the VGAE module, and stLearn is integrated into the DGI module (Fig. 6A and B).DGI outperforms the VGAE module for this sample.Although DeepST closely approximates the optimal GNN (TAG) in the VGAE case, the GNNs (SGC and TAG) in DGI outperform DeepST (Fig. 6A and B).STAGATE, BayesSpace, and CCST are middling methods for this sample.ARI values of the remaining baselines do not exceed 0.5.Consequently, the recommended module and GNN for this sample are DGI and SGC.The relevant normalized mutual information scores also demonstrate these findings (Supplementary Fig. S7).These results highlight that the GNNs employed in existing benchmarking methods are suboptimal, underscoring the necessity of GRAPHDeep.The higher ARI values in our experimental results underscore the effectiveness of determining optimal GNNs using GRAPHDeep.

Discussion
This article unearths multiple insightful conclusions drawn from GRAPHDeep.Firstly, VGAE outperforms the DGI module across most spatial omics data, including 4i, Slide-seqV2, and MERFISH datasets.Secondly, the number of genes or proteins significantly influences spatial clustering algorithms as it dictates feature matrix creation.Similarly, an increase in cells/spots and groups diminishes clustering accuracy and increases computational load.Thirdly, the optimal pairing of GDL module and GNN varies for heterogeneous spatial omics data.Nonetheless,  Liu et al.

Figure 1 .
Figure 1.The schematic architecture of GRAPHDeep.(A) Practicable spatial omics data in GRAPHDeep.The data are processed as the feature matrix and adjacency matrix to import the GDL module.(B) A crowd of GNNs.In the VGAE module, these GNNs are used as encoders and decoders to mimic the similarity between two graphs.In the DGI module, these GNNs are taken as encoders to represent the positive and negative samples.(C) Two GDL modules, VGAE, and DGI.These modules could enhance the learning ability of a single GNN.(D) The embedding space of latent representations.The generated latent embeddings are applied to segment the spatial domains.(E) The feasible downstream analysis of each spatial cluster.These analytical parameters could reflect each method's clustering accuracy and declare the biological meaning of spatial domains.
Benchmarked spatial clustering methods include SEDR, STAGATE,DeepST, spaGCN, CCST, stLearn, BayesSpace, and  Scanpy.SEDR (Fu et al. 2021) introduces an unsupervised spatially embedded deep representation for spatial transcriptomics.It comprises two core components: gene expression representation via a deep autoencoder network and spatial information embedding through the VGAE module.STAGATE(Dong and Zhang 2022) integrates the GAT(Brody et al. 2021) into the fundamental autoencoder, addressing spatial coordinates and gene expression profiles.DeepST(Xu et al. 2022) employs two networks for spatially resolved transcriptomes.One is VGAE, integrating gene expression and spatial location; the other is a denoising autoencoder, producing latent representations.The distinctive GNN in spaGCN(Hu et al. 2021) is GCN, identifying spatial domains with coherent expression and histology.

Figure 2 .
Figure 2. Experiment results of GRAPHDeep for the Sample 151673 in DLPFC data.(A) Clustering metric (ARI) for 20 GNNs with DGI module.The order of GNNs is the same as that in Supplementary Table S1.(B) ARI values for 20 GNNs with VGAE module.ARI is calculated according to the predicted labels and annotated ground truth.(C) The ground truth of Sample 151673, including 6 layers and white matter, 3639 spots, and 33 538 genes.(D) Top five GNNs in the VGAE module.These GNNs are TAG, UniMP, SAGE, SGC, and GATv2.The highest ARI is 0.60.(E) Top five GNNs in the DGI module.These GNNs are SGC, TAG, SuperGAT, Higher-order, FeaSt, and GCN.The ARI of FeaSt is identical to that of GCN.The largest ARI in the DGI module is 0.65 (SGC).
achieving the highest ARI values of 0.61 and 0.53 for the two respective modules.

Figure 3 .
Figure 3. ARI values with respect to 20 GNNs in two DGL modules for 12 slices of DLPFC data.(A) ARI values of 20 GNNs in DGI module.The top five accurate GNNs are SGC, TAG, FeaSt, SAGE, and ARMA.Most of them are consistent with those for the individual Sample 151673.(B) The ARI values for 20 GNNs in VGAE module.The top five GNNs in this condition are Hypergraph, SGC, GCN, GATv2, and SAGE.Most of them are also in analogy with those for the single Slice 151673.

Figure 4 .
Figure 4. GRAPHDeep's clustering results on 4i spatial protein data.(A) The subset 4i data contain 16 742 cells and 43 proteins in 10 cell types.Compared with the MIBI-TOF data (Supplementary Fig. S3), the number of cells, proteins, and groups increase simultaneously.(B) The top five GNNs in two modules are different and only has one overlap.(C) Trade-off GNNs between VGAE and DGI modules.These trade-off GNNs are selected from the top 10 accurate GNNs and top 10 efficient GNNs.(D) ARI values in the VGAE module.(E) ARI distribution in DGI modules.These ARI are larger than those in MIBI-TOF data (Supplementary Fig. S3E).The growth of proteins would result in higher ARI values.(F) Consumed time of each GNN in the VGAE module.(G) The computational time of 20 GNNs in the DGI module.For comparison, the elapsed time in 4i data is longer than that of MIBI-TOF data because of the promotion of cells and clusters.

Figure 5 .
Figure 5.A comparison between GRAPHDeep and six benchmarking methods on Slide-seqV2-based mouse hippocampus data.(A) The subset of these data contains 26 146 cells, 4000 genes, and 14 groups.Slide-seqV2 has a near-cellular resolution.(B) Overlap top GNNs between DGI and VGAE modules.There are three intersections, Cluster, SAGE, and UniMP.(C) ARI values of 20 GNNs in the VGAE module.Three baseline methods (STAGATE, SEDR, and DeepST) are compared in the VGAE module.(D) ARI distribution of 20 GNNs in the DGI module.The VGAE module performs better than DGI.

Figure 6 .
Figure 6.A comparison analysis between GRAPHDeep and eight benchmarking approaches on the Sample 151673 in DLPFC data.(A) Four benchmarking methods are implanted into the DGI module.stLearn and spaGCN are appropriate for this sample because the morphology image is available.(B) DeepST, STAGATE, spaGCN, and SEDR are added to the VGAE module.These four baselines are all related to deep autoencoders.For this sample, the accuracy order of eight benchmarking approaches shift from larger ARI to lower ones is, DeepST, CCST, STAGATE, BayesSpace, spaGCN, SEDR, stLearn, and Scanpy.The top two GNNs in two GDL modules are SGC, TAG, and TAG, UniMP, respectively.This result denotes that GRAPHDeep could build a better spatial clustering framework than the existing algorithms for each spatial omics data. 10