SGCAST: symmetric graph convolutional auto-encoder for scalable and accurate study of spatial transcriptomics

Abstract Recent advances in spatial transcriptomics (ST) have enabled comprehensive profiling of gene expression with spatial information in the context of the tissue microenvironment. However, with the improvements in the resolution and scale of ST data, deciphering spatial domains precisely while ensuring efficiency and scalability remains challenging. Here, we develop SGCAST, an efficient auto-encoder framework to identify spatial domains. SGCAST adopts a symmetric graph convolutional auto-encoder to learn aggregated latent embeddings by integrating the gene expression similarity and the proximity of the spatial spots. This framework enables a mini-batch training strategy, which makes SGCAST memory-efficient and scalable to high-resolution spatial transcriptomic data with a large number of spots. SGCAST improves the overall accuracy of spatial domain identification on benchmarking data. We also validated the performance of SGCAST on ST datasets at various scales across multiple platforms. Our study illustrates the superior capacity of SGCAST in analyzing spatial transcriptomic data.


INTRODUCTION
In biology, the spatial concept is important as it allows us to describe interactive biological networks, where each element is influenced by its surrounding environment [1]. In particular, the coordinated gene expression in tissues can be used to uncover their functions and connections [2]. Captured locations in spatial transcriptomics (ST) data are known as 'spots'. However, early techniques such as 10X Visium [3] have limited resolution that cannot reach the cellular level: each spot of 10X Visium is 55 μm in diameter and may contain tens of cells, and the number of spots on a tissue slide is at most 5000. Recently, high-resolution ST technologies such as Seq-scope [4] and Stereo-seq [5] have been developed, enabling the profiling of gene expression at cellular/subcellular resolution. These methods generate datasets of a much larger scale, where the number of spots can exceed 100k [5]. Identification of spatial domains (regions characterized by similar expression patterns in space) has been one of the most important tasks in ST. With the explosion in the scale of ST data, scalable methods that can be applied to high-resolution ST data are in great demand.
Methods have been developed for dimension reduction and clustering of scRNA-Seq data, such as scVI [6] and ZIFA [7]. Although these methods can in principle be applied to the spot-by-gene matrix in ST data, they may lose information and efficiency because they ignore the spatial information of the spots. On the other hand, methods have been developed for ST data that incorporate the information on the spatial location of the spots. BayesSpace [8] employs a fully Bayesian method that encourages the clustering of nearby locations through a prior that stores neighborhood information. stLearn [9] employs integrative analysis to use all information, including the histology image, gene expression and spatial coordinates: it first finds cell types, then reconstructs cell types in a tissue and finally deciphers tissue regions with high cell-to-cell interactions. SpaGCN [10] utilizes a graph convolutional network to incorporate information from gene expression, the histological image and spatial coordinates, coupled with a self-supervised module to train the neural network. SEDR [11] uses a deep auto-encoder to create a low-dimensional latent representation of gene expression, which is then merged with the corresponding spatial embedding obtained from a variational graph auto-encoder. SpatialPCA [12] extends the probabilistic version of principal component analysis (PCA) by incorporating location information and explicitly representing the spatial correlation structure through a kernel matrix. BASS [13] performs cell type clustering and spatial domain detection simultaneously within a Bayesian hierarchical modeling framework. DeepST [14] uses multiple neural networks to extract information from tissue morphology and spatial location, then combines them with gene expression to generate the representation of spots. STAGATE [15] generates low-dimensional latent embeddings via a graph attention auto-encoder, which adaptively aggregates information from the neighbors of each spot. Although these methods incorporate spatial information in analyzing ST data, they rely on a graph constructed from all spots in a sample. This makes it difficult for them to handle high-resolution ST data, where the graph is large because of the large number of spots. Mini-batch training may be a remedy, where a subset of the spots is used in each iteration of the update. However, it is challenging to implement mini-batch training in the setting of spatial transcriptomic data because of the relationships between the spots in the graph. Therefore, there is a need for scalable methods for analyzing high-resolution spatial transcriptomic data with a large number of spots.
To address this need, we develop SGCAST, which is based on a symmetric graph convolutional auto-encoder for the analysis of spatially resolved transcriptomic data. SGCAST utilizes an auto-encoder with graph convolution layers, employing the ideas of Laplacian smoothing and Laplacian sharpening to construct its encoder and decoder. The auto-encoder in SGCAST incorporates the information of both the gene expression similarity and the proximity of the spatial spots, which effectively borrows information across the dataset and enables accurate identification of spatial domains. This framework enables mini-batch training, where the spots in ST data are processed in a series of mini-batches. Instead of constructing a large graph with all the spots, which can be computationally intensive, individual small graphs are constructed using the spots in each batch. This mini-batch training strategy can significantly reduce computational time and memory usage for large-scale ST data. We demonstrate the superior performance of SGCAST on multiple experimental platforms, including 10X Visium [3], Seq-scope [4] and Stereo-seq [5]. The embeddings obtained by SGCAST are mainly used for spatial clustering. In addition, the detected clusters can be effectively leveraged for downstream analysis, including the detection of differentially expressed genes and trajectory inference.

SGCAST improves the accuracy of identifying the layers in human dorsolateral prefrontal cortex
To quantitatively evaluate the performance of SGCAST on spatial clustering, we first applied it to a benchmarking dataset, a publicly available 10X Visium dataset comprising 12 slides of the human dorsolateral prefrontal cortex (DLPFC). The layers and white matter (WM) were annotated by Maynard et al. [16] based on the cytoarchitecture and gene markers (Figure 2A). Using these annotations as ground truth, we compared the accuracy of SGCAST with several existing spatial clustering approaches (stLearn [9], BayesSpace [8], SpaGCN [10], SEDR [11], SpatialPCA [12], BASS [13], DeepST [14] and STAGATE [15]). We used the adjusted Rand index (ARI) [17] as the criterion, and the results demonstrated that SGCAST identifies the cortical layers effectively and outperforms the other methods on the 12 slides of DLPFC (Figure 2C, Supplementary Figures 1, 2 and 3). In DLPFC slide 151673, SGCAST detected all the layers and achieved the highest clustering accuracy (ARI = 0.60) (Figure 2B). SpatialPCA, DeepST and STAGATE also performed well (ARI = 0.58), but they did not detect layer 2 or layer 4 and separated the WM into two clusters. Both BayesSpace and BASS did not detect layer 4, and they also separated the WM into two clusters. To verify the effectiveness of the two graph convolutional layers in SGCAST, we implemented simplified versions of SGCAST with only the layer capturing gene expression similarity (SGCAST-exp) or only the layer capturing spatial proximity (SGCAST-spa). Results in Figure 2C demonstrate that both layers are important and that the full version of SGCAST outperforms the simplified versions.
We also performed differential expression analysis to verify the biological meaning of the domains identified by SGCAST. Specifically, we used the detected clusters as the basis to find differentially expressed genes (DEGs) in each spatial domain. The detected DEGs display clear expression patterns (Figure 2E). Among these genes, PCP4 is a known marker gene in the prefrontal cortex [18]. An additional Gene Ontology (GO) analysis was performed on the detected DEGs specific to cluster 2 (orange) with P value less than 0.01 (Figure 2D). The enriched GO terms include chemical synaptic transmission, regulation of cation channel activity and anterograde trans-synaptic signaling, which is consistent with the function of the marker gene for cluster 2: CAMK2N1 is differentially expressed in cluster 2 and was shown to regulate long-term synaptic activity [19]. In addition, another marker gene for cluster 2, HPCAL1, promotes the proliferation of glioblastoma, a prevalent primary cancer with evident aggressiveness in the human brain [20].

SGCAST exhibits a layered pattern clearly mirroring the layers in the annotation of the mouse olfactory bulb
Next, we applied SGCAST to the dataset of the mouse olfactory bulb generated by Stereo-seq [5]. Stereo-seq combines DNA nanoball-patterned arrays with in situ RNA capture to create spatial transcriptomic data with subcellular resolution. The spots in this dataset were binned into a resolution of around 14 μm [5]. SGCAST was able to accurately identify the laminar organization of the various layers in the mouse olfactory bulb, including the rostral external plexiform layer (EPL), granule cell layer (GCL), glomerular layer (GL), internal plexiform layer (IPL), mitral cell layer (MCL), olfactory nerve layer (ONL) and rostral migratory stream (RMS), which matches the known anatomical characteristics seen in the DAPI-stained image (Figure 3A). In comparison with other methods such as SpaGCN, SEDR and SpatialPCA, SGCAST identified a much finer layered structure (Figure 3B). Although STAGATE and BASS also identified a clear pattern in general, STAGATE did not separate the RMS and the GCL, while BASS did not distinguish between the IPL and the GCL. BayesSpace and stLearn were not implemented for this dataset due to their high memory cost, and DeepST was not implemented because it requires a histology image.
We further analyzed DEGs among the domains identified by SGCAST (Figure 3C, D). By examining the top genes, our analysis revealed genes that display predominant expression in specific layers of the olfactory bulb tissue: Pcp4 is predominantly expressed in the IPL, Ptn in the ONL, Scg2 in the MCL and Mbp in the RMS (Figure 3C, D), which is consistent with the marker genes reported in the mouse olfactory bulb [21].
Finally, based on the spatial domains identified by SGCAST (Figure 3E), the UMAP visualization and trajectory inference by PAGA [22] showed that the olfactory bulb layers follow a clear organization from the exterior of the mouse olfactory bulb to the interior, consistent with the spatial distribution in the annotation.

SGCAST precisely identifies tissue structures in the mouse colon from Seq-scope
In addition to the 10X Genomics Visium platform and the Stereo-seq platform, we verified the effectiveness of SGCAST on a Seq-scope dataset with a spatial resolution of 10 μm generated from mouse colon [4]. Compared with the 55 μm resolution of the 10X Visium platform, Seq-scope enables the profiling of spatial expression at the cellular level with a larger number of spots, and the pixels in Seq-Scope are around 0.5-0.8 μm apart from each other. As a gastrointestinal organ, the colon consists of complex tissue layers with a histological zonation structure and diverse cellular components [23]. From a histological perspective, the colonic wall can be divided into the colonic mucosa and the external muscle layers [24]. Within the colonic wall, the colonic mucosa comprises the epithelium and lamina propria. The epithelium can be further subdivided into the crypt-base, transitional and surface layers [4]. Annotation of the gridded Seq-Scope dataset (Supplementary Figure 4) revealed transcriptome phenotypes corresponding to these layers. SGCAST effectively uncovers the tissue structures in the mouse colon, accurately deciphering the region of the lamina propria, highlighted by the red solid line in Figure 4A. This region is supported by the histology image (blue dashed curve) in Figure 4B. The clustering result given by SGCAST is close to the annotated layers (Supplementary Figure 4). BASS and STAGATE also performed well, but they were not able to clearly recover the region of the lamina propria (Figure 4A). SEDR and SpaGCN did not capture the layered pattern as well, and they also did not recover the region of the lamina propria (Figure 4A). SpatialPCA did not work well on this dataset.

SGCAST is scalable and efficient for large-scale ST data
With advances in technology, some ST platforms can measure a large number of cells at high spatial resolution and on a great scale. For instance, Stereo-seq launched MOSTA, a mouse organogenesis spatiotemporal transcriptomic atlas aimed at mapping the spatiotemporal transcriptomic dynamics of the whole developing mouse embryo [5]. Here, we applied SGCAST to three late embryonic stages, E12.5, E14.5 and E16.5 days (Figure 5A). The numbers of spots are 51k, 102k and 121k, respectively. The mini-batch training strategy employed by SGCAST allows it to be scalable and efficient, even for massive spatial transcriptomic datasets. In contrast, other spatial clustering methods such as SEDR failed due to their high memory cost, while BASS and SpatialPCA failed due to their long running times, which can exceed a day. This highlights the advantage of SGCAST's implementation for large-scale spatial transcriptomic data analysis.
We evaluated the performance of SGCAST on section E16.5. The comparison (Figure 5B) shows that SGCAST accurately identifies most tissues, including the nervous system, inner ear, muscle, heart, lung, GI tract, liver, kidney and bone, which matched the manually annotated areas provided by Stereo-seq and confirmed by visualizing specific marker genes [5]. Moreover, we analyzed DEGs between the spatial domains identified by SGCAST and discovered canonical marker genes for the organs: Ptgds [25] in the meninges, Krtdap [26] in the epidermis, Sftpc [27] in the lung and Afp [28] in the liver (Figure 5C). These results collectively demonstrate the scalability of SGCAST in identifying tissue structures from massive spatial transcriptomic data.
In addition to the evaluation of clustering performance, we also recorded the memory usage and running time of SGCAST and the other methods on real datasets (Figure 5D, E). Specifically, we implemented these methods on spatial transcriptomic data with the number of spots ranging from 3k to 121k (Supplementary Table 1). Compared with the other methods, SGCAST is highly memory-efficient due to its mini-batch training, requiring only around 0.25 GB of GPU memory on datasets of various scales (Figure 5D). The memory usage of the other methods is either linear or quadratic in the number of spots, which limits their usage for high-resolution spatial transcriptomic data with a large number of spots. In contrast, the memory usage of SGCAST does not depend on the total number of spots and mainly depends on the size of the mini-batch. Similar to SGCAST, SpaGCN is also based on a graph convolutional network. However, it is less memory-efficient than SGCAST: because SpaGCN relies on a neighboring graph constructed from all the spots, it is difficult to implement mini-batch training for it. The running time was capped at 2 h in Figure 5E, and the running time for methods that could not be run due to high memory usage is not shown. For all datasets, SGCAST took less than 20 min, significantly faster than the other methods. The running times not displayed in Figure 5E are as follows: for the mouse Seq-scope colon dataset (28 399 spots), BASS and SpatialPCA took 3.2 and 2.5 h, respectively; for the Stereo-seq mouse embryo E12.5 dataset (51 335 spots), BASS and SpatialPCA took 8.1 and 5.1 h, respectively; for the Stereo-seq mouse embryo E14.5 (102 489 spots) and E16.5 (121 764 spots) datasets, BASS and SpatialPCA took more than 12 h. To summarize, SGCAST is both memory-efficient and computationally fast when implemented on high-resolution spatial transcriptomic data with a large number of spots.

DISCUSSION
With the rapid advances in ST technology, there is an inevitable trend toward higher spatial resolution and larger data scales. In this paper, we present SGCAST, a simple and efficient framework for identifying spatial domains. First, SGCAST uses an auto-encoder structure where the encoder aggregates information to perform smoothing, while the decoder separates the information to perform sharpening. Second, SGCAST employs information efficiently by transforming the gene expression and the positions of spots into two adjacency matrices and further integrating them with the auto-encoder. Last, SGCAST utilizes the mini-batch training strategy, avoiding intermediate clustering throughout the training process, and displays superior efficiency and scalability on large ST datasets. These factors contribute to the superior performance of SGCAST, which not only accurately identifies spatial domains but also extracts spatially informative embeddings.
We demonstrated the superior performance of SGCAST on multiple ST datasets from different platforms with various spatial resolutions. SGCAST precisely revealed the layer organization of the human DLPFC from 10X Visium and the mouse olfactory bulb from Stereo-seq, and facilitates the detection of differentially expressed genes over the identified domains. On average, SGCAST achieved the highest ARI compared with the other spatial clustering methods, indicating that the spatial domains identified by SGCAST are more biologically meaningful. We also found that SGCAST accurately identified the complex tissue structures in the mouse colon. Additionally, we demonstrated the scalability of SGCAST on the embryo dataset and showed that SGCAST clearly identified the major tissue structures of embryo E16.5.
Finally, we compared the efficiency of SGCAST with other popular methods in both running time and memory cost and found that SGCAST was highly efficient.
We note that SpaGCN [10] also employs graph convolutional networks. However, there is a primary difference between SpaGCN and SGCAST: SpaGCN implements the unsupervised deep embedding (UDE) framework [29], while SGCAST implements the symmetric graph convolutional auto-encoder [30]. It is challenging to implement mini-batch training in UDE for the setting of ST data: UDE requires searching for cluster centroids and assigning cluster labels to the spots in each iteration; however, the cluster centroids for different batches are likely different, and it is difficult to align these centroids across batches. In contrast, the auto-encoder framework in SGCAST is more adaptable to mini-batch training. In addition, we also note that SEDR [11] employs two auto-encoders with graph convolutional layers. However, the structure and intuition of the auto-encoders are fundamentally different. SEDR utilizes a variational graph auto-encoder to encode spatial information and a deep auto-encoder to embed transcriptional expression, and concatenates the learned features. In contrast, SGCAST uses a symmetric graph convolutional auto-encoder to aggregate the information rather than concatenating it. This approach allows SGCAST to better capture the spatial information and improve the clustering performance. (More details are listed in the Supplementary materials.)
The input of SGCAST is the top principal components (PCs), which go through the symmetric graph convolutional auto-encoder to learn the latent embedding. To demonstrate the significance of the auto-encoder, we compared results obtained by applying mclust directly to the PCs with those using the latent embeddings learned by the auto-encoder on the DLPFC dataset. The comparison demonstrates that the clustering result using the latent embeddings from SGCAST is much better than using the PCs directly (Supplementary Figure 5), which means the auto-encoder plays a critical role in aggregating information. SGCAST uses the PCs as input, which may lead to some loss of information compared with using the whole transcriptome. However, using PCs benefits noise reduction and increases the signal-to-noise ratio in the data. Choosing the number of PCs is a balance of signal and noise: more PCs preserve more information but can also include more noise, while fewer PCs may contain less noise but lose information. We found that the top 50 PCs strike a good balance, and we use the same number of top PCs for all the datasets, including the DLPFC dataset, the Stereo-seq mouse olfactory bulb and embryo, and the Seq-scope mouse colon. In our experiments, the number of PCs has a larger impact on the human DLPFC datasets, whereas the results for the Stereo-seq mouse olfactory bulb and Seq-scope mouse colon datasets are more robust to the number of PCs (Supplementary Figures 6, 7 and 8). In addition, we ran multiple experiments with different training parameters and found that all results of SGCAST are robust to these parameters (Supplementary Figures 9 to 14). For now, the double graph convolutional layers in SGCAST capture gene expression similarity and spatial proximity. One future direction is to further incorporate the information of histological images through an extra convolutional layer in SGCAST. Although we implemented SGCAST on spatial transcriptomic datasets, it would be interesting to test the performance of SGCAST on emerging spatial epigenomic datasets [31-33].
In conclusion, SGCAST is an efficient and promising framework for learning integrated latent embeddings to decipher spatial domains. With the advent of new ST technologies, we anticipate that SGCAST can assist in uncovering new biological insights in the spatial context.

Overview of SGCAST
The core of SGCAST is a symmetric graph convolutional auto-encoder, which learns latent embeddings from spatial transcriptomic data. The symmetric graph convolutional auto-encoder combines the information of gene expression similarity and spatial proximity, through multi-view graph convolution layers, to effectively learn the latent embeddings of the spatial spots. One important feature of SGCAST is the mini-batch training strategy, which makes SGCAST memory-efficient and scalable. The latent embeddings given by SGCAST can be used for clustering, data visualization, trajectory inference and other downstream analyses. A graphical overview of SGCAST is shown in Figure 1.

Symmetric graphical auto-encoder
SGCAST first runs PCA on the preprocessed gene expression matrix of all spots and the top 50 PCs are used as the input of the symmetric graphical auto-encoder.

Mini-Batch training strategy
Previous methods [9-11] employing spatial information generally build a spatial network over the whole dataset to describe the neighborhood before training the model, which may be memory-consuming and difficult to implement on large, high-resolution spatial transcriptomic data. Unlike these methods, SGCAST regards each spot in the spatial transcriptomic data as a data point and uses a mini-batch training strategy to train the parameters in the graph convolutional layers. SGCAST generates adjacency matrices for the mini-batch picked in each iteration. Suppose the total number of spots and the size of the mini-batch are N and n, respectively; then the number of iterations in each epoch equals ⌈N/n⌉ (n is set to 2000 by default).
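As a minimal illustration of this strategy (a sketch under our own assumptions, not SGCAST's actual implementation; the function name and the once-per-epoch shuffling scheme are ours), one epoch can be organized as follows:

```python
import numpy as np

def minibatches(n_spots, batch_size=2000, seed=0):
    """Yield index arrays for one epoch of mini-batch training.

    Each yielded batch is later used to build its own small adjacency
    matrices, so no graph over all n_spots spots is ever constructed.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_spots)  # shuffle the spots once per epoch
    for start in range(0, n_spots, batch_size):
        yield idx[start:start + batch_size]
```

With N = 5000 and n = 2000 this yields ⌈5000/2000⌉ = 3 batches per epoch, the last one smaller than the rest.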

Multi-view graph convolutional layers
SGCAST aggregates information among the spots in each mini-batch through two factors: (1) the gene expression similarity of the spots, and (2) the physical proximity of the spots in the tissue slide. This is accomplished by constructing two graph convolutional layers in the encoder: one layer incorporates the gene expression similarity of the spots, and the other incorporates the spatial proximity of the spots. To achieve this, SGCAST builds an adjacency matrix for each layer, where the entries in the matrix measure the relatedness between spots in a mini-batch and are negatively associated with their distances [10]. For the layer that captures the gene expression similarity, the distance is calculated as

d_e(u, v) = ||x_u − x_v||,   (1)

where x_u and x_v are vectors of the PCs for spots u and v, respectively. For the layer that captures the spatial proximity, the distance is calculated as

d_p(u, v) = ||p_u − p_v||,   (2)

where p_u and p_v are the spatial coordinates for spots u and v. The entries in the adjacency matrices W_e = [w_e(u, v)] and W_p = [w_p(u, v)] are then computed as

w_e(u, v) = exp(−d_e(u, v)^2 / (2 l_e^2))

and

w_p(u, v) = exp(−d_p(u, v)^2 / (2 l_p^2)),

where l_e and l_p influence how quickly the weight decays as a function of distance, and are dynamically determined for each mini-batch in SGCAST. After a mini-batch is input into the framework, distances are calculated using equations 1 and 2, and the τ_e-th quantile of the distance d_e and the τ_p-th quantile of the distance d_p are determined using torch.quantile(). The values of l_e and l_p are then computed by solving

−d_{e,τ_e}^2 / (2 l_e^2) = m   and   −d_{p,τ_p}^2 / (2 l_p^2) = m,

where m is the largest integer such that torch.exp(m) = 0 numerically. This ensures that the proportions of non-zero entries among the non-diagonal elements of the adjacency matrices W_e and W_p equal τ_e and τ_p, respectively. The default values of τ_e and τ_p are set to 0.07. The rationale for this rule is to ensure that the average number of neighbors for the spots in each mini-batch is the same.
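A numpy sketch of this construction (torch.quantile is replaced by np.quantile, and the value m = -88, roughly the float32 underflow threshold of exp, is our assumption; the actual m used by SGCAST may differ):

```python
import numpy as np

def batch_adjacency(x, tau=0.07, m=-88.0):
    """Gaussian-kernel adjacency matrix for one mini-batch.

    x   : (n, d) array; PCs (for W_e) or spatial coordinates (for W_p).
    tau : target proportion of non-zero off-diagonal entries.
    m   : exponent at which exp() is treated as underflowing to zero.
    """
    # Pairwise Euclidean distances, as in equations (1)/(2).
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Bandwidth l solves -d_tau^2 / (2 l^2) = m at the tau-quantile of the
    # off-diagonal distances, so weights beyond the quantile underflow.
    off = d[~np.eye(len(x), dtype=bool)]
    l = np.quantile(off, tau) / np.sqrt(-2.0 * m)
    # Weights decay with distance; the diagonal is exp(0) = 1.
    return np.exp(-d ** 2 / (2.0 * l ** 2))
```

The same routine serves both layers, called once with the PCs and once with the spatial coordinates of the batch.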

Encoder
The encoder in SGCAST takes the top 50 PCs as input and aggregates information into the embedding according to the adjacency matrices. Let X_i be the PCs for the i-th mini-batch; the graph convolutional layer can be written as

H_i^(k) = δ(W_i^(k) H_i^(k−1) B^(k)),

where H_i^(k) is the embedding of mini-batch i generated in layer k (i.e. H_i^(0) = X_i), W_i^(k) is the adjacency matrix for mini-batch i in layer k, B^(k) is a 50×50 matrix representing the projection parameters of the k-th convolutional layer and δ(·) is the nonlinear activation function. There are two convolutional layers (i.e. k ∈ {1, 2}): the first layer aggregates information based on gene expression similarity, and the second layer aggregates information based on spatial proximity. The output embedding of the encoder is the representation of the spots used for spatial domain identification.
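Under the formula above, a two-layer encoder forward pass can be sketched in numpy as follows (ELU as the activation; applying the adjacency matrices without further normalization is a simplification of our own):

```python
import numpy as np

def elu(z):
    # ELU activation, playing the role of delta(.) in the encoder.
    return np.where(z > 0, z, np.exp(np.minimum(z, 0)) - 1.0)

def encode(x, w_exp, w_spa, b1, b2):
    """Two graph convolutional layers for one mini-batch.

    x            : (n, 50) PCs of the batch (H^(0)).
    w_exp, w_spa : (n, n) adjacency matrices W_e and W_p.
    b1, b2       : (50, 50) projection matrices B^(1), B^(2).
    """
    h1 = elu(w_exp @ x @ b1)   # layer 1: smooth over expression similarity
    h2 = elu(w_spa @ h1 @ b2)  # layer 2: smooth over spatial proximity
    return h2                  # latent embedding of the batch
```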

Decoder
The reconstruction of node features in the decoder is designed based on Laplacian sharpening, as the counterpart of Laplacian smoothing in the encoder (Supplementary note): the adjacency matrix Ŵ_i in a layer of the decoder has the form 3·I_n − W_i, where W_i is the corresponding adjacency matrix in the encoder [30, 34, 35]. H_i^(2), the output of the encoder, serves as the input for the decoder, i.e. Ĥ_i^(2) = H_i^(2). The k-th layer in the decoder reconstructs the embedding in layer k − 1 as follows:

Ĥ_i^(k−1) = δ(Ŵ_i^(k) Ĥ_i^(k) B̂^(k)),

where Ŵ_i^(k) = 3·I_n − W_i^(k). The final output of the decoder, Ĥ_i^(0), is the reconstructed PCs. To avoid overfitting, SGCAST assumes symmetry between the encoder and decoder by tying the decoder projection matrices B̂^(k) to those of the encoder.
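Continuing the numpy sketch from the encoder, the sharpening decoder could look like this; tying the decoder weights as the transposes of the encoder's B^(k), and applying the activation at the output layer, are our own assumptions:

```python
import numpy as np

def elu(z):
    return np.where(z > 0, z, np.exp(np.minimum(z, 0)) - 1.0)

def decode(h2, w_exp, w_spa, b1, b2):
    """Mirror of the encoder: W_hat = 3*I - W performs Laplacian
    sharpening, with projection matrices tied to the encoder's B^(k).

    h2 : (n, 50) output of the encoder (H^(2) = H_hat^(2)).
    """
    n = h2.shape[0]
    i_n = np.eye(n)
    h1 = elu((3.0 * i_n - w_spa) @ h2 @ b2.T)      # sharpen layer 2
    x_rec = elu((3.0 * i_n - w_exp) @ h1 @ b1.T)   # reconstructed PCs
    return x_rec
```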

Loss function and training details
The objective of SGCAST is to minimize the reconstruction loss of the PCs,

L = Σ_i ||X_i − Ĥ_i^(0)||_F^2,

summed over the mini-batches. SGCAST implements the SGD optimizer [36] to minimize the reconstruction loss. ELU [37] serves as the activation function. The number of epochs is set as 100 by default. The initial learning rate is set as 2e-1 and decreases to 1e-1 for the last 20 epochs.

Data description
SGCAST is applicable to multiple ST datasets obtained from various platforms, such as 10X Visium, Seq-Scope and Stereo-seq. In the DLPFC dataset, 12 tissue slides are annotated with DLPFC layers and WM regions, with spot counts ranging from 3498 to 4789 [16]. The processed Seq-Scope datasets for the mouse colon are grid-sampled at a spatial resolution of 10 μm, with 28 399 spots [4]. The mouse olfactory bulb data from Stereo-seq consist of 19 109 spots at a resolution of around 14 μm. The Stereo-seq embryo data from E12.5 to E16.5 have been binned into spots with a diameter of 25 μm, and the number of spots ranges from 51 335 to 121 764 [5].

Data preprocessing
ST data used by SGCAST consist of a gene expression count matrix and a two-dimensional coordinate matrix for the spots. First, we normalize the raw gene expression by library size and log-transform it using Scanpy [38]. Next, we select the top 3000 highly variable genes and apply PCA to the selected gene expression matrix. Finally, the top 50 PCs are used as the input, which is effective across all datasets in the paper.
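The pipeline (library-size normalization, log transform, highly-variable-gene selection, PCA) can be approximated without Scanpy as below; the variance-based HVG proxy and median-library scaling are simplifications of our own, not Scanpy's exact procedure:

```python
import numpy as np

def preprocess(counts, n_hvg=3000, n_pcs=50):
    """counts: (spots, genes) raw count matrix -> (spots, n_pcs) PCs."""
    counts = np.asarray(counts, dtype=float)
    # Library-size normalization followed by log1p.
    lib = counts.sum(axis=1, keepdims=True)
    x = np.log1p(counts / lib * np.median(lib))
    # Crude highly-variable-gene selection: keep the top-variance genes.
    order = np.argsort(x.var(axis=0))[::-1][:n_hvg]
    x = x[:, order]
    # PCA via SVD of the centered matrix; scores are U * S.
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return (u * s)[:, :n_pcs]
```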

Clustering and refinement
Different strategies are applied to perform spatial clustering. When the number of spatial domains is provided, SGCAST implements the mclust clustering algorithm [39] on the latent embeddings to obtain cluster labels for the spots. When prior information is lacking, the Louvain algorithm [40] is utilized for clustering; by default, it uses a resolution parameter of 1.2. The Louvain algorithm builds a graph used for clustering, and by adjusting the number of neighbors in the graph, the number of resulting clusters can be controlled. After clustering, SGCAST provides a step to refine the clustering result. During this step, SGCAST evaluates the domain assignments of each spot and its surrounding spots. If over half of the spots surrounding a given spot are assigned to a different domain, that spot is relabeled to match the majority label of its surrounding spots. We performed cluster refinement for all ST datasets.
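The refinement rule described above amounts to a majority vote over each spot's neighbors; a minimal sketch (the neighbor lists are assumed to be precomputed, e.g. from the spatial coordinates, and all spots are relabeled against the original labels rather than sequentially):

```python
import numpy as np

def refine_labels(labels, neighbors):
    """Relabel a spot when over half of its surrounding spots share a
    different domain label.

    labels    : (n,) cluster label per spot.
    neighbors : neighbors[i] lists the indices of the spots surrounding
                spot i.
    """
    labels = np.asarray(labels)
    refined = labels.copy()
    for i, nbrs in enumerate(neighbors):
        vals, counts = np.unique(labels[list(nbrs)], return_counts=True)
        top = vals[np.argmax(counts)]
        # Relabel only on a strict majority of a *different* label.
        if top != labels[i] and counts.max() > len(nbrs) / 2:
            refined[i] = top
    return refined
```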

Spatial trajectory inference
The PAGA algorithm [22] is used to generate trajectory inference results.

Identifying differentially expressed genes
The Wilcoxon test in Scanpy [38] is employed to discover DEGs for each identified spatial domain (with Benjamini-Hochberg adjustment).

Gene Ontology enrichment analysis
For the DLPFC dataset, we conducted the gene set enrichment analysis implemented in the GSEAPY package [41] to discover the enriched GO terms for spatially variable genes in the detected domain with adjusted P value < 0.01.

Key Points
• SGCAST adopts a symmetric graph convolutional auto-encoder to learn aggregated latent embeddings by integrating the gene expression similarity and the proximity of the spatial spots.
• For spatial domain identification, SGCAST consistently outperformed baseline methods on ST datasets at various scales across multiple platforms.
• SGCAST enables a mini-batch training strategy, which makes it scalable to high-resolution spatial transcriptomic data.
• SGCAST facilitates downstream analyses in biological studies, such as the detection of differentially expressed genes.

Figure 1 .
Figure 1. Overview of SGCAST. SGCAST takes the spatial coordinates and gene expression of the spots in spatial transcriptomic data as input. The PCs are first computed from the gene expression of all the spots. SGCAST implements a symmetric graph convolutional auto-encoder with mini-batch training, which makes it scalable to large datasets. The auto-encoder takes the PCs of gene expression, the adjacency matrix W_exp representing gene expression similarity and the adjacency matrix W_spa representing spatial proximity of the spots, and outputs the latent embeddings of the spots. The embeddings can be used for clustering and visualization. Additionally, the detected clusters are used as the basis for further downstream analysis, including DEG detection and trajectory inference. The main usage of SGCAST is to provide an efficient and accurate spatial domain identification method that can serve as a basis for further downstream analysis.

Figure 2 .
Figure 2. SGCAST displays accurate detection of spatial domains in the DLPFC datasets. (A) Ground truth of the layer structure in slide 151673 [16]. (B) Clustering assignments generated by stLearn, SpaGCN, SEDR, BASS, BayesSpace, SpatialPCA, DeepST, STAGATE and SGCAST in slide 151673. (C) Boxplot displaying the clustering accuracy in all 12 slides of the DLPFC dataset, measured in terms of ARI scores, for 11 different methods. The median, upper and lower quartiles, and 1.5× interquartile range in the boxplot are represented by the center line, box limits and whiskers, respectively. (D) GO enrichment analysis for differentially expressed genes (161 genes) in domain 2 (the orange layer in the result). (E) Visualization of gene expression for layer-specific genes in slide 151673.

Figure 3 .
Figure 3. SGCAST detects finer layer structure in the mouse olfactory bulb. (A) The DAPI-stained image with annotation for the Stereo-seq mouse olfactory bulb [11]. (B) Spatial domains identified by BASS, SEDR, SpatialPCA, STAGATE, SpaGCN (without histology image) and SGCAST in the Stereo-seq mouse olfactory bulb. (C) Visualization of gene expression for the detected marker genes. (D) Dot plot of the expression fraction for layer-specific genes. (E) UMAP and PAGA results, generated using the SGCAST embeddings.

Figure 4 .
Figure 4. SGCAST can decipher complex tissue structures in the mouse colon. (A) Spatial domains identified by SGCAST, STAGATE, SEDR, SpaGCN (without histology image), SpatialPCA and BASS in the Seq-scope mouse colon tissue section. (B) Underlying H&E histology of the mouse colon [4].

Figure 5 .
Figure 5. SGCAST works on large-scale ST data accurately and efficiently. (A) Spatial clustering by SGCAST on mouse embryo E12.5, E14.5 and E16.5 days. (B) Comparison of the spatial domains of major tissues identified by SGCAST with the Stereo-seq annotation on mouse embryo E16.5 days [5]. (C) Visualization of gene expression for the detected domain-specific marker genes. (D) Comparison of memory cost on the real datasets for stLearn, BayesSpace, SpaGCN, SEDR, BASS, SpatialPCA, DeepST, STAGATE and SGCAST. (E) Comparison of running time on the real datasets for stLearn, BayesSpace, SpaGCN, SEDR, BASS, SpatialPCA, DeepST, STAGATE and SGCAST.
