Zhihan Ruan, Fan Lin, Zhenjie Zhang, Jiayue Cao, Wenting Xiang, Xiaoyi Wei, Jian Liu, Pairpot: a database with real-time lasso-based analysis tailored for paired single-cell and spatial transcriptomics, Nucleic Acids Research, Volume 53, Issue D1, 6 January 2025, Pages D1087–D1098, https://doi.org/10.1093/nar/gkae986
Abstract
Paired single-cell and spatially resolved transcriptomics (SRT) data supplement each other, providing in-depth insights into biological processes and disease mechanisms. Previous SRT databases have limitations in curating sufficient single-cell and SRT pairs (SC–SP pairs) and providing real-time heuristic analysis, which hinder the effort to uncover potential biological insights. Here, we developed Pairpot (http://pairpot.bioxai.cn), a database tailored for paired single-cell and SRT data with real-time heuristic analysis. Pairpot curates 99 high-quality pairs including 1,425,656 spots from 299 datasets, and creates the association networks. It constructs the curated pairs by integrating multiple slices and establishing potential associations between single-cell and SRT data. On this basis, Pairpot adopts semi-supervised learning that enables real-time heuristic analysis for SC–SP pairs where Lasso-View refines the user-selected SRT domains within milliseconds, Pair-View infers cell proportions of spots based on user-selected cell types in real-time and Layer-View displays SRT slices using a 3D hierarchical layout. Experiments demonstrated Pairpot’s efficiency in identifying heterogeneous domains and cell proportions.

Introduction
With the rapid development of single-cell technologies [e.g. single-cell RNA sequencing (scRNA-seq)], more and more cellular expression patterns have been characterized, which allows the construction of numerous cell atlases at single-cell resolution (1). These massive atlases reveal significant cell-to-cell heterogeneity across diverse tissues and organ systems. The functions of tissues are closely related to the positional contexts in microenvironments; however, cells usually lose these contexts when they are dissociated during whole-tissue suspension (2). To effectively preserve the invaluable spatial contexts, spatially resolved transcriptomics (SRT) technologies have been developed, making it possible to uncover heterogeneous transcriptome landscapes (3).
SRT technologies fall into two categories: sequencing-based [e.g. 10× Visium (4), Slide-seq V2 (5) and Stereo-seq (6)] and imaging-based [e.g. MERFISH (7), STARmap (8) and NanoString CosMx (9)] strategies, which often require the presence of single-cell data to detect spatial contexts at the single-cell level (10). Sequencing-based technologies capture long-read transcripts in situ and then conduct next-generation sequencing in spot resolution (11). They require precisely annotated single-cell data as references to infer the cell proportions of each spot (12). Imaging-based technologies detect target sequences using fluorescent probes, which fetch hundreds of genes at most (10). They also require well-annotated single-cell data for cell-type label transfer (13). Integrating single-cell and spatial transcriptomics as pairs can facilitate discoveries in fields such as neuroscience (14) and tumor micro-environments (15). Specifically, these single-cell and spatial transcriptome data pairs (SC–SP pairs) should exhibit high similarity in terms of species, tissues, developmental stages and diseases (12).
To achieve SC–SP pairs, more and more studies have performed single-cell and SRT sequencing simultaneously on samples under the same conditions and further conducted integrative analysis (16). Besides these studies, some SRT studies used previously published single-cell data or atlases, which were generated under conditions similar to those of the SRT data, to achieve SC–SP pairs for integrative analysis (17). In addition, many SRT studies are still not supplemented with paired single-cell data. Previous SRT databases mainly focus on collecting spatial transcriptomics data [e.g. SpatialDB (18), SOAR (19), SODB (20), STomicsDB (21) and Aquila (22)] but do not curate paired single-cell data. Researchers have to frequently access corresponding single-cell data from different data resources to match SC–SP pairs, which is time-consuming. It is therefore promising to construct a database that collects currently available single-cell and SRT data and creates valuable SC–SP pairs with cell-type annotations.
Recently, some databases have considered paired single-cell and spatial transcriptomics resources [e.g. SPASCER (23), ssREAD (24), SORC (25) and CROST (26)]. Existing databases are either limited to specific fields [e.g. cancer (25) and Alzheimer's disease (24)] or mainly focus on analysis within individual slices, ignoring potential associations across multiple slices of SRT data. Analysis within individual slices may make it difficult to capture global spatial distribution patterns that span slices. These databases also do not establish potential associations between single-cell and SRT data, which may make it hard to discover cells and spots with similar biological processes and disease mechanisms. Further, these databases offer limited support for heuristic analysis driven by user interaction. They cannot dynamically personalize spatial segmentation or deconvolution according to user-selected spots (cells) in real time, making it challenging for users to identify rare cell types and regional heterogeneity in SRT data. If biologists want to personalize the integrative analysis, for example by modifying annotations in single-cell references according to their preferences, they have to spend extra time and effort reorganizing inputs, running pipelines and fine-tuning parameters repeatedly.
To deal with the challenges above, we developed Pairpot, a database with real-time lasso-based analysis tailored for paired single-cell and spatial transcriptomics. Pairpot currently curates 99 available single-cell and SRT data pairs from 299 datasets, evaluates the quality of these SC–SP pairs, and establishes their association networks. Pairpot performs in-depth pre-analysis on the curated pairs, including clustering, cell-type annotation, marker detection, cell-proportion inference and cell/domain interaction. Pairpot provides Layer-View to display multiple slices in a 3D hierarchical layout. In addition, Pairpot constructs the neighbor graphs based on integrated data to establish connections among multiple slices. Pairpot assigns signatures of different cell types from PanglaoDB (27) using UCell (28) to evaluate the relative gene expression in single-cell and SRT data. On this basis, Pairpot proposes real-time lasso-based heuristic analysis based on semi-supervised machine learning. In particular, Lasso-View refines the customized domains lassoed by users within milliseconds using the neighbor graphs from integrated data. Moreover, Pair-View infers cell proportions of spots using the cell types defined by users based on UCell signatures. Pairpot also generates code for these heuristic analyses, which allows users to integrate them with their existing pipelines. We leveraged Lasso-View to uncover similar domains across multiple slices using lassoed spots, and utilized Pair-View to precisely infer proportions of user-defined cell sub-types. These case studies demonstrated Pairpot’s efficiency in discovering heterogeneous domains and cell proportions. In summary, Pairpot provides valuable database resources and powerful heuristic analysis tools that empower researchers to streamline real-time single-cell and SRT integrative analysis.
Materials and methods
Data collection
Spatial transcriptomics and single-cell datasets were acquired and downloaded from databases such as the National Center for Biotechnology Information (29), European Bioinformatics Institute (30), China National Center for Bio-information (31) and 10× Genomics (https://www.10xgenomics.com).
Inspired by previous studies (12), we designed strategies for collecting paired single-cell datasets that hinge on the nature of the SRT studies. First, if the SRT studies themselves provide single-cell data, we collect them directly. If the SRT studies lack their own single-cell data but use previous single-cell datasets for integration analysis, we collect these corresponding single-cell datasets instead. If the SRT studies do not mention any single-cell datasets, we search for paired single-cell datasets according to the consistency of features such as species, tissues and diseases among the SRT and single-cell studies. If multiple single-cell studies share the same features, Pairpot chooses the single-cell data that provides annotation files by default. When multiple studies provide annotation files, Pairpot chooses the study containing the widest variety of cell types by default. Lastly, when all the above strategies fail, we resort to the cells of the corresponding tissues from single-cell atlases.
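The collection order described above can be read as a simple decision cascade. The following is a minimal sketch of that reading, not the curation code used for Pairpot: the `SingleCellStudy` record, its fields and the `choose_paired_study` function are hypothetical names introduced only for illustration, and exact equality of feature sets stands in for the manual consistency check.

```python
from dataclasses import dataclass


@dataclass
class SingleCellStudy:
    # Hypothetical record; the fields only illustrate the selection logic.
    name: str
    features: frozenset              # e.g. {"mouse", "brain", "healthy"}
    has_annotation: bool = False
    cell_types: frozenset = frozenset()


def choose_paired_study(own, referenced, candidates, srt_features, atlas):
    """Return (study, reason) following the collection order in the text."""
    if own is not None:
        return own, "provided by the SRT study"
    if referenced is not None:
        return referenced, "reused by the SRT study"
    # Assumption: "consistent features" is modelled as identical feature sets.
    matches = [s for s in candidates if s.features == srt_features]
    if matches:
        annotated = [s for s in matches if s.has_annotation]
        # Prefer annotated studies, then the widest variety of cell types.
        pool = annotated or matches
        return max(pool, key=lambda s: len(s.cell_types)), "matched by features"
    return atlas, "atlas fallback"
```

In practice the feature match is a curated judgment rather than a set comparison, so the sketch only fixes the order in which the options are tried.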
We constructed association networks among different datasets using Neovis.js. For constructing the word clouds, we used the term frequency-inverse document frequency strategy based on TfidfVectorizer from scikit-learn (version 1.0.2) to evaluate the word weight in the dataset summary and overall design.
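To make the word-weighting step concrete, the sketch below computes term frequency-inverse document frequency weights in plain Python. It is a stand-in written for illustration, not a call to TfidfVectorizer: it mirrors that class's default smoothed idf, ln((1 + n) / (1 + df)) + 1, and L2 normalization, but uses naive whitespace tokenization, which is an assumption. The function name `tfidf_weights` is hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {term: weight} dict per document (smoothed idf, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        weights.append({term: v / norm for term, v in raw.items()})
    return weights
```

Terms that appear in few dataset summaries receive larger weights than terms shared by most of them, which is what makes them stand out in a word cloud.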
Data curation
Quality control and preprocessing
Quality control was conducted on raw scRNA-seq and SRT data using Cell Ranger (version 7.2) and Space Ranger (version 3.0) (10× Genomics), respectively, to remove low-quality reads. For BAM format files, we converted them to raw sequencing files using bamtofastq in Cell Ranger and then performed quality control. All datasets were manually converted to AnnData format after quality control. We used Scanpy (32) (version 1.9.1) as the standard downstream analysis pipeline. Genes expressed in fewer than five cells were filtered out. Cells (spots) expressing <200 genes or showing a mitochondrial expression ratio above 5% of the total gene expression were also removed. Doublet cells in scRNA-seq data were identified and removed using Scrublet (33) (version 1.1.0). We performed log-normalization, selected 2000 highly variable genes using sc.pp.highly_variable_genes and performed principal component analysis (PCA) using sc.pp.pca(adata, n_comps=50, svd_solver='arpack').
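The gene and cell filters can be illustrated on a raw count matrix with NumPy alone. This is a sketch under stated assumptions rather than the Scanpy pipeline itself: it assumes that cells with too few detected genes are the ones discarded (the usual convention), it evaluates both filters independently on the unfiltered matrix, and the function name `qc_filter` and its parameters are hypothetical.

```python
import numpy as np


def qc_filter(counts, is_mito, min_cells=5, min_genes=200, max_mito_frac=0.05):
    """Return boolean masks (keep_cells, keep_genes) for a cells x genes matrix.

    counts  : raw counts, shape (n_cells, n_genes)
    is_mito : boolean mask over genes marking mitochondrial genes
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.asarray(is_mito, dtype=bool)

    genes_per_cell = (counts > 0).sum(axis=1)   # detected genes in each cell
    cells_per_gene = (counts > 0).sum(axis=0)   # cells expressing each gene
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Mitochondrial fraction; cells with zero counts get a fraction of 0.
    mito_frac = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)

    keep_cells = (genes_per_cell >= min_genes) & (mito_frac <= max_mito_frac)
    keep_genes = cells_per_gene >= min_cells
    return keep_cells, keep_genes
```

The filtered matrix is then `counts[keep_cells][:, keep_genes]`. In Scanpy the same thresholds are usually passed to `sc.pp.filter_genes` and `sc.pp.filter_cells`.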
Data integration and neighbor graph construction
To establish the associations among slices in SRT data, we integrated multiple samples presented in a dataset to correct batch effects using harmonypy (34) (version 0.0.9) based on 50 PCA components. Specifically for image-based data, we further integrated multiple image-based SRT samples with paired scRNA-seq data. For neighbor graph construction, assuming that the integrated dataset has n cells, we constructed the knn-based connectivity graph V(n*n) using the function sc.pp.neighbors. In the connectivity graph, each node represents a cell, and each edge represents the distance between two cells. We used a Gaussian kernel to generate the similarity matrix W, where
Here, the hyper-parameter α is 0.5 by default. Next, we built the probability transition matrix P, where Pij is the probability of label transition from cell i to cell j.
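The kernel and the transition matrix are not written out in the text here, so the sketch below assumes a common form: a Gaussian kernel W_ij = exp(-d_ij^2 / α^2) on the kNN distances, followed by row normalization P_ij = W_ij / Σ_k W_ik. Both expressions, the treatment of zero entries as "not neighbours", and the function name `transition_matrix` are assumptions made for illustration and may differ from the paper's formulas.

```python
import numpy as np


def transition_matrix(distances, alpha=0.5):
    """Build a row-stochastic matrix P from a kNN distance matrix.

    distances : (n, n) array, with 0 where two cells are not neighbours
    alpha     : kernel width (0.5 is the default quoted in the text)
    """
    d = np.asarray(distances, dtype=float)
    connected = d > 0
    # Assumed Gaussian kernel; non-neighbours keep zero similarity.
    w = np.where(connected, np.exp(-(d ** 2) / alpha ** 2), 0.0)
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-normalize so that each row can be read as transition probabilities.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
```

With this normalization, closer neighbours receive a larger share of the transition probability from each cell, and every connected cell's row sums to one.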
The dimensional reduction was performed for visualization using uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding based on the neighbor graphs. Afterwards, we used MENDER (35) to perform multi-scale cellular context representations of SRT data.
Clustering and cell annotation
Pairpot segments the spatial domains in multiple slices using MENDER (35) based on the multi-scale cellular context representations. For scRNA-seq data, Pairpot clusters the cells using the Leiden algorithm with the default resolution. Markers are detected using sc.tl.rank_genes_groups(adata, groupby='leiden', method='wilcoxon').
Pairpot uses different strategies to replicate annotations for single-cell data in the following scenarios. When the original study provides a publicly available annotation file, Pairpot directly imports it and verifies its integrity, maintaining a direct alignment with the source data. When annotation files are not available, Pairpot refers to the markers and cell names provided by the original studies and replicates annotations manually. In the replicating process, existing publications may provide detailed annotations for the clusters relevant to their research topics, while other clusters are loosely annotated (36). The annotations of the clusters irrelevant to the original research topics may also be valuable for other researchers. We annotate these clusters by matching their marker genes with existing canonical markers to improve annotation accuracy. When studies lack cell-type names or markers, we search for the potential cell types according to the organs and species from existing cell marker databases [e.g. PanglaoDB (27) and CellMarker 2.0 (37)], and then perform manual annotation in the same way as in the scenario above.
UCell assignment
Evaluating signature scores in both single-cell and SRT data helps elucidate cell-type proportions and functions in cells and spots, and therefore establish potential associations between single-cell and SRT data. Pairpot uses UCell (28) to assign signature scores and evaluate cell types for the datasets lacking original annotations. Since UCell scores only depend on relative gene expression, they are also fundamental parameters for Pair-View to perform real-time cell proportion inference in SRT data.
For an SC–SP pair, we first found the potential cell types $\mathcal{C}=\lbrace c_1, c_2, ..., c_k\rbrace$ and their corresponding gene signatures $\mathcal{S}=\lbrace S_1, S_2, ..., S_k\rbrace$ from PanglaoDB (27) based on the species and tissues. Then we evaluated the UCell scores (28) of the cell-type gene signatures in $\mathcal{S}$. Assuming that a dataset has m genes after pre-processing, the gene expression of a cell or spot xi is Gi = {gi1, gi2, ..., gim}. Assuming a cell-type gene signature set has n genes S = {s1, s2, ..., sn}, S ⊆ Gi and the corresponding gene rank score is ri1, ri2, ..., rin, the UCell score of xi is
where
Here, U is the Mann–Whitney U statistic (bounded by 0 and Umax), and maxRank is a hyperparameter (default to 1500), which sets limitations to the rank of the top genes. Rank score Rij larger than maxRank would be set to maxRank. Rmax and Rmin are the max and min rank sum scores, respectively.
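Since the score formula is not reproduced in the text above, the following sketch uses the rank-based form commonly given for UCell: U = Σ r − n(n + 1)/2 over the signature genes and score = 1 − U / (n · maxRank), with ranks capped at maxRank as described here. Treat it as an illustration under that assumption, not as the UCell package or the paper's exact normalization; ties are broken by sort order, and the function name `ucell_score` is hypothetical.

```python
def ucell_score(expression, signature, max_rank=1500):
    """Rank-based signature score for one cell or spot.

    expression : dict mapping gene name to expression value
    signature  : iterable of gene names defining the cell-type signature
    """
    # Rank 1 is the most highly expressed gene; ranks are capped at max_rank.
    ranked = sorted(expression, key=lambda gene: -expression[gene])
    rank = {gene: min(i + 1, max_rank) for i, gene in enumerate(ranked)}

    genes = [gene for gene in signature if gene in rank]
    n = len(genes)
    if n == 0:
        return 0.0
    # Mann-Whitney U statistic of the signature genes' ranks.
    u = sum(rank[gene] for gene in genes) - n * (n + 1) / 2
    return 1.0 - u / (n * max_rank)
```

A signature whose genes are the top-ranked genes of a cell scores 1, and the score falls as its genes move down the ranking, which is why it depends only on relative expression within the cell.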
Deconvolution
Deconvolution was performed using Cell2Location (12) (version 0.1.3), RCTD (38) (version 2.0.0), CARD (39) (version 1.0), SpaTalk (40) (version 1.0), Seurat (41) (version 5.1.0) and CellTrek (42) for integrative analysis to infer the cell proportions of each spot using pre-defined cell type annotation results.
Cell communication
We employed CellphoneDB (43) (version 5.0), CellChat (44) (version 1.5.0), iTALK (https://github.com/Coolgenome/iTALK), COMMOT (45) (version 0.0.3) and SpaTalk (40) (version 1.0) for cell interaction inference leveraging the clustering and cell-type annotation results, which were used to analyze communication patterns and mechanisms between tissue regions and neighboring cell types.
Real-time heuristic analysis
Real-time heuristic analysis refers to the rapid (usually within milliseconds) processing of analytical tasks leveraging user-provided prompts, enabling timely responses. In Pairpot, we developed two real-time heuristic analysis modules: Lasso-View, which refines user-selected spatial domains or cells; and Pair-View, which infers cell proportions of spots using the user-defined cell types.
Inspired by the label propagation algorithm (46), Lasso-View aims to propagate the prior labels according to the connectivity among cells. During data curation, Pairpot evaluates similarity using the Gaussian kernel function and generates the probability transition matrix (P) for each dataset according to formula (2). Each element Pij in P represents the probability of transferring from the cell (spot) i to j. During online analysis, based on annotations provided by Pairpot, users can generate customized annotations by selecting new cell types using lasso tools. Lasso-View generates the prior labels by masking 90% of the cells annotated by Pairpot and appends the cells selected by users. Assuming that the prior labels have m candidates {l1, l2, …, lm}, Lasso-View generates the label probability matrix Yn*m, whose element Yij represents the probability of classifying cell (spot) i as label lj, 0 ≤ Yij ≤ 1. Yn*m is decomposed into YLk*m containing labeled cells, and YU(n − k)*m containing unlabelled ones, n denotes the number of cells and k denotes the number of labeled ones. Lasso-View initializes the label probability matrix Y0 as follows:
Here, 1 ≤ i ≤ k in YL0, k + 1 ≤ i ≤ n in YU0, 1 ≤ j ≤ m. In YL0, each of its elements YL0ij is set to 1 if the label of cell i is lj. In YU0, all initial values are set to 1/m, indicating that each cell (spot) has an equal probability of being classified as any label candidate. During the iteration, Lasso-View calculates a new label probability matrix Yt by multiplying the probability transition matrix P with Yt − 1 and then normalizing the result. In epoch t,
Here, the normalize function denotes that each element in YUt’ is divided by the sum of all the elements in its row. After each iteration, Lasso-View evaluates the L1-norm of the subtraction between Yt and Yt − 1. Based on our experimental evaluations (Supplementary Figure S1), we determined that the termination point most suitable for Lasso-View is at either 1000 epochs or upon reaching a convergence at
After termination, for each cell (spot) i, Lasso-View assigns to it the label with the highest probability in Yi,
Indeed, a few of the cells selected by users may be dissimilar from the major selected cells. After the Lasso-View process, we rectify the label of each lassoed cell with the most frequent label among its k-nearest cells. Ultimately, the result will be displayed in the right chart of Lasso-View.
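The propagation loop described in this section can be sketched in a few lines of NumPy. This is an illustrative reading, not the C++ implementation behind the Lasso-View API: it assumes that labelled rows are clamped to their prior after every step (as in standard label propagation), it omits the final k-nearest-neighbour rectification, and `tol` is a hypothetical convergence threshold because the value used by the authors is not given in the text here.

```python
import numpy as np


def propagate_labels(P, prior, n_labels, max_epochs=1000, tol=1e-6):
    """Propagate prior labels over a row-stochastic transition matrix P.

    prior : length-n integer sequence, label index for labelled cells, -1 otherwise
    Returns (assigned_labels, Y) where Y is the final label probability matrix.
    """
    prior = np.asarray(prior)
    labelled = np.flatnonzero(prior >= 0)

    # Unlabelled rows start uniform (1/m); labelled rows are one-hot.
    Y = np.full((P.shape[0], n_labels), 1.0 / n_labels)
    Y[labelled] = 0.0
    Y[labelled, prior[labelled]] = 1.0
    clamped = Y[labelled].copy()

    for _ in range(max_epochs):
        Y_next = P @ Y
        Y_next /= Y_next.sum(axis=1, keepdims=True)   # row-wise normalization
        Y_next[labelled] = clamped                    # assumed clamping step
        change = np.abs(Y_next - Y).sum()             # L1 norm of the update
        Y = Y_next
        if change < tol:
            break
    return Y.argmax(axis=1), Y
```

On a graph with two tightly connected groups and one labelled cell in each, the unlabelled cells inherit the label of the labelled cell in their own group, which is the behaviour Lasso-View relies on to extend a lassoed selection.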
Pair-View aims to infer cell proportions of spots using the user-selected cells from single-cell data in real time. Inspired by SPOTlight (47), Pair-View performs non-negative least squares (NNLS) regression using UCell scores from user-defined cell types and spots in SRT data. During online analysis, users select cells in the single-cell data, and Pair-View assigns the user-selected cell type u to these cells. Pair-View then generates the prior labels by appending the cell type u to the previously annotated cell types. Assume that the SRT data have n spots, the single-cell data have m cell types (including the user-selected cell type) and their pre-built UCell scores cover k gene signature sets. Pair-View constructs a non-negative linear regression model to infer cell proportions in spatial transcriptomics using the user-selected cells in scRNA-seq data.
Here, Ysp is the n × k UCell score matrix of all spots in SRT data, and Xsc is the m × k matrix of mean UCell scores of all cell types in single-cell data. Each UCell score is defined by formula (3). W is the n × m coefficient matrix, whose element wij denotes the weight of cell type j in spot i. The coefficient matrix W is inferred by LinearRegression(positive=True) in scikit-learn (version 1.3.2). Pair-View returns the weights of the user-selected cell type u in all spots as the cell proportions in SRT data.
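Given the shapes stated above (Ysp is n × k, Xsc is m × k, W is n × m), the regression can be read as Ysp ≈ W · Xsc with W ≥ 0. The sketch below solves that problem with projected gradient descent in NumPy. This solver is a swapped-in stand-in chosen to keep the example dependency-free, not the scikit-learn estimator named in the text; it fits no intercept, and the function name `infer_proportions` and the iteration count are assumptions.

```python
import numpy as np


def infer_proportions(y_sp, x_sc, n_iter=500):
    """Non-negative least squares: minimize ||y_sp - W @ x_sc||^2 with W >= 0.

    y_sp : (n, k) UCell scores of spots
    x_sc : (m, k) mean UCell scores of cell types
    Returns W with shape (n, m).
    """
    y_sp = np.asarray(y_sp, dtype=float)
    x_sc = np.asarray(x_sc, dtype=float)

    gram = x_sc @ x_sc.T                             # (m, m)
    target = y_sp @ x_sc.T                           # (n, m)
    step = 1.0 / np.linalg.eigvalsh(gram).max()      # 1 / Lipschitz constant

    w = np.zeros((y_sp.shape[0], x_sc.shape[0]))
    for _ in range(n_iter):
        grad = w @ gram - target                     # gradient of the half squared error
        w = np.maximum(w - step * grad, 0.0)         # project onto W >= 0
    return w
```

The column of W that corresponds to the user-selected cell type u is then what would be displayed as that cell type's proportion across spots, for example `infer_proportions(y_sp, x_sc)[:, u_index]` with a hypothetical column index `u_index`.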
Database construction
The front-end frameworks were constructed with React.js (version 18.1.2) and ant-design (version 5.19.3). The real-time analysis modules were constructed with Echarts (version 5.4.3). The backend was built using Flask (version 3.1.0) and Python (version 3.10.12). Pairpot used SQLite3 (version 9.6) to store the metadata of publications and datasets. Neo4j (version 5.22.0) was used to store the relationships among datasets. Nginx (v1.22.0) was used as the reverse proxy server. Currently, Pairpot has been tested successfully in the following browsers: Google Chrome (v114.0), Safari (v17.5) and Firefox (v114.0).
Database contents and usage
Overview of Pairpot
Pairpot is a database tailored for paired single-cell and spatial transcriptomics data, providing data deployment, interactive data exploration and lasso-based real-time analysis (Figure 1). Pairpot collected 299 SRT datasets from 251 studies and generated 99 SC–SP pairs across 17 species and 25 spatial transcriptomics technologies, including 1,180,181 cells and 1,425,656 spots (Figure 1A). Pairpot provides association networks to establish connections among different single-cell and SRT studies, and word clouds to highlight the focal points of the currently available datasets in the research community. Pairpot also introduces a ‘Pair Score’ to evaluate the quality of each SC–SP pair.

Figure 1. Overview of Pairpot. (A) Data Collection of Pairpot. (B) Data Curation of Pairpot, including multi-sample integration, neighbor graphs construction, clustering and annotation, marker genes detection, module scores assignment, deconvolution and cell/domain communication evaluation. (C) Real-time analysis and data exploration of Pairpot.
Pairpot establishes the associations among SRT slices in a study by constructing multi-scale neighbor graphs of multiple slices. The neighbor graphs enable users to heuristically explore and identify similar cells (spots) in other samples after selecting cells in one sample. Pairpot also provides cell-type annotations of single-cell data and their cell proportions in SRT data. Moreover, Pairpot establishes the associations among single-cell and SRT data by evaluating their UCell scores, which enables users to infer the proportions of their customized cell types in spots of SRT data (Figure 1B). Each SRT dataset in Pairpot contains a gene expression matrix and the spatial coordinates for each spot. Each single-cell dataset contains cells with precise cell-type annotations, which are randomly downsampled to 3000 cells in order to accelerate web-page loading while maintaining enough biological information. Both single-cell and SRT datasets are curated into a unified data format based on AnnData (48). On this basis, Pairpot provides real-time analysis including Layer-View, Lasso-View and Pair-View, which support heuristic analysis according to user interaction patterns (Figure 1C).
Pairpot contains five modules: the data search and browse module, the Lasso-View module, the Layer-View module, the Pair-View module and the analysis results exploration module (Figure 2). Users can access each module using the navigation bar on the top of the Pairpot home page.

Figure 2. User interfaces and usages of modules in Pairpot database. (A) Data Search and Browse. (B) The Lasso-View module. (C) The Layer-View module. (D) The Pair-View module. (E) Analysis results exploration, including marker heatmaps, cell/domain interaction networks and ligand-receptor pair heatmaps.
Data search and browse
Pairpot enables efficient navigation through the extensive collection of datasets, which provides convenient access to meta-information, and literature resources (Figure 2A). Users can search for datasets by any words of interest, or apply filters by clicking preset keywords in species, tissues, technologies and diseases. Afterwards, they can take an overview of statistics and word clouds about the filtered results by browsing through an extensive list or association networks of all single-cell and SRT datasets. In the extensive list, each item contains a title, summary and a series of action buttons. By clicking ‘Visualization’, users can access the paired dataset, the static analysis results and the lasso-based real-time analysis module. By clicking ‘Descriptions’, users can access detailed meta-information and links to original datasets in each item. In the association networks, each node denotes a dataset and its color denotes the dataset’s species. Nodes with self-loops in the network represent datasets that contain SC–SP pairs. Paired single-cell and SRT datasets are linked by undirected edges. Users can click the nodes to explore detailed information and access the analysis modules. In brief, Pairpot provides visual insights into dataset relationships and characteristics.
Lasso-view
Lasso-View is a heuristic analysis that discovers omitted cells (spots) similar to the lassoed ones in both single-cell and SRT datasets based on semi-supervised machine learning (Figure 2B). Users select cells (spots) of interest as a cluster (domain), then Lasso-View discovers unselected cells similar to those in the user-defined cluster and removes the cells mistakenly selected into the cluster (Figure 3). During data curation, the probability transition matrix of each dataset was generated by Pairpot in advance (Figure 3A). During online analysis, users can generate customized annotations by adding new cell types using lasso tools. Then, Lasso-View masks 90% of cell (spot) labels, appends the user-selected labels, initializes the label probability matrix (Figure 3B) and enters iterations (Figure 3C). After the iterations terminate, Lasso-View highlights cells similar to the new cell type with rectification (Figure 3D). Combined with neighbor graphs constructed from integrated datasets, Lasso-View can further discover similar spatial domains across multiple slices based on user-selected domains in a slice. Users can click the lasso tools at the top right and then select their cells of interest in the left chart. Lasso-View provides various select modes and tools for users to zoom, draw and erase their customized cells. Lasso-View also offers predefined annotations of spatial domains and cell annotations, which serve as a reference guide for users to select cells (spots) of interest. After renaming the selected cells, users can click ‘Refine’ to generate refined annotations by calling the Lasso-View API on the server. The Lasso-View API is optimized with C++, ensuring efficient processing of single-cell and SRT datasets with millisecond-level response times. The refined annotations can be confirmed and downloaded through their corresponding buttons, facilitating the sharing of refined results among collaborators.

Figure 3. Schema of Lasso-View. (A) Data preparation for Lasso-View. Pairpot evaluates the probability transition matrix Pn*n for each dataset, n denotes the number of cells in the dataset. Nodes denote cells and colors denote their corresponding labels provided by Pairpot. (B–D) Real-time analysis in Lasso-View. (B) Initialization of label probability matrix. Users generate customized annotations using Lasso tools. Then Lasso-View masks 90% of cell (spot) labels, appends the user-selected labels and evaluates the initial label probability matrix. Each column in Y corresponds to a cell-type label. (C) Iterations of Lasso-View. (D) Labels assignment and rectification. The highlighted element in the converged matrix denotes the highest probability of each row. In the output of Lasso-View, Cells 6 and 7 are similar to the user-selected cells, and Cell 3 is rectified.
Layer-view
Inspired by 3D Landscape in De-spot (49), Pairpot develops Layer-View to dynamically display gene expressions and annotations of multiple slices in 3D hierarchical layouts (Figure 2C). Leveraging the integrated neighbor graphs, the spatial domains are segmented across multiple slices using MENDER (35). Users can explore multiple slices of a study in the left chart, or focus on a single slice in the right chart. In the 3D layout, the x-axis and y-axis denote the spatial coordinates of the slices, while the z-axis represents different batches of these slices. Users can rotate the axis to switch perspectives, click ‘Inverse’ to hide all annotations, and highlight specific domains by clicking their legends.
Pair-view
Pair-View is another heuristic analysis that helps users quickly infer cell proportions of spots using their customized cells from single-cell data in real time (Figure 2D). The interface of Pair-View allows users to view SRT data (right) and paired single-cell data (left) simultaneously. Users can use lasso tools to select cells of interest in the single-cell chart, similar to Lasso-View. They can also use options in ‘scConfigs’ to select customized cell types under different annotations and embeddings. After selecting customized cell types, users can click the ‘Deconv’ button to call the Pair-View API. The Pair-View API performs NNLS online based on the pre-analyzed UCell scores, which use the Mann–Whitney U statistic to evaluate the relative gene expression in single-cell and SRT data (see the ‘Materials and methods’ section). The inferred cell proportions are subsequently displayed in the SRT chart. Users can further explore cell proportions in different batches through the options in ‘spConfigs’.
Analysis results exploration
Pairpot also provides diverse pre-analysis results for user exploration (Figure 2E). Users can click a cell type in the left chart of Pair-View and then its proportions will be displayed in the right chart accordingly. Users can access spatial markers of domains through a marker heatmap, where domains are hierarchically clustered. When hovering over the scatters in the Marker Table, a tooltip will pop up with their cell types, marker names, average expressions and fractions. When clicking the ‘Rotation’ button, the marker table will exchange the x-axis and y-axis and provide a scalable slider to navigate through areas of interest. Users can browse interaction networks and ligand-receptor pairs for both single-cell and SRT data. For spatial domain interactions of SRT data, users can see the spatial locations of clusters by clicking nodes in the spatial domain interaction networks. For cell interactions of single-cell data, users can see the locations of cells in UMAP embeddings alongside. When hovering over a clustering node, the tooltip will pop up with main cell types, interaction counts and highly variable genes. In addition, users can select the provided legends to re-render the interaction networks. For ligand-receptor pair heatmaps, users can select the results generated by different interaction inference methods as preferences.
Case study
We explored the embryonic mouse brain spatial transcriptomics data (STDS0000235), which contains four slices with 4597 spots and 17 different domains (17). Its corresponding single-cell data (SCDS0000001) are from dorsal and ventral E13.5 embryonic brains of a previously published dataset (50), and are further annotated into 11 cell types including astrocytes, neurons, neuroblasts, pyramidal cells, platelets and erythroid cells (Figure 4).
Figure 4. Case study of SC–SP pairs containing multiple mouse brain slices in Pairpot. (A) Lasso-View case 1. Left, the spatial chart where users select the spots of interest in a slice. Middle, the refined domains of multiple slices in spatial embedding. Right, the refined domains of multiple slices in UMAP embedding. (B) Lasso-View case 2. Left, the single-cell chart where users select the cells of interest. Right, the refined cells in UMAP embedding. (C) Execution time of the Lasso-View API on current single-cell and SRT data [bars indicate the mean ± standard error of the mean (SEM)]. (D, E) The adjusted Rand index (ARI) of Lasso-View tested on single-cell (D) and SRT (E) data with 90% of annotations masked and a 0–0.95 error rate in the 10% known annotations (bars indicate the mean ± SEM). The green line indicates the mean ARI of Lasso-View without rectification. (F) Pair-View identifies the proportions of user-selected cells in multiple mouse brain slices. Left, original annotations of single-cell data. Right, cell proportions of neuroblast 2 in slices Control 02, Control 05, Control 06 and Control 09, pre-defined by Pairpot. (G) Pair-View case 1. Left, the single-cell chart where users select neuroblast 2 subtype 1. Right, cell proportions of neuroblast 2 subtype 1 evaluated by Pair-View. (H) Pair-View case 2. Left, the single-cell chart where users select neuroblast 2 subtype 2. Right, cell proportions of neuroblast 2 subtype 2 evaluated by Pair-View.
We conducted case studies to evaluate the capability of Lasso-View to discover biological insights. In Lasso-View case 1, we selected spots in a lateral ganglionic eminence (LGE) domain of slice Control 02 (Figure 4A, left). After calling the Lasso-View API, the refined domains were highlighted not only in slice Control 02 but also in the LGE domains of the other slices (Figure 4A, right). In addition, spots mistakenly selected in the cortex regions were removed. In Lasso-View case 2, we selected part of the interneuron cells in the single-cell chart, and after calling the Lasso-View API the refined cells covered the interneurons (Figure 4B). These results demonstrate that Lasso-View can discover similar domains across different slices and remove unrelated spots.
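The refinement behaviour described above (adding similar unselected spots and dropping mistakenly selected ones) can be illustrated with a generic neighbour-voting scheme on a graph. This is only a minimal sketch under stated assumptions: the function `refine_selection`, the toy graph and the 0.5 threshold are hypothetical, and the code is a stand-in for the idea, not Pairpot's C++ Lasso-View implementation.

```python
def refine_selection(neighbors, selected, threshold=0.5, max_rounds=10):
    """Refine a lasso selection by synchronous neighbour voting.

    neighbors: dict mapping each node id to a list of adjacent node ids
               (a hypothetical stand-in for a spot/cell neighbour graph).
    selected:  iterable of node ids picked by the user's lasso.
    A node ends up selected when more than `threshold` of its neighbours
    are selected, so isolated mistakes drop out and gaps are filled in.
    """
    current = set(selected)
    for _ in range(max_rounds):
        updated = set()
        for node, adjacent in neighbors.items():
            if not adjacent:
                # No neighbours means no evidence either way; keep the state.
                if node in current:
                    updated.add(node)
                continue
            support = sum(1 for other in adjacent if other in current) / len(adjacent)
            if support > threshold:
                updated.add(node)
        if updated == current:
            break  # the selection is stable
        current = updated
    return current


# Toy graph: two 4-node cliques (0-3 and 4-7) joined by the edge 3-4.
toy_graph = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
# The lasso misses node 3 and wrongly includes node 4.
refined = refine_selection(toy_graph, selected=[0, 1, 2, 4])
```

In this toy example the missed node 3 is added and the stray node 4 is removed, which mirrors the behaviour reported for the LGE and cortex spots.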
To evaluate the accuracy and response speed of Lasso-View, we randomly selected cells and called the Lasso-View API 10 000 times to simulate user behavior. As shown in Figure 4C, the average processing time of the Lasso-View API was <200 milliseconds for single-cell data and about 500 milliseconds for SRT data. We masked 90% of the labels and randomly replaced a fraction of the remaining 10% with errors, using the result as pseudo user-selected labels for Lasso-View. We used these pseudo-labels to predict the labels of the masked cells and evaluated the ARI for each cell type. The average ARI for single-cell data was over 0.8 in the non-error situation and remained above 0.7 even when the error rate was up to 0.6 (Figure 4D). For spatial transcriptome data, the average ARI reached 0.85 in the non-error situation and remained above 0.7 when the error rate was up to 0.65 (Figure 4E). Additionally, we tested the rectification option. Lasso-View with rectification maintained significantly better ARI across different error rates (Figure 4D and E). These results show that Lasso-View can accurately discover cells similar to the user-selected ones and sensitively rectify errors in the user selection within milliseconds.
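The masking protocol above can be sketched in a few lines. The example below is a hedged illustration only: the helper names `make_pseudo_labels` and `adjusted_rand_index`, the seed and the toy labels are assumptions introduced here, and the prediction step performed by Lasso-View itself is not reproduced.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same items."""
    total = comb(len(truth), 2)
    if total == 0:
        return 1.0  # fewer than two items: nothing to disagree on
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # both labelings are trivial and identical in structure
    return (sum_cells - expected) / (max_index - expected)


def make_pseudo_labels(labels, mask_rate=0.9, error_rate=0.0, seed=0):
    """Mask `mask_rate` of the labels (None) and corrupt `error_rate` of the rest."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    order = list(range(len(labels)))
    rng.shuffle(order)
    n_known = len(labels) - int(round(mask_rate * len(labels)))
    n_wrong = int(round(error_rate * n_known))
    pseudo = [None] * len(labels)
    for rank, i in enumerate(order[:n_known]):
        if rank < n_wrong:
            # Replace the true label with a different, randomly chosen class.
            pseudo[i] = rng.choice([c for c in classes if c != labels[i]])
        else:
            pseudo[i] = labels[i]
    return pseudo


# Toy annotation with two cell types; 10% stay visible, half of those are wrong.
true_labels = ["a"] * 50 + ["b"] * 50
pseudo_labels = make_pseudo_labels(true_labels, mask_rate=0.9, error_rate=0.5, seed=1)
```

A predictor under test would receive `pseudo_labels`, fill in the `None` entries, and be scored with `adjusted_rand_index` against `true_labels` on the masked items; repeating this over a grid of error rates would give curves of the kind shown in Figure 4D and E.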
We also used real cases to evaluate the capability of Pair-View to infer cell proportions from user-selected cells in single-cell data. Neuroblast 2 cells were originally detected in the dorsal and ventral embryonic brain (Figure 4F). To detect the proportions of neuroblast 2 more precisely, we manually split neuroblast 2 into subtype 1 and subtype 2 using the lasso tools. Subtype 1 contains more cells from the dorsal batch, while subtype 2 contains more cells from the ventral batch. After calling the Pair-View API, we found that subtype 1 was correctly detected in the cortex of the dorsal embryonic brain, corresponding to the expression of TBR1 (Figure 4G). In contrast, subtype 2 was detected in the LGE region of the ventral brain, associated with GAD2 (Figure 4H). The locations of both subtypes remain consistent across the four slices. These results demonstrate that Pair-View can perform precise cell proportion inference online and significantly improve the efficiency of exploring paired datasets.
Conclusions and future developments
We presented Pairpot, a database with real-time lasso-based analysis tailored for single-cell and SRT pairs. The database aims to ease the process of searching for SC–SP pairs and to facilitate the application of paired single-cell and SRT data within the research community. Pairpot collects available SC–SP pairs and constructs the network associations among them. Pairpot evaluates the quality of SC–SP pairs, integrates different slices, establishes potential associations between single-cell and SRT data, and offers advanced real-time heuristic analysis modules to promote understanding of biological processes and disease mechanisms. In particular, Lasso-View refines the user-selected cells by discovering unselected cells similar to those in the user-defined cluster and removing mistakenly selected cells. Lasso-View is optimized in C++ and can respond within milliseconds. Pair-View infers cell proportions of spots using the user-selected cells from single-cell data. Both Lasso-View and Pair-View run in real time in online scenarios, considerably reducing the extra work required of biologists in segmentation and deconvolution and thereby significantly improving the efficiency of personalized analysis. Pairpot is valuable to biologists not only because it offers easy-to-access paired datasets but also because its lasso-based modules provide a new perspective for developing efficient online computational platforms based on pre-analyzed information and user behavior. Pairpot is also useful for researchers seeking potential biological insights because it provides both overall observations of multiple slices in a 3D layout and precise real-time lasso-based analysis of the cells or domains of interest to the user.
Pairpot’s intuitive interface and heuristic analysis modules empower users to tailor their workflows according to their specific needs. For researchers who prefer to download the data and conduct analyses offline, Pairpot generates code for the online heuristic analysis and alternative bioinformatics tools, allowing users to replicate the analysis results offline. Researchers who would like to analyze their own paired single-cell and SRT data can integrate the interfaces provided by Pairpot into their pipelines, including preprocessing, data integration, neighbor graph construction and UCell score evaluation. For researchers without programming experience, Pairpot provides valuable resources and powerful tools online to streamline SRT integrative analysis and enhance the understanding of biological processes and disease mechanisms.
We will continue to update Pairpot by collecting newly published SRT data with annotations and matching them with paired single-cell data. In addition, we plan to construct SC–SP pairs by matching an SRT dataset with multiple single-cell datasets simultaneously. Since single-cell and spatial transcriptomics are both fast-evolving fields, we plan to continuously provide results generated by newly developed alternative tools and to develop more downstream analysis modules to facilitate biomedical research.
Data availability
The Pairpot website is available online at http://pairpot.bioxai.cn. Its data resources are also freely available at http://src.bioxai.cn. The source code of Pairpot is available at GitHub: https://github.com/lyotvincent/Pairpot and Zenodo: https://doi.org/10.5281/zenodo.13919102.
Supplementary data
Supplementary Data are available at NAR Online.
Acknowledgements
Author contributions: J.L. conceived and designed the project. Z.R., F.L., Z.Z., J.C. and X.W. curated the database. Z.R., Z.Z. and J.L. developed the approach and conducted analysis pipelines. Z.R., F.L. and W.X. developed the web platforms. Z.R. and J.L. wrote the manuscript with input from all authors. All authors read and approved the final manuscript.
Funding
National Key Research and Development Program of China [2020YFA0908700 and 2020YFA0908702]; National Natural Science Foundation of China [62272246].
Conflict of interest statement. None declared.
Author notes
The first three authors should be regarded as Joint First Authors.