Abstract

Recent advancements in single-cell RNA sequencing (scRNA-seq) technology have enabled the comprehensive profiling of gene expression patterns at the single-cell level, offering unprecedented insights into cellular diversity and heterogeneity within plant tissues. In this study, we present a systematic approach to construct a plant single-cell database, scPlantDB, which is publicly available at https://biobigdata.nju.edu.cn/scplantdb. We integrated single-cell transcriptomic profiles from 67 high-quality datasets across 17 plant species, comprising approximately 2.5 million cells. The data were collected from public databases and subjected to manual curation, strict quality control and standardized processing. scPlantDB offers interactive visualization of gene expression at the single-cell level, supporting both single-dataset and multiple-dataset analyses. It enables systematic comparison and functional annotation of markers across diverse cell types and species while providing tools to identify and compare cell types based on these markers. In summary, scPlantDB serves as a comprehensive database for investigating cell types and markers within plant cell atlases. It is a valuable resource for the plant research community.

Introduction

In recent years, the field of plant single-cell research has witnessed significant advancements, revolutionizing our understanding of plant biology at the cellular level (1). With the advent of high-throughput sequencing technologies and the development of innovative analytical tools, researchers have been able to delve into the intricacies of plant cellular heterogeneity, gene expression dynamics and regulatory networks. One notable advance in plant single-cell research is the expansion of single-cell transcriptomics studies across plant species. Initially limited to model species such as Arabidopsis thaliana (2,3), single-cell RNA sequencing has now been applied successfully to a diverse range of plants, including crops (4), trees (5) and non-model organisms (6). This expansion has provided insights into the conserved and unique features of cellular processes across different plants, shedding light on the evolution of plant development and adaptation mechanisms.

Another significant development is the identification and characterization of distinct cell types and their gene expression profiles within plant tissues and organs. Single-cell transcriptomics has allowed researchers to unravel the molecular signatures that define specific cell types, such as idioblast cells (7), fiber cells (8) and stigma (9). By dissecting the transcriptional programs underlying cell fate determination and differentiation, these studies have contributed to our understanding of plant development and the regulation of tissue-specific functions (10).

Significant progress has been achieved in characterizing single-cell landscapes in various plant systems. However, a major challenge in the field is the lack of a comprehensive reference resource that facilitates the systematic reuse, exploration and comparison of published single-cell datasets in plants. Although several online databases have been established and have contributed significantly to the wider use of this type of data (11–14), they have not adequately addressed the challenges of integrating and comparing multiple plant datasets, especially given the rapid expansion of data. Therefore, there is a pressing need for a comprehensive single-cell database for plants that offers a user-friendly interface and a one-stop solution.

Here, we have developed a comprehensive plant single-cell database, named scPlantDB (https://biobigdata.nju.edu.cn/scplantdb). This database integrates single-cell transcriptomic profiles from diverse plant species, enabling researchers to explore the cellular heterogeneity and gene expression patterns across a wide range of plant tissues and conditions. By collecting and curating high-quality datasets from public sources, followed by rigorous quality control and standardized data processing, scPlantDB provides a reliable and accessible resource for the plant research community.

Through scPlantDB, researchers can visualize gene expression profiles at the single-cell level, facilitating the comparison of gene expression patterns across cell types and species, and enabling systematic exploration of plant cellular diversity. Additionally, scPlantDB provides researchers with the means to compare and annotate cell types based on specific gene expression markers, aiding in the discovery of novel cell populations and their functional roles.

Materials and methods

Data sources

A total of 2,546,778 cells from 67 scRNA-seq datasets across 17 species were manually collected from published plant studies and added to the scPlantDB database (Table 1). The raw data were obtained from the NCBI SRA, EBI ENA, DDBJ DRA or GSA databases. Three criteria were used to select the single-cell samples: (i) a minimum of 1000 cells, (ii) availability of cell type annotations and (iii) clear sample information. It is important to note that if a dataset contained multiple tissue types, each dataset was processed separately.

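Table 1.

Data summary of scPlantDB

| Species | Cells | Experiments | Datasets | Cell types | Cell markers | Condition | Genotype | Tissue |
|---|---|---|---|---|---|---|---|---|
| Arabidopsis thaliana | 1 372 578 | 169 | 34 | 54 | 141 696 | 26 | 37 | 18 |
| Oryza sativa | 441 711 | 36 | 6 | 45 | 17 838 | 5 | 5 | 7 |
| Zea mays | 352 124 | 107 | 11 | 38 | 20 896 | 4 | 3 | 11 |
| Triticum aestivum | 6875 | 1 | 1 | 19 | 7572 | 1 | 1 | 1 |
| Manihot esculenta | 35 524 | 8 | 1 | 13 | 5197 | 1 | 1 | 1 |
| Populus alba var. pyramidalis | 7345 | 2 | 1 | 13 | 7557 | 1 | 1 | 2 |
| Medicago truncatula | 24 978 | 6 | 1 | 11 | 2671 | 2 | 1 | 1 |
| Brassica rapa | 22 741 | 3 | 1 | 9 | 3362 | 1 | 1 | 1 |
| Fragaria vesca | 46 028 | 3 | 1 | 9 | 4399 | 3 | 1 | 1 |
| Solanum lycopersicum | 61 864 | 5 | 2 | 9 | 3454 | 1 | 2 | 2 |
| Gossypium bickii | 12 895 | 1 | 1 | 8 | 3226 | 1 | 1 | 1 |
| Catharanthus roseus | 41 653 | 3 | 1 | 7 | 1804 | 1 | 1 | 1 |
| Glycine max | 25 121 | 3 | 1 | 7 | 1626 | 3 | 1 | 2 |
| Nicotiana attenuata | 3111 | 3 | 1 | 5 | 932 | 3 | 1 | 1 |
| Populus alba × P. glandulosa | 24 016 | 4 | 1 | 5 | 3433 | 1 | 1 | 1 |
| Gossypium hirsutum | 61 553 | 12 | 2 | 4 | 3330 | 1 | 4 | 1 |
| Bombax ceiba | 6661 | 1 | 1 | 3 | 558 | 1 | 1 | 1 |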

Additionally, since long non-coding RNAs (lncRNAs) may serve as marker genes (15), we compiled lncRNA databases for well-annotated plant species such as Arabidopsis thaliana, rice (Oryza sativa) and maize (Zea mays), from relevant studies (if available). For plant species with available lncRNA annotations, both lncRNAs and protein-coding genes were included for quantification analysis.

Data processing

scRNA-seq data in SRA format were converted into FASTQ format using sratoolkit (v2.10.7). Raw single-cell data in FASTQ format were aligned to the corresponding reference genomes (accessible in the scPlantDB Marker module), and then subjected to barcode assignment and unique molecular identifier (UMI) counting using cellranger (v5.0.1) for 10x Genomics data, the McCarroll Lab protocol for Drop-seq and Dolomite Bio data (http://mccarrolllab.org/wp-content/uploads/2016/03/Drop-seqAlignmentCookbookv1.2Jan2016.pdf), the WTA Local bioinformatics pipeline for BD Rhapsody data, and a customized framework for CEL-seq2 data (https://github.com/yanailab/celseq2). The resulting count matrices were further processed using the Seurat package (v4.0.0) (16). For quality control, we removed cells with a low number of expressed genes (<200 genes) or a high proportion of mitochondrial transcripts (>10% of total UMI counts). Additionally, we employed the interquartile range (IQR) method to establish cutoff values for the nFeature_RNA metric (the number of genes detected per cell). The lower cutoff was defined as the first quartile (Q1) minus 1.5 times the IQR, while the upper cutoff was set as the third quartile (Q3) plus 1.5 times the IQR. Cells with nFeature_RNA values falling outside of these cutoffs were excluded from subsequent analysis.
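The quality-control rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used for scPlantDB (which ran in Seurat): the dictionary keys `n_genes` and `mito_fraction`, the order of the two filtering steps and the quantile interpolation method are all assumptions.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) for a list of numbers."""
    # method="inclusive" is an assumption; other quantile conventions
    # give slightly different cutoffs.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def filter_cells(cells, min_genes=200, max_mito_fraction=0.10):
    """Apply the fixed thresholds, then the IQR rule on detected genes.

    `cells` is a list of dicts with two hypothetical keys:
    'n_genes' (genes detected per cell, i.e. nFeature_RNA) and
    'mito_fraction' (mitochondrial UMIs divided by total UMIs).
    """
    passed = [
        c for c in cells
        if c["n_genes"] >= min_genes and c["mito_fraction"] <= max_mito_fraction
    ]
    if len(passed) < 2:  # quartiles need at least two observations
        return passed
    lower, upper = iqr_bounds([c["n_genes"] for c in passed])
    return [c for c in passed if lower <= c["n_genes"] <= upper]


# Toy input: one near-empty cell, one mitochondria-rich cell, one outlier.
toy_cells = [
    {"n_genes": 150, "mito_fraction": 0.02},
    {"n_genes": 1000, "mito_fraction": 0.02},
    {"n_genes": 1100, "mito_fraction": 0.02},
    {"n_genes": 1200, "mito_fraction": 0.02},
    {"n_genes": 1250, "mito_fraction": 0.50},
    {"n_genes": 1300, "mito_fraction": 0.02},
    {"n_genes": 1400, "mito_fraction": 0.02},
    {"n_genes": 9000, "mito_fraction": 0.02},
]
kept = filter_cells(toy_cells)
```

In this toy input the cell with 150 genes and the cell with 50% mitochondrial UMIs fail the fixed thresholds, and the cell with 9000 genes falls above the upper IQR cutoff.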

Integration and clustering

To integrate different scRNA-seq datasets from a specific study, two strategies were employed. For small datasets (experiments < 10), Canonical Correlation Analysis (CCA) was used. For larger datasets (experiments > 10), the Robust Principal Component Analysis (RPCA) method was utilized. To determine the top 3000 most variably expressed genes, the ‘vst’ method in the ‘FindVariableFeatures’ function was applied. The expression values were scaled using the ‘ScaleData’ function, with regression performed on the proportion of mitochondrial UMIs (mt.percent).

For visualization purposes, the ‘RunPCA’ function computed the top 30 principal components (PCs) using the previously identified top variably expressed genes. The UMAP (Uniform Manifold Approximation and Projection) algorithm was then employed to visualize the cell clusters. To perform clustering on the integrated expression values, a shared-nearest-neighbor (SNN) graph clustering approach based on the Louvain community detection method was used. The ‘FindClusters’ function was utilized with a resolution parameter set to 0.8.

Cell-type annotation

In our study, cells were annotated using marker genes that have been experimentally verified. To ensure consistent identification of tissue and cell types, we established a standardized taxonomy of tissue and cell type names. This taxonomy was based on the Plant Ontology database (https://planteome.org/). However, certain cell types, such as phloem parenchyma, which were marked by SWEET11 and SWEET12 (17), were not represented in the database. To address this, we assigned a unique database ID (scPlantDB-ID, SP-ID) and provided a detailed description for these cell types. This allowed us to include them within our standard taxonomy, ensuring comprehensive representation of cell types in our study.

Cell type-specific marker identification

To identify markers for each cluster, the ‘FindAllMarkers’ function was employed with default parameters. Following cell type annotation, the ‘FindAllMarkers’ function was executed again for each cell type. The resulting cell type-specific markers were filtered using the following criteria: (i) adjusted P-value (p_val_adj) <0.05, (ii) proportion of cells expressing the marker (pct.1) >0.2 and (iii) average log2 fold change (avg_log2FC) >0.5. The markers meeting these criteria are provided in the ‘Marker module’ within our database.
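As a rough illustration of the three filtering criteria, the sketch below applies them to marker rows held as Python dictionaries. The column names follow the output of Seurat's ‘FindAllMarkers’; the gene identifiers, cell types and values are placeholders invented for the example, and the function name `filter_markers` is hypothetical.

```python
def filter_markers(rows, max_padj=0.05, min_pct=0.2, min_log2fc=0.5):
    """Keep marker rows that satisfy all three criteria."""
    return [
        row for row in rows
        if row["p_val_adj"] < max_padj
        and row["pct.1"] > min_pct
        and row["avg_log2FC"] > min_log2fc
    ]


# Placeholder rows; none of these values come from scPlantDB.
rows = [
    {"gene": "geneA", "cluster": "root hair", "p_val_adj": 1e-10, "pct.1": 0.85, "avg_log2FC": 2.1},
    {"gene": "geneB", "cluster": "root hair", "p_val_adj": 0.20, "pct.1": 0.60, "avg_log2FC": 1.4},
    {"gene": "geneC", "cluster": "cortex", "p_val_adj": 1e-4, "pct.1": 0.10, "avg_log2FC": 0.9},
    {"gene": "geneD", "cluster": "cortex", "p_val_adj": 1e-6, "pct.1": 0.40, "avg_log2FC": 0.3},
]
kept_markers = filter_markers(rows)
```

Only `geneA` passes here: `geneB` fails the adjusted P-value cutoff, `geneC` is expressed in too few cells and `geneD` has too small a fold change.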

Similarity analysis of cell types based on cell markers

The similarity ratio between cell types was calculated from their markers. Specifically, for cell types A and B, the ratio is the Jaccard index $\frac{|C_a \cap C_b|}{|C_a \cup C_b|}$, where $C_a$ and $C_b$ are the marker sets of cell types A and B, $|C_a \cap C_b|$ is the number of markers they share, and $|C_a \cup C_b|$ is the number of markers found in at least one of them (18).

The resulting ratio ranges from 0 to 1. A ratio of 1 indicates that the two cell types have the exact same cell markers, while a ratio of 0 indicates that no markers are shared between them. The ratio serves as a measure of similarity between the marker genes of the two cell types.
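The ratio described above is the Jaccard index of the two marker sets, and a minimal Python sketch makes the boundary cases concrete. The function name `marker_similarity` and the gene labels are placeholders, and returning 0 for two empty marker sets is an assumption, since the text does not cover that case.

```python
def marker_similarity(markers_a, markers_b):
    """Jaccard index: shared markers divided by all distinct markers."""
    a, b = set(markers_a), set(markers_b)
    union = a | b
    if not union:  # both sets empty; returning 0.0 is an assumption
        return 0.0
    return len(a & b) / len(union)


# Two of four distinct markers are shared.
partial = marker_similarity({"g1", "g2", "g3"}, {"g2", "g3", "g4"})
identical = marker_similarity({"g1", "g2"}, {"g1", "g2"})
disjoint = marker_similarity({"g1"}, {"g2"})
```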

Cell marker orthology and cross-species comparison

To identify orthologous genes, we ran OrthoFinder (19) on all 17 species present in our database. Each marker gene set is associated with one or more orthogroups, which allowed us to establish a set of marker orthogroups. By assessing whether marker genes from different species belong to the same orthogroup, we can determine whether they serve as markers for the same cell type. The resulting table in the Marker module includes orthologous markers that were identified as belonging to the same cell type. Furthermore, we conducted a cross-species comparison of cell markers to assess conservation across species based on orthologous genes. This comparison allowed us to identify markers that are shared among multiple species, indicating potential similarities in cell type classification across organisms. Additionally, the cross-species comparison revealed cell markers that are unique to individual species, suggesting potential differences in cell type classification between organisms.
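One way to picture the orthogroup-based comparison is the sketch below. It is a simplified illustration under stated assumptions, not the scPlantDB implementation: the gene-to-orthogroup mapping is a plain dictionary standing in for OrthoFinder output, the gene and orthogroup identifiers are placeholders, and treating a marker orthogroup as conserved when it is seen for the same cell type in at least two species is an assumed threshold.

```python
from collections import defaultdict


def conserved_marker_groups(markers, gene_to_orthogroup, min_species=2):
    """Find (orthogroup, cell type) pairs supported by several species.

    `markers` is an iterable of (species, cell_type, gene) tuples and
    `gene_to_orthogroup` maps each gene to its orthogroup identifier.
    """
    species_by_key = defaultdict(set)
    for species, cell_type, gene in markers:
        orthogroup = gene_to_orthogroup.get(gene)
        if orthogroup is None:
            continue  # gene was not assigned to any orthogroup
        species_by_key[(orthogroup, cell_type)].add(species)
    return {
        key: sorted(species)
        for key, species in species_by_key.items()
        if len(species) >= min_species
    }


# Placeholder identifiers, not real gene IDs or orthogroups.
gene_to_orthogroup = {"At_g1": "OG1", "Os_g1": "OG1", "Zm_g9": "OG2"}
markers = [
    ("A. thaliana", "root hair", "At_g1"),
    ("O. sativa", "root hair", "Os_g1"),
    ("Z. mays", "xylem", "Zm_g9"),
    ("O. sativa", "cortex", "Os_g2"),
]
conserved = conserved_marker_groups(markers, gene_to_orthogroup)
```

Pairs supported by a single species, such as the xylem marker in this toy input, correspond to the species-specific markers mentioned above.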

This information is valuable for comprehending the evolutionary relationships between organisms and gaining insights into their cellular organization. It helps shed light on the conservation and divergence of cell types across species, facilitating a deeper understanding of biological processes and organismal complexity.

Database implementation

scPlantDB uses MySQL (https://www.mysql.com/) as the storage engine and Flask (https://flask.palletsprojects.com/) as the backend web framework. The web user interfaces were created using Vue.js (https://vuejs.org/), Element (https://element.eleme.io/) and Bootstrap (https://getbootstrap.com/). For data visualization, scPlantDB uses Echarts (https://echarts.apache.org/), Plotly.js (https://plotly.com/) and D3 (https://d3js.org/). These libraries provide interactive visualization capabilities, enabling users to explore and analyze the data in an intuitive manner.

Overall, the combination of Flask, Vue.js, Element, Bootstrap, Echarts, Plotly.js and D3 allows for the seamless integration of backend functionality, user interface development and dynamic data visualization within the scPlantDB platform.

Results

Overview of scPlantDB

scPlantDB is a plant-specific database that offers the largest collection of plant single-cell transcriptome datasets (Figure 1). Currently (as of May 2023), it consists of approximately 2.5 million cells from 67 datasets across 17 plant species (Figure 2A and Table 1). These datasets employ seven distinct scRNA-seq technologies, with over 80% of the data generated using the 10x Genomics Chromium scRNA-seq technology (Figure 2B). One of the key strengths of scPlantDB is that all deposited data pass through a uniform analysis pipeline, which enables consistent comparisons across different technologies, datasets and plant species. For instance, the 10x Genomics and Drop-seq technologies were employed to generate two of the earliest published scRNA-seq datasets in Arabidopsis root (accessions: SRP171040 and SRP169576) (20,2). Through manual curation, we achieved a consistent annotation of cell types across both atlases (Figure 2C), which improves their comparability. To keep up with the rapid advancements in the field, scPlantDB is updated annually with new datasets, so that it remains a comprehensive and evolving resource for plant single-cell transcriptomics.

Figure 1.

The architecture of scPlantDB.

Figure 2.

Data summary of scPlantDB. (A) Statistics of scPlantDB from different views. (B) The proportion of datasets generated by different scRNA-seq technologies. (C) UMAP plots of the datasets of SRP171040 and SRP169576.

A standardized marker tree has been generated, encompassing 229,551 markers that are associated with 259 cell types through comprehensive integration. This extensive collection of markers enables detailed cellular characterization. The database provides functionalities such as conducting Gene Set Enrichment Analysis (GSEA) and comparing cellular markers across different species (Figure 1).

To facilitate user interaction and analysis, two interactive tools have been developed. The first tool allows users to predict cell types based on a user-defined gene list, enabling customized analysis and exploration. The second tool enables users to compare cell markers across different cell types, providing insights into marker expression patterns and potential cellular relationships (Figure 1).

The scPlantDB database significantly expands the scope of transcriptome research in plants and serves as a valuable resource for the scientific community. By aggregating and integrating diverse datasets, it enhances the availability of plant single-cell transcriptome data and complements existing resources in this field.

The user interface of scPlantDB

Dataset exploration. To enhance efficient and flexible exploration of each dataset, scPlantDB incorporates a user-friendly online analysis browser called the ‘Browser’. The ‘Browser’ within scPlantDB offers a comprehensive overview of each dataset, allowing researchers to navigate and explore gene expression profiles effortlessly (Figure 3A).

Figure 3.

The Browser module for single-cell dataset selection and visualization. (A) A summary of datasets deposited in scPlantDB. (B) A screenshot of the cellxgene browser for single-cell data visualization.

This browser utilizes the cellxgene platform (21), providing researchers with a seamless and intuitive interface for dataset analysis. Users can interact with the data and visualize gene expression patterns using various interactive tools and visualizations provided by the cellxgene platform. By leveraging the cellxgene platform, scPlantDB empowers researchers to delve into the intricacies of gene expression within each dataset, enabling them to gain valuable insights and make discoveries in a user-friendly and intuitive manner (Figure 3B). Overall, the ‘Browser’ feature greatly facilitates dataset exploration and analysis within scPlantDB.

Comparative analysis. In addition to single-dataset visualization within the cellxgene platform, scPlantDB offers a powerful function for performing comparative analysis of multiple datasets at the single-cell resolution. This functionality enables users to gain insights by comparing cell-type distributions and gene expression patterns across different datasets within a species (Figure 4A). The selected genes can then be visualized using feature plots, violin plots and heatmaps, allowing for a comprehensive and comparative analysis of their expression patterns (Figure 4B, C). By leveraging this comparative analysis feature, researchers can explore similarities and differences in cell-type distributions and gene expression profiles across multiple datasets within the same plant species. This capability facilitates the identification of conserved or distinct cellular characteristics and provides a deeper understanding of the biological processes at play. Overall, the comparative analysis function in scPlantDB empowers researchers to explore and compare gene expression patterns and cell-type distributions across multiple datasets, leading to valuable insights and discoveries in the field of plant single-cell transcriptomics.

Figure 4.

Comparative analysis of marker genes among datasets. scPlantDB allows users to select multiple datasets for comparative analysis of a specific gene. Results are displayed as a FeaturePlot (A), violin plot (B) and heatmap (C).

Marker exploration. In the Marker module of scPlantDB, users can access marker gene information for each cell type across different plant species. The cell marker information is presented using a marker tree, which can be explored through either a Bar chart or a Tree list (Figure 5A). When selecting a specific cell type, the Marker module offers a word cloud visualization that intuitively displays the information about the number of marker genes and their source datasets (Figure 5B). This word cloud provides a quick overview of the marker genes associated with the selected cell type.

Figure 5.

The Marker module for cell marker exploration. (A) Overview of annotated cell types in a specific plant species. (B) Detailed information of a specific cell type (root hair in the Arabidopsis as an example). (C) Functional characterization of cell types by GSEA analysis and their comparison across species. (D) Characterization and analysis of a specific marker gene.

Furthermore, the meta information for each cell type includes details such as the number of shared plant species and tissues, cell ontology and related descriptions. These details provide additional context and background information about the selected cell type, facilitating a deeper understanding of its characteristics (Figure 5B). scPlantDB also provides features for functional analysis across species. Users can utilize tools such as Gene Set Enrichment Analysis (GSEA) and cellular comparison to gain insights into the cellular diversity and evolution across different plant species (Figure 5C). These features enhance the understanding of functional implications and similarities/differences in cellular processes among species. In this regard, users can effectively explore a gene of interest by examining its expression dynamics across datasets and assessing its functional conservation across species (Figure 5D). By leveraging this functionality, users can gain valuable insights into the role and functional significance of the gene, enhancing their understanding of its biological implications.

In short, by integrating marker gene information, meta data and functional analysis capabilities, scPlantDB enables users to explore and analyze cellular diversity and evolution across plant species, contributing to a comprehensive understanding of plant single-cell transcriptomics.

Tools. The Tools module in scPlantDB offers several valuable functionalities to enhance the analysis and exploration of single-cell transcriptome data.

One of these tools is the BLAST feature (22), which allows users to search for cell markers by inputting or uploading CDS sequences or peptide sequences. To ensure accurate searching, users can customize various parameters. Additionally, users have the option to receive the BLAST results via email. The history feature stores records of BLAST tasks from the past seven days, providing easy access to previous searches.

The cell type comparator tool enables users to explore similarities of marker genes between different cell types in selected species. It provides two types of visualizations, allowing users to customize the display according to their preferences. The dotted heatmap shows an overview of similarities between different cell types. By clicking a dot in the dotted heatmap, users can access detailed information, including a Venn plot and marker lists. Instead of selecting all cell types, users can choose specific cell types for visualization. The comparator tool would be valuable for investigating the functional similarity of different cell types, primarily because cells tend to display similarities when they share common marker gene sets.

The cell type predictor tool allows users to enter genes (using IDs or names) one per line. Based on the user-input genes, the predictor outputs possible cell types and their corresponding similarity scores. The similarity score is calculated by dividing the number of markers belonging to the matched cell type by the number of user-input genes. The predictor tool efficiently assists users in annotating cell types when they possess a list of differentially expressed genes within a specific cell cluster.
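The similarity score described above can be sketched as follows. This is a hypothetical reimplementation based only on the description in the text, not the code behind the scPlantDB tool: the function name, the marker dictionary and the gene labels are placeholders, and dropping cell types with no matching marker is an assumption.

```python
def predict_cell_types(input_genes, markers_by_cell_type):
    """Rank cell types by the share of input genes that are their markers."""
    genes = {g.strip() for g in input_genes if g.strip()}
    if not genes:
        return []
    scores = []
    for cell_type, markers in markers_by_cell_type.items():
        hits = len(genes & set(markers))
        if hits:  # skip cell types without any matching marker
            scores.append((cell_type, hits / len(genes)))
    # Highest score first; ties are broken alphabetically by cell type.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Placeholder marker sets; genes are entered one per line, as in the tool.
markers_by_cell_type = {
    "root hair": {"g1", "g2", "g3"},
    "cortex": {"g4", "g9"},
    "xylem": {"g8"},
}
user_input = "g1\ng2\ng3\ng4".splitlines()
predictions = predict_cell_types(user_input, markers_by_cell_type)
```

With four input genes, three of which are root hair markers, root hair scores 0.75 and cortex scores 0.25.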

In sum, these tools in scPlantDB empower users to perform various analyses, such as marker identification, cell type comparisons and cell type prediction. They provide valuable insights into the single-cell transcriptome data and enhance the understanding of cellular characteristics and relationships.

Search. scPlantDB incorporates a comprehensive search function that enables users to efficiently search for specific genes (by gene ID, name or functional description), cell types and sequences throughout the entire database. This convenient feature enhances accessibility by allowing users to quickly access information and navigate the database with ease.

Download. The Download functionality in scPlantDB is designed to facilitate data access, sharing and further analysis. It allows researchers to retrieve the desired datasets (in RDS, H5ad or matrix format), markers and BLAST results in a format that suits their specific needs, promoting seamless integration with other tools and workflows.

Discussion

The field of plant-specific research has greatly benefited from the advancements in single-cell approaches, which have provided unprecedented insights into cellular diversity and gene expression regulation (23). To fully harness the potential of this technology, it is crucial to collect, standardize, curate, integrate and visualize data effectively (24). Several databases focused on plant scRNA-seq have been developed, such as PsctH (12), PCMDB (14) and PlantscRNAdb (13). However, scPlantDB distinguishes itself in several aspects (Table 2).

Table 2.

Comparison of scPlantDB with existing plant scRNA-seq databases

AspectPsctHPCMDBPlantscRNAdbscPlantDB
Data scaleSpecies66817
Datasets//5567
Tissues9223153
Cell types51263364259
Cells///2 546 778
Markers988111769 628229 551
Dataset exploration×
Functional modulesComparative analysis××
Marker exploration
Blast×
Cell type comparator×××
Cell type predictor×××
Project designDesign aimProvide resource of cell markers and web tool for various cell types in tissues of plant speciesProvide cell markers from experimental research, bulk RNA-seq and single-cell sequencingTrack and analyze all available plant single-cell transcriptome-related dataIntegrate single-cell transcriptomic profiles and enable researchers to explore the cellular heterogeneity and gene expression patterns

Firstly, scPlantDB stands as the largest plant-specific single-cell database, housing approximately 2.5 million cells derived from 67 high-quality datasets across 17 plant species. This extensive collection enables diverse analyses and comparisons at multiple levels, surpassing the limited coverage of other databases in terms of available plant species and datasets. This breadth of data enhances the utility of scPlantDB for researchers in the plant science community.

Secondly, scPlantDB ensures consistency and reliability by integrating multiple datasets in a standardized manner. This standardized approach sets scPlantDB apart from other databases that may lack single-cell data processing or have unstandardized cell/tissue annotation. The integration of diverse datasets in a harmonized manner facilitates reliable cross-dataset comparisons and strengthens the overall quality of the database.

Thirdly, scPlantDB provides extensive and flexible comparative analysis across species, datasets, cell types and markers. This comprehensive exploration and analysis of single-cell plant data are essential for understanding cellular diversity and evolution. In contrast, other databases may lack the functionality to support such multi-level comparisons, limiting their analytical capabilities.

Lastly, scPlantDB offers an interactive visualization interface, empowering users to customize their visualizations and facilitating easy analysis for wet-lab researchers. This feature sets scPlantDB apart from other databases that may lack user-friendly visualization tools, making it more accessible and practical for researchers to interpret and communicate their findings.

Moving forward, we remain committed to maintaining and optimizing the functionality of scPlantDB. Furthermore, we are actively developing an upgraded platform that will incorporate new functions from the scPlant tool (25), further enhancing the database's utility and user experience. By continually improving and expanding scPlantDB, we aim to contribute to the advancement of plant research and promote collaboration within the scientific community.

In summary, scPlantDB represents a significant advancement in the field of plant single-cell biology. By consolidating and standardizing single-cell transcriptomic data from various plant species, this database serves as a valuable resource for elucidating the molecular basis of plant development, physiology and response to environmental stimuli. We anticipate that scPlantDB will contribute to accelerating discoveries in plant biology and inspire further investigations into the intricate world of plant single-cell biology.

Data availability

scPlantDB can be accessed at https://biobigdata.nju.edu.cn/scplantdb.

Acknowledgements

The authors acknowledge the High Performance Computing Center of Nanjing University for providing high-performance computing (HPC) resources. We thank all the members of Dijun Chen's group for valuable discussions. We are grateful to all the data contributors whose contributions have made this project possible. We thank Dr Chao He from Huazhong Agricultural University for providing the annotated scRNA-seq data for wheat. The graphical abstract was created with BioRender.com.

Author contributions: D.C. conceived the study. X.Z. designed the database. Z.H. and Y.Luo contributed to data collection and analysis with support from T.Z. and Y.Lan. D.C., Z.H. and X.Z. wrote the manuscript. All authors reviewed and approved the manuscript.

Funding

National Natural Science Foundation of China [32070656]; Nanjing University Deng Feng Scholars Program; 2023 Postgraduate Research & Practice Innovation Program of Jiangsu Province [KYCX23_0131 to X.Z.]. Funding for open access charge: National Natural Science Foundation of China.

Conflict of interest statement. None declared.

References

1. Seyfferth C., Renema J., Wendrich J.R., Eekhout T., Seurinck R., Vandamme N., Blob B., Saeys Y., Helariutta Y., Birnbaum K.D. et al. Advances and opportunities in single-cell transcriptomics for plant research. Annu. Rev. Plant Biol. 2021; 72:847-866.

2. Ryu K.H., Huang L., Kang H.M., Schiefelbein J. Single-cell RNA sequencing resolves molecular relationships among individual plant cells. Plant Physiol. 2019; 179:1444-1456.

3. Zhang T.-Q., Xu Z.-G., Shang G.-D., Wang J.-W. A single-cell RNA sequencing profiles the developmental landscape of Arabidopsis root. Mol. Plant. 2019; 12:648-660.

4. Wang Y., Huan Q., Li K., Qian W. Single-cell transcriptome atlas of the leaf and root of rice seedlings. J. Genet. Genomics. 2021; 48:881-898.

5. Li H., Dai X., Huang X., Xu M., Wang Q., Yan X., Sederoff R.R., Li Q. Single-cell RNA sequencing reveals a high-resolution cell atlas of xylem in Populus. J. Integr. Plant Biol. 2021; 63:1906-1921.

6. Guo X., Liang J., Lin R., Zhang L., Zhang Z., Wu J., Wang X. Single-cell transcriptome reveals differentiation between adaxial and abaxial mesophyll cells in Brassica rapa. Plant Biotechnol. J. 2022; 20:2233-2235.

7. Sun S., Shen X., Li Y., Li Y., Wang S., Li R., Zhang H., Shen G., Guo B., Wei J. et al. Single-cell RNA sequencing provides a high-resolution roadmap for understanding the multicellular compartmentation of specialized metabolism. Nat. Plants. 2023; 9:179-190.

8. Qin Y., Sun M., Li W., Xu M., Shao L., Liu Y., Zhao G., Liu Z., Xu Z., You J. et al. Single-cell RNA-seq reveals fate determination control of an individual fibre cell initiation in cotton (Gossypium hirsutum). Plant Biotechnol. J. 2022; 20:2372-2388.

9. Li C., Zhang S., Yan X., Cheng P., Yu H. Single-nucleus sequencing deciphers developmental trajectories in rice pistils. Dev. Cell. 2023; 58:694-708.

10. Cervantes-Pérez S.A., Thibivilliers S., Laffont C., Farmer A.D., Frugier F., Libault M. Cell-specific pathways recruited for symbiotic nodulation in the Medicago truncatula legume. Mol. Plant. 2022; 15:1868-1888.

11. Chen Y., Zhang X., Peng X., Jin Y., Ding P., Xiao J., Li C., Wang F., Chang A., Yue Q. et al. SPEED: single-cell pan-species atlas in the light of Ecology and Evolution for Development and diseases. Nucleic Acids Res. 2023; 51:D1150-D1159.

12. Xu Z., Wang Q., Zhu X., Wang G., Qin Y., Ding F., Tu L., Daniell H., Zhang X., Jin S. Plant single cell Transcriptome Hub (PsctH): an integrated online tool to explore the plant single-cell transcriptome landscape. Plant Biotechnol. J. 2022; 20:10-12.

13. Chen H., Yin X., Guo L., Yao J., Ding Y., Xu X., Liu L., Zhu Q.-H., Chu Q., Fan L. PlantscRNAdb: a database for plant single-cell RNA analysis. Mol. Plant. 2021; 14:855-857.

14. Jin J., Lu P., Xu Y., Tao J., Li Z., Wang S., Yu S., Wang C., Xie X., Gao J. et al. PCMDB: a curated and comprehensive resource of plant cell markers. Nucleic Acids Res. 2022; 50:D1448-D1455.

15. Zhao X., Lan Y., Chen D. Exploring long non-coding RNA networks from single cell omics data. Comput. Struct. Biotechnol. J. 2022; 20:4381-4389.

16. Hao Y., Hao S., Andersen-Nissen E., Mauck W.M., Zheng S., Butler A., Lee M.J., Wilk A.J., Darby C., Zager M. et al. Integrated analysis of multimodal single-cell data. Cell. 2021; 184:3573-3587.

17. Chen L.-Q., Qu X.-Q., Hou B.-H., Sosso D., Osorio S., Fernie A.R., Frommer W.B. Sucrose efflux mediated by SWEET proteins as a key step for phloem transport. Science. 2012; 335:207-211.

18. Jiang S., Qian Q., Zhu T., Zong W., Shang Y., Jin T., Zhang Y., Chen M., Wu Z., Chu Y. et al. Cell Taxonomy: a curated repository of cell types with multifaceted characterization. Nucleic Acids Res. 2023; 51:D853-D860.

19. Emms D.M., Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019; 20:238.

20. Shulse C.N., Cole B.J., Ciobanu D., Lin J., Yoshinaga Y., Gouran M., Turco G.M., Zhu Y., O’Malley R.C., Brady S.M. et al. High-throughput single-cell transcriptome profiling of plant cell types. Cell Rep. 2019; 27:2241-2247.

21. Prins L., Badajoz S., Mccandless B., Oliveira Pisco A., Kinsella M., Griffin F., Kiggins J., Haliburton G., Mani A., Weiden M. et al. cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. 2021; bioRxiv doi: https://doi.org/10.1101/2021.04.05.438318, 06 April 2021, preprint: not peer reviewed.

22. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990; 215:403-410.

23. Kaur H., Jha P., Ochatt S.J., Kumar V. Single-cell transcriptomics is revolutionizing the improvement of plant biotechnology research: recent advances and future opportunities. Crit. Rev. Biotechnol. 2023; 12:1-16.

24. Ahmed J., Alaba O., Ameen G., Arora V., Arteaga-Vazquez M.A., Arun A., Bailey-Serres J., Bartley L.E., Bassel G.W., Bergmann D.C. et al. Vision, challenges and opportunities for a Plant Cell Atlas. Elife. 2021; 10:e66877.

25. Cao S., He Z., Chen R., Luo Y., Fu L.-Y., Zhou X., He C., Yan W., Zhang C.-Y., Chen D. scPlant: a versatile framework for single-cell transcriptomic data analysis in plants. Plant Commun. 2023; 100631:2590-3462.

Author notes

The authors wish it to be known that, in their opinion, the first three authors should be regarded as Joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
