A single-nucleus transcriptomic atlas of the dog hippocampus reveals the potential relationship between specific cell types and domestication

Abstract The process of domestication has led to dramatic differences in behavioral traits between domestic dogs and gray wolves. Whole-genome research found that a class of putative positively selected genes were related to various aspects of learning and memory, such as long-term potentiation and long-term depression. In this study, we constructed a single-nucleus transcriptomic atlas of the dog hippocampus to illustrate its cell types, cell lineage and molecular features. Using the transcriptomes of 105 057 nuclei from the hippocampus of a Beagle dog, we identified 26 cell clusters and a putative trajectory of oligodendrocyte development. Comparative analysis revealed a significant convergence between dog differentially expressed genes (DEGs) and putative positively selected genes (PSGs). Forty putative PSGs were DEGs in glutamatergic neurons, especially in Cluster 14, which is related to the regulation of nervous system development. In summary, this study provides a blueprint to understand the cellular mechanism of dog domestication.


INTRODUCTION
The domestication process has led to dramatic differences in behavioral and morphological traits between domesticated animals and their wild ancestors [1][2][3]. Domestic dogs have undergone parallel evolution and convergent evolution with humans in the history of early hunting-gathering and the recent change of living environments from agrarian societies to modern cities [4][5][6][7][8]; in this process, dogs have undergone strong artificial selection, resulting in ∼450 globally recognized breeds, which makes them the most variable mammalian species on Earth [9][10][11]. Genome-wide scans for positive selection revealed that the behavioral and neurological traits likely changed subsequently with the process of domestication [12]. For instance, putative positively selected genes (PSGs) in dogs were reported to be linked to neural crest and central nervous system (CNS) development [13]. Gene expression in brains showed a strong difference in putative PSGs in neurons on learning, memory and behavior between domestic animals and their wild relatives [14,15].
The hippocampus is an important part of the limbic system in the brain that has an essential role in memory [16]. A class of domesticated genes is related to hippocampal synaptic long-term potentiation and long-term depression [5]. Single-cell RNA sequencing (scRNA-seq) has been used to identify the cell-type atlas [17][18][19], reveal cellular and molecular dynamics in neurogenesis [20] and decode hippocampal development and evolution [21]. It is worth noting that dozens of putative PSGs in dogs were related to cell types of both excitatory and inhibitory neurons and contributed to the connectivity and development of synapses and dendrites [13,15].
In the present study, we collected single nuclei from the hippocampus of a Beagle dog and performed a comprehensive analysis of the tran-scriptomic profiles of 105 057 nuclei. We defined 26 cell clusters and identified a set of marker genes for each cell type. Cross-species comparative analysis showed that the major cell types in the hippocampus were highly conserved between dogs and humans. More interestingly, 86 of the differentially expressed genes (DEGs) were putative PSGs during dog domestication. The comprehensive cell landscape of the dog hippocampus could help us establish correspondence between cell types in the nervous system and putative PSGs in dogs and facilitate the understanding of the molecular features of cells during domestication.

Single-nucleus transcriptomics of the dog hippocampus constructed using SPLiT-seq
A standardized single-nucleus RNA-sequencing (snRNA-seq) pipeline was built using SPLiT-seq [22]. The snRNA-seq data of single nuclei from the hippocampus of a 5-month-old Beagle dog were obtained with random and oligo(dT) primers. After detailed preprocessing and filtering (see 'Methods' section), we created a digital expression matrix of 105 057 single nuclei with a median of 804 genes and 1109 counts per nucleus. To remove the potential batch effects, partial principal component analysis (partial-PCA, see 'Methods' section) was performed instead of the classical PCA and then the uniform manifold approximation and projection method (UMAP) was applied to project these two batches of transcripts into the common comparable 2D space ( Supplementary Fig. S1A). Furthermore, the Leiden community detection algorithm was employed to group these cells into 26 cell clusters (Fig. 1A).
Among the clusters, 12 clusters (0, 3, 5, 6, 7, 10, 11, 13, 14, 15, 17 and 20) were identified as the glutamatergic neurons, which produce the most common excitatory neurotransmitter in the CNS [26]. Although Clusters 3, 5 and 6 had a similar expression pattern, there were unique DEGs expressed in each cluster. For instance, both Clusters 3 and 6 had high expression of GRM7, TENM2 and CACNA1E, while Cluster 3 had no expression of STMN2 or NEFL (Fig. 1B). CCBE1 is a specific marker gene specifically in Cluster 0 (Fig. 1C). It marked the canine hippocampus cornu ammonis 3 (CA3) and stratum pyramidale (SP) (Fig. 1D)  Four clusters (4, 8, 12 and 21) are GABAergic neurons related to a kind of inhibitory neurotransmitter in the CNS, indicated by the common marker gene GAD2 [23] (Fig. 1E). Immunofluorescence (IF) analysis showed that GAD2 was highly expressed in the granule cell layer (GCL) and molecular layer (ML) of the dentate gyrus (DG) area and in the stratum lacunosum-moleculare (SLM) and SP of the CA (cornu ammonis) area (Fig. 1F). The GO enrichment analysis illustrated that four GABAergic neuron clusters were enriched in behavior (GO : 0007610, Supplementary Table  S3). Cluster 25 was identified as the Cajal-Retzius cell, which plays a major role in cortical development [28,29]. The marker gene of the cell type was NDNF ( Fig. 1G and H), which was also expressed in Cajal-Retzius cells in humans and mice [22,30]. Beyond the neuron cell clusters described above, another five clusters represented non-neuronal cells. Clusters 19 and 22 were unidentified types. Clusters 1, 23 and 24 were identified as astrocytes, endothelial cells and microglia cells, respectively.

Cross-species comparison between dog and human hippocampus
To validate the cell-type annotations and explore the conservation of the hippocampus, we performed a cross-species comparison between dog and human hippocampus using human hippocampus scRNAseq data from Zhong et al. [21]. We used Harmony [31] to integrate these data sets based on the one-to-one homologous genes between dogs and humans. We found that most of the homologous major cell types were consistent and located nearby on the UMAP plots ( Fig. 2A-C). The homologous DEG overlaps between each pair of major dog and human types also revealed a similar correspondence (Fig. 2D). In particular, five typical markers are shared between dog and humans (Supplementary  The one-to-one homologous overlaps between the DEG sets of dogs and humans. (E) The predicted cell-type probabilities of each cell in the dog hippocampus, output by CAME, using the human data as the reference. (F) The predicted cell-type probabilities of each cell in the human hippocampus, output by CAME, using the dog data as the reference. Oligo-Astro (mix): a mix of oligodendrocytes and astrocytes; Oligo (mix): a mixing of oligodendrocytes; Oligo: oligodendrocytes. Table S4). GAD1-expressing inhibitory neurons are associated with human developmental and epileptic encephalopathy [32]. MBP marks oligodendrocytes and encodes the protein of the myelin membrane in the CNS [33] and several studies have shown that MBP is a biomarker for multiple sclerosis [34]. AQP4, an astrocyte marker, plays an important role in brain water homeostasis [35]. SPARC inhibits mitogenic effects in microvascular endothelial cells [36]. PTPRC (also known as CD45) marks microglia cells [37]. In addition, we applied the cross-species cell-typing and integrative tool CAME [38] to predict the major cell types of dog hippocampal cells (the query) with human snRNA-seq data as the reference (Fig. 2E). We also performed the reverse prediction, i.e. querying the human cells using dog data as the reference (Fig. 2F). We found that the cross-species cell-type predictions of CAME were consistent with the results of Harmony integration and the DEG comparisons. These results suggest that the cell-type annotations of the dog hippocampus are reasonable and that the major types in the hippocampus are highly conserved between dogs and humans.

Putative trajectory analysis reveals the conserved oligodendrocyte development trajectory
With the defined cell-type markers, Clusters 2 and 9 were inferred to be myelinating oligodendrocytes and OPCs, respectively (Fig. 3A). Consistently with this classification, CNP, a myelin-related marker gene of Cluster 2 [39], was highly expressed in the hilus, ML, SLM, F and A areas (Fig. 3B). AGAP1 was a DEG in Cluster 9 and could be used as a new marker of OPCs (Fig. 3B). Cluster 16 was linked to Clusters 2 and 9 ( Fig. 1A), with few specifically expressed genes to assign its identity. Therefore, we inferred that Clusters 2, 9 and 16 might form a development trajectory from OPCs to myelinating oligodendrocytes, which was also found in the mouse hippocampus [22].
To validate this hypothesis, we constructed the single-nucleus trajectories using Slingshot [40] based on the recalculated UMAP embedding and the original cluster labels (Fig. 3C). As a result, PTPRZ1, the marker gene for precursor cells that maintained OPCs in an undifferentiated state [41], was expressed in Cluster 9; SOX6 was expressed in Cluster 16 and repressed oligodendrocyte differentiation [42]; PLP1 was another myelinrelated gene expressed in Cluster 2 ( Supplementary  Fig. S2A). The GO enrichment analysis showed that oligodendrocytes (Clusters 2, 9 and 16) were involved in different biological pathways (Fig. 3D). The DEGs in Cluster 9 (OPCs) were related to neuron differentiation (GO : 0030182, Supplementary Table S5) and regulation of neurogenesis (GO : 0050767, Supplementary  Table S5). The DEGs of the putative trajectory were consistent with the differentiation process of oligodendrocyte development and we also verified this result in mouse data ( Fig. 3E and Supplementary  Fig. S2B-E). Additionally, we combined the data with single-cell transcriptomes from the human hippocampus for comparative analysis and validation [21]. The results showed that the hippocampus followed a developmental trajectory from OPCs to oligodendrocytes in humans and dog ( Fig. 2A-C).
All these results indicate that humans, dog and mice have a similar trajectory of oligodendrocyte development.

Significant convergence between DEGs and putative PSGs in domestication
Previous studies showed that dog putative PSGs were enriched in neurological processes such as learning and memory [5,43] and expressed specifically in brain tissues [14]. To identify which types of cells play an important role during domestication, we gathered the DEGs in the 26 clusters (with P-value < 0.001 and log 2 (fold change) > 0.25) and compared them with the putative PSGs from published studies [5,13,[43][44][45]. The genes expressed in more than three cells in the dog hippocampus were used as the background (see 'Methods' section).
The results showed that 630 of 841 putative PSGs were detected in the hippocampus single-nucleus transcriptomes (Supplementary Table S6 and Supplementary Fig. S3). Eighty-six of 630 detected putative PSGs occurred in the 1628 DEGs with a statistical significance with a P-value of 5.10E-09 (Table 1). However, the 1000 random gene sets with equal sizes were not significant, with a P-value of 0.47. Since Freedman et al. (2016) performed a statistical analysis to determine the likelihood that specific genes were under selection, we used their putative PSGs to verify the statistical significance. As a result, 16 genes occurred in both DEGs and putative PSGs with a P-value of 6.99E-03. Furthermore, there was also a significant overlap between DEGs and the putative PSGs reported in at least two published studies (P-value = 2.45E-07, Supplementary Table S7).
To explore whether putative PSGs have higher or lower specificity across clusters or cell types, we calculated the entropy of putative PSGs. The higher entropy-specificity score for a gene implies its uniqueness in specific clusters or cell types [46]. We found that the putative PSGs that occurred in more than one study have significantly higher entropy specificity across both clusters and cell types compared with the background (Table 2). To determine which cluster was highly enriched with putative PSGs, we performed enrichment analysis for each cluster-specific in the putative PSG set (Table 3). Four clusters (Clusters 2, 9, 14 and 24) were significantly enriched in three putative PSG sets and six clusters (Clusters 0, 8, 11, 16, 17 and 20) were enriched in two putative PSG sets (with P-value < 0.05). Glutamatergic receptors constitute a major excitatory transmitter system, and dog excitatory synaptic plasticity increased in the domestication process [15]. Forty of 86 putative PSGs were expressed in glutamatergic neurons and 28 genes were expressed in Clusters 0, 11, 14, 17 and 20. The GO enrichment analysis showed that DEGs of Cluster 0 were enriched in the regulation of cell migration (GO : 0030334, Supplementary  Table S2). A putative PSG CUX2 was highly expressed in DG granular cells as glutamatergic neuron DEGs in Clusters 11 and 20, and it was also expressed in CA granular cells, as the same expression pattern in mice ( Fig. 4A and B and Supplementary Fig. S4B) [47]. Another putative PSG, GRIK3, in Cluster 14 was detected mainly in the GCL of the DG and SP of the CA area ( Fig. 4A and  B). It is worth noting that the putative PSGs were significantly enriched in Cluster 14 (  1E) and it has been taken as a candidate gene to regulate brain development and behavior during domestication ( Fig. 4A and B)  Previous studies suggested that ablation of GAD2 in mice reduced freezing and increased flight and escape behavior [48]. NRXN3 (Clusters 4 and 8) encodes a receptor and cell adhesion molecule in the nervous system, and it was highly expressed in the SP of the hippocampal CA area (Fig. 4A  A previous study also confirmed that putative PSGs in Chinese native dogs were enriched in locomotory behavior [14]. Twenty-four putative PSGs occurred in oligodendrocytes, including Clusters 2, 9, 16 and 18. Three of them are related to OPC differentiation. Oligodendrocytes are a kind of cell that primarily forms CNS myelin [49]. One of the overlapping genes, COL11A1 (Clusters 9 and 16), marked OPCs and was expressed in the partial area of the hippocampal CA ( Fig. 4A and B). Previous studies have shown that OPCs are important for human white matter expansion and myelination [50]. Myelinating oligodendrocytes were the result of OPC differentiation. MBP was a marker gene that was expressed in Clusters 2, 16 and 18, and   Table S13). (B) Six genes in Fig. 4A expression in the dog hippocampus revealed by immunohistochemical and immunofluorescent staining. Six genes on the right of the picture were differentially expressed in different cell types and the GO enrichment analysis showed their relevant gene functions. It is worth noting that CUX2 and GRIK3 are DEGs in glutamatergic neurons, although compared to other cells they are highly expressed in non-neurons and GABAergic neurons, respectively. High-definition immunohistochemical and immunofluorescent staining is shown in Supplementary Fig. S4A.
highly expressed in the hilus, ML, SLM and other hippocampal regions (Fig. 4A and B). As a domestication gene, MBP encodes myelin basic protein.

Reconstructed gene regulatory networks of the dog hippocampus
We reconstructed dog hippocampal gene regulatory networks using GENIE3 software [52]. To attenuate the effects of noise and outliers, we used 4523 genes and 4281 pseudo cells (see 'Methods' section about Weighted correlation network analysis, WGCNA), which contained all the cell types in this study. More than 2 million interactions were found between the putative regulatory genes and target genes, with 2239 putative regulatory genes in the top 1% (Supplementary Table S9). The transcription factors in dogs (Canis familiaris) from AnimalTFDB3.0 [53] were used to verify these putative regulatory genes. As the results showed, there were 118 putative regulatory genes, 11 of which were both PSGs and DEGs, including Cut Like Homeobox 2 (CUX2) [13,43] and Regulatory Factor X3 (RFX3) [5], in glutamatergic neurons. GO enrichment analysis showed that the target genes of CUX2 were related to glutamatergic synaptic transmission (GO : 0035249, Supplementary Table S10). The transcription factor CUX2 is involved in early neocortical circuits, cellular fate selection and mechanosensation development [54,55].

DISCUSSION
In this study, we presented single-nucleus transcriptomics of the dog hippocampus and identified 26 cell clusters based on 105 057 single-nucleus transcriptomes, including glutamatergic neurons, GABAergic neurons, Cajal-Retzius cells, OPCs, myelinating oligodendrocytes, endothelial cells, astrocytes, microglia cells and so on. In addition, our study demonstrated the trajectory of oligodendrocyte differentiation based on the recalculated UMAP embedding analysis. The cross-species analysis revealed that eight major cell types were shared between the human and dog hippocampus [21], suggesting that dogs have potential as a model organism for human mental illness. Gene expression has spatial heterogeneity, which could reveal the relationship between hippocampal subareas and cell clusters. The hippocampus is divided into two main parts, namely the cornu ammonis and the DG, which contain pyramidal neurons and granule cells, respectively [57]. Clusters 19 (unidentified), 20 (glutamatergic neurons), 25 (Cajal-Retzius cells) and 16 (oligo-mix) might be distributed in the DG because these clusters had the highest expression proportion and average expression of CUX2, CXCR4 and SEMA5A, respectively (Supplementary Table S12). Immunohistochemical showed that CUX2 was highly expressed in the DG ( Fig. 4B and Supplementary Fig. S4A), and CXCR4 and SEMA5A are also expressed in the DG [21,58]. Clusters 7 (glutamatergic neurons) and 20 (glutamatergic neurons) might be located in CA3 since these two clusters had the highest expression proportion and average expression in NRIP3 and SULF2, respectively. These two genes are marker genes for CA3 [21]. Cluster 19 might also be located in CA1 because PID1, a marker gene for CA1, had the highest expression proportion and average expression [21]. Clusters 6 (glutamatergic neurons) and 13 (glutamatergic neurons) had the highest expression proportion and average expression in RGS14, which is a CA2 marker; therefore, we inferred that these two clusters might be located in CA2 [59].
Reduced fear and aggression are important traits selected by humans in the first step of animal domestication, which help animals not only live commensally with humans, but also stay in a crowded environment [60,61]. As a subtype of inhibitory neurons, GABAergic neurons have been linked to the response to learning and fear memory [62,63]. Domestication may reduce myelination levels, compromise neural conduction, also change the size of brain structures relevant for memory, reflexes and fear processing [51]. The lower reactive level is a component of domestication syndrome, which is a collection of common traits in domestic animals [64]. In our data, the putative PSGs in domestication were involved in the development of oligodendrocytes and might be involved in myelination levels during nervous system development.
Glutamate receptors play an important role in the CNS and respond to basal excitatory synaptic transmission; some of them are involved in learning and memory, and there were great changes in glutamate receptors in animal domestication and modern human evolution [65]. Furthermore, Cluster 14 consisted of glutamatergic neurons, and DEGs in the cluster were enriched in the regulation of synapse structure or activity (GO : 0050803, Supplementary Table S2) and regulation of nervous system development (GO : 0051960, Supplementary Table S2). These findings reveal that cells in Cluster 14 may play an important role in a decreased stress response not only in domesticated animals, but also in humans [65]. These results could verify the hypothesis that glutamatergic neurons changed their behavior during domestication and reduced the fear response in dogs by regulating cells in the hippocampus, and Cluster 14 is probably an important cluster in dog domestication.
In summary, 105 057 single-nucleus transcriptomes were classified into 26 clusters in this study and were defined with distinct identities. Further exploration revealed 86 genes that overlapped between putative PSGs and DEGs. In addition, we illustrated the OPC differentiation trajectory based on the UMAP embedding. Our results contributed to defining the cell types and revealing the development of oligodendrocytes in the dog hippocampus, illustrating the difference between the subcell types and connecting the gap in our understanding between the molecular and cellular mechanisms of animal domestication.

MATERIALS AND METHODS
Information on the materials used to conduct the research and descriptions of all methods used in the analysis are available in the Supplementary information. A 5-month-old Beagle was used in this work; the dog was provided by the Department of Laboratory Animal Science, Kunming Medical University. All animal processing procedures and experiments performed in the present study were approved by the Animal Ethics Committee of Kunming Institute of Zoology, Chinese Academy of Sciences (SMKX-20 160 301-01).

DATA AVAILABILITY
The data underlying this article are available in the Genome Sequence Archive in the BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, and can be accessed with PRJCA004294. We also constructed the dog hippocampus atlas website at https://ngdc.cncb.ac.cn/idog/ (single-cell module).

SUPPLEMENTARY DATA
Supplementary data are available at NSR online.