Single-cell transcriptomes reveal molecular specializations of neuronal cell types in the developing cerebellum

Abstract The cerebellum is critical for controlling motor and non-motor functions via cerebellar circuit that is composed of defined cell types, which approximately account for more than half of neurons in mammals. The molecular mechanisms controlling developmental progression and maturation processes of various cerebellar cell types need systematic investigation. Here, we analyzed transcriptome profiles of 21119 single cells of the postnatal mouse cerebellum and identified eight main cell clusters. Functional annotation of differentially expressed genes revealed trajectory hierarchies of granule cells (GCs) at various states and implied roles of mitochondrion and ATPases in the maturation of Purkinje cells (PCs), the sole output cells of the cerebellar cortex. Furthermore, we analyzed gene expression patterns and co-expression networks of 28 ataxia risk genes, and found that most of them are related with biological process of mitochondrion and around half of them are enriched in PCs. Our results also suggested core transcription factors that are correlated with interneuron differentiation and characteristics for the expression of secretory proteins in glia cells, which may participate in neuronal modulation. Thus, this study presents a systematic landscape of cerebellar gene expression in defined cell types and a general gene expression framework for cerebellar development and dysfunction.


Introduction
The cerebellum is known for its critical roles in motor coordination, posture balance, and motor learning (Glickstein et al., 2009;Caligiore et al., 2017;Lang et al., 2017). Recently, accumulating lines of evidence have suggested its roles in nonmotor functions such as cognition and emotion (Koziol et al., 2014;Baumann et al., 2015). These diverse functions are thought to be encoded by cerebellar circuits that are composed of relatively few cell types distributed in the cerebellar cortex and cerebellar nuclei (Buckner, 2013), such as GABAergic Purkinje cells (PCs) with an elaborate dendritic arborization extending into the superficial molecular layer, glutamatergic excitatory granule cells (GCs) in the innermost granular layer, and several types of interneurons (Marzban et al., 2014;Lackey et al., 2018). Although microscopic anatomy and canonical microcircuit of the cerebellar cortex have been extensively studied, the molecular mechanisms governing cerebellar neuronal fate determination and maturation remain unclear.
The cerebellar cytostructure emerges during early embryonic stage from multiple germinal zones, mainly ventricular zone (VZ) of the forth ventricle wall and the rhombic lip (RL) (Hatten and Heintz, 1995), and migrate to distinct layers via tangential or radial migratory pathways. PCs, the output of cerebellar cortex, are derived from the VZ of mouse at E10-E13 (Wang and Zoghbi, 2001), followed by radial migration to form a plate of several cell layers and then a monolayer configuration after birth (Marzban et al., 2014). Then, PCs experience sequential cell shape changes, from a distinct bipolar morphology to a highly elaborated dendritic configuration that is flattened within the sagittal plane (Sotelo and Rossi, 2013). The cerebellar GCs, the most abundant neurons in the vertebrate brain, are derived from the RL in mice at E12.5-17.5, followed by tangentially migration to form the external germinal layer (EGL), a temporary population of proliferating cells. At birth, some granule precursors (GPs) in inner EGL exit the cell cycle and differentiate into mature GCs resulting in the gradual disappearance of the EGL. Subsequently, postmitotic GCs exhibit dynamic morphological changes with leading processes guiding the inward migration along the radial fibers of the Bergmann glia to populate at the internal granular layer (IGL) (Rakic, 1971;Adams et al., 2002), and this process becomes obvious at P5 and is completed by P20 (Altman and Bayer, 1997). At IGL, GCs project an ascending axon that bifurcates at the molecular layer to form the parallel fibers, which synapse with dendrites of PCs or interneurons (Komuro and Yacubova, 2003). The cerebellar interneurons, including stellate and basket cells in the molecular layer, candelabrum cells in the PC layer, and Golgi cells in the granular layer, are born postnatally from arguably diverse origins (Hoshino, 2006). Although many genes and signaling pathways have been shown to be involved in the development, fate determination, and migration of cerebellar neurons, many questions remain unclear. For example, how molecular cascades finely tune neurogenesis and migration of GCs, what genetic factors specify characteristic morphology of PCs, and which factors determine specific synaptic connections. Cerebellar abnormality and dysfunction cause a number of neurological and neuropsychiatric symptoms, such as ataxia, tremor, autism spectrum disorder, and schizophrenia (Bastian, 2011;D'Angelo and Casali, 2012;Reeber et al., 2013). Understanding the cell-type specific expression of disease-related genes may help understand circuitry basis of these diseases and design specific interventions. High throughput single-cell RNA sequencing (scRNAseq) allows for deeper understanding of molecular specification of various cell types in distinct brain regions of given species. Recently, several studies have presented transcriptional profiles of the developmental murine cerebellum (Carter et al., 2018;Gupta et al., 2018;Rosenberg et al., 2018). However, their analyses focused on the expression of transcription factors (TFs) associated with fate determination of GCs or glutamatergic lineages.
Here, we performed the droplet-based scRNA-seq to survey the cerebellar cell types and identified gene expression profiling of the murine cerebellum at postnatal days 0 and 8. We show temporally dynamic gene expression patterns correlated with states of various neuronal cell types, including GCs and GPs, developing PCs, as well as interneurons and glia cells. This analysis also led to identification of novel markers of various cerebellar cell types and suggested factors that might be involved in neuronal differentiation, morphogenesis, or neuro-glia interactions. Moreover, enriched expression of ataxia-related genes in PCs indicates cellular mechanisms of cerebellum-associated diseases.

Cell types of the developing cerebellum
To investigate the postnatal development of the cerebellum, we performed scRNA-seq analysis for samples from postnatal day 0 and 8 using the droplet system from 10x Genomics Chromium (Macosko et al., 2015;Zilionis et al., 2017). After filtering out outliers with extremely low (<1000) and high (>6500) gene numbers, which represented probable dead cells and duplicates, respectively, a total of 21119 single cells from one P0 sample and two independent P8 samples were analyzed ( Figure 1A and Supplementary Dataset S1). A means of 58994 unique molecule identifiers (UMIs) and a median of 2615 genes were detected in each cell ( Figure 1B and Supplementary Figure S1A-C). To examine batch effect and individual variance, we performed Pearson correlation coefficients and canonical correlation analysis (CCA) of three samples (Butler et al., 2018). As shown in Figure 1C, the mean reads per cell of two P8 samples were highly correlated (r = 0.989). The cell distribution from two P8 samples was tightly close as visualized in 2D t-distributed stochastic neighbor embedding (t-SNE) plot ( Figure 1D), suggesting the reproducibility of our data collection. To classify major cell types in the developing cerebellum, we performed a multi-set canonical correlation analysis (Multi-CCA) on gene expression matrix and integrated three datasets (one P0 and two P8) into a single one, and identified shared correlation structures (canonical correction vectors, CC) across datasets (Butler et al., 2018). Because we have begun to see drop-off in signals of shared correction strength before CC 15, we chose CC 1 to 15 for subsequent analysis and t-SNE presentation (Supplementary Figure S1D). We identified 12 transcriptionally distinct cell clusters using Seurat R package . Then, we used a series of known cerebellar cell lineage markers to annotate each cluster, such as Atoh1 marking Clusters 0, 3, 4, and 5 as GPs and Rbfox3 marking Clusters 1 and 2 as GCs (Ben-Arie et al., 1997;Weyer and Schilling, 2003) ( Figure 1E). Notably, Rbfox3 is also expressed in a fraction of Atoh1-positive GPs, suggesting intermediate state of the population. Other clusters represent astrocytes, interneurons, oligodendrocytes, microglia, PCs and endothelial cells ( Figure 1F and G; Supplementary Figure S2A, B and Dataset S1). Our analysis did not identify a separate Bergmann glial cell population, probably because astrocytes and Bergmann glial cells are originated from common precursors (Fleming et al., 2013). Indeed, we found that astrocytes expressed Bergmann glial cell marker genes, such as Sox1 and Sox9 (Supplementary Figure S2C). As a type of GABAergic neuron, PCs expressed Gad1, in addition to known specific marker genes encoding Calbindin 1 (Calb1) and Purkinje cell protein 2 (Pcp2) ( Figure 1G). In line with the notion that cerebellar GCs is the biggest cell population in the vertebrate brain, GPs and GCs accounted for 78% of total cells at P8, whereas PCs accounted for only 1.43% ( Figure 1H and Supplementary Dataset S2). Although different cell types are distinct in the size of cell body, the numbers of genes and UMIs of each cell type were similar (Supplementary Figure S3A and B). Notably, the percentage of mitochondrial transcript reads in PCs was bigger than that in other cell types (Supplementary Figure S3C), suggesting that PCs may contain more mitochondria.

Transcriptional profile of developing PCs
Dataset containing 256 PCs (including 92 from P0 and 164 from P8) was further analyzed for insights into molecular mechanisms underlying temporal development of PCs (Figure 2A). At P0, PCs usually exhibit bipolar morphology, named as simple-fusiform cells, followed by the development of planar and fan shaped dendrites ( Figure 2B) (Sotelo and Rossi, 2013). This process is accompanied by proliferation of GPs and outside-in migration of differentiated GCs (Blazeski and Mason, 1994;Miyata et al., 1997;Rice and Curran, 1999). Although several molecules have been shown to be involved in PC morphogenesis (Kapfhammer, 2004;Fujishima et al., 2018), a complete transcriptome analysis is needed for deeper understanding about mechanisms of PC maturation and functions. We analyzed scRNA-seq dataset of individual PCs from P0 and P8 and identified 618 differentially expressed genes (DEGs) between these two time points (Supplementary Dataset S3). Strikingly, gene ontology (GO) analysis showed that many DEGs were involved in mitochondrial and ATPase biological processes ( Figure 2C-E), suggesting that mitochondrial pathway may contribute to morphological development of PCs. Although PCs occupy a minority of cells in the cerebellum, they have critical nonautonomous roles in cerebellum development and circuit formation, including proliferation of GPs (Goldowitz and Hamre, 1998;Wechsler-Reya and Scott, 1999), migration of GCs (Rezai and Yoon, 1972), as well as synapse formation and refinement (Sassoe-Pognetto and Patrizi, 2017). We extensively analyzed the genes that were differentially expressed in PCs compared to other clusters using Seurat analysis, and identified 815 DEGs between PC and other cell types at P8 ( Figure 2F and Supplementary Dataset S2). Notably, most of the top 20 divergent genes have been shown to play important roles in cerebellum development, such as genes encoding insulin-like growth factor I (Igf1), nuclear receptor ROR-alpha (Rora), and inositol 1,4,5-trisphosphate receptor type 1 (Itpr1), which have been shown to be involved in PC morphogenesis (Hamilton et al., 1996;Fukudome et al., 2003;Van de Leemput et al., 2007). Considering non-autonomous roles of PCs, we analyzed DEGs encoding secretory proteins and identified 16 genes that were highly expressed in PCs ( Figure 2G). Among them, several members of Semaphorin family of axon guidance proteins, including Sema 3a, 4 g, 6b, 7a, c-type natriuretic peptide (Nppc), neuron-derived neurotrophic factor (Ndnf), and platelet-derived growth factor subunit A (Pdgfa), are highly expressed in subsets of PCs ( Figure 2G). These results suggest potential regulators for neuronal morphogenesis, positioning, survival, or precise connections.
In addition to the known marker genes of PC, such as Calbindin, we also found several other genes that were specifically expressed in PCs, including cold shock domain-containing protein C2 (Csdc2), coiled-coil-helix-coiled-coil-helix domain-containing 10 (Chchd10), inositol-trisphosphate 3-kinase A (Itpka), phosphatidate cytidylyltransferase 1 (Cds1), and adenylate cyclase type 1 (Adcy1) ( Figure 3A and B). The expression of these genes in PCs was verified by immunostaining ( Figure 3C) or in situ hybridization (ISH) ( Figure 3D). Notably, Adcy1 protein was not only expressed in the soma of PCs, but also expressed in dendritic processes in outer external granule layer/molecular layer ( Figure 3C). These results indicate novel marker genes expressed in PCs, whose functions remain to be investigated. Indeed, previous studies have shown that mutations of Chchd10 are associated with several diseases, such as frontotemporal dementia and cerebellar ataxia (Bannwarth et al., 2014;Chaussenot et al., 2014). These findings refine and extend previous literatures about molecular specification of PCs and implicate mechanisms underlying cerebellar circuit formation.

Expression pattern and co-expression network of ataxia risk genes
Previous studies have identified a number of ataxia risk genes (Paulson, 2009;Ashizawa et al., 2018), whereas susceptible cell types in the cerebellum remain unclear. We analyzed the expression pattern of 28 ataxia risk genes and found that majority of them are highly expressed in PCs, while only a few of them, including elongation factor 2 (Eef2), ataxin-10 (Atxn10), nucleolar protein 56 (Nop56), are widely expressed in all cell types ( Figure 4A). This finding is in line with the hypothesis that dysfunction of PCs is closely related with occurrence of ataxia disorder (Sakaguchi et al., 1996;Taroni and DiDonato, 2004;Walter et al., 2006). We also analyzed the co-expression network of those ataxia risk genes and revealed a central role of Eef2, Grid2, Tubb4a in the co-expression networks (Irrthum et al., 2010), while Ttbk2 and Bean1 have few interactions with other ataxia risk genes ( Figure 4B). This result indicates that different ataxia risk genes may act through shared or diverse gene expression networks to regulate cerebellar development or functions. GO analysis of these co-expression genes showed that most significantly enriched terms were related with organizations of organelle and mitochondrial inner membrane ( Figure 4C). Strikingly, GO enrichments of ataxia risk co-expression genes were largely consistent with DEGs of PCs between P0 and P8. Together, these results suggest that mitochondrial function is highly correlated with development of PCs and implicate cellular basis of motor behavior disorder.

Gene regulation network for GC differentiation
GCs account for around 80% of all neurons in the cerebellum, where they receive afferent inputs through 4-5 dendrites in the granule layer and form synapses with other cells in the molecular layer through the parallel fibers (Kalinichenko and Okhotin, 2005;Azevedo et al., 2009;Chédotal, 2010). GPs originated from the rhombic lip region migrate to cerebellar anlage and proliferate in the outer EGL, then exit the cell cycle to become migrating neuroblasts in the inner germinal layer. Differentiated GCs migrate into the IGL and mature around postnatal week 3 into fully functional excitatory GCs ( Figure 5A). To reveal the developmental processes of GCs, we analyzed DEGs between GPs and GCs. The enriched GO terms in GPs were related to cell division and proliferation, while GCs enriched genes were related to axonal development and cell morphogenesis ( Figure 5B). To investigate the developmental trajectory of GCs in more detail, a total of 17160 single GCs and GPs were further divided into 6 clusters with random forest algorithm ( Figure 5C). Clusters 3 and 4 showed higher expression of proliferation genes, such as Mki67 and Tpx2 and thus represent proliferating GPs ( Figure 5D). Clusters 0 and 5 represented newly differentiated GCs as they expressed high levels of Atoh1 and Hey1 ( Figure 5D). Clusters 1 and 2 highly expressed neuronal maturation genes, such as Cntn2 and Tubb3, and thus considered to be mature GCs ( Figure 5D). To further explore the gene regulation network involved in GC differentiation, we selected top 13 TFs from DEGs of Cluster 0 and 5 and analyzed co-expression of these TFs with GC expression genes ( Figure 5E). Notably, nascent polypeptide-associated complex subunit alpha (Naca), a transcriptional co-activator which has been shown to be involved in developmental of cardiac and skeletal muscles (Park et al., 2010), exhibited the strongest co-regulation patterns with target genes, suggesting a potential role of Naca in regulating the differentiation of GCs ( Figure 5E). To further display TFs that may orchestrate GCs differentiation, we reconstructed the developmental time-course using Monocle analysis (Trapnell et al., 2014) (Supplementary Figure S4A-D) and assessed the relative expression of 13 TFs in Pseudotime ( Figure 5F) (Trapnell et al., 2014). We found that Naca showed persistent highlevel expression in all stages, whereas differentiation-related genes, such as Atoh1, Hey1, Barhl1 showed transient upregulation in the intermediate stage, and proliferation-related genes, such as Tipin and Cited, exhibited gradual downregulation ( Figure 5F). These results demonstrate temporal and dynamic transcriptional program in the continuous process of GC trajectory.

Implications of interneuron development and glia functions
Cerebellum interneurons are a diverse population that varies widely in morphology, connectivity and patterns of activity. The basket/stellate cells in molecular layer and Golgi cells in granule layer are inhibitory interneurons derived from a common precursor population in the cerebellar white matter (Maricich and Herrup, 1999;Weisheit et al., 2006), whereas unipolar brush cells in granule layer are excitatory neurons derived from the rhombic lip (Englund et al., 2006). As indicated in Figure 6A, a total of 2034 interneurons or their progenitors were subdivided into 5 clusters. Among them, Clusters 1 and 3 represent inhibitory interneuron progenitors in proliferation or intermediate state, characterized by the expression of cerebellar inhibitory interneuron lineage marker genes Ptf1a and Ascl1 without or with the expression of proliferation genes Mki67 and Top2a ( Figure 6B). The expression of Pax3 in these cells agrees with a recent report that showed localization of Pax3-positive inhibitory neural precursors in the cerebellar white matter (Rosenberg et al., 2018). Clusters 0 and 2 represent differentiated inhibitory neurons, as reflected from the expression of Gad1, Gad2, and GABA transporter Slc6a1 ( Figure 6B and C). The expression of Atoh1 and Reelin (Reln) in Cluster 4 suggested origin of this cell population from rhombic lip regions, and the expression of Pax2 and Slc32a1 defined this cluster as a lineage of inhibitory interneurons ( Figure 6B). To further delineate gene regulatory network related with cerebellar interneuron differentiation, we extracted abundantly expressed TFs from Clusters 1 and 4, and analyzed co-expression network of those TFs, respectively. We found that Erh and Hmgb3 are the candidate core TFs regulating the differentiation of Basket/Stellate/Golgi cells, and the distinct TF coexpression network in Cluster 4 suggests again a new subtype of inhibitory neuron linage derived from rhombic lip ( Figure 6D).
It is known that glia cells play important roles in neural development (Mount et al., 1995;Marın-Teva et al., 2004;Zuchero   and Barres, 2015), e.g. guiding nerve growth or neuronal migration, through non-autonomous mechanisms. However, the expression pattern of genes encoding secretory proteins in the developing cerebellum is not clear. From a total of 2826 glia cells (2034 astrocytes, 455 oligodendrocytes, 337 microglia) in our dataset, we identified 233 DEGs among microglia, astrocytes, and oligodendrocytes. Among them, several genes encoding Semaphorin family proteins were enriched in glial cells, e.g. Sema6d in astrocytes, Sema5a and 5b in oligodendrocytes ( Figure 6E). A series of genes encoding chemokine ligands were expressed in microglia, suggesting their roles in cerebellar development ( Figure 6E).

Discussion
Although mammalian cerebellum accounts for <10% of total brain mass and relatively few cell types, it contains more than half of neurons in the whole brain (Wang and Zoghbi, 2001;Lackey et al., 2018). Extensive efforts have been put to investigate development of the cerebellum, in particular factors that determine the generation, positioning, and differentiation of cerebellar neurons (Wechsler-Reya and Scott, 1999;Butts et al., 2014;Marzban et al., 2014). However, the dynamic gene expression networks that orchestrate cell fate trajectory and functions are not completely understood. Single-cell RNA sequencing provides a unprecedented angle for the discovery of new cell types, identifying specific marker genes for target cell types, understanding gene expression patterns specifying particular cell fate, and implicating molecular basis of cells to execute their functions (Pollen et al., 2015;Shin et al., 2015;Zeisel et al., 2015;La Manno et al., 2016;Marques et al., 2016). For instance, it has been widely used for the analysis of cell types in developing or adult cortex of various species, including mouse and human (Pollen et al., 2014;Zeisel et al., 2015;Telley et al., 2016;Nowakowski et al., 2017). In this study, we analyzed 21119 single cells from the cerebellum of postnatal mice. Different from most recent studies, which mainly validated cell types or provided a transcriptional atlas of the developing murine cerebellum (Carter et al., 2018;Rosenberg et al., 2018), our analysis focused on identifying molecules that are correlated with neuronal maturation and functions. Our analysis of ataxia-related genes implicates cell types that mediate abnormal processes in disease.
PCs are the most elaborate cell type in the whole brain with complex dendritic trees receiving around one hundred thousand synaptic inputs per cell (De Camilli et al., 1984). How is this particular morphology determined? Previous studies have shown the involvement of several factors in the morphogenesis of PCs, such as Igf1, Rora, Itpr1, Foxp2, and Geranylgeranyltransferase I (Fukudome et al., 2003;Boukhtouche et al., 2006;Wu et al., 2010;French et al., 2018). We identified 618 DEGs in PCs between P0 and P8 and found that most of them are involved in the mitochondria and ATPase pathways, suggesting roles of these biological processes in PC development and maturation. By analyzing DEGs among different cell clusters, we also identified novel genes specifically expressed in PCs, suggesting new markers of PCs. Because morphogenesis of murine PCs last several postnatal weeks (Berry and Bradley, 1976;Altman and Winfree, 1977), analysis for the gene expression profiles in later time points is necessary for more complete understanding of mechanisms underlying dendritic development, as well as synapse and circuitry formation.
Dysfunction of the cerebellar circuit causes abnormal behavioral outcomes, ranging from neurological to neuropsychiatric conditions, such as ataxia, tremor, and autism spectrum disorder (Lackey et al., 2018). A number of genes have been shown to be associated with cerebellum-associated diseases, but types of cells affected are mostly unclear. Here we analyzed the expression of 28 ataxia risk genes in each cerebellar cell type and the co-expression network of related genes. We found that around half of ataxia risk genes are associated with mitochondria pathway and, remarkably, specifically expressed in PCs. These results suggest cell types susceptible to mutations of disease risk genes. Functional validation of these genes in defined cell types using genetic approaches will further strengthen their functions in neuronal or circuitry development and provide deeper insights into cellular mechanisms of related diseases.
GCs in the cerebellum are the most numerous cell types in the brain. Their precursors are derived from upper rhombic lip at middle embryonic stage (E13.5 in mice), followed by continued proliferation after arriving at the EGL and sequential differentiation and migration at postnatal stages (Wang and Zoghbi, 2001). Thus, the cerebellum at P0 and P8 should contain GCs at all stages. Our scRNA-seq analysis indicates that this is indeed the case. We found continuous trajectory of GCs, ranging from proliferation, differentiation, to mature states. The expression dynamics of TFs in pseudo-time demonstrates a comprehensive genetic program governing state transitions. The expression of genes encoding neuronal cytoskeletal molecules in differentiated GCs suggests molecular mechanisms underlying morphological dynamics during migration.
We expect the results obtained from our dataset and analyses will expedite understanding of mechanisms of cerebellum development. The expression of disease-related genes in defined cell types will accelerate understanding of molecular and cellular mechanisms of cerebellar diseases.

Animals
All animal usage and manipulations followed guidelines of Institutional Animal Care and Use Committees at the Institute of Neuroscience, Chinese Academy of Sciences. C57BL/6 J male mice at postnatal days 0 and 8 were used for single cell dissociation.

Single-cell dissociation and 10x genomics chromium library construction
Dissection and cell dissociation were carried out as described previously (Wu et al., 2014). Briefly, entire cerebellum was separated from the whole brain using tweezers to remove the meninges. Then, the samples were sheared in 1×Hank's balanced salt solution (HBSS) and digested for 15-20 min in 0.25% trypsin solution (Thermo Fisher Scientific). Dissociated cells were filtered through a 22-μm cell strainer (Millipore), pelleted by centrifugation at 300 g for 3 min, and re-suspended in a solution containing 0.04% BSA (Sigma) in 1× PBS (calcium and magnesium free) at a concentration of 2000-3000 cells/ml. Cell healthy state and numbers were measured by trypan blue staining. Dissociated cells in each sample were partitioned into nanoliter-scale Gel Bead-In-EMulsions (GEMs) in a limiting dilution to lower multiplet rate. Then, single-cell GEMs were incubated with reverse-transcription reagents, including several primers for cell and transcript barcoding in cDNA synthesis, followed by cDNA amplification and library construction. All procedures followed the guidelines provided by 10x Genomics, Inc.
Single-cell RNA-seq data processing, filtering, and clustering For RNA-seq data processing, we used the Cell Ranger software provided by 10x Genomics, which helps to convert 10x droplet data to matrices of expression counts. Briefly, the raw base call (BCL) files generated from the Illumina HiSeqX were demultiplexed into paired-end, gzip-compressed FASTQ files. Both pairs of FASTQ files were provided as input to 'cellranger count', and reads were aligned to mus musculus reference transcriptome (GenBank assembly accession: GCA_000001635.8). Both 'cellranger mkfastq' and 'cellranger count' were run with default command line options. Cells were first filtered to remove those that contained >6500 and <1000 genes detected. Genes that were detected in <6 cells were also removed. We next run canocical correlation analysis to indentify common sources of variation between samples and Multi-CCA to combine three objects to one object. Because we have begun to see drop-off in signal before CC15, we chose CC1-15 for subsequent analysis. The Seurat 'FindClusters' function was used to partition the cells into transcriptionally distinct clusters and present the graph clustering output on 2D map by t-SNE.

Differentially expressed genes and GO analysis
The 'FindAllMarkers' function in Seurat was used to identify unique cluster-specific marker genes with threshold set as 0.25. The 'roc' test was used to evaluate 'classification power' for a certain gene (ranging from 0-random, to 1-perfect), and 'avg. logFC' represents fold change in the corresponding clusters. Violin plots, feature plot and heat maps were generated using Seurat. GO enrichment analysis was carried out by 'ClusterProfiler' in R software program.

Developmental pseudotime analysis
The Monocle 2 package in R software program was used to determine the developmental pseudotime of GCs. Single-cell trajectories were constructed by DEGs based on the analysis of UMI reads data. Default settings were used for all other parameters. The regulation networks for the TFs were constructed by GENIE3 package and plotted by Cytoscape.

Data availability
The raw data reported in this work were deposited to NCBI with the accession numbers GEO: GSE122357 or SRP165255.

Supplementary material
Supplementary material is available at Journal of Molecular Cell Biology online.

Funding
This study was partially supported by grants from the National Natural Science Foundation of China (31490591, 31330032, and 61327902), the National Key R&D Program of China (2017YFA 0700500), the Frontier Key Project of the Chinese Academy of Sciences (QYZDJ-SSW-SMC025), Shanghai Municipal Science and Technology Major Project (2018SHZDZX05).