Variational autoencoding of gene landscapes during mouse CNS development uncovers layered roles of Polycomb Repressor Complex 2

Abstract A prominent aspect of most, if not all, central nervous systems (CNSs) is that anterior regions (brain) are larger than posterior ones (spinal cord). Studies in Drosophila and mouse have revealed that Polycomb Repressor Complex 2 (PRC2), a protein complex responsible for applying key repressive histone modifications, acts by several mechanisms to promote anterior CNS expansion. However, it is unclear what the full spectrum of PRC2 action is during embryonic CNS development and how PRC2 intersects with the epigenetic landscape. We removed PRC2 function from the developing mouse CNS, by mutating the key gene Eed, and generated spatio-temporal transcriptomic data. To decode the role of PRC2, we developed a method that incorporates standard statistical analyses with probabilistic deep learning to integrate the transcriptomic response to PRC2 inactivation with epigenetic data. This multi-variate analysis corroborates the central involvement of PRC2 in anterior CNS expansion, and also identifies several unanticipated cohorts of genes, such as proliferation and immune response genes. Furthermore, the analysis reveals specific profiles of regulation via PRC2 upon these gene cohorts. These findings uncover a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion. To support the analysis of emerging multi-modal datasets, we provide a novel bioinformatics package that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.


Supplemental
Normalised distance between tissues. For each Eed-cKO condition, the distance (normalised sum of squared differences in gene expression) between the mutant's merged replicates and all WT conditions is shown. The WT condition with the smallest distance to the Eed-cKO tissue is shown. There is an evident regression along the A-P axis for Eed-cKO FB samples, with some Eed-cKO MB samples also being most similar to more posterior WT tissues (highlighted in bold).
The log2 FC for each tissue was used to rank the significant genes for GSEA, showing that in FB and MB there is negative enrichment for the cell cycle pathway and for RNA biology pathways such as "ribosome" and "spliceosome" (these do not appear to be negatively enriched in HB and SC). In all four tissues, the top positively enriched pathways are associated with immune response. In the posterior tissues, there is negative enrichment of some pathways that overlap with the anterior tissues, such as "DNA replication".
.5 there is downregulation of the "cell cycle" pathway. At E18.5, "cell cycle" is no longer a significant term, but other terms such as "DNA replication" are shared. Among the positively enriched pathways, there is enrichment of immune associated pathways and of a response to a disrupted system (e.g., "allograft rejection").
.5 there is a downregulation of RNA pathways ("RNA degradation", "RNA polymerase") and an upregulation of immune response or cancer terms, as well as of "blood cells" and "hematopoietic cell lineage". At E15.5 and E18.5 there is upregulation of the cell cycle pathway and also of blood cancer associated pathways, such as "chronic myeloid leukemia". At the later stages there is negative enrichment for brain associated pathways, such as "neuroactive ligand receptor interaction" and "axon guidance".
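The normalised distance between tissues described in the first caption above can be illustrated with a minimal sketch. Two assumptions are made here that the caption does not confirm: replicates are merged by taking the per-gene mean, and the sum of squared differences is normalised by the number of genes. The function names, condition labels, and expression values are hypothetical toy inputs, not data from the study.

```python
from statistics import mean


def merge_replicates(replicates):
    """Merge replicates into one profile (assumption: per-gene mean)."""
    return [mean(values) for values in zip(*replicates)]


def normalised_distance(a, b):
    """Sum of squared differences divided by the gene count (assumed normalisation)."""
    if len(a) != len(b):
        raise ValueError("expression vectors must cover the same genes")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def closest_wt(mutant_replicates, wt_conditions):
    """Return (name, distance) of the WT condition nearest to the merged mutant."""
    merged = merge_replicates(mutant_replicates)
    distances = {
        name: normalised_distance(merged, profile)
        for name, profile in wt_conditions.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Toy example: three genes, two WT conditions, two mutant replicates.
wt = {"WT_FB": [5.0, 1.0, 0.0], "WT_SC": [1.0, 4.0, 3.0]}
mutant_fb_replicates = [[1.5, 3.5, 2.5], [0.5, 4.5, 3.5]]
nearest, distance = closest_wt(mutant_fb_replicates, wt)
```

In this toy case the merged mutant FB profile is closest to the WT SC profile, which is the kind of posterior shift the caption reports for Eed-cKO FB samples.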
Groups of FB, MB, HB, and SC genes were most apparent when using the consistently affected gene set (1,371 genes), with separation reduced when adding the much larger set of partly affected genes. (B) Separability (as defined by Silhouette score; higher is better) between the diverse gene groups, each linked to a specific A-P development term, over 20 runs for tSNE, UMAP, deep VAE, shallow VAE, and PHATE, and a single run for PCA (which is deterministic). tSNE was omitted from D = 6/all affected genes as it failed to complete within 24 hours of runtime (Ubuntu 20.04.2, 240 GB system memory, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz); all other tools completed within one hour.
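To make the separability measure concrete, the following is a minimal pure-Python sketch of the Silhouette score and of summarising it over repeated runs. It assumes Euclidean distance in the embedding space and scores singleton groups as zero; the caption does not state which implementation or distance was used. The toy coordinates, group labels, and the `summarise_runs` helper are hypothetical.

```python
from math import dist
from statistics import mean, stdev


def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance assumed)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two groups")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton group: scored as 0 by convention
            continue
        # a: mean distance to the other members of the point's own group
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other group
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def summarise_runs(embeddings, labels):
    """Mean and standard deviation of the score across stochastic runs."""
    scores = [silhouette_score(points, labels) for points in embeddings]
    return mean(scores), stdev(scores)


# Toy embedding: two well-separated gene groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, ["FB", "FB", "SC", "SC"])
bad = silhouette_score(points, ["FB", "SC", "FB", "SC"])
```

Scores close to 1 indicate compact, well-separated groups, while negative scores indicate that points lie closer to another group than to their own. For a deterministic method such as PCA, a single call to `silhouette_score` suffices.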

Supplemental Fig 8. GSEA for VAE. Genes ordered by each VAE dimension are enriched in KEGG pathways (top 5 by NES shown), which concord with the GO analysis (Figure 5) and reinforce the characterisation of each gene cohort, e.g., unmarked proliferation genes map to the large tail of dimension 1 and intersect with pathways for "cell cycle" and "oocyte meiosis".
Supplemental Fig 9. Enrichment for tSNE. (A) Genes were ranked by each tSNE dimension (up to 3) to identify negatively and positively enriched pathways. tSNE 1 is negatively enriched for cell cycle associated pathways and positively enriched for one pathway: "neuroactive ligand receptor interaction". tSNE 2 is negatively enriched for immune response pathways and positively enriched for cancer associated pathways. tSNE 3 is negatively enriched for one pathway, "tight junction", while positively enriched for diverse terms, including "cell cycle" and brain pathways. (B) The top and bottom 200 genes along each tSNE dimension were tested for enriched GO terms. tSNE 1 aligns with cell cycle associated terms, fitting with the enriched pathways along dimension 1, while the top genes are not enriched for any terms. In dimension 2, the bottom genes are enriched for immune response terms, and the top genes for RNA metabolism associated GO terms (which contain the Hox genes), all of which agree with the enriched pathways. The terms for the bottom genes in tSNE 3 overlap with forebrain development, while the terms for the top genes are more diverse, analogous to the pathways, enriching for cell projection as well as development terms. None of the gene cohorts appear to specifically encode development terms.
Supplemental Fig 10. Enrichment for PCA. (A) Genes were ranked by each Principal Component (PC) (up to 3) to identify negatively and positively enriched pathways. PC 1 is negatively enriched for immune associated pathways, and positively enriched for brain dysregulation, "WNT signalling", "Hedgehog signalling", and cancer pathways. PC 2 is negatively enriched for only two pathways: "Nod like receptor" and "arachidonic acid metabolism", which is associated with neurotransmitter systems. The top five positive pathways in PC 2 are diverse, including "cell cycle", "axon guidance", and cancer pathways. PC 3 is negatively enriched for diverse pathways, including cell cycle, carcinoma, and Hedgehog signalling. PC 3 is positively enriched for brain associated pathways, including "Alzheimers" and "axon guidance". (B) The top and bottom 200 genes along each PC were tested for enriched GO terms. PC 1 agrees with the enrichment of pathways, e.g., with RNA metabolism associated GO terms positively enriched (which contain the Hox genes). In PC 2, the top 200 genes are not enriched for any GO terms, while the bottom 200 are enriched predominantly for cell cycle terms. PC 3 is negatively enriched for terms similar to those enriched in PC 2, and positively enriched for transport associated and development terms. There is no enrichment for anterior specific function.
Supplemental Fig 11. Enrichment for UMAP. (A) Genes were ranked by each UMAP dimension (up to 3) to identify negatively and positively enriched pathways. UMAP 1 is negatively enriched for cancer associated pathways, and positively enriched for immune response. UMAP 2 is negatively enriched for a diverse range of pathways, including "cell cycle" and the immune associated "systemic lupus erythematosus". UMAP 2 is positively enriched for only two pathways: "TGF beta signalling", which is associated with development, and "pathogenic Escherichia coli infection". The top five positive pathways in UMAP 3 cover similar pathways to UMAP 2, i.e., "cell cycle" and immune associated pathways. UMAP 3 is positively enriched for terms from UMAP 1 and UMAP 2. (B) The top and bottom 200 genes along each UMAP dimension were tested for enriched GO terms. UMAP 1 is positively enriched with RNA metabolism associated GO terms (which contain the Hox genes), which do not overlap strongly with the enriched pathways. The top genes from UMAP 1 are predominantly enriched in immune response terms and agree with the pathways. The bottom genes from UMAP 2 are associated with cell cycle terms, while genes on the opposing side of the dimension are development associated. In UMAP 3, both bottom and top terms overlap with UMAP 2. There is no enrichment for anterior specific function.
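The captions above share one procedure: genes are ranked along each embedding dimension (for GSEA), and the top and bottom 200 genes are taken for GO term testing. The sketch below illustrates only this ranking and tail selection step, under the assumption that each gene has one coordinate per dimension and that ties may be broken arbitrarily. The function names, gene identifiers, and coordinates are hypothetical, and the enrichment tests themselves are not reproduced.

```python
def rank_by_dimension(embedding, dim):
    """Return gene names sorted from lowest to highest coordinate on `dim`.

    `embedding` maps a gene name to a tuple of coordinates.
    """
    return sorted(embedding, key=lambda gene: embedding[gene][dim])


def tails(embedding, dim, n=200):
    """Return (bottom_n, top_n) genes along one dimension.

    `n` is capped at half the gene count so the two tails never overlap.
    """
    ranked = rank_by_dimension(embedding, dim)
    n = min(n, len(ranked) // 2)
    if n == 0:
        return [], []
    return ranked[:n], ranked[-n:]


# Toy two-dimensional embedding of four placeholder genes.
embedding = {
    "gene_a": (2.1, -0.3),
    "gene_b": (-1.8, 0.9),
    "gene_c": (0.2, 1.7),
    "gene_d": (-0.5, -2.0),
}
ranking_dim0 = rank_by_dimension(embedding, 0)
bottom, top = tails(embedding, 0, n=1)
```

The full ranking would serve as the ordered gene list for a GSEA-style analysis, while the two tails would serve as the input gene sets for GO enrichment.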