Identification of a differentiation stall in epithelial mesenchymal transition in histone H3–mutant diffuse midline glioma

Abstract Background Diffuse midline gliomas with histone H3 K27M (H3K27M) mutations occur in early childhood and are marked by an invasive phenotype and global decrease in H3K27me3, an epigenetic mark that regulates differentiation and development. H3K27M mutation timing and effect on early embryonic brain development are not fully characterized. Results We analyzed multiple publicly available RNA sequencing datasets to identify differentially expressed genes between H3K27M and non-K27M pediatric gliomas. We found that genes involved in the epithelial-mesenchymal transition (EMT) were significantly overrepresented among differentially expressed genes. Overall, the expression of pre-EMT genes was increased in the H3K27M tumors as compared to non-K27M tumors, while the expression of post-EMT genes was decreased. We hypothesized that H3K27M may contribute to gliomagenesis by stalling an EMT required for early brain development, and evaluated this hypothesis by using another publicly available dataset of single-cell and bulk RNA sequencing data from developing cerebral organoids. This analysis revealed similarities between H3K27M tumors and pre-EMT normal brain cells. Finally, a previously published single-cell RNA sequencing dataset of H3K27M and non-K27M gliomas revealed subgroups of cells at different stages of EMT. In particular, H3.1K27M tumors resemble a later EMT stage compared to H3.3K27M tumors. Conclusions Our data analyses indicate that this mutation may be associated with a differentiation stall evident from the failure to proceed through the EMT-like developmental processes, and that H3K27M cells preferentially exist in a pre-EMT cell phenotype. This study demonstrates how novel biological insights could be derived from combined analysis of several previously published datasets, highlighting the importance of making genomic data available to the community in a timely manner.

Background Diffuse midline gliomas with Histone H3 K27M (H3K27M) mutations occur in early childhood and are marked by an invasive phenotype and global decrease in H3K27me3, an epigenetic mark which regulates differentiation and development. H3K27M mutation timing and effect on early embryonic brain development are not fully characterized.

Results
We analyzed multiple publicly available RNA sequencing datasets to identify differentially expressed genes between H3K27M and nonK27M pediatric gliomas. We found that genes involved in the epithelial-mesenchymal transition (EMT) were significantly overrepresented among differentially expressed genes. Overall, the expression of pre-EMT genes was increased in the H3K27M tumors as compared to nonK27M tumors, while the expression of post-EMT genes was decreased. We hypothesized that H3K27M may contribute to gliomagenesis by stalling an EMT required for early brain development, and evaluated this hypothesis by employing another publicly available dataset of single-cell and bulk RNA sequencing data from developing cerebral organoids. This analysis revealed similarities between H3K27M tumors and pre-EMT normal brain cells. Finally, a previously published single-cell RNA sequencing dataset of H3K27M and nonK27M gliomas revealed subgroups of cells at different stages of EMT. In particular, H3.1K27M tumors resemble a later EMT stage compared to H3.3K27M tumors. Conclusions Our data analyses indicate that this mutation may be associated with a differentiation stall evident from failure to proceed through the EMT-like developmental processes, and that H3K27M cells preferentially exist in a pre-EMT cell phenotype. This study demonstrates how novel biological insights could be derived from combined analysis of several previously published datasets, highlighting the importance of making genomic data available to the community in a timely manner. 5. It's not super clear that the previous finding of gliomagenesis in the postnatal period was identified in a pre-EMT cell type, and the finding of no-gliomagenesis occurred in a post-EMT cell type. Could it be made clearer in the figure what cell types these previous experiments were performed in? Response: To clarify, Lewis et al., Science 2013 (ref. 62) found that expression of H3.3K27M with p53 loss in neonatal progenitor cells was insufficient for gliomagenesis. Subsequently, Larson et al. Cancer Cell 2019 (ref. 63) found that induction of H3.3K27M, p53 loss and mutant PDGFRA in neonatal progenitor cells did result in spontaneous gliomagenesis. Both experiments involved pre-EMT progenitor cells. We have clarified this in the text on page 12.
EMT gene signature and score 6. Could you please provide more information on how this was developed and calculated. Response: The EMT completion gene signature was developed through literature review and contains canonical post-EMT gene markers. Supplementary Table 3 shows the genes in this signature and literature sources for the role of each gene in EMT. The single cell transcriptional score is calculated using a previously published method (Tirosh et  For each gene in the gene set of interest, 100 genes are randomly selected from that gene's expression bin. This creates a control gene set with a comparable expression distribution, so that its expression provides a background gene set that is 100 times larger than the gene set of interest. For each cell, the average expression of the gene set of interest is compared to the average expression of the control gene set to create a raw score for that cell. The scores are then normalized across the dataset. The code for our implementation of this method is publicly available at: github.com/lauren-sanders/EMT-paper/   9. I think there may be a word missing in this line: "resulting in H3K27 trimethylation of key epithelial genes such as concurrently upregulating mesenchymal genes" Response: Thank you; we have reworded the sentence. Reviewer #2: The authors searched a subset of publicly available pediatric high-grade gliomas datasets to compare differentially expressed genes between H3K27M mutant diffuse gliomas and H3K27M wild-type gliomas. They identified an early EMT signature overexerted in the former compared to the latter. MAJOR COMMENTS 1. Diffuse midline glioma whether H3K27M mutant are different from hemispheric glioma especially in their invasive phenotype. As such, comparing these entity is proved to show a difference in EMT as invasion and EMT are so linked. Therefore, the authors can not infer that the difference shown here is due to the H3K27M mutation per se. To support this statement, they should either perform KO or KD experiments in H3K27M mutant cells or KI experiments in H3K27M wild-type cells. Alternatively, separating in the data set diffuse gliomas from the more circumscribed gliomas would be helpful. Ideally a significant data set from diffuse gliomas without H3K27M mutation may be a better control. If these experiments can not be performed, the statement should be then softened. Response: We thank the reviewer for raising a very important point and providing some excellent recommendations for additional analysis. In order to address this issue, we have repeated our differential expression analysis with a cohort of DIPG samples containing no hemispheric glioma samples, in order to clarify the role of H3K27M mutation after removing possible histological or location factors. The results of this analysis are consistent with the original findings. We describe this analysis on page 10 of the manuscript, and we also provide a caveat in the Discussion (page 24) and suggest future studies that could more clearly describe the role of H3K27M mutation alone in the observed EMT-related transcriptional signature. We provide the description of the additional DIPG-specific analysis, as well as the caveat in the Discussion, below: Additional DIPG-specific Analysis: (page 10) Finally, because EMT is associated with invasiveness in gliomas, and diffuse midline glioma are by nature more invasive than hemispheric glioma, we performed an additional analysis restricted to diffuse intrinsic pontine glioma (DIPG) to elucidate the role of the H3K27M mutation in the observed EMT-related transcriptional profiles. The goal of this analysis was to remove any potential histological or location signal that may be influencing the EMT-related gene expression. We used 10 H3 wild-type DIPG samples and 47 H3K27M DIPG samples from Treehouse cancer compendium v11. Limma differential expression analysis revealed 48 genes with higher expression in H3K27M DIPG compared to nonK27M DIPG (Supplementary Table 2). We again computed statistical overlap of these genes with 9 gene sets relating to epithelial cells and early brain development, and found significant overlap with 4 of the 9 gene sets (pvalue<0.1, Supplementary Table 2). The enrichment of 48 randomly selected genes in these 9 gene sets were not significant (Supplementary Table 2).
Caveat in the Discussion (page 24): "Additionally, a limitation of our study is that it is difficult to isolate the role of the H3K27M mutation from other factors such as histology and tumor location. Future studies will focus on EMT-related transcriptional programs in cellular models with inducible H3K27M expression to further characterize the molecular interplay between the H3K27M mutation and EMT in developing brain cells." 2. A comparison with adult type gliomas of the mesenchymal subtype would be interesting as well to tease out the real role of H3K27M mutation in this phenotype.

Response:
We examined the expression of the adult glioma mesenchymal subtype gene signature (Verhaak et al., Cancer Cell 2010) in our pHGG cohort (Reviewer Requested Figure 1). The H3K27M and nonK27M pHGG samples display similar expression of the adult mesenchymal signature, such that the expression of these genes does not separate them into different subtypes. This may be explained by the numerous differences between adult glioblastomas and pediatric diffuse midline gliomas. Adult glioblastomas have a higher mutation rate than pediatric gliomas (Schwartzentruber et al. Additionally, the adult GBM Mesenchymal subtype is predominantly characterized by mutations or deletions of NF1, a gene which is rarely altered in pediatric glioma. Moreover, this subtype expresses a combination of mesenchymal and astrocytic markers linked to dedifferentiated tumors. While dedifferentiation is well characterized in adult tumors since they arise in fully developed tissues, pediatric gliomas arise during embryonic development and are thought to represent a differentiation stall rather than dedifferentiation events (Filbin and Monje, Nat Med 2019).
Also, while the EMT is associated with proliferation and invasion in some adult cancers (Marcucci et al., Nat Rev Drug Discov 2016), here we refer to the EMT as a developmental process which must occur multiple times during normal neuronal development. This process is controlled by the H3K27me3 epigenetic mark, and we hypothesize that lack of H3K27me3 may prevent full completion of EMT at one or multiple timepoints during early brain development. Here, we follow up on previous work in the field which shows that early neural or glial precursor cells are in a susceptible transcriptional state and undergo a differentiation stall upon the occurrence of H3K27M. Because embryonic neural progenitor/stem cells undergo EMT-like processes during the formation of the neural plate and neural tube, we hypothesized that early neural or glial precursors poised to undergo an EMT as part of normal development are in a susceptible transcriptional state for gliomagenesis. Importantly, here we discuss EMT as a normal developmental process implicated in the early stages of H3K27M gliomagenesis, separate and distinct from the EMT as a process utilized by malignant cells undergoing metastasis, which may also occur later in pHGG tumor growth, as noted in Puget et al., PLoS One 2012.
We have clarified this throughout the manuscript by removing the use of the words "epithelial" or "mesenchymal" when referring to the EMT-like pHGG phenotypes that we observe.
3. The concept of EMT in gliomas can not be extrapolated form the data acquired in epithelial tumors. Most glioma gene expression profiles are mesenchymal and not epithelial at all while the so called mesenchymal subtype is even more mesenchymal (Iser et al, Med Res Rev 2017). The term is therefore misleading, should be used with caution and the EMT datasets from epithelial cancers should not be applied. The genes defining mesenchymal subset of GBM may have been more appropriate.
Response: Thank you to the reviewer for an excellent point. To address this comment, we have removed genes derived from EMT datasets from epithelial cancers, and retain only genes from datasets annotated by EMT as a cellular developmental process. This change is reflected in Supplementary Table 2 "EMT master gene list" and Figures 1B  and 3A, as well as in the text on page 8. As discussed in point 2, the genes defining the mesenchymal subtype of adult GBM may be less relevant for studying EMT as a developmental process in the brain.
MINOR COMMENTS -Differential expression analysis of pediatric gliomas with and without H3K27M mutation reveals deregulation of genes involved in epithelial-mesenchymal transition. 4. The authors should provide a reference that indicate that EMT is orchestrated by H3K27me3 marks (I found one reference in glioma Zhou et al, Oncol Lett 2019, the rest is in epithelial cancers). Response: Thank you to the reviewer for identifying an excellent reference. We have included this reference on page 4. 5. The limitation of GSEA datasets is that in most of the cases they are not oriented by over-expression or repression of the gene in the described pathway. Response: In order to overcome this limitation, we performed differential expression analysis first, then used gene set enrichment analysis to investigate gene sets enriched in overexpressed genes in the H3K27M glioma cohort. 6. The authors claim that over expressed genes in H3K27M mutant genes are associated with epithelial-like states and normally unregulated prior EMT while those under expressed are associated with the mesenchymal state, this is in contradiction with the adult literature in gliomas and also previous publications (Meel et  Response: In order to provide statistically meaningful analysis of the genes over-expressed in H3K27M mutant gliomas, we performed additional gene set enrichment analysis using the EMT genes over-expressed in H3K27M mutant glioma. Our results are consistent with the rest of our original findings, and show significant enrichment in gene sets related to early development, neurodevelopment, epithelial proliferation and epithelium development. We have included these results in the manuscript on page 9.
With respect to the publications mentioned by the reviewer, Meel et al 2018 primarily reviews the data on EMT in adult gliomas; there is limited data on the expression of EMT genes in DIPGs. They also acknowledge differences in the role of EMT genes between adult gliomas and DIPGs. In figure 4 of their paper, the authors show that in 6/8 pro-EMT genes examined, their expression in DIPGs is lower than or equal to the expression in adult GBM. Those results concur with our data in H3K27M gliomas. In addition, Meel et al specifically calls for more research in EMT in pediatric gliomas, which is the focus of our study.
Puget 2012 identifies a subset of DIPG samples (whose H3K27M status is unknown) that is enriched for the mesenchymal adult GBM subtype (mentioned above). They also examine expression of a list of 53 transcription factors associated with EMT in adult HGGs. It appears from figure 3A that some portion of these genes has relatively low expression in this mesenchymal subset of DIPGs, which is consistent with our analysis as well. While Puget was crucial in discovering a role for EMT in DIPGs, our study provides a novel contribution by focusing on the role of H3K27M and the concomitant loss of H3K27me3 in the expression of EMT-related genes.
We discuss both of these previous studies in our manuscript (Meel et  -H3K27M-mediated gliomagenesis is associated with pre-EMT cell types. 7. If I understood the experiment correctly, the authors show that H3K27M mutant gliomas have a pre-EMT signature and that this signature is present in the developing brain in cells in pre-EMT stage. This is a rather tautologic demonstration. EMT and stem cell state are really correlated in these gliomas where the bulk of the tumor cells are stem-like cells (Filbin et al, Science 2018). This could well be another hypothesis. This organoid experiment does not clarify it in my view. Response: To clarify, the organoid experiment serves to identify specific cell stages in the developing brain which are transcriptionally similar to H3K27M glioma. We employed the organoid data in order to investigate the hypothesis that H3K27M gliomas are similar to pre-EMT developing cells, and this analysis shows a transcriptional similarity between H3K27M gliomas and pre-EMT brain cell stages. We have clarified this experiment in the text on page 14.
-Single-cell profiling of H3K27M gliomas reveals groups of cells at different stages of EMT. 8. Number of samples are really low and lower than in the the reference paper where this data was extracted from, why? Response: The reference paper analyzed 6 K27M gliomas and 3 IDH-WT / H3-WT GBMs, all of which were publicly available at GSE102130 and we have included all of these data in our analysis. Although the reference paper does reference additional IDH-mutant astrocytoma and IDH-mutant oligodendroglioma samples, these data were not publicly released and would not have been relevant for our analysis. 9. How do the A, F, G, H clusters compare with the stemless signature used by Filbin et al?

Response:
We examined the expression of the 4 signatures identified by Filbin et al., Science 2018 (cell cycle, astrocytic, oligodendrocytic, and OPC-like) in the clusters identified in the EMT analysis. Consistent with our analysis, the mature astrocytic signature is enriched in "post-EMT" clusters A and B. The mature oligodendrocytic signature is enriched in cluster H, consistent with our finding that cluster H is ODC-like. The oligodendrocyte precursor stem-like signature is enriched in cluster E, consistent with our finding that cluster E is less differentiated and "pre-EMT", and and in "OPC-like" cluster G. We include this figure in the manuscript as Supplementary Figure 5. 10. The occurence of tumor cells in different state of EMT is in contradiction with the claim that H3K27M tumors are in a differentiation stall, isn't? Response: The single cell RNA-seq analysis refines the hypothesis that H3K27M tumors undergo a differentiation stall during developmental EMT events. While both H3.3K27M and H3.1K27M cells appear to be less mature than H3WT cells, the two mutations result in slightly different EMT-related transcriptional signatures, with H3.3K27M cells expressing a more primitive signature than H3.1K27M cells. 11. The mixing of different H3K27M tumors is not appropriate, especially with the many differences already shown between H3.1 and H3.3K27M tumors (Nagaraja, Castel for exemple). Response: There are indeed many known differences between H3.1K27M and H3.3K27M tumors as noted by the reviewer. However, for our study, we started out by hypothesizing that both H3.1K27M and H3.3K27M mutant gliomas would be different from the H3WT tumors because they both have loss of H3K27 trimethylation. Since we included both mutation types and wild-type cells in our unsupervised single cell clustering, we were able to identify different transcriptional signatures which would not have been apparent if we had kept the mutation types separate.
-Histone H3.1K27M glioma cells may represent a more advanced stage of EMT than H3.3K27M glioma cells. 12. EMT has been observed in DIPG (the prototypic H3K27M diffuse midline glioma) since 2012 by Puget et al. Albeit the histone H3 mutation was not known. The mesenchymal subtype described in this paper resembles the H3.1K27M subgroup mentioned here. This may be discussed later. Moreover, the H3.1K27M DIPG gene expression profile has been already shown to be enriched with the mesenchymal GBM signature, this old data support their finding and should not be omitted in the discussion.

Results
We analyzed multiple publicly available RNA sequencing datasets to identify differentially expressed genes between H3K27M and nonK27M pediatric gliomas. We found that genes involved in the epithelial-mesenchymal transition (EMT) were significantly overrepresented among differentially expressed genes. Overall, the expression of pre-EMT genes was increased in the H3K27M tumors as compared to nonK27M tumors, while the expression of post-EMT genes was decreased. We hypothesized that H3K27M may contribute to gliomagenesis by stalling an EMT required for early brain development, and evaluated this hypothesis by employing another publicly available dataset of single-cell and bulk RNA sequencing data from developing cerebral organoids. This analysis revealed similarities between H3K27M tumors and pre-EMT normal brain cells. Finally, a previously published single-cell RNA sequencing dataset of H3K27M and nonK27M gliomas revealed subgroups of cells at different stages of EMT. In particular, H3.1K27M tumors resemble a later EMT stage compared to H3.3K27M tumors.

Conclusions
Our data analyses indicate that this mutation may be associated with a differentiation stall evident from failure to proceed through the EMT-like developmental processes, and that H3K27M cells preferentially exist in a pre-EMT cell phenotype. This study demonstrates how novel biological insights could be derived from combined analysis of several previously published datasets, highlighting the importance of making genomic data available to the community in a timely manner.

Background
Pediatric high grade gliomas (pHGGs) are aggressive brain tumors occurring at a median age of 6 [1] . Sixty percent of pHGGs harbor a histone H3 K27M mutation, which is associated with an aggressive phenotype and dismal survival rates [2] . H3K27M-mutant pHGG tumors are located along the midline, including in the pons, cerebellum, and brainstem. A diffuse phenotype and delicate location leave them unsuitable for surgery, and their pronounced chemoresistance renders the standard treatments for gliomas ineffective, resulting in a median survival time of only 12 months [3,4] . The prognostic significance of the H3 K27M mutation in these gliomas resulted in a new WHO tumor classification, diffuse midline glioma with H3K27M mutation [5] .
The H3K27M mutation results in a global decrease in H3K27me3, an epigenetic repressive mark and posttranslational histone modification [6] . Seventy five percent of gene loci lose or have reduced H3K27me3, although a few loci gain the mark as a result of the H3K27M mutation [2,7] . H3K27me3 is deposited predominantly by EZH2, the catalytic subunit of the PRC2 methyltransferase complex. By regulating H3K27me3, EZH2 maintains cell identity and regulates cellular differentiation [8][9][10][11] . Silencing EZH2 in neuroepithelial cells before their differentiation alters the distribution of the progeny cell types [12] . EZH2 also maintains neuroepithelial cell integrity, and midbrain identity [13,14] . 4 Because H3K27me3 is globally lost in H3K27M-mutant glioma, the subsequent deregulation of gene expression is thought to lead to tumorigenesis, although the developmental timing of the mutational event is important [15] . H3K27M expression in neural stem cells has led to tumorigenesis in mice when accompanied by TP53 knockout and/or PDGFRA amplification, but this combination of molecular aberrations failed to result in tumorigenesis when introduced in mature astrocytes [16,17] . However, the precise cell type of origin for H3K27M gliomas is not yet known. Candidate cell types include neuroepithelial cells (also known as neural stem cells), radial glia (also known as neural progenitor cells), and oligodendrocyte precursor cells (OPCs) [16][17][18] .
Many important brain developmental processes are regulated by H3K27me3 deposition and could contribute to gliomagenesis if not well controlled. One of these is the epithelial-mesenchymal transition (EMT) pathway, which is essential for gastrulation, migration of neural crest cells, and neural tube formation [19][20][21][22] . The EMT is regulated by SNAI1, a transcription factor master regulator [23][24][25] . By regulating EMT, SNAI1 plays a critical role in many developmental processes, including gastrulation and differentiation of embryonic stem cells [26][27][28] . SNAI1 induces EMT through direct recruitment of PRC2, resulting in H3K27 trimethylation of key epithelial genes such as well as concurrently upregulating mesenchymal genes [29,30] .
In the brain, cellular transitions driven by EMT-like transcriptional programs are involved in key developmental steps such as the differentiation of neuroepithelial cells to both neuronal and glial cells [31,32] . These transitional transcriptional programs, which control cell fate and identity in early neural cell development, are regulated by EZH2 [33] .
Given the regulation of EMT-associated gene transcription by H3K27me3 deposition in the brain, and the disruption of this deposition by the H3K27M mutation, we sought to investigate 5 EMT-related gene expression in pHGGs with and without the H3K27M mutation. We analyzed RNA sequencing data from 78 pHGGs obtained from several different studies (Supplementary   Table 1). First, we performed differential expression analysis using RNA sequencing (RNA-seq) derived gene expression from bulk tumor samples, and found that H3K27M gliomas differentially express pre-EMT genes [34] . Secondly, we examined previously published cerebral organoid data and observed transcriptional similarities between pre-transition neural stem cells and H3K27M gliomas [35] . Finally, we leveraged a recent single cell RNA sequencing dataset to uncover multiple EMT-related transcriptional states in H3K27M tumor cells [18] . Overall, our results suggest that the H3K27M mutation may cause an arrest in development of a neural stem cell type due to lack of H3K27me3 transcriptional control of EMT-related cellular transitions, indicating a developmental window of opportunity for H3K27M mutations to induce gliomagenesis.
Our study highlights the importance of genomic data sharing for rare diseases, such as pHGGs.
By combining RNA sequencing data from multiple previously published studies, we were able to assemble a cohort of 78 pHGG, large enough for the differential expression analysis of pHGGs with and without the H3K27M mutation. We used this new cohort of previously published data to derive a novel biological model to describe the molecular pathogenesis of the disease.

Data Description
The RNA sequencing data from bulk clinical pediatric glioma samples used in these analyses were downloaded from the Treehouse cancer compendium v8, which is publicly available at the Treehouse website ( treehousegenomics.soe.ucsc.edu/public-data/). All samples passed the RNA sequencing quality control analysis used in the curation of the Treehouse cancer compendium [34] .
The single cell glioma RNA sequencing data were downloaded from the Gene Expression  Figure 1A). KRAS pathway enrichment is consistent with a recent study which found RAS signaling to be activated in H3K27M gliomas [45] .
Because genes involved in the epithelial-mesenchymal transition (EMT) are regulated by deposition of H3K27me3, an epigenetic transcriptional repressive mark that is lost in H3K27M cells, we were particularly interested in the differential expression of genes involved in the EMT pathway. The Hallmark EMT pathway gene list is limited to 200 genes [46] , so to comprehensively characterize expression of EMT-associated genes in H3K27M mutant versus nonK27M tumors, we generated a master list of non-redundant EMT-related genes (n=437) by merging several MSigDB developmental and cellular EMT-related gene sets (Supplementary Table 2). We included only genes from gene sets focused on EMT as a developmental process, and eliminated gene sets that were derived from published studies of EMT in adult carcinomas as per MSigDB [46] , because the epithelial nature of those cancers makes those gene sets inapplicable to pediatric gliomas. This list includes genes implicated in both pre-and post-EMT cell states, as well as intermediate EMT cell states and EMT-like processes.
To investigate differential EMT gene expression, we calculated the overlap between the EMT master list and the differentially expressed genes (Supplementary Table 2). We found 123 49 differentially expressed genes from the EMT master list, indicating potential differential activity of the EMT pathway in H3K27M mutant gliomas (pvalue<7.89 -14 , hypergeometric test). Of these genes, 26 were more highly expressed in H3K27M tumors, and the remaining 23 were more highly expressed in nonK27M tumors. (Figure 1B). Further investigation via manual inspection revealed that, in general, the EMT-related genes overexpressed in the H3K27M cohort are associated with the transcriptional profile of cells prior to an EMT-like transition. In contrast, 8 many of the EMT genes underexpressed in H3K27M tumors associated with a post-EMT cell state.
To statistically quantify the association of the 26 EMT-related genes overexpressed in the H3K27M cohort with pre-EMT cell states in the brain, we manually identified 9 gene sets relating to epithelial cells and early brain development (Supplementary Table 2). The H3K27M-high EMT genes had significant enrichment in 8/9 gene sets (pvalue<0.1), In contrast, we calculated the enrichment of 26 randomly selected genes in these 9 gene sets and they were not significant (Supplementary Table 2). The enriched epithelial gene sets include "GO Epithelium Development" (pvalue<3.4 -12 , hypergeometric test), "GO Epithelial Cell Differentiation", (pvalue<3.587 -04 ), and "GO Neural Tube Formation" (pvalue<0.001). In the developing brain, some of the cells of the neural tube, a pseudostratified epithelium, undergo an EMT in order to migrate [21] . RHOB , which plays a role in epithelial cell maintenance in the neural tube [47] , is more highly expressed in H3K27M tumors and belongs to 3/9 epithelial gene sets. Additionally, SFRP1 and SFRP2 , which are crucial in neural tube formation [48] , are more highly expressed in H3K27M tumors and belong to 6/9 epithelial gene sets.
Importantly, we noted that SNAI1 , a transcription factor and key regulator of the EMT transcriptional program, is significantly overexpressed in H3K27M tumors (LFC=0.6; Figure 1C).
Because SNAI1 induces EMT-like processes in the developing brain by directly recruiting PRC2 methyltransferase activity for H3K27-trimethylation, a process blocked by the H3K27M mutation, we hypothesized that the occurrence of the H3K27M mutation may promote tumorigenesis by stalling EMT during early neuroepithelial differentiation. To further investigate this hypothesis, we performed comparative RNA-sequencing expression outlier analysis developed by the Treehouse Childhood Cancer Initiative, which identifies genes with outlier expression in individual samples as compared to a background cohort of highly correlated and disease-matched samples (pan-disease analysis, see Methods) [34] . We identified genes with outlier expression only in nonK27M pHGG samples (but not H3K27M pHGG samples) as compared to a background glioma cohort, and noted that four of the top ten enriched pathways were related to EMT, including "TGF-Beta regulation of the extracellular matrix" (adjusted pvalue 4.01 -09 ) and "Extracellular matrix organization" (adjusted pvalue 2.98 -05 ) (Supplementary Figure   1, Supplemental Table 2).
Finally, because EMT is associated with invasiveness in gliomas, and diffuse midline glioma are by nature more invasive than hemispheric glioma, we performed an additional analysis restricted to diffuse intrinsic pontine glioma (DIPG) to elucidate the role of the H3K27M mutation in the observed EMT-related transcriptional profiles. The goal of this analysis was to remove any potential histological or location signal that may be influencing the EMT-related gene expression. We used 10 H3 wild-type DIPG samples and 47 H3K27M DIPG samples from Treehouse cancer compendium v11.
Limma differential expression analysis revealed 48 genes with higher expression in H3K27M DIPG compared to nonK27M DIPG (Supplementary Table 2). We again computed statistical overlap of these genes with 9 gene sets relating to epithelial cells and early brain development, and found significant overlap with 4 of the 9 gene sets (pvalue<0.1, Supplementary Table 2).
The enrichment of 48 randomly selected genes in these 9 gene sets were not significant (Supplementary Table 2).
Overall, our multiple analyses of the pHGG RNA-seq cohort suggest that H3K27M pHGG tumors are characterized by a transcriptional profile typically expressed by cells before undergoing an EMT-like transitional process, while nonK27M pHGG tumors are characterized by post-EMT gene expression.

B. H3K27M-mediated gliomagenesis is associated with pre-EMT cell types.
Consistent with our differential expression analysis, a review of the literature revealed that H3K27M-associated gliomagenesis has been experimentally recapitulated only in cell types which are poised to undergo an EMT differentiation event (Figure 2A). For example, a combination of H3K27M, p53 loss, and PDGFRA constitutive activation in human neural progenitor cells (NPCs) induced low grade gliomas when injected into the pons of neonatal mice [16] . These gliomas expressed markers of pre-EMT neuroepithelial cells. Another study found that H3K27M and Trp53 loss was sufficient for gliomagenesis in the NPCs of embryonic mice in the forebrain and hindbrain [17] . Strikingly, when introduced post-natally, H3K27M and p53 loss in pre-EMT NPCs was not sufficient for gliomagenesis, although post-natal induction of H3K27M, Trp53 loss and PDGFRA amplification in pre-EMT NPCs resulted in glioma formation [53,54] . Additionally, no tumorigenesis was observed upon introduction of H3K27M, p53 loss and PDGFRA constitutive activation in mature astrocytes, a post-EMT cell type [16] .
These observations indicate that experimental H3K27M-mediated gliomagenesis occurs in a pre-EMT cell type.  Table 3, see Methods). Hierarchical clustering of the expression profiles of these genes in normal cell types during neural development revealed highest expression in pre-EMT neural epithelium and early radial glia ( Figure 2D). We then ranked this gene signature based on each gene's expression in each cell type (see Methods). 13 We found that this signature is ranked most highly in pre-EMT neural epithelium and in early radial glia (pvalue<0.05, Figure 2E).
Overall, these results suggest that the differential EMT-related gene expression observed in our tumor cohort is consistent with identifiable stages in cyclic EMT-like transcriptional programs in the normal developing brain, and that H3K27M tumor cells resemble normal developing brain cells at a point where they are expressing a pre-EMT transcriptional profile. (Mann-Whitney significance test; * pvalue < 0.05, ** pvalue < 0.01, **** pvalue < 0.0001)

EMT-related transcriptional profiles.
We utilized recently published single cell RNA-seq data from 6 H3K27M and 2 H3 wild type (H3WT) gliomas to directly investigate the EMT-related transcriptional profiles of single cell populations within each tumor type [18] . One of the H3K27M tumors harbors the mutation in the  Table 4).
Cluster gene signatures were identified by assigning each cluster the genes with maximum mean expression in that cell cluster across the dataset (Supplementary Table 4).
We assigned cluster function based on manual review of genes in each signature, and observed several populations of cells whose presence in this dataset has already been noted [18] . Cluster I is composed predominantly of non-malignant immune cells, indicated by comparatively high expression of immune markers such as CD68 [60] . Cluster H resembles oligodendrocytic cells, with highest expression of PADI2, PMP22 and RHOA , and cluster G resembles oligodendrocyte precursor cells with the highest expression of PDGFRA [61][62][63][64] . The presence of each of these cell types has already been noted in H3K27M gliomas [18] .
However, the remaining clusters are defined by genes associated with EMT. We again scored the EMT status of each cell with a gene signature representing EMT completion ( Figure 3A, see Methods) [18,[55][56][57][58] . Clusters D, E and F scored the lowest overall, while clusters A, B and C scored the highest overall. Cluster relationships are shown with Uniform Manifold Approximation and Projection (UMAP) in Figure 3B, and expression patterns of selected genes relating to transcriptional stages of EMT-like transitions are shown in the lower panel of Figure 3B. Of the genes identified in the bulk RNA sequencing analysis ( Figure 1C), only FN1 , CDH2 and CDH11 were expressed in the glioma single cell RNA-seq data, so we also visualized VIM as a post-EMT marker and SFRP1 as a pre-EMT marker.
In keeping with our previous analysis, we noted that cluster A, which is composed mainly of H3WT glioma cells, strongly resembles post-EMT cells and most highly expresses canonical post-EMT markers including CDH2 , CDH6 and VIM [65,66] . This is consistent with our observation that nonK27M gliomas transcriptionally resemble a post-EMT state as compared to H3K27M in the bulk RNA-seq pHGG cohort.
Interestingly, within the clusters composed predominantly of H3K27M cells, we observed In contrast, H3K27M-expressing clusters E and F cells exhibit comparatively the highest expression of several genes known for their expression in pre-EMT cell types, including CADM1 , PTEN, CTNNB1 and SFRP1 [48,[67][68][69][70] . Therefore, we defined Clusters E and F "pre-EMT". In contrast, Cluster C has comparatively the highest expression of only 10 genes and has no clear expression profile of any stage of EMT, so we defined Cluster C "EMT-ambiguous".
Cluster D was defined "EMT-intermediate", because it displays high expression of genes normally expressed while the EMT process is taking place, without a clear bias towards epithelial or mesenchymal gene expression, including SMAD2 and VCAN, which are activated during the EMT process rather than before or after [71,72] . H3.1K27M gliomas are comparatively rare but have a slightly better prognosis [40,73] . Normally, histone H3.3 is preferentially located at active chromatin [74][75][76] . This leads to distinct patterns of epigenetic reprogramming in each histone variant, where loss of the H3.3K27me3 mark is directly correlated with areas of H3.3 genomic enrichment, while H3.1K27me3 loss is higher at intergenic regions [76,77] . Because the H3K27M mutation is known to induce dose-dependent inhibition of PRC2 methyltransferase, this suggests that the localized distribution of histone H3.3 may result in higher local inhibition of PRC2 and loss of H3K27me3 at H3.3K27M sites [53,76] .
Because precise control of gene transcription via active chromatin is necessary for EMT-like developmental cell state transitions, a H3.3K27M mutation would be particularly damaging to proper regulation of these processes. Indeed, functional analysis of enhancer regions in H3.3K27M-expressing NPCs revealed enrichment of regions positively regulating EMT-related genes, indicating that H3.3 active chromatin regions are directly involved in transcriptional control of EMT-related genes [76] . This suggests that EMT-poised H3.3K27M cells will be unable to properly complete the transition due to lack of transcriptional control.

Accordingly, we observed EMT-intermediate or E/M hybrid expression genes in glioma
single-cell cluster D, which has a substantial number of H3.1K27M glioma cells. We hypothesized that H3.1K27M cells may be more differentiated than H3.3K27M cells.
In order to investigate this hypothesis further, we subset the single cell glioma RNA-seq data to 2458 cells with H3.1K27M or H3.3K27M mutation and performed Wilcoxon rank-sum test to identify genes overexpressed in each variant group (Supplementary Table 4 all cells for EMT completeness shows that H3.1K27M cells score significantly higher overall than H3.3K27M cells, while nonK27M cells score significantly higher than either mutant cell type (Supplementary Figure 3). However, because the H3.1K27M cells come from a single tumor, we performed additional analysis to investigate this observation.
We cultured DIPG primary cell lines isolated in a previous study to investigate the expression of EMT markers in H3.3K27M, H3.1K27M and nonK27M glioma cells [78] . Morphologically, we observed that when cultured in serum-free conditions, the H3.1K27M cell lines preferentially grow attached to the flask (4 of 5 cell lines), while the H3.3K27M cells preferentially grow as neurospheres (8 of 9 cell lines) ( Figure 4C). Because differentiation out of the neurosphere state is accompanied by attachment and increased expression of N-cadherin, this morphological trend is consistent with our hypothesis that H3.1K27M cells exist in a more differentiated state than H3.3K27M cells [79] .
We analyzed RNA-seq data from 3 DIPG cell lines to compare the expression of EMT genes We then performed RT-PCR to quantify the expression of FN1 and CDH2 , canonical post-EMT genes which were previously identified as differentially expressed by Mann-Whitney test in the bulk glioma RNA sequencing analysis ( Figure 4E, full-length gel in Supplementary Figure 4).
We attempted to quantify E-cadherin/ CDH1 as it is a canonical pre-EMT marker, but the levels were so low as to be undetectable by RT-PCR in these cell lines (RNA-seq <1.0 log2(TPM+1)).
We Our computational and in vitro observations are consistent with a recent study indicating that H3.1K27M tumor cells are overall more differentiated than H3.3K27M tumor cells [76] . Our results are also consistent with previous studies on EMT in pediatric gliomas which first found a mesenchymal subtype of DIPG and subsequently discovered that H3.1K27M mutant gliomas express genes associated with a more mesenchymal subtype of glioblastoma [73,80] .
Overall, these data suggest that the histone H3K27M mutation is associated with a preferentially early or pre-EMT cell state as compared to nonK27M cells, but that H3.1K27M cells may represent a somewhat later or intermediate-EMT cell state as compared to H3.3K27M cells.

Discussion
H3K27M diffuse midline gliomas are aggressive tumors generally occurring in early childhood in the hindbrain or midline. These tumors have poor prognosis and do not respond to standard chemotherapies for adult gliomas [81] . In contrast to most adult cancers, pediatric cancers, 20 including pediatric gliomas, are thought to occur due to a developmental stall relating to epigenetic dysregulation of normal cellular differentiation pathways [15,40,82] . H3K27M diffuse midline glioma cells lose EZH2-deposited H3K27me3 epigenetic transcriptional control markers, which are known to have crucial roles in cell differentiation and development in the brain [6] . In particular, normal H3K27me3 deposition controls neural cell differentiation through multiple EMT processes [22] . Research has implicated the EMT in pediatric gliomas [80,83,84] , particularly those with a more invasive phenotype. Histone deacetylase inhibitor treatment of in vitro pHGG cells reversed mesenchymal phenotypes, in keeping with a model in which the interplay between H3K27 acetylation and methylation controls EMT-related transcriptional states [38] . We hypothesized that loss of H3K27me3 in H3K27M mutant gliomas may lead to a stall in EMT processes in normal brain development.
In this study, we observed that various canonical EMT-inducing genes are significantly overexpressed in H3K27M mutant pHGGs, compared to nonK27M pHGGs, while many canonical mesenchymal markers are underexpressed in H3K27M pHGGs as compared to the nonK27M tumors. In particular, we noted higher expression of the pre-EMT transcription factor SNAI1 in H3K27M-mutant gliomas. Because SNAI1 relies on PRC2 and H3K27me3 to facilitate EMT through gene expression regulation, this may indicate an arrest in the EMT process. The existence of a hybrid epithelial/mesenchymal phenotype is well-established: the result of a partial EMT is the expression of both epithelial and mesenchymal genes [85] . Studies have shown that a hybrid E/M phenotype may indicate a worse prognosis than mesenchymal-only states in solid tumors [85][86][87] .
We hypothesized that if H3K27M mutation prevents full EMT, neural stem cells harboring H3K27M may be forced to retain a proliferative, stem cell phenotype, eventually leading to tumorigenic development. Accordingly, we observed from extensive literature review that experimental induction of H3K27M-associated gliomas has occurred exclusively in pre-EMT cell types, and that two consecutive EMT-like transcriptional transitions occur early in normal brain development.
Single To conclude, we mined 3 publicly available RNA-seq datasets from pediatric gliomas and cerebral organoids to generate a hypothesis for the gliomagenesis of H3K27M gliomas. We propose that the H3K27M mutation is tumorigenic when the mutational hit occurs in a cell poised to undergo an EMT-like cell state transition, due to the dependence of EMT-associated transcriptional activity on the correct timing of the H3K27me3 mark ( Figure 5). More work is needed to characterize the observed difference in the EMT-associated transcriptional profiles between the H3.1 and H3.3K27M variants. Additionally, a limitation of our study is that it is difficult to isolate the role of the H3K27M mutation from other factors such as histology and tumor location. Future studies will focus on EMT-related transcriptional programs in cellular models with inducible H3K27M expression to further characterize the molecular interplay between the H3K27M mutation and EMT in developing brain cells.
Taken together, our results hold important implications for better understanding the developmental origin and timing of these aggressive and untreatable cancers. Further, the 22 presence of an epigenetically-driven differentiation stall may imply that a pharmacological methylation agent or a pro-differentiation therapy may aid in future treatment of H3K27M mutant tumors [88] .

Potential Implications
Our study holds implications for other diseases, because H3K27M mutation is not exclusive to diffuse midline gliomas. It can also be found in a fraction of pediatric ependymomas and medulloblastomas [89] . Interestingly, ependymomas located in the posterior fossa typically do not harbor the H3K27M mutation, but exhibit the K27M-associated H3K27 hypomethylation phenotype. Thus, the proposed differentiation stall and an associated EMT transcriptional signature as a result of H3K27me3 loss may also apply to these cancers. Beyond the SNAI1-H3K27me3 axis, EMT is also regulated by other epigenetic marks [90] . Given the epigenetically dysfunctional nature of many pediatric cancers [15] , EMT arrest could conceivably play a role in the oncogenesis of these tumors as well.

Glioma bulk RNA sequencing data
Gene expression data from 78 pediatric high grade glioma samples were downloaded from the Treehouse Childhood Cancer Initiative public compendium v8 (Tumor Compendium v8 Public) [36] . All samples in the compendium have been uniformly processed using the UC Santa

Cerebral organoid RNA sequencing data (bulk and single cell)
Gene expression data (TPM) from 6 weekly timepoints of human cerebral organoid growth were downloaded from accession GSE106245 [35] . Organoid weeks 0-5 were converted to gestational weeks 1-6 and duplicate gene measurements were averaged. For Figure 2A, expression of each gene was normalized between 0-1. Single cell RNA sequencing data from weeks 2 and 5 (gestational weeks 3 and 6) cerebral organoids were downloaded from accession GSE106245 [35] . Expression data were filtered to remove genes with expression in fewer than 10% of cells. Cell types were assigned using a list of marker genes (Supplementary Table 3).

Glioma single cell RNA sequencing data
Smart-seq2 RSEM TPM single cell RNA sequencing data from 3,057 glioma cells were downloaded from accession GSE102130 [18] . Data were log2-normalized and filtered to remove genes with expression in fewer than 20% of cells.

DIPG Cell Lines
The

Data Analysis
All statistical comparisons are performed with a two-sided Mann-Whitney test, with measurements taken from distinct samples without assumption of normality, and Benjamini Hochberg multiple testing correction was applied. Single cell and bulk tumor samples were scored for EMT activity using a manually curated set of mesenchymal genes and a previously published scoring method based on aggregate expression of the gene set as compared to a control gene set (Supplementary Table 3) [18,56,94] .

Availability of source code and requirements
Code for figures and data analysis: github.com/lauren-sanders/EMT-paper/ Apache-2.0 License Code for outlier analysis: github.com/UCSC-Treehouse/CARE/ Apache-2.0 License Operating system: Platform independent Programming languages: Python, R

Availability of supporting data and materials
All data used in the manuscript is available at the following websites or accession numbers: Publicly available: 1) bulk glioma RNA-seq: treehousegenomics.soe.ucsc.edu/public-data , 2) cerebral organoid RNA-seq: GSE106245, 3) glioma single-cell RNAseq: GSE102130. Data available with permission for the glioma cell line RNA-seq data dbGap phs000900.v1.p1. All supporting data and materials are available in the GigaScience GigaDB database [95] .

Ethics Statement
The protocols for the PNOC-003 trial, Dr. Michelle Monje's studies, Dr. Mariella Filbin's studies, The Cancer Genome Atlas, the Children's Brain Tumor Tissue Consortium, the International Cancer Genome Consortium, and the University of Michigan Clinical Sequencing Exploratory Research have been previously described [18,[37][38][39][40][41][42] . The UCSC Treehouse Childhood Cancer Initiative protocol was approved by the UCSC Institutional Review Board (No. HS2648) [34] .  Click here to access/download; Figure;Figure4.pdf