Surveying Brain Tumor Heterogeneity by Single-Cell RNA Sequencing of Multi-sector Biopsies

Brain tumors are among the most challenging human tumors for which the mechanisms driving progression and heterogeneity remain poorly understood. We combined single-cell RNA-seq with multisector biopsies to sample and analyze single-cell expression profiles of gliomas from 13 Chinese patients. After classifying individual cells, we generated a spatial and temporal landscape of glioma that revealed the patterns of invasion between the different sub-regions of gliomas. We also used single-cell inferred CNVs and pseudotime trajectories to inform on the crucial branches that dominate tumor progression. The dynamic cell components of the multi-region biopsy analysis allowed us to spatially deconvolute with unprecedented accuracy the transcriptomic features of the core and those of the periphery of glioma at single cell level. Through this rich and geographically detailed dataset, we were also able to characterize and construct the chemokine and chemokine receptor interactions that exist among different tumor and non-tumor cells. This study provides the first spatial-level analysis of the cellular states that characterize human gliomas. It also presents an initial molecular map of the cross-talks between glioma cells and the surrounding microenvironment with single cell resolution.


INTRODUCTION
Gliomas are among the most lethal forms of human tumors as they are characterized by aggressive behaviors and resistance to multiple therapies. The development of genetic mutations in malignant cells and the complex interactions between tumor and non-tumor cells in the glioma micro-environment foster intratumoral heterogeneity, thus contributing to the therapeutic failures and the generally poor prognosis of gliomas. A major unmet challenge in neuro-oncology is our ability to understand glioma heterogeneity and progression in gliomas and how they influence therapeutic resistance [1].
Several studies reported that malignant gliomas are characterized by a formidable degree of intratumoral heterogeneity. For example, mosaic amplification of receptor tyrosine kinase genes (EGFR, MET, PDGFRA) is known to represent a classical hallmark of genetic heterogeneity affecting neighboring tumor cells within bulk glioma samples [2].
Furthermore, single cell-derived clones of glioma cells have been identified and shown to exhibit divergent proliferation and differentiation abilities [3]. Finally, the multi-region genetic analysis of gliomas with single nucleotide polymorphism (SNP) array or whole exome sequencing revealed that divergent glioma subtypes can be recovered from different geographical regions, which together give rise to a branched pattern of progression [4,5].
As single-cell RNA-sequencing became a feasible approach to investigate human tumors, glioma heterogeneity has started to be explored with single-cell resolution [6][7][8][9][10][11]. However, most of the studies that have previously reported single cell RNA-sequencing of gliomas did not include the analysis of either tumor cells and the tumor microenvironment (TME) from multiple spatially annotated regions of gliomas, thus limiting our understanding of patterns of spatial evolution and brain infiltration, the latter being one the most critical hallmark of aggressiveness and progression of malignant glioma.
To delineate the glioma single-cell heterogeneity in both spatial and temporal resolution, we performed a glioma single-cell analysis from multisector biopsies informed by precision navigation surgery. Cell type components of each tumor fragments and temporal relationship of cells in each individual patient were unbiasedly identified. Our analysis did not use approaches aimed at selecting specific tumor or non-tumor cell populations. Therefore, we report the first single cell based comprehensive spatial analysis of the geographical structure of glioma and the dynamic progression of the interactions of tumor cells with individual non-tumor cells from multiple tumor locations.

Precision Navigation Based Multi-sector Biopsies and Single-Cell RNAseq of Glioma Cells
Tumor sections with potential representative divergent properties were marked in a presurgical 3D enhanced magnetic resonance imaging (MRI) model and tumor tissues were precisely collected during surgery by navigation sampling. Samples were quickly dissociated and subjected to single-cell RNA sequencing (scRNA-seq) library preparation [12,13] (Fig. 1A). Overall, 7,928 single-cell transcriptomes were generated, and 6,148 passed stringent quality filtering steps after alignment and reads counting (  Table 1 and 2).
Since regional gene expression status could be affected by copy number variation (CNV) dose effect, we adopted previously reported methods to predict large fragment copy number status with single-cell gene expression matrix [6,8]. The generated copy number matrix clustered into 5 CNV subgroups (Fig. 1D). Malignant cells were identified based on this classification. Subgroup G1 and G2 included glioblastoma multiforme (GBM) cells sharing a chr7 amp /chr10 del driven transcriptome whereas a unique CNV pattern was apparent in GBM patient GS3 (subgroup G5). The G4 subgroup included low-grade glioma (LGG) cells with chr1p/chr19q codeletion-driven signatures. The last subgroup (G3) was composed of non-malignant cells without obvious CNVs.

Optimized t-SNE Map and Clustering Identifies 25 Cell Type Clusters
As reported by previous single cell RNA-seq studies of human glioma, the malignant cells from different patients showed a fragmented relationship in clustering analysis, mainly because of the dose effect of gene expression caused by diverse CNVs [6,8]. Therefore, we argued that removal of CNV variances would optimize the principal component analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) analysis.
To obtain the CNV status of each tumor sample, we performed low depth whole genome sequencing (WGS) on bulk biopsies. Based on the WGS data, genes with interpatient large copy number changes were removed from the clustering analysis ( Fig Supplementary Fig. S6A, B and D), the fragmented cell distribution was resolved to a map with more concentrated points and clearer edges (Fig. 2D). This distribution was confirmed by independent binary regulon activity clustering with SCENIC ( Fig. 2E and Supplementary Fig. S6E and S7) [14].
Considering the cell source and CNV status with a global t-SNE map, the non-tumor cells and the glioma cells lacking extensive genomic rearrangements (such as those derived from LGG) showed a converged distribution, indicating that our data were not influenced by batch effects. They also indicated that the LGG-derived tumor cells were more uniform among different patients ( Fig. 2C and Supplementary Fig. S6C). Cell cluster-specific genes coalesced the 25 clusters into 4 major groups (Supplementary Table 2): CNV -MOG + normal glial cells [15], KRT5 + lung cancer (LC) metastasis cells [16], CD45 (PTPRC) + immune cells and CNV + malignant tumor cells. In malignant glioma cells, OLIG2/DLL3 and AQP4/CLU distinguish tumor cells exhibiting transcriptomic features of oligodendrocytes and astrocytes, respectively [17][18][19]. Cells positive to the proliferative marker MKI67 + were present among both glioma cell phenotypes. Moreover, PTPRZ1 and SOX2 were significantly overexpressed in glioma cells and could therefore be considered as useful markers to estimate the tumor cell purity in bulk glioma tissues. Interestingly, clusters 10 and 12 specifically overexpressed cilium-related markers (HYDIN, FOXJ1, etc.) [20,21].
The tumor cells included in these clusters exhibited astrocyte-like features and expressed low levels of PTPRZ1 (Fig. 2F, G and Supplementary Fig. S8). Clusters 10 and 12 mostly derived from patient GS13, indicating that this patient developed a rare form of ciliated glioma for which single-cell studies have not previously been reported.

Classification of Single Glioma Cells
Normal diploid cells in the tumor samples were mainly cells within the tumor microenvironment (TME). To conduct an unbiased investigation of the TME in glioma, we did not include flow cytometry-based selection in our sampling procedure. Normal oligodendrocytes, microglia and macrophages accounted for approximately half of these cells, with relative homogeneity among patients, such a conclusion could not have emerged from CD45 selected samples, as this is the prevailing sampling method used in previous scRNA-seq studies. We also captured cells showing specialized activities, such as interferon-induced oligodendrocytes and polarized microglia/macrophages (Fig. 3A).
Next, we classified malignant cells according to the The Cancer Genome Atlas (TCGA) GBM classification model [19]. Grade II tumor cells with chromosome 1p/19q codeletion had a strong proneural (PN) signature. Conversely, the individual tumor cells recovered from GBM were distributed among each of the three transcriptomic subgroups (PN, classical (CL) or mesenchymal (MES)). We did not identify glioma cells that could be assigned to the neural (NL) subtype but we detected cells expressing cilium-related signatures that were distinct from those associated with known TCGA subgroups ( Fig. 3B and C).

CNV Accumulation and Tumor Progression in Patient GS1
Since CNVs usually accumulate during tumor progression, the CNV profile of individual tumor cells has emerged as an accurate inference to trace tumor progression [2][3][4]8,22,23]. We used arm-level scRNA-seq derived CNVs to trace tumor cell clones, which were confirmed by bulk WGS (Fig. 1D, Supplementary Fig. S3A   The peritumoral biopsies contained a higher fraction of non-tumor cells than glioma cells. As scRNA-seq can be used to infer the underlying CNV, we interrogated our dataset to uncover the spatial dynamics of CNV changes in glioma cells in order to reconstruct the trajectory of tumor initiation and progression. Five CNV-based clusters (CN-1 to  were found in patient GS1 (Fig. 4B). CN-1 and CN-2 contained diploid non-tumor cells, but were divided into two clusters because their expression profiles differed from those of normal glial and immune cells. CN-3 to CN-5 clusters contained a series of aneuploid subclones that shared the chr7/10/19/20 CNVs that are recurrent genomic alterations in GBM. As they progressed, these subclones first accumulated CNVs on chr2 and later on They are also consistent with the notion that an increased definition of the multi-sector sampling of glioma might reveal finer and more accurate genetic trajectories of glioma evolution, as shown in recent hepato-carcinoma studies [24].
To unravel the phenotypic changes that mark spatial evolution of glioma we performed differential gene expression analysis between the different clones. As cells classified within

Patient GS13
Patient GS13 was a male with an IDH-wild type GBM characterized by high expression of genes associated with motile cilium activities (e.g., FOXJ1, FAM183A, HYDIN, DNALI1, etc.) (Supplementary Fig. S8 and S10). We noted that only about 5% of TCGA GBM patients exhibited high expression of cilium-related genes. Therefore, the particular type of GBM analyzed from patient GS13 belongs to a rare type of glioma that was not previously investigated at single cell level. However, cilium-specific gene expression was not associated with a specific pattern of survival ( Supplementary Fig. S11). From this patient, we acquired 978 cells from 3 core and 2 peritumoral sites, and found that the cells at core locations consisted of PN and cilium-positive cells (Fig. 5A). CNVs were also evident in this case. The clonal CNV reconstruction revealed 2 CNV clones (CN-2 and CN-3) and the transition from CN-2 to CN-3 was marked by accumulation of chr1 amp and chr19 del .
Furthermore, as cells transitioned to CN-3, the expression profile changed from global cluster 6 (PN) to global clusters 10 and 12 (cilium) (Fig. 5B-E). We also applied hierarchical clustering of CN-2 and CN-3 malignant cells and identified differentially expressed genes between these two groups. Interestingly, after cells transitioned to CN-3, they formed 2 branches with distinct gene expression profiles ( Fig. 5F and H, Supplementary Table 2).
The functional gene set enrichment analysis showed that the transition from CN-2 to CN-3 was associated with decreased expression of group 1 genes, which were implicated in glial cell differentiation and adhesion. Conversely, the expression of genes in groups 2 and 3 (implicated in cell cycle/DNA replication and cilium regulation, respectively) increased (  5H). From previous studies, a protumorigenic activity of TP73 has recently emerged, and especially in the context of the N-terminal truncated TP73 isoform [26]. Furthermore, TP73 overexpression has been reported in several tumor types, including breast cancer, melanoma, prostate cancer and neuroblastoma, and was shown to induce metastasis, chemo-resistance and other hallmarks of tumor progression that confer poor clinical outcome [27]. By RT-qPCR detection, both full length and ΔN-TP73 existed in GS13 tumor ( Supplementary Fig. S14).
In conclusion, the expression of a motile cilium signature is not an unusual event in GBM, but little is known about this particular GBM phenotype. The branched model revealed that glioma cells may undergo through different spatial destinations despite sharing similar CNV profiles.

Characterization of the Tumor Micro Environment (TME) in Glioma
In the glioma TME, tumor associated macrophages (TAM) communicate through ligand/receptor cross-talks with tumor and non-tumor cells to promote tumor aggressiveness [28,29]. Since our single-cell platform exhibit high sensitivity with higher Microglia and M2a/c macrophages, which expressed the CX3CR1 receptor, coexisted with glioma cells that expressed the CX3CL1 ligand. Lymphocyte infiltrates expressed CXCR3/6 and CCR6 but the CXCR3 ligands CXCL9/10/11 were rarely detected in glioma samples with the exception of interferon activated microglia. These ligand-receptor pairs were previously reported to recruit tumor-infiltrating lymphocytes and inhibits tumor growth [30].
CXCR6 ligand CXCL16 which existed in both transmembrane and soluble form [31], was highly expressed in TAM cells, and mildly expressed in the malignant cells. Lymphocytes also expressed CXCR4 like TAM cells, whereas the ligands were expressed in microglia cells. Their binding was reported to mediate glioma chemotaxis and regulate cell survival through activating AKT related pathways [32].
CCL5/CCL8 and CCR5 was another chemokine axis between lymphocytes and TAM.
Besides inflammatory chemoattractant functions, it can also mediate NK cell activation, which promotes tumor genesis and metastasis [33]. Another receptor of CCL8 is CCR2, which is expressed in monocytes, was highly expressed in the TAM cells and MES cells. In our spatial cell distribution data, macrophages were the most abundant non-tumor cell in core biopsies, but were replaced by larger fractions of microglial cells in biopsies from the tumor margins (Fig. 6C). Thus, a switch from macrophage to microglia infiltration from the glioma core to the periphery was a general event that likely marked the microenvironmental changes. The microenvironment changes are likely to be dictated by different requirements of glioma cells as they migrate from core tumor regions to the invading front at the tumor periphery. This pattern was also recapitulated in the IVY GAP dataset [37] ( Supplementary   Fig. S17).
Finally, a lineage trajectory was built with TAM cells (Fig. 6D-F

DISCUSSION
As malignant gliomas are characterized by high degree of intratumoral heterogeneity, single-cell genomic technologies have rapidly emerged as a crucial approach to disentangle glioma heterogeneity. However, most of the previous single cell RNA-seq glioma studies relied on the analysis of a single biopsy from each tumor specimen, thus lacking information on the special heterogeneity of the analyzed tumor lesions [6][7][8][9].
Although one study reported the multiregional analysis of glioma with single-cell sequencing, the number of cells analyzed in that study was very limited and could not provide a comprehensive picture of the geographical structure of glioma at the single cell level [10].
Here, we presented a comprehensive single-cell landscape of multiple subtypes of gliomas, each of which was analyzed by multi-region samplings, provided the first spatial-level analysis of the cellular states that characterize human gliomas. We designed multisector biopsies with 3D enhanced MRI model, and collected them during surgery by navigation sampling. For each biopsy, we generated and functionally annotated transcriptomes of hundreds of single tumor and non-tumor cells collected from multiple core and periphery tumor locations. Together, they provide a coherent map of the dynamic states and interactions between the different cell types that integrates the key features of glioma homeostasis at each tumor location. We found that both the number and the transcriptomic subtypes assigned to individual glioma cells frequently change dramatically between biopsies collected from different locations, even when they originated from neighboring glioma regions. We also made the unexpected observation that whereas core biopsies contained a high number of macrophages, this configuration of the core TME was replaced by a comparatively higher number of resident microglia at the glioma periphery, which represents the invading front of the tumor towards the normal brain.
As malignant glioma cells share high proliferation capacity, they readily accumulate multiple types of genetic alterations that trigger an increasing degree of aneuploidy with constant adaptation to the demands created by the growing tumor mass in relation to the TME [38]. An important, novel finding contributed by our work is the deconvolution of the cross-talks between tumor and non-tumor cells in the glioma TME. We found that the active communications between the different cell types are primarily implemented by multiple combinations of chemokine ligands with their corresponding receptors that we have characterized within different regions of individual tumors and among the different types of glioma we studied [39]. In particular, we found that the communication between non-tumor cells was dominated by the prominent role of the CXCL family of chemokines and related receptors, which was especially apparent in M2b macrophages and neutrophils. We followed up on this finding and determined the enrichment score of M2b/Neutrophil cells in bulk gliomas to evaluate the consequences of the infiltration of these cell types for clinical outcome of glioma patients. The analysis, which was performed with TCGA-derived glioma RNA-seq datasets, was able to distinguish patients with divergent clinical outcome based on the predicted level of infiltration of the two cell types.
In conclusion, we used the single cell RNA-seq technology to generate an extensive complete map of the geographical molecular structure of gliomas. The trajectories of reciprocal genomic and functional changes that accompany glioma cells as they move within the tridimensional space of the tumor mass, combined with the deconvolution of the cross-talks between different cells in the glioma TME, paint an unprecedented scenario that elucidates the intratumoral heterogeneity of this lethal tumor type.

COMPETING INTERESTS
The authors declare no competing interests.

Tumor samples and multisector biopsy
Tumor samples were obtained from consenting patients after they signed an informed

Tumor tissue dissection
Immediately after sample collection, the tumors were washed with phosphate-buffered saline (PBS), fully dissociated and subjected to enzymatic dissociation with trypsin-EDTA solution. Following filtrating through a cell strainer and red blood cell lysis, the cell suspensions were used for single-cell isolation.

Single-cell RNA-seq library construction
Single-cell RNA-seq libraries were constructed following the STRT-seq [1] protocol with minor modifications that were previously described [2,3].
Briefly, we used a mouth pipette to pick single cells into cell lysis buffer, containing

Single-cell RNA-seq data processing
Raw sequencing reads of single cell were obtained from pooled library data by cell-specific barcodes at the beginning of read 2, followed by trimming of the read 1 TSO sequence with an in-house script. More quality control procedures were applied to the remaining sequence of read 1; sequences containing poly-A tails, sequencing adapters, or low quality bases (N bases > 10%) were removed. Clean data were aligned to the hg38 human reference genome with hisat2 (version 2.0.5) [4]. Uniquely mapped reads on each gene were calculated with htseq-count, and PCR redundant reads were eliminated with UMI sequences in each paired read, generating a UMI count matrix of each gene in each cell [5].
In these 14 patients, we sequenced a total of 7,928 single cells, but only 6,148 cells with more than 10,000 UMIs and 2,000 detected genes were kept for downstream analysis. The median number of UMIs in the 6,148 cells was 97,294, and the median number of detected genes was 4,470.
Saturation analysis was performed to ensure sufficient data to make sure we have generated enough data to detect most of transcript in our single-cell library. For each single cell, down sampling was done by randomly extracting 10 to 90% of the original data, followed by calculating the detected gene number within this subset of sequencing data.
The box plot in Supplementary Fig. S1A indicates that more than 90% of genes could be detected using only 70% of the data; thus, we had sufficient data for the subsequent analyses.
For the differential expression analysis, TPM (transcripts per million) values were calculated from the UMI matrix. Considering that most of the single cells did not have more than 1 million UMIs, we used a previously published method for normalization to avoid counting each transcript several times: we calculated the log2(TPM/10+1) instead of the traditional log2(TPM+1).

CNV identification with whole genome DNA-seq and single-cell RNA-seq
To identify large fragment copy number variations (CNVs), we performed low-throughput whole genome sequencing with bulk cells from 42 sampling points, achieving an average sequencing depth of 0.66X. In each sample, we calculated the average depth of every 1-Mb window in all the chromosomes. To scale the average depth to the diploid level, we collected all the average window depth values and generated a histogram with 1,000 bins, in which diploid depth corresponded to the peak. Thus, all the raw average window depth data could be scaled based on a diploid baseline of 2, for a scaled average window depth of a normal diploid chromosome region equal to 2. This method could be used in most of our samples; however, as there was strong chromosome instability in patients S3 and S15, we had to define the ratio between the raw average window depth and the diploid average window depth based on observations of the ensemble copy number status, especially for the sex chromosomes ( Supplementary Fig. S3B).
For single cells, we adapted a previously described method [6,7] to estimate large fragment copy number alterations with scRNA-seq data. Since limited genes could be detected within single-cell data and we just wanted to identify CNV events at large fragment resolution, only genes with an average expression value avg(log2(TPM+1)) > 1.5 were kept for analysis. As CNV events could yield a similar pattern of changes in expression level of these genes, the average expression level in a moving window containing 50 genes in both the upstream and downstream directions was calculated to estimate the local CNV value, as described in a previous paper, followed by scaling to the centered CNV matrix.
Because there was a significant difference between normal and malignant glial cells, we performed t-SNE analysis and identified a group of cells that not only composed a major proportion of cells in nontumor biopsy sites but also expressed several normal oligodendrocyte marker genes (MOG, MAG, etc.). These cells were identified as normal oligodendrocytes, and they should be free of CNVs. Therefore, we used these cells as the control group and calculated the average CNV values of the control group with a method similar to that described above, followed by the centering procedure. The final CNV value matrix of the selected genes was calculated by subtracting the centered control CNV values from the centered CNV matrix of each corresponding sliding window ( Fig. 1D and Supplementary Fig. S3C).
To generate a large fragment CNV matrix for hierarchical clustering, the average CNV value of a fixed chromosome region size with the 10 Mb bin was calculated in each cell. In each patient, the CNV matrix subset containing major large-scale CNVs was used to identify clusters, and the chromosomes we used for clustering in different patients are described in Supplementary Fig. S3B.

Cell clustering with PCA and t-SNE analysis in Seurat
The R package Seurat (version 1.4) was used to cluster the UMI count matrix of the 6,148 cells [8]. First, we selected highly variable genes with the MeanVarPlot function; genes with an average expression value between 0.25 and 4 and a dispersion value above 0.8 were selected for principal component analysis (PCA) and t-SNE analysis. Next, we performed linear dimensional reduction with PCA to convert the large number of variable genes into a set of representative variables, or principal components (PCs). The most representative PCs were retained for downstream t-SNE analysis by the jackstraw resampling test with 1,000 replicates. We used 30 PCs to find clusters and run the t-SNE analysis; the resolution parameter in the find clusters procedure was set to 1.5 by multiple trial and observation ( Supplementary Fig. S5).
Within the clustered 2-dimensional t-SNE map, we observed a fragmented pattern in which cells from different patients were separated from others, despite expressing similar biomarkers (Fig. 2B). This finding was caused by the divergent CNV pattern in glioma patients. Thus, we used CNV information from bulk DNA sequencing, selected chromosome regions that contained fewer copy number alterations ( Fig. 2A), and removed variable genes outside these regions. We reperformed the PCA and t-SNE analysis with this variable gene subset; 28 PC were used to identify clusters, and we generated the final global t-SNE map.
To validate the feasibility of our strategy that removes variable genes from regions with numerous CNVs, firstly, we checked the gene expression data of those removed genes, their pattern are following the optimized 25 cell clusters. We also used SCENIC to reanalyze our data [9]. SCENIC utilizes UMI counts to construct gene-regulatory networks and thus identify cell states. First, potential transcription factor (TF) targets were identified with gene co-expression information, followed by enrichment analysis of these TF motifs, thereby generating a subset of pruned modules called regulons. Then, regulons were scored by calculating the area under the recovery curve (AUC) for each single cell, and an AUC score matrix was generated. By inspecting the distribution of the AUC score of each regulon, the active status of each regulon was determined based on a cutoff value, and the original continuous AUC score matrix was transformed to a binary regulon activity matrix, representing "on" and "off" with the binary values "1" and "0". With this matrix, we subsequently performed the t-SNE analysis and compared the results with those of our global t-SNE clusters (Fig. 2E). We used adjust rand index and adjusted mutual info score to evaluate the difference between SCENIC based clusters and our global Seurat clusters, the scores are 0.58 and 0.66, respectively. This is not a perfect score, however, after we fitted several important marker gene expression values into the SCENIC t-SNE map, we found that the SCENIC binary regulon matrix has its own characteristics.

Identification of differentially expressed genes and functional enrichment analysis
To understand the cell type and functional characteristics of each cell cluster, the Seurat function FindAllMarkers was used to perform a differential expression analysis with a likelihood-ratio test and argument: only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25.
AUC value were also calculated with roc test. For each cluster, AUC value of each gene were sorted in descending order. Genes with a lower adjusted p-value or higher AUC value were highly expressed in this cluster and represent the function and morphological characteristics of the corresponding cluster.
In the patient sub-CNV analysis, differentially expressed (DE) genes were identified among 2 or 3 groups of cells. For these cells, we use a t-test between each of 2 groups of cells; genes that met the condition of log2FC > 1 and P-value < 0.01 were kept as significant DE genes.
Marker genes of clusters (AUC>0.7) and significant DE genes between specified cell groups were used to deduce function. We performed functional enrichment analysis with Metascape (http://metascape.org) and the R package 'clusterProfiler' [10].

Single-cell trajectory analysis
During tumor development, malignant cells transition from one state to another, with differential expression of various genes. Thus, RNA-seq data for a group of related single cells could be used to deduce the relationships among these cells and construct a robust track of their changes in pseudotime. The R package Monocle2 was applied to estimate the developmental pseudotime of all malignant cells in an individual patient [11]. For each patient, we followed the standard monocle document and identified DE genes that defined progress towards the UMI count matrix of malignant cells. Next, the function reduceDimension with the DDRTree method was applied to reduce the dimensions, and the cells were then ordered along the trajectory using the orderCells function. Since Monocle produces a linear trajectory, and the direction of the pseudotime trajectory was unknown, we manually set the pseudotime direction based on other information such as CNV changes and the expression levels of progenitor or malignant marker genes.

Gene set enrichment score calculation
We adopted the method described in published work [12] to calculate the enrichment score In our analysis, we collected gene lists related to "HALLMARK EPITHELIAL MESENCHYMAL TRANSITINSITION" and "HARRIS BRAIN CANCER PROGENITORS" from the Molecular Signatures Database [13][14][15].

M2b macrophage/neutrophil marker identification and survival analysis with TCGA glioma datasets
M2b macrophages and neutrophils are important factors that reflect the status and degree of inflammation and malignancy; thus, estimating the cellular constituents using bulk tumor expression data could be valuable for clinical curation. In our PCA analysis of nonmalignant cells, we found that PC2 genes perfectly screened out M2b macrophages and neutrophils.
Therefore, we selected the top 200 PC2 genes, and hierarchical clustering helped us identify 78 marker genes that were highly expressed in M2b macrophages and neutrophils. Therefore, we removed marker genes not generally correlated by calculating a Pearson correlation matrix of these 78 genes with both the GBM and LGG matrices, and genes with an average correlation value above 0.2 were retained. Thirty-eight genes were used to calculate the M2bNeu score ( Supplementary Fig. S16C).

Isolation and culturation of primary glioma cells
Tumor samples were washed and dissociated enzymatically as previously described [16].