Brain tumour genetic network signatures of survival

Abstract Tumour heterogeneity is increasingly recognized as a major obstacle to therapeutic success across neuro-oncology. Gliomas are characterized by distinct combinations of genetic and epigenetic alterations, resulting in complex interactions across multiple molecular pathways. Predicting disease evolution and prescribing individually optimal treatment requires statistical models complex enough to capture the intricate (epi)genetic structure underpinning oncogenesis. Here, we formalize this task as the inference of distinct patterns of connectivity within hierarchical latent representations of genetic networks. Evaluating multi-institutional clinical, genetic and outcome data from 4023 glioma patients over 14 years, across 12 countries, we employ Bayesian generative stochastic block modelling to reveal a hierarchical network structure of tumour genetics spanning molecularly confirmed glioblastoma, IDH-wildtype; oligodendroglioma, IDH-mutant and 1p/19q codeleted; and astrocytoma, IDH-mutant. Our findings illuminate the complex dependence between features across the genetic landscape of brain tumours and show that generative network models reveal distinct signatures of survival with better prognostic fidelity than current gold standard diagnostic categories.


Introduction
Brain tumours remain remarkably resistant to treatment, and impose a socioeconomic burden second amongst cancers only to breast and lung 1 . Fewer than half of people with the commonest malignant type (glioblastoma, IDH-wildtype) survive a year, a prognosis unchanged over the past three decades in the face of an increase in incidence by more than a sixth 2,3 . These striking numbers suggest fundamental obstacles to treatment success that may signal the need for a radical change in our approach.
One of the greatest obstacles for innovation across oncology is inter- and intra-tumour heterogeneity 4-7 : the presence of richly structured diversity, either between different tumours or within different parts of the same one. Brain tumours typically exhibit numerous genetic mutations, spanning several cellular pathways, that open multiple avenues to oncogenesis no single intervention could conceivably block. It is no surprise that patients with higher levels of tumour heterogeneity, ranging across genetic 4,8,9 , epigenetic, cellular and imaging characteristics 10,11 , exhibit both poorer clinical outcomes and weaker responses to therapy 5,12-14 .
A pre-requisite to overcoming heterogeneity is obtaining a structured description, as comprehensive as available data allow, of the parameter space that encloses it. Such a description is difficult to derive because tumour heterogeneity is distributed across many potentially interacting features 4-6 , inhabiting a large, high-dimensional parameter space. It requires highly expressive mathematical models, capable of capturing multiple, richly

Data
The demographic, procedural, histopathological, tumour genetic, and diagnostic labels of 9518 neuro-oncology patients referred to our national centre were recorded prospectively from 2006 to 2020 (Figure 1, Supplementary Figure 1). The distribution of countries of origin was, in descending order, the UK (n=9149), Colombia (n=170), Sweden (n=157), Latvia (n=14), Hungary were stratified into low (1-7 copies), medium (8-15 copies), and high (16 copies or more), in accordance with standard practice at our centre. Sampling frequency, including missing data, was captured within the stochastic block model itself, exploiting its generative nature.
Our focus is on quantifying the potential value of graph-theoretic analysis of data routinely acquired during standard clinical care. Such an approach lowers the barriers to real-world application, for no change to standard investigational pathways is required, and enables the derivation of insights from historical data. Neither additional time nor economic cost is incurred, to patient or healthcare provider: the only necessary resource is compute. We therefore included all genetic features acquired as part of routine neuro-oncological care in the molecular neuropathology panel work-up at our centre. This panel is described online 28 , and aligns with established clinical practice, providing both classification of tumours within the current WHO classification system 27 and prognostic or prescriptive utility, such as MGMT methylation status for temozolomide use 29 . Since our cohort dates back to 2006, it does not include CDKN2A/B testing, first recommended in 2018 30 , and even now considered optional by many 27 . Our centre did not routinely test for the supplementary IDH-mutant astrocytoma diagnostic marker TP53, favouring ATRX instead as recommended by the recent WHO classification 27 . Histological grade, determined by microscopic rather than molecular features, was not available. We did not perform any prior feature selection, for we are interested in the interactions between features our graph technique is specifically designed to illuminate, instead including all data available from our neuro-oncology service. Our modelling approach can cope with high-dimensional data, so no selection is required on methodological grounds.
Our analysis focussed on four major categories of glioma: glioblastoma, IDH-wildtype; astrocytoma, IDH-mutant; oligodendroglioma, IDH-mutant and 1p/19q codeleted; and a final group titled 'other glioma' that combined rarer entities. The rationale for the latter group was that we were wary of drawing inference from much smaller samples of rarer lesions, in contrast to the remaining diagnoses with significantly greater sample sizes, which would otherwise render the performance inequitable 31 (Table 1). Survival data were available for 1323 patients, constrained only by the mechanism of referral. For statistical modelling, we discarded samples where any graph community, diagnostic or genetic variable received fewer than 5 patients, and clamped days of survival at the 1st and 99th percentiles to attenuate the influence of extreme outliers. A full cohort breakdown, including data missingness where applicable, is detailed in Table 1. A study flow chart is provided as Supplementary Figure 1.
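The clamping of survival times described above can be illustrated with a minimal sketch. The linear-interpolation percentile rule, the function names and the toy survival times are assumptions made purely for illustration; the percentile convention actually used is not stated here.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def clamp_survival(days, low_q=1.0, high_q=99.0):
    """Clamp each survival time to the [low_q, high_q] percentile interval."""
    ordered = sorted(days)
    lo = percentile(ordered, low_q)
    hi = percentile(ordered, high_q)
    return [min(max(d, lo), hi) for d in days]


# Hypothetical survival times in days (not study data), with one extreme outlier.
survival_days = list(range(1, 101)) + [10_000]
clamped = clamp_survival(survival_days)
```

Clamping, unlike discarding, retains every patient while limiting the leverage of extreme values on the fitted model.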

Ethical approval
The study was approved by the local ethics committee at University College London. We received ethical permission for the consentless analysis of irrevocably anonymized data collected during routine clinical care.

Analytic compliance
All analyses were performed and reported in accordance with international TRIPOD and PROBAST-AI guidelines 32 .

Demographic analysis
One-way analysis of variance (ANOVA) with Tukey's procedure was used to establish the relation between patient age and diagnosis, and multivariate logistic regression for patient sex and diagnosis. Our criterion for statistical significance was a family-wise error rate (FWER) adjusted p<0.05, and all p values reported are corrected accordingly. Model coefficients were converted into odds or hazard ratios where appropriate.
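As a minimal sketch of the first step of this analysis, the one-way ANOVA F statistic can be computed from group means and sums of squares. The function name and the toy ages are hypothetical; Tukey's post hoc procedure, the p value and the FWER adjustment are deliberately not shown.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ages grouped by diagnosis (toy values, not study data).
ages_by_diagnosis = [[60, 62, 64], [40, 42, 44], [30, 32, 34]]
f_stat, df_b, df_w = one_way_anova_f(ages_by_diagnosis)
```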

Network genetic signature analysis
A network representation of tumour genetics can be formulated in two ways: with respect to genetic features, yielding signatures of characteristic patterns of genetic lesion cooccurrence, or with respect to patients, yielding distinct subpopulations exhibiting similar genetic signatures. The former illuminates the mechanisms of oncogenesis, the latter their heterogeneous manifestation across the population.

Stochastic block modelling of tumour genetic inter-relations
The relations between genetic features may be naturally formulated in terms of Bayes' rule 33,34 :
$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$
where $P(X)$ and $P(Y)$ refer to the probabilities of the states of given genetic features $X$ and $Y$ respectively. $P(Y \mid X)$ is the conditional probability of $Y$ given $X$, and $P(X \mid Y)$ is the posterior conditional probability of $X$ given $Y$. In general $P(X \mid Y) \neq P(Y \mid X)$ so, unlike merely correlative indices, conditional probabilities enable us to construct a directed probabilistic graph of the pairwise relations between genetic features. The number of edges is given as the number of nodes, $N$, choose 2, multiplied by two to cover bidirectional conditional probabilities, $2\binom{N}{2}$. We reviewed the weighted edge histogram of the graph according to conditional probability $P(X \mid Y)$, comparing it to an arguably simpler metric approximating covariance, the probability of intersection $P(X \cap Y)$. Conditional probability edges showed far greater weight variance (range 0.00 to 1.00, ± standard deviation (SD) 0.25) compared to intersection weights (range 0.00 to 0.51, ± SD 0.06). A reasonable assumption drawn from this process was that directed conditional probability weights between genetic features may carry richer information than simpler intersection (or covariance-based) metrics, and these weights were therefore adopted for subsequent mathematical modelling between genes. In complement, a patient linkage graph was modelled with multivariately weighted edges by binomial linkage of individual tumour genetic characteristics (schematic for both approaches shown in Figure 2).
We characterised simple centrality measures (eigenvector, hub, authority, betweenness, and page rank), weighted by the conditional probability assigned to the directed edges. We then statistically compared these centrality metrics between genetic domains with one-way analysis of variance (ANOVA). A stochastic block model is a generative model of the community structure of a graph composed of $N$ nodes, divided into $B$ blocks with $e_{rs}$ edges between blocks $r$ and $s$ 35 . The model can be framed hierarchically, where edge counts $e_{rs}$ form block multigraphs with nodes corresponding to individual blocks and edge counts arising as edge multiplicities between block pairs, including self-loops. We seek to infer the most plausible partition $\{b_i\}$ of the nodes, where $\{b_i\} \in [1, B]^N$ identifies the block membership $b_i$ of node $i$ in observed network $G$, with maximisation of the posterior likelihood $P(G \mid \{b_i\})$. The result is a hierarchically organised community structure of nodes assigned into blocks that yields the most compact representation of the graph, as indexed by its minimum description length 36 , $\Sigma$. The general approach is described in further detail elsewhere 35 .
We can use a layered formulation 26 of the model to distinguish between two potentially conflicting effects: associations between features driven by clinically-directed sampling vs by biologically-driven conditional probability. Key here is formal comparison between models that encode these effects separately, within their own layers, vs those where the distinction is not respected. In a Bayesian setting 26 , the procedure for model selection amounts to finding the model parameters, $\{\theta\}$, that maximise the posterior likelihood under each hypothesis. In this instance, $P(\{\theta\} \mid \{G_l\}, \mathcal{H})$ is the posterior according to a given hypothesis $\mathcal{H}$, i.e., the true or null formulation. $P(\mathcal{H})$ is then the prior belief for hypothesis $\mathcal{H}$, and $\Delta\Sigma = \Sigma_0 - \Sigma_1$ the difference in the model description length for these hypotheses. The description length of the true and null models can thus be formally compared. Where the description length of the true model (i.e., where sampling co-occurrence and conditional probability weights are correctly segregated by layer properties) is less than that of the null, then the model encoding sampling and conditional probability separately is preferred. Conversely, where the description length of the null is smaller, the layered formulation is shown to be superfluous, indicating the simpler, non-layered formulation should instead be preferred 21,26 .
Next, we interrogated the structure of the graph with a nonparametric Bayesian stochastic block modelling approach. The result is a hierarchically organised community structure of nodes assigned into blocks that yields the most compact representation of the graph, as indexed by its minimum description length 36 , $\Sigma$. Stochastic block models are described in extensive detail elsewhere 25,35,37 ; their utility in neuroscience has been demonstrated and validated by multiple groups 21,22,25,37,38 . An evaluation of 275 empirical networks spanning a range of domains, including social, transport, information technological, and biological (including brain connectome data) has shown that networks whose diameter, ⊘, is not large and random walk mixing times, τ, are not slow are well suited to such modelling 37 . Z-scored with respect to the 275 surveyed networks, the parameters of our network were ⊘ = -0.092 and τ = -0.11, well within the interval of well-modelled systems.
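The construction of directed conditional-probability edges can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the toy patient records, the feature names as dictionary keys, the helper functions, and the convention that an edge from source to target carries P(target | source) are all hypothetical.

```python
from itertools import permutations

# Toy binary records (1 = feature present); hypothetical values, not study data.
patients = [
    {"IDH_mutant": 1, "ATRX_loss": 1, "codeletion_1p19q": 0},
    {"IDH_mutant": 1, "ATRX_loss": 0, "codeletion_1p19q": 1},
    {"IDH_mutant": 1, "ATRX_loss": 1, "codeletion_1p19q": 0},
    {"IDH_mutant": 0, "ATRX_loss": 0, "codeletion_1p19q": 0},
]


def probability(rows, feature):
    """Marginal probability that a feature is present."""
    return sum(r[feature] for r in rows) / len(rows)


def intersection(rows, a, b):
    """Probability that both features are present (symmetric in a and b)."""
    return sum(r[a] and r[b] for r in rows) / len(rows)


def conditional(rows, target, source):
    """P(target | source); None when the source feature never occurs."""
    p_source = probability(rows, source)
    if p_source == 0:
        return None
    return intersection(rows, target, source) / p_source


def directed_edges(rows):
    """One weighted edge per ordered feature pair: 2 * C(n, 2) edges in total."""
    features = sorted(rows[0])
    return {
        (source, target): conditional(rows, target, source)
        for source, target in permutations(features, 2)
    }


edges = directed_edges(patients)
```

In this toy example the edge from ATRX loss to IDH mutation has weight 1.0 while the reverse edge has weight 2/3, whereas the intersection weight is 0.5 in both directions, which is the asymmetry a directed graph can retain and a symmetric index cannot.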
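For orientation, the comparison described above can be written in the form commonly used for layered stochastic block models. This is a sketch that assumes the standard formulation from that literature rather than an equation recovered from the present text: the subscripts 0 and 1 simply follow the definition of the description length difference given above, the symbol $\Lambda$ for the posterior odds ratio is an assumption, and the evidence term is treated as common to both hypotheses.

```latex
\Sigma = -\ln P(\{G_l\} \mid \{\theta\}, \mathcal{H}) - \ln P(\{\theta\} \mid \mathcal{H})

\Lambda
  = \frac{P(\{\theta\}_0 \mid \{G_l\}, \mathcal{H}_0)\, P(\mathcal{H}_0)}
         {P(\{\theta\}_1 \mid \{G_l\}, \mathcal{H}_1)\, P(\mathcal{H}_1)}
  = \exp(-\Delta\Sigma)\, \frac{P(\mathcal{H}_0)}{P(\mathcal{H}_1)},
\qquad
\Delta\Sigma = \Sigma_0 - \Sigma_1
```

Under equal priors, the hypothesis with the smaller description length carries the larger posterior weight, which is the sense in which a shorter description length is preferred.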
Having established the suitability of our approach, we proceeded to fit a stochastic block model

Stochastic block modelling of patient genetic signatures
The foregoing models reveal the community structure of the relations between genetic features, conditioning against linkages merely driven by sampling panel frequencies.
We now proceed to model the community structure of the relations between individual patients shaped by their shared tumour genetic characteristics. The inferred structure is interpretable as a patient-level representation based on characteristic, signature genetic patterns. We hypothesized that this network representation would yield higher quality stratification of survival than either diagnostic labels or linear representations of genetic factors, demonstrating successful capture of tumour heterogeneity. We used a Sankey chart to visualize the links between known genetic mutations and the current best-practice diagnostic nomenclature 27 , illustrating the diagnostic heterogeneity a stochastic block model representation could theoretically capture.
We created a dense graph with each patient, defined as a node, connected to every other by an undirected edge. Each edge was then independently weighted by the count of each genetic feature shared by the connected pair, resulting in a dense, fully connected graph with multiple binomial edge covariates spanning the full set of modelled tests. The number of edges is given as the number of nodes (patients), $N$, choose 2, $\binom{N}{2}$. We visualised the graph as a minimum spanning tree labelled by WHO CNS5 diagnosis or survival, enabling a qualitative impression of its expressive power in comparison with a non-graph linear model of the low-dimensional structure of the data based on principal component analysis (PCA).
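A minimal sketch of the patient graph construction follows. The patient identifiers, feature names and toy values are hypothetical, and "shared" is interpreted here as both patients carrying the feature, which is an assumption about the weighting rather than a statement of the exact rule used.

```python
from itertools import combinations

# Toy binary test results per patient; hypothetical values, not study data.
patients = {
    "p1": {"IDH_mutant": 1, "ATRX_loss": 1, "TERT_mutant": 0},
    "p2": {"IDH_mutant": 1, "ATRX_loss": 0, "TERT_mutant": 1},
    "p3": {"IDH_mutant": 0, "ATRX_loss": 0, "TERT_mutant": 1},
}


def patient_edges(records):
    """Map each unordered patient pair to per-feature sharing indicators."""
    edges = {}
    for a, b in combinations(sorted(records), 2):
        edges[(a, b)] = {
            feature: int(records[a][feature] == 1 and records[b][feature] == 1)
            for feature in records[a]
        }
    return edges


# Fully connected, undirected graph: C(N, 2) edges, one covariate per feature.
edges = patient_edges(patients)
```

Keeping one covariate per genetic test on every edge, rather than collapsing them into a single similarity score, preserves which features a pair of patients has in common.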
We proceeded to fit and optimise a stochastic block model as outlined in the previous section, yielding a hierarchical community structure of patients. The z-scored ⊘ and τ parameters of our network were -0.067 and -0.098 respectively, again within the interval of well-modelled graphs. We then used Bayesian multinomial regression 43 to quantify the contribution of each genetic feature to each community. The multinomial regression was estimated with MCMC, employing a single chain running to 100,000 samples, a burn-in of 100,000 and thinning of 5, reporting the regression coefficient estimated with 95% Bayesian credibility interval.

Survival modelling
To quantify the stratifying power of our network representation, we examined the prediction of survival, in days from the date of biopsy, for patients surveyed over at least 3 years. Date of biopsy was used as the index of onset in keeping with established practice in the field 44,45 .
We sought to compare survival models based on i) our network genetic signatures, ii) patient diagnosis, and iii) the raw tumour genetic information used to fit the stochastic block model.
We first constructed Cox's proportional hazard models, employing graph representational signatures, diagnosis, or raw tumour genetics across different models, with age and gender as nuisance covariates. We used a penaliser term of 0.1, and the Breslow baseline estimation method 46 . Model performance was evaluated by 5-fold cross-validation, relying on the median out-of-sample concordance index 46,47 . We extracted the survival function and hazard ratios of graph communities, diagnoses, and individual genetic domains for downstream comparison.
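The evaluation metric can be illustrated with a minimal sketch of the concordance index and its median across folds. This is a simplified version (pairs with tied survival times are skipped), and the toy times, event indicators, risk scores and fold scores are hypothetical rather than study results.

```python
from statistics import median


def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which higher risk means shorter survival.

    A pair is comparable when the patient with the shorter time had an
    observed event. Ties in risk score count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: survival in days, event flag (0 = censored), predicted risk.
times = [100, 200, 300, 400]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)

# Hypothetical out-of-sample scores from five folds, summarised by the median.
fold_scores = [0.70, 0.72, 0.68, 0.75, 0.71]
summary = median(fold_scores)
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect ranking of survival times by predicted risk.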
We augmented this analysis with a series of Bayesian logistic regression models 48,49 , predicting survival at 12, 24 and 36 months, motivated by the widespread use of annual survival-based metrics 50,51 . These classification models replicated the inputs of the survival models, and were estimated with MCMC, employing 100,000 samples, a burn-in of 100,000 and thinning of 5. A series of prior shrinkage schemes were evaluated, including g, horseshoe, horseshoe+, ridge, lasso, and logt 48 . Model performance and goodness-of-fit were determined by pseudo-R² and the widely applicable information criterion (WAIC) 52 , respectively.
The decision to evaluate the performance of network signatures against models of diagnosis or raw genetic features was driven by two factors: data requirements and favourable parameterisation. First, a systematic review of brain tumour survival models revealed that no previously published model incorporated the range of molecular data we had curated; existing models studied different cohorts of the glioma landscape (e.g. glioblastoma alone), mandated additional data not acquired during routine clinical care (e.g., full genome sequencing or proteomics), and/or necessitated multi-modal combination with medical imaging 53-70 . While these areas are undoubtedly interesting and add value to the field, our focus was to provide a means of forecasting survival with genetic data acquired in routine clinical care across the range of diagnoses available to us. Therefore, it was deemed appropriate to derive comparator models, based on the original genetic data and on the WHO CNS5 diagnosis 27 , against which the graph representations would be tested. Second, it was important that our comparator models were comparable architecturally, so that any differences in model fidelity could be plausibly attributed to the quality of the representations, and not the hyperparameters/architectures that fit them. For this reason, it was judged appropriate to fit univariate models of diagnosis and linear multivariate models of genetics, but not nonlinear multivariate models of genetics. With all possible feature interactions here, the model parameter space rises to 3 628 800, which is clearly too large a space for a discriminative model supported by only 1323 patients. A non-linear model is therefore likely to overfit.
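The WAIC used for model comparison can be computed from pointwise log-likelihoods evaluated at each posterior draw. The sketch below uses the standard definition on the deviance scale (lower is better); the function name and the toy log-likelihood values are hypothetical.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC from log_lik[s][i]: log-likelihood of observation i under draw s."""
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        peak = max(column)
        # Log of the posterior-mean likelihood, computed stably (log-sum-exp).
        lppd += peak + math.log(sum(math.exp(v - peak) for v in column) / n_draws)
        # Effective number of parameters: posterior variance of the log-likelihood.
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)


# Toy log-likelihoods for two posterior draws and two observations.
stable_model = [[-1.0, -2.0], [-1.0, -2.0]]
variable_model = [[-1.0, -2.0], [-1.5, -2.5]]
waic_stable = waic(stable_model)
waic_variable = waic(variable_model)
```

The variance term penalises models whose fit to each observation fluctuates strongly across posterior draws, so a lower WAIC is not obtained simply by adding parameters.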

Null models
We evaluated a series of nulls of the preceding models, created by randomly permuting edge features before following exactly the same modelling steps. Model comparison to the corresponding null by description length allows us to infer that the structure of a target model does not arise by chance. We additionally quantified the difference in the predictive performance of survival models based on the inferred community structure.
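The permutation step can be sketched as follows, assuming edge features are stored as a mapping from node pairs to weights. The function name, the seed handling and the toy weights are hypothetical; the topology is held fixed and only the assignment of weights to edges is randomised.

```python
import random


def permute_edge_features(edge_weights, seed=0):
    """Return a null graph: same edges, weights shuffled across them."""
    rng = random.Random(seed)
    edges = list(edge_weights)
    weights = [edge_weights[e] for e in edges]
    rng.shuffle(weights)
    return dict(zip(edges, weights))


# Toy directed edge weights (hypothetical values, not study data).
observed = {
    ("IDH", "ATRX"): 0.67,
    ("ATRX", "IDH"): 1.00,
    ("IDH", "TERT"): 0.33,
    ("TERT", "IDH"): 0.50,
}
null_graph = permute_edge_features(observed, seed=42)
```

Because the null retains the same edges and the same distribution of weights, any difference in description length or predictive performance is attributable to how the weights are arranged on the graph.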

Data and code availability
All code shall be made publicly available upon publication at https://github.com/highdimensional. Trained model weights are available upon request. Patient data is not available for dissemination under the ethical framework that governs its use.

Software
Analyses were predominantly performed within a Python (version 3.6.9) environment with the following software packages: graph-tool 42 , GeoPy 71 , gravis 72 , hdbscan 73

Compute
Analyses were performed on a 32-core Linux workstation with 128 GB of RAM and an NVIDIA 2080Ti GPU.

ANOVA of centrality metrics of the features within these communities identified a statistically significant difference in eigenvector centrality (p<0.0001), authority centrality (p<0.0001), hub centrality (p<0.0001), page rank (p<0.0001) and betweenness centrality (p=0.01) (Figure 3).

IDH
We confirm the association of IDH wildtype status with ATRX retention 81

ATRX
Retained ATRX is confirmed to be associated with IDH wildtype in glioblastoma, as well as with the presence of IDH-mutants, TERT mutants and 1p/19q codeletions, in oligodendroglioma.
ATRX loss was indicative of non-amplified EGFR, largely in astrocytoma. Lastly, while ATRX loss was confirmed to be associated with preservation of 1p/19q, isolated 19q deletion was found to occasionally exist with ATRX loss in 19q-deleted astrocytoma 98 . Both histone G34R and K27M mutants could manifest ATRX loss or retention.

EGFR
Non-amplified EGFR showed the expected association with IDH mutants (see section IDH), BRAF mutants, 1p/19q codeletions, and histone mutants (the latter largely in paediatric lesions). Any degree of EGFR amplification was associated with IDH wildtype, ATRX retention, variable TERT mutation status, and absence of a 1p/19q deletion, typically in glioblastomas 27,99 .
We reveal the propensity for at least moderate, if not high-level, EGFR amplification to manifest with IDH wildtype glioblastoma.

TERT
We confirm the known association of TERT wildtype with preserved 1p/19q and IDH mutants in astrocytoma, and of its mutants with IDH wildtype in glioblastoma, and IDH mutants in oligodendroglioma 100,101 . TERT wildtype was non-specific for both BRAF and histone wildtype and its mutants. Both TERT C228T and C250T promoter mutants were associated with IDH wildtype, preserved 1p/19q, histone wildtypes, and ATRX retention in glioblastoma cases. The mutually exclusive relationship between ATRX and TERT is confirmed 81,83 . Both TERT C228T and C250T mutants were also seen with IDH-mutants and 1p/19q codeletions.

1p/19q
We replicate the known exclusivity between 1p/19q codeletions and ATRX loss, where 1p/19q codeletion/ATRX retention is found in oligodendroglioma, but preserved 1p/19q and ATRX loss occurs in astrocytoma 81,95 . A 19q deletion alone was also associated with TERT wildtype and, interestingly, could also be seen with ATRX loss in astrocytomas 98 . We additionally reproduce the association of 1p/19q codeletion with EGFR amplification 102 .

Histone
Histone (K27M or G34R) altered tumours (typically diffuse hemispheric gliomas of paediatric/teenage and young adult demographic) were associated with IDH wildtype 27 , as expected. But we also found that where either histone mutant was present, 1p/19q, BRAF, and TERT were wildtype, typically with no EGFR amplification. Histone K27M mutants were also less likely to exhibit MGMT methylation. Both K27M and G34R altered tumours could exhibit ATRX loss or retention, though ATRX loss was more likely in our histone G34R altered samples.

Table 1). Survival models based on the source tumour genetic data revealed comparatively few significant predictive features: the 95% confidence intervals of the histone, ATRX and MGMT methylation HRs all crossed 1, whereas EGFR amplification and TERT mutants were significantly associated with poorer prognosis (HR 1.64 and 1.23, respectively).

Network signatures of tumour genetic heterogeneity
1p/19q deletion and IDH mutations were both significantly associated with a better prognosis (HRs 0.54, and 0.39, respectively) (Figure 6 and Supplementary Figure 9). We conducted formal model comparison to determine whether network signatures, diagnosis, or raw (epi)genetic data (both inherently diagnostic variables, e.g., IDH status, and supplementary variables, e.g., MGMT methylation array) offered superior fidelity in forecasting survival. In keeping with established practice, models were statistically compared with R² and the widely applicable information criterion (WAIC), inferring the best model to be the one with the lowest WAIC. We did so with all plausibly expressive levels of the graph hierarchy (L1 to L4 agglomerative community blocks; see Figure 5), and with both continuous regression models (Cox's proportional hazard) and Bayesian logits for 12-, 24- and 36-month survival (Figure 7). In contrast, survival models using randomized null graph models failed to derive any meaningful survival prediction or community segregation (Supplementary Figure 10), offering chance accuracy: CPH c-index 0.534; 12-, 24- and 36-month survival R² 0.008 or lower (Supplementary Figure 11).

Discussion
We have developed a comprehensive framework, founded on Bayesian non-parametric models of the community structure of graphs, for extracting interactive biological patterns from routinely acquired high-dimensional brain tumour genetic data, modelling relations not only between individual genetic features, but also between individual patients, with large-scale, representative, fully-inclusive international data acquired prospectively over a 14-year period.
Our framework has two aims: first, to reveal systematic genetic inter-relations potentially material to the pathogenesis of brain tumours over and above individual genetic contributions, thereby catalysing mechanistic hypothesis generation and therapeutic innovation, and second, to enable higher fidelity, more closely individuated patient stratification, with potential prognostic and prescriptive utility. Our approach not only successfully identifies known genetic inter-relations but reveals new ones, and not only replicates the WHO CNS5 diagnosis but provides a hierarchical patient stratification capable of predicting survival with higher individual-level fidelity than either diagnosis or simple linear models of the raw genetic and epigenetic features. Overall, these findings strongly support the value of applied network science in neuro-oncology 18 .

The demographic structure of brain tumour genetics
We identify striking heterogeneities in the demographics of genetically defined brain tumours and their subtypes in our dataset of operable patients. In line with the literature 3,27 , patients with glioblastoma, IDH-wildtype were the oldest, followed in descending order of age by oligodendroglioma, IDH-mutant and 1p/19q codeleted, astrocytoma, IDH-mutant, and the remaining other gliomas (including BRAF mutant lesions characteristic of children and young adults). Overall, men were more prevalent than women in this large multi-site glioma sample, but significantly more so in glioblastoma, IDH-wildtype, than the other tumours. Conversely, women were significantly more likely to be diagnosed with an oligodendroglioma, IDH-mutant and 1p/19q codeleted, when explicitly controlling for the cohort gender imbalance.

The value of a network approach
Our analysis attests to the value of graph modelling 15-18,22,25,26,37,41,103 in eliciting rich phenotypic information underpinning the genetic heterogeneity of brain tumours. We have shown that graph analysis can reveal hierarchical communities of tumour genetic features sharing similar patterns of inter-relatedness and influence upon an overall tumour genetic structure, communities that plausibly have mechanistic implications for the manifestation of brain tumours. Such communities are potential targets for more detailed examination and should be investigated in future research.
Moreover, we illustrate how graph analysis provides not only a representation 35  where it is high, a finer representation becomes statistically tractable. Generative stochastic block models provide formal support for these representations, relying on a formal equivalence between compression and inference in the specific setting 104 .
Critically, there is no theoretical constraint on the size of the models, only the impact of practical constraints such as data and compute that need be examined empirically. Future modelling could include other feature sets, such as more comprehensive genomics, exomics, or features of intra-tumoural heterogeneity such as variant allele frequencies, sample error or purity. The algorithmic approach has been successfully applied to graphs of 3.38 million nodes (>840 times more than ours) 37 , leaving plenty of room for expansion. Note that such community structure as is discernible in the current models suggests the domain is eminently suited to stochastic block modelling; moreover, allied research suggests biological networks (as is the case here) are especially well suited to the methodological approach 37 .
Graph feature genetic mapping

Highlighting genetic interactions
The vast majority of the genetic features studied here have been associated with a specific diagnosis and/or a particular prognosis 115 , explaining their inclusion in routine clinical investigation. Nonetheless, the striking multiplicity of features demonstrates the natural complexity of oncogenesis 23,24 . It is from this premise that we argue for the value of the graph approach presented here 15,16,18 -which definitionally incorporates interactions between multiple features-as a means of illuminating disease processes, on which future treatment innovation inevitably depends.
Survival prediction illustrates the potential value of our approach. Take the following genetic features, all familiar in isolation: i) IDH: its wildtype form now signifies a tumour to be a glioblastoma 27,115 and is associated with a poorer prognosis; ii) MGMT: greater degrees of methylation are associated with altered responsiveness to temozolomide, and a better prognosis (although likely in part due to treatment allocation) 29,119-122 ; iii) EGFR: greater amplification is associated with poorer outcomes 123 ; and iv) TERT promoter mutants: telomere extension is thought essential to key neoplastic mechanisms 115,124 . All these features are individually prognostic to some degree, but their interactions cast further light, segregating patients into intersectional subpopulations whose prognosis varies substantially and systematically with the specific pattern of interaction (Figure 5, Figure 6, Supplementary Figures 8-9). It is not the case that more mutations simply equate to poorer prognosis; rather, specific sets of interactions dictate it, supporting the notion that sets of features are prognostic 81 , rather than single factors taken in isolation. For instance, although isolated TERT mutants carry a poor prognosis (see Figure 6, panel e, and other studies 116,118 ), a TERT wildtype paired with EGFR amplification and MGMT methylation yielded poorer prognoses than many other tumour genetic communities, including many of those exhibiting TERT promoter mutants.

Enhancing individual-level prognosis
Network signatures of patient brain tumour genetic communities predict survival with greater fidelity than coarse diagnostic labels 27  lesions, reporting a c-index between 0.75-0.84 126 , but these models require comprehensive genomic data rarely available as part of routine clinical care. We suggest that the inclusivity of our framework, and its dependence only on routinely acquired genetic data, allows us to cast the net more widely in pursuing associations with potential clinical value. Moreover, we show here that whereas survival modelling by diagnosis is primarily driven by the distinction between IDH wildtype glioblastoma and other diagnoses, the graph community structure offers a far more finely stratified result. Glioblastoma subpopulations faring better or worse hinged on specific genetic traits, with similarly varied survivability across more favourable diagnoses ( Figure 6). It is intriguing that linear survival models constructed with the same tumour genetic data used to fit the graph community structure performed no better than diagnosis-based models. That the graph representation provides greater predictive power illustrates the potential value of harnessing the complex high-dimensional inter-relationships between tumour genetic features, and ought to stimulate further investigation.
Note that the superiority of network signatures was evident not only in Cox's proportional hazard modelling, but also in annually discretized classification within a Bayesian inferential framework. These models demonstrated more favourable goodness-of-fit by WAIC, indicating the superiority is not trivially explained by model overparameterization but by a better representation.

Study limitations
We sought to reveal the nature and prognostic value of modelling the inter-relationships between tumour genetic features acquired in the context of routine clinical care. The computational complexity of the task mandates the assembly of a large-scale, fully inclusive set of data. Such a set inevitably requires accumulation of data over long periods, covering substantial changes in investigational and diagnostic practice 27 . We therefore adopted a careful, multi-step approach for appropriate handling of data missingness that rendered 4023 of 9518 patients prospectively curated from 2006 to 2020 eligible for inclusion. Our objective, however, is not to provide a definitive representation of tumour genetics, but to demonstrate a suitable approach to drawing intelligence from tumour genetic data in a manner sensitive to its complex interactions. For survival modelling, while we included the demographic features of age and sex, we could not include performance index or other clinical characteristics owing to their lack of availability. Naturally, where such data is available it ought to be modelled, and its value quantified through the kind of model comparison we perform here.

Conclusion
Graph models of brain tumour genetics illuminate the landscape of tumour heterogeneity and enable better prognosis of survival than either diagnosis or models of individual genetic features. They offer a principled means of deriving rich phenotypic representations, with the finer descriptive granularity on which greater personalisation of care inevitably depends.
Translation of such an approach to the clinical frontline may offer opportunity for better and more patient-focussed care.