Systematic review and network meta-analysis of the efficacy of existing treatments for patients with recurrent glioblastoma

Abstract
Background: Despite advances in the treatment of cancers in recent years, treatment options for patients with recurrent glioblastoma (rGBM) remain limited with poor outcomes. Many regimens have been investigated in clinical trials; however, there is a lack of knowledge on comparative effectiveness. The aim of this systematic review is to provide an overview of existing treatment strategies and to estimate the relative efficacy of these regimens in terms of progression-free survival (PFS) and overall survival (OS).
Methods: We conducted a systematic review to identify randomized controlled trials (RCTs) investigating any treatment regimen in adult patients suffering from rGBM. Connected studies reporting at least one of our primary outcomes were included in a Bayesian network meta-analysis (NMA) estimating relative treatment effects.
Results: Forty RCTs fulfilled our inclusion criteria evaluating the efficacy of 38 drugs as mono- or combination therapy. Median OS ranged from 2.9 to 18.3 months; median PFS ranged from 0.7 to 6 months. We performed an NMA including 24 treatments that were connected within a large evidence network. Our NMA indicated improvement in PFS with most bevacizumab (BV)-based regimens compared to other regimens. We did not find any differences in OS between treatments.
Conclusion: This systematic review provides a comprehensive overview of existing treatment options for rGBM. The NMA provides relative effects for many of these treatment regimens, which have not been directly compared in RCTs. Overall, outcomes for patients with rGBM remain poor across all treatment options, highlighting the need for innovative treatment options.

Among primary brain tumors, gliomas represent around 80% of brain cancers. Glioblastoma (GBM) (grade 4 glioma according to the World Health Organization (WHO) classification 1 ) is the most common and aggressive brain tumor in adults. With an incidence of 3.2 per 100,000 people, 2 GBM affects all age groups. Despite the armamentarium of therapies that have been developed, relapse of GBM is inevitable for almost all patients; median survival is around 1 year after recurrence. 3 The diagnosis of recurrence in GBM is still challenging. The resistance of tumor cells to treatment, the heterogeneity and evolution of the subclonal populations of cancer cells constituting the tumor, and the genetic features of tumor cells appear to underlie progression and treatment failure. 4 Treatment options upon progression are limited, with no clearly defined standard of care. Patients with recurrent glioblastoma (rGBM) are recommended to enroll in clinical studies where possible. 5 Treatment choice is guided by several factors including performance status, 6 tumor size, and location. [7][8][9] Only a small percentage of patients are eligible for re-operation. 10,11 The benefit of salvage surgery on survival depends on performance status, 10 tumor location, initial resection status, and age at relapse. 11,12 With the availability of improved imaging technology, re-irradiation is another option used in the treatment of rGBM. However, stringent criteria are applied, including tumor size, tumor resection size, age, prior therapy, and the time between irradiation and re-irradiation. 9 The third option is systemic treatment. The drugs most commonly used are alkylating or anti-angiogenic agents, alone or in combination with other molecules. One of the anti-angiogenic drugs, bevacizumab (BV), is the only drug licensed for rGBM by the Food and Drug Administration (FDA) in the United States (since 2009). 13 BV has not been approved by the European Medicines Agency (EMA) owing to the lack of sufficient and convincing data. 14 The high level of vascularization and expression of vascular endothelial growth factor (VEGF) in rGBM supports the use of BV. Alkylating agents, such as nitrosoureas (carmustine [BCNU] or lomustine [CCNU]), are used for their lipophilic properties; they were the first drugs used in the treatment of rGBM 3 before the FDA approved BV. Recognized as a reference drug in the treatment of newly diagnosed patients, 15 temozolomide (TMZ) appears challenging to use in the treatment of rGBM patients with MGMT (O6-methylguanine-DNA methyltransferase) promoter methylated tumors.
Meta-analysis pools data from multiple trials comparing the same two interventions to produce an overall estimate of relative efficacy. 16,17 Network meta-analysis (NMA) is a natural extension of this methodology that allows relative efficacy to be estimated within a network of multiple interventions. The methodology makes use of direct and indirect evidence and is useful where multiple treatment options exist that have not been directly compared or where head-to-head evidence is insufficient. [18][19][20] Several meta-analyses on the topic exist; however, these evaluated the efficacy of only one treatment strategy compared to another, 8,[21][22][23][24] and the few published NMAs included only a small subset of treatment options. [25][26][27] To date, no NMA has compared a large number of the available therapies for rGBM. We therefore conducted an extensive systematic review of the literature and fitted a large NMA incorporating all connected treatment regimens investigated for rGBM in an RCT setting. The objective of this analysis is to (i) provide an overview of treatment regimens evaluated for use in rGBM in an RCT setting and their associated efficacy and (ii) estimate the relative efficacy between treatment regimens using a Bayesian NMA. Outcomes considered for this analysis are progression-free survival (PFS) and overall survival (OS).
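To illustrate how indirect evidence enters such a network, the following minimal sketch applies the standard adjusted indirect comparison through a common comparator: on the log scale, the effect of A versus C is the difference between the effects of A versus B and C versus B, and the variances add. It is a simplified frequentist illustration only, not the Bayesian model fitted in this review; the function name and all input values are hypothetical.

```python
import math


def indirect_log_hr(log_hr_ab, se_ab, log_hr_cb, se_cb):
    """Indirect estimate of A vs C through the common comparator B.

    Returns (log HR of A vs C, standard error). Assumes the two trials
    are independent and share a comparable comparator arm B.
    """
    log_hr_ac = log_hr_ab - log_hr_cb
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    return log_hr_ac, se_ac


# Hypothetical inputs: trial 1 reports HR(A vs B) = 0.8, trial 2 reports HR(C vs B) = 1.0.
log_hr_ac, se_ac = indirect_log_hr(math.log(0.8), 0.15, math.log(1.0), 0.20)
print(f"Indirect HR(A vs C) = {math.exp(log_hr_ac):.2f}, SE(log HR) = {se_ac:.2f}")
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason sparse networks tend to yield wide intervals.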

Importance of the Study
Given the large number of possible treatment options for recurrent glioblastoma, it is important to understand the comparative efficacy between these treatments. Undertaking evidence synthesis of the available trials allows this to be done in a systematic way. This literature review and Bayesian network meta-analysis investigates the current evidence of available treatment options and their efficacy to treat patients with recurrent glioblastoma. We systematically searched the literature for any randomized controlled trial investigating any treatment option for adults with recurrent glioblastoma. Given the scarcity of direct comparative evidence, network meta-analysis provides a means of estimating comparative treatment effects based on direct as well as indirect evidence. Our analysis is the largest network meta-analysis conducted in this patient population to date, estimating relative treatment effects of 24 distinct treatment regimens. Overall, our analysis highlights the need for innovative treatment options for recurrent glioblastoma, as outcomes for patients remain poor across treatment options.

Search Strategy and Selection Criteria
This systematic review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria. 28 The review is registered with PROSPERO (CRD42019142695).
A systematic search of the EMBASE, MEDLINE (via PubMed), and CENTRAL (via Cochrane Library) databases was conducted from inception to July 2019 to identify eligible studies; search results were updated in March 2020. The search was complemented by a search of the clinical trial register (clinicaltrials.gov). Systematic reviews on the topic were hand-searched for additional trials.

Neuro-Oncology Advances
Inclusion criteria for the systematic review were RCTs of adult patients (≥18 years) with rGBM investigating any intervention compared to either placebo or an active comparator. Only trials reporting OS, PFS, or tumor response were included. The full search strategy can be accessed as Supplementary Material 1.
Two independent reviewers (A.S./A.F. or N.A./S.S.) screened each article. Inconsistencies during title/abstract screening and full-text screening were resolved through discussion between the two reviewers or, where no agreement was reached, by a third reviewer.
A data extraction form was developed in Microsoft Word and converted into a form with fillable fields using Adobe Acrobat Pro. The final format was agreed upon after piloting a first version on 8 articles. We extracted general information about the study record, the eligibility of the study for the systematic review (Population, Intervention, Comparison, Outcomes, and Study design [PICOS]), information on the study population and setting, results for the outcomes of interest, and information about the applicability of the study to the review. The full form can be accessed as Supplementary Material 2. Data extraction was also performed by two independent reviewers. Where no confidence intervals or variance measures of median OS or median PFS were provided, available Kaplan-Meier plots and event tables were digitized using WebPlotDigitizer to recreate the underlying numerical data. 29 A suitable R function was applied to recreate individual patient data from which Kaplan-Meier estimates were calculated. 2
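As an illustration of the final step described above, the sketch below computes a Kaplan-Meier curve and the median survival time from patient-level data. It does not implement the reconstruction of patient data from digitized curves, and it is written in Python rather than the R function used in the review; the function names and the example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if progression/death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical reconstructed data for one trial arm (months, event flag).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print(median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the median can remain undefined when follow-up is short or censoring is heavy.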

Clinical Endpoints
Median OS and median PFS were chosen as endpoints for the NMA, as they were the most widely reported outcomes. Other endpoints that were extracted but not analyzed further, because too few studies reported them, were 6-month PFS, 12-month PFS, 6-month OS, 12-month OS, and tumor response rates. Tumor response rates were assessed as overall response (complete response + partial response), complete response, partial response, and stable disease.

Statistical Analysis
Study characteristics were described using means or frequencies. We performed a fixed-effects Bayesian NMA including 24 treatments that were connected within a large evidence network for OS and 23 for PFS. An NMA analyzes an entire network of treatments, estimating relative treatment effects for all pairwise comparisons in the network using direct and indirect evidence. The inclusion of a large evidence base fits naturally in a Bayesian framework, which supports conclusions based on all available information. 18 Based on median OS or PFS data and patient numbers, the model estimated the relative efficacy for each pairwise comparison, measured as hazard ratios (HRs) assuming an exponential survival model, as has been done previously. 30 Noninformative priors, following a normal distribution with a mean equal to 0 and precision set to 0.01, were used. A fixed-effects model was used due to sparse network connections. The Surface Under the Cumulative RAnking curve (SUCRA) was used to rank the treatments. 31 The SUCRA score takes values between 0 and 1, where higher numbers indicate higher ranked treatments. Relative effects are reported as mean and 95% credible intervals. Statistical analysis was performed using R Studio 3.6.3 with package R2WinBUGS and WINBUGS14. The model code is available in Supplementary Material 3.
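The exponential assumption links medians and hazards in a simple way: a constant hazard λ gives a median of ln(2)/λ, so the HR between two arms equals the inverse ratio of their medians. The sketch below works through this relationship for a single two-arm trial. It is not the Bayesian model from Supplementary Material 3; the standard error is a rough frequentist approximation under the stated assumption, and all names and input values are hypothetical.

```python
import math


def exponential_hazard(median):
    """Constant hazard implied by a median survival time: lambda = ln(2) / median."""
    return math.log(2) / median


def hazard_ratio_from_medians(median_a, median_b):
    """HR of arm A versus arm B; values below 1 favor arm A."""
    return exponential_hazard(median_a) / exponential_hazard(median_b)


def approx_log_hr_se(n_a, n_b):
    """Rough standard error of log(HR).

    ASSUMPTION: every patient contributes an event, so var(log lambda) ~ 1/n
    per arm. With censoring, fewer events occur and the true uncertainty is larger.
    """
    return math.sqrt(1.0 / n_a + 1.0 / n_b)


# Hypothetical two-arm trial: median PFS 4.2 vs 2.1 months, 60 patients per arm.
hr = hazard_ratio_from_medians(4.2, 2.1)
se = approx_log_hr_se(60, 60)
lower = math.exp(math.log(hr) - 1.96 * se)
upper = math.exp(math.log(hr) + 1.96 * se)
print(f"HR = {hr:.2f} (approx. 95% interval {lower:.2f} to {upper:.2f})")
```

Doubling the median halves the hazard, so in this example the HR is 0.5. The interval shown is a normal approximation on the log scale and should not be confused with the credible intervals reported in the review.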

Risk of Bias
We assessed risk of bias (RoB) for every included study using the Cochrane risk of bias tool for randomized controlled trials. 32 The tool evaluates 7 domains (random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other bias) in order to assess selection bias, performance bias, detection bias, attrition bias, and reporting bias.
Based on available information, each domain was judged by 2 independent reviewers to be at high, low, or unclear risk. Disagreements were resolved through discussion.

Selected Studies and Characteristics
A total of 308 records were identified through the database search and 271 records through clinicaltrials.gov (Figure 1). After duplicates were removed, 232 records remained from the database search. Following title and abstract screening, 86 single-trial publications, 27 systematic reviews, and 33 trials from clinicaltrials.gov remained. PICOS criteria were checked during full-text screening, resulting in 42 records of single-trial publications, 3 references to additional RCTs from systematic reviews, and 5 trials from clinicaltrials.gov. A total of 50 records were included in the systematic review, providing information on 40 RCTs that fulfilled our inclusion criteria and evaluated the efficacy of 56 treatment regimens for OS and 46 for PFS.
A median of 119 participants were included per study with a minimum of 21 participants and a maximum of 437. Thirty-two (80.0%) trials included 2 treatment arms and 8 (20.0%) trials included 3 treatment arms. The mean age of all study participants was 54.0 (range: 49.7-58.5) with 63.4% (range: 32.4%-78.0%) being male. The vast majority of the studies were phase II studies (30/40; 75.0%), 1 study (2.5%; not included in the NMA due to lack of connection) was a phase I study, and 9 studies were phase III studies (

Efficacy
Across all trials, the median OS for patients ranged from 2.9 to 18.3 months; median PFS ranged from 0.7 to 6.0 months (Figures 2 and 3). The study by Reardon et al. (2011) 33 showed the lowest median OS with TMZ (2.9 months), which may be because 78% of the study participants had already had two or more recurrences. In addition, the number of study participants was rather low (n = 10). The combination therapy of dose-intense TMZ + cannabidiol (CBP) + delta-9-tetrahydrocannabinol (THC) in Short et al. (2017) 34 and Twelves et al. (2017) 35 had the highest median OS at 18.3 months. These results also need to be interpreted with caution, as the number of study participants was also low (n = 12), no confidence intervals were provided, and 100% of study participants had had only one recurrence, indicating a less severely ill population. The median OS for BV monotherapy ranged from 3.4 months 36 to 12.6 months 37 . For BV combination therapies, the median OS ranged from 6.4 months 38 to 11.0 months 39,40 ) with

Schritz et al. Patients with recurrent glioblastoma
not be connected to the network and were excluded from the analysis: TMZ as monotherapy or in combination with BV, CBP, or THC or depatux-M (also tested as monotherapy); trabedersen; pembrolizumab as monotherapy or in combination with BV; CT-322; hydroxyurea with or without imatinib; axitinib with or without CCNU; sorafenib with or without temsirolimus; personal peptide vaccination; cintredekin besudotox; carboplatin with or without RMP-7; vismodegib; novo TTF; semustine; erlotinib; procarbazine; as well as BV in combination with TMZ or pembrolizumab. Figure 5 displays the estimated HRs for treatment comparisons versus BV in the network; Figure 6 displays the SUCRA scores. Pairwise HRs for all treatment comparisons in the network can be found in Supplementary Material 5 for OS and 6 for PFS.
The NMA indicated no statistically significant differences with respect to OS between any of the included treatment regimens at the 95% credible level. The SUCRA score was highest for BV + etoposide (EPS) (0.78), regorafenib (0.78), and BV + CCNU (0.72), indicating a higher probability that these treatments are more effective in terms of OS. At the other end, BV + onartuzumab (0.19), alecsat (0.24), and HSPPC-96 + concomitant BV (0.27) occupied the lowest ranks. BV monotherapy was located in the middle with a SUCRA score of 0.51.
More differences were observed in the analysis of PFS data. Eleven of the 12 BV-based regimens held ranks within the top 12. The only other treatment ranked among these therapies was fotemustine (FTM), in rank 8. BV monotherapy was ranked in 10th place. No statistically significant differences were identified between these interventions, but many of them showed a statistically significant improvement in PFS when compared to regimens in the lower ranks (see Supplementary Material 6 for details). We observed no differences between the lower ranked treatment regimens.
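For readers less familiar with SUCRA, the following minimal sketch shows how such scores can be derived from posterior draws of treatment effects: each draw yields a ranking, the rankings give rank probabilities, and the SUCRA score is the average of the cumulative rank probabilities over the first a − 1 ranks (a being the number of treatments). The treatment labels and draws are hypothetical toy values, not output from the model used here.

```python
def rank_probabilities(draws):
    """Estimate P(treatment holds rank j) from posterior draws.

    draws -- dict mapping treatment name to a list of sampled log HRs
             versus a common reference (lower = better). Rank 1 is best.
    """
    names = list(draws)
    n_draws = len(draws[names[0]])
    counts = {name: [0] * len(names) for name in names}
    for i in range(n_draws):
        ordered = sorted(names, key=lambda name: draws[name][i])
        for rank, name in enumerate(ordered):
            counts[name][rank] += 1
    return {name: [c / n_draws for c in counts[name]] for name in names}


def sucra(rank_probs):
    """Mean cumulative rank probability over the first a - 1 ranks."""
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (len(rank_probs) - 1)


# Hypothetical posterior draws of log HRs for three treatments.
draws = {
    "A": [-0.5, -0.4, -0.6, -0.1],
    "B": [0.0, 0.0, 0.0, 0.0],
    "C": [0.2, -0.2, 0.3, -0.3],
}
scores = {name: sucra(p) for name, p in rank_probabilities(draws).items()}
print(scores)
```

A treatment that is best in every draw scores 1 and one that is always worst scores 0; the scores average 0.5 across a network. SUCRA summarizes rank only, so two treatments with very similar effects can still sit far apart in the ranking.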
When comparing SUCRA ranks for OS and PFS, some opposite rankings were observed. The strongest changes were observed for BV + onartuzumab, which ranked fourth for median PFS but last (24th) for median OS, and for regorafenib, which ranked 17th (of 23) for median PFS but highest for median OS. However, as no differences between treatment regimens were identified in the OS analysis, the rank changes should not be over-interpreted.
Overall, only very small survival benefits were observed with any of the interventions, highlighting the unmet need for patients with rGBM. Our NMA indicated some improvement in PFS with most BV-based regimens compared to other regimens, but we did not find any differences in OS between treatments.

Risk of Bias
A risk of bias assessment was performed for each study during data extraction. See Figure 7 for a summary of the risk of bias assessment. Per-trial evaluations are detailed in Supplementary Material 7. Nonblinding of participants and personnel was judged as high risk for 72.5% of the studies. Almost all criteria showed a high percentage of unclear risk due to nonreporting.

Discussion
We conducted an exhaustive literature review identifying a large number of RCTs evaluating the efficacy of a large number of treatment regimens for use in patients with rGBM. The most common comparator arm in the identified trials was BV; however, many trials used a different comparator, and the relative efficacy between many treatment options remains unknown. To our knowledge, this study presents the largest NMA estimating relative treatment effects in rGBM conducted to date.
We found most BV-based therapies to be superior to other therapy options in terms of PFS. However, this effect did not translate into an improved OS. Fotemustine showed similar efficacy in terms of PFS to BV-based therapies. No significant differences were found between treatment regimens in the analysis of OS. This finding calls into question the use of PFS as a surrogate outcome for OS in rGBM. While OS is the most precise and unambiguous clinical endpoint in a trial, 43 PFS is very often taken as the primary outcome and as a surrogate for OS, as it can reduce the length of the trial and its sample size, ultimately resulting in lower costs. 44 Our findings highlight a strong need to demonstrate treatment superiority in terms of OS in rGBM trials.
Our study has some limitations.

While we were able to include a large number of trials in our NMA, not all trials contributed to a connected network. Hence, these could not be included in our analysis and no relative treatment effects could be estimated for these regimens. In the absence of connecting RCTs, additional research incorporating nonrandomized evidence or matching methods or additional assumptions on additivity of treatment components could be used to establish a connection, as has been done in the area of multiple myeloma for example. 30,45 However, certainty in the results would suffer due to the reliance on additional assumptions for nonrandomized evidence.
There was some heterogeneity across the included studies, which the model did not account for. Included trials were conducted over a 20-year time horizon, the earliest publication dating back to 2000. The definition of progression and response changed over time, with the MacDonald 46 criteria more commonly used in earlier trials and the Response Assessment in Neuro-Oncology (RANO) 47,48 criteria more commonly used in later trials. Additionally, the number of recurrences of study participants varied across studies, with some studies only including patients experiencing first recurrence, 39,40,49-51 1 study 33 including 78% of participants with 2 or more recurrences, and 5 studies 13,36,52-54 with more than 60% of participants experiencing first recurrence and less than 40% experiencing their second recurrence. Unfortunately, a high proportion of included studies (14 out of 23) did not provide detailed information regarding the number of recurrences and their distribution across the included participants. Because so few studies reported this information, it was not possible to include the number of recurrences in the statistical analysis. Further, the proportion of male participants varied from 32% to 78%, which may cause some level of heterogeneity, considering evidence that male gender is predictive of poorer outcomes. Due to the sparse network connections, we were unable to fit a random-effects model or meta-regression accounting for some of this heterogeneity.
Our NMA included studies differing in size, with arms including between 10 and 288 patients. While increased uncertainty due to small trials is propagated throughout the network, some of our results still rely on very small studies only. The top ranked treatment, BV + EPS, for example, was only investigated in one small trial, 33 including 23 patients in total, where it compared favorably (if not significantly so) with its comparator. Nevertheless, our goal was to incorporate all available evidence in the analysis, regardless of number of included patients and we feel that the uncertainty is reflected in our results.
The results of our RoB assessment indicate that there were many design flaws in the trials. Lack of blinding caused many of the trials to be of high risk of bias. Bias in study designs leads to misinterpretation of what the study outcome can demonstrate and it is often not possible to interpret the true results of these studies. In the review presented here, the large proportion of high risk of bias in at least one dimension did not allow for a subgroup analysis excluding these trials.
We chose PFS and OS as outcomes for our NMA, as they are regarded as the gold standard in oncology studies and were most widely reported across the trials. In order to make clinical decisions, other endpoints, including for instance adverse events and quality of life, also need to be taken into account. Unfortunately, these endpoints were not reported often enough or were not sufficiently homogeneous to conduct a comparative analysis. Our NMA model relies on median PFS and OS values, as these were most widely reported in the trials. Alternative models based on reported HRs, assuming a normal likelihood, would have allowed far fewer trials to be included. Despite BV not being approved as a standard of care in the treatment of rGBM, it is widely used in clinical trials, alone or combined with other drugs and therapies.
While our analysis shows no difference between any of the included treatment regimens with OS as an endpoint, we observed more differences with PFS, and BV combination therapies ranked highest in terms of PFS. While SUCRA scores can be helpful for ranking treatments in order of probability of benefit, they should be interpreted carefully when significant heterogeneity exists, as is the case here. 55 The introduction of new treatments in cancer such as immunotherapy has changed the paradigm of cancer treatment. 56 The trials identified in our review evaluated both chemotherapeutic and immunotherapeutic approaches. The use of immunotherapy in rGBM treatment covers many drug classes including checkpoint inhibitors (nivolumab, pembrolizumab), vaccines (HSPPC-96) or antibodies (onartuzumab, BV) and Alecsat (Autologous Lymphoid Effector Cells Specific Against Tumor Cells), a new epigenetic approach to immunotherapy. While our systematic review identified trials evaluating pembrolizumab, 57 unfortunately, they did not connect to the evidence network and could not be included in our NMA. Our analysis supports the superiority of BV-based regimens compared to many other regimens in terms of PFS. Additional evidence is needed to establish how these interventions compare to others.

Conclusions
Comparative treatment effects are key to guide clinical decision-making. Comparative trials between new and innovative interventions and existing treatments are needed to establish such evidence. NMA is a way of estimating relative treatment effects within a connected network of clinical trials, reducing the number of clinical trials needed to compare a large number of regimens. Our review highlighted a lack of comparative trials, which prevented us from establishing relative treatment effects between many regimens. Future trials of new interventions need to compare to existing interventions to allow for the estimation of such effects.
NMA results need to be interpreted carefully, especially where trial heterogeneity is high. Consistent reporting of important confounding variables, such as previous treatment history, would allow adjusting for some level of heterogeneity.
While there has been a steady publication rate of new clinical trials investigating additional treatment regimens, especially during the last 10 years, outcomes for patients with rGBM remain poor. We found no significant improvement in OS for any of the evaluated regimens compared to others. BV-based therapies demonstrated some superiority in terms of PFS.
Overall, our analysis highlights the pressing need to develop new and innovative treatments for this patient population that deliver advances in patient-relevant outcomes.