Machine prescription for chronic migraine

Abstract Responsive to treatment individually, chronic migraine remains strikingly resistant collectively, incurring the second-highest population burden of disability worldwide. A heterogeneity of responsiveness, requiring prolonged—currently heuristic—individual evaluation of available treatments, may reflect a diversity of causal mechanisms, or the failure to identify the most important, single causal factor. Distinguishing between these possibilities, now possible through the application of complex modelling to large-scale data, is critical to determine the optimal approach to identify new interventions in migraine and making the best use of existing ones. Examining a richly phenotyped cohort of 1446 consecutive unselected patients with chronic migraine, here we use causal multitask Gaussian process models to estimate individual treatment effects across 10 classes of preventatives. Such modelling enables us to quantify the accessibility of heterogeneous responsiveness to high-dimensional modelling, to infer the likely scale of the underlying causal diversity. We calculate the treatment effects in the overall population, and the conditional treatment effects among those modelled to respond and compare the true response rates between these two groups. Identifying a difference in response rates between the groups supports a diversity of causal mechanisms. Moreover, we propose a data-driven machine prescription policy, estimating the time-to-response when sequentially trialling preventatives by individualized treatment effects and comparing it to expert guideline sequences. All model performances are quantified out-of-sample. We identify significantly higher true response rates among individuals modelled to respond, compared with the overall population (mean difference of 0.034; 95% confidence interval 0.003–0.065; P = 0.033), supporting significant heterogeneity of responsiveness and diverse causal mechanisms. The machine prescription policy yields an estimated 35% reduction in time-to-response (3.750 months; 95% confidence interval 3.507–3.993; P < 0.0001) compared with expert guidelines, with no substantive increase in expense per patient. We conclude that the highly distributed mode of causation in chronic migraine necessitates high-dimensional modelling for optimal management. Machine prescription should be considered an essential clinical decision-support tool in the future management of chronic migraine.


Graphical Abstract Introduction
Migraine presents a therapeutic paradox. It is the second most disabling disease worldwide-first in the 15-50 age interval-with enormous social and economic impact. [1][2][3][4] Yet it is considered a treatable disease, responsive to a wide array of readily administered, mechanistically diverse interventions. 5,6 How do we find ourselves losing a war whose individual battles we are seemingly so well-equipped to win?
Two polar possibilities arise, distinguished by migraine's currently unknown mode of causation. If its cause is unitary-there is a necessary and sufficient mechanism common to all patients-the response to current treatments may be variable because their effect is collateral to the critical disease process. Here, finding a new, universally effective agent is theoretically possible, and its effect may be proven in an adequately powered randomized trial.
Conversely, if its cause is distributed-there is no single mechanism but a wide, heterogeneous field of interacting causal factors-treatment variability may be explained by varying correspondence between the chosen therapeutic agent and the patient's specific causal field. 7 Here our task necessarily complicates to identify not one but a family of mechanisms-and therefore modifying agents-and cannot be plausibly solved by any practicable set of conventional trials, for the unknown fraction of a sample responsive to any given treatment cannot be quantified without an overview of the treatment heterogeneity of the population as a whole.
Reality may fall anywhere between these two extremes. But in relying on randomized controlled trials, the currently dominant approach to therapeutic innovation in migraine excludes the second possibility entirely. It is, moreover, radically at odds with the widespread clinical impression of treatment heterogeneity, and the established practice of speculative, heuristic treatment, optimized by individual feedback over many months. 6 If a presumption is to be made, it is in favour of distributed, not unitary causation. But in the absence of widely applicable methods of studying complex distributed causation, the distinction has been untestable. The recent advent of highly expressive, computationally assisted mathematical models now allows us to investigate it empirically, and to address two questions of major translational significance.
First, examining an unselected, consecutive, fully inclusive, richly phenotyped cohort of 1446 patients with chronic migraine, here we quantify the individualized treatment effects of major categories of prophylactic treatment, exploiting causal multitask Gaussian processes models of proven power to extract heterogeneous causal effects from highdimensional observational data. 8 In the setting of unitary causation, where individual variability to current agents arises incidentally, there should be no marked difference between treatment effects evaluated across the populationaverage treatment effects-and treatment effects evaluated across the subpopulation identified to be susceptible-conditional average treatment effects. Conversely, finding such a difference would support the presence of distributed causation, reflecting consistent individual patterns of diverse mechanistic susceptibility.
Second, if we find individual responsiveness to be determinable, the order in which candidate agents are sequentially evaluated in a patient could be objectively optimized. Here we compare the theoretical benefit-quantified in time-to-response-of such machine prescription against established heuristic treatment policies, contextualized by estimates of the treatment cost. If the substantial benefit is observed, machine prescription ought to be preferred over the current expert-driven approach to treatment selection.
To assure generalizability, we quantify all effects on out-of-sample test data, unseen by the models in training.
Moreover, our focus is on the general extent of achievable treatment individuation and its impact, not the precise observed rank of individual treatments, for our questions seek to establish the correct causal framework in chronic migraine, and the best use of existing evidence in guiding treatment while we await further insight into the aetiology of this complex disorder.

Patients, interventions and outcomes
An unselected, consecutive, retrospective cohort comprising all patients seen by one of the authors, a neurologist with headache expertise (M.M.), at the secondary and tertiary Headache Centre at the National Hospital for Neurology and Neurosurgery, Queen Square, UK, from May 2007 to September 2019 was examined. The inclusion criteria were a diagnosis of chronic migraine and the availability of a structured clinical phenotypic record completed according to the template used at the clinic. Participants were not required to strictly fulfil the diagnostic criteria for chronic migraine, 9 but all exhibited the distinctive features of migraine as a firm diagnosis. Patients with chronic migraine only considered a differential diagnosis or in cases where the diagnosis was unclear were excluded. All patients evaluated at the Headache Centre routinely undergo a structured clinical assessment, including comprehensive detailed phenotyping and documentation of prior medical history. A proportion undergoes further investigations, including imaging as clinically indicated. Modelling incorporating brain imaging is the subject of a subsequent report. The study population characteristics are provided in Table 1.
The interventions modelled in this study were classed by mode of action and included all preventive therapies for which there were at least 100 adequately documented patient trials. The modelled therapeutic classes were onabotulinumtoxinA, flunarizine, candesartan, serotonin noradrenaline reuptake inhibitors (SNRI), topiramate, tricyclic antidepressants (TCA), acupuncture, valproate, betablockers and serotonergic agents (pizotifen and methysergide). Treatment response for a therapeutic class was defined as positive where more than 50% reduction in the number of headache days across the last 3 months of trialling was observed, and negative otherwise, by any agent within the class, over an evaluation period of at least 3 months. A headache day was defined as a day when the patient was affected by headache for any part or whole of the day, as per the international classification of headache disorders third edition definition. 9 Headache days were recorded by patients prospectively on a paper headache diary and evaluated by the neurologist. Treatment responses to onabotulinumtoxinA were labelled as effective only if the PREEMPT paradigm was followed, and a treatment effect remained after the second set of injections, to account for the known high placebo response.
A total of 1831 patients were eligible for inclusion. Of those, 131 were excluded owing to diagnostic uncertainty, and a further 269 owing to missing data for more than 10% of all features, leaving 1446 for analysis ( Supplementary Fig. 1).

Data acquisition and data management
Data were collected through automated extraction of the Microsoft Word template-based structured clinical record employed by the Headache Centre. Standard natural language processing techniques such as string matching with regular expressions and grammatical decomposition were used. One of the authors (A.S.) reviewed the complete records of a held-out subset of 60 patients and manually extracted and plotted all features to serve as a comparison to the automated data extraction. Accuracy against manual extraction from the held-out subset was 90.73%. Note that this processing was performed for service optimization purposes wholly within the clinical digital environment; all subsequent analysis was performed on data from which all identifiers had been removed. Categorical and continuous variables were converted to a continuous interval scale. All other features were binarized as present or absent. Supplementary Table 1 outlines details on all included features and outcomes. Subjects with missing data for more than 10% of the features were removed from the dataset. The mean and standard deviation were used for precision and variance estimates in cases of normal distribution; the median and interquartile range were used otherwise. Normality assumptions were based on visual inspection of histograms and the Shapiro-Wilk test for normality. Effect estimates were reported with 95% confidence intervals (CI). The significance level was set to α = 0.004 after Bonferroni correction for 14 comparisons (0.05/14 = 0.004) in the prescriptive modelling, and the conventional 0.05 in the individualized treatment effect modelling.

Modelling and statistical analysis Individualized treatment effect modelling
We randomly split the dataset into three stratified subsets: training, validation and test, the last providing a held-out, out-of-sample definitive benchmark of performance. The partitions were kept separate and created with the following ratios: 4:1 training to test and within the training set, 4:1 training to validation ( Supplementary Fig. 1). Missing data were imputed with a probabilistic principal component analysis imputer based on the training dataset, and the data were scaled.
To model individualized treatment effects, we implemented a causal multitask Gaussian process model. 8 The model has been validated to be capable of inferring individualized treatment effects from observational data, accounting for a nonrandom distribution of the treatment factor. The model learns from the high-dimensional array of features (in our case, headache phenotype and comorbidities) to infer treatment effects. Treatment effects may be interpreted as the theoretical difference in response (here defined as ≥50% reduction in headache frequency) when exposed to a treatment versus not exposed to a treatment. In the implementation of the model, two different interventions are compared, and by learning from the training data, the model can predict treatment effects in unseen data at the individual level-i.e. individualized treatment effects. The model was trained and optimized using the training and validation subsets. The discounted cumulative gain was used as a scoring metric to evaluate the choice of kernel hyperparameters (Supplementary Table 2). The best performing model in the validation set was finally evaluated on the out-of-sample test set.
We made pairwise comparisons between all prophylactic intervention classes giving individualized treatment effects for each intervention compared with each of the others. Each patient's individualized treatment effect for an intervention class was calculated as the mean of all pairwise effects including that class. Thus, we arrive at a modelled individualized treatment effect for all intervention classes for each patient. We calculate the average treatment effects -i.e. the estimated population treatment effect inferred from the Gaussian process model-as the median and interquartile range of individualized treatment effects for each intervention class. We also report the median and interquartile range of the mean of individualized treatment effects across all pairwise comparisons to provide descriptive insample and out-of-sample average treatment effect estimates for each intervention class ( Supplementary Fig. 2).
Next, we defined a conditional subgroup consisting of the patients whose modelled individualized treatment effect was above the median (owing to skewed data). By calculating the average treatment effect similarly as above for the conditional subgroup, we derive the conditional average treatment effect-i.e. each intervention class' average treatment effect among those predicted by the model to respond. We then compare the conditional true treatment effect (the true response rate captured from headache diaries in the conditional subgroup) with the overall true treatment effect using a one-sample t-test of the differences across all intervention classes and report the mean difference with a 95% CI. Finally, we estimate the validity of the machine prescription by calculating the 10-fold cross-validated accuracy of a logistic regression model using the predicted individualized treatment effects as the independent variable and the true response as the dependent variable. A logistic regression model was deemed as most suited as there was only one continuous independent variable (predicted individualized treatment effect) and one binary dependent variable (response or no response).

Impact and prescriptive modelling
To model the impact of machine prescription, we implemented the following strategy: the individualized treatment effects were used to rank the preventative therapies from highest to lowest probability of response for each individual patient. From this, we ascertained the proportion of patients having tried in reality their top three predicted treatments.
Further, given a patient and a sequence of agents ordered according to their response probabilities-i.e. individualized treatment effects-from highest to lowest, p 1 , p 2 , p 3 … p 10 , we calculated the probability of arriving at a treatment success after a given number of failed treatments, delivered in the optimal predicted sequence, as follows: where X denotes the number of independent failures before a success (at trial k + 1). Given treatment success at trial k + 1, we are able to calculate the expected number of months in pain (i.e. months with failed treatments) before completion of a successful treatment trial as for each patient at k = 0, k = 1, k = 2 … k = 9. Here t equals the necessary time to evaluate a treatment trial which was defined as 3 months for all treatments except onabotulinumtoxinA which was 6 months. This gives a population distribution of the number of months to completion of a successful treatment trial, allowing us to estimate time-to-response given different sequences of intervention trialling.
We then aimed to evaluate the optimal predicted sequence of intervention trialling to other possible sequences. We compared the population distribution given by the Gaussian process machine prescription to the population distribution given by ranking by different guideline and expert recommendations, 10,11 following a random sequence, and ordering treatments by increasing costs. We constructed a series of sequences to reflect different available guidelines and expert opinions. Guideline recommendation 1 sequence was constructed by picking three random of the evidencebased oral preventives suggested in at least one guideline (TCA, SNRI, betablockers, candesartan, topiramate, valproate and flunarizine) followed by onabotulinumtoxinA. 10 Guideline recommendation 2 sequence was based on picking two random among betablockers, candesartan, TCA and SNRI, followed by one random of topiramate, valproate and flunarizine, followed by onabotulinumtoxinA. 11 The National Institute for Health and Care Excellence guideline sequence was based on the recommendation of trying each of betablockers, topiramate and TCA (order not specified), followed by onabotulinumtoxinA. The expert panel sequence was based on an aggregate of 23 UK headache specialists asked to order the treatments based on a general understanding of efficacy and adverse events (Supplementary Table 3). The random sequence was created from the mean at each timepoint k 0 , k 1 , k 2 … k 9 of a Monte Carlo simulation with 1000 realizations, i.e. 1000 random sequences. For the machine prescription, we also restricted onabotulinumtoxinA to be the fourth trialled treatment to mitigate bias from the difference in evaluation period between onabotulinumtoxinA and other treatments. A twotailed t-test was used to compare the population distributions of time-to-response, reporting the mean difference with 95% CI.
Using the British National Formulary price tariffs, we derived estimates of individual treatment-related expenses in pound sterling. We then compared the optimal machine prescription sequence of intervention trialling to sequences ordered by treatment costs. Moreover, we reported the difference in expenses defined as the sum of the n top predicted trials subtracted from the sum of the n actual trials, where n is the number of trials. We reported estimates for the lowest and highest available price tariffs (Supplementary Table 4).
We also conducted a sensitivity analysis on sub-strata of severely affected patients versus less severely affected patients comparing machine prescription to the guideline recommendation 1 sequence. We reiterated the individualized treatment effect and impact analysis on two sub-strata of the population. The first strata consisted of patients with at least 25 headache days/month and a headache intensity of 9 or higher. The second strata consisted of patients with ,25 headache days/month and headache intensity below 9.

Data availability
The minimum dataset required to replicate this study contains personal data and is not publicly available. The code used in this study is available upon reasonable request to the authors.
Out-of-sample comparison of the true treatment effect across the population to the conditional true treatment effect within the subpopulation of predicted responders for each treatment class showed a mean difference of 0.034 (95% CI 0.003-0.065; P = 0.033) in favour of the latter (Fig. 1). This discrepancy was greatest for flunarizine, serotonergic drugs and valproate. The accuracy of validating the modelled individualized treatment effects compared with true treatment effects was consistently high (0.731 + 0.103). Table 2 outlines modelled average treatment effects, modelled conditional average treatment effects and true treatment effects for all intervention classes.

Machine prescription and its impact
Out of the 253 patients included in the test set, 85 (33.6%) had tried their model predicted best treatment, 140 (55.3%) patients had tried at least one of their top two treatments and 170 (67.2%) had tried at least one of the top three treatments.
Sequentially evaluating treatments by machine prescription resulted in arriving at a successful treatment in significantly fewer months than administering treatments in order by generic guideline recommendations (−3.750 months; 95% CI −3.993 to −3.507; P , 0.0001), or indeed any other justifiable order, including experienced clinician rankings ( Fig. 2 Table 5). The best treatment policy did not differ from randomly evaluating therapies. Finally, the average additional 3-monthly cost for machine prescription was −£2 for low drug tariff estimates and £1 for high drug tariff estimates.

Discussion
Surveying a chronic migraine population among the largest and most finely phenotyped in the literature, here we show treatment heterogeneity to be robustly predictable from high-dimensional causal modelling of routinely collected clinical data. This finding supports a complex, distributed underlying mode of causation in chronic migraine, and suggests that neither the pursuit of a unitary causal mechanism, nor the evaluation of treatment effects within conventional randomized controlled trials is likely to be productive. Rather, deeper characterization of patient heterogeneity is likely to be needed, through modelling richer additional features, such as imaging, physiological and genetic data [12][13][14][15][16][17][18] at larger data scales, illuminating the wide causal field of factors that clearly underpins this complex disorder.
We show further that current treatment policy guidelines yield broadly the same time-to-response as chance. This is Figure 1 Individualized treatment effects. Violin-plot representing modelled individualized treatment effects for overall out-of-sample data (the left violin of each pair) and the conditional subgroup predicted to respond (the right violin of each pair) of the 253 subjects included in the test set. Individualized treatment effects are estimated using a causal multitask Gaussian process model for all modelled intervention classes. The two rightmost violins represent the overall discrepancy between modelled treatment effects and modelled conditional treatment effects, with a mean difference of 0.034 (95% CI 0.003-0.065; P = 0.033) in favour of the latter. Recall that this observed discrepancy in treatment effects, reclaimed in the true treatment effects, supports highly heterogenous treatment responsiveness and distributed mode of causation. SNRI, serotonin noradrenaline reuptake inhibitors; TCA, tricyclic antidepressants.    The population time-to-response using machine prescription was compared with other treatment evaluation strategies using a two-tailed t-test.
consistent with the widespread belief among clinicians that the individual selection of optimal treatment based on a small subset of individual patient factors is very difficult, 6 a belief reinforced by expert panel rankings of treatments (Supplementary Table 3). By contrast, machine prescription offers a significantly shorter time-to-response, with a substantial mean effect size exceeding 3 months-equating to a 35% reduction. Crucially, better treatment is here achieved without a marked increase in cost, or plausibly greater risk of side-effects, 19,20 and without substantial variability across different severity strata. Close consideration must clearly be given to adopting the approach at scale, for the balance of risks and benefits is here heavily weighted in our favour. It may seem premature to draw so general a set of conclusions from a single centre, tertiary referral population, even if this is one of the largest reported in the literature. But it is crucial to appreciate that all inferences are here drawn from out-of-sample test data, indicating generalizability beyond the training data. Moreover, if a marked discrepancy between individual and population responsiveness is robustly identified within a comparatively small population with lower levels of heterogeneity than are observed in wider care, a larger-scale analysis can only magnify it. This is because the tractability of patterns of heterogeneity can only be enhanced with data of greater scale and inclusivity. Indeed, our analysis invites replication with primary care data which we here show can be readily performed automatically with structured clinical records. The objective of this study is less to derive a set of specific models than to illustrate the optimal way of approaching machine prescription in migraine, given its manifest complexity.
Though a small proportion of patients did not fulfil strict diagnostic criteria, they were judged by an expert headache specialist to have chronic migraine, and our population demographics overall are in line with other large chronic migraine cohorts. 21,22 Such patients should, and generally would, be treated as chronic migraine in real-world practice, and excluding them would limit rather than enhance generalizability to the wider population. Without objectively determinable aetiological criteria, all classification is in any event heuristic. That the conclusions here apply to a slightly broader population does not make them invalid: it extends their reach.
Finally, though allocation bias inevitably corrupts observational studies, in the context of heterogeneous treatment effects clearly exemplified here, it is only one contributory factor to the fidelity of the inference, and becomes increasingly secondary to the quality of the outcome model as the scale of available data rises. 23 In any event, a multi-agent randomized controlled trial is obviously infeasible here, and evaluated with conventional statistics would be critically confounded by the individual level, high-dimensional patterns of treatment responsiveness we have already demonstrated.

Conclusion
Our analysis of a large and richly characterized dataset of chronic migraine phenotypes demonstrates-not only the value-but also arguably the necessity of high-dimensional modelling in the management of migraine. We develop and evaluate a causal model of commonly used anti-migraine preventives that demonstrate both the distributed mode of causation, and the feasibility of machine prescription at the individual level. We conclude that the application of highdimensional modelling to prescribing is a critical step towards reducing the massive global burden of migraine through realizing the personalized, precision medicine this remarkably complex condition demands.

Funding
Funded by the UCLH NIHR Biomedical Research Centre and the Wellcome Trust.

Competing interest
Dr A.S. reports grants from Samarbeidsorganet Midt-Norge, during the conduct of the study; other from Nordic Brain Tech AS, outside the submitted work; in addition, Dr A.S. has a patent mHealth Biofeedback treatment concept for episodic migraine pending. Dr R.G. has nothing to disclose. Dr E.T. reports grants from The Central Norway Regional

Supplementary material
Supplementary material is available at Brain Communications online.

Ethics
The study was performed under NRES approval by the London-West London and GTAC Research Ethics Committee for the consentless analysis of irrevocably anonymized data.