Multivariate lesion symptom mapping for predicting trajectories of recovery from aphasia

Abstract

Individuals with post-stroke aphasia tend to recover their language to some extent; however, it remains challenging to reliably predict the nature and extent of recovery that will occur in the long term. The aim of this study was to quantitatively predict language outcomes in the first year of recovery from aphasia across multiple domains of language and at multiple timepoints post-stroke. We recruited 217 patients with aphasia following acute left hemisphere ischaemic or haemorrhagic stroke and evaluated their speech and language function using the Quick Aphasia Battery acutely and then acquired longitudinal follow-up data at up to three timepoints post-stroke: 1 month (n = 102), 3 months (n = 98) and 1 year (n = 74). We used support vector regression to predict language outcomes at each timepoint using acute clinical imaging data, demographic variables and initial aphasia severity as input. We found that ∼60% of the variance in long-term (1 year) aphasia severity could be predicted using these models, with detailed information about lesion location importantly contributing to these predictions. Predictions at the 1- and 3-month timepoints were somewhat less accurate based on lesion location alone, but reached comparable accuracy to predictions at the 1-year timepoint when initial aphasia severity was included in the models. Specific subdomains of language besides overall severity were predicted with varying but often similar degrees of accuracy. Our findings demonstrate the feasibility of using support vector regression models with leave-one-out cross-validation to make personalized predictions about long-term recovery from aphasia and provide a valuable neuroanatomical baseline upon which to build future models incorporating information beyond neuroanatomical and demographic predictors.


Introduction
For an aphasia-friendly version of this paper, please see Supplementary Fig. 1.
Aphasia, an acquired disorder of language, is a common and debilitating consequence of stroke.[20][21][22] Initial language presentation has also been reported as a powerful predictor of long-term outcome,2,3,5,12,23 though this measure is primarily a function of lesion location and extent.
The aim of the present study is to quantitatively predict language outcomes longitudinally in the first year of recovery from aphasia, across multiple domains of language and at multiple timepoints post-stroke. The ability to make such predictions is important for clinical reasons, such as providing data-driven expectations to patients and their loved ones and increasing clinicians' ability to anticipate treatment needs in the context of clinical care. It is also important for neuroscientific reasons, as patterns of predictive utility of a model across language functions can provide insight into the extent to which distinct language subdomains can be mapped onto distinct neural substrates. Finally, such a model could provide a baseline upon which to assess the relative influence of other factors such as functional reorganization on long-term language outcomes.
Though other studies have investigated recovery from aphasia longitudinally, most have not included image-based metrics among their predictors.2,3,5,6,8,23 Thus, no existing studies have aimed to account for the multidimensional and highly variable nature of aphasia recovery in a simultaneously longitudinal, lesion-informed, comprehensive and reliable manner.
Here, we use support vector regression (SVR) to predict scores on a multidimensional language battery at multiple timepoints post-stroke using demographic, language, lesion extent and lesion location-based predictors as input.

Participants
A total of 217 individuals with aphasia were included in this study. All patients presenting at the Vanderbilt Stroke and Cerebrovascular Center at Vanderbilt University Medical Center were considered for inclusion. For our broader aphasia recovery project of which this study is a part,4 our inclusion criteria were (i) acute ischaemic or haemorrhagic stroke predominantly confined to left hemisphere supratentorial regions, or right hemisphere stroke with aphasia clearly indicating right hemisphere language dominance; (ii) age 18-90 years; and (iii) infarct at least 1 cm³, with the following exceptions: (i) thalamic infarcts were included regardless of extent, and (ii) starting after ∼21 months of data collection, basal ganglia and/or subcortical white matter infarcts were included only if they exceeded ∼6 cm. Our exclusion criteria were (i) unconscious with grave prognosis; (ii) not fluent in English premorbidly; (iii) prior symptomatic stroke significantly impacting language regions or homotopic regions, neurodegenerative disease or any other neurological condition impacting language or cognition; (iv) major psychiatric disorder; and (v) substance abuse serious enough to interfere with study participation. One thousand and fifty-five patients met the first inclusion criterion and were evaluated for inclusion, and ultimately, 354 met all criteria and consented to participate.4 For the present analysis, we focused only on patients who presented with aphasia acutely (n = 218), but we excluded one patient who had only mild aphasia despite an extensive left middle cerebral artery lesion, representing clear evidence for right hemisphere language lateralization, yielding our final sample of 217 individuals.

Speech and language evaluations
Speech and language evaluation was completed at each timepoint using the Quick Aphasia Battery (QAB31; Fig. 1A). The QAB is a valid, reliable and time-efficient aphasia assessment consisting of eight subtests, from which a QAB overall score is derived, as well as seven subscores reflecting speech and language domains: single-word comprehension, sentence comprehension, word finding, grammatical construction, speech motor programming (i.e. absence of apraxia of speech), repetition and reading. We also examined speech motor execution, i.e. absence of dysarthria, which is scored as part of the QAB but does not contribute to the overall score. Scores vary on a scale from 0 (complete impairment) to 10 (no impairment/normal performance). Patients who were untestable at early timepoints but presumed (later confirmed) to be aphasic were assigned a QAB overall score of 0 (maximally impaired), while their subscores were treated as missing. Subscores were occasionally missing at other timepoints for various idiosyncratic reasons, e.g. limited baseline reading ability preventing assessment of reading difficulties due to stroke. These scores were treated as missing, and modified procedures were used to calculate QAB overall scores where necessary.4 All language evaluations were administered by certified speech-language pathologists (authors J.L.E., S.M.S. or C.F.O.).
QAB evaluations were sought from all eligible patients within the first 5 days after stroke. For those patients who presented with aphasia on initial evaluation or were untestable acutely and presumed likely to have aphasia, follow-up evaluations were sought at 1 month, 3 months and 1 year post-stroke. Note that, while the QAB defines the quantitative cut-off for aphasia as a QAB overall score of 8.9, diagnoses of aphasia were made using clinical impression as the gold standard. Of the 217 individuals with aphasia included in the study, 199 were formally tested using the QAB while 18 were untestable acutely but were found to be aphasic on follow-up (mean overall score at 1 month = 4.93 ± 2.33, range 0-8.05). The majority of these patients had extensive left hemisphere lesions (mean lesion size = 146.75 cm³, SD = 107.22 cm³, range = 6.32-376.56 cm³).
Among individuals who were testable acutely, there was no difference in initial severity between patients for whom follow-up data were obtained (mean overall score = 5.57 ± 2.69) versus not obtained [mean overall score = 6.04 ± 2.68, t(197) = 1.25, P = 0.21]. Demographic information at each timepoint is available in Table 1. There was no difference in the distribution of initial scores among the followed-up patients at any timepoint, suggesting no sampling bias towards patients who were initially less impaired in the longitudinal data (Supplementary Fig. 2).
Audio and video were recorded for all sessions, which were then transcribed, scored and reviewed in consensus meetings attended by four to six authors.

Neuroimaging
As part of their clinical care, all patients who come through Vanderbilt University Medical Center with suspected stroke undergo a brain MRI and/or head CT to identify the presence, location and extent of neural damage. Lesions were delineated manually on these images by trained personnel (authors D.F.L. and M.R.; Fig. 1B). Coregistration and normalization of lesions were carried out as described in Wilson et al.4 prior to smoothing with an 8 mm full width at half maximum Gaussian kernel. An overlay of the resulting lesion masks for the full data set is displayed in Fig. 1C.
The resulting lesion masks were transformed into vector space representations, henceforth referred to as lesion load vectors (LLVs), via calculation of the overlap of each patient's lesion mask with 150 spatial regions of interest (ROIs) in the left hemisphere of a custom combined grey matter and white matter atlas (based on Mori et al.33 and Fan et al.34; Fig. 2A). This atlas was designed to afford sufficient granularity across broad swaths of language cortex that are known to be heterogeneous in nature,35 in particular the ability to distinguish between the superior temporal sulcus and the adjacent superior and middle temporal gyri. The resulting atlas consisted of 123 left hemisphere grey matter ROIs, 21 left hemisphere white matter ROIs and the left hemisphere portions of six commissural tracts. Each patient's LLV consisted of 150 values between 0 and 1 representing the proportion of each ROI that was lesioned (Fig. 2B).
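The LLV computation described above can be sketched as follows. This is a minimal illustration under the assumption that the lesion mask and atlas are already co-registered arrays of matching shape; the function name and toy data are hypothetical, not from the authors' pipeline:

```python
import numpy as np

def lesion_load_vector(lesion_mask, atlas, n_rois=150):
    """Proportion of each atlas ROI overlapped by a binary lesion mask.

    lesion_mask : array of 0/1 values (1 = lesioned voxel), in atlas space.
    atlas       : integer array of the same shape, ROI labels 1..n_rois
                  (0 = background).
    Returns a vector of n_rois values between 0 and 1.
    """
    llv = np.zeros(n_rois)
    for roi in range(1, n_rois + 1):
        roi_voxels = atlas == roi
        n_voxels = roi_voxels.sum()
        if n_voxels > 0:
            llv[roi - 1] = lesion_mask[roi_voxels].sum() / n_voxels
    return llv

# Toy 2-ROI example: the lesion covers half of ROI 1 and none of ROI 2
atlas = np.array([[1, 1], [2, 2]])
lesion = np.array([[1, 0], [0, 0]])
llv = lesion_load_vector(lesion, atlas, n_rois=2)
```

In practice the arrays would be 3-D volumes loaded from the normalized lesion masks and the custom atlas, but the overlap calculation is the same.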

Model fitting
SVR with a linear kernel was chosen to model relationships between predictors and language scores due to its ability to handle high-dimensional input data, lack of sensitivity to outliers and resistance to overfitting.36,37 We sought to follow best practices in multivariate lesion symptom mapping (e.g. full independence of training/testing data and use of appropriate metrics of prediction accuracy; see Scheinost et al.38 for details).
Two main sets of models were constructed to predict QAB overall and each of the eight domain-specific subscores at each timepoint.
The first set of models will be referred to as LLV models. These models attempted to predict speech/language measures based on lesion location, as encoded in the 1 × 150 LLVs. Also included in the models were lesion extent, age, sex, handedness, years of education and stroke type (ischaemic/haemorrhagic). All of these additional variables were min-max scaled.
The second set of models will be referred to as LLV + initial presentation (LLV + IP) models. These models contained the same explanatory variables just described but also included patients' overall scores at the initial timepoint. Because initial scores were included as inputs, these models were only constructed for the 1-month, 3-month and 1-year timepoints. This set of models reflects potential clinical applications, in which lesion location and IP are known, and the goal is to predict subsequent trajectories.
Two sets of reduced models were also generated for each of the LLV and LLV + IP models: the first excluding LLVs (so that prediction was based only on lesion extent, demographic and stroke type variables, plus IP in the case of LLV + IP) and the second also excluding lesion extent (so that prediction was based solely on demographic and stroke type variables, plus IP in the case of LLV + IP).
All models were fit using the 'fitrsvm' function in MATLAB R2022b using the default parameters for linear SVR (box constraint = 1, epsilon = interquartile range of the response variable / 13.49, and gamma = 1). Following model fitting, predictions were capped to the range of possible scores (0-10).
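As a rough illustration of this model configuration, the sketch below approximates the MATLAB fitrsvm defaults in scikit-learn. The function names and synthetic data are ours, not the authors'; note that scikit-learn's SVR does not derive epsilon from the interquartile range by default, so it is supplied explicitly:

```python
import numpy as np
from sklearn.svm import SVR

def fit_linear_svr(X, y):
    """Linear epsilon-SVR approximating MATLAB fitrsvm defaults:
    box constraint C = 1, epsilon = IQR(response) / 13.49."""
    q75, q25 = np.percentile(y, [75, 25])
    return SVR(kernel="linear", C=1.0, epsilon=(q75 - q25) / 13.49).fit(X, y)

def predict_capped(model, X, lo=0.0, hi=10.0):
    """Predict, then cap predictions to the range of possible scores (0-10)."""
    return np.clip(model.predict(X), lo, hi)

# Synthetic example: 40 "patients", a 150-dim LLV plus 6 scaled covariates
rng = np.random.default_rng(0)
X = rng.random((40, 156))
y = np.clip(10 - 8 * X[:, 0] + rng.normal(0, 0.5, 40), 0, 10)  # toy QAB scores
model = fit_linear_svr(X, y)
preds = predict_capped(model, X)
```

The capping step mirrors the paper's post hoc restriction of predictions to the 0-10 QAB scale.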

Assessment of predictive accuracy
Model generalizability was evaluated using a leave-one-out cross-validation procedure, in which each patient was held out in turn to have their score predicted from a model based on data from the remaining patients.
Model performance was evaluated using prediction r² as defined in Alexander et al.,39 corresponding to the ratio of the difference between each observed value and its predicted value compared to the difference between each observed value and the mean (that is, how much better the model performs than simply guessing the mean response value). Note that prediction r² is more conservative than the oft-reported squared correlation coefficient; note also that prediction r² can be negative in cases where the model performs worse than predicting the mean, which may occur in threshold-based model-fitting procedures such as epsilon-insensitive SVR when predictors are not actually informative.
Prediction r² is a particularly conservative metric in the context of ceiling effects, as it is penalized in a manner that increases with decreasing variance in the observed data39; therefore, predictive accuracy will be assessed as worse when the true scores to be predicted fall within a narrow range. We report root mean squared error in Supplementary Tables 1 and 2 as a complementary metric to reflect raw prediction accuracies unaffected by underlying variance.
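The evaluation procedure described above — leave-one-out prediction followed by prediction r² and RMSE — can be sketched as follows (helper names are ours, and the SVR factory is a stand-in for the paper's MATLAB configuration):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVR

def prediction_r2(y_true, y_pred):
    """1 - SS_res / SS_tot relative to the observed mean; negative when the
    model does worse than simply guessing the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, unaffected by the variance of y_true."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def loo_predict(X, y, make_model=lambda: SVR(kernel="linear")):
    """Each patient's score is predicted by a model trained on all others."""
    preds = np.empty(len(y))
    for train, test in LeaveOneOut().split(X):
        preds[test] = make_model().fit(X[train], y[train]).predict(X[test])
    return preds
```

A perfect model yields prediction r² = 1, while always guessing the observed mean yields exactly 0, which is why this metric is stricter than a squared correlation coefficient.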

Topographic mapping using feature weights
In order to investigate the potential neural bases of greater long-term aphasia severity, feature weights (i.e. model regression coefficients) in which higher values of the predictor were associated with lower QAB overall scores were extracted from the LLV model at the 1-year timepoint, with a threshold of 1 SD from the mean feature weight. Note that there are currently no agreed-upon guidelines for assessing the statistical significance of SVR-based beta weights,40 and thus, these features serve only as a preliminary means of understanding some of the neural regions that may play the biggest role in the prediction of aphasia outcomes.
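A simple way to extract such salient negative weights from a fitted linear model, under our reading of the thresholding procedure, might look like the sketch below (names are illustrative; for a linear scikit-learn SVR the weight vector would come from `model.coef_.ravel()`, aligned with the ROI names):

```python
import numpy as np

def salient_negative_weights(coefs, names, n_sd=1.0):
    """ROIs whose weights fall more than n_sd SDs below the mean weight,
    i.e. where damage is most strongly associated with lower predicted
    scores; returned most negative first."""
    coefs = np.asarray(coefs, dtype=float)
    threshold = coefs.mean() - n_sd * coefs.std()
    idx = np.where(coefs < threshold)[0]
    idx = idx[np.argsort(coefs[idx])]  # most negative first
    return [(names[i], float(coefs[i])) for i in idx]

# Toy example: only the fourth "ROI" has a weight > 1 SD below the mean
top = salient_negative_weights([0.0, 0.0, 0.0, -10.0], ["a", "b", "c", "d"])
```

This is purely descriptive ranking, consistent with the caveat above that no significance test is applied to the weights.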

Results
For a descriptive account of trajectories of recovery across the data set at large, see Wilson et al. 4 (Note that slight discrepancies in reported numbers are due to exclusion of one patient with clear right hemisphere language lateralization in the current paper.)

LLV models
These models included information about lesion location and extent, as well as age, sex, handedness, education and stroke type, but no information about IP.
Reduced models including only demographic and stroke type information had little to no predictive power, as expected.
Predictive power varied for the nine QAB subscores (Fig. 3, Supplementary Table 1). Word finding and grammatical construction were predicted particularly well across all timepoints, while single-word comprehension, speech motor execution and reading proved more difficult to predict.
LLVs improved performance in 34 out of 36 (timepoints by subscores) cases, indicating that specific information about the lesion site is critical to optimize prediction.

LLV + IP models
These models included information about IP (as measured by QAB overall at the acute timepoint) along with lesion location and extent, age, sex, handedness, education and stroke type.
QAB overall was predicted with r² = 0.64 at the 1-month timepoint, r² = 0.58 at the 3-month timepoint and r² = 0.60 at the 1-year timepoint (Fig. 4). (Note that prediction at the acute timepoint was not included because acute scores were among the model predictors.) For QAB overall score, reduced models including only IP, demographic and stroke type information were already relatively predictive of outcomes at the 1-month timepoint; however, this predictive utility of the reduced models decreased notably at later timepoints. This contrasts with the full models, which either retained or increased their predictive utility as time post-stroke increased. As above, predictive power varied for the nine QAB subscores (Fig. 4, Supplementary Table 2). Word finding, grammatical construction, speech motor programming and repetition were predicted particularly well across all timepoints, while single-word comprehension, speech motor execution and reading again proved more difficult to predict.
LLVs improved performance in 19 out of 27 (timepoints by subscores) cases, again most notably as time post-stroke increased.This pattern was particularly salient for the sentence comprehension, grammatical construction, reading and repetition subscores.

Neural predictors of overall aphasia severity
In order to investigate which regions may be most associated with aphasia severity in the long term, we probed feature weights for the QAB overall predictions at the 1-year timepoint using the LLV model. Grey matter predictors of lower QAB scores included the left superior temporal gyrus (STG), precentral gyrus, orbital gyrus and basal ganglia; white matter predictors included the left anterior corona radiata, retrolenticular internal capsule, genu of the corpus callosum, sagittal stratum and superior longitudinal fasciculus (Fig. 5).

Discussion
Our findings indicate that a great deal of the variance in long-term recovery from aphasia can be effectively predicted using SVR-based lesion-symptom mapping models and show that information about the location of a lesion, beyond simply its size, is in many cases crucial for making these predictions. This finding holds true even in cases where initial severity is accounted for, particularly at later timepoints post-stroke. Strengths of our study include its large and representative sample, its prospective longitudinal design, its detailed characterization of language using a validated aphasia battery and its careful consideration of best practices in multivariate lesion symptom mapping.38 This work is the first to our knowledge to systematically predict language outcomes for multiple predefined timepoints and on multiple language domains post-stroke. This work provides a quantitative follow-up to a recent descriptive study detailing trajectories of recovery from aphasia based on acute neuroimaging.4 Across models and timepoints, QAB overall and word finding were the outcomes that could be predicted most reliably, while outcomes in single-word comprehension, speech motor execution and reading were more difficult to predict.[42][43] The apparent lack of predictive ability for single-word comprehension and speech motor execution may reflect the fact that these deficits tend to resolve well in the long term,4 leaving minimal available variance for the models to learn from or predict. Reading, however, demonstrated poor prediction accuracy despite showing more variable outcomes in the long term. Future work should aim to investigate the ability to prognosticate reading outcomes in more detail using evaluations that more comprehensively account for various profiles of alexia with theoretically distinct anatomical bases.44
Including information about lesion location in the form of LLVs led to improvements in prediction accuracy in most models (34/36 LLV models, 19/27 LLV + IP models). While models that included acute QAB overall score sometimes performed well at the 1-month timepoint even without the inclusion of detailed lesion information, the addition of lesion load information regularly led to increases in predictive ability at the 3-month and 1-year timepoints. This pattern may reflect the complex nature of the acute post-stroke period, in which various factors not captured by our models, such as hypoperfusion, diaschisis and/or other medical complications, exert more influence, compared to later timepoints by which these issues have largely resolved, rendering lesion location a clearer predictor.
Our finding that lesion location-based predictions are more accurate at later timepoints is in line with prior work demonstrating transience and changeability in aphasia particularly in the early post-stroke period4,45,46; however, it stands in opposition to a theorized 'proportional recovery rule' stating that individuals with stroke tend to recover some fixed proportion of their lost function.23,47 It is important to note, however, that the original claims in these proportional recovery studies were limited by small sample sizes.[49][50] Thus, while initial language presentation may be a good predictor of outcomes in the short term, information about the integrity of specific anatomical regions may be more useful for effectively predicting outcomes in the chronic stage. Regions in which damage was the most associated with greater aphasia severity in the long term fell in both grey and white matter: specifically, the left posterior STG,
precentral gyrus, orbital gyrus, middle frontal gyrus and basal ganglia in grey matter and the anterior corona radiata, retrolenticular internal capsule, genu of the corpus callosum, sagittal stratum and superior longitudinal fasciculus in white matter.[57] These regions are thus reasonably expected correlates of long-term impairment in language. However, the absence of 'Broca's area' as a predictor of long-term impairment is noteworthy and is in line with prior work demonstrating that most aphasias following lesions to this region are transient in nature.[58][59][60][61] Regarding white matter predictors, the extent to which grey versus white matter measures are valuable for prediction is disputed, with some researchers suggesting that metrics of structural connectivity increase predictive accuracy24,29 and others claiming that white matter information is largely redundant with grey matter measures.27,28 Prior work has, however, noted the importance of white matter 'bottlenecks' in left frontal and temporoparietal regions for supporting language function,[62][63][64] which aligns with our finding that the top two strongest predictors of overall outcomes were in anterior regions of the corona radiata and posterior regions of the internal capsule, close to these proposed bottlenecks.
While this study is the first to our knowledge to specifically predict longitudinal language outcomes across multiple domains of language post-stroke, a handful of previous studies have used similar approaches to explore the extent to which post-stroke language abilities can be predicted using machine learning analyses of neuroimaging data. Most of these studies have been cross-sectional in nature, that is, investigating language performance in chronic cohorts at a single timepoint without reference to their acute presentation. Among these cross-sectional studies, some chose aphasia subtypes or global measures of aphasia severity as their outcomes of interest25,26; others predicted more specific measures but achieved only modest predictive accuracy in out-of-sample testing, e.g. r² = 0.44-0.49,24,29,30 even as calculated using the squared correlation coefficient (a more liberal metric of predictive accuracy than the prediction r² reported here39). To our knowledge, the previous study most similar to the present study is Hope et al.,14 one of the only studies using multivariate lesion symptom mapping to make an explicit attempt to account for recovery. This study used Gaussian process regression based on structural imaging data and clinical variables to predict a measure of speech production derived from the Comprehensive Aphasia Test3 at both single and multiple timepoints. However, while this study had a large initial sample size of 270 total patients, only 38 individuals were assessed longitudinally, and these individuals varied widely in the times of assessment post-stroke. Additionally, the study focused only on speech production.
The ability to accurately predict aphasia outcomes as demonstrated herein could have a positive impact on clinical practice and individuals living with aphasia. First, a better baseline understanding of expected trajectories of recovery from aphasia lays the groundwork for assessing the efficacy of treatment in clinical practice and/or clinical trials. Second, the ability to provide a patient with a sense of what recovery is likely to look like 'for them', specifically, could help to set realistic expectations for the patient, their loved ones and their clinical team, such that appropriate strategies for managing impairment and collaborative goal setting could be put into place.65 Finally, while speech-language pathologists tend to recognize the importance of neuroanatomical awareness in clinical practice,66,67 neuroanatomical information is often found intimidating68 and can be poorly retained.69 Thus, developing algorithms which can help to 'interpret' neuroimaging data, using technology similar to the models described here, may help clinicians across the spectrum of care more easily make neuroanatomically informed predictions for patients.
Regarding the real-world applicability of using neuroimaging to predict language recovery, Shuster70 has raised concerns about prior attempts at this aim, citing, for example, a lack of regard for individual differences, poor validation on independent data sets, inaccessibility of scanner environments for certain patients and inattention to predictors that do not relate directly to the academic hypotheses in question. We have addressed many of these concerns in the present study: individual differences are accounted for via the positioning of patients in a multidimensional symptom space; leave-one-out cross-validation helps to handle the risk of overfitting; patients who were not MRI safe are included via drawing lesions on CTs; demographic and non-lesion-based predictors are already included, with even more predictors planned for inclusion in the future. Nevertheless, this work should simply be considered an early step towards a better understanding of the myriad factors that can influence language recovery, considered in tandem with other individual patient characteristics, therapeutic interventions and changes in neural function due to neuroplasticity. Indeed, machine learning approaches are simply models and should always be considered as a supplement to, rather than a replacement for, clinical expertise.

Limitations
This study has several notable limitations. First, many of the limitations noted in Wilson et al.4 remain relevant to this follow-up study. As noted therein, the QAB is designed to be brief and therefore cannot comprehensively account for all aspects of language and associated functions; lesions were delineated using only acute neuroimaging, which may not be entirely reflective of irreversible neural damage; and sample sizes decreased longitudinally, with smaller sample sizes at later timepoints. Though there were no differences in severity across patients with and without follow-up timepoints, future studies with larger sample sizes at later timepoints will be necessary to verify the findings reported here.
Second, we chose to use within-sample leave-one-out cross-validation to assess the predictive accuracy of our models. Although the training and validation data used in our cross-validation procedure were fully independent, we were not able to hold out a truly independent test set to evaluate final model performance without sacrificing the power of our sample size. As data from future patients are collected, these new data will become the test set upon which the true generalizability of our models can be assessed. Prior work has discussed potential pitfalls of leave-one-out cross-validation, in particular the potential for anti-correlation between training and testing data in the presence of high variance across test exemplars.71 However, it is important to note that there are trade-offs incurred by all methods of cross-validation.38,72 Given our relatively small sample sizes at later timepoints and the relatively consistent lesion-symptom relationships observed, the bias-variance trade-offs incurred by using leave-one-out cross-validation were deemed preferable to those associated with holding out larger testing sets, especially decreased power to detect relationships between predictors and language symptoms. This issue could, again, be addressed in future studies with larger samples.
Finally, the reporting of neural correlates of language outcomes using beta weights to ascribe importance to particular spatial predictors of aphasia severity is experimental. The interpretation of feature weights in machine learning models, even in linear models as used here, is not straightforward due to the fact that they are calculated to meet algorithm-specific regularization constraints, rather than to model a direct relationship with the behavioural variable in question.40,73,74 Thus, these results should be interpreted with caution.

Conclusion
This study is the first to systematically predict outcomes for multiple predefined timepoints and on multiple speech and language domains post-stroke, explaining about three-fifths of the variance in aphasia outcome at 1 year. Our findings demonstrate that information about lesion location is crucial for making many of these predictions, particularly at later timepoints post-stroke. This work both demonstrates the feasibility of using SVR models to make precise and personalized predictions about long-term recovery from aphasia and provides a valuable structural baseline upon which to build more elaborate models, including information about functional language organization, brain health, diffusion tractography and/or speech and language therapy. Such models could help to further clarify what is different when, structural damage being equal, recovery is more successful in some individuals than others. Taken together, these scientific endeavours will aid both clinicians and scientists by providing a more effective means to predict outcomes in aphasia and by further elucidating the neural bases of language.

Figure 1
Figure 1 Overview of methods. (A) Example slides from the QAB31 (used with permission from copyright holder). (B) Examples of manual delineation (top) and normalization (bottom) on different imaging types; left shows diffusion-weighted imaging as used for ischaemic strokes and right shows fluid-attenuated inversion recovery imaging as used for haemorrhagic strokes. (C) Overlay of lesions included in the full data set.

Figure 3
Figure 3 Model performance for the lesion load models. (Left) Scatter plots comparing actual (y-axis) and predicted (x-axis) scores on the QAB overall as well as eight subscores (rows) in models using lesion load, lesion size and demographic information as predictors. The four columns show the acute (1-5 days post-stroke) timepoint, 1-month timepoint, 3-month timepoint and 12-month timepoint. Sample size and prediction r² are displayed for each model. Grey identity lines are plotted for reference to show how perfect prediction accuracy would appear. (Right) Bar plots showing prediction r² across all timepoints for QAB overall and the eight subscores (rows). Unfilled bars correspond to models using demographic-only predictors, shaded/striped bars correspond to models using demographic and lesion size predictors and solid bars correspond to models using demographic, lesion size and lesion load/location predictors. Sample sizes for each group of bars within a plot are equal and match those listed on the scatter plot for the corresponding subscore and timepoint. QAB overall, Quick Aphasia Battery overall score; Word comp, single-word comprehension; Sentence comp, sentence comprehension; Gram constr, grammatical construction; Speech mot prog, speech motor programming; Speech mot exec, speech motor execution.

Figure 4
Figure 4 Model performance for the lesion load + initial severity models. As in Fig. 3, except that model performance reflects the inclusion of initial QAB overall score as an additional predictor. Note that acute predictions are not shown due to the presence of an acute score among the model predictors.

Figure 5
Figure 5 Neural predictors of long-term aphasia severity. Regions corresponding to negative feature weights 1 SD more extreme than the mean in the lesion load model, in which damage was predictive of lower overall QAB scores at the 1-year timepoint. Brighter regions correspond to beta weights associated with larger reductions in QAB overall at the 1-year timepoint.