Impaired value-based decision-making in Parkinson’s disease apathy

Abstract Apathy is a common and disabling complication of Parkinson’s disease characterized by reduced goal-directed behaviour. Several studies have reported dysfunction within prefrontal cortical regions and projections from brainstem nuclei whose neuromodulators include dopamine, serotonin and noradrenaline. Work in animal and human neuroscience has confirmed contributions of these neuromodulators to aspects of motivated decision-making. Specifically, these neuromodulators make overlapping contributions to encoding the value of decisions, and influence whether to explore alternative courses of action or persist in an existing strategy to achieve a rewarding goal. Building upon this work, we hypothesized that apathy in Parkinson’s disease should be associated with an impairment in value-based learning. Using a four-armed restless bandit reinforcement learning task, we studied decision-making in 75 volunteers: 53 patients with Parkinson’s disease, with and without clinical apathy, and 22 age-matched healthy control subjects. Patients with apathy exhibited an impaired ability to choose the highest value bandit. Task performance predicted an individual patient’s apathy severity measured using the Lille Apathy Rating Scale (R = −0.46, P < 0.001). Computational modelling of the patients’ choices confirmed that the apathy group made decisions that were indifferent to the learnt value of the options, consistent with previous reports of reward insensitivity. Further analysis demonstrated a shift away from exploiting the highest value option and a reduction in perseveration, which also correlated with apathy scores (R = −0.5, P < 0.001). We went on to acquire functional MRI in 59 volunteers: 19 patients with apathy, 20 patients without apathy and 20 age-matched controls, all performing the restless bandit task.
Analysis of the functional MRI signal at the point of reward feedback confirmed diminished signal within ventromedial prefrontal cortex in Parkinson’s disease, which was more marked in apathy but not predictive of individual apathy severity. Using a model-based categorization of choice type, decisions to explore lower value bandits activated prefrontal cortex in the apathy group to a similar degree as in the age-matched controls. In contrast, Parkinson’s patients without apathy demonstrated significantly increased activation across a distributed thalamo-cortical network. Enhanced activity in the thalamus predicted individual apathy severity across both patient groups and exhibited functional connectivity with dorsal anterior cingulate cortex and anterior insula. Given that task performance in patients without apathy was no different from that of the age-matched control subjects, we interpret the recruitment of this network as a possible compensatory mechanism that protects against the symptomatic manifestation of apathy in Parkinson’s disease.

Given our a priori hypothesis that a perseverative strategy might reflect heightened effort sensitivity in apathy 1,2, we also included two further choice rules, which influence choice repetition by adding a perseveration bonus, ρ, to the value of a bandit in the softmax whenever that bandit was also chosen on the previous trial, as indexed by the indicator I_{c_{t−1}=i}.
In both the SMP and SMEP choice rules, the variable I is either 1 or 0 depending upon whether the chosen bandit i on trial t is the same as that chosen on the previous trial t−1.

Choice rule 3 (SMP):

P(c_t = i) = exp(β·v̂_t(i) + ρ·I_{c_{t−1}=i}) / Σ_j exp(β·v̂_t(j) + ρ·I_{c_{t−1}=j})

where v̂_t(i) is the current value estimate for bandit i, and I_{c_{t−1}=i} = 1 if bandit i was chosen on trial t−1 and 0 otherwise.
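As a concrete illustration, the perseveration-weighted softmax can be sketched in Python (a minimal sketch following the description above; the names `beta` and `persev_bonus` are ours, not from any released analysis code):

```python
import numpy as np

def smp_choice_probs(values, beta, persev_bonus, prev_choice):
    """SMP choice rule: softmax over bandit values, with an additive
    bonus for the bandit chosen on the previous trial.
    prev_choice is None on the first trial."""
    v = np.asarray(values, dtype=float) * beta
    if prev_choice is not None:
        v[prev_choice] += persev_bonus   # perseveration bonus via I_{c_{t-1}=i}
    v -= v.max()                         # subtract max for numerical stability
    p = np.exp(v)
    return p / p.sum()
```

With a positive bonus, the probability of repeating the previous choice rises relative to the plain softmax; a negative bonus instead promotes switching.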

Bayesian Learner
The Bayesian learner implements the Kalman filter. For bandits that are not chosen on any one trial, the prior mean and variance remain unchanged within that trial. However, the prior distributions are updated for all bandits between trials, based upon the subject's belief about the Gaussian random walk, so that:

μ̂_{t+1,i} = λ·μ̂_{t,i} + (1 − λ)·θ
σ̂²_{t+1,i} = λ²·σ̂²_{t,i} + σ_d²

where λ, θ and σ_d are constants representing the decay parameter, decay centre and diffusion noise, and were fixed at values of 0.98, 50 and 2.8 for each respective parameter. We used the same values as Chakroun et al. (2020) 4, which are derived from the actual parameters that govern the Gaussian random walk determining each bandit's payout.
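One trial of this learner can be sketched as follows (a minimal illustration: the observation-noise variance `obs_var` is a hypothetical value supplied for the sketch, not a parameter stated in the text):

```python
import numpy as np

# Constants of the Gaussian random walk, as given in the text
LAMBDA, THETA, SIGMA_D = 0.98, 50.0, 2.8

def kalman_update(mu, var, choice, reward, obs_var=16.0):
    """One trial of the Kalman-filter learner for a 4-armed restless bandit.
    Within the trial, only the chosen bandit's posterior is updated;
    between trials, all bandits' priors drift toward the decay centre."""
    mu, var = mu.copy(), var.copy()
    k = var[choice] / (var[choice] + obs_var)   # Kalman gain
    mu[choice] += k * (reward - mu[choice])     # prediction-error update
    var[choice] *= (1 - k)                      # posterior variance shrinks
    # Between-trial update from the belief about the random walk
    mu = LAMBDA * mu + (1 - LAMBDA) * THETA
    var = LAMBDA**2 * var + SIGMA_D**2
    return mu, var
```

Note that unchosen bandits' variances grow each trial through the diffusion term, which is what the uncertainty-based exploration bonus rules exploit.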

Delta rule
The delta learning rule has a fixed learning rate, α, and there is no variance tracking for the uncertainty of the valuation. After bandit c_t is chosen and the reward r_t for that bandit is obtained, the prediction error is derived in the same way,

δ_t = r_t − v̂_t(c_t),

and the prediction of the bandit's value can be updated as such:

v̂_{t+1}(c_t) = v̂_t(c_t) + α·δ_t.

There is no decay in this model between trials; the estimated value for a bandit is only updated when that bandit is selected. In the absence of an equivalent estimate of uncertainty, the delta rule cannot be combined with the uncertainty-based exploration bonus choice rules.
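The delta rule update is correspondingly simple; a minimal Python sketch (variable names are ours):

```python
def delta_update(values, choice, reward, alpha):
    """Delta-rule update: only the chosen bandit's value estimate changes."""
    values = list(values)
    delta = reward - values[choice]      # prediction error
    values[choice] += alpha * delta      # fixed-learning-rate update
    return values, delta
```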

Model Fitting
Posterior parameter distributions were estimated for each subject for each of the free parameters specific to the learning rule (the learning rate α in the delta rule) and to the choice rules of each model variant (the softmax inverse temperature β and, where included, the exploration and perseveration bonuses).
Parameter estimates were derived using hierarchical Bayesian modelling within Stan (version 2.17.0; Stan Development Team, 2017) via its Python interface. Sampling was performed with four chains, each run after a warm-up period of 5000 iterations, for a total of 20,000 samples.
The prior for each group-level mean was uniformly distributed. For each group-level standard deviation, a half-Cauchy distribution with location parameter 0 and scale parameter 1 was used as a weakly informative prior 6. Priors for all subject-level parameters were normally distributed with a parameter-specific mean and standard deviation.
Group-level posterior distributions for the free parameters were estimated separately for each group (PD-Apathy, PD-Non-apathy and HC). Posterior means and highest density intervals (HDI) were calculated using the 'bayestestR' and 'tidybayes' packages in R (version 4.1.1, http://www.r-project.org) 7,8.
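The highest density interval is straightforward to compute from posterior samples; the following is our own minimal illustration of the idea (the narrowest interval containing a given fraction of the mass), not the bayestestR implementation:

```python
import numpy as np

def hdi(samples, cred=0.95):
    """Highest density interval: the narrowest interval containing
    a fraction `cred` of the sorted posterior samples."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    m = int(np.ceil(cred * n))                # samples the interval must cover
    widths = s[m - 1:] - s[:n - m + 1]        # width of every candidate interval
    i = int(np.argmin(widths))                # narrowest one
    return s[i], s[i + m - 1]
```

For skewed posteriors the HDI differs from the equal-tailed credible interval, which is why it is the usual summary for bounded or asymmetric parameters.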

fMRI methods
Preprocessing was performed in SPM12 (http://www.fil.ion.ucl.ac.uk/spm). During this phase, images were realigned, co-registered and normalised to the SPM12 Montreal Neurological Institute (MNI) echo planar imaging template, then smoothed using a 6 mm full-width at half-maximum (FWHM) Gaussian kernel.
The first-level general linear model (GLM) analysis included two time points: the trial onset, when the four "bandits" were displayed, and the outcome (payout) presentation time. To create regressors for these time points, we convolved the event onsets with the canonical haemodynamic response function (HRF).
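Conceptually, this convolution step can be sketched as follows (a simplified illustration using a double-gamma HRF; the sampling grid and parameter choices here are our assumptions, not the SPM12 implementation):

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds: a response peaking
    around 5 s minus a smaller, later undershoot."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()

def make_regressor(onsets, n_scans, tr):
    """Delta (stick) functions at the event onsets, convolved with the
    HRF and truncated to the scan length."""
    stick = np.zeros(n_scans)
    stick[(np.asarray(onsets) / tr).astype(int)] = 1.0
    return np.convolve(stick, canonical_hrf(tr))[:n_scans]
```

A parametric modulator (e.g. the prediction error at outcome time) is built the same way, with the stick heights set to the mean-centred modulator values instead of 1.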
For the main GLM, first-level contrast images were created for each subject, with four session-related constants and five regressors of interest in the following order: 1) trial onset, 2) outcome onset, 3) choice type, 4) reward prediction error and 5) outcome value. The trial onset regressor was parametrically modulated by the type of choice, coded as [explore = 1, exploit = 0]. The outcome time was modulated by the outcome value and the model-derived prediction error. Orthogonalization between regressors was turned off, as previous work has shown that many of these regressors correlate strongly and could otherwise cancel each other out [44]. Low-frequency noise was removed by employing a temporal high-pass filter with a cut-off frequency of 1/128 Hz, and a first-order autoregressive model, AR(1), was used to remove serial correlations.

Model posterior predictive checks
Replicating previous studies in healthy controls performing the restless bandit task 4,9, the overall winning model was the Bayes-SMEP (Supplementary Figure 2A). Consistent with this model robustly capturing decision-making in the task, model parameters could be successfully recovered from synthetic choice data generated from simulated choices derived from individual subjects' model parameters (Supplementary Figure 2B-D). We found further support for a good model fit by showing that the simulated choices overlapped with the experimental group data (Supplementary Figure 3A-B, D-E). Furthermore, the relationship between choices and individual subjects' LARS scores could also be reproduced from the simulated data (Supplementary Figure 3).

Table 6 Summary of fMRI activations at the decision time during exploration.
All activations survive cluster-level FWE correction at P < 0.05 with an initial uncorrected cluster-defining threshold of P < 0.001; * denotes P < 0.0001 (with k ≥ 10 voxels).