Focused stimulation of dorsal versus ventral subthalamic nucleus enhances action–outcome learning in patients with Parkinson’s disease

Abstract Deep brain stimulation of the subthalamic nucleus is an effective treatment for the clinical motor symptoms of Parkinson’s disease, but may alter the ability to learn contingencies between stimuli, actions and outcomes. We investigated how stimulation of the functional subregions in the subthalamic nucleus (motor and cognitive regions) modulates stimulus–action–outcome learning in Parkinson’s disease patients. Twelve Parkinson’s disease patients with deep brain stimulation of the subthalamic nucleus completed a probabilistic stimulus–action–outcome task while undergoing ventral and dorsal subthalamic nucleus stimulation (within subjects, order counterbalanced). The task orthogonalized action choice and outcome valence, which created four action–outcome learning conditions: action–reward, inhibit–reward, action–punishment avoidance and inhibit–punishment avoidance. We compared the effects of deep brain stimulation on learning rates across these conditions as well as on computed Pavlovian learning biases. Dorsal stimulation was associated with higher overall learning proficiency relative to ventral subthalamic nucleus stimulation. Compared to ventral stimulation, stimulating the dorsal subthalamic nucleus led to a particular advantage in learning to inhibit action to produce desired outcomes (gain reward or avoid punishment) as well as better learning proficiency across all conditions providing reward opportunities. The Pavlovian reward bias was reduced with dorsal relative to ventral subthalamic nucleus stimulation, which was reflected by improved inhibit–reward learning. Our results show that focused stimulation in the dorsal compared to the ventral subthalamic nucleus is relatively more favourable for learning action–outcome contingencies and reduces the Pavlovian bias that could lead to reward-driven behaviour. 
Considering the effects of deep brain stimulation of the subthalamic nucleus on learning and behaviour could be important when optimizing stimulation parameters to avoid side effects like impulsive reward-driven behaviour.


Introduction
Adapting behaviour in novel environments often requires learning new associations between stimulus events, and how to react to these events, to produce desired outcomes. 1,2 However, the discovery of these often subtle or unpredictable patterns in everyday life plays an important role in decision optimization, generally to maximize positive outcomes and minimize negative ones.
A unique feature of stimulus-action-outcome learning is the demonstration of innate biases between actions and outcomes across species. Such biases, termed Pavlovian biases, include 'act to gain a positive outcome' (action-gain reward bias) and 'inhibit action to avoid a negative outcome' (inhibit-punishment avoidance bias). 1,3,4 These action-reward and inhibit-punishment avoidance biases are argued to facilitate decision-making in novel situations by quickly determining which actions should be selected to produce desired outcomes and which actions should be inhibited to avoid undesired outcomes. 5,6 However, these Pavlovian biases come with trade-offs that can disrupt learning flexibility, particularly the learning of alternative contingencies that violate or conflict with these biases. For example, a strong action-reward bias makes it more difficult to learn the link between inhibiting an action and gaining a reward. Similarly, a strong inhibition-punishment avoidance bias makes it more difficult to learn the link between selecting an action and avoiding punishment. 1,5,6 The formation and expression of stimulus-action-outcome associations may be in part organized within frontal-basal ganglia circuitry. 7,8 Patients with a range of neurological and neuropsychiatric disorders associated with altered frontal-basal ganglia circuitry (e.g. Parkinson's disease, obsessive-compulsive disorder and schizophrenia) [9][10][11] experience difficulties learning contingencies between actions and outcomes or, specifically, demonstrate changes in the strength of Pavlovian biases during stimulus-action-outcome learning. 12,13 In individuals with Parkinson's disease, a reduction of dopaminergic activity in the basal ganglia shifts learning biases by weakening the action-reward Pavlovian bias while enhancing action-punishment avoidance learning. 
14 Thus, Parkinson's disease patients (off medication) become more sensitive to actions that help avoid negative outcomes as opposed to actions that generate rewards. Treatments for the clinical motor symptoms of Parkinson's disease, like dopaminergic medication or deep brain stimulation of the subthalamic nucleus (DBS-STN), appear to restore or even amplify the stronger action-reward Pavlovian bias, [15][16][17] which may explain some of the treatment-related increase in reward-driven impulsive behaviour. 18,19 Given indications that basal ganglia circuitries are implicated in stimulus-action-outcome processing, the current study aimed to advance our understanding of how DBS, directed at specific subregions and the associated circuitries within STN, impacts Pavlovian learning biases and shifts action-outcome sensitivities during learning. A clearer picture of STN subregion and circuitry effects could also contribute novel insights about behavioural side effects of DBS (e.g. impulsivity, apathy) as well as inform electrode placement decisions to minimize certain cognitive side effects.
DBS therapy for Parkinson's disease offers a unique opportunity to probe links between basal ganglia function and stimulus-action-outcome learning. With DBS, specific structures, like the STN or the globus pallidus interna of the basal ganglia, can be stimulated directly with high-frequency electrical pulses. The STN may play a unique role in action-outcome decisions and learning because of its putative contribution to the inhibition of action. [20][21][22][23][24][25] Stimulating the STN reduces action thresholds in a global manner, thus restoring Parkinson's disease patients' ability to move more freely but also increasing their risk of reacting prematurely or impulsively. 26 It has been proposed that when a conflict between action choices arises, the cortex activates the STN through the hyperdirect pathway. 27 The excitation of STN suppresses basal ganglia output to the thalamus, which stops or pauses action-generating signals from triggering motor responses in the cortex. 20,28 In Parkinson's disease, the overactive STN induces a strong braking of action that is alleviated by DBS-STN. [30][31][32][33][34][35][36] This suggests that lowering the threshold to act with clinical DBS during reward-learning or reward-based decision-making may amplify the Pavlovian action-reward bias.
While clinical DBS-STN generally targets dorsal motor areas of the STN (innervated by primary motor cortex and supplementary motor cortex), a confounding factor in using clinical DBS parameters is that they produce large stimulation fields, thereby impacting multiple STN subregions and subcircuitries. 22,37 The potential impact of stimulation in STN subregions on learning that involves both action selection and inhibition, as well as processing reward and punishment outcomes, is largely unexplored. There may be dissociable effects on stimulus-action-outcome learning when focal stimulation is directed to either dorsal motor areas of the STN or the relatively ventral cognitive and limbic STN subregions.
[39][40][41] Stimulation of ventral STN circuitries has been shown to impact the processing of emotional valence and may also modulate the appraisal of positive or negative outcomes. 41,42 Thus, specific effects of DBS-STN on action-outcome processing may depend on the subregion stimulated. In the current study, we refer to dorsal and ventral subregions of the STN, but note that this is synonymous with the superior versus inferior direction along the STN.
A thorough investigation into how DBS targeting distinct STN subterritories influences bias in learning stimulus-action-outcome associations will enhance our understanding of the role of STN in outcome-based learning in general and also contribute to a deeper understanding of the underlying processes that might relate to some of the unintended side effects of STN stimulation. In this study, we investigated how subregional DBS-STN impacts stimulus-action-outcome learning by measuring learning effects when stimulation was being delivered to electrodes positioned in relatively dorsal STN versus relatively ventral STN. Based on the dissociable functional inputs to STN subregions, we hypothesized that stimulation of the relatively dorsal, motor subregions of the STN would exert stronger effects on the action component of action-outcome learning (action selection versus action inhibition) while stimulation of the relatively ventral STN subregion would exert specific effects on the outcome or valence (reward versus punishment avoidance) component of action-outcome learning.

Participants
Parkinson's disease participants (n = 12), who had previously undergone bilateral DBS-STN implantation, were recruited from the Movement Disorders Clinic at Vanderbilt University Medical Center. All participants were tested off dopaminergic medication, following a 24-h withdrawal from levodopa and a 48-h withdrawal from dopamine agonists.
Inclusion criteria included idiopathic Parkinson's disease with bilateral STN implants, with lead contacts localized in dorsal and ventral subregions of STN (see 'DBS contact registration and selection' for details).
Exclusion criteria were similar to previously published studies 43,44 and are detailed in the Supplementary material. Patients were recruited after DBS surgery based on electrode positioning methods described below. Prior to study enrolment, participants experienced a minimum of 6 months' improvement in clinical motor symptoms as classified by medical record review and neurological ratings of motor symptoms [Movement Disorder Society Unified Parkinson's Disease Rating Scale Motor (MDS UPDRS-III)]. On average, participants were recruited 26 months post-surgery (SE = 5.3 months). Individual demographic and clinical profiles are outlined in Table 1. Clinical stimulation parameters are shown in Supplementary Tables 1 and 2.
A detailed informed consent process was conducted before enrolment. Research activities complied with the standards of ethical conduct in human subject research as outlined by the Vanderbilt University Medical Center.

DBS contact registration and selection
DBS contact registration and selection were performed using methods similar to our previous studies. 43,44 Per standard clinical care, participants underwent preoperative MRI with T1-weighted and T2-weighted images, which were acquired on a 3T Philips scanner (Philips Achieva) using phased array SENSE 8-channel reception and body coil transmission. We captured T1-weighted images [typical repetition time (TR)/echo time (TE) = 7.9/3.6 ms] using 1.0 mm³ isotropic spatial resolution and captured T2-weighted images (typical TR/TE = 3000/80 ms) using a 47 × 47 mm² in-plane resolution and 2 mm slice thickness. Postoperative imaging (1 month after surgery to allow for resolution of pneumocephalus) included a brain CT collected at kVp = 120 V, with 350 mAs exposure, capturing 512 × 412 pixels. In-plane resolution was set at 0.5 mm while slice thickness was set at 0.75 mm. Note that patients were not rescanned prior to study enrolment, but we have previously shown that DBS leads secured with our surgical technique do not significantly migrate. 45 We localized DBS contacts both manually and automatically using the CRAnialVault Explorer (CRAVE 46 ) software, as described previously. 43,44 Briefly, we used the software to first localize DBS implants and contacts in the CT images (delayed ∼4 weeks to allow resolution of pneumocephalus) and then visually verified the localization accuracy for each individual contact. To determine the anatomic location of each contact, the preoperative MR images were first registered to the postoperative CT images with the use of fully automatic intensity-based rigid registration techniques integrated into CRAVE. Next, the CRAVE T1-weighted MR anatomic atlas, in which deep brain structures including the STN have been localized using high-field (7 Tesla) images, 47 was registered to the preoperative T1-weighted image. This was achieved with a fully automatic intensity-based nonrigid technique also integrated into CRAVE. 
48 Contours of anatomic structures were projected from the atlas to the preoperative T1-weighted images, and registration accuracy was assessed visually. If either contact localization or atlas registration was deemed inaccurate, the subject was excluded from the study. Ultimately, individual contacts were projected onto the atlas using the inverse of the atlas-to-subject transformation. The accuracy of this last step was again assessed visually by confirming that the contacts were in the same STN subregion in the atlas and the subject's T1-weighted image.
Figure 1A-C shows the individual electrodes used for dorsal and ventral stimulation projected onto the atlas STN. Figure 1D shows the group average position. As was done in our previous study, 43 for each individual, we defined the ventral and dorsal STN using an oblique plane (perpendicular to the lead trajectory) and further categorized the dorsolateral and ventromedial subregions in line with Haynes and Haber. 22 Participants were deemed appropriate candidates for recruitment if bilateral leads had a minimum of one contact in both the dorsal and ventral STN (as defined by their individual scan mapped on the atlas in CRAVE). Table 2 shows the electrodes that were used for focused stimulation at dorsal and ventral contacts.

Design and procedures
Participants completed two sessions of a stimulus-action-outcome learning task, 1,6,15,16,50 one session with dorsal STN stimulation and one session with ventral STN stimulation.
The study was limited to two stimulation settings (no 'OFF' condition included) to limit the overall experiment time and discomfort for the patients. The order of sessions was counterbalanced across participants, and participants were blinded to the order of stimulation. Before the first testing session and between DBS setting changes, we included a 30-min waiting period. Prior studies suggest this time period is sufficient to produce dissociable effects on cognitive and action control processes 16,44 and accounts for most of the changes in the motor symptoms. 51,52 The participants' clinical DBS settings were modified to specifically target STN subregions by adjusting the stimulation parameters. Unlike the clinical DBS settings, which often lead to overlapping stimulation fields across adjacent contacts, we maintained a consistent current of 0.4 mA, a stimulation frequency of 130 Hz and a pulse width of 60 μs. 43,44 These settings aimed to achieve a uniform current density throughout the targeted STN subregions and across all participants while limiting the estimated field of tissue activation. According to Butson and McIntyre, 49 applying a stimulation amplitude of ∼0.4 mA (assuming an average clinical impedance of 1 kΩ) would result in a volume of tissue activated (VTA) with a radius of about 1.3 mm. This approach minimized volumetric overlap between adjacent electrodes so that the effects of dorsal and ventral stimulation could be distinguished.
For the learning task, participants were informed that their goal was to learn whether to act or inhibit action to each of four different colour patches to maximize monetary earnings, either by gaining a reward or avoiding a loss (punishment). This task has been widely used to measure Pavlovian biases in learning, to contrast action versus inhibition learning and to investigate the role of reward versus punishment avoidance in learning. 1,6,15,16,50 The task started with the presentation of a centred fixation point for 750 ms. After the fixation point extinguished, one of the four colour patches flashed on the screen for 200 ms. Participants then had 2 s to either act (i.e. make a two-handed button press; the bilateral response avoids potential handedness effects) or inhibit action (i.e. withhold or refrain from the two-handed button press). After the 2-s response decision window, feedback was displayed on the screen for 2 s and indicated if the participant's action choice (act or inhibit action) resulted in one of three outcomes: monetary reward (+25 cents), monetary loss (−25 cents) or no monetary outcome (0 cents). The feedback information then disappeared, triggering the next trial with presentation of one of the colour patches. For each session, a set of four unique colour patches appeared in random order with equal probability (10 times for each colour per block) across 4 blocks of 40 learning trials. The order of these sets of colour patches was counterbalanced across subjects.
Within a session, each colour was presented 40 times across the 4 blocks. Throughout each block, participants could see their total earnings displayed in the upper centre of the screen. Each block lasted ∼3 min, and 1-min breaks were allotted between blocks. Before the experiment started, participants were given 15 practice trials.
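The block structure described above (4 blocks of 40 trials, each of the four colour patches shown 10 times per block in random order) can be sketched in a few lines of Python. This is only an illustrative reconstruction under stated assumptions: the labels `A` to `D`, the function name `build_session` and the use of a simple shuffle are hypothetical and are not taken from the task software used in the study.

```python
import random

# Hypothetical labels for the four colour patches of one session.
COLOURS = ["A", "B", "C", "D"]
N_BLOCKS = 4          # blocks per session
REPS_PER_BLOCK = 10   # presentations of each colour within a 40-trial block


def build_session(seed=None):
    """Return a list of blocks; each block is a shuffled list of 40 stimulus labels."""
    rng = random.Random(seed)
    session = []
    for _ in range(N_BLOCKS):
        block = COLOURS * REPS_PER_BLOCK  # 10 copies of each colour
        rng.shuffle(block)                # random order within the block
        session.append(block)
    return session


if __name__ == "__main__":
    session = build_session(seed=1)
    print(len(session), "blocks,", sum(len(b) for b in session), "trials")
```

Shuffling a fixed multiset per block guarantees exactly 10 presentations of each colour per block, which is one way (assumed here) to satisfy both the "random order" and the "equal probability" constraints.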
As the participants were blinded to which colours were associated with the respective outcomes, they were tasked to learn to maximize their outcomes by optimizing reward and avoiding losses. Additionally, two colours from each set produced the ideal outcome by acting (either gaining a reward or avoiding punishment), while two others produced the ideal outcome via inaction. This 2 × 2 factorial design allows for separate evaluation of action and valence.
However, patterns were not 100% deterministic throughout the learning blocks to keep the task engaging and challenging, thereby maintaining the participants' attention. Participants were instructed that each stimulus-action-outcome association was probabilistic and would yield the desired outcome most of the time but not all of the time. The stimulus-action-outcome conditions are outlined below, and the optimal response that would lead to higher accuracy and more reward/fewer losses for each stimulus is underlined.
1. Stimulus A: action to gain reward. Participants had a 90% chance of receiving a reward with an action to this stimulus and only a 10% chance that action was not
Participants had a 90% chance of avoiding a negative outcome (no loss) when they inhibited or withheld action to this stimulus but only a 10% chance of receiving a negative outcome (loss of $) when inhibiting or withholding action. Action to this stimulus led to a negative outcome 100% of the time.

Behavioural outcome measures
Participants performed 160 learning trials in total, 40 trials for each of the 4 unique colour patches. Our primary behavioural outcome was accuracy across trials and conditions. Higher accuracy (through action or inaction) on the conditions reflects better learning. We also collected reaction times (RTs) for the trials that required an action, although speed was not emphasized in the task instructions. Additionally, we calculated a Pavlovian reward or punishment bias similar to previous studies. 15 Specifically, the error rate on trials with 'conflicting' action-outcome associations (action-punishment avoidance, inhibition-reward) was divided by the accuracy on the Pavlovian bias conditions, separately for reward and punishment conditions. The reward bias was calculated by dividing the errors on 'inhibition-reward' trials by the accuracy on the 'action-reward' trials. The avoid punishment bias was calculated by dividing errors on 'action-avoid punishment' trials by the accuracy on the 'inhibition-avoid punishment' trials. The numerator represents the impact of the Pavlovian biases on learning the unnatural or conflicting stimulus-action-outcome conditions, which is scaled by the denominator (reflecting the strength of learning the more natural or Pavlovian associations). A smaller value on the Pavlovian bias (closer to 0) indicates fewer errors on unnatural associations during the task, whereas values closer to 1 indicate a higher Pavlovian bias; i.e. more errors were made on unnatural associations due to the bias of responding in line with the expected natural action-valence combinations. A large number of errors on the 'inhibition-reward' trials relative to the accuracy on 'action-reward' trials would thus result in a high reward bias (close to 1). Similarly, a high punishment bias would be the result of a high error rate on the 'action-avoid punishment' trials relative to the accuracy on 'inhibition-avoid punishment' trials.
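As a worked illustration of the bias ratio defined above, the following Python sketch divides the error rate on the bias-conflicting condition by the accuracy on the bias-congruent condition. The function name `pavlovian_bias` and the accuracy values in the example are hypothetical and chosen only to show the arithmetic; they are not data from this study.

```python
def pavlovian_bias(conflict_accuracy, congruent_accuracy):
    """Error rate on the bias-conflicting condition divided by the
    accuracy on the bias-congruent condition (both given as proportions)."""
    if not 0.0 <= conflict_accuracy <= 1.0:
        raise ValueError("conflict_accuracy must lie in [0, 1]")
    if not 0.0 < congruent_accuracy <= 1.0:
        raise ValueError("congruent_accuracy must lie in (0, 1]")
    error_rate = 1.0 - conflict_accuracy
    return error_rate / congruent_accuracy


# Hypothetical accuracies for one participant in one session.
reward_bias = pavlovian_bias(
    conflict_accuracy=0.60,   # 'inhibition-reward' trials
    congruent_accuracy=0.80,  # 'action-reward' trials
)
punishment_bias = pavlovian_bias(
    conflict_accuracy=0.50,   # 'action-avoid punishment' trials
    congruent_accuracy=0.80,  # 'inhibition-avoid punishment' trials
)
print(f"reward bias = {reward_bias:.3f}, punishment bias = {punishment_bias:.3f}")
```

With these illustrative inputs the reward bias is 0.4 / 0.8 = 0.5 and the punishment bias is 0.5 / 0.8 = 0.625. Note that the ratio is not mathematically capped at 1: it exceeds 1 whenever the error rate on the conflicting condition is larger than the accuracy on the congruent condition.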

Statistical techniques
We used a generalized mixed linear model (GLM) with a binary distribution and a logit link to test whether the probability of an accurate response depended on the STN subregion, action or outcome condition. Independent variables were DBS subregion (dorsal STN, ventral STN), action choice (action, inhibit action) and outcome (reward, avoid punishment) and the interactions between these variables. We also included the independent variable trial number (1-40) and the interaction of trial number with DBS, action and valence to capture effects across trials. A random intercept per individual was included to account for between-subject variability. In addition, a hierarchical design was specified by adding random effects of trial number nested within the interaction of action and outcome (action and outcome nested within DBS) for each individual subject. Logistic regression model assumptions 53 (absence of multicollinearity, linearity of logit with the continuous variable trial number, lack of strongly influential outliers and the independence of errors) were checked. Biserial correlations were used to evaluate the correlation between trial number and the other independent variables (all binary). The linearity of logit accuracy and the continuous variable trial number was evaluated by visual inspection of the scatter plot of trial number versus residuals. Dependencies due to the longitudinal nature of the experiment were adjusted for through the hierarchical/nested/mixed model design. The significance level was set to 5%.
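For orientation, the fixed-effects part of the model described above can be written out as follows. This is a simplified sketch, not the exact specification fitted in the study: the symbols are introduced here for illustration, the normal distribution of the random intercept is a standard assumption that the text does not state, and the nested random effects of trial number are omitted.

```latex
% Illustrative notation (all symbols are assumptions, not the authors'):
%   y_{it}: correct (1) or incorrect (0) response of subject i on trial t
%   D: DBS subregion (dorsal vs ventral), A: action (act vs inhibit),
%   O: outcome (reward vs avoid punishment), T: trial number (1-40)
\begin{aligned}
\operatorname{logit}\,\Pr(y_{it} = 1)
  &= \beta_0 + \beta_D D + \beta_A A + \beta_O O \\
  &\quad + \beta_{DA}\, D A + \beta_{DO}\, D O + \beta_{AO}\, A O + \beta_{DAO}\, D A O \\
  &\quad + \beta_T T + \beta_{TD}\, T D + \beta_{TA}\, T A + \beta_{TO}\, T O + u_i,
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```

Under this parameterization, exponentiating a coefficient such as \beta_D gives the odds ratio for the corresponding contrast, which is the quantity reported in the accuracy results.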
Additionally, we analysed whether there was an effect of STN subregion stimulation on RTs for the action trials. We compared the RTs for the stimulus-action-outcome conditions (action-reward, action-avoid punishment) between STN subregions with a repeated measures (RM) ANOVA with within-subject factors outcome and DBS. We used a more parsimonious ANOVA model for the RTs rather than a GLM because (i) the RTs were only available for the learning trials that required an action (half of the trials), and (ii) the learning task instructions did not emphasize speed; thus, we did not have specific predictions about RTs across trials that would require a multifactorial model.
Pavlovian biases during dorsal and ventral DBS-STN were analysed with repeated sample Wilcoxon rank order tests (due to their skewed distribution), separately for reward and punishment-avoidance bias.

DBS-STN subregion effects on action-outcome learning performance (accuracy)
The GLM analysis tested the effect of stimulation site and learning conditions (action and outcome) on the probability of an accurate response during the learning task. Figure 2 displays the probability of an accurate response across trials for each stimulus-action-outcome condition, separate for dorsal and ventral STN stimulation and main effects across conditions. Figure 3 shows the detailed trial-by-trial probability that a participant responds with an action for each action-valence condition separated by DBS-STN site. The probability of an accurate response across trials and learning conditions was significantly larger with dorsal DBS-STN (76.36%) compared to ventral STN [66.15%; odds ratio (OR) = 1.65, t = 2.50, P = 0.03; note that an OR of 1.65 indicates that the odds of a correct response with dorsal DBS were 1.65 times greater than with ventral DBS; the significance of these ORs is indicated by the t-statistic]. Across DBS conditions, trials associated with action were more likely to receive a correct response (80.77%) compared to trials associated with inhibition (60.05%; OR = 2.79, t = −2.95, P = 0.004). Moreover, across DBS conditions, responses were more likely to be correct on trials linked to reward (79.19%) compared to trials linked to punishment avoidance (62.39%; OR = 2.29, t = −2.38, P = 0.02).
In terms of DBS effects, with dorsal STN stimulation, responses were more likely to be correct compared to ventral stimulation on conditions where participants had to learn to inhibit action to produce optimal outcomes (inhibition conditions: dorsal = 67.66%, ventral = 51.92%, OR = 1.94, t = 2.57, P = 0.01). Additionally, responses were more likely to be correct with dorsal relative to ventral DBS-STN on conditions where participants were optimizing
Note that when learning takes place, the probability to act across trials is expected to be high at the end of the 40 trials for the action conditions (A and B) and expected to be low at the end of the 40 trials for the inhibition learning conditions (C and D).
Stimulating STN subregions produced a specific interaction effect. DBS targeting the dorsal STN increased the likelihood of an accurate response when learning to inhibit action for a rewarding outcome (dorsal DBS-STN: 62.5%; ventral DBS-STN: 40.08%; inhibition-reward conditions: OR = 2.49, t = 2.6, P = 0.01) compared to targeting the ventral portion of the STN, under which participants had greater difficulty learning to inhibit action to obtain rewards.
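The relation between the reported response probabilities and the reported ORs can be checked with a short calculation. The sketch below assumes that the percentages quoted in this section can be converted directly to odds, p / (1 − p); the function and variable names are illustrative and are not part of the original analysis.

```python
def odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds of a correct response under condition a relative to condition b."""
    return odds(p_a) / odds(p_b)


# Probabilities of a correct response as reported in the text above.
REPORTED = {
    "dorsal vs ventral (all conditions)": (0.7636, 0.6615),
    "action vs inhibition": (0.8077, 0.6005),
    "reward vs punishment avoidance": (0.7919, 0.6239),
    "dorsal vs ventral (inhibition conditions)": (0.6766, 0.5192),
    "dorsal vs ventral (inhibition-reward)": (0.6250, 0.4008),
}

if __name__ == "__main__":
    for label, (p_a, p_b) in REPORTED.items():
        print(f"{label}: OR = {odds_ratio(p_a, p_b):.2f}")
```

Rounded to two decimals, these ratios agree with the ORs reported above (1.65, 2.79, 2.29, 1.94 and 2.49). The calculation also makes explicit that an OR is a ratio of odds rather than of probabilities: for the overall DBS effect, the ratio of the probabilities themselves is only 76.36 / 66.15 ≈ 1.15.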

DBS-STN subregion effects on Pavlovian learning biases
The Wilcoxon rank order tests were used to investigate Pavlovian biases by stimulation site and outcome valence. Figure 5 displays the mean Pavlovian bias separated by stimulation site and outcome. The Wilcoxon rank order test for paired samples showed a significantly stronger action-reward Pavlovian bias with ventral DBS (0.93) compared to dorsal DBS-STN (0.53), Wilcoxon signed rank = 2.43, P = 0.015. That is, with ventral STN stimulation, participants experienced greater difficulty overriding the bias associating action to reward, which made them less effective at learning to inhibit action to produce reward. This aligns with the finding in the mixed model analysis that showed a learning advantage under dorsal STN stimulation for conditions requiring action inhibition to gain reward. Notably, there was no difference in STN subregional stimulation effects on the Pavlovian inhibition-avoid punishment bias, Wilcoxon signed rank = 0.47, P = 0.64 (dorsal = 0.78, ventral = 0.84).

Discussion
We investigated the effects of applying focused stimulation to distinct STN functional subregions on stimulus-action-outcome learning. Parkinson's disease patients showed expected learning patterns in line with previous findings on similar probabilistic action-outcome learning tasks. 1,6 That is, participants performed relatively better with stimuli that required action versus inhibition, and they showed more proficient performance with stimuli that led to reward compared to avoiding punishment. Participants also showed Pavlovian biases in their learning patterns. They were most adept at learning to act to obtain a reward and to inhibit action to avoid punishment, a pattern consistent with Pavlovian biases. In contrast, performance was reduced for action-outcome contingencies that conflicted with these Pavlovian biases, such as learning to act to avoid punishment or learning to inhibit action to gain a reward, which replicated prior findings. 1,6 Specifically, Parkinson's disease patients showed overall better learning performance when stimulation was bilaterally applied to the dorsal STN compared to the ventral STN. With respect to subregional DBS effects on specific learning patterns and conditions, the most striking differences between DBS-STN subregions were found when learning to inhibit action to optimize outcomes or when learning to gain a reward, and particularly when these were combined and required learning to inhibit action to gain rewards. With ventral STN stimulation, learning to inhibit action to obtain reward was markedly less proficient, suggesting a more pronounced Pavlovian action-reward bias that interfered with participants' ability to associate action inhibition with obtaining reward. Stimulating dorsal STN countered this bias and led to more proficient learning to inhibit action to optimize reward.

Clinical implications of DBS-induced shifts in Pavlovian biases
The relatively stronger action-reward Pavlovian bias with ventral STN stimulation compared to dorsal STN stimulation may provide novel clues about alterations in the perception of reward. 55 In the current study, stimulating ventral STN made it more challenging to learn to inhibit or restrain behaviour to generate rewarding outcomes, suggesting a shift towards a stronger drive towards action in the pursuit of rewards. This is in line with studies showing DBS-STN side effects like euphoria, apathy and impulsivity [56][57][58][59][60][61] that could potentially be explained by stimulation of the valence processing network through the more ventral regions of the STN 62,63 innervated by limbic circuitries. 22,37 Two prior investigations in Parkinson's disease studied the effects of clinical DBS settings on a similar action-valence learning paradigm and demonstrated an enhanced Pavlovian action-reward bias when DBS-STN was turned ON compared to OFF. 15,17 Eisinger et al. 15 also reported a positive relationship between the strength of reward bias and clinical impulsivity measured by the Questionnaire for Impulsive-Compulsive Disorders in Parkinson's Disease (QUIP 64 ). Since clinical DBS settings produce tissue activation fields that easily encompass large sections of STN, it is certainly possible that enhanced action-reward bias was driven by stimulation effects in ventral STN, which would also be consistent with a stronger relationship between induced behaviours captured by the QUIP and stronger action-reward bias. The current study used carefully calibrated stimulation parameters to focus the field of tissue activation to STN subregions (see also van Wouwe et al. 43,44 ). This more focused approach provides more direct evidence that DBS in ventral STN may be associated with a stronger action-reward bias compared to dorsal stimulation. Dorsal DBS-STN appears to selectively counteract this bias.
It is important to recognize that while the current results indicate a clear dissociation in action-reward bias with ventral STN compared to dorsal DBS-STN, we cannot conclusively state whether ventral STN stimulation led to an increased reward bias or whether dorsal DBS reduced the reward bias because our study did not include a control condition with DBS turned OFF. In a previous comparison between Parkinson's disease ON and OFF dopaminergic medication on a similar action-valence learning task, 16 we demonstrated that performance ON medication while learning to 'inhibit action to obtain reward' was at 54%; this is higher than what we found in the current study with ventral DBS-STN (40%), but lower than with dorsal DBS-STN (62.5%). This comparison is at least suggestive that stimulating the ventral, limbic STN subregion could exacerbate the action-reward bias beyond the effect of dopaminergic medication, whereas stimulating the dorsal, motor STN subregion appears to reduce it relative to dopamine medication. Note that the previous study 16 was performed with different outcome probabilities (80-20 versus the 90-10 used here); thus, an important objective of future research is to investigate the combined effects of medication and DBS on these action-outcome learning biases. Additionally, preoperative individual differences in impulsivity as measured by the QUIP could play an important role in the DBS effect on the reward bias with ventral stimulation, and future studies could take this into account.

DBS-STN subregion effects on inhibitory action and punishment avoidance learning
Beyond the effect of DBS on Pavlovian biases, we had predicted that stimulation of the relatively dorsal STN region might exert stronger effects on the action versus inhibition dimension of stimulus-action-outcome learning. Overall, dorsal DBS-STN was associated with a higher proficiency of learning to inhibit action to optimize outcomes compared to the effects of ventral DBS-STN. This improved inhibition learning was especially effective at countering the strong 'action-reward' Pavlovian bias. The relatively dorsal STN substructure is innervated by hyper-direct projections from the supplementary motor area and M1. These cortical-STN circuitries have been associated previously with the inhibition of motor impulses in the Simon task. 65,66 Similarly, previous DBS effects on an inhibitory action control task (Simon task) demonstrated that focused dorsal stimulation and clinical DBS improved the ability to selectively inhibit undesired action impulses in situations of conflict. 33,43 Thus, the finding that dorsal STN stimulation appears to play a role in action inhibition learning is consistent with a role in inhibitory motor control more broadly.
It is worth mentioning that stimulating dorsal STN may achieve more proficient inhibition learning through an alternative mechanism. It has been proposed that the STN is involved in setting action decision thresholds, either when regulating action impulses or when learning to associate actions with optimal outcomes. 20,21,26 Specifically, when conflicts arise between response options, such as between a more optimal action and an action consistent with a Pavlovian bias, the dorsal STN may engage a transient inhibition or pause signal to allow more time to select an optimal choice. DBS applied to the dorsal motor network of the STN may be beneficial for disrupting the dysfunctional beta-band oscillations in Parkinson's disease and thereby improve action-related cognitive performance. 67 Since action inhibition and punishment avoidance learning form the basis of a strong Pavlovian bias, we further speculated that dorsal DBS-STN might impact punishment avoidance learning as well. We did not observe a beneficial effect of dorsal DBS-STN (compared to ventral STN) with respect to the punishment avoidance bias, i.e. on the learning conditions that conflict with the Pavlovian punishment avoidance bias ('action-avoid punishment'). Participants showed a slightly stronger punishment avoidance bias than reward bias, and this bias may have been harder to overcome across stimulation conditions. Future EEG imaging studies should aim to establish how DBS-STN (across subregions) impacts the relative amount of conflict induced by the dissociable Pavlovian biases, for example as reflected by a positive or negative prediction error in MPFC or ACC or by the conflict-related power in the theta band. 21
The current study's observation of improved action inhibition learning with dorsal DBS-STN also seems to contrast with recent findings that ventral stimulation benefits another form of inhibitory control: the proficiency to stop intended actions (as measured by the stop task). 44,68 Neurophysiological and imaging studies have also associated the ventral STN circuitry with stopping control. 20,69 However, withholding action in the context of the stop task involves exclusively inhibiting a motor response and does not require stopping a valence-induced action bias, as the action-outcome learning task does. Potentially, in the presence of outcome information, actions might be biased more strongly by their associated valence during ventral relative to dorsal DBS-STN. This could be established by investigating the effect of focused STN subregion stimulation separately for motor and valence conflict, or by parametrically adjusting the relative role of conflicting valence information (for example using variable reward magnitudes).

Limitations and future studies
We acknowledge that the exact delineation of functional subregions within the STN is still under debate and may display a gradient akin to the structural gradient of cortical connections from motor, associative and limbic areas, spanning the dorsolateral to ventromedial STN. 70,71 Future studies could benefit from technological advancements in axonal pathway, 72 normative connectomes 73 and electrical field modelling 74,75 to investigate the functional pathways that are impacted by DBS. These developments offer the potential to more precisely locate the DBS effect on functional and structural gradients within the STN circuits that play a role in processing conflicting motor impulses, conflicting valence information and conflicts with action-outcome learning biases.
Furthermore, the results are limited by the relative variability in placement of electrodes across STN subregions. That is, in the current study, dorsal and ventral electrodes were selected for each individual patient. This allowed us to make a relative comparison of DBS effects along the gradient of the STN in terms of reward-based learning outcomes, but the relative dorsal or ventral position of the contacts across individuals could impact the outcomes. In a post hoc analysis, we tested this by correlating the electrode coordinates along the inferior-superior axis for dorsal and ventral electrodes with the Pavlovian biases. None of those correlations were significant (see Supplementary Table 3). Future studies aiming to understand the underlying functional circuitry of the STN could prospectively use normative connectomes (i.e. LeadDBS) 73,76 to delineate stimulation effects on motor and valence networks more precisely.
Although currently limited to animal studies, optogenetic approaches allow control over specific STN cell populations, which has yielded more detailed insight into the neural mechanism of DBS on motor deficits in Parkinson's disease. 9,80 Similar techniques could be used in the context of action-outcome learning paradigms to understand STN-DBS effects on reward-related behaviour.
Finally, we recognize the following limitations in our study design: there was no DBS OFF condition to compare the focused stimulation with, and our findings are based on a small sample size. Accordingly, future studies should strive to incorporate a larger group of participants and include a control group OFF DBS.

Conclusion
The current study contributes to a growing literature on how neuromodulation in the basal ganglia circuitry impacts stimulus-action-outcome learning, a fundamental aspect of human cognition that influences daily learning and decision-making. Beyond Parkinson's disease, understanding neuromodulation effects on action-outcome learning and Pavlovian biases could be relevant to a range of neurologic and neuropsychiatric diseases (OCD, addiction, depression and Huntington's disease) that could benefit from DBS. 13,81,82

Figure 1 Individual electrode positions in subthalamic nucleus (STN) used for dorsal (blue) and ventral (red) stimulation in (A) sagittal plane (enlarged and overview) and (B) coronal plane (enlarged and overview), (C) average electrode position in coronal plane and (D) individual electrodes in axial plane. Substantia nigra (structure below STN) and thalamus (structure above STN) are displayed for reference. Electrode volumes represent the estimated VTA radius, i.e. based on Butson and McIntyre, 49 the projected volume of tissue activation with 0.4 mA, 130 Hz and 60 μs (with an average clinical impedance of 1 kΩ) would result in a VTA radius of ∼1.3 mm.

Figure 2 (A) Estimated probability of a correct response by learning condition and DBS. Lines within the bars reflect 95% confidence intervals. Connected dots next to the bars show individual variability in performance with dorsal and ventral stimulation for each condition. (B) Estimated probability of a correct response between conditions with significant differences (action, STN subregion, valence and interaction effects between action and STN subregion and between valence and STN subregion) based on GLM (OR > 1.65, ts > 2.38, ps < 0.05). *P < 0.05; **P < 0.01.

Figure 5 Average Pavlovian bias separated by stimulation site and outcome. Ventral stimulation resulted in a larger Pavlovian reward bias relative to dorsal stimulation. Connected dots next to the bars show individual variability in performance with dorsal and ventral stimulation for each bias. Note that removal of the outlier in the ventral reward bias does not change the effect of subregion stimulation, i.e. the reward bias remains significantly larger with ventral than with dorsal DBS (Wilcoxon signed rank = 2.43, P = 0.02). *P < 0.05.

Figure 4 Average RTs (ms) for action trials separated by stimulation site and outcome. Overall, performance was faster on trials that had a rewarding outcome, as shown by the RM-ANOVA [F(1,11) = 11.33, P < 0.01]. Connected dots next to the bars show individual variability in performance with dorsal and ventral stimulation for each condition. **P < 0.01.

Table 2 Electrodes used for the experimental focused stimulation at relative dorsal and ventral contacts
Experimental bilateral stimulation settings were set at 0.4 mA, 130 Hz frequency and 60 μs pulse width. Note 1: for Medtronic 3389 leads, 0 indicates the most ventral contact and 3 the most dorsal contact of the four-contact array. The 3389 leads have an electrode contact size of 1.5 mm with 0.5 mm spacing between contacts. Note 2: subjects 5 (left side), 11 and 12 did not receive ventral/dorsal stimulation at adjacent contacts, and the ventral contacts of subject 11 (right) and subject 12 (left) were partially inside the STN. All other participants received stimulation at a ventral contact immediately below a dorsal contact.

rewarded. Inhibiting or withholding action was not rewarded 100% of the time.
2. Stimulus B: inhibit action to gain reward. Participants had a 90% chance of receiving a reward when they inhibited or withheld action and only a 10% chance that inhibiting or withholding action was not rewarded. Acting to this stimulus was not rewarded 100% of the time.
3. Stimulus C: action to avoid punishment. Participants had a 90% chance of avoiding a negative outcome (no loss) when they acted to this stimulus and only a 10% chance of receiving a negative outcome (loss of $) when they acted. Inhibiting or withholding action to this stimulus resulted in a negative outcome 100% of the time.
4. Stimulus D: inhibit action to avoid punishment.
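The probabilistic contingencies listed above can be illustrated with a short simulation. The following is a minimal sketch, not the task software used in the study: it covers only stimuli B and C, whose contingencies are fully stated in the text, and the names `Contingency`, `sample_outcome` and `empirical_good_rate` are hypothetical helpers introduced here for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Outcome rule for one stimulus (hypothetical structure for illustration)."""
    name: str
    optimal_response: str   # "act" or "inhibit"
    good_outcome: str       # outcome obtained on most optimal responses
    bad_outcome: str        # outcome obtained otherwise
    p_good_if_optimal: float = 0.9  # 90-10 schedule described in the text


# Stimuli B and C as described in the text.
STIMULUS_B = Contingency("B", "inhibit", "reward", "no reward")
STIMULUS_C = Contingency("C", "act", "no loss", "loss")


def sample_outcome(stim: Contingency, response: str, rng: random.Random) -> str:
    """Draw one trial outcome for a given response to a stimulus."""
    if response not in ("act", "inhibit"):
        raise ValueError("response must be 'act' or 'inhibit'")
    if response != stim.optimal_response:
        # The non-optimal response yields the unfavourable outcome on every trial.
        return stim.bad_outcome
    # The optimal response yields the favourable outcome with probability 0.9.
    if rng.random() < stim.p_good_if_optimal:
        return stim.good_outcome
    return stim.bad_outcome


def empirical_good_rate(stim: Contingency, response: str,
                        n_trials: int, seed: int) -> float:
    """Fraction of simulated trials that end in the favourable outcome."""
    rng = random.Random(seed)
    hits = sum(sample_outcome(stim, response, rng) == stim.good_outcome
               for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    # Always responding optimally should yield the good outcome on roughly 90% of trials.
    print(empirical_good_rate(STIMULUS_B, "inhibit", n_trials=10_000, seed=1))
    print(empirical_good_rate(STIMULUS_C, "act", n_trials=10_000, seed=1))
```

Because the non-optimal response never produces the favourable outcome, a simulated learner can only discover the correct response to stimulus B by sometimes withholding action, which is the behaviour that conflicts with the action-reward Pavlovian bias discussed in the text.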