Drugs that alter dopamine transmission have opposite effects on reward and punishment learning. These opposite effects have been suggested to depend on dopamine in the striatum. Here, we establish for the first time the neurochemical specificity of such drug effects, during reward and punishment learning in humans, by adopting a coadministration design. Participants (N = 22) were scanned on 4 occasions using functional magnetic resonance imaging, following intake of placebo, bromocriptine (dopamine-receptor agonist), sulpiride (dopamine-receptor antagonist), or a combination of both drugs. A reversal-learning task was employed, in which both unexpected rewards and punishments signaled reversals. Drug effects were stratified with baseline working memory to take into account individual variations in drug response. Sulpiride induced parallel span-dependent changes on striatal blood oxygen level-dependent (BOLD) signal during unexpected rewards and punishments. These drug effects were found to be partially dopamine-dependent, as they were blocked by coadministration with bromocriptine. In contrast, sulpiride elicited opposite effects on behavioral measures of reward and punishment learning. Moreover, sulpiride-induced increases in striatal BOLD signal during both outcomes were associated with behavioral improvement in reward versus punishment learning. These results provide strong support for current theories suggesting that drug effects on reward and punishment learning are mediated via striatal dopamine.
Brain dopamine has long been implicated in learning. One of the most robust observations in this domain is that dopaminergic drugs have opposite effects on learning from reward and punishment (Frank et al. 2004; Cools et al. 2006; Frank, Samanta, et al. 2007; Moustafa et al. 2008; Bodi et al. 2009; Cools et al. 2009; Palminteri et al. 2009). According to recent theory, these opposite effects reflect dopamine-induced shifts in the balance between the GO and NOGO pathways of the basal ganglia (Frank 2005). Specifically, it is suggested that increases in dopamine shift the balance toward the GO pathway (improving reward relative to punishment learning) and decreases in dopamine shift the balance toward the NOGO pathway (impairing reward relative to punishment learning). This model is supported by evidence from positron emission tomography (PET; Cools et al. 2009), experimental animal work (Frank and Fossella 2011), and genetic studies (Frank, Moustafa, et al. 2007; Frank and Hutchison 2009) and is predictive of medication withdrawal effects in psychiatric disorders including Parkinson's disease (Frank et al. 2004; Cools et al. 2006; Frank, Samanta, et al. 2007; Moustafa et al. 2008; Bodi et al. 2009; Palminteri et al. 2009; Voon, Pessiglione, et al. 2010), attention deficit hyperactivity disorder (Frank, Santamaria, et al. 2007), and Tourette's syndrome (Palminteri et al. 2009).
Here, we address 2 key remaining questions regarding these opposite drug effects on reward and punishment learning. First, we establish the neurochemical specificity of these drug effects in humans. Previous studies have used drugs that are neurochemically nonspecific, acting also, for example, on the noradrenaline system. According to current standards in animal pharmacology (Feldman et al. 1997), claims about the neurochemical specificity of drug effects can be made only based on the blocking of receptor agonist action by coadministration with a receptor antagonist (or vice versa). Such a coadministration approach has rarely been adopted in human research. Here, we administered, in 4 sessions, sulpiride, bromocriptine, placebo, and a combination of sulpiride and bromocriptine. Sulpiride is a selective D2-receptor antagonist, and bromocriptine is a D1- and D2-receptor agonist with high affinity for the D2-receptor. If drug-induced changes are dopamine dependent (i.e. exerted via dopamine (D2-) receptors, targeted by both drugs), they should be blocked by coadministration with the other drug.
Secondly, despite the consistency of dopamine's effects on behavior, there is a controversy with regard to dopamine's effects on human striatal responses during reward and punishment learning, as measured with pharmacological functional magnetic resonance imaging (fMRI). Some studies have revealed drug effects on the striatum during punishment, but not reward (Cools et al. 2007; Dodds et al. 2008; Voon, Reynolds, et al. 2010). By contrast, other studies have revealed dopaminergic drug effects on the striatum during reward, but not punishment (Pessiglione et al. 2006; Voon, Pessiglione, et al. 2010; Jocham et al. 2011; Ott et al. 2011). This discrepancy might lie in the use of different tasks. The former studies employed instrumental reversal paradigms, in which performance depended more critically on punishment learning. Conversely, the latter studies did not measure the reversal of contingencies. We used an adapted reversal paradigm, in which reversals were signaled by either unexpected rewards or unexpected punishments. Previous fMRI work with this paradigm has shown that both unexpected rewards and punishments elicit a prediction error signal in the striatum (Robinson et al. 2010). Importantly, the demands for punishment and reward reversals were matched (Cools et al. 2006; Cools et al. 2009), thus allowing us to investigate the dopaminergic drug effects on human striatal responses during both punishment and reward learning. We considered 2 possibilities: First, the opposite effects of dopaminergic drugs on reward and punishment learning would be accompanied by similarly opposite effects on striatal blood oxygen level-dependent (BOLD) signal during reward and punishment events. To this end, we assessed drug effects on valence-dependent BOLD signal. 
Secondly, the opposite effects on reward and punishment learning would be accompanied by parallel changes in striatal BOLD signal during both unexpected outcomes, so that improved reward relative to punishment learning would be accompanied by increased striatal BOLD signal during reward and punishment events. To this latter end, we assessed drug effects on valence-independent BOLD signal.
One well-established challenge for drug research is the large individual variability of drug effects (Cools and D'Esposito 2011). Bromocriptine's effects on this task have been shown to depend on baseline dopamine levels as measured with neurochemical PET, improving reward relative to punishment learning in low-dopamine participants, while impairing it in high-dopamine participants (Cools et al. 2009). Similar opposite effects of both dopamine-receptor agonists and antagonists have repeatedly been observed as a function of working memory capacity (Kimberg et al. 1997; Frank and O'Reilly 2006), which has been shown to correlate with baseline dopamine synthesis capacity as measured with neurochemical PET (Cools et al. 2008; Landau et al. 2009). Accordingly, given this baseline-dependency principle of drug efficacy (Cools and D'Esposito 2011), we stratified our drug effects with baseline working memory capacity to take into account putative individual differences in drug response.
Materials and Methods
Twenty-eight healthy participants gave informed consent approved by the local research ethics committee (“Commissie Mensgebonden Onderzoek”, Arnhem-Nijmegen, number: 2008/078, date: 9 September 2008) and were compensated for participation. Six participants were excluded from analysis. One participant withdrew after the first session; 2 participants showed excessive head movements in the scanner (greater than twice the voxel size, e.g. translation >6 mm); 2 full fMRI datasets were unusable due to technical reasons; 1 non-native Dutch speaker was excluded from analyses, because a good command of the Dutch language was essential for the optimal assessment of baseline working memory capacity. The remaining 22 participants were right-handed, European Caucasians (mean age 23.1, range 18.9–30.3 years, 11 men) with no relevant medical/psychiatric history 3 years prior to testing. Two participants completed only 2 drug sessions (sulpiride–bromocriptine and sulpiride–placebo). Accordingly, the sulpiride–bromocriptine and the placebo–sulpiride comparisons were computed using the data from 21 participants (11 and 10 men, respectively), and the remaining comparisons were computed using the data from 20 participants (10 men). All participants were screened to assess their medical/psychiatric status during intake (Supplementary Material). During intake, a Dutch version of the listening span test (Salthouse et al. 1991) was taken and the reversal-learning task was practiced during a short structural magnetic resonance imaging (MRI) scan.
Pharmacological Design and General Procedure
Participants were tested on 4 occasions separated by at least 1 week. Starting time of the sessions (8.30 or 10.30 AM) was kept constant across sessions within participants. Participants abstained from alcohol and nicotine 24 h before testing and from caffeine on the day of testing. Upon arrival, participants were asked about their compliance with the above-mentioned restrictions and were given a light breakfast 1 h before ingestion of the drugs. Participants were tested after placebo, sulpiride (Dogmatil®, Sanofi-aventis, 400 mg), bromocriptine (Parlodel®, Novartis, 1.25 mg), and a combination of bromocriptine and sulpiride. Administration order was randomized according to a counterbalanced, placebo-controlled, double-blind design (Supplementary Material for details on administration orders). A double-dummy design was employed; all participants received 2 different opaque, gelatin capsules on 2 separate time points a day. Participants received placebo or bromocriptine 30 min after receiving placebo or sulpiride. The dose selection was based on previous studies, which revealed significant effects and good tolerance (Mehta et al. 2004, 2008; Cools et al. 2009; Dodds et al. 2009). Timing was optimized to have maximal effects during both the fMRI paradigm and a subsequent behavioral paradigm (reported elsewhere, van Holstein et al. 2011). The reversal-learning task started approximately 2¼ h after first drug intake (i.e. sulpiride) with a duration of approximately 60 min. The mean time to maximal plasma concentration of sulpiride is approximately 3 h, with a plasma half-life of approximately 12 h (Mehta et al. 2003). The mean time to maximal plasma concentration of bromocriptine is approximately 2½ h, with a plasma half-life of approximately 7 h (Deleu et al. 2002). Accordingly, time of testing coincided with the time-window of maximal drug effects.
Subjective mood ratings were measured with the Bond and Lader visual analog scales (Bond and Lader 1974). Mood measures, blood pressure, and heart rate were taken approximately 30 min before, approximately 2 h after, and approximately 6 h after first drug intake. Blood samples were taken approximately 30 min before and approximately 2 h after first drug intake and were used to determine the change in prolactin levels due to dopamine D2-receptor binding (Fitzgerald and Dinan 2008).
Background neuropsychological tests (block completion, number cancellation, verbal fluency, and digit span) were assessed at the end of the day, approximately 5 h after drug intake. The current paradigm was part of a larger protocol, including the assessment of a behavioral task after the fMRI session (van Holstein et al. 2011).
Baseline Working Memory
Baseline working memory capacity was defined as the average score of 4 assessments of the digit span (Groth-Marnat 1997), which was measured approximately 5½ h after first drug intake. Drugs did not affect the digit span as revealed by analysis of variance (ANOVA) with the factors drug (4) and span (forward and backward; main drug: F21 = 0.46, P = 0.71, drug × span: F21 = 1.14, P = 0.34). The average total span across the 4 drug sessions correlated significantly with the Dutch version of the listening span (Salthouse et al. 1991; r22 = 0.56, P = 0.003), which was measured during intake. The average digit span was selected for drug effect stratification, because it was administered repeatedly and was therefore thought to provide a more reliable estimate of working memory capacity. Supplementary analyses were done to confirm working memory-dependent drug effects with measures taken during intake (listening span) and placebo (digit span) only (Supplementary Material).
The reversal-learning task (Fig. 1) was similar to that used previously (Cools et al. 2006, 2009; Robinson et al. 2010; van der Schaaf et al. 2011), programmed with Presentation software (Version 10.2 Neurobehavioral Systems, Inc.) and presented on a screen that was visible via a mirror on the head coil in the scanner. On each trial, participants were shown 2 vertically adjacent stimuli simultaneously, a face and a scene (location randomized). One of these stimuli was associated with reward and the other with punishment. Unlike standard probabilistic reversal paradigms, participants did not choose between the 2 stimuli. Instead, one of the stimuli was selected by the computer (highlighted with a black border), and participants were asked to predict the outcome associated with this preselected stimulus. After the prediction, indicated with a right index or middle finger button press (counterbalanced across participants), the actual outcome was presented. Note that these outcomes did not depend on participants' responses, but were directly coupled to the highlighted stimulus. Accordingly, task contingencies were Pavlovian rather than instrumental. Stimuli were presented until a response was made. After a 1000-msec delay, the actual outcome was presented (100% deterministic) for 500 msec, followed by an intertrial interval of 1500–6000 msec. If participants did not respond within 1500 msec, a “Too late” message was presented for 500 msec. The stimulus–outcome contingencies reversed after 4, 5, or 6 consecutive correct predictions. Such reversals were signaled by either an unexpected punishment (presented after a previously rewarded stimulus was highlighted) or an unexpected reward (presented after a previously punished stimulus was highlighted). 
Accuracy on the trials directly following these unexpected outcomes (reversal trials) reflects how well participants updated stimulus–outcome associations after either unexpected rewards or unexpected punishments. Reward consisted of a green smiley with a “+€100” sign. Punishment consisted of a red sad smiley with a “−€100” sign. Participants were not compensated in proportion to their winnings, because the number of rewards and punishments was equated and did not depend on participants' performance. However, similar neural effects have been reported for real and hypothetical rewards (Bray et al. 2010; Miyapuram et al. 2012), supporting the use of fictive monetary outcomes in combination with smileys as a good alternative in the present study (for further details on stimulus presentation see Supplementary Material). Participants performed 4 experimental blocks, each consisting of 1 acquisition stage until the first reversal and a variable number of reversal stages depending on the participant's performance. The average number of reversal trials was 33 (±9) for punishment and 33 (±8) for reward. The task finished when 592 trials were completed (∼60 min).
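The reversal rule described above can be illustrated with a minimal simulation sketch. It is not the Presentation script used in the study: the function name `simulate_block`, the idealized learner that updates both associations after every outcome, the random draw of the 4/5/6 criterion, and the omission of all timing are assumptions made for illustration only.

```python
import random

FLIP = {"reward": "punishment", "punishment": "reward"}
OTHER = {"face": "scene", "scene": "face"}


def simulate_block(n_trials, seed=0):
    """Simulate the reversal rule with an idealized learner (an assumption)."""
    rng = random.Random(seed)
    outcome_of = {"face": "reward", "scene": "punishment"}  # true contingencies
    belief = dict(outcome_of)          # learner's current associations
    criterion = rng.choice([4, 5, 6])  # correct predictions needed before a reversal
    streak = 0                         # consecutive correct predictions so far
    repeat_stim = None                 # stimulus highlighted again on a reversal trial
    trials = []
    for _ in range(n_trials):
        reversal_trial = repeat_stim is not None
        stim = repeat_stim if reversal_trial else rng.choice(["face", "scene"])
        repeat_stim = None

        unexpected = streak >= criterion
        if unexpected:
            # Contingencies reverse; this trial's outcome signals the reversal.
            outcome_of = {s: FLIP[o] for s, o in outcome_of.items()}
            criterion = rng.choice([4, 5, 6])
            streak = 0

        prediction = belief[stim]
        outcome = outcome_of[stim]     # outcomes are fully deterministic
        correct = prediction == outcome

        if unexpected:
            repeat_stim = stim         # same stimulus is shown on the next trial
        elif correct:
            streak += 1
        else:
            streak = 0

        # Idealized learner: update both associations from the observed outcome.
        belief[stim] = outcome
        belief[OTHER[stim]] = FLIP[outcome]

        trials.append({"stimulus": stim, "prediction": prediction,
                       "outcome": outcome, "correct": correct,
                       "unexpected": unexpected,
                       "reversal_trial": reversal_trial})
    return trials
```

In this sketch, the trial flagged `reversal_trial` is the one whose accuracy would enter the reversal score; a real participant, unlike the idealized learner here, can get that trial wrong.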
Our dependent variable was the proportion of correct responses on reversal trials following the unexpected outcomes. On these reversal trials, the same stimulus was highlighted again such that requirements for motor switching and prediction updating were matched between reward and punishment conditions. This enabled direct comparisons between reward and punishment reversals. “Valence-dependent” and “valence-independent” reversal scores were calculated by computing, respectively, the difference between, and the average of reward and punishment reversal scores.
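As a minimal sketch of how these two summary scores relate to the per-condition accuracies, the following shows the difference and the average explicitly. The function name `reversal_scores` and the example values are hypothetical and are not taken from the study data.

```python
def reversal_scores(reward_score, punishment_score):
    """Return (valence_dependent, valence_independent) reversal scores.

    valence_dependent   = reward minus punishment reversal score
    valence_independent = average of reward and punishment reversal scores
    """
    valence_dependent = reward_score - punishment_score
    valence_independent = (reward_score + punishment_score) / 2.0
    return valence_dependent, valence_independent


# Hypothetical example: one participant, one drug session.
dep, indep = reversal_scores(0.90, 0.80)
```

A positive valence-dependent score indicates better reversal after unexpected reward than after unexpected punishment, whereas the valence-independent score summarizes overall reversal performance.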
Image Acquisition and Analysis
Structural (T1-weighted MP-RAGE sequence, echo time/repetition time (TE/TR) = 3.03/2300 ms, flip angle = 8°, field of view (FOV) = 256 × 256 × 192 mm, voxel size = 1 mm isotropic, GRAPPA acceleration factor 2) and functional images (whole-brain gradient echo planar imaging sequence; TE/TR = 25/1890 ms, flip angle = 80°, FOV = 212 × 212 × 122 mm, matrix = 64 × 64, 37 ascending transverse slices; voxel size = 3.3 × 3.3 × 3.0 mm, 0.3 mm gap between slices) were collected using a 3-T Siemens MRI scanner with an 8-channel head coil. To reduce signal drop-out and geometric distortions, we used a short TE and reduced echo train length by means of factor 2 accelerated GRAPPA (Griswold et al. 2002). Images were analyzed using SPM5 (Wellcome Department of Cognitive Neurology, London, United Kingdom). Images were realigned to the first volume, coregistered to the structural MR image, normalized to the standard Montreal Neurological Institute (MNI) template, re-sampled into 2-mm isotropic voxels, and smoothed with an isotropic 8-mm full-width half-maximum Gaussian kernel.
Event-related responses were modeled with a canonical hemodynamic response function and time-locked to the presentation of the outcome. For each drug session, a general linear model was defined including 6 task-related parameters, their temporal derivatives, and 9 noise parameters. The following parameters were included: 1) unexpected punishment, 2) unexpected reward, 3) correctly predicted expected punishment, 4) correctly predicted expected reward, 5) incorrectly predicted or missed expected punishment, 6) incorrectly predicted or missed expected reward, 7–12) 6 head movement parameters (obtained from the realignment procedure), and 13–15) 3 parameters to model global intensity changes (the time series of the mean signal from the white matter, cerebrospinal fluid, and out of brain segments; Verhagen et al. 2008). Time series were high-pass filtered (cut-off 128 s). The parameter estimates, derived from the mean least squares fit of the model to the data, reflect the strength of covariance between the data and the canonical response function for a given trial type of interest.
Individual contrast maps, between parameter estimates for different events of interest, were calculated for 2 contrasts per drug condition: one representing the valence-independent BOLD signal ([unexpected reward − correctly predicted expected reward] + [unexpected punishment − correctly predicted expected punishment]) and another representing the valence-dependent BOLD signal ([unexpected reward − correctly predicted expected reward] − [unexpected punishment − correctly predicted expected punishment]). By contrasting all unexpected outcomes with expected outcomes, we controlled for processes related to visual stimulation (e.g. reward and punishment). For both contrasts, individual drug-difference maps were calculated for all 6 pair-wise drug comparisons (bromocriptine–sulpiride, sulpiride–placebo, bromocriptine–placebo, sulpiride–combined administration, bromocriptine–combined administration, and placebo–combined administration), which were then taken to the second-level random-effects group analyses.
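The two contrasts can be written as weight vectors over the 6 task-related parameter estimates. The sketch below is illustrative only: the regressor ordering follows the model description above, while the names `REGRESSORS`, `VALENCE_INDEPENDENT`, `VALENCE_DEPENDENT`, and `contrast_value` are hypothetical, and the actual contrasts were specified in SPM5 rather than in Python.

```python
# Assumed ordering of the 6 task-related regressors (illustrative labels).
REGRESSORS = [
    "unexpected_punishment",
    "unexpected_reward",
    "expected_punishment_correct",
    "expected_reward_correct",
    "expected_punishment_incorrect_or_missed",
    "expected_reward_incorrect_or_missed",
]

# (unexp. reward - exp. reward) + (unexp. punishment - exp. punishment)
VALENCE_INDEPENDENT = [1, 1, -1, -1, 0, 0]
# (unexp. reward - exp. reward) - (unexp. punishment - exp. punishment)
VALENCE_DEPENDENT = [-1, 1, 1, -1, 0, 0]


def contrast_value(betas, weights):
    """Weighted sum of parameter estimates for one voxel or region."""
    if len(betas) != len(weights):
        raise ValueError("betas and weights must have the same length")
    return sum(b * w for b, w in zip(betas, weights))
```

Both weight vectors sum to zero, so each contrast compares unexpected with correctly predicted outcomes; the two differ only in whether the punishment term is added or subtracted.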
The second-level group analysis was performed, using simple regression procedures, as implemented in Statistical Parametric Mapping software (SPM5, http://www.fil.ion.ucl.ac.uk/spm), to investigate working memory-dependent drug effects on valence-dependent and valence-independent BOLD signal. To this end, individual drug-difference maps were submitted to a second-level one-sample T-test, and individual working memory scores were entered as a covariate of interest. Significant voxels revealed by the covariate of interest represent brain regions where there is a linear relationship between individual measures of working memory and drug effects on valence-dependent or valence-independent BOLD signal. Working memory-independent effects of drug were assessed using similar one-sample T-tests, without the covariate.
Because simple regression analysis better captures the linear nature of the association between working memory and drug effects on BOLD signal, this method was chosen over a more commonly used median-split analysis, in which drug effects on BOLD signal are compared between 2 groups with high- and low-working memory participants. To ensure that the results from the simple regression analysis were not statistical artifacts driven by, for example, outliers, supplementary analyses were done using the median-split procedure (Supplementary Material).
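The difference between the two approaches can be sketched for a single region-of-interest value per participant. This is a minimal illustration and not the SPM5 voxel-wise implementation; the function names `simple_regression` and `median_split_difference` and any input values are hypothetical.

```python
from statistics import mean, median


def simple_regression(span, drug_effect):
    """Least-squares fit drug_effect = intercept + slope * span.

    Also returns Pearson's correlation coefficient r.
    """
    mx, my = mean(span), mean(drug_effect)
    sxx = sum((x - mx) ** 2 for x in span)
    syy = sum((y - my) ** 2 for y in drug_effect)
    sxy = sum((x - mx) * (y - my) for x, y in zip(span, drug_effect))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def median_split_difference(span, drug_effect):
    """Mean drug effect in the high-span group minus the low-span group."""
    cut = median(span)
    high = [y for x, y in zip(span, drug_effect) if x > cut]
    low = [y for x, y in zip(span, drug_effect) if x <= cut]
    return mean(high) - mean(low)
```

The regression uses every participant's span value, whereas the median split reduces span to 2 groups and therefore discards within-group variation.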
Testing of the different drug comparisons was done in an a priori-defined fixed order. We focused our first analyses on comparisons between bromocriptine and sulpiride to assess the differences between dopamine-receptor stimulation and dopamine-receptor blockade. If this analysis resulted in significant effects, each single-drug session was then compared with the placebo session. To assess whether any such drug effects were attenuated by coadministration of both drugs, we assessed comparisons between the combined bromocriptine/sulpiride session versus each of the single-drug sessions and placebo.
Statistical inference (P < 0.05) was performed at the voxel level, family-wise error (FWE) correcting for multiple comparisons over the search volume (the whole brain [pwb_fwe] or the anatomically defined small volume (SV) of interest, the striatum [psv_fwe]). The striatum was defined anatomically as the bilateral putamen and caudate nucleus selected from the Automated Anatomical Labeling atlas (Tzourio-Mazoyer et al. 2002). In addition, MarsBar software (Brett et al. 2002) was used to extract mean parameter estimates from the anatomically defined striatum. Drug effects on mean parameter estimates (i.e. differences between unexpected reward and expected reward and unexpected punishment and expected punishment) were assessed with repeated-measures ANOVA with the factors drug (4 levels) and valence (reward and punishment), and span (mean centered) as a covariate of interest. Subsequent parametric correlation analyses (Pearson's rho, bivariate) were performed to further investigate the associations between drug effects on BOLD signal, individual differences in working memory capacity, and behavioral performance.
Proportions of correct responses on reversal trials were arcsine transformed (2× arcsin(√x)) as is appropriate when the variance is proportional to the mean (Howell 1997). Valence-independent and valence-dependent drug effects across the group (independent of span) were analyzed with repeated-measures ANOVA (SPSS, Inc., release 15.0.0 for Windows, 2006) with the within-subject factors valence (reward and punishment) and drug. Similar to the analysis of the neural data, pair-wise drug comparisons were done according to an a priori-defined fixed order (see above). Span-dependent drug effects on behavior were assessed with correlation analysis (Pearson's rho, bivariate). To this end, individual difference scores between 2 drug sessions of interest (following from the neural data) were calculated for valence-independent and valence-dependent reversal scores and correlated with individual measures of baseline working memory. Greenhouse–Geisser corrections were applied when the sphericity assumption was violated (Howell 1997).
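The transformation stated above, 2 × arcsin(√x), can be sketched as follows. The function name `arcsine_transform` is hypothetical; the original analysis was run in SPSS.

```python
import math


def arcsine_transform(proportion):
    """Variance-stabilizing transform 2 * arcsin(sqrt(p)) for a proportion p."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return 2.0 * math.asin(math.sqrt(proportion))
```

The transform maps proportions from the interval [0, 1] onto [0, π] and stretches the scale near 0 and 1, where raw proportions are compressed.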
Dopaminergic Drugs Elicit Effects on Striatal BOLD Signal During Both Reward and Punishment Reversal Learning
When working memory capacity was taken into account, voxel-wise analyses revealed significant span-dependent drug effects on valence-independent BOLD signal in the striatum. No span-dependent drug effects were seen on valence-dependent BOLD signal. Specifically, working memory capacity predicted the effect of sulpiride relative to bromocriptine on valence-independent BOLD signal in the striatum (left striatum x, y, z = −22, 18, 4, psv_fwe = 0.03, right striatum x, y, z = 32, 8, 0; psv_fwe = 0.045; Table 1; Fig. 2a). Further analyses revealed that these span-dependent effects were driven by effects of sulpiride relative to placebo (left striatum x, y, z = −30, 2, 4, psv_fwe = 0.02; right striatum x, y, z = 24, 14, −8, psv_fwe = 0.04), while there were no effects of bromocriptine relative to placebo. Thus, greater working memory capacity was associated with greater sulpiride-induced increases in striatal BOLD signal during both unexpected rewards and punishments (Fig. 2a). No significant drug effects on valence-independent or valence-dependent BOLD signal were seen when working memory was not taken into account.
|Region||N voxels||T-value||MNI coordinates|
L = left side, R = right side.
The voxel-wise results were strengthened by analyses on mean BOLD signal extracted from the anatomically defined striatum. Repeated-measures ANOVA with the factors drug (4), valence (2), and span as a covariate revealed a significant interaction between drug and span (F16,3 = 4.540, P = 0.007). These effects were restricted to valence-independent signal, as there was no drug × span × valence interaction (F16,3 = 0.46, P = 0.71). There were also no span-independent drug effects (main effect of drug: F16,3 = 0.56, P = 0.65; drug × valence interaction: F16,3 = 0.03, P = 0.99). Subsequent correlation analyses revealed a highly significant relationship between span and drug effects on valence-independent BOLD signal for sulpiride versus bromocriptine (r19 = 0.69, P < 0.001) and for sulpiride versus placebo (r19 = 0.6, P = 0.005) (Fig. 2b), but not for bromocriptine versus placebo (r19 = −0.06, P = 0.8). Similar relationships were observed when BOLD signal was separately assessed for unexpected–expected rewards (sulpiride–bromocriptine: r19 = 0.6, P = 0.005; sulpiride–placebo: r19 = 0.5, P = 0.02) and for unexpected–expected punishment (sulpiride–bromocriptine: r19 = 0.5, P = 0.022; sulpiride–placebo: r19 = 0.46, P = 0.04).
Critically, these span-dependent effects of sulpiride on striatal BOLD signal were attenuated when participants were subsequently treated with bromocriptine. Voxel-wise analyses no longer revealed any significant (span-dependent) effects of the combined administration of sulpiride + bromocriptine relative to the placebo, and mean signal analyses confirmed this lack of effect (r18 = 0.3, P = 0.2). In addition, voxel-wise direct comparison between sulpiride and sulpiride + bromocriptine revealed a trend toward span-dependent effects on valence-independent signal (left striatum x, y, z = −26, 14, −6, psv_fwe = 0.06). This was supported by mean signal analyses, which revealed a similar trend (r18 = 0.4, P = 0.096). See Supplementary Material for confirmation of these effects with a median-split analysis and with correlation analyses using baseline working memory measures taken during intake and placebo only.
Dopaminergic Drugs Elicit Differential Effects on Reward and Punishment Reversal Learning
Consistent with previous work (Cools et al. 2006, 2009), we found differential dopaminergic drug effects on behavioral measures of reward- versus punishment-based reversal learning (Table 2). Repeated-measures ANOVA on reversal scores across participants (irrespective of span) revealed a valence-dependent effect of sulpiride relative to bromocriptine. This was evidenced by a drug (bromocriptine vs. sulpiride) by valence interaction (F1,20 = 4.6, P = 0.045), due to better reward- relative to punishment-based reversal learning after sulpiride compared with bromocriptine. Comparisons of the valence-dependent effects of sulpiride, and of bromocriptine, with placebo did not reach significance (sulpiride vs. placebo: F1,20 = 0.3, P = 0.6; bromocriptine vs. placebo: F1,19 = 1.8, P = 0.2). Valence-dependent reversal scores in the combined sulpiride–bromocriptine session did not differ from those in the bromocriptine session (drug × valence: F1,19 = 0.08, P = 0.8), from those in the sulpiride session (drug × valence: F1,19 = 2.7, P = 0.1), or from those in the placebo session (drug × valence: F1,19 = 1.4, P = 0.3). Together, these data show opposite effects of sulpiride and bromocriptine on reward- relative to punishment-based reversal learning across participants.
|Sulpiride + bromocriptine||0.90(0.02)||0.90(0.02)|
Note: values represent mean proportion (standard error of the mean) of correct responses on the reversal trials immediately after the unexpected reward or punishment (raw proportion scores).
Critically, as was the case for the neural effects, the behavioral effects of sulpiride (vs. placebo) could also be predicted from working memory capacity. Thus, there was a significant positive relationship between working memory capacity and effects of sulpiride (relative to placebo) on valence-dependent reversal scores (reward- minus punishment-based reversal; r19 = 0.5, P = 0.03). Greater working memory capacity was associated with greater sulpiride-induced increases on reward- relative to punishment-based reversal learning (Fig. 2c). See Supplementary Material for confirmation of this effect with baseline working memory measures taken during intake and placebo only.
Dopaminergic Drug Effects on Striatal Signal are Associated with Differential Effects on Reward and Punishment Learning
Next, we investigated brain–behavior associations between the effects of sulpiride on striatal BOLD signal (extracted from the anatomically defined striatum) and on reversal scores. Intriguingly, the effect of sulpiride (relative to placebo) on valence-independent striatal signal during both reward and punishment was positively correlated with the effect of sulpiride (relative to placebo) on valence-dependent (reward vs. punishment) reversal scores (r19 = 0.5, P = 0.02; Fig. 2d). Sulpiride-induced increases in the striatal BOLD signal during unexpected reward and punishment were associated with sulpiride-induced improvement in reward- relative to punishment-based reversal learning. Conversely, sulpiride-induced decreases in striatal signal during unexpected reward and punishment were associated with sulpiride-induced impairments on reward- relative to punishment-based reversal learning.
Physiology, Mood, and Background Neuropsychology
Physiological effects were as predicted, with large prolactin increases after intake of sulpiride (Time2–Time1: T20 = −9.2, P < 0.001) or sulpiride + bromocriptine (Time2–Time1: T20 = −9.7, P < 0.001), and prolactin decreases (Time2–Time1: T20 = 3.4, P = 0.003) and systolic blood pressure decreases (Time3–Time1: T21 = −3.32, P = 0.003) after intake of bromocriptine (Supplementary Table S1). The absolute prolactin decreases after bromocriptine and increases after sulpiride or combined administration were consistent and in the same direction for all participants. Drug effects were not attributable to nonspecific drug effects on mood and global cognitive functioning, as drugs did not affect general cognitive performance or mood (Supplementary Tables S2 and S3).
In the present study, pharmacological fMRI was used to investigate the effects of catecholaminergic drugs on striatal signals during reward and punishment reversal learning. The data add significantly to previous work on the effects of dopaminergic drugs on reward and punishment learning in 3 ways. First, previous work did not allow conclusions about the dopamine dependency of the effects. Here, we demonstrate that effects of the dopamine-receptor antagonist sulpiride on striatal BOLD signal during reward and punishment learning are partly attenuated by coadministration with the dopamine-receptor agonist bromocriptine. This evidences the neurochemical specificity of the effects. Secondly, existing studies have provided conflicting data regarding drug effects on striatal responses during punishment. We show that sulpiride-induced increases in the striatal BOLD signal during both rewards and punishments were associated, on a subject-by-subject basis, with opposite effects on behavior, that is, improvement in reward versus punishment reversal learning. Finally, our findings extend existing observations that behavioral effects of dopamine agents on learning depend on working memory capacity (Frank and O'Reilly 2006; Cools 2011), presumably reflecting baseline dopamine synthesis capacity (Cools et al. 2008; Landau et al. 2009). Here, we show that not only behavioral, but also neural effects on learning in the striatum strongly depend on individual differences in working memory capacity.
The present findings concur generally with previous work by Jocham et al. (2011), who showed beneficial effects of the dopamine-receptor antagonist amisulpride on reward-based choice. The degree to which amisulpride enhanced the ability to select the better of 2 rewarding options correlated with the degree to which amisulpride potentiated reward prediction error signals during learning in the striatum (cf. Frank and O'Reilly 2006 for similar behavioral effects). However, this striatal finding by Jocham et al. (2011) was in direct contrast with that reported by Pessiglione et al. (2006), who showed that, relative to l-dopa, the dopamine-receptor antagonist haloperidol impaired reward learning and attenuated reward prediction error signals in the striatum. The present study suggests that these paradoxical effects on striatal processing might reflect individual differences in the baseline state of the system. Thus, the dopamine-receptor antagonist sulpiride potentiated reward- and punishment-related BOLD signal in the striatum in participants with high working memory capacity, while attenuating striatal BOLD signal in participants with low working memory capacity.
No previous studies revealed drug effects on striatal signals during punishment learning. This lack of effects had suggested an important role for striatal dopamine in reward, but not punishment learning. However, the present study demonstrates that dopamine affects both reward and punishment prediction error signals in the striatum in the same direction. Unlike previous studies, we might have revealed these effects during both outcomes because, in our paradigm, reward and punishment events were matched in terms of frequency of occurrence as well as the need for behavioral adjustment. Importantly, these changes in valence-independent striatal BOLD signal were directly associated with valence-dependent reversal learning. Thus, drug-related increases in striatal signal during both rewards and punishments were associated with drug-related increases in reward relative to punishment learning. At first sight, this finding might seem surprising, given studies that argue that striatal responses represent valence-independent signals associated with a mismatch between expected and actual outcomes, saliency, or valence-nonspecific switching (Redgrave et al. 1999; Zink et al. 2003; Jensen et al. 2007; Matsumoto and Hikosaka 2009). However, if the striatal signal had reflected such nonspecific processing, then its potentiation should have been associated with behavioral improvement in both reward and punishment learning. Instead, we observed that its potentiation was associated with improved reward relative to punishment learning. Accordingly, we argue that our parallel effects likely reflect the modulation of overlapping and/or intermingled representations of reward and punishment prediction errors (Seymour et al. 2007; Robinson et al. 2010). This interpretation concurs with recent insights suggesting that striatal BOLD signal increases with increased dopamine release (Knutson and Gibbs 2007; Schott et al. 2008; Duzel et al. 2009).
Indeed, we have previously shown that increased dopamine activity is associated with increased reward relative to punishment learning (Cools et al. 2009). Furthermore, striatal BOLD signal has been proposed to relate specifically to postsynaptic D1 signaling (Knutson and Gibbs 2007). This is particularly interesting because theoretical modeling work suggests that improvement in reward relative to punishment learning might reflect a shift in the balance between D1-receptor-mediated GO pathway processing and D2-receptor-mediated NOGO pathway processing (Frank 2005). Thus, sulpiride-induced changes in striatal BOLD might reflect a shift in the D1/D2-receptor signaling balance. On this account, increases in striatal BOLD reflect a shift in favor of D1-receptor-mediated GO pathway processing, improving reward relative to punishment learning, whereas decreases in striatal BOLD reflect a shift away from the D1-receptor-mediated GO pathway, impairing reward relative to punishment learning. In this context, it is not surprising that changes in reward relative to punishment learning are accompanied by parallel changes in striatal signal during both rewards and punishments.
Moreover, the present study employed a Pavlovian task, which required the prediction of outcomes rather than the use of outcomes for instrumental action control. Similarly, above-baseline striatal signals for reward and punishment prediction errors were previously observed in nondrug studies, mostly in studies using Pavlovian tasks (Seymour et al. 2005, 2007; Delgado et al. 2008; Robinson et al. 2010). Indeed, accumulating evidence indicates the sensitivity of Pavlovian control to dopamine manipulation (Dickinson et al. 2000; Wyvell and Berridge 2000; Flagel et al. 2011).
The effects depended on baseline working memory capacity. Greater working memory capacity was associated with greater sulpiride-induced improvement on reward versus punishment learning and with greater potentiation of striatal signals. Based on previous work (Cools et al. 2008; Cools et al. 2009), we argue that this dependency on working memory capacity reflects dependency on baseline dopamine synthesis capacity, so that improvement with the dopamine-receptor antagonist sulpiride was particularly prominent in participants with higher baseline dopamine levels. Such individual differences in drug effects might reflect individual differences in pre- versus postsynaptic D2-receptor sensitivity (Frank and O'Reilly 2006; Cools et al. 2009), with high-dopamine participants being particularly sensitive to the presynaptic effects of sulpiride. Thus, D2-receptor blockade with sulpiride might have improved reward versus punishment learning in high-dopamine participants, because it blocked the presynaptic autoregulation of dopamine levels, leading to a net increase in postsynaptic (D1)-receptor stimulation. Similarly, it has been argued that low doses of D2-receptor antagonists might in fact act as stimulants in healthy participants by acting on the more sensitive presynaptic autoreceptors (Frank and O'Reilly 2006; Santesso et al. 2009; Jocham et al. 2011). However, a definitive account of the direction of drug effects remains a major challenge in dopaminergic drug research, because the determinants of pre- versus postsynaptic action of dopamine-receptor agents are unknown. Accordingly, we refrain here from definitively interpreting the direction of the drug effect.
Coadministration of the dopamine-receptor antagonist sulpiride and the dopamine-receptor agonist bromocriptine diminished the neural effects of sulpiride, suggesting that sulpiride exerted its effects via action at dopamine receptors that are targeted by both sulpiride and bromocriptine. Because both bromocriptine and sulpiride have high affinity for D2-receptors, these effects are likely mediated by D2-receptor mechanisms in the striatum. Previous human pharmacological work did not provide such evidence, because the drugs used were not neurochemically specific. Bromocriptine itself had no significant effect on striatal signals when tested relative to placebo. Perhaps effects did not reach significance due to the time of testing relative to administration. The time of testing was optimized for the current fMRI paradigm and for a subsequent behavioral task. Accordingly, participants were tested 1¾h after bromocriptine administration, relative to 2½h in our prior study, which did reveal effects of bromocriptine (Cools et al. 2009). Similar to that previous study, bromocriptine (but not sulpiride) did affect performance on the subsequent set-shifting task administered 3½h after bromocriptine intake in the same participants (van Holstein et al. 2011). This later effect is consistent with the timing account. Nevertheless, coadministration of bromocriptine did abolish the effects of sulpiride, suggesting that the effects of bromocriptine were sufficient to block those of sulpiride. Moreover, the lack of a significant effect of bromocriptine relative to placebo is important for the interpretation of our results, because it demonstrates that the neural effect of sulpiride is blocked rather than masked (or averaged out) by bromocriptine. The implication of this pattern of findings is that sulpiride exerted at least part of its effects via action at dopamine (D2-) receptors.
In summary, the present study investigated dopaminergic drug effects on striatal BOLD signaling during reward and punishment learning. Results revealed that sulpiride induced parallel working memory-dependent changes in striatal BOLD signal during unexpected rewards and punishments. These effects were partially dopamine dependent, as they were blocked by coadministration with bromocriptine. Moreover, sulpiride-induced increases in the striatal BOLD signal during both outcomes were associated with behavioral improvement in reward versus punishment reversal learning. These results support current theories suggesting that drug effects on reward and punishment learning are mediated by striatal dopamine (Frank 2005).
These results have implications not just for understanding how dopamine alters learning, but also for the treatment of dopamine-related disorders in psychiatry. For example, sulpiride is commonly used to treat psychosis. One mechanism by which dopaminergic abnormality might elicit psychosis is by inducing aberrant assignment of salience or motivational importance to irrelevant stimuli (Kapur et al. 2005). Our results suggest that sulpiride might remediate psychosis in schizophrenia by attenuating aberrant reward relative to punishment prediction updating and associated striatal processing in a dopamine-dependent fashion (Morrison and Murray 2009). This hypothesis concurs with the observation that schizophrenia is accompanied by low working memory capacity (Barch and Ceaser 2011). More generally, our findings highlight the need for individual tailoring of dopaminergic drug administration in psychiatry.
R.C. was supported by a VIDI grant from the Innovational Research Incentives Scheme of the Netherlands Organization for Scientific Research (NWO) and a research grant to Kae Nakamura, R.C., and Nathaniel Daw from the Human Frontier Science Program (RGP0036/2009-C).
We thank Jan Leijtens for blood sampling and assistance during medical screening. We thank Joris van Elshout for assistance with data collection. We thank the Chemical Endocrinology Department at the Radboud University Nijmegen, especially Fred Sweep and Rob van den Berg, for analyzing the blood samples. Conflict of Interest: None declared.