Learning to predict rewarding and aversive outcomes is based on the comparison between predicted and actual outcomes (prediction error: PE). Recent electrophysiological studies reported that during a Pavlovian procedure some dopamine neurons code a classical PE signal while a larger population of dopaminergic neurons reflects a “salient” prediction error (SPE) signal, being excited both by unpredictable aversive events and by rewards. Yet, it is still unclear whether specific human brain structures receiving afferents from dopaminergic neurons code an SPE and whether this signal depends upon reinforcer type. Here, we used a model-based functional magnetic resonance imaging approach implementing a reinforcement learning model to compute the PE while subjects underwent a Pavlovian conditioning procedure with 2 types of rewards (pleasant juice and monetary gain) and 2 types of punishments (aversive juice and aversive picture). The results revealed that activity of a brain network composed of the striatum, anterior insula, and anterior cingulate cortex covaried with an SPE for appetitive and aversive juice. Moreover, amygdala activity correlated with an SPE for these 2 reinforcers and for aversive pictures. These results provide insights into the neurobiological mechanisms underlying the ability to learn stimulus-reward and stimulus-punishment contingencies, by demonstrating that the network reflecting the SPE depends upon reinforcer type.
Rewards and punishments have opposite hedonic valences but both are motivationally salient events. A fundamental question is whether motivational salience and hedonic valence, which represent 2 distinct but closely related attributes of reward and punishment, are separately encoded by the brain. Animals, including humans, learn to associate various stimuli with different types of rewards and punishments. Central for such behavior is the capacity to compute the discrepancy between the prediction and the actual outcome (prediction error: PE). Based on a wealth of evidence from electrophysiological recording studies in nonhuman primates, rodents, and humans, it has been widely assumed that dopaminergic neurons encode a reward PE (RPE), with a positive phasic response when the outcome is better than expected (unexpected reward or omission of expected punishment) and a negative response when it is worse than expected (unexpected punishment or omission of expected reward) (Schultz 1998; Bayer and Glimcher 2005; Pan et al. 2005; Roesch et al. 2007; Zaghloul et al. 2009). According to this hypothesis, referred to as the RPE hypothesis, the sign of the PE is opposite for rewards and punishments.
However, in awake monkeys, recent recordings from the same dopaminergic neurons for rewards and aversive events point to the coexistence of a phasic dopaminergic signal encoding biologically salient events conveying both positive and negative information (Matsumoto and Hikosaka 2009b). During a Pavlovian procedure, one class of dopaminergic neurons located ventromedially, some in the VTA, is excited by unexpected rewards and inhibited by unexpected aversive stimuli, as predicted by the RPE hypothesis. Yet, a larger subpopulation of dopamine neurons, located more dorsolaterally in the substantia nigra pars compacta, is excited both by unpredictable rewards and by unpredictable aversive stimuli, as a salient PE (SPE) hypothesis would predict. Moreover, recent results in rodents confirm that, while some dopaminergic neurons of the VTA are inhibited by aversive stimuli, others are excited by these same stimuli (Brischoux et al. 2009). These findings suggest that different groups of dopamine neurons convey RPE and SPE signals, shedding light on increased striatal dopamine levels observed not only during appetitive conditioning (Reynolds et al. 2001) but also during aversive conditioning (Pezze and Feldon 2004; Young 2004). Together, these results raised the possibility of the coexistence of 2 brain networks active during the learning of associations between cues and rewards or punishments: a reward brain network, treating reward and punishment in opposite ways (opposite hedonic valences), and a salient brain network, which treats them in a similar manner as motivationally salient events.
To date, most human functional magnetic resonance imaging (fMRI) studies using computational reinforcement learning models investigated which brain systems show responses consistent with the RPE when learning associations between conditioned stimuli and different types of rewards, such as juice (Berns et al. 2001; McClure et al. 2003; O'Doherty et al. 2004), odor (Gottfried et al. 2003), money (Tanaka et al. 2004; Abler et al. 2006; Pessiglione et al. 2006), and attractive faces (Bray and O'Doherty 2007). Paralleling these investigations, a number of human fMRI studies also investigated the cerebral substrates of aversive conditioning using a variety of punishments, such as painful stimuli, aversive juices, odors, tones, or visual stimuli (LaBar et al. 1998; Buchel et al. 1999; Gottfried, Deichmann, et al. 2002; Seymour et al. 2004; Knight et al. 2010; Sarinopoulos et al. 2010) or monetary losses (Delgado et al. 2000, 2008; Knutson et al. 2000; Nieuwenhuis et al. 2005). However, these fMRI studies either investigated the RPE and the PE related to aversive stimuli separately, in designs using only positive or only aversive events, or did not vary the type of reinforcer (e.g., only used monetary gains and losses). Thus, it is still unclear whether the regions with response profiles consistent with the SPE and RPE signals depend upon the reinforcer type.
Only a task design simultaneously manipulating both positive and negative outcomes, as well as reinforcer types, would allow us to directly investigate whether separate or overlapping targets of dopaminergic neurons are consistent with an RPE or with an SPE signal. Using a computational reinforcement learning approach in a Pavlovian conditioning procedure combining appetitive and aversive juice, money and aversive pictures, we investigated whether the RPE and SPE signals depend upon reinforcer types. We chose to investigate the cerebral representations of PE and SPE related to 2 appetitive cues: one immediate and gustatory (apple juice), the other visual and delayed (money), and 2 aversive cues, both indicating an immediate aversive outcome (aversive picture and salty water).
Brain regions in which the activity reflects an RPE signal should represent reward and punishment in opposite fashion, responding with a positive PE when the outcome is better than expected (unexpected reward or omission of expected punishment) and a negative PE when it is worse than expected (unexpected punishment or omission of expected reward). In contrast, brain regions with response profiles consistent with an SPE signal should treat both rewards and punishments in a similar fashion, as salient outcomes, with a positive response when both of these salient events are delivered and a negative response for their omission (Fig. 1).
Materials and Methods
Twenty healthy subjects (10 females) with no history of neurological or psychiatric illness participated in the experiment (mean age: 24.4; range: 18–33). All subjects were right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield 1971). Subjects were drink deprived for 12 h prior to scanning to ensure that they remained thirsty during the experiment, and they drank only 2.4 mL of liquid over the course of the experiment, which kept them in a drink-deprived state. The study was approved by the local ethics committee (Lyon) and all subjects gave informed consent.
The task used a Pavlovian conditioning procedure in which affectively neutral visual cues were paired with 4 types of reinforcers: apple juice, salty water, money, or aversive picture (Fig. 1A). Each trial was divided into 2 phases: anticipation and reception. The anticipation phase began with a cue (geometric form) displayed until a button press was made (within 1 s). This cue was followed by a delay period of 6 s displaying a fixation cross. Then, in the reception phase, either the corresponding reinforcer or a scrambled picture was presented for 1.5 s, in a 50% reinforcement schedule. Each reinforced trial was followed by real consequences: 1) in the apple juice condition, 0.05 mL of apple juice was delivered into the subjects’ mouth while they were presented with a picture representing a glass of apple juice; 2) in the salty water condition, 0.05 mL of salty water was delivered while they were presented with a picture representing a brown glass of water; 3) in the aversive picture condition, an aversive picture was presented; 4) in the monetary reward condition, subjects were presented with a 20 Euros bill picture and were informed that they would earn a percentage of each of these bills at the end of the experiment. Finally, a blank screen was used as an intertrial interval of variable duration (2.5–5.5 s) (Fig. 1B). In a fifth condition (neutral condition), a cue announced the neutral scrambled picture with certainty. To maintain the subjects’ attention, they were asked to press a response button as soon as they saw the cue. Subjects were explicitly informed that the delivery of the reinforcer was independent of their responses and knew that the cue would disappear after 1 s, even if they did not make a key press.
Stimuli and Reinforcers
Visual stimuli were back projected on a screen located at the head of the scanner bed and presented to the subjects through an adjustable mirror located above their head. We investigated the neural representations of PE and SPE related to 2 appetitive cues (apple juice and money) and 2 aversive cues (aversive picture and salty water). The presentation of the stimuli as well as the juice delivery were controlled by Presentation software (Neurobehavioral Systems), which also recorded trigger pulses from the scanner signaling the beginning of each volume acquisition.
We used 2 liquids of opposite valence: appetitive apple juice and aversive salty water consisting of 0.2 M NaCl. They were contained in two 60-mL syringes, connected to an IVAC P7000 electronic pump positioned in the scanner control room. For each reinforced trial in the taste conditions, 0.05 mL of liquid was delivered into the subjects’ mouth via 2 separate polyethylene tubes (6 m long, 1 mm wide). This small amount of liquid was chosen to minimize any satiety effect that could occur during the experiment. In order to reduce head movement related to swallowing, subjects were instructed to swallow only during the intertrial interval, after the reinforcer offset and before the new trial onset.
For the monetary reward trials, a picture of a 20 Euros bill was presented at the center of the screen and subjects were told that they would earn a percentage of this amount after the experiment. Subjects were not told the exact percentage to avoid counting during the experiment. By the end of the experiment, each subject had seen 24 bills and earned 20€, in addition to the 50€ earned for being scanned. Thus, subjects were paid a fixed amount of 70€ for their participation but did not know this amount before the end of the experiment. Finally, in the aversive picture condition, reinforced trials showed a highly repelling picture from the International Affective Picture System (Lang et al. 2005) depicting a mutilated face (picture no. 3060: valence = 1.79 ± 1.56; arousal = 7.12 ± 2.09). All unreinforced trials consisted of the presentation of a single scrambled neutral picture.
There were intrinsic differences between the 4 reinforcers due to their specific nature, since they had distinct valence (appetitive/aversive), modality (visual only/visual + gustatory) and were of different types (primary/secondary). Each reinforcer can be characterized in the following way: apple juice: appetitive, gustatory + visual, immediate (primary); monetary reward: appetitive, visual, delayed (secondary); salty water: aversive, gustatory + visual, immediate (primary); aversive picture: aversive, visual, immediate (primary). Thus, the 4 conditions differed on more than one factor. This is not a major problem, however, because we did not perform one-to-one comparisons between reinforcers.
Note that we did not include monetary losses in the experiment because monetary gains and losses are known to weigh differently (Tom et al. 2007) and because 1) it can be questioned whether monetary loss acts as a primary punishment like aversive liquids or aversive pictures; 2) monetary losses are the removal of a valued appetitive stimulus (type II punishment) whereas physical punishments (type I punishment) are the administration of an aversive stimulus (Skinner 1938).
In fact, monetary losses and physical punishments may be coded in opposite directions during aversive conditioning, with unexpected monetary losses leading to a decrease in striatal blood oxygen level–dependent (BOLD) response (Delgado et al. 2000; Yacubian et al. 2006), whereas unexpected painful electrical stimulation may lead to an increase in striatal BOLD response (Seymour et al. 2004; Menon et al. 2007). Since we did not include monetary losses, it should be noted that our design is not symmetric. However, if we had chosen to use only money (gains and losses) and juice (appetitive and aversive), we could not have made generalizations about PE and SPE coding for primary reinforcers because we would have included only one type of primary reward/punishment (juice).
The experiment consisted of 3 scanning runs of 15 min, separated by short breaks during which the echo-planar image (EPI) sequence was stopped, allowing the subject to rest. In each run, the 5 conditions were presented in blocks of 16 successive trials. Each run was pseudorandomly ordered according to a Latin square design so that each of the 5 conditions appeared only once within a run and at a different serial position in each run (e.g., run I: 12345, run II: 43521, run III: 51432, where 1, 2, 3, 4, and 5 correspond to the 5 conditions). The order of the runs was also counterbalanced across subjects. In each run, a new cue was used for each condition and subjects had to learn the probabilistic association between this cue and the corresponding reinforcer. Trials from the different conditions were not intermixed, to avoid relative comparisons between the values of the different reinforcers. Indeed, a large body of literature reports context-dependent activity in different components of the reward system (Tremblay and Schultz 1999; Nieuwenhuis et al. 2005). This design allowed us to test the SPE hypothesis versus the RPE hypothesis (Fig. 1C), while distinguishing between the modality (gustatory or visual) and the nature (primary/immediate or secondary/delayed) of the reinforcers.
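The ordering constraint described above can be checked mechanically. The following is a minimal sketch that uses the three example run orders quoted in the text; the function names are hypothetical and the check covers only the two stated properties (each condition once per run, and a different serial position in each run).

```python
# Example block orders quoted in the text (conditions labelled 1-5).
run_orders = {"run I": "12345", "run II": "43521", "run III": "51432"}


def positions_by_condition(orders):
    """Map each condition label to its serial position in every run."""
    positions = {}
    for order in orders.values():
        for serial_position, condition in enumerate(order, start=1):
            positions.setdefault(condition, []).append(serial_position)
    return positions


def is_valid_ordering(orders, conditions="12345"):
    """True if every run contains each condition exactly once and no
    condition occupies the same serial position in two different runs."""
    each_once = all(sorted(o) == sorted(conditions) for o in orders.values())
    positions = positions_by_condition(orders)
    all_distinct = all(len(set(p)) == len(p) for p in positions.values())
    return each_once and all_distinct


ordering_ok = is_valid_ordering(run_orders)
```

With three runs and five serial positions, the design is an incomplete Latin square: the check confirms that no condition repeats a position, not that every position is used by every condition.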
Before the scanning session, subjects performed a two-alternative forced-choice preference task designed to investigate if they had a preference for cues explicitly associated with positive reinforcers relative to cues associated with negative reinforcers. On each trial (72 trials total), 2 out of 4 possible cues explicitly indicating both the type of reinforcer and the chance to be reinforced (P = 0.5) were presented side-by-side and subjects had to choose which one they preferred by a left or right key press. There were 6 different pairs of cues, repeated 12 times each, and the cues were randomly assigned to the left or right side of the screen. After the decision, the chosen reinforcer was effectively delivered with a probability of P = 0.5 (i.e., the cues predicting apple juice or salty water were followed in 50% of the trials by the simultaneous presentation of the corresponding glass and by the delivery of 0.05 mL of liquid in the mouth of the subject). For each reinforcer, we computed a preference score (percent chosen if available) as the number of times one cue was chosen divided by the number of times this cue was presented.
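As an illustration of the preference score defined above (times chosen divided by times presented), the following sketch builds the 72-trial structure from the 6 cue pairs repeated 12 times each. The choice pattern is a hypothetical subject invented for the example, not data from the study, and the function name is an assumption.

```python
from itertools import combinations

CUES = ["apple juice", "money", "salty water", "aversive picture"]


def preference_scores(trials):
    """trials: iterable of (first_cue, second_cue, chosen_cue) tuples.
    Returns, for each cue, times chosen / times presented."""
    presented = {cue: 0 for cue in CUES}
    chosen = {cue: 0 for cue in CUES}
    for first, second, choice in trials:
        presented[first] += 1
        presented[second] += 1
        chosen[choice] += 1
    return {cue: chosen[cue] / presented[cue] for cue in CUES if presented[cue]}


# Hypothetical subject who always picks the cue listed earlier in CUES.
pairs = list(combinations(CUES, 2))           # 6 distinct pairs
trials = [(a, b, a) for a, b in pairs] * 12   # 6 pairs x 12 repetitions
scores = preference_scores(trials)
```

Each cue appears in 3 of the 6 pairs, so it is presented 36 times over the 72 trials, which is the denominator of its score.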
During scanning, to assess whether the probabilities of the cue-reinforcer association were explicitly learnt, subjects were asked, at the end of each block, to rate the probability that each cue was associated with the reinforcer. Such rating was done by positioning a cursor on a continuous scale from 0 to 1, representing an estimate of the probability that a cue was paired with a specific reinforcer.
Finally, to assess the value of each reinforcer after the scanning session, subjects were also asked to provide pleasantness ratings for each reinforcer on a scale ranging from −2 (very unpleasant) to 2 (very pleasant). Subjects were also asked to rate their thirst on a scale ranging from 1 (not thirsty at all) to 5 (extremely thirsty), both before and after scanning. The a priori significance level was defined at P < 0.05 for all the behavioral tests.
fMRI Data Acquisition and Preprocessing
fMRI data were acquired on a 1.5 Tesla Siemens MRI scanner. BOLD signal was measured with gradient echo T2*-weighted EPIs. Twenty-six interleaved slices parallel to the AC-PC line were acquired per volume (matrix 64 × 64, voxel size = 3.4 × 3.4 × 4 mm). In total, 410 volumes were acquired continuously every 2.5 s for each of the 3 runs. The first 4 volumes of each run were discarded to allow the BOLD signal to reach a steady state. A T1-weighted structural image (1 × 1 × 1 mm) was also acquired for each subject at the end of the experiment.
Data were preprocessed using the SPM5 software package. First, outlier scans (>1.5% variation in global intensity or >0.5 mm/repetition time scan-to-scan motion) were detected using the ArtRepair SPM toolbox (http://spnl.stanford.edu/tools/ArtRepair/ArtRepair.htm; Mazaika et al. 2009). Because fewer than 5% of scans per subject were detected as outliers, no repair was performed. Then, images were corrected for slice timing and spatially realigned to the first image from the first run. They were normalized to SPM5’s EPI template in Montreal Neurological Institute (MNI) space with a resampled voxel size of 3 × 3 × 3 mm and spatially smoothed using a Gaussian kernel with full-width at half-maximum of 8 mm. The T1-weighted structural scan of each subject was normalized to a standard T1 template in MNI space with a resampled voxel size of 1 × 1 × 1 mm.
We computed the value of the PE for each subject according to the sequence of stimuli they received, providing a statistical regressor for the fMRI data. The use of a probabilistic reinforcement strategy, in which the cues are only 50% predictive of their outcomes, ensures constant learning and updating of predictions and generates both positive and negative PE throughout the course of the experiment, therefore maximizing the variability of PE over each run.
Predicted values and PE values were calculated trial-by-trial by using a Rescorla–Wagner rule (Rescorla and Wagner 1972). For each trial t, a PE δ(t) was computed as the difference between the actual outcome value R(t) and its predicted value V(t) on that trial (eq. 1):
$$\delta(t) = R(t) - V(t) \qquad (1)$$
Then, the predicted value of the next trial V(t + 1) was updated by adding the PE δ(t) weighted by a learning rate α (eq. 2):
$$V(t+1) = V(t) + \alpha\,\delta(t) \qquad (2)$$
The outcome value R(t) was set to 1 when a reinforcer (either a reward or a punishment) was delivered and to 0 when a scrambled picture was delivered. V(t) was initialized to 0. The learning rate (α) was derived from subjects’ response times (RTs) to the cue. RTs have been shown to be good indicators of conditioning (Critchley et al. 2002; Gottfried et al. 2003) and to be correlated with the prediction V(t) estimated by a reinforcement learning model (Seymour et al. 2004). Several recent fMRI studies have used RTs to estimate the learning rate of reinforcement learning models during tasks in which button presses had no bearing on reward delivery (Seymour et al. 2005; Bray and O'Doherty 2007).
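The trial-by-trial update can be sketched in a few lines. This is a minimal illustration of the Rescorla–Wagner rule with the coding stated above (R = 1 for a delivered reinforcer, R = 0 for the scrambled picture, V initialized to 0); the function name and the learning rate of 0.5 in the example are placeholders chosen to make the arithmetic easy to follow.

```python
def rescorla_wagner(outcomes, alpha, v0=0.0):
    """Return per-trial predictions V(t) and prediction errors delta(t).

    outcomes: sequence of R(t), 1 if a reinforcer was delivered, else 0.
    alpha:    learning rate (illustrative value supplied by the caller).
    v0:       initial prediction, 0 as stated in the text.
    """
    values, errors = [], []
    v = v0
    for r in outcomes:
        delta = r - v            # PE: actual outcome minus prediction
        values.append(v)
        errors.append(delta)
        v = v + alpha * delta    # prediction carried to the next trial
    return values, errors


# Reinforced, unreinforced, reinforced.
values, errors = rescorla_wagner([1, 0, 1], alpha=0.5)
```

Because outcomes are coded identically for rewards and punishments, the same PE series serves both hypotheses; they differ only in the sign with which the aversive conditions enter the contrasts.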
First, RTs were normalized to allow analysis across subjects. We derived the prediction V(t) for each subject based on their individual conditioning histories for a range of learning rates (ranging from 0.01 to 0.5). Then, trial-by-trial RTs across subjects were fitted to a regression model that included the prediction V(t). The best fit yielded a learning rate of 0.24, which is close to the value used in other studies (O'Doherty, Dayan, et al. 2003; Seymour et al. 2005; Jensen et al. 2007). Note, however, that the qualitative behavior of the model is robust to a range of such parameters (between 0.1 and 0.5).
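A grid search of this kind could look as follows. The sketch rests on several assumptions not stated in the text: RTs are normalized by z-scoring within each block, the regression is ordinary least squares with V(t) as the only predictor, the grid step is 0.01, and every block starts from V = 0. All function names are hypothetical.

```python
def predictions(outcomes, alpha):
    """Rescorla-Wagner predictions V(t), starting from V = 0."""
    v, out = 0.0, []
    for r in outcomes:
        out.append(v)
        v += alpha * (r - v)
    return out


def zscore(xs):
    """Assumed RT normalization: zero mean, unit variance within a block."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]


def sse_linear_fit(x, y):
    """Residual sum of squares of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def fit_learning_rate(blocks, alphas):
    """blocks: list of (outcomes, rts) pairs pooled across subjects.
    Returns the alpha whose predictions best explain the normalized RTs."""
    best_alpha, best_sse = None, float("inf")
    for alpha in alphas:
        xs, ys = [], []
        for outcomes, rts in blocks:
            xs.extend(predictions(outcomes, alpha))
            ys.extend(zscore(rts))
        sse = sse_linear_fit(xs, ys)
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    return best_alpha


# Candidate learning rates from 0.01 to 0.5.
alphas = [round(0.01 * k, 2) for k in range(1, 51)]
```

Calling `fit_learning_rate(blocks, alphas)` on pooled data returns a single group-level learning rate, matching the across-subject fit described in the text.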
fMRI Data Analysis
First, statistical analysis was performed using the general linear model. For each of the 5 conditions (apple juice, monetary reward, salty water, aversive picture, and neutral), 2 phases (anticipation and reception) were modeled, resulting in the creation of 10 regressors. The anticipation phase was modeled as an epoch, time locked to the onset time of the cue, with a duration equal to RT + anticipatory period (=6 s). The reception phase was modeled as a boxcar of 1.5-s duration. For each reinforced condition (i.e., all conditions except the neutral condition), predicted values V(t) and PE δ(t) generated by the Rescorla–Wagner model were used as parametric modulators of the anticipation and reception regressors, respectively. All of these 18 regressors were convolved with a canonical hemodynamic response function. In addition, the 6 ongoing motion parameters estimated during realignment were included as regressors of no interest.
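To make the construction of a reception regressor and its PE modulator concrete, the sketch below builds a 1.5-s boxcar and a PE-weighted copy, then convolves both with a double-gamma response function. The response-function parameters (gamma shapes 6 and 16, undershoot ratio 1/6), the 0.1-s time grid, and the mean-centering of the modulator are assumptions made for illustration; the text states only that a canonical hemodynamic response function was used.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(dt, length=32.0):
    """Assumed double-gamma response sampled every dt seconds, unit sum."""
    n = int(round(length / dt))
    h = [gamma_pdf(i * dt, 6.0) - gamma_pdf(i * dt, 16.0) / 6.0 for i in range(n)]
    area = sum(h)
    return [x / area for x in h]


def convolve(signal, kernel):
    """Causal discrete convolution truncated to the signal length."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out


def reception_regressors(onsets, pe, duration=1.5, dt=0.1, total=60.0):
    """Return (main regressor, PE-modulated regressor) on a dt grid."""
    n = int(round(total / dt))
    mean_pe = sum(pe) / len(pe)
    box, mod = [0.0] * n, [0.0] * n
    for onset, delta in zip(onsets, pe):
        start = int(round(onset / dt))
        stop = min(n, start + int(round(duration / dt)))
        for i in range(start, stop):
            box[i] = 1.0
            mod[i] = delta - mean_pe   # mean-centred modulator (assumption)
    hrf = double_gamma_hrf(dt)
    return convolve(box, hrf), convolve(mod, hrf)


# Two hypothetical reception events with a positive and a negative PE.
main, modulated = reception_regressors([5.0, 20.0], [1.0, -0.5])
```

The anticipation regressors would follow the same pattern, with an epoch of RT + 6 s and V(t) in place of δ(t).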
Second, we calculated first-level single-subject contrasts at the time of the reception for: 1) apple juice delivery and omission positively modulated by PE (C1); 2) monetary reward delivery and omission positively modulated by PE (C2); 3) aversive juice delivery and omission positively modulated by PE (C3); 4) aversive picture delivery and omission positively modulated by PE (C4); 5) aversive juice delivery and omission negatively modulated by PE (C5); and 6) aversive picture delivery and omission negatively modulated by PE (C6).
Third, we performed 2 second-level one-way flexible factorial designs (described below), each including a subject factor accounting for between-subject variability, in which we used a series of conjunctions testing the conjunction null hypothesis, as implemented in SPM5 (Nichols et al. 2005).
SPE Flexible Factorial Design
This flexible factorial design included the contrasts C1–C4 described above. This analysis was used to identify an “SPE brain network,” defined as a set of brain regions responding in the same way for positive and negative reinforcers: that is, showing high BOLD signal when an unexpected reward or punishment is delivered and low BOLD signal for its unexpected omission (Fig. 1C, top). First, we searched for a “global SPE brain network,” by performing a formal conjunction analysis of brain regions showing a positive correlation with the PE for all 4 reinforcers. Next, we investigated a “primary SPE brain network” by performing a conjunction analysis of the brain regions showing positive correlations with the PE for each of the primary reinforcers (apple juice, salty water, and aversive picture). Then, we investigated a “gustatory SPE brain network” by performing a conjunction analysis of the regions showing a positive correlation with the PE for the 2 gustatory conditions (apple juice and salty water). Finally, we searched for a “visual SPE brain network” by performing a conjunction analysis of the brain regions showing positive correlations with PE for the 2 visual conditions (money and aversive picture).
RPE Flexible Factorial Design
This flexible factorial design included the contrasts C1, C2, C5, and C6. This analysis was used to identify an “RPE brain network,” defined as a set of brain regions responding to positive and negative reinforcers in opposite ways: that is, showing high BOLD signal when an unexpected reward is delivered and an unexpected punishment is omitted and low BOLD signal for an unexpected reward omission and unexpected punishment delivery (Fig. 1C, bottom). Again, we first tested for a “global RPE brain network” by performing a conjunction analysis across the brain regions showing a positive correlation with the PE for the appetitive conditions and the regions showing a negative correlation with PE for the aversive conditions. We also performed specific conjunction analyses to look for brain regions in which the RPE signal explains the BOLD signal for the primary reinforcers, the gustatory reinforcers and finally, the visual reinforcers.
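The difference between the two global conjunctions can be illustrated with a minimum-statistic test over per-reinforcer T values. In this sketch the T values and the threshold are invented for illustration, the function names are hypothetical, and the negative contrasts C5 and C6 are obtained by flipping the sign of the T values for the corresponding positive contrasts C3 and C4.

```python
def conjunction(t_maps, threshold):
    """Minimum-statistic conjunction: a voxel passes only if its T value
    exceeds the threshold in every map."""
    n_voxels = len(t_maps[0])
    return [min(m[v] for m in t_maps) > threshold for v in range(n_voxels)]


def flip(t_map):
    """T values of the negative contrast, given those of the positive one."""
    return [-x for x in t_map]


def spe_conjunction(t, threshold):
    # C1-C4: positive PE modulation for all 4 reinforcers.
    return conjunction(
        [t["apple juice"], t["money"], t["salty water"], t["aversive picture"]],
        threshold,
    )


def rpe_conjunction(t, threshold):
    # C1, C2 positive; C5, C6 negative modulation for the aversive conditions.
    return conjunction(
        [t["apple juice"], t["money"], flip(t["salty water"]), flip(t["aversive picture"])],
        threshold,
    )


# Hypothetical T values for 3 voxels: SPE-like, RPE-like, unresponsive.
t_values = {
    "apple juice": [5.0, 4.0, 0.5],
    "money": [4.0, 5.0, 0.2],
    "salty water": [6.0, -4.0, 0.1],
    "aversive picture": [4.5, -5.0, -0.3],
}
spe_mask = spe_conjunction(t_values, threshold=3.0)
rpe_mask = rpe_conjunction(t_values, threshold=3.0)
```

The restricted conjunctions (primary, gustatory, visual) follow by passing the corresponding subset of maps to `conjunction`.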
Activation Localization and Reported Statistics
Anatomic labeling of activated regions was done using the SPM Anatomy toolbox (Eickhoff et al. 2005) and the probabilistic atlas of Hammers et al. (2003). Reported coordinates conform to the MNI space. Activations from whole brain analysis are reported with a threshold of P < 0.05 after false discovery rate (FDR) correction for multiple comparisons.
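For readers unfamiliar with FDR thresholding, the sketch below implements the Benjamini–Hochberg step-up procedure on a list of voxelwise P values. The text does not specify which FDR procedure was applied, so this choice is an assumption, and the P values in the example are invented.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest P value satisfying the Benjamini-Hochberg criterion
    p_(k) <= q * k / m, or None if no P value qualifies."""
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            cutoff = p
    return cutoff


# Hypothetical voxelwise P values.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cutoff = fdr_threshold(example_p, q=0.05)
```

Voxels with P values at or below the returned cutoff are declared significant. A return value of `None` means that no voxel survives correction.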
Region of Interest Analyses
For illustrative purposes, we extracted and plotted the time course of activity corresponding to the reinforced and unreinforced trials for each condition in several regions of interest (ROIs). The ROIs were created in 3 stages. First, we identified several brain regions that have established roles during aversive and appetitive conditioning: the striatum, the insula, the anterior cingulate cortex (ACC) and the amygdala (Jensen et al. 2003; Seymour et al. 2004; Bray and O'Doherty 2007; Herwig et al. 2007; Sescousse et al. 2010). Second, we built 7 anatomical ROIs defined with the probabilistic atlas of Hammers et al. (2003): the ACC ROI resulted from the union of left and right ACC cortex and the other anatomical ROIs were left/right insula, left/right amygdala, and left/right striatum. Third, we intersected each of these anatomical ROIs with the functional clusters revealed by our whole brain conjunction analyses. Time course extractions were conducted with MarsBaR toolbox for SPM (http://marsbar.sourceforge.net/). Moreover, because the substantia nigra (SN) is known to be a key region in processing rewarding and aversive events (Schultz 1998; Matsumoto and Hikosaka 2009b), we used the probabilistic brain atlas of Hammers et al. (2003) to build bilateral ROIs in this region.
We performed a 2 valences (positive vs. negative) × 2 modalities (gustatory vs. visual) repeated-measures analysis of variance (ANOVA) on the preference scores. As expected, the main effect of the valence demonstrated that the cues announcing positive reinforcers (apple juice and money) were preferred to those announcing negative reinforcers (salty water and aversive picture) (F(1,19) = 2684.94, P < 10^−6) (Fig. 2A). During scanning, the percent accuracy for detecting the cues was 97%, confirming that subjects paid attention to the cues. A one-way repeated-measures ANOVA on the RTs from the 5 conditions did not reveal any significant RT difference between reinforcer types (F(4,72) = 2.0242, P = 0.1) at the time of the cue. For each of the 4 conditions having a probability of P = 0.5, the rating of the estimated probability that a cue led to a specific reinforcer was not significantly different from 0.5 (Fig. 2B). For the neutral condition (P = 0), this estimated probability did not differ from 0. These results show that the probabilities of the association between cues and reinforcers were explicitly learnt in a valence-insensitive manner. Postscan subjective ratings confirmed that subjects perceived positive reinforcers as more pleasant than negative reinforcers. This was assessed using a 2 valences (positive vs. negative) × 2 modalities (gustatory vs. visual) repeated-measures ANOVA, in which the main effect of the valence was significant (F(1,19) = 180.91, P < 10^−6) (Fig. 2C). Finally, subjects’ self-reports indicated that they were drink deprived for an average of 12 h 24 min ± 2 h 16 min prior to scanning, therefore complying with the instruction to be drink deprived for 12 h. No significant difference was observed between the ratings of the thirst sensation performed before and after scanning (before: 3.16 ± 1.01, after: 2.74 ± 0.99; paired t-test: t(18) = −1.41, P = 0.18).
Although absence of evidence is not evidence of absence, the fact that our subjects were drink deprived for 12 h (they were asked to refrain from drinking water during this period) and drank only a very small amount of apple juice (1.2 mL, i.e., less than 2% of a glass of 200 mL) over the course of the experiment supports the view that the motivation to drink was stable throughout the experiment. Interestingly, previous reinforcer devaluation fMRI studies reported that the amygdala and orbitofrontal cortex showed reduced activation after feeding subjects to satiety with one type of food but not with another (Small et al. 2001; Gottfried et al. 2003; Kringelbach et al. 2003). Yet, it is unlikely that such satiety effects were present in our Pavlovian conditioning paradigm because motivation for apple juice was not directly manipulated in our experiment and because apple juice was not devalued over the course of the experiment.
Brain regions with response profiles consistent with an RPE signal should code reward and punishment in opposite fashion, responding with a “positive” PE when the outcome is better than expected (unexpected reward or omission of expected punishment) and a “negative” PE when it is worse than expected (unexpected punishment or omission of expected reward). In contrast, brain regions in which the activity reflects an SPE signal should treat both rewards and punishments in a similar fashion, as salient outcomes, with a positive response when both of these salient events are delivered and a negative response for their omission.
To test the hypothesis of different cerebral networks modulated by the SPE signal (Fig. 1C, top), we performed 4 different conjunctions combining all or different subsets of the reinforcers (primary, gustatory, and visual). All reported results are FDR corrected, P < 0.05.
Global SPE Analysis
First, when investigating whether there is a set of brain regions modulated by a global SPE signal encoding all unexpected reinforcers in a similar fashion independently of their valence, significant clusters of activation were only found in the bilateral occipital lobe (x, y, z = −30, −69, −18, T = 6.02; 24, −81, −6, T = 5.18) (Table 1a). Note that this global SPE analysis can either reflect an SPE signal regardless of reinforcer type or reflect a general visual SPE-related signal since all conditions share in common a visual component.
|Region||Hemisphere||x||y||z||T|
|a. Global salient conjunction|
|b. Primary/immediate salient conjunction|
|c. Gustatory salient conjunction|
|Supplementary motor area||L||−3||−3||63||5.95|
|Middle frontal gyrus||L||−33||36||36||4.02|
|d. Visual salient conjunction|
|Lateral orbital gyrus||R||39||27||−12||3.89|
|Middle frontal gyrus||R||48||0||57||3.24|
Gustatory SPE Analysis
To test for a gustatory SPE brain network, we performed a conjunction analysis of the positive correlation between BOLD activity and PE in the gustatory conditions (apple juice and salty water). This analysis revealed higher activity in the putamen bilaterally (x, y, z = −21, 3, −9, T = 5.87; 21, 6, −12, T = 4.83), the insula bilaterally (x, y, z = −39, −3, −6, T = 8.91; 42, −6, 6, T = 6.83), and the ACC (x, y, z = −6, 9, 39, T = 5.63) (Fig. 3 and Table 1c). All these activations were also observed when directly comparing the brain regions positively correlating with the SPE for the 2 gustatory reinforcers with those responding to the money-related PE and the aversive picture-related PE taken together (with a threshold of P < 0.05 after FDR correction for multiple comparisons), demonstrating that the BOLD signal in these brain regions correlated more strongly with the SPE related to gustatory reinforcers than with the PE related to the other 2 reinforcers. For illustrative purposes, the time courses of activation were extracted in the ROIs corresponding to the 5 brain regions noted above (Fig. 3). They showed an increase in BOLD signal when unexpected apple juice or salty water was received and a BOLD decrease when either type of juice was unexpectedly omitted.
Primary SPE Analysis
Next, we investigated whether the activity of a specific set of brain regions may be modulated by an SPE signal for primary reinforcers. We thus performed a conjunction analysis (logical AND) of the positive correlations between the BOLD signal and PE for the 3 primary reinforcer conditions (apple juice, salty water, and aversive picture). This analysis revealed that activity in bilateral amygdala (x, y, z = −21, −6, −8, T = 4.48; 21, −3, −18, T = 3.75) was positively modulated by the SPE for the 3 primary reinforcers (Fig. 4 and Table 1b). The left amygdala activation survived the direct contrast between the brain regions modulated by the SPE for the 3 primary reinforcers and those modulated by the money-related PE (with a threshold of P < 0.05 after FDR correction for multiple comparisons), demonstrating that the BOLD signal in this brain region correlated more strongly with the SPE related to primary reinforcers than with the PE related to monetary reward. To illustrate this point, we defined 2 ROIs, in left and right amygdala, from which we extracted the time course of each condition for the reinforced and unreinforced trials separately. The resulting time courses indicate an increase in BOLD signal when an unexpected primary reinforcer is delivered but not when it is omitted.
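The conjunction (logical AND) logic used in these analyses can be sketched as follows. This is only a minimal illustration, assuming a minimum-statistic approach in which a voxel counts as jointly modulated only if its T value exceeds the threshold in every contributing map. The function name `conjunction_min_t`, the toy T values, and the threshold of 3.1 are hypothetical and are not taken from the study.

```python
def conjunction_min_t(t_maps, threshold):
    """Return, per voxel, the minimum T across maps and whether it passes.

    t_maps: list of equally long lists of T values (one list per condition).
    A voxel survives the logical AND only if its smallest T exceeds threshold.
    """
    if not t_maps:
        raise ValueError("need at least one map")
    n_voxels = len(t_maps[0])
    if any(len(m) != n_voxels for m in t_maps):
        raise ValueError("all maps must cover the same voxels")
    # zip(*t_maps) groups the T values of one voxel across all conditions.
    min_t = [min(values) for values in zip(*t_maps)]
    survives = [t > threshold for t in min_t]
    return min_t, survives


# Hypothetical T values for four voxels in the three primary-reinforcer conditions.
apple_juice = [4.5, 1.0, 5.2, 3.9]
salty_water = [4.8, 6.0, 0.5, 3.6]
aversive_picture = [3.8, 5.5, 4.9, 2.0]

min_t, survives = conjunction_min_t(
    [apple_juice, salty_water, aversive_picture], threshold=3.1
)
print(min_t)     # smallest T per voxel
print(survives)  # True only where all three conditions exceed the threshold
```

In this toy example only the first voxel survives, because it is the only one whose T value is above threshold in all three conditions; a voxel strongly modulated in two conditions but not the third is excluded.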
Visual SPE Analysis
Monetary reward and aversive picture were presented in the same visual modality, without the gustatory components. Thus, we searched for common brain regions whose activity followed an SPE signal for these 2 visual conditions. The conjunction of the brain regions showing a positive correlation with PE for both the monetary reward and the aversive picture conditions revealed large clusters of activity in bilateral occipital lobe (x, y, z = −27, −81, −12, T = 10.86; 24, −81, −6, T = 9.71) and smaller clusters in the right lateral orbital gyrus and the middle frontal gyrus (Table 1d).
As noted in the Introduction, an alternative to the SPE hypothesis is that activity in some brain regions covaries with an RPE signal, responding in opposite fashion to positive and negative events, regardless of whether these events are reinforced or unreinforced (Fig. 1C, bottom). Such an RPE signal should thus be high for unexpected reward delivery (apple juice and money) but low for unexpected punishment (salty water and aversive picture).
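The difference between the two hypothesized signals can be made concrete with a small sketch. It assumes a Rescorla-Wagner update with outcomes coded 1 for delivery and 0 for omission regardless of valence; the learning rate, the trial sequence, and the function names are hypothetical, and the exact model and parameters used in the study are not restated here. Under these assumptions, the SPE regressor has the same sign for rewards and punishments, whereas the RPE regressor is sign-flipped for punishments.

```python
def prediction_errors(outcomes, alpha=0.2, v0=0.0):
    """Trial-by-trial PE under a Rescorla-Wagner rule (an assumption here).

    outcomes: 1 if the reinforcer was delivered on that trial, 0 if omitted,
    irrespective of whether it is a reward or a punishment.
    """
    v = v0
    errors = []
    for r in outcomes:
        delta = r - v        # PE: actual minus predicted outcome
        errors.append(delta)
        v += alpha * delta   # update the prediction for the next trial
    return errors


def spe_regressor(errors):
    """Salient PE: same sign for rewards and punishments."""
    return list(errors)


def rpe_regressor(errors, is_reward):
    """Reward PE: sign flipped for punishments (delivery is worse than expected)."""
    sign = 1.0 if is_reward else -1.0
    return [sign * e for e in errors]


outcomes = [1, 0, 1, 1, 0]  # hypothetical delivery sequence for one cue
pe = prediction_errors(outcomes, alpha=0.5)

spe_salty_water = spe_regressor(pe)                   # positive on unexpected delivery
rpe_salty_water = rpe_regressor(pe, is_reward=False)  # negative on unexpected delivery
rpe_apple_juice = rpe_regressor(pe, is_reward=True)   # identical to the SPE for rewards
```

For rewards the two regressors coincide, so the hypotheses can only be separated in the punishment conditions, where they predict correlations of opposite sign.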
Global, Gustatory, and Visual RPE Analysis
To test for brain regions responding like a global RPE signal for all reinforcers, we performed a conjunction analysis of the brain regions showing activity positively correlated with the PE for the 2 rewarding conditions and of the brain regions showing activity negatively correlated with the PE for the 2 aversive conditions. This analysis revealed no significant correlation with an RPE signal, even with the very liberal threshold of P = 0.01 uncorrected. Moreover, no brain region showed activity modulated by the RPE signal for the gustatory or visual conditions when the conjunction analyses were restricted to one or the other of these 2 modalities.
RPE for Each Reinforcer
For completeness, we also report the list of brain regions positively and negatively modulated by the PE for each reinforcer taken separately (Supplementary Tables S1–S4). Using our standard imaging procedure, we were unable to find evidence of SPE-related activity in the VTA/SN for any of the reinforcers when performing a small volume correction in a spherical ROI centered at x, y, z = 6, −16, −14 (6 mm radius) or when using the SN ROI of a probabilistic brain atlas (Hammers et al. 2003).
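A spherical ROI of the kind used for the small volume correction can be sketched as the set of voxel centres lying within 6 mm of the centre coordinate. The centre (6, −16, −14) and the 6 mm radius come from the text; the 3 mm isotropic grid, the assumption that voxel centres fall on multiples of the voxel size, and the helper name `sphere_roi` are illustrative only and do not describe the software actually used.

```python
import math


def sphere_roi(center, radius_mm, voxel_size_mm=3.0):
    """List voxel-centre coordinates (in mm) within radius_mm of center.

    Assumes an isotropic grid whose voxel centres fall on multiples of
    voxel_size_mm; a real grid depends on acquisition and normalization.
    """
    # Search a cube around the grid point nearest to the requested centre.
    steps = math.ceil(radius_mm / voxel_size_mm) + 1
    base = [round(c / voxel_size_mm) * voxel_size_mm for c in center]
    offsets = range(-steps, steps + 1)
    voxels = []
    for i in offsets:
        for j in offsets:
            for k in offsets:
                voxel = (
                    base[0] + i * voxel_size_mm,
                    base[1] + j * voxel_size_mm,
                    base[2] + k * voxel_size_mm,
                )
                # Keep the voxel if its centre lies inside the sphere.
                if math.dist(voxel, center) <= radius_mm:
                    voxels.append(voxel)
    return voxels


roi = sphere_roi(center=(6, -16, -14), radius_mm=6)
print(len(roi), "voxel centres fall inside the sphere")
```

With such a small sphere the number of voxels entering the correction is limited, which is why the choice of centre and radius matters for the sensitivity of the test.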
The present study identifies the brain structures responding to the “salient” and the “reward” PE signals in a Pavlovian reinforcement procedure with 2 types of rewards and 2 types of punishments. Critically, our model-based fMRI approach allowed us to extend to the whole-brain level in humans the distinction between RPE and SPE signals recently observed in midbrain dopaminergic neurons in monkeys (Matsumoto and Hikosaka 2009b). Our results reveal the contributions of specific brain systems covarying with distinct SPE signals during Pavlovian learning of different types of stimulus-reward and stimulus-punishment associations. We found evidence of a brain network including the striatum, anterior insula, and the ACC in which the activity reflected an SPE signal that depended upon the type of reinforcer. This brain network responded more robustly to appetitive and aversive juice outcomes than to monetary or aversive picture outcomes. Moreover, the activity in the amygdala correlated with the SPE for primary reinforcers (i.e., for both types of juice and for aversive picture outcomes), but we found no evidence for this brain region encoding SPE for the secondary reinforcer (money). In contrast, no evidence was found for a nonsensory brain network coding RPE regardless of reinforcer type. These results demonstrate that an SPE signal covaries with the activity of specific components of the reward system and depends upon reinforcer type.
A current theory proposes that dopaminergic neurons encode a form of valence for learning and motivating reward-seeking behavior (Schultz et al. 1997; Berridge and Robinson 1998; Wise 2004), while another theory states that these neurons encode a form of salience (or “alerting”) for shifting attention to unpredicted events (Redgrave et al. 1999; Horvitz 2000). In contrast, recent electrophysiological data suggest that dopamine neurons may follow both theories at different times during a single task (Bromberg-Martin et al. 2010). Our finding that some components of the reward system respond to an SPE signal dependent upon the type of reinforcer is consistent with these latter results.
Previous fMRI studies using only potentially rewarded outcomes (presented or omitted) or only negative reinforcers could not search for the localization of brain activities covarying with an SPE signal regardless of reinforcer type (i.e., involved in the same way for unexpected reward and punishment). Our observed brain network, in which the activity covaries with an SPE signal for both positive and aversive juices, complements 2 sets of literature, one concerning appetitive conditioning and the other concerning aversive conditioning. First, our findings are consistent with previous human associative learning fMRI studies reporting striatal activity for different types of rewards, such as unexpected appetitive juice (McClure et al. 2003; O'Doherty, Dayan, et al. 2003) or positive odors and faces (Gottfried, O'Doherty, et al. 2002; Bray and O'Doherty 2007). Second, a number of human fMRI studies also investigated the neural substrates of aversive conditioning using a variety of punishments (Seymour et al. 2004; Jensen et al. 2007; Menon et al. 2007; Sarinopoulos et al. 2010). In agreement with the current findings, some experiments using a reinforcement learning approach involving cues that predict potential painful stimuli reported that the striatal BOLD signal increases with unexpected punishment and decreases when it is unexpectedly omitted (Jensen et al. 2003; Seymour et al. 2004; Menon et al. 2007). However, most aversive Pavlovian conditioning experiments did not use computational reinforcement learning models and did not compare directly how the brain learns to predict different types of rewards and punishments, making the comparison with our current findings difficult.
The present findings add to the growing body of evidence supporting the role of the striatum in coding PE for aversive outcomes, both in humans (Jensen et al. 2003; Seymour et al. 2004; Menon et al. 2007) and animals (Pezze and Feldon 2004). Moreover, an SPE-like response has been observed both in the human striatum, which shows increased activity for unexpected reward and for unexpected punishment (Seymour et al. 2005; Jensen et al. 2007), and in the striatum of nonhuman primates, which represents both appetitive and aversive events (Ravel et al. 2003). Together, our results point to a general role of the striatum in coding an SPE signal across a broad range of reinforcer types and complement previous reports that the striatum responds to nonrewarding unexpected stimuli, consistent with a saliency interpretation (Dreher and Grafman 2002; Zink et al. 2006).
In addition to the striatum, responses in other brain regions, including the ACC and the anterior insula, also correlated with a gustatory SPE, regardless of juice valence. ACC neurons are known to encode PE when macaque monkeys learn a correct action to make in a given context (Matsumoto et al. 2007) and the activity of this region has been shown to correlate with PE for both pain increase and pain relief (Seymour et al. 2005). This literature is consistent with our current findings that both appetitive and aversive PE correlate with ACC activity. Our results also suggest that the anterior insula activation signals an error in predicting juice salience, regardless of its valence. This result extends previous findings that anterior insula activity is often reported with various aversive events (Jensen et al. 2003; Liu et al. 2007), is driven by taste intensity irrespective of valence (Small et al. 2003), and correlates with the PE for both aversive and appetitive stimuli when using pain and pain relief as reinforcers (Seymour et al. 2005).
A number of studies have focused on valence, using monetary incentives or aversive shock, but have not independently varied salience (Breiter et al. 2001; Knutson et al. 2001; Abler et al. 2006; Dreher et al. 2006; Jensen et al. 2007; Tobler et al. 2007; Tom et al. 2007; Sescousse et al. 2010). Other studies have varied salience but have not independently varied valence across gains and losses (Jensen et al. 2003; Tricomi et al. 2004; Zink et al. 2006; Bjork and Hommer 2007). The novel aspect of the current study is that aversive and appetitive stimuli of different types are combined within a single paradigm and that different modality-dependent saliency PEs are computed to identify specific saliency networks. When valence and salience have been manipulated simultaneously, the results have shown some inconsistencies between studies, in part because the concept of saliency has been defined in different ways (Zink et al. 2006; Cooper and Knutson 2008; Carter et al. 2009). For example, Cooper and Knutson (2008) independently manipulated valence and salience by cuing participants to anticipate certain and uncertain monetary gains and losses. They found a significant interaction between valence and salience of anticipated incentives in the ventral striatum, suggesting that it separately represents valence and salience. Other studies reported that ventral striatal activation increases in anticipation of both gains and losses, supporting a saliency interpretation (Carter et al. 2009) or that some components of the reward system depend on valence (Seymour et al. 2007). Salient trials have also been reported to engage the dorsal striatum, when salience is defined by a monetary reception dependent upon a correct response (active condition) as opposed to a passive monetary reception (Tricomi et al. 2004; Zink et al. 2006). The concept of saliency used in these previous studies, however, greatly differed between paradigms and was not implemented in terms of computational processes, as is the case in our study.
It is worth noting that monetary RPE did not evoke reliable striatal or amygdala activity in our Pavlovian conditioning experiment. To the best of our knowledge, striatal activity is evoked in instrumental conditioning experiments using money as reward (Pessiglione et al. 2006; Yacubian et al. 2006; Liu et al. 2007) but has not been reported during monetary Pavlovian conditioning. In most experiments involving monetary rewards, receipt of money is contingent upon the subject's action (i.e., monetary decision making task, guessing task, and gambling task) and money delivery only follows a correct response (O'Doherty, Critchley, et al. 2003; Dreher et al. 2006; Liu et al. 2007). Similarly, fMRI experiments reporting a PE signal in the striatum (O'Doherty, Critchley, et al. 2003; Pessiglione et al. 2006; Yacubian et al. 2006) did so during instrumental learning paradigms. The idea that instrumental, but not Pavlovian, conditioning engages the striatum for monetary reward is directly supported by a study reporting higher striatal activation when the money delivery depended upon the subject's response as compared with when the receipt of money was independent of the subject's actions (Zink et al. 2004). In our paradigm, subjects did not have to choose between actions leading to potentially aversive/positive outcomes and therefore could not actively avoid or approach anticipated outcomes. This may explain the absence of striatal response to monetary reward in our Pavlovian conditioning procedure since the ability to learn the valence of stimulus-outcome associations is fundamental for appropriate subsequent approach or avoidance behavior. It is also possible that our study was insensitive to RPE for signal-to-noise reasons.
Bilateral amygdala activity covaried with the SPE for all types of primary reinforcers (positive/aversive juices and aversive pictures) but not significantly with the PE for the secondary reward (money). One reason may be that primary reinforcers are more salient than secondary reinforcers because they are more important for survival, capturing orienting or information seeking behavior more effectively. The amygdala is critical for processing rewarding and aversive outcomes (Murray 2007). Consistent with its role in processing both emotional valence and intensity (Machado et al. 2009), it is in an ideal position to integrate appetitive and aversive events to direct emotional responses. Information about primary pleasant and aversive stimuli converges in the amygdala, which receives input from sensory systems of all modalities (Stefanacci and Amaral 2002). Our results are thus consistent with studies emphasizing the role of the amygdala as a detector of behaviorally relevant stimuli (Sander et al. 2003) and during conditioning procedures with both appetitive and aversive stimuli (Buchel et al. 1999; Seymour et al. 2005; Paton et al. 2006). A number of brain regions, including the amygdala, may also reflect greater saliency associated with risk-taking during neuroeconomics paradigms (Kahn et al. 2002; Bechara et al. 2003; Cohen and Ranganath 2005; Hsu et al. 2005) as well as during speeded as compared with delayed motor responses during the stop signal task, a classical cognitive control paradigm requiring individuals to restrain a habitual response (Li et al. 2009). Interestingly, at the neuronal level, electrophysiological responses in the primate amygdala support both valence-specific and salience-specific processes, with both common and distinct populations of neurons involved in learning the association between conditioned visual stimuli and rewards or punishments (Buchel et al. 1999; Paton et al. 2006; Belova et al. 2008). Supporting a saliency signal, some cells showed a similar effect of expectation on responses to rewards and punishments. In contrast, in other cells, expectation modulates the responses to either rewards or aversive stimuli but not both, potentially playing a role in processes that require information about stimulus valence (Paton et al. 2006; Belova et al. 2008). A recent study in rodents also demonstrated that the amygdala provides an unsigned error signal (Roesch et al. 2010) with characteristics consistent with those postulated for a signal of “motivational salience” (Belova et al. 2008). Many of these processes, which include learning when to exhibit defensive as opposed to approach behavior, are thus likely to involve the amygdala (LeDoux 2000).
A recent study used an axiomatic approach rooted in economics theory to formally test the class of RPE models on neural data (Rutledge et al. 2010). This approach divides the space of all possible models into subdomains and attempts to falsify the hypothesis that one or more members of an entire class of models can account for a set of empirical observations. This elegant study showed that the neural responses observed in the striatum, medial prefrontal cortex, amygdala, and posterior cingulate cortex satisfy axioms (necessary and sufficient conditions) for the entire class of RPE models. However, activity measured from the anterior insula falsified the axiomatic model, and therefore, no RPE model can account for measured activity in this brain region. Additional analyses suggested that the anterior insula might encode a saliency signal at outcome, consistent with our results. However, our current study, which used a regression approach (like all other model-based fMRI studies on RPE), cannot falsify the hypothesis that dopamine-related activity encodes an RPE signal. On the other hand, it is important to note that the study by Rutledge et al. (2010) used a task requiring choices between lotteries only in the monetary domain. Therefore, that paradigm, in contrast to the current one, did not allow the authors to test whether the distinct nature of reinforcers differentially influences distinct brain networks.
The absence of positive results concerning the VTA/SN related to PE or SPE could be due to known difficulties in imaging midbrain dopamine nuclei without high-resolution fMRI and midbrain-specific alignment algorithms (D'Ardenne et al. 2008). In fact, a number of previous fMRI studies did report midbrain-related PE activity in different paradigms (Dreher et al. 2006; D'Ardenne et al. 2008) while others did not (McClure et al. 2003; O'Doherty, Dayan, et al. 2003; Abler et al. 2006; Li et al. 2006; Pessiglione et al. 2006; Behrens et al. 2008; Hare et al. 2008; Sescousse et al. 2010). Along the same lines, the neural activity of the lateral habenula has been reported to increase when expected rewards are not delivered or when unexpected punishments are received (Matsumoto and Hikosaka 2007, 2009a). Such signals may be an important source of negative RPE inputs to the VTA since increases in the activity of lateral habenular neurons inhibit dopamine neurons in the VTA. Some fMRI studies in humans also reported that the habenula responds to negative PE (Salas et al. 2010) and sends a signal to the VTA/SN during error detection in a stop signal task (Ide and Li 2011). Although these results are consistent with the hypothesis that the VTA and habenula are part of a saliency brain circuit, most fMRI studies on PEs using either standard or high-resolution imaging have failed to report such activity in the lateral habenula (McClure et al. 2003; O'Doherty, Dayan, et al. 2003; Seymour et al. 2004; Abler et al. 2006; Dreher et al. 2006; Li et al. 2006; Pessiglione et al. 2006; Behrens et al. 2008; D'Ardenne et al. 2008; Hare et al. 2008; Rutledge et al. 2010; Sescousse et al. 2010).
Finally, it should be noted that finding BOLD activity that covaries with an SPE signal does not mean that the source of the SPE network is unique. Indeed, a region X might receive an RPE from region A and a punishment PE from region B and, therefore, may appear as if it is receiving a single SPE signal. This is more than a theoretical possibility, since it has been suggested that dopamine encodes an RPE and that serotonin may encode a punishment PE or may regulate patience while waiting for future reward (Daw et al. 2002; Miyazaki et al. 2012). Thus, there is a conceptual distinction to be made between brain activity that directly reflects an SPE (that is, computes an SPE or receives direct projections from areas that compute it) and brain activity that correlates with this signal as a result of increased salience. This distinction is particularly useful for understanding why the only brain area that responds to the global SPE, the occipital lobe, is devoid of dopamine projections.
Our findings advance our understanding of the neurobiological mechanisms underlying the ability to learn associations between stimuli and rewards or punishments. Our model-based fMRI approach allowed us to pinpoint the brain structures responding to the 2 types of hypothesized PE signals (Matsumoto and Hikosaka 2009b). In that electrophysiological study, the SPE hypothesis was developed using liquid reward and airpuff aversive stimuli, which not only have opposite valence but are also of different types (gustatory vs. tactile). Our results reveal the contributions made by distinct brain regions in computing PE depending upon the type and valence of the outcomes. Thus, our results provide direct empirical evidence for formal learning theories that posit a critical role for the SPE signal during Pavlovian conditioning in humans and demonstrate that the cerebral representation of this signal depends not only upon the valence but also upon the type of the reinforcement.
Marie Curie International reintegration grant; Fyssen Foundation grant (to J.-C.D.).
We thank the CERMEP staff for help during scanning. Conflict of Interest: None declared.