Functional imaging studies have revealed roles for orbitofrontal cortex (OFC) in reward processing and decision making. In many situations, rewards signal that the current behavior should be maintained, whereas punishments cue a change in behavior. Thus, hedonic responses to reinforcers are conflated with their function as behavioral cues. In an attempt to disambiguate these functions, we performed a functional magnetic resonance imaging study of a 2-choice decision-making task. After each trial, subjects were rewarded or punished and independently provided with a cue to maintain or change behavior. We identified key regions of OFC involved in these processes. An anterior medial focus responded to reward, whereas bilateral lateral foci responded to punishment. The right-sided lateral region that responded to punishment also responded to cues for behavior change (shift), whereas a more ventral and anterior bilateral region responded to cues for behavioral maintenance (stay). The right-sided stay region responded specifically when stay cues were combined with punishment. These results support the view that OFC codes both hedonic responses to reinforcers and their behavioral consequences. Punishments and shift cues are associated with the same right lateral OFC focus, suggesting a fundamental connection between emotive response to negative reinforcement and use of negative information to cue behavioral change.
The orbitofrontal cortex (OFC) has been widely implicated in reinforcement processing, both in animals and humans. However, it is clear that there is considerable functional heterogeneity within this region and recent functional magnetic resonance imaging (fMRI) studies in humans have attempted to fractionate distinct roles (for reviews: Kringelbach and Rolls 2004; Elliott and Deakin 2005; Kringelbach 2005; Murray et al. 2007; O'Doherty 2007). Several key distinctions have been suggested. One of the more reliable is the trend for medial regions of the OFC to be more associated with positive reinforcement, whereas lateral regions are more associated with negative reinforcement. It has been suggested that although this distinction could be due to representations of the hedonic properties and/or reinforcement values of stimuli, links to behavioral choice are also important (Kringelbach and Rolls 2004; Kringelbach 2005; O'Doherty 2007). A role for OFC regions in choice responding is clear from the behavior of patients with OFC lesions, who are typically inflexible in their behavior (Damasio 1994). It has been hypothesized that this may reflect a critical role for OFC in inhibiting established responses which cease to be appropriate (Elliott et al. 2000). However, recent reviews have argued that the evidence is better accounted for by the theory that OFC evaluates reinforcers, providing a signal that can lead to behavioral change (Kringelbach and Rolls 2004; Murray et al. 2007).
In many experimental paradigms of reinforcement-related processing, rewards act as a signal that the preceding action was correct, whereas punishments signal that the preceding action was incorrect. A classic example of this type of paradigm is reversal learning. In reversal tasks, one stimulus is associated with reward and another with punishment; then these contingencies are reversed. Subjects respond to a stimulus that is rewarded, but then, when the reversal occurs, they receive a punishment that cues the need to switch responding to the other stimulus. In both monkeys (Iversen and Mishkin 1970; Dias et al. 1996) and humans (Rolls et al. 1994; Fellows and Farah 2003; Hornak et al. 2004), orbitofrontal lesions result in impaired reversal learning. In basic reversal tasks (and many other reinforcement-based learning situations), hedonic and informational aspects of reinforcers are typically conflated. Rewards act as a signal to maintain behavior, whereas punishments cue change. It can therefore be unclear whether neural substrates are coding hedonic properties of reinforcers, behavioral signals, or both. Kringelbach and Rolls (2003) used an fMRI reversal task with angry faces as switch cues and observed enhanced response in bilateral lateral OFC to angry faces as switch cues relative to angry faces not serving as cues to behavioral change. Although this design provided evidence that cued change elicits lateral OFC response beyond that due to negative reinforcement alone, shift cueing and negative reinforcement were still partially conflated.
Another approach to teasing out these functions has been to use probabilistic reversal learning paradigms. In these tasks, 2 stimuli are associated with rewards and punishments respectively but only on a percentage of trials. For example, stimulus A may be associated with a reward on 70% of trials but a punishment on 30%, whereas stimulus B is associated with reward on 30% and punishment on 70%. Subjects soon learn that stimulus A is the advantageous choice, more likely to be associated with reward. Once optimal behavior is established, contingencies are reversed and subjects typically take a few trials to realize that they should stop responding to A and start responding to B in order to maximize rewards. These paradigms allow reward/punishment responses to be decoupled from behavioral cueing to some extent. In particular, there are trials where subjects are punished but choose to maintain the same response (stay) because they attribute punishment to the probabilistic nature of the task rather than a valid cue to change behavior. However, typically, there are very few trials in these paradigms where subjects are rewarded but choose to change behavior (shift).
Cools et al. (2002) used a probabilistic reversal paradigm with event-related fMRI and assessed neural responses to the final error before behavioral shift. Right ventrolateral prefrontal signal was associated with this final error, interpreted as a signal to shift. This region did not respond to probabilistic errors or reversal errors that did not result in a shift; that is, it was specific to trials preceding a shift rather than the experience of an error. The region observed was adjacent to lateral OFC, but the imaging protocol did not allow visualization of the OFC itself. A subsequent study by O'Doherty et al. (2003) did assess OFC function associated with a probabilistic reversal task. In this study, winning compared with losing was associated with medial OFC response and losing compared with winning on nonshift trials was associated with lateral prefrontal response. Decisions to stay were associated with medial and left lateral OFC, whereas the decision to shift was associated with a caudolateral region of OFC, close to the ventrolateral region observed in the Cools et al. study. Remijnse et al. (2005) also reported response in a caudal OFC region extending to the insula associated with switching in a probabilistic reversal learning task as well as a left lateral region associated with punishment trials.
In these tasks, critically, rewards and punishments are still functioning as the cues to guide behavior, even though the probabilistic characteristics mean that responses to rewards and punishments can be evaluated separately from stay versus shift decisions. However, this separability is only partial; in particular, there are generally too few win-shift events to analyze. In the present study, our aim was to design a paradigm where hedonic and informational properties of reinforcers are fully decoupled, but subjects are provided with a clear strategy for optimum performance. To this end, we adapted a decision-making paradigm developed by Paulus (1997) and Paulus et al. (2001). This paradigm was designed to assess decision making under conditions of uncertainty and compares the responses of subjects making a guess with those of subjects making a fully specified choice. The dominant strategy adopted by subjects was “win-stay/lose-shift” (Evenden and Robbins 1983); that is, when their guess was correct, they repeated the response and when incorrect, they shifted to the opposite response. In our version, subjects were provided not only with feedback and reinforcement, as in the Paulus studies, but also with an independent behavioral signal cueing the optimum decision on the subsequent trial. Thus subjects were provided with a strategy for maximizing rewards. This was a factorial design with outcome and behavioral cue as orthogonal factors rather than partially separated factors as in the probabilistic reversal paradigms. We were thus able to assess the main effects of these factors as well as the interaction between them. We hypothesized that regions of the lateral OFC would be associated with both losing and changing behavior, whereas medial regions of the OFC would be associated with winning and maintaining behavior.
Materials and Methods
A total of 12 right-handed male volunteers were recruited by advertisement in the local community. Mean age was 31.7 (range 21–45) and mean years of education was 15.1 (range 11–20). Subjects were screened using self-report to exclude any psychiatric or neurological history, current or previous substance abuse, or current medication. All subjects gave written informed consent, and the study was approved by the local research ethics committee.
Cognitive Activation Paradigm
The paradigm was adapted from that developed by Paulus et al. (2001). In our version, subjects were presented with an image of a house with a driveway on either side. They were asked to make a right or left button press to predict whether a car would arrive to the right or the left of the house. After a brief interval, the car was presented on either the right or the left. If they had guessed correctly, a reward display was shown (an image of a coin) representing a win of £1. If they had guessed incorrectly, a loss display was shown (image of a coin with a cross through it), representing the loss of £1 from their total. On some trials, a blue dot was superimposed on the win and loss displays signaling to subjects that the car would be more likely to arrive on the SAME side of the house on the next trial. Conversely, a gray dot would signal that the car would be more likely to arrive on the OPPOSITE side of the house on the next trial (see Fig. 1). The color of the dot was an accurate predictor on 70% of trials. Each color of dot appeared on half the trials, in a predetermined, pseudorandom order.
There were thus 4 types of trial outcome:
“Win-stay”: a reward with a blue dot
“Win-shift”: a reward with a gray dot
“Lose-stay”: a loss with a blue dot
“Lose-shift”: a loss with a gray dot
A subject who adopted the (logical) strategy of being guided by the dot color on all trials would experience win-stay on 35% of trials and win-shift on 35% of trials (because following the dot was associated with a 70% probability of winning), and lose-stay and lose-shift each on 15% of trials. There were 100 trials in total.
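The expected outcome frequencies above follow directly from the cue validity (70%) and the equal split between blue and gray dots. A minimal Python sketch of that arithmetic is given here for illustration; the function and parameter names are hypothetical, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_outcome_counts(n_trials=100,
                            cue_validity=Fraction(7, 10),
                            p_stay_cue=Fraction(1, 2)):
    """Expected trial counts per outcome type for a subject who always follows the dot.

    cue_validity: probability that following the dot leads to a win.
    p_stay_cue:   proportion of trials carrying a blue (stay) dot.
    """
    counts = {}
    for cue, p_cue in (("stay", p_stay_cue), ("shift", 1 - p_stay_cue)):
        counts[f"win-{cue}"] = n_trials * p_cue * cue_validity
        counts[f"lose-{cue}"] = n_trials * p_cue * (1 - cue_validity)
    return counts


counts = expected_outcome_counts()
for outcome, n in counts.items():
    print(f"{outcome}: {n} trials")
```

Under these assumptions, the 4 counts sum to the 100 trials of the task, with wins accounting for 70 of them.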
The task was explicitly designed such that reward and punishment should not increase or decrease the likelihood of a particular response. For 50% of the rewards, a stay response was appropriate, whereas for 50%, a shift response was appropriate and similarly for punishments. Subjects were extensively trained on the task immediately prior to scanning and were also told that the car would arrive on the left on the first trial to provide a clear “start point.”
The intertrial interval varied from 8 to 12 s, comprising 1 s of the house image, 3 s of the house with “guess now” prompt (during which subjects’ response was recorded), a variable delay of 0–4 s, a 2.5-s outcome display (car arriving and reward/loss with blue or gray dot), and then a 1.5-s blank screen before the next trial. The variable delay of 0–4 s (with a mean of 2 s) was used to temporally decouple the decision from the outcome. The task was performed continuously during scanning, and subjects won an amount corresponding to total wins minus total losses (£40 for a subject performing optimally, i.e., 70 wins and 30 losses).
Scanning was performed on a 1.5-T Philips Intera system. A T1-weighted structural scan was acquired for reference and to exclude any gross structural abnormality. A T2*-weighted sequence was acquired during task performance. This sequence was 16 min and 40 s long and comprised 318 functional dynamics. Whole-brain coverage was achieved by acquiring 40 axial slices with a 3.5-mm spacing and in-plane resolution of 3 × 3 mm. Repetition time was 3.14 s, echo time 40 ms, and slices were acquired interleaved to optimize signal in orbital regions.
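The trial timing and the acquisition parameters can be cross-checked against each other. The following Python sketch, using only the durations reported in the text (the variable names are illustrative), shows that 100 trials at the mean delay fill the stated sequence length and that 318 dynamics at the stated repetition time cover approximately the same period.

```python
# Fixed phase durations of one trial, in seconds, as reported in the text.
FIXED_PHASES_S = {
    "house image": 1.0,
    "guess-now prompt": 3.0,
    "outcome display": 2.5,
    "blank screen": 1.5,
}
DELAY_RANGE_S = (0.0, 4.0)  # variable delay between decision and outcome
MEAN_DELAY_S = 2.0
N_TRIALS = 100
TR_S = 3.14                 # repetition time
N_DYNAMICS = 318            # functional volumes acquired


def trial_duration(delay_s):
    """Total duration of one trial for a given decision-outcome delay."""
    return sum(FIXED_PHASES_S.values()) + delay_s


shortest = trial_duration(DELAY_RANGE_S[0])
longest = trial_duration(DELAY_RANGE_S[1])
task_s = N_TRIALS * trial_duration(MEAN_DELAY_S)
minutes, seconds = divmod(int(task_s), 60)
acquired_s = N_DYNAMICS * TR_S

print(f"trial duration: {shortest:.1f} to {longest:.1f} s")
print(f"task length at mean delay: {minutes} min {seconds} s")
print(f"acquisition length: {acquired_s:.2f} s")
```

The small difference between the task length and the acquisition length is less than one repetition time.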
The data were analyzed using Statistical Parametric Mapping (SPM5: Friston et al., www.fil.ion.ucl.ac.uk/spm). The first 2 scans were excluded to allow for equilibration effects. Slice timing correction was performed to adjust for the time difference inherent in acquiring multislice images. Images were then realigned using the first as a reference, spatially normalized to Montreal Neurological Institute templates, and smoothed with a 10-mm isotropic Gaussian kernel. Realignment parameters were used as covariates of no interest in subsequent statistical analysis to reduce confounds due to movement effects.
An event-related random effects analysis was performed, using a canonical hemodynamic response function to model expected blood oxygen level–dependent (BOLD) signal. At the first level, a general linear model was constructed for each subject. We modeled the 4 different outcome events, defined as the moment at which the outcome appeared on the screen. For each subject, we performed a 2 × 2 factorial analysis and generated contrasts corresponding to the 2 main effects: win versus lose and shift versus stay and the 2 interaction terms. These individual contrast images were then entered into second level analyses using 1-sample t-tests. Our prior hypotheses concerned the OFC, and we therefore performed a region of interest (ROI) analysis at P < 0.05 corrected within this region. This analysis was performed by using anatomical masks to constrain the regions considered at the statistical inference stage, performing small volume correction within these regions. We used anatomical masks from the automated anatomical labels of the WFU pick atlas toolbox (Tzourio-Mazoyer et al. 2002; Maldjian et al. 2003) corresponding to the inferior, middle, and medial orbital gyri. Prior literature (Cools et al. 2002; O'Doherty et al. 2003; Remijnse et al. 2005) suggests that other prefrontal regions will also respond to shift cues. However, the specific aim of this study was to focus on OFC functions, and we therefore did not use ROIs in these regions. Similarly, we did not use ROIs for regions of extended reward systems such as amygdala and ventral striatum. Additional regions of BOLD signal change are only reported if they survive family-wise error (FWE) correction of P < 0.05 in a whole-brain analysis. For the interested reader, full figures showing BOLD signal changes for whole-brain analysis thresholded at P < 0.001 uncorrected are available as Supplementary Material.
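As an illustration of the 2 × 2 factorial contrasts described above, the following Python sketch writes out contrast weights over the 4 outcome regressors. The condition ordering, the names, and the example parameter estimates are assumptions made for illustration; they are not taken from the SPM5 design matrices used in the study.

```python
# Assumed (hypothetical) ordering of the 4 outcome regressors.
CONDITIONS = ("win-stay", "win-shift", "lose-stay", "lose-shift")

# Main effects with +1/-1 coding over the conditions above.
WIN_VS_LOSE = (1, 1, -1, -1)
LOSE_VS_WIN = (-1, -1, 1, 1)
SHIFT_VS_STAY = (-1, 1, -1, 1)
STAY_VS_SHIFT = (1, -1, 1, -1)


def interaction(a, b):
    """Interaction contrast as the element-wise product of two main-effect contrasts."""
    return tuple(x * y for x, y in zip(a, b))


CONTRASTS = {
    "win > lose": WIN_VS_LOSE,
    "lose > win": LOSE_VS_WIN,
    "shift > stay": SHIFT_VS_STAY,
    "stay > shift": STAY_VS_SHIFT,
    "lose > win x shift > stay": interaction(LOSE_VS_WIN, SHIFT_VS_STAY),
    "lose > win x stay > shift": interaction(LOSE_VS_WIN, STAY_VS_SHIFT),
}


def contrast_value(weights, betas):
    """Weighted sum of per-condition parameter estimates."""
    return sum(w * betas[c] for w, c in zip(weights, CONDITIONS))


# Hypothetical voxel that responds to stay cues only after a loss.
example_betas = {"win-stay": 0.0, "win-shift": 0.0,
                 "lose-stay": 1.0, "lose-shift": 0.0}

for name, weights in CONTRASTS.items():
    print(f"{name}: weights={weights}, value={contrast_value(weights, example_betas)}")
```

In this coding, the 2 interaction terms are sign-reversed versions of one another, so a voxel with a positive value on one necessarily has a negative value on the other.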
All subjects responded within the 3-s window on all trials. Their behavior showed a clear pattern of following the dot cue on most trials. A subject who responded as prompted by the dot on all trials would win on 70/100 trials. The mean number of wins was 66.9 (range 62–70). In a debrief session, all subjects described their strategy as being guided by the dot prompt, but “getting it wrong” on a small number of trials. The 11 subjects made a total of 34 errors. In all, 11 of the errors were shifts after a win-stay cue, 9 were stays after a win-shift cue, 7 were shifts after a lose-stay cue, and 7 were stays after a lose-shift cue.
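The error counts reported above can be tallied by preceding outcome and by cue type. The short Python sketch below does this from the 4 reported counts; the data structure and function name are illustrative only.

```python
# Reported errors, keyed by (preceding outcome, cue on that outcome display).
ERRORS = {
    ("win", "stay"): 11,   # shifted after a win-stay cue
    ("win", "shift"): 9,   # stayed after a win-shift cue
    ("lose", "stay"): 7,   # shifted after a lose-stay cue
    ("lose", "shift"): 7,  # stayed after a lose-shift cue
}


def tally(errors, axis):
    """Sum error counts over one factor (axis 0 = outcome, axis 1 = cue)."""
    totals = {}
    for key, n in errors.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return totals


total_errors = sum(ERRORS.values())
by_outcome = tally(ERRORS, 0)
by_cue = tally(ERRORS, 1)

print(f"total errors: {total_errors}")
print(f"by preceding outcome: {by_outcome}")
print(f"by cue type: {by_cue}")
```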
The data were analyzed in 2 different ways. In one analysis, we used all trials, regardless of whether or not subjects responded “correctly.” In the other, we used only those trials where subjects responded as prompted by the color of the dot. Unsurprisingly, given the very few trials where subjects did not act as prompted, there were no differences between the 2 analyses, and therefore, we report the results of the full analysis.
Main Effect of Win versus Lose
Within the OFC ROI, we observed significant BOLD response in an anterior medial region (Brodmann area [BA] 11; peak at −3 45 −15; see Fig. 2a). No other regions survived whole-brain FWE correction.
Main Effect of Lose versus Win
Within the OFC ROI, we observed significant BOLD response in a right lateral region (BA 47; peak at 51 24 −6). A corresponding peak was observed on the left (−48 21 −9) although this was not significant at P < 0.05 corrected. It should be noted that the area of significant BOLD response includes the lateral OFC focus but also extends dorsally to frontoinsular cortex (see Fig. 2b). No other regions survived whole-brain FWE correction.
Main Effect of Shift versus Stay
Within the OFC ROI, we observed significant BOLD responses in a right lateral region (BA 47; peak at 48 21 −6), closely corresponding to the right-sided focus observed in losing compared with winning (see Fig. 2c). No other regions survived whole-brain FWE correction.
Main Effect of Stay versus Shift
Within the OFC ROI, we observed significant BOLD response in a right-sided region (BA 11; peak at 27 33 −18). A corresponding peak was observed on the left (−33 39 −15) although this was not significant at P < 0.05 corrected. These foci were anterior and ventral to the right-sided region associated with shift versus stay (see Fig. 2d). No other regions survived whole-brain FWE correction.
Interaction of Lose versus Win with Shift versus Stay
There were no BOLD signal changes within the OFC ROI and no other responses that survived whole-brain FWE correction.
Interaction of Lose versus Win with Stay versus Shift
Within the OFC ROI, there was significant BOLD response in a right-sided region (BA 11; peak at 27 33 −18). This peak was also observed in the main effect of stay versus shift, and an exploration of the simple main effects driving the interaction revealed that this OFC response to stay versus shift was seen only for losses and not for wins. There were no other responses that survived whole-brain FWE correction.
As predicted, we observed significant BOLD responses in distinct subregions of the OFC. In line with our prior hypothesis, winning compared with losing was associated with medial OFC response, whereas the reverse contrast was associated with left and right lateral OFC response. The same right lateral region was also associated with the orthogonal shift compared with stay contrast, designed to assess cued behavioral change. We had hypothesized that stay compared with shift would be associated with more medial OFC signal; however, the region that in fact showed significant response was a relatively lateral region, anterior and ventral to that associated with shift compared with stay. This response was also bilateral, although the right-sided region showed a distinct pattern of response, being associated with the interaction between the factors. Specifically, response here was associated with stay compared with shift only on losing and not on winning trials.
Like many previous studies, we observed medial OFC response to reward and lateral OFC response to punishment, irrespective of behavioral cueing. O'Doherty et al. (2001) first reported a medial versus lateral dissociation to financial rewards and punishments, respectively, in a reversal learning task. Other studies have since supported this view, using both financial reinforcers (Ursu and Carter 2005; Kim et al. 2006) and pleasant compared with aversive tastes and smells (Small et al. 2001; Anderson et al. 2003; Rolls et al. 2003). However, not all studies have supported a clear-cut dissociation (Breiter et al. 2001; O'Doherty et al. 2003; Elliott et al. 2004). O'Doherty (2007) has suggested that the context in which reinforcement occurs is important, particularly when abstract financial reinforcers are used. He argues that some of the studies where the medial–lateral dissociation has been less clear-cut have involved complex tasks where anticipation and expectation are typically involved as well as outcome coding. Kim et al. (2006) reported medial OFC response to reward receipt but other OFC regions also responding to expectation. This is consistent with the present finding because the measured responses were time locked to outcome.
Another critical factor cited by O'Doherty (2007) is behavioral significance. Specifically, he argues that in some studies, punishment or monetary loss signals behavioral change, whereas in other studies, losses do not automatically signal change. Our results are directly relevant to this argument. We found that bilateral lateral OFC responded to punishment more than reward, independent of behavioral significance, suggesting that hedonic value per se is coded in this region. However, the right lateral OFC also responded to shift cues compared with stay cues. There was no interaction in the right-sided focus, suggesting that the punishment and shift cueing had additive rather than modulatory effects. Thus, right lateral OFC responds to punishment and to shift cueing with maximal response in this region when both punishment evaluation and behavioral change are required. Previous studies of probabilistic reversal have been somewhat inconsistent with regard to lateral OFC response. Most studies have reported OFC response to punished switch trials, or greater response to punished switch trials than to punished nonswitch trials, which is fully consistent with our findings. However, some have reported an absence of lateral OFC response to punished nonswitch trials, contrary to the results presented here. Interestingly, O'Doherty et al. (2003) reported a very similar right-sided response when contrasting punished nonswitch trials with rewarded trials but specifically in an “imperative” rather than a choice condition. The imperative condition involved subjects being told which response to select on each trial rather than having to decide. In our task, subjects are required to choose; however, the behavioral cue provides a clear basis for making a choice and thus there are similarities with O'Doherty's imperative condition. Ursu et al. (2008) have also reported lateral OFC responses to negative outcomes, regardless of their implications for cognitive control processes, again suggesting an evaluative role for lateral OFC that is at least partially independent of behavior.
Although lateral OFC appears to fulfill an evaluation function for negative reinforcers in the absence of shift cueing, we also found evidence for a role in shifting. Previous studies of reversal learning have consistently reported a right ventrolateral region associated with switching behavior. We find a similar response that was time locked to a cue to shift to a different motor response from the previous trial, consistent with the hypothesis that this region is important in signaling behavioral change. Studies of probabilistic reversal have reported somewhat different foci within a broadly right ventrolateral region. Cools et al. (2002) described a right inferior frontal focus, although their fMRI protocol did not encompass the OFC. O'Doherty et al. (2003) reported a region spanning anterior insula and caudolateral OFC, whereas Remijnse et al. (2005) reported right-sided posterior OFC response as well as a left lateral OFC response. Using a task without the probabilistic element, Hampshire and Owen (2006) observed bilateral OFC response associated with reversal based on negative reinforcement. In our ROI analysis, the maximally activated voxel of our right-sided region was located in right lateral OFC; however, the cluster extended to anterior insula and inferior frontal cortex. Thus, our findings extend the evidence for a right ventrolateral frontal/orbitofrontal/anterior insula region playing a critical role in signaling behavioral change.
These findings are not compatible with our earlier hypothesis that lateral OFC is involved specifically when previously rewarded responses must be suppressed (Elliott et al. 2000) because the region responds to punishment whether change or maintenance is required. Kringelbach and Rolls (2004) have proposed that lateral OFC regions are concerned with evaluation of punishment, thus providing a signal that can lead to behavioral change. Our results do support this theory, with the right lateral regions responding to both punishments and cued behavioral change, suggesting a dual role. This is also corroborated in a recent study by Mitchell et al. (2008) using a complex response reversal task. They observed right ventrolateral frontal response not only to reversal errors but also to correct reversals and to nonreversal errors. Like our finding, this evidence supports the theory that right ventrolateral/orbitofrontal cortex is important for both punishment evaluation and behavioral reversal with response seen when “either” process is required.
An alternative possibility is that our paradigm failed to separate the processes of punishment and behavioral cueing at a neuronal level. In terms of the contingencies, there were equal numbers of lose-stay and lose-shift cues and therefore losing did not reinforce either shift or stay behavior. The very few errors made by subjects on lose trials were equally divided between errors after lose-stay and errors after lose-shift, and neuronal responses were the same when error trials were removed from the analysis. Therefore, at an experimental level, punishment and shifting were orthogonal factors. However, in reality, punishment and behavioral change are generally conflated and therefore the association between punishment and behavioral change may be “hard wired,” such that the processes cannot be dissociated psychologically or neuronally. This would be consistent with the shared neuronal substrate of punishment and cueing in the right lateral OFC; however, as Figure 2 shows, the pattern of response in the left OFC was different. This region responded differentially to punishment but not to switching, suggesting at least a partial dissociation at a neuronal level.
It should be noted that some authors have stressed the importance of other frontal regions in cued switching (Remijnse et al. 2005; Budhani et al. 2007). Indeed, Remijnse et al. (2005) suggested that responses in anterior cingulate and dorsolateral prefrontal cortices were more closely correlated with switching than orbitofrontal regions. Shift versus stay cues in our experiment were associated with responses in other regions as well as OFC, including dorsolateral prefrontal and anterior cingulate cortices. However, these responses did not survive FWE correction or even false discovery rate correction for multiple comparisons. The focus of this study was to explore OFC responses to reinforcers and behavioral cues, and thus, we did not consider ROI analyses in other frontal regions. Whole-brain activation maps are available as online Supplementary Material for the interested reader.
Although studies of probabilistic reversal have previously considered the neural basis of behavioral switching, the converse has received less attention. Neither Cools et al. (2002) nor Remijnse et al. (2005) discussed the role of OFC regions in behavioral maintenance (or stay cueing as we have termed it here). O'Doherty et al. (2003) did report a contrast between response maintenance and response switching, with both medial and left lateral OFC foci involved. We observed anterior lateral foci for a similar contrast (stay–shift, irrespective of outcome). Our left lateral focus is very similar to that reported by O'Doherty et al. (2003) and our right focus is similar to that dubbed “central OFC” and associated with response maintenance specifically in conditions where subjects were making a choice but not an imperative condition where the response was selected for them. As argued above, our task has some features of both the choice and imperative conditions, with a choice required but subjects provided with a clear basis for making that choice. Interestingly, the right central OFC focus that we observed for stay compared with shift was only significant for losing trials. It is possible that lose-stay is the least instinctive of the 4 combinations because losing (or punishment) is so strongly associated with behavioral change in most contexts. Thus, it is plausible that the central OFC region is coding a mismatch between instinctive behavior and that required by this task. There is far less discussion of behavioral maintenance in the literature than behavioral change and this is an area for future studies to explore further.
Our study design has certain methodological limitations, emphasizing the inherent problems in teasing out the processes of interest. First, there were more winning trials than losing trials. In order for the cues to have predictive validity, subjects had to win more often than they lost and therefore this imbalance was effectively demanded by the psychological parameters of the task. However, it created a bias in expectation which confounds the responses to outcomes. This is a significant confound but unavoidable in this psychological context. Second, it could be argued that the attempt to separate reinforcer evaluation from behavioral cueing is somewhat artificial. In order to create this separation, subjects are effectively being told what to do rather than choosing on the basis of outcome information. Thus, we were studying a very specific process which may have limited applicability to more complex real-life reinforced decision making.
In conclusion, this study has shown that regions of the human OFC code hedonic and informational aspects of reinforcement. Medial OFC response was specific to positive relative to negative outcomes, independent of their behavioral significance. This suggests a primarily hedonic response within this region. Negative relative to positive outcomes were associated with bilateral OFC responses, confirming a role for these regions in evaluation of negative stimuli. The right-sided lateral OFC response also coded shift cues compared with stay cues, suggesting either that this region fulfills both evaluative and behavioral guidance functions or that our paradigm failed to dissociate punishment and shifting at a neuronal level. Previous studies have reported neural response in this region associated with a variety of different paradigms involving punishment as well as paradigms involving behavioral change in the absence of punishment. However, it remains unclear whether these processes are distinct or whether behavioral shift cueing is a fundamental element of punishment response. Finally, we observed regions of anterior lateral and central OFC that coded cues for behavioral maintenance, again highlighting the role of OFC in behavioral control as well as reinforcer evaluation. As recent reviews (Kringelbach and Rolls 2004; Kringelbach 2005; Murray et al. 2007; O'Doherty 2007) have stressed, the OFC is a functionally heterogeneous region involved in complex adaptive behaviors. Both reinforcer evaluation and behavioral control functions are important and may be subsumed by distinct but overlapping regions of the OFC.
Medical Research Council Programme Grant (G9828266 to J.F.W.D., M.C. Dolan, I.M. Anderson, S.R. Williams, and R.E.).
Scanning was carried out at the Wellcome Trust Clinical Research Facility, Manchester, and we are grateful to the staff of the facility, particularly, radiographers Lisa Leahy and Barry Withnall. Conflict of Interest: None declared.