Anatomical, imaging, and lesion work have suggested that medial and lateral aspects of orbitofrontal cortex (OFC) play different roles in reward-guided decision-making, yet few single-neuron recording studies have examined activity in more medial parts of the OFC (mOFC), making it difficult to fully assess its involvement in motivated behavior. Previously, we have shown that neurons in lateral parts of the OFC (lOFC) selectively fire for rewards of different values. In that study, we trained rats to respond to different fluid wells for rewards of different sizes or delivered at different delays. Rats preferred large over small rewards, and rewards delivered after short rather than long delays. Here, we recorded from single neurons in rat rostral mOFC as they performed the same task. Similar to the lOFC, reward-related activity was attenuated for rewards delivered after long delays and enhanced for delivery of larger rewards. However, unlike lOFC, odor-responsive neurons in the mOFC were more active when cues predicted low-value outcomes. These data suggest that odor-responsive mOFC neurons signal the association between environmental cues and unfavorable outcomes during decision making.
Orbitofrontal cortex (OFC) is involved in learning and reward-based decision-making (Kringelbach 2005; Schoenbaum and Roesch 2005; Murray et al. 2007; Wallis 2007; Kable and Glimcher 2009; Schoenbaum et al. 2009; Padoa-Schioppa 2011). Although OFC is often treated as a unitary structure, anatomical and imaging studies have suggested dissociable functions within subregions of the OFC (Carmichael and Price 1996; Elliott et al. 2000; O'Doherty et al. 2001; Kringelbach and Rolls 2004; McClure et al. 2004, 2007; Hoover and Vertes 2011; Kahnt et al. 2012; Wallis 2012). This dissociation has become clearer as researchers have begun to apply focal lesions to different aspects of the OFC in rats and primates (Iversen and Mishkin 1970; Noonan et al. 2010; Rygula et al. 2010; Mar et al. 2011; Rudebeck and Murray 2011a, 2011b). For example, work in nonhuman primates has shown that lateral OFC (lOFC) is critical for updating the value of objects during selective satiation, whereas medial OFC (mOFC) appears to be more critical for stopping responding when previously rewarded objects are no longer rewarded during extinction (Rudebeck and Murray 2011a, 2011b). Other primate labs report that lateral, not medial, OFC is critical for reward–credit assignment, whereas mOFC is necessary for normal reward-guided decision-making (Noonan et al. 2010).
In rats, a similar story is starting to develop (St Onge and Floresco 2010; Mar et al. 2011; Stopper et al. 2014). For example, a recent report showed that lesions to mOFC and lOFC make rats less and more impulsive, respectively, during performance of a standard delay-discounting task (Mar et al. 2011). In this task, rats chose between a large delayed reward and a small immediate reward. Over several trial blocks, the delay that preceded the large reward increased from 0 to 60 s. In tasks like these, rats initially choose the large reward, but gradually stop selecting it when the delay becomes longer. The delay at which the rat stops selecting the large reward reflects the impulsivity level of the rat. Mar and colleagues found that rats with mOFC lesions were less impulsive after extended postlesion training (i.e., continued to choose the large reward at longer delays compared with controls), whereas rats with lOFC lesions were more impulsive (i.e., selecting the large delayed reward less often than controls).
These datasets suggest that models of decision making that include the OFC must be revised to account for the functional dissociation between mOFC and lOFC. Unfortunately, the precise nature of the mOFC's involvement in decision making is still unclear, in part because few studies have examined activity in mOFC in behaving animals. The differential effects observed after lesions to more medial and lateral subregions suggest that neural correlates related to decision making and reward evaluation in the mOFC must differ from those that have been characterized in more lateral portions (Tremblay and Schultz 1999; Wallis and Miller 2003; Roesch and Olson 2004, 2005; Schoenbaum and Roesch 2005; Padoa-Schioppa and Assad 2006; Roesch and Olson 2007; Simmons et al. 2007; van Duuren et al. 2007; Wallis 2007; Kennerley and Wallis 2009; van Duuren et al. 2009; Bouret and Richmond 2010; Kennerley et al. 2011; Morrison and Salzman 2011; Morrison et al. 2011; Padoa-Schioppa 2011). Alternatively, neural processing related to these functions might be similar between these 2 structures, and the differential loss of function after lesions might simply reflect differences in the output structures to which they project (Morecraft et al. 1992; Carmichael and Price 1995a, 1995b, 1996; Price et al. 1996; Price 2007; Saleem et al. 2008; Schilman et al. 2008).
To address this issue, we recorded from single neurons in the rostral mOFC while rats performed an odor-guided task in which they chose between differently valued rewards. Value was manipulated by independently varying the expected delay to and size of the reward. At the time of reward delivery, reward-responsive neurons showed elevated firing for immediate and larger rewards. Unlike lOFC—and most reward-related brain areas for that matter—odor-responsive neurons in the mOFC fired significantly more strongly for odor cues that predicted a low value.
Materials and Methods
Male Long-Evans rats (n = 7) were obtained at 175–200 g from Charles River Labs, Wilmington, MA, USA. Rats were tested at the University of Maryland, College Park, in accordance with University of Maryland and National Institutes of Health guidelines.
Surgical Procedures and Histology
Rats were chronically implanted with a drivable bundle of ten 25-µm-diameter FeNiCr wires in the left or right hemisphere, dorsal to mOFC (n = 7; 4.7 mm anterior to bregma, 0.5 mm lateral, and 2 mm ventral to the brain surface; Bryden, Johnson, Diao, et al. 2011). Electrode wires were housed in a 27-gauge cannula. Immediately prior to implantation, the wires were freshly cut with surgical scissors to extend approximately 1 mm beyond the cannula and electroplated with platinum (H2PtCl6, Aldrich, Milwaukee, WI, USA) to an impedance of approximately 300 kΩ. Brains were removed and processed for histology using standard techniques.
We define mOFC as rostral portions of the frontal cortex that include both ventral and medial aspects of the OFC according to Paxinos and Watson (1997). Solid gray bars in Figure 1e represent the estimated locations of the recording electrodes based on histology. Electrode penetrations that crossed the coronal plane at which the forceps minor of the corpus callosum became visible and/or extended more than 1.5 mm lateral were excluded. Two rats were excluded due to misplacement of electrodes (Fig. 1e, open boxes).
On each trial, a nose poke into the odor port after house light illumination resulted in delivery of an odor cue to a hemicylinder located behind this opening (Bryden, Johnson, Diao, et al. 2011; Roesch and Bryden 2011). One of 3 different odors (2-octanol, pentyl acetate, or carvone) was delivered to the port on each trial. One odor instructed the rat to go to the left well to get reward, a second odor instructed the rat to go to the right well, and a third odor indicated that the rat could obtain reward at either well. Odors were counterbalanced across rats, and the meaning of each odor did not change across sessions. Odors were presented in a pseudorandom sequence such that the free-choice odor was presented on 7 of 20 trials and the left and right odors were presented equally often on the remaining trials.
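The odor-sequencing constraint can be sketched in Python. This is a hypothetical generator, not the task-control software used in the study. It assumes sessions are built from 20-trial units containing 7 free-choice odors and 13 forced-choice odors split 7/6 between left and right (the side receiving the extra trial alternates so that left and right totals balance), and it assumes the limit of 3 consecutive repeats described for the task applies to odor identity. The names `odor_sequence` and `max_run` are illustrative.

```python
import random
from itertools import groupby


def max_run(seq):
    """Length of the longest run of identical consecutive items."""
    return max((len(list(g)) for _, g in groupby(seq)), default=0)


def odor_sequence(n_blocks, seed=0, n_free=7, block_len=20, run_limit=3):
    """Hypothetical pseudorandom odor order.

    Each 20-trial unit holds 7 free-choice odors and 13 forced-choice odors
    split 7/6 between left and right; the side that gets the extra trial
    alternates across units so session totals balance.
    """
    rng = random.Random(seed)
    n_forced = block_len - n_free
    seq = []
    for b in range(n_blocks):
        major, minor = ("left", "right") if b % 2 == 0 else ("right", "left")
        block = (["free"] * n_free
                 + [major] * (n_forced - n_forced // 2)
                 + [minor] * (n_forced // 2))
        # Rejection sampling: reshuffle until no run exceeds run_limit,
        # including runs that would cross the boundary with the previous unit.
        while True:
            rng.shuffle(block)
            if max_run(seq[-run_limit:] + block) <= run_limit:
                break
        seq.extend(block)
    return seq


session = odor_sequence(n_blocks=10, seed=1)
```

Reshuffling until the constraint holds is the simplest way to enforce the run limit; checking the last 3 trials of the preceding unit together with the new one keeps runs from forming across unit boundaries.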
During recording, one well was randomly designated as short (500 ms) and the other as long (1–7 s) at the start of the session (Fig. 1a: Block 1). In the second block of trials, these contingencies were switched (Fig. 1a: Block 2). The length of the delay under long conditions followed this algorithm: the delay on the side designated as long started at 1 s and increased by 1 s every time that side was chosen on a free-choice trial (up to a maximum of 7 s). If the rat chose the side designated as long on fewer than 8 of the previous 10 free-choice trials, the delay was reduced by 1 s per trial, to a minimum of 3 s. The reward delay for long free-choice trials was yoked to the delay in forced-choice trials during these blocks. In later blocks, we held the delay preceding reward delivery constant (500 ms) while manipulating the size of the expected reward (Fig. 1a: Blocks 3 and 4). The reward was a 0.05-mL bolus of 10% sucrose solution. For a big reward, an additional bolus or two was delivered 500 ms after the first bolus. At least 60 trials per block were collected for each neuron. Size blocks were always run as Blocks 3 and 4 to offset changes in motivation that might occur due to satiety. Thus, there were 4 basic trial types (short, long, big, and small) by 2 directions (left and right). Conditions were pseudorandomly interleaved so that no trial type occurred more than 3 times consecutively. Rats were water deprived (∼20–30 min of free water per day) with free access on weekends.
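One reading of the delay-titration rule for the long well is sketched below. The description leaves some details open, so the following are assumptions: the rule is applied once per free-choice trial; a long choice lengthens the delay; a short choice shortens it only once 10 free choices have accumulated and fewer than 8 of them went to the long side; and the 3-s floor only blocks reductions (delays still below 3 s early in a block are not raised). `update_long_delay` is a hypothetical helper, not the study's code.

```python
from collections import deque


def update_long_delay(delay_s, chose_long, history,
                      step_s=1, max_s=7, floor_s=3, criterion=8):
    """Hypothetical update of the long-well delay after one free-choice trial.

    history: deque(maxlen=10) of recent free choices (True = chose long side).
    Choosing the long side lengthens the delay by step_s, capped at max_s.
    Choosing the short side, once the window is full and fewer than
    `criterion` of the last 10 free choices went long, shortens the delay
    by step_s, never below floor_s. Long forced-choice trials share the
    current delay (yoked).
    """
    history.append(chose_long)
    if chose_long:
        return min(delay_s + step_s, max_s)
    window_full = len(history) == history.maxlen
    if window_full and sum(history) < criterion and delay_s > floor_s:
        return max(delay_s - step_s, floor_s)
    return delay_s


history = deque(maxlen=10)
delay = 1  # long side starts at 1 s
trace = []
for chose_long in [True] * 10 + [False] * 7:
    delay = update_long_delay(delay, chose_long, history)
    trace.append(delay)
print(trace)  # [2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 6, 5, 4, 3, 3]
```

In this toy run, 10 consecutive long choices drive the delay to the 7-s cap; after the rat switches to the short side, the delay holds until fewer than 8 of the last 10 choices were long, then steps down to the 3-s floor.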
Procedures were the same as described previously (Bryden, Johnson, Diao, et al. 2011; Bryden, Johnson, Tobia, et al. 2011). Wires were screened for activity daily; if no activity was detected, the rat was removed, and the electrode assembly was advanced 40 or 80 µm. Otherwise, active wires were selected to be recorded, a session was conducted, and the electrode was advanced at the end of the session. Neural activity was recorded using 4 identical Plexon Multichannel Acquisition Processor systems (Dallas, TX, USA) interfaced with odor discrimination training chambers. Signals from the electrode wires were amplified ×20 by an op-amp headstage (Plexon, Inc., HST/8o50-G20-GR), located on the electrode array. Immediately outside the training chamber, the signals were passed through a differential preamplifier (Plexon, Inc., PBX2/16sp-r-G50/16fp-G50), where the single-unit signals were amplified ×50 and filtered at 150–9000 Hz. The single-unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250–8000 Hz, digitized at 40 kHz, and amplified at ×1–32. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk by an associated workstation with event timestamps from the behavior computer and sorted in Offline Sorter using template matching (Plexon).
Firing rate in each analysis epoch was computed by dividing the total number of spikes by the time over which spikes were counted. Neurons were first characterized by comparing firing rate during baseline with that in response to odors and rewards, averaged over all trial types (t-test, P < 0.05). Odor-related neural firing was examined over an analysis epoch that started 100 ms after odor onset and ended when the rat exited the odor port (“odor epoch”). To analyze reward-related activity (“reward epoch”) and licking rate on short- and long-delay trials, we examined the 1 s starting at reward delivery. For large and small trials, the reward epoch started 500 ms after delivery of the first bolus (i.e., at delivery of the second bolus on large trials) and lasted 2 s to capture activity related to consumption of the additional reward. On average, rats spent 3.7 and 5.1 s in the well after reward delivery on small and big reward trials, respectively. Thus, this comparison captures activity while the rats were experiencing the extra boli but were still in the fluid well. Finally, activity during the 500 ms before reward delivery on long-delay trials was examined to determine whether mOFC neurons fired in anticipation of the delayed reward. Baseline activity was measured over 1 s starting 2 s before odor onset (“baseline”).
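The epoch definitions above can be written as time windows relative to trial events. This sketch assumes spike and event times are in seconds on a common clock and that `reward_on` marks delivery of the first bolus; the helper names are hypothetical, and the spike times in the example are made up to show the arithmetic (spike count divided by window length).

```python
import numpy as np


def epoch_rate(spike_times, start, stop):
    """Firing rate (spikes/s): spikes in [start, stop) divided by (stop - start)."""
    t = np.asarray(spike_times, dtype=float)
    n_spikes = np.count_nonzero((t >= start) & (t < stop))
    return n_spikes / (stop - start)


def trial_epochs(odor_on, port_exit, reward_on, trial_type):
    """Hypothetical epoch windows (s) for one trial; reward_on = first bolus."""
    epochs = {
        "baseline": (odor_on - 2.0, odor_on - 1.0),  # 1 s starting 2 s before odor onset
        "odor": (odor_on + 0.1, port_exit),          # 100 ms after odor onset to port exit
    }
    if trial_type in ("short", "long"):
        epochs["reward"] = (reward_on, reward_on + 1.0)        # 1 s from reward delivery
    else:  # "big" or "small": 2 s starting 500 ms after the first bolus
        epochs["reward"] = (reward_on + 0.5, reward_on + 2.5)
    if trial_type == "long":
        epochs["pre_reward"] = (reward_on - 0.5, reward_on)    # 500 ms anticipatory window
    return epochs


spikes = [0.2, 0.7, 2.2, 2.3, 2.4, 2.5, 3.1, 3.5]
windows = trial_epochs(odor_on=2.0, port_exit=2.6, reward_on=3.0, trial_type="short")
rates = {name: epoch_rate(spikes, *bounds) for name, bounds in windows.items()}
```

With these made-up spikes, the baseline and reward windows each contain 2 spikes in 1 s (2 spikes/s), and the 0.5-s odor window contains 4 spikes (8 spikes/s).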
These epochs (odor and reward) were used to compute, for each neuron, difference scores (value indices) between differently valued outcomes (i.e., short minus long; large minus small). Wilcoxon tests (P < 0.05) were used to measure significant differences between trial types at the population level, and to measure significant shifts from zero in distributions of value indices, which were not normally distributed (Jarque-Bera; P < 0.05). Analyses of variance (ANOVAs) and t-tests were used to measure differences between baseline and analysis epochs, and between trial types at the single-cell level (P < 0.05). Among neurons for which we examined differences between trial types at the single-cell level, activity violated normality (Jarque-Bera; P < 0.05) in only 7%, a proportion not significantly different from that expected by chance alone (χ2; P = 0.32). Multiple regression analyses with delay and size as factors were performed on individual units during the odor and reward epochs (P < 0.05).
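A minimal sketch of the population-level test on value indices follows. It assumes each neuron contributes one index (mean firing on the higher-value trial type minus mean firing on the lower-value one) and uses SciPy's `jarque_bera` for the normality check and `wilcoxon` for a signed-rank test against zero. The Poisson firing rates are synthetic and only illustrate a clearly positive shift; the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import jarque_bera, wilcoxon


def value_indices(high_trials, low_trials):
    """One difference score per neuron: mean rate on high-value trials minus
    mean rate on low-value trials (e.g., short - long, or large - small)."""
    return np.array([np.mean(h) - np.mean(l) for h, l in zip(high_trials, low_trials)])


def shift_from_zero(indices, alpha=0.05):
    """Jarque-Bera normality check plus Wilcoxon signed-rank test against zero."""
    _, jb_p = jarque_bera(indices)
    _, w_p = wilcoxon(indices)
    return {"median": float(np.median(indices)),
            "non_normal": bool(jb_p < alpha),
            "wilcoxon_p": float(w_p),
            "shifted": bool(w_p < alpha)}


rng = np.random.default_rng(0)
short_rates = [rng.poisson(8.0, size=20) for _ in range(30)]  # synthetic neurons
long_rates = [rng.poisson(4.0, size=20) for _ in range(30)]
result = shift_from_zero(value_indices(short_rates, long_rates))
```

The signed-rank test only asks whether indices are symmetrically distributed around zero, so it does not rely on the normality that the Jarque-Bera test rejects for these data.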
Rats were trained on the behavioral choice task in which we manipulated reward size and delay. The sequence of events is described in the Methods and depicted in Figure 1. Rats started each trial by nose poking into a central odor port. After 500 ms, 1 of 3 odors was delivered. Two of the odors signaled to the rat to move to the left or right to receive the reward (forced-choice odors). A third odor indicated that they could choose either well to receive the reward (free-choice). The delay to and size of reward were independently varied in different trial blocks (Fig. 1a). On average, rats performed 244 correct trials per session.
Rats perceived differently delayed and sized rewards as having different values across all 4 trial blocks. On free-choice trials, rats chose the well associated with large reward and short delay significantly more often than the well associated with small reward and long delay, respectively. This was significant across all recording sessions (Fig. 1b; t-test; t68's > 10; P's < 0.05) and individually for each rat (t-test; t68's > 4; P's < 0.05). On forced-choice trials, rats were more accurate and faster on large-reward and short-delay trials than on their respective counterparts (Fig. 1c,d; t-test; t68's > 5; P's < 0.05). The impact of expected value on forced-choice trials was also consistent across rats: each individual rat responded significantly faster on short-delay and large-reward trials (t-test; t68's > 3; P's < 0.05). Finally, rats remained motivated across all 4 trial blocks; reaction times were not significantly different between delay and size blocks (t-test; t68 = 1.2; P = 0.24), and the differences between differently valued outcomes were significant in all 4 trial blocks (t-test; t68's > 3; P's < 0.05). Thus, performance on free- and forced-choice trials was modulated by the predicted outcomes in both size and delay trial blocks.
Reward-Related Activity in the mOFC Was Stronger for an Immediate and Large Reward
We recorded from 251 rostral mOFC neurons in 5 rats (n's = 9, 31, 36, 83, and 92) from 69 sessions. We first characterized neurons as being reward-responsive by asking how many neurons showed significantly higher firing during reward delivery (reward epoch) compared with baseline (baseline epoch; t-test; P < 0.05). The average baseline firing was 4.15 spikes/s (n = 251; 1 s epoch before nose poke). Of the 251 neurons, 56 neurons [22%; n's = 3 (33%), 5 (16%), 12 (33%), 15 (18%), and 21 (23%)] showed significant increases in firing over baseline, which is more than expected by chance alone (χ2; P < 0.05).
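The two-step classification can be sketched with SciPy. The text names the tests but not their exact form, so these are assumptions: the t-test compares per-trial epoch and baseline rates within a neuron (paired, `ttest_rel`), and the χ2 test compares the observed count of significant neurons with the 5% expected as false positives at P < 0.05 (`chisquare` with explicit expected counts). The per-trial rates are invented; the counts 56 of 251 come from the text. Both helper names are hypothetical.

```python
import numpy as np
from scipy.stats import chisquare, ttest_rel


def is_responsive(baseline_rates, epoch_rates, alpha=0.05):
    """Paired t-test across trials; responsive = epoch rate significantly above baseline."""
    t_stat, p = ttest_rel(epoch_rates, baseline_rates)
    return bool(p < alpha and t_stat > 0)


def exceeds_chance(n_sig, n_total, alpha=0.05):
    """Chi-square goodness of fit: observed significant / non-significant counts
    versus the alpha * n_total false positives expected by chance."""
    expected = [alpha * n_total, (1 - alpha) * n_total]
    stat, p = chisquare([n_sig, n_total - n_sig], f_exp=expected)
    return float(stat), float(p)


baseline = np.array([2, 3, 2, 3, 2, 3, 2, 3], dtype=float)  # synthetic trials
reward = np.array([6, 7, 6, 8, 7, 6, 7, 8], dtype=float)
print(is_responsive(baseline, reward))  # True
print(exceeds_chance(56, 251))          # large statistic, p far below 0.05
```

Requiring a positive t statistic in addition to P < alpha restricts the count to increases over baseline, matching the "significant increases in firing" criterion.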
As illustrated by firing of the single-cell examples in Figure 2a,b, and for the population (n = 56) in Figure 2c,d, activity was often higher when the reward was large (Fig. 2a,d; dark gray) or delivered after a short delay (Fig. 2a–c; dark gray) than when the reward was small or delayed by several seconds (dark gray vs. gray; reward epoch; single unit: t-test, P < 0.05; population: Wilcoxon; P < 0.05). To quantify these effects, for each reward-responsive neuron, we plotted difference scores between firing during short and long, and large and small rewards, and asked how many neurons showed significant differential firing within each value manipulation on forced-choice trials (ANOVA; regression; P < 0.05; reward epoch). The distributions of value indices are plotted in Figure 2e. For both delay (Fig. 2ei) and size (Fig. 2eiii), the distributions were significantly shifted (Wilcoxon; P's < 0.05) in the positive direction, indicating that the majority of mOFC neurons fired more strongly for high- than for low-value reward (i.e., short and large over long and small, respectively). The 2 effects were not correlated (Fig. 2eii; P = 0.37; r2 = 0.01): neurons that increased firing for one value manipulation did not necessarily show the same change for the other, as would be expected if activity in the mOFC reflected some sort of common-currency encoding (Roesch et al. 2006).
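The common-currency check amounts to regressing one value index on the other across neurons and reporting r2 and P. A sketch using `scipy.stats.linregress`; the two toy inputs (one perfectly proportional, one symmetric with zero linear correlation) are synthetic and only show what a common-currency and an independent coding pattern would look like. `index_correlation` is a hypothetical helper.

```python
from scipy.stats import linregress


def index_correlation(delay_idx, size_idx):
    """Linear regression of size indices on delay indices across neurons;
    returns r**2 and the p-value for a nonzero slope."""
    fit = linregress(delay_idx, size_idx)
    return fit.rvalue ** 2, fit.pvalue


# Common currency: each neuron's size effect scales with its delay effect.
r2_same, p_same = index_correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# Independent coding: no linear relationship between the two indices.
r2_none, p_none = index_correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The second example is a reminder that r2 only captures linear association: the indices are clearly related, but a common-currency account predicts a monotonic, same-sign relationship, which this pattern lacks.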
Finally, reward-responsive neurons that fired significantly more strongly for high- than for low-value outcomes (ANOVA; P < 0.025; Bonferroni) significantly outnumbered those showing the opposite pattern [Fig. 2e; black bars; 20 (49%) vs. 9 (22%); χ2; P < 0.05]. To further illustrate the significance of this result at the single-unit level, we performed a multiple regression analysis with delay and size as factors (Roesch et al. 2006). During the reward epoch, 29 (52%) and 14 (25%) reward-responsive neurons showed a positive and a negative correlation with value in the multiple regression, respectively (P < 0.05).
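One way to implement a per-neuron multiple regression with delay and size as factors is ordinary least squares with t-tests on the coefficients. The trial coding here is a hypothetical choice, not necessarily the one used by Roesch et al. (2006): the delay regressor is +1 on short, −1 on long, and 0 on size-block trials; the size regressor is +1 on big, −1 on small, and 0 on delay-block trials. Firing rates are synthetic, and `value_regression` is an illustrative name.

```python
import numpy as np
from scipy.stats import t as t_dist


def value_regression(rates, delay_code, size_code):
    """OLS fit of rate = b0 + b_delay * delay_code + b_size * size_code.
    Returns the coefficients and their two-sided p-values."""
    y = np.asarray(rates, dtype=float)
    X = np.column_stack([np.ones_like(y), delay_code, size_code])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p = 2 * t_dist.sf(np.abs(beta / se), dof)
    return beta, p


# Hypothetical coding and synthetic per-trial rates (20 trials per type).
codes = {"short": (1, 0), "long": (-1, 0), "big": (0, 1), "small": (0, -1)}
means = {"short": 10.0, "long": 6.0, "big": 9.0, "small": 7.0}
rates, delay_code, size_code = [], [], []
for trial_type, (d, s) in codes.items():
    for i in range(20):
        rates.append(means[trial_type] + (0.5 if i % 2 == 0 else -0.5))
        delay_code.append(d)
        size_code.append(s)
beta, p = value_regression(rates, delay_code, size_code)
```

With these group means (short 10, long 6, big 9, small 7 spikes/s), the fit recovers an intercept of 8 and coefficients of 2 (delay) and 1 (size), both significantly positive; a neuron with significant negative coefficients would instead fire more for lower-value outcomes.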
Firing to Delayed Rewards
Reward-related activity in the mOFC appeared to reflect the anticipation of reward. On trials in which the delay was only 500 ms (i.e., short, big, and small), activity started to rise before reward onset. For long-delayed rewards, significant increases in firing did not occur until after reward delivery. This is evident in the single-cell example shown in Figure 2a and across the population (Fig. 2c). In Figure 2a,c, firing on short-delay trials increased during the 500 ms preceding reward delivery (dark gray), whereas firing on long-delay trials did not change until reward was actually delivered at time zero (light gray). This likely reflects the difficulty that animals have in timing rewards that are delayed by several seconds, as measured by anticipatory licking (Fiorillo et al. 2008; Kobayashi and Schultz 2008; Takahashi et al. 2009).
Indeed, rats in our study licked more in anticipation of reward delivery on short- compared with long-delay trials. The average lick rate during the 250 ms before reward delivery was significantly higher for short-delay trials (vs. long-delay; t-test; t55 = 8.0; P < 0.05), suggesting that they could better anticipate the more immediate reward. The strength of this difference was correlated with the difference in firing in the mOFC during short- and long-delay trials during the reward epoch (Fig. 2f; P = 0.05; r2 = 0.07), suggesting that when rats could better anticipate reward delivery, activity was stronger in the mOFC.
This correlation might suggest that activity in the mOFC simply reflected motor commands that are coupled to value, such as licking, orofacial movements, and swallowing (Gutierrez et al. 2006). Although it is difficult to rule this out, we do not think this is the case because the average licking rate during the reward epoch was not correlated with the average firing rate during the same period (reward epoch; P = 0.63; r2 = 0.004) during performance of size blocks. In addition, as we will describe below, activity in the mOFC represented the spatial direction of the movement, whereas the licking rate was not significantly modulated by the spatial location (reward epoch; t-test; t55 = 0.06; P = 0.95). To further address this issue, we examined activity and licking 2–5 s after reward delivery, during which time activity in the mOFC might have reflected prolonged licking associated with consumption of the large reward. Even during this extended period, the correlation between licking and firing rates was not significant (P = 0.18; r2 = 0.04).
Although reward-related activity in some neurons was attenuated on long- relative to short-delay trials (Fig. 2a), other neurons maintained firing during anticipation of rewards delayed by several seconds (long-delay trials). This is best illustrated by the single-cell example in Figure 2b (long), in which activity immediately preceding reward delivery (500 ms) was significantly higher than baseline while the rat waited for the delayed reward (light gray). Of the 56 reward-responsive neurons, 27 (48%; t-test; P < 0.05) exhibited significantly higher firing during the 500 ms before reward delivery compared with baseline, whereas only 7 fired significantly less (χ2; P < 0.05), demonstrating that many single neurons in the mOFC fired in anticipation of the delayed reward.
Odor-Evoked Activity in the mOFC Was Stronger for Cues That Predict Long Delay and Small Reward
Next, we examined activity during odor sampling. Of the 251 total mOFC neurons, 41 [16%; n's = 1 (11%), 1 (3%), 4 (11%), 17 (21%), and 18 (20%)] fired significantly more strongly during odor sampling than during baseline (odor epoch; t-test; P < 0.05; χ2; P < 0.05). Surprisingly, neurons in the mOFC fired significantly more strongly for odor cues that predicted low-value outcomes. This is illustrated in the single-cell examples and across the entire population of odor-responsive mOFC neurons in Figure 3a–d. Immediately after odor onset, before initiation of the behavioral response (port exit), population activity was significantly higher when small versus large reward was predicted (Fig. 3d; dark vs. light gray; odor epoch; Wilcoxon; P < 0.05) and when long versus short delay was predicted (Fig. 3c; dark vs. light gray; odor epoch; Wilcoxon; P < 0.05).
To further quantify this effect, we computed the difference scores between high- and low-value rewards and asked in how many odor-responsive units forced-choice odors that predicted low-value reward elicited significantly stronger firing (Fig. 3e; ANOVA; odor epoch; black bars). Neurons that fired more strongly for odors that predicted a low value (ANOVA; P < 0.025; Bonferroni) significantly outnumbered those showing the opposite pattern [11 (26%) vs. 2 (5%); χ2 = 3.8; P < 0.05]. To further illustrate the significance of this result at the single-unit level, we performed a multiple regression analysis with delay and size as factors during the odor epoch (Roesch et al. 2006). Consistent with the ANOVA, 1 (2.4%) and 10 (24%) odor-responsive neurons showed a positive and a negative correlation with value, respectively (P < 0.05).
At the population level, both delay (Fig. 3ei) and size (Fig. 3eiii) distributions were significantly shifted in the negative direction (Wilcoxon; P's < 0.05). Although delay effects appeared weaker than size effects, the 2 distributions were not significantly different (Wilcoxon; P = 0.75). As we describe below, stronger differences emerge when trials are broken down by response direction.
Although the 2 effects appeared to be related (most cells fell in the bottom left quadrant), the correlation between size and delay indices was not significant (Fig. 3eii; P = 0.20; r2 = 0.04). Thus, neurons that fired more strongly for cues that predicted longer delays were not necessarily those that fired more strongly when the same cue predicted a small reward, and vice versa, even though the overall effect was higher firing for cues predicting longer delays and smaller rewards. As in the lOFC, this suggests that encoding in the mOFC does not reflect some sort of common-currency signal for expected rewards (Roesch et al. 2006).
Increased activity during odors that predict a low-value reward does not simply reflect the fact that rats spent more time in the odor port, because mOFC neurons did not continue to fire until port exit. This is illustrated in Figure 3f, which aligns activity on odor port exit for high- and low-value trials averaged over size and delay manipulations. Note that activity on low- and high-value trials peaked and converged before port exit. Further, if we repeat the difference-score analysis described above with an analysis epoch that ends 100 ms after odor offset, instead of at the variable time of port exit, the results remain the same; both delay and size distributions are significantly shifted in the negative direction (Wilcoxon; P's < 0.05).
Thus, activity during reward delivery and odor sampling in the mOFC carry different signs in relation to rewarding outcomes, with odor- and reward-related activity being stronger and weaker for low-value outcomes, respectively. To determine whether the 2 effects were correlated, we performed a regression analysis on value indices during the odor and reward epoch for all odor- and reward-responsive neurons (n = 97). The correlation between the 2 was not significant (r2 = 0.03; P = 0.10), suggesting that neurons that tended to fire more strongly for cues that predicted a low-value reward did not tend to fire more strongly during delivery of high-value outcomes.
Encoding of Response Direction in the mOFC
Previous results have shown that activity in rat lOFC differs depending on the direction of the behavioral response (Feierstein et al. 2006; Roesch et al. 2006, 2007). Here, we asked whether activity in the mOFC was also modulated by response direction. Of the 41 odor-responsive neurons, 9 (22%) showed a significant main or interaction effect of response direction in a 2-factor ANOVA (P < 0.025; χ2; P < 0.05). This is illustrated by the single-cell example in Figure 4a, in which activity was highest when odor cues predicted a small reward in the left well. To further quantify this, we broke down the population activity into each cell's preferred and nonpreferred direction (Fig. 4b), as defined by the direction that elicited the strongest response (e.g., left in the example). Here, “preferred” refers to the direction that elicited the highest activity, not the outcome favored by the animal. Across the population of odor-responsive cells, the difference between high- and low-value outcomes appeared to be stronger for responses made in the preferred direction (filled dark vs. light gray) than in the nonpreferred direction (open dark vs. light gray). In the preferred direction, the distribution of value indices was shifted in the negative direction, indicating stronger firing for a lower-value reward (Fig. 4c; black; Wilcoxon; P < 0.05). In the nonpreferred direction, the distribution of value indices was not significantly shifted (Fig. 4c; gray; Wilcoxon; P = 0.88) and was significantly different from that of indices obtained on preferred-direction trials (Fig. 4c; Wilcoxon; P < 0.05).
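The preferred/nonpreferred split can be sketched as follows. It assumes the preferred direction is the side with the higher mean firing pooled across high- and low-value trials (the text says only "the direction that elicited the strongest response"), and it uses a simple high-minus-low index per direction. The example neuron is invented to mimic the cell in Figure 4a, which fired most for small-reward cues on the left; `split_by_direction` is a hypothetical helper.

```python
import numpy as np


def split_by_direction(rates):
    """rates maps (direction, value) -> trial firing rates, with direction in
    {"left", "right"} and value in {"high", "low"}. Returns the preferred
    direction and a high-minus-low value index for each direction."""
    pooled = {d: np.mean(np.concatenate([rates[(d, "high")], rates[(d, "low")]]))
              for d in ("left", "right")}
    pref = max(pooled, key=pooled.get)          # side with the higher pooled rate
    nonpref = "right" if pref == "left" else "left"
    index = {label: np.mean(rates[(d, "high")]) - np.mean(rates[(d, "low")])
             for label, d in (("preferred", pref), ("nonpreferred", nonpref))}
    return pref, index


example = {("left", "high"): [4, 4, 4], ("left", "low"): [10, 10, 10],
           ("right", "high"): [3, 3], ("right", "low"): [3, 3]}
pref, index = split_by_direction(example)
```

For this toy neuron the preferred side is left, where the index is negative (more firing for the low-value cue), while the nonpreferred side shows no value difference, the same pattern described for the population.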
Of the 56 reward-responsive neurons, 21 (38%) showed a significant main or interaction effect of response direction in the ANOVA (P < 0.025; χ2; P < 0.05). As with odor-responsive neurons, the difference between high- and low-value outcomes appeared to be stronger for responses made in the preferred direction (Fig. 5b; filled dark gray vs. light gray). In the preferred direction, value indices were shifted in the positive direction, indicating stronger firing for higher-value reward (Fig. 5c; black; Wilcoxon; P < 0.05). The distribution of value indices in the nonpreferred direction was not significantly shifted (Fig. 5c; gray; Wilcoxon; P = 0.28) and was significantly different from that of value indices obtained on preferred-direction trials (Wilcoxon; P < 0.05). Thus, we conclude that odor- and reward-responsive mOFC neurons showed enhanced value encoding in the cell's preferred response direction.
Emergence of Outcome Selectivity During Odor Sampling and its Relation to Behavior
In a final analysis, we examined how selectivity for cues that predicted low-value outcomes emerged during learning and whether activity in the mOFC was correlated with reaction time. Figure 6a,b plots the average firing rate over delay and size blocks for movements made in the preferred direction during the first and last 5 trials for each trial type. Consistent with the previous sections, activity after learning was stronger for low-value outcomes (Fig. 6a,b; solid gray vs. black). Interestingly, this selectivity developed as a result of decreased firing on high-value trials that occurred with learning (Fig. 6a,b; black dashed vs. black solid). That is, activity was significantly lower for short-delay and large-reward trials at the end of the trial block relative to the beginning (odor epoch; Wilcoxon; P's < 0.05). This relationship did not exist for cues that predicted low-value rewards (Fig. 6a,b; gray dashed vs. gray solid). Thus, selectivity emerged through a reduction in firing for cues that predicted a high-value reward.
This is further quantified in Figure 6c, which plots the normalized difference between high- and low-value outcomes during the first 5 (black) and last 5 (gray) trials. The distribution is significantly shifted below zero only after learning (Wilcoxon; P < 0.05) and is significantly different from that during early trials (Wilcoxon; P < 0.05). Differences in firing paralleled the rats' behavior: value-induced differences in reaction time (faster for high-value reward) were present during late (Fig. 6d; gray; P < 0.05), but not early, trials (Fig. 6d; black; Wilcoxon; P = 0.81). Furthermore, the change in firing that occurred over the course of the trial block was significantly correlated with the strength of learning that occurred during the session, as measured by changes in reaction time (Fig. 6e; r2 = 0.09; P < 0.05).
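The early-versus-late comparison can be sketched with a normalized index, (high − low)/(high + low), computed from the first and last 5 trials of each trial type within a block. Using this particular normalization for Figure 6c is an assumption. The trial-ordered rates below are synthetic: firing on high-value trials declines across the block while low-value firing stays flat, the pattern reported for the odor epoch. The function names are hypothetical.

```python
import numpy as np


def normalized_index(high, low):
    """(high - low) / (high + low); returns 0.0 when both are zero."""
    total = high + low
    return 0.0 if total == 0 else (high - low) / total


def early_late_indices(high_trials, low_trials, n=5):
    """high_trials / low_trials: odor-epoch rates for one neuron in one block,
    listed in the order the trials occurred."""
    early = normalized_index(np.mean(high_trials[:n]), np.mean(low_trials[:n]))
    late = normalized_index(np.mean(high_trials[-n:]), np.mean(low_trials[-n:]))
    return early, late


high = [8] * 5 + [6] * 10 + [4] * 5   # high-value firing drops with learning
low = [8] * 20                        # low-value firing unchanged
early, late = early_late_indices(high, low)
```

Here the index is 0 early in the block and −1/3 late, so selectivity emerges entirely from the reduction on high-value trials.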
These results suggest that mOFC may serve to alter behavior when low-value rewards are predicted by forced-choice cues. If true, then neural selectivity observed during odor sampling after learning might be correlated with the reaction time differences observed between cues that predict high- and low-value outcomes. Consistent with this hypothesis, stronger enhancement of the firing rate during odor sampling was correlated with slower behavioral responses. This is illustrated in Figure 7, which plots the value index, (high − low)/(high + low), computed on average firing rates during the odor epoch against the same index computed for reaction times on those trials. As expected from the analysis above, both indices were negative, indicating slower reaction times and higher firing on low-value trials. Furthermore, the 2 indices were correlated, demonstrating that when rats showed stronger reaction time differences, neural selectivity in the mOFC was enhanced (P < 0.05; r2 = 0.21).
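A sketch of the index-versus-index correlation follows, assuming each point is one neuron (or session) with mean odor-epoch firing and mean reaction time on high- and low-value trials. The same normalized index is applied to both measures, and the relationship is fit with `scipy.stats.linregress`. The inputs are synthetic values chosen so that both indices are negative and perfectly aligned; `index_correlation` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import linregress


def normalized_index(high, low):
    """(high - low) / (high + low); returns 0.0 when both are zero."""
    total = high + low
    return 0.0 if total == 0 else (high - low) / total


def index_correlation(fr_high, fr_low, rt_high, rt_low):
    """Firing-rate and reaction-time value indices per unit, plus r**2 and P
    of a linear fit of the firing index on the reaction-time index."""
    fr_idx = np.array([normalized_index(h, l) for h, l in zip(fr_high, fr_low)])
    rt_idx = np.array([normalized_index(h, l) for h, l in zip(rt_high, rt_low)])
    fit = linregress(rt_idx, fr_idx)
    return fr_idx, rt_idx, fit.rvalue ** 2, fit.pvalue


# Synthetic units: lower firing and faster responses on high-value trials.
fr_idx, rt_idx, r2, p = index_correlation(
    fr_high=[1.0, 2.0, 3.0, 4.0], fr_low=[3.0, 4.0, 5.0, 6.0],
    rt_high=[0.25, 0.5, 0.75, 1.0], rt_low=[0.75, 1.0, 1.25, 1.5])
```

Because the index divides by the sum, it is unit-free, so firing rates (spikes/s) and reaction times (s) can be placed on the same axes.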
Consistent with imaging and anatomical studies, recent lesion work in rats and primates has shown that subregions of the OFC perform different functions (Noonan et al. 2010; Mar et al. 2011; Rudebeck and Murray 2011a, 2011b). However, few studies have examined activity in the mOFC, making it difficult to understand the exact role that mOFC plays in reward-guided decision-making. Here, we demonstrate that reward- and cue-evoked responses in the mOFC are modulated by the size of and delay to reward, 2 value manipulations that clearly impact decision making. At the time of reward delivery, activity was higher when outcomes were of higher value. During odor sampling, the opposite effect was observed; that is, odor-responsive neurons fired more strongly for odor cues that predicted low-value outcomes. Below we discuss these results, comparing mOFC to activity in other areas, including our own work in lOFC, with the caveat that these comparisons are made across studies and in different rats, and that neurons might have been sampled from different layers given the structure of these 2 areas.
In previous reports, we characterized firing in the lOFC as rats performed the same task (Roesch et al. 2006; Takahashi et al. 2009; Roesch et al. 2012). Activity related to reward expectancy and delivery was similar across mOFC and lOFC in that overall activity was reduced when rewards were delayed. The major difference between these 2 subregions emerged during the sampling of odors that predicted different outcomes. Although neurons that show increased firing for cues that predict a low-value reward have been described previously in the lOFC, in the mOFC such neurons outnumbered those showing the opposite pattern, and the population response averaged over all odor-responsive neurons was stronger when cues signaled a low value. This makes mOFC unique among brain areas thought to be critical in reward-guided decision-making; most reward-related regions in the brain, including lOFC, fire more strongly for cues that predict a more valuable reward (Tremblay and Schultz 1999; Roesch and Olson 2004, 2005; Schoenbaum and Roesch 2005; Padoa-Schioppa and Assad 2006; Roesch and Olson 2007; Simmons et al. 2007; van Duuren et al. 2007; Wallis 2007; Kennerley and Wallis 2009; van Duuren et al. 2009; Bouret and Richmond 2010; Kennerley et al. 2011; Padoa-Schioppa 2011).
Such a signal might be important for inhibitory control and/or might complement the more common response-bias signals that are elevated when animals expect better rewards. Consistent with this idea, reports in humans and rats have shown that mOFC dysfunction causes subjects to make riskier decisions, possibly by disrupting inhibitory control over biases toward riskier rewards (Clark et al. 2008; St Onge and Floresco 2010; Zeeb et al. 2010; Stopper et al. 2014). Furthermore, our data show that increased mOFC firing during the sampling of odors that predicted low-value outcomes was positively correlated with differences in reaction time; that is, when activity was high, reaction times were slow. However, we must exercise some caution here, because increased firing on low-value trials might also be interpreted as a signal that permits a behavioral response to occur, albeit one directed away from the more valued outcome.
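The firing/reaction-time relationship described above can be made concrete with a small sketch. The exact index used in our analyses is not restated in this section, so the normalized-difference contrast, the Pearson correlation, the function names (`contrast_index`, `pearson_r`), and all numbers below are illustrative assumptions; the values are hypothetical and are not data from this study. Each neuron contributes one firing index (low- versus high-value odor-epoch firing) and one reaction-time index from the same session, and a positive correlation means that neurons with stronger low-value firing came from sessions in which low-value responses were slower.

```python
import math


def contrast_index(low, high):
    """Normalized difference (low - high) / (low + high).

    Positive when the low-value condition has the larger value.
    Assumed index form, for illustration only.
    """
    total = low + high
    return (low - high) / total if total else 0.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-neuron means (NOT data from this study):
# (odor-epoch firing on low-value trials, on high-value trials), spikes/s
firing = [(12.0, 8.0), (9.0, 9.5), (15.0, 7.0), (6.0, 6.5), (11.0, 10.0)]
# Hypothetical session reaction times: (low-value trials, high-value trials), s
rts = [(0.42, 0.30), (0.35, 0.34), (0.50, 0.29), (0.33, 0.33), (0.38, 0.34)]

firing_idx = [contrast_index(lo, hi) for lo, hi in firing]
rt_idx = [contrast_index(lo, hi) for lo, hi in rts]
r = pearson_r(firing_idx, rt_idx)
print(f"r = {r:.2f}")
```

A normalized contrast is used here so that neurons with different baseline firing rates contribute on a comparable scale; a real analysis would also need a significance test across neurons, which this sketch omits.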
In none of our other studies have we found a brain area in which population firing increased to cues that predicted a low-value reward (Roesch and Bryden 2011). This includes brain areas in relatively close proximity to our recording sites, such as medial prefrontal cortex (mPFC) and lOFC (Roesch et al. 2006, 2012; Gruber et al. 2010). Unfortunately, in this study, we could not dissociate medial from ventral OFC because our sample size was too small. That said, we observed no obvious differences between the 2 regions, but future work is necessary to determine whether they carry different signals. Although the connections of medial and ventral OFC do overlap, recent connectivity-based work has suggested that ventral and medial OFC might play different roles, comparable to those of lOFC and mPFC, respectively, and that both ventral and medial aspects of OFC might serve as a link between lOFC and mPFC (Hoover and Vertes 2011; Kahnt et al. 2012; Wallis 2012). Findings such as these make it difficult to draw a hard line between mOFC and mPFC. Regardless of whether this region is considered part of the OFC or the PFC, we show that its encoding of predicted outcomes differs considerably from that in the lOFC (Roesch et al. 2006, 2012) and PFC (Gruber et al. 2010), consistent with recent lesion work targeting this specific region (Mar et al. 2011; Stopper et al. 2014).
To the best of our knowledge, there is only one other single-unit study that has shown elevated firing in the majority of cue-responsive neurons when animals anticipate a small reward. In that paper, monkeys performed a go/no-go task for a predicted large or small reward (Minamimoto et al. 2005). The authors found increased activity in the centromedian nucleus of the thalamus (CM) when monkeys made actions (go or no-go) for a small compared with a large reward. Further, they showed that stimulation of CM slowed reactions that were normally fast, demonstrating a role for CM in a mechanism complementary to the more common signals thought to bias animals toward better rewards. Although connectivity between mOFC and CM is relatively sparse, it is possible that interactions between them are critical for reward-guided behaviors (Hoover and Vertes 2011; Vertes et al. 2012).
For decades, it was thought that OFC was critical for response inhibition because damage to OFC made animals and humans lose aspects of inhibitory control and become more impulsive in their actions (Damasio et al. 1994; Bechara et al. 2000; Berlin et al. 2004; Torregrossa et al. 2008; Schoenbaum et al. 2009). Here, we show that activity was high in situations in which the animal had to inhibit responding: at the beginning of trial blocks, and when forced-choice trials instructed the rat to respond away from the desired reward toward the low-value well. Loss of this signal of unfavorable outcomes during decision making and learning could account for many of the impairments attributed to deficits in inhibition. Interestingly, if the role of this signal is to inhibit behavioral output, it appears to do so in an outcome-specific manner, because the correlation between selective firing during size and delay blocks was not significant (Fig. 3e), suggesting that OFC does not output a simple general/global inhibition signal.
One common way to assess response inhibition and impulsivity is to conduct delay-discounting procedures in which animals choose between small immediate rewards and large rewards delivered after long delays (Cardinal et al. 2004; Kalenscher and Pennartz 2008; Zeeb et al. 2010). Although the involvement of OFC in impulsive choice is indisputable, the exact role it plays is still unclear because of conflicting findings from several labs (Kheramin et al. 2002; Mobini et al. 2002; Winstanley et al. 2004; Rudebeck et al. 2006; Winstanley 2007; Clark et al. 2008; Churchwell et al. 2009; Sellitto et al. 2010; Zeeb et al. 2010; Mar et al. 2011).
To add to the complexity of this story, more recent work has shown that different regions of the OFC serve opposing functions during delay discounting (Mar et al. 2011). In that study, Mar and colleagues showed that lesions of the mOFC made rats discount less, increasing responding for the delayed reward after extended postlesion training (i.e., less impulsive), whereas lOFC lesions made rats discount more, decreasing preference for the delayed reward (i.e., more impulsive). Still, others have reported no impact of mOFC inactivation on delay discounting (Stopper et al. 2014).
Here, we show that, as in lOFC, activity in mOFC reward-responsive neurons was attenuated for delayed reward. These results suggest that the distinct deficits observed after focal lesions to mOFC and lOFC cannot be explained by differences in firing during reward delivery. However, unlike lOFC, odor-responsive neurons in the mOFC signal low-value outcomes at the time of the decision. We suggest that mOFC's role in classic delay-discounting tasks is to signal low-value outcomes during decision making. Although these effects were significant, they were modest, suggesting that other tasks are necessary to fully uncover the roles that mOFC plays in behavior, such as tasks that place greater demands on inhibitory control (e.g., stop-signal tasks) and tasks that require decisions under risk (e.g., probabilistic or uncertain outcomes).
This work was supported by grants from the National Institute on Drug Abuse (R01DA031695, M.R.).
Conflict of Interest: None declared.