To survive in their complex environment, primates must integrate information over time and adjust their actions beyond immediate events. The underlying neurobiological processes, however, remain unclear. Here, we assessed the contribution of the ventromedial prefrontal cortex (VMPFC), a brain region important for value-based decision-making. We recorded single VMPFC neurons in monkeys performing a task in which obtaining fluid rewards required squeezing a grip. The willingness to perform the action was modulated not only by visual information about Effort and Reward levels but also by contextual factors such as Trial Number (i.e., fatigue and/or satiety) or behavior in recent trials. A greater fraction of VMPFC neurons encoded contextual information than information provided by visual stimuli. Moreover, the dynamics of VMPFC firing were more closely related to slow changes in motivational states driven by these contextual factors than to rapid responses to individual task events. Thus, the firing of VMPFC neurons continuously integrated contextual information and reliably predicted the monkeys' willingness to perform the task. This function might be critical when animals forage in a complex environment and need to integrate information over time. Its relation with motivational states also resonates with the involvement of the VMPFC in the “default mode” network and in mood disorders.
Natural selection puts strong pressure on all animals to optimize the ratio between the costs and benefits associated with rewards (Milton and May 1976; Stephens and Krebs 1986; Altman 2006). This optimization can be achieved by simple stereotyped reflexes to behaviorally relevant stimuli (Lorenz 1981; Berridge 2004). With only these reflexes, however, behavior would remain directly bound to the immediate environment; in several species, goal-directed behavior enables the integration of information over space and time (Balleine and Dickinson 1998; Clayton et al. 2003; Correia et al. 2007; Fuster 2008; Aminoff et al. 2013). This ability would be crucial in primates: most of them are frugivorous, and fruiting trees are both sparsely distributed and highly seasonal, so primates could not just wander randomly and expect to find fruits “by chance” in the forest (Cunningham and Janson 2007; Janmaat et al. 2011, 2013; Noser and Byrne 2015). This ecological pressure might have driven the evolution of specific cognitive abilities associated with the prefrontal cortex, which is particularly developed in primates (Fuster 2008; Passingham et al. 2012; Genovesio et al. 2013).
Recent work has emphasized the key role of the ventral prefrontal cortex in the representation of reward values (O'Doherty et al. 2001; Padoa-Schioppa and Assad 2006; Kable and Glimcher 2007; Chib et al. 2009; Lebreton et al. 2009; Walton et al. 2009; Bouret and Richmond 2010; Boorman et al. 2013; Hosokawa et al. 2013; Klein-Flügge et al. 2013; Strait et al. 2014). Anatomically, the medial and orbital regions can be dissociated by the strength of their connections with the hippocampus and parahippocampal cortices (stronger for the medial network) and with sensory structures (stronger for the orbital network) (Lavenex and Amaral 2000; Öngür and Price 2000; Neubert et al. 2015). Functionally, however, this distinction remains debated. In humans, there is a consensus regarding the specific involvement of the ventromedial prefrontal cortex (VMPFC) in processing subjective value (Kable and Glimcher 2007; Rangel et al. 2008; Lebreton et al. 2009; Rushworth et al. 2011; Bartra et al. 2013; Clithero and Rangel 2014). In monkeys, activity related to reward value has traditionally been described in the orbitofrontal cortex (OFC), especially when reward information is provided by sensory stimuli (Thorpe et al. 1983; Tremblay and Schultz 2000; Roesch and Olson 2004; Padoa-Schioppa and Assad 2006). However, more recent studies indicate that the VMPFC may also encode value in monkeys, especially when it relies upon internal information (Bouret and Richmond 2010; Noonan et al. 2010; Monosov and Hikosaka 2012; Strait et al. 2014; Abitbol et al. 2015). This is consistent with studies emphasizing the importance of the interaction between the VMPFC and the medial temporal lobe for attributing value to imaginary items (Peters and Büchel 2010; Barron et al. 2013; Clark et al. 2013; Lebreton et al. 2013; Benoit et al. 2014).
In that framework, the primate VMPFC should be critical for adjusting the willingness to engage in the current course of action based on a slow accumulation of internal and external information. Critically, this estimate should not be limited to the specific events implemented in experimental tasks. Rather, it should continuously integrate information over time, irrespective of its source (sensory stimuli or contextual information). To test this hypothesis, we recorded single-unit activity in monkeys performing a task in which reward and effort levels were systematically manipulated. We focused on area 14r, a part of the VMPFC that is specific to primates (Wise 2008). Consistent with our hypothesis, we found that VMPFC activity was related to several factors driving the willingness to engage in the task. Importantly, this relation was both slow and coherent over time, supporting the idea that VMPFC activity reflects motivational states more strongly than discrete event-related functions. These results agree with recent studies in humans indicating that the role of the VMPFC in value-based decision-making is especially critical for guiding behavior based on evaluation processes that reach beyond the immediate environment.
Material and Methods
We used 2 subadult male macaque monkeys for these experiments, A (4 years, 6 kg) and B (5 years, 7 kg). Monkeys were housed in a group of 6 individuals, with free access to food and controlled access to water during the course of the experiments. Monkeys received water as a reward for performing the task. Experiments were carried out in accordance with the European Community Council Directive (86/609/EEC) and French legislation (Ministère de l'Agriculture et de la Forêt, Commission nationale de l'expérimentation animale). They were approved by the Darwin ethics committee of the University Paris 6 (CREEA IDF no. 3).
The behavioral setting was identical to that of our recent study (Varazzani et al. 2015). Each monkey squatted in a primate chair positioned in front of a monitor on which visual stimuli were displayed. A pneumatic grip (M2E Unimecanique, Paris, France) was mounted on the chair at the level of the monkey's hands. Liquid rewards were delivered from a tube positioned between the monkey's lips. Eye position and pupil area were monitored continuously using a video-based eye tracker (Iscan Inc, MA, USA). Throughout the experimental procedures, focal distance and magnification were kept constant after calibration. The behavioral paradigm was controlled using the REX system (NIH, MD, USA) and Presentation software (Neurobehavioral Systems, Inc., CA, USA).
Monkeys were trained to perform a simple force task: A red target point (wait signal) appeared at the center of the monitor. After a random interval of 500–1500 ms, the target turned green (go signal). If the monkey squeezed the grip 200–1000 ms after the green target appeared, the target turned blue (feedback) and the monkey had to maintain the effort for another 300–600 ms in order to get the fluid reward. For this initial phase, the required effort was adjusted to ~70% of the maximum force, assessed by progressively increasing the threshold necessary to obtain the reward.
Once monkeys were comfortable with this task, we progressively introduced the 2 parameters of interest (3 levels of effort and 3 levels of reward) as well as the corresponding visual cues. Importantly, error trials were repeated, to prevent monkeys from systematically “skipping” the least favorite conditions. The cue appeared within 1 s after the onset of the red “wait” signal and remained on the screen until the end of the trial. We used several cue sets per monkey, mostly during initial training but also during the recording sessions. All cues were gray-level isoluminant fractals or scrambled versions of the same original fractal image, in order to minimize luminance differences (Fig. 1A). The luminance of all stimuli was first established using software (Matlab and Adobe Photoshop) and confirmed with a photometer. Monkeys were trained until they could reliably express differential behavior across the 9 conditions.
Once monkeys had reached a stable performance, a 3 T MR image was obtained to determine the location of the VMPFC to guide recording well placement. Then, a sterile surgical procedure was carried out under general isoflurane anesthesia in a fully equipped and staffed surgical suite to place the recording well and the head fixation post. Training resumed 4 weeks after surgery.
Monkeys were trained to perform the task with the head fixed, and then trained to fixate a central spot. Following these simple procedures, we introduced the final version of the task, which required the monkey to fixate the “wait” (red) spot for 500–800 ms before the cue appeared. Monkeys were required to fixate the central spot until reward delivery. The go signal (i.e., red point turned green) appeared 800–1600 ms after cue onset, and the monkey had up to 1 s to respond by squeezing the grip. Once they had reached the required level of force, the point turned blue and the monkey had to maintain the effort level above the required threshold for another 300–600 ms in order to get the reward. The intertrial interval lasted 1300–1700 ms. Again, error trials were repeated.
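Because error trials are repeated, the completion of each trial determines whether the next one keeps the same Effort/Reward condition; this bookkeeping is what later separates Repeated from New trials. The following is a minimal Python sketch of that logic, assuming uniform random draws for conditions and intervals (the text gives only the 3 × 3 design and the interval ranges). The function names are hypothetical and this is not a description of the REX/Presentation implementation.

```python
import random

# Assumption: conditions and intervals are drawn uniformly; the text gives only
# the 3 x 3 Effort/Reward design and the interval ranges (ms).
EFFORT_LEVELS = (1, 2, 3)
REWARD_LEVELS = (1, 2, 3)


def next_condition(prev_condition, prev_completed, rng):
    """Return (condition, trial_type) for the upcoming trial.

    An error trial is repeated with the same (effort, reward) condition
    ("Repeated"); after a correct trial a new condition is drawn ("New").
    """
    if prev_condition is not None and not prev_completed:
        return prev_condition, "Repeated"
    return (rng.choice(EFFORT_LEVELS), rng.choice(REWARD_LEVELS)), "New"


def sample_intervals(rng):
    """Draw the variable intervals (ms) of the final version of the task."""
    return {
        "fixation_to_cue": rng.uniform(500, 800),
        "cue_to_go": rng.uniform(800, 1600),
        "response_deadline": 1000,
        "hold": rng.uniform(300, 600),
        "intertrial": rng.uniform(1300, 1700),
    }


def label_trials(completed_flags, seed=0):
    """Label each trial New or Repeated from the completion of its predecessor."""
    rng = random.Random(seed)
    condition, completed = None, True
    trials = []
    for flag in completed_flags:
        condition, trial_type = next_condition(condition, completed, rng)
        trials.append((condition, trial_type))
        completed = flag
    return trials
```

For example, `label_trials([False, False, True, True])` labels the trials New, Repeated, Repeated, New, and the first three share one condition.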
We sorted errors into 2 categories: those reflecting a choice by the animal not to perform the trial (fixation breaks and failures to squeeze the bar) and those reflecting a failed attempt to perform the task (when the exerted force did not reach the required threshold or when the monkey released the grip too early). The latter (around 1% of the trials) were grouped with correct trials as choices to perform the trial. In other words, a choice was counted as positive when monkeys squeezed the bar, and all other cases (including fixation breaks) were counted as negative choices. In the vast majority of cases, monkeys forwent the trial by breaking fixation, which could happen before or after cue onset.
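The coding rule above can be written as a small function. The outcome labels are hypothetical, because the text names the error types but not how they were stored. The sketch also makes explicit that a failed attempt counts as a positive choice even though, being an error, the trial is still repeated.

```python
# Hypothetical outcome labels; the text names the error types but not their encoding.
REFUSALS = {"fixation_break", "omission"}                  # choice not to perform
FAILED_ATTEMPTS = {"insufficient_force", "early_release"}  # tried, but failed


def code_choice(outcome):
    """Return 1 if the monkey squeezed the grip (chose to work), else 0."""
    if outcome == "correct" or outcome in FAILED_ATTEMPTS:
        return 1
    if outcome in REFUSALS:
        return 0
    raise ValueError(f"unknown outcome: {outcome!r}")


def is_error(outcome):
    """Every outcome other than a correct trial is an error and is repeated."""
    return outcome != "correct"
```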
Electrophysiological recordings were made with tungsten microelectrodes (FHC, impedance: 1.5 MΩ). The electrode was positioned using a stereotaxic plastic insert with holes 1 mm apart in a rectangular grid (Crist Instruments) and was inserted through a guide tube. After several recording sessions, MR scans were obtained with the electrode at one of the recording sites; the position of the recording sites was reconstructed based on their relative position in the stereotaxic plastic insert and on the alternation of white and gray matter, identified with electrophysiological criteria during recording sessions. All recording sites were located in the ventral part of the gyrus rectus, from 1 to 7 mm anterior to the genu of the corpus callosum. All recordings were obtained from area 14r, based on the architectonic maps of Carmichael and Price (1994). All neurophysiological and behavioral data were collected on an Omniplex system (Plexon Co, TX, USA). The signal was amplified (×10 000) and filtered (100 Hz–2 kHz) for single-unit sorting. Single units were isolated offline using Plexon software.
To estimate the willingness to work on a trial-by-trial basis, we measured the choice to perform the action, or not, as a function of task parameters (Reward and Effort levels) and contextual effects, including progression through the session and past responses. The choice variable was equal to 1 in trials where monkeys squeezed the bar and 0 otherwise. Note that in some trials, monkeys did not even fixate the point long enough to let the cue appear, but still modulated their behavior based on contextual information. Thus, all trials were included in the analysis to provide a continuous measure of the willingness to work, unless specified otherwise (see below). We used a logistic regression to predict trial-by-trial choices based on a constant term and the 3 key task parameters (Effort level, Reward size, and Trial Number). We estimated the regression coefficients of each of these parameters using the glmfit.m function in Matlab, with a logit link. We used this measure to evaluate the willingness to work either in specific subsets of trials (Repeated trials and New trials) or in all trials, whether or not monkeys fixated long enough to let the cue appear. In the latter case, the variable “information about Reward” or “information about Effort” was scored as 0 when it was not available (e.g., in New trials, if monkeys broke fixation before the cue appeared). In all other cases (in Repeated trials, or in New trials if monkeys fixated long enough to let the cue appear before choosing whether to respond), the monkeys had access to the information about Reward and Effort. Repeated trials are trials that follow a failure to complete the previous trial; because error trials are repeated, monkeys already had information about the Effort and Reward levels of the current trial. 
Since monkeys are familiar with the structure of the task (erroneous trials are repeated), they can infer that Reward and Effort will be the same as in the previous trial and can therefore choose whether to engage in the trial from the onset of the wait signal, before the cue appears. New trials were trials following a correct response, in which Reward and Effort levels could only be inferred from the visual cues. To estimate the progression through the session and the resulting effect of fatigue and satiety, we measured the cumulative number of trials since the beginning of the recording session (Trial Number, see Bouret and Richmond 2010). We also systematically examined the influence of responses in past trials, over distances ranging from 1 to 35 trials back. Importantly, we always included potential confounding factors (Reward, Effort, and Trial Number) as coregressors in the model, to evaluate the effect of past responses on current behavior over and above these parameters. We used the following generalized linear model (GLM) with a logit link: logit P[choice(n) = 1] = βr·reward(n) + βe·effort(n) + βt·trial nb(n) + βp·choice(n − x) + constant.
In this equation, n refers to the current trial and x refers to the distance between the current trial and the previous one. We included all trials of a session to estimate the betas and we repeated this analysis for x ranging from 1 to 35, in order to assess the relative weight of past responses as a function of the distance from the current trial.
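The choice model above was fit with Matlab's glmfit (logit link). As a self-contained illustration, the pure-Python sketch below fits the same kind of logistic model by Newton-Raphson (equivalently, iteratively reweighted least squares) and builds the lagged design for a given distance x. The function names are hypothetical, the fixed iteration count replaces a proper convergence check, and dropping the first x trials of a session (for which choice(n − x) does not exist) is an assumption, since the text does not say how these trials were handled.

```python
import math


def zscore(values):
    """Standardize a regressor to zero mean and unit (population) SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, y, n_iter=25):
    """Logistic regression (logit link) by Newton-Raphson / IRLS.

    X: rows of regressors without intercept; y: 0/1 choices.
    Returns [constant, beta_1, beta_2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        H = [[0.0] * k for _ in range(k)]  # X' W X, with W = p(1 - p)
        g = [0.0] * k                      # gradient X' (y - p)
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            w = p * (1.0 - p)
            for i in range(k):
                g[i] += r[i] * (yi - p)
                for j in range(k):
                    H[i][j] += w * r[i] * r[j]
        step = solve(H, g)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def lagged_design(reward, effort, trial, choice, x):
    """Rows [reward(n), effort(n), trial(n), choice(n - x)] and targets choice(n).

    Trials n < x have no choice(n - x) and are dropped (an assumption).
    """
    rows = [[reward[n], effort[n], trial[n], choice[n - x]]
            for n in range(x, len(choice))]
    return rows, choice[x:]
```

For the lag analysis, one would call `lagged_design` for x = 1, …, 35, z-score each column, fit the model, and keep the last coefficient (βp) for each x. In practice, an established GLM routine (glmfit, or `statsmodels` in Python) is preferable to this hand-rolled solver.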
We used a similar approach to evaluate the influence of these factors on neuronal activity. We measured the firing rate of each VMPFC neuron in 3 windows: 1) from 0 to 500 ms after the onset of the fixation point, only in Repeated trials; 2) from 100 to 600 ms after cue onset, only in New trials; and 3) from 0 to 500 ms after the onset of reward delivery. To measure the neural encoding of task factors, we used a GLM in which single-trial firing rates were modeled as a constant plus a weighted linear combination of 3 variables: Effort level, Reward size, and Trial Number. We used the estimated regression coefficients of these variables to compare their relative influence on neuronal activity. The variables were z-scored for each neuron to allow comparison of their effects. Firing rates were raw data expressed in spikes per second: since we analyzed single-unit activity and did not pool the firing of multiple neurons, we did not z-score the neuronal data. We also used GLMs to estimate the relation between firing and willingness to work, over and above other factors such as task parameters and choice in the previous trial (by including them as coregressors). We used the 3 following GLMs:
spike count = βc·choice + constant
spike count = βc·choice + βr·reward + βe·effort + βt·trial nb + constant
spike count = βc·choice + βr·reward + βe·effort + βt·trial nb + βp·previous choice + constant
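A minimal pure-Python sketch of these 3 nested models for one neuron, fit by ordinary least squares: regressors are z-scored as described above, while spike counts are left raw. Least squares is used here because a Gaussian GLM with an identity link yields the same coefficients; the actual fitting routine is not specified in the text, and all function names are hypothetical.

```python
import math


def zscore(values):
    """Standardize a regressor (population SD); it must not be constant."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients [constant, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def fit_choice_models(spikes, choice, reward, effort, trial, prev_choice):
    """Fit the 3 nested models for one neuron.

    Regressors are z-scored; spike counts are left raw, as in the text.
    Returns three coefficient lists, each starting with the constant.
    """
    z = [zscore(v) for v in (choice, reward, effort, trial, prev_choice)]
    columns = ([z[0]], z[:4], z)  # regressors of models 1, 2 and 3
    return tuple(fit_ols(list(zip(*cols)), spikes) for cols in columns)
```

Comparing βc across the 3 models shows whether the relation between firing and choice survives once task parameters and the previous choice are accounted for.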
To compare the relation between VMPFC firing and engagement in the task across several time scales, we measured the firing rate and the number of trials completed in bins of several sizes. For each bin size, the entire session was split into successive bins, in which we counted the firing rate (number of spikes fired by the neuron in each bin) and the work rate (number of trials completed in each bin), and computed the average reward size and average effort level over all the trials within the bin. Indeed, slow fluctuations in VMPFC activity and Work Rate could be directly driven by slow changes in Reward/Effort or by the progression through the session, so we needed to estimate the variance explained by slow fluctuations above and beyond these factors. In practice, we ran 2 analyses. In the first, we simply computed the correlation between work rate and firing rate across all bins. In the second, we modeled the firing rate with a GLM that included the task variables (Reward, Effort, and Trial Number) as coregressors, along with a corrected Work Rate from which we had regressed out the variance due to these task factors. Thus, for each bin size, we predicted VMPFC firing rates using a GLM estimating the coefficients of 4 parameters: Work Rate (corrected), average Reward, average Effort, and Bin number, plus a constant term.
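The binning procedure and the 2 analyses can be sketched as follows, under stated assumptions: a hypothetical data format (spike timestamps in seconds and a list of trial dictionaries), a trailing partial bin that is simply dropped, and bins without trials that must be excluded before regression. These choices are not specified in the text.

```python
import math


def bin_session(spike_times, trials, session_end, bin_size):
    """Split a session (s) into successive bins of bin_size seconds.

    trials: list of dicts with hypothetical keys 'time', 'completed',
    'reward', 'effort'. A trailing partial bin is dropped (assumption).
    """
    bins = []
    for k in range(int(session_end // bin_size)):
        lo, hi = k * bin_size, (k + 1) * bin_size
        in_bin = [tr for tr in trials if lo <= tr["time"] < hi]
        n = len(in_bin)
        bins.append({
            "bin_number": k + 1,
            "spike_count": sum(lo <= t < hi for t in spike_times),
            "work_rate": sum(tr["completed"] for tr in in_bin),
            "reward": sum(tr["reward"] for tr in in_bin) / n if n else None,
            "effort": sum(tr["effort"] for tr in in_bin) / n if n else None,
        })
    return bins


def pearson(x, y):
    """Pearson correlation (first analysis: work rate vs. firing across bins)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def residualize(y, covariates):
    """Residuals of y after least-squares regression on covariates + constant."""
    rows = [[1.0] + list(c) for c in covariates]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]
```

For a given bin size, `residualize([b["work_rate"] for b in bins], [[b["reward"], b["effort"], b["bin_number"]] for b in bins])` gives a corrected Work Rate. That corrected rate then enters the firing-rate GLM alongside the same 3 covariates.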
We trained 2 monkeys to perform the effort/reward task depicted in Figure 1A,B. First, monkeys readily used the information about upcoming effort to adjust their behavior. We assessed the influence of Effort level on the amount of force produced using a linear regression, which was significant in both animals: monkey A: βE = 0.8 ± 0.009, P < 10⁻³; monkey B: βE = 0.9 ± 0.004, P < 10⁻³. Thus, monkeys readily adjusted the amount of force produced to the task requirement, rather than producing a fixed high force.
Second, monkeys did not systematically perform the action to obtain the reward: on average, monkey A agreed to perform the task in 52 ± 5% of the trials (22 sessions) and monkey B in 66 ± 4% of the trials (36 sessions). In this task, since trials were repeated until they were performed correctly, monkeys could choose to engage in the trial in 2 very distinct conditions: Repeated and New trials. In New trials, which followed a correctly performed action, information about upcoming Reward and Effort was provided by the visual cue. By contrast, in Repeated trials, upcoming Reward and Effort levels could be inferred using general knowledge about the task structure (error trials are repeated) and memory of the previous trial (either the cue itself, the information it conveyed, or the decision). Thus, at trial onset (fixation point), monkeys could choose whether to engage in the trial based on information in memory rather than on visual cues. Hence, we separated choices made at the onset of the fixation point in Repeated trials from choices made after cue onset in New trials. In both cases, the choices consisted either of maintaining fixation and performing the action (“yes”) or of breaking fixation and aborting the trial (“no”) (see Methods for further details).
Both monkeys modulated their choices to perform the action as a function of task parameters (the expected amount of reward and effort) and of progression through the session (number of trials performed), which combines fatigue and satiety. But these effects differed between Repeated and New trials (Fig. 1C). For each condition (Repeated vs. New trials) and each monkey (A and B), we used a logistic regression to evaluate the influence of a constant plus 3 parameters (Reward, Effort, and Trial Number) on the choice to perform the action or not. Interactions among the 3 task factors were not included because an initial analysis showed that they were not significant. All regressors were z-scored to allow a direct comparison of the estimated coefficients. We then compared these coefficients using a 3-way ANOVA to evaluate the influence of 3 factors: Monkey (2 categories, one for each animal), Condition (2 categories, Repeated vs. New trials), and Variable (3 categories: Reward Size, Effort Level, and Trial Number). There was no difference between the 2 animals (neither the main effect of the factor Monkey nor any interaction involving it was significant; all P > 0.1). As shown in Figure 1C, the constant was greater in New trials, indicating that monkeys were globally more willing to perform the action in those trials than in Repeated trials. Indeed, monkeys responded positively more often in New (78 ± 2%; average across animals) than in Repeated trials (50 ± 3%; average across animals). This is presumably because, at the time of the cue in New trials, monkeys had already committed to viewing the cue and obtaining the information about costs and benefits. Since they were already engaged, they probably had a bias toward performing the action and were less sensitive to information about costs and benefits. In line with this interpretation, the effects of all 3 factors (Effort, Reward, and Trial Number) were greater in Repeated than in New trials. 
First, there was a greater variability in behavior in Repeated trials (total variance = 0.24) compared with New trials (variance = 0.17). As expected, there was a clear difference among the effects of the 3 task variables (Main effect of Variable, F = 65.6; P < 10−4). Both Effort and Trial Number had a negative influence on choices, whereas Reward had a positive effect. But the greater variability in Repeated trials was captured by the clear interaction between the factors Variable and Condition (F = 14, P < 10−4). As shown in Figure 1C, the effects of all 3 factors were more pronounced in Repeated compared with New trials.
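For a binary choice, the variance depends only on the acceptance rate p: it equals p(1 − p), which peaks at p = 0.5. Plugging in the average acceptance rates reported above (50% in Repeated and 78% in New trials) gives values close to the reported variances. This is a quick arithmetic check, not a reanalysis. Because the text does not specify how trials were pooled to obtain 0.24 and 0.17, the match is only approximate.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 choice variable with acceptance probability p."""
    return p * (1.0 - p)


for label, p in (("Repeated", 0.50), ("New", 0.78)):
    print(label, round(bernoulli_variance(p), 4))
# prints: Repeated 0.25
#         New 0.1716
```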
One possible confound is that Repeated trials occur after monkeys fail to complete a trial, which happens more often as the session advances (because of fatigue and satiety); the difference between Repeated and New trials could therefore be nothing but a difference between the beginning and the end of the session (over and above the linear effect of trial number, which was already included as a coregressor). Indeed, the percentage of trials in which monkeys chose to perform the trial decreased significantly between the first (66 ± 3%) and second half (55 ± 4%) of the sessions (n = 58 sessions, paired t-test, t = 4.4, P = 5 × 10⁻⁵), together with the percentage of New trials (46 ± 2% vs. 39 ± 3% for the first and second halves of the sessions, respectively; paired t-test, t = 3.1, P = 0.003). The difference in behavior between New and Repeated trials, however, could not be simply accounted for by the advancement through the session (and the corresponding changes in fatigue and satiety). We measured the variance in the choices to perform the trial in the first versus second half of each session, for both New and Repeated trials, and compared the effects of these 2 factors using a 2-way ANOVA (factor “session,” with 2 levels corresponding to the first and second half of the session; factor “trial type,” with 2 levels corresponding to New vs. Repeated trials). Only the factor “trial type” had a significant effect (F = 9.4, P = 0.002); there was no effect of “session” (F = 1.6, P = 0.2) and no interaction (F = 3.3, P = 0.07). Thus, choice variance was significantly greater for Repeated (0.18 ± 0.007) than for New trials (0.15 ± 0.006), but it was indistinguishable between the first and second halves of the sessions. We also evaluated the impact of the 3 task variables (Reward, Effort, and Trial Number) on choices by measuring the sum of the variance explained by these variables. 
As we did for the variance in the previous analysis, we compared the effects of “session” and “trial type” using ANOVA, but here we added a third factor (“task variable,” with 3 levels corresponding to Reward, Effort, and Trial Number). The amount of variance explained was only affected by the factor “trial type” (F = 3.9, P = 0.05), with significantly more variance explained in Repeated (62 ± 5%) than in New trials (38 ± 2%). Neither the main effects of the other 2 factors nor any of the interactions among the 3 factors reached significance (all P > 0.05). Thus, the variability in choices to perform the task or not was equally sensitive to the 3 task variables (even if the directions of these effects differed), and the weight of these variables was indistinguishable between the first and second halves of each session, but the 3 task variables had stronger effects on behavior in Repeated than in New trials. Therefore, the difference in behavior between New and Repeated trials cannot be simply accounted for by a side effect of progression through the session.
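The paired comparisons above (e.g., acceptance rates in the first vs. second half of the 58 sessions) rest on the paired t statistic, computed from per-session differences. The following is a minimal sketch with a hypothetical function name. Obtaining a P value additionally requires the t distribution with n − 1 degrees of freedom; `scipy.stats.ttest_rel`, for instance, performs the whole test.

```python
import math


def paired_t(first, second):
    """Paired t statistic and degrees of freedom from per-session values."""
    d = [a - b for a, b in zip(first, second)]  # per-session differences
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))  # sample SD
    return mean / (sd / math.sqrt(n)), n - 1
```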
In short, monkeys adjusted their willingness to engage in the task across conditions, defined by a combination of Reward and Effort level and the progression through the task (Trial number). In New trials, monkeys were globally more likely to perform the action and less sensitive to task factors, presumably because they were already engaged in the trial.
We recorded 57 and 56 neurons from the VMPFC of monkey A and B, respectively (see Fig. 2). All neurons encountered along the tracks were included in the analysis, as long as the units were well isolated using a time–voltage threshold discrimination criterion and as long as these criteria (signal-to-noise ratio in the spike shape) remained stable throughout the session. All neurons were recorded from the gyrus rectus, between 1 and 9 mm anterior to the genu of the corpus callosum (area 14r, Carmichael and Price 1994). The activity profiles were similar in the 2 animals, and we did not notice any systematic difference among neurons recorded at distinct sites along the antero/posterior, medio/lateral, or dorso/ventral axis. Thus, the neuronal data (113 units) were pooled. The average firing rate of those neurons was 3.6 ± 0.6 spk/s.
Encoding of Task Variables Around Task Events?
We first examined the modulation of firing as a function of the 3 task variables, Reward Size, Effort Level, and Trial Number (Fig. 3). Figures 3A,B show representative examples of single VMPFC neurons encoding Reward Size across task epochs. We used a GLM to estimate the influence of Effort, Reward, and Trial Number on the firing rate of each neuron in sliding windows around 3 task events: 1) cue onset in New trials (when information about reward and effort was provided by visual cues, Fig. 4A), 2) onset of the fixation point in Repeated trials (when information about upcoming Reward and Effort could be inferred from the previous trial and knowledge about the task, Fig. 4B), and 3) reward delivery (the actual trial outcome, Fig. 4C). Note that since fixation point and cue onsets were separated by 500–800 ms, cue-related activity in Repeated trials can be observed ~650 ms after onset of the fixation point in those trials. Conversely, activity around the fixation point in New trials occurs about 650 ms before cue onset in those trials. All neurons were included in the analysis, and all P values were corrected for multiple comparisons across windows using a false discovery rate (FDR) procedure.
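The text specifies an FDR correction across windows but not the exact procedure. The sketch below assumes the Benjamini-Hochberg step-up procedure, the most common choice, applied to one neuron's P values across the sliding windows. `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` implements the same procedure.

```python
def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per P value: True if it survives FDR control at q.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k such that the k-th smallest P value is <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    significant = [False] * m
    for i in order[:k_max]:
        significant[i] = True
    return significant
```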
First, for all 3 task events, a greater proportion of VMPFC neurons encoded the factor Trial Number, which captures the slow effects of fatigue and satiety, than the information about Effort and Reward. Second, a small but significant proportion of VMPFC neurons encoded the amount of expected reward at cue onset in New trials (Fig. 4A). In the same conditions, almost no VMPFC neurons were sensitive to the visual information about upcoming effort. In Repeated trials, an equally small but significant proportion of neurons encoded these 2 factors (Fig. 4B). An equivalent proportion of VMPFC neurons also encoded these factors around the trial outcome (Fig. 4C). In short, VMPFC neurons were more sensitive to advancement through the session (Trial Number) than to the information about Reward and Effort. The encoding of these 2 factors engaged a small but significant portion of the population, and VMPFC neurons seemed more sensitive to information about Reward than Effort, especially when it was provided by visual cues.
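The text does not state how a proportion of encoding neurons was judged significant. One common approach, assumed here, compares the observed count with the count expected by chance at the per-neuron α level, using a one-sided binomial test. This simplified sketch ignores the fact that the FDR correction changes the effective per-neuron false-positive rate.

```python
from math import comb


def binomial_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'significant' neurons."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))
```

With the 113 recorded neurons, `binomial_tail(k, 113)` gives the chance probability of observing k or more encoding neurons if each neuron had a 5% false-positive rate.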
We also examined the relation among responses to Reward, Effort, and Trial Number (Fig. 5). We conducted this analysis at all time points around the 3 task events (fixation point, cue onset, and reward delivery), using the same sliding window procedure for both Repeated and New trials. At each time point, we measured the correlation coefficient between the betas of all 113 neurons for the 3 possible pairs of variables (Reward-Effort, Reward-Trial Number, Effort-Trial Number). None of these correlations reached significance (all P > 0.05), even without correcting for multiple comparisons. We represented the relation among neural responses to Effort, Reward, and Trial Number at 2 key events: cue onset in New trials (Fig. 5A) and fixation point onset in Repeated trials (Fig. 5B). We also evaluated the proportion of neurons encoding 0, 1, 2, or 3 task variables at these 2 events (Fig. 5C,D). In line with the previous analysis, very few VMPFC neurons displayed a significant modulation by more than one factor.
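The tally shown in Figure 5C,D amounts to counting, for each neuron, how many of the 3 variables reached significance. The following is a minimal sketch that assumes a hypothetical input format: one tuple of booleans per neuron, ordered Reward, Effort, Trial Number.

```python
from collections import Counter


def count_encoded(significance):
    """Tally neurons by how many task variables they significantly encode.

    significance: one tuple of booleans per neuron, ordered
    (Reward, Effort, Trial Number), a hypothetical input format.
    Returns [n_0, n_1, n_2, n_3].
    """
    counts = Counter(sum(flags) for flags in significance)
    return [counts.get(k, 0) for k in range(4)]
```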
Because of the possible confound between the influence of trial type (Repeated vs. New trials) and the advancement through the session, we repeated these analyses after splitting each session into 2 blocks (first vs. second half). The results of these analyses were indistinguishable between the 2 blocks (
VMPFC Neurons Encode Reward Size and Trial Number Reliably Over Task Events
As shown in Figure 3A,B, the encoding of reward size by VMPFC neurons was relatively consistent over task epochs. We examined more systematically the consistency of encoding of each of the 3 factors (Effort, Reward, and Trial Number) across task epochs (Fig. 6). In practice, we measured the correlation between the estimated regression coefficients of all 113 neurons across 3 epochs of the task: onset of the fixation point, onset of the cue, and trial outcome. As can be seen in Figure 6, all the distributions of regression coefficients were centered on zero (one-sample t-test against zero, all P > 0.05), indicating that there was no bias in this population to encode Reward, Effort, or Trial Number in a positive or negative fashion. Rather, the proportions of neurons displaying a positive versus negative relation with each of these factors were equivalent. In other words, the global firing rate of this sample of VMPFC neurons did not change as a function of Reward, Effort, or Trial Number. Overall, Reward Size and Trial Number were encoded reliably across task epochs, but Effort Level was not. More specifically, as shown in Figures 3A and 6 (middle), the regression coefficients for Reward at cue onset were significantly correlated with the Reward coefficients estimated at trial outcome (reward delivery). Note that the correlation between Reward coefficients at the cue and at action onset was also significant (P < 0.05, data not shown). Thus, the encoding of reward levels by individual VMPFC neurons was very consistent across the different epochs of a trial. In addition, Reward Size was reliably encoded across trial types: the coefficients for Reward were significantly correlated between the onset of the fixation point in Repeated trials and the onset of the cue in New trials (see Fig. 3B for an example neuron; population analysis in Fig. 6, center). 
Using the same approach, we found a similar consistency in the encoding of Trial Number across the different task epochs and across the different trials, in line with the idea that this other factor is encoded reliably in the firing of individual VMPFC neurons (Fig. 6, right). We used the same approach to measure the correlation between regression coefficients for the factor Effort across different task epochs but in that case there was no significant correlation (Fig. 6, left; all P > 0.05). This is coherent with the limited sensitivity of VMPFC neurons to information about physical effort compared with Reward and Trial Number.
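The consistency analysis above can be sketched as 2 steps: correlating one variable's per-neuron coefficients between every pair of epochs, and testing whether the population distribution is centered on zero with a one-sample t statistic. The function names and the dictionary input format are hypothetical. P values would again require the relevant t distributions, which are not computed here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(values):
    """t statistic testing whether the mean coefficient differs from zero."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean / (sd / math.sqrt(n))


def epoch_consistency(coefs):
    """Correlate one variable's per-neuron coefficients between every pair of epochs.

    coefs: dict mapping an epoch name to a list of coefficients, with neurons
    in the same order in every list (hypothetical format).
    """
    epochs = list(coefs)
    return {(a, b): pearson(coefs[a], coefs[b])
            for i, a in enumerate(epochs) for b in epochs[i + 1:]}
```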
Altogether, these results suggest that Reward level and Trial Number induce a coherent firing pattern in individual neurons across trial types and task events, yet their influence on firing is heterogeneous and independent across the population of VMPFC neurons.
VMPFC Activity and Willingness to Perform the Task
After measuring the relation between VMPFC activity and task factors, we examined its relation with behavior. We focused on the willingness to perform the task on a trial-by-trial basis, measuring the relation between neural activity and the choice to perform the action or not.
We first examined the relation between firing and choices in discrete windows around specific task events, in different subsets of trials: cue onset in New trials and fixation point onset in Repeated trials. We did not notice any specific pattern (e.g., a threshold signal) in the relation between willingness to work and the activity of VMPFC neurons. At the onset of the cue in New trials, only 2/113 neurons encoded choices. However, as shown in Figure 1, choices in these conditions displayed little variability and virtually no influence of Trial Number. As discussed above (see Behavior), monkeys had a strong bias toward performing the action in New trials. Whatever its source, this strong positive bias probably accounts for the lack of modulation by task factors, for both choices and neuronal activity. The proportion of neurons encoding choices was greater at the onset of the fixation point in Repeated trials (n = 28/109), where choices were both more variable and more sensitive to Reward, Effort, and Trial Number. At the onset of the fixation point in New trials, only 10 neurons encoded choices, in line with the idea that VMPFC neurons are more closely related to the willingness to work in Repeated than in New trials. Again, we repeated this analysis after splitting the data between the first and second halves of the session. The proportions of neurons encoding choices were indistinguishable between the beginning and the end of the session, at both cue and fixation point onset, in both New and Repeated trials (chi-square tests, all Χ2 < 1; all P > 0.05). Many features differ between New and Repeated trials, including the presence of the cue and the history of recent events, so drawing a direct comparison between neural activity in these conditions is difficult.
Nevertheless, it is reasonable to conclude that a significant proportion of VMPFC neurons encoded the willingness to engage in the task and to perform the action to obtain the reward, provided that monkeys displayed enough variability in their choices.
We assessed the relation between neuronal activity and autonomic arousal using pupil diameter, both at the time of the cue and at the time of action execution. In both cases, we measured the correlation between neuronal activity and the magnitude of event-related pupil dilation. We performed this analysis on a subset of 97 units for which the pupil signal was available. Only 11 and 8 neurons displayed a significant correlation with evoked pupil responses at the cue and at the action, respectively. These numbers are smaller than the number of neurons expected by chance (n = 14) given a population of this size. Thus, the relation between neuronal activity in area 14r and autonomic arousal is relatively limited.
Altogether, our results show that VMPFC neurons provide a temporally stable representation of information about reward and trial number. When monkeys readily used this information to adjust their willingness to perform the task (in Repeated trials), VMPFC neurons also reflected the resulting course of action. This suggests that VMPFC activity does not reflect simple sensory-motor processes. Rather, VMPFC neurons were mobilized when monkeys adjusted their behavior based on contextual factors. Next, we explored the slow dynamics of the relation between VMPFC firing and the animals’ willingness to work.
Slow Modulation of VMPFC Activity and Willingness to Work: Influence of Previous Trials
As shown for a representative example in Figure 7, the relation between VMPFC activity and the willingness to engage in the task evolved relatively slowly, developing over seconds and encompassing several trials. Because the previous analysis was based on subsets of trials, it could not capture these slow dynamics. The willingness to work thus appears to behave like a state variable. Such a state aggregates not only individual trial events but also past responses and internal changes, at a time scale slower than individual trials. To capture this slow component of the willingness to work and its relation with VMPFC activity more specifically, we considered the lasting influence of past trials, which can readily be quantified in our setting. We measured the influence of previous trials on 2 variables: the choice to perform the task or not, and the firing at the onset of the fixation point. Critically, we included every trial in the analysis and evaluated the influence of previous trials over and above current levels of Reward, Effort, and Trial Number.
We evaluated the encoding of willingness to work by VMPFC neurons and its dependence upon both task factors (Effort, Reward, and Trial Number) and the decision in the previous trial. We first used a GLM to predict the firing of each neuron at the fixation point from a constant term plus the choice (to perform the trial or not), on a trial-by-trial basis. In this model, 34 neurons showed a significant modulation by choice. In a second GLM, we added the 3 task parameters (Reward, Effort, and Trial Number) as coregressors; only 23 neurons then displayed a significant modulation by choice. A third GLM included not only choice and the 3 task factors but also the choice in the previous trial as a coregressor; only 10 neurons then displayed a significant modulation by choice. In short, 34 VMPFC neurons displayed a significant relation with choice (willingness to work) at the fixation point. Of these, 23 were related to choice over and above task factors, but only 10 over and above both task factors and previous choice. This is a significant decrease compared with the 23 neurons that coded choices before previous choice was added as a coregressor (Χ2 = 5.1, P = 0.02), and it is less than the number of neurons expected by chance given a population of this size. This indicates that the relation between VMPFC activity and willingness to work is significantly affected by previous choices.
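The comparison of 23/113 versus 10/113 choice-coding neurons can be checked with a 2 × 2 chi-square test. The text does not state which variant was used. This sketch assumes Yates' continuity correction, which gives a value consistent with the reported Χ2 = 5.1; the uncorrected statistic for the same table is about 6.0. The function name is hypothetical. As in the comparison reported above, the two counts are treated as independent samples, even though they come from the same neurons.

```python
import math

def chi2_2x2_yates(a, b, c, d):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]]
    with Yates' continuity correction (df = 1). Returns (chi2, p)."""
    n = a + b + c + d
    num = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    # For df = 1, the chi-square survival function equals erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

n_neurons = 113
sig_without_prev = 23  # choice-coding neurons, GLM with task factors only
sig_with_prev = 10     # same, after adding previous choice as a coregressor
chi2, p = chi2_2x2_yates(sig_without_prev, n_neurons - sig_without_prev,
                         sig_with_prev, n_neurons - sig_with_prev)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")  # chi2 = 5.11, P = 0.024
```

`scipy.stats.chi2_contingency` applies the same correction by default for 2 × 2 tables (`correction=True`).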
We evaluated the relative weight of past trials as a function of their distance from the current one (up to 35 trials), for both the behavioral response and VMPFC activity. For behavior, we used a simple logistic regression to account for the choice to perform the task in any trial n with a weighted linear combination of the following regressors: Reward, Effort, and Trial Number at trial n, as well as choices at trial n−x, with x ranging from 1 to 35. This allowed us to measure the relative weight of previous choices as a function of the distance x (in trials) between previous and current choice. The willingness to perform the task in a given trial was related to behavior up to 20 trials in the past, over and above the positive influence of current Reward and the negative influence of Effort level and Trial Number (Fig. 8A).
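The lagged logistic regression for behavior can be sketched as below. This is an illustrative stand-in, not the authors' implementation. The design-matrix layout, the helper names (`lagged_design`, `fit_logistic`), the plain batch gradient ascent, and the toy choice sequence are all assumptions. A real analysis would typically use a dedicated fitter (for example statsmodels' `Logit`, or Matlab's `glmfit` with a binomial distribution) and standardized regressors. Trials without a complete history of `max_lag` previous choices are dropped.

```python
import math

def lagged_design(reward, effort, trial, choice, max_lag):
    """Rows: [1, Reward_n, Effort_n, TrialNumber_n, choice_{n-1}, ..., choice_{n-max_lag}].
    The first max_lag trials are dropped because their history is incomplete."""
    X, y = [], []
    for n in range(max_lag, len(choice)):
        row = [1.0, reward[n], effort[n], trial[n]]
        row += [choice[n - x] for x in range(1, max_lag + 1)]
        X.append(row)
        y.append(choice[n])
    return X, y

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Maximum-likelihood logistic regression by batch gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))          # predicted P(perform)
            for j, xj in enumerate(row):
                grad[j] += (target - p) * xj        # gradient of log-likelihood
        w = [wi + lr * g / len(X) for wi, g in zip(w, grad)]
    return w

# Toy session: runs of "work" (1) and "refuse" (0); task factors cycle independently
N = 40
choice = ([1] * 10 + [0] * 10) * 2
reward = [1 + n % 4 for n in range(N)]
effort = [1 + n % 3 for n in range(N)]
trial = [n / N for n in range(N)]          # Trial Number scaled to [0, 1)

X, y = lagged_design(reward, effort, trial, choice, max_lag=2)
w = fit_logistic(X, y)
print("weights of choices at n-1, n-2:", [round(v, 2) for v in w[4:]])
```

In this toy sequence, choices come in runs, so the weights on past choices come out positive. This is the kind of autocorrelation in willingness to work that Figure 8A quantifies up to 35 trials back.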
To estimate the influence of firing in previous trials on VMPFC activity, we used a standard GLM to account for the spike count at the onset of the fixation point at trial n, based on task factors at trial n as well as spike counts at trials n−x. We also observed a significant impact of activity in past trials on the firing of VMPFC neurons at trial onset (Fig. 8B). We quantified this effect at the population level by counting the number of significant neurons, because VMPFC neurons showed no tendency to encode task parameters in a systematically positive or negative fashion. The average beta values for Reward, Effort, and Trial Number were not significantly different from zero (second-level analysis with t-tests, all P > 0.05). By contrast, as shown in Figure 8C (black line), the relation between firing in current versus past trials was clearly positive. This indicates that the firing of VMPFC neurons was relatively consistent over several successive trials, over and above changes in firing induced by Reward, Effort, or Trial Number.
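The autoregressive GLM for spike counts can be sketched in the same spirit. The text does not specify which GLM family was used. This sketch assumes a linear (Gaussian) GLM fitted by ordinary least squares, with a single task factor and a single lag for brevity, and uses hypothetical helper names (`lagged_spike_design`, `ols`). The toy "spike counts" are generated without noise from known weights so that the fit can be checked. Real counts are noisy integers, and a Poisson GLM is a common alternative.

```python
def lagged_spike_design(task_factors, spikes, max_lag):
    """Rows: [1, task factors at trial n, spikes[n-1], ..., spikes[n-max_lag]]."""
    X, y = [], []
    for n in range(max_lag, len(spikes)):
        row = [1.0] + list(task_factors[n])
        row += [spikes[n - x] for x in range(1, max_lag + 1)]
        X.append(row)
        y.append(spikes[n])
    return X, y

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

# Toy, noise-free "spike counts": baseline 2, Reward weight 1.0, lag-1 weight 0.5
reward = [1 + n % 3 for n in range(30)]
spikes = [4.0]
for n in range(1, 30):
    spikes.append(2.0 + 1.0 * reward[n] + 0.5 * spikes[n - 1])

X, y = lagged_spike_design([[rw] for rw in reward], spikes, max_lag=1)
beta = ols(X, y)
print([round(v, 3) for v in beta])  # recovers approximately [2.0, 1.0, 0.5]
```

A positive lag weight means that a neuron's firing in one trial carries over to the next, over and above the current task factors. Across neurons, this corresponds to the positive curve in Figure 8C.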
On average, the influence of previous trials on behavior and on VMPFC firing showed a very similar temporal profile (Fig. 8C): responses occurring up to 20 trials earlier strongly influenced behavioral and neuronal responses in a given trial. Given that similarity, we examined whether these slow changes in behavior and VMPFC activity were directly related, on a session-by-session basis. For both behavior and VMPFC activity, we measured the influence of past responses as a function of the distance between past and current response (ranging from 1 to 35 trials) in each recording session. The average curves across sessions are shown in Figure 8C, for both behavioral and neuronal responses. For each measure (behavioral and neuronal responses) and each session, we fitted the influence of previous responses with a simple exponential function, using a variational Bayes method implemented in the VBA toolbox for Matlab (Daunizeau et al. 2014). Specifically, we estimated the optimal parameters a and k of the function y = a × exp(−k × x). Here, y is the effect size of the influence of past responses over and above task parameters (measured as the beta coefficient in the GLM described above), and x is the distance between past and current response, ranging from 1 to 35 trials. We extracted this pair of parameters for both spiking (a_s and k_s) and behavior (a_b and k_b) in each session. We then measured the correlation between the parameters describing behavioral and neuronal responses across all 113 recordings. There was a significant positive correlation for both pairs of parameters (a_s and a_b: rho = 0.33, P = 3.4 × 10^−4; k_s and k_b: rho = 0.23, P = 0.016; Fig. 8D). As shown in Figure 8D, this effect was not due to a difference between the 2 monkeys. We repeated this analysis after regressing out the difference between the 2 animals, for both spike counts and choices, and the correlation between firing and behavior remained significant (a_s vs. a_b: rho = 0.23, P = 0.015; k_s vs. k_b: rho = 0.27, P = 0.003). Thus, across recording sessions, there was a significant relation between the influence of past trials on the monkeys’ willingness to perform the task and their influence on VMPFC firing.
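The session-wise exponential fit and the across-session correlation can be sketched as below. The authors used variational Bayes (VBA toolbox). This sketch swaps in a simpler technique, ordinary least squares on log(y). That method is only valid when all effect sizes are strictly positive, and it weights the data differently, so it is an assumption rather than a reproduction. The reported rho suggests a rank (Spearman) correlation, which is also assumed here. Function names and toy values are hypothetical. `scipy.optimize.curve_fit` (nonlinear least squares) and `scipy.stats.spearmanr` are common library alternatives.

```python
import math

def fit_exponential_loglinear(x, y):
    """Fit y = a * exp(-k * x) by least squares on log(y); requires y > 0.
    Returns (a, k)."""
    if any(v <= 0 for v in y):
        raise ValueError("log-linear fit needs strictly positive effect sizes")
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    slope = (sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return math.exp(intercept), -slope

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            r[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Noise-free toy curve of lagged effect sizes for one hypothetical session
lags = list(range(1, 36))
curve = [0.8 * math.exp(-0.15 * x) for x in lags]
a, k = fit_exponential_loglinear(lags, curve)   # approximately (0.8, 0.15)

# Toy amplitudes from 4 hypothetical sessions: spiking (a_s) vs behavior (a_b)
a_s = [0.2, 0.5, 0.9, 0.4]
a_b = [0.1, 0.6, 0.7, 0.3]
rho = spearman_rho(a_s, a_b)
print(f"a = {a:.3f}, k = {k:.3f}, rho = {rho:.2f}")
```

With 113 recordings, the same `spearman_rho` call would be applied once to the pairs (a_s, a_b) and once to (k_s, k_b).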
In summary, there was a strong autocorrelation in the willingness to work across successive trials, in line with the idea that willingness to work could be regarded as a state. The fluctuations of VMPFC activity across trials were strongly related to these slow fluctuations of willingness to work.
Slow Modulation of VMPFC Activity and Willingness to Work: Beyond Task Events
Finally, we reasoned that if the firing of VMPFC neurons encoded the willingness to engage in the task in a continuous fashion, it should reflect fluctuations in work rate (number of trials performed in a given duration). These fluctuations should appear at time scales longer than individual trial events, and even longer than the duration of a trial (about 3–4 s). To explore more systematically the relation between VMPFC activity and engagement in the task across time scales, we split each session into successive time windows ranging from 2 to 100 s and computed the correlation between firing rate and work rate (Fig. 9). This analysis builds on the finding that the encoding of task factors was stable across task events and trial types, which allows average activity to be examined over long recording periods. Note that statistical power decreases as the time window lengthens, because there are fewer data points to assess the correlation. Accordingly, the strength of the correlation between firing rate and Work Rate (gray line) decreased monotonically with window size. Yet a large number of neurons significantly encoded Work Rate even at longer time scales.
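The multi-scale analysis of firing rate and Work Rate can be sketched as follows, assuming that spike times and trial-completion times are available in seconds from session start. The function names, the choice to drop a trailing partial window, and the toy event times are hypothetical. Work Rate is taken here as completed trials per second in each window. The sketch omits the correction for Reward, Effort, and Window Number described in the next paragraph.

```python
import math

def windowed_counts(event_times, session_end, window):
    """Count events in consecutive non-overlapping windows of `window` seconds
    covering [0, session_end); a trailing partial window is dropped."""
    n_win = int(session_end // window)
    counts = [0] * n_win
    for t in event_times:
        idx = int(t // window)
        if 0 <= idx < n_win:
            counts[idx] += 1
    return counts

def pearson_r(x, y):
    """Pearson correlation; returns nan if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def rate_correlation_by_window(spike_times, trial_times, session_end, windows):
    """Map each window size (s) to the correlation, across windows, between
    firing rate (spikes/s) and Work Rate (completed trials/s)."""
    result = {}
    for w in windows:
        firing = [c / w for c in windowed_counts(spike_times, session_end, w)]
        work = [c / w for c in windowed_counts(trial_times, session_end, w)]
        result[w] = pearson_r(firing, work) if len(firing) >= 3 else float("nan")
    return result

# Toy 40-s session: fast pace for 20 s, then slower; 3 spikes follow each trial
trial_times = [0.5 + 2 * i for i in range(10)] + [20.5 + 5 * i for i in range(4)]
spike_times = [t + d for t in trial_times for d in (0.1, 0.2, 0.3)]
corr = rate_correlation_by_window(spike_times, trial_times, 40.0, [2, 5, 10])
print(corr)
```

In this noise-free toy, firing is exactly proportional to Work Rate, so every window size gives r = 1. With real data, the smaller number of windows at long time scales reduces statistical power, as noted above.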
To verify that this correlation was not simply due to the influence of Reward, Effort, and Trial Number, which all vary at the scale of trial duration (3–4 s), we conducted a second analysis where we corrected Work Rate for the effects of these factors by including them as coregressors in the model to predict firing rate. Practically, we used a GLM to evaluate the impact of 4 variables on VMPFC activity: work rate (corrected), Reward, Effort, and Window Number (see Methods). The sensitivity of VMPFC neurons to the work rate increased for windows of 2–10 s and decreased for wider windows, but the number of neurons showing a significant effect remained above chance level (n = 16/113). Note that the proportion of neurons showing a positive and negative relation between firing rate and work rate was equivalent, so that overall, there was no global change in the firing rate of the population in relation to the slow changes in work rate. In short, for a large fraction of VMPFC neurons, the relation between spiking activity and willingness to work was strong and reliable enough over time to appear when analyzed at a time scale much longer than task events.
In summary, VMPFC neurons coherently encoded information about upcoming reward and progression through the session across distinct task epochs. The encoding of the upcoming effort was weaker and less coherent, but it also affected VMPFC firing at a slower time scale. The activity of VMPFC neurons was strongly related to the willingness to engage in the task, and it captured the influence of multiple factors on behavior, beyond individual task events. Indeed, VMPFC neurons seem to monitor information over time to encode the value of the current course of action in a continuous fashion, over and above the specific task parameters. Thus, VMPFC neurons can continuously integrate internal and external information about potential costs and benefits, to determine the global willingness to engage in a goal-directed behavior.
Encoding of Reward and Effort by VMPFC Neurons
Although the proportion of neurons encoding Reward Size was relatively limited compared with Trial Number, it was not negligible, since it survived correction procedures. Moreover, the encoding of the expected reward size was consistent across task epochs, whether reward was experienced directly (at trial outcome), announced by visual stimuli (at cue onset), or only predicted from memorized information (at fixation point). This reinforces and extends recent neurophysiological studies in monkeys showing that VMPFC neurons reliably encode reward information (Monosov and Hikosaka 2012; Strait et al. 2014).
Our task includes an effort component that is critical for understanding how the VMPFC contributes to balancing energetic costs and benefits for decision-making. However, as in recent human neuroimaging data (Croxson et al. 2009; Prevost et al. 2010; Skvortsova et al. 2014), VMPFC activity was less sensitive to information about effort, especially when it was provided by visual stimuli. This is in line with the idea that effort processing more strongly engages the anterior cingulate cortex, as initially found in rats and more recently in monkeys (Walton et al. 2002; Rudebeck et al. 2006; Hosokawa et al. 2013). Moreover, in contrast with what we observed in the VMPFC, individual neurons in the anterior cingulate cortex reliably integrate costs and benefits in a coherent fashion (Shidara and Richmond 2002; Kennerley et al. 2009; Hunt et al. 2015). Thus, the anterior cingulate cortex might be more strongly involved in mobilizing resources for task engagement as a function of both costs and benefits (Thaler et al. 1995; Bonnelle et al. 2016; Scholl et al. 2015; Klein-Flugge et al. 2016). This is reminiscent of noradrenergic locus coeruleus neurons, which are activated both by the amount of expected reward and by the amount of effort to produce (Bouret and Richmond 2015; Varazzani et al. 2015). By contrast, the VMPFC might play a role in task engagement only as a function of expected benefits. This parallels the activity of dopaminergic neurons, for which we and others also observed a higher sensitivity to reward than to effort (Gan et al. 2010; Pasquereau and Turner 2013; Varazzani et al. 2015). Altogether, these data suggest that task engagement could rely upon 2 distinct processes. One is associated with the anterior cingulate cortex and noradrenaline, when engagement relies upon both costs and benefits. The other is associated with the VMPFC and dopamine, when engagement relies mostly upon expected benefits.
The activity of the VMPFC suggests a role in a relatively abstract representation of task engagement based on contextual information, rather than in simple sensory-motor processes. Indeed, in our experiment, a significant number of VMPFC neurons did encode effort when it was provided by contextual information. This was true both on a trial-by-trial basis (Figs 5B and 8B) and when we analyzed activity at a slower time scale (see Fig. 9). Hence, the difference between reward and effort encoding was most visible following cue onset in New trials, which corresponds to the situation examined in human functional magnetic resonance imaging experiments (Croxson et al. 2009; Prevost et al. 2010; Skvortsova et al. 2014). This phasic activation to expected reward might represent a reward prediction error. Because the expectation before New trials is always the same, reward level and reward prediction error are confounded at this epoch. In contrast, tonic encoding of reward and effort levels engaged an equivalent proportion of VMPFC neurons. This was the case both when the information was already known (in Repeated trials) and when taking larger time windows, which provide more robust estimates. It remains possible that the continuous coding of effort level represents a subjective effort estimate. Such an estimate would be more relevant to decision-making than the objective amount of force to exert, which matters more for motor control. This subjective estimate might relate to reward probability, if the effect of effort level on the choice not to perform the task is taken into account. Alternatively, subjective effort might correspond to the discomfort induced by action execution. This discomfort would be integrated as a (negative) sensory feedback in brain regions representing the outcome space rather than the action space.
The idea that information about effort has a different meaning across task conditions could also account for the lack of coherence in the encoding of this parameter across task epochs, compared with reward size and trial number (Fig. 6). Beyond these limitations, VMPFC neurons were more sensitive to effort when its influence could be integrated over time. This is in line with the general idea that VMPFC neurons integrate information relevant for guiding behavior over a relatively slow time scale.
There was no correlation among the sensitivities of individual VMPFC neurons to the 3 task factors (Reward, Effort, and Trial Number). This contrasts with recent studies reporting a significant correlation between the estimated parameters capturing the influence of distinct task factors (Strait et al. 2014; Abitbol et al. 2015). It is, however, consistent with studies indicating that the ability to “multiplex” information is significantly stronger in more dorsal and posterior regions of the medial prefrontal cortex (Kennerley and Wallis 2009; Hosokawa et al. 2013). Thus, multiplexing is probably not the most critical feature of VMPFC neurons. Note that the absence of correlation does not imply that the different task factors were represented in different populations of neurons. It simply means that the code might be more complex than initially thought, as the weights of the different factors were independently distributed across neurons.
Sensory Versus Contextual Information
Another key feature of VMPFC neurons is their strong sensitivity to contextual information, compared with information provided by sensory stimuli.
First, a very large fraction of VMPFC neurons encoded progression through the session (Trial Number), which is considered a proxy for fatigue and/or satiety. This encoding was very reliable across task events and trial types, in line with our previous work (Bouret and Richmond 2010). The much stronger sensitivity of VMPFC neurons to Trial Number compared with Reward and Effort level contrasted with the relatively balanced influence of these factors on behavior (Figs 1C and 5). This supports the idea that VMPFC neurons are particularly sensitive to contextual information (here, in the physiological domain). Note that the activity of VMPFC neurons could not simply be described in terms of arousal, because very few neurons displayed a systematic relation with pupil diameter. This limited relation with autonomic arousal in area 14r contrasts with what has been described for posterior regions of the VMPFC in rats and monkeys (Owens et al. 1999; Resstel and Corrêa 2005; Rudebeck et al. 2014). It is consistent with the stronger connections with regions controlling autonomic responses in subgenual cortices, compared with anterior VMPFC regions such as area 14r (Fisk and Wyss 1997; Öngür et al. 1998; An et al. 1998). Thus, the firing of neurons in area 14r of the VMPFC is more closely related to willingness to work, including its internal components such as satiety, than to autonomic arousal.
Second, although it was relatively small, the percentage of VMPFC neurons encoding Reward and Effort levels was as high or higher when the information relied on memory (Repeated trials) than when it relied upon visual stimuli (New trials). It is difficult to identify the nature of the memorized information in this task. It presumably includes recent information about task conditions, recent actions, physiological state, and knowledge about the task. Irrespective of its nature, however, this information is not provided by direct sensory cues, and it has a strong influence on both behavior and VMPFC firing. The sensitivity of VMPFC neurons to that contextual information is compatible with lesion studies in rats and monkeys. These studies emphasize the specific role of medial orbitofrontal cortices in computing action value using both observable and unobservable information (Noonan et al. 2010; Bradfield et al. 2015).
This feature, together with the coherent coding across task events, seems much more prominent in the VMPFC than in more lateral regions of the ventral prefrontal cortex (Bouret and Richmond 2010; Abitbol et al. 2015). In the OFC, the encoding of reward-related information appears more specific and more tightly locked in time to critical events (Wilson et al. 2014; Blanchard et al. 2015; Howard et al. 2015). This is also in line with the human imaging literature, which emphasizes the critical role of the VMPFC, but not the OFC, in the representation of subjective value (Chib et al. 2009; Lebreton et al. 2009; Bartra et al. 2013; Boorman et al. 2013; Clithero and Rangel 2014). The subjective value represented in the VMPFC integrates pieces of information encoded in distinct brain regions, notably when computing the value requires mental constructs rather than direct sensory processing (Hare et al. 2010; Barron et al. 2013; Benoit et al. 2014). Finally, this is consistent with our past work describing the complementary roles of the OFC and VMPFC in stimulus-bound versus context-dependent reward processing, respectively (Bouret and Richmond 2010).
Slow Relation Between VMPFC Activity and Willingness to Work
Another important feature of VMPFC activity in this task is the relatively slow time scale over which it integrated information about the task or about behavior. This is in line with our recent work showing a strong relation between pre-stimulus activity and the subjective value of visual cues, using both single units in monkeys and fMRI in humans (Abitbol et al. 2015). Here, as discussed above, we showed that the encoding of Reward Size and Trial Number was very coherent over time. We also observed a strong autocorrelation in the firing of VMPFC neurons across successive trials, over and above the influence of Reward, Effort, and Trial Number. In addition, the autocorrelation in the activity of VMPFC neurons mirrored the strong autocorrelation in behavior (Fig. 8). At the behavioral level, the animals tended to maintain the same behavior over up to 20 successive trials, over and above task parameters. This suggests that they go through a series of states, characterized by distinct levels of engagement in the task. The slow dynamics of VMPFC neurons were directly related to these slow fluctuations in the animals’ willingness to work (Figs 7–9). Thus, VMPFC neurons might be directly involved in encoding these motivational states, defined by a global willingness to perform reward-directed actions over and above task-relevant information. This slow relation between VMPFC activity and behavior is reminiscent of behavioral studies in monkeys showing that ablation of the VMPFC disrupted the maintenance of a successful strategy over time (Noonan et al. 2012). Similarly, VMPFC activity in humans is predictive of the maintenance of a similar or default-like response strategy (Kolling et al. 2012, 2014). This feature might be important for understanding the implication of the VMPFC in mood and mood disorders such as depression (Ressler and Mayberg 2007; Rutledge et al. 2014).
This ability to encode the willingness to work over relatively long time scales, rather than discrete sensory-motor operations, is consistent with the notion that value coding in the VMPFC is automatic. Indeed, value is encoded in the VMPFC even when subjects are not engaged in a valuation task such as choice or rating (Harvey et al. 2010; Levy et al. 2011; Lebreton et al. 2009, 2015). Here, the effort task was imperative and did not include any explicit choice phase with different responses corresponding to different options. Yet VMPFC neurons continuously encoded the willingness to work. The fact that the VMPFC automatically and continuously encodes the willingness to perform the task might relate to its potential involvement in the default mode network, even if this point remains debated (Raichle and Snyder 2007; Crittenden et al. 2015). Indeed, this network is typically identified by contrasting blocks of effortful cognitive tasks with resting periods. Our single-unit data are not directly comparable with the global change in metabolic activity associated with task engagement and disengagement. Even so, they confirm that VMPFC neurons display a strong relation with engagement in the task when examined at a slow time scale.
Conclusion and Perspectives
Altogether, this work is directly related to the emerging idea that the VMPFC computes outcome value based on contextual/memory information through a direct interaction with hippocampus and associated cortices (Noonan et al. 2010; Peters and Büchel 2010; Aminoff et al. 2013; Barron et al. 2013; Clark et al. 2013; Lebreton et al. 2013; Benoit et al. 2014; Lin et al. 2015; Brown et al. 2016). This ability to adjust behavior as a function of memorized information is probably critical for primates, which generally forage for fruits in complex and variable environments (Cunningham and Janson 2007; Janmaat et al. 2011, 2013; Noser and Byrne 2015). Thus, the development of this system involving the VMPFC in primates might be directly related to a strong ecological pressure to integrate information about costs and benefits over time and compute the willingness to engage in reward-directed actions based on both immediate and memorized information. One of the challenges ahead is to understand how distinct physiological and ecological constraints have shaped the relative development of these functions and the associated structures across primate species.
European Research Council (ERC-BioMotiv). C.V. received a Ph.D. fellowship from the “École doctorale Frontières du Vivant” (FdV).
We would like to thank Jean Daunizeau for helpful comments on data analysis. We would like to thank Morgane Monfort, Serban Morosan, and the personnel from the ICM primate facility for assistance with surgery and veterinary procedures. Conflict of Interest: None declared.