Contributions of the Medial Prefrontal Cortex to Social Influence in Economic Decision-Making

Economic decisions are guided by highly subjective reward valuations (SVs). Often these SVs are over-ridden when individuals conform to social norms. Yet, the neural mechanisms that underpin the distinct processing of such normative reward valuations (NVs) are poorly understood. The dorsomedial and ventromedial portions of the prefrontal cortex (dmPFC/vmPFC) are putatively key regions for processing social and economic information, respectively. However, the contribution of these regions to economic decisions guided by social norms is unclear. Using functional magnetic resonance imaging and computational modeling, we examine the neural mechanisms underlying the processing of SVs and NVs. Subjects (n = 15) indicated either their own economic preferences or made similar choices based on a social norm learnt during a training session. We found that the vmPFC and dmPFC make dissociable contributions to the processing of SV and NV. Regions of the dmPFC processed the value of rewards only when subjects made normative choices. In contrast, we identify a novel mechanism in the vmPFC for the coding of value. This region signaled both subjective and normative valuations, but activity was scaled positively for SV and negatively for NV. These results highlight some of the key mechanisms that underpin conformity and social influence in economic decision-making.


Introduction
Behavior is frequently driven by evaluations of the value of different courses of action (Kahneman and Tversky 1984;Rushworth and Behrens 2008). In humans, these evaluations form part of our everyday lives when we make economic decisions and place value on behaviors that result in rewarding outcomes. Such decisions can be highly subjective (Green et al. 1994;Mazur 2001). For example, whilst some people are impulsive and favor small immediate rewards to larger delayed benefits, others are patient and prefer to wait much longer in order to obtain only slightly larger rewards.
The neural basis of such subjective impulsivity in economic decision-making is becoming increasingly understood (McClure et al. 2004;Glimcher 2007, 2010;Kim et al. 2008;Peters and Buchel 2009;Figner et al. 2010;Louie and Glimcher 2010). However, economic decisions are often made in the context of groups of individuals interacting together. Within these groups behavior is often dictated by social norms which determine what behaviors are permissible or preferred (Asch 1956;Kahneman and Miller 1986;Boyd et al. 2003;Fehr and Fischbacher 2004). Such normative preferences often over-ride subjective evaluations, with people flexibly switching between making economic decisions based on subjective or normative valuations. However, there is currently a very limited understanding of how reward valuations based on normative preferences are recalled and influence activity in the brain (Izuma 2013;Ruff and Fehr 2014).
How does the brain process rewards that are valued subjectively, or normatively through recalled social norms?
The medial prefrontal cortex (mPFC) is understood to contribute to the processing of social norm information, but also to impulsive economic decision-making (Izuma 2013). The prevailing view of functional organization in the mPFC suggests that these functions are localized to distinct zones of the mPFC (Rudebeck et al. 2008). It is often claimed that dorsal subregions (dmPFC) are specialized for processing social information, whereas ventral portions (vmPFC) are specialized for processing information about the value of rewards (Amodio and Frith 2006;Rushworth and Behrens 2008).
More recent evidence has cast doubt on whether there is such a strong dissociation. Studies examining the neural mechanisms that underlie the learning of another's subjective valuation of a reward (Garvert et al. 2015), or choosing on behalf of another based on their preferences (Nicolle et al. 2012), have shown the involvement of both the vmPFC and dmPFC. These studies found that both the dmPFC and vmPFC can be engaged by the value of a reward according to either ourselves or another person. However, their results suggest that the dmPFC is engaged when this valuation is not guiding the execution of one's current actions (it is engaged in "offline" valuations) and the vmPFC is engaged by the value guiding one's current choices ("online" valuations). This would, therefore, point to neither the vmPFC nor dmPFC being engaged by social or nonsocial information specifically. However, these decisions were made either to reward the subjects themselves, or one other person. The mechanisms that underlie how the brain processes the value of a reward when required to conform to a norm are unclear. Moreover, it is unclear how different regions of the mPFC contribute to the processing of both subjective and normative valuations.
In this study, we dissect out the contribution of different regions of the mPFC to making decisions based on subjective (SV) and normative valuations (NV). We designed a paradigm based on previous delay-discounting economic decision-making tasks that have been used to examine subjective, impulsive behaviors (Mazur 2001;Kable and Glimcher 2007). Subjects made intertemporal economic decisions choosing between a small immediate reward, and a larger, delayed reward. On half of the trials these choices were made based on their subjective preferences, and on half of the trials they were made based on a normative valuation that was learnt during a training session. Using this design in conjunction with functional magnetic resonance imaging (fMRI) and computational modeling, we were able to examine the contribution of different regions of the mPFC to the processing of subjective or normative reward valuations.

Subjects
Sixteen healthy right-handed participants were screened for neurological, psychiatric, and psychological disorders (ages 18-32; 13 female). This study was approved by the Royal Holloway, University of London Psychology Department Ethics Committee and conformed to the regulations set out in the CUBIC MRI Rules of Operation. We excluded participants who made translations or rotations of >3 mm (or >3°) volume to volume, or whose total movement was greater than 3 mm. One (male) subject was excluded for failing to respond on 30% of the trials and for excessive head motion. Subjects were paid for their participation (see "Payment" in Supplementary Methods). Subjects were informed that a previous behavioral experiment had taken place with 100 participants. They were told that these participants received payment in the same manner that they had. In fact, no previous study was conducted.

Task
The aim of this experiment was to examine the processing of rewards, the value of which was either discounted by temporal delays subjectively, or in a manner that conformed to a social norm. Subjects were engaged in the experiment over 2 consecutive days. On the first day, subjects performed 2 tasks. First, they indicated their own preferences on a temporal discounting task; second, they learnt by trial and error what they believed were normative preferences on a similar delay-discounting task. Subjects were told that this preference reflected what "at least 69% of people chose to do in the prior pilot experiment". On the second day, subjects performed similar trials of a temporal discounting task, but on half of the trials they were required to indicate their own preferences (Subjective trials) and on the other half they were required to conform to the "normative" preferences that they had learnt during the training session and not their own preferences. It should be noted, however, that subjects were free to choose on both types of trial.
For the temporal discounting task, subjects were required to choose between an immediate reward (£3) and a delayed reward that was shown on the screen (see Supplementary Fig. 1 and Fig. 1). The magnitudes of the delayed options were £3.10, £3.75, £5, £8, £12, and £20, and these were available at delays of 1, 15, 30, 60, 100, and 180 days. Thus, there were 36 different combinations of delay and magnitude that were used as the delayed options. The same options were used in both the training tasks and the task inside the scanner in both conditions, although trials were presented in different pseudo-randomized orders.
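As a minimal sketch of the option set described above, the following enumerates every pairing of delayed magnitude and delay. The magnitudes, delays, and the £3 immediate reward come from the text; the function and constant names are illustrative assumptions.

```python
from itertools import product

# Values taken from the task description.
IMMEDIATE_REWARD = 3.00                               # pounds, available today
MAGNITUDES = [3.10, 3.75, 5.00, 8.00, 12.00, 20.00]   # pounds
DELAYS = [1, 15, 30, 60, 100, 180]                    # days

def delayed_options():
    """Return every (magnitude, delay) pairing used as a delayed option."""
    return list(product(MAGNITUDES, DELAYS))

options = delayed_options()
print(len(options))  # number of distinct delayed options
```

If every option were shown equally often, the 108 trials per condition would correspond to 3 presentations of each option; this is an inference from the trial counts rather than something the text states.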

Payment
Subjects were told that their payment would be based on their choice on 1 trial each from the normative and subjective trials during the training or scanning session. This payment would be determined by them selecting numbers at random "out of a hat". The number would correspond to one of the trials during the training and scanning session and their payment would be based on whichever choice they indicated on that trial. This approach has previously been used to ensure that subjects are incentivized to accurately indicate their preferences on all trials. Subjects were informed that they would be paid by cheque for their participation. If on the selected trial they chose the delayed option, they would be paid that higher reward value, but the cheque would be dated such that it could not be cashed until the delay had passed. If they chose the immediate option then the cheque would be dated the same day. All of this information was provided to the subjects before the experiment; thus, subjects were aware that their decisions during the experiment were real economic decisions. Importantly, subjects were not rewarded during the training or scanning session for making responses that were congruent or incongruent with the norm. This was important, as we did not want subjects to associate making decisions on the normative trials with any additional rewards other than that being offered in association with a delay. Such additional associations could have distorted behavior and the processing of delayed rewards in the main experiment, as they would only be present in the normative condition and not in the subjective condition. Subjects were told that one trial from the normative condition would be selected at random and would affect their payment, in exactly the same manner as on the subjective trials. The subjects were not, therefore, incentivized to conform but were simply instructed to do so.
This ensured that subjects believed that the economic choices made during the main tasks were the only choices that influenced their payment for the experiment.
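The payment rule can be sketched as follows, assuming a single selected trial is represented by the choice made and the delayed offer on that trial. The function name and arguments are hypothetical; only the rule itself (delayed choices are paid at the higher value with a post-dated cheque, immediate choices at £3 dated the same day) comes from the text.

```python
from datetime import date, timedelta

IMMEDIATE_REWARD = 3.00  # pounds, from the task description

def cheque_for_trial(chose_delayed, magnitude, delay_days, today):
    """Return (amount, earliest cashing date) for one randomly selected trial."""
    if chose_delayed:
        # Higher amount, but the cheque is post-dated by the offered delay.
        return magnitude, today + timedelta(days=delay_days)
    # Immediate option: fixed amount, cheque dated the same day.
    return IMMEDIATE_REWARD, today
```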

Apparatus
Subjects lay supine in an MRI scanner (3 T Siemens Trio, CUBIC, Royal Holloway, University of London) with the fingers of the right hand positioned on an MRI-compatible response box. Stimuli were projected onto a screen behind the subject and viewed in a mirror positioned above the subject's face. Presentation software (Neurobehavioral Systems, Inc.) was used for experimental control (stimulus presentation and response collection). A custom-built parallel port interface connected to the Presentation PC received transistor-transistor logic (TTL) pulse inputs from the response keypad. It also received TTL pulses from the MRI scanner at the onset of each volume acquisition, allowing events in the experiment to be precisely synchronized with the onset of each scan. The timings of all events in the experiment were sampled accurately, continuously and simultaneously (independently of Presentation) at a frequency of 1 kHz using an A/D 1401 unit (Cambridge Electronic Design). Spike2 software was used to create a temporal record of these events. Event timings were prepared for subsequent general linear model (GLM) analysis of fMRI data (see "Event Definition and Modeling" section in the main text).

Trial Structure
During the scanning session, each trial began with a trial-type cue (either the word "YOU" or "GROUP") that indicated whether subjects were required to indicate subjective preferences ("YOU") or normative preferences ("GROUP") on the trial. Following the trial-type cue, after a variable delay, an offer cue was presented that indicated the magnitude and delay of the delayed option. After a further variable delay, a trigger cue was presented where subjects were required to indicate their choice (for full timings see Fig. 1). A "now" stimulus was used to signify the £3 immediate option and a "wait" stimulus was used for the delayed option. Subjects were required to indicate their choice at the time of the trigger cue, by pressing 1 of 2 buttons on a keypad. The trigger cue was presented for 1000 ms; any responses before or after this time period resulted in the trial being classified as "missed" (all included subjects missed fewer than 3% of trials). Subjects were instructed to press the button that corresponded to "now" if the preference was the £3 immediate option or "wait" for the larger delayed option. In order to prevent subjects from preparing a specific motor response at the time of the offer cue, the positions of the "now" and "wait" stimuli were pseudorandomly organized, such that subjects could not predict which button would be "now" and which would be "wait" at the time of the offer cue. All stimuli were also color-coded to ensure that it was clear whether a subjective or normative preference was required on each trial. Yellow cues indicated that a normative choice should be made and white cues indicated that a subjective choice should be made. By introducing variable jitters between offer cues and responses we decreased the possibility of finding reaction time related behavioral or neuroimaging findings, due to extended and variable fore-periods before subjects indicated choices. As expected, therefore, we found no behavioral effects on reaction times.

Procedure
Training
Subjects were trained in 2 phases 1 day prior to scanning. In the first phase, the subject was presented with a series of visual stimuli on a monitor and performed a series of delay-discounting trials where they were required to indicate their own preferences between the delayed and immediate options. Each trial consisted of an offer cue (an amount of money and a delay period) and a trigger cue (2 lines corresponding to 2 buttons on the keypad, with the words "wait" above one line and "now" above the other line). During this phase of training subjects performed 108 trials. This stage of the training enabled subjects to familiarize themselves with performing delay-discounting trials. It also allowed us to examine the stability of subjective preferences before and after the learning of normative preferences.
Figure 1. Trial Structure. Each trial began with a cue that indicated whether a choice should be made based on a subjective preference ("YOU") or normative ("GROUP") valuation. The font used throughout the trial was color-coded to also indicate the trial-type to the participant. Following a temporal jitter, an offer cue indicated the magnitude (£3.10-£20) and delay (1-180 days) that was on offer on the trial. On each trial subjects were required to evaluate whether they would prefer this offer or a fixed baseline of £3 following no delay. Following another temporal jitter subjects were required to indicate their response by making a button press on a keypad. To avoid activity at the time of the offer cue being confounded by preparatory motor activity, which button corresponded to taking the delayed offer ("wait") or the immediate offer ("now") varied randomly on each trial.
Social Influence in Medial Prefrontal Apps and Ramnani | 4637
In the second phase of training, subjects performed a task where they learned the normative preferences for each delayed option. The subjects were informed that a previous behavioral experiment had taken place with 100 participants and that there was always at least 69% agreement on whether people should wait for the delayed reward or take the immediate one. Their task was to learn what this majority/group of people would choose to do through trial and error. Each trial consisted of a delayed offer cue and a trigger cue but also an additional feedback cue. The feedback cue indicated the social norm preference for the delayed option on each trial. The cue was either the word "NOW" or "WAIT", which indicated whether the normative preference was for the immediate or delayed option, respectively. Subjects were instructed to indicate the normative preferences by learning from the feedback. The normative preferences learnt during this session were based around the behavior of subjects during a pilot experiment (see "Computational Modeling" section below for more details). During this session, subjects performed 108 trials, with the same options presented as during the first phase of the training, in a pseudo-random order. During training every subject correctly indicated the norm response at greater than 95% accuracy after the first 10 trials during this session. Moreover, in the subsequent scanning session, subjects performed a further 20 trials of this task with feedback, ensuring that subjects learnt and recalled the norm during the main experiment (see below).

Scanning Session
On the day following the training session, subjects performed a similar delay-discounting task inside the MRI scanner. There were 216 pseudorandomly organized trials, 108 where they indicated their own preference (subjective trials) and 108 where they indicated the preferences according to the majority behavior they had learnt during the training (normative trials). The large number of trials and the parametric analysis were employed because both can increase power to detect within-subject effects (although no formal power analysis was conducted, due to difficulties in interpretation; Mumford 2012). In this session, there was no feedback cue on the normative trials. Subjects were, therefore, required to recall the normative preferences they had learned during training. In this session, all stimuli were color-coded, with a different colored font used for each condition, ensuring that subjects were able to distinguish between the subjective and normative conditions. This design enabled us to examine activity in the brain when subjects made identical decisions, but when these decisions were based on subjective or normative valuations.

Computational Modeling
Previous research has shown that behavior in delay-discounting tasks can be modeled using a number of different functions (Green et al. 1994; Mazur 2001) that contain discount factors (free parameters that explain how rewards are idiosyncratically discounted by time). Two models were compared separately in terms of their fit to the subjective preferences of the subjects and also the subjects' behavior on the normative trials. The first was a hyperbolic model (Mazur 2001) in which the subjective value of a reward (V) is a function of its magnitude (M) and the delay (d) on a given trial:
$$V = \frac{M}{1 + kd} \qquad (1)$$
In this model, k is the discount factor, an idiosyncratic free parameter that discounts the magnitude (M) of the reward, such that the subjective value (V) is less than its objective magnitude. The value of k, therefore, reflects the extent to which a subject discounts a delayed reward, such that a high k decreases the value of the reward quickly as the delay becomes greater.
We compared the hyperbolic model with an alternative model, to examine whether the hyperbolic model best reflected choice behavior in both the normative and subjective trials. In this second model, the subjective value of the rewards (V) was an exponential function of the delay, where:
$$V = M e^{-kd} \qquad (2)$$
In equation (2), the discounting effect of the delay is expressed as an exponential transform (e) of the discount factor (k) multiplied by the delay period (d). As such, the magnitude of a reward (M) is idiosyncratically, but exponentially, discounted by the length of delay before its receipt. The hyperbolic and exponential models were fitted separately to the preferences on the subjective trials and the choices on the normative trials. Thus, separate discount factors (k) could be estimated for the subjective and normative trials, and each model could also be compared in terms of its fit to the data.
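The two discount functions can be sketched in a few lines, assuming the standard forms of the hyperbolic model, V = M / (1 + kd), and the exponential model, V = M·exp(−kd). The function names are illustrative and not taken from the study's analysis code.

```python
import math

def hyperbolic_value(magnitude, delay, k):
    """Hyperbolic discounting: value falls as 1 / (1 + k * delay)."""
    return magnitude / (1.0 + k * delay)

def exponential_value(magnitude, delay, k):
    """Exponential discounting: value falls as exp(-k * delay)."""
    return magnitude * math.exp(-k * delay)

# Example: the largest, most delayed offer in the task (20 pounds in 180 days),
# evaluated with the same discount factor under both models.
k = 0.023
print(hyperbolic_value(20.0, 180, k), exponential_value(20.0, 180, k))
```

For the same k and any positive delay, the exponential model yields a lower value than the hyperbolic model, because exp(kd) > 1 + kd. This is why the two functions make distinguishable predictions at long delays.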
As in previous studies, these 2 models were fitted to the behavior of the subjects on both the subjective and normative trials using the softmax algorithm and maximum likelihood estimation (Sutton and Barto 1998; Apps et al. 2015). The softmax approach was employed separately for estimation of the normative and subjective discount factors. This method assigns a probability to the choices made by the subjects:
$$P_a = \frac{e^{\beta V_{o1}}}{e^{\beta V_{o1}} + e^{\beta V_{o2}}} \qquad (3)$$
This equation converts the subjective value of the choice made by the subject ($V_{o1}$) into a probability ($P_a$), as a function of the value of both options. On trials where the delayed option is chosen, $V_{o2}$ is the value of the immediate option and equal to 3 (£3 was always the immediate option). On trials where the immediate option is chosen, $V_{o2}$ is the value of the delayed option. The coefficient β represents the stochasticity (or temperature) of the behavior (i.e., the sensitivity to the value of each option). This algorithm therefore compares the value of the chosen option to that of the other option; the output is the probability of that option being chosen, given the values of the free parameters (k and β). The values were taken from the 2 models (see equations (1) and (2)) outlined above and fitted separately to both the subjects' own preferences and also the behavior on the normative trials. This allowed for comparisons to be made between the fit of the exponential and hyperbolic models for both the subjective and normative behaviors.
Importantly, the normative preferences that subjects learnt during the training session were based on a hyperbolic model which was set with a fixed discount factor (k = 0.023). Thus, it would be expected that a hyperbolic, rather than an exponential, model would better explain subjects' behavior on the normative trials, as we found.
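The fitting procedure can be illustrated with a minimal sketch. It assumes the common two-option softmax in which β multiplies the option values, and it swaps in a plain grid search for the optimiser, since the optimisation routine is not specified here. All function names and the trial representation are hypothetical.

```python
import math

IMMEDIATE = 3.0  # pounds; the fixed immediate option described in the text

def hyperbolic_value(magnitude, delay, k):
    return magnitude / (1.0 + k * delay)

def prob_delayed(magnitude, delay, k, beta):
    """Softmax probability of choosing the delayed over the immediate option."""
    x = beta * (hyperbolic_value(magnitude, delay, k) - IMMEDIATE)
    # Two-option softmax rewritten as a logistic function to avoid overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def negative_log_likelihood(trials, k, beta):
    """trials: list of (magnitude, delay_days, chose_delayed) tuples."""
    total = 0.0
    for magnitude, delay, chose_delayed in trials:
        p = prob_delayed(magnitude, delay, k, beta)
        p_choice = p if chose_delayed else 1.0 - p
        total -= math.log(max(p_choice, 1e-12))  # floor avoids log(0)
    return total

def fit_by_grid_search(trials, k_grid, beta_grid):
    """Return (nll, k, beta) at the grid point with the lowest NLL."""
    return min(
        (negative_log_likelihood(trials, k, beta), k, beta)
        for k in k_grid
        for beta in beta_grid
    )
```

Running the same fit on the subjective and the normative trials separately would give one (k, β) pair per condition, and replacing `hyperbolic_value` with an exponential function would give the log-likelihoods needed for the model comparison. A gradient-based optimiser would normally replace the grid search in practice.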
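To make the construction of the norm concrete, the sketch below derives a "NOW"/"WAIT" feedback label for each option from a hyperbolic model with k = 0.023. The rule used here (label "WAIT" whenever the discounted value exceeds the £3 immediate reward) is an assumption for illustration; the text states only that the normative preferences were based on a hyperbolic model with this fixed discount factor.

```python
NORM_K = 0.023       # fixed discount factor reported in the text
IMMEDIATE = 3.0      # pounds
MAGNITUDES = [3.10, 3.75, 5.00, 8.00, 12.00, 20.00]
DELAYS = [1, 15, 30, 60, 100, 180]

def norm_feedback(magnitude, delay, k=NORM_K):
    """Assumed rule: the norm is to wait if the discounted value beats 3 pounds."""
    value = magnitude / (1.0 + k * delay)
    return "WAIT" if value > IMMEDIATE else "NOW"

# Feedback label for every delayed option in the task.
norm_table = {(m, d): norm_feedback(m, d) for m in MAGNITUDES for d in DELAYS}
```

Under this assumed rule, each magnitude has an indifference delay of (M/3 − 1)/k days, beyond which the norm switches from "WAIT" to "NOW".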
This approach allowed us to examine the blood oxygen level dependent (BOLD) response that covaried with value on each trial. Thus, we could examine activity that covaried with the subjective value (SV) of a reward on subjective trials and with the value of a reward according to the social norm on the normative trials (NV).
It is important to note that we did not compare these models, in which there were separate parameters for subjective and normative choices, with those in which the same parameters could account for choices on both types of trial. Whilst such models may have provided a parsimonious account of the choice data, such an approach may not be theoretically valid or allow us to examine shifts in preferences. The changes in subjective preferences from the training session to the scanning session reveal that (1) the source of the valuation guiding the decisions is distinct and (2) subjects value rewards very differently prior to learning the norm compared to how they do after learning it. As such, assuming a single discount function or single temperature parameter may be parsimonious for explaining the data, but would not allow us to examine these changes in behavior.

Functional Imaging and Analysis
Data Acquisition
A total of 878 echo planar images (EPI) were acquired from each participant. Thirty-eight slices (10% distance factor) were acquired in an ascending manner, at an oblique angle (≈30°) to the AC-PC line to decrease the impact of susceptibility artifact in the subgenual mPFC (Deichmann et al. 2003). A voxel size of 3 × 3 × 3 mm (20% slice gap, 0.6 mm) was used; TR = 3 s, TE = 32 ms, flip angle = 85°. The functional sequence lasted 51 min. T1-weighted structural images were also acquired at a resolution of 1 × 1 × 1 mm using an MPRAGE sequence. Immediately following the functional sequence, phase and magnitude maps were collected using a GRE field map sequence (TE1 = 5.19 ms, TE2 = 7.65 ms).

Image Preprocessing
Scans were pre-processed using SPM8 (www.fil.ion.ucl.ac.uk/spm). The EPI images from each subject were corrected for distortions caused by susceptibility-induced field inhomogeneities using the FieldMap toolbox (Andersson et al. 2001). This approach corrects for both static distortions and changes in these distortions attributable to head motion (Hutton et al. 2002). The static distortions were calculated using the phase and magnitude maps acquired after the EPI sequence. The EPI images were then realigned, and coregistered to the subject's own anatomical image. The structural image was processed using a unified segmentation procedure combining segmentation, bias correction, and spatial normalization to the MNI template (Ashburner and Friston 2005); the same normalization parameters were then used to normalize the EPI images. Lastly, a Gaussian kernel of 8 mm FWHM was applied to spatially smooth the images in order to conform to the assumptions of the GLM implemented in SPM8.

Event Definition and Modeling
In this study, 2 GLM analyses were performed to investigate activity that varied parametrically with the subjective (SV) and normative values (NV) of temporally discounted rewards.
The first GLM was employed to examine activity that varied with the normatively and subjectively discounted values. In the second GLM, an analysis was performed to examine whether activity on subjective and normative trials might be better explained by "offline" vs "online" valuations as in Nicolle et al. (2012); see Supplementary materials.
Each GLM design matrix contained regressors modeling:
• Trial-type cue (informing the subject whether they should indicate their own preference or the normative preference on the trial).
• Subject offer cue (the delayed reward option on the trials where the subjects indicated their own preferences).
• Norm offer cue (cuing the subject to indicate the normative preference).
• Subject trigger cue (the trigger cue when the subject indicated their own preference).
• Norm trigger cue (cuing the subject to indicate the normative preference).
• Missed trials (a regressor modeling the onsets of the trial-type cue, option and trigger cues of missed trials).
Regressors were constructed for each of these events by convolving the event timings with the canonical Haemodynamic Response Function (HRF). The residual effects of head motion were modeled in the analysis by including the 6 parameters of head motion acquired during preprocessing as covariates of no interest. We ran 2 GLMs at the single-subject level that each contained different additional parameters. In addition to the regressors defined for the event types outlined above, each GLM also contained regressors which were first-order parametric modulations of the offer cue and trigger cue events. These modulators scaled the amplitude of the HRF in line with the SV on the subjective trials and the NV on the normative trials. The values of SV and NV were taken from the hyperbolic model as outlined above. Thus, we were able to examine whether the BOLD signal covaried with SV and NV at the time the options were presented and at the time when subjects indicated their choices. The second GLM had the same structure except that different parametric modulators were used. Specifically, we used the offline SV as a parametric modulator on the normative option and trigger cues, and the offline NV was used as a parametric modulator of the subjective option and trigger cues. This allowed us to examine whether different regions of the mPFC showed the same "online/offline" profile as suggested by Nicolle et al. (2012).
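A parametric modulator of this kind can be sketched as a per-trial vector of hyperbolic values, one entry per offer cue in presentation order. The mean-centring step below is an assumption based on common practice for parametric regressors and is not stated in the text; the function names are hypothetical, and the convolution with the HRF is left to the imaging software.

```python
def mean_centre(values):
    """Subtract the mean so the modulator is separable from the main event regressor."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def parametric_modulator(offers, k):
    """offers: list of (magnitude, delay_days) in trial order.

    Returns mean-centred hyperbolic values, one per trial.
    """
    values = [magnitude / (1.0 + k * delay) for magnitude, delay in offers]
    return mean_centre(values)
```

For the first GLM, `k` would be the subject's fitted subjective discount factor on subjective trials and the fitted normative discount factor on normative trials; for the second GLM the two would be swapped to give the "offline" values.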

Group Analysis, Contrasts, and Thresholding
Random effects analyses were applied to determine voxels significantly different at the group level. SPM{t} images from all subjects at the first level were input into second-level Flexible Factorial design matrices. T- and F-contrasts were conducted on the regressors to examine differences between SV and NV and the effects of SV and NV independently. These contrasts identified voxels in which activity varied parametrically in the manner predicted by the subjective or normative value parameters. The main analysis is reported from activity time-locked to the offer cues. An additional analysis from the trigger cues was also conducted (see Supplementary Table 1). To examine the effects of shifts in subjects' subjective preferences from before to after learning the social norm, covariates were entered on t-tests for SV and NV in second-level design matrices.

To correct for multiple comparisons, we first used a cluster-forming threshold (P < 0.001), as recommended by Woo et al. (2014), to identify an initial, large mPFC cluster. To then be more anatomically specific, we used masks of putative areas 8, 9, 11, 32, and preSMA from Neubert et al. (2015) as small volume corrections (SVCs). In addition, we used these same masks in a region of interest (ROI) analysis (see below). The masks of Neubert et al. (2015) were created based on resting-state and diffusion-weighted imaging in both humans and macaques, and highlight the largely preserved connectional properties of these regions across species. These masks therefore provide a detailed, anatomically derived parcellation of the mPFC that we can use to delineate the contributions of different regions of the mPFC to value processing in this study. Note that by using this approach, we could examine the effects of SV and NV in different regions of the mPFC in a manner that would not be possible using the traditional approach of contrasting SV with NV and examining the peak response or averaging over the effects across the whole cluster.
In addition to the SVC approach, we also performed a ROI analysis using MARSBAR. In this approach, we averaged over the effects of all of the voxels in each mask. This analysis is reported in full in Supplementary Fig. 2.

Results
We designed a novel version of an intertemporal decision-making task to investigate activity covarying with subjective or normative valuations of rewards (Fig. 1). During fMRI, subjects made choices between a low (£3), immediately received reward, and a reward that was larger in magnitude (£3.10-£20) but delayed (1-180 days). On half of the trials subjects were instructed to indicate their own preference, but crucially, on the other half of the trials, subjects were required to indicate the normative preference. The normative preferences had been learnt through trial and error during a training session (see "Materials and Methods" section and Supplementary Material). Thus, on half the trials subjects' choices were made based on a subjective valuation of a delayed reward, but on the other half of the trials the same kind of valuation was dictated by a social norm.

Rewards are Hyperbolically Discounted by Delays
Are rewards devalued by temporal delays? Consistent with a large body of previous work, rewards were subjectively devalued by the temporal delay before receipt (Fig. 2 A,B; Supplementary Results). Crucially, however, there was no difference between subjects' choices on the subjective and normative trials at the group level during scanning (P > 0.1). This result importantly ensures that activity identified in neural analyses is not driven by systematic differences in valuation of delayed rewards by subjects in the normative and subjective conditions (see Supplementary Results).
To characterize the nature of the discounting effect, we fitted hyperbolic and exponential "discount" models separately to the choices on the normative and subjective trials, using the softmax algorithm and maximum likelihood estimation (see "Materials and Methods" section, Fig. 2C and Supplementary Table 2). Thus, for each model we estimated a discount factor (k), which dictates the extent to which a reward is devalued by a delay, and a stochasticity parameter (β), which represents how noisy valuations are. To determine which function best fitted the behavioral data (Fig. 2C), we conducted a 2 × 2 repeated-measures ANOVA on the log-likelihood for each condition (subjective, normative) and function (exponential, hyperbolic). We found a marginal effect of condition, a main effect of function, and a marginal interaction (Condition: F(1,14) = 4.391, P = 0.055; Function: F(1,14) = 17.554, P < 0.001; Condition × Function: F(1,14) = 3.874, P = 0.07). Examination of the log-likelihood estimates shows that the effect was driven by a better fit of the hyperbolic model to both the subjective and normative choice data, in line with previous studies examining intertemporal choice data (Mazur 2001). Further analyses were therefore completed using only the hyperbolic model.

Social Norms are Accurately Reproduced
For the aims of this study it was important that subjects were able to reproduce the normative preferences that they had learnt during the training session accurately on the normative trials inside the scanner. To examine this we compared the estimated discount factor from the hyperbolic model on the normative trials during scanning, with the discount factor which was used to create the normative behavior that subjects learnt through trial and error during training (k = 0.023). We found no significant difference between these discount factors (t (14) = 0.36, P > 0.7), highlighting that subjects' choices on the normative trials were not significantly different from the behavior that they had learnt as normative during training (Fig. 2D).
Were subjects making the same choices on both the normative and subjective trials? There was no significant difference between the discount factors in the normative and subjective conditions (U (14) = 0.47, P > 0.6). However, this absence of a difference can most likely be attributed to the high levels of variability across subjects in the subjective valuation trials. Importantly, there was also no correlation between the subjective discount parameters and the normative discount parameters across subjects (r s = 0.23, P = 0.4). This indicates that whilst at the group level there was no difference in behavior in the 2 conditions, subjects were not simply performing the normative trials using the same discount function as they were on the subjective trials. Thus, they were behaving differently in the 2 conditions, and performing the normative trials in accordance with the normative discount function they had learned.

Shifts in Subjective Preferences after Learning Norms
Seventy-three percent of subjects shifted their preferences in the direction of the norm between the subjective preferences task at the beginning of the training session and the subjective trials inside the scanner after they had learnt the norm (Fig. 2). A Mann-Whitney U-test comparing the difference between subjective discount factors and the norm discount factor prior to learning the norm with the same difference after learning the norm showed a significant effect (U (14) = −1.7, P < 0.042, 1-tailed). Overall, subjects were significantly closer to the norm in the scanning session, after they had learnt the norm, than they were during the training session before being exposed to the norm. However, there was a correlation between a subject's discounting parameter before and after learning the norm (r s = 0.8, P < 0.001), suggesting that whilst there was a shift in behavior overall, subjects remained relatively impatient or patient with respect to the other subjects after learning the normative preferences. Thus, in line with research showing that minimal group exposure can have powerful effects on behavior (Hewstone et al. 2002), our results suggest that subjects shifted their subjective preferences toward the norm following the learning of normative preferences, rather than shifting their performance on the normative trials towards their original subjective preferences. However, we note that, given the sample size, inferences drawn from all analyses related to this shift in preferences should be made with caution.
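The shift measure described above can be illustrated with a short Python sketch. It assumes that "closer to the norm" means a smaller absolute difference between a subject's discount factor and the normative discount factor (k = 0.023, as reported in the text); the function names are hypothetical and any discount factors passed in are for illustration only.

```python
K_NORM = 0.023  # normative discount factor reported in the text


def distance_to_norm(k, k_norm=K_NORM):
    """Absolute difference between a discount factor and the norm (an assumed metric)."""
    return abs(k - k_norm)


def proportion_shifted_toward_norm(k_before, k_after, k_norm=K_NORM):
    """Fraction of subjects whose discount factor is closer to the norm after learning it.

    k_before, k_after: per-subject discount factors, in the same subject order.
    """
    closer = [distance_to_norm(after, k_norm) < distance_to_norm(before, k_norm)
              for before, after in zip(k_before, k_after)]
    return sum(closer) / len(closer)
```

The two lists of distances (before and after learning the norm) are the quantities that a nonparametric test would then compare.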

fMRI Results
In this study, we were interested in examining activity time-locked to the offer cue when subjects evaluated either the subjective value (SV) or normative value (NV) of an offered, delayed reward (analysis of the response cues is included in the Supplementary Material). To examine the processing of SV and NV, we took the values from the hyperbolic model fitted to the subjects' behavior and used them as parametric modulators of activity at the time of the offer cue on the corresponding trials (see Supplementary Table 1 for results from the response cue). That is, we examined activity that parametrically varied with NV on normative trials and activity that parametrically varied with SV on subjective trials.
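As a rough illustration of how such trial-wise modulators might be assembled, the following Python sketch computes a hyperbolic value for each offer using the discount factor fitted to the relevant condition. The data layout, the function names, and the mean-centering within each condition are assumptions made here (mean-centering is a common convention for parametric modulators, but the text does not state it).

```python
def hyperbolic_value(amount, delay, k):
    """Hyperbolically discounted value of a delayed reward."""
    return amount / (1.0 + k * delay)


def parametric_modulators(trials, k_by_condition):
    """Return mean-centered trial values for each condition.

    trials: list of (condition, amount, delay_days) tuples in presentation order.
    k_by_condition: fitted discount factor for each condition label.
    """
    values = {}
    for condition, amount, delay in trials:
        v = hyperbolic_value(amount, delay, k_by_condition[condition])
        values.setdefault(condition, []).append(v)
    centered = {}
    for condition, vs in values.items():
        mean = sum(vs) / len(vs)
        # Mean-centering within condition is an assumption, not stated in the text.
        centered[condition] = [v - mean for v in vs]
    return centered


# Hypothetical offers and discount factors, for illustration only.
example_trials = [("subjective", 10.0, 30), ("normative", 10.0, 30),
                  ("subjective", 20.0, 90), ("normative", 5.0, 7)]
example_mods = parametric_modulators(
    example_trials, {"subjective": 0.05, "normative": 0.023})
```

Each list would then modulate the offer-cue regressor for its own trial type, so that SV enters only on subjective trials and NV only on normative trials.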

The mPFC Differentially Codes Subjective and Normative Value
The first hypothesis was that subregions across the dmPFC and vmPFC might signal value differently on subjective and normative trials. To examine this question, we first performed a whole-brain corrected comparison between SV and NV. An F-contrast between the SV and NV parametric modulators revealed a large cluster extending over both the dorsal and ventral portions of the mPFC (3095 voxels) that was significant at a cluster-wise threshold (Z = 4.09, P < 0.001 uncorrected voxel-wise threshold, P < 0.05 FWE cluster threshold, as recommended by Woo et al. 2014). Although other regions also showed a significant difference between SV and NV (see Supplementary Results), for the aims of this paper we focus on the mPFC cluster. However, we note that we also found a pattern of results in the posterior superior temporal sulcus similar to that which we report in the mPFC, and at a reduced threshold we found a cluster in the ventral striatum that coded SV only (Supplementary Figs 3 and 5).
In line with our hypotheses, the mPFC cluster contained multiple peaks which corresponded putatively with divisions of the mPFC based on cytoarchitectonic and connectional anatomy (Neubert et al. 2015). In line with our predictions, peaks were identified in areas BA 8, 9, 11, and 32 (see Supplementary Fig. 2 for masks of these regions). In the following sections, we demonstrate that each of these 5 zones did in fact signal SV and NV differently; however, they clustered into 3 subregions that each had a distinct signature for signaling SV and NV. For each region, we highlight a small volume corrected result to demonstrate the difference in signaling of SV and NV before then highlighting the specific nature of value signaling in that region.

Figure 2. Behavioral results. Subjects devalued rewards as a function of temporal delays. As delays increased subjects were more likely to choose the immediate option (A) and as rewards increased they were more likely to take the delayed option (B). Overall choices on the intertemporal choice trials were better explained by a hyperbolic, rather than exponential model (C). (D) Subjective or normative discount parameters estimated based on the subjects' choice behavior did not differ from the normative discount factor that was learnt during training (dotted blue line). Subjective discount factors in the scanning session correlated with subjective discount factors during the training before subjects had learnt the normative behavior (E). However, there were substantial shifts in subjective discount factors from before to after learning the norm (F). About 73% of subjects' discount factors were closer to the norm after learning normative preferences during training, suggesting they were influenced by the social norm based preferences. Error bars depict standard error of the mean (SEM).

Coding of Normative, but not Subjective, Value in the Anterior dmPFC
To examine effects in the anterior dmPFC, we used masks corresponding to BA 8 and 9, both regions which have been identified in studies examining the neural basis of social norm processing (Izuma 2013). Details of the extent of these masks are shown in Supplementary Material (see Supplementary Fig. 2). We identified a peak within the spatial extent of each mask for the contrast between SV and NV (Area 8 mask: x = −4, y = 32, z = 52; Z = 3.88, P < 0.05 SVC; Area 9: x = 6, y = 52, z = 38; Z = 3.80 SVC), demonstrating that both subregions differentially signaled NV and SV. Examination of the response in the peak voxel in each region (Fig. 3) suggested that activity in both areas covaried only with NV on normative trials and not with SV on the subjective trials. Statistically, this was demonstrated by clusters in the same regions, with the same peak voxels, showing an effect of NV (Area 8: x = −4, y = 32, z = 52; Z = 3.8, P < 0.05 SVC; Area 9: x = 6, y = 52, z = 38; Z = 3.69 SVC) but no voxels in these regions showing any effect of SV even at a considerably reduced threshold (P < 0.05 uncorrected). Thus, anterior dmPFC subregions were specifically sensitive to NV on normative trials, but not SV on subjective trials. That is, the region was sensitive to value but only in the normative condition.

Opposing Coding of Subjective and Normative Value in the vmPFC
We applied the same approach to examine activity in the vmPFC, using masks of areas 11 and 32 (see Supplementary Fig. 2 for spatial extent). Both of these regions have been identified as signaling subjective value (Kable and Glimcher 2007;Neubert et al. 2015) in human neuroimaging tasks.
Similar to the dmPFC, clusters in both regions showed a significant difference between SV and NV (Area 32 mask: x = 6, y = 46, z = 10; Z = 3.56, P < 0.05 SVC; Area 11: x = −8, y = 42, z = −10; Z = 3.58, P < 0.05 SVC). Activity in these regions showed an effect of both SV and NV, but in the opposite direction (Fig. 3). This was demonstrated statistically by clusters within the spatial extent of both masks, each containing the peak voxel from the contrast between SV and NV, showing a positive effect of SV and a negative effect of NV alone, albeit at a slightly reduced threshold (P < 0.002 uncorrected). This suggests that both subregions of the vmPFC showed a significant effect of value, but activity covaried in the opposite direction for SV and NV in both regions. Our claim is supported by the ROI analyses in which both clusters had significant effects of SV and NV when correcting for multiple comparisons (see "Materials and Methods" section, Supplementary Results and Supplementary Fig. 2).
Although the effects for SV in the vmPFC were slightly weaker than our other reported results, there is considerable evidence that the vmPFC does signal such information, particularly in temporal discounting tasks (Kable and Glimcher 2007), and in this study the reduced significance may have been driven by the variability in this region for signaling SV (see below).

Figure 3. Clusters in anterior regions of the dmPFC, areas 9 (A) and 8 (B), showed a significant negative effect of normative value (NV) on the normative trials but no effect of subjective value (SV) on the subjective trials. Plots show response from the peak voxel (responses of the whole clusters are reported in Supplementary Fig. 2). Clusters in the vmPFC, areas 32 (C) and 11 (D), showed a significant negative effect of NV on the normative trials and a significant positive effect of SV on the subjective trials. A region in the posterior dmPFC showed a significant conjunction (E), signaling both SV and NV and no difference between the 2. Thus, the mPFC is sensitive to reward valuations in intertemporal choice. Error bars depict SEM. Results are shown at P < 0.001 uncorrected for display purposes.

Subjective and Normative Value in the Posterior dmPFC
To examine whether any regions processed the value of rewards regardless of whether the valuation was normative or subjective, we performed a conjunction between SV and NV. This contrast revealed voxels in a more posterior region of the dmPFC (P < 0.001 uncorrected voxel-wise threshold, P < 0.05 FWE cluster corrected). This region, lying anterior to the border between the precentral gyrus and the superior frontal gyrus on the medial surface, fell within the region often referred to as the presupplementary motor area (preSMA; x = −8, y = 20, z = 44; Z = 4.96, P < 0.05 SVC). A cluster containing the peak voxel from the conjunction was also identified independently as showing a significant effect of both SV and NV in the same direction (P < 0.001 uncorrected). Activity in the posterior regions of the dmPFC, therefore, covaried both with SV on subjective trials and NV on the normative trials in the same manner.

Shifts in Subjective Valuations and Conformity to Social Norms
Behaviorally, we found a shift of subjects' valuations toward the normative valuations on the subjective trials after learning the normative preferences. Previous studies have shown that regions of the dmPFC and vmPFC are engaged when updating behavior in order to conform to social norms (Klucharev et al. 2009;Garvert et al. 2015). To examine whether this effect was related to the processing of SV and NV in the mPFC, we used the difference between the subjective discount factor during the first part of training and that during scanning as a covariate. We then examined whether the extent to which activity varied with SV on subjective trials, or with NV on normative trials, covaried with the extent to which subjects shifted their choices. We found 3 separate clusters in the mPFC that survived correction for multiple comparisons using the masks of area 32 (14, 48, 4, Z = 3.69, P < 0.05 SVC), 8 (−12, 32, 50, Z = 4.11, P < 0.05 SVC), and 9 (−10, 48, 42, Z = 3.64, P < 0.05 SVC) in which activity was correlated with SV on subjective trials (see Supplementary Fig. 4). Clusters in the same regions also showed a significant correlation between the degree of SV coding and the extent to which subjects got closer to the norm from the first to the second session, at a reduced threshold (P < 0.005 uncorrected). We found no clusters in which activity covarying with NV correlated with shifts in subjective valuations. These findings are consistent with the notion that activity in these regions is related to shifts in behavior based on learning a norm. However, we note that, given the sample size, inferences drawn from all analyses related to this shift in preferences should be made with caution.

Memory Demands
To examine whether our effects could be driven simply by memory demands on the normative trials compared to the subjective trials, we contrasted activity at the time of the offer cues on the normative trials with that on the subjective trials. We found no voxels in the medial prefrontal cortex showing a significant effect (P < 0.05 FWE-SVC corrected for each ROI). This argues against the notion that activity may be related simply to recalling what decision to make, as opposed to choosing one based on one's own valuation.

Value not Conflict
Could activity on either trial-type be related to "conflict" between the choice options? To examine this we performed 2 additional analyses. First, we created a parametric regressor of the difference between the subjective and normative valuation on each trial. In these regressors, covarying activity would be dependent on the degree of conflict between the 2 valuations. We then examined, in the ROIs used in the main analyses, whether activity covaried with these regressors on the subjective or normative trials. We found that none of the ROIs showed a significant effect of these "conflict regressors" on either type of trial, even at a reduced threshold (P < 0.01 uncorrected). Second, we examined whether activity in these regions signaled "decision conflict/difficulty", regardless of trial-type. This decision difficulty regressor was the absolute value on each trial-type subtracted from the value of the immediate offer (£3). This examined activity that would be highest when the values of the immediate and delayed options were closest to each other and thus choosing between them would be most difficult. None of the ROIs in the mPFC showed activity that significantly covaried with this decision difficulty/conflict regressor, even at a reduced threshold (P < 0.01 uncorrected). There is therefore little evidence of any conflict-related signals in the regions we examined.
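The two control regressors described above can be sketched as follows. This is only one plausible reading of the text: taking the absolute SV-NV difference for the conflict regressor, and taking the negative absolute distance from the immediate offer so that larger numbers mean harder choices, are both assumptions made here, as are the function names.

```python
IMMEDIATE = 3.0  # value of the immediate offer in pounds (from the text)


def valuation_conflict(sv, nv):
    """Per-trial conflict between subjective and normative valuations.

    Uses the absolute difference; whether the original regressor was signed
    or absolute is not stated in the text.
    """
    return [abs(s - n) for s, n in zip(sv, nv)]


def decision_difficulty(values, immediate=IMMEDIATE):
    """Per-trial difficulty: highest (closest to zero) when the discounted
    value of the delayed option is nearest the immediate offer.

    The sign convention is an assumption made for this sketch.
    """
    return [-abs(v - immediate) for v in values]
```

Under this convention a trial on which the delayed option is worth exactly £3 receives the maximum difficulty score of zero.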

Discussion
In many social situations our subjective valuations of economic rewards are over-ridden by our desire to conform to social norms. In this study, we examined how the brain processes the value of rewards according to a social norm and whether such valuations are coded similarly to subjective value in the mPFC. Subjects performed intertemporal choices that were either based on their own subjective preferences or on a learnt social norm. We found that a large portion of the mPFC was sensitive to reward valuations and showed a difference between the processing of subjective and normative valuations. Across this region, 3 distinct subregions were identified that processed the value of rewards differently depending on whether the valuation was driven by subjective or normative information. In support of previous research examining the processing of social norms, our results implicated the anterior portions of the dmPFC (areas 8 and 9). However, we show that activity in this region covaried exclusively with value on the normative trials, and not on the subjective trials. In contrast, activity in posterior regions of the dmPFC covaried with value on both subjective and normative trials, and activity in the vmPFC varied with subjective value positively but normative values negatively. In addition, we found that the extent to which parts of the anterior dmPFC, and area 32 of the vmPFC, signal SV is correlated with shifts in subjective preferences. Specifically, how much individuals have shifted their subjective valuations after being exposed to a norm correlates with the degree to which these regions signal SV. These results highlight that normative valuations are coded distinctly from subjective valuations in the mPFC.
Research examining the neural basis of social norm processing, social influence and conformity has consistently implicated the dmPFC (Izuma 2013). Neuroimaging studies have shown that activity across the preSMA, BA 8 and BA 9 tracks how different one's opinions are from those of another group or individual, and these regions are engaged when updating one's own opinions, or when learning another group's or person's opinions (Berns et al. 2005;Klucharev et al. 2009;Campbell-Meiklejohn et al. 2010;Zaki et al. 2011;Nicolle et al. 2012;Izuma and Adolphs 2013;Garvert et al. 2015;Nook and Zaki 2015). Moreover, transcranial magnetic stimulation studies have shown that information processing in the dmPFC is causally linked to social norm guided behaviors and conformity (Klucharev et al. 2011;Izuma et al. 2015). As a result, it is often argued that the dmPFC processes information that guides behaviors when operating in social groups and when under social influence. Alternatively, some have argued that the mPFC plays an important role in signaling decision difficulty or conflict monitoring (Botvinick et al. 2004). However, we found little evidence of such signals in this experiment. This may be because the region that has often been debated as signaling conflict lies in the ACC, in areas 24 or 32, and not in areas 8 and 9. Our results are, therefore, much more consistent with the dmPFC signaling social information that guides behavior when interacting in social groups. We show that such social specificity is only present in the anterior portions of the dmPFC (areas 8 and 9). In addition, we show that specificity for processing NV in the dmPFC may depend on the extent to which people's subjective preferences are influenced by normative information. That is, the extent to which activity in subregions of the dmPFC (area 9) varied with SV was dependent on how much an individual's subjective preferences were influenced by the learning of normative information.
This suggests that the extent to which people are influenced by social information might influence how the dmPFC codes our own valuations. This is in line with evidence that the dmPFC may be a key region for learning about social norms (Izuma 2013;Izuma et al. 2015) and is also related to how influenced people are by the behavior of another individual (Garvert et al. 2015;Wittmann et al. 2016).
However, our results suggest that these properties are not shared across all of the dmPFC. Specifically, we found that a more posterior portion of the dmPFC (putatively in the preSMA) signaled value on both subjective and normative trials. This would support accounts that suggest this region is important for processing the value of choices, but does not play any specific role in social or nonsocial behaviors (Nachev et al. 2008). Specifically, there is evidence that neurons that direct actions toward rewards are present in this region and also that this region is important for initiating behaviors (Wang et al. 2001;Nachev et al. 2008). Supporting this claim, anatomical evidence shows that the preSMA is strongly connected to parts of the motor system, with connections to the premotor cortex and parts of the basal ganglia that are putatively important for valuing and selecting actions (Nachev et al. 2008). Our results support the notion that the preSMA is sensitive to value. However, we show the importance of accurate localization for understanding dmPFC function. By localizing activity to specific regions that are known to have distinct functions, connections and anatomical properties, we were able to show with greater precision that the preSMA may signal value regardless of the source of the valuation.
A wealth of research has suggested that the vmPFC processes information that is consistent with a role in value-guided choice (Kable and Glimcher 2007;Rushworth and Behrens 2008;O'Doherty 2014;Strait et al. 2014;Manohar and Husain 2016). Such a notion is supported by its anatomical connections to other regions of the brain that process the subjective value of rewards including portions of the intraparietal sulcus, amygdala nuclei, both medial and lateral portions of the orbitofrontal cortex, and the ventral striatum (Haber et al. 2006;Glimcher 2007, 2010;Petrides and Pandya 2007;Kim et al. 2008;Hunt et al. 2012;Neubert et al. 2015). In addition, a plethora of neuroimaging studies highlight that activity in this region scales with the subjective value of a chosen reward (O'Doherty 2014). These findings have led many to suggest that this region plays a crucial role in integrating information about the value of rewards in order to select behaviors that have the highest value. Our results do not support this account. Instead, activity in both areas 11 and 32 in the vmPFC scaled positively with SV but crucially scaled "negatively" with the NV of rewards. This suggests that this region is sensitive to the value of rewards, but its processing of value is context-dependent and the nature of the value-computations performed in this region may differ depending on whether a reward valuation is a function of social information or is subjective.
The notion that the vmPFC processes information that influences social behavior is in fact supported by a plethora of anatomical and functional evidence. Medial portions of areas 11 and 32 in the vmPFC have connections to portions of the amygdala, dysgranular portions of the anterior insula, posterior portions of the superior temporal sulcus, the anterior cingulate gyrus, and regions in the dmPFC that we found to process only normative value (Morecraft et al. 1992;Petrides and Pandya 2007). These regions are well known for their roles in processing social information (Amodio and Frith 2006;Behrens et al. 2009;Hurlemann et al. 2010;Apps and Tsakiris 2013;Blair 2013;Gu et al. 2015;Lockwood et al. 2015;Apps et al. 2016). Neuroimaging studies have also shown that the vmPFC processes information about the value of rewards that others will receive and is engaged by the value of monetary rewards donated to charity and when enforcing social norms (Krajbich et al. 2009;Cooper et al. 2010;Hare et al. 2010;Tricomi et al. 2010;Baumgartner et al. 2011;Buckholtz and Marois 2012;Janowski et al. 2013;Zaki et al. 2014;Apps et al. 2015). In addition, there is considerable evidence that lesions to the vmPFC, and structural changes to this region in healthy individuals, are linked to antisocial behavior and influence the extent to which people conform to social norms (Blair 2013;Gu et al. 2015;O'Callaghan et al. 2016). Our results therefore go against the viewpoint that the vmPFC does not process social information (Rudebeck et al. 2008) and support emerging evidence that this region may play an important role in processing value-related information during social interactions.
Recently, studies have examined the processing of value in a context where the beneficiary of the value-guided choice was either the subject themselves or another individual. Sul et al. (2015) suggested that when choosing to benefit either ourselves or another, when the choices are based on our own preferences, more ventral portions of the mPFC encode value for ourselves but more dorsal portions signal value for other people, although this differs between prosocial and selfish individuals. In contrast, Nicolle et al. (2012) and Garvert et al. (2015) found that value was coded in an "offline", modeled frame of reference in more dorsal portions of the mPFC and in a more "online", executed frame of reference in the vmPFC. That is, the vmPFC coded the value of rewards according to whoever might receive the outcome of a decision, whereas the dmPFC signaled value according to the preferences of the person who would not receive the reward on that trial. Our results, however, do not fully support either viewpoint. Our results support the suggestion that the dmPFC is specialized for processing social information, as argued by Sul et al. (2015), but would not support their claim that the vmPFC processes rewards only when they are subjectively valued. Likewise, we did not find evidence to strongly support the claim of Garvert et al. (2015) and Nicolle et al. (2012) that the dmPFC and vmPFC encode value in an offline and online reference frame, respectively.
How can we reconcile these previous studies' findings with our results? One possible explanation is that there is a substantial difference in the nature of decisions that conform to norms compared with decisions made to benefit another. Specifically, when making normative decisions an individual is both the beneficiary of the choice and the decision-maker, whereas when choosing for others the individual is not the beneficiary of the outcome. This key distinction in the frame of reference for the beneficiary of the choice may explain why we do not find similar results. In addition, it is also plausible that our results refer to different regions of the mPFC from those of Sul and colleagues and Nicolle and colleagues. Specifically, our results extended across a large portion of the mPFC (areas 8, 9, 32, and 11), whereas the distinctions identified in previous studies refer to a smaller circumscribed region potentially restricted to distinctions within areas 32 and 9. Thus, our results extend these previous findings by examining regions that cover a larger spatial extent of the mPFC. Moreover, our findings highlight that the contributions of several mPFC regions to making decisions based on a social norm guided valuation are somewhat distinct from the contributions made when making decisions to benefit another.
Lastly, could these results be accounted for by differences in the memory demands between subjective and normative decisions? Notably, we did not find any main effect differences in the mPFC when comparing activity time-locked to the offer cues on subjective or normative trials. This suggests that the differences in activity in the mPFC related to the discounted value of delayed rewards and not purely to the requirement to recall a response from memory. Moreover, subjects showed conformity to the norm even on subjective trials, suggesting that recall of the norm was guiding behavior on both subjective and normative trials. Behaviorally therefore, our results suggest that the normative and subjective trials were comparable in terms of their memory demands. Finally, whether the dmPFC processes information in a domain-general way (i.e., is engaged by memory processes including those that are socially relevant) or processes exclusively social information is a question that has been examined extensively in social neuroscience. There is considerable debate over whether parts of the mPFC operate in a domain-general or socially specific manner (Amodio and Frith 2006;Izuma 2013), and whether operating in larger social groups was an evolutionary pressure for greater memory capacity and the expansion of the prefrontal cortex (Dunbar and Shultz 2007). Theoretical accounts also directly link the evolution of the mPFC to the requirement to process increasingly abstract rules about how to interact with others (Murray et al. 2016) and prefrontal cortex development is linked to the degree to which social influence changes behavior through ageing (Steinberg and Monahan 2007;Steinberg 2008;Tamnes et al. 2017). To base one's behavior on recalled information rather than one's own preferences may therefore be a fundamental aspect of social norm guided behavior and a key mechanism underlying the functional properties of the mPFC. 
The issue of whether information processing in the dmPFC is exclusively "social" in nature, or is a more domain-general process, cannot be resolved by this study alone. However, crucially, neither of these possibilities contradicts our key argument that the neural mechanisms underlying subjective valuations may be different from those that underlie valuations recalled from a social norm.
In summary, we have identified 3 zones in the mPFC that process value-related signals but each has a different profile for processing subjectively or normatively valued rewards. Our results highlight some of the key neural and computational mechanisms that may underpin reward valuation and social influence. Moreover, this may pave the way for understanding why people can make much more impulsive or patient economic decisions when interacting with others than they would when making the same decisions alone.