Rewards Enhance Proactive and Reactive Control in Adolescence and Adulthood

Abstract Cognitive control allows the coordination of cognitive processes to achieve goals. Control may be sustained in anticipation of goal-relevant cues (proactive control) or transient in response to the cues themselves (reactive control). Adolescents typically exhibit a more reactive pattern than adults in the absence of incentives. We investigated how reward modulates cognitive control engagement in a letter-array working memory (WM) task in 30 adolescents (12–17 years) and 20 adults (23–30 years) using a mixed block- and event-related functional magnetic resonance imaging design. After a Baseline run without rewards, participants performed a Reward run where 50% of trials were monetarily rewarded. Accuracy and reaction time (RT) differences between Reward and Baseline runs indicated engagement of proactive control, which was associated with increased sustained activity in the bilateral anterior insula (AI), right dorsolateral prefrontal cortex (PFC) and right posterior parietal cortex (PPC). RT differences between Reward and No reward trials of the Reward run suggested additional reactive engagement of cognitive control, accompanied by transient activation in bilateral AI, lateral PFC, PPC, supplementary motor area, anterior cingulate cortex, putamen and caudate. Despite behavioural and neural differences during Baseline WM task performance, adolescents and adults showed similar modulations of proactive and reactive control by reward.


Introduction
Adolescents' ability to exert cognitive control is particularly susceptible to potential rewards and affectively charged contexts (Crone and Dahl, 2012; Cohen et al., 2016; van Duijvenvoorde et al., 2016). Prevailing frameworks suggesting a maturational imbalance in adolescence have focused on instances when cognitive control fails to constrain reward-sensitive systems, leading to potentially negative outcomes, typically during risky decision-making (Casey, 2015). Less is known about situations in which cognitive control might be enhanced by reward sensitivity (Strang and Pollak, 2014). In this study, we explored whether adolescents and adults can adaptively engage cognitive control processes as a function of the temporal dynamics of reward to maximise their performance in a working memory (WM) task.
The dual mechanisms of control (DMC) framework distinguishes between two temporally distinct cognitive control strategies (Braver, 2012). Proactive control refers to the sustained maintenance of goal-relevant information in anticipation of a cue. Reactive control refers to the transient reactivation of goals in response to a cue. Reactive control is less demanding than proactive control but more susceptible to interference (Braver, 2012; Chiew and Braver, 2017). While adults vary in the recruitment of proactive and reactive control as a function of trait factors (Locke and Braver, 2008; Chiew and Braver, 2017), they can flexibly engage the most efficient mode of cognitive control to adapt to contextual demands, as evidenced by changes in response to experimental manipulations (Braver et al., 2009; Chiew and Braver, 2013). Mixed event-related/blocked functional magnetic resonance imaging (fMRI) designs (Visscher et al., 2003) are specifically optimised to dissociate sustained vs transient changes in neural activation within a single experimental paradigm. These designs have been employed to study cognitive control (Marklund et al., 2007; Brahmbhatt et al., 2010; McDaniel et al., 2013) and the impact of reward manipulations on cognitive control strategies (Jimura et al., 2010). fMRI studies have predominantly implicated the frontoparietal network in implementing proactive and reactive control (Braver et al., 2009; Jimura et al., 2010). In addition, the dorsal anterior cingulate cortex (ACC) is involved in sustained control, and the rostral ACC is involved in reactive compensations (Jiang et al., 2015). Further, the anterior insula (AI) participates in estimating the volatility of control demands and the caudate in predicting forthcoming demands (Jiang et al., 2015).
Reliance on reactive control in early childhood shifts towards a mix of proactive and reactive control depending on individual differences and task demands in mid- to late childhood (Chevalier et al., 2015). By age 8, children seem to have the capacity to flexibly adapt strategies to be more efficient (Chatham et al., 2009; Blackwell and Munakata, 2014; Chevalier et al., 2015). In a handful of fMRI studies, more protracted proactive control development compared to reactive control has been described in adolescence (Velanova et al., 2009; Andrews-Hanna et al., 2011; Alahyane et al., 2014) and was associated with reduced prefrontal activity in adolescents compared to adults in posterior dorsolateral prefrontal cortex (PFC) (Andrews-Hanna et al., 2011) and with reduced sustained activity in children and adolescents compared to adults in a region near the inferior frontal junction (Velanova et al., 2009). In contrast, Alahyane et al. (2014) found that adolescents and adults had comparable frontoparietal activity associated with prosaccade and antisaccade preparation, which was higher than in children (8-12 years old).
The balance between reactive and proactive cognitive control is sensitive to the motivational context (Braver et al., 2014; Chiew and Braver, 2017) and interacts with reward circuitry in the presence of incentives. Reward-driven enhancement of performance may be driven by top-down control mechanisms that modulate the processing of subsequent stimuli in preparatory fashion through increased sustained proactive control (Locke and Braver, 2008; Jimura et al., 2010) or transient increases in reactive control on a trial-by-trial basis (Jimura et al., 2010). There is also evidence for some contribution by more automatic bottom-up processes, suggesting increased saliency of reward-related features (Krebs et al., 2015).
In the absence of reward, cognitive control continues to develop and become more stable during adolescence (for WM, see review in Zanolie and Crone, 2018). Over the course of development, cognitive control-related prefrontal activation becomes more attuned to varying contextual demands (Chevalier et al., 2019). Adolescents can improve their inhibitory control performance to match adults' performance in the presence of reward (Geier et al., 2010; Padmanabhan et al., 2011; Luna et al., 2015; Zhai et al., 2015). Along with these changes in performance, in the reward context, adolescents show increased transient recruitment of cognitive control regions (frontal cortex along the precentral sulcus) and reward regions (ventral striatum) during response preparation, compared to adults (Geier et al., 2010; Padmanabhan et al., 2011). Corticostriatal coupling under high and low rewards continues to develop in adolescence, underlying the increased capacity of adults to modulate cognitive control selectively in the context of high rewards (Insel et al., 2017). Overall, this suggests greater integration of executive control and motivation during development (Smith et al., 2014).
In contrast, Strang and Pollak (2014) found that in a task of proactive and reactive control [AX-Continuous Performance Test] children, adolescents and adults between 9 and 30 years old showed a similar ability to shift into a proactive control strategy in the context of reward, associated with increased sustained activity in the right lateral PFC, right posterior parietal cortex (PPC) and right AI, among other regions. An outstanding question is whether modulation of transient brain activity can also be observed across age groups. Greater modulation of prefrontal activation in response to contextual demands has been proposed as one of the developmental mechanisms underlying cognitive control development (Chevalier et al., 2019).
Here, we investigated the age-related increases in proactive and reactive cognitive control and their modulation by a motivational (reward) context that varied trial-by-trial and across blocks. We employed a mixed block- and event-related fMRI design while adolescents and adults completed a WM task in neutral and reward conditions adapted from Jimura et al. (2010). We used a mixed experimental design which allowed us to detect sustained brain activity across blocks (proactive control) and transient activity in response to trials (reactive control). We expected adolescents to be more reliant on reactive control and to show greater sensitivity to a rewarding context, in terms of behaviour and transient neural activity, while we expected adults to exhibit a more proactive control strategy, with associated sustained frontoparietal activity across blocks.

Participants
Thirty adolescents (15 females, 12-17 years old, mean [M] = 14.6 ± 1.4 [standard deviation (SD)]) and 20 adults (10 females, 22-30 years old, M = 27.1 ± 1.9) took part in this study. Participants were reimbursed £20 (plus up to £8 depending on their performance on the task) and their travel expenses. This study was approved by the University College London Research Ethics Committee. Consent was obtained according to the Declaration of Helsinki: adults and the parents of adolescents provided written consent, while adolescents themselves gave verbal consent. Adolescent and adult groups did not differ in their age-normed scores on the Vocabulary subtest of the Wechsler Abbreviated Scale of Intelligence (WASI-II; Wechsler, 2011) [adolescents: M = 66.9 ± 0.9 (s.d.); adults: M = 64.7 ± 1.7; t(28.8) = 1.15, P = 0.25].

Experimental design and stimulus material
Design. The fMRI task had one between-subjects factor (age group, adults and adolescents) and two types of within-subject factors: either sustained, run effects (Baseline run vs Reward run) or transient, trial effects (Baseline trials vs Reward trials vs No reward trials). In the Reward run, half of the trials had potential rewards (Reward trials) and half did not (No reward trials). Preceding the Reward run, participants were unaware of the potential rewards, and hence all the Baseline trials in the Baseline run were unrewarded (Fig. 1A).

Letter-array WM task.
We employed a fixed set-size Sternberg item recognition task adapted from Jimura, Locke and Braver (2008) (Fig. 1B). At the beginning of each WM trial, a cue indicated whether a potential reward could be obtained on this trial (Reward trial) or not (Baseline trial or No reward trial). Five uppercase consonants were then presented, followed after a retention interval by a single lowercase probe letter. Participants indicated by pressing one of two buttons on a handheld response box whether the probe matched one of the letters from the memory set (right index finger) or not (right middle finger). Participants were encouraged to respond both accurately and quickly. Visual feedback indicated whether the response was incorrect, too slow, correct and not rewarded or correct and rewarded (Fig. 1B). Cut-off times were individually set for each participant based on his/her own median correct reaction time (RT) on the practice trials.

Fig. 1. (A) Participants performed two runs of 30 trials each of a letter-array WM task. In the first run, none of the trials was rewarded. In the second run, half of the trials could be rewarded. (B) Example trials. On each trial, participants were presented with a five-letter set and, after a delay, had to indicate whether the probe was present in the set. Each trial was preceded by a cue screen indicating whether participants could earn rewards (stars) on this trial and followed by feedback on performance. The next trial started after an intertrial interval lasting 2.5, 5 or 7.5 s.

Other behavioural measures.
After scanning, participants completed computerised versions of the Behavioural Activation and Inhibition Scale (BIS/BAS; Carver and White, 1994); Sensitivity to Punishment and Sensitivity to Reward Questionnaire (SPSRQ; Torrubia et al., 2001); WEBEXEC, a web-based short self-report of executive functions (Buchanan et al., 2010); and a simple Go/No Go task (Simmonds et al., 2008; Humphrey and Dumontheil, 2016). Lastly, participants completed forward and backward digit span tasks and the Vocabulary subtest of the WASI-II. In addition, after scanning, participants rated how rewarding they found both stars and money on a scale from 1 (not at all) to 5 (very rewarding).

Procedure
Participants were trained on the letter-array WM task outside the scanner. After receiving task instructions, participants performed one block of 10 trials with a cut-off time of 2.5 s and one further block of 15 trials with their individual cut-off time limit (median RT in the first 10 trials). This was done to adapt task difficulty to each individual and quickly achieve a consistent level of performance.
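The individual cut-off described above can be illustrated with a minimal sketch. The function names, the use of correct trials only (following the task description), and the order in which feedback categories are checked are assumptions made for illustration; the task itself was programmed in Cogent/MATLAB and its code is not reproduced here.

```python
from statistics import median


def individual_cutoff(rts_s, correct):
    """Return the median RT (in seconds) over correct practice trials.

    Hypothetical helper: the source only states that the cut-off was the
    participant's median RT in the first 10 practice trials.
    """
    correct_rts = [rt for rt, ok in zip(rts_s, correct) if ok]
    if not correct_rts:
        raise ValueError("no correct practice trials to compute a cut-off")
    return median(correct_rts)


def feedback(rt_s, is_correct, cutoff_s, reward_trial):
    """Map a response onto the four feedback categories of the task.

    Assumption: accuracy is checked before speed.
    """
    if not is_correct:
        return "incorrect"
    if rt_s > cutoff_s:
        return "too slow"
    return "correct and rewarded" if reward_trial else "correct and not rewarded"


practice_rts = [1.2, 0.9, 1.5, 1.1, 1.0, 1.3, 0.8, 1.4, 1.0, 1.6]
practice_ok = [True] * 9 + [False]
cutoff = individual_cutoff(practice_rts, practice_ok)
```

With these made-up practice data, a correct response slower than the resulting cut-off would be classed as too slow, regardless of trial type.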
In the scanner, participants first performed the Baseline run (30 Baseline trials) of the task. At this point, participants were naïve regarding the chance to earn further money based on their performance on the task. Participants were then given further instructions regarding the reward component of the second run; they were told: 'In the second run, in some of the trials you can earn stars. Stars will turn into money in the end. You can win up to £8.00'. Participants were also introduced to the reward cues and reward feedback. Participants then performed the Reward run (15 Reward trials, 15 No reward trials). The order of the trial types was fixed in one of two possible sequences, which were counterbalanced across participants. Sequences started with the presentation of a Reward trial and did not present the same trial type more than twice in a row (i.e. RNRRN RNRN RNRNN NRNRRN). Task blocks lasted 57.5-95.0 s and alternated with fixation periods lasting 21.9-29.7 s. Block starts and ends were indicated by a 1.5 s instruction screen. The task was programmed in Cogent (www.vislab.ucl.ac.uk/cogent_graphics.php) running in MATLAB (The MathWorks, Inc., Natick, MA).
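The ordering constraints on trial types can be checked with a short sketch. It assumes that the 'no more than twice in a row' rule applies within a task block, because the example sequence printed above has three N trials in a row across a block boundary; the function names are hypothetical.

```python
from itertools import groupby


def longest_run(trials):
    """Length of the longest run of identical trial types in a string."""
    return max((len(list(group)) for _, group in groupby(trials)), default=0)


def check_sequence(blocks, max_run=2):
    """Check a trial order given as a list of block strings of 'R'/'N'.

    Rules taken from the text: the sequence starts with a Reward trial and
    no trial type repeats more than `max_run` times in a row.
    Assumption: runs are counted within blocks, not across fixation periods.
    """
    trials = "".join(blocks)
    if not trials or set(trials) - {"R", "N"}:
        return False
    if trials[0] != "R":
        return False
    return all(longest_run(block) <= max_run for block in blocks)


example_blocks = "RNRRN RNRN RNRNN NRNRRN".split()
example_ok = check_sequence(example_blocks)
```

Under this within-block reading the printed example passes the check, whereas treating the same sequence as one continuous string would flag the N trials around the third block boundary.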

Image acquisition
Functional data were acquired using the multiband echo-planar imaging sequence from the Center for Magnetic Resonance Research (CMRR) at the University of Minnesota (Xu et al., 2013), with 2× acceleration and leak block on (Cauley et al., 2014), with blood-oxygen-level-dependent (BOLD) contrast (44 axial slices with a voxel resolution of 3 × 3 × 3 mm covering most of the cerebrum; repetition time (TR) = 2 s; echo time (TE) = 45 ms; acquisition time (TA) = 2 s) in a 1.5 T MRI scanner with a 30-channel head coil (Siemens TIM Avanto, Erlangen, Germany). Participants completed two scanning runs in which 321 functional volumes were obtained. A T1-weighted Magnetization Prepared - RApid Gradient Echo (MPRAGE) anatomical image with 2× GeneRalized Autocalibrating Partial Parallel Acquisition (GRAPPA) acceleration, lasting 5 min 30 s, was acquired after the functional runs.

Data analysis
Behavioural data analysis. 2 (age group) × 3 (trial type: Baseline, No reward, Reward) mixed-model repeated measures analyses of variance (ANOVAs) were performed on mean RT for correct trials and on accuracy in the letter-array WM task. Models were fitted in R 3.5.2 (R Core Team, 2018) using afex (Singmann et al., 2018). Greenhouse-Geisser correction was employed for violation of sphericity and Tukey correction for multiple comparisons.
MRI data pre-processing. MRI data were pre-processed and analysed using Statistical Parametric Mapping (SPM12, Wellcome Trust Centre for Neuroimaging, http://www.fil.ion.ucl.ac.uk/spm/). Images were realigned to the first analysed volume with a second-degree B-spline interpolation. The bias-field corrected structural image was coregistered to the mean, realigned functional image and segmented using Montreal Neurological Institute (MNI)-registered International Consortium for Brain Mapping tissue probability maps. The resulting spatial normalisation parameters were applied to obtain normalised functional images with a voxel size of 3 × 3 × 3 mm, which were smoothed with an 8 mm full width at half maximum Gaussian kernel. Realignment estimates were used to calculate framewise displacement (FD) for each volume (Siegel et al., 2014). Volumes with an FD >0.9 mm were censored and excluded from general linear model estimation by including a regressor of no interest for each censored volume. Adolescents and adults did not differ in estimated movements (all P's >0.28), except that adolescents had a lower root mean square translational movement (M = 0.18, s.d. = 0.07) than adults (M = 0.24, s.d. = 0.12, P = 0.03).
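The censoring step can be sketched as follows. This is an illustration only: it assumes a commonly used definition of FD (sum of absolute volume-to-volume changes in the six realignment parameters, with rotations in radians converted to millimetres on a 50 mm sphere), which the text does not spell out, and the function names are hypothetical.

```python
def framewise_displacement(params, radius_mm=50.0):
    """FD per volume from realignment parameters.

    `params` is a list of [tx, ty, tz, rx, ry, rz] rows, with translations
    in mm and rotations in radians. The first volume gets FD = 0.
    Assumption: 50 mm sphere radius for converting rotations to mm.
    """
    fd = [0.0]
    for prev, cur in zip(params, params[1:]):
        trans = sum(abs(c - p) for c, p in zip(cur[:3], prev[:3]))
        rot = sum(abs(c - p) for c, p in zip(cur[3:6], prev[3:6]))
        fd.append(trans + radius_mm * rot)
    return fd


def censor_regressors(fd, threshold_mm=0.9):
    """One indicator regressor of no interest per volume with FD > threshold."""
    n_volumes = len(fd)
    flagged = [i for i, value in enumerate(fd) if value > threshold_mm]
    return [[1 if t == i else 0 for t in range(n_volumes)] for i in flagged]


# Made-up motion estimates for four volumes.
motion = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.5, 0.5, 0.0, 0.0, 0.002],
    [0.1, 0.5, 0.5, 0.0, 0.0, 0.002],
]
fd = framewise_displacement(motion)
regressors = censor_regressors(fd)
```

Each returned regressor is 1 at a single flagged volume and 0 elsewhere, so including it in the general linear model removes that volume's influence on the other parameter estimates.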

Block- and event-related fMRI data analysis.
Sustained activity was modelled in the Baseline and Reward runs separately using extended boxcar regressors representing task and fixation blocks. Transient activity was modelled using two boxcar regressors of 10.5 s, representing correctly answered Reward trials and No reward trials (in the Baseline run, this distinction was arbitrary but matched the order of Reward and No reward trials in the Reward run). Other regressors were start of blocks (1.5 s), end of blocks (1.5 s), incorrect trials (10.5 s), censored volumes and session means. Regressors were convolved with a canonical haemodynamic response function. The data and model were high-pass filtered at 1/128 Hz.
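To illustrate how a 10.5 s trial boxcar becomes a predicted BOLD time course, the sketch below convolves a boxcar with a double-gamma response function and samples the result at the TR. The function names, the 0.5 s modelling grid and the particular double-gamma parameters (response peaking around 5 s, undershoot scaled by 1/6) are assumptions for illustration; SPM12 builds its regressors internally and its exact numerical implementation is not reproduced here.

```python
import math


def double_gamma_hrf(dt, length_s=32.0):
    """Double-gamma response function sampled every `dt` seconds (sums to 1)."""
    n = int(round(length_s / dt)) + 1
    hrf = []
    for i in range(n):
        t = i * dt
        peak = t ** 5 * math.exp(-t) / math.gamma(6)
        undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
        hrf.append(peak - undershoot / 6.0)
    total = sum(hrf)
    return [value / total for value in hrf]


def boxcar(onsets_s, duration_s, total_s, dt):
    """1 during each event, 0 elsewhere, on a grid of step `dt`."""
    n = int(round(total_s / dt))
    series = [0.0] * n
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = int(round((onset + duration_s) / dt))
        for i in range(start, min(stop, n)):
            series[i] = 1.0
    return series


def convolve(signal, kernel):
    """Causal discrete convolution truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out


def trial_regressor(onsets_s, n_volumes, tr=2.0, duration_s=10.5, dt=0.5):
    """Convolved trial regressor with one value per acquired volume."""
    fine = convolve(boxcar(onsets_s, duration_s, n_volumes * tr, dt),
                    double_gamma_hrf(dt))
    return fine[::int(round(tr / dt))]


regressor = trial_regressor([10.0], n_volumes=30)
```

The same construction with much longer durations would give the extended block regressors used for sustained activity; only the onsets and durations change.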
Two second-level whole-brain random-effect flexible factorial analyses were performed to look at sustained and transient patterns of activation. The first included the factors subject, age group and block type [(Baseline blocks − fixation blocks), (Reward blocks − fixation blocks)], modelling subject as a main effect and the age group × block type interaction. The second analysis similarly included the factors subject, age group and trial type (Baseline trials, No reward trials, Reward trials event-related activation).
Statistical contrasts were thresholded at P < 0.001 at the voxel-level with cluster size family-wise error (FWE) correction (P < 0.05) corresponding to a minimum cluster size of 82 voxels. In addition, activations that survived whole-brain FWE correction at P < 0.05 are indicated. Automatic anatomical labelling was done using AAL2 (Tzourio-Mazoyer et al., 2002; Rolls et al., 2015) and manual Brodmann area labelling with mricron (Rorden et al., 2007). Regions that exhibited mixed sustained and transient effects were identified by running the transient contrasts inclusively masked by the sustained contrasts. Conversely, to identify regions that were exclusively sustained or transient, the relevant contrast was exclusively masked (P uncorr < 0.05). Statistical maps for all whole-brain, voxel-wise analyses are available at https://neurovault.org/collections/4686/.
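The logic of inclusive and exclusive masking can be sketched on voxel-wise P values. This is a simplified illustration: it ignores the cluster-extent threshold, assumes the inclusive mask uses the same uncorrected P < 0.05 threshold stated for the exclusive mask, and uses hypothetical names and made-up values.

```python
def classify_voxels(p_transient, p_sustained, p_contrast=0.001, p_mask=0.05):
    """Split voxels into mixed, transient-only and sustained-only sets.

    Both arguments map a voxel id to an uncorrected P value for the
    transient and sustained reward contrasts respectively.
    """
    mixed, transient_only, sustained_only = set(), set(), set()
    for voxel in p_transient.keys() & p_sustained.keys():
        pt, ps = p_transient[voxel], p_sustained[voxel]
        if pt < p_contrast and ps < p_mask:
            # Transient contrast inclusively masked by the sustained contrast.
            mixed.add(voxel)
        elif pt < p_contrast:
            # Transient contrast exclusively masked by the sustained contrast.
            transient_only.add(voxel)
        elif ps < p_contrast and pt >= p_mask:
            # Sustained contrast exclusively masked by the transient contrast.
            sustained_only.add(voxel)
    return mixed, transient_only, sustained_only


p_trans = {"a": 0.0001, "b": 0.0001, "c": 0.2, "d": 0.01}
p_sust = {"a": 0.01, "b": 0.3, "c": 0.0005, "d": 0.0005}
mixed, transient_only, sustained_only = classify_voxels(p_trans, p_sust)
```

Voxel "d" in this toy example falls into none of the three sets: it passes the sustained contrast but is removed by the exclusive transient mask, and it does not pass the transient contrast that defines the mixed set.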

Regions of interest analyses. Region of interest (ROI) analyses were performed on extracted mean signal within regions that exhibited a mixed pattern of transient and sustained sensitivity to reward to explore possible interaction effects between task, condition and age group using the mixed block/event analysis parameter estimates. ROIs were defined using MarsBar (Brett et al., 2002) as 10 mm radius spheres centred on the peak coordinates of clusters identified in the relevant contrasts.
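A spherical ROI of this kind can be approximated on the 3 mm voxel grid as below. This is a geometric sketch only: it assumes the peak lies on a voxel centre and selects voxel centres within 10 mm, which may differ from how MarsBar resamples a sphere; the function names and the example peak coordinate are hypothetical.

```python
def sphere_voxels(peak_mm, radius_mm=10.0, voxel_mm=3.0):
    """Voxel-centre coordinates (mm) within `radius_mm` of `peak_mm`."""
    reach = int(radius_mm // voxel_mm)
    px, py, pz = peak_mm
    voxels = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2:
                    voxels.append((px + i * voxel_mm,
                                   py + j * voxel_mm,
                                   pz + k * voxel_mm))
    return voxels


def roi_mean(estimates, voxels):
    """Mean parameter estimate over the ROI voxels present in `estimates`."""
    values = [estimates[v] for v in voxels if v in estimates]
    if not values:
        raise ValueError("ROI does not overlap the data")
    return sum(values) / len(values)


peak = (33.0, 21.0, -3.0)  # made-up coordinate, not a reported peak
roi = sphere_voxels(peak)
toy_estimates = {voxel: 1.0 for voxel in roi}
toy_mean = roi_mean(toy_estimates, roi)
```

The resulting per-participant means would then enter the age group by condition analyses in place of voxel-wise values.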

Behavioural results
Accuracy and speed in the letter-array WM task increased with age (Table 1) and differed between trial types (accuracy: F(1.7,84.3) = 4.20, P = 0.02, ηp² = 0.02; RT: F(1.6,77.9) = 40.50). Participants were more accurate in Reward trials than in Baseline trials, with similar, but not significant, increased accuracy in No reward trials (Fig. 2A). Participants were faster in No reward trials than in Baseline trials and even faster for Reward trials (Fig. 2B). There was a trend for a decrease in reward sensitivity (z-score normalised composite index of the two self-report indices, SPSRQ and BIS/BAS) with age. A post hoc analysis revealed adolescents were more sensitive to rewards than adults when assessed with the SPSRQ, but not the BIS/BAS (Table 1). Adolescents and adults earned comparable amounts of money although adolescents reported finding money incentives more rewarding than adults (Table 1). Adults had greater backward digit span scores than adolescents, but the two age groups did not differ on the forward digit span task, WEBEXEC or Go/No Go task (Table 1). When including backward digit span score as a covariate in the mixed design ANOVAs of the letter-array WM task, the difference between age groups in accuracy became nonsignificant (F(1,46) = 1.21, P = 0.28); however, the RT difference remained (F(1,46) = 5.82, P = 0.02).
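A z-score normalised composite of two questionnaire scores can be computed as in the sketch below. It assumes the composite is the mean of the two z-scores, standardised across all participants with the sample standard deviation; the text does not state these details, and the function names and scores are made up.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise scores across participants (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_index(scale_a, scale_b):
    """Mean of the two z-scored scales, one value per participant."""
    return [(a + b) / 2 for a, b in zip(z_scores(scale_a), z_scores(scale_b))]


spsrq_reward = [1.0, 2.0, 3.0]      # made-up scores for three participants
bas_total = [10.0, 20.0, 30.0]      # made-up scores for the same participants
reward_sensitivity = composite_index(spsrq_reward, bas_total)
```

Standardising first puts the two scales on a common footing, so neither questionnaire dominates the composite simply because of its scoring range.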

Neuroimaging results
Baseline activation during the WM task. A broad network of regions showed sustained increased BOLD signal during letter-array WM task blocks compared to fixation blocks in the Baseline run (Table 2 and Fig. 3A). Activation was overall more extensive in the left hemisphere, which may reflect the verbal nature of the task but also overlapped with the 'default mode network'. In the frontal lobes, bilateral activation was observed in the superior frontal gyri (SFG) and anterior part of the inferior frontal gyri (IFG), extending along the medial wall into the anterior aspect of the ACC. There was increased bilateral parietal activity in the left and right angular gyri, as well as in the left middle temporal gyri and medial and left inferior occipital gyri. Activation in the occipital cortex and angular gyrus bilaterally, left fusiform gyrus, middle temporal gyrus and temporal poles, inferior and superior frontal gyri and putamen survived voxel-level whole-brain correction (Table 2). Compared to adolescents, adults showed increased activation, which survived whole-brain correction at the cluster-level (but not voxel-level) in the left superior frontal and superior medial gyri, extending into the ACC, precentral gyrus and supplementary motor area (SMA), and activity in the lingual gyri (Table 2 and Fig. 3A).

Table 1 (final rows only)
Measure | Adolescents | Adults | Group difference
(label missing) | 17.0 ± 0.6 | 18.1 ± 0.7 | n.s. (P = 0.24)
Backward digit span total score (possible range: 1-22) | 8.9 ± 0.6 | 11.8 ± 0.8 | F(1,48) = 10.62, P < 0.002, ηp² = 0.18
No go accuracy (%) | 87.8 ± 1.7 | 91.7 ± 1.9 | n.s. (P = 0.14)
WEBEXEC (possible range: 6-24) b | 13.3 ± 0.5 | 12.9 ± 0.7 | n.s. (P = 0.57)
Total money earned (£) (possible range: 0-8) | 6.50 ± 0.24 | 6.67 ± 0.23 | n.s. (P = 0.63)
Rating of monetary incentive (possible range: 1-5) c | 4.17 ± 0.14 | 3.45 ± 0.30 | F(1,48) = 6.02, P = 0.02, ηp² = 0.11
a Higher scores indicate more sensitivity to reward. b Higher scores indicate more executive function failures. c Higher scores indicate finding money more rewarding.
Widespread transient increased BOLD activation was observed in frontal, parietal and temporal regions, during WM task trials in the Baseline run, surviving both cluster-level and voxel-level whole-brain correction (Table 2 and Fig. 3B). In the frontal lobes, bilateral activation was observed in the SFG and anterior part of the IFG, as well as orbitofrontal cortex, extending along the medial wall predominantly into the middle cingulate cortex. There was increased bilateral activity in the insulae, in the angular gyri in the parietal cortex, as well as in the middle temporal gyri and inferior occipital gyri. Increases in subcortical activation were observed in the caudate and putamen, as well as in the thalamus and hippocampus bilaterally. There was widespread bilateral activation in the cerebellum. Adults showed increased activity in the precentral gyrus bilaterally, extending predominantly into the left postcentral gyrus compared to adolescents (left hemisphere peaks survived whole-brain correction at the voxel-level, except clusters in left hippocampus and postcentral gyrus), while adolescents exhibited less deactivation in medial PFC and precuneus than adults (Table 2 and Fig. 3B).

Reward effects.
Sustained effects of reward were assessed by contrasting task block activation of the Reward run and the Baseline run. This resulted in a large cluster peaking in the right insula and extending into bilateral ventral PFC, right dorsolateral PFC and the left insula, as well as into the caudate and putamen subcortically. There was an additional cluster in the right angular and supramarginal gyri. Bilateral ventroprefrontal, insula, caudate and occipital peaks survived whole-brain correction at the voxel-level (Table 3 and Fig. 4A). The pattern of activation largely did not overlap with the sustained WM task effects (Fig. 3A vs Fig. 4A). No increased activation was observed in the reverse contrast [Baseline blocks > Reward blocks].
Transient effects of reward were assessed by contrasting activation in Reward trials with No reward trials within the Reward run. This resulted in a large cluster peaking in inferior middle occipital gyrus and extending into the middle occipital gyrus, the superior and inferior parietal cortex bilaterally, right ventral and dorsolateral PFC and along the medial wall the medial frontal cortex and anterior and middle cingulate cortex, as well as the right insula. There were additional clusters in the left insula and right precentral and middle frontal gyri. Subcortical activity was observed bilaterally in the caudate nucleus extending slightly into accumbens, pallidum, thalamus, and bilateral hippocampi. There was widespread activation of cerebellar regions. Peaks located in posterior occipital brain regions, inferior parietal cortex, subcortical regions and the cerebellum survived whole-brain correction at the voxel-level. Among anterior brain regions, the ACC and insulae were the only peaks surviving voxel-level correction. No increased activation was observed in the reverse contrast [No reward trials > Reward trials] (Table 3 and Fig. 4B).
To further explore the pattern of transient changes in BOLD signal according to reward, Reward and No reward trials were contrasted to Baseline trials (Table 3). Reward trials were associated with less deactivation of the precuneus and left lingual gyrus and middle occipital cortex than Baseline trials (the latter was not significant with voxel-level whole-brain correction) and greater activation than Baseline trials within a subset of the left lingual gyrus cluster (Fig. 5A). No difference in activation was observed in the reverse contrast [Baseline trials > Reward trials]. No reward trials showed, similarly to Reward trials, less deactivation in the precuneus than Baseline trials, as well as less deactivation than Baseline trials in the left superior frontal gyrus (the latter was not significant with voxel-level whole-brain correction) (Fig. 5B). Finally, Baseline trials showed higher activation than No reward trials in bilateral insulae, left precentral gyrus, medial frontal gyrus extending into middle cingulate gyrus and left inferior frontal cortex; there was also activation in the caudate and inferior and middle occipital gyri (Fig. 5C). Only activations in the bilateral insulae survived voxel-level whole-brain correction. The predominant pattern across regions showing transient increases in activation during the WM trials was therefore No reward trials < Baseline trials < Reward trials.
Inclusive and exclusive masking contrasts indicated that bilateral insulae (surviving voxel-wise whole-brain correction), right angular gyrus and a subcortical cluster including right caudate nucleus, thalamus and left pallidum exhibited reward context-related changes in both transient and sustained activity. No regions exhibited an exclusively sustained pattern of activation, but the more anterior aspect of the ACC as well as some cerebellar and occipital areas exhibited transient changes only in response to reward. Activations in the inferior occipital lobe and ACC survived voxel-level whole-brain correction (Table 3).

ROI analyses.
To explore age effects in regions identified to have a mixed pattern of response to rewards, we extracted mean parameter estimates in 10 mm radius spheres centred on the peaks of the four clusters exhibiting modulation by reward of both transient and sustained activation (left and right insulae, angular gyrus and caudate, see Table 3). The left AI showed a significant interaction between run and age group: adolescents exhibited a greater increase in reward-dependent sustained activation than adults (F(1,48) = 6.35, P = 0.02, ηp² = 0.05, Fig. 6A). The right AI showed a similar pattern (F(1,48) = 3.91, P = 0.05, ηp² = 0.02) (Fig. 6B). Analyses of transient activations showed that adults exhibited increased overall activity in the right AI (F(1,48) = 4.79, P = 0.03, ηp² = 0.08), but not the left AI (F(1,48) = 2.22, P = 0.14, ηp² = 0.04), across Reward and No reward trials compared to adolescents (Fig. 6C and D). No other age effects were identified.

Discussion
We examined the impact of reward on sustained and transient engagement of cognitive control, and whether differences exist between adolescence and adulthood. In the letter-array WM task, high accuracy rates can be achieved with a reactive control strategy. However, to produce accurate responses that are fast enough, the optimal strategy is to proactively sustain the task set and rule use across trials, in anticipation of the stimuli (Jimura et al., 2010). Results showed similar behavioural and neural evidence for engagement of proactive and reactive strategies in the context of reward for adolescents and adults.

Proactive control
RTs were faster in No reward than Baseline trials, suggesting sustained performance improvement in the Reward run associated with a proactive cognitive control strategy (Jimura et al., 2010). Reward blocks were associated with increased frontoparietal activity outside of the network recruited in the main WM block contrast. Our results align with findings of reward-related increased sustained activity in the right lateral PFC (middle frontal gyrus) and right PPC (angular gyrus), regions associated with proactive control in adults (Locke and Braver, 2008; Jimura et al., 2010) and children, adolescents and adults (Strang and Pollak, 2014). An increased cognitive control system engagement could also be reflective of higher load due to the introduction of two different reward conditions, Reward and No reward trials. However, this is unlikely to reflect a task-switching load, as the WM task stays constant [there is no perceptual, response or set shifting needed (Kim et al., 2012)], and we see a pattern of improved performance rather than the performance cost typically associated with a switching context (Monsell, 2003). In addition, there was evidence for sustained activation in regions typically associated with reward across age groups: the caudate nucleus, putamen and orbitofrontal cortex (Silverman et al., 2015).

Reactive control
Adolescents and adults were fastest for Reward trials, with a similar trend for accuracy, which points to a trial-by-trial reward enhancement reflecting reactive control. Reward trials were associated with increased transient activity in cortical regions recruited overall during the letter-array WM task (AI, ACC and parietal cortex), in contrast to the block effects of reward, which did not overlap with WM regions. The overall pattern shows intermediate activations for Baseline trials compared to No reward and Reward trials. A possible interpretation of these results is that, as transient activation is not as resource consuming as sustained activation, there is still scope for increased trial-by-trial recruitment of WM regions to increase the chance of obtaining a reward, on top of WM transient activation. In addition to the sustained activation, Reward trials also recruited the right orbitofrontal cortex, the caudate nucleus and putamen (Haber and Knutson, 2010).
In the present study, transient activity in the ACC might be related to increased monitoring in response to reward. The ACC has been proposed to be involved in performance monitoring and, when conflict arises, is thought to recruit higher-order control structures in the lateral PFC (Botvinick and Braver, 2015). In this case, conflict might signal a cost-benefit analysis where the cost of task performance is weighed against the expected value of its outcome and the effort required (Shenhav et al., 2013). Further, exclusively transient activations were observed in anterior aspects of the ACC, which fits with previous evidence that this region is selectively involved in compensatory reactive control processes (Jiang et al., 2015).

Mixed and exclusively transient regions
Bilateral insulae, the right PPC and the caudate exhibited a mixed pattern, with both higher sustained activity for the Reward run than the Baseline run and higher transient increases in activity for Reward trials compared to No reward trials, with Baseline trials showing intermediary levels of transient activity (i.e. No reward < Baseline < Reward in most cases). The AI emerged as the key mixed region involved in proactive and reactive control in response to reward, while the DLPFC had a more sustained pattern of activation in the baseline WM run. Sustained activation in the DLPFC in the reward context may reflect increased top-down 'boosting' excitatory connectivity to more posterior regions, which may facilitate maintenance of representations over a delay (Edin et al., 2009). The AI has been implicated in top-down control processes including task-set maintenance (Dosenbach et al., 2008;Nelson et al., 2010) and tracking cognitive control demand stability (Jiang et al., 2015). It has also been implicated in bottom-up salience detection of relevant cues (Menon and Uddin, 2010) as part of the salience network and key cognitive-emotional hub (Menon and Uddin, 2010;Smith et al., 2014). It has been suggested that the AI may support the transient detection of salient stimuli and the initiation of attentional control signals, which are then sustained by the ACC, the ventrolateral PFC and the DLPFC (Menon and Uddin, 2010). As one of the most commonly activated areas in fMRI studies, the AI has been implicated in controlling attention as a function of task demands (see Nelson et al., 2010 for a review).
The caudate nucleus has been implicated in processing extrinsic reward related to monetary gains and losses (Haber and Knutson, 2010;Richards et al., 2013). We found that activity in the caudate nucleus was greater in Reward blocks than Baseline blocks and that it tracked trial reward status as baseline activity was maintained for Reward trials only, while activation was lower for No reward trials. A speculative interpretation of these results is that once the explicit reward trials were introduced, the value of the No reward trials dropped compared to their starting level. However, the caudate nucleus has also been found to be activated in WM tasks in the absence of rewards (e.g. Ziermans et al., 2012), and in the present study there were transient increases in activation in the caudate in Baseline WM task trials, which suggests a non-exclusive reward role of the caudate.
Our results speak to the debate surrounding two underlying configurations which have been proposed for the DMC (Jiang et al., 2015). One of the accounts proposes that proactive and reactive control are implemented by different dynamics within the same region of the right dorsolateral PFC (BA 46/9) (Braver et al., 2009;Burgess and Braver, 2010;Jimura et al., 2010). Other accounts propose that different strategies are implemented by distinct brain regions (De Pisapia and Braver, 2006;Jiang et al., 2015). Here, we found evidence that both mechanisms might be at play. The bilateral AI, right PPC and caudate nucleus showed a mixed pattern of response while the anterior aspects of the ACC showed transient effects of reward only.

Developmental effects
In line with developmental studies of cognitive control (Humphrey and Dumontheil, 2016), adults had greater overall accuracy and faster RT than adolescents, as well as greater backward digit span scores (Karakas et al., 2002). However, since speed thresholds were determined individually, adolescents and adults earned comparable monetary rewards. Adolescents reported finding monetary incentives more rewarding than adults. Comparable performance between adolescents and adults might be driven by increased motivation to perform by the adolescents, perhaps associated with finding money more rewarding.
Adolescents showed a trend for more reward sensitivity than adults (Galván, 2013;van Duijvenvoorde et al., 2016). Post hoc analyses revealed adolescents had significantly greater reward sensitivity than adults on the SPSRQ (Torrubia et al., 2001), which assesses reward sensitivity per se, but not on the approach motivation subscale of the BIS/BAS (Carver and White, 1994). However, we did not find age differences in reward-related brain activation. Although age differences between adolescents and adults are often described in the neuroimaging reward literature, they are not consistent across stages of reward processing or type of task (Galván, 2010, 2013). In the absence of rewards, previous developmental work has suggested that adolescents employ a more reactive than proactive cognitive control strategy (Velanova et al., 2009;Andrews-Hanna et al., 2011;Alahyane et al., 2014). In the present study, we show, as did Strang and Pollak (2014), that in the context of potential rewards, adolescents, like adults, can sustain cognitive control proactively. By also examining reactive control, in contrast to Strang and Pollak (2014), we provide evidence that adolescents, like adults, show additional improvements in a trial-by-trial fashion.
In follow-up ROI analyses of the main results, we found that transient activation was overall greater in the right AI in adults than in adolescents but that both age groups showed similar increases in transient activation in Reward trials compared to No reward trials. A different pattern was observed for sustained activation, whereby adolescents showed a greater increase in activation in task blocks of the Reward run compared to the Baseline run in the left AI, 'catching up' with adult levels of activation. Adolescents may be relying on an adaptive mechanism of sustained, but not transient, increase in AI activation. This speaks to an immature proactive capacity in adolescents that is only engaged in the context of reward. The role of the AI in adolescent decision-making processes is increasingly recognised, suggesting that the relative immaturity of this cognitive-emotional hub, which is connected to both the lateral PFC and striatum, may bias adolescents in affectively driven contexts (for a review, see Smith et al., 2014). Here we suggest that a sensitivity to reward context in the AI may support increased sustained engagement of cognitive control in some instances.

Limitations and future directions
Varying between Reward and No reward trials could be an additional component of the task which may have led to an increase in sustained activation of the cognitive control system by making the task more engaging. However, a greater overall engagement could not account for transient differences between Reward and No reward trials. Order effects are a limitation of the current study (as in Jimura et al., 2010). Counterbalancing the order of blocks was not possible, because participants needed to be naïve regarding potential rewards at first so that their baseline performance could be determined. To minimise order effects related to practice, we introduced a long practice period to ensure that participants' performance stabilised before the scanning runs. It is possible that the better performance observed in the second run could still be driven in part by practice effects. Plots of RT and accuracy as a function of trial number and block number indeed suggest that in adolescents the RT difference between Baseline and No reward trials may have been driven by practice effects (Supplementary Figure S1). However, this is not apparent in adults, nor in the accuracy data (Supplementary Figure S2). Practice effects, however, could not explain the difference in RT and brain activation between Reward and No reward trials. Although demanding, the task was not very difficult, as reflected by high accuracy rates. It might be that the balance between proactive and reactive strategies begins to emerge in more challenging cognitive control tasks, and future studies could investigate this.

Conclusion
This study shows behavioural and neuroimaging evidence of modulation of both proactive and reactive control by reward in adults and in adolescents. Proactive and reactive control were found to be supported both by partly separable frontoparietal neural circuitries and by regions that exhibit both sustained and transient modulation by reward. In the face of incentives, adolescents and adults can sustain cognitive control in a proactive fashion, with additional transient readjustments in response to the reward. There is some evidence of adaptive higher sustained activation in the AI by adolescents in the context of reward.

Supplementary data
Supplementary data are available at SCAN online.