The prefrontal cortex (PFC) of the rat supports cognitive flexibility, the ability to spontaneously adapt goal-directed behavior in response to radically changing situational demands. We have shown previously that transient inactivation of the rat medial PFC (mPFC) impairs initial reversal learning in a spatial 2-lever discrimination task. Given the importance of dopamine (DA) for PFC function, we studied DA (and noradrenaline [NA]) efflux in the mPFC during reversal learning. We observed a higher and more extended increase in DA efflux in rats performing the first reversal compared with controls performing the previously acquired discrimination. The results of an additional experiment suggest that such a difference between the reversal- and control-induced DA increases was absent during a third reversal. During the extinction session, DA efflux did not increase from basal levels. Increases in NA efflux were less than in DA and did not differ between control and any condition. We conclude that prefrontal DA activity is increased during execution of instrumental discrimination tasks and that this increase is amplified during the acquisition of a first, but not of later reversals. These data corroborate our previous findings and indicate that DA is critically involved in this form of cognitive flexibility.
Cognitive flexibility is the ability to spontaneously adapt goal-directed behavior in response to radically changing situational demands. As a complex process in itself, flexibility calls upon specific operations in working memory, and cognitive capacities such as performance monitoring, novelty detection, inhibition, maintenance and shifting of mental set and decision making. The elucidation of neural circuits and neurochemical mechanisms involved in these capacities could be of major importance for the development of effective interventions for various psychiatric disorders (Rahman and others 2001).
There is accumulating evidence from nonhuman primates and rodents that separate prefrontal cortex (PFC) regions influence distinct levels of complexity of cognitive flexibility (Uylings and others 2003). Dias and others (1997) showed that lesions of the lateral PFC of marmosets impaired the shifting of attentional set between 2 stimulus dimensions (extramodal shift) but not the reversal of previously learned response–reward relations, which was impaired by orbitofrontal lesions. Lesion studies in rats performing a food-search task resulted in comparable effects: Extramodal shifting depends on the medial PFC (mPFC), the prelimbic cortex in particular (Birrell and Brown 2000), whereas reversal learning in these settings does not and instead calls upon the orbital PFC (McAlonan and Brown 2003). However, mPFC supports response selection in general when effortful or higher order processing is involved (Granon and others 1994; Kesner and Ragozzino 2003). This not only includes rule shifting in various tasks (De Bruin and others 1994; Ragozzino and others 1999) but also reversal learning in “difficult” nonspatial (Bussey and others 1997) and in spatial instrumental tasks (De Bruin and others 2000). Indeed, the mPFC has a special role in spatial planning and in associating places with their motivational salience (Hok and others 2005) and interacts with the hippocampus in subserving the spatial attribute within the rule-based memory systems (Kesner and Rogers 2004). In accordance with this we have reported that transient inactivation of the mPFC but not the orbital/lateral PFC impaired reversal learning in an instrumental 2-lever spatial-discrimination task, while leaving discrimination learning or extinction unaffected (De Bruin and others 2000).
The PFC receives a dense dopaminergic (DA) innervation from the ventral tegmental area, which is known to be important for prefrontal functions such as working memory (review, Goldman-Rakic and others 2000). Given the presumed importance of working memory for cognitive flexibility, it is not surprising that recent evidence indicates that prefrontal DA is important for cognitive flexibility as well. Prefrontal DA depletion in marmosets impaired attentional set-shifting (Crofts and others 2001), whereas blockade of the D1 receptor in the rat mPFC affected rule shifting in a food-search task (Ragozzino 2002). Preliminary findings suggest that this is also the case in spatial reversal (but not discrimination or extinction) learning in an instrumental task (De Bruin and others 2000).
These results indicate an important role for DA in mPFC-related flexibility. We hypothesize that DA efflux will be increased in initial reversal learning, when the task becomes more complex and a new response–reward association is learned. We tested this in 2 microdialysis experiments in which DA efflux was measured simultaneously in the mPFC of the rat during reversal learning in operant boxes. As the initial phase of reversal learning involves an inhibition of previously established response–reward relationships, we performed additional experiments to study the effects of extinction learning on catecholamine efflux in a separate group.
Noradrenergic (NA) fibers originating in the locus coeruleus (LC) innervate the PFC and have been implicated in prefrontal functions as well (Arnsten and Li 2005). LC neurons are activated during discrimination and reversal phases of goal-directed activity (Bouret and Sara 2004), and their tonic activity is increased in reversals of response–reward relations (Aston-Jones and others 1997). Therefore, we simultaneously measured NA efflux (Feenstra and others 1998) to compare the activation of both catecholamines during reversal and extinction learning.
Materials and Methods
All experiments were approved by the Animal Experimentation Committee of the Royal Netherlands Academy of Arts and Sciences and were carried out in agreement with Dutch laws (Wet op de Dierproeven 1996) and European regulations (Guideline 86/609/EEC).
Male, experimentally naive Wistar rats (Harlan/CPB-WU, 275–325 g at the time of the experiment) were kept 4 to a cage in a temperature- and humidity-controlled room under a reversed light/dark cycle. All experiments were carried out in the dark period (dimmed red light from 07:00 AM to 07:00 PM). Two days before the start of training, the animals were put on a food-restriction schedule to keep them at 90% of their free-feeding weight.
Training and testing were conducted in dialysis-compatible Skinner boxes, measuring 29.2 × 24.1 × 21 cm (MED Associates, Georgia, VT) enclosed by a sound- and light-attenuating chamber and dimly illuminated by a house light oriented toward the ceiling. All Skinner boxes were equipped with 2 retractable levers, one on either side of a food dispenser, where reward pellets (Noyes, formula P, 45 mg; Sandown Scientific, Hampton, UK) could be delivered. The distance between the 2 levers (center to center) was 16.5 cm. Because of the extended stay in the boxes, water bottles were present. The boxes were connected via a Med–PC interface to a computer and controlled by MED software.
Bilateral microdialysis probes were placed in the mPFC (coordinates relative to bregma: anterior-posterior +2.70; ventral −5.5; lateral ±1.8, angle 12°, according to Paxinos and Watson 1986) and secured with dental acrylic cement. Rats were anesthetized by an intramuscular injection of a combination of 0.24 mg/kg fentanyl citrate and 7.5 mg/kg fluanisone (Hypnorm®, Janssen, Turnhout, Belgium), followed by a subcutaneous injection of 0.75 mg/kg midazolam (Dormicum®, Roche, Mijdrecht, The Netherlands). After surgery, a subcutaneous injection of 0.075 mg/kg buprenorphine (Temgesic®, Schering–Plough, Reckitt Benckiser, UK) was given and rats were housed separately.
All rats were trained in Skinner boxes in the same way. The training started with a 1-lever shaping schedule (session duration ± 40 min, 40 trials, 5 sessions in 3 days): The rats learned to associate a lever press with the delivery of a pellet. In each trial, one of the 2 levers randomly appeared and the rats had to press that lever within 60 s after insertion to be rewarded. A lever press was followed by retraction of the lever and start of an intertrial-interval period of 10 s. When rats pressed the lever in 90% to 100% of the trials, this was followed by a gradual increase in the fixed ratio of lever pressing from 1 to 3 (fixed ratio 3, session duration ± 30 min, 30 trials, 2 sessions in 2 days). The rats had to press a lever 3 times in order to receive a reward, to ensure that accidental presses on the correct lever did not lead to reward delivery.
Training was continued by the acquisition of a 2-lever discrimination task. Now 2 levers were presented and the rat had to learn that 3 lever presses on only one of these levers resulted in reward presentation. Every trial started when the levers both came out and the stimulus lights above the levers were lit simultaneously. Thus, the only difference between the stimuli presented for the discrimination was the spatial position. Three lever presses on the correct lever resulted in the retraction of the levers and the delivery of a food pellet. Three lever presses on the incorrect lever also resulted in the retraction of the levers, but without the delivery of a food pellet. Following an interval of 10 s, a new trial began. If the rat failed to make 3 responses on the same lever in a 1-min period, the levers were retracted, appearing again after a 10-s time-out period. Rats were trained on one lever (the left or the right one) for 5 days receiving one session a day (64 trials, session duration 35–45 min). This discrimination phase was followed by placement of the microdialysis probes. In all rats, the implantation of the probes in the brain was followed by a recovery period of 1 week before microdialysis measurements were performed. We showed before that a 1-week recovery period does not affect basal extracellular concentrations of DA and NA during the dark phase of the circadian cycle or the increases in DA and NA efflux induced by unconditioned stimuli, as compared with measurements on the day after probe insertion (Feenstra and others 2000). Six days after placement, rats were allocated to 1 of 2 experiments.
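The trial rules described above can be summarized in a minimal sketch. It assumes that presses are counted cumulatively per lever within a trial (the text does not say whether the 3 presses must be consecutive), and the function name, the event format, and the outcome labels are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

REQUIRED_PRESSES = 3       # fixed ratio 3, as described in the text
RESPONSE_WINDOW_S = 60.0   # levers retract if no lever reaches 3 presses within 1 min


def classify_trial(presses: List[Tuple[float, str]], rewarded_lever: str) -> str:
    """Classify one trial as 'correct', 'incorrect', or 'omission'.

    presses: (seconds since lever insertion, 'left' or 'right'), in time order.
    Assumption: presses accumulate per lever and need not be consecutive.
    """
    counts = {"left": 0, "right": 0}
    for time_s, lever in presses:
        if time_s > RESPONSE_WINDOW_S:
            break  # levers have already been retracted
        counts[lever] += 1
        if counts[lever] == REQUIRED_PRESSES:
            # The third press on either lever ends the trial; only the
            # rewarded lever leads to pellet delivery.
            return "correct" if lever == rewarded_lever else "incorrect"
    return "omission"
```

In this sketch a trial with 2 presses on one lever and none on the other after 60 s is scored as an omission, matching the time-out rule in the text.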
Experiment 1. Groups R1, E1, C1
In the first experiment, microdialysis measurements were carried out during a first reversal or extinction of action–outcome contingencies. Rats were divided into 3 groups according to the task they had to perform on the microdialysis day: reversal (R1), extinction (E1), and their control (C1). The control group had to press the same lever that was previously rewarded in the discrimination phase, the reversal group had to learn to press the other lever, whereas the extinction group learned not to press at all because no lever was rewarded.
On the sixth day after surgery, rats went through rehearsal sessions in order to familiarize them with the experimental procedure on the subsequent day. They were connected to a swivel (fluid lines not connected) and placed in the Skinner box for the whole day (8 h). One and a half hours after connection, rats received the same behavioral session as in the discrimination phase. Three hours after the start of the first session a second session was given. On the morning of the seventh day after surgery, the microdialysis day, the 2 PFC probes were connected in series to the microdialysis setup. One and a half hours later the first session was started. All reversal and control groups were subjected to 2 sessions (morning and afternoon) with an interval of 3 h, whereas the extinction group received one session (De Bruin and others 2000). The afternoon session always had the same lever-reward contingency as the morning session. All these sessions consisted of 64 trials and the session duration was 35–45 min for discrimination or reversal and 45–55 min for extinction.
Experiment 2. Groups R3 and C3
In the second experiment, rats were first subjected to 2 reversals and then microdialysis measurements were carried out during a third reversal. Rats were divided into 2 groups: reversal (R3) and control (C3). These groups started the fourth day after surgery with 2 rehearsal sessions, in which the same lever was rewarded as during the discrimination phase, followed on the fifth day by 2 sessions of the first reversal (the other lever rewarded). On the sixth day they had 2 sessions of the second reversal (rewarded lever changed again) while they were connected to the swivel. The seventh day after surgery was the microdialysis day, on which the R3 group received 2 sessions in which they had to switch levers again in order to be rewarded (their third reversal), whereas the C3 group received sessions in which the rewarded lever was the same as on the previous day (and during the discrimination).
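The day-by-day contingencies of experiment 2 can be laid out in a short sketch. The function names and the dictionary layout are hypothetical; the schedule itself follows the description above (days counted from surgery, 2 sessions per day with the same rewarded lever).

```python
def other_lever(lever: str) -> str:
    """Return the opposite lever."""
    return "right" if lever == "left" else "left"


def experiment2_schedule(trained_lever: str, group: str) -> dict:
    """Rewarded lever per postsurgical day for group 'R3' or 'C3'."""
    if group not in ("R3", "C3"):
        raise ValueError("group must be 'R3' or 'C3'")
    day4 = trained_lever       # rehearsal: same lever as in the discrimination phase
    day5 = other_lever(day4)   # first reversal
    day6 = other_lever(day5)   # second reversal (back to the trained lever)
    # Microdialysis day: R3 switches a third time, C3 keeps the day-6 lever.
    day7 = other_lever(day6) if group == "R3" else day6
    return {4: day4, 5: day5, 6: day6, 7: day7}
```

For a rat trained on the left lever, the sketch gives left, right, left on days 4 to 6 for both groups, and then right for R3 and left for C3 on day 7.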
All microdialysis measurements were carried out on the seventh day after the implantation of the probes. The 2 probes were connected in series by a short piece of flexible PEEK tubing (outer diameter 0.51 mm, inner diameter 0.13 mm; Upchurch Scientific, Oak Harbor, WA) and to a Tsumura (TCS 2-23) quartz-lined dual channel swivel (Pronexus, Skärholmen, Sweden) and a perfusion pump using flexible PEEK (Feenstra and others 2000, 2002a). Ringer solution (145 mM NaCl, 2.7 mM KCl, 1.2 mM CaCl2, 1.0 mM MgCl2) was perfused at a rate of 3–3.5 μL/min through the probes.
Dialysis samples were collected on-line in the injector loop (50 μL) and automatically injected every 16 min into the high-performance liquid chromatography system to measure extracellular concentrations of DA and NA as described in Feenstra and others (1998). Separation was achieved on a Supelcosil LC-18-DB column at 35–40 °C using a mobile phase with 9.1 mM citric acid (1.75 g/L), 6.1 mM sodium acetate (5 g/L), 1.6 mM heptanesulfonic acid (325 mg/L), 11.9 mM NaNO3 (1 g/L), and 12.5% methanol. Coulochem 5011 detector cells (electrode 1 operated on +175 to 250 mV, electrode 2 on −250 to 350 mV) were used for detection and were controlled by Antec Decade (ANTEC, Leyden, The Netherlands) or ESA (Chelmsford, MA) Coulochem 5100A. The reduction current measured at electrode 2 was used for analysis. Detection limit: 0.08–0.2 pg/50 μL.
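As a quick arithmetic check on the sampling scheme, the dialysate volume delivered per 16-min interval follows from the stated perfusion rate. The sketch below only multiplies the numbers given in the text; the function name is hypothetical, and how the 50-μL loop was filled in practice is not stated in the text.

```python
SAMPLE_INTERVAL_MIN = 16.0   # injection interval from the text
LOOP_VOLUME_UL = 50.0        # injector loop volume from the text


def dialysate_volume_ul(flow_ul_per_min: float,
                        interval_min: float = SAMPLE_INTERVAL_MIN) -> float:
    """Volume of dialysate delivered during one sampling interval."""
    return flow_ul_per_min * interval_min


low = dialysate_volume_ul(3.0)    # lower bound of the stated flow range
high = dialysate_volume_ul(3.5)   # upper bound of the stated flow range
print(f"{low:.0f}-{high:.0f} uL per {SAMPLE_INTERVAL_MIN:.0f} min "
      f"(loop volume {LOOP_VOLUME_UL:.0f} uL)")
```

The stated flow range of 3–3.5 μL/min thus corresponds to 48–56 μL per interval, which brackets the 50-μL loop volume.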
After microdialysis, rats were deeply anesthetized by inhalation of a CO2/O2 mixture and decapitated. The brains were removed, frozen on solid CO2, and 20-μm coronal sections were cut with a cryostat. These were stained with thionine and examined for the precise location of the microdialysis probes. Data from rats with incorrectly placed probes were not reported.
The number of lever presses on each lever and the correct responses (3 lever presses on the correct lever leading to reward delivery) of the 64 trials of each discrimination or reversal session were recorded in 8 blocks of 8 trials. From these data, the total number of lever presses and the omissions could be calculated. It should be noted that the total number of lever presses can vary even if rats make only correct responses, as sometimes rats are able to press more than 3 times before the lever retracts. For the extinction session, only the total number of lever presses was recorded. On both the rehearsal and the microdialysis day, the behavioral results were analyzed with a repeated-measurements analysis of variance (ANOVA) to assess time, group, and interaction effects (block as within-subject factor and group as between-subjects factor). The within-subjects factor was corrected for degrees of freedom using the Huynh–Feldt epsilon. If on the rehearsal day a group difference in correct responses was found in a session, the average score of correct responses of the eight blocks of that session per rat was added as a covariate in the analysis of the neurochemical and behavior data. When a group effect or a block × group interaction was detected, a post hoc analysis was carried out for each block with group as between-subject factor using ANOVA or a t-test with equal variances not assumed. When an effect for sample or for sample × group was detected, new repeated-measures ANOVAs were carried out for each group, followed by simple contrasts of all subsequent samples against the first block. Where appropriate, separate analysis of the behavior data for the specific groups used for neurochemical analysis was also performed. The level of significance was P < 0.05 in all analyses.
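The block-wise scoring described above can be sketched as follows. The outcome labels ('correct', 'incorrect', 'omission') and the function name are hypothetical; the sketch only illustrates how 64 trial outcomes are grouped into 8 blocks of 8 trials.

```python
from typing import Dict, List

BLOCK_SIZE = 8
TRIALS_PER_SESSION = 64


def score_blocks(outcomes: List[str]) -> List[Dict[str, int]]:
    """Count correct responses, incorrect responses, and omissions per block."""
    if len(outcomes) != TRIALS_PER_SESSION:
        raise ValueError("a session consists of 64 trials")
    blocks = []
    for start in range(0, TRIALS_PER_SESSION, BLOCK_SIZE):
        block = outcomes[start:start + BLOCK_SIZE]
        blocks.append({
            "correct": block.count("correct"),
            "incorrect": block.count("incorrect"),
            "omission": block.count("omission"),
        })
    return blocks
```

Each of the 8 dictionaries then supplies one level of the within-subject factor (block) for the repeated-measures analysis.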
Extracellular measurements of DA and NA were taken on the microdialysis day. The values of the first 3 samples preceding the sessions were used to calculate the mean basal concentration, which in turn was used for calculating relative values. Absolute basal values were compared using 1-way ANOVA. Percentage sample values (basal value 3 and effect samples 1–5) were analyzed by a repeated-measurements ANOVA, to assess time, group, and interaction effects (block as within-subject factor and group as between-subjects factor). The within-subjects factor was corrected for degrees of freedom using the Huynh–Feldt epsilon. When a group effect or a sample × group interaction was detected, a post hoc analysis was carried out for each sample with group as between-subject factor using ANOVA followed by a Student–Newman–Keuls test (in the case of 3 groups). When an effect for sample or for sample × group was detected new repeated-measures ANOVAs were carried out for each group, followed by simple contrasts of all subsequent samples against basal value 3 as the reference sample. The level of significance was P < 0.05 in all analyses.
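The conversion to relative values can be illustrated with a minimal sketch. It assumes that the 3 basal samples are the first 3 entries of each series and that the analysis window (basal value 3 plus effect samples 1–5) therefore corresponds to samples 3 to 8; the function names are hypothetical.

```python
from statistics import mean
from typing import List

N_BASAL = 3    # samples preceding the session
N_EFFECT = 5   # effect samples entered into the ANOVA


def percent_of_basal(samples: List[float]) -> List[float]:
    """Express each sample as a percentage of the mean basal concentration."""
    basal = mean(samples[:N_BASAL])
    if basal <= 0:
        raise ValueError("basal concentration must be positive")
    return [100.0 * value / basal for value in samples]


def analysis_window(percent: List[float]) -> List[float]:
    """Return basal value 3 followed by effect samples 1-5 (samples 3 to 8)."""
    return percent[N_BASAL - 1:N_BASAL + N_EFFECT]
```

With made-up concentrations of 2.0, 2.0, 2.0, 3.0, 3.4, 3.0, 2.5, and 2.0, the sketch yields 100, 100, 100, 150, 170, 150, 125, and 100% of basal, of which the last 6 values enter the analysis.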
The chromatographic analysis was performed using the Class VP 5.1 software package (Shimadzu, 's-Hertogenbosch, Netherlands) and the statistical analysis using the SPSS 11.0 software package (SPSS, Gorinchem, Netherlands).
Experiment 1. First Reversal and Extinction
After recovery from surgery, all 3 groups received the same behavioral session on the rehearsal day as they did before surgery in the discrimination phase. No differences were found in performance of the discrimination task before and after surgery. A small group effect was found (F2,23 = 3.785; P < 0.038, results not shown) for discrimination task performance after surgery. Post hoc analysis revealed a difference between C1 and R1 groups (contrast analysis: P < 0.018). Using these results as a covariate in further analyses of this experiment did, however, not result in any statistical differences.
On the microdialysis day, the control group C1 (n = 10) was presented with 2 sessions that were similar to the previous discrimination sessions. The experimental groups E1 (n = 7) and R1 (n = 9) were, however, exposed to the first extinction or reversal of action–outcome contingencies, in 1 and 2 sessions, respectively. DA and NA efflux in the mPFC were measured continuously during the microdialysis day. Basal extracellular concentrations of the 3 groups (Table 1) were not significantly different. In session 1, group C1 performed as before, but the R1 group started with an average of less than one correct response in the first block of 8 trials. They gradually improved their performance, reaching 50% at the end of this session (Fig. 1, left side). During session 2, the rats continued to improve and reached near maximum accuracy at the end (Fig. 1, right side). Reversal learning led to reciprocal changes in the number of correct and incorrect lever presses with a negligible number of omissions (Fig. 2). The E1 group started the first blocks of the session with normal levels of lever pressing (i.e., a total of around 24 per block), but gradually ceased activity as they learned that no rewards were delivered (Fig. 3). During rewarded operant sessions, DA—and, to a lesser extent, NA—efflux was increased.
Table 1. Basal extracellular concentrations
|Exp. 1|C1|2.24 ± 0.29|7.44 ± 0.81|
|Exp. 1|R1|2.22 ± 0.35|5.82 ± 0.99|
|Exp. 1|E1|1.38 ± 0.22|5.02 ± 1.30|
|Exp. 2|C3|2.55 ± 0.47|3.12 ± 1.02|
|Exp. 2|R3|1.63 ± 0.28|3.52 ± 0.95|
Statistical Evaluation Behavior C1, R1
A repeated-measurements ANOVA of the correct responses in session 1 showed main effects for time (block), time × group, and group (Figs 1 and 2) (F7,119 = 7.753, P < 0.001; F7,119 = 7.330, P = 0.001; F1,17 = 86.487, P = 0.001, respectively) (Fig. 1, left side). Post hoc analysis revealed significant group differences for all blocks of session 1. Further analysis showed significant time, time × group interaction, and group effects for the number of correct (F7,119 = 8.075; F7,119 = 7.534; F1,17 = 94.648, respectively; all P < 0.001) and incorrect lever presses (F7,119 = 11.288; F7,119 = 7.560; F1,17 = 76.105, respectively; all P < 0.001) but not for the total number of lever presses or the omissions. The numbers of correct and incorrect lever presses were significantly different between the R1 and C1 groups in all blocks of this session. Separate group analysis of the correct responses resulted in time effects for R1 (F7,56 = 7.073, P < 0.001) and revealed significant differences for block 2 and blocks 6–8 (filled symbols in Fig. 1), suggesting that at the end of the session a switch to the other lever was consistently present. In session 2 an effect for group, but not for time (block) or interaction, was found for correct responses (F1,9 = 22.110, P = 0.001). Post hoc tests revealed significant differences for blocks 2–5 (Fig. 1, right side). Further analysis showed significant time, time × group interaction, and group effects for the number of incorrect lever presses (F7,63 = 3.755, P < 0.05; F7,63 = 3.808, P < 0.05; F1,9 = 42.759, P < 0.001, respectively) and group effects for correct (F1,9 = 23.419, P = 0.001) and total lever presses (F1,9 = 10.334, P < 0.05) but not for the omissions. Differences between groups are indicated in Figure 2. Separate group analysis did not result in significant effects.
Statistical Evaluation Behavior C1, E1
Analysis of the total number of lever presses (the relevant measure in extinction) in session 1 revealed main effects for time (block), time × group, and group (F7,105 = 15.205, P < 0.001; F14,105 = 12.829, P = 0.001; F1,15 = 31.723, P = 0.001, respectively) (Fig. 3). Post hoc analysis revealed group differences in blocks 3–8. Upon separate analysis, a time effect was found for E1 (F7,42 = 10.200, P < 0.001): blocks 5–8 were different from the first block, suggesting the presence of a significant response inhibition. A time effect was also found for C1 (F7,63 = 3.188, P < 0.010), resulting in a small difference between the first and second blocks (P < 0.048).
Statistical Evaluation DA Efflux C1, R1, E1
A repeated-measures ANOVA of DA efflux in session 1 showed significant effects for group, sample, and interaction (F2,115 = 11.213, P < 0.001; F5,23 = 14.249, P < 0.001; F10,115 = 3.185, P < 0.002, respectively) (Figs 1 and 3). Post hoc analysis revealed that the R1 group differed from C1 and E1 in samples 6, 7, and 8 and that the E1 group differed from C1 and R1 in samples 5 and 6. Separate analysis of the groups showed an effect of sample for C1 (F5,45 = 12.548, P < 0.001) and R1 (F5,40 = 7.603, P < 0.001) but not for E1. The execution of the operant discrimination task (C1) led to increased DA efflux in samples 4, 5 (150%), and 6 (Fig. 1). Execution of the reversal task led to increased DA efflux in samples 4, 5, 6 (170%), and 7 (Fig. 1). The extinction task did not lead to increased DA efflux (Fig. 3). When the mean maximum effects (in any of the samples during session 1) were taken for each animal, the values for C1 (158.7 ± 9.1, n = 10), R1 (186.9 ± 10.5, n = 9), and E1 (120.9 ± 7.4, n = 7) were significantly different from each other (ANOVA F2,25 = 11.101, P < 0.001), and a higher maximum level was reached during the reversal than during the control session. Session 2 was presented only to groups R1 and C1 (Fig. 1, right side). A complete set of measurements during the afternoon session could not be collected for all rats and the numbers of rats in groups C1 and R1 were now 7 and 4, respectively. Repeated-measures ANOVA showed a significant effect for time (F5,45 = 12.254, P < 0.001), but not for group or interaction. Separate analysis of the C1 and R1 groups showed a sample effect (F5,30 = 8.656, P < 0.001; F5,15 = 6.533, P = 0.002, respectively) and samples 4, 5 (150%), 6, 7, and 8 of the C1 group and 5 (145%) of the R1 group were increased compared with the basal values. Maximum effects during the afternoon session were not significantly different between C1 (158.4 ± 10.2, n = 7) and R1 (146.5 ± 15.8, n = 4).
Statistical Evaluation NA Efflux C1, R1, E1
Repeated-measures analysis of NA efflux during session 1 showed a significant effect of sample (F5,108 = 5.333, P < 0.001) but not of group or interaction (Figs 1 and 3). Separate analysis of the C1 group showed a sample effect (F5,45 = 9.874, P < 0.001) and samples 4 (112%) and 5 were increased compared with the last basal value. No effect of sample was observed in groups E1 and R1. Analysis of NA efflux in session 2 (R1 and C1) showed a significant effect for sample (F5,45 = 3.745, P < 0.05) but not for group or interaction (Fig. 1, right side). Separate analysis of the C1 group showed a sample effect (F5,30 = 4.909, P = 0.007) and samples 4–6 (120%) and 8 were increased compared with the last basal value. No effect of sample was observed in the R1 group.
Experiment 2. Third Reversal
On the day before microdialysis, no differences were observed in the behavior of the groups C3 (n = 8) and R3 (n = 7) in the 2 sessions of the second reversal (F1,10 = 1.392, P < 0.259, results not shown). The microdialysis experiment was performed at the same interval after surgery as in experiment 1. After going through reversals 1 and 2, group C3 was given 2 sessions with the same contingency as the day before, whereas group R3 was given 2 sessions of the third reversal. The behavioral data indicate that the third reversal led to a similar switching of levers as shown for the first reversal, but that the effective switch was reached more rapidly and was completed already at the end of the first session (Figs 4 and 5). DA and NA efflux in the mPFC were measured continuously during the microdialysis day. Basal extracellular concentrations (Table 1) were not significantly different. DA, but not NA, efflux was increased during the operant sessions.
Statistical Evaluation Behavior C3, R3
Repeated-measures ANOVA of the correct responses of session 1 revealed effects for time (block), time × group, and group (F7,70 = 11.386, P < 0.001; F7,70 = 9.874, P < 0.001; F1,10 = 13.665, P < 0.002, respectively) (Figs 4 and 5). A post hoc test showed group differences in blocks 1–4. Similar results were obtained when one rat from the R3 group in which DA was not detectable was left out but group differences were now restricted to blocks 1–3. Separate analysis of the groups showed a time effect for group R3 (F7,35 = 8.408, P < 0.001; the first block was different from all other blocks) but not for the C3 group. The numbers of correct and incorrect lever presses showed significant time, time × group interaction, and group effects for the number of correct (F7,91 = 9.414, P < 0.001; F7,91 = 7.790 P < 0.001; F1,13 = 12.889, P < 0.01, respectively) and incorrect lever (F7,91 = 13.902; F7,91 = 9.985; F1,13 = 29.635, respectively; all P < 0.001) presses but not for the total number of lever presses or the omissions. The number of correct and incorrect lever presses was significantly different between the R3 and C3 groups in blocks 1–4 and 6 of this session. Analysis of session 2 did not reveal differences for time, group, or interaction for any of the behavioral measures.
Statistical Evaluation DA Efflux C3, R3
Repeated-measures ANOVA of DA efflux in session 1 showed a significant effect of time (F5,50 = 12.130, P < 0.001). Separate group analysis indicated a time effect for C3 (F5,25 = 13.591, P < 0.001, samples 4–7 were significantly increased) and for R3 (F5,25 = 2.978, P < 0.048, samples 4 and 5 were increased) (Fig. 4). In session 2, DA levels showed a similar increase of 78% for both groups in sample 5 (Fig. 4, right side), and again only a significant time effect (F5,50 = 13.551, df = 5, P < 0.001) was observed. Separate analysis indicated time effects for C3 (F5,25 = 7.545, df = 3.751, P < 0.001; samples 4 and 5 were different from the last basal value) and for R3 (F5,25 = 6.549, df = 5, P < 0.001; samples 4, 5, and 6 were different from the last basal value).
Statistical Evaluation NA Efflux C3, R3
Repeated-measures ANOVA of NA efflux in sessions 1 and 2 did not show any significant effect for time or group (Fig. 4).
Our experiments suggest that prefrontal DA is involved especially in initial reversal learning of an instrumental 2-lever spatial-discrimination task. Performing a first reversal resulted in a larger and more prolonged increase of DA efflux than performing the control task, a well-learned choice for one of the levers. This difference in DA efflux arose when the rats reached a significant improvement in performance after the start of the reversal, when new response–reward relationships were formed and task load was highest. After repeated reversals, DA efflux increased to the same extent in control and reversal groups. Inhibition of the old response–reward relationship during extinction did not increase DA at all. NA behaved differently and showed only small increases in efflux and no clear differences between the conditions tested. These data suggest that DA is differentially involved in the execution, adaptation, and extinction phases of goal-directed behavior, whereas NA may be activated to a lesser extent irrespective of the phase.
In the first reversal session, performance slowly improved from very few correct responses in the first trials to almost 100% at the end of the second session. The more detailed analysis of the behavior (Fig. 2) shows that rats indeed shifted their responses from the incorrect to the correct lever and that switching did not lead to increased omissions. During the extinction session, rats started responding with a maximum number of lever presses but gradually stopped pressing. Repeated reversals led to a more rapid switching in general and this is clearly illustrated in the results of the third reversal. These behavioral results in rats that were simultaneously used for microdialysis measurements compare well with our previous results in unoperated controls (De Bruin and others 2000; van der Meulen and others 2003).
We report clear and consistent increases in DA efflux in the mPFC during execution of a well-learned instrumental discrimination task. This is in agreement with other microdialysis data indicating that DA is activated whenever rewards are presented in the same (Skinner box) setting in operant (Feenstra and others 2002b; Winstanley and others 2006) or Pavlovian tasks (Mingote and others 2004). Microdialysis and voltammetric measurements in the nucleus accumbens (Sokolowski and others 1998; Roitman and others 2004) gave similar results. Electrophysiological data also suggest an activation of DA neuronal activity upon either the reward presentation itself (when it is not predicted) or upon the first stimulus that predicts the reward (Hollerman and Schultz 1998; Pan and others 2005). In contrast to DA, NA efflux was only weakly activated during task execution, which corresponds to previous microdialysis results in operant tasks (Dalley and others 2001; Feenstra and others 2002b). It should be noted, though, that strong increases in NA efflux are reported when task contingencies were changed, resulting in a permanent loss of instrumental control over reward delivery (Dalley and others 2001; Feenstra and others 2002b). In this respect, NA reacts similarly to DA (Feenstra and others 2002b; Cheng and Feenstra 2006b).
Reversal learning involves the switching of responses, that is, a simultaneous inhibition of the old response and learning and consolidation of the new one. Rats performing the first reversal started the session with a similar DA increase as seen for controls performing the discrimination task, although they obtained far fewer rewards. The total numbers of lever presses were similar, however, to those in the first blocks of the extinction session, where no significant increase in DA was observed. Apparently, the few rewarded trials are enough to raise DA efflux to the same extent as in the controls. DA efflux in the first reversal session continued to increase, reaching a maximum at the moment of significant improvement in the behavior, but the reversal rats still received only half of the possible rewards. We conclude that these extra increases compared with control and extinction groups are due to the specific process of reversal learning. Indeed, when maximum increases during the session were compared, the reversal group showed a significantly higher maximal efflux than the control group. Previously, we observed that the actual number of lever presses (in fixed-ratio schedules from 1 to 20) to obtain a reward does not affect DA and NA efflux in the mPFC during an operant task as long as the task involves a stable relation between lever presses and rewards (Feenstra and others 2002b). In that study we also observed that the actual number of pellets obtained did not affect DA and NA efflux in the mPFC (Feenstra and others 2002b). In the present reversal experiment, however, the response–reward relation did change radically. This led to unexpected rewards and uncertainty about the outcome of the actions, conditions that have been suggested to induce higher DA neuronal activity than during normal task performance (Hollerman and Schultz 1998; Fiorillo and others 2003).
More precisely, the nonrewarded responses on the old lever might be expected to lead to a decrease at the moment of reward omission (Hollerman and Schultz 1998), but apparently this is either a weak effect or is compensated for by trigger-induced increases (see further), as we showed that overall DA efflux was not altered in the extinction session (Fig. 3). The rewarded responses on the new lever are, however, expected to lead to strong DA increases at the reward presentation (unexpected reward; Fiorillo and others 2003) and, as mentioned above, this might explain the increases in DA efflux throughout the first reversal session. Differential phasic responses at the stimulus presentation (light on, lever out) may not be expected, as the stimuli are exactly the same for the rewarded and the nonrewarded trials and therefore cannot acquire a predictive value. They may act as a trigger stimulus, indicating the start of a new trial in which reward might be obtained, and this may lead to a moderate increase in DA efflux each time the stimuli are presented. In the period between the lever presses and the time of reward presentation, the rat may experience uncertainty as to the outcome of its actions. A ramping increase of DA activity related to outcome uncertainty has been reported to be maximal at maximal uncertainty (50% chance of reward; Fiorillo and others 2003). Although this situation is not formally present in our experiments, reversal learning does involve an increased level of uncertainty regarding the outcome of actions. The level of 50% rewarded trials is reached in the reversal group R1 at the end of the first reversal session, where indeed an increased DA efflux was observed (Fig. 1). It should be stressed, though, that this first reversal presents a special case, as the rats experience for the first time that pressing the previously unrewarded lever now results in reward delivery.
Later reversals may be classified more as switching between 2 familiar response options. This may be the reason why an apparently similar uncertainty at the beginning of the third reversal (Fig. 4) does not lead to increased DA efflux.
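As a brief illustration of why outcome uncertainty is expected to peak at a 50% reward probability, each trial outcome can be treated as a Bernoulli variable $R$ with reward probability $p$. This formalization is an assumption introduced here for clarity and is not a measure used in the present experiments:

```latex
\mathrm{Var}(R) = p\,(1 - p), \qquad
\frac{d}{dp}\,\mathrm{Var}(R) = 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2}, \qquad
\mathrm{Var}(R)\big|_{p = 1/2} = \tfrac{1}{4}.
```

Under this sketch, uncertainty vanishes at $p = 0$ (no trial rewarded, as in extinction) and at $p = 1$ (every trial rewarded) and is greatest when half of the trials are rewarded.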
The difference in DA efflux between the first reversal and control groups did not occur during the third reversal. This pattern is in agreement with the finding that a first reversal shift is dependent on the integrity of the mPFC and on DA D1-receptors in that area, but a third reversal is not (De Bruin and others 2000). The extra increase in DA efflux we observed in animals performing the first reversal compared with their controls may be important for the development of a new strategy and the subsequent establishment of the new relationship between the response and the reward. Thus, Ragozzino (2002) reported mPFC D1-receptor involvement in strategy shifting in a cross-maze, whereas the importance of D1-receptor–dependent intracellular processes in instrumental reversal learning was also observed by Heyser and others (2000) using dopamine- and cyclic-AMP-regulated phosphoprotein (DARPP) knock-out mice. After repeated reversals, strategy switching may be considered part of the behavioral repertoire of the rat, so that involvement of the PFC (Dias and others 1997) and, consequently, of its DA innervation is no longer needed for switching.
The absence of DA increases during extinction is in line with our previous findings during extinction of a Pavlovian paradigm (Mingote and others 2004) and with reports that DA efflux in the nucleus accumbens was not activated during extinction of instrumental reinforcement (Hernandez and Hoebel 1988). This indicates the importance of existing or newly formed positive response–reward associations for DA efflux. In extinction, the old relationship between reward and response is inhibited and a new, negative one is formed. Apparently, this was less important for DA activation during reversal learning than the detection and learning of a new positive relationship between a reward and a response. It should be noted that we have now shown that prefrontal DA, which is reactive to stress (Feenstra 2000), was not activated in extinction of instrumental behavior, although extinction may be associated with a possibly stressful “frustrative nonreward” and with increased plasma corticosterone and adrenaline (Coover and others 1971; De Boer and others 1990). The role of DA in extinction of appetitive instrumental behavior has been associated with the presence of a reward prediction error and a decrease in DA neuronal firing rates (Hollerman and Schultz 1998). Therefore, an increase of activity upon stimulus presentations may be partially or fully cancelled out by a decrease upon reward omission. It should furthermore be considered that, although the existence of stress-induced activations of DA efflux in rodents is undisputed, the interpretation of such increases is not, and it is not clear whether it is the aversive nature of the stimuli that leads to increased DA efflux or, for example, “a heightened attention of the animal or activation of cognitive processes in an attempt to cope with the stressor” (D'Angio and others 1988).
In our (moderately food-restricted) animals, the omission of reward rapidly led to a strong decrease in instrumental activity; that is, the animals apparently resigned themselves to the new situation and refrained from further activity.
The mPFC plays a role in effortful processing related to response selection (Granon and others 1994) and/or in spatial planning (Hok and others 2005). Kesner and Rogers (2004) recently presented an overview of the relation between different memory attributes and their neurobiological substrates in memory systems. The mPFC is proposed to support spatial attributes, whereas the orbital PFC supports affect attributes. Reversal learning (“affective shifting,” cf. Dias and others 1996, 1997) in food-search tasks has been shown to depend on an intact orbital PFC, whereas attentional set-shifting or rule shifting depends on the mPFC (De Bruin and others 1994; Ragozzino and others 1999; Birrell and Brown 2000; McAlonan and Brown 2003). It has been reported that reversal learning may also depend on the mPFC when it involves special demands (Bussey and others 1997) or spatial instrumental tasks (De Bruin and others 2000). Our present findings suggest that DA activation might interact with local processes to support the reversal of spatial instrumental responses. In explanation of a working memory–related increase in DA efflux in some PFC areas, Watanabe and others (1997) suggested that such increases are due to working memory requirements, increased attentional demands, and general task difficulty. During the initial reversal, all these factors were indeed higher than in the well-learned or repeated reversal condition. What is clear is that differential reactions of DA to reward-related or reward-predicting sensory or contextual stimuli are important to maintain selected goals in working memory (Seamans 2004) and to “stamp-in” stimulus/response–reward associations in long-term memory (Wise 2004). The modulation by DA of neuronal plasticity functions such as long-term potentiation and long-term depression (Jay 2003; Otani and others 2003) can be considered to underlie these functions of DA in reversal learning.
We report differential activation of DA efflux for at least 2 phases of the acquisition–discrimination–reversal–extinction schedule that we applied, that is, reversal and extinction. We do not yet know whether extra activation of DA occurs during discrimination learning compared with performance, although the negative pharmacological results reported by De Bruin and others (2000) do not predict this. We do, however, suggest that extra activation during learning is present during acquisition, that is, instrumental learning. This is based on the data reported by Izaki and others (1998) and our results in the nucleus accumbens (Cheng and Feenstra 2006a) and would be in line with the pharmacological results presented by Baldwin and others (2002) and Smith-Roe and Kelley (2000), respectively, and with the above-mentioned importance of the formation of new response–reward relations for activation of DA efflux.
Changes in catecholamine efflux were measured on a fairly long timescale (minutes), and this is often regarded as reflecting changes in intermediate or tonic, but not phasic, DA activity (Schultz 2002; Wightman and Robinson 2002). Yet, it is interesting to consider our results in the light of recent theories of phasic DA activity in reward-related behaviors. Primate DA neurons fire reliably in reward-related situations, showing large increases upon unexpected rewards and decreases upon reward omission (Hollerman and Schultz 1998). DA neurons also show longer lasting activations when uncertainty about task outcome is high (Fiorillo and others 2003). We find reliable increases in prefrontal DA efflux during normal, rewarded operant behavior; similar increases in the first phase of the first reversal, where few (unexpected) rewards occur among many reward omissions; higher increases in the second phase, when uncertainty about the outcome is high; and no increase during extinction, where expected rewards do not occur. This would suggest that DA efflux follows, to a certain extent, the changes in phasic release.
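The prediction-error account of phasic DA activity referred to here can be sketched with a simple Rescorla-Wagner-style update. This is an illustrative toy model only, not an analysis performed in this study: the function name `prediction_errors`, the learning rate `alpha`, and the trial schedule are all hypothetical choices.

```python
def prediction_errors(rewards, alpha=0.2, v0=0.0):
    """Return trial-by-trial reward prediction errors.

    rewards -- sequence of outcomes (1 = reward delivered, 0 = reward omitted)
    alpha   -- hypothetical learning rate (assumption, not from the study)
    v0      -- initial reward expectation
    """
    v = v0
    errors = []
    for r in rewards:
        delta = r - v          # positive if the outcome is better than expected
        errors.append(delta)
        v += alpha * delta     # move the expectation toward the obtained outcome
    return errors


if __name__ == "__main__":
    # Hypothetical schedule: 30 rewarded trials followed by 30 omissions.
    errs = prediction_errors([1] * 30 + [0] * 30)
    print(f"first rewarded trial: {errs[0]:+.3f}")
    print(f"last rewarded trial:  {errs[29]:+.3f}")
    print(f"first omission:       {errs[30]:+.3f}")
```

In this sketch the error is large and positive for an unexpected reward, shrinks toward zero as the reward becomes predicted, and turns negative when an expected reward is omitted, which mirrors the qualitative firing pattern described by Hollerman and Schultz (1998).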
NA efflux was only weakly activated during most conditions tested, suggesting that NA is less important for the actual relationship between operant response and reward than for the general attentional level needed for flexible task execution (Aston-Jones and others 2000). Firing rates of NA neurons increase upon unexpected rewards (Bouret and Sara 2004) and during operant rewarded behavior, including reversals (Aston-Jones and others 1997; Bouret and Sara 2004), but not during extinction (Bouret and Sara 2004). The increases reported in rats by Bouret and Sara (2004), however, were very short lasting, and this could be the reason that they are not clearly reflected in our measurements. The longer lasting increases reported by Aston-Jones and others (1997) were obtained using a vigilance task, which could be expected to lead to a much stronger NAergic activation than the spatial lever-press discrimination that we employed. NA involvement in flexible responding in an attention-shifting paradigm was proposed on the basis of effects of idazoxan, an antagonist of α2-receptors (Devauges and Sara 1990). The outcome of other studies has been equivocal (Ridley and others 1981; Langlais and others 1993; Beversdorf and others 2002), but recently Lapiz and Morilak (2006) reported improved reversal learning and extradimensional set-shifting after administration of atipamezole, another α2-receptor antagonist. The effects on set-shifting were inhibited by local administration of an α1-antagonist in the mPFC. Although these data support a role for NA in cognitive flexibility, they are not conclusive. Inhibition of the reversal-improving effects was not demonstrated, and the relative contribution of different (e.g., orbital vs. medial) PFC areas was not investigated. Moreover, α2-receptor antagonists, including idazoxan and atipamezole, increase cortical DA efflux in addition to NA efflux (Gobert and others 1997; Matsumoto and others 1998).
Our results do not provide strong support for an increased activation of NA in flexibility of response–reward learning such as we find for DA. Indeed, NA efflux was significantly elevated in the control group C1, but not in the reversal group R1.
As stated in the Introduction, cognitive flexibility involves a number of specific functions and capacities. PFC neurons may hold a key function as they are suggested to monitor the outcome of goal-directed behavior and encode both spatial and specific reward-related information in working memory (Watanabe 1996). Our results suggest that DA modulates this PFC function during adaptation of goal-directed activity.
This research was supported by Eli Lilly and Company. We thank Dr Harry Uylings for discussions and for his helpful comments on this manuscript. Conflict of Interest: None declared.