The prelimbic (PL) region of the prefrontal cortex and the posterior subregion of the dorsomedial striatum (pDMS) are components of a corticostriatal circuit subserving instrumental learning. Here, we examined whether dopamine (DA) signals conveyed to the PL and pDMS are critical for instrumental learning. Rats with 6-hydroxydopamine or vehicle infusion into the PL and pDMS were trained to press 2 levers, either for food pellets or a sucrose solution. Thereafter, we tested whether the animals were sensitive 1) to a selective degradation of 1 of 2 outcomes using a specific satiety procedure and 2) to a selective degradation of 1 of 2 contingencies controlling instrumental behavior. Rats with PL DA depletion displayed a reduced rate of lever presses but appeared to be sensitive to outcome devaluation and contingency degradation. Thus, PL DA seems to modulate lever pressing but does not support instrumental conditioning. In contrast, rats with pDMS DA depletion had intact response rates and were sensitive to selective outcome devaluation; however, they showed a reduced sensitivity to contingency degradation. Therefore, pDMS DA signaling seems not to be involved in maintaining lever pressing but instead contributes to instrumental conditioning by supporting the detection of causal relationships between an action and its consequences.
Evidence from rodent studies suggests that in instrumental conditioning rats encode the contingency between their actions and the incentive value of the outcomes (Dickinson and Balleine 1994; Balleine and Dickinson 1998). Hence, posttraining devaluation of 1 of 2 outcomes contingent upon distinct actions selectively reduced performance of the action that leads to the devalued outcome. Likewise, posttraining degradation of 1 of 2 action-outcome contingencies selectively reduced performance of the action that is no longer causal to the delivery of a particular reward. Sensitivity to outcome devaluation and contingency degradation are key features of goal-directed behavior (Dickinson and Balleine 1994; Balleine and Dickinson 1998). Behavioral studies in rodents suggest that the prelimbic (PL) region of the prefrontal cortex and the posterior subregion of the dorsomedial striatum (pDMS) may form a corticostriatal circuit subserving instrumental learning. For instance, cell body lesions of the PL and the pDMS produced an insensitivity to outcome devaluation and contingency degradation (Balleine and Dickinson 1998; Corbit and Balleine 2003; Killcross and Coutureau 2003; Yin et al. 2005). Likewise, studies in humans implicated the PL cortex and the dorsomedial striatum in encoding the causal effects of actions, thereby indicating that brain systems involved in controlling goal-directed action seem to be highly conserved across mammalian species (Tanaka et al. 2008).
Dopamine (DA) plays a crucial role in modulating synaptic plasticity in corticostriatal pathways and influences the temporal coordination of activity in these pathways (Wickens et al. 2007). Furthermore, DA signals conveyed to the target areas of DA fibers, such as the striatum and PL cortex, encode a prediction error (Tobler et al. 2005; Schultz 2007a), a teaching signal that may be critical for instrumental conditioning. Consistent with this notion, electrophysiological recordings revealed that the activity of DA neurons is modulated by the value of upcoming actions (Morris et al. 2006; Roesch et al. 2007). Furthermore, reinstatement of DA signaling in the dorsal striatum by viral gene transfer restored instrumental conditioning in DA depleted mice (Robinson et al. 2007). However, little is known about the contribution of DA signals in different target areas of midbrain DA neurons in controlling goal-directed behavior.
Here, we investigated the role of DA in the PL and pDMS in goal-directed behavior. Rats subjected to 6-hydroxydopamine (6-OHDA) lesions of the PL and the pDMS, respectively, were tested for their sensitivity to outcome devaluation and contingency degradation. If DA in the PL and pDMS is important for the acquisition of action-outcome contingencies underlying goal-directed behavior, then DA depletion in either area should render rats insensitive to outcome devaluation and contingency degradation.
Materials and Methods
Experiment 1: Effect of PL DA Depletion on Outcome Devaluation and Contingency Degradation
Subjects and Apparatus
The subjects were 18 naive male Lister-hooded rats (Harlan-Winkelmann, Borchen, Germany) weighing between 200 and 230 g upon arrival. The rats were housed in groups of 4 in transparent plastic cages (55 × 39 × 27 cm, Ferplast, Nürnberg, Germany) in a temperature- and humidity-controlled room (20 ± 2 °C, 50–60%) on a 12:12-h light–dark cycle (lights on at 7:00 AM). Throughout the experiment the rats had ad libitum access to water. Standard laboratory maintenance chow (Altromin, Lage, Germany) was given ad libitum for 2 days after arrival, after which food was restricted to 15 g per animal per day to maintain them at ∼85% of their free-feeding weight. All animal experiments were conducted according to the German law on animal protection and approved by the proper authorities.
Training and testing took place in identical operant chambers (24 × 21 × 30 cm, Med Associates, St Albans, VT) housed within sound attenuating cubicles. Each operant chamber was equipped with a pellet dispenser that delivered 45-mg Noyes Pellets (formula A/I; Sandown Scientific, Hampton, Middlesex, UK) into a dual pellet/liquid cup receptacle which was positioned in the middle of the right wall and a syringe pump that delivered 0.1 mL of a 20% sucrose solution into the same receptacle. Each chamber also contained 2 retractable levers located on either side of the dual pellet/liquid cup receptacle. A 24 V/3 W houselight mounted on the top center of the opposite wall illuminated the chambers and an electric fan integrated into the cubicle provided a constant background noise (∼70 dB). A computer with the PC program MED-PC IV controlled the equipment and recorded the data.
After pretreatment with atropine (0.2 mg/kg intraperitoneal [i.p.]; WDT, Garbsen, Germany), the animals were anaesthetized with sodium pentobarbital (60 mg/kg i.p.; Medial GmbH, Hallbergmoos, Germany) and xylazine (4 mg/kg intramuscularly [i.m.]; Bayer AG, Leverkusen, Germany) before being placed in a stereotaxic frame (Kopf Instruments, Tujunga, CA). The skull surface was exposed and 2 small holes were drilled bilaterally above the PL. Animals subjected to DA depletion (n = 9) received bilateral intra-PL injections of 4 μg 6-OHDA hydrobromide (Sigma Aldrich, Steinheim, Germany) in 0.4 μL saline containing 0.01% ascorbic acid (Sigma Aldrich) at the following coordinates using a 1-μL Hamilton syringe: anterior-posterior (AP) +3.3; mediolateral (ML) ±0.7; dorsoventral (DV) –3.5 with the tooth bar −3.3 mm below the interaural line. Coordinates were determined from the atlas of Paxinos and Watson (2007). Sham controls (n = 9) received injections of 0.4-μL vehicle, that is, saline containing 0.01% ascorbic acid, at the same coordinates. The infusion time was 2 min and the needle was left in place for an additional 5 min to allow for diffusion. Animals were allowed to recover for at least 7 days before behavioral testing.
The behavioral procedure was similar to a protocol by Corbit et al. (2003). Unless otherwise stated each training session lasted for 20 min, began with the illumination of the houselight as well as the insertion of the appropriate lever and ended with the retraction of the lever and the turning off of the houselight.
Magazine and lever-press training.
First, all animals received 2 magazine training sessions in which both outcomes (pellets and 20% sucrose solution) were delivered on independent random-time schedules (RT-60) with both levers withdrawn. Thereafter, lever-press training was started; for half of the animals of each lesion group pressing the left lever earned one pellet and pressing the right lever earned 0.1 mL 20% sucrose solution. The other half received the opposite action-outcome pairings. All animals received 2 daily lever-press training sessions in which only one lever and one outcome was available. After the first training session, there was at least a 2-h break before the second training session with the other lever-outcome pairing began. The order of the pellet and sucrose sessions was alternated each day. Lever-press training was conducted for 11 consecutive days with progressively leaner random ratio (RR-) schedules of reinforcement, except for the first day in which a continuous reinforcement (CRF) schedule was used (Day 1 = CRF, P (O/A) = 1.0; Day 2–5 = RR-5, P (O/A) = 0.2; Day 6–8 = RR-10, P (O/A) = 0.1; Day 9–11 = RR-20, P (O/A) = 0.05).
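The relationship between the random-ratio parameter and the per-press outcome probability listed above, P (O/A) = 1/n for an RR-n schedule, can be illustrated with a short simulation. This is only a sketch and not the MED-PC program used in the experiments; the function names, the number of presses per session, and the random seed are hypothetical.

```python
import random

# Training schedule for Experiment 1 as listed in the text:
# day -> ratio parameter n (CRF is treated as a ratio of 1).
SCHEDULE = {
    1: 1,
    **{day: 5 for day in range(2, 6)},    # Days 2-5: RR-5
    **{day: 10 for day in range(6, 9)},   # Days 6-8: RR-10
    **{day: 20 for day in range(9, 12)},  # Days 9-11: RR-20
}


def outcome_probability(ratio):
    """Per-press outcome probability P(O/A) = 1/n on an RR-n schedule."""
    return 1.0 / ratio


def simulate_session(n_presses, ratio, rng):
    """Count outcomes earned when every press is reinforced independently
    with probability 1/ratio (hypothetical helper, not from the paper)."""
    p = outcome_probability(ratio)
    return sum(rng.random() < p for _ in range(n_presses))


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, for reproducibility only
    for day in sorted(SCHEDULE):
        n = SCHEDULE[day]
        earned = simulate_session(400, n, rng)  # 400 presses is an assumption
        print(f"Day {day:2d}: RR-{n:<2d} P(O/A) = {outcome_probability(n):.2f}, "
              f"{earned} outcomes in 400 presses")
```

Because each press is reinforced independently, the number of presses needed per outcome varies from outcome to outcome, with a mean of n.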
Outcome devaluation test in extinction.
In these experiments, transparent plastic feeding cages (42 × 26 × 18 cm, Ferplast) were used for prefeeding. To familiarize animals to these cages, they were placed into the cages immediately before their training session for 20 min on the last 3 days of lever-press training (RR-20). On the day after the last instrumental training session, all animals were given 1-h ad libitum access to 1 of the 2 outcomes in the feeding cages (1 animal per cage). Half of the animals of each lesion group received pellets (in a glass bowl) and the other half received the sucrose solution (in a drinking bottle). Immediately after prefeeding, the rats were placed into the operant chambers and a 10-min choice extinction test was conducted, in which both levers were inserted but no outcomes were given. On the next day, the rats received 2 retraining sessions (RR-20; one session on each lever) after which they were given a second devaluation test. This second devaluation test was identical to the first one except that those animals that had received free access to the pellets on the first test were given free access to the sucrose solution and vice versa.
Outcome devaluation test under reward.
A potential failure to show a selective devaluation effect in extinction could have several reasons, that is, the encoding of the value of the different outcomes could be impaired or, alternatively, the ability to remember and discriminate the 2 outcomes. Therefore, sensitivity to outcome devaluation was also tested with rewards being given. An intact sensitivity to outcome devaluation in the rewarded test suggests that the rat's ability to remember and discriminate the outcomes is not impaired. A retraining session (RR-20; one session on each lever) was conducted on the day after the second devaluation test in extinction. On the following day, rats received a third devaluation test in which half of the animals of each group received pellets in a glass bowl and the other half received sucrose solution in a drinking bottle. This test was identical to the first devaluation test except that responding on the levers was rewarded, that is, each outcome was delivered on an independent RR-20 schedule.
Contingency degradation training.
On the 2 days following the rewarded devaluation test, rats received retraining sessions (RR-20; one session on each lever), after which the contingency training began. To assess whether the rats were sensitive to a degradation of the instrumental contingency, a protocol was applied in which the same outcome earned by pressing 1 of the 2 levers was additionally given in a noncontingent manner with the same probability in each second without a response, that is, the probability of outcome delivery by pressing a lever was P (O/A) = 0.05 and the probability of outcome delivery by not pressing a lever was P (O/no A) = 0.05. Thus the experienced probability of outcome delivery was the same regardless if the animal performed that action or not, a protocol which should degrade this action-outcome association (for details, see Balleine and Dickinson 1998). Because the animals received 2 training sessions each day (one with each action-outcome pairing) and the outcome given noncontingently was the same in both sessions, one action-outcome association was degraded, whereas the other action-outcome association was not degraded. For half of the animals of each lesion group, the lever-pellet contingency was degraded (i.e., pellets were the noncontingently given outcome), whereas for the other half, the lever-sucrose contingency was degraded. The rats received two 30-min training sessions each day (one on each lever) and had a break of at least 2 h between the 2 sessions. The order of the 2 sessions was alternated each day and training continued for 6 days.
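The logic of this protocol can be expressed with the contingency index ΔP = P (O/A) − P (O/no A), which is zero for the degraded action and remains positive for the nondegraded one. The sketch below is a simplified illustration and not the experimental control program. It assumes that each second contains either one press or none, and that for the nondegraded lever the probability of receiving that lever's own outcome without a press is zero (because the free outcome is the other one); the press probability and all function names are hypothetical.

```python
import random

P_EARNED = 0.05  # P(O/A) under RR-20, as stated in the text
P_FREE = 0.05    # P(O/no A) for the noncontingent outcome, as stated in the text


def delta_p(p_o_given_a, p_o_given_no_a):
    """Contingency index: P(O/A) - P(O/no A)."""
    return p_o_given_a - p_o_given_no_a


def experienced_probabilities(n_seconds, press_prob, p_a, p_no_a, rng):
    """Simulate second-by-second bins and return the experienced
    (P(O/A), P(O/no A)). One press at most per second is a simplification."""
    presses = press_hits = idles = idle_hits = 0
    for _ in range(n_seconds):
        if rng.random() < press_prob:   # animal presses in this second
            presses += 1
            press_hits += rng.random() < p_a
        else:                           # no press in this second
            idles += 1
            idle_hits += rng.random() < p_no_a
    return press_hits / max(presses, 1), idle_hits / max(idles, 1)


if __name__ == "__main__":
    print("Degraded lever:    delta P =", delta_p(P_EARNED, P_FREE))
    print("Nondegraded lever: delta P =", delta_p(P_EARNED, 0.0))

    rng = random.Random(0)     # arbitrary seed
    seconds = 6 * 30 * 60      # six 30-min sessions on one lever
    p_a, p_no_a = experienced_probabilities(seconds, 0.3, P_EARNED, P_FREE, rng)
    print(f"Experienced on degraded lever: P(O/A) = {p_a:.3f}, "
          f"P(O/no A) = {p_no_a:.3f}")
```

In the simulation, the experienced probabilities for the degraded lever converge on the same value whether or not a press occurs, which is the condition the protocol is designed to create.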
Contingency degradation test in extinction.
Subsequently, we tested whether the effect of contingency degradation during training persisted in a test conducted in extinction. Therefore, on the day after the last contingency training session a 10-min choice extinction test was given. Both levers were inserted but no rewards were given.
Tyrosine hydroxylase (TH) immunohistochemistry was used to assess the exact location and extent of the loss of DA terminals within the PL cortex. After the behavioral testing, animals were killed by an overdose of isoflurane (cp-pharma, Burgdorf, Germany), perfused transcardially with 0.05% buffered heparin solution followed by a 4% buffered formalin solution. The brains were extracted, postfixed in a 4% buffered formalin solution for 24 h, and then transferred into a 30% sucrose solution for at least 48 h. Coronal brain sections were cut (40 μm; Microm HM550, Microm GmbH, Walldorf, Germany) in the region of the PL. The slices were initially washed in Tris-buffered saline (TBS; 3 × 10 min), treated for 15 min with TBS containing 2% hydrogen peroxide and 10% methanol, washed again in TBS (3 × 10 min), and then blocked for 20 min with 4% normal horse serum (NHS; Vector Laboratories, Burlingame, CA) in TBS containing 0.2% Triton X-100 (Sigma Aldrich; TBS-T). Slices were incubated overnight at 4 °C in a primary antibody (mouse, anti-TH, 1:7500 in TBS-T containing 4% NHS; Immunostar, Hudson, WI), then washed in TBS-T (3 × 10 min) and incubated in a secondary antibody (horse, antimouse, rat adsorbed, biotinylated IgG (H + L), 1:500 in TBS-T containing 4% NHS; Vector Laboratories) for 90 min at room temperature. Using the biotin–avidin system, slices were washed in TBS-T containing the avidin-biotinylated enzyme complex (1:500, ABC-Elite Kit; Vector Laboratories) for 60 min at room temperature, washed in TBS (3 × 10 min), and stained with 3,3′-diaminobenzidine (DAB Substrate Kit, Vector Laboratories). The brain slices were then washed in TBS (3 × 10 min), mounted on coated slides, dried overnight, dehydrated in ascending alcohol concentrations, treated with xylene, and finally coverslipped using DePex (Serva, Heidelberg, Germany). To determine the size and placement of the lesions, the TH immunoreactivity was analyzed under a microscope with reference to the atlas of Paxinos and Watson (2007).
Experiment 2: Effect of pDMS DA Depletion on Outcome Devaluation and Contingency Degradation
Unless otherwise noted, the same procedures as in Experiment 1 were used.
Subjects and Apparatus
Subjects were 29 naive male Lister-hooded rats (Harlan-Winkelmann) weighing between 200 and 230 g upon arrival.
After pretreatment with atropine (0.2 mg/kg i.p.; WDT), the animals were anaesthetized with sodium pentobarbital (60 mg/kg i.p.; Medial GmbH) and xylazine (4 mg/kg i.m.; Bayer AG) before being placed in a stereotaxic frame (Kopf Instruments). The skull surface was exposed and 2 small holes were drilled bilaterally above the pDMS. Animals subjected to DA depletion (n = 15) received bilateral intra-pDMS injections of 6 μg 6-OHDA hydrobromide (Sigma Aldrich) in 0.4 μL saline containing 0.01% ascorbic acid (Sigma Aldrich) at the following coordinates using a 1-μL Hamilton syringe: AP −0.4; ML ±2.6; DV −4.5 with the tooth bar −3.3 mm below the interaural line. Coordinates were determined from the atlas of Paxinos and Watson (2007). Sham controls (n = 14) received injections of 0.4-μL vehicle, that is, saline containing 0.01% ascorbic acid, at the same coordinates. The infusion time was 2 min and the needle was left in place for an additional 5 min to allow for diffusion. Animals were allowed to recover for at least 7 days before behavioral testing.
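For orientation, the infusion parameters stated for the two experiments (4 μg in 0.4 μL for the PL, 6 μg in 0.4 μL for the pDMS, each delivered over 2 min) can be converted into concentrations and an infusion rate. The following is plain arithmetic on the values given in the text; the function and variable names are illustrative only.

```python
def concentration_ug_per_ul(mass_ug, volume_ul):
    """Concentration of the infused solution in micrograms per microliter."""
    return mass_ug / volume_ul


def infusion_rate_ul_per_min(volume_ul, duration_min):
    """Average infusion rate in microliters per minute."""
    return volume_ul / duration_min


# (mass of 6-OHDA hydrobromide in ug, volume in uL), per side, from the text
INFUSIONS = {
    "PL (Experiment 1)": (4.0, 0.4),
    "pDMS (Experiment 2)": (6.0, 0.4),
}
INFUSION_TIME_MIN = 2.0  # infusion time stated in the text

if __name__ == "__main__":
    for site, (mass, volume) in INFUSIONS.items():
        conc = concentration_ug_per_ul(mass, volume)
        rate = infusion_rate_ul_per_min(volume, INFUSION_TIME_MIN)
        print(f"{site}: {conc:.1f} ug/uL at {rate:.2f} uL/min")
```

This gives 10 μg/μL for the PL infusions and 15 μg/μL for the pDMS infusions, both delivered at 0.2 μL/min.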
Magazine and lever-press training.
The instrumental training procedure was identical to Experiment 1 except that training was given on 10 rather than 11 consecutive days with the following schedule: Day 1 = CRF, P (O/A) = 1; Day 2–4 = RR-5, P (O/A) = 0.2; Day 5–7 = RR-10, P (O/A) = 0.1; Day 8–10 = RR-20, P (O/A) = 0.05.
Experiment 1: Effect of PL DA Depletion on Outcome Devaluation and Contingency Degradation
The lesion placements were assessed by reconstructing the damaged areas on standard stereotaxic atlas templates from Paxinos and Watson (2007). TH-positive fibers in the PL were abundant in sham-lesioned rats (n = 9) but rare in rats with 6-OHDA lesions (n = 9). The loss of TH-positive fibers was relatively consistent across all rats with 6-OHDA lesions; the areas which were nearly devoid of TH immunoreactivity appeared from +4.2 to +2.2 relative to bregma and were restricted predominantly to the PL with occasional and minimal damage to the anterior cingulate cortex and the infralimbic cortex (Fig. 1). No evidence for a subcortical loss of TH-positive fibers was found. Consistent with this description, Pycock et al. (1980) demonstrated that PL 6-OHDA infusion had relatively restricted effects around the infusion site and did not affect adjacent areas. Furthermore, PL 6-OHDA infusions of comparable concentrations (4–6 μg/μL) markedly reduced DA levels by up to 80% (Pycock et al. 1980; Bubser 1994; McGregor et al. 1996) and produced cognitive impairments in maze tasks (Bubser and Schmidt 1990) and operant tasks (Kheramin et al. 2004). Likewise, in monkeys multiple injections of 6-OHDA (6 μg/μL) induced a profound PL DA depletion (60–80%) and disrupted performance in a spatial delayed response test (Roberts et al. 1994). Moreover, Bubser and Schmidt (1990) demonstrated that PL 6-OHDA injection produced a relatively persistent DA depletion; therefore recovery of DA terminals seems unlikely. Thus, it seems unlikely that negative results in animals with PL DA depletions reflect learning-related plasticity accounted for by minimal amounts of residual DA. In addition, consideration of the TH immunoreactivity presented in Figure 1 illustrates that almost no DA terminals remained in the PL.
After recovery from surgery, all animals received instrumental training on both levers, one delivering food pellets, the other sucrose solution. Sham controls and animals with PL DA depletion both showed increased response rates as the ratio schedule parameter increased across consecutive training days (Fig. 2). However, relative to sham controls, performance of animals with PL DA depletion was reduced. An analysis of variance (ANOVA) with repeated measures revealed a significant main effect of group (F1,16 = 5.01, P < 0.05), a main effect of training day (F10,160 = 73.48, P < 0.001), and a significant interaction of group × training day (F10,160 = 4.43, P < 0.001).
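The degrees of freedom reported for this analysis follow from the group sizes and the number of training days. The sketch below assumes a standard two-way mixed design (group as between-subjects factor, training day as within-subjects factor) without a sphericity correction; the text does not state these details, so they are assumptions, and the function name is hypothetical.

```python
def mixed_anova_df(n_subjects, n_groups, n_levels):
    """Degrees of freedom (effect, error) for a two-way mixed ANOVA with one
    between-subjects factor and one within-subjects factor."""
    df_group = n_groups - 1
    df_error_between = n_subjects - n_groups
    df_within = n_levels - 1
    df_error_within = df_within * df_error_between
    return {
        "group": (df_group, df_error_between),
        "within": (df_within, df_error_within),
        "interaction": (df_group * df_within, df_error_within),
    }


if __name__ == "__main__":
    # Experiment 1: 18 rats, 2 groups, 11 training days (values from the text)
    for effect, (df1, df2) in mixed_anova_df(18, 2, 11).items():
        print(f"{effect}: F({df1},{df2})")
```

With 18 rats, 2 groups, and 11 training days, this yields 1 and 16 degrees of freedom for the group effect and 10 and 160 for the day effect and the interaction, matching the values reported above.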
Outcome Devaluation Test in Extinction
After the specific satiety devaluation procedure, both groups displayed a selective devaluation effect, that is, fewer responses were emitted to the lever that in training delivered the now devalued outcome (Fig. 3A). This observation was confirmed by the statistical analysis. An ANOVA with repeated measures revealed a significant effect of devaluation (F1,16 = 26.74, P < 0.001), but no effect of group (F < 1, not significant [n.s.]), and no interaction of lesion × devaluation (F < 1, n.s.).
Outcome Devaluation Test under Reward
Thereafter, the sensitivity of the animals to outcome devaluation was tested with the 2 outcomes being delivered. Results demonstrate a devaluation effect in both groups, that is, they emitted fewer responses to the lever for which the outcome had been devalued relative to the other (Fig. 3B). As animals were rewarded, lever-press rates in both groups were substantially higher relative to those in the outcome devaluation test performed in extinction. In addition, the overall performance of animals with PL DA depletion was lower. A repeated measures ANOVA revealed a significant effect of group (F1,16 = 7.49, P < 0.05), a significant effect of devaluation (F1,16 = 9.69, P < 0.01) but no group × devaluation interaction (F < 2.5, n.s.).
Contingency Degradation Training
Then, performance was tested in sessions in which the contingency between one action and outcome was selectively degraded, whereas the other action-outcome contingency was not altered. Results demonstrated a selective contingency degradation effect in both groups. As already observed during training and outcome devaluation testing, overall lever-press rates in animals with PL DA depletion were somewhat lower (Fig. 4A). A repeated measures ANOVA revealed a significant effect of degradation (F1,16 = 20.8, P < 0.001). In addition, there was a tendency for a group effect (F1,16 = 3.65, P = 0.074) and a group × degradation interaction (F1,16 = 3.72, P = 0.071), neither of which reached significance. Furthermore, there was a day × degradation interaction (F5,80 = 7.92, P < 0.001) but no day × degradation × group interaction (F < 2, n.s.), indicating that the contingency degradation effect became greater over the course of the training in both groups. It is important to note that the trend for a group × degradation interaction might not reflect an insensitivity to contingency degradation in animals with PL DA depletion but primarily resulted from their reduced overall lever-press rates. A further analysis of the contingency degradation effects by planned contrasts for each training day confirmed this description. Animals with PL DA depletion showed significant contingency degradation effects on a number of training days (Fig. 4A). Correspondingly, animals with PL DA depletion were sensitive to the contingency degradation in the subsequent test performed in extinction.
Contingency Degradation Test in Extinction
When tested in extinction, both groups demonstrated a degradation effect (Fig. 4B). A repeated measures ANOVA indicated a significant effect of degradation (F1,16 = 19.23, P < 0.001), but no effect of group (F < 2.5, n.s.) and no group × degradation interaction (F < 3.5, n.s.).
Experiment 2: Effect of pDMS DA Depletion on Outcome Devaluation and Contingency Degradation
Figure 5 provides a schematic representation of the extent of striatal damage in animals with 6-OHDA lesions of the pDMS. The data from 2 animals of the pDMS-lesion group had to be excluded because of the absence of any detectable lesion in the target area. Furthermore, one animal in each group had to be excluded due to very low lever-press rates during training (see below). The final group sizes were thus as follows: sham-lesioned rats (n = 13) and 6-OHDA lesioned rats (n = 12). TH-positive fibers in the pDMS were abundant in sham-lesioned rats but rare in rats with 6-OHDA lesions. Loss of TH-positive fibers in the pDMS appeared from about +0.2 to −0.8 mm relative to bregma with the maximum extension at around −0.4 mm relative to bregma. The lesions never extended more than 3.2 mm laterally from the midline, indicating an intact DA innervation of the dorsolateral striatum. Moreover, lesions never extended beyond +0.2 mm anterior to bregma, indicating an intact DA innervation of the anterior dorsomedial striatum. No evidence of any damage outside the striatum was found. Accordingly, intra-pDMS infusion of the excitotoxin N-methyl-D-aspartate in the same volume as used here for 6-OHDA infusions produced a comparable focal lesion of the pDMS (Yin et al. 2005). Consistent with our observation that rats subjected to 6-OHDA infusions had a near complete pDMS DA depletion, intrastriatal infusion of solutions with lower concentrations of 6-OHDA (<7 μg/μL) than used here (15 μg/μL) profoundly reduced tissue concentrations of DA (>85%; e.g., Brown and Robbins 1991) as well as % density of TH-positive fibers (Yuan et al. 2005) and caused a significant DA denervation as assessed by DA transporter autoradiography (Winkler et al. 2002). Furthermore, it is unlikely that pDMS lesions recovered during the course of our experiments. A detailed analysis of the time course by Blandini et al. (2007) suggests that a 6-OHDA induced loss of striatal DA terminals remained stable over 4 weeks. Likewise, using a similar concentration and volume of 6-OHDA as in our study, Faure et al. (2005) demonstrated a massive striatal DA depletion 16 weeks postlesion.
One animal from each group had to be excluded from the experiment because of very low lever-press rates. The remaining animals of both groups acquired the instrumental task (Fig. 6). A repeated measures ANOVA indicated a significant effect of day (F9,207 = 59.41, P < 0.001) but no effect of group (F < 1, n.s.) and no lesion × day interaction (F < 1, n.s.).
Outcome Devaluation Test in Extinction
Both groups showed a clear devaluation effect (Fig. 7A). This observation was confirmed by the statistical analysis. A repeated measures ANOVA revealed a significant effect of devaluation (F1,23 = 16.47, P < 0.001) but no effect of group (F < 1.5, n.s.) and no lesion × devaluation interaction (F < 1, n.s.).
Outcome Devaluation Test under Reward
When tested in the rewarded conditions, both groups also displayed a devaluation effect (Fig. 7B). Accordingly, a repeated measures ANOVA demonstrated a significant effect of devaluation (F1,23 = 39.09, P < 0.001) but no effect of group (F < 3, n.s.) and no group × devaluation interaction (F < 4, n.s.).
Contingency Degradation Training
As shown in Figure 8A, in sham controls there was a marked selective contingency degradation effect, whereas animals with pDMS DA depletion seemed to be sensitive to a degradation of the instrumental contingency, albeit to a lesser extent. A repeated measures ANOVA revealed a significant effect of degradation (F1,23 = 15.53, P < 0.001) but no effect of group (F < 1, n.s.). Furthermore, the group × degradation interaction missed significance (F1,23 = 2.13, P = 0.16). Contingency degradation effects were analyzed in more detail using planned contrasts for each training day. As shown in Figure 8A, sham controls were sensitive to the contingency degradation in 5 out of 6 days, whereas animals with pDMS DA depletion were sensitive in only 2 out of 6 days. Furthermore, there was a near significant day × degradation interaction (F5,115 = 2.27, P = 0.051) indicating that the contingency degradation effect became greater over the course of the training, regardless of the group (no day × degradation × lesion interaction; F < 1, n.s.).
Contingency Degradation Test in Extinction
As shown in Figure 8B, unlike sham controls, animals with pDMS DA depletion did not show a contingency degradation effect when tested in extinction. A repeated measures ANOVA revealed no effect of group (F < 1, n.s.), but a significant effect of degradation (F1,23 = 11.22, P < 0.01) and, most importantly, a significant interaction of group × degradation (F1,23 = 4.32, P < 0.05). Simple main effects analyses further showed that sham controls were sensitive to the degradation of the instrumental contingency (F1,12 = 10.37, P < 0.01), whereas animals with pDMS DA depletion were not (F < 1.7, n.s.).
Our results revealed that rats with PL DA depletion displayed a lower rate of lever pressing but, like sham controls, showed a selective outcome devaluation and contingency degradation effect. These findings suggest that PL DA modulates lever pressing but might not support instrumental conditioning. By contrast, rats with pDMS DA depletion had intact overall response rates and were sensitive to a selective outcome devaluation procedure; however, they showed a reduced sensitivity to contingency degradation. Thus, pDMS DA signaling seems not to be involved in the control of lever pressing but in instrumental conditioning by supporting the detection of the causal relationship between an action and its consequence.
Prelimbic DA and Instrumental Conditioning
Cell body lesions aimed at the PL abolished the sensitivity to outcome devaluation and contingency degradation suggesting that this PL subregion plays a critical role in instrumental conditioning (Balleine and Dickinson 1998; Corbit and Balleine 2003; Killcross and Coutureau 2003). Results from Experiment 1 suggest that DA signals in the PL may not be involved in encoding the value of an outcome and learning the causal relationship between an action and its consequences. Furthermore, our data demonstrate that rats with PL DA depletion displayed a reduced lever-press performance during acquisition and throughout all subsequent tests. Similarly, during acquisition of a task comparable to the one used here, overall lever-response rates were reduced by PL cell body lesions (Corbit and Balleine 2003). Furthermore, we observed a trend for a group × degradation interaction that could have reached significance with larger sample sizes, for example, as in Experiment 2. However, this trend for an interaction is predominantly due to the impaired overall lever-press rates in lesioned animals, that is, lever-press rates for the nondegraded lever were lower and differences between lever-press rates for the nondegraded versus degraded lever became smaller (Fig. 4). Notably, Naneix et al. (2009) also found that PL DA depletion did not affect sensitivity to outcome devaluation but, at variance with our findings, rendered animals insensitive to contingency degradation. The different contingency degradation protocols used in their study and ours might account for these discrepancies. Importantly, while we used an RR-20 schedule during contingency degradation training, Naneix et al. (2009) employed an RR-10 schedule resulting in a relatively high rate of noncontingent reward delivery (P (O/no A) = 0.1). Thus, a significant response competition between lever pressing and magazine-directed responses to check for reward could have occurred, thereby reducing the lever-press rate for the degraded lever. Furthermore, at variance with the study by Naneix et al. (2009), we provided the noncontingent reward both in sessions with and without contingency degradation to check for the selectivity of contingency degradation. In addition, we also examined contingency degradation during extinction in order to test for previously learned associations without contamination by new learning.
As PL DA depletion or DA receptor blockade did not affect gross motor activity (Seamans et al. 1998), nonspecific motor impairments are unlikely to account for the reduced lever-press performance. Computational models suggest that tonic DA levels report the average reward rate in free operant tasks thereby providing one way to explain why low DA levels are generally associated with less vigorous responding (Niv et al. 2006). In a similar vein, a reduced expectation of average reward could be one explanation for reduced response rates in animals with PL DA depletion seen here. However, as those animals seemed to be still sensitive to outcome devaluation and contingency degradation, it is unlikely that the reduced lever-press rates reflect impaired instrumental learning. Notably, sham controls pressed the valued lever more often in the rewarded outcome devaluation test than in the outcome devaluation test in extinction. There is evidence that instrumental conditioning involves many associative processes including the formation of outcome-response associations that support responding (see Corbit and Balleine 2003). Accordingly, in sham controls outcome delivery, by activating the valued outcome-response association, could selectively increase the rate of the associated action in the rewarded outcome devaluation test but not in the outcome devaluation test performed in extinction. By contrast, in animals with a PL DA depletion pressing of the valued lever was equally low both in the rewarded outcome devaluation test and the outcome devaluation test in extinction. Thus, it is conceivable that PL DA supports the formation of outcome-action associations. Yet, this speculation needs experimental support, for example, by analyzing whether PL DA depletions impair the specificity of instrumental reinstatement that depends on outcome-response associations. However, Corbit and Balleine (2003) showed that PL cell body lesions left outcome-response associations intact. Furthermore, Ostlund and Balleine (2005) recently demonstrated that cell body lesions of the medial PL cortex had no effect on the outcome-specificity of reinstatement but attenuated the response invigorating effects of the outcome. These findings are more consistent with the view that PL DA may play a role in invigorating responding than in the formation of outcome-action associations.
Taken together, findings from Experiment 1 suggest that PL DA invigorates lever pressing but may not support the encoding of action-outcome associations.
Striatal DA and Instrumental Conditioning
As cell body lesions abolished the sensitivity to both outcome devaluation and contingency degradation, the pDMS has been implicated in encoding action-outcome associations (Yin et al. 2005). Experiment 2 revealed that animals with pDMS DA depletion were sensitive to changes in the outcome value and could thus integrate an imposed change in value into an association between a specific action and the related outcome. Furthermore, selective responding in the outcome devaluation tests suggests that pDMS DA depletions did not compromise the rat's ability to discriminate between the 2 actions or the 2 outcomes. Therefore, rats with pDMS DA depletion seem to be able not only to encode the current value of an outcome but also to encode and retrieve action-outcome associations that guide selective responding. Hence, DA signaling in the pDMS does not seem to be involved in encoding action-outcome associations. By contrast, recent studies revealed that instrumental conditioning is compromised in DA-depleted rodents, an impairment that can be reversed by reinstating DA signaling in the dorsal striatum using viral gene transfer (Robinson et al. 2007). Furthermore, electrophysiological studies showed that the activity of DA neurons is modulated by the value of upcoming actions (Morris et al. 2006; Roesch et al. 2007). However, these studies do not necessarily contradict our findings as DA signaling in target areas of DA neurons other than the PL or pDMS may contribute to instrumental conditioning. Alternatively, DA signaling may not support goal-directed behavior that depends on representations of contingencies and outcomes as suggested by computational accounts (Niv and Schoenbaum 2008). In line with this notion, Dickinson et al. (2000) demonstrated that a systemic DA receptor blockade did not interfere with incentive instrumental learning in rats.
Importantly, Experiment 2 further revealed that the sensitivity to contingency degradation was impaired in animals with pDMS DA depletion. In the extinction test, sham controls responded less often to the lever previously associated with the noncontingent reward relative to the other lever, whereas animals with pDMS DA depletion responded to both levers at about equally low rates. The extinction test, which examined previously learned associations, is critical because lesions that impair the sensitivity to contingency degradation tested in extinction do not necessarily impair the rates of acquisition during contingency degradation training (Corbit et al. 2003). Thus, animals with pDMS DA depletion display an altered sensitivity to contingency degradation. Similar observations were made after cell body lesions of the pDMS (Yin et al. 2005); in the extinction test, lesioned animals pressed both levers at equal intermediate rates. As animals with pDMS DA depletion showed an outcome devaluation effect that probably relies on an intact action-outcome association, an impairment in encoding action-outcome associations may not account for their reduced sensitivity to contingency degradation. Furthermore, it is unlikely that pDMS DA depletion impaired the vigor of responding because lever pressing during instrumental training was unaffected on a RR-20 schedule, a schedule with high work requirements (Salamone et al. 2007). Alternatively, lesioned animals could have been unable to maintain high levels of responding in extinction because DA invigorates instrumental responding, particularly in the absence of a reward. However, this possibility is unlikely because the mean lever-press rates for the degraded and for the nondegraded lever during contingency degradation training (overall mean over 6 days) and during extinction were almost identical in lesioned animals (data not shown).
Surprisingly, lever-press rates for the degraded lever were only moderately higher in lesioned versus sham-lesioned animals. Similar observations have been made in contingency degradation studies after lesion of various brain areas including pDMS, PL, entorhinal cortex, and mediodorsal thalamus (Balleine and Dickinson 1998; Corbit et al. 2002, 2003; Corbit and Balleine 2003; Yin et al. 2005). The reasons for this phenomenon remain unclear.
Considerable evidence suggests that in many situations behavioral responding is governed both by Pavlovian and instrumental mechanisms (Dickinson and Balleine 1994). In addition, DA in the nucleus accumbens has been implicated in Pavlovian conditioning (Dalley et al. 2002; Parkinson et al. 2002; Lex and Hauber 2008) and DA in the dorsal striatum in instrumental conditioning (Robinson et al. 2007). Thus, lever pressing in rats with pDMS DA depletions could be maintained predominantly by Pavlovian rather than instrumental contingencies. Furthermore, rodent studies indicate that autoshaped lever pressing is supported by Pavlovian mechanisms as it was only minimally affected when lever pressing prevented the delivery of the outcome (Locurto et al. 1976). Collectively, these findings point to the possibility that responding of pDMS DA depleted rats to the degraded lever could reflect an automatic Pavlovian approach behavior elicited by the lever that acts as an appetitive Pavlovian stimulus. Likewise, intact outcome devaluation in rats with pDMS DA depletion could be maintained by Pavlovian mechanisms as responding subserved by Pavlovian stimuli can be sensitive to outcome devaluation (e.g., Holland and Straub 1979). If so, such mechanisms may not govern devaluation performance in animals with cell body lesions of the pDMS as they were insensitive to outcome devaluation (Yin et al. 2005).
Alternatively, in normal rats, reduced instrumental performance associated with contingency degradation has been interpreted to reflect the result of a comparison of the validity that an action is predictive of reward and the validity that the background or context is predictive of reward (e.g., Colwill and Rescorla 1986; Corbit and Balleine 2000). Accordingly, another possible explanation of the results from the contingency test is that animals with pDMS DA depletion are less sensitive to the high probability of an outcome delivery even if they do not perform an action. In other words, their reduced sensitivity to contingency degradation may be due not to a failure to encode action-outcome associations but to a failure to assess the rate of reward in the absence of an action. Notably, cell body lesions of the entorhinal cortex produced the same pattern of results as seen here, that is, they left the sensitivity to outcome devaluation intact but affected the sensitivity to a degradation in the instrumental contingency (Corbit et al. 2002). The authors hypothesized that the deficit in contingency sensitivity in animals with entorhinal lesions resulted from a reduced ability to calculate the background rate of reinforcement, that is, the rate of reinforcement in the absence of an action. Deficits in context conditioning produced by lesions of the entorhinal cortex (e.g., Maren and Fanselow 1997; Majchrzak et al. 2006) might account for this deficit, that is, in lesioned animals, the context may not become a valid predictor for reward during contingency degradation (Corbit et al. 2002). Although it is well known that major components of the hippocampal formation such as the hippocampus, entorhinal cortex, or subiculum play a role in context conditioning (Majchrzak et al. 2006; Ji and Maren 2008), their contribution to instrumental conditioning is less well understood. Lesion studies in rats revealed that the dorsal hippocampus and the subiculum do not mediate the sensitivity to contingency degradation (Corbit and Balleine 2000; Corbit et al. 2002), whereas the role of the ventral hippocampus is still unknown. Thus, among the components of the hippocampal formation examined so far, the entorhinal cortex seems to be critical for the detection of changes in the instrumental contingency. There is consistent evidence that the striatum and the hippocampal formation can act in an independent or competitive manner (Packard and Knowlton 2002; Poldrack and Packard 2003). However, as entorhinal lesions and pDMS DA depletions selectively impaired the sensitivity to contingency degradation, it is conceivable that both structures could interact to mediate instrumental responding. According to this account, rats with pDMS DA depletion may fail to integrate context-related entorhinal information and, in turn, are less sensitive to contingency degradation. Consistent with this notion, anatomical evidence suggests that the more posterior regions of the pDMS receive significant projections from the entorhinal cortex (McGeorge and Faull 1987, 1989). However, we cannot rule out that other DA-dependent mechanisms compromised in lesioned animals contribute to the reduced sensitivity to contingency degradation. For instance, contingency degradation also involves a transition of the predictive value of an action that could entail a DA-dependent prediction error signal (Schultz 2007b). Therefore, it is plausible that in animals with pDMS DA depletion, impaired prediction error signaling prevented the detection of a change of the instrumental contingency.
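The comparison between the action-based and the background rate of reward discussed above is often formalized as the contingency index ΔP = P(O|A) − P(O|no A), that is, the probability of an outcome given an action minus the probability of the same outcome in the absence of that action. The following minimal sketch illustrates this index only; it is not the analysis used in the present study. The data layout (`BinCounts`, counts of short time bins with or without a lever press) and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinCounts:
    """Hypothetical counts of short time bins from one session."""
    press_reward: int        # bins with a lever press and an outcome
    press_no_reward: int     # bins with a lever press and no outcome
    no_press_reward: int     # bins without a press but with an outcome
    no_press_no_reward: int  # bins without a press and without an outcome


def delta_p(c: BinCounts) -> float:
    """Return P(outcome | press) - P(outcome | no press)."""
    press_bins = c.press_reward + c.press_no_reward
    no_press_bins = c.no_press_reward + c.no_press_no_reward
    if press_bins == 0 or no_press_bins == 0:
        raise ValueError("need at least one bin with and one without a press")
    p_outcome_given_press = c.press_reward / press_bins
    p_outcome_given_no_press = c.no_press_reward / no_press_bins
    return p_outcome_given_press - p_outcome_given_no_press


# Illustrative numbers only (not data from the experiments reported here).
# Intact contingency: the outcome never occurs without a press.
intact = BinCounts(press_reward=10, press_no_reward=190,
                   no_press_reward=0, no_press_no_reward=800)
# Degraded contingency: free outcomes arrive at the same probability
# per bin as earned outcomes, so pressing no longer changes the odds.
degraded = BinCounts(press_reward=10, press_no_reward=190,
                     no_press_reward=40, no_press_no_reward=760)

print(delta_p(intact))
print(delta_p(degraded))
```

In this sketch, the first term of ΔP is identical for the intact and the degraded lever; only the second term, the background rate of reward, differs. An animal that cannot estimate this second term would therefore be expected to treat both levers alike, which is the pattern the background-rate account predicts.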
Interestingly, studies that examined spatial navigation in mazes provided support for the notion that the DMS and the hippocampal formation may form a functional circuit (Ragozzino et al. 2002; Mulder et al. 2004; Yin and Knowlton 2006). For instance, Yin and Knowlton (2004) found that, unlike sham controls, animals with pDMS cell body lesions tested in a cross-maze task used a response strategy instead of a place strategy, possibly due to an impaired representation of contextual cues. Based on this and other findings, Yin and Knowlton (2006) suggested that the hippocampus and the pDMS might form a functional circuit that mediates goal-directed behavior based on a representation of the environment. Goal-directed behavior tested in spatial navigation tasks and operant tasks as used here might involve different forms of learning; thus, respective comparisons have to be made with caution. Yet, our findings in Experiment 2 are consistent with the view that the pDMS is a key component of a corticostriatal circuit that mediates flexible goal-directed behavior based on contextual representation (Yin and Knowlton 2006). In addition, our data suggest that DA could be important in enabling or regulating the contextual information flow from the hippocampal formation to the pDMS. In a similar vein, electrophysiological work demonstrated that reward-directed behavior depends on the interaction between hippocampal inputs and neurons in the ventral striatum that is modulated by DA (Goto and Grace 2005).
Taken together, goal-directed instrumental action is characterized by 2 criteria: sensitivity to changes in the outcome value and changes in the contingency between action and outcome (Dickinson and Balleine 1994). Studies in humans (e.g., Tricomi et al. 2004; Tanaka et al. 2008) and rodents (e.g., Balleine and Dickinson 1998; Killcross and Coutureau 2003; Yin et al. 2005) implicated the medial PL cortex and the DMS in goal-directed action. For instance, cell body lesions of the PL in rats produced an insensitivity to outcome devaluation and contingency degradation (Balleine and Dickinson 1998; Corbit and Balleine 2003; Killcross and Coutureau 2003). Our present findings suggest that DA signals in the PL may not play a critical role in action-outcome learning. Furthermore, rodent studies using cell body lesions demonstrate that an inactivation of pDMS produced an insensitivity to outcome devaluation and contingency degradation (Yin et al. 2005). Our data show that DA signals in the pDMS are important for one aspect of goal-directed behavior, that is, they seem not to modulate sensitivity to outcome devaluation but may serve to detect changes in action-outcome contingencies. We propose that pDMS DA-modulation contributes to the control of goal-directed action by regulating the contextual information flow to the pDMS.
Deutsche Forschungsgemeinschaft (HA2340/8-2).
Conflict of Interest: None declared.