As animals learn novel behavioural responses, performance is maintained by two dissociable influences. Initial responding is goal-directed and under voluntary control, but overtraining of the same response routine leads to behavioural autonomy and the development of habits that are no longer voluntary or goal-directed. Rats normally show goal-directed performance after limited training, indexed by sensitivity to changes in the value of reward, but this sensitivity to goal value is lost with extended training. Rats with selective lesions of the prelimbic medial prefrontal cortex showed no sensitivity to goal value after either limited or extended training, whereas rats with lesions of the infralimbic region of the medial prefrontal cortex showed the opposite pattern of deficit, a marked sensitivity to goal value after both limited and extended training. This double-dissociation suggests that the prelimbic region is responsible for voluntary response performance and the infralimbic cortex mediates the incremental ability of extended training to override this goal-directed behaviour.
The ability to learn to perform purposive, goal-directed actions endows animals with a highly beneficial degree of behavioural flexibility in the face of ever-changing environmental conditions. However, this voluntary control of performance comes with a price in terms of effortful control and monitoring of the response, frequently reducing the capacity for alternative cognitive processing (Gehring and Knight, 2000). One way in which animals can come to balance the twin desire for simplicity and flexibility is through the development of habits (Dickinson, 1985). A venerable research history (James, 1890; Bryan and Harter, 1897; Kimble and Perlmuter, 1970; Boakes, 1993) documents the notion that an initially effortful and cognitively demanding response comes, with practice, to be produced fluidly and without difficulty. This two-process view is reflected in the development of theories of behavioural responding that depends on both mechanistic, reflexive stimulus–response (S–R) habits (either acquired or innate) and on actions that are voluntary and goal-directed.
Although instrumental conditioning is frequently described only in terms of S–R relationships, recent evidence suggests the involvement of at least two different forms of association, operating in tandem (Dickinson, 1985; Rescorla, 1991; Dickinson and Balleine, 1994). Empirical evidence from goal devaluation or revaluation experiments (Adams, 1982; Balleine and Dickinson, 1991, 1998a) indicates that during early stages of learning instrumental actions are goal-directed, requiring animals to encode the specific consequences of their actions as well as the causal relationship between the action and the goal, i.e. the nature of the response–outcome (R–O) association. As training proceeds, instrumental performance becomes habitual, stimulus-bound, and independent of the current value of the goal. This effect appears to depend not on repetition of the response per se, but perhaps on the fact that overtraining reduces the animals’ perceived correlation between performance of the response and achievement of the goal (Dickinson et al., 1995).
Recent research has highlighted the role of the prefrontal cortex in the control and organization of goal-directed behaviour (Watanabe, 1996; Tremblay and Schultz, 1999), the monitoring of ongoing voluntary action sequences (Gehring and Knight, 2000) and the planning and selection of appropriate actions based on anticipated reward (Petrides, 1995; Rowe et al., 2000). In rats, the medial part of the prefrontal cortex has been associated with the ability to learn the contingency between actions and specific outcomes (Balleine and Dickinson, 1998a). Other research (mostly concerning the nature and localization of procedural memory) suggests that the S–R associations likely to underpin habitual responding rely upon neural substrates that depend, at least in part, upon the integrity of the basal ganglia (Mishkin et al., 1984; Reading et al., 1991; Knowlton et al., 1996; White, 1997; Jog et al., 1999). However, even though psychological accounts of instrumental learning are frequently described in terms of a combination of these two processes, little work has examined the important issue of the neural underpinnings of the interaction between goal-directed and habitual processes that is responsible for everyday behaviour. As coordination of these two processes is perhaps best defined as an executive function (Shallice, 1988), one logical possibility is that this function is achieved by operations within different areas of the prefrontal cortex, providing the basis both for the development of goal-directed actions, and a permissive role for the ability of habits to override voluntary performance.
Both anatomical and behavioural data have shown that the medial prefrontal cortex is a heterogeneous structure, comprising the ventral infralimbic cortex underneath more dorsal prelimbic and anterior cingulate regions (Fisk and Wyss, 1999). The afferent, efferent and intrinsic connections of these regions can readily be dissociated (Sesack et al., 1989; Hurley et al., 1991; Takagishi and Tanemichi, 1991; Conde et al., 1995). The more ventral, infralimbic, region projects extrinsically to a variety of limbic and autonomic regions, including the hypothalamus, the amygdala, the bed nucleus of the stria terminalis, the periaqueductal gray, the dorsal motor vagal nucleus, the nucleus of the solitary tract and the parabrachial nucleus (Sesack et al., 1989; Hurley et al., 1991), as well as to the shell region of the nucleus accumbens (McGeorge and Faull, 1989; Berendse et al., 1992), and intrinsically shares restricted reciprocal connections to the prelimbic and dorsal peduncular cortices (Fisk and Wyss, 1999). In contrast, the more dorsal prelimbic region projects to core regions of the nucleus accumbens (Gorelova and Yang, 1997) as well as (from its dorsal extreme) to dorso-medial regions of the dorsal striatum (McGeorge and Faull, 1989; Berendse et al., 1992). Furthermore, reciprocal intrinsic connections exist between this region and the more dorsal anterior cingulate and medial agranular cortices, and from there to premotor and motor cortices (Bates and Goldman-Rakic, 1993; Morecroft and Van Hoesen, 1993; Lu et al., 1994). This interactive system in the medial wall of the prefrontal cortex is paralleled by a hierarchical flow of information through accumbens shell, core, central striatum and dorsal striatum under the influence of striato-nigrostriatal subcircuits (Haber et al., 2000). 
These two hierarchies may represent interconnected, parallel limbic, cognitive, motor systems, suggesting that components of the medial prefrontal cortex could act as an interface for the interaction between motivationally sensitive, cognitive, goal-directed responding and automatic habitual responses.
In the present study, comparing rats with selective, discrete excitotoxic lesions of the dorsal (prelimbic) or ventral (infralimbic) regions of the medial prefrontal cortex and sham-operated control animals, we have examined the role of these two sub-regions in animals’ ability to produce goal-directed actions, as well as the subsequent tendency for S–R habits to override these responses. We measured the sensitivity of responses to changes in goal value following either restricted or extended training of discriminable instrumental responses. Changes in goal value were achieved using a specific satiety procedure in which rats were pre-fed on a specific reinforcing outcome before testing.
Materials and Methods
Thirty-two male, Lister-hooded rats (275–300 g) were used. Rats were accustomed to the temperature- and humidity-controlled laboratory vivarium for 1 week. They were housed two per cage. Following recovery from surgery animals were maintained at ∼90% free-feeding weight. The vivarium was maintained at 21°C with the lights on from 8 a.m. to 8 p.m. All experiments were carried out during the light portion of the cycle. All procedures involving animals and their care conformed to institutional guidelines that comply with international (Directive 86-609, 24 November 1986, European Community) and national [UK Animals (Scientific Procedures) Act 1986] laws and policies.
Rats were anesthetized using isoflurane and then placed in a Kopf stereotaxic frame (Kopf Instruments, Tujunga, CA) in a flat skull position. The bone of the skull above the region to be lesioned was removed using a high-speed drill. Ibotenic acid (Biosearch Technologies Inc., San Rafael, CA) was dissolved in phosphate buffered saline (pH 7.4) to provide a solution with a concentration of 63 mM. This solution was injected into the brain through a glass pipette glued onto the end of the needle of a 5 μl Hamilton syringe held with a Kopf microinjector (Model 5000). For lesioning the infralimbic cortex, 0.1 μl of ibotenic acid was infused at the following coordinates (in millimetres from bregma): A-P (antero-posterior) + 3.0, L (lateral) ± 0.7, V (ventral) −5.4; for the prelimbic cortex 0.2 μl was given at the following coordinates: A-P + 3.2, L ± 0.7, D-V −4.0. Injections were made manually at a rate of 0.1 μl/min and the pipette was left in place for 3 min after the injection to allow diffusion of the solution into tissue. Rats in the sham group were given a similar surgical procedure but without injection of the neurotoxin. Eleven rats received lesions of the prelimbic cortex (group PL), 11 received lesions of the infralimbic cortex (group IL) and 10 served as sham-operated control animals. All subjects recovered for a period of 5–7 days after surgery with ad libitum access to food and water, after which the food deprivation schedule was maintained for 3 days prior to the start of behavioural procedures. Animals were individually handled for 5 min on each of these days.
Two separate sets of eight Skinner boxes, individually housed in sound- and light-attenuating chambers, were used. The two sets of boxes were easily discriminable in terms of size, wall colour and composition, lever dimensions and relationship of the levers and magazine to the front access panel of the chambers. Both fluid and pellet reinforcement could be delivered in each set of chambers. Chambers in one set (Paul Fray Ltd, Cambridge, UK) measured 25 × 25 × 22 cm. Each chamber had three aluminium walls and a clear Perspex front wall. The roof was made of clear Perspex and the floor consisted of 18, 5 mm diameter steel bars spaced 1.5 cm apart centre-to-centre, parallel to the back wall of the chamber. A recessed magazine that provided access to rewards via a hinged Plexiglas panel was located in the centre of the left-hand wall. Two box-type retractable levers could be inserted to the left and right of the magazine. A BBC Master-128 microcomputer equipped with the Spider extension for on-line control (Paul Fray Ltd) controlled the equipment and recorded the data. Chambers in the second set (Med Associates Inc., St Albans, VT) measured 30 × 24 × 21 cm. The chambers were made of clear Perspex except the right-hand wall, which was made of aluminium. A recessed food magazine panel was located in the centre of the right-hand wall and access was determined by means of infra-red detectors mounted across the mouth of the recess. Two flat-panel retractable levers could be inserted to the left and right of the magazine. The floor consisted of 19 steel rods, 4.8 mm in diameter, spaced 1.6 cm apart. An IBM-compatible microcomputer equipped with MED-PC software (Med Associates Inc.) controlled the equipment and recorded the data.
Training consisted of three stages: magazine training and two levels of lever press training (high and low training). The experimental design is summarized in Table 1. For any given animal, magazine training and high-training lever pressing took place in one type of chamber, and low-training lever pressing occurred in the alternative type of chamber. The two different rewards used in the experiment were 45 mg food pellets (Formula A/I; Noyes, Lancaster, NH) and 0.5 ml, 20% w/v sucrose solution. Initially, all rats were trained to collect rewards during 2 × 30 min magazine training sessions. Rewards were delivered on a random time 60 s schedule. The duration of each 30 min session was signaled by illumination of the houselight. All rats were then trained in a single 30 min session to press a lever (left or right, counterbalanced across animals) to earn rewards on a continuous schedule of reinforcement (i.e. every press was rewarded), after which sessions employed a variable interval 60 s schedule. This training continued for 15 × 30 min sessions, after which all animals had earned ∼450 of one reward-type. All animals were then transferred to 20 min sessions on a variable ratio 20 (VR20) schedule in these chambers. Animals now also received separate 20 min sessions of training in their alternative operant chambers, in which they pressed on the opposite-side lever (right or left, counterbalanced), initially on a VR10 (one session) and then a VR20 schedule, for the alternative reward. Hence, all animals now received two sessions of training per day, one in each type of chamber, for alternative rewards. The order of sessions was counterbalanced across animals and training days, with the second session starting ∼5 min after the first. All animals received up to 5 days of these VR20 training sessions across which they earned a total of 50 rewards of each type (strictly limited by the control program) in the two different operant chambers.
Therefore at the end of training animals had received a maximum of 20 sessions of training and ∼500 deliveries of one type of reward following one response (high training) in one set of chambers, but a maximum of only five sessions of training and 50 deliveries of the alternative reward following the second response (low training) in the alternative chambers.
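The asymmetry between the two training conditions can be checked with a short tally. The following is a minimal sketch only, not the control program used in the experiment: the function and dictionary names are hypothetical, and the per-stage counts are the approximate figures given in the training description above.

```python
# Approximate reward totals per response, taken from the training description.
# All names here are illustrative assumptions; the original control software
# is not described at this level of detail.

def total_rewards(stage_rewards):
    """Sum the rewards earned across the training stages for one response."""
    return sum(stage_rewards.values())

# High-training response: ~450 rewards earned before the VR20 stage,
# plus the 50 rewards allowed by the control program during VR20 sessions.
high_training = {
    "continuous_and_interval_stages": 450,
    "vr20_stage": 50,
}

# Low-training response: only the 50 rewards earned on VR10/VR20.
low_training = {
    "vr10_then_vr20_stage": 50,
}

high_total = total_rewards(high_training)
low_total = total_rewards(low_training)
ratio = high_total / low_total

print(f"high training: ~{high_total} rewards")
print(f"low training:  {low_total} rewards")
print(f"ratio:         ~{ratio:.0f} to 1")
```

On these figures the high-training response was reinforced roughly ten times as often as the low-training response.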
Devaluation Extinction Tests
Each rat then received 2 days of specific-satiety devaluation extinction testing during which lever press and magazine approach behaviours were measured. On the day after the final day of training, all rats were given free access in their home cages to one of the two types of reward (half receiving their low-training reward and half their high-training reward) for 60 min. Immediately after this pre-feeding treatment, the animals were placed in one type of operant chamber for a 15 min extinction session during which responding on the appropriate lever was measured in the absence of reward delivery. The animals were then transferred immediately to their alternative operant chambers and received an identical 15 min extinction test with the alternative lever. The order of testing was counterbalanced with respect to pre-fed reward and type of chamber. The following day, animals received a single 20 min, VR20 rewarded recovery session in each type of chamber. A second test was conducted on the following day. This was identical to the first, except animals were pre-fed with the alternative reward prior to extinction testing (again test order was counterbalanced with respect to pre-fed reward and chamber). After each of the extinction tests and in order to evaluate the effectiveness of the pre-feeding procedure in the different groups, animals were allowed free access to each of two rewards successively (the pre-fed and the non-pre-fed, order counterbalanced across animals and test days) for 30 min in their home cage and overall consumption was measured.
After behavioural testing, animals received a lethal dose of sodium pentobarbital and were perfused transcardially with saline (0.9%) followed by 10% formal saline. The brains were then removed and post-fixed for 2 h. After post-fixation, the brains were transferred to 0.1 M phosphate-buffered 20% w/v sucrose solution in which they remained at room temperature for 24 h. Coronal sections of the brains (40 μm thick) were cut using a freezing microtome (−20°C). The sections were collected onto gelatin-coated slides and dried at room temperature for 36 h before being stained with cresyl violet.
Histological analysis was performed by one of the authors (S.K.), who was blind to lesion condition. Sections were examined for the extent of lesion by microscopically examining slides for gross morphological changes, gliosis and scarring. Reconstructions were drawn from sections with reference to the atlas of Paxinos and Watson (Paxinos and Watson, 1998).
Statistical analyses were performed using analysis of variance (ANOVA) with a between subjects factor of group (sham, PL and IL), and within subjects factors of training (high versus low) and specific satiety devaluation (devalued versus non-devalued).
For histological analysis, the following criteria were followed for inclusion: (i) significant damage (or gliosis) to the targeted area; (ii) damage in both hemispheres; and (iii) no significant damage to the neighbouring structures. Figure 1 shows photo-micrographs of representative specific prelimbic (Fig. 1a) and infralimbic lesions (Fig. 1b). Figure 2 shows schematic reconstructions of lesions to either the prelimbic cortex (Fig. 2a) or the infralimbic cortex (Fig. 2b). As shown, the areas of the two lesion sites do not overlap, but are limited to prelimbic or infralimbic subregions of the medial prefrontal cortex. In group PL, lesions were acceptable in 10 rats. One rat showing damage that extended into the infralimbic region was discarded. Out of the 11 infralimbic cortex-lesioned rats, the lesions were acceptable in seven rats. The remaining four animals were discarded as they had only unilateral lesions (n = 3) or a lesion extending beyond the boundaries of the infralimbic cortex (n=1).
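The group sizes remaining after these exclusions determine the degrees of freedom of the subsequent analyses. The sketch below assumes a standard mixed-design ANOVA with one between-subjects factor and two-level within-subjects factors; the function name and dictionary keys are hypothetical, and the computation is a bookkeeping check rather than the analysis software actually used.

```python
# Degrees of freedom for a mixed-design ANOVA, given the final group sizes.
# Assumes one between-subjects factor (group) and within-subjects factors
# with two levels each (training, devaluation). Names are illustrative.

def mixed_anova_df(group_sizes, within_levels=2):
    """Return (numerator, denominator) df for the main types of effect."""
    n_total = sum(group_sizes.values())
    n_groups = len(group_sizes)
    df_group = n_groups - 1            # between-subjects effect
    df_subjects = n_total - n_groups   # subjects-within-groups error term
    df_within = within_levels - 1      # each two-level within factor
    return {
        "group": (df_group, df_subjects),
        "within": (df_within, df_within * df_subjects),
        "group_x_within": (df_group * df_within, df_within * df_subjects),
    }

# Group sizes after histological exclusion, as stated in the text:
# 10 shams, 11 - 1 = 10 prelimbic, 11 - 4 = 7 infralimbic.
final_n = {"sham": 10, "PL": 11 - 1, "IL": 11 - 4}

df = mixed_anova_df(final_n)
for effect, (num, den) in df.items():
    print(f"{effect}: F({num},{den})")
```

With 27 animals in three groups, this gives 24 error degrees of freedom, which agrees with the F(1,24) and F(2,24) ratios reported for the overall analyses.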
The three groups of rats acquired the initial instrumental response at the same rate (data not shown). By the end of training (last session prior to extinction testing), there was a main effect of extent of training, with higher rates of performance on the overtrained response [high training, 14.9 responses/min; low training, 8.1 responses/min; F(1,24) = 54.0, P < 0.001], but no effect of lesion, nor any interaction (Fs < 1; mean responses/min: group IL high = 14.1; low = 8.7; group PL high = 15.3; low = 6.9; sham high = 15.0; low = 8.9). This is in line with previous findings (Adams, 1982) and, furthermore, suggests that differences in baseline responding cannot account for any effects of the lesions on test performance. Similarly, although initial performance of the low-training response was lower than that of the high training response (first VR sessions, mean responses/min: group IL high = 12.0; low = 1.9; group PL high = 10.3; low = 2.2; sham high = 12.2; low = 2.0), analysis revealed only a main effect of extent of training [F(1,24) = 204.1, P < 0.001] and no main effect or interaction involving lesion [maximum F(2,24) = 1.1, P > 0.3]. Not only does this emphasize that there were no differences in baseline responding due to lesion across a wide range of response rates, but the failure of any substantial generalization and transfer between the high-training and low-training responses on this first VR session indicates that the different lever press responses in the two types of chamber were markedly discriminable for the animals.
Devaluation Extinction Test—Lever Press Performance
The upper panels of Figure 3 show the instrumental performance during the extinction tests for the sham animals, and lesioned groups PL and IL for both the low-training (Fig. 3a) and high-training responses (Fig. 3b), as a proportion of their baseline rates of pressing (which did not differ — see details above). As a result of pre-feeding with the low-training reward (devalued — closed bars), but not the high-training reward (non-devalued — open bars), performance of the low-training response decreased in the sham group, suggesting that the low-training response was goal-directed. By contrast, performance of the high-training response by control animals was no longer goal-directed as sham animals tested after pre-feeding with the high-training reward (devalued — closed bars) showed no differential reduction in performance of the high-training response relative to testing after pre-feeding with the low-training reward (non-devalued — open bars). This demonstration of the effects of devaluation by specific satiety on a minimally trained, but not overtrained, response reflects the presence of two forms of responding in these animals: the former goal-directed and the latter habitual. Following lesions of the infralimbic region of the medial prefrontal cortex (group IL), rats showed a sensitivity to reward devaluation for both the low-training and high-training responses, indicating that in these animals responding remained goal-directed despite extensive overtraining. In contrast, rats with lesions of the more dorsal prelimbic region of the medial prefrontal cortex (group PL) failed to show sensitivity to the devaluation of the reward of either the low-training or high-training response, indicating that in neither case was responding goal-directed.
Statistical analysis confirmed this description of the data. Mixed analysis of variance with between-subject factor ‘lesion’ (sham, infralimbic, prelimbic) and within-subject factors ‘training’ (low, high) and ‘devaluation’ (devalued, nondevalued), revealed a significant effect of devaluation [F(1,24) = 14.5, P < 0.01] and a lesion × devaluation interaction [F(2,24) = 5.5, P < 0.05]. The analysis also revealed a significant three-way interaction [F(2,24) = 3.5, P < 0.05], suggesting that the effect of lesion on devaluation depended upon the level of training. Further analysis of each lesion group individually indicated a training × devaluation interaction only in sham-operated rats [F(1,9) = 5.6, P < 0.05], with post hoc pairwise comparisons revealing that devalued and nondevalued performance differed after minimal training (P < 0.05), but not after overtraining. In contrast, rats with prelimbic lesions showed no effect of training or devaluation, nor any interaction [maximum F(1,9) = 1.3], whereas rats with infralimbic lesions showed no effect of training (F < 1) but a main effect of devaluation [F(1,6) = 19.1, P < 0.005].
The suggestion from Figure 3b that the non-devalued performance (open bars) of the high-training response by animals in group IL exceeded the equivalent performance by sham-operated animals was confirmed by post hoc Newman–Keuls pairwise comparison (P < 0.05). This difference in baseline extinction test performance is likely to be due to the overall suppression of habit-based or S–R performance in sham-operated animals (and group PL) by a general (as opposed to specific) satiety effect following pre-feeding. That is, although selective devaluation effects due to specific response–outcome associations are not seen following extensive training (no difference between devalued and non-devalued performance), general suppression of performance due to reduced motivation is known to directly impact S–R associations (Dickinson et al., 1995), following earlier drive (Hull, 1943) and two-process (Rescorla and Solomon, 1967) theories. Hence, the greater levels of non-devalued responding by group IL despite extensive training are again likely to reflect the persistence of goal-directed responding in this group relative to habitual performance in sham-operated animals and group PL. Interestingly, a parallel (although non-significant) trend is apparent in Figure 3a in which performance of the non-devalued low-training response is lowest in group PL (which may be showing a general suppression of S–R performance), relative to equivalent performance by sham-operated animals and group IL (which devaluation tests suggest is goal-directed).
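The proportion-of-baseline measure plotted in Figure 3 can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the response rates in the example are invented for illustration and are not data from the experiment.

```python
# Expressing extinction-test responding as a proportion of baseline, and
# comparing devalued with non-devalued performance.
# Function names and all example rates are hypothetical.

def proportion_of_baseline(test_rate, baseline_rate):
    """Test response rate divided by the baseline (end-of-training) rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return test_rate / baseline_rate

def devaluation_effect(nondevalued_prop, devalued_prop):
    """Positive values mean responding fell more when the reward was devalued."""
    return nondevalued_prop - devalued_prop

# Illustrative values only (responses/min), NOT taken from the experiment.
baseline_rate = 10.0
nondevalued_test_rate = 6.0
devalued_test_rate = 3.0

nondevalued = proportion_of_baseline(nondevalued_test_rate, baseline_rate)
devalued = proportion_of_baseline(devalued_test_rate, baseline_rate)
effect = devaluation_effect(nondevalued, devalued)

print(f"non-devalued: {nondevalued:.2f} of baseline")
print(f"devalued:     {devalued:.2f} of baseline")
print(f"difference:   {effect:.2f}")
```

On this measure, goal-directed responding appears as a clearly positive difference, whereas habitual responding yields a difference near zero even when both proportions are depressed by general satiety.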
Devaluation Extinction Test — Magazine Approach Performance
The lower panels of Figure 3 show the magazine activity during the extinction tests for the three groups of animals in the low-training (Fig. 3c) and high-training (Fig. 3d) chambers. Preliminary analysis revealed that there was no effect of lesion or training on baseline levels of magazine approach, nor an interaction of these factors (all Fs < 1). Hence, data from the test sessions were expressed as a proportion of baseline responses. Baseline rates were: group IL high = 8.0; low = 7.0; group PL high = 6.5; low = 6.8; sham high = 7.4; low = 6.9 responses/min. Pre-feeding produced a uniform decrease in magazine approach across all groups, regardless of level of training and irrespective of whether the animal was in the low-training or the high-training response environment. Hence, neither prefrontal lesion influenced the ability of specific satiety to produce a devaluation of magazine approach. Further, extensive exposure to a reward during response acquisition in the high-training condition had little effect on the ability of pre-feeding with that reward to reduce magazine responding. This suggests that magazine approach is under somewhat different neural and psychological control systems to instrumental responding. This result also indicates that the devaluation procedure was successful in reducing certain aspects of reward value, even though this did not extend to control of instrumental lever press performance. This was confirmed by statistical analysis. ANOVA as described above with factors of ‘lesion’, ‘training’ and ‘devaluation’, produced a main effect of devaluation [F(1,24) = 7.7, P < 0.05], but no main effect of lesion (F < 1), nor any interaction involving lesion (all Fs < 1). There was a main effect of training [F(1,24) = 7.5, P < 0.05], but no training × devaluation interaction (F < 1), reflecting a lower relative rate in extinction in the high-training condition.
The results of the consumption test confirmed that the specific satiety pre-feeding treatment successfully devalued the rewards in all groups. All animals readily rejected the reward on which they had just been sated and consumed high levels of the non-sated alternative (mean consumption/g: group IL sated = 12.9; non-sated = 2.9; group PL sated = 13.8; non-sated = 2.2; sham sated = 13.6; non-sated = 3.3). Statistical analysis revealed a main effect of satiety [F(1,24) = 395.7, P < 0.001], but no main effect of lesion or interactions (Fs < 1).
The present experiments investigated the effects of selective excitotoxic lesions of the prelimbic or infralimbic cortex on goal-directed behaviour and habitual responding. These lesions produced a double dissociation of the effects of overtraining on sensitivity of lever pressing to reward devaluation. Lever press responses of sham-lesioned animals showed sensitivity to devaluation after relatively small amounts of training, but this sensitivity was lost when training was more protracted. In contrast, animals with lesions of the prelimbic prefrontal cortex failed to show any sensitivity to devaluation even in the minimally trained condition, whereas those with lesions of the infralimbic cortex continued to demonstrate sensitivity to devaluation even after extensive training. These results have important implications for our understanding of the way in which goal-directed actions and habitual responses may be represented at the psychological level and coordinated by the prefrontal cortex.
Lesion-induced Changes in Instrumental Responding
Animals with lesions of the infralimbic prefrontal cortex showed sensitivity to goal devaluation at a stage of training when the responding of sham-operated control animals had become insensitive to goal value. This suggests that the infralimbic cortex is involved in the mechanism whereby overtraining comes to produce habitual responding that overrides goal-directed actions. Psychological theories of the interaction between goal-directed performance and habit learning have suggested that both processes develop in tandem. It has been proposed that the influence of habit learning increases across training, whereas the influence of goal-directed performance decreases as the experienced correlation of the rate of responding and rate of reward associated with that response decreases (Dickinson et al., 1995). Hence, the gradually increasing influence of habit behaviour over goal-directed performance could be mediated by an increasing level of inhibitory control exerted by the infralimbic region over the prelimbic cortex. However, it is unlikely that the infralimbic region itself is involved in associative processes underpinning habit formation. Current evidence surrounding the neural substrates of habit-based learning, both in humans and non-human animals, suggests a subcortical substrate involving the dorsolateral striatum in rats (Mishkin et al., 1984; Reading et al., 1991; White, 1997; Jog et al., 1999) or the neostriatum (caudate and putamen) in humans (Knowlton et al., 1996). In contrast there is little empirical evidence to suggest that the prefrontal cortex is directly involved in this process. Rather it may be hypothesized that this region provides the route by which habitual and reflexive forms of behaviour come to dominate voluntary, goal-directed responding.
Further, lesions of the ventral region of the medial prefrontal cortex have been associated with changes in inhibitory control, especially in the context of Pavlovian fear conditioning and anxiety (Frysztak and Neafsey, 1994; Morgan and LeDoux, 1995; Jinks and MacGregor, 1997; Morrow et al., 1999; Quirk et al., 2000). Damage to the infralimbic cortex leads to a general loss of inhibitory control and a decrease in the influence of prior learning on responding, supporting previous work examining the role of the ventromedial prefrontal cortex in affective inhibition (Dias et al., 1996), and work in humans demonstrating that damage to the ventromedial portion of the prefrontal cortex can lead to a loss of modulatory control of voluntary actions by emotional inputs (Damasio, 1994). Altogether, these data suggest that a general role of the infralimbic region could be to modulate the influence of reflexive responding by the inhibition of current goal-directed actions, as well as by modulation of downstream structures to which it has extensive connections, such as the central nucleus of the amygdala, and brainstem and hypothalamic regions associated with reflexive inhibition and fear responding. To be specific, it may be responsible for the suppression of goal-directed actions based on current contingencies in favour of reflexive responses based on previous learning. Further work is required to determine more precisely the distinct role of this region relative to other areas frequently implicated in inhibitory deficits, for example orbital regions in perseveration (Jones and Mishkin, 1972; Dias et al., 1996).
The failure of animals with lesions of the prelimbic region to show any sensitivity to devaluation confirms previous data (Balleine and Dickinson, 1998a), extending this work with more explicitly selective lesions, and in direct contrast to parallel lesions of the infralimbic cortex. This result strongly suggests that this region is crucial to the development and maintenance of representation of goals and/or to the apprehension of the relationship between purposive actions and those goals. This suggestion is in agreement with recent data highlighting the role of the prefrontal cortex in goal-directed responding (Petrides, 1995; Balleine and Dickinson, 1998a; Tremblay and Schultz, 1999) and may be placed in the broader context of mechanisms coordinating controlled and automatic responding. It is likely that the prelimbic region achieves this function by integrating information from a variety of sources. Indeed, anatomical data have shown that it has extensive bidirectional connections with other prefrontal regions known to be involved in the evaluation of goals (Conde et al., 1995), including the orbitofrontal (Gallagher et al., 1999; Baxter et al., 2000) and insular (Balleine and Dickinson, 2000) regions, the cingulate cortex (Watanabe, 1996), and subcortical connections to regions of limbic-motor interface (Groenewegen et al., 1991) implicated in the control of goal-directed action, including the basolateral nucleus of the amygdala (Killcross et al., 1997; Garcia et al., 1999; Blundell et al., 2001; Killcross and Blundell, 2002; Balleine et al., 2003) and the core subterritory of the nucleus accumbens (Baldwin et al., 2000). In order to achieve this coordinating role in the development of novel, goal-directed responses it is also possible that the prelimbic region plays some role in the active inhibition of prepotent or habitual responses, perhaps suggestive of a mutually inhibitory role for the prelimbic and infralimbic cortices.
As many previous studies examining the role of the prefrontal cortex in behavioural control have employed larger lesions covering both infralimbic and prelimbic medial prefrontal areas, the separable roles of these two regions revealed here may now provide insight into the behavioural effects of more general lesions. Damage to the infralimbic cortex appears to lead to a general loss of inhibitory control, perhaps specifically in the context of prior learning or innate responses (e.g. to novelty). In contrast, damage to the prelimbic region leads to an almost opposite effect, producing an increased influence of prior learning and habitual and innate responding on current behaviour. As it is hypothesized that the infralimbic region is responsible for the mechanism whereby these forms of learning come to dominate goal-directed behaviour coordinated by the prelimbic region, lesions encompassing both regions are likely to produce behaviour dominated by subcortical systems mediating behaviour controlled by prior learning, habit and innate reflexive responding. The combination of lesions will produce both dysexecutive (loss of goal-directed responding) and perseverative (stimulus-bound, habitual) behaviours commonly found following non-selective damage to the dorso-ventral aspect of the medial prefrontal cortex in both rats (Ragozzino et al., 1999; Gisquet-Verrier et al., 2000; Yee, 2000) and monkeys (Dias et al., 1996).
Implications for Goal-directed and Habitual Responding in Normal Animals
The results obtained in the sham-operated group are in line with previous findings (Adams, 1980; Adams and Dickinson, 1981), confirming that two mechanisms are likely to operate in the control of instrumental performance. Specific satiety induced by pre-feeding decreased performance of a minimally trained response, but not an extensively trained one, suggesting that overtraining converts a goal-directed behaviour into a stimulus-bound habitual response. Whereas these results are in line with some previous work, others (Colwill and Rescorla, 1985) failed to find an effect of amount of training on sensitivity to devaluation. However, it has been suggested (Dickinson et al., 1995) that this failure to observe an overtraining effect may be due to the use of a within-subjects experimental procedure in which animals received differential training on multiple manipulanda for multiple rewards. This may have caused animals to remain sensitive to individual R–O correlations irrespective of duration of training and therefore precluded the normal development of habits, or obscured their assessment. Although the present experiment also employed a within-subjects design, animals only ever received training with different R–O mappings in one of two contexts (either the low- or high-training conditions). This may have reduced the extent to which differential R–O training allowed animals to maintain sensitivity to individual R–O mappings and hence allowed the normal development of habitual responding with extended training.
These data may also have implications for the way in which R–O and S–R mechanisms interact at a psychological level. There are at least two possible forms of interaction. First, despite the transition to habitual responding with overtraining, R–O associations may persist, but be overridden by motor output due to S–R associations at the performance level. Secondly, the perception of instrumental contingency that underpins the existence of R–O associations may degrade with extensive training as reductions in the variability of rates of responding reduce the extent to which animals experience the R–O correlation (Dickinson, 1985). S–R associations are assumed to develop more gradually and to come to dominate responding as the perceived R–O correlation declines. Although not addressed directly by the current experiment, the failure to find any evidence for differences in the rate of acquisition of lever pressing between group IL where performance was always goal-directed and group PL where performance was always S–R might provide stronger support for the former hypothesis than the latter.
Finally, no effect of level of training was found in sham-operated animals with respect to magazine activity. This confirms previous suggestions that instrumental lever press responses and magazine approach behaviour may well be sustained by different psychological processes (Balleine and Dickinson, 1998b; Holland, 1998). This point is further emphasized by the failure to observe any effects in either of the two lesioned groups. Not only does this provide direct evidence of the success of the devaluation procedure in all groups, but also indicates that prefrontal mechanisms are not involved in the impact of devaluation procedures on Pavlovian approach responses. One possible reason for this is that the impact of devaluation in instrumental performance requires modulation of the value of goals through incentive learning, whereas the impact of devaluation on Pavlovian processes is mediated by more direct modulation of innate affective systems (Balleine, 2001).
The results of the present study suggest that the prelimbic and infralimbic areas act together as a coordinating interface between voluntary, guided behaviours and stimulus-bound, habitual response systems, respectively. This may be seen as the interface between behaviour guided by explicit declarative knowledge of the world and that governed by implicit, non-declarative knowledge that includes procedural skills, priming, Pavlovian conditioning and habituation processes (Squire and Zola, 1996). A potentially related distinction has been drawn between the selective role of prefrontal and anterior cingulate cortices in early learning (Gabriel and Orona, 1982; Bussey et al., 1996) and posterior cingulate region in late learning (Buchanan and Powell, 1982; Gabriel, 1990; Bussey et al., 1996), also suggesting the presence of multiple learning systems that may compete for behavioural expression.
The importance of this coordinating function in development (Goldman-Rakic, 1987), health and illness is very great. Changes in the balance of this system are likely to be associated with the progressive development of habitual drug addiction (Deminiere et al., 1989; Everitt and Wolf, 2002), associated with changes in ventromedial cortical function as the duration of exposure to drugs such as cocaine increases (Porrino and Lyons, 2000). Similarly, failures of this system would lead to disruption of executive control of goal-directed actions and are likely to be implicated in the loss of intentional control over action. Such losses of control are a frequent feature of neuropathology that follows frontal lobe damage and occur in a variety of human psychopathologies such as schizophrenia, autism and obsessive-compulsive disorders (Frith and Frith, 1999), as well as in normal, everyday-life lapses of attention (West and Alain, 2000).
|Level of training|15 × VI60|5 × VR20|Pre-feeding|Extinction|
|High|S1: Lp1 → O1|S1: Lp1 → O1|O1 or O2|S1: Lp1 → Ø|
|Low| |S2: Lp2 → O2|O1 or O2|S2: Lp2 → Ø|
S1/2, Lp1/2 and O1/2 refer to alternative experimental chambers, lever press responses and reward types, respectively. See text for further details.
The authors would like to thank Dr R. Honey for comments on a preliminary version of this manuscript. This work was supported by grants to S.K. from the Wellcome Trust and UK MRC.