For flexible control of behaviour, it is important to associate preceding behavioural response with its outcome. Since the dorsolateral prefrontal cortex (dlPFC) plays a major role in such control, it is likely that this area has a neuronal mechanism of coding response–outcome, such as reward/non-reward, based on the nature of the behavioural response made immediately before. To test this hypothesis, we examined neuronal activity in the dlPFC while monkeys performed a variant of the oculomotor delayed-response (ODR) task that had two reward conditions. In this task, the correct response was rewarded in half of the trials only and the subject could not expect the outcome (reward/non-reward). The response was followed by a fixation of 2 s (F2-period). We also employed a fixation (FIX) task that required monkeys to fixate on the peripheral target only, with two reward conditions that were similar to those in the ODR task. Post-response activity of a subset of dlPFC neurons was modulated by both the direction of the preceding response and its outcome. None of these neurons showed directional F2-period activity in the FIX task. These results suggest that a subset of dlPFC neurons represent response–outcome (i.e. reward/non-reward associated with directional saccade made immediately before).
As primates, including humans, can react flexibly to a given environment, it is important to associate the nature of behavioural response with its outcome. Since the dorsolateral prefrontal cortex (dlPFC) plays a major role in flexible/cognitive control of behaviour (Cohen et al., 1996; Miller and Cohen, 2001), this area may possess a neural mechanism to associate differential behavioural response with its outcome, particularly reward/non-reward. Indeed, the dlPFC in humans is activated by delivery of monetary reward (Thut et al., 1997; Pochon et al., 2002; Ramnani and Miall, 2003), while dlPFC neurons in monkeys show activity related to reward and/or expectancy of reward (Niki and Watanabe, 1979; Watanabe, 1989, 1996; Leon and Shadlen, 1999; Kobayashi et al., 2002; Watanabe et al., 2002). In addition to reward prediction, which was examined in these previous studies, it is also important in reward-based behavioural control to associate the immediately preceding response with its outcome, such as reward/non-reward (Balleine and Dickinson, 1998). Although a subset of dlPFC neurons appears to show differential reward-related activity depending on different behavioural responses (Watanabe, 1989), systematic studies on this issue are lacking; in particular, the neuronal association between cognitive spatial behaviour, whose control is a major function of the PFC (see below), and its outcome is completely unknown.
It is well known that a major role of the dlPFC is the cognitive control of spatial behaviour (Funahashi and Kubota, 1994; Goldman-Rakic, 1995; Fuster, 1997). The neuronal basis of this function has been extensively studied using the delayed-response paradigm. Such studies have demonstrated that the activity of dlPFC neurons is related to various aspects of these tasks, such as cue (Fuster, 1973; Niki, 1974; Kojima, 1980; Funahashi et al., 1990), delay (Kubota et al., 1974; Kojima and Goldman-Rakic, 1982; Funahashi et al., 1989; Rao et al., 1998; Sawaguchi and Yamane, 1999; Takeda and Funahashi, 2002) and response, i.e. saccades/reaching (Kojima, 1980; Boch and Goldberg, 1989; Funahashi et al., 1991). The magnitude of these task-related activities often differs according to the direction of the cue/response. Such ‘directional’ task-related activity has been considered a neural basis of visuospatial working memory processes that are critical for cognitive control of goal-directed spatial behaviour (Funahashi and Kubota, 1994; Goldman-Rakic, 1995; Fuster, 1997).
Thus, the dlPFC is likely to have a neuronal mechanism to associate different (directional) spatial behavioural response with its specific outcome, such as reward/non-reward, for flexible cognitive control of spatial behaviour (Cohen et al., 1996; Miller and Cohen, 2001; Schultz, 2001). We therefore hypothesized that a subset of neurons in the PFC codes response–outcome (reward/non-reward) based on the nature of the spatial response made immediately before, particularly a spatial working memory-guided response. To address this hypothesis, we recorded neuronal activity in the dlPFC while the monkey performed a variant of the oculomotor delayed-response (ODR) task (Funahashi et al., 1989), which had two conditions of reward contingency. In this task, the correct motor response was rewarded in half of the trials only and the monkeys could not anticipate the outcome (reward/non-reward) until the end of the response. We report here that post-response activity of a subset of dlPFC neurons was modulated by both the direction of preceding response and its outcome and, hence, coded the outcome associated with a differential (directional) spatial response.
Materials and Methods
Subjects and Task Procedures
Two male macaque monkeys (Macaca fuscata, ∼4.5 and ∼5.5 kg) were used as subjects. Throughout this study, the subjects were treated in accordance with the ‘Guide for Care and Use of Laboratory Animals’ of both the National Institutes of Health and our institutes.
Before training, the monkeys were habituated to a monkey chair and then preliminary surgery was performed under deep pentobarbital sodium anaesthesia (∼25 mg/kg i.v.) and aseptic conditions. The skull was partly exposed and two head-holding devices (stainless steel pipes, 8 mm internal diameter) were implanted on the anterior and posterior portions of the skull with dental acrylic. For grounding, small stainless-steel bolts (3 mm diameter) were anchored to the skull and fixed with dental acrylic. To prevent infection, antibiotics were injected i.m. on the day of surgery and daily for 1 week thereafter.
After recovery from the preliminary surgery, the subjects were trained to perform a variant of the ODR task with two different reward conditions: immediate reward (IM-Rw) and delayed reward (DL-Rw; Fig. 1a). In both conditions, the trial commenced when the monkey fixated on a central spot (a white square, 0.5 × 0.5°) on the CRT monitor. After 1.5 s, a cue (a white square, 0.5 × 0.5°) appeared at one of the six symmetric peripheral locations (eccentricity 15°; Fig. 1b) for 0.5 s. After a delay period of 3 s, the fixation spot was turned off, which instructed the monkey to make a memory-guided saccade to the cued location. When the eye movement fell inside a target window of 5° from the cue position, the cue reappeared as a target for a second fixation; in the IM-Rw condition only, a drop of water (∼0.1 ml) was delivered into the mouth of the monkey 500 ms later. In the DL-Rw condition, the click sound of the solenoid valve, which was used for reward delivery, was presented without delivery of the water. After the monkey fixated on the peripheral target for 2 s (i.e. 2.5 s from the end of the saccade), the same amount of reward (∼0.1 ml) was delivered in both conditions. The target remained for another 1 s and the monkey fixated on it, although this third fixation was not rewarded in either condition. Throughout the trial, the eye position was restricted to within 5° of the central fixation point or peripheral target and if the monkey broke fixation, the trial was aborted. The two conditions were randomly intermixed so that the monkey could not anticipate whether his saccade would be rewarded immediately (i.e. 500 ms) after the saccade.
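The trial structure described above can be summarized as a simple event schedule. The following Python sketch is illustrative only and is not the control software used in the experiments: the function name `make_trial`, the event labels, the cue angles (0° to 300° in 60° steps) and the independent random draw of the condition on each trial are all assumptions. Post-saccade times are given relative to the end of a correct saccade, because saccade latency varies from trial to trial.

```python
import random

# Six cue locations at 60° spacing (angles assumed, not taken from the task code).
CUE_DIRECTIONS_DEG = [0, 60, 120, 180, 240, 300]

# F2-period analysed in this study, in seconds from target onset.
F2_PERIOD_S = (0.5, 2.5)

def make_trial(rng):
    """Return one hypothetical trial: condition, cue direction and event times."""
    condition = rng.choice(["IM-Rw", "DL-Rw"])  # conditions randomly intermixed
    direction = rng.choice(CUE_DIRECTIONS_DEG)

    # Times in seconds from the start of central fixation.
    pre_saccade = [
        (0.0, "fixation on central spot"),
        (1.5, "cue on"),
        (2.0, "cue off, delay starts"),
        (5.0, "fixation spot off (go signal)"),
    ]
    # Times in seconds from the end of a correct saccade.
    if condition == "IM-Rw":
        first_outcome = "water (with valve click)"
    else:
        first_outcome = "valve click only"
    post_saccade = [
        (0.0, "target reappears"),
        (0.5, first_outcome),
        (2.5, "water (both conditions)"),
        (3.5, "target off"),
    ]
    return {
        "condition": condition,
        "direction_deg": direction,
        "pre_saccade": pre_saccade,
        "post_saccade": post_saccade,
    }

if __name__ == "__main__":
    trial = make_trial(random.Random(1))
    print(trial["condition"], trial["direction_deg"])
    for time_s, label in trial["post_saccade"]:
        print(f"  +{time_s:.1f} s  {label}")
```

The only difference between the two conditions in this schedule is the event at 0.5 s after the saccade, which is the contrast that the F2-period analysis relies on.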
The monkeys were first trained on a conventional ODR task, in which they were rewarded immediately after every correct saccade. Next, the two conditions were introduced simultaneously and the duration of the F2-period was then gradually extended to 2.5 s. The monkeys were sufficiently over-trained on the final version of the task that the order of training should be unrelated to differences in neuronal activity between the two conditions. At the end of the training sessions and throughout the recording sessions, the performance of both monkeys was almost perfect (>95% correct responses).
As a control task, we also employed a fixation (FIX) task with two reward conditions that were similar to the ODR task. After the subject fixated on the peripheral target for 2 s, the reward was delivered (IM-Rw condition), or the click tone of the solenoid valve was presented (DL-Rw condition). An additional 2 s-fixation (‘F2-period’) was followed by reward delivery in both conditions. The reward sizes were the same as for the ODR task.
The tasks and the recordings were controlled by a system consisting of an infrared eye-camera system (R-21C-A; RMS, Hirosaki, Japan), two personal computers (PC9801 FA and BA39; NEC, Tokyo) and other associated peripheral equipment. The eye-camera system was connected to the personal computers via A/D converters and was used for monitoring and sampling eye positions. The two personal computers were networked by RS232C and parallel I/O. One of the computers controlled the tasks, while the other monitored and collected the data for neuronal activities, eye positions and task events.
After training was completed, surgery for recording was performed. Under pentobarbital sodium anaesthesia (∼25 mg/kg, i.v.) and aseptic conditions, an oval opening was made in the skull to expose the dura over the frontal cortex and a stainless steel cylinder (20 × 40 mm) was implanted with dental acrylic. Prophylactic antibiotics were injected i.m. on the day of surgery and daily for 1 week thereafter.
The activity of single neurons was recorded with custom-made glass-insulated elgiloy microelectrodes (0.3–1.8 MΩ), using conventional electrophysiological techniques similar to those described in our previous studies (Sawaguchi and Yamane, 1999; Iba and Sawaguchi, 2002). The microelectrode was positioned using a pulse motor-driven micromanipulator (MO-81; Narishige, Tokyo) and a plastic grid with numerous small holes (0.7 mm internal diameter, 1.5 mm apart from each other) attached to the cylinder. We did not pre-screen neurons for task-related responses. Rather, we advanced the electrode until the activity of one or more neurons was well isolated and then commenced data recording. Data for neuronal activity were digitized by a window discriminator (DDIS-1; BAK Electronics, Germantown, MD). The data for task events and eye positions were also digitized using an A/D converter. These digitized data were stored in a data-collection computer and analysed off-line. Furthermore, all analogue data (i.e. neuronal activity, eye positions and task sequence) were recorded on digital audio tape (DAT) using an eight-channel DAT recorder (PC-208 M; Sony, Tokyo, Japan).
We focused on neurons in the dlPFC rostral to the frontal eye field (FEF) (Fig. 1c). To identify the FEF physiologically, we applied intracortical microstimulation (ICMS; 22 cathodal pulses of 0.3 ms duration at 333 Hz, up to 100 µA) through the recording electrodes. When eye movements were elicited by the ICMS, the site was considered to be within the FEF (Bruce et al., 1985) and data recorded from these sites were excluded from this study.
Since we were interested in neuronal processes associated with response–outcome, we focused on F2-period activity and applied a two-factor analysis of variance (ANOVA) to examine the effects of saccade direction and outcome (reward/non-reward) on activity during the F2-period. Neurons with significant main effects of both factors (P < 0.05) were the focus of this study, because their activity was influenced by both the preceding response and its outcome.
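The structure of this two-factor analysis can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it assumes a balanced design (the same number of trials in every direction × condition cell), the function name `two_way_anova_f` is hypothetical, and it returns only F ratios with their degrees of freedom. P values would then be obtained from the F distribution using a statistics package.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F ratios for a balanced two-factor design.

    cells[i][j] is the list of F2-period firing rates (spikes/s) for
    direction i and reward condition j. Every cell must contain the same
    number of trials (at least two).
    Returns {effect: (F, df_effect, df_error)}.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    dir_means = [mean(x for cell in row for x in cell) for row in cells]
    rew_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares for the two main effects, the interaction and the error.
    ss_dir = b * n * sum((m - grand) ** 2 for m in dir_means)
    ss_rew = a * n * sum((m - grand) ** 2 for m in rew_means)
    ss_int = n * sum(
        (cell_means[i][j] - dir_means[i] - rew_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    df_dir, df_rew = a - 1, b - 1
    df_int, df_err = df_dir * df_rew, a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "direction": (ss_dir / df_dir / ms_err, df_dir, df_err),
        "reward": (ss_rew / df_rew / ms_err, df_rew, df_err),
        "interaction": (ss_int / df_int / ms_err, df_int, df_err),
    }

if __name__ == "__main__":
    # Made-up toy data: 2 directions x 2 reward conditions x 2 trials.
    toy = [[[1, 3], [5, 7]], [[2, 4], [10, 12]]]
    for effect, (f_ratio, df1, df2) in two_way_anova_f(toy).items():
        print(f"{effect}: F({df1}, {df2}) = {f_ratio:.2f}")
```

With six directions and two reward conditions, `cells` would be a 6 × 2 grid; a neuron enters the main sample when both the `direction` and `reward` effects are significant.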
To examine the spatial tuning of the F2-period activity, we adapted the Gaussian function as follows:
where f(d) is the mean discharge rate as a function of saccade direction d, B indicates the ‘baseline’ discharge rate during the precue ‘control’ period (1 s before the cue presentation), D indicates the discharge rate at the best direction, T is 360°, A is the activity response strength and Td is an index of the tuning width, with a smaller Td value indicating sharper tuning.
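A minimal sketch of such a fit is given below. It assumes a commonly used Gaussian tuning form, f(d) = B + A·exp(−½((d − D)/Td)²), in which the difference d − D is wrapped into ±T/2 and D is treated as the direction at which activity peaks. Both this functional form and the grid-search procedure are assumptions made for illustration; they are not confirmed as the method used in the study, and the function names are hypothetical.

```python
import math

T = 360.0  # period of the direction variable, in degrees

def wrapped_diff(d, best):
    """Signed angular difference d - best, wrapped into [-T/2, T/2)."""
    return (d - best + T / 2) % T - T / 2

def gaussian_tuning(d, B, A, D, Td):
    """Assumed tuning function: baseline B plus a Gaussian bump of height A."""
    return B + A * math.exp(-0.5 * (wrapped_diff(d, D) / Td) ** 2)

def fit_tuning(directions, rates, B):
    """Grid search over D (1° steps) and Td (10° to 180°, 1° steps).

    B is fixed to the measured precue baseline. For each (D, Td) pair the
    model is linear in A, so A has a closed-form least-squares solution.
    """
    best = None
    for D in range(0, 360):
        for Td in range(10, 181):
            g = [math.exp(-0.5 * (wrapped_diff(d, D) / Td) ** 2)
                 for d in directions]
            A = (sum(gi * (r - B) for gi, r in zip(g, rates))
                 / sum(gi * gi for gi in g))
            sse = sum((B + A * gi - r) ** 2 for gi, r in zip(g, rates))
            if best is None or sse < best[0]:
                best = (sse, A, D, Td)
    sse, A, D, Td = best
    return {"A": A, "D": D, "Td": Td, "sse": sse}

if __name__ == "__main__":
    dirs = [0, 60, 120, 180, 240, 300]
    # Synthetic rates generated from known parameters, not recorded data.
    rates = [gaussian_tuning(d, B=5.0, A=20.0, D=300, Td=50) for d in dirs]
    print(fit_tuning(dirs, rates, B=5.0))
```

With only six sampled directions the fit is weakly constrained, so in practice a nonlinear least-squares routine with confidence intervals on D and Td would be preferable to a coarse grid.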
We recorded activity from a total of 735 neurons in the dlPFC while the monkey performed the ODR task under the two conditions and we examined activity during the post-response F2-period (500–2500 ms after target onset). In sum, 309 (42%) neurons showed a significant main effect in at least one of the two factors, as revealed by two-way ANOVA (direction × reward condition, P < 0.05). Of the 309 F2 neurons, 156 (51%) showed significantly different activity between the reward conditions. Most of them (104/156, 67%) also showed significant differences according to the direction of the preceding response. We focused here on these 104 neurons, where post-response activity was affected by both response direction and reward-contingency, since they are likely to code response–outcome based on the nature of response made immediately before. This sample of neurons contained two distinct types: ‘Rw– type’ (n = 41), which showed significantly higher F2-period activity in the DL-Rw condition than in the IM-Rw condition, and ‘Rw+ type’ (n = 37), which showed significantly higher F2-period activity in the IM-Rw condition. The F2-period activity of the remaining 26 neurons (‘complex type’) was not as straightforward, showing a significant interaction between the direction and reward conditions in addition to the main effects. Most Rw– and Rw+ neurons did not show any clear change in activity preceding the onset of the response: for Rw– neurons, only 20% (8/41) showed directional cue and/or delay period activity (one-way ANOVA, P < 0.05; n = 3 for cue only, n = 2 for delay only and n = 3 for both cue and delay), whereas for Rw+ neurons, only 19% (7/37) showed directional cue and/or delay period activity (n = 2 for cue only, n = 3 for delay only and n = 2 for both cue and delay). The activity preceding the response was relatively weak and it was not apparent in the population activity of either Rw– or Rw+ neurons, as is evident in the population histograms (see below).
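The grouping of neurons described above amounts to a small decision rule, sketched here in Python. The function name and argument names are hypothetical, and the use of the same 0.05 threshold for the interaction term is an assumption; the sketch only restates the classification logic and is not the analysis code used in the study.

```python
def classify_f2_neuron(p_direction, p_reward, p_interaction,
                       rate_dl, rate_im, alpha=0.05):
    """Classify one neuron from its two-way ANOVA results.

    rate_dl and rate_im are the mean F2-period rates in the DL-Rw and
    IM-Rw conditions. Returns None when the neuron lacks a significant
    main effect of either direction or reward condition.
    """
    if p_direction >= alpha or p_reward >= alpha:
        return None            # not part of the 104-neuron sample
    if p_interaction < alpha:
        return "complex"       # main effects plus a significant interaction
    return "Rw-" if rate_dl > rate_im else "Rw+"

if __name__ == "__main__":
    print(classify_f2_neuron(0.01, 0.01, 0.40, rate_dl=25.0, rate_im=8.0))
    print(classify_f2_neuron(0.01, 0.01, 0.40, rate_dl=6.0, rate_im=19.0))
    print(classify_f2_neuron(0.01, 0.01, 0.01, rate_dl=12.0, rate_im=15.0))
    # The three reported groups account for the whole sample.
    print(41 + 37 + 26 == 104)
```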
F2-period Activity of Representative Neurons
An example of an Rw– neuron is shown in Figure 2a. In this figure, raster displays and averaged histograms of neuronal activity are illustrated separately for the two conditions and the six directions. This neuron showed an increase in activity during the F2-period in the DL-Rw condition, particularly in lower right (300°) trials. In contrast, this neuron showed little change in activity during the F2-period in the IM-Rw condition. The two-factor ANOVA confirmed significant main effects of both direction and reward condition (P < 0.01 for both factors). Figure 2b illustrates tuning curves of F2-period activity for the neuron shown in Figure 2a, which were estimated using a Gaussian function. F2-period activity in the DL-Rw condition had a clear spatial tuning (Td = 51.7°, best direction = 277°), which was not evident in the IM-Rw condition.
Figure 3a shows an example of an Rw+ neuron. This neuron showed an increase in activity during the F2-period in the IM-Rw condition, particularly in 60° and 120° trials, showing a clear spatial tuning (Fig. 3b, Td = 61.5°, best direction = 102°). In contrast, this neuron did not show such directional F2-period activity or spatial tuning in the DL-Rw condition. The ANOVA revealed significant main effects of the two factors (direction and reward condition, P < 0.01 for both factors).
Figure 4 shows two examples of ‘complex’ neurons, which showed rather complex F2-period activity with a significant interaction in addition to the main effects of reward and direction. In this figure, the mean discharge rate during the F2-period for each reward condition is plotted separately for the six directions, with a Gaussian function-fitting curve applied to each data set. The neuron in Figure 4a showed directional F2-period activity, especially in the DL-Rw condition, with a best direction of 354°, whereas this neuron showed spatial tuning for almost the opposite direction in the IM-Rw condition (best direction = 176°), although the tuning was not as clear. The neuron shown in Figure 4b showed a similar pattern of interaction between direction and reward condition, although the magnitude of F2-period activity was larger and spatial tuning was sharper in the IM-Rw condition (best direction = 275° for DL-Rw and 61° for IM-Rw conditions). Although these complex neurons are likely to code response–outcome, we excluded them from further analyses because their relatively small number (n = 26) and complex activity prohibited analysis.
Further Examinations of the Properties of F2-period Activity
To examine further the properties of the activity of Rw– and Rw+ neurons, we summed neuronal activity for the direction that had the greatest differential activity between the two conditions, separately for each type, and constructed population histograms (n = 41 for Rw– and n = 37 for Rw+ neurons; see Fig. 5). At the population level, the two types of neurons showed distinct F2-period activity according to the reward condition, although both the magnitude and time course were similar between the two types. Further, at the population level, neither type of neuron showed any clear change in activity preceding the response, indicating that these neurons form distinct groups that differ from neurons showing activity in other periods, such as the cue and delay periods, as described above.
To compare the spatial tuning of Rw– neurons in the DL-Rw condition with that of Rw+ neurons in the IM-Rw condition, we calculated an index of the tuning width (Td) and the best direction using a Gaussian function. Td values were similarly distributed for the two groups (mean ± SD, 76.59 ± 18.76 for Rw– neurons in the DL-Rw condition versus 81.89 ± 20.68 for Rw+ neurons in the IM-Rw condition; P > 0.05, Mann–Whitney U-test). The best directions were also similarly distributed between the two groups (mean ± SD, 188 ± 107° for Rw– versus 163 ± 109° for Rw+ neurons; P > 0.05, Mann–Whitney U-test) and did not show any bias in the visual field. Thus, the two groups of neurons showed similar spatial tuning in terms of width and direction.
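The group comparison above uses the Mann–Whitney U-test. A minimal pure-Python sketch of that test is shown below, using midranks for ties and a two-sided normal approximation without continuity correction; these implementation choices and the function name are assumptions, and the sample values in the usage example are invented for illustration rather than taken from the recorded neurons.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t         # tie correction for the variance
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)

    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

if __name__ == "__main__":
    # Invented Td values (degrees) for two small groups of neurons.
    td_rw_minus = [52.0, 61.5, 70.2, 75.8, 88.1, 96.4]
    td_rw_plus = [58.3, 66.0, 79.9, 84.7, 91.2, 104.5]
    u, p = mann_whitney_u(td_rw_minus, td_rw_plus)
    print(f"U = {u:.1f}, P = {p:.3f}")
```

The normal approximation is reasonable for group sizes such as 41 and 37; for very small samples an exact test would be more appropriate.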
Considering the nature of the present behavioural paradigm, it was possible that the neuronal response of Rw– and Rw+ neurons during the F2-period was influenced by the reward condition of previous trials. To examine this possibility, we calculated the mean discharge rate separately according to the condition of the immediately preceding trial. Figures 2c and 3c show the results of this analysis for the neurons shown in Figures 2a and 3a, respectively. In these figures, the recalculated F2-period activity for the direction and condition with maximum activity (i.e. 300° in the DL-Rw condition for Fig. 2, 60° in the IM-Rw condition for Fig. 3) is illustrated separately for each condition of the previous trial. As shown in Figures 2c and 3c, neither the Rw– nor the Rw+ neuron showed any clear difference in activity between the reward conditions of the immediately preceding trial. Similar results were obtained for all Rw– and Rw+ neurons. Thus, Rw– and Rw+ neurons did not appear to be influenced by the history of reward delivery in previous trials.
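The trial-history analysis above can be sketched as a simple regrouping of trials. In the following Python illustration the data layout (a chronological list of dictionaries with `condition`, `direction` and `f2_rate` keys) and the function name are hypothetical, and the previous trial is assumed to be the immediately preceding completed trial in the session.

```python
from statistics import mean

def split_by_previous_condition(session, direction, condition):
    """Mean F2-period rate for one direction/condition, grouped by the
    reward condition of the immediately preceding trial.

    session is a chronological list of dicts with the keys 'condition',
    'direction' and 'f2_rate'. A group with no trials maps to None.
    """
    groups = {"IM-Rw": [], "DL-Rw": []}
    for prev, cur in zip(session, session[1:]):
        if cur["direction"] == direction and cur["condition"] == condition:
            groups[prev["condition"]].append(cur["f2_rate"])
    return {k: (mean(v) if v else None) for k, v in groups.items()}

if __name__ == "__main__":
    # Invented example session, not recorded data.
    session = [
        {"condition": "IM-Rw", "direction": 60, "f2_rate": 5.0},
        {"condition": "DL-Rw", "direction": 300, "f2_rate": 20.0},
        {"condition": "DL-Rw", "direction": 300, "f2_rate": 22.0},
        {"condition": "IM-Rw", "direction": 300, "f2_rate": 4.0},
        {"condition": "DL-Rw", "direction": 300, "f2_rate": 24.0},
    ]
    print(split_by_previous_condition(session, 300, "DL-Rw"))
```

Similar mean rates in the two groups, as in this invented example, correspond to the absence of a trial-history effect.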
F2-period Activity in the FIX Task
Although the F2-period activity of Rw– and Rw+ neurons showed a clear spatial tuning, it was possible that directional selectivity was associated not with the direction of the saccade made immediately before, but with the eye position on a specific peripheral target. To test this possibility, some Rw– and Rw+ F2 neurons were also examined in a fixation (FIX) task under two reward conditions that were similar to those of the ODR task (see Materials and Methods), in which the monkey was required to fixate solely on the target that appeared at one of the six locations. After fixation on the target for 2 s, the reward was delivered in the IM-Rw condition only; an additional 2 s-fixation (‘F2-period’) was followed by reward in both the DL-Rw and IM-Rw conditions. F2-period activities of two representative neurons examined in both tasks are illustrated in Figure 6, in which the F2-period activity for the direction with the most differential activity between the two reward conditions in the ODR task is illustrated separately for the four conditions. The neuron in Figure 6a showed higher F2-period activity, especially in the lower right (240°) trials (illustrated direction) in the DL-Rw condition of the ODR task only (i.e. Rw– type). However, such a difference did not appear in the FIX task, despite the same eye position during this F2-period. Further, this neuron showed spatial tuning in the DL-Rw condition in the ODR task only (Fig. 6c). Similarly, an Rw+ neuron in Figure 6b showed differential activity in the IM-Rw condition of the ODR task only and had spatial tuning for only one of the four conditions (i.e. IM-Rw condition in the ODR task; Fig. 6d).
Among 11 neurons examined in the four conditions of the two tasks (DL-Rw and IM-Rw conditions of the ODR and FIX tasks), none showed significantly different F2-period activity according to the reward conditions in the FIX task. To clarify this point, we made population histograms of Rw– and Rw+ neurons that were examined in both the ODR and FIX tasks (n = 7 for Rw–, n = 4 for Rw+), illustrated in Figure 7. As expected, the neuronal population showed differential F2-period activity between reward conditions in the ODR task only (Fig. 7a and 7b for Rw– and Rw+ neurons, respectively), while such a significant difference in activity was not observed in the FIX task (Fig. 7c and 7d for Rw– and Rw+ neurons, respectively).
The present study examined neuronal activity in the dlPFC of monkeys performing a variant of the ODR task under two reward conditions. We found that a subset of neurons showed differential post-response (‘F2-period’) activity, depending on whether the response had been rewarded, the magnitude of which differed with the direction of saccade made immediately before. The differences between the directions did not appear to be associated with the position of the gaze, but, rather, with the behavioural response (i.e. directional memory-guided saccade) made immediately before. The difference in activity between reward and non-reward conditions could not be ascribed to any aspects of the saccadic response and sensory events, because the subject was not able to anticipate reward delivery/non-delivery until the end of saccade and the visual and auditory (including the click sound by reward delivery) events were exactly the same in both conditions. Further, the directional difference in activity cannot be explained by such factors as licking movements or gustatory stimuli, since these factors should be the same for all directions of each condition. In addition, the difference in activity between two conditions did not seem to represent differences in the predictability of reward, because in our paradigm, DL- and IM-Rw conditions were randomly intermixed and the same amount of reward was delivered in both conditions after the F2-period. Thus, our findings suggest that a subset of neurons in the dlPFC code directional spatial response–outcome (i.e. reward/non-reward associated with directional saccade made immediately before).
Several studies have suggested that single neurons in the PFC show directional delay-period activity that is modulated by predicted reward (Watanabe, 1996; Leon and Shadlen, 1999; Kobayashi et al., 2002). Further, the PFC neurons have been demonstrated to code associations between sensory stimuli and the predicted reward (Watanabe, 1990, 1992). These studies have suggested that such reward prediction can contribute to reward-based control/learning of behaviour. However, it is also important for reward-based behavioural control to associate reward/non-reward with the response made immediately before (Balleine and Dickinson, 1998). We demonstrate here that the dlPFC neurons show post-response activity that was modulated by both the direction of the prior response and its reward-contingency, suggesting that such neurons encode retrospective, but not prospective/expectational, response–outcome association.
A few previous studies using tasks with GO/NO-GO responses reported that neurons in the dlPFC showed differential reward-related post-trial activity according to the preceding response, i.e. GO/NO-GO (Niki and Watanabe, 1979; Kubota and Komatsu, 1985; Watanabe, 1989). However, systematic analysis of these post-trial activities has been limited, particularly for cognitive spatial behaviours. In contrast to these previous studies, we adopted a variant of an ODR task that has been widely used to examine the involvement of the PFC in visuospatial working memory in monkeys (Funahashi et al., 1989; Sawaguchi and Iba, 2001) as well as humans (Sweeney et al., 1996). These studies with the ODR task have suggested that the dlPFC is involved in the cognitive control of spatial behaviour, particularly spatial working-memory guided behaviour (for reviews, see Funahashi and Kubota, 1994; Goldman-Rakic, 1995; Fuster, 1997). In agreement, several lines of research have reported that the dlPFC is critical for performance of a spatial self-ordered task (Passingham, 1985; Owen et al., 1996; Levy and Goldman-Rakic, 1999), suggesting that this region is involved in the monitoring and/or manipulation of outcome of a spatial response. Here, for the first time, we found that distinct dlPFC neuronal groups code the outcome (reward/non-reward) of a different spatial response made immediately before. The PFC has been suggested to be involved in processes of updating the status to select a response (Owen et al., 1996; Funahashi and Inoue, 2000; Petrides, 2000), which is essential for contextually dependent, flexible control of behaviour. The Rw– and Rw+ neurons described here may subserve such a flexible/dynamic control of appropriate spatial behaviour by means of contributing to updating signals.
Several lines of research have provided evidence that the dlPFC is involved in learning (Petrides, 1986; Asaad et al., 1998; Fletcher et al., 2001), in which a response is reinforced according to the response–reward contingency and the response is extinguished when a response is not rewarded. Rw– and Rw+ neurons have characteristics that are typically associated with learning processes, such as reinforcement and extinction, although our paradigm did not require any learning of outcome information.
Thus, for the first time, we provide evidence that the dlPFC contains distinct neuronal groups that code outcome following the differential spatial response, which should play a role in associating different (directional) response with its outcome. Such coding may be critical in flexible control and/or learning of appropriate spatial behaviour according to a rule or situation that can change dynamically, which has long been considered a critical role of the PFC (Milner, 1963; Miller and Cohen, 2001; Wallis et al., 2001). However, it is still unclear whether the activation of F2 neurons is actually used for such behavioural control, because our paradigm did not explicitly require the subject to select the next response based on the response–outcome; this problem/topic should be approached in further studies using more appropriate behavioural paradigms. Nevertheless, our findings of distinct neuronal groups that code response–outcome would be significant for understanding the neuronal mechanisms of flexible control/learning of cognitive spatial behaviours mediated by the dlPFC.
The authors thank Dr K. Amemori for valuable comments on this study. The authors also thank K. Watanabe-Sawaguchi and E. Ishida for their assistance with animal care and surgery. This work was supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science, and Technology.