In 2 human event-related brain potential (ERP) experiments, we examined the feedback error–related negativity (fERN), an ERP component associated with reward processing by the midbrain dopamine system, and the N170, an ERP component thought to be generated by the medial temporal lobe (MTL), to investigate the contributions of these neural systems toward learning to find rewards in a “virtual T-maze” environment. We found that feedback indicating the absence versus presence of a reward differentially modulated fERN amplitude, but only when the outcome was not predicted by an earlier stimulus. By contrast, when a cue predicted the reward outcome, then the predictive cue (and not the feedback) differentially modulated fERN amplitude. We further found that the spatial location of the feedback stimuli elicited a large N170 at electrode sites sensitive to right MTL activation and that the latency of this component was sensitive to the spatial location of the reward, occurring slightly earlier for rewards following a right versus left turn in the maze. Taken together, these results confirm a fundamental prediction of a dopamine theory of the fERN and suggest that the dopamine and MTL systems may interact in navigational learning tasks.
Human learning is mediated by multiple interacting neural systems: the midbrain “dopamine system,” the basal ganglia, the prefrontal cortex, the anterior cingulate cortex (ACC), the hippocampus, the parahippocampal cortex, the cerebellum, the amygdala, and many others. Although these systems appear to process the same information—namely, stimulus events, responses, and reinforcers—each encodes the relationships between the elements of that information in its own way. Thus, for example, learning to find rewards in a novel environment depends both on reinforcement learning mediated by midbrain dopamine neurons and their neural targets (e.g., the basal ganglia, ACC, and prefrontal cortex), hereafter, the dopamine system, and on spatial learning mediated by the medial temporal lobe system (MTL system), which is comprised of the hippocampal formation (i.e., CA fields, subicular complex, and the dentate gyrus) and surrounding structures in the “parahippocampal region” of medial temporal cortex (i.e., parahippocampal, perirhinal, and entorhinal cortices). In a simple navigational task, the dopamine system reinforces behaviors that yield reward (Schultz 1997, 1999, 2006; Schultz et al. 1997; Miller and Cohen 2001; Packard and Knowlton 2002), whereas the MTL system has the ability to compress and store information related to the external environment (McClelland et al. 1995; Tulving and Markowitsch 1997, 1998; O'Reilly and Rudy 2001; Squire et al. 2004; Duzel et al. 2003). Nevertheless, although both the dopamine and MTL systems contribute to navigational learning, the interaction between these systems in the service of common goals is not well understood (Packard and Knowlton 2002; White and McDonald 2002; Atallah et al. 2004). Here we investigated this issue in a human event-related brain potential (ERP) experiment using a task that involves both reinforcement learning and spatial processing.
Maze Learning: A Special Case
Maze learning presents an interesting problem for investigation because it could in principle depend on the MTL system, the dopamine system, or both (Packard and McGaugh 1996; Packard and Knowlton 2002; White and McDonald 2002; Poldrack and Packard 2003; Squire et al. 2004). Following the pioneering work of Thorndike (1933), theorists initially assumed that mazes were learned by a process of reinforcement learning. According to Thorndike's Law of Effect, the appropriate path through a maze could be learned through a lengthy trial-and-error process whereby if an action is followed by a reward or punishment then that action will be more or less likely, respectively, to reoccur (Catania 1999). On this view, instances of serially organized behavior reduce to discrete stimulus–response units, each linked to the next by virtue of reinforcement (Terrace 2005). It is now well documented that dopamine neurons in the ventral tegmental area and substantia nigra pars compacta distribute information about rewarding events, such that phasic increases in dopamine activity, seen as bursts of action potentials, are elicited when events are “better than expected,” and phasic decreases of dopamine activity, seen as transient cessations from baseline firing rate, are elicited when events are “worse than expected” (Schultz 1997, 1998a, 1998b). And with learning, the dopaminergic signal “propagates back in time” from the time the reward is delivered to stimuli that predict the outcome (Schultz et al. 1997; Schultz 1998a; Suri and Schultz 2001; Montague et al. 2004). Further, several groups of investigators have noted similarities between this phasic activity of the midbrain dopamine system and a particular reinforcement learning signal called a temporal difference error (TD error) (Schultz et al. 1997). 
These investigators have suggested that the midbrain dopamine system carries the TD errors to its neural targets, where the signals are used for the purpose of action selection and reinforcement learning (for review, see Montague et al. 2004). Importantly, these TD errors appear to be utilized by cortical structures (especially orbital frontal cortex, dorsolateral prefrontal cortex, and ACC) and the basal ganglia for the purpose of cognitive control and decision making (Schultz et al. 1998, 2000; Miller and Cohen 2001; Suri 2001; Cohen et al. 2002; Holroyd and Coles 2002; Packard and Knowlton 2002; Frank and Claus 2006).
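The backward propagation of the prediction error from the reward to a predictive stimulus can be illustrated with a minimal tabular TD(0) sketch. The learning rate, the number of trials, the reward magnitude, and the assumption that the pre-cue baseline state has a fixed value of zero are all illustrative choices, not parameters taken from the studies cited above.

```python
# Minimal tabular TD(0) sketch: a cue is always followed by a reward.
# All names and parameter values here are illustrative assumptions.

def run_td(n_trials=200, alpha=0.2, gamma=1.0, reward=1.0):
    """Return (TD error at cue onset, TD error at reward) for every trial."""
    v_cue = 0.0  # learned value of the cue state
    history = []
    for _ in range(n_trials):
        # The cue arrives unpredictably, so the preceding baseline state is
        # assumed to keep a value of 0; the error at cue onset is then the
        # discounted value of the cue state.
        delta_cue = gamma * v_cue - 0.0
        # The reward is delivered and the trial ends (terminal value 0).
        delta_reward = reward + gamma * 0.0 - v_cue
        # Update the cue value with the error observed at reward delivery.
        v_cue += alpha * delta_reward
        history.append((delta_cue, delta_reward))
    return history


history = run_td()
first_cue, first_reward = history[0]
last_cue, last_reward = history[-1]
print(f"first trial: cue error = {first_cue:.2f}, reward error = {first_reward:.2f}")
print(f"last trial:  cue error = {last_cue:.2f}, reward error = {last_reward:.2f}")
```

In this sketch the error is initially confined to the time of reward delivery and, after learning, appears at cue onset instead, which is the qualitative pattern described above.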
In a seminal study, however, Tolman (1948) demonstrated that reinforcement learning alone could not account for maze learning. In his study, rats were allowed to explore a complex maze for several days in the absence of reward. Once a reward was introduced and the rats found the reward, then on subsequent trials the rats easily returned to the reward location. Tolman argued that this learning could not be mediated by reinforcement, as the rats learned the reward location without recourse to a lengthy process of trial-and-error. Rather, Tolman proposed that the rats constructed a “cognitive map” of the maze during the initial exploratory phase, which they later utilized in the latter phase after the reward was introduced. These studies demonstrated that rats do not rely solely on reinforcement learning principles to find rewards, as once assumed by stimulus–response theorists, but actually utilize a cognitive map to get there (Tolman 1948). Relatively recent evidence has revealed that the formation of such cognitive maps depends on a system of anatomically related structures in the MTL, specifically the hippocampal formation and the parahippocampal region (O'Keefe and Dostrovsky 1971; Aguirre et al. 1996; Maguire et al. 1999; Burgess et al. 2002).
Today, this system is understood to be involved in more than just cognitive map formation. Rather, the MTL contributes in general to the rapid encoding of configural, nonoverlapping representations (O'Reilly and Rudy 2001), of which cognitive map formation is an example, as well as to the production of familiarity signals associated with these representations (Norman and O'Reilly 2003). Figure 1 illustrates the neural connectivity within the MTL system. The memory formation process begins with the parahippocampal and perirhinal cortices, which relay highly processed sensory and cognitive information from all 4 association areas of the neocortex via the entorhinal cortex to the hippocampal formation (Suzuki and Amaral 1994; Burwell and Amaral 1998; Lavenex and Amaral 2000; Witter et al. 2000). In particular, the parahippocampal cortex receives visuospatial “where” information from unimodal and polymodal association areas and uses this information for the purpose of both allocentric and egocentric based navigation as well as for encoding landmarks, object-in-place associations, scene details, and topographical layouts (Maguire et al. 1998; Burgess et al. 2002; Spiers and Maguire 2004). By contrast, the perirhinal cortex receives object “what” information from the association areas and uses this information for the purpose of nonspatial aspects of navigation such as object recognition and identification and associating objects with other objects and with abstract concepts (e.g., progress towards a goal; Murry and Richmond 2001). Theories of the parahippocampal region emphasize a role for this brain area in storing intermediate-term memories by compressing individual stimuli into compound percepts (Eichenbaum et al. 1994) and in producing a familiarity signal related to previous experiences (Norman and O'Reilly 2003).
Theories of hippocampal function hold that the hippocampus rapidly binds together the elements of such information from the parahippocampal region into unitary conjunctive representations of specific events, a function called configural learning (McClelland et al. 1995; O'Reilly and Rudy 2001; Squire et al. 2004). With regard to spatial navigation, Manns and Eichenbaum (2006) recently proposed that the hippocampus combines information about spatial context from the parahippocampal cortex with nonspatial information about items from the perirhinal cortex (see also Diana et al. 2007; Eichenbaum et al. 2007). Essentially, the hippocampus creates topographical memories of the environment, or cognitive maps, that can be put in the service of navigation (Landis et al. 1986; Habib and Sirigu 1987; Aguirre et al. 1996; Maguire et al. 1996), and the parahippocampal region—particularly in the right hemisphere—can be considered a gateway that passes topographical information to and from the cognitive map stored in the hippocampus and that stores memories related to this information.
Reinforcement Learning, Cognitive Maps, and the ERP
Because of their high temporal resolution, ERPs have provided important timing information about neural processes involved in perception, selective attention, emotion, language processing, and memory (Rugg and Coles 1995). Within the last decade, ERP studies have examined how an evaluative system that processes rewarding events is implemented in the brain. Specifically, in studies where human participants received feedback in trial-and-error learning tasks, analysis of the ERPs following the feedback stimulus revealed a difference between correct trials and error trials. This ERP component became known as the feedback error–related negativity (fERN) and is characterized by a negative deflection at fronto-central recording sites that peaks approximately 250 ms following negative feedback presentation (Miltner et al. 1997; Holroyd and Coles 2002; Holroyd et al. 2004; Nieuwenhuis et al. 2004, 2005). Recent evidence has indicated that this negative deflection associated with negative feedback is the system's default response and that this negativity is abolished following unexpected positive feedback (Holroyd et al., forthcoming). The evaluative system that produces the fERN appears to classify outcomes into binary categories, that is, as events that either do or do not indicate that a task goal has been achieved (Holroyd et al. 2003; Hajcak et al. 2006). The reinforcement learning theory of the ERN (RL-ERN theory) holds that the fERN is associated with the impact of a TD error signal carried by the midbrain dopamine system onto the motor areas of the ACC and that the ACC uses these TD error signals to improve performance on the task at hand according to principles of reinforcement learning (Holroyd and Coles 2002). For these reasons, the fERN is an appropriate candidate for investigating reward processing in a simple navigational task.
To the best of our knowledge, ERPs have yet to be utilized to investigate topographical learning in humans. This apparent lack of interest may be due to the consensus view that the hippocampus is located too deep in the brain to be detected with electrodes placed at the scalp and that its spiral organization would likely produce a closed electromagnetic field (Braun et al. 1990; see also Allison 1982; Lutzenberger et al. 1987). These problems notwithstanding, the activity of neighboring cortical areas in the MTL is amenable to investigation with ERPs. For example, neural activity in the fusiform gyrus, which is located immediately adjacent to parahippocampal cortex, is thought to produce the N170, an ERP component that is sensitive to the processing and recognition of faces (Rossion et al. 2003; Itier and Taylor 2004; Iidaka et al. 2006; Deffke et al. 2007). Given that parahippocampal cortex is laterally bounded by the fusiform gyrus (McDonald et al. 2000; Kahn et al., forthcoming), with comparable neural organization and orientation, we suggest that parahippocampal activity can create a scalp potential like the N170 (see Fig. 1). Specifically, we suggest that the N170 could be an appropriate candidate to investigate scene or place processing by the parahippocampal cortex (Aguirre et al. 1996; Maguire 2001). In principle, an ERP experiment could provide critical temporal information related to the role of parahippocampal cortex in topographical learning and memory, as is suggested by the results of hemodynamic and intracranial brain imaging studies involving spatial navigation through virtual environments (Maguire 2001; Burgess and O'Keefe 2003; Ekstrom et al. 2003; Ekstrom and Bookheimer 2007). For the purpose of clarity, we hereafter refer to this ERP component as the “topographical” N170 (NT170) so as not to confuse it with the “face” N170.
The Present Study
In summary, both the dopamine and MTL systems appear to support maze learning, each according to its own particular function (Packard and McGaugh 1996; White and McDonald 2002; Packard and Knowlton 2002). Nevertheless, it remains unclear how these neural systems interact. Here we investigated this issue in 2 experiments in which we recorded ERPs from humans engaged in a novel virtual T-maze task. Our purpose was 4-fold. First, in Experiment 1, we wished to demonstrate that rewards and punishments in the virtual T-maze task differentially modulate the amplitude of the fERN. We have previously suggested that the fERN is associated with the impact of TD error signals carried by the midbrain dopamine system onto motor areas in ACC, where they are utilized for the adaptive modification of behavior (Holroyd and Coles 2002). Thus, our first goal in this study was to replicate the standard fERN effect using the novel T-maze task. Second, in Experiment 2, we wished to test a central claim of the RL-ERN theory, which is that fERN amplitude, like phasic activity of the midbrain dopamine system, is sensitive to events that “first” indicate when things are better or worse than expected (Holroyd and Coles 2002). On some trials of this experiment, participants passed a predictive cue in the maze indicating whether or not they would find a reward at the maze's end, whereas on the remaining trials, the cue was noninformative. Consistent with the RL-ERN theory, we predicted that on predictive trials fERN amplitude would be differentially modulated by the cue and not by the feedback, whereas on nonpredictive trials fERN amplitude would be modulated by the feedback and not by the cue. Third, in Experiment 1, we examined the ERP for evidence that the MTL system, particularly the parahippocampal region, contributes to the formation of conjunctive representations of the maze. 
To foreshadow our results, we found that the NT170 was in fact sensitive to the spatial location of the feedback stimulus in the maze, occurring earlier for feedback stimuli found in the right alley in the maze compared with the left alley. Fourth, given the novelty of this NT170 finding, we sought to replicate it in Experiment 2. Taken together, our results provide evidence for contributions by both the dopamine system and the MTL system toward navigational learning in humans.
Twelve participants (6 male and 6 female, aged 18–25) were included in Experiment 1. All participants had normal or corrected-to-normal vision and none had a history of head injury. All were undergraduate students recruited from the University of Victoria and each received course credit as well as a monetary bonus associated with the experimental task. The amount of money depended on the probability of the reward, as described below. All participants gave informed consent. The study was approved by the local research ethics committee and was conducted in accordance with the ethical standards prescribed in the 1964 Declaration of Helsinki.
The virtual T-maze was constructed using commercially available computer software (Home design 3D; Expert Software Inc., Coral Gables, FL). In keeping with the classical design of the T-maze, this virtual version consisted of a stem and 2 alleys extending at 90° angles out from a junction point with the stem and was located on a virtual grass landscape with an open ceiling exposed to a cloudy blue sky (Fig. 2; top panel). The stem and the 2 alleys appeared to be made of different types of brick, the texture and color of which were held constant for the stem but which were counterbalanced across subjects for the left and right alleys. To give participants a first-person, video game-like experience, still images were taken from cardinal points inside the 3-dimensional maze (start, left, and right alley), to be used as the imperative stimuli in the task. The start image was viewed from the base of the maze, looking down the length of the stem toward the maze junction point; left and right alley images were viewed from the junction point looking down the length of the choice arms (Fig. 2; bottom panel). All stimuli were displayed on a 17-inch computer monitor using E-Prime experiment control software (Psychological Software Tools, Pittsburgh, PA) and were viewed from a distance of about 70 cm (13.9° wide, 9.8° high).
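For readers who wish to reproduce the display geometry, the reported visual angles can be converted to approximate on-screen dimensions with the standard visual-angle relation, size = 2 d tan(θ/2). The sketch below assumes a flat screen viewed head-on at exactly 70 cm; the function name is hypothetical.

```python
import math


def stimulus_size_cm(angle_deg, distance_cm):
    """Physical extent that subtends angle_deg at a viewing distance of distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Reported image size: 13.9 degrees wide, 9.8 degrees high, viewed from about 70 cm.
width_cm = stimulus_size_cm(13.9, 70.0)
height_cm = stimulus_size_cm(9.8, 70.0)
print(f"approximate image size: {width_cm:.1f} cm x {height_cm:.1f} cm")
```

Under these assumptions the maze images would have measured roughly 17 cm by 12 cm on the screen.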
Participants were seated comfortably in an electromagnetically shielded room under dim lighting and were asked to position their hands and forearms so that the tips of both index fingers rested on a standard E-prime SRX Button Box placed in front of them. Participants were provided with both written and verbal instructions that explained the procedure and stressed that they should maintain correct posture and minimize head movement and eye blinks. They were then shown 3 different aerial views of the maze, each for 3 s, to familiarize themselves with its virtual dimensions (Fig. 2 [top]). On each trial, participants were first presented with the stem image for 1000 ms, followed by a green double arrow (2.7° wide and 1.6° high) appearing in the center of the stem image at the maze intersection (Fig. 2 [bottom]); the stem image and arrow remained on the screen until the participant selected an alley. Participants were instructed to press button 1 with their left index finger to select the left alley or to press button 2 with their right index finger to select the right alley. At the time of their response, the image of the selected alley appeared for a duration of 500 ms (Fig. 2 [bottom]). Then, an apple image or an orange image appeared in the center of the alley image (1.6° wide and high, viewed against the alley background; Fig. 2 [bottom]). Together, the alley and fruit image remained on the screen for 1000 ms. Participants were told that the presentation of one type of fruit indicated that the alley they selected contained 5 cents (reward) and that the presentation of the other type of fruit indicated that the alley they selected was empty (no reward); the mappings between feedback stimuli and feedback types were counterbalanced across participants. Participants were also informed that at the end of the experiment they would be given all the money they found and that they should respond in a way that would maximize the total amount of money earned. 
Unbeknownst to them, on each trial, the type of feedback was selected at random (50% probability for each feedback type). The feedback was followed by a blank screen delay for 1000 ms and then the next trial began. The experiment consisted of 4 blocks of 100 trials each separated by self-paced rest periods. At the end of the experiment, participants were informed about the probabilities and were given a $10 performance bonus.
Data Acquisition and Analysis
The electroencephalogram (EEG) was recorded using a montage of 36 electrode sites in accordance with the extended international 10–20 system (Jasper 1958). Signals were acquired using Ag/AgCl ring electrodes mounted in a nylon electrode cap with an abrasive, conductive gel (EASYCAP GmbH, Herrsching-Breitbrunn, Germany). Signals were amplified by low-noise electrode differential amplifiers with a frequency response of DC 0.017–67.5 Hz (90 dB/octave roll-off) and digitized at a rate of 250 samples per second. Digitized signals were recorded to disk using Brain Vision Recorder software (Brain Products GmbH, Munich, Germany). Interelectrode impedances were maintained below 10 kΩ. Two electrodes were also placed on the left and right mastoids. The EEG was recorded using the average reference. The electrooculogram (EOG) was recorded for the purpose of artifact correction; horizontal EOG was recorded from the external canthi of both eyes, and vertical EOG was recorded from the suborbit of the right eye and electrode channel Fp2.
Postprocessing and data visualization were performed using Brain Vision Analyzer software (Brain Products GmbH). The digitized signals were filtered using a fourth-order digital Butterworth filter with a passband of 0.10–20 Hz. An 800-ms epoch of data extending from 200 ms before to 600 ms after the onset of each feedback stimulus, as well as each alley choice, was extracted from the continuous data file for analysis. Ocular artifacts were corrected using the eye movement correction algorithm described by Gratton et al. (1983). The EEG data were re-referenced to linked mastoid electrodes. The data were baseline corrected by subtracting from each sample the mean voltage associated with that electrode during the 200-ms interval preceding stimulus onset. Muscular and other artifacts were removed using a ±100 μV level threshold and a ±50 μV step threshold as rejection criteria. ERPs were then created for each electrode and participant by averaging the single-trial EEG according to feedback (reward, no reward) and feedback location (feedback in left alley, feedback in right alley).
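The last three steps of this pipeline (baseline correction, artifact rejection, and averaging) can be sketched for a single electrode as follows. This is an illustration only and does not reproduce the Brain Vision Analyzer implementation: the function names are hypothetical, the data are synthetic, and the step criterion is assumed to refer to the voltage change between consecutive samples.

```python
SAMPLE_RATE_HZ = 250   # sampling rate reported for the recording
BASELINE_MS = 200      # prestimulus interval used as the baseline
LEVEL_UV = 100.0       # absolute-amplitude rejection criterion
STEP_UV = 50.0         # sample-to-sample criterion (assumed interpretation)


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the prestimulus samples from every sample."""
    baseline_mean = sum(epoch[:n_baseline]) / n_baseline
    return [v - baseline_mean for v in epoch]


def is_artifact(epoch):
    """Flag an epoch that violates the level or the step criterion."""
    level = any(abs(v) > LEVEL_UV for v in epoch)
    step = any(abs(b - a) > STEP_UV for a, b in zip(epoch, epoch[1:]))
    return level or step


def average_clean_epochs(epochs):
    """Baseline correct each epoch, drop artifacts, and average the rest."""
    n_baseline = BASELINE_MS * SAMPLE_RATE_HZ // 1000  # 50 samples
    corrected = (baseline_correct(e, n_baseline) for e in epochs)
    kept = [e for e in corrected if not is_artifact(e)]
    if not kept:
        raise ValueError("all epochs were rejected")
    return [sum(samples) / len(kept) for samples in zip(*kept)]


# Synthetic 800-ms epochs (200 samples each), values in microvolts.
epoch_a = [5.0] * 50 + [3.0] * 150            # constant offset, small negative shift
epoch_b = [0.0] * 50 + [4.0] * 150            # small positive shift
epoch_c = [0.0] * 100 + [150.0] + [0.0] * 99  # spike that exceeds both criteria
erp = average_clean_epochs([epoch_a, epoch_b, epoch_c])
```

Here the third synthetic epoch is discarded, and the average of the remaining two is zero during the baseline and 1 μV after stimulus onset.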
The fERN was measured at channel FCz, where it reaches maximum amplitude (Holroyd et al. 2004; Nieuwenhuis et al. 2004; Holroyd and Krigolson 2007). For each participant, the fERN was evaluated as a difference wave by subtracting the reward feedback ERPs from the corresponding no-reward feedback ERPs (Holroyd and Coles 2002; Holroyd and Krigolson 2007). The peak amplitude of this difference wave was obtained by detecting its maximum negative deflection within a 600 ms window following the onset of the feedback stimulus. The peak amplitude of the maximum negative deflection was statistically tested against zero using a 1-sample t-test.
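A minimal sketch of this measurement is given below: the difference wave is computed as no-reward minus reward, its most negative point is located in the 0–600 ms poststimulus window, and the resulting peak amplitudes are tested against zero. The function names and the numerical values are hypothetical and are not data from the present study.

```python
import math

SAMPLE_RATE_HZ = 250   # 4 ms per sample
PRESTIM_MS = 200       # each epoch starts 200 ms before feedback onset


def difference_wave(no_reward_erp, reward_erp):
    """No-reward ERP minus reward ERP, sample by sample."""
    return [n - r for n, r in zip(no_reward_erp, reward_erp)]


def peak_negative(wave, window_ms=(0, 600)):
    """Return the most negative value and its latency (ms) within the window."""
    ms_per_sample = 1000 / SAMPLE_RATE_HZ
    start = int((PRESTIM_MS + window_ms[0]) / ms_per_sample)
    stop = int((PRESTIM_MS + window_ms[1]) / ms_per_sample)
    segment = wave[start:stop]
    idx = min(range(len(segment)), key=segment.__getitem__)
    latency_ms = (start + idx) * ms_per_sample - PRESTIM_MS
    return segment[idx], latency_ms


def one_sample_t(values):
    """t statistic for the null hypothesis that the mean is zero (df = n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(variance / n)


# Synthetic single-participant ERPs with a -6 microvolt deflection at 256 ms.
reward_erp = [0.0] * 200
no_reward_erp = [0.0] * 200
no_reward_erp[114] = -6.0
amplitude, latency = peak_negative(difference_wave(no_reward_erp, reward_erp))

# Hypothetical peak amplitudes from five participants.
t_value = one_sample_t([-5.0, -7.0, -6.0, -8.0, -4.0])
```

In the study itself the t statistic was computed over the 12 participants' peak amplitudes, giving 11 degrees of freedom.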
The N170 typically reaches maximal amplitude bilaterally over lateral temporal and parieto-occipital scalp areas within 140–190 ms following stimulus onset (Bentin and Deouell 2000; Minnebusch et al. 2007). We found the scalp location where the NT170 component was maximal by averaging the EEG data across all conditions and applying the following algorithm to the ERP data. The NT170 was measured base to peak at each electrode channel by first identifying the most positive value of the ERP within a 50–125 ms window following the presentation of the feedback, the latency of which was taken as the time of onset of the NT170. Next, the maximum negative value of the ERP within a window extending from NT170 onset to 200 ms following the presentation of the feedback was identified, the time of which was taken as the peak latency of the NT170. NT170 amplitude was defined as the difference in the ERP values associated with the component maximum and the component onset. We then found the channel location where the NT170 was largest and then tested this value with paired-sample t-tests against the values associated with all other channels. This procedure indicated that the NT170 was maximal at channel PO8 (see Results), in accordance with previous literature (Minnebusch et al. 2007). For this reason, the data associated with this electrode were selected for further analysis. Specifically, we analyzed NT170 latency and amplitude as functions of feedback type (i.e., reward vs. no reward) and feedback location (i.e., whether the feedback was presented in the left alley or the right alley) using repeated measures analyses of variance (ANOVAs). The Greenhouse–Geisser correction for nonsphericity was applied where appropriate.
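The base-to-peak procedure described above can be sketched as follows for one averaged waveform. The function names are hypothetical, the waveform is synthetic, and the sign convention (peak value minus onset value, so that a larger NT170 is more negative) is an assumption.

```python
SAMPLE_RATE_HZ = 250
PRESTIM_MS = 200
MS_PER_SAMPLE = 1000 // SAMPLE_RATE_HZ  # 4 ms


def to_latency(index):
    """Latency in ms, relative to feedback onset, of a sample index."""
    return index * MS_PER_SAMPLE - PRESTIM_MS


def nt170_base_to_peak(erp):
    """Measure NT170 onset, peak latency, and base-to-peak amplitude."""
    latencies = [to_latency(i) for i in range(len(erp))]
    # Onset: most positive sample between 50 and 125 ms after feedback.
    onset_idx = max(
        (i for i, t in enumerate(latencies) if 50 <= t <= 125),
        key=lambda i: erp[i],
    )
    # Peak: most negative sample between the onset and 200 ms after feedback.
    peak_idx = min(
        (i for i, t in enumerate(latencies) if latencies[onset_idx] <= t <= 200),
        key=lambda i: erp[i],
    )
    return {
        "onset_ms": latencies[onset_idx],
        "peak_ms": latencies[peak_idx],
        "amplitude_uv": erp[peak_idx] - erp[onset_idx],
    }


# Synthetic waveform: +3 microvolts at 100 ms, -5 microvolts at 180 ms.
erp = [0.0] * 200
erp[75] = 3.0    # (100 + 200) / 4 = sample 75
erp[95] = -5.0   # (180 + 200) / 4 = sample 95
measure = nt170_base_to_peak(erp)
```

Applying the same function to every channel and comparing the resulting amplitudes corresponds to the channel-selection step described above.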
The fERN was quantified at channel FCz, where it was maximal. Figure 3 presents stimulus-locked grand averages for both feedback conditions at channel FCz. Consistent with previous research, the fERN was clearly evident as a sharp negative deflection in the difference wave (M = −6.84 μV, standard error [SE] ± 0.88) that peaked 258 ms after feedback onset (Holroyd et al. 2004; Nieuwenhuis et al. 2004; Holroyd and Krigolson 2007). This difference was significant, t(11) = −5.89, P < 0.0001, 95% confidence interval = [−9.4 μV, −4.3 μV], and exhibited a frontal–central scalp distribution with a maximum at channel FCz (Fig. 3). These results are characteristic of the fERN (e.g., Miltner et al. 1997; Holroyd and Krigolson 2007) and indicate that the T-maze task is capable of eliciting this ERP component.
NT170 amplitudes averaged across all trials for all electrodes are shown in Figure 4. From visual inspection, it is apparent that the NT170 was maximal at electrode site PO8, which is located over the far right, posterior area of the scalp; this impression was confirmed by pairwise t-tests, which indicated that the NT170 was significantly larger at channel PO8 than at every other channel (P < 0.01). For this reason, the NT170 was measured at channel PO8. NT170 amplitude and latency were each submitted to a 2-way ANOVA with repeated measures having 2 levels of feedback type (reward, no reward) and 2 levels of feedback location (left alley, right alley). With regard to amplitude, the ANOVA revealed neither main effects nor an interaction (P > 0.05). By contrast, NT170 latency was affected by the location of the feedback stimulus in the maze (Fig. 5). A 2-way repeated measures ANOVA on NT170 latency revealed a main effect of feedback location, F(1, 11) = 45.92, P < 0.001, η² = 0.807, but no main effect of feedback type (P > 0.05) and no interaction between feedback type and feedback location (P > 0.05). The latency of the NT170 was significantly greater for feedback presented in the left alley (187 ms, SE ± 2.2) than for feedback presented in the right alley (180 ms, SE ± 1.9). As a check, we also tested for this latency difference at all other electrode sites; the difference was significant at channel P8, F(1, 11) = 17.92, P < 0.001, η² = 0.607, but was not significant at any other channel location (P > 0.05).
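As a worked illustration of how an effect size relates to an F ratio, the sketch below recovers partial eta squared from F and its degrees of freedom for the main effect of feedback location at channel PO8. It assumes that the reported η² is a partial eta squared; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of feedback location on NT170 latency at PO8: F with 1 and 11 df.
eta_po8 = partial_eta_squared(45.92, 1, 11)
print(f"partial eta squared = {eta_po8:.3f}")
```

With 1 and 11 degrees of freedom this reduces to 45.92 / (45.92 + 11), or approximately 0.807.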
To our knowledge, this study is the first to examine electrophysiological measures of feedback-related processing using a virtual maze environment. Our first goal was to replicate the standard fERN effect using a novel T-maze task. Our results clearly demonstrate that fERN amplitude was modulated by reward versus no-reward feedback in the virtual T-maze task. Our second goal in Experiment 1 was to examine the ERP for evidence of medial temporal cortex activation, particularly the parahippocampal region, in a virtual environment. Interestingly, we found that the latency of the NT170 was sensitive to the spatial location of the feedback stimulus in the maze, occurring about 7 ms earlier for feedback stimuli found in the right alley in the T-maze compared with feedback stimuli found in the left alley in the T-maze. The amplitude of the NT170 was maximal at channel PO8, consistent with previous literature on the N170 (Minnebusch et al. 2007).
To expand on these findings, we ran a subsequent experiment using a modified version of the T-maze, the “TF-maze.” Our first goal in Experiment 2 was to test a central claim of the RL-ERN theory, which is that the fERN reflects the arrival of a TD error signal conveyed by the mesencephalic dopamine system to motor-related areas in the ACC, where the signal is used for the adaptive modification of behavior (Holroyd and Coles 2002). We predicted that fERN amplitude, like the phasic activity of the midbrain dopamine system, would be sensitive to events that are better or worse than expected such that unpredicted feedback would elicit the fERN, but if the feedback were predicted by a preceding cue, then fERN amplitude would be modulated by the predictive cue and not by the feedback. Although this is a fundamental prediction of the RL-ERN theory, a clear inverse relationship between the fERNs elicited by the predictive cue and the feedback has not yet been reported (Holroyd and Coles 2002; but see also Krigolson and Holroyd 2007; Dunning and Hajcak 2007). In addition, given the novelty of our finding that NT170 latency depends on the spatial location of the reward, our second goal in Experiment 2 was to replicate this result.
Twelve participants (5 male and 7 female, aged 18–26) participated in Experiment 2. All participants had normal or corrected-to-normal vision and none reported a history of head injury. All were undergraduate students recruited from the University of Victoria and each received course credit as well as a monetary bonus associated with the experimental task. The amount of money depended on the probability of the feedback, as described below. All participants gave informed consent. The study was approved by the local research ethics committee and was conducted in accordance with the ethical standards prescribed in the 1964 Declaration of Helsinki.
Stimuli and Procedure
Experiment 2 was identical to Experiment 1, except that the T-maze was constructed in the shape of a tuning fork (a “TF-maze”), and predictive cues were placed at the front alley of the maze arms (Fig. 6 [right]). As in Experiment 1, participants were shown 3 different aerial views of the maze, each for 3 s, to familiarize themselves with its virtual dimensions and the placement of the predictive cues and feedback stimuli. On each trial, participants were shown the start image (500 ms), followed by the green double arrow, which remained on the screen until participants pressed a left button to enter the left arm or a right button to enter the right arm (Fig. 6 [left]). Note that each arm contained 2 alleys: a front alley (predictive cue location) and a back alley (feedback stimulus location). At the time of their response, the image of the front end of the selected arm appeared for 500 ms. Then, in contrast to Experiment 1, in which a feedback stimulus appeared at this point, 1 of 3 predictive cues appeared (1000 ms). Specifically, 1 of 3 colored squares (i.e., blue, yellow, green; 1.6°) that indicated the upcoming outcome (predictive-reward cue, predictive-no-reward cue, nonpredictive cue) appeared against the far wall of the front alley of the chosen arm; the mappings between color and predictive type were counterbalanced across participants. Immediately afterward, participants viewed the back alley from the bend of the front alley for 500 ms, followed by the appearance of a feedback stimulus on the far wall of the back alley (1000 ms). Unbeknownst to the participants, the type of predictive cue on each trial was selected at random (1/3 probability for each of the 3 cues). 
The participants were informed (truthfully) that if a predictive-reward cue appeared at the front alley of the chosen arm of the maze, then they would receive reward feedback at the back alley, and conversely, if a predictive no-reward cue appeared at the front alley, then they would receive no-reward feedback at the back alley. Further, they were told that if a nonpredictive cue appeared at the front alley of the chosen arm, then the back alley would contain either reward or no-reward feedback. The probability of the feedback stimulus following the nonpredictive cue was random and equiprobable (50%/50%). The experiment consisted of 240 trials in 4 blocks separated by rest periods. At the end of the experiment, participants were informed about the feedback probabilities and were given a $10 performance bonus.
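The cue and feedback contingencies of Experiment 2 can be summarized in a short simulation sketch. The function names and the random seed are hypothetical, and the sketch is not the E-Prime script used in the experiment.

```python
import random

CUES = ("predictive-reward", "predictive-no-reward", "nonpredictive")


def draw_trial(rng):
    """Draw one trial: the cue is chosen at random and constrains the feedback."""
    cue = rng.choice(CUES)  # each of the 3 cues has probability 1/3
    if cue == "predictive-reward":
        feedback = "reward"
    elif cue == "predictive-no-reward":
        feedback = "no-reward"
    else:
        # After a nonpredictive cue, the two outcomes are equiprobable.
        feedback = rng.choice(("reward", "no-reward"))
    return cue, feedback


def make_schedule(n_trials=240, seed=0):
    """Generate the cue and feedback sequence for one simulated session."""
    rng = random.Random(seed)  # seed chosen arbitrarily for reproducibility
    return [draw_trial(rng) for _ in range(n_trials)]


schedule = make_schedule()
n_nonpredictive = sum(1 for cue, _ in schedule if cue == "nonpredictive")
print(f"{n_nonpredictive} of {len(schedule)} simulated trials had a nonpredictive cue")
```

Because the cue is drawn independently on every trial, the number of trials per condition varies from one simulated session to the next.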
Electrophysiological Recordings and ERP Analysis
Electrophysiological recordings and ERP analysis were identical to those of Experiment 1, except that ERPs elicited by the predictive cues were created for each participant and electrode by averaging the single-trial EEG according to type of predictive cue (predictive-reward cue, predictive-no-reward cue, nonpredictive cue). Note that 2 ERPs were created for the nonpredictive cue condition, one for nonpredictive cues that preceded reward feedback and a second for nonpredictive cues that preceded no-reward feedback. Further, ERPs elicited by the feedback stimuli were created for each participant and electrode by averaging the single-trial EEG according to feedback type (reward and no-reward feedback following the predictive cue and reward and no-reward feedback following the nonpredictive cue). Thus, a total of 8 ERPs were generated across all conditions. In contrast to Experiment 1, where a single difference wave was constructed, for Experiment 2, 4 difference waves were constructed. Specifically, 2 difference waves were created from the predictive cue ERPs by subtracting 1) the predictive-reward ERPs from the corresponding predictive-no-reward ERPs and 2) the nonpredictive ERPs on rewarded trials (i.e., the “nonpredictive-reward ERPs”) from the corresponding nonpredictive ERPs on no-reward trials (i.e., the “nonpredictive-no-reward ERPs”). Further, 2 more difference waves were created from the feedback ERPs by subtracting 1) the reward ERPs on predictive trials from the no-reward ERPs on predictive trials and 2) the reward ERPs on nonpredictive trials from the no-reward ERPs on nonpredictive trials.
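The subtraction scheme described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the condition labels (e.g., "cue/predictive/reward"), the function names, and the representation of an ERP as a list of voltages are hypothetical and are not taken from the analysis software used in the study.

```python
from typing import Dict, List

Waveform = List[float]  # one averaged ERP: voltage (in microvolts) per time sample


def difference_wave(no_reward: Waveform, reward: Waveform) -> Waveform:
    """Subtract the reward ERP from the no-reward ERP, sample by sample."""
    if len(no_reward) != len(reward):
        raise ValueError("ERPs must have the same number of samples")
    return [n - r for n, r in zip(no_reward, reward)]


def build_difference_waves(erps: Dict[str, Waveform]) -> Dict[str, Waveform]:
    """Build the 4 difference waves from the 8 condition ERPs.

    Keys are hypothetical labels of the form '<stimulus>/<condition>/<outcome>'.
    """
    waves = {}
    for stimulus in ("cue", "feedback"):
        for condition in ("predictive", "nonpredictive"):
            prefix = f"{stimulus}/{condition}"
            waves[prefix] = difference_wave(
                erps[f"{prefix}/no_reward"], erps[f"{prefix}/reward"]
            )
    return waves
```

Each of the 4 resulting waves (cue and feedback, crossed with predictive and nonpredictive) is a no-reward minus reward difference, so a negative deflection indicates a more negative ERP on no-reward trials.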
Because our analysis of NT170 amplitude in Experiment 1 failed to yield any statistically significant results, in Experiment 2, we focused our analysis on the question of interest, namely whether NT170 latency depended on the spatial location of the predictive cues and feedback stimuli. Electrophysiological recordings and ERP analyses were identical to those of Experiment 1, except that ERPs were created for each participant and electrode by averaging the single-trial EEG according to stimulus location (left alley and right alley), predictive condition (predictive and nonpredictive), and stimulus type (cue and feedback). Note that for this analysis the ERPs were averaged across feedback type (reward and no-reward). The fERN and NT170 were analyzed with repeated-measures ANOVA.
A 2-way repeated-measures ANOVA on the amplitude of the fERN (evaluated as the maximum amplitude of each difference wave) with factors predictive condition (predictive, nonpredictive) and stimulus type (cue, feedback) revealed a main effect of stimulus type, F1,11 = 5.13, P < 0.05, η2 = 0.318, but no main effect of predictive condition, F1,11 = 2.33, P > 0.05. Post hoc analysis indicated that the fERN elicited by the feedback stimulus (M = −2.9 μV, SE = ±0.5) was larger than the fERN elicited by the predictive cue (M = −1.7 μV, SE = ±0.3). Further, the 2-way repeated-measures ANOVA revealed an interaction between predictive condition and stimulus type, F1,11 = 48.00, P < 0.0001, η2 = 0.814 (Fig. 7). For the predictive condition, a paired-sample t-test revealed that the amplitude of the fERN elicited by the cue (M = −3.2 μV, SE = ±0.5) was larger than the amplitude of the fERN elicited by the feedback (M = −1.7 μV, SE = ±0.66), t11 = −2.1, P < 0.05. For the nonpredictive condition, a paired-sample t-test revealed that the amplitude of the fERN elicited by the cue (M = −0.25 μV, SE = ±0.25) was smaller than the amplitude of the fERN elicited by the feedback (M = −4.14 μV, SE = ±0.62), t11 = 5.6, P < 0.0001. Importantly, the fERN elicited by the cue in the predictive condition and the fERN elicited by the feedback in the nonpredictive condition both exhibited a frontal–central scalp distribution with a maximum at channel FCz, a finding consistent with previous reports of the fERN (e.g., Miltner et al. 1997; Holroyd and Krigolson 2007; Fig. 7).
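As a rough illustration of the peak measure, the sketch below reads “maximum amplitude” as the most negative sample of a difference wave inside a search window, and also returns the latency of that sample. The function name, the window bounds, and the toy waveform are assumptions for illustration; this section does not restate the window that was actually used.

```python
from typing import List, Tuple


def negative_peak(
    wave: List[float], times_ms: List[float], window_ms: Tuple[float, float]
) -> Tuple[float, float]:
    """Return (amplitude, latency_ms) of the most negative sample in the window."""
    start, end = window_ms
    candidates = [(v, t) for v, t in zip(wave, times_ms) if start <= t <= end]
    if not candidates:
        raise ValueError("no samples fall inside the search window")
    # min() compares voltage first, so it picks the most negative sample;
    # if two samples tie, the earlier one is returned.
    amplitude, latency = min(candidates)
    return amplitude, latency


# Toy difference wave sampled every 100 ms (made-up values, not study data).
toy_wave = [0.0, -1.0, -3.0, -2.0, 0.5]
toy_times = [0.0, 100.0, 200.0, 300.0, 400.0]
peak_amplitude, peak_latency_ms = negative_peak(toy_wave, toy_times, (150.0, 350.0))
```

The same helper could in principle be applied to a single-condition ERP at one posterior channel to read off the latency of a negative component such as the NT170.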
Consistent with Experiment 1, the NT170 was found to be maximal at the far right, posterior electrode location PO8 (Fig. 8). This impression was confirmed by pairwise t-tests, which indicated that the NT170 was significantly larger at channel PO8 than at every other electrode channel (P < 0.05) except at its neighboring site, P8 (P > 0.05). NT170 latency associated with channel PO8 was subjected to a 3-way ANOVA with repeated measures having 2 levels of stimulus type (cue, feedback), 2 levels of predictive condition (nonpredictive, predictive), and 2 levels of stimulus location (left alley, right alley). The ANOVA did not reveal any main effects of stimulus type (P > 0.05), predictive condition (P > 0.05), or stimulus location (P > 0.05). However, the ANOVA did reveal an interaction between stimulus type and alley, F1,11 = 22.18, P < 0.001, η2 = 0.67 (Fig. 9). All other interactions did not reach significance (P > 0.05). Post hoc analysis indicated that for the cue, the NT170 latency was longer for the front left alley (M = 195 ms, SE = ±2.6) compared with the front right alley (M = 185 ms, SE = ±3.7), t12 = −3.02, P = 0.01. In contrast, for the feedback stimulus, the NT170 latency was longer for the right back alley (M = 190 ms, SE = ±2.1) compared with the left back alley (M = 185 ms, SE = ±1.8), t12 = −3.85, P = 0.005 (Fig. 9).
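For readers unfamiliar with the post hoc comparisons, a paired-sample t statistic on per-participant peak latencies can be sketched as follows. This is the generic textbook formula rather than the authors' analysis code, the function name is hypothetical, and the latencies in the example are made-up values for 4 imaginary participants, not data from the study.

```python
import math
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the within-participant differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / math.sqrt(variance / n)
    return t, n - 1


# Hypothetical NT170 peak latencies (ms) for cues in the left vs right front alley.
left_alley = [196.0, 192.0, 198.0, 194.0]
right_alley = [186.0, 184.0, 190.0, 180.0]
t_value, df = paired_t(left_alley, right_alley)
```

A positive t here would mean longer latencies for the first sample; the sign of a reported statistic depends only on the order in which the two conditions are subtracted.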
As a check, we also ran the same analysis on NT170 amplitude. In particular, the NT170 amplitude associated with channel PO8 was subjected to a 3-way ANOVA with repeated measures having 2 levels of stimulus type (cue, feedback), 2 levels of predictive condition (nonpredictive, predictive), and 2 levels of stimulus location (left alley, right alley). The ANOVA revealed main effects of predictive condition, F1,11 = 5.95, P < 0.05, η2 = 0.35, and stimulus location, F1,11 = 5.17, P < 0.05, η2 = 0.31, but not of stimulus type (P > 0.05). Post hoc analysis indicated that the NT170 amplitude was larger for the nonpredictive condition compared with the predictive condition, t12 = −2.42, P = 0.03, and that NT170 amplitude was larger for the right alley compared with the left alley, t12 = 2.27, P = 0.04. Further, the ANOVA revealed a 3-way interaction between stimulus type, predictive condition, and stimulus location, F1,11 = 6.96, P < 0.05, η2 = 0.39. All other interactions did not reach significance (P > 0.05). The 3-way interaction indicated that for the nonpredictive condition, there was a significant difference in amplitude between the left and right alleys for the feedback, t12 = 2.5, P = 0.03, and not the cue (P > 0.05), and for the predictive condition, there was a trend toward a difference in amplitude between the left and right alleys for the cue, t12 = 1.97, P = 0.08, and not the feedback (P > 0.05). These results seem to suggest that the NT170 amplitude mirrors the difference in NT170 latency associated with left and right turns, but, like the fERN, only for events that are informative (cues in the predictive condition, feedback in the nonpredictive condition). However, because this result was not observed in Experiment 1 and because only a trend in this direction was found for the predictive condition in Experiment 2, these amplitude findings will not be discussed further.
Our first goal in Experiment 2 was to test a central claim of the RL-ERN theory, which is that fERN amplitude would be modulated by predictive cues and unpredicted feedback but not by nonpredictive cues and predicted feedback. The present study confirmed this prediction. Specifically, our data demonstrated that feedback indicating the absence versus presence of a reward differentially modulated fERN amplitude, but only when the outcome was not predicted by an earlier stimulus. By contrast, when a cue predicted the reward outcome, then the predictive cue (and not the feedback) differentially modulated fERN amplitude. Consistent with their identification with the fERN, both negative-going ERP components were maximal over frontal–central areas of the scalp and peaked approximately 250 ms following the onset of the stimulus (Miltner et al. 1997). This evidence provides additional support for the position that the fERN reflects a TD error signal carried by the midbrain dopamine system.
Furthermore, we sought to replicate and extend our NT170 finding in Experiment 1, namely, that the NT170 was sensitive to the spatial location of the feedback stimulus, occurring slightly earlier for feedback found in the right compared with the left arm of the maze. As with Experiment 1, the scalp distribution of the NT170 in Experiment 2 was extremely focal and maximal at channel PO8, indicating that neural regions near this location respond selectively to the spatial locations of reward-related information within the maze. Further, we found that the NT170 elicited by the predictive cues presented in the right front alley reached peak amplitude about 10 ms faster than did the NT170 elicited by the predictive cues presented in the left front alley, a finding consistent with the T-maze results. This latency difference reversed for the feedback stimuli presented in the back alleys, such that the NT170 elicited by the feedback stimuli presented in the left back alley reached peak amplitude about 5 ms faster than did the NT170 elicited by the feedback presented in the right back alley. Note that this observation indicates that the neural system generating this ERP component processed cue and feedback information differently when the participant turned to their right compared with when the participant turned to their left. Specifically, the NT170 component elicited by the predictive cue occurred earlier when the participant turned rightward into the front alley compared with when they turned leftward into the front alley, and the NT170 component elicited by the feedback stimulus occurred earlier when the participant turned rightward into the back alley compared with when they turned leftward into the back alley (Fig. 10). This result is clearly not due to a visual field bias (Handy et al. 1996) because the stimuli were not presented in retinotopic coordinates. 
Instead, we suggest that it may stem from an egocentric bias such that objects encountered on the right side of one's spatial environment are processed differently than objects encountered on the left side.
Maze learning presents an interesting problem for investigation because it normally depends on the learning functions associated with both the MTL and dopamine systems (Packard and McGaugh 1996; Poldrack and Packard 2003). In the present study, we used ERP measures to investigate the relative contributions of these 2 systems toward human navigation in a simple virtual maze.
Reinforcement Learning and the Dopamine System
As described above, Schultz et al. (1998) used electrophysiological techniques to record the activity of individual midbrain dopamine neurons in primates learning to perform simple delayed response tasks and found that a phasic increase in dopamine activity is elicited when an event is better than predicted (positive TD error signal) and a transient cessation in activity results when an event is worse than predicted (negative TD error signal). With learning, the positive TD error signals “propagate back in time” from rewarding outcomes to stimuli that predict reward (Schultz et al. 1997; Schultz 1998; Suri and Schultz 2001; Montague et al. 2004). These TD error signals are conveyed to various brain structures involved in cognitive control (e.g., basal ganglia, prefrontal cortex, and the ACC), where they appear to be used for the purpose of reinforcement learning (Schultz 1998; Schultz et al. 2000; Tobler et al. 2005; Samejima et al. 2005). In particular, the ACC appears to use such signals for learning about action–reward relationships (Hadland et al. 2003; Rushworth et al. 2007; Rushworth et al., forthcoming; Holroyd and Coles 2008; Rushworth 2008), whereas striatal circuitry may be relatively more involved in learning about stimulus–reward contingencies (Robbins et al. 1989; Tanaka et al. 2006; Yin and Knowlton 2006; Corbit and Janak 2007). Thus, for example, Matsumoto et al. (2003) demonstrated in a primate study that neurons in the medial prefrontal cortex were activated only by particular action–reward combinations and suggested that this region is especially important for motor selection based on the anticipation of reward. Furthermore, according to the RL-ERN theory (Holroyd and Coles 2002), the impact of these signals on ACC modulates fERN amplitude. Thus, the fERN, like the dopamine signals, should be elicited by the first event that indicates whether ongoing events are better or worse than expected, a prediction that we confirmed here.
Specifically, in Experiment 1, our results demonstrated that feedback stimuli in the virtual T-maze task elicited a fERN. This finding is consistent with evidence from a number of studies that have also found fERNs elicited by the feedback in various guessing tasks (Miltner et al. 1997; Holroyd and Coles 2002; Holroyd et al. 2004; Nieuwenhuis et al. 2004; Hajcak et al. 2005; Krigolson and Holroyd 2007). Further, although the RL-ERN theory holds that fERN amplitude should be modulated by external stimuli that first predict the outcome of the trial, direct empirical evidence of this predictive relationship has yet to be demonstrated (Nieuwenhuis et al. 2004; but see also Dunning and Hajcak 2007; Krigolson and Holroyd 2007). Therefore, in Experiment 2, we tested this prediction using a modified version of the T-maze, the TF-maze, in which predictive cues were placed at the bends of the maze arms. Consistent with the RL-ERN theory, our data revealed that the absence versus presence of a reward differentially modulated fERN amplitude, but only when the outcome was not predicted by an earlier stimulus. By contrast, when a cue predicted the reward outcome, then the predictive cue (and not the feedback) differentially modulated fERN amplitude.
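The logic of this prediction can be made concrete with a toy temporal-difference (TD) simulation of the cue and feedback structure of Experiment 2. It is a minimal sketch under loud assumptions and is not the RL-ERN model itself: the learning rate, the undiscounted two-step trial structure, the deterministic trial cycle, and all names are hypothetical choices made for illustration.

```python
ALPHA = 0.05  # learning rate (assumed value)

# One deterministic cycle of (cue, reward) pairs: each cue type occurs on 1/3
# of trials, and the nonpredictive cue is followed by reward half the time.
CYCLE = [
    ("predictive_reward", 1.0),
    ("predictive_no_reward", 0.0),
    ("nonpredictive", 1.0),
    ("predictive_reward", 1.0),
    ("predictive_no_reward", 0.0),
    ("nonpredictive", 0.0),
]


def train(n_cycles=500):
    """Learn state values with undiscounted TD updates (assumed TD(0) scheme)."""
    values = {"start": 0.0, "predictive_reward": 0.0,
              "predictive_no_reward": 0.0, "nonpredictive": 0.0}
    for _ in range(n_cycles):
        for cue, reward in CYCLE:
            # TD error at cue onset: value signalled by the cue minus the
            # expectation held before the cue appeared.
            cue_error = values[cue] - values["start"]
            values["start"] += ALPHA * cue_error
            # TD error at feedback: outcome minus the value predicted by the cue.
            feedback_error = reward - values[cue]
            values[cue] += ALPHA * feedback_error
    return values


def td_errors(values, cue, reward):
    """Return (cue_error, feedback_error) for one trial, without learning."""
    return values[cue] - values["start"], reward - values[cue]
```

After training, the predictive cues carry TD errors of roughly +0.5 (reward) and −0.5 (no reward), while the feedback that follows them carries almost none; for the nonpredictive cue the pattern reverses, with a near-zero error at the cue and errors of roughly ±0.5 at the feedback. This mirrors the qualitative fERN pattern reported above, although the simulation says nothing about ERP amplitudes in microvolts.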
Topographical Learning and the MTL System
It is widely accepted that the MTL system is directly involved in the encoding and retrieval of spatial representations of the environment, otherwise known as topographical memory, which can be utilized for navigating to unseen goals (for review, see O'Keefe and Nadel 1978; Burgess et al. 2002). A long history of animal studies, human neuropsychological studies, and functional brain imaging studies provides compelling evidence for this position (Maguire 1997; Milner et al. 1997; Squire and Zola-Morgan 1991; Squire et al. 2004; Maguire 2001; Spiers et al. 2001; Burgess et al. 2002; Crane and Milner 2005; Maguire, Nannery, and Spiers 2006; Maguire, Woollett, and Spiers 2006; Kumaran et al. 2007). Of particular interest, recent functional brain imaging studies involving human participants have revealed a network of structures within the MTL system that subserve different aspects of topographical learning and memory (Maguire 2001). For example, in a series of 3 positron emission tomography (PET) experiments, researchers dissociated neural areas in the MTL that processed different types of visual stimuli: buildings, landscapes, human faces, and animal faces (Maguire et al. 2001). Their findings revealed that the encoding of both buildings and landscapes activated the parahippocampal gyrus bilaterally, whereas recognition of these visual stimuli activated only the right parahippocampal gyrus. In contrast, the encoding of both human and animal faces activated the fusiform gyrus bilaterally, whereas recognition of these visual stimuli activated only the right fusiform gyrus. Overall, this study demonstrated that some medial temporal structures, particularly the right parahippocampal gyrus, are specifically involved in the learning and recognition of topographically relevant stimuli, whereas other medial temporal structures (i.e., the fusiform gyrus) support a type of learning and recognition that does not rely on topographical landmarks.
In another PET study, Maguire et al. (1998) found that particular parahippocampal regions were activated while participants explored 2 virtual reality environments. Specifically, one environment contained salient objects and textures that could be used to discriminate between different rooms, and the second environment involved plain empty rooms that were distinguishable only by their shape. Interestingly, the presence of salient objects in the environment resulted in increased activity in the right parahippocampal gyrus, but this region was not activated during exploration of the empty environment. Further, Aguirre et al. (1996) collected functional magnetic resonance imaging (fMRI) data from participants navigating a virtual maze-like environment and also found that the right parahippocampus was activated during encoding of the maze, and another recent fMRI study demonstrated that landmarks situated at decision points within a virtual maze activated the parahippocampal gyrus more than did landmarks that were not situated at decision points (Janzen and van 2004). These authors suggested that the parahippocampus is involved in the automatic processing of the location of landmarks associated with key decision points in a spatial environment.
Taken together, these studies present evidence that the parahippocampal gyrus, particularly in the right hemisphere, provides the neural substrate for the encoding of landmarks, object-in-place associations, scene details, and layouts within the larger system for topographical learning in humans (Maguire et al. 1998; Epstein et al. 1999; Burgess et al. 2002; Ekstrom et al. 2003; Spiers and Maguire 2004; Ekstrom and Bookheimer 2007). In the present study, we examined the ERP for evidence that this system contributed to topographical learning and memory in the T-maze. We propose that the NT170 is analogous to the “face” N170, which is thought to be generated in the fusiform gyrus (Rossion et al. 2003; Itier and Taylor 2004; Iidaka et al. 2006; Deffke et al. 2007), a neural area that lies immediately adjacent to parahippocampal cortex (McDonald et al. 2001). Consistent with our proposition, nonface stimuli elicited an NT170 that was maximal over the right hemisphere. In Experiment 1, we observed that feedback stimuli presented in a virtual maze evoked a large NT170 over electrode site PO8, reaching maximal amplitude at approximately 180 ms after feedback onset. The polarity (negative deflection), latency (170–190 ms), and location (PO8) of this ERP component bear a very close resemblance to the N170 elicited by face stimuli. Interestingly, this ERP component was extremely focal, being significantly larger at that right-hemisphere electrode location than at all other locations (see Fig. 4), in keeping with observations that the right parahippocampal region is involved in spatial tasks (Aguirre et al. 1996; Maguire et al. 1998), whereas the left parahippocampal region is involved in language tasks (Fujii et al. 1997).
More significantly, the latency of the NT170 component was sensitive to the spatial location of the feedback within the maze, occurring earlier for feedback stimuli found in the right arm of the T-maze compared with feedback stimuli found in the left arm of the T-maze. This latency difference was only observed at the electrode site where this component was found to be maximal.
Further, these findings were generally replicated in Experiment 2. Specifically, we found that the NT170 component was again maximal at channel PO8, where it was significantly larger than at all other channels except at its nearest neighbor channel P8, and reached peak amplitude approximately 180 ms following feedback onset. In addition, there was an interaction of stimulus type and alley for the latency of the NT170 component, such that it occurred earlier for the predictive cues presented in the right front alley than for the predictive cues presented in the left front alley, whereas it occurred later for the feedback stimuli presented in the right back alley than for the feedback stimuli presented in the left back alley (see Figs. 9 and 10). Importantly, the feedback stimuli and predictive cues used in our virtual environments share features with stimuli used in hemodynamic neuroimaging studies (i.e., fMRI, PET), such as rooms with salient objects placed in them (Maguire et al. 1998), that tended to evoke neural activity in the right parahippocampal gyrus. On the basis of this evidence, we propose that presentation of the reward-related stimuli in the maze alleys evoked activity in the right parahippocampal gyrus that in turn produced the NT170 (see the Supplementary Materials online for the results of a source localization analysis that are consistent with this hypothesis).
Suggestively in this regard, neurons in the hippocampal formation and parahippocampal region produce extracellular potentials that oscillate in the frequency range of 4–12 Hz, the “theta rhythm” (O'Keefe and Recce 1993; Hafting et al. 2005, 2008; Yamaguchi et al. 2007; Cornwell et al. 2008). These neurons are characterized by their sensitivity to “place fields” that represent the local environment (i.e., the neurons exhibit environmental position-dependent firing) and as such are implicated in the MTL's “cognitive mapping” system (O'Keefe 1976; O'Keefe and Nadel 1978; O'Keefe and Recce 1993; O'Keefe et al. 1998; Hafting et al. 2008). Importantly, as an animal traverses these place fields, the time of firing of the associated neurons systematically shifts with respect to the phase of the ongoing theta rhythm. This observation has motivated the proposal that the relationship between the timing of place field firing of neurons and theta rhythm phase may constitute a temporal mechanism for encoding spatial information (Brun et al. 2002; Hafting 2005). Further, human studies have revealed evidence of high-amplitude theta oscillations during virtual maze learning (Caplan et al. 2001; Kahana et al. 2001). We tentatively suggest that the appearance of the eliciting stimulus (e.g., positive feedback) resets the phase of the ongoing theta rhythm but that the timing of the resetting occurs slightly earlier for objects encountered on the subject's right side compared with the left side. Phase resetting of the ongoing EEG can appear as stimulus-locked deflections in the ERP (Makeig et al. 2002; Yeung et al. 2004; see also Yeung et al. 2007). Because the parahippocampal region produces a theta rhythm, phase resetting of this rhythm might produce the NT170 (see the Supplementary Materials online for the results of a frequency-domain analysis that are consistent with this hypothesis).
Further, if the timing of the reset differed for objects encountered in the right versus the left side of the subject's environment, then NT170 latency might differ by a commensurate amount. Although admittedly speculative, we hope that this suggestion will motivate future research.
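The phase-resetting idea can be illustrated with a toy simulation in which an ongoing oscillation has a different phase on every trial until a reset aligns all trials, so that averaging yields a flat prestimulus baseline followed by a stimulus-locked deflection. Everything below is an assumption made for illustration: the 6 Hz frequency, the reset phase, the reset delays of 140 and 150 ms, the use of evenly spaced phases in place of random ones, and the function names. It is not a model fitted to the data.

```python
import math

THETA_HZ = 6.0     # assumed frequency within the 4-12 Hz theta band
N_TRIALS = 60      # number of simulated trials
RESET_PHASE = 0.0  # assumed phase (radians) imposed by the reset


def simulate_erp(reset_delay_ms, times_ms):
    """Average N_TRIALS theta oscillations whose phase resets after the stimulus."""
    erp = []
    for t in times_ms:
        total = 0.0
        for k in range(N_TRIALS):
            if t < reset_delay_ms:
                # Before the reset, each trial has its own phase (evenly spaced
                # over the cycle, standing in for random phases), so the
                # trial average cancels out.
                trial_phase = 2.0 * math.pi * k / N_TRIALS
                phase = 2.0 * math.pi * THETA_HZ * t / 1000.0 + trial_phase
            else:
                # After the reset, every trial restarts from the same phase.
                elapsed_s = (t - reset_delay_ms) / 1000.0
                phase = 2.0 * math.pi * THETA_HZ * elapsed_s + RESET_PHASE
            total += -math.sin(phase)  # negative-going first deflection
        erp.append(total / N_TRIALS)
    return erp


def peak_latency(erp, times_ms):
    """Return the time of the most negative sample of the averaged waveform."""
    index = min(range(len(erp)), key=erp.__getitem__)
    return times_ms[index]


times = list(range(0, 300))              # 1 ms resolution, 0-299 ms
early_reset = simulate_erp(140.0, times)  # hypothetical earlier reset
late_reset = simulate_erp(150.0, times)   # hypothetical later reset
latency_shift = peak_latency(late_reset, times) - peak_latency(early_reset, times)
```

In this sketch the negative peak follows the reset by about a quarter of a theta cycle, so a 10 ms difference in reset timing carries over directly into a 10 ms difference in peak latency, which is the commensurate shift that the suggestion above relies on.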
To our knowledge, our study is the first to examine electrophysiological measures of navigation and reward processing in humans using a virtual T-maze task. Consistent with previous accounts (O'Reilly and Rudy 2000, 2001; O'Reilly and Norman 2002; Frank et al. 2003; Atallah et al. 2004), our findings suggest that the functions of a dopamine-dependent frontal system for reinforcement learning (as revealed by the fERN) and a MTL system for spatial learning (as revealed by the NT170) interact as participants learn to navigate a virtual maze. Although our fERN findings confirm theoretical predictions about the dopamine system (Holroyd and Coles 2002) and are consistent with previous results (Nieuwenhuis et al. 2004), our NT170 findings are entirely novel and will need to be explored in future experiments. Further, it remains unclear whether the systems that generate the NT170 and fERN constitute 2 competing, parallel systems, or whether they work in concert, each contributing specialized functions (Packard and McGaugh 1996; White and McDonald 2002; Packard and Knowlton 2002; Poldrack and Packard 2003; Squire et al. 2004). For example, dopamine signals carried to the MTL have been shown to affect long-term potentiation by inhibiting or facilitating excitatory synaptic transmission there (Kotecha et al. 2002), so in principle the dopamine reward prediction error signals could be used to shape and consolidate the conjunctive representations formed by the MTL (Foster et al. 2000). These issues are ripe for future investigation.
Natural Sciences and Engineering Research Council Discovery Grant (RGPIN 312409-05); Integrated Mentorship Program of Addiction Research Training (IMPART) doctoral fellowship.
Travis E. Baker and Clay B. Holroyd, Department of Psychology, University of Victoria. Portions of this paper were presented at the 46th Annual Meeting of the Society for Psychophysiological Research, Vancouver, Canada, October 2006, and at Canadian Society for Brain, Behaviour, and Cognitive Science 17th Annual Meeting, Victoria, Canada, June 2007. We are grateful to the research assistants in the Brain and Cognition Laboratory for help with data collection. Conflict of Interest: None declared.