Abstract

Associative theory postulates that learning the consequences of our actions in a given context is represented in the brain as stimulus–response–outcome associations that evolve according to prediction-error signals (the discrepancy between the observed and predicted outcome). We tested the theory on brain functional magnetic resonance imaging data acquired from human participants learning arbitrary visuomotor associations. We developed a novel task that systematically manipulated learning and induced highly reproducible performances. This allowed us to validate the model-based results and to analyze in depth the brain signals in representative single trials. Consistent with the Rescorla–Wagner model, prediction-error signals are computed in the human brain and selectively engage the ventral striatum. In addition, we found evidence of computations not formally predicted by the Rescorla–Wagner model. The dorsal fronto-parietal network, the dorsal striatum, and the ventrolateral prefrontal cortex are activated on both the incorrect and the first correct trials and may reflect the processing of relevant visuomotor mappings during the early phases of learning. The left dorsolateral prefrontal cortex is selectively activated on the first correct outcome. The results provide quantitative evidence of the neural computations mediating arbitrary visuomotor learning and suggest new directions for future computational models.

Learning the consequences of our actions in their context is a fundamental cognitive ability, because it allows us and other animals to anticipate relevant events and to adapt to varying environments. If the relation between the visual stimulus (or context), the action, and its outcome is arbitrary and causal, we refer to this ability as arbitrary visuomotor learning (Wise and Murray 2000). In monkey neurophysiology, research on its neural bases has relied on 2 complementary approaches. One searched for covariations between the firing rate of single neurons and behavioral performance, measured by the probability of correct response. Three classes of neurons, correlating positively or negatively with the probability of correct response or with its rate of change, have been found in the dorsal premotor (PMd) and prefrontal cortex, striatum, and hippocampus (for reviews, see Wise and Murray 2000; Brasted and Wise 2005; Suzuki and Brown 2005). A complementary approach searched for changes in selectivity for the rewarded action in the average firing rate of neuronal populations (Asaad et al. 1998; Pasupathy and Miller 2005). The authors found that the selectivity strength of neurons in the prefrontal cortex and in the striatum increases and its latency decreases (i.e., selectivity occurs earlier in the trial) as learning takes place. In human neuroimaging, a number of studies using positron emission tomography and functional magnetic resonance imaging (fMRI) have also sought to identify the underlying brain correlates. Regional cerebral blood flow has been shown to vary during learning in the frontal and parietal networks (Deiber et al. 1997) and in the prefrontal–basal ganglia pathways (Toni and Passingham 1999). Learning-related changes in the blood oxygenation level–dependent (BOLD) signal have been identified in the temporal and prefrontal areas (Toni et al. 2001), and in the parietal and frontal cortical areas (Eliassen et al. 2003). 
More recently, BOLD changes reflecting the probability of correct response have been found in the medial temporal lobe (MTL) as well as in the cingulate cortex and frontal lobe (Law et al. 2005).

These studies have established that arbitrary visuomotor learning engages a large brain network including the frontal–parietal system, the basal ganglia, and medial temporal structures. However, the neural computations mediating the acquisition of arbitrary visuomotor relations are not fully understood. To address this issue, we put forward hypotheses about the internal computations using animal associative learning theory (Dickinson 1980; Pearce 1997). We assumed that this ability resides in the computation of goal-directed stimulus–response–outcome associations whose strengths depend on the contingency and contiguity of the events, as well as the current goals and motivational state (Rescorla 1991; Dickinson 1994; Pearce 1997; Balleine and Dickinson 1998). We then predicted the evolution of the associative values and outcome prediction errors (i.e., the discrepancy between the observed and predicted outcome) by fitting the Rescorla–Wagner model (1972) to the behavioral performances of human participants learning arbitrary visuomotor associations. In the fMRI analysis, we tested if the predicted computations are represented in the brain using a model-based analysis of the BOLD signals. In addition, because the newly developed task systematically manipulated learning and induced highly reproducible performances, we validated and complemented the results by analyzing the brain signals in representative single trials. The results provide novel insights into the neural computations subserving arbitrary visuomotor learning and how distributed brain networks cooperate in the early phases of learning.

Materials and Methods

Subjects

Fourteen healthy subjects participated in the study (all were right-handed and 7 were female; average age 26 years). All participants gave written informed consent according to established institutional guidelines and received monetary compensation (45 euros) for their participation. The project was approved by the local ethics committee.

Task Design

Our implementation of arbitrary visuomotor learning required subjects to find by trial-and-error the correct associations between 3 colored circles and 5 finger movements (Fig. 1A). We developed a new task design to allow both a model-based analysis of the fMRI data (Corrado and Doya 2007; O'Doherty et al. 2007) and a trial-based validation of the results. Trial-based analysis is problematic when the behavioral choices are poorly reproducible across subjects and sessions, such as in the early phases of learning. To ensure the statistical power to analyze the BOLD responses in single representative trials, the novel task had to systematically manipulate learning and induce highly reproducible behavioral performances across subjects. To do so, the correct stimulus–response associations were not set a priori. Instead, the correct stimulus–response associations were assigned as the subject proceeded in the task. Figure 1B shows the matrix of all the possible stimulus–response combinations, updated according to the exemplar learning session (Fig. 1A). The first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the subject's motor response (Fig. 1B, from trials 1 to 3). Then, on the second presentation of stimulus S1, any untried finger movement was always considered as a correct response. Because the stimuli were presented randomly within blocks of 3 trials, stimulus S1 could occur for the second time from trials 4 to 6 (trial 4 in Fig. 1). For the second stimulus S2, the response was correct only when the subjects had performed 3 incorrect finger movements (trial 9 in Fig. 1B). For stimulus S3, the subject had to try 4 different finger movements before the correct response was found (trial 12 in Fig. 1B). In other words, the correct response was the second finger movement (different from the first tried response) for stimulus S1, the fourth finger movement for stimulus S2, the fifth for stimulus S3. 
Each learning session was composed of 42 trials, 3 stimulus types (i.e., different colors, S1, S2, and S3) and 5 possible finger movements. During scanning, each subject performed 4 learning sessions, each containing new colored stimuli.
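The feedback rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the software used in the experiment: the class name `AdaptiveFeedback`, the stimulus labels, and the finger indices 0–4 are hypothetical. It also assumes that repeating an already-tried incorrect movement does not count toward the required number of tries, and that once a correct response has been assigned only that movement is rewarded.

```python
# Number of distinct finger movements tried when the correct response is
# assigned (S1: 2nd movement, S2: 4th, S3: 5th), as described in the text.
REQUIRED_TRIES = {"S1": 2, "S2": 4, "S3": 5}


class AdaptiveFeedback:
    """Hypothetical sketch: assigns correct responses as the subject proceeds."""

    def __init__(self, required=REQUIRED_TRIES):
        self.required = dict(required)
        self.tried = {s: [] for s in required}        # distinct movements so far
        self.assigned = {s: None for s in required}   # correct movement once set

    def outcome(self, stimulus, finger):
        """Return True for a correct outcome and False for an incorrect one."""
        if self.assigned[stimulus] is not None:
            # After assignment, only the assigned movement is correct.
            return finger == self.assigned[stimulus]
        if finger not in self.tried[stimulus]:
            self.tried[stimulus].append(finger)
            if len(self.tried[stimulus]) == self.required[stimulus]:
                # The required number of untried movements has been reached:
                # this movement becomes the correct response.
                self.assigned[stimulus] = finger
                return True
        # Assumption: repeated or too-early movements are incorrect.
        return False


fb = AdaptiveFeedback()
s1_outcomes = [fb.outcome("S1", f) for f in (0, 1, 1)]
s2_outcomes = [fb.outcome("S2", f) for f in (0, 1, 2, 3)]
s3_outcomes = [fb.outcome("S3", f) for f in (0, 1, 2, 3, 4)]
```

Under these assumptions the first presentation of every stimulus is incorrect whatever the subject does, because no stimulus can reach its required number of tried movements on the first attempt.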

Figure 1.

(A) Task design of an exemplar learning session. (B) Matrix of all the possible stimulus–response combinations corresponding to the exemplar session in (A). A red cross and a green tick-mark refer to incorrect and correct stimulus–response sequences, respectively. The first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the motor response (from trials 1 to 3). On the second presentation of the stimulus S1 (the blue circle), any untried finger movement was always followed by a correct outcome (trial 4). The correct response for S2 and S3 (red and green circles, respectively) was found after 3 and 4 incorrect finger movements (at trials 9 and 12, respectively).

The second issue that motivated our task design was the need to dissociate the BOLD signals produced by the stimulus presentation from those due to the outcome. To do so, we introduced a variable delay ranging from 4 to 12 s (randomly drawn from a log-normal distribution) between the presentation of the colored circles and the feedback image. The next trial started after a variable delay ranging from 4 to 12 s with the presentation of another visual stimulus. The outcome image informed the subject whether the response was correct, incorrect, or too late (if the reaction time exceeded 1.5 s). “Late” trials were excluded from the analysis, because they were very rare (i.e., maximum 2 late trials per session). The visual stimulus was projected onto the center of a screen positioned in the back of the scanner using a video projector. In the scanner, subjects could see the video reflected in a mirror (15 × 9 cm) suspended 10 cm in front of their face and subtending visual angles of 42° horizontally and 32° vertically. Finger movements and their timings were recorded using an fMRI-compatible keyboard with 5 buttons positioned under the 5 fingers of the right hand.

The subjects were told that the correct stimulus–response associations were not mutually exclusive (i.e., 2 stimuli could be associated with the same finger movement), meaning that they could not infer correct associations by excluding previous correct movements. Subjects were explicitly told that every stimulus was associated with only 1 finger movement, thus convincing them that the probability of correct response on the first presentation of each stimulus was 0.2 (1 over 5 possible finger movements). When interviewed after the scanning session, none of the subjects realized that learning was manipulated by the experimenter. In particular, none realized that the first stimulus presentation was always followed by an incorrect outcome, irrespective of the response.

Behavioral Model

Implementation of the Associative Model

Various models provide different accounts of the evolution of the associative values (Rescorla and Wagner 1972; Mackintosh 1975; Pearce and Hall 1980), but all assign a central role to the computation of the outcome prediction error (i.e., the discrepancy between the observed and predicted outcome). We modeled arbitrary visuomotor learning using one of the most influential associative learning theories, the Rescorla–Wagner model (1972). The evolution of the associative strength of each action is given by 

(1)
$$V_a(n+1) = V_a(n) + \eta \, [\lambda - V_a(n)]$$
where n is the trial number, a = 1, … k (k is the number of possible actions) and η is the learning rate for a given stimulus. The asymptotic value of the association strength λ takes values greater than 0 after a correct response and 0 after an incorrect response. The change in associative strength at each trial is proportional to the prediction-error signal, which represents the discrepancy between the obtained and expected outcome (for a review, see Schultz 2006). The expected outcome prior to learning is formalized as the initial value of the associative strength. Because we had instructed the subjects that 1 of the 5 finger movements was associated with a given stimulus, the initial probability of correct response can be assumed to be 0.2. To model such initial probabilities, we set the initial value of all associative strengths to λ/k.

Associative learning theory claims that the probability to perform a given action is proportional to its associative value. To model such an action selection process, we transformed the association values V(n) into probabilities according to the softmax equation, which is a standard method in reinforcement learning theory (Sutton and Barto 1998): 

(2)
$$P_a(n) = \frac{\exp[\beta V_a(n)]}{\sum_{b=1}^{k} \exp[\beta V_b(n)]}$$
The coefficient β is termed the inverse “temperature”: low β (less than 1) causes all actions to be (nearly) equiprobable, whereas high β (greater than 1) amplifies the differences in association values. In reinforcement learning theory (Sutton and Barto 1998), this model is also referred to as the Q-learning algorithm (Watkins and Dayan 1992), in which action values are updated through the Rescorla–Wagner learning rule. To summarize, learning is described as follows: 1) the learner computes the stimulus–response–outcome association values according to the expected outcome, 2) he/she selects an action by comparing the values of multiple stimulus–response alternatives (softmax equation), and 3) he/she updates the association values according to an error-correcting process equal to the discrepancy between the obtained and estimated outcome (i.e., the prediction-error signal). Overall, learning requires the computation of the associative values and the prediction-error signals.
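The three steps summarized above can be illustrated with a minimal Python sketch of the Rescorla–Wagner update (eq. 1) and the softmax rule (eq. 2). The function names and the parameter values (η = 0.5, λ = 1, β = 3) are illustrative assumptions, not the fitted estimates. The sketch also assumes that only the associative strength of the performed action is updated on each trial.

```python
import math


def softmax_probabilities(values, beta):
    """Eq. 2 sketch: turn associative values into action probabilities."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    peak = max(values)
    weights = [math.exp(beta * (v - peak)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def rescorla_wagner_update(values, action, correct, eta, lam):
    """Eq. 1 sketch: update the performed action's associative strength.

    Returns the updated values and the prediction error (obtained - expected).
    """
    target = lam if correct else 0.0   # lambda after a correct response, else 0
    prediction_error = target - values[action]
    updated = list(values)
    updated[action] += eta * prediction_error
    return updated, prediction_error


# Illustrative parameter values (assumptions, not fitted estimates).
k, eta, lam, beta = 5, 0.5, 1.0, 3.0
values = [lam / k] * k                                  # initial strengths lambda/k
initial_probs = softmax_probabilities(values, beta)     # equiprobable actions

values, pe_incorrect = rescorla_wagner_update(values, 0, False, eta, lam)
values, pe_correct = rescorla_wagner_update(values, 1, True, eta, lam)
probs = softmax_probabilities(values, beta)
```

With equal initial strengths every action starts with probability 1/k = 0.2. In this example the incorrect trial yields a negative prediction error (−0.2) and the first correct trial a positive one (0.8), after which the rewarded action becomes the most probable.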

Estimation of the Models’ Parameters

To quantify the learning computations, we need to define the free parameters of the model, namely the learning rate η, the asymptotic value of the associative strength λ, and the inverse “temperature” β in the softmax method (eq. 2). We identified the set of parameters that best fitted the behavioral data using a maximum likelihood approach. For each learning session, we varied the learning rate η from 0.1 to 1 (in steps of 0.05), the asymptotic value λ from 1 to 5 (in steps of 0.5), and β from 0.5 to 5 (in steps of 0.25). For each parameter set, we computed the log-likelihood of the actions performed by the participant as follows: 

(3)
$$L = \sum_{n} \log P_a(n)$$
where Pa(n) is the probability to perform the action executed by the participant at trial n according to the model (eq. 2). If the model perfectly predicts the responses, Pa(n) is 1 and the log-likelihood L is 0. If the probability values are less than 1, L assumes negative values. Thus, we selected the set of model parameters θ′ giving the best fit (i.e., highest value of L) with the behavioural data: 
(4)
$$\theta' = \arg\max_{\theta} L(\theta)$$
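The grid search over (η, λ, β) can be sketched as follows. This is a minimal illustration under stated assumptions: the log-likelihood is taken as the sum of log Pa(n) over trials, only the performed action is updated, and the function names and the short trial list are hypothetical rather than taken from the data.

```python
import math
from itertools import product


def session_log_likelihood(trials, eta, lam, beta, k=5):
    """Sum of log P_a(n) over the trials of one session (eq. 3 sketch).

    `trials` is a list of (stimulus, action, correct) tuples.
    """
    values = {}
    log_lik = 0.0
    for stimulus, action, correct in trials:
        v = values.setdefault(stimulus, [lam / k] * k)
        peak = max(v)
        weights = [math.exp(beta * (x - peak)) for x in v]
        log_lik += math.log(weights[action] / sum(weights))   # softmax (eq. 2)
        target = lam if correct else 0.0
        v[action] += eta * (target - v[action])               # update (eq. 1)
    return log_lik


def grid(start, stop, step):
    """Inclusive grid built from an integer count to avoid float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]


def fit_by_grid_search(trials):
    """Return (best log-likelihood, (eta, lam, beta)) over the grid in the text."""
    best = None
    for eta, lam, beta in product(grid(0.1, 1.0, 0.05),
                                  grid(1.0, 5.0, 0.5),
                                  grid(0.5, 5.0, 0.25)):
        score = session_log_likelihood(trials, eta, lam, beta)
        if best is None or score > best[0]:
            best = (score, (eta, lam, beta))
    return best


# Hypothetical trial sequence for two stimuli (not real data).
trials = [("S1", 0, False), ("S2", 0, False), ("S1", 1, True),
          ("S2", 1, False), ("S1", 1, True), ("S2", 2, False),
          ("S1", 1, True), ("S2", 3, True), ("S2", 3, True), ("S1", 1, True)]
best_score, best_params = fit_by_grid_search(trials)
```

The grid contains 19 × 9 × 19 = 3249 parameter sets, so an exhaustive search is cheap and needs no gradient-based optimizer.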

fMRI Data Acquisition and Preprocessing

Images were acquired using a 3-T whole-body imager equipped with a circularly polarized head coil. For each participant, we acquired a high-resolution structural T1-weighted anatomical image (inversion-recovery sequence, 1 × 0.75 × 1.22 mm) parallel to the anterior commissure-posterior commissure plane, covering the whole brain. For functional imaging, we used a T2*-weighted echo-planar (EP) sequence with 36 interleaved 3-mm-thick axial slices with 1-mm gap (time repetition [TR] = 2400 ms, time echo = 35 ms, flip angle = 80°, field of view = 19.2 × 19.2 cm, 64 × 64 matrix of 3 × 3 mm voxels). Image analysis was conducted using SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). For each subject, all functional images were slice-time corrected to a slice acquired half-way through the volume acquisition (at TR/2) and then realigned to the first slice of each learning session. The anatomical MRI was spatially normalized to a standard T2* template and the normalization parameters were used to normalize the preprocessed functional EP images. The resulting fMRI data were then smoothed using a Gaussian kernel with a full width at half-maximum of 8 mm. Finally, intensity normalization and high-pass filtering (128 s) were applied to the data.

fMRI Data Analysis

Model-Based Analysis. Identification of Learning-Related Activations

We performed a model-based analysis of the fMRI data to test if the computations predicted by the associative model are represented in the brain. The statistical analysis of the preprocessed event-related BOLD signals was performed using a general linear model approach. Because we aimed at dissociating 2 events per trial (the stimulus–response event and the outcome image, as in Fig. 1A), the regressors were constructed by convolving the canonical hemodynamic response function with delta functions of constant or varying amplitudes aligned on the time of stimulus or outcome presentation. The first and third regressors of the design matrix had unit amplitudes and were aligned on the time of stimulus and outcome presentation, respectively. These 2 regressors are excluded from the results, as they account for BOLD responses that did not vary during learning. Results will focus on 3 regressors whose amplitude varied according to the associative model: 1 regressor (the second in the design matrix) “searched” for the neural representations of the associative values and 2 (the fourth and fifth) for those of the prediction-error signals. The delta functions of the second regressor, with onsets at the stimulus presentations and a duration of 1.5 s, were parametrically modulated according to the associative values Va(n). In order to explore the neural substrate of prediction-error signals, the amplitude of the fourth regressor varied according to the rate of change of Va(n), that is, Va(n + 1) − Va(n), and was aligned on the outcome image. Because the prediction error is negative on the first incorrect trials, we also hypothesized that negative prediction-error signals could produce positive BOLD deflections. Therefore, the parametric modulation of the fifth regressor was equal to the absolute value of the fourth regressor, that is, the absolute value of the prediction-error signals. 
The regression parameters (the beta values) were estimated for each subject and design matrix, and they were taken to the random-effects level. All the fMRI statistics and P values arise from group random-effects analyses. Because we searched for significant correlations over the entire brain, we considered as activated all the voxels with a P < 0.01 after whole-brain correction for multiple comparisons (family-wise error). In addition, we defined as activated brain regions those clusters of more than 5 voxels, thus giving a P < 0.001 corrected for cluster dimension.
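As an illustration of how the outcome-locked regressors could be built, the sketch below convolves amplitude-weighted delta functions with a double-gamma hemodynamic response function and samples the result at each scan. It is not the SPM5 implementation: the function names are hypothetical, the HRF parameters (peak at 6 s, undershoot at 16 s, ratio 1/6) are commonly used defaults assumed here, the onsets and prediction errors are invented, and steps such as the 1.5-s duration of the stimulus regressor are left out.

```python
import math


def canonical_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (s); parameter values are assumed defaults."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit scale.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)


def build_regressor(onsets, amplitudes, n_scans, tr=2.4):
    """Convolve weighted delta functions with the HRF, sampled once per scan.

    Convolving a delta function with the HRF shifts the HRF to the onset,
    so the regressor is a sum of shifted, amplitude-scaled HRFs.
    """
    regressor = []
    for scan in range(n_scans):
        t_scan = scan * tr
        regressor.append(sum(a * canonical_hrf(t_scan - onset)
                             for onset, a in zip(onsets, amplitudes)))
    return regressor


# Hypothetical outcome onsets (s) and prediction errors V(n + 1) - V(n).
outcome_onsets = [10.0, 30.0, 50.0]
prediction_errors = [-0.1, 0.4, 0.2]

signed_pe = build_regressor(outcome_onsets, prediction_errors, n_scans=30)
absolute_pe = build_regressor(outcome_onsets,
                              [abs(pe) for pe in prediction_errors], n_scans=30)
```

After the first (incorrect) outcome the two regressors have opposite signs, which is what lets the fifth regressor capture positive BOLD deflections associated with negative prediction errors.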

The Montreal Neurological Institute (MNI) coordinates of the activated voxels were transformed to Talairach space using a Matlab (MathWorks, Inc. MA) code developed by Matthew Brett (http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach). The anatomical location of each activated voxel within each activated cluster was identified using the Talairach Daemon software (Lancaster et al. 2000) (http://ric.uthscsa.edu/resources/), and the location of a cluster was defined as the brain region corresponding to the majority of activated voxels. The extent and the number of voxels in each activated cluster was defined on functional rather than anatomical data.

Trial-Based Analysis. Computation of the Grand-Average BOLD Responses and the Identification of Learning-Related Functional Networks

To validate the results of the model-based analysis and to better understand the functional role of each activated cluster, we analyzed the BOLD responses in 5 sets of representative single trials: the first set of 3 incorrect trials (trials 1–3 in each learning session, Fig. 1A), the next set of 3 incorrect trials, the first set of 3 correct trials (1 for each stimulus–response association), and the second and third sets of correct trials (3 trials each for each learning session). We first normalized the preprocessed fMRI signals by subtracting the mean value and dividing by the standard deviation calculated over the scanning session. Then, we averaged the normalized time courses over the activated voxels for each cluster. In order to attain the highest temporal resolution, we grouped all the data points across subjects for a given set of trials (168 trials: 14 subjects with 4 sessions and 3 trials each), and we analyzed the data in sliding windows of 1.2 s (TR/2) stepped every 0.1 s. For each time window, we computed the mean BOLD response and the standard error of the mean (Supplementary Fig. 2). Because the fMRI scans were not synchronized to stimulus presentation, the BOLD signals were not sampled at exactly the same latencies with respect to the external events, but varied across trials. By grouping all the data across subjects, we could thus compute the grand-average BOLD responses for the 5 sets of representative trials at a temporal resolution finer than the repetition time of 2.4 s (Supplementary Fig. 2). On average, each window of 1.2 s contained 83.5 ± 10.6 (mean ± standard deviation) data points.
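The sliding-window averaging can be sketched as below. The function name and the synthetic data are hypothetical: 168 trials are simulated with scan times jittered relative to the event, and a Gaussian bump stands in for the BOLD response. Only the window width (1.2 s) and step (0.1 s) come from the text.

```python
import math
from statistics import mean, stdev


def sliding_window_average(latencies, samples, t_start, t_end,
                           width=1.2, step=0.1):
    """Pool (latency, sample) pairs across trials and average them in
    overlapping windows. Returns (window center, mean, SEM, n) tuples."""
    n_windows = int(math.floor((t_end - t_start - width) / step + 1e-9)) + 1
    results = []
    for i in range(n_windows):
        lo = t_start + i * step
        hi = lo + width
        in_window = [y for t, y in zip(latencies, samples) if lo <= t < hi]
        if len(in_window) < 2:
            continue  # the SEM is undefined with fewer than 2 points
        sem = stdev(in_window) / math.sqrt(len(in_window))
        results.append((lo + width / 2, mean(in_window), sem, len(in_window)))
    return results


# Synthetic data: each trial is sampled every TR = 2.4 s, with a different
# offset between the event and the first scan (scans are not event-locked).
tr, latencies, samples = 2.4, [], []
for trial in range(168):
    offset = (trial * 0.37) % tr
    for scan in range(6):
        t = offset + scan * tr
        latencies.append(t)
        samples.append(math.exp(-((t - 6.0) ** 2) / 8.0))  # stand-in response

windows = sliding_window_average(latencies, samples, 0.0, 6 * tr)
peak_center = max(windows, key=lambda w: w[1])[0]
```

Because each trial contributes at most one sample to a 1.2-s window, pooling 168 trials with jittered sampling times gives roughly 84 points per window, which is how a time course finer than the repetition time can be recovered.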

We then examined whether the different clusters formed functionally homogeneous networks. To investigate grouping in our fMRI data, we used a hierarchical clustering method. Hierarchical clustering uses similarity or dissimilarity between every pair of objects in the data to create a hierarchical tree or dendrogram. We analyzed the concatenated grand-average BOLD responses for the 5 sets of trials, and we computed the Pearson's correlation r among all pairs of clusters to create a dissimilarity matrix. The dissimilarity or “distance” between clusters was defined as 1 − r: 2 highly correlated clusters (r close to 1) had the shortest “distance” (close to 0) and could be considered as part of the same network; very dissimilar clusters (r close to 0) had the largest “distance” and were considered as belonging to different networks. We identified agglomerations according to the minimum distance (i.e., maximal correlation) between clusters and we built a hierarchical tree by progressively merging clusters into networks using the single-linkage clustering algorithm. The resulting hierarchical tree, or dendrogram, is read from bottom to top: clusters with the shortest distance (i.e., networks composed of highly correlated clusters) are initially merged together and then merged into successively larger networks. The height of each inverted U is the distance (1 − r) between the 2 clusters or networks being connected. Further information on clustering methods can be found at http://en.wikipedia.org/wiki/Data_clustering. Matlab (The MathWorks, Inc.) functions in the Statistics Toolbox were used for this analysis.
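The clustering procedure can be illustrated with a small pure-Python sketch of single-linkage agglomeration on the distance 1 − r. It stands in for the Matlab toolbox functions used in the study and is not their implementation; the function names and the three short time series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def single_linkage(series):
    """Agglomerate named time series using distance 1 - r and single linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    ordered from the first (closest) merge to the last.
    """
    names = list(series)
    dist = {frozenset((a, b)): 1.0 - pearson_r(series[a], series[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    groups = [(name,) for name in names]
    merges = []
    while len(groups) > 1:
        # Single linkage: distance between groups is the minimum distance
        # between any pair of their members.
        d, g1, g2 = min(
            ((min(dist[frozenset((a, b))] for a in g1 for b in g2), g1, g2)
             for i, g1 in enumerate(groups) for g2 in groups[i + 1:]),
            key=lambda item: item[0])
        groups = [g for g in groups if g not in (g1, g2)] + [g1 + g2]
        merges.append((g1, g2, d))
    return merges


# Hypothetical grand-average responses for three clusters (not real data).
responses = {
    "PMd_left": [0, 1, 2, 1, 0, 1, 2, 1],
    "PMd_right": [0, 1, 2, 1, 0, 1, 2, 2],
    "ventral_striatum": [2, 0, 1, 0, 2, 1, 0, 1],
}
merges = single_linkage(responses)
```

In this toy example the two highly correlated series are merged first at a small distance, and the dissimilar one joins last, mirroring how the dendrogram is read from bottom to top.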

Results

Behavioral Results and Model's Predictions

Our task design produced reproducible learning performances across sessions and subjects. As illustrated in Figure 2A, the small standard error values provide evidence for highly reproducible learning performances for the 3 stimulus–response associations. Figure 2B shows the probability of correct response computed on the sequences of outcomes using the state-space model approach developed by Smith et al. (2004) and previously applied on animal data by Wirth et al. (2003). Similarly, the variance across sessions and subjects is small with respect to the mean value. The first association is rapidly learnt, whereas the second and third are acquired more gradually. Thus, our task design successfully manipulated learning performance and induced reproducible performances ranging from rapid to gradual acquisition. In addition, the observed learning curves are similar to those observed in previous electrophysiological and neuroimaging studies (e.g., Mitz et al. 1991; Chen and Wise 1995; Law et al. 2005).

Figure 2.

Behavioral learning curves. (A) The 3 curves (1 for each stimulus) representing the probability of correct response versus the number of stimulus presentations (not according to the actual appearance during the learning session) computed on the sequence of outcome (1 for correct and 0 for incorrect). The small standard error values provide evidence for quasi-stereotyped learning behaviors. (B) The probability of correct response computed on the sequences of outcomes using the state-space model approach developed by Smith et al. (2004).

Model's Predictions

The associative model significantly fitted the behavioral responses during learning (Supplementary Fig. 1). The learning computations predicted by the associative model are the associative value Va(n) and its rate of change, which is proportional to the prediction-error signal. Figure 3A shows the mean associative values after the presentation of the 3 different stimuli averaged over sessions and subjects. The small standard error values provide evidence for highly reproducible learning computations. The associative strengths increased rapidly after the second, fourth, and fifth presentation of stimuli 1, 2, and 3, respectively. The average increase in associative strength for stimulus 1 displayed a step-like function, whereas it was more gradual for stimulus 3. This is evident in the average prediction-error curve (Fig. 3B). On the stimulus presentations where the action was incorrect (e.g., presentations 1, 2, and 3 for stimulus 2), the prediction error of the performed (incorrect) action was negative, as predicted by equation 1. The prediction-error values were positive on the first correct trials for all stimuli (Fig. 3B). For stimuli 2 and 3, the prediction-error signals did not reach zero on the second correct trials, but remained above zero on the first few trials (Fig. 3B inset, stimulus presentations 6–8). This explains the more gradual acquisition of the second and third associations. In addition, the peak prediction-error values decreased slightly from stimulus 1 to stimulus 3. This is in line with the notion that the prediction error is higher for unexpected positive feedback (i.e., correct action after the presentation of stimulus 1) than for nearly predictable outcomes (i.e., the last possible action for stimulus 3). Figure 3C shows the average probability of correct response predicted by the associative model (eq. 2), which fits the one computed on the behavioral data shown in Figure 2B.

Figure 3.

Mean evolution of the learning representations and the probability of correct response (PCR). (A) Associative values V(n) (1 for each stimulus); the values are plotted against the number of stimulus presentations and not according to the actual appearance during the learning session. (B) Average prediction-error curves. (C) Average PCR predicted by the associative model (eq. 2). The inset in (B) is a zoom-in from trials 6–10 of the average prediction-error curves.

Neuroimaging Results

We searched for the brain regions whose BOLD response correlated with the learning computations predicted by the associative model, that is the associative strengths and the prediction-error signals. No voxels were found to correlate with the associative values at the presentation of the stimulus. By contrast, several brain structures displayed BOLD responses correlating with either the signed or the absolute values of the prediction-error signals (fourth and fifth regressors). The results are summarized in Table 1: 2 clusters (cluster numbers 1 and 2) preferentially correlated with the signed prediction-error signals during learning and 9 clusters (3–11) with its absolute value. Clusters 1 and 2 were localized in the ventral portion of the right head of the caudate nucleus (i.e., the ventral striatum) and in the left dorsolateral prefrontal cortex (DLPFC, Fig. 4). Clusters 3–11 were localized in different cortical and subcortical areas (Fig. 5): clusters 3 and 4 were localized in the left and right dorsal striatum, respectively (left and right head of the dorsal portion of the caudate nucleus and putamen); clusters 5 and 6 were in the premotor areas (left and right PMd cortices); cluster 7 in the left supplementary motor area (SMA); clusters 8 and 9 were located in the superior parietal lobule (SPL); and cluster 10 in the inferior parietal lobule (IPL); finally, cluster 11 was in the anterior portion of the ventrolateral prefrontal cortex in the right hemisphere (VLPFC).

Table 1

Activated clusters correlating with prediction-error signals

Cluster name Cluster size P value Peak MNI coordinates (x, y, z) Peak P value Peak T value Brain area Hemisphere 
Prediction error 
    1 0.0001 0.0001 7.00 Ventral striatum 
    2 0.0001 −45 30 0.0009 6.24 DLPFC (BA 9) 
Absolute prediction error 
    3 59 0.0000 −21 15 0.0000 7.60 Dorsal striatum 
    4 97 0.0000 15 18 0.0000 8.52 Dorsal striatum 
    5 17 0.0000 −27 54 0.0004 6.48 PMd (BA 6) 
    6 92 0.0000 27 60 0.0000 8.07 PMd (BA 6) 
    7 16 0.0000 −6 57 0.0002 6.59 SMA (BA 6) 
    8 23 0.0000 −21 −72 63 0.0012 6.17 SPL (BA 7) 
    9 0.0001 −69 57 0.0029 5.96 SPL (BA 7) 
    10 20 0.0000 45 −42 51 0.0009 6.26 IPL (BA 7) 
    11 12 0.0000 30 54 0.0005 6.38 Anterior VLPFC (BA 10) 

Note: Clusters 1–2 significantly correlate with the signed value of the prediction-error signal, whereas clusters 3–11 correlate with its absolute value. The P values for each cluster (third column) arise from group random-effects analyses and they are corrected for multiple comparisons (family-wise error) and cluster size (see Methods). Abbreviations: BA, Brodmann area; DLPFC, dorsolateral prefrontal cortex.

Figure 4.

Brain clusters significantly correlating with the prediction-error signals. (A) Axial and (B) coronal views of the activations. The cluster number is the same as in Table 1. Activated voxels are those with a P < 0.01 after whole-brain correction for multiple comparisons (family-wise error). Activated clusters were those with more than 5 voxels, thus giving a P < 0.001 corrected for cluster dimension.

Figure 5.

Brain clusters significantly correlating with the absolute value of the prediction-error signals. (A) and (B) are coronal views of the activations whereas (C) and (D) are axial views. The cluster numbering is the same as in Table 1.

To examine whether the different learning-related clusters formed functionally homogeneous networks, we analyzed the grand-average BOLD responses using a hierarchical clustering method (see Methods). Figure 6 shows the hierarchical tree, or dendrogram, obtained from this analysis. A first network originated from the 2 premotor areas (PMd left and right) before merging with the 2 superior parietal clusters and finally with the inferior parietal cluster (green in Fig. 6); a second network formed between the clusters in the dorsal striatum and the SMA (in blue). These 2 networks then merged together with the anterior VLPFC (in yellow), the DLPFC (in red), and finally with the ventral striatum (in gray).
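The merging order described above can be illustrated with a minimal sketch of agglomerative clustering on the distance 1 − r. The linkage rule (average linkage), the function names, and the toy time courses are assumptions made for illustration; the procedure actually used is the one described in Methods.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agglomerate(responses):
    """Agglomerative clustering on the distance 1 - r.

    responses: dict mapping a cluster label to its response time course.
    Returns the merges in order, each as (group_a, group_b, distance).
    Average linkage is an assumption, not necessarily the paper's choice.
    """
    dist = {
        frozenset((a, b)): 1.0 - pearson_r(responses[a], responses[b])
        for a, b in combinations(responses, 2)
    }

    def average_distance(pair):
        g, h = pair
        total = sum(dist[frozenset((a, b))] for a in g for b in h)
        return total / (len(g) * len(h))

    groups = [(label,) for label in responses]
    merges = []
    while len(groups) > 1:
        # Merge the two groups separated by the smallest average distance.
        g, h = min(combinations(groups, 2), key=average_distance)
        merges.append((g, h, average_distance((g, h))))
        groups = [k for k in groups if k not in (g, h)] + [g + h]
    return merges


# Hypothetical time courses: "a" and "b" are similar, "c" is anticorrelated.
toy = {"a": [0, 1, 2, 3], "b": [0, 1, 2, 3.5], "c": [3, 1, 2, 0]}
merges = agglomerate(toy)
```

In this toy case the two similar time courses merge first at a short distance, and the anticorrelated one joins last at a distance above 1, which is how a late-merging structure such as the ventral striatum would appear in the dendrogram.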

Figure 6.

Cluster analysis. The dendrogram is read from bottom to top, starting from the shortest distance between clusters (i.e., networks composed of highly correlated clusters). The height of each inverted U represents the distance (1 − r) between the 2 clusters being connected. The cluster number is the same as in Table 1; clusters 1 and 2 are selective for the signed value of the prediction-error signals, whereas clusters 3–11 correlate with its absolute value. The corresponding cluster names are also reported.

Figure 7A shows the grand-average BOLD responses for the 5 clusters merging in the fronto-parietal network (in green in Fig. 6). These clusters significantly correlate with the absolute value of the prediction error (Table 1) and they display an increase on both the incorrect (Fig. 7A, left panels) and first correct trials (Fig. 7A, right panels). This is also evident in Figure 8A, which shows the histogram of the beta values of the second-level analysis (group analysis across subjects) for the 5 regressors of the design matrix: the main contribution comes from correlations with the absolute value of the prediction error (fifth column in each panel of Fig. 8). Figure 7B shows the grand-average BOLD responses for the 3 clusters forming a fronto-striatal network (blue in Fig. 6), which significantly correlated with the absolute value of the prediction-error signal (Table 1). However, a closer inspection of the histograms of the beta coefficients in Figure 8B indicates a nonzero contribution of the fourth regressor, and thus a small canceling effect between the signed and absolute prediction-error signals. This reflects the fact that the BOLD responses on the first set of correct trials are stronger than those on the incorrect trials. Figure 7C shows the grand-average BOLD responses in the cluster located in the anterior ventrolateral prefrontal cortex, which correlated with the absolute values of the prediction errors (Fig. 8C). Figure 7D shows the response for the left DLPFC, which significantly correlated with the signed values of the prediction-error signal. However, this cluster shows a strong response only on the first correct trials (green curve in the right panel); in fact, the contributions from the signed and absolute values of the prediction errors are similar (Fig. 8D) and thus cancel out on the incorrect trials. Finally, the grand-average BOLD response measured in the ventral striatum (Fig. 7E) showed a negative peak around 4 s after incorrect outcome onset and a strong positive deflection on the first correct trials, thus correlating with the signed value of the prediction-error signal (Fig. 8E). Only in this cluster did the activity specifically follow the signed prediction-error signal. This analysis shows that the trial-based validation of the model-based fMRI results provides a deeper insight into the dynamics of activation.
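The canceling effect between the signed and absolute regressors can be made concrete with a small worked sketch. The function name, the unit prediction errors, and the beta values are hypothetical and chosen only to illustrate the arithmetic; they are not the fitted coefficients of Figure 8.

```python
def predicted_response(delta, beta_signed, beta_abs):
    """Summed contribution of the signed and absolute prediction-error regressors."""
    return beta_signed * delta + beta_abs * abs(delta)


# Hypothetical prediction errors: negative on an incorrect trial,
# positive on a first correct trial.
incorrect, first_correct = -1.0, 1.0

# Similar signed and absolute betas (the DLPFC-like pattern): the two
# terms cancel on incorrect trials and add up on first correct trials.
dlpfc_like = (predicted_response(incorrect, 1.0, 1.0),
              predicted_response(first_correct, 1.0, 1.0))

# Absolute beta only (the fronto-parietal-like pattern): the same
# increase on incorrect and first correct trials.
parietal_like = (predicted_response(incorrect, 0.0, 1.0),
                 predicted_response(first_correct, 0.0, 1.0))

# Signed beta only (the ventral-striatum-like pattern): a decrease on
# incorrect trials and an increase on first correct trials.
striatum_like = (predicted_response(incorrect, 1.0, 0.0),
                 predicted_response(first_correct, 1.0, 0.0))
```

With equal weights, the response on incorrect trials is zero while the response on first correct trials is doubled, which is why a cluster can correlate with the signed regressor yet respond only on the first correct trials.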

Figure 7.

Grand-average BOLD responses aligned on the outcome image for the 5 sets of trials (legend). The left panels show the grand-average BOLD responses for the 2 sets of incorrect trials (black and gray curves), whereas the right panels show the responses for the first, second, and third set of correct trials (green, red, and blue curves, respectively). (A) Network including the PMd and the parietal lobule (green network in Fig. 6). (B) Network with the SMA and the dorsal striatum (blue in Fig. 6). (C) Right VLPFC cluster (yellow in Fig. 6). (D) Left dorsolateral prefrontal cortex, and (E) the ventral striatum.

Figure 8.

Histogram of the mean ± standard error of the regression coefficients (the beta values) of the second-level analysis (group analysis across subjects). (A) Network including the PMd and the parietal lobule (green network in Fig. 6). (B) Network with the SMA and the dorsal striatum (blue in Fig. 6). (C) Right VLPFC cluster (yellow in Fig. 6). (D) Left dorsolateral prefrontal cortex, and (E) the ventral striatum.

Discussion

We tested the Rescorla–Wagner model on fMRI data acquired from human participants learning arbitrary visuomotor associations. We developed a novel task that systematically manipulated learning and induced highly reproducible performances across subjects and sessions. This allowed us to perform a model-based analysis of the fMRI data and an in-depth validation of the results using a trial-based approach. The results provide quantitative evidence of the neural computations mediating arbitrary visuomotor learning, some of which go beyond those predicted by the Rescorla–Wagner model.

Neural Computations Predicted by the Rescorla–Wagner Model

Associative Values

No brain areas were found to reflect the stimulus–response–outcome associative strengths. Previous neuroimaging studies of arbitrary visuomotor learning found learning-related increases in the temporal and prefrontal areas (Toni et al. 2001) and in the parietal and frontal cortical areas (Eliassen et al. 2003). BOLD signals correlating with the probability of correct response, which is a relative measure of the associative values, have been found in the medial temporal lobe (MTL) as well as in the cingulate cortex and frontal lobe (Law et al. 2005). However, these effects are not comparable with those presented here, because they reflected the brain changes produced both by the stimulus–response pair and the outcome. Haruno and Kawato (2006) found stimulus-specific BOLD correlates of the associative strengths in the putamen, superior parietal, dorsolateral prefrontal, PMd, and occipital cortices, insula, thalamus, cerebellum, anterior cingulate cortex, and SMA. Kim et al. (2006) also showed that the stimulus-driven activity in the orbitofrontal cortex reflects expected reward value signals. However, both studies used stochastic rather than deterministic visuomotor associations, and they also differed in the computational models and statistical tools used. From the neurophysiological point of view, several studies have shown that neurons displaying either an increase or a decrease in firing rate during learning are spatially intermixed in the brain regions mediating arbitrary visuomotor learning (Wise and Murray 2000; Hadj-Bouziane and Boussaoud 2003; Brasted and Wise 2005; Suzuki and Brown 2005). Neurons coding for action-specific reward values were also found in the striatum (Samejima et al. 2005). If we interpret such modulations as the correlates of the present associative values at the single-neuron level, it is tempting to suggest that local canceling effects could produce null net BOLD signals, and thus explain our negative results. Overall, further work is needed to provide a global picture of the neural correlates of the associative values at the single-neuron and system level, whether during probabilistic or deterministic learning.

Prediction-Error Signals

The activity in the ventral striatum correlated with the signed value of the prediction-error signals (Table 1, Fig. 4). The grand-average BOLD responses showed a deactivation on the 2 sets of incorrect trials and an increase on the first set of correct trials (Figs 4A,B and 7E). This structure is likely to be functionally independent from the remaining network, because it merged last in the dendrogram (Fig. 6). Prediction-error signals had not been reported in previous studies of arbitrary visuomotor learning (Deiber et al. 1997; Toni and Passingham 1999; Toni et al. 2001; Eliassen et al. 2003). Our result corroborates reports of prediction-error signals in the ventral striatum during probabilistic instrumental learning (O'Doherty et al. 2004; Kim et al. 2006; Pessiglione et al. 2006). Given that the ventral striatum receives dopaminergic afferents coding for prediction-error signals during a one-association instrumental learning task (Hollerman and Schultz 1998; Schultz 1998; Hollerman et al. 2000; Schultz and Dickinson 2000), the observed BOLD response in the ventral striatum is likely to be produced by changes in synaptic activity due to modulations in dopaminergic neuron activity. Our results, however, cannot rule out that other brain structures, such as the medial prefrontal cortex (Matsumoto et al. 2007), could also represent prediction-error signals.
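The profile described above (negative on incorrect trials, largest on the first correct trial, then decaying) follows from the Rescorla–Wagner update, as the minimal sketch below illustrates. The outcome coding (+1 for correct, −1 for incorrect), the learning rate, the initial value, and the stimulus and action labels are assumptions made for illustration; the parameters actually used are those given in Methods.

```python
def rescorla_wagner(trials, alpha=0.5, v0=0.0):
    """Signed prediction errors under a Rescorla-Wagner update.

    trials: iterable of (stimulus, action, outcome) tuples.
    alpha:  learning rate (assumed value).
    v0:     initial associative strength of every pair (assumed value).
    Returns one signed prediction error per trial.
    """
    values = {}
    deltas = []
    for stimulus, action, outcome in trials:
        key = (stimulus, action)
        v = values.get(key, v0)
        delta = outcome - v              # observed minus predicted outcome
        values[key] = v + alpha * delta  # move the prediction toward the outcome
        deltas.append(delta)
    return deltas


# Hypothetical learning sequence for one stimulus: two different incorrect
# actions, then three repetitions of the correct action.
trials = [("S1", "A1", -1.0), ("S1", "A2", -1.0),
          ("S1", "A3", 1.0), ("S1", "A3", 1.0), ("S1", "A3", 1.0)]
signed = rescorla_wagner(trials)
absolute = [abs(d) for d in signed]
```

Under these assumptions the signed error is negative on the incorrect trials and positive but shrinking across correct trials, whereas its absolute value is high on both the incorrect and the first correct trials.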

Learning Computations Beyond the Rescorla–Wagner Model

Absolute Value of the Prediction-Error Signals

Several brain areas correlated with the absolute value of the prediction-error signals (Table 1). The PMd cortex, SPL, and IPL (clusters 5, 6, 8, 9, and 10 in Table 1; Fig. 5) formed a homogeneous functional network (in green in Fig. 6). Because the ventral striatum correlated with the signed prediction-error signals, this activity should be interpreted as reflecting computations other than pure prediction-error signals. An alternative model of associative learning could provide a plausible interpretation (Pearce and Hall 1980). The Pearce–Hall model is based on the assumption that animals need to attend to a stimulus or motor response only while they are learning about its relationship with its consequences (Pearce 1997; Pearce and Bouton 2001). During learning, the amount of controlled attention, and the amount of processing of a stimulus or motor response, is proportional to the absolute prediction-error signal. Thus, extinction and acquisition are formulated differently from the Rescorla–Wagner model. In fact, extinction is seen as an additional form of conditioning, the reinforcer being the omission of the expected outcome. In other words, the model postulates that inhibitory learning occurs if the expected correct outcome is omitted (as in the first incorrect trials in our experiment), such that an association forms between the observed stimulus (or response) and the absence of the expected outcome (e.g., the frustration produced after an incorrect outcome or, more simply, the incorrect outcome image). As in the Rescorla–Wagner model, excitatory learning is thought to be responsible for the acquisition of correct associations. Therefore, we suggest that inhibitory and excitatory learning take place after the presentation of incorrect and correct outcomes, respectively. The dorsal fronto-parietal network should reflect the processing of preceding incorrect or correct visuomotor mappings while learning about their relationship with their consequences, that is, when attention is directed to them. Current hypotheses about the functional role of the dorsal fronto-parietal network support this interpretation: previous work has highlighted its dual role in the computation of visuomotor transformations (Burnod et al. 1999; Culham and Valyear 2006) and in the control of goal-directed attention to salient stimuli and responses (Corbetta and Shulman 2002). The known cortico-cortical connections among the fronto-parietal regions additionally support the notion of a functionally homogeneous network (Wise et al. 1997; Tanne-Gariepy et al. 2002). This interpretation provides an explanation for the central role of the premotor cortex known from neurophysiological and lesion studies (Mitz et al. 1991; Chen and Wise 1995; Wise and Murray 2000; Hadj-Bouziane et al. 2003; Suzuki and Brown 2005) and of the PMd–SPL network reported in neuroimaging studies (Deiber et al. 1997; Eliassen et al. 2003; Law et al. 2005).
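As a reading aid, the attentional rule of the Pearce–Hall model is commonly summarized as follows. The notation is an assumption introduced here for illustration and is not taken from this paper: \alpha_n is the associability of the stimulus or response on trial n, \lambda_n the outcome, V_n the associative strength, and S a fixed salience parameter.

```latex
\alpha_{n} = \left| \lambda_{n-1} - V_{n-1} \right|,
\qquad
\Delta V_{n} = S \, \alpha_{n} \, \lambda_{n}
```

Under this sketch, associability is high after any surprising outcome, whether better or worse than expected, and falls toward zero once outcomes are well predicted, which is the profile captured by the absolute prediction-error regressor.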

A second network correlating with the absolute value of the prediction error included the dorsal striatum and the SMA (in blue in Fig. 6). The dorsal striatum clusters extend over the head of the caudate and the putamen, but no clear distinction among them could be made. This network differed from the fronto-parietal network by a relatively smaller increase in BOLD response on the incorrect trials, especially in the SMA (Fig. 7B). Previous studies showed caudate activity correlating with the absolute value of the prediction error during probabilistic instrumental learning (Haruno and Kawato 2006). Learning-related activity was found in the head of the caudate nucleus (Toni et al. 2001; Grol et al. 2006; Seger and Cincotta 2005); fMRI activity was found in the SMA (Boettiger and D'Esposito 2005) and in the putamen (Seger and Cincotta 2005) during categorization learning tasks, with robust activations when subjects perceived a contingency between their actions and the outcome (Tricomi et al. 2004). In general, the caudate is thought to be involved in stimulus–response habit learning (Graybiel 1998, 2005; Packard and Knowlton 2002) and/or action–outcome learning (Hollerman et al. 2000; Hadj-Bouziane et al. 2003; Yin and Knowlton 2006). Consistently, Williams and Eskandar (2006) recently showed that microstimulation of the dorsal striatum at the time of reward after a correct response significantly enhanced the rate of learning. Therefore, both the fronto-parietal network and the fronto-striatal network reflect the processing of preceding incorrect or correct visuomotor and sensorimotor mappings while learning is taking place. These networks might differ in their contribution in the visuomotor (fronto-parietal circuit) and consolidation (fronto-striatal system) processes during learning.

The anterior VLPFC (cluster 11) also correlated with the absolute value of the prediction error (Fig. 5). The lateral prefrontal cortex is known to be involved in conditional visuomotor associations (Toni and Passingham 1999; Passingham et al. 2000; Toni et al. 2001), maintenance of goal-relevant information in working memory, rule-based response selection, and the generation of new abstract rules (Miller 2000; Bunge 2004; Miller et al. 2004). Our fMRI result is nevertheless novel, because it shows that this area activates in the early phases of learning. Therefore, we suggest that the prefrontal activation reflects the retrieval from memory of the observed stimuli and previously performed actions and/or their processing during the early phases of learning.

First Correct Outcomes

In the dorsolateral prefrontal cortex, Brodmann area 9 (cluster 2 in Table 1), we found a selective BOLD activation on the first correct trials (Fig. 7D) and a comparable contribution from the signed and absolute values of the prediction-error signals (Fig. 8D). The cluster is relatively independent from the other structures (in red in Fig. 6). In line with our result, the inferior frontal gyrus has been shown to selectively activate on first correct trials (Eliassen et al. 2003). In addition, the dorsolateral prefrontal cortex was found to decrease in activity during arbitrary visuomotor learning (Deiber et al. 1997; Law et al. 2005). The ability to rapidly learn stimulus–response–outcome associations is required for a correct implementation of learning strategies such as repeat-stay (perform the same action if previously rewarded). Another learning strategy, change-shift (select a different action if previously unrewarded), requires the reactivation of the incorrect associations. Thus, our results are in line with previous reports showing deficits in rapid arbitrary visuomotor learning and strategy use after lesions of the lateral and orbital prefrontal cortex (Bussey et al. 2001) and electrophysiological findings showing a selectivity in the discharge of prefrontal neurons for the type of learning strategy (Genovesio et al. 2005).

Conclusion

Overall, we showed that arbitrary visuomotor learning involves distributed areas of the “sensorimotor,” “associative,” and “limbic” fronto-striatal circuits that interact with the fronto-parietal network at the level of the PMd. Consistent with the Rescorla–Wagner model, prediction-error signals are computed in the human brain and selectively engage the ventral striatum. In parallel, the dorsal fronto-parietal network and the dorsal striatum activate in relation to the absolute value of the prediction error, and may reflect the selective processing of incorrect and correct visuomotor mappings while learning about their relationship with their consequences. The right VLPFC might represent the retrieval from memory of the observed stimuli and previously performed actions and/or their processing during the early phases of learning. Interestingly, the information about the first correct outcomes, which is crucial for a rapid acquisition of arbitrary associations, selectively activates the left dorsolateral prefrontal cortex. We thus provided quantitative evidence about the neural computations underlying arbitrary visuomotor learning and novel information about the functional specialization of the fronto-parietal and fronto-striatal systems. The newly developed learning task systematically manipulated learning and induced highly reproducible performances. Most importantly, it provides a novel perspective for the development of new tasks addressing other types of learning and/or decision-making processes. In addition, the results suggest new directions for future computational models of arbitrary visuomotor learning within the scope of modern concepts in instrumental learning (Balleine and Dickinson 1998; Gallistel et al. 2004).

Supplementary Material

Supplementary material can be found at: http://www.cercor.oxfordjournals.org/.

Funding

French Ministère de la Recherche (ACI Neurosciences intégratives et computationnelles); and 2-year post doc fellowship awarded by the Fondation pour la Recherche Médicale (Paris, France) to A.B.

We wish to thank Jean-Luc Anton and Muriel Roth (fMRI Centre, IFR 131 Cerveau et Cognition, Marseille, France) for support in the fMRI data acquisition, Pierre-Arnaud Coquelin (Centre de Mathématiques Appliquées, UMR 7641 CNRS-Ecole polytechnique) for help in modelling, and Anders Ledberg (Computational Neuroscience Group, Universitat Pompeu Fabra, Barcelona, Spain) for useful suggestions in data analysis. Conflict of Interest: None declared.

References

Asaad WF, Rainer G, Miller EK. Neural activity in the primate prefrontal cortex during associative learning. Neuron, 1998, vol. 21 (pg. 1399-1407)
Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology, 1998, vol. 37 (pg. 407-419)
Boettiger CA, D'Esposito M. Frontal networks for learning and executing arbitrary stimulus-response associations. J Neurosci, 2005, vol. 25 (pg. 2723-2732)
Brasted PJ, Wise SP. The arbitrary mapping of sensory inputs to voluntary and involuntary movements: learning-dependent activity in the motor cortex and other telencephalic networks. In: Riehle A, Vaadia E, editors. Motor cortex in voluntary movements, 2005. Boca Raton: CRC Press (pg. 259-296)
Bunge SA. How we use rules to select actions: a review of evidence from cognitive neuroscience. Cogn Affect Behav Neurosci, 2004, vol. 4 (pg. 564-579)
Burnod Y, Baraduc P, Battaglia-Mayer A, Guigon E, Koechlin E, Ferraina S, Lacquaniti F, Caminiti R. Parieto-frontal coding of reaching: an integrated framework. Exp Brain Res, 1999, vol. 129 (pg. 325-346)
Bussey TJ, Wise SP, Murray EA. The role of ventral and orbital prefrontal cortex in conditional visuomotor learning and strategy use in rhesus monkeys (Macaca mulatta). Behav Neurosci, 2001, vol. 115 (pg. 971-982)
Chen LL, Wise SP. Neuronal activity in the supplementary eye field during acquisition of conditional oculomotor associations. J Neurophysiol, 1995, vol. 73 (pg. 1101-1121)
Corbetta M, Shulman GL. Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci, 2002, vol. 3 (pg. 201-215)
Corrado G, Doya K. Understanding neural coding through the model-based analysis of decision making. J Neurosci. 1, 2007, vol. 18 7 (pg. 1485-1495)
Culham JC, Valyear KF. Human parietal cortex in action. Curr Opin Neurobiol, 2006, vol. 16 (pg. 205-212)
Deiber MP, Wise SP, Honda M, Catalan MJ, Grafman J, Hallett M. Frontal and parietal networks for conditional motor learning: a positron emission tomography study. J Neurophysiol, 1997, vol. 78 (pg. 977-991)
Dickinson A. Contemporary animal learning theory, 1980. UK: Cambridge University Press
Dickinson A. Instrumental conditioning. In: Mackintosh NJ, editor. Animal cognition and learning, 1994. London: Academic Press (pg. 45-79)
Eliassen JC, Souza T, Sanes JN. Experience-dependent activation patterns in human brain during visual-motor associative learning. J Neurosci, 2003, vol. 23 (pg. 10540-10547)
Gallistel CR, Fairhurst S, Balsam P. The learning curve: implications of a quantitative analysis. Proc Natl Acad Sci USA, 2004, vol. 101 (pg. 13124-13131)
Genovesio A, Brasted PJ, Mitz AR, Wise SP. Prefrontal cortex activity related to abstract response strategies. Neuron, 2005, vol. 47 (pg. 307-320)
Graybiel AM. The basal ganglia and chunking of action repertoires. Neurobiol Learn Mem, 1998, vol. 70 (pg. 119-136)
Graybiel AM. The basal ganglia: learning new tricks and loving it. Curr Opin Neurobiol, 2005, vol. 15 (pg. 638-644)
Grol MJ, de Lange FP, Verstraten FA, Passingham RE, Toni I. Cerebral changes during performance of overlearned arbitrary visuomotor associations. J Neurosci, 2006, vol. 26 (pg. 117-125)
Hadj-Bouziane F, Boussaoud D. Neuronal activity in the monkey striatum during conditional visuomotor learning. Exp Brain Res, 2003, vol. 153 (pg. 190-196)
Hadj-Bouziane F, Meunier M, Boussaoud D. Conditional visuo-motor learning in primates: a key role for the basal ganglia. J Physiol Paris, 2003, vol. 97 (pg. 567-579)
Haruno M, Kawato M. Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning. J Neurophysiol, 2006, vol. 95 (pg. 948-959)
Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci, 1998, vol. 1 (pg. 304-309)
Hollerman JR, Tremblay L, Schultz W. Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior. Prog Brain Res, 2000, vol. 126 (pg. 193-215)
Kim H, Shimojo S, O'Doherty JP. Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biol, 2006, vol. 4 pg. e233
Lancaster JL, Woldorff MG, Parsons LM, Liotti M, Freitas CS, Rainey L, Kochunov PV, Nickerson D, Mikiten SA, Fox PT. Automated Talairach atlas labels for functional brain mapping. Hum Brain Mapp, 2000, vol. 10 (pg. 120-131)
Law JR, Flanery MA, Wirth S, Yanike M, Smith AC, Frank LM, Suzuki WA, Brown EN, Stark CE. Functional magnetic resonance imaging activity during the gradual acquisition and expression of paired-associate memory. J Neurosci, 2005, vol. 25 (pg. 5720-5729)
Mackintosh NJ. A theory of attention: variations in the associability of stimuli with reinforcement. Psychol Rev, 1975, vol. 82 (pg. 276-298)
Matsumoto M, Matsumoto K, Abe H, Tanaka K. Medial prefrontal cell activity signaling prediction errors of action values. Nat Neurosci, 2007, vol. 10 (pg. 647-656)
Miller EK. The prefrontal cortex and cognitive control. Nat Rev Neurosci, 2000, vol. 1 (pg. 59-65)
Miller EK, Freedman DJ, Wallis J. The prefrontal cortex: categories, concepts and cognition. Philos Trans R Soc Lond B, 2004, vol. 357 (pg. 1123-1136)
Mitz AR, Godschalk M, Wise SP. Learning-dependent neuronal activity in the premotor cortex: activity during the acquisition of conditional motor associations. J Neurosci, 1991, vol. 11 (pg. 1855-1872)
O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 2004, vol. 304 (pg. 452-454)
O'Doherty JP, Hampton A, Kim H. Model-based fMRI and its application to reward learning and decision making. Ann N Y Acad Sci, 2007, vol. 1104 (pg. 35-53)
Packard MG, Knowlton BJ. Learning and memory functions of the basal ganglia. Annu Rev Neurosci, 2002, vol. 25 (pg. 563-593)
Passingham RE, Toni I, Rushworth MF. Specialisation within the prefrontal cortex: the ventral prefrontal cortex and associative learning. Exp Brain Res, 2000, vol. 133 (pg. 103-113)
Pasupathy A, Miller EK. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature, 2005, vol. 433 (pg. 873-876)
Pearce JM. Animal learning and cognition, 1997. United Kingdom: Psychology Press
Pearce JM, Bouton ME. Theories of associative learning in animals. Annu Rev Psychol, 2001, vol. 52 (pg. 111-139)
Pearce JM, Hall G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev, 1980, vol. 87 (pg. 532-552)
Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 2006, vol. 442 (pg. 1042-1045)
Rescorla RA. Associative relations in instrumental learning: the eighteenth Bartlett Memorial Lecture, 1991
Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical conditioning II: current theory and research, 1972. New York: Appleton-Century-Crofts (pg. 64-99)
Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science, 2005, vol. 310 (pg. 1337-1340)
Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol, 1998, vol. 80 (pg. 1-27)
Schultz W. Behavioral theories and the neurophysiology of reward. Annu Rev Psychol, 2006, vol. 57 (pg. 87-115)
Schultz W, Dickinson A. Neural coding of prediction errors. Annu Rev Neurosci, 2000, vol. 23 (pg. 473-500)
Seger CA, Cincotta CM. The roles of the caudate nucleus in human classification learning. J Neurosci, 2005, vol. 25 (pg. 2941-2951)
Smith AC, Frank LM, Wirth S, Yanike M, Hu D, Kubota Y, Graybiel AM, Suzuki WA, Brown EN. Dynamic analysis of learning in behavioral experiments. J Neurosci, 2004, vol. 24 (pg. 447-461)
Sutton RS, Barto AG. Reinforcement learning: an introduction, 1998. Cambridge (MA): MIT Press
Suzuki W, Brown E. Behavioral and neurophysiological analyses of dynamic learning processes. Behav Cogn Neurosci Rev, 2005, vol. 4 (pg. 67-95)
Tanne-Gariepy J, Rouiller EM, Boussaoud D. Parietal inputs to dorsal versus ventral premotor areas in the macaque monkey: evidence for largely segregated visuomotor pathways. Exp Brain Res, 2002, vol. 145 (pg. 91-103)
Toni I, Passingham RE. Prefrontal-basal ganglia pathways are involved in the learning of arbitrary visuomotor associations: a PET study. Exp Brain Res, 1999, vol. 127 (pg. 19-32)
Toni I, Ramnani N, Josephs O, Ashburner J, Passingham RE. Learning arbitrary visuomotor associations: temporal dynamic of brain activity. NeuroImage, 2001, vol. 14 (pg. 1048-1057)
Tricomi EM, Delgado MR, Fiez JA. Modulation of caudate activity by action contingency. Neuron, 2004, vol. 41 (pg. 281-292)
Watkins CJCH, Dayan P. Q-learning. Mach Learn, 1992, vol. 8 (pg. 279-292)
Williams ZM, Eskandar EN. Selective enhancement of associative learning by microstimulation of the anterior caudate. Nat Neurosci, 2006, vol. 9 (pg. 562-568)
Wirth S, Yanike M, Frank LM, Smith AC, Brown EN, Suzuki WA. Single neurons in the monkey hippocampus and learning of new associations. Science, 2003, vol. 300 (pg. 1578-1581)
Wise SP, Boussaoud D, Johnson PB, Caminiti R. Premotor and parietal cortex: corticocortical connectivity and combinatorial computations. Annu Rev Neurosci, 1997, vol. 20 (pg. 25-42)
Wise SP, Murray EA. Arbitrary associations between antecedents and actions. Trends Neurosci, 2000, vol. 23 (pg. 271-276)
Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat Rev Neurosci, 2006, vol. 7 (pg. 464-476)