During smooth pursuit, the image of the target is stabilized on the fovea, implying that speed judgments made during pursuit must rely on an extraretinal signal providing precise eye speed information. To characterize the introduction of such an extraretinal signal into the human visual system, we performed a factorial functional magnetic resonance imaging experiment, in which we manipulated the factor eye movement, with “fixation” and “pursuit” as levels, and the factor task, with “speed” and “form” judgments as levels. We hypothesized that the extraretinal speed signal is reflected as an interaction between speed judgments and pursuit. Random effects analysis yielded an interaction only in dorsal early visual cortex. Retinotopic mapping localized this interaction on the horizontal meridian (HM) between dorsal areas visual 2 and 3 (V2/V3) at 1–2° azimuth. This corresponded to the position the pursuit target would have reached, if moving retinotopically, at the time of the subject's speed judgment. Because the 2 V2/V3 HMs are redundant, both may be involved in speed judgments, the ventral one involving judgments based on retinal motion and the dorsal one judgments requiring an internal signal. These results indicate that an extraretinal speed signal is injected into early visual cortex during pursuit.
Pursuit eye movements are an important acquisition of primates allowing them to analyze a moving object in detail by stabilizing its image onto the fovea (Lisberger et al. 1987; Fukushima 2003; Krauzlis 2004; Ilg and Thier 2008). A moving target appears to be moving when pursued even though its image remains almost stationary on the retina, and its speed can be assessed as precisely as during fixation when the image moves over the retina (Kowler and McKee 1987; Lisberger et al. 1987). This requires an extraretinal signal that can be combined with the retinal motion signals to correctly perceive speed and maintain pursuit. These extraretinal signals arise mainly from the speeds of previous pursuit movements, the so-called velocity memory in pursuit (Newsome et al. 1988) and the speed of the eye itself. The latter signals are believed to be generated in eye movement control systems and correct for spurious visual effects of eye movements. The best known of these is the position signal that maintains stability of the visual scene across saccades. Such stability has recently been shown to entail shifting receptive fields (RFs) in the frontal eye field (FEF) by signals originating in the superior colliculus and transiting through the thalamus (Sommer and Wurtz 2008). Extraretinal signals have also been postulated to promote object coherence (Hafed and Krauzlis 2006) and to correct the perception of motion direction (Souman et al. 2005). Pursuit requires signals correcting for changes in image speed rather than image position. The question we address here concerns the brain region where this extraretinal speed signal is integrated with the retinal signals.
There is considerable evidence in the monkey that the network controlling pursuit involves middle temporal or visual area 5 (MT/V5). Both the mean pursuit response (Newsome et al. 1985) and trial-by-trial variations (Osborne et al. 2005) indicate that the initial 100–125 ms of pursuit depends upon the MT/V5 responses. One way to unmask extraretinal input during visual pursuit has been to temporarily blank the pursued target. Under these circumstances, pursuit continues, albeit at reduced speed (Becker and Fuchs 1985); yet, MT/V5 responses disappear, whereas those in the ventral part of the medial superior temporal area (MSTv) remain (Newsome et al. 1988). This has been taken as evidence that the speed and direction signals elaborated in MT/V5 are combined with extraretinal signals in MSTv. Churchland and Lisberger (2005) found that an ideal observer could predict target speed from the extraretinal signal in MSTv in only 25% of the pursuit trials. Pooling the data from many MSTv neurons increased the success rate to only 50%. Thus, the extraretinal signal in MSTv is too unreliable to compute target speed during pursuit but might help to control the gain during pursuit (Churchland and Lisberger 2002). This control depends on the part of FEF devoted to pursuit (Tanaka and Lisberger 2002a, 2002b), to which MSTv projects. Blanking the pursuit target increases neuronal activity in this part of FEF over time but only when the monkey predicts the direction of pursuit (Ilg and Thier 2008). This is consistent with the role of FEF in the prediction of pursuit (Fukushima et al. 2002), which has been linked to velocity memory.
Blanking of the pursuit target has also been used in human functional imaging studies to reveal integration of extraretinal and retinal signals in the human pursuit network (Nagel et al. 2006, 2008). These studies have yielded contradictory results depending on the experimental design: block versus event related. This inconsistency was interpreted by Nagel et al. (2008) as an indication that the regions unmasked by blanking were involved in higher order, executive functions (Burke and Barnes 2008) more than in velocity memory. Interestingly, the more controlled of the 2 studies (Nagel et al. 2008) revealed an activation site in visual area 1 (V1), which was not discussed, perhaps because of visual differences between the conditions. Yet, early integration of retinal and extraretinal signals would have the advantage that most visual areas receive the corrected signal. This would be beneficial, especially for humans, in whom not just MT/V5+ but also V3A is motion sensitive (Tootell et al. 1998), unlike monkey V3A (Vanduffel et al. 2001). It raises a formidable problem, however, in the sense that these early visual regions are retinotopically organized and would have to signal the motion of a target whose image is quasi-immobile on the retina! Thus, it is presently unclear at what level of the human or nonhuman primate visual system the extraretinal speed signal is integrated with retinal speed during pursuit.
To address this issue, we have used a paradigm different from the blanking paradigm for tracking the extraretinal signal during pursuit. We compared brain activity during pursuit when subjects must explicitly compute target speed, which they report manually, with activity when they are not required to do so because they judge only the shape of the pursued object. In both tasks, the subjects pursue the target, but the pursuit is largely predictive and could rely on velocity memory (Barnes and Collins 2008), considerably lessening the need for a precise target speed signal. However, estimating speed during the pursuit requires a precise assessment of target speed, enhancing the need for a precise extraretinal speed signal, possibly of oculomotor origin. Thus, we expect the extraretinal signal to elicit a stronger activation when subjects discriminate speed rather than form during pursuit. Although both tasks may similarly mobilize spatial attention during pursuit (Ohlendorf et al. 2007), differences in featural attention (attending to the attribute speed as opposed to the attribute shape) may also cause differences in magnetic resonance activity between the 2 tasks. To exclude this possibility, these 2 tasks were also used during fixation. During fixation, the extraretinal signal is absent and will introduce no differences in cerebral activation. Thus, in this paradigm, the extraretinal speed signal would be revealed by an interaction between task and eye movement factors: the difference between the speed and the shape tasks should be greater during pursuit than during fixation. The random effects analysis in 17 subjects yielded an interaction only at the horizontal meridian (HM) representation separating the dorsal portions of areas V2 and V3. This is a very remarkable location in the human visual cortex, as the V2–3 boundary is the only site where the retinotopic representation is redundant. This way the visual system can solve the conundrum of localizing in a retinotopic framework a target whose image is stabilized on the retina and yet has to be perceived as moving.
Materials and Methods
Seventeen volunteers (7 males, median age = 23 years, standard deviation [SD] = 2.01) participated in the study. All were right handed, with normal uncorrected vision. None presented oculomotor deficits, ophthalmologic, psychiatric, or neurological disorders, or other contraindications for functional magnetic resonance imaging (fMRI), and none were on medication. All subjects gave their written informed consent. The protocol was approved by the local Ethics Committee (CPP Ile de France VI). Each subject underwent five 1-h training sessions before the scanning session and was asked to avoid alcohol consumption before these sessions. During scanning, the subjects lay supine and viewed a translucent screen through a 45° tilted mirror. Eye movements were recorded at 120 Hz using an infrared eye tracker (Applied Science Laboratories Long Arranges Optics L6).
Images were generated using in-house software (Paradigm, S. Kinkingnéhun, Paris, France) and were projected (Epson EMP-8300 1024 × 768 pixels) onto a translucent screen (20.5° × 15.3°) 85 cm from the subjects’ eyes.
The main paradigm included 4 conditions following a 2 × 2 factorial design (Fig. 1A). The oculomotor factor had 2 levels, smooth pursuit (P) and fixation (F), as did the perceptual task factor, target speed discrimination (S) and form discrimination (F). The stimuli were devised to minimize intrusive saccades and to equalize visual stimulation in the pursuit and fixation trials as much as possible. Three stimuli were presented, all composed of small checkerboard patches with the same mean luminance as the uniform gray background (120 cd/m²): the moving target (1° × 0.5° vertically elongated ellipse), a central 0.33° square (central square), and a small 0.17° circle (fixation cue). Pursuit trials (Fig. 1B) used a step-ramp stimulus (Rashbass 1961, 1965). Before the trial, subjects fixated a cue at 2° azimuth, and each trial started with the target appearing slightly more eccentric than the fixation cue (2.5° azimuth) and then immediately moving horizontally toward the midline of the screen (Fig. 1A). After an initial period of 517–1095 ms (mean 702 ms, SD 83 ms for trials with speeds faster than the reference [4.5°/s]; mean 823 ms, SD 41 ms for trials with speeds slower than the reference, see below), the target slowed exponentially and disappeared when it reached 2° azimuth on the opposite side of the midline. The motion epoch lasted 1650–1800 ms in 50 ms steps. The target was immediately replaced by a new fixation cue at 2° azimuth, that is, where the target had disappeared, helping the subject to maintain fixation while waiting (for 200–550 ms) for the next trial in the opposite direction. Subjects had to pursue the moving target as smoothly as possible or to gaze at the fixation cue between trials. During pursuit trials, a third stimulus, the central square, remained static on the screen.
In fixation trials (Fig. 1C), the trajectory was slightly modified in order to focus attention on the central part of the retina: while subjects fixated the central square, target movement started at 1.5° from the midline, with the rest of the motion being similar to that in pursuit trials. The target thus stopped at 3° azimuth on the other side of the midline and was replaced by the small circle serving as fixation cue in the pursuit trials. As in the pursuit trials, motion direction alternated from left to right. Half of the trials were in the upper field (UF) (1.5° above the HM) and the rest in the lower field (LF) (1.5° below the HM). Because it took about 1° of target motion before the pursuit gain reached 1, the image of the central square began to move over the retina at the same speed as the target on the screen but in the opposite direction when the target was 1.5° from the midline. This way, the motion of the central square in pursuit trials starting at the left above the HM corresponded to the motion of the target in fixation trials starting from the right below the HM, and so on, for all 4 combinations of pursuit and fixation trials. Hence, the retinal stimulation was largely equalized in pursuit and fixation trials, but minor differences could not be avoided. On the one hand, the retinal stimulation was stronger in pursuit than in fixation trials close to the fixation point, due to the difference in the shapes of target and central square and the retinal slip at the end of the motion, and also between 2.5° and 1.5° azimuth, because of the position of the fixation cue and the appearance and the transient slip of the target image at the beginning of pursuit. On the other hand, retinal stimulation was stronger in fixation than pursuit trials at 3° and at 1.5° azimuth because of the onset of the fixation cue or target.
For each trial, the speed and form of the target were randomly chosen from 1 of 2 ranges. For speed, the initial constant velocity was either faster than 4.5°/s (“fast” trials, mean 4.89°/s, SD 0.12°/s) or slower than 4.5°/s (“slow” trials, mean 4.11°/s, SD 0.12°/s). For form, the vertical target diameter was either larger than 1° (“elongated” trials) or smaller than 1° (“round” trials), keeping its area constant. The differences in speed or shape at the beginning of a block depended on performance in earlier blocks. Trials of long and short duration were randomly selected to uncouple speed from amplitude. One of 4 different conditions was presented to the subjects: 1) PS condition: speed discrimination during smooth pursuit, 2) PF condition: form discrimination during smooth pursuit, 3) FS condition: speed discrimination during fixation, and 4) FF condition: form discrimination during fixation. Instructions were provided at the beginning of each block of trials of a given condition (Fig. 1). Judgments were signaled by button presses with the left (slow or round targets) or right (fast or elongated targets) hand. Trials lasted 2000, 2100, or 2200 ms. Blocks of each condition lasted approximately 30 s (14 successive trials, mean duration 29.3 ± 0.14 s), were preceded by a 5-s instruction, and were pseudorandomly repeated twice within one scanning run or time series. The target moved within either the upper or the lower hemifield in each 30-s block in pseudorandom order in a run (50% in upper and 50% in lower hemifields for each condition). The paradigm also included a 30-s simple fixation (FIX) condition (preceded by a 3-s instruction), during which the subject fixated the central square without any other visual stimulation or judgment. The total duration of a run was therefore 8.52 min.
Each subject completed 4 runs in each training or scanning session. During training sessions, discrimination thresholds for each of the 4 judgment tasks were assessed, and stimulus differences in subsequent blocks were adjusted to the thresholds reached in previous blocks. In addition, within blocks, stimulus differences were gradually modified by a staircase procedure to track 84% positive responses for each condition (Wetherill and Levitt 1965). This procedure allowed us to adjust all judgment tasks to the same level of difficulty during the fMRI session and to compare conditions under equivalent loads of attention and difficulty.
In all subjects, 2 localizers, “pursuit” and “motion,” were tested. In the 360-s “pursuit localizer,” 30-s pursuit and “fixation” blocks alternated. During pursuit blocks, a target (1° × 0.5° vertically elongated ellipse) moved sinusoidally along the HM (period = 1 s, peak displacement = 5°), and subjects had to pursue the target as smoothly as possible. During fixation blocks, subjects had to fixate the central square with no other visual stimulation. In the motion localizer (Sunaert et al. 1999), 30-s blocks of a moving or static random texture pattern (7° diameter) alternated. Subjects fixated the center of the screen as carefully as possible.
For 15 of the subjects, the visual cortex was mapped retinotopically using black and white flickering (4 Hz) checkerboards to alternately stimulate horizontal, upper and lower vertical visual meridians, upper and lower visual fields, and the central representation of visual cortex (Fize et al. 2003; Claeys et al. 2004). HM stimuli were composed of horizontal wedges to the left and right of the center, 45° in width. Upper vertical meridian (VM) (UVM) and lower VM (LVM) stimuli were single 60° vertical wedges. UF and LF stimuli were 160° wedges sparing the central 1.5°. The radial extent of these stimuli measured 5.7°. The central stimuli were adjusted to stimulate a region of only 0.7° radius centered on the fovea. Each stimulus was presented continuously during a block of 25 s. These 6 retinotopic stimuli and a fixation-only condition were presented 3 times pseudorandomly within one run.
Functional acquisitions were made using a 3-T Magnetic Resonance system (Siemens Magnetom Trio Tim 32 channels syngo MR. VB13, Center of NeuroImagery of Research, Hôpital Pitié Salpêtrière) equipped with a 12-channel head coil. During fMRI sessions, cerebral sections were oriented parallel to the AC–PC axis. Functional data were acquired with T2*-weighted gradient echo planar image (EPI) sequences (repetition time 2800 ms; echo time 30 ms; flip angle 90°; matrix size 80 × 80; field of view 200 × 200 mm²; voxel size 2.5 × 2.5 × 2.5 mm³; 50 transverse slices, 0.25 mm gap). One run lasted approximately 8.52 min, allowing the acquisition of 182 or 183 volumes, depending on the exact duration. In a first session, 4 runs of the main paradigm were scanned, as well as a single pursuit localizer run and a single motion localizer run. In the second session, 2 retinotopic runs were acquired. T1-weighted 3D anatomical acquisitions (magnetization-prepared rapid gradient-echo) were made at the end of the first session.
Functional Data Analyses
Magnetic resonance imaging data were analyzed with the SPM5 software package (Wellcome Department of Cognitive Neurology, London, UK). The same preprocessing methods were applied to all images: slice timing, correction of head movements, coregistration of anatomical and functional images, spatial normalization to the Montreal Neurological Institute (MNI) EPI template brain, and spatial smoothing with isotropic Gaussian kernels of 5-mm full-width at half-maximum (FWHM) (Friston, Ashburner, et al. 1995). In a first-level analysis, individual data were fitted to a general linear model (Friston, Holmes, et al. 1995). The model was specified by defining for each of the 5 conditions a regressor describing the theoretical variation of the blood oxygen level–dependent (BOLD) signal during the run. This model was estimated, and contrasts were defined at the individual level. A second smoothing with a 5-mm FWHM isotropic Gaussian kernel was applied to the contrast images, and a random effect second-level group analysis was performed to assess the significance of the activations at the population level (Friston, Ashburner, et al. 1995). The resulting T-score maps were either superimposed on the SPM5 T1-weighted anatomical template or projected onto the human population-averaged landmark- and surface-based (PALS) atlas surface (http://brainvis.wustl.edu/wiki/index.php/Caret:About; Van Essen 2005) using the Caret software (http://brainvis.wustl.edu/caret/; Van Essen et al. 2001). Coordinates refer to the standard template defined by the MNI.
The following contrasts were calculated using SPM5. The first main effect (PS + PF) − (FS + FF) tests the influence of the “oculomotor” factor (oculomotor contrast). Positive values reveal activations by smooth pursuit compared with fixation across the discrimination tasks. Negative values represent activity related to fixation while making judgments. The second main contrast (PS + FS) − (PF + FF) tests the effect of the “task” factor (judgment contrast). Positive values here indicate activation by speed discrimination relative to form discrimination, whereas negative values indicate the opposite. For these 2 main contrasts, statistical parametric maps were thresholded at P < 0.05 corrected for multiple comparisons using the familywise error (FWE) correction and for cluster extent at P < 0.05. The critical contrast, however, is the interaction between the 2 factors: (PS − PF) − (FS − FF). To restrict this interaction to speed-sensitive regions, it was masked with the main effect of speed judgment conditions (inclusive mask). The masking also ensures that the interaction effects reflect the effects of the PS condition rather than the FF condition, which has the same sign in the interaction equation. Thus, the positive interaction values revealed cortical activities involved in speed judgment during smooth pursuit eye movement, whereas negative effects reflected speed judgment activity during fixation. The interaction contrast was thresholded at P < 0.001 uncorrected both at the individual and group levels. Cluster size threshold was set at 5 voxels to be able to detect relatively small interaction sites. The lower threshold is justified by the fact that, unlike main effects, an interaction is a difference of differences, thus including 2 variances in the statistical calculations (Georgieva et al. 2009). 
Because activity during speed discrimination was lower than during form discrimination, the speed main effect mask was defined at P < 0.05 uncorrected for multiple comparisons for the group and at P < 0.5 uncorrected for multiple comparisons for individual subjects. Standard contrasts of the pursuit (pursuit vs. fixation) and motion (motion vs. stationary) localizer tests were also subjected to a random effects analysis. The threshold was set at P < 0.05 FWE corrected, with a correction for cluster extent (P < 0.05). The human MT/V5 (hMT/V5+) complex was defined by the motion localizer.
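The three contrasts defined above can be written as weight vectors over the condition estimates. The following is a minimal sketch in Python rather than the SPM5 implementation; the names `CONTRASTS` and `contrast_value`, and the example beta values, are hypothetical and serve only to illustrate the arithmetic.

```python
# Contrast weights over the 5 modeled conditions (FIX always weighted 0).
CONDITIONS = ("PS", "PF", "FS", "FF", "FIX")

CONTRASTS = {
    # (PS + PF) - (FS + FF): pursuit versus fixation
    "oculomotor": {"PS": 1, "PF": 1, "FS": -1, "FF": -1, "FIX": 0},
    # (PS + FS) - (PF + FF): speed versus form judgments
    "judgment": {"PS": 1, "PF": -1, "FS": 1, "FF": -1, "FIX": 0},
    # (PS - PF) - (FS - FF): difference of differences
    "interaction": {"PS": 1, "PF": -1, "FS": -1, "FF": 1, "FIX": 0},
}


def contrast_value(name, betas):
    """Weighted sum of condition estimates for the named contrast."""
    weights = CONTRASTS[name]
    return sum(weights[c] * betas[c] for c in CONDITIONS)


# Hypothetical estimates for one voxel (arbitrary units, not data from the study).
example_betas = {"PS": 2.0, "PF": 1.0, "FS": 1.0, "FF": 1.0, "FIX": 0.0}
interaction = contrast_value("interaction", example_betas)
```

In this hypothetical voxel the speed-minus-form difference is 1 during pursuit and 0 during fixation, so the interaction is positive. PS and FF both carry a weight of +1 in the interaction, which is why the inclusive mask is needed to attribute a positive value to PS.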
In the analysis of the retinotopic data, each meridian-sensitive area was defined by contrasting twice the activity of a meridian with the sum of the 2 others: 2 × (HM) − (UVM + LVM) for HM, 2 × (UVM) − (HM + LVM) for UVM, and 2 × (LVM) − (HM + UVM) for LVM. A similar procedure was used to detect sensitivity to visual eccentricity. Positive values of the contrast 2 × (center) − (UF + LF) delineated the central representation in visual cortex (radius = 0.7°) and negative values corresponded to the immediate periphery in the visual field (from 0.7° to 5.7°). The difference between UF and LF sensitivity was determined by contrasting one condition with the other: (UF) − (LF). Random effect analysis was performed at the group level in addition to single-subject analyses. Threshold was set at P < 0.001 uncorrected but lowered to P < 0.05 uncorrected if necessary, given the availability of a priori information provided by previous mapping studies (for review, Wandell et al. 2007). The aim of this analysis is not to decide whether or not an activation site is present but rather where it is localized. Thus, it is not necessary to guard against false positives, and high thresholds are not required. We first attempted to localize the local maxima (meridians) or determine the limits of the activation (eccentricity) at P < 0.001 uncorrected and, if necessary, used activity at P < 0.05 uncorrected. The resulting borders are indicated by solid and stippled lines, respectively. Individual and group T-score maps were then projected onto the PALS atlas surface to define borders of areas V1, V2, and V3, dorsally and ventrally, as well as the HMs of lateral occipital 1 (LO1) and human area visual 4 (hV4) (Larsson and Heeger 2006; Georgieva et al. 2009).
To characterize the functional properties of cortical regions, we computed activity profiles plotting percent signal change relative to fixation baseline, averaged across subjects, in the different conditions. We defined regions of interest (ROIs) either by the motion localizer or by the interaction in the main paradigm, masked with the speed main effect. The middle occipital gyrus (MOG) ROIs were defined as the clusters of voxels reaching P < 0.001 uncorrected in the interaction, masked by the main effect of speed. The hMT/V5+ ROIs were centered on the most significant voxel near the ascending branch of inferior temporal sulcus (ITS) in the motion localizer and were similar in size to the MOG ROIs: 78 and 81 voxels for left and right hMT/V5+ compared with 91 and 81 voxels for left and right MOGs.
In addition, we defined ROIs on the flat map along the 5 HMs as identified in individual retinotopic flat maps: HM1 between ventral and dorsal V1, HM2 between dorsal V2 and V3, HM3 between ventral V2 and V3, HM4 in the middle of LO1, and HM5 in the middle of ventral V4. ROIs were centered on the HMs and sized according to the eccentricity, extending 4 mm near the center and 8 mm at the periphery, with a constant width of 3 mm and a 3-mm gap between adjacent ROIs. Coordinates of corresponding voxels were extracted for each ROI and activity profiles computed. The cortical distance between each ROI and the center of the map (defined by the confluence of all HMs) was then converted from millimeters to degrees: each meridian had its own specific magnification function $\theta = e^{a(d + b)}$, where θ is the distance in degrees, d is the distance in millimeters, and a and b are 2 constants which had to be specified (Engel et al. 1997; Qiu et al. 2006; Schira et al. 2007). Because the relationship between millimeters and visual degrees was known for 2 eccentricities (0.7° and 5.7°), the 2 constants, a and b, could be determined for each HM. The values of a and b were similar for the different meridians: the mean value of a ranged from 0.39 (SD 0.09) on HM1 to 0.56 (SD 0.17) on HM4 and that of b from −3.86 (SD 0.36) on HM2 to −4.86 (SD 0.86) on HM5. This enabled us to plot the BOLD signals in the 4 conditions (as percent signal change relative to fixation) as a function of azimuth (in degrees) along 5 HMs for each hemisphere. These BOLD signals were derived from the individual subject analysis (first step of the random effects analysis, see above) using a 5-mm smoothing, which was retained in order to average across trials in UF and LF and across subjects.
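Taking the magnification function as θ = exp(a(d + b)), two known (distance, eccentricity) pairs are enough to solve for a and b, because ln θ is linear in d. The sketch below illustrates that step under this assumption; the function names and the cortical distances used in the example (3 mm and 8 mm) are hypothetical and are not measurements from the study.

```python
import math


def fit_magnification(d1, theta1, d2, theta2):
    """Solve theta = exp(a * (d + b)) for a and b from two known points.

    Taking logs gives ln(theta) = a * d + a * b, a straight line in d.
    """
    a = (math.log(theta2) - math.log(theta1)) / (d2 - d1)
    b = math.log(theta1) / a - d1
    return a, b


def mm_to_degrees(d, a, b):
    """Convert a cortical distance d (mm) to eccentricity (degrees)."""
    return math.exp(a * (d + b))


# Hypothetical distances at which the 0.7 and 5.7 degree borders were mapped.
a, b = fit_magnification(d1=3.0, theta1=0.7, d2=8.0, theta2=5.7)
```

Once a and b are fixed for a meridian, `mm_to_degrees` places every ROI along that meridian on the azimuth axis.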
Because of between-subject variability in the retinotopic maps, the number and positions of ROIs along an HM (mean 8.2, SD 0.8, range 6–10) differed between hemispheres. To perform a group analysis on the curves, individual curves for a given HM needed to include the same number of data points. To that end, for a given HM, we considered all ROI positions (in degrees) present in the 30 hemispheres (HM1: 150, HM2: 67, HM3: 104, HM4: 95, and HM5: 163 positions) and supplied any missing points in a given hemisphere by linear interpolation. Curves were then averaged across hemispheres using these data points and the BOLD signals extracted at 7 azimuth values, starting from zero in 1° steps. Thus, average curves included 7 points, approximating the original average number of ROIs per curve. For the 30 cortical surfaces (15 right and 15 left hemispheres), 175 data points were available, grouped by 3 factors: 5 meridians × 5 conditions × 7 eccentricities. Data were then fitted using a general linear model to perform an analysis of variance (ANOVA, Statistica) with repeated measures (across hemispheres) of main effects and interactions between factors and post hoc analysis (Tukey's honestly significant differences [HSD] test). In addition, planned comparisons tested the involvement of each meridian as a function of eccentricity in the speed or shape discrimination independently of the oculomotor factor, attributing a coefficient of +1 to PS and FS, −1 to PF and FF, and 0 to FIX. Finally, the interaction among the 4 conditions was also tested with planned comparisons (coefficient +1 to PS and FF, −1 to PF and FS, and 0 to FIX) at every azimuth of the 5 meridians. Differences were significant for P values <0.001 (correcting for 5 × 7 tests).
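The resampling step can be illustrated with a simplified sketch: each hemisphere's curve is linearly interpolated at a common set of azimuths, and the curves are then averaged point by point. This is an assumption-laden simplification, not the procedure as run in Statistica. In particular, clamping to the nearest measured value outside a hemisphere's sampled range is our assumption, and the function names are hypothetical.

```python
def interp_clamped(xs, ys, x):
    """Linear interpolation of y at x; xs must be strictly increasing.

    Outside the sampled range, the nearest measured value is returned
    (an assumption made for this sketch).
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])


def average_curve(hemisphere_curves, azimuths=(0, 1, 2, 3, 4, 5, 6)):
    """Resample each (positions, signals) curve at the azimuths, then average."""
    resampled = [
        [interp_clamped(xs, ys, az) for az in azimuths]
        for xs, ys in hemisphere_curves
    ]
    n = len(resampled)
    return [sum(column) / n for column in zip(*resampled)]


# Two hypothetical hemispheres with ROIs at different positions (degrees).
curves = [([0.5, 2.5], [1.0, 3.0]), ([0.0, 6.0], [0.0, 6.0])]
mean_curve = average_curve(curves)
```

The default azimuths follow the text: 7 values starting from zero in 1° steps.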
Eye Movement and Behavioral Data Analysis
The oculomotor recordings were taken at 120 Hz. Calibrations were performed at the start and end of each run. The raw data were stored at the end of each run and analyzed with semiautomated software developed in Matlab (The Mathworks, Inc., Natick, MA) following standard procedures (Barnes 1982). Artifacts and eyeblinks were removed and replaced by linearly interpolated values, and trials with more than 25% modified data were discarded. Next, a low-pass filter (50 Hz) was applied, and saccades were detected by acceleration (threshold > 2 SD of baseline distribution) and amplitude (amplitude > 1°) criteria (Bennett and Barnes 2003) and removed after visual inspection. The missing data were replaced by linear interpolation. Given the small amplitude of the pursuit eye movements (4°) and the relatively restricted accuracy of the eye tracker (0.5°), the pursuit component of the eye movement signal was obtained by smoothing using a 1 Hz, second-order Butterworth filter.
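The artifact-replacement and trial-rejection rules can be sketched as follows. This is an illustrative Python version, not the Matlab software used in the study. The function names are hypothetical, and holding the nearest valid sample when a flagged run touches the start or end of a trial is our assumption.

```python
def interpolate_flagged(samples, flagged):
    """Replace flagged samples by linear interpolation between valid neighbours."""
    valid = [i for i, bad in enumerate(flagged) if not bad]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    out = list(samples)
    for i, bad in enumerate(flagged):
        if not bad:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:        # flagged run at trial start: hold first valid value
            out[i] = samples[right]
        elif right is None:     # flagged run at trial end: hold last valid value
            out[i] = samples[left]
        else:
            w = (i - left) / (right - left)
            out[i] = samples[left] + w * (samples[right] - samples[left])
    return out


def keep_trial(flagged, max_fraction=0.25):
    """Keep a trial unless more than 25% of its samples were modified."""
    return sum(flagged) / len(flagged) <= max_fraction


# Hypothetical eye-position trace (degrees) with a 2-sample blink artifact.
cleaned = interpolate_flagged([0.0, 9.0, 9.0, 3.0], [False, True, True, False])
```

The same interpolation routine applies to the samples removed as saccades in the later step.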
For each trial, the pursuit gain was calculated as the ratio between the maximum eye velocity and the maximum target velocity. The relative distance between the fovea and the target image was expressed as the difference between eye position and target position for each trial and classified according to the type of trial (speed or form discrimination) and the target speed (slower or faster than 4.5°/s). The behavioral data were stored at the end of each run for later analysis using custom software.
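As a worked illustration of the gain definition, the sketch below computes the ratio of peak eye velocity to peak target velocity for one trial. Taking absolute values, so that leftward and rightward trials are treated alike, is our assumption; the function name and sample values are hypothetical.

```python
def pursuit_gain(eye_velocity, target_velocity):
    """Ratio of maximum eye speed to maximum target speed within a trial.

    Velocities are in deg/s; absolute values are used so that the sign of
    the motion direction does not matter (assumption of this sketch).
    """
    peak_eye = max(abs(v) for v in eye_velocity)
    peak_target = max(abs(v) for v in target_velocity)
    return peak_eye / peak_target


# Hypothetical leftward trial: velocities are negative.
gain = pursuit_gain([-1.0, -3.5, -4.5, -2.0], [-4.5, -4.5, -4.5, -3.0])
```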
Behavioral Performance during Scanning
The subjects were trained in the 2 discrimination tasks outside the scanner. These training sessions provided values for the Weber fractions (stimulus difference divided by the mean) used to achieve 84% correct responses. As a consequence, subjects performed all 4 tasks equally well during scanning. Performance averaged 83–85% correct (Fig. 2A), with no significant difference between conditions (χ2 = 3.97, degrees of freedom = 3, P > 0.2). Average stimulus differences expressed as Weber fractions were greater for speed than for shape tasks (Fig. 2C). Note that the Weber fraction was slightly greater for the speed judgments during fixation than during pursuit. An ANOVA testing for differences in Weber fractions across the 4 conditions was significant (F3,64 = 63.7, P < 0.001), and the post hoc HSD Tukey test between PS and FS reached P < 0.05. In addition to these overall differences, the Weber fractions were adapted from trial to trial depending on the subject's performance in the scanner (Fig. 2D): the difference was decreased 5% following 3 successive correct responses and increased 5% after an incorrect response (Wetherill and Levitt 1965). This further equalized performance across the 4 tasks. The reaction times for the button presses were long, just over 1 s (Fig. 2B), and slightly longer for pursuit than fixation tasks, but differences were not significant (ANOVA, F3,64 = 1.4). Thus, subjects responded late in all tasks near the end of target motion.
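The trial-to-trial adaptation rule described above can be sketched as a small update function. This is a minimal illustration, not the experiment software. We assume that the 5% changes are multiplicative and that the run of correct responses is reset after each change; the function name is hypothetical.

```python
def update_difference(diff, correct, streak, step=0.05, n_correct=3):
    """Return (new_diff, new_streak) after one trial of the staircase.

    diff    : current stimulus difference (e.g., a Weber fraction)
    correct : whether the response on this trial was correct
    streak  : number of successive correct responses before this trial
    """
    if not correct:
        # An error makes the task easier: increase the difference by 5%.
        return diff * (1 + step), 0
    streak += 1
    if streak == n_correct:
        # 3 successive correct responses make the task harder: decrease by 5%.
        return diff * (1 - step), 0
    return diff, streak


# Hypothetical sequence starting from a 10% Weber fraction.
diff, streak = 0.10, 0
for response in (True, True, True, False):
    diff, streak = update_difference(diff, response, streak)
```

After three correct responses and one error, the difference in this example is 0.10 × 0.95 × 1.05.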
Oculomotor Performance during Scanning
All subjects were trained outside the scanner to perform both the perceptual and the oculomotor tasks. Eye movement recordings outside the scanner confirmed that subjects pursued the moving target well. Eye recordings from at least one scanning run could be analyzed in 11 of the 17 subjects (mean number of runs analyzed per subject [n = 11] = 3.1, SD 1.3). For these subjects, the quality of the recordings allowed us to investigate pursuit behavior in detail. For the other 6 subjects, artifacts were too numerous to warrant analysis. One way of assessing the quality of oculomotor behavior, applicable to both fixation and pursuit behaviors, is the number of saccades in the different conditions. In the fixation tasks, subjects (n = 11) averaged 1 saccade per min (mean 1.2, SD 1.2 for FS and mean 0.76, SD 0.68 for FF) and less than 2 saccades per min in the pursuit tasks (mean 1.8, SD 1.2 for PS and mean 1.5, SD 1.4 for PF), but with no significant differences between the 5 conditions (ANOVA, F4,48 = 1.8, not significant [NS]). The frequency of the saccades is low in the present study. One of our earlier studies reported 4.5 saccades/min (Jastorff and Orban 2009), close to the 6/min reported for a large population by Ettinger et al. (2003). This low frequency might reflect the extensive training in oculomotor tasks administered before scanning. These results indicate that we successfully avoided the intrusion of saccades into the pursuit tasks, with the exception of catch-up saccades smaller than 1°. Thus, any differences between conditions are unlikely to reflect differences in the stabilization of gaze on the static or moving targets. It has been reported that the frequency of catch-up saccades is greater in pursuit than during fixation (Ettinger et al. 2003). Although we cannot exclude this possibility in the present study, it will not affect our main result based on interaction between oculomotor and task factors.
Figure 3A,B illustrates the eye speed curves during slow and fast trials of a single subject along with the stimulus speed. These latter curves illustrate the 2 motion durations used to uncouple speed from amplitude in judgments and also show the slowing of the stimulus at the end of motion to avoid saccades. Figure 3C shows the eye speed curves averaged across subjects (N = 11) for the slow and fast trials of the 2 pursuit tasks, PS and PF. Clearly, these curves are extremely similar for the 2 tasks but different for slow and fast trials. In fact, the maximum eye speed approaches the target speed in all curves, such that the maximal gain of the pursuit (eye speed divided by target speed) was close to 1 in all 4 types of trials, measuring 1.03 (SD 0.06) for fast trials and 1.06 (SD 0.08) for slow trials, with no significant difference (ANOVA, F3,40 = 1.2, NS). Thus, on average, the subjects’ pursuit eye movements followed the target speed well. What then about the variability across trials? Figure 3D plots the distribution of the maximal eye speeds in slow and fast trials during pursuit. As expected from the previous figure, the averages were clearly different but there was some overlap between the distributions, which crossed at 4.7°/s, corresponding to 28.5% of the trials. An observer using pursuit speed to guess target speed would therefore average 29% errors. Oculomotor performance was slightly poorer than the psychophysical performance because subjects made only 16% errors in judgments of target speed. Of course, the subjects knew that speed differences were adapted to their psychophysical performance, and thus, discrimination was the primary goal in this double task situation.
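The two quantities used in this paragraph, pursuit gain and the error rate of an observer who guesses target speed from maximal eye speed, can be made concrete with a short sketch. The speed values in the usage note are invented for illustration and are not data from the study; the function names are hypothetical, and the criterion is assumed to be a single threshold placed where the two distributions cross.

```python
def pursuit_gain(eye_speed, target_speed):
    """Pursuit gain as defined in the text: eye speed divided by target speed."""
    return eye_speed / target_speed


def criterion_error_rate(slow_speeds, fast_speeds, criterion):
    """Error rate of an observer who labels a trial 'fast' whenever the
    maximal eye speed exceeds the criterion (in deg/s)."""
    # Slow trials misclassified as fast.
    slow_errors = sum(1 for s in slow_speeds if s > criterion)
    # Fast trials misclassified as slow.
    fast_errors = sum(1 for f in fast_speeds if f <= criterion)
    return (slow_errors + fast_errors) / (len(slow_speeds) + len(fast_speeds))
```

With hypothetical maximal eye speeds `slow = [3.5, 4.0, 4.9, 4.2]` and `fast = [5.5, 4.5, 6.0, 5.1]`, calling `criterion_error_rate(slow, fast, 4.7)` counts one misclassified trial in each set.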
During the fixation trials, subjects were instructed to maintain their gaze on the central square. To ascertain how well they followed this instruction, we calculated in the 11 subjects the pursuit gain for the first 200 ms of each trial. The gain averaged 0.35 (SD 0.04) and 0.01 (SD 0.02) for pursuit and fixation trials, respectively. This clearly shows that subjects were able to inhibit smooth pursuit during fixation, in agreement with physiological studies revealing a neural substrate for the inhibitory control of smooth pursuit during fixation in the brain stem (Basso et al. 2000; Missal and Keller 2002).
Because subjects generally pursued the target rather precisely, one would expect that the target image was never far from the fovea. Figure 4 shows that this proved true for the slow and the fast trials of both speed and form judgments. In the 11 subjects, the target image was on average never more than 0.5° from the VM. Even adding 2 SDs to this average still left the target image within 0.75° of the VM. This is important for interpreting cortical activation during pursuit trials: any activation beyond the projection of 0.75° azimuth is unlikely to reflect the retinal slip of the target image.
Functional Imaging: Group Voxel-Based Analysis
The interaction between the task and oculomotor factors (masked by the main effect of speed) identifies the regions in which the difference in activity between speed and shape judgments is greater during pursuit than during fixation. This indicates candidate regions where an extraretinal eye speed signal could enter the visual system and be combined with retinal image speed signals to yield useful target speed estimates for judgments of speed. The interaction reached significance in only one region, the MOG of both hemispheres (Fig. 5). No interaction was observed in the hMT/V5 complex. To establish the absence of an interaction at this level, we defined hMT/V5+ ROIs from the motion localizer. There was little difference in activity between the 4 task conditions in hMT/V5+ (Fig. 6A,B), and the ANOVA failed to reveal a significant effect of task (see legend). In fact, the activity on the left side was, if anything, slightly greater for form than for speed discrimination tasks. In contrast, the profile of the MOG ROI (Fig. 6C,D), defined from the activation in Figure 5, indicated a difference in activities for speed compared with form discrimination present during pursuit but absent during fixation. Unsurprisingly, the ANOVA revealed a significant interaction between perceptual and oculomotor tasks (F1,128 = 6, P < 0.05) in MOG, and the post hoc analysis revealed a significant difference between the PS and PF conditions but not between FS and FF (see Fig. 6 legend). The profiles of Figure 6 confirmed that the interaction indeed arose from a difference between pursuit conditions, as intended, and very little or none from differences between fixation conditions. The conjunction of the interaction with the main effect was used to ensure this result, but the activity profile verifies this important aspect of the activation pattern, confirmed by the post hoc analysis.
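The logic of the interaction in this 2 × 2 design can be written as a difference of differences, (PS − PF) − (FS − FF). The sketch below is an illustration only: it operates on hypothetical condition means rather than on the voxel-wise statistics used in the study, and the function names are assumptions.

```python
def interaction_contrast(means):
    """Difference of differences for the 2 x 2 design.

    `means` maps the condition labels PS, PF, FS, FF to mean activity
    (for example, percent signal change in an ROI).
    """
    pursuit_effect = means["PS"] - means["PF"]   # speed minus form, pursuit
    fixation_effect = means["FS"] - means["FF"]  # speed minus form, fixation
    return pursuit_effect - fixation_effect


def simple_effects(means):
    """Return the speed-minus-form difference separately for pursuit and
    fixation, to check where a positive interaction comes from."""
    return means["PS"] - means["PF"], means["FS"] - means["FF"]
```

A positive contrast with a clear pursuit difference and a near-zero fixation difference corresponds to the profile described for the MOG ROI, whereas a positive contrast driven by a negative fixation difference would not.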
The location of the interaction site in the occipital gyrus suggests that it belongs to an early retinotopic region. Therefore, we mapped the retinotopic organization in most (15/17) subjects. In Figure 7A, the projections of the meridians onto the posterior portions of average flattened hemispheres (PALS atlas) of this group mapping (supplementary Fig. S1) are indicated in black and white lines and the boundaries of central and near peripheral stimuli by purple lines. Because we averaged the trials above and below the HM, the average activations related to the moving target should occur along the projections of the HM (white lines). We were able to map the HM representation splitting V1 into dorsal and ventral halves (HM1), as well as the HMs separating V2 and V3 dorsally (HM2) and ventrally (HM3). In addition, we could discern a representation of the HM in front of dorsal V3 (HM4), belonging to LO1, and another anterior to ventral V3 (HM5), belonging to hV4 (Larsson and Heeger 2006; Georgieva et al. 2009). Projecting the interaction activation pattern onto this retinotopic map (Fig. 7B) indicates that the interaction straddles the HM separating dorsal V2/3 in the left hemisphere and extends from this HM to that of V1 in the right hemisphere.
In addition to the interaction, we also projected the activation pattern of the main effects onto the retinotopic map, as this provides information about the activation context of the interaction effect. In most early retinotopic cortex, activity was much greater during pursuit than during fixation (Fig. 8A). Notice that this was not the case in hMT/V5+ even when the threshold was lowered below the significance level, in agreement with data shown in Figure 6, but was indeed the case for more ventral regions in posterior inferior temporal and fusiform gyri. On the other hand, with respect to perceptual tasks, the activation pattern for form discrimination was much more extensive than for speed discrimination, both at significance level and below (Fig. 8B). The main effect of speed discrimination did not reach significance in occipital cortex. Lowering the threshold revealed only a small site in V1. The main effect of shape discrimination drove the cortex anterior to the HMs of LO1 and hV4. This pattern, including the extension into occipital intraparietal sulcus, is reminiscent of 2D shape sensitivity obtained by comparing intact and scrambled images (Denys et al. 2004).
The random effects group analysis revealed an interaction between the oculomotor and perceptual factors in a single site localized chiefly on the HM representations separating V2/3 dorsally. Although the effect was statistically robust because of the random effect analysis, the identification as dorsal V2/3 was approximate, given that we used an average retinotopic map of 15 subjects and that individual maps can differ. We therefore localized the projections of the HMs in single subjects.
fMRI: ROI Analysis along the Representations of the HM
In single subjects, we were able to identify 5 representations of the HM in occipital cortex using procedures similar to the group analysis (supplementary Fig. S2 and Fig. 9A). Hence, in all 15 subjects, we localized the HM1 within V1, HM2 between V2 and V3 dorsal, HM3 between V2 and V3 ventral, HM4 within LO1, that is, anterior to dorsal V3, and HM5 within hV4, that is, anterior to ventral V3. Retinotopic maps of all 15 subjects are shown in supplementary Figure S3. This figure indicates that a significant interaction between oculomotor and perceptual factors occurred on HM2 of most subjects (10/15 subjects in each hemisphere), much more so than on HM1 (3/15 subjects in each hemisphere) or on HM3 (1/15 subjects in each hemisphere).
These HM representations were tiled with ROIs (see Materials and methods). We also estimated in each subject the 0.7° and 5.7° eccentricity, the limits of activation for the central and near peripheral stimuli (Fig. 9B). Using these 2 reference points on each meridian, we estimated the 2 constants of the magnification function for each HM and derived the relationship between distance in millimeters on the cortical surface and visual degrees in the visual field (see Materials and methods). This allowed us to calculate the azimuth of each ROI along the various HM projections (Fig. 9C) and, finally, to plot the BOLD signals, and their transformations, as a function of azimuth along each of the 5 HMs. Because the azimuths of the different ROIs along an HM differed slightly between hemispheres, we interpolated between the azimuths of the ROIs in the individual hemispheres (see Materials and methods) and averaged the resulting curves across hemispheres.
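To illustrate how 2 reference points can fix the 2 constants of a magnification function, the sketch below assumes the common inverse-linear form M(E) = A / (E + E2), which integrates to a cortical distance D(E) = A ln(1 + E/E2) from the foveal representation. This form, the measurement of distance from the foveal representation, the bisection solver, and the function names are all assumptions made for illustration; the procedure actually used is the one given in Materials and methods.

```python
import math


def fit_magnification(ecc1, dist1, ecc2, dist2):
    """Solve for (A, E2) in D(E) = A * ln(1 + E / E2) from two points.

    ecc1, ecc2: eccentricities in degrees (e.g. 0.7 and 5.7).
    dist1, dist2: cortical distances in mm from the foveal representation.
    """
    target = dist2 / dist1

    def ratio(e2):
        # The ratio D(ecc2) / D(ecc1) depends on E2 only and increases with E2.
        return math.log1p(ecc2 / e2) / math.log1p(ecc1 / e2)

    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    e2 = math.sqrt(lo * hi)
    a = dist1 / math.log1p(ecc1 / e2)
    return a, e2


def distance_to_eccentricity(dist_mm, a, e2):
    """Invert D(E): convert a cortical distance in mm to degrees."""
    return e2 * math.expm1(dist_mm / a)
```

Once `a` and `e2` are fitted for a given HM, `distance_to_eccentricity` assigns an azimuth to each ROI from its distance along the meridian.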
Figure 10 plots the BOLD signal relative to fixation in each of the 4 experimental conditions as a function of azimuth along each of the 5 HMs, averaged over the 30 hemispheres. The 3-way ANOVA for repeated measures with meridian, azimuth, and condition as factors yielded significant main effects for the 3 factors, significant 2-way interactions between each pair of factors, and a significant 3-way interaction (see legend Fig. 10). Testing for differences between meridians revealed that HM2 differed from all other meridians (post hoc HSD Tukey P < 0.001). On the other hand, the effect of azimuth along HM1 appeared to differ from that along the other HMs, and pairwise interactions with azimuth were significant between HM1 and HM2, HM3, or HM4. Along all meridians, the signals for pursuit tasks were far stronger than those of fixation tasks (post hoc HSD Tukey between 2 pursuit tasks and 2 fixation tasks P < 0.001), as expected from Figure 8. Note that the difference between pursuit and fixation tasks depended little on azimuth in HM1 but did so along HM4 and HM5, with HM2 and HM3 lying between these extremes.
The most interesting finding, however, is that on HM2 at 2° azimuth and to some extent at 1° and 3°, a clear interaction is visible (Fig. 10): during pursuit (solid lines), the speed judgment task activated HM2 more than the form task, whereas there was no difference during fixation (stippled lines). This is the same interaction as shown in Figure 5B, which is due to differences between pursuit tasks, not between fixation tasks, as required. Statistically, a post hoc HSD Tukey for the interaction was significant on HM2 at 1° (P < 0.001), 2° (P < 0.001), and even 3° (P < 0.01) azimuth. Notice that the opposite interaction is visible at similar azimuth values on HM1 and HM3. At 2° on HM1, the speed task induced more activation than the shape task during fixation, whereas there was little difference between the 2 during pursuit (post hoc HSD Tukey P < 0.01). The same holds for 3° azimuth on HM3 (post hoc HSD Tukey P < 0.005). This explains why the speed main effect failed to reach significance in the early areas (V1–V3): opposite interactions in HM2 and the 2 other meridians cancelled one another.
Activity during the 2 fixation tasks in Figure 10 should reflect target motion, and the negative BOLD signals, relative to fixation, observed in most fixation curves deserve further comments. Based on the target trajectories (Fig. 1), we expect double the activation between 0° and 1.5° azimuth as between 1.5° and 3°, with little or no activation beyond. This pattern derived from the retinal stimulation is blurred to some extent by smoothing and averaging of data. However, it is also modified by attentional effects. In retinotopic regions, attention to a small target enhances the activation by this target, especially beyond V2–3 (Tootell et al. 1998; Huk and Heeger 2000). In addition, activity surrounding the target representation is suppressed (Tootell et al. 1998; Vanduffel et al. 2000), especially in area V1. These suppressive effects explain the negative BOLD signals relative to fixation (see Fig. 10 legend and supplementary Fig. S4).
Figure 11 plots the magnitude of the task main effects and the interaction directly as a function of the azimuth along the 5 HMs. A 2-way repeated measures ANOVA with azimuth and meridian as factors yielded significant main effects and interaction (see legend). Clearly, the first 3 HMs are activated during speed tasks, whereas HM4 and HM5 show opposite behaviors (Fig. 11A). The speed effect was significant only on HM1, in agreement with Figure 8 (the same map is displayed at a lower threshold in supplementary Fig. S5), and it reaches its maximum between 2° and 4° azimuth, roughly corresponding to the target position at the reaction time of the manual response. The plot of the interaction (Fig. 11B) confirms that it reached significance only on HM2 at 1° and 2° azimuth. Post hoc testing (HSD Tukey) confirmed that on HM2 1° and 2° azimuths differ from 4°, 5°, and 6° (P < 0.001). On the other hand, at 2° azimuth, HM2 differed from all the other 4 meridians (P < 0.001). Comparison of flat maps in supplementary Figure S5 and Figure 7B confirms that the interaction effect occurred at a slightly smaller azimuth than the main effect of speed, as suggested by the plots along the HMs.
These results clearly demonstrate that the interaction between oculomotor and perceptual factors occurs only on the HM between dorsal V2 and V3 at an azimuth of 1–2°. This azimuth is never reached by the target image (Fig. 4), and eye speeds were very similar in the 2 pursuit tasks (Fig. 3C). Thus, this activation must reflect an extraretinal speed signal active during pursuit and speed discrimination. An azimuth of 1–2° corresponds roughly to the position that the target would have reached at the reaction time of the speed judgment if it were moving over the retina. Notice that the many substantial differences between pursuit and fixation tasks suggest that additional extraretinal signals may operate during pursuit in the early areas. Indeed, this differential activity is unlikely to reflect visual differences because the visual activity evoked by the small targets was relatively restricted.
Our results show that during pursuit, a precise extraretinal speed signal is injected at 2° azimuth on the HM representation separating dorsal V2–3. Findings indicate that speed discrimination involves early V1–V3, with some division of labor. V1 and ventral V2–3 contribute to speed discrimination during fixation, in agreement with our earlier study (Sunaert et al. 2000), whereas dorsal V2–3, the recipients of the extraretinal speed signal, contribute during pursuit. Finally, additional extraretinal signals might reach early visual areas.
Interaction between Oculomotor and Perceptual Factors in Dorsal V2–V3
The interaction site on the HM separating dorsal V2 and V3 must represent an extraretinal signal. Indeed, the azimuth of the site exceeds the distance that the target image travels on the retina during pursuit, thus excluding a retinal origin for this activation. This extraretinal speed signal is quite precise as subjects were able to discriminate equally small speed differences during pursuit as during fixation. The azimuth corresponds to the position the moving target would have reached if it were moving over the retina. We propose that this is how the visual system solves the conundrum of localizing a target in retinotopic space that moves in space but whose image is largely immobile on the retina. This solution applies because the HMs between dorsal and ventral V2–3 represent the same region of the visual field twice. They may divide the task between them: one HM representation, in ventral V2–3, carries signals reflecting what occurs on the retina and hence is used during fixation and the other in dorsal V2–3 is used to localize activity that reinterprets retinal signals by including extraretinal signals. In the monkey, V3 neurons exhibit filling-in (De Weerd et al. 1995). When a stimulus with a gap is shown for some time, it fills in perceptually and V3 neurons with their RFs lying inside the gap soon become active, following a time course similar to the perception. This indicates that V3 neurons are capable of reinterpreting local retinal signals and provide signals that do not simply correspond to the retinal stimulation.
Two Visuomotor Pathways
There is considerable evidence that pursuit eye movements and speed discrimination have similar precision, with similar just noticeable differences for perceptual judgments and pursuit eye movements (Kowler and McKee 1987; Gegenfurtner et al. 2003; Osborne et al. 2005, 2007). One would therefore expect that if the extraretinal signal injected at the level of V2–3 allows fine speed discrimination during pursuit, it should also improve pursuit performance during speed discrimination compared with pursuit during form discrimination (when the signal is absent). In fact, this is not what was observed. Pursuit was equally precise during speed and form discriminations, but the oculomotor precision was poorer than the perceptual precision. This implies that the pathway leading to pursuit and to the speed judgments diverges beyond V2–3, at least for small moving targets. The button press used in the speed discrimination involves putative human anterior intraparietal area (phAIP) (Binkofski et al. 1999; Culham et al. 2003), as shown by the comparison of the pursuit tasks with the pursuit localizer (data not shown). On the other hand, there is considerable evidence linking MT/V5 activity to pursuit (Newsome et al. 1985; Dursteler and Wurtz 1988; Yamasaki and Wurtz 1991; Lagae et al. 1993; Schiller and Lee 1994; Priebe et al. 2001; Born and Bradley 2005; Huang and Lisberger 2009). Hence, we propose (Fig. 12) that in the present study, the speed is computed at the level of V2–3 and that this signal travels to the posterior parietal cortex (PPC), perhaps after relay via V3A, reaching phAIP, where the sensorimotor transformation for the button press occurs. On the other hand, the V2–3 speed signal is also fed to hMT/V5+ from whence it projects to FEF for generating the smooth pursuit. The fact that hMT/V5+ is equally active during all tasks suggests that it is involved not only in pursuit but also in attentive tracking of a moving target (Ohlendorf et al. 2007). 
Thus, the 2 motor responses here, button presses and pursuit, depend on different pathways, which might explain differences in performance, as different levels of noise may be introduced along these 2 visuomotor pathways. Because the stimuli were continuously adapted to the subjects’ discrimination performance, it seems logical that subjects considered discrimination the primary task in the double task context. Hence, unlike the single-task paradigm, more noise might be injected into the pathway generating pursuit than in that generating button presses, explaining the different levels of performance in pursuit and speed judgments. In the single-task conditions in which performance has been compared so far, the motor noise has been small compared with the sensory noise (Osborne et al. 2005, 2007) and both tasks were equally precise, at least for longer stimulus durations (Rasche and Gegenfurtner 2009). Although it is generally accepted that this common sensory limitation involves area MT/V5 (Osborne et al. 2005; Rasche and Gegenfurtner 2009), our experiments suggest that this common stage may be V2–V3. That pursuit and speed discrimination exhibit different behaviors in our experiments agrees with other studies reporting dissociations between pursuit and speed discrimination (Spering and Gegenfurtner 2008; Rasche and Gegenfurtner 2009).
Speed Discrimination and hMT/V5+
The present results, though in agreement with our previous study (Sunaert et al. 2000), seem at odds with a number of other studies, both in humans and monkeys, commonly considered to support the role of MT/V5 in speed perception. Several factors, not mutually exclusive, might explain the discrepancy between the present study and these other studies. First, we tested human subjects, whereas many others have used monkeys. Although the visual cortex of primates shows considerable homology, one of the major changes involves V3A, which is motion sensitive in humans (Tootell et al. 1997) and is integrated into an extended complex (Georgieva et al. 2009). This might be the gateway of motion information into PPC of humans (Orban et al. 2004). Second, we required manual responses, whereas many monkey experiments have used eye movements as operant, and the lateral intraparietal (LIP) area and FEF, involved in visual control of saccades, are strongly linked to MT/V5. Finally, we used a small moving target, whereas monkey studies used patches of random dot patterns (RDPs). It is unclear how much attentional tracking a moving RDP evokes, but MT/V5 might also be instrumental for directing attention to the brief motion of an RDP, especially when it is small in size.
Functional imaging studies in humans have implicated hMT/V5+ in speed discrimination, particularly those of Corbetta et al. (1991), Beauchamp et al. (1997), and Huk and Heeger (2000). On the other hand, Sunaert et al. (2000), confirming an earlier positron emission tomography (PET) study (Orban et al. 1998), found no evidence for involvement of hMT/V5+ in speed discrimination, in agreement with the present study. Unlike the present study, none of those earlier studies reported an analysis of eye movements recorded in the scanner. Eye movement confounds are less of a concern with small stimuli such as those used by Sunaert et al. (2000) (3° diameter) than with the large RDPs used by Corbetta et al. (1991) (32° diameter), Beauchamp et al. (1997) (16° diameter), and Huk and Heeger (2000) (14° diameter). Furthermore, in all imaging studies, the activity of the hMT/V5 complex was investigated, not that of hMT/V5 proper.
The study by Corbetta et al. (1991) in no way contradicts our study, as the main area involved in speed discrimination was an inferior parietal lobule region. Activation in an LO region, commensurate with the location of the hMT/V5 complex, was observed only in the contrast speed discrimination versus passive viewing, in which a lingual activation, compatible with ventral V2–V3, was also observed. The interpretation of the result of Beauchamp et al. (1997) is difficult in that the contrast between speed discrimination and color discrimination is confounded with a sensory change from color contrast to motion contrast. In a carefully controlled study, Huk and Heeger (2000) reported a small but consistent modulation of hMT/V5+ when subjects (n = 4) alternated between speed and contrast discrimination. Given the large size of their stimuli, any effect in V2/3 may have been difficult to observe because different subjects may have attended to different parts of the stimulus. It is worth noting that the present study, in which moving stimuli alternated in direction, and the study of Sunaert et al. (2000), in which well-trained subjects discriminated small speed differences, eliminate the objections raised against the initial PET study of Orban et al. (1998) by Huk and Heeger (2000). Thus, it is possible that the role of hMT/V5 in speed discrimination is more limited for small stimuli than for large RDPs. MT/V5 houses 2 types of neurons, those with and without antagonistic surrounds (Raiguel et al. 1995; Born 2000), and it is possible that the neurons without surrounds contribute to speed computation for large RDPs, unlike those with surrounds, supposedly involved in driving pursuit (Groh et al. 1997).
Although the imaging studies are not necessarily contradictory to our results, the evidence from monkey studies favoring MT/V5 involvement in speed discrimination seems indisputable. Before resorting to species differences, it is worth reviewing the pertinence of the 4 sets of monkey data as support for the contribution of MT/V5 to speed discrimination. The first argument that MT/V5 neurons are much better suited to the task than their afferents in V1 has lost much ground. MT/V5 neurons are pattern selective (Movshon et al. 1985) and integrate local motion components, from V1 layer 6 (Movshon and Newsome 1996), to compute speed accurately. Yet, it has become clear that the various projections from V1 to MT/V5 probably carry different motion signals (Born and Bradley 2005; Orban 2008) and that V1 end-stopped neurons (Livingstone and Conway 2007) should be able to signal the speed of a patch of checkerboard correctly. In the same vein, it was initially reported that MT/V5 neurons but not V1 neurons coded speed (Priebe et al. 2001), but more recently, such a class of V1 complex cells has been described (Priebe et al. 2006). Thus, a transit through MT/V5 for speed computation no longer seems essential, and the role of V3 neurons is simply unexplored. In addition, recent single-cell studies have shown that the relationship of speed and contrast in MT/V5 neurons is at odds with accepted models invoked to explain speed judgments (Krekelberg et al. 2006).
Neither are our results inconsistent with 2 other arguments favoring the MT/V5 participation in speed discrimination: choice probabilities and electrical stimulation (Liu and Newsome 2005). Choice probabilities were low (0.52 for 2 monkeys), and their value as evidence for the involvement of an area in a behavioral task is unclear (Cohen and Newsome 2008). Liu and Newsome (2005) tested electrical stimulation in 2 monkeys and obtained effects in only 1 of the 2 subjects. Even accepting these results, they do not specify any pathway from MT/V5 to the behavioral response. In particular, one cannot rule out the possibility of feedback projections from MT/V5 to V2 and V3 (Maunsell and Van Essen 1983), and it remains that V3 neurons respond to moving RDPs (De Weerd et al. 1995).
In the monkey, lesions of MT/V5 produce profound deficits in speed discrimination (Pasternak and Merigan 1994; Orban et al. 1995). The deficits were more profound in the study of Orban et al. in which pursuit eye movements were avoided by short presentation times. The monkeys’ task was to compare the speeds of 2 briefly (200 ms) presented moving RDPs (7° diameter). Because the first stimulus was always the reference, the animals probably followed an identification strategy using the first stimulus as a sort of warning. It is thus possible that MT/V5, given its direct projection to FEF, helps direct attention and gaze to a brief moving stimulus, especially a small one. The effect of MT/V5 lesions on catch-up saccades at the start of pursuit (Newsome et al. 1985) favors this view. The control tasks of the study of Orban et al. (1995) involved only static stimuli, leaving open the possibility that the lesion effect was less specific for speed than originally believed. Furthermore, we cannot exclude the possibility that the MT/V5 lesions modified the behavior of the V3 neurons and that this change, rather than the lesion itself, caused the behavioral deficit. Thus, our present results do not necessarily contradict the monkey data and suggest that the role of MT/V5 in motion processing may be more restricted than initially believed (Zeki 1974, 1978; Zeki et al. 1991). They also indicate that to understand the role of a sensory region in behavior, its relationship to regions outside the sensory system is as important as that with other regions within that system (Orban 2007).
The main result of our study is the finding that an extraretinal speed signal is injected at the level of HM2 to compute target speed with respect to an external reference rather than the retina. However, sizeable differences between BOLD signals during pursuit and fixation tasks observed in early visual cortex suggest that additional extraretinal influences operate even at that early level. In addition to the extraretinal signal used to compute the speed of the pursued target in head-centered coordinates, as reported here, at least 2 other extraretinal signals have been thought to play a role in smooth pursuit. One of these relates to anticipation in pursuit and probably involves the velocity memory (for review, Barnes 2008). Barnes considers only control of eye movements and assumes that both signals are injected at the same locus in the processing of eye velocity. Here we consider the possibility that in addition to a direct loop for speed judgments specifically targeting dorsal V2–3, indirect internal signals related to predictive control exert a more general effect on early visual cortex, accounting for the difference between pursuit and fixation curves observed in V1–4. A possible origin of these signals is the supplementary eye field (SEF) (Missal and Heinen 2004) with a relay through the FEF to early visual cortex (Ekstrom et al. 2008). Indeed, blanking of visual stimuli during pursuit leaves pursuit-related activity in SEF intact, but the role of FEF in the prediction of pursuit is under debate (Keating 1993; Fukushima et al. 2002; Ilg and Thier 2008). A third extraretinal pursuit signal is involved in nullifying the spurious background motion during pursuit. Its cancellation effect requires an extraretinal signal, which in humans arises from the lateral hemispheres (CRUS I) of the cerebellum (Lindner and Ilg 2006). 
This signal might originate in SEF, be elaborated within the cerebellum, and enter at the level of the parietoinsular vestibular cortex, a scheme supported by lesion studies (Haarmeier et al. 1997). Although it is unlikely that this signal reaches the early visual cortex, a signal from the same origin might be used in V2/V3 to compute the speed of the pursued target. Finally, for completeness, one could include the gain adaptation signals that presumably involve FEF and MSTv (see Introduction) as a fourth extraretinal signal operating during pursuit.
There is ample single-cell evidence for corollary discharge at a higher level such as FEF (Sommer and Wurtz 2008) and extraretinal signals in MSTv (Newsome et al. 1988) and SEF (Heinen 1995; Heinen and Liu 1997). Also, extraretinal signals have been reported in the hMT/V5 complex (Goossens et al. 2006). In addition, there is mounting evidence from single-cell studies that extraretinal position signals reach V1 (Trotter et al. 1992; Celebrini et al. 1993; Trotter and Celebrini 1999). Such position signals might operate over the whole of V1 and could contribute to the difference in BOLD levels we observed between pursuit and fixation trials in V1. On the other hand, Kagan et al. (2008) reported suppression followed by rebound activation evoked by fixational and volitional saccades in V1. In addition, in V3 and V3A, RF remapping signals have been observed during saccades, strongly reminiscent of those observed in LIP, FEF, and the superior colliculus (Merriam and Colby 2005). Similar signals have been mapped in human V3A and V4 and to a lesser extent in V3 (Merriam et al. 2007). It is unclear whether these 2 types of saccade-related extraretinal signals contribute to differences between pursuit and fixation trials that we observed in V1–V4.
In conclusion, we have unmasked an extraretinal speed signal during pursuit, not previously described in the human or primate brain. This signal is injected in early retinotopic cortex at the site of the sole redundancy in human visual cortex, providing a solution to the conundrum of how to retinotopically represent the motion of a target whose image is stabilized on the retina. These results underscore the importance of V1–V3 in speed discrimination and call for a reappraisal of the role of MT/V5 in motion processing, which may be more limited than originally thought.
CHU de Nantes and Fédération des Aveugles de France to P.L.; Neurobotics to A.B.; EF 14/5 to G.A.O.
The help of P. Leboucher, M. Ehrette, S. Kinkingnéhun, R. Valabrègue, K. Nigaud, E. Bertasi, R. Peeters, and J. Jastorff is kindly acknowledged. The authors are indebted to S. Raiguel and M. Missal for comments on earlier versions of the manuscript. This study was initiated while G.A.O. held the European chair at the Collège de France. Conflict of Interest: None declared.