When and where are decisions made? In the visual system a saccade, which is a fast shift of gaze toward a target in the visual scene, is the behavioral outcome of a decision. Current neurophysiological data and reaction time models show that saccadic reaction times are determined by a build-up of activity in motor-related structures, such as the frontal eye fields. These structures depend on the sensory evidence of the stimulus. Here we use a delayed figure-ground detection task to show that late modulated activity in the visual cortex (V1) predicts saccadic reaction time. This predictive activity is part of the process of figure-ground segregation and is specific for the saccade target location. These observations indicate that sensory signals are directly involved in the decision of when and where to look.
Saccadic reaction time is determined by neurons in motor-related structures, such as the lateral intraparietal area (LIP), the frontal eye fields (FEF), the superior colliculus (SC), and other parts of the brain (Munoz and Wurtz 1995; Thompson and others 1996; Schall 2001; Shadlen and Newsome 2001). In these areas, the neural activity that controls oculomotor behavior increases in time until it reaches a certain threshold. Once this threshold is reached, a saccade is initiated toward the visual target location. The variability in saccadic reaction times is due to variability in the rate of increase in neural activity toward the threshold (Hanes and Schall 1996; Shadlen and Newsome 1996). Models that describe oculomotor decisions also assume that a signal accumulates over time until a threshold is reached (Pacut 1982; Reddi and Carpenter 2000; Smith and Ratcliff 2004). According to such models, the closer neural responses get to the threshold, the more accurately one can predict where and when the eyes will move.
What determines the rate of accumulation of activity in motor-related areas? It has been proposed that sensory evidence is read out by the motor-related structures (Groh and others 1997; Kim and Shadlen 1999; Shadlen and Newsome 2001). Previously, we provided evidence that in a figure-ground detection task, late (i.e., >100 ms) V1 responses represent the internal evidence of the stimulus (Lamme and others 2000; Supèr and others 2001a,b). In addition, the strength of figure-ground responses predicts saccadic reaction time (Supèr and others 2003a). These observations suggest that oculomotor areas read out figure-ground activity as the sensory evidence. However, the strength of V1 activity just before the onset of a saccadic eye movement also predicts saccadic reaction time (Supèr and others 2004). Preparation of the saccade takes place during this presaccadic period, which is postperceptual (Thompson and others 1996; Schall and Thompson 1999). Therefore, it is currently unclear whether V1 responses that predict saccadic reaction time reflect sensory- or motor-related processing. To explore the contribution of sensory activity to saccadic reaction times, we recorded multiunit activity (Supèr and Roelfsema 2004) in the primary visual cortex of monkeys performing a delayed figure-ground response task. In such a task, neural responses related to the internal evidence of the stimulus, that is, figure-ground activity (Lamme and others 2000; Supèr and others 2001a,b; Tong 2003; see also Corthout and Supèr 2004), are separated in time from responses related to motor preparation of the saccade (Supèr and others 2004). Therefore, this delayed detection task allows us to study perceptual processes in isolation from movement-related processes, and we can determine at which moment in time V1 responses start to predict saccadic reaction time.
We show that in such a delayed figure-ground detection task, V1 responses predict the time of a saccade toward the figure location. Faster behavioral responses are observed when V1 activity is stronger. This predictive activity starts during the late response period of V1 neurons, when figure elements segregate from background elements, and continues until a saccade is initiated. Because such late modulated responses relate to visual perception, we suggest that the oculomotor system uses this signal to control saccadic eye movements. In conclusion, the present findings indicate that the variability in sensorimotor reaction times is not only determined by motor-related structures but also by sensory areas.
Materials and Methods
Two monkeys (Macaca mulatta) were trained to fixate on a central point on a computer monitor. After 300 ms fixation, the stimulus screen appeared containing a texture-defined figure, randomly positioned in 1 of 3 possible locations (Fig. 1). After a further 84 ms (monkey U) or 280 ms (monkey T), the figure-ground texture was replaced either by a different figure-ground texture or by a homogeneous texture (see also below). Different times were chosen for the 2 animals to capture possible temporal effects of late modulated responses on saccadic behavior; however, no clear effects were noticed. When the texture was replaced by a different figure-ground texture, a figure of the same size reappeared at the same location as the first figure. We labeled these trials, in which the relevant visual information (=figure) remained visible, visually guided trials. When the texture was replaced by a homogeneous texture, the figure disappeared and the animals had to remember the figure location. These trials are referred to as memory-guided trials. Visually guided trials and memory-guided trials were randomly interleaved.
The animals maintained fixation until cued (cue is disappearance of the fixation spot, 1000 ms after onset of the first figure-ground texture) to saccade toward the figure location. Saccades directed toward the figure that covered the receptive fields of the recorded neurons are referred to as the figure condition. For the other 2 figure locations, receptive fields were covered by ground and saccades were directed away from the receptive fields. These trials are referred to as the ground condition. Only correct saccades were analyzed. The maximum time allowed for responding to the figure was 500 ms. Trials were discarded if the eye position left the electronic fixation window (1° × 1°) during fixation, for example, due to fixational saccades, or if the animals made incorrect responses. Eye movements were monitored using scleral search coils with the modified double magnetic induction method, and digitized at 400 Hz (Bour and others 1984).
The stimulus screen with the figure-ground display consisted of a texture of a single particular orientation of line segments, except for a small square region (figure), where line segments were orientated orthogonally. Stimuli were presented on a 21-inch monitor screen, driven by TIGA software. The display resolution was 1024 × 768 pixels and the refresh rate 72 Hz. The monkey was seated in a primate chair and placed in a dark room 75 cm from the monitor screen. The screen subtended a 28° × 21° visual angle. In each trial, a square of 3° was randomly presented at 1 out of 3 possible locations at an eccentricity of 2.7–4.4° from the fixation point (a central red spot of 0.2°). The onset of figure-ground trials consisted of an abrupt transition from a texture of randomly oriented line segments into a texture of oriented line segments, with a 90° orientation difference between figure and ground elements. Line segments could have 135° or 45° orientation and were 16 × 1 pixels (0.44° × 0.027°). The density was 5 line segments per square degree. Both orientations were used for figure and background, resulting in 2 possible (i.e., 135°/45° and 45°/135°) figure-ground textures. These textures were replaced by a homogeneous texture (135°/135° and 45°/45°) for memory-guided trials or by a figure-ground texture (45°/135° and 135°/45°) for visually guided trials. Responses to these pairs were averaged, so that the local receptive field stimulation was identical for figure and background conditions.
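The stated segment size in degrees follows from the display geometry. The sketch below reproduces that arithmetic, assuming a linear pixel-to-degree conversion across the screen (a simplification; the names `px_to_deg` and `PX_PER_DEG` are illustrative and not taken from the original stimulus software).

```python
# Illustrative arithmetic only; all numbers come from the stimulus description.
SCREEN_WIDTH_PX = 1024
SCREEN_WIDTH_DEG = 28.0
PX_PER_DEG = SCREEN_WIDTH_PX / SCREEN_WIDTH_DEG  # about 36.6 pixels per degree


def px_to_deg(pixels, px_per_deg=PX_PER_DEG):
    """Convert a length in pixels to degrees of visual angle (linear approximation)."""
    return pixels / px_per_deg


segment_length_deg = px_to_deg(16)  # about 0.44 degrees
segment_width_deg = px_to_deg(1)    # about 0.027 degrees

# A 3 x 3 degree figure at 5 segments per square degree holds about 45 segments.
figure_area_deg2 = 3.0 * 3.0
segments_in_figure = 5 * figure_area_deg2
```

The vertical dimension gives the same scale (768 pixels over 21°), so a single conversion factor is adequate for this approximation.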
Data Recording and Analysis
A comprehensive description of the recording technique is given elsewhere (Supèr and Roelfsema 2004). In brief, multiunit activity was recorded through platinum–iridium microwire electrodes (16 out of ∼40 electrodes per animal, impedances 100–350 kΩ at 1000 Hz). These were surgically implanted in the operculum, mainly in the superficial layers of V1, within an area <2 cm². The interelectrode spacing was approximately 1 mm. Selection was based on the quality of the signal and the position of the receptive field. The obtained signals were amplified (40 000×), band-pass filtered (750–5000 Hz), full-wave rectified, and then low-pass filtered (<200 Hz). The resulting signal represents spiking activity. Such recordings are similar to single-unit recordings (Lamme 1995). Prior to the experiments, aggregate receptive field size and positions at each electrode were determined, using moving bars. Receptive field sizes ranged from 0.4° to 1.0°, and eccentricity from 1° to 6°. For each monkey, figure positions and electrodes were chosen so that the figure covered the receptive fields of the 16 electrodes simultaneously (=figure condition). Therefore, many recorded neurons had overlapping receptive fields. A few receptive fields extended slightly beyond the figure border. This had no significant effects on the results (see also Supèr and others 2003a). In the other 2 figure locations, the receptive fields were covered by ground (=ground condition).
We subtracted the direct current component (average baseline activity from 0 to 30 ms after stimulus onset) from the responses. Thereafter, the average responses at each electrode were normalized. The responses were divided by a constant factor at each electrode. This factor was the maximum response found for any of the conditions (i.e., figure, ground, visually guided trials, and memory-guided trials). Therefore, each electrode contributed equally to the population average. However, relative differences between conditions were maintained in spite of the normalization. All population data are from 16 electrodes per animal and the correlation coefficients for the data are of both animals unless specified otherwise.
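The baseline subtraction and per-electrode normalization can be sketched as follows. This is a minimal illustration, not the original analysis code: the function names are hypothetical, and the baseline window is assumed to include 0 ms and exclude 30 ms.

```python
def subtract_baseline(response, times_ms, window=(0, 30)):
    """Subtract the mean activity in the baseline window from every sample."""
    baseline_samples = [r for r, t in zip(response, times_ms)
                        if window[0] <= t < window[1]]
    baseline = sum(baseline_samples) / len(baseline_samples)
    return [r - baseline for r in response]


def normalize_electrode(condition_responses, times_ms):
    """Normalize the average responses of one electrode.

    condition_responses maps a condition name to that electrode's average
    response (a list of samples). All conditions are divided by the same
    factor, the largest baseline-corrected response in any condition, so
    relative differences between conditions are preserved.
    """
    corrected = {name: subtract_baseline(resp, times_ms)
                 for name, resp in condition_responses.items()}
    peak = max(max(resp) for resp in corrected.values())
    return {name: [r / peak for r in resp] for name, resp in corrected.items()}


# Toy example with made-up numbers.
times = [0, 10, 20, 30, 40, 50]
normalized = normalize_electrode(
    {"figure": [1, 1, 1, 2, 5, 4], "ground": [1, 1, 1, 2, 3, 2]}, times)
```

In the toy example the figure peak is twice the ground peak both before and after normalization, which is the sense in which relative differences between conditions are maintained.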
Results
Both monkeys performed the delayed detection task well. The percentage correct was 93% (U) and 89% (T) in the visually guided condition, and 88% (U) and 84% (T) in the memory-guided condition. The mean reaction time (mean ± standard deviation [SD]) was 339 ± 105 ms (U) and 353 ± 133 ms (T) in the visually guided condition, and 319 ± 106 ms (U) and 331 ± 118 ms (T) in the memory-guided condition. While the animals performed this task we recorded multiunit activity in the primary visual cortex.
First we grouped the neural data from the figure condition (see Fig. 1) into 2 groups. Groups were formed according to the saccadic reaction time, so that each group contained approximately an equal number of trials. The saccade latency was determined for each trial, and the neural data placed in the fast or slow saccade latency group. We then compared the average neural responses of these 2 groups. The strength of the average neural responses in the visually guided condition was similar for the fast and slow reaction time group (Fig. 2). However, a clear difference between the fast and slow reaction time group was noticed in the memory-guided condition (Fig. 2). In this condition, the average neural response in the fast reaction group was stronger than the average neural response in the slow reaction time group.
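The grouping into fast and slow trials amounts to a split at the median reaction time. A minimal sketch, assuming each trial is summarized by a reaction time and a single response strength (the function names and the numbers are illustrative only):

```python
def split_fast_slow(trials):
    """Split trials into fast and slow halves by saccadic reaction time.

    trials is a list of (reaction_time_ms, response_strength) pairs.
    With an odd number of trials the slow group receives the extra trial.
    """
    ordered = sorted(trials, key=lambda trial: trial[0])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def mean_strength(group):
    """Average response strength of a group of trials."""
    return sum(strength for _, strength in group) / len(group)


# Made-up trials: (reaction time in ms, normalized response strength).
trials = [(350, 0.50), (280, 0.75), (410, 0.25), (300, 0.75), (390, 0.50)]
fast, slow = split_fast_slow(trials)
```

Comparing `mean_strength(fast)` with `mean_strength(slow)` then gives the group difference described above.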
To further establish whether a correlation exists between V1 activity—in particular the late modulated response—and saccade latency, we calculated the strength of the average neural population responses as a function of saccadic reaction time. We first divided the neural data into 6 consecutive saccadic reaction time groups of 25 ms bin size each (varying the number of bins and bin size gave comparable results). We then calculated the average strength of the late neural responses for all 6 reaction time groups. To select the late modulated responses, we used a window of 85–135 ms for both animals. This window starts when modulated responses to the first figure-ground texture occur (∼80 ms) and ends before the start of the visual responses to the second texture (∼135 ms for monkey U). Finally, to determine whether there is a relation between the strength of the late V1 responses and saccadic reaction time, for each time window we plotted the 6 data points of the average neural responses against the reaction time and fitted a linear regression line (Fig. 3A). As expected, no significant correlations were observed in the visually guided condition (R = 0.08, P = 0.87). However, responses showed a clear negative correlation function (R = −0.80, P = 0.04) in the memory-guided condition. Thus, the strength of V1 responses predicts the moment of memory-guided saccades, but not that of visually guided saccades.
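The binning and regression steps can be illustrated as below. This is a sketch under stated assumptions: the lower edge of the first reaction time bin (`first_edge_ms`) is not given in the text and is a free parameter here, the late window is taken as start-inclusive and stop-exclusive, and all function names are hypothetical.

```python
import math


def window_mean(response, times_ms, start_ms=85, stop_ms=135):
    """Mean response strength inside the late window."""
    samples = [r for r, t in zip(response, times_ms) if start_ms <= t < stop_ms]
    return sum(samples) / len(samples)


def bin_by_reaction_time(trials, first_edge_ms, bin_ms=25, n_bins=6):
    """Group (rt_ms, late_strength) trials into consecutive reaction time bins.

    Returns one (mean_rt, mean_strength) point per non-empty bin.
    Trials outside the binned range are ignored.
    """
    bins = [[] for _ in range(n_bins)]
    for rt, strength in trials:
        index = int((rt - first_edge_ms) // bin_ms)
        if 0 <= index < n_bins:
            bins[index].append((rt, strength))
    points = []
    for members in bins:
        if members:
            mean_rt = sum(m[0] for m in members) / len(members)
            mean_strength = sum(m[1] for m in members) / len(members)
            points.append((mean_rt, mean_strength))
    return points


def linear_fit(points):
    """Least-squares slope and intercept, plus Pearson's R, for (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)
```

A negative slope and a negative R from `linear_fit` correspond to the pattern reported for the memory-guided condition, in which stronger late responses accompany shorter reaction times.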
A potential concern was that the position (Sharma and others 2003) or movements (Martinez-Conde and others 2000; Snodderly and others 2001) of the eyes could differ in some subtle respect during fixation. This could cause differences in neural response. To control for such differences in fixation behavior, we analyzed the eye movements in the 6 reaction time groups. We calculated the SD of the x and y coordinates of the eye position during the interval from 0 to 1000 ms of each trial. The higher this value, the less accurately fixation was maintained. Examples of fixations in memory-guided trials are shown in Figure 4. They are superimposed on 25 other randomly chosen fixations of the same reaction time group. The average SDs for the 6 reaction time groups showed no correlation with reaction time. Correlation strengths with reaction time were as follows: horizontal eye movements: R = 0.26 in the visually guided condition, R = 0.06 in the memory-guided condition; vertical eye movements: R = 0.20 in the visually guided condition, and R = 0.03 in the memory-guided condition. This result shows that any differences in neural response between different reaction time groups cannot be attributed to differences in fixational eye movements. It confirms our earlier analyses and controls (Supèr and others 2004), which showed that eye movements do not contribute to the responses obtained with these stimuli.
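The fixation measure is the spread of eye position within a trial. A minimal sketch, assuming eye position is available as (x, y) samples in degrees; whether the population or the sample SD was used is not stated, and the population form is assumed here.

```python
import statistics


def fixation_spread(eye_samples):
    """Return (sd_x, sd_y) of eye position over one trial's fixation interval.

    eye_samples is a list of (x_deg, y_deg) positions covering 0 to 1000 ms
    after stimulus onset. Larger values indicate less stable fixation.
    """
    xs = [x for x, _ in eye_samples]
    ys = [y for _, y in eye_samples]
    return statistics.pstdev(xs), statistics.pstdev(ys)


# Made-up samples: horizontal jitter of 0.2 degrees, no vertical movement.
sd_x, sd_y = fixation_spread([(0.0, 0.0), (0.2, 0.0), (0.0, 0.0), (0.2, 0.0)])
```

Averaging these per-trial values within each reaction time group gives the quantities that were correlated with reaction time.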
Thus far, we have analyzed the responses to figure. We also calculated the correlation function of the responses to ground. Figure trials correspond to saccades toward a stimulus inside the receptive field. Ground trials correspond to saccades toward a stimulus outside the receptive field. Therefore, an analysis of responses to ground will show whether correlations are spatially specific or whether they occur irrespective of the saccade direction. No significant correlations were observed for the ground condition (Fig. 5A; visually guided condition: R = −0.01, P = 0.98; memory-guided condition: R = −0.37, P = 0.46).
Besides the analysis of the average population responses, we calculated the slope of the correlation function for each animal and for each individual recording site. We then plotted the distribution. In the visually guided condition, the distribution of the slopes for the figure (Fig. 3B) and ground (Fig. 5A) trials was not significantly different from zero (figure condition: P = 0.13; ground condition: P = 0.36; 2-tailed t-test). In the memory-guided condition, most sites (∼90%) showed a negative slope, that is, a negative correlation function, for the figure trials (Fig. 3B) but not for the ground trials (Fig. 5A; figure trials: P < 10⁻⁶; ground trials: P = 0.05; 2-tailed t-test). Such spatial specificity of the predictive activity in the memory-guided condition was supported by the finding of a strong correlation between the figure-ground signal and saccadic reaction time (Fig. 5B; R = −0.80, P = 0.05), in which most sites showed a negative correlation function (∼84%, P < 10⁻⁴, 2-tailed t-test). The figure-ground signal is the difference between the responses to figure and the responses to ground. No significant correlation function was observed (R = 0.07, P = 0.80) for the figure-ground signal in the visually guided condition. Thus, late modulated V1 responses that predict saccade latencies specifically occur at the figure location, that is, the saccade target location.
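The per-site analysis can be sketched as follows, assuming one regression slope per recording site is already available. The function names are hypothetical, and only the t statistic is computed; converting it to a 2-tailed P value requires the t distribution, which is not part of the Python standard library.

```python
import math
import statistics


def figure_ground_signal(figure_response, ground_response):
    """Figure-ground signal: response to figure minus response to ground."""
    return [f - g for f, g in zip(figure_response, ground_response)]


def one_sample_t(values, reference=0.0):
    """t statistic and degrees of freedom for mean(values) against reference."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - reference) / standard_error, n - 1


def fraction_negative(slopes):
    """Fraction of recording sites with a negative slope."""
    return sum(1 for s in slopes if s < 0) / len(slopes)


# Made-up slopes for 4 sites, for illustration only.
site_slopes = [-0.4, -0.2, -0.3, 0.1]
t_value, dof = one_sample_t(site_slopes)
```

With the real data there would be one slope per electrode and animal, and the t statistic would be evaluated against a t distribution with `dof` degrees of freedom.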
Thus, late V1 responses predict saccadic reaction time, even though the actual eye movement is made ∼1 s later. We then wanted to know how the predictive activity progresses over time during the trial. For all 6 reaction time groups, we calculated the average population strength of neural responses in consecutive 10-ms time windows, starting from stimulus onset. We determined the correlation strength as described above for each time window. Such an analysis gives the temporal evolution of correlation strengths between V1 responses and saccadic reaction times. Figure 6 shows these correlations over time. The dots represent a significant (P < 0.05) correlation per time window for responses to figure and to ground separately.
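The time-resolved analysis repeats the same correlation in each consecutive window. A minimal sketch, assuming one average response trace and one mean reaction time per reaction time group (names and data layout are illustrative, and significance testing is omitted):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def correlation_over_time(group_rts, group_traces, times_ms, window_ms=10):
    """Correlate response strength with reaction time in consecutive windows.

    group_rts holds one mean reaction time per group; group_traces holds one
    response trace per group, sampled at times_ms relative to stimulus onset.
    Returns a list of (window_start_ms, r), skipping windows without samples.
    """
    results = []
    start = 0
    while start <= max(times_ms):
        indices = [i for i, t in enumerate(times_ms)
                   if start <= t < start + window_ms]
        if indices:
            strengths = [sum(trace[i] for i in indices) / len(indices)
                         for trace in group_traces]
            results.append((start, pearson_r(strengths, group_rts)))
        start += window_ms
    return results
```

Each returned pair corresponds to one point on a curve such as those in Figure 6.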
Strong negative correlations between figure responses and saccadic reaction times were observed for the memory-guided condition (average correlation strength: R = −0.82 ± 0.28; mean ± SD), especially during the delay period (average: R = −0.92 ± 0.06; mean ± SD). No significant average correlations were observed for the ground trials (0.14 ± 0.45; mean ± SD). However, for 1 animal (U) the correlations became significant at the end of the trial. Similarly, no significant correlations were observed at any time point during the trial in the visually guided condition (average correlation strength: figure trials, R = 0.21 ± 0.34 and ground trials, 0.06 ± 0.29; mean ± SD).
The correlations between neural response and saccadic reaction time became significant at 89 ± 30 ms (mean ± SD) in the memory-guided condition. This time is similar to the moment when neural responses to figure segregate from ground responses (Lamme 1995; Lamme and others 1999; Supèr and others 2003a). This is exemplified in Figure 7, which shows the average population responses to figure and ground (Fig. 7A) and their differences (Fig. 7B). Figure 7B also shows the differences between the same figure responses, grouped according to fast and slow saccadic reaction time, as described above. The difference in response strength between the fast and slow reaction time group becomes apparent at the same time as figure responses segregate from ground responses. This result indicates that predictive V1 activity starts when neural signals distinguish between figure and background line segments and continues while the animal is remembering the figure location.
Previous studies have demonstrated that presaccadic activity predicts the direction and latency of the eye movement in motor-related structures (see Schall and Thompson 1999) and in V1 (Supèr and others 2004). Does such motor-related activity in V1 depend on perceptual information? To explore a possible neural link between perception and motor activity, we aligned the figure and ground responses of the memory-guided condition to the onset of the saccade and calculated the strength of the figure-ground activity. We then computed the correlation function per recording site as described above but for presaccadic (=100 ms window before the start of the saccade) figure-ground responses. These results showed that the predictive figure-ground activity in the memory condition continued until the onset of a saccade (Fig. 8A,B; R = 0.96, P < 10⁻³). Most recording sites (87%) showed a negative correlation function (Fig. 8C). Furthermore, enhancement of the figure-ground signal before the start of the eye movement was noticed both in the visually guided and memory-guided condition. The strength of this presaccadic enhancement of figure-ground activity showed a significant positive correlation with the initial (i.e., 85- to 135-ms period) strength of figure-ground segregation (Fig. 9; R = 0.66, P = 0.005) in the memory-guided condition. However, this was not the case in the visually guided condition (R = 0.28, P = 0.34). Thus, the strength of the perceptual signal (=figure-ground responses to first figure-ground texture) is predictive of the strength of the motor-related signal.
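Aligning responses to saccade onset means measuring the signal in a window that ends at each trial's own saccade time. A minimal sketch, assuming saccade onset is expressed on the same time axis as the signal (for example, cue time plus reaction time); the function name and the toy numbers are hypothetical.

```python
def presaccadic_mean(signal, times_ms, saccade_onset_ms, window_ms=100):
    """Mean of signal in the window_ms immediately before saccade onset."""
    samples = [s for s, t in zip(signal, times_ms)
               if saccade_onset_ms - window_ms <= t < saccade_onset_ms]
    return sum(samples) / len(samples)


# Toy figure-ground signal sampled every 50 ms; the cue comes at 1000 ms and
# the saccade follows 300 ms later, so saccade onset is at 1300 ms.
times = list(range(0, 1400, 50))
signal = [t / 1000 for t in times]
strength = presaccadic_mean(signal, times, saccade_onset_ms=1000 + 300)
```

The resulting presaccadic strengths can then be correlated with reaction time, or with the strength in the initial 85- to 135-ms window, in the same way as the stimulus-aligned responses.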
Discussion
In this study we showed that the strength of late activity in the primary visual cortex predicts the time of a saccadic eye movement in a delayed figure-ground detection task. Stronger neural responses correlate with an earlier saccadic eye movement. This predictive activity in V1 is observed at the saccade target location for memory-guided saccades. However, it is not seen at nonsaccade target locations or for visually guided saccades. The predictive signal starts at the same moment when responses to figure (=saccade target location) segregate from responses to ground (nonsaccade target location) and continues until the onset of the saccade. Over time, the predictive value remains constant. Besides saccadic latency, V1 responses also predict the direction of the saccade, because responses to figure are stronger than responses to ground.
Origin of Predictive Responses in V1
These results cannot be explained by the visual stimulus. Responses in sensory areas are largely defined by the properties of the sensory stimulus, for example, preferred stimuli give stronger neural responses than nonpreferred stimuli. However, in this study the average classical receptive field stimuli are identical in the visually guided and memory-guided conditions (see Methods). Nor can this predictive activity be a result of arousal or vigilance simultaneously causing stronger neural responses and faster behavioral responses. The difference in correlation strength between the memory and visually guided condition excludes this possibility because the visually guided and memory-guided trials are randomly interleaved. Expectancy of reward can modulate behavior and sensory responses (Watanabe and Hikosaka 2005; Shuler and Bear 2006). However, in our study reward was constant. Thus, reward expectancy cannot explain the present results.
Attention, however, may be a relevant factor. Frontal areas may provide an attention signal to visual cortical areas (Moore and Armstrong 2003). Here, attention produces stronger neural responses (Treue 2001). In our study, possible attention effects should start when late modulated responses appear, and continue throughout the trial. Specifically, such attention effects should be maintained in the memory-guided condition and not in the visually guided condition. Thus, differences in attention can explain the observed results if attention acts differently in the memory-guided condition than in the visually guided condition.
Alternatively, the predictive activity in V1 may represent an efferent copy of motor signals from higher areas in the parietal and frontal cortex, which control saccadic eye movements. However, neural activity in parietal or frontal areas that predicts reaction time is observed during the presaccadic period. It is postperceptual, that is to say, it starts after the perception of the target stimulus. Typically, such responses occur 100–200 ms before the initiation of the saccade. Build-up activity in the superior colliculus also starts a few hundred milliseconds before saccade onset (Dorris and Munoz 1998). We therefore believe that the correlations of figure-ground activity in V1 with saccadic reaction times are not a result of an efferent copy of a planned eye movement signal from motor-related areas, although we cannot currently discount this possibility. On the contrary, the predictive activity in motor-related areas may be a result of the perceptual signal in lower visual areas (see below).
Another possible interpretation of the findings can be made if we assume that figure-ground activity is the internal evidence of the stimulus. The oculomotor system uses internal evidence of the stimulus, which in our case is the figure-ground signal, to control saccadic behavior. The figure is only briefly presented in the memory-guided condition, and the animal needs to remember the figure location. This means that the internal evidence of the stimulus at the moment of the saccade, that is, the memory trace, is dependent on the initial perception of the figure, that is, the initial figure-ground segregation (Supèr and others 2001b; present results). In the memory-guided condition, therefore, a correlation between the strength of late V1 activity and the behavioral responses can be found. In the visually guided condition, the relevant visual information (=textured figure) persists throughout the trial. This means that the internal evidence of the stimulus can be refreshed continuously throughout the trial. In this case, no dependence on the initial figure-ground signal emerges. Thus, a correlation between the initial figure-ground signal (=figure-ground perception), presaccadic figure-ground signal (=memory trace of figure), and behavioral responses (=saccades) will exist in the memory-guided but not in the visually guided condition. Indeed, this is what we observed.
Role of V1 in Visuomotor Integration
In a figure-ground task, feature extraction occurs during the initial response period when V1 neurons receive their receptive field information. The late response period is influenced by the context of the receptive field stimulus. During this phase figure responses are enhanced compared with background responses (=figure-ground activity). These late modulated responses represent the figure as such and not the line elements that make up the figure or the boundaries of the figure (Lamme 1995; Lamme and others 1999; see also Corthout and Supèr 2004). This late part of V1 responses also represents the neural correlate of figure-ground perception. It is present when the stimulus is perceived and absent when the stimulus is not perceived (Supèr and others 2001a). It is not observed in anesthetized animals (Lamme and others 1998). Furthermore, figure-ground activity in V1 only continues after the removal of the stimulus when the stimulus is relevant to the animal (present results; Supèr and others 2001b). Therefore, on the basis of these observations, we suggest that the visual system uses late modulated activity in V1 as the internal evidence of the stimulus.
Like other visual- and motor-related areas, V1 projects to the superior colliculus (Finlay and others 1976; Fries 1984). The projection originates from deep layers of V1 and terminates in the visual layers of the superior colliculus. The function of the V1 connection to this brainstem saccade generator is unknown. What is known is that the superior colliculus integrates visual and motor signals and that sensory information directs eye movements. For example, studies show that saccades are guided by objects rather than by the features that constitute the objects (Melcher and Kowler 1999; Hayhoe and Ballard 2005). Thus, bearing in mind the role of figure-ground activity in perception, we propose that V1 provides the internal evidence of the stimulus to guide saccades. This conclusion agrees with recent microstimulation studies in V1 that show interference with visual detection (Tehovnik and others 2002; Slocum and Tehovnik 2004).
Together with other visual areas (Sheinberg and Logothetis 2001; Mazer and Gallant 2003), V1 may give input to a saliency map that guides oculomotor planning. Such a saliency map needs to have a high spatial resolution because the spatial precision of gaze shifts is remarkable. Gaze changes of 20° or more in magnitude have a precision of a few degrees (Aivar and others 2005). Neurons in higher visual areas like V4 and TE and in the motor-related areas FEF and LIP have large receptive fields, although some receptive fields shrink before a saccadic eye movement (Tolias and others 2001). Such poor spatial resolution makes these areas unsuitable for guiding accurate saccadic eye movements, especially in natural and crowded scenes. The small receptive fields of V1 neurons, however, offer a map with good spatial resolution. We propose that the saliency network makes use of the fine spatial information of V1 to guide saccadic eye movements.
Figure-Ground Activity and Saccadic Reaction Times
It is conjectured that figure-ground activity in V1 depends on horizontal connections within V1 and feedback connections from higher visual areas (Payne and others 1996; Wang and others 2000). The occurrence of figure-ground activity thus depends on recurrent interactions. These observations are supported by transcranial magnetic stimulation (TMS) studies in humans, which show that visual perception critically depends on recurrent processes between visual areas (Pascual-Leone and Walsh 2001; Corthout and others 2003; Clavagnier and others 2004; Juan and others 2004; Silvanto and others 2005). As a consequence, the strength of figure-ground responses to an identical stimulus may vary over time (Supèr and others 2003b). Previously, we used a decision-making task to show that the variability in the strength of figure-ground responses correlates with saccadic reaction times (Supèr and others 2003b). Similar observations have been made in the middle temporal (MT) visual area. The MT area provides neural evidence of perception of motion stimuli (Britten and others 1992, 1996). The strength of the MT signal, which is highly variable, correlates with behavioral reaction time (Cook and Maunsell 2002). Thus, perceptual signals arise within visual areas and predict saccadic reaction times.
However, the moment of a saccade is determined by the build-up of activity in motor-related structures during the motor preparation stage (Hanes and Schall 1996; Shadlen and Newsome 1996; Thompson and others 1996). Neural activity grows in time in these motor areas until it reaches a threshold and initiates a saccade. It has been proposed that the motor structures read out and accumulate the sensory evidence of the stimulus during this period (Groh and others 1997; Shadlen and Newsome 2001; Smith and Ratcliff 2004). Therefore, it can be argued that motor neurons read out the figure-ground signal in our figure-ground detection task. As a result, a strong figure-ground signal will produce a rapid growth of build-up activity in the motor-related structures. The threshold will be reached earlier, which will lead to a fast reaction time. This idea would explain our observations.
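The rise-to-threshold argument can be made concrete with a toy model. The sketch below assumes a noiseless linear accumulator whose rate is proportional to the figure-ground signal; the gain, the threshold, and the linear mapping are illustrative assumptions and were not fitted to the present data.

```python
def rate_from_evidence(figure_ground_strength, gain=0.01):
    """Hypothetical linear mapping from sensory evidence to build-up rate per ms."""
    return gain * figure_ground_strength


def time_to_threshold(rate_per_ms, threshold=1.0, start_level=0.0):
    """Time in ms for linearly rising activity to reach the saccade threshold."""
    if rate_per_ms <= 0:
        raise ValueError("build-up rate must be positive to reach the threshold")
    return (threshold - start_level) / rate_per_ms


# A stronger figure-ground signal reaches the threshold sooner.
fast_rt = time_to_threshold(rate_from_evidence(0.5))   # about 200 ms
slow_rt = time_to_threshold(rate_from_evidence(0.25))  # about 400 ms
```

Under these assumptions, halving the figure-ground signal doubles the time to threshold, which is the qualitative relation between response strength and reaction time described above.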
We propose that figure-ground activity in the primary visual cortex provides the oculomotor system with the perceptual information for making the decision of when and where to look. Variability in sensorimotor reaction times is therefore determined by both the motor and the sensory end of the process.
We thank Kor Brandsma and Jacques de Feiter for biotechnical support. H.S. is supported by a Netherlands Organisation for Scientific Research Program Grant. Conflict of interest: None declared.