Attention is known to play a key role in perception, including action selection, object recognition and memory. Despite findings revealing competitive interactions among cell populations, attention remains difficult to explain. The central purpose of this paper is to link up a large number of findings in a single computational approach. Our simulation results suggest that attention can be well explained on a network level involving many areas of the brain. We argue that attention is an emergent phenomenon that arises from reentry and competitive interactions. We hypothesize that guided visual search requires the use of an object-specific template in prefrontal cortex to sensitize V4 and IT cells whose preferred stimuli match the target template. This induces a feature-specific bias and provides guidance for eye movements. Prior to an eye movement, a spatially organized reentry from oculomotor centers, specifically the movement cells of the frontal eye field, occurs and modulates the gain of V4 and IT cells. The processes involved are elucidated by quantitatively comparing the time course of simulated neural activity with experimental data. Using visual search tasks as an example, we provide clear and empirically testable predictions for the participation of IT, V4 and the frontal eye field in attention. Finally, we explain a possible physiological mechanism that can lead to non-flat search slopes as the result of a slow, parallel discrimination process.
Experiments investigating object detection and attention indicate that sets of cells encoding object features compete with one another in parallel. Chelazzi et al. (1993, 1998) assume that such a competition can be resolved by a feature-specific bias from working memory. Similarly, the feature-similarity framework (Treue and Martínez Trujillo, 1999) suggests that feedback implements a parallel feature-based gain control. Other work has revealed that a spatial bias can also resolve competition among cells (Luck et al., 1997; Reynolds et al., 1999).
Computational models have shown that interactions within a network can lead to attentive effects (Mumford, 1992; Tononi et al., 1992; Hamker, 1999; Kirkland and Gerstein, 1999; Corchs and Deco, 2002; Knoblauch and Palm, 2002). Specifically, we have recently shown that a global feature-specific bias can guide spatial selection by feedback within the ventral pathway (Hamker, 2004b). According to our model, a target template in prefrontal areas enhances the gain of cells in IT and V4 and facilitates processing of the features that are to be detected. The origin of a spatially selective bias, however, is rather unclear. Among others, the lateral intraparietal area (Bisley and Goldberg, 2003), the superior colliculus (Ignashchenkova et al., 2004) and the frontal eye field (FEF) (Bichot and Schall, 1999a) have been suggested to implement spatial attention. Inspired by the latter findings, we designed a computational model in which spatial attention emerges by reentry from the FEF, and showed that the temporal course of IT cell activity fits with some data of Chelazzi et al.'s (1993, 1998) experiment (Hamker, 2001, 2002, 2003). Further evidence in favor of the FEF has been given by Moore and Armstrong (2003), who have shown that the gain of V4 cells can be modified by a brief stimulation of FEF neurons. Assuming the FEF is indeed directly involved in spatial attention, the FEF could implement a gain modulation in V4 in two ways. Movement and visuomovement cells exhibit target selection and both could be the source of a reentry signal. A visual selection model and a movement preparation model have been proposed. The visual selection model predicts that target selection in the visuomovement cells provides the focus of attention (Thompson et al., 1997; Murthy et al., 2001; Sato and Schall, 2003). Alternatively, the movement plan model predicts that the activity of movement cells provides a spatial reentry signal (Hamker, 2003). 
At present there are no conclusive data favoring one model over the other.
In order to shed more light on the function and predictions of the movement plan model, the present paper focuses on a comparison of the movement plan model with a range of experimental data. We demonstrate that the reentry signal of the movement plan model is consistent with other conditions tested in Chelazzi et al.'s (1998, 2001) visual search experiment and with data from a conjunctive visual search task (Bichot and Schall, 1999b). Alternative models are shown to be less consistent with the data of Chelazzi et al. (1998). We further show that the model exhibits target selection in the visuomovement cells similar to FEF data in an eye movement task (Sato et al., 2001), although the reentry signal originates from movement cells. In order to obtain a model with predictive power we (i) put much emphasis on the selection of areas involved in the visual search task; and (ii) constrain our model to match the typical temporal course of activity of cells in all implemented areas.
Our simulations result in novel and specific predictions, one of the most relevant being that the latency of the spatial reentry depends on the degree of the target–distractor discrimination. This finding has strong implications for the emergence of search slopes in visual search experiments.
Materials and Methods
Memory-guided Search Task
We simulate the memory-guided search task used by Chelazzi et al. (1998). If the sample reappears in the search array, the condition is called ‘Target Present’ (Fig. 1). The result is a ‘saccade to the good stimulus’ if we observe a cell that is strongly driven by the cue. Let us now assume that we observe the same cell but present a cue stimulus that does not drive the cell very well. In this case the outcome is denoted ‘saccade to the poor stimulus’, since the chosen stimulus is a poor stimulus for the observed cell. If the good and the poor stimulus are within the search array and we present the poor stimulus as the cue, we observe distractor suppression. In the ‘Target Absent’ condition the cue stimulus is different from the stimuli in the choice array. In this case the saccade has to be withheld.
Outline of the Model Proposed
We identified and constructed a network of relevant brain areas that are sufficient to perform the visual search task in Chelazzi's experiment (Fig. 2). Our model consists of ascending populations called ‘stimulus cells’ that can be primed by feedback connections, and descending populations of ‘target cells’ that project dominant patterns back into the source areas. In brief, the proposed dynamics of perception are as follows. Massive feedback projections within the ventral pathway implement a gain control in order to transfer target information represented in ‘higher’ areas to intermediate areas (V4). These intermediate areas drive the FEF and lead to target discrimination in visually responsive cells. By way of reentry into extrastriate visual areas from FEF movement cells, neurons in V4 and IT that have their receptive fields at the location of an intended eye movement increase their sensitivity and gain an additional advantage in competition.
We now describe the central gain control mechanism which determines the interaction of different areas, followed by an explanation of the different model areas. A mathematical description of the model can be found in Appendix I.
Mechanisms of Interaction between Brain Areas
The selectivity of each cell is defined by its location i ∈ N in the population, and its activity r_i reflects the conspicuity of its preferred stimulus. Each cell is simulated by an ordinary differential equation (equation 1) that governs its average firing rate over time. Thus, using the model we are able to observe the temporal change of activity induced by a reentry signal.
Consistent with recent findings (Hupé et al., 2001), we model the influence of reentry as a gain control mechanism on the feedforward signal. In abstract terms, the reentry signal represents the expectation
Altogether, the change of activity of a cell i is a function of input
Given an identical input, the timing of reentry determines the change of activity of a target cell (Fig. 4). The difference of responses prior to 100 ms solely depends on influence from IT. After that, the FEF starts to weakly modulate the response. A strong modulation from the FEF does not occur prior to 150 ms.
Our gain control mechanism forms the core of the system, in that it defines how areas at different hierarchical levels interact with each other in a continuous fashion.
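To illustrate how the timing of a multiplicative reentry signal can shape a cell's response under identical input, the following minimal sketch integrates a generic leaky rate unit. It is not equation 1 of the model (which is given in Appendix I); the function name, the form tau * dr/dt = -r + input * (1 + gain * reentry) and all parameter values are assumptions chosen only for illustration.

```python
def simulate_cell(inp, reentry, tau=10.0, gain=2.0, dt=1.0):
    """Euler-integrate tau * dr/dt = -r + inp[t] * (1 + gain * reentry[t]).

    Hypothetical stand-in for a gain-controlled rate unit; the reentry
    signal scales the feedforward input and cannot drive the cell alone.
    """
    r = 0.0
    trace = []
    for x, fb in zip(inp, reentry):
        drive = x * (1.0 + gain * fb)  # multiplicative gain on the input
        r += dt / tau * (-r + drive)   # one Euler step of the leaky unit
        trace.append(r)
    return trace


steps = 300                      # 1 step = 1 ms in this sketch
inp = [1.0] * steps              # identical input in both conditions
no_reentry = [0.0] * steps
late_reentry = [0.0 if t < 150 else 0.5 for t in range(steps)]

baseline = simulate_cell(inp, no_reentry)
modulated = simulate_cell(inp, late_reentry)
```

In this sketch the two traces are identical until the reentry signal switches on at step 150, after which the modulated response rises above the baseline even though the input is unchanged.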
Interactions among Model V4 Cells
The model V4 cells are driven by the input to the model and, consistent with known massive feedback projections in the ventral pathway (Rockland and van Hoesen, 1994; Rockland et al., 1994), are modulated by IT. Another source of top-down influence seems to have its origin in the oculomotor circuit (Moore, 1999; Moore and Fallah, 2001; Tolias et al., 2001), in particular the FEF (Moore and Armstrong, 2003). We suggest that FEF movement cells modulate the gain of cells in V4 and IT (Hamker, 2003). Although retrograde labeling by tracers has revealed connections from layer 5 in the FEF, which contains movement cells, to extrastriate visual areas (Schall et al., 1995a), there is no direct evidence for the assumption that the movement cells are responsible for gain control.
The V4 used in our model is consistent with a range of experimental findings (Hamker, 2004a): if the receptive field contains just one stimulus, then a spatial bias results in a multiplicative gain increase. This has been observed in MT, MST and V4 (Treue and Maunsell, 1999; McAdams and Maunsell, 1999). If two stimuli are presented within the same receptive field, then the model V4 reproduces the data of Reynolds et al. (1999): a bias towards one stimulus reduces the influence of the other stimulus within the receptive field. We explain these attention effects by an input gain increase and additionally by an indirect inhibition among active populations.
Interactions among Model IT Cells
Consistent with the large receptive fields of IT neurons, our model IT cell population receives converging input from all V4 populations (Fig. 2). Elevated baseline activity in IT cells (Tanaka et al., 1991; Miller et al., 1993; Chelazzi et al., 1993, 1998) is likely to originate in the prefrontal cortex (Tomita et al., 1999). Consistent with this finding, model prefrontal areas provide feedback into IT (Fig. 2). Since FEF projects to TEO (Schall et al., 1995a) the input gain in IT is also affected by model FEF movement cells (Fig. 2). We use the same model for IT as we do for V4, but our IT cells have stronger lateral inhibition.
Task Control by PF Cells
The prefrontal cortex has been extensively studied in recordings around the principal and arcuate sulci, i.e. areas 8, 46 and 45 (Miller and Cohen, 2001) and is known to participate in the coordination of tasks (White and Wise, 1999; Asaad et al., 2000; Hasegawa et al., 2000; Miller and Cohen, 2001; Tanji and Hoshi, 2001). Areas 8 and 46, which overlap the frontal eye field, are often reported to code location- and motor-related signals, while area 45 is involved in categorization and feature detection (Freedman et al., 2001). Prefrontal cortex might apply a modulation over other areas in order to alter the mapping from perception to action (Miller and Cohen, 2001). Extending this concept, we show that prefrontal modulation can change the internal state of the system. One aspect of this control function is often referred to as working memory, while another is the detection of a match between object and sample in a delayed match-to-category task (Freedman et al., 2002). Our model prefrontal cortex fulfills these two major functions, encoding a pattern in PF working memory cells and indicating a match of the incoming pattern with the memorized pattern in PF match cells. Thus, IT cells can only drive PF match cells when their pattern matches the expectation from PF working memory cells (Fig. 2).
Saccade Target Selection by FEF Cells
The FEF has connections to occipital, temporal and parietal areas, the thalamus, superior colliculus, and prefrontal cortex (Stanton et al., 1988, 1993; Schall et al., 1995a). The FEF can be subdivided into lateral and medial parts.
The lateral FEF, which generates short and precise saccades (Bahill et al., 1975), is connected to the dorsal (LIP, MT, MST, V3) and ventral (TEO, V4, V2) pathways, the ventrolateral prefrontal cortex (Baizer et al., 1991; Schall, 1995; Schall et al., 1995a; Stanton et al., 1995), and the superior colliculus (Sommer and Wurtz, 2000). The projections from V2 and V3 are weak, while the one from V4 is intermediate. Strong projections from TEO, MT and MST suggest that the FEF uses features after several stages of processing for target selection (Webster et al., 1994; Schall et al., 1995a).
Our model is consistent with this anatomy. FEF neurons receive convergent afferents from features across all dimensions in V4 at the same retinotopic location. Since anterior IT cortex, the area from which Chelazzi et al. (1998) recorded, does not project directly to FEF, we do not model any input to this area from IT.
The neurons in the FEF can be categorized based on both their responses to visual stimuli and to saccade execution into visual, visuomovement, fixation and movement cells (Bruce and Goldberg, 1985; Schall et al., 1995b). We consider visuomovement, fixation and movement cells (Fig. 2), and also model their temporal dynamics: visuomovement cells in deep layers are active from stimulus onset until saccade execution. Typically their initial response does not distinguish between distractor and target, but the activity decays when a distractor is in the receptive field (Schall et al., 1995b). Movement cells are active prior to saccades and do not show any response to stimulus onset (Hanes et al., 1998). Fixation cells decrease their activity before a saccade and increase their firing rate after the saccade or to terminate a planned eye movement (Hanes et al., 1998). Movement-related cells in the FEF show a fixation-disengagement discharge (Dias and Bruce, 1994), which indicates that fixation cells inhibit movement cells (Burman and Bruce, 1997).
The decision to execute an eye movement or to withhold gaze is based on a threshold detection of the PF match cells. If the PF match cells fire, the target is detected in the search array and the movement cells are disinhibited by removing the input into the fixation cell (Fig. 2).
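The decision rule described above can be sketched as a simple gate. This is a minimal illustration under stated assumptions, not the model's implementation: the function names, the threshold value, the inhibition weight and the simplification that the fixation cell's rate equals its input are all hypothetical.

```python
def fixation_input(pf_match_rate, threshold=0.5):
    """Drive to the fixation cell (hypothetical units).

    The drive is removed once PF match activity reaches the threshold,
    i.e. once the target is detected in the search array.
    """
    return 0.0 if pf_match_rate >= threshold else 1.0


def movement_drive(visuomovement_rate, fixation_rate, w_inh=1.0):
    """Net drive to a movement cell: excitation minus fixation inhibition,
    rectified at zero. The fixation rate is taken to equal its input here.
    """
    return max(0.0, visuomovement_rate - w_inh * fixation_rate)


# No match detected: the fixation cell stays active and blocks movement cells.
withheld = movement_drive(0.8, fixation_input(pf_match_rate=0.2))
# Match detected: fixation input is removed and movement cells are disinhibited.
released = movement_drive(0.8, fixation_input(pf_match_rate=0.9))
```

With these illustrative values the movement drive is zero while the PF match activity is below threshold and equals the visuomovement input once the threshold is crossed.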
Sensory Interactions in IT During Visual Search
We now test the reentry hypothesis by comparing the firing rate of our IT stimulus cells with recordings in IT (Fig. 5). All of our simulations correlated well with the experimental data, even with regard to the time course of competition.
When an array containing both the good and the poor stimuli is displayed (Fig. 5a), each cell initially encodes the presence of its preferred stimulus, but nonetheless the target cell shows an early advantage. Between 150 and 300 ms the cells encoding the non-target get suppressed almost to baseline activity, whereas the cells encoding the target show a small dip but then increase to the same level of the initial activation or even exceed it.
When only the good stimulus is presented, the physiological data show no difference in activity between the target and non-target conditions before the execution of an eye movement (Fig. 5b). Our simulations show a slight attention effect in favor of the target, since spatial and feature feedback cannot be completely shut off. However, as the activity of a model cell increases, feedback becomes less efficient and thus the attention effect is smaller than in conditions when stimuli compete. The presentation of a poor stimulus alone leads to a suppression, since in contrast to the experiment, our chosen poor stimulus does not drive the cell encoding the good stimulus.
A crucial condition is the target-absent condition (Fig. 5c). If the good and the poor stimuli are presented, the responses decrease after the initial burst. We explain this observation based on a weak winner-take-all competition. In the target-absent condition the good and the poor stimuli receive no top-down bias; they suppress each other and self-excitation is not strong enough for one population to dominate the other. Since prefrontal areas do not indicate the presence of the target, none receives a significant reentry, and the firing rate of IT cells decreases to a limit above baseline activity.
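The idea of a weak winner-take-all competition can be illustrated with two mutually inhibiting rate units whose self-excitation is too small for either to dominate without a bias. The sketch below is a toy system, not the model IT population: the equations, weights and bias values are assumptions, and it does not reproduce the initial burst seen in the data.

```python
def compete(bias_a, bias_b, steps=500, tau=10.0, dt=1.0,
            w_self=0.2, w_inh=0.5):
    """Two rate units with weak self-excitation and mutual inhibition.

    Each unit receives unit input scaled by (1 + bias); all weights are
    hypothetical and chosen so that no winner emerges without a bias.
    """
    a = b = 0.0
    for _ in range(steps):
        in_a = max(0.0, 1.0 * (1.0 + bias_a) + w_self * a - w_inh * b)
        in_b = max(0.0, 1.0 * (1.0 + bias_b) + w_self * b - w_inh * a)
        # Update both units simultaneously with one Euler step.
        a, b = a + dt / tau * (-a + in_a), b + dt / tau * (-b + in_b)
    return a, b


absent_good, absent_poor = compete(bias_a=0.0, bias_b=0.0)    # no top-down bias
present_good, present_poor = compete(bias_a=0.5, bias_b=0.0)  # bias on unit a
```

Without a bias both units settle at the same reduced level, since they suppress each other and neither can take over. With a bias on one unit, that unit dominates and the other is suppressed close to zero.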
Figure 5d shows that the response to the good stimulus in the non-target condition is approximately halfway between the responses to the good stimulus and the poor stimulus in the target-present condition.
If we compare the stimulus alone with the two-stimulus array condition (Fig. 5e), we see that in both cases the good stimulus has almost the same activity around the time of the eye movement, although the activity in the good stimulus alone condition is initially stronger.
Our simulation replicates the temporal course of activity in the different conditions of the experiment from Chelazzi et al. (1998). This constraint allows us to make reliable predictions. Thus, we now explain the possible influence of the other simulated areas on the activity in IT.
Contribution of Other Areas to Visual Search
The good fit with the data in IT is only of value if we can demonstrate that the temporal course of activity in other model areas is consistent with experimental findings. Here, we restrict ourselves to the condition with a target and one distractor in the display (Fig. 6). The presentation of the cue elicits a response in IT cells, which is stored by working memory cells. Consistent with studies using a delayed match-to-sample task (Miller et al., 1996), elevated firing rates are visible during the delay. In addition, the temporal course of activity of the PF match cell is very similar to what has been observed in the prefrontal cortex during a delayed match-to-category task (Freedman et al., 2002).
The receptive field of the V4 cell shown in Figure 6 does not encompass the location of the cue. Our model predicts a baseline increase during the cue presentation only for those V4 cells that receive direct feature-selective feedback from IT. For other cells it predicts a slight suppression due to unspecific long-range inhibition. Consistent with this prediction, Chelazzi et al. (2001) report that 4.9% of V4 cells exhibit a significant baseline increase, while 67.9% are inhibited during cue presentation.
In order to guide eye movements, the information about the presence of the target encoded in IT has to be converted into the information about the target's location. We have shown that feedback from IT to V4 cells, which have smaller receptive fields, can provide information both about the features of the target and its location (Hamker, 2003). Thus, the model predicts an early target effect in V4. Consistent with this prediction, Chelazzi et al. (2001) found a slight early target effect in V4 cells, which is stronger when two stimuli are located within the same receptive field. Although this early attention effect is only small, it is remarkable since V4 is the second stage of feedback after TEO.
In the projection from V4 to FEF the neural firing pattern in V4 is averaged over dimensions and the feature specificity is lost. Thus, the initial feature-specific enhancement in IT is transferred via V4 and FEF into a location-specific advantage of some locations over others. The threshold detection in the PF match cells causes the FEF fixation cell to decrease in activity in order to plan an eye movement. Initially, FEF movement cells are able to gain activity regardless of whether they encode a target or non-target in their movement fields. This is supported by experimental data (Bichot et al., 2001b).
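The loss of feature specificity in the V4-to-FEF projection can be made concrete with a small sketch. It assumes, purely for illustration, that V4 is represented as one activity map per feature and that the FEF input at each location is the plain mean over these maps; the feature names and activity values are hypothetical.

```python
def fef_input(v4_maps):
    """Average V4 activity over feature maps at each retinotopic location.

    v4_maps: dict mapping a feature name to a list of activities, one per
    location. The result keeps location information but no feature labels.
    """
    n_loc = len(next(iter(v4_maps.values())))
    return [
        sum(m[loc] for m in v4_maps.values()) / len(v4_maps)
        for loc in range(n_loc)
    ]


# Hypothetical V4 state: location 0 holds the item whose features are
# enhanced by feature-specific feedback (here "red" and "vertical").
v4 = {
    "red":      [0.9, 0.1, 0.1],
    "green":    [0.1, 0.6, 0.1],
    "vertical": [0.8, 0.1, 0.6],
}
drive = fef_input(v4)
best_location = drive.index(max(drive))
```

The averaged map no longer says which feature was enhanced, only where the enhanced activity is, which is the sense in which a feature-specific advantage becomes a location-specific one.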
The time the model needs to select a target for an eye movement is variable. We notice that the latency of the eye movements increases with the set size (Fig. 7a). We observe a slope of 12 ms per item compared to 26 ms measured by Chelazzi et al. (1998). However, such a steep slope results from the fast response in the one-stimulus case. Consistent with the empirical data of Chelazzi et al. (1998), an eye movement is delayed for ∼40 ms when two stimuli are presented. Apparently, the processing of a target stimulus slows down when its selection occurs during a competition with distractors.
Consistent with FEF data (Hanes and Schall, 1996; Schall, 2002), the variability in search time can have two reasons in our model: variability in the growth rate of the movement activity, and variability in the onset of the movement cell activity. The observed set-size effect originates primarily in the variability of the growth rate of the movement cell activity. We find that it decreases with latency: the time for the behavioral response is highly correlated with the time span from target detection to action selection (Fig. 7b). The growth rate of the movement activity in turn depends on the target discrimination in the input as well as the overall strength of the input. The two-stimulus condition shows a better target discrimination as well as a stronger input (Fig. 8).
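The two sources of latency variability named above can be illustrated with a linear rise-to-threshold sketch, in which movement activity starts at an onset time and grows at a constant rate until it reaches a fixed threshold. This is a simplification for illustration only: the model's movement cells follow the dynamics in Appendix I, and the function name, threshold and rates below are assumptions.

```python
def saccade_latency(onset_ms, growth_rate_per_ms, threshold=1.0):
    """Latency of a saccade under a linear rise-to-threshold assumption.

    onset_ms: time at which movement activity begins to rise.
    growth_rate_per_ms: constant growth rate of movement activity.
    """
    if growth_rate_per_ms <= 0:
        raise ValueError("growth rate must be positive to reach threshold")
    return onset_ms + threshold / growth_rate_per_ms


# Same onset, different growth rates (hypothetical values).
fast = saccade_latency(onset_ms=120, growth_rate_per_ms=0.02)
slow = saccade_latency(onset_ms=120, growth_rate_per_ms=0.01)
# Same growth rate, later onset.
delayed = saccade_latency(onset_ms=150, growth_rate_per_ms=0.02)
```

In this sketch a lower growth rate lengthens the interval between onset and threshold crossing, whereas a later onset shifts the whole rise in time.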
In our model, the onset of the movement cell activity depends directly on the detection of the target's presence in PF match cells (target detection), as a result of the constraint to withhold an eye movement to a distractor. Thus, target detection also influences the set-size effect. In the present simulations, however, target detection begins at a fairly constant time, ∼120 ms after target presentation. Miller et al. (1996) measured an average match response of ∼110–120 ms in prefrontal cortex as well, while presenting just one object at a time. The reason why we find a constant target presence detection lies in the simple stimuli and the low-level feature space we use [the scenes used in the experiment of Chelazzi et al. (1998) were also relatively simple]. Consistent with our model, difficult scenes can result in a delay, or even failure, in detecting the presence of the target.
We have demonstrated that a movement plan model fits with the temporal course of activity in IT and V4 using the paradigm of Chelazzi et al. (1998, 2001). Usher and Niebur (1996) have shown target selection in IT with only a feature-specific bias. Alternatively, it was suggested that visuomovement cells in the FEF could select the target (Thompson et al., 1997; Sato and Schall, 2003). We simulated these alternative models as well, to shed more light on their limitations (Fig. 9). Since all models contain a bias, either feature-specific alone or an additional location-specific bias, we observe the trivial result that the responses to the good and the poor stimuli differ. We rate the simulation data against the following objectives. First, in the target-present condition the IT cells show a transient response to the good stimulus and increase in firing prior to the eye movement. Second, in the target-absent condition none of the behaviorally irrelevant stimuli gets selected. From the experiment of Chelazzi et al. (1998) we cannot rule out that attention is directed to non-target stimuli. However, since none of the stimuli receives a bias given by the instruction and the monkey has to hold fixation, we demand that noise in the neural responses alone should not result in the selection of a behaviorally irrelevant stimulus in response to the presentation of two stimuli. The parameters of all models are optimized separately to meet the objectives as well as possible for each model.
We simulated the model following the classical interpretation with a feature-specific bias from prefrontal cortex using a strong feedback from IT to V4 (Fig. 9a) and an intermediate feedback from IT to V4 (Fig. 9b). The strong feedback condition fulfills our first objective and shows an increase of activity prior to the eye movement, but it clearly fails to achieve the second one. The reduction of the weight of feedback from IT to V4 decreased the sensitivity to noise (second objective), but a reasonable bias from prefrontal cortex to IT does not sufficiently activate the response to the target prior to the eye movement. In general, any form of recurrent excitation is sensitive to noise. Thus, a strong excitatory loop within IT would select a behaviorally irrelevant stimulus as well.
Another alternative model for explaining the observation is a spatial reentry signal from the FEF visuomovement cells (Fig. 9c). In this model all locations receive a transient spatial bias due to stimulus onset, but since the visuomovement cells exhibit target discrimination (Fig. 8a), their activity can be sent back to V4 and IT to spatially select a stimulus. However, a reentry from the visuomovement cells shows difficulties meeting the objectives as well. We had to choose a weak gain for the spatial reentry signal, since otherwise noise results in the selection of a non-target. Even the strongest possible gain, which already slightly selects a non-target, did not allow for meeting the first objective.
We conclude that the target-present condition is difficult to explain entirely through the activation of a feature-specific top-down bias from prefrontal areas. A strong self-enhancement is sensitive to noise and thus predicts a winner in the non-target condition as well. A weak self-enhancement needs an additional strong (driving) bias. Visuomovement cells do not provide a good bias, since they are not decoupled from the early sensory processing and, thus, their bias is also sensitive to noise. A spatial reentry from movement cells is decoupled from direct sensory processing, since it requires the decision to plan an eye movement and so is not sensitive to noise.
We are careful not to rule out the alternative models definitively, since the data from Chelazzi et al. (1998) do not allow a quantitative analysis. However, we exposed obvious inherent limitations of the alternative models in explaining the findings. According to our simulations, a spatial bias from the movement cells fits the objectives best.
An alternative, feature-specific explanation could be a weak early prefrontal bias and a strong late prefrontal bias. However, the monkey in Chelazzi et al.'s experiment knows the target object and its search plan is set, so it is unclear why a difference in strength between early and late prefrontal bias should occur. This does not mean that we can definitively exclude a feature-specific explanation. Nevertheless, as explained later, our hypothesis results in new testable predictions.
Saccade Target Selection and Saccade Latency
Our model was optimized to fit IT data in the visual search task of Chelazzi et al. (1998) using general information about the time course of activity in the FEF (Bruce and Goldberg, 1985; Schall, 1995; Bichot and Schall, 1999a). We have already discussed its fit with the V4 data obtained by Chelazzi et al. (2001). To further demonstrate that our model FEF can account for the data from a variety of experiments, we compare the same model with identical parameters to the behavioral data of a conjunction visual search experiment from Bichot and Schall (1999b) as well as FEF data from Sato et al. (2001).
Bichot and Schall (1999b) found that correct saccades are faster than incorrect ones. In our simulation (Appendix II) we varied the search efficiency of the task by a random selection of the feedback strength from PF working memory to IT. We observe a performance of 96% for correct target selection in trials with set size 4 and of 94% in trials with set size 6. Consistent with Bichot and Schall (1999b), the average time for correct saccades (291 ms) is significantly shorter than for incorrect saccades (360 ms) in the set size 4 condition (t-test, P < 0.001) as well as in the set size 6 condition (t-test, P < 0.001), with 298 ms for correct saccades and 472 ms for incorrect saccades. As we shall see next, the model predicts this increase on the basis that a poor discrimination leads to longer competition in the FEF.
A recent report investigated the effect of input discrimination on visual selection in the visuomovement cells of the FEF (Sato et al., 2001). Increasing the similarity of the distractors to the target increased reaction time and increased the time needed to discriminate the target by FEF visually responsive neurons. We have shown that increasing the target–distractor similarity increases the time to select the target and increases the number of errors (Hamker, 2004b). The target–distractor similarity and other factors, such as the availability of a target template, determine search efficiency by varying the target discrimination in the input of the FEF. We now shed more light on how target discrimination affects the time for target selection. We sorted the responses in the conjunction visual search simulation according to the reaction time and separated the trials into three equal groups (fast, medium, slow). By comparing the fast and slow groups, we see, similar to Sato et al. (2001), a clear latency increase in target discrimination with slower response time (Fig. 10). Thus, our model FEF transfers the target discrimination into the latency of a reentry signal.
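The grouping of trials by reaction time can be sketched as follows. The trial records and the field name rt_ms are hypothetical and serve only to show the procedure; when the number of trials is not divisible by three, this sketch places the remainder in the medium group.

```python
def split_by_reaction_time(trials):
    """Sort trials by reaction time and split them into three groups.

    trials: list of dicts, each with a (hypothetical) "rt_ms" field.
    Returns a dict with "fast", "medium" and "slow" lists.
    """
    ordered = sorted(trials, key=lambda t: t["rt_ms"])
    n = len(ordered) // 3  # size of the fast and slow groups
    return {
        "fast": ordered[:n],
        "medium": ordered[n:len(ordered) - n],
        "slow": ordered[len(ordered) - n:],
    }


# Made-up reaction times, used only to exercise the function.
example_trials = [{"rt_ms": rt} for rt in (300, 250, 400, 280, 350, 310)]
groups = split_by_reaction_time(example_trials)
```

Comparing a measure such as the target discrimination time between the fast and slow groups then follows the analysis described above.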
Figure 11 shows the activity of the visuomovement cells in the fastest and slowest conditions. The initial activity clearly reflects the top-down advantage from the ‘what’ pathway (i.e. the number of dimensions that the item shares with the target). The target extends the discrimination with increasing time, consistent with the experimental data (Bichot and Schall, 1999a; Bichot et al., 2001a; Sato et al., 2001). In the fastest trial target discrimination occurs very early (50 ms), whereas in the slowest trial the discrimination of the target occurs at 290 ms.
A Poor Target Discrimination in FEF Visual Cells Results in Higher Activation of Non-target FEF Movement Cells
Our model FEF visuomovement cells show the effect of search efficiency on the visual selection in the FEF (Sato et al., 2001): low efficiency is characterized by poor (late) target discrimination in the visual cells. We now predict how search efficiency affects the movement cells, which were not investigated by Sato et al. (2001). In the case of a low efficiency, where a poor (late) target discrimination in the visual cells was observed, the model movement cells need more time to resolve the competition (Fig. 12). Our model predicts that in this case the distractor location can achieve a high activation relative to the condition with a good (fast) target discrimination.
A Late Target Effect in V4 and IT Is Launched by Spatial Reentry
Chelazzi et al. (1998) defined an early time window from 70 to 170 ms after stimulus onset and a late time window from 100 ms before the saccade until its execution. The responses of IT and V4 cells show an enhanced activity for the target in the early window, whereas a significant target selection was observed in the late time window (Chelazzi et al., 1998, 2001). It was suggested that the observed responses can be explained by a feature-specific bias from prefrontal areas (Chelazzi et al., 1998). Usher and Niebur (1996) have shown that competition among model IT cells is sufficient for the target selection observed by Chelazzi et al. (1993). However, their model is limited to the case of one target and one distractor, and did not explain the target-absent case. Our simulations of target-present and target-absent cases have shown that the target selection in the late phase is consistent with a reentry from the fronto-parietal network. A model without spatial reentry has difficulties in reconciling both the target-present and target-absent data.
Movement Cells of the Frontal Eye Field Are the Origin of Spatial Reentry
In the search for the saliency map, proposed by the classical hypothesis of spatial attention, a task-relevant increase has been reported in several fronto-parietal areas that process space, such as LIP (Bisley and Goldberg, 2003) and FEF (Bichot and Schall, 1999a). However, the major question is not which areas reflect attention but which areas are likely candidates for a spatially organized feedback signal — the source of spatial attention in the ventral pathway. Some recent experiments reported presaccadic activity in V4 (Moore, 1999; Tolias et al., 2001) which is likely to originate from the FEF (Moore and Armstrong, 2003). Since visual, visuomovement and movement cells exhibit target discrimination, spatial attention could be explained by a visual selection model or a movement plan model. Thompson and Schall (2000) observed a discrimination in the visuomovement cells and proposed a direct feedback of these cells into V4. We observe this discrimination in our model as well (Fig. 11). However, we suggest a movement plan model. Movement neurons have a late response and no phasic burst in response to stimulus onset. They show only little enhancement for distractors in visual search (Bichot et al., 2001b) and correct rejections in masking experiments (Thompson and Schall, 2000). Thus, movement cells are decoupled from direct visual processing. Our model suggests feedforward excitation and global inhibition from the visuomotor cells as a possible mechanism. Such a mechanism ensures that a broad activation pattern within the visuomovement cells is not transferred to movement cells. A strong and early feedback for target and distractors, as predicted if the phasic visual or visuomovement cells are the origin of reentry, introduces a selective bias in V4 and IT, which is sensitive to noise. We could only reconcile the experimental data with the simulation by assuming a feedback from the movement cells. 
The timing of a strong discrimination for our FEF movement cells, beginning 150 ms after array onset and 110 ms before eye movement, fits very well with the late target effect in the experimental data. This result is also consistent with information theory. If we define the reentry signal towards the target (true expected location) as the signal of interest and overall firing rate (false expected location) as noise, we would get a much higher signal/noise ratio in the movement cells than in the visuomovement cells.
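The signal/noise argument can be illustrated with a minimal sketch. It assumes that the "signal" is the reentry activity at the target location and that the "noise" is the summed activity at all other locations; the function name and the example rates are hypothetical and are not taken from the simulations.

```python
def reentry_snr(rates, target_index):
    """Ratio of reentry activity at the target to summed activity elsewhere.

    Assumption: 'noise' is the total firing at non-target locations.
    """
    signal = rates[target_index]
    noise = sum(r for i, r in enumerate(rates) if i != target_index)
    return float("inf") if noise == 0 else signal / noise


# Hypothetical activity profiles over three locations (target at index 0).
movement_like = [0.9, 0.05, 0.05]     # little activity at distractor locations
visuomovement_like = [0.9, 0.6, 0.6]  # broad activity at distractor locations

print(reentry_snr(movement_like, 0) > reentry_snr(visuomovement_like, 0))
```

Under these assumed profiles, a sharply selective population yields the higher ratio, which is the property attributed to the movement cells in the text.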
Given this definition of spatial attention, our model predicts that target discrimination in the visuomovement cells can indicate spatial attention (Figs 8a, 11). However, in our model, target discrimination in the visuomovement cells guides spatial selection but does not provide the causal connection to spatial attention in V4.
Target Discrimination Translates into Latency of a Spatial Reentry Signal in Visual Search
We observed that a low target discrimination results in a slow and error-prone reentry process. As a result of a correct reentry, our simple model already predicts set-size effects in parallel searching (Fig. 7).
What factors might determine the input into the FEF (V4 activity) in such a way that it needs more time to select the target? Duncan and Humphreys (1992) have shown that varying the target–distractor and distractor–distractor similarity changes the efficiency of the task and produces different search slopes. The underlying reason could be that the different similarities determine the discrimination of the target in the ventral stream but do not produce any delay as such. An increasing set size might also reduce the discrimination through competitive interactions in V4. Since the ventral stream feeds the fronto-parietal network, the initial discrimination in action planning centers must also be poorer. Our simulations show that this poorer discrimination causes a slower spatial selection (Fig. 10). We observed selection times in the movement cells ranging from 220 to 400 ms after stimulus onset. Longer selection processes have not been observed, since noise in the system enforces either a correct or wrong selection. Depending on the efficiency of the search task our parallel mechanism can show a difference of 180 ms in selection time. Thus, we predict no faster selection times of covert attention than ∼120 ms, which is the discrimination time of movement cells in the fastest trial (Fig. 12a). Under the assumption that the number of items in the display affects the target–distractor discrimination, we predict that shallow but non-flat search slopes are based on a parallel mechanism. The prediction of a parallel search is of course difficult to test, since it would require showing the absence of any repetitive serial selection. However, we can give theoretical evidence that a slow reentry signal from movement cells can explain non-flat search slopes as the result of a parallel process.
We aimed to demonstrate the suitability of our reentry hypothesis by comparing simulations with experimental data. Each modeled area exhibits a temporal course of activity that has been observed by similar physiological experiments performed by various investigators. Our approach is an attempt to tie together the existing understanding into a unified whole, so that we can better understand the interactions between different areas and design appropriate future experiments. We have demonstrated that the model can account for recent findings (Sato et al., 2001; Bichot et al., 2001a; Chelazzi et al., 2001) for which the model was not adjusted. Moreover, the simulations resulted in several experimentally testable predictions. We now discuss possible impacts of our study on theories of visual perception.
Reentry and Competitive Mechanisms Evoke Attention
Attention is generally assumed to be computed within some brain areas in order to control processing in the brain. For example, Posner and Dehaene (1994) suggested that there were anterior and posterior attention systems. Such a localized view of attention is even more explicit in models in which attention originates within a saliency map (Treisman and Gelade, 1980; Wolfe, 1994; Itti and Koch, 2000). Other models have emphasized the controlling function of attention such as selective tuning (Tsotsos et al., 1995), the shifter-circuit (Olshausen et al., 1993) or a gain field (Salinas and Abbott, 1997). We admit that such models can be useful to describe aspects of attention, but they offer only a very abstract explanation of this phenomenon. Electrophysiology has started to investigate the neural mechanisms of attention. For example, within the biased competition framework attention has been suggested to be an emergent property of neural mechanisms (Desimone and Duncan, 1995). In particular, effects within the receptive field of cells have been revealed. In addition, the feature-similarity framework (Treue and Martínez Trujillo, 1999) suggests that mechanisms of feedback implement a global gain control.
Some recent computational models have emphasized the role of interactions within a network for explaining vision (Tononi et al., 1992; Mumford, 1992; Hamker, 1999; Kirkland and Gerstein, 1999; Hamker, 2000; Corchs and Deco, 2002). However, we are still missing an approach that allows us to describe how different areas contribute to object detection, attention and eye movement control. Tasks such as Chelazzi's visual search experiment can only be fully explained by an account that shows how different areas operate on the same event (Duncan et al., 1997). The present approach is particularly relevant, since each area is clearly defined and its cell dynamics have been observed in various experiments. We even account for the subdivision of cells in the FEF. This constraint considerably improves the validity of the claim that attention can be explained by already known areas, which compute specific variables, but not attention itself. We suggest that attention should not be regarded as a resource given by some control module. Attention is the result of mechanisms that act on the processed variables, such as gain control, by reentry and competitive interactions. We propose that future research focuses on identifying the areas that modulate vision. Movement cells of the FEF could provide an ideal signal for spatial selection. Other relevant areas controlling vision are the planning stages of the task at hand, which set task instructions and compute variables of interest. The mechanism described allows vision to be under cognitive control to resolve interference and to connect high-level task descriptions or actions with low-level scene descriptions.
The Mechanism of Spatial Reentry Influences the Search Slope
In most visual search tasks the reaction time of subjects increases with the number of items. Two opposing theories have been suggested. The serial search hypothesis assumes that non-flat search slopes are necessarily the result of a scanning process that visits one item after another (Treisman and Gelade, 1980; Treisman and Sato, 1990; Wolfe, 1994; Itti and Koch, 2000). This assumption sometimes results in selection times of 30–50 ms per item. Parallel search has explained set-size effects in terms of a slow competitive mechanism (Duncan and Humphreys, 1989; Palmer, 1995; Deco et al., 2002).
Hybrid models have also been formulated (Bundesen, 1990, 1999; Chelazzi, 1999). They typically differentiate between a parallel capacity limited ‘one-view search’ and an additional slow spatial shift of attention. However, they do not specify the underlying neural mechanisms so that it is unclear on what kind of processes search is based. Since observation of human reaction times does not allow one or other explanation to be ruled out, experiments using a variety of methods have recently been conducted to ascertain the type of process (Corbetta et al., 1995; Woodman and Luck, 1999; Donner et al., 2000; Hopf et al., 2000; Leonards et al., 2000). Although some experiments tried to identify areas involved in a serial selection, the overall results are still inconclusive.
Our suggested spatial reentry mechanism predicts the involvement of a slow parallel as well as a serial component in visual search. Based on our simulation results we suggest that the brain does not have a fast scanning mechanism, only a slow one. We explain shallow but non-flat search slopes by a poorer and slower discrimination process for reentry. Steep search slopes, however, are likely to be based on sequential reentry components. Interestingly, both modes are grounded in the same process. The strength of our approach lies in its testable predictions, which are an inherent result of the assumption that FEF movement cells provide a spatially selective reentry signal. Thus, we offer a clear description of the underlying process that can lead to set-size effects. The timing of the spatial reentry signal depends on the target discrimination and is therefore a variable parallel process. A poor discrimination, however, can lead to a wrong reentry. Since a distractor will be identified as such by the enhanced gain of cells encoding the distractor, a disengagement and following engagement of the spatial reentry component introduces the serial mechanism.
Benefits and Limitations of the Model
At the model's core a reentry signal acts multiplicatively on the input of a cell, and thus gain control is described by means of a comparison of the feedforward with the reentry signal. The exact implementation in the brain is controversial; however, on an abstract level, multiplicative interactions are consistent with observations (Eskandar et al., 1992; McAdams and Maunsell, 1999; Hupé et al., 2001). Although we achieve a good fit with the temporal course of activity in several areas, and we have shown earlier that such a gain control also fits with recent experiments observing attention effects in V4 (Hamker, 2004a), at present it would be too early to claim that this describes a universal mechanism to implement a cognitive control of vision.
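The multiplicative interaction can be sketched as follows. This is a minimal illustration and not the model's actual equation: the functional form, the `headroom` term (which weakens the reentry effect when the cell already fires strongly) and all parameter names are assumptions chosen for clarity.

```python
def gain_modulated_input(feedforward, reentry, rate=0.0, weight=1.0, ceiling=1.0):
    """Feedforward input scaled by a reentry-dependent gain.

    Assumptions (hypothetical, for illustration only):
    - reentry multiplies the feedforward input, so it cannot drive a
      cell that receives no feedforward input;
    - the gain increase shrinks as the cell's current rate approaches
      `ceiling`, mimicking a weaker effect for strongly driven cells.
    """
    headroom = max(0.0, ceiling - rate)
    return feedforward * (1.0 + weight * reentry * headroom)


print(gain_modulated_input(0.5, reentry=0.0))            # no reentry: input unchanged
print(gain_modulated_input(0.0, reentry=1.0))            # no input: reentry has no effect
print(gain_modulated_input(0.5, reentry=1.0, rate=1.0))  # saturated cell: no gain increase
```

The second call shows the key property of a multiplicative scheme: an expectation alone does not produce activity, it only amplifies a matching input.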
We have excluded the effects of stimulus-driven saliency. Consistent with our model, these effects might emerge from interactions in the network as well (Nothdurft et al., 1999; Kapadia et al., 2000; Li, 2002; Hochstein and Ahissar, 2002). Salient features would then be enhanced similar to feature-based, top-down effects.
We compared our model with data in which the monkey responded by making an eye movement towards the target. Chelazzi et al. (1998) report similar findings in a task where the monkey responded by pressing a lever. Our model would also produce qualitatively similar results if we assume that in this task the monkey is planning an eye movement, but movement cells do not reach threshold activity. At present, no experiment has studied FEF movement cell activity in covert attention tasks.
We do not claim that the FEF movement cells are the only source of spatial reentry. Within a distributed system, other areas are likely to have established similar mechanisms. The model is based on current anatomical and electrophysiological knowledge. Other areas, if necessary, can be included based on our gain control mechanism without changing the basic functionality described. Our simulations cannot prove that the movement cells or the FEF in general necessarily are responsible for the reentry signal. However, feedback from the visuomovement cells or no feedback at all resulted in a poor fit with the temporal course of activity in IT. Thus, based on our computational evidence, we suggest that the typical temporal course of activity of the FEF movement cells (Figs 8b, 12) is a necessary signal to discriminate the target from the background. Provided that anatomical studies show evidence for feedback connections, this prediction could be used to preselect cells in other brain areas in order to investigate if they are a source of reentry. LIP, for example, has only a few movement-type cells.
A strength of this model is that its predictions are testable. In future work this model will be tested with other experimental paradigms. We have already managed to scale up the model to cope with natural scenes (Hamker and Worcester, 2002). From the theoretical point of view our simulations reveal that an action/perception network can operate in a coordinated fashion by means of reentry. The decision in one area affects the outcome of the competition in another area, so that finally all areas operate on the same problem, an aspect of binding in the brain.
Appendix I: Computational Aspects of the Model
We now give a formal description of the model. We first explain the input stimuli as well as the mechanisms of pooling and gain control. Then the equations of each area are given. Each connection in the model has an independent additive noise term that leads to variations in the transmission from one cell to another.
Input stimuli $I_{d,i,x}$ are encoded as a population of cells $i$ determined by a Gaussian distribution at each dimension $d$ and each location $x$. For realistic experimental conditions, we delayed the input for 30 ms to account for the time a stimulus needs to reach V2. Since V1 cells typically fire very strongly in the beginning and then decrease in firing rate, we include a short-term synaptic depression $S_{d,i,x}$ (similar to Chance et al., 1998, 1999) of the input.
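The input stage described above can be sketched as follows. Only the Gaussian population code, the 30 ms delay and the presence of a short-term depression come from the text; the tuning width, the depression dynamics and the values of `tau_rec_ms` and `use` are assumptions made for illustration and are not the model's parameters.

```python
import math


def population_input(preferred, value, sigma=0.2):
    """Gaussian population code: response of each cell to a feature value."""
    return [math.exp(-((p - value) ** 2) / (2.0 * sigma ** 2)) for p in preferred]


def depressed_input(drive, onset_ms=0, delay_ms=30, n_steps=200, dt_ms=1.0,
                    tau_rec_ms=100.0, use=0.02):
    """Time course of one input cell: delayed onset, then decay by depression.

    Assumed dynamics: a synaptic resource s recovers toward 1 and is
    consumed in proportion to the drive (Euler integration).
    """
    s = 1.0  # 1 = fully recovered synapse
    trace = []
    for step in range(n_steps):
        t = step * dt_ms
        active = drive if t >= onset_ms + delay_ms else 0.0
        trace.append(active * s)
        s += dt_ms * ((1.0 - s) / tau_rec_ms - use * active * s)
    return trace


drive = population_input([0.0, 0.5, 1.0], value=0.5)[1]  # best-tuned cell
trace = depressed_input(drive)
print(trace[29], trace[30], trace[-1] < trace[30])
```

The trace is silent until 30 ms, starts at its maximum and then declines, which reproduces the strong initial response followed by a reduced rate that motivates the depression term.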
We describe the modulation of the firing rate
The filtered incoming pattern is continuously compared with the expectation, such as spatial location or specific stimulus features. The gain is enhanced if the expectation
We use a non-linear pooling function f to define the influence of the filtered afferents
Chelazzi et al. (1998) reported no attention effect on a single stimulus within a receptive field. A simple multiplicative gain increase would predict an even stronger effect. Reynolds et al. (2000) found that the effect of spatial attention can be best described as a contrast gain model. Attention increases the effective strength of a stimulus but not with high-contrast stimuli. Chelazzi et al. (1998) also used high-contrast stimuli. We do not aim to explain the possible underlying mechanisms of this effect here, but rather account for the finding by decreasing the efficiency of the feedback signal when the cell activity is higher according to
Pooling Across Afferents
According to a previous study (Hamker, 2004a) we simulate a convergent projection from areas with smaller receptive field sizes to areas with larger receptive field sizes (Fig. 13) with a max-pooling function:
In our model we do not increase the complexity of features from V4 to IT. Thus, our model IT populations represent the same feature space as our model V4 populations. The receptive field size, however, increases in our model, so that all populations in V4 converge onto one population in IT.
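The convergence from V4 onto IT can be illustrated with a plain maximum over locations. This is a simplified sketch that assumes unweighted, noise-free pooling; the model's actual pooling function may contain additional terms, and the function name is hypothetical.

```python
def pool_v4_to_it(v4):
    """Max-pool V4 activity over locations.

    v4[x][i] is the rate of the cell preferring feature i at location x.
    The result it[i] keeps the feature space and drops the location.
    """
    n_features = len(v4[0])
    return [max(location[i] for location in v4) for i in range(n_features)]


v4 = [
    [0.1, 0.8],  # location 0
    [0.5, 0.2],  # location 1
    [0.3, 0.3],  # location 2
]
print(pool_v4_to_it(v4))  # [0.5, 0.8]
```

Because every location feeds the same IT population, the IT response reports which features are present but not where, which is why a spatial reentry signal is needed to single out one item.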
The underlying circuits, which are responsible for memory and the detection of a match, can involve many regions including subcortical areas. For simplicity, we assume a recurrent local circuit for working memory which is driven by ITs cells. The lateral weights $w_{ij}$ are computed from a Gaussian with $w_{ii} = 0.3$ and $\sigma^2 = 0.6$. Match cells (PFm) compare in parallel the current pattern in ITs cells with those in working memory (PFwm) (Fig. 2).
To determine whether a pattern in the visual scene is similar to the pattern in memory we multiply the activity of the working memory cells with that of the IT cells. Activity increases in the match cells only if populations in ITs and working memory match. Cells with such characteristics have been observed (Freedman et al., 2002). The lateral weights $w_{ij}$ are computed as in PF working memory.
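The two prefrontal ingredients, Gaussian lateral weights and a multiplicative match, can be sketched as follows. The peak weight 0.3 and the variance 0.6 are taken from the text; measuring distance in units of cell index and omitting the temporal dynamics are assumptions, and both function names are hypothetical.

```python
import math


def lateral_weights(n_cells, w_self=0.3, sigma_sq=0.6):
    """Gaussian lateral weights with peak w_self on the diagonal.

    Assumption: distance between cells i and j is |i - j|.
    """
    return [[w_self * math.exp(-((i - j) ** 2) / (2.0 * sigma_sq))
             for j in range(n_cells)]
            for i in range(n_cells)]


def match_drive(working_memory, it_cells):
    """Elementwise product: nonzero only where both populations are active."""
    return [wm * it for wm, it in zip(working_memory, it_cells)]


w = lateral_weights(3)
print(w[0][0], w[0][1] < w[0][0])
print(match_drive([1.0, 0.0], [0.5, 0.9]))  # [0.5, 0.0]
```

In the last line the second IT cell is strongly active but has no counterpart in memory, so it produces no match signal.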
We simulate frontal eye field visuomovement neurons which receive convergent afferents from V4 at the same retinotopic location (Fig. 2). Different dimensions d add up.
Specification of Parameters
The temporal dynamics, including the effect of inhibitory pools within each area, have been worked out over several years, starting from an early simple model (Hamker, 1999). Once the dynamics, including the gain control mechanism, had been set up, the parameters of the model were specified from local to global. Our choice of parameters was guided by the typical course of activity measured in cell recordings. V4 was fit with experimental data from an attention experiment (Hamker, 2004a). The fine tuning to fit the experimental data of Chelazzi et al. (1998) was done by iteratively adjusting the weights between the areas, keeping the parameters within the areas fixed. The final values used are examples for which the model exhibits dynamics that closely resemble those of the recordings of Chelazzi et al. (1998). The qualitative behavior of the model is stable over a reasonable range of the parameters. Although the model contains several parameters to simulate the firing rates, the degrees of freedom are strongly limited by the constraint of matching the typical course of activity and by anatomical constraints. Such systems models differ largely from mathematical models (e.g. Bundesen, 1999) in which parameters are much less constrained by electrophysiology and anatomy.
Appendix II: Conjunctive Search Task
Two conjunction visual search experiments have been simulated: a target with three distractors and a target with five distractors. We construct a target item in two dimensions, i.e. ‘color’ and ‘shape’. The color-similar distractor activates the same neural population as the target in the first dimension and the shape-similar distractor activates the same population as the target in the second dimension. The four-item display contains a target, a dissimilar, a shape-similar and a color-similar distractor. The six-item display is extended with an additional shape-similar and color-similar distractor. The target ‘color’ and ‘shape’ are stored in memory before the search begins without showing a cue.
To investigate interesting dependencies between correct and error trials, as well as easy and difficult trials, we varied the search efficiency of the task by varying the top-down weight from the PF working memory to IT. Among other sources that determine search efficiency, this simulates the availability of a target template. The simulations are repeated 80 times for each set size. Unlike the simulation of the experiment of Chelazzi et al. (1998), a saccade is always executed even if the match with the target template is poor.
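The composition of the two displays can be written down as a short sketch. The feature labels are hypothetical placeholders for the neural populations in each dimension, and the helper names are not part of the model.

```python
TARGET = ("red", "cross")  # hypothetical labels for (color, shape)


def make_display(n_items):
    """Return the items of the four- or six-item conjunction display."""
    other_color, other_shape = "green", "circle"
    color_similar = (TARGET[0], other_shape)
    shape_similar = (other_color, TARGET[1])
    dissimilar = (other_color, other_shape)
    display = [TARGET, dissimilar, shape_similar, color_similar]
    if n_items == 6:
        display += [shape_similar, color_similar]
    elif n_items != 4:
        raise ValueError("only set sizes 4 and 6 are described")
    return display


def shared_dimensions(item, target=TARGET):
    """Number of dimensions in which an item activates the target's population."""
    return sum(a == b for a, b in zip(item, target))


print([shared_dimensions(item) for item in make_display(6)])  # [2, 0, 1, 1, 1, 1]
```

The count of shared dimensions corresponds to the top-down advantage that each item receives from the target template before spatial selection begins.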
Appendix III: Target Discrimination Analysis
To determine the time at which neural activity in FEF visuomovement cells discriminates the target from distractors, we defined a discrimination threshold. For sufficient discrimination of the target the difference between its activity and the activity of a cell encoding a distractor location has to exceed the discrimination threshold for 15 ms. This is much simpler than the method used by Sato et al. (2001) for their recordings, but sufficient for a reliable measurement, since our model cells are less noisy than real cells. For all simulations we used the same model parameters.
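The threshold criterion can be sketched as follows. The 15 ms hold period is taken from the text; the 1 ms sampling step, the choice to report the start of the supra-threshold run rather than its end, and the function name are assumptions.

```python
def discrimination_time(target, distractor, threshold, hold_ms=15, dt_ms=1):
    """First time (ms) from which target - distractor stays above threshold.

    Returns the start of the first run lasting at least hold_ms,
    or None if the criterion is never met.
    """
    needed = max(1, round(hold_ms / dt_ms))
    run = 0
    for step, (t, d) in enumerate(zip(target, distractor)):
        if t - d > threshold:
            run += 1
            if run == needed:
                return (step - needed + 1) * dt_ms
        else:
            run = 0  # a brief excursion does not count
    return None


# Hypothetical traces: a 5 ms blip at 10 ms, sustained separation from 25 ms.
target = [0.0] * 10 + [1.0] * 5 + [0.0] * 10 + [1.0] * 20
distractor = [0.0] * len(target)
print(discrimination_time(target, distractor, threshold=0.5))  # 25
```

Requiring the difference to persist for the full hold period prevents short noise-driven excursions, such as the blip in the example, from being counted as discrimination.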
I am grateful to Jamie Mazer, Jeffrey Schall, Leonardo Chelazzi, Rufin VanRullen and Christof Koch for helpful comments on earlier versions of this manuscript. I also thank Narcisse Bichot and Andrew Rossi for valuable discussions. This research was supported by DFG HA2630/2-1 and in part by the ERC Program of the NSF (EEC-9402726).