Abstract

A computational neuroscience framework is proposed to better understand the role and the neuronal correlate of spatial attention modulation in visual perception. The model consists of several interconnected modules that can be related to the different areas of the dorsal and ventral paths of the visual cortex. Competitive neural interactions are implemented at both microscopic and interareal levels, according to the biased competition hypothesis. This hypothesis has been experimentally confirmed in studies in humans using functional magnetic resonance imaging (fMRI) techniques and also in single-cell recording studies in monkeys. Within this neurodynamical approach, numerical simulations are carried out that describe both the fMRI and the electrophysiological data. The proposed model thus draws together data of different spatial and temporal resolution, namely the above-mentioned imaging and single-cell results.

Introduction

As is well known, because of the limited processing capacity of the visual system, attentional mechanisms are required in order to process information from a given scene.

The dominant neurobiological hypothesis to account for attentional selection is that attention serves to enhance the responses of neurons representing stimuli at a single relevant location in the visual field. This enhancement model is related to the metaphor for focal attention in terms of a spotlight (Treisman, 1982, 1988). This metaphor postulates a spotlight of attention that illuminates a portion of the field of view where stimuli are processed in higher detail while the information outside the spotlight is filtered out. According to this classical view, a relevant object in a cluttered scene is found by rapidly shifting the spotlight from one object in the scene to the next one, until the target is found. Therefore, according to this assumption, the concept of attention is based on explicit serial mechanisms.

There exists an alternative mechanism for selective attention, the biased competition model (Duncan and Humphreys, 1989; Desimone et al., 1990; Desimone and Duncan, 1995; Duncan, 1996). According to this model, the enhancing effect of attention on neuronal responses is understood in the context of competition among all of the stimuli in the visual field. The biased competition hypothesis states that the multiple stimuli in the visual field activate populations of neurons that engage in competitive mechanisms. Attending to a stimulus at a particular location or with a particular feature biases this competition in favour of neurons that respond to the location or the features of the attended stimulus. This attentional effect is produced by generating signals within areas outside the visual cortex which are then fed back to extrastriate areas, where they bias the competition such that when multiple stimuli appear in the visual field, the cells representing the attended stimulus ‘win’, thereby suppressing cells representing distracting stimuli (Duncan and Humphreys, 1989; Desimone and Duncan, 1995; Duncan, 1996). According to this line of work, attention appears as an emergent property of competitive interactions that work in parallel across the visual field.

Several neurophysiological experiments have been performed suggesting biased competition neural mechanisms that are consistent with such a hypothesis (Spitzer et al., 1988; Chelazzi et al., 1993; Chelazzi and Desimone, 1994; Chelazzi, 1999; Luck et al., 1997; Reynolds et al., 1999). Some of the strongest support for the model comes from studies in infero-temporal (IT) cortex of macaque monkeys studied while the monkeys performed a visual search task (Chelazzi et al., 1993). Chelazzi et al. measured activation in IT neurons in monkeys whilst the animals were observing a display containing a target and a distractor. These physiological results illustrate some of the basic components of the biased competition model such as a bias in favour of cells representing the relevant stimulus and the suppression of responses of the irrelevant distractors.

Single-cell recording studies in monkeys from extrastriate areas V2 and V4 also seem to support the biased competition theory for the case of spatially directed attention. It was found that attention has a large effect on responses when two stimuli compete within the same receptive field. When two stimuli are located within the same receptive field of cells in V2 or V4 and the animal attends to one of them, the cell's response is predominantly determined by the attended stimulus (Moran and Desimone, 1985; Luck et al., 1997). Moran and Desimone showed that the firing activity of tuned neurons in the cortex was modulated if monkeys were instructed to attend to the location of the target stimulus.

Therefore, the main effect of attentional selection appears to be a modulation of the underlying competitive interaction between the stimuli in the visual field. The studies in areas V2 and V4 indicate that attention serves to modulate the suppressive interaction between two or more stimuli within the receptive field (Luck et al., 1997; Reynolds et al., 1999). A weak attentional modulation has been observed even in area V1 (McAdams and Maunsell, 1999).

Additional evidence comes from functional magnetic resonance imaging (fMRI) in humans (Kastner et al., 1998, 1999). In accordance with the biased competition hypothesis, these results show that when multiple stimuli are present simultaneously in the visual field, their cortical representations within the object recognition pathway interact in a competitive, suppressive fashion, which is not the case when the stimuli are presented sequentially. The authors also observed that directing attention to one of the stimuli counteracts the suppressive influence of nearby stimuli.

In the present paper we model some of the aforementioned experimental data (electrophysiology and fMRI measurements) within the theoretical framework of biased competitive neurodynamics. These data have different temporal and spatial resolutions and were obtained with different experimental paradigms. The aim of this work is to describe these experimental findings within a unified picture. The neurodynamical model consists essentially of several interconnected network modules which can be related to the different areas of the dorsal and ventral paths of the visual cortex. Each of the modules consists of a population of cortical neurons, where the temporal evolution of the system is described within the framework of a mean-field approximation, i.e. an ensemble average of the neural population is calculated in order to obtain the corresponding activity. Working within this neurodynamical approach, numerical simulations will be carried out to describe the single-cell recording data of Reynolds et al. (Reynolds et al., 1999) as well as the fMRI experimental findings of Kastner et al. (Kastner et al., 1999).

Results

A. The Neurodynamical Model

The overall systemic representation of the model is shown in Figure 1. The system is essentially composed of six modules (V1, V2–V4, IT, PP, v46, d46), structured such that they resemble the two known main visual paths of the mammalian visual cortex: the what and where paths (Deco and Zihl, 1998, 2001; Hamker, 1998). These six modules represent the minimum number of components to be taken into account within this complex system in order to describe the desired visual attention mechanism.

Information from the retino-geniculo-striate pathway enters the visual cortex through areas V1–V2 in the occipital lobe and proceeds into two processing streams. The occipital–temporal stream (what pathway) leads ventrally through V4 and IT (inferotemporal cortex) and is mainly concerned with object recognition, independently of position and scaling. The occipitoparietal stream (where pathway) leads dorsally into PP (posterior parietal) and is concerned with the location of objects and the spatial relationships between objects. The model considers that feature attention biases intermodular competition between V4 and IT, whereas spatial attention biases intermodular competition between V1, V4 and PP.

The ventral stream consists of four modules: V1, V2–V4, IT, and a module v46 corresponding to the ventral area 46 of the prefrontal cortex, which maintains the short-term memory of the recognized object or generates the target object in a visual search task. The V1 module is concerned with the extraction of simple features (for example, bars at different locations, orientations and sizes). It consists of pools of neurons with Gabor receptive fields tuned at different positions in the visual field, orientations and spatial frequency resolutions. The V1 module contains P × P hypercolumns that cover the N × N pixel scene. Each hypercolumn contains L orientation columns of complex cells with K octave levels corresponding to different spatial frequencies. The V1 module sends spatial and feature information up to the dorsal and ventral streams. In addition, there is one inhibitory pool interacting with the complex cells of all orientations at each scale. The inhibitory pool integrates information from all the excitatory pools within the module and feeds back unspecific inhibition uniformly to each of the excitatory pools. It mediates normalizing lateral inhibition or competitive interactions among the excitatory cell pools within the module.

The IT module is concerned with the recognition of objects and consists of pools of neurons that are sensitive to the presence of a specific object in the visual field. It contains C pools, as the network is trained to search for or recognize C particular objects. The V2–V4 module serves primarily to pool and channel the responses of V1 neurons to IT to achieve a limited degree of translation invariance. It also mediates a certain degree of localized competitive interaction between different targets. A lattice is used to represent the V2–V4 module. Each node in this lattice has L × K assemblies as in a hypercolumn in V1. Each cell assembly receives convergent inputs from the cell assemblies of the same tuning from an M × M hypercolumn neighbourhood in V1. The feedforward connections from V1 to V2–V4 are modelled with a convergent Gaussian weight function, with symmetric recurrent connections (Kandel et al., 1991).
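The convergent Gaussian weighting described above can be sketched as follows. This is a minimal illustration rather than the weight function of the Appendix: the function names `gaussian_weights` and `pool_v1_to_v2`, the width `sigma` and the normalization to unit sum are assumptions, and M = 10 is the neighbourhood size quoted for the single-cell simulations.

```python
import numpy as np

def gaussian_weights(M, sigma):
    """M x M convergent weights, normalized to sum to 1 (normalization is an assumption)."""
    offsets = np.arange(M) - (M - 1) / 2.0        # distances from the neighbourhood centre
    dx, dy = np.meshgrid(offsets, offsets)
    w = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    return w / w.sum()

def pool_v1_to_v2(v1_activity, weights):
    """Weighted sum of same-tuning V1 pool activities over the neighbourhood."""
    return float(np.sum(weights * v1_activity))

M = 10
w = gaussian_weights(M, sigma=2.5)                # sigma is an assumed value
v1_patch = np.zeros((M, M))
v1_patch[4, 4] = 1.0                              # a single active V1 pool near the centre
v2_input = pool_v1_to_v2(v1_patch, w)             # equals the weight at that position
```

Because the same weight matrix can be reused for the recurrent (feedback) direction, a symmetric connection needs no additional parameters in this sketch.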

The dorsal stream consists of three modules: V1, PP and d46. The module PP consists of pools coding the position of the stimuli. It is responsible for mediating spatial attention modulation and for updating the spatial position of the attended object. A lattice of N × N nodes represents the topographical organization of the module PP. Each node on the lattice corresponds to the spatial position of each pixel in the input image. The d46 module corresponds to the dorsal area 46 of the prefrontal cortex that maintains the short-term spatial memory or generates the attentional bias for spatial location.

The prefrontal areas 46 (modules v46 and d46) are not explicitly modelled. Feedback connections between these areas provide the external top-down bias that specifies the task. The feedback connection from area v46 to the IT module specifies the target object in a visual search task. The feedback connection from area d46 to the PP module generates the bias to a targeted spatial location.

The system operates in two different modes: the learning mode and the recognition mode. During the learning mode, the synaptic connections between V4 and IT are trained by means of Hebbian learning during several presentations of a specific object. During the recognition mode there are two ways of running the system. First, an object can be localized in a scene (visual search) by biasing the system with an external top-down component at the IT module which drives the competition in favour of the pool associated with the specific object to be searched. Then, the intermodular attentional modulation V1–V4–IT will enhance the activity of the pools in V4 and V1 associated with the features of the specific object to be searched. Finally, the intermodular attentional modulation V4–PP and V1–PP will drive the competition in favour of the pool localizing the specific object. Second, an object can be identified (object recognition) at a specific spatial location by biasing the system with an external top-down component at the PP module. This drives the competition in favour of the pool associated with the specific location such that the intermodular attentional modulation V4–PP and V1–PP will favour the pools in V1 and V4 associated with the features of the object at that location. Intermodular attentional modulation V1–V4–IT will favour the pool that recognized the object at that location.
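The learning mode can be illustrated with a plain Hebbian rule. The text does not specify the exact rule, so everything below is an assumption made for illustration: the function name `hebbian_update`, the learning rate `eta`, the toy feature pattern and the clamping of the target IT pool during training.

```python
import numpy as np

def hebbian_update(w, pre, post, eta=0.1):
    """One Hebbian step: w[c, f] grows in proportion to post[c] * pre[f]."""
    return w + eta * np.outer(post, pre)

n_features, n_objects = 6, 2
w = np.zeros((n_objects, n_features))                   # V4 -> IT weights (toy size)
pattern = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])      # hypothetical V4 activity for object 0
target = np.array([1.0, 0.0])                           # IT pool 0 held active during learning

for _ in range(5):                                      # several presentations of the object
    w = hebbian_update(w, pattern, target)
```

After training, only the weights from the active features to the IT pool of the presented object have grown; the other IT pool is unchanged.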

Each pool of neurons will be described within the mean field approximation (Wilson and Cowan, 1972; Amit and Tsodyks, 1991; Abbot, 1992; Usher and Niebur, 1996), which consists of replacing the temporal averaged discharge rate of a cell with an equivalent activity of a neural population (ensemble average). The mathematical formulation of our model is given in the Appendix.

B. Single-cell Experiments: Electrophysiological Data

Reynolds et al. [(Reynolds et al., 1999), see also (Reynolds and Desimone, 1999)] first examined the presence of competitive interactions in the absence of attentional effects, making the monkey attend to a location far outside the receptive field of the neuron that they were recording. They compared the firing activity response of the neuron when a single reference stimulus was located within the receptive field to the response when a probe stimulus was added to the visual field (see Fig. 2). When the probe was added to the field, the activity of the neuron was shifted toward the activity level that would have been evoked if the probe had appeared alone. When the reference is an effective stimulus (high response) and the probe is an ineffective stimulus (low response) the firing activity is suppressed after adding the probe (Fig. 2a). In contrast, the response of the cell increased when an effective probe stimulus was added to an ineffective reference stimulus (Fig. 2b). The authors also tested attentional modulatory effects independently by repeating the same experiment with the difference that the monkey attended to the reference stimulus within the receptive field of the recorded neuron. The effect of the attention on the response of the V2 neuron was to almost compensate the suppressive or excitatory effect of the probe. That is, if the probe caused a suppression of the activity response to the reference when the attention was outside the receptive field, then attending to the reference restored the neuron's activity to the level corresponding to the case of the reference stimulus alone (Fig. 2a). Symmetrically, if the probe stimulus had increased the neuron's level of activity, attending to the reference stimulus compensates the response by shifting the activity to the level that had been recorded when the reference was presented alone (Fig. 2b).

Numerical Simulations

In this section we present the simulations corresponding to the experiments by Reynolds et al. on single-cell recording in V2 neurons in monkeys. We study the dynamical behaviour of the cortical architecture presented in the previous section by solving numerically the system of coupled differential equations in a computer simulation. We introduce for this experiment a module of V2 neurons. The input system processed an image of 66 × 66 pixels (N = 66). The V1 hypercolumns covered the entire image uniformly. They were distributed in 33 × 33 locations (P = 33) and each hypercolumn was sensitive to two spatial frequencies and to eight different orientations (K = 2 and L = 8). The V2 module has 2 × 8 pools receiving convergent input from the pools of the same tuning from a 10 × 10 (i.e. M = 10) hypercolumn neighbourhood in V1. The feedforward connections from V1 to V2 are modelled with a convergent Gaussian weight function, with symmetric recurrent connections. We analysed the firing activity of a single pool in the V2 module that was highly sensitive to a vertical bar presented in its receptive field (effective stimulus) and poorly sensitive to a 75° oriented bar presented in its receptive field (ineffective stimulus). The size of the bars was 2 × 4 pixels. Following the experimental set-up of the work by Reynolds et al., we plot in Figure 3 the evolution of the firing activity of a V2 pool under four different conditions: (i) single reference stimulus within the receptive field; (ii) single probe stimulus within the receptive field; (iii) reference and probe stimuli within the receptive field without attention; and (iv) reference and probe stimuli within the receptive field and attention directed to the spatial location of the reference stimulus.
In our simulations, attention was directed to the reference location by setting the top-down attentional bias $I_{ij}^{PP,A}$ in PP equal to 0.07 if i and j correspond to the location of the reference stimulus and to zero otherwise. In the unattended condition, the external top-down bias was set equal to zero everywhere. Compared with the experimental results, the same qualitative behaviour is observed for all experimental conditions analysed. The competitive interactions in the absence of attention (i.e. the suppressive and excitatory effects of the probe) are due to the intramodular competitive dynamics at the level of V1. The modulatory biasing corrections in the attended condition are caused by the intermodular interactions between V1 and PP pools, and between PP pools and the prefrontal top-down modulation.
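As a concrete reading of this set-up, the following sketch builds the top-down bias on the N × N PP lattice and counts the V1 pools implied by the quoted parameters. The position of the reference bar (`ref_rows`, `ref_cols`) and the helper name `pp_bias` are hypothetical; only N, P, K, L and the bias value 0.07 come from the text.

```python
import numpy as np

N, P, K, L = 66, 33, 2, 8      # image size, hypercolumn grid, octaves, orientations
ATT_BIAS = 0.07                # top-down bias at attended PP nodes

def pp_bias(n, rows, cols, value=ATT_BIAS):
    """Bias map on the PP lattice: `value` at the attended nodes, zero elsewhere."""
    bias = np.zeros((n, n))
    bias[rows, cols] = value
    return bias

# Hypothetical location of the reference bar (a 4 x 2 pixel patch).
ref_rows, ref_cols = slice(30, 34), slice(20, 22)

bias_attended = pp_bias(N, ref_rows, ref_cols)   # attention on the reference
bias_unattended = np.zeros((N, N))               # no top-down bias anywhere

n_v1_pools = P * P * K * L                       # excitatory V1 pools in this configuration
```

With these values the V1 module holds 33 × 33 × 2 × 8 = 17 424 excitatory pools, and the attended map differs from the unattended one only at the eight nodes covering the reference bar.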

C. fMRI Data

The experimental studies of Kastner et al. (Kastner et al., 1998, 1999) show that when multiple stimuli are present simultaneously in the visual field, their cortical representations within the object recognition pathway interact in a competitive, suppressive fashion. The authors also observed that directing attention to one of the stimuli counteracts the suppressive influence of nearby stimuli. These experimental results were obtained by applying the fMRI technique in humans. The authors designed an experiment and different conditions were examined. In the first experimental condition the authors tested the presence of suppressive interactions among stimuli presented simultaneously within the visual field in the absence of directed attention. In the second experimental condition they investigated the influence of spatially directed attention on the suppressive interactions. Finally, in the third condition they analysed the neural activity during directed attention but in the absence of visual stimulation.

The design of the experiment is shown in Figure 4a. Complex visual images are shown in randomized order in four nearby locations within the right upper quadrant under two presentation conditions: sequential (SEQ) and simultaneous (SIM). In the SEQ condition, each of the stimuli was shown separately in one of the four locations. In the SIM condition, the stimuli appeared together in all four locations. The presentation time was 250 ms, followed by a blank period of 750 ms, on average, in each location. A stimulation period of 1 s is shown in the figure, which was repeated in blocks of 15 s.

Four blocks of visual stimulation (SEQ-SIM-SIM-SEQ) were tested in different conditions. The results measured for areas V1, V2, V4 and TEO are shown in Figure 4b. In the first part of the experiment, no attention condition was considered (blocks without shading in the figure). The authors observed that, because of the mutual suppression induced by competitively interacting stimuli, the fMRI signals were smaller during the SIM than during the SEQ presentations. In the second part of the experiment, there were two main factors: presentation condition (SEQ versus SIM) and directed attention condition (unattended versus attended; striped areas indicate the attended presentations in Fig. 4b). The average fMRI signals with attention increased more strongly for simultaneously presented stimuli than the corresponding ones for sequentially presented stimuli. Thus, the suppressive interactions were partially cancelled out by attention. Finally, in the third part, the attended condition was indicated 10 s before the onset of visual presentations (grey shaded areas indicate the expectation period). During this expectation period (EXP), subjects covertly directed attention to the target location expecting the occurrences of the presentations. Blocks with expectation and attended presentations were tested: EXP-SEQ(attended)-SIM-EXP-SIM(attended)-SEQ. During the EXP period, activity increased in the absence of visual presentations and further increased after the onset of visual stimuli.

Numerical Simulations

The dynamical evolution of activity at the cortical area level, as evidenced in the behaviour of fMRI signals in experiments with humans, can be simulated in the framework of the present model by integrating the pool activity in a given area over space and time. The integration over space yields an average activity of the considered brain area at a given time. With respect to the integration over time, it is performed in order to simulate the temporal resolution of fMRI experiments. In this section we simulate fMRI signals from V4 under the experimental conditions defined by Kastner et al. (Kastner et al., 1999). We use the same parameters as in the last section but the V1 hypercolumns now include three levels of spatial resolution (K = 3). Let us remark that for this particular case, the IT module as well as the v46 area are not explicitly needed for the computational simulations.

In order to simulate the data by Kastner et al., we also use four complex images similar to the ones these authors used in their work. These images were presented as input images in four nearby locations in the upper-right quadrant. The neurodynamics (as described in the Appendix) is solved through an iterative process in which we choose 200 iterations to represent 1 s (which corresponds to the time resolution of the fMRI measurements).
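The integration over space and time can be sketched as two successive averages. This is a minimal illustration under stated assumptions: the function name `simulated_fmri`, the use of plain means for both integrations and the toy activity array are not taken from the paper; only the figure of 200 iterations per second is.

```python
import numpy as np

ITER_PER_S = 200   # 200 iterations represent 1 s

def simulated_fmri(activity, iters_per_sample=ITER_PER_S):
    """activity: array of shape (n_iterations, n_pools) for one brain area.

    Returns one value per sample window of `iters_per_sample` iterations.
    """
    area_mean = activity.mean(axis=1)                  # integrate over space (all pools)
    n_samples = area_mean.size // iters_per_sample
    trimmed = area_mean[: n_samples * iters_per_sample]
    # integrate over time within each window
    return trimmed.reshape(n_samples, iters_per_sample).mean(axis=1)

# Toy activity: 1 s at level 1 followed by 1 s at level 3, for five pools.
toy = np.concatenate([np.ones((200, 5)), 3.0 * np.ones((200, 5))])
signal = simulated_fmri(toy)
```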

Stimuli were shown in the two above-mentioned conditions: sequential and simultaneous. In the SEQ condition, stimuli were presented separately in one of the four locations for 250 ms. In the SIM condition, the four stimuli appeared simultaneously for 250 ms (see Fig. 4a) and with equal blank intervals between each other. The order of the stimuli and location was randomized.
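The two presentation conditions can be written as an on/off schedule per location. The sketch below assumes that, in the SEQ condition, the four 250 ms presentations fill one 1 s period back to back in random order; the function name `one_second_schedule` and this exact timing are illustrative assumptions, while the 250 ms duration, the four locations and the 200 iterations per second come from the text.

```python
import random

ITER_PER_S = 200            # 200 iterations represent 1 s
ON = ITER_PER_S // 4        # 250 ms corresponds to 50 iterations
N_LOC = 4                   # four nearby stimulus locations

def one_second_schedule(condition, rng):
    """Return ITER_PER_S rows; row[t][loc] is True while location loc shows a stimulus."""
    sched = [[False] * N_LOC for _ in range(ITER_PER_S)]
    if condition == "SIM":
        for t in range(ON):                      # all four stimuli together, then blank
            sched[t] = [True] * N_LOC
    elif condition == "SEQ":
        order = list(range(N_LOC))
        rng.shuffle(order)                       # randomized order of locations
        for slot, loc in enumerate(order):
            for t in range(slot * ON, (slot + 1) * ON):
                sched[t][loc] = True             # one stimulus at a time
    else:
        raise ValueError(f"unknown condition: {condition}")
    return sched

rng = random.Random(0)
seq = one_second_schedule("SEQ", rng)
sim = one_second_schedule("SIM", rng)
```

In both schedules every location is stimulated for the same 250 ms per second, so any difference in the simulated signal reflects the interaction between simultaneously present stimuli rather than the amount of stimulation.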

Two attentional conditions were simulated: an unattended condition, during which no external top-down bias from prefrontal areas was present (i.e. $I_{ij}^{PP,A}$ is zero everywhere) and an attended condition that was defined 10 s before the onset of visual presentations (expectation period EXP) and continued during the subsequent 10 s block. The attended condition was implemented by setting $I_{ij}^{PP,A}$ equal to 0.07 for the locations associated with the lower left stimulus and zero elsewhere. Figure 5 shows the results of our computational simulations for the following sequence of simulation blocks: BLK-EXP-SEQ(attended)-BLK-SIM-BLK-EXP-SIM(attended). As in the experiments of Kastner et al. (Kastner et al., 1999) (see area V4 in Fig. 4b), these simulations show that the fMRI signals were smaller in magnitude during the SIM than during the SEQ presentations in the unattended conditions because of the mutual suppression induced by competitively interacting stimuli. On the other hand, the average fMRI signals with attention increased more strongly for simultaneously presented stimuli than the corresponding ones for sequentially presented stimuli. Thus, the suppressive interactions were partially cancelled out by attention. Finally, during the expectation period, activity increased in the absence of visual presentations and further increased after the onset of visual stimuli.

In Figure 6 we plot the mean theoretical activities obtained by averaging across 50 computer simulations for each of the conditions already mentioned. The mean signal changes averaged across subjects and measured in V4 by Kastner et al. are also shown in the figure. Our theoretical results are normalized to the experimental value corresponding to the SEQ (unattended) condition. The results can be analysed in the same way as those of the previous figure (Fig. 5), and similar conclusions can be drawn with respect to the role of attention. We observe that our theoretical data describe quite well the qualitative behaviour of the experiments. The quantitative differences found between the simulated and empirical data are due to the numerical values of the parameters used in the model. Closer results could be obtained by adjusting these parameters. However, our intention is only to give a qualitative description of the data; achieving quantitative agreement is beyond the scope of the present work.
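The averaging and normalization step amounts to a single rescaling, sketched below. The function names, the condition labels and all numbers are placeholders chosen for illustration; they are not results of the model or of Kastner et al.

```python
from statistics import fmean

def mean_over_runs(runs):
    """runs: dict mapping condition name -> list of per-simulation signal values."""
    return {cond: fmean(values) for cond, values in runs.items()}

def normalize_to_reference(sim_means, exp_reference, ref_cond="SEQ_unattended"):
    """Rescale all simulated means so that the reference condition equals exp_reference."""
    scale = exp_reference / sim_means[ref_cond]
    return {cond: value * scale for cond, value in sim_means.items()}

# Placeholder values only (three runs per condition instead of 50).
runs = {
    "SEQ_unattended": [0.21, 0.19, 0.20],
    "SIM_unattended": [0.11, 0.09, 0.10],
}
sim_means = mean_over_runs(runs)
normalized = normalize_to_reference(sim_means, exp_reference=1.0)
```

A single common scale factor leaves the ratios between conditions untouched, which is why this normalization allows a qualitative comparison with the measured signal changes without fitting the individual conditions.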

D. Theoretical Prediction

In this section we describe a novel, testable prediction that follows from our model. It is concerned with feature attention, where the IT module as well as the v46 area of the theoretical model will be explicitly involved. The experiment we propose consists of four different parts where in each part the fMRI signal should be measured. In the first part, an image containing two objects (object 1 and object 2) is presented to the subject. No attention condition is required in this case. After a blank period, where no image is presented, an attention condition (attend to object 1) is indicated 10 s before the onset of visual presentations (second part of the experiment: the expectation period). In the third part, the same first image (objects 1 and 2) is presented, now having the subject attend to object 1. Finally, in the fourth part, a second image is presented (containing object 2 and object 3), once more having the subject attend to object 1. According to our theoretical prediction, an increase of the fMRI signal corresponding to the feature attention case will be observed with respect to the no-attention condition. For the second image (with attention condition but without the presence of the attended element), no important increase will be noted, again compared to the first image with no-attention condition. An increase in response among neurons that respond to the attended but absent stimulus is also expected (expectation period).

In order to simulate numerically the proposed experiment, in a first step, the network is trained so that object 1 (required to implement the attention condition) can be learnt. Hebbian learning is used for training. We simulate the experiment repeating the SIM condition already mentioned in the above sections: the first image (containing objects 1 and 2) was presented for 250 ms followed by a blank period of 750 ms. The images of objects 1–3 are similar to the ones used in the experiment of Kastner et al. Two attentional conditions were tested: an unattended (UNATT) and an attended (ATT) one. For the attended condition, we also presented a second image (objects 2 and 3) for 250 ms followed by the blank period of 750 ms. Each of these simulation periods of 1 s was repeated in blocks of 10 s interleaved with blank periods. For the ATT condition, the external attentional object-specific bias in IT, $I_c^{IT,E}$, is set so that only the pool c corresponding to object 1 receives a positive bias ($I_c^{IT,E}$ = 0.018), while the external attentional location-specific bias in PP, $I_{ij}^{PP,A}$, is set equal to zero everywhere. For the expectation period, the same ATT condition is used but in the absence of visual presentations. During the UNATT condition, no external top-down bias from prefrontal areas was present (i.e. $I_c^{IT,E}$ is zero everywhere). Our theoretical results are presented in Figure 7. We can observe from the figure that implementing feature attention increases the corresponding activity, compared with the unattended case. The top-down bias introduced to the ventral IT module is responsible for elevating the neural response. This top-down bias is also responsible, during the expectation period, for the increase observed in the activity even in the absence of visual presentations. In the fourth part of the experiment, the activity remains at the same level as in the first part since the attended object is no longer present in the image.

Therefore, within our present formulation, object attention emerges when a top-down bias is introduced to the ventral IT module. A similar conclusion was drawn for Kastner et al.'s experimental results: the average fMRI signals with attention increased more strongly for simultaneously presented stimuli than the corresponding ones for sequentially presented stimuli. The suppressive interactions (present in the simultaneous condition) were partially cancelled out by attention. For this particular case, the spatial mode of attention emerges when a top-down bias is introduced to the dorsal PP module. Thus, in this framework, attention is produced by a simple top-down bias communicated from the executive control regions of the brain (e.g. prefrontal cortex) to the dorsal stream or the ventral stream.

Discussion

In the present work, we followed a computational neuroscience approach in order to study the role of attention in visual perception.

The aim of this paper was to provide a mathematical formulation that unifies the microscopic, mesoscopic and macroscopic mechanisms involved in brain function, allowing the description of the existing experimental data (and the prediction of new results as well) at all neuroscience levels (psychophysics, functional brain imaging and single-cell measurements).

We have focused on the analysis of the microscopic neurodynamical mechanisms that underlie visual attention. We presented a computational system that consists of interconnected populations of cortical neurons distributed in different brain modules, which in turn can be related to the different areas of the dorsal and ventral paths of the cortex. Competitive mechanisms were implemented by connecting the pools of a given module with a common inhibitory pool. In this way, the more pools of the module are active, the more active the common inhibitory pool will be and, consequently, the pools in the module will experience more feedback inhibition, such that only the most excited group of pools will survive and win the competition. On the other hand, external top-down bias could shift the competition in favour of a specific group of pools. Therefore, this basic computational model implements the biased competition hypothesis. Taking into account the computational role of individual brain areas and their mutual interactions, the macroscopic phenomenological behaviour arises as the result of the global dynamical interactions between the different modules.
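The competition mechanism described in this paragraph can be reproduced in a toy system of two excitatory pools and one common inhibitory pool. The sketch below is not the model of the Appendix: the threshold-linear response function, the Euler integration, the coupling constants `a`, `b`, `c` and the stimulus strengths are all assumptions chosen so that the fixed points can be checked by hand; only the bias value 0.07 is borrowed from the simulations.

```python
import numpy as np

def simulate(stimulus, bias, a=0.5, b=1.0, c=0.5, dt_over_tau=0.05, n_steps=2000):
    """Euler-integrate excitatory pools coupled through one common inhibitory pool.

    Assumed dynamics (threshold-linear, not the paper's response function):
      tau dA_i/dt = -A_i + max(a*A_i - b*A_inh + stimulus_i + bias_i, 0)
      tau dA_inh/dt = -A_inh + max(c * sum_i A_i, 0)
    """
    stimulus = np.asarray(stimulus, dtype=float)
    bias = np.asarray(bias, dtype=float)
    exc = np.zeros_like(stimulus)    # excitatory pool activities
    inh = 0.0                        # common inhibitory pool activity
    for _ in range(n_steps):
        drive = np.maximum(a * exc - b * inh + stimulus + bias, 0.0)
        inh_drive = max(c * exc.sum(), 0.0)
        exc = exc + dt_over_tau * (-exc + drive)
        inh = inh + dt_over_tau * (-inh + inh_drive)
    return exc

alone = simulate([1.0, 0.0], [0.0, 0.0])       # one stimulus, no attention
pair = simulate([1.0, 1.0], [0.0, 0.0])        # two stimuli, no attention
attended = simulate([1.0, 1.0], [0.07, 0.0])   # two stimuli, bias on pool 0
```

With these assumed parameters the first pool settles at 1 when stimulated alone, drops to 2/3 when the second stimulus is added, and recovers to 0.76 when it receives the top-down bias, while the unattended pool falls to 0.62. The bias therefore partially cancels the mutual suppression, which is the qualitative signature of biased competition.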

The theoretical framework of neurodynamics is used here to integrate both known experimental phenomena and hypotheses. Neurodynamics offers a quantitative formulation for describing the dynamical evolution of single neurons, neural networks and coupled modules of neural networks. The formulation thus enables us to explain and predict the dynamical evolution of single neurons as well as the functional interaction between cortical areas and psychophysical behaviour (Deco and Zihl, 2001). Our neurodynamical model has been applied to simulate electrophysiological as well as fMRI data.

Finally, let us mention that two apparently different processes, spatial attention and object attention, can be accounted for by a unitary system, as proposed and shown in this paper.

The two modes of attention emerge depending simply on whether a top-down bias is introduced to either the dorsal PP module (as in the experiment by Kastner et al.) or the ventral IT module (as in the theoretical prediction here proposed).

To summarize, the dynamical evolution of the firing activity at the neural level of our cortical architecture is consistent with the biased competition hypothesis as it is suggested by the single-cell experiments of Reynolds et al. As for the fMRI measurements, our results demonstrate that our cortical architecture at the neurophysiological level of brain area activity shows the typical dynamical competition and attentional modulation effects.

Appendix: Mathematical Formulation of the Model

We describe in this section the mathematical formulation of the cortical model of visual attention used in this paper.

We consider a pixelized grey-scale image given by an N × N matrix \(I^{orig}_{ij}\). The subindex ij denotes the spatial position of the pixel. Each pixel has a grey value coded between 0 (black) and 255 (white). The first pre-processing step removes the DC component of the image, which is probably done in the lateral geniculate nucleus (LGN) of the thalamus. We denote this LGN representation by the N × N matrix \(\Gamma_{ij}\). In our experiments, the images were pixelized in a 66 × 66 matrix (N = 66) for the single-cell recording simulations and in a 64 × 64 matrix for the fMRI simulations. Feedforward connections to a layer in V1 perform the extraction of simple features. Following Daugman (Daugman, 1988) and Marcelja (Marcelja, 1980), simple cells in the primary visual cortex are modelled by 2-D Gabor functions. The Gabor receptive fields have five degrees of freedom (2-D location of the receptive field's centre, size of the receptive field, orientation and symmetry) and are given by the product of an elliptical Gaussian and a complex plane wave. Daugman (Daugman, 1988) has proposed that an ensemble of simple cells can be better modelled as a family of 2-D Gabor wavelets. The experimental neurophysiological constraints (Kulikowski and Bishop, 1981; DeValois et al., 1982; Webster and De Valois, 1985; De Valois and De Valois, 1988) should also be taken into account. Lee (Lee, 1996) derived a family of discrete wavelets that satisfies both wavelet theory and the neurophysiological constraints for simple cells, and we use such a representation in our formulation.
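
The DC-removal step can be illustrated with a minimal sketch. It assumes that removing the DC component amounts to subtracting the mean grey value of the whole image from every pixel; the function name `remove_dc` and the 4 × 4 test image are hypothetical and serve only as an illustration.

```python
def remove_dc(image):
    """Subtract the mean grey value from every pixel of a square image."""
    n = len(image)
    mean = sum(sum(row) for row in image) / (n * n)
    return [[pixel - mean for pixel in row] for row in image]

# Hypothetical 4 x 4 checkerboard with grey values in [0, 255].
image = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]
gamma = remove_dc(image)  # plays the role of the LGN representation
```

After this step the pixel values of `gamma` sum to zero, so a uniform change in overall brightness does not alter the representation.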

The neurons in the pools in V1 have receptive fields performing a Gabor wavelet transform. Let us denote by \(I^{V1,E}_{kpql}\) the sensory input activity to a pool in V1 that is sensitive to a spatial frequency at octave k, to a preferred orientation defined by the rotation index l, and to stimuli at the centre location specified by the indices pq. The sensory input activity to a pool in V1 is therefore defined by the modulus of the convolution between the corresponding receptive field and the image. The large receptive fields of V2 and V4 can be approximately taken into account by including in V1 pools with receptive fields corresponding to several octaves of the 2-D Gabor wavelet transform (i.e. not only the typical narrow receptive fields of V1 but also larger receptive fields are included in V1). Therefore, for the fMRI simulations we merge the V1 and V2–V4 modules into a single module, namely V1. The reduced system connects all cell assemblies in V1 with all cell assemblies in IT. However, in order to simulate the experiment of Reynolds et al. and to be able to compare with their data, we include a V2 pool that is directly connected to V1.
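
As a rough illustration of the sensory input to one V1 pool, the sketch below evaluates the modulus of a complex Gabor response at a single centre location. It uses a generic Gabor function (elliptical Gaussian times complex plane wave), not the specific wavelet family of Lee (1996); the function names, the kernel size, `sigma`, `aspect` and the test grating are all assumptions made for the example.

```python
import cmath
import math

def gabor_kernel(half, sigma, omega, theta, aspect=0.5):
    """Complex Gabor patch: elliptical Gaussian times a complex plane wave."""
    kernel = {}
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # Rotate coordinates so the wave travels along direction theta.
            xr = dx * math.cos(theta) + dy * math.sin(theta)
            yr = -dx * math.sin(theta) + dy * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (aspect * yr) ** 2) / (2 * sigma ** 2))
            kernel[(dy, dx)] = envelope * cmath.exp(1j * omega * xr)
    return kernel

def pool_input(image, p, q, kernel):
    """Modulus of the filter response centred on row p, column q."""
    total = 0j
    for (dy, dx), weight in kernel.items():
        total += weight * image[p + dy][q + dx]
    return abs(total)

# Hypothetical stimulus: a grating that varies along the columns only.
n = 16
omega = math.pi / 2
grating = [[math.cos(omega * col) for col in range(n)] for _ in range(n)]

matched = pool_input(grating, 8, 8, gabor_kernel(4, 2.0, omega, 0.0))
orthogonal = pool_input(grating, 8, 8, gabor_kernel(4, 2.0, omega, math.pi / 2))
```

A pool whose preferred orientation matches the grating receives a much larger input than a pool tuned to the orthogonal orientation, which is the selectivity the index l is meant to capture. Taking the modulus makes the input insensitive to the phase of the grating.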

Let us now define the neurodynamical equations that regulate the evolution of the whole system. The activity level of the input current in the V1 module is given by 

(1)
\begin{eqnarray*}&&{\tau}\frac{{\partial}}{{\partial}t}I^{V\mathrm{1}}_{kpql}(t)\ =\ {-}I^{V\mathrm{1}}_{kpql}(t)\ {+}\ aF\left(I^{V\mathrm{1}}_{kpql}(t)\right)\ {-}\ bF\left(I^{V\mathrm{1},I}_{k}(t)\right)\ {+}\ I^{V\mathrm{1},E}_{kpql}(t)\\&&{+}\ I^{V\mathrm{1}{-}PP}_{pq}(t)\ {+}\ I^{V\mathrm{1}{-}IT}_{kpql}(t)\ {+}\ I_{0}\ {+}\ {\nu}\end{eqnarray*}
where the attentional biasing due to the intermodular ‘where’ connections with the pools in the PP module, \(I^{V1-PP}_{pq}\), is given by 
(2)
\[I^{V\mathrm{1}{-}PP}_{pq}(t)\ =\ {{\sum}_{i,j}}W_{pqij}F\left(I^{PP}_{ij}(t)\right)\]
and the coefficients \(W_{pqij}\) are evaluated as 
(3)
\[W_{pqij}=K_{V\mathrm{1}PP}{\cdot}\mathrm{exp}\left[\frac{{-}\mathrm{dist}^{2}}{\mathrm{2}S^{2}}\right]\]
where \(K_{V1PP}\) is the coupling constant between the two modules, S = 2, and dist is the distance from the spatial location (i, j) to the position of the receptive field (p, q).
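
Equations (2) and (3) can be sketched as follows. The value of the coupling constant and the use of the Euclidean distance on the pixel grid are assumptions, since neither is specified in this section; `pp_rates` stands for the already computed outputs F(I^PP_ij), and the 5 × 5 grid is hypothetical.

```python
import math

K_V1PP = 0.1   # hypothetical coupling constant; not given in this section
S = 2.0        # width of the Gaussian, as stated in the text

def weight(p, q, i, j):
    """Equation (3): Gaussian fall-off with the distance between (i, j) and (p, q)."""
    dist_sq = (p - i) ** 2 + (q - j) ** 2   # assumed Euclidean distance, squared
    return K_V1PP * math.exp(-dist_sq / (2 * S ** 2))

def bias_from_pp(p, q, pp_rates):
    """Equation (2): weighted sum of PP pool outputs arriving at V1 site (p, q)."""
    return sum(
        weight(p, q, i, j) * rate
        for i, row in enumerate(pp_rates)
        for j, rate in enumerate(row)
    )

# Hypothetical PP output: a single active pool at location (2, 2) on a 5 x 5 grid.
pp_rates = [[0.0] * 5 for _ in range(5)]
pp_rates[2][2] = 1.0

near = bias_from_pp(2, 2, pp_rates)   # V1 site aligned with the active PP pool
far = bias_from_pp(2, 4, pp_rates)    # V1 site two pixels away
```

The bias is strongest for V1 pools whose receptive fields sit at the attended location and decays smoothly with distance, which is how a spatial bias in PP can favour the V1 pools at one location.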

Also, in equation (1), \(I_{0}\) is a diffuse spontaneous background input, ν is Gaussian noise and \(I^{V1-IT}_{kpql}\) represents the attentional biasing due to the intermodular ‘what’ connections with the pools in the temporal module IT, defined by 

(4)
\[I^{V\mathrm{1}{-}IT}_{kpql}(t)={{\sum}_{c=\mathrm{1}}^{c=C}}w_{ckpql}F\left(I^{IT}_{c}(t)\right)\]
where \(w_{ckpql}\) is the connection strength between a pool in V1 and a pool in IT, corresponding to the coding of a specific object category c. We assume that the IT module has C pools corresponding to different object categories.

Excitatory cell pools in each module are engaged in competition, mediated by an inhibitory pool that receives excitatory input from all the excitatory pools and provides uniform inhibitory feedback to each of the excitatory pools. The current activity of the inhibitory pools \(I^{V1,I}_{k}\) obeys the following equation 

(5)
\[{\tau}\frac{{\partial}}{{\partial}t}I^{V\mathrm{1},I}_{k}(t)\ =\ {-}I^{V\mathrm{1},I}_{k}(t)\ {+}\ c{^\prime}{{\sum}_{p,q,l}}F\left(I^{V\mathrm{1}}_{kpql}(t)\right)\ {-}\ \mathrm{d}F\left(I^{V\mathrm{1},I}_{k}(t)\right)\]
Similarly, the current activity of the excitatory pools in the PP module is given by 
(6)
\begin{eqnarray*}&&{\tau}\frac{{\partial}}{{\partial}t}I^{PP}_{ij}(t)\ =\ {-}I^{PP}_{ij}(t)\ {+}\ aF\left(I^{PP}_{ij}(t)\right)\ {-}\ bF\left(I^{PP,I}(t)\right)\\&&{+}\ I^{PP{-}V\mathrm{1}}_{ij}(t)\ {+}\ I^{PP,A}_{ij}(t)\ {+}\ I_{0}\ {+}\ {\nu}\end{eqnarray*}
where \(I^{PP,A}_{ij}\) is an external attentional spatial-specific top-down bias; the intermodular attentional biasing \(I^{PP-V1}_{ij}\) through the connections with the pools in the module V1 is 
(7)
\[I^{PP{-}V\mathrm{1}}_{ij}(t)\ =\ {{\sum}_{k,p,q,l}}W_{pqij}F\left(I^{V\mathrm{1}}_{kpql}(t)\right)\]
and the activity current of the common PP inhibitory pool evolves according to 
(8)
\[{\tau}\frac{{\partial}}{{\partial}t}I^{PP,I}(t)\ =\ {-}I^{PP,I}(t)\ {+}\ c{^\prime}{{\sum}_{i,j}}F\left(I^{PP}_{ij}(t)\right)\ {-}\ \mathrm{d}F\left(I^{PP,I}(t)\right)\]
The dynamics of the inferotemporal module IT is given by 
(9)
\begin{eqnarray*}&&{\tau}\frac{{\partial}}{{\partial}t}I_{c}^{IT}(t)\ =\ {-}I_{c}^{IT}(t)\ {+}\ aF\left(I_{c}^{IT}(t)\right)\ {-}\ bF\left(I^{IT,I}(t)\right)\\&&{+}\ I_{c}^{IT{-}V\mathrm{1}}(t)\ {+}\ I_{c}^{IT,E}(t)\ {+}\ I_{0}\ {+}\ {\nu}\end{eqnarray*}
where \(I^{IT,E}_{c}\) denotes an external attentional object-specific top-down bias, and the intermodular attentional biasing between IT and V1 pools is 
(10)
\[I_{c}^{IT{-}V\mathrm{1}}(t)={{\sum}_{k,p,q,l}}w_{ckpql}F\left(I_{kpql}^{V\mathrm{1}}(t)\right)\]
where the weights \(w_{ckpql}\) are trained by Hebbian learning.
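
The training procedure is not detailed here. Purely as an illustration of what a Hebbian rule does, the sketch below applies one step of the textbook update Δw = η · post · pre. The learning rate, the flattening of the index kpql into a single list and the example rates are all hypothetical, and the rule shown is not necessarily the one used to train the model.

```python
def hebbian_update(weights, v1_rates, it_rates, learning_rate=0.01):
    """Return the weights after one plain Hebbian step: dw = rate * post * pre."""
    return [
        [w + learning_rate * post * pre for w, pre in zip(row, v1_rates)]
        for row, post in zip(weights, it_rates)
    ]

v1_rates = [1.0, 0.0, 0.5]   # flattened V1 pool outputs (index kpql collapsed)
it_rates = [1.0, 0.0]        # IT pool outputs, one per object category c
weights = [[0.0] * 3 for _ in range(2)]
weights = hebbian_update(weights, v1_rates, it_rates)
```

Only connections between co-active V1 and IT pools are strengthened, so each IT pool ends up linked to the V1 feature pattern of its own object category.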

Finally, the activity current of the common IT inhibitory pool evolves according to 

(11)
\[{\tau}\frac{{\partial}}{{\partial}t}I^{IT,I}(t)\ =\ {-}I^{IT,I}(t)\ {+}\ c{^\prime}{{\sum}_{c}}F\left(I^{IT}_{c}(t)\right)\ {-}\ \mathrm{d}F\left(I^{IT,I}(t)\right)\]
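
The competition mechanism shared by all modules can be illustrated with a toy version of the excitatory and inhibitory equations: two excitatory pools, one common inhibitory pool, forward-Euler integration and no noise term. This is a minimal sketch under stated assumptions, not the simulation code of the model: the response function `rate` is a placeholder for F (which is not defined in this appendix), and all parameter values, the step size and the input strengths are hypothetical.

```python
def rate(current):
    """Placeholder response function F (hypothetical; saturates at 1)."""
    return current / (1.0 + current) if current > 0.0 else 0.0

def simulate(bias, steps=2000, dt_over_tau=0.05):
    """Forward-Euler integration of two excitatory pools and one inhibitory pool."""
    a, b, c, d = 0.95, 0.8, 2.0, 0.1   # hypothetical coupling values
    background = 0.025                  # hypothetical diffuse input I_0
    stimulus = [0.5, 0.5]               # equal sensory input to both pools
    exc = [0.0, 0.0]
    inh = 0.0
    for _ in range(steps):
        inh_rate = rate(inh)
        exc_rates = [rate(x) for x in exc]
        # Excitatory pools: leak, self-excitation, common inhibition, inputs.
        d_exc = [
            -exc[n] + a * exc_rates[n] - b * inh_rate
            + stimulus[n] + bias[n] + background
            for n in range(2)
        ]
        # Inhibitory pool: leak, drive from all excitatory pools, self-inhibition.
        d_inh = -inh + c * sum(exc_rates) - d * inh_rate
        exc = [exc[n] + dt_over_tau * d_exc[n] for n in range(2)]
        inh += dt_over_tau * d_inh
    return exc

unbiased = simulate(bias=[0.0, 0.0])   # no top-down bias: the pools stay equal
biased = simulate(bias=[0.2, 0.0])     # top-down bias applied to the first pool
```

Without a bias the two pools receive identical input and remain at the same activity level. With a top-down bias on the first pool, that pool ends at a higher current than its competitor, while both continue to share the same inhibitory feedback.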

Notes

We wish to thank the referees for their critical and fruitful comments, and M. Stetter and V. Tresp for a critical reading of the manuscript. S.C. acknowledges financial support from Consejo Nacional de Investigaciones Científicas y Técnicas and Universidad Nacional de Rosario, Argentina.

Figure 1.

Architecture of the neurodynamical approach. The system is essentially composed of six modules structured such that they resemble the two known main visual paths of the visual cortex. Information from the retino-geniculo-striate pathway enters the visual cortex through area V1 in the occipital lobe and proceeds into two processing streams. The occipito-temporal stream leads ventrally through V2–V4 and IT, and is mainly concerned with object recognition. The occipito-parietal stream leads dorsally into PP and is responsible for maintaining a spatial map of an object's location.

Figure 2.

Competitive interactions and attentional modulation of responses of single neurons in area V2. (a) Inhibitory suppression by the probe and attentional compensation, (b) excitatory reinforcement by the probe and attentional compensation. Adapted from Reynolds et al. (Reynolds et al., 1999). Black horizontal bar at the bottom indicates stimulus duration.

Figure 3.

Computer simulations of the responses of single neurons in area V2 in the experiments of Reynolds et al. (Reynolds et al., 1999). (a) Inhibitory suppression by the probe and attentional compensation, (b) excitatory reinforcement by the probe and attentional compensation. Black horizontal bars at the bottom indicate stimulus duration.

Figure 4.

fMRI signals in visual cortex averaged over all subjects. (a) Task. Subjects fixated a spot while stimuli appeared either asynchronously (left images, SEQ) or simultaneously (right images, SIM). The total amount of time each stimulus appeared was constant. Subjects either performed an attentionally demanding task at fixation or attended to one of the four stimuli. (b) fMRI signals. In area V1, where receptive fields are too small to include more than one stimulus, the presence of additional stimuli did not suppress neuronal responses, and attention had very little effect. By contrast, in area V4, where receptive fields are large enough to include multiple stimuli, presenting the stimuli simultaneously caused a suppressed response (see the reduced magnitude of the two middle peaks ‘SIM SIM’). Grey shaded areas indicate the expectation period, striped areas the attended presentations and blocks without shading correspond to the unattended condition. Adapted from Kastner et al. (Kastner et al., 1999).

Figure 5.

Computer simulations of fMRI signals in visual cortex based on the experiments of Kastner et al. (Kastner et al., 1999). Grey shaded areas indicate the expectation period, striped areas the attended presentations and blocks without shading correspond to the unattended condition.

Figure 6.

Mean theoretical activities averaged across 50 computer simulations. The mean signal changes in V4 averaged across subjects, measured by Kastner et al. (Kastner et al., 1999), are also included in the figure.

Figure 7.

Theoretical prediction that follows from our model. The experiment consists of four parts and is concerned with feature attention. In the first part, an image containing two objects (objects 1 and 2) is presented to the subject. No attention condition is required in this case. After a blank period, where no image is presented, an attention condition (attend to object 1) is indicated 10 s before the onset of visual presentations (second part of the experiment: the expectation period). In the third part, the same first image (objects 1 and 2) is presented, now having the subject attend to object 1. Finally, in the fourth part, a second image is presented (containing objects 2 and 3), once more having the subject attend to object 1. We show our numerical simulations for the corresponding activity of area V4.

2 Permanent address: Instituto de Física Rosario, Consejo Nacional de Investigaciones Científicas y Técnicas and Universidad Nacional de Rosario, Argentina

References

Abbott L (1992) Firing rate models for neural populations. In: Neural networks: from biology to high energy physics (Benhar O, Bosio C, Giudice P, Tabet E, eds). Pisa: ETS Editrice.
Amit D, Tsodyks M (1991) Quantitative study of attractor neural network retrieving at low spike rates: I. Substrate spikes, rates and neuronal gain. Network 2:259–273.
Chelazzi L (1999) Serial attention mechanisms in visual search: a critical look at the evidence. Psychol Res 62:195–219.
Chelazzi L, Desimone R (1994) Responses of V4 neurons during visual search. Soc Neurosci 20:1054.
Chelazzi L, Miller E, Duncan J, Desimone R (1993) A neural basis for visual search in inferior temporal cortex. Nature 363:345–347.
Daugman J (1988) Complete discrete 2D-Gabor transforms by neural networks for image analysis and compression. IEEE Trans Acoust Speech Signal Process 36:1169–1179.
Deco G, Zihl J (1998) A neural model of binding and selective attention for visual search. In: Proceedings of the 5th Neural Computation and Psychology Workshop (NCPW’98), University of Birmingham, England, September 1998 (von Heinke D, Humphreys GW, Olson A, eds), pp. 262–271. London: Springer.
Deco G, Zihl J (2001) Top-down selective visual attention: a neurodynamical approach. Vis Cogn 8:118–139.
Desimone R, Duncan J (1995) Neural mechanisms of selective visual attention. Annu Rev Neurosci 18:193–222.
Desimone R, Wessinger M, Thomas L, Schneider W (1990) Attentional control of visual perception: cortical and subcortical mechanisms. Cold Spring Harb Symp Quant Biol 55:963–971.
De Valois R, De Valois K (1988) Spatial vision. New York: Oxford University Press.
De Valois R, Albrecht D, Thorell L (1982) Spatial frequency selectivity of cells in macaque visual cortex. Vision Res 22:545–559.
Duncan J (1996) Cooperating brain systems in selective perception and action. In: Attention and performance XVI (Inui T, McClelland JL, eds), pp. 549–578. Cambridge, MA: MIT Press.
Duncan J, Humphreys G (1989) Visual search and stimulus similarity. Psychol Rev 96:433–458.
Hamker F (1998) The role of feedback connections in task-driven visual search. In: Proceedings of the 5th Neural Computation and Psychology Workshop (NCPW’98), University of Birmingham, England, September 1998 (von Heinke D, Humphreys GW, Olson A, eds), pp. 252–261. London: Springer Verlag.
Kandel E, Schwartz J, Jessell T (1991) The principles of neural science, 3rd edn. Norwalk, CT: Appleton and Lange.
Kastner S, De Weerd P, Desimone R, Ungerleider L (1998) Mechanisms of directed attention in the human extrastriate cortex as revealed by functional MRI. Science 282:108–111.
Kastner S, Pinsk M, De Weerd P, Desimone R, Ungerleider L (1999) Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron 22:751–761.
Kulikowski J, Bishop P (1981) Fourier analysis and spatial representation in the visual cortex. Experientia 37:160–163.
Lee TS (1996) Image representation using 2D Gabor wavelets. IEEE Trans Pattern Anal Machine Intell 18(10):959–971.
Luck S, Chelazzi L, Hillyard S, Desimone R (1997) Neural mechanisms of spatial selective attention in areas V1, V2 and V4 of macaque visual cortex. J Neurophysiol 77:24–42.
Marcelja S (1980) Mathematical description of the responses of simple cortical cells. J Opt Soc Am 70:1297–1300.
McAdams C, Maunsell J (1999) Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. J Neurosci 19:431–441.
Moran J, Desimone R (1985) Selective attention gates visual processing in the extrastriate cortex. Science 229:782–784.
Reynolds J, Desimone R (1999) The role of neural mechanisms of attention in solving the binding problem. Neuron 24:19–29.
Reynolds J, Chelazzi L, Desimone R (1999) Competitive mechanisms subserve attention in macaque areas V2 and V4. J Neurosci 19:1736–1753.
Spitzer H, Desimone R, Moran J (1988) Increased attention enhances both behavioral and neuronal performance. Science 240:338–340.
Treisman A (1982) Perceptual grouping and attention in visual search for features and for objects. J Exp Psychol Hum Percept Perform 8:194–214.
Treisman A (1988) Features and objects: the fourteenth Bartlett memorial lecture. Q J Exp Psychol A 40:201–237.
Usher M, Niebur E (1996) Modeling the temporal dynamics of IT neurons in visual search: a mechanism for top-down selective attention. J Cogn Neurosci 8:311–327.
Webster M, De Valois R (1985) Relationships between spatial frequency and orientation tuning of striate cortex cells. J Opt Soc Am A2, no. 7.
Wilson H, Cowan S (1972) Excitatory and inhibitory interactions in localised populations of model neurons. Biol Cybern 12:1–24.