## Abstract

We study the time evolution of a neural network model as it learns the three stages of a visual delayed-matching-to-sample (DMS) task: identification of the sample, retention during delay, and matching of sample and target while ignoring distractors. We introduce a neurobiologically plausible, uncommitted architecture comprising an ‘executive’ subnetwork that gates connections to and from a ‘working’ layer. The network learns DMS by reinforcement: reward-dependent synaptic plasticity generates task-dependent behaviour. During learning, working layer cells exhibit stimulus specialization and increasingly tuned firing. Top-down activity emerges, reproducing aspects of the control exerted by prefrontal cortex over activity in the visual areas of inferior temporal cortex. We observe that neural systems are labile during learning and tend to encode spurious associations. Executive areas are instrumental in preventing such associations during learning; they are also essential for the ‘mature’ network to keep passing DMS. In the mature model, the working layer functions as a short-term memory. The mature system is remarkably robust against cell damage, and its performance degrades gracefully as damage increases. The model underlines that executive systems, which regulate the flow of information between working memory and sensory areas, are required for passing tests such as DMS. At the behavioural level, the model makes testable predictions about the errors expected from subjects learning DMS.

## Introduction

Delayed-response tasks are efficient behavioural tests used to investigate basic cognitive functions (Hunter, 1913; Jacobsen, 1935). In the test considered here, the delayed-matching-to-sample (DMS) task, a subject is asked to retain an image during a short delay period and then select it from among an array of distractor images.

Lesion studies indicate that the prefrontal (PF) and inferior temporal (IT) cortices play crucial roles in such tasks. Damage to IT cortex, a higher visual area, interferes with performance of the task for visual stimuli (for humans, see Kimura, 1963; Milner, 1968; for monkeys, see Gaffan and Weiskrantz, 1980; Fuster et al., 1981; Mishkin, 1982; Sahgal et al., 1983), whereas lesions of PF cortex still allow successful completion of the task, but only for short delay periods (Goldman et al., 1971; Mishkin and Manning, 1978). This indicates that PF cortex contributes to the retention of the sample image in working memory during the delay (Sakai et al., 2002), while IT cortex is central to the processing of visual stimuli.

Electrophysiological recordings during performance of the task reveal neurons whose activity patterns are characteristic of the different stages of the task. Neurons with high response selectivity to task-related stimuli were found in IT cortex (Miyashita and Chang, 1988), indicating that this area contributes to the visual component of the representation of images, whether seen or remembered. Studies of PF cortex by Fuster and Alexander (1971) and Watanabe (1981) revealed cells with activity patterns similar to those of IT neurons, presumably involved in the memory of the retained image. In addition, PF cell activity comprises a non-stimulus-specific component, which seems related to more procedural aspects of the task such as input saliency and attention (Sakai and Passingham, 2003). These findings point towards another, more executive role for PF cortex in the DMS task.

Electrophysiological recordings also show that complex interactions take place between IT and PF cortex. Studies in monkeys performing variations of DMS (Miller and Desimone, 1991, 1994; Miller et al., 1993, 1996; Sakai and Passingham, 2003) indicate that information does not flow freely between visual areas and PF cortex; rather, this transfer is managed in a task-related way. Similarly, activity in IT cortex during a standard DMS trial is determined by the content of working memory (Chelazzi et al., 1993) and by the requirements of the task.

Here, we propose a neurobiologically plausible network model which learns and performs the perception, retention and target selection phases of DMS. From the available behavioural and electrophysiological data on delayed-response tasks, we were led to include in the model elements of the architecture and connectivity of PF and IT cortical areas; we also used biologically realistic synaptic modification algorithms. The biological hypotheses underlying the model most crucially include a role for PF cortex as an ‘executive’ system which gates and regulates information flow in the brain.

During the learning phase, the model network must evolve from a non-committed connectivity to a specialized one able to pass the DMS. The formal model, like the subject in the experiment, receives positive reward when choosing the target: this is the only indication as to what it is expected to do. Our aim here is to understand how the relevant circuits are selected out of all those possible, during the task acquisition process. Our basic hypothesis is that tasks such as DMS are learned through synaptic modifications occurring under reinforcement conditions. This process bridges the gap between ‘macroscopic’ or network-scale learning of some behavioural task and the ‘microscopic’ variation of parameters at the cellular, synaptic and molecular levels (Dehaene and Changeux, 1989, 1991). The link through positive reward is what allows neural circuits to emerge and produce non-trivial and consistent behaviour at the global network level. We shall see that standard Hebbian plasticity (Hebb, 1949; Amit, 1994; for recent experimental advances, see Golding et al., 2002) suffices to account for the emergence of stimulus-specific neural assemblies with self-sustainable activity. Reinforcement processes involving Hebbian and reward-sensitive components (Schultz and Dickinson, 2000; Waelti et al., 2001) are necessary, however, for connecting these clusters into the multilevel assemblies spanning PF and IT cortex which form the representations required to pass the DMS task.

The model reproduces extensive data obtained in monkeys that have learned the task (Chelazzi et al., 1993; Fuster, 1971, 1973, 1997). It also shows that in the early stages of learning the system's organization is labile and may, for example, store ‘false’ associations. Indeed, avoiding inadequate associations during learning is at least as much of a problem as building the correct ones. We propose that the executive or ‘gating’ areas are the internal guiding force that enables the system to interpret and use reward correctly, preventing conflicts in activity and spurious associations. In our view, gating of bottom-up and top-down signals could already be present for general functional purposes, possibly through the agency of the basal ganglia (Frank et al., 2001); the attendant alternation of information flow, resulting in a global sequence of perception–decision–reaction phases, might then be recruited by PF cortex for further learning. The present work thus illustrates how it may be possible to distinguish, in the primate brain, between the neural correlates of sensory stimuli on the one hand, and on the other hand a more general, stimulus-independent neural correlate of the task in which these stimuli are used.

Finally, we observe that the model is characterized by a remarkable resistance to damage. Performance decreases slowly and gracefully until a critical stage is reached where a ‘cognitive catastrophe’ occurs.

## Materials and Methods

The model neural network was initially trained to perform the DMS task with four stimuli. For simplicity, our modelling starts at the level of the higher visual areas in IT cortex, where neurons referred to as ‘image-specific’ have been recorded in tasks involving long-term visual memory (Miyashita and Chang, 1988); these neurons emerge through learning (Sakai and Miyashita, 1994). We therefore use the terms ‘stimulus’ and ‘image’ interchangeably.

The DMS task consists of an initial waiting period of 0.1 s, after which a sample stimulus is presented during the cue period (0.5 s). The memory of the sample is then maintained during the delay period (1.0 s) until the choice period (0.5 s). During this period, the network is exposed to the target image together with a distractor chosen from the remaining three images. Positive reward is dispensed to the network during the response period (0.5 s) if the target image is selected, while choosing the wrong image or no image is sanctioned by a negative reward. In what follows, the notation

$$x{\rightarrow}x{+}y$$
is used for a trial where image x is the sample and target, and y the distractor.
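To make the trial timeline concrete, the following Python sketch converts the period durations above into integration steps of Δt = 0.025 s (the step size given under Formal Neurons below) and encodes the x → x + y notation. The names `PHASES`, `phase_steps`, `phase_at` and `describe_trial` are hypothetical helpers for illustration, not part of the original program.

```python
# Hypothetical sketch: the DMS trial timeline discretized with dt = 0.025 s.
DT = 0.025  # integration step (s)

# (phase name, duration in seconds), in trial order
PHASES = [
    ("wait", 0.1),
    ("cue", 0.5),
    ("delay", 1.0),
    ("choice", 0.5),
    ("response", 0.5),
]


def phase_steps(dt=DT):
    """Return a list of (name, first_step, last_step_exclusive)."""
    bounds, start = [], 0
    for name, duration in PHASES:
        n = round(duration / dt)  # number of integration steps in this phase
        bounds.append((name, start, start + n))
        start += n
    return bounds


def phase_at(step, dt=DT):
    """Name of the trial phase containing integration step `step`."""
    for name, first, last in phase_steps(dt):
        if first <= step < last:
            return name
    raise ValueError("step lies outside the trial")


def describe_trial(sample, distractor):
    """Notation x -> x + y: x is sample and target, y the distractor."""
    return f"{sample} → {sample} + {distractor}"
```

With these durations a trial lasts 2.6 s, i.e. 104 integration steps, and `describe_trial(4, 1)` gives the ‘4 → 4 + 1’ trial of Figure 1B.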

### Neuronal Architecture of the Model

The model comprises the following components (see Fig. 1A).

Figure 1.

(A) Overall diagram of the model and definition of variables. Gold lines on the left sketch tentative assignments of neurons to cortical regions, i.e. M to prefrontal cortex and VR and L to IT cortex (see text for details). Excitatory neurons are shown in green, inhibitory neurons in red. Blue lines represent synaptic connections from executive areas, which gate connections between layers M, VR and L. For clarity, only a small sample of connections and M neurons is drawn. (B) Time course of activity during a typical trial in which image 4 is the sample and image 1 the distractor. Input variable (4) depicts presentation of stimulus 4 (the sample and target); input variable (1) depicts that of stimulus 1, the distractor in this trial, denoted ‘4 → 4 + 1’. The executive gating variables Gu and Gd are set to 1 only when bottom-up (resp. top-down) information flow is allowed. Gating activity during learning and during routine task performance in the mature network differs only in the value of Gu in the choice and response periods, which is 1 for the mature network and 0 for the immature network.

#### Working Area M

This is a working area, represented by a single layer of Nm neurons, of which equal numbers (on average) are excitatory and inhibitory. These cells are interconnected at short range. The layer's major role will turn out to be the maintenance of stimulus-related activity (memory) during delay. This, however, will require the development of connections forming feedback circuits in M (see also e.g. Brunel, 2003; Miller et al., 2003). [The alternative, that of sustained firing being generated by intrinsic properties of the cells (see e.g. Koch, 2000), has not been explored in the present work — see Discussion.] In addition to their short-range connections, inhibitory neurons also send longer-range axons to all the other neurons at medium and long distances. These projections will enable the self-organization of lateral inhibition in layer M and limit the number of representations which can be active there simultaneously. The assumption is also made that layer M only receives visual input through IT cortex. We set Nm = 900 in all simulations presented in this paper.

#### Higher Visual Areas VR

Higher visual areas are implemented by a layer VR of four stimulus-specific neurons, each cell responding to a single image. The bottom-up transmission of visual information from lower cortical areas is simulated by one-to-one connections from the input layer to the VR layer. Each neuron of layer VR sends ascending projections to a subset of cells in layer M (see Fig. 1A). Reciprocally, every excitatory cell of layer M sends long-range, descending, diffuse connections onto the whole layer VR. These represent the long-range cortico-cortical two-way connections between PF and IT cortex. It has been shown by electrophysiological experiments that, by way of these connections, PF cortex can trigger and sustain the firing of IT neurons in a delay task (Tomita et al., 1999).

#### Lateral Inhibition Layer L

A prefrontally controlled layer L provides stimulus-specific, time-dependent lateral inhibition on visual area VR. Layer L allows only one neuron in VR to fire at a given time. Note that we do not necessarily envision L as a separate cortical area; it may well consist, for example, of a subset of interneurons located in the same area as VR (Gupta et al., 2000). For instance, when active, neuron L(1) prevents the firing of VR(2), L(2), VR(3), L(3), VR(4) and L(4): only VR(1) is free to be activated. L exerts shunting inhibition, i.e. it has a strong, multiplicative effect on the excitatory inputs to its target neurons. It is activated exclusively by M neurons through diffuse vertical connections, which are themselves gated by control area Gd (see below). Hence lateral inhibition can be turned ‘on’ or ‘off’ depending on whether Gd lets it through. In this way, M can exert strong but temporary control over the activity of VR.
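The winner-take-all effect of layer L can be sketched in a few lines of Python. This is an idealization under a stated assumption: shunting is reduced to a multiplicative factor of 0 (fully shunted) or 1 (free) on a neuron's excitatory input. Neurons are indexed 0 to 3 here rather than 1 to 4, and the function names are hypothetical.

```python
# Hypothetical sketch of the lateral inhibition exerted by layer L on VR.
# ASSUMPTION: shunting inhibition is idealized as a multiplicative factor
# of 0 (fully shunted) or 1 (free) applied to a neuron's excitatory input.


def shunting_factors(l_active, n=4):
    """One multiplicative factor per VR (and L) neuron.

    l_active: index k of the active L neuron, or None if layer L is silent
    (for instance because gating Gd is 'off').
    """
    if l_active is None:
        return [1.0] * n
    # L(k) shunts every VR(j) and L(j) with j != k; only VR(k) stays free
    return [1.0 if j == l_active else 0.0 for j in range(n)]


def gated_vr_input(excitatory_inputs, l_active):
    """Apply shunting inhibition multiplicatively to VR excitatory inputs."""
    factors = shunting_factors(l_active, len(excitatory_inputs))
    return [x * f for x, f in zip(excitatory_inputs, factors)]
```

For example, `gated_vr_input([0.3, 0.5, 0.2, 0.9], 0)` keeps only the first input, mirroring the case where L(1) leaves only VR(1) free.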

#### Executive areas Gu and Gd

Two control layers, Gu and Gd, gate connections between M, VR and L. They can be ‘on’ (Gx = 1, x = u,d) or ‘off’ (Gx = 0, x = u,d). Gu, when ‘on’, allows visual input to rise from layer VR to working memory M. Gd, when ‘on’, allows for the output of working memory neurons to reach layers L and VR, enhancing or suppressing activity there. In Figure 1A we represent the gatings as neuronal circuits possessing a diffuse connectivity in close association with the diffuse projections of PF and IT cortex to their targets.

Recently, Gutfreund et al. (2002) have shown that connections from the optic tectum to the inferior colliculus of the barn owl are apparently controlled by a diffuse inhibitory gating signal. They suggest that this inhibition is globally lifted when the auditory and visual maps of the owl are out of alignment and a learning phase must take place to correct the discrepancy. This mechanism is quite similar in structure and function to the two-way gating suggested here. Note also that Frank et al. (2001) have proposed another model, with precise, stimulus-specific gating circuitry located in the basal ganglia (see Discussion).

We assume that our diffuse gating projections make shunting synapses on their postsynaptic neurons (see above), but they can also form synapses on the projecting axons themselves. This enables control areas not only to modify the membrane potential of the neurons targeted by these long-range projections, but also to modify their output level globally (see later). Activities of Gu and Gd during a trial are depicted in Figure 1B. This activity pattern is taken here as given; it will turn out to be, in effect, a prerequisite for successful task learning and performance.

#### Input Layer

This layer models the input to the system from lower visual areas.

### Formal Neurons

All units are of integrate-and-fire type, i.e. each formal neuron i is modelled by three dynamic variables: its membrane potential vi(t), which decays slowly owing to the finite resistance of the neuron's membrane; its output Fi(t), which indicates whether an action potential is emitted; and its firing threshold Vσi(t), which characterizes the cell's excitability (the lower the threshold, the more easily the cell is triggered). vi(t) obeys the following equation:

$$v_{i}(t+\Delta t)=v_{i}(t)\,e^{-\Delta t/\tau_{v}}+\frac{\Delta t}{\tau_{v}}\left[G(t)\sum_{j\neq i}J_{ij}F_{j}(t)\right] \tag{1}$$
Here τv is the membrane time constant of the neurons, which originates from the finite resistance of the cell membrane and the resulting current leaks; it ranges from 10 ms for VR and L neurons to 50 ms for M neurons. Equation (1) is an approximate discretized solution of the differential equation governing vi(t), in which time t advances in discrete integration steps of size Δt = 0.025 s. The first term represents the decay of membrane charge over each time increment Δt, and the second represents the inputs from presynaptic neurons j through connections of synaptic strength Jij. For VR neurons, this sum also includes afferent connections from the input layer. G(t) is the gating factor, imposed by control area Gu(t) or Gd(t) described earlier, on projections from neuron j to neuron i; it takes the value 0 or 1 depending on the current phase of the trial. It thus removes from the cell's input, or includes in it, the total contribution of all connections gated by G(t).

In our simple model, cells fire whenever their membrane potential is higher than their firing threshold Vσi. The case of probabilistic firing is not considered here. Thus, the output Fi(t) of neuron i is 1 whenever vi(t) >Vσi(t), and 0 otherwise. For neurons in layers VR and L, the output is also gated by the activity of neurons in layer L.
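A minimal Python sketch of equation (1) and of the deterministic firing rule for a single neuron follows. As a simplification, it applies one gating factor to every listed input, exactly as equation (1) is written; in the full model only the connections actually gated by Gu or Gd would be affected. Function and argument names are hypothetical.

```python
import math


def update_potential(v, inputs, weights, gate, tau_v, dt=0.025):
    """One step of equation (1) for a single neuron i.

    v       : current membrane potential v_i(t)
    inputs  : outputs F_j(t) (0 or 1) of the presynaptic neurons j != i
    weights : synaptic strengths J_ij, in the same order as `inputs`
    gate    : gating factor G(t), 0 or 1 (simplification: applied to all inputs)
    tau_v   : membrane time constant (s), 0.010 for VR/L and 0.050 for M
    """
    decay = math.exp(-dt / tau_v)  # leak over one step
    drive = gate * sum(J * F for J, F in zip(weights, inputs))
    return v * decay + (dt / tau_v) * drive


def fires(v, threshold):
    """Deterministic firing rule: F_i = 1 if v_i > V_sigma_i, else 0."""
    return 1 if v > threshold else 0
```

With `gate = 0` the input term vanishes and the potential simply decays by the factor exp(−Δt/τv).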

Each M neuron sends on average 20 short-range connections to the rest of the layer, following a logarithmic distribution with radius 1. In addition, each inhibitory neuron sends around 40 medium- to long-range projections distributed uniformly over layer M. Each neuron of layer VR sends 975 upward, uniformly distributed connections onto layer M, though with a bias towards one of its four corners (some M neurons therefore receive more than one projection from a single VR neuron). Each of the eight neurons in VR and L receives on average 700 downward connections, sent only from excitatory M neurons. Overall, this makes about 40 000 horizontal connections in M and 10 000 vertical connections between layers M, VR and L.
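The totals quoted above follow from the per-neuron averages; the short Python calculation below checks them, assuming that exactly half of the 900 M neurons are inhibitory (the text states equal numbers only on average).

```python
# Back-of-the-envelope check of the connection counts quoted above.
N_M = 900              # neurons in layer M
P_INHIBITORY = 0.5     # ASSUMPTION: exactly half of M is inhibitory
SHORT_RANGE = 20       # short-range connections per M neuron
LONG_RANGE_INH = 40    # extra projections per inhibitory M neuron
N_VR = 4               # VR neurons
UP_PER_VR = 975        # upward connections per VR neuron
N_VR_AND_L = 8         # neurons in VR and L together
DOWN_PER_TARGET = 700  # downward connections received per VR/L neuron

horizontal = N_M * SHORT_RANGE + N_M * P_INHIBITORY * LONG_RANGE_INH
vertical = N_VR * UP_PER_VR + N_VR_AND_L * DOWN_PER_TARGET

print(horizontal, vertical)
```

This gives 36 000 horizontal and 9 500 vertical connections, consistent with the rounded figures of about 40 000 and 10 000.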

For simplicity, we have not modelled the refractory period of neurons (especially since Δt is much larger than realistic values for this parameter). Nor did we model the effect of a neuron's own firing on its membrane potential. Neurons are updated asynchronously: the membrane potential, threshold, firing state and new synaptic values are computed one neuron at a time, in a random order redrawn at each time step. The states we observe are thus robust and more likely to represent faithfully the noisy dynamics of the brain (Amit, 1989).
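The asynchronous scheme can be sketched as one ‘sweep’ per time step: a fresh random permutation of the neurons is drawn, and each neuron is updated in turn from the current state, so that neurons updated earlier in the sweep already influence those updated later. The function `asynchronous_sweep` and its callback are hypothetical; only Python's standard `random` module is used.

```python
import random


def asynchronous_sweep(state, update_neuron, rng=random):
    """Update every neuron once, one at a time, in a fresh random order.

    state         : list of per-neuron values, modified in place
    update_neuron : callable(i, state) -> new value for neuron i; it reads
                    the *current* state, so earlier updates in the sweep
                    are visible to later ones
    rng           : source of randomness (the `random` module or a Random)
    """
    order = list(range(len(state)))
    rng.shuffle(order)  # a new random order at each time step
    for i in order:
        state[i] = update_neuron(i, state)
    return order
```

A synchronous scheme would instead compute all new values from a frozen copy of the previous state; the random sequential order avoids artefacts tied to a fixed update ordering.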

### Learning Algorithms

Connections in the model have a time-dependent synaptic strength, denoted J, whose absolute value is constrained between 0 and 1. Learning in the network proceeds by modifying J; at each step:

$$J(t+\Delta t)=J(t)+\delta J \tag{2}$$
where δJ is the modification of synaptic strength generated by learning. It consists of two parts, i.e.
$$\mathrm{{\delta}}J{=}\mathrm{{\delta}}J_{H}{+}\mathrm{{\delta}}J_{R}$$
as defined below, and is a function of neural activity and of external factors (such as reward).

All connections are subject to standard Hebbian learning (Hebb, 1949). The basic principle as implemented here is that excitatory synapses which, by their activity, tend to increase postsynaptic firing are reinforced; likewise, inhibitory synapses which are idle while postsynaptic firing occurs are reinforced. Thus, for excitatory connections (J > 0), the Hebbian modification to the synaptic strength is

$$\delta J_{H}=\eta\,F_{\mathrm{pre}}(t)\,F_{\mathrm{post}}(t) \tag{3}$$
where Fpre(post) denotes the activity of the pre(post)-synaptic neuron. For inhibitory connections (J < 0),

$$\delta J_{H}=-\eta\,\bigl(1-F_{\mathrm{pre}}(t)\bigr)\,F_{\mathrm{post}}(t) \tag{4}$$

The parameter η = 0.025Δt sets the magnitude of Hebbian synaptic plasticity. Its dependence on the time increment Δt ensures that the synaptic coefficients J will evolve slowly enough to allow learning by the network for different values of Δt.
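Equations (3) and (4) translate directly into a small Python function. The sketch passes the synapse type as an explicit flag rather than testing the sign of J, which would be ambiguous if a weight were ever clipped to exactly 0; that flag, like the function name, is our own choice.

```python
DT = 0.025          # integration step (s)
ETA = 0.025 * DT    # Hebbian rate, eta = 0.025 * dt


def hebbian_delta(excitatory, f_pre, f_post, eta=ETA):
    """Hebbian term delta J_H of equations (3) and (4).

    excitatory : True for an excitatory synapse (J > 0), False for inhibitory
    f_pre      : presynaptic output F_pre(t), 0 or 1
    f_post     : postsynaptic output F_post(t), 0 or 1
    """
    if excitatory:
        # equation (3): co-activity strengthens the synapse
        return eta * f_pre * f_post
    # equation (4): an idle presynaptic cell while the postsynaptic cell fires
    # makes J more negative, i.e. a stronger inhibitory synapse
    return -eta * (1 - f_pre) * f_post
```

With Δt = 0.025 s, η is 6.25 × 10⁻⁴ per step, so many coincident firings are needed to change a weight appreciably.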

Reinforcement learning is assumed to act only on downward connections between layers. It represents at the cellular and molecular level the influence of reward as applied in behavioural experiments. Experimental animals are indeed not shown explicitly what to do, but have their behaviour reinforced by positive or negative reward. Reward and reward expectation are known to be controlled by well-defined neurotransmitter pathways (mainly dopaminergic; see Waelti et al., 2001), which directly or indirectly affect response probabilities as a function of reward(s). Dopaminergic innervation is diffuse in the monkey and covers in particular areas 46, 9, 19, 11 and 12 (PF), as well as areas 21, 22 (IT) (Berger et al., 1988) and TE (Middleton and Strick, 1996); in humans this is even more marked (Gaspar et al., 1989). Dopamine D2 receptors are additionally known to be present in areas of human temporal cortex (Goldsmith and Joyce, 1996).

To formalize the reward-associated part of synaptic plasticity, we introduce the discrete variable R which signals the occurrence of reward itself. Reward is dispensed at the beginning of the response period: thus R = 0 during the whole trial except during response, where R is set to +1 (positive reward) or −1 (negative reward) for a correct (resp. incorrect) response. R can be thought of as a neurotransmitter concentration pulse transferring reward all the way down to the synaptic level. The reinforcement-associated plasticity affects only descending connections from excitatory neurons of M onto VR and L and is defined by:

$$\delta J_{R}=\rho\,F_{\mathrm{pre}}(t)\,F_{\mathrm{post}}(t)\,R \tag{5}$$
where ρ = 0.1 specifies the magnitude of reinforcement learning. Since δJR can be either positive or negative, vertical connections can be either strengthened or weakened by reward. This grants the system leeway to test solutions and to either stabilize or delete them. Note that, as ρ ≫ η, reinforcement plasticity dominates Hebbian learning, so that the system is ‘attending’ to external guidance (when present) rather than simply strengthening its spontaneous, unchecked activities.
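The following Python sketch combines equations (2), (3) and (5) for one excitatory descending synapse (M onto VR or L), with the reward variable R defined as above and the weight clipped to [0, 1]. Treating ‘no image chosen’ as `correct=False` follows the text; the function names are hypothetical.

```python
RHO = 0.1             # reinforcement rate
ETA = 0.025 * 0.025   # Hebbian rate, eta = 0.025 * dt with dt = 0.025 s


def reward_signal(phase, correct):
    """R = 0 outside the response period; +1 or -1 during it.

    `correct` is False both for a wrong choice and for no choice at all.
    """
    if phase != "response":
        return 0
    return 1 if correct else -1


def reward_delta(f_pre, f_post, R, rho=RHO):
    """Reinforcement term delta J_R of equation (5)."""
    return rho * f_pre * f_post * R


def update_descending(J, f_pre, f_post, R, eta=ETA, rho=RHO):
    """Equation (2) for an excitatory descending synapse.

    Adds the Hebbian term (3) and the reward term (5), then clips J to
    [0, 1], since |J| is constrained between 0 and 1.
    """
    dJ = eta * f_pre * f_post + reward_delta(f_pre, f_post, R, rho)
    return min(max(J + dJ, 0.0), 1.0)
```

Because ρ ≫ η, a single rewarded or punished coincidence during the response period moves a descending weight by about 0.1, whereas the Hebbian term alone adds only 6.25 × 10⁻⁴.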

### Variable Threshold for Neuronal Firing

As a result of these synaptic adjustments, the overall strength of the signals impinging on a neuron changes during learning. A physiological adaptation process is therefore required; in cortical areas it may be effected by scaling synaptic coefficients as a function of neural activity (Turrigiano et al., 1998; Desai et al., 2002; Feldman, 2002; for a possible molecular implementation see Edelstein and Changeux, 1998). Here, for the sake of computational efficiency, we settled on dynamic thresholds for neuron activation, which vary according to the cell's membrane potential and activity. If a neuron fires frequently, its afferent synapses will be strengthened and its threshold will rise; if these synapses are weakened, the threshold will slowly go down. This is a generalization of the ‘variable threshold’ introduced by Kerszberg et al. (1992), in which the threshold is taken to be a function of the membrane potential through the number and magnitude of excitatory afferent synapses. Under certain assumptions, this definition can be shown to be mathematically equivalent to Turrigiano's scaling law (see Appendix). Our expression for the threshold Vσi of cell i is thus:

$$V_{\sigma i}(t)=\frac{\Delta t}{\tau_{v}}\,\frac{1}{1-e^{-\Delta t/\tau_{v}}}\left[\frac{J_{0}}{2}+\alpha\,G(t)\sum_{j\neq i,\;J_{ij}>0}J_{ij}(t)\right] \tag{6}$$
where τv is the membrane time constant of the cell (10–50 ms in the model), α is a constant (typically between 0.01 and 0.1) chosen so that Vσi stays roughly of the same order of magnitude as the membrane potential vi during the simulation, and
$$J_{0}=1/N_{m}\ll 1.$$
The constant J0 in equation (6) ensures that thresholds are always larger than 0, and that neurons quickly stop firing when no longer stimulated. The present model thus neglects spontaneous firing and bursts, though this could be included at the cost of additional complexity.
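Equation (6) can be sketched in Python as follows. The value α = 0.05 is a hypothetical choice inside the quoted 0.01–0.1 range, and, as in the sketch of equation (1), a single gating factor multiplies the whole sum of positive afferent weights.

```python
import math


def threshold(excitatory_weights, gate, tau_v, n_m=900, alpha=0.05, dt=0.025):
    """Dynamic firing threshold V_sigma_i of equation (6).

    excitatory_weights : afferent weights J_ij of neuron i; only J_ij > 0 count
    gate               : gating factor G(t), 0 or 1
    tau_v              : membrane time constant (s)
    alpha              : HYPOTHETICAL value inside the quoted 0.01-0.1 range
    """
    j0 = 1.0 / n_m  # J0 = 1/N_m keeps the threshold strictly positive
    gain = (dt / tau_v) / (1.0 - math.exp(-dt / tau_v))  # leaky-integration factor
    positive_sum = sum(w for w in excitatory_weights if w > 0)
    return gain * (j0 / 2 + alpha * gate * positive_sum)
```

The threshold grows with the total excitatory drive a neuron can receive, which is what keeps frequently potentiated cells from firing indiscriminately; with no gated input it falls back to the small positive floor set by J0.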

The factor $1/(1-e^{-\Delta t/\tau_{v}})$ in equation (6) takes into account the finite value of the neuron membrane time constant: since we use leaky integrate-and-fire units, charge injected into a cell does not dissipate immediately but accumulates instead, and the firing threshold must be set with this charge accumulation in mind. The factor G(t) represents the overall modification of the neuron's excitability by gatings Gu(t) or Gd(t) acting on diffuse inter-layer connections.
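One way to read the prefactor of equation (6), offered here as our own interpretation rather than a derivation given by the model's authors, is as the steady-state gain of equation (1). If the bracketed input of equation (1) is held at a constant value I, the potential converges to a fixed point v* satisfying

$$v^{*}=v^{*}\,e^{-\Delta t/\tau_{v}}+\frac{\Delta t}{\tau_{v}}\,I\quad\Longrightarrow\quad v^{*}=\frac{\Delta t}{\tau_{v}}\,\frac{1}{1-e^{-\Delta t/\tau_{v}}}\,I .$$

Equation (6) therefore sets Vσi equal to the steady-state potential that a constant input $I=J_{0}/2+\alpha\,G(t)\sum_{j\neq i,\,J_{ij}>0}J_{ij}$ would produce. The gain $(\Delta t/\tau_{v})/(1-e^{-\Delta t/\tau_{v}})$ tends to 1 when Δt ≪ τv; with Δt = 25 ms it is about 1.27 for τv = 50 ms and about 2.72 for τv = 10 ms.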

### Gating Signals: Traffic Regulators for Learning

Hebbian plasticity favours the emergence in M of cell groups that fire together for any pair of inputs frequently presented simultaneously. Not all such coincidences are important, however: avoiding spurious associations is as important as enhancing useful ones. In the model, two mechanisms prevent the system from recording task-unrelated associations. The first is the set of long-range inhibitory connections in layer M mentioned above, which generate lateral inhibition in working memory. Activity of a neuron in M hyperpolarizes all the other neurons; as a result, if visual input conflicting with current activity is fed into layer M, it has only a small (but non-zero) probability of disrupting the representation currently held in working memory.

The second mechanism, and an important circuit element in making the task learnable, is the gating activity pattern of the executive areas Gu and Gd. The role of these gating systems is fundamentally very simple: they allow or shut down signal traffic in accordance with task requirements. These requirements are (i) initially, expect something from the outside; and (ii) later on, during delay and choice, pay reduced attention to outside stimulation and privilege instead the stored internal state. Admittedly, these are important elements of the task, and they are not learned by the model. We claim, however, that this alternation of two ‘mind states’, one letting external influences in and the other not, may correspond to a general necessity of neural system function (see Gutfreund et al., 2002; see also Discussion), and that the monkey recruits such a general switch between two states early in learning DMS. It is conceivable that, when the beginning of a trial is announced, the monkey automatically enters a stimulus expectation phase. Most important here, at any rate, is that internal and external influences should not mix in an uncontrolled way.

In fact, the strict alternation of input-dominated and memory-dominated regimes is necessary only at the very beginning of the learning phase, during the first trials of a run. We have accordingly assumed that these activities may depart from this simple pattern later on; relaxing the constraint in this way leads to a testable prediction. Figure 1B shows the difference between strict and relaxed gatings, which lies in the firing of Gu during the choice and response periods. During learning, Gu is ‘on’ for the cue period only, so that visual information rises into working memory only during sample presentation. Once learning is over, Gu may also be ‘on’ during the choice and response periods, and cells in M may then also fire in response to a bottom-up signal from the target and distractor images.
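The two gating regimes can be written as a lookup from trial phase to (Gu, Gd). The Gu values follow the text; the Gd values are an assumption, because the text does not spell out when Gd is ‘on’ (it is shown only in Fig. 1B), and here Gd is taken to be ‘on’ from the delay period onward. The function name is hypothetical.

```python
# Hypothetical sketch of the learning vs. mature gating schedules.
# Gu follows the text: 'on' during the cue in both regimes, and also during
# choice and response once the network is mature.
# ASSUMPTION: the Gd pattern is not given in the text (only in Fig. 1B);
# here Gd is taken to be 'on' from the delay period onward.

PHASE_NAMES = ("wait", "cue", "delay", "choice", "response")


def gating(phase, mature):
    """Return (Gu, Gd), each 0 or 1, for a given trial phase."""
    bottom_up = phase == "cue" or (mature and phase in ("choice", "response"))
    top_down = phase in ("delay", "choice", "response")
    return int(bottom_up), int(top_down)
```

By construction, the learning and mature schedules differ only in Gu during the choice and response periods, as stated in the caption of Figure 1B.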

### Network Preparation, Maturation and Output Response

At the beginning of each simulation, the network is built anew: neuron types (excitatory or inhibitory) in layer M are randomly chosen with equal probabilities, and all the connections in the model are generated by a random distribution process and assigned a minimal synaptic weight (J0 for excitatory connections and −J0 for inhibitory ones). The resulting network is therefore ‘unique’ as its connectivity, though layered, is randomly chosen from the astronomically large space of all possible connectivity patterns. The network is then trained on the task by reinforcement.
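The random initialization of layer M can be sketched as follows: each neuron is assigned a type with probability 1/2, and every outgoing connection starts at +J0 for an excitatory neuron or −J0 for an inhibitory one, with J0 = 1/Nm. The function name and return format are hypothetical, and the wiring itself (which neurons connect to which) is omitted.

```python
import random


def build_layer_m(n_m=900, rng=random):
    """Randomly assign neuron types and minimal initial weights in layer M.

    Returns (is_excitatory, initial_weight): every outgoing connection of
    neuron i starts at +J0 if i is excitatory and -J0 otherwise.
    """
    j0 = 1.0 / n_m  # minimal synaptic weight J0 = 1/N_m
    is_excitatory = [rng.random() < 0.5 for _ in range(n_m)]  # equal probabilities
    initial_weight = [j0 if e else -j0 for e in is_excitatory]
    return is_excitatory, initial_weight
```

Passing a seeded `random.Random` instance as `rng` makes a given ‘unique’ network reproducible across simulations.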

Since we have not modelled motor areas, the network cannot act out its response; hence we must define a mechanism to read out the network's decisions. Electrophysiological experiments seem to indicate that processing in the cortex during the choice period proceeds roughly as follows (see e.g. Fuster, 1997): attention is directed to the target; stimulus-specific sensory activity rises, followed by a buildup of activity in premotor and motor cortices, leading finally to physical motion (a button press or a visual saccade). Since our model stops at the level of IT cortex, we use the activity of cells in layer VR to monitor the network's response in a trial: a trial is considered successful if the VR neuron associated with the target fires during the whole response period and all the other VR neurons are silent during that same period.
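The readout criterion is a simple check over the response period, sketched below in Python. The data layout (one list of VR outputs per integration step) and the function name are hypothetical.

```python
def trial_successful(vr_firing, target, response_steps):
    """Readout criterion for one DMS trial.

    vr_firing      : vr_firing[t][k] is 1 if VR neuron k fires at step t
    target         : index of the target (= sample) image
    response_steps : the integration steps making up the response period
    """
    for t in response_steps:
        for k, fired in enumerate(vr_firing[t]):
            if k == target and not fired:
                return False  # the target neuron must fire throughout
            if k != target and fired:
                return False  # every other VR neuron must stay silent
    return True
```

Under this criterion, a trial in which no VR neuron fires counts as a failure, consistent with ‘no image’ being sanctioned by negative reward.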

The output of neurons in layer M is meant to represent approximately that of individual cells. The few units making up layers VR and L, as well as areas Gu and Gd, on the other hand, each stand for a larger group of cells with similar connectivity. To compare their activity with that of real cells, one needs to reinterpret their membrane potential as a cell firing probability: the higher the potential, the more a neuron belonging to the group is likely to fire. This firing is possible, however, only when the output of the neuron is not gated ‘off’ by lateral inhibition. The resulting ‘generalized firing probability’ is the variable plotted in Figure 7B (see below).

### Computing System

All computations were performed on a PC equipped with an Athlon XP processor clocked at 1.6 GHz, with 512 MB of RAM, running Red Hat Linux 7.2. The model was implemented as a custom program written in C and compiled under Linux, with Matlab 6.1 (The MathWorks, Framingham, MA) as graphical interface.

## Results

### Measuring Network Performance

A ‘run’ starts with a randomly built network and comprises 120 trials. The system first undergoes a learning phase during which the learning algorithms are allowed to modify synaptic weights. Trials are grouped in blocks of four, each block using all four stimuli as samples in random order, and blocks are then put together in series. This ensures that the network is presented with each stimulus an equal number of times and helps avoid overtraining on some images relative to others. Learning proceeds until the network successfully passes 20 trials in a row (which amounts to passing DMS about five times with each of the four stimuli). If and when the system reaches this stage, it is considered mature: all learning algorithms are stopped and the firing of the gating areas is switched to the mature pattern (see Fig. 1B). The percentage of successful trials is then measured. Typically, 100 runs are completed and the results compiled. These measures quantify the performance of individual networks, while the proportion of successful runs also yields an estimate of how efficient the postulated architectural principles and learning algorithms might be.
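The block-randomized trial schedule and the maturity criterion can be sketched as follows. The distractor is drawn uniformly from the three remaining images; uniformity is our assumption, since the text only says it is chosen from the remaining images. Function names are hypothetical.

```python
import random


def trial_sequence(n_trials=120, n_images=4, rng=random):
    """Return (sample, distractor) pairs in blocks of n_images trials.

    Each block uses every image once as sample, in random order.
    ASSUMPTION: the distractor is drawn uniformly from the remaining images.
    """
    trials = []
    while len(trials) < n_trials:
        block = list(range(n_images))
        rng.shuffle(block)
        for sample in block:
            distractor = rng.choice([k for k in range(n_images) if k != sample])
            trials.append((sample, distractor))
    return trials[:n_trials]


def maturity_step(outcomes, streak=20):
    """Index of the trial completing `streak` consecutive successes,
    or None if the run never matures."""
    run = 0
    for i, ok in enumerate(outcomes):
        run = run + 1 if ok else 0  # any failure resets the streak
        if run == streak:
            return i
    return None
```

For instance, a run that fails once at trial 6 and then succeeds 20 times in a row matures at trial index 25 (0-based).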

In fewer than 10% of the runs, the system is unable to learn DMS within the imposed limit of 120 trials. In other words, more than 90% of the time our networks do manage to meet the success criterion, and in more than 80% of cases they also enjoy continued success during the mature phase (the networks accounting for the difference still pass the task, but with a success rate between 80 and 95%).

On average, the successful networks failed about 20 trials before meeting the success criterion, with a minimum of four failed trials for the fastest-learning network and a maximum of 92.

### Network Training and Emergence of Neural Activity

During the learning phase, activity in the network self-organizes in two stages. First, upward connections from visual areas to working memory and horizontal connections within the latter settle to generate stable activity patterns. As we shall see, this process is completed in a few trials. It takes longer to complete the second, crucial phase, during which downward connections from working memory to VR and L layers are selected and strengthened.

#### Neural Circuit Development in Layer M

In Figure 2A the number of M cells which fire during sample presentation is plotted during a complete, successful run. The figure illustrates the rapid segregation of M cells into four clusters, each associated with a unique stimulus. This phenomenon is a direct result of the Hebbian synaptic modification law: excitatory projections between firing cells are strengthened, as are inhibitory connections onto silent cells, giving rise to the emergence in M of self-reinforcing, mutually exclusive neural circuits, or clusters. The number of cells in layer M is large enough to allow the formation of these circuits from the initial random distribution of inhibitory and excitatory neurons and the connections between them. The resulting clusters have a center–surround structure, with a kernel of excitatory cells and an outer shell of inhibitory cells (see e.g. Fig. 6C below). The observed delineation of stimulus-specific clusters is an example of the general mechanism introduced by Kerszberg et al. (1992) for the emergence of complex receptive fields in two-dimensional neuron layers. Cluster build-up is complete within about 10 trials. The clusters thereafter display a stable structure, save possibly for small persistent ‘oscillations’ as the precise number of cells responding to an image varies from one trial to the next. Most cells respond only to a single image, but some, located near cluster boundaries, may permanently switch their preferred image as a result of learning. Neuron clusters are illustrated in Figure 2B.
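The qualitative Hebbian rule described above can be illustrated with a single update step. This is a sketch under stated assumptions, not the model's own learning equation: weights are stored as non-negative magnitudes, an inhibitory synapse is assumed to grow only when its presynaptic cell fires onto a silent target, and the learning rate `eta`, the bound `j_max` and the name `hebbian_step` are hypothetical.

```python
import numpy as np

def hebbian_step(J, firing, excitatory, eta=0.05, j_max=1.0):
    """One illustrative Hebbian update of layer-M connection strengths.

    J[i, j] is the (non-negative) strength of the connection from
    presynaptic cell j to postsynaptic cell i. Assumed rule, following the
    qualitative description only: an excitatory synapse grows when both
    cells fire; an inhibitory synapse grows when the inhibitory cell fires
    and its target is silent. eta and j_max are hypothetical.
    """
    firing = np.asarray(firing, dtype=bool)
    excitatory = np.asarray(excitatory, dtype=bool)
    pre, post = firing[None, :], firing[:, None]
    exc = excitatory[None, :]  # type of the presynaptic cell
    grow = (exc & pre & post) | (~exc & pre & ~post)
    np.fill_diagonal(grow, False)  # no self-connections
    J = J.copy()
    J[grow] = np.minimum(J[grow] + eta, j_max)
    return J
```

Applied trial after trial, both terms push in the same direction: co-active excitatory cells bind together while active inhibitory cells push silent cells further away, which is the self-reinforcing, mutually exclusive structure described in the text.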

Figure 2.

Evolution of response characteristics of layer M neurons. (A) Number of neurons from layer M responding to each image during a run. The dotted line indicates the end of the training period. Neurons are quickly recruited to their preferred image, in clusters which stabilize in ∼10 trials, as seen on the final plateaus. (B) Distribution of cells in layer M according to their stimulus specificity recorded during the learning phase. Most neurons (848 out of 900) respond only to one of the four images and a very small minority (18/900) do not fire at all. The remaining 34 cells respond to two images in the course of learning. They are found at the border between regions occupied by stimulus-specific cells. (C) Evolution, during a complete run, of the number of layer M neurons with given responses that react only to a single image. The dotted line indicates the end of the training phase. Each curve represents the number of cells exhibiting one particular response pattern during a trial. Five categories (shown in D) generally encompass almost all cells, the remainder (turquoise curve) being on the order of 1%. For better statistics, we have averaged the cells' outputs over four trials, each trial using one of the four stimuli as target (d1 = first 0.3 s of the delay). (D) Diagram of single-cell response evolution during learning for neurons in layer M. This illustrates the possible changes in firing pattern a cell might undergo at the next trial, given its response in the current trial. A trial consists of c: cue; d: delay; ch: choice; r: response periods (see ‘legend’); a response pattern describes the presence (black) or absence (white) of a response during each successive trial phase. The figure was obtained by compiling the transition matrix between response patterns for all 900 M neurons during the first 60 trials of the run (where most transitions in cell firing take place). The result, divided by the total number of transitions, gives the probability that a cell adopts pattern x at the present trial if it exhibited pattern y at the previous one. This can be understood as a ‘flow’ diagram of response pattern activity. Each box represents one of the five most common types of neural response during a whole trial. Arrow widths represent relative probabilities for the transition from one response to another (numerical factors indicate the large probabilities of no response change).

#### Evolution of Neural Response in M

As clusters of neurons differentiate in M during learning, the receptive fields and firing patterns of M neurons themselves evolve. In the largely random initial state of the network, most neurons respond inconsistently from trial to trial. During learning, responses become more structured, robust and stimulus-specific. In the final state, when learning has been completed successfully, most neurons exhibit a single type of reproducible response from trial to trial, limited to one stimulus or none at all. A minority of cells continue to fluctuate between response patterns from trial to trial even after training is completed.

To analyse the evolution of cell responses, we prepared an exhaustive catalogue of the firing patterns encountered in all trials of a given run. We then counted, trial by trial, the number of cells in each response category. We found more than 20 different response types in most simulations, but the vast majority of cells (>98%) always fell into the same five categories, which are illustrated in Figure 2C,D and can be described as follows. Type 0: no response; type 1: response during the whole trial; type 2: response during the whole trial except the cue period; type 3: firing only during the cue period; and type 4: firing during the cue and the first 0.3 s of the delay. Figure 2C shows the number of cells in each category trial by trial.
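The five-category classification can be expressed as a lookup on per-period activity flags. This sketch assumes that a cell counts as responsive in a period if it fires at any time during it, and splits the delay into its first 0.3 s (d1) and the remainder (d2); both choices, and the helper names, are illustrative.

```python
from collections import Counter

# Trial periods, in order. Splitting the delay into d1 (first 0.3 s) and
# d2 (remainder) is an assumption made so that type 4 can be expressed.
PERIODS = ("cue", "d1", "d2", "choice", "response")

RESPONSE_TYPES = {
    (False, False, False, False, False): 0,  # no response
    (True, True, True, True, True): 1,       # whole trial
    (False, True, True, True, True): 2,      # whole trial except cue
    (True, False, False, False, False): 3,   # cue only
    (True, True, False, False, False): 4,    # cue + first 0.3 s of delay
}

def response_type(active):
    """Map per-period activity flags (one per entry of PERIODS) to one of
    the five common types, or None for the rarer patterns."""
    if len(active) != len(PERIODS):
        raise ValueError("expected one flag per period")
    return RESPONSE_TYPES.get(tuple(bool(a) for a in active))

def count_types(cells):
    """Count cells per response type for one trial (None = other)."""
    return Counter(response_type(c) for c in cells)
```

Calling `count_types` once per trial yields the kind of trial-by-trial counts plotted in Figure 2C.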

Fewer than half of the cells in M exhibit their final firing pattern from the outset. The majority of neurons adapt their response by undergoing transitions between firing patterns; examples are displayed in Figure 3. On the whole, cells tend to increase the duration of their firing, as can be confirmed in Figure 2C, where the number of cells with no or sparse firing (types 0 and 3) decreases while that of neurons with more tonic firing (types 1 and 2) increases.

Figure 3.

Evolution of the response of layer M neurons during consecutive trials at the very beginning of the learning period (c: cue; d: delay; ch: choice; r: response). Solid vertical lines separate trials, dotted lines separate periods of a given trial. (A–E) Neurons responding to a single image. Most cells in the system behave as shown in (A–D), increasing their firing over trials. For cell response type definitions, see text. (A) A transition from type 0 to 1, (B) from type 0 to 3, (C) from type 0 to 3 to 1 and (D) from 0 to 3, then 4 and finally 1. Fewer cells, by contrast, have responses which decrease during learning, as shown in (E), which makes a transition from 1 to 0. Only trials in which the cell's preferred image is used as sample are shown here. (F) An example of a cell which initially responds to image 4 (dark grey) but shifts to image 2 (light grey) after a few trials. Only trials using images 2 and 4 are plotted here.

A comprehensive overview of transitions between cell response patterns from one trial to the next is shown in Figure 2D and in Tables 1 and 2. In Figure 2D, circular arrows represent the probability that a cell sustains the same firing for one or several trials before changing to a different response. The firing history of the cells shown in Figure 3A–E can readily be followed in this diagram.

Table 1

How firing patterns of M neurons fluctuate from one trial to the next at the beginning of learning

c → s: s = 0, s = 1, s = 2, s = 3, s = 4

c = 0: 1196 127 13 277 19
c = 1: 92 7772 66 30 10
c = 2: 10 35 59
c = 3: 204 69 953 86
c = 4: 7 76 0 33 402

The number of transitions observed from a firing pattern c (0, 1, 2, 3, 4; see the convention of Fig. 2D) during one trial to a firing pattern s during the next trial, for neurons in M. The transitions were sampled during the first 60 trials of the run, i.e. while learning takes place. Although most of its content lies on the diagonal (invariant firing pattern), the matrix is not exclusively diagonal (firing varies) and not symmetrical (some transitions take place more often than others, e.g. 3 → 4 happens more often than 4 → 3). This reflects the irreversible modification of the firing of cells in layer M during training. These transitions are also illustrated graphically in Figure 2D.
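Transition counts of this kind can be compiled from per-cell sequences of response types. The sketch below is illustrative: the input format and helper names are assumptions, and it shows both the normalization by the total number of transitions used for Figure 2D and a row-wise normalization that yields conditional probabilities P(s | c).

```python
import numpy as np

def transition_counts(type_history, n_types=5):
    """Count trial-to-trial transitions c -> s over all cells.

    type_history[k][t] is the response type (0..n_types-1) of cell k at
    trial t; None marks a rarer pattern, and transitions involving it are
    skipped.
    """
    counts = np.zeros((n_types, n_types), dtype=int)
    for seq in type_history:
        for c, s in zip(seq[:-1], seq[1:]):
            if c is not None and s is not None:
                counts[c, s] += 1
    return counts

def normalize_total(counts):
    """Divide by the total number of transitions (as described for Fig. 2D)."""
    return counts / counts.sum()

def normalize_rows(counts):
    """Conditional probabilities P(s | c); rows with no data stay zero."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)
```

Comparing `counts[3, 4]` with `counts[4, 3]` gives the kind of asymmetry noted above for transitions 3 → 4 and 4 → 3.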

Table 2

How firing patterns of M neurons fluctuate from one trial to the next after learning

c → s: s = 0, s = 1, s = 2, s = 3, s = 4

c = 0: 990 15 190
c = 1: 16 8427 74
c = 2: 51 262
c = 3: 185 996 43
c = 4: 5 14 0 21 526

The same transitions as in Table 1, sampled during the last 60 trials of the run, i.e. in the mature network which has successfully completed its training phase. The matrix is more symmetrical and its off-diagonal elements have decreased in magnitude, showing that most M cells now exhibit a stable firing pattern. Some transitions between firing patterns still occur (e.g. between states 3 and 4) because of persistent instabilities in the steady states of the clusters.

Transitions gradually tend toward statistical stationarity, as most non-circular arrows in the diagram of Figure 2D fade away. The few arrows that remain represent oscillations between response types 0 and 3 (no response and cue response), types 1 and 2 (full response and post-cue response), and types 3 and 4 (cue response and cue plus early-delay firing).

Some cells in M change their stimulus specificity in the course of learning. In the particular run illustrated in Figure 2B there were 34 such cells. Figure 3F presents the activity of one of these cells as its stimulus selectivity shifts from image 4 to image 2.

Because of our simplifying assumptions (sparse connectivity, reset of layer M between trials, absence of spontaneous neural firing), cells in layer M end up responding to at most a single stimulus (although, as shown in Figs 2B and 3F, some cells may respond to several stimuli in the course of training).

#### Selection of Connections between Layers M, L and VR: Emergence of Control of PF on IT Cortex

In the course of the learning phase, the system must construct a representation of each image spanning working memory M and the visual layers L and VR.

Figure 4A,B illustrates the dramatic refinement of vertical connections into directional bundles during task learning. Figure 4B shows only the connections whose absolute strength exceeds a minimal value once learning is complete. Once stimulus-specific clusters have stabilized in layer M, the key to passing DMS is the reward-elicited reinforcement of the correct connections from each cluster in M towards the VR and L neurons corresponding to the same image.
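The reward-dependent selection of top-down connections can be sketched as follows, in the spirit of the trial-by-trial account given in the Figure 5 caption. Assumptions: `reward_update` is a hypothetical helper; weights move by a fixed step `delta` (playing the role of δJR) and are clipped to the range [`j0`, `j_max`]; only connections from the currently active M cells onto the VR and L units the network activated as its choice are updated.

```python
import numpy as np

def reward_update(W, active_m, chosen_units, reward,
                  delta=0.1, j0=0.0, j_max=1.0):
    """Illustrative reward-gated update of top-down weights.

    W[u, m]: weight from M cell m to visual unit u (VR or L).
    reward = +1 strengthens, reward = -1 weakens (floored at j0) the
    connections from active M cells onto the chosen visual units.
    Step size, bounds and eligibility rule are hypothetical.
    """
    W = np.array(W, dtype=float)  # work on a copy
    rows = np.asarray(chosen_units, dtype=int)
    cols = np.flatnonzero(np.asarray(active_m, dtype=bool))
    idx = np.ix_(rows, cols)
    W[idx] = np.clip(W[idx] + reward * delta, j0, j_max)
    return W
```

Repeated over trials, positive rewards build the stimulus-specific bundles of Figure 4B, while negative rewards prune connections from a cluster onto the units of the wrong image.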

Figure 4.

Selection of vertical connections during the learning phase. Evolution of bottom-up and top-down connections between layers M, VR and L dictating the dynamics of the network (connections within layer M are not represented). (A) Initial configuration of the system. Overall connection pattern is homogeneous and diffuse, except for upward projections from layer VR to M which have a slight bias toward each of the corners of layer M. (B) Strengthened connections which dominate the network dynamics after completion of learning. These connections (with, typically, a strength ≥ 0.75) constitute a two-way network of projections linking each VR(i) neuron (i = 1, 2, 3, 4) with the corresponding neural cluster of image i and the lateral inhibitor neuron L(i).

Figure 5 presents the activity of all four VR neurons during the first 10 trials of a run. The network shown here, like over 90% of the networks tested, learns DMS rapidly by reinforcement. Note that the response of cells during the cue and the first part of the choice period is purely sensory and as such is not modified by training; by contrast, activity during the delay, the second part of the choice period and the response period is produced and sustained through long-range connections from PF cortex, and must therefore be constructed through learning. As already noted for layer M, the cells' firing becomes less erratic and more tonic as learning progresses. Trials 7–10 show examples of fully mature cell activity: tonic firing of the cell coding for the sample and target image (except at the end of the delay) and a transient response of a different cell to the presentation of the distractor image. Figure 5 also illustrates several possible scenarios for failed trials.

Figure 5.

Response of VR cells during the first 10 trials of a run. Records for trials are separated by blue lines; each numbered trial is divided by dotted lines into c: cue; d: delay; ch: choice; and r: response periods. Indicated below the records are: (Trials) number of the current trial; (Stimuli) the stimuli used for each trial (sample, and distractor in parentheses); (Guess) the tentative target proposed by the network for each trial. The reward given to the network for each trial is indicated by colour (red = negative reward, green = positive reward). Trial 1 – sample: 1 – failed. VR(1) fires and initiates a bottom-up representation of stimulus 1, which propagates upward into layer M where it triggers the cluster specific to this image (not shown). The system fails twice to sustain activity of VR(1) once the sample image is hidden: during the delay [when it triggers VR(3)] and during the choice period (when it activates VR(4), corresponding to the distractor). R = −1: connections from the M-cluster of image 1 onto VR(4) and L(4) are weakened by decrements of −δJR all the way to J0. Trial 2 – sample: 2 – succeeded. VR(2) and the cluster in M of image 2 are triggered by the sample presentation during the cue period. This time, VR(2) stays on during the delay, although the system ‘hesitated’ for a few instants and triggered VR(1) and VR(4). During the choice period, the network selects activity corresponding to the target, VR(2) and L(2). R = 1: reinforcement of all connections between the M-cluster of image 2 and VR(2) and L(2). Trial 3 – sample: 3 – failed. Similar to trial 1, though with sample image 3. The system wrongly expects image 4 as target instead of stimulus 3 and receives negative reward as a result: its expectation of image 4 as target suppresses the activity of cells VR(2) and VR(3), even though they code for the images actually presented to the network. Trial 4 – sample: 4 – succeeded. The model passes the trial even though it made a wrong assumption about the target during the delay (choosing image 3 instead of 4). Connections from the cluster of image 4 in M toward VR(4) and L(4) are reinforced. Trial 5 – sample: 1 – succeeded. Another example of a successful trial in spite of a wrong choice during the delay period. Trial 6 – sample: 2 – failed. The system chose as target an image (image 3) which was neither the sample nor the distractor, even though it had passed trial 2, where the sample was also image 2. The final four trials are successful.

Threshold adjustment turns out to be important here, as it allows the system to narrow down future possibilities after each positively reinforced trial. Indeed, upon positive reinforcement, all connections afferent to the VR and L neurons corresponding to the target have their synaptic weights increased; the firing thresholds of these cells then rise sharply, because the thresholds depend on the afferent synaptic coefficients (equation 6). Hence, the cells in question will no longer be triggered by connections from the other (non-target) clusters in M, because these connections are still weak: the reinforced stimulus-specific pair of VR and L cells can from then on become active only during trials using that same image as sample, which reduces the amount of trial and error by the network. Success at the next trial with that particular sample image is still not guaranteed, however, if other VR and L cells still have low thresholds (see trials 2 and 6 of Fig. 5 for an example of this general argument). The network can be said to learn by elimination.
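The effect of weight-dependent thresholds can be illustrated numerically. Equation 6 is not reproduced in this section, so the sketch assumes, purely for illustration, a threshold proportional to the summed afferent weights with a hypothetical factor `kappa`; the weights, the cluster sizes and the helper names are also made up.

```python
import numpy as np

def threshold(afferent, kappa=0.4):
    # Hypothetical stand-in for equation 6: threshold proportional to the
    # summed afferent weights (the actual functional form is not shown here).
    return kappa * np.sum(afferent)

def fires(afferent, presyn_active, kappa=0.4):
    """True if the weighted input from active presynaptic cells exceeds the
    weight-dependent threshold."""
    drive = np.dot(afferent, np.asarray(presyn_active, dtype=float))
    return bool(drive > threshold(afferent, kappa))

# Hypothetical VR(4) unit receiving input from 20 M cells:
# cells 0-9 belong to cluster 1, cells 10-19 to cluster 4.
cluster1 = np.r_[np.ones(10), np.zeros(10)]
cluster4 = np.r_[np.zeros(10), np.ones(10)]
before = np.full(20, 0.1)                          # weak, undifferentiated
after = np.r_[np.full(10, 0.1), np.full(10, 0.9)]  # cluster 4 reinforced

print(fires(before, cluster1))                          # True: spurious response
print(fires(after, cluster1), fires(after, cluster4))   # False True
```

Before reinforcement, the low threshold lets cluster 1 drive VR(4); after the cluster-4 weights grow, the raised threshold filters out the weak cluster-1 input while cluster 4 still drives the cell, which is the elimination effect described above.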

This leads to an original prediction: once a trial with a given sample has succeeded, that image will no longer be proposed by the network in subsequent trials with other sample images. Each time the network receives positive reward, this mechanism reduces the number of possibilities remaining to be explored; it is an effective, and potentially quite general, means of coping with task complexity.

### The Mature Model

#### Performance of a Trial

The operation of the mature network can be compared with that of the nervous system of a monkey that has learned DMS over months of practice. Using snapshots taken at key moments, Figure 6 illustrates how our network operates during a complete, successful trial. Figure 1B shows the activity of the input layer and control areas during the same trial.

Figure 6.

Snapshots of the dynamics of the network at key moments of a trial where image 4 is the sample and target, and image 1 the distractor. In layers M, L and VR, neurons are represented by coloured dots (black, green and red represent silent, firing excitatory and firing inhibitory neurons, respectively). Control activities Gu, Gd and visual inputs are displayed only when active. We only display the most strongly enhanced vertical connections (green arrows).

Before the warning period, all neurons are reinitialized and their membrane potentials set to zero, ensuring that successive trials do not interfere. Such a reset could be attributed to attentional mechanisms (Miller, 2000; Ignashchenkova et al., 2004), but we chose not to model these explicitly in our already complex formalism.

From the beginning of the cue period, image 4 is presented to the network (see Fig. 6A): through connections representing pathways from primary visual areas to IT cortex, neuron VR(4) is depolarized and fires. At this time the network expects a task-relevant stimulus: the gating Gu on the upward connections leading from VR to working memory layer M is open. Visual information evoked by the sample image thus rises efficiently through connections that have been strengthened by learning, and activates a specific, stable pattern of activity in M representing image 4. The network has effectively recognized stimulus 4 and keeps a trace of it through the sustained activity of a group of neurons. This group, which emerged during learning as described earlier, overlaps largely with the cluster shown in pink in Figure 2B. Neuron VR(4) and the cells firing in M (‘cluster 4’) constitute a bottom-up (i.e. input-triggered) representation of image 4.

The delay period starts when the stimulus is hidden. Retention now relies on a neural trace of the stimulus that remains active in layer M: M has effectively become a self-sustained working memory system. The memorized trace, it should be noted, does not consist of exactly the same set of neurons as those firing during presentation (compare parts A and B in Fig. 6), though there is a large overlap. Because the trace is now self-sustained, the gating Gu on upward connections from VR to M can safely be closed. Simultaneously, the gating Gd on connections from M to VR and L is opened, allowing sustained, stimulus-related activity in the visual areas during the delay. Note that opening Gd at this moment is not needed for the model itself to execute DMS correctly; it is, however, required to reproduce experimentally observed activity patterns in IT. Successful learning ensures that, of all the connections originating from the M neurons of, say, cluster 4, only those projecting to neurons VR(4) and L(4) have been reinforced. Therefore the activity in layer M, if initiated by presentation of image 4, now triggers only neurons VR(4) and L(4) in the visual areas during the delay period. Cluster 4 and neurons VR(4) and L(4) form a new representation of image 4: a memory, or top-down, representation generated through self-sustained M neuron activity.

Towards the end of the delay period, the artificial network, just like a real monkey, must be made ready for the presentation of two images, the sample and the distractor. For this to take place efficiently in the model, the visual areas VR and L need to be free of activity. Hence, as shown in Figure 6C, the gating Gd on downward connections from M to VR and L closes during the last phase of the delay period. The membrane potentials of neurons in VR and L, which no longer receive any input, fall rapidly below firing threshold.

Figure 6D shows the state of the network at the beginning of the choice period: the target (stimulus 4) and the distractor (stimulus 1) images are presented via the input layer, triggering neurons VR(1) and VR(4). Working memory, on the other hand, contains a memorized representation of stimulus 4. Because both gatings Gu and Gd are closed, working memory and visual areas are able to work separately and in parallel: one keeping a stimulus ‘online’ and the other processing visual information.

After the first 200 ms of the choice period, both gatings Gu and Gd are opened. This results in a short and crucial computation as, simultaneously, neurons VR(1) and VR(4) try to depolarize units in layer M while the active cells in M (i.e. cluster 4) try to activate those in VR and L. However, with cluster 4 stably firing and the strong long-range inhibitory connections it extends to all the rest of layer M, the system quickly converges to a state close to that of Figure 6B: image 4 is chosen over stimulus 1 and VR(1) is turned off by the lateral inhibitory connections originating from L(4) (see Fig. 6E). According to Chelazzi et al. (1993), this process can be interpreted as a competition between the representations of images 1 and 4 to monopolize resources in working memory and in the visual system. However, Figure 6B,E shows that no representation in M actually ‘wins’ over the other since the activity patterns in layer M are not identical; the resulting activity constitutes yet another representation of image 4 as a compromise between the target expectation held in working memory during the delay and the presentation of the target together with the distractor.
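The competition described above can be illustrated with a deliberately simplified two-unit rate model. This is a generic mutual-inhibition toy, not the network's own equations: unit 0 stands for the distractor cell VR(1) and unit 1 for the target cell VR(4), the extra `top_down` input plays the role of support from the cluster held in M, and mutual inhibition stands in for lateral inhibition through layer L. The function name and all parameter values are hypothetical.

```python
import numpy as np

def compete(sensory=(1.0, 1.0), top_down=(0.0, 0.5), w_inh=2.0,
            dt=0.1, steps=200):
    """Toy two-unit rate model (Euler integration): each unit receives its
    sensory input plus top-down support and is inhibited by the other unit.
    Returns the final rates (r_distractor, r_target)."""
    s = np.asarray(sensory, dtype=float) + np.asarray(top_down, dtype=float)
    r = np.zeros(2)
    for _ in range(steps):
        drive = s - w_inh * r[::-1]          # inhibition from the other unit
        r = r + dt * (-r + np.maximum(drive, 0.0))
    return r
```

With the default parameters the target unit settles near a rate of 1.5 while the distractor unit is silenced. With `top_down=(0.0, 0.0)` the two units are perfectly symmetric and the sketch settles at an undecided state where both rates equal 1/3; in this toy, the top-down bias is what breaks the tie in favour of the remembered image.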

Finally, during the response period, the network indicates its choice by sustaining the firing of cells VR(4) and L(4) through downward connections (see Fig. 6F), despite the absence of inputs.
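The gating schedule of the mature network, as described in this section for the cue, delay, choice and response periods, can be summarized as a small lookup function. The function name, the use of period-relative times in milliseconds and the `None` entry (the state of Gu during the response period is not specified in this section) are assumptions of this sketch; period durations are left as parameters because they are not given here.

```python
def gating(period, t_ms, period_ms):
    """Return (Gu_open, Gd_open) at time t_ms (ms since the start of
    `period`, whose length is period_ms) in the mature network.
    None marks a gate whose state is not specified in this section."""
    if period == "cue":
        return True, False          # bottom-up path open, top-down closed
    if period == "delay":
        # Top-down path open, then closed for the last 200 ms (preparation).
        return False, t_ms < period_ms - 200
    if period == "choice":
        # Both gates closed for the first 200 ms, then both open.
        opened = t_ms >= 200
        return opened, opened
    if period == "response":
        return None, True           # downward connections sustain VR/L firing
    raise ValueError(f"unknown period: {period!r}")
```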

#### Activity of Single Neurons

Firing of VR and L units is described in Figure 7B. Cells in layer VR provide the visual representation in the model. Each of them is selective for a single stimulus. If this stimulus is the sample image, the cell fires during the whole trial except for the preparation period during the last 200 ms of the delay [see neuron VR(4) for the trial 4 → 4 + 1 in Fig. 6C]. In a trial where this image is used as distractor, the neuron does not fire at all except at the very beginning of the choice period, as for neuron VR(1) in the same trial (Fig. 6D). VR neurons do not respond during trials where their preferred stimulus is not presented.

Figure 7.

Responses of visual-area cells in trials where the preferred stimulus is the target or the distractor. (A) Data from Chelazzi et al. (1993) showing that the response of an IT neuron during the choice period of a DMS task depends on the sample image (modified from Chelazzi et al., 1993). (B, top) Activation (see text for details) of neuron VR(1) during trials 1 → 1 + 2 and 2 → 2 + 1. (B, bottom) Activation of inhibitory neuron L(1) as simulated in the model (continuous line: 1 → 1 + 2; dotted line: 2 → 2 + 1).

Layer L enforces a lateral inhibition controlled by the gating Gd, and its activity is specified by the content of working memory: it is therefore specific to the sample image, and L neurons fire only in trials where their preferred stimulus is the sample. Further, since L is used to enforce uniqueness of representation in layer VR, L neurons are active only during the delay, choice and response periods, excluding the interval from 200 ms before to 200 ms after the onset of the target and distractor (Fig. 6C,D).
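The role of layer L in the choice period can be illustrated with a minimal sketch. It assumes all-or-none units and instantaneous inhibition, which are simplifications of ours rather than the model's equations, and the function names are hypothetical; images 1 to 4 are indexed 0 to 3. L(k) switches on only when Gd is open and image k is held in working memory, and each active L(k) silences every VR unit other than VR(k).

```python
import numpy as np

N_IMAGES = 4  # one VR cell and one L cell per image; images 1..4 -> indices 0..3


def layer_l_activity(memory_image, gd_open):
    """Hypothetical L layer: L(k) fires only if Gd is open and image k is in working memory."""
    l = np.zeros(N_IMAGES)
    if gd_open and memory_image is not None:
        l[memory_image] = 1.0
    return l


def apply_lateral_inhibition(vr, l):
    """Each active L(k) silences every VR(j) with j != k (assumed all-or-none inhibition)."""
    vr = np.asarray(vr, dtype=float).copy()
    for k in np.flatnonzero(l):
        others = np.arange(N_IMAGES) != k
        vr[others] = 0.0
    return vr


# Choice period of trial 4 -> 4 + 1: target (image 4) and distractor (image 1) both drive VR.
vr_input = np.array([1.0, 0.0, 0.0, 1.0])
print(apply_lateral_inhibition(vr_input, layer_l_activity(3, gd_open=False)))  # Gd closed: both fire
print(apply_lateral_inhibition(vr_input, layer_l_activity(3, gd_open=True)))   # Gd open: VR(1) silenced
```

With Gd still closed (the first 200 ms of the choice period) both VR(1) and VR(4) remain active; once Gd opens, only VR(4) survives, mirroring the sequence described for Figure 6.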

Activity in layer M is more difficult to analyse due to the larger number of cells involved. We start with a global overview and then move on to the statistics of neural responses. Figure 8 shows the total activity in M as a function of time, for trials with sample image 4 and the three possible distractors. All curves almost exactly overlap (up to small variations during the transitory regime) during the cue and delay periods of the trials.

Figure 8.

Time variation of the number of firing neurons in layer M during trials with sample image 4 and distractors 1, 2 and 3.

There is first a sharp rise during the cue period as the neurons of cluster 4 are recruited. Activity settles to a plateau before the end of the cue period, indicating a stable state with 142 neurons firing steadily. As delay begins, the system switches to the memorized representation of stimulus 4; as this new state does not benefit from the support of sensory inputs, it mobilizes fewer neurons (133 in this case): there are cells in layer M which fire exclusively during sample presentation, and others firing exclusively after it. This situation will persist until the end of the first part of the choice period.

At that time, due to the presentation of the target and distractor images during the choice period, there is a sudden increase in the number of discharging neurons. This difference in neural mobilization also depends slightly on the distractor (more neurons are firing if image 2 is used as distractor), an effect which can be interpreted as a modification of the representation of image 4 in the context of the presentation of the distractor. As will be shown below, this change in the number of firing cells comes from a group of neurons whose activity is highly dependent on the time period considered.

As we saw above, during the learning phase, neurons in layer M become partitioned in four stimulus-specific clusters. Within each group, we find that neurons can be further distinguished according to their firing pattern. Figure 9A shows the proportion of all cells in M for each type of activity, compiled for all four images.

Figure 9.

(A) Distribution of M neurons according to their temporal firing pattern after completion of the learning phase. Each frame indicates (top) the cell's response when the favourite image x is the sample, (middle) when it is the distractor and (bottom) for other images. The symbol ‘Ø’ indicates cells which do not respond during the task. Almost all neurons respond in a robust way to at most one image; the rest (10 cells or roughly 1%) fire either in an inconsistent way or respond to several images. (B) Comparison of activity of PF cells in the monkey and of units in working layer M. exp: firing patterns of cells recorded in the PF cortex of the monkey while performing the DMS task. Notation ‘up’ and ‘down’ refers to the sliding of a blind used in the experimental device, which was raised and lowered to show and hide objects to the subject (modified from Fuster, 1973). The percentage near each curve represents the proportion of cells falling in each response pattern class. model: activity of M neurons with firing patterns similar to those in exp.

Neurons of the largest class (type 1) are active during the whole trial and make up 71% (637/900 cells) of layer M. These neurons do most of the work of sample discrimination (cue period), retention (delay period) and image selection (choice and response periods). There are also cells (type 1′: 10/900, ∼1%) which, in addition, respond during the choice period in trials where their preferred image is used as the distractor. The response to the distractor image is transitory: in most cases, inhibitory connections from neurons in the target-specific cluster rapidly hyperpolarize these cells. Nonetheless, it seems that activity in layer M may briefly represent more than one image at a time.

Neurons in the remaining groups have similar firing patterns, except that they lack certain response components. This may be explained by a paucity of excitatory connections to these cells from units active during certain trial periods, but can also be due to a complementary excess of inhibitory projections. In fact, we observed that, because layer M is dominated by short-range projections, pairs of neighbouring excitatory cells tend to share the same firing pattern. By contrast, neighbouring cells of different types tend to exhibit opposite firing patterns: an inhibitory cell, for instance, will keep neighbouring cells from firing whenever it is active. This produces rich dynamics at the neuron level, although activity at the cluster level tends to be very stable and reproducible from one trial to the next. This observation is consistent with the finding by Wilson et al. (1994) that both pyramidal and GABAergic cells carry specific informational signals, though with inverted responses, which led these authors to propose a role for GABAergic cells in shaping the receptive fields of pyramidal neurons. This is indeed what happens in the model, where short-range inhibition is crucial to the self-organization of stimulus-specific clusters.

Type 3 cells are ‘sample cells’, responding only during sample presentation (11% or 97/900 neurons). Three per cent of M neurons display the opposite behaviour, namely no activity during the sample period, followed by sustained firing during the rest of the trial (type 2: 24/900 cells).

Forty cells (∼4%) are of type 4: they respond to their preferred image during both the cue and choice periods, and lack only the ability to sustain their firing during the delay. Twelve more cells (∼1%), of type 4′, have the same firing pattern but also respond when their preferred stimulus is used as the distractor. These neurons therefore respond in bottom-up fashion to a given image whenever it is presented, as sample, target or distractor. We refer to neurons of types 4 and 4′ as ‘bottom-up’ cells.

There are also 70 cells (8%) which do not fire at all during the task, the remaining 10 cells (1%) exhibiting hybrid and unstable activity.
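The cell counts quoted above can be tabulated and checked for consistency. The short script below uses only numbers given in the text: it verifies that the eight categories account for all 900 cells of layer M, recomputes the rounded percentages, and counts the cells that are active during only part of a trial (types 2, 3, 4 and 4′).

```python
# Layer M cell counts after learning, as reported in the text (900 cells in total).
M_CELL_COUNTS = {
    "1": 637,       # active during the whole trial
    "1'": 10,       # type 1 plus a transient response to the distractor
    "2": 24,        # silent during the sample, firing afterwards
    "3": 97,        # 'sample cells'
    "4": 40,        # bottom-up: cue and choice periods only
    "4'": 12,       # type 4 plus a response to the distractor
    "silent": 70,   # no response during the task
    "hybrid": 10,   # inconsistent or multi-image responses
}
N_CELLS = 900


def percentages(counts, total=N_CELLS):
    """Percentage of layer M occupied by each cell type."""
    return {cell_type: 100.0 * n / total for cell_type, n in counts.items()}


assert sum(M_CELL_COUNTS.values()) == N_CELLS
for cell_type, pct in percentages(M_CELL_COUNTS).items():
    print(f"{cell_type:>6}: {pct:5.1f}%")

partial = sum(M_CELL_COUNTS[t] for t in ("2", "3", "4", "4'"))
print(f"active during only part of a trial: {partial} cells ({100 * partial / N_CELLS:.1f}%)")
```

The partially active group comes to 173 cells (about 19%), in line with the "20% or so" figure given when the model assumptions are appraised.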

#### Comparison with Experiments I: Attending to Stimuli

Let us begin with the activity of layer VR, which should be compared to higher visual areas such as IT. Figure 7A exhibits data obtained by Chelazzi et al. (1993) on monkeys performing DMS. What is shown is the recording of a single IT cell for two stimuli as the sample, one of which (the ‘good stimulus’) was selected because it produced a high response during the cue and delay periods, the other (the ‘poor stimulus’) because it produced little such activity. During the choice period, in both trials, the good and the poor stimulus are presented together: hence, the difference in activity between the two curves must be a consequence of the retained memory.

In trials where the sample is the good stimulus, the cell's activity is strong during the cue period. There is then sustained activity during most of the delay period, and a return to spontaneous rate during the last half second or so of that period. At choice time, there is first a rise in activity which culminates 200–300 ms after the onset of the image, and then another rise until the cell's activity reaches a new maximum comparable to that registered during the sample presentation. The cell's response to the poor stimulus as the sample is at the level of background firing or even below during cue and delay periods. But during the choice period, this cell's activity rises very much like when the good stimulus was used. This surge only lasts 200–300 ms; Chelazzi et al. (1993) attributed it to a mechanism involved in selecting objects in the visual field to which the monkey attends and foveates.

In Figure 7, we compare the behaviour of neuron VR(1) with the recordings of Chelazzi et al. (1993). Figure 7B (top) shows the membrane potential VR(t) of neuron VR(1) as a function of time, reinterpreted as a firing probability (see Materials and Methods). The simulated data agree reasonably well with the experimental curves. For cell VR(1) in the model, image 1 is the ‘good sample’ while image 2 is a ‘poor sample’. The model reproduces well the rise in activity during sample presentation, the sustained delay activity, the initial rise in activity at the beginning of the choice period, and the later phase in which the response to the good stimulus is enhanced while that to the distractor collapses.

On the other hand, the model response during the late choice period is slightly higher than that to the sample, while the reverse is true for experimental data. Also, the sample response to the poor stimulus is below spontaneous activity level in experimental data, while it is at background level for the model. These slight discrepancies suggest the existence of an independent, unknown mechanism able to actively enhance or suppress activity in IT cortex. In addition, our simulations show that during the late phase (last 500–600 ms) of the delay period, activity for the good sample drops to spontaneous level. This phase could be interpreted as a preparation phase of IT cortex to process forthcoming visual information. Such a phase of low activity was, however, not mentioned by Chelazzi et al. (1993).

#### Comparison with Experiments II: DMS Task Execution

Figure 9B(exp) presents several classes of cell activity patterns found during a DMS-type task (Fuster, 1973). Cells generally have different firing patterns during different trial periods. In addition (Fuster, 1997; Miller, 2000), responses can be further separated into sample-related and task-related components (the latter being invariant from one trial to the next). These components vary from cell to cell and from one trial period to the other, though cue-differential activity is usually larger during the delay period.

Correspondingly, there are two neuronal activity components in the parts of our model representing PF: as shown in Figure 1B, control areas Gu and Gd have a task-specific but stimulus non-specific activity, while layer M neurons have developed, through learning, activity patterns highly selective for the stimuli being used (see e.g. Fig. 2B).

We now compare experimental (Fuster, 1997) and model activity profiles. Considering that we have not attempted to model motor or premotor cortices or attentional mechanisms, the agreement of our simulations with experiment seems rather good. Figure 9B (model) presents the activity of some M cells during the task and shows that the simulations contain neurons reproducing each of the activity patterns observed experimentally. The only exception is a slight difference for neurons of class a, which respond to a stimulus absent from our model (namely the sound of a sliding curtain preceding presentation of the images). In the model, class b cells correspond to the ‘bottom-up’ neurons of type 4, which react to the bottom-up presentation of images but cannot sustain their activity during the delay (see Fig. 9A). Class c cells fit into types 1 and 1′, participating in all three representations of an image; this type of cell dominates in number both in the model and in the experimental data. Cells of class d, however, are very rare in layer M and probably play little role in the network dynamics.

### Robustness and Resistance to Damage

It is instructive to begin this section with a brief description of the two main network failure modes. First, badly formed clusters may occur in layer M; second, improper connection bundles may form, linking clusters in layer M to the wrong cells in VR and L. The first failure mode is the more common, and can be minimized by increasing the number of cells and connections in layer M. The second failure mode, wrong interlayer connectivity, can be traced to an excessive lability of the clusters during the learning phase.

Simulations show that performance of the network remains excellent over a wide range of the parameters α, η and ρ (we have not tried to fine-tune parameter values to maximize success). We simply chose threshold scale parameters α large enough to avoid saturation of neural activity before the end of learning, but small enough that most cells will respond when their most strengthened afferent connections are active. Because the clusters cooperate to sustain their own activity, performance depends only mildly on task parameters such as the length of the delay. The main effect of lengthening the delay is to somewhat prolong the learning phase, until recurrent connections are strong enough to bridge the delay in self-sustained fashion.

We have tested the resistance of the mature system to cell death. In a network which has learned to perform with 94% success, killing 25% of the cells in M leads to a performance drop to ∼ 90% success, which is still very good. With 50% and even 60% of cells eliminated, the performance is still above or around 50% success rate. The ‘cognitive catastrophe’ occurs around or above 60% of cell death: performance becomes very erratic and quickly drops to nil. In this process of graceful degradation, individual networks may behave very differently, and statistics hardly represent the full variety of observations.
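The cell-death protocol can be sketched as a small harness: a random subset of M cells is permanently silenced, and the frozen network is run on test trials to estimate its success rate. The `run_trial(alive, rng)` callable stands in for one simulated DMS trial of the lesioned network and is hypothetical; the `toy_trial` placeholder below merely succeeds with a probability equal to the fraction of surviving cells and does not reproduce the figures reported above.

```python
import numpy as np


def lesion_mask(n_cells, fraction_killed, rng):
    """Boolean mask of surviving cells; exactly round(fraction * n_cells) cells are killed."""
    n_dead = int(round(fraction_killed * n_cells))
    alive = np.ones(n_cells, dtype=bool)
    alive[rng.choice(n_cells, size=n_dead, replace=False)] = False
    return alive


def success_rate(run_trial, alive, n_trials, rng):
    """Fraction of test trials passed by the lesioned (frozen) network."""
    return sum(bool(run_trial(alive, rng)) for _ in range(n_trials)) / n_trials


def lesion_sweep(run_trial, n_cells=900, fractions=(0.0, 0.25, 0.5, 0.6, 0.75),
                 n_trials=200, seed=0):
    """Success rate as a function of the fraction of M cells killed."""
    rng = np.random.default_rng(seed)
    return {f: success_rate(run_trial, lesion_mask(n_cells, f, rng), n_trials, rng)
            for f in fractions}


def toy_trial(alive, rng):
    """Placeholder only: succeeds with probability equal to the surviving fraction."""
    return rng.random() < alive.mean()


print(lesion_sweep(toy_trial))
```

Since individual networks respond very differently to damage, a meaningful sweep would repeat this over several independently trained networks and random seeds rather than report a single curve.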

To analyse the effect of damage in the crucial gating areas Gu and Gd, we introduced and progressively increased randomness in their operation. We supposed that the value of the gatings can become different from 0 or 1, taking values in between, which simulates partial misfirings. Thus, when gatings G are assumed to be perturbed on average 10 times per second with white noise of amplitude N, keeping 0 ≤ G ≤ 1, performance decreases only slowly as N increases but drops abruptly to almost nil when N reaches ∼1.5, i.e. when gating can be completely reversed by noise about twice per second.
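The conversion of N ≈ 1.5 into "about twice per second" can be checked under an explicit assumption of ours, not stated in the text: each perturbation adds a value drawn uniformly from [−N, N] to the nominal gating value, which is then clipped to [0, 1]. A closed gate (G = 0) is fully reversed only when the draw is at least 1, which happens with probability (N − 1)/(2N); by symmetry the same holds for an open gate. The sketch below computes this rate analytically and by Monte Carlo; function names are hypothetical.

```python
import numpy as np

KICK_RATE_HZ = 10.0  # mean number of perturbations per second (value from the text)


def full_reversal_rate(n_amp, kick_rate_hz=KICK_RATE_HZ):
    """Analytic rate (per second) of perturbations that fully reverse a gate.

    Assumes perturbations uniform on [-n_amp, n_amp]; a closed gate (G = 0) ends
    up at G = 1 after clipping only if the perturbation is >= 1.
    """
    if n_amp <= 1.0:
        return 0.0
    return kick_rate_hz * (n_amp - 1.0) / (2.0 * n_amp)


def simulated_reversal_rate(n_amp, kick_rate_hz=KICK_RATE_HZ, n_kicks=200_000, seed=0):
    """Monte Carlo estimate of the same rate, clipping the perturbed gate to [0, 1]."""
    rng = np.random.default_rng(seed)
    kicks = rng.uniform(-n_amp, n_amp, size=n_kicks)
    g_after = np.clip(0.0 + kicks, 0.0, 1.0)  # start from a closed gate
    return kick_rate_hz * np.mean(g_after >= 1.0)


for n in (0.5, 1.0, 1.25, 1.5, 2.0):
    print(f"N = {n:4.2f}: analytic {full_reversal_rate(n):.2f}/s, "
          f"simulated {simulated_reversal_rate(n):.2f}/s")
```

Under this assumption N = 1.5 gives 10 × 0.5/3 ≈ 1.7 complete reversals per second; Gaussian noise with standard deviation N would instead give about 2.5 per second. Both are of the order quoted in the text, and no complete reversal is possible at all while N ≤ 1.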

### Importance of Gating Precision

Gating signals are important for task performance. However, as shown above, gating precision can be relaxed to quite an extent after learning. Additionally, the gating activities suited for reproducing some detailed data on task performance need not be the ones that lead to the best learning capacity.

We have not checked systematically the effect of noisy gatings on learning. However, an example will suffice to illustrate the possibilities. Figure 10B shows what happens to the model in a particular trial (with image 3 as sample/target) if we use, during learning, gating activities which work well in the mature network (compare Fig. 10A). The dynamics unfolds correctly during the cue and delay periods but, when gating Gu is set to ‘on’ during choice, visual information concerning both the target and the distractor rises from visual area VR to working memory. Working memory already harbours activity coding for the sample/target image 3, but now, cells selective for the distractor (image 1) will begin firing as well, and Hebbian learning will spuriously strengthen excitatory connections between cells of clusters 1 and 3. This process can prove sufficient to prevent successful learning. This is also the case when we set Gu ‘on’ during delay (Fig. 10C): this introduces a feedback between M and VR and eliminates almost all stimulus specificity of responses in layer M.

Figure 10.

Influence of gating Gu on the dynamics and on the segregation of neural activities, for three instantiations of a network that have identical connectivity, layers M (columns) and neuron updating sequences, and that are submitted to the same series of trials and stimuli during learning. (A) System with gating Gu ‘off’ during the delay and choice periods. Shown are snapshots of the system dynamics during cue, delay and choice; bottom: segregation of cells into four groups according to the image they respond to. (B) System with Gu set ‘on’ during the choice period. This variation induces the firing of clusters in M representing both the sample/target and the distractor, and failure of the trial. Consequently, through Hebbian learning, the clusters of cells for images 1 and 3 become associated (bottom). (C) System with gating Gu set ‘on’ during the delay, for a different trial than in (B). The feedback between layers M and VR now induces multiple associations, and all specialization of cells is lost, as shown at the bottom: most cells in M now fire for all of the images.

Simulations show, however, that after 10–15 successful trials, Gu may actually be switched to ‘on’ during the choice period, even before removing all synaptic plasticity, with only a small loss of performance. It seems that, at this point, the connectivity of the network is sufficiently committed to stable execution of the task to let the model ignore the distractor image.

Note also that, particularly in view of Frank et al.'s (2001) modelling results, it is not unlikely that some of the burden of signal gating might be transferred to the basal ganglia in the course of task routinization.

## Discussion

### Predictions of the Model

The present model makes the following predictions.

• 1. The simulations give quantitative predictions about cell segregation into stimulus-specific sets and about activity development in working memory (i.e. PF cortex).

• 2. The development of activity in visual IT cortex (represented by layer VR) is predicted in detail during learning, as connections from PF to IT cortex are selected and reinforced.

The predictions in (1) and (2) are in general agreement with data obtained on monkeys before and after learning delayed-response tasks; indeed observations indicate that existing responses are increased, decreased or eliminated by learning, in full agreement with our results (Kubota and Komatsu, 1985; Tremblay and Schultz, 2000; Schultz et al., 2003).

• 3. The model illustrates the lability and relative vulnerability of neural systems during the early phase of task learning, and the subsequent consolidation.

• 4. The model makes a testable behavioural prediction on the type of errors made during learning: after succeeding in a trial using a given sample, that sample will not be selected again as target in trials with a different sample (though the subject can still fail trials with the said sample). This rule of ‘learning by elimination’ (Changeux, 1983), which is rigorously observed by our network, is a direct consequence of our operational definition of the neurons' variable firing thresholds. In experimental data gathered on animals or humans, it could manifest itself as a statistical bias.

• 5. Resistance to damage: the system's performance degrades gracefully as neurons are destroyed in the working memory subsystem; we predict that roughly two-thirds of all neurons in the relevant PF area must become non-functional before performance is totally lost. The gating areas can also be perturbed to quite an extent, though because of the way we modelled those areas it is not possible to estimate damage resistance in terms of quantitative neuronal damage. In future work we plan to investigate the effects of transmission delays as may result from damage to axonal myelin sheaths.

Apart from the ordinary DMS task, the model can pass slightly different versions of DMS. Its strong lateral inhibition in layers M and L allows it to succeed, with almost no performance loss, for example, in trials where two distractors are presented together with the target during the choice period instead of just one. We also subjected the model to trials where two distractor images are presented during the choice period, and the target is absent. In this case the model architecture is not sufficiently complex to be able to suppress top-down firing corresponding to the ‘expected’ target — a case where a finer distinction between target memory, which should still be present, and pointing gesture (impossible because of target absence) should be drawn.

### Comparison with Other Models

We compare first the neural network model presented here with a class of models for learning DMS-type tasks that was proposed by Zipser, Moody and colleagues (Zipser et al., 1993; Moody et al., 1998; Moody and Wise, 2000). These authors use a variant of the back-propagation learning algorithm. Back-propagation is a supervised learning algorithm in which the network is explicitly given information on how to solve a problem. In response to a given input, the model offers a tentative response, and an error function quantifying the difference between that response and the desired solution is computed. This error is used to compute simultaneous modifications to all synapses of the network, which bring the next answer of the network closer to the desired one. The back-propagation algorithm is mathematically efficient, but little physiological evidence can be adduced in its favour. However, this highly detailed supervised learning allows the network to learn and pass the task with essentially no initial anatomical structure (it is homogeneous and fully connected): neurons with various functionalities may appear anywhere in the system. This class of models nonetheless shares with the present work a gating that allows stimuli to be stored in working memory (i.e. a component similar to our executive gating Gu). Also, once learning is complete, the network activities modelled by Zipser, Moody and colleagues reproduce quite well those of PF and IT neurons in the monkey after learning various DMS-type tasks.

Here we have used Hebbian learning and reinforcement learning, which are far more plausible biologically (reviewed in Dehaene and Changeux, 2000) but undeniably less powerful than back-propagation. We therefore endowed our model beforehand with spatial structure (layers) and with a temporal gating structure; these are the features that make task learning possible here. Spatially, cells at different levels in the hierarchy of layers M, VR and L have different connection patterns, consistent with what is known of the anatomy of the cerebral cortex. Moreover, a combination of temporally coherent gatings allows for coherence in the states of activity of the various areas. Indeed, it seems reasonable to assume that in the course of evolution, as specialized brain areas diversified in architecture and function, a mechanism emerged to allow these areas to work in concert. How gating patterns themselves might emerge in neural circuitry will be the subject of future work; here we simply note again that gating activity during learning has indeed been observed in electrophysiological studies on the barn owl midbrain by Knudsen and colleagues (Gutfreund et al., 2002; Knudsen, 2002); such gating operates when fine-tuning of auditory and visual maps is required, and appears inactive otherwise.
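The contrast drawn here between supervised error-driven learning and reward-modulated Hebbian learning can be sketched for a single weight matrix. Both update rules below are generic textbook forms chosen for illustration (the delta rule, which is back-propagation restricted to one linear layer, and a Hebbian rule scaled by a scalar reward); they are not the exact rules used in this model or by Zipser, Moody and colleagues, and all names are hypothetical. What matters is what each rule needs: the supervised step requires the desired output, whereas the reward-modulated step uses only local pre- and postsynaptic activity plus one global reward signal.

```python
import numpy as np


def supervised_step(w, x, target, eta=0.1):
    """Delta rule: back-propagation for a single linear layer.

    Requires the desired output `target`; the error adjusts all synapses at once.
    """
    y = w @ x
    error = target - y
    return w + eta * np.outer(error, x)


def reward_hebbian_step(w, x, y, reward, eta=0.1):
    """Reward-modulated Hebbian rule: dw_ij = eta * reward * y_i * x_j.

    Uses only presynaptic activity x, postsynaptic activity y and a scalar
    reward (e.g. +1 for a correct trial, -1 for an error); no target is given.
    """
    return w + eta * reward * np.outer(y, x)


x = np.array([1.0, 0.0, 1.0])        # presynaptic activity
w = np.zeros((2, 3))                 # 2 postsynaptic cells, 3 presynaptic cells
target = np.array([1.0, 0.0])        # desired output (only the supervised rule sees it)
w_sup = supervised_step(w, x, target)
w_rl = reward_hebbian_step(w, x, y=np.array([1.0, 0.0]), reward=+1.0)
print(w_sup)
print(w_rl)
```

With a negative reward the reward-modulated rule weakens the same coactive synapses instead of strengthening them, and with zero reward it leaves the weights unchanged.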

Thus, the model network presented here is not as efficient at learning DMS as a back-propagation model is, but it is much more neurobiologically realistic. The system self-organizes to pass DMS owing to basic features of its anatomy and to the temporal activity patterns in its executive areas. While many parameters, such as the number of connections, cell types in layer M, magnitude of the synaptic plasticity and gating activity, have been chosen so that the model operates with a fair amount of success, many resulting features, like the size and exact number of cells in each cluster, are largely unpredictable and may change from one run to the next. Indeed, the probability that identical networks are produced in two separate runs is negligible. Nonetheless, the system, which possesses several thousand degrees of freedom, evolves in >90% of the cases from an uncommitted connectivity into a coherent state where the DMS task has been learned.

A model for IT and PF cortex generally similar to ours has been proposed by Usher and Niebur (1996). A fundamental difference is that these authors rely, to account for the Chelazzi et al. (1993) data, on a slightly biased competition between the activity representing target and distractor. The present model, by adding an active inhibitory layer L, is of an attentional type where PF cortex can actively select which stimulus must be presently attended to.

Frank et al. (2001) have proposed a model where gatings roughly analogous to our Gu are putatively located in the basal ganglia. Because of their model's connectivity pattern, however, and contrary to our gatings, theirs are stimulus-specific and thus form no representation of the task proper. Their model also emphasizes single-cell properties, whereas ours emphasizes neuronal-group properties, and it includes no analogue of our Gd.

Another interesting model for delayed-response tasks has been proposed by Guigon et al. (1995). It focuses on a spatial delayed task, where the subject must remember during the delay which of two levers to pull. The necessary memory is somewhat limited, and the delay is explicitly ended by a ‘go’ signal. The model possesses visual and motor cortices connected to hidden units representing PF cortex; there are also ‘reward’ and ‘motivation’ pathways. ‘Motivation’ is simply a variable which resets the network at the beginning of each trial; reward influences future states of the network through synaptic plasticity rules differing from standard Hebbian learning. The main difference from the present formalism is the use of modified, bistable neurons which can stay in a silent or firing state without any external input. These units are therefore able to keep firing without excitatory recurrent connections such as those we employ. An interesting aspect of Guigon et al.'s work is the training phase, which proceeds in three stages similar to those used with experimental animals: the network first learns to associate reward with pulling a lever, then with pulling a lever only when the go signal has been given, and finally with pulling, after the go signal, the one lever indicated before the delay. Overall, the network reproduces well the electrophysiological data on IT and PF cortex in the monkey performing the task, and makes predictions on neural activity during the whole training period.

### Appraisal of the Model Assumptions

Several simplifying assumptions underlie our work. First, we omit the process of perception whereby primary visual cortex associates neural activity to external objects. Also, we use only four cells to represent IT cortex. This last restriction can, however, easily be lifted: as we have shown in the Results section, mechanisms that allow layers of hundreds of neurons to segregate into a few stimulus-specific clusters do exist and could be introduced in our modelling of IT.

The model neurons are not capable of graded firing. To test the robustness of our conclusions, we have performed some runs using neurons endowed with graded (piecewise-linear), non-saturating response curves (i.e. with ∼10% residual misfiring even at maximal discharge rates). Performance is not affected, except for the need of a slightly longer learning phase. We observe that during the delay, activity is more labile, and involves smaller and more dispersed clusters. Individual cell responses become more complex in terms of quantitative activity levels, yet still fall mostly into the categories defined above; noise is, of course, more prevalent. Rarely (in 2 or 3 cells in 900) does one observe in layer M some responses of a cell to two images, corresponding apparently to misfiring in adjacent inhibitory units.
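The two kinds of unit compared in this test can be contrasted with a minimal sketch. The binary unit fires whenever its membrane potential reaches threshold; the graded unit maps the potential to a firing probability that rises linearly above threshold and is capped at 0.9, so that about 10% misfiring remains even at maximal drive. The threshold, the width of the linear ramp and the parameter names are assumptions made for illustration; the text does not give the exact curve used.

```python
import numpy as np


def binary_response(v, theta):
    """All-or-none unit: fires (1.0) when the membrane potential reaches threshold."""
    return (np.asarray(v, dtype=float) >= theta).astype(float)


def graded_response(v, theta, width=1.0, p_max=0.9):
    """Piecewise-linear firing probability (hypothetical parameters).

    0 below theta, linear over [theta, theta + width], capped at p_max, so that a
    fraction 1 - p_max of updates at maximal drive still fail to fire.
    """
    v = np.asarray(v, dtype=float)
    return p_max * np.clip((v - theta) / width, 0.0, 1.0)


def sample_spikes(p, rng):
    """Draw one Bernoulli spike per unit from firing probabilities p."""
    p = np.asarray(p, dtype=float)
    return (rng.random(p.shape) < p).astype(float)


rng = np.random.default_rng(0)
v = np.array([-1.0, 0.5, 10.0])
print(binary_response(v, theta=0.0))   # all-or-none
print(graded_response(v, theta=0.0))   # graded, never above 0.9
print(sample_spikes(graded_response(v, theta=0.0), rng))
```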

Layer M serves a dual purpose in the model's operation. First, it stores the stimuli used for the task by synaptic modification, creating a ‘long-term’ memory trace of these images. Second, it constitutes a ‘working memory’, a short-term store of stimulus-specific patterns, whereby sample presentation elicits a specific activity in M by reactivating the neural circuits constructed during learning. Regarding memory, we did not model the contribution of hippocampus and thalamus; nonetheless, it seems plausible that layer M encompasses functions of hippocampus and thalamus in addition to PF. Neural activity of units in layer M might therefore match that of hippocampal or thalamic cells in addition to PF neurons during DMS task performance.

Activity in layer M is very stable, as appropriate for memory functions. The whole layer therefore has to be explicitly reset in some way at the beginning of each trial. Cells in the monkey do present a collapse of activity at the end of trials. Fuster (1997) recognized the necessity for a mechanism of this type; he coined the term ‘inhibitory control’ for it, and placed its neural substrate in orbitomedial PF cortex. Note also that the model predicts the existence, in PF cortex, of a transient activity related to the distractor and coexisting with that of the target (see Fig. 9A). This should be fairly easy to test experimentally.

Layer L implements time-dependent, sample- or target-specific lateral inhibition. It serves as an intermediary between the PF and IT cortices, playing a role in visual attention by enforcing uniqueness of stimulus representation in visual area VR. As far as we know, cells with a firing pattern such as those of layer L during performance of the DMS task have not yet been observed. They are stimulus-specific but fire neither during sample presentation nor during the first few moments (200–300 ms) of the choice period. Cells of layer L type are one of the founding hypotheses of the present model, and their existence could be tested experimentally. Neurons in L, which enable PF cortex to suppress activity in IT cortex, play a role of inhibitory control, just like the memory reset mentioned above. However, L might be located in IT cortex. Indeed, its connection pattern is similar to that of the IT-located layer VR, as it is controlled at a distance by long-range downward projections from layer M. Further, short-range shunting connections sent by L neurons complement those sent by layer VR to working memory. The emergence and development of such a tight arrangement in the cortex is easier to explain if layers L and VR are in close proximity.

We do not anticipate that lifting the restriction of layer VR to only four cells will lead to difficulties; only computational limitations have prevented us from doing so here. One might then wonder whether our model has more cells in layer M than it really needs. It should be noted that about 20% of M neurons are not active throughout the whole of a trial. This arises from the organization of cells into clusters and is a consequence of the complex connectivity within clusters. Are such cells needed at all? Our simulations lead us to doubt that a system comprising no partially inactive cells could emerge through learning and still possess stable dynamics. For example, we can somewhat relax the bias of connections from VR toward the corners of M: this produces diffuse and intricate clusters, and the role played in cluster stability by cells other than those of type 1 (active all along) then becomes crucial.

In fact, because of its simplicity, the DMS task could in principle be modelled using a very small working memory module that retains the cue image as stimulus-specific sustained activity, which then ‘points’ to the target through a classical competition mechanism that in turn eliminates distractor-related activity in visual cortex. However, watching monkeys go through several stages of learning, each of slightly increased difficulty, before finally reaching correct DMS performance strongly suggests the progressive construction of a representation of the task itself. Here, we propose an architecture for PF and IT cortex that answers this requirement. Whether such neural correlates of the task exist in the brain of monkeys should be investigated experimentally.
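
For comparison, the ‘very small’ alternative mentioned above can be sketched in a few lines. The sketch assumes one-hot stimulus codes, a memory that does not decay during the delay, equal bottom-up drive for every displayed image and a hypothetical top-down `bias` parameter, with competition reduced to a winner-take-all choice. It illustrates the kind of hard-wired solution the present architecture is meant to go beyond and does not describe the model itself.

```python
from typing import Sequence


def minimal_dms(sample: int, choice_array: Sequence[int], n_stimuli: int,
                bias: float = 1.0) -> int:
    """Return the position in choice_array selected by a toy DMS circuit."""
    # Sample period: stimulus-specific sustained activity (one-hot memory).
    memory = [0.0] * n_stimuli
    memory[sample] = 1.0
    # Delay: the memory pattern is simply held (no decay, no distractors).
    # Choice period: equal bottom-up drive plus top-down bias from memory.
    drive = [1.0 + bias * memory[s] for s in choice_array]
    # Competition: winner-take-all; the distractor's activity is discarded.
    return max(range(len(choice_array)), key=lambda p: drive[p])


print(minimal_dms(sample=2, choice_array=[0, 2], n_stimuli=4))  # 1
print(minimal_dms(sample=3, choice_array=[3, 1], n_stimuli=4))  # 0
```

Such a circuit passes DMS by construction, but nothing in it is learned. This is the contrast the paragraph above draws with the gradual, staged learning observed in monkeys.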

Control variables Gu and Gd play a key role in the dynamics of the model. They have on/off gating activities, are stimulus non-specific, and were chosen a priori to allow successively for sample storage, sustained memory and matching with the target. Although these gatings were not learned here, they correspond to a simple binary switch between input-dominated and memory-dominated regimes; we shall examine in future work how they could be learned. Their existence, at any rate, corresponds to a fundamental, executive-level modular organization of the brain. There is evidence that laboratory animals go beyond simple association when learning DMS, since they are capable of generalizing from a set of known stimuli to a set of previously unseen ones (Rainer and Miller, 2002); this raises the question of the existence, location and dynamics of a neural substrate for this ‘task-level representation’. The inhibitory control proposed by Fuster (1997) certainly overlaps to some extent with this neural substrate; our variables Gu and Gd are phenomenological variables that would appear as indispensable gating complements of Fuster's inhibitory control. In view of the results of Wallis et al. (2001) on rule-specific activity in orbitofrontal cortex, we may tentatively propose that this area could also harbour gating-related activity, i.e. task-specific cells.
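
The ‘binary switch between input-dominated and memory-dominated regimes’ can be illustrated with a single hypothetical gate `g` that mixes new input with the current memory state, m ← g·x + (1 - g)·m. This toy rule collapses the roles of Gu and Gd into one variable and is an assumption, not the model's update rule. With g = 1 the memory copies its input (sample storage); with g = 0 it ignores the input and holds its content (delay), so a distractor shown while the gate is closed leaves no trace.

```python
from typing import List, Sequence


def gated_memory(inputs: Sequence[Sequence[float]], gates: Sequence[float],
                 size: int) -> List[List[float]]:
    """Toy gated store (hypothetical): m <- g * x + (1 - g) * m at each step."""
    m = [0.0] * size
    history = []
    for x, g in zip(inputs, gates):
        m = [g * xi + (1.0 - g) * mi for xi, mi in zip(x, m)]
        history.append(list(m))
    return history


sample, distractor = [1.0, 0.0], [0.0, 1.0]
# Gate open for the sample, closed while the distractor is shown, then open again.
hist = gated_memory([sample, distractor, distractor], gates=[1, 0, 1], size=2)
print(hist)  # [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```

In this toy, reopening the gate simply overwrites the memory with whatever is currently on screen.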

At any rate, the presence in the central nervous system of neural circuitry similar to that simulated by areas Gu and Gd is indirectly supported by electrophysiological data indicating complex, task-dependent interactions between PF and IT cortices. A first example is the activity of IT cortex cells recorded while a monkey performs a so-called running-DMS task in which, after the delay, several distractor images are presented one by one before the target appears (Miller and Desimone, 1991, 1994; Miller et al., 1993, 1996). To succeed, the monkey must withhold its response until the target image appears. Electrophysiological recordings show that the activity of numerous stimulus-specific cells in IT is strongly modified by the successive presentation of the distractor images, while some PF cortex cells have a more stable response across the trial (Miller et al., 1996; Sakai and Passingham, 2003). This indicates that information does not merely rise automatically from IT to PF cortex: some task-related control seems to take place, as memory held in PF is shielded from the distractors that are at the same time processed by IT cortex (Miller and Desimone, 1994; Sakai and Passingham, 2003). Similarly, the work of Chelazzi et al. (1993) indicates that activity in IT cortex during the DMS task does not simply reflect the content of working memory. The two phases of response observed by these authors (first to both target and distractor, then mostly to the target) show that activity in IT cortex can be modified by PF cortex in a task-related way.

We take the experimental results just described and the success of our simulations as convincing evidence for a complex ‘executive’ mechanism by which PF controls (i.e. suppresses or enhances) activity in IT at precise times, and by which PF gates the information allowed to rise into PF itself from sensory areas in a manner essentially related to the task at hand.

The precise choice of the gating functions in the model is however a difficult issue. Thus the patterns of Figure 1B show only one possibility, i.e. gating Gu ‘off’ during the choice and response periods while learning but ‘on’ once training is over. It should be emphasized that the network passes DMS just as well, after the training period, with Gu kept ‘off’ during the choice and response periods. We chose the former activity pattern for two reasons.

First, it brings neural activity closer to that measured in monkey PF cortex, as it introduces a bottom-up component in the activity of cells that fire in response to the target or the distractor during the choice and response periods (neuron types 1′, 4 and 4′). Second, and relatedly, it leaves working memory free to receive information during the choice period. This information could be, for example, the distractor. As type 1′ and 4′ cells show, our working memory module can sustain for a short time neural representations of both the target and the distractor images. Figure 8 shows that an interaction between these representations does indeed exist, as the number of neurons firing in response to target image 4 differs depending on the distractor that accompanies it (we would expect this effect to be more pronounced if the bias on connections from layer VR to M were lifted; see above). To our knowledge, no electrophysiological study of the interaction between the representations of target and distractor images has yet been published for DMS, but we predict that such interactions might take place if restrictions on the Gu gating are as chosen here.

Another type of information which could be gated in layer M by setting Gu ‘on’ during the choice period might be a behavioural input such as a go/no-go signal instructing the system to withhold its response. Although the current task does not require this precise control, it seems to us that allowing for it represents a natural step towards opening the present architecture to more complicated tasks.

## Conclusion

We have presented a multilayer neural network, endowed with executive gating systems, which is able to learn by reinforcement and thereafter execute correctly a unimodal delayed-matching-to-sample task. The way in which the system develops during learning, and functions when mature, yields specific and experimentally accessible predictions at the level of both neural and behavioural organization, and suggests further empirical studies.

## Appendix: Equivalence of threshold adjustment and synaptic scaling

According to equations (1) and (6), neurons fire whenever $v_i(t) > V^{\sigma}_{i}(t)$, i.e. when the following condition is satisfied:

$$v_{i}(t-\Delta t)\,\mathrm{e}^{-\Delta t/\tau_{v}}+\frac{\Delta t}{\tau_{v}}\left[\sum_{j\neq i}J_{ij}(t)F_{j}(t)\right]>\frac{\Delta t}{\tau_{v}}\,\frac{1}{1-\mathrm{e}^{-\Delta t/\tau_{v}}}\left[\frac{J_{0}}{2}+\alpha\sum_{j\neq i,\,J_{ij}>0}J_{ij}(t)\right] \tag{7}$$
if we neglect gating $G(t)$ for simplicity. As soon as learning starts, the sum over the positive $J_{ij}$ in the threshold term on the right-hand side increases rapidly, making $J_0/2$ negligible. Dividing both sides by the factor $S(t)=\sum_{j\neq i,\,J_{ij}>0}J_{ij}(t)$, we obtain:
$$\frac{v_{i}(t-\Delta t)}{S(t)}\,\mathrm{e}^{-\Delta t/\tau_{v}}+\frac{\Delta t}{\tau_{v}}\sum_{j\neq i}\frac{J_{ij}(t)}{S(t)}F_{j}(t)>\frac{\Delta t}{\tau_{v}}\,\frac{\alpha}{1-\mathrm{e}^{-\Delta t/\tau_{v}}} \tag{8}$$
If we now define new formal variables scaled by the overall synaptic strength, namely
$$J'_{ij}(t)=J_{ij}(t)/S(t)$$
and
$$v'_{i}(t)=v_{i}(t)/S(t),$$
and treat $S$ as approximately constant over one time step $\Delta t$ (so that $v_i(t-\Delta t)/S(t)\approx v'_i(t-\Delta t)$), the firing condition of equation (8) becomes:
$$v'_{i}(t-\Delta t)\,\mathrm{e}^{-\Delta t/\tau_{v}}+\frac{\Delta t}{\tau_{v}}\sum_{j\neq i}J'_{ij}(t)F_{j}(t)>\frac{\Delta t}{\tau_{v}}\,\frac{\alpha}{1-\mathrm{e}^{-\Delta t/\tau_{v}}} \tag{9}$$
which is the equation for a neuron with a constant firing threshold [but with synaptic strengths scaled by the activity- and learning-dependent factor $1/S(t)$], in complete analogy to the scheme proposed by Turrigiano et al. (1998) on the basis of their experimental observations.
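
The equivalence can be checked numerically with a short sketch. It is a check of the algebra only, not the model's simulation code, and it assumes random synaptic weights, binary presynaptic firing $F_j$ and illustrative values $\Delta t = 1$, $\tau_v = 10$ and $\alpha = 0.3$ (none taken from the model). The hypothetical function `fires_eq7` evaluates condition (7) directly. The hypothetical function `fires_eq9` evaluates the rescaled condition with $J' = J/S$ and $v' = v/S$, holding $S$ fixed over the step as in equation (8). With $J_0 = 0$ the two decisions coincide, since dividing both sides by $S > 0$ preserves the inequality. With $J_0 > 0$ they can differ, which is why the derivation requires $J_0/2$ to be negligible compared with $\alpha S$.

```python
import math
import random
from typing import Sequence


def _positive_sum(J_row: Sequence[float], i: int) -> float:
    """S = sum of the positive weights J_ij onto neuron i (j != i)."""
    return sum(J for j, J in enumerate(J_row) if j != i and J > 0)


def fires_eq7(v_prev, J_row, F, i, alpha, J0, dt, tau_v) -> bool:
    """Firing condition (7): adaptive threshold J0/2 + alpha * S."""
    decay = math.exp(-dt / tau_v)
    drive = sum(J * f for j, (J, f) in enumerate(zip(J_row, F)) if j != i)
    lhs = v_prev * decay + (dt / tau_v) * drive
    rhs = (dt / tau_v) / (1.0 - decay) * (J0 / 2.0 + alpha * _positive_sum(J_row, i))
    return lhs > rhs


def fires_eq9(v_prev, J_row, F, i, alpha, dt, tau_v) -> bool:
    """Firing condition (9): fixed threshold, weights and potential scaled by 1/S."""
    decay = math.exp(-dt / tau_v)
    S = _positive_sum(J_row, i)  # held fixed over the step, as in eq. (8)
    v_scaled = v_prev / S
    drive = sum((J / S) * f for j, (J, f) in enumerate(zip(J_row, F)) if j != i)
    lhs = v_scaled * decay + (dt / tau_v) * drive
    rhs = (dt / tau_v) * alpha / (1.0 - decay)
    return lhs > rhs


def agreement_rate(J0: float, n_trials: int = 2000, n: int = 20, seed: int = 0) -> float:
    """Fraction of random trials on which conditions (7) and (9) agree."""
    rng = random.Random(seed)
    params = dict(alpha=0.3, dt=1.0, tau_v=10.0)
    agree = 0
    for _ in range(n_trials):
        i = rng.randrange(n)
        J_row = [rng.uniform(-0.5, 1.0) for _ in range(n)]
        J_row[(i + 1) % n] = 1.0  # guarantees S > 0
        F = [rng.choice([0, 1]) for _ in range(n)]
        v_prev = rng.uniform(0.0, 5.0)
        a = fires_eq7(v_prev, J_row, F, i, J0=J0, **params)
        b = fires_eq9(v_prev, J_row, F, i, **params)
        agree += int(a == b)
    return agree / n_trials


print(agreement_rate(J0=0.0))  # 1.0: dividing by S > 0 preserves the inequality
print(agreement_rate(J0=2.0))  # below 1.0: here J0/2 is not negligible vs alpha*S
```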

T.G. received a Châteaubriand postdoctoral scholarship from the Ministère Français des Affaires Etrangères, and a fellowship of the Canadian Institutes of Health Research.

We thank N. Borde, J.-P. Bourgeois, P. Faure, S. Granon and R. Klink for useful discussions, as well as C.W. Chen, and A. and M. Dharwadker for help with the numerical work.

## References

Amit DJ (1994) The Hebbian paradigm reintegrated: local reverberations as internal representations. Behav Brain Sci 18:617–626.
Amit DJ (1989) Modeling brain function: the world of attractor neural networks. Cambridge: Cambridge University Press.
Berger B, Trottier S, Verney C, Gaspar P, Alvarez C (1988) Regional and laminar distribution of the dopamine and serotonin innervation in the macaque cerebral cortex: a radioautographic study. J Comp Neurol 273:99–119.
Brunel N (2003) Dynamics and plasticity of stimulus-selective persistent activity in cortical network models. Cereb Cortex 13:1151–1161.
Changeux JP (1983) L'Homme neuronal. Paris: Fayard.
Chelazzi L, Miller EK, Duncan J, Desimone R (1993) A neural basis for visual search in inferior temporal cortex. Nature 363:345–347.
Dehaene S, Changeux JP (1989) A simple model of prefrontal cortex function in delayed-response tasks. J Cogn Neurosci 1:244–261.
Dehaene S, Changeux JP (1991) The Wisconsin card sorting test: theoretical analysis and modeling in a neural network. 1:62–79.
Dehaene S, Changeux JP (2000) Hierarchical neuronal modeling of cognitive functions: from synaptic transmission to the Tower of London. Int J Psychophysiol 35:179–187.
Desai NS, Cudmore RH, Nelson SB, Turrigiano GG (2002) Critical periods for experience-dependent synaptic scaling in visual cortex. Nat Neurosci 5:783–789.
Edelstein SJ, Changeux JP (1998) Allosteric transitions of the acetylcholine receptor. 51:121–183.
Feldman D (2002) Synapses, scaling and homeostasis. Nat Neurosci 5:712–714.
Frank MJ, Loughry B, O'Reilly RC (2001) Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cogn Affect Behav Neurosci 1:137–160.
Fuster JM (1973) Unit activity in prefrontal cortex during delayed-response performance: neuronal correlates of transient memory. J Neurophysiol 36:61–78.
Fuster JM (1997) The prefrontal cortex: anatomy, physiology and neuropsychology of the frontal lobe. Philadelphia, PA: Lippincott-Raven Publishers.
Fuster JM, Alexander GE (1971) Neuron activity related to short-term memory. Science 173:652–654.
Fuster JM, Bauer RH, Jervey JP (1981) Effects of cooling inferotemporal cortex on performance of visual memory tasks. Exp Neurol 71:398–409.
Gaffan D, Weiskrantz L (1980) Recency effects and lesion effects in delayed non-matching to randomly baited samples by monkeys. Brain Res 196:373–386.
Gaspar P, Berger B, Febvret A, Vigny A, Henry JP (1989) Catecholamine innervation of the human cerebral cortex as revealed by comparative immunohistochemistry of tyrosine hydroxylase and dopamine-beta-hydroxylase. J Comp Neurol 279:249–271.
Golding NL, Staff NP, Spruston N (2002) Dendritic spikes as a mechanism for cooperative long-term potentiation. Nature 418:326–331.
Goldman PS, Rosvold HE, Vest B, Galkin TW (1971) Analysis of the delayed-alternation deficit produced by dorsolateral prefrontal lesions in the rhesus monkey. J Comp Physiol Psychol 77:212–220.
Goldsmith SK, Joyce JN (1996) Dopamine D2 receptors are organized in bands in normal human temporal cortex. Neuroscience 74:435–451.
Guigon E, Dorizzi B, Burnod Y, Schultz W (1995) Neural correlates of learning in the prefrontal cortex of the monkey: a predictive model. Cereb Cortex 5:135–147.
Gupta A, Wang Y, Markram H (2000) Organizing principles for a diversity of GABAergic interneurons and synapses in the neocortex. Science 287:273–278.
Gutfreund Y, Zheng W, Knudsen EI (2002) Gated visual input to the central auditory system. Science 297:1556–1559.
Hebb D (1949) The organization of behavior: a neuropsychological theory. New York: Wiley.
Hunter WS (1913) The delayed reaction in animals and children. Behav Monogr 2:1–86.
Ignashchenkova A, Dicke PW, Haarmeier T, Thier P (2004) Neuron-specific contribution of the superior colliculus to overt and covert shifts of attention. Nat Neurosci 7:56–64.
Jacobsen CF (1935) Functions of the frontal association area in primates. Arch Neurol Psychiatry 33:558–569.
Kerszberg M (1990) Genetics and epigenetics of neural function: a model. J Cogn Neurosci 2:51–57.
Kerszberg M, Dehaene S, Changeux JP (1992) Stabilization of complex input–output functions in neural clusters by synapse selection. Neural Netw 5:403–413.
Kimura D (1963) Right temporal-lobe damage: perception of unfamiliar stimuli after damage. Arch Neurol 8:264–271.
Knudsen EI (2002) Instructed learning in the auditory localization pathway of the barn owl. Nature 417:322–328.
Koch C (2000) Biophysics of computation. Oxford: Oxford University Press.
Kubota K, Komatsu H (1985) Neuron activities of monkey prefrontal cortex during the learning of visual discrimination tasks with GO/NO-GO performances. Neurosci Res 3:106–129.
Middleton FA, Strick PL (1996) The temporal lobe is a target of output from the basal ganglia. 93:8683–8687.
Miller EK (2000) The neural basis of top-down control of visual attention in the prefrontal cortex. In: Attention and performance. 18. Control of cognitive processes (Monsell S, Driver J, eds), pp. 511–534. Cambridge, MA: MIT Press.
Miller EK, Erickson CA, Desimone R (1996) Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J Neurosci 16:5154–5167.
Miller EK, Desimone R (1991) A neural mechanism for working and recognition memory in inferior temporal cortex. Science 254:1377–1379.
Miller EK, Desimone R (1994) Parallel neuronal mechanisms for short-term memory. Science 263:520–522.
Miller EK, Li L, Desimone R (1993) Activity of neurons in anterior inferior temporal cortex during a short-term memory task. J Neurosci 13:1460–1478.
Miller P, Brody CD, Romo R, Wang XJ (2003) A recurrent network model of somatosensory parametric working memory in the prefrontal cortex. Cereb Cortex 13:1208–1218.
Milner B (1968) Visual recognition and recall after right temporal lobe excision in man. Neuropsychologia 6:191–209.
Mishkin M (1982) A memory system in the monkey. Philos Trans R Soc Lond B Biol Sci 298:83–95.
Mishkin M, Manning FJ (1978) Non-spatial memory after selective prefrontal lesions in monkeys. Brain Res 143:313–323.
Miyashita Y, Chang HS (1988) Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature 331:68–70.
Moody SL, Wise SP, di Pellegrino G, Zipser D (1998) A model that accounts for activity in primate frontal cortex during a delayed matching-to-sample task. J Neurosci 18:399–410.
Moody SL, Wise SP (2000) A model that accounts for activity prior to sensory inputs and responses during matching-to-sample tasks. J Cogn Neurosci 12:429–448.
Rainer G, Miller EK (2002) Timecourse of object-related neural activity in the primate prefrontal cortex during a short-term memory task. Eur J Neurosci 15:1244–1254.
Sahgal A, Hutchinson R, Hughes RP, Iversen SD (1983) The effects of inferotemporal cortex lesions on Konorski delayed pair comparison in monkey. Behav Brain Res 8:361–373.
Sakai K, Miyashita Y (1994) Neuronal tuning to learned complex forms in vision. Neuroreport 5:829–832.
Sakai K, Passingham RE (2003) Prefrontal interactions reflect future task operations. Nat Neurosci 6:75–81.
Sakai K, Rowe JB, Passingham RE (2002) Active maintenance in prefrontal area 46 creates distractor-resistant memory. Nat Neurosci 5:479–484.
Schultz W, Dickinson A (2000) Neuronal coding of prediction errors. Annu Rev Neurosci 23:473–500.
Schultz W, Tremblay L, Hollerman J (2003) Changes in behavior-related neuronal activity in the striatum during learning. Trends Neurosci 26:321–328.
Tomita H, Ohbayashi M, Nakahara K, Hasegawa I, Miyashita Y (1999) Top-down signal from prefrontal cortex in executive control of memory retrieval. Nature 401:699–703.
Tremblay L, Schultz W (2000) Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J Neurophysiol 83:1877–1885.
Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892–896.
Usher M, Niebur E (1996) Modeling the temporal dynamics of IT neurons in visual search: a mechanism for top-down selective attention. J Cogn Neurosci 8:311–327.
Wallis JD, Anderson KC, Miller EK (2001) Single neurons in prefrontal cortex encode abstract rules. Nature 411:953–956.
Watanabe M (1981) Prefrontal unit activity during delayed conditional discrimination in the monkey. Brain Res 225:51–66.
Waelti P, Dickinson A, Schultz W (2001) Dopamine responses comply with basic assumptions of formal learning theory. Nature 412:43–48.
Wilson FAW, Scalaidhe SPO, Goldman-Rakic PS (1994) Functional synergism between putative γ-aminobutyrate-containing neurons and pyramidal neurons in prefrontal cortex. 91:4009–4013.
Zipser D, Kehoe B, Littleworth G, Fuster J (1993) A spiking network model of short-term active memory. J Neurosci 13:3406–3420.

## Author notes

1Récepteurs et Cognition, Institut Pasteur, 25 rue du Docteur Roux, 75015 Paris Cedex 15, France and 2Université Pierre et Marie Curie, Modélisation dynamique des systèmes intégrés UMR CNRS 7138 — Systématique, Adaptation, Évolution, 7 quai Saint Bernard, 75252 Paris Cedex 05, France