The brain uses attention and expectation as flexible devices for optimizing behavioral responses associated with expected but unpredictably timed events. The neural bases of attention and expectation are thought to engage higher cognitive loci; however, their influence at the level of primary visual cortex (V1) remains unknown. Here, we asked whether single-neuron responses in monkey V1 were influenced by an attention task of unpredictable duration. Monkeys covertly attended to a spot that remained unchanged for a fixed period and then abruptly disappeared at variable times, prompting a lever release for reward. We show that monkeys responded progressively faster and performed better as the trial duration increased. Neural responses also followed the monkeys' task engagement: there was an early, but short duration, response facilitation, followed by a late but sustained increase during the time monkeys expected the attention spot to disappear. This late attentional modulation was significantly and negatively correlated with the reaction time and was well explained by a modified hazard function. Such bimodal, time-dependent changes were, however, absent in a task that did not require explicit attentional engagement. Thus, V1 neurons carry reliable signals of attention and temporal expectation that correlate with predictable influences on monkeys' behavioral responses.
Attention is a flexible spatio-temporal conduit that biases perceptual processing and prioritizes behavioral responses. Spatially directed attention is known to improve detection of change in stimulus attributes, such as the contrast or orientation of an element (Posner et al. 1982; Desimone and Duncan 1995; Carrasco and McElree 2001; Gutnisky et al. 2009). Attention in the temporal domain also has direct implications for perception and behavior (Coull and Nobre 1998; Gallistel and Gibbon 2000; Nobre 2001; Tse et al. 2004). For instance, it has been shown that stimulus processing is improved by prior knowledge of stimulus location, as well as timing of its appearance (or disappearance) (Coull and Nobre 1998; Nobre 2001; Ghose and Maunsell 2002; Hayden and Gallant 2005; Lakatos et al. 2007). A closely related, but less explored, internal construct is expectation, which may also have spatial and temporal implications. It is generated by the likelihood that an event will take place and is updated based on previous occurrences in space or in time (Mauk and Buonomano 2004; Oswal et al. 2007). The degree of temporal expectation of an event, such as the probability of change, manifests as a marked improvement in behavioral performance, which scales with time. Expectation may thus improve detection accuracy and response timing, ultimately leading to behavioral optimization (Nobre 2001; Oswal et al. 2007). However, the relationship between attention and expectation and their relative influence on early sensory processing remains unclear and even disputed (Nobre et al. 2007; Summerfield and Egner 2009).
Our primary aims were 2-fold: to investigate whether single neurons in V1 were influenced by spatial attention, and whether neural responses varied in time with expectation of a change at the attended location in monkeys performing an attention task. Past studies have provided conflicting evidence regarding the influence of attention in primate V1. This is quite in contrast to higher visual areas such as V4 and MT, as well as parietal and frontal cortices, where single neurons readily exhibit robust attentional modulation (Goldman-Rakic 1995; Luck et al. 1997; Goldberg et al. 2006; Maunsell and Treue 2006; Squire et al. 2013). Several studies found minimal, if any, influence of spatially selective attention in primate V1 (Moran and Desimone 1985; Luck et al. 1997). In contrast, Motter (1993) observed a significant increase in responses in about 25% of V1 neurons when monkeys attended to the cued location. Importantly, he also found that a sizable population, roughly 10% of neurons, exhibited response suppression with spatial attention. It appears that the key requirement is to employ tasks with sufficiently high attentional demand and an innovative use of stimulus configurations, specifically tailored to the properties of V1 neurons (Roelfsema et al. 1998; Vidyasagar 1998; Ito and Gilbert 1999; Sengpiel and Hubener 1999). In addition, consistently locating the target stimulus and distracters in close proximity, ideally within the receptive field (RF), may be critical, but difficult to achieve within the restricted size of the RF of V1 neurons (Luck et al. 1997; Klein 2000; Ling and Carrasco 2006; reviewed in Maunsell and Treue 2006).
Similar to spatial attention, neurons in hierarchically higher areas have also been shown to be influenced by the temporal flow of events that reflect internal state changes such as expectation or anticipation of an upcoming event. For example, neurons in the parietal cortex seem to follow a monkey's subjective assessment of time in a manner which is specific to the learned schedules of the task (Janssen and Shadlen 2005). Similarly, responses in area V4 have been shown to correlate with monkeys' anticipation of a change in stimulus properties, borne out of previous training, and follow the hazard rate, or the conditional probability that an event would occur provided it has not happened already (Ghose and Maunsell 2002). However, neural underpinnings of expectation (or lack of it) in V1 have been the subject of considerable debate (Sirotin and Das 2009; Kleinschmidt and Muller 2010; Vanzetta and Slovin 2010; Handwerker and Bandettini 2011). Sirotin and Das (2009) have claimed that hemodynamic signals, and not neural responses in V1, are time-locked to a monkey's anticipation of a visual stimulus. Others have argued that the mismatch may simply be a methodological artifact (Kleinschmidt and Muller 2010; Vanzetta and Slovin 2010). Regardless of whether neuronal activity correlates with hemodynamic responses, the question as to whether expectation or anticipation of an impending event (leading to a behavioral report) influences neural responses in V1, and whether attention plays a role in guiding temporal expectation, remains unresolved.
Our main challenge was to devise a behavioral task, which would allow us to have explicit control over spatial attention and temporal expectation. We hypothesized that the two could be disambiguated by sequentially manipulating the requirement to attend to the location of a target and then to the time when the target change was most likely to occur. As the target change occurred at an unpredictable time, the subjects had to implicitly track time. The key was to alternate high demand for attention to target location and then to time, that is, when the target change was expected to take place. In between was a period of relatively low attention demand that served to separate the 2 aspects of attention. Accordingly, we trained 2 monkeys in a covert attention task—while fixating at a central fixation spot they continuously monitored a peripheral attention spot. When it disappeared, they had to release a lever within a fixed time window to earn liquid reward. The time between attention spot appearance and its extinction was divided into 2: An initial “fixed” time window, in which the attention spot appeared and stayed on for a fixed duration, and a “variable” time window when the attention spot could disappear any time, prompting a behavioral response by the monkey. There were no cues given with regard to timing of fixed and variable periods, and monkeys had to make a subjective assessment of the elapsed time to correctly predict when the attention spot would extinguish. While the monkeys performed the task, we recorded from single neurons in V1 and subsequently analyzed their response dynamics. Our results show that change in V1 responses was significantly correlated with the task structure and monkeys' behavioral performance. Specifically, their reaction times (RTs) and performance improved with increasing trial duration. 
Neural responses showed a bimodal change—an early, moderate, but transient increase was followed by a sustained, late increase, beginning just before the time the monkeys expected a change to occur. Importantly, the late increase inversely correlated with RT, demonstrating that the early, largely spatial attention was later modulated by temporal expectation. To further confirm that bimodal response modulation was indeed due to time-dependent change in the attentional requirement, we recorded from the same area while the monkeys performed a simple fixation task. Here, no attentional engagement was necessary and the animals were rewarded for holding fixation within a window for fixed periods. We go on to show that in this task, there was no late, time-dependent increase in V1 responses, consistent with the lack of internally generated temporal expectation.
All experiments were performed under protocols approved by MIT's Animal Care and Use Committee and conformed to NIH guidelines. Two rhesus monkeys (6–7 kg) underwent initial chair training and were fitted with titanium head-posts under anesthesia (Dragoi et al. 2002; Sharma et al. 2003). After allowing 6–8 weeks for wound healing and head-post stabilization, monkeys were initially trained in standard eye-fixation and color change detection tasks. Training in experiment-specific tasks (see details below) followed, in which monkeys were exposed to increasing levels of difficulty as their learning progressed. After several months of training, the monkeys were able to maintain fixation on a small red dot (0.1°) for up to 5 s within a tight fixation window of 0.5° (monkey AB) and 0.75° (monkey AT) and were able to complete 3000–4000 trials in daily sessions at better than 95% accuracy. Importantly, the monkeys were trained for much longer fixation durations and completed twice as many trials as were required for the actual experiments. This was done to keep a constant motivational level and stable performance at the time of data recording. An infra-red eye tracker (ISCAN, Inc., Burlington, MA, USA), with 240 Hz sampling at ±0.1° resolution, was used for monitoring eye position. Monkeys were required to hold stable fixation throughout the trial; fixational instabilities >0.25° aborted the trial. The eye tracker was calibrated before each experiment using a five-point calibration procedure, in which the animal was required to fixate on each of 5 points (1 in the center, 2 in the vertical, and 2 in the horizontal axes or the diagonals) in steps of 5 and 12° from the central fixation spot. Gains were adjusted for linear horizontal and vertical eye deflections. Fixation patterns were carefully analyzed to rule out any systematic biases and inconsistencies under different experimental conditions [e.g., Attend-Toward (Attend-To) or Attend-Away].
Experiments 1 and 2
Figure 1a,b shows the task structure and the temporal sequence of events. Trials were initiated when the monkeys pressed down on a lever and a red fixation spot appeared. Once they acquired stable fixation for 300 ms, an attention spot (green, 0.1°) and an identical, isoluminant, gray distracter spot appeared on either side of the fixation spot. In the Attend-To condition, the attention spot appeared toward the neurons' RF, whereas it appeared at a matched location on the contralateral side in the Attend-Away condition. The attention spot could appear in 1 of 4 locations chosen randomly for each recording: 2 on the side of the RF, separated by at least 90° polar angle; the other 2 were mirror symmetric locations (Fig. 1c shows one such configuration). Introducing randomization in the location of the attention spot had 2 advantages: first, it helped normalize any location-specific bias in attention modulation with respect to a neuron's RF; secondly, it increased the task difficulty, requiring greater attentional engagement by the monkey. In experiment 1, the attention spot remained on for a fixed period of 900 ms, after which it could disappear anytime between 900 and 2300 ms. Monkeys had to wait for an additional 150 ms before releasing the lever (within 500 ms) to earn their liquid reward. The mandatory delay of 150 ms was introduced to discourage monkeys from making a reflexive release, thereby minimizing the influence of preparatory motor signals on the neural responses. No other cues were given at any time during the trial and the monkeys had to learn and implicitly follow the trial structure. While the probability of attention-spot disappearance was distributed evenly over time, its conditional probability increased with time (Luce 1986; Ghose and Maunsell 2002). 
Since the attention spot stayed on for the entire trial duration, monkeys had to continuously monitor the spot and were allowed to break fixation only after releasing the lever, thus controlling for oculomotor influences on neural responses. In experiment 2, the protocol was similar with the exception that the time schedules were very different. The total duration of each trial was shortened to 1550 ms while the initial, fixed time window was longer and lasted 1250 instead of 900 ms, thus substantially compressing the variable period when the attention spot could disappear to 300 ms (1250–1550 ms). In experiment 2, we wished to test whether changing the fixed time window would affect the duration and magnitude of early, temporally limited attention modulation. We also wanted to assess whether changing the duration of the “variable” period, and thereby the slope of the hazard rate, would bring a commensurate change in the late attention modulation. Ideally, this could be done by keeping the maximum trial length the same but increasing the fixed period and shortening the variable period. However, during training, monkeys were exposed to different lengths of “variable” schedules of the same trial duration. We therefore suspected that unless we changed the trial duration itself, just changing the variable period would not sufficiently change their previously learned behavior and might confound the results. The monkeys were also trained on durations much longer than were actually used for data recording to ensure consistent motivational levels. So, the option of increasing the trial duration also posed limitations. 
Shortening the total duration on the other hand, while increasing the fixed period and substantially compressing the variable period, we believe, achieved the same results as would have been done by keeping the overall duration constant, with an added advantage that monkeys were now exposed to an entirely different temporal structure of the task that had minimal overlap with the previously learned time schedules.
Stimulus Configuration for Attention Tasks (Experiments 1 and 2)
Stimuli were presented on a gamma-corrected monitor placed at a distance of 60 cm from the animal's eyes. Two identical patches of gray sinusoidal gratings, at 18% contrast (2 × 2°; 8 randomly interleaved orientations), were presented at the monitor refresh rate of 75 Hz (13 ms each frame; Fig. 1c): one covering the RF, and another at a matched location in the opposite hemifield. Each trial cycle consisted of 112 stimulus presentations and 8 blank conditions, for a total of 120 conditions. To minimize the influence of physical attributes on neural responses, the attention spot was always placed outside the classical RF of the recorded neuron, yet close enough to elicit attentional influence in its vicinity. Similar configurations have been used previously for examining attention-related responses in V4 (Connor et al. 1997). The RF of the recorded neuron was first mapped with manual mouse-controlled stimuli and later with patches of randomly oriented gratings in a reverse correlation paradigm while the monkeys performed the fixation task. The extent of the classical RF was ascertained by minimum and maximum response field criteria. Stimulus patterns of either expanding center or converging annuli were placed at the center of the mapped RF; a significant decrease with expanding center or a significant increase with converging annulus gave a fairly precise estimate of the extent. At visual eccentricities of 4–8° from the fovea, the average size of the RFs encountered varied between 0.67° and 1.14°. To prevent fixational instabilities from making the attention spot drift in and out of the RF, and thereby affecting the neural response, it was placed 0.25° outside the edge of the RF. As trials were automatically aborted if a monkey's fixation drifted >0.25°, this strategy effectively ensured that the attention spot never entered the RF. 
In addition, the attention spot appeared randomly in 1 of 2 locations, ≥90° (polar angle) apart outside the RF, and at matched symmetric locations in the opposite hemifield (also see Discussion).
In this experiment, there was no attention condition. Trials commenced when a red fixation spot appeared in the center of the stimulus monitor. Monkeys were simply required to acquire fixation within 300 ms and hold it steady within the prescribed window till the spot disappeared after 1500 ms. Notably in this task, there were no attention cues or any behavioral report required. Once the monkeys acquired stable fixation, stimuli consisting of randomly oriented sinusoidal gratings (8 orientations; spatial frequency: 2 cycles/deg; temporal frequency: 1 Hz, at 28% contrast) covering the RF were presented. The question we wished to ask was whether temporal expectation-related late modulation of neural responses as seen in experiments 1 and 2 was contingent upon attention. In other words, by stripping the task from requirements of attention to time or to space, we could disambiguate the influence of attention from time-dependent changes in stimulus-evoked neural responses. During the initial recordings, we used either 1 or 2 identical stimulus patches on either side of the fixation spot, which was similar to the configuration used in attention tasks of experiments 1 and 2. However, as the responses for the single or the dual stimulus configurations were similar, we later switched to one stimulus condition for all subsequent recordings.
Neuronal Recording and Data Analysis
We made transdural recordings in area V1, using a Crist grid (Crist Instruments, Baltimore, MD) to advance tungsten microelectrodes (1–2 MΩ at 1 kHz; FHC, Inc., ME, USA) via stainless-steel guide tubes. Neural signals were recorded using the Multichannel Acquisition Processor system (MAP; Plexon, Inc., Dallas, TX, USA). Single units were amplified, filtered, and viewed on an oscilloscope, and heard through a speaker. Spike waveforms were sorted using an off-line spike sorter program (Plexon, Inc., Dallas, TX, USA) and later analyzed with custom software written in Matlab (Mathworks, Natick, MA, USA). Responses to individual frames were assessed with the reverse correlation technique and were averaged over the entire duration of a single trial (Ringach et al. 1997). The probability distribution of stimulus-evoked response was calculated after subtracting responses during the stimulus blank condition for each stimulus. Only the responses until attention-spot offset in each stimulus condition were used to make a composite response histogram for all trial durations.
Reaction Time and Attention Modulation Index Calculations
The RTs (in ms) were calculated by taking the difference between time of attention-spot offset (plus an additional fixed time of 150 ms) and release of the response lever. In Experiment 1, for example, the earliest lever release in a correct trial could occur only 1050 ms after attention-spot onset; hence the RT distributions were computed starting at 1100 ms (see Fig. 2a). Change in RT (Δ RT) was analyzed as a function of trial duration. Influence of attention on neural responses was quantified by calculating an attention modulation index (AMI) by averaging responses of each neuron in 2 attention conditions for the entire trial duration, as: [Response (Attend-To − Attend-Away)/Response (Attend-To + Attend-Away)]. AMI varies between −1 and 1, where negative values signify suppression of responses and positive values indicate facilitation in the Attend-To condition when compared with the Attend-Away condition. AMI for individual neurons was averaged in 50 ms bins using a sliding window with 10 ms overlap to capture the dynamics of attentional modulation over the course of the trial. To test whether neural responses modulated by attention over the entire trial period exhibited a bimodal distribution (an early spatial attention modulation followed by late, temporal expectation), we used Hartigan's Dip test (Hartigan and Hartigan 1985). The Dip test measures the maximum difference between the empirical distribution function and a unimodal distribution to test for multimodality. The Dip test code in Matlab was adapted from the code originally written by F. Mechler (Mechler and Ringach 2002) and is available online.
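The AMI definition above can be illustrated with a minimal Python sketch. This is not the analysis code used in the study (which was written in Matlab); the function names, the 10 ms sampling step, and the reading of "10 ms overlap" as a 10 ms window step are assumptions made only for illustration. A 40 ms step (windows that share 10 ms) is the other plausible reading, and `step_ms` is left as a parameter for that reason.

```python
import numpy as np

def attention_modulation_index(attend_to, attend_away):
    """AMI = (To - Away) / (To + Away), elementwise; NaN where both are zero."""
    attend_to = np.asarray(attend_to, dtype=float)
    attend_away = np.asarray(attend_away, dtype=float)
    total = attend_to + attend_away
    diff = attend_to - attend_away
    # Avoid division by zero: leave NaN where the summed response is zero.
    return np.divide(diff, total, out=np.full_like(total, np.nan), where=total != 0)

def sliding_mean(values, dt_ms=10, window_ms=50, step_ms=10):
    """Average a 1-D time series in windows of window_ms, advanced by step_ms.

    dt_ms is the (assumed) spacing between consecutive samples.
    """
    values = np.asarray(values, dtype=float)
    width = window_ms // dt_ms
    step = step_ms // dt_ms
    starts = range(0, len(values) - width + 1, step)
    return np.array([np.nanmean(values[s:s + width]) for s in starts])

# Hypothetical responses of one neuron in the two conditions (arbitrary units).
ami = attention_modulation_index([3.0, 1.0, 0.0], [1.0, 3.0, 0.0])
smoothed = sliding_mean(np.arange(10.0))
```

With these made-up inputs, a neuron responding three times as strongly in the Attend-To condition yields an AMI of 0.5, and the reverse yields −0.5, matching the sign convention described in the text.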
Two monkeys (AB and AT) performed a sustained attention task in experiments 1 and 2: they covertly monitored an attention spot that appeared in 1 of 4 possible locations and waited till it disappeared before releasing a lever within 500 ms to obtain a reward. Maintaining stable attention to a small spot of low to medium saliency presented within a patch of rapidly changing gratings of different orientations made the task attentionally demanding. Consequently, their average stable performance after several months of training was moderate: 72% (±5.8 SD) for AB and 76% (±4.1 SD) for AT.
Behavioral Data: Performance Measures
The monkeys also made significantly fewer performance errors in longer compared with shorter duration trials, suggesting progressively greater allocation of attention in the longer trials (P < 0.05; Student's t-test, for all errors including break fixation, early release, delayed release, and no release). The monkeys' RTs improved and performance errors systematically decreased with the length of the trials. Supplementary Figure 1 shows the histogram of RT and error rate in short, medium, and long duration trials for all data from both monkeys. Since the attention spot remained unchanged for a minimum fixed duration before a change occurred, at highly variable times, we examined whether behavioral performance reflected awareness of this temporal dichotomy. We found that monkeys tended to make far fewer errors during the initial “fixed” time window when compared with later in the trial when the transition was made to the “variable” period of attention-spot extinction. The “early release errors” in each trial session were calculated as the proportion of total errors that occurred in the fixed and in the variable periods. Of 5673 total errors in 53 sessions, there were 109 and 461 early release errors in fixed and variable periods, respectively (arcsin 0.17 and 0.3; P < 0.0002 two-tailed t-test, session-by-session average). This is consistent with the monkeys' awareness of elapsed time and their internal expectation of when the attention spot would disappear after trial onset.
Reaction Times and Model
RT data were analyzed from all sessions in which monkeys completed at least 3 trial cycles (typically 4–5 cycles) of 120 stimulus conditions per trial (Methods), and the same principal single units could be held for the entire duration. The mean RT (53 recording sessions: 32 from monkey AB, 21 from monkey AT, n = 25 808 correct trials) showed strong inverse correlation with the trial duration (r = −0.93, P < 0.001, Spearman's test; Fig. 2a). This systematic decrease in RT with increasing trial duration indicates that monkeys were sensitive to elapsed time and implicitly tracked the likelihood of attention-spot extinction as it increased with each passing moment after a certain point in the trial. Consequently, their expectation of an impending response also increased with time, resulting in faster RTs. Similar expectation-related response changes have been shown in human psychophysics and nonhuman primate studies (cf. Coull and Nobre 1998; Nobre 2001; Oswal et al. 2007; Lima et al. 2011). Figure 2a also shows mean RT as a function of trial length in the Attend-Away condition; there was no significant difference in RTs between Attend-To and Attend-Away conditions (P = 0.78, Spearman's test). This is consistent with the monkeys' behavioral point of view, as the 2 conditions were identical and required similar attentional resources; additionally, they received the same amount of reward for correct responses irrespective of attention-spot location. The behavioral data show RTs monotonically decreasing with increasing trial duration, as shown by a linear regression (R2 = 0.945 and 0.958 for Attend-To and Attend-Away conditions, respectively). However, it is important to note that it would be unlikely for monkeys to continue to respond faster and faster with increasing trial duration. Indeed, a closer examination of the RT data shows a tendency for RTs to asymptote toward the end. 
Consequently, RT data could as well be fitted to a log function rather than linear regression, though R2 values in the current experiments were comparable for the 2 fits (0.93 and 0.95, respectively) (Tsunoda and Kakei 2008).
These behavioral results could be qualitatively described by a model of probabilistic inference under temporal uncertainty. Task performance under time-dependent processes can be conceptualized as the hazard rate h(t) (Luce 1986; Nobre and Shapiro 2006). Formally,
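As a rough numerical illustration of the standard (unmodified) hazard rate, h(t) = f(t) / [1 − F(t)], where f is the probability density of the event time and F its cumulative distribution, the Python sketch below uses the experiment 1 timing described in the text (offset equally likely between 900 and 2300 ms). The 100 ms bin width and the function name are assumptions for illustration only, and the sketch does not implement the uncertainty-modulated variant used for the model fits.

```python
import numpy as np

def discrete_hazard(event_prob):
    """Hazard per bin: P(event in bin k | no event before bin k)."""
    event_prob = np.asarray(event_prob, dtype=float)
    # Survival at the start of each bin = 1 - cumulative probability of earlier bins.
    survival = 1.0 - np.concatenate(([0.0], np.cumsum(event_prob)[:-1]))
    return np.divide(event_prob, survival,
                     out=np.zeros_like(event_prob), where=survival > 1e-12)

# Experiment 1 timing from the text: offset equally likely between 900 and 2300 ms.
bin_ms = 100  # assumed bin width, chosen only for readability
edges = np.arange(900, 2300 + bin_ms, bin_ms)
n_bins = len(edges) - 1
flat_prob = np.full(n_bins, 1.0 / n_bins)  # flat (uniform) offset probability

hazard = discrete_hazard(flat_prob)
for start, h in zip(edges[:-1], hazard):
    print(f"{start:4d}-{start + bin_ms:4d} ms: hazard = {h:.3f}")
```

Although the offset probability is flat across bins, the hazard rises monotonically toward the end of the variable period, which is the property the text relates to the progressively faster RTs.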
Neural Data: Analysis, Model, and Correlation with Behavioral Data
While monkeys engaged with the task, single-neuron activity was recorded in V1 (2 animals and 3 hemispheres). Our first priority was to ascertain if neural responses individually and at the population level were influenced by location of the attention spot, namely toward the RF (Attend-To) or away from it (Attend-Away). Stimulus-evoked responses at each time point were pooled in the 2 attention conditions and averaged for the entire trial period. Figure 2c (left ordinate) shows the time course of normalized population response of 98 well-isolated single units. The population responses in the Attend-To conditions were moderately higher in the beginning and then tapered off to become non-different from the Attend-Away condition before starting to diverge consistently, a trend that continued for the rest of the trial period. Since each stimulus was presented with equal probability in each trial cycle, the combined activity over time had attention as the only variable. The AMI was calculated for individual neurons and averaged to obtain population AMI (Fig. 2c, right ordinate; see Methods for details). The population averages showed a transient increase in responses in the Attend-To condition, peaking around 140 ms (±33 ms, mean ± SD), and leveling off by 500 ms (±80 ms). The dashed line indicates significance at P < 0.05, calculated by a bootstrap analysis where responses for individual units in the 2 attention conditions were shuffled and randomly picked 10 000 times to arrive at an unbiased estimate. As individual neurons showed considerable variability, we ran the same bootstrap analysis for the early period (first 600 ms after attention-spot onset) and plotted average AMI as well as peak AMI for individual units that showed either attentional facilitation or suppression (Supplementary Fig. 4; also see representative examples of individual neurons in Fig. 3a). 
After remaining close to the baseline for a few hundred milliseconds, the responses again diverged, beginning 810 ms (±40 ms) after attention-spot onset and continuing until the end of the trial. For each neuron, we calculated mean responses in the 2 attention conditions for the period starting 800 ms after attention-spot onset to the end of the trials. For the entire population, the average response during this period for the same neuron was significantly higher in the Attend-To compared with the Attend-Away condition [F2,97 = 76.51, df = 2, mean square error (MSE) = 0.002, P < 0.014, two-way ANOVAs]. Notably, this increase in response in the Attend-To condition and AMI preceded behavioral response at the earliest time point (or the shortest trial duration) by several tens of milliseconds (90 ± 40 ms). For further examination of change in AMI with trial length, we divided the trials into short (<1400 ms) and long (>1900 ms) duration trials (Fig. 2d). The average AMI was significantly higher in the longer compared with the shorter trials (AMIlong = 0.16 ± 0.03 SEM; AMIshort = 0.09 ± 0.02; F2,97 = 29.51, df = 2, MSE = 0.013, P < 0.05, two-way ANOVAs). When AMI was compared for the time windows where expectation-related changes were prominent, including 800–1400 ms (short trials; starting 100 ms before the earliest attention-spot extinction) and 800–2300 ms (long trials), the population average AMI for the long trials was again significantly higher than that in the short trials (AMIlong = 0.23 ± 0.02 SD; AMIshort = 0.12 ± 0.03 SD; F2,97 = 97.51, df = 2, MSE = 0.001, P < 0.01, two-way ANOVAs). We also compared the standard deviation (SD) of the average AMI for short and long duration trials, using the same number of trials in each group. There was a trend toward SD for short trials (1000–1400 ms) to be lower compared with long trials (1900–2300 ms). 
Taken together, the neural responses and attentional modulation during the same trial showed clear bimodality: an early, moderate increase followed by a sustained increase, beginning several tens of milliseconds before the earliest attention-spot extinction. The AMI change over time was significantly bimodal, with the dip occurring between 350 and 650 ms as confirmed by Hartigan's Dip test (Dip = 0.029; P = 0.002; see Methods for details), demonstrating significant variability in attention that seemed to follow the monkeys' subjective assessment of the trial structure.
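The shuffle-based significance threshold described above for the population AMI can be sketched in Python as follows. The exact resampling scheme is not specified in the text, so this is only one plausible reading: the Attend-To and Attend-Away labels are swapped at random for each unit, the population-mean AMI is recomputed, and the (1 − α) quantile of the resulting null distribution is taken as the threshold. The function name, the synthetic firing rates, and the 30% gain are hypothetical.

```python
import numpy as np

def shuffled_ami_threshold(attend_to, attend_away, n_shuffles=10_000, alpha=0.05, seed=0):
    """One-sided threshold for the population-mean AMI under label shuffling.

    Assumes every unit has a nonzero summed response (To + Away > 0).
    """
    rng = np.random.default_rng(seed)
    attend_to = np.asarray(attend_to, dtype=float)
    attend_away = np.asarray(attend_away, dtype=float)
    n_units = attend_to.shape[0]
    null = np.empty(n_shuffles)
    for k in range(n_shuffles):
        # Randomly swap the two condition labels independently for each unit.
        swap = rng.random(n_units) < 0.5
        a = np.where(swap, attend_away, attend_to)
        b = np.where(swap, attend_to, attend_away)
        null[k] = np.mean((a - b) / (a + b))
    return np.quantile(null, 1.0 - alpha)

# Synthetic example: 98 units with a hypothetical 30% Attend-To gain.
rng = np.random.default_rng(1)
away = rng.uniform(5.0, 20.0, size=98)
to = 1.3 * away
observed = np.mean((to - away) / (to + away))
threshold = shuffled_ami_threshold(to, away, n_shuffles=2000)
print(observed > threshold)
```

An observed population AMI above the threshold would be plotted above the dashed significance line in a figure such as the one referenced in the text.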
Extending our earlier conception of time-dependent, uncertainty modulated hazard rate (UMHR) to neural data, we modeled the response of cell i at time t as a function of the conditional probability that a target will appear in ri, the region in the vicinity of cell i's RF, at time t or a near-future time (up to t + δ). We assumed that the monkey treated spatial and temporal components of the target's appearance and disappearance as independent events conditioned on its experience. This, in fact, describes the true structure of the task: first, the target appears indicating “where” the attention needs to be directed; subsequently, the temporal dynamics of the task provides evidence about “when” the target will disappear. In practice, any single neuron's response will not be a function of these exact probabilities, but only estimates of these probabilities obtained by sampling stochastic evidence in the form of input from other neurons. Our model of neural activity thus takes the form of a linear combination of 2 components:
The first term, p(ri|x(t)), simply adds a Gaussian with an empirically set delay to the predicted response as soon as the monkey receives evidence about where the attention spot will appear. The rationale derives from the proposal that the initial deployment of attention toward a salient or standout region, if sustained, is quickly suppressed by internal inhibitory influences, thereby prompting an attentional shift to a new locus (e.g., Koch and Ullman 1985). This manifests in a time-dependent interplay of bottom-up facilitation generated by external saliency, and intracortical inhibition that limits this facilitation (Klein 2000; Itti and Koch 2001). Indeed, it has been shown that sustained attention over time can impair perceptual sensitivity to the attended stimulus (Ling and Carrasco 2006), possibly due to direct inhibitory effects that follow attentional gain over time, especially in early sensory processing (Lou 1999). Similar inhibitory processes may underlie the well-known psychophysical phenomena variously described as “inhibition of return” (Posner et al. 1982) or “attentional blink” (Raymond et al. 1992). We emphasize that our choice of temporally limited Gaussian envelope as a physiologically plausible descriptor of the initial deployment of attention and its decay over time is not critical for the model; one can as well use a simple step function of the same duration without affecting the overall results. The second term, h*(t), is the uncertainty modulated hazard rate (UMHR). We also posit that different circuits underlie computation of the spatial and temporal evidence terms, with different effective strengths or confidence values; consequently, the neural responses are represented by a 2-component model with differential weights ws and wt for the spatial and temporal terms, respectively. The model output and population AMI showed highly significant correlation (r = 0.92, P < 0.001, Spearman's test; Fig. 2e). 
The model simulations thus support the hypothesis that within the course of the same trial, the modulation of neuronal responses carries different weights depending on the task—that is, an initial modulation weighted more by a spatial component and a later component emphasizing time-dependent change in attention, relating closely to the expectation of an attention-spot extinction (hazard rate) and contingent behavioral response.
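As a rough illustration of how such a 2-component model behaves, the sketch below sums a delayed Gaussian (the spatial term) and a hazard term (the temporal term) with separate weights. It is not the fitted model: a plain discrete hazard rate stands in for the UMHR h*(t), and the function names, 50-ms bins, 1900-ms upper limit, uniform disappearance times, Gaussian delay and width, and weight values are all assumptions chosen for illustration.

```python
import math


def hazard_rate(pdf):
    """Discrete hazard: P(event in bin k | no event before bin k)."""
    total = sum(pdf)
    hazard, elapsed = [], 0.0
    for p in pdf:
        p = p / total                      # normalize to a probability
        survival = 1.0 - elapsed           # probability the event has not yet occurred
        hazard.append(p / survival if survival > 1e-12 else 0.0)
        elapsed += p
    return hazard


def two_component_response(times, pdf, w_s, w_t, delay, width):
    """Weighted sum of an early Gaussian (spatial) term and a hazard (temporal) term."""
    out = []
    for t, h_t in zip(times, hazard_rate(pdf)):
        spatial = math.exp(-0.5 * ((t - delay) / width) ** 2)
        out.append(w_s * spatial + w_t * h_t)
    return out


# Hypothetical timeline: 50-ms bins; the spot never disappears before 900 ms,
# then disappears with equal probability in each later bin (assumed, not from the paper).
times = list(range(0, 1900, 50))
pdf = [0.0 if t < 900 else 1.0 for t in times]
response = two_component_response(times, pdf, w_s=0.3, w_t=1.0, delay=200.0, width=75.0)
```

With these made-up settings, the output shows a brief early bump near 200 ms and a late rise that begins at 900 ms and grows toward the end of the trial, qualitatively the bimodal profile described above.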
Next, we examined whether responses of individual neurons also reflected the spatial and temporal variations inherent in the task. Previous studies have shown that V1 neurons exhibit considerable response variability depending on their laminar location and subtype specificity, indicating their diverse roles in processing bottom-up and top-down influences (Gilbert et al. 1996; McAlonan et al. 2008). The time courses of responses for the 2 attention conditions and of attention modulation for 3 sample neurons are shown in Figure 3a–c and represent the diversity of neural populations in our data. Focusing first on the early period, soon after attention-spot onset, Neuron 1 (top left) exhibited virtually no difference between the Attend-To and Attend-Away conditions and no attentional modulation. Neuron 2 (top middle) showed response facilitation in the Attend-To condition and significant attentional modulation, whereas Neuron 3 (top right) displayed response suppression in the Attend-To condition, demonstrating attentional suppression. Interestingly, the situation was much different later in the trial, when, regardless of their early preferences, a significantly greater proportion of neurons exhibited prominent attentional facilitation (Fig. 3a–c, right ordinate). This was further confirmed by analyzing each neuron's AMI in early and late phases for the entire population (Fig. 3d). In the early phase, there was no significant difference, with 22% (23/98) showing facilitation and 15% (15/98) suppression in the Attend-To condition. However, in the later phase, a significantly higher proportion of neurons (47/98, or 48%) were facilitated, with 40/98 showing a significant increase in AMI >0.15 (P < 0.01, Student's t-test), whereas only 5/98 exhibited suppression in response during the same period. Additionally, there was no correlation between AMI in early and late phases of the same trial (Fig. 3e).
This suggests that distinct sources of top-down attention may come into play during early and late periods of a trial. Next, we examined whether change in the AMI and RT was correlated with the task timing. The AMI from individual neurons and RT data from the same trial sets were averaged in segments of 50 ms, starting 800 ms after the attention-spot onset. There was a significant positive correlation between AMI and trial duration and a significant negative correlation between mean RT and trial duration (r = 0.9 and −0.93, respectively; P < 0.001, Spearman's test, Fig. 3f); on direct comparison, AMI and RT (Fig. 3g) showed a significant negative correlation (r = −0.7; P < 0.001, Spearman's test). These results demonstrate that trial length significantly influences changes in AMI and covaries with behavioral responses.
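The correlation analysis just described can be sketched as follows, assuming AMI and RT have already been averaged into 50-ms bins. The bin values here are invented placeholders, not data from the experiments, and the helper names are hypothetical; significance testing is omitted.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder bin averages (made up for illustration only).
bin_start_ms = [800, 850, 900, 950, 1000, 1050]
mean_ami = [0.02, 0.05, 0.04, 0.09, 0.12, 0.15]
mean_rt_ms = [310, 300, 305, 280, 270, 262]

rho_ami_time = spearman(bin_start_ms, mean_ami)   # positive if AMI grows with trial duration
rho_rt_time = spearman(bin_start_ms, mean_rt_ms)  # negative if RT shortens with trial duration
rho_ami_rt = spearman(mean_ami, mean_rt_ms)       # direct AMI versus RT comparison
```

Rank correlation is used here because it only assumes a monotonic relationship between the binned quantities, which matches the use of Spearman's test in the text.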
From experiment 1, we were able to confirm that even in a sustained attention task, monkeys' attention varied with time and followed their internal assessment of task timing. There was a bimodal modulation in attention: An early temporally limited change was followed by a sustained increase that scaled with trial duration. We wondered if such bimodal changes in AMI would be replicable in trials with completely different time schedules. In experiment 2, the task was essentially the same as in experiment 1, with the following differences: The total trial duration was shorter, the initial “fixed” time window was longer, and the “variable” time window, when the attention spot disappeared, was shorter (see Methods for rationale and details). Figure 4a shows the population response of 32 single neurons (left ordinate), and the change in attention modulation (right ordinate), with the abscissa showing the time of the trials. The dashed line indicates the bootstrap averaged attention modulation at the P < 0.05 significance level. Qualitatively, the responses in the 2 attention conditions as well as AMI followed similar dynamics as in experiment 1: An initial, moderate, time-limited increase that tapered off was followed by a late, consistent increase in responses in the Attend-To condition relative to the Attend-Away condition. It is noteworthy that the late increase in AMI occurred later in experiment 2, around 1180 (±35) ms after attention-spot onset, compared with 810 (±50) ms in experiment 1. This demonstrates that the late increase did follow task timing and monkeys allocated attentional resources in accordance with their expectation of a change in relation to target onset as well as its offset. In addition, the slope of AMI was significantly different from that in experiment 1.
Thus, late attentional modulation in V1 responses reflects not only the monkeys' estimate of “when” the expected change was to take place but also the “duration” within which it was most likely to occur.
Finally, to examine whether the pattern and time course of response modulation were indeed an indication of the monkeys' internal state contingent on their behavioral response, we ran a third experiment in which the task required no attentional control or behavioral response. Thus, in experiment 3, the monkeys simply had to hold fixation at a central spot while gratings were presented in the RF for a fixed duration, at the end of which they received a liquid reward. Normalized population responses from roughly the same sites as in the previous experiments showed a robust, stimulus-evoked increase that gradually declined as the trial progressed. Figure 4b shows the average population response from 33 single units in experiment 3 (black trace). For comparison, also shown are the average responses in the 2 attention conditions from experiment 1 (blue), and experiment 2 (red). In experiment 3, unlike experiments 1 and 2, we observed no late, time-dependent change of V1 responses. This indicates that, in the absence of attention and contingent behavioral responses, V1 neurons do not exhibit response modulations due to attentional engagement and temporal expectation-related changes (Fig. 4b). Response bimodality was completely absent without spatially localized attention and task-dependent behavioral requirements.
The proposal that late attentional modulation may derive from temporal demands of the task is supported by the significantly different rates of change of attention modulation in experiments 1 and 2 (Fig. 4c; see legend for details). The task-dependent change in AMI and the model simulation (normalized for comparison, Fig. 4c) show the time courses of attention modulation in experiments 1 and 2, including an early modest increase followed by a monotonic increase following task-dependent latency. The UMHR model for RT data and AMI neural data also showed significant correlation (r = 0.92 and 0.96 for experiments 1 and 2, respectively; P < 0.001, Spearman's test).
Most previous studies on the cortical locus of attention have focused on attention that enhances perceptual sensitivity to a location or feature of a relevant stimulus. Robust visuo-spatial attentional modulation has been described throughout the early visual pathway by manipulating distracter load, increasing (or decreasing) saliency, or spotlighting stimulus features such as color, orientation etc. (Spitzer et al. 1988; Gilbert et al. 1996; Connor et al. 1997; Vidyasagar 1999; Hayden and Gallant 2005; Chen et al. 2008; McAlonan et al. 2008). A number of underlying mechanisms have been proposed: Enhanced saliency by contextual-collinearity; biased-competition that prioritizes task-relevant information; scaling of neural responses or response normalization within the locus of attention; and selective increase of contrast of the attended location or object (Desimone and Duncan 1995; Gilbert et al. 1996; Maunsell and Treue 2006; Reynolds and Heeger 2009). Others have proposed intra- and interareal synchrony in specific frequency bands of local field potentials (Schroeder et al. 2001; Coe et al. 2002; Corbetta and Shulman 2002), or neuromodulators such as acetylcholine (Herrero et al. 2008; Goard and Dan 2009) as the principal drivers of spatially specific attention. In contrast, surprisingly few studies have examined temporal aspects of attention. Studies on human subjects attending selectively to different points in time have shown enhancement in behavioral performance (Carrasco and McElree 2001; Tse et al. 2004), an effect that seems to be principally mediated by fronto-parietal systems (Corbetta and Shulman 2002). Other studies have explored anticipatory changes in oculomotor preparation in parietal cortex, frontal eye fields, and superior colliculus (Dorris et al. 1997; Janssen and Shadlen 2005; Zhou and Thompson 2009), cerebral blood flow changes in preparation for trial onset (Sirotin and Das 2009), and responses of lateral intraparietal cortex (LIP) neurons signaling elapsed time relative to a remembered duration (Leon and Shadlen 2003). Thus, while the existing literature provides clues to temporal signals in the brain, few previous studies have examined the representation of temporal attention by cortical neurons.
Our results demonstrate that a substantial proportion of V1 neurons are significantly modulated by attention. While the primary task involved sustained spatial attention alone, attentional responses clearly do change over time. Our results suggest that this time-dependent change does not follow a monotonic increase or decrease, but seems to be guided by the spatial and temporal conditions of the task. A more basic question involves whether modulation in neural responses could be due to “low”-level stimulus effects such as pop-out or figure-ground interactions. Let us recapitulate some noteworthy aspects of our experimental design: in experiments 1 and 2, the attention spot was located close to but always outside the RF of a neuron. In addition, the stimulus gratings provided no cues and were irrelevant to the monkeys' task performance. This effectively dissociated the monkeys' behavioral task from the neural “task,” the stimuli used for eliciting neural responses. Placing the attention spot outside the RF allowed us to record neural modulation as a result of spatially directed attention, unadulterated by characteristics of the attended spot. Furthermore, responses during stimulus “blank” trials, when only the attention spot was presented, were subtracted from stimulus-driven responses, so that the residual responses were solely stimulus evoked. For all these reasons, the response difference in the 2 attention conditions would most likely be due to the attentional state of the animal. Finally, a sustained, time-dependent increase in responses, late in the trial, as seen in the present data, is highly unlikely to be due to low-level sensory influences.
Strikingly, similar attentional modulation early in the trials in experiments 1 and 2, despite having distinctly different “fixed” periods during which the attention spot appeared and stayed (900 and 1250 ms, respectively), suggests that attentional resources get continuously titrated according to the task demands. During this early phase, the monkeys simply had to locate the relevant spot; subsequently, the high attentional requirement soon dissipated, though monkeys had to continue to monitor the spot. There is some evidence that attention, sustained over time, interferes with perceptual processing in the early sensory pathway (Lou 1999; Sherman and Guillery 2002; Ling and Carrasco 2006). In addition, top-down attention is resource limited (Lavie and Tsal 1994). This implies that unless and until the demand is high, intracortical circuits actively inhibit the use of the precious resource and deploy it elsewhere for more emergent needs, as is probably the case for inhibition of return and attentional blink conditions (Posner et al. 1982; Raymond et al. 1992). Our finding of an early, temporally limited modulation in attention is novel and departs from other similar studies. Studies done in LIP and V4 (Gallistel and Gibbon 2000; Janssen and Shadlen 2005) have shown no early modulation of neural response, suggesting such interactions in attentional processing may be characteristic of V1, which includes both intracortical and cortico-thalamic circuits (Sherman and Guillery 2002). These interactions, along with specific inhibitory mechanisms, may control both the magnitude and duration of attention deployment depending on behavioral contingencies. Recent evidence indicates these inhibitory processes to be quite dynamic; they may increase attentional facilitation via disinhibition in response to increasing resource demands, as in the case of uncertainty (Yu and Dayan 2005) and task difficulty (Castel et al. 2005), or conversely, they may inhibit and/or redeploy resources during low demand, as seen here, and in tasks requiring sustained attention (Ling and Carrasco 2006). Inhibitory mechanisms in V1, particularly those involving calretinin-positive neurons, are attractive substrates for implementation of such dynamic gating of top-down excitatory influences (Meskenaite 1997; Xu and Callaway 2009).
Our results also provide predictable and at least partially separable signatures of spatial and temporal attention. Several lines of evidence support this contention. First, as discussed above, at the beginning of a trial, attentional elevation is transient and is limited to locating the relevant spot. Hence, it seems fully devoted to space while remaining agnostic to time. Later in the trial, sustained elevation that scales with trial duration indicates transition from space only to time as well, because the expectation of a change also requires attention to time. Secondly, an increase in neuronal response in the Attend-To condition begins just before the earliest attention-spot disappearance and is temporally shifted in experiments 1 and 2, from 900 to 1250 ms (fixed delay). This demonstrates task-dependent internal state changes. Thirdly, the rate of change of late attentional modulation is significantly different in experiments 1 and 2 (Fig. 4c), signifying exquisite sensitivity of neuronal modulation not only to the time when the attention spot was expected to disappear, but also to the duration within which the expected change would occur. Fourthly, the magnitudes of early and late attention modulation are significantly different and have no correlation (Fig. 3d,e). These findings, combined with a highly significant inverse correlation between monkeys' RT and trial duration, provide evidence for an overriding influence of attention to space in the early phase and to time in the later phase of the trials, both in guiding monkeys' behavior and in their V1 responses. This argument is further supported by the results from experiment 3, where no such temporally guided late response modulation is discerned, confirming that dynamic changes in V1 responses in “attentionally controlled” experiments are indeed dependent on internal states that override “normal” neuronal responses and that change in a task-dependent fashion during the course of the same trial.
The exquisite sensitivity of monkeys to trial duration is also evident in their surprisingly faster RTs during occasional repeats of trials of similar duration (see Supplementary Material for full description). Their behavioral data from experiment 1 show a significant improvement in RT, especially in the short duration trials (Supplementary Fig. S5). The monkeys seem to be sensitive to such randomly occurring repeat sequences, indicating their conscious awareness of such epochs, which they use to optimize their behavioral performance.
Prior to our work, 2 studies in awake, behaving monkeys, one in V4 (Ghose and Maunsell 2002) and the other in LIP (Janssen and Shadlen 2005), have shown similar task-dependent changes that correlate with the hazard rate of attention change. Both of these studies changed the shape of the hazard function to show that attention-guided saccadic responses covary with task timing and monkeys' anticipation of stimulus change probability. However, the V4 study (Ghose and Maunsell 2002) did not test whether the responses correlated with the behavioral report of the monkeys. The LIP study (Janssen and Shadlen 2005), on the other hand, did not explicitly use attention as a probe and a modulator of anticipatory change in their primary task. In the present study, we have not only focused on the earliest possible cortical area, V1, but have also varied the hazard function by manipulating the trial duration to shape behavior. Also, instead of using a saccade as an indicator of performance, which may confound results from an area where neurons can be sensitive to saccade preparation, our task required monkeys to respond by grabbing or releasing a lever, and to do so only after a fixed delay of 150 ms following attention-spot extinction, with fixation maintained until the response, effectively minimizing reflexive and oculomotor influences (Dorris et al. 1997; Moore et al. 1998; Coe et al. 2002; Corbetta and Shulman 2002). As reward contingency and its timing have been shown to affect even V1 responses (Shuler and Bear 2006), we did not manipulate the reward to encourage faster RTs, and the animal received a fixed amount of reward for each correct response regardless of whether the attention spot appeared towards the RF or away from it. However, neural responses related to reward expectation are difficult to disambiguate from attention (Maunsell 2004), because most experimental designs requiring a behavioral report from primates (and other mammals) use reward as the primary motivator.
Similar confounds exist in interpretation of other cognitive signals where reward timing is contingent on behavioral response and does not occur at a fixed time point during the trial. To the best of our knowledge, no clear demonstration of reward-related facilitation of single-neuron responses in V1 has been made, especially in a cognitive task where response modulation is directly related to whether the animal attended toward the RF or away from it. Thus, we believe our experiments capture internal signals principally related to the attentional state of the animal combined with expectation of a behavioral response.
The neural responses we describe provide insights into the dynamics and mechanisms of spatio-temporal attention. During the initial phase of the trials and responses, attentional resources are required to locate the attention spot (spatial uncertainty related to “where” the spot would appear). Here, no attention to time is necessary, because animals learn that there is no behavioral response to be made for a certain period of time after the trial onset. Conversely, in the later period, the attentional modulation seems to be wholly of temporal origin (temporal uncertainty related to “when” the spot would disappear). Here, no further attention to spatial processing, aside from awareness of the spot, appears necessary, resulting in considerably reduced attentional load (Chelazzi et al. 1998). This phase of attentional processing directly links with behavioral response, in this case releasing the lever whenever the attention spot is extinguished. The single-neuron responses (Fig. 3a–c) capture these 2 distinct phases, revealing an early spatial phase that varies between neurons and a later temporal phase, which is invariant across neurons. It is tempting to relate the attention-related responses to space and time to their respective uncertainties (Yoshor et al. 2007) arising from task structure. Dissipation in attention with increasing predictability has been proposed as a means to achieve cognitive economy (Pearce and Hall 1980; Yu and Dayan 2005). Conversely, the greater the uncertainty, the greater the attentional load. Initial uncertainty (“where” the attention spot will appear) followed by predictability (no change will occur for a period of time) might explain the early enhancement and dissipation of spatial attention signals in our task. Later in the trial, temporal uncertainty or expectation (“when” the spot will disappear) increases with each passing moment and requires heightened attention to time for successful completion of a trial and subsequent reward. 
Our finding of sequential spatial attention and temporal expectation signals in V1 indicates that V1 may act both as a spatially selective gate for bottom-up inputs and an integrator of top-down influences, likely through feedback connections impinging on local and long-range intracortical and interareal networks (Gilbert et al. 1996; Das and Gilbert 1999; Lamme and Roelfsema 2000; Schroeder et al. 2001; Angelucci and Bressloff 2006; Hung et al. 2007; Womelsdorf and Fries 2007; Sommer and Wurtz 2008). Regardless of mechanism, robust modulation of neurons in early visual cortex, even when no overt visual cues are provided and in the absence of any eye movements, unequivocally demonstrates the widespread representation of behaviorally relevant time in the cortex.
J.S. and M.S. conceived the experiment and wrote the paper. J.S., H.S., and J.Sch trained monkeys and performed recordings. Y.K., J.B.T., J.S., and H.S. carried out the modeling. J.S., H.S., and Y.K. analyzed the data.
Supported by grants from the US National Institutes of Health and the Simons Foundation to M.S.
The authors thank Dr R.P. Marini for excellent veterinary care and editorial advice; Travis Emery for technical help; Anubhav Jain for help with some figures; Samvaran Sharma and Sunanda Sharma for text editing; and members of the Sur lab, especially Michael Goard and Nathan Wilson, for discussions and comments. Conflict of Interest: The authors declare no competing financial interest.