Selective visual attention directed to a location (even in the absence of a stimulus) increases activity in the corresponding regions of visual cortex and enhances the speed and accuracy of target perception. We further explored top-down influences on perceptual representations by manipulating observers’ expectations about the category of an upcoming target. Observers viewed a display in which an object (either a face or a house) gradually emerged from a state of phase-scrambled noise; a cue established expectation about the object category. Observers were faster to categorize faces (gender discrimination) or houses (structural discrimination) when the category of the partially scrambled object matched their expectation. Functional magnetic resonance imaging revealed that this expectation was associated with anticipatory increases in category-specific visual cortical activity, even in the absence of object- or category-specific visual information. Expecting a face evoked increased activity in face-selective cortical regions in the fusiform gyrus and superior temporal sulcus. Conversely, expecting a house increased activity in parahippocampal gyrus. These results suggest that visual anticipation facilitates subsequent perception by recruiting, in advance, the same cortical mechanisms as those involved in perception.
The perception and interpretation of the visual world are always colored by previous experience and by current expectations. Helmholtz (1867/1910) argued that observers routinely generate hypotheses about what they expect to see, based on prior experience and current goals. These expectations can be rich with spatial, featural, and object-specific content. Many studies have demonstrated that perception is facilitated when a target appears at an expected location (Posner et al. 1980; Prinzmetal et al. 2005). Similarly, expectation of visual features such as shape, color, and motion facilitates recognition and discrimination (e.g., Ball and Sekuler 1981; Corbetta et al. 1990; Leonard and Egeth 2008).
Higher-level visual priming of complex stimuli such as faces and houses has recently been shown to facilitate object recognition as well. For example, Puri and Wojciulik (2008) had participants discriminate between normal and distorted images of faces and houses. When told in advance to expect a particular famous face, participants were faster and more accurate when the target matched their expectation. In many of these examples, correct expectations not only improved performance but incorrect expectations also impaired it. In addition to knowing “where” and “what” to expect, we can also generate hypotheses about “when” a stimulus will appear. This kind of temporal cueing or expectation also facilitates visual perception (e.g., Correa et al. 2005; for review, see Nobre et al. 2007).
The neural mechanisms of spatial and feature-based visual expectation have been studied extensively. For example, directing attention to a location in anticipation of a target leads to increased neural activity in spatially specific regions of visual cortex (Luck et al. 1997; Kastner et al. 1999; Hopfinger et al. 2000). Furthermore, this effect is modulated by task difficulty: anticipating a more difficult discrimination leads to a greater baseline shift in cortical activity, even in the absence of visual stimulation (Serences et al. 2004). Changes in baseline activity have also been observed in the feature domain: anticipating the onset of motion evokes activity in motion-selective visual cortex, area MT/MST (Chawla et al. 1999; Shulman et al. 1999), and, similarly, activity in color-selective V4 increases when a color task is anticipated, even in the absence of color. Together, these studies suggest that one mechanism of perceptual expectation is anticipatory neural activity in brain regions that represent stimulus-specific information. These anticipatory effects in visual cortex have been shown to be associated with, and presumably controlled by, regions of parietal and frontal cortex, such as the intraparietal sulcus (IPS), frontal eye fields, and lateral prefrontal cortex (PFC) (e.g., Kastner et al. 1999; Hopfinger et al. 2000). However, it has not been clearly demonstrated that anticipatory effects extend to more abstract categorical visual representations.
Object recognition in the real world is often ambiguous and unfolds continuously over time. One way to study this aspect of recognition is with ambiguous images that are gradually disambiguated over several seconds. Bruner and Potter (1964) gradually reduced the blur of images and found that the blurrier an image was at the start of the sequence, the longer participants took to recognize the object. They argued that longer exposure to blurry objects led to “false” and premature hypotheses or expectations that impaired recognition.
This method has been usefully applied in functional magnetic resonance imaging (fMRI) studies of how expectation affects object recognition. For example, Eger et al. (2007) found that word cues sped the recognition of objects that were slowly coming into focus, but this speeding was not accompanied by anticipatory changes in visual cortex when speed of recognition was controlled. However, in that study, the primes were verbal rather than visual. Expecting a “rabbit” to appear does not necessarily yield a useful visual template for object recognition, because a nearly infinite variety of specific visual images could correspond to “rabbit.”
In the current study, we used a gradual disambiguation procedure in which “categorical” visual expectation (i.e., faces vs. houses) could improve performance in an object discrimination task (Sadr and Sinha 2004; Eger et al. 2007). Target faces required gender discrimination (male or female), and houses required a structural discrimination (1-story or 2-story house). To encourage a visual anticipation strategy, all faces and houses were similarly cropped and presented at the same viewing orientation and size, giving participants a rough “template” for each category. An initial behavioral study provided evidence that expectation facilitated perceptual discriminations. We then used fMRI to examine whether expecting a face or a house leads to baseline shifts of neural activity in object-selective cortex in the absence of object-specific sensory information. Specifically, we examined the time course of the effects of expectation and how it may be related to temporal expectation. To preview our results, we report that object category expectation facilitates object recognition and is associated with anticipatory modulations in temporal lobe category-selective regions; these modulations are absent in occipital lobe object-selective regions. Further, the timing of the anticipatory effect reflects the time at which the observer expects the stimulus to become visible.
Materials and Methods
Sixteen healthy undergraduate students at Johns Hopkins University participated in the behavioral study. A new group of 9 graduate and undergraduate students participated in the fMRI experiment. All fMRI participants were right handed. All participants provided informed consent as approved by the Johns Hopkins University Institutional Review Board.
Face and house photographs were used as stimuli. These images were taken from the CalTech database and the AR-face database (Martinez and Benavente 1998), as well as from in-house photography. Each experiment used 12 faces (6 male and 6 female) and 12 houses (6 one-story and 6 two-story). All images were converted to grayscale and cropped and resized to 200 × 200 pixels. The images subtended 6.0 × 6.0 degrees of visual angle at a viewing distance of 68 cm.
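The reported visual angle and viewing distance fix the on-screen size of the images. The following minimal Python sketch shows that geometry. It is our own illustration: neither the physical size in centimeters nor the pixels-per-degree value is reported in the text, and the latter assumes the 200-pixel image spans the full 6°.

```python
import math


def stimulus_size_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends angle_deg of visual angle at distance_cm."""
    # For a stimulus of size s at distance d, the full visual angle theta obeys
    # s = 2 * d * tan(theta / 2).
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def pixels_per_degree(size_px: int, angle_deg: float) -> float:
    """Average pixels per degree, assuming the image spans angle_deg evenly."""
    return size_px / angle_deg


size_cm = stimulus_size_cm(6.0, 68.0)   # roughly 7.1 cm on screen
ppd = pixels_per_degree(200, 6.0)       # roughly 33 pixels per degree
print(f"{size_cm:.2f} cm, {ppd:.1f} px/deg")
```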
The dynamic face and house stimuli were created by first decomposing each image with a Fourier transform into its spatial frequency magnitude and phase components. The spatial phase of a given image was randomized; this was the starting point of the visual stimulus on a given trial. The phase was then gradually restored over time using the RISE algorithm, which linearly interpolates from the random phase map to the original image's phase map (Sadr and Sinha 2004). The result is an image that starts as unidentifiable noise and gradually coheres into a recognizable picture (see Fig. 1a).
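The phase interpolation described above can be sketched in Python with NumPy. This is a hypothetical illustration, not Sadr and Sinha's RISE implementation. The function name `phase_coherence_frame`, the random stand-in image, and the choice to interpolate raw phase angles without handling wrap-around at ±π are all our assumptions.

```python
import numpy as np


def phase_coherence_frame(image, random_phase, coherence, magnitude=None):
    """One frame of a phase-coherence sequence (illustrative sketch).

    image        : 2-D grayscale array (the target picture)
    random_phase : 2-D array of phase angles in radians, same shape as image
    coherence    : 0.0 (pure phase noise) to 1.0 (original phase)
    magnitude    : optional amplitude spectrum; defaults to the image's own
    """
    spectrum = np.fft.fft2(image)
    target_phase = np.angle(spectrum)
    if magnitude is None:
        magnitude = np.abs(spectrum)
    # Linear interpolation from the random phase map to the image's phase map.
    # (This naive version ignores wrap-around at +/- pi.)
    phase = (1.0 - coherence) * random_phase + coherence * target_phase
    frame = np.fft.ifft2(magnitude * np.exp(1j * phase))
    # Interpolated phases are not exactly Hermitian-symmetric at every
    # frequency, so keep the real part and discard the small imaginary residue.
    return np.real(frame)


rng = np.random.default_rng(0)
img = rng.random((200, 200))  # stand-in for a face or house photograph
# Taking the phase of a real white-noise image keeps the random phase map
# (anti)symmetric, so intermediate frames stay close to real-valued.
noise_phase = np.angle(np.fft.fft2(rng.random(img.shape)))
# 0%, 1%, ..., 74% coherence, matching the range used in the experiments
frames = [phase_coherence_frame(img, noise_phase, pct / 100) for pct in range(75)]
```

At full coherence, the sketch returns the original image. At intermediate values, the phase map is a weighted mix of noise and image phase, while the amplitude spectrum is unchanged.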
The identifiability of a visual image depends much more on the content of its phase map than on the relative magnitudes of its spatial frequencies (Sadr and Sinha 2004). To equate images for spatial frequency, we used the mean spatial frequency magnitude spectrum of all the original images as the magnitude spectrum for every stimulus in these experiments (Sadr and Sinha 2004). The spatial frequency content of all images was therefore identical and could not serve as a cue to object category.
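Equating amplitude spectra while preserving each image's phase map can be sketched as follows. This is an illustrative NumPy sketch: `equalize_amplitude_spectra` is a hypothetical helper, and random arrays stand in for the 24 photographs.

```python
import numpy as np


def equalize_amplitude_spectra(images):
    """Give every image the mean amplitude spectrum of the whole set (sketch).

    Each output keeps its own phase map, which carries most of the
    information needed for identification, while all outputs share one
    spatial-frequency amplitude spectrum.
    """
    spectra = [np.fft.fft2(np.asarray(im, dtype=float)) for im in images]
    mean_amplitude = np.mean([np.abs(s) for s in spectra], axis=0)
    equalized = [
        np.real(np.fft.ifft2(mean_amplitude * np.exp(1j * np.angle(s))))
        for s in spectra
    ]
    return equalized, mean_amplitude


rng = np.random.default_rng(1)
# Hypothetical stand-ins for the 24 original photographs (12 faces, 12 houses)
originals = [rng.random((200, 200)) for _ in range(24)]
equalized, mean_amp = equalize_amplitude_spectra(originals)
```

After equalization, every image has the same amplitude spectrum (`mean_amp`), so only phase differs across stimuli.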
On each trial, an image of a house or a face was presented, starting from a random phase that gradually cohered as described above at a rate of 1% per 75 ms. The sequence ended at 74% coherence because, at this point, all objects were clearly discriminable. This “image coherence interval” thus lasted 5550 ms (74 × 75 ms). It was preceded by a “noise interval” whose duration varied randomly from 0 to 6 s, so the onset of the image coherence interval, and thus the earliest sensory information about the target image, was unpredictable. The noise interval was a dynamic display that “cohered” from one random phase map to another random phase map (rather than from a random phase map to the image phase map).
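The 5550-ms duration follows directly from the coherence rate. The following small sketch assumes that coherence advances in discrete 1% steps. The text gives only the rate, so the stepping is our assumption, and `coherence_schedule` is a hypothetical name.

```python
def coherence_schedule(step_ms: int = 75, max_pct: int = 74):
    """(onset_ms, coherence_pct) pairs for the image coherence interval.

    Assumes coherence advances in discrete 1% steps every step_ms
    milliseconds, starting at 0% when the interval begins.
    """
    return [(pct * step_ms, pct) for pct in range(max_pct + 1)]


behavioral = coherence_schedule()      # 1% per 75 ms
interval_ms = behavioral[-1][0]        # time at which 74% is reached
print(interval_ms)                     # 5550
```

Called with `step_ms=100`, the same helper gives the 7400-ms interval of the fMRI version of the task.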
Participants made either a gender discrimination (male/female) or a house structure discrimination (1 story/2 story); they were instructed to respond as quickly as possible while minimizing errors. Using the right hand, participants pressed button 1 for “male” or “1 story” and button 2 for “female” or “2 story” (Fig. 1a). Category cues consisted of red or blue horizontal lines presented at the top and bottom of the image throughout each trial. For half the subjects, red indicated with 100% validity that the image would cohere to a face and blue that it would cohere to a house; this mapping was reversed for the remaining subjects. Yellow lines indicated that a face and a house were equally likely. A point system was used to encourage speed and accuracy and to provide feedback after each testing block (8 blocks of 18 trials each).
fMRI Paradigm: Face and House Categorization
In the magnetic resonance imaging (MRI) scanner, participants performed a version of the behavioral task with a few modifications. First, because the behavioral task clearly established an effect of expectation (vs. a neutral condition), we chose to omit neutral cues in this version to maximize the amount of blood oxygen level–dependent (BOLD) fMRI data collected during states of deliberate expectation. Thus, in this experiment, we have no behavioral index of the effect of expectation.
To optimize the design for fMRI, the rate of coherence (again to a maximum of 74%) was slowed to 1% per 100 ms; the image coherence interval therefore lasted 7400 ms. In addition, the duration of the noise interval was 0 s (17% of the trials), 4 s (17%), or 8 s (67%). This modification achieved several goals. First, the variable noise period made the onset of the target image temporally unpredictable, and the shortest interval meant that the image could cohere early in the trial, encouraging participants to anticipate the image actively for an extended period. Second, because the BOLD response is sluggish, the longest noise interval made it possible to observe the effects of expectation in the absence of category- or object-specific stimulus information; these trials therefore made up the majority. This design introduced a hazard function such that, on average, each image began to cohere after 6 s of noise and became discriminable approximately 5–6 s later (11–12 s into the trial). The earliest a participant could respond accurately to an image was 5–6 s into a trial (at around 50–60% coherence following a 0-s noise interval). Each participant performed eight 6-min runs of 18 trials each. Intertrial intervals of 4.5, 6.5, and 8.5 s occurred randomly and equally often. Successive trial onsets were thus separated by 12–24 s in this slow event-related design.
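The timing figures quoted above can be checked with a few lines of Python. This sketch assumes that each trial consists only of the noise interval, the image coherence interval, and the intertrial interval. The variable names are ours.

```python
from fractions import Fraction

# Values taken from the fMRI design described above
NOISE_PROBS = {0: Fraction(1, 6), 4: Fraction(1, 6), 8: Fraction(2, 3)}  # s: probability
ITIS_S = (4.5, 6.5, 8.5)        # intertrial intervals, equally likely
COHERENCE_S = 74 * 100 / 1000   # 1% per 100 ms up to 74% -> 7.4 s
TRIALS_PER_RUN = 18

# Expected noise duration before the image starts to cohere
mean_noise_s = float(sum(d * p for d, p in NOISE_PROBS.items()))

# Trial onset asynchrony = noise interval + image coherence interval + ITI
# (assumes nothing else, such as a separate response window, is added)
min_soa_s = min(NOISE_PROBS) + COHERENCE_S + min(ITIS_S)
max_soa_s = max(NOISE_PROBS) + COHERENCE_S + max(ITIS_S)
mean_soa_s = mean_noise_s + COHERENCE_S + sum(ITIS_S) / len(ITIS_S)
run_min = TRIALS_PER_RUN * mean_soa_s / 60

# Trials per run implied by the probabilities
trial_counts = {d: int(p * TRIALS_PER_RUN) for d, p in NOISE_PROBS.items()}
```

Under that assumption, the mean noise duration is 6 s, trial onsets fall 11.9–23.9 s apart (the 12–24 s quoted above), and a run of 18 trials lasts about 6 min, with 3, 3, and 12 trials at the 0-, 4-, and 8-s noise durations.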
fMRI Paradigm: Face and House Localizer Task
To independently define face- and house-selective regions of cortex, each participant completed a block-design localizer task after the main experiment. In the localizer task, the same faces and houses used in the categorization task were presented at either 100% or 35% phase coherence in 15-s blocks at a rate of 1 image per second while participants performed a 1-back matching task (4 conditions: faces at 100% coherence, houses at 100% coherence, faces at 35% coherence, and houses at 35% coherence; each condition was presented 4 times per run). Participants completed 3 or 4 runs of the localizer task (1 participant performed only a single run). The 100% coherent images were used to localize category-specific cortical regions; the 35% coherent images were used to determine the degree to which the cortical responses in these regions were category specific for low-coherence stimuli in the absence of expectation.
MRI scanning was carried out with a Philips Intera 3-T scanner in the F. M. Kirby Research Center for Functional Brain Imaging at the Kennedy Krieger Institute, Baltimore, MD. Anatomical images were acquired using a magnetization-prepared rapid gradient echo T1-weighted sequence that yielded images with 1-mm isotropic voxels (repetition time [TR] = 8.1 ms, echo time [TE] = 3.7 ms, flip angle = 8°, time between inversions = 3 s, inversion time = 738 ms). Whole-brain echo planar functional images (EPI) were acquired with an 8-channel SENSE (MRI Devices, Inc., Waukesha, WI) parallel-imaging head coil in 40 transverse slices (TR = 2000 ms, TE = 35 ms, flip angle = 90°, matrix = 64 × 64, field of view = 192 mm, slice thickness = 3 mm, no gap). Neuroimaging data were analyzed using BrainVoyager QX software (Brain Innovation, Maastricht, The Netherlands). Functional data were corrected for slice timing and motion and then temporally high-pass filtered to remove components occurring 3 or fewer times over the course of a run. To correct for between-scan motion, all of each subject's EPI volumes were coregistered to that subject's anatomical scan. Spatial smoothing was applied (4-mm full-width half-maximum Gaussian kernel).
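Removing components that occur 3 or fewer times per run can be implemented by regressing out a low-order Fourier basis. The sketch below shows one way to do this with NumPy. It is not BrainVoyager's code, and `fourier_highpass` is a hypothetical helper. With a 6-min run at TR = 2 s (180 volumes), 3 cycles per run corresponds to a cutoff of about 3/360 s ≈ 0.0083 Hz.

```python
import numpy as np


def fourier_highpass(ts, n_cycles=3):
    """Remove drifts with 1..n_cycles cycles per run (sketch).

    Regresses the time series on a constant plus sine and cosine terms with
    1..n_cycles cycles over the run, then subtracts only the sinusoidal part,
    so the run mean is preserved.
    """
    ts = np.asarray(ts, dtype=float)
    n = ts.size
    phase = 2 * np.pi * np.arange(n) / n
    columns = [np.ones(n)]
    for k in range(1, n_cycles + 1):
        columns += [np.sin(k * phase), np.cos(k * phase)]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    drift = design[:, 1:] @ beta[1:]
    return ts - drift


# A 6-min run at TR = 2 s has 180 volumes
n_vols = 180
t = np.arange(n_vols) / n_vols
signal = np.sin(2 * np.pi * 20 * t)                       # 20 cycles per run: kept
drifting = 100 + 2 * np.sin(2 * np.pi * 2 * t) + signal   # 2 cycles per run: removed
filtered = fourier_highpass(drifting)
```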
Object-Selective Regions of Interest
Regions of interest (ROIs) were defined for each subject separately based on independent a priori criteria. Time courses were later extracted from these ROIs to examine effects of expectation on critical trials (i.e., those with the long, 8-s noise intervals).
First, face- and house-selective activation maps were defined from the localizer task (face blocks contrasted with house blocks; cluster correction for multiple comparisons was performed for all maps, P < 0.05).
A second step in ROI definition further restricted the ROIs to task-relevant voxels without violating the independence of the ROI definition. This refinement addressed 2 differences between the tasks. First, the objects in the main task were much less distinct (a maximum of 74% phase coherence in the main task vs. 100% in the localizer). Second, the depth of processing differed (1-back matching in the localizer vs. discrimination of gender or house structure in the main task). The final ROIs were therefore constrained to activated voxels within the localizer-defined regions that also showed object-selective activity on the noncritical, short noise trials of the main task (i.e., those with 0- or 4-s noise intervals), using a general linear model (GLM) to estimate parameter values. Nominal thresholds were chosen individually to maximize sensitivity while keeping the ROIs functionally distinct (ranging from t = 2.25, P = 0.024 to t = 3.75, P = 0.00018). This yielded voxels that responded selectively to the images even at lower coherence (<75%) during the expectation task.
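The nominal P values paired with each t threshold can be recovered from Student's t distribution. The sketch below uses only the Python standard library, numerically integrating the t density. The degrees of freedom for the single-subject GLMs are not reported, so df = 1000 is a hypothetical stand-in for a model fit to many volumes. For the group RFX threshold in the whole-brain analysis (t = 4.00), df = 8 follows from the 9 participants.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Student's t probability density (log-gamma avoids overflow for large df)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: float, upper: float = 200.0, n: int = 20000) -> float:
    """Two-tailed P for a t statistic: 2 * integral of the pdf from |t| to upper,
    using composite Simpson's rule (n must be even). The tail beyond `upper`
    is negligible for the df values used here."""
    a, b = abs(t), upper
    h = (b - a) / n
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * total * h / 3


# Hypothetical df = 1000 stands in for a single-subject GLM with many volumes
for t_val in (2.25, 3.75):
    print(t_val, round(two_tailed_p(t_val, 1000), 5))
# Whole-brain group threshold: 9 participants -> df = 8
print(4.00, round(two_tailed_p(4.00, 8), 4))
```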
Each of the resulting ROIs was object selective in both the localizer task and the main task (see Results). Critically, the ROI analyses reported below were conducted on long-duration noise trials (8-s noise intervals). The ROIs were thus defined using data that were independent of the critical expectation data.
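The two-step, independence-preserving ROI selection can be sketched as a voxelwise intersection of two thresholded maps. This is an illustrative NumPy sketch. The function names, thresholds, and toy t values are hypothetical, and the cluster correction applied to the localizer maps is omitted.

```python
import numpy as np


def define_roi(localizer_t, short_trial_t, localizer_thresh, main_thresh):
    """Two-step ROI definition (sketch; cluster correction not shown).

    localizer_t   : voxelwise t map for the localizer contrast (e.g. faces > houses
                    at 100% coherence); pass -t for a houses > faces ROI
    short_trial_t : voxelwise t map for the same contrast, estimated only from
                    the 0-s and 4-s noise trials of the main task
    The 8-s noise trials are never used here, so the ROI stays independent of
    the critical expectation data.
    """
    return (np.asarray(localizer_t) > localizer_thresh) & (
        np.asarray(short_trial_t) > main_thresh
    )


def roi_time_course(data, mask):
    """Mean time course over ROI voxels; data has shape (n_volumes, n_voxels)."""
    return np.asarray(data)[:, mask].mean(axis=1)


loc_t = np.array([3.1, 2.8, 1.2, 4.0])
short_t = np.array([2.5, 1.9, 3.0, 2.3])
mask = define_roi(loc_t, short_t, localizer_thresh=2.0, main_thresh=2.25)
```

Only voxels that pass both thresholds enter the ROI. Time courses are then extracted from the held-out 8-s noise trials.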
We performed an exploratory whole-brain analysis to identify regions that might be candidate sources of the modulatory signals that evoked the anticipatory responses in category-selective cortex. Anatomical and functional images were Talairach transformed and resampled to 3-mm isotropic voxels. Face and house regressors were defined as a hemodynamic response function (Boynton et al. 1996) convolved with a boxcar starting 4 s into the critical trial and lasting through the entire phase coherence sequence (the remaining 4 s of the noise interval plus the entire image coherence interval). The construction of this regressor was informed by the time course in the category-selective ROIs (see Results) and reflects a rise in activity beginning around 8 s into these trials. A group random effects (RFX) analysis was performed using a GLM. The contrast of interest was a conjunction of the main effects of the face and house regressors (conjunction of RFX; Nichols et al. 2005). This analysis was used to identify regions that responded significantly on both face and house trials (i.e., domain-general control regions) and that mirrored the anticipatory effect we observed in category-selective cortex (see Results). A minimum cluster size of 8 contiguous voxels was adopted to correct for multiple comparisons, yielding a whole-brain corrected statistical threshold of alpha < 0.01 (t = 4.00, nominal P < 0.004), as determined by a cluster threshold estimator plug-in implemented in BrainVoyager. To ensure that the statistical tests were independent of voxel selection, event-related averages were generated using a leave-one-subject-out cross-validation procedure, in which the ROIs for each subject were defined using the other subjects’ data (Esterman M, Tamber-Rosenau BJ, Chiu YC, Yantis S, submitted).
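The face and house regressors can be sketched as a boxcar convolved with a gamma-function HRF. The HRF parameters below (n = 3, tau = 1.25 s, delta = 2.5 s) are illustrative assumptions rather than the values used in the study, and the helper names are ours. The 4-s onset and the 15.4-s offset (8-s noise interval plus 7.4-s image coherence interval) come from the design described above.

```python
import math

import numpy as np


def gamma_hrf(t, n=3, tau=1.25, delta=2.5):
    """Gamma-function HRF in the form used by Boynton et al. (1996):
    h(t) = ((t - delta) / tau)**(n - 1) * exp(-(t - delta) / tau) / (tau * (n - 1)!)
    for t > delta and 0 otherwise. n, tau and delta here are illustrative
    values, not the ones used in the study; n must be >= 2.
    """
    x = np.clip((np.asarray(t, dtype=float) - delta) / tau, 0.0, None)
    return x ** (n - 1) * np.exp(-x) / (tau * math.factorial(n - 1))


def expectation_regressor(onset_s=4.0, offset_s=15.4, dt=0.1, tr=2.0, pad_s=30.0):
    """Boxcar from onset_s to the end of the coherence sequence, convolved
    with the HRF and sampled once per TR.

    offset_s = 8-s noise interval + 7.4-s image coherence interval.
    """
    t = np.arange(0.0, offset_s + pad_s, dt)
    boxcar = ((t >= onset_s) & (t < offset_s)).astype(float)
    hrf = gamma_hrf(np.arange(0.0, pad_s, dt))
    predicted = np.convolve(boxcar, hrf)[: t.size] * dt  # discrete convolution
    step = int(round(tr / dt))
    return t[::step], predicted[::step]


times, reg = expectation_regressor()
```

Because of the HRF delay, the sampled regressor stays at zero through the 6-s sample and peaks a few seconds after the image coherence interval ends.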
Overall mean accuracy was 84%. An analysis of variance (ANOVA) on accuracy revealed a main effect of stimulus: participants were more accurate on the house task than on the face task (faces = 82% vs. houses = 86%, F1,15 = 7.27, P < 0.05). There was no effect of expectation on accuracy (valid vs. neutral, P > 0.1).
Coherence Required for Accurate Discrimination
In this experiment, response time and percent phase coherence at response are directly yoked (1% coherence increase every 75 ms). We report behavioral responses in terms of the percent phase coherence of the image at the moment the observer correctly classified the image. Thus, smaller values indicate that the image was classified on the basis of less coherent visual information that was present at some point before the button-press response was made (see Fig. 1b).
A 2 × 2 (stimulus × expectation) ANOVA was conducted on the coherence of the images at the moment a correct response was made. We observed main effects of stimulus (faces = 63.7% coherence, houses = 66.0% coherence, F1,15 = 21.9, P < 0.001) and expectation (valid = 64.2%, neutral = 65.5%, F1,15 = 95.44, P < 0.001). The interaction approached significance (F1,15 = 3.69, P < 0.08), reflecting a slightly larger expectation effect for houses (65.2% vs. 66.8%) than for faces (63.3% vs. 64.2%). Paired post hoc t-tests revealed significant expectation effects for both stimulus types (P < 0.05).
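Because response time and coherence at response are yoked, coherence differences translate directly into time. The following small sketch uses hypothetical helper names and assumes a linear ramp.

```python
def coherence_to_ms(coherence_pct: float, ms_per_percent: float = 75.0) -> float:
    """Time from the start of the image coherence interval at which the image
    reached coherence_pct (linear ramp assumed)."""
    return coherence_pct * ms_per_percent


def benefit_ms(valid_pct: float, neutral_pct: float, ms_per_percent: float = 75.0) -> float:
    """Expectation benefit: coherence saved at a correct response, in ms."""
    return (neutral_pct - valid_pct) * ms_per_percent


behavioral_benefit = benefit_ms(64.2, 65.5)              # about 97.5 ms
short_trial_rt = coherence_to_ms(68.7, 100.0)            # fMRI rate: about 6870 ms
medium_trial_rt = 4000 + coherence_to_ms(67.1, 100.0)    # about 10.7 s into the trial
```

At the behavioral rate, the 1.3-point difference between valid and neutral trials corresponds to roughly 97.5 ms. At the fMRI rate of 1% per 100 ms, the 68.7% and 67.1% coherence values reported for short- and medium-duration noise trials in the imaging session correspond to about 6870 ms and 4000 + 6710 ms into the trial.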
Behavior during Imaging
Overall, participants were 86% correct in classifying the faces and houses, similar to the behavioral experiment. Performance did not differ for face and house trials (face trials: 67.4% coherence, 86% correct; house trials: 68.0% coherence, 85% correct).
We also examined the effect of noise duration (0, 4, or 8 s) and observed a main effect on coherence at response (F2,16 = 7.90, P < 0.01): correct discrimination was achieved at lower coherence (i.e., earlier in the image coherence interval) in trials with a medium or long noise interval than in trials with a short noise interval (67.3% vs. 68.7% phase coherence; P < 0.05). This result suggests that participants were less prepared for the cued image category when the image appeared early in the trial: they were 134 ms slower to respond. Note that short- and medium-duration noise trials each made up only 16.7% of trials, compared with 67% for long-duration noise trials. This leads to the prediction that any preparatory effect in brain activity should be maximal from about 7 s into the trial onward (the mean response time for short-duration noise trials was 6870 ms into the trial). However, the absence of a difference between medium- and long-duration noise trials (67.1% vs. 67.4% coherence) suggests that preparation is complete by the time the object appears on medium-duration noise trials; thus, preparation is optimized by about 11 s into the trial (medium-duration noise trials: 4 s of noise + 6.71 s mean response time).
Three face-selective ROIs were localized in most hemispheres. The first was defined as the face-specific peak in the middle/anterior fusiform gyrus (FG), often termed the fusiform face area or FFA (Sergent et al. 1992; Allison et al. 1994; Puce et al. 1995; Kanwisher et al. 1997). We observed a robust right FG region for all 9 participants and a left FG activation for 5 of the participants. Second, a region was identified in the lateral inferior occipital gyrus (IOG), often termed the occipital face area or OFA (Gauthier et al. 2000; Rossion et al. 2003), for all 9 participants in the right hemisphere and for 5 of 9 in the left hemisphere. Finally, a region in the posterior superior temporal sulcus (pSTS; Puce et al. 1998; Haxby et al. 1999) was face selective in all 9 participants in the right hemisphere and for 3 of 9 in the left hemisphere. A region in the parahippocampal gyrus (PHG), often termed the parahippocampal place area or PPA (Epstein and Kanwisher 1998), exhibited selectively greater activation for houses in 8 of 9 participants in the right hemisphere and 6 of 9 in the left hemisphere. In addition, we identified a region in more dorsal extrastriate cortex (transverse occipital sulcus or TOS) that was house selective in 8 of 9 participants (5 in the right hemisphere and 6 in the left). This region is consistent with the building selectivity in TOS that has been reported in other studies (e.g., Hasson et al. 2003; Schwarzlose et al. 2008).
Each of these ROIs exhibited reliable selectivity to 100% coherent faces and houses in the localizer task (as they must, given how they were defined; see Materials and Methods, Fig. 2). Recall that in the localizer task, we also presented faces and houses with only 35% spatial phase coherence. Importantly, these ROIs exhibited no category selectivity at 35% coherence (35% faces minus 35% houses, Fig. 2), suggesting that the images with 35% phase coherence did not contain sensory information that could drive category-specific responses in the face- and house-selective ROIs. Thus, any activation difference in the category-selective regions during the expectation task that occurs either during the noise interval or during the image coherence interval before 35% coherence must be driven solely by top-down expectation. In fact, given the delay of the BOLD response, it would be legitimate to infer purely top-down effects even several seconds after 35% coherence is reached; to be conservative, we limit our conclusions regarding pure effects of expectation to the period before the image reaches 35% phase coherence.
Event-Related BOLD Time Course in Category-Selective ROIs
Event-related averages were examined for critical trials in the category-selective ROIs (i.e., in the 67% of trials with 8-s noise intervals). Discrimination errors and anticipatory responses during the noise interval were removed (14% of trials). For participants with bilateral ROIs, time series were averaged across the 2 hemispheres, yielding a total of 5 ROIs (3 face-selective regions, shown in Fig. 3, and 2 house-selective regions, shown in Fig. 4). Functional data were transformed to percent signal change relative to the mean of the run.
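The preprocessing steps for the event-related averages can be sketched as follows: percent signal change relative to the run mean, then averaging of epochs time-locked to trial onset. This is a NumPy sketch with hypothetical helper names. It assumes that trial onsets are already expressed in whole volumes (TRs).

```python
import numpy as np


def percent_signal_change(ts):
    """Express a run's time series as percent change from the run mean."""
    ts = np.asarray(ts, dtype=float)
    mean = ts.mean()
    return 100.0 * (ts - mean) / mean


def event_related_average(psc, onsets_vol, n_vols):
    """Average n_vols-long epochs starting at each onset (given in volumes).

    Epochs that would run past the end of the run are skipped.
    """
    psc = np.asarray(psc, dtype=float)
    epochs = [psc[o:o + n_vols] for o in onsets_vol if o + n_vols <= psc.size]
    return np.mean(epochs, axis=0)


# Toy example: a hypothetical ROI time series and trial onsets (in TRs)
roi_ts = 1000 + 5 * np.sin(np.arange(200) / 3.0)
psc = percent_signal_change(roi_ts)
era = event_related_average(psc, onsets_vol=[10, 40, 70, 100], n_vols=12)
```

In practice, the input would be an ROI-averaged time series (averaged across hemispheres where both were present), and only the retained 8-s noise trials would contribute onsets.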
In the FG and the STS, there was reliably greater activity on face trials compared with house trials starting 10 s into the trial at 20% coherence (paired t-tests, P < 0.05; Fig. 3). Given the hemodynamic delay, this reflects neural activity from at least 2–4 s earlier when the stimulus was at 0% coherence; even ignoring the hemodynamic delay, the stimulus at this point contained insufficient sensory information to drive category-specific responses, as discussed earlier. These 2 regions thus exhibited purely expectation-driven category-specific activity. In contrast, in the occipital face region (IOG), face trial activity was not reliably greater than house trial activity until 16 s into the trial after the visible face or house had cohered to 74% and disappeared. This selectivity in IOG was therefore potentially stimulus evoked; there was no evidence that it was purely expectation driven.
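The reasoning above maps a BOLD time point back to the stimulus coherence that could have driven it, and it can be made explicit in code. The following sketch uses hypothetical helper names. It assumes a 2–4 s hemodynamic lag together with the 8-s noise interval and the 1% per 100 ms rate of the critical trials.

```python
from typing import Optional


def coherence_at_ms(t_ms: float, noise_ms: float = 8000, ms_per_percent: float = 100,
                    max_pct: float = 74) -> Optional[float]:
    """Percent phase coherence t_ms into a critical fMRI trial.

    Returns 0.0 during the noise interval and None after the image has
    reached max_pct and been removed.
    """
    if t_ms <= noise_ms:
        return 0.0
    if t_ms > noise_ms + max_pct * ms_per_percent:
        return None
    return (t_ms - noise_ms) / ms_per_percent


def coherence_driving_bold(bold_ms: float, lag_ms: float) -> Optional[float]:
    """Coherence at the moment of the neural activity assumed to produce the
    BOLD signal observed at bold_ms, given a hemodynamic lag of lag_ms."""
    return coherence_at_ms(bold_ms - lag_ms)


for bold_ms in (10000, 16000):
    lagged = [coherence_driving_bold(bold_ms, lag) for lag in (2000, 4000)]
    print(bold_ms, coherence_at_ms(bold_ms), lagged)
```

At 10 s, the stimulus itself is at 20% coherence, and the neural activity 2–4 s earlier occurred during the noise interval (0%). At 16 s, the image has already reached 74% and been removed.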
In the PHG, there was greater activity on house trials compared with face trials starting at 10 s into the trial at 20% phase coherence (P < 0.05; Fig. 4). This parallels the finding in the FG and pSTS. In contrast, in the occipital house-selective region (TOS), house trial activity was not reliably greater than face trial activity until 18 s into the trial after the visible face or house had cohered to 74% and disappeared. This parallels the finding in the occipital face region (IOG).
Category-Independent Effects of Expectation
To explore category-independent effects of expectation, we performed a group whole-brain RFX GLM conjunction analysis with a regressor whose rise begins about 8 s into the critical trials and spans the entire phase coherence sequence (modeled after the anticipatory effects in the category-selective ROIs; see Materials and Methods). A conjunction of the main effects of this regressor for face and house trials revealed several regions (see Fig. 5). First, it revealed a cluster in right prefrontal cortex extending into the inferior and middle frontal gyri (center of mass: x = 43, y = 4, z = 30), as well as bilateral clusters in posterior intraparietal sulcus extending to the occipital junction (left hemisphere: x = −24, y = −73, z = 26; right hemisphere: x = 25, y = −67, z = 33). Time courses in these regions resemble those observed in the category-selective ROIs: a rise initiated about 8–10 s into the trial, particularly within the IPS. This analysis also identified 2 clusters in bilateral occipital cortex (left hemisphere: x = −21, y = −86, z = 0; right hemisphere: x = 27, y = −83, z = 2) and 2 clusters in bilateral inferior temporal cortex (not shown; left hemisphere: x = −41, y = −63, z = −9; right hemisphere: x = 37, y = −50, z = −13). However, in contrast to the frontal and parietal regions, the BOLD response in these regions rose steadily from 2 s into the trial and peaked at 18 s; this pattern likely reflects activity evoked by the dynamic visual stimulation rather than by expectation per se.
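The conjunction of the face and house main effects can be sketched with the minimum statistic (the conjunction null of Nichols et al. 2005): a voxel survives only if both t values exceed the threshold. The t values below are hypothetical, and the threshold is the t = 4.00 reported in Materials and Methods. The 8-voxel cluster-extent criterion is not implemented here.

```python
import numpy as np


def conjunction_min_t(t_face, t_house, t_crit):
    """Minimum-statistic conjunction (conjunction null, Nichols et al. 2005):
    a voxel is declared active only if BOTH contrasts exceed t_crit, which is
    equivalent to testing min(t_face, t_house) > t_crit."""
    t_min = np.minimum(np.asarray(t_face, float), np.asarray(t_house, float))
    return t_min, t_min > t_crit


t_face = np.array([5.0, 3.0, 4.5, 6.1])
t_house = np.array([4.2, 6.0, 3.9, 4.4])
t_min, active = conjunction_min_t(t_face, t_house, t_crit=4.00)
```

A voxel that responds strongly to only one category (the second and third voxels here) fails the conjunction, which is what restricts the result to domain-general regions.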
Observers viewed phase-scrambled images of faces or buildings and actively anticipated a gradually cohering member of one of these categories on each trial. Expectation yielded faster discrimination of gender (for faces) or structure (for houses) that was based on less coherent visual information than when the expectation was neutral. This behavioral result extends previous work that reported face and building expectation effects for specific exemplars but not for categories (Puri and Wojciulik 2008). This discrepancy could be due to the visual similarity of our images within each category, which may have allowed observers to form a more effective generic visual template.
Our fMRI experiment addressed the neural basis of category-specific expectation by revealing that it is associated with anticipatory brain activity in some (but not all) category-selective regions of visual cortex. These anticipatory effects were observed in temporal lobe regions (FG and STS for faces and PHG for houses); however, these effects were absent for extrastriate occipital regions (IOG and TOS, respectively). This dissociation suggests that the category-specific representations in the temporal cortex are more subject to top-down modulation than earlier, more sensory regions of cortex. Another possible source of this dissociation is the nature of the task, which required participants to prepare to see a member of an object category rather than a specific object. For example, if we had asked observers to prepare to see a specific individual picture of a face, this might have evoked more refined expectations about visual features or parts or perhaps a more vivid visual image. Exemplar-based expectation might therefore selectively recruit more posterior extrastriate cortical areas. It is important to note that within category-selective occipital regions, activity increased gradually (but equally for both categories) before any object cohered, leaving open the possibility that they are in fact modulated by visual expectation, but not in a category-specific manner.
Our choice of a within-category discrimination task was strategic: it made category expectation independent of a specific motor response (i.e., button 1 or 2 was equally likely regardless of whether the observer expected a face or a house), avoiding the potential confound between expectation and motor response that could arise in a task requiring discrimination between 2 categories (face vs. house). The current results suggest that the category detection threshold lies somewhere between 35% coherence (at which there is no category-selective brain response) and the discrimination threshold (68% coherence overall in the fMRI experiment; 64% coherence in the behavioral experiment). To specify detection thresholds, we conducted a behavioral detection experiment with faces and houses using a similar paradigm (1%/75 ms). Detection thresholds for expected categories were at 61% coherence. Extrapolating to the fMRI experiment, the coherence required to determine which category was present was presumably reached about 300 ms before the within-category discrimination threshold.
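One way to read the 300-ms extrapolation is sketched below. It rests on our assumption that the 3-point gap between the behavioral discrimination (64%) and detection (61%) thresholds carries over to the fMRI paradigm, where coherence rose by 1% every 100 ms. The helper name is hypothetical.

```python
def lead_ms(discrimination_pct: float, detection_pct: float, ms_per_percent: float) -> float:
    """Time between reaching the detection and the discrimination threshold,
    assuming coherence rises linearly by 1% every ms_per_percent ms."""
    return (discrimination_pct - detection_pct) * ms_per_percent


behavioral_rate_lead = lead_ms(64, 61, 75)   # 3 points at 1% per 75 ms
fmri_rate_lead = lead_ms(64, 61, 100)        # 3 points at 1% per 100 ms
print(behavioral_rate_lead, fmri_rate_lead)  # 225 300
```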
A component process that may be related to forming a visual expectation in the absence of unambiguous visual stimulation is visual imagery. Like an expectation, imagery can either facilitate or impair perceptual performance, particularly with simple images such as oriented bars or filled-in spatial locations (Farah 1985; Ishai and Sagi 1995). Imagining objects recruits only a subset of the brain regions associated with perceiving them. For example, imagining specific faces and houses leads to activity in parts of FFA and PPA, respectively (Ishai et al. 2000; O'Craven and Kanwisher 2000). Occipital cortex, including V1, has also been implicated in visual imagery, although not in all circumstances (Kosslyn and Thompson 2003). The inconsistency of this finding could be attributed to individual differences in the vividness of imagery (Cui et al. 2007; Reddy et al. 2008) or to the use of tasks in which imagining high-resolution details was required (Kosslyn and Thompson 2003). In the current study, imagery vividness may be limited by the categorical rather than exemplar-specific nature of the task. To the degree our task evoked mental imagery, it was presumably used strategically to improve performance on the task (i.e., to discriminate gender or structure for the face and house categories, respectively). Further experiments will be needed to determine the degree to which mental imagery contributed to the visual expectation effect observed in this study and the degree to which individual differences in imagery may affect expectation priming both behaviorally and in the brain.
It is also important to consider the link between visual expectation and visual working memory. In the working memory literature, delay-period activity in visual cortex has been shown to contain stimulus-specific information, such as the orientation and color of the remembered stimulus (e.g., Harrison and Tong 2009; Serences et al. forthcoming). In addition, the maintenance and selection of specific houses or faces in working memory modulates activity in face- and house-selective regions of inferotemporal cortex (e.g., Ranganath et al. 2004; Lepsien and Nobre 2007). Recollection of previously studied faces and houses has also been associated with object-specific cortical activity (Polyn et al. 2005).
Although working memory may be related to the expectation effects reported here, the current results differ in at least two ways from what would be predicted by simply maintaining a face or house template in working memory. First, the memory load concerning which category is currently relevant is low, because the colored line cues were always present to indicate the expected category. Second, although participants knew the expected category throughout the trial, the anticipatory effect did not appear until the moment at which the observer expected the image to begin to emerge. The relationship between visual working memory, expectation, and imagery is ripe for further investigation.
Behavioral and imaging data suggest the presence of a temporal expectation effect in addition to the category-specific expectation effect. Participants responded more slowly on short catch trials (no initial noise interval, 1/6 of trials) than on trials with 4 or 8 s of preceding noise. This suggests that participants were typically not fully prepared to see the image cohere within the first 6–7 s. The category-selective expectation effect in the brain corroborates this: the anticipatory rise in object-selective cortex became reliable at approximately 10 s into the critical trials. Accounting for the hemodynamic delay, the selective cortical response began at roughly the earliest time at which images could become visible (6–7 s). This is consistent with previous research demonstrating that temporal expectation can improve the discriminability of visual targets (Correa et al. 2005). Because a long period of noise (8 s) preceded most trials (67%), participants appear to have initiated preparatory object-selective activity at the time they expected an object to begin to appear.
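The hemodynamic correction described above can be written out as a short calculation. This is a sketch under an assumption: the text does not state the lag it applied, so the value of roughly 3–4 s between neural onset and a reliably detectable BOLD rise, and the symbol \delta_{\mathrm{HRF}}, are introduced here for illustration only, chosen because they reconcile the two times reported in the paragraph.

```latex
% Assumed lag (not stated in the text): \delta_{\mathrm{HRF}} \approx 3\text{--}4 s
t_{\mathrm{neural}} \approx t_{\mathrm{BOLD}} - \delta_{\mathrm{HRF}}
  \approx 10\ \mathrm{s} - (3\text{--}4\ \mathrm{s})
  \approx 6\text{--}7\ \mathrm{s}
```

Under this assumed lag, the estimated neural onset falls at the earliest time at which images could begin to cohere, which is the alignment the paragraph describes.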
As has already been shown in the spatial domain (Kastner et al. 1999; Hopfinger et al. 2000) and the feature domain (Chawla et al. 1999; Shulman et al. 1999), this study has revealed that the activity of object-selective cortex can be modulated in the absence of object-specific visual information. It is not yet known how space-based, feature-based, and temporal expectations interact. In one study, participants anticipated a specific direction of motion in 1 visual field. Expectation led to motion-selective modulations in retinotopic cortex for both the expected and the unexpected visual field, suggesting that feature-based attention dominated spatial attention (Serences and Boynton 2007). However, McMains et al. (2007) found that when observers expected features (color or motion) at a spatial location, baseline shifts in visual cortex were based solely on spatial receptive fields rather than on feature selectivity. In this case, spatial expectation dictated subjects’ preparatory state. Other studies have investigated the interaction between spatial and temporal expectation. Their results suggest partially overlapping neural mechanisms that combine synergistically, such that under spatial uncertainty, temporal expectation may affect perceptual processing differently than when a spatial expectation is also present (Coull and Nobre 1998; Doherty et al. 2005). The interaction among spatial, featural, and temporal expectation deserves further study.
We observed that the anticipatory rise in object-selective ROIs was accompanied by increased activity in parietal (IPS) and prefrontal cortices. Consistent with previous findings, these regions are potential sources of the top-down modulations we observed in inferior temporal (IT) cortex (see Kastner et al. 1999; Hopfinger et al. 2000; Ishai et al. 2000; Yantis 2008). Notably, particularly in IPS, activity rose in a time course locked to the category-selective preparation effects. Functional connectivity analyses have previously shown that top-down object anticipation involves increased coupling between right parietal cortex and ventral visual areas, consistent with our findings (Eger et al. 2007). Others have shown that connectivity between prefrontal cortex and object-selective regions increases during mental imagery (Mechelli et al. 2004). These studies suggest a causal role for both PFC and posterior parietal cortex in the top-down control of inferior temporal cortex. A recent theory (Bar 2004, 2007) proposes that low-spatial-frequency visual information is rapidly transmitted to PFC, which in turn generates predictions by “presensitizing” category-selective visual cortex in IT. Alternatively, rather than reflecting modulation of visual cortex, the activity observed in IPS and PFC could be associated with preparing to respond, implementing the categorization rule, or modulating other regions of visual cortex not identified in this study. Future research will be required to dissociate the control of these different facets of task preparation.
In sum, the current study demonstrates that visual anticipation of an object category evokes increased activity in corresponding category-selective regions of temporal cortex. These findings extend previous reports of the effects of spatial and featural expectation and have broader implications for related cognitive domains such as working memory and visual imagery.
National Institutes of Health (R01-DA13165 to S.Y.).
Conflict of Interest: None declared.