How does the amount of time for which we see an object influence the nature and content of its cortical representation? To address this question, we varied the duration of initial exposure to visual objects and then measured functional magnetic resonance imaging (fMRI) signal and behavioral performance during a subsequent repeated presentation of these objects. We report a novel ‘rise-and-fall’ pattern relating exposure duration and the corresponding magnitude of fMRI cortical signal. Compared with novel objects, repeated objects elicited maximal cortical response reduction when initially presented for 250 ms. Counter-intuitively, initially seeing an object for a longer duration significantly reduced the magnitude of this effect. This ‘rise-and-fall’ pattern was also evident for the corresponding behavioral priming. To account for these findings, we propose that the earlier interval of an exposure to a visual stimulus results in a fine-tuning of the cortical response, while additional exposure promotes selection of a subset of key features for continued representation. These two independent mechanisms complement each other in shaping object representations with experience.
Prior exposure to a stimulus generally facilitates its recognition in subsequent encounters. This experience-based phenomenon, termed priming, has been studied extensively and is believed to be one of the building blocks of learning and memory (Tulving and Schacter, 1990). Electrophysiological recording studies in monkeys have provided important insights regarding the possible physiological basis of such experience-related changes in processing: specifically, by showing reduced neuronal response for repeated, compared with novel, stimuli. This effect has been found in inferior-temporal regions (Li et al., 1993; Ringo, 1996; Brown and Xiang, 1998) as well as in the prefrontal cortex (Rainer and Miller, 2000). In humans, regions involved in visual recognition have also been observed to produce a relatively reduced cortical response for repeated stimuli, as measured in studies using positron emission tomography (PET; Buckner et al., 1995; Badgaiyan et al., 2001), event-related potentials (ERP; Rugg et al., 1995; Puce et al., 1999), magnetoencephalography (MEG; Noguchi et al., 2004) and functional magnetic resonance imaging (fMRI; Buckner et al., 1998; Grill-Spector et al., 1999; James et al., 1999; Henson et al., 2000; Chao et al., 2002; Vuilleumier et al., 2002). This experience-based change in cortical response has been assigned numerous terms, many of which implicitly assume some underlying function in the effect they describe [e.g. suppression (Henson and Rugg, 2003) and adaptation (Grill-Spector and Malach, 2001), but not attenuation (Yi et al., 2004)]. We adopt here a functionally neutral working term: repetition-related response reduction.
While the phenomenon of repetition-related response reduction is, by definition, associated with the level of experience an individual has with a particular stimulus, the precise nature of this relation remains unclear. Specifically, how does the duration of our exposure to a certain visual object affect its cortical representation? We sought to clarify this relation by first systematically varying the amount of visual experience that observers acquired for each stimulus during its initial exposure, where each object was first presented for a duration lasting between 40 and 1900 ms. Then, using fMRI, we compared the response when each object was shown again in a subsequent repetition with that obtained for novel objects. To ensure identical viewing conditions when assessing the reduction of fMRI signal and behavioral response for new versus repeated presentations, additional new objects and all of the previously seen objects (regardless of their prior exposure duration) were each presented for 500 ms.
How might the magnitude of cortical response reduction change as a function of the amount of prior experience? It has been hypothesized that, at a neuronal level, repetition-related response reduction reflects the operation of a mechanism that increases the efficiency of cortical object representations with added exposure (Desimone, 1996). According to this account, the representation becomes efficient as the cortical response displays ‘sharpened’ stimulus selectivity. The use of ‘sharpening’ in this original proposal might mean that groups of neurons collectively represent all of the features of an object and, with added experience with the object, come to do so with increasing fidelity. The prediction that stems from this view is that increased exposure to a certain stimulus would result in increased neural selectivity for that stimulus, producing a continued reduction of the cortical response because neurons that are not optimally selective for that stimulus gradually stop participating in the object's representation. Increasing the exposure duration in the initial encounter with an object should, accordingly, lead to a larger response reduction, up to a certain asymptotic value (Li et al., 1993).
Subsequent theorizing has suggested that rather than merely sharpening the response to all information about a stimulus, experience with a visual stimulus might alternatively lead to continued representation of only those features that are essential for identifying an object, while neurons coding features that are non-essential stop responding (Wiggs and Martin, 1998). By this view, the representation of a stimulus formed through increased exposure does not maintain exhaustive information about all of an object's features, but instead selectively represents only a ‘key’ subset of features that may be useful in distinguishing it from other objects. In light of the cortical commitment involved in maintaining object representations over time, it seems beneficial to have a mechanism that can increase the distinctiveness of an object's representation while also reducing the number of represented features. However, representing fewer features provides less overlap between an object's cortical representation and the corresponding visual input when that object is encountered subsequently. Therefore, if the magnitude of repetition-related response reduction depends on the similarity between the object's features and its primed representation, then a decrease in the total number of represented features might be expected to elicit a diminished response reduction with increased prior exposure to an object.
Both of these prior proposals imply that the magnitude of response reduction should change as a function of the initial exposure duration. Accordingly, increases in the magnitude of response reduction following increased visual experience with an object might indicate the influence of a ‘sharpening’ mechanism, while decreases in the magnitude of response reduction might indicate the influence of a selective mechanism that leads to the continued representation of features that are essential for identifying an object. Our results suggest, in fact, that both mechanisms are involved across different incremental periods of visual experience. Specifically, we report a clear ‘rise-and-fall’ pattern that consists of a distinct period in which repetition-related response reduction increases (i.e. following 40–250 ms of prior exposure) and a distinct period in which it decreases (i.e. following 350–1900 ms of prior exposure). We therefore propose that both fine-tuning and feature selection affect visual representation of objects with increasing exposure.
When considering the relation between prior experience with visual objects and the corresponding repetition-related reduction in fMRI signal, it is important to remember that repetition-related response reduction is typically accompanied by improved recognition and shortened behavioral response latencies for repeated stimuli. Because cortical response reduction and behavioral priming generally occur together and share similar characteristics (Wiggs and Martin, 1998; Sayres and Grill-Spector, 2003; Lustig and Buckner, 2004; Maccotta and Buckner, 2004; Noguchi et al., 2004), it is tempting to think of them as manifestations of the same mechanism. While additional evidence is needed to establish an unequivocal causal link between them, a demonstration that cortical response reduction and behavioral priming consistently show similar changes across the experimental conditions of the present studies would provide converging support for the hypothesis that these effects are critically related. We therefore asked subjects in our studies to make a simple judgment about each presented object (i.e. natural or manufactured). Then, we compared reaction times, in addition to fMRI signal, for repeated objects relative to novel objects, as a function of the duration of prior visual exposure. This revealed the exact same ‘rise-and-fall’ pattern in the magnitude of behavioral priming as that found in the magnitude of the fMRI response reduction.
Finally, temporal parameters such as prime duration can produce quite different effects when manipulated in blocked versus randomly intermixed designs (Smith et al., 1994; Stolz and Besner, 1997). We therefore tested separate groups of subjects in block design and event-related fMRI versions of our study, as well as in a purely behavioral experiment, to ensure that our results are robust to the differences in expectancies, strategies and contrast effects that these experimental designs afford. Inherent differences in each version of the study also allowed us to assess the robustness of our results across differences in the time interval separating the first and second presentations of each object [time between presentations: block design, 40–58 s (9–18 intervening stimuli); event-related, 2 s–14 min (1–374 intervening stimuli); behavioral study, 2 s–2 min (2–60 intervening stimuli)].
Materials and Methods
Forty-four healthy right-handed subjects (mean age: 28.5 years, range: 21–37 years; 27 females) participated in the experiment (12 in each fMRI experimental design and 20 in the behavioral study). All subjects had normal or corrected-to-normal vision. None were aware of the purpose of the experiment. Informed written consent was obtained from each subject prior to the scanning or behavioral session. All procedures were approved through Massachusetts General Hospital Human Studies Protocol number 2001P-001754 and the Harvard University committee on the use of human subjects in research.
Stimuli and Apparatus
The stimuli were 550 color photographs of familiar everyday objects, such as tools, furniture, means of transportation, clothes, animals, fruits, plants and vegetables. Each picture was presented centrally (mean visual angle 9°) on a white background, followed immediately by a mask (Fig. 1). Ten different masks were used, each a nonsense pattern of mixed lines and patches of color and texture, similar in size and contrast to the object pictures.
Stimuli were back-projected (Sharp LCD projector, XG-NV6XU) onto a translucent screen that subjects viewed through a mirror mounted on a head coil. A custom-designed magnet-compatible panel of three keys was used for subjects' responses. The image presentation and response collection were controlled by a Macintosh G4 running PsyScope experimental software (Macwhinney et al., 1997) at a display resolution of 1024 × 768 pixels and a refresh rate of 75 Hz. Each subject had 130 practice trials using pictures that were not presented again in the actual experiment.
Design and Procedure
There were six functional image acquisition runs for each subject in both fMRI experimental designs. Each run consisted of trials containing fixation and object displays, each lasting 2 s. On object trials, a picture of an object was presented and followed immediately by a mask. There were 13 different object–display conditions: six different First exposure conditions (40, 150, 250, 350, 500, 1900 ms), six corresponding Repeat conditions (i.e. the same objects from the First conditions presented for 500 ms) and a New condition (i.e. novel objects presented for 500 ms). Because total trial duration was 2 s, the duration of the mask varied across conditions, with a range of 100–1960 ms. For example, the mask in the 40 ms exposure duration condition was presented for 1960 ms (i.e. 2000 ms total duration minus 40 ms object duration equaled 1960 ms mask duration). The task on experimental trials was to decide whether the presented object was natural or manufactured. Subjects were instructed to respond as accurately and as quickly as possible for each picture, by pressing a response key with their right hand. When subjects were unsure about their answer, they could press a third, ‘do not know’ button. On trials providing the fixation baseline, a black dot was presented in the center of the display. Subjects were asked to maintain fixation during these trials without making any response.
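The trial timing described above can be sketched in a few lines. This is an illustration of the arithmetic only, not the presentation script used in the study; the names `TRIAL_MS`, `FIRST_EXPOSURES_MS`, `TEST_EXPOSURE_MS` and `mask_duration_ms` are hypothetical.

```python
# Illustrative sketch only: names are hypothetical, durations are from the text.
TRIAL_MS = 2000                                   # every trial lasts 2 s
FIRST_EXPOSURES_MS = [40, 150, 250, 350, 500, 1900]
TEST_EXPOSURE_MS = 500                            # Repeat and New objects


def mask_duration_ms(object_ms: int) -> int:
    """The mask fills whatever remains of the 2 s trial after the object."""
    if not 0 < object_ms < TRIAL_MS:
        raise ValueError("object duration must fall inside the trial")
    return TRIAL_MS - object_ms


# Six First conditions, six matching Repeat conditions and one New condition.
conditions = {f"First_{d}": d for d in FIRST_EXPOSURES_MS}
conditions.update({f"Repeat_{d}": TEST_EXPOSURE_MS for d in FIRST_EXPOSURES_MS})
conditions["New"] = TEST_EXPOSURE_MS

mask_by_condition = {name: mask_duration_ms(d) for name, d in conditions.items()}
```

Under these values the mask lasts 1960 ms after a 40 ms exposure, 100 ms after a 1900 ms exposure, and 1500 ms on all Repeat and New trials.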
For each functional run in the block design, 13 experimental blocks of pictures — one per object-display condition — alternated with 13 fixation blocks. Each block lasted 20 s and consisted of 10 consecutive object or fixation trials, depending on the block. During the fixation blocks, the last fixation dot was red to signal the next experimental block. During the experimental blocks, 10 different stimuli were presented, all for the same exposure duration. For corresponding First and Repeat blocks, the same pictures were presented in a different random presentation order. The time interval between the first and repeated exposure of the same picture ranged between 40 and 58 s (9–18 intervening stimuli). The presentation order of the six functional runs varied randomly across subjects.
The presentation order of trials for the event-related design was determined by pseudo-randomly intermixing the 780 trials from the 13 experimental conditions with 264 fixation trials. This was accomplished using the optseq program within the FreeSurfer Functional Analysis Stream (FS-FAST) software tools (http://surfer.nmr.mgh.harvard.edu/optseq). This program optimizes the presentation sequence of experimental and fixation trials for event-related designs to maximize the efficiency and accuracy of the estimation of the hemodynamic response for each stimulus presentation (Burock et al., 1998; Dale et al., 1999). The presentation order provided by optseq was subsequently adjusted to ensure that the First trial for a given object preceded that object's Repeat trial. This final sequence was divided into six sections of 174 consecutive trials for use in each of the functional runs. Intermixing trials from all conditions across the entire experiment resulted in a time interval between the first and repeated exposure of the same picture that ranged between 2 s and 14 min (1–374 intervening stimuli). In contrast to the block design, the wide range of intervals between First and Repeat trials in the event-related design provided minimal information about when an item may be repeated. This served accordingly to minimize any attentional effects associated with an item's anticipated recurrence (Vuilleumier et al., 2002). Intermixing trials from all conditions also served to minimize any attentional effects associated with an item's anticipated exposure duration. To control for item effects, the assignment of specific objects to each experimental condition was varied between subjects.
The design and procedure for the behavioral study were identical to those used in the event-related design, with two exceptions. First, subjects were tested individually in a testing room, outside of the scanner, with stimuli presented on a 33 cm CRT monitor. Second, the experimental trials were randomly intermixed with the sole constraint that the time interval separating first and second presentations of each object ranged between 2 and 120 s (2–60 intervening stimuli).
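The trial bookkeeping and the ordering constraint can be illustrated as follows. The text does not say how the optseq output was adjusted, so the pairwise swap below is only one assumed way to make each First trial precede its Repeat trial; the function name and the `(kind, object_id)` tuple format are hypothetical.

```python
# Illustrative sketch only. Counts are from the text; the swap rule is an
# ASSUMPTION, since the adjustment procedure is not specified.
N_EXPERIMENTAL = 780      # 13 conditions x 60 trials
N_FIXATION = 264
N_RUNS = 6
TRIAL_S = 2

total_trials = N_EXPERIMENTAL + N_FIXATION        # 1044
trials_per_run = total_trials // N_RUNS           # 174
run_seconds = trials_per_run * TRIAL_S            # 348 s = 5 min 48 s


def enforce_first_before_repeat(sequence):
    """Return a copy in which every object's First trial precedes its Repeat.

    `sequence` is a list of (kind, object_id) tuples, where kind is one of
    "First", "Repeat", "New" or "Fix". Out-of-order pairs are swapped in place.
    """
    seq = list(sequence)
    first_pos, repeat_pos = {}, {}
    for i, (kind, obj) in enumerate(seq):
        if kind == "First":
            first_pos[obj] = i
        elif kind == "Repeat":
            repeat_pos[obj] = i
    for obj, r in repeat_pos.items():
        f = first_pos.get(obj)
        if f is not None and r < f:
            seq[r], seq[f] = seq[f], seq[r]
    return seq
```

For example, `enforce_first_before_repeat([("Repeat", "a"), ("Fix", None), ("First", "a")])` moves the First trial of object `"a"` to the front. A swap keeps all other trials in their optimized slots, which is why it is used here rather than a reshuffle.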
Block design subjects and event-related design subjects were scanned in a 3T Siemens-Allegra scanner. All images were acquired with a custom-built head coil. For each subject, a series of conventional high-resolution structural images (3-D T1-weighted images) was first collected for cortical surface reconstruction. A series of functional images was then collected using a gradient echo-planar imaging (EPI) sequence (block design: TR = 2.31 s, TE = 30 ms; event-related design: TR = 2.00 s, TE = 25 ms; both designs: flip angle = 90°, field of view = 256, slice thickness = 3 mm + 1 mm skip, 33 interleaved slices oriented along the AC–PC line). Each functional acquisition lasted either 8 min 50 s (block design) or 5 min 48 s (event-related design). Each scanning session, including the structural and functional sequences, lasted 1.5–2 h.
Functional data were analyzed using the FS-FAST analysis tools. The methods used here have been used and elaborated on previously (Bar et al., 2001; Bar and Aminoff, 2003). Data from individual fMRI runs were first corrected for motion using the AFNI package (Cox, 1996) and spatially smoothed with a Gaussian filter with a full-width at half-maximum of 5 mm (block design) or 8 mm (event-related design). The intensities for all runs were then normalized to correct for signal intensity changes and temporal drift, with global rescaling for each run to a mean intensity of 1000. Signal intensity for each condition was then computed, excluding trials with incorrect behavioral responses, and averaged across runs. The estimated hemodynamic response was defined by a gamma function of 2.25 s hemodynamic delay and 1.25 s dispersion. To account for intrinsic serial correlation in the fMRI data within subjects, we used a global autocorrelation function that computes a whitening filter (Burock and Dale, 2000). The data were then tested for statistical significance and activation maps were constructed for comparisons of New versus Repeat conditions (t-test with a minimal threshold set at P < 0.001, uncorrected for multiple comparisons) for each fMRI design.
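For readers unfamiliar with gamma-shaped response models, the assumed hemodynamic response can be sketched as below. Only the delay (2.25 s) and dispersion (1.25 s) are given in the text. The functional form and the exponent of 2 are assumptions based on a commonly used gamma model, and the names are hypothetical.

```python
import math

DELAY_S = 2.25        # hemodynamic delay, from the text
DISPERSION_S = 1.25   # dispersion, from the text
ALPHA = 2             # ASSUMPTION: exponent is not stated in the text


def gamma_hrf(t, delay=DELAY_S, tau=DISPERSION_S, alpha=ALPHA):
    """Unnormalized gamma response: ((t - delay)/tau)**alpha * exp(-(t - delay)/tau)."""
    if t <= delay:
        return 0.0                      # no response before the delay has elapsed
    x = (t - delay) / tau
    return (x ** alpha) * math.exp(-x)


# Setting the derivative to zero gives a peak at x = alpha, i.e. delay + alpha * tau.
peak_time_s = DELAY_S + ALPHA * DISPERSION_S
```

Under these assumptions the modeled response is zero until 2.25 s after stimulus onset and peaks at 4.75 s.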
Cortical Surface-based Analysis
Once the data from all trials were averaged, the mean and variance volumes were resampled onto the cortical surface for each subject. Each hemisphere was then morphed into a sphere in the following manner: first, each cortical hemisphere was morphed into a metrically optimal spherical surface. The pattern of cortical folds was then represented as a function on a unit sphere. Next, each individual subject's spherical representation was aligned with an averaged folding pattern constructed from a larger number of individuals aligned previously. This alignment was accomplished by maximizing the correlation between the individual and the group, while prohibiting changes in the surface topology and simultaneously penalizing excessive metric distortion (Fischl, 1999).
Region of Interest (ROI) Analysis
The ROIs chosen for this analysis were constrained both structurally and functionally. The structural constraint was based on a hand labeling of different brain structures for each subject. These structures were limited to the temporal–occipital and prefrontal regions that were expected a priori to show repetition-related response reduction, and that did indeed show significant (P < 0.01) response reduction in the present study, as revealed by the New versus Repeat contrast. A further criterion for inclusion was that these regions had to show repetition reduction with overlapping extents when compared across fMRI designs. For the left hemisphere, the structures meeting all of these requirements (see Fig. 2) included the lateral occipito-temporal sulcus, the inferior temporal gyrus, the fusiform gyrus, the collateral sulcus and the inferior frontal sulcus. For the right hemisphere, while robust repetition reduction was observed in the fusiform gyrus and the collateral sulcus for both fMRI designs, the extent of repetition reduction in these structures was anterior and non-overlapping in the event-related design relative to that observed in the block design. For this reason, only the left hemisphere structures were included in the selection of ROIs.
The additional functional constraint for the ROIs was based on a mask selecting only the subset of the voxels within each anatomical label that were activated in a positive direction by any component of the task, as revealed by the main effect (i.e. the contrast of all-conditions versus fixation-baseline), with a threshold of P < 0.01, corrected for multiple comparisons. All the voxels that met these constraints were then averaged, for each anatomical structure, allowing the contrasts of interest to be computed across the resulting time courses. The mean percentage of peak signal change was then calculated for each condition. For the block design, this was calculated across eight TRs (time points: 2.3–18.4 s). For the event-related design, this was calculated for the TR showing peak signal change (time point: 4–6 s).
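The two ROI summary measures can be sketched as follows. This assumes that percent signal change is expressed relative to a baseline value and that the event-related "peak" is the largest value in the time course; neither detail is spelled out in the text, and all function names are hypothetical.

```python
# Illustrative sketch only; definitions marked ASSUMPTION are not from the text.

def percent_signal_change(signal, baseline):
    """ASSUMPTION: change expressed as a percentage of the baseline value."""
    return 100.0 * (signal - baseline) / baseline


# Block design: the eight time points (2.3-18.4 s) averaged within a block.
BLOCK_TIMES_S = [round(2.3 * k, 1) for k in range(1, 9)]


def block_summary(timecourse_pct):
    """Mean percent signal change across the eight TRs of a block."""
    if len(timecourse_pct) != len(BLOCK_TIMES_S):
        raise ValueError("expected one value per TR in the block window")
    return sum(timecourse_pct) / len(timecourse_pct)


def event_related_summary(timecourse_pct):
    """ASSUMPTION: the peak TR is the time point with the largest value."""
    return max(timecourse_pct)
```

Averaging over the whole block suits a sustained response, whereas the event-related estimate is taken at a single peak because each trial produces a brief, transient response.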
Our main findings are that: (i) the magnitude of the repetition-related reduction in fMRI signal increased significantly with increased duration of prior exposure, peaking at ∼250 ms, but significantly decreased for longer durations of prior exposure; and (ii) prior visual exposure modulated both fMRI response reduction and behavioral priming in a highly similar manner.
Only trials associated with a correct response were included in subsequent analyses. Categorization performance for both fMRI designs was consistently high in every condition (<5% errors for each condition) except for First items shown for 40 ms (>70% errors). Regarding the conditions of primary interest (New and Repeat), a mixed-factor analysis of variance (ANOVA) conducted on the mean proportions of errors revealed that neither the main effects of prior exposure duration and experimental design nor the interaction term for these factors were significant (all Ps > 0.1).
Repetition-related Changes in fMRI Signal
As an overall test for repetition-related response reduction, we first compared fMRI signal change between the combined Repeat conditions (all prior-exposure durations) and the New condition. Note that all of these conditions had identical viewing conditions, with each object presented for 500 ms, and differed from each other only in the level of prior exposure. Several brain regions elicited lower activation for Repeat objects compared with New objects. Of these areas, we focused our ROI analysis only on those showing overlapping extents of repetition reduction across both experimental designs (Fig. 2; New > Repeat): the posterior part of the left inferior temporal gyrus (Talairach coordinates of greatest common activation, −58, −49, −4), the left lateral occipito-temporal sulcus (−45, −50, −12), the left fusiform gyrus (−39, −34, −17) and the left collateral sulcus (−31, −34, −9). Although robust repetition-related response reduction was observed in the right fusiform gyrus and the right collateral sulcus for both fMRI designs (see Fig. 2), the extent of the response reduction produced using the event-related design was anterior to, and non-overlapping with, that produced using the block design (fusiform: event-related, 36, −25, −16; block design, 27, −45, −13; collateral sulcus: event-related, 31, −28, −14; block design, 26, −40, −8). In the frontal lobe, relatively reduced activation was found in both experimental designs in the left inferior frontal sulcus (−47, 35, 4).
Some brain regions showed higher activation in both experimental designs for Repeat objects relative to New objects (Fig. 2; Repeat > New). Such increases of the BOLD signal were detected in the right intraparietal sulcus (32, −52, 43) and in the precuneus (left: −10, −72, 40; right: 3, −56, 46), extending to the right parieto-occipital sulcus (36, −58, 25). Similar repetition enhancements in the same regions have also been found in priming studies during implicit and explicit tasks (Chao et al., 2002; Henson et al., 2002) and have been hypothesized to reflect recollection processes (Heun et al., 1999).
Effect of Prior Exposure Duration on fMRI Signal Reduction
To evaluate the impact of level of visual experience on the subsequent repetition-related response reduction within each ROI and for each fMRI design, we subtracted the percentage of fMRI signal change for each Repeat condition from the percentage of fMRI signal change elicited by the New condition (see Fig. 3). Maximal prior exposure-related reduction in fMRI signal was obtained for prior exposures of 250 ms. This was specifically indicated by two-tailed paired t-tests on the ROI data within each fMRI design (250 ms Repeat versus the other Repeat conditions) in the left inferior temporal gyrus, the left lateral occipito-temporal sulcus, the left fusiform gyrus, the left collateral sulcus and the left inferior frontal cortex (all Ps < 0.05). Longer prior exposure durations (350–1900 ms) not only failed to increase the magnitude of the repetition reduction, but actually resulted in a significantly smaller effect in all of these cortical regions. Consistent with the poor categorization performance for 40 ms First presentations, no fMRI reduction was detected for repeated objects with 40 ms prior exposure in the majority of the ROIs, except for the block design in the left fusiform gyrus (P < 0.05) and the left inferior temporal gyrus (P < 0.01).
On average, correct RTs for Repeat presentations were shorter than those for New presentations, for both the block (646 versus 696 ms) and event-related (717 versus 770 ms) designs. A mixed-factor ANOVA indicated that this behavioral priming effect by prior exposure was significant (P < 0.001) and did not vary across experimental design (P > 0.1). This comparison indicates robust behavioral priming with both experimental designs for conditions that were all presented for the same duration at the testing stage (i.e. 500 ms). Differences due to experimental design were limited to marginally faster RTs for the block design than the event-related design (P < 0.07).
Effect of Exposure Duration on Subsequent Priming
To evaluate the effect of prior exposure duration on the magnitude of behavioral priming, we calculated individual priming values by separately subtracting RTs obtained for each of the Repeat conditions from those for the New condition (Fig. 4). A mixed-factor ANOVA revealed a significant main effect of prior exposure on priming magnitude (P < 0.001) within each experimental design, which did not reliably differ across experimental design (P > 0.21). Because experimental design had no significant effect on the magnitude of priming with varying prior exposure, the data from each condition were averaged across the different versions of the experiment to simplify subsequent analyses.
Maximal priming occurred for repeated objects with prior exposures of 250 ms. Two-tailed paired t-tests indicated that the magnitude of priming for 250 ms of prior exposure was greater than that for 40 ms (P < 0.01), 150 ms (P = 0.05), 350 ms (P < 0.05), 500 ms (P < 0.01) and 1900 ms (P < 0.01) of prior exposure. As in the fMRI response reduction results, repeated objects with longer prior visual exposure (350–1900 ms) not only failed to show an increase in behavioral priming, but actually resulted in less priming compared with repeated objects with a prior exposure of 250 ms. Of additional interest is the fact that 40 ms of prior exposure was not sufficient to produce reliable behavioral priming (P > 0.1), just as it was not sufficient to produce reliable repetition-related reduction in the fMRI signal. Again, this may be attributable to the poor categorization performance on first exposure to items in this condition.
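The priming measure and the paired comparison can be sketched as follows. The two mean RT pairs are taken from the text; the function names are hypothetical, and the sketch returns only the t statistic and degrees of freedom, leaving the P value to a statistics package.

```python
import math
from statistics import mean, stdev


def priming_ms(rt_new, rt_repeat):
    """Priming is the RT for New objects minus the RT for Repeat objects."""
    return rt_new - rt_repeat


# Mean RTs reported in the text (ms): New versus combined Repeat conditions.
block_priming = priming_ms(696, 646)
event_related_priming = priming_ms(770, 717)


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-subject values a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1
```

In the analysis described above, `a` would hold each subject's priming value for 250 ms of prior exposure and `b` the value for one of the other exposure durations.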
That very brief exposure to objects (40 ms) did not produce subsequent behavioral priming, despite above-chance performance, is interesting in light of the fact that previous studies have shown that even shorter presentations can induce reliable priming (Bar and Biederman, 1998, 1999). The procedures used in such demonstrations of subliminal visual priming, however, differ in critical ways from those used in the present studies. For instance, in these subliminal priming studies, the presentation conditions (e.g. exposure duration, contrast, quality of masking) were optimized individually for each experimental object. It has been shown in visual psychophysics that such subliminal improvements can be obtained only when operating near the threshold of conscious perception (e.g. Tanaka and Sagi, 1998). Therefore, pre-adjustments of viewing parameters to increase the likelihood that perception will be below but near the threshold are crucial. No such preparations were made here. Furthermore, in those previous studies the task and paradigm were different (naming with a four-alternative forced-choice), and priming was measured by improvement in percent of correct responses rather than reaction times (RTs) (i.e. subjects had unlimited time to consider their response). Finally, the majority of the incorrect responses in the 40 ms condition here were ‘do not know,’ indicating that subjects responded correctly only when they were confident of their response. Therefore, any RT priming found in this condition would not have been considered subliminal, because RT was calculated using only correct trials.
We conducted an additional behavioral experiment with 20 additional subjects using randomly intermixed conditions to ensure that the novel and potentially important rise-and-fall pattern of priming results would replicate outside of the magnet. The time interval separating first and second presentations of each object in this study ranged between 2 and 120 s (2–60 intervening stimuli). Despite this additional difference, the results of this behavioral study precisely replicated the pattern of behavioral results from both versions of the fMRI study (Fig. 4). These results provide converging evidence that behavioral priming for repeated objects with 250 ms of prior exposure was greater than that for 40 ms (P < 0.01), 150 ms (P < 0.01), 350 ms (P < 0.01), 500 ms (P < 0.05) and 1900 ms (P < 0.01) of prior exposure. Repeated objects with longer prior visual exposure (350–1900 ms) resulted in less priming than repeated objects with a prior exposure of 250 ms.
Correlation between fMRI Signal Reduction and Behavioral Priming
The results reported above indicate very similar dynamics for repetition-related reduction in fMRI response and behavioral priming: both phenomena were maximal at a level of visual experience corresponding to 250 ms of previous stimulus exposure and then decreased for longer prior exposures. To quantify the link between response reduction and priming, we tested the correlation between the effects of exposure duration on the dynamics of both. Averaging the magnitude of repetition reduction and behavioral priming across all ROIs revealed common experience-related changes that showed reliable average correlations (block design: r = 0.41, P < 0.01; event-related design: r = 0.36, P < 0.05), suggesting a direct connection between the cortical and the behavioral phenomena.
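The correlation coefficient used here is Pearson's r, which can be computed as in the sketch below. How the paired observations were assembled (for example, one pair per subject and exposure condition) is not specified in the text, so the inputs are left abstract and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `x` would hold the fMRI repetition reduction values (New minus Repeat, averaged across ROIs) and `y` the matching behavioral priming values.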
Our results demonstrate that visual experience with an object has a highly similar influence on the dynamics of overall fMRI signal reduction and behavioral priming. Both were observed to be (i) relatively small for briefly presented stimuli that were hardly recognized; (ii) increase with level of prior visual exposure to be maximal for a duration of 250 ms; (iii) decrease in magnitude for prior exposures longer than 250 ms; and (iv) remain significant for at least 1900 ms of prior visual exposure. The data reported here reveal a novel and counter-intuitive property of both repetition reduction and behavioral priming. Specifically, for both phenomena, this is the first demonstration that a maximal effect is obtained only for a prior exposure of 250 ms, and that the magnitude of these effects is reduced for longer durations. While our primary focus concerns experience-related reductions in cortical response and the general effect of visual exposure on object representations in the cortex, the striking similarity of the dynamics of repetition reduction and behavioral priming resonates strongly with the hypothesis that these two phenomena are critically related.
The cortical regions showing repetition-related response reduction in our fMRI results include bilateral collateral sulcus and fusiform gyrus, left lateral occipito-temporal sulcus, inferior temporal gyrus and inferior frontal cortex. Each of these regions has previously been found to exhibit reduced activity for repeated objects when compared with that for novel objects (Buckner et al., 1998; Vuilleumier et al., 2002; Sayres and Grill-Spector, 2003; Maccotta and Buckner, 2004). Although we did not test for distinct processing contributions of different regions, the sensory processing function typically associated with relatively posterior occipital–temporal regions suggests that the specific reduction found there might reflect perceptual priming. Results of previous fMRI studies also implicate more anterior regions of temporal–occipital and inferior prefrontal cortices, especially in the left hemisphere, in object representations that generalize across different exemplars (Koutstaal et al., 2001; Simons et al., 2003) or viewpoints (Vuilleumier et al., 2002), that involve lexical/semantic information (Demb et al., 1995; Thompson-Schill et al., 1999; Koutstaal et al., 2001), or that concern task-specific (Wagner et al., 2000) and response-related (Dobbins et al., 2004) information associated with an object. We therefore take response reductions in these regions to reflect perceptually abstract and non-perceptual (e.g. conceptual, linguistic and response-related) components of priming (cf. Schacter et al., 2004). As shown in Figure 3, all of these regions produced similar ‘rise-and-fall’ patterns of exposure-related fMRI repetition reduction, suggesting that each of these areas either mediates, or is affected by, the processes involved in the response reduction. Thus, while the nature of the object-related information represented in these various cortical regions may differ, the processes that shape the representations found in each may be the same.
The different versions of our study yielded highly similar results. This suggests that our results are robust in the face of differences in the time interval separating the first and second presentations of each object, and differences in expectancies, strategies and contrast effects that the different experimental designs afford. In particular, the similarity across the event-related and blocked designs demonstrates that the ‘rise-and-fall’ pattern of results we obtained was not due to exposure-related differences in how subjects allocated their attention. Attentional confounds can be problematic for the results of block design experiments, because subjects typically know what condition to expect on every trial. However, there was no way for subjects in the event-related design to know what duration to expect for a forthcoming stimulus, because the presentation orders of the different conditions in these designs were intermixed. Thus, there was no way to allocate different levels of attention voluntarily when stimuli appeared for different exposure durations. The similarity in the results from the different designs therefore provides strong evidence, not only that the ‘rise-and-fall’ pattern of repetition reduction and behavioral priming replicates, but also that these effects are not an artifact of differences in the top-down allocation of attention.
Findings of repetition-related response reduction have been speculated to reflect a ‘sharpening’ of the cortical response (Desimone, 1996). This hypothesis regarding the functional significance of repetition reduction was later interpreted (Wiggs and Martin, 1998) to suggest that the reduced response is a manifestation of a selective representation, in which only key object features continue to be represented with repeated experience. These two proposals differ from each other in that one focuses on a ‘sharpened’, and presumably exhaustive, representation, whereas the other focuses on a selective, non-exhaustive representation. Although neither of these proposals would individually predict a pattern of exposure effects similar to that reported here, our findings support the coexistence of both mechanisms. As elaborated below, we suggest that these mechanisms operate separately from each other, and together create object representations that are both ‘sharpened’ and selective.
According to the present proposal, visual exposure to a certain object first recruits a sharpening process during which the initially broad cortical response becomes fine-tuned and maximally stimulus-specific. The cortical response to a visual input is initially driven by coarse information and global aspects of the image and, in that sense, is not optimal and therefore requires fine-tuning. Indeed, psychophysical experiments with stimuli ranging from simple gratings (DeValois and DeValois, 1988) to complex scenes (Schyns and Oliva, 1994) indicate that observers perceive global components considerably earlier than they perceive the stimulus-specific detail (Watt, 1987; Bar, 2003; Loftus and Harley, 2004). Recent neurophysiological studies (Brown and Xiang, 1998; Sugase et al., 1999; Tamura and Tanaka, 2001) support this idea by showing that activity in inferior temporal cortex is initially, at ∼130 ms from stimulus onset, broad and relatively less selective to the specific stimulus, representing only its global properties (e.g. general orientation and dimensions). Then, at ∼240 ms from stimulus onset, the representation becomes stimulus-specific, such that only those neurons that best represent the specific properties of the stimulus continue to respond (Tamura and Tanaka, 2001). Fine-tuning may also benefit from the attentional selectivity of neurons in inferior-temporal cortex, which follows a comparable timecourse (Chelazzi et al., 1998): while cells initially show a similar response, regardless of the relevance of a particular stimulus, this response becomes highly selective in accordance with attentional demands within 200 ms from stimulus onset. Taken together, these timecourses are especially compelling in their similarity to our findings that exposure effects peaked for objects previously presented for 250 ms, suggesting that maximal fMRI signal reduction coincides with the completion of fine-tuning.
The outcome of this fine-tuning process is an efficient but exhaustive representation of the stimulus. The representation is efficient in that each of the object's features is represented optimally, but is also redundant because it includes all of the features in the image. Based on the inverted U-shaped pattern of exposure effects we observed, we propose that a subsequent selection process eliminates this redundancy.
Given sufficient exposure to a specific object, this second process selects the key features from the fine-tuned, exhaustive representation of the object in a manner similar to that suggested previously (Wiggs and Martin, 1998). Subsequently, only the key features continue to be represented, while the neurons representing redundant features gradually respond less. Signals for guiding the selection of these key features may be projected back from the prefrontal cortex, which processes, among other things, semantic information about objects (Demb et al., 1995; Wagner et al., 2000), as well as from the amygdala, which analyzes emotionally relevant information (Hariri et al., 2002). For the present purpose, key features are defined as either diagnostic features that distinguish the specific object from other objects, features that are critical for the specific task at hand, features that remain invariant under various viewing conditions, features of outstanding interest, or odd, surprising and unexpected features. For example, while the shape of the legs of a certain chair may be considered a key property and will continue to be represented, maintaining details about all four of its similar-looking legs is not essential for an economical and reliable representation. Being selective about which information is represented may also serve to emphasize the unique features of a certain object and thus make it more recognizable, just as a caricature of a face, eliminating non-distinctive extraneous information, can be recognized more accurately than its detailed, veridical version (Rhodes et al., 1987). Thus, allocating neurons for representing redundant or non-essential features can be seen as a waste of resources (Lennie, 2003), and it is predicted that representations are formed to minimize such cortical commitment whenever optimization is possible.
The selection process that we describe is proposed to help shape object representations. However, the term selection has also been associated, in a different context, with a mechanism that operates in left inferior frontal cortex to select among multiple lexical/semantic representations that compete for access to further processes based on their relevance to task and stimulus demands (Thompson-Schill et al., 1999). Greater selection in this latter regard refers to the need to select an appropriate representation from many different representations. This between-representation process is therefore notably distinct from the within-representation process that we describe. Importantly, while selection between different semantic representations may occur primarily in left inferior frontal cortex, the shaping of object-related representations by the selection of which properties should continue to be represented may occur throughout various cortical regions involved in object priming and recognition.
The exposure-related fine-tuning and selection processes described here may overlap in time, but they are completed consecutively. Fine-tuning is guided by the arrival of gradually increasing details about the visual stimulus, and is therefore an inherently bottom-up process that is completed relatively early (e.g. our results suggest by ∼250 ms). The selection process, on the other hand, depends on high-level information and semantic knowledge, and is therefore predicted to be guided by top-down mechanisms and be completed relatively later (i.e. 350 ms and beyond based on our data). While future research is needed to address whether the precise time course of these processes depends on task demands or the processing complexity of individual objects, the present findings nevertheless suggest that the combined outcome of these two processes is an efficient and selective long-term representation.
How does this two-process model account for the parabolic pattern of our results? A mask presented after a picture interrupts further visual processing (Rolls and Tovee, 1994; Kovács et al., 1995). If we assume that priming captures the most developed representation up to this interruption, then measures of priming can be considered to reflect the latest outcome of the processes that shape visual representations (Bar, 2001). When a mask interrupts processing at 250 ms, a comprehensive fine-tuning process has been completed, but the selection process has not yet developed. The resulting primed representation is therefore based on a fine-tuned representation of all the features. Accordingly, the next time subjects see that specific object the activation of this complete and fine-tuned object representation elicits a minimal cortical response. In other words, presenting the image first for 250 ms results in maximal repetition reduction relative to novel controls because all of the object's features have been stored in a fine-tuned manner.
When, on the other hand, the mask interrupts visual processing at 350 ms or longer, after the subset of the relevant key features has been selected, the resulting stored representation is partial because it only includes key features. In other words, key features are primed and represented in their fine-tuned form, whereas ‘non-key’ features are no longer part of the object representation and are therefore primed relatively weakly, if at all. When a subject sees the specific object again, the primed features elicit a minimal response but the ‘non-key’ features elicit a response comparable to that of a previously unseen feature. This combination of activating primed and less-primed features results in a cortical reduction and RT improvement lower than the maximum, but higher than that obtained for a novel object.
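The logic of this two-process account can be made concrete with a toy simulation. The sketch below is only an illustration under strong simplifying assumptions and is not a fitted model: the linear ramps, the choice to let selection begin exactly when fine-tuning ends, the value of `KEY_FEATURE_FRACTION` and the function names are all hypothetical. Only the two time constants (∼250 ms and ∼350 ms) are taken from the text. The repetition effect is assumed to be proportional to the share of the object's features that are stored in fine-tuned form when the mask interrupts processing.

```python
FINE_TUNING_DONE_MS = 250.0  # from the text: fine-tuning is complete by ~250 ms
SELECTION_DONE_MS = 350.0    # from the text: selection is complete by ~350 ms
KEY_FEATURE_FRACTION = 0.6   # hypothetical share of features kept after selection


def fine_tuning(exposure_ms):
    """Degree of fine-tuning in [0, 1]; the linear ramp is an assumption."""
    return min(max(exposure_ms / FINE_TUNING_DONE_MS, 0.0), 1.0)


def retained_fraction(exposure_ms):
    """Share of features still represented when the mask interrupts processing."""
    if exposure_ms <= FINE_TUNING_DONE_MS:
        return 1.0  # selection has not yet removed any feature
    if exposure_ms >= SELECTION_DONE_MS:
        return KEY_FEATURE_FRACTION  # only key features remain
    progress = (exposure_ms - FINE_TUNING_DONE_MS) / (
        SELECTION_DONE_MS - FINE_TUNING_DONE_MS
    )
    return 1.0 - progress * (1.0 - KEY_FEATURE_FRACTION)


def repetition_effect(exposure_ms):
    """Relative repetition reduction or priming (1.0 = maximal effect)."""
    return fine_tuning(exposure_ms) * retained_fraction(exposure_ms)


# Illustrative exposure durations in ms (not the full set used in the study).
for exposure_ms in (50, 150, 250, 350, 500, 1900):
    print(exposure_ms, round(repetition_effect(exposure_ms), 2))
```

Under these assumptions the simulated effect rises until 250 ms, falls as selection removes the non-key features, and then stays above zero at long exposures, which mirrors the qualitative ‘rise-and-fall’ pattern described above.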
We have described the operations of the fine-tuning and selection mechanisms primarily in terms of the formation of perceptual representations of objects. However, the proposed fine-tuning and selective processes are also presumed to operate to shape other types of object-related representations, such as those involved in the conceptual, linguistic and response-related components of priming (for a review of priming specificity, see Schacter et al., 2004). Support for this comes from the fact that we obtained the same ‘rise-and-fall’ pattern of exposure-related response reductions in several cortical regions, including anterior temporal and inferior frontal regions that have been implicated in non-perceptual operations. This possibility underscores the potential generality and importance of our proposal, and emphasizes the need for future research to establish the extent to which the fine-tuning and selective processes might reflect the general operating characteristics of neural ensembles in shaping different types of cortical representations.
While our primary focus in this investigation concerns experience-related reductions in cortical response, our behavioral results also merit consideration. Indeed, despite decades of research interest in the behavioral manifestations of priming, evidence regarding the effects of initial exposure duration on subsequent recognition performance for repeated objects is lacking. As a result, the ‘rise-and-fall’ pattern of exposure-related behavioral priming effects that we obtained is itself a novel finding. Despite the lack of comparable prior object recognition studies, several studies using visually abstract or linguistic stimuli have manipulated prime exposure duration and are therefore relevant to the present results. However, many of these studies used only very brief prime exposure durations (<100 ms, e.g. Frost et al., 2003), relatively long exposures (>1000 ms, e.g. Jacoby and Dallas, 1981; Neill et al., 1990; Musen, 1991) or only two different exposure durations (e.g. Hirshman and Mulligan, 1991, Experiment 3; Versace, 1998; Versace and Nevers, 2003), and none used reaction times to measure priming. Unfortunately, the absence of multiple prime durations and/or lack of a similar range of durations as used in our study impedes proper comparison of these results to our general ‘rise-and-fall’ pattern of behavioral priming effects.
Of the remaining studies that used relatively more comparable procedures, three provided results that are nominally consistent with our behavioral findings. Two studies reported by Crabb and Dark (1999; 2003, Experiment 2), for instance, together show a similar ‘rise-and-fall’ pattern of priming effects on identification accuracy for words. In their first study (Crabb and Dark, 1999), repeated target words that were actively attended to in prime displays were correctly identified more often than new, unprimed items. Importantly, for these items there was a priming-related ‘rise’ in the proportion of identified repeated items, relative to the proportion of identified new items, when the prime exposure duration increased from 100 ms (0.095) to 200 ms (0.126), and there was a ‘fall’ in priming magnitude when the duration increased further to 300 ms (0.100). Another of their studies that used longer prime exposure durations (Crabb and Dark, 2003, Experiment 2) showed additional evidence for the ‘fall’ of priming, with nominally greater priming for words that were initially presented individually for 200 ms than for those presented for 600 or 1000 ms. Although the statistical reliability of these prior trends was not established, the general similarity between these results and ours supports the potential generality of our findings. Even more compelling in this regard are the results of von Hippel and Hawkins (1994, Experiment 1). Prime words in their study were presented under perceptual study conditions for 50, 100, 200, 500 or 1000 ms. The proportion of these prime words that were subsequently used to complete word fragments (e.g. ma__l_ → marble) showed a ‘rise’ in priming, with steady increases with prime exposure from 50 ms to the maximal priming effect at 200 ms. The ‘fall’ of priming was also clearly evident in these results, with consecutive decreases in the proportion of primed fragment completions following 200 ms of prime exposure to that following 500 and 1000 ms of prime exposure, respectively. Furthermore, the quadratic trend defining this ‘rise-and-fall’ pattern of priming effects was found to be statistically reliable. Although this pattern was not as clear in other conditions in von Hippel and Hawkins' (1994) study, such as when subjects were required to type the name of previously primed words that were briefly flashed again for 33 ms, our survey of the behavioral priming literature nevertheless suggests that our proposal is further supported by previous reports.
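As a purely descriptive illustration of what a quadratic ‘rise-and-fall’ trend looks like, the three priming magnitudes quoted above for Crabb and Dark (1999) can be passed through a parabola. The sketch assumes that the parenthesized values (0.095, 0.126 and 0.100) are the priming magnitudes at 100, 200 and 300 ms; the function name `parabola_through` is hypothetical. Because any three points are fit exactly by a parabola, this calculation says nothing about statistical reliability and is not an analysis reported in any of the cited studies.

```python
def parabola_through(p1, p2, p3):
    """Coefficients (a, b, c) of y = a*x**2 + b*x + c through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    if denom == 0:
        raise ValueError("the three x values must be distinct")
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = (
        x2 * x3 * (x2 - x3) * y1
        + x3 * x1 * (x3 - x1) * y2
        + x1 * x2 * (x1 - x2) * y3
    ) / denom
    return a, b, c


# (prime exposure in ms, priming magnitude) as quoted in the text above.
points = [(100.0, 0.095), (200.0, 0.126), (300.0, 0.100)]

a, b, c = parabola_through(*points)
peak_ms = -b / (2.0 * a)  # vertex of the parabola

print(f"quadratic coefficient a = {a:.3e} (negative means inverted U)")
print(f"descriptive peak at about {peak_ms:.0f} ms")
```

The negative quadratic coefficient corresponds to the inverted U shape, and the vertex falls slightly above 200 ms (about 204 ms), close to the exposure duration at which priming was nominally maximal in that study.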
We have interpreted the ‘rise-and-fall’ patterns of repetition-related response reduction and behavioral priming that we obtained as reflections of how cortical representations are shaped with increasing visual experience. Our account suggests that a ‘rise-and-fall’ pattern might be expected in any situation where a repeated stimulus and task-related demands are highly similar across both presentations, where a fine-tuned response to redundant and otherwise irrelevant features and information provides a greater overlap between an object's cortical representation and the corresponding visual input, and where selection of only ‘key’ features for continued representation reduces this overlap. Importantly, our account does not suggest that behavioral performance inevitably decreases with sufficient visual exposure. Indeed, the effect of the proposed selection process might often make object identification more efficient; that is, retaining only the most distinctive, relevant features and information about an object will generally make it easier to distinguish from other objects. Thus, eliminating the influence of redundant, less relevant information can aid identification. However, in our task, this normally redundant and less relevant information is in fact helpful, as it provides a greater overlap between an object's cortical representation and the corresponding visual input. Maximal priming should therefore be observed in such situations whenever the object is most accurately and exhaustively represented (i.e. following maximal fine-tuning and minimal feature selection).
The reliable correlation and striking similarity in the ‘rise-and-fall’ pattern of repetition-related response reduction and behavioral priming we observed suggest that these phenomena are critically related. If the evolution of an object's cortical representation is related to recognition ability, then at least some level of representational fine-tuning may be required before recognition of an object is possible. Consequently, if the representation activated in a second encounter is fine-tuned, RT is shorter than that observed for a novel stimulus because less time is required for recognition. Our proposal that fine-tuning is completed by 250 ms is supported in this regard by the fact that RTs were indeed fastest for objects shown previously for 250 ms, in addition to priming being maximal in this condition. The link to behavioral RT improvement is bolstered by the finding that the cortical response to visual objects is not only reduced with repeated exposure, but also peaks earlier (Noguchi et al., 2004). Similarly, in a study of the cell population in IT, activity there initially distinguished between novel and familiar objects ∼100 ms after the onset of their response (∼180 ms from stimulus onset; Li et al., 1993). The 100 ms delay of this diagnostic activity, however, was reduced to only 10 ms following additional presentations. This shortening of response onset to a familiar stimulus may therefore reflect the efficiency involved in behavioral RT priming.
Our results demonstrate that visual experience with an object has a highly similar influence on two important phenomena: the relatively reduced cortical response to repeated stimuli and the corresponding behavioral priming. While future research is required to demonstrate that our findings generalize to different experimental designs and cognitive tasks, these findings converge to improve our understanding of the mechanisms mediating both. A more important result observed here is our novel finding of the ‘rise-and-fall’ pattern, in which maximal repetition-related cortical and behavioral effects were both obtained at a specific level of visual experience, analogous to prior exposure of 250 ms, and were reduced at longer exposure durations. Consequently, we suggest a model in which experience with a specific visual stimulus recruits two separate mechanisms that together create cortical representations that are both efficient and selective.
We thank R. Henson, A. Martin, C. Tyler and N. Tzourio-Mazoyer for helpful comments and stimulating discussions; M. Vangel and D. Greve for statistical advice; B. Quinn and the Imaging Core at the Martinos Center at MGH for technical assistance; and H. Linz for assistance with data collection. Supported by the James S. McDonnell Foundation 21st Century Science Research Award in Bridging Brain, Mind and Behavior #21002039 (to M.B.), NINDS R01 NS44319 (to M.B.) and the MIND Institute.