One ubiquitous finding in functional magnetic resonance imaging studies is that repeated stimuli elicit lower responses than novel stimuli. In apparent contradiction, some studies have reported the exact opposite effect—greater responses to repeated than novel stimuli—in many of the same brain regions. Interestingly, these latter enhancement effects are typically obtained when stimuli have been degraded. To explore this observation, the present study examines the degree to which visual quality mediates repetition effects in a stimulus-selective ventral visual area. Subjects were presented with grayscale photographs of scenes that were either near or substantially above visual threshold, as determined by calibrating image contrast to behavioral performance. The presentation of 2 identical high-contrast scenes elicited lower blood oxygen level–dependent (BOLD) responses than the presentation of 2 different high-contrast scenes (repetition attenuation). Conversely, the presentation of 2 identical low-contrast scenes elicited greater BOLD responses than the presentation of 2 different low-contrast scenes (repetition enhancement). Neurophysiological studies suggest that repetition attenuation in ventral visual areas may reflect the reactivation of perceptual representations that have become sparse and selective as a result of prior experience, whereas repetition enhancement may reflect spared access to existing representations by severely degraded input.
The question of how the brain represents information is of central importance to cognitive neuroscience. One general approach for addressing this question involves studying how neural activity and behavior change as a function of experience. Single-unit recordings have revealed that the firing rates of many neurons in monkey inferior temporal (IT) cortex (e.g., Baylis and Rolls 1987; Brown and others 1987; Li and others 1993; Miller and Desimone 1994) and prefrontal (PF) cortex (e.g., Miller and others 1996; Rainer and Miller 2000) are markedly reduced upon repetition of a stimulus. When combined with the sharpened tuning curves of the remaining responsive neurons, this “repetition suppression” results in sparser and more selective representations (e.g., Desimone 1996; Ringo 1996; Brown and Xiang 1998; Wiggs and Martin 1998). Repetition suppression may be fundamentally related to an analogous finding—repetition attenuation—in functional magnetic resonance imaging (fMRI) studies, where repeated stimuli elicit reduced blood oxygen level–dependent (BOLD) responses in occipitotemporal and PF regions (e.g., Demb and others 1995; Buckner and others 1998; Schacter and Buckner 1998; Wiggs and Martin 1998; Grill-Spector and Malach 2001; Henson 2003; Zago and others 2005). Alternatively, “repetition attenuation” may reflect dampened activity over an entire population of responsive neurons or shorter durations of neural activity as a function of repetition (for a review, see Grill-Spector and others 2006).
The relationship between repetition suppression and the process of representation is not fully understood. One possibility is that repetition suppression is important for novelty detection (Brown and Xiang 1998). This novelty detection would have to be stimulus specific (Li and others 1993) rather than general (Wilson and Rolls 1990) because suppression can occur after normal responses to several interleaved stimuli (Rolls and others 1989; Riches and others 1991; Li and others 1993; Miller and Desimone 1994). Another related possibility is that repetition suppression reflects the pruning of neurons that do not code well for perceptual features: initial representations of novel stimuli are broadly selective (representing global features; Sugase and others 1999), but upon repetition, neurons that are not tuned to behaviorally relevant stimulus attributes are winnowed out of the representation (e.g., Li and others 1993; Tamura and Tanaka 2001). Conversely, neurons that are tuned to stimulus attributes can become more responsive to those attributes with repetition (Logothetis and others 1995; Kobatake and others 1998; Rainer and Miller 2000); the fact that this subset of neurons consistently responds above a certain threshold ensures that stimuli will be processed similarly over multiple exposures (perceptual constancy; Brown and Xiang 1998).
The decrease in the number of firing neurons, together with the increase in their selectivity, is known as “sharpening” (Desimone 1996), which may provide a mechanism, at the level of neuronal populations, for the decreased BOLD response that repeated stimuli can elicit in fMRI (Schacter and Buckner 1998; Wiggs and Martin 1998; Zago and others 2005). One problem with this account, however, is that repeated stimuli sometimes elicit greater BOLD responses than do novel stimuli (repetition enhancement) in the same regions that produce attenuation (e.g., Dolan and others 1997; Grill-Spector and others 2000; James and Gauthier 2005; Kourtzi and others 2005). For example, Dolan and others (1997) found greater BOLD responses in the fusiform gyrus for repeated versus novel presentations of binarized faces. Although there exist theories about the cause of “repetition enhancement” in fMRI (see Discussion), the neurophysiological basis of such enhancement has received little attention.
In fact, reports of repetition-induced increases in neural firing rate are rare; most studies note that few, if any, of the neurons in their sample show response enhancement (e.g., Riches and others 1991; Li and others 1993). However, there are 2 studies that have observed greater neural selectivity to repeated than to novel stimuli. In one study (Rainer and Miller 2000), monkeys were presented with novel and familiar images of objects that had been degraded parametrically with phase scrambling. When images were undegraded, more neurons in PF cortex were selective to novel stimuli than to familiar stimuli. However, at moderate levels of degradation, the reverse effect was observed: more PF neurons were selective to familiar stimuli than to novel stimuli. This repetition enhancement was manifest in the continued selectivity of neurons to familiar stimuli relative to the sharp decline in the selectivity of neurons to novel stimuli upon degradation (invariance). The second study (Rainer and others 2004) employed a very similar design but recorded from the extrastriate region V4. When undegraded, neural responses to novel and familiar stimuli were equivalent in terms of average firing rate and selectivity. However, at moderate levels of degradation, familiar stimuli elicited stronger and more selective responses than did novel stimuli. Repetition enhancement in this study resulted from greater and more selective responses to degraded familiar stimuli than to undegraded familiar stimuli (amplification), whereas responses to novel stimuli declined slightly with degradation.
Existing findings of BOLD enhancement could be explained by “invariance” (spared access to trained low-visibility stimuli; cf., Grill-Spector and others 2000) or “amplification” (the boosting of task-relevant features of low-visibility stimuli during a difficult discrimination task; cf., Kourtzi and others 2005). In the current study, we directly investigate whether either of these effects can account for BOLD repetition enhancement by manipulating the visibility of repeated and novel stimuli. According to sharpening, the repetition of high-visibility (undegraded) stimuli should result in attenuation of the BOLD response. Conversely, the repetition of low-visibility (degraded) stimuli may result in enhanced BOLD responses because of invariance and/or amplification. It should be noted that invariance in Rainer and Miller (2000) refers to the maintained selectivity of neurons after degradation. Because they did not analyze average firing rates, it is unclear whether this maintained selectivity would result in stronger population responses as measured by fMRI (although such a link was present in Rainer and others 2004).
In the current study, subjects were presented with grayscale photographs of initially novel real-world scenes during one fMRI session. Brain analyses were focused on a scene-selective region of visual cortex called the parahippocampal place area (PPA) (Epstein and Kanwisher 1998). The functional properties of the PPA are relatively well known: it is maximally responsive to local layouts (Epstein and Kanwisher 1998), sensitive to repetition (Yi and others 2004; Yi and Chun 2005; Turk-Browne and others 2006), and modulated by manipulations of visual quality, such as blurring (Yi and Chun 2005) and contrast (Yi and others 2006). To manipulate visibility, image contrast was reduced such that accuracy in deciding whether a novel scene occurred indoors or outdoors was 90% for high-visibility scenes and 70% for low-visibility scenes. All scenes were presented for 50 ms and masked after a 50-ms interstimulus interval (ISI). Every trial consisted of 2 high- or low-visibility scenes. The second scene in each trial was 1) identical to the first (repeated), 2) a novel scene that required the same indoor/outdoor response (novelSR), or 3) a novel scene that required a different response (novelDR). The comparison of repeated with novelSR trials allowed us to study stimulus repetition divorced from response repetition.
Materials and Methods
Twenty normal subjects (14 females, 3 left handed, mean age: 22.4 years, range: 18–38 years) volunteered in exchange for monetary compensation. Data from 2 additional subjects were unusable because of procedural difficulties during scanning. All subjects reported normal or corrected-to-normal vision. Informed consent was obtained from all subjects, and the study protocol was approved by the Human Investigation Committee of the School of Medicine and the Human Subjects Committee of the Faculty of Arts and Sciences at Yale University.
Each subject viewed a total of 620 grayscale photographs that were randomly selected from a larger pool of indoor and outdoor scenes. Indoor scenes depicted several types of rooms in buildings, for example, kitchens, bedrooms, bathrooms, and offices. Outdoor scenes depicted the outsides of buildings, city vistas, and natural landscapes. Each picture appeared in the center of a medium gray background, subtending 13 × 13 degrees of visual angle. A green fixation cross or outlined circle, subtending 0.5 × 0.5 degrees, was superimposed in the center of the photographs. Scenes were masked with a black and white checkerboard of the same size.
Image contrast was manipulated using the Parameter Estimation by Sequential Testing algorithm (Taylor and Creelman 1967), which ran continuously throughout the experiment to ensure that practice effects would not result in higher than desired accuracy. Separate adjustment factors were estimated for high- and low-contrast stimuli based on the accuracy of responses to the first scene in each trial. Thresholds were set to 90% for high visibility and 70% for low visibility. Subjects completed a practice block to allow the algorithm to converge. Contrast was adjusted with the Image Processing Toolbox in Matlab (The MathWorks, Natick, MA) by narrowing the range of intensity values around the mean intensity of each scene. On average, contrast was reduced by 45% for high-visibility scenes and 88% for low-visibility scenes.
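The contrast manipulation described above (narrowing the range of intensity values around each scene's mean) can be illustrated with a minimal sketch. It uses NumPy rather than the Matlab Image Processing Toolbox, and the function name `reduce_contrast`, the toy image, and the reading of "reduced by 45%" as a scale factor of 0.55 on deviations from the mean are assumptions for illustration, not the authors' code.

```python
import numpy as np

def reduce_contrast(image, reduction):
    """Shrink pixel intensities toward the image mean.

    `reduction` is the assumed fraction of contrast removed:
    0.0 leaves the image unchanged, 1.0 yields a uniform mean-gray image.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    image = np.asarray(image, dtype=float)
    mean = image.mean()
    # Scale each pixel's deviation from the mean; the mean itself is preserved.
    return mean + (1.0 - reduction) * (image - mean)

# Hypothetical 2 x 2 "scene" with intensities in [0, 255] (not study data).
scene = np.array([[0.0, 255.0], [64.0, 191.0]])
high_visibility = reduce_contrast(scene, 0.45)  # average reported for high visibility
low_visibility = reduce_contrast(scene, 0.88)   # average reported for low visibility
```

Under this reading, the intensity range of the toy image shrinks to 55% and 12% of its original width, respectively, while mean luminance is unchanged.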
The stimulus protocol contained 2 factors: 1) visibility and 2) repetition. There were 4 levels of visibility: high visibility on the first and second scene (high–high), high on the first and low on the second (high–low), low on the first and high on the second (low–high), and low on both (low–low). The “high–high” and “low–low” conditions are the basis of our primary comparisons. The “high–low” and “low–high” mixed visibility conditions primarily served a methodological role (see Results), preventing subjects from anticipating the visibility of the second scene. As described above, the repetition factor had 3 levels: repeated, novelSR, and novelDR. The 2 factors were combined in a full factorial 4 by 3 design; each of the 12 conditions had 30 observations across the 5 functional image acquisition runs.
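The factorial structure can be enumerated in a few lines, which makes the trial counts explicit. This is only a sketch: the condition labels are taken from the text, but the even split of observations across runs is an assumption, since the text states only the totals.

```python
from itertools import product

VISIBILITY = ("high-high", "high-low", "low-high", "low-low")
REPETITION = ("repeated", "novelSR", "novelDR")
OBSERVATIONS_PER_CONDITION = 30
RUNS = 5

# Full factorial crossing of the two factors: 4 x 3 = 12 conditions.
conditions = list(product(VISIBILITY, REPETITION))

total_trials = len(conditions) * OBSERVATIONS_PER_CONDITION

# Assumption: observations were divided evenly across runs.
trials_per_run = total_trials // RUNS
per_condition_per_run = OBSERVATIONS_PER_CONDITION // RUNS

print(len(conditions), total_trials, trials_per_run, per_condition_per_run)
```

With these numbers the design yields 360 experimental trials in total, excluding the filler trial at the start of each run.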
Each subject completed 24 practice trials outside of the scanner and was instructed to respond as quickly and accurately as possible to all scenes. Inside the scanner, each run began with 12 s of stabilization time and one filler trial. The trial sequence is depicted in Figure 1. A trial began with a fixation cross for 200 ms, followed by the onset of the first scene, which lasted for 50 ms. The short exposure of the scenes ensured that eye movements during the presentation would not occur. After a 50-ms ISI, the mask was presented for 50 ms; the ISI reduced any detrimental effects of the mask on response accuracy (Bacon-Mace and others 2005). Subjects had a 2000-ms window to categorize the scene as indoors or outdoors by pressing the left or right button. The second scene in each trial was presented the same way, starting 3 s after the onset of the first scene. Trials were spaced by 0, 2, or 4 s to reduce serial correlations between conditions (Burock and others 1998). We simulated numerous trial sequences prior to scanning in SPM99 (Wellcome Department of Cognitive Neurology, Institute of Neurology, London, UK) and chose sequences in which correlations between conditions were low (r < 0.15). A Latin square was used to ensure that each trial order appeared an equal number of times at each point during the experiment across subjects.
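The within-trial timing can be laid out as a simple event list. The sketch below expresses all times in milliseconds relative to the onset of the first scene and uses only the durations given in the text. Whether a fixation cross also preceded the second scene is not stated, so none is modeled, and the response window is left out for the same reason. Function and label names are hypothetical.

```python
FIXATION_MS = 200
SCENE_MS = 50
ISI_MS = 50
MASK_MS = 50
SECOND_SCENE_SOA_MS = 3000  # second scene starts 3 s after first scene onset

def scene_events(onset_ms, suffix):
    """Return (label, onset, offset) tuples for one scene, its ISI, and its mask."""
    scene_off = onset_ms + SCENE_MS
    mask_on = scene_off + ISI_MS
    return [
        (f"scene{suffix}", onset_ms, scene_off),
        (f"isi{suffix}", scene_off, mask_on),
        (f"mask{suffix}", mask_on, mask_on + MASK_MS),
    ]

def trial_events():
    """Event list for one trial, timed from the onset of the first scene."""
    events = [("fixation", -FIXATION_MS, 0)]
    events += scene_events(0, "1")
    # Assumption: no separate fixation period is modeled before scene 2.
    events += scene_events(SECOND_SCENE_SOA_MS, "2")
    return events

for label, onset, offset in trial_events():
    print(f"{label:>8}: {onset:5d} to {offset:5d} ms")
```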
After completing 5 experimental runs, the PPA was localized by alternating blocks of faces and novel scenes; subjects performed male/female and indoor/outdoor tasks. The localizer contained 7 blocks of each task lasting for 30 s each; 12 scenes/faces were presented in each block for 200 ms (to prevent eye movements). The fixation dot was placed directly between the eyes for faces to further reduce the likelihood of saccades and at the corresponding location for scenes (just for the localizer).
All scans took place in a Siemens Trio 3-T scanner with a standard birdcage head coil. Functional images were acquired with a T2*-weighted gradient-echo sequence (repetition time = 2000 ms, echo time = 25 ms, flip angle = 80°, 7 × 3.75 × 3.75 mm resolution, no gap); each volume contained 19 axial slices parallel to the anterior commissure–posterior commissure line covering the entire brain. The main experiment was conducted in the first 5 functional scans, each acquiring 310 volumes. The final functional scan, the PPA localizer, acquired 220 volumes. Visual stimuli were presented by a liquid-crystal display projector on a rear-projection screen, seen through an angled mirror attached to the head coil. A magnetic resonance imaging–compatible button box was used to collect responses.
Preprocessing and statistical analyses were conducted using SPM2 (Wellcome Department of Cognitive Neurology, Institute of Neurology, London, UK). After the first 6 volumes of each functional scan were discarded, each volume was slice-time corrected, motion corrected (using the INRIAlign toolbox; Alexis Roche, EPIDAURE Group, INRIA Sophia Antipolis, France), normalized (to standard MNI space; Montreal Neurological Institute, Montreal, Canada), resampled to 3-mm isotropic voxels, and spatially smoothed (with an 8-mm full-width half-maximum Gaussian kernel). Two subjects were excluded from further analysis because of excessive head movement. The signal time course in each voxel was high-pass frequency filtered (128 s period cutoff) and corrected for autocorrelation between scans.
Individual PPA regions of interest (ROIs) were functionally localized bilaterally based on the independent localizer scan. Blocks of faces and scenes were separately modeled with canonical hemodynamic response functions (HRFs) used as regressors in a multiple regression analysis. The 6 movement parameters from motion correction were entered as covariates of no interest. A linear contrast of the scene block versus the face block created a statistical parametric map of t-values with a strict threshold (P < 0.001, corrected for familywise error rate, cluster threshold of 5 voxels). The maximally scene-selective voxel in ventral visual areas, including the parahippocampal gyrus and the collateral sulcus, was used as the center of a spherical ROI (4 mm radius) in each hemisphere (Epstein and others 2003; Yi and others 2004; Yi and Chun 2005; Turk-Browne and others 2006). A typical subject's localizer results and ROIs are presented in Figure 3A.
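To give a sense of how small such an ROI is, the sketch below selects the voxels whose centers fall within 4 mm of a peak voxel on the 3-mm isotropic grid described above. It is a geometric illustration only: the function name and peak indices are hypothetical, and toolboxes such as MarsBaR may define sphere membership differently (for example, by partial voxel overlap).

```python
import math
from itertools import product

def sphere_voxels(center, radius_mm, voxel_mm):
    """Voxel indices whose centers lie within radius_mm of the center voxel."""
    reach = int(math.floor(radius_mm / voxel_mm))
    voxels = []
    for di, dj, dk in product(range(-reach, reach + 1), repeat=3):
        # Euclidean distance between voxel centers, in millimeters.
        distance = voxel_mm * math.sqrt(di ** 2 + dj ** 2 + dk ** 2)
        if distance <= radius_mm:
            voxels.append((center[0] + di, center[1] + dj, center[2] + dk))
    return voxels

# Hypothetical peak voxel indices (not coordinates from the study).
roi = sphere_voxels(center=(20, 30, 15), radius_mm=4.0, voxel_mm=3.0)
print(len(roi))
```

Under this center-distance criterion the ROI contains the peak voxel and its 6 face-adjacent neighbors; diagonal neighbors lie about 4.24 mm away and are excluded.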
For each subject, the MarsBar toolbox (Brett and others 2002) was then used to examine results from the main experiment in both the left and right PPAs. Only trials with 2 correct responses were included in the experimental conditions; trials containing one or more errors were counted as fillers. The 12 experimental conditions and the filler condition were modeled using a canonical HRF with time derivative in a multiple regression analysis. The 6 movement parameters from motion correction were also included as covariates of no interest. Whole-brain analyses were conducted outside of the PPA by modeling our conditions in SPM2 using a canonical HRF with time derivative in a multiple regression analysis, along with regressors for the 6 movement parameters. All coordinates are in Talairach space (Talairach and Tournoux 1988).
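As a rough illustration of what modeling a condition with a canonical HRF and time derivative involves, the sketch below builds a double-gamma response function and convolves it with a train of event onsets. The shape parameters (peak near 6 s, undershoot near 16 s, 1:6 ratio) are the values commonly cited as SPM defaults and are treated here as assumptions; the derivative is a numerical gradient used as a stand-in for SPM's own basis function; and the onsets are invented.

```python
import math
import numpy as np

def canonical_hrf(t, peak_delay=6.0, undershoot_delay=16.0, ratio=6.0):
    """Double-gamma HRF with unit dispersion, sampled at times t (seconds)."""
    t = np.asarray(t, dtype=float)

    def gamma_pdf(x, shape):
        # Gamma density with unit scale; negative times are clipped to zero.
        x = np.clip(x, 0.0, None)
        return x ** (shape - 1.0) * np.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_delay) - gamma_pdf(t, undershoot_delay) / ratio

DT = 0.1  # sampling step in seconds
t = np.arange(0.0, 32.0, DT)
hrf = canonical_hrf(t)
hrf_derivative = np.gradient(hrf, DT)  # numerical temporal derivative

# Hypothetical event onsets (seconds) for one condition in a 100-s window.
onsets_s = [10.0, 40.0, 70.0]
n_samples = int(round(100.0 / DT))
sticks = np.zeros(n_samples)
for onset in onsets_s:
    sticks[int(round(onset / DT))] = 1.0

# Each condition contributes two regressors: HRF and derivative.
regressor = np.convolve(sticks, hrf)[:n_samples]
regressor_derivative = np.convolve(sticks, hrf_derivative)[:n_samples]

peak_time_s = t[np.argmax(hrf)]
```

The derivative regressor lets the fitted response shift slightly earlier or later in time, which is useful when response latency may differ between conditions.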
Results
Comparing repeated to novelSR trials allowed us to explore the effect of stimulus repetition (perceptual priming) in isolation. Most importantly, task responses were held constant because both types of trials required the same response to both scenes.
Behavioral Perceptual Priming
To study the effect of perceptual priming on response latency, the response time to the second scene was analyzed as a function of whether it was repeated or novelSR (Fig. 2A). Only correct responses were included in response time analyses. The contributions of visibility and stimulus repetition to response times were analyzed with a 2 (visibility: high–high, low–low) by 2 (stimulus repetition: repeated, novelSR) repeated measures analysis of variance (ANOVA). This analysis revealed main effects of visibility (F1,17 = 39.13, P < 0.0001), with faster responses to high-visibility (692 ms) than low-visibility scenes (807 ms), and of stimulus repetition (F1,17 = 14.16, P = 0.002), with faster responses to repeated (732 ms) than novelSR scenes (768 ms). There was also a marginal interaction between visibility and stimulus repetition (F1,17 = 4.08, P = 0.059), with more perceptual priming for high-visibility (57 ms; t17 = 4.17, P = 0.001) than low-visibility scenes (13 ms; t < 1). The lack of priming in the low-visibility condition was not the result of a ceiling or scaling effect because robust priming was observed in high–low trials even though their overall response latency was longer than that of low–low trials (P < 0.05).
(It is interesting to consider the null effect for low-visibility scenes in relation to a study that found more priming under perceptually demanding conditions [Ostergaard 1998]. In that study, subjects completed a word-naming task in which the words were either faded in gradually over 5 s or presented intact immediately. Response times were slower for the faded-in words, but more priming was observed when the same words were later repeated. Because less perceptual information was available when the responses were executed in the faded-in condition, it could be argued that priming effects are larger for degraded stimuli. However, there are several salient differences between that study and the present one. First, because response times were much faster for the immediate words, it is possible that the smaller priming effect in that condition was the result of a floor or scaling effect. Second, even though the stimuli in the faded-in condition were degraded in absolute terms, visibility may have reached ceiling at this level of degradation. In support of this, naming accuracy was quite high in both the faded-in (94.5%) and immediate (97.4%) conditions. Thus, it appears that the 2 studies differ in how perceptual difficulty was defined: ours based on response accuracy, and theirs based on the fade-in procedure.)
The effect of perceptual priming on response accuracy was examined by comparing the accuracy of indoor/outdoor judgments to the second scene of repeated versus novelSR trials (Fig. 2B). The same two-factor ANOVA revealed a main effect of visibility (F1,17 = 26.43, P < 0.0001), with more accurate responses to high-visibility (96%) than low-visibility scenes (87%), but no main effect of stimulus repetition (F1,17 = 2.34, P = 0.15) or interaction between visibility and stimulus repetition (F < 1). Thus, there was little evidence in the ANOVA of accuracy priming; however, follow-up analyses revealed an effect of stimulus repetition on accuracy for high-visibility scenes (P < 0.001) but not for low-visibility scenes (P = 0.65). It is unlikely that the lack of an effect of perceptual priming on response accuracy in the low–low condition was the result of floor or ceiling effects: accuracy was well above chance, and there was a significant effect for high-visibility stimuli, even though their overall accuracy was higher. Although accuracy in responding to the first scene in a trial was calibrated to 70% for low-visibility stimuli, accuracy in responding to a second low-visibility scene was higher than 70%. This may reflect a response bias, as discussed in Supplementary Material. Importantly, such a response bias could not affect the comparison of low-visibility repeated and novelSR conditions, because they required the same pattern of indoor/outdoor responses, and response accuracy did not differ between them (P = 0.65).
To verify that the calibration algorithm functioned as desired, responses to the first scene in each trial were submitted to an ANOVA. Because the first scene in every trial was novel, there should be no differences in response accuracy as a function of stimulus repetition condition. Moreover, response accuracy should be close to our desired threshold because these responses formed the basis of the calibration. The ANOVA confirmed these predictions; there was no main effect of stimulus repetition or interaction between stimulus repetition and visibility (F values < 1) but a main effect of visibility (F1,17 = 143.81, P < 0.0001), with greater accuracy for high-visibility (92%) than low-visibility scenes (70%)—similar values to the initial parameters.
Neural Perceptual Priming
The effect of stimulus repetition on BOLD responses to degraded and undegraded stimuli can be examined by comparing the fitted responses of repeated and novelSR trials in our scene-selective PPA ROIs (Fig. 3B). It was hypothesized that the BOLD response to repeated high-visibility trials would be attenuated relative to novelSR high-visibility trials because the neural response to the second scene in repeated trials would be suppressed as a result of sharpening; by contrast, the second scene in novelSR high-visibility trials was completely novel and would elicit a typical neural response. Conversely, it was hypothesized that low-visibility repeated trials would produce a greater BOLD response than low-visibility novelSR trials, either because existing representations are invariant to degradation or because prior experience allows for attentional amplification of stimulus attributes.
To test our predictions, fitted responses in the PPA (L: −27, −46, −8; R: 29, −44, −8) were submitted to the same ANOVA as behavioral responses. Because there were no hemispheric differences or interactions in the PPA (P values > 0.10), data from the left and right PPA were collapsed (see also Yi and others 2004; Yi and Chun 2005; Turk-Browne and others 2006). In bilateral PPA, there was a main effect of visibility (F1,17 = 51.46, P < 0.0001), with greater responses to high-visibility (0.48% signal change [s.c.]) than low-visibility trials (0.31% s.c.), but no main effect of stimulus repetition (F < 1). Crucially, there was a crossover interaction between visibility and stimulus repetition (F1,17 = 14.64, P = 0.001), reflecting attenuation for repeated high-visibility trials (0.037% s.c.; t17 = 2.21, P = 0.041) and enhancement for low-visibility trials (0.056% s.c.; t17 = 2.14, P = 0.047). These results are consistent with our hypothesis that experience has dissociable effects on perceptual processing, as a function of stimulus quality. Although previous studies that used degraded scene stimuli observed only attenuation (Yi and others 2004, 2006), their levels of degradation were very similar to our high-visibility condition, in which contrast was reduced by 45% to bring performance off ceiling.
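In a 2 by 2 fully within-subject design, each effect has 1 degree of freedom, so the interaction F equals the square of a one-sample t statistic computed on each subject's difference of differences. The sketch below shows that computation on invented percent-signal-change values for 6 hypothetical subjects; these numbers are not the study's data, and the variable names are illustrative.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic for the mean of `values` against zero (df = n - 1)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

# Invented % signal change per subject (NOT the study's data):
# (high repeated, high novelSR, low repeated, low novelSR)
subjects = [
    (0.45, 0.50, 0.34, 0.28),
    (0.47, 0.49, 0.33, 0.30),
    (0.44, 0.50, 0.36, 0.29),
    (0.48, 0.51, 0.31, 0.27),
    (0.46, 0.48, 0.35, 0.31),
    (0.43, 0.49, 0.32, 0.26),
]

# Attenuation: novelSR minus repeated, high visibility.
attenuation_high = [hn - hr for hr, hn, lr, ln in subjects]
# Enhancement: repeated minus novelSR, low visibility.
enhancement_low = [lr - ln for hr, hn, lr, ln in subjects]
# Interaction contrast: (novelSR - repeated)_high - (novelSR - repeated)_low.
interaction = [a + e for a, e in zip(attenuation_high, enhancement_low)]

t_interaction = one_sample_t(interaction)
f_interaction = t_interaction ** 2  # F(1, n - 1) for the interaction

print(f"t = {t_interaction:.2f}, F = {f_interaction:.2f}")
```

A crossover of the kind reported here corresponds to both simple effects being positive under these sign conventions, which makes the interaction contrast their sum.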
Based on sharpening and invariance/amplification hypotheses, interesting predictions could be made for the high–low and low–high conditions. For example, one might predict that responses to the high–low repeated trials would be enhanced relative to the novelSR trials because prior (high visibility) experience would allow the repeated scene to be less affected by degradation than the novel scene. However, our ability to test these hypotheses was complicated by the fact that the hybrid trials introduced a sharp change in overall image contrast, which may cue novelty, obliterating repetition effects. Indeed, the results, reported in Supplementary Material, were ambiguous. These trials remained important, however, to prevent subjects from anticipating the visibility of the second scene in a trial.
To examine regions outside of the PPA, exploratory whole-brain analyses were conducted using attenuation (novelSR > repeated) and enhancement (repeated > novelSR) contrasts for both high–high and low–low trials with an uncorrected statistical threshold of P < 0.001 and a cluster threshold of 5 voxels. High-visibility trials produced attenuation in a region of the left parietal lobe near the precuneus (−21, −60, 28) and enhancement in 3 regions: left superior temporal gyrus (−48, 11, −6), right inferior parietal lobule (50, −45, 41), and left dorsal anterior cingulate cortex (−3, −25, 35). Several studies have reported attenuation in other ventral occipital and temporal regions for high-visibility stimuli (see Henson 2003). One possible explanation for the lack of such effects is that our signal was reduced by presenting scenes for only 50 ms and at reduced contrast. In line with this possibility, a more liberal threshold (uncorrected P < 0.005) revealed attenuation in a visual area—the middle occipital gyrus (−9, −95, 16)—that has been observed in other studies (e.g., Buckner and others 1998). Moreover, when the same stimuli presented for 200 ms at full contrast were repeated, a whole-brain attenuation effect was observed in IT (Turk-Browne and others 2006).
At the original threshold, low-visibility trials produced attenuation in the left frontal lobe near rostral anterior cingulate cortex (−18, 38, −4) and enhancement in several regions, including right inferior parietal lobule (45, −30, 37), left postcentral gyrus (−42, −18, 48), left and right cerebellum (−6, −59, −5; 30, −45, −20), left frontal lobe near middle frontal gyrus (−21, 0, 53), and left and right precuneus (−21, −71, 34; 12, −36, 43). Enhancement in these regions (especially inferior parietal lobule) under conditions of both low and high visibility may reflect explicit memory for the second scene in repeated trials (Schott and others 2005).
Response Priming Analyses
Although we were primarily interested in perceptual priming, it was necessary to have trials in which the second scene required a different response to prevent subjects from only attending to the first scene. Comparing these novelDR catch trials to novelSR trials allowed us to explore the effect of response repetition (response priming), controlling for stimulus novelty. However, because only one-third of trials were of this type, they may have been adversely affected by a response bias that favored repeating the same response (which would have equally affected repeated and novelSR trials, leaving our perceptual priming comparison unconfounded). For this reason, response priming analyses are reported only for completeness in Supplementary Material.
Discussion
Early findings of attenuated BOLD responses to repeated stimuli (e.g., Demb and others 1995; Buckner and others 1998) were taken as evidence of repetition suppression, a well-characterized finding from neurophysiology (e.g., Schacter and Buckner 1998; Wiggs and Martin 1998). Further support for this interpretation has emerged from studies linking attenuation to behavioral priming (Maccotta and Buckner 2004; Wig and others 2005; Turk-Browne and others 2006) because priming may result from the reactivation of a sparser and more selective representation (Wiggs and Martin 1998). However, if attenuation is a BOLD signature of the process of representation, studies finding enhanced BOLD responses to repeated stimuli in the same regions are perplexing. Resolving this apparent contradiction is crucial for understanding how changes in BOLD response correspond to experience-induced changes in cortex.
There have been two recent attempts at identifying the conditions under which attenuation and enhancement occur. The recognition hypothesis postulates that attenuation is the default consequence of repetition and that enhancement is observed when a stimulus is recognized only upon repetition (Henson 2003). For example, attenuation is observed when famous faces are repeated because they are recognized each time, but enhancement is observed when novel faces are repeated because they can only be recognized after repetition (Henson and others 2000). The accumulation hypothesis argues that BOLD attenuation is the result of a faster peak neural response (e.g., James and others 2000; Henson and others 2002; Noguchi and others 2004), which results in less area under the neural response curve (James and Gauthier 2005); area under the curve has been positively correlated with the magnitude of BOLD responses (Boynton and others 1996). According to accumulation, enhancement would occur when a repeated stimulus has a later peak neural response than a novel stimulus.
There have been several recent attempts to relate BOLD attenuation to behavior. Two fMRI studies have reported that attenuation in inferior frontal gyrus and in middle/inferior temporal regions is correlated with repetition priming across subjects (Maccotta and Buckner 2004; Turk-Browne and others 2006). A third study has revealed that transcranial magnetic stimulation (TMS) of the left inferior frontal gyrus during encoding eliminates both repetition priming and attenuation in these same regions (Wig and others 2005). These results suggest that repetition attenuation may reflect more efficient processing of repeated stimuli as a result of sharpening, which has also been proposed as the cause of behavioral priming (Wiggs and Martin 1998). Both the accumulation and recognition hypotheses are compatible with a tight link between attenuation and priming. The accumulation hypothesis argues that the peak of a neural response corresponds to the point in time when sufficient information is available for a response to be executed; thus, early peaks should correspond to shorter behavioral latencies and lower BOLD responses (e.g., James and Gauthier 2005). The recognition hypothesis is neutral with respect to neural mechanisms; it would be compatible with the mechanism proposed by accumulation models but does not necessitate a strong relationship between neural and behavioral latencies (Henson 2003).
The relationship between BOLD enhancement and behavior has received far less attention. In fact, to the best of our knowledge, no study has examined the pattern of speeded responses that accompanies BOLD enhancement (cf., James and Gauthier 2005). The recognition hypothesis may predict that because enhancement results from improved recognition of repeated stimuli, responses to these stimuli in a recognition task should be faster and/or more accurate—although such claims are absent from a recent formulation of the recognition hypothesis (Henson 2003). The accumulation hypothesis, on the other hand, makes the strong prediction that enhancement reflects later peak neural responses and hence longer response latencies: “repetition enhancement should accompany a performance deficit” (James and Gauthier 2005, p. 39).
Prima facie, the current results are incompatible with predictions of the recognition and accumulation hypotheses with respect to enhancement. The problem with the original formulation of the recognition hypothesis (Henson and others 2000) is that our high- and low-visibility stimuli were both initially novel. That is, if enhancement is caused by the recognition of previously novel repeated stimuli, then we should have observed enhancement rather than attenuation in the high-visibility condition. However, a more recent version of the recognition hypothesis (Henson 2003) resolves this issue by distinguishing 2 forms of recognition: 1) categorical (recognizing a type: “a living room”) and 2) exemplar (recognizing a token: “my living room”). Thus, in the case of high-visibility trials, the same categorical recognition may have occurred for the second scene of both repeated and novelSR trials, revealing attenuation. Although additional exemplar recognition was possible for repeated trials, such recognition was irrelevant to the indoor/outdoor task, possibly minimizing its influence.
The enhancement observed in low-visibility trials might then be attributed to more categorical recognition of the second scene for repeated than novelSR trials. However, there was little behavioral evidence of this if one assumes that better categorical recognition should facilitate categorical judgments: responses to repeated and novelSR stimuli were equally accurate (P = 0.65, partial η² = 0.012) and fast (P = 0.38, partial η² = 0.046). As noted earlier, floor and ceiling effects were unlikely. It remains possible that a larger sample size would reveal differences, although the effect sizes are small even by large sample standards. Alternatively, the lack of correlation between better recognition and speeded responses may not be problematic for the recognition hypothesis, as it does not make strong claims about behavior. Moreover, the recognition hypothesis finds support in our whole-brain analyses, which revealed robust enhancement for low-visibility trials in several parietal and frontal regions, possibly reflecting recognition-related activity (Schott and others 2005).
Unlike the recognition hypothesis, the accumulation hypothesis makes very specific predictions about behavioral performance in our protocol. The fact that priming accompanied attenuation in the high-visibility condition is entirely supportive of a tight correlation between behavioral and neural response latency. An additional test of the accumulation hypothesis is the strong prediction that BOLD enhancement should be accompanied by a performance deficit (James and Gauthier 2005). As presented above, there was no evidence that enhancement in the low-visibility condition was accompanied by a slowing of response times. Although the lack of difference in response times is a null effect, we do not believe that it can be easily explained by speed-accuracy trade-offs, because response accuracy was equivalent, or by a lack of power, because effect sizes were very small. Thus, we believe that the results from the low-visibility condition may be incompatible with the accumulation hypothesis in its current form (James and Gauthier 2005). Despite their difficulty in accounting for the full range of our data, the recognition and accumulation hypotheses both provide parsimonious explanations for several findings in the literature. Moreover, we do not believe that they are necessarily irreconcilable with the neurophysiological mechanisms discussed below.
Whereas BOLD attenuation has been linked to suppressed responses of single neurons, BOLD enhancement has lacked this thorough neurophysiological grounding. One possible source of enhancement could be an increase in the size of stimulus representations upon repetition as a result of cortical recruitment, although such effects have mostly been reported as changes in cortical topography during extensive training in low-level perceptual tasks (Gilbert and others 2001). It is also possible that enhancement, like attenuation, may reflect changes in the duration of neural activity, rather than changes in the magnitude of evoked responses (James and Gauthier 2005). Although the discussion below is centered on changes in the magnitude of evoked responses, we believe that many of the same mechanisms could be adapted to such temporal models.
Primate neurophysiology suggests that BOLD enhancement could result from greater firing rates and selectivity of a population of neurons to familiar degraded stimuli than to novel degraded stimuli. In one study (Rainer and Miller 2000), many neurons in PF cortex ceased responding selectively to novel stimuli upon degradation, but continued responding selectively to familiar stimuli. In another study (Rainer and others 2004), neurons in V4 responded more strongly to familiar degraded stimuli than to familiar undegraded stimuli, whereas the same was not true for novel stimuli. We observed analogous effects with fMRI by degrading images of scenes: the BOLD response in the PPA was greater for trials in which a low-visibility stimulus was repeated than for trials containing 2 novel stimuli and vice versa when the stimuli were highly visible. These results replicate several findings in the literature of repetition enhancement for low-visibility stimuli, including faces (Dolan and others 1997), objects (Grill-Spector and others 2000; James and Gauthier 2005), and low-salience shapes (Kourtzi and others 2005) and repetition attenuation for clearly visible stimuli, including scenes (e.g., Epstein and others 2003; Yi and Chun 2005), famous faces (Henson and others 2000), objects (Buckner and others 1998; Kourtzi and Kanwisher 2000; Vuilleumier and others 2002), and pop-out shapes (Kourtzi and others 2005). Our results advance these earlier findings by demonstrating within-subject attenuation and enhancement using a common task and stimuli that were controlled for novelty.
The earlier studies of PF cortex and V4 (Rainer and Miller 2000; Rainer and others 2004) provide 2 possible explanations for the enhancement we observed in the PPA. In both studies, more neurons were selective for degraded familiar stimuli than for degraded novel stimuli; however, the baseline of the subtraction was different in the 2 cases. In PF cortex, the selectivity of responses to familiar stimuli was invariant to degradation, whereas the selectivity of responses to novel stimuli declined with degradation. This invariance suggests that PF cortex may construct abstracted representations, ensuring that stimuli can be recognized under variable viewing conditions (Rainer and Miller 2000). In V4, the selectivity of responses to novel stimuli declined slightly with degradation, but enhancement was mostly carried by the stronger selectivity of responses to degraded familiar stimuli. This amplification suggests that under difficult viewing conditions V4 can be recruited to make task-relevant features accessible (Rainer and others 2004).
There are a couple of issues to consider before comparing these studies to the current results. First, there are substantial differences in training, as detailed below. Second, it is unclear whether the greater selectivity of responses to degraded familiar stimuli will translate to greater average firing rates measurable by fMRI (Mukamel and others 2005)—although the study by Rainer and others (2004) is suggestive of such a link. More research, possibly with simultaneous single-unit recordings and fMRI (e.g., Logothetis and others 2001), will be necessary to fully understand the relationship of repetition-induced changes in stimulus selectivity/preference to changes in BOLD responses.
With these caveats in mind, we believe that our results in the PPA most closely parallel findings from primate PF cortex (Fig. 4). In Rainer and Miller (2000), more neurons were selective for novel objects (63 of 160 neurons) than for familiar objects (40 of 164 neurons) at the 100% stimulus level (undegraded). However, at the 65% stimulus level (moderately degraded), only a small fraction of the neurons that had selectively responded to undegraded novel objects continued responding (9 of 63 neurons), whereas most of the neurons that had selectively responded to undegraded familiar stimuli remained responsive (31 of 40 neurons). Analogously in our study, the PPA BOLD response to novel scenes decreased by 44% when contrast was reduced, whereas the response to repeated scenes was less affected by degradation (P = 0.01), decreasing by only 26%. Thus, similar to PF cortex, the PPA may be relatively invariant to degradation, ensuring that stimulus representations can be reactivated by degraded visual input. Moreover, when undegraded, repeated stimuli elicited weaker responses than novel stimuli in PF cortex and in the PPA. Further support for an analogy between PF cortex and the PPA comes from the TMS study by Wig and others (2005), which demonstrated that middle temporal gyrus and PF cortex may be involved in similar kinds of visual/semantic processing: application of TMS to the left inferior frontal gyrus eliminated attenuation in both of these regions. Finally, as noted earlier, there have been reports of correlations between behavioral priming and attenuation in both regions (Maccotta and Buckner 2004; Turk-Browne and others 2006).
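The proportions behind this comparison can be made explicit with a short calculation. The following Python sketch is illustrative only: it uses the neuron counts and percentage decreases quoted above, and the variable and function names (`SELECTIVE_AT_100`, `RETAINED_AT_65`, `BOLD_DECREASE`, `summarize`) are hypothetical labels introduced here, not part of either study's analysis.

```python
# Illustrative arithmetic only; names are hypothetical, values are those quoted in the text.

# Rainer and Miller (2000): (selective neurons, neurons recorded) at the 100% stimulus level.
SELECTIVE_AT_100 = {"novel": (63, 160), "familiar": (40, 164)}

# Of those selective neurons, how many were still responsive at the 65% stimulus level.
RETAINED_AT_65 = {"novel": (9, 63), "familiar": (31, 40)}

# Current study: proportional decrease in PPA BOLD response when contrast was reduced.
BOLD_DECREASE = {"novel": 0.44, "repeated": 0.26}


def proportion(count, total):
    """Return count / total as a float."""
    return count / total


def summarize():
    """Collect the proportions discussed in the text into one dictionary."""
    summary = {}
    for condition in ("novel", "familiar"):
        summary[condition] = {
            "selective_at_100": proportion(*SELECTIVE_AT_100[condition]),
            "retained_at_65": proportion(*RETAINED_AT_65[condition]),
        }
    # Fraction of the BOLD response that survives degradation (1 minus the decrease).
    summary["bold_retained"] = {
        condition: 1.0 - decrease for condition, decrease in BOLD_DECREASE.items()
    }
    return summary


if __name__ == "__main__":
    result = summarize()
    for condition in ("novel", "familiar"):
        row = result[condition]
        print(
            f"{condition}: {row['selective_at_100']:.1%} selective undegraded, "
            f"{row['retained_at_65']:.1%} of those retained at 65%"
        )
    for condition, retained in result["bold_retained"].items():
        print(f"PPA BOLD, {condition} scenes: {retained:.0%} of response retained")
```

Under these figures, roughly 14% of the neurons selective for novel objects remained responsive after degradation, compared with roughly 78% of those selective for familiar objects. The BOLD responses show the same direction of effect, with a smaller proportional loss for repeated scenes (26%) than for novel scenes (44%). The neuronal counts and the BOLD percentages are different measures, so the comparison is qualitative rather than quantitative.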
Contrary to the present situation, the enhancement observed in single cells (Rainer and Miller 2000; Rainer and others 2004) was accompanied by improvements in behavioral performance. The monkeys in both studies completed a modified delayed match to sample task, in which they were briefly presented with a sample at one of several levels of degradation and, after a delay, with an undegraded test stimulus. Performance in identifying whether the sample matched the test was better for familiar than for novel stimuli—the same comparison that elicited enhancement. There are several salient differences between those experiments and the current one. First, whereas stimuli were repeated only once in the current experiment, the monkeys were extensively trained with their familiar stimuli for 5 days (Rainer and Miller 2000) to 4 weeks (Rainer and others 2004). Second, much of that training involved undegraded exposures to the stimuli, both as samples at the 100% stimulus level and as a subset of the test items (which were always undegraded). When our subjects were “trained” with undegraded stimuli, that is, high–high and high–low trials, we observed robust behavioral improvements in the form of repetition priming. Thus, although we failed to observe performance improvements associated with enhancement, such improvements are likely to emerge with increased training or with training on high-visibility stimuli. In line with this view, substantial training on severely degraded stimuli leads to improvements in recognition performance and BOLD enhancement (Grill-Spector and others 2000). Interestingly, repetition-based neural enhancement effects may be detectable before behavioral improvements.
In closing, the current results help to resolve an apparent contradiction in the literature by suggesting that visual quality can determine whether repeated stimuli will elicit attenuated or enhanced BOLD responses. The repetition of clearly visible stimuli produced less BOLD activity in the PPA, as well as faster response times, possibly reflecting a sharpened representation. The repetition of degraded stimuli resulted in greater BOLD activity in the PPA, possibly reflecting the invariance of familiar stimuli to degradation. This putative spared access to visual representations under suboptimal viewing conditions is highly relevant to real-world vision, where occlusion and interference often reduce the quality of our visual input.
Supplementary material can be found at: http://www.cercor.oxfordjournals.org/.
This work was supported by National Institutes of Health (NIH) Grant EY014193 to MMC. NBTB was further supported by a foreign Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship. ABL was also supported by an NIH National Research Service Award. The authors would like to thank David Widders for his assistance in data collection and 3 anonymous reviewers for helpful comments. Conflict of Interest: None declared.