## Abstract

Scene categorization draws on 2 information sources: The identities of objects scenes contain and scenes' intrinsic spatial properties. Because these resources are formally independent, it is possible for them to lead to conflicting judgments of scene category. We tested the hypothesis that the potential for such conflicts is mitigated by a system of “crosstalk” between object- and spatial layout-processing pathways, under which the encoded spatial properties of scenes are biased by scenes' object contents. Specifically, we show that the presence of objects strongly associated with a given scene category can bias the encoded spatial properties of scenes containing them toward the average of that category, an effect which is evident both in behavioral measures of scenes' perceived spatial properties and in scene-evoked multivoxel patterns recorded with functional magnetic resonance imaging from the parahippocampal place area (PPA), a region associated with the processing of scenes' spatial properties. These results indicate that harmonization of object- and spatial property-based estimates of scene identity begins when spatial properties are encoded, and that the PPA plays a central role in this process.

## Introduction

Fast and accurate scene recognition is critical to daily life, allowing us to navigate from one place to another and to interact efficiently and appropriately with the environment at each point along the way. Objects have often been cast as the fundamental building blocks of scenes, with scene recognition proposed to emerge from a cataloging of the types of objects in a scene and the spatial relationships among them (Biederman 1972, 1987; Friedman 1979; Biederman et al. 1982; De Graef et al. 1990). Consistent with this view, behavioral studies have shown that scene recognition falters when highly informative objects (e.g., refrigerators or toilets) are removed from their associated scenes (MacEvoy and Epstein 2011) or when scenes contain incongruent objects (Davenport and Potter 2004; Joubert et al. 2007). More recently, however, theoretical and behavioral studies of scene recognition have demonstrated a substantial role for scenes' intrinsic global properties, particularly spatial properties such as depth, openness, and navigability, complementing the category cues provided by objects (Schyns and Oliva 1994; Oliva and Schyns 2000; Oliva and Torralba 2001, 2006; Renninger and Malik 2004; McCotter et al. 2005; Fei-Fei et al. 2007; Vogel and Schiele 2007; Greene and Oliva 2009).

From a physical perspective, the kinds of objects scenes contain and scenes' spatial properties are often unrelated. To be sure, these factors place constraints on each other. Objects help define a scene's spatial dimensions, and a scene's spatial dimensions may place limits on the types of objects it can contain. Yet, many scene categories that differ from each other in their spatial properties can nevertheless accommodate each other's associated objects without grossly altering their properties or violating physical law. For instance, although bathrooms and kitchens may differ in their average dimensions at the category level, there is usually no physical (as opposed to semantic) impediment to replacing a kitchen's refrigerator with a shower, or a bathroom's bathtub with a stove, and little change in either scene's spatial properties as a consequence of the switch. In general, all other things (e.g., object size) being equal, the identities of objects in a scene and the scenes' spatial properties are, with a few exceptions, logically independent.

The reliance of scene recognition on 2 cues that can vary fairly freely with respect to each other raises a problem. How do scene categorization decisions cope with the potential for conflicts between the categories most associated with each cue? Consider, for example, the problem of categorizing a large bathroom. While the objects present (a sink, toilet, and bathtub) might be closely associated with bathrooms, the spatial dimensions of the room might be more closely associated with a different room category, such as a kitchen. Drawing from models positing that both objects and scenes' spatial properties can activate schemata or context frames for specific scene categories (Biederman 1972; Palmer 1975; Friedman 1979; Antes et al. 1981; Biederman et al. 1982; Loftus et al. 1983; Boyce and Pollatsek 1992; Bar and Ullman 1996; Hollingworth and Henderson 1998; Bar 2004; Mudrik et al. 2010), we might imagine that this conflict is resolved via some negotiation between the schemata activated by each resource. In this view, objects and spatial properties are processed independently through the stage at which each triggers a “hypothesis” of the room's category. An alternative is that potential conflicts between the room's object contents and spatial properties are blunted at the stage at which those resources are encoded, that is, before schemata are activated by each. Information from whichever resource is likely to be more reliable under a given set of circumstances (likely objects in this scenario) biases how the other resource is encoded, with the goal of maximizing the likelihood that each resource ultimately activates the same schema.

In the present study, we used a combination of behavioral and neuroimaging techniques to assess whether any such encoding-stage bias occurs during scene viewing, specifically asking whether the encoded values of scenes' spatial properties are influenced by scenes' object contents. Taking advantage of the susceptibility of perceived spatial properties to negative aftereffects (Greene and Oliva 2010), we first asked participants in a series of behavioral experiments to judge the spatial scales of average-sized bathrooms and kitchens after prolonged exposure to either very large or very small scenes from the same category. We find that the magnitudes of aftereffects produced by both small and large adapting rooms were significantly smaller when objects which were strongly informative of scene category, such as refrigerators or toilets, were visible in adapting scenes versus when they were masked. These results indicate that the presence of informative objects in adapting scenes biased their encoded spatial properties toward values associated with the average of their category. Next, using functional magnetic resonance imaging (fMRI), we observed an essentially identical bias in scene-evoked activity patterns in the parahippocampal place area (PPA), a region which has been associated with the encoding of scenes' spatial properties (Aguirre et al. 1998; Epstein and Kanwisher 1998; Kravitz et al. 2011; Mullally and Maguire 2011, 2013; Park et al. 2011), as well as processing of scenes' contextual associations (Aminoff et al. 2007; Hassabis et al. 2007; Summerfield et al. 2010; Howard et al. 2011). We propose that these behavioral and physiological biases reflect “crosstalk” between object- and spatial property-processing pathways that serves scene recognition by harmonizing estimates of scene identity derived from each of these information resources, and that this process is mediated by the PPA.

## Materials and Methods

### Behavioral Experiments

#### Stimuli

Visual stimuli were the real-world bathroom exemplars assembled for the behavioral experiments, presented here in gray- or blue-scale format (Supplementary Fig. 3). Both masked and unmasked versions of these images were used. Scenes subtended 9.3° of visual angle.

#### Experimental Procedure

Each stimulus event consisted of a bathroom image presented for 150 ms, followed by a white fixation cross for 1350 ms. Five types of bathrooms were shown: Exemplars from the top and bottom spaciousness quintiles shown both with informative objects unmasked and masked, and exemplars from the middle quintile with objects unmasked. Participants indicated the color of the bathroom (gray or blue) by button press when the fixation cross appeared. The 5 scene types along with 3-s null events were ordered according to third-order counterbalanced de Bruijn sequences, a general class of pseudorandom sequences that provide the minimum length sequence needed to achieve a desired depth of stimulus counterbalance for a condition set of arbitrary size (Aguirre et al. 2011; MacEvoy and Yang 2012). Each scan run contained 36 repetitions of each stimulus type. Runs lasted 6 min and 18 s, including 15-s fixation-only intervals attached to the end of each run. Unique stimulus sequences were constructed for 6 scan runs for each subject. Scan sessions also included 2 functional localizer scans lasting 7 min 48 s each, during which subjects viewed blocks of color photographs of scenes, faces, common objects, and scrambled objects presented at a rate of 1.33 pictures per second (Epstein and Higgins 2006). Localizer stimuli subtended 15° of visual angle.
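The counterbalancing scheme can be illustrated with the standard FKM (Lyndon-word concatenation) construction of a de Bruijn sequence B(k, n), in which every length-n window over k symbols occurs exactly once per cycle. This is a generic sketch, not the path-guided variant of Aguirre et al. (2011), and the mapping of 6 symbols to the 5 scene types plus null events is our illustration:

```python
def de_bruijn(k, n):
    """Generate a de Bruijn sequence B(k, n): a cyclic sequence over k
    symbols (0..k-1) in which every length-n subsequence occurs exactly
    once. Standard FKM construction via Lyndon words."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:          # append each Lyndon word whose length
                sequence.extend(a[1:p + 1])  # divides n
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence
```

For k = 6 conditions at counterbalancing order n = 3, the cyclic sequence has the minimal length 6^3 = 216 trials, each triple of conditions appearing exactly once.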

#### MRI Acquisition

All scan sessions were conducted at the Brown University MRI Research Facility using a 3-T Siemens Trio scanner and a 32-channel head coil. Structural T1-weighted images for anatomical localization were acquired using a 3D MPRAGE pulse sequence [time repetition (TR) = 1620 ms, time echo (TE) = 3 ms, time to inversion (TI) = 950 ms, voxel size = 0.9766 × 0.9766 × 1 mm, matrix size = 192 × 256 × 160]. T2*-weighted scans sensitive to blood oxygenation level-dependent contrasts were acquired using a gradient-echo, echo-planar pulse sequence (TR = 3000 ms, TE = 30 ms, voxel size = 3 × 3 × 3 mm, matrix size = 64 × 64 × 45). Visual stimuli were rear projected onto a screen at the head end of the scanner bore and viewed through a mirror affixed to the head coil. The entire projected field subtended 24° × 18° at 1024 × 768 pixel resolution.

#### fMRI Analysis

Functional images were corrected for differences in slice timing by resampling slices in time to match the first slice of each volume, realigned with respect to the first image of the scan, and spatially normalized to the Montreal Neurological Institute (MNI) template. Volumes from experimental scans were analyzed with general linear models (one for each scan run) implemented in SPM8 (http://www.fil.ion.ucl.ac.uk/spm), including an empirically derived 1/f noise model, filters that removed high and low temporal frequencies, and nuisance regressors to account for global signal variations, between-scan signal differences, and participant movements. Beta-value maps were extracted for each stimulus condition for each scan.

For each subject, a permutation test was used to identify those voxels whose responses varied significantly among scene types and would therefore be passed to multivoxel pattern analysis (MVPA) for hypothesis testing. For each voxel, we stored the F statistic from a one-way analysis of variance (ANOVA) performed on beta values from each of the 5 bathroom types, sampled across the 6 scans. This statistic was compared with a null distribution of F statistics computed from 10 000 within-scan permutations of condition labels, accumulated across all voxels. A voxel was passed to subsequent analysis if its unpermuted F statistic exceeded the 95th percentile of the null distribution. Selection based on a null distribution accumulated across all voxels was a conservative approach, ensuring that only the voxels with responses differing most consistently across conditions were selected for further analysis. Note, however, that while this procedure identified voxels whose responses varied among stimuli, it was not biased toward identifying voxels with any particular ordinal relationships among those responses. That is, a voxel could satisfy our selection criterion as easily with responses to stimuli labeled A, B, C, D, and E that reliably fell in the order A > B > C > D > E as with responses that were reliably ordered B > C > A > E > D, or any other order. Because our hypotheses addressed ordinal relationships, as explained in the following paragraph, our ANOVA-based feature selection procedure did not amount to “peeking.” We required that at least 7 voxels be selected from each region of interest (ROI; see below for definitions) in each participant. This minimum was selected because it equaled the number of voxels in each searchlight cluster used for whole-brain analyses, as described in a following section.
Response vectors composed of selected voxels were generated for each stimulus type and averaged across scans, and pairwise Euclidean distances among all vectors were computed for each participant.
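The selection and distance steps above can be sketched in numpy. This is an illustrative reimplementation, not the published code; the array layout (scans × conditions × voxels) and all function names are our assumptions:

```python
import numpy as np

def f_stats(betas):
    """One-way ANOVA F statistic per voxel.
    betas: (n_scans, n_conds, n_voxels), with scans as samples."""
    n_scans, n_conds, _ = betas.shape
    grand = betas.mean(axis=(0, 1))
    cond_means = betas.mean(axis=0)                      # (n_conds, n_voxels)
    ss_between = n_scans * ((cond_means - grand) ** 2).sum(axis=0)
    ss_within = ((betas - cond_means) ** 2).sum(axis=(0, 1))
    return (ss_between / (n_conds - 1)) / (ss_within / (n_conds * (n_scans - 1)))

def select_voxels(betas, n_perm=1000, seed=0):
    """Keep voxels whose observed F exceeds the 95th percentile of a null
    distribution pooled across all voxels, built from within-scan
    permutations of condition labels."""
    rng = np.random.default_rng(seed)
    f_obs = f_stats(betas)
    null = np.empty((n_perm, betas.shape[2]))
    for p in range(n_perm):
        # permute condition labels independently within each scan
        shuffled = np.stack([scan[rng.permutation(betas.shape[1])]
                             for scan in betas])
        null[p] = f_stats(shuffled)
    return f_obs > np.percentile(null, 95)

def pattern_distances(betas, mask):
    """Condition response vectors averaged across scans, then pairwise
    Euclidean distances among them."""
    patterns = betas[:, :, mask].mean(axis=0)            # (n_conds, n_selected)
    diff = patterns[:, None, :] - patterns[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```

Pooling the null across voxels, as here, raises the selection threshold relative to a per-voxel null, matching the conservative intent described above.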

#### Statistical Analysis

Whole-brain searchlight pattern analysis was performed with 3 mm radius (7 voxels) searchlights centered on each voxel in the brain (Kriegeskorte et al. 2006). To measure any local effect of object visibility on scenes' encoded spatial properties, Euclidean distances among patterns evoked by each stimulus at each searchlight position were used to compute the distance contrast [(distance from masked high-spaciousness to masked low-spaciousness) minus (distance from unmasked high-spaciousness to unmasked low-spaciousness)]. The contrast value for each searchlight cluster was assigned to the voxel at its center. Resulting single-participant contrast volumes were passed to a second-level exact permutation test implemented with SnPM (http://go.warwick.ac.uk/tenichols/snpm) and custom MATLAB scripts to assess the group-level significance of regions showing large subject-averaged contrast values, which were consistent with a biasing effect of objects. First, voxelwise variance was smoothed with a 3-mm full-width at half-maximum (FWHM) Gaussian filter under the nonparametric assumption of smooth underlying variance in the searchlight volumes (Nichols and Holmes 2002). Smoothed variance maps were used to compute maps of pseudo t-values for each of the 2^12 = 4096 sign permutations of the 12 single-subject contrast volumes. The resulting distributions of pseudo t-values for each voxel were used to identify voxels in each permuted volume whose pseudo t-values were encountered with a probability <0.001, and the size of the largest 6-connected cluster of such voxels recorded for each permutation volume. Clusters identified in the same way from unpermuted volumes were considered significant if their sizes were exceeded by fewer than 5% of elements in the distribution of maximum cluster sizes accumulated across permuted volumes.
The thresholded second-level volume was projected onto a surface-based representation of the MNI canonical brain with the SPM Surfrend toolbox (http://spmsurfrend.sourceforge.net), and then rendered in NeuroLens (http://www.neurolens.org).
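The voxelwise core of this second-level test is an exact one-sample sign-permutation over all 2^12 flips of the 12 subject contrast values. A minimal sketch (variance smoothing and the max-cluster-size correction are omitted for brevity; the function name is ours):

```python
import numpy as np
from itertools import product

def sign_permutation_p(contrasts):
    """Exact one-sample sign-permutation test: p-value for the mean of
    per-subject contrast values at one voxel, against the null built
    from all 2**n sign flips (feasible for n = 12 -> 4096 flips)."""
    contrasts = np.asarray(contrasts, dtype=float)
    obs = contrasts.mean()
    null = [(contrasts * np.array(signs)).mean()
            for signs in product([1.0, -1.0], repeat=len(contrasts))]
    # one-tailed: proportion of flipped means at least as large as observed
    return float(np.mean([m >= obs for m in null]))
```

Because the identity flip is always included, the smallest attainable p-value with 12 subjects is 1/4096 ≈ 0.00024, comfortably below the 0.001 voxelwise threshold used above.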

#### Regions of Interest

All ROIs were defined using a recently described algorithmic approach applied to data from the localizer scans (Julian et al. 2012). Briefly, for each contrast of interest (e.g., scenes > objects), a whole-brain group volume was created in which each voxel was tagged with the proportion of subjects in which that voxel showed an activation difference exceeding a threshold of t = 1.6. A 3-mm FWHM Gaussian filter was applied to this volume, followed by a watershed algorithm with an 8-connected part filter applied to each axial slice. The resulting volumes contained segmented parcels corresponding to activations shared between subjects. To reduce extraneous activations present in the segmented group volumes, parcels generated from the activations of fewer than 50% of subjects were removed. Individual subject ROIs associated with a given contrast were defined from the intersection between the shared activation volume and each subject's contrast map thresholded at t = 1.6. This procedure was performed for the contrasts of scenes > objects [to identify the PPA, retrospenial complex (RSC), and transverse occipital sulcus (TOS)], objects > scrambled objects [lateral occipital (LO) and posterior fusiform sulcus (pFs) subdivisions of the lateral occipital complex (LOC)], and scrambled objects > objects (early visual cortex). All voxels identified by the scenes > objects contrast that were inferior to the splenium of the corpus callosum were assigned to the PPA, and all superior voxels assigned to the RSC. An 11-voxel region of overlap between the group-defined candidate regions for the right PPA and right pFs was assigned to the PPA.
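A simplified sketch of this group-constrained ROI definition, with connected-component labeling standing in for the smoothing-plus-watershed parcellation of Julian et al. (2012); the thresholds follow the text, but the array layout and function name are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def define_rois(subject_tmaps, t_thresh=1.6, min_prop=0.5):
    """Group-constrained ROI sketch. subject_tmaps: (n_subj, x, y, z)
    contrast t-maps. Connected-component labeling is a stand-in for the
    published smoothing + watershed parcellation step."""
    # proportion of subjects suprathreshold at each voxel
    overlap = (subject_tmaps > t_thresh).mean(axis=0)
    parcels, n_parcels = ndimage.label(overlap > 0)
    rois = []
    for lbl in range(1, n_parcels + 1):
        parcel = parcels == lbl
        # drop parcels driven by fewer than min_prop of subjects
        if overlap[parcel].max() < min_prop:
            continue
        # per-subject ROI: parcel intersected with that subject's
        # suprathreshold voxels
        rois.append([parcel & (tmap > t_thresh) for tmap in subject_tmaps])
    return rois
```

The key design choice, preserved here, is that the group volume only constrains where ROIs may fall; each subject's ROI is still defined by that subject's own suprathreshold voxels.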

## Results

### Behavioral Experiments

Consistent with the susceptibility of scene spatial properties to adaptation (Greene and Oliva 2010), participants in the real bathroom group (n = 34) judged average bathrooms to be significantly less spacious after adaptation to high-spaciousness bathrooms than after adaptation to low-spaciousness bathrooms, all with objects unmasked (Fig. 2A, vertical difference between data points on the left; one-tailed permutation test, P = 0.0001). The presence of this basic aftereffect demonstrates (1) that the perceptual quantity of spaciousness is subject to aftereffects similar to those described previously for individuated spatial properties of scenes, and (2) that aftereffects can be observed even within the spatial constraints of a single indoor scene category.

Figure 2.

Behavioral results. (A) Average-spaciousness test bathrooms were judged significantly smaller after adaptation to high-spaciousness bathrooms than after adaptation to low-spaciousness bathrooms (black vs. gray filled squares). Spaciousness ratings after adaptation to high-spaciousness bathrooms were significantly lower when adapting scenes' informative objects were masked. Object masking in low-spaciousness adapting scenes produced the opposite effect. (B) Real kitchens were similarly subject to basic (i.e., high vs. low) aftereffects, although a significant impact of informative objects was only present in aftereffects produced by low-spaciousness exemplars. (C) Bidirectional enhancement of aftereffects was restored in computer-rendered kitchens, which allowed exact specification of object contents. (D) Bidirectional enhancement was also evident in ratings of kitchens after adaptation to extreme bathrooms. Error bars are SEM; P = 0.065; *P < 0.05; **P < 0.01; ***P < 0.001.


One explanation for the absence of aftereffect enhancement with high-spaciousness kitchens is that the extra space in large kitchens allowed them to accommodate a greater number of objects carrying information about scene category than low-spaciousness kitchens, potentially blunting the impact of masking the limited set of informative objects we targeted; this potential complication applied less to bathrooms because they were associated with fewer informative objects to begin with (Table 1). Consistent with this explanation, high-spaciousness kitchens contained significantly more of the objects in Table 1 than did high-spaciousness bathrooms (6.99 vs. 5.01 objects per scene on average; t(198) = 11.84, P < 0.0001). Moreover, we observed that large kitchens often contained many objects that, while not appearing on the list in Table 1, may still have been associated with kitchens (e.g., kitchen counter stools). Large bathrooms did not appear to collect extra potentially informative objects in a similar way.

Table 1

List of all objects nominated by online raters as associated with bathrooms and kitchens

| Bathrooms | Kitchens |
| --- | --- |
| Bathtub | Cabinet |
| Mirror | Countertop |
| Shower | Dish |
| Sink | Dishwasher |
| Toilet | Microwave |
| Towel | Oven |
| Vanity | Refrigerator |
| | Sink |
| | Stove |
| | Table |
| | Utensil |

### fMRI Experiment

Our behavioral results indicate that the presence of objects strongly associated with a particular scene category produces a “centripetal bias” in the encoded spatial properties of scenes containing them. To understand where in the visual system this bias arises, we used fMRI to record activity patterns evoked by exemplars of each of the 5 types of bathrooms used in the behavioral experiments: high- and low-spaciousness exemplars, both with and without informative objects masked, plus average-spaciousness exemplars with objects unmasked. Note that this experiment sought to directly measure neural responses to scenes varying in spaciousness and masking state rather than any neural signature of aftereffects those scenes produced. This direct approach was feasible because no perceptual judgments of scene spatial properties were required of subjects in the scanner, who solely judged whether scenes were shaded in gray or blue. Only bathrooms were used because they were associated with a more reliable centripetal bias in the perceptual experiments.

Our analysis focused on the PPA, within which activity patterns have been shown to track the spatial properties of scenes (Kravitz et al. 2011; Park et al. 2011; Harel et al. 2012). Consistent with these previous studies, distances among activity patterns evoked in the right PPA by high-, low-, and average-spaciousness bathrooms, all with objects unmasked, qualitatively matched differences among their spatial properties: The average distance between patterns evoked by high- and low-spaciousness rooms was significantly greater than that of distances between each of those patterns and patterns evoked by average-spaciousness rooms (Fig. 3A, permutation test, P = 0.028). This basic sensitivity to spatial properties was not significant in the left PPA, consistent with previous results suggesting greater sensitivity to spatial properties in the right PPA (Wagner et al. 1998; Kirchhoff et al. 2000; Epstein et al. 2003).

Figure 3.

Analysis of pattern distances in the right PPA. (A) Average Euclidean distances between patterns evoked by high- and low-spaciousness bathrooms with unmasked objects were significantly greater than the average of distances between each of those extremes and the pattern evoked by average-spaciousness bathrooms. (BD) Pattern distances satisfying the predictions made from behavioral results, demonstrating a centripetal object bias in the right PPA. The combination of these contrasts was not found in other ROIs. Distance data in each panel correspond to comparisons between patterns denoted by same-shaded arrows in the left half of each panel. Error bars are SEM. *P < 0.05; **P < 0.01; ***P < 0.001.


Although these results were encouraging, it was possible that the greater distances between patterns evoked by extreme scenes with objects masked arose from differences in cognitive processes related to object masking. Therefore, there was still a risk that the greater similarity of patterns evoked by unmasked extreme scenes to those evoked by average-spaciousness scenes reflected a direct effect of object masking state, rather than an effect of objects on encoded spatial properties. To achieve a more direct comparison between our behavioral results and PPA activity patterns, we used multidimensional scaling (MDS) to visualize and isolate PPA pattern dimensions that specifically corresponded to scenes' spatial properties. Matrices of pairwise Euclidean distances among the 5 scene-evoked patterns from the right PPA (Fig. 4A) were averaged across participants and passed to MDS, which produced as output the coordinates of patterns along the set of orthogonal dimensions that best accounted for the full suite of pairwise distances.
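Classical (Torgerson) MDS of the kind applied here takes the averaged distance matrix and returns coordinates along orthogonal dimensions, ordered by the share of squared between-pattern distance each explains. A minimal sketch, assuming a symmetric Euclidean distance matrix as input:

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: recover point coordinates from an
    n x n matrix of pairwise Euclidean distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]              # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    coords = evecs[:, :n_dims] * np.sqrt(np.clip(evals[:n_dims], 0, None))
    # share of total squared distance captured by each dimension
    share = np.clip(evals, 0, None) / np.clip(evals, 0, None).sum()
    return coords, share
```

When the input distances are exactly Euclidean, as with the pattern distances used here, this embedding reproduces them without error up to the rank of the configuration.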

Figure 4.

Visualization of relationships among scene-evoked patterns in the right PPA. (A) Matrix of Euclidean distances among bathroom-evoked patterns, averaged across participants. (B) Corresponding positions of patterns in two-dimensional space returned by MDS; Dimensions 1 and 2 capture 36.0% and 30.7%, respectively, of total between-pattern distance. Dashed contours are bootstrap 95% confidence ellipses for pattern coordinates, based on 10 000 resamples. Positions of patterns along the second (vertical) dimension qualitatively match relative encoded spaciousness of scenes indicated by behavioral experiments.


The positions of right PPA patterns expressed in terms of the first 2 dimensions returned by MDS are shown in Figure 4B. Taken together, these 2 dimensions accounted for the majority of total pairwise pattern distance, and individually accounted for similar shares (36.0% and 30.7% for the first and second dimensions, respectively); the remaining 2 dimensions each accounted for substantially less distance (17.5% and 15.8% for the third and fourth dimensions, respectively). The first dimension (horizontal axis in Fig. 4B) appears to arrange patterns on the basis of masking state, legitimizing our concern that greater distances from patterns evoked by average-spaciousness scenes to those evoked by extreme scenes with masked objects versus those with unmasked objects might reflect object masking per se, rather than any feature of spatial property coding. In contrast, the second dimension (shown vertically in Fig. 4B) clearly arranges patterns in order of their evoking scenes' “ground-truth” spaciousness. The coordinate for average-spaciousness scenes along this dimension is intermediate between the coordinates for high- and low-spaciousness unmasked scenes and also intermediate between the coordinates for high- and low-spaciousness masked scenes. Furthermore, coordinates for both masked and unmasked scenes at each extreme fall on the same side of the coordinate for the average-spaciousness scene. These features identify this dimension as capturing PPA sensitivity to scenes' spatial properties (Kravitz et al. 2011; Park et al. 2011; Harel et al. 2012). Neither of the 2 higher dimensions displayed these features (Supplementary Fig. 4).

Critically, coordinates of masked high- and low-spaciousness scenes along the second dimension are further from those of the average scene than are coordinates for their unmasked counterparts, exactly consistent with the centripetal bias we observed in our behavioral results. To assess the probability of observing this order by chance, we performed MDS on new distance matrices that were computed after random within-subject label swaps between activity patterns elicited by masked high-spaciousness scenes and by unmasked high-spaciousness scenes, and simultaneously between activation patterns elicited by masked low-spaciousness scenes and by unmasked low-spaciousness scenes. Across 10 000 sets of swaps, there was a probability of 0.019 of observing an MDS output dimension which (1) correctly ordered all 5 scenes in terms of their ground-truth spaciousness (as described in the previous paragraph) and (2) showed a mask-dependent increase in average distance from average- to extreme-spaciousness exemplars that was at least as large as the increase along the second dimension of Figure 4B. (At least one dimension that correctly ordered coordinates was observed for every swap; in the event that 2 such dimensions were returned, only the dimension accounting for the greater portion of pattern distance was considered.) These analyses indicate that patterns evoked in the PPA by extreme bathrooms with objects unmasked were more similar to those evoked by average bathrooms specifically along PPA pattern dimensions encoding scenes' spatial properties. No other ROI possessed a profile of pattern similarity consistent with the perceptual experiments. Data from all other ROIs, including RSC, TOS, and LOC, can be found in Supplementary Figures 5–15.
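The within-subject label-swap step can be sketched as follows; the array layout and condition indexing are our assumptions, and each yielded matrix would then be passed to MDS exactly as in the unpermuted analysis:

```python
import numpy as np

def swapped_distance_matrices(patterns, swap_pairs, n_perm, seed=0):
    """Label-swap permutation sketch. patterns: (n_subj, n_cond, n_vox)
    activity patterns; swap_pairs: condition-index pairs whose labels may
    be exchanged within each subject (here: masked vs. unmasked
    high-spaciousness, and masked vs. unmasked low-spaciousness).
    Yields the subject-averaged pairwise Euclidean distance matrix for
    each permutation."""
    rng = np.random.default_rng(seed)
    n_subj = patterns.shape[0]
    for _ in range(n_perm):
        p = patterns.copy()
        for s in range(n_subj):
            for i, j in swap_pairs:
                if rng.random() < 0.5:          # swap this pair's labels
                    p[s, [i, j]] = p[s, [j, i]]
        diffs = p[:, :, None, :] - p[:, None, :, :]
        yield np.linalg.norm(diffs, axis=-1).mean(axis=0)
```

Swapping only masked/unmasked labels within each spaciousness level preserves every subject's spatial-property structure while destroying any systematic mask-dependent geometry, which is what the test in the text requires.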

Finally, we used a searchlight analysis to identify any regions outside our selected ROIs in which relationships among scene-evoked patterns were consistent with a centripetal bias by informative objects. Consistent with our ROI analysis, in the occipitotemporal cortex, only voxel clusters corresponding to the anterior portions of PPA showed evidence of centripetal bias (Fig. 5A). Evidence for centripetal bias was also found in a single right hemispheric frontal lobe cluster (Fig. 5B).

Figure 5.

Results of searchlight analysis, showing results of second-level analysis of maps of the contrast [(pattern distance between high-/low-spaciousness masked bathrooms) minus (pattern distance between high-/low-spaciousness unmasked bathrooms)]. Statistical thresholds were determined via permutation testing, with voxel activations thresholded at P < 0.001 and a minimum cluster size of 7 voxels, which defined the 95th percentile of maximal cluster sizes across 10 000 condition label permutations. (A) A cluster of 8 voxels, centered at MNI 27/–34/–5, fell within the bounds of our group localizer for the right PPA (outlined in blue). (B) An additional cluster of 9 voxels was found in the right frontal lobe, centered at MNI 21/47/13.


## Discussion

We find that scenes' encoded spatial properties are influenced by the presence of informative objects, which bias encoded properties toward the average of each scene's category. This centripetal bias was evident both perceptually and in activity patterns in the PPA, a region that has been linked to processing of scenes' spatial properties. Because scenes' actual objective spatial properties are to some extent determined by the objects within them, it would not have been very surprising to find that the addition of objects exerted a negative effect on scenes' encoded spatial properties. Critically, however, we found that the presence of informative objects led to both high-spaciousness scenes being encoded as smaller and low-spaciousness scenes being encoded as larger. This contingent directionality indicates that the presence of objects influenced scenes' encoded spatial properties above and beyond what would be expected from objects' simple occupancy of space.

### Potential Explanations for Centripetal Bias

Moving beyond attention, it is difficult to explain the centripetal bias as the outcome of some "direct" cognitive effect of object masking (i.e., an effect not mediated by a change in encoded spatial properties). Although it is possible, and perhaps even likely, that cognitive processes related to object recognition were differentially engaged by masked and unmasked adapting scenes, this difference is unlikely to explain our results, for 2 reasons. First, it is unlikely that purely object-related cognitive differences would have influenced aftereffects on the perceived spatial properties of test scenes. Second, even if they could exert such an influence, it is even less likely that they could have produced the bidirectional enhancements in aftereffects we observed: any object-based differences in cognitive processing between masked and unmasked adapting scenes would have been identical for high- and low-spaciousness adapters, so any resulting contamination of spatial codes should likewise have been identical. This conflicts with our observation that object masking exerted opposite effects on the encoded spatial properties of high- and low-spaciousness adapting scenes.

### Crosstalk Theory

Rather than an outcome of decision feedback or attention, we propose that the centripetal bias reflects a heretofore undescribed form of crosstalk between object- and spatial property-encoding pathways. In this theory, objects associated with a given scene category contribute a "normalizing" signal to codes for spatial properties, bringing potentially highly excursive encoded values into closer register with those typical of the scene category the objects are associated with. In contrast to the feedback account rejected above, in this framework the centripetal influence of objects precedes scene recognition. Moreover, we propose that the purpose of this influence is to assist scene recognition by easing potential conflicts between scene category judgments derived from object contents and spatial properties.

For an example of how this might work, let us return to the task of deciding whether a room in an unfamiliar house is a bathroom, perhaps after being told that both a bathroom and a kitchen (but no other room type) can be found along a hallway one is walking down. These room categories differ both in their typical object contents (Table 1) and in their average spatial properties (Fig. 6A). As such, upon viewing the first encountered room, it can be expected that hypotheses about its identity could be generated from both its object contents and its spatial properties. (We use the term “hypothesis” here to avoid any mechanistic implications attached to “schema” or “context frame”.) Let us assume that this room happens to be an inordinately large bathroom. Owing to the high degree of overlap between real-world distributions of the sizes of bathrooms and kitchens, it is quite possible that this room's extreme spatial properties may place it on the “kitchen” side of a neutral spatial property-based category criterion. This would generate a spatial property-based hypothesis that conflicts with the hypothesis generated from its object contents, which we will assume leave no doubt about the room's category. A final judgment of the room's identity therefore requires some means of reconciling these competing hypotheses. To do so would presumably require consideration of a variety of factors to determine the appropriate weight that should be given to each hypothesis, a process that might take time and offer added opportunities for error.

Figure 6.

Hypothesized role for centripetal bias in scene categorization. (A) Histograms of crowd-sourced spaciousness ratings of the 100 bathrooms (gray) and 100 kitchens (black), all with unmasked objects, drawn from the middle spaciousness quintile of each category and accumulated across 61 observers (bathroom n = 609; kitchen n = 607). Ratings were solicited from paid online raters (separate from those who contributed to scenes' quintile assignments), each of whom ranked a pool of 50 bathroom and 50 kitchen images by the perceived spaciousness of the depicted rooms, without regard to category. X-axis values are within-subject z-transforms of raw ratings. The means of these distributions differ significantly (two-tailed t-test, t(1214) = 4.93, P = 9.3 × 10−7). These data are shown only to establish that average-sized real-world bathrooms tend to be judged as smaller than average-sized real-world kitchens, albeit with significant overlap; we infer from them that average-sized rooms in each category possess similarly differing distributions of actual spatial scales. No inferences about scene categorization mechanisms are drawn from these data. Dashed curves are normal distributions fit to the data. (B) Schematized versions of the distributions in A. Because the distributions of spatial properties overlap between the 2 room categories, any fixed spatial property-based decision criterion (vertical dashed line) will produce some proportion of spatial property-based categorizations that conflict with scenes' object contents; this fraction is represented by the union of the horizontal-lined and dark gray-shaded regions. (This example assumes that object contents are perfectly informative of scene category.) By narrowing the distributions of encoded spatial properties (dotted curves), object-triggered centripetal bias reduces overlap between the internal representations of the categories' spatial properties, potentially producing a smaller proportion of conflicted categorizations (dark gray-shaded region alone). Although centripetal bias is illustrated here as a reduction in the SDs of normal distributions, nonuniform centripetal effects (e.g., applied only to the distributions' tails) would produce a similar outcome.
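As a small illustration of the within-subject z-transform mentioned in the caption (a hypothetical sketch; the function and variable names are ours, not the authors' code):

```python
import numpy as np

def zscore_within_rater(ratings_by_rater):
    """Z-transform each rater's raw spaciousness ratings against that rater's
    own mean and SD, making ratings comparable across raters who may use the
    rating scale differently."""
    return {rater: (np.asarray(r, dtype=float) - np.mean(r)) / np.std(r)
            for rater, r in ratings_by_rater.items()}

# Hypothetical ratings from two online raters with different scale usage:
z = zscore_within_rater({"rater1": [1, 2, 3, 4], "rater2": [10, 20, 60, 90]})
```

After the transform, each rater's ratings have zero mean and unit SD, so the pooled histograms in panel A reflect relative, not absolute, scale usage.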

We propose that crosstalk aids scene recognition by reducing the frequency with which this reconciliation process is required. By driving the encoded spatial properties of the very large bathroom toward those of the average bathroom, the centripetal bias we observed reduces the probability that the hypothesis of scene identity derived from those properties will conflict with the hypothesis derived from the scene's object contents. Assessed across encounters with many scenes, we propose that centripetal bias narrows the distributions of each scene category's encoded spatial properties, reducing the degree of overlap they would show if space were encoded veridically, and consequently decreasing the proportion of scenes on the “wrong” side of neutral spatial property criteria between categories (Fig. 6B). We expect that the resulting harmonization of category hypotheses derived from encoded spatial properties with those derived from objects would improve the speed and accuracy of categorization.
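The claim that narrowing the encoded distributions reduces conflicted categorizations can be made concrete with a toy calculation on hypothetical Gaussian parameters (the means, SDs, and criterion below are illustrative, not fit to our data):

```python
import math

def conflict_fraction(mean, sd, criterion, side):
    """Probability mass of a normal(mean, sd) distribution falling on the
    'wrong' side of a fixed spatial-property decision criterion."""
    z = (criterion - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - cdf if side == "above" else cdf

# Hypothetical: bathrooms average smaller (mean -0.5) than kitchens, with
# the criterion at 0. Centripetal bias narrows the SD from 1.0 to 0.6.
crit = 0.0
veridical = conflict_fraction(-0.5, 1.0, crit, "above")  # large bathrooms misread as kitchens
biased = conflict_fraction(-0.5, 0.6, crit, "above")     # after centripetal narrowing
assert biased < veridical
```

With these illustrative numbers, the fraction of bathrooms landing on the "kitchen" side of the criterion drops from roughly 31% to roughly 20%, mirroring the shrinking dark gray region in Figure 6B.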

Based on this theory, we expect that the degree of centripetal bias produced by an object should bear some relationship to the strength of its association with a specific scene category; that is, the identities of the masked objects in both our behavioral and fMRI experiments mattered. Although our study did not directly test this relationship, the alternative, that all objects are equipotent in inducing centripetal bias, is virtually impossible to reconcile with the bidirectional bias we observed. Consider a completely empty room with a floor area intermediate between the average floor areas of 2 room categories that reliably differ in size. The addition of an object with no association to any particular scene category will, by definition, add no information about the identity of the scene, leaving the direction of any potential induced bias unspecified: should the bias be toward the smaller or the larger scene category? With the target of bias undefined, such an object cannot produce any bias, seemingly negating the possibility that all objects are equipotent in producing it. We therefore consider it extremely likely that the centripetal bias we observed depended on the identities of the objects that varied in masking state.

We acknowledge, however, that our results do not tell us whether the ability of objects to bias scenes' encoded spatial properties derives from objects' statistical associations with specific base-level scene categories, such as "bathroom" versus "kitchen", or with scenes grouped at some higher taxonomic level, such as "indoor scenes" versus "outdoor scenes." In other words, while our results are consistent with our hypothesis that objects bias encoded spatial properties toward the average values of bathrooms or kitchens, they leave open the possibility that objects biased encoded properties toward those of the average indoor room. This ambiguity exists because the high- and low-spaciousness adapting scenes we used from each scene category were likely extreme enough that they bracketed the average spatial properties of both categories, and quite possibly those of the average across all indoor scene categories. However, while we cannot identify with certainty the level of scene specificity at which the centripetal bias operated, a bias targeting the spatial properties of base-level categories would be more adaptive than one targeting the average properties of a higher taxonomic cluster, such as indoor scenes. This is because a bias targeting the average indoor room, while benefiting indoor/outdoor distinctions, would simultaneously harm distinctions among base-level categories of indoor or outdoor scenes by compressing the range of encoded spatial properties of all categories within each group toward a single point. In contrast, a bias that targeted base-level scene categories, and therefore aided distinctions among them, would be at worst neutral with respect to higher taxonomic distinctions, such as indoor versus outdoor. Additional experiments are necessary to clarify this issue. We emphasize, however, that this uncertainty does not challenge our crosstalk interpretation of the centripetal bias but merely raises questions about the level of scene distinctions that might benefit from it.

Although we favor the idea that the centripetal bias targets base-level scene categories, our crosstalk theory does not predict that all such categories will be equally susceptible. Instead, assuming equally strong object associations, the magnitude of centripetal bias should vary with the strength of scene categories' associations with any particular set of spatial properties. Specifically, we predict that the centripetal bias should be stronger for indoor scenes (such as the bathrooms and kitchens used here), which tend to occupy a relatively narrow range of real-world sizes, than for outdoor scenes. Given this, we do not interpret the fact that objects “controlled” spatial properties in this experiment to indicate that objects enjoy a general position of superiority over spatial properties during processing of all scenes. Thus, an important future test of our crosstalk hypothesis will be to show not only that the centripetal bias exists beyond the narrow range of scene types used in the present study, but also that it fails predictably for scene categories not strongly associated with any particular spatial scale.

Our crosstalk theory thus holds that informative objects benefit scene categorization in 2 ways: by directly activating schemata of their associated scenes and by biasing encoded spatial properties to reduce conflicts with properties associated with those schemata. This view makes the testable prediction that the presence of informative objects should aid performance on a binary scene discrimination task more under conditions that allow objects to produce a centripetal bias, such as when scene exemplars possess spatial properties departing from their category averages, versus when they do not, such as when exemplars from at least one category already match the spatial properties typical of their category. The competing view that the centripetal bias reflects feedback from object-activated schemata predicts no such difference. We expect, therefore, that future experiments will be able to clarify whether our feedforward crosstalk explanation of centripetal bias is correct.

### Role of Parahippocampal Cortex

Matching our behavioral results, activity patterns evoked in the right PPA by scenes at each spatial extreme were more similar to patterns associated with the opposite extreme when objects were unmasked versus when they were masked. This correspondence to our behavioral results was not observed in any other ROI. Although PPA has been shown to be sensitive to low-level properties of stimuli, such as spatial frequency (Rajimehr et al. 2011; Zeidman et al. 2012) and texture (Cant and Goodale 2007; Cant and Xu 2012), response differences between unmasked and masked scenes are unlikely to have arisen from differences in low-level properties. Any influence of object masking on these properties should have taken the same sign for both high- and low-spaciousness scenes, whereas the influence of objects along the PPA space-coding dimension operated in opposite directions depending on scene spaciousness. While PPA activity has been shown previously to relate to spatial properties of scenes (Kravitz et al. 2011; Park et al. 2011; Harel et al. 2012) and to human judgments of scene category (Peelen et al. 2009; Walther et al. 2009), the present study joins a very small group demonstrating that PPA activity tracks scenes' encoded spatial properties even when those properties depart from physical reality (Park et al. 2007; Chadwick et al. 2013).
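The pattern comparisons discussed here rest on distances between condition-evoked multivoxel patterns. A common metric for such comparisons is correlation distance; the sketch below is illustrative of that general approach, not a claim about the exact metric used at every analysis step.

```python
import numpy as np

def correlation_distance(p1, p2):
    """1 - Pearson r between two voxelwise activation patterns: 0 for
    identical patterns, 2 for perfectly anticorrelated ones."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return 1.0 - np.corrcoef(p1, p2)[0, 1]

# Toy 4-voxel patterns (hypothetical values) for two conditions:
high_spaciousness = [1.0, 0.2, 0.5, 0.9]
low_spaciousness = [0.8, 0.3, 0.6, 0.7]
d = correlation_distance(high_spaciousness, low_spaciousness)
```

Under this metric, the finding that unmasked high- and low-spaciousness patterns were closer together than their masked counterparts corresponds to a smaller correlation distance in the unmasked condition.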

Viewed in the framework of our crosstalk theory, our results suggest that PPA, at least in the right hemisphere, is the brain area in which encoded spatial properties of scenes are brought into alignment with expectations derived from scenes' object contents. Our assignment to the PPA of this role as a junction point between codes for objects and spatial properties is consistent with its recent characterization (Harel et al. 2012) as the midpoint in a hierarchy of scene processing regions which ranges from the purely object-sensitive LOC to the purely space-sensitive RSC. Moreover, our results offer an alternative perspective on the contentious issue of the origin of object-evoked activity in the PPA (Aminoff et al. 2007; MacEvoy and Epstein 2009; Harel et al. 2012; Troiani et al. 2012), which has alternately been explained in terms of either object-triggered spatial representations (Epstein and Ward 2010; Mullally and Maguire 2011, 2013) or contextual associations among objects (Bar and Aminoff 2003; Aminoff et al. 2007, 2013). Our results suggest that the presence of object-evoked activity in the PPA also reflects the object information necessary for centripetal bias to take place. Indeed, as our crosstalk theory is based on associations between spatial properties and nonspatial information (i.e., object identity), the role we ascribe to the PPA as the effector of centripetal bias appears consistent with both the context- and layout-centered views of its function in scene processing.

Neither our ROI-level nor searchlight analyses showed a similar effect of objects on encoded spatial properties in the left PPA. This laterality can be separated into 2 distinctions between the right and left PPA. First, unlike patterns from the right PPA, patterns from the left PPA failed to pass even the basic test of distinguishing significantly among unmasked scenes on the basis of spatial properties: Pattern distances between high- and low-spaciousness unmasked exemplars were not significantly greater than pattern distances between those exemplars and the average-spaciousness exemplars. The reason for this failure is not clear, although some research has suggested that left parahippocampal cortex may have a relatively reduced capacity for spatial processing (Wagner et al. 1998; Kirchhoff et al. 2000; Epstein et al. 2003; Stevens et al. 2012), which may have been less apparent in previous MVPA studies of PPA spatial sensitivity that used scenes spanning a much greater range of spatial properties than those we used (Kravitz et al. 2011). Second, we observed no significant effect of object masking on relationships among left PPA activity patterns. This potentially reflects the demonstrated greater sensitivity of right parahippocampal cortex to the specific visual contents of scenes, contrasting with a greater capacity for abstraction in the left parahippocampal cortex (Koutstaal et al. 2001; Xu et al. 2007; Stevens et al. 2012).

While the medial temporal cluster identified by our searchlight analysis fell within the boundaries of group-defined PPA, it is positioned markedly anteriorly in the parahippocampal cortex. Our results thus join a growing set of findings suggesting that PPA is differentiable along its rostrocaudal axis in terms of both response properties (Rajimehr et al. 2011; Nasr et al. 2013) and connectivity (Baldassano et al. 2013), and dovetail closely with some of them. For instance, anterior PPA has recently been shown to be much less sensitive to objects than posterior PPA (Baldassano et al. 2013), and less sensitive to the high spatial frequencies that might convey information about objects (Rajimehr et al. 2011). While this might appear at first to conflict with our searchlight map showing the most prominent effect of objects in anterior PPA, it is important to remember that the searchlight analysis identified subregions whose patterns were biased by the presence of objects in scenes, not necessarily those that contained information about the identities of the objects. Furthermore, inasmuch as the centripetal bias appears to be a rather high-level refinement of spatial property codes, it makes sense that it would be found in anterior PPA, which shares a greater degree of connectivity with fronto-parietal networks than posterior PPA (Baldassano et al. 2013). It is noteworthy in this regard that the only other area showing evidence of centripetal bias in our searchlight analysis was a cluster in prefrontal cortex. Whether this indicates a functional association with PPA is unclear, but to the extent that it might, we are inclined toward the view that it results from prefrontal mirroring of a centripetal bias that arises in the PPA, potentially marking the channel through which PPA spatial codes contribute to categorical decisions.

In summary, although scenes' spatial properties and object contents are formally independent descriptors of scenes, both our behavioral and fMRI results show that this theoretical independence is not respected by the visual system. While it has long been appreciated that objects can influence judgments of scene category, the biasing influence of objects on encoded spatial properties that we observed has been neither previously described nor explicitly predicted by scene recognition models. We propose that this bias reflects a system of object/spatial property crosstalk that supports generation of unified judgments of scene category by reducing potential categorization conflicts. Further perceptual and neuroimaging experiments will be necessary to understand the neuroanatomical basis of this phenomenon, and to explicitly test the hypothesis that it aids the accuracy and speed of scene recognition.

## Funding

This work was funded by Boston College.

## Notes

The authors thank Lauren Beebe, Chris Gagne, Emilie Josephs, and Molly LaPoint for assistance with stimulus generation, Zoe Yang for assistance with data collection, and Russell Epstein for comments on the manuscript. Conflict of Interest: None declared.

## References

Aguirre GK, Mattar MG, Magis-Weinberg L. 2011. de Bruijn cycles for neural decoding. NeuroImage. 56:1293–1300.
Aguirre GK, Zarahn E, D'Esposito M. 1998. An area within human ventral cortex sensitive to "building" stimuli: evidence and implications. Neuron. 21:373–383.
Aminoff E, Gronau N, Bar M. 2007. The parahippocampal cortex mediates spatial and nonspatial associations. Cereb Cortex. 17:1493–1503.
Aminoff EM, Kveraga K, Bar M. 2013. The role of the parahippocampal cortex in cognition. Trends Cogn Sci. 17:379–390.
Anstis S, Verstraten FA, Mather G. 1998. The motion aftereffect. Trends Cogn Sci. 2:111–117.
Antes JR, Penland JG, Metzger RL. 1981. Processing global information in briefly presented pictures. Psychol Res. 43:277–292.
Baldassano C, Beck DM, Fei-Fei L. 2013. Differential connectivity within the parahippocampal place area. NeuroImage. 75:228–237.
Bar M. 2004. Visual objects in context. Nat Rev Neurosci. 5:617–629.
Bar M, Aminoff E. 2003. Cortical analysis of visual context. Neuron. 38:347–358.
Bar M, Ullman S. 1996. Spatial context in recognition. Perception. 25:343–352.
Biederman I. 1972. Perceiving real-world scenes. Science. 177:77–80.
Biederman I. 1987. Recognition-by-components: a theory of human image understanding. Psychol Rev. 94:115–147.
Biederman I, Mezzanotte RJ, Rabinowitz JC. 1982. Scene perception: detecting and judging objects undergoing relational violations. Cognit Psychol. 14:143–177.
Boyce SJ, Pollatsek A. 1992. Identification of objects in scenes: the role of scene background in object naming. J Exp Psychol Learn Mem Cogn. 18:531–543.
Brainard DH. 1997. The Psychophysics Toolbox. Spat Vis. 10:433–436.
Cant JS, Goodale MA. 2007. Attention to form or surface properties modulates different regions of human occipitotemporal cortex. Cereb Cortex. 17:713–731.
Cant JS, Xu Y. 2012. Object ensemble processing in human anterior-medial ventral visual cortex. J Neurosci. 32:7685–7700.
Chadwick MJ, Mullally SL, Maguire EA. 2013. The hippocampus extrapolates beyond the view in scenes: an fMRI study of boundary extension. Cortex. 49:2067–2079.
Davenport JL, Potter MC. 2004. Scene consistency in object and background perception. Psychol Sci. 15:559–564.
De Graef P, Christiaens D, d'Ydewalle G. 1990. Perceptual effects of scene context on object identification. Psychol Res. 52:317–329.
Drucker DM, Aguirre GK. 2009. Different spatial scales of shape similarity representation in lateral and ventral LOC. Cereb Cortex. 19:2269–2280.
Epstein R, Higgins J. 2006. Differential parahippocampal and retrosplenial involvement in three types of visual scene recognition. Cereb Cortex. 17:1680–1693.
Epstein RA, Graham KS, Downing PE. 2003. Viewpoint-specific scene representations in human parahippocampal cortex. Neuron. 37:865–876.
Epstein RA, Kanwisher N. 1998. A cortical representation of the local visual environment. Nature. 392:598–601.
Epstein RA, Ward EJ. 2010. How reliable are visual context effects in the parahippocampal place area? Cereb Cortex. 20:294–303.
Fei-Fei L, Iyer A, Koch C, Perona P. 2007. What do we perceive in a glance of a real-world scene? J Vis. 7:10, 1–29.
Friedman A. 1979. Framing pictures: the role of knowledge in automatized encoding and memory for gist. J Exp Psychol. 108:316–355.
Greene MR, Oliva A. 2010. High-level aftereffects to global scene properties. J Exp Psychol Hum Percept Perform. 36:1430–1442.
Greene MR, Oliva A. 2009. Recognition of natural scenes from global properties: seeing the forest without representing the trees. Cognit Psychol. 58:137–176.
Harel A, Kravitz DJ, Baker CI. 2012. Deconstructing visual scenes in cortex: gradients of object and spatial layout information. Cereb Cortex. 23:947–957.
Hassabis D, Kumaran D, Maguire EA. 2007. Using imagination to understand the neural basis of episodic memory. J Neurosci. 27:14365–14374.
Hollingworth A, Henderson JM. 1998. Does consistent scene context facilitate object perception? J Exp Psychol Gen. 127:398–415.
Honey C, Kirchner H, VanRullen R. 2008. Faces in the cloud: Fourier power spectrum biases ultrarapid face detection. J Vis. 8:1–13.
Howard LR, Kumaran D, Ólafsdóttir HF, Spiers HJ. 2011. Double dissociation between hippocampal and parahippocampal responses to object–background context and scene novelty. J Neurosci. 31:5253–5261.
Joubert OR, Rousselet GA, Fize D, Fabre-Thorpe M. 2007. Processing scene context: fast categorization and object interference. Vision Res. 47:3286–3297.
Julian JB, Fedorenko E, Webster J, Kanwisher N. 2012. An algorithmic method for functionally defining regions of interest in the ventral visual pathway. NeuroImage. 60:2357–2364.
Kirchhoff BA, Wagner AD, Maril A, Stern CE. 2000. Prefrontal-temporal circuitry for episodic encoding and subsequent memory. J Neurosci. 20:6173–6180.
Koutstaal W, Wagner AD, Rotte M, Maril A, Buckner RL, Schacter DL. 2001. Perceptual specificity in visual object priming: functional magnetic resonance imaging evidence for a laterality difference in fusiform cortex. Neuropsychologia. 39:184–199.
Kravitz DJ, Peng CS, Baker CI. 2011. Real-world scene representations in high-level visual cortex: it's the spaces more than the places. J Neurosci. 31:7322–7333.
Kriegeskorte N, Goebel R, Bandettini P. 2006. Information-based functional brain mapping. Proc Natl Acad Sci USA. 103:3863–3868.
Leopold DA, O'Toole AJ, Vetter T, Blanz V. 2001. Prototype-referenced shape encoding revealed by high-level aftereffects. Nat Neurosci. 4:89–94.
Little AC, DeBruine LM, Jones BC. 2005. Sex-contingent face after-effects suggest distinct neural populations code male and female faces. Proc R Soc B Biol Sci. 272:2283–2287.
Loftus GR, Nelson WW, Kallman HJ. 1983. Differential acquisition rates for different types of information from pictures. Q J Exp Psychol Sect A. 35:187–198.
MacEvoy SP, Epstein RA. 2011. Constructing scenes from objects in human occipitotemporal cortex. Nat Neurosci. 14:1323–1329.
MacEvoy SP, Epstein RA. 2009. Decoding the representation of multiple simultaneous objects in human occipitotemporal cortex. Curr Biol. 19:943–947.
MacEvoy SP, Yang Z. 2012. Joint neuronal tuning for object form and position in the human lateral occipital complex. NeuroImage. 63:1901–1908.
McCotter M, Gosselin F, Sowden P, Schyns P. 2005. The use of visual information in natural scenes. Vis Cogn. 12:938–953.
Morgan LK, MacEvoy SP, Aguirre GK, Epstein RA. 2011. Distances between real-world locations are represented in the human hippocampus. J Neurosci. 31:1238–1245.
Mudrik L, Lamy D, Deouell LY. 2010. ERP evidence for context congruity effects during simultaneous object–scene processing. Neuropsychologia. 48:507–517.
Mullally SL, Maguire EA. 2013. Exploring the role of space-defining objects in constructing and maintaining imagined scenes. Brain Cogn. 82:100–107.
Mullally SL, Maguire EA. 2011. A new role for the parahippocampal cortex in representing space. J Neurosci. 31:7441–7449.
Nasr S, Devaney KJ, Tootell RBH. 2013. Spatial encoding and underlying circuitry in scene-selective cortex. NeuroImage. 83:892–900.
Nichols TE, Holmes AP. 2002. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum Brain Mapp. 15:1–25.
Oliva A, Schyns PG. 2000. Diagnostic colors mediate scene recognition. Cognit Psychol. 41:176–210.
Oliva A, Torralba A. 2006. Building the gist of a scene: the role of global image features in recognition. Prog Brain Res. 155:23–36.
Oliva A, Torralba A. 2001. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis. 42:145–175.
Palmer SE. 1975. The effects of contextual scenes on the identification of objects. Mem Cognit. 3:519–526.
Park S, Brady TF, Greene MR, Oliva A. 2011. Disentangling scene content from spatial boundary: complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. J Neurosci. 31:1333–1340.
Park S, Intraub H, Yi D-J, Widders D, Chun MM. 2007. Beyond the edges of a view: boundary extension in human scene-selective visual cortex. Neuron. 54:335–342.
Peelen MV, Fei-Fei L, Kastner S. 2009. Neural mechanisms of rapid natural scene categorization in human visual cortex. Nature. 460:94–97.
Rajimehr R, Devaney KJ, Bilenko NY, Young JC, Tootell RBH. 2011. The "parahippocampal place area" responds preferentially to high spatial frequencies in humans and monkeys. PLoS Biol. 9:e1000608.
Renninger LW, Malik J. 2004. When is scene identification just texture recognition? Vision Res. 44:2301–2311.
Rhodes G, Jeffery L, Clifford CWG, Leopold DA. 2007. The timecourse of higher-level face aftereffects. Vision Res. 47:2291–2296.
Russell B, Torralba A, Murphy K, Freeman W. 2008. LabelMe: a database and web-based tool for image annotation. Int J Comput Vis. 77:157–173.
Schyns PG, Oliva A. 1994. From blobs to boundary edges: evidence for time- and spatial-scale-dependent scene recognition. Psychol Sci. 5:195–200.
Stevens WD, Kahn I, Wig GS, Schacter DL. 2012. Hemispheric asymmetry of visual scene processing in the human brain: evidence from repetition priming and intrinsic activity. Cereb Cortex. 22:1935–1949.
Summerfield JJ, Hassabis D, Maguire EA. 2010. Differential engagement of brain regions within a "core" network during scene construction. Neuropsychologia. 48:1501–1509.
Troiani V, Stigliani A, Smith ME, Epstein RA. 2014. Multiple object properties drive scene-selective regions. Cereb Cortex. 24:883–897.
Vogel J, Schiele B. 2007. Semantic modeling of natural scenes for content-based image retrieval. Int J Comput Vis. 72:133–157.
Wagner AD, Schacter DL, Rotte M, Koutstaal W, Maril A, Dale AM, Rosen BR, Buckner RL. 1998. Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. Science. 281:1188–1191.
Walther DB, Caddigan E, Fei-Fei L, Beck DM. 2009. Natural scene categories revealed in distributed patterns of activity in the human brain. J Neurosci. 29:10573–10581.
Webster M, Maclin O. 1999. Figural aftereffects in the perception of faces. Psychon Bull Rev. 6:647–653.
Webster MA, Kaping D, Mizokami Y, Duhamel P. 2004. Adaptation to natural facial categories. Nature. 428:557–561.
Xu Y, Turk-Browne NB, Chun MM. 2007. Dissociating task performance from fMRI repetition attenuation in ventral visual cortex. J Neurosci. 27:5981–5985.
Zeidman P, Mullally SL, Schwarzkopf DS, Maguire EA. 2012. Exploring the parahippocampal cortex response to high and low spatial frequency spaces. Neuroreport. 23:503–507.
Zimmer M, Kovács G. 2011. Position specificity of adaptation-related face aftereffects. Philos Trans R Soc B Biol Sci. 366:586–595.