Ventral visual cortex contains specialized regions for particular object categories, but little is known about how these regions interact during object recognition. Here we examine how the face-selective fusiform gyrus (FG) and the scene-selective parahippocampal cortex (PHC) interact with each other and with the rest of the brain during different visual tasks. To assess these interactions, we developed a novel approach for identifying patterns of connectivity associated with specific task sets, independent of stimulus-evoked responses. We tested whether this “background connectivity” between the FG and PHC was modulated when subjects engaged in face and scene processing tasks. In contrast to what would be predicted from biased competition or intrinsic activity accounts, we found that the strength of FG–PHC background connectivity depended on which category was task relevant: connectivity increased when subjects attended to scenes (irrespective of whether a competing face was present) and decreased when subjects attended to faces (irrespective of competing scenes). We further discovered that posterior occipital cortex was correlated selectively with the FG during face tasks and the PHC during scene tasks. These results suggest that category specificity exists not only in which regions respond most strongly but also in how these and other regions interact.
The human visual system is organized into regions that are specialized for different kinds of visual information. In higher-level visual cortex, distinct regions have been associated with particular object categories that are prevalent in the visual environment, including faces (Kanwisher et al. 1997; McCarthy et al. 1997; Tsao and Livingstone 2008), bodies (Downing et al. 2001; Schwarzlose et al. 2005), words (Cohen et al. 2000; Baker et al. 2007), and buildings/scenes (Aguirre et al. 1998; Epstein and Kanwisher 1998; Epstein 2008). The response properties of these regions have been studied extensively, but little is known about how these regions interact during object recognition. This is a crucial question since the cortex is highly interconnected (Rockland and Van Hoesen 1994), and information processing in the brain depends on both local and long-range neuronal interaction.
To some extent, response properties in themselves provide basic evidence about interactions. For example, the existence of multiple face-selective patches has led to the proposal that faces are processed in a functionally connected network (Haxby et al. 2000; Fairhall and Ishai 2007). This approach, however, is limited by the fact that stimulus events can elicit correlated responses in multiple visual regions whether or not they interact (Fox and Raichle 2007; Turk-Browne et al. 2010). A different approach, which we explore here, is to examine correlations that occur in the background of stimulus-locked changes. In other words, we seek to examine correlations that result from maintaining sustained task sets, independent of the evoked responses to individual stimuli. We assess such changes by measuring blood oxygen level–dependent (BOLD) correlations after modeling and removing the mean evoked response from each region, an approach henceforth referred to as “background connectivity.” This can be thought of as an extension of resting connectivity, where one does not need to assume a stable default state (e.g., Greicius et al. 2003). Instead, background connectivity can be used to assess changes in cognitive state, with the goal of understanding how patterns of functional connectivity support specific cognitive processes.
We focus on the background connectivity of 2 regions whose individual response properties have been well characterized: the face-selective fusiform gyrus (FG, also known as the fusiform face area; Kanwisher et al. 1997; McCarthy et al. 1997; Tsao and Livingstone 2008) and the scene-selective parahippocampal cortex (PHC, also known as the parahippocampal place area; Aguirre et al. 1998; Epstein and Kanwisher 1998; Epstein 2008). In particular, we test how background connectivity between the FG and PHC is influenced by face and scene processing tasks.
One possibility is that the background connectivity of category-selective regions may also be to some extent category-selective. In particular, the FG and PHC may interact differently with each other and with the rest of the brain during face versus scene processing tasks. For example, since the processing of scenes may rely on the coordination of distributed contextual associations (Bar and Aminoff 2003; Bar 2004; Köhler et al. 2005; but see Epstein and Ward 2010), scene tasks might lead to enhanced interactions (vs. face tasks) between the PHC and other specialized visual regions. If patterns of connectivity can be category-selective, then the level of background connectivity between the FG and PHC may be qualitatively different when faces are task-relevant compared with when scenes are task-relevant.
Alternatively, background connectivity may depend on more generic task demands such as the degree of stimulus competition in the environment (i.e., clutter) or the need for top-down control. For example, biased competition models suggest that neurons selective for different objects mutually inhibit each other in the presence of their preferred stimulus (Desimone and Duncan 1995; Reynolds and Chelazzi 2004; see also Kastner and Ungerleider 2000), and this kind of competition has also been suggested to occur between category-selective neural populations (Allison et al. 2002; Pelphrey et al. 2003). Such competition models predict that the correlation between neurons or regions will be affected less by the particular category being processed—since the interactions are mutually inhibitory—and more by the degree of stimulus competition. In this case, FG–PHC background connectivity may differ from a resting baseline only (or especially) when both faces and scenes are present in the visual environment, but would not differ between face and scene tasks per se.
A final possibility is that background connectivity reflects the intrinsic functional architecture of the brain and that all task-induced changes are contained in stimulus-evoked responses (which have been removed in our case). This possibility is supported by the claim that task-related activity is linearly superposed on intrinsic activity, which persists equally during rest and tasks (Fox et al. 2006; Fox and Raichle 2007). In this case, FG–PHC background connectivity would not differ from a resting baseline, regardless of whether faces or scenes are task-relevant and regardless of stimulus competition. Likewise, it is also possible that interactions are not directly mediated in visual cortex and instead occur in frontal or parietal regions (Miller and Cohen 2001).
Here we directly examine whether background connectivity in ventral visual cortex reflects the task relevance of particular object categories, the degree of stimulus competition, or the presence of task-insensitive intrinsic activity. We focus on FG and PHC because we can explore all 3 of these hypotheses with only these 2 regions. If we had instead examined connectivity between regions with similar selectivity (e.g., multiple face-selective regions), it would have been difficult to explore the role of competition. Furthermore, because FG and PHC have such different selectivities, they provide a strong test of whether category-selective processing can be understood in terms of interactions between regions, independent of the selectivity of individual regions.
Our design included 6 conditions that varied in the type of category being processed or the degree of stimulus competition: faces viewed alone (F), scenes viewed alone (S), faces attended with scenes superimposed (Fs), scenes attended with faces superimposed (Sf), faces and scenes attended while superimposed (conjunction; C), and rest (R). If background connectivity in ventral visual cortex reflects more generic types of task demands, for example as suggested by competition models, then background connectivity might vary as a function of stimulus competition (F/S vs. Fs/Sf). Further, by explicitly asking subjects to attend both categories, we tested whether task demands can in some cases reduce the effects of competition (C vs. Fs/Sf). In contrast, if interactions in ventral visual cortex can be category-selective, then background connectivity might vary as a function of which category is task-relevant (F/Fs vs. S/Sf). Finally, if background connectivity reflects a fixed intrinsic functional architecture, then we would not expect any changes from rest. Indeed, in all of these cases, the rest condition provides a meaningful baseline that can be used to assess the directionality of task-induced effects, thereby relating changes in task set to our rapidly growing knowledge of the resting human brain. We tested each of these hypotheses by first examining changes in background connectivity between the FG and PHC. To more fully characterize FG and PHC interactions, we subsequently explored the background connectivity of each of these regions with the rest of cortex.
Materials and Methods
Twenty right-handed subjects with normal or corrected-to-normal vision participated in this study (mean age = 22; 11 females). Two subjects were excluded prior to analysis due to excessive head motion and technical difficulties, respectively. Informed consent was obtained from all subjects, and the study protocol was approved by the Human Investigation Committee of the Yale University School of Medicine.
Subjects participated in 7 runs of equal duration (7 min 21 s): 1 run of a localizer task, 5 main task runs, and 1 resting run in which they fixated a central dot. Each of the main task runs corresponded to one of 5 conditions: face-only (F), scene-only (S), face-attended (Fs), scene-attended (Sf), or conjunction (C) (Fig. 1). Localizer runs always occurred at the end of the experiment. The order of the 6 rest/task runs was perfectly counterbalanced across the 18 subjects using a 6 × 6 Latin square.
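The counterbalancing scheme can be illustrated with a minimal sketch. The specific Latin square used is not reported, so the cyclic construction below is an assumption, and the names `cyclic_latin_square` and `assign_orders` are hypothetical. With 18 subjects, each of the 6 row orders is used 3 times, so every condition occupies every serial position equally often.

```python
# Conditions that were counterbalanced: rest plus the 5 main tasks.
CONDITIONS = ["R", "F", "S", "Fs", "Sf", "C"]


def cyclic_latin_square(items):
    """Return an n x n Latin square built by rotating the item list.

    Assumption: a cyclic square; the square actually used is not specified.
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(n_subjects, items):
    """Assign each subject one row of the square, cycling through the rows."""
    square = cyclic_latin_square(items)
    return [square[subject % len(items)] for subject in range(n_subjects)]


orders = assign_orders(18, CONDITIONS)
```

A cyclic square balances serial position but not first-order carryover between conditions; a balanced (Williams) design would be needed for that.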
Localizer runs consisted of an alternating block design in which 18-s stimulus blocks were interleaved with 18-s blocks of fixation. A total of 12 stimulus and 12 fixation blocks were presented. Stimulus blocks were evenly divided into 6 face and 6 scene blocks (face-fixation-scene-fixation-face-fixation …), with the order counterbalanced across subjects. Stimulus blocks consisted of 12 gray-scale images (9 × 9°) presented in a pseudorandom order from a single category. Stimuli were presented every 1500 ms with a 1-s duration and a 500-ms interstimulus interval (onsets locked to the repetition time [TR]). Face images were drawn from a set of neutral face photographs from the NimStim data set (http://www.macbrain.org/resources.htm). Scene images consisted of photographs of single houses in their natural context, collected from the Internet and stock photography discs. Subjects were instructed to detect one-back image repetitions.
Main task runs had the same structure as the localizer task, and the images were drawn from the same stimulus sets. However, in contrast to the localizer, subjects focused on the same image category in all 12 stimulus blocks for the entire run (faces only, scenes only, or faces and scenes superimposed). The face-only stimulus blocks consisted of 12 gray-scale neutral face images. The scene-only stimulus blocks consisted of 12 gray-scale house images. In both the face-only run and the scene-only run, subjects were instructed to detect one-back image repetitions.
For the face-attended, scene-attended, and conjunction runs, superimposed images were generated by first equalizing face and house stimuli (such that they had equal mean luminance and uniform local intensity histograms) and then averaging pixel intensities across pairs of face and house images. In the face-attended run, subjects were instructed to ignore the superimposed scene images and detect one-back repetitions in the face images, regardless of whether a scene also repeated. In the scene-attended run, subjects likewise detected one-back scene repetitions while ignoring faces. In the conjunction run, subjects were instructed to detect simultaneous face and scene repeats while ignoring face repetitions that did not coincide with scene repetitions, and vice versa (Fig. 1).
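The construction of the composite images can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the text describes uniform local intensity histograms, whereas the sketch applies a simpler global rank-based equalization, and the function names `equalize` and `superimpose` are hypothetical.

```python
import numpy as np


def equalize(img):
    """Map pixel intensities to a uniform distribution on [0, 1] by rank.

    Assumption: global (whole-image) equalization stands in for the local
    equalization described in the text.
    """
    flat = img.ravel()
    ranks = flat.argsort(kind="stable").argsort(kind="stable")
    return (ranks / (flat.size - 1)).reshape(img.shape)


def superimpose(face, house):
    """Average the pixel intensities of an equalized face/house pair."""
    return 0.5 * (equalize(face) + equalize(house))
```

Because every equalized image has a mean intensity of 0.5, the face and house components contribute equal mean luminance to the composite.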
For all main and localizer task runs, face and scene images were selected in a pseudorandom fashion such that each block contained either 1 or 2 repetitions. This helped ensure that subjects remained attentive throughout each block. Importantly, face- and scene-attended runs used identical stimuli and only differed in the instructions given to subjects. Conjunction runs used functionally identical stimuli, but a slightly different repetition structure to introduce the same number of conjunction repetition targets. Resting runs were of equal length to the main and localizer task runs, and subjects were instructed to passively attend a central fixation dot.
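One way to generate a 12-image block containing exactly 1 or 2 one-back repetitions is sketched below. The selection procedure actually used is not described beyond being pseudorandom, so the function `one_back_block` and its parameters are hypothetical.

```python
import random


def one_back_block(image_ids, n_images=12, rng=None):
    """Build a block of n_images with 1 or 2 immediate (one-back) repeats.

    Hypothetical sketch: distinct images are drawn first, then 1 or 2 of
    them are shown twice in a row. Each image repeats at most once, so no
    image appears three times in succession.
    """
    rng = rng or random.Random()
    n_reps = rng.choice([1, 2])
    unique = rng.sample(image_ids, n_images - n_reps)
    repeated = set(rng.sample(range(len(unique)), n_reps))
    block = []
    for index, image in enumerate(unique):
        block.append(image)
        if index in repeated:
            block.append(image)  # immediate repeat = one-back target
    return block


block = one_back_block(list(range(100)), rng=random.Random(0))
```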
Data Acquisition and Preprocessing
All data were collected at the Yale Magnetic Resonance Research Center using a 3-T Siemens Trio scanner with an 8-channel head coil. Functional images were collected using a T2*-weighted gradient echo echo-planar imaging (EPI) sequence with a 1500-ms TR, 25-ms TE, 90° flip angle, and a 64 × 64 matrix. In a given volume, 26 5-mm slices parallel to the anterior commissure–posterior commissure line were obtained with a 224-mm FOV (3.5 × 3.5 × 5 mm voxels). Two T1-weighted anatomical scans were also collected for spatial registration and normalization (high resolution and coplanar).
Preprocessing and regression analyses were carried out using FSL 4.1 and FMRIB software libraries (Analysis Group, FMRIB, Oxford, UK). All other analyses were completed using custom scripts for Matlab (Mathworks, Natick, MA). For each localizer, resting, and main task run, the first 4 volumes were discarded to allow for T1 equilibration. The remaining data were motion-corrected in 6 dimensions to the middle volume of each run, spatially smoothed using a 5-mm full-width half-maximum (FWHM) kernel and temporally high-pass filtered with a 100-s period. Functional runs were registered to the coplanar structural scan with 3 degrees of freedom (DOF), which was in turn registered to the high-resolution structural scan with 6 DOF. Each subject's high-resolution scan was normalized to the Montreal Neurological Institute template brain with 12 DOF. Using these transformations, EPI data were transformed into standard space and interpolated to 2-mm isotropic voxels.
Localizer Analysis and Seed Definition
Localizer runs were fit using a general linear model (GLM) that included boxcar regressors for face and scene blocks convolved with a hemodynamic response function (HRF). z scores were computed from “face > scene” and “scene > face” contrasts and thresholded voxel-wise at P < 0.01. Face > scene contrasts were then used to select a face-selective FG seed in each subject, defined as the peak voxel from a cluster centered on the right mid-FG. Likewise, scene-selective PHC seeds were selected in each subject from the scene > face contrasts, defined as the peak voxel from a cluster in the right PHC/collateral sulcus region. We chose to focus on the right hemisphere seeds because of a well-known bias in face processing (McCarthy et al. 1997). We report background connectivity for left FG and PHC seeds in Supplementary Figure 3 (seeds picked using identical criteria).
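The localizer design matrix can be sketched as boxcar regressors convolved with an HRF. Several details here are assumptions rather than reported facts: the double-gamma HRF parameters (peak near 6 s, undershoot near 16 s, ratio 1/6) follow a common convention and are not necessarily the FSL default, the run is assumed to begin with a face block, and 288 volumes are assumed to be aligned to block onsets.

```python
import math

import numpy as np

TR = 1.5              # seconds
VOLS_PER_BLOCK = 12   # 18 s / 1.5 s
N_CYCLES = 12         # stimulus + fixation pairs


def double_gamma_hrf(tr, duration=30.0):
    """Double-gamma HRF sampled at the TR (assumed parameters)."""
    t = np.arange(0.0, duration, tr)

    def gamma_pdf(x, shape):
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def localizer_design():
    """Return an (n_vols, 2) matrix: face and scene regressors."""
    n_vols = N_CYCLES * 2 * VOLS_PER_BLOCK
    face = np.zeros(n_vols)
    scene = np.zeros(n_vols)
    for cycle in range(N_CYCLES):
        start = cycle * 2 * VOLS_PER_BLOCK
        target = face if cycle % 2 == 0 else scene  # alternate categories
        target[start:start + VOLS_PER_BLOCK] = 1.0
    hrf = double_gamma_hrf(TR)
    return np.column_stack(
        [np.convolve(reg, hrf)[:n_vols] for reg in (face, scene)]
    )


X = localizer_design()
face_gt_scene = np.array([1.0, -1.0])  # contrast weights for face > scene
```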
For completeness, we report the evoked responses of FG and PHC in each task. Evoked responses were calculated as the percent change from the pre-block rest period to the peak of the hemodynamic response. In calculating the peak response, all time points that did not differ from the time point with the maximal response (collapsing across task) were included.
Finite Impulse Response Model
To examine whether face and scene tasks alter background connectivity, we first modeled and removed the mean evoked response from each of the 5 main task runs and then examined functional connectivity in the residuals of the model (Fig. 2). The goal of this approach is to measure changes in functional connectivity that are independent of stimulus-evoked responses and instead depend on a sustained task set (e.g., Logan and Gordon 2001; Braver et al. 2003; Fox, Snyder, Barch, et al. 2005; Dosenbach et al. 2006). In other words, background connectivity examines task-related changes that cannot be explained by considering the evoked response of each region. Note that this is a fundamentally different approach from typical GLM analyses of fMRI data, in which only the evoked response is studied. Indeed, while all GLMs produce residuals, these residuals are typically treated as error or noise and used only to evaluate the model fit. Because our task sets persist across stimulus blocks, the GLM residuals in our study may contain not only error and noise but also variance related to maintaining the task at hand.
We fit the evoked response with a finite impulse response (FIR) model because such models make no assumptions about the shape of the hemodynamic response, as opposed to models based on canonical HRFs. Approaches that explicitly model the canonical HRF never perfectly capture the idiosyncratic response profile of different regions and brains, and therefore residuals of such models will likely contain unexplained evoked responses. This unexplained variance could in turn correlate among regions with similar shaped evoked responses and lead to conclusions about background connectivity based on the hemodynamic properties of different brain regions. For similar reasons, residual task-evoked responses could also give rise to spurious connectivity in other approaches such as psychophysiological interaction (Friston et al. 1997) and dynamic causal modeling (Friston et al. 2003). The approach we pursue also has the advantage of being conceptually simple: we model the evoked response without any assumptions about shape and then examine connectivity in the scrubbed residuals of this model (Fig. 2).
Each preprocessed run was fit using a GLM consisting of 24 constant-height candlestick regressors (the FIR functions) modeling each time point in one stimulus block + fixation period (total = 36 s). Thus, any response that occurred a fixed time period after the onset of a stimulus block was captured by at least one of the 24 regressors. In addition, we included several other nuisance regressors commonly used in resting connectivity analyses (Fox, Snyder, Vincent, et al. 2005): the global mean of the BOLD time course over all voxels, the BOLD time course from 4 white matter seeds (bilateral anterior and posterior), the BOLD time course from 4 ventricle seeds (bilateral anterior and posterior), and 6 movement parameters. Because removing the global mean can in some cases induce correlations (Murphy et al. 2009), we ran a second GLM, excluding the global mean time course. Signal averaging the residuals confirmed that stimulus-evoked responses had been modeled and removed (see Supplementary Fig. 1). Data were not prewhitened at this stage of the analysis, since autocorrelations in these residuals could be important for characterizing background connectivity. In addition, resting runs were also fit using the same GLM to remove nuisance variables and to ensure that any generic effects of the model were distributed equally across resting and task runs (e.g., FIR models behave as band-stop filters).
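The FIR step can be sketched as an ordinary least-squares fit with one indicator regressor per time point of the 36-s cycle (24 regressors at a 1.5-s TR). This is a minimal illustration, assuming that volumes are aligned to block onsets and that the run length is a multiple of 24; the names `fir_design` and `remove_evoked` are hypothetical, and the actual analysis was run in FSL.

```python
import numpy as np

N_BINS = 24  # 36-s stimulus + fixation cycle / 1.5-s TR


def fir_design(n_vols, n_bins=N_BINS):
    """One indicator column per time point within the repeating cycle."""
    X = np.zeros((n_vols, n_bins))
    X[np.arange(n_vols), np.arange(n_vols) % n_bins] = 1.0
    return X


def remove_evoked(data, nuisance=None):
    """Return residuals after removing the mean evoked response.

    data:     (n_vols, n_voxels) BOLD time courses
    nuisance: optional (n_vols, k) regressors, e.g. global mean,
              white matter, ventricles, and motion parameters
    """
    X = fir_design(data.shape[0])
    if nuisance is not None:
        X = np.column_stack([X, nuisance])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta
```

Because the residuals are orthogonal to every indicator column, averaging them across cycles yields zero at each time point, which is the signal-averaging check described above. The 24 indicator columns sum to a constant, so no separate intercept is needed.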
FG–PHC Background Connectivity
To calculate FG–PHC background connectivity, we overlaid the FG and PHC localizer seeds on the residuals from each main task and resting run. We then extracted FG and PHC time courses by taking a weighted average for each time point across a cluster of voxels surrounding the peak. Weights were defined using an 8-mm FWHM Gaussian Kernel centered on the peak voxel. For each subject, we correlated the FG and PHC time courses and converted this correlation coefficient to a z score using Fisher's r-to-z transform. These z-scored correlation coefficients served as our measure of background connectivity, and they were analyzed using a hierarchical combination of repeated-measures analyses of variance (ANOVAs) and planned t-tests (see Results).
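The seed extraction and correlation steps can be sketched as follows. The sketch assumes voxel coordinates in millimeters and residuals stored as a volumes-by-voxels array; the function names are hypothetical. The Gaussian standard deviation follows from the FWHM as sigma = FWHM / (2 sqrt(2 ln 2)), about 3.4 mm for an 8-mm kernel.

```python
import numpy as np


def gaussian_weights(coords_mm, peak_mm, fwhm_mm=8.0):
    """Normalized Gaussian weights centered on the peak voxel."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    dist_sq = np.sum((coords_mm - peak_mm) ** 2, axis=1)
    weights = np.exp(-dist_sq / (2.0 * sigma ** 2))
    return weights / weights.sum()


def roi_timecourse(residuals, weights):
    """Weighted average across voxels; residuals is (n_vols, n_voxels)."""
    return residuals @ weights


def fisher_z(x, y):
    """Pearson correlation converted with Fisher's r-to-z transform."""
    r = np.corrcoef(x, y)[0, 1]
    return np.arctanh(r)
```

A voxel 4 mm from the peak (half the FWHM) receives half the weight of the peak voxel, which is a quick check that the kernel width is specified correctly.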
To examine the frequency profile of task-related changes, preprocessed FG and PHC time courses were bandpass filtered into 0.04-Hz frequency bins using an equiripple FIR digital filter. For each subject and run, this produced 7 sets of FG/PHC time courses corresponding to the following frequency ranges: 0.01–0.05, 0.05–0.09, 0.09–0.13, 0.13–0.17, 0.17–0.21, 0.21–0.25, and 0.25–0.29 Hz. For each bin, we then calculated the background connectivity between FG and PHC, producing 7 sets of band-limited connectivity data. Background connectivity was then analyzed using a similar combination of repeated-measures ANOVAs and t-tests (see Results).
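The band-limited analysis can be illustrated with a short sketch. Note the substitution: the analysis described above used an equiripple FIR digital filter, whereas the sketch below swaps in a simple FFT mask to keep the example compact, so its filter characteristics differ from those of the original filter. The function names are hypothetical.

```python
import numpy as np

TR = 1.5  # seconds
# Edges of the 7 bins: 0.01, 0.05, ..., 0.29 Hz
BAND_EDGES = [round(0.01 + 0.04 * i, 2) for i in range(8)]


def bandpass_fft(x, lo, hi, tr=TR):
    """Keep only frequencies in [lo, hi) Hz (FFT mask, not equiripple)."""
    freqs = np.fft.rfftfreq(x.size, d=tr)
    spectrum = np.fft.rfft(x - x.mean())
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)


def band_limited_connectivity(fg, phc):
    """Fisher-z correlation of two time courses within each band."""
    z_scores = []
    for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:]):
        r = np.corrcoef(bandpass_fft(fg, lo, hi),
                        bandpass_fft(phc, lo, hi))[0, 1]
        z_scores.append(np.arctanh(r))
    return np.array(z_scores)
```

With a 1.5-s TR the Nyquist frequency is about 0.33 Hz, so the highest bin (0.25–0.29 Hz) remains below the limit of what the data can resolve.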
Whole-brain analyses were similar to the FG–PHC region of interest (ROI) analysis. However, instead of correlating residual FG/PHC time courses with each other, we regressed each region's residual time course against the residual time course of every other voxel in the brain. The residuals were not extracted from the GLM analyses described above, but rather from 2 new GLM analyses of each main task and resting run. When examining whole-brain FG connectivity, we used residuals from a GLM that included the PHC time course as a nuisance regressor (in addition to the FIR and other nuisance regressors described above). Similarly, when examining whole-brain PHC connectivity, residuals were drawn from a separate GLM that included a nuisance FG regressor (among others). The goal of this partial connectivity approach, which we have explored in a recent study (Turk-Browne et al. 2010), was to isolate variance that is more functionally specific to a given region; for example, to remove variance that is generic across all high-level visual regions. This also ensured that any effects found in the whole-brain analyses were independent of the results found in the FG–PHC analysis.
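The logic of the partial connectivity approach can be sketched with a small example. It is a simplification under stated assumptions: the other region's time course is removed by a separate least-squares fit rather than inside the full FIR/nuisance GLM, and connectivity is expressed as a correlation rather than the regression z scores used in the whole-brain analysis. The function names are hypothetical.

```python
import numpy as np


def residualize(data, confound):
    """Remove an intercept and the confound time course by least squares."""
    X = np.column_stack([np.ones(len(confound)), confound])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta


def partial_seed_map(seed, voxels, confound):
    """Correlate the seed with each voxel after removing the confound.

    seed:     (n_vols,) e.g. FG residual time course
    voxels:   (n_vols, n_voxels) residual time courses
    confound: (n_vols,) e.g. PHC residual time course
    """
    seed_r = residualize(seed, confound)
    vox_r = residualize(voxels, confound)
    seed_c = seed_r - seed_r.mean()
    vox_c = vox_r - vox_r.mean(axis=0)
    norms = np.linalg.norm(seed_c) * np.linalg.norm(vox_c, axis=0)
    return (seed_c @ vox_c) / norms
```

A voxel that correlates with the seed only through variance shared with the other region drops toward zero, whereas a voxel that shares seed-specific variance keeps a high value.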
For the whole-brain FG and PHC analyses, seed time courses were extracted from the residuals of each main task and resting run GLM (containing PHC and FG nuisance regressors, respectively) and entered into a new voxel-wise GLM of the same residual data. Before fitting the GLM, prewhitening was applied to remove autocorrelations unrelated to background connectivity. At the first level, z scores were computed, testing the fit between each voxel's residual time course and the residual FG or PHC regressor. Each task condition was then compared against rest at the group level using a mixed-effects approach (FLAME 1). To correct for multiple comparisons, the two-stage cluster correction in FSL was used: first, an initial z score threshold (z > 2.3) was applied to every voxel in order to determine which clusters entered the second stage; the significance of the remaining clusters was determined by a Gaussian random fields method based on the smoothness of the data (Poline et al. 1997; Woolrich et al. 2004). The resulting statistical maps were cluster-thresholded to a corrected alpha of P < 0.05.
Calcarine Sulcus ROI Analyses
To analyze interactions between high-level and low-level visual areas, additional correlations were computed with an ROI from the calcarine sulcus (CS) in right posterior occipital cortex. The CS was chosen as an ROI because it contains representations of the upper and lower visual field in primary visual cortex (DeYoe et al. 1996; Engel et al. 1997). Since we did not have retinotopy data, this region was chosen in each subject based on anatomical and functional constraints. In particular, we chose a midline region in the posterior CS that responded robustly to both face and scene blocks in the localizer (z > 5), but did not show differential activation across these blocks (z < 0.33). The ROI analyses were conducted in the same manner as the FG/PHC ROI analysis, except that FG–CS connectivity was calculated from the residuals of the GLM that contained a PHC nuisance regressor, and PHC–CS connectivity was calculated from the residuals of the GLM that contained an FG nuisance regressor (as used in the whole-brain analyses described above).
Results
Subjects successfully completed the one-back memory task for each condition (mean accuracy > 80%). To examine whether performance differed across conditions, response accuracies for the face-alone (F), scene-alone (S), face-attended (Fs), and scene-attended (Sf) conditions (conjunction task analyzed separately below) were submitted to a 2 (category: F/Fs vs. S/Sf) × 2 (stimulus competition: F/S vs. Fs/Sf) repeated-measures ANOVA. There were significant main effects of category (F1,17 = 16.409, P = 0.0008), with greater accuracy for scenes (95.6%) than faces (88.6%), and stimulus competition (F1,17 = 26.775, P < 0.0001), with greater accuracy for no competition (96.7%) than competition (87.4%). There was also a significant interaction between the 2 factors (F1,17 = 26.781, P < 0.0001), driven by less accurate performance during the face-attended condition relative to the scene-attended and face-alone conditions (Supplementary Fig. 2A).
Response times for target trials produced a similar pattern of results. There were significant main effects of category (F1,17 = 9.062, P = 0.0079), with faster responses to scenes (566 ms) than faces (594 ms), and stimulus competition (F1,17 = 111.712, P < 0.0001), with faster responses for no competition (540 ms) than competition (619 ms). There was also a significant interaction between the 2 factors (F1,17 = 6.203, P = 0.0234), driven by slower responses during the face-attended condition (Supplementary Fig. 2B).
Response times and accuracies for the conjunction task were analyzed through direct comparisons with the face-attended and scene-attended conditions. Performance on the conjunction task did not differ from the scene-attended condition, but was significantly better than the face-attended condition (accuracy: C > Fs: t17 = 4.112, P = 0.0007; C > Sf: t < 1; response times: Fs > C: t17 = 2.846, P = 0.0112; C > Sf: t < 1). Better performance during the conjunction task relative to the face-attended task is perhaps surprising given that subjects were required to attend both faces and scenes during the conjunction run. One possible explanation is that subjects were able to use image-matching strategies to improve their performance because simultaneous face and scene repeats produced identical superimposed images.
FG–PHC Background Connectivity
To examine how visual tasks modulate FG–PHC background functional connectivity, we first modeled and removed the mean evoked response from every voxel, and then examined functional connectivity in the residuals of the model. This technique ensures that changes in the FG–PHC correlation cannot be explained by correlations in the evoked response of each region (see Materials and Methods, Fig. 2).
In particular, we examined whether background connectivity between the FG and PHC reflects the particular category being processed (i.e., faces or scenes) or more domain-general mechanisms such as the need to suppress competing information. To evaluate these 2 possibilities, we submitted FG–PHC background connectivity to the same 2 (category) × 2 (stimulus competition) repeated-measures ANOVA used to analyze the behavioral data. The ANOVA revealed a main effect of category (F1,17 = 19.623, P = 0.0004), with scene tasks eliciting greater connectivity than face tasks, but no main effect of stimulus competition (F1,17 = 2.791, P = 0.1131) and no interaction (F < 1) (Fig. 3). Follow-up comparisons revealed significantly greater connectivity during the scene-only compared with the face-only condition (t17 = 3.183, P = 0.0054), as well as during the scene-attended compared with the face-attended condition (t17 = 2.898, P = 0.0100). Thus, across very different stimulus conditions, FG–PHC background connectivity depended on which category was task-relevant, but was independent of stimulus competition. The effect in the attentional contrast (Fs vs. Sf) is particularly revealing because these conditions differed only in the instructions provided to subjects, effectively titrating the effects of top-down selection. We observed the same pattern for left hemisphere seeds (Supplementary Fig. 3), and the results did not change when excluding the global mean from the regression (Supplementary Fig. 4). Further, the evoked responses of each region were sensitive to attentional selection (Supplementary Fig. 5), replicating prior findings (O'Craven et al. 1999).
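The 2 × 2 repeated-measures ANOVA can be sketched with per-subject contrasts. In a fully within-subject 2 × 2 design, each effect has 1 numerator degree of freedom, and its F statistic with (1, n − 1) degrees of freedom equals the squared one-sample t statistic on the corresponding contrast. The function name and input layout below are hypothetical; the inputs are per-subject Fisher-z connectivity values for each condition.

```python
import numpy as np


def rm_anova_2x2(F, S, Fs, Sf):
    """Return F(1, n-1) statistics for a 2 x 2 within-subject design.

    Each argument is an (n_subjects,) array for one condition.
    """
    contrasts = {
        "category": (S + Sf) - (F + Fs),      # scene tasks vs. face tasks
        "competition": (Fs + Sf) - (F + S),   # composite vs. single images
        "interaction": (Sf - Fs) - (S - F),
    }
    n = len(F)
    stats = {}
    for name, contrast in contrasts.items():
        t = contrast.mean() / (contrast.std(ddof=1) / np.sqrt(n))
        stats[name] = t ** 2  # F(1, n-1) = t(n-1) squared
    return stats
```

With 18 subjects this yields the F1,17 statistics of the form reported here; the sign of each contrast mean gives the direction of the effect.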
To analyze the conjunction task, we directly compared background connectivity with the conditions with the same composite images (Fs, Sf). There was significantly greater connectivity in the conjunction relative to the face-attended condition (t17 = 2.667, P = 0.0163), but no difference between the conjunction and scene-attended conditions (t < 1). With respect to FG–PHC connectivity, the conjunction task appeared more similar to the scene-attended than face-attended condition.
The difference in background connectivity between face-only and scene-only conditions suggests that our results do not reflect task difficulty, since behavioral performance was equivalent across these 2 conditions (accuracy: t < 1; response times: t < 1). While the overall pattern of connectivity differed from behavioral performance, there was, however, a similar pattern within just the composite-image conditions (Fs, Sf, C): background connectivity and behavioral performance were both higher for scene-attended and conjunction tasks, relative to the face-attended condition. This raises the possibility that background connectivity for the composite-image conditions reflects task difficulty per se, and not stimulus category. To evaluate this possibility, we examined whether better performance predicted higher background connectivity in general across subjects. Specifically, within each task we correlated a subject's behavioral performance (in terms of accuracy and response times [RTs]) with the amount of background connectivity observed. RTs were inverted such that higher numbers reflected better performance; thus, an explanation of the results in terms of task difficulty would predict positive correlations between background connectivity and both accuracy and inverted RTs. However, we observed no reliable relationship between background connectivity and accuracy (rs < 0.4) and in fact a weak negative relationship between background connectivity and inverted RTs (sign test, n = 5, one-tailed P < 0.0313; Supplementary Fig. 6). These findings suggest that the pattern of background connectivity in the composite-image conditions reflects categories and not task difficulty per se.
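The sign test can be worked through explicitly. The sketch assumes that the correlation between background connectivity and inverted response times was negative in all 5 tasks, which is inferred from the reported P value rather than stated directly; the function name is hypothetical.

```python
from math import comb


def sign_test_one_tailed(n_same_sign, n_total):
    """P(at least n_same_sign of n_total share a sign) under chance (p = 0.5)."""
    favorable = sum(comb(n_total, k) for k in range(n_same_sign, n_total + 1))
    return favorable / 2 ** n_total


p = sign_test_one_tailed(5, 5)  # 1/32 = 0.03125
```

Under that assumption the one-tailed probability is 0.5 to the fifth power, or 0.03125, which is consistent with the reported bound of P < 0.0313 and is the smallest value a sign test can produce with n = 5.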
While these results demonstrate significant category differences across conditions, it is unclear how these task changes relate to the resting baseline. Qualitatively, background connectivity decreased from rest in face tasks and increased from rest in scene tasks. However, only the difference for the face-attended task approached significance (R > Fs, t17 = 2.039, P = 0.0573). Because there was no interaction with stimulus competition, we averaged over that factor to produce a measure of category-based background connectivity that we could compare with rest. These scores again did not differ from rest (R > F/Fs, t17 = 1.950, P = 0.068; S/Sf > R, t17 = 1.274, P = 0.220). It is therefore unclear from these results whether the connectivity differences between face and scene tasks were driven by face-related decreases, scene-related increases, or a combination of both increases and decreases from rest. This motivated the subsequent set of frequency analyses where we sought to resolve these differences in greater detail.
In resting connectivity studies, the analysis of correlations is typically limited to very low frequencies (e.g., 0.01–0.08 Hz; Biswal et al. 1995; Fox and Raichle 2007), with the aim of filtering out high-frequency noise. Similarly, we reasoned that certain frequency bands might be more informative than others with respect to task-induced changes, a notion supported by the finding that resting connectivity patterns qualitatively differ across frequency bands (Salvador et al. 2008). Thus, by limiting our analyses to certain subsets of the spectrum, we sought to improve our sensitivity. Since, to our knowledge, the frequency profile of task-related changes has not previously been characterized, we conducted our analyses in an exploratory manner. Indeed, one interesting possibility is that patterns of resting and task-induced connectivity may qualitatively differ in their frequency content (see Discussion).
Specifically, we calculated background connectivity after bandpass filtering the data into 0.04-Hz frequency bins, a bandwidth chosen to balance the number of bins against reasonable signal to noise. Although this choice was somewhat arbitrary, the results did not depend on the particular bandwidth (Supplementary Fig. 7). This approach resulted in 7 sets of band-limited connectivity data. In order to avoid problems with multiple comparisons, we analyzed each band using a hierarchical approach. First, each bin was submitted to the same 2 (category) × 2 (stimulus competition) repeated-measures ANOVA as before. To test for changes relative to rest, we limited analysis to those bins that showed a significant main effect of category (F/Fs vs. S/Sf) and no interaction. For these bins, we then averaged the background connectivity across the stimulus competition factor (given the lack of interaction), resulting in a measure of category-based connectivity that we compared with rest (F/Fs vs. R, S/Sf vs. R).
In the first 4 frequency bands (0.01–0.05, 0.05–0.09, 0.09–0.13, 0.13–0.17 Hz), there was a significant main effect of category (Ps < 0.03) and no interaction (Ps > 0.61). In examining the collapsed category-based measures of background connectivity (see Fig. 4), we observed a significant increase for scene tasks relative to rest in the lowest frequency band (0.01–0.05 Hz, t17 = 2.222, P = 0.0401) and a significant decrease for face tasks in the second and third frequency bands (0.05–0.09 Hz, t17 = 2.328, P = 0.0325; 0.09–0.13 Hz, t17 = 3.493, P = 0.0028). These results reveal 2 interesting properties of background connectivity: 1) significant scene increases and face decreases relative to rest and 2) strong task-based effects in frequencies above 0.09 Hz, higher than the typical low-pass threshold for resting connectivity. Indeed, the strongest task-induced changes were found in the third frequency band between 0.09 and 0.13 Hz.
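The band layout and the hierarchical screening step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study. It assumes that the 7 bins are contiguous and start at 0.01 Hz (only the first 4 are listed in the text) and that "significant" means P < 0.05. The function names and the P values in the example are hypothetical.

```python
def band_edges(start_hz=0.01, width_hz=0.04, n_bands=7):
    """Contiguous (low, high) band edges in Hz, rounded to 2 decimals."""
    return [
        (round(start_hz + i * width_hz, 2), round(start_hz + (i + 1) * width_hz, 2))
        for i in range(n_bands)
    ]

def bands_for_rest_comparison(p_category, p_interaction, alpha=0.05):
    """Indices of bands with a significant main effect of category and
    no significant category x competition interaction."""
    return [
        i
        for i, (p_cat, p_int) in enumerate(zip(p_category, p_interaction))
        if p_cat < alpha and p_int >= alpha
    ]

bands = band_edges()

# Hypothetical P values for illustration only (not the reported statistics).
p_cat = [0.01, 0.02, 0.001, 0.03, 0.40, 0.60, 0.20]
p_int = [0.70, 0.80, 0.65, 0.90, 0.50, 0.04, 0.30]

selected = [bands[i] for i in bands_for_rest_comparison(p_cat, p_int)]
print(selected)
```

Only the bands that pass this screen would then be compared with rest, which limits the number of task-versus-rest tests that are run.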
While we primarily focused on the relationship of FG and PHC, we were also interested in how tasks altered connectivity between these ROIs and the rest of cortex. In particular, we were interested in whether background connectivity with other visual regions would show qualitatively different patterns across conditions. For example, in addition to the decreased FG–PHC connectivity observed during face tasks, there may also be increased FG connectivity with regions earlier in the visual processing hierarchy. This would suggest that the FG can dynamically couple and decouple with different regions during the same task.
To examine this possibility, we conducted whole-brain analyses with FG and PHC. The FG and PHC analyses were parallel in how they were executed, so we describe just the FG analysis here. Overall, our approach for examining whole-brain FG connectivity was the same as that used for examining FG–PHC connectivity. However, instead of comparing FG to only PHC, we compared FG with every voxel in the brain. In addition, since we were interested in FG interactions that were independent of PHC, we included the PHC time course as a control regressor—thus ensuring that whole-brain FG background connectivity is statistically independent of PHC (see Materials and Methods). Finally, in order to evaluate how the main tasks influenced background connectivity, we compared this partial FG connectivity in each task with the partial FG connectivity at rest.
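The logic of controlling for the PHC time course can be illustrated with a first-order partial correlation. This is a generic sketch and not the regression pipeline described in Materials and Methods; the time series are toy values constructed so that FG and the voxel are related only through their shared PHC component, and all names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing variance shared with z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data: FG and the voxel each contain the PHC signal plus an
# independent component, so their raw correlation is driven by PHC alone.
phc = [1, 1, 1, 1, -1, -1, -1, -1]
fg = [2, 2, 0, 0, 0, 0, -2, -2]      # phc + [1, 1, -1, -1, 1, 1, -1, -1]
voxel = [2, 0, 2, 0, 0, -2, 0, -2]   # phc + [1, -1, 1, -1, 1, -1, 1, -1]

print(round(pearson(fg, voxel), 3))            # raw correlation: 0.5
print(round(partial_corr(fg, voxel, phc), 3))  # approximately zero after controlling for PHC
```

In this toy case the raw FG–voxel correlation disappears once PHC is controlled for, which is the kind of shared influence the control regressor is meant to remove.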
This approach produced 5 contrast maps for the FG at the group level, one for each main task (voxel threshold z > 2.3, cluster-corrected to P < 0.05). Interestingly, for both face tasks as well as the conjunction run (F > R, Fs > R, C > R), we observed a significant increase in FG connectivity with a cluster in posterior occipital cortex, in the vicinity of early visual areas involved in processing low-level features (Fig. 5, upper row; Supplementary Table 1A). This cluster was absent for both scene tasks (S > R, Sf > R), even when using a more liberal voxel threshold (z > 1.65). In addition, for both stimulus competition conditions (Fs > R, Sf > R), we observed increased FG connectivity with 2 clusters in frontal cortex, in the middle frontal gyrus and left frontal pole (Supplementary Fig. 8), as well as a cluster in the precuneus found only during the face-attended condition (Fs > R). No other significant increases were found outside of these regions, but several other clusters did show reduced connectivity with the FG (Supplementary Table 1B).
Whole-brain PHC analyses revealed a similar cluster in posterior occipital cortex (Fig. 5, lower row; Supplementary Table 1C). However, in opposition to the whole-brain FG analyses, this cluster was only observed during scene tasks (S, Sf) and not during face or conjunction tasks (F, Fs, C)—even when using a more liberal voxel threshold (P < 0.05). In addition, for both stimulus competition conditions (Fs > R, Sf > R), we observed increased PHC connectivity with overlapping clusters in the supramarginal gyrus (Supplementary Fig. 8), as well as a cluster in the inferior parietal sulcus found only during the scene-attended condition (Sf > R). No other significant increases were found, but several other clusters did show reduced connectivity (Supplementary Table 1D).
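The liberal thresholds are reported as z > 1.65 for the FG maps and P < 0.05 for the PHC maps. Assuming a one-tailed test on a standard normal statistic (the sidedness is not stated in this section), these describe approximately the same voxelwise criterion, as the following sketch shows. The function name is hypothetical.

```python
from statistics import NormalDist

def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal z statistic."""
    return 1.0 - NormalDist().cdf(z)

for z in (2.3, 1.65):
    print(f"z > {z}: one-tailed P = {one_tailed_p(z):.4f}")
```

Under this assumption the primary threshold (z > 2.3) corresponds to a voxelwise P of roughly 0.011, and the liberal threshold (z > 1.65) to a P just under 0.05.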
CS ROI Analyses
We observed increased FG and PHC connectivity with posterior occipital cortex, and these effects were limited to the respective category-selective tasks. Task-based modulation of background connectivity with early visual areas would be striking because of their putative lack of selectivity for high-level categories. It remains possible, however, that these effects can be explained by the low-level properties of our face and scene stimuli. Moreover, our previous analyses did not directly assess whether there were quantitative differences in the amount of connectivity for face versus scene tasks.
To address these issues, we conducted an ROI analysis between the FG, PHC, and CS. Since we did not have a retinotopic localizer (e.g., Wandell et al. 2007), we defined a CS ROI in each subject based on anatomical and functional constraints as explained in Materials and Methods. Most importantly, we chose a CS seed that did not respond differentially to faces and scenes in the localizer. As a result, we could examine how background connectivity between category-selective high-level regions and nonselective low-level regions changed across face and scene tasks.
Our analysis approach was identical to that used for FG–PHC interactions, except that we also removed shared variance between FG and PHC to isolate variance that is relatively unique to these regions (see Materials and Methods). Background connectivity with the CS was analyzed with the same 2 (category) × 2 (stimulus competition) repeated-measures ANOVA. For both FG–CS and PHC–CS connectivity, we observed a main effect of category (FG–CS: F1,17 = 17.984, P = 0.0006; PHC–CS: F1,17 = 7.070, P = 0.0165). Importantly, the main effect of category was driven by opposite patterns for FG–CS and PHC–CS connectivity (Fig. 6): face tasks increased FG–CS connectivity relative to scenes (F/Fs > S/Sf), while scene tasks increased PHC–CS connectivity relative to faces (S/Sf > F/Fs). There was no main effect of stimulus competition (FG–CS: F1,17 = 1.396, P = 0.2537; PHC–CS: F1,17 = 1.173, P = 0.294) and no interaction (FG–CS: F1,17 = 2.272, P = 0.1501; PHC–CS: F1,17 = 1.079, P = 0.3136). These results demonstrate that both face- and scene-selective background connectivity can be observed with a single, nonselective region.
Follow-up comparisons revealed that FG–CS connectivity increased during the face-alone condition relative to scene-alone and resting conditions (F > S: t17 = 3.829, P = 0.0013; F > R: t17 = 5.872, P < 0.0001). In contrast, PHC–CS connectivity increased during the scene-alone condition relative to face-alone and resting conditions (S > F: t17 = 2.263, P = 0.0370; S > R: t17 = 2.974, P = 0.0085). The same pattern was observed for the stimulus competition conditions, reaching significance for FG–CS connectivity (Fs > Sf: t17 = 2.289, P = 0.0352; Fs > R: t17 = 3.921, P = 0.0011), but not PHC–CS connectivity (Sf > Fs: t17 = 1.500, P = 0.1520; Sf > R: t < 1). Finally, it is interesting to note that resting connectivity for FG–CS and PHC–CS did not differ from zero (ts < 1), in contrast with the strong resting connectivity observed for FG–PHC (t17 = 5.590, P < 0.0001). This provides a novel demonstration that task-induced interactions can be observed between regions with little or no resting connectivity.
By examining functional connectivity in the background of tasks, we explored how category-selective regions interact during visual processing. In particular, we investigated whether the interactions of category-selective regions depend on which category is task-relevant, or instead reflect domain-general competitive mechanisms or a fixed intrinsic functional architecture. We focused on 2 well-characterized regions in human ventral visual cortex: the face-selective FG (Kanwisher et al. 1997; McCarthy et al. 1997; Tsao and Livingstone 2008) and the scene-selective PHC (Aguirre et al. 1998; Epstein and Kanwisher 1998; Epstein 2008). As an index of the interactivity between these regions during high-level visual processing, we measured the correlation in their BOLD time course during different perceptual tasks after removing mean evoked responses (background connectivity). We manipulated which stimulus category was task-relevant (faces vs. scenes) and whether stimuli were themselves in competition (faces/scenes alone vs. faces/scenes superimposed).
Our results suggest that FG–PHC background connectivity is particularly sensitive to which category is task-relevant. Face and scene tasks resulted in opposite changes relative to rest: FG–PHC background connectivity decreased during face tasks and increased during scene tasks. Thus, our results suggest that FG–PHC interactions are asymmetric with respect to face and scene processing. Interestingly, these changes occurred irrespective of whether the stimuli were presented alone or whether they were spatially superimposed and required selective attention. Since these effects were observed in the selective attention conditions (when bottom-up stimuli were identical), our results demonstrate that FG–PHC interactions can be modulated by top-down goals.
Examining background connectivity in the whole brain, we observed regions of posterior occipital cortex that selectively coupled with the FG during face tasks and the PHC during scene tasks. Follow-up ROI analyses revealed that these category-specific effects persisted after limiting analysis to a region of the CS that did not respond differentially to faces and scenes. Thus, category-specific visual processing alters cortical interactions with nonselective regions of visual cortex that process low-level features.
While past studies have focused on the individual regions activated by different visual categories, these findings suggest that patterns of connectivity can also be selective for different categories. Below, the implications and limitations of these results are discussed in the context of category-selective processing and functional connectivity.
Our initial question concerned whether the interactions between category-selective regions have a distinct functional profile and thus change in response to certain visual tasks. To explore this question, we examined whether FG–PHC background connectivity reflects the task relevance of particular categories or more domain-general types of processing. For example, by analogy to competition models (Desimone and Duncan 1995), neural populations selective for different categories might symmetrically inhibit each other (Allison et al. 2002; Pelphrey et al. 2003); thus the correlation between regions selective for different categories might be reduced in the presence of either preferred stimulus.
Our results, however, suggest that FG–PHC background connectivity is primarily sensitive to differences between categories. Thus, in contrast to what might be expected from competition models (Allison et al. 2002; Pelphrey et al. 2003), we observed asymmetric changes with respect to face and scene processing: background connectivity decreased during face tasks and increased during scene tasks. This suggests that the effects of category-specific processing may extend beyond the evoked responses of single regions and may in part reflect the functional organization of category-sensitive interactions, including between regions with very different selectivities.
One explanation for this category asymmetry that is still somewhat consistent with the notion of competition is that FG and PHC may differ in the extent to which they inhibit each other. For example, competition may be biased toward faces by default (even during rest), such that scene processing does not result in additional inhibition from the PHC to the FG (but not vice versa). A lack of inhibition from the PHC is also consistent with a default attentional priority for processing faces, supported by the finding that faces capture attention even when embedded in a complex scene (Vuilleumier and Schwartz 2001).
Another complementary interpretation of our findings that does not invoke competition is that face and scene perception represent qualitatively different types of high-level visual processing. Specifically, scene processing may be more global and relational than face processing, and thus—to the extent that background connectivity serves as a proxy for neuronal interaction—scene processing may be accompanied by comparatively greater FG–PHC background connectivity. For example, PHC might serve to bind multiple objects and faces into coherent scenes (Bar and Aminoff 2003; Bar 2004; Köhler et al. 2005) and therefore benefit from coordination with face-selective cortical regions. At the same time, face-selective regions such as FG may depend on category-specific computations that do not benefit from interaction with PHC. In support of this idea, activation is observed in PHC during the retrieval of contextually relevant objects, but not in FG (Janzen and van Turennout 2004). More generally, objects with rich spatial contexts have been found to differentially activate parahippocampal regions (Bar and Aminoff 2003; Bar et al. 2008; but see Epstein and Ward 2010).
These ideas are supported by the intuitive notion that faces and scenes are qualitatively different types of visual information. For example, faces are retinotopically more focal than scenes, a property reflected in the retinotopic biases of FG and PHC (Hasson et al. 2002; Schwarzlose et al. 2008). Similarly, computational models of face and object processing emphasize the need for spatially invariant representations (Riesenhuber and Poggio 1999), while models of scene recognition often rely more heavily on spatial dependencies (Oliva and Torralba 2001). Given that many of these distinctions apply more generally to objects, an interesting question for future research would be to examine whether object-selective regions such as the lateral occipital complex would behave more like FG than PHC.
Further work will be needed to evaluate these different interpretations, including resolving some ambiguities in the interpretation of BOLD responses and functional connectivity. For example, it is possible that inhibitory inputs to a region (such as from FG to PHC) might increase the BOLD response, since it is known to be sensitive to the level of presynaptic inputs (Logothetis et al. 2001). Note, however, that concerns about whether functional connectivity reflects inhibitory or excitatory interactions would be more problematic if we had observed a uniform effect of category. Instead, face processing was accompanied by decreased background connectivity, and scene processing was accompanied by increased background connectivity. Thus, the effect of category remains to be explained regardless of the underlying neural dynamics.
Another complication is that a reduced correlation between 2 regions could result from changes to a single region: 2 correlated regions will become less correlated when noise is added to one of the regions. For example, if face processing modulates FG activity in a manner independent of PHC, this could lead to a reduced FG–PHC correlation. The most basic version of this account would be that faces activate the FG but not the PHC. However, we have attempted to avoid the influence of asymmetric activation by removing all traces of the evoked response in every region using a liberal model-free FIR approach. In addition, this account would predict that local idiosyncratic changes to the FG time course should lead to broad decreases in FG connectivity throughout cortex, but we observed clear increases in FG connectivity with posterior occipital cortex (including the CS), as well as with several frontal regions. For similar reasons, it is unlikely that increased signal can explain our findings, since we observed clear decreases in background connectivity for both ROI and whole-brain analyses. Indeed, to explain our results in terms of signal to noise, one would need to posit regionally specific, task-modulated, and bidirectional changes in signal to noise.
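The attenuation argument can be made concrete. If noise that is uncorrelated with both regions is added to one of them, the correlation shrinks by the factor sd_signal / sqrt(sd_signal^2 + sd_noise^2). The sketch below checks this on toy time series; the values and names are hypothetical and chosen only so that the noise is exactly uncorrelated with both regions.

```python
from math import sqrt
from statistics import pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def attenuated_r(r, sd_signal, sd_noise):
    """Expected correlation after adding independent noise to one series."""
    return r * sd_signal / sqrt(sd_signal ** 2 + sd_noise ** 2)

# Toy series: noise is constructed to be uncorrelated with fg and phc.
fg = [2, 2, 0, 0, 0, 0, -2, -2]
phc = [2, 0, 2, 0, 0, -2, 0, -2]
noise = [2, -2, -2, 2, 2, -2, -2, 2]
fg_noisy = [a + b for a, b in zip(fg, noise)]

r_clean = pearson(fg, phc)
r_noisy = pearson(fg_noisy, phc)
r_expected = attenuated_r(r_clean, pstdev(fg), pstdev(noise))
print(round(r_clean, 4), round(r_noisy, 4), round(r_expected, 4))
```

Because the added noise is independent of every other region as well, the same attenuation would apply to the noisy region's correlation with any other region, which is why this account predicts broad decreases rather than the selective increases reported here.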
Finally, changes in the interaction between FG and PHC might be mediated by other regions. For example, FG–PHC connectivity could be modulated by correlated inputs from frontal or parietal regions (e.g., Moore et al. 2006; Stevens et al. 2010). Future studies will be necessary to map out the precise functional pathway of such cortical interactions (e.g., Moeller et al. 2008).
Visual System Connectivity
One of the most intriguing findings of the whole-brain analysis was that connectivity between FG/PHC and several regions in posterior occipital cortex increased during face and scene processing. Interestingly, connectivity with these visual regions was selective for categories: FG connectivity increased only during face tasks, while PHC connectivity increased only during scene tasks. Follow-up CS ROI analyses suggested that these results cannot solely be explained by selectivity for the different low-level features of faces and scenes.
Thus, inferotemporal (IT) regions coding for a particular high-level category may become synchronized with areas earlier in the visual processing hierarchy when that category is task-relevant. This may facilitate the rapid feed-forward processing of category information (Oliva and Torralba 2001; Liu et al. 2002). In addition, feedback from IT may also directly change the response properties of earlier visual cortex to more closely reflect the firing patterns observed in FG/PHC. This kind of feedback has been suggested to prune responses earlier in the hierarchy to better reflect computations performed in higher level regions (Murray et al. 2002; Hochstein and Ahissar 2002).
Note that this analysis was only possible because of our novel background connectivity approach. Without this approach, FG–CS connectivity might appear face-selective because CS and FG both respond to faces, and PHC–CS connectivity might appear scene-selective because CS and PHC both respond to scenes. By removing all evoked responses, background connectivity may reflect interactions among regions, rather than their shared responses to stimuli. As evidence that this approach was successful, background connectivity was more focal in the visual system than would be expected if evoked responses had seeped through. In addition, other regions with typically strong coactivations, such as the FG and posterior superior temporal sulcus (Turk-Browne et al. 2010), showed reduced background connectivity during tasks. This suggests that task-induced background connectivity between IT and posterior occipital cortex is special and does not simply reflect the evoked responses of each region. Determining precisely which retinotopic visual areas increase their connectivity with IT remains an important question for future research.
Finally, given that behavioral performance was not uniform across our tasks, the observed category differences could in principle reflect variation in task difficulty. However, several aspects of our analyses and data make this unlikely. First, there was no consistent correspondence between behavior and background connectivity for any of our ROI pairs across all tasks. Perhaps the closest correspondence was observed for FG–PHC background connectivity, but even here there are notable differences. For example, behavioral performance was virtually identical for face-only and scene-only tasks, despite robust differences in background connectivity. In addition, similarities that exist between behavioral performance and FG–PHC background connectivity do not replicate for other ROI pairs. For example, FG–PHC background connectivity was roughly similar to behavioral performance for the composite-image conditions, but the same was not true for FG–CS background connectivity. Further, correlations between behavioral performance and background connectivity across subjects did not reveal a reliable relationship, and if anything trended toward a negative relationship. This is the opposite of what would be predicted from a task difficulty account of FG–PHC background connectivity. Nevertheless, it remains possible that differences in task difficulty modulated category effects. For example, increased difficulty for the face-attended task may help explain why category effects were stronger for FG–CS versus PHC–CS connectivity.
An inherent ambiguity with examining functional connectivity in visual cortex is that the onset of a stimulus will tend to correlate the responses of different regions even if they do not directly interact. The most prominent way of avoiding such ambiguity is to examine correlations during rest (e.g., Turk-Browne et al. 2010). Resting connectivity studies using seed-based (Biswal et al. 1995; Greicius et al. 2003) and data-driven approaches (e.g., independent components analysis; De Luca et al. 2006) have identified several resting networks that have been associated with language, vision, attention, and memory (Lowe et al. 1998; Damoiseaux et al. 2006). In all of these cases, however, functional connectivity could only be interpreted in cognitive terms by comparing resting networks with regions typically coactivated by particular tasks. In contrast, by manipulating tasks and examining changes in the concurrent background connectivity, our approach directly probes the relationship between functional connectivity and cognitive processes. This allows functional connectivity to be examined without referencing other studies and also allows for a closer inspection of the interaction between background connectivity and task-evoked responses.
To our knowledge, no previous studies have directly examined interactions between FG and PHC. However, several other studies have explored task-based FG or PHC connectivity with the rest of cortex (Gazzaley et al. 2004; Summerfield et al. 2006; Rotshtein et al. 2007; Fairhall and Ishai 2007; Nummenmaa et al. 2010). For example, gaze shifts increase FG connectivity with clusters in the superior temporal and middle frontal gyri (Nummenmaa et al. 2010), and emotional faces enhance FG connectivity with the amygdala (Fairhall and Ishai 2007). Most relevant to the current study, top-down input alters FG and PHC connectivity with frontal areas (Summerfield et al. 2006), and these changes can persist across task and rest conditions (Stevens et al. 2010).
However, most existing task-based approaches make assumptions about the shape of the HRF, which is quite variable between subjects and regions (Miezin et al. 2000). Critically, if these assumptions are violated (i.e., if the predicted HRF does not match a region's actual response), then residual evoked responses can contribute to functional connectivity (cf. Gitelman et al. 2003). In contrast, our approach makes no assumptions about the shape of the HRF because evoked responses are modeled using an FIR basis function. One prior study used an FIR model to examine state-related connectivity (Summerfield et al. 2006); however, it may be difficult to capture all stimulus-locked activity in jittered event-related designs, as used in that and other studies.
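The idea of removing evoked responses without assuming an HRF shape can be sketched in a simplified form. For non-overlapping epochs of equal length that tile the whole run, an FIR model with one regressor per post-onset time point reduces to subtracting the mean time course across epochs. This is an assumption-laden simplification and not the model described in Materials and Methods; the function names and toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def remove_mean_evoked(ts, epoch_len):
    """Subtract the across-epoch mean at each post-onset time point."""
    n_epochs = len(ts) // epoch_len
    epochs = [ts[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]
    mean_evoked = [sum(e[t] for e in epochs) / n_epochs for t in range(epoch_len)]
    return [e[t] - mean_evoked[t] for e in epochs for t in range(epoch_len)]

# Toy data: both regions share the same evoked response in every epoch,
# but their background fluctuations are exactly opposite.
evoked = [0, 4, 8, 4] * 3
background = [1, -1, 1, -1, -1, 1, -1, 1, 0, 0, 0, 0]
region_a = [e + b for e, b in zip(evoked, background)]
region_b = [e - b for e, b in zip(evoked, background)]

raw_r = pearson(region_a, region_b)
background_r = pearson(remove_mean_evoked(region_a, 4), remove_mean_evoked(region_b, 4))
print(round(raw_r, 3), round(background_r, 3))
```

In this toy case the raw correlation is strongly positive because of the shared evoked response, whereas the correlation of the residuals is negative, illustrating how stimulus-locked responses can mask the underlying relationship between regions.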
Thus, our analysis approach and block design are well suited for examining how tasks modulate connectivity, a question that remains controversial. Some researchers have emphasized the consistency of network correlations across rest and task states (Fox and Raichle 2007; Buckner et al. 2009; Smith et al. 2009). For example, it has been suggested that stimulus-locked responses are linearly superimposed on resting connectivity (Fox et al. 2006; Fox and Raichle 2007). This claim, however, conflicts with the results of our study: if stimulus-locked and resting responses are linearly superimposed, then the residual of an accurate linear model of the stimulus-locked response should consist of resting variance that remains constant across tasks. One possible explanation for this discrepancy is that prior studies examined task changes over comparatively short time scales (e.g., Fox et al. 2006; Leber et al. 2008) and thus may not have captured low-frequency changes, such as those related to maintaining task sets over prolonged time periods. Since resting connectivity is typically evaluated over long temporal windows, sustained task sets may allow for a particularly well-matched comparison of resting and task-based functional connectivity. However, sustained tasks are likely not strictly necessary, since connectivity changes have also been observed in the residuals of event-related data (Fair et al. 2007). Thus, our findings add to a growing body of work showing dynamic changes in connectivity related to tasks (Rissman et al. 2008; Hasson et al. 2009; Boorman et al. 2009; Stevens et al. 2010).
The observation of background connectivity between regions with no resting relationship—such as between FG/PHC and CS—is particularly informative, demonstrating that our approach can discover potentially important interactions that are not apparent from patterns of resting connectivity alone. Given their differences, the combination of resting and background connectivity may also provide unique insights: for example, resting connectivity could be used to reveal latent networks, and the functional significance of particular interactions could then be studied by manipulating tasks and assessing background connectivity.
Finally, while most resting studies have focused on oscillations below 0.08 or 0.1 Hz (Biswal et al. 1995; Cordes et al. 2001; cf. Salvador et al. 2008), we observed meaningful changes in background connectivity at higher frequencies (e.g., 0.09–0.13 Hz), and these changes differed from what was observed at lower frequencies. In particular, FG–PHC background connectivity decreased at higher frequencies during face tasks, but increased at lower frequencies during scene tasks. These changes in functional connectivity at frequencies higher than 0.1 Hz may be limited to task-based analyses or may not have been sufficiently explored in resting data. In either case, the 1/f nature of BOLD activity (e.g., Zarahn et al. 1997) may mean that lower frequency signals are most salient in resting and background activity and that higher frequency effects only become apparent when examining differences, either between rest and task or between 2 tasks. Indeed, a similar pattern is often observed in EEG, albeit at much higher overall frequencies. At rest, lower frequency coherences (<30 Hz) typically dominate the spectrum, but when comparing across tasks, high-frequency changes (>30 Hz) often become apparent (Engell and McCarthy 2010). Many EEG studies have also reported functional dissociations across different frequency ranges (Pfurtscheller and Lopes da Silva 1999).
Background connectivity in ventral visual cortex did not decrease uniformly during object recognition, as might be suggested by models of local competition in IT. Rather, changes in background connectivity were linked to which category is currently task-relevant. These results suggest that category specificity is best characterized not only by the set of regions with the greatest evoked response but also by the way in which regions interact. Moreover, stimuli of different categories may vary in the extent to which they are processed interactively in ventral visual cortex. Finally, our results as a whole reveal that meaningful regional correlations exist beneath the surface of evoked responses, that they can be measured with background connectivity, and that they may help characterize the systems-level interactions subserving specific cognitive processes.
National Institutes of Health (P01 NS41328 to G.M., EY014193 and P30 EY000785 to M.M.C.).
Conflict of Interest: None declared.