Repetition suppression (RS) (or functional magnetic resonance imaging adaptation) refers to the reduction in blood oxygen level–dependent signal following repeated presentation of a stimulus. RS is frequently used to investigate the role of face-selective regions in human visual cortex and is commonly thought to be a “localized” effect, reflecting fatigue of a neuronal population representing a given stimulus. In contrast, predictive coding theories characterize RS as a consequence of “top-down” changes in between-region modulation. Differentiating between these accounts is crucial for the correct interpretation of RS effects in the face-processing network. Here, dynamic causal modeling revealed that different mechanisms underlie different forms of RS to faces in occipitotemporal cortex. For both familiar and unfamiliar faces, repetition of identical face images (same size) was associated with changes in “forward” connectivity between the occipital face area (OFA) and the fusiform face area (FFA) (OFA-to-FFA). In contrast, RS across image size was characterized by altered “backward” connectivity (FFA-to-OFA). In addition, evidence was higher for models in which information projected directly into both OFA and FFA, challenging the role of OFA as the input stage of the face-processing network. These findings suggest “size-invariant” RS to faces is a consequence of interactions between regions rather than being a localized effect.
Repetition suppression (RS) or functional magnetic resonance imaging (fMRI) adaptation refers to the reduction in blood oxygen level–dependent (BOLD) signal following 2 or more presentations of the same stimulus and is thought to indicate the presence of a neural population tuned to that stimulus (Naccache and Dehaene 2001; Grill-Spector et al. 2006). In the past 10 years, over 60 studies have used RS to investigate the functional role of numerous face-selective regions in human visual cortex thought to be involved in coding different facial attributes, such as identity, expression, and eye gaze (Grill-Spector and Malach 2001; Winston et al. 2004; Rotshtein et al. 2005; Calder et al. 2007). The vast majority have focused on RS to facial identity and have shown that 2 regions of ventral occipitotemporal cortex—the occipital face area (OFA) and the fusiform face area (FFA)—show RS to repeated presentations of the same facial identity (compared with different identities) that persists across changes in image size (Andrews and Ewbank 2004; Kovacs et al. 2008; Cziraki et al. 2010; Lee et al. 2011). RS has also been observed across small changes in view (Fang et al. 2007; Ewbank and Andrews 2008), although this finding is less reliable (Grill-Spector and Malach 2001; Andrews and Ewbank 2004; Pourtois et al. 2005).
A common assumption is that RS reflects localized, “within-region” changes, such as neuronal fatigue (see Grill-Spector et al. 2006). In contrast, predictive coding models suggest that rather than being a localized effect, repetition leads to a reduction in neural activity in a given area because it reflects a decrease in prediction error between bottom-up (stimulus-related) and top-down (prediction-related) inputs (Henson 2003; Friston 2005). Consistent with this, we recently used dynamic causal modeling (DCM) to show that RS to images of the same body in occipitotemporal cortex was associated with changes in “top-down” connectivity between the “higher level” fusiform body area (FBA) and the “lower-level” extrastriate body area (EBA) (Ewbank et al. 2011)—2 regions that lie adjacent to the face-selective FFA and OFA (Peelen and Downing 2007). Furthermore, we found that changes in “forward” connectivity were only apparent during repetition of an identical image. Top-down modulation of RS in FFA has been suggested by previous work that demonstrated greater RS to expected rather than unexpected face repetitions (Summerfield et al. 2008, 2011), although similar effects of expectancy have not been found in macaque inferior temporal cortex (Kaliukhovich and Vogels 2011). These results challenge the inference that a brain region showing RS to a given stimulus necessarily contains a neuronal population holding a representation of that stimulus. However, as yet there is no direct evidence demonstrating changes in top-down connectivity during repetition of faces. Furthermore, given that changes in top-down connectivity within the ventral visual stream appear to underlie RS to bodies (Ewbank et al. 2011), a similar finding for faces would suggest that such changes are part of a general mechanism underpinning fMRI RS to complex objects.
Evidence that RS to faces is best accounted for by a predictive coding model would have major implications for the interpretation of numerous studies that have used RS to infer the nature of representations within different regions of the face network. Specifically, previous studies have assumed that when a brain region shows RS to a particular visual attribute (e.g., facial identity) across changes in visual properties (e.g., size or facial expression), this indicates the presence of a neural population within that brain area that is invariant to these properties. However, if, for example, OFA shows RS across changes in image size, this does not necessarily imply size-invariant representations within OFA. The OFA could code size-specific representations, and RS in this region could reflect predictions from size-invariant representations in “higher” regions such as FFA (Henson 2003; see Ewbank et al. 2011 for a fuller description of such a model).
A second important issue we address relates to the functional architecture underlying face perception. Haxby et al.’s (2000) influential model of face processing portrays OFA as the input stage of the face-processing network projecting to 2 separate routes—one incorporating the midlateral fusiform gyrus (FFA) involved in processing facial identity, and the other incorporating the posterior superior temporal sulcus involved in processing facial expression—although see Calder and Young (2005) and Calder (2011) for evidence against such a clear distinction. However, an alternative proposal is that input enters FFA directly and that OFA activity is driven primarily by feedback connections from FFA (Rossion 2008). This idea is based upon evidence from prosopagnosic patients, who show right FFA activity despite damage to the right OFA and left FFA (Rossion et al. 2003; Schiltz et al. 2006) or bilateral OFA (Steeves et al. 2006). Previous DCM studies have reported evidence in favor of a simple feed-forward relationship between face-selective regions of occipitotemporal cortex (i.e., input entering OFA and then modulating FFA) (Fairhall and Ishai 2007; Li et al. 2010; Cohen Kadosh et al. 2011). However, these studies assumed that driving input enters OFA only and did not test alternative models in which input enters other visual regions. A third possibility, not yet evaluated, is that face information enters both OFA and FFA in parallel (e.g., from early visual areas). The current study provides the first direct comparison of these 3 alternative frameworks.
Finally, given that previous work has found different patterns of RS to familiar and unfamiliar faces in occipitotemporal cortex (Henson et al. 2000, 2003; Andrews et al. 2010), we also sought to determine whether different patterns of effective connectivity might underlie RS to familiar and unfamiliar faces. Recent research has identified a further face-selective area in the anterior temporal lobe (Tsao et al. 2008; Rajimehr et al. 2009), thought to be involved in the coding of individual facial identities (Kriegeskorte et al. 2007). However, it has yet to be determined whether this region shows RS to faces and/or whether RS in this region differs for familiar compared with unfamiliar faces.
In summary, the aim of the study was to use DCM to determine the mechanisms underlying RS to faces in the occipitotemporal cortex. If RS reflects local changes such as neural fatigue, then it should be primarily associated with changes in self-connectivity. However, if RS is best explained by a predictive coding model then we would predict changes in “backward” or top-down (FFA-to-OFA) connectivity during RS.
Materials and Methods
Thirty healthy volunteers (8 female, all right-handed, aged 18–37 years old, mean age = 24.7) with normal or corrected to normal vision participated in this study. No participant had a history of neurological disease or head injury or was currently on medication affecting the central nervous system. The data from 4 participants were excluded (one due to scanner malfunction and 3 due to excessive head movement). The study was approved by the Cambridgeshire Research Ethics Committee. All volunteers provided written informed consent and were paid for participating.
At the start of each session, a localizer scan was used in order to identify face-selective OFA and FFA in ventral occipitotemporal cortex. Images of 32 familiar faces, 32 unfamiliar faces, 32 houses, and 32 scrambled versions of faces were presented using a block design, consisting of four 16-s blocks for each condition; a block contained 8 images with each image shown for 1800 ms, followed by a 200 ms fixation. Blocks of stimuli were separated by a rest block (fixation) of equal duration (16 s). Participants performed a target detection task and responded, via a button press, whenever they saw a green dot appear on the image. One or 2 images in each block contained a green dot. To maximize the number of face-selective regions of interest (ROIs) identified across participants, ROIs were identified using the contrast of faces (familiar + unfamiliar) > scrambled faces, at a minimal threshold of P < 0.001 uncorrected (10 contiguous voxels). This contrast is also thought to provide greater sensitivity to a face-selective region in the anterior temporal lobe compared with alternative contrasts such as faces versus houses/places (Kriegeskorte et al. 2007; Rajimehr et al. 2009). To verify that the ROIs identified using this contrast showed selectivity for faces relative to other complex objects, we also included a comparison of faces versus houses.
Participants lay supine in the magnet bore and viewed images projected onto a screen visible via an angled mirror placed above the participant’s head. Color photographs of unfamiliar faces were obtained from various sources, including the NimStim Face Stimulus Set (Tottenham et al. 2009) http://www.macbrain.org/resources.htm (date last accessed 8 March 2012), the Karolinska Directed Emotional Faces (KDEF) image set (Lundqvist and Litton 1998) http://www.emotionlab.se/resources/kdef (date last accessed 8 March 2012), and the FERET database (Phillips et al. 2000) http://www.itl.nist.gov/iad/humanid/feret/ (date last accessed 8 March 2012). Color photographs of famous faces were obtained from the World Wide Web. Familiar and unfamiliar faces were matched for gender, age, and facial expression and differed from those used in the localizer scan. Images were presented using a block design. Each block lasted for 10 s and contained 10 images. Each image was presented for 800 ms followed by a 200-ms blank screen. Participants were required to perform a dot-detection task identical to that in the localizer scan.
The experiment used a 2 × 2 × 2 repeated measures design, examining the effect of Familiarity (familiar, unfamiliar), Image Size (same-size, vary-size), and Repetition (same-identity and different-identity). Each level comprised 10 same-identity blocks in which an image of the same identity was shown 10 times, and 10 different identity blocks in which images of 10 different faces were shown, giving a total of 80 stimulus blocks (Fig. 1). In the same-size blocks, all faces were shown from a frontal viewpoint and subtended a visual angle of 9° × 6°. Blocks in the vary-size condition contained images shown at full size (9° × 6°) and at 66% and 33% of this size, presented in a random sequence. Blocks of images were presented in a counterbalanced order and were separated by 8-s periods of fixation when an equiluminant gray screen was viewed. Individual identities were shown an equal number of times in the same and different identity blocks. Presentation of images was controlled using E-Prime software (Psychology Software Tools Inc., Pittsburgh, PA). To verify that participants could identify the famous faces, they were presented with a series of 84 faces—comprising the familiar and unfamiliar faces contained in the localizer and main experiment—before the scanning session. Participants were asked to report which of the faces they recognized by giving either a name or biographical information. Mean recognition rate (±1 standard error) of famous faces used in the experimental scan was 93.1% (2.8), and recognition rate of famous faces used in the localizer scan was 77.5% (3.8).
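The block counts implied by this factorial design can be checked with a short enumeration. This is only an illustrative sketch: the names `build_block_list` and `BLOCKS_PER_CELL` are hypothetical, and the actual counterbalanced presentation order is not reproduced here.

```python
from itertools import product

# Factor levels as named in the design description.
FAMILIARITY = ("familiar", "unfamiliar")
SIZE = ("same-size", "vary-size")
REPETITION = ("same-identity", "different-identity")
BLOCKS_PER_CELL = 10  # 10 blocks per design cell, as stated in the text


def build_block_list():
    """Return one (familiarity, size, repetition) tuple per stimulus block."""
    blocks = []
    for cell in product(FAMILIARITY, SIZE, REPETITION):
        blocks.extend([cell] * BLOCKS_PER_CELL)
    return blocks


blocks = build_block_list()
n_blocks = len(blocks)            # 8 cells x 10 blocks
stimulus_time_s = n_blocks * 10   # each block lasts 10 s
```

Crossing the three two-level factors gives 8 cells, so 10 blocks per cell yields the 80 stimulus blocks reported above.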
MRI scanning was performed on a Siemens Tim Trio 3-T MR scanner. Brain data were acquired with T2*-weighted echo-planar imaging sensitive to BOLD signal contrast. Each image volume consisted of 32 slices, each 1.8 mm thick (gap 25%; field of view 192 × 192 mm; voxel size 3 × 3 × 2.25 mm; flip angle 78°; echo time 30 ms; repetition time 2 s). Thin slices (1.8 mm) were used in order to optimize signal in the anterior temporal lobe (Bellgowan et al. 2006). Slices were acquired sequentially in an axial orientation aligned along the ventral temporal lobes (slice coverage can be seen in Fig. 2A). The first 3 volumes were discarded to allow for the effects of magnetic saturation. T1-weighted structural images were acquired at a resolution of 1 × 1 × 1 mm.
Data were analyzed using SPM 8 software (Wellcome Trust Centre for Neuroimaging, London, UK). Standard preprocessing was applied, including correction for slice timing and head motion. Each participant’s scans were normalized using the linear and nonlinear normalization parameters estimated from warping each participant’s structural image directly to the Montreal Neurological Institute—ICBM avg152 T1-weighted template, using 2-mm isotropic voxels and smoothed with a Gaussian kernel of 8-mm full-width at half-maximum. For both the localizer scan and the RS scan, blocks of each condition were modeled by sustained epochs of neural activity (boxcars) convolved with a canonical hemodynamic response function. Realignment parameters were also included as effects of no interest to account for motion-related variance. A high-pass filter of 128 s was used to remove low-frequency noise.
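As an illustration of the block regressors described above, the sketch below builds a boxcar sampled at the 2-s repetition time and convolves it with a double-gamma response function. It is not the SPM implementation: the function names are hypothetical, and the HRF parameters (gamma shapes of 6 and 16, undershoot ratio of 1/6) are assumed values chosen to resemble a canonical response.

```python
import math

TR = 2.0  # repetition time in seconds, from the acquisition parameters


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Approximate double-gamma HRF at time t (s); parameters are assumptions."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit scale.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def block_regressor(onsets_s, duration_s, n_scans, tr=TR, hrf_len_s=32.0):
    """Return (boxcar, regressor) for blocks with the given onsets and duration."""
    boxcar = [0.0] * n_scans
    for onset in onsets_s:
        for i in range(n_scans):
            if onset <= i * tr < onset + duration_s:
                boxcar[i] = 1.0
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_len_s / tr) + 1)]
    # Discrete convolution, truncated to the length of the scan series.
    regressor = [
        sum(kernel[k] * boxcar[i - k] for k in range(len(kernel)) if i - k >= 0)
        for i in range(n_scans)
    ]
    return boxcar, regressor


# Two 10-s blocks separated by 8 s of fixation, as in the RS scan.
boxcar, regressor = block_regressor([0.0, 18.0], 10.0, n_scans=20)
```

The convolved regressor rises and falls more slowly than the boxcar, which is why block onsets and the fitted BOLD response are offset by several seconds.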
Analysis of Regional Effects
To determine the effect of RS on face-selective regions, mean parameter estimates were extracted from an 8-mm sphere centered on the maximal voxel in each participant’s individually defined ROI using MarsBar (Brett et al. 2002). Identical ROIs were used in the univariate analysis and the DCM analysis (see below). Mean parameter estimates for each region were entered into separate 2 × 2 × 2 analyses of variance (ANOVAs) including Familiarity (familiar, unfamiliar), Size (same-size, vary-size), and Repetition (same-identity, different-identity) as repeated measures factors. To determine whether RS was apparent in regions outside of the face-selective ROIs, a group-based analysis (whole-brain FWE corrected, 10 contiguous voxels) was performed in which individual images of parameter estimates were entered into a 2 × 2 × 2 repeated-measures ANOVA.
Dynamic Causal Modeling
DCM was performed using DCM10, as implemented in SPM 8. DCM uses generative models of neural and hemodynamic processes to explain regional effects in terms of changing patterns of connectivity among regions during experimentally induced contextual modulation (Stephan et al. 2010). The principal advantage of DCM is the ability to make inferences about the presence and direction of causal connections (e.g., is activity in brain region X caused by activity in brain region Y?) using evidence based on Bayesian model selection (BMS). A standardized set of regions is identified, and all regions are included in each of several different models. The models differ in terms of connections, contextual modulation of connectivity, and driving inputs that perturb the network due to experimental events. DCM simultaneously optimizes model parameters for neuronal interactions (between-region connectivity) and the regionally specific hemodynamic forward model (neurovascular coupling). The endogenous or intrinsic network connections (DCM matrix “A”) represent the average connectivity between regions across all experimental conditions. Self-connections (i.e., within-region connectivity, such as “within FFA”) are estimated for each region separately. Responses in the dynamic model can be changed in 1 of 2 ways. First, inputs elicit responses through direct influences on specific regions, called the driving input of the network (DCM matrix “C”). Here, the driving input was represented by all face presentations relative to fixation (irrespective of condition). Low-level visual processing (e.g., activation to faces in V1 spreading up the cortical hierarchy) is modeled implicitly by the functions that serve as direct inputs to the network. Second, contextual modulation can alter the coupling between regions and also within regions. “RS” was included as the modulatory context (DCM matrix “B”) and was defined as the difference between same-identity blocks and different-identity blocks.
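The roles of the A, B, and C matrices can be illustrated with the bilinear neuronal state equation, dz/dt = (A + u_mod B) z + C u_drive, integrated here for a two-region network. This is a toy sketch only: the function name is hypothetical, all parameter values are invented for illustration, and the hemodynamic forward model and SPM's actual parameterization of self-connections are omitted.

```python
def simulate_bilinear(A, B, C, u_drive, u_mod, dt=0.1):
    """Euler integration of dz/dt = (A + u_mod * B) z + C * u_drive.

    A[i][j] and B[i][j] describe the influence of region j on region i.
    """
    n = len(A)
    z = [0.0] * n
    trajectory = []
    for drive, mod in zip(u_drive, u_mod):
        dz = []
        for i in range(n):
            coupling = sum((A[i][j] + mod * B[i][j]) * z[j] for j in range(n))
            dz.append(coupling + C[i] * drive)
        z = [z[i] + dt * dz[i] for i in range(n)]
        trajectory.append(tuple(z))
    return trajectory


# Region 0 = OFA, region 1 = FFA. All values below are illustrative assumptions.
A = [[-1.0, 0.2],   # OFA self-connection, backward FFA-to-OFA connection
     [0.4, -1.0]]   # forward OFA-to-FFA connection, FFA self-connection
B = [[0.0, 0.0],
     [-0.3, 0.0]]   # repetition weakens the forward connection only
C = [1.0, 1.0]      # driving input enters both regions

n_steps = 200
faces_on = [1.0] * n_steps
different = simulate_bilinear(A, B, C, faces_on, [0.0] * n_steps)
repeated = simulate_bilinear(A, B, C, faces_on, [1.0] * n_steps)
ffa_different, ffa_repeated = different[-1][1], repeated[-1][1]
```

With these toy values, switching on the modulatory input lowers the sustained FFA response, showing how a change in between-region coupling alone can produce a regional response reduction.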
All models were estimated using a deterministic rather than stochastic model of neural dynamics with each region containing a single hidden state (neuronal “activity”: a function of inhibitory and excitatory populations within voxels).
We modeled changes in connectivity for each of the 4 conditions: 1) RS to same-size familiar faces, 2) RS to vary-size familiar faces, 3) RS to same-size unfamiliar faces, and 4) RS to vary-size unfamiliar faces. Models were fitted to the complete fMRI time series, with data for each condition first being adjusted for the general linear model’s fit to all other conditions.
DCM model selection procedures compare plausible mechanistic explanations for the fMRI data. The ROIs included in a model network should be both related to the experimental design, as established on the basis of a general linear model (Stephan et al. 2010), and sufficient to test the hypothesis in question. As we were unable to reliably localize face-selective activation in the anterior temporal lobe in a sufficient number of subjects, we restricted our model network to OFA and FFA. The data for these nodes were extracted by taking the first eigenvariate across voxels within an 8-mm sphere centered on the peak voxel in each participant’s right OFA and right FFA as defined by the localizer scan. The first eigenvariate reflects the first principal component of the time course of a region’s response (i.e., the principal source of variance within a region), without assuming that all voxels contribute equally. Thus, the eigenvariate is relatively denoised compared with the raw MR signal. For consistency between the univariate and DCM analysis, we performed an additional univariate analysis to examine the change in the regional response between same and different identity blocks using the first eigenvariate extracted from each region. This produced the same pattern of results as was found using the mean parameter estimates reported below (see Supplementary Material).
Model fitting is achieved by adjusting model parameters to maximize the free-energy estimate (F) of the log model evidence for a given data set (Friston et al. 2003), adjusting for model complexity (in terms of both the number of parameters and dependencies among parameters). The maximized F is a lower bound on the log model evidence, where the model evidence is the probability of the data given the model (Stephan et al. 2009).
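The relationship between F and the log model evidence can be written in the standard variational form. The symbols here are assumptions introduced for illustration: y denotes the data, m the model, θ the model parameters, and q(θ) the approximate posterior over parameters.

```latex
\begin{aligned}
F &= \log p(y \mid m) - \mathrm{KL}\!\left[\, q(\theta) \,\|\, p(\theta \mid y, m) \,\right] \;\le\; \log p(y \mid m), \\
F &= \underbrace{\left\langle \log p(y \mid \theta, m) \right\rangle_{q}}_{\text{accuracy}}
   \;-\; \underbrace{\mathrm{KL}\!\left[\, q(\theta) \,\|\, p(\theta \mid m) \,\right]}_{\text{complexity}} .
\end{aligned}
```

Because the Kullback-Leibler divergence is non-negative, F can never exceed the log evidence, and the second line shows where the complexity penalty enters.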
We specified a set of 33 models with systematic variations in structure. In all models, both OFA and FFA had at least one input, which could either reflect driving input from areas other than these regions (specified by DCM matrix C) and/or endogenous connections between regions (DCM matrix A). Models were first grouped into 3 Meta-Families based on differences in the location of the driving input. Driving input could enter the system by projecting directly into OFA (Meta-Family A), directly into FFA (Meta-Family B), or directly into both OFA and FFA (Meta-Family C) (all models are shown in Supplementary Figs 1–3).
Each Meta-Family was composed of 4 further families in which models were grouped by similarities in the direction of modulation of connectivity. RS could modulate forward connectivity only (OFA-to-FFA) (Family 1), “backward” connectivity only (FFA-to-OFA) (Family 2), both forward and backward connectivity (Family 3), or within-region connectivity only (Family 4). Note that we use the terms forward and backward to refer to the flow of information between the 2 regions according to their relative locations in the ventral processing stream. Families 1 to 3 also included models with and without modulation of within-region connections (i.e., within-OFA, within-FFA). A modulation of a within-region inhibitory autoconnection by RS would reflect an increase in the rate of exponential decay of neural activity (above and beyond any saturation attributable to hemodynamics) (Friston et al. 2003). To test whether RS in face-selective regions of the occipitotemporal cortex is best explained by reciprocal (forward and backward) between-region connections, an entirely feed-forward architecture, or an entirely feedback architecture, each family included models that had both forward and backward endogenous connections, forward endogenous connections only, or backward endogenous connections only.
In view of the different patterns of RS for familiar and unfamiliar faces in the same-size and vary-size conditions (see Results), we performed BMS for each of the 4 combinations. After estimating all 33 models for each participant, we computed the group evidence for all models using random effects (RFX) BMS as implemented in DCM10. In contrast to a fixed-effects (FFX) approach, an RFX approach accommodates intersubject variation and does not assume that the optimal model will be the same across all participants. It is also less susceptible to outliers than an FFX approach (Penny et al. 2010).
Inferences from RFX BMS can be based on the expected probability (i.e., the probability of each model generating the observed data in a randomly selected individual from the population) or the exceedance probability (i.e., the extent to which each model is more likely than any other model tested to have generated the data). The exceedance probability, reported here, is a statement of belief about the posterior probability of one model being higher than the posterior probability of any other model tested (Stephan et al. 2009). However, the expected probability and exceedance probability will be reduced as the model space increases (i.e., number of alternative models). This means that when comparing multiple models with shared features, as is the case here, one model is less likely to dominate. Thus, we adopted the family inference method (Penny et al. 2010) whereby models were first compared on the basis of group/family membership (outlined previously). The family inference method allows one to estimate the probability that a specific attribute of a model, for example, the presence or absence of a particular connection, improves or reduces model performance, regardless of any other differences among models.
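The exceedance probability and its family-level counterpart can be approximated by sampling, assuming that RFX BMS summarizes the group with a Dirichlet posterior over model frequencies. The sketch below is not the SPM routine: the function names are hypothetical, and the Dirichlet parameters in the usage example are invented values, not results from this study.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one vector of model frequencies from Dirichlet(alpha)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def exceedance_probabilities(alpha, n_samples=20000, seed=0):
    """Estimate, per model, the probability that it is the most frequent."""
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        r = sample_dirichlet(alpha, rng)
        wins[r.index(max(r))] += 1
    return [w / n_samples for w in wins]


def family_exceedance(alpha, families, n_samples=20000, seed=0):
    """Same idea at the family level: sum sampled frequencies within families.

    `families` gives one family label per model, in the same order as `alpha`.
    """
    rng = random.Random(seed)
    labels = sorted(set(families))
    wins = {label: 0 for label in labels}
    for _ in range(n_samples):
        r = sample_dirichlet(alpha, rng)
        totals = {label: 0.0 for label in labels}
        for freq, label in zip(r, families):
            totals[label] += freq
        wins[max(totals, key=totals.get)] += 1
    return {label: wins[label] / n_samples for label in labels}


# Hypothetical Dirichlet parameters for 4 models in 2 families.
alpha = [12.0, 3.0, 2.0, 2.0]
model_xp = exceedance_probabilities(alpha)
family_xp = family_exceedance(alpha, ["A", "A", "B", "B"])
```

Pooling sampled frequencies within a family before comparing is what lets a shared model attribute be evaluated even when no single model dominates a large model space.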
For each of the 4 combinations of face familiarity and size, BMS was used to compare the 3 Meta-Families (A, B, C). All models from the most likely Meta-Family were then entered into a second BMS where models were grouped into families (1 to 4) based on similarities in the modulation of connectivity. Finally, all models from the most likely family were entered into a third BMS to identify the preferred model. Restricting the model space to plausible models (i.e., the winning family) provides a more stringent test of models, as the relative nature of BMS means it is possible that higher evidence for a given model may be the result of the presence of other implausible models. For transparency, BMS was also performed across the whole model space (i.e., all 33 models without family partitions) as shown in Supplementary Figures 4–7. Using this approach, the models identified as the most likely were the same as those identified using the family based inference.
Using the contrast of faces > scrambled faces, we localized both FFA and OFA in the right hemisphere of 26 participants, with all 26 showing activation at a threshold of P < 0.001 uncorrected (20 also showed activation in both regions at P < 0.05 whole brain FWE corrected). Mean coordinates (±1 standard deviation [SD]) for right OFA were 42(4.5), −77(6.5), −11(5.2), and for right FFA were 40(3.0), −47(6.0), −20(3.3). In the left hemisphere, both OFA and FFA were identified in 21 participants at P < 0.001 uncorrected (15 with activation at P < 0.05 whole brain FWE corrected). At the same threshold (P < 0.001 uncorrected), a contrast of faces > houses identified both OFA and FFA in the right hemisphere of 17 participants and in the left hemisphere of 15 participants. A contrast of faces > scrambled faces only identified an anterior temporal face response in the right hemisphere of 5 of 26 participants and in the left hemisphere of 4 participants. Using a more liberal criterion (P < 0.05 uncorrected) produced a slight improvement, identifying anterior temporal activation in the right hemisphere of 14 participants and the left hemisphere of 11 participants. As the anterior temporal face region has been proposed to be involved in processing individual face identities (Kriegeskorte et al. 2007), we also examined whether a contrast of familiar faces > scrambled faces would more reliably identify face-related activation in the region. This contrast identified anterior temporal activation in the right hemisphere of only 4 participants and in the left hemisphere of 4 participants, at a threshold of P < 0.001 uncorrected. Given the dominant role of the right hemisphere in face perception (Rhodes 1985; Kanwisher et al. 1997; Rossion et al. 2000) and the greater number of participants showing right hemisphere ROIs using the faces > scrambled faces contrast, we focused on RS effects in right hemisphere ROIs only in both the univariate and DCM analysis.
However, univariate results relating to left hemisphere ROIs revealed the same pattern of RS as the right hemisphere and can be found in Supplementary Materials.
To determine whether the response in each ROI identified using the faces versus scrambled contrast differed for unfamiliar and familiar faces, we extracted parameter estimates from each ROI for each of the 4 conditions. Paired t-tests on the extracted parameter estimates revealed a greater response to familiar compared with unfamiliar faces in both the right OFA (t25 = 2.96, P < 0.01) and right FFA (t25 = 4.04, P < 0.001). However, we found no difference between familiar and unfamiliar faces in the right anterior temporal lobe (t13 = 0.68, P = 0.51). Finally, paired t-tests revealed that right FFA, OFA, and anterior temporal lobe showed a greater response to both familiar and unfamiliar faces compared with houses (all P’s < 0.05), indicating that these regions also showed selectivity for faces relative to another category of complex object.
Dot detection task.
Accuracy rates for the dot-detection task were close to ceiling (mean accuracy rate [±1 SD] = 99.1% [0.6]) and were therefore not analyzed further. Means and SDs of accuracy rates and response times (RTs) for all RS conditions can be found in Table 1. RTs were entered into a 2 × 2 × 2 repeated measures ANOVA including Familiarity (familiar, unfamiliar), Size (same-size, vary-size), and Repetition (same, different) as repeated measures factors. This revealed no main effects of Familiarity (F < 1), Repetition (F1,25 = 1.2, P = 0.28), or Size (F1,25 = 2.1, P = 0.16) and no interactions between these factors (all P’s > 0.17). Any effects of these factors on RS are therefore unlikely to reflect changes in attentional focus or task difficulty.
|Familiar face||Unfamiliar face|
|Same size||Vary size||Same size||Vary size|
|Same Id||Vary Id||Same Id||Vary Id||Same Id||Vary Id||Same Id||Vary Id|
Occipital face area.
Mean parameter estimates from the OFA were entered into a 2 × 2 × 2 repeated measures ANOVA analogous to the behavioral analysis. The ANOVA revealed a main effect of Repetition (F1,25 = 80.26, P < 0.001), with participants showing a reduced response to same-identity blocks compared with different-identity blocks (Fig. 2A). This was qualified by a Size × Repetition interaction (F1,25 = 11.01, P < 0.005), reflecting greater RS in the same-size condition relative to the vary-size condition, and by a Familiarity × Size × Repetition interaction (F1,25 = 5.01, P < 0.05). To determine the nature of the 3-way interaction, we performed separate 2 × 2 (Size × Repetition) ANOVAs on data from the familiar face and unfamiliar face conditions. This revealed an interaction between Size and Repetition in the unfamiliar face condition (F1,25 = 15.35, P < 0.001), reflecting less RS to unfamiliar vary-size faces compared with unfamiliar same-size faces, but no interaction in the familiar face condition (F < 1). This suggests greater generalization of RS across size for familiar faces compared with unfamiliar faces.
Fusiform face area.
A 2 × 2 × 2 ANOVA for FFA also revealed a main effect of Repetition (F1,25 = 89.81, P < 0.001). Again, we found interactions between Size and Repetition (F1,25 = 10.78, P < 0.005) and Familiarity, Size, and Repetition (F1,25 = 5.36, P < 0.05). Separate ANOVAs examining familiar and unfamiliar faces revealed a similar pattern as found in OFA, with an interaction between Size and Repetition for unfamiliar faces (F1,25 = 17.7, P < 0.001) but not familiar faces (F < 1) (Fig. 2B).
Anterior Temporal Lobe
Unlike OFA and FFA, we found no main effect of Repetition in the anterior temporal lobe (F1,13 = 1.84, P = 0.20) and no interaction between Familiarity, Size, and Repetition (F1,13 = 1.23, P = 0.28). It should be noted that this analysis includes ROIs defined using a particularly liberal threshold (P < 0.05 uncorrected) and has less power (n = 14) compared with the analysis of OFA and FFA.
Besides OFA and FFA, a whole-brain analysis revealed that no other regions showed a main effect of Repetition (i.e., greater reduction in activity to same-identity compared with different-identity) that survived correction for multiple comparisons (P < 0.05 FWE). There were also no interactions between Familiarity, Size, and/or Repetition.
Familiar face same-size condition.
BMS identified Meta-Family C as the most likely family (exceedance probability = 0.96) (Fig. 3A). The common factor underlying all models in Meta-Family C is that driving input enters both OFA and FFA. A further family-level BMS, with models from Meta-Family C divided according to the direction of modulation of connectivity during RS, favored Family C1 (exceedance probability = 0.98) (Fig. 3B). All models in this family are characterized by changes in forward (OFA-to-FFA) connectivity during RS. Finally, a third BMS for models in Family C1 identified Model C12 as the most likely model (exceedance probability = 0.99) (Fig. 3C). Model C12 has endogenous forward and backward connections between OFA and FFA, a change in forward (OFA-to-FFA) connectivity during RS and no effect of RS on self-connectivity (Fig. 7A).
Familiar face vary-size condition.
At the level of driving input, BMS again favored Meta-Family C (exceedance probability = 0.85) (Fig. 4A). In contrast to the familiar face same-size condition, however, model evidence was highest for Family C2 (exceedance probability = 0.96) (Fig. 4B). Models in Family C2 are all characterized by changes in backward (FFA-to-OFA) connectivity during RS. Finally, a third BMS identified Model C22 as the preferred model (exceedance probability = 0.99) (Fig. 4C). Model C22 also has endogenous forward and backward connections between OFA and FFA; however, RS modulates backward (FFA-to-OFA) connectivity, with no effect of RS on self-connectivity (Fig. 7B).
Unfamiliar face same-size and vary-size conditions.
For both conditions, BMS produced the same result as for familiar faces. In the same-size condition, model evidence was highest for Meta-Family C (exceedance probability = 0.92) (Fig. 5A), then Family C1 (exceedance probability = 0.75) (Fig. 5B) and Model C12 within this family (exceedance probability = 1) (Fig. 5C and Fig. 7A). For the vary-size condition, model evidence favored Meta-Family C (exceedance probability = 0.95) (Fig. 6A), then Family C2 (exceedance probability = 0.72) (Fig. 6B) and Model C22 (exceedance probability = 0.99) (Fig. 6C and Fig. 7B).
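For readers unfamiliar with exceedance probabilities, the following is a minimal sketch of how such a value can be obtained, assuming a random-effects BMS scheme in which the posterior over model (or family) frequencies is a Dirichlet distribution. The exceedance probability of a model is then the posterior probability that it is more frequent in the population than any other model considered. The Dirichlet parameters below are hypothetical and are not taken from the present data; the function name is likewise illustrative.

```python
import random


def exceedance_probabilities(alpha, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(model k is the most frequent) under Dirichlet(alpha)."""
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        # A Dirichlet sample is a set of gamma draws divided by their sum;
        # the normalization does not change which entry is largest, so it is skipped.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


# Hypothetical Dirichlet parameters for three families (illustration only).
alpha_example = [12.0, 3.0, 3.0]
xp = exceedance_probabilities(alpha_example)
```

The estimates sum to 1 across the models compared, so an exceedance probability is always relative to the particular model space under consideration.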
RS is increasingly used to probe the perceptual and neural representations of different facial attributes and their locus within the face-processing network. Understanding the mechanisms underlying RS is therefore vital to the correct interpretation of RS effects. Here, we sought to determine whether RS of facial identity reflects locally based changes such as neuronal fatigue or is best accounted for by predictive coding models emphasizing the role of top-down modulation.
The face-selective OFA and FFA both showed RS to familiar and unfamiliar faces that persisted across changes in image size, with no other brain regions showing an RS effect. The analysis of interactions between regions revealed that the mechanisms underlying RS to identical face images and RS across changes in image size were qualitatively different. RS to identical stimuli (same-size) was manifest by changes in forward connectivity (OFA-to-FFA), whereas RS across different sizes was manifest by changes in backward connectivity (FFA-to-OFA). These findings suggest that interactions between core face-processing regions in occipitotemporal cortex may underlie “size-invariant” fMRI RS to faces and challenge the proposal that RS reflects locally based changes alone (see Grill-Spector et al. 2006).
Mechanisms of RS to Faces
The finding that different patterns of effective connectivity underlie different forms of RS accords with recent work investigating the mechanisms underlying RS to body images in occipitotemporal cortex (Ewbank et al. 2011). Consistent with the current study, RS to the same body across changes in size and view was associated with changes in top-down connectivity (FBA-to-EBA). Moreover, both the previous study and the current study indicate that changes in forward connectivity (EBA-to-FBA and OFA-to-FFA) are only apparent during repetition of an identical stimulus. Thus, converging evidence suggests that qualitatively different patterns of effective connectivity underlie these 2 forms of RS to complex objects within the ventral visual pathway, irrespective of stimulus category.
The change in top-down connectivity during RS across different image sizes is consistent with models of predictive coding, in which RS is thought to reflect a decrease in prediction error between bottom-up (stimulus-related) and top-down (prediction-related) inputs (Henson 2003; Friston 2005). However, our data show that top-down modulations may not be necessary for all forms of RS since a change in forward connectivity alone occurred in the same-size condition. Changes in forward connectivity during the same-size condition may be a consequence of reduced “bottom-up” prediction error originating from “earlier” visual areas feeding into OFA. However, the absence of a change in forward connectivity between the same- and different-identity conditions across changes in image size may reflect limitations in the accuracy of predictions from FFA to OFA. For example, repetition of the same face identity may change the predictive inputs from FFA to OFA so that they become more tuned toward a specific identity. However, during blocks of changing size, top-down predictions may only provide guidance regarding face identity and not accurate predictions of identity and image-size combinations (see also Ewbank et al. 2011). Therefore, a forward prediction error occurs in the same-identity vary-size blocks, meaning that forward connectivity does not differ between same- and different-identity conditions.
An alternative explanation is that a change in forward connectivity is the consequence of neuronal fatigue, whereby repeated activation of the same OFA population leads to reduced neuronal firing (see Grill-Spector et al. 2006). It is also possible that changes in top-down connectivity could reflect fatigue of a neuronal population within FFA, resulting in a change in feedback to OFA. However, the idea that RS is attributable to neuronal fatigue within both OFA and FFA appears difficult to reconcile with the different patterns of connectivity observed in the same-size and vary-size conditions.
According to conventional accounts of RS (based on fatigue-type mechanisms), brain areas showing RS to faces across changes in size, such as the OFA, contain neuronal populations encoding size-invariant representations of facial identity. However, our DCM results suggest that RS in OFA across changes in image size is the consequence of modulation from FFA. For the vary-size condition, we found a change in backward connectivity (FFA-to-OFA) but no change in forward connectivity, suggesting RS in OFA is attributable to changes in input from FFA. Although the current results do not speak to the precise nature of the processes performed in these 2 regions, one interpretation of these findings is that presentations of the same face at different sizes may activate different neuronal populations in OFA. In contrast, neural populations in FFA may be invariant to transformations in size, and thus modulatory input from this region suppresses responses in OFA. However, when using RS to infer neural representations in FFA, it is also important to consider the FFA’s role within a hierarchy of face-processing regions. In other words, RS in FFA may also be the consequence of top-down modulation. As such, any conclusions drawn from this study are specific to the network space explored—that is, OFA and FFA only.
One candidate region that may modulate FFA function is the anterior temporal lobe. Previous work has shown that this region is involved in representing higher level information relating to specific identities (Kriegeskorte et al. 2007). However, we found no significant effect of RS in this area or any other brain area besides OFA and FFA, and only a minority of participants showed face-related activation in the anterior temporal lobe at a liberal threshold. Similarly, previous studies have reported that the anterior temporal lobe shows less face selectivity than OFA or FFA and shows less consistent activation across subjects than these occipitotemporal regions (Fairhall and Ishai 2007; Kriegeskorte et al. 2007; Rajimehr et al. 2009). The inconsistent nature of face-selective responses in the anterior temporal lobe is often attributed to the high magnetic susceptibility in this region; however, we used scanning parameters that minimized signal dropout in this area (Bellgowan et al. 2006). It is also worth noting that studies reporting face-related activation in anterior temporal lobe used extensive signal averaging approaches (Tsao et al. 2008; Rajimehr et al. 2009) or multivariate pattern analysis (Kriegeskorte et al. 2007; Carlin, Calder et al. 2011; Carlin, Rowe et al. 2011), and thus may be more sensitive than “standard” univariate localizer scans.
Functional Architecture of the Core Face Network
In Haxby et al.’s (2000) neural model of face processing, the core network involved in face perception is characterized as a hierarchical system, with face information directly entering OFA only (see also Fairhall and Ishai 2007). Here, we tested alternative models in which input could enter either or both face-selective regions. Evidence clearly favored models with inputs to both OFA and FFA, suggesting that activity in FFA is partly independent of that in OFA. This accords with evidence from prosopagnosic patients who show right FFA activity despite damage to the right OFA (Rossion et al. 2003; Schiltz et al. 2006) or to bilateral OFA (Steeves et al. 2006) and with research suggesting the existence of direct anatomical connections from retinotopic visual cortex to both the lateral occipital and fusiform regions (Kim et al. 2006). Of particular relevance to the current study, the same prosopagnosic patients also fail to show RS to facial identity in FFA (Dricot et al. 2008; Steeves et al. 2009). This suggests that although information is projected directly to this region, RS in FFA appears to be dependent upon interactions with an intact OFA. In accordance with this, the current RS study provides the first direct evidence of reciprocal (forward and backward) connectivity between these 2 regions.
Unlike our previous study examining the mechanisms underlying RS to body images in EBA and FBA (Ewbank et al. 2011), our current results did not favor models in which RS also altered self-connectivity, which would be associated with within-region changes such as neuronal fatigue. However, this may reflect an important difference in estimation procedures between the 2 studies: in DCM10, used here, endogenous self-connectivity can vary rather than being a fixed inhibitory connection. This means that the value of endogenous self-connections reflects average changes in self-connectivity across all experimental conditions and any effect of RS on self-connectivity reflects an additional “context-dependent” change. Our findings therefore suggest that changes in self-connectivity do not differ between same- and different-identity blocks.
A second difference between the 2 studies is that repetition of identical face images was associated with changes in forward connectivity alone, as opposed to changes in forward and backward connectivity that were observed for repetition of identical body images. A possible explanation is that in contrast to the previous study which favored a model in which driving input entered EBA only (Ewbank et al. 2011), here we found evidence suggesting that driving input enters both OFA and FFA in parallel. Given this parallel input, and the observation that FFA has reciprocal connections with OFA, a change in connectivity from OFA-to-FFA could reflect a change that occurs before or after OFA receives input from FFA. Future developments in the application of DCM in magnetoencephalography/electroencephalography studies could help to resolve the precise timing of these effects. To some extent, all inferences about directionality are dependent upon where information enters the system. However, the notion of hierarchical processing need not be restricted to a strictly serial sequence but applies to any network in which there are well-defined levels of processing. In a hierarchical network, information can flow in both directions, can skip intermediate levels, and can travel in parallel through distinct channels (Felleman and Van Essen 1991). As such, the term backward or top-down connectivity describes the influence of a “higher level region” on a “lower level region” based upon their relative locations within the visual hierarchy, regardless of the location of the driving input. However, the key common finding from both studies is that different patterns of connectivity underlie RS to identical images and RS across changes in image size, with RS across size characterized by changes in backward connectivity and repetition of an identical image associated with changes in forward connectivity.
The univariate analyses identified different patterns of RS for familiar and unfamiliar same- and vary-size faces, with greater RS for unfamiliar same-size faces. However, model selection identified the same winning models for both familiar and unfamiliar faces in this condition. One of the limitations of DCM is that it is not possible to directly compare Bayesian model evidence for models fitted to different data sets (i.e., familiar face same-size vs. unfamiliar face same-size), as model evidence only reflects a relative statement about a particular model compared with all other models fitted to a particular data set. Thus, while BMS identified qualitatively similar models, we were unable to test for any possible quantitative differences that may exist between these models.
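This restriction follows from the quantities on which model comparison rests. As a sketch in standard Bayesian notation (the symbols y for a data set, m for a model, and B_{12} for a Bayes factor are introduced here for illustration and are not used elsewhere in this paper):

```latex
p(m \mid y) = \frac{p(y \mid m)\, p(m)}{\sum_{m'} p(y \mid m')\, p(m')},
\qquad
B_{12} = \frac{p(y \mid m_1)}{p(y \mid m_2)}
```

Both expressions are conditioned on a single data set y, and the posterior is normalized over the models fitted to that data set. A ratio of evidences computed on two different data sets therefore does not correspond to a comparison between models.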
Previous studies using repetition priming paradigms, in which each face is repeated only once after a varying number of intervening faces, have also found RS in occipitotemporal cortex (Henson et al. 2000; 2003). By contrast, the current study used a blocked design, where the same identity is repeated multiple times within a block. A blocked design was chosen because it is typical of many prior fMRI-adaptation studies (e.g., Grill-Spector and Malach 2001; Loffler et al. 2005; Mazard et al. 2006; Andrews et al. 2010) used to infer the stimulus properties of face-selective regions and because it maximizes sensitivity to basic fMRI RS effects. Furthermore, block designs are optimal for the application of DCM, as DCM is not sensitive to brief modulations, as would occur in an RS design in which repeats and nonrepeats were intermixed. However, it would be interesting to determine whether a similar pattern of connectivity is observed using a nonblocked design, where, for example, participants have fewer expectations about the nature of the next stimulus.
In conclusion, our findings suggest that different neural accounts underlie different forms of RS in face-processing regions of occipitotemporal cortex. Repetition of the same image of the same face produced changes in forward connectivity (OFA-to-FFA), whereas repetition across changes in size affected backward or top-down connectivity (FFA-to-OFA). These findings suggest that RS in a given face-selective region may reflect a change in the interactive relationship between regions, rather than “within-region” changes such as fatigue of an underlying neuronal population. In addition, our results suggest that the core face-processing network is characterized by reciprocal connectivity rather than a purely feedforward architecture and that OFA and FFA receive direct parallel inputs from lower visual regions. Previous RS studies of face processing have largely interpreted their findings based on the fatigue model of RS. These results challenge the inference that RS in a given face-selective region reflects the neural representations contained in that area and instead suggest that RS is not necessarily “localized” to any given region. Taken together with previous evidence that RS to bodies is associated with a change in top-down connectivity between FBA and EBA (Ewbank et al. 2011), these findings suggest that changes in top-down connectivity may be part of a general mechanism underlying fMRI RS within the ventral visual stream.
UK Medical Research Council under project codes MC_US_A060_0017 (to A.J.C.) and MC_US_A060_0046 (to R.N.H.); Wellcome Trust (088324 to J.B.R.).
Conflict of Interest: None declared.