Recognition of faces and written words is associated with category-specific brain activation in the ventral occipitotemporal cortex (vOT). However, topological and functional relationships between face-selective and word-selective vOT regions remain unclear. In this study, we collected data from patients with intractable epilepsy who underwent high-density recording of surface field potentials in the vOT. “Faces” and “letterstrings” induced outstanding category-selective responses among the 24 visual categories tested, particularly in high-γ band powers. Strikingly, within-hemispheric analysis revealed alternation of face-selective and letterstring-selective zones within the vOT. Two distinct face-selective zones were located in the anterior and posterior portions of the mid-fusiform sulcus, whereas letterstring-selective zones alternated between and outside of these 2 face-selective zones. Further, a classification analysis indicated that activity patterns of these zones mostly represent dedicated categories. Functional connectivity analysis using Granger causality indicated asymmetrically directed causal influences from face-selective to letterstring-selective regions. These results challenge the prevailing view that different categories are represented in distinct contiguous regions in the vOT.
Face recognition is essential for the social interactions of primates (Rolls 2010). In humans, literacy further facilitates visual communications beyond space and time. A key brain structure implicated in the recognition of both “faces” and “written words” is the ventral occipitotemporal cortex (vOT) (Allison, McCarthy, et al. 1994; Halgren et al. 1994; Puce et al. 1996). Specifically, functional magnetic resonance imaging (fMRI) studies have shown that visual presentation of particular visual categories such as faces, places, bodies, and written words induced hemodynamic responses in specific vOT regions (Grill-Spector and Malach 2004). The fusiform face area (FFA) is a right-dominated vOT region specialized for face recognition (Kanwisher and Yovel 2006). As an individual learns to read and write, another region within the occipitotemporal sulcus—the left-dominated “visual word form area (VWFA)”—becomes specifically activated by visual presentation of words or letterstrings (Dehaene and Cohen 2011; Price 2012). Despite hemispheric dominance, activation was often bilaterally observed for both categories (Puce et al. 1996; Kanwisher et al. 1997; Cohen et al. 2003). Electrocorticography (ECoG), recording of brain-surface field potentials from chronically implanted subdural electrode arrays in patients of medically intractable epilepsy (Lachaux et al. 2003; Mukamel and Fried 2012), also revealed bilateral event-related potentials selective to faces and letterstrings (Halgren et al. 1994; Nobre et al. 1994; Allison et al. 1999; McCarthy et al. 1999; Puce et al. 1999; Chan et al. 2011). Importantly, the within-hemispheric topological relationship between the face-selective and word-selective vOT regions remains unsettled. Three hypotheses have been proposed (Fig. 1). First, both face-selective and letterstring-selective vOT regions might form distinct and contiguous areas with no macroscopic spatial overlap between them (single-zone hypothesis). 
Second, multiple face-selective regions and multiple letterstring-selective regions might exist and alternate with each other within the vOT (multizone hypothesis). Third, face-responsive and letterstring-responsive regions might overlap macroscopically in the vOT (overlap hypothesis). Since the discovery of the FFA and the VWFA, the single-zone hypothesis has been most widely accepted. However, functional specificity and regional specificity of the FFA and the VWFA have been challenged (Price and Devlin 2011; Weiner and Grill-Spector 2012). The overlap hypothesis emphasizes the importance of the patterns of suboptimal neural activity distributed in the vOT which may encode faces and object categories (Haxby et al. 2001). Both the FFA and the VWFA occupy large portions along the anteroposterior axis of the vOT cortex. Recently, Weiner and coworkers argued that high-resolution fMRI revealed 2 separate (anterior and posterior) face-selective vOT patches near the midfusiform sulcus in individual hemispheres, and that body-selective regions alternate with them (Weiner and Grill-Spector 2010; Parvizi et al. 2012). Likewise, in the greater VWFA, successive activation peaks have sometimes been observed along the anteroposterior axis (Dehaene et al. 2002; Vinckier et al. 2007; Wandell 2011). These findings raised the plausibility of the multizone hypothesis. In general, population analysis of data acquired in different experiments often makes it difficult to distinguish between the single-zone, multizone, and overlap hypotheses.
A pioneering ECoG study reported that face-selective and letterstring-selective recording channels occasionally abutted each other in individual hemispheres (Nobre et al. 1994). Other ECoG studies have found evidence for 2 face-selective vOT patches proposed by Weiner and Grill-Spector (2012) using high-resolution fMRI, but this is less common due to coverage issues and electrode organization (e.g., strips vs. grids) (Allison et al. 1999; Parvizi et al. 2012).
The present study focused on within-hemispheric analyses in 6 hemispheres from 4 patients out of a larger series of studies (30 hemispheres from 20 patients). We collected data from those patients who underwent high-density implantation of 18–46 intracranial electrodes in the vOT per hemisphere and in whom multiple face-selective channels and letterstring-selective channels were identified. For each subject, ECoG responses to 24 visual categories including faces and letterstrings were recorded, and selectivity was defined based on analysis of variance (ANOVA) corrected for multiple comparisons, or more conservatively based on the d′ value calculated from signal detection theory (Green and Swets 1966). Our objective was to examine whether within-hemispheric analysis supported the single-zone, multizone, or overlap hypothesis. Empirically, the “single-zone” model would be supported if face-selective channels form one continuous cluster and letterstring-selective channels form another cluster without overlap in the vOT of each hemisphere. The “multizone” model would be supported if face-selective and letterstring-selective channels each form multiple clusters in the vOT without overlap. The “overlap” model would be supported if face-responsive and letterstring-responsive channels overlap. To test the validity of the overlap hypothesis directly, we also conducted a classification analysis that quantitatively examined how the face and the letterstring information are distributed in the letterstring-selective and face-selective channels, respectively. Another advantage of high-density recording is that it enables evaluation of connectivity among multiple sites within hemispheres. Therefore, we compared Granger causality within and across the face-selective and letterstring-selective channels (Granger 1969) in order to clarify the functional relationship between these 2 category-selective vOT regions.
Materials and Methods
Written informed consent was obtained from 20 patients with pharmacologically intractable epilepsy, who were evaluated for possible surgical treatment at The University of Tokyo Hospital or Nishi-Niigata Chuo National Hospital. Experimental protocols were approved by the institutional review boards of both hospitals and Niigata University School of Medicine.
From 2009 to 2012, we implanted subdural electrodes in 20 patients, totaling 30 hemispheres, for the purpose of detecting epileptic foci. Among them, 2 patients (2 hemispheres) were excluded from analyses (Table 1), because only 1 category (letterstring) was tested. To validate the category selectivity for each channel, we further excluded from analyses the patients who could not perform a visual category judgment task or a one-back task during presentation of 24 categories of visual stimuli. To examine whether the face-selective and letterstring-selective channels were spatially separated, alternated, or overlapped, we excluded the hemispheres implanted with ≤15 electrodes in the vOT and those in which ANOVA (see Category-selectivity section) did not reveal coexistence of multiple face-selective channels and multiple letterstring-selective channels. Subdural ECoG electrode arrays were arranged in grids or strips (Unique Medical Co., Tokyo, Japan). Each grid/strip contained 4–20 electrodes. Electrode contacts were 1.5 mm in diameter with a 5-mm separation, or 3 mm in diameter with a 10-mm separation. The number and location of the recording sites in the temporal, occipital, and frontal lobes were determined exclusively by clinical criteria. The recorded signal was amplified using a reference placed on the scalp, filtered between 0.55 and 150 Hz, and sampled at 400 Hz (Nicoletone, Care Fusion, San Diego, CA, USA), or amplified using an averaged intracranial reference placed outside of the epileptic focus, filtered between 0.05 and 300 Hz, and sampled at 1 kHz (EEG1200, Nihon Kohden, Tokyo, Japan). All data were acquired during periods without epileptic seizure events.
|Number of channels||113||107||865||585||239|
|Number of visually responsive channels||15||8||139||371||113|
|d′ (category selectivity) in visually responsive channels (Mean ± SD)||0.35 ± 0.13||0.33 ± 0.11||0.40 ± 0.21||0.78 ± 0.59||0.59 ± 0.39|
Note: Data from 1909 channels in 28 hemispheres.
Subjects performed a one-back task or a category judgment task while colored photographs of 120 objects from 24 different categories, including human faces and letterstrings, were presented in a pseudorandom order on a 27-inch LCD monitor at a viewing distance of 57 cm. During the one-back task, the same stimuli were repeatedly presented on some trials. Data from the second-presentation trials were excluded so that repetition suppression would not lead to underestimation of the visual response. Each stimulus, subtending 6° of visual angle, was presented for 300 ms, followed by a 900-ms interval period.
Three-dimensional (3D) T1-weighted magnetic resonance images (MRIs) of each subject's brain, which consisted of 136 sequential 1.4-mm-thick axial slices with a resolution of 256 × 256 pixels in a field of view of 240 mm, were obtained preoperatively. MRIs were automatically registered to postoperatively scanned computed tomography to determine electrode positions based on a normalized mutual information method using AVIZO (Visualization Science Group, Bordeaux, France). The 3D brain surface was then reconstructed using Real INTAGE (Cybernet Systems, Ltd., Tokyo, Japan). For joint presentation, the 3D brain image mounted with electrode locations was normalized to Montreal Neurological Institute coordinates via a linear scale adjustment using SPM8 (http://www.fil.ion.ucl.ac.uk/spm/software/spm8/, last accessed on 11/12/13), and then, transformed into Talairach coordinates.
In-house Matlab codes (MathWorks, Natick, MA, USA), with a statistical toolbox and signal processing toolbox, were used for data analysis (Matsuo et al. 2011; Toda et al. 2011). Signals acquired at 1 kHz were resampled offline at 400 Hz. A notch filter was applied at 50 Hz in the analysis of visually evoked potentials (VEPs). VEPs during a 1.2-s trial epoch (600 ms from stimulus onset, 481 sampling points) were averaged across trials with baseline offset correction (−200 to 0 ms). Event-related spectral perturbation (ERSP) was calculated from the power spectrum over the trial epoch. Each 1.2-s epoch was Hamming windowed using 250-ms sliding windows in 2.5-ms steps prior to Fourier transformation. The averaged baseline (−375 to −125 ms) power spectrum was subtracted. Latency of visual response was determined from VEP or γ band power as the first of 4 consecutive data points that deviated significantly (paired t-test, P < 0.05) from the baseline response (−250 to −50 ms).
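The sliding-window ERSP computation described above can be illustrated with a minimal Python sketch. It is not the in-house Matlab code. It assumes that the epoch starts 600 ms before stimulus onset and that the baseline spectrum is taken from the single 250-ms window spanning −375 to −125 ms. It uses a direct discrete Fourier transform only to stay free of external libraries. The names `EPOCH_START`, `power_spectrum`, and `ersp` are hypothetical.

```python
import cmath
import math

FS = 400            # sampling rate in Hz (from the text)
WIN = 100           # 250-ms window at 400 Hz
EPOCH_START = -0.6  # ASSUMPTION: epoch begins 600 ms before stimulus onset


def hamming(n):
    """Hamming taper of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(segment):
    """Power at each non-negative frequency bin of a Hamming-tapered segment."""
    n = len(segment)
    tapered = [s * w for s, w in zip(segment, hamming(n))]
    power = []
    for k in range(n // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(tapered))
        power.append(abs(coef) ** 2)
    return power


def mean_spectrum(trials, start):
    """Trial-averaged power spectrum of the window beginning at sample `start`."""
    spectra = [power_spectrum(trial[start:start + WIN]) for trial in trials]
    return [sum(col) / len(trials) for col in zip(*spectra)]


def ersp(trials, step=1, baseline_start=-0.375):
    """Return (window start times in s, baseline-subtracted power per window).

    step=1 sample corresponds to the 2.5-ms step in the text; a larger step
    only thins the time axis.
    """
    n_samples = len(trials[0])
    starts = list(range(0, n_samples - WIN + 1, step))
    # ASSUMPTION: baseline is the one window covering -375 to -125 ms.
    b0 = round((baseline_start - EPOCH_START) * FS)
    base = mean_spectrum(trials, b0)
    times = [EPOCH_START + s / FS for s in starts]
    rows = [[p - b for p, b in zip(mean_spectrum(trials, s), base)]
            for s in starts]
    return times, rows
```

With a 100-sample window at 400 Hz, the frequency bins are 4 Hz apart, so bin `k` corresponds to `4 * k` Hz; band power can then be obtained by averaging the relevant bins.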
We investigated the category selectivity using ANOVA and index of d′. Category selectivity of the ERSP between 50 and 300 ms after stimulus onset for each channel was evaluated. First, ANOVA and post hoc t-tests were conducted to compare the differences in the mean responses between one category and other categories accompanied by a Bonferroni correction. The α value for each comparison was set at 0.05 divided by the number of channels, frequency bands, and categories. For individual hemisphere analysis, we focused on those hemispheres where multiple face-selective channels and multiple letterstring-selective channels were simultaneously identified in the vOT (−80 < y < −30 and z < −10). Second, d′ (Green and Swets 1966) was computed as follows:
The procedure was repeated once again after exchanging odd and even-numbered blocks. Significance of category selectivity was evaluated if d′ > 1 in both odd and even blocks.
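As an illustration only, the split-half procedure can be sketched in Python. The d′ formula itself is not reproduced in the text above, so the sketch assumes a commonly used form, d′ = (μ_pref − μ_other) / sqrt((σ²_pref + σ²_other) / 2). It further assumes that the preferred category is chosen in one half of the blocks and d′ is evaluated in the other half. The function names and the dictionary layout are hypothetical.

```python
import math
from statistics import mean, variance


def d_prime(preferred, others):
    """ASSUMED form: difference of means over the root of the averaged variances."""
    pooled_sd = math.sqrt((variance(preferred) + variance(others)) / 2)
    return (mean(preferred) - mean(others)) / pooled_sd


def _select_then_evaluate(select, evaluate):
    """Pick the preferred category in `select`, quantify d' in `evaluate`."""
    best = max(select, key=lambda cat: mean(select[cat]))
    others = [r for cat, rs in evaluate.items() if cat != best for r in rs]
    return best, d_prime(evaluate[best], others)


def split_half_selectivity(odd, even, threshold=1.0):
    """odd/even map category name -> list of band-power responses.

    Returns (category or None, (d' evaluated on even, d' evaluated on odd)).
    A channel counts as selective only if both halves agree on the category
    and d' exceeds the threshold in both directions.
    """
    cat_a, d_a = _select_then_evaluate(odd, even)
    cat_b, d_b = _select_then_evaluate(even, odd)
    selective = cat_a == cat_b and d_a > threshold and d_b > threshold
    return (cat_a if selective else None), (d_a, d_b)
```

Using independent halves to choose and to quantify the preferred category avoids the upward bias that arises when the same trials serve both purposes.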
To examine the time course of the category selectivity, d′ was calculated from the ERSP at every sampling point (2.5 ms). We determined the latency at which the d′ value exceeded 1. We calculated peak times of the response between 0 and 475 ms from the averaged time course in each electrode.
Using a neural decoding approach, category information encoded in a group of electrodes (face-selective channels or letterstring-selective channels) was evaluated. We selected 2 object categories and picked up the trials in which images included in those categories were presented. Using those trials, a binary classifier was trained to predict the category of a presented image on a trial-by-trial basis and tested (Kamitani and Tong 2005). We applied this procedure to all pairs of the 24 object categories. Each binary classifier consisted of a linear support vector machine (Vapnik 1998) [implemented by LIBSVM (Chang and Lin 2011)]. As input features to the classifiers, we calculated the ERSP with the time window between 50 and 300 ms after the stimulus onset, and those for the electrodes of a group and the 5 frequency bands were used. To avoid arithmetic overflow and underflow, the values of each feature were z-transformed using the sample mean and SD calculated with the training dataset. To evaluate generalization performance for category classification across different exemplars, we ensured that trials corresponding to the same visual stimuli were not included in both the training and the test datasets (Vindiola and Wolmetz 2011). For each category pair, we divided the exemplars included in either category of the pair into 5 groups, each of which contained 2 exemplars from the 2 different categories and divided the corresponding trials into 5 groups. Four groups were then used to train a decoder and the remaining group was used for evaluating the trained classifier. This procedure was repeated until the trials from all 5 groups were tested (5-fold cross-validation), and the percentage of correct classification was calculated.
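The exemplar-wise cross-validation scheme can be sketched as follows. This is an illustration under stated assumptions, not the LIBSVM pipeline: a nearest-centroid rule stands in for the linear support vector machine so that the example needs no external library. Exemplars are assumed to be indexed 0–4 within each category. The trial tuple layout and all function names are hypothetical. What the sketch does preserve is the structure of the procedure: z-scoring with training-set statistics only, and folds defined by exemplar so that no image contributes trials to both training and test sets.

```python
from statistics import mean, pstdev


def zscore_params(rows):
    """Column-wise mean and SD, estimated from training rows only."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def apply_z(row, mu, sd):
    return [(v - m) / s for v, m, s in zip(row, mu, sd)]


def centroid(rows):
    return [mean(c) for c in zip(*rows)]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_accuracy(trials, cat_a, cat_b, n_folds=5):
    """trials: list of (category, exemplar_index, feature_vector).

    Fold i holds exemplar i of each of the two categories, so test exemplars
    never appear in training. Returns percent correct over all folds.
    """
    pair = [t for t in trials if t[0] in (cat_a, cat_b)]
    correct = 0
    for fold in range(n_folds):
        train = [t for t in pair if t[1] % n_folds != fold]
        test = [t for t in pair if t[1] % n_folds == fold]
        mu, sd = zscore_params([t[2] for t in train])
        # Stand-in classifier: nearest class centroid in z-scored space.
        cents = {c: centroid([apply_z(t[2], mu, sd) for t in train if t[0] == c])
                 for c in (cat_a, cat_b)}
        for cat, _, feats in test:
            z = apply_z(feats, mu, sd)
            predicted = min(cents, key=lambda c: sqdist(z, cents[c]))
            correct += predicted == cat
    return 100.0 * correct / len(pair)
```

Applying `pairwise_accuracy` to every pair produced by `itertools.combinations` over the 24 category names would yield the full set of pairwise performances.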
Time Course of VEP and ERSP Responses for Face- and Letterstring-Stimuli
Visual latencies were calculated from both VEP and ERSP in each trial. We determined the latency when the response exceeded the mean ± 2.5 SD of the response in the prestimulus period for the 4 consecutive data sampling points. We calculated peak times of the response between 0 and 475 ms for VEP and high-γ activity from the averaged waveform in each electrode. Latencies and peak times for face and letterstring stimuli from 6 hemispheres were tested using the Mann–Whitney U-test.
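The latency criterion can be illustrated with a short Python sketch. It assumes that the prestimulus period comprises all samples before stimulus onset (the exact interval is not restated here) and that the reported latency is the time of the first of the 4 consecutive supra-threshold points. The function name and arguments are hypothetical.

```python
from statistics import mean, stdev


def onset_latency(times_ms, response, n_sd=2.5, run=4):
    """First post-stimulus time at which `run` consecutive samples leave the
    prestimulus mean +/- n_sd * SD band; None if this never happens."""
    # ASSUMPTION: every sample with t < 0 belongs to the prestimulus period.
    pre = [r for t, r in zip(times_ms, response) if t < 0]
    mu, sd = mean(pre), stdev(pre)
    low, high = mu - n_sd * sd, mu + n_sd * sd
    count = 0
    for i, (t, r) in enumerate(zip(times_ms, response)):
        if t < 0:
            continue
        if r < low or r > high:
            count += 1
            if count == run:
                return times_ms[i - run + 1]
        else:
            count = 0  # an isolated excursion does not count as a response
    return None
```

Requiring several consecutive points makes the estimate robust to single-sample noise at the cost of missing responses shorter than 10 ms.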
Spatial Distribution of Face-Selective Channels and Letterstring-Selective Channels
Individual brains were transformed into a spatially normalized brain using SPM8 and all the face- and letterstring-selective vOT channels were plotted in the Talairach coordinates.
Functional Connectivity Between Channels
The eConnectome MATLAB software (http://econnectome.umn.edu, last accessed on 11/12/13) was used to investigate directional interactions (He et al. 2011), or Granger causality, between face-selective channels and letterstring-selective channels. The Granger causality tests a statistical hypothesis for determining whether one time series is useful in forecasting another (Seth 2007). A variable X1 ‘Granger causes’ a variable X2 if information in the past of X1 helps predict the future of X2 with better accuracy than is possible when considering only information in the past of X2 itself. The adaptive-directed transfer function (ADTF) estimates the directional causal interaction in the spectral domain (Wilke et al. 2008). ADTF in the high-γ band power was calculated from ECoG signals evoked by face and letterstring stimuli. ADTFs between 50 and 300 ms, after stimulus onset, were averaged. To evaluate the effect of directional combinations on the interaction strength in each combination, ADTFs from 6 hemispheres with multiple face- and letterstring-selective channels were tested using the Kruskal–Wallis test and Mann–Whitney U-test accompanied by a Bonferroni correction.
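The definition of Granger causality given above can be made concrete with a small time-domain sketch. It is not the ADTF and does not use the eConnectome API: it fits a lag-1 autoregressive model by ordinary least squares, which is an assumption made for brevity, and reports the log ratio of residual sums of squares with and without the source signal. The function names and the simulated data are hypothetical.

```python
import math
import random


def _demean(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def granger_lag1(source, target):
    """ln(RSS_restricted / RSS_full) for predicting target[t].

    Restricted model: target[t] ~ a * target[t-1]
    Full model:       target[t] ~ a * target[t-1] + b * source[t-1]
    """
    x, y = _demean(source), _demean(target)
    y_now, y_past, x_past = y[1:], y[:-1], x[:-1]
    syy = sum(v * v for v in y_past)
    sxx = sum(v * v for v in x_past)
    sxy = sum(a * b for a, b in zip(x_past, y_past))
    sy = sum(a * b for a, b in zip(y_past, y_now))
    sx = sum(a * b for a, b in zip(x_past, y_now))
    # Restricted model: one-parameter least squares.
    a_r = sy / syy
    rss_r = sum((t - a_r * p) ** 2 for t, p in zip(y_now, y_past))
    # Full model: solve the 2x2 normal equations in closed form.
    det = syy * sxx - sxy * sxy
    a_f = (sy * sxx - sx * sxy) / det
    b_f = (sx * syy - sy * sxy) / det
    rss_f = sum((t - a_f * p - b_f * q) ** 2
                for t, p, q in zip(y_now, y_past, x_past))
    return math.log(rss_r / rss_f)


def simulate_driven_pair(n=500, seed=0):
    """Toy data: x is white noise and y follows x with a one-sample lag."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]
    return x, y
```

On the toy data, `granger_lag1(x, y)` is large whereas `granger_lag1(y, x)` stays near zero, which is the kind of directional asymmetry the analysis is designed to detect.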
Recordings from a total of 1909 electrodes in 28 hemispheres were analyzed. The majority of the category-selective channels were located in the vOT (Table 1). The areas of all the category-selective channels did not contain epileptiform abnormalities.
Topological Alternation of Face- and Letterstring-Selective Zones in the vOT
In the bilateral vOT (−80 < y < −30 and z < −10), face-selective channels and letterstring-selective channels (Fig. 2) were predominantly observed in the fusiform gyrus (Fig. 3, Table 2). To examine whether the face-selective and letterstring-selective channels were spatially separated, alternated, or overlapped (Fig. 1), among the whole set of data from 28 hemispheres in 18 patients, 10 hemispheres were excluded because of limited tested stimulus categories, 7 hemispheres were excluded because of insufficient number of channels in vOT, 5 were excluded because multiple face- and letterstring-selective channels were not found within hemispheres. Finally, 6 hemispheres (3 left and 3 right hemispheres) from 4 patients (15–48 years old, 3 males, 4 right-handed) fulfilled the criteria. Two patients underwent bilateral implantation of the electrode arrays, while 2 patients underwent unilateral implantation. Data were obtained from 18 to 46 electrodes in vOT for each hemisphere from a total of 207 recording sites. Quantitative measure of the response amplitude and the selectivity of each electrode revealed topological alternation for face-selective and letterstring-selective channels (Fig. 3). We defined the topological alternation of face-/letterstring-selective channel clusters if one or more face-/letterstring-selective channels were sandwiched by 2 letterstring-/face-selective channels, respectively. Alternation was observed in 4 of the 6 hemispheres (Fig. 3A–C,E, black squared insets). In the other 2 hemispheres, where cortical coverage of electrodes was restricted (see Discussion) (Fig. 3D,F), alternation was not observed after statistically thresholding (Fig. 3D,F black squared insets), but appeared in selectivity measurement (Fig. 3D,F pie chart). To clarify the anatomical location of the face-selective and letterstring-selective channels relative to the cerebral sulci, we plotted them on individual brains (Fig. 3). 
Although the locations of implanted electrodes varied across subjects, 2 distinct face-selective zones were noted in the anterior and posterior portions of the midfusiform sulcus in both left and right hemispheres. The anterior face zone was consistently observed in 5 of the 6 hemispheres, whereas the posterior face zone was observed in 4 of the 6 hemispheres. Alternating with these face-selective zones, there appeared 3 letterstring-selective zones: one located anterior to the anterior end of the midfusiform sulcus and near the collateral sulcus, another in the anterior portion of the midfusiform sulcus just behind the anterior face zone, and the other located in the posterior portion of the mid-fusiform sulcus. Taken together, the distribution of the face- and letterstring-selective zones appeared to be alternated and partially overlapped, as if constituting a multilayered stripe pattern architecture between the collateral sulcus and the occipitotemporal sulcus (Fig. 1 center).
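The sandwich criterion for alternation can be expressed as a short Python sketch. It simplifies the real electrode layout: channels are assumed to be ordered along one axis (for example, along a strip), and non-selective channels between selective ones are ignored. Neither assumption is stated in the definition above. The labels `"F"` and `"L"` and the function names are hypothetical.

```python
from itertools import groupby


def selective_runs(labels):
    """Collapse an ordered channel sequence into runs of selective labels.

    labels: 'F' (face-selective), 'L' (letterstring-selective), or None.
    ASSUMPTION: non-selective channels are skipped rather than breaking a run.
    """
    selective = [x for x in labels if x in ("F", "L")]
    return [key for key, _ in groupby(selective)]


def has_alternation(labels):
    """True if one or more channels of one type are flanked on both sides
    by channels of the other type (F-L-F or L-F-L)."""
    return len(selective_runs(labels)) >= 3
```

For example, the sequence `["F", "F", None, "L", "F"]` collapses to the runs F, L, F and therefore counts as alternation, whereas `["F", "F", "L", "L"]` does not.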
|Category||Number of selective channels||Number of selective channels in each frequency band||Mean d′|
|Number of selective channels in each frequency band||12||8||10||17||58|
Note: Data from 207 channels in 6 hemispheres.
Faces and letterstrings elicited the most and second most category-selective neural responses, respectively, among the 24 visual categories tested (Table 2, Fig. 2, Supplementary Figs S1 and S3). In typical recording sites in the right vOT [(x, y, z) = (31, −65, −24) and (36, −54, −27) in Talairach coordinates for Fig. 2A,B, respectively], VEPs and total ERSP power were sharply selective to the “face” and “letterstring” categories (Fig. 2A,B). In all the 6 hemispheres, 64 channels exhibited significant category selectivity. The majority of the category-selective channels (52 of 64) were selective to single categories (46 were selective in a particular frequency component among the θ, α, β, γ, and high-γ bands, whereas 18 were selective across multiple frequency components). Category selectivity of the ECoG signals was most prominent in high-γ band powers. In addition to the conventional ANOVA with familywise error correction, we also estimated the category selectivity by d′ analysis (Fig. 4A) evaluated at d′ > 1 in θ, α, β, γ, or high-γ band power (Supplementary Fig. S1, Table 2). The d′ analysis revealed a smaller number of face- and letterstring-selective channels than ANOVA with familywise error correction (Supplementary Fig. S1), and thus appeared more conservative. Nonetheless, alternation of face- and letterstring-selective channels was still observed in d′ analysis (Fig. 4A) in 3 of 3 hemispheres where multiple face-selective channels and letterstring-selective channels were found. A few exceptional channels (1/8, 0/12, 1/9 in each hemisphere) exhibited selectivity for both categories. The alternation was observed even following a cross-validation analysis (Supplementary Fig. S2) where independent sets of data were used to define the preferred category and to quantify category selectivity. 
The channel-wise preference for the face- or letterstring-category emerged along with the time course of visual response and was maintained through the entire visual presentation period (Fig. 4B).
The stereotaxic coordinates of the face-selective channels in the vOT plotted on the spatially normalized brain (Fig. 5A) were in agreement with right-dominated, bilateral face activations reported in prior fMRI studies (Puce et al. 1996; Kanwisher et al. 1997; Grill-Spector et al. 2004; Barton et al. 2009; Haist et al. 2010; Mei et al. 2010), and also consistent with face-related VEPs bilaterally located by ECoG (Allison et al. 1999, 2002; Parvizi et al. 2012). The averaged ERSP for face-selective and letterstring-selective channels showed clear visual selectivity (Fig. 5B). The ERSP response to faces averaged for all face channels was greater and lasted longer than the ERSP response to letterstrings averaged for all letterstring channels. A scatter-plot analysis indicated that the majority of individual channels exhibited selectivity to face or letterstring stimuli in high-γ band powers (Fig. 5C). No significant difference was detected in the visual response latencies of VEP and high-γ power between face channels and letterstring channels (Fig. 6A,B). Peak response time in high-γ power for face-selective channels was later than that for letterstring-selective channels (326 ± 35 and 275 ± 73 ms; median ± interquartile range (IQR); P = 0.021), while no significant difference was observed in VEP (199 ± 10 ms for face channels and 205 ± 19 ms for letterstring channels; P = 0.26) (Fig. 6A,C). We found that d′ for faces rose more quickly than that for letterstrings by 17 ms (144 ± 33 and 161 ± 28 ms; P = 0.048) (Fig. 6A,B), although peak time in d′ was not different between face- and letterstring-selective channels (293 ± 75 and 278 ± 36 ms; P = 0.34) (Fig. 6A,C). Thus, although face-selective and letterstring-selective regions appeared to be macroscopically overlapped in the vOT, the 2 populations exhibited distinct functional properties.
Category Information in Distributed Activity Patterns
Based on ECoG responses recorded from face-selective channels (Fig. 3), pairwise classification performance of faces was much higher (92 ± 9.5%, median ± IQR) than the classification performance of letterstrings (60 ± 21%) (P < 9.6 × 10⁻³⁸, Mann–Whitney U-test with Bonferroni correction) and other categories (57 ± 14%) (P < 6.1 × 10⁻⁷⁴) (Fig. 7A left, B left). Similarly, based on ECoG responses from letterstring-selective channels, classification performance of letterstrings was much higher (89 ± 13%) than the classification performance of faces (61 ± 15%) (P < 1.4 × 10⁻³⁸) and other categories (61 ± 17%) (P < 3.5 × 10⁻⁶⁶) (Fig. 7A right, B right). Thus, although it is possible that the representations of a few suboptimal categories may be overlapped, most of the suboptimal categories are not overlapped in the sense of above-chance classification performance.
Functional Connectivity Between Face- and Letterstring-Selective Channels
A functional differentiation between face- and letterstring-selective channels was also suggested by the nonuniform distribution of Granger causalities among the recording channels (Fig. 8A). Granger causality within face-selective channels (0.068 ± 0.021, median ± IQR) was significantly higher than that within letterstring-selective channels (0.059 ± 0.020) (P < 0.001, Mann–Whitney U-test with Bonferroni correction). Moreover, across-domain connectivity was asymmetrically directed from face channels to letterstring channels (from face to letterstring channels, 0.075 ± 0.024; from letterstring to face channels, 0.048 ± 0.017; P < 5.7 × 10⁻⁵) (Fig. 8B). If face-selective channels and letterstring-selective channels were driven by a common visual input with a significant lag, the lagged common input may cause apparent pseudocorrelation directed from face to letterstring channels. However, we found no significant difference in the visual response latency between the letterstring-selective channels and face-selective channels for both VEP (face-selective channels, 123 ± 138 ms; letterstring-selective channels, 118 ± 130 ms, median ± IQR) and high-γ activity (face-selective channels, 115 ± 130 ms; letterstring-selective channels, 113 ± 128 ms) (Mann–Whitney U-test, P = 0.12 for VEP, P = 0.32 for high-γ activity) (Fig. 6B). These results rule out the possibility that a common cause was a major contributor to the results of the Granger causality analyses. We also conducted a permutation test by randomly assigning face or letterstring labels to the relevant population. We found no significant directional asymmetry in Granger causality following permutation. This analysis excludes the possibility that the observed causality arose by chance.
We presented rare data of high-density electrocorticographic recording from the human vOT. Our present study is consistent with prior work reporting the coexistence of face-selective and letterstring-selective channels in the vOT, but also adds the important fact that the stimulus-selective electrodes cluster together and generate an alternating organization. To the best of our knowledge, coexistence of multiple face-selective channels and multiple letterstring-selective channels in the vOT has not been described in any individual hemisphere in the previous literature of intracranial recording (Allison, Ginter, et al. 1994; Allison, McCarthy, et al. 1994; Halgren et al. 1994; Nobre et al. 1994; Allison et al. 1999; McCarthy et al. 1999; Puce et al. 1999; Tsuchiya et al. 2008; Fisch et al. 2009; Liu et al. 2009; Vidal et al. 2010; Chan et al. 2011). The current series of precious anecdotal cases provided us significant insight into the functional organization of the vOT, specifically regarding the controversy whether the face-selective and letterstring-selective vOT regions are organized as “single-zone,” “multizone,” or “overlap.” Among the 3 hypotheses, our results support the multizone hypothesis. Granger causality analysis with simultaneously identified multiple face- and letterstring-selective regions revealed asymmetrically directed connection from face-selective to letterstring-selective regions. These results challenge the prevailing view that different categories are represented in distinct contiguous regions in the vOT but, instead, support a novel view that there are alternating zones selective to faces and words in the vOT, with asymmetric functional link between them.
Alternation of Face- and Letterstring-Selective Regions in the Human vOT
Many fMRI and ECoG studies assume that the face-responsive and word-form-responsive vOT regions are anatomically separated, although their relative position remains unclear. Face- and letterstring-selective ECoG channels were described to be “intercalated” in the pioneering studies (Allison, Ginter, et al. 1994; Nobre et al. 1994). However, “intercalation” in these initial papers referred to a situation where face-responsive and letterstring-responsive regions located side by side or constituted pieces of a mosaic of various category-selective regions in the vOT in population analyses, quite different from the definition of alternation in the present paper (i.e., a sandwich structure of face-letterstring-face channels or letterstring-face-letterstring channels). In these early papers, individual subjects were typically placed with one ECoG strip with several electrode contacts, with a 10-mm interelectrode spacing, in the vOT (e.g., Figs 2–4A of Nobre et al. 1994). A subsequent study (Fig. 8 of Puce et al. 1999) showed how face- and letterstring-related ERPs were organized in a single individual. Unfortunately, since only one letterstring-selective channel was identified in this patient, it was impossible to distinguish between the single-zone and multizone hypotheses. Puce et al. (1999) proposed a “random mixture column” model (their Fig. 10A) whereby face-selective columns are randomly intermixed with letterstring-selective or other category-selective columns, contrasting with a “segregated category-specific columns” model whereby multiple columns with similar category selectivity assembled together to form a functionally homogeneous cluster (their Fig. 10B). From sharp channel-wise specificity of the ECoG responses, they rejected the intercalation model at the columnar grouping level. 
At the level of the whole vOT organization, however, these same authors argued, “human face-selective patches of cortex are on average 12–16 mm wide and 15–35 mm long, depending on whether face-selective region is unitary or broken into 2 patches.” Therefore, it has been difficult to determine which of the single-zone, multizone, or overlap hypotheses is most likely by ECoG studies, since the number and density of electrodes implanted in individual hemispheres is solely determined by clinical criteria, and not always sufficient to systematically examine category selectivity across the whole vOT.
By focusing on the cases with high-density electrodes in the vOT, we found that highly face-selective and letterstring-selective regions alternated in the vOT in 4 of the 6 hemispheres examined. Importantly, the remaining 2 cases did not directly support the single-zone or the overlap hypothesis. In 1 of the 2 hemispheres, the result may support the multizone hypothesis with a proviso. In this case, the posterior part of the vOT was not covered, but considering the location of the conventional VWFA (Dehaene et al. 2002; Vinckier et al. 2007), it is likely that there should be another letterstring-selective zone posterolateral to the face-selective zone in this patient, which would be in favor of the multizone hypothesis. In the remaining hemisphere, fewer face- and letterstring-selective channels were detected than in the other hemispheres tested, presumably because the electrode diameter in this patient was 3 mm instead of 1.5 mm, which may affect detectability in high-γ band powers (see the next section). The alternation was still observed in 3 of 3 hemispheres in which multiple face-selective and letterstring-selective regions were defined by d′ analysis (Fig. 4A), even following cross-validation (Supplementary Fig. S2). Face-selective and letterstring-selective sites appeared to be macroscopically overlapped in the vOT, but individual channels showed distinct visual selectivity to the face or letterstring category in 52 out of 64 channels based on ANOVA and 27 out of 29 channels based on the d′ measure. The selectivity, particularly in high-γ band powers, was maintained through the entire visual stimulation period. From these results, we conclude that face-selective channels and letterstring-selective channels can alternate in the human vOT cortex. Specifically, a series of alternating face- and letterstring-selective zones would constitute a stripe architecture.
The “FFA” is the most representative category-selective vOT region, widely accepted as a spatially contiguous cluster that can be accurately localized across subjects (Kanwisher and Yovel 2006). However, there has been no clear consensus as to its extent and spatial relationship to other category-selective regions (Op de Beeck et al. 2008). Pioneering ECoG studies reported face-selective sites 15–25 mm apart (Allison, McCarthy, et al. 1994; Allison et al. 1999). However, these critical findings could not establish whether the sites belonged to a single region or to separate regions, because the density of the grid was insufficient. Our high-density recording showed that 2 face-selective regions were aligned with consistent topology. One region (face zone #1) appeared on the midfusiform gyrus around the anterior lip/anterior part of the midfusiform sulcus (Fig. 3C,E). The other region (face zone #2) appeared on the posterior-fusiform gyrus around the posterior part of the midfusiform sulcus between the collateral sulcus and occipitotemporal sulcus. This organization is consistent with recent human and monkey fMRI studies (Tsao et al. 2003, 2008; Pinsk et al. 2009; Rajimehr et al. 2009; Ku et al. 2011; Parvizi et al. 2012) and monkey electrophysiology studies (Moeller et al. 2008) suggesting that 2 face-selective clusters exist within the vOT. We examined whether these 2 face regions correspond to mFus and pFus or to the FFA and the occipital face area (OFA) (Weiner and Grill-Spector 2010, 2012). We compared the coordinates of these 2 face regions (face zones #1 and #2) with those of mFus, pFus (Weiner and Grill-Spector 2011), and OFA (Grill-Spector et al. 2004). In the right hemisphere, the averaged coordinate of the channels in face zone #1 was estimated at (33 ± 6, −44 ± 3, −27 ± 5, n = 8 channels from 3 hemispheres) (mean ± SD) in Talairach coordinates. 
Face zone #2 was centered at (33 ± 6, −67 ± 6, −21 ± 2, n = 5 channels from 2 hemispheres). The centers of mFus, pFus, and OFA in the literature were located at (34 ± 4, −47 ± 6, −16 ± 4), (34 ± 5, −62 ± 6, −15 ± 3), and (45 ± 5, −70 ± 3, 2 ± 9), respectively. The distances between zone #1 and mFus, pFus, and OFA were 12, 22, and 41 mm, respectively. Among the 3 candidates, face zone #1 was therefore closest to mFus. For zone #2, the distances from mFus, pFus, and OFA were 20, 8, and 16 mm, respectively, so face zone #2 was closest to pFus. The same geometrical relationship was found in the left hemisphere. The distances between zone #1 and mFus, pFus, and OFA were 14, 22, and 45 mm. The distances between zone #2 and mFus, pFus, and OFA were 15, 9, and 35 mm. Thus, it is likely that face zone #1 and face zone #2 identified in the present study correspond to mFus and pFus, respectively. It is possible that ECoG electrodes medial to the midfusiform sulcus pick up signals from neural responses lateral to the midfusiform sulcus if the lip of the sulcus lies lateral to its fundus. Our data are consistent with a recent in-depth analysis reporting that the position of pFus relative to the midfusiform sulcus can be more variable across subjects than the relative position of mFus (Weiner et al. 2013). It is therefore not easy to determine the topology between the ECoG electrodes and specific sulci in some individual analyses.
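The nearest-region comparison described above can be sketched as follows, using the right-hemisphere mean Talairach coordinates quoted in the text. The sketch assumes plain Euclidean distance between the rounded means; the function name `nearest_reference` is hypothetical, and distances recomputed from rounded means need not reproduce the reported values exactly.

```python
import numpy as np

# Right-hemisphere mean Talairach coordinates (mm) as quoted in the text.
zones = {
    "face zone #1": (33, -44, -27),
    "face zone #2": (33, -67, -21),
}
references = {
    "mFus": (34, -47, -16),
    "pFus": (34, -62, -15),
    "OFA": (45, -70, 2),
}

def nearest_reference(zone_xyz, refs):
    """Return the closest reference region and all Euclidean distances (mm)."""
    dists = {
        name: float(np.linalg.norm(np.subtract(zone_xyz, xyz)))
        for name, xyz in refs.items()
    }
    return min(dists, key=dists.get), dists

for zone_name, xyz in zones.items():
    closest, dists = nearest_reference(xyz, references)
    rounded = {name: round(d, 1) for name, d in dists.items()}
    print(zone_name, "-> closest:", closest, rounded)
```

Because only the group means are compared, this ignores the reported standard deviations; a fuller comparison would take the spread of individual channels into account.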
The regional specificity of the VWFA may be more contentious than that of the FFA (Dehaene and Cohen 2011; Price and Devlin 2011). Signal sensitivity of fMRI responses in the vicinity of the VWFA can be affected by considerable inhomogeneity of the blood oxygen level–dependent signals. The anterior portion of the VWFA approaches the anterior fusiform gyrus, which suffers significantly from susceptibility artifacts arising from the auditory canals (Ojemann et al. 1997). In addition, it has recently been reported that there is another source of artifacts arising from the transverse sinus just adjacent to the VWFA, which is present in every subject in a variable position (Winawer et al. 2010). It is likely that different portions of the VWFA are affected depending on the position of the transverse sinus (Wandell 2011). Nonetheless, multiple spots along the anterior-posterior axis of the fusiform gyrus were documented both in fMRI studies (Cohen et al. 2000; Dehaene et al. 2002; Vinckier et al. 2007) and in an early ECoG study (Nobre et al. 1994). Our results are consistent with these findings, and add further evidence that multiple letterstring-selective vOT regions alternate or intermingle with the above-mentioned 2 face-selective vOT regions. The fMRI-defined VWFA is typically located within the occipitotemporal sulcus (Dehaene and Cohen 2011; Price 2012). Thus, letterstring-selective ERSP signals originating from the fundus of the occipitotemporal sulcus might appear on the surface recording more medially than their actual source (but never medial to the midfusiform sulcus), and/or might be underestimated by ECoG. If this is the case, the actual letterstring-selective activity in the posterior lateral fusiform gyrus should be located more laterally and/or be stronger than ECoG indicates. 
As ECoG grids are more likely to pick up neural activity from gyri than from sulci, they are less likely to read out neural activity from the depths of the sulci, which most likely house some of the neural activity in response to the current 24 categories. Other measurements such as fMRI may find support for the alternating organization, or may find other organizations, because fMRI can measure within sulci whereas ECoG is more likely to measure from gyri. Importantly, even in that case, the alternation structure would still be maintained. A recent study indicated that hemodynamic changes measured by fMRI reflected nonphase-locked changes in high-frequency power (Engell et al. 2012).
As a further methodological consideration, ECoG can only partially cover the brain surface, with electrode arrays spaced 5–10 mm apart, but offers millisecond temporal resolution and relatively high signal sensitivity, whereas fMRI can typically record whole-brain activity at a voxel resolution of 3–5 mm and a temporal resolution of seconds with relatively low signal sensitivity. Group analyses in hemodynamic neuroimaging studies with smoothing kernels would be more likely to provide support for the overlapped model than direct electrophysiological recordings. It is also important to consider that the inter-electrode distance of the ECoG (5–10 mm in the current study) may limit the detectable size of the functional overlap. All we can test with the ECoG data is whether there is macroscopic channel-wise clustering or intercalation of the optimal category. It is impossible to examine functional microarchitectures at the columnar (submillimeter) level with the current experimental setup. Notably, the random mixture at the columnar level proposed in a previous work (Puce et al. 1999) cannot be ruled out by the present work. Taken together, since it is difficult to localize the source of neural activity in the vOT by fMRI or ECoG alone, the combination of fMRI and ECoG would be complementary and promising for thorough and accurate source localization.
Signature of Category-Specific Processing
Among many visual categories, faces and letterstrings, 2 essential elements of visual communication in humans, induced the most and the second-most category-selective visual responses, respectively, in the human vOT (Table 1, Supplementary Fig. S1). We found category selectivity primarily in high-γ band powers in most channels, but we occasionally found channels selective in lower frequency bands. Previous ECoG studies used N200 as a signature of category-specific processing (Allison, McCarthy, et al. 1994; Halgren et al. 1994; Allison et al. 1999; McCarthy et al. 1999; Puce et al. 1999). Other studies indicated categorical specificity in frequency-specific ERSP (Klopp et al. 1999; John et al. 2000). More recent ECoG studies addressed the consistency of selectivity across different frequency components. There has been mixed evidence as to the spatial independence of the face-γ ERSP and face-N200 (Vidal et al. 2010; Engell and McCarthy 2011). In our cases, selectivity was largely consistent between VEP and different frequency-band powers. Several factors might contribute to this apparent difference between the studies. First, electrode diameter may affect detectability in high-γ band powers. Electrodes with a diameter of 0.8 mm were used for the study emphasizing the spatial segregation across different frequency powers (Vidal et al. 2010), whereas electrodes with diameters of 2.2 and 2.0 mm were used for the studies reporting the co-localization. In the current study, the diameters were 1.5 and 3.0 mm (we mainly used 1.5 mm). Our impression was that 1.5-mm-diameter electrodes are more suitable for detecting γ oscillations, which may be consistent with the proposition that γ-band activity tends to be more focal than low-frequency LFP activity (Lindén et al. 2010; Crone et al. 2011). Second, the electrode configuration (surface vs. depth recording) might also be an important factor. 
Thus, it is possible that a finer organization based on localized oscillations will be preferentially found by depth electrodes with a smaller diameter. Third, the criteria for selectivity may also contribute to the discrepancy. Our analyses mainly focused on sharply selective channels according to conservative criteria. These criteria might emphasize the spatial consistency of the γ ERSP and N200 channels.
Functional Interactions Between Face- and Letterstring-Selective Regions
Mutual inhibition between the face-selective and word-selective channels was indicated by observations of category-specific reversals of event-related potentials (Allison et al. 2002), but whether the reversal of the potential indicates functional excitation/inhibition or simply reflects the topological relationship between the signal source and the electrodes remains unclear. Our high-density electrode configuration enabled us to detect multiple face- and letterstring-selective channels and thus to directly investigate the functional interaction between them. The Granger causality analysis (Fig. 8) suggested that the neural system for visual word perception in the vOT is functionally linked to the face perception machinery, although the link was unidirectional. Thus, the word-selective vOT region would receive local input from the face-selective vOT region, but not vice versa. One possible interpretation is that cross-category connectivity might represent interaction between anatomically neighboring zones that belong to different categories but to the same hierarchical level of visual processing (Vinckier et al. 2007; Freiwald and Tsao 2010). Specifically, face-selective channels might suppress letterstring-selective channels, but not vice versa (Allison et al. 2002). Alternatively, the eccentricity bias hypothesis (Levy et al. 2001; Hasson et al. 2002; Malach et al. 2002) links the positioning of face- and word-selective regions to a foveal bias common to word and face processing. Thus, this central visual-field bias organization would suggest a potential linking factor between face- and word-selective responses, which may help in understanding the interplay between them. Recent theories have proposed that literacy acquisition partially recycles pre-existing cortical systems that evolved for object recognition (Baker, Liu, et al. 2007; Dehaene and Cohen 2011). 
Indeed, the human brain has likely not evolved a dedicated mechanism for reading letters, because letter invention is too recent and culturally divergent to have influenced the human genomic blueprint. In agreement with this postulation, our findings indicate that a common neural substrate in the vOT, originally evolved for face recognition, can be shaped via reading experiences into anatomically alternated patchy modules that are specialized for the recognition of faces and visual words. These findings highlight an adaptive neural principle to represent newly acquired knowledge among existing salient categories in the cerebral association cortex.
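For readers unfamiliar with the directed-influence measure discussed in this section, the following is a minimal sketch of a pairwise, time-domain Granger causality index on synthetic signals. It is not the analysis pipeline of this study: the autoregressive order, the log variance-ratio index, the function names, and the simulated `driver` and `follower` signals are all assumptions made for illustration.

```python
import numpy as np

def lagged(sig, order):
    """Matrix whose columns are sig[t-1], ..., sig[t-order] for t >= order."""
    n = len(sig)
    return np.column_stack([sig[order - k : n - k] for k in range(1, order + 1)])

def residual_variance(target, predictors):
    """Variance of least-squares residuals of target on predictors plus intercept."""
    X = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.var(target - X @ beta)

def granger_index(source, sink, order=2):
    """log(var_restricted / var_full): how much the past of `source`
    improves prediction of `sink` beyond the past of `sink` itself."""
    source = np.asarray(source, dtype=float)
    sink = np.asarray(sink, dtype=float)
    y = sink[order:]
    own_past = lagged(sink, order)
    both_past = np.column_stack([own_past, lagged(source, order)])
    return np.log(residual_variance(y, own_past) / residual_variance(y, both_past))

# Synthetic example: `driver` influences `follower` with a one-sample delay,
# but not the other way around.
rng = np.random.default_rng(0)
n = 2000
driver = rng.standard_normal(n)
noise = rng.standard_normal(n)
follower = np.zeros(n)
for t in range(1, n):
    follower[t] = 0.5 * follower[t - 1] + 0.8 * driver[t - 1] + noise[t]

print("driver -> follower:", granger_index(driver, follower))
print("follower -> driver:", granger_index(follower, driver))
```

In this toy setting the index is clearly larger in the driver-to-follower direction, which is the kind of asymmetry the text describes between face-selective and letterstring-selective channels. A real analysis would additionally require model-order selection, stationarity checks, and statistical testing.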
This work was supported by Strategic Research Program for Brain Science from the MEXT of Japan to I.H. and Y.K., 2008 Specified Research grant from Takeda Science Foundation, Grant (A) from Hayao Nakayama Foundation for Science and Technology and Culture to I.H., Grant for Promotion of Niigata University Research Project (20A010, 23B008, 23H081) to I.H., Grant for Comprehensive Research on Disability, Health and Welfare (H23-Nervous and Muscular-General-003) from MHLW of Japan to K. Kawai, and JSPS KAKENHI (11J08024) to K.M.
Conflict of Interest: None declared.