## Abstract

Recognition of faces and written words is associated with category-specific brain activation in the ventral occipitotemporal cortex (vOT). However, the topological and functional relationships between face-selective and word-selective vOT regions remain unclear. In this study, we collected data from patients with intractable epilepsy who underwent high-density recording of surface field potentials in the vOT. Among the 24 visual categories tested, “faces” and “letterstrings” induced the most prominent category-selective responses, particularly in high-γ band power. Strikingly, within-hemispheric analysis revealed alternation of face-selective and letterstring-selective zones within the vOT. Two distinct face-selective zones were located in the anterior and posterior portions of the midfusiform sulcus, whereas letterstring-selective zones alternated between and outside of these 2 face-selective zones. Further, a classification analysis indicated that the activity patterns of these zones mostly represent their dedicated categories. Functional connectivity analysis using Granger causality indicated asymmetrically directed causal influences from face-selective to letterstring-selective regions. These results challenge the prevailing view that different categories are represented in distinct contiguous regions in the vOT.

## Introduction

Face recognition is essential for the social interactions of primates (Rolls 2010). In humans, literacy further enables visual communication across space and time. A key brain structure implicated in the recognition of both “faces” and “written words” is the ventral occipitotemporal cortex (vOT) (Allison, McCarthy, et al. 1994; Halgren et al. 1994; Puce et al. 1996). Specifically, functional magnetic resonance imaging (fMRI) studies have shown that particular visual categories, such as faces, places, bodies, and written words, induce hemodynamic responses in specific vOT regions (Grill-Spector and Malach 2004). The fusiform face area (FFA) is a right-dominant vOT region specialized for face recognition (Kanwisher and Yovel 2006). As an individual learns to read and write, another region within the occipitotemporal sulcus, the left-dominant “visual word form area (VWFA)”, becomes specifically activated by the visual presentation of words or letterstrings (Dehaene and Cohen 2011; Price 2012). Despite this hemispheric dominance, activation is often observed bilaterally for both categories (Puce et al. 1996; Kanwisher et al. 1997; Cohen et al. 2003). Electrocorticography (ECoG), the recording of brain-surface field potentials from chronically implanted subdural electrode arrays in patients with medically intractable epilepsy (Lachaux et al. 2003; Mukamel and Fried 2012), has also revealed bilateral event-related potentials selective for faces and letterstrings (Halgren et al. 1994; Nobre et al. 1994; Allison et al. 1999; McCarthy et al. 1999; Puce et al. 1999; Chan et al. 2011). Importantly, the within-hemispheric topological relationship between the face-selective and word-selective vOT regions remains unsettled. Three hypotheses have been proposed (Fig. 1). First, the face-selective and letterstring-selective vOT regions might each form a single distinct, contiguous area with no macroscopic spatial overlap between them (single-zone hypothesis). 
Second, multiple face-selective regions and multiple letterstring-selective regions might exist and alternate with each other within the vOT (multizone hypothesis). Third, face-responsive and letterstring-responsive regions might overlap macroscopically in the vOT (overlap hypothesis). Since the discovery of the FFA and the VWFA, the single-zone hypothesis has been the most widely accepted. However, the functional and regional specificity of the FFA and the VWFA has been challenged (Price and Devlin 2011; Weiner and Grill-Spector 2012). The overlap hypothesis emphasizes the importance of patterns of submaximal neural activity distributed across the vOT, which may encode faces and other object categories (Haxby et al. 2001). Moreover, both the FFA and the VWFA occupy large portions of the vOT along its anteroposterior axis. Recently, Weiner and coworkers reported that high-resolution fMRI revealed 2 separate (anterior and posterior) face-selective vOT patches near the midfusiform sulcus in individual hemispheres, with body-selective regions alternating with them (Weiner and Grill-Spector 2010; Parvizi et al. 2012). Likewise, successive activation peaks have sometimes been observed along the anteroposterior axis of the greater VWFA (Dehaene et al. 2002; Vinckier et al. 2007; Wandell 2011). These findings increase the plausibility of the multizone hypothesis. In general, however, population analyses of data acquired in different experiments make it difficult to distinguish among the single-zone, multizone, and overlap hypotheses.

Figure 1.

Three models of the spatial distribution of face- and letterstring-selective regions in the ventral occipitotemporal cortex (vOT). Single-zone (left), multizone (center), and overlap (right) models for face- (red), letterstring-selective (blue), and both face-/letterstring-selective channels are superimposed on the ventral view of a spatially normalized brain.


A pioneering ECoG study reported that face-selective and letterstring-selective recording channels occasionally abutted each other in individual hemispheres (Nobre et al. 1994). Other ECoG studies have found evidence consistent with the 2 face-selective vOT patches that Weiner and Grill-Spector (2012) identified using high-resolution fMRI, but such evidence is less common because of limited cortical coverage and electrode arrangement (e.g., strips vs. grids) (Allison et al. 1999; Parvizi et al. 2012).

The present study focused on within-hemispheric analyses of 6 hemispheres from 4 patients, drawn from a larger series (30 hemispheres from 20 patients). We analyzed data from patients who underwent high-density implantation of 18–46 intracranial electrodes in the vOT per hemisphere and in whom multiple face-selective channels and multiple letterstring-selective channels were identified. For each subject, ECoG responses to 24 visual categories, including faces and letterstrings, were recorded, and selectivity was defined by analysis of variance (ANOVA) corrected for multiple comparisons or, more conservatively, by the d′ value from signal detection theory (Green and Swets 1966). Our objective was to examine whether within-hemispheric analysis supported the single-zone, multizone, or overlap hypothesis. Empirically, the “single-zone” model would be supported if, in the vOT of each hemisphere, face-selective channels formed one contiguous cluster and letterstring-selective channels formed another, nonoverlapping cluster. The “multizone” model would be supported if face-selective and letterstring-selective channels each formed multiple nonoverlapping clusters in the vOT. The “overlap” model would be supported if face-responsive and letterstring-responsive channels overlapped. To test the overlap hypothesis directly, we also conducted a classification analysis that quantified how face and letterstring information was distributed in the letterstring-selective and face-selective channels, respectively. Another advantage of high-density recording is that it enables evaluation of connectivity among multiple sites within a hemisphere. We therefore compared Granger causality (Granger 1969) within and across face-selective and letterstring-selective channels to clarify the functional relationship between these 2 category-selective vOT regions.

## Materials and Methods

### Subjects

Written informed consent was obtained from 20 patients with pharmacologically intractable epilepsy, who were evaluated for possible surgical treatment at The University of Tokyo Hospital or Nishi-Niigata Chuo National Hospital. Experimental protocols were approved by the institutional review boards of both hospitals and Niigata University School of Medicine.

### Recordings

From 2009 to 2012, we implanted subdural electrodes in 20 patients (30 hemispheres in total) to detect epileptic foci. Of these, 2 patients (2 hemispheres) were excluded from analysis (Table 1) because only 1 category (letterstrings) was tested. To validate the category selectivity of each channel, we further excluded patients who could not perform a visual category judgment task or a one-back task during presentation of the 24 categories of visual stimuli. To examine whether face-selective and letterstring-selective channels were spatially separated, alternated, or overlapped, we also excluded hemispheres implanted with ≤15 electrodes in the vOT or in which multiple face-selective channels and multiple letterstring-selective channels (identified by ANOVA; see Category Selectivity section) did not coexist. Subdural ECoG electrode arrays were arranged in grids or strips (Unique Medical Co., Tokyo, Japan), each containing 4–20 electrodes. Electrode contacts were either 1.5 mm in diameter with 5-mm spacing or 3 mm in diameter with 10-mm spacing. The number and location of recording sites in the temporal, occipital, and frontal lobes were determined exclusively by clinical criteria. Signals were either referenced to a scalp electrode, filtered between 0.55 and 150 Hz, and sampled at 400 Hz (NicoletOne, CareFusion, San Diego, CA, USA), or referenced to an averaged intracranial reference placed outside the epileptic focus, filtered between 0.05 and 300 Hz, and sampled at 1 kHz (EEG1200, Nihon Kohden, Tokyo, Japan). All data were acquired during periods without epileptic seizures.

Table 1

Spatial distribution of ECoG electrodes and their visual response

Electrode location Frontal Parietal Temporal vOT Occipital
Number of channels 113 107 865 585 239
Number of visually responsive channels 15 139 371 113
d′ (category selectivity) in visually responsive channels (Mean ± SD) 0.35 ± 0.13 0.33 ± 0.11 0.40 ± 0.21 0.78 ± 0.59 0.59 ± 0.39

Note: Data from 1909 channels in 28 hemispheres.

### Stimulus Presentation

Subjects performed a one-back task or a category judgment task while color photographs of 120 objects from 24 categories (5 exemplars per category), including human faces and letterstrings, were presented in pseudorandom order on a 27-inch LCD monitor at a viewing distance of 57 cm. In the one-back task, the same stimulus was presented twice in succession on some trials; data from these repeated presentations were excluded to avoid underestimating visual responses because of repetition suppression. Each stimulus subtended 6° of visual angle and was presented for 300 ms, followed by a 900-ms interval.
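As a worked check of the stimulus geometry, the physical extent s of a stimulus subtending a visual angle θ at viewing distance D follows from basic trigonometry. With the values above (θ = 6°, D = 57 cm), and assuming the 6° refers to the full stimulus width:

$$
s = 2D\tan\left(\frac{\theta}{2}\right) = 2 \times 57\ \text{cm} \times \tan 3^\circ \approx 5.97\ \text{cm}
$$

At 57 cm, 1 cm on the screen corresponds to approximately 1° of visual angle, which is why this viewing distance is a common convention.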

### Electrode Localization

Preoperative three-dimensional (3D) T1-weighted magnetic resonance images (MRIs) of each subject's brain were obtained, consisting of 136 sequential 1.4-mm-thick axial slices with a resolution of 256 × 256 pixels in a 240-mm field of view. The MRIs were automatically registered to postoperative computed tomography (CT) scans to determine electrode positions, using a normalized mutual information method in AVIZO (Visualization Science Group, Bordeaux, France). The 3D brain surface was then reconstructed using Real INTAGE (Cybernet Systems, Ltd., Tokyo, Japan). For group-level presentation, the 3D brain image with electrode locations was normalized to Montreal Neurological Institute (MNI) coordinates via a linear scale adjustment in SPM8 (http://www.fil.ion.ucl.ac.uk/spm/software/spm8/, last accessed on 11/12/13) and then transformed into Talairach coordinates.
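The normalized mutual information (NMI) criterion used for MRI–CT registration can be illustrated with a short sketch. It assumes the Studholme definition NMI = (H(A) + H(B)) / H(A, B) and two volumes already resampled onto a common voxel grid; AVIZO's exact variant, its binning, and the optimization over rigid-body transforms are not described in the source and are omitted. The function name and bin count are hypothetical.

```python
import numpy as np


def normalized_mutual_information(a, b, bins=32):
    """Studholme-style NMI, (H(A) + H(B)) / H(A, B), from a joint histogram.

    Ranges from 1 (independent intensities) to 2 (intensities determine each
    other one-to-one). a and b must share the same voxel grid.
    """
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = joint / joint.sum()

    def entropy(p):
        p = p[p > 0]                 # treat 0 * log(0) as 0
        return -np.sum(p * np.log(p))

    return (entropy(p_ab.sum(axis=1)) + entropy(p_ab.sum(axis=0))) / entropy(p_ab)


# Toy check: a monotonic intensity remapping is maximally informative,
# whereas spatially scrambling one volume destroys the correspondence.
rng = np.random.default_rng(0)
mri = rng.normal(size=(32, 32, 32))
ct_aligned = 2.0 * mri + 1.0
ct_scrambled = rng.permutation(ct_aligned.ravel()).reshape(mri.shape)
nmi_aligned = normalized_mutual_information(mri, ct_aligned)
nmi_scrambled = normalized_mutual_information(mri, ct_scrambled)
```

A registration routine would search over transforms of one volume for the pose that maximizes this value; NMI is often preferred over raw mutual information because it is less sensitive to changes in image overlap.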

### Data Analysis

In-house MATLAB code (MathWorks, Natick, MA, USA), using the Statistics and Signal Processing Toolboxes, was used for data analysis (Matsuo et al. 2011; Toda et al. 2011). Signals acquired at 1 kHz were resampled offline to 400 Hz. A 50-Hz notch filter was applied in the analysis of visually evoked potentials (VEPs). VEPs were computed over a 1.2-s trial epoch (−600 to 600 ms relative to stimulus onset, 481 sampling points) and averaged across trials after baseline offset correction (−200 to 0 ms). Event-related spectral perturbation (ERSP) was calculated from the power spectrum over the trial epoch: each 1.2-s epoch was Hamming windowed using 250-ms sliding windows in 2.5-ms steps before Fourier transformation, and the averaged baseline power spectrum (−375 to −125 ms) was subtracted. The latency of the visual response was determined from the VEP or γ-band power as the first of 4 consecutive data points that deviated significantly (paired t-test, P < 0.05) from the baseline response (−250 to −50 ms).
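The ERSP computation described above can be sketched for a single trial. This is a minimal NumPy illustration under stated assumptions, not the authors' in-house MATLAB code: the epoch is taken to span −600 to 600 ms at 400 Hz (481 samples), power is expressed in dB before baseline subtraction, and the baseline is the mean over windows whose centers fall between −375 and −125 ms. The names `ersp_db` and `band_mean` are hypothetical. With a 100-sample (250-ms) window at 400 Hz, FFT bins fall on a 4-Hz grid, consistent with the band edges listed under Category Selectivity.

```python
import numpy as np

FS_HZ = 400.0      # sampling rate after offline resampling
WIN = 100          # 250-ms window at 400 Hz -> 4-Hz frequency resolution
STEP = 1           # 2.5-ms step = 1 sample at 400 Hz

# Band edges (Hz) as listed in the Category Selectivity section
BANDS = {"theta": (4, 4), "alpha": (8, 12), "beta": (16, 28),
         "gamma": (32, 64), "high_gamma": (68, 100)}


def ersp_db(epoch, t0_ms=-600.0, fs=FS_HZ, win=WIN, step=STEP,
            baseline_ms=(-375.0, -125.0)):
    """Baseline-subtracted sliding-window power (dB) for one trial epoch.

    Returns (centers_ms, freqs_hz, ersp) with ersp shaped (n_freqs, n_windows).
    """
    epoch = np.asarray(epoch, dtype=float)
    taper = np.hamming(win)
    starts = np.arange(0, epoch.size - win + 1, step)
    segments = np.stack([epoch[s:s + win] * taper for s in starts])
    power = np.abs(np.fft.rfft(segments, axis=1)) ** 2          # (windows, freqs)
    power_db = 10.0 * np.log10(power + np.finfo(float).tiny).T  # (freqs, windows)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    # Time of each window's center sample, relative to stimulus onset
    centers_ms = t0_ms + (starts + (win - 1) / 2.0) * 1000.0 / fs
    in_base = (centers_ms >= baseline_ms[0]) & (centers_ms <= baseline_ms[1])
    baseline = power_db[:, in_base].mean(axis=1, keepdims=True)
    return centers_ms, freqs, power_db - baseline


def band_mean(freqs, ersp, band):
    """Average ERSP over the frequency bins of one named band."""
    lo, hi = BANDS[band]
    return ersp[(freqs >= lo) & (freqs <= hi)].mean(axis=0)


# Synthetic check: white noise plus an 80-Hz burst from 0 to 300 ms
rng = np.random.default_rng(1)
t_ms = -600.0 + 2.5 * np.arange(481)
x = rng.normal(size=t_ms.size)
x += 20.0 * np.sin(2 * np.pi * 80.0 * t_ms / 1000.0) * ((t_ms >= 0) & (t_ms < 300))
centers, freqs, ersp = ersp_db(x)
hg = band_mean(freqs, ersp, "high_gamma")
hg_50_300 = hg[(centers >= 50) & (centers <= 300)].mean()
```

In practice the per-trial ERSPs would be averaged across trials before baseline comparison; the single-trial version keeps the sketch short.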

#### Category Selectivity

We assessed category selectivity using ANOVA and the d′ index, evaluating the ERSP of each channel between 50 and 300 ms after stimulus onset. First, ANOVA and post hoc t-tests with Bonferroni correction were used to compare the mean response to one category with the mean responses to the other categories. The α level for each comparison was set at 0.05 divided by the total number of comparisons (channels × frequency bands × categories). For individual-hemisphere analysis, we focused on hemispheres in which multiple face-selective channels and multiple letterstring-selective channels were simultaneously identified in the vOT (−80 < y < −30 and z < −10). Second, d′ (Green and Swets 1966) was computed as follows:

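$$d' = \frac{\mu_{\text{one category}} - \mu_{\text{other categories}}}{\sqrt{\left(\sigma_{\text{one category}}^{2} + \sigma_{\text{other categories}}^{2}\right)/2}}$$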
where μ and σ represent the mean and standard deviation (SD) of the ERSP. We calculated d′ for all categories in 5 frequency bands: θ, α, β, γ, and high-γ (4, 8–12, 16–28, 32–64, and 68–100 Hz, respectively). A channel was considered category-selective when d′ > 1 (Grill-Spector et al. 2006; Baker, Hutchison, et al. 2007). In both analyses (ANOVA and d′), a single channel could be judged selective for multiple categories; this avoids overestimating the sharpness of selectivity. We also confirmed the statistical reliability of d′ with a cross-validation procedure (Baker, Hutchison, et al. 2007): the preferred category of each channel was first determined as the category that yielded the maximal averaged response in odd-numbered blocks, and d′ between the preferred category and all other categories was then calculated from even-numbered blocks:
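$$d' = \frac{\mu_{\text{preferred category}} - \mu_{\text{nonpreferred categories}}}{\sqrt{\left(\sigma_{\text{preferred category}}^{2} + \sigma_{\text{nonpreferred categories}}^{2}\right)/2}}$$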

The procedure was then repeated with odd- and even-numbered blocks exchanged. A channel was considered category-selective if d′ > 1 in both directions.
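The two selectivity criteria can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical, the per-trial response is assumed to be the band-limited ERSP averaged over 50–300 ms, SDs use the unbiased (ddof=1) estimator, and the source does not specify what happens when the preferred category differs between the two splits (the sketch simply reports both).

```python
import numpy as np


def bonferroni_alpha(n_channels, n_bands, n_categories, alpha=0.05):
    """Per-comparison alpha: 0.05 divided by channels x bands x categories."""
    return alpha / (n_channels * n_bands * n_categories)


def dprime(resp_one, resp_other):
    """d' = (mu_one - mu_other) / sqrt((sd_one**2 + sd_other**2) / 2)."""
    a = np.asarray(resp_one, dtype=float)
    b = np.asarray(resp_other, dtype=float)
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)  # ddof=1 is an assumption
    return (a.mean() - b.mean()) / pooled


def split_half_dprime(resp, category, block):
    """Pick the preferred category on one half of the blocks, score d' on the other.

    resp: per-trial response of one channel in one band
    category, block: per-trial category label and block number
    Returns [(preferred, d'), (preferred, d')] for odd->even and even->odd.
    """
    resp, category, block = (np.asarray(v) for v in (resp, category, block))
    odd = block % 2 == 1
    out = []
    for select, test in ((odd, ~odd), (~odd, odd)):
        cats = np.unique(category[select])
        means = [resp[select & (category == c)].mean() for c in cats]
        preferred = cats[int(np.argmax(means))]
        is_pref = category == preferred
        out.append((preferred, dprime(resp[test & is_pref], resp[test & ~is_pref])))
    return out


def is_selective(split_results, threshold=1.0):
    """Selective only if d' exceeds the threshold in both split directions."""
    return all(d > threshold for _, d in split_results)


# Synthetic channel: 4 blocks x 24 categories x 5 exemplars; category 0 drives it
rng = np.random.default_rng(0)
category = np.tile(np.repeat(np.arange(24), 5), 4)
block = np.repeat(np.arange(1, 5), 120)
resp = rng.normal(size=category.size) + 3.0 * (category == 0)
results = split_half_dprime(resp, category, block)
```

Because the preferred category is chosen on data that are not used for scoring, a channel driven only by noise is unlikely to pass the d′ > 1 criterion in both directions.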

To examine the time course of category selectivity, d′ was calculated from the ERSP at every sampling point (2.5-ms steps). The latency of selectivity was defined as the time at which d′ first exceeded 1. Peak times were determined between 0 and 475 ms from the averaged time course of each electrode.
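The time-resolved d′ and its onset can be sketched as below. The sketch assumes per-trial ERSP time courses (trials × samples) for the preferred category and for all other categories, starts the onset search at stimulus onset, and takes the peak as the time of maximal d′; these choices and the function names are hypothetical, and the synthetic data are for illustration only.

```python
import numpy as np


def dprime_timecourse(resp_one, resp_other):
    """d' at each time sample; inputs are (trials, times) arrays."""
    m1, m2 = resp_one.mean(axis=0), resp_other.mean(axis=0)
    v1, v2 = resp_one.var(axis=0, ddof=1), resp_other.var(axis=0, ddof=1)
    return (m1 - m2) / np.sqrt((v1 + v2) / 2.0)


def selectivity_onset_and_peak(d, times_ms, threshold=1.0, peak_window=(0.0, 475.0)):
    """Onset: first post-stimulus sample with d' > threshold (NaN if none).
    Peak: time of the maximal d' within peak_window."""
    above = np.flatnonzero((d > threshold) & (times_ms >= 0))
    onset = times_ms[above[0]] if above.size else np.nan
    win = (times_ms >= peak_window[0]) & (times_ms <= peak_window[1])
    peak = times_ms[win][np.argmax(d[win])]
    return onset, peak


# Synthetic example: a Gaussian-shaped selectivity profile peaking at 200 ms
rng = np.random.default_rng(2)
times_ms = np.arange(-200.0, 600.0, 2.5)
profile = 3.0 * np.exp(-((times_ms - 200.0) / 60.0) ** 2)
face = rng.normal(size=(100, times_ms.size)) + profile
other = rng.normal(size=(200, times_ms.size))
onset, peak = selectivity_onset_and_peak(dprime_timecourse(face, other), times_ms)
```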

#### Classification Analysis

Using a neural decoding approach, we evaluated the category information encoded in a group of electrodes (face-selective channels or letterstring-selective channels). For each pair of object categories, we selected the trials in which images from either category were presented and used them to train and test a binary classifier that predicted the category of the presented image on a trial-by-trial basis (Kamitani and Tong 2005). This procedure was applied to all pairs of the 24 object categories. Each binary classifier was a linear support vector machine (Vapnik 1998) implemented with LIBSVM (Chang and Lin 2011). Input features were the ERSPs in the 50–300 ms window after stimulus onset for all electrodes of a group in each of the 5 frequency bands. To avoid arithmetic overflow and underflow, each feature was z-transformed using the mean and SD of the training dataset. To evaluate how well classification generalized across exemplars, we ensured that trials with the same visual stimulus were never included in both the training and test datasets (Vindiola and Wolmetz 2011). For each category pair, the exemplars of the 2 categories were divided into 5 groups, each containing 1 exemplar from each category, and the corresponding trials were divided accordingly. Four groups were used to train the classifier and the remaining group was used to test it. This procedure was repeated until trials from all 5 groups had been tested (5-fold cross-validation), and the percentage of correct classification was calculated.
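The exemplar-grouped cross-validation can be sketched as follows. To keep the example dependency-free, the linear SVM is a simple stand-in trained by full-batch subgradient descent on the L2-regularized hinge loss, not LIBSVM; its hyperparameters are arbitrary. Pairing exemplar k of one category with exemplar k of the other to form fold k is an assumption, as are all function names.

```python
import numpy as np


def train_linear_svm(X, y, lam=1e-2, n_iter=500, lr=0.1):
    """Linear SVM by full-batch subgradient descent on the L2-regularized
    hinge loss (a stand-in for LIBSVM). y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, n_iter + 1):
        active = y * (X @ w + b) < 1                  # margin violators
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)                        # decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def exemplar_cv_accuracy(X, is_cat_b, exemplar, n_folds=5):
    """Percent correct for one category pair with exemplar-grouped folds.

    X: (trials, features), e.g. channels x 5 bands of 50-300 ms ERSP
    is_cat_b: bool per trial (False = first category, True = second)
    exemplar: exemplar index 0..n_folds-1 within its own category; fold k
    holds out exemplar k of BOTH categories, so no image is in train and test.
    """
    y = np.where(is_cat_b, 1.0, -1.0)
    correct = 0
    for k in range(n_folds):
        test, train = exemplar == k, exemplar != k
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-12             # z-score with training stats only
        w, b = train_linear_svm((X[train] - mu) / sd, y[train])
        pred = np.sign(((X[test] - mu) / sd) @ w + b)
        correct += int(np.sum(pred == y[test]))
    return 100.0 * correct / y.size


# Synthetic pair: 2 categories x 5 exemplars x 10 repeats, 8 channels x 5 bands
rng = np.random.default_rng(3)
is_cat_b = np.repeat([False, True], 50)
exemplar = np.tile(np.repeat(np.arange(5), 10), 2)
X = rng.normal(size=(100, 40))
X[is_cat_b, :10] += 1.5                               # category B shifts 10 features
accuracy = exemplar_cv_accuracy(X, is_cat_b, exemplar)
```

For the full analysis, this would be repeated for each of the 276 category pairs (24 × 23 / 2) and for each electrode group.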

#### Time Course of VEP and ERSP Responses for Face- and Letterstring-Stimuli

Visual response latencies were calculated from both the VEP and the ERSP in each trial. Latency was defined as the first time point at which the response exceeded the mean ± 2.5 SD of the prestimulus response for 4 consecutive sampling points. Peak times of the VEP and high-γ activity were determined between 0 and 475 ms from the averaged waveform of each electrode. Latencies and peak times for face and letterstring stimuli from the 6 hemispheres were compared using the Mann–Whitney U-test.
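The threshold-based latency and peak-time definitions can be sketched as follows. Assumptions not stated in the source: the prestimulus period is every sample before 0 ms, a response counts as exceeding the mean ± 2.5 SD when it leaves that band in either direction, and latency is the time of the first of the 4 consecutive samples. Function names are hypothetical.

```python
import numpy as np


def onset_latency(x, times_ms, k_sd=2.5, n_consec=4):
    """First post-stimulus time at which x leaves baseline mean +/- k_sd * SD
    for n_consec consecutive samples (NaN if it never does).
    The baseline is every sample before 0 ms (an assumption)."""
    x = np.asarray(x, dtype=float)
    base = x[times_ms < 0]
    mu, sd = base.mean(), base.std(ddof=1)
    outside = (np.abs(x - mu) > k_sd * sd) & (times_ms >= 0)
    run = 0
    for i, flag in enumerate(outside):
        run = run + 1 if flag else 0
        if run == n_consec:
            return times_ms[i - n_consec + 1]   # first sample of the run
    return np.nan


def peak_time(x, times_ms, window=(0.0, 475.0), absolute=False):
    """Time of the maximum (or of maximal |x|, e.g. for biphasic VEPs) in window."""
    x = np.asarray(x, dtype=float)
    in_win = (times_ms >= window[0]) & (times_ms <= window[1])
    seg = np.abs(x[in_win]) if absolute else x[in_win]
    return times_ms[in_win][np.argmax(seg)]


# Synthetic averaged high-gamma trace peaking near 250 ms
rng = np.random.default_rng(4)
times_ms = np.arange(-200.0, 600.0, 2.5)
trace = 0.1 * rng.normal(size=times_ms.size) + 3.0 * np.exp(-((times_ms - 250.0) / 80.0) ** 2)
latency = onset_latency(trace, times_ms)
peak = peak_time(trace, times_ms)
```

Requiring several consecutive supra-threshold samples guards against isolated noise excursions being mistaken for a response onset.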

#### Spatial Distribution of Face-Selective Channels and Letterstring-Selective Channels

Individual brains were transformed into a spatially normalized brain using SPM8 and all the face- and letterstring-selective vOT channels were plotted in the Talairach coordinates.

#### Functional Connectivity Between Channels

The eConnectome MATLAB software (http://econnectome.umn.edu, last accessed on 11/12/13) was used to investigate directional interactions (He et al. 2011), or Granger causality, between face-selective and letterstring-selective channels. Granger causality tests the statistical hypothesis that one time series is useful in forecasting another (Seth 2007): a variable X1 “Granger-causes” a variable X2 if information in the past of X1 predicts the future of X2 more accurately than information in the past of X2 alone. The adaptive directed transfer function (ADTF) estimates directional causal interactions in the spectral domain (Wilke et al. 2008). The ADTF in the high-γ band was calculated from ECoG signals evoked by face and letterstring stimuli and averaged between 50 and 300 ms after stimulus onset. To test whether interaction strength depended on the direction of each channel combination (within and across face- and letterstring-selective channels), ADTFs from the 6 hemispheres with multiple face- and letterstring-selective channels were compared using the Kruskal–Wallis test and Mann–Whitney U-tests with Bonferroni correction.
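The Granger definition above can be made concrete with a minimal pairwise, time-domain sketch. This is not the ADTF used in the study: eConnectome's ADTF is a multivariate, time-varying spectral measure, whereas this sketch fits ordinary least-squares autoregressive models of an assumed order p = 5 to two stationary series and reports the log ratio of residual variances (restricted model using only the target's past vs. full model also using the source's past). Function names are hypothetical.

```python
import numpy as np


def _lags(x, p):
    """Design columns x[t-1], ..., x[t-p] for t = p, ..., len(x) - 1."""
    return np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])


def granger_causality(x_from, x_to, p=5):
    """Pairwise time-domain Granger causality from x_from to x_to:
    ln(var of residuals using x_to's past only / var using both pasts).
    Values near 0 mean x_from's past adds no predictive information."""
    x_from = np.asarray(x_from, dtype=float)
    x_to = np.asarray(x_to, dtype=float)
    target = x_to[p:]
    restricted = np.column_stack([np.ones(target.size), _lags(x_to, p)])
    full = np.column_stack([restricted, _lags(x_from, p)])

    def residual_var(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ coef)

    return np.log(residual_var(restricted) / residual_var(full))


# Synthetic pair in which x1 drives x2 with a one-sample delay
rng = np.random.default_rng(5)
n = 2000
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
x2 = np.zeros(n)
for t in range(1, n):
    x2[t] = 0.5 * x2[t - 1] + 0.8 * x1[t - 1] + noise[t]
gc_1_to_2 = granger_causality(x1, x2)
gc_2_to_1 = granger_causality(x2, x1)
```

The asymmetry between the two directions (large 1→2, near-zero 2→1) is the kind of directed influence the connectivity analysis looks for between face-selective and letterstring-selective channels.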

## Results

Recordings from a total of 1909 electrodes in 28 hemispheres were analyzed. Most category-selective channels were located in the vOT (Table 1). None of the category-selective channels were located in areas with epileptiform abnormalities.

### Topological Alternation of Face- and Letterstring-Selective Zones in the vOT

In the bilateral vOT (−80 < y < −30 and z < −10), face-selective and letterstring-selective channels (Fig. 2) were predominantly located in the fusiform gyrus (Fig. 3, Table 2). To examine whether face-selective and letterstring-selective channels were spatially separated, alternated, or overlapped (Fig. 1), we screened the data from all 28 hemispheres in 18 patients: 10 hemispheres were excluded because only a limited set of stimulus categories was tested, 7 because of an insufficient number of vOT channels, and 5 because multiple face- and letterstring-selective channels were not found within the hemisphere. Six hemispheres (3 left and 3 right) from 4 patients (15–48 years old, 3 males, 4 right-handed) fulfilled the criteria. Two patients underwent bilateral implantation of the electrode arrays and 2 underwent unilateral implantation. Each hemisphere contained 18–46 vOT electrodes, for a total of 207 recording sites. Quantitative measures of the response amplitude and selectivity of each electrode revealed topological alternation of face-selective and letterstring-selective channels (Fig. 3). We defined topological alternation as one or more face-selective (letterstring-selective) channels sandwiched between 2 letterstring-selective (face-selective) channels. Alternation was observed in 4 of the 6 hemispheres (Fig. 3A–C,E, black-framed insets). In the other 2 hemispheres, where cortical coverage was restricted (see Discussion), alternation was not observed after statistical thresholding (Fig. 3D,F, black-framed insets) but was apparent in the selectivity measures (Fig. 3D,F, pie charts). To clarify the anatomical location of the face-selective and letterstring-selective channels relative to the cerebral sulci, we plotted them on each individual brain (Fig. 3). 
Although the locations of implanted electrodes varied across subjects, 2 distinct face-selective zones were found in the anterior and posterior portions of the midfusiform sulcus in both left and right hemispheres. The anterior face zone was observed in 5 of the 6 hemispheres, and the posterior face zone in 4 of the 6. Alternating with these face-selective zones were 3 letterstring-selective zones: one anterior to the anterior end of the midfusiform sulcus, near the collateral sulcus; another in the anterior portion of the midfusiform sulcus, just behind the anterior face zone; and a third in the posterior portion of the midfusiform sulcus. Taken together, the face- and letterstring-selective zones appeared to alternate and partially overlap, as if forming a multilayered stripe pattern between the collateral sulcus and the occipitotemporal sulcus (Fig. 1, center).
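The operational definition of alternation used above can be expressed as a short check. This sketch assumes channels along one electrode strip or grid row ordered along a single axis (e.g., anterior to posterior), skips nonselective channels and channels selective for both categories when testing adjacency, and reuses the vOT coordinate window from this section. These choices and the function names are hypothetical; the source does not state how 2-D grid neighborhoods were handled.

```python
import numpy as np


def in_vot(xyz_mm):
    """Mask of channels inside the vOT window (-80 < y < -30, z < -10), Talairach mm."""
    xyz = np.atleast_2d(np.asarray(xyz_mm, dtype=float))
    return (xyz[:, 1] > -80) & (xyz[:, 1] < -30) & (xyz[:, 2] < -10)


def has_alternation(labels):
    """True if some run of 'F' (face) or 'L' (letterstring) channels is
    sandwiched between channels of the other category.

    labels: per-channel labels ordered along one strip or grid row.
    Any other label (nonselective, or selective for both categories) is
    skipped, which is an assumption of this sketch.
    """
    seq = [c for c in labels if c in ("F", "L")]
    runs = [c for i, c in enumerate(seq) if i == 0 or c != seq[i - 1]]
    # With only two labels, >= 3 runs means a middle run is flanked by the other label
    return len(runs) >= 3


print(in_vot([[31, -65, -24], [30, -90, -20], [36, -54, 0]]))  # [ True False False]
print(has_alternation(["L", "-", "F", "F", "-", "L"]))           # True
print(has_alternation(["F", "F", "L", "L"]))                     # False
```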

Table 2

Category-selective ECoG channels in vOT

Category Number of selective channels Number of selective channels in each frequency band

Mean d′
θ α β γ High-γ
Face 24 12 23 1.48
Letterstring 20 19 1.38
Body part 1.22
House 1.20
Pattern 1.18
Building 1.18
Insect 1.12
Animal 1.09
Jewelry 1.06
Clothing 1.06
Beverage 1.05
Food 1.03
Vehicle 1.02
Flower 1.01
Fruit 1.01
Number of selective channels in each frequency band 12 10 17 58
Mean d′ 1.31 1.14 1.23 1.35 1.45

Note: Data from 207 channels in 6 hemispheres.

Figure 2.

Category-selective responses for face and letterstring stimuli. (A) Visually evoked potentials (VEPs) and total electrocorticography (ECoG) power selective to the face category. Upper, traces represent VEPs from different exemplars of the face (red) and building (black) categories. Shaded areas denote stimulus presentation periods. Lower, total ECoG power at the same channel is plotted against stimulus category. The 5 successive bars represent evoked ECoG power for the 5 exemplars of each category. Error bars indicate standard deviations (SDs). Color bars under the images denote category labels as in the VEPs. (B) Selectivity of another channel to the letterstring category. Formats are the same as in (A).


Figure 3.

Alternation of highly face-selective and letterstring-selective ECoG channels in each individual hemisphere. (A–F) Left, channels significantly selective to faces (red circles), letterstrings (blue circles), or both (quarter red/blue circles) in the vOT (−80 < y < −30 and z < −10), shown on ventral views of the brain in 6 hemispheres from 4 patients. White circles indicate channels nonselective for these 2 categories, and gray circles indicate channels outside the vOT. Blue, green, and red lines indicate the occipitotemporal, midfusiform, and collateral sulci, respectively. Right, each pie chart depicts the relative F-value, evaluated by ANOVA, for each stimulus category during the 50–300 ms window after stimulus onset. The diameter of each circle reflects the total power of the channel.


Faces and letterstrings elicited the most and second-most category-selective neural responses, respectively, among the 24 visual categories tested (Table 2, Fig. 2, Supplementary Figs S1 and S3). At typical recording sites in the right vOT [(x, y, z) = (31, −65, −24) and (36, −54, −27) in Talairach coordinates for Fig. 2A,B, respectively], VEPs and total ERSP power were sharply selective for the “face” and “letterstring” categories (Fig. 2A,B). Across the 6 hemispheres, 64 channels exhibited significant category selectivity. Most of them (52 of 64) were selective for a single category; 46 of the 64 channels were selective in only one of the θ, α, β, γ, and high-γ bands, whereas 18 were selective across multiple bands. Category selectivity of the ECoG signals was most prominent in high-γ band power. In addition to ANOVA with familywise error correction, we estimated category selectivity by d′ analysis (Fig. 4A), with a threshold of d′ > 1 in θ, α, β, γ, or high-γ band power (Supplementary Fig. S1, Table 2). The d′ analysis identified fewer face- and letterstring-selective channels than ANOVA with familywise error correction (Supplementary Fig. S1) and was thus more conservative. Nonetheless, alternation of face- and letterstring-selective channels was still observed with d′ analysis (Fig. 4A) in all 3 hemispheres in which multiple face-selective and letterstring-selective channels were found. Only a few channels (1 of 8, 0 of 12, and 1 of 9 in the 3 hemispheres) were selective for both categories. The alternation persisted in a cross-validation analysis (Supplementary Fig. S2) in which independent sets of data were used to define the preferred category and to quantify category selectivity. 
The channel-wise preference for the face or letterstring category emerged during the visual response and was maintained throughout the visual presentation period (Fig. 4B).

Figure 4.

Alternation of d′-defined face-selective and letterstring-selective channels in the vOT. (A) Channels significantly selective to faces, letterstrings, or both categories as defined by d′ analysis in the vOT (−80 < y < −30 and z < −10), evaluated at d′ > 1. Formats are as in Figure 3 insets. Event-related spectral perturbations (ERSPs) evoked by the face and letterstring categories are shown as time-frequency plots. Gray bars indicate stimulus presentation periods. (B) Time courses of high-γ band power of the ERSP in response to faces (outer circles) and letterstrings (inner circles) in one subject. Channel-wise category selectivity was maintained throughout the visual presentation period. Note: Red/blue color in (A) indicates that the face/letterstring response is significantly greater than responses to other categories. In contrast, the color scale in (B) indicates high-γ power evoked by face/letterstring stimuli irrespective of responsiveness to the other 22 categories.


The stereotaxic coordinates of the face-selective channels in the vOT plotted on the spatially normalized brain (Fig. 5A) were consistent with the right-dominant, bilateral face activations reported in prior fMRI studies (Puce et al. 1996; Kanwisher et al. 1997; Grill-Spector et al. 2004; Barton et al. 2009; Haist et al. 2010; Mei et al. 2010) and with face-related VEPs recorded bilaterally by ECoG (Allison et al. 1999, 2002; Parvizi et al. 2012). The averaged ERSPs of face-selective and letterstring-selective channels showed clear visual selectivity (Fig. 5B). The ERSP response to faces, averaged across all face-selective channels, was larger and longer-lasting than the response to letterstrings averaged across all letterstring-selective channels. A scatter-plot analysis indicated that most individual channels were selective for face or letterstring stimuli in high-γ band power (Fig. 5C). Visual response latencies of the VEP and high-γ power did not differ significantly between face-selective and letterstring-selective channels (Fig. 6A,B). The peak response time in high-γ power was later for face-selective than for letterstring-selective channels (326 ± 35 vs. 275 ± 73 ms; median ± interquartile range [IQR]; P = 0.021), whereas no significant difference was observed in the VEP (199 ± 10 ms for face channels and 205 ± 19 ms for letterstring channels; P = 0.26) (Fig. 6A,C). The d′ for faces rose 17 ms earlier than that for letterstrings (144 ± 33 vs. 161 ± 28 ms; P = 0.048) (Fig. 6A,B), although the peak time of d′ did not differ between face- and letterstring-selective channels (293 ± 75 vs. 278 ± 36 ms; P = 0.34) (Fig. 6A,C). Thus, although face-selective and letterstring-selective regions appeared to overlap macroscopically in the vOT, the 2 populations exhibited distinct functional properties.

Figure 5.

Topological mixture and functional differentiation of face- and letterstring-selective channels. (A) Ventral view of a spatially normalized brain showing the entire distribution of electrodes (circles) in all 6 patients. Symbols are the same as in Figure 3. Horizontal and vertical axes denote mediolateral (x) and anteroposterior (y) coordinates, respectively, in the standard brain atlas (Talairach and Tournoux 1988). The dorsoventral axis (z) is collapsed. (B) ECoG responses to face (left) and letterstring (right) stimuli in face-selective (upper) and letterstring-selective (lower) channels. In each panel, the VEP is shown below the ERSP. (C) Scattergram of high-γ power in response to face and letterstring stimuli in each category-selective channel. Red and blue circles represent face-selective and letterstring-selective channels, respectively.

Figure 6.

Temporal characteristics of category-selective responses. (A) Time course of high-γ activity and d′ in face-selective channels (red) and letterstring-selective channels (blue). Shaded areas denote stimulus presentation periods. Error bars indicate SDs. (B) Box plots of the visual response latency in face-selective and letterstring-selective channels for the VEP (left), high-γ activity (center), and d′ (right). Each box spans the lower (q1) to the upper (q3) quartile. The median is depicted by the central red bar. The whiskers indicate the range, which includes all data except outliers. Data points above q3 + 1.5 × (q3 − q1) or below q1 − 1.5 × (q3 − q1) were treated as outliers and plotted individually as red dots. The asterisk indicates the statistical significance level evaluated by Mann–Whitney U-tests (*P = 0.04). (C) Box plots of the peak time in the VEP (left), high-γ activity (center), and d′ (right). Formats are the same as in (B) (**P = 0.01).
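
This caption uses an outlier rule (Tukey's fences at 1.5 × IQR), and the Results use Mann–Whitney U-tests with Bonferroni correction throughout. Both can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' software. Quartiles follow the `statistics.quantiles` inclusive (linear-interpolation) convention. The U-test uses the large-sample normal approximation with a tie correction but no continuity correction, so p-values for small samples may differ from those of exact or continuity-corrected implementations. The Bonferroni step multiplies each p-value by the number of comparisons and caps the result at 1.

```python
import math
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Values outside [q1 - k*IQR, q3 + k*IQR], i.e. the points drawn as dots."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1        # average 1-based rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:                            # every value tied: no evidence either way
        return u1, 1.0
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail


def bonferroni(pvals):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]
```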

### Category Information in Distributed Activity Patterns

Based on ECoG responses recorded from face-selective channels (Fig. 3), pairwise classification performance for faces was much higher (92 ± 9.5%, median ± IQR) than that for letterstrings (60 ± 21%) (P < 9.6 × 10⁻³⁸, Mann–Whitney U-test with Bonferroni correction) and for other categories (57 ± 14%) (P < 6.1 × 10⁻⁷⁴) (Fig. 7A left, B left). Similarly, based on ECoG responses from letterstring-selective channels, classification performance for letterstrings was much higher (89 ± 13%) than that for faces (61 ± 15%) (P < 1.4 × 10⁻³⁸) and for other categories (61 ± 17%) (P < 3.5 × 10⁻⁶⁶) (Fig. 7A right, B right). Thus, although the representations of a few suboptimal categories may overlap, most suboptimal categories do not overlap, as judged by above-chance classification performance.
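
The pairwise classification above relies on a linear support vector machine. The reference list includes LIBSVM (Chang and Lin 2011), which the original analysis presumably used. The sketch below is not LIBSVM. It swaps in a Pegasos-style stochastic subgradient solver for the same hinge-loss, L2-regularized linear SVM objective and handles the bias by appending a constant feature. Each pairwise accuracy is estimated by 5-fold cross-validation. The following are all assumptions for illustration: the feature layout (for example, high-γ power per channel and time bin, flattened per trial), the regularization strength, the fold count, and all function names.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X : (n, d) features (last column constant, acting as bias); y : labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            active = y[i] * (X[i] @ w) < 1        # hinge loss is nonzero
            w *= 1.0 - eta * lam                  # gradient of the L2 penalty
            if active:
                w += eta * y[i] * X[i]            # subgradient of the hinge loss
    return w


def _design(X, mu, sd):
    """Standardize with training statistics and append a constant bias column."""
    return np.c_[(X - mu) / sd, np.ones(len(X))]


def pairwise_accuracy(Xa, Xb, n_folds=5, seed=0):
    """Cross-validated accuracy for discriminating category a from category b."""
    X = np.vstack([Xa, Xb])
    y = np.r_[np.ones(len(Xa)), -np.ones(len(Xb))]
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    correct = 0
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        w = train_linear_svm(_design(X[train], mu, sd), y[train], seed=seed)
        pred = np.sign(_design(X[test], mu, sd) @ w)
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)


def pairwise_matrix(responses):
    """Symmetric matrix of pairwise accuracies; responses[c] is (n_trials, n_features)."""
    c = len(responses)
    acc = np.full((c, c), np.nan)
    for i in range(c):
        for j in range(i + 1, c):
            acc[i, j] = acc[j, i] = pairwise_accuracy(responses[i], responses[j])
    return acc
```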

Figure 7.

Category classification performance determined by a linear support vector machine. (A) Pairwise classification performance for one subject using face-selective channels (left) and letterstring-selective channels (right). Red and blue rectangles denote the face and letterstring categories, respectively. (B) Box plots of averaged classification performance in face- and letterstring-selective channels of all 6 hemispheres. Formats are the same as in Figure 6B. Broken lines indicate chance level (gray) and the threshold of significance (green, P = 0.01).
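
The caption does not state how the P = 0.01 threshold for classification accuracy was derived. One common approach is assumed here for illustration only. It treats the number of correctly classified test trials as binomial under 50% chance. The threshold is then the smallest accuracy whose one-sided upper-tail probability is at most α. The function name is hypothetical.

```python
from math import comb


def accuracy_threshold(n_trials, alpha=0.01, chance=0.5):
    """Smallest accuracy k/n with one-sided binomial tail P(X >= k) <= alpha.

    Summing from the top keeps the tail monotone; float conversion of comb()
    limits this to roughly n_trials <= 1000.
    """
    tail, k_min = 0.0, None
    for k in range(n_trials, -1, -1):
        tail += comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
        if tail > alpha:
            break
        k_min = k
    return None if k_min is None else k_min / n_trials
```

With more test trials, the threshold moves closer to chance. A fixed green line therefore implies a fixed test-set size or a per-comparison threshold.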

### Functional Connectivity Between Face- and Letterstring-Selective Channels

Functional differentiation between face- and letterstring-selective channels was also suggested by the nonuniform distribution of Granger causality among the recording channels (Fig. 8A). Granger causality within face-selective channels (0.068 ± 0.021, median ± IQR) was significantly higher than within letterstring-selective channels (0.059 ± 0.020) (P < 0.001, Mann–Whitney U-test with Bonferroni correction). Moreover, across-domain connectivity was asymmetrically directed from face channels to letterstring channels (from face to letterstring channels, 0.075 ± 0.024; from letterstring to face channels, 0.048 ± 0.017; P < 5.7 × 10⁻⁵) (Fig. 8B). Suppose face-selective and letterstring-selective channels were driven by a common visual input that reached the letterstring channels with a significant lag. That lagged common input could then produce an apparent, spurious influence directed from face to letterstring channels. However, we found no significant difference in visual response latency between letterstring-selective and face-selective channels for either the VEP (face-selective channels, 123 ± 138 ms; letterstring-selective channels, 118 ± 130 ms, median ± IQR) or high-γ activity (face-selective channels, 115 ± 130 ms; letterstring-selective channels, 113 ± 128 ms) (Mann–Whitney U-test, P = 0.12 for VEP, P = 0.32 for high-γ activity) (Fig. 6B). These results rule out a lagged common input as a major contributor to the Granger causality results. We also conducted a permutation test by randomly assigning face or letterstring labels to the channel population. After permutation, no significant directional ordering of Granger causality was found, which excludes the possibility that the observed asymmetry arose by chance.
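
ADTF, the measure shown in Figure 8, is a multivariate, frequency-domain, time-varying estimator built on adaptive autoregressive models. It shares its core idea with Granger (1969): x “Granger-causes” y when the past of x improves prediction of y beyond what y's own past provides. To illustrate that idea more simply, the sketch below computes bivariate time-domain Granger causality, ln(var_restricted / var_full), from least-squares autoregressive fits. This is a different estimator from the one used here. The model order p = 5 and the function names are assumptions. The matrix follows the Figure 8A convention of rows as destinations and columns as sources.

```python
import numpy as np


def _lags(series, p):
    """Columns y[t-1..t-p] for every row of `series` (n_vars, T), for t = p..T-1."""
    T = series.shape[1]
    return np.column_stack([series[v, p - lag:T - lag]
                            for v in range(series.shape[0])
                            for lag in range(1, p + 1)])


def _residual_var(design, target):
    """Residual variance of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)


def granger_xy(x, y, p=5):
    """Time-domain Granger causality from x to y: ln(var_restricted / var_full)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    target = y[p:]
    const = np.ones((target.size, 1))
    restricted = np.hstack([_lags(y[None, :], p), const])   # y's own past only
    full = np.hstack([_lags(np.vstack([y, x]), p), const])  # plus x's past
    return np.log(_residual_var(restricted, target) / _residual_var(full, target))


def granger_matrix(data, p=5):
    """G[dst, src] for all channel pairs of data (n_channels, T)."""
    n = data.shape[0]
    G = np.zeros((n, n))
    for src in range(n):
        for dst in range(n):
            if src != dst:
                G[dst, src] = granger_xy(data[src], data[dst], p)
    return G
```

For real ECoG epochs, estimates like these depend on the model order and on whether the evoked response is removed first. Those preprocessing choices lie outside this sketch.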

Figure 8.

Granger causality within and between face- and letterstring-selective channels. (A) Matrix showing Granger causality defined as adaptive-directed transfer function (ADTF) for all combinations of face- and letterstring-selective channels in a left hemisphere. The columns indicate source (output) channels, while the rows indicate destination (input) channels. Local information flow was face-dominant and asymmetric. Granger causality was stronger within face-selective channels than within letterstring-selective channels. Also, face channels sent stronger outputs to letterstring channels than to themselves, whereas letterstring channels sent weaker outputs to face channels than to themselves. Color bar indicates ADTF value. (B) Box plots of ADTF from letterstring to face channels (L-F), ADTF within letterstring channels (L-L), ADTF within face channels (F-F), and ADTF from face to letterstring channels (F-L) in all tested hemispheres. A significant difference between conditions was found (P < 2.2 × 10⁻¹⁶, Kruskal–Wallis test). Asterisks indicate statistical significance levels evaluated by post hoc Mann–Whitney U-tests with Bonferroni correction (*P = 0.03, **P = 0.001, ***P = 5.7 × 10⁻⁵).
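
The label-permutation control described in the text can be sketched as follows. The sketch assumes a connectivity matrix G indexed as in panel A (rows are destinations, columns are sources) and a Boolean vector marking face-selective channels. The test statistic is the mean face→letterstring influence minus the mean letterstring→face influence, and the p-value is two-sided with an add-one correction. Both are illustrative choices, because the section does not specify the exact statistic or number of permutations used. The function names are hypothetical.

```python
import numpy as np


def asymmetry(G, is_face):
    """Mean F->L minus mean L->F influence; G[dst, src] as in Figure 8A."""
    f, l = np.flatnonzero(is_face), np.flatnonzero(~is_face)
    return G[np.ix_(l, f)].mean() - G[np.ix_(f, l)].mean()


def permutation_p(G, is_face, n_perm=5000, seed=0):
    """Observed asymmetry and a two-sided permutation p-value (add-one corrected)."""
    rng = np.random.default_rng(seed)
    observed = asymmetry(G, is_face)
    null = np.array([asymmetry(G, rng.permutation(is_face)) for _ in range(n_perm)])
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p
```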

## Discussion

We presented rare high-density electrocorticographic recordings from the human vOT. Our study is consistent with prior work reporting the coexistence of face-selective and letterstring-selective channels in the vOT. It also shows that the stimulus-selective electrodes cluster together and form an alternating organization. To the best of our knowledge, no previous intracranial recording study has described multiple face-selective channels coexisting with multiple letterstring-selective channels within an individual hemisphere (Allison, Ginter, et al. 1994; Allison, McCarthy, et al. 1994; Halgren et al. 1994; Nobre et al. 1994; Allison et al. 1999; McCarthy et al. 1999; Puce et al. 1999; Tsuchiya et al. 2008; Fisch et al. 2009; Liu et al. 2009; Vidal et al. 2010; Chan et al. 2011). This rare case series provided significant insight into the functional organization of the vOT. Specifically, it bears on the controversy over whether the face-selective and letterstring-selective vOT regions are organized as “single-zone,” “multizone,” or “overlap.” Among the 3 hypotheses, our results support the multizone hypothesis. Granger causality analysis of simultaneously identified multiple face- and letterstring-selective regions revealed asymmetrically directed connectivity from face-selective to letterstring-selective regions. These results challenge the prevailing view that different categories are represented in distinct contiguous regions in the vOT. Instead, they support a novel view in which alternating zones selective to faces and words lie in the vOT, with an asymmetric functional link between them.

### Alternation of Face- and Letterstring-Selective Regions in the Human vOT

Many fMRI and ECoG studies assume that the face-responsive and word-form-responsive vOT regions are anatomically separated, although their relative position remains unclear. The pioneering studies described face- and letterstring-selective ECoG channels as “intercalated” (Allison, Ginter, et al. 1994; Nobre et al. 1994). However, “intercalation” in these initial papers referred to population analyses in which face-responsive and letterstring-responsive regions were located side by side or formed pieces of a mosaic of various category-selective regions in the vOT. This is quite different from the definition of alternation in the present paper (i.e., a sandwich structure of face-letterstring-face channels or letterstring-face-letterstring channels). In these early papers, individual subjects typically had a single ECoG strip in the vOT, with several electrode contacts at 10-mm interelectrode spacing (e.g., Figs 2–4A of Nobre et al. 1994). A subsequent study (Fig. 8 of Puce et al. 1999) showed how face- and letterstring-related ERPs were organized in a single individual. Unfortunately, because only one letterstring-selective channel was identified in this patient, it was impossible to distinguish between the single-zone and multizone hypotheses. Puce et al. (1999) proposed a “random mixture column” model (their Fig. 10A), in which face-selective columns are randomly intermixed with letterstring-selective or other category-selective columns. They contrasted it with a “segregated category-specific columns” model (their Fig. 10B), in which multiple columns with similar category selectivity assemble to form a functionally homogeneous cluster. On the basis of the sharp channel-wise specificity of the ECoG responses, they rejected the intercalation model at the columnar grouping level.
At the level of the whole vOT organization, however, the same authors argued that “human face-selective patches of cortex are on average 12–16 mm wide and 15–35 mm long, depending on whether face-selective region is unitary or broken into 2 patches.” It has therefore been difficult for ECoG studies to determine which of the single-zone, multizone, or overlap hypotheses is most likely. The number and density of electrodes implanted in individual hemispheres are determined solely by clinical criteria and are not always sufficient to examine category selectivity systematically across the whole vOT.

By focusing on cases with high-density electrodes in the vOT, we found that highly face-selective and letterstring-selective regions alternated in the vOT in 4 of the 6 hemispheres examined. Importantly, the remaining 2 cases did not directly support either the single-zone or the overlap hypothesis. In 1 of these 2 hemispheres, the result may support the multizone hypothesis with a proviso. In this case, the posterior part of the vOT was not covered. However, given the location of the conventional VWFA (Dehaene et al. 2002; Vinckier et al. 2007), this patient likely has another letterstring-selective zone posterolateral to the face-selective zone, which would favor the multizone hypothesis. In the remaining hemisphere, fewer face- and letterstring-selective channels were detected than in the other hemispheres. This was presumably because the electrode diameter in this patient was 3 mm instead of 1.5 mm, which may affect detectability in high-γ band power (see the next section). The alternation was still observed in 3 of 3 hemispheres in which multiple face-selective and letterstring-selective regions were defined by d′ analysis (Fig. 4A), even after cross-validation (Supplementary Fig. S2). Face-selective and letterstring-selective sites appeared to overlap macroscopically in the vOT. Nevertheless, individual channels showed distinct visual selectivity to the face or letterstring category in 52 of 64 channels based on ANOVA and in 27 of 29 channels based on the d′ measure. The selectivity, particularly in high-γ band power, was maintained throughout the entire visual stimulation period. From these results, we conclude that face-selective and letterstring-selective channels can alternate in the human vOT. Specifically, a series of alternating face- and letterstring-selective zones would constitute a stripe architecture.

The “FFA” is the most representative category-selective vOT region and is widely accepted as a spatially contiguous cluster that can be accurately localized across subjects (Kanwisher and Yovel 2006). However, there is no clear consensus as to its extent and its spatial relationship to other category-selective regions (Op de Beeck et al. 2008). Pioneering ECoG studies reported face-selective sites 15–25 mm apart (Allison, McCarthy, et al. 1994; Allison et al. 1999). However, because the grid density was insufficient, these findings could not establish whether the sites belonged to a single region or to separate regions. Our high-density recordings showed 2 face-selective regions aligned with consistent topology. One region (face zone #1) appeared on the midfusiform gyrus around the anterior lip/anterior part of the midfusiform sulcus (Fig. 3C,E). The other region (face zone #2) appeared on the posterior fusiform gyrus around the posterior part of the midfusiform sulcus, between the collateral sulcus and the occipitotemporal sulcus. This organization is consistent with recent human and monkey fMRI studies (Tsao et al. 2003, 2008; Pinsk et al. 2009; Rajimehr et al. 2009; Ku et al. 2011; Parvizi et al. 2012) and monkey electrophysiology studies (Moeller et al. 2008) suggesting that 2 face-selective clusters exist within the vOT. We examined whether these 2 face regions correspond to mFus and pFus or to the FFA and the occipital face area (OFA) (Weiner and Grill-Spector 2010, 2012). To this end, we compared the coordinates of these 2 face regions (face zones #1 and #2) with those of mFus and pFus (Weiner and Grill-Spector 2011) and of the OFA (Grill-Spector et al. 2004). In the right hemisphere, the averaged coordinate of the channels in face zone #1 was (33 ± 6, −44 ± 3, −27 ± 5; n = 8 channels from 3 hemispheres) (mean ± SD) in Talairach coordinates.
Face zone #2 was centered at (33 ± 6, −67 ± 6, −21 ± 2; n = 5 channels from 2 hemispheres). The centers of mFus, pFus, and the OFA reported in the literature were located at (34 ± 4, −47 ± 6, −16 ± 4), (34 ± 5, −62 ± 6, −15 ± 3), and (45 ± 5, −70 ± 3, 2 ± 9), respectively. The distances between zone #1 and mFus, pFus, and the OFA were 12, 22, and 41 mm, respectively, so among the 3 candidates, face zone #1 was closest to mFus. For zone #2, the distances from mFus, pFus, and the OFA were 20, 8, and 16 mm, so face zone #2 was closest to pFus. The same geometrical relationship was found in the left hemisphere. There, the distances between zone #1 and mFus, pFus, and the OFA were 14, 22, and 45 mm, and those between zone #2 and mFus, pFus, and the OFA were 15, 9, and 35 mm. Thus, face zones #1 and #2 identified in the present study likely correspond to mFus and pFus, respectively. If the lip of the midfusiform sulcus is located lateral to its fundus, ECoG electrodes medial to the sulcus may pick up signals from neural responses lateral to it. Our data are consistent with a recent in-depth analysis reporting that the position of pFus relative to the midfusiform sulcus can vary more across subjects than that of mFus (Weiner et al. 2013). It is therefore not easy to determine the topology between the ECoG electrodes and specific sulci in some individual analyses.
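
The nearest-region comparison above reduces to Euclidean distances between Talairach centroids. The sketch below recomputes these distances from the rounded right-hemisphere group means quoted in the text. Those means are rounded, and the reported distances were presumably computed from channel-level coordinates. The printed values therefore need not reproduce every reported distance, although the nearest candidate for each zone is the same. The names are illustrative.

```python
import math

# Right-hemisphere group means (x, y, z) in mm, as quoted in the text.
ZONES = {"face zone #1": (33, -44, -27), "face zone #2": (33, -67, -21)}
CANDIDATES = {"mFus": (34, -47, -16), "pFus": (34, -62, -15), "OFA": (45, -70, 2)}


def distances(zone_xyz):
    """Euclidean distance (mm) from a zone centre to each candidate region."""
    return {name: math.dist(zone_xyz, xyz) for name, xyz in CANDIDATES.items()}


def nearest_region(zone_xyz):
    """Name of the candidate region closest to the zone centre."""
    d = distances(zone_xyz)
    return min(d, key=d.get)


for zone, xyz in ZONES.items():
    rounded = {k: round(v, 1) for k, v in distances(xyz).items()}
    print(zone, rounded, "-> nearest:", nearest_region(xyz))
```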

The regional specificity of the VWFA may be more contentious than that of the FFA (Dehaene and Cohen 2011; Price and Devlin 2011). The sensitivity of fMRI responses in the vicinity of the VWFA can be affected by considerable inhomogeneity of the blood oxygen level–dependent signal. The anterior portion of the VWFA approaches the anterior fusiform gyrus, which suffers substantially from susceptibility artifacts arising from the auditory canals (Ojemann et al. 1997). In addition, another source of artifacts has recently been reported: the transverse sinus, which lies just adjacent to the VWFA and is present in every subject in a variable position (Winawer et al. 2010). Variable portions of the VWFA are therefore likely to be affected, depending on the position of the transverse sinus (Wandell 2011). Nonetheless, multiple spots along the anterior–posterior axis of the fusiform gyrus were documented both in fMRI studies (Cohen et al. 2000; Dehaene et al. 2002; Vinckier et al. 2007) and in an early ECoG study (Nobre et al. 1994). Our results are consistent with these findings. They further add evidence that multiple letterstring-selective vOT regions alternate or intermingle with the 2 face-selective vOT regions described above. The fMRI-defined VWFA is typically located within the occipitotemporal sulcus (Dehaene and Cohen 2011; Price 2012). Letterstring-selective ERSP signals originating from the fundus of the occipitotemporal sulcus might therefore appear more medial in surface recordings than their true location (but never medial to the midfusiform sulcus), and/or might be underestimated by ECoG. If this is the case, the actual letterstring-selective ECoG activity in the posterior lateral fusiform gyrus should be located more laterally and/or be stronger.
Because ECoG grids are more likely to pick up neural activity from gyri than from sulci, they are less likely to read out activity from the depths of the sulci. Those depths most likely house some of the neural responses to the current 24 categories. Other measurements such as fMRI may support the alternating organization or may reveal other organizations, because fMRI can measure within sulci whereas ECoG is more likely to sample gyri. Importantly, even in that case, the alternation structure would still be maintained. A recent study indicated that hemodynamic changes measured by fMRI reflect non-phase-locked changes in high-frequency power (Engell et al. 2012).

As a further methodological consideration, ECoG covers only part of the brain surface, with electrode arrays spaced 5–10 mm apart, but it offers millisecond time resolution and relatively high signal sensitivity. In contrast, fMRI typically records whole-brain activity at 3–5 mm voxel resolution, with a time resolution of seconds and relatively low signal sensitivity. Group analyses in hemodynamic neuroimaging studies that use smoothing kernels would be more likely than direct electrophysiological recordings to support the overlap model. It is also important to consider that the interelectrode distance of ECoG (5–10 mm in the current study) may limit the detectable size of functional overlap. All we can test with the ECoG data is whether there is macroscopic channel-wise clustering or intercalation of the optimal category. Columnar (submillimeter) functional microarchitecture cannot be examined with the current experimental setup. Notably, the present work cannot rule out the random mixture at the columnar level proposed in a previous work (Puce et al. 1999). Taken together, because it is difficult to localize the source of neural activity in the vOT by fMRI or ECoG alone, combining fMRI and ECoG would be a complementary and promising approach to thorough and accurate source localization.
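
Two textbook relations make this point about smoothing kernels and interelectrode distance concrete. Idealize the alternating face and letterstring zones as a sinusoidal modulation with spatial period P (one face zone plus one letterstring zone). A Gaussian kernel with standard deviation σ then scales the amplitude of that modulation by exp(−2π²σ²/P²), where σ = FWHM / (2√(2 ln 2)). A regular electrode grid resolves the pattern only if its spacing is below P/2 (the Nyquist criterion). The stripe period, kernel widths, and spacings in the example are hypothetical, and real zones are not sinusoids.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's full width at half maximum to its SD."""
    return fwhm_mm / (2 * math.sqrt(2 * math.log(2)))


def stripe_contrast_after_smoothing(period_mm, fwhm_mm):
    """Fraction of a sinusoidal stripe pattern's amplitude left after Gaussian smoothing."""
    sigma = fwhm_to_sigma(fwhm_mm)
    return math.exp(-2 * math.pi ** 2 * sigma ** 2 / period_mm ** 2)


def resolvable_by_grid(period_mm, spacing_mm):
    """Nyquist criterion: more than two samples per stripe period."""
    return spacing_mm < period_mm / 2


# Hypothetical example: 20-mm stripe period (two 10-mm zones)
for fwhm in (0.0, 4.0, 8.0):
    print(f"FWHM {fwhm:.0f} mm -> contrast {stripe_contrast_after_smoothing(20.0, fwhm):.2f}")
for spacing in (5.0, 10.0):
    print(f"{spacing:.0f}-mm grid resolves 20-mm stripes: {resolvable_by_grid(20.0, spacing)}")
```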

### Signature of Category-Specific Processing

Among many visual categories, faces and letterstrings (2 essential elements of visual communication in humans) induced the most and the second-most category-selective visual responses, respectively, in the human vOT (Table 1, Supplementary Fig. S1). In most channels, category selectivity appeared primarily in high-γ band power, although we occasionally found channels selective in lower frequency bands. Previous ECoG studies used the N200 as a signature of category-specific processing (Allison, McCarthy, et al. 1994; Halgren et al. 1994; Allison et al. 1999; McCarthy et al. 1999; Puce et al. 1999). Other studies indicated categorical specificity in frequency-specific ERSPs (Klopp et al. 1999; John et al. 2000). More recent ECoG studies addressed the consistency of selectivity across different frequency components. There has been mixed evidence as to the spatial independence of the face-γ ERSP and the face-N200 (Vidal et al. 2010; Engell and McCarthy 2011). In our cases, selectivity was largely consistent between the VEP and the different frequency-band powers. Several factors might contribute to this apparent difference between studies. First, electrode diameter may affect detectability in high-γ band power. The study emphasizing spatial segregation across different frequency powers used electrodes with a diameter of 0.8 mm (Vidal et al. 2010), whereas the studies reporting co-localization used electrodes with diameters of 2.2 and 2.0 mm. In the current study, the diameters were 1.5 and 3.0 mm (mainly 1.5 mm). Our impression was that 1.5-mm-diameter electrodes were more suitable for detecting γ oscillations. This may be consistent with the proposition that γ-band activity tends to be more focal than low-frequency LFP activity (Lindén et al. 2010; Crone et al. 2011). Second, the electrode configuration (surface vs. depth recording) might also be an important factor.
Thus, depth electrodes of smaller diameter may preferentially reveal a finer organization based on localized oscillations. Third, the criteria for selectivity may also contribute to the discrepancy. Our analyses focused mainly on sharply selective channels defined by conservative criteria, which might have emphasized the spatial consistency of the γ ERSP and N200 channels.

### Functional Interactions Between Face- and Letterstring-Selective Regions

Observations of category-specific reversals of event-related potentials indicated mutual inhibition between face-selective and word-selective channels (Allison et al. 2002). However, it remains unclear whether such a reversal of the potential indicates functional excitation/inhibition or simply reflects the topological relationship between the signal source and the electrodes. Our high-density electrode configuration enabled us to detect multiple face- and letterstring-selective channels and thus to investigate the functional interaction between them directly. The Granger causality analysis (Fig. 8) suggested that the neural system for visual word perception in the vOT is functionally linked to the face perception machinery, although the link was unidirectional. Thus, the word-selective vOT region would receive local input from the face-selective vOT region, but not vice versa. One possible interpretation is that cross-category connectivity represents interaction between anatomically neighboring zones that belong to different categories but to the same hierarchical level of visual processing (Vinckier et al. 2007; Freiwald and Tsao 2010). Specifically, face-selective channels might suppress letterstring-selective channels, but not vice versa (Allison et al. 2002). Alternatively, the recent eccentricity bias hypothesis (Levy et al. 2001; Hasson et al. 2002; Malach et al. 2002) links the positioning of face- and word-selective regions to a foveal bias common to word and face processing. This central visual-field bias would thus provide a potential linking factor between face- and word-selective responses and may help explain the interplay between them. Recent theories have proposed that literacy acquisition partially recycles pre-existing cortical systems that evolved for object recognition (Baker, Liu, et al. 2007; Dehaene and Cohen 2011).
Indeed, the human brain has likely not evolved a dedicated mechanism for reading letters, because the invention of letters is too recent and too culturally divergent to have influenced the human genomic blueprint. Consistent with this proposal, our findings indicate that reading experience can shape a common neural substrate in the vOT, originally evolved for face recognition, into anatomically alternating patchy modules specialized for recognizing faces and visual words. These findings highlight an adaptive neural principle for representing newly acquired knowledge among existing salient categories in the cerebral association cortex.

## Funding

This work was supported by the Strategic Research Program for Brain Science from the MEXT of Japan to I.H. and Y.K.; a 2008 Specified Research grant from the Takeda Science Foundation and a Grant (A) from the Hayao Nakayama Foundation for Science and Technology and Culture to I.H.; a Grant for Promotion of Niigata University Research Project (20A010, 23B008, 23H081) to I.H.; a Grant for Comprehensive Research on Disability, Health and Welfare (H23-Nervous and Muscular-General-003) from the MHLW of Japan to K. Kawai; and JSPS KAKENHI (11J08024) to K.M.

## Notes

Conflict of Interest: None declared.

## References

Allison
T
Ginter
H
McCarthy
G
Nobre
A
Puce
A
Luby
M
Spencer
D
.
1994
.
Face recognition in human extrastriate cortex
.
J Neurophysiol
.
71
:
821
825
.
Allison
T
McCarthy
G
Nobre
A
Puce
A
Belger
A
.
1994
.
Human extrastriate visual cortex and the perception of faces, words, numbers, and colors
.
Cereb Cortex
.
4
:
544
554
.
Allison
T
Puce
A
McCarthy
G
.
2002
.
Category-sensitive excitatory and inhibitory processes in human extrastriate cortex
.
J Neurophysiol
.
88
:
2864
2868
.
Allison
T
Puce
A
Spencer
D
McCarthy
G
.
1999
.
Electrophysiological studies of human face perception. I: potentials generated in occipitotemporal cortex by face and non-face stimuli
.
Cereb Cortex
.
9
:
415
430
.
Baker
CI
Hutchison
TL
Kanwisher
N
.
2007
.
Does the fusiform face area contain subregions highly selective for nonfaces?
Nat Neurosci
.
10
:
3
4
.
Baker
CI
Liu
J
Wald
LL
Kwong
KK
Benner
T
Kanwisher
N
.
2007
.
Visual word processing and experiential origins of functional selectivity in human extrastriate cortex
.
.
104
:
9087
9092
.
Barton
JJS
Fox
CJ
Sekunova
A
Iaria
G
.
2009
.
Encoding in the visual word form area: an fMRI adaptation study of words versus handwriting
.
J Cogn Neurosci
.
22
:
1649
1661
.
Chan
A
Baker
J
Eskandar
E
Schomer
D
Ulbert
I
Marinkovic
K
Cash
S
Halgren
E
.
2011
.
First-pass selectivity for semantic categories in human anteroventral temporal lobe
.
J Neurosci
.
31
:
18119
18129
.
Chang
CC
Lin
CJ
.
2011
.
LIBSVM: A library for support vector machines. Acm T Intel Syst Tec. 2
.
Cohen
L
Dehaene
S
Naccache
L
Lehéricy
S
Dehaene-Lambertz
G
Hénaff
M
Michel
F
.
2000
.
The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients
.
Brain
.
123
(Pt 2)
:
291
307
.
Cohen
L
Martinaud
O
Lemer
C
Lehericy
S
Samson
Y
M
Slachevsky
A
Dehaene
S
.
2003
.
Visual word recognition in the left and right hemispheres: anatomical and functional correlates of peripheral alexias
.
Cereb Cortex
.
13
:
1313
1333
.
Crone
N
Korzeniewska
A
Franaszczuk
P
.
2011
.
Cortical γ responses: searching high and low
.
Int J Psychophysiol
.
79
:
9
15
.
Dehaene
S
Cohen
L
.
2011
.
The unique role of the visual word form area in reading
.
Trends Cogn Sci
.
15
:
254
262
.
Dehaene
S
Le Clec'H
G
Poline
J-B
Le Bihan
D
Cohen
L
.
2002
.
The visual word form area: a prelexical representation of visual words in the fusiform gyrus
.
Neuroreport
.
13
:
321
325
.
Engell
A
Huettel
S
McCarthy
G
.
2012
.
The fMRI BOLD signal tracks electrophysiological spectral perturbations, not event-related potentials
.
Neuroimage
.
59
:
2600
2606
.
Engell
A
McCarthy
G
.
2011
.
The relationship of γ oscillations and face-specific ERPs recorded subdurally from occipitotemporal cortex
.
Cereb Cortex
.
21
:
1213
1221
.
Fisch
L
Privman
E
Ramot
M
Harel
M
Nir
Y
Kipervasser
S
Andelman
F
Neufeld
M
Kramer
U
Fried
I
et al
2009
.
Neural "ignition": enhanced activation linked to perceptual awareness in human ventral stream visual cortex
.
Neuron
.
64
:
562
574
.
Freiwald WA, Tsao DY. 2010. Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science. 330:845–851.

Granger CWJ. 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica. 37:424–438.

Green DM, Swets JA. 1966. Signal detection theory and psychophysics. New York: Wiley.

Grill-Spector K, Knouf N, Kanwisher N. 2004. The fusiform face area subserves face perception, not generic within-category identification. Nat Neurosci. 7:555–562.

Grill-Spector K, Malach R. 2004. The human visual cortex. Annu Rev Neurosci. 27:649–677.

Grill-Spector K, Sayres R, Ress D. 2006. High-resolution imaging reveals highly selective nonface clusters in the fusiform face area. Nat Neurosci. 9:1177–1185.

Haist F, Lee K, Stiles J. 2010. Individuating faces and common objects produces equal responses in putative face-processing areas in the ventral occipitotemporal cortex. Front Hum Neurosci. 4:181.

Halgren E, Baudena P, Heit G, Clarke J, Marinkovic K, Clarke M. 1994. Spatio-temporal stages in face and word processing. I. Depth-recorded potentials in the human occipital, temporal and parietal lobes. J Physiol Paris. 88:1–50.

Hasson U, Levy I, Behrmann M, Hendler T, Malach R. 2002. Eccentricity bias as an organizing principle for human high-order object areas. Neuron. 34:479–490.

Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P. 2001. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science. 293:2425–2430.

He B, Dai Y, Astolfi L, Babiloni F, Yuan H, Yang L. 2011. eConnectome: a MATLAB toolbox for mapping and imaging of brain functional connectivity. J Neurosci Methods. 195:261–269.

Klopp J, Marinkovic K, Chauvel P, Nenov V, Halgren E. 2000. Early widespread cortical distribution of coherent fusiform face selective activity. Hum Brain Mapp. 11:286–293.

Kamitani Y, Tong F. 2005. Decoding the visual and subjective contents of the human brain. Nat Neurosci. 8:679–685.

Kanwisher N, McDermott J, Chun MM. 1997. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci. 17:4302–4311.

Kanwisher N, Yovel G. 2006. The fusiform face area: a cortical region specialized for the perception of faces. Philos Trans R Soc Lond B Biol Sci. 361:2109–2128.

Klopp J, Halgren E, Marinkovic K, Nenov V. 1999. Face-selective spectral changes in the human fusiform gyrus. Clin Neurophysiol. 110:676–682.

Ku S-P, Tolias A, Logothetis N, Goense J. 2011. fMRI of the face-processing network in the ventral temporal lobe of awake and anesthetized macaques. Neuron. 70:352–362.

Lachaux J, Rudrauf D, Kahane P. 2003. Intracranial EEG and human brain mapping. J Physiol Paris. 97:613–628.

Levy I, Hasson U, Avidan G, Hendler T, Malach R. 2001. Center-periphery organization of human object areas. Nat Neurosci. 4:533–539.

Lindén H, Pettersen K, Einevoll G. 2010. Intrinsic dendritic filtering gives low-pass power spectra of local field potentials. J Comput Neurosci. 29:423–444.

Liu H, Agam Y, Madsen JR, Kreiman G. 2009. Timing, timing, timing: fast decoding of object information from intracranial field potentials in human visual cortex. Neuron. 62:281–290.

Malach R, Levy I, Hasson U. 2002. The topography of high-order human object areas. Trends Cogn Sci. 6:176–184.

Matsuo T, Kawasaki K, Osada T, Sawahata H, Suzuki T, Shibata M, Miyakawa N, Nakahara K, Iijima A, Sato N, et al. 2011. Intrasulcal electrocorticography in macaque monkeys with minimally invasive neurosurgical protocols. Front Syst Neurosci. 5:34.

McCarthy G, Puce A, Belger A, Allison T. 1999. Electrophysiological studies of human face perception. II: response properties of face-specific potentials generated in occipitotemporal cortex. Cereb Cortex. 9:431–444.

Mei L, Xue G, Chen C, Xue F, Zhang M, Dong Q. 2010. The "visual word form area" is involved in successful memory encoding of both words and faces. Neuroimage. 52:371–378.

Moeller S, Freiwald W, Tsao D. 2008. Patches with links: a unified system for processing faces in the macaque temporal lobe. Science. 320:1355–1359.

Mukamel R, Fried I. 2012. Human intracranial recordings and cognitive neuroscience. Annu Rev Psychol. 63:511–537.

Nobre A, Allison T, McCarthy G. 1994. Word recognition in the human inferior temporal lobe. Nature. 372:260–263.

Ojemann J, Akbudak E, Snyder A, McKinstry R, Raichle M, Conturo T. 1997. Anatomic localization and quantitative analysis of gradient refocused echo-planar fMRI susceptibility artifacts. Neuroimage. 6:156–167.

Op de Beeck H, DiCarlo J, Goense J, Grill-Spector K, Papanastassiou A, Tanifuji M, Tsao D. 2008. Fine-scale spatial organization of face and object selectivity in the temporal lobe: do functional magnetic resonance imaging, optical imaging, and electrophysiology agree? J Neurosci. 28:11796–11801.

Parvizi J, Jacques C, Foster B, Witthoft N, Rangarajan V, Weiner K, Grill-Spector K. 2012. Electrical stimulation of human fusiform face-selective regions distorts face perception. J Neurosci. 32:14915–14920.

Pinsk M, Arcaro M, Weiner K, Kalkus J, Inati S, Gross C, Kastner S. 2009. Neural representations of faces and body parts in macaque and human cortex: a comparative fMRI study. J Neurophysiol. 101:2581–2600.

Price C. 2012. A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. Neuroimage. 62:816–847.

Price C, Devlin J. 2011. The interactive account of ventral occipitotemporal contributions to reading. Trends Cogn Sci. 15:246–253.

Puce A, Allison T, Asgari M, Gore JC, McCarthy G. 1996. Differential sensitivity of human visual cortex to faces, letterstrings, and textures: a functional magnetic resonance imaging study. J Neurosci. 16:5205–5215.

Puce A, Allison T, McCarthy G. 1999. Electrophysiological studies of human face perception. III: effects of top-down processing on face-specific potentials. Cereb Cortex. 9:445–458.

Rajimehr R, Young J, Tootell R. 2009. An anterior temporal face patch in human cortex, predicted by macaque maps. Proc Natl Acad Sci USA. 106:1995–2000.

Rolls ET. 2010. Face neurons. In: The Oxford handbook of face perception. Oxford: Oxford University Press. p. 51–75.

Seth A. 2007. Granger causality. Scholarpedia. 2:1667.

Talairach J, Tournoux P. 1988. Co-planar stereotaxic atlas of the human brain. New York: Thieme.

Toda H, Suzuki T, Sawahata H, Majima K, Kamitani Y, Hasegawa I. 2011. Simultaneous recording of ECoG and intracortical neuronal activity using a flexible multichannel electrode-mesh in visual cortex. Neuroimage. 54:203–212.

Tsao D, Freiwald W, Knutsen T, Mandeville J, Tootell R. 2003. Faces and objects in macaque cerebral cortex. Nat Neurosci. 6:989–995.

Tsao D, Moeller S, Freiwald W. 2008. Comparing face patch systems in macaques and humans. Proc Natl Acad Sci USA. 105:19514–19519.

Tsuchiya N, Kawasaki H, Oya H, Howard MA 3rd, Adolphs R. 2008. Decoding face information in time, frequency and space from direct intracranial recordings of the human brain. PLoS One. 3:e3892.

Vapnik VN. 1998. Statistical learning theory. Wiley.

Vidal JR, Ossandon T, Jerbi K, Dalal SS, Minotti L, Ryvlin P, Kahane P, Lachaux JP. 2010. Category-specific visual responses: an intracranial study comparing gamma, beta, alpha, and ERP response selectivity. Front Hum Neurosci. 4:195.

Vinckier F, Dehaene S, Jobert A, Dubus J, Sigman M, Cohen L. 2007. Hierarchical coding of letter strings in the ventral stream: dissecting the inner organization of the visual word-form system. Neuron. 55:143–156.

Vindiola M, Wolmetz M. 2011. Mental encoding and neural decoding of abstract cognitive categories: a commentary and simulation. Neuroimage. 54:2822–2827.

Wandell B. 2011. The neurobiological basis of seeing words. Ann N Y Acad Sci. 1224:63–80.

Weiner K, Golarai G, Caspers J, Chuapoco M, Mohlberg H, Zilles K, Amunts K, Grill-Spector K. 2013. The mid-fusiform sulcus: a landmark identifying both cytoarchitectonic and functional divisions of human ventral temporal cortex. Neuroimage. 84:453–465.

Weiner K, Grill-Spector K. 2012. The improbable simplicity of the fusiform face area. Trends Cogn Sci. 16:251–254.

Weiner K, Grill-Spector K. 2011. Neural representations of faces and limbs neighbor in human high-level visual cortex: evidence for a new organization principle. Psychol Res. 77:74–97.

Weiner K, Grill-Spector K. 2010. Sparsely-distributed organization of face and limb activations in human ventral temporal cortex. Neuroimage. 52:1559–1573.

Wilke C, Ding L, He B. 2008. Estimation of time-varying connectivity patterns through the use of an adaptive directed transfer function. IEEE Trans Biomed Eng. 55:2557–2564.

Winawer J, Horiguchi H, Sayres R, Amano K, Wandell B. 2010. Mapping hV4 and ventral occipital cortex: the venous eclipse. J Vis. 10:1.