Heterogeneous Redistribution of Facial Subcategory Information Within and Outside the Face-Selective Domain in Primate Inferior Temporal Cortex

Abstract

The inferior temporal cortex (ITC) contains neurons selective to multiple levels of visual categories. However, the mechanisms by which these neurons collectively construct hierarchical category percepts remain unclear. By comparing decoding accuracy with simultaneously acquired electrocorticogram (ECoG), local field potentials (LFPs), and multi-unit activity in the macaque ITC, we show that low-frequency LFPs/ECoG in the early evoked visual response phase contain sufficient coarse category (e.g., face) information, which is homogeneous and enhanced by spatial summation of up to several millimeters. Late-induced high-frequency LFPs additionally carry spike-coupled finer category (e.g., species, view, and identity of the face) information, which is heterogeneous and reduced by spatial summation. Face-encoding neural activity forms a cluster in similar cortical locations regardless of whether it is defined by early evoked low-frequency signals or late-induced high-gamma signals. By contrast, facial subcategory-encoding activity is distributed, not confined to the face cluster, and dynamically increases its heterogeneity from the early evoked to late-induced phases. These findings support a view that, in contrast to the homogeneous and static coarse category-encoding neural cluster, finer category-encoding clusters are heterogeneously distributed even outside their parent category cluster and dynamically increase heterogeneity along with the local cortical processing in the ITC.


Introduction
Humans recognize individual objects by sorting them into multiple categories, which are often hierarchically structured. For example, a dog is recognized more specifically by its breed (e.g., Dalmatian) or more vaguely as a 4-legged animal, depending on the context. The present study aims to clarify the mechanisms by which the hierarchical structure of perceptual categories is reflected in the co-ordinated activity of neuronal populations in the brain. Accumulating evidence suggests that the inferior temporal cortex (ITC) in the ventral visual system of primates contains neural correlates of different levels of category recognition, ranging from ordinate-level categorization (Rosch 1978; Wang et al. 1996; Haxby et al. 2001; Hung et al. 2005; Kiani et al. 2007; Sato et al. 2013) to subordinate-level discrimination (Wang et al. 1996; Sugase et al. 1999; Kreiman et al. 2000; Quiroga et al. 2005; Kriegeskorte et al. 2008; Huth et al. 2012; Sato et al. 2013). Neuroimaging and electrophysiological studies have indicated that there is a mosaic of brain regions highly selective to distinct coarse categories, such as faces (Kanwisher et al. 1997; Tsao et al. 2003; Tsao et al. 2006; Sato et al. 2013), places (Epstein and Kanwisher 1998), and other objects (Bell et al. 2011; Ku et al. 2011; Sato et al. 2013) in the ITC. Animal studies have also shown that neuronal activity in the ITC is selective to different subcategories of face, such as faces from specific viewing angles (Wang et al. 1996) and faces of particular animal species (Sato et al. 2013). Some neurons in the anterior/medial temporal lobe have been found to be sensitive to facial identities regardless of the viewing angle (Quiroga et al. 2005).
However, there has been little evidence regarding how neuronal representations of facial subcategories (facial species, view, and identity) are spatially and temporally organized, or how subcategory-encoding neuronal clusters, if any, relate topologically to the coarser face-category-selective cluster in the ITC. The present study examined these questions in 3 steps.
First, we investigated whether neurons selective to facial subcategories form discrete clusters in the ITC. Specifically, to estimate the spatiotemporal clustering of neuronal activity representing ordinate (face) and subordinate (facial view, species, and identity) categories, we tested, using a decoding-based approach, whether multichannel patterns of multi-unit activity (MUA), local field potentials (LFPs), or electrocorticogram (ECoG) across a region in the anterior ITC contain sufficient information to predict the stimulus category at distinct hierarchical levels. The scale of spatiotemporal summation has been shown to vary across spiking activity, LFPs, and ECoG by direct comparisons in rodent (Helmchen et al. 1999), cat (Contreras and Steriade 1995), and macaque cortices (Belitski et al. 2008; Buzsaki et al. 2012). Thus, differences in the amount of category information extractable from the respective recorded data would be expected to reflect the spatiotemporal scale and uniformity of category-specific neuronal clusters (Kamitani and Tong 2005). Furthermore, comparison of decoding accuracy with simultaneously acquired MUA, LFPs, and ECoG may enable a reasonable prediction about whether category information at a particular level is enhanced or reduced by spatial summation of up to several millimeters, and could aid the understanding of spatiotemporal clustering of the neuronal activity encoding different levels of category information in the ITC. For simultaneous acquisition of MUA, LFP, and ECoG data, we combined a high-density surface field potential recording technique recently established in our laboratory (Matsuo et al. 2011; Toda et al. 2011; Nakahara et al. 2016) and a high-density microelectrode-array technique (Dotson et al. 2015).
Second, we estimated the frequency dependency and temporal stability of category-specific IT architecture, again using the decoding approach. Previous studies have indicated that individual IT neurons can change their category preferences over the visual response time course, developing a preference for finer categories (Sugase et al. 1999) and sharpening stimulus tuning (Tamura and Tanaka 2001; Brincat and Connor 2006). However, little is known about whether category-selective IT architecture defined by frequency-specific synchronous activity is stable or changes dynamically during the visual response. In early visual cortices, LFPs, particularly the stimulus-locked early theta and initial transient high-gamma power ("evoked activity"), mainly reflect the initial synaptic inputs to the granular cortical layer and the immediately following polysynaptic activity within the locally recorded region (Mitzdorf 1985, 1987; Belitski et al. 2008). In contrast, high-gamma power in the later period ("induced activity") reflects further processing in the local recurrent network (Buzsaki et al. 2012). Recent studies have reported that low-frequency LFPs carry spike-firing-independent information in the primate primary visual cortex (V1; Belitski et al. 2008). In the current study, we examined whether high-frequency LFPs carry category-selective information that is tightly coupled with output spike selectivity, and whether low-frequency LFPs carry spike-independent category information in the ITC, as in V1. For this purpose, we compared category-level-specific information embedded in early evoked LFPs, late-induced LFPs, and MUA. Further, by examining the time-frequency specificity of the decoded signals, we tested whether the elaboration of categorical cortical representations through local processing within the ITC, from the early "evoked" low-frequency-dominant architecture to the late "induced" high-frequency-dominant architecture, depends on the level of category.
Interpreting the spatial scale of different category clusters in the ITC from differences in decoding accuracy with LFPs, MUA, and ECoG is reasonable (cf. the comparison of LFP, but not ECoG, with multiple levels of spatially summed MUA signal in macaque IT; Kreiman et al. 2006), but remains suggestive. Thus, in the third part of the paper, we aimed to clarify the spatial and temporal factors contributing to the category-level-dependent "spatiotemporal neuronal clusters" identified by the decoding analyses. Specifically, we focused on the LFP-based IT architecture encoding the face category and its subcategories. We created selectivity maps for the coarse (ordinate) categories, including faces, and for the subordinate facial species, view, and identity categories from early evoked low-frequency LFPs and late-induced high-frequency LFPs, and examined whether the actual clustering of neuronal activity with similar category selectivity in cortical space contributed to the "spatiotemporal clusters." Further, to clarify the spatial relationship between cortical representations of a parent category and its subcategories, we tested whether the clustering of facial subcategory-selective channels and the strength of channel-wise subcategory selectivity are greater within the parent face category domain than outside it. It has been previously reported that the spatial reach of the recorded neural signal depends not only on the spatial configuration but also on the temporal coherence of the source signals, because phase matching of synaptic activity affects the spatial summation of the signal (Linden et al. 2011; Einevoll et al. 2013). By analyzing the phase of evoked LFPs, we investigated whether spatial patterns and temporal coherence both contribute to the separation of species and view category information.

Materials and Methods

Animals
Two Japanese macaque monkeys (Macaca fuscata), 1 male (9.5 kg) and 1 female (5.7 kg), provided by the National BioResource Project "Japanese Monkeys" by MEXT Japan, were used for the experiments. All experiments were performed in accordance with the National Institutes of Health Guidelines for the Care and Use of Laboratory Animals. The experimental protocol was approved by the Niigata University Institutional Animal Care and Use Committee.

Task and Stimuli
Monkeys were trained in a visual fixation task (Fig. 1A) to keep their gaze within a 2-3° fixation window while a 0.2-0.3° fixation spot was displayed on a 22-inch cathode ray tube monitor (Mitsubishi Electric, Tokyo, Japan) at a viewing distance of 57 cm. After 300 ms of stable fixation, a stimulus image was presented for 300 ms, followed by a 600-900-ms blank interval. Two or three stimuli were successively presented in a single fixation session. Monkeys passively viewed the stimulus set and were rewarded with a drop of apple juice for maintaining fixation over the entire duration of the trial. Eye movements were captured with an infrared camera system (i-rec, https://staff.aist.go.jp/k.matsuda/iRecHS2/index_e.html, date last accessed December 14, 2017) at a sampling rate of 60 Hz. The behavior of the animals was controlled by an in-house program written in MATLAB (MathWorks, Natick, USA) and OpenEx (Tucker-Davis Technologies (TDT), Alachua, USA) running on a Windows PC and a multicore digital signal processor (RZ2, TDT), which together make up a multichannel acquisition system (System3, TDT). Stimuli were presented via the ViSaGe system (Cambridge Research Systems, Rochester, UK), which was controlled by another in-house MATLAB program that also fed stimulus timing to the TDT system via a transistor-transistor-logic (TTL) pulse.

Anatomical MRI
To acquire structural images of the monkey brains, we used a 4.7-T MRI scanner with 100-mT/m actively shielded gradient coils and a volume radiofrequency (RF) coil (Biospec 47/40; Bruker, Ettlingen, Germany). High-resolution, T1-weighted structural images were scanned using a 3D MDEFT (modified driven equilibrium Fourier transform) sequence (voxel = 0.5 × 0.5 × 0.5 mm³). Throughout the MRI session, the monkeys were maintained under anesthesia. Anesthesia was introduced with an intramuscular injection of medetomidine/midazolam (30 μg/kg and 0.3 mg/kg, respectively) and ketamine (0.5 mg/kg) before the MRI scans. During MRI acquisition, anesthesia was maintained with continuous intravenous infusion of propofol (5-10 mg/kg/h) and intramuscular injections of xylazine (1 mg/kg) as needed. Glucose-lactated Ringer's solution was given intravenously (5 ml/kg/h). Heart rate, oxygen saturation, and blood pressure were continuously monitored.

Recording Electrodes
The multimicroelectrode array used for MUA and LFP recording was customized from a commercially available semichronic microdrive system (SC60-1; Gray Matter Research, Bozeman, USA). The array consisted of 60 microelectrodes arranged in a grid configuration with 1.2-mm interelectrode spacing ( Fig. 2A).
Each microelectrode was a 75-μm-diameter iridium electrode insulated with Parylene-C (poly(chloro-para-xylylene)) and had a typical impedance of 0.5 MΩ measured at 1 kHz. ECoG electrodes were prepared via micromachining techniques using 0.25-μm-thick gold wiring and 10-μm-thick Parylene-C insulation, with the recording contacts exposed in a 100 × 100 μm square shape (Fig. 2C, Supplementary Fig. S1B). ECoG contacts were arranged in a grid shape matching the spatial configuration of the multimicroelectrode array (Fig. 2A inset, Fig. 2C). The lead wires and Parylene-C insulation were aligned in columns with slits between them (Fig. 2C, Supplementary Fig. S1B). Lead-wire bundles led from the ECoG probe to two 0.025-inch pitch 36-pin connectors (Supplementary Fig. S1B, C; #A8828-001-vv; Omnetics, MN, USA). Additional details on the ECoG-manufacturing process have been described previously (Takeuchi et al. 2005; Toda et al. 2011). Gold-Parylene-C ECoG electrodes were attached to the bottom of a silicone artificial dura (Fig. 2A), which resembled the design of the "artificial dura" used in in vivo optical imaging techniques (Arieli et al. 2002). Small protrusions of the insulation film were inserted into the slits on the brim of the artificial dura and fixed using a small amount of silastic rubber for mechanical stability (Fig. 2A). The ECoG probe and microelectrode array were assembled together and implanted onto the cortical surface over area TE of the IT cortex (Fig. 2A, B).

Figure 1. Visual stimuli and the presentation paradigm. (A) Two to three stimuli were presented during a passive fixation task. (B) Stimulus set consisting of different categorical levels. The coarse category set included faces, face parts, bodies, body parts, and inanimate objects. The fine categories were the face images used in the coarse category subdivided into species, views, and identities. The species category set included human faces and monkey faces with frontal view angle and gaze direction. The view and identity sets included human faces of 5 identities in 3 viewing angles. (C) Stimulus set used for monkey C, with the coarse category structure for faces, face parts, bodies, body parts, and inanimate objects that corresponds to the coarse categories of the stimulus set for monkey H shown in (B).

General Surgical Procedures
General procedures of the surgery largely overlap with those described in a previous report (Matsuo et al. 2011). Anesthesia was introduced with an intramuscular injection of medetomidine (30 μg/kg) and ketamine (1 mg/kg). Animals were artificially ventilated with oxygen and maintained under anesthesia with isoflurane (1-2%) during the surgeries. A venous line was secured using lactated Ringer's solution, and ceftriaxone (100 mg/kg) was administered by drip as a prophylactic antibiotic. Animals received ketoprofen as an analgesic for 3 days, and the antibiotics were continued for 1 week after surgery. Oxygen saturation, heart rate, and end-tidal CO2 were continuously monitored (SurgiVet; Smiths Medical PM Inc., London, UK) throughout surgery to adjust the level of anesthesia. Body temperature was maintained at 37°C using an electric heating mat. The skull was fixed with a 3-point fastening device (Integra Co., NJ, USA) with a custom-downsized attachment for macaques, and a vacuum-fixing bed (Vacuform, B.u.W. Schmidt GmbH, Garbsen, Germany) was used to maintain the position of the body. Following the skin incision, the zygomatic arch, temporal muscle, and the upper portion of the mandible bone were removed to facilitate the approach. A burr hole was opened in the inferior temporal portion of the skull (Fig. 2B) by a perforator (Primado PD-PER; NSK, Tochigi, Japan) with an attachment for infants (DGR-OS Mini 8/5 mm R; Acura-Cut Inc., MA, USA). Hemorrhage from the dura was controlled by a bipolar coagulator (Bipolar SX-2001; Tagawa Electronic Research Institute, Chiba, Japan).

Implant Surgery
We implanted the chronic recording device from the temporal side (Fig. 2A, B). An artificial dura with the ECoG probe attached to its bottom (Fig. 2A, Supplementary Fig. S1A, D) was placed onto the surface of area TE of the IT cortex, covering the IT gyrus and extending slightly below the AMTS (Fig. 2A, B), through a window on the dura. A 3-piece metal chamber system was used as the interface between the skull and the electrode arrays. The bottom chamber (Fig. 2A, Supplementary Fig. S1A, C) fit tightly into the craniotomy window that was made on the skull. Titanium anchor screws were placed on the skull, and dental resin firmly attached the chamber to the skull. Canals on the inner wall of the chamber and the protruding ridges on the outer wall of the cylindrical part of the artificial dura aligned the ECoG probe and the microelectrode array. The middle chamber was slowly inserted into the inner wall of the cylindrical part of the artificial dura (Fig. 2A, Supplementary Fig. S1A, D), while the wall of the artificial dura was securely held up with a 5-0 nylon thread. The ECoG lead wires exited through an opening located between the 2 chambers (curved arrows on the ECoG probe in Fig. 2A, Supplementary Fig. S1D), and the 2 chambers were firmly attached by screws. The opening made for the ECoG wire was later closed with a quick-curing silastic rubber (Kwik-Sil; WPI, Sarasota, USA). The microdrive was inserted into the second chamber, and the third piece of the chamber was firmly screwed to the second piece, thereby attaching the microdrive to the second chamber. The microdrive and second chamber were precisely aligned by a pin located on the microdrive and a hole located on the second chamber. The electrode assembly accessed the IT cortex at a pre-allocated position and angle, which were determined via an anatomical MRI scan (Fig. 2B).
The sharp iridium microelectrodes used for MUA and LFP recordings penetrated through the silicone membrane and passed through the slits in the Parylene-C insulation (Fig. 2C). ECoG contacts and microelectrodes were arranged in the same spacing and configuration but shifted by half of the spacing distance. Electrodes were placed on area TE of the IT cortex, covering the IT gyrus and extending slightly below the AMTS (Fig. 2A, B).

Daily Recordings
Daily recording experiments included 2 steps. First, the animal's head was fixed in the chair, and the quality of multi-unit recording from the microelectrodes was quickly assessed on the basis of the signal-to-noise ratio (S/N) of the signal. We adjusted the depth of the electrodes that had poor recording quality. However, to minimize the working time of the animal and the risk of pushing down the cortex, we adopted the following strategy when choosing the electrodes to be manipulated. In the initial 2 weeks of the experiment, up to 15 electrodes were manipulated per day. In later sessions, we took the history of recording quality into account: electrodes with a poor S/N history were left untouched, and electrodes with an intermediate S/N history were adjusted, but only to a level matching that of the preceding recording sessions. This allowed us to limit the electrode adjustment time to 1-1.5 h per day.

Stimulus Image Set
The stimulus set consisted of images that belonged to 1 of 3 discrete "coarse" categories (face, body, and inanimate object) and 2 additional categories, namely modified face (parts-scrambled face and face part) and body part (hand) (Fig. 1B). Images that belonged to the face category were further divided into subcategories ( Fig. 1C) that overlapped partially. One was the "species" category, which consisted of the human face group and the macaque face group. Another was the "view" category, which consisted of human face images with 3 different views, with each view having 5 different identities. The same image set was also used as the "identities" category, which was set up by grouping the images into different identities, with each identity having 3 different views.

Data Analysis Part 1: Data Acquisition, Frequency Spectrum
Data Acquisition

MUA, LFP, and ECoG data were simultaneously recorded using the TDT System3. MUA and LFP were recorded from the 60 penetrating microelectrodes, and ECoG was recorded from the 60 surface-contact electrodes. Signals were fed to headstage amplifiers (ZC32 and ZC64, TDT) and a preamplifier/digitizer (PZ2, TDT) and then fed into the digital signal-processing module (RZ2, TDT). For multi-unit data, the signal was band-pass filtered between 300 Hz and 5 kHz, and the time points at which the waveform exceeded 3.7 × the standard deviation (SD) of the signal were stored as multi-unit time stamps. For LFP and ECoG data, the signal was initially stored in wide band (no digital filtering). Acquired data were analyzed with in-house programs running on MATLAB. Visually evoked MUA was converted to a spike density function using a kernel bandwidth optimized for the spiking rate of each respective stimulus condition (Shimazaki and Shinomoto 2010; Fig. 2D). A multi-unit was considered visually responsive if the firing rate in the visual stimulation period and that in the prestimulus period differed with statistical significance (P < 0.05, 2-sample Kolmogorov-Smirnov test, corrected for multiple comparisons using the Bonferroni method by the number of stimuli).
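As a rough illustration of this thresholding step, the sketch below marks the first sample of each excursion beyond 3.7 × SD of an already band-pass-filtered trace. This is Python/NumPy rather than the MATLAB/TDT pipeline actually used; the function name, the bipolar (absolute-value) threshold, and the toy data are assumptions for the example.

```python
import numpy as np

def detect_multiunit_events(filtered, fs, threshold_sd=3.7):
    """Return time stamps (s) where a band-pass-filtered (300 Hz-5 kHz)
    extracellular trace crosses threshold_sd x its standard deviation."""
    thresh = threshold_sd * np.std(filtered)
    above = np.abs(filtered) > thresh              # bipolar threshold (assumption)
    # keep only the first sample of each supra-threshold excursion
    onsets = np.flatnonzero(above & ~np.r_[False, above[:-1]])
    return onsets / fs

# toy trace: unit-variance noise with two large injected "spikes"
rng = np.random.default_rng(0)
sig = rng.normal(0, 1, 24414)
sig[[5000, 12000]] += 20.0
ts = detect_multiunit_events(sig, fs=24414.0)
print(ts)
```

In the real pipeline the SD would be estimated per channel on the hardware; here it is computed over the whole toy trace for simplicity.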

Features of MUA
We used the spike rates of MUA as input features for classification. Unless stated otherwise, spike rates from multiple electrodes and time windows were combined. We used MUA signals during a period from −50 to 600 ms relative to the stimulus onset in each trial. The signal at each microelectrode was sampled using a 100-ms time window that was shifted by 50 ms, and the spike rate in each time window was calculated. The spike rates of all microelectrodes and the 12 consecutive time windows were used as the input features to a decoder. The features used for characterizing the time course of decoding accuracy were limited to a single time window: the spike rates of all electrodes in a single 100-ms sliding time window were used, the time window was slid by 25 ms, and the decoding accuracy was calculated as a function of time. The spike rates of a single electrode from the 12 time windows were used for characterizing the decoding accuracy of each single electrode. We excluded MUA data that did not yield a significant visually evoked response, as defined by a pairwise Kolmogorov-Smirnov test (P < 0.05, Bonferroni-corrected by the number of stimulus images) between the prestimulus period and the evoked period.
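The windowing scheme above (12 overlapping 100-ms bins from −50 to 600 ms, stepped by 50 ms) can be sketched as follows. This is an illustrative Python fragment, not the authors' MATLAB code, and the toy spike train is invented.

```python
import numpy as np

def mua_features(spike_times, t_start=-0.05, win=0.1, step=0.05, n_win=12):
    """Spike rates (Hz) in n_win windows of length `win` shifted by `step`,
    starting at t_start relative to stimulus onset (times in seconds)."""
    spike_times = np.asarray(spike_times)
    rates = []
    for k in range(n_win):
        lo = t_start + k * step
        counts = np.sum((spike_times >= lo) & (spike_times < lo + win))
        rates.append(counts / win)
    return np.array(rates)

# toy single-trial spike train: a burst ~80-170 ms after stimulus onset
feats = mua_features([0.08, 0.09, 0.101, 0.12, 0.151, 0.17])
print(feats)   # 12 rates; windows overlapping the burst are non-zero
```

Per trial, these 12 rates would be concatenated across all 60 microelectrodes to form the decoder input vector.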

Features of ECoG and LFP Signals
For classification, we used the mean amplitudes and spectral powers of the ECoG/LFP signals as input features. To compare the decoding performance with that obtained using MUA, we excluded the data from the ECoG (LFP) electrodes that overlay (matched) the microelectrodes that did not yield good MUA signals. We used ECoG/LFP signals during a period from −50 to 600 ms relative to the stimulus onset in each trial. Unless stated otherwise, the mean amplitudes and spectral powers from multiple electrodes and time windows were combined. Two types of features were computed from the ECoG/LFP signals: one was the total power summed across the frequency spectrum, while the other was the wavelet power obtained separately for the respective frequencies. To obtain the total power, the signal at each electrode was sampled using a 100-ms time window that was shifted by 50 ms, and the spectral powers of the 101 frequency bands (10-1000 Hz, with 10-Hz intervals) in each time window were calculated using the Fast Fourier Transform. The mean of all the frequency powers was taken as the "total power" of the time window, and the total powers from all electrodes and the 12 consecutive time windows were used as the input features to a decoder (Fig. 3B, C). To obtain the wavelet power, the original signal was convolved with a Gabor (Morlet) wavelet, with the sinusoidal carrier frequencies in theta (4 Hz), alpha (12 Hz), beta (24 Hz), low gamma (40 Hz), and high gamma (80 Hz). DC was the mean of the squared raw voltage values within the time window. The wavelet at each frequency had a Gaussian envelope width (σ) equal to the cycle period (1/frequency) of the carrier and had tail truncation at 2σ of the Gaussian envelope (double the carrier cycle period). The spectrograms obtained after the power of each frequency was normalized to the power observed in the prestimulus period (−200 to 0 ms) are shown in Figure 2D.
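A minimal sketch of the Morlet wavelet power computation under the stated parameters (σ equal to one carrier cycle, truncation at 2σ). This Python version is illustrative only; the amplitude normalization of the wavelet is an assumption not specified in the text.

```python
import numpy as np

def morlet_power(signal, fs, freq):
    """Power envelope at `freq` via convolution with a complex Morlet wavelet
    whose Gaussian width sigma equals one carrier cycle (1/freq), truncated
    at 2*sigma as described in the text."""
    sigma = 1.0 / freq
    t = np.arange(-2 * sigma, 2 * sigma, 1.0 / fs)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma**2))
    wavelet /= np.sum(np.abs(wavelet))       # amplitude normalization (assumption)
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
sig = np.sin(2 * np.pi * 80 * t)             # pure 80-Hz (high-gamma) tone
p80 = morlet_power(sig, fs, 80.0)            # strong response
p12 = morlet_power(sig, fs, 12.0)            # weak response at the alpha carrier
```

The narrow Gaussian (one carrier cycle) trades frequency resolution for the temporal resolution needed for the 100-ms analysis bins.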
The mean of the total power from the time bins in the range of 50-450 ms was used in multidimensional scaling (MDS) analysis (Fig. 3A).
In the analysis performed to compare stimulus selectivity and decoding accuracy between frequency bands, the power of each frequency was binned within the 100-ms time window that was shifted by 50 ms. For the stimulus selectivity analysis (Supplementary Fig. S2) and for generating the category selectivity d′ map (Fig. 5), the response of the respective frequency band was the mean of the time bins in the 50-450-ms range, collected for each channel. Stimulus selectivity was compared between the trial-averaged data of the respective measurement methods. The d′ map was generated using the mean of the odd trials to compute the preferred category and using the even trials to compute the d′ of the preferred category. For the frequency-dependent decoding analysis (Fig. 4A), power from all electrodes and the 12 consecutive time windows for the respective frequencies was taken as the input features to a decoder. The features used for characterizing the time course of decoding accuracy were limited to a single time window (Figs. 4B and 6A, B): the mean amplitudes and powers of all electrodes from a single 100-ms time window that was slid by 25 ms were used, and the decoding accuracy was calculated as a function of time. The phase-locking value (PLV) of the theta frequency was computed from the theta wavelet phase response (Fig. 6C). First, the phase of each channel at a fixed post-stimulus time point was plotted as a unit-length vector in the complex plane. Then, the PLV was computed as the length of the vector sum (resultant vector) of these channel-wise theta phase vectors in the complex plane. Statistical significance of differences in PLV was evaluated by the Mann-Whitney U-test for species (human/monkey) categorization and by the Kruskal-Wallis test for facial view (right/center/left) categorization. Pairwise differences between the facial views were tested with the post-hoc Bonferroni-Dunn method. The fixed post-stimulus time point was set to 75 ms after the stimulus onset, where the difference between the LFP and ECoG time courses reached its maximum slope.
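The PLV computation reduces to the length of the resultant of the channel-wise unit phase vectors. The sketch below (Python, with invented toy phases) normalizes the resultant by the number of channels so that the PLV lies in [0, 1], a common convention that the text leaves implicit.

```python
import numpy as np

def phase_locking_value(phases):
    """Length of the mean of unit vectors at the given phases (radians),
    e.g., one theta phase per channel at a fixed post-stimulus time point."""
    return np.abs(np.mean(np.exp(1j * np.asarray(phases))))

# coherent channels -> PLV near 1; uniformly scattered phases -> PLV near 0
coherent = np.random.default_rng(1).normal(0.5, 0.1, 60)
scattered = np.linspace(0, 2 * np.pi, 60, endpoint=False)
plv_hi = phase_locking_value(coherent)
plv_lo = phase_locking_value(scattered)
print(plv_hi, plv_lo)
```
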

Data Analysis Part 2: Decoding
Decoding Analysis

Using a neural decoding approach, the efficacy of extracting visual object information from single-trial signals was compared between ECoG, LFP, and MUA. The decoding performance of each recording method was evaluated by pairwise decoding analysis. We selected a pair of object categories and selected the trials in which the images included in those 2 categories were presented. Using those trials, a binary classifier (decoder) was trained to predict the category of a presented image on a trial-by-trial basis and was then tested (Kamitani and Tong 2005). We applied this procedure to all pairs of the 3 coarse categories (face, body, and inanimate object); modified face and body part were not included in the decoding analysis because they do not fully qualify as the face or body category. All pairs of the 3 view categories, all pairs of the 5 identity categories, and the pair of the 2 species categories were decoded similarly. Each binary decoder consisted of a linear support vector machine (Vapnik 1998) implemented with LIBSVM (Chang and Lin 2011). Before decoder training, we applied a feature-normalization procedure and a feature-selection procedure. In the feature-normalization procedure, the values of each feature were z-transformed using the sample mean and SD calculated from the training data set. In the feature-selection procedure, the dimensionality of the feature vector was reduced by selecting informative features on the basis of a univariate analysis (F-statistics) applied to the training data set. We ranked the features according to the F-value, which indicated differential responses to the categories, and the top 100 features were used as input to the decoder. In cases in which the number of original features used for classification was equal to or less than 100, we omitted this feature-selection procedure and used all features. Decoding performance was evaluated by cross-validation analysis.
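The normalization and F-statistic feature-ranking steps can be sketched as below; this is a NumPy illustration with invented toy data, not the authors' MATLAB/LIBSVM code, and the SVM training itself is omitted.

```python
import numpy as np

def zscore_params(X_train):
    """Mean/SD estimated on training data only, applied to train and test."""
    return X_train.mean(axis=0), X_train.std(axis=0, ddof=1)

def select_features(X_train, y_train, n_keep=100):
    """Rank features by a one-way ANOVA F statistic between the category
    labels in y_train and return the indices of the top n_keep features."""
    X, y = np.asarray(X_train, float), np.asarray(y_train)
    groups = [X[y == c] for c in np.unique(y)]
    grand = X.mean(axis=0)
    # between-group and within-group sums of squares, per feature
    ss_b = sum(len(g) * (g.mean(axis=0) - grand) ** 2 for g in groups)
    ss_w = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0) for g in groups)
    f = (ss_b / (len(groups) - 1)) / (ss_w / (len(X) - len(groups)))
    return np.argsort(f)[::-1][:n_keep]

# toy data: only feature 0 separates the two categories
rng = np.random.default_rng(2)
X = rng.normal(0, 1, (40, 5))
y = np.repeat([0, 1], 20)
X[y == 1, 0] += 3.0
mu, sd = zscore_params(X)
top = select_features((X - mu) / sd, y, n_keep=2)
print(top)   # feature 0 is ranked first
```

Computing the normalization and ranking on training data only, as in the text, prevents information from the test trials leaking into the decoder.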
To evaluate generalization performance for category classification across different exemplars, we ensured that trials that corresponded to the same visual stimuli were not included in the training and test data sets (Vindiola and Wolmetz 2011). For each category pair, we randomly selected N exemplars per category. N was set to the number of the exemplars of the category that had fewer exemplars than the paired category. We divided the N × 2 exemplars into N groups, each of which contained 2 exemplars from the 2 different categories and divided the corresponding trials into N groups. (N − 1) groups were then used to train a decoder, and the remaining group was used to evaluate the trained decoder. This procedure was repeated until the trials from all N groups were tested (N-fold cross-validation), and the percentage of correct classification was calculated.
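The exemplar-wise grouping described above can be sketched as follows (plain Python; the function name and toy exemplar labels are invented for illustration). The key property is that trials of a given exemplar never appear in both the training and test sets of any fold.

```python
import random

def exemplar_folds(exemplars_a, exemplars_b, seed=0):
    """Pair N exemplars from each of two categories into N groups and yield
    (train_groups, test_group) splits for N-fold cross-validation."""
    rng = random.Random(seed)
    n = min(len(exemplars_a), len(exemplars_b))
    a = rng.sample(list(exemplars_a), n)
    b = rng.sample(list(exemplars_b), n)
    folds = [(a[i], b[i]) for i in range(n)]   # each fold: one exemplar per category
    for i in range(n):
        test = folds[i]
        train = [f for j, f in enumerate(folds) if j != i]
        yield train, test

faces = ["face1", "face2", "face3"]
bodies = ["body1", "body2", "body3", "body4"]
for train, test in exemplar_folds(faces, bodies):
    # no exemplar is shared between training and test groups
    assert not (set(test) & {e for pair in train for e in pair})
```

In the actual analysis, the trials corresponding to each exemplar group (not the exemplar labels themselves) would form the decoder's training and test sets.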
Decoding with Spatial Shuffling

For spatial shuffling, we shuffled the original wavelet power response vectors (ECoG and LFP) or the spike rate response vectors (MUA) in the spatial domain by exchanging the channel labels for each stimulus presentation trial. The range of spatial shuffling varied from 4 to 60 channels (Supplementary Fig. S5B). We quantified the drop in decoding performance on the basis of the difference in performance between the condition without shuffling and the condition with a maximum 60-channel shuffle (Supplementary Fig. S5). We also quantified the drop rate of decoding performance with respect to shuffling. The decoding performance was fit with a curve defined as y = A exp(−Bx) + C (Supplementary Fig. S5A; x, size of the subarea used for shuffling). We additionally applied trial shuffling to multichannel field potential data (Majima et al. 2014); the maximum drop in decoding performance and the drop rate (sharpness of the drop) were quantified in the same manner as in the spatial shuffling. For category decoding with shuffled training data and original test data, the training data were shuffled across trials for every N-fold cross-validation procedure. For category decoding with shuffled training and test data, the original data were first shuffled across trials and then processed for further decoding analyses.

Figure 4 caption (partial): Note that the number of features before feature selection was identical across methods and frequencies, with the exception of the "all" condition. An equal number of features were selected across frequency bands, including the "all" condition. Each line color represents the performance of each recording method, which is denoted in the caption. Error bars and red shadings around the MUA lines indicate the 95% confidence limit, assuming binomial distribution. (B) Time course of category decoding performance across recording methods (ECoG, LFP, and MUA) and frequency bands (high-gamma, theta, and DC bands). Each colored line represents performance in each category denoted in the caption. The details of the feature extraction and the decoding methods were equivalent to those described in Figure 3, with the exception that features from the corresponding time bins were used at each time point (Materials and Methods). Shadings show the stimulus presentation period.
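The channel-label exchange used for spatial shuffling can be sketched as below (Python/NumPy; array layout and function name are assumptions for the example). Permuting channel labels independently on each trial destroys across-channel spatial structure while leaving each trial's pooled response values intact.

```python
import numpy as np

def spatial_shuffle(responses, n_shuffle, rng):
    """Exchange channel labels among the first n_shuffle channels,
    independently on every trial.
    responses: array of shape (n_trials, n_channels, n_features)."""
    shuffled = responses.copy()
    for tr in range(responses.shape[0]):
        perm = rng.permutation(n_shuffle)
        shuffled[tr, :n_shuffle] = responses[tr, perm]
    return shuffled

rng = np.random.default_rng(3)
data = rng.normal(size=(5, 60, 12))          # 5 trials, 60 channels, 12 time bins
out = spatial_shuffle(data, n_shuffle=60, rng=rng)
# channel identities are scrambled, but each trial keeps the same values
print(np.allclose(np.sort(out[0].ravel()), np.sort(data[0].ravel())))  # True
```

Varying `n_shuffle` from 4 to 60 would correspond to the range of subarea sizes reported above.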

Results
To explore and compare the spatiotemporal organization of ordinate and subordinate categories in the ITC, we recorded neural activity from 2 monkeys (Macaca fuscata) performing a passive viewing task, in which the animals maintained fixation while 2 or 3 visual stimuli from a hierarchically categorized stimulus set were sequentially presented (Fig. 1A). Visual stimuli were classified into 3 "coarse (ordinate)" categories (face, body, and inanimate object; Fig. 1B), and the face category was divided into subordinate categories (Fig. 1C) based on "species" (human faces and macaque faces). The human face category was further divided into "view" (3 different views of human faces) and "identity" (5 individuals regardless of view angle) subcategories. Our novel electrode assembly enabled simultaneous high-density recording of MUA, LFP, and ECoG from a 12 mm × 12 mm local region in the anterior ITC (Fig. 2A,B). MUA (Fig. 2D top left) and LFPs (Fig. 2D middle) were recorded from the same penetrating microelectrode array (Fig. 2C closed arrowheads; see black spots in Fig. 2A inset for the spatial arrangement). ECoG (Fig. 2D bottom) was recorded from the surface electrode array (Fig. 2C open arrowheads; see yellow spots in Fig. 2A inset for the spatial arrangement) that covered the same local cortical region. The microelectrodes penetrated the slits in the ECoG probe, avoiding contact with the ECoG electrodes and lead wires (Fig. 2C).

[Displaced figure panel labels and legend fragment: Early θ, Late high-γ. Error bars indicate the standard errors.]

Spatiotemporal Homogeneity of Category-Encoding Neural Activity Depends on the Ordinate Level of the Category
We compared the amount of category information obtained from the multichannel patterns of visually evoked MUA, LFP, and ECoG signals, which record neural activity with different scales of spatial and temporal summation. Multidimensional scaling (MDS) and decoding-based analyses were performed by extracting the same number of features from the respective recorded data sets: total powers from ECoG and LFP, and mean firing rates from MUA (see Materials and Methods). MDS revealed that, with all 3 recording methods, the visual responses to coarse categories (faces, bodies, and inanimate objects) showed a clear tendency to form discrete clusters (Fig. 3A). To estimate the spatiotemporal scale and homogeneity of functional neuronal clusters representing multiple levels of visual category, we examined how reliably the stimulus category was decoded from single-trial ECoG, LFPs, or MUA using a linear support vector machine (Vapnik 1998). The generalization accuracy for the coarse category classification (Fig. 3B) was well above the chance level of 50% for all 3 recording modalities (see Materials and Methods). In particular, single-trial ECoG and LFPs carried sufficient information for predicting the coarse category, with correct classification rates of 88.9% and 92.0%, respectively. These were significantly higher (P < 0.05 and P < 0.001, chi-squared test corrected for multiple comparisons) than the performance obtained using MUA responses (87.4%), indicating that summation of neural activity at a certain spatiotemporal scale enhanced the coarse category selectivity. For subordinate category classifications, however, MUA was the best of the 3 recording methods (Fig. 3C, brown bars): MUA (69.2%) and LFP (64.3%) carried significant facial identity information, whereas ECoG (51.5%) did not (Fig. 3C right). The correct classification rates were 79.2% (MUA), 75.5% (LFP), and 73.0% (ECoG) for facial view angles (Fig. 3C left), and 82.2% (MUA), 79.5% (LFP), and 74.6% (ECoG) for facial species (Fig. 3C middle). The superiority of MUA suggests that subordinate categories are encoded in finer and/or more heterogeneous spatiotemporal patterns. For example, the activity of neighboring neurons may be tuned to different individuals (identity), even though both could be considered similar in the sense that both are tuned to the face category. Alternatively, population neuronal responses selective to facial identities may be temporally incoherent. In either case, columnar or larger scale spatiotemporal summation of neuronal activity may substantially reduce the subordinate category information, whereas the coarser category information is relatively preserved or enhanced. Decoding of the species and view categories had characteristics that (1) differed from coarse category decoding in that performance with MUA was superior to ECoG and (2) differed from identity decoding in that ECoG showed moderately but significantly above-chance decoding performance. Because these 2 categories conceivably have intermediately fine and/or homogeneously patterned cortical representations, we call them "intermediate categories" from here on.
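The single-trial decoding pipeline can be sketched with a leave-one-out loop. In this minimal, dependency-free illustration a nearest-centroid classifier stands in for the paper's linear SVM, and the data are synthetic; only the cross-validated structure of the analysis is the point.

```python
import numpy as np

def loo_nearest_centroid(X, y):
    """Leave-one-out decoding accuracy with a nearest-centroid classifier
    (a stand-in for the linear SVM used in the study).

    X : (n_trials, n_features) single-trial response vectors
    y : (n_trials,) category label of each trial
    """
    X, y = np.asarray(X, float), np.asarray(y)
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i            # hold out trial i
        cents = {c: X[mask & (y == c)].mean(axis=0) for c in np.unique(y[mask])}
        pred = min(cents, key=lambda c: np.linalg.norm(X[i] - cents[c]))
        correct += pred == y[i]
    return correct / len(y)

# synthetic example: two well-separated response clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(10, 1, (20, 8))])
y = np.repeat(np.array([0, 1]), 20)
acc = loo_nearest_centroid(X, y)
```

The same loop applies unchanged whether the feature vectors are ECoG powers, LFP powers, or MUA firing rates, which is what makes the across-modality comparison with an equal number of features possible.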

High-Frequency LFPs Specifically Contain Spike-Coupled Category Information
In the analyses described so far (Fig. 3), category decoders used the total power of ECoG and LFP, discarding frequency-specific features, to compare detectability by LFP, ECoG, and MUA with an equal number of features. However, it is plausible that powers in different frequency ranges carry qualitatively independent information with affinity to distinct types of source neural signal (e.g., either input- or output-related signals of the recorded cortical region). Here, we tested the possibility that low-frequency LFPs carry spike-independent and input-biased category information, whereas high-frequency LFPs carry category information tightly coupled to the output spike firing in the ITC, as has been reported for evoked visual responses in V1 (Belitski et al. 2008). We first examined correlations of stimulus selectivity, rather than category selectivity, across the recording modalities in different frequency ranges (Supplementary Fig. S2). We found that the stimulus selectivity of theta-band (4 Hz) ECoG power strongly correlated with that of theta-band LFP (R = 0.81, P = 1.8 × 10−35). High-gamma-band (80 Hz) ECoG and LFP exhibited a significant (R = 0.38, P = 2.0 × 10−6) but weaker correlation. In contrast, MUA correlated strongly with high-gamma-band LFP (R = 0.61, P = 2.5 × 10−16), but not significantly with theta-band LFP (R = 0.029, P = 0.72), theta-band ECoG (R = 0.010, P = 0.90), or high-gamma-band ECoG (R = 0.049, P = 0.56). This method-specific and frequency-specific correlation, observed across channels in both monkeys (Supplementary Fig. S2B), indicates that LFP carried MUA-coupled stimulus information in the high-frequency powers, but not in the low-frequency powers. To address whether the method and frequency dependency found in the stimulus selectivity also holds for category selectivity, we decoded multiple levels of category from the stimulus-evoked ECoG and LFP in each frequency range separately (Fig. 4A).
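The frequency-resolved analysis rests on band-limited power estimates. A minimal sketch using a plain FFT periodogram in place of the paper's wavelet decomposition; the sampling rate and band edges here are illustrative, not the recording parameters.

```python
import numpy as np

def bandpower(x, fs, f_lo, f_hi):
    """Mean periodogram power of a 1-D signal in the [f_lo, f_hi] Hz band."""
    x = np.asarray(x, float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / x.size
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(psd[band].mean())

# a 6 Hz sinusoid sampled at 1 kHz for 1 s: its power lands in the theta band
fs = 1000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 6 * t)
theta = bandpower(sig, fs, 4, 8)     # large
gamma = bandpower(sig, fs, 60, 100)  # near zero
```

Per-stimulus band powers computed this way for two modalities (e.g., theta ECoG and theta LFP on matched channels) can then be correlated with `np.corrcoef` to reproduce the kind of selectivity comparison described above.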
For ECoG-based coarse category decoding, the correct classification rate was highest when low-frequency components such as DC and theta power were used (Fig. 4A top left, black line). In higher frequency ranges, the performance was above chance level but less accurate, with beta power giving the minimum performance. Although the overall frequency profile of LFP-based coarse category decoding (Fig. 4A top left, gray line) was similar to that of ECoG (Fig. 4A top left, black line), the classification rate with high-gamma LFP was notably higher than that with high-gamma ECoG, and comparable to the performance with theta LFP. This finding implies that high-gamma LFP contains MUA-coupled category information that high-gamma ECoG does not. In facial identity decoding with LFP, the maximum classification rate was obtained with the high-gamma component (Fig. 4A bottom right, gray line), which is also consistent with the idea that high-gamma LFP carried fine category information coupled with MUA.

Subordinate Category Decoding Depends on Recording Method and Signal Frequency
The classification levels of the coarse category were similarly high regardless of whether low-frequency LFPs/ECoG or high-frequency LFPs/MUA were used (Fig. 4A top left). In contrast, the classification level of the intermediate categories (facial species and facial view) depended both on the spatial summation specific to the recording method and on the frequency of the signals used as features for machine learning (Fig. 4A top right and bottom left). Low-frequency components (e.g., theta power and DC) of LFP and ECoG both classified the intermediate categories significantly above chance. When a high-frequency component (e.g., high-gamma power) was used, however, the classification was significant with the less spatially summated LFP, but not with the more summated ECoG (Fig. 4A top right and bottom left). These results led us to hypothesize that (1) for coarse categories, the functional architecture based on high-frequency LFPs may be organized similarly to that based on low-frequency LFPs/ECoG, and (2) for the intermediate species and view categories, the low-frequency field signals form neural clusters with intermediate spatiotemporal homogeneity, whereas the high-frequency field signals are relatively distributed or heterogeneous, forming no electrocorticographically detectable homogeneous clusters in the macaque ITC.

Double Dissociation of View and Species Decoding Between Early Theta ECoG and Late High-Gamma LFP
There is an interesting contrast between the temporal profiles of facial species decoding and facial view decoding. In the early "evoked" period of the visual response (100-200 ms after stimulus onset), where the initial synaptic inputs and polysynaptic activity should dominate (Mitzdorf 1985), the correct classification rate with theta ECoG (Fig. 4B top center) was higher for view (green) than for species (blue). The classification rate with early high-gamma ECoG (Fig. 4B top left) was much lower but exhibited a similar tendency. In this early evoked period, however, there was no difference between view and species decoding with theta LFP (Fig. 4B middle center) or high-gamma LFP (Fig. 4B middle left). In contrast, in the late "induced" period of the visual response (300-500 ms after stimulus onset), species decoding with high-gamma LFP was slightly superior to view decoding (Fig. 4B middle left). Superiority of species decoding over view decoding was observed with neither high-gamma ECoG nor theta LFP/ECoG. These findings suggest that the category information extractable from the activity of neural clusters in the ITC depends not only on the method-specific spatial summation and the frequency of neuronal synchrony but also on the latency, namely the early "evoked" period versus the late "induced" period, underscoring the necessity to scrutinize the category-specific functional architecture of the early evoked theta LFP/ECoG and the late-induced high-gamma LFP separately.

Mapping Category-Selective "Homogeneous Clusters" in the Cortical Space
To test whether the category-encoding "spatiotemporally homogeneous neural clusters" implied by the decoding analyses correspond to actual clustering of neurons with similar category selectivity in the cortical space, we examined the spatial patterns of category selectivity maps (d′ maps) generated from the early low-frequency LFPs and the late high-frequency LFPs for both monkeys (Fig. 5). We found that the category-specific decoding performance with LFPs (Fig. 4) approximately corresponded to the strength of channel-wise selectivity (d′ value, depicted by the diameter of the colored circles in Fig. 5), which we speculate reflects a local, columnar-scale (several hundred micrometers) summation of similar category-selective neuronal activity. In contrast, the decoding performance with ECoG appeared to reflect a larger, across-channel (several millimeters) homogeneity of category selectivity in the early low-frequency LFP maps. Typically, the coarse category maps exhibited a group of face-selective channels in the anterior part of the chamber for monkey H (Fig. 5A top) and in the dorsal part for monkey C (Fig. 5A bottom). The early theta-defined view categorization map was dominated by a large "left-view"-selective homogeneous region, except for a small region in the dorsal portion of the chamber (Fig. 5E left). Similarly, the early theta-defined species categorization map exhibited a "monkey face"-selective dorsal region for monkey H (Fig. 5B top left), or a larger but weakly selective "human face" region for monkey C (Fig. 5B bottom left). The late gamma-defined categorization maps tended to have a more distributed form for both view and species categorization (Fig. 5B,E). Interestingly, the channels selective to particular facial species, facial views, and facial identities were found not only within but also outside the face-selective region (see light-colored regions in Fig. 5B,E,F).
These results suggest that not only the spatial clustering but also the spatially extended homogeneity of low-frequency neuronal activity is the physiological correlate of the "spatiotemporally homogeneous clusters" implied by the decoding-based analysis.
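The channel-wise selectivity underlying the d′ maps can be computed with the common two-distribution d′ index; a sketch assuming the standard definition (the study's exact formula is specified in its Materials and Methods).

```python
import numpy as np

def dprime(a, b):
    """Selectivity of one channel between two stimulus categories:
    difference of the mean responses scaled by the pooled standard deviation.

    a, b : 1-D arrays of single-trial responses to the two categories
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    return (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)

# toy example: equal unit variance, means differing by 1 -> d' = 1
d = dprime(np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 2.0]))
```

Applied channel by channel to, say, face versus non-face responses in a given time window and frequency band, this index yields the map values whose sign gives the preferred category and whose magnitude (|d′| > 1 in the channel-counting analysis below) gives the strength of selectivity.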

Spatial Factors Partially Explain Dissociation Between View and Species Decoding
Does the spatial clustering give a reasonable account of the double dissociation of view and species decoding between the early theta ECoG and the late high-gamma LFP? The left-view-selective cluster in the early theta view d′ map (Fig. 5E left) was larger but more heterogeneous than the human-selective cluster in the species d′ map (Fig. 5B left). A larger spatial span of the signal source is advantageous, but heterogeneity of the signal source is disadvantageous, for decoding with ECoG signals that undergo extensive spatiotemporal summation. To quantify the net effect of the larger but more heterogeneous clustering of the view-selective signals relative to the species-selective signals, we conducted a decoding analysis using spatially shuffled LFP data (Supplementary Fig. S5), in which the channel assignment within various-sized subareas of the chamber was randomly shuffled (Materials and Methods; Supplementary Fig. S4A). As the shuffled area size increased, the early theta LFP-based decoding performance decreased more gradually for view than for species, as exemplified by the smaller spatial decay constant (Fig. S4B inset). These results indicate that the positive effect of the larger cluster size overrode the negative effect of its heterogeneity, which may explain why the loss of decoding performance with the early theta ECoG, compared with the early theta LFP, was milder for the view than for the species category (Fig. 4B top center, Fig. 6A).
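The decay-constant quantification fits y = A exp(−Bx) + C to decoding performance as a function of shuffled-area size. One way to perform this fit without a nonlinear optimizer, sketched here under our own implementation choices (the authors' actual fitting routine is not specified): scan candidate decay rates B and solve A and C by linear least squares at each candidate.

```python
import numpy as np

def fit_exp_decay(x, y, b_grid=np.linspace(1e-3, 1.0, 1000)):
    """Least-squares fit of y = A*exp(-B*x) + C.

    For each candidate B the model is linear in A and C, so those two
    are solved exactly; the B with the smallest residual wins.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = None
    for b in b_grid:
        M = np.column_stack([np.exp(-b * x), np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        sse = float(np.sum((M @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], b, coef[1])
    return best[1], best[2], best[3]  # A, B, C

# recover known parameters from a noiseless curve
x = np.linspace(0.0, 60.0, 30)
y = 2.0 * np.exp(-0.1 * x) + 0.5
A, B, C = fit_exp_decay(x, y)
```

The fitted B then serves as the spatial decay constant compared between categories: a smaller B means a more gradual performance drop with increasing shuffle size, as reported for view relative to species.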
In the post-stimulus induced period (after 300 ms), decoding performance with high-gamma LFP was higher for species than for view (Fig. 4B middle left), whereas no species or view information was detectable in high-gamma ECoG (Fig. 4B top left). The spatial shuffle analysis confirmed that late high-gamma LFP-based decoding was more robust for species than for view, as indicated by the smaller spatial decay constant (Fig. S4B). These results are consistent with the late high-gamma d′ maps, which show a more mosaic-like distribution of view-selective channels than of species-selective channels (Fig. 5B,E).

Temporal Factors Contributing to Category-Selective Functional Neural Clusters
We next evaluated the possibility that factors other than spatial clustering, particularly temporal synchrony of the neuronal population, may also contribute significantly to the formation of spatiotemporally homogeneous functional clusters detectable by decoding. To test this possibility, we analyzed phase-locking of the evoked low-frequency LFP signals across channels, which may reflect synchrony of the inputs to the recorded region (Fig. 6C). The phase of the evoked theta LFP was investigated at 75 ms after stimulus onset, where the time derivative of the difference between LFP and ECoG decoding performance reached its maximum (Fig. 6B). The PLVs (see Materials and Methods) were significantly different across the view category members (right/center/left; P = 4.2 × 10−7, Kruskal-Wallis test), specifically between the right and center views (P = 0.0015, post hoc Bonferroni-Dunn test) and between the right and left views (P = 2.9 × 10−7), but not between the center and left views (P = 0.19). The phase variability was not significantly different across the species category members (human/monkey; P = 0.069, Wilcoxon test). These findings suggest that temporal synchrony was another significant factor contributing to the higher decoding accuracy for view than for species with the early theta ECoG.
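The PLV referenced above is conventionally computed as the length of the mean resultant vector of single-trial phases; a minimal sketch assuming that standard definition (the study's exact formula is given in its Materials and Methods).

```python
import numpy as np

def plv(phases):
    """Phase-locking value across trials: 1 means identical phase on every
    trial, 0 means phases uniformly spread around the circle."""
    phases = np.asarray(phases, float)
    return float(np.abs(np.mean(np.exp(1j * phases))))

# perfectly locked vs. uniformly spread phases
locked = plv(np.zeros(16))
spread = plv(np.linspace(0, 2 * np.pi, 16, endpoint=False))
```

A higher PLV of the theta phase at the 75 ms time point indicates more coherent input timing, the temporal factor invoked here to explain the view/species asymmetry in the spatially summated ECoG signal.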

Facial Subcategory-Specific Alteration of Categorical Architectures in the ITC
At the coarse category level, the face-selective domains in the early low-frequency d′ map and the late high-frequency d′ map overlapped (Fig. 5A), showing a significant correlation (R = 0.62, P = 2.7 × 10−8; Fig. 5C) and supporting hypothesis (1) that the functional IT architecture for the coarse category based on high-frequency LFPs is organized similarly to that based on low-frequency LFPs/ECoG. For the intermediate (facial species and view) categories, the d′ category selectivity maps defined by the early theta LFP and those defined by the late high-gamma LFP were distinct (Fig. 5B,E). Neither the species (R = 0.17, P = 0.18; Fig. 5D) nor the view (R = 0.24, P = 0.12; Fig. 5G) categories showed a significant correlation between the early and late d′ values.
In the d′ maps of the early theta LFP, there was recognizable spatial homogeneity (Fig. 5B left and Fig. 5E left). In contrast, the d′ maps of the late high-gamma LFP were more spatially heterogeneous (Fig. 5B right and Fig. 5E right). Specifically, the species maps exhibited clusters both smaller in size and weaker in selectivity (illustrated by small patches), indicating a local mixture of neuronal activity selective to distinct species (Fig. 5B right). To quantify this alteration of the category selectivity maps, we counted the number of category-selective channels in the early theta and late high-gamma d′ maps, considering channels category-selective if |d′| > 1. For monkey C, human-selective channels dominated the early theta d′ map (monkey/human = 0/29), but this dominance declined significantly in the late high-gamma d′ map (monkey/human = 5/3, P = 0.00013, Fisher's exact test). For monkey H, on the other hand, monkey-selective channels dominated the early theta d′ map (monkey/human = 6/0). This dominance also tended to decline, although the change did not reach statistical significance (monkey/human = 1/2, P = 0.083). The facial view map and the facial identity map exhibited mosaic-like distributions of channels selective to different views (Fig. 5E right) and to different identities (Fig. 5F), indicating extensive heterogeneity of category selectivity.
These results are consistent with hypothesis (2) that for the intermediate categories, the low-frequency field signals are intermediately clustered and/or spatiotemporally homogeneous, whereas the high-frequency field signals are relatively distributed and/or heterogeneous. The finding that the intermediate category maps with the late high-gamma LFP did not contain highly homogeneous clusters may explain why ECoG-based decoding, with its large-scale spatial summation, was disadvantageous with late high-gamma signals.

Discussion
In the present study, we developed a method for estimating the spatiotemporal clustering of neural activity by decoding simultaneously acquired MUA, LFP, and ECoG data. The results revealed that neuronal signals selective to the facial view and species categories formed intermediately homogeneous spatiotemporal clusters in the ITC, whereas signals selective to the facial identity category did not form clear spatiotemporal clusters. The category information extractable from LFP and ECoG data depended on the temporal frequency of the neural synchrony and changed over time between the early "evoked" period and the late "induced" period. Specifically, low-frequency evoked LFP and ECoG data contained correlated and spike-independent category information, whereas high-frequency induced LFP data carried information that was tightly coupled to spike firing. Importantly, in contrast to the coarse category maps, which had highly homogeneous clusters that were robust across the early low-frequency and late high-frequency signals, the facial view and species category maps dynamically changed from a moderately homogeneous organization in the early low-frequency signals to a more heterogeneous and distributed organization in the late high-frequency signals (see Figure 7 for schemas).
The face is the core category most frequently used for assessing the categorical organization of the pattern/object vision system (the "what" pathway) in the macaque IT cortex. Thus, although the main findings of the present study primarily concern the categorical architecture of the face category and its subcategories, we believe that our conclusions provide significant insights into the neural principles representing natural hierarchical object categories in the macaque IT cortex. These findings suggest that the category-level-dependent functional organization of spike-coupled high-gamma signals is shaped through local cortical circuits within the ITC.

Distributed Neural Organization for Perceptually Hierarchical Categories
The visual stimuli in the current study were hierarchically structured so that the faces of 5 individuals comprised the coarser "human face" category, and human faces and macaque faces together comprised the coarsest "face" category. Here, we consider 2 potential models of the topological relationship between the face-selective neuronal cluster and the facial subcategory-selective neurons in the ITC. First, a "hierarchical representation model," a natural extension of the taxonomy of perceptual categories, assumes that the ordinate-level face category-selective neural cluster is a linear sum of the facial subcategory-selective neurons. In other words, facial subcategory-selective neurons are a subpopulation of the parent face-encoding cluster. An alternative "distributed representation model" assumes a nonlinear relationship between the parent category and its subcategories, such that the facial subcategory-selective neurons are distributed outside as well as inside the face-selective neuronal cluster. Comparison of Figure 5A, B, E, and F reveals that the facial subcategory-encoding sites (human face-selective sites or left-view-selective sites) were not subpopulations of the face-selective region. For example, a group of left-view-selective sites in the d′ map with the early theta signal was found in the posteroventral region of the chamber (Fig. 5E left), located outside the face-selective cluster (Fig. 5A top left). Sites selective to monkey faces partially overlapped with the face-selective cluster, but their peak position showed a posteroventral shift (Fig. 5B top left). Likewise, some identity-coding sites (Fig. 5F) were located outside the parent human face-selective cluster, particularly in the late high-gamma maps.
Quantitative analyses shown in Figure 5C, D, G, and H and Supplementary Figure S3 reveal no significant correlations between the face category selectivity and the facial subcategory selectivity, except for facial view selectivity defined with early theta signals in one monkey. Taken together, our findings support the distributed representation model rather than the hierarchical representation model. As the recording chamber was placed above the posterior end of the anterior middle temporal sulcus, with the center of the chamber approximately 15 mm (monkey C) and 18 mm (monkey H) anterior in Horsley-Clarke stereotaxic coordinates, the face-responsive area in our study likely corresponded to the "AL face patch" (Tsao et al. 2008) and the "face domain" (Sato et al. 2013). Indeed, in the coarse category d′ maps obtained with MUA and LFP recording, the face-selective sites spanned several millimeters on the cortical surface (Fig. 5A), consistent with previous descriptions (Tsao et al. 2008; Sato et al. 2013). The present results suggest that, in addition to the mirror-symmetric representation of side-view faces reported by Tsao et al., a distributed representation outside the AL face patch may encode facial view information. Similarly, additional information from a region outside the AL face patch may encode the species of the target face (Fig. 7B), as suggested by a previous report (Sato et al. 2013). Tsunoda et al. previously suggested a nonadditive relationship between the neural representation of an object and the representations of its parts in the macaque ITC (Tsunoda et al. 2001). From these findings, it is reasonable to suggest that such distributed and nonlinear representation may be a general rule governing the representation of category hierarchy in the ITC as well.
The current data indicate that subordinate-level facial information is sparsely scattered within the ITC, spanning outside the ordinate-level face-selective domain rather than discretely clustering within it, as illustrated in the partially speculative schema in Figure 7B. In this schema, neurons preferring a hairless, skin-like texture over a haired, fur-like texture can help differentiate human from monkey. Not only neurons preferring the face of a particular species (depicted by face illustrations) but also a combination of species-nonspecific face responses and skin/fur texture responses can differentiate human from monkey, or vice versa. Note that fur/skin textures were not used as visual stimuli in the present study but are shown in the schema to indicate potential nonfacial cues for discriminating between monkey faces and human faces.

Effects of Temporal Coherence on Representation by Spatially Summated Signals
A characteristic category-specific reduction of decoding accuracy by spatial summation was found in the early evoked time window: view and species category information was decoded with equivalent accuracy from the early theta LFP in monkey H, but only the performance of species decoding was reduced with the early theta ECoG (Fig. 6A). These results are consistent with the finding that, in the early evoked period, the neural population representing the species subcategories exhibits a relatively smaller but more homogeneous organization than the population representing the view subcategories (Fig. 5B top left, Fig. 5E left). In addition to the spatial configuration of neural activity, a temporal effect may also have contributed to the robustness of view decoding in ECoG. An analysis of temporal phase information revealed that the theta signal for the right-view face arrived at the recorded region in a less correlated manner than that for the center- and left-view faces (Fig. 6C bottom). This may have produced a right-view-specific signal reduction and thus a robust distinction across views in the spatially summated ECoG signal. We speculate that the nonlinearity mentioned in the preceding section arose, at least in part, from the temporal structure of IT neural responses. This interpretation is consistent with the idea that the spatial reach of the recorded neural signal depends not only on the spatial configuration but also on the temporal coherence of the source signals, since phase matching of synaptic activity affects the spatial summation of the signal (Linden et al. 2011; Einevoll et al. 2013).

Contribution of Higher Order Correlation
In multichannel neural data, important information can be embedded in higher order correlations across channels (Maynard et al. 1999). To address this issue, we conducted 2 types of decoding analyses that manipulated the covariance structure of the data. In the first analysis, we trained the category classifiers with trial-shuffled data and classified the original data (Fig. 8A). This procedure maintains the trial average but destroys the trial-wise covariance structure of the training data; the resulting performance thus reveals the loss that would occur if the trial covariance were neglected in training the category classifiers. Classification performance significantly decreased compared with the original data, indicating substantial trial covariance in the ECoG/LFP data (Fig. 8A). Several factors may explain this covariance: 1) noise unrelated to neural activity, 2) stimulus-unrelated fluctuation of neural activity, and 3) stimulus-related fluctuation of neural activity. The latter 2 factors could arise from subthreshold membrane voltage fluctuations, because MUA performance was not affected by the shuffling procedure. In the second analysis, we used trial-shuffled data for both training and testing of the category classifiers (Fig. 8B). This second data set resembles data obtained in single-unit recording experiments, where serially acquired data are pooled for use in multivariate analysis. Such data may be plotted as mean response vectors but should not be plotted as trial-wise data unless zero covariance is assumed (Hung et al. 2005). The classification performance of coarse category and identity decoding in the shuffled LFP data differed significantly from the original data, and coarse category decoding in the shuffled ECoG data also differed significantly from the original data (Fig. 8B).
These results suggest that the classification performance of simultaneously acquired LFP data may be underestimated unless the significant information embedded in higher order correlations across channels is taken into account.
We observed several phenomena that cannot be explained by either higher order correlation or temporal coherence. For example, late high-gamma LFP-based decoding performance was higher for species than for view (Fig. 4B middle left), even though the channel-wise d′ appeared to be higher for the view than for the species category (Fig. 5B right, 5E right). In addition, there was no clear difference between view and species in higher order correlations. A possible explanation is that the information was more redundant across remote LFP recording sites for the view than for the species category, giving rise to relatively higher species decoding performance in the multivariate decoding analysis.

Implications for Brain-Machine Interfaces
ECoG is becoming an increasingly popular tool for brain-machine interfaces because it is associated with minimal tissue damage, long-term stability, large area coverage, and fewer ethical barriers for human applications (Schalk and Leuthardt 2011). However, its brain-decoding capability compared with that of neuronal spiking activity has not been studied in detail. The current study demonstrates that the reliability of category decoding by different recording methods depends on the type of target category. ECoG-based decoding was surprisingly reliable for coarse category information. LFPs can reliably predict multiple levels of category, including the identity of individual faces. This is valuable because the current method of identity decoding is not a simple discrimination of one particular stimulus image from another (Hung et al. 2005) but accounts for the generalization of personal identity regardless of viewing angle. The high classification performance of LFP-based decoding is presumably because LFP can detect both high-frequency local oscillations and across-area slow voltage synchronization. Although acquisition of LFP signals relies on invasive microelectrode penetration, they can be acquired stably over long periods. Overall, the current results suggest that LFP-based decoding could provide a powerful neurophysiological and prosthetic tool for reading out a wide range of targeted information from a small cortical window.
[Displaced fragment of the Figure 8 legend: the shuffled data set resembles a case where serially acquired data are later pooled (e.g., pooled single-unit data) for use in multivariate analysis. *P < 0.05; **P < 0.01; ***P < 0.001, chi-squared test with Bonferroni correction for multiple comparisons.]