Face perception is a complex process involving a network of brain structures, dynamically processing information to enable judgments about a face to be made (e.g., familiarity, identity, and expression). Here we introduce an analysis methodology that makes it possible to directly study this information processing in the brain from spatially and temporally resolved magnetoencephalographic signals. We apply our methodology to the study of 2 face categorization tasks, gender and expressiveness, and track the processing of 3 key visual features that underlie behavioral performance, over time and throughout the cortex. We find information processing correlates beginning from 90 ms following stimulus onset, where features are processed in isolation in occipital extrastriate regions. Over time, processing of successively more features and feature combinations takes place in occipitotemporal regions, with maximal processing of visual information coinciding with the well-established face-selective M170 component at 170 ms. Later still, around 250–400 ms, cortical activity responds significantly more to task-specific features and their complex combinations. These results indicate a complex process of visual information processing during face perception with face parts processed in isolation at very early stages, and task-specific processing of combinations of features taking place within 300 ms. Crucially, our approach specifically establishes which information in the visual stimulus the brain signal is responding to and how this varies with time, cortical location, and task demands to establish a more precise tracking of information processing mechanisms in the cortex during face perception.
In the search for a deeper understanding of the workings of the human brain, researchers are increasingly looking to brain imaging methods to provide the answers. Thanks to these methodologies the framework of distributed processing networks has emerged (Haxby et al. 2001), with neuronal oscillations potentially playing an important role for interactions among network nodes (Varela et al. 2001; Fries 2005). Of critical importance in understanding the functional role of these networks is to establish the information that they process and how this information is distributed and transferred throughout the different brain areas involved (Sheng et al. 2007). Through the careful control of cognitive parameters during experimentation and by making use of advanced multivoxel pattern analysis methods, considerable insight into what information is represented by the brain has been possible (for a review, see Norman et al. 2006). However, although subtle differences in experimental conditions (e.g., a happy vs. a fearful face) may tell us which brain areas respond to a particular condition, they cannot inform us about the information processing that underlies these critical categorizations. For example, what visual information is being extracted? Where in the brain is this taking place? Is the information processing content of a particular brain region changing over time? Is the same information extracted by a particular brain region irrespective of task, or is it extracted as a function of its diagnosticity for the task (Schyns et al. 1998)?
Here we sought to address these questions by establishing an analysis methodology that allows us to ascribe specific information processing content to temporally and spatially resolved magnetoencephalography (MEG) signals. In essence, with centimeter and millisecond precision, we decode the brain activity into the underlying stimulus information and track how this processing evolves over time across the cortex. To illustrate our approach, we apply it to facial expressiveness (happy vs. neutral) and gender categorizations. Whereas traditional analysis methods have established a network of brain regions that are activated during face processing such as this, including bilateral regions in the fusiform gyrus (FG), the superior temporal sulcus (STS), occipital, and temporal regions (Haxby et al. 2000), here we seek to establish the dynamics of information processing in these regions during the 2 categorization tasks.
To accurately depict the dynamics of information processing within the cortex, it is necessary to take into account the complex relationship between the signals recorded on the scalp (with an electroencephalography [EEG] electrode or MEG sensor) and the underlying neuronal sources. As the scalp-recorded signal constitutes the linear sum of the projections of individual brain sources, potentially originating in different regions throughout the cortex, it is not appropriate to simply assume that the activity measured on a given electrode/sensor can be directly attributed to the underlying cortical area (Nunez and Srinivasan 2006). For applications of our approach to single-sensor scalp EEG activity, see Schyns et al. (2003, 2007) and Smith et al. (2004, 2006, 2007). Determining the cortical locations and independent activity of each individual source is typically termed the EEG/MEG inverse problem and a number of approaches to its solution have been proposed (for a review, see Baillet et al. 2001). Beamforming methods in particular are becoming increasingly popular as a means of generating a so-called virtual electrode or spatial filter (Van Veen et al. 1997; Robinson and Vrba 1999; Gross et al. 2001). This virtual electrode can then be placed at any number of different locations within the cortex to accurately estimate the time course of neuronal activity generated by each location in turn. Here we will develop these methods to establish the information processing of the cortical regions themselves and track how this changes over time.
In order to establish the relationship between specific high-level visual information and brain activity, we randomly sampled the input image space with circularly symmetric Gaussian apertures (Gosselin and Schyns 2001) to generate so-called subsampled stimuli (see Fig. 1 for 2 such examples). By randomly varying the positions of the apertures over a sufficient number of trials, this sampling approximates a uniform random sampling of the space and constitutes a viable solution to the bias–variance dilemma (Smith et al. 2004). By relating the locations of the random samples of information presented on each trial with behavioral response, it is possible to reveal, independently per subject and task, the visual information (i.e., the facial features) correlated with correct classifications of faces by gender and expressiveness (see Gosselin and Schyns 2001; Schyns et al. 2003; and for more details, see Smith et al. 2004). Such planes of correlation coefficients are often called classification images (CIs) (Eckstein and Ahumada 2002).
These random information samples can also be correlated with any dependent variable (e.g., the brain signal measured by a virtual electrode) to establish the sensitivity of that variable to the input visual information space (see Fig. 2 for an outline of the methodology). Using beamformer methods, we estimate the single-trial activity at the nodes of a regularly spaced grid throughout the cortex and establish this information sensitivity mapping directly in the source space. This representation depicts any significant systematic relationship between the level of activity in the cortex and specific visual information. Thus, for the first time, we can dynamically track the processing of specific high-level visual information directly within the cortex.
Materials and Methods
Four healthy paid volunteers (1 male, mean age 26, all right handed) participated in the experiment. All subjects gave informed written consent in accordance with the institutional guidelines of the local ethics committee (Commissie Mensgebonden Onderzoek [CMO] Committee on Research Involving Human Subjects, region Arnhem-Nijmegen, The Netherlands) and the Faculty of Information and Mathematical Sciences ethics committee (University of Glasgow, UK).
Participants were seated upright in the MEG recording system in a magnetically shielded room. Visual stimuli, generated with Presentation 9.10 software, were presented using an LCD video projector (60-Hz refresh rate) and back projected onto the screen using 2 front-silvered mirrors. MEG data were continuously recorded using a whole-head system with 151 axial gradiometers (Omega 2000, CTF Systems Inc, Port Coquitlam BC, Canada). Head position with respect to the sensor array was measured using localization coils fixed at anatomical landmarks (the nasion and at the left and right ear canal). These measurements were made before and after the MEG recordings to assess head movements during the experiment. Participants were instructed to maintain their head position as best they could within recording blocks. In addition, horizontal and vertical electrooculograms were recorded using electrodes placed below and above the left eye and at the bilateral outer canthi. Electrode impedance was kept below 20 kΩ. MEG, electrooculography (EOG), and electrocardiography signals were low-pass filtered at 300 Hz, sampled at 1200 Hz, and then saved to disk. Subjects’ psychophysical performance was recorded by means of key presses using a button box (LUMITouch).
For each subject, a full-brain anatomical magnetic resonance image (MRI) was acquired using a high-resolution inversion-prepared 3D T1-weighted scan sequence (flip angle = 15°; voxel size: 1.0 mm in plane, 256 × 256, 164 slices, repetition time = 0.76 s; echo time = 5.3 ms). The anatomical MRIs were recorded using a 1.5-T whole-body scanner (Siemens, Erlangen, Germany), with anatomical reference markers at the same locations as the head position coils during the MEG recordings (see above). The reference markers allow alignment of the MEG and MRI coordinate systems, such that the MEG data can be related to the anatomical structures within the brain.
On each experimental trial, we created subsampled versions of a set of face stimuli (256 × 256 pixel gray-level images of 5 male and 5 female actors each displaying happy or neutral expressions) by randomly sampling visual information from the experimental stimulus using Gaussian apertures (see Fig. 1 and Gosselin and Schyns 2001). All photographs were taken under standardized conditions of illumination, and hairstyle was normalized across faces to eliminate this feature (see Fig. 1). Stimuli were projected on a light gray background to the center of a screen fixed at a distance of 0.7 m from the participant (visual angle 4.77 × 3.58° forehead to chin).
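As a quick check of the stimulus geometry, the reported visual angles can be converted to physical extents on the screen with size = 2 d tan(θ/2). The short sketch below assumes a flat screen viewed head-on; the function name is illustrative and not part of the original analysis.

```python
import math

def angle_to_size(angle_deg: float, distance_m: float) -> float:
    """Physical extent (in metres) subtending angle_deg at viewing distance distance_m."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)

DISTANCE_M = 0.7  # viewing distance reported in the text

extent_a = angle_to_size(4.77, DISTANCE_M)  # larger stimulus dimension
extent_b = angle_to_size(3.58, DISTANCE_M)  # smaller stimulus dimension
print(f"{extent_a * 100:.1f} cm x {extent_b * 100:.1f} cm")
```

Under these assumptions the face image occupies roughly 5.8 cm by 4.4 cm on the screen.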
A trial started with the 1000-ms presentation of a fixation cross, immediately followed by a randomly selected face picture whose information was revealed through nine 2D Gaussian apertures (sigma = 0.39° visual angle) randomly allocated across the face. On each trial, these nine randomly located apertures make up a so-called bubble mask. Pilot experiments indicated that 9 apertures typically result in 70–75% categorization accuracy across the 2 tasks. The subsampled face remained on screen for 1500 ms and observers were instructed to respond as quickly as possible without making mistakes by pressing the appropriately labeled button on the response box. Upon response, a blank gray screen replaced the face stimulus for 500 ms. Subjects were instructed to maintain fixation in the center of the face in the region of the fixation cross.
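The construction of a bubble mask can be sketched as follows. This is a minimal illustration rather than the stimulus code used in the experiment: the aperture width in pixels (`sigma_px`), the clipping of overlapping apertures to 1, the gray background level, and the random placeholder image are all assumptions made for the example.

```python
import numpy as np

def make_bubble_mask(shape=(256, 256), n_bubbles=9, sigma_px=20.0, rng=None):
    """Sum of Gaussian apertures at uniformly random centres, clipped to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=float)
    for _ in range(n_bubbles):
        r0 = rng.uniform(0, shape[0])
        c0 = rng.uniform(0, shape[1])
        mask += np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2.0 * sigma_px ** 2))
    return np.clip(mask, 0.0, 1.0)

def apply_mask(face, mask, background=0.5):
    """Reveal the face through the mask and blend to uniform gray elsewhere."""
    return mask * face + (1.0 - mask) * background

rng = np.random.default_rng(0)
face = rng.random((256, 256))      # placeholder standing in for a face image
mask = make_bubble_mask(rng=rng)   # one bubble mask, i.e., one trial
stimulus = apply_mask(face, mask)
```

Storing the mask for every trial is what later allows the sampled information to be related to behavior and to the MEG signal.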
Subjects were asked to categorize these sparsely sampled images by gender (male vs. female) in one 4000-trial session and expressiveness (happy vs. neutral) in a second. Order of task was counterbalanced across participants, and sessions were completed in blocks of 500 or 1000 trials up to a total of 2000 trials per day. Short breaks were allowed every 100 trials. In total, the experiment lasted for approximately 6 hours per subject. The large number of trials per participant is required in order to achieve a broadly uniform random sampling of the stimulus space at a level of appropriate signal to noise with the MEG recordings.
Subjects were on average 70% correct for the gender task (σ = 7%) and 73% correct for the expressiveness (expressive or not) task (σ = 4%). The specific visual information shown on each trial was sorted as a function of whether or not it resulted in a correct response in the categorization task. Observers will tend to be correct if the information necessary to perform the task has been provided and, conversely, they will tend to be incorrect if this information is missing. Summing together the information corresponding to correct categorizations and subtracting the sum of all the information leading to incorrect responses results in the behavioral CI. This is similar to performing a least-squares multiple regression. We transformed these CI pixel values to z scores using the noninformative normalized hairstyle and forehead region as the baseline distribution and thresholded for significance at P < 0.05 (2 tailed and corrected for multiple comparisons within the image space; for details, see Chauvin et al. 2005) to reveal the significant visual information used to perform the gender and expressiveness categorizations.
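The behavioral CI computation described above can be illustrated on synthetic data. In this sketch the simulated observer, the binary 16 × 16 masks, and the choice of baseline rows are hypothetical, and the multiple-comparison correction of Chauvin et al. (2005) is not implemented.

```python
import numpy as np

def behavioral_ci(masks, correct):
    """Sum of masks on correct trials minus sum of masks on incorrect trials."""
    masks = np.asarray(masks, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return masks[correct].sum(axis=0) - masks[~correct].sum(axis=0)

def zscore_against_baseline(ci, baseline):
    """Z-score CI pixels using the mean and SD of a noninformative baseline region."""
    ref = ci[baseline]
    return (ci - ref.mean()) / ref.std(ddof=1)

# Hypothetical observer who is more accurate when pixel (4, 4) is revealed.
rng = np.random.default_rng(1)
n_trials, shape = 4000, (16, 16)
masks = (rng.random((n_trials, *shape)) < 0.3).astype(float)
p_correct = np.where(masks[:, 4, 4] > 0, 0.9, 0.6)
correct = rng.random(n_trials) < p_correct

baseline = np.zeros(shape, dtype=bool)
baseline[12:, :] = True            # stand-in for the hairstyle/forehead region
z = zscore_against_baseline(behavioral_ci(masks, correct), baseline)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(z), shape))
```

The diagnostic pixel stands out in `z` because it is revealed far more often on correct than on incorrect trials, whereas every other pixel differs only by chance.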
MEG data were low-pass filtered offline at 40 Hz, and analysis epochs were generated offline and extended from 200 ms prestimulus presentation to 600 ms poststimulus. Epochs were scanned for eye blinks, eye movements, muscle activity, and jump artifacts in the SQUIDs using the FieldTrip analysis toolbox (Maris and Oostenveld 2007), implemented in MATLAB, and contaminated epochs removed from further consideration. Note that the EOG rejection procedure will reject rotations of the eyeball from 0.9° inward to 1.5° downward of visual angle, when the stimulus spanned 4.77 × 3.58° of visual angle from forehead to chin. On average, 5% of trials were rejected on the basis of jump artifacts (signal disturbances in the MEG machine) and 6% (standard deviation 1.5%) of trials rejected on the basis of EOG artifacts (eye blinks and movement).
Source Space Analysis
We employed a linearly constrained minimum variance (LCMV) beamformer approach (Van Veen et al. 1997; Vrba and Robinson 2000) to estimate the single-trial time course of activity at each node (n) of a regularly spaced grid within the cortex (1 cm³) using FieldTrip (Maris and Oostenveld 2007). For each node, a spatial filter was constructed that passes activity originating from this location with unity gain while attenuating activity originating at other locations (Huang and Mosher 1997; Robinson and Vrba 1999). We used a multispherical volume conductor model to compute the forward model of a dipole source at the node point location of interest. We did this by fitting a sphere to the head surface (derived from each individual structural MRI) underlying each sensor (Vrba and Robinson 2000). The beamformer filter weights were computed from the covariance matrix of the averaged event-related magnetic field in each recording session (1000 [or 500]-trial block) in 2 time windows: 75–225 and 225–400 ms following stimulus onset. Averaging the single trials in this way can improve performance of the beamformer as it reduces the noise variability but not the temporal variability in the signal (Van Veen et al. 1997). As the signal covariance in this low noise condition can be ill conditioned, a small regularization term was included in the computation (for more details on the specifics of the algorithms, see Van Veen et al. 1997). Individual trial data were then beamed through the appropriate spatial filter to give 3 time courses (x, y, and z orientation) at each node of the grid for each experimental trial. Each node point can be thought of as a virtual electrode created by the beamforming process.
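For readers unfamiliar with the LCMV filter, the standard unit-gain solution W = (LᵀC⁻¹L)⁻¹LᵀC⁻¹ (Van Veen et al. 1997) can be sketched in a few lines. This is not the FieldTrip implementation used here: the regularization scheme (a fraction of the mean sensor variance added to the diagonal) is an assumption, and the lead field and data are random placeholders.

```python
import numpy as np

def lcmv_weights(leadfield, cov, reg=0.01):
    """LCMV spatial filter for one grid node.

    leadfield : (n_sensors, 3) forward field of x, y, z dipoles at the node
    cov       : (n_sensors, n_sensors) sensor covariance matrix
    reg       : regularization as a fraction of the mean sensor variance (assumed)
    Returns a (3, n_sensors) matrix W satisfying W @ leadfield = identity.
    """
    n_sensors = cov.shape[0]
    cov_reg = cov + reg * np.trace(cov) / n_sensors * np.eye(n_sensors)
    cinv_l = np.linalg.solve(cov_reg, leadfield)             # C^-1 L
    return np.linalg.solve(leadfield.T @ cinv_l, cinv_l.T)   # (L^T C^-1 L)^-1 L^T C^-1

rng = np.random.default_rng(0)
n_sensors, n_samples = 151, 500
leadfield = rng.standard_normal((n_sensors, 3))    # placeholder forward model
erf = rng.standard_normal((n_sensors, n_samples))  # placeholder averaged field
cov = np.cov(erf)                                  # sensors are the variables
weights = lcmv_weights(leadfield, cov)

trial = rng.standard_normal((n_sensors, n_samples))
source = weights @ trial   # 3 time courses (x, y, z) for this virtual electrode
```

The unit-gain constraint can be checked directly: `weights @ leadfield` is the 3 × 3 identity up to numerical precision, whatever regularization is used.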
Our initial beamformer analysis considered all regions of the cortex in order to establish without any a priori assumption those regions contributing to our signal. However, it would be both impractical and ill advised to further analyze all 2800 voxels for their information processing content. As the beamformer analysis is most reliable at the voxels corresponding to maximum activation (Van Veen et al. 1997), we chose to analyze only those voxels corresponding to high-power activation across all sessions and both tasks in each subject in order to establish the voxels of interest. We chose to collapse the analysis across tasks as the scalp-recorded sensor topographies for each task were virtually identical (see Supplementary Materials for more details). In addition, by considering the same voxels in each task, we can make direct comparisons of the processing in a given region when subjects perform one task or the other. Specifically, from the filter weights, we computed one power map per time window and recording session. For each subject, we then averaged these maps across all recording sessions and the 2 tasks and thresholded them with respect to baseline activity (P < 0.1). Only those voxels passing the threshold are considered for further analysis, now performed independently per task. Please note that the selected voxels are not necessarily the same in the first (75–225 ms) and second (225–400 ms) time windows.
To estimate the visual features correlated with variations in the signal at each voxel, we derived an independent CI for every measurement time point (t_i) and every voxel location within the significantly activated regions (s_i) to determine the features discriminating between low and high activity at that location and time point. To this end, for a given voxel and time point, we built a distribution of the MEG amplitude values established for that voxel at the given time point across all correct trials of a particular task. The specific visual information sampled on each trial was then ranked as a function of the activity elicited across trials at the voxel and time point of interest. Summing all the information corresponding to the top 40% of the distribution and subtracting the sum of all the information corresponding to the bottom 40% of the distribution resulted in one MEG CI (s_i, t_i). For each participant, MEG CIs at each voxel and time point were linearly combined across all sessions of the same task.
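The ranking step for a single voxel and time point can be sketched as follows. The synthetic amplitudes, which depend on one hypothetical pixel plus noise, and the small binary masks are assumptions made purely for illustration.

```python
import numpy as np

def meg_ci(masks, amplitudes, fraction=0.4):
    """Sum of masks for the top `fraction` of amplitudes minus the bottom `fraction`."""
    masks = np.asarray(masks, dtype=float)
    order = np.argsort(amplitudes)          # ascending: lowest activity first
    k = int(round(fraction * len(order)))
    if k < 1:
        raise ValueError("too few trials for the requested fraction")
    return masks[order[-k:]].sum(axis=0) - masks[order[:k]].sum(axis=0)

# Hypothetical voxel whose amplitude increases when pixel (3, 5) is revealed.
rng = np.random.default_rng(2)
n_trials, shape = 2000, (12, 12)
masks = (rng.random((n_trials, *shape)) < 0.3).astype(float)
amplitudes = 2.0 * masks[:, 3, 5] + rng.standard_normal(n_trials)

ci = meg_ci(masks, amplitudes)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(ci), shape))
```

Repeating `meg_ci` over every voxel and time point yields the full set of MEG CIs; the middle 20% of trials contributes to neither sum.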
To establish those features that were significantly correlated with the MEG voxel activity, we transformed the CI pixel values to z scores using the noninformative normalized hairstyle and forehead region as the baseline distribution. We term any cluster of connected pixels with a value greater than a P < 0.05 significance threshold criterion (2 tailed and corrected for multiple comparisons within the image space; for details, see Chauvin et al. 2005) a “feature.” Figure 2 illustrates this procedure going from the single-voxel MEG signal in response to the subsampled stimuli to an MEG CI representing the statistically significant visual information associated with modulations in the signal amplitude (this figure also illustrates further analysis steps which we will return to shortly). Repeating this analysis across all voxels and time points results in a set of MEG CIs that detail any statistically significant visual information that is systematically correlated with the MEG signal. Critically, in this application, the use of the spatial filter maximizes the chance that the resulting representation of the dynamics of information processing corresponds to the activity in one cortical region only.
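The thresholding and clustering step can be sketched as below. Two simplifications are assumed: the threshold is an uncorrected two-tailed |z| > 1.96 rather than the corrected criterion of Chauvin et al. (2005), and clusters are defined by 4-connectivity. The z map is a hand-built toy example.

```python
import numpy as np

def threshold_ci(z, z_crit=1.96):
    """Two-tailed threshold on a z-scored CI (uncorrected; a simplification)."""
    return np.abs(z) > z_crit

def connected_clusters(binary):
    """Label 4-connected clusters of True pixels; returns (labels, n_clusters)."""
    labels = np.zeros(binary.shape, dtype=int)
    n_clusters = 0
    for seed in zip(*np.nonzero(binary)):
        if labels[seed]:
            continue                       # pixel already belongs to a cluster
        n_clusters += 1
        labels[seed] = n_clusters
        stack = [seed]
        while stack:                       # flood fill from the seed pixel
            r, c = stack.pop()
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= rr < binary.shape[0] and 0 <= cc < binary.shape[1]
                        and binary[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = n_clusters
                    stack.append((rr, cc))
    return labels, n_clusters

z = np.zeros((10, 10))
z[1:3, 1:3] = 3.0       # a positively correlated region
z[6:8, 5:9] = -2.5      # a negatively correlated region
labels, n_clusters = connected_clusters(threshold_ci(z))
```

Each labeled cluster corresponds to one "feature" in the sense used above.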
We chose the time-domain LCMV beamformer implementation in particular (as opposed to the frequency domain, e.g., Dynamic Imaging of Coherent Sources [DICS], Gross et al. 2001), as we wanted to retain the maximum time resolution within the data set. Further, we elected to use a linear beamformer implementation that allowed the orientation of the source to vary with time as opposed to nonlinear methods that select the source orientation to maximize beamformer output (e.g., synthetic aperture magnetometry, Robinson and Vrba 1999; Vrba and Robinson 2000). With this linear approach, we are then able to investigate information sensitivity as a function of source orientation over time.
Visual Feature Coding
In order to reduce the very high dimensionality of the source-space results (e.g., per task we have approximately 500 voxels × 206 time points = 103 000 CIs each containing 256 × 256 correlation coefficients), 3 key features were established from the visual information according to their importance for correct behavioral classification. For the gender and expressiveness tasks considered in these studies, the 3 features are the left eye, the right eye, and the mouth (see Fig. 5 and for more information on behavioral studies, see Gosselin and Schyns 2001). Reference templates (RT) were generated to cover the extent of these 3 features across all example stimuli in the set (see Fig. 2 for the templates). The statistically thresholded MEG CIs were then compared with each of the RT to project each MEG CI into this 3D orthogonal feature subspace, giving one FS value per feature.
At each time point and location, we represent the FS values for the 3 chosen reference features, that is, the values of the overlap between each CI and the 3 RT, as the 3 coefficients in a red green blue (RGB) color space (the left eye in red, the right eye in blue, and the mouth in green). In this way, the processing of multiple features is coded by the appropriate color combination (i.e., the 2 eyes in purple, the left eye and the mouth in yellow, the right eye and the mouth in turquoise, and all 3 features in white, see Fig. 3). For each voxel, the FS was computed independently for the 3 dipole orientations and the resultant vector length computed to obtain one measure for each voxel.
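The projection into the 3-feature subspace and the color coding can be illustrated as follows. The exact normalization of the overlap is not specified in the text; the sketch assumes FS is the fraction of each template covered by the thresholded CI, and the toy templates and CI are hypothetical.

```python
import numpy as np

FEATURES = ("left_eye", "right_eye", "mouth")

def feature_sensitivity(sig_ci, templates):
    """Fraction of each template covered by the thresholded CI (assumed normalization)."""
    return np.array([(sig_ci & templates[name]).sum() / templates[name].sum()
                     for name in FEATURES])

def combine_orientations(fs_xyz):
    """Resultant vector length over orientations; fs_xyz is (3 orientations, 3 features)."""
    return np.linalg.norm(fs_xyz, axis=0)

def to_rgb(fs):
    """Left eye -> red, mouth -> green, right eye -> blue, clipped to [0, 1]."""
    left, right, mouth = np.clip(fs, 0.0, 1.0)
    return np.array([left, mouth, right])

shape = (8, 8)
templates = {name: np.zeros(shape, dtype=bool) for name in FEATURES}
templates["left_eye"][1:3, 1:3] = True
templates["right_eye"][1:3, 5:7] = True
templates["mouth"][5:7, 2:6] = True

sig_ci = np.zeros(shape, dtype=bool)
sig_ci[1:3, 1:3] = True                      # covers the whole left eye
sig_ci[5:7, 2:4] = True                      # covers half of the mouth
fs = feature_sensitivity(sig_ci, templates)  # order follows FEATURES
rgb = to_rgb(fs)                             # red plus some green
```

A voxel sensitive to the left eye and part of the mouth therefore appears between red and yellow, in line with the color combinations listed above.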
Randomized Permutation Tests
To establish the probability with which FSs could arise as a result of a chance correlation between the bubble masks and the signal strength, we computed permutation tests independently for each task and each participant, taking only those bubble masks used in the MEG analysis (i.e., corresponding to correct, nonartifact trials). To this end, on each one of 30 900 iterations (to match the number of sensor space CIs, see Supplementary Materials for more details), we randomly rearranged the mapping between the trial number and the corresponding bubble mask in order to disrupt the mapping between visual information and MEG amplitude. With this new random mapping, we generated an MEG CI as before (sum of the bubble masks associated, now randomly, with the top 40% of signal amplitudes minus the sum of those bubble masks randomly associated with the bottom 40% of signal amplitudes). This CI was then thresholded for significance as before (P < 0.05, 2 tailed) and projected into the 3-feature subspace. From the distribution of FS measures across the iterations, we established an independent threshold for each feature corresponding to the maximally obtained FS occurring by chance in over 30 000 iterations and thresholded our values accordingly.
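The permutation procedure can be sketched for a single feature and a single voxel and time point. The sketch uses far fewer iterations than the 30 900 reported, an uncorrected z threshold, an assumed FS normalization (fraction of template pixels that are significant), and synthetic data; all names are illustrative.

```python
import numpy as np

def fs_statistic(masks, amplitudes, template, baseline, fraction=0.4, z_crit=1.96):
    """FS of one feature: fraction of template pixels significant in the MEG CI."""
    order = np.argsort(amplitudes)
    k = int(round(fraction * len(order)))
    ci = masks[order[-k:]].sum(axis=0) - masks[order[:k]].sum(axis=0)
    z = (ci - ci[baseline].mean()) / ci[baseline].std(ddof=1)
    return ((np.abs(z) > z_crit) & template).sum() / template.sum()

def permutation_threshold(masks, amplitudes, template, baseline, n_iter=200, rng=None):
    """Largest FS obtained after shuffling the trial-to-mask mapping n_iter times."""
    rng = np.random.default_rng() if rng is None else rng
    null = np.array([
        fs_statistic(masks[rng.permutation(len(masks))], amplitudes, template, baseline)
        for _ in range(n_iter)
    ])
    return null.max(), null

rng = np.random.default_rng(3)
n_trials, shape = 1000, (10, 10)
masks = (rng.random((n_trials, *shape)) < 0.3).astype(float)
template = np.zeros(shape, dtype=bool)
template[2:4, 2:4] = True                    # hypothetical feature template
baseline = np.zeros(shape, dtype=bool)
baseline[7:, :] = True                       # hypothetical noninformative region
# Amplitude driven by the amount of the feature revealed, plus noise.
amplitudes = masks[:, template].sum(axis=1) + rng.standard_normal(n_trials)

observed = fs_statistic(masks, amplitudes, template, baseline)
threshold, null = permutation_threshold(masks, amplitudes, template, baseline, rng=rng)
significant = observed > threshold
```

Taking the maximum of the null distribution, rather than a percentile, mirrors the conservative criterion described above: an observed FS is retained only if it exceeds every value obtained by chance.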
Feature Overlap Measure
To establish the extent to which different cortical regions are processing each of these 3 key features in isolation or processing different sets of features (in isolation or concurrently), we computed a feature overlap measure. At each time point, we established the number of voxels whose CI significantly represented a given feature combination and normalized by the maximum number of voxels responding to any feature in the entire time period.
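A minimal sketch of the feature overlap measure is given below. It groups voxels by the number of features they significantly represent (1, 2, or 3) and assumes that the normalizing constant is the largest number of voxels responding to at least one feature at any time point; the toy input is hypothetical.

```python
import numpy as np

def feature_overlap(sig):
    """Normalized counts of voxels responding to exactly 1, 2, or 3 features.

    sig : boolean (n_voxels, n_times, 3) array, True where a voxel's CI
          significantly represents a feature at that time point.
    Returns a (3, n_times) array with rows for 1, 2, and 3 features.
    """
    n_features = sig.sum(axis=2)                      # per voxel and time point
    counts = np.stack([(n_features == k).sum(axis=0) for k in (1, 2, 3)])
    peak = (n_features > 0).sum(axis=0).max()         # most voxels active at once
    return counts / peak if peak > 0 else counts.astype(float)

# Toy example: 4 voxels, 3 time points, features ordered (left eye, right eye, mouth).
sig = np.zeros((4, 3, 3), dtype=bool)
sig[0, 0, 0] = True                # t0: voxel 0, left eye only
sig[0, 1, [0, 1]] = True           # t1: voxel 0, both eyes
sig[1, 1, 2] = True                # t1: voxel 1, mouth
sig[2, 1, 2] = True                # t1: voxel 2, mouth
sig[0, 2, :] = True                # t2: voxel 0, all 3 features
sig[1, 2, 2] = True                # t2: voxel 1, mouth
overlap = feature_overlap(sig)
```

In the toy example at most 3 voxels respond at any one time, so each row of `overlap` is expressed in thirds.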
It is important to note that the current application of the bubbles technique does not distinguish between a region that is processing combinations of features concurrently and a region which is capable of processing different individual features at different times (i.e., the distinction between a region that only responds when both eyes are present vs. a region that can process each of the eyes in isolation or together, but for such applications, see Schyns et al. 2002; Smith et al. 2004). Fully investigating second- and third-order cortical processing would require significantly more trials than we have in the present study.
Behavioral Sensitivity Measure
To establish the extent to which cortical regions are processing only the behaviorally relevant information, we computed a behavioral sensitivity measure. In an analogous way to the FS measure, we computed the intersection of the thresholded MEG CIs with an RT. However, in this case, the RT was the thresholded behavioral CI. For example, for subject 1, the gender template comprised regions around both eyes and the mouth and the expressiveness template included a large area around the mouth, though biased to the left-hand side (see Fig. 5, behavioral processing). To estimate the evolution of this measure over time, we determined the number of voxels corresponding to significant behavioral sensitivity at each time point and depicted the cumulative sum of this value over time.
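The accumulation of behavioral sensitivity over time can be sketched as follows. The criterion for "significant" sensitivity is simplified here to any nonzero overlap with the behavioral template (`min_overlap`), which is an assumption; in the analysis proper, significance would be set by the permutation thresholds. The toy arrays are hypothetical.

```python
import numpy as np

def behavioral_sensitivity(meg_sig, behavior_template, min_overlap=0.0):
    """Voxels per time point overlapping the behavioral CI, and their running sum.

    meg_sig           : boolean (n_voxels, n_times, H, W) thresholded MEG CIs
    behavior_template : boolean (H, W) thresholded behavioral CI
    """
    overlap = (meg_sig & behavior_template).sum(axis=(2, 3)) / behavior_template.sum()
    n_voxels = (overlap > min_overlap).sum(axis=0)
    return n_voxels, np.cumsum(n_voxels)

behavior_template = np.zeros((4, 4), dtype=bool)
behavior_template[0:2, 0:2] = True

meg_sig = np.zeros((3, 4, 4, 4), dtype=bool)   # 3 voxels, 4 time points
meg_sig[0, 1, 0, 0] = True          # voxel 0, t1: partial overlap
meg_sig[0, 2, 0:2, 0:2] = True      # voxel 0, t2: full overlap
meg_sig[1, 2, 3, 3] = True          # voxel 1, t2: outside the template
meg_sig[2, 2, 1, 1] = True          # voxel 2, t2: partial overlap
meg_sig[1, 3, 0, 1] = True          # voxel 1, t3: partial overlap
n_voxels, cumulative = behavioral_sensitivity(meg_sig, behavior_template)
```

Plotting `cumulative` against time gives a monotonically increasing curve whose slope shows when behaviorally relevant information is being processed most widely.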
Results
Four participants each completed 4000 trials of the gender and 4000 trials of the expressiveness tasks in eight 2-h recording sessions. Participants were on average 70.4% correct in the gender categorization (σ = 7%) and 71.7% correct in the expressiveness (Exnex) categorization (σ = 6%). Behavioral analysis (see Materials and methods) revealed that all 4 participants used information from the mouth area to correctly categorize the faces by expression and information from the eyes to correctly categorize gender (see behavior images on Fig. 5 and Supplementary Figs 9–11). In addition, 3 out of 4 participants also used some information from the mouth region when judging gender. In order for the participant to perform correctly in each task, their brain must process at least the behaviorally relevant visual information at some point between stimulus onset and behavioral response.
To encompass the activity underlying 3 well-established face-selective responses, we computed virtual electrode single-trial activity in 2 time windows: 75–225 and 225–400 ms. The first window encompasses both the P100/M100 and the N170/M170 components occurring at 100 and 170 ms in EEG/MEG, respectively. The second time window includes the neural activity making up the P300/M300 component (existing from 250 ms onward). Supplementary Figure 1 illustrates the cortical locations of maximum power for each subject computed for each of the time windows. Power is averaged across tasks and recording sessions for each subject and is illustrated with respect to baseline activity (thresholded at P < 0.1). For all subjects, activity in the earlier time period was localized bilaterally in lateral occipital cortex, and there is a trend for a stronger right occipitotemporal distribution that varies across subjects. Additional activity was also localized to midline occipital cortex (early visual areas) in 3 subjects. For the later time window (see Supplementary Fig. 2), activity was more central, extending to occipitoparietal regions for 3 subjects and was right lateralized in the fourth.
To depict information processing within the cortex, we projected the CIs computed at each activated voxel and for each time point into a feature coding subspace (see Materials and methods, Visual feature coding). This subspace comprised 3 orthogonal bases corresponding to the 3 key visual features required to perform the 2 tasks as established from the behavioral results. We code processing of each feature as one color in an RGB color space with the left eye coded in red, the right eye coded in blue, and the mouth coded in green. Simultaneous processing of more than one feature by a particular cortical region is represented by the appropriate color code combination (e.g., processing of the 2 eyes would be coded in magenta).
Information processing of the activated cortical regions is shown for one example subject in Figures 3 and 4 for the gender and expressiveness tasks, respectively. Results are shown for 4 time periods in the early time window and 4 in the later time window centered on 120, 140, 160, 180, 240, 260, 280, and 300 ms. Supplementary Figures 3–8 illustrate equivalent results for the remaining 3 subjects. For the first subject (Fig. 3), clear differences in the information being processed can be observed across time, brain regions, and between the 2 tasks.
To facilitate the interpretation of these results across the 4 subjects, we created a schematic illustration of the different information processing content at 3 specific time points, corresponding to the M100, M170, and M300 components, as shown in Figure 5 for subject 1 and Supplementary Figures 9–11 for subjects 2, 3, and 4. At the first time point, 120 ms following stimulus onset, information processing is confined to occipital regions and primarily comprises processing of the 3 key features in isolation. However, even at this very early time point, some task-specific processing of features is taking place. By 180 ms following stimulus onset, the information processing is much more widespread, extending into occipitotemporal regions and involves more regions responding to combinations of features. At the final time point (280–300 ms), marked differences in the visual information being processed are clear both with respect to the earlier time points and across the 2 tasks. Specifically, in all subjects by 300 ms, information processing is more complex with regions of occipital cortex processing all 3 features in 3 out of the 4 subjects in the gender task (e.g., in Fig. 3, at 280 ms, on slice 8 the white regions correspond to significant processing of all 3 features in the same voxel).
To investigate the time course of information processing in more detail, 2 further analyses were performed (see Materials and methods, Feature overlap measure and Behavioral sensitivity measure for full details). First, we computed the proportion of significantly activated voxels responding to 1) any one feature in isolation (left eye, right eye, or mouth); 2) any combination of 2 features (2 eyes, left eye and mouth, and right eye and mouth); and 3) all 3 features. Plotting the evolution of this feature overlap measure over time in Figure 5 for subject 1 and Supplementary Figures 9–11 for the remaining 3 subjects, a number of interesting points become clear. Feature processing begins with isolated features at 83 ms following stimulus onset [σ(8) = 9.5 ms], followed by processing of 2 features at 115 ms [σ(8) = 18 ms], and in the gender condition only this is followed by processing of all 3 features at 133 ms [σ(4) = 19 ms]. Thus, in all 4 subjects, processing of 3 features begins significantly later than processing of 2 features [t(3,Gender) = 2.37, P (1 tail) = 0.049], which occurs significantly later than processing of individual features [t(3,Gender) = 2.59, P (1 tail) = 0.04; t(3,Exnex) = 2.75, P (1 tail) = 0.035]. As feature processing builds in complexity over time, it also expands in space as more voxels become significantly correlated with visual information. In the first time window, this expansion reaches its peak 170 ms following stimulus onset [σ(8) = 17 ms], after which time the number of voxels significantly correlated with any of the 3 features drops.
In the later time window, the differences in processing across the 2 tasks become even more apparent. Processing in the expressiveness task continues to be dominated by single features (typically the mouth) with virtually no processing of all 3 features by the same cortical regions. In the gender task, however, processing of 2 or more features by the same cortical regions grows, so that significantly more voxels are responding to all 3 features in the second time window compared with the first [t(3) = 7.37, P (2 tail) = 0.005].
Although this analysis provides a useful insight into the dynamics of information processing, the specifics of the visual information being processed are lost. For this reason, we performed a second analysis to establish the extent to which the information being processed relates to the behaviorally relevant information. Figure 5 and Supplementary Figures 9–11 depict the cumulative correlation of the behaviorally diagnostic information with the information significantly modulating cortical activity in occipital and temporal regions. In all 4 subjects, processing of behavioral information increases throughout the first and second time periods and, in 7 of the 8 comparisons, more behavioral information is processed in the later time period (after 225 ms) than in the earlier (before 225 ms).
Our main objective was to examine the information processing of the cortical activity underlying face-sensitive neuronal responses in 2 face categorization tasks. To this end, we performed an LCMV source reconstruction of our MEG data in 2 temporal periods encompassing the M100, M170, and M300 face-specific responses (Liu et al. 2000, 2002; Tanskanen et al. 2007). Using this source model as a spatial filter, we extracted the single-trial activity time courses for all significant voxels and mapped out the information processing underlying this activity. To perform this information mapping, we established a novel methodology that combined the information sampling approach of bubbles with the spatially and temporally defined neuronal activity. The method works by sparsely sampling the input stimulus space (visual in our application, though other stimulation modalities are equally applicable) and correlating the specific information available to the participant on each trial with modulations in the recorded brain signal.
It is important to note that each CI is computed as a weighted sum of exactly the same bubble masks (i.e., the different sets of Gaussian apertures used to sample the face on each trial). It is by changing this weighting, as a function of the relationship between the information presented on a given trial and the resulting signal modulation, that statistically significant features appear. Thus, for one voxel and time point, the CI might reveal a sensitivity to the left eye, whereas at the same time point the CI for another voxel might reveal a sensitivity to the right eye. If no strong relationship exists between the visual information and the measured signal, the resulting CI will be noise and no visual features will pass the significance test.
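The weighted-sum construction can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used here: the weights are taken to be the z-scored single-trial signal amplitude at one voxel and time point, the masks are random arrays rather than true Gaussian apertures, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def classification_image(masks, signal):
    """Weighted sum of bubble masks for one voxel and time point.

    masks:  array (n_trials, height, width), the sampling mask on each trial.
    signal: array (n_trials,), single-trial amplitude at that voxel and time.
    Assumption: weights are the z-scored amplitudes; the same masks are
    reused for every voxel and time point, and only the weights change.
    """
    masks = np.asarray(masks, dtype=float)
    signal = np.asarray(signal, dtype=float)
    weights = (signal - signal.mean()) / signal.std()
    # Sum over trials of weight_i * mask_i.
    return np.tensordot(weights, masks, axes=(0, 0))

# Synthetic check: the signal is driven by how much pixel (2, 3) is revealed.
rng = np.random.default_rng(0)
masks = rng.random((500, 8, 8))
signal = masks[:, 2, 3] + 0.1 * rng.standard_normal(500)
ci = classification_image(masks, signal)
peak = np.unravel_index(np.argmax(ci), ci.shape)
```

In this toy case the largest CI value falls on the pixel that drives the signal, whereas a signal unrelated to the masks would yield a CI with no consistent structure. A significance test on the resulting image is still required and is not shown.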
We have introduced an analysis methodology that makes it possible to directly study the brain in terms of the specific stimulus information being processed and demonstrated its applicability on a real data set. In essence, we can provide both spatially and temporally “a window on the brain” and take the first steps to considering the brain as an information processing mechanism. We successfully applied our new methodology in 4 participants by considering 2 face categorization tasks and 3 well-established face-specific brain responses. We found that we could track the processing of 3 key visual features (the left eye, the right eye, and the mouth) over time and throughout the cortex. We distinguish between an early time period in which feature processing is primarily independent and lateralized in occipital and occipitotemporal cortices and a later time period in which visual information is processed more as a function of task demands (mouth for expressiveness judgments and 2 eyes plus mouth for gender judgments). In this latter time period, processing of visual information becomes more complex with the same region processing combinations of features in the gender task (e.g., 2 eyes and 2 eyes plus mouth).
The present approach represents a major extension of our previous studies, which focused on scalp-recorded single-electrode analysis. Our current findings fit well with these earlier results of contralateral eye processing around the N170 on low occipitotemporal electrodes (e.g., Smith et al. 2004, 2007) and also of later task-specific feature processing in midline parietal regions (e.g., Smith et al. 2004, 2006). However, this new approach, which considers all activated regions of cortex as opposed to a few key sensors on the scalp, provides a more detailed picture of the visual information processing in the brain. By segmenting the different modes of information processing, for example, as a function of different features, specific combinations of features (e.g., Smith et al. 2004), and/or specific spatial frequency content (e.g., Smith et al. 2006), we can further enable the understanding of the cortical networks subtending the categorization of visual stimuli (for more details, see Schyns et al. 2008).
An important question remains: how do these findings fit with current theories of face processing? It has been proposed that the core of the human face-processing network consists of 3 bilateral regions in occipitotemporal visual extrastriate cortices. These regions comprise the inferior occipital gyri, lateral FG, and STS area and are each thought to be involved in different aspects of face processing (Kanwisher et al. 1997; Haxby et al. 2000; Ishai et al. 2005). Furthermore, face processing is routinely associated with 3 electrophysiological responses occurring in the 400 ms following presentation of a face (the M100/P1, M170/N170, and M300/P2 in MEG/EEG, respectively: Bentin et al. 1996; Liu et al. 2000, 2002; Henson et al. 2003; Tanskanen et al. 2007). However, as yet it remains unclear exactly how these components relate to each other, to the information being processed, and critically to the network of face regions established by functional MRI (fMRI) studies.
Our results indicate that information processing begins approximately 90 ms following stimulus onset and is associated with the processing of single features, with little or no processing of feature combinations. Processing at this time is primarily localized to midoccipital and bilateral occipital extrastriate regions, which comprise the lingual gyrus and early visual areas (consistent with other MEG studies including Halgren et al. 2000; Tanskanen et al. 2004). As time progresses, processing of successively more features takes place, with maximal information processing coinciding with the well-established M170/N170 component. By this time point, processing differs across the 2 tasks, with processing of all 3 features occurring only in the gender task, where this information is required to perform correctly. Although it has been shown that qualitatively different information from faces is being processed 100 and 170 ms following stimulus onset (Halgren et al. 2000; Liu et al. 2002; Tanskanen et al. 2004), for the first time, we are able to quantify this information, relate it specifically to task demands, and observe its evolution over time using a realistic full-face stimulus.
EEG and MEG studies suggest that 3 functionally distinct sources in the fusiform face area (FFA) (MEG studies, Deffke et al. 2007; Henson et al. 2007), lateral occipital–temporal regions (EEG studies, Shibata et al. 2002; Rossion et al. 2003), and occipital regions (Itier et al. 2006) respond simultaneously around 170 ms following the presentation of a face. These sources correspond to 2 regions of the face-processing network established with fMRI: the FFA (Kanwisher et al. 1997) and the occipital face area (OFA, Gauthier et al. 2000; Ishai et al. 2005). Indeed, in our study, we find activity in lateralized occipital and occipitotemporal regions corresponding to occipital face-sensitive areas (see also Rossion et al. 2003). Localizations of activity in the later time window (after 225 ms) are less well characterized in the literature and indeed showed more variability across our subjects. In their intracranial study, Allison et al. (1999) found activity around 290 and 350 ms in sites overlapping with or anterior to those active at 200 ms following stimulus onset. This pattern of activity was observed in the present study, with some activity extending into more centroparietal regions (see also Itier et al. 2006).
It has been proposed that face-selective responses in occipital areas are driven by feedback connections from the FFA, which potentially guides the extraction of fine-grained visual information necessary for face processing (Gauthier et al. 2000; Rossion et al. 2003). This raises the intriguing possibility that the activity observed in the later time period is a reactivation of OFAs with more specific instructions on the information to process (Itier et al. 2006). Certainly, our findings support the conclusion that activity after 240 ms involves significantly more processing of complex visual information than in the preceding time interval and is increasingly contributing to the processing of task-specific information.
This methodology opens the door to the prospect of studying the evolution of the information processing sensitivity of key brain regions with millisecond temporal resolution during complex perceptual and cognitive tasks. For example, one could imagine studying facial expression categorization and tracking the brain signal sensitivity to the diagnostic visual information for each category (e.g., 2 wide-open eyes of fear, the broad smile of happy, and the wrinkled nose of disgust, Smith et al. 2005; Schyns et al. 2007) as it flows across the cortex and within specialized brain regions. Future work will extend this approach to other stimulation domains (e.g., auditory signals) and to the inclusion of information sampling across visual spatial frequency channels (Gosselin and Schyns 2001; Smith et al. 2006), to inform as to the sensitivity of cortical regions proposed to be involved in the differential processing of high– and low–spatial frequency information (Rotshtein et al. 2007).
Although it is of much importance in cognitive science to be able to transform the activity of a given brain region into its information processing content, one of the key challenges facing the field is establishing the nature of neuronal communication, that is, how do different brain regions communicate with each other and transfer information throughout the brain? It has been proposed that this communication takes the form of phase synchrony of neuronal oscillations (Varela et al. 2001; Fries 2005). In our information processing account, sensitivity to the same visual information in 2 distinct brain regions suggests that the underlying neuronal assemblies in each region are responding in a roughly equivalent manner to the input visual stimuli. Thus, over time, constant phase relationships in the signals of the 2 regions are observed through sensitivity to equivalent visual information, which could be hypothesized to represent direct neuronal communication between these regions (Fries 2005). We can then apply signal association measures, for example, phase synchrony and coherence (Varela et al. 2001) and Granger causality (Granger 1969), directly to the modulations in information processing content to further characterize these relationships.
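As an illustration of one such association measure, the following sketch computes a phase-locking value between 2 time courses, for instance the feature-sensitivity time courses of 2 regions. It is a generic formulation and not an analysis performed in this study: the function names are hypothetical, the analytic signal is obtained with a standard FFT-based Hilbert transform, and the inputs are assumed to be narrow-band (no filtering is applied).

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal of a real 1-D time course."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC component
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist component
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def phase_locking_value(x, y):
    """Magnitude of the mean unit phasor of the phase difference (0 to 1)."""
    phase_diff = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return float(np.abs(np.mean(np.exp(1j * phase_diff))))

# Synthetic example: 2 sinusoids at 10 Hz with a constant phase lag,
# compared with an unrelated noise time course.
t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 10 * t)
y = np.sin(2 * np.pi * 10 * t + np.pi / 4)
noise = np.random.default_rng(1).standard_normal(1000)
plv_locked = phase_locking_value(x, y)
plv_noise = phase_locking_value(x, noise)
```

A constant phase lag between the 2 time courses gives a value close to 1, whereas an unrelated time course gives a value close to 0. In practice the time courses would need band-pass filtering and a statistical test against surrogate data, neither of which is shown here.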
Biotechnology and Biological Sciences Research Council Grant (BB/D01400X/1) awarded to MLS, PGS and K. Kessler; Royal Society Grant (2004/R6) to MLS; European Young Investigator Award to PF.
We would like to thank Robert Oostenveld, Joachim Gross, Jan-Mathijs Schoffelen, Klaus Kessler, Roberto Caldara, and Fraser Smith for useful discussions. Conflict of Interest: None declared.