The ERP component N170 is face-sensitive, yet its specificity for faces is controversial. We recorded ERPs while subjects viewed upright and inverted faces and seven object categories. Peak, topography and segmentation analyses were performed. N170 was earlier and larger to faces than to all objects. The classic increase in amplitude and latency was found for inverted faces on N170 but also on P1. Segmentation analyses revealed an extra map found only for faces, reflecting an extra cluster of activity compared to objects. While the N1 for objects seems to reflect the return to baseline from the P1, the N170 for faces reflects additional activity. The electrophysiological ‘specificity’ of faces could lie in the involvement of extra generators for face processing compared to objects, and the N170 for faces seems qualitatively different from the N1 for objects. Object and face processing also differed as early as 120 ms.
Faces are probably the most important stimuli for humans, as they convey essential information regarding identity, emotional expressions, intent, and are a basis of social interactions. Processing of faces has been the focus of intense research over the past 10 years and regions of the ventral occipito-temporal pathway of the brain, such as a lateral part of the fusiform gyrus, have been shown to respond more to faces than other stimuli by numerous fMRI and PET studies (e.g. Sergent et al., 1992; Puce et al., 1995; Haxby et al., 1996; McCarthy et al., 1997, 1999; Kanwisher et al., 1997; Halgren et al., 1999). The ventral pathway also processes various object categories, however, in spatially adjacent, but distinct regions (Haxby et al., 1994, 1996) and its functional organization, especially the specificity of certain of its regions to particular categories, is still subject to debate (Cohen and Tong, 2001; Downing et al., 2001; Haxby et al., 2001).
These neuroimaging studies focused on the localization of face and object processing, but they did not provide information on the timing of the processing stages, as fMRI and PET signals have latencies of a few seconds (Ungerleider, 1995), a temporal scale that is very broad compared to the neurophysiological mechanisms underlying cognitive processes. This crucial timing information is available with event-related potentials (ERPs). ERP studies have found a negative component, the N170, that responds maximally to face stimuli over temporo-parietal regions of the human scalp, between 140 and 200 ms (Bötzel et al., 1995; Bentin et al., 1996; George et al., 1996; Eimer, 2000). It is believed to reflect the structural encoding stage of faces and can be seen in children as young as 4 years of age (Taylor et al., 1999, 2001). The N170 is very prominent to faces and even larger to eyes (Bentin et al., 1996; Séverac-Cauquil et al., 2000; Taylor et al., 2001), but much smaller or absent for other visual object categories.
Intracranial electrophysiological recordings from the surface of the cortex have shown a face-specific negative component maximal around 200 ms, the N200 (Allison et al., 1994, 1999; McCarthy et al., 1999; Puce et al., 1999). Face-specific N200 sites were found in distinct cortical regions separated anatomically by areas relatively uninvolved in face processing. Face-specific N200 sites were found ventrally, predominantly in the lateral parts of the fusiform gyrus, and on the lateral surface of the temporal lobe, centered in the middle temporal gyri (Allison et al., 1999). These two regions could reflect different functional aspects of face processing. According to McCarthy et al. (1999), the subsystem located in the middle fusiform gyrus would process face configuration and identity; the lateral one would process the facial features (physiognomic information), particularly the eyes. The authors suggested that this lateral activity could be recorded from the scalp surface at the electrodes where N170 is maximal and could explain why the N170 is larger for eyes than for full faces. However, the responses of N170 and N200 do not always correspond. For instance, inverted faces did not lead to increased N200 amplitude in the fusiform gyrus compared to upright faces (McCarthy et al., 1999), as they did for N170, although the N200 was delayed for inverted faces similarly to the N170 (McCarthy et al., 1999). The relationships of N170 to the well-studied activations in the fusiform gyrus (Puce et al., 1995; Kanwisher et al., 1997; McCarthy et al., 1997) or to the N200 are not yet established, but the use of this component as an index of early face processing is now widely accepted.
Although it is accepted that the N170 is very sensitive to faces, its ‘specificity’ to faces is still unresolved. An argument in favor of specific processing for faces is the inversion effect on the N170, i.e. the increased latency and/or amplitude of this component for upside-down faces (Bentin et al., 1996; Linkenkaer-Hansen et al., 1998; Rossion et al., 1999, 2000; Taylor et al., 2001; Itier and Taylor, 2002), which is not found for inverted objects (Bentin et al., 1996; Rossion et al., 2000; Rebai et al., 2001). ERP studies contrasting faces with objects have used small numbers of object categories, from one to five (Bentin et al., 1996; Eimer and McCarthy, 1999; Taylor et al., 1999, 2001; Rossion et al., 2000), and the ERPs to the different non-face objects were rarely compared. Such comparisons would help determine whether there is a gradient of sensitivity of the N170 to objects, with faces perhaps being the strongest eliciting stimuli. Analyzing other components elicited by face and object stimuli, such as the VPP (vertex positive peak), which was reported to be face-sensitive (Jeffreys, 1989; Rossion et al., 1999), and the P1, may contribute further to understanding the neural correlates of face perception.
Although not always investigated, P1 seems to reflect more than simply low-level features such as luminance and contrast; it indexes an early stage of visual processing and appears sensitive to stimuli important to humans, such as faces (Taylor, 2002). Recent studies have found interesting P1 effects with face stimuli using ERPs (Linkenkaer-Hansen et al., 1998; Halit et al., 2000; Taylor et al., 2001) and MEG (Linkenkaer-Hansen et al., 1998; Halgren et al., 2000; Liu et al., 2002). Recently, we found differences in P1 latency and amplitude between inverted, contrast-reversed and upright faces, and suggested that P1 could reflect the (holistic) processing of a face as a face, whereas the N170 would reflect the relational processing of facial features (i.e. face configuration), both of which would be disrupted by inversion (Itier and Taylor, 2002). ‘Faceness’ could thus be reflected by the P1 and not by the N170. This hypothesis is reinforced by the finding that P1 was always earlier to faces than to any other category, including inverted faces, across six age groups of children (Taylor et al., 2001).
The aim of this paper was to investigate the extent to which N170 and P1 were face-sensitive. To do so, we used seven object categories in addition to upright and inverted faces, and several different analysis tools: peak analyses of these early components, topographical analyses and segmentation analyses of the ERPs into microstates (Pascual-Marqui et al., 1995), in order to determine whether faces were processed differently than objects.
Materials and Methods
Sixteen healthy adult volunteers (seven females, nine males), aged between 21 and 33 years (mean = 24.9 years), participated in the study. Two males and one female were left-handed. Subjects had normal or corrected-to-normal vision. They reported taking no medication and had no history of neurological, ophthalmological or systemic disease. All subjects gave informed written consent. The experimental procedure was approved by the French Comité Opérationnel pour l’Ethique dans les Sciences de la Vie du CNRS.
Stimuli and Task Procedure
Stimuli were 450 gray-scale pictures divided into nine object categories. Textures, mushrooms, flowers, houses, lions, tools and road signs were obtained from Corel Draw CD-ROMs. Upright and inverted faces were scanned from university yearbooks (half men, half women). A checkerboard was the target stimulus. Examples of all categories are shown in Figure 1.
Subjects were seated in a comfortable chair in a dimly lit room. Testing consisted of five blocks of trials. Within each block, 99 pictures were presented: 10 pictures from each of the nine categories and nine checkerboards. Pictures were presented on a black background, centered on a computer screen 50 cm in front of the subjects, with a visual angle of approximately 9 × 11°. The stimuli were presented for 500 ms, with a randomized inter-trial interval (1200–1600 ms). A central cross on the screen between stimuli helped subjects maintain fixation. Stimulus and block orders were randomized across subjects. The task was to attend to all pictures and to press a key on the keyboard in response to the checkerboards. Short pauses were given between blocks.
ERPs were recorded via an EasyCap containing 35 electrodes, including three ocular sites at the outer canthi and supra-orbital ridge to monitor vertical and horizontal eye movements. The electrodes were placed according to the 10/10 system and comprised Fp1/2, F3/4, Fz, F7/8, FT9/10, FC5/6, T7/8, C3/4, Cz, CP5/6, Pz, POz, P3/4, P7/8, TP9/10, PO9/10, O1/2, Iz, plus a ground electrode. Cz was the reference lead during acquisition; an average reference was calculated off-line. Impedances were kept below 5 kΩ. Continuous electroencephalogram (EEG) was recorded with NeuroScan 4.1 and amplified using a SynAmps system with a gain of 500. Data were recorded at a 500 Hz sampling rate through a band-pass of 0.1–100 Hz. After baseline correction, trials with artifacts ≥ 100 µV on the ocular electrodes were rejected. Averages were then digitally filtered (0.1–30 Hz). Only non-target trials were analyzed.
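The baseline-correction and ocular-artifact rejection steps can be sketched as follows (a minimal numpy sketch under stated assumptions; the epoch layout and channel indices are hypothetical, not taken from the NeuroScan pipeline):

```python
import numpy as np

def preprocess_epochs(epochs, times, eog_channels, reject_uv=100.0):
    """Baseline-correct each epoch (pre-stimulus mean set to zero), then
    drop epochs whose ocular channels exceed +/- reject_uv, mirroring the
    rejection criterion described above.
    epochs: trials x channels x samples (in uV); times: ms, one per sample.
    """
    pre = times < 0                                    # pre-stimulus samples
    baselined = epochs - epochs[:, :, pre].mean(axis=2, keepdims=True)
    eog = baselined[:, eog_channels, :]
    keep = np.abs(eog).max(axis=(1, 2)) < reject_uv    # per-trial criterion
    return baselined[keep]

# Toy data: 3 trials, 4 channels, 500 Hz sampling (2 ms steps)
times = np.arange(-100, 300, 2.0)
epochs = np.zeros((3, 4, times.size))
epochs[1, 0, 150] = 150.0       # simulated ocular artifact, trial 1, channel 0
clean = preprocess_epochs(epochs, times, eog_channels=[0])
```

With the artifact exceeding the 100 µV criterion, only the two clean trials survive.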
Peak analyses included three components: P1 (maximal around 116 ms), N170 (maximal around 170 ms) and VPP (maximal around 170 ms). P1 and N170 were measured at Tp9, Tp10, P7, P8, Po9, Po10, O1 and O2 sites. VPP was measured at Fz. Peaks were measured within a ±30 ms window centered on the maximum of the grand-average means. Latencies were measured at the components’ maximum over each hemisphere and the amplitudes at each of the four electrodes were taken at this latency (Picton et al., 2000). Peak latencies and amplitudes were analyzed with repeated measures analyses of variance (ANOVA) using Greenhouse–Geisser adjusted degrees of freedom. Intra-subject factors were categories (9) and hemisphere (2). Electrode (4) was a factor for amplitudes except for VPP. Post-hoc t-statistics used Bonferroni corrections for multiple comparisons.
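As a concrete illustration of the windowed peak measurement, the following sketch (hypothetical helper, not the actual analysis code) finds a component's latency and amplitude within a ±30 ms window centered on the grand-average maximum:

```python
import numpy as np

def measure_peak(erp, times, center_ms, polarity=1, window_ms=30):
    """Find a component peak within +/- window_ms of center_ms.

    erp: 1-D array of amplitudes (uV); times: matching array of ms.
    polarity: +1 for positive components (P1, VPP), -1 for negative (N170).
    Returns (peak_latency_ms, peak_amplitude_uV).
    """
    mask = (times >= center_ms - window_ms) & (times <= center_ms + window_ms)
    idx = np.where(mask)[0]
    best = idx[np.argmax(erp[idx] * polarity)]   # signed search in the window
    return times[best], erp[best]

# Toy waveform: a positivity near 116 ms (P1-like), 500 Hz sampling
times = np.arange(0, 300, 2.0)
erp = 5.0 * np.exp(-((times - 116) / 15.0) ** 2)
lat, amp = measure_peak(erp, times, center_ms=116, polarity=1)
```

For an N170 the same call would use `polarity=-1` so that the most negative point in the window is taken as the peak.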
Mean amplitudes over two time periods were obtained for each subject and each condition and were rescaled to correct for amplitude variations (McCarthy and Wood, 1985). The corrected, normalized mean amplitudes were subjected to MANOVAs in order to determine whether there were significant differences in the topographies across categories, using object-face and time period as fixed factors and the 32 electrodes as dependent variables. The object-face factor was obtained by regrouping upright and inverted faces against all object categories grouped together. The two time periods were 120–170 ms and 170–220 ms, chosen to encompass the beginning and end of the N1–N170 component in all subjects across all categories.
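The rescaling step can be illustrated with a vector-scaling sketch (one common reading of the McCarthy and Wood, 1985 correction; the function name is hypothetical): each condition's topography is divided by its vector length so that topographic shape, not overall strength, drives the comparison.

```python
import numpy as np

def vector_scale(topography):
    """Divide a topography (vector of mean amplitudes across electrodes)
    by its vector length (root sum of squares), removing overall amplitude
    differences before topographies are compared across conditions."""
    v = np.asarray(topography, dtype=float)
    return v / np.linalg.norm(v)

# Two maps with identical shape but different overall strength
weak = np.array([1.0, -2.0, 3.0, -1.0])
strong = 2.5 * weak
```

After scaling, `weak` and `strong` become identical unit-length vectors, so any residual difference detected by the MANOVA reflects topography rather than amplitude.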
To investigate differences among categories, we performed a segmentation analysis of the scalp activity into microstates (Pascual-Marqui et al., 1995) using Cartool software (Denis Brunet, Functional Brain Mapping Laboratory, Geneva, Switzerland) in order to determine whether there were differences between face and object categories. A microstate is characterized by the normalized vector formed by the reference-free scalp electric potentials. Functional microstates (Lehmann, 1987; Michel et al., 1992) refer to time segments of stable map configuration, thought to reflect the different steps of information processing. The transition from one map to another reflects a change in signal stability. According to the microstate model, brain activity can be seen as a sequence of non-overlapping microstates of variable duration and intensity dynamics (Pascual-Marqui et al., 1995). It is assumed that a functional microstate is characterized by a unique distribution of active neuronal generators. The segmentation is a spatio-temporal cluster analysis that determines the predominant electrical field configurations over time (Pascual-Marqui et al., 1995) on the basis of cross-validation criteria (for a review of its applications to cognitive tasks, see Michel et al., 2001). Segmentation analysis defines the optimal number of segmentation maps (microstates) explaining the whole data set and the times at which they occur. Comparing the number of maps, their duration and their times of occurrence across conditions allows one to identify functional microstates present in some conditions and not in others. Segmentation maps are represented on the global field power (GFP, equivalent to the instantaneous standard deviation of the scalp potential measurements) during that time period. The maxima of the GFP reflect maximal electrical activity; small GFP reflects noise. Statistical correlation between maps was also calculated to assess whether maps were similar (i.e. positively correlated) or dissimilar (not correlated or negatively correlated).
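Global field power, on which the segmentation maps are displayed, reduces at each time point to the standard deviation of the average-referenced potentials across electrodes; a minimal sketch:

```python
import numpy as np

def global_field_power(potentials):
    """GFP at a single time point: the standard deviation across
    electrodes of the average-referenced scalp potentials. A flat,
    featureless map yields zero GFP; a structured map does not."""
    v = np.asarray(potentials, dtype=float)
    v = v - v.mean()                    # re-reference to the average
    return float(np.sqrt(np.mean(v ** 2)))

gfp_flat = global_field_power([3.0] * 32)              # no topographic structure
gfp_map = global_field_power([1.0, -1.0, 2.0, -2.0])   # structured topography
```

Computed over every sample, this yields the GFP curve on which the segment boundaries and maxima described above are read.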
The second step was to verify statistically the appearance of the segmentation maps in the individual ERPs of our subjects. To do so, each segmentation map obtained for the grand averages was compared with the moment-by-moment scalp topography of the individual ERPs for each condition. This fitting procedure was performed using strength-independent spatial correlation (Pegna et al., 1997; Ducommun et al., 2002). Thus, at each time point in the individual ERPs, the scalp topography was compared to the segmentation maps of grand averages and labeled according to the one with which it best correlated. This enabled us to determine how much of the global explained variance (GEV) of one condition a given segmentation map explained. Repeated measures ANOVAs were performed on the GEV using categories and number of maps as within-subject factors. We hypothesized that if one segmentation map differentiated activity for faces from that of objects, the ANOVAs would show that this map explained one condition (faces) significantly better than other conditions (objects). This kind of analysis and reasoning has been successfully applied previously (e.g. Pegna et al., 1997; Ducommun et al., 2002).
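The fitting procedure, labeling each time point of an individual ERP with the best-correlating grand-average map, can be sketched as follows (hypothetical function names; a sketch of the strength-independent spatial correlation idea, not the Cartool implementation):

```python
import numpy as np

def spatial_correlation(map_a, map_b):
    """Strength-independent spatial correlation between two scalp maps:
    Pearson correlation across electrodes, insensitive to overall
    amplitude (GFP) differences between the maps."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def label_timepoints(erp, templates):
    """Assign each time point (rows of erp: time x electrodes) to the
    grand-average segmentation map with which it best correlates."""
    labels = []
    for topo in erp:
        corrs = [spatial_correlation(topo, t) for t in templates]
        labels.append(int(np.argmax(corrs)))
    return labels

# Two template maps and an individual ERP (2 time points, 4 electrodes)
templates = [np.array([1.0, -1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, -1.0])]
erp = np.array([[2.0, -2.0, 0.1, 0.0],
                [0.0, 0.1, 3.0, -3.0]])
labels = label_timepoints(erp, templates)
```

The fraction of time points (weighted by explained variance) assigned to each template per condition is what feeds the GEV analyses described above.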
All subjects achieved a 100% hit rate for targets, with no misses. Only three false-alarm responses (key presses for non-targets) were made, by two subjects. The mean reaction time (RT) to the targets was 451 ms.
ERPs are shown at the electrodes analyzed for all categories in Figure 2.
No overall effect of category or hemisphere was found on P1 latency, despite a tendency for upright faces to elicit shorter P1s (P = 0.07) than the other categories, as shown in Figure 3a. As previous studies found shorter P1 latencies for upright than for inverted faces (Taylor et al., 2001; Itier and Taylor, 2002), we performed an ANOVA with only these two categories. Confirming earlier results, P1s were shorter for upright than for inverted faces [F(1,15) = 10.33, P = 0.006].
P1 amplitude showed a main effect of category [F(4.8,72.1) = 20.96, P = 0.0001]; the amplitude for upright faces was significantly higher than for all categories except inverted faces and lions. Similarly, amplitude for inverted faces was significantly higher than for all categories except upright faces (Fig. 3b). No main effect of hemisphere was found, but a significant category × hemisphere interaction [F(4.7,70.4) = 3.13, P = 0.015] revealed an increased amplitude over the right hemisphere for upright and inverted faces only. An effect of electrode [F(1.1,17.1) = 14.31, P = 0.001] was due to amplitudes at temporal-parietal sites being lower than at other sites, and a category × electrode interaction [F(5.6,84.4) = 4.18, P = 0.001] revealed that the amplitude differences among categories were maximal at occipital electrodes. Again, we performed an ANOVA on P1 amplitude including only upright and inverted faces. We found that P1 was larger for inverted than for upright faces [F(1,15) = 7.42, P = 0.016], as seen in Figure 2, larger over the right than the left hemisphere [F(1,15) = 10.99, P = 0.005], thus confirming the previous category × hemisphere result, and largest at occipital sites, as seen in an electrode effect [F(1.3,19.2) = 20.24, P = 0.0001].
Because objects within each category differed widely in shape, local luminance, local contrast and apparent distance (Fig. 1), the differences found as early as P1 could represent low-level feature differences. As ERPs were averaged across trials, we averaged the 50 pictures of each category pixel by pixel in order to obtain a mean object picture per category. As can be seen in Figure 4, the mean image did not represent any distinct object for any category except faces and inverted faces, for which a clear face appeared (note that these averaged faces appear as smiling women’s faces although equal numbers of male and female faces were used in both face categories). Thus, our results on P1 amplitude between faces and objects could be due to local contrasts preserved only for faces and inverted faces: in averaging the ERP trials obtained for each object picture, the low-level variations inducing P1 differences could be cancelled out for objects but not for faces. Although the origin of the amplitude difference could be argued (see Discussion), the significant latency differences found between faces and inverted faces are unlikely to be due only to low-level differences. Luminance and mean contrast were the same; the only difference was the configuration of local contrasts. Thus, the significant latency delay found for inverted faces likely reflects early discrimination between these two face categories.
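The pixel-by-pixel category average used in this control is straightforward to express (a trivial numpy sketch; the actual picture set is not reproduced here):

```python
import numpy as np

def mean_category_image(images):
    """Average a category's gray-scale pictures pixel by pixel.
    images: iterable of equally sized H x W arrays (0-255 gray levels).
    Heterogeneous categories blur toward a featureless mean, whereas
    roughly aligned faces retain a recognizable face-like structure."""
    stack = np.stack([np.asarray(im, dtype=float) for im in images])
    return stack.mean(axis=0)

# Two toy 2 x 2 "pictures"
imgs = [np.array([[0.0, 100.0], [200.0, 50.0]]),
        np.array([[100.0, 100.0], [0.0, 150.0]])]
mean_img = mean_category_image(imgs)
```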
A category effect [F(4.2,63.7) = 5.55, P = 0.001] was due to VPP latency being shortest for upright faces; post-hoc comparisons were significant only between upright and inverted faces (P = 0.0001) and between upright faces and lions (P = 0.02).
VPP amplitude showed a main effect of category [F(4.3,64.6) = 43.1, P = 0.0001] due to amplitude for inverted and upright faces being larger than that for all other categories (P = 0.0001 for all comparisons), as shown in Figure 3b. Inverted and upright faces did not differ between themselves. VPP amplitude was also larger for houses than for texture (P = 0.001).
N170 latency showed a main effect of category [F(3.9,59) = 16.97, P = 0.0001], as latency for upright faces (mean of 156 ms) was shorter than for all other categories (P = 0.0001 for all comparisons except with flowers, where P = 0.001), inverted faces included. Latency was longest for lions (significantly longer than for texture, inverted faces, houses and road signs), and longer for inverted faces than for road signs (P = 0.007). No effect of hemisphere was found.
N170 amplitude showed a main effect of category [F(4,60.4) = 79.23, P = 0.0001] due to upright and inverted face amplitudes being larger than those of all the other categories on the post-hoc contrasts (Fig. 3b; P = 0.0001 for all comparisons). Road signs showed larger amplitude than tools (P = 0.008) and texture (P = 0.001). No overall hemisphere effect was found for N170 amplitude. A main effect of electrode [F(1.7,25.4) = 8.64, P = 0.002] was due to N170 amplitude being smallest at occipital sites but not significantly different among the other electrode pairs. However, a category × electrode interaction [F(5.6,84.5) = 11.01, P = 0.0001] was due to N170 amplitude differing significantly across parietal, parietal-occipital and temporal-parietal electrodes only for upright and inverted faces. This effect localized the response to faces and inverted faces over parietal and parietal-occipital areas, an effect not seen with any of the other categories. We also ran a separate ANOVA on N170 amplitude for upright and inverted faces only. Inverted faces elicited larger N170s than upright faces [F(1,15) = 7.89, P = 0.013], consistent with numerous reports in the literature. Across all electrodes, no hemisphere effect was found, but a hemisphere × electrode interaction [F(2,29.7) = 3.56, P = 0.041] revealed that N170 amplitude was larger over the right than the left hemisphere at posterior parietal electrodes (P7/P8).
Scalp Topography Analyses
Voltage maps are shown for all categories between 90 and 140 ms, covering the P1 component (Fig. 5, left), and between 146 and 200 ms, covering the N170/N1 component (Fig. 5, right). Bilateral occipital positivities are seen for all categories at the latency of P1, but with greater intensity for upright and especially for inverted faces. The activity is also larger over the right hemisphere for inverted faces, reflecting the hemisphere effect found on P1 amplitude for upright and inverted faces only. The latency difference between inverted and upright faces is also clear, as the maximum activity lasts ∼10 ms longer for inverted faces. During the N1–N170 latency period, bilateral occipito-parietal negativities are found for faces only and correspond to the N170. They are clearly seen from 146 to 166 ms for upright faces and from 156 to 188 ms for inverted faces, reflecting the latency delay of N170 for inverted faces analyzed earlier. This bilateral negativity is absent for the other object categories. Instead, between 150 and 200 ms, where the N1–N170s were found for object categories, only an occipital positivity or no activity was seen, suggesting that the N1 is the return to baseline after P1 and not a real negativity such as the N170. This hypothesis is reinforced by the topography analysis. The MANOVA performed on the mean normalized amplitudes across the two time periods revealed a main effect of the object/face factor [F(31,254) = 3.07, P < 0.0001] that was significant for all electrodes except CP5, P3, P4 and POz. Thus, the topographies obtained for upright and inverted faces were significantly different from those obtained for the object categories during the time window in which the N1–N170 was measured in all subjects.
For upright and inverted faces only, a frontal positivity was seen at the same latency as the N170 that corresponded to the VPP component recorded at Fz electrode. Results from latency and amplitude analyses (Fig. 3), along with the voltage maps, suggest that VPP may be the positive counterpart of N170 as suggested by George et al. (1996). Therefore, we will not discuss VPP further.
The segmentation analysis was done across the whole time period (–100 to 1000 ms) and enabled us to compare all categories statistically. Only the 50–400 ms period, during which the GFP was maximal, is displayed in Figure 6. During the segmentation, maps are randomly assigned a number, but across conditions, two maps having the same number and occurring at the same time represent the same cluster of activity. Within a condition, however, two maps with the same number appearing at different times do not necessarily have the same functional relevance (Pascual-Marqui et al., 1995).
An extra map was found for upright and inverted faces (in black in Fig. 6) that was not present for any of the other object categories. This map started at 138 ms and ended at 178 ms for upright faces, reaching a GFP maximum at 156 ms, the latency at which maximum N170 amplitude was observed for upright faces on the voltage maps (Fig. 5) and in the peak analysis. For inverted faces, the extra map started at 148 ms and ended at 194 ms, reaching its GFP maximum at 170 ms, the mean latency of the N170 maximum for inverted faces, i.e. 14 ms later than for upright faces on average. It was also larger in magnitude (Fig. 6), corresponding to the larger N170 amplitude for inverted faces. For both upright and inverted faces, this extra map was negatively correlated with the preceding maps corresponding to the first GFP peak of activity (maps 5 and 1 in Fig. 6; correlations of –64 and –88%, respectively) and with the following map corresponding to the third GFP peak (map 5), demonstrating opposite polarity between these clusters of activity. This extra map corresponded to the N170, which is negative and opposed to the positive components P1 and P2 reflected in the first and third GFP maxima, respectively. Thus, for all object categories, no stable map reflecting the N170 component was found. Indeed, at the latency of the N1, the GFP was at its minimum, reflecting noise. In line with the voltage maps, this suggests that the negative-going component (or N1) found for objects around the same latency as N170 is not an N170, but the return to baseline after P1. In contrast, for upright and inverted faces, the N170 corresponds to a genuine additional cluster of activity (the extra map) compared to objects. The statistical assessment of these maps across subjects was obtained using repeated measures ANOVAs of GEV performed on two time periods, with categories (9) and maps as within-subject factors.
The maps used in each time period were the ones seen during that period in the segmentation analysis (Fig. 6). The first time period was from 0 to 130 ms, encompassing the P1 component. For that period, a 9 × 2 (maps 1 and 5) ANOVA was performed. No main effects of category, map or their interaction were found. The second time period, from 132 to 200 ms, encompassed the N170–N1 of all categories. A 9 × 4 (maps 1, 3, 4, 5) ANOVA was performed. No main effects were found, but the crucial category × map interaction was significant [F(8,120.7) = 4.28, P < 0.0001], indicating that certain segmentation maps selectively explained some conditions. In order to assess which maps better explained which categories, we performed post-hoc ANOVAs on each segmentation map and searched for category effects. For map 4, a significant effect of category was found [F(3.5,52.7) = 17.2, P < 0.0001]; paired comparisons revealed that upright and inverted faces were significantly different from all other categories, indicating that map 4 explained processes involved specifically in face and not object processing. No other comparisons were significant for that map. No category effects were found for maps 1, 3 and 5.
In summary, the segmentation analysis revealed an extra map for upright and inverted faces compared to the other seven object categories. That map occurred at the latency of the N170 across subjects and explained the activity for faces significantly better than that for objects. In contrast, at the latency of the N1 for objects, no such map was seen.
No differences were found among the maps obtained at the latency of the P1 component, although the GFP peak representing the P1 component was larger for upright and inverted faces than for the other object categories (Fig. 6), reflecting the larger P1 amplitude for those two categories. Moreover, for the object categories, this first GFP peak lasted longer, suggesting that the processes reflected in the P1 component for upright faces finished faster. By the time the activity leading to the N170 took place for faces, objects still showed positive activity related to the P1 component, a further argument in favor of the N1 for objects being, in fact, the return to baseline of the P1 component.
In this study we found that, compared to seven other object categories, faces and inverted faces showed distinctive effects that were seen in P1 and N170 peak analyses, voltage maps, and segmentation analyses. The implications of these results for how the brain processes faces and objects are discussed.
P1 is an early measure of endogenous processing of visual stimuli, originally found to index only spatial processing (Mangun, 1995), although more recently it has been shown to index non-spatial feature processing as well (for a review, see Taylor, 2002). In this study, P1 tended to be earlier for upright faces and was significantly larger for upright and inverted faces compared to all other object categories except lions (Fig. 3a,b). Interestingly, lions are part of the superordinate category animals, which can also be identified very rapidly, perhaps due to their importance to humans (e.g. Batty and Taylor, 2002). Moreover, animals have faces (or heads) that may be treated rapidly as such by the visual system; this could be a reason why the P1 amplitude was as large for lions as for faces. Inverted faces also elicited significantly later and larger P1s than upright faces, a result consistent with previous reports (Linkenkaer-Hansen et al., 1998; Taylor et al., 2001; Itier and Taylor, 2002) and in agreement with early discrimination among faces around 100 ms (Debruille et al., 1998) and between face and non-face stimuli around 120 ms (Halgren et al., 2000; Taylor et al., 2001; Liu et al., 2002; Taylor, 2002). Although no differences in segmentation maps were found for P1, the segmentation analysis showed that at the latency of P1, GFP for faces and inverted faces was larger and of shorter duration (Fig. 6) than for objects, suggesting faster processing for faces. Segmentation analyses also showed larger and later GFP activity for inverted faces compared to upright faces, paralleling the larger and later P1s for inverted faces.
The processing of upright and inverted faces as early as 110 ms fits well with recordings in the temporal cortex of macaques, showing that global information about faces is conveyed by neurons in the first 100 ms (Sugase et al., 1999). Very early effects distinguishing among categories around 50 ms (Seeck et al., 1997; Mouchetant-Rostaing et al., 2000) are likely due to low-level features (VanRullen and Thorpe, 2001), consistent with the findings that only the C1 component, which starts around 50 ms, reflects activity in V1 (Di Russo et al., 2001; Foxe and Simpson, 2002), an area sensitive to low-level features. In fact, only the first 10–15 ms of C1 represents activity in V1; the rest of the component represents the activity of multiple extrastriate visual areas (Foxe and Simpson, 2002). P1 also involves several areas, such as V3, V3a, V4 and the fusiform gyrus, even for simple stimuli (Di Russo et al., 2001). The contribution of these areas (and perhaps others) to the generation of a larger P1 to faces than to objects seems probable.
Amplitude differences at the P1 level between six object categories and faces could nevertheless be due to low-level feature differences; as seen in Figure 4, the mean pictures of each category did not show any recognizable objects except for the inverted and upright face categories. Thus, differences in amplitude could represent local contrast differences. However, that in itself is a way to discriminate faces from objects. The average faces in Figure 4 are very similar to the low-frequency faces neonates apparently see, given their poor visual acuity (Slater, 1993). There is considerable evidence that babies orient preferentially towards faces, be they schematic or photographic (e.g. Goren et al., 1975; Maurer and Young, 1983; Johnson et al., 1991; Simion et al., 1998), and that this visual preference is determined by their psychophysical properties, such as spatial frequencies and contrasts, that meet the needs of the undeveloped visual system of neonates (Banks and Salapatek, 1981; Kleiner, 1987, 1990). Recently, it was shown that the simple presence of elements in the upper part of the pattern configuration was the critical factor enabling orienting responses (Simion et al., 2001; Turati et al., 2002), independent of the type of visual pattern. In natural upright faces, the upper part contains the eyes, a region of high contrast and of crucial social importance (Baron-Cohen, 1995). As faces were presented centrally, the effect found on P1 amplitude could be due to the high-contrast patterns of eyes being found in the upper visual field for upright faces. The reason why inverted faces produced a larger P1 than upright faces is unknown, but the results support the view that the salience of face configuration is an aspect that distinguishes faces from objects early in visual processing and is probably used by the visual system from infancy to draw attention towards faces (e.g. Johnson, 2001).
P1 seems to represent an early global response to faces and could thus index the perception of a face as a face (holistic perception), enabling attention shifts towards these biologically important stimuli. This idea fits well with a recent model of visual processing (Bullier, 2001), as by 100–120 ms integrated processing, including feedback, could be completed in the human ventral pathway. It is possible that early processing is faster for faces than for objects, which could enable feedback to orient attention to faces, hence increasing activity for that category at the P1 level.
N170 latencies were significantly shorter for upright faces than for all categories of objects (Fig. 3a); N170 amplitudes were also significantly larger for upright and inverted faces than for all seven object categories (Fig. 3b). The voltage maps revealed bilateral negativities over parieto-occipital areas for inverted and upright faces, but not for the other objects, for which an occipital positivity was seen (Fig. 5b). The analysis of the scalp topographies after amplitude normalization confirmed that the topographies in the time-period encompassing the N1–N170 component differed significantly between the object and face categories. The segmentation analyses revealed an extra map for upright and inverted faces that was absent for the seven object categories (Fig. 6). The GFP associated with this map was maximal at the latency of the N170, and the map was negatively correlated with the preceding and following maps, denoting a distinct cluster of negative activity. For inverted faces, this extra map was delayed and larger in GFP amplitude, paralleling the larger and delayed N170 for inverted faces. In contrast, no such map was found for objects at the latency of the N1, and no other map differences were seen between object categories. Moreover, by the time the activity leading to the N170 took place for faces, objects were still showing either positive activity related to the P1 component or no activity (GFP at noise level). When the maps were fitted to the subjects’ individual ERPs, statistical analyses revealed that the extra map explained the activity for the face categories significantly better than for the objects, and was thus associated with the qualitative difference between face and object processing.
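The two quantities underlying these analyses are standard: GFP is the spatial standard deviation of the voltage across all electrodes at each time point (Lehmann and Skrandies, 1980), and dividing a map by its GFP normalizes amplitudes so that topographies can be compared independently of response strength. A minimal sketch of both computations, with hypothetical toy voltages rather than data from this study:

```python
import numpy as np

def global_field_power(erp):
    # erp: (n_electrodes, n_timepoints) voltages.
    # GFP is the spatial standard deviation across electrodes at each
    # time point: large GFP = a strong, spatially structured field;
    # GFP near zero = a flat map (noise level).
    avg_ref = erp - erp.mean(axis=0, keepdims=True)   # average reference
    return np.sqrt((avg_ref ** 2).mean(axis=0))

# toy data: 4 "electrodes" x 3 time points (illustrative values only)
erp = np.array([[ 1.0,  2.0, 0.0],
                [-1.0,  0.0, 0.0],
                [ 1.0,  2.0, 0.0],
                [-1.0,  0.0, 0.0]])
gfp = global_field_power(erp)       # strong field at t0 and t1, none at t2
norm = erp[:, 0] / gfp[0]           # amplitude-normalized topography at t0
```

Comparing `norm` between conditions, rather than the raw voltages, is what allows topographic differences to be interpreted as differences in generator configuration rather than mere differences in response strength.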
All the above results strongly suggest that the face N170 is qualitatively different from the N1 for objects. While the N1 for objects seems to reflect the return to baseline from the P1 (GFP at noise level), the face N170 reflects genuinely supplementary activity. This supplementary activity could arise from lateral temporal regions and could be what defines the ‘specificity’ of faces in scalp electrophysiological studies.
Face-specific N200 sites recorded intracranially (Allison et al., 1994, 1999; McCarthy et al., 1999; Puce et al., 1999) were found in two major areas: a ventral region including the fusiform gyrus (FG) and a lateral region centered on the middle temporal gyri (MTG) and including the inferior temporal gyri (ITG) (Allison et al., 1999). Interestingly, the electrodes where the N170 is usually maximal on the scalp (T5/T6 or P7/P8) are situated over the MTG (Homan et al., 1987). In the intracranial recordings, lateral N200 sites sensitive to face parts were found primarily in the ITG, lateral to the occipito-temporal sulcus (OTS), at the border between the ventral and lateral cortices (McCarthy et al., 1999). Activity at this location could be recorded from the scalp surface at the electrodes where the N170 is maximal, which could explain why the N170 is larger for eyes than for full faces (McCarthy et al., 1999). The N170 and the lateral N200 could thus reflect similar processes, despite the difference in timing. Even in scalp ERP studies, the mean latency of the N170 varies widely across experiments, from 150 ms, as in the present experiment, to 190 ms (George et al., 1996). In addition, the N200 reported by Allison and collaborators was recorded in epileptic patients who were still on some medication (Allison et al., 1999). It is possible that processing was delayed in these patients, especially as the other ERP components were also shifted in latency, the equivalent of the P1 occurring around 150 ms and that of the P2 around 250 ms. Differences in experimental design, reference electrodes, stimulus presentation software and the populations studied could account for these timing differences.
The superior and middle temporal gyri (STG–MTG) region is also activated in fMRI studies of face perception and recognition, in addition to the more frequently discussed FG and inferior occipital gyri (Puce et al., 1995; Kanwisher et al., 1997; Halgren et al., 1999; Haxby et al., 1999; Hoffman and Haxby, 2000; see Haxby et al., 2000 for a review). These regions have also been identified in the intracranial electrophysiological recordings mentioned above (Allison et al., 1994, 1999; McCarthy et al., 1999; Puce et al., 1999). The N170 could thus arise from this lateral region, as hypothesized previously (Bentin et al., 1996; McCarthy et al., 1999). The activation of this region by upright and inverted faces would be consistent with the face-selective cells found in the macaque superior temporal sulcus (Perrett et al., 1984, 1985; Desimone, 1991). The STG, ITG and especially the MTG are also part of the human STS region, which contains neurons selective for eye gaze, head direction, facial expressions, facial identity and even movements of the hand and mouth (for a review, see Allison et al., 2000). A larger population of cells responsive to eye gaze than to faces in the STS could explain why eyes alone often elicit a larger N170 than full faces (Bentin et al., 1996; Séverac-Cauquil et al., 2000; Taylor et al., 2001), although the face N170 does not act as an eye detector (Eimer, 1998). The N170 recorded for faces and that recorded for eyes alone seem to come from different generators (Taylor et al., 2001; Shibata et al., 2002) and could be elicited by different neuronal populations situated very close to each other in these temporal regions.
We found similar activation for inverted and upright faces, with similar topographical distributions and the same extra map in the segmentation analyses, although the map was larger and delayed for inverted faces, paralleling the larger and delayed N170 for that category. Thus, the difference between upright and inverted faces seems to lie in the intensity and delay of the response rather than in the recruitment of additional areas. This hypothesis is supported by recent dipole analyses showing that the N170 to inverted and upright faces arises from the same lateral temporal region, near the superior temporal sulcus, the only difference being the delayed response to inverted faces (Watanabe et al., 2003). This contrasts with fMRI studies reporting no increase, or a small decrease, of lateral fusiform gyrus activity with face inversion (Kanwisher et al., 1998; Aguirre et al., 1999; Haxby et al., 1999). The main difference between upright and inverted face processing thus seems to be a time delay and perhaps the recruitment of a larger cell population within the same area for inverted faces. It could be hypothesized that, in addition to the cells responding to faces, cells responding preferentially to eyes are also recruited, as inverted faces are processed analytically and inverted eyes are still seen as eyes.
In contrast, we did not find the same topographical distribution for the N1 recorded for the seven object categories tested, nor did we find the extra map found for faces. Instead, at the latency of the N1, the GFP was at its minimum, i.e. noise level (Fig. 6). At most sites, the N1 was even positive for the majority of the object categories (Figs 2 and 5). However, the analysis of the N1 revealed significant differences among certain categories. The N1 was most delayed for lions, significantly so compared with three object categories; it was also larger for road signs than for tools and textures. Although only these differences were seen among object categories, they nevertheless suggest that there could be a gradient in the N1 across categories. No extra map such as that found for faces was seen for road signs, but road signs clearly elicited a large, negative N1 (Fig. 2); segmentation analysis may not be as sensitive as peak analyses. Why certain object categories elicit a larger N1 than others is unknown. One possibility is the level of categorization, as subordinate categorization leads to increased N1 amplitude compared with superordinate or entry-level categorization (Tanaka et al., 1999). Some studies have also found larger N1 amplitudes for object categories for which subjects already are, or are becoming, experts compared with objects for which they are novices (Tanaka and Curran, 2001; Rossion et al., 2002). Interestingly, road signs are a category learned in order to drive, and all our subjects were drivers; expertise may therefore explain the larger N1 amplitude for this category. It seems unlikely, however, that expertise could explain the difference between faces and objects, as inverted faces are not a category for which our subjects were experts, yet they elicited even larger amplitudes than upright faces.
Furthermore, faces were also processed faster, as reflected by the earlier latency of the N170 for upright faces compared with all objects, whereas the expertise effects on the N1 were seen only in amplitude.
In conclusion, the difference between the scalp recordings to faces and to objects seems to lie in the recruitment of extra generators for faces, as shown by the extra map for that category in the segmentation analyses; the N1 for objects seems qualitatively different from the face N170 and could reflect the return to baseline after the P1. However, the N1 can be modulated by category, and this modulation could be due to subjects’ level of expertise with that category, to the orientation of the underlying generators, or to other factors that remain to be explained. Another important finding was the early effect of face processing on the P1, suggesting differential processing of faces and objects around 110 ms. Our results are in agreement with the hypothesis that faces are perceived holistically as faces around 100–120 ms, reflected in the P1, whereas the N170 could reflect the relational processing of facial features (i.e. the face configuration) enabling subsequent recognition of identity (Itier and Taylor, 2002).
We thank Denis Brunet for his software Cartool, and both him and Christophe Michel for their help in using this program. We also thank Renaud Lestringent for his contribution to the segmentation analyses. This study was supported by the French Fondation pour la Recherche Médicale (FRM) to R.J.I.