The aim of this study was to determine the extent to which the neural representation of faces in visual cortex is viewpoint dependent or viewpoint invariant. Magnetoencephalography was used to measure evoked responses to faces during an adaptation paradigm. Using familiar and unfamiliar faces, we compared the amplitude of the M170 response to repeated images of the same face with the amplitude to images of different faces. When faces were shown from the same viewpoint, the M170 amplitude was reduced for repeated presentations of the same face image compared with images of different faces. To establish whether this adaptation to the identity of a face was invariant to changes in viewpoint, we varied the viewing angle of the face within a block. The reduction in response was no longer evident when images of the same face were shown from different viewpoints. This viewpoint-dependent pattern of results was the same for familiar and unfamiliar faces. These results imply either that the face-selective M170 response reflects an early stage of face processing or that the computations underlying face recognition depend on a viewpoint-dependent neuronal representation.
Recognizing faces in a visual scene is a simple and effortless process for most human observers. However, the face of any individual can generate countless different retinal images depending on the viewing conditions. The visual system must take into account sources of variation caused by changes in viewpoint, but at the same time be able to detect differences between faces. Models of face processing propose that the earliest level of processing involves computation of a view-dependent representation. Information from this early stage of processing is compared with view-invariant representations of familiar faces for recognition (Bruce and Young 1986; Burton et al. 1999).
Functional imaging studies have also revealed a network of face-selective regions in the occipital and temporal lobes that are thought to underlie our ability to perceive and recognize faces (Haxby et al. 2000). Processing of facial identity is associated with inferior temporal lobe regions, such as the fusiform face area (Kanwisher et al. 1997; Grill-Spector et al. 2004). These inferior temporal lobe structures project to anterior temporal regions that contain semantic information associated with a particular facial identity (Rotshtein et al. 2005). A region posterior to this, known as the inferior occipital cortex or occipital face area (Gauthier et al. 2000), is thought to be involved in an earlier, structural encoding stage of face processing (Hoffman and Haxby 2000).
Event-related potential (ERP) and magnetoencephalography (MEG) studies have also shown that faces and other objects can be distinguished by the pattern of electrical activity across the occipitotemporal lobe (Nobre et al. 1994; Allison et al. 1999). For example, ERP studies have shown a face-selective potential occurring between 140 and 200 ms after stimulus onset that is approximately twice as large for faces as for a variety of other stimuli (Bentin et al. 1996; Jeffreys 1996; Liu et al. 2002). MEG studies have also revealed an early face-selective response, known as the M170, which has been shown to correlate with the successful recognition of a face (Liu et al. 2002). Consistent with behavioral studies showing that inverted faces are harder to recognize (Yin 1969), the M170 component is delayed for inverted compared with upright faces (Watanabe et al. 2003; Itier et al. 2006). The M170 has also been found to be significantly reduced in some, but not all, patients with prosopagnosia (Harris et al. 2005). The M170 is often considered the magnetic equivalent of the N170. Source analysis techniques have suggested that the M170 and N170 may originate in inferior temporal regions, specifically in the vicinity of the fusiform gyrus (Halgren et al. 2000; Itier and Taylor 2002). However, recent studies have suggested that the N170 and M170 may reflect 2 distinct sources (Watanabe et al. 2003; Itier et al. 2006).
The aim of this study is to use the technique of adaptation to ask whether the M170 response reflects an underlying representation of facial identity, and whether this representation is invariant to changeable aspects of faces. The principle underlying adaptation is that repeated presentation of a stimulus results in a decrease in the response of a neuronal population that is selective for that stimulus (Krekelberg et al. 2005; Grill-Spector et al. 2006). The nature of the neural representation can then be probed by varying the stimulus. If the underlying neural representation is insensitive to a change, the response will remain reduced. Alternatively, if the neurons are sensitive to this manipulation, the response will return to its initial level. Although little is known about the effect of stimulus repetition on the M170 response, a recent study has shown a reduction in the amplitude of the M170 following repetition of different face images at rapid presentation rates (Harris and Nakayama 2006). Recently, we reported that adaptation of the N170 potential to facial identity was sensitive to changes in the viewpoint of the image (Ewbank and Andrews 2006). However, the changes in viewpoint used in that study were quite large (variations in pose of the order of ±45°), and only unfamiliar faces were used. It is possible, therefore, that viewpoint-invariant responses may be found with smaller changes in viewing angle (for example, variations of <10°) or with faces that are familiar to the observer. We hypothesized that, if the neural representation underlying the M170 response is selective for the identity of a face, the response to repeated images of the same face would be reduced.
We also predicted that this adaptation would be invariant to changes in the viewpoint of the face and that this invariance would extend over a greater range of viewpoint change for familiar than for unfamiliar faces. In contrast, any recovery from adaptation when images of the same face are presented from different viewpoints would suggest that the M170 reflects a viewpoint-specific stage in face processing.
Eighteen subjects (9 females; mean age 23 years) participated in the study. All had normal or corrected-to-normal visual acuity, and 15 were right-handed. Written consent was obtained from all subjects. All imaging took place at the York Neuroimaging Centre (YNiC).
To identify sensors that responded preferentially to images of faces, subjects viewed gray-scale images from 5 object categories: 1) unfamiliar faces; 2) familiar faces; 3) inanimate objects; 4) places (buildings, indoor scenes, and natural landscapes); and 5) textures. Photographs of unfamiliar faces were taken from the Psychological Image Collection at Stirling database (http://pics.psych.stir.ac.uk), and images of familiar faces were taken from the World Wide Web. Images of inanimate objects, places, and textures were obtained from various sources, including commercial clip-art collections (CorelDraw, Microsoft). All images were projected onto a screen at a viewing distance of approximately 80 cm and subtended a visual angle of 9° × 9°. Images were presented in a series of stimulus blocks, each containing 25 images. Each image was presented for 400 ms and was followed by a blank screen containing a fixation cross for 1100 ms. In each stimulus block, 5 images from each object category were randomly interleaved. A total of 8 stimulus blocks were presented. Subjects performed a target detection task, pressing a response button when they saw an image containing a small red dot. Target trials were removed from the subsequent analysis. Between blocks, an equiluminant gray screen was presented for 8 s as a rest period.
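The localizer design can be expressed as a short stimulus-sequence sketch. It is illustrative only: the category labels, the hypothetical function names `localizer_blocks` and `block_duration_s`, and the seeded shuffle from Python's `random` module are assumptions rather than the presentation software used in the study, and target (red-dot) trials are omitted.

```python
import random

# Category labels for the localizer (names are illustrative).
CATEGORIES = ["unfamiliar_face", "familiar_face", "object", "place", "texture"]


def localizer_blocks(n_blocks=8, per_category=5, seed=0):
    """Return n_blocks lists of category labels, each containing
    per_category items from every category in a random (seeded) order."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = [cat for cat in CATEGORIES for _ in range(per_category)]
        rng.shuffle(block)  # randomly interleave the categories
        blocks.append(block)
    return blocks


def block_duration_s(n_images=25, image_s=0.4, fixation_s=1.1):
    """Duration of one stimulus block in seconds, excluding the rest period."""
    return n_images * (image_s + fixation_s)


blocks = localizer_blocks()
print(len(blocks), len(blocks[0]), block_duration_s())
```

With 400-ms images and 1100-ms fixation intervals, each 25-image block lasts 37.5 s before the 8-s rest period.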
There were 2 adaptation scans, one containing unfamiliar faces (Fig. 1) and the other containing familiar faces (Fig. 2). The experimental procedure was identical for both scans. In each scan, stimulus blocks contained either 12 images of the same face (same identity) or 12 images of different faces (different identity). Stimulus blocks also varied in the degree of viewpoint change about the vertical axis between successive images. Four viewpoint change conditions were used: 1) 0° (same viewpoint); 2) 2° change; 3) 4° change; and 4) 8° change. Thus, there were 8 stimulus conditions (2 identity × 4 viewpoint) in each scan. Images in the same-viewpoint condition were shown from a frontal viewpoint throughout the block. In the viewpoint change conditions, the first face image in each block was always a frontal view, and each subsequent image was rotated to the left or right of the preceding image (see Figs 1 and 2). Faces were rotated by up to 3 increments to the left and to the right. For example, in the 2° change condition, faces were shown over a range of 12° (0°, 2°, 4°, 6°, 4°, 2°, 0°, −2°, −4°, −6°, −4°, −2°).
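The rotation schedule in each block follows a fixed pattern: 3 increments to one side, back through frontal, 3 increments to the other side, and back toward frontal. The hypothetical helper below reproduces the 12-angle sequence given above for any step size; treating positive angles as one particular direction of rotation is an arbitrary sign convention assumed here.

```python
def viewpoint_sequence(step_deg, n_increments=3):
    """Yaw angles (degrees) for one adaptation block.

    Starts frontal (0), rotates n_increments steps to one side, returns
    through frontal to n_increments steps on the other side, and stops
    one step short of frontal (12 angles when n_increments=3).
    """
    n = n_increments
    outward = [i * step_deg for i in range(0, n + 1)]          # 0 .. +n*step
    across = [i * step_deg for i in range(n - 1, -n - 1, -1)]  # (n-1)*step .. -n*step
    back = [i * step_deg for i in range(-n + 1, 0)]            # -(n-1)*step .. -step
    return outward + across + back


for step in (0, 2, 4, 8):
    print(step, viewpoint_sequence(step))
```

For the 8° condition this gives angles between −24° and +24° (a range of 48°), whereas the 0° condition returns 12 frontal views; successive images always differ by exactly one step.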
To generate the images of unfamiliar and familiar faces at different viewpoints, we recovered a 3-dimensional model of each face from a single frontal view using shape-from-shading. This technique exploits a statistical model of facial shape to render the shape-from-shading problem tractable (Smith and Hancock 2006). By restricting the algorithm to a single class of objects (namely faces), the model provides a sufficiently powerful constraint to allow accurate reconstructions from a single image. The estimated 3-dimensional models can then be rotated to yield realistic images of each face from different viewpoints (see Figs 1 and 2).
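The shape-from-shading reconstruction itself (Smith and Hancock 2006) is not reproduced here. The sketch below only illustrates the final step of re-rendering a recovered 3-dimensional face at a new viewpoint: rotating its vertices about the vertical axis and projecting them orthographically onto the image plane. The coordinate convention (y vertical, z toward the viewer), the orthographic camera, the toy vertices, and the function names are all assumptions; a real renderer would also need to shade or texture the surface.

```python
import math


def rotate_about_vertical(vertices, yaw_deg):
    """Rotate (x, y, z) vertices about the vertical y-axis by yaw_deg degrees.

    Assumes y points up and z points toward the viewer; which sign
    corresponds to a leftward rotation is a convention.
    """
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in vertices]


def orthographic_project(vertices):
    """Project onto the image plane by discarding depth (z)."""
    return [(x, y) for x, y, _ in vertices]


# Toy "face": two outer points and a nose tip closer to the viewer.
face = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -0.5, 1.0)]
print(orthographic_project(rotate_about_vertical(face, 8)))
```

Rotating by an angle and then by its negative recovers the original vertices, which is a convenient check on the sign convention.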
Each image was presented for 400 ms followed by a 1100-ms blank screen containing a fixation cross. Each condition was repeated 4 times in a counterbalanced block design, giving a total of 32 stimulus blocks. Subjects performed a target detection task in which they responded when they saw an image containing a red dot. Target trials were removed from the subsequent analysis. Stimulus blocks were separated by 8-s periods of fixation during which an equiluminant gray screen was presented. At the end of the experiment, subjects were asked to name the familiar faces that had been shown in the experimental scan.
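The text specifies a counterbalanced block design but not the scheme used. One simple possibility, assumed here purely for illustration, is to shuffle the 8 conditions (2 identity × 4 viewpoint step) independently within each of the 4 repetitions, so that every condition appears once in each cycle of 8 blocks. The function name and the seeded shuffle are hypothetical.

```python
import itertools
import random
from collections import Counter

IDENTITY = ("same", "different")
VIEW_STEP_DEG = (0, 2, 4, 8)


def block_order(n_repeats=4, seed=0):
    """Return a list of (identity, step) conditions, one per stimulus block.

    Each run of 8 consecutive blocks contains every condition once, in a
    shuffled order (an assumed counterbalancing scheme).
    """
    conditions = list(itertools.product(IDENTITY, VIEW_STEP_DEG))  # 8 conditions
    rng = random.Random(seed)
    order = []
    for _ in range(n_repeats):
        cycle = conditions[:]
        rng.shuffle(cycle)
        order.extend(cycle)
    return order


order = block_order()
print(len(order), Counter(order).most_common(1))
```

Shuffling within cycles keeps each condition evenly spread across the scan, so slow drifts in arousal or signal level affect all conditions similarly.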
MEG recordings were made using a 248-channel whole-head system with superconducting quantum interference device (SQUID)-based first-order magnetometer sensors (Magnes 3600WH, 4D-Neuroimaging, at the YNiC, University of York, UK). Magnetic brain activity was digitized continuously at a sampling rate of 1017.25 Hz and filtered with a 1-Hz high-pass and a 200-Hz low-pass cut-off. Average waveforms for each subject were computed using a 1-s epoch (200 ms before to 800 ms after stimulus onset). The average waveforms were further processed off-line using a 200-ms prestimulus baseline correction and were band-pass filtered between 3 and 30 Hz. Artifact rejection was performed to remove epochs that exceeded a predetermined amplitude threshold (alpha = 0.05).
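The epoch-level processing can be summarized in a minimal single-channel sketch, assuming that epochs are stored as equal-length lists of samples with stimulus onset at a fixed index. The amplitude threshold is a free parameter (its value is not stated), the function names are hypothetical, and the 3–30 Hz band-pass filter is omitted because it would normally be applied with a dedicated signal-processing library. At 1017.25 Hz, the 200-ms prestimulus interval corresponds to about 203 samples.

```python
from statistics import fmean

SFREQ_HZ = 1017.25
PRESTIM_S = 0.2


def onset_index(sfreq=SFREQ_HZ, prestim_s=PRESTIM_S):
    """Sample index of stimulus onset within an epoch."""
    return round(prestim_s * sfreq)


def average_evoked(epochs, threshold, n_baseline):
    """Reject epochs whose absolute amplitude exceeds threshold, average the
    remaining epochs, and subtract the mean of the first n_baseline
    (prestimulus) samples.

    Returns (baseline-corrected average, number of epochs kept).
    """
    kept = [ep for ep in epochs if max(abs(v) for v in ep) <= threshold]
    if not kept:
        raise ValueError("all epochs rejected")
    n_samples = len(kept[0])
    average = [fmean(ep[i] for ep in kept) for i in range(n_samples)]
    baseline = fmean(average[:n_baseline])
    return [v - baseline for v in average], len(kept)


evoked, n_kept = average_evoked(
    [[1, 1, 3, 5], [1, 1, 3, 5], [100, 0, 0, 0]], threshold=10, n_baseline=2
)
print(onset_index(), evoked, n_kept)
```

Because averaging and baseline subtraction are both linear, correcting the average (as here) gives the same result as averaging baseline-corrected epochs.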
In the localizer scan, a contour plot was used to locate the 10 largest contiguous face-selective sensors. The peak amplitudes and peak latencies were calculated for each condition in each hemisphere for each subject. Analysis of the MEG amplitude in the viewpoint scans was then restricted to these face-selective sensors of interest (SOIs). A multifactorial analysis of variance (ANOVA) was used to determine the main effects of identity (same, different), hemisphere (left, right), viewpoint (0°, 2°, 4°, 8°), and familiarity (familiar, unfamiliar), and their interactions. To assess whether the reduction in the M170 amplitude was statistically significant in each condition, we performed paired-samples t-tests on the peak amplitudes across subjects. Finally, we calculated an adaptation index (AI) to quantify the reduction in the M170 amplitude during same-face blocks compared with different-face blocks: AI = Response[same] − Response[different].
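Locating the 10 largest contiguous face-selective sensors from a contour plot is a visual procedure. A hypothetical automated analogue is greedy region growing over a sensor adjacency graph, sketched below. Both inputs are assumptions: a per-sensor selectivity score (for example, the face response minus the mean nonface response in the localizer) and a map of which sensors neighbour each other on the helmet.

```python
import heapq


def select_contiguous_sois(selectivity, neighbours, k=10):
    """Greedy region growing over a sensor adjacency graph.

    selectivity: dict sensor -> face-selectivity score (higher = more selective)
    neighbours:  dict sensor -> iterable of adjacent sensors
    Starts at the most selective sensor and repeatedly adds the most
    selective sensor touching the current set, until k are chosen.
    """
    seed = max(selectivity, key=selectivity.get)
    chosen, chosen_set = [seed], {seed}
    # Heap entries are (negated score, sensor) so the best neighbour pops first.
    frontier = [(-selectivity[n], n) for n in neighbours[seed]]
    heapq.heapify(frontier)
    while frontier and len(chosen) < k:
        _, sensor = heapq.heappop(frontier)
        if sensor in chosen_set:
            continue  # stale duplicate entry
        chosen.append(sensor)
        chosen_set.add(sensor)
        for n in neighbours[sensor]:
            if n not in chosen_set:
                heapq.heappush(frontier, (-selectivity[n], n))
    return chosen


# Toy array: a chain A-B-C-D plus an isolated sensor E.
sel = {"A": 5.0, "B": 1.0, "C": 4.0, "D": 3.0, "E": 4.5}
nb = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
print(select_contiguous_sois(sel, nb, k=3))
```

Restricting growth to neighbours keeps the selection spatially contiguous, so an isolated but highly selective sensor elsewhere on the array (E in the toy example) is not included.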
First, we determined which sensors showed selective responses to images of faces compared with other categories of stimuli (Fig. 3A). In each subject, we located SOIs in occipitotemporal regions that had a significantly higher response to images of unfamiliar and familiar faces than to nonface stimuli. All 18 subjects showed face-selective M170 responses in right-hemisphere sensors, and 12 also showed a left-hemisphere face-selective M170. We then measured the peak amplitude of the M170 in response to each of the 5 categories shown in the localizer scan (Fig. 3C,D). A 2-way ANOVA (Hemisphere × Category) revealed a highly significant effect of category (F4,48 = 51.63, P < 10⁻¹⁷), no effect of hemisphere (F1,12 = 1.65, P = 0.22), and no interaction between hemisphere and category (F4,48 = 0.73, P = 0.57). In both hemispheres, the mean response to unfamiliar faces was significantly greater than that to objects (right hemisphere [RH]: t(17) = 8.79, P < 10⁻⁸; left hemisphere [LH]: t(12) = 6.29, P < 0.0001), places (RH: t(17) = 10.44, P < 10⁻⁹; LH: t(12) = 11.82, P < 10⁻⁷), and textures (RH: t(17) = 7.68, P < 10⁻⁷; LH: t(12) = 7.73, P < 0.0001). There was no significant difference between the responses to unfamiliar and familiar faces in either the right (t(17) = 0.25, P = 0.80) or the left hemisphere (t(12) = −0.06, P = 0.95). The mean response to familiar faces in both hemispheres was also significantly larger than that to objects (RH: t(17) = 9.30, P < 10⁻⁸; LH: t(12) = 11.29, P < 10⁻⁷), places (RH: t(17) = 11.58, P < 10⁻⁹; LH: t(12) = 7.99, P < 10⁻⁶), and textures (RH: t(17) = 8.72, P < 10⁻⁷; LH: t(12) = 5.53, P < 0.0001).
The mean latency of the face-selective M170 was 155.6 ms in the right hemisphere and 166.7 ms in the left hemisphere. A 2-way ANOVA of latency (Hemisphere × Category) revealed a significant effect of hemisphere (F1,12 = 27.0, P < 0.001), with all categories showing an earlier response in right-hemisphere sensors than in left-hemisphere sensors. Response times in the target detection task did not differ across categories (F4,68 = 0.65, P = 0.84).
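Peak amplitudes and latencies of the kind reported above could be extracted from each averaged waveform as the largest absolute deflection within a search window. The sketch assumes a window of 140–200 ms (the range over which the face-selective ERP component was described earlier) and a sign-invariant maximum; both choices, and the function name, are assumptions rather than the procedure stated in the Methods.

```python
import math


def peak_in_window(waveform, sfreq, onset_index, window_ms=(140.0, 200.0)):
    """Return (amplitude, latency_ms) of the largest absolute deflection
    between window_ms[0] and window_ms[1] after stimulus onset.

    waveform: list of samples; onset_index: sample index of stimulus onset.
    """
    start = onset_index + math.ceil(window_ms[0] * sfreq / 1000.0)
    stop = onset_index + math.floor(window_ms[1] * sfreq / 1000.0)
    best = max(range(start, stop + 1), key=lambda i: abs(waveform[i]))
    latency_ms = (best - onset_index) * 1000.0 / sfreq
    return waveform[best], latency_ms


# Toy waveform at 1000 Hz: an in-window trough and a larger late peak.
wave = [0.0] * 300
wave[160] = -5.0
wave[250] = 9.0  # outside the window, so it is ignored
print(peak_in_window(wave, 1000.0, 0))
```

At 1017.25 Hz one sample lasts about 0.98 ms, so the roughly 11-ms right-hemisphere advantage (155.6 vs. 166.7 ms) corresponds to about 11 samples.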
A 4-way ANOVA (2 × 2 × 2 × 4; Identity × Hemisphere × Familiarity × Viewpoint) found no main effect of identity, familiarity, hemisphere, or viewpoint. However, there was a significant Hemisphere × Identity × Viewpoint interaction (F3,36 = 4.04, P < 0.05). Figure 4 shows the response of the M170 in the right hemisphere to the different face conditions. In the right hemisphere, a 3-way ANOVA (2 × 2 × 4; Identity × Familiarity × Viewpoint) revealed a significant effect of viewpoint (F3,51 = 4.33, P < 0.01) and a significant interaction between viewpoint and identity (F3,51 = 4.00, P < 0.05). In the 0° (same viewpoint) condition, the peak M170 response in face-selective sensors was significantly lower for images of the same face than for different faces, for both unfamiliar (t(17) = 3.57, P < 0.01) and familiar (t(17) = 2.25, P < 0.05) faces (see Fig. 4). We then measured the M170 response to same and different faces in the 2°, 4°, and 8° change conditions. There was no difference in the M170 response to images of the same face compared with different faces at a rotation of 2° (unfamiliar, t(17) = −0.60, P = 0.53; familiar, t(17) = −0.40, P = 0.69), 4° (unfamiliar, t(17) = −0.22, P = 0.82; familiar, t(17) = −0.25, P = 0.80), or 8° (unfamiliar, t(17) = 0.35, P = 0.72; familiar, t(17) = 0.62, P = 0.54) for either unfamiliar or familiar faces (Fig. 5). Response times to targets did not differ between the same and different conditions. No significant effects were found in the left hemisphere. Subjects were able to recognize the familiar faces used in the experimental scan: the mean recognition rate was 90.28 ± 8.3%, and no subject recognized fewer than 75% of the faces.
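Within-subject comparisons of this kind can be illustrated with a minimal sketch of the adaptation index and a paired-samples t statistic computed with Python's standard library. The function names are hypothetical, and amplitudes are assumed to be positive peak magnitudes. A paired test on 18 subjects has n − 1 = 17 degrees of freedom, which matches the t(17) values reported above. With the subtraction order used here, adaptation yields a negative t; the positive t values reported for the 0° condition presumably reflect the opposite order (different minus same). The p-value would come from Student's t distribution (for example via `scipy.stats.ttest_rel`), which is not reproduced here.

```python
import math
from statistics import fmean, stdev


def adaptation_index(same, different):
    """Per-subject AI = Response[same] - Response[different].

    With positive peak amplitudes, a negative AI indicates adaptation.
    """
    if len(same) != len(different):
        raise ValueError("need one same/different pair per subject")
    return [s - d for s, d in zip(same, different)]


def paired_t(same, different):
    """Paired-samples t statistic on the per-subject differences, and df = n - 1."""
    diffs = adaptation_index(same, different)
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy data for 3 hypothetical subjects (not data from this study).
print(paired_t([1.0, 2.0, 3.0], [2.0, 3.0, 5.0]))
```

Working on per-subject differences removes between-subject variation in overall M170 amplitude, which is why the paired test is more sensitive here than an independent-samples comparison.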
The aim of this experiment was to determine the role of the M170 response in face recognition. Specifically, we asked whether the M170 response: 1) is involved in representing facial identity; 2) reflects a viewpoint-dependent or a viewpoint-invariant representation of faces; and 3) differs in its response to familiar and unfamiliar faces. Using an adaptation paradigm, we found that the M170 amplitude in the right hemisphere is significantly reduced during the presentation of identical face images shown at the same viewpoint compared with different face images shown at the same viewpoint. To determine whether the neural representation underlying the M170 response was invariant to changes in the face image, we systematically varied the viewpoint of the images. We found that there was no difference in the magnitude of the M170 response between the same and different conditions when the viewpoint of the face was varied. Furthermore, we found no significant difference in the M170 response to familiar and unfamiliar faces.
These results are consistent with a recent ERP study in which we showed that the N170 response was similar for same and different faces when they varied in viewing angle (Ewbank and Andrews 2006). The present study extends this finding by showing that the viewpoint-dependent response is evident even for quite small changes in viewing angle, which provides strong evidence for a view-dependent representation. Although the adaptation to facial identity shown in this study is consistent with other ERP studies (Campanella et al. 2000; Itier and Taylor 2004; Kovacs et al. 2006), it contrasts with reports that have failed to find adaptation to faces (Eimer and McCarthy 1999; Schweinberger et al. 2002, 2004). One likely reason for this discrepancy is the number of intervening stimuli between repeated images and the time interval between prime and target. For example, Henson et al. (2004) found effects of repeating the same view of an object only when there were no intervening stimuli. More recently, it has been reported that adaptation is influenced by the interval between stimulus presentations, with shorter delays producing larger adaptation (Harris and Nakayama 2006). Our results, obtained with a continuous adaptation procedure in which images are repeated within a block, suggest that the number of repetitions may also be an important factor. This would fit with single-neuron and functional magnetic resonance imaging (fMRI) studies reporting that the adaptation effect depends on the number of repetitions of a stimulus (Grill-Spector et al. 1999; Grill-Spector 2006; Sawamura et al. 2006). For example, Sawamura et al. (2006) showed that the reduction in response of neurons in macaque inferior temporal (IT) cortex was greatest for the first repetition, but that further reductions occurred with successive repetitions.
Moreover, the response selectivity of neurons was predicted more accurately by adaptation in a block design than in an event-related design. One problem with a block design, however, is that the neural response may be influenced by attention. To control for attention, participants performed a target detection task throughout. There was no systematic difference in response latency or task accuracy across conditions.
We found no significant effect of familiarity on the M170 response to faces. This is consistent with fMRI studies showing that familiarity has little effect on the response of face-selective regions (Gorno-Tempini et al. 1998; Eger et al. 2005; Pourtois et al. 2005). However, these neuroimaging results contrast with the fact that human observers are very good at identifying familiar faces (even from very low-quality images), whereas their performance in recognizing or matching unfamiliar faces is poor (Hancock et al. 2000). In a recent MEG study, Kloth et al. (2006) reported that the M170 is modulated by familiarity, with increased amplitude for personally familiar faces compared with unfamiliar faces. However, consistent with our findings, they observed no significant difference between famous faces and unfamiliar faces.
A central question in the visual recognition of objects is whether this process depends on a viewpoint-dependent or a viewpoint-invariant neuronal representation. Models of face processing suggest that the initial stage of processing is based on a view-dependent structural representation and that recognition of facial identity is based on matching to a viewpoint-invariant representation (Bruce and Young 1986; Burton et al. 1999). The view-dependent nature of the M170 response to familiar and unfamiliar faces could therefore be taken as an indication that it reflects an early stage in face processing. On the other hand, a number of behavioral studies provide evidence that faces and other objects may be encoded by a view-dependent neural representation (Hill et al. 1997; Fang and He 2005; Lee et al. 2006). For example, Lee et al. (2006) showed that changing the size of a face had no effect on face discrimination, whereas changing the viewpoint caused a progressive decrement in performance. In a previous fMRI study, we found that face-selective regions within the inferior temporal lobe showed a reduced response to repeated face images and that this adaptation was invariant to changes in the size of the face but sensitive to changes in expression and viewpoint (Andrews and Ewbank 2004; see also Grill-Spector et al. 1999; Winston et al. 2004; Pourtois et al. 2005). These findings are consistent with single-unit studies showing that the majority of face-selective neurons in the monkey temporal lobe are relatively invariant to changes in image size but sensitive to changes in viewpoint (Perrett et al. 1985; Rolls and Baylis 1986). Together, these findings provide some support for the idea that faces are represented in a view-dependent manner (Logothetis et al. 1995; Wallis and Bülthoff 1999). It is important to note, however, that many of these studies used unfamiliar faces.
It therefore remains to be established whether a view-invariant representation exists for familiar faces. The results of this study suggest that, if it does, it must arise at a later stage of processing than that reflected by the M170.
In conclusion, we found that the M170 response adapts to faces with the same identity when they are shown from an identical viewpoint. However, there was a recovery from adaptation when the viewpoint of the images was varied. The view-dependent nature of the M170 response did not differ with the familiarity of a face. These results do not rule out the possibility that a view-invariant neural representation, analogous to the face recognition units of Bruce and Young (1986), exists within the visual system.
We are grateful to Phil Pell, Tobias Halliday, Dave Cole, and Leif Jiskoot for their help on this project. We would also like to thank members of the YNiC, particularly Andre Gouws, Gary Green, and Maribel Pulgarin for their help during the course of this project. We thank 2 anonymous reviewers for helpful comments on an earlier version of this manuscript. This work was supported by a grant from the Anatomical Society of Great Britain and Ireland to T.J.A.; M.P.E. is supported by an Anatomical Society Studentship. Conflict of Interest: None declared.