We carried out 2 functional magnetic resonance imaging experiments to investigate the cortical mechanisms underlying the contribution of form and surface properties to object recognition. In experiment 1, participants performed same–different judgments in separate blocks of trials on pairs of unfamiliar “nonsense” objects on the basis of their form, surface properties (i.e., both color and texture), or orientation. Attention to form activated the lateral occipital (LO) area, whereas attention to surface properties activated the collateral sulcus (CoS) and the inferior occipital gyrus (IOG). In experiment 2, participants were required to make same–different judgments on the basis of texture, color, or form. Again attention to form activated area LO, whereas attention to texture activated regions in the IOG and the CoS, as well as regions in the lingual sulcus and the inferior temporal sulcus. Within these last 4 regions, activation associated with texture was higher than activation associated with color. No color-specific cortical areas were identified in these regions, although parts of V1 and the cuneus yielded higher activation for color as opposed to texture. These results suggest that there are separate form and surface-property pathways in extrastriate cortex. The extraction of information about an object's color seems to occur relatively early in visual analysis as compared with the extraction of surface texture, perhaps because the latter requires more complex computations.
The study of how the brain enables us to recognize objects has been a major enterprise in cognitive neuroscience. It has been assumed that our visual system fractionates the information available in our visual field along basic dimensions, such as luminance, color, motion, and depth, and that these fundamental dimensions are then used to recover the 3-dimensional (3D) structure of objects, their surface characteristics, material properties, and relative location in the scene. Not all these properties of objects, however, have received the same level of attention. For example, virtually all studies of object recognition have focused on the geometric structure of objects. Very few have focused on the recognition of their material properties from surface-based visual cues. Even when the processing of surface-based cues, such as color and texture, has been studied, it has been in the context of using these cues to reveal the geometric structure of objects. Nevertheless, knowledge about the material properties of objects has, by itself, profound implications for understanding what an object is.
One area where surface-based cues (and the material properties they signal) have been shown to play a critical role is in the domain of scene recognition. Gegenfurtner and Rieger (2000), for example, have shown that scene recognition is faster for color as opposed to black-and-white images of natural scenes. Indeed, a number of models of scene recognition have argued that surface-based cues are used to categorize scenes (scene “gist”) without the need for identifying the particular objects in those scenes (Biederman and others 1982; Schyns and Oliva 1994, 1997; Moller and Hurlbert 1996; Oliva and Schyns 1997, 2000; Vailaya and others 1998; Oliva and Torralba 2001).
But as Adelson has persuasively argued in a series of papers, surface-based cues also play a vital role in the identification of the material properties of the objects themselves (Bergen and Adelson 1988; Adelson and Bergen 1991; Adelson 2001). In a study that employed real as well as depicted objects, Humphrey and others (1994) showed that surface properties, particularly color, can facilitate the naming of natural objects, presumably by flagging the material properties of those objects. This facilitatory effect on naming was not present, however, in the naming of manufactured objects, where color is far less diagnostic. It was also not present when the natural objects were presented in inappropriate colors or in gray scale. Taken together, these results argue that the color of the object is not contributing to recognition purely at the “sensory” level. Instead, color (and perhaps other surface properties) may confer an advantage at a high level of visual analysis. In the domain of face processing, as well, research has shown that surface color can assist in the discrimination of gender, particularly when other cues such as the shape of the face are nonpredictive (Tarr and others 2001, 2002). Although all these behavioral studies have shown that the processing of surface-based cues plays an important role in the identification and categorization of objects, what they have not revealed is the nature of the underlying neural substrates mediating this processing.
Over the last 10 years, a large number of neuroimaging studies have focused on identifying the neural substrates of object recognition. Although a number of different category-specific regions have been identified in the ventral stream using functional magnetic resonance imaging (fMRI), a key region for visual object recognition appears to be the lateral occipital (LO) area (Malach and others 1995; for review, see Grill-Spector and Malach 2004). Almost all the fMRI studies of area LO, however, have focused on manipulations of the geometric structure of objects rather than on their material properties (as indicated by their surface-based cues). When surface properties have been studied, it has been in the context of how such cues reveal the geometric structure rather than the material properties of the object (e.g., Grill-Spector and others 1998; Wilson and Wilkinson 1998; Kourtzi and others 2003).
Nevertheless, there have been quite a few neuroimaging studies that have examined color processing in the human brain but largely in isolation from its role in object recognition (for review, see Tootell and others 2003). Until recently, there have been almost no studies of the neural basis of texture processing (but see Puce and others 1996). A recent fMRI study by Peuskens and others (2004), however, did investigate the processing of the surface texture of objects by asking participants to attend to the spatial scale of the surface texture of randomly deformed spheres. Separate regions within the lingual gyrus and collateral sulcus (CoS) were found to be differentially activated when participants performed a same–different judgment on the spatial scale of the surface texture of these objects (as opposed to judgments of their 3D shape or orientation). This pattern of activation contrasts with the activation seen in area LO for shape discriminations. But again, the focus of this study was not on the identification of the material properties of the object.
There is compelling neuropsychological evidence that the perception of object form and the perception of the material properties of objects depend on quite separate neural substrates. Patient DF, for example, who developed visual form agnosia following an acute hypoxic episode, is unable to recognize objects on the basis of their geometric structure but has no problem describing the surface properties of those same objects, and thus the material from which they are made (Milner and others 1991; Humphrey and others 1994; Goodale and Milner 2004). For example, she can tell if something is made of wood, metal, or plastic, presumably on the basis of differences in color, texture, and specularities. It is important to emphasize, however, that even though DF can process these surface cues, she is unable to use them to recover object form. She can use the surface cues, however, to access stored knowledge about the material properties of objects. As it turns out, DF has large bilateral lesions that encompass area LO, but her fusiform gyrus and parahippocampal cortex remain largely intact (James and others 2003). Not surprisingly, an fMRI study showed that there was no differential activation in DF's brain for line drawings of common objects, where form was the only cue to the identity of the object. In sharp contrast, there was robust activation in her fusiform gyrus (extending to some degree into the parahippocampal region) when DF was presented with high-resolution color photographs of objects, many of which she could identify on the basis of diagnostic surface cues (James and others 2003). More recent fMRI experiments revealed that DF showed higher activations in the parahippocampal gyrus for appropriately colored scenes (which she can often correctly categorize) than for black-and-white versions of those same scenes (Steeves and others 2004). 
Taken together, these results converge on the earlier fMRI studies of normal observers, suggesting that area LO plays a critical role in processing the geometric structure of objects. But in addition, these results also suggest that the processing of the material properties of objects, independent of their form, may depend on neural networks that are located more medially in the fusiform and parahippocampal regions.
The pattern of compromised and spared visual abilities in DF is not unique. The majority of the patients with visual form agnosia reported in the literature have developed their deficit following hypoxia, and the majority of these have spared perception of surface properties such as color (for review, see Milner and Goodale 1995). At the same time, people afflicted with cerebral achromatopsia show the opposite pattern of results: spared form processing but compromised color perception. The lesions responsible for this visual deficit have been localized to the lingual and fusiform gyri (Heywood and others 1995; Duvelleroy-Hommet and others 1997; for review, see Heywood and Kentridge 2003).
The double dissociation of spared and compromised visual abilities (and the lesions responsible for these behavioral observations) in visual form agnosia and cerebral achromatopsia again presents striking evidence for the notion that there are separate form and surface-property pathways in the primate visual system. It would appear that there is a form pathway that projects laterally from area V1 and encompasses area LO—and that this pathway is separate from a surface-property pathway (involving not only color but also visual texture) that projects more ventromedially from area V1 into the fusiform gyrus and parahippocampal cortex.
To test this idea, we carried out an fMRI study in normal observers in which we explicitly compared patterns of activation associated with the processing of the geometric structure of objects with those associated with the processing of their surface properties, including color and texture. In experiment 1, we scanned healthy participants as they attended to the form, to the surface properties, or to the orientation of a set of unfamiliar “nonsense” objects. We reasoned that when participants attended to the form of objects, area LO would be selectively activated, whereas when they attended to the surface properties, activation would be higher in more medial regions, including the lingual and fusiform gyri and parahippocampal cortex. We included the orientation condition as a validation of this method. Work in our laboratory (James and others 2002; Valyear and others 2006) and in others (Vuilleumier and others 2002) has revealed a region in and around the caudal region of the intraparietal sulcus (IPS) extending into the parieto-occipital sulcus that is selectively recruited for processing of the orientation of objects. We expected therefore that when participants attended to the orientation of objects, activation would be highest in this parietal region.
In experiment 2, we attempted to differentiate those components of the surface-property network that are involved in the processing of an object's color from those components that are involved in the processing of its surface texture. In this experiment, participants were required to attend selectively to the form, color, or texture of the same nonsense objects that were used in experiment 1. Again, the same logic applied. We predicted, on the basis of previous neuroimaging evidence, that area LO would be activated more when participants attended to the form of objects. Although we anticipated that color and texture would both activate more medial regions (the lingual and fusiform gyri and the parahippocampal cortex), we were uncertain as to their respective patterns of activation in these regions. One might have expected that areas previously associated with color processing, such as the V4–V8 complex, would show more activation when participants attended to color as opposed to texture, but even this prediction is uncertain because there have been no investigations as to whether or not this complex also processes surface texture.
Materials and Methods
Nine healthy participants (4 males and 5 females) took part in experiment 1, and 10 participants took part in experiment 2 (3 males and 7 females). Four participants took part in both experiments. All participants (mean age = 26.27, range = 22–34 years) were right handed, reported normal or corrected-to-normal visual acuity, gave their informed consent to participate in the study in accordance with the Declaration of Helsinki, and had no history of neurological disorder. The participants were selected from undergraduate students, graduate students, research assistants, and postdoctoral fellows studying psychology or neuroscience at the University of Western Ontario. All participants were experienced in keeping still and maintaining fixation during fMRI experiments. The procedures and protocols for both experiments were approved by the Review Board for Health Sciences Research Involving Human Participants for the University of Western Ontario and the Robarts Research Institute (London, Ontario, Canada).
Face–Place–Object Localizer Stimuli
Stimuli used to localize face-, place-, and object-sensitive areas consisted of grayscale photographs of faces, various place images (furnished rooms, buildings, city landscapes, and natural landscapes such as forests, deserts, and beaches), and both living and nonliving objects. Scrambled versions of each image were also presented to participants. All categories of objects, including scrambled images, in this face–place–object (FPO) localizer were 250 × 250 pixels in size.
Stimuli used in all experimental functional runs consisted of a series of unfamiliar nonsense objects (Figs 1, 2, and 3), all of which were bilaterally symmetrical (i.e., each object had an arbitrarily assigned top, bottom, front, and back). The novel objects we used had been constructed as 3D clay models for an earlier series of perceptual studies (Humphrey and Khan 1991; Harman and Humphrey 1999). These objects were digitally photographed and were then rendered using computer software (Discreet 3DS Max, Montreal, Quebec, Canada) to give a graphical 3D depiction of the original. Each object was rendered at 640 × 480 pixels. Once rendered, different textures and colors could be applied to the object's surface.
In experiment 1, 8 different objects were used, each of which was rendered in 8 different full-color textures (chrome, gold leaf, laminated oak, stucco wall, metallic paint, particle board, marble, and tinfoil). Each object was presented in 8 different orientations with respect to the observer (see Fig. 2).
In experiment 2, 4 different objects were used, each of which was rendered in 4 different textures (metallic paint, laminated oak, marble, and tinfoil) and 4 different colors (red, blue, yellow, and green). The orientation of all objects was the same and did not vary across trials (see Fig. 3).
Stimulus presentation was controlled by Superlab Pro version 2.0.4 (Cedrus Corporation, San Pedro, CA). Each image was projected via an LCD projector (NEC VT540 [NEC Corporation, Tokyo, Japan], screen resolution of 800 × 600) onto a screen mounted above the participant's waist as he or she lay in the bore of the magnet. The participant viewed the image through a mirror angled 45° from the browline, directly above the eyes. Distance from the participant's eyes, via the mirror, to the screen was ∼60 cm. A response pad was secured around the participant's right thigh, and behavioral reaction time (RT) measures were recorded from a computer in the control room when the participant was engaged in the 1-back task of the experimental runs (described below and in Fig. 4).
The FPO localizer was used in both experiments and was designed to identify face, place, and object areas in each participant. A single run of the FPO localizer consisted of randomly presented experimental blocks of intact face, place, or object stimuli (4 blocks of each) interleaved with blocks of scrambled images from each category. Two separate runs were carried out for each participant, 1 at the beginning of the session and 1 halfway through the experimental runs. Each run had a unique order of experimental block presentation, and the run orders were counterbalanced across participants. Each run lasted 6.44 min, starting and ending with the presentation of epochs of scrambled images. Participants were instructed to maintain central fixation while passively viewing the images being projected to them. The quantity and temporal parameters of the images presented were consistent across the different categories of stimulus blocks. That is, each face, place, object, and scrambled block contained 32 images, each was presented for 400 ms, and each was followed by a 50-ms interstimulus interval, yielding 14.4-s stimulus blocks. No images were repeated within or across blocks.
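The block duration reported above follows directly from the per-image timing. The short Python sketch below is only a consistency check on those reported values; the function names are illustrative and are not taken from the stimulus-presentation software used in the study.

```python
# Timing of one FPO localizer block, using the values reported in the text.
IMAGES_PER_BLOCK = 32
IMAGE_DURATION_MS = 400
ISI_MS = 50


def block_duration_s(n_images, image_ms, isi_ms):
    """Duration of a block (s) when every image is followed by one ISI."""
    return n_images * (image_ms + isi_ms) / 1000


def image_onsets_ms(n_images, image_ms, isi_ms):
    """Onset of each image, in ms from the start of the block."""
    return [i * (image_ms + isi_ms) for i in range(n_images)]


duration = block_duration_s(IMAGES_PER_BLOCK, IMAGE_DURATION_MS, ISI_MS)
onsets = image_onsets_ms(IMAGES_PER_BLOCK, IMAGE_DURATION_MS, ISI_MS)
print(duration)  # 14.4
```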
Prior to entering the magnet, participants had already been shown an example of each form, surface property, and orientation that they would encounter in the experimental runs. Each surface property had been made explicit via verbal instruction from the experimenter (i.e., “this is gold, this is tinfoil”).
In each experimental run, presentations of 16-s experimental blocks were interleaved with 12-s fixation blocks. Immediately after each fixation period, a 4-s instructional period was presented, wherein a cue was given to participants informing them explicitly to attend to a particular stimulus dimension in the ensuing experimental block (e.g., the word “form,” “texture,” or “orientation” appeared centrally, instructing the participant to attend to that particular aspect of the forthcoming stimuli; see Fig. 4). In asking participants to attend to the object's texture, we emphasized that we wanted them to attend to what the object was made of.
To ensure that participants paid attention to the correct stimulus cue, a trial-by-trial 1-back task (adapted from Corbetta and others 1990) was employed, where participants were instructed to press a button if in a pair of stimuli the second image contained the same form, texture, or orientation as the first (depending on what they were instructed to attend to). In a single trial, the first object was presented for 600 ms followed by a briefly flashed (200 ms) blank screen, then the second image was presented (also for 600 ms), and the trial ended with 600 ms of blank screen (to provide adequate time to prepare for the next trial). Participants were instructed to respond as soon as the second image appeared. Thus, each trial in an experimental block lasted 2 s, and there were 8 trials in total (4 “same” and 4 “different” trials), yielding 16 images presented during a 16-s-long experimental block. The number of trials in an entire run was balanced so that there were roughly equal numbers of trials where 0, 1, 2, or all 3 stimulus dimensions changed upon presentation of the second image. Blocks of each experimental task (i.e., attention to form, surface properties, or orientation) were randomly presented 4 times throughout each run, and there were a total of 8 unique run orders (1 run order for each functional scan undertaken; each run lasted 6.44 min). Presentation of all 8 run orders was counterbalanced across participants. It is important to note that throughout the 8 functional scans, the visual input across the 3 experimental tasks was identical (save for the order of presentation across scans); all that was manipulated was the deployment of attention to a particular stimulus attribute, a manipulation which has been shown to reliably increase the neural response of cortical regions that process the attended dimension (Corbetta and others 1990; Murray and Wojciulik 2004).
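The trial and block structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Superlab script used in the study: the helper names are hypothetical, stimulus levels are coded as integers 0 to 7, and the unattended dimensions are drawn at random here rather than balanced across 0, 1, 2, or 3 changes as in the actual runs.

```python
import random

# Event durations within one trial (ms), as reported in the text.
TRIAL_EVENTS = [("first_image", 600), ("blank", 200),
                ("second_image", 600), ("blank", 600)]
DIMENSIONS = ("form", "texture", "orientation")
N_LEVELS = 8  # experiment 1 used 8 objects, 8 textures, and 8 orientations


def trial_duration_ms():
    """Total duration of one trial in ms."""
    return sum(duration for _, duration in TRIAL_EVENTS)


def make_trial(attended, same, rng):
    """Return (first, second) stimuli; `same` refers to the attended dimension."""
    first = {d: rng.randrange(N_LEVELS) for d in DIMENSIONS}
    # Assumption: unattended dimensions vary freely between the two images.
    second = {d: rng.randrange(N_LEVELS) for d in DIMENSIONS}
    if same:
        second[attended] = first[attended]
    else:
        others = [v for v in range(N_LEVELS) if v != first[attended]]
        second[attended] = rng.choice(others)
    return first, second


def make_block(attended, rng, n_trials=8):
    """Build one block with equal numbers of 'same' and 'different' trials."""
    labels = [True] * (n_trials // 2) + [False] * (n_trials // 2)
    rng.shuffle(labels)
    return [(same,) + make_trial(attended, same, rng) for same in labels]


rng = random.Random(0)
block = make_block("texture", rng)
block_duration_s = len(block) * trial_duration_ms() / 1000
```

With the reported timings, each trial lasts 2000 ms, so an 8-trial block occupies the 16 s stated in the text.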
The procedures for the functional runs in experiment 2 were identical to those of experiment 1, save for a color condition being substituted for the orientation condition. In this experiment, the instruction “texture” referred to the texture independent of the object's color, and the instruction “color” referred to the color independent of the object's surface texture.
Magnetic Resonance Imaging Acquisition
The experiments were carried out with a 4.0-T Siemens-Varian (Erlangen, Germany; Palo Alto, CA) whole-body magnetic resonance imaging system at the Robarts Research Institute, using a radiofrequency head coil to collect blood oxygenation level–dependent (BOLD) weighted images (Ogawa and others 1992). A series of sagittal T1-weighted test images were collected for each participant to select 18 contiguous, 5-mm-thick functional slices of either coronal (experiment 1) or axial (experiment 2) orientation. Functional volumes were collected using a T2*-weighted, navigator echo-corrected, slice-interleaved multishot (2 shots) spiral imaging pulse sequence (volume acquisition time = 2 s, 200 volumes collected per imaging run, repetition time [TR] = 1000 ms, 64 × 64 matrix size, flip angle = 45°, echo time [TE] = 15 ms, field of view = 19.2 cm, 3.0 × 3.0 × 5–mm voxel size). After all the functional scans were completed, T1-weighted anatomical images were collected with either coronal (experiment 1) or axial (experiment 2) slice orientation (3D spiral acquisition with inversion time = 1300 ms, TE = 3 ms, TR = 50 ms, 256 × 256 matrix × 120 slices, 0.75 × 0.75 × 2.5–mm voxel size).
Data analyses were carried out using Brain Voyager 2000 and Brain Voyager QX software packages (Brain Innovation, Maastricht, The Netherlands). Imaging data were preprocessed by applying a linear trend removal to the functional data and transforming the anatomical volumes into a common stereotaxic space (Talairach and Tournoux 1988). The imaging data were not subjected to spatial smoothing or a motion-correction algorithm. All functional volumes were superimposed onto an anatomical depiction of the brain. Data from the FPO localizer and the experimental runs for both experiments were analyzed using a general linear model (GLM) approach, accounting for hemodynamic lag (Friston and others 1995). Predictor variables were created for each condition in the localizer and experimental scans (FPO: faces, places, and objects; experiment 1: form, surface properties, and orientation; experiment 2: form, texture, and color). Across all scans, activated voxels were identified by means of a t-test contrasting the predictors in the regression equation against a fixed baseline level of activation (scrambled images for the FPO localizer, fixation epochs for the experimental task; t(13130) = 4.0, P < 6.4 × 10^−5, uncorrected). Using this method of analysis, we identified significantly active face (faces vs. scrambled images), place (places vs. scrambled images), and object (objects vs. scrambled images) areas of cortex from the localizer scans. Form-selective (activation to object form vs. both texture and orientation), surface-property–selective (texture vs. both form and orientation), and orientation-selective (orientation vs. both form and texture) regions were identified from the functional scans of the first experiment. In experiment 2, form-selective (activation to object form vs. both texture and color), texture-selective (texture vs. both form and color), and color-selective (color vs. both form and texture) regions were identified.
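The logic of a GLM contrast of this kind can be illustrated with a small NumPy sketch for a single simulated voxel. Everything below is an assumption made for illustration: the single-gamma response shape, the fixed block order, the simulated data, and the helper names are not taken from Brain Voyager or from the study. Only the volume time, the number of volumes, and the block, fixation, and cue durations come from the text.

```python
import numpy as np

VOLUME_TIME_S = 2.0      # volume acquisition time reported in the text
N_VOLUMES = 200          # volumes per run reported in the text
BLOCK_S, FIX_S, CUE_S = 16.0, 12.0, 4.0


def hrf_kernel(duration_s=30.0, shape=6.0):
    """Single-gamma response sampled once per volume (assumed shape), peak = 1."""
    t = np.arange(0.0, duration_s, VOLUME_TIME_S)
    h = t ** (shape - 1.0) * np.exp(-t)
    return h / h.max()


def make_predictor(onsets_s):
    """Boxcar for one condition convolved with the assumed response."""
    box = np.zeros(N_VOLUMES)
    for onset in onsets_s:
        start = int(round(onset / VOLUME_TIME_S))
        box[start:start + int(round(BLOCK_S / VOLUME_TIME_S))] = 1.0
    return np.convolve(box, hrf_kernel())[:N_VOLUMES]


# Hypothetical fixed block order; the study randomised the order in each run.
conditions = ["form", "texture", "orientation"]
onsets = {c: [] for c in conditions}
for i in range(12):
    onsets[conditions[i % 3]].append(FIX_S + CUE_S + i * (FIX_S + CUE_S + BLOCK_S))

# Design matrix: one predictor per condition plus a constant term.
X = np.column_stack([make_predictor(onsets[c]) for c in conditions]
                    + [np.ones(N_VOLUMES)])

# Simulated voxel that responds most strongly to form (illustrative data only).
rng = np.random.default_rng(0)
y = 100.0 + X[:, :3] @ np.array([2.0, 1.0, 1.0]) + rng.normal(0.0, 1.0, N_VOLUMES)

# Ordinary least-squares fit and residual variance.
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
sigma2 = residuals @ residuals / (N_VOLUMES - rank)


def contrast_t(c):
    """t statistic for a linear contrast of the fitted predictors."""
    c = np.asarray(c, dtype=float)
    return (c @ beta) / np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))


# Form versus both texture and orientation.
t_form = contrast_t([2, -1, -1, 0])
```

The contrast weights [2, -1, -1, 0] express "form versus both other conditions" in the same sense as the selectivity contrasts described above; the constant term receives a weight of zero.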
A voxelwise analysis was conducted first, in which data from the localizer scans were not used. We calculated significance levels by taking into account the minimum cluster size and the probability threshold of a false detection of any given cluster (Alphasim, by B. Douglas Ward, a software module in Cox 1996). Through a series of Monte Carlo simulations, Alphasim outputs information regarding how large a particular cluster must be to be considered significantly active at a particular threshold value (i.e., Alphasim calculates the probability of a false detection). Clusters of cortex identified by t-tests contrasting the predictors in the regression equation (3 experimental contrasts in each experiment described above) satisfied the criteria for significance at the level of P < 0.001, corrected. Event-related averages were then extracted from each region of cortex. The activation levels for each condition in both experiments were measured as percent BOLD signal change from a baseline, which was defined as the activation in a 4-s window that extended from 8 to 4 s before onset of the instructional cue. This 4-s window corresponded to the activity that was present in the previous fixation block. The event-related averages for each experimental condition (experiment 1: activation to form, surface properties, and orientation averaged separately across all scan sessions for all participants; experiment 2: activation to form, texture, and color) were subjected to a 1-way repeated-measures analysis of variance (ANOVA) performed separately on each hemisphere on a region-by-region basis (SPSS software package, Chicago, IL). The significant main effect of condition (form, surface properties, and orientation in experiment 1; form, texture, and color in experiment 2) was investigated using post hoc t-tests, employing a Bonferroni correction for multiple comparisons. 
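The baseline correction described above can be sketched as follows. This is a minimal pure-Python illustration, assuming one sample per 2-s volume and a hypothetical 20-s epoch length measured from cue onset; the function names and the toy time course are illustrative and do not reproduce the Brain Voyager implementation.

```python
VOLUME_TIME_S = 2.0  # one sample per volume, as reported in the text


def percent_signal_change(timecourse, cue_onset_s, epoch_len_s=20.0):
    """Percent BOLD change for one epoch, baselined 8 to 4 s before the cue."""
    cue_idx = int(cue_onset_s / VOLUME_TIME_S)
    base_start = cue_idx - int(8 / VOLUME_TIME_S)
    base_stop = cue_idx - int(4 / VOLUME_TIME_S)  # exclusive upper bound
    baseline_vals = timecourse[base_start:base_stop]
    baseline = sum(baseline_vals) / len(baseline_vals)
    n_samples = int(epoch_len_s / VOLUME_TIME_S)  # assumed epoch length
    epoch = timecourse[cue_idx:cue_idx + n_samples]
    return [100.0 * (v - baseline) / baseline for v in epoch]


def event_related_average(timecourse, cue_onsets_s, epoch_len_s=20.0):
    """Average the baselined epochs sample by sample."""
    epochs = [percent_signal_change(timecourse, onset, epoch_len_s)
              for onset in cue_onsets_s]
    return [sum(vals) / len(vals) for vals in zip(*epochs)]


# Toy time course: signal of 1000 at rest, 1020 during two 16-s blocks.
signal = [1000.0] * 40
for block_start in (8, 24):            # volume index where each block begins
    for i in range(block_start, block_start + 8):
        signal[i] = 1020.0

average = event_related_average(signal, cue_onsets_s=[12.0, 44.0])
```

In the toy example the cue precedes each block by 4 s, so the first two samples of each epoch sit at baseline and the remaining samples show a 2% change.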
Finally, using Brain Voyager QX, conditions from all scans were color coded and superimposed onto an anatomical representation of one participant's brain to depict the overlap between the experimental and averaged localizer data (done separately for each experiment, see Fig. 7).
Region of Interest Analysis
We also used a region of interest (ROI) approach on a single-participant basis (Hasson and others 2003). Using this method, the FPO localizer was used to identify face, place, and object areas. The threshold level of activation was manipulated for each contrast so as to sample the peak level of activity in each ROI (all P values using this method of analysis were Bonferroni corrected for multiple comparisons). Thus, the selection of each ROI was based on a combination of the probability threshold for a false detection and the size of a given cluster. Each of these had to be manipulated to sample the peak focus of activity in each cluster. Event-related averaging time courses from the experimental data (experiment 1: the activation for form, surface properties, and orientation for a single participant, averaged separately across all experimental runs and baselined using a 4-s window of activation that corresponded to neural processing during the previous fixation block; experiment 2: the activation for form, texture, and color) were then extracted from these ROIs, and further analyses were conducted to assess whether the level of activation from 1 experimental condition was significantly different from that of the other 2 (e.g., in the first experiment, whether the activation resulting from attention to form differed from the activation for surface properties and orientation in the independently identified object-sensitive region from the FPO localizer). Using this method, activation from the experimental conditions was independent of the statistical test used to identify each category-sensitive region of cortex.
Behavioral data from both experiments were analyzed by importing RT measures recorded by the Superlab Pro software package into Microsoft Excel (Microsoft Corporation, Redmond, WA). After sorting correct trials by condition, RT data were subjected to a 1-way repeated-measures ANOVA. The numbers of misses and false positives were analyzed in the same manner.
Voxelwise Analysis: Processing of Form, Surface Properties, and Orientation
The voxelwise analysis examined the BOLD activity averaged across 9 participants for all possible comparisons between the 3 experimental conditions (i.e., attention to form, surface properties, and orientation, baselined against the activity from the fixation blocks). For illustrative purposes, the group data (which were derived after transformation into Talairach space) were mapped onto a single participant's anatomical brain scan. It should be noted, however, that this method of illustration does not take into account the differences in sulcal patterns across participants. As Figure 5A illustrates, form- and surface-property–specific cortical regions were confined primarily to ventral occipitotemporal cortex, and orientation-specific regions were found predominantly in parietal cortex.
In total, 6 regions of cortex met the Alphasim criteria for significance (based on a combination of the cluster size and probability threshold for false detection of each region). Of these 6 regions, 4 were found bilaterally, whereas the activations for the other 2 were localized unilaterally in the left hemisphere (Table 1). Two regions showed bilateral activation in the ventral stream (Fig. 5A). One region showed selective activation for form (as compared with surface properties and orientation); this region appeared to correspond to what has been termed area LO, t(13130) = 4.00, P < 0.001. A second ventral stream region showed selective activation for surface properties (as compared with form and orientation); this region was located in the CoS, t(13130) = 4.00, P < 0.001. Two regions in the dorsal stream showed bilateral activation that was selective for orientation (as compared with form and surface properties). One region appeared to be similar to the one that James and others (2002) called the human homologue of monkey caudal intraparietal sulcus (cIPS); the second region appeared to correspond to what Culham and others (2003) have termed the anterior intraparietal sulcus (AIP), t(13130) = 6.00, P < 0.001, for both regions. Two unilateral clusters of activation were identified by the voxelwise analysis: a region within the left inferior occipital gyrus (IOG), which showed selective activation for surface properties as compared with form and orientation, t(13130) = 4.00, P < 0.001, and a region within left primary motor cortex (M1), which showed selective activation for orientation as compared with form and surface properties, t(13130) = 6.00, P < 0.001 (see Fig. 5A).
| x | y | z | t value | Cluster size (number of voxels/27 mm^3) |
Note: L, left; R, right; M1, primary motor cortex.
The time courses of the percent BOLD signal change (compared with baseline fixation epochs) for each condition (form, surface properties, and orientation) were extracted from each significantly active region by means of event-related averaging in Brain Voyager. The integrated area under the curve of each of these time courses for each region was calculated, and the resulting measures were then subjected to a 1-way repeated-measures ANOVA to detect overall differences in activation across the conditions (performed separately in each hemisphere and region). To account for hemodynamic lag (i.e., the delay between stimulus onset and the rise in the BOLD signal), the first 2 data points for each waveform were not included in the calculation of the area under the curve. If the ANOVA was significant, the 3 conditions (i.e., form, surface properties, and orientation) were further contrasted by means of post hoc t-tests, Bonferroni corrected (P < 0.05) for multiple comparisons. Of course, based on our Brain Voyager criteria for identifying regions of cortex selectively involved in processing a given stimulus attribute (e.g., a form-selective region was identified by contrasting the activation for form against the activation for surface properties and orientation), we would certainly expect that post hoc t-tests in these regions would mirror the pattern of selectivity revealed by the a priori Brain Voyager contrasts. These post hoc analyses are important, however, in that they show whether the activations associated with the other 2 stimulus dimensions differ significantly from each other. (For a detailed summary of all the post hoc statistical results, see Supplementary Table 1.)
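The two computational steps in this paragraph, the area under the curve with the first 2 points excluded and the 1-way repeated-measures ANOVA, can be sketched in pure Python. This is an illustration only: the trapezoidal rule is an assumed integration method, the example scores are invented, no sphericity correction is applied, and the function names are hypothetical rather than drawn from Brain Voyager or SPSS.

```python
def area_under_curve(waveform, dt_s=2.0, skip=2):
    """Trapezoidal area, ignoring the first `skip` points (hemodynamic lag)."""
    pts = waveform[skip:]
    return sum((a + b) / 2.0 * dt_s for a, b in zip(pts, pts[1:]))


def rm_anova_f(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the score of participant s in condition c.
    Returns (F, df_condition, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold under a Bonferroni correction."""
    return alpha / n_comparisons


# Invented example: 4 participants x 3 conditions (form, surface, orientation).
example_scores = [[5, 3, 2], [6, 4, 2], [7, 4, 3], [6, 3, 1]]
f_value, df1, df2 = rm_anova_f(example_scores)
auc = area_under_curve([0, 0, 1, 2, 2, 1, 0])
per_test_alpha = bonferroni_alpha(0.05, 3)  # 3 pairwise post hoc comparisons
```

Because the participant term is removed from the error variance, the F ratio reflects only within-participant differences across conditions, which is the reason a repeated-measures design is used for these comparisons.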
As expected, the main effect of condition was significant in each region investigated. Post hoc analyses revealed that area LO showed higher activation for object form in both hemispheres, compared with both surface properties and orientation, which did not differ from one another in either hemisphere. These differences and the results of all the other comparisons are illustrated in Figure 5B. In the CoS, BOLD activity was significantly higher for surface properties compared with both form (only in the right CoS, but the contrast between surface property and form activation in the left CoS approached significance) and orientation (bilaterally). In addition, activation in the CoS associated with attending to form was significantly higher than the activation associated with orientation in both hemispheres. The left IOG also showed the highest activation when participants attended to the surface properties of the experimental stimuli; form activation was also significantly higher than orientation activation in this region. The region we identified as area cIPS in the posterior parietal cortex showed more activation for orientation than for form (only in the right cIPS, but the contrast between orientation and form in the left cIPS approached significance) and surface properties (bilaterally). In both the left and right cIPS, form activation was significantly higher than surface-property activation. In the region corresponding to area AIP, the activation associated with orientation was also higher than the activation associated with either form or surface properties in both hemispheres. Again, the activation associated with form in area AIP was significantly higher than the activation associated with surface properties in both hemispheres. Finally, the left M1 also showed higher activation for orientation than for form or surface properties, and the activation associated with form was significantly higher than that associated with surface properties.
(The pattern of activation observed in M1 is difficult to explain. Perhaps engaging in any kind of hand response [such as the button press in the experimental task] will activate M1, but explicitly attending to a stimulus dimension that is more appropriate to an “actual” hand movement will facilitate the activity in this region. That is, producing a motor response while attending to the orientation of an object [rather than its form or surface properties] might potentiate activity in M1 by more directly engaging related visuomotor networks in the dorsal stream. In this regard, it is interesting to note that activity in M1 was observed only in the contralateral hemisphere to the dominant grasping hand [all participants were right handed].)
ROI Analysis: Processing of Faces, Places, and Objects
The FPO localizer was used to independently localize face-, place-, and object-sensitive cortical regions, respectively. These regions were identified by means of t-tests contrasting various predictors in the regression equation for the FPO localizer (initially, P < 0.000064, uncorrected). Once cortical regions were identified, the probability threshold for significant activation and the size of each cluster were manipulated in order to sample the peak focus of activity in each region (ranging from P < 0.05 to P < 0.0001 across all regions, Bonferroni corrected for multiple comparisons). Event-related time courses corresponding to activity from the experimental task were then extracted from each brain region (baselined against the activation from the previous experimental fixation block). The integrated area under the curve of each of these time courses for each region was calculated, and the resulting measures were then subjected to an ANOVA to detect overall differences in activation across the conditions (number of participants in each region × experimental condition: form, surface properties, and orientation), conducted separately for each hemisphere on a region-by-region basis. To account for hemodynamic lag, the first 2 data points for each waveform were not included in the calculation of the area under the curve.
The main effects of participant, experimental condition, and the participant-by-experimental condition interaction were significant for all cortical regions identified using the ROI approach. Post hoc main effects analyses were performed on the levels of activation for the 3 experimental conditions in each region, Bonferroni corrected for multiple comparisons, P < 0.05. (For a detailed summary of all the post hoc statistical results in this analysis, see Supplementary Table 2.) The main effects of participant and the participant-by-experimental condition interaction were not analyzed further as these data represented individual differences and were not of interest to the present study.
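For readers unfamiliar with the correction, the Bonferroni procedure for the 3 pairwise contrasts in each region amounts to multiplying each uncorrected P value by the number of comparisons (capped at 1) before comparing it with 0.05. The sketch below uses made-up P values and a hypothetical helper name, `bonferroni`; it is not the output of the analysis software used in the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P values (each multiplied by the
    number of comparisons and capped at 1.0)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up uncorrected P values for 3 pairwise contrasts in one region.
raw = [0.004, 0.02, 0.30]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

In this illustration only the first contrast survives correction, even though the second would have passed an uncorrected threshold of 0.05.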
Regions that were independently localized by the FPO localizer and corresponded with regions thoroughly investigated in the neuroimaging literature included area LO (objects vs. scrambled images), IOG (faces vs. scrambled), the fusiform gyrus (faces vs. scrambled), the CoS (places vs. scrambled), and the parahippocampal gyrus (places vs. scrambled). Figure 6A shows examples of the 5 ROIs identified in individual brains (3 of these ROIs coincide with the major cortical regions identified in the voxelwise analysis). Figure 6B summarizes the differences in activation for form, surface properties, and orientation in all these ROIs. Supplementary Table 3 presents the Talairach coordinates, the statistical thresholds, and the cluster sizes for each of these ROIs (on a subject-by-subject basis). Activity in area LO was strongly modulated by attention to object form. This was particularly evident in the left hemisphere where form activation was significantly higher than both surface-property and orientation activations. In the right hemisphere, form activation was significantly higher than orientation, but the difference between form and surface-property activations only approached significance. Orientation activation in area LO was significantly higher than surface-property activation in the left hemisphere but not the right. Activation in the CoS was significantly higher for surface properties than for either form or orientation—and this was the case in both hemispheres. In addition, activation associated with form in the CoS was significantly higher than that associated with orientation in both hemispheres. The evidence for surface-property selectivity was less consistent in the parahippocampal gyrus, the fusiform gyrus, and the IOG. 
Although all areas showed higher activation for surface properties than they did for orientation, the activation for surface properties did not differ from the activation for form in any of these 3 areas in the right hemisphere, and the activation for surface properties was actually significantly lower than the activation for form in the left parahippocampal and the left fusiform areas. It should be noted, however, that the magnitude of the difference in activation for form versus surface properties in the left fusiform gyrus and the left parahippocampal gyrus was not nearly as compelling as it was in area LO. In summary, there appears to be a shift in the relative weighting of activation associated with surface-property processing (compared with form or orientation processing) as one moves ventrally, anteriorly, and medially from area LO. Finally, it should be noted that the activation associated with orientation processing in most of these ventral stream ROIs was lower than the activation associated with the processing of surface properties or form, a pattern that was also evident in the voxelwise analysis.
Superimposition of the Voxelwise and Localizer Analyses
To compare the extent of overlap between the regions identified in the voxelwise analysis and the functional areas identified in the ROI analysis, Brain Voyager QX software was used to superimpose the results from the 2 types of analyses onto a single participant's brain. For the sake of clarity, localizer scans from all 9 participants were combined in a single GLM analysis, and the resulting ROIs were superimposed over the voxelwise results (Fig. 7A), rather than color coding and overlaying data from the 9 individual participants. This superimposition revealed at least 2 interesting observations.
First, form-specific regions identified in the voxelwise analysis (form contrasted against surface properties and orientation, t(13130) = 4.00, P < 0.001, Alphasim corrected) fall entirely within object regions independently recruited by the FPO localizer (objects vs. scrambled images, t(2783) = 12.00, P < 0.0001, Bonferroni corrected). Interestingly, both of these functionally identified regions correspond nicely with the anatomical boundaries of area LO. Second, surface-property regions identified in the voxelwise analysis (surface properties contrasted against form and orientation, t(13130) = 4.00, P < 0.001, Alphasim corrected) overlapped to some degree with the face and place areas identified by the FPO localizer (face area: faces vs. scrambled images, t(2783) = 7.20, P < 0.0001, Bonferroni corrected; place area: places vs. scrambled images, t(2783) = 7.20, P < 0.0001, Bonferroni corrected). Note how the surface-property–specific regions identified by the voxelwise analysis were distributed along the same medial–anterior axis in the fusiform and parahippocampal gyri as the layout of the face- and place-selective regions identified by the ROI analysis. The correspondence between the surface-property–specific regions and these ROIs was more evident in the place-selective than the face-selective regions.
Of course, a groupwise analysis can overestimate the degree of overlap because of the inevitable smoothing of data that occurs when the activation for the individual participants is summed. Therefore, we also carried out an additional participant-by-participant analysis in which we examined the degree of overlap between the regions identified by the voxelwise analysis and the functional ROIs in each individual. Even though the individual voxelwise maps were inherently noisier than the group map, the patterns of overlap were remarkably similar to those revealed by the group analysis (see Table 7 in supplementary material).
Behavioral analyses were conducted on the RT data, the number of misses, and the number of false positives from the experimental task averaged across all 9 participants. Each category of data was subjected to a 1-way repeated-measures ANOVA, alpha = 0.05 (experimental conditions: form, surface properties, and orientation; dependent measures: RT, number of misses, and number of false positives, respectively). Pairwise post hoc comparisons were performed on significant main effects using the Bonferroni procedure to correct for multiple comparisons, alpha = 0.05. In the RT data analysis, trials on which the participant's response was more than 3 standard deviations above or below the mean response were excluded from analysis (this was done individually for each participant). Accuracy on the experimental task ranged from 78.65% to 96.61% correct. Accuracy averaged across all 9 participants was 86.84% correct.
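The RT trimming rule described above can be sketched as follows for a single participant. The function name `trim_rts`, the use of the sample standard deviation, and the pooling of all of a participant's trials across conditions are assumptions; the text states only that the exclusion was done individually for each participant. The RT values are made up.

```python
from statistics import mean, stdev


def trim_rts(rts, n_sd=3.0):
    """Drop response times lying more than n_sd sample standard
    deviations above or below this participant's mean RT."""
    m = mean(rts)
    sd = stdev(rts)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [rt for rt in rts if lower <= rt <= upper]


# Made-up RTs (ms) for one participant, with one very slow response.
rts = [390, 400, 410] * 7 + [2000]
trimmed = trim_rts(rts)
```

Note that the mean and standard deviation are computed before exclusion, so a single extreme trial inflates the cutoff; with very few trials, no value can lie 3 standard deviations from the mean.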
No significant differences were found in response latency (form: mean [M] = 392.03, standard error of the mean [SEM] = 20.71; surface properties: M = 393.15, SEM = 21.51; and orientation: M = 397.68, SEM = 14.63) between the 3 experimental conditions (F(2, 16) = 0.13, P > 0.87, mean square error [MSE] = 606.53). Participants did differ, however, on the number of misses (F(2, 16) = 14.59, P < 0.001, MSE = 27.95). Specifically, post hoc t-tests revealed that the large majority of misses occurred during the orientation discriminations (M = 22.44, SEM = 4.08) as compared with both form (M = 10.00, SEM = 2.19, t(8) = 4.48, P < 0.01) and surface-property (M = 11.78, SEM = 1.86, t(8) = 3.61, P < 0.05) judgments. No differences, however, were found between trials where form and surface-property decisions were required, t(8) = 1.21, P > 0.75. The analysis on the number of false positives made throughout the experiment yielded a significant difference between the 3 experimental conditions, F(1.11, 8.91) = 14.63, P < 0.005, MSE = 71.42 (adjusted using Greenhouse–Geisser). Participants made a significantly higher number of false positives on orientation trials (M = 17.11, SEM = 3.45) compared with form (M = 2.33, SEM = 0.85, t(8) = 4.04, P < 0.05) and surface-property (M = 4.22, SEM = 1.31, t(8) = 3.69, P < 0.05) discriminations. There were no differences, however, in number of false positives on form and surface-property trials, t(8) = 1.90, P > 0.25.
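To make the structure of these tests concrete, the following is a minimal sketch of how the F ratio and degrees of freedom of a 1-way repeated-measures ANOVA are obtained from a participants-by-conditions table. The function name `rm_anova_f` and the toy data are hypothetical; the sketch does not compute P values or apply the Greenhouse–Geisser adjustment, and it is not the software used in the study.

```python
def rm_anova_f(table):
    """One-way repeated-measures ANOVA.

    table: list of rows, one per participant; each row holds one value
    per condition. Returns (F, df_condition, df_error).
    """
    n = len(table)        # number of participants
    k = len(table[0])     # number of conditions
    grand = sum(sum(row) for row in table) / (n * k)
    cond_means = [sum(row[j] for row in table) / n for j in range(k)]
    subj_means = [sum(row) / k for row in table]

    # Partition the total variability into condition, participant,
    # and residual (condition-by-participant) components.
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Toy data: 3 participants (rows) by 3 conditions (columns).
toy = [[1, 2, 3], [2, 3, 5], [3, 5, 6]]
f_value, df_cond, df_error = rm_anova_f(toy)
```

With 9 participants and 3 conditions, the same formulas give 2 and 16 degrees of freedom, which matches the uncorrected tests reported above.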
Voxelwise Analysis: Processing of Form, Texture, and Color
The statistical procedures for the voxelwise analysis of experiment 2 were identical to those of experiment 1, with the only difference being a substitution of the orientation-discrimination condition by a color-discrimination condition. Again, the averaged data from 10 participants were mapped onto a single participant's anatomical brain scan (see Fig. 8A).
Major form-selective foci of activation (activation to form contrasted against the activation to both texture and color for all regions) included area LO (bilaterally) and the IPS (localized to the left hemisphere), t(15916) = 6.0, P < 0.001, for both contrasts (Table 2). Texture-selective regions (texture vs. form and color for all regions) included the right CoS, the right IOG, the left lingual sulcus (LS), and the left inferior temporal sulcus (ITS), t(15916) = 11.6, P < 0.001, for all contrasts. No color-selective regions were identified in the voxelwise analysis. That is, no regions were discovered where the processing of object color was significantly higher than the processing of both object form and texture. In fact, the only regions where the activation associated with color was higher than the activation associated with texture (but not form) were the left primary visual cortex (V1) and neighboring regions in the right cuneus (activation to color vs. activation to texture), t(15916) = 4.0, P < 0.001, for both regions. (Area V1 and the cuneus were defined anatomically with respect to the calcarine fissure.)
|x|y|z|t Value|Cluster size (number of voxels/27 mm³)|
Note: L, left; R, right; V1, primary visual cortex.
The main effect of condition (form, texture, and color) was significant in all regions investigated. As expected, post hoc analyses indicated that the form-selective region corresponding to area LO showed higher activation when participants attended to object form compared with both texture and color. But in addition, texture discriminations yielded significantly higher activity in this region compared with color discriminations (Fig. 8B). In the left IPS, however, there was no difference in the levels of activation associated with form and texture, although the activation for both these tasks was higher than that associated with color. Within the texture-selective regions (CoS, IOG, LS, and ITS), there was no difference in the levels of activation associated with form and color, with both showing lower activation than texture. We noted previously that no color-selective regions were discovered. Indeed, no regions were discovered where activation associated with color was higher than activation associated with form and texture, but activation to color was higher than the activation to texture in left V1 and in the right cuneus. In both these regions, activity associated with form was greater than the activity associated with texture. In the cuneus, the level of activation associated with form was also higher than that associated with color, but in left V1, the levels of activation associated with form and color did not differ. (For a detailed summary of this post hoc analysis, see Supplementary Table 4.)
ROI Analysis: Processing of Faces, Places, and Objects
The FPO localizer was again used to localize face-, place-, and object-sensitive cortical regions in all 10 participants. The results from a single run from one participant were not included because these results were corrupted by head motion. (Supplementary Table 5 presents a summary of the post hoc statistical results in this analysis, and Supplementary Table 6 presents the Talairach coordinates, the statistical thresholds, and the cluster sizes for each of the ROIs identified in this analysis [on a subject-by-subject basis].)
The main object-sensitive cortical regions (intact vs. scrambled objects) identified by the FPO localizer included area LO (bilaterally) and the fusiform gyrus (bilaterally; see Fig. 9A). In area LO, the activation patterns from the experimental paradigm revealed that the activity associated with object form and with texture was significantly higher than the activity associated with object color (form and texture did not differ significantly from each other, although the trend for higher form-related activation in the right hemisphere approached significance at P = 0.06; see Fig. 9B). In contrast, texture judgments yielded a higher BOLD response compared with form and color judgments in the object-selective region identified in the fusiform gyrus. Furthermore, activation associated with form judgments was higher in this region than activation associated with color judgments.
A clearly defined face-selective cortical region (faces vs. scrambled stimuli) was also identified along the fusiform gyrus (bilaterally). In both hemispheres, the levels of activation associated with form and texture judgments in this face-selective region were higher than those associated with color judgments, with the levels of activation associated with form and texture judgments not differing from one another.
Cortical regions found to be particularly sensitive to the processing of scenes (scenes vs. scrambled stimuli) were localized to the left and right CoS, the right parahippocampal gyrus, and the right inferior lingual gyrus (ILG). In all these cortical regions (save for the left CoS, where no significant differences were revealed), the activation associated with texture judgments was significantly higher than the activation associated with either form or color judgments, with the levels of activation associated with the latter 2 stimulus dimensions not differing from one another.
Finally, we compared activation for attention to form, color, or texture within the ROIs that were defined by activation to the scrambled (vs. the intact) images from our localizer task. In some ways, the scrambled versions of the achromatic objects used in the localizer task looked somewhat texturelike, and thus the ROIs defined by activation to scrambled (vs. intact) images could be probed for selective activation to texture versus color or form in the attention task. We found that activation to texture in the attention task was significantly higher than activation to either form or color in 2 of these regions: the left ITS and the right ventral fusiform gyrus (see Table 8 in supplementary material). Activation to color and form in these 2 regions did not differ. This finding converges to some degree with the voxelwise analysis that identified the left ITS (along with other regions) as being selective for texture.
Superimposition of the Voxelwise and Localizer Analyses
We superimposed the results of the voxelwise and ROI analyses onto a single participant's brain using the same procedures from experiment 1. This was done to compare the extent of overlap between the brain regions identified in these 2 analyses (see Fig. 7B). The results from a single run of the FPO localizer from one participant were not included in this superimposition because these results were corrupted by head motion. For the most part, the patterns of overlapping activations from the voxelwise and ROI analyses in this experiment mirrored those from the superimposition in experiment 1 but were not quite as compelling as the results from the previous experiment. First, form-specific regions of cortex identified in the voxelwise analysis (t(15916) = 6.0, P < 0.001, Alphasim corrected) fell within the boundaries of the object-selective regions independently recruited by the FPO localizer (t(3778) = 13.2, P < 0.0001, Bonferroni corrected). Again, both of these functionally identified regions correspond with the anatomical boundaries of area LO. Second, texture-specific regions identified in the voxelwise analysis (t(15916) = 11.6, P < 0.001, Alphasim corrected) overlapped to some degree (albeit to a lesser degree than the superimposition from experiment 1) with the face and place areas identified by the FPO localizer (face area: t(3778) = 10.0, P < 0.0001, Bonferroni corrected; place area: t(3778) = 10.0, P < 0.0001, Bonferroni corrected). Similar to the results from experiment 1, texture-specific regions identified by the voxelwise analysis were distributed along the same medial–anterior axis in the fusiform and parahippocampal gyri as the layout of the face- and place-selective regions identified by the ROI analysis. Note that this pattern of activation was most prominent in the right hemisphere. The correspondence between the texture-specific regions and these ROIs was more evident in the place-selective than the face-selective regions.
No overlap was revealed between the color-selective regions identified in the voxelwise analysis and the face, place, and object areas identified in the ROI analysis.
Again, a participant-by-participant analysis revealed patterns of overlap that converged on the results of this group analysis (see Table 9 in supplementary material).
Analyses were carried out on the RT data, number of misses, and the number of false positives from the experimental task averaged across all 10 participants. The procedures for these analyses were identical to those of experiment 1. Accuracy on the experimental task ranged from 71.62% to 98.44% correct. Accuracy averaged across all 10 participants was 92.71% correct.
No significant differences were found in RT (form: M = 408.59, SEM = 17.08; texture: M = 421.08, SEM = 23.63; color: M = 413.39, SEM = 14.79) between the 3 experimental conditions (F(1.23, 11.07) = 0.72, P > 0.44, MSE = 902.95, adjusted using the Greenhouse–Geisser epsilon multiplier). A significant difference was revealed, however, on the number of misses (F(2, 18) = 5.44, P < 0.02, MSE = 12.49). Specifically, post hoc t-tests revealed a higher number of misses during the texture discriminations (M = 12.10, SEM = 3.80) as compared with both form (M = 7.40, SEM = 3.04, t(9) = 2.80, P < 0.03) and color (M = 7.80, SEM = 3.38, t(9) = 2.32, P < 0.05) discriminations. No differences in the number of misses for the form and color judgments were revealed, t(9) = 0.36, P > 0.72. The analysis on the number of false positives yielded a significant difference between the 3 experimental conditions, F(1.07, 9.64) = 17.58, P < 0.002, MSE = 6.56 (adjusted using Greenhouse–Geisser). Participants committed significantly fewer false positives on form trials (M = 0.60, SEM = 0.22) compared with texture (M = 5.00, SEM = 0.70, t(9) = 6.57, P < 0.001) and color (M = 4.80, SEM = 0.55, t(9) = 7.58, P < 0.001) trials. No differences were found between the number of false positives on texture and color discriminations, t(9) = 0.17, P > 0.86.
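The fractional degrees of freedom in the Greenhouse–Geisser adjusted tests follow from scaling the uncorrected degrees of freedom by the epsilon multiplier. As a worked check for k = 3 conditions and n = 10 participants, the values of ε below are back-computed from the reported degrees of freedom; they are not reported in the text and should be read as approximations.

```latex
\begin{align*}
df_1 &= \varepsilon\,(k-1), \qquad df_2 = \varepsilon\,(k-1)(n-1), \qquad (k-1)(n-1) = 2 \times 9 = 18 \\
\text{RT:}\quad \varepsilon &\approx 1.23/2 = 0.615, \qquad df_2 \approx 0.615 \times 18 = 11.07 \\
\text{False positives:}\quad \varepsilon &\approx 1.07/2 = 0.535, \qquad df_2 \approx 0.535 \times 18 = 9.63
\end{align*}
```

The second value agrees with the reported 9.64 up to rounding of ε. For the misses, the uncorrected degrees of freedom (2 and 18) were reported, which corresponds to ε = 1.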
The results of our 2 neuroimaging experiments demonstrate that the processing of an object's form and the processing of that same object's surface properties are mediated to a large extent by separate regions within the ventral stream. In experiment 1, we showed that the processing of form was largely localized to area LO, whereas the processing of surface properties was largely localized to more medial regions within the IOG and the CoS. In addition, we showed that the processing of object orientation (as compared with form and surface properties) activated regions within the dorsal stream, particularly areas cIPS and AIP. This latter result, which converges on the findings of a number of other studies (Vuilleumier and others 2002; James and others 2003; Valyear and others 2006), provides a convincing confirmation of our attentional paradigm.
In making these arguments, we do not mean to suggest that the areas we identified respond only to single attributes of objects. In other words, we do not mean to imply that area LO, for example, processes “only” the form of objects but rather that it is the ventral stream area that is activated most strongly when attention is directed to object form. In short, area LO would appear to be the strongest “form” node in a network of processing that deals with multiple attributes of objects (i.e., form, surface properties, and orientation). Moreover, it should be remembered that even when participants were attending to other attributes, the form of the object was still present. That is, the activation in area LO during the trials in which the participant was attending to surface properties or orientation could have arisen from the obligatory processing of form—even when form was not the targeted attribute on those trials. The same argument applies to those regions showing differential activation to surface properties and orientation.
In experiment 2, we attempted to differentiate those components of the surface-property network that are involved in the processing of an object's color from those components that are involved in the processing of its surface texture and how both of these differed from the processing of object form. Once more, we showed that area LO was particularly involved in the processing of object form and that more medial regions were involved in the processing of surface properties. This was true even for an object-selective region (defined by the FPO localizer) that was located in the fusiform gyrus medial to area LO. However, these medial regions, including the IOG, the fusiform gyrus, the LS, the ILG, the ITS, the parahippocampal gyrus, and the CoS, appeared to be texture specific rather than color specific. In fact, we identified no color-specific regions anywhere in the ventral stream (i.e., no regions were revealed when the activation for color was contrasted against the activation for both form and texture). Color-selective activation was observed, however, in V1 and the cuneus. Indeed, these 2 regions were the only places where activation to an object's color was higher than the activation to an object's texture. This suggests that the visual system may extract the color of objects relatively early in visual processing, whereas information about texture, perhaps because it is more complex, requires the participation of higher order visual areas.
Other imaging studies have also found evidence for activation in area V1 for chromatic (vs. achromatic) stimuli (e.g., Engel and others 1997; Beauchamp and others 1999). But in addition, many studies have found evidence for color processing in higher order areas such as the lingual gyrus and CoS (for review, see Grill-Spector and Malach 2004). In the majority of these experiments (e.g., Corbetta and others 1990; Beauchamp and others 1999), however, activation to color stimuli was never directly contrasted with activation to surface texture. Indeed, this failure to distinguish between the processing of color and the processing of surface texture appears to be a persistent confound in many human imaging and monkey neurophysiology studies. In real-world situations, texture and color are often inextricably linked. Thus, disentangling the processing of surface properties and the abstraction of different features such as texture and color will require a good deal of further experimentation.
Processing of the Surface Properties of Objects
The medial regions of the ventral stream implicated in the processing of surface properties coincide to a large extent with those regions that have been implicated in the processing of faces (IOG, fusiform gyrus) and places (LS, ILG, parahippocampal gyrus, CoS). This was demonstrated most clearly when the regions revealed by the voxelwise analysis were superimposed onto face and place areas identified by the ROI analysis, the same areas that have been identified in a large number of neuroimaging studies (e.g., Kanwisher and others 1997; Epstein and Kanwisher 1998; Epstein and others 1999; Hasson and others 2003). Moreover, from this superimposition, it is evident that the overlap is more striking in the case of place rather than face regions.
One of the most consistent findings from the 2 experiments conducted in this study is the finding that the CoS, particularly in the right hemisphere, preferentially processes the surface properties of objects. This result fits nicely with the findings of other studies showing that this region responds specifically to texture patterns when compared with either faces and letterstrings (Puce and others 1996) or 3D shape and 3D motion (Peuskens and others 2004). This converging evidence, which strongly suggests that CoS is responsive to surface properties, is particularly interesting in light of the fact that this region has been associated with the preferential processing of scenes (Epstein and Kanwisher 1998; Epstein and others 1999). In fact, the CoS is typically included in the anatomical boundaries of the heavily studied parahippocampal place area (PPA). This functional area also extends into the parahippocampal gyrus of course. Indeed, that component of the PPA identified by ROI analysis within the parahippocampal gyrus in experiment 2 showed evidence of texture-specific processing.
There is considerable evidence in the behavioral literature that surface properties such as color and texture aid in scene perception. For example, scene recognition has been found to be faster for color as opposed to black-and-white images of natural scenes (Gegenfurtner and Rieger 2000). In addition, it has been repeatedly demonstrated that surface-based cues can be used to categorize scenes (scene gist) without the need for identifying the particular objects in those scenes (Biederman and others 1982; Schyns and Oliva 1994, 1997; Moller and Hurlbert 1996; Oliva and Schyns 1997, 2000; Vailaya and others 1998; Oliva and Torralba 2001). Not many neuroimaging studies, however, have examined the possibility that the areas involved in scene perception also show selective processing of surface properties (but see Steeves and others 2004). To our knowledge, the present study is the first to show converging evidence for surface property and scene-specific processing in that part of the CoS adjacent to the parahippocampal gyrus, from both a grouped voxelwise analysis and independent single-participant ROI-based analyses. This leads one to question whether the PPA should continue to be considered solely a “place” area or should also be considered an area that has a special role to play in the analysis of surface properties. But whatever the case might be, it is clear that surface properties have an important role to play in scene recognition.
In experiment 2, it was revealed that the LS and the ILG were more sensitive to texture than to color or form. It is of interest to note that there was also a hint of texture selectivity in the lingual gyrus in the Puce and others (1996) study discussed above. Aguirre and others (1998) have described a “landmark” area in the lingual gyrus. This area, like the PPA, showed greater activation when people were presented with pictures of buildings compared with faces and objects. Epstein and others (1999) have suggested that this region may play a particularly important role in the recognition of places as compared with the PPA proper, which they see as playing a role in encoding place information into memory. Whatever the respective roles of the place areas in the lingual gyrus and parahippocampal cortex might be, the results of the present neuroimaging study provide evidence that these regions contain networks that extract information about the surface properties of objects in the visual array.
Another region of extrastriate cortex that was responsive to differences in the surface properties of objects, particularly their texture, was localized to the IOG. This same region was identified using the FPO localizer as one that was particularly responsive to face stimuli. Indeed, the IOG has been described as one of a number of face-selective regions in several other studies (compared with letter strings and textures: Puce and others 1996; facial identity compared with direction of gaze: Hoffman and Haxby 2000; compared with houses and chairs: Ishai and others 2000), and this specific region of the occipital lobe has been termed the “occipital face area” by one group of researchers (Gauthier and others 2000). In other words, there is a region in the IOG that responds both to faces and to the surface properties of objects (paralleling what was observed for scenes and surface properties in the CoS and parahippocampal gyrus).
We also found a second face-selective region in the fusiform gyrus, which appears to correspond to what has been called the fusiform face area (FFA; Puce and others 1996; Kanwisher and others 1997; Grill-Spector and others 2004). The evidence for surface-property specificity in this independently identified ROI was not as clear-cut as it was in the IOG. That is, in all but one case, the activations to form and surface properties (or texture in experiment 2) did not differ from each other, but both were higher than the activation to orientation (experiment 1) or color (experiment 2). The fact that the FFA showed sensitivity to both form and surface properties suggests that both these sets of cues may be involved in face processing. In fact, the surface properties of faces may play an important role in face recognition. Results from behavioral studies have demonstrated that cues such as color (Tarr and others 2001, 2002) and pigmentation (Russell and others 2004) are quite important in discriminating among faces. Price and Humphreys (1989) have proposed that the surface properties of stimuli are particularly useful in the recognition of classes of objects where there are relatively few deviations away from a common geometric template. Faces certainly fall within this class of objects. One final point worth making is that attention to both form and surface properties elicited higher levels of activation than attention to orientation in both the IOG and the fusiform gyrus. This is consistent with studies that have shown that face-specific regions in extrastriate cortex show viewpoint invariance with regard to face processing (Perrett and others 1987; Hasselmo and others 1989; Pourtois and others 2005).
Processing of Object Form
The demonstration that area LO showed more activation for attention to object form than its surface properties (both texture and color) or orientation is consistent with the results of numerous studies in the neuroimaging literature that have shown that this region is selectively activated by 3D objects (e.g., Malach and others 1995; Kanwisher and others 1996; Kourtzi and Kanwisher 2000). Moreover, the form-sensitive regions identified by the voxelwise analyses fell within the object-sensitive regions, including area LO, defined by the FPO localizer. Both of these observations provide compelling evidence that area LO plays a crucial role in processing the geometric structure of objects. Of course, many authors have suggested that area LO is particularly important for object recognition (e.g., Bar and others 2001; Grill-Spector and others 2001; James and others 2002). The results of our experiment also converge nicely with the work on the visual form agnosia patient DF, who has bilateral lesions of area LO sparing other parts of the ventral stream (Humphrey and others 1994; James and others 2003). DF has great difficulty recognizing objects on the basis of their form but remains remarkably sensitive to the surface properties of objects. Indeed, it was this observation that provided much of the impetus for the present study.
The strong association between area LO and object recognition in the imaging literature (e.g., Grill-Spector and others 2001) could lead one to conclude that the form or shape of an object is the most important element for object recognition. In fact, most fMRI studies of object recognition have tended to use stimuli in which the geometry of the visual stimuli is the main route to object identity and surface properties are not well specified or are absent (e.g., Bar and others 2001; Avidan and others 2002). Indeed, the typical contrast that has been used to identify “object recognition areas” (intact minus scrambled objects) might actually reflect (at least in part) the difference between form and texture processing. In the present experiment, we found evidence that the reverse contrast (i.e., scrambled minus intact objects) revealed areas that were driven more by attention to an object's surface texture than by attention to its form.
In addition to the work on DF described above, findings from another patient population, people afflicted with cerebral achromatopsia, also provide support for the notion that form and surface properties activate different pathways in the ventral stream. These patients, who typically have lesions to the fusiform or lingual gyri, have lost the ability to perceive color but can nonetheless perceive form (Heywood and others 1995; Duvelleroy-Hommet and others 1997).
The double dissociation of spared and compromised visual abilities in DF and the achromatopsic patients provides striking evidence that separate streams of processing for form and surface properties exist in the primate visual system. What is not entirely clear from the cerebral achromatopsic literature, however, is whether or not the lesions that result in deficits in color perception cause deficits in texture perception as well. Surprisingly, there have been almost no evaluations of texture perception in patients with achromatopsia (but see Mendola and Corkin 1999).
Processing of Object Orientation
We included the orientation condition in experiment 1 because work in our laboratory (James and others 2002; Valyear and others 2006) and in others (Vuilleumier and others 2002) has identified a region in the caudal portion of the IPS and another region in the anterior portion of the IPS that are particularly sensitive to the orientation of objects. Thus, if we also found orientation-sensitive foci in these same regions, this would validate our method of manipulating attention in the experimental task. As it turns out, this is exactly what we did find. One region identified in the voxelwise analysis of experiment 1 was located in the posterior part of the IPS and could correspond to what James and others (2002), Sakata and Taira (1994), and Shikata and others (1996) have termed area cIPS. Another, more anterior, region appears to correspond to what is thought to be the human homologue of area AIP in the monkey (Culham and others 2003). For a more detailed discussion of the functional properties of these 2 orientation-specific regions and their putative role in visuomotor control, see James and others (2003) and Valyear and others (2006).
Attention, Viewing Strategy, and Behavioral Performance
In our experiments, we manipulated attention to unmask which regions of the ventral stream were involved in the processing of different features of the presented objects, a technique that has been exploited in a number of experiments that have explored visual processing in the ventral stream (e.g., Murray and Wojciulik 2004). Thus, attention was not a confounding variable in our experiments but was rather the very means we used to investigate differences in the regions of the ventral stream involved in processing form and surface properties. In other words, if participants attended more to one stimulus dimension than another in different blocks of trials, they were successfully complying with the task demands.
Of course, there may have been differences in the amount of attention deployed across the experimental conditions. If this were the case, then the amount of activation seen in one condition, even though different brain regions would be involved, could theoretically be much higher and more extensive than that in other conditions. In fact, the magnitude of activation for the orientation task was much higher than that for both the form and the surface-property discriminations. That is, much more activation was observed in dorsal orientation-specific regions than in ventral form- or surface-property–specific regions at the same probability threshold level. Thus, to disentangle the dorsal orientation-specific activation into anatomically separable (and interpretable) regions, the probability threshold for detecting false activation was made more stringent. But even though attention was almost certainly a factor, the activity in all the dorsal stream regions cannot be completely explained away by appealing to attention. When the statistical criterion was lowered to the same level as that used in the form and surface-property conditions, the whole of the IPS was activated. When the statistical criterion was more stringent, however, only areas cIPS and AIP were activated. The coordinates of these latter 2 areas place them at some distance from the lateral intraparietal area (LIP), the area most associated with the deployment of attention (Wojciulik and Kanwisher 1999) and one that showed massive activation at the lower statistical criterion.
One reason why the orientation task was more attention demanding could have been that it was more difficult. There were certainly more misses and false positives in this task compared with the form and surface-property tasks. It should be noted, however, that differences in button presses by themselves could not explain the differences in activation observed in experiment 1. The number of misses (which decreased the total number of button presses) and the number of false positives (which increased the number of button presses) on orientation trials summed to approximately the same total number of button presses that were generated in the form and surface-property discriminations. But in any case, even if differences in attentional deployment and motor behavior can account for some of the observed activity in the orientation task, the fact remains that there were no differences in any of the performance measures (RT, misses, and false positives) for the form and surface-property tasks, which were the tasks of central importance in experiment 1.
The same story holds for the magnitude of texture activation compared with form and color activation in experiment 2. There was more activation in response to making texture discriminations than there was in response to making either form or color discriminations at the same probability threshold level. Thus, as in experiment 1, the probability threshold for detecting false activation was made more stringent for the texture condition to aid in the interpretation of the results. Appealing to an argument of differential deployment of attention across the 3 conditions to explain these differences in activation is again not convincing. Taking into account the number of hits, misses, and false positives observed for the 3 conditions, there were approximately the same numbers of button presses executed for form, texture, and color discriminations. (In fact, there were slightly more total button presses in the color condition, which contradicts the argument of differential deployment of attention, because the lowest amount of activation was observed for color discriminations when the probability thresholds for form and texture were set to the same level as that used for color.) From this observation, one could argue that the participants were performing equally well in all 3 conditions.
Although it is true that participants in experiment 2 committed more misses on texture trials than they did on form and color trials (which did not differ from each other) and committed more false positives on texture and color trials (which did not differ from each other) compared with form trials, it should be stressed that the overall accuracy across all 3 experimental conditions was 93% (texture 90%, form 94%, and color 94%). Thus, although there were differences in the number of misses and false positives between the experimental conditions (but not in RT), there was a dramatically larger number of hits compared with these erroneous responses. Given such a high level of overall performance, it is unlikely that differences in activation were due to differences in task difficulty (and the deployment of attention). In fact, the same argument can apply to the results of experiment 1, where accuracy across all 3 experimental conditions was 87% (orientation 80%, surface properties 89%, and form 91%).
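As a rough check on the arithmetic, the overall accuracies quoted above are consistent with simple unweighted means of the per-condition values. The short Python sketch below is illustrative only: it assumes that each condition contributed the same number of trials (an assumption, since trial counts are not restated here), and the function and variable names are hypothetical.

```python
# Illustrative check only: assumes equal numbers of trials per condition,
# so that overall accuracy is the unweighted mean of the condition accuracies.

def mean_accuracy(per_condition):
    """Return the unweighted mean of per-condition accuracies (in percent)."""
    return sum(per_condition.values()) / len(per_condition)

# Per-condition accuracies (percent) as reported in the text.
experiment_1 = {"orientation": 80, "surface properties": 89, "form": 91}
experiment_2 = {"texture": 90, "form": 94, "color": 94}

overall_1 = round(mean_accuracy(experiment_1))  # 260 / 3 = 86.7, rounds to 87
overall_2 = round(mean_accuracy(experiment_2))  # 278 / 3 = 92.7, rounds to 93

print(f"Experiment 1 overall accuracy: {overall_1}%")
print(f"Experiment 2 overall accuracy: {overall_2}%")
```

Under the equal-weighting assumption, the rounded means reproduce the reported figures of 87% and 93%; with unequal trial counts, a weighted mean would be needed instead.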
It might be argued that the differences in activation were simply a reflection of differences in viewing strategies across conditions. For example, when attending to the surface properties, participants could have employed a local-viewing strategy, where they selectively fixated on a small patch located on the surface of the object to make their discriminations. On the other hand, when engaged in the form condition, participants could have employed a global-viewing strategy, where they focused on the global structure—rather than the local detail—of the objects to make their discriminations. Finally, when engaged in the orientation condition, participants could have employed a viewing strategy where they scanned the longitudinal axis of each object to make their discriminations. Even though the stimuli were present on the screen for only 600 ms (thus severely limiting the number of exploratory saccades), it is at least plausible that participants could have employed different viewing strategies because no fixation cross was present on the objects. It is our contention, however, that this viewing-strategy argument cannot explain the differences in the levels of activation observed across the experimental conditions.
First, if there were systematic differences in the way in which the stimuli fell on the retina (because of the different viewing strategies), then we should have seen differences in the patterns of activation in early visual areas. This was certainly not the case in experiment 1. It was also not the case for color and form in experiment 2, even though there were large differences in activation between color and form in higher order areas such as area LO. Of course, one might have predicted that differences in activation between form, color, and texture in higher order areas would arise purely on retinotopic grounds, if different viewing strategies had been employed, because there is some evidence for a quasi-retinotopic organization in these higher order areas (Levy and others 2001; Hasson and others 2003). But in fact, the pattern of activation that we found does not conform to this prediction. According to the retinotopic account of the functional organization of extrastriate regions, the location of particular category-specific areas reflects a preference for processing particular stimuli in specific parts of the visual field. Thus, the FFA has been shown to have a center-field bias, perhaps because we typically tend to foveate on faces. On the other hand, the PPA has been shown to have a peripheral–visual field bias, presumably because scenes typically extend well into the peripheral visual field. In our experiment, however, we saw robust activation in both these regions when participants were attending to the surface properties of the objects, undercutting the argument that participants were employing a single viewing strategy in this condition.
Second, if participants were using different viewing strategies to perform the experimental task in each stimulus condition, one might have expected to see differences in the number of saccadic eye movements across conditions. Although eye movements were not directly recorded in the scanner, it is possible to indirectly assess whether there were differences in eye movements by examining the activation in area LIP. This area in the posterior parietal cortex has been shown to play a role in the planning and execution of eye movements (Anderson and others 1992; Colby and others 1996) and shows robust fMRI activation during saccades (for review, see Culham and Kanwisher 2001). No activation was observed in area LIP in any of the contrasts of experiment 2. There was also no LIP activation in the contrast between form and surface properties in experiment 1. The only time that area LIP was activated was in the orientation condition of experiment 1, but even here there was much more activation in cIPS and AIP. Indeed, as argued earlier, the extensive activation in LIP and other neighboring regions of the IPS in the orientation condition probably reflects the fact that this task was by far the most attentionally demanding. In short, it is unlikely that the differences in activation among the experimental conditions can be explained by appealing to an argument of different viewing strategies for each condition.
Although our experiment has demonstrated differences in the spatial distribution of activation related to the processing of object form and surface properties, it is not clear what this says about the functional organization of the visual networks mediating object perception. In fact, there is a long-standing debate in the neuroscientific community between those who advocate a category-specific account of visual processing and those who argue that visual processing is more distributed. Proponents of category-specific accounts have suggested that categories of particularly high biological relevance have led to the evolution of brain regions in the visual system devoted to processing these “special” categories (objects: Malach and others 1995, faces: Kanwisher and others 1997, places: Epstein and Kanwisher 1998, bodies: Downing and others 2001). Proponents of “distributed” accounts of ventral stream cortical organization point out that, in an imaging study, it is quite rare to observe activation in only a single region when a specific stimulus category is presented; instead, a whole host of regions (both large and small) become active (Ishai and others 1999, 2000). Thus, although it is true that particular stimulus categories might invoke distinct foci of activation, even when these areas are removed from consideration, the remaining patterns of activation that extend across the ventral stream are still correlated with differences in the stimuli that have been presented (Haxby and others 2001).
But the category-specific and distributed accounts may not be all that distant from one another. We would contend first of all that the nature of the evidence for the distributed and categorical accounts depends to a large extent on the nature of the analysis tools that are used and the way the experimental protocols are designed (i.e., evidence for either account could be revealed in the same experiment depending on how the data are analyzed and/or how the experiment is designed). Second, by taking a step back from the binary logic inherent in this debate, it is possible to reconcile the 2 accounts. The particularly active nodes in a network of visual processing, which appear to be correlated with the processing of particular categories of visual stimuli (e.g., faces, animate objects, tools, and places), could represent points in a distributed network where the stimulus dimensions important for identifying the object in question intersect. The matrix of lines making up the separate neural pathways mediating the processing of form, color, and texture could also be organized within a quasi-retinotopic organization of the ventral stream (Levy and others 2001; Hasson and others 2003), in which the distribution of areas “specialized” for particular biological categories reflects those locations on the retina upon which those stimuli typically fall (e.g., “face” areas have a center-field bias, and “place” areas have a peripheral-field bias). Whatever the organizing principles of the ventral stream might turn out to be, it is clear that more attention needs to be directed toward the types of cues that are used for recognizing different classes of objects and scenes. In the current study, we have shown that form is processed preferentially in object-sensitive areas, surface properties such as texture in scene-sensitive areas, and the combination of form and surface properties in face-sensitive areas. But even here the overlap might not be expected to be perfect because the moment-to-moment activity of the underlying distributed network reflects not just differences in the processing of fundamental object attributes but also differences in retinotopic activation and other more subtle factors, such as novelty, familiarity, and task demands.
Role of Knowledge of Material Properties
The surface properties of an object, particularly a natural object, can tell us a great deal about its material properties (i.e., its mass, compliance, temperature, fragility). In fact, unlike the form of an object (its size, orientation, overall shape), which can often be derived directly from optics (i.e., the retinal array), the material properties of an object can only be deduced from previous experience with the object or similar objects. Our perception of the surface properties of an object plays an important role here—allowing us to access previously stored information about the material from which the object appears to be made. Despite the fact that there is little work on how we perceive material properties and how our brain processes this information, Adelson (2001) has convincingly argued that the perception of materials is just as important as the perception of form in everyday behavior. He argues that early vision can be thought of as a process of extracting information regarding the material properties of an object (presumably by processing surface cues) and that knowledge about these types of properties becomes critical for high-level visual processing (Bergen and Adelson 1988; Adelson and Bergen 1991; Adelson 2001). To illustrate this, Adelson points out that the accurate rendering of an object's material properties in 3D computer graphics is essential to the creation of a natural looking image. Beyond mere image rendering in the realm of computer graphics, researchers involved in studying recognition in machine vision (and computational models) have found that such systems will optimize their recognition performance when they are designed to use color and texture cues along with shape information (Voorhees and Poggio 1988; Murase and Nayar 1995; Schiele and Crowley 1996; Mel 1997; but see Edelman and Duvdevani-Bar 1997).
In summary, surface-based visual features such as color and texture can serve as important cues to material perception, which will contribute to both object recognition (Adelson 2001) and the programming of object-directed actions (e.g., Gordon and others 1993). In both of these fields of research, however, the role of material properties and surface properties remains largely unexplored.
It is currently believed that the processing of color is mediated in large part by an area or areas in the fusiform and surrounding cortex, variously referred to as V4 and/or V8 (McKeefry and Zeki 1997; Hadjikhani and others 1998; Tootell and others 2003). The findings from the present study suggest an alternative interpretation. Perhaps the V4–V8 complex is not a color area per se but is rather a surface-property area. It is quite possible that, given the appropriate paradigm, one may find that this region is engaged in processing surface properties where color is only one of several sources of information contributing to the final computation.
In the present study, the term “surface properties” was used to refer specifically to the color and texture of objects. Other properties of an object's surface appearance, such as specularities (Adelson 2001) and the direction of illumination (Köteles and others 2004), have been considered in some studies, but further experimentation is required to assess how these (and other) surface cues interact with color and texture, both behaviorally and in the brain.
Finally, another question that deserves study is how knowledge of the material properties of an object (cued by the perception of that object's surface properties) contributes to object-directed action. For example, by measuring the grip and lift forces people apply to test objects, one could assess how much information about the material properties of those objects affects the calibration of the forces that are applied—and what the critical cues are.
The major finding of the present study was that attention to object form preferentially activated more lateral regions of the ventral stream such as area LO, whereas attention to an object's surface properties preferentially activated more medial regions in the ventral stream, particularly regions within the lingual, fusiform, and parahippocampal cortex. The form-sensitive regions appear to overlap with areas that have been associated with object recognition, whereas the surface-property–sensitive regions overlap with areas that have been associated with face and scene recognition. There is also evidence that surface color is extracted relatively early in visual processing, whereas information about surface texture, perhaps because it is more complex, requires processing that is carried out in higher order visual areas.
Supplementary material can be found at: http://www.cercor.oxfordjournals.org/.
This research was supported by the Canadian Institutes of Health Research and the Canada Research Chairs Program (MAG) and a postgraduate scholarship from the Natural Sciences and Engineering Research Council of Canada (JSC). The authors would like to thank Dwayne Connolly and Kenneth Valyear for their technical assistance. Conflict of Interest: None declared.