Single-unit activity in area TE was recorded from two macaques as they viewed rendered, three-dimensional-appearing objects that were illuminated from different directions (without cast shadows) and at different intensities. The average response modulation produced by changes in illumination intensity or direction was moderate, and the majority of the neurons responded invariantly to these lighting variables. When neural activity was affected by illumination direction, this did not take the form of a preference of a given neuron for a particular direction of illumination. Instead, the tuning appeared to be for the relative brightness of a surface of a given shape and orientation. The modulation produced by changes in illumination direction was considerably smaller than that produced by changes in object shape. Most of the neurons that were unaffected by changes in illumination direction responded much less to silhouettes of these objects, indicating that these neurons were also sensitive to an object's inner features. The neuronal invariance to shading variations may provide the basis for the invariance of object recognition under changes in illumination.
We can readily recognize objects presented at different positions, distances and viewpoints. The present study deals with yet another invariance of object recognition: the ability to recognize objects under different illumination conditions. Because of changes in the position of objects relative to an artificial or natural light source, and because of diurnal and seasonal variations in natural light sources, the illumination conditions of objects can vary greatly. Despite these considerable illumination-dependent changes in the retinal image of an object, we are able to recognize it (Braje et al., 2000). Although the ability to discount variations in illumination is not surprising from an evolutionary point of view, it has proven to be a computationally difficult problem. Given two images, it is not straightforward to decide whether they were created by two distinct objects or by the same object under different illumination conditions. In fact, it can be shown that illumination invariants, i.e. illumination-independent properties that show up in every image of a particular object, do not exist (Moses and Ullman, 1992; Chen et al., 2000). However, some measures that are insensitive to illumination variations, although not under all viewing conditions (so-called quasi-invariants), have been proposed. Edge position is insensitive to illumination changes for polyhedral objects, but not for smooth surfaces (Belhumeur and Kriegman, 1998). Thus, for some objects, an edge-based representation can provide illumination-invariant information. More recently, it has been proposed that the distribution of the direction of the image gradient is a probabilistic quasi-invariant for illumination-invariant object recognition (Chen et al., 2000). However, it is not known whether this information is used by biological systems.
Given the difficulty in computing an illumination-invariant object representation, it is important to determine how well primates are able to recognize objects under varying illumination conditions. Earlier workers (Tarr et al., 1998) found only small costs in a sequential same–different object-matching task when the illumination direction was varied between the first and second objects. However, these costs were present only for differences in cast shadows, not for shading variations. More recently, it has been shown (Nederhouser et al., 2001) that object recognition is invariant even to variations in cast shadows when the positions of the two objects are varied, so that observers cannot use the absence of any display change (on ‘same’ trials in which the illumination direction is constant) as an artefactual cue to respond ‘same’. Tarr et al. also reported a small 19 ms reaction time (RT) cost (relative to a mean RT of 1358 ms) and no statistically significant effect on discrimination accuracy when naming previously learned objects at a novel illumination direction (Tarr et al., 1998). Thus, overall, these psychophysical studies indicate that humans show considerable illumination invariance when classifying objects. The present study investigates whether the same invariance also holds at the single-cell level in macaque monkeys.
Surprisingly little work has been done on the physiological basis of illumination-invariant recognition. Lesion work in the macaque monkey indicated that the inferior temporal (IT) cortex is critical for object recognition under varying conditions of illumination (Weiskrantz and Saunders, 1984). Hietanen and colleagues (Hietanen et al., 1992) measured the effect of different lighting conditions of faces on the responses of face-selective cells in the superior temporal sulcus (STS). However, considerable psychophysical work indicates that, unlike object recognition, face recognition is impaired when the direction of illumination is varied (Johnston et al., 1992; Braje et al., 1998; Liu et al., 1999). We therefore measured how shading variations in images of non-face objects, induced by varying the illumination direction, affect the responses of IT neurons. The neurons were tested with images of different objects so that we could assess their object selectivity under different illumination directions. In addition to varying the illumination direction, we manipulated the illumination intensity, which affects both the mean luminance and the contrast of edges inside and between the parts of an object. To determine whether the neurons responded only to the outer contour or were also sensitive to features inside the object, we also measured their responses to silhouettes of the objects.
Materials and Methods
Two male rhesus monkeys served as subjects. These are two of the three animals used in previous work (Vogels et al., 2001). Before these experiments, a head post for head fixation and a scleral search coil were implanted under full anesthesia and strict aseptic conditions. After training in the fixation task, a stainless steel recording chamber was implanted stereotactically, guided by structural magnetic resonance imaging (MRI). The recording chambers were positioned dorsal to IT, allowing a vertical approach, as described previously (Janssen et al., 2000a). A computerized tomography (CT) scan of the skull, with the guiding tube in situ, was obtained during the course of the recordings. Superposition of the coronal CT and MRI images, together with depth readings of the white and grey matter transitions and of the skull base during the recordings, allowed reconstruction of the recording positions before the animals were killed (Janssen et al., 2000a). Histological confirmation of the recording sites of one animal is available (Vogels et al., 2001). All surgical procedures and animal care were in accordance with the guidelines of the NIH and of the KU Leuven Medical School.
The apparatus was identical to that described previously (Vogels et al., 2001). The animal was seated in a primate chair, facing a computer monitor (Philips 21 in. display) on which the stimuli were displayed. The head of the animal was fixed and eye movements were recorded using the magnetic search coil technique. Stimulus presentation and the behavioral task were under the control of a computer, which also displayed and stored the eye movements. A Narishige microdrive, mounted firmly on the recording chamber, lowered a tungsten microelectrode (1–3 MΩ; Frederick Haer) through a guiding tube, which was positioned using a Crist grid attached to the microdrive. The electrode signals were amplified and filtered using standard single-cell recording equipment. Single units were isolated online using template-matching software (SPS). The spike times of the single units and the stimulus and behavioral events were stored with 1 ms resolution by a PC for later offline analysis. The PC also showed raster displays and histograms of the spikes and other events, sorted by stimulus.
The stimuli consisted of greylevel rendered images of 13 objects. These objects were the same as those used in previous human psychophysical (Biederman and Bar, 1999) and electrophysiological (Vogels et al., 2001) studies. The objects were composed of two parts (‘geons’) and were rendered on a white background (luminance = 56 cd/m2). The images (size ~6°; luminance gamma corrected) were shown at the center of the display. Four different kinds of images were used as stimuli.
High luminance, shaded objects
The 3D objects were rendered on a Silicon Graphics Indigo2 workstation, using the Showcase Toolkit. Each object was illuminated by a high-luminance light source positioned in front of the object, 45° below, 45° above, or 45° to the right or to the left of the object. The lighting model was based on OpenGL, and the lighting comprised three components: ambient, diffuse and specular. We used Gouraud shading, in which intensities are calculated only at the vertices and then linearly interpolated across each surface. Since cast shadows were not rendered, the images of the same object differed only in their shading. The greylevels of the five images of the same object were adjusted so that they had the same mean luminance after gamma correction. The five shading versions of each of the 13 objects are shown in Figure 1. Note that changes of the illumination direction produced strong variations in the luminance and contrast of the surfaces of the same object, and that both object parts were affected. The median of the mean luminances of the 13 high-luminance, shaded objects was 32 cd/m2 (1st quartile = 29; 3rd quartile = 34).
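The Gouraud scheme described above can be illustrated with a short Python sketch: an intensity is computed at each vertex from ambient, diffuse and specular terms, and values inside a triangle are obtained by linear (barycentric) interpolation. The Blinn-Phong-style specular term is broadly similar in spirit to OpenGL fixed-function lighting, but the coefficients `k_a`, `k_d`, `k_s` and `shininess`, and the function names, are hypothetical; the rendering parameters of the actual stimuli were not reported.

```python
import numpy as np

def vertex_intensity(normal, light_dir, view_dir,
                     k_a=0.2, k_d=0.6, k_s=0.2, shininess=20.0):
    """Ambient + diffuse + specular intensity at one vertex (hypothetical coefficients)."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    diffuse = max(float(np.dot(n, l)), 0.0)      # Lambertian term
    half = l + v                                   # Blinn-Phong half-vector
    norm_h = np.linalg.norm(half)
    specular = 0.0
    if diffuse > 0.0 and norm_h > 0.0:             # no highlight on unlit faces
        specular = max(float(np.dot(n, half / norm_h)), 0.0) ** shininess
    return k_a + k_d * diffuse + k_s * specular

def gouraud(barycentric, vertex_values):
    """Linearly interpolate per-vertex intensities inside a triangle."""
    return float(np.dot(barycentric, vertex_values))

# Light 45 deg above the line of sight; viewer looks along -z (assumed geometry).
light = np.array([0.0, 1.0, 1.0])
view = np.array([0.0, 0.0, 1.0])
normals = [np.array(v, dtype=float) for v in ([0, 0, 1], [0, 1, 1], [0, -1, 1])]
values = [vertex_intensity(n, light, view) for n in normals]
centre = gouraud(np.array([1 / 3, 1 / 3, 1 / 3]), values)
```

Because only vertex values are lit, moving the light changes the interpolated grey levels over whole facets, which is why illumination direction alters the luminance and contrast of entire surfaces in the stimuli.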
Low luminance, shaded objects
The same objects were rendered using each of the same five illumination directions as above, but with a lower-luminance light source. This produced darker images of the same objects, as shown in Figure 2 for frontally illuminated versions of two objects. The median of the mean luminances of the 13 low-luminance, shaded objects was 16 cd/m2 (1st quartile = 16; 3rd quartile = 19). A comparison of the responses to the high- and low-luminance shaded images allows an assessment of the effect of illumination intensity. The median change in luminance between the low- and high-luminance images was 48% (1st quartile = 44%; 3rd quartile = 48%).
Silhouettes
The silhouettes had the same outlines as the object images of the two preceding cases, but no shading was present, so that information about internal features was lost. The pixels inside the object contours had a constant luminance, equal to the mean luminance of the corresponding high-luminance, shaded version of the object (gamma-corrected). Comparison of the responses to a silhouette and to the shaded, high-luminance versions of the same object allowed an assessment of the necessity of inner contours, features, etc. for responsivity and selectivity.
Black silhouettes
These were identical to the silhouettes described above, except that the luminance of the silhouettes corresponded to the lowest greylevel (0.1 cd/m2). The contrast of the borders of a shaded object may be much higher than the contrast of the border of a silhouette that has the same mean luminance as the shaded object. To control for this, we tested the responses to these ‘black’ silhouettes, whose borders had the highest possible contrast.
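Both kinds of silhouette can be derived from a shaded image and a mask of its outer contour, as in the minimal sketch below. It assumes the image is already expressed in luminance units (cd/m2, i.e. after gamma correction), that the "mean luminance" is taken over the pixels inside the contour, and uses the 56 cd/m2 background and 0.1 cd/m2 minimum from the text; the function and argument names are hypothetical.

```python
import numpy as np

def make_silhouettes(shaded, mask, background=56.0, black=0.1):
    """Return (mean-luminance silhouette, black silhouette).

    shaded: 2D array of pixel luminances in cd/m^2 (assumed gamma-corrected)
    mask:   boolean array, True inside the object's outer contour
    """
    inside_mean = float(shaded[mask].mean())   # mean luminance of the object region
    silhouette = np.full_like(shaded, background, dtype=float)
    silhouette[mask] = inside_mean             # flat fill: internal features removed
    black_sil = np.full_like(shaded, background, dtype=float)
    black_sil[mask] = black                    # maximal border contrast
    return silhouette, black_sil
```

Only the fill value differs between the two silhouettes, so any response difference between them isolates the effect of border contrast.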
Trials started with the onset of a small fixation target at the display's center on which the monkey was required to fixate. After a fixation period of 700 ms, the fixation target was replaced by the stimulus for 300 ms, followed by presentation of the fixation target for another 100 ms. If the monkey's gaze remained within a 1.5° fixation window until the end of the trial, it was rewarded with a drop of apple juice.
Responsive neurons were searched for by presenting frontally illuminated versions of the 13 objects, i.e. the original object images of Vogels et al. (Vogels et al., 2001). After isolating a neuron responsive to at least one of these 13 images, one or more of the following three tests were conducted.
Illumination Direction and Intensity (IDI) Test
Ten images of each of two objects were presented in an interleaved fashion. The 10 images were the five high- and five low-illumination shaded versions of that object. The two objects were the object producing the largest response in the search test (see above) and, if possible, another one that still produced a response. With this test, we assessed the influence of illumination direction and intensity on the response and selectivity of the neuron. The 20 stimulus conditions were presented in random order for at least 10 trials per stimulus.
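The 2 objects × 2 intensities × 5 directions design of the IDI test can be laid out as in the sketch below. The paper states only that the 20 conditions were presented in random order; shuffling within repetition blocks, so that every condition occurs once before any repeats, is an assumption, and the names `idi_trial_order`, `DIRECTIONS` and `INTENSITIES` are hypothetical.

```python
import itertools
import random

DIRECTIONS = ["front", "above", "below", "left", "right"]
INTENSITIES = ["high", "low"]

def idi_trial_order(objects, n_reps=10, seed=None):
    """Interleaved trial list: every (object, intensity, direction) condition
    appears n_reps times, shuffled within each repetition block (assumed)."""
    rng = random.Random(seed)
    conditions = list(itertools.product(objects, INTENSITIES, DIRECTIONS))
    order = []
    for _ in range(n_reps):
        block = conditions[:]   # copy so the master list stays ordered
        rng.shuffle(block)
        order.extend(block)
    return order

trials = idi_trial_order(["best_object", "second_object"], n_reps=10, seed=0)
```

Block-wise shuffling keeps the number of trials per condition balanced even if a recording ends early, which is one reason it is a common choice for interleaved designs.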
Illumination direction and silhouette (IDS) test
Six images of each of four objects were presented in an interleaved fashion (usually 10 trials/stimulus): the five high-luminance, shaded images and the silhouette of each object. The four objects consisted of the object producing the strongest response in the search test and three others. In general, the test included the two objects producing the strongest activity. In the initial version of this test, which was run on only four neurons, the silhouettes were not presented.
Silhouette Luminance Control (SLC) Test
This test consisted of three conditions: a shaded version of the object producing the largest response, its silhouette and its black silhouette. The three images were presented in interleaved fashion for at least 10 trials/stimulus each.
Each neuron was tested either with the IDI or with the IDS test. The SLC test was run after the IDS test in some neurons showing a difference between the response to the silhouette and the shaded images.
For each trial, spikes were counted in windows of 300 ms duration. Net responses were computed by subtracting the baseline activity (i.e. the spike count obtained in a window preceding stimulus onset) from the stimulus-induced activity (i.e. the spike count measured in a window starting 50 ms after stimulus onset). Statistical significance of responses was assessed by an analysis of variance (ANOVA) comparing the spike counts in the two windows. All neurons reported in this paper showed statistically significant responses (ANOVA, P < 0.05). Other ANOVAs and post hoc comparisons were performed on the net responses to test for stimulus selectivity and differences between conditions. Additional parametric and/or non-parametric tests were used to assess the significance of differences between conditions or of sensitivity indices; these are described below.
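The net-response computation can be sketched as follows. The text specifies 300 ms windows and a response window starting 50 ms after onset, but says only that the baseline window preceded onset; the sketch assumes it covered the 300 ms immediately before onset. Converting counts to spikes/s is also an assumption, made to match the firing rates quoted in the Results. Function names are hypothetical.

```python
import numpy as np

def spike_count(spike_times_ms, start_ms, duration_ms=300.0):
    """Number of spikes in the half-open window [start, start + duration)."""
    t = np.asarray(spike_times_ms, dtype=float)
    return int(np.sum((t >= start_ms) & (t < start_ms + duration_ms)))

def net_responses(trials, duration_ms=300.0, latency_ms=50.0):
    """Per-trial net response in spikes/s (response window minus baseline window).

    trials: list of spike-time sequences, in ms relative to stimulus onset.
    """
    rates = []
    for spikes in trials:
        baseline = spike_count(spikes, -duration_ms, duration_ms)  # assumed [-300, 0)
        response = spike_count(spikes, latency_ms, duration_ms)    # [50, 350)
        rates.append((response - baseline) * 1000.0 / duration_ms)
    return np.array(rates)

example = net_responses([[-250, -100, 60, 100, 200, 349], [-10, 55, 400]])
```

The 50 ms offset skips the period before IT responses typically begin, so the response window is not diluted by pre-response spikes.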
Results
The database of the current study consisted of 94 responsive IT neurons recorded in one hemisphere of each of the two monkeys (53 and 41 neurons, respectively). Based on recording depth, sulcal landmarks and position relative to the base of the skull, the large majority of the neurons were recorded on the lateral convexity of area TE, i.e. lateral to the anterior middle temporal sulcus [see Fig. 1 in Vogels et al. (Vogels et al., 2001)]. A few responsive neurons were recorded in the ventral bank of the STS. Thirty-five and 59 neurons were tested with the IDI and IDS tests, respectively.
Effect of Illumination Intensity
In 35 neurons, we compared the responses to the shaded objects under high and low illumination. Figure 3 shows the responses of two IT neurons in the different conditions of the IDI test. Open and closed symbols indicate the responses to the low- and high-luminance objects, respectively. The neuron of Figure 3A was shape selective, responding only to one of the two shapes tested (compare triangles and squares in Fig. 3A). Its response to the best object was significantly modulated when the illumination intensity was changed (ANOVA, main effect of illumination intensity: P < 0.008). Note that this effect of illumination intensity was inconsistent across the different illumination directions. Figure 3B presents the responses of the IT neuron showing the strongest effect of illumination intensity that we encountered in this sample. It responded strongly to the high-luminance, shaded images, but only weakly in the low-luminance conditions (ANOVA, significant main effect of illumination intensity: P < 0.00001).
In each neuron, we compared the response to the shading condition producing the largest response in the test (L1) with the response to the image of the same object and illumination direction, but rendered with the other illumination intensity (L2). A t-test showed a statistically significant difference between the L1 and L2 responses (P < 0.05) in 49% (17/35) of the neurons. To quantify the size of this effect relative to the response variability of the neurons, we computed an intensity modulation index, defined as the absolute difference between L1 and L2, divided by the mean (trial-by-trial) standard deviation of the responses in the two conditions. This index is analogous to the d′ measure used in signal detection theory (Green and Swets, 1966). [Since an observer gets only a single trial in everyday object recognition, the (trial-by-trial) standard deviation and not the standard error (of the mean across trials) is the relevant measure of variability of the response of a single neuron.] The distribution of this index is shown in Figure 4A. The median of the distribution is 0.97, indicating a response modulation by illumination intensity nearly equal to the response standard deviation. The neurons shown in Figure 3A,B have intensity modulation indices of 0.77 and 6.10, respectively.
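The intensity modulation index, |L1 − L2| divided by the mean of the two trial-by-trial standard deviations, can be written in a few lines of Python. This is a sketch, not the authors' code: it assumes sample standard deviations (n − 1 denominator), which the text does not specify, and averages the two SDs arithmetically as "mean standard deviation" suggests, rather than pooling variances.

```python
from statistics import mean, stdev

def modulation_index(trials_a, trials_b):
    """d'-like index: |mean(a) - mean(b)| / mean of the two trial-by-trial SDs.

    trials_a, trials_b: per-trial net responses in the two conditions
    (e.g. L1 and L2). Uses the SD across trials, not the SEM.
    """
    mean_sd = (stdev(trials_a) + stdev(trials_b)) / 2.0
    return abs(mean(trials_a) - mean(trials_b)) / mean_sd

l1_trials = [10, 12, 14]   # hypothetical responses in the best shading condition
l2_trials = [6, 8, 10]     # same image rendered at the other intensity
index = modulation_index(l1_trials, l2_trials)
```

Using the SD rather than the SEM makes the index independent of how many trials were recorded, in line with the single-trial argument given in the text.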
Do the neurons, overall, respond more strongly to the high- than to the low-luminance shaded images? To answer this question, we computed the following index for each neuron: the response in the low-luminance condition (L1 or L2) was subtracted from the response in the high-luminance condition (L2 or L1, respectively), and the difference was divided by the sum of L1 and L2. A positive (negative) value indicates a stronger (weaker) response to the high- than to the low-luminance version. The median of the distribution of this index (Fig. 4B) was –0.05 (1st quartile, –0.16; 3rd quartile, 0.07), indicating no overall consistent effect of illumination intensity. Although reducing the luminance of the light source darkened the image of the object, this darkening had two opposing effects (Fig. 2). First, the contrast between the outer border of the object and the white background increased and, second, the saliency of the internal features was reduced. Depending on the sensitivity of the neuron to either the outer object border (see below) or its inner features, a reduction of the luminance may thus result in either an increase or a decrease of the response.
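The signed intensity index and its population summary can be sketched as below. The index follows the text, (high − low)/(high + low). How neurons with a non-positive sum of net responses were handled is not stated, so raising an error there is an assumption, as are the function names. `statistics.quantiles` (Python 3.8+) is used for the quartiles; its default method may not match the software used in the original analysis.

```python
from statistics import median, quantiles

def intensity_preference_index(high, low):
    """(high - low) / (high + low) for mean net responses; > 0 favours high luminance."""
    total = high + low
    if total <= 0:
        raise ValueError("index undefined for a non-positive response sum (assumed rule)")
    return (high - low) / total

def summarize(indices):
    """Median and (1st, 3rd) quartiles of a population of indices."""
    q1, _, q3 = quantiles(indices, n=4)
    return median(indices), q1, q3

# Hypothetical (high, low) mean responses for three neurons.
pairs = [(30.0, 20.0), (15.0, 18.0), (40.0, 40.0)]
population = [intensity_preference_index(h, l) for h, l in pairs]
```

Because positive and negative values cancel in the median, a value near zero (as reported) is compatible with individual neurons being strongly modulated in opposite directions.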
Effects of Illumination Direction
Figure 5 shows the responses of two IT neurons to the five high luminance, shaded versions of two objects. The response of the neuron of Figure 5A was not affected by illumination direction of its preferred object (ANOVA, P > 0.05), while the response of the neuron of Figure 5B was strongly affected by the illumination direction (ANOVA, P < 0.00001). These two examples illustrate extremes of the effect of illumination direction on the responses of the IT neurons. The two neurons shown in Figure 3 provide additional examples in which the responses were affected significantly by shading variations.
For the object producing the largest response, the effect of illumination direction (high-luminance condition) was statistically significant (ANOVA) in 45% (42/94) of the neurons. To quantify this effect of shading, we selected, among the mean responses to the five shaded versions of the object producing the best response, the maximal (SH) and minimal (SL) ones. The direction modulation index was defined as SH minus SL, divided by the mean standard deviation of these two responses, analogous to the intensity modulation index (see above). Figure 6A shows the distribution of the direction modulation indices for the high-illumination conditions of the 94 neurons tested in either the IDI or IDS test. The median direction modulation index was 1.17, indicating a response modulation by shading similar to the response standard deviation. The neurons shown in Figure 5A,B have direction modulation indices of 0.88 and 3.28, respectively.
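The direction modulation index picks the extreme mean responses among the five illumination directions and scales their difference by the mean of their trial-by-trial SDs. The sketch below assumes sample SDs and a dictionary input keyed by direction; the function name and the example data are hypothetical.

```python
from statistics import mean, stdev

def direction_modulation_index(trials_by_direction):
    """(SH - SL) / mean(SD_H, SD_L) for one neuron's best object.

    trials_by_direction: dict mapping direction -> per-trial net responses.
    Returns the index and the directions yielding SH and SL.
    """
    means = {d: mean(r) for d, r in trials_by_direction.items()}
    d_high = max(means, key=means.get)   # direction giving SH
    d_low = min(means, key=means.get)    # direction giving SL
    mean_sd = (stdev(trials_by_direction[d_high]) +
               stdev(trials_by_direction[d_low])) / 2.0
    return (means[d_high] - means[d_low]) / mean_sd, d_high, d_low

responses = {
    "front": [20, 22, 24], "above": [10, 12, 14], "below": [15, 17, 19],
    "left": [16, 18, 20], "right": [18, 20, 22],
}
dmi, best_dir, worst_dir = direction_modulation_index(responses)
```

Because SH and SL are the extremes of five noisy means, this index is positive even when illumination direction has no true effect, so it is best read alongside the ANOVA result rather than on its own.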
The large majority of the neurons responded differentially to the different objects. The responses of 86% (30/35) and 100% (59/59) of the neurons tested in the IDI and IDS tests, respectively, were significantly modulated when the object was varied (ANOVA, main effect of object: P < 0.05). Since the neurons were tested with four different objects in the IDS test, the results of that test were analyzed further. To quantify the degree of object selectivity in the IDS test in the same way as for illumination intensity and illumination direction, we selected the best and worst responses for the ‘above’ illumination condition. This direction was chosen because it turned out to have the largest mean image difference (i.e. the sum of the squared differences between the greylevels of corresponding pixels in the two images) from the frontally illuminated object images. Since the objects for the IDS test were selected with frontally illuminated images, responses in the ‘above’ condition should be little affected by this object selection, provided that shading has a strong effect. The shape modulation index was defined as the difference between the responses to the ‘best’ and ‘worst’ of the four objects in the ‘above’ illumination condition, divided by the mean standard deviation of these responses. The distribution of this shape modulation index, shown in Figure 6B, has a median of 3.03 (n = 59, IDS test), about three times larger than the medians obtained for the intensity and direction modulation indices. This also holds when the direction modulation indices (median = 1.03, n = 59) and the shape modulation indices are compared for the same neurons in the same (IDS) test. Computation of physical image similarities (in pixel greylevels) showed that images of different objects are less similar than images of the same object with different shading.
Thus, although we consider it unlikely, it cannot be excluded that the larger modulation indices for objects compared to illumination direction merely reflect physical image similarity. Whichever the case, the large shape modulation indices clearly demonstrate that the more modest intensity and direction modulation indices do not result from an inherent inability of the neurons to show larger, reliable response differences.
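The image difference used to pick the ‘above’ direction can be sketched as a sum of squared grey-level differences, averaged over objects for each direction. The text does not say whether greylevels or luminances were compared, nor exactly how the average was formed; the sketch assumes equal-sized greylevel arrays and a plain mean across objects, and all names are hypothetical.

```python
import numpy as np

def image_difference(img_a, img_b):
    """Sum of squared grey-level differences between corresponding pixels."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.sum((a - b) ** 2))

def mean_difference_from_frontal(frontal_images, images_by_direction):
    """For each direction, the mean (over objects) difference from the frontal image.

    frontal_images: list of frontal images, one per object
    images_by_direction: dict direction -> list of images in the same object order
    """
    return {
        d: float(np.mean([image_difference(f, im)
                          for f, im in zip(frontal_images, imgs)]))
        for d, imgs in images_by_direction.items()
    }

frontal = [np.zeros((2, 2)), np.ones((2, 2))]
by_direction = {
    "above": [np.full((2, 2), 3.0), np.full((2, 2), 4.0)],
    "left": [np.ones((2, 2)), np.full((2, 2), 2.0)],
}
diffs = mean_difference_from_frontal(frontal, by_direction)
most_different = max(diffs, key=diffs.get)
```

A pixelwise metric like this ignores where in the image the changes occur, which is one reason the text treats physical similarity as a possible, but not compelling, explanation of the larger shape modulation.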
For 86% (51/59) of the neurons tested with four objects (IDS test), the responses to each of the five shaded versions of the ‘best’ object were larger than the responses to the five images of the ‘worst’ object. Examples of neurons showing this invariance of object preference to changes of illumination direction are shown in Figures 3, 5A, 7 and 10. This invariance is mainly present for the extremes of the object preference (‘best’ and ‘worst’ object), since responses for images of the ‘best’ object and for images of a less optimal object that is still driving the neuron can overlap (see Figs 3B, 5B, 7 and 8).
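The criterion used above, that every shaded version of the ‘best’ object drives the neuron more than every shaded version of the ‘worst’ object, reduces to comparing a minimum with a maximum. The sketch assumes mean net responses per image as input; the names and data are hypothetical.

```python
def preference_invariant(best_responses, worst_responses):
    """True if each shaded image of the best object evokes a larger mean
    response than each shaded image of the worst object."""
    return min(best_responses) > max(worst_responses)

def fraction_invariant(neurons):
    """neurons: list of (best_responses, worst_responses) pairs."""
    hits = sum(preference_invariant(b, w) for b, w in neurons)
    return hits, len(neurons)

population = [
    ([30, 25, 28, 35, 22], [5, 8, 3, 10, 6]),   # preference preserved
    ([30, 9, 28, 31, 27], [5, 10, 3, 4, 6]),    # one shading version overlaps
]
hits, total = fraction_invariant(population)
```

This is a strict, all-pairs test; as the text notes, it does not imply that responses to the best object never overlap with those to a less optimal but still effective object.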
Coding of Illumination Properties in IT
The responses of somewhat less than half of the neurons were significantly affected by the direction of illumination. Thus, it is possible that IT neurons not only signal object attributes per se, but also code for illumination properties such as the direction of illumination. The neuron of Figure 5B responded best to the same illumination direction for two objects, apparently suggesting a tuning for illumination direction. However, for other neurons the preferred illumination direction was usually not consistent across the objects tested. Of 35 neurons that showed an effect of illumination direction for two objects, only 31% (11/35) had the same preferred illumination direction for the two objects. This percentage does not differ significantly from the value of 20% (one of five directions) expected by chance (binomial test, P > 0.05). Thus, IT neurons do not seem to code for the direction of illumination independently of shape.
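The binomial comparison can be reproduced with an exact tail sum using only the standard library. The text does not state whether the test was one- or two-sided; the sketch assumes a one-sided test of "more agreement than chance", with a chance level of 1/5 because each object has five possible preferred directions. The function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

# 11 of 35 neurons preferred the same direction for both objects;
# chance agreement assumed to be 1 in 5 directions.
p_value = binomial_upper_tail(11, 35, 1 / 5)
print(f"one-sided P = {p_value:.3f}")
```

This gives a value above 0.05, consistent with the reported result. In SciPy 1.7 or later, `scipy.stats.binomtest(11, 35, 0.2, alternative="greater").pvalue` should give the same one-sided value; a two-sided test would give a larger P.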
Given this absence of consistent coding for the direction of illumination, what then causes the modulation of IT responses by shading variations? Inspection of the images and the corresponding neural responses strongly suggests that the neurons showing an effect of illumination direction were sensitive to variations of particular features, such as the presence or absence of particular high-contrast planes or borders. For example, the neuron whose responses to all 20 shaded images are shown in Figure 7 responded better to two ‘brick’ objects when the luminance of their upper side was much lower than that of one of the other sides or of the other, curved part. A similar sensitivity to the contrast-dependent saliency of bordering planes likely underlies the shading effect for the neuron of Figure 5B. Another example is shown in Figure 8. This neuron generally responded best to a dark horizontal bar (cylinder or brick) partially overlapping a lighter plane. Note that the best responses were obtained when the bar contrasted most with the other part of the object, suggesting that variations in the degree of segmentation of overlapping shapes contribute to the effects of illumination direction.
Responses to Shaded Objects and their Silhouettes Compared
The neuron of Figure 8 responded weakly to the silhouettes. This agrees with the proposed effect of shading on object part segmentation (see above), since, in the silhouettes, the region where the two parts overlap cannot be segmented from the rest of the partially occluded object part. To determine the generality of this observation, we compared the responses to the shaded images and the silhouettes in the 55 neurons tested with both kinds of stimuli. Figure 9 shows the responses to the five shaded images and the silhouette of each neuron's best object, averaged across all neurons tested. Overall, the responses to the silhouettes were significantly smaller than those to each of the shaded versions (Scheffé post hoc tests: P < 0.05 for each of the five comparisons of shaded and silhouette images), while there was no significant difference in mean responses among the five shaded versions (Scheffé post hoc tests, not significant).
The overall response to the silhouettes is still relatively large (mean >20 spikes/s). Thus, although these neurons were selected with shaded versions of the two-part objects (see search test), they respond reasonably well to images lacking internal contrast borders or other features. This suggests that for many neurons, the outer object contour (or ‘apparent’ contour) is largely sufficient for a selective response. This could, in principle, explain the lack of an effect of shading in many neurons. Indeed, neurons not affected by changes in the direction of illumination might respond only to the outer contour and be insensitive to the internal object features. This can be tested by comparing the responses to the silhouettes with those to the shaded versions of the same object in the subgroup of 23 neurons that showed no significant effect of illumination direction in the IDS test. In 6 of the 23 (26%) neurons, the response to the silhouette did not differ significantly from those to the shaded versions (post hoc test, not significant). An example of such a neuron is shown in Figure 10A. The similar responses to the five shaded images and the silhouette indicate that the outer object border is sufficient for its response and selectivity. For the other 17 (74%) neurons, the response to the silhouette was significantly different from that to the shaded objects. An example of such a neuron is shown in Figure 10B. This neuron, although only weakly and non-significantly affected by the shading variations, responded much more weakly to the silhouette of its ‘best’ object.
Most of the neurons that showed no effect of shading and that responded differentially to the shaded objects and their silhouettes decreased their response to the silhouette. For each of the 17 neurons whose response was affected by the silhouette manipulation, we normalized the response to each of the six images of the ‘best’ object to the maximum of the responses to these images. These normalized responses, averaged across the 17 neurons, are shown in Figure 11A. Although these neurons were only very weakly, if at all, affected by illumination direction, their response decreased, on average, by ~50% when they were stimulated with the silhouette of the same object. In addition, 6 of the 17 neurons lost their shape selectivity (ANOVA, shape effect not significant: P > 0.05) when the four objects were presented as silhouettes. Figure 10B presents an example of such a neuron: it responded much more strongly to one object than to the three others when the objects were shaded, while the responses to the silhouettes of the same objects did not differ significantly.
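The normalization behind Figure 11A (each neuron's six responses divided by that neuron's maximum, then averaged image by image across neurons) can be sketched as follows. The input layout (five shaded images followed by the silhouette) and the requirement of a positive maximum are assumptions; the names and numbers are hypothetical.

```python
from statistics import mean

def normalize_to_max(responses):
    """Scale one neuron's responses so that its largest response equals 1."""
    peak = max(responses)
    if peak <= 0:
        raise ValueError("normalization assumes a positive maximum response")
    return [r / peak for r in responses]

def population_normalized(per_neuron):
    """Average normalized response per image across neurons.

    per_neuron: list of 6-element lists (5 shaded images, then the silhouette)
    """
    normed = [normalize_to_max(r) for r in per_neuron]
    return [mean(image_col) for image_col in zip(*normed)]

neurons = [
    [20, 18, 20, 19, 17, 10],
    [40, 40, 38, 36, 39, 20],
]
averaged = population_normalized(neurons)
```

Normalizing per neuron before averaging prevents strongly firing neurons from dominating the population curve, so the silhouette value directly reads as a fractional response decrease.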
It is possible that the reduced responses to silhouettes are due not to the neuron's sensitivity to the object's inner features, but to a decrease in the contrast of segments of its outer border. Indeed, some segments of the outer border of (most of) the shaded versions of an object are darker, and thus higher in contrast, than those of its silhouette. To test this alternative interpretation, we compared the responses to a shaded version, its silhouette and its dark silhouette (SLC test, see Materials and Methods). The borders in a dark silhouette image have the maximum contrast possible, which is larger than the maximum contrast in the shaded images. This control test was run on seven neurons, and the results, which were highly consistent across neurons, are shown in Figure 11B. This figure shows the average normalized responses to the shaded (3D) object, the mean-luminance-equated silhouette and the dark silhouette. Note the decrease in response to both kinds of silhouettes, ruling out the possibility that the decrease in response to the silhouette is due to parts of its border having less contrast than in the shaded images. These results suggest that some IT neurons respond to luminance changes between or within surfaces. However, the same neurons can show response invariance to these luminance (shading) variations.
Discussion
The responses of about half of the IT neurons were significantly modulated by the intensity and/or direction of illumination of two-part shaded objects. Overall, the size of the response modulation was modest, the maximal response difference due to these lighting variations being about equal to the response standard deviation. The responses of the same neurons were affected much more strongly by object identity, and the object preferences of most IT neurons were invariant to changes in illumination direction. We found no evidence for coding of illumination direction independent of shape. Instead, the modulations by illumination direction appear to be due to variations in the saliency of edges internal to the object. Many neurons responded to contrast changes inside the object's border, in addition to the object border per se, as is evident from their reduced but not absent response to silhouettes of the objects. Finally, we showed that although some IT neurons require these internal contrast variations to respond selectively (they do not respond selectively to silhouettes of these objects), these neurons respond invariantly to the shading variations.
Comparison with Previous Studies on Effects of Illumination
This is the first study of the effect of different object illumination conditions on the responses and/or selectivity of IT neurons, with the possible exception of one study in anesthetized monkeys that compared the responses to images having the opposite contrast polarity (Ito et al., 1994). However, the present results can be compared to those of previous studies using faces as images. It should be kept in mind that effects of lighting conditions on single neuronal responses may differ for objects and faces, since several psychophysical studies suggest a larger effect of illumination for face than for object recognition (Johnston et al., 1992; Braje et al., 1998; Liu et al., 1999).
Rolls and Baylis (Rolls and Baylis, 1986) found that the responses of face-selective IT neurons increased linearly with the logarithm of face contrast. This agrees with the idea that the illumination effects found in our study reflect a sensitivity to the contrast of edges within or between parts of an object. Hietanen et al. (Hietanen et al., 1992) reported that 6 of 21 face-selective neurons responded invariantly to highly different and unusual illumination conditions. This proportion is smaller than what we found for images of objects in the present study. The quantitative difference between their results and ours could be due to several factors, among which are: (i) the more unusual and extreme lighting conditions in the Hietanen et al. study compared with ours; (ii) the presence of cast shadows in the Hietanen et al. study, which were absent from our images; and (iii) a difference between face and object selectivity.
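The relation reported by Rolls and Baylis can be written as r(c) = a + b log10(c). The sketch below fits this form by ordinary least squares to synthetic, noise-free responses; the coefficients a = 60 and b = 30 are made up for illustration and are not taken from Rolls and Baylis. Under this model each doubling of contrast adds the same increment, b log10(2), to the response, which is the kind of graded contrast sensitivity invoked above.

```python
import math


def fit_log_contrast(contrasts, rates):
    """Ordinary least-squares fit of rate = a + b * log10(contrast)."""
    xs = [math.log10(c) for c in contrasts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic, noise-free data (hypothetical coefficients, not recorded responses).
contrasts = [0.05, 0.1, 0.2, 0.4, 0.8]
rates = [60 + 30 * math.log10(c) for c in contrasts]

a, b = fit_log_contrast(contrasts, rates)
step = b * math.log10(2)  # response increment per doubling of contrast
print(f"a = {a:.2f}, b = {b:.2f}, increment per contrast doubling = {step:.2f} spikes/s")
```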
Effects of Illumination on Neural Responses to Objects
It is possible that under real viewing conditions the invariance for shading variations is larger than found here. Indeed, owing to limitations of the rendering software, some shading variations produced what appear to be differences in material rather than differences in the shading of the same material. Thus, some of the effects that we found may reflect a sensitivity to material rather than to shading differences, in which case the cells may have 'correctly' signaled objects made of different materials. On the other hand, it should be pointed out that the present images did not contain cast shadows. These shadows might produce larger modulations than those we observed for shading.
Neurons at earlier stages of the visual system are mainly sensitive to the contrast of image features (Kuffler, 1953). A sensitivity to the contrast of the internal features and/or outer borders of the object can explain the differential responses of IT neurons to images of the same object illuminated with different intensities. The same sensitivity might also account for most of the illumination direction effects. Indeed, relating the images to the responses of neurons sensitive to illumination direction suggested that the saliency of edges between and within object parts can be critical for the modulation of the neuronal responses by illumination direction. Some directions of illumination may simply improve or hinder the detection of particular features the neuron responds to, and/or may enhance or reduce the segmentation of the parts the neuron responds to from other, adjoining parts (Missal et al., 1997). This account of the origin of shading effects in IT differs from one in which object representations carry explicit information on lighting variables, such as the direction of illumination. The use of explicit representations of lighting parameters is discussed in earlier work (Tarr et al., 1998). In fact, we did not find a consistent coding of illumination direction.
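The link between illumination direction and edge saliency can be sketched with a Lambertian shading model. The sketch assumes two adjoining planar facets, a single distant light source, no ambient term and no cast shadows; these are hypothetical choices, not the rendering model used for the stimuli. Under these assumptions the contrast of the internal edge between the facets depends on light direction but not on light intensity, whereas the contrast of the outer border against a fixed background changes with intensity.

```python
import math


def unit(v):
    """Normalize a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def lambert(normal, light_dir, intensity=1.0, albedo=1.0):
    """Lambertian luminance with no ambient term: albedo * intensity * max(0, n . l)."""
    n, l = unit(normal), unit(light_dir)
    return albedo * intensity * max(0.0, sum(a * b for a, b in zip(n, l)))


def michelson(a, b):
    """Michelson contrast between two luminances."""
    return abs(a - b) / (a + b) if a + b > 0 else 0.0


# Hypothetical geometry: two facets meeting at a vertical internal edge.
FACE_A = (-1.0, 0.0, 1.0)  # tilted 45 deg toward the left
FACE_B = (1.0, 0.0, 1.0)   # tilted 45 deg toward the right
BACKGROUND = 0.3           # hypothetical fixed background luminance

FRONTAL = (0.0, 0.0, 1.0)
FROM_LEFT = (-1.0, 0.0, 1.0)
OBLIQUE = (-1.0, 0.0, 2.0)


def edge_contrast(light_dir, intensity=1.0):
    """Contrast of the internal edge between the two facets."""
    return michelson(lambert(FACE_A, light_dir, intensity), lambert(FACE_B, light_dir, intensity))


def border_contrast(light_dir, intensity=1.0):
    """Contrast between facet A and the fixed background (an outer-border segment)."""
    return michelson(lambert(FACE_A, light_dir, intensity), BACKGROUND)


for name, light in [("frontal", FRONTAL), ("from left", FROM_LEFT), ("oblique", OBLIQUE)]:
    print(f"{name}: internal edge contrast = {edge_contrast(light):.3f}")
```

Frontal light renders the two facets equally bright, so the internal edge vanishes, whereas light from the left makes it maximally salient. Scaling intensity leaves the internal edge contrast unchanged in this idealized model but alters the border contrast against the fixed background.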
A comparison of the responses to silhouettes and shaded images revealed that the response invariance for shading results from a sensitivity to the image borders in some, but not all, of the neurons. Indeed, in many neurons the responses to the silhouettes differed from those to the shaded images, indicating that these neurons were also sensitive to features inside the object. Nevertheless, this group of neurons tolerated quite large changes in the luminance gradients and discontinuities inside the object. How does this tolerance for shading variations arise? One possibility is that it results from the convergence of efferents of 'lower-order' neurons, each responding to different shading-dependent features. This is analogous to a computational scheme used to explain both view- and illumination-independent object recognition (Riesenhuber and Poggio, 2000). A second, related possibility is that these neurons respond to the 3D structure of the object (part). Indeed, shading is a potentially important cue for 3D shape ('shape from shading'; Horn, 1981), and thus these neurons may respond to the 3D structure of the object as signaled by the shading variations. This is plausible given our previous finding of selectivity for stereo-defined 3D shape (Janssen et al., 2000a, b) in the rostral lower bank of the STS, which is part of IT. However, further experiments in which the shading-defined depth structure of an object is varied systematically are needed to test this possibility.
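The convergence idea can be sketched with the MAX-like pooling operation used in hierarchical models of the kind reviewed by Riesenhuber and Poggio. Here a hypothetical IT-like unit takes the maximum over three afferents, each assumed to respond best to a feature that is salient under one illumination direction. All firing rates are invented for illustration, and the choice of MAX pooling is an assumption made for the sketch.

```python
# Hypothetical afferent firing rates (spikes/s); each list holds three afferents,
# each assumed to prefer a feature that is salient under one lighting direction.
OBJECT_1 = {
    "left": [40.0, 12.0, 5.0],
    "top": [10.0, 42.0, 11.0],
    "right": [6.0, 13.0, 39.0],
}
# Hypothetical rates of the same afferents to a different object.
OBJECT_2 = {
    "left": [5.0, 4.0, 6.0],
    "top": [3.0, 7.0, 5.0],
    "right": [6.0, 5.0, 4.0],
}


def max_pool(afferent_rates):
    """MAX pooling: the unit inherits the strongest afferent response."""
    return max(afferent_rates)


def spread(values):
    """Range of responses across illumination conditions."""
    values = list(values)
    return max(values) - min(values)


pooled_1 = {light: max_pool(rates) for light, rates in OBJECT_1.items()}
pooled_2 = {light: max_pool(rates) for light, rates in OBJECT_2.items()}
single_afferent_1 = {light: rates[0] for light, rates in OBJECT_1.items()}

print("pooled, object 1:", pooled_1)
print("pooled, object 2:", pooled_2)
print("single afferent, object 1:", single_afferent_1)
```

Each afferent alone is strongly modulated by lighting, but the pooled unit responds almost equally to object 1 under all three directions while staying weak for object 2. Object selectivity is thus preserved alongside tolerance for shading.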
Comparison with Computational Studies of Illumination-invariant Recognition
Computational work, discussed in the Introduction, shows that illumination invariants do not exist. However, edge or luminance-gradient representations can go a long way for some types of objects (i.e. they are quasi-invariants; see Introduction). If such representations were used at the level of IT, one would expect the response modulations of the neurons to correlate with variations in these representations. This is clearly not the case for all neurons. For instance, the responses of the neuron shown in Figure 8 correlate neither with changes in an edge representation computed using the Canny edge detector (Fig. 12) (Canny, 1986) nor with changes in the distribution of the image gradients. Thus, at least some neurons respond to feature variations that are not captured by these computational models. Of course, the biological validity of such computational models can be tested only in neurons whose responses do vary with illumination direction.
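The gradient-direction quasi-invariant of Chen et al. (2000) rests on a simple property: multiplying all luminances by a constant (a uniform change in illumination intensity) or adding a uniform offset leaves the direction of the image gradient unchanged. The sketch below computes an unweighted histogram of gradient directions from forward differences on a small, made-up integer image. It is a simplified illustration, not Chen et al.'s exact statistic, and it does not implement the Canny detector.

```python
import math


def gradient_direction_histogram(img, n_bins=8):
    """Histogram of forward-difference gradient directions, skipping flat pixels."""
    hist = [0] * n_bins
    for r in range(len(img) - 1):
        for c in range(len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c]
            gy = img[r + 1][c] - img[r][c]
            if gx == 0 and gy == 0:
                continue  # direction is undefined where the image is locally flat
            theta = math.atan2(gy, gx)  # in [-pi, pi]
            b = int((theta + math.pi) / (2 * math.pi) * n_bins) % n_bins
            hist[b] += 1
    return hist


# Hypothetical 5x5 luminance image (integers keep the arithmetic exact).
IMAGE = [
    [0, 1, 2, 3, 4],
    [1, 3, 5, 6, 7],
    [2, 5, 9, 10, 11],
    [3, 6, 10, 14, 15],
    [4, 7, 11, 15, 20],
]
# Same scene under a hypothetical threefold brighter light plus a uniform offset.
BRIGHTER = [[3 * v + 10 for v in row] for row in IMAGE]

h1 = gradient_direction_histogram(IMAGE)
h2 = gradient_direction_histogram(BRIGHTER)
print(h1)
print(h2)
```

A change in illumination direction, unlike a change in intensity, generally alters the gradient directions themselves. This is why the measure is only a probabilistic quasi-invariant for direction changes.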
So-called ‘appearance-based’ computational approaches to illumination-invariant recognition do not attempt to construct an illumination-invariant property common to all images of an object; instead, they compute a low-dimensional manifold in image space that encompasses all images of the same object illuminated from all possible directions. Interestingly, for objects with an arbitrary reflectance function, this set of images is a convex cone in image space and can, under some conditions, be approximated from a small set of (learned) ‘basis’ images (Belhumeur and Kriegman, 1998). Although it might be tempting to relate the shading-dependent neurons to these ‘basis’ images, further computational work is needed to determine how well such systems perform in realistic settings, especially for the categorization of novel objects (appearance-based systems require training with a set of images).
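Why a small set of basis images can suffice is easiest to see in a special case that is narrower than Belhumeur and Kriegman's general convex cone: a convex Lambertian surface that is fully lit, with no attached or cast shadows. Each pixel's intensity is then albedo times (normal · light), a linear function of the light vector. An image under any new light that also shadows no pixel is therefore a linear combination of three images taken under three independent lights. The surface normals, albedos and light vectors below are hypothetical.

```python
def render(normals, albedos, light):
    """Lambertian image: pixel = albedo * max(0, normal . light)."""
    return [a * max(0.0, sum(n_i * l_i for n_i, l_i in zip(n, light)))
            for n, a in zip(normals, albedos)]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(columns, target):
    """Cramer's rule: coefficients c with sum_k c[k] * columns[k] == target."""
    m = [[columns[k][i] for k in range(3)] for i in range(3)]  # vectors as matrix columns
    d = det3(m)
    coeffs = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = target[i]
        coeffs.append(det3(mk) / d)
    return coeffs


# Hypothetical surface: four pixels with roughly unit normals facing the viewer (+z).
NORMALS = [(0.0, 0.0, 1.0), (0.2, 0.1, 0.97), (-0.1, 0.2, 0.97), (0.1, -0.2, 0.97)]
ALBEDOS = [0.9, 0.5, 0.7, 0.6]
BASIS_LIGHTS = [(0.0, 0.0, 1.0), (0.3, 0.0, 1.0), (0.0, 0.3, 1.0)]
NOVEL_LIGHT = (0.2, 0.1, 1.0)  # lights every pixel, so no clipping occurs

basis_images = [render(NORMALS, ALBEDOS, s) for s in BASIS_LIGHTS]
coeffs = solve3(BASIS_LIGHTS, NOVEL_LIGHT)
predicted = [sum(c * img[p] for c, img in zip(coeffs, basis_images)) for p in range(len(NORMALS))]
rendered = render(NORMALS, ALBEDOS, NOVEL_LIGHT)
max_error = max(abs(p - r) for p, r in zip(predicted, rendered))
print("coefficients:", coeffs, "max error:", max_error)
```

Once some pixels fall into attached shadow, the max(0, ·) clipping breaks this linearity. That is the situation the convex-cone description handles, at the cost of needing more basis images.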
Invariances of IT Responses and Invariant Object Recognition
Objects can be recognized despite changes in their retinal position (Biederman and Cooper, 1991) and size (Biederman and Cooper, 1992). This position and size invariance has been linked to the position and size invariance of IT neuronal responses (Sato et al., 1980; Schwartz et al., 1983; Ito et al., 1995; Logothetis et al., 1995). However, the position and size invariance of IT neurons should be qualified. First, the responses of many IT neurons depend on stimulus position and size (Ito et al., 1995; Janssen et al., 2000b; Op de Beeck and Vogels, 2000), and thus IT responses per se are not that invariant to position and size changes. Second, the invariance holds, at least to some degree, for shape preferences rather than for absolute responses. Even this invariance of shape preference is not absolute when stimuli other than the ‘best’ and the ‘worst’ shapes are considered, since the responses of individual neurons to optimal and less optimal shapes may switch rank when position and size change (Ito et al., 1995). Given this invariance of shape preference, albeit incomplete, it has been suggested (Schwartz et al., 1983; Vogels and Orban, 1996) that it is not the absolute response but the response relative to that of other shape-selective neurons that is critical for coding shape. Results on shape-cue invariance (Sary et al., 1993; Tanaka et al., 2001) and on the invariance of preferences for partially occluded shapes (Kovacs et al., 1995) in IT neurons fit this proposal.
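The relative-response proposal can be illustrated in its simplest form. Suppose a stimulus change scaled all responses by a common gain, an idealization since, as noted above, rank order can change in real neurons. Each neuron's share of the total population activity would then be unchanged even though every absolute response differs. The population rates and the gain below are hypothetical.

```python
def relative_profile(rates):
    """Each neuron's response as a fraction of the summed population response."""
    total = sum(rates)
    return [r / total for r in rates]


# Hypothetical rates (spikes/s) of four shape-selective neurons to one shape.
baseline = [30.0, 12.0, 4.0, 2.0]
# Idealized change (e.g. in position or size) that halves every response.
scaled = [0.5 * r for r in baseline]

print("absolute:", baseline, scaled)
print("relative:", relative_profile(baseline), relative_profile(scaled))
```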
The present results on invariance for illumination intensity and shading variations suggest that IT neurons that prefer a particular part or shape feature of an object, on average, still express that preference when the same object is viewed under different illumination conditions. Indeed, although the response can be affected by shading variations, the object preference is generally unaffected, an invariance similar to that observed for changes in position and size. Thus, as may hold for size and position invariance, the relative activity of a population of neurons tuned to parts of different objects may underlie illumination-invariant recognition. In such a population code, both active and inactive neurons may play a critical role: cells tuned to parts absent from the presented object will remain inactive whatever the illumination direction, while neurons tuned to the parts of the stimulus will respond, perhaps to degrees that depend on the illumination conditions. Whether only truly illumination-invariant neurons underlie invariant recognition, or whether recognition relies on a more distributed code that includes illumination-variant neurons, remains an open question.
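The population-code idea, including the role of inactive neurons, can be sketched with a simple template-matching decoder. Cosine similarity is used here as a stand-in readout; this is an assumption for illustration, not a mechanism proposed in this study. The six-neuron response vectors are hypothetical: neurons 1 to 3 are tuned to parts of object A and neurons 4 to 6 to parts of object B. A new illumination changes the gains of the active neurons while the others stay nearly silent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two population response vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def decode(response, templates):
    """Return the label of the stored template most similar to the response."""
    return max(templates, key=lambda label: cosine(response, templates[label]))


# Hypothetical templates learned under one illumination direction (spikes/s).
TEMPLATES = {
    "A": [30.0, 25.0, 20.0, 1.0, 0.0, 2.0],
    "B": [1.0, 2.0, 0.0, 28.0, 22.0, 26.0],
}
# Hypothetical responses to the same objects under a different illumination direction.
a_new_light = [18.0, 31.0, 9.0, 0.0, 1.0, 1.0]
b_new_light = [2.0, 0.0, 1.0, 35.0, 12.0, 20.0]

print(decode(a_new_light, TEMPLATES), decode(b_new_light, TEMPLATES))
```

The active neurons' rates change considerably with lighting, yet both objects are still identified. The match relies on the silence of the neurons tuned to the absent object's parts as much as on the activity of the others.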
The technical help of M. De Paep, P. Kayenbergh, G. Meulemans and G. Vanparrys is kindly acknowledged. We are grateful to Moshe Bar for creating the images. Supported by Human Frontier Science Program Organization RG0035/2000-B (R.V. and I.B.), Geneeskundige Stichting Koningin Elizabeth (R.V.), GOA 2000/11 (R.V.), DARPA PWASSP (I.B.) and The James S. McDonnell Foundation (I.B.).