Abstract

This study investigated the cellular mechanisms in the anterior part of the superior temporal sulcus (STSa) that underlie the integration of different features of the same visually perceived animate object. Three visual features were systematically manipulated: form, motion and location. In 58% of a population of cells selectively responsive to the sight of a walking agent, the location of the agent significantly influenced the cell’s response. The influence of position was often evident in intricate two- and three-way interactions with the factors form and/or motion. In only one of the 31 cells tested could the response be explained by a single factor. For all other cells at least two factors, and for half of the cells (52%) all three factors, played a significant role in controlling responses. Our findings support a reformulation of the Ungerleider and Mishkin model, which envisages a subdivision of visual processing into a ventral ‘what’ and a dorsal ‘where’ stream. We demonstrated that at least part of the temporal cortex (‘what’ stream) makes ample use of visual spatial information. Our findings open up the prospect of a much more elaborate integration of visual properties of animate objects at the single cell level. Such integration may support the comprehension of animals and their actions.

Introduction

The brain integrates different features of a visual stimulus, such as its form, colour, motion and location, into a single coherent percept. The question of how the brain manages to do this, given that each visual feature may be encoded by different specialized brain areas (Livingstone and Hubel, 1988), is often referred to as the binding problem, and has been the subject of intense study and debate (e.g. Optican and Richmond, 1987; Singer and Gray, 1995; Perrett and Oram, 1998; Reynolds and Desimone, 1999).

Existing ideas about where in the brain the processing of the different features of a visual stimulus occurs are heavily influenced by the Ungerleider–Mishkin model (Ungerleider and Mishkin, 1982), and by a subsequent adaptation by Milner and Goodale (1995). The Ungerleider–Mishkin model distinguishes two visual cortical streams: a dorsal ‘where’ stream, extending into the inferior parietal cortex, primarily dealing with the spatial position of objects; and a ventral ‘what’ stream, extending into the inferior temporal cortex, dealing with the shape and identity of objects (Desimone and Ungerleider, 1989; Haxby et al., 1991; Köhler et al., 1995). Milner and Goodale questioned the strict ‘what-where’ dichotomy, and suggested that space and form are processed in both parietal and temporal areas but for different purposes (e.g. Goodale et al., 1991; Milner and Goodale, 1995). In their view, the ventral stream subserves visual ‘perception’, i.e. object and scene recognition, requiring allocentric spatial coding to represent the enduring characteristics of objects. This idea has gained support from studies at the cellular level. For example, Dobbins et al. (1998) reported cells coding for object distance in area V4 within the ventral stream.

An area of great interest with respect to visual integration is the anterior part of the Superior Temporal Sulcus (STSa; encompassing both the upper and lower banks and the fundus). A subset of the STSa, consisting of the upper bank and fundus, is often called STP (Superior Temporal Polysensory area; Bruce et al., 1981), as many of the cells in this area respond to auditory and/or somesthetic stimuli in addition to visual stimuli. The STSa is often reported as a focus for the processing of the visual appearance of the face and body, body postures and actions (Gross et al., 1972; Perrett et al., 1982, 1985a,b, 1989b; Jellema and Perrett, 2002). The cells often generalize their selectivity for faces across changes in size, retinal position, orientation, species (human or monkey), luminance and colour (e.g. Bruce et al., 1981; Perrett et al., 1984, 1989b; Rolls and Baylis, 1986; Ashbridge et al., 2000). The upper bank of the STS is thought to form an interface between the ventral and dorsal streams (Karnath, 2001). Indeed, it has been demonstrated that activity of single visual STSa cells is determined by information about both form and motion of animate objects (Oram and Perrett, 1994, 1996; Tanaka et al., 1999; Jellema and Perrett, 2003).

Although early observations of cells in STSa sensitive to spatial cues were made by Bruce et al. (1981), and sensitivity to spatial position has been documented in posterior regions of the STS (STPp) by Hikosaka et al. (1988), the influence of spatial location on STSa cells and its interaction with other visual cues has been largely unexplored. Recently, we discovered that some cell populations in STSa are sensitive to the spatial location of animate objects that moved out of sight behind a screen (Jellema and Perrett, 1999; Baker et al., 2000, 2001). These findings have prompted further investigation into the question of whether single cell sensitivity for location is combined with form and motion sensitivity, and, if so, in what way. The current study provides the first detailed demonstration that single STSa cells integrate information about the form, motion and location of animate objects. We discuss the implications of our findings for ideas about higher-order visual integration.

Materials and Methods

Subjects

The experiments were performed on two awake rhesus macaque monkeys (Macaca mulatta, age 4–6 years). A detailed description of the surgical procedures can be found elsewhere (Oram and Perrett, 1996). Animal care and experimental procedures were performed in accordance with UK Home Office guidelines.

Recording

Single cell recordings (using standard methods, see Oram and Perrett, 1996) were made while the monkey was seated in a primate chair. Spikes were captured online onto a PC (CED1401plus and Spike2 software, Cambridge Electronic Design, UK). Additionally, spikes were stored on an audio track of a HiFi videotape recorder. The stimulus events (seen from the subject’s perspective) were recorded with a video camera, and stored simultaneously on the video track of the same tape. Eye movements were recorded with a second (infra-red sensitive) camera mounted on the primate chair. The signals from the two cameras were integrated (Panasonic VHS video mixer, WJAVE7) prior to recording. The signal from the eye camera was also recorded separately on a second videotape recorder, synchronized with a time-code generator and frame counter (VITC Horita VG50), for offline analysis of eye position.

Stimuli and Testing Procedure

The visual stimulus consisted of a 3-D live presentation of a human agent positioned within the testing room. The agent walked toward or away from the subject in a compatible (i.e. walking forward, head and body facing the same direction as overall movement) or in an incompatible manner (walking backward, head and body facing in the opposite direction to overall movement). This allowed for testing cell sensitivity to three basic features of the visual stimulus: motion, form, and location, with two levels within each modality. The levels of the factor motion were ‘motion away from the subject’ and ‘motion toward the subject’; the levels of the factor form were ‘front view of body’ and ‘back view of body’; and the levels for the factor location were ‘near to subject’ and ‘far away from subject’. The walking agent was chosen as stimulus because it allowed for easy manipulation of the three factors. Furthermore, STSa cells respond maximally to animate actions, of which walking actions are especially well represented (cf. Oram and Perrett, 1996).

The testing procedure consisted of systematic manipulation of the levels of the three factors motion, form and location. In a typical experiment the agent walked (compatibly or incompatibly) toward or away from the subject. The total walking distance per trial was 4.5 m (walking velocity 1 m/s), between the subject and the opposite wall. The walking space in the testing room (dimensions: 5 m depth and 4 m width, relative to the subject) was pragmatically divided into three zones: ‘near’ (between 1 and 2 m from subject), ‘middle’ (2–3 m), and ‘far’ (3–4 m) (see Fig. 1A for a plan view of the testing room). Only the ‘near’ and ‘far’ locations were used in the analysis, resulting in 2 × 2 × 2 = 8 conditions.
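The factorial design described above can be enumerated directly; a minimal sketch follows, in which the level labels are ours and serve only to illustrate how the eight conditions arise.

```python
from itertools import product

# The three binary factors and their two levels each
# (labels are ours, paraphrasing the text).
motion = ("away", "toward")
form = ("back view", "front view")
location = ("near", "far")

# Crossing the three factors yields the 2 x 2 x 2 = 8 experimental conditions.
conditions = list(product(motion, form, location))
print(len(conditions))  # 8
```

Each tuple in `conditions` corresponds to one cell of the factorial design, e.g. `("away", "back view", "far")` is the compatible walking-away condition at the far location.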

Half of the cells were additionally tested with the agent walking within the ‘far’ zone to the left and to the right with respect to the subject. Note that the factors motion and form together determined the manner of walking (compatible or incompatible).

The velocity of walking (1 m/s), and the gait and appearance of the agent, were kept constant (one and the same agent was used for the live 3-D presentations). Other agents (with different gait and appearance) were regularly tested in addition to the standard one, but this was never observed to produce different results.

Definition of Cells Sensitive to Walking

A cell was defined as sensitive to walking when it responded significantly more to either forward or backward walking in at least one of the four directions tested (toward and away, and to the left and right, with respect to subject), compared to a range of control stimuli. No cells were found to respond to both forward and backward walking. In cases of cell sensitivity to, for example, the back view of the agent walking away from the subject, the primary control conditions consisted of the static back view of the agent (to exclude cell sensitivity to just the back view of the head and/or body), and of the agent walking away from the subject in a backward manner (to exclude sensitivity to just the direction of motion, and at the same time to exclude sensitivity to spatial position, irrespective of the shape of the object). Note that the latter control condition also served as experimental condition. The necessity of the walking action was further tested by presenting (i) single arm or leg movements, which formed part of the preferred walking action [cell selectivity for individual limb articulation has been reported (Perrett et al., 1985b; Jellema et al., 2000)]; (ii) a variety of other whole body actions; and (iii) moving non-animate control objects. The latter consisted of rigid objects of comparable size (e.g. a large screen on wheels covered by a lab coat), which were pushed at similar velocity along the walking trajectories by the experimenter (who remained out of sight behind the screen). Cells defined as responding to walking were significantly less excited by each of these control conditions.

Multiple Visual Cues Contribute to Each of the Three Factors

One should bear in mind that multiple visual cues are likely to have contributed to each of the three main factors motion, form and location, and that the computations involved in producing each individual factor may have taken place either inside or outside STSa.

At least two sources of visual form information exist: form from physical body cues with rigid motion and form-from-motion (i.e. the typical pattern of articulation of limbs characteristic for forward or backward walking). Previous studies from our lab indicated that both sources contribute to STSa cell sensitivity for walking agents (Oram and Perrett, 1994, 1996). Virtually all STSa cells sensitive to walking agents are sensitive to form cues available from the rigidly moving body. This is suggested by the finding that equivalent translations of the body, in which the direction of motion and the body/head view were the same as in the preferred walking action, also excited the cells, but at reduced firing rate compared to the articulated walking action (Oram and Perrett, 1996). The body translations consisted of an agent standing on a mobile platform while the platform was made to move. The reduced firing rate during body translations indicates that the limb articulation during walking contributed to the responses to walking. STSa cell sensitivity for form-from-articulated motion is further supported by findings that ∼25% of cells sensitive to a walking agent responded to biological motion stimuli corresponding to the specific walking action, but again at reduced firing rate (Oram and Perrett, 1994). In the biological motion condition the form of the body is defined only by the motion of light patches attached to the points of limb articulation. We therefore assume that all STSa cells sensitive to a walking agent use form information from body cues (e.g. head and body view), and to a certain extent the biological motion of the articulating limbs.

Visual cues to an object’s spatial location are numerous and include the retinal image size of the object (especially for familiar objects), expansion/contraction, disparity, and environmental cues to distance. Again, it is likely that several of these cues contribute to sensitivity for spatial location. It is unclear to what extent location may have been computed inside STSa, or elsewhere and fed into STSa. Findings of spatial sensitivity in single neurons in V4 suggest that distance may be computed at stages earlier in the processing chain than STSa (Dobbins et al., 1998).

The motion of the object is most likely coded in the prestriate areas MT/V5 and MST, and fed into STSa (Boussaoud et al., 1990; Oram and Perrett, 1996).

In the present study we did not attempt to assess the relative contributions of the different visual cues, because we were interested in the contribution of each factor per se, irrespective of its origin.

Stimulus Presentation

The stimuli were typically presented live from behind a fast rise-time liquid crystal shutter (aperture 20 by 20 cm at a distance of 15 cm). Between 5 and 12 repetitions were used per condition. In some cases a mechanical shutter with a larger aperture was used to avoid restricting the subject’s field of view. In addition to live presentation, stimuli were sometimes presented on film projected onto a screen at life size. The video stimuli were made with a camera positioned at the subject’s location to produce a realistic image. The live stimuli were shown at 1–4 m distance from the subject. Retinal images of live presented bodies varied from ∼67° × 23° (vertically × horizontally) at 1.5 m distance, to 28° × 9° at 4 m distance. Control stimuli consisted of objects of comparable size moved in compatible ways.

Analysis

Offline spike sorting was routinely performed (Spike2, Cambridge Electronic Design, UK). Spike counts were obtained during 1 s epochs in which the agent was walking in the ‘near’ and ‘far’ zones; the ‘middle’ zone was discarded from the analysis. The analysis epoch did not start directly following shutter opening. Upon shutter opening the subject was confronted with the sight of the agent standing still, waiting to commence walking. This image lasted for ∼2 s, after which walking started. The analysis epoch started after walking had begun, thus excluding the acceleration at the start, and it ended before the end of the walk, so as to exclude the deceleration at the end of the walk. This was done to avoid confounding the results with cell sensitivity for changes in velocity.

Cell responses were analyzed using ANOVAs and Newman–Keuls post-hoc testing (significance level α = 0.05). In addition, multiple regression analysis (with effect coding: +1 and –1 for the two levels of each factor) was performed on each individual cell tested in all eight stimulus conditions, to fit a linear equation to the responses. This allowed an estimate to be made of the relative weight of each factor, and of all two-way, and the three-way, factor interactions. Only cells tested in all eight stimulus conditions were included in the analysis (n = 31).
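The effect-coded regression described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the firing rates below are synthetic, and with one mean rate per condition the eight-parameter model fits the eight means exactly, whereas the actual analysis would draw its error term from trial-level responses.

```python
import numpy as np

# Effect codes (+1/-1) for the two levels of each factor, following the text:
# motion away = -1 / toward = +1; back view = -1 / front view = +1;
# near = -1 / far = +1.
codes = np.array([(m, f, l) for m in (-1, 1) for f in (-1, 1) for l in (-1, 1)])
M, F, L = codes.T

# Design matrix: intercept, three main effects, three two-way interactions,
# and the three-way interaction.
X = np.column_stack([np.ones(8), M, F, L, M * F, M * L, F * L, M * F * L])

# Synthetic mean firing rates (spikes/s) for the eight conditions.
rates = np.array([16.5, 11.0, 2.0, 3.0, 1.5, 2.5, 1.0, 2.0])

# Least-squares fit of the linear equation to the responses.
beta, *_ = np.linalg.lstsq(X, rates, rcond=None)
# beta[0] is the grand mean; the remaining entries give the relative weight
# of each factor and interaction, as in Table 1.
```

Because the effect-coded columns are mutually orthogonal, the fitted intercept equals the mean response over all eight conditions, matching how the intercept is interpreted in the text.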

Eye position was analyzed offline (Iview, Sensomotoric Instruments, Germany). Statistical analysis of the percentages of time the subject spent fixating with different eye positions during the recording periods indicated that the response magnitude was not related to the pattern of fixation.

Cell Localization

A detailed description of the cell localization procedure can be found elsewhere (Jellema et al., 2000). At completion of each experiment, frontal and lateral X-ray photographs were taken with the electrode still in place, to locate the electrode and the recorded cells with respect to specific bone landmarks (Aggleton and Passingham, 1981). During the final experiment, electrolytic microlesions were produced at the site of recording. The subject was then sedated and given a lethal dose of anaesthetic. After transcardial perfusion the brain was removed, and coronal sections (25 µm) were cut, photographed and stained. The X-ray photographs were aligned with the histological sections to determine the cell locations (accuracy ∼1 mm).

Although the histological reconstructions indicated that all recordings in the current study were made from cells in the upper bank and fundus of the STSa (i.e. STP, polysensory area; Fig. 4B), we used the term STSa for the sake of consistency with our previous studies. Also, the error in the accuracy of reconstructions (∼1 mm) means that we cannot exclude the possibility that some cells may have been located in the lower bank of STSa, outside STP.

Results

A total of 272 STSa cells were screened for their specific visual responsiveness by presenting a range of bodily actions, which included limb articulations, head rotations and whole body actions, and a range of static body postures. Of these 272 cells, 64 responded indiscriminately to most or all of the actions, or to any motion in a particular direction. These cells were classified as ‘motion general’ cells, and were discarded. The remaining 208 cells (76%) produced a significantly larger response to a particular bodily action (or posture) than to any of the other actions (or postures) out of the range of actions (and postures) presented. Examples of effective actions are arm reaching, head rotation, bowing, crouching and walking. Cells found to be sensitive in the screening test to a walking agent were next subjected to further detailed testing. Forty cells met the criteria for responsiveness to the sight of a walking agent (see Methods for specification of criteria). Of these, 31 cells were tested with all eight experimental conditions, and only these cells were included in this report.

Cells Sensitive to the Spatial Location of the Agent

Eighteen out of the 31 cells (58%) showed sensitivity for the spatial location of the walking agent. These cells responded significantly differently depending on where the agent was walking: in the ‘near’ or in the ‘far’ location. Two typical examples of such cells are given in Figures 1 and 2.

The cell illustrated in Figure 1 responded maximally to the agent when walking at the ‘far’ location, and significantly less at the ‘near’ location (Fig. 1B, top left). Interestingly, the location sensitivity was present only when the direction of motion of the agent was away from the subject, and the back view of the body was visible to the subject (i.e. forward or compatible walking). Changing the levels of one, or both, of the factors form and motion was enough to abolish the response at the ‘far’ location. Thus, the sight of the agent at the ‘far’ location walking toward the subject, maintaining the back view of the body to the subject (i.e. incompatible or backward walking; Fig. 1B, top right) did not evoke a response, nor did the sight of the agent walking at the ‘far’ location away from the subject with the front body view directed to the subject (backward walking; Fig. 1B, bottom left). Thus, a walking action at the ‘far’ location was effective only when the form information indicated the back view and the motion information indicated a direction away from the subject. In other words, the cell’s location sensitivity was not absolute, irrespective of form and motion, but instead completely depended on the form and motion factors. One-way ANOVA showed a significant condition effect [F(7,64) = 8.83, P < 0.00001]. Post-hoc testing showed that the stimulus combination consisting of motion away/back body view/location ‘far’ evoked a larger response than any of the other seven stimulus combinations (P < 0.0005).

Figure 2 illustrates another cell, which responded maximally to walking at the ‘near’ location, provided the front view of the body was visible and the direction of motion was away from the subject. Again, changing the body view or the motion direction abolished responses [F(7,40) = 18.56, P < 0.00001]. Post-hoc testing showed that the combination motion away/front body view/location ‘near’ evoked a larger response than all other combinations (P < 0.0002).

Cells not Sensitive to the Spatial Location of the Agent

In the remaining 13 of the 31 cells no sensitivity for location was found. A relatively large number of these cells responded in an object-centred manner (9/13, 69%). That is, they coded for one type of walking, either forward walking or backward walking, irrespective of the direction of motion and body view (cf. Perrett et al., 1989b). Object-centred coding was also found in cells that were sensitive to location, albeit less frequently (5/18, 28%). Equal numbers of cells responded to forward and to backward walking.

The cell illustrated in Figure 3 is a typical example of an object-centred cell, insensitive to the location of the agent. This cell spiked vigorously as long as the agent walked forward, either away from the subject (Fig. 3A, top left) or toward the subject (bottom right). Backward walking in both directions evoked significantly smaller responses (top right and bottom left) [F(7,62) = 24.4, P < 0.00001]. In none of the conditions did the response significantly differ between the ‘near’ and ‘far’ locations (P > 0.1). The stimulus combinations in which body view and motion direction indicated forward walking (front view, motion toward; back view, motion away) evoked significantly larger responses than the combinations indicating backward walking (P < 0.0002), at both the ‘near’ and the ‘far’ location. The sensitivity to forward walking also extended into other directions, i.e. to the left and right of the subject (Fig. 3B). No location sensitivity was found between the ‘left’ and ‘right’ locations (P > 0.1).

All nine cells that coded in an object-centred manner for the live 3-D walking agent were additionally tested with films of the walking agents. The videoed stimuli consistently produced smaller responses than the live stimuli, but the characteristic response properties (i.e. coding for either forward or backward walking) were preserved.

Population Response

Table 1 summarizes the multiple regression analyses performed on each individual cell. The main effects for the factors motion (M), form (F) and location (L), and the two- and three-way interactions are indicated. The table should be read as follows (using cell 1 as an example): the intercept value (9.0) represents the mean number of spikes/s over all eight conditions. The entry in column M determines the number of spikes that should be added/subtracted in case of a main motion effect. Here the entry (–7.5) is multiplied by the effect code (–1 or +1). Since ‘motion away’ was given an effect code of –1, the effect of walking away is an increase in spiking activity of 7.5 spikes to 16.5 spikes/s. Walking toward (effect code: +1) results in a decrease of spiking activity of 7.5 spikes to 1.5 spikes/s. Similarly, the F and L columns give the increase/decrease in spiking activity in case of a main effect of form (effect code is –1 for ‘back body view’, and +1 for ‘front body view’) and location (effect code is –1 for ‘near’, and +1 for ‘far’), respectively. To obtain the contributions of the two- and three-way interactions, the entry is multiplied by all the effect codes of the appropriate levels. Thus, the contribution of the three-way interaction between e.g. ‘motion away’ (–1), ‘back view’ (–1) and ‘far’ (+1) is an increase of 2.3 spikes [(–1) × (–1) × (+1) × 2.3 = 2.3] to 11.3 spikes/s (which is a non-significant effect). In this way, the mean cell response in each of the eight conditions can be calculated. The results of these calculations are given in Table 2, where C1 to C8 are the eight conditions tested.
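The worked example above can be checked directly. The snippet below uses only the cell 1 coefficients quoted in the text (intercept, M and the three-way term); the variable names are ours.

```python
# Coefficients for cell 1 as quoted in the text (from Table 1).
intercept = 9.0   # mean spikes/s over all eight conditions
beta_M = -7.5     # main-effect coefficient for motion
beta_MFL = 2.3    # three-way interaction coefficient (non-significant)

# Main motion effect: multiply the table entry by the effect code.
rate_away = intercept + (-1) * beta_M    # 'motion away' (code -1): 16.5 spikes/s
rate_toward = intercept + (+1) * beta_M  # 'motion toward' (code +1): 1.5 spikes/s

# Three-way interaction for 'motion away' (-1), 'back view' (-1), 'far' (+1):
# multiply the entry by all three effect codes.
contrib = (-1) * (-1) * (+1) * beta_MFL       # +2.3 spikes
rate_with_interaction = intercept + contrib   # ~11.3 spikes/s
```

Summing the intercept with every coded main-effect and interaction term in the same way reproduces the per-condition means reported in Table 2.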

Table 1 shows that all 31 cells (100%) were influenced by the factor form, through a main effect and/or interaction with one or both of the other factors. The vast majority of the cells (28/31, 90%) were influenced by the factor motion (main and/or interaction effects). A smaller percentage of cells, 58% (18/31), was influenced by the factor location: nine cells showing a main effect of location (five with additional interactions) and nine further cells showing significant interactions in the absence of a main location effect.

Of the significant two-way interactions, the form × motion interaction was most frequently encountered (in 87% of cells, 27/31). The interactions with location were less frequent: the location × form interaction was found in 32% (10/31) and location × motion interaction in 29% (9/31) of cells. The three-way form × motion × location interaction was least frequently found (in 19% of cells, 6/31; see Figs 1 and 2 for examples of such cells).

Although individual cells discriminated sharply between conditions, one-way ANOVA showed that the population of cells as a whole did not discriminate between them [F(7,240) = 1.7, P = 0.101].

Cell Preference for Certain Combinations of Factor Levels

The population of 31 cells turned out to be quite heterogeneous with respect to the particular combination of factors most effective in driving the cell. For each of the eight possible factor-level combinations, a cell could be found that was maximally excited by it, or which produced a response not significantly different from the maximal response (Table 2). Certain combinations were, however, much more prevalent than others.

Close examination of the motion × form interactions showed that 19 of the 27 cells (70%) were sensitive to form and motion interactions indicating forward walking (i.e. motion away/back body view; motion toward/front body view). These are the cells in Table 1 with significant values in the MF column with positive sign, because multiplication of the effect codes for ‘motion away’ (–1) and ‘back body view’ (–1) is +1, as for ‘motion toward’ (+1) and ‘front body view’ (+1). A minority of the cells (8/27, 30%) responded to motion × form interactions that indicated backward walking (motion away/front body view; motion toward/back body view). The 3:1 ratio of cells coding for forward and backward walking confirmed earlier reports (Perrett et al., 1985a, 1989b; Oram and Perrett, 1996). The ratio might reflect the prevalence of forward walking in human society and, to a lesser extent, in monkey society.

With respect to interactions with the factor location, one might expect the ‘near’ location to be preferentially combined with ‘motion toward’ and ‘front body view’. Similarly, one might expect the ‘far’ location to be preferentially combined with ‘motion away’ and ‘back body view’. The underlying idea is that visual features that convey a similar message are combined (cf. Mistlin and Perrett, 1990). However, due to the relatively small number of cells, firm conclusions could not yet be drawn. The cell numbers are provided here merely to give an indication. Of the nine cells with significant motion × location interactions, three showed the expected preference (the cells with a negative value in the ML column). Of the ten cells with significant form × location interactions, seven showed the expected preference (cells with a negative value in the FL column).

Cell Locations

For the majority of the cells that integrated form, motion and location, histological reconstructions were made. Figure 4 shows a reconstruction of the locations of cells for one subject. All cells were located within the upper bank and fundus of STSa, between 12 and 20 mm anterior to the interaural plane.

Discussion

We demonstrated for the first time in detail that single cells in STSa integrate information about the form, motion and location of animate objects. Integration of form and motion information at the single cell level in STSa has been reported previously (Perrett et al., 1985a, 1989b; Oram and Perrett, 1994, 1996), but in these studies the factor location was not manipulated. The present study was designed to allow for an accurate estimation of location effects, in addition to motion and form effects. We demonstrated that in 58% of a population of 31 cells sensitive to a walking agent the location of the agent influenced the responses, often by means of intricate two- and three-way interactions with the factors form and motion. The factors form and motion were even more influential; form affected responses in all 31 cells (100%), motion in 90% of the cells. The relative impact of the three factors varied strongly from cell to cell. In just one of the 31 cells tested the response could be explained by a single factor; in all other cells at least two factors, and in half of them all three factors, played a role.

Cellular Computations

The intricate and varied ways in which the three factors explained the cells’ responses provide a glimpse of the computations that take place at the cellular level, presumably within cortical columns. It appears that the weight of the input of the three factors varies from cell to cell, while the cell’s output does not merely reflect a summation of the individual inputs, but follows conditional rules. For example, the location ‘near’ may have a significant influence on the responses of a cell on the condition that the agent is moving away from the subject and the front view of the body is visible; changing the level of just one of these factors abolishes responses (see Fig. 2).

Suggestions as to the underlying computations also come from recordings of cells located closely together along a single electrode track. Such cells sometimes showed surprisingly large differences in preferred stimulus configuration. Although these data are anecdotal, they nicely highlight the complexity of cellular computations within a small patch of STSa. An example is given in Figure 5, which shows recordings from three neighbouring cells along a single track in STSa. The upper cell responded maximally to the agent walking compatibly away at the ‘far’ location. The middle cell, located ∼85 µm deeper in the brain (according to the reading on the micromanipulator), required the agent to walk away incompatibly at the ‘near’ location. The third cell, located just 15 µm deeper than the second cell, responded to incompatible walking in either direction, irrespective of location. Thus, along this single track, the contributions of all three factors varied significantly over a distance of ∼100 µm. In contrast, there were other examples of tracks of neighbouring cells with quite similar response patterns. Since electrode tracks were positioned perpendicular to the longitudinal axis of the STS, different recording tracks likely sampled from different columns. The inaccuracy in histological procedures for reconstructing cell location (Fig. 4) does not allow conclusions to be drawn as to the number of columns that might have been traversed by individual electrode tracks.

Where Does Spatial Coding in STS Come From?

The location information in STSa could have come from various parts of the brain and could have been based on various visual cues. For instance, distance sensitivity observed in cells in area V4 (Dobbins et al., 1998) might extend into STSa, since V4 forms the main visual input into inferior temporal cortex (IT), and IT projects heavily onto STSa (Felleman and Essen, 1991). Of particular interest are the nearby hippocampus and parahippocampal gyrus (the latter projects to STSa via the perirhinal cortex; Seltzer and Pandya, 1994). The primate hippocampus contains cells that code for the view of a particular part of the testing environment (i.e. allocentric place coding; Tamura et al., 1992; O’Mara, 1995; Rolls et al., 1997). In humans, the parahippocampal gyrus becomes active during passive viewing of the geometrical layout of local coherent space (Epstein and Kanwisher, 1998; Epstein et al., 1999), and when subjects have to navigate or recall navigational routes (Maguire et al., 1997). The parahippocampal area thus seems to code the spatial layout irrespective of the presence of discrete objects at those locations, whereas the spatial sensitivity we found in STSa requires the presence of an (animate) object at the effective location. Thus, information about spatial location has a profound influence in the temporal lobe, but its utilization in the visual processing of complex animate objects and their actions is still largely unknown.

The retinal size of the object might also have been a cue to its distance. For familiar objects of known dimensions (such as humans), a small retinal size indicates a far away location and a large retinal size a nearby location. However, the prevalent view is that cells in inferior temporal cortex (including STSa) generalize over object size, as over many other stimulus characteristics (such as illumination, contrast, and orientation; Gross et al., 1972; Bruce et al., 1981; Perrett et al., 1982, 1985b; Rolls and Baylis, 1986; Kovács et al., 2003). This is consistent with their presumed role in object recognition and object constancy. There are nevertheless indications that generalization over object size may be less ubiquitous in STSa than previously assumed (Ashbridge et al., 2000).

The Frame of Reference for Spatial Coding

We labelled the locations of the walking agent from the subject’s perspective, i.e. near or far from the subject, to the subject’s left or right. Although this reflects an egocentric frame of reference, we cannot exclude an allocentric frame of reference (i.e. spatial descriptions based on environmental landmarks rather than the subject’s own position and orientation). The abundance of cells in STSa sensitive to the perspective view of the object (viewer-centred framework), compared to the minority of cells responding equally well to all views of an object (object-centred framework) (Perrett et al., 1991; Ashbridge and Perrett, 1998), suggests that a spatial sensitivity of STSa cells might also be expressed in relation to the observer (i.e. egocentric; but see Milner and Goodale, 1995). However, allocentric coding has been observed for STSa cells sensitive to goal-directed actions (Perrett et al., 1989b) and occluded agents (Baker et al., 2001). Ego- and allocentric coding do not have to be mutually exclusive. Processing of position could begin with an egocentric frame and progress to an allocentric one, in much the same way as view-general (object-centred) cell properties can be generated by combining view-specific cell properties (Perrett et al., 1989b).

Some authors emphasized the distinction between categorical and coordinate spatial representations (e.g. Kosslyn, 1987). Categorical representations specify relative spatial relations (e.g. on top or below, to the left or to the right), which are especially relevant for detecting goals and consequences of actions. Coordinate representations specify the exact spatial positions, and are typically used for guiding actions (e.g. picking up an object). Our data are consistent with the notion that spatial coding in STSa is of the categorical type, i.e. independent of the absolute positions of the agent and the object in space, but dependent on their relative positions.

The Functional Significance of Positional Coding in STSa

Why would the ventral visual stream care about distance if its purpose is to perform object identification? One would rather expect it to ignore distance in order to achieve object constancy.

Previously we suggested that STSa plays a role in the visual analysis of the intentions and goals of others’ actions (i.e. social cognition), in addition to animate object identification (Emery and Perrett, 1994; Jellema and Perrett, 2002). We argue that the significance of spatial coding in STSa must be seen in this light. The spatial positions that an individual occupies with respect to other individuals or objects contain vital information for an observer when it comes to determining the goal or intention of that individual.

Cells in STSa have been shown to code for congruent sets of body actions and postures, which convey information about the direction of others’ attention (Perrett et al., 1992), their intentions (Jellema et al., 2000) and goals (Perrett et al., 1989b). Such body actions typically relate to particular locations, e.g. reaching toward a location at which a food reward is kept, or walking toward the door. The previous studies, however, did not define the role of these target locations in the cell sensitivity. It might well be that spatial sensitivity also extends to these situations, but this remains to be investigated.

Why Was Spatial Coding in the STS not Found Before?

Our results suggest that spatial coding may indeed be widespread in STSa, which prompts the question of why it was not found before. The only reports so far come from our laboratory, showing that STSa cells are sensitive to the location of occluded agents (Jellema and Perrett, 1999; Baker et al., 2000, 2001). One reason is probably that, given the predominant view of the functions of the dorsal and ventral visual streams (Ungerleider and Mishkin, 1982), most studies of the ventral stream were biased towards investigating object recognition, neglecting possible effects of position.

Another reason may be related to the function of the STS in social perception (Allison et al., 2000; Jellema and Perrett, 2002). Spatial relationships may be coded in STS provided they contribute to the social significance of the visual stimuli. This might explain the failure to activate the STS in imaging studies that used ‘socially meaningless’ spatial relationships between human figures to localize spatial coding in the brain (e.g. Courtney et al., 1996).

Our findings suggest that spatial coding is not an exclusive property of the dorsal visual stream, but occurs in the ventral visual stream (STSa) as well. Moreover, our findings open up the prospect of a much more elaborate integration of visual information about animate objects at the single cell level in STSa. Such integration may support the comprehension of animals and their actions.

This work was supported by the Human Frontier Science Program. We thank Bruno Wicker for his contribution to some of the experiments.

Figure 1. Cell response determined by the interaction between motion, form and location information. (A) Plan view of the testing room, in which the location of the subject (S) and agent (filled circle), and the agent’s walking path (interrupted lines) are indicated. Walking was performed in either forward or backward manner. (B) Rastergrams and stimulus time histograms show the cell activity during 3 s stimulation periods. Responses are shown for the four combinations of the factors Motion and Form: Motion away/Back body view (top left), Motion away/Front body view (bottom left), Motion toward/Back body view (top right), and Motion toward/Front body view (bottom right). Locations in the testing room are indicated at the top of the rastergrams (near, middle, far).

Figure 2. Cell response determined by the interaction between motion, form and location information. Rastergrams and stimulus time histograms show the cell activity during 3 s stimulation periods. Details are as in legend to Figure 1. This cell responded maximally to a different, specific, combination of the three factors, namely: Motion away/Front body view/Near location (bottom left panel).

Figure 3. Cell response determined by the interaction between motion and form information, irrespective of location. Rastergrams and stimulus time histograms show the cell activity during stimulation periods of 3 s (A) and 2 s (B). Details are as in legend to Figure 1. (A) Responses to the four combinations of the factors Motion and Form. (B) The agent additionally walked from left to right and vice versa. In this case, the two levels of the factor Form were the left and right profile views of the body. Walking in the left–right dimension lasted for 2 s.

Figure 4. Histological reconstruction of cell locations. (A) Left side view of the macaque brain. Cells were recorded in the banks of the STSa, between 11 and 19 mm anterior to the inter-aural plane (indicated by vertical bars). (B) Reconstruction of coronal sections of the left hemisphere taken at 2 mm intervals from 20 to 12 mm anterior to the inter-aural plane. Each section represents a 2 mm thick slice. Thick line, cortical surface; thin lines, edge of grey matter.

Figure 5. Different response characteristics in neighbouring cells. Mean responses (± SE) are shown of three cells (cells 1, 2 and 3), which were located at 85 and 15 µm distances from each other along a single recording track (see inset at the left). 1 s period of assessment.

Table 1


 Summary of the multiple regression analysis for all individual cells

Cell  Intercept  M  F  L  M × F  M × L  F × L  M × F × L  R2
 1  9.0 –7.5***  –6.8***  2.6    7.7*** –2.9 –2.6   2.3 0.84 
 2  7.1 –5.5***  –4.9***  1.3    5.5*** –1.4 –1.1   1.0 0.92 
 3  4.0  3.5***   0.9**  0.7*    1.2***  0.4  0.0   0.1 0.86 
 4 13.3 12.5***   7.9*** –5.5**    7.4*** –5.8** –3.9*  –4.2* 0.89 
 5 10.3 –3.2***   3.0***  2.7***    1.9** –5.6*** –1.2*   2.1** 0.88 
 6 14.0 –1.8  –7.7***  1.2   –0.5  0.7 –1.1  –0.4 0.54 
 7 15.6  1.7 –12.1***  1.1   –3.3*  0.9 –1.7  –0.7 0.68 
 8  1.2 –1.0***  –0.5*  0.4    0.7* –0.6* –0.3   0.1 0.5 
 9 10.1  0.1  –0.2  0.2    5.0***  0.4  0.9*   0.0 0.72 
10 21.8  0.8   0.5  0.0    3.9***  0.6  1.6  –0.4 0.28 
11 11.3  3.3***   2.5**  1.2    9.7***  3.6***  2.7**   0.8 0.89 
12  9.9  0.4   1.2  1.2    3.7***  2.5*  1.5   0.0 0.52 
13 25.9 –1.5  –0.3 –1.8    2.8*  3.5**  1.4  –0.8 0.39 
14 16.3  1.5   1.9 –0.2    5.1**  0.6  0.1  –1.9 0.34 
15  7.1  1.1   0.2  0.8    1.3  0.8  1.4*  –0.7 0.31 
16  0.3 –0.3*  –0.3*  0.3*    0.3* –0.2 –0.3*   0.3* 0.42 
17  0.8 –0.5**  –0.6**  0.1    0.5** –0.2 –0.4*   0.3 0.35 
18  3.2 –2.5***  –1.8**  0.4    1.4** –0.5 –1.3*   1.2* 0.49 
19  3.0 –1.2**   1.2** –0.9*   –2.7***  1.5*** –1.6***   0.8* 0.76 
20  2.8 –1.3***   1.3*** –0.8**   –2.7***  1.2*** –1.1***   0.8** 0.81 
21 12.2  0.5  –0.3 –0.3  –11.7***  0.8 –1.1   0.0 0.77 
22  2.6  1.1  –0.9* –0.8*   –2.2*** –0.4  0.3   0.6 0.67 
23  1.1  0.1  –0.2 –0.1   –1.1**  0.2 –0.2   0.1 0.36 
24 16.5  2.6  –3.2  0.1  –13.7***  4.3* –3.8  –1.8 0.76 
25  4.8  2.7**  –2.9** –0.1   –4.3***  0.5 –0.2  –0.2 0.73 
26 17.9  7.7***   2.4*  0.0    4.8***  0.3  1.1  –0.2 0.65 
27  6.4 –1.5  –2.0* –1.7    3.2**  1.0  1.0  –0.4 0.72 
28  8.3 –4.6***  –2.3* –0.1    0.5 –0.9  0.6  –0.1 0.85 
29 11.0 –3.3**  –3.2** –3.0*    3.8**  1.4  0.5   0.0 0.74 
30  2.3 –1.2  –1.8* –1.5*    1.4  0.7  1.1  –0.9 0.59 
31  3.9 –1.6*  –2.1* –1.2    2.8***  1.3  1.1  –1.0 0.64 
No. sign. cells  17  22  9   27  9 10   6  

b-values are given for the main effects, and for the two- and three-way interactions, of the three factors motion (M), form (F) and location (L). Each factor consisted of two levels (coded –1 or +1). Factor motion: away from subject (–1), toward subject (+1); factor form: back body view (–1), front body view (+1); factor location: near to subject (–1), far from subject (+1). The intercept values represent the mean number of spikes/s of each cell across all conditions. Significant b-values: *P < 0.05; **P < 0.005; ***P < 0.0005. Squared multiple correlation coefficients (R2) are shown in the right-hand column.
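For readers who wish to retrace the analysis, the coding scheme described above can be sketched as an ordinary least-squares fit with an orthogonal ±1 design matrix. This is a minimal illustration, not the authors' code: the study fitted trial-by-trial responses (which is what yields the significance tests), whereas here we fit only the eight condition means of cell 1 from Table 2, which is enough to recover the b-values reported in the first row of Table 1.

```python
import numpy as np

def design_matrix(M, F, L):
    """Columns: intercept, M, F, L, M*F, M*L, F*L, M*F*L (all factors coded -1/+1)."""
    M, F, L = (np.asarray(v, float) for v in (M, F, L))
    return np.column_stack([np.ones_like(M), M, F, L, M * F, M * L, F * L, M * F * L])

# The eight conditions (C1-C8) of the 2 x 2 x 2 design.
M = np.array([-1, -1, -1, -1, 1, 1, 1, 1])  # motion: away (-1) / toward (+1)
F = np.array([-1, -1, 1, 1, -1, -1, 1, 1])  # form: back view (-1) / front view (+1)
L = np.array([-1, 1, -1, 1, -1, 1, -1, 1])  # location: near (-1) / far (+1)

# Mean responses (spikes/s) of cell 1 in conditions C1-C8 (Table 2).
y = np.array([20.6, 41.2, 1.5, 2.5, 0.5, 0.5, 3.0, 1.8])

X = design_matrix(M, F, L)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# After rounding, b matches Table 1, cell 1:
# 9.0, -7.5, -6.8, 2.6 (intercept, M, F, L); 7.7, -2.9, -2.6 (M*F, M*L, F*L); 2.3 (M*F*L)
print(np.round(b, 2))
```

Because the ±1 columns are mutually orthogonal, each coefficient is simply the dot product of its column with the responses divided by eight; this is also why the intercept equals the grand mean firing rate across conditions, as stated in the table note.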

Table 2


 Summary of the mean number of spikes per second for all individual cells in the eight experimental conditions

Cell  Away/Back/Near (C1)  Away/Back/Far (C2)  Away/Front/Near (C3)  Away/Front/Far (C4)  Toward/Back/Near (C5)  Toward/Back/Far (C6)  Toward/Front/Near (C7)  Toward/Front/Far (C8)
 1 20.6 41.2   1.5  2.5   0.5  0.5   3.0  1.8 
 2 18.2 27.8   1.5  3.0   1.0  1.0   2.4  2.0 
 3  0.3  1.0   0.0  0.4   4.3  6.5   8.4 10.8 
 4  0.3  0.3   0.7  1.7  13.7  7.3  60.3 21.7 
 5  6.9 29.7   3.6 13.6  12.0  4.4   8.0  4.0 
 6 21.8 24.3   8.8  8.3  16.8 23.8   3.5  4.3 
 7 21.5 23.8   6.0  4.3  28.2 37.2   2.3  1.5 
 8  2.0 5.0   0.3  1.7   0.0  0.0   0.8  0.0 
 9 16.2 14.1   4.0  5.4   5.7  5.2  13.5 16.4 
10 27.0 21.8  16.1 19.1  18.9 17.6  25.2 28.7 
11 19.5 10.8   1.2  0.4   1.0  3.8  18.5 35.3 
12 14.8  9.3   6.8  7.0   3.3  7.5  10.0 20.3 
13 38.0 22.8  27.3 21.1  20.8 23.0  24.7 29.2 
14 20.8 15.0  10.4 12.9   8.5 13.0  26.2 23.3 
15  9.2  5.0   2.9  7.0   5.8  7.8   7.5 12.0 
16  0.2 2.2   0.0  0.0   0.0  0.1   0.0  0.1 
17  1.5 3.4   0.6  0.2   0.3  0.4   0.5  0.1 
18  5.5 12.3   4.1  0.8   1.1  1.4   0.5  0.1 
19  0.3  0.2  12.8  3.3   1.8  4.7   0.5  0.2 
20  0.3  0.2  11.8  4.3   2.2  3.7   0.0  0.2 
21  0.3  0.2  25.3 21.0  23.3 26.3   1.3  0.0 
22  0.3  0.0   3.5  2.0  8.8  4.8   1.0  0.3 
23  0.0  0.2   2.5  1.3   2.3 2.8   0.0  0.0 
24  5.7  1.2  30.7 18.3  26.0 46.0   3.3  1.0 
25  1.3  0.2   4.0  3.0  14.0 15.5   0.3  0.4 
26 14.1 11.1   6.8  8.8  19.0 17.8  31.7 34.1 
27 17.3  9.0   4.0  1.5   5.0  2.5   6.0  6.0 
28 15.7 15.7   8.5 11.5   7.0  4.0   2.3  1.3 
29 26.3 16.5  11.3  3.5   9.0  5.0   9.5  7.3 
30 10.8  2.5   0.5  0.0   2.5  0.5   1.3  0.0 
31 15.0  5.8   1.0  0.3   1.5  1.5   2.8  3.3 

C1–C8 represent the eight conditions. For each cell, the maximal number of spikes is shown in bold and is underlined. The entries in other conditions that did not differ significantly from the maximal value are underlined (post-hoc Newman–Keuls, P < 0.05).

References

Aggleton JP, Passingham RE (1981) Stereotaxic surgery under X-ray guidance in the rhesus monkey, with special reference to the amygdala. Exp Brain Res 44:271–276.
Allison T, Puce A, McCarthy G (2000) Social perception from visual cues: role of the STS region. Trends Cogn Sci 4:267–278.
Ashbridge E, Perrett DI (1998) Generalising across object orientation and size. In: Perceptual constancy (Walsh V, Kulikowski J, eds), pp 192–209. Cambridge: Cambridge University Press.
Ashbridge E, Perrett DI, Oram MW, Jellema T (2000) Effect of image orientation and size on object recognition: responses of single units in the macaque monkey temporal cortex. Cogn Neuropsychol 17:13–34.
Baker CI, Keysers C, Jellema T, Perrett DI (2000) Coding of spatial position in the superior temporal sulcus of the macaque. Curr Psychol Lett Behav Brain Cogn 1:71–87.
Baker CI, Keysers C, Jellema T, Wicker B, Perrett DI (2001) Neuronal representation of disappearing and hidden objects in temporal cortex of the macaque. Exp Brain Res 140:375–381.
Boussaoud D, Ungerleider LG, Desimone R (1990) Pathways for motion analysis: cortical connections of the medial superior temporal and fundus of the superior temporal visual areas in the macaque. J Comp Neurol 296:462–495.
Bruce C, Desimone R, Gross CG (1981) Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J Neurophysiol 46:369–384.
Courtney SM, Ungerleider LG, Keil K, Haxby JV (1996) Object and spatial visual working memory activate separate neural systems in human cortex. Cereb Cortex 6:39–49.
Desimone R, Ungerleider LG (1989) Neural mechanisms of visual processing in monkeys. In: Handbook of neuropsychology (Boller F, Grafman J, eds), vol. 2, pp 267–299. Amsterdam: Elsevier.
Dobbins AC, Jeo RM, Fiser J, Allman JM (1998) Distance modulation of neural activity in the visual cortex. Science 281:552–555.
Emery NJ, Perrett DI (1994) Understanding the intentions of others from visual signals: neurophysiological evidence. Curr Psychol Cognit 13:683–694.
Epstein R, Kanwisher N (1998) A cortical representation of the local visual environment. Nature 392:598–601.
Epstein R, Harris A, Stanley D, Kanwisher N (1999) The parahippocampal place area: recognition, navigation, or encoding? Neuron 23:115–125.
Felleman DJ, Van Essen DC (1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1:1–47.
Goodale MA, Milner AD, Jakobson LS, Carey DP (1991) A neurological dissociation between perceiving objects and grasping them. Nature 349:154–156.
Gross CG, Rocha-Miranda CE, Bender DB (1972) Visual properties of neurons in inferotemporal cortex of the macaque. J Neurophysiol 35:96–111.
Haxby JV, Grady CL, Horwitz B, Ungerleider LG, Mishkin M, Carson RE, Herscovitch P, Schapiro MB, Rapoport SI (1991) Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proc Natl Acad Sci USA 88:1621–1625.
Hikosaka K, Iwai E, Saito H-A, Tanaka K (1988) Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. J Neurophysiol 60:1615–1637.
Jellema T, Perrett DI (1999) Coding of object position in the banks of the superior temporal sulcus of the macaque. Soc Neurosci Abstr 25:919.
Jellema T, Baker CI, Wicker B, Perrett DI (2000) Neural representation for the perception of the intentionality of actions. Brain Cogn 44:280–302.
Jellema T, Perrett DI (2002) Neural coding for visible and hidden objects. In: Attention and performance XIX, pp 356–380.
Jellema T, Baker CI, Oram MW, Perrett DI (2002) Cell populations in the banks of the superior temporal sulcus of the macaque and imitation. In: The imitative mind: development, evolution, and brain bases (Prinz W, Meltzoff A, eds), pp 267–290. Cambridge: Cambridge University Press.
Jellema T, Perrett DI (2003) Perceptual history influences neural responses to face and body postures. J Cogn Neurosci 15:961–971.
Karnath H-O (2001) New insights into the functions of the superior temporal cortex. Nat Rev Neurosci 2:568–576.
Kosslyn SM (1987) Seeing and imagining in the cerebral hemispheres: a computational approach. Psychol Rev 94:148–175.
Köhler S, Kapur S, Moscovitch M, Winocur G, Houle S (1995) Dissociation of pathways for object and spatial vision: a PET study in humans. Neuroreport 6:1865–1868.
Kovács G, Sáry G, Köteles K, Chadaide Z, Tompa T, Vogels R, Benedek G (2003) Effects of surface cues on macaque inferior temporal cortical responses. Cereb Cortex 13:178–188.
Livingstone M, Hubel D (1988) Segregation of form, color, movement and depth: anatomy, physiology and perception. Science 240:740–749.
Maguire EA, Frackowiak RSJ, Frith CD (1997) Recalling routes around London: activation of the right hippocampus in taxi drivers. J Neurosci 17:7103–7110.
Milner AD, Goodale MA (1995) The visual brain in action. Oxford: Oxford University Press.
Mistlin AJ, Perrett DI (1990) Visual and somatosensory processing in the macaque temporal cortex: the role of ‘expectation’. Exp Brain Res 82:437–450.
O’Mara SM (1995) Spatially-selective firing properties of hippocampal neurons in rodents and primates. Prog Neurobiol 45:253–274.
Optican LM, Richmond BJ (1987) Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J Neurophysiol 57:162–178.
Oram MW, Perrett DI (1994) Responses of anterior superior temporal polysensory (STPa) neurons to ‘biological motion’ stimuli. J Cogn Neurosci 6:99–116.
Oram MW, Perrett DI (1996) Integration of form and motion in the anterior superior temporal polysensory area (STPa) of the macaque monkey. J Neurophysiol 76:109–129.
Perrett DI, Rolls ET, Caan W (1982) Visual neurones responsive to faces in the monkey temporal cortex. Exp Brain Res 47:329–342.
Perrett DI, Smith PAJ, Potter DD, Mistlin AJ, Head AS, Milner AD, Jeeves MA (1984) Neurones responsive to faces in the temporal cortex: studies of functional organization, sensitivity to identity and relation to perception. Hum Neurobiol 3:197–208.
Perrett DI, Smith PAJ, Mistlin AJ, Chitty AJ, Head AS, Potter DD, Broennimann R, Milner AD, Jeeves MA (1985a) Visual analysis of body movements by neurons in the temporal cortex of the macaque monkey: a preliminary report. Behav Brain Res 16:153–170.
Perrett DI, Smith PAJ, Potter DD, Mistlin AJ, Head AS, Milner AD, Jeeves MA (1985b) Visual cells in the temporal cortex sensitive to face view and gaze direction. Proc R Soc Lond B Biol Sci 223:293–317.
Perrett DI, Mistlin AJ, Harries MH, Chitty AJ (1989a) Understanding the visual appearance and consequences of hand actions. In: Vision and action: the control of grasping (Goodale MA, ed). Norwood, NJ: Ablex.
Perrett DI, Harries MH, Bevan R, Thomas S, Benson PJ, Mistlin AJ, Chitty AJ, Hietanen JK, Ortega JE (1989b) Frameworks of analysis for the neural representation of animate objects and actions. J Exp Biol 146:87–113.
Perrett DI, Oram MW, Harries MH, Bevan R, Hietanen JK, Benson PJ, Thomas S (1991) Viewer-centred and object-centred coding of heads in the macaque temporal cortex. Exp Brain Res 86:159–173.
Perrett DI, Hietanen JK, Oram MW, Benson PJ (1992) Organization and functions of cells responsive to faces in the temporal cortex. Phil Trans R Soc Lond B 335:23–30.
Perrett DI, Oram MW (1998) Visual recognition based on temporal cortex cells: viewer-centred processing of pattern configuration. Z Naturforsch C 53:518–541.
Reynolds JH, Desimone R (1999) The role of neural mechanisms of attention in solving the binding problem. Neuron 24:19–29.
Rolls ET, Baylis GC (1986) Size and contrast have only small effects on the responses to faces of neurons in the cortex of the superior temporal sulcus of the monkey. Exp Brain Res 65:38–48.
Rolls ET, Robertson RG, Georges-François P (1997) Spatial view cells in the primate hippocampus. Eur J Neurosci 9:1789–1794.
Seltzer B, Pandya DN (1978) Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res 149:1–24.
Seltzer B, Pandya DN (1994) Parietal, temporal and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: a retrograde tracer study. J Comp Neurol 243:445–463.
Singer W, Gray CM (1995) Visual feature integration and the temporal correlation hypothesis. Annu Rev Neurosci 18:555–586.
Tamura R, Ono T, Fukuda M, Nakamura K (1992) Spatial responsiveness of monkey hippocampal neurons to various visual and auditory stimuli. Hippocampus 2:307–322.
Tanaka YZ, Koyama T, Mikami A (1999) Neurons in the temporal cortex changed their preferred direction of motion dependent on shape. Neuroreport 10:393–397.
Ungerleider LG, Mishkin M (1982) Two cortical visual systems. In: Analysis of visual behavior (Ingle DJ, Goodale MA, Mansfield RJW, eds), pp 549–586. Cambridge, MA: MIT Press.