Human and non-human primates are able to perceive three-dimensional structure from motion displays. Three-dimensional structure-from-motion (object-motion) displays were used to test the hypothesis that neurons in the anterior division of the superior temporal polysensory area (STPa) of monkeys can selectively respond to three-dimensional structure-from-motion. Monkeys performed a reaction time task that required the detection of a change in the fraction of structure in three-dimensional transparent sphere displays. Neurons were able to distinguish structured and unstructured three-dimensional optic flow. These cells could differentiate the change in structure-from-motion at stimulus presentation and when the animal was detecting the amount of structure in the display. Some of these neurons were also tuned for characteristics of the sphere stimuli. Cells were also tested with navigational motion and many were found to respond both to three-dimensional structure-from-motion and navigational motion. These results suggest that STPa neurons represent specific aspects of three-dimensional surface structure and that neurons within STPa contribute to the perception of three-dimensional structure-from-motion.
In 1909, Helmholtz demonstrated that humans can integrate motion information to create three-dimensional percepts (von Helmholtz, 1924). Psychophysicists defined the constructive nature of this task (Wallach et al., 1953) and computational studies have established many of the constraints underlying this perceptual ability (Ullman, 1979; Longuet-Higgins and Prazdny, 1980; Marr, 1982; Hoffman and Bennett, 1986). The ability to perceive structure-from-motion is remarkable as it involves the combination of information thought to be segregated into two separate visual streams (the ventral ‘what’ and dorsal ‘where’ pathways) (Mishkin et al., 1983).
A psychophysical task was developed to test whether monkeys had the same psychophysical ability to perceive structure-from-motion as humans. In that study, both species were tested with a structured, hollow three-dimensional sphere and a control ‘unstructured’ stimulus (Siegel and Andersen, 1988), and the two species were indeed similar across a broad range of stimulus parameters, indicating that the monkey was a valid model for the human percept.
Neurons that are sensitive to the rotation of objects in depth have been reported in MT (Bradley et al., 1998), MST (Saito et al., 1986; Tanaka et al., 1986) and the inferior parietal cortex (Sakata et al., 1986, 1994; Shikata et al., 1996). These neurons have been shown to be tuned to various aspects of three-dimensional structure-from-motion, and may form a first step in generating a representation of structure-from-motion; however, they have not been directly tested for their responses to changes in the fraction-of-structure. There are a number of studies showing that inferior temporal neurons respond to shape (Gross et al., 1972; Kayaert et al., 2003), but these shapes are not solely defined by motion. Functional magnetic resonance imaging (fMRI) studies in human and monkey subjects have described areas that are selective for aspects of three-dimensional structure-from-motion (Orban et al., 1999; Sereno et al., 2002; Vanduffel et al., 2002); however, these blood-flow-based studies cannot resolve the temporal details of single-neuron responses. Each of these areas may contain elements, or a complete representation, of three-dimensional structure-from-motion. Many of these areas converge upon the temporal lobe.
A likely neural candidate in the temporal lobe based upon these results and connectional data is the superior temporal polysensory area (STP) (Anderson and Siegel, 1999), which lies in the upper bank and floor of the superior temporal sulcus. STP is connected to both streams, in particular to regions rich in motion information (MST and 7a) and form information (TE) (Cusick et al., 1995). STP can be divided into two broad regions, an anterior division (STPa) and a posterior division (STPp). STPa receives input from both the dorsal (motion/spatial) and ventral (object) visual processing streams (Baizer et al., 1991; Boussaoud et al., 1990; Cusick et al., 1995; Seltzer and Pandya, 1984). STPa neurons respond well to moving stimuli and are selective to different types of motion including biological motion (Bruce et al., 1981; Oram et al., 1993; Wachsmuth et al., 1994). Recent fMRI studies indicate that areas in or near STP have blood flow that depends on the structure in a motion display (Orban et al., 1999; Sereno et al., 2002; Vanduffel et al., 2002). Selectivity for other complex global motion patterns (e.g., navigational optic flow) has also been found in STPa using single unit recording methods (Anderson and Siegel, 1999; Bruce et al., 1981).
In the current study, neurons were found in STPa that had the characteristics expected if these cells represent three-dimensional structure-from-motion. Indeed the change in neuronal activity of some of these neurons is directly correlated with the monkey's performance of the behavioral task. Portions of these results have appeared in abstract form (Anderson and Siegel, 1995, 1997).
Materials and Methods
Neurons were recorded in STPa from three hemispheres of two male Rhesus monkeys that performed a reaction time task requiring the detection of three-dimensional structure-from-motion as compared to a control unstructured stimulus (Siegel and Andersen, 1988; 1990). These two stimuli were expressly constructed to have exactly the same local and global density of points, and the same distribution of local motion components. The only difference between the two displays was the fraction of structure (FOS) (Siegel and Andersen, 1988; 1990), which indicates the spatial organization of the motion components that define the three-dimensional shape (Longuet-Higgins and Prazdny, 1980). When the FOS was 1, all the motion trajectories were in the correct position, giving rise to a three-dimensional hollow sphere; a FOS = 0 indicated that the motion trajectories were randomly shuffled, yielding the ‘unstructured’ display. Psychophysical studies have shown that monkeys and humans detect changes in these stimuli similarly (Siegel and Andersen, 1988; 1990), providing a foundation for exploring the properties of neurons in monkey in order to understand the neuronal basis of this perception in humans. Up to four different sphere displays were used to test each cell. The four spheres differed in diameter (10 and 20°) and axis of rotation (vertical and horizontal).
Two male Rhesus monkeys (4–6 kg) were trained to perform a reaction time task while fixating a central 0.3° point as described elsewhere (Siegel and Andersen, 1988, 1990; Siegel and Read, 1997). The monkey pulled a lever at the onset of the fixation point. Two seconds later, the visual stimulus came on. A change in the structure of the display occurred randomly between 3500 and 6000 ms after the fixation point onset. The monkey needed to release the key within a reaction time window of 150–800 ms for a juice reward. The monkey's head was fixed and one eye's position was monitored with an ISCAN infrared tracker to be within 1°. Saccades were not permitted. The monkeys correctly performed this task for 80–100% of the trials. Typically 8–12 trials were collected for each stimulus condition.
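The timing rule of the task can be sketched in a few lines. This is a minimal illustration, assuming all times are in milliseconds on a common clock and that the 150–800 ms window is inclusive at both ends; the function name and the outcome labels are hypothetical and do not come from the original study.

```python
# Hypothetical helper: classify one trial from the timing rule in the text.
RT_MIN_MS = 150  # earliest rewarded key release after the structure change
RT_MAX_MS = 800  # latest rewarded key release after the structure change


def classify_trial(change_ms, release_ms):
    """Return 'correct', 'early' or 'late' for a single trial.

    change_ms  -- time of the change in structure
    release_ms -- time of key release on the same clock, or None if the
                  key was never released (treated here as 'late')
    """
    if release_ms is None:
        return "late"
    reaction_time = release_ms - change_ms
    if reaction_time < RT_MIN_MS:
        return "early"
    if reaction_time > RT_MAX_MS:
        return "late"
    return "correct"
```

Only trials labelled 'correct' under such a rule would enter the analyses described later in the Methods.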
The visual stimuli (Fig. 1) all consisted of 128 points (0.1°) that had a limited lifetime of 532 ms (Morgan and Ward, 1980; Siegel and Andersen, 1988; 1990). The spheres rotated at an angular velocity of 60°/s around an axis that was in the plane of the display. Particular care was taken to ensure that the point density was kept constant to avoid density-based form cues for three-dimensional shape (Ratzlaff and Siegel, 1990; Anderson and Siegel, 1999). Receptive fields are typically over 40° in size (Anderson and Siegel, 1999) and include the fovea. Therefore, spheres were chosen to be well within the receptive field with a diameter of 10° or 20° of visual angle and centered on the fixation point (Fig. 1a). An orthographic projection of the sphere resulted in fast motions at the center and slower ones at the edges (Fig. 1b). Rotation was about the vertical or horizontal axis. These manipulations of diameter and axis of rotation led to the four different sphere characteristics used in this study. The displays were ‘unstructured’ by randomly displacing entire motion trajectories within a square window whose width was equal to the diameter of the motion display (Fig. 1c), so that the same distribution of motions was retained. A completely structured display had a FOS of 1; an unstructured display had a FOS of 0.
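The geometry of the two extreme displays can be illustrated with a short sketch. For a sphere rotating about the vertical axis at angular velocity ω, a surface point at depth z has an orthographic image velocity of ωz, so speed is greatest at the center of the image (|z| equals the radius) and falls to zero at the edge (z = 0). The code below is a simplified sketch under stated assumptions, not the authors' stimulus code: points are drawn uniformly in the image disc (one way of holding density constant), only the vertical axis and the FOS = 1 and FOS = 0 extremes are covered, the limited point lifetime is not modeled, and the function names are hypothetical.

```python
import math
import random

OMEGA_DEG_PER_S = 60.0  # angular velocity given in the text
N_POINTS = 128          # number of points given in the text


def structured_sphere(diameter_deg, n=N_POINTS, rng=random):
    """Return n tuples (x, y, vx) for a sphere rotating about the vertical axis.

    x and y are image positions in degrees of visual angle; vx is the
    horizontal image velocity in degrees of visual angle per second.
    """
    radius = diameter_deg / 2.0
    omega = math.radians(OMEGA_DEG_PER_S)
    points = []
    while len(points) < n:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        r2 = x * x + y * y
        if r2 > radius * radius:
            continue  # rejection sampling: keep only points inside the disc
        # Place the point on the front or back surface (transparent sphere).
        z = math.sqrt(radius * radius - r2) * rng.choice((-1.0, 1.0))
        points.append((x, y, omega * z))  # orthographic velocity is omega * z
    return points


def unstructure(points, diameter_deg, rng=random):
    """FOS = 0: keep every velocity, re-draw every position in a square window."""
    half = diameter_deg / 2.0
    return [(rng.uniform(-half, half), rng.uniform(-half, half), vx)
            for (_, _, vx) in points]
```

Because `unstructure` only moves trajectories, the two displays share exactly the same set of velocities; what differs is where in the image each velocity appears.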
Navigational Optic Flow
Neurons in STPa have been shown to be selective to navigational optic flow (Fig. 1d–g). The question arises as to whether the STPa neurons selective to three-dimensional structure-from-motion also respond to the navigational optic flow. Thus cells were also studied with the navigational optic flow. These optic flow data are presented solely for comparison and are fully discussed in an earlier publication (Anderson and Siegel, 1999). The navigational displays had the same number of points, and the same point life. The points rotated at 60°/s. Radial expansion and compression displays were constructed to have exactly the same speed distributions as the planar rotations by using the speed profiles from the planar rotation stimuli. The navigational stimuli were 40° in diameter.
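One way to see how radial and rotational flow can share a speed distribution is to note that a planar rotation at angular velocity ω gives each point a speed of ωr at eccentricity r; assigning that same speed along the radial direction yields expansion or compression with an identical speed profile. The sketch below illustrates this idea only. It is an assumption about how the matching could be done, the function name and the flow labels are hypothetical, and positions are taken relative to the center of the display.

```python
import math

OMEGA = math.radians(60.0)  # 60 deg/s from the text, in radians per second


def flow_velocity(x, y, kind):
    """Return the image velocity (vx, vy) at position (x, y).

    Every flow type has speed OMEGA * r at eccentricity r, so only the
    direction of motion differs between the four displays.
    """
    if kind == "rotation_ccw":
        return (-OMEGA * y, OMEGA * x)
    if kind == "rotation_cw":
        return (OMEGA * y, -OMEGA * x)
    if kind == "expansion":
        return (OMEGA * x, OMEGA * y)
    if kind == "compression":
        return (-OMEGA * x, -OMEGA * y)
    raise ValueError("unknown flow type: %r" % (kind,))
```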
The stimuli were designed to have changes well above detectable thresholds. The ease of the task ensured that the task difficulty was minimized across all stimulus conditions and that “task” difficulty could not account for differences in response. This experimental design precluded single trial error analysis as very few incorrect trials were observed.
Electrophysiology, Anatomical Methods and Statistical Analysis
The monkeys were implanted with a cap made of Smith+Nephew Palacos orthopedic cement and Synthes screws with a pedestal to fix the head. A chamber was implanted so that penetrations could be made in the frontal plane. Single units were recorded using standard methods (Anderson and Siegel, 1999) with an interspike interval precision of 0.1 ms. Chamber placements and penetrations were guided using MRIs taken prior to the study and confirmed using radiography of the electrodes in situ. Electrolytic lesions were made at the end of the recording and the recording sites were verified histologically (Anderson and Siegel, 1999).
Peri-stimulus time histograms were computed from the correct behavioral trials. Typically 8–12 trials were averaged and a 25 ms bin width was used. The responses of the neurons were quantified by measuring the firing rate for 500 ms before and after stimulus onset. When the firing rate at stimulus change was evaluated, the firing rate for 500 ms before the change was compared to the rate for the time following the change up to when the key was released, ∼350 ms (Siegel and Read, 1997). This ensured behavioral control during all phases of the task. The firing rate was expressed in Hertz and the trial-by-trial data were subjected to a two-way analysis of variance as described elsewhere (Anderson and Siegel, 1999). A neuron was defined as selective if it significantly responded differentially to at least one of the stimuli within a given set of stimuli using a two-way analysis of variance (P < 0.05) (Siegel and Read, 1997; Anderson and Siegel, 1999). Sensitive cells had a significant effect of the stimulus onset (P < 0.05), but did not have a significant dependence on the type of stimulus.
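The histogram and window measures described above can be sketched as follows. This is a minimal illustration rather than the original analysis code: it assumes spike times are stored in milliseconds relative to the aligning event (stimulus onset or stimulus change), and the function names are hypothetical.

```python
def psth(trials, t_start_ms, t_stop_ms, bin_ms=25):
    """Trial-averaged peri-stimulus time histogram in Hz.

    trials -- list of trials, each a list of spike times in ms relative
              to the aligning event
    """
    n_bins = int((t_stop_ms - t_start_ms) // bin_ms)
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if t_start_ms <= t < t_start_ms + n_bins * bin_ms:
                counts[int((t - t_start_ms) // bin_ms)] += 1
    # Convert spike counts per bin to spikes per second, averaged over trials.
    scale = 1000.0 / (bin_ms * len(trials))
    return [c * scale for c in counts]


def window_rate(spikes, start_ms, stop_ms):
    """Firing rate in Hz for one trial within [start_ms, stop_ms)."""
    n_spikes = sum(1 for t in spikes if start_ms <= t < stop_ms)
    return 1000.0 * n_spikes / (stop_ms - start_ms)
```

For the onset comparison, `window_rate(spikes, -500, 0)` and `window_rate(spikes, 0, 500)` would give the per-trial rates before and after stimulus onset; for the change comparison, the second window would end at the key release on that trial.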
The neurons described in this study were drawn from a population partly described previously (Anderson and Siegel, 1999). All procedures were approved by the Rutgers University Animal Institutional Review Board and were in accordance with the NIH Guidelines on the Care and Use of Animals in Research.
Of 464 visual neurons tested, 266 (57%) had a significant response to the onset of the structured sphere stimuli (Fig. 2a), with 112 having responses that were selective for the size and/or axis of rotation. These significant responses could either be increases or decreases in firing rate. This initial onset response could be the result of the three-dimensional structure derived from motion or it could be due to the simplest qualities of the stimuli (e.g. the change in luminance at onset). The presentation of the unstructured control motion stimulus eliminates the latter explanation. The cell in Figure 2 did not have a significant response to the onset of unstructured motion display (Fig. 2b) for which the motion components were exactly the same but the percept of a hollow sphere is lost; the only difference between the two displays was the spatial distribution of speeds which defined the sphere. The response to the structured motion and the dependence on the size of the display suggests the neuron was selective for some aspects of the three-dimensional structure of the display.
Response to Sphere Stimuli at Stimulus Change
A direct assessment of the neurons' response to changes in the FOS within each trial was obtained by evaluating the response at the time the stimulus changed from structured to unstructured motion or vice versa (Fig. 3). At this time, not only is the animal attending to the stimulus, he is also in the process of detecting the change in the FOS of the stimulus. Cells were found that did not respond to the initial onset of the display, but did respond significantly when the FOS in the display changed (Fig. 3a versus b). The increase in activity preceded key release (Fig. 3c). This cell's response is reasonably difficult to explain on trivial grounds as its response cannot be attributed to simple characteristics of the stimuli such as directional selectivity, number of points and change in luminance. The only characteristic of the display that changes is the spatial distribution of speeds across the display. When the stimulus is unstructured, there is a bounded random distribution; when the display is structured, there are faster speeds in the center and slower ones at the edge.
Of 464 neurons tested with a reduction in the FOS (structured to unstructured), 70 (15%) responded significantly to the change. These responses could not be attributed to the loss of behavioral control as these changes preceded the animal's release of the key and juice reward (Fig. 3c). Cells such as these responded to the subtle changes in the spatial distribution of speeds across the display. It is this precise event — the change in the structured motion trajectories across the display — that defines the difference between the structured hollow sphere and its unstructured control as shown by computational studies (Longuet-Higgins and Prazdny, 1980). Indeed, it is this same spatial ordering of motion speeds across the display that has been accepted as one definition of the psychophysical ability to perceive structure-from-motion (Siegel and Andersen, 1988; Vaina, 1994; Logothetis et al., 1995). Other neurons were found that responded to the transition from the unstructured control display to the structured hollow sphere (21% of 92 cells), with the majority of these showing an increased firing rate.
Coincident with the change in fraction of structure, the monkey is generating motor planning signals for the ensuing key release. However, motor planning alone cannot explain the change in neuronal firing rate, because cells were also selective to the stimulus characteristics as demonstrated by differential responses to the four types of displays (two diameters by two axes of rotation). Of the 70 cells that showed a significant response to a decrease in structure-from-motion, 49% were selective for the stimulus characteristic of size or axis of rotation; of the 18 cells firing for an increase in fraction of structure, 53% showed selectivity for the stimulus characteristics of size or rotation axis. If the cells were only encoding the motor planning signals, then the responses would not depend on the stimulus characteristics. These data lead to the conclusion that STPa neurons respond to the change in fraction-of-structure in a manner expected based upon psychophysical studies (Siegel and Andersen, 1988). Further, the fact that the neural activity associated with the transition in the fraction of structure occurs prior to the behavioral response indicates that the activity of these cells can play a role in the perception of structure-from-motion.
One possible description of these cells is that they are only encoding transitions in the fraction-of-structure. However, this does not appear to be the case for two reasons. First, many of these cells are tuned to the characteristics of the motion stimuli as shown by the significant effect of the size and/or orientation of the sphere stimuli. A second analysis can address this point directly. Eighty-one of the total cells tested were examined with both the ‘unstructured to structured’ transition and the ‘structured to unstructured’ transition using the analysis of variance described above. Of this group of neurons, 50 (62%) did not respond significantly to either transition, 14 (17%) responded significantly only to a decrease in the fraction-of-structure, and 13 (16%) responded significantly only to the increase in the fraction-of-structure. Only four neurons responded to both experimental runs, indicating a general sensitivity to the transition in structure-from-motion in both directions; of these, only two (2.5%) were similarly tuned to the size and direction characteristics of the sphere. The great majority of responsive neurons (33% versus 2.5%) were tuned to either an increase or a decrease in the fraction-of-structure, indicating that these neurons were not tuned solely to indicate transitions in structure-from-motion regardless of the underlying stimulus characteristics.
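The percentages in this comparison follow directly from the reported cell counts, as the short check below shows. The counts are taken from the text; the variable names are chosen here for illustration.

```python
# Cell counts reported in the text for the 81 neurons tested with both transitions.
counts = {
    "neither": 50,
    "decrease_only": 14,
    "increase_only": 13,
    "both": 4,
}
both_similarly_tuned = 2  # subset of the 'both' group, from the text

total = sum(counts.values())
percent = {name: round(100.0 * n / total, 1) for name, n in counts.items()}

# Neurons responding to exactly one direction of change.
one_direction = counts["decrease_only"] + counts["increase_only"]
one_direction_percent = round(100.0 * one_direction / total, 1)
similarly_tuned_percent = round(100.0 * both_similarly_tuned / total, 1)

print(total, percent, one_direction_percent, similarly_tuned_percent)
```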
Response to Sphere Stimuli at Onset
As an additional measure of whether the cells could differentiate between structured and unstructured three-dimensional structure, comparisons were made across two experimental runs for individual cells. For example, the responses to structured motion onsets and unstructured motion onsets were compared. The hypothesis of a common component of the response attributable to the simple increase in luminance at onset could be tested by an analysis of variance. Similarly, if the comparison were made when the stimulus structure changed, the common premotor components would be discernible. As the reaction time task had a randomized variable delay to the change in fraction-of-structure, there were only two visual events in the task across two experimental runs that could be matched: stimulus onset and stimulus change.
The response to the onset of the structured motion displays was directly compared with the response to the onset of the ‘unstructured’ motion control displays. The comparison was performed using a two-way ANOVA with one independent variable corresponding to the FOS of the display and the other to the characteristics of the sphere (two stimulus sizes by two axes of rotation). Thus the response of a cell with a significant effect of the FOS cannot be explained by simple effect of luminance or the presence of motion. This analysis was performed for the 90 cells that were tested with both structured and unstructured motion displays. (The two cells that were only tested with unstructured to structured motion were not included in this analysis; the 374 cells that were only tested with the structured to unstructured motion were also excluded.)
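The structure of this comparison can be sketched as a balanced two-way analysis of variance with one factor for the FOS and one for the sphere type. The sketch below is an illustration under stated assumptions, not the authors' procedure (which is described in Anderson and Siegel, 1999): it requires the same number of trials in every condition, whereas the recordings typically had 8–12 trials per condition, and it returns only F statistics. P values would come from the F distribution (for example, the survival function of `scipy.stats.f`), which is left out to keep the example dependency-free. The function name and dictionary keys are hypothetical.

```python
def two_way_anova_f(cells):
    """F statistics for a balanced, fully crossed two-way design.

    cells -- dict mapping (a_level, b_level) to a list of per-trial firing
             rates, e.g. a = FOS (0 or 1) and b = sphere type
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    na, nb = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))
    if len(cells) != na * nb or any(len(v) != n for v in cells.values()) or n < 2:
        raise ValueError("balanced, fully crossed design with n >= 2 required")

    def mean(values):
        return sum(values) / len(values)

    grand = mean([x for v in cells.values() for x in v])
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    cell_mean = {key: mean(v) for key, v in cells.items()}

    # Sums of squares for the two main effects, the interaction and the error.
    ss_a = nb * n * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = na * n * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)

    df_err = na * nb * (n - 1)
    ms_err = ss_err / df_err  # must be non-zero for the F ratios to exist
    return {
        "F_A": (ss_a / (na - 1)) / ms_err,
        "F_B": (ss_b / (nb - 1)) / ms_err,
        "F_AB": (ss_ab / ((na - 1) * (nb - 1))) / ms_err,
        "df_error": df_err,
    }
```

In this framing, a significant main effect of the FOS factor corresponds to the structured versus unstructured distinction, and a significant effect of the sphere-type factor (or an interaction) corresponds to tuning for size or axis of rotation.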
A comparison of the response at onset in Figure 2a,b indicates that this cell differentially responded to the onset of structured motion versus the onset of unstructured motion. Thirty-three cells of the 90 tested (37%) were able to distinguish the structured from the unstructured motion at stimulus onset (Fig. 4a, P < 0.05 USM versus SM). These cells appear to be extracting the three-dimensional structure from the motion display at onset.
One-quarter of these putative structure-from-motion cells were able to provide information about the characteristics of the stimuli in that the diameter and/or axis of rotation had a significant effect on firing rate (Fig. 3a; P < 0.05 for USM vs. SM, tuned). The other 57 cells of the 90 tested (63%) were unable to distinguish unstructured from structured motion (P > 0.05 for USM versus SM) at stimulus onset. The responses of this latter group of cells are most simply explained as a response to lower-order characteristics of the stimuli (e.g. increase in luminance or the presence of translation motion). Thus either the effect of the luminance or the translation motion may dominate the sensitivity to fraction-of-structure. Alternatively, there is no real selectivity for structure-from-motion for these 57 cells at onset.
The premotor signals do not appear to play a substantial role later in the trial, when the animal is detecting the change in the fraction-of-structure. At that time, 44 of the 90 cells (49%) had a significant response to the change in the structured and unstructured displays (Fig. 4b, P < 0.05 SM versus USM). Half of these FOS selective cells were also encoding information about the characteristics of the sphere (Fig. 3b, P < 0.05 SM versus USM, tuned).
Both of these population measures indicated an increase in the number of selective responses at the time the stimulus changed. A larger percentage of the 90 neurons tested were selective for the SM versus USM comparison (44 versus 33 cells; change versus onset) and more were tuned to the stimulus characteristics at the change (22 versus 9 cells). This change in selectivity as the task progressed suggests that the attentional state of the animal could alter the structure-from-motion selectivity of the neurons, as has been demonstrated for the inferior parietal lobule (Siegel and Read, 1997; Phinney and Siegel, 2000).
Thus we have shown that neurons respond selectively to fraction-of-structure for a rotating sphere. This selectivity to the fraction-of-structure suggests that these cells are involved in the perceptual ability to extract three-dimensional structure-from-motion. An alternative explanation for the response of these neurons is that they simply respond to the gradient of speeds across the display. For example, the gradient of speeds for the rotating sphere is faster speeds in the middle and very slow speeds at the edges. In contrast, the gradient of speeds for an unstructured sphere is random with the range of speeds limited to that contained within a structured motion display. However, it is precisely this gradient in the structured motion displays that defines the three-dimensional depth.
Dependence on Size and Axis of Rotation
The effects of size and axis of rotation on the response of these putative three-dimensional structure-from-motion cells were evaluated with Bonferroni post hoc tests (P < 0.05). Half of the putative three-dimensional structure-from-motion neurons had no selectivity to the characteristics of the sphere. In one sense these could be considered cells that solely represent ‘sphereness’ from motion. The other half (22/44) of these putative three-dimensional cells showed responses that were modulated by characteristics of the sphere stimuli. Eleven neurons were modulated by the axis of rotation of the spheres, some by the size (n = 5), and others by both size and axis of rotation (n = 6). This representation of both types of neurons within STPa suggests that there could be a hierarchical arrangement for the processing of higher-order motion components, where the size- and orientation-dependent cells are combined to form cells independent of these characteristics.
Comparison of Response to Spheres and Navigational Optic Flow
STPa neurons are also known to be selective for optic flow patterns derived from egocentric motion (Anderson and Siegel, 1999). Ninety neurons were tested with both the onset of SM and USM to determine which responded exclusively to three-dimensional structure-from-motion or were more broadly tuned for navigational optic flow (Anderson and Siegel, 1999). Radial and rotating optic flow fields were used to test for navigation flow selectivity (Fig. 5). This cell responded only to the three-dimensional stimuli and not to the optic flow stimuli. Cells were considered to respond to optic flow if they had a significant response relative to baseline at stimulus onset. Forty-three percent of the 90 cells tested were found to respond to optic flow alone and, as described earlier (Anderson and Siegel, 1999), showed a preponderance of selectivity for flow derived from forward egomotion (radial expansion). At stimulus onset, 37% of the 90 neurons were selective to the FOS in the three-dimensional motion displays. A substantial proportion (83%) of these three-dimensional selective cells were also selective to the onset of optic flow. Although there was a greater percentage of cells selective to the three-dimensional FOS when the stimulus changed (49%), the percentage of these cells that were also selective to the navigational optic flow at stimulus change remained about the same at 80%. Thus there seem to be two major classes of neurons within STPa: cells that respond to navigationally based optic flow exclusively and cells that combine this tuning with three-dimensional structure-from-motion selectivity. Neurons that respond exclusively to three-dimensional structure-from-motion are less common (<10% of the cells).
It is possible that these two populations could be correlated with two subregions of STPa suggested from anatomical criteria (Cusick et al., 1995); however, it was not possible to determine if there was any spatial segregation of these neurons due to the long-distance penetrations and chronic nature of these recordings.
A Definition of Structure-from-motion
Structured and unstructured motion displays were compared as criteria to define whether neurons were tuned to represent three-dimensional structure-from-motion. These criteria are based upon the acceptance of the psychophysical approach of comparing structured and unstructured motion as a measure of perception of structure-from-motion (Siegel and Andersen, 1988, 1990). Not only has this approach been used to examine the responses of parietal neurons (Siegel and Read, 1997), it has been the foundation of fMRI studies in monkeys and humans examining the expression of structure-from-motion selectivity across multiple visual areas (Sereno et al., 2002). The general idea is quite similar to the approach of using varying coherence in translation stimuli to alter the monkey's perception of motion (Britten et al., 1993), or of scrambling elements of a face display (Desimone et al., 1984): a controlled change in the stimulus is selected to directly test an area's involvement in a perceptual process. The other guidance in the selection of these criteria is the clarification and definition by computational neuroscience as to what components of structure-from-motion are crucial to computing three-dimensional structure-from-motion (Ullman, 1979; Marr, 1982; Hoffman and Bennett, 1986).
We next consider the characteristics of an ‘ideal’ three-dimensional structure-from-motion neuron. At one extreme would be a cell that only responded if an object had a three-dimensional shape defined by motion. This cell would, in principle, not respond to the object's identity, its size, its color or any other characteristic. One might even say it was a ‘grandmother’ cell for a specific visual perceptual property (Konorski, 1967; Barlow, 1972). However, given the advances in our understanding of visual representations in cortex, it is highly unlikely that such a cell could exist, nor would it be possible to exhaustively test it (Van Essen, 1985). The more realistic point of view, both conceptually and pragmatically, is to expect experiments to demonstrate that a putative three-dimensional structure-from-motion neuron has many of the appropriate characteristics, as demonstrated by observing the responses to a series of stimuli grounded in psychophysical experimentation. That is the approach taken here.
It is also realistic to expect that such a structure-from-motion neuron could be sensitive to other characteristics. One need only consider that individual MT/V5 neurons, perhaps the prototype of an exquisitely tuned neuron (i.e. to motion), also have selectivity to many other related visual perceptual attributes. They are selective to disparity, wavelength and context (Albright and Stoner, 2002). Hence it was surprising to find a population of ‘ideal’ neurons that only responded to the three-dimensional structure defined by motion independent of other visual (and perhaps non-visual) events. Other untested stimulus characteristics may modulate these cells. In summary, a putative structure-from-motion neuron would be expected to have a response that distinguished between two different levels of fraction-of-structure and to have activity temporally correlated with the perceptual event of detecting changes in fraction-of-structure.
A substantial proportion of neurons were found in STPa that passed these stringent standards. These neurons are exquisitely sensitive to the speed gradients across the receptive field, which is precisely what defines three-dimensional structure-from-motion selectivity (Longuet-Higgins and Prazdny, 1980; Siegel and Andersen, 1988). Thus it is concluded that STPa neurons can represent three dimensions constructed from two-dimensional motion information. Whether or not these cells represent different shapes remains to be seen; testing with stimuli that have matched two-dimensional contours and different three-dimensional motion shapes could address this issue (Phinney and Siegel, 1999).
Possible Sources of Three-dimensional Structure-from-motion Selectivity in STPa
The analysis of three-dimensional structure-from-motion may arise in STPa or may be carried from other cortical regions. Area MT has neurons that appear to differentiate the front and back of transparent objects as a function of the monkey's interpretation of the visual image (Bradley et al., 1998). These cells could form a first step in the analysis of three-dimensional structure-from-motion; however, given the small receptive field size of MT neurons, it is not clear how they could directly analyze the difference between structured and unstructured motion as seen in STPa neurons. The contextual interactions from beyond the classical receptive field (Allman et al., 1985; Albright and Stoner, 2002) may play a role in the initial processing of three-dimensional structure-from-motion. However, to date MT neurons have not been demonstrated to distinguish structured and unstructured three-dimensional motion. MST contains groups of neurons sensitive to both optic flow (MSTd) and the movement of objects in depth (MSTl) (Desimone and Ungerleider, 1986; Saito et al., 1986; Tanaka and Saito, 1989; Duffy and Wurtz, 1991; Orban et al., 1992; Tanaka et al., 1993; Graziano et al., 1994), making it a putative source of three-dimensional structure-from-motion selectivity. Other regions that project to STP, such as the caudal intraparietal sulcus (Shikata et al., 1996), or area 7a (Sakata et al., 1994), could be a source of the structure-from-motion selectivity. This possibility is difficult to assess since the three-dimensional structured and unstructured displays have not been tested in these areas with single unit recordings. The fMRI studies in monkeys and humans suggest a constellation of areas involved with the analysis of three-dimensional structure-from-motion, including STPa (Orban et al., 1999; Sereno et al., 2002; Vanduffel et al., 2002).
The alternative is that the selectivity to the structure-from-motion of the three-dimensional sphere could arise from the convergence of signals from individual neurons sensitive to direction, transparency and/or motion in depth, perhaps as found in MT or MST. These signals could then be combined locally to give rise to the representation of the three-dimensional structure of an object rotating in depth. A similar mechanism has been proposed for the selectivity of STPa cells to biological motion (Oram et al., 1993; Wachsmuth et al., 1994). In addition, there are static representations of form in nearby regions [TE in the lower bank of superior temporal sulcus (STS) and inferotemporal cortex (IT)] (Janssen et al., 2001; Kayaert et al., 2003), which could contribute to the formation of the representation of structure-from-motion.
Invariance Properties in STPa
Over half of the cells found to be selective for the three-dimensional motion displays also encoded the size and/or axis-of-rotation of the sphere displays, while the others were size and/or axis-of-rotation invariant. Size invariance has been shown for shape-selective neurons in the IT cortex although changes in the size of the preferred shape can alter the absolute firing rate of IT neurons (Schwartz et al., 1983). The effects of size on the response of neurons in STPa may be similar, in that size affects the strength of the response, but not the overall selectivity of the neuron. In addition, the effects of the axis-of-rotation (or orientation) of the spheres on the firing rate of STPa neurons are similar to view-dependent IT neurons (Logothetis et al., 1995). The finding that other STPa neurons respond equivalently regardless of the orientation of the sphere suggests that these latter responses are independent of the three-dimensional viewpoint of the observer relative to the object. Similar view-independent properties have been shown for face- and object-selective neurons in IT (Desimone et al., 1984; Perrett et al., 1985). STPa is connected with IT cortex via the fundus and dorsal bank of the STS (Jones and Powell, 1970; Morel and Bullier, 1990; Cusick et al., 1995). Therefore signals from IT may contribute to the ability of STPa neurons to extract viewer-independent three-dimensional structural information from motion stimuli. Furthermore, STPa may be combining inputs from motion areas in STS with those from IT in order to encode the structure of moving objects. The finding of a more equal distribution of invariant and non-invariant responses to the size and orientation of a motion-defined object in STPa is consistent with its more numerous and direct connections with areas in the dorsal stream than with those in the ventral stream.
Utility of STPa Neurons for Behavior
Many cells in STPa responded to changes in the structure-from-motion stimuli by increasing or decreasing their firing when the stimuli became unstructured. STPa is interconnected with inferior parietal areas thought to be involved in the localization of stimuli for intended movements, including reaching, grasping and object manipulation as well as sensorimotor transformations (Andersen et al., 1990; Baizer et al., 1991; Goodale and Milner, 1992). The three-dimensional structure-from-motion selective neurons often also responded to navigational optic flow, indicating that STPa plays a role in encoding three-dimensional object movement in the environment relative to an observer (Zemel and Sejnowski, 1998). The confluence of these properties at this particular apex of the ‘what’ and ‘where’ pathways supports the hypothesis that STPa serves a crucial function in the integration of spatial and form information and its transfer onto motor planning regions to guide or plan grasping and other reaching movements to moving objects in the environment.
Dr Charles Schroeder of Albert Einstein College of Medicine and Drs Martin Gizzi and Lawrence Tannenbaum at JFK Memorial Hospital/NJ Neuroscience Institute performed the magnetic resonance image scans. Dr Cassandra Cusick's (Tulane University) performance of the histology on one of the brains is gratefully acknowledged. Supported by NIH/NEI EY09223, ONR N00014-93-1-0334, NIH/NCRR 1S10RR12873 and NSF 9874495.