We have recently demonstrated using fMRI that a region within the human lateral occipital complex (LOC) is activated by objects when either seen or touched. We term this cortical region LOtv, for lateral occipital tactile–visual region. We report here that LOtv voxels tend to be located in sub-regions of LOC that show a preference for graspable visual objects over faces or houses. We further examine the nature of object representation in LOtv by studying its response to stimuli in three modalities: auditory, somatosensory and visual. If objects activate LOtv irrespective of the modality used, the activation is likely to reflect a highly abstract representation. In contrast, activation specific to vision and touch may reflect attributes shared exclusively by these two senses. We show here that while object activation is robust in both the visual and the somatosensory modalities, auditory signals do not evoke substantial responses in this region. The lack of auditory activation in LOtv cannot be explained by differences in task performance or by ineffective auditory stimulation. Unlike vision and touch, auditory information contributes little to the recovery of the precise shape of objects. We therefore suggest that LOtv is involved in recovering the geometrical shape of objects.
A central theme in sensory neurophysiology is that information processing in primary sensory areas is strictly modality specific. Moreover, a principle of division of labor within each modality leads to functional specialization of the various cortical areas [e.g. in the visual system (Zeki, 1978; Ungerleider and Mishkin, 1982; Goodale and Milner, 1992)]. But the different features of an object (3D motion, shape, texture, color, etc.), which are analyzed separately, must somehow be bound together to generate a coherent percept. This binding problem also exists at the multimodal level, but its solution is poorly understood. One approach suggests that the parallel pathways converge in specific multimodal brain regions that integrate information from multiple modalities (Zangaladze et al., 1999; Zhou and Fuster, 2000; Shimojo and Shams, 2001; James et al., 2002). We have recently shown that a region within the object-related lateral occipital complex (LOC) (Malach et al., 1995; Tootell et al., 1996; Grill-Spector et al., 1999, 2001; Kourtzi and Kanwisher, 2001) is activated by both visual and tactile inputs, with a preference for objects over scrambled versions of the same objects or over textures (Amedi et al., 2001). We term this cortical region the lateral occipital tactile–visual region (LOtv).
This finding raises an essential question. Is this region strictly bimodal, activated only by visual and somatosensory inputs, or is it multimodal, activated by all sensory modalities provided that the information is relevant for object recognition? Consider identifying a cellular phone using the different senses. Vision and touch directly sample the geometric shape of the phone, whereas the sound it makes bears an entirely arbitrary relation to its shape. If objects activate this region irrespective of the modality used to recognize them, the activation is more likely to reflect an association between the input and the object's identity. If, on the other hand, the activation is specific to the visual and tactile senses, this may reflect geometrical attributes common to these two modalities but not shared with the other senses. Distinguishing between these two alternatives may provide a clue to the level of representation within this region.
We show here that the object-related activation is robust in both the visual and tactile modalities. In contrast, both meaningless and object-related auditory signals fail to activate this region.
Materials and Methods
The BOLD fMRI measurements were performed on a whole-body 1.5 T Signa Horizon LX8.25 General Electric scanner at the Wohl Institute for Advanced Imaging, Tel-Aviv Sourasky Medical Center. The MRI system was equipped with 22 mT/m field gradients with a slew rate of 120 T/m/s (Echospeed). Autoshimming was performed for each subject. To facilitate coordinate determination during later data processing, 3D anatomical volumes were collected using a T1 SPGR sequence. The functional MRI protocols were based on multi-slice gradient-echo echo-planar imaging (EPI) with a standard head coil. The functional data were obtained using the following parameters: TR = 3 s, TE = 55 ms, flip angle = 90°, imaging matrix = 80 × 80, FOV = 24 cm. Seventeen slices, 4 mm thick with a 1 mm gap, were oriented approximately axially, covering the whole brain except the most dorsal and ventral tips.
The visual sequences were generated on a PC and projected via an LCD projector (Epson MP 7200, Japan) onto a tangent screen located inside the scanner in front of the subject. Subjects viewed the screen through a tilted mirror. Throughout the experiment the subjects rested their right arm on a custom-made table and, during the somatosensory epochs, touched the objects and textures with their right hand. Digital sound sequences were generated on a PC, played on a stereo system and delivered binaurally through a pneumatic device and silicone tubes into commercially available noise-shielding headphones (Slimline noise guard headset, Newmatic sound system, USA) at a fixed level of 86–89 dB SPL.
Eight volunteers with no history of neurological, psychiatric, visual or hearing deficits [four women and four men, six right-handed and two left-handed (assessed by the Edinburgh test), aged 27–50] participated in the present experiments. The Tel-Aviv Sourasky Medical Center Ethics Committee approved the experimental procedure. Written informed consent was obtained from each subject.
Stimuli and Experimental Paradigms
Somatosensory Stimuli and LOtv Localizer
We delineated the LOtv region of interest (LOtv ROI) using the same procedures as published previously (Amedi et al., 2001). A set of 18 objects and textures was used. The objects were 3D solid bodies of a size convenient for grasping with one hand, drawn from three categories: man-made tools/devices, animal models and toy vehicles. The textures were all 18 × 18 cm amorphous (non-rigid) sheets. Objects in the tactile condition included a plastic fork, a syringe, a toy dolphin, a toy jeep, etc. The tactile textures included paper, jute fabric, sandpaper, a napkin, velvet fabric, etc. The experiment used a block design with two block types, somatosensory objects and somatosensory textures (six blocks each). Each block lasted 12 s and was followed by a 9 s blank period; the first and last blank periods were longer (27 and 15 s, respectively). The experimenter presented an object or texture to the subject every 4 s, so each block contained three items. The subjects had to palpate, recognize and covertly name each object or texture they were presented with. A short auditory cue (lasting ∼1 s) was given 3 s before and at the end of each tactile block, to ensure that the subjects touched the objects only during the blocks.
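As a rough illustration of the timing above, the following Python sketch lays out one run of the tactile localizer. It assumes, without support from the text, that object and texture blocks alternate starting with objects; the function `build_schedule` and the `cues` field are hypothetical names. Under this reading, the run lasts 27 + 12 × 12 + 11 × 9 + 15 = 285 s, i.e. 95 volumes at TR = 3 s.

```python
# Minimal sketch of the tactile localizer timing (seconds).
# ASSUMPTION: object and texture blocks alternate, starting with objects;
# the text does not state the block order.
TR = 3.0
BLOCK = 12.0          # block duration
BLANK = 9.0           # blank after each block except the last
FIRST_BLANK = 27.0    # initial blank period
LAST_BLANK = 15.0     # blank after the final block
ITEM_INTERVAL = 4.0   # one item handed to the subject every 4 s
CUE_LEAD = 3.0        # auditory cue 3 s before each block


def build_schedule(n_per_condition=6):
    """Return (blocks, run_length); each block dict holds onsets in seconds."""
    conditions = ["objects", "textures"] * n_per_condition  # hypothetical order
    t = FIRST_BLANK
    blocks = []
    for i, cond in enumerate(conditions):
        n_items = int(BLOCK // ITEM_INTERVAL)  # 3 items per block
        blocks.append({
            "condition": cond,
            "onset": t,
            "item_onsets": [t + k * ITEM_INTERVAL for k in range(n_items)],
            "cues": [t - CUE_LEAD, t + BLOCK],  # cue before and at the end of the block
        })
        t += BLOCK
        t += LAST_BLANK if i == len(conditions) - 1 else BLANK
    return blocks, t


blocks, run_length = build_schedule()
print(len(blocks), run_length, run_length / TR)  # -> 12 285.0 95.0
```

Changing the assumed block order would not change the run length, only the condition labels attached to each onset.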
Auditory and Visual Objects Experimental Paradigm
This experiment included visual and auditory objects and appropriate baseline conditions controlling for low-level auditory and visual processing. In all conditions the subjects performed a one-back comparison between the currently presented stimulus and the previous one, indicating whether the two were the same or different by pressing the left or right button on a response box. This design was intended to match attention, arousal, naming and motor components across conditions as closely as possible. All epochs lasted 12 s and were followed by a 9 s rest period. The auditory and visual objects were equated in type and in the proportion of each of the three categories: man-made tools/devices (50%), animals (pictures or vocalizations, 32%) and vehicles (18%). The specific stimuli in each category differed between the two modalities in order to eliminate the possibility of cross-modal priming, adaptation and imagery effects. For example, the tools category included pictures of a lighter, a stapler, a video-camera, a stethoscope and a microphone, and sounds of a hammer, a camera, a whistle, a hand-saw and a gun; the animals category included pictures of a bear, a butterfly, a shark, a giraffe and a goose, and sounds of a cat, a donkey, an elephant, a horse and a cow; the vehicles category included pictures of a boat, a tractor, a truck, a spaceship and a balloon, and sounds of a train, a helicopter, a motorcycle, a car and an ambulance. Before the experiment the subjects were familiarized with the experimental procedure in all three modalities using a different set of objects that was not used later in the scan.
Visual Objects and Noise Stimuli
A set of 40 grayscale images of different objects was presented in the visual object epochs (Fig. 1a, left). Phase-randomized control images were created for all the visual objects (Fig. 1a, second from left): each object image was Fourier-transformed, its phases were randomized, and the inverse Fourier transform was applied. These stimuli were used in the visual noise epochs. The visual stimuli were grouped into blocks of 12 stimuli, each containing 9–11 novel stimuli and 1–3 repeated stimuli, presented at a rate of 1 Hz (Fig. 1c, left). Altogether, four visual object blocks and four visual noise blocks were presented.
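The phase randomization can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to make the stimuli: to keep the scrambled image real-valued, it borrows the Fourier phases of a white-noise image (one common way to randomize phases while preserving the conjugate symmetry of a real image's spectrum), and it uses a naive O(N⁴) DFT that is practical only for tiny test images. The function names are hypothetical.

```python
import cmath
import math
import random


def dft2(img):
    """Naive 2D discrete Fourier transform of a list of rows (O(N^4); tiny images only)."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / rows + v * x / cols))
                 for y in range(rows) for x in range(cols))
             for v in range(cols)]
            for u in range(rows)]


def idft2(spec):
    """Naive inverse of dft2."""
    rows, cols = len(spec), len(spec[0])
    return [[sum(spec[u][v] * cmath.exp(2j * math.pi * (u * y / rows + v * x / cols))
                 for u in range(rows) for v in range(cols)) / (rows * cols)
             for x in range(cols)]
            for y in range(rows)]


def phase_scramble(img, rng):
    """Keep the amplitude spectrum of `img` but replace its phases with random ones.

    The random phases come from the spectrum of a real white-noise image, so they
    have the conjugate symmetry needed for the inverse transform to be real.
    """
    amplitude = [[abs(z) for z in row] for row in dft2(img)]
    noise = [[rng.random() for _ in row] for row in img]
    phases = [[cmath.phase(z) for z in row] for row in dft2(noise)]
    mixed = [[a * cmath.exp(1j * p) for a, p in zip(arow, prow)]
             for arow, prow in zip(amplitude, phases)]
    # Any imaginary parts left are rounding error; keep the real image.
    return [[z.real for z in row] for row in idft2(mixed)]


rng = random.Random(0)
image = [[((x * y) % 5) / 4.0 for x in range(6)] for y in range(6)]
scrambled = phase_scramble(image, rng)
```

Because the noise image has a positive mean, its DC phase is zero, so this sketch also preserves the mean luminance of a non-negative image; for real stimuli an FFT library would replace the naive transforms.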
Auditory Objects and Noise Stimuli
We used a set of 50 sounds that allow recognition of the objects that produce them (i.e. auditory objects). These sounds were drawn from the same three categories as the visual objects, in identical proportions. The original stimuli varied in sampling rate and quality, but were all resampled at 22.05 kHz to yield the auditory object stimuli. Stimulus duration varied between 0.3 and 1.3 s (mean = 0.8 s, SD = 0.3 s). The auditory object set was split into two subsets of 25 stimuli each (auditory objects group I and auditory objects group II). This was done in order to generate two types of noise stimuli controlling for different characteristics of the auditory objects (e.g. power spectrum and time envelope). The first auditory noise group (auditory noise type I) was created by a manipulation analogous to the one used to generate the visual noise images: each auditory object from group I was Fourier-transformed, its Fourier phases were randomized, and the noise stimulus was obtained by applying the inverse Fourier transform (Fig. 1a, right). The second subset of control noise sounds (auditory noise type II) was created from auditory objects group II as follows: pairs of auditory object sounds were combined by applying the temporal amplitude envelope of one stimulus to colored Gaussian noise having the amplitude spectrum of the other stimulus (Fig. 1b). Exchanging the average power spectrum and the time envelope between objects in this way yields noise controls that share the time envelopes and average power spectra of the objects but are no longer perceived as objects (as confirmed in a psychophysical test conducted before the experiment). The amplitude envelope of each stimulus was extracted by smoothing the squared waveform with a 6 ms Hamming window and taking the square root.
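A minimal sketch of how a type II noise stimulus might be built from two sounds, assuming both have been trimmed or padded to the same length (an assumption; the original sounds varied between 0.3 and 1.3 s). The envelope follows the recipe in the text (squared waveform smoothed with a 6 ms Hamming window, then square root); the carrier keeps the exact DFT magnitude of the second sound and takes its phases from Gaussian white noise, which is one possible reading of "colored Gaussian noise with the amplitude spectrum of the other stimulus". Applied to a sound's own spectrum, `colored_gaussian_noise` is the 1D analogue of the type I phase randomization. All names are hypothetical, and the naive DFT is for illustration only.

```python
import cmath
import math
import random


def dft(x):
    """Naive 1D DFT (O(n^2)); adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    """Naive inverse of dft."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def amplitude_envelope(x, fs, win_ms=6.0):
    """Square the waveform, smooth with a normalized Hamming window, take the square root."""
    n = max(2, round(fs * win_ms / 1000.0))
    w = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    total = sum(w)
    w = [v / total for v in w]
    sq = [v * v for v in x]
    half = n // 2  # center the window on each sample
    env = []
    for i in range(len(x)):
        acc = sum(wk * sq[i + k - half] for k, wk in enumerate(w) if 0 <= i + k - half < len(x))
        env.append(math.sqrt(acc))
    return env


def colored_gaussian_noise(spectrum_source, rng):
    """Noise with the exact DFT magnitude of `spectrum_source` and phases from Gaussian white noise."""
    amplitude = [abs(z) for z in dft(spectrum_source)]
    white = [rng.gauss(0.0, 1.0) for _ in spectrum_source]
    phases = [cmath.phase(z) for z in dft(white)]
    return [z.real for z in idft([a * cmath.exp(1j * p) for a, p in zip(amplitude, phases)])]


def type_ii_noise(envelope_source, spectrum_source, fs, rng):
    """Impose the envelope of one sound on noise carrying the spectrum of another."""
    if len(envelope_source) != len(spectrum_source):
        raise ValueError("this sketch assumes both sounds have the same length")
    env = amplitude_envelope(envelope_source, fs)
    carrier = colored_gaussian_noise(spectrum_source, rng)
    return [e * c for e, c in zip(env, carrier)]
```

Multiplying by the envelope smears the carrier's spectrum slightly, so the hybrid matches the second sound's spectrum only approximately, which is inherent in combining one sound's envelope with another's spectrum.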
A 10 ms linear ramp was applied to the onset and offset of all stimuli, and stimulus levels were normalized to a constant RMS level. The auditory stimuli were grouped into blocks of eight sounds presented at a rate of 0.66 Hz (Fig. 1c, left). Each block contained 5–7 novel and 1–3 repeated stimuli, to allow monitoring of performance in the one-back task. Sixteen auditory epochs were presented: four from each subgroup (auditory objects groups I and II, auditory noise types I and II). Since no significant differences were found between the two auditory object groups in any relevant respect, we pooled them and refer to them in the rest of the paper simply as ‘auditory objects’.
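The ramping and level normalization described above might look like the following minimal sketch. The target RMS value is an arbitrary placeholder, and applying the ramps before normalizing (so that the final RMS equals the target exactly) is a design choice of this sketch rather than something the text specifies.

```python
import math


def apply_linear_ramps(x, fs, ramp_ms=10.0):
    """Scale the first and last ramp_ms of x by linear 0->1 and 1->0 gains."""
    n = min(len(x) // 2, round(fs * ramp_ms / 1000.0))
    y = list(x)
    for i in range(n):
        gain = i / n
        y[i] *= gain          # onset ramp
        y[-1 - i] *= gain     # offset ramp
    return y


def normalize_rms(x, target_rms):
    """Scale x so that its root-mean-square amplitude equals target_rms."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    if rms == 0.0:
        raise ValueError("cannot normalize a silent stimulus")
    return [v * target_rms / rms for v in x]


def prepare_stimulus(x, fs=22050, target_rms=0.05):
    """Ramp first, then normalize, so the final RMS is exactly the target."""
    return normalize_rms(apply_linear_ramps(x, fs), target_rms)
```

At the 22.05 kHz sampling rate given in the text, each 10 ms ramp spans about 220 samples.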
Psychophysical Task and Analysis
Subjects performed a one-back task during the scan in both visual and auditory epochs. They reported their decision using a response box, pressing the left button if the current stimulus was identical to the previous one and the right button if the two were different. The subjects’ reports were compared with the presented sequence, and performance level was determined for each epoch as percent correct = 100 × (number of correct ‘same/different’ answers)/(total number of comparisons); a missed response was therefore counted as an error. Performance was highly accurate in all conditions, for both object and noise stimuli in both modalities (Fig. 1c, right). No significant differences between conditions and stimulus types were observed [one-way ANOVA, F(5,42) = 2.1]. This indicates that the auditory stimuli were clearly audible despite the background scanner noise.
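The one-back block structure and the scoring rule can be illustrated with the sketch below. It assumes that a block of n stimuli yields n − 1 comparisons (the first stimulus has no predecessor) and represents a missed response as `None`, which, as in the text, counts as an error. The function names and the stimulus pool are hypothetical.

```python
import random


def make_one_back_block(pool, n_items, n_repeats, rng):
    """Return n_items stimuli in which exactly n_repeats are immediate repeats."""
    if not 0 <= n_repeats < n_items:
        raise ValueError("need 0 <= n_repeats < n_items")
    novel = iter(rng.sample(pool, n_items - n_repeats))        # distinct novel stimuli
    repeat_at = set(rng.sample(range(1, n_items), n_repeats))  # position 0 is always novel
    block = []
    for pos in range(n_items):
        block.append(block[-1] if pos in repeat_at else next(novel))
    return block


def percent_correct(block, responses):
    """responses[i] compares block[i + 1] with block[i]: 'same', 'different' or None (missed)."""
    if len(responses) != len(block) - 1:
        raise ValueError("expected one response per comparison")
    correct = 0
    for i, response in enumerate(responses):
        truth = "same" if block[i + 1] == block[i] else "different"
        correct += response == truth  # None never matches, so a miss counts as an error
    return 100.0 * correct / len(responses)
```

For example, `make_one_back_block(images, 12, 2, rng)` would give a visual block with 10 novel and 2 repeated stimuli, and `make_one_back_block(sounds, 8, 3, rng)` an auditory block with 5 novel and 3 repeated ones.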
The borders of retinotopic visual areas were determined for each subject by mapping the vertical and horizontal visual field meridians (Sereno et al., 1995; DeYoe et al., 1996; Engel et al., 1997). This map was obtained in a separate scan in which the subjects viewed triangular wedges containing either natural grayscale images or flickering black-and-white random dots. The flickering dots were effective in mapping the borders between areas V1 and V2, and the natural grayscale images were useful for distinguishing higher-order areas (Levi et al., 2001).
Pure Tones Localizer
We defined a pure tones region of interest (PT ROI) in order to compare the activation in the LOtv ROI with that in typical auditory areas. A block design paradigm was used with two main conditions: pure tones and rest. Each pure tones block contained 24 tones presented at 2 Hz. Each tone lasted 350 ms, with 5 ms linear onset and offset ramps. Three block types were created, each containing tones in a different frequency range: low (200–300 Hz), medium (800–1200 Hz) and high (3200–4800 Hz). Each block type was repeated six times, for a total of 18 blocks. The frequencies of the tones in the low blocks (6 × 24 = 144 tones) were drawn from a uniform distribution over the interval 200–300 Hz. For each of the six low-tone blocks thus created, corresponding medium and high blocks were created containing the same melodic sequences, shifted up by two and four octaves, respectively. This localizer scan was run in five of the eight subjects who participated in the experiments.
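A sketch of how the low, medium and high tone sequences relate: shifting a melody up by two or four octaves multiplies every frequency by 2² = 4 or 2⁴ = 16, which maps 200–300 Hz onto exactly the 800–1200 Hz and 3200–4800 Hz ranges listed above. The 22.05 kHz sampling rate for the tones and the function names are assumptions of this sketch; at 2 Hz, 24 tones fill a 12 s block.

```python
import math
import random


def make_tone_blocks(n_blocks=6, tones_per_block=24, low_range=(200.0, 300.0), rng=None):
    """Draw low-range melodies, then transpose each one up two and four octaves."""
    rng = rng or random.Random()
    low = [[rng.uniform(*low_range) for _ in range(tones_per_block)] for _ in range(n_blocks)]
    medium = [[f * 2 ** 2 for f in melody] for melody in low]  # two octaves up: x4
    high = [[f * 2 ** 4 for f in melody] for melody in low]    # four octaves up: x16
    return {"low": low, "medium": medium, "high": high}


def synth_tone(freq, fs=22050, dur_s=0.350, ramp_s=0.005):
    """Sine tone with linear onset and offset ramps (fs is an assumed sampling rate)."""
    n = round(fs * dur_s)
    r = max(1, round(fs * ramp_s))
    return [min(1.0, i / r, (n - 1 - i) / r) * math.sin(2 * math.pi * freq * i / fs)
            for i in range(n)]


TONE_RATE_HZ = 2.0
blocks = make_tone_blocks(rng=random.Random(4))
block_duration = len(blocks["low"][0]) / TONE_RATE_HZ  # 24 tones at 2 Hz -> 12.0 s
```

Deriving the medium and high blocks from the low ones, rather than drawing them independently, keeps the melodic contour identical across frequency ranges, as the text describes.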
Data analysis was performed using the BrainVoyager 4.4 software package (Brain Innovation, Maastricht, The Netherlands, 2000). For each subject, the 2D functional data were aligned to 2D anatomical slices of the same subject. Before statistical analysis, raw data were examined for motion and signal artifacts. Head motion correction and high-pass temporal filtering in the frequency domain were applied to remove drifts and to improve the signal-to-noise ratio. The LOtv ROI was defined by significantly greater activation for somatosensory objects than for somatosensory textures, and the PT ROI by significant activation for pure tones compared with rest. Only voxels with a correlation coefficient above 0.33 (P < 0.005, not corrected for multiple comparisons) were included in the ROIs. The resulting maps were superimposed onto 3D anatomical reference scans. The 3D recordings were used for surface reconstruction, which included segmentation of the white matter using a region-growing function. The cortical surface was then unfolded, cut along the calcarine sulcus and flattened. The activation maps were superimposed on the inflated and unfolded cortex of each subject. The Talairach coordinates were determined for each ROI. Time-courses were obtained from the LOtv ROI and the PT ROI: each subject’s average signal intensity was estimated by averaging across all the voxels in each ROI, and the average across subjects was calculated by pooling these individual averages. Across-subjects analysis (Fig. 2) was done using the general linear model (GLM) approach (Friston et al., 1995): the time-courses of all subjects were transformed into Talairach space (Talairach and Tournoux, 1988), z-normalized and concatenated, and the regression weights for each condition were estimated.
In the contrast tests, a t-test between the estimated weights of the opposing conditions was applied with a significance threshold of P < 0.05.
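The ROI criterion can be sketched as a Pearson correlation between each voxel's time-course and a block predictor, keeping voxels with r > 0.33, followed by averaging across the selected voxels. This is a simplified stand-in for the BrainVoyager procedure, not a reproduction of it: the predictor here is an unconvolved boxcar (for example +1 for object blocks, -1 for texture blocks and 0 for blanks), optionally shifted by a hypothetical hemodynamic lag in volumes, and no HRF convolution or correction for temporal autocorrelation is applied. All names are hypothetical.

```python
import math


def block_predictor(n_volumes, blocks, lag=0):
    """Build a predictor from (onset, length, weight) tuples given in volumes.

    `lag` shifts every block by a hypothetical hemodynamic delay (in volumes).
    """
    x = [0.0] * n_volumes
    for onset, length, weight in blocks:
        for t in range(onset + lag, min(onset + lag + length, n_volumes)):
            x[t] = weight
    return x


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a flat time-course carries no correlation
    return sxy / math.sqrt(sxx * syy)


def select_roi(voxels, predictor, r_min=0.33):
    """Keep voxel ids whose time-course correlates with the predictor above r_min."""
    return [vid for vid, ts in voxels.items() if pearson_r(ts, predictor) > r_min]


def roi_time_course(voxels, roi):
    """Average the time-courses of the selected voxels, volume by volume."""
    if not roi:
        raise ValueError("empty ROI")
    n = len(voxels[roi[0]])
    return [sum(voxels[v][t] for v in roi) / len(roi) for t in range(n)]
```

The P value that corresponds to r = 0.33 depends on the number of volumes and on temporal autocorrelation, so this sketch deliberately works with r directly rather than converting it.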
We have previously shown that a region in the ventral visual stream, termed LOtv, responds to both visual and tactile objects in individual subjects. Here we examine whether this region is also activated by auditory stimulation. Figure 2 shows the loci of the object-related regions in the three modalities, and their overlap, on an inflated, Talairach-normalized brain. The activation map is based on a multi-subject general linear model (GLM) analysis combining data from all eight subjects. As expected, the visual object-selective regions (Visual objects > Visual noise, red regions in Fig. 2) delineate the LOC bilaterally. The tactile object-selective activations (Somatosensory objects > Somatosensory textures, purple clusters) are mostly restricted to the contralateral parietal cortex (around the intraparietal sulcus) and parts of the contralateral LOC.
The visuo-haptic overlap region within LOC (shown in yellow) delineates LOtv. Some activation was found in the right hemisphere (ipsilateral to the palpating hand), but at the same statistical threshold the contralateral activation was larger in extent and more significant than the ipsilateral one (contralateral LOtv: volume = 716 mm³, Talairach coordinates x = –47, y = –62, z = –10; ipsilateral LOtv: volume = 296 mm³, Talairach coordinates x = +45, y = –54, z = –14).
The auditory object-selective activation (auditory objects > auditory noise conditions, green clusters) was found bilaterally in temporal regions, mainly around the superior temporal gyrus but also in the right superior temporal sulcus, the middle temporal gyrus and anterior ventral temporal cortex. In contrast, no auditory object-selective voxels were found in LOtv or anywhere in the contralateral LOC. Nor was any activation found when the auditory object condition was contrasted separately with each of the two noise control conditions. (Furthermore, no overlap with LOtv was found even at a lenient, non-significant threshold of P < 0.1.)
To assess whether any auditory activation relative to rest could be found in LOC across subjects, we carried out a general linear model analysis across subjects with Bonferroni correction for multiple comparisons. While the analogous analysis for touch revealed significant tactile activation in contralateral LOC (tactile objects and tactile textures > rest: cluster size 285 mm³ at a corrected P < 0.05), no significant voxels were found for any of the auditory conditions across subjects (all auditory conditions > rest, auditory objects > rest, or auditory noise > rest). This lack of activation in LOC persisted even when the threshold was lowered to a non-significant level (P < 0.1). Thus, across subjects, strong and significant activation in contralateral LOtv is found only in the visual and tactile modalities. Significant auditory object-related activation was found in other cortical regions, but a detailed subject-by-subject anatomical and time-course analysis of the representation and hierarchy in temporal auditory cortex is beyond the scope of this paper.
We also performed a subject-by-subject analysis of the individual activation time-courses in the contralateral LOtv region. Figure 3 depicts the anatomical layout of the somatosensory object-selective voxels in the unfolded contralateral hemisphere of one representative subject. As noted above, two major foci of activation can be seen: one in the parietal cortex around the intraparietal sulcus (IPS), and a more ventral focus in the lateral occipital cortex, just outside the retinotopic areas (the LOtv ROI). The lower right panel depicts the activation pattern elicited in the LOtv ROI by the different experimental conditions. The voxels in this ROI were selected according to their general anatomical location in occipito-temporal cortex and their differential activation during the haptic conditions (greater activation for objects than for textures). As expected, these voxels showed similar, if not greater, visual object selectivity. However, there was no statistically significant activation in LOtv during either of the auditory conditions. For comparison, we present the activation pattern in classical auditory areas using the pure tones localizer region of interest (PT ROI). This ROI (including part of Heschl’s transverse gyrus and the planum temporale) was robustly activated by both auditory objects and the two types of auditory noise (Fig. 3, lower-left panel). These results clearly demonstrate that the auditory stimuli were effective in eliciting cortical activation. This pattern of activation in the two ROIs was consistent across subjects.
The average time-course of activation across subjects for the LOtv ROI and the PT ROI is shown in Figure 4a and 4b, respectively. The figure also depicts the average percent signal change across subjects for LOtv (Fig. 4c) and the PT ROI (Fig. 4d). The average LOtv time-course across seven subjects (the eighth subject failed to reach the threshold required to delineate LOtv) confirms the robust response to visual objects, with no statistically significant activation for any of the auditory conditions or for the visual noise patterns (Fig. 4a). The time-courses for the two types of auditory noise were averaged together because they did not differ significantly in LOtv (t-test, P = 0.64). Since performance in the one-back task was highly accurate in all the auditory conditions (as in the visual conditions; see Fig. 1c, right), the lack of auditory activation in LOtv cannot be attributed to masking by scanner noise or to differences in attention and arousal between conditions.
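The epoch averaging behind such time-courses can be sketched as follows: each epoch is cut out around its onset, expressed as percent change from the mean of the preceding volumes, averaged across epochs within a subject, and then averaged across subjects. The baseline choice (the `pre` volumes before each onset) and the window lengths are assumptions of this sketch; with TR = 3 s, a 12 s epoch plus 9 s of rest spans seven volumes.

```python
def epoch_average(ts, onsets, pre=2, post=7):
    """Percent signal change averaged over epochs, in volumes [onset - pre, onset + post).

    Baseline: mean of the `pre` volumes preceding each onset (an assumption).
    """
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(ts):
            continue  # skip epochs that run past either end of the run
        baseline = sum(ts[onset - pre:onset]) / pre
        epochs.append([100.0 * (v - baseline) / baseline for v in ts[onset - pre:onset + post]])
    if not epochs:
        raise ValueError("no complete epochs")
    return [sum(col) / len(epochs) for col in zip(*epochs)]


def grand_average(curves):
    """Average equally long per-subject curves point by point."""
    return [sum(col) / len(curves) for col in zip(*curves)]
```

A per-subject curve would come from `epoch_average(roi_signal, onsets)` for one condition, and `grand_average` over subjects then gives an across-subject time-course.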
The activation pattern in typical auditory areas (PT ROI, Fig. 4b) was very different. This ROI was strongly and significantly activated by all auditory stimuli, both objects and noise. Note that although the scanner noise was present throughout the scan (including the rest periods), the average time-course in the auditory epochs followed the typical hemodynamic response function, and the subjects’ behavioral performance in the auditory conditions was highly accurate (94.8% correct ± 6.2 SD for the auditory objects and 96.8% ± 3.9 for the auditory noise condition; see Fig. 1c, right). In contrast, this ROI showed negligible activation during the tactile and visual epochs. The overall percent signal change across subjects in the LOtv ROI (Fig. 4c) and the PT ROI (Fig. 4d) shows the same pattern. Taken together, although the auditory stimuli were perceivable (yielding accurate psychophysical performance) and effective (generating a typical hemodynamic response in auditory regions), they failed to activate LOtv, in marked contrast to the observed visual and tactile object-selective activation. We therefore suggest that LOtv is an object-related bimodal region (strongly activated by both visual and tactile objects), rather than a global, multisensory, object-related association area.
As shown in Figure 2, LOtv is located within LOC. However, as its name implies, LOC is a complex: a constellation of object-related areas that can be subdivided into two major anatomical regions, the more dorsal LO (including the lateral occipital sulcus) and the ventral foci. We have recently shown that the different ventral object-related activations (including the posterior fusiform gyrus, the collateral sulcus and the parahippocampal gyrus) can be placed within the framework of a larger map termed ventral occipito-temporal cortex (VOT) (Malach et al., 2002). Although all these areas are object-selective, they differ somewhat in their functional preferences (Ishai, 1999; Haxby et al., 2001). Previous studies indicated that the fusiform gyrus responds preferentially to faces (Puce et al., 1995; Kanwisher et al., 1997), that the collateral sulcus and parahippocampal gyrus show a preference for houses and outdoor scenes (Aguirre et al., 1998; Epstein and Kanwisher, 1998), and that LO and other regions show a preference for a variety of other object categories [including man-made tools and animals (Martin et al., 1996; Levi et al., 2001; Beauchamp et al., 2002)].
This functional division leads to a specific prediction: if LOtv is involved in bimodal object analysis, the haptically activated voxels are likely to be localized in regions processing the visual attributes of graspable objects, rather than in LOC regions showing a preference for scenes or faces. To test this prediction, we superimposed the maps delineating LOtv in each individual subject on data from a different experiment that delineated the subdivisions of LOC using faces, houses and common man-made objects (Levi et al., 2001). Figure 5 presents the activation patterns of the six subjects who participated in both studies. The full contralateral (left) hemisphere of one subject is shown, and the insets focusing on LOC show the anatomical pattern of activation in all six subjects. Purple clusters indicate regions selective for tactile objects versus textures; blue clusters, regions with a preference for visual objects over faces and houses; orange clusters, regions with a preference for faces (over houses and common objects); and green clusters, voxels with a preference for houses. As can be seen, the tactile object-selective activation frequently falls within the visual object-related voxels and is usually absent from voxels with a preference for faces or houses.
The Defining Features of LOtv
This study demonstrates that a region in the lateral occipital cortex, termed here LOtv, is activated by both visual and haptic presentation of objects (compared with textures) but not by the sounds of objects. The bimodal activation is bilateral, but the tactile activation is much stronger in the hemisphere contralateral to the palpating hand (see Fig. 2). Interestingly, the voxels showing tactile object activation are generally located in object-selective LO, which responds preferentially to visual objects (such as man-made tools) compared with faces or houses. This makes sense, considering that the palpated stimuli were all common objects or toys rather than faces. Our daily experience with such objects makes them readily recognizable by touch, whereas we rarely recognize faces by touch.
In contrast, LOtv responds little to auditory cues that allow objects to be recognized by their characteristic sounds. It could be argued that the lack of LOtv activation by the auditory stimuli is due to differences between the objects used in the auditory and somatosensory epochs. However, this is rather unlikely. First, the object categories were the same; only the exemplars differed between conditions. Second, no auditory activation was found anywhere in the contralateral LOC, not just in LOtv. Thus, our results strongly suggest that LOtv is a specialized visuo-haptic integration region. Consistent with our shape-processing hypothesis, functional imaging studies of olfaction found no activation in LOC when subjects smelled odorants (Zatorre et al., 1992; Savic et al., 2000). A recent study that required subjects to explicitly identify objects by their smell reported olfactory activation in the cuneus, but not in LOC (Qureshy et al., 2000). This lends further support to our shape-analysis hypothesis for the putative function of LOtv since, like audition, olfaction can convey the identity of an object but not its precise geometrical shape. Still, we cannot rule out the possibility that auditory (or olfactory) selective neurons exist in this region but contribute little to the hemodynamic response. fMRI measures the BOLD signal, which is only an indirect measure of neuronal activity and reflects the ‘averaged’ activity of millions of neurons per voxel. While acknowledging these limitations of fMRI, our data suggest that the average neuronal response to visual or haptic presentation of objects is qualitatively different from the response when objects are recognized by their characteristic sound. The auditory stimuli were clearly distinguishable in the scanner and evoked a robust hemodynamic response in typical auditory areas.
Therefore, the differences in object-related activation between modalities cannot be attributed to inadequate auditory stimulation. The general picture emerging from our results thus suggests that LOtv is involved in the processing of tactually and visually defined objects.
The Level of Representation in LOtv
We now address the nature of object representation in LOtv. Since this region was activated only by visual and tactile cues and not by auditory cues, we can rule out the possibility that LOtv is a general multisensory object association region.
What attributes are common and specific to vision and touch in object recognition? These are the only two senses that can extract specific and precise geometric information about an object’s shape. Consider the presentation of a completely novel, abstract object. Visual or tactile exploration enables reconstruction of its shape with relatively high precision. In contrast, the sounds, tastes and smells of this abstract object carry no precise information about its shape. Interestingly, a recent study (James et al., 2002) showed that regions within LOC give rise to a similar fMRI signal change when subjects palpate novel and unfamiliar abstract objects, which are difficult to label semantically. Since the subjects had no experience with these abstract objects prior to the scan, memory of the objects’ identity was unavailable when they palpated them. Still, sub-regions of LOC were strongly activated by these stimuli. Furthermore, using a priming paradigm, the authors showed that the fMRI signal evoked by seeing objects was greater for objects that had previously been explored haptically than for objects that had not (i.e. a tactile priming effect). In fact, the change in signal magnitude due to tactile priming was similar to that due to visual priming. Several psychophysical experiments also show that haptic-to-visual priming improves behavioral performance as effectively as visual-to-visual priming, for both familiar and novel 3D objects (Easton et al., 1997; Reales and Ballesteros, 1999). Exploring an object by touch from a certain view facilitates visual recognition of the object from the same view compared with other views (Newell et al., 2001). Together with the current study, all these findings indicate that vision and touch share a common shape representation, and we suggest that LOtv is the cortical region mediating this bimodal integration.
The exact level of the bimodal shape representation in LOtv is still unclear. It could lie at the lower level of basic features and their spatial relationships (Tanaka, 1993), at the intermediate level of view-dependent representations (Logothetis and Sheinberg, 1996; Ullman, 1998), or at the high end of object recognition, reflecting a 3D view-invariant representation of objects (Marr, 1982; Biederman, 1987). Recent imaging studies suggest that visual activity in LOC may reflect processing of holistic object shape rather than of image features (Hasson et al., 2001; Kourtzi and Kanwisher, 2001; Lerner et al., 2002). Since LOtv is part of LOC, it is plausible that the same holistic shape processing applies to the haptic activation.
The Possible Role of Confounding Factors
The somatosensory object selectivity seen in LOtv could potentially have resulted from confounding factors such as visual imagery, naming, or differences in attention and arousal between the object and noise (or texture) conditions. A previous study found the contribution of these factors to be minor at best (Amedi et al., 2001). Still, the absence of auditory activation in LOtv, even though an object's characteristic sounds can potentially elicit vivid imagery of it, provides further evidence that visual imagery was not a major factor in activating LOtv. Similarly, since the behavioral requirements and performance were comparable across all visual and auditory conditions, the lack of auditory activation in LOtv argues against differences in attention, arousal or naming as the source of the object-related activation in this region.
Finally, a cross-modal attention effect was recently reported in unimodal visual cortex (the lingual gyrus) when tactile stimulation of the hand was combined with visual stimulation (Macaluso et al., 2000). Could such a top-down attention effect, originating in parietal (or prefrontal) cortex, fully account for the activation reported in LOtv?
Macaluso et al. suggest that truly multisensory regions are activated independently by each of two (or more) modalities, whereas cross-modal attention effects appear only when the stimuli in the two modalities are presented in a specific configuration. In our study, either tactile or visual stimulation alone was sufficient to activate the same region (LOtv) robustly, indicating that LOtv is a truly multisensory (bimodal) region.
Two Streams for Visual and Tactile Object Recognition and Their Convergence at LOtv
Previous studies have indicated that the anterior parietal cortex, especially regions around the intraparietal sulcus (IPS), is likely to be involved in aspects of tactile object processing (Pons et al., 1987; Anton et al., 1996; Iwamura, 1998; Binkofski et al., 1999). Activation of the IPS was found in tasks requiring analysis of object shape, such as length and curvature, using both simple shapes (ellipsoids) and detailed objects (Roland et al., 1998; Deibert et al., 1999; Bodegard et al., 2001). We found that the IPS was one of the two most pronounced regions of activation during object palpation, suggesting a possible pathway from the postcentral gyrus to the IPS that could play a role in tactile object processing, analogous to the ventral pathway, which is specialized for the recognition of visual objects. LOtv, in the anterior part of the ventral pathway, may be the convergence zone of these two processing streams for object recognition. This convergence region may contain a volumetric 3D description of objects in both modalities, while the parietal regions may be specialized for more basic tactile analysis (detection, curvature, simple shapes, etc.). In addition, the parietal cortex may be involved in grasping and manipulating target objects using input from both the visual modality (Goodale and Milner, 1992; Westwood et al., 2002) and the tactile modality. Given the dominance of vision over touch in shape processing, it is perhaps not surprising that this bimodal shape convergence occurs in the ventral visual stream.
Strong bidirectional connections exist between prefrontal and inferotemporal cortex, as well as between parietal and inferotemporal cortex. Indeed, evidence for a fronto-parietal circuit for object manipulation was recently reported using fMRI in humans (Binkofski et al., 1999). There is also evidence for the functional significance of the fronto-temporal loop: single-unit studies demonstrated effects of prefrontal cortex on inferotemporal neurons using reversible cooling and other techniques (Fuster et al., 1985; Tomita et al., 1999; Miyashita and Hayashi, 2000). It is therefore very likely that prefrontal cortex influences inferotemporal cortex in general, and possibly LOtv as well. Prefrontal activation was indeed found in our previous study (Amedi et al., 2001) when haptic exploration conditions were compared with the rest condition. However, only marginally significant prefrontal activation was seen when epochs of object exploration were compared with epochs of texture exploration. Our interpretation is that the prefrontal regions may be involved in the motor components of tactile exploration and are not shape selective, although we cannot rule out a possible influence of prefrontal and parietal activation on LOtv.
Is the bimodal convergence region in LOtv functionally relevant? Interestingly, Feinberg et al. (1986) reported the case of a patient with a unilateral left-hemisphere lesion around the inferior occipito-temporal cortex (no Talairach coordinates available) that resulted in a severe bimodal (visual and tactile) agnosia. This patient could not name, describe or demonstrate the use of objects, although his performance on basic sensory tasks was unimpaired. In contrast, the patient's auditory comprehension was within the normal range, and his performance on an auditory nonverbal sound recognition task was only slightly below the normal average. Morin et al. (1984) reported a similar general tactile agnosia following occipito-temporal lesions in six more cases among 17 patients with visual agnosia, and Ohtake et al. (2001) reported a further case. However, such lesions are often not focal, and other studies report a complete dissociation between tactile agnosia and visual agnosia, so this issue is still debated.
Finally, recent preliminary studies suggest that areas neighboring LOtv may also be activated by tactile input. For instance, area MT+, which has been suggested as the human homologue of the macaque motion areas MT and MST, is also activated by tactile motion on the skin surface (Francis et al., 2001; Hagen et al., 2001). MT+ is adjacent to the dorsal part of the lateral occipital cortex, dorsal and posterior to LOtv. The Talairach coordinates of the centers of activation are x = –47, y = –69, z = +2 for MT+ [averaged over the following studies: (McKeefry et al., 1997; Goebel et al., 1998; Paradis et al., 2000; Rees et al., 2000; Sunaert et al., 2000)] and x = –47, y = –62, z = –10 for LOtv.
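The adjacency of MT+ and LOtv can be made concrete with simple arithmetic on the two activation centers listed above. The sketch below is illustrative only: it assumes that a straight-line (Euclidean) distance between center coordinates is a meaningful summary, and it ignores inter-subject variability and the spatial extent of each activation. The helper names are hypothetical. In Talairach space, more negative y is more posterior and more positive z is more dorsal.

```python
import math

# Talairach coordinates (mm) of the activation centers given in the text.
# MT+ is averaged over the cited studies; LOtv is from the present data.
MT_PLUS = (-47.0, -69.0, 2.0)
LOTV = (-47.0, -62.0, -10.0)


def center_distance(a, b):
    """Straight-line distance (mm) between two activation centers."""
    return math.dist(a, b)  # requires Python 3.8+


def relative_position(a, b):
    """Describe center a relative to center b along the anterior-posterior (y)
    and dorsal-ventral (z) axes."""
    dy = a[1] - b[1]
    dz = a[2] - b[2]
    ap = "posterior" if dy < 0 else ("anterior" if dy > 0 else "level")
    dv = "dorsal" if dz > 0 else ("ventral" if dz < 0 else "level")
    return ap, dv


if __name__ == "__main__":
    # Differences: dx = 0, dy = -7, dz = +12, so distance = sqrt(49 + 144).
    print(f"MT+ to LOtv: {center_distance(MT_PLUS, LOTV):.1f} mm")
    print("MT+ relative to LOtv:", relative_position(MT_PLUS, LOTV))
```

Under these assumptions the two centers lie roughly 14 mm apart, with MT+ posterior and dorsal to LOtv, consistent with the description above.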
These results raise the possibility that there are several modules within the occipito-temporal cortex that receive both visual and somatosensory input. These modules may be engaged in computing various aspects of objects, such as their surface properties, 3D shape, and visual and tactile motion.
We thank Michal Harel for the 3D cortex reconstruction. This study was funded by the German–Israeli Foundation for Research (GIF) grant number I-576-040.01/98 and by the Israel Academy of Science 8009 grant. AA was funded by a fellowship from the Horowitz Foundation. GJ was funded by a fellowship from the Israeli Ministry of Science.