Using functional magnetic resonance imaging, we investigated the effect of motor preparation/execution on the activation of visual cortical areas by action observation. We presented videos of human actors performing several fine manipulative actions (e.g., grasping) with the hand or foot, together with appropriate control stimuli. Subjects either responded in a central fixation task with the hand (A) or foot (B) or viewed the stimuli passively (C). Experimental conditions were arranged according to a 2 × 2 × 3 factorial design with action, effector, and response as factors. Bilateral posterior parietal cortex was more strongly activated for action videos compared with controls during active runs (A or B) contrasted with passive runs (C). Two neighboring regions in the right fusiform gyrus (FG) were activated when the effector employed to respond in the task matched that displayed in the videos (A or B), independently of whether the stimulus was an action or a control. Neighboring regions in the right posterior middle temporal gyrus (MTG) were also activated when the effector observed and that used to respond matched (A or B), but only for action videos, not controls. Our results indicate flexible modulation of visual areas during concurrent action observation and action execution/preparation, which was effector specific in the FG and MTG.
It is well established that different parts of the human body are represented in topographic maps in the motor and somatosensory cortices (Penfield and Rasmussen 1950). Initially, the representation of the human body seemed less detailed in visual cortex. Even though functional magnetic resonance imaging (fMRI) studies have highlighted separate regions in occipitotemporal cortex (OTC) specifically involved in the processing of faces (Kanwisher et al. 1997; Gauthier et al. 2000) and bodies (Downing et al. 2001; Peelen and Downing 2005b; Schwarzlose et al. 2005), the latter investigations did not seem to differentiate between individual body parts (Downing et al. 2001). Recently, however, functional imaging work suggested that the topographic maps in OTC might actually be more specific, with separate areas chiefly processing distinct body parts (Bracci et al. 2010; Op de Beeck et al. 2010; Orlov et al. 2010). Surprisingly, these findings have been reported only for the presentation of static images of body parts, whereas research focusing on selective processing of dynamically presented body parts did not bear conclusive results, as some studies reported coarse somatotopic organizations within the posterior superior temporal sulcus (pSTS) (Wheaton et al. 2004; Pelphrey et al. 2005), whereas others found no evidence for body part–specific processing (Buccino et al. 2001; Thompson et al. 2007; Jastorff et al. 2010).
A common feature of all the above studies was that body part–specific representations were investigated using exclusively visual stimulation. In the present study, we applied a novel approach to the study of body part specificity in visual cortex by combining action observation and action execution. The reasoning behind this approach is that a large number of psychophysical studies have indicated changes in visual processing as a result of motor preparation or execution (Craighero et al. 1999; Hommel et al. 2001; Hamilton et al. 2004; Miall et al. 2006; Fagioli et al. 2007; Blaesi and Wilson 2009). One possible explanation for this motor–visual priming is that during the planning/execution of an action, a copy of the motor command is sent to the visual system to enhance the visual analysis of the action observed (Iacoboni et al. 2001). The neural substrates of this priming have been investigated using functional imaging in humans, comparing activations for the match/mismatch between the executed and observed action or for differences in timing. These studies have highlighted regions in the posterior part of the STS, the posterior superior parietal lobule, the intraparietal sulcus, and the premotor cortex (Fink et al. 1999; Iacoboni et al. 2001; Leube et al. 2003; Chaminade et al. 2005; Stanley and Miall 2007; Kontaris et al. 2009).
In contrast to previous studies, we did not focus on the dependence of fMRI activation on an exact match between executed and observed action. Here, we intended to utilize the effect of motor–visual priming to investigate effector-specific processing in visual cortex. As several studies have reported movement-related responses in visual cortex, even when the moving body part was not visible (Astafiev et al. 2004; Peelen and Downing 2005a; Dinstein et al. 2008; Orlov et al. 2010), our reasoning was that combined action observation/action execution might enhance visual processing of the effector used by the subject. That, in turn, might be observable in the blood oxygen level–dependent response over visual cortex, providing evidence for effector specificity not detectable with purely visual experimental paradigms. As OTC does not differentiate between different actions (Jastorff et al. 2010), an exact match between executed and observed action was not required. Rather, we aimed for an experimental design in which the participants were required to constantly prepare/execute a movement with a given effector while simultaneously controlling for attentional factors. To this end, we presented videos of hand and foot actions together with appropriate controls in 3 different contexts. In 2 of these, subjects had to respond to an attentionally demanding high-acuity task (Vanduffel et al. 2002) with either their hand or their foot at an average rate of one response every 3 s. In the third context, subjects were instructed to passively fixate without any movement. A secondary aim was to investigate to what extent effector specificity depended on the dynamic nature of the visual stimulus, that is, whether the effector was static or involved in an action.
Materials and Methods
Fifteen right-handed volunteers with normal or corrected to normal visual acuity (6 male and 9 female aged 22–32 years) participated in the experiment. One of the participants (female) was removed from the analysis due to excessive head movements (>5 mm). The study was approved by the Ethical Committee of the K.U. Leuven Medical School, and all volunteers gave their written informed consent in accordance with the Helsinki Declaration prior to the experiment.
Stimuli consisted of video clips showing human actors from the side, performing various actions with either the hand or the foot (Fig. 1A). These videos were taken from the study of Jastorff et al. (2010). Four different actions were shown: pushing, dropping, grasping, and dragging of an object. The hand and foot actions were performed by one male and one female actor using their right limbs and involved 2 different objects (blue ball and yellow cube). Thus, the full stimulus set included 16 different hand and foot actions, respectively. The videos as shown were 17.7° × 13.2° visual angle in size, and each video clip lasted 2.65 s. At approximately 1° of visual angle above or below the moving effector, an image of a toy car (either blue or yellow) was shown in every frame of the video clip. This toy measured 2.4° × 1.3° visual angle, matching the size of the hand and foot in the videos. In the control stimuli, the car mimicked the movement of the effector with the first frame of the corresponding action stimulus present in the background. The car was animated with the exact motion trajectory of the hand or the foot extracted from the respective action videos. Previous studies have shown that this control condition was most effective in controlling for local motion and the various shapes in the scene (Jastorff et al. 2010). The fixation baseline stimuli consisted of a plain gray rectangle with the same dimensions as the movie clips. The luminance of this rectangle, obtained by averaging the mean luminance over all movie clips, was chosen to minimize luminance changes and hence any changes in pupil diameter across conditions. The edges of all stimuli were blurred with an elliptical mask (14.3° × 9.6°), leaving the actor in the video unchanged, but gradually blending the periphery with the black background. A small horizontal bar was superimposed on all individual stimuli. 
This fixation bar was presented 0.7° above or below the position where the movement took place in the video. These 2 fixation positions were chosen to control for the retinal position of the movement. Depending on the performance of the subject, the size of the bar ranged from 0.12° × 0.05° to 0.15° × 0.05° across participants.
Stimuli were arranged according to a 2 × 2 × 3 factorial design (Fig. 1) with 2 visual factors: action (action videos and car controls) and the effector used in the action (hand and foot) and with response (hand-active, foot-active, and passive) as third factor. The factor response was divided over different runs. A single run in the experiment included 5 conditions, corresponding to the variations of the 2 visual factors: the 2 action conditions (hand and foot), their respective nonbiological motion control conditions (car motion), and a fixation baseline condition. These 5 conditions were presented in blocks of 21 s each, randomly showing 8 of the 16 videos without intervening blank frames. Within any given run, every condition was shown 3 times, resulting in 15 blocks per time series or run, lasting 315 s overall. The order of conditions was randomized and counterbalanced across subjects. The position of the red fixation bar (either above or below the effector) was selected randomly for each stimulus. Every run started with the acquisition of 4 dummy volumes to assure that the MR signal had reached its steady state.
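The block structure of a single run can be sketched in a few lines of Python. This is only an illustration of the timing arithmetic given above (5 conditions, 3 repetitions, 21 s blocks); the condition labels and the plain shuffle are assumptions, and the counterbalancing across subjects is not modeled.

```python
import random

# Condition labels are hypothetical names for the 5 conditions of a run.
CONDITIONS = ["hand_action", "foot_action", "hand_control", "foot_control", "fixation"]
BLOCK_S = 21   # block duration in seconds (from the text)
REPEATS = 3    # each condition is shown 3 times per run (from the text)
TR_S = 3       # repetition time in seconds (from the acquisition parameters)

def make_run(seed=None):
    """Return a list of (onset_s, condition) tuples for one run."""
    rng = random.Random(seed)
    blocks = CONDITIONS * REPEATS
    rng.shuffle(blocks)  # simple shuffle; an assumption, not the authors' scheme
    return [(i * BLOCK_S, cond) for i, cond in enumerate(blocks)]

run = make_run(seed=0)
run_length_s = len(run) * BLOCK_S
volumes_per_run = run_length_s // TR_S  # excludes the 4 dummy volumes
print(len(run), run_length_s, volumes_per_run)
```

With these values a run contains 15 blocks, lasts 315 s, and spans 105 functional volumes of 3 s each, not counting the dummy volumes.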
Runs were shown in 3 different contexts, which together manipulated the factor response (Fig. 1B,C). In 2 contexts, subjects were required to indicate a change in the orientation of the fixation bar by pressing a button with the right index finger or by pushing a pedal with the right foot, respectively. The bar flip (from horizontal to vertical) occurred at random intervals between 1.5 and 4.5 s and lasted 0.5 s. This is the standard rate (Vanduffel et al. 2002) that has been used in all our visual imaging studies in both human and nonhuman primates, with the exception of Jastorff et al. (2010), who used a slower rate. In a prescanning training session, the size of the bar was defined for each participant individually to yield a performance of about 85% correct responses. The reason for including this high-acuity task was twofold. First, the high frequency of the orientation changes of the bar would prompt subjects to continuously prepare for a motor response throughout the entire run. Thus, the response factor includes both motor preparation and execution components. Second, previous studies have shown that the difficulty of the task reduces potential attentional differences between stimuli showing hand movements and foot movements because attention is focused elsewhere (Jastorff et al. 2010). In the third context, the fixation bar was replaced by a fixation dot (0.2°), and participants were instructed to passively fixate the dot without any movement. Runs of the 3 contexts were shown in alternation, and their order was counterbalanced across participants. Data from 9 runs were acquired for each subject, 3 runs for every context.
Apparatus and Setup
Participants lay in a supine position inside the scanner. One response button box was placed into the right hand, and a second box was attached to a device recording responses made with the right foot. To indicate a button press, only minimal movement of the index finger or the foot was required. To reduce the amount of head motion during the scanning sessions, the participants were asked to bite an individually molded bite bar fixed on the scanner table. The stimuli were projected with a liquid crystal display projector (Barco Reality 6400i, 1024 × 768, 60 Hz refresh frequency) onto a translucent screen positioned in the bore of the magnet at a distance of 36 cm from the point of observation. Participants viewed the stimuli through a mirror tilted at 45° and attached to the head coil. Throughout the scanning session, the participants' eye movements were recorded with an ASL eye tracking system 5000 (60 Hz; Applied Science Laboratories, Bedford, MA). Due to technical problems, eye movement data from 3 subjects could not be used for further statistical analysis.
Scanning was performed with a 3-T MR scanner (Intera; Philips Medical Systems, Best, the Netherlands) located at the University Hospital of the Catholic University Leuven. Functional images were acquired using gradient-echo-planar imaging with the following parameters: 50 horizontal slices (2.5 mm slice thickness; 0.25 mm gap), repetition time (TR) = 3 s, echo time (TE) = 30 ms, flip angle = 90°, 80 × 80 matrix with 2.5 × 2.5 mm in-plane resolution, SENSE reduction factor of 2. The 50 slices within a given volume covered the entire brain from the cerebellum to the vertex. A 3D high-resolution T1-weighted image covering the entire brain was also acquired and used for anatomical reference (TE/TR 4.6/9.7 ms; inversion time 900 ms, slice thickness 1.2 mm; 256 × 256 matrix; 182 coronal slices; SENSE reduction factor 2.5). The scanning session lasted about 90 min.
Image processing was carried out using SPM5 (Wellcome Department of Imaging Neuroscience, London, UK), implemented in MATLAB (The Mathworks, Inc). Preprocessing involved realignment of the images, coregistration of the mean functional image and the anatomical image, and normalization of all images to standard stereotaxic space (Montreal Neurological Institute [MNI]) with a voxel size of 2 × 2 × 2 mm. For every participant, the onset and duration of each condition were modeled by a general linear model (GLM). For every run, the design matrix was composed of 4 regressors modeling the 2 visual factors (Fig. 1A) and 1 regressor for the baseline fixation condition. In addition, 6 regressors obtained from the motion correction during the realignment process were included to account for voxel intensity variations due to head movement. Three separate sets of regressors were used for the 3 levels of the factor response (Fig. 1B), thus the full GLM consisted of 15 (5 × 3) regressors modeling the conditions plus 18 (6 × 3) regressors for the respective realignment parameters. All regressors were convolved with the canonical hemodynamic response function (HRF).
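The regressor bookkeeping described above can be checked with a short Python sketch. The labels for contexts, conditions, and realignment parameters are hypothetical names chosen for illustration; only the counts (5 condition regressors and 6 realignment regressors per response level, 3 response levels) come from the text. This is not SPM code.

```python
# Hypothetical labels; only the counts are taken from the text.
CONTEXTS = ["hand_active", "foot_active", "passive"]
CONDITIONS = ["hand_action", "foot_action", "hand_control", "foot_control", "fixation"]
MOTION = ["tx", "ty", "tz", "rx", "ry", "rz"]  # 3 translations + 3 rotations

def regressor_names():
    """Return (condition_regressors, motion_regressors) for the full GLM."""
    cond = [f"{ctx}:{c}" for ctx in CONTEXTS for c in CONDITIONS]
    motion = [f"{ctx}:motion_{m}" for ctx in CONTEXTS for m in MOTION]
    return cond, motion

cond_regs, motion_regs = regressor_names()
print(len(cond_regs), len(motion_regs), len(cond_regs) + len(motion_regs))
```

Under these assumptions the design has 15 condition regressors and 18 realignment regressors, 33 in total.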
Random Effects Analysis
For the random effects model, the functional data were smoothed with an isotropic Gaussian kernel of 8 mm prior to statistical analysis. We computed 10 different contrast images for each of the 14 participants at the first level. As indicated in Figure 1D, the letters A and B represent the observation of hand and foot action conditions, and C and D correspond to watching the car control conditions. The numbers 1, 2, and 3 as subscripts indicate the context: hand-active, foot-active, and passive, respectively, and multiple subscripts indicate the sum over response conditions. The last 6 contrasts were used to mask the first 4 main contrasts.
Group contrast 1 (GC 1) defined the regions involved in action observation by contrasting all action conditions with all car control conditions collapsed over the third factor, response (hand-active, foot-active, and passive): (A1,2,3 + B1,2,3) − (C1,2,3 + D1,2,3).
The GCs 2–4 targeted the interactions of response with either or both visual factors. The second contrast identified areas showing an interaction between the factor response (hand-active, foot-active, and passive) and the visual factor action (action and car control), that is, stronger responses to action videos compared with control videos, in the active runs (hand or foot button presses) compared with the passive runs: [(A1,2 + B1,2) − (C1,2 + D1,2)] − 2[(A3 + B3) − (C3 + D3)]. The third contrast tested for an interaction between response (hand-active and foot-active) and the other visual factor, the effector used in the action (hand and foot) concentrating only on “active” runs. This interaction indicates stronger responses for stimuli containing hands compared with stimuli displaying feet in the runs where participants responded with the hand (hand-active runs) and the reverse for runs where subjects responded with the foot (foot-active runs): [(A1 + C1) − (B1 + D1)] − [(A2 + C2) − (B2 + D2)], which is equivalent to (A1 + C1) − (B1 + D1) + (B2 + D2) − (A2 + C2). The fourth contrast investigated the 3-way interaction between the factors response, action, and effector, again focusing on the active runs. Thus, we contrasted the difference between hand and foot actions against the same difference for the car control conditions. For hand-active runs, we assigned positive values to hand actions and foot car control stimuli, whereas for foot-active runs, we assigned positive values to foot actions and hand car control stimuli: [(A1 − B1) − (C1 − D1)] − [(A2 − B2) − (C2 − D2)], which is equivalent to (A1 − B1) − (C1 − D1) + (B2 − A2) − (D2 − C2).
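The four main contrasts can be written as weight vectors over the 12 video conditions (A–D crossed with contexts 1–3). The Python sketch below is an illustration rather than the SPM contrast specification: the helper names `make` and `apply` are hypothetical, and C and D are taken to be the car controls for the hand and foot videos, respectively. It lets one check that each contrast sums to zero and that the bracketed and expanded forms given above agree.

```python
from itertools import product

# Condition keys: letter (A-D) followed by context (1 = hand-active,
# 2 = foot-active, 3 = passive), e.g. "A1".
KEYS = [f"{letter}{ctx}" for letter, ctx in product("ABCD", (1, 2, 3))]

def make(terms):
    """Build a weight dict from (sign, 'space-separated keys') pairs."""
    w = dict.fromkeys(KEYS, 0)
    for sign, names in terms:
        for name in names.split():
            w[name] += sign
    return w

def apply(w, betas):
    """Weighted sum of per-condition estimates (betas: dict keyed like KEYS)."""
    return sum(w[k] * betas[k] for k in KEYS)

# GC 1: all actions minus all controls, collapsed over response.
GC1 = make([(+1, "A1 A2 A3 B1 B2 B3"), (-1, "C1 C2 C3 D1 D2 D3")])
# GC 2: (action - control) in active runs minus twice the same in passive runs.
GC2 = make([(+1, "A1 A2 B1 B2"), (-1, "C1 C2 D1 D2"),
            (-2, "A3 B3"), (+2, "C3 D3")])
# GC 3: hand > foot in hand-active runs, foot > hand in foot-active runs.
GC3 = make([(+1, "A1 C1 B2 D2"), (-1, "B1 D1 A2 C2")])
# GC 4: three-way interaction restricted to the active runs.
GC4 = make([(+1, "A1 D1 B2 C2"), (-1, "B1 C1 A2 D2")])

for name, w in [("GC1", GC1), ("GC2", GC2), ("GC3", GC3), ("GC4", GC4)]:
    print(name, [w[k] for k in KEYS], "sum =", sum(w.values()))
```

The factor of 2 on the passive term of GC 2 balances the two active contexts against the single passive context, which is why that contrast still sums to zero.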
The fifth GC highlighted the areas involved in visual processing by contrasting all conditions presenting videos with the baseline fixation condition. This contrast was subsequently used as a mask for all contrasts in which visual factors were included to confirm the visual nature of the activation. The sixth and seventh contrast highlighted regions showing a significant main effect of hand in the hand-active runs or a significant main effect of foot in the foot-active runs: (A1 + C1) − (B1 + D1) or (B2 + D2) − (A2 + C2), that is, the 2 parts of GC 3. These 2 contrasts were used to mask GC 3 (see above) to ensure that the main effect of hand and foot was present in both the hand- and the foot-active runs, respectively. The eighth contrast highlighted the main effect of action by contrasting the actions versus the car controls, and the ninth and tenth contrast highlighted a significant interaction in the hand-active runs or in the foot-active runs: (A1 − B1) − (C1 − D1) or (B2 − A2) − (D2 − C2), that is, the 2 parts of GC 4. The latter 3 contrasts were used to mask GC 4 to ensure that the interaction was present in both the hand- and the foot-active runs, respectively, and that the interaction was driven by a stronger response to the respective action and not the control.
Subsequently, we generated 10 different activation maps by means of a second-level random effects analysis (Holmes and Friston 1998), combining the contrast images from all participants (one-sample t-test). For the main contrasts, including the interactions, significance level was set at P < 0.001 uncorrected, corresponding to a t-score of 3.8. The thresholds for the masks were set to P < 0.05 uncorrected.
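The correspondence between P < 0.001 and a t-score of about 3.8 can be reproduced numerically. The sketch below assumes a one-sided one-sample t-test with 13 degrees of freedom (14 participants minus 1); both the one-sidedness and the numerical method (Simpson integration of the t density plus bisection) are assumptions made for illustration, not a description of how SPM computes thresholds.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_sf(t, df, steps=2000):
    """One-sided tail probability P(T > t) for t >= 0 (Simpson's rule on [0, t])."""
    if t == 0:
        return 0.5
    h = t / steps
    s = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - s * h / 3

def t_threshold(p, df, lo=0.0, hi=50.0):
    """Find t such that P(T > t) = p by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

N_SUBJECTS = 14                               # participants in the analysis
t_crit = t_threshold(0.001, N_SUBJECTS - 1)   # one-sided, df = 13
print(round(t_crit, 2))
```

Under these assumptions the critical value comes out at roughly 3.85, consistent with the threshold quoted above.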
Recent studies have shown that group analyses, because of smoothing (here full-width at half-maximum of 8 mm) and averaging across subjects, may underestimate the selectivity of neighboring areas. For example, Kolster et al. (2010) showed that the human MT/V5 area is only weakly shape selective in analyses using unsmoothed single-subject data but that shape selectivity from neighboring lateral occipital complex could influence activation levels in MT/V5 in the group analysis. To investigate whether activation sites identified in the group analysis might actually show selectivity with respect to a single effector, we analyzed the unsmoothed single-subject data separately for hand- and foot-active runs. Here, we defined 6 additional contrast images for each participant, 3 involving only the hand-active runs and 3 involving only the foot-active runs. These 6 single-subject contrasts (SSC 1–6) corresponded pairwise to each of the interactions considered in the group analysis (i.e., GC 2–4). The pair SSC 1–2, corresponding to GC 2, considered the main effect of action separately for hand- and foot-active runs: (A1 + B1) − (C1 + D1) and (A2 + B2) − (C2 + D2). The pair SSC 3–4, corresponding to GC 3, considered the main effects of effector separately for hand and foot responses: (A1 + C1) − (B1 + D1) and (B2 + D2) − (A2 + C2) and is identical to the pair GC 6–7. The pair SSC 5–6, corresponding to GC 4, defined the interaction between the factors action and effector with positive signs for hand actions and foot car control stimuli for the hand-active runs and the reverse for the foot-active runs: (A1 − B1) − (C1 − D1) and (B2 − A2) − (D2 − C2). It is identical to the pair GC 9–10. Since the SSC 5–6 are interactions, they were each masked with the main effect of action ((A1 − C1) − (B1 − D1) and (A2 − C2) − (B2 − D2); P < 0.05 uncorrected) to ensure that the interactions arose from an increased response to one of the dynamic conditions. 
The SPMs generated by these 6 SSC were scrutinized for local maxima within a 15 mm radius of the local maximum of the corresponding GC. Previous fMRI studies have shown that the location of a functionally defined area can vary considerably between subjects. The vector distance corresponding to twice the standard deviation (SD) of the mean coordinates across subjects ranged from 13.5 to 17.5 mm (Tootell et al. 1995; Hasnain et al. 1998; Dumoulin et al. 2000). Thus, a sphere with a radius of 15 mm centered on the group local maxima is likely to include most of the individual local maxima. For each SSC and subject, the local maximum with the smallest distance from the group local maximum was selected.
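The selection rule described above (take the single-subject local maximum closest to the group local maximum, provided it lies within 15 mm) can be sketched as follows. The function name and all coordinates are hypothetical and serve only to illustrate the rule; they are not values from the study.

```python
import math

def nearest_within(group_peak, candidates, radius_mm=15.0):
    """Return (peak, distance) of the candidate closest to group_peak,
    or (None, None) if no candidate lies within radius_mm."""
    best, best_d = None, None
    for cand in candidates:
        d = math.dist(group_peak, cand)  # Euclidean (vector) distance in mm
        if d <= radius_mm and (best_d is None or d < best_d):
            best, best_d = cand, d
    return best, best_d

# Hypothetical MNI coordinates (mm), for illustration only.
group_peak = (40, -50, -20)
subject_maxima = [(44, -52, -18), (40, -30, -20), (60, -50, -20)]

peak, dist = nearest_within(group_peak, subject_maxima)
print(peak, round(dist, 2))
```

In this toy case only the first candidate falls inside the 15 mm sphere (the other two lie 20 mm away), so it is the one selected.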
Activity profiles plot the MR activation, in % MR signal change from fixation baseline, for the different conditions of the experiment. Profiles were computed by averaging the 27 voxels, corresponding to a volume of 216 mm³, surrounding the local maxima, and averaged over subjects for both the group and the single-subject analyses. These profiles were used for illustrative purposes to visualize the presence and origin of the interactions. Indeed, an interaction can arise from high activation in the experimental condition of the positive contrast or in the control condition of the negative contrast, but only the first is of interest in the present study. The profiles were also further analyzed using three-way repeated measures analyses of variance (ANOVAs) to test only for the presence of effects but not their size (Kriegeskorte et al. 2010).
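The 27-voxel neighborhood is a 3 × 3 × 3 cube around the local maximum; with 2 × 2 × 2 mm voxels this gives 27 × 8 = 216 mm³. The Python sketch below illustrates the averaging on toy data. The exact formula used for % signal change is not given in the text, so the one used here (condition mean relative to fixation-baseline mean) is an assumption, as are the function names and the dictionary-based volume.

```python
from itertools import product

VOXEL_MM = 2  # isotropic voxel size after normalization (from the text)

def cube_indices(peak, half_width=1):
    """Voxel indices of the (2*half_width + 1)^3 cube centered on peak."""
    offsets = range(-half_width, half_width + 1)
    x, y, z = peak
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in product(offsets, repeat=3)]

def percent_signal_change(cond_map, base_map, peak):
    """Mean signal in the cube for a condition, as % change from baseline.
    Assumed formula: 100 * (condition - baseline) / baseline."""
    idx = cube_indices(peak)
    cond = sum(cond_map[i] for i in idx) / len(idx)
    base = sum(base_map[i] for i in idx) / len(idx)
    return 100.0 * (cond - base) / base

# Toy 5 x 5 x 5 volumes stored as dicts keyed by voxel index.
grid = list(product(range(5), repeat=3))
baseline = {v: 100.0 for v in grid}
condition = {v: 101.0 for v in grid}

n_voxels = len(cube_indices((2, 2, 2)))
volume_mm3 = n_voxels * VOXEL_MM ** 3
psc = percent_signal_change(condition, baseline, (2, 2, 2))
print(n_voxels, volume_mm3, psc)
```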
Subjects' performance on the high-acuity task inside the scanner proved very accurate with an average of 87.1 ± 10.5% and 86.1 ± 10.9% correct responses for the hand- and foot-active runs, respectively. Statistical analysis showed no significant difference in performance levels between the 2 run types (t13 = 1.3, P = 0.21). Analyzing the correct responses separately for individual conditions showed no significant difference within the hand- (F4,65 = 1.0, P = 0.39) and foot-active runs (F4,65 = 0.8, P = 0.51). Reaction times for the hand-active runs (454 ± 34 ms) were significantly faster than those for the foot-active runs (502 ± 45 ms; t13 = 5.6, P < 0.001), yet there was no difference in reaction times among the conditions within the hand-active (F4,65 = 1.9, P = 0.12) or foot-active runs (F4,65 = 0.9, P = 0.45). Statistical analysis of the motion regressors showed no significant difference in the amount of head motion between hand-active and foot-active runs (F1,13 = 2.3, P = 0.16). Eye movement recordings demonstrated that subjects were able to fixate for long periods of time. They averaged 4.7 ± 4.2, 3.8 ± 3.5, and 8.0 ± 4 saccades per minute for hand-active, foot-active, and passive runs, respectively. Eye movements did not differ significantly among the individual conditions (hand-active: F4,40 = 1.4, P = 0.26, foot-active: F4,40 = 0.5, P = 0.76, and passive: F4,40 = 2.4, P = 0.18) nor between the hand-active and the foot-active runs (F1,10 = 0.9, P = 0.36), but subjects made significantly more saccades during the passive runs compared with the active runs (hand-active: F1,10 = 4.9, P = 0.05, foot-active: F1,10 = 17.9, P < 0.05).
Main Effect of Action Observation (GC 1)
The main effect of action observation in the random effects analysis is shown in Figure 2. Contrasting fMRI signals for the conditions presenting hand or foot actions compared with their nonbiological motion control conditions resulted in significant activation across the 3 levels of the action observation network, including occipitotemporal, parietal, and premotor cortex, similar to the pattern observed in Jastorff et al. (2010). In the following random effects analyses, we investigated the interaction between action and response, the interaction between effector and response, and the three-way interaction between action, effector, and response (for definition, see Materials and Methods).
Interaction between Action and Response (GC 2, SSC 1–2)
This two-way interaction was defined as the difference between the activations for action videos and their respective nonbiological motion control videos during the active runs (hand and foot), contrasted with the same difference for the passive runs. This contrast resulted in significant activation in 2 sites in posterior parietal cortex (PPC; Fig. 3A). Because the interaction could signal either a stronger activation for action videos compared with controls in the active runs or a stronger activation for control videos compared with actions in the passive runs, the % MR signal changes in the 2 local maxima (Table 1) are given in Figure 3B. These profiles are shown to verify the presence and the origin of the interaction rather than to estimate the size of the interaction. These profiles confirm that the 2 sites in the left and right hemispheres were indeed strongly activated in the 2 action conditions (dark blue and red bars) compared with the control conditions (light blue and yellow bars) in the active runs but not in the passive runs.
Note: In all cases, the activity profiles were computed from a volume of 216 mm³ surrounding the local maximum. MTG, middle temporal gyrus; STS, superior temporal sulcus.
Subsequent statistical analysis using a three-way repeated measures ANOVA showed a significant interaction between response and action in both local maxima (LH: F2,26 = 20.9, P < 0.001; RH: F2,26 = 6.9, P < 0.01). This result was trivial as this interaction defined the site, though it allowed further contrast analyses within the ANOVA. These further analyses revealed a main effect of action in both types of active runs (hand-active LH: F1,13 = 19.5, P < 0.001; RH: F1,13 = 16.1, P < 0.001; foot-active LH: F1,13 = 20.0, P < 0.001; RH: F1,13 = 29.2, P < 0.001) that was absent in the passive runs (LH: F1,13 = 0.5, P = 0.49; RH: F1,13 = 3.7, P = 0.08). Thus, both these sites seem to be recruited exclusively during action observation when the observer him/herself is performing a movement. This effect was present for both effectors during the active runs, independent of whether the effector used to respond to the task matched that displayed in the videos. Even single-subject analysis (SSC 1–2) based on unsmoothed data revealed no specific selectivity for conditions where the effector that was used to respond matched that presented as the stimulus (i.e., used in the action).
Interaction between Effector and Response (GC 3 and SSC 3–4)
This two-way interaction was defined as a stronger activation for stimuli displaying a hand compared with stimuli containing a foot during the hand-active runs together with the opposite effect during the foot-active runs. Thus, the interaction indicated a stronger activation when the effector used to respond to the high-acuity task and the effector displayed in the stimuli were identical, independently of whether the stimulus was an action or a control. The passive runs were not modeled. This contrast showed a significant activation at a single site in the right fusiform gyrus (FG) (Fig. 4A). The respective % MR signal changes for the different conditions in this local maximum (Table 1) are shown in Figure 4B. Again these profiles are shown only to confirm the presence and intended origin of the interaction. As expected, in hand-active runs, the activations in the 2 hand conditions (dark and light blue bars) are higher than those of the foot conditions (red and yellow bars), whereas the reverse is true during foot-active runs, with little difference observed in the passive runs. Subsequent statistical analysis using a three-way repeated measures ANOVA demonstrated a significant two-way interaction between the factors response and effector (F2,26 = 4.1, P < 0.05), which simply confirms the voxel-based analysis.
Activity profiles (Fig. 4B) showed that the site identified on the basis of the group data switched selectivity, depending on which effector was used to respond to the task. That is, the same site showed stronger activation for stimuli containing a hand during the hand-active runs but stronger activation for stimuli displaying a foot during the foot-active runs. To investigate whether this change in selectivity was an unintended effect of the group analysis (see Materials and Methods and also Kolster et al. 2010), we performed an additional analysis in which we defined the main effect of effector separately for hand-active and foot-active runs on single-subject unsmoothed data (SSC 3–4). Using SSC 3 and SSC 4, we defined 2 local maxima for each participant, one for the main effect of hand and one for the main effect of foot. These local maxima were all located within a 6 mm radius of the group local maxima (Fig. 4A and Table 1). Subsequently, the activity profiles were calculated by averaging over the 27 voxels surrounding these individual local maxima separately for all 3 types of runs (hand-active, foot-active, and passive). Despite the close proximity of the local maxima, there was no overlap between the 27 voxels for hand and those for foot within a given subject. The profiles are presented in Figure 4C (optima defined by hand-active runs, SSC 3) and Figure 4D (optima defined by foot-active runs, SSC 4) to confirm the presence of the effects, not their size.
For the local maxima defined purely on the basis of the hand-active runs (Fig. 4C), a three-way ANOVA revealed a significant two-way interaction between response and effector (F2,26 = 22.0, P < 0.001). This interaction may reflect the selection since the individual local maxima were required to be located near the group local maximum for this interaction. Additional contrast analyses within the ANOVA showed that the main effect of effector was present only for the hand-active runs (F1,13 = 56.6, P < 0.001), an effect reflecting the definition of the site, but not for the other 2 types of runs (foot-active: F1,13 = 0.2, P = 0.63; passive: F1,13 < 0.1, P = 0.87). Indeed, the profiles of Figure 4C show high dark blue and light blue bars during hand-active runs but not during the 2 other types of runs. Importantly, the absence of a significant main effect of effector in the foot-active and passive runs was not included in the definition of the site. Likewise, when the local maxima were defined purely on the basis of the foot-active runs (Fig. 4D), we again obtained a significant interaction between response and effector (F2,26 = 6.3, P < 0.01). Here, contrast analyses revealed a significant main effect of effector for the foot-active runs (F1,13 = 53.1, P < 0.001), which was absent in the other 2 runs (hand-active: F1,13 = 2.2, P = 0.17; passive: F1,13 = 0.54, P = 0.48). Again activity profiles (Fig. 4D) show that in the foot-active runs, the red and yellow bars are high compared with the 2 other conditions.
Note that the significant main effects of hand effector in the hand-active runs and of foot effector in the foot-active runs were predictable given the contrast used for localization. The important finding, however, is the absence of any main effect of effector in the passive runs or when the effector used to respond to the task did not match that shown in the stimuli. That is, the site showing a main effect of effector during the hand-active runs did not show this effect during the foot-active runs and vice versa, even though the stimuli presented were exactly the same. Another interesting result is that activation in the passive runs for any of the videos did not significantly exceed baseline activation, indicating that both sites are unresponsive to visually presented action stimuli unless the participant him/herself performs a movement. This effect becomes visible only in the single-subject analysis (compare Fig. 4C,D with Fig. 4B), as the segregation between responsive and unresponsive voxels is blurred in the group analysis (see Materials and Methods).
Figure 4E displays the location of these 2 local maxima in the flatmap, shown separately for each subject. As can be seen from this illustration, we obtained 2 separate local maxima for all subjects, organized in a medial to lateral direction in the hemisphere, that is, running across the FG orthogonally to the collateral sulcus and the occipitotemporal sulcus. However, the local arrangements of the 2 maxima with respect to the group maximum were not consistent across subjects. To investigate whether the lack of a clear organization was related to the normalization process, we created a spherical region of interest (ROI) (5 mm radius) around the 2 local maxima for a single subject and projected these ROIs back into the native space of that subject using the inverse normalization. Figure 4F shows the results of this back projection into the original 3D volume for 3 representative subjects with the local maximum for the hand (SSC 3) shown in yellow and that for the foot (SSC 4) shown in red. These 3 subjects correspond to the green, red, and orange colored symbols displayed in Figure 4E. These subjects were selected because the respective locations of effector-specific effects in the flatmap were opposite: The pattern for the first subject (green, top) was exactly opposite that of the 2 other subjects. In native space, the local organization of the hand and foot local maxima with respect to each other was also different between subjects, even more so than in the flatmap since the patterns were different in all 3 subjects. In the original 3D volume, the representations of hand and foot seemed to rotate around a center (for illustration, see inserts Fig. 4F). This is reminiscent of the MT/V5 cluster in humans, where the spatial relation between the subparts of the cluster remains the same across subjects, but the whole cluster can be rotated up to 90° with respect to neighboring areas (Kolster et al. 2010).
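As an illustration of the ROI construction described above, the following minimal sketch builds a spherical ROI on a voxel grid. It assumes isotropic voxels; the 2 mm voxel size, the center coordinates, and the function name `spherical_roi` are hypothetical choices for illustration and are not taken from the study.

```python
from itertools import product


def spherical_roi(center, radius_mm, voxel_size_mm):
    """Return the set of voxel indices whose centers lie within
    radius_mm of the voxel at `center` (isotropic voxels assumed)."""
    cx, cy, cz = center
    # Largest index offset that can still fall inside the sphere.
    reach = int(radius_mm // voxel_size_mm)
    roi = set()
    for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
        # Compare squared distances in mm to avoid a square root.
        dist_sq_mm = (voxel_size_mm ** 2) * (dx * dx + dy * dy + dz * dz)
        if dist_sq_mm <= radius_mm ** 2:
            roi.add((cx + dx, cy + dy, cz + dz))
    return roi


# Hypothetical local maximum (voxel indices) and an assumed 2 mm voxel size.
roi = spherical_roi(center=(30, 40, 20), radius_mm=5, voxel_size_mm=2)
```

In practice, such a mask would be defined in normalized space and then resampled into each subject's native space with the inverse deformation; that step depends on the imaging software and is not sketched here.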
Taken together, these results suggest that the single site located in the right FG is specifically activated when the effector used to respond and the effector shown in the display are identical, independently of whether the effector is shown moving or static. This single site obtained from the group analysis reflected the smoothing and averaging typical of this analysis, not the underlying neural substrate. Indeed, analyses based on unsmoothed single-subject data showed that 2 sites, though closely neighboring, are highly specific with respect to the effector. That is, one responded only to the observation of a hand when the participants simultaneously used a hand to respond to the task, but did not respond differentially during foot-active or passive runs. Similarly, the other site responded only to the visual display of a foot when participants responded with the foot, but was not differentially activated during hand-active or passive runs. On average, these 2 sites were located at a vector distance of 10 mm (SD ± 2.5 mm) from one another. Because the activity profiles were generated by averaging the activity of the 27 voxels surrounding the local maxima, this analysis signifies a strong selectivity within closely neighboring regions. However, this specificity was not apparent in the group analysis because the localization of the 2 sites was not consistent across subjects. This underscores the need for unsmoothed single-subject analysis when claims are made about identical location, as was implied by our interaction involving the factor response in the SPM group analysis.
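The vector distance reported above can be read as the Euclidean distance between the hand and foot local maxima of each subject, summarized across subjects by its mean and SD. The sketch below shows this computation under that assumption; the coordinates are hypothetical and are not the coordinates obtained in the study.

```python
import math
import statistics


def vector_distance(site_a, site_b):
    """Euclidean distance (mm) between two (x, y, z) coordinates."""
    return math.dist(site_a, site_b)


# Hypothetical (hand, foot) local maxima in mm, one pair per subject.
maxima = [
    ((40, -50, -20), (46, -42, -20)),
    ((38, -48, -18), (44, -56, -18)),
    ((42, -52, -22), (42, -40, -17)),
]

distances = [vector_distance(hand, foot) for hand, foot in maxima]
mean_distance = statistics.mean(distances)
sd_distance = statistics.stdev(distances)  # sample SD across subjects
```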
Three-Way Interaction between Response, Action, and Effector (GC 4, SSC 5–6)
The three-way interaction was defined as a stronger activation for hand action videos compared with foot action videos, both relative to their respective nonbiological motion control videos in the hand-active runs, together with the reverse effect in the foot-active runs. Passive runs were not included in the model. This GC resulted in a single activation site, located in the right posterior middle temporal gyrus (pMTG) shown in Figure 5A (Table 1). The corresponding activity profile averaged over the 27 voxels surrounding the local maximum is displayed in Figure 5B, again to verify the presence and origin of the interaction, not its size. As expected, the difference between the dark blue and light blue bars is greater than that between the red and yellow bars during hand-active runs, whereas during the foot-active runs, the difference between the red and the yellow bars exceeds that between the blue bars. Statistical analysis of the profile with a repeated measures ANOVA showed a significant main effect of action (F1,13 = 2.8, P < 0.001) and a trend toward a significant three-way interaction (F2,26 = 3.1, P = 0.06). Thus, the three-way interaction site, unlike the other interaction sites, was also part of the action observation network (see also Fig. 6).
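The verbal definition of the three-way interaction can be written as a set of contrast weights over the 4 stimulus conditions in the hand-active and foot-active runs. The sketch below is an illustrative reconstruction, not the design matrix used in the SPM analysis; the condition ordering, the names, and the signal values are hypothetical.

```python
# Assumed condition order within each run type (illustrative only).
CONDITIONS = ("hand_action", "hand_control", "foot_action", "foot_control")

# Hand-active runs: (hand action - hand control) - (foot action - foot control).
HAND_ACTIVE_WEIGHTS = (1, -1, -1, 1)
# Foot-active runs: the reverse effect.
FOOT_ACTIVE_WEIGHTS = tuple(-w for w in HAND_ACTIVE_WEIGHTS)


def three_way_contrast(hand_active, foot_active):
    """Evaluate the interaction contrast from two dicts that map each
    condition name to its mean signal change in the given run type."""
    total = 0.0
    for cond, w_hand, w_foot in zip(
        CONDITIONS, HAND_ACTIVE_WEIGHTS, FOOT_ACTIVE_WEIGHTS
    ):
        total += w_hand * hand_active[cond] + w_foot * foot_active[cond]
    return total


# Hypothetical % MR signal changes showing an effector-matched enhancement.
hand_runs = {"hand_action": 1.0, "hand_control": 0.25,
             "foot_action": 0.5, "foot_control": 0.25}
foot_runs = {"hand_action": 0.5, "hand_control": 0.25,
             "foot_action": 1.0, "foot_control": 0.25}

interaction = three_way_contrast(hand_runs, foot_runs)
```

A positive value indicates that the action-specific response is larger for the effector that matches the responding effector; a profile that is identical in both run types yields zero.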
In a manner analogous to the procedure used for the two-way interaction between effector and response, we computed the interaction between the visual factors action and effector, masked with the main effect of action, doing so separately for hand-active and foot-active runs on unsmoothed individual subject data (SSC 5–6). On this basis, we defined corresponding local maxima individually for each subject in the vicinity of the group local maximum of GC 4 (Table 1). As was the case with the site in the FG, we obtained separate local maxima for the interactions in the hand-active (SSC 5) and foot-active runs (SSC 6) in each individual subject (Fig. 5E). These local maxima were located within a 12 mm radius of the group local maximum in pMTG. The % MR signal changes were calculated by averaging over the 27 voxels surrounding the local maxima for the hand (SSC 5) and foot (SSC 6). Again, there was little overlap between the 27 voxels for hand and foot: Overlap was present in only 3 of the 14 subjects (3, 4, and 6 voxel overlap, respectively). The profiles are given in Figure 5C,D, again solely for verification of the presence and origin of the interaction.
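The voxel overlap between the hand and foot sites can be counted as sketched below. The sketch assumes that the 27 voxels form a 3 × 3 × 3 cube centered on each local maximum, which is one reading of the description above; the function names and the voxel coordinates are hypothetical.

```python
from itertools import product


def cube27(center):
    """Return the 27 voxel indices of the 3 x 3 x 3 cube centered on
    `center` (assumed neighborhood around a local maximum)."""
    cx, cy, cz = center
    return {
        (cx + dx, cy + dy, cz + dz)
        for dx, dy, dz in product((-1, 0, 1), repeat=3)
    }


def voxel_overlap(hand_max, foot_max):
    """Number of voxels shared by the cubes around the two maxima."""
    return len(cube27(hand_max) & cube27(foot_max))


# Hypothetical maxima (voxel indices) that are 2 voxels apart along x.
shared = voxel_overlap((30, 40, 20), (32, 41, 21))
```

Under this assumption, the two cubes share voxels only when the maxima are at most 2 voxels apart along every axis, so the sparse overlap reported above is consistent with spatially separate sites.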
For the site defined exclusively by the hand-active runs (Fig. 5C), we obtained a significant three-way interaction (F2,26 = 9.1, P < 0.01). Subsequent contrast analysis resulted in a significant interaction between action and effector (the difference between blue bars was greater than that between the red and yellow bars) that was present only for the hand-active runs (F1,13 = 35.1, P < 0.001), reflecting the definition of the maxima, but this interaction was not observed for the 2 other types of runs (foot-active: F1,13 = 3.2, P = 0.09; passive: F1,13 = 0.6, P = 0.44). The absences of interactions in the latter 2 types of runs were not predicted by the selection of the local maxima. Further post hoc analyses demonstrated that when the subjects responded with the hand, the response of this site to the hand action observation exceeded that in the other 3 conditions significantly (foot action: P < 0.01, hand control: P < 0.001, and foot control: P < 0.01).
Similar results were observed for local maxima defined exclusively on the basis of the foot-active runs (Fig. 5D). In this case, the opposite interaction, a difference between red and yellow bars that was larger than that between the dark blue and light blue bars, appeared, but only in the foot-active runs. Correspondingly, in the ANOVA, we obtained a significant three-way interaction (F2,26 = 5.7, P < 0.01), and subsequent contrast analysis revealed that here the interaction was significant only for the foot-active runs (F1,13 = 40.5, P < 0.001), the ones used to define the sites, but not for the other 2 types of runs (hand-active: F1,13 = 2.3, P = 0.13; passive: F1,13 = 0.5, P = 0.50). Again, the response of this foot-specific site to the observation of foot actions when the subjects responded by foot significantly exceeded the responses in the 3 other conditions (hand action: P < 0.001, hand control: P < 0.001, and foot control: P < 0.001). As for the sites in the FG, the respective interactions between the action and the effector for hand- or foot-active runs were predicted by the model, but the absence of interactions for the other 2 types of runs was not.
Taken together, the results of the three-way interaction reveal that even though the pMTG region is generally involved in action observation, its group response to the observation of an action carried out with a given effector is specifically enhanced when the same effector is used for a motor response. Moreover, 2 anatomically distinct sites (Fig. 5E), neighboring each other in all individual subjects (vector distance: 15 mm, SD ± 7 mm), showed this effect exclusively for hand actions and foot actions, respectively. The local maxima of individual subjects were organized in the rostrocaudal direction coursing through the pMTG. As in the fusiform cortex, there was a considerable variation across subjects in the relative location of the hand- and foot-specific sites, and pairs of subjects could display opposite patterns (e.g., white and dark blue symbols or light pink and light purple symbols). Although there was some tendency of hand sites (squares) to be located more rostrally than foot sites (circles), clear exceptions were notable, and both types of sites were intermingled at the level of the group activation, explaining the lack of segregation at the group level.
Our results show that the interaction between motor preparation/execution and action observation results in 3 qualitatively different effects, each found in a specific anatomical location: PPC, FG, and pMTG (Fig. 6). We will first discuss these effects observed in these 3 distinct sites, followed by a general interpretation of the function of these interaction effects.
Bilateral Posterior Parietal Cortex
Activation in PPC indicated that this region was recruited bilaterally during action observation, only when the observer him/herself was performing a movement. Electrophysiological studies in monkeys and brain imaging studies in humans have implicated the PPC in the integration of visual and motor information during self-generated movements (Taira et al. 1990; Grafton et al. 1992; Sakata et al. 1995; Snyder et al. 1997; Fink et al. 1999). In humans, activation in PPC increases when the executed and observed hand movements do not match (Fink et al. 1999; Stanley and Miall 2007), and repetitive transcranial magnetic stimulation impairs judgment of the relative timing between the subject's own finger movement and a visually displayed movement (MacDonald and Paus 2003). These findings have led to the proposal that PPC actively combines the motor efference copy with information from lower level visual areas in order to produce an accurate predictive model of hand position (Stanley and Miall 2007).
Our study differed from those previously performed, in that it was not primarily designed to investigate responses to matches or mismatches between movement observation and movement execution. In our experiment, the executed movement was decoupled from the observed movement since both differed grossly with regard to type of action (i.e., pressing a button vs. manipulating an object) and temporal characteristics. Nevertheless, our finding that the PPC sites showed stronger activation for actions compared with controls only during active runs is compatible with the idea that this site combines visual and central (efference copy) signals. In the absence of central signals, during the passive runs, this site was not differentially activated. Yet, if one assumes that the PPC combines signals from both modalities, one might expect this site to be more specific with respect to the effector used for the movement. Surprisingly, the PPC was the only site where we did not obtain such specificity. It is conceivable, however, that the underlying neural populations are so closely intermingled that they cannot be discriminated given standard fMRI resolution.
Fusiform Gyrus
Two neighboring sites near the fusiform body area (FBA, Peelen and Downing 2005b; Schwarzlose et al. 2005) showed selective responses to hand or foot stimuli, respectively, in the unsmoothed individual subject data. This specificity became apparent when the effector used to respond in the task matched that displayed in the videos and was independent of whether the stimulus was an action or a control. Orlov et al. (2010) reported a large-scale topographic representation of the human body in OTC. By applying phase-encoding techniques, separately presenting images of upper and lower faces, arms, torsos, and legs, they demonstrated a preference for individual body parts in specific regions of OTC extending far beyond the classical extrastriate (EBA, Downing et al. 2001) and fusiform body areas (FBA, Peelen and Downing 2005b; Schwarzlose et al. 2005). In their study, regions in the vicinity of the FBA showed a preference for the upper and lower parts of faces rather than hands and feet as in our study. However, the 2 studies are difficult to compare, in that the preference for a specific effector in the FG in the present study was observed only in active runs, when the effector presented and the effector used for the response were identical. During the passive runs of the present study, which are those most closely matching the study of Orlov et al. (2010), no effector preference was observed in the sites, which failed even to respond significantly above baseline. This finding indicates additional recruitment of processing capacities, not involved in the purely visual analysis of body parts.
The absence of an effect in passive runs also indicates that the 2 fusiform activation sites cannot be located within the core of the FBA since by definition, FBA responds to the visual presentation of human body parts during passive observation. This interpretation is in line with the results of Kontaris et al. (2009) showing that the FBA itself does not show increased fMRI activation for concurrent observation and execution of hand movements compared with pure observation alone. As shown in Figure 6, the fusiform group interaction site is indeed located rostral to the FBA and might represent a body part–specific and acting-dependent extension of the FBA.
Posterior Middle Temporal Gyrus
The second site showing an effector-based organization was located in the pMTG. In keeping with previous studies, the pMTG showed stronger activation for actions compared with control conditions (Buccino et al. 2001; Wheaton et al. 2004; Pelphrey et al. 2005; Thompson et al. 2007; Jastorff et al. 2010), and in contrast to the FG, this pattern was also present in the unsmoothed single-subject analysis of the passive runs. Interestingly, however, this general preference for action stimuli was modified during active runs. In unsmoothed individual data, neighboring sites in the pMTG showed significant interaction between effector and action, and the sign of this interaction depended on the effector used to respond to the task. Previous results investigating somatotopic organizations within the pSTS/MTG have been inconsistent. For example, Pelphrey et al. (2005) reported that different subregions of pSTS were involved in the visual processing of hand, eye, and mouth movements. Similar results were obtained by Wheaton et al. (2004). However, several other studies found no evidence for body part–specific processing (Buccino et al. 2001; Thompson et al. 2007; Jastorff et al. 2010). Our results indicate that certain parts of pMTG, which are not overtly somatotopically organized (as shown in the passive runs), show a preference for the processing of a specific effector when this effector is used for a motor response. Investigating effector specificity exclusively using the passive runs did not result in significant activations in the pSTS. However, this negative result does not necessarily imply that other regions within the STS are not sensitive to specific effectors. Indeed, our paradigm may not be sufficiently powerful to detect somatotopic organization in these STS regions in the absence of motor execution/preparation.
It is noteworthy that the 2 group sites having effector-based organization (FG and pMTG) showed considerable individual variation in the relative location of the hand and foot sites. This variability could not be ascribed to the normalization process involved in warping to a template, as it was also observed in native space. Interestingly, the 2 effector-specific individual sites seemed to rotate around a central point from subject to subject. This is reminiscent of the rotation of the MT/V5 cluster en bloc with respect to nearby retinotopic regions in individual subjects (Kolster et al. 2010). These rotations were limited to 90°, but wider rotations can be observed in other parts of individual retinotopic maps (H Kolster, R Peeters, GA Orban, unpublished data). This suggests that individual variations in the cortical surface layout are not limited to mere deformations.
Modulation of activity in OTC with respect to our own movements in the absence of visual stimulation has been demonstrated previously (Astafiev et al. 2004; Peelen and Downing 2005a; Dinstein et al. 2008; Kuhn et al. 2010; Orlov et al. 2010). In these studies, mapping of the visual areas had been performed in separate runs, using stimuli unrelated to the movements subjects performed in subsequent runs, which indicated a rather automatic influence arising from our own movements upon the activity in OTC. Also, our study differed from classical studies investigating the interaction between action and perception in that the subject's response and the observed action only matched in terms of effector, but not in the exact action or timing. Nevertheless, our study has revealed effector-specific effects in OTC, suggesting that our own movements influence activation in visual cortex automatically, possibly rendering the areas more sensitive to the processing of a given effector. As we investigated only this nongeneric case, it is conceivable that we missed interactions that would depend on the exact match between observed and executed action. Furthermore, given our block design, we cannot disentangle effects resulting from the preparation or the execution of the action. Again we may have missed sites that require these more specific instances of interaction that occur only during given planning stages or execution of the actions.
Functional Roles of the Different Interaction Sites
Why would motor preparation/execution modulate activation, and why in these 3 locations in particular? One possibility could be that these sites constitute higher order visual processing modules, located just before, or within, the stage where visual action–related information and motor information concerning movements are combined. Since there is growing evidence that visual action–related signals are processed in multiple streams, one would expect multiple interaction sites. It has indeed been proposed that 2 separate streams are involved in the transport and the grasp component of reach-to-grasp actions (Arbib 1981; Jeannerod 1981). Rizzolatti and Matelli (2003) distinguished a dorsomedial pathway, encompassing the superior parietal lobule and dorsal premotor cortex, involved mainly in the control of reaching and the representation of the visual periphery, as well as a dorsolateral pathway, involving the inferior parietal lobule and the ventral premotor cortex, responsible for processing object properties in preparation for grasping.
In the monkey, a possible entry point for motor information into the dorsomedial pathway might be area V6A, an area that, in contrast to the original division between reaching and grasping, also contains neurons responsive to specific grasps (Fattori et al. 2010). Galletti et al. (2003) proposed that area V6A might be involved in rapid online control of one's own actions by matching the programmed movement, signaled by intrinsic activity or efference copies of the motor command arising from premotor cortex, with the executed movement, signaled by visual and somatosensory inputs. In the human, an area with properties similar to V6A, termed anterior superior parietooccipital cortex, has recently been described by Cavina-Pratesi et al. (2010). Our PPC activation sites do not seem to correspond directly to this area but are nonetheless located in the same vicinity, as is the case with POIPS (Fig. 6) and human V6 (Pitzalis et al. 2006). Since V6A in monkeys consists of 2 parts (Luppino et al. 2005), our PPC sites might correspond to one of these and thus might play a role in the online control of our own movements. Note, however, that the homologies between human and monkey parietal cortical regions discussed here are speculative and need to be verified in future experiments testing humans and nonhuman primates in parallel, using identical experimental designs.
In the monkey, visual information about actions reaches areas AIP and PFG, related to the dorsolateral pathway, via connections with the lower and the upper bank of the STS, respectively (Rozzi et al. 2006; Borra et al. 2008). Jastorff and Orban (2009) proposed that in humans, the lower bank of monkey STS might correspond to posterior inferior temporal gyrus and the FG, and the upper bank to pMTG/STS, a proposition that is further supported by our recent monkey imaging results (Jastorff et al. forthcoming). Effector-specific modulation due to concurrent visual stimulation and motor preparation/execution in these 2 regions might enhance the processing of others' actions for a more precise analysis of body postures (FG) and body movements (pMTG). Although further parallel imaging work is needed, it may prove to be the case that in humans, these occipitotemporal interaction regions provide signals that could be used, in contrast to signals from parietal cortex, for imitation by the motor system or for communication and interpretation by the social brain.
For the first time, we have observed effector-specific processing in the FG and the pMTG that became apparent exclusively during combined action observation and action preparation/execution. In the absence of action preparation/execution, the respective regions did not respond selectively. This suggests a flexible mechanism by which central signals tune neuronal populations within OTC. However, discovering this mechanism at the resolution of fMRI required the analysis of unsmoothed single-subject data. Future research, investigating the relationship between the observed and executed action, as well as their relative timing, is necessary to further understand this mechanism and the specificity of this tuning process.
This work was supported by Fonds Wetenschappelijk Onderzoek (FWO; G 0730.09), Inter-University Pole of Attraction (IUAP 6/29), and Excellentie Financiering (EF 05/014). J.J. is a postdoctoral fellow of the FWO.
The authors are indebted to M De Paep, W Depuydt, P Kayenbergh, G Meulemans, and S Verstraeten for technical support and to S Raiguel for comments on an earlier version. Conflict of Interest: None declared.