When we observe actions, we activate parietal and premotor areas that are also recruited when we perform actions ourselves. It has been suggested that this action mirroring is causally involved in the process of action understanding. Alternatively, it might reflect the outcome of action understanding, with the underlying cognitive processes taking place elsewhere. To identify and characterize areas involved in action understanding, we presented participants with point-light displays depicting human actions and engaged them in tasks that required identifying the effector (arm/leg) or the goal of an action. We observed a stronger blood oxygen level-dependent signal during the Goal in comparison to the Effector Task not only in premotor areas, but also in the middle temporal gyrus (MTG) and the anterior ventrolateral prefrontal cortex. In the MTG, the Goal Task led to a signal higher than the Effector Task only when actions were easy to understand, whereas frontal areas showed this difference also when the task was difficult, a finding that is not caused by a ceiling effect. Our results suggest an interplay between temporal and frontal areas that is modulated by task difficulty and thus provide important constraints for biologically plausible models of action understanding.
Successful interactions with our environment depend on our ability to interpret and understand actions performed by others. Motor theories of action understanding suggest that this ability crucially depends on the recruitment of the same structures that are involved when we perform the same actions ourselves. According to this view, we understand actions by means of a direct matching mechanism between observed actions and actions that are stored in the observer's motor system (Rizzolatti and Craighero 2004; Rizzolatti and Sinigaglia 2010). An alternative account that we will refer to as nonmotor theories of action understanding assumes that activation in the motor system during action observation might be triggered by the outcome of action understanding and that cognitive processing that underlies action understanding takes place in structures outside the motor system (Csibra 2008; Mahon and Caramazza 2008; Hickok 2009; Lingnau et al. 2009).
Neurophysiological studies have provided evidence that is compatible with motor theories of action understanding. In particular, a subset of neurons in macaque inferior premotor cortex has been reported to be activated during both the observation and execution of specific actions (di Pellegrino et al. 1992; Gallese et al. 1996; Rizzolatti et al. 1996). Some of these neurons are sensitive to the type of action (e.g. grasping an object), irrespective of the way these actions are performed (e.g. using normal or reversed pliers; Umiltà et al. 2008). Moreover, these neurons respond during hand-object manipulations even if the final part of the action is hidden from the monkey and thus can only be inferred (Umiltà et al. 2001). In humans, the inferior parietal lobe (IPL) and the inferior frontal gyrus (IFG) have been suggested to form the putative human mirror system (Buccino et al. 2001; Rizzolatti and Craighero 2004), but it is debated whether these areas contain neuronal populations with properties similar to those reported in macaques (Dinstein et al. 2007; Chong et al. 2008; Dinstein, Gardner et al. 2008, Dinstein, Thomas et al. 2008; Hickok 2009; Kilner et al. 2009; Lingnau et al. 2009; Mukamel et al. 2010; Oosterhof et al. 2011, 2012).
Whereas both motor and nonmotor theories of action understanding are compatible with an involvement of premotor regions in tasks requiring action understanding, only motor theories of action understanding predict that an impairment of the motor system affects the ability to understand actions. Studies testing this prediction have produced mixed results. In line with motor theories of action understanding, several studies report that performance in tasks requiring gesture production and recognition is correlated in groups of patients suffering from a lesion in the left hemisphere (Buxbaum et al. 2005; Pazzaglia et al. 2008). However, whereas such correlations can be found at the group level, there are important exceptions from these patterns. As an example, Negri et al. (2007) described several patients who were impaired for object use but normal for action recognition. Moreover, they reported 1 patient (P.T.) with impaired imitation of both pantomimes and intransitive actions, but normal performance in pantomime recognition and object use (see also Mahon and Caramazza 2005). Likewise, Papeo et al. (2010) demonstrated double dissociations between action naming and imitation.
Motor and nonmotor theories of action understanding differ in 1 additional aspect. Whereas motor theories assume that the motor system contains high-level representations of action goals that need to be recruited to understand actions, nonmotor theories assume that such high-level representations can be found outside the motor system. This comparison is thus an example of forward inference (Henson 2006) since nonmotor theories, but not motor theories of action understanding, predict the involvement of areas outside the motor system.
Studies testing this prediction have produced conflicting results. Hamilton and Grafton (2006) reported that the anterior intraparietal sulcus (aIPS) is sensitive to the type of object being grasped, but not to the spatial location of the object, and concluded that aIPS represents the goal of an action. Using state-dependent transcranial magnetic stimulation (TMS), Cattaneo et al. (2010) reported that ventral premotor cortex (PMv) and IPL are sensitive to the type of action (push and pull) irrespective of the effector (hand and foot), whereas the superior temporal sulcus (STS) was sensitive to both manipulations, suggesting a more abstract representation in PMv and IPL in comparison to STS. In contrast, neurons in STS generalize across a variety of manipulations such as viewing angle and movement velocity (Perrett et al. 1989, 1990), indicating that they might represent the goal of an action. In line with this view, Lestou et al. (2008) reported that STS is sensitive to differences in the type of action, irrespective of the kinematics, whereas PMv is sensitive both to movement kinematics and to the type of action. However, Tranel et al. (2003) reported that retrieval of action knowledge is impaired not only by lesions in the left premotor/prefrontal cortex and parietal cortex, but also by lesions in the left posterior middle temporal region.
A problem that is common to many previous studies is that the tasks can often be performed without having to understand the actions, for example, on the basis of judgments of the spatial configuration of stimuli (Urgesi et al. 2006, 2007) or on the basis of the distinction between correct and incorrect actions (Pazzaglia et al. 2008). Such differences might simply be due to familiarity or salience and thus do not necessarily reflect the process of action understanding (see also Kalenine et al. 2010).
In the current study, we aimed to identify and characterize the network of areas that are involved when participants have to understand the goal of an action, in comparison to identifying the effector of an action, while keeping visual stimulation identical, thereby preventing confounding factors such as familiarity or salience. According to motor theories of action understanding, we should find that understanding the goal versus the effector of an action reveals areas within parietal and premotor cortices, in particular, the IPL and the PMv (Rizzolatti et al. 2001; Rizzolatti and Sinigaglia 2010). In contrast, if activation within parietal and premotor regions during action understanding reflects the outcome rather than the process underlying action understanding, we expect to find differences between the Goal and the Effector Tasks also outside the motor system. To test our predictions, we presented participants with point-light displays depicting human actions (throwing a ball, punching someone, kicking a ball, and kicking someone) while engaging them in 3 tasks requiring different levels of action understanding (Fig. 1A). During the Red Dot Task, participants had to indicate whether or not 1 of the points forming the actions had briefly turned red. During the Effector Task, participants had to indicate whether the relevant effector was the arm (as in throwing a ball) or the leg (as in kicking a ball). During the Goal Task, participants had to indicate whether the point-light display depicted an action involving a ball (as in throwing a ball) or not (as in punching someone). To be able to examine whether areas revealed by these tasks are sensitive to task difficulty, we parametrically varied this factor by applying different levels of spatial noise to the point-light displays (Fig. 1B).
Our paradigm revealed a network of areas that showed a stronger blood oxygen level-dependent (BOLD) response during the Goal Task in comparison to the Effector Task. These areas included, but were not restricted to, premotor areas. In particular, our results suggest an important role of the middle temporal gyrus (MTG) and the anterior ventrolateral prefrontal cortex (aVLPFC) in action understanding. These results are compatible with nonmotor, but not with motor, theories of action understanding and thus have important implications for the assertion that higher level cognitive processes are mediated by a direct, not cognitively mediated, simulation within the motor system.
Materials and Methods
Seventeen participants (9 males and 8 females, mean age 28.9 years) participated in this experiment. Vision was normal or corrected to normal using magnetic resonance (MR)-compatible glasses. All participants were neurologically intact and gave written informed consent for their participation. The experimental procedures were approved by the Ethics Committee for research involving human subjects at the University of Trento.
Procedure and Visual Stimulation
Stimuli consisted of point-light displays recorded from 2 different actors (1 male and 1 female) using a motion-capture device (Qualisys) at a sampling frequency of 250 Hz outside the MR laboratory. We used 13 reflective markers attached to the head, shoulders, elbows, hands, hips, knees, and feet. Four different types of human actions (throwing/kicking a ball and punching/kicking someone; Fig. 1A) were used. Ten different exemplars of each action were produced by each actor. To make sure that actors performed the different exemplars of each action with similar timing and to make sure that the actions were performed in a comparable way across the 2 actors, we provided the actors with a prototype video of each action via a laptop in front of them during stimulus generation. Coordinates of each point-light stimulus were reconstructed using software available from Qualisys. For each stimulus, 5 different viewing angles (lateral to the left/right, half-frontal to the left/right, and frontal) were reconstructed. Each point-light display was downsampled to 60 Hz and cut to a length of 1.5 s using software written in MATLAB.
Difficulty of action understanding was manipulated by either using the original trajectories (low noise level) or rotating the trajectories of 6 (medium noise level) or 12 (high noise level) of the 13 markers by 90°, 180°, or 270° (Saygin et al. 2004). This rotation disrupts local form information while keeping the overall physical stimulation intact. The advantage of this procedure, in contrast to paradigms used in previous studies, is that it allows quantification of performance in action understanding.
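The scrambling manipulation can be sketched as follows. This is an illustrative reconstruction, not the authors' code, and it assumes that each selected marker's trajectory is rotated about that marker's own mean position (the exact pivot point is not specified above):

```python
import random

def scramble_markers(trajectories, n_scrambled, seed=None):
    """Rotate the 2D trajectories of n_scrambled randomly chosen markers
    by 90, 180, or 270 degrees about each marker's own mean position.
    trajectories: one list of (x, y) frames per marker."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(trajectories)), n_scrambled)
    out = [list(traj) for traj in trajectories]
    for m in chosen:
        traj = trajectories[m]
        cx = sum(p[0] for p in traj) / len(traj)  # marker's mean position
        cy = sum(p[1] for p in traj) / len(traj)
        k = rng.choice([1, 2, 3])  # number of quarter turns: 90/180/270 deg
        rotated = []
        for x, y in traj:
            dx, dy = x - cx, y - cy
            for _ in range(k):       # one counterclockwise quarter turn
                dx, dy = -dy, dx
            rotated.append((cx + dx, cy + dy))
        out[m] = rotated
    return out
```

Because the rotation is applied about the marker's mean position, each dot keeps the same distances from its trajectory center, which preserves the overall physical stimulation while destroying the local form cues.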
In each trial, participants were presented with a 1.5 s point-light display depicting 1 of the actions. At the same time, they were asked to perform tasks that require different levels of action understanding (Fig. 1A). During the Red Dot Task, participants were asked to indicate whether or not 1 of the 13 markers briefly turned red. During the Effector Task, they were asked to judge whether the action mainly involved the arm (throwing a ball/punching someone) or the leg (kicking a ball/kicking someone). The Goal Task required participants to indicate whether the action involved a ball (throwing/kicking a ball) or not (punching/kicking someone).
Stimuli were back-projected onto a screen by a liquid crystal projector at a frame rate of 60 Hz and a screen resolution of 1024 × 768 pixels (mean luminance, 109 cd/m²). Participants viewed the stimuli binocularly through a mirror above the head coil. The screen was visible as a rectangular aperture of 17.8° × 13.4°.
The size of each point-light display was normalized to appear on the screen with a width and height of 5.25°. The size of each single marker was 0.14°.
In half of all trials, 1 of the markers (randomly selected) turned red for 4 consecutive frames (66.7 ms), whereas in the other half of the trials, all markers were white throughout the entire trial. The duration of the red dot was chosen on the basis of behavioral pilot data to make sure that the Red Dot Task was demanding. To prevent missing the red marker when it was presented very close in time to the onset or the offset of the point-light display, it always appeared within a time window of 500–1000 ms after the onset of the trial.
Instructions and Training
Before the experiment, participants were given written instructions explaining their task, followed by several practice blocks until the participant was able to perform the tasks properly. For the Red Dot Task, participants had to indicate by button press if 1 of the markers had briefly turned red (left: yes and right: no). During the Effector Task, participants had to indicate whether the action mainly involved the arm (left button) or the leg (right button). During the Goal Task, they were instructed to indicate whether the action involved the use of a ball (left: yes and right: no).
The experimental design is illustrated in Figure 1C. We used a mixed design, with task blocked (15 s on, followed by 12 s rest). Within each block, noise level was varied from trial to trial, with each noise level occurring 2 times per block, leading to 6 trials per block, for a total of 24 repetitions of each noise level per scanning run. The type of action was assigned randomly, with each type of action occurring 24 times within each scanning run.
Each task block started with a brief (3 s) written instruction of the upcoming task, followed by 6 trials, lasting 1.5 s each. Each single trial was followed by a 1 s inter-trial-interval during which a green fixation cross was shown. Participants were instructed to respond while the fixation cross was present.
Within each scanning run, no point-light display was shown more than once, with type of actor and viewing angle assigned randomly. The order of blocks followed a regular sequence (e.g. ABC), with each participant having 2 different sequences in odd and even runs (ABC vs. BCA for participants 1, 2, 7, 8, 13, and 14; CAB vs. ACB in participants 3, 4, 9, 10, 15, and 16; BAC vs. CBA in participants 5, 6, 11, 12, and 17).
In total, we created 4 (action type) × 2 (actor) × 10 (exemplars of each action type per actor) × 5 (viewing angle) × 3 (noise level) = 1200 different point-light displays.
Altogether, there were 36 conditions (4 types of actions × 3 tasks × 3 noise levels). However, for data analysis, we aimed to average across the type of action, thus leaving 9 conditions (3 tasks × 3 noise levels), with 64 repetitions per condition per participant. To identify regions of interest (ROIs), we computed a contrast comparing the Effector and Goal Tasks versus the Red Dot Task (see Definition of ROIs). For statistical analysis of the effect of task and noise level, we therefore left out the Red Dot Task, using 6 conditions (2 tasks × 3 noise levels; see Statistical Analysis).
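The trial bookkeeping implied by this design can be made explicit in a few lines (all numbers are taken from the description above; the 12 task blocks per run follow from the stated 24 repetitions of each noise level per run at 2 per block):

```python
# Design arithmetic for the mixed block design described above.
tasks = 3                 # Red Dot, Effector, Goal
noise_levels = 3          # low, medium, high
runs = 8
trials_per_block = 6      # each noise level occurs twice per block
blocks_per_run = 12       # 24 reps of each noise level / 2 per block

reps_per_noise_per_run = blocks_per_run * 2          # 24, as stated
blocks_per_task_per_run = blocks_per_run // tasks    # 4 blocks per task
# Repetitions per (task x noise level) condition across the session:
reps_per_condition = blocks_per_task_per_run * 2 * runs  # 64
```

This reproduces the 64 repetitions per condition per participant stated above.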
We acquired functional magnetic resonance imaging (fMRI) data using a 4 T Bruker MedSpec Biospin MR scanner and an 8-channel birdcage head coil. Functional images were acquired with a T2*-weighted gradient-recalled echo-planar imaging (EPI) sequence with fat suppression. Before each functional scan, we performed an additional scan to measure the point-spread function (PSF) of the acquired sequence, which serves for correction of the distortion expected with high-field imaging (Zaitsev et al. 2004).
We used 31 slices, acquired in the ascending interleaved order, slightly tilted to run parallel to the calcarine sulcus [repetition time (TR), 2250 ms; voxel resolution, 3 × 3 × 3 mm; echo time, 33 ms; flip angle (FA), 76°; field of view (FOV), 192 × 192 mm; gap size, 0.45 mm]. Each participant completed 8 scanning runs of 168 volumes each.
To be able to coregister the low-resolution functional images to a high-resolution anatomical scan, we acquired a T1-weighted anatomical scan (magnetization-prepared rapid acquisition gradient echo; voxel resolution, 1 × 1 × 1 mm; FOV, 256 × 224 mm; generalized autocalibrating partially parallel acquisitions with an acceleration factor of 2; TR, 2700 ms; inversion time, 1020 ms; FA, 7°).
Behavioral performance in the 3 tasks was analyzed using d′ (Macmillan and Creelman 1991), which corrects for response bias. Data analysis of anatomical and functional data, including cortex segmentation and inflation, was performed using BrainVoyager QX 2.2 (BrainInnovation) in combination with the BVQX Toolbox and custom software written in Matlab (MathWorks).
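For readers unfamiliar with the measure, d′ contrasts the z-transformed hit and false-alarm rates, so a bias toward one response inflates both rates and largely cancels out. A minimal stdlib sketch follows; the 1/(2N) adjustment for extreme rates is a common convention assumed here, not a detail taken from the study:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).
    Rates of exactly 0 or 1 are nudged by 1/(2N) so that the inverse
    normal CDF stays finite (a common, assumed convention)."""
    z = NormalDist().inv_cdf

    def rate(k, n):
        r = k / n
        return min(max(r, 1 / (2 * n)), 1 - 1 / (2 * n))

    hr = rate(hits, hits + misses)
    far = rate(false_alarms, false_alarms + correct_rejections)
    return z(hr) - z(far)
```

For example, 45 hits and 5 false alarms out of 50 signal and 50 noise trials give d′ ≈ 2.56, whereas guessing (equal hit and false-alarm rates) gives d′ = 0.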
To correct for distortions in geometry and intensity in the echo-planar images, we applied distortion correction on the basis of the PSF data acquired before each EPI scan (Zeng and Constable 2002). Before additional analysis, we removed the first 4 volumes to avoid T1 saturation. We aligned the first volume of the first run, which was closest in time to the acquisition of the anatomical scan, to the high-resolution anatomy (9 parameters). Next, we performed 3D motion correction with trilinear interpolation using the first volume of the first run of each participant as reference, followed by slice timing correction with ascending interleaved order. Functional data were temporally high-pass filtered using a cutoff frequency of 3 cycles per run. Furthermore, we applied spatial smoothing with a Gaussian kernel of 8 mm full-width at half-maximum. For group analysis, both functional and anatomical data were transformed into Talairach space using trilinear interpolation.
Within each hemisphere, the border between gray and white matter was segmented and reconstructed. The resulting surfaces were then smoothed and inflated.
Definition of ROIs
To identify ROIs, we aimed to choose a contrast that allows identifying areas that are involved in action understanding in contrast to passive observation of biological motion, without biasing our data toward a difference between a task that requires a higher level of action understanding (i.e. determining if the action involved a ball or not) and a lower level task that can be carried out on the basis of lower level aspects of the action (i.e. the type of the effector). To this aim, we computed a random-effects (RFX) general linear model (GLM) including the factors Task (Red Dot, Effector, and Goal) and Noise Level (low, medium, and high). We also included the first derivative of each predictor time course to be able to model shifts of the hemodynamic impulse response function. Furthermore, we included 6 parameters resulting from 3D motion correction (x, y, z translation and rotation) in the model. Each predictor time course was convolved with a dual-gamma hemodynamic impulse response function (Friston et al. 1998). The resulting reference time courses were used to fit the signal time course of each voxel.
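Illustratively, the dual-gamma response of Friston et al. (1998) models the BOLD impulse response as a gamma-shaped peak minus a smaller, later gamma accounting for the post-stimulus undershoot. The sketch below uses the widely cited SPM default parameters, which is an assumption on our part, since the analysis above was run in BrainVoyager:

```python
import math

def dual_gamma_hrf(t, a1=6.0, a2=16.0, b1=1.0, b2=1.0, c=1 / 6):
    """Dual-gamma hemodynamic impulse response at time t (seconds):
    a gamma-shaped peak minus a scaled, later gamma for the undershoot.
    Parameter values are assumed (commonly used SPM defaults)."""
    if t <= 0:
        return 0.0
    g = lambda t, a, b: (b ** a) * t ** (a - 1) * math.exp(-b * t) / math.gamma(a)
    return g(t, a1, b1) - c * g(t, a2, b2)

def convolve_boxcar(onsets, duration, hrf_len=30.0, dt=0.1, total=120.0):
    """Convolve a boxcar predictor (1 during each event, 0 elsewhere)
    with the HRF to obtain a reference time course."""
    n = int(total / dt)
    box = [0.0] * n
    for on in onsets:
        for i in range(int(on / dt), min(n, int((on + duration) / dt))):
            box[i] = 1.0
    kernel = [dual_gamma_hrf(i * dt) * dt for i in range(int(hrf_len / dt))]
    return [sum(box[i - j] * kernel[j]
                for j in range(min(i + 1, len(kernel)))) for i in range(n)]
```

Convolving a 1.5 s boxcar at each trial onset with this kernel, as sketched here, yields the kind of reference time course that is then fitted to the signal time course of each voxel.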
To identify ROIs, we computed the contrast [Effector Task, Goal Task] > Red Dot Task. The resulting statistical map was thresholded using a false-discovery rate (FDR) < 0.01 in combination with a cluster threshold of 4 contiguous voxels.
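Such an FDR threshold can be obtained with the standard Benjamini-Hochberg step-up rule, sketched below for illustration (the actual thresholding above was performed in BrainVoyager):

```python
def fdr_threshold(pvals, q=0.01):
    """Benjamini-Hochberg step-up procedure: return the largest p-value
    that can serve as a threshold while keeping the expected
    false-discovery rate at most q. Returns 0.0 if no test survives."""
    m = len(pvals)
    best = 0.0
    for rank, p in enumerate(sorted(pvals), start=1):
        if p <= rank / m * q:   # compare p(rank) to its step-up bound
            best = p
    return best
```

Voxels whose p-value falls at or below the returned threshold are declared significant; because the bound rank/m · q grows with the rank, the procedure adapts to how many small p-values are present in the map.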
Within each ROI, we computed an RFX GLM analysis with the factors Task and Noise Level and extracted the resulting beta weights separately for each participant. Next, we performed an analysis of variance (ANOVA) with the factors ROI × Task × Noise Level. Since the comparison between the Red Dot Task and the Effector and Goal Tasks had been used for ROI identification, we restricted the ANOVA to the Effector and Goal Tasks. Additional statistical analyses were carried out if justified by significant interactions.
Figure 2A shows the behavioral data (d′) collected inside the MR scanner as a function of Noise Level (low, medium, and high) and Task (Red Dot, Effector, and Goal). Two observations are evident. First, as expected, performance decreased with increasing noise level for the Effector Task and the Goal Task, but not for the Red Dot Task. Second, performance differed between the 3 tasks, with the highest overall performance for the Effector Task and the lowest for the Red Dot Task; this difference between tasks decreased with increasing noise level. These observations are supported by the corresponding statistics (main effect of Task: F2,30 = 4.703, P = 0.017; main effect of Noise Level: F2,30 = 124.211, P < 0.0001; interaction Task × Noise Level: F4,60 = 29.927, P < 0.0001). During the highest noise level, performance was at chance in both the Effector and Goal Tasks, as indicated by d′ values not significantly differing from 0 [Effector Task high noise level vs. 0: t(15) = −1.043, P = 0.313; Goal Task high noise level vs. 0: t(15) = 1.821, P = 0.089]. Moreover, performance did not differ between the Effector and Goal Tasks at the highest noise level [t(15) = −1.588, P = 0.133].
Mean reaction time and standard error (SE) in the Red Dot, Effector, and Goal Tasks were 464.54 (±10.30), 447.73 (±9.78), and 460.62 (±8.57) ms, respectively (main effect task: F2,30 = 3.393, P = 0.047). The Red Dot Task yielded longer response times than the Effector Task (F1,15 = 6.465, P = 0.023), whereas none of the remaining comparisons revealed significant differences (all P > 0.05). Mean reaction time was shorter with low (435.05 ± 9.15 ms) in comparison to the medium (468.46 ± 10.12 ms) and high (469.38 ± 8.41 ms) noise levels (main effect noise level: F2,30 = 28.130, P < 0.0001). Task and noise level did not interact (F4,60 = 1.725, P = 0.156).
Areas Involved in Action Understanding
The areas identified by the contrast [Effector Task, Goal Task] > Red Dot Task (see Definition of ROIs for details) are shown in Figure 2B. This statistical contrast revealed a region in the orbital part of the left and right IFG (aVLPFC), the triangular and opercular part of the left and right IFG, left dorsal premotor cortex, bilateral IPL, and bilateral MTG. Talairach coordinates of these ROIs can be found in Supplementary Table S1.
The Effect of Task and Noise Level
Within these ROIs, we extracted the beta estimates of the BOLD response as a function of Task and Noise Level. Since we used the comparison between the Effector and Goal Tasks versus the Red Dot Task to identify our ROIs, we restricted our statistical analysis to the comparison between the Effector and Goal Tasks (the results of the Red Dot Task in Fig. 2C,D are shown for illustrative purposes only).
Figure 2C–E shows the effect of Task and Noise Level as a function of ROI. Overall activation level increases from anterior to posterior ROIs (main effect of ROI: F8,128 = 35.078, P < 0.0001). This observation holds for all 3 noise levels. The Red Dot Task (red bars) leads to an overall weaker response in comparison to the Effector (gray bars) and Goal Tasks (blue bars) (Fig. 2C–E). This is expected, since we selected ROIs accordingly. Importantly, however, in most of the regions, the Goal Task leads to a higher BOLD signal in comparison to the Effector Task (main effect of task: F1,16 = 8.822, P = 0.009). In most ROIs, the difference between the Effector and Goal Tasks is observed only for the low noise level, that is, in those trials in which the stimulus was easy to recognize (Fig. 2C). This observation is supported by the corresponding interaction between Task and Noise Level (F2,32 = 3.911, P = 0.030). In IFG and aVLPFC, the difference between the Goal and Effector Tasks is present also for the medium noise level (Fig. 2D). This ROI-dependent modulation of the Task effect by Noise Level is supported by the corresponding ROI × Task × Noise Level interaction (F9.988,159.808 = 3.720, P < 0.001, HF). For the high noise level, none of the regions showed a significant difference between the Effector and Goal Tasks (Fig. 2E). For an overview of the corresponding statistics, see Table 1.
| Effect | df | F | P |
| --- | --- | --- | --- |
| Noise Level | (2, 32) | 2.134 | 0.135 |
| ROI × Task | (3.848, 61.575) | 14.549 | <0.0001 (HF) |
| ROI × Noise Level | (8.794, 140.711) | 3.720 | <0.0001 (HF) |
| Task × Noise Level | (2, 32) | 3.911 | 0.030 |
| ROI × Task × Noise Level | (9.988, 159.808) | 3.720 | <0.0001 (HF) |
Note: If Mauchly's tests indicated violation of the sphericity assumption, degrees of freedom were adjusted by the Huynh–Feldt procedure (denoted as HF).
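Concretely, the Huynh-Feldt correction multiplies both degrees of freedom of the repeated-measures F-test by an epsilon estimate of the departure from sphericity (estimating epsilon itself is omitted in this sketch):

```python
def hf_adjusted_df(df_effect, df_error, epsilon):
    """Sphericity-corrected degrees of freedom: both the effect df and
    the error df of the F-test are scaled by the Huynh-Feldt epsilon
    (epsilon = 1 means sphericity holds; smaller values mean a
    stronger correction and a more conservative test)."""
    return df_effect * epsilon, df_error * epsilon
```

For example, the nominal ROI × Task degrees of freedom of (8, 128) combined with an epsilon of approximately 0.481 reproduce the corrected (3.848, 61.575) reported in Table 1, up to rounding of epsilon.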
Within each ROI, we carried out additional ANOVAs with the factors Task and Noise Level (Table 2). This analysis revealed that the effect of task is not restricted to premotor areas. Both within the left and right MTG and the aVLPFC, the Goal Task leads to a higher BOLD signal than the Effector Task, and this effect is modulated by the noise level. Note that the lack of a difference between the Effector and Goal Tasks for medium and high noise levels in MTG cannot be due to a ceiling effect: if that were the case, we should expect to see an increase in overall activity with noise level. In contrast, the BOLD response decreases with increasing noise level both in left (main effect noise level: F2,32 = 6.978, P = 0.003 and linear trend: F1,16 = 8.204, P = 0.011) and right (main effect noise level: F2,32 = 12.566, P < 0.0001 and linear trend: F1,16 = 15.922, P = 0.001) MTG.
|Main effect of Task||Main effect of Noise Level||Interaction Task × Noise Level|
Significant values (P < 0.05) are marked by an asterisk.
It has been claimed that the action mirroring system enables a direct, "not cognitively mediated," access to action understanding (Fabbri-Destro and Rizzolatti 2008). In contrast to this view, we show that a task that requires understanding the goal, in comparison to identifying the effector of an action, recruits not only premotor areas involved in action mirroring, but also bilateral MTG and the aVLPFC.
Action Mirroring as the Outcome of Action Understanding or Vice Versa?
We observed that the difference between the Goal and Effector Tasks in bilateral MTG was restricted to the low noise level condition, that is, when actions were easy to understand. In contrast, frontal areas also distinguished between the 2 tasks at an intermediate noise level, that is, when actions were hard to understand. This suggests an interesting division of labor between temporal and frontal areas, with the MTG potentially being involved in a first pass of the analysis that generates hypotheses about the meaning of an action, whereas frontal areas, in particular, aVLPFC and IFG, might provide additional information when it becomes harder to distinguish between several competing alternatives.
Alternatively, the involvement of bilateral MTG in action understanding might reflect feedback from premotor areas: information enters the IFG through the IPL, and once the action has been understood on the basis of action mirroring, the output is transmitted to bilateral MTG. If IFG were the central hub for action understanding, lesions to IFG, but not MTG, should impair action understanding even if the task were easy. In contrast to this prediction, Tranel et al. (2003) reported impaired retrieval of action knowledge not only by lesions in premotor and parietal cortices, but also by lesions in the left posterior MTG. Likewise, using voxel-based lesion symptom mapping (VLSM; Bates et al. 2003), Kalenine et al. (2010) reported that lesions in the MTG, but not in the IFG, were associated with impaired performance in a gesture recognition task. These results are compatible with the idea that MTG provides access to action semantics stored in memory, which are then fed forward to other areas, including IFG, where this information is matched with the corresponding motor plan, for example, in preparation of an imitative gesture, and where additional information might be recruited, for example, on the basis of the knowledge of how these actions are performed. Note that the task used by Kalenine et al. (2010) was relatively easy. This might be the reason why they observed a significant relation between performance and lesions in the MTG, but not in the IFG.
Our results suggest that future studies should investigate the causal role of both MTG and IFG in action understanding as well as alterations of brain activity in time using TMS and electrophysiology, respectively. Moreover, our data indicate that future studies using TMS and VLSM could profit from the parametric variation of task difficulty described in the present study.
The Role of MTG and aVLPFC in Action Understanding
Taken together, our data suggest a specific involvement of areas within the temporal cortex that contain elaborated representations of actions, in addition to parietal and premotor areas that have been suggested to be involved in action mirroring. We hypothesize that the MTG might respond to sequences of visual or auditory events and gather evidence in favor of stored semantic representations. This information might then be sent to posterior IFG via the aVLPFC, where this semantic information is matched with the corresponding motor representation. The suggested interplay between MTG and aVLPFC is supported by a recent study using diffusion-tensor imaging and resting state fMRI (Turken and Dronkers 2011), which reported that the left MTG is strongly connected with the orbital part of the IFG, the anterior superior temporal gyrus, and the superior temporal sulcus. These findings support the idea that the MTG might be seen as a central hub in the network involved in retrieval and storage of action semantics. According to this view, lesions in MTG should result in profound deficits, in line with the findings reported by Kalenine et al. (2010) and Tranel et al. (2003). In line with this view, a meta-analysis of fMRI studies aiming to identify brain regions involved in the storage and retrieval of conceptual knowledge found that, among studies that specifically examined action knowledge, the largest overlap lay in the posterior left MTG and the ventral left supramarginal gyrus in the vicinity of the IPL (Binder et al. 2009).
The Effect of Task Difficulty
Our behavioral data show that the Goal Task was more difficult than the Effector Task at the lowest noise level. Is it possible that the larger BOLD signal in the Goal in comparison to the Effector Task reflects task difficulty rather than differences in the level of action understanding required by the 2 tasks? We think that this is unlikely: if the areas revealed in our paradigm responded to differences in task difficulty only, the BOLD signal should follow the behavior not only in the Goal and the Effector Tasks, but also in the Red Dot Task. This is clearly not the case: in the low noise condition, the Red Dot Task leads to worst performance, whereas the Effector Task leads to best performance. The BOLD signal, on the contrary, is highest for the Goal Task, intermediate for the Effector Task, and lowest for the Red Dot Task. For the high noise level, performance for the Red Dot Task is better in comparison to the Effector and Goal Tasks, whereas the Goal and Effector Tasks show a higher BOLD signal in comparison to the Red Dot Task also under these circumstances. Taken together, differences in task difficulty between the 3 tasks cannot explain the BOLD responses we observed.
One may wonder why we observed no difference between the Goal and Effector Tasks at the highest noise level. In the most difficult condition, that is, when 12 out of 13 markers were scrambled, behavioral results (d′) showed that participants performed at chance, and performance did not differ between the Effector and Goal Tasks. It is therefore not surprising that frontal regions do not distinguish between deep and shallow action understanding at the highest noise level. In other words, we assume that frontal areas might process additional information when action understanding becomes harder, but not when it is impossible to understand the action.
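For readers less familiar with the sensitivity index d′ used above, chance performance corresponds to d′ near 0, that is, equal hit and false-alarm rates. The following minimal sketch (illustrative counts only, not our data) shows the standard computation, with a log-linear correction to avoid infinite z-scores when a rate is 0 or 1:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).
    A log-linear correction (add 0.5 to counts, 1 to totals) avoids
    infinite z-scores when a rate would be exactly 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# At chance, hit and false-alarm rates are equal, so d' is 0:
print(d_prime(20, 20, 20, 20))  # 0.0
# Above-chance discrimination yields a clearly positive d':
print(d_prime(36, 4, 4, 36))
```

With this measure, "performed at chance" is a statement about d′ not differing reliably from 0, regardless of overall response bias.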
Sensitivity to Objects
Beauchamp et al. (2003) reported that the MTG responds more strongly to point-light displays of tool motion than of human motion. Does the difference between the Goal and Effector Tasks in the MTG reflect sensitivity to objects, which are relevant in the Goal Task but not in the Effector Task? Since actions involving a ball were equally likely to occur during the Goal and the Effector Task, a simple explanation based on the presence or absence of a ball-related action in these 2 conditions does not hold. It is possible, however, that participants paid attention to whether or not a ball was involved during the Goal Task and that this enhanced the signal in object-sensitive neuronal populations, thereby causing a higher response during the Goal Task than during the Effector Task. Such an explanation predicts a higher BOLD signal for actions involving a ball (i.e., throwing a ball and kicking a ball) than for actions not involving a ball (i.e., punching someone and kicking someone) during the Goal Task, but not during the Effector Task. To test this prediction, we carried out an additional GLM analysis with the factors Type of Action (actions involving a ball and actions not involving a ball) and Task (Effector and Goal). To ensure sufficient statistical power for this analysis, we collapsed across actions involving the arm and the leg as well as across trials with low and medium noise levels, leaving out trials with high noise levels. Within each ROI, we estimated the beta weights for each of these factorial combinations and submitted the resulting values to a repeated-measures ANOVA. The results of this analysis are shown in Supplementary Figure S1 and Table S2. None of the areas followed the predicted pattern. If anything, actions not involving a ball tended to produce a higher BOLD signal than actions involving a ball (significantly so in the right MTG, and not significantly in all remaining regions).
We are therefore confident that object sensitivity in MTG does not explain the results we observed.
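The decisive term in this 2 × 2 design is the Type of Action × Task interaction. In a repeated-measures setting with 2 levels per factor, that interaction is equivalent to a one-sample t-test on a per-subject difference-of-differences contrast of beta weights, as in this minimal sketch (the subject data and beta values are invented for illustration; they are not our results):

```python
from math import sqrt
from statistics import mean, stdev

def interaction_t(betas):
    """One-sample t-test of the 2 x 2 repeated-measures interaction.
    `betas` maps each subject to beta weights keyed by
    (type_of_action, task); the interaction contrast per subject is
    (ball - no_ball under Goal) - (ball - no_ball under Effector)."""
    contrasts = [
        (b[("ball", "goal")] - b[("no_ball", "goal")])
        - (b[("ball", "effector")] - b[("no_ball", "effector")])
        for b in betas
    ]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

# Hypothetical beta weights for 3 subjects in one ROI:
subjects = [
    {("ball", "goal"): 0.9, ("no_ball", "goal"): 1.1,
     ("ball", "effector"): 0.7, ("no_ball", "effector"): 0.6},
    {("ball", "goal"): 0.8, ("no_ball", "goal"): 1.0,
     ("ball", "effector"): 0.5, ("no_ball", "effector"): 0.5},
    {("ball", "goal"): 1.0, ("no_ball", "goal"): 1.2,
     ("ball", "effector"): 0.6, ("no_ball", "effector"): 0.7},
]
t, df = interaction_t(subjects)
```

For a 2-level factor, this t statistic squared equals the interaction F from the repeated-measures ANOVA, so the sketch tests the same hypothesis as the analysis described above. A negative contrast, as in these invented numbers, corresponds to the pattern we actually observed: a lower signal for ball actions than for no-ball actions under the Goal Task.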
It has been suggested that the parietal and premotor areas that are activated when we perform actions ourselves might be involved in recognizing how an action is performed and what that action means. The current study suggests that the meaning of an action is not accessed in the action mirroring network alone, but also in additional areas including the MTG, an area that binds together information from different modalities and has access to stored memory representations related to these various inputs, possibly through direct interactions with the aVLPFC. These results provide important constraints for biologically plausible models of action understanding and the role of action mirroring.
This research was supported by the Provincia Autonoma di Trento and the Fondazione Cassa di Risparmio di Trento e Rovereto.
We are grateful to Jens Schwarzbach, Alfonso Caramazza, and Liuba Papeo for helpful discussions and feedback on the manuscript. Conflict of Interest: None declared.