Object-manipulation tasks (e.g., drinking from a cup) typically involve sequencing together a series of distinct motor acts (e.g., reaching toward, grasping, lifting, and transporting the cup) in order to accomplish some overarching goal (e.g., quenching thirst). Although several studies in humans have investigated the neural mechanisms supporting the planning of visually guided movements directed toward objects (such as reaching or pointing), only a handful have examined how manipulatory sequences of actions—those that occur after an object has been grasped—are planned and represented in the brain. Here, using event-related functional MRI and pattern decoding methods, we investigated the neural basis of real-object manipulation using a delayed-movement task in which participants first prepared and then executed different object-directed action sequences that varied either in their complexity or final spatial goals. Consistent with previous reports of preparatory brain activity in non-human primates, we found that activity patterns in several frontoparietal areas reliably predicted entire action sequences in advance of movement. Notably, we found that similar sequence-related information could also be decoded from pre-movement signals in object- and body-selective occipitotemporal cortex (OTC). These findings suggest that both frontoparietal and occipitotemporal circuits are engaged in transforming object-related information into complex, goal-directed movements.
Most everyday manual tasks involve object manipulation, requiring the linking together of several successive actions, such as reaching toward, grasping, lifting and transporting an object, in order to accomplish a desired goal, like putting your phone in your pocket. Although psychophysical research in humans has provided a solid understanding of the planning and control of manipulation tasks from an information-processing perspective (Wolpert and Flanagan 2001; Flanagan et al. 2006; Bowman et al. 2009; Johansson and Flanagan 2009; Wolpert and Flanagan 2010; Wolpert et al. 2011; Safstrom et al. 2013), our understanding of the brain organization supporting such actions is limited. In part, this is because studies examining object-oriented actions in humans have tended to focus on single actions in isolation, such as reaching (e.g., Beurze et al. 2007, 2009; Leone et al. 2014), grasping without further manipulation (e.g., Culham et al. 2003; Gallivan, McLean, Valyear, et al. 2011; Gallivan, McLean, Flanagan, et al. 2013), or simple lifting (Schmitz et al. 2005; Jenmalm et al. 2006). In cases in which sequential behaviors have been studied, these have largely been sequences of repeated actions like finger-press responses (e.g., Wiestler and Diedrichsen 2013; Wiestler et al. 2014) and not sequences of different actions related to object manipulation.
Most of the current understanding about how manipulation tasks are planned and implemented by the brain has come from neurophysiological recordings in non-human primates (NHPs). Recordings from the supplementary and primary motor areas of macaque monkeys trained to perform memorized action sequences indicate that these frontal regions appear to store information pertaining to their execution, such as the component movements and their temporal order (Tanji and Shima 1994; Lu and Ashe 2005). It has been further shown that neurons in both premotor and parietal cortex, which code for single motor acts like grasping (Rizzolatti et al. 1988; Rozzi et al. 2008), also appear to represent the final goals of the manipulation tasks in which object grasping is embedded (e.g., grasping an object for eating versus placing, see Fogassi et al. 2005; Bonini et al. 2010; Bonini et al. 2011). These data are consistent with the conceptualization, set forth in an influential dual visual stream framework, of a dorsal processing pathway that involves dorsal parietal and premotor regions and supports action planning and control (Goodale and Milner 1992). Notably, this view further postulates a functional dissociation between this dorsal processing pathway and a more ventral processing pathway in occipitotemporal cortex (OTC) that primarily supports object perception and recognition. The implication of this two-stream model, though it has not yet been tested using neurophysiological methods, is that ventral pathway regions are not engaged during the planning of movements for object manipulation.
In everyday behavior, the processes of object recognition and the planning and control of movements must be dynamically intertwined. Object recognition is prerequisite to efficient manipulation, since manipulation requires identifying and accessing stored knowledge of object properties (e.g., smaller objects tend to be lighter than larger objects, Johansson and Flanagan 2009). It seems plausible, then, that the planning of object-manipulation actions, in addition to involving frontoparietal structures, might also engage OTC structures. Here, we test that idea using fMRI and a delayed-movement task in which participants first prepare and then execute different object-directed action sequences that vary either in the required number of movement components or in their final spatial positions. We show, using fMRI decoding methods (Tong and Pratte 2012), that preparatory signals specifying upcoming goal-directed object-manipulation tasks are represented not only in several areas of human frontoparietal cortex, as expected, but also in several areas of OTC.
Materials and Methods
Fourteen neurologically normal volunteers (7 females, age range: 20–28 years) who were right-handed, as assessed by the Edinburgh handedness questionnaire (Oldfield 1971), participated in 1 behavioral testing session followed by 2 fMRI testing sessions (the fMRI action-sequence experiment, followed by the fMRI localizer session, performed on separate days). Informed consent and consent to publish were obtained in accordance with ethical standards set out by the Declaration of Helsinki (1964) and with procedures cleared by the Queen's University Health Sciences Research Ethics Board. Participants were naïve with respect to the hypotheses under evaluation.
Setup and Apparatus
During the behavioral session and the fMRI action-sequence session, the same experimental setup was used. Each participant's workspace consisted of a black platform placed over the waist and tilted away from the horizontal at an angle (∼15°) to maximize comfort and target visibility. To facilitate direct viewing of the workspace, the head coil was tilted slightly (∼20°) and foam cushions were used to give an approximate overall head tilt of 30° (see Fig. 1A). On each individual trial, participants were first auditorily cued (via headphones) to prepare 1 of 3 different object-directed action sequences with their right hand upon a single centrally located cube object (2.5 × 2.5 × 2.5 cm, width × length × height) and then, after a variable delay (6–12 s), prompted to execute that action sequence. On Grasp-to-Hold trials, they were instructed to execute a precision grasp on the cube object, with the thumb and index finger, lift it ∼10 cm above the platform, hold it stationary in midair for ∼1 s, and then replace it. On Grasp-to-Place-Left trials, they carried out the same sequence of actions as on Grasp-to-Hold trials, but instead of replacing the cube, they transported it to the left cup and released the cube above the cup. Grasp-to-Place-Right trials were almost identical except that the cube was deposited in the right cup. Following Grasp-to-Place-Left and Grasp-to-Place-Right trials, during the intertrial interval (ITI), the experimenter placed a new cube object on the platform. The auditory cues “Grasp,” “Left,” and “Right” signaled the 3 types of trials at trial onset. Participants were instructed to keep the general timing of each hand action as consistent as possible across trials. 
Other than the execution of these different object-directed action sequences, throughout all other phases of the trial (Plan epoch and ITI), subjects were instructed to keep their hand still (in a relaxed fist) and in a pre-specified “home” position on the platform in between the cube position and the right cup (see Fig. 1C). For each participant, this home/starting position was marked with an elevated small black plastic capsule taped to the surface of the platform and participants were required to return to this same position following execution of each action sequence. The positions of the cube object and cup objects never changed over the entire experimental testing session, thus eliminating retinal differences across the different trial types.
From the participant's perspective, the left and right cup objects were placed on the left and right sides of the platform, equidistant from the participant's mid-sagittal plane and approximately equidistant with respect to the participant's right elbow. The cube object, left cup, and right cup were positioned at ∼7°, 12°, and 11° of visual angle with respect to the fixation point, and the left and right cups were positioned at ∼12° and 11° of visual angle with respect to the cube object's position. The cup objects were held in place by custom-made black disks with raised edges (11 × 1 cm, radius × height, with a 0.5-cm lip of 0.7 cm of thickness) that were secured to the platform. Once the cups were positioned, the experimenter placed the cube on another disk (5.5 × 1 cm, radius × height, with a 0.5-cm lip of 0.7 cm of thickness) that was secured to the platform halfway between the 2 cups. This disk ensured correct and consistent placement of the cube (by either the participant or experimenter, depending on trial type) throughout the experiment (for representative cube and cup positions, see Fig. 1C). The cubes and cups were painted white to increase their contrast with the background. To minimize limb-related artifacts, participants had the right upper-arm braced, limiting movement of the right arm to the elbow and thus creating an arc of reachability for the right hand. The exact placement of the cube and cups on the platform was adjusted to match each participant's arm length such that all required object-directed sequences were comfortable and ensured that only movement of the forearm, wrist, and fingers was required. At the end of each experimental run, the experimenter emptied the cubes that were placed by the participant in the cups on Grasp-to-Place-Left and Grasp-to-Place-Right trials. 
During the experiment, the workspace of the participant was illuminated from the side by 2 bright white Light Emitting Diodes (LEDs) attached to flexible plastic stalks (Loc-Line, Lockwood Products), positioned to the left and right of the platform (see Fig. 1C). During participant setup, both illuminator LEDs were positioned so as to brightly and evenly illuminate the full workspace (i.e., cube and cups). Experimental timing and lighting were controlled with in-house software created with MATLAB (The Mathworks). To control for eye movements, a small red fixation LED, attached to a flexible plastic stalk, was placed above and ∼10 cm beyond (i.e., away from the participant) the cube position such that both cups and the cube were positioned within the subject's lower visual field. The fixation point was ∼100 cm from the participants' eyes and at a visual angle of ∼15° above the participants' natural line of gaze. Participants were required to always foveate the fixation LED during fMRI data collection.
Action-Sequence fMRI Experiment
Each trial began with a Plan epoch, in which the participant's workspace was illuminated throughout and the auditory cue (1 of “Grasp,” “Left,” or “Right”) was delivered via headphones at the start of the epoch. Following a jittered delay interval (6–12 s), a 0.5-s auditory “beep” cued the participant to immediately execute the cued action sequence, initiating the Execute epoch of the trial. Two seconds following the beginning of this auditory Go cue, the illuminator was turned off, providing the cue for the participant to return their hand back to its “home” location. Once the illuminator was extinguished, the participant then waited in the dark while maintaining fixation for 16 s, allowing the fMRI response to return to baseline prior to the next trial (ITI phase). The 3 trial types (Grasp-to-Hold, Grasp-to-Place-Left, and Grasp-to-Place-Right), with 6 repetitions per condition (18 trials in total), were randomized within a run and balanced across all 9 runs so that each trial type was preceded and followed equally often by every other trial type. Each experimental run lasted 8 min 38 s (259 brain volumes).
The variable delay between cue and movement onset (Plan epoch) on each event-related trial allowed us to distinguish sustained planning-related neural activity prior to movement onset from the transient movement-execution response (Execute epoch, see Fig. 1D–E) accompanying action initiation (see also, Gallivan et al. 2014, for example). This design allowed us to isolate the planning-related fMRI signals while avoiding many of the potential sensory confounds that arise during the hand movement itself (e.g., visual stimulation created by the hand moving and somatosensory stimulation created by the hand contacting and lifting the cube, releasing the cube in the cup, etc.). We adapted this paradigm from previous work with eye- and arm-movements that has successfully parsed delay period activity from the transient neural responses that follow movement onset (Curtis et al. 2004; Beurze et al. 2007, 2009; Chapman et al. 2011; Pertzov et al. 2011). In our previous work, using variants of this general design, we have successfully used the spatial voxel patterns of delay period responses from various brain regions to predict which of 2 or 3 single hand movements directed toward objects (e.g., grasps, reaches, etc.) would be executed moments later (e.g., Gallivan, McLean, Valyear, et al. 2011).
Participants were scanned using a 3-Tesla Siemens TIM MAGNETOM Trio MRI scanner located at the Centre for Neuroscience Studies, Queen's University (Kingston). Functional MRI volumes were acquired using a T2*-weighted single-shot gradient-echo echo-planar imaging (EPI) acquisition sequence (time to repetition [TR] = 2000 ms, slice thickness = 3 mm, in-plane resolution = 3 × 3 mm, time to echo [TE] = 30 ms, field of view = 240 × 240 mm, matrix size = 80 × 80, flip angle = 90°, and acceleration factor [integrated parallel acquisition technologies, iPAT] = 2) with generalized auto-calibrating partially parallel acquisitions reconstruction. Each volume comprised 35 contiguous (no gap) oblique slices acquired at a ∼30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC–PC), providing near whole-brain coverage. We used a combination of imaging coils to achieve a good signal-to-noise ratio (see Fig. 1B) and to enable direct object workspace viewing without mirrors or occlusion. Specifically, we tilted (∼20°) the posterior half of the 12-channel receive-only head coil (6 channels) and suspended a 4-channel receive-only flex coil over the anterior-superior part of the head (see Fig. 1A). A T1-weighted ADNI MPRAGE anatomical was also collected (TR = 1760 ms, TE = 2.98 ms, field of view = 192 × 240 × 256 mm, matrix size = 192 × 240 × 256, flip angle = 9°, 1 mm isotropic voxels).
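As a quick consistency check on the functional acquisition parameters, the short sketch below recomputes the in-plane voxel size from the field of view and matrix size, and the duration of an action-sequence run from its volume count and the TR. All numbers are taken from the text; the variable names are illustrative only.

```python
# Functional acquisition values quoted in the text.
tr_s = 2.0                 # time to repetition, in seconds
fov_mm = (240, 240)        # field of view
matrix = (80, 80)          # acquisition matrix
slice_thickness_mm = 3.0
n_volumes_per_run = 259    # action-sequence experiment

# In-plane voxel size is field of view divided by matrix size.
in_plane_mm = tuple(f / m for f, m in zip(fov_mm, matrix))
voxel_volume_mm3 = in_plane_mm[0] * in_plane_mm[1] * slice_thickness_mm

# Run duration is the number of volumes times the TR.
run_seconds = n_volumes_per_run * tr_s
run_min, run_sec = divmod(int(run_seconds), 60)

print(in_plane_mm)        # (3.0, 3.0)
print(voxel_volume_mm3)   # 27.0
print(run_min, run_sec)   # 8 38
```

The result matches the stated 3 × 3 mm in-plane resolution and the 8 min 38 s run length.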
Separate practice sessions were carried out before the actual fMRI experiment to familiarize participants with the delayed timing of the task. One of these sessions was conducted before participants entered the scanner (see Behavioral Control Experiment) and another was conducted during the anatomical scan (collected at the beginning of every fMRI experiment). The action-sequence fMRI testing session for each participant lasted approximately 3 h and included setup time (∼45 min), 1 high-resolution anatomical scan, 8–9 experimental runs, and 2–3 localizer scans (not analyzed; collected for a separate study). Throughout the experiment, the participant's hand movements were monitored using an MR-compatible infrared-sensitive camera (MRC Systems GmbH), optimally positioned on 1 side of the platform and facing toward the participant. The videos captured during the experiment were analyzed offline to verify that the participants were performing the task as instructed. Eye tracking was not carried out in the scanner because our eye-tracking system does not work well when the head is tilted, due to a partial occlusion from the eyelids.
The purpose of this separate localizer scan session was to independently identify well-documented OTC ROIs involved in object-selective and body-selective visual processing so that we could then examine in each participant whether object-directed action sequences could be decoded from the pre-movement spatial voxel patterns of activity in each of these category-specific areas. This fMRI session was conducted on a separate testing day, after the action-sequence fMRI session.
During this session, participants viewed color photographs consisting of headless bodies, tools, non-tool objects, and scrambled versions of these stimuli (from Valyear and Culham 2010). Photographs were organized into 16-s blocks, with 18 photographs of the same type (e.g., tools) per block, presented at a rate of 400 ms per photograph with a 490-ms inter-stimulus interval. Each run included 6 stimulus blocks for each of the 3 intact stimulus conditions as well as 7 scrambled blocks, and 2 fixation/baseline blocks (20 s) placed at the beginning and end of each run. Runs lasted 7 min 30 s (225 brain volumes). Within a run, intact stimulus blocks were randomized into sets of 3, separated by scrambled blocks, and balanced for prior-block history within a single run. Each participant completed 3 experimental runs.
Functional data were collected using the same acquisition parameters as for the action-sequence testing session, except that the participant was supine and the conventional 12-channel receive-only head coil was used. In this session, we also collected a high-resolution anatomical image from each of the participants. All stimuli were rear-projected with an LCD projector (NEC LT265 DLP projector; resolution, 1024 × 768; 60 Hz refresh rate) onto a screen mounted behind the participant. The participant viewed the images through a mirror mounted to the head coil directly above the eyes. Participants were required to maintain fixation on a dot (a small black circle) superimposed on the center of each image. Each image subtended ∼15° of visual angle. To encourage participants to maintain attention throughout the localizer scans, participants performed a one-back task throughout, whereby responses were made, via a right-handed button press, whenever 2 successive photographs were identical. Each stimulus block included either 3 or 4 repeated photographs, balanced across conditions. Two additional localizer scans, for a separate study, were also collected during this testing session.
MR Preprocessing and Modeling
All data (from the action sequence and localizer sessions) were spatially aligned to the corresponding participant's high-resolution anatomical image collected during the localizer testing session. All preprocessing and univariate analyses were performed using Brain Voyager QX version 2.6 (Brain Innovation). All ANOVA statistics were corrected for inhomogeneity of variance.
Preprocessing for both experiments included slice scan-time correction, 3D motion correction (such that each volume was aligned to the volume of the functional scan closest in time to the anatomical scan), high-pass temporal filtering of 3 cycles/run, and functional-to-anatomical co-registration. For the ROI-based analyses (see below), the individual subject data were not transformed into a standard brain space. However, for the whole-brain searchlight analysis (also see below), to allow for group-level analyses, the individual subject data were transformed into Talairach space (Talairach and Tournoux 1988). Other than the trilinear-sinc interpolation performed during realignment, and the sinc interpolation performed during reorientation, no additional spatial smoothing was applied to the data.
Functional data from each testing session in each participant were screened for motion and/or magnet artifacts by examining the time-course movies and the motion plots created with the motion correction algorithms. None of the runs revealed head motion that exceeded 1.5 mm translation or 1.5° rotation. In the action-sequence experiment, error trials were identified offline from the videos recorded during the testing session and were excluded from analysis by assigning these trials predictors of no interest. Error trials included those in which the participant fumbled with the object (6 trials, 2 participants), performed the incorrect instruction (4 trials, 3 participants), contaminated the Plan epoch data by slightly moving their limb (2 trials, 2 participants), or cases in which the experimenter failed to replace the cube object following a Grasp-to-Place-Right or Grasp-to-Place-Left trial (5 trials, 3 participants).
General Linear Models
To localize ROIs, for both the action-sequence and localizer sessions, we used general linear models (GLMs) with predictors created from boxcar functions that were then convolved with the Boynton (Boynton et al. 1996) hemodynamic response function (HRF). For each trial in the action-sequence session, a boxcar regressor was aligned to the onset of each phase of the trial, with its duration dependent on that phase: 3–6 volumes for the Plan epoch (due to trial jittering), and 1 volume for the Execute epoch. The ITI was excluded from the model, and therefore, all regression coefficients (betas) were defined relative to the baseline activity during the ITI. For the localizer scans, a boxcar HRF was aligned to the onset of each stimulus block with its duration dependent on stimulus block length. The Baseline/Fixation epochs were excluded from the model, and therefore, all regression coefficients (betas) were defined relative to the baseline activity during these time points. For both sessions, the time-course for each voxel was converted to percent signal change before applying the GLM.
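The modeling steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the BrainVoyager implementation: the single-gamma HRF parameters (`delta`, `tau`, `n`), the use of the run mean as the percent-signal-change baseline, the coarse sampling of the HRF at the TR, and the two-trial toy run are all assumptions made for the example.

```python
import math
import numpy as np

TR = 2.0  # seconds per volume, from the acquisition parameters

def boynton_hrf(t, delta=2.5, tau=1.25, n=3):
    """Single-gamma HRF (Boynton et al. 1996). delta, tau, n are assumed values."""
    s = np.clip((t - delta) / tau, 0.0, None)  # zero before the onset delay
    return s ** (n - 1) * np.exp(-s) / (tau * math.factorial(n - 1))

def make_predictor(n_vols, onsets, durations):
    """Boxcar (in volumes) convolved with the HRF, trimmed to run length."""
    box = np.zeros(n_vols)
    for onset, dur in zip(onsets, durations):
        box[onset:onset + dur] = 1.0
    hrf = boynton_hrf(np.arange(0.0, 30.0, TR))
    return np.convolve(box, hrf)[:n_vols]

def percent_signal_change(ts):
    """Express a voxel time course as % change from its run mean (assumed baseline)."""
    return 100.0 * (ts - ts.mean()) / ts.mean()

# Hypothetical two-trial run: Plan lasts 4 then 3 volumes, Execute lasts 1 volume.
n_vols = 30
plan = make_predictor(n_vols, onsets=[2, 15], durations=[4, 3])
execute = make_predictor(n_vols, onsets=[6, 18], durations=[1, 1])
X = np.column_stack([plan, execute, np.ones(n_vols)])  # ITI is left unmodeled

# Noise-free synthetic voxel whose Execute response is twice its Plan response.
raw = 100.0 + 1.0 * plan + 2.0 * execute
betas, *_ = np.linalg.lstsq(X, percent_signal_change(raw), rcond=None)
print(betas[:2])  # Plan and Execute betas, in percent signal change
```

Because the ITI has no predictor of its own, the constant column absorbs the ITI baseline, and the Plan and Execute betas are expressed relative to it.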
Regions-of-Interest for Pattern-Information Analyses
We used pattern-information decoding methods (Tong and Pratte 2012) to investigate the spatial patterns of fMRI activity during the Plan (and Execute) phases of the action-sequence experiment in several frontoparietal and occipitotemporal regions-of-interest (ROIs). The question of interest was whether we would be able to predict the specific object-directed action sequences to be performed from the preparatory fMRI activity patterns that form prior to movement onset. For each ROI, we examined whether patterns of activity in the region encoded the complexity of the action sequence (i.e., represented Grasp-to-Place vs. Grasp-to-Hold trials differently) and the spatial end goals of the equally complex action sequences (i.e., represented Grasp-to-Place-Left vs. Grasp-to-Place-Right trials differently).
Note that while we recognize that “action complexity” can be a somewhat abstract concept, here we operationalize the term to connote the various features of movement that differentiate the Grasp-to-Place trials from Grasp-to-Hold trials (e.g., movement duration, types of muscles used, and types of actions performed). We do appreciate, however, that other differences between the trials exist (e.g., there is a social expectation on Grasp-to-Place-Left and Grasp-to-Place-Right, but not Grasp-to-Hold, trials that the experimenter will add a new cube at the end of the trial). Likewise, we also recognize that the term “end goals” can be equally abstract, particularly in the neurophysiological literature, sometimes referring to the upcoming spatial location of a saccade or reach target (e.g., Basso and Wurtz 1997; Snyder et al. 1997; Beurze et al. 2009; Gallivan, McLean, Smith, et al. 2011), other times to a desired motor act like grasping, eating, or placing (e.g., Fogassi and Luppino 2005; Hamilton and Grafton 2006), and, perhaps most often, to some presumably higher-level cognitive process like goal-directed attention (e.g., Corbetta and Shulman 2002). In the current study, we operationalize the term to connote the spatial location of the cup in which the cube will be placed on Grasp-to-Place-Left and Grasp-to-Place-Right trials.
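To make the two comparisons concrete, the sketch below sets up the complexity contrast (Grasp-to-Hold vs. pooled Grasp-to-Place) and the end-goal contrast (Grasp-to-Place-Left vs. Grasp-to-Place-Right) on synthetic voxel patterns. The classifier and cross-validation scheme are not specified in this section, so a nearest-centroid classifier with leave-one-run-out cross-validation is used purely as a stand-in; the function name, data dimensions, and simulated patterns are all hypothetical. Note also that pooling the two Grasp-to-Place conditions yields unbalanced classes, and how the study handled this is not stated here.

```python
import numpy as np

def nearest_centroid_loro(patterns, labels, runs):
    """Leave-one-run-out accuracy of a nearest-centroid classifier (stand-in method)."""
    correct = 0
    for test_run in np.unique(runs):
        train = runs != test_run
        test = ~train
        classes = np.unique(labels[train])
        # One mean training pattern per class.
        centroids = np.stack(
            [patterns[train & (labels == c)].mean(axis=0) for c in classes]
        )
        # Euclidean distance from each held-out pattern to each class centroid.
        dist = np.linalg.norm(
            patterns[test][:, None, :] - centroids[None, :, :], axis=2
        )
        predicted = classes[dist.argmin(axis=1)]
        correct += int((predicted == labels[test]).sum())
    return correct / len(labels)

# Synthetic Plan-epoch patterns: 8 runs x 3 conditions x 6 trials, 50 voxels (all assumed).
rng = np.random.default_rng(0)
n_runs, reps, n_vox = 8, 6, 50
conds = np.array(["hold", "left", "right"])
cond_idx = np.tile(np.repeat(np.arange(3), reps), n_runs)
labels3 = conds[cond_idx]
runs = np.repeat(np.arange(n_runs), 3 * reps)
means = 2.0 * rng.normal(size=(3, n_vox))          # condition-specific mean patterns
patterns = means[cond_idx] + rng.normal(size=(len(cond_idx), n_vox))

# Complexity: Grasp-to-Hold vs. Grasp-to-Place (left and right pooled).
complexity_labels = np.where(labels3 == "hold", "hold", "place")
acc_complexity = nearest_centroid_loro(patterns, complexity_labels, runs)

# End goal: Grasp-to-Place-Left vs. Grasp-to-Place-Right only.
place = labels3 != "hold"
acc_goal = nearest_centroid_loro(patterns[place], labels3[place], runs[place])

print(acc_complexity, acc_goal)
```

With real data, `patterns` would hold one activity estimate per trial and voxel for a given ROI and epoch, and accuracies would be compared against the 50% chance level.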
Eight frontoparietal ROIs (superior parieto-occipital cortex [SPOC], posterior intraparietal sulcus [pIPS], anterior IPS [aIPS], primary motor cortex [M1], supplementary motor area [SMA], dorsal premotor cortex [PMd], ventral premotor cortex [PMv], and somatosensory cortex [SSc]), all contralateral to the acting (right) limb, were chosen based on their well-documented role in sensorimotor processing in both humans and NHPs (see Supplementary Table 1 for a list of the regions).
In the case of SPOC, previous work has reported both grasp- and reach-related neural activity in human SPOC and monkey V6A, its putative homolog (Prado et al. 2005; Fattori et al. 2009; Cavina-Pratesi et al. 2010; Fattori et al. 2010; Grafton 2010; Gallivan, McLean, Smith, et al. 2011; Gallivan, McLean, Valyear, et al. 2011). Directly relevant to the current work, it has also been shown that the parietal reach region in the monkey, a functionally defined region encompassing V6A and both medial and caudal intraparietal cortical areas (Calton et al. 2002; Chang et al. 2009), encodes in parallel both targets of a double-reach sequence prior to the first reach being initiated (Baldauf et al. 2008). Thus, here we wished to examine the extent to which human SPOC would represent, during planning, subsequent movements (Grasp-to-Place actions) differently from the more immediate ones (Grasp-to-Hold actions).
In the case of pIPS, the region has been implicated in a wide range of sensorimotor processes, ranging from visual-spatial attention (Szczepanski et al. 2010) to the coding of action-relevant 3D visual object features (Sakata et al. 1998) and the integration of information related to the acting effector and target location (Beurze et al. 2007; Chang et al. 2008; Stark and Zohary 2008; Gallivan, McLean, Smith, et al. 2011; Gallivan, McLean, Flanagan, et al. 2013). Given the multiplexing of these signals in pIPS, we hypothesized that the area might also represent the different object-directed action sequences.
aIPS is a key parietal area that, through coordination with the PMv to which it is connected (Tanne-Gariepy et al. 2002; Rizzolatti and Matelli 2003), is thought to mediate the transformation of visual information about object features into corresponding motor programs for grasping (Jeannerod et al. 1995; Rizzolatti and Luppino 2001). NHP work has further shown that neurons located near aIPS, in parietal area PFG, also encode the goals of an action sequence in which grasping is embedded (Fogassi et al. 2005). Our selection of aIPS was guided, in part, by an effort to similarly characterize in the human some of these previously documented neural representations (as object grasping is embedded in all 3 of the action sequences used here).
With regard to M1, although neurophysiological recordings in NHPs have previously suggested that the area may play no role in the encoding of entire movement sequences (Tanji and Shima 1994), more recent evidence has challenged this view. For instance, Lu and Ashe (2005) directly identified anticipatory activity in M1 specifying different memorized sequences of arm movements. Likewise, in humans, pattern analysis methods show that both trained and untrained sequences involving finger presses are represented in M1 (Wiestler and Diedrichsen 2013). In selecting voxel activity in human M1, we wished to further clarify its role in encoding action sequences.
SMA, perhaps more than any other brain area, has been implicated in the planning and generation of movement sequences. Lesions, pharmacological inactivation, or TMS to medial premotor cortex results in disruptions of the performance of sequential movements (Brinkman 1984; Halsband et al. 1993; Chen et al. 1995; Thaler et al. 1995; Gerloff et al. 1997; Shima and Tanji 1998). SMA activity was selected in this study so as to expand upon these previous observations and characterize the role of this area in the planning of multi-phase movement sequences involving object manipulation.
In the case of PMd, neural recording studies in NHPs and fMRI work in humans show that activity in the area is involved in coding arm movements (Weinrich and Wise 1982; Weinrich et al. 1984; Caminiti et al. 1990; Beurze et al. 2009). In addition, PMd is thought to play an important role in integrating both effector- and spatial goal-related signals for reaching (Cisek et al. 2003; Hoshi and Tanji 2006; Pesaran et al. 2006; Beurze et al. 2010; Gallivan, McLean, Smith, et al. 2011; Gallivan, McLean, Flanagan, et al. 2013). When considering the planning and execution of limb- or hand-related movement sequences, however, the activity of PMd has not often been considered (though see Kettner et al. 1996; Shanechi et al. 2012; Wiestler and Diedrichsen 2013; Kornysheva and Diedrichsen 2014). Thus, a goal of the present study was to fully characterize the activity of PMd in the context of preparing different object-directed action sequences.
PMv, in addition to playing a role in hand preshaping for grasping, has also been linked to the representation of higher-level action goals (Rizzolatti et al. 1988; Hoshi and Tanji 2002, 2006). For instance, recordings in NHPs show that PMv neurons encode the overarching goals of an action sequence in which grasping is embedded, rather than the precise movement kinematics required to achieve those goals (Rizzolatti et al. 1988; Bonini et al. 2010; Bonini et al. 2011; see Umilta et al. 2008 for a further example of goal-related coding in PMv). Thus, similar to aIPS, our selection of PMv was guided by an effort to characterize its activity in the context of a human goal-directed object-manipulation task.
Lastly, the preparatory activity in SSc was examined so as to provide an “in-brain” control region. That is, based on its well-known sensory response properties, SSc should only begin representing information related to the action-sequence task when the hand's mechanoreceptors have been stimulated at movement onset and/or object contact (see Johansson and Flanagan 2009 for review), but not earlier (i.e., during the Plan epoch).
All of the above ROIs were identified using the action-sequence experiment data via their role in movement generation by contrasting activity for movement execution versus planning (collapsed across trial types): Execute(Grasp-to-Hold + Grasp-to-Place-Left + Grasp-to-Place-Right) > Plan(Grasp-to-Hold + Grasp-to-Place-Left + Grasp-to-Place-Right). The resulting statistical map of all positively activated voxels in each participant was then used to define the ROIs within the left hemisphere (at t = 3, P < 0.005; each participant's activation map was cluster-threshold corrected at P < 0.05 so that only voxels passing a minimum cluster size were included in the map). The voxels included in each ROI were selected based on all significant contiguous activity within a (15 mm)³ cube (i.e., 3375 mm³ or 125 voxels) centered on the peak voxel of activity within predefined anatomical landmarks (see Selection Criteria). This approach ensured that regions were selected objectively, that a similar number of voxels were included within each ROI, that the ROI size was large enough to allow for pattern classification (an important consideration), and that regions could be largely segregated from adjacent regions (see also Downing et al. 2006). The average number of functional voxels selected across the 14 participants in each ROI is given in Supplementary Table 1.
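The cube-and-threshold step of this selection procedure can be illustrated with a small NumPy sketch. It assumes a hypothetical 3D t-map in functional voxel space and a peak voxel that has already been chosen from the anatomical landmarks. The contiguity requirement and the cluster-threshold correction are deliberately omitted, so this is a simplification of the procedure described above, not a reproduction of it.

```python
import numpy as np

VOXEL_MM = 3.0      # functional voxel edge length, from the acquisition parameters
CUBE_MM = 15.0      # cube edge length, from the text
T_THRESHOLD = 3.0   # statistical threshold, from the text

def select_roi(t_map, peak, cube_mm=CUBE_MM, voxel_mm=VOXEL_MM, t_thr=T_THRESHOLD):
    """Boolean mask of suprathreshold voxels inside a cube centered on `peak`."""
    half = int(round(cube_mm / voxel_mm)) // 2  # 5 voxels per edge, 2 on either side
    in_cube = np.zeros(t_map.shape, dtype=bool)
    slices = tuple(slice(max(p - half, 0), p + half + 1) for p in peak)
    in_cube[slices] = True
    return in_cube & (t_map >= t_thr)

# Hypothetical t-map: a large suprathreshold blob with its peak at (10, 10, 10).
t_map = np.zeros((20, 20, 20))
t_map[5:15, 5:15, 5:15] = 4.0
t_map[10, 10, 10] = 6.0

roi = select_roi(t_map, peak=(10, 10, 10))
n_voxels = int(roi.sum())
roi_volume_mm3 = n_voxels * VOXEL_MM ** 3

print(n_voxels, roi_volume_mm3)  # 125 3375.0
```

When every voxel in the cube passes threshold, the ROI reaches the stated maximum of 125 voxels (3375 mm³); in practice fewer voxels would survive, which is why the text reports average ROI sizes.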
Critically, the contrast employed to select these frontoparietal areas (i.e., Execute > Plan, collapsed across conditions) is orthogonal to those used in the pattern-information analyses (i.e., Grasp-to-Hold vs. Grasp-to-Place, and Grasp-to-Place-Left vs. Grasp-to-Place-Right). Thus, the selection criteria will not bias the ROIs to exhibit pattern differences between conditions (for verification of this fact, see the signal response amplitudes in Figs 5 and 6 and the univariate analyses in Fig. 8).
SPOC was defined by selecting voxels located medially and directly anterior to (or sometimes within) the parieto-occipital sulcus (Gallivan et al. 2009). Posterior intraparietal sulcus (pIPS) was defined by selecting activity at the caudal end of the IPS (Beurze et al. 2009). Anterior IPS (aIPS) was defined by selecting voxels directly at the junction of the IPS and postcentral sulcus (PCS) (Culham et al. 2003). Somatosensory cortex (SSc) was defined by selecting voxels encompassing the postcentral gyrus and PCS, medial and anterior to aIPS (Gallivan, McLean, Valyear, et al. 2011). Motor cortex (M1) was defined by selecting voxels around the “hand knob” landmark in the central sulcus (Yousry et al. 1997). Dorsal premotor (PMd) cortex was defined by selecting voxels at the junction of the precentral sulcus (PreCS) and superior frontal sulcus (SFS) (Picard and Strick 2001). Ventral premotor (PMv) cortex was defined by selecting voxels posterior to the junction of the inferior frontal sulcus (IFS) and PreCS (Tomassini et al. 2007). Finally, the SMA was defined by selecting voxels adjacent and anterior to the medial end of the CS and posterior to the plane of the anterior commissure (Picard and Strick 2001; Gallivan, McLean, Valyear, et al. 2011). See Supplementary Table 1 for details about ROI sizes, and Figure 2A for representative locations in a single participant.
Occipitotemporal (OTC) ROIs
Eight OTC ROIs (the left and right lateral occipital [LO] areas, the left and right posterior fusiform sulcus [pFs] areas, the left and right extrastriate body areas [EBA], and the left and right fusiform body areas [FBA]) were chosen based on their well-documented role in object- and body-related processing in humans (Grill-Spector and Malach 2004; Peelen and Downing 2007; see Supplementary Table 1 for a list of the regions).
LO and pFs are thought to form the core components of a visual network involved in object processing (Grill-Spector et al. 2001; Grill-Spector and Malach 2004). With regard to the current study, recent work has reported activity in the vicinity of LO during the execution of grasping-related tasks (Cavina-Pratesi et al. 2010), and some of our own work shows that certain aspects of simple actions directed “toward” objects (i.e., whether hand preshaping is required in a movement) can actually be decoded from pre-movement activity patterns in both LO and pFs (Gallivan, Chapman, et al. 2013). A goal of the present work was to both replicate and significantly extend these previous findings by determining whether these areas also encode far more complex movements that involve interactions “with” objects.
With respect to EBA and FBA, both of these areas are thought to form the key components of a visual network involved in body-related processing (Peelen and Downing 2005a; Schwarzlose et al. 2005; Downing et al. 2006; see Downing and Peelen 2011 for review). With regard to the current study, EBA, in particular, has been shown to be activated by self-generated unseen movements (i.e., hand actions, Astafiev et al. 2004; Orlov et al. 2010, though see Peelen and Downing 2005b), suggesting a convergence of both visual and motor information related to the body in EBA. In both EBA and FBA, we have also recently shown that their pre-movement signals can be used to decode grasp versus reach actions directed “toward” objects (Gallivan, Chapman, et al. 2013; Gallivan, McLean, Valyear, et al. 2013). As in the case of the object-selective areas, a major goal of the present work was to both replicate and significantly expand upon these previous findings using a more complex object-manipulation task.
For each participant, each of the above 8 OTC ROIs was defined based on the peak voxel of a particular contrast (or conjunction) from the localizer experiment data and constrained by the anatomical location expected from previous reports (see Selection Criteria). Voxelwise and cluster thresholds, selection procedures, and ROI volume constraints were the same as for the frontoparietal ROIs. If information related to intended object-directed action sequences can be decoded from any of these areas, it would indicate that the area not only represents objects (and/or the body) during visual-perceptual processing but also represents real goal-directed action sequences to be performed “upon” objects (“by” the body).
Object-sensitive activity in LO and pFs was localized based on the contrast of Non-tool objects > Scrambled non-tool objects. Left and right LO were defined around the peak voxel near the LO sulcus (Malach et al. 1995; Grill-Spector et al. 1999; Grill-Spector et al. 2001). Left and right pFs were defined around the peak voxel in the posterior aspect of the fusiform gyrus, extending into the occipitotemporal sulcus (Grill-Spector et al. 1999; Grill-Spector et al. 2001). Body-sensitive activity in EBA and FBA was selected based on a conjunction contrast of ([Bodies > Scrambled] AND [Bodies > Tools] AND [Bodies > Objects]). We define a conjunction contrast as a Boolean AND, such that, for any one voxel to be flagged as statistically significant, it must show a difference for each of the constituent contrasts. Left and right EBA were defined around the peak voxel in the posterior inferior temporal sulcus/middle temporal gyrus (Downing et al. 2001; Peelen and Downing 2005c), superior to LO. Left and right FBA were defined around the peak voxel in the fusiform gyrus (Peelen and Downing 2005a; Schwarzlose et al. 2005). See Supplementary Table 1 for details about ROI sizes and Figure 2B for locations on a representative participant's brain.
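The Boolean-AND definition of a conjunction contrast can be illustrated with a minimal sketch. The function name `conjunction_mask`, the t-values, and the threshold of 3.0 are hypothetical and chosen only for illustration; they are not taken from the localizer data or the analysis software.

```python
def conjunction_mask(t_maps, t_threshold):
    """Return a Boolean mask that is True only for voxels whose t-value
    exceeds the threshold in every constituent contrast (Boolean AND)."""
    n_voxels = len(t_maps[0])
    return [all(t_map[v] > t_threshold for t_map in t_maps)
            for v in range(n_voxels)]

# Hypothetical t-values for 5 voxels in each constituent contrast
bodies_vs_scrambled = [4.1, 3.5, 1.2, 5.0, 3.2]
bodies_vs_tools = [3.9, 2.0, 4.4, 4.8, 3.1]
bodies_vs_objects = [3.3, 3.6, 4.0, 0.5, 3.4]

mask = conjunction_mask(
    [bodies_vs_scrambled, bodies_vs_tools, bodies_vs_objects],
    t_threshold=3.0,
)
print(mask)  # [True, False, False, False, True]
```

Only the first and last voxels pass, because each of the others falls below threshold in at least one of the three contrasts.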
Note that for the purposes of visually comparing some of the whole-brain searchlight findings (see below) with some of the object- and body-selective OTC ROIs, we also performed a group-level random-effects analysis in which, using the same contrasts as defined earlier, we functionally identified the object- and body-selective areas (at P < 0.005, cluster-size threshold corrected; see Fig. 7). All of these functional areas were easily identified at the group-level, with the exception of L-FBA (note that this failure to reliably identify L-FBA at the group-level directly follows from some of our recent work [Hutchison et al. 2014]).
Non-Brain Control ROIs
To ensure that our decoding accuracies could not result from spurious factors (e.g., task-correlated arm or head movements) or were unlikely to arise simply due to chance, we created control ROIs in locations in which no statistically significant classification should be possible: the left and right ventricles. To select these ROIs, we further reduced our statistical threshold [after specifying the (Execute > Plan) network within each participant] down to t = 0, P = 1 and selected all activation within a (15 mm)³ cube centered on a consistent point within each participant's left and right lateral ventricles (see Supplementary Fig. 1 for representative locations in an individual subject and the results of this control analysis).
Pattern Classification Analysis
Support Vector Machine Classifiers
Pattern classification was performed with a combination of in-house software (using Matlab) and the Princeton MVPA Toolbox for Matlab (http://code.google.com/p/princeton-mvpa-toolbox/) using a support vector machine (SVM) classifier (libSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/). The SVM model used a linear kernel function and a constant cost parameter, C = 1, to compute a hyperplane that best separated the trial responses. To test the accuracy of the SVM classifiers, we used a “leave-one-run-out” N-fold cross-validation in which, on each iteration, a single fMRI run was reserved for classifier testing and the classifier was trained on the remaining N − 1 runs. We repeated this procedure until all N runs had been tested and then averaged across the N iterations in order to produce a representative classification accuracy measure for each participant, ROI, trial epoch, and multiclass or pairwise discrimination (see Duda et al. 2001).
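The fold structure of leave-one-run-out cross-validation can be sketched as follows. This is an illustration only, not the Matlab/Princeton MVPA Toolbox code used in the study: the function names are hypothetical, and `fold_accuracy` stands in for whatever routine trains a classifier on the training trials and scores it on the held-out run.

```python
def leave_one_run_out(run_labels):
    """Yield (held_out_run, train_indices, test_indices), one fold per run."""
    for held_out in sorted(set(run_labels)):
        train = [i for i, r in enumerate(run_labels) if r != held_out]
        test = [i for i, r in enumerate(run_labels) if r == held_out]
        yield held_out, train, test

def cross_validated_accuracy(run_labels, fold_accuracy):
    """Average the per-fold accuracies across all N held-out runs.

    fold_accuracy(train_indices, test_indices) is a caller-supplied
    (hypothetical) function that trains on the N - 1 training runs and
    returns the proportion correct on the held-out run.
    """
    accuracies = [fold_accuracy(train, test)
                  for _, train, test in leave_one_run_out(run_labels)]
    return sum(accuracies) / len(accuracies)

# Hypothetical example: 6 trials drawn from 3 runs
runs = [1, 1, 2, 2, 3, 3]
folds = list(leave_one_run_out(runs))
print(folds[0])  # (1, [2, 3, 4, 5], [0, 1])
```

Splitting by run rather than by trial keeps training and test data from sharing a run, which is the point of reserving a whole run for testing.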
Multiclass and Pairwise Discriminations
SVMs are designed for classifying differences between 2 patterns, and LibSVM (the SVM package implemented here) uses the so-called one-against-one method for multiclass classification (Hsu and Lin 2002). With the SVMs, we performed 2 complementary types of classification analyses: one in which the multiple pairwise results were combined in order to produce multiclass discriminations (distinguishing among 3 trial types) and the other in which the individual pairwise discriminations (i.e., Grasp-to-Hold vs. Grasp-to-Place-Left, Grasp-to-Hold vs. Grasp-to-Place-Right, and Grasp-to-Place-Left vs. Grasp-to-Place-Right) were examined and tested separately.
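As a rough illustration of how one-against-one voting combines pairwise decisions into a multiclass prediction, consider the sketch below. The function name, the `pairwise_winner` callback, and the tie-breaking rule (the first listed class among those with the most votes) are assumptions made for the example rather than a description of the toolbox internals.

```python
from itertools import combinations

def one_against_one(classes, pairwise_winner):
    """Combine binary decisions into one multiclass prediction by voting.

    pairwise_winner(a, b) is a hypothetical callback returning whichever
    of the two classes the binary classifier for that pair predicts.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    # Assumed tie-break: the first class in `classes` with the most votes
    predicted = max(classes, key=lambda c: votes[c])
    return predicted, votes

# Hypothetical binary decisions for a single trial
decisions = {
    ("Hold", "Left"): "Left",
    ("Hold", "Right"): "Right",
    ("Left", "Right"): "Left",
}
predicted, votes = one_against_one(
    ["Hold", "Left", "Right"], lambda a, b: decisions[(a, b)]
)
print(predicted, votes)  # Left {'Hold': 0, 'Left': 2, 'Right': 1}
```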
The multiclass discrimination approach allowed for an examination of the distribution of the classifier guesses through the visualization of the resulting “confusion matrix” (for such visualizations, see Supplementary Material). In a confusion matrix, each row (i) represents the instances of the actual trial type and each column (j) represents the predicted trial type. Their intersection (i, j) represents the (normalized) number of times a given trial type i is predicted by the classifier to be trial type j. Thus, the confusion matrix provides a direct visualization of the extent to which a decoding algorithm confuses (or correctly identifies) the different classes. All correct guesses are located in the diagonal of the matrix (with classification errors represented by non-zero values outside of the diagonal) and average decoding performance is defined as the mean across the diagonal. The values in each row sum to 1 (100% classification). If decoding is at chance levels, then classification performance will be at 1/3 = 33.3%. For all multiclass discriminations, we statistically assessed decoding significance across participants (for each ROI and trial epoch) using two-tailed t-tests versus 33.3% chance decoding.
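A row-normalized confusion matrix and its mean diagonal can be computed as in the following sketch. The trial labels and classifier guesses are hypothetical, and the code assumes every class occurs at least once among the actual trial types so that no row sums to zero.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows index the actual trial type, columns the predicted trial type.
    Each row is normalized to sum to 1."""
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    return [[n / sum(row) for n in row] for row in counts]

def mean_diagonal(matrix):
    """Average decoding performance: the mean of the diagonal entries."""
    return sum(matrix[k][k] for k in range(len(matrix))) / len(matrix)

classes = ["Hold", "Left", "Right"]
# Hypothetical data: 4 trials per type with the classifier's guesses
actual = ["Hold"] * 4 + ["Left"] * 4 + ["Right"] * 4
predicted = ["Hold", "Hold", "Hold", "Left",
             "Left", "Left", "Right", "Right",
             "Right", "Right", "Right", "Hold"]

cm = confusion_matrix(actual, predicted, classes)
print(cm)  # [[0.75, 0.25, 0.0], [0.0, 0.5, 0.5], [0.25, 0.0, 0.75]]
print(round(mean_diagonal(cm), 3))  # 0.667
```

In this made-up example the mean diagonal (66.7%) exceeds the 33.3% chance level, and the off-diagonal entries show that Left trials were the ones most often confused with Right trials.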
Examination of pairwise discriminations allowed us to identify ROIs encoding movement complexity and the spatial end goals. For example, if an ROI discriminates Grasp-to-Hold versus Grasp-to-Place-Left AND Grasp-to-Hold versus Grasp-to-Place-Right trials, but not Grasp-to-Place-Left versus Grasp-to-Place-Right trials, it would suggest that the area in question may discriminate movement complexity (this is because the Grasp-to-Place-Left and Grasp-to-Place-Right trials require more elaborate movements than the Grasp-to-Hold trials), but not the final spatial goals of the Grasp-to-Place movements (i.e., whether the cube will be placed in the left versus right cup). It is important to recognize that this hypothetical result would be largely obscured using a multiclass discrimination approach. For pairwise discriminations, we statistically assessed decoding significance across participants using two-tailed t-tests versus 50% chance decoding. For both the multiclass and pairwise discriminations, an FDR correction of q ≤ 0.05 was applied based on the number of ROIs examined (Benjamini and Hochberg 1995).
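The Benjamini and Hochberg (1995) step-up procedure can be sketched as below. The P values are hypothetical (one per ROI, for 8 ROIs) and the function name is illustrative; the sketch shows the general procedure, not the specific implementation used for the reported statistics.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of Booleans marking which tests survive FDR control.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * q,
    then declare the k smallest P values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            max_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            significant[i] = True
    return significant

# Hypothetical P values, one per ROI
p_vals = [0.039, 0.001, 0.205, 0.008, 0.041, 0.06, 0.042, 0.074]
print(benjamini_hochberg(p_vals, q=0.05))
# [False, True, False, True, False, False, False, False]
```

Note that P = 0.039 would pass an uncorrected 0.05 threshold but does not survive the correction here, because its rank-scaled criterion is 3/8 × 0.05 = 0.01875.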
Searchlight Pattern-Information Analyses
To complement the ROI analyses, we also performed a whole-brain pattern analysis in each individual using a searchlight approach (Kriegeskorte et al. 2006). Here, the classifier moved through each individual participant's (Talairach-normalized) brain in a voxel-by-voxel fashion whereby, at each voxel, a sphere of surrounding voxels (searchlight sphere radius of 3 voxels, n = 123) was extracted and input into the SVM classifier. The decoding accuracy for that sphere of voxels was then written to the central voxel. This searchlight procedure was performed for each of the pairwise discriminations and used the activity patterns associated with the Plan epoch (see Inputs to the SVM Classifier). Thus, for each subject, 3 whole-brain maps of classification accuracies were obtained: one for the Grasp-to-Hold versus Grasp-to-Place-Left comparison during planning, one for the Grasp-to-Hold versus Grasp-to-Place-Right comparison during planning, and another for the Grasp-to-Place-Left versus Grasp-to-Place-Right comparison during planning. For each voxel, we statistically assessed decoding significance across participants using a two-tailed t-test versus 50% chance decoding. For the whole-brain group results, we used cluster-size-corrected alpha levels; this involved thresholding the individual voxels at P < 0.05 (uncorrected) and then applying a cluster-size threshold generated by a Monte Carlo style permutation test (implemented in AlphaSim, neuroelf.net) that maintains Type I Error rate at the 0.05 level.
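The sphere size of n = 123 follows from counting the voxel offsets whose Euclidean distance from the center is at most 3 voxels. A minimal sketch (the function name is illustrative):

```python
def sphere_offsets(radius):
    """Integer (x, y, z) offsets lying within `radius` voxels of the center."""
    r = int(radius)
    return [(x, y, z)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            for z in range(-r, r + 1)
            if x * x + y * y + z * z <= radius * radius]

offsets = sphere_offsets(3)
print(len(offsets))  # 123
```

Adding these offsets to the coordinates of a given central voxel yields the set of voxels whose activity patterns would be passed to the classifier for that searchlight position.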
Inputs to the SVM Classifier
BOLD percent signal change values for each ROI and searchlight voxel provided inputs to the SVM classifier. The percent signal change response was computed from the time-course activity at a time point(s) of interest with respect to the time-course of a run-based averaged baseline value, for all voxels in the ROI. The baseline window was defined as volume −1 (averaged across all trials within an experimental run), a time point prior to the onset of each trial that also avoids contamination from responses of the previous trial. For the Plan epoch—the time points of critical interest—we extracted for each trial the average of the final 2 imaging volumes prior to the subject hearing the auditory cue to initiate a movement (see Fig. 1E gray shading bordered in light blue). Note that, due to the jittered timing of the delay intervals, these final 2 imaging volumes differed across trials with respect to the amount of time for which individuals had been planning a movement. For the Execute epoch time points, we extracted for each trial the average of imaging volumes 4–5 (with respect to onset of the Execute epoch), time points generally corresponding to the peak (and time point following the peak) of the transient motor execution response, which accompanies initiation of the movement sequence (see percentage signal change time-courses in Figs 5–6). The time points extracted for pattern classification are similar to those used in our previous work (e.g., Gallivan, McLean, Valyear, et al. 2013).
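As a sketch of the Plan-epoch input for one voxel on one trial, the example below averages the final 2 volumes before the Go cue and expresses that average relative to the run-based baseline. The formula 100 × (signal − baseline) / baseline is the conventional definition of percent signal change and is assumed here; the text does not spell it out. The function name and the raw signal values are hypothetical.

```python
def percent_signal_change(volumes, baseline):
    """Percent signal change of the mean of `volumes` relative to `baseline`.

    Assumes the conventional definition 100 * (signal - baseline) / baseline.
    """
    mean_signal = sum(volumes) / len(volumes)
    return 100.0 * (mean_signal - baseline) / baseline

# Hypothetical raw values for one voxel
run_baseline = 1000.0                 # volume -1, averaged across trials in the run
plan_volumes = [1004.0, 1006.0]       # final 2 volumes before the Go cue

print(percent_signal_change(plan_volumes, run_baseline))  # 0.5
```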
Following the extraction of each trial's activity, these values were rescaled between −1 and +1 across all trials for each individual voxel within an ROI or searchlight sphere (Misaki et al. 2010). This epoch-dependent analysis approach, in addition to revealing which types of object-directed action sequences could be decoded, allowed us to examine when in time movement information was available from the patterns of brain activity (i.e., during the Plan and/or Execute epoch of the trial).
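The per-voxel rescaling to the range −1 to +1 can be written as a min-max transform, sketched below. The function name and input values are hypothetical, and the handling of a voxel with constant activity across trials (mapped to 0) is an assumption added so that the example does not divide by zero.

```python
def rescale_voxel(values):
    """Linearly rescale one voxel's values across trials to [-1, +1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant voxel carries no pattern information
        return [0.0] * len(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

# Hypothetical percent-signal-change values for one voxel across 4 trials
trial_values = [0.5, 1.0, 1.5, 0.75]
print(rescale_voxel(trial_values))  # [-1.0, 0.0, 1.0, -0.5]
```

Applying the transform voxel by voxel puts every feature on a common scale before classification, so that voxels with large response amplitudes do not dominate the SVM solution.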
Behavioral Control Experiment
All subjects participated in a behavioral testing session (performed outside the MRI scanner and before the fMRI experiments) in which their eye fixations and forces corresponding to manipulatory events (i.e., liftoff and replacement of the cube object and dropping the cube in either cup) were measured as they completed the action-sequence tasks. This testing session was used for participant screening (1 individual was excluded from further participating in the fMRI testing sessions due to poor fixation performance) and to determine, from an analysis of their force and eye-movement behavior, whether participants were, respectively, (1) maintaining the object-directed action sequence to be performed in memory over the delay period of each event-related trial (i.e., Plan epoch) and (2) able to reliably maintain fixation over the duration of an fMRI testing session (thereby arguing against alternative “eye-movement confound” interpretations of the fMRI data). Each participant completed 9 experimental runs, identical to those performed in the MRI scanner.
Measurement of Forces
In each trial, the participant lifted the cube object from a tabletop platform instrumented with force sensors (Nano 17 F/T sensors; ATI Industrial Automation) and then, depending on the prepared action, replaced the cube object in the same location (Grasp-to-Hold trial) or deposited it into 1 of the 2 cups (Grasp-to-Place-Left and Grasp-to-Place-Right trials). Three force sensors, which were capped with flat circular disks with a diameter of 3 cm, supported the cube (in its home position) and the 2 cups. The force sensors measured the vertical forces exerted by the cube object and the cups (signals sampled at 1000 Hz and low-pass filtered using a fourth-order, zero-phase lag Butterworth filter with a cutoff frequency of 10 Hz), allowing us to track the progression of the movement sequences (see Fig. 3B). Prior to beginning the experiment, participants received both verbal instructions and a demonstration by the experimenter as to how to correctly perform the object-directed action sequences (following this behavioral control experiment, participants recruited to take part in the MRI version of the task were then instructed to use the same general movements and timing). Note that force measurements in this behavioral testing session were primarily taken only to provide additional confirmation that participants were capable of performing the task correctly.
An infrared video-based eye-tracking system (ETL 500 pupil/corneal tracking system, ISCAN, Inc.), mounted below a headband, recorded the gaze position of the left eye at 240 Hz as the participant maintained fixation on a dot displayed on a computer monitor (1024 × 768; 60 Hz refresh rate) located directly behind the tabletop platform and positioned at an average across-participants height above the cube object of ∼9.45° visual angle. Gaze was calibrated using a two-step procedure: an initial five-point calibration using ISCAN's Line-of-Sight Plane Intersection Software followed by a 25-point calibration routine. Calibration points (4-mm-diameter circles) were shown on the computer monitor where the fixation point was projected and distributed over a region that incorporated the fixation point, the hand start location, and the locations of the cube home position and cups. The ISCAN calibration converted raw gaze signals into pixels from the line-of-sight camera and the 25-point calibration converted pixels (i.e., the output of the ISCAN calibration) into the coordinates of the computer monitor. Gaze was calibrated at the start of the experiment and was checked following each block of trials so that, if necessary, gaze could be re-calibrated before starting a new test block.
Behavioral Control Experiment
Measurement of the forces corresponding to manipulatory events in the separate behavioral testing session, as well as experimenter verification from videos collected inside the MRI scanner during the task, indicates that participants were able to reliably maintain in memory, over the delay period of each event-related trial, the object-directed action sequences to be performed. In addition, cumulative distributions of the standard deviation of horizontal and vertical gaze positions in all trials performed by all participants (see Fig. 3C), in combination with our observation during analysis that participants did not make saccades, demonstrate that participants had little difficulty maintaining their gaze at the fixation point. Nevertheless, to determine the extent to which small systematic movements of the eyes might account for the fMRI decoding of planned and executed object-directed action sequences, we further examined whether, for each of the Plan and Execute epochs of the trial, subtle differences in eye position and its variance were present between the different trial types (i.e., Grasp-to-Hold, Grasp-to-Place-Left, and Grasp-to-Place-Right). Following the removal of blinks and their related artifacts, this entailed computing the horizontal and vertical eye position means and SDs for each trial and trial type over 2 separate time bins: 1) the Plan epoch, defined as the onset of the auditory instruction (i.e., “Grasp,” “Left,” or “Right”) to the time that the auditory Go instruction was given and 2) the Execute epoch, defined as the onset of the auditory Go instruction to the time that the auditory instruction was given for the following trial (i.e., combining the Execute and ITI phases of the fMRI trial). These eye-movement measures were then each subjected to both univariate and multivariate analyses.
For the univariate analyses, we performed several repeated-measures ANOVAs (each with factor trial type). Importantly, we found that none of these ANOVAs reached significance (Plan Epoch-Horizontal eye position, F(1.595, 20.740) = 0.707, P = 0.474; Plan Epoch-Vertical eye position, F(1.330, 17.290) = 0.025, P = 0.976; Plan Epoch-Horizontal eye variability, F(1.847, 24.012) = 3.124, P = 0.061; Plan Epoch-Vertical eye variability, F(1.356, 17.632) = 0.098, P = 0.831; Execute Epoch-Horizontal eye position, F(1.328, 17.264) = 0.175, P = 0.751; Execute Epoch-Vertical eye position, F(1.485, 19.303) = 0.647, P = 0.490; Execute Epoch-Horizontal eye variability, F(1.498, 19.472) = 3.039, P = 0.083; Execute Epoch-Vertical eye variability, F(1.351, 17.560) = 1.152, P = 0.317; all tests Greenhouse–Geisser corrected).
For the multivariate analyses, we performed 2 separate classification analyses using SVMs. In the first analysis, the classifier inputs consisted of mean horizontal and vertical eye positions for each of the Plan and Execute epochs for each trial; in the second analysis, the classifier inputs instead consisted of the horizontal and vertical eye position SDs for each of the Plan and Execute epochs for each trial. Using the same leave-one-run-out cross-validation procedure and binary classification approach as implemented in the fMRI decoding analysis, we found that trial type decoding based on mean eye position and its SD was not significantly different than chance levels (i.e., 50%) for both the Plan and Execute epochs of the trial (Plan epoch-Eye position: Grasp-to-Hold vs. Grasp-to-Place-Left: 47.4%, SEM: 2.4%, P = 0.300; Grasp-to-Hold vs. Grasp-to-Place-Right: 47.7%, SEM: 1.9%, P = 0.253; Grasp-to-Place-Left vs. Grasp-to-Place-Right: 55.6%, SEM: 3.2%, P = 0.086; Plan epoch-eye variability: Grasp-to-Hold vs. Grasp-to-Place-Left: 43.1%, SEM: 3.8%, P = 0.091; Grasp-to-Hold vs. Grasp-to-Place-Right: 51.3%, SEM: 1.5%, P = 0.413; Grasp-to-Place-Left vs. Grasp-to-Place-Right: 47.1%, SEM: 2.3%, P = 0.224; Execute epoch-Eye position: Grasp-to-Hold vs. Grasp-to-Place-Left: 45.5%, SEM: 3.3%, P = 0.186; Grasp-to-Hold vs. Grasp-to-Place-Right: 47.2%, SEM: 1.9%, P = 0.170; Grasp-to-Place-Left vs. Grasp-to-Place-Right: 54.5%, SEM: 2.9%, P = 0.154; Execute epoch-eye variability: Grasp-to-Hold vs. Grasp-to-Place-Left: 53.3%, SEM: 3.8%, P = 0.400; Grasp-to-Hold vs. Grasp-to-Place-Right: 48.8%, SEM: 1.9%, P = 0.538; Grasp-to-Place-Left vs. Grasp-to-Place-Right: 52.3%, SEM: 3.3%, P = 0.498). Taken together, these univariate and multivariate results reveal negligible evidence of eye movements in our participants and suggest that differences in eye position and its stability are unlikely to account for any accurate decoding performance found throughout frontoparietal cortex and OTC.
For the sake of completeness, we also examined the extent to which differences in reaction time (RT) and movement time (MT) existed across the trial types. In the context of our task, we defined RT as the time from the onset of the Go cue to object contact (the latter being defined as the time when the absolute load force rate first exceeded 0.5 N/s), and we defined MT either as the time from object contact to object replacement (for Grasp-to-Hold trials) or from object contact to object placement in one of the cups (for Grasp-to-Place-Left and Grasp-to-Place-Right trials); cube replacement and cube placement in the cup were each defined as the time when the absolute load force rate first exceeded 0.5 N/s. Whereas a repeated-measures ANOVA of RT was non-significant (F(1.403, 18.236) = 2.248, P = 0.145; mean RTs: Grasp-to-Hold, 1491 ms; Grasp-to-Place-Left, 1435 ms; Grasp-to-Place-Right, 1492 ms), we found that this was not the case for MT (F(1.348, 17.529) = 9.373, P = 0.004; mean MTs: Grasp-to-Hold, 1106 ms; Grasp-to-Place-Left, 972 ms; Grasp-to-Place-Right, 1079 ms). This latter effect appears to be driven by very small but reliable MT differences between Grasp-to-Place-Left trials and each of the Grasp-to-Hold and Grasp-to-Place-Right trials (P = 0.021 and P = 0.001, respectively).
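The event-detection rule (the first time the absolute load force rate exceeds 0.5 N/s) can be sketched as follows. The function name and the force trace are hypothetical, and the force rate is approximated here by a simple first difference between successive samples, which is an assumption; the sketch presumes the signal has already been low-pass filtered, as described in the methods.

```python
def first_rate_crossing(force, sample_rate_hz, threshold=0.5):
    """Return the time (s) at which |dF/dt| first exceeds `threshold` (N/s),
    or None if it never does. dF/dt is a first difference (assumption)."""
    dt = 1.0 / sample_rate_hz
    for i in range(1, len(force)):
        rate = (force[i] - force[i - 1]) / dt
        if abs(rate) > threshold:
            return i / sample_rate_hz
    return None

# Hypothetical vertical force (N) under the cube, sampled at 1000 Hz;
# the force starts to drop at the fourth sample as the cube is contacted
force_trace = [2.0, 2.0, 2.0, 1.999, 1.99, 1.95]
contact_time = first_rate_crossing(force_trace, sample_rate_hz=1000)
print(contact_time)  # 0.003
```

With event times of this kind, RT would be the contact time minus the Go-cue onset, and MT the replacement or placement time minus the contact time.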
Localization of Frontoparietal ROIs
To determine the extent to which sequence-related information is represented in the voxel patterns of activity in frontoparietal cortex during action planning, we localized 8 different frontoparietal ROIs (SPOC, pIPS, aIPS, M1, SMA, PMd, PMv, and SSc), each thought to play key roles in action planning and control in both humans and NHPs.
Using the action-sequence experiment data, each of these aforementioned ROIs was defined via their elevated responses during movement execution with the contrast of Execute versus Planning (collapsed across trial types): Execute(Grasp-to-Hold + Grasp-to-Place-Left + Grasp-to-Place-Right) > Plan(Grasp-to-Hold + Grasp-to-Place-Left + Grasp-to-Place-Right). This contrast ensured that only voxels involved in initiating movements were included for analysis and directly follows from our previous work in the area (Gallivan, McLean, Smith, et al. 2011; Gallivan, McLean, Valyear, et al. 2011). All 8 of these ROIs were reliably activated and identified in the left hemisphere (i.e., contralateral to the acting right hand/limb) of each individual subject. Each ROI was defined at the single-subject level using stringent selection criteria and procedures outlined in the section Materials and Methods. See Figure 2 and Supplementary Table 1 for an overview of these areas.
Sequence-Related Decoding from Frontoparietal Cortex
fMRI pattern classification analyses revealed that, in several frontoparietal regions, we could successfully decode, prior to execution, which of the 3 sequences of object-directed actions participants were intending to perform. These decoding results are briefly discussed below in accordance with the nature of the sequence-related information that could be revealed from the regions (see Fig. 4 for a schematic overview of our ROI findings). It is worth noting that although in some areas we do in fact observe several interesting pattern classification profiles during movement execution (i.e., Execute epoch), any claims concerning this activity require some restraint. For instance, it is unclear during movement execution whether observed decoding may be linked to the motor actions being generated, the accompanying visual, proprioceptive, and tactile responses that are evoked, or—perhaps more likely—a combination of both motor- and sensory-related signals. Given this ambiguity, the primary focus of the current paper is on the pattern information that emerges prior to movement onset—points in time where the motor action (and its associated sensory consequences) has yet to be generated. Thus, the Execute epoch findings, when relevant, are only briefly discussed.
In SSc, we found no above-chance decoding during the Plan epoch with either the multiclass or pairwise discrimination pattern analyses (see the bar plots in Fig. 5; see also Supplementary Table 2 for stats). Importantly, however, we did find significant decoding of all 3 object-directed action sequences when analyzing the Execute epoch-related activity (Fig. 5). This is consistent with neural discriminations related to the tactile feedback received by the hand once the task has actually been initiated. These control findings, in addition to confirming the well-documented role of SSc in sensory feedback processing, suggest that—at least during movement preparation—signals for intended actions might be primarily constrained to areas with well-documented planning-related responses (see also Gallivan, McLean, Smith, et al. 2011; Gallivan, McLean, Valyear, et al. 2011). Taken together, these SSc findings offer a good control of data quality (i.e., showing both negative and positive decoding effects for the Plan and Execute epochs of the trial, respectively) and strongly reinforce the notion that the signals being discriminated with the pattern classification methods are unlikely to arise simply due to chance. Decoding analyses in non-brain control regions (see Materials and Methods and Supplementary Material) were used to further ensure that our decoding accuracies are unlikely to result from spurious factors related to the task (see Supplementary Fig. 1).
The multiclass discriminations in SPOC showed that, during preparation (i.e., based on activity during the Plan epoch), the 3 action sequences could be reliably discriminated from each other (Fig. 5, Supplementary Table 2). However, further examination of the individual pairwise discriminations (Fig. 5, pink, cyan, and purple bars) revealed that successful multiclass discrimination was driven largely by correct classifications of the Grasp-to-Hold versus Grasp-to-Place-Left and Grasp-to-Hold versus Grasp-to-Place-Right trial types and not those of the Grasp-to-Place-Left versus Grasp-to-Place-Right trial types (in which decoding accuracies were not significantly above-chance classification levels; see Supplementary Table 2). Notably, the exact same pattern of results, for both the multiclass and pairwise discriminations, was also revealed in both aIPS and PMv (see Fig. 5; see also Supplementary Table 2). These findings suggest that neural activity in SPOC, situated at one of the earliest levels of visual processing for action in posterior parietal cortex, as well as preparatory activity in 2 areas frequently associated with grasp-selective responses, aIPS and PMv, may primarily represent the complexity of the upcoming movement sequence (i.e., whether a Grasp-to-Hold versus Grasp-to-Place movement will be performed) rather than the spatial end goals of the more complex object-directed sequences (i.e., in which particular cup the cube will be placed).
Notably, investigation of the planning-related signals in all remaining frontoparietal regions (i.e., pIPS, M1, SMA and PMd) revealed that the 3 object-directed action sequences were differently represented (see multiclass and pairwise decoding bar plots in Fig. 5; see also Supplementary Table 2). This result suggests that each of these regions, though likely playing different and unique roles, is at some level involved in encoding each of the object-directed action sequences to be performed upon the centrally located cube object. Although the ability to decode the intended final spatial goals of the action sequences (i.e., the Grasp-to-Place movements) in several of these areas is consistent with some previous fMRI work describing target location-related signals in these same regions (Beurze et al. 2007; Stark and Zohary 2008; Beurze et al. 2009; Beurze et al. 2010; Gallivan, McLean, Smith, et al. 2011; see Filimon 2010 for review), here we show that this spatial goal encoding must be somewhat invariant to the initial series of actions (i.e., the Grasp-to-Hold movements), as that component of the sequence is identical across both Grasp-to-Place actions. In effect, this demonstrates that preparatory signals in many of the aforementioned areas must be tuned to the second-next movement of the sequence.
Localization of Occipitotemporal ROIs
To additionally determine whether sequence-related information is represented in the voxel patterns of activity in OTC during action planning, we further localized 8 different ROIs (left and right LO, pFs, EBA, and FBA), each of these being involved in either object- or body-related processing. Using the localizer data collected in a separate fMRI testing session (see Materials and Methods), in each subject, left and right LO and pFs were reliably identified via their increased responses to intact versus scrambled objects, and left and right EBA and FBA were reliably identified via their increased responses to bodies versus other stimulus categories (conjunction contrast of bodies > tools, objects, and scrambled images). See Materials and Methods for ROI selection criteria and procedures. See Figure 2 and Supplementary Table 1 for an overview of these areas.
Sequence-Related Decoding from Occipitotemporal Cortex
Given that the activity of OTC is typically linked to the processes involved in visual perception and object recognition and not those involved in action planning and control (Goodale and Milner 1992), the reliability with which we were able to predict different action sequences from the localizer-defined object- and body-selective areas in OTC, during planning and before movement initiation, was noteworthy. The results for OTC, like those for frontoparietal cortex, are briefly discussed below in accordance with the nature of the sequence-related information that could be decoded from the regions.
The first finding worth noting is that, in both right LO and right EBA, we were unable to extract any sequence-related information prior to movement onset (see Fig. 6; see also Supplementary Table 3). In contrast, we found that, in both left pFs and left FBA, the pre-movement activity could be used to reliably decode which of the upcoming Grasp-to-Place movements subjects were going to make (i.e., Grasp-to-Place-Left vs. Grasp-to-Place-Right) but could not reliably decode differences between the Grasp-to-Place and Grasp-to-Hold trials (see Fig. 6; see also Supplementary Table 3). Notably, investigation of the planning-related signals in all remaining OTC regions (i.e., left LO, right pFs, left EBA, and right FBA) revealed that the intention to perform each of the 3 object-directed action sequences was differently represented prior to movement onset (see Fig. 6; see also Supplementary Table 3). In brief, these findings extend previous reports of action-related processing in OTC (e.g., Astafiev et al. 2004; Orlov et al. 2010; Gallivan, Chapman, et al. 2013; Gallivan, McLean, Valyear, et al. 2013) by showing that, during planning, contralateral LO and EBA represent not just the initial action to be performed upon an object (i.e., grasping) but also the second-next movement in the sequence (i.e., placing). When contrasted with the lack of sequence-related decoding found in right LO and right EBA (noted above), one speculative possibility is that preparation-related activity in LO and EBA may be preferentially linked to the limb (right hand) to be used in the movement. We note, however, that future testing with the other (left) limb would be required to unequivocally make such claims of contralaterality in LO and EBA. Likewise, the extent to which the contralaterality of these effects may reflect, in part, the handedness (right) of our participants remains unclear.
Sequence-Related Decoding across OTC ROIs
Given some of the marked differences in decoding observed across OTC during planning, we next examined the extent to which between-region differences in decoding for the 3 pairwise comparisons (i.e., Grasp-to-Hold vs. Grasp-to-Place-Left, Grasp-to-Hold vs. Grasp-to-Place-Right, and Grasp-to-Place-Left vs. Grasp-to-Place-Right) reached statistical significance. We reasoned that if decoding is in fact lateralized in posterior OTC (i.e., LO and EBA) and if certain object- and body-selective ROIs encode features of movement more strongly than others, then comparisons of the decoding performance between regions might reveal some of this functional architecture. Using the Plan epoch decoding accuracy values, we performed 2 separate 4 (number of ROIs) × 3 (number of pairwise comparisons) omnibus repeated-measures ANOVAs (rm-ANOVA): one for the object-processing regions (left and right LO and pFs) and one for the body-processing regions (left and right EBA and FBA).
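As a rough illustration of the omnibus test described above, the following Python sketch computes the uncorrected F ratio for the main effect of ROI in a subjects × ROIs × pairwise-comparisons layout. The function name `rm_anova_roi_effect` and the toy accuracies are hypothetical, and the Greenhouse–Geisser correction is omitted for brevity, so this should be read as a sketch of the design rather than the analysis pipeline actually used.

```python
from statistics import mean

def rm_anova_roi_effect(acc):
    """Uncorrected F-test for the main effect of ROI.

    acc[s][r][c] is the decoding accuracy of subject s in ROI r for
    pairwise comparison c. Returns (F, df_effect, df_error).
    """
    n_sub, n_roi, n_cmp = len(acc), len(acc[0]), len(acc[0][0])
    # Collapse over pairwise comparisons: one value per subject and ROI.
    cell = [[mean(acc[s][r]) for r in range(n_roi)] for s in range(n_sub)]
    grand = mean(v for row in cell for v in row)
    roi_mean = [mean(cell[s][r] for s in range(n_sub)) for r in range(n_roi)]
    sub_mean = [mean(row) for row in cell]
    # Effect: spread of the ROI means around the grand mean.
    ss_roi = n_sub * n_cmp * sum((m - grand) ** 2 for m in roi_mean)
    # Error: ROI-by-subject interaction (what remains after removing
    # the ROI and subject main effects from each cell).
    ss_err = n_cmp * sum(
        (cell[s][r] - roi_mean[r] - sub_mean[s] + grand) ** 2
        for s in range(n_sub) for r in range(n_roi)
    )
    df_roi, df_err = n_roi - 1, (n_roi - 1) * (n_sub - 1)
    return (ss_roi / df_roi) / (ss_err / df_err), df_roi, df_err

# Hypothetical accuracies: 3 subjects x 4 ROIs x 3 pairwise comparisons.
toy = [
    [[0.60, 0.62, 0.58], [0.50, 0.52, 0.49], [0.57, 0.55, 0.59], [0.56, 0.58, 0.54]],
    [[0.64, 0.61, 0.63], [0.53, 0.51, 0.50], [0.55, 0.58, 0.56], [0.59, 0.57, 0.60]],
    [[0.58, 0.60, 0.61], [0.49, 0.52, 0.51], [0.60, 0.57, 0.58], [0.55, 0.56, 0.58]],
]
F, df1, df2 = rm_anova_roi_effect(toy)
print(f"Main effect of ROI: F({df1}, {df2}) = {F:.2f}")
```

A sphericity correction such as Greenhouse–Geisser would scale both degrees of freedom by an estimated epsilon, which is why corrected tests are reported with fractional degrees of freedom.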
For the object-selective ROIs rm-ANOVA, only the main effect of ROI was significant (F2.126 = 5.881, P = 0.002, Greenhouse–Geisser corrected), suggesting differences in information decoding across the ROIs. To further investigate these differences, we performed a series of planned comparisons (using paired-sample t-tests) to test whether, for each pairwise comparison, decoding accuracies differed for homologous regions in the left and right hemispheres (i.e., L-LO vs. R-LO and L-pFs vs. R-pFs) and whether they differed for posterior and anterior object-selective ROIs within the same hemisphere (i.e., L-LO vs. L-pFs and R-LO vs. R-pFs). For the sake of completeness, we report both the significant effects (P ≤ 0.05) and trends toward significance (P ≤ 0.15). Notably, we found differences (and trends toward differences) in decoding accuracies for Grasp-to-Hold versus Grasp-to-Place-Left, Grasp-to-Hold versus Grasp-to-Place-Right, and Grasp-to-Place-Left versus Grasp-to-Place-Right comparisons between the following ROIs: L-LO > R-LO (Grasp-to-Hold vs. Grasp-to-Place-Left, P = 0.045, Grasp-to-Hold vs. Grasp-to-Place-Right, P = 0.005, Grasp-to-Place-Left vs. Grasp-to-Place-Right, P = 0.002), L-LO > L-pFs (Grasp-to-Hold vs. Grasp-to-Place-Right, P = 0.136), and R-pFs > R-LO (Grasp-to-Hold vs. Grasp-to-Place-Left, P = 0.040, Grasp-to-Hold vs. Grasp-to-Place-Right, P = 0.060, Grasp-to-Place-Left vs. Grasp-to-Place-Right, P = 0.022). Taken together, these results suggest that 1) decoding in LO is lateralized to the hemisphere contralateral to the hand carrying out the action sequences (i.e., L-LO) and 2) sequence-related decoding in pFs can be largely found within both hemispheres.
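The planned comparisons above are paired tests across subjects, one per pairwise decoding comparison. A minimal sketch of the test statistic follows; the helper `paired_t` and the accuracy values are hypothetical and are not the study's data, and the P value (which requires the t distribution) is left out.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(acc_a, acc_b):
    """Paired-sample t statistic for two ROIs measured in the same subjects.

    Returns (t, df); a positive t means acc_a exceeds acc_b on average.
    """
    if len(acc_a) != len(acc_b):
        raise ValueError("both ROIs need one accuracy per subject")
    # Work on within-subject differences, which removes between-subject
    # variability in overall decoding accuracy.
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical per-subject accuracies (proportion correct) for one
# pairwise comparison in two homologous ROIs.
l_lo = [0.62, 0.58, 0.66, 0.55, 0.61]
r_lo = [0.51, 0.53, 0.54, 0.49, 0.52]
t_stat, df = paired_t(l_lo, r_lo)
print(f"L-LO vs. R-LO: t({df}) = {t_stat:.2f}")
```

Because each subject contributes to both ROIs, the paired form is the appropriate one here; an independent-samples test would ignore that pairing.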
For the body-selective ROIs rm-ANOVA, only the main effect of ROI showed a trend toward significance (F2.466 = 2.466, P = 0.077, Greenhouse–Geisser corrected). We further investigated these decoding accuracy differences using the same tests and approach taken with the object-selective ROIs and found differences and trends toward significance between the following ROIs: L-EBA > R-EBA (Grasp-to-Hold vs. Grasp-to-Place-Right, P = 0.010, Grasp-to-Place-Left vs. Grasp-to-Place-Right, P = 0.044), L-EBA > L-FBA (Grasp-to-Hold vs. Grasp-to-Place-Right, P = 0.054), and R-FBA > R-EBA (Grasp-to-Place-Left vs. Grasp-to-Place-Right, P = 0.067). Similar to the pattern noted with the object-selective ROIs, these findings suggest a trend toward sequence-related decoding in EBA being largely lateralized contralateral to the acting hand, with this contralateral selectivity vanishing more ventro-anteriorly in FBA. This general pattern of effects is consistent with the gradient of OTC representations found in our previous work that used much simpler action-related tasks (see Gallivan, Chapman, et al. 2013).
To complement the ROI analyses and to determine whether, during planning, representations of action sequences could be found in brain areas outside the predefined frontoparietal and occipitotemporal ROIs, we also performed a whole-brain searchlight analysis (Kriegeskorte et al. 2006). Like the ROI analysis, the searchlight approach identified several areas of sequence-related decoding in left frontoparietal cortex and OTC, including parietal, motor, supplementary motor, and premotor areas, as well as both lateral occipital and ventro-anterior cortex (see Fig. 7; for the whole-brain percent decoding maps of the individual pairwise comparisons, see Supplementary Figs 2–4). In addition, the searchlight analysis further revealed decoding in several medial and lateral frontal/prefrontal regions, as well as the superior and middle temporal gyri, expanses of cortex that had not been considered in our a priori ROI analysis. Notably, the searchlight analysis also revealed that sequence-related decoding was not limited to the contralateral (left) hemisphere, as examined with the ROI-based analyses, but extended into the ipsilateral (right) hemisphere, albeit to a much lesser extent (see Fig. 7). These latter findings provide additional support for increasing evidence that action-based neural representations for hand- and/or limb-related movements can be observed within the ipsilateral hemisphere (Diedrichsen et al. 2013; Gallivan, McLean, Flanagan, et al. 2013; Waters-Metenier et al. 2014; Wiestler et al. 2014; though for some caveats in interpreting such representations, see Leone et al. 2014).
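The logic of a searchlight analysis can be sketched in a deliberately simplified form: a small window is moved across voxels, and a cross-validated decoding accuracy is assigned to each window centre. The sketch below uses a one-dimensional window and a leave-one-out nearest-centroid classifier as stand-ins; the actual analysis operates on three-dimensional neighbourhoods, and neither the classifier nor the toy data here is taken from the study. All function names are hypothetical.

```python
def nearest_centroid_loo(trials, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(trials, labels)):
        centroids = {}
        for lab in sorted(set(labels)):
            # Average all trials of this label except the held-out one.
            rows = [t for j, (t, l) in enumerate(zip(trials, labels))
                    if l == lab and j != i]
            centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
        guess = min(sorted(centroids), key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[lab])))
        correct += guess == y
    return correct / len(trials)

def searchlight_1d(trials, labels, radius=1):
    """Accuracy map: one leave-one-out score per voxel-centred window."""
    n_vox = len(trials[0])
    scores = []
    for centre in range(n_vox):
        lo, hi = max(0, centre - radius), min(n_vox, centre + radius + 1)
        local = [t[lo:hi] for t in trials]  # voxels inside the window
        scores.append(nearest_centroid_loo(local, labels))
    return scores

# Hypothetical data: voxels 0-2 carry no condition information,
# voxels 3-5 carry a clear condition-specific pattern.
labels = ["hold", "hold", "place", "place"]
trials = [
    [0.1, 0.0, 0.2,  1.0,  0.9,  1.1],
    [0.0, 0.2, 0.1,  1.1,  1.0,  0.9],
    [0.2, 0.1, 0.0, -1.0, -1.1, -0.9],
    [0.1, 0.1, 0.1, -0.9, -1.0, -1.1],
]
accuracy_map = searchlight_1d(trials, labels)
print(accuracy_map)
```

With so few trials, scores in uninformative windows are noisy and can fall below chance, which is one reason group-level searchlight maps are usually thresholded statistically rather than read directly.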
Despite there being many brain areas in which the results of ROI- and searchlight-based analyses appear to converge, we do observe some brain areas in which there are discrepancies between the 2 types of findings. For instance, there are some cortical areas considered in our ROI analysis, like left SMA for example, in which significant searchlight decoding appears limited to only one pairwise comparison (and not all 3 comparisons, as shown in Fig. 5), whereas in other areas, like right LO, we observe some searchlight decoding that is not captured by the ROI analysis (for comparison, see Fig. 6). Such discrepancies in the results of ROI- and searchlight-based pattern-information analysis approaches relate to a variety of factors (e.g., effects of normalizing and group-averaging) and have been quite well-documented in the neuroimaging field (e.g., Etzel et al. 2013). Moreover, the fact that, here, spatial smoothing was not applied to the group data presumably adds to such discrepancies. For these reasons, though more limited in scope, we place a stronger emphasis on the results of our ROI-based decoding analyses and include the searchlight-based results primarily for visualization purposes.
ROI-Based Univariate Analyses
The significant decoding shown here across both frontoparietal and OTC ROIs for the different object-directed action sequences is not evident at the coarser level of the mean response amplitudes within each area. When we averaged trial responses across all voxels in each ROI for the same time points as extracted for pattern classification (i.e., Plan and Execute epoch signals, see Fig. 8), we observed only a few significant differences for the 3 planned movements (for related statistics, see Supplementary Table 4). The results of these conventional univariate analyses demonstrate the importance of analyzing the distributed patterns of activity across a region. Indeed, one might erroneously conclude, based on an examination of the mean signal amplitude responses alone (in Fig. 8), that only a very small minority of areas within frontoparietal and OTC encode planned action sequences.
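The dissociation between mean amplitude and pattern information can be made concrete with a toy example. In the sketch below, two conditions evoke the same mean response across a hypothetical 4-voxel ROI but opposite spatial patterns, so a univariate comparison sees no difference while a simple pattern classifier separates held-out trials. The data are invented, and the nearest-centroid rule is used only as a stand-in for illustration, not as the classifier used in the study.

```python
from statistics import mean

# Hypothetical baseline-subtracted responses (trials x voxels).
train = {
    "Grasp-to-Hold":  [[1.25, -0.75, 1.0, -1.0], [0.75, -1.25, 1.0, -1.0]],
    "Grasp-to-Place": [[-1.0, 1.0, -0.75, 1.25], [-1.0, 1.0, -1.25, 0.75]],
}
held_out = [("Grasp-to-Hold", [1.0, -0.75, 1.25, -1.0]),
            ("Grasp-to-Place", [-1.25, 1.0, -0.75, 1.0])]

def roi_amplitude(trials):
    """Univariate summary: mean over all trials and voxels."""
    return mean(v for trial in trials for v in trial)

def centroid(trials):
    """Multivariate summary: mean pattern, one value per voxel."""
    return [mean(col) for col in zip(*trials)]

def classify(pattern, centroids):
    """Assign the label whose centroid is closest in squared distance."""
    return min(sorted(centroids), key=lambda lab: sum(
        (p - c) ** 2 for p, c in zip(pattern, centroids[lab])))

amplitudes = {lab: roi_amplitude(tr) for lab, tr in train.items()}
centroids = {lab: centroid(tr) for lab, tr in train.items()}
accuracy = sum(classify(x, centroids) == lab for lab, x in held_out) / len(held_out)

print("Mean ROI amplitude per condition:", amplitudes)
print("Pattern decoding accuracy on held-out trials:", accuracy)
```

In this constructed case the two conditions have identical mean amplitudes, yet both held-out trials are classified correctly from their voxel patterns.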
Here, we examined the neural mechanisms supporting the planning of real object-directed action sequences, in which the complete series of movements was fully prepared ahead of its initiation. We found that in several frontoparietal and occipitotemporal areas, using the preparatory patterns of fMRI activity that form prior to movement onset, we could decode and, in effect, predict which of the 3 action sequences was to be performed. These “predictive” neural signals were manifest only in the distributed patterns of activity of each region, as nearly all areas examined showed highly overlapping signal amplitude responses during movement preparation. Based on previous work in NHPs (Tanji 2001; Fogassi et al. 2005; Bonini et al. 2011), the fact that we could decode planned object-manipulation sequences from several frontoparietal areas, such as aIPS, PMv, and SMA, may not be particularly surprising. In other areas like SPOC or pIPS, however, there is very little previous evidence suggesting that movement sequences, let alone object-manipulation tasks, are represented so posteriorly in parietal cortex (though see Baldauf et al. 2008; Wiestler and Diedrichsen 2013 for related examples).
We also found that both object-selective (LO and pFs) and body-selective (EBA and FBA) areas in OTC appeared to represent sequence-related information. In particular, with our ROI analyses, we observed a differentiation in the processing of sequence-related information along the posterior–anterior axis of OTC: Whereas areas LO and EBA were found to represent intended movements in the left hemisphere only, areas pFs and FBA failed to show this same contralateral selectivity. These findings suggest that information related to action sequences is represented not only in the SMA or even other frontoparietal structures, as previously shown. Rather, it appears to be widespread and distributed throughout cortex, notably extending into several well-documented areas of the ventral visual pathway.
Current Findings in the Context of Past Work
A diverse range of complex, sequential actions is characteristic of human daily behavior (e.g., looking, eating, communicating, and playing an instrument), and past research has probed the neural representations of several of these. For instance, several lines of research in NHPs have examined the planning and execution of multi-step target-directed eye (Fujii and Graybiel 2003; Ohbayashi et al. 2003; Histed and Miller 2006) and reach (Batista and Andersen 2001; Lu and Ashe 2005; Baldauf et al. 2008) movements. Other NHP research has investigated the sequencing of arbitrary sets of hand movements (e.g., sequential actions involving pushing, pulling, or turning a manipulandum; see Tanji and Shima 1994) or the movements of a virtual cursor on a monitor (Saito et al. 2005; Mushiake et al. 2006). Other research, using “real” object-manipulation tasks, has investigated how the end goals of a movement sequence (e.g., eating versus placing) are represented with respect to their component movements (e.g., grasping, see Bonini et al. 2011). Given the constraints of the MRI environment, studying the neural basis of these latter, more naturalistic types of object-manipulation tasks in humans has been challenging. Accordingly, most previous fMRI studies have focused on how simpler motor sequences, like the finger-press responses used when typing on a keyboard, are cortically represented (e.g., Doyon et al. 2003; Koechlin and Jubault 2006; Wymbs et al. 2012; Wiestler and Diedrichsen 2013). Although this previous fMRI work points to an important role for frontoparietal circuits in generating action sequences, it has not suggested any such role for OTC. Why might this be the case?
We have previously shown that reach and grasp actions directed toward objects—much simpler than the more complex types of movements examined here—can also be decoded from OTC structures prior to movement onset (Gallivan, Chapman, et al. 2013; Gallivan, McLean, Valyear, et al. 2013). The current findings, in addition to extending this previous work and underscoring the complexity of the sensorimotor representations that can emerge at the level of OTC, appear to converge upon a common theme: The engagement of OTC seems to depend on the object-oriented nature of the sensorimotor processing required. That is, given the importance of OTC in object processing (e.g., Grill-Spector and Malach 2004), it is plausible that OTC may only be engaged—as here and in our previous work—during sensorimotor tasks that either require processing and knowledge of object properties, skilled interactions with those objects, or that alter the arrangement (and structure) of objects in the environment. Below, we further discuss other possible reasons and alternative explanations for why OTC may be preferentially engaged during object-oriented sensorimotor tasks.
Representation of Sequence Information in Frontoparietal and Occipitotemporal Networks
Implicit in nearly all interpretations of frontoparietal activity in NHPs is that the neuronal response patterns immediately preceding movement onset reflect parameters of the upcoming movements to be executed. For instance, pre-movement activity in traditional motor areas like M1 and PMd has been shown to be predictive of RT and movement variability and correlates well with several factors related to movement kinematics and kinetics (e.g., reach direction, distance, and velocity; see Scott 2008; Churchland et al. 2010). Likewise, even in regions further removed from the final motor pathways, like parietal cortex, pre-movement signals are often described as being effector-specific (e.g., coding the limb vs. eye) and interpreted within the context of the sensorimotor transformations required for action (e.g., reference frame transformations; see Cohen and Andersen 2002). Whereas the specific parameters coded in these pre-movement signals are a matter of significant and robust debate (Scott 2008; Cisek and Kalaska 2010; Shenoy et al. 2013), there is general agreement that the signals are somehow linked to the generation of upcoming movement. With respect to the current study, it seems likely that, given their traditional role in planning and control (Goodale and Milner 1992), the pre-movement signals observed here in several frontoparietal areas code both the movement complexity (e.g., movement duration, muscles used, types of actions performed; Grasp-to-Place vs. Grasp-to-Hold trials) and spatial end goals (e.g., final target location; Grasp-to-Place-Left vs. Grasp-to-Place-Right trials) of prepared object-directed action sequences. This, however, raises the question: If the activity patterns in frontoparietal cortex are somehow linked to the parameters of movement preparation, then what is being represented in OTC, which appears to contain some of the same sequence-related information?
Visual areas of the brain are necessary for processing motor-relevant target properties (such as spatial location), and behavioral studies indicate that an automatic component of preparing multi-step action sequences (such as reaching to multiple locations serially) is the deployment of visual attention to each of the goal locations in parallel (Baldauf et al. 2006). In accordance with this notion, the current OTC results may reflect the simultaneous deployment of visual attention, prior to movement, to all of the “task-relevant” objects on a given trial (i.e., the cube on Grasp-to-Hold trials, the cube and left cup on Grasp-to-Place-Left trials, and the cube and right cup on Grasp-to-Place-Right trials). Similarly, during movement preparation, the brain needs to keep track of the relative position of the hand with respect to the object(s) (e.g., Pesaran et al. 2006), and it is also possible that some portion of the pre-movement responses in OTC reflects a perceptual/sensory representation of the hand that is dynamically updated in the context of upcoming movements. Unfortunately, the current experimental design does not allow us to disentangle these possibilities, and future studies will be required to determine the exact nature of the pre-movement response patterns in OTC. For example, it would be interesting to test whether the encoding in LO and pFs is more tightly linked to the objects to be acted upon (e.g., cube and cups), whereas in EBA and FBA, it is more tightly linked to upcoming changes in the position of the hand (e.g., move left versus right).
Another, related, possibility is that some portion of the OTC activity reflects efference copies of the planned action sequences (Iacoboni et al. 2001; Orlov et al. 2010; Downing and Peelen 2011; Jastorff et al. 2012). In NHPs, parietal areas like aIPS are reciprocally connected with ventral pathway structures like inferotemporal cortex (IT), which contains areas involved in object recognition (Borra et al. 2008). Prefrontal areas, which interconnect densely with supplementary motor and premotor areas, are also interconnected with IT (Webster et al. 1994; Borra et al. 2010; Gerbella et al. 2010; Gerbella et al. 2013). Thus, the connectivity of OTC is entirely consistent with it receiving sequence-related information from (and sharing object-related information with) a variety of sensorimotor and cognitive structures (for an expansion on this general idea, see Mahon and Caramazza 2009; Mahon and Caramazza 2011). One important reason for sharing efference copies with OTC prior to movement initiation would be so that it can anticipate the sensory consequences of moving certain body parts (Haarmeier et al. 1997; Keysers and Perrett 2004). Given the delay of incoming sensory signals, this would allow for 1) movements of the body to be distinguished perceptually from movements of the world (von Helmholtz 1866; Haarmeier et al. 2001; Shergill et al. 2003) and 2) a sensorimotor updating of the forward-state estimations used for visually monitoring and implementing corrective actions for ongoing movements (Wolpert and Flanagan 2001; Johansson and Flanagan 2009).
Limitations to Interpretation
In principle, the representation of sequence-related information in frontoparietal cortex and OTC may be attributable to other factors. In the case of OTC, some fMRI studies show that the human motion-selective area, MT, which partly overlaps with the EBA (Downing et al. 2007), can be activated by imagery of visual motion (Kaas et al. 2010; Seurinck et al. 2011). Thus, a possible explanation of our findings is that the discriminative activity patterns in OTC may reflect visual imagery of the intended action sequences (see Orlov et al. 2010; Downing and Peelen 2011; Kuhn et al. 2011 for discussions). Though we cannot rule out some modulatory effect of visual imagery, it seems unlikely to be the sole factor contributing to our OTC results. Recall that, in our ROI-based results, we observe a lateralization of sequence encoding to the left (contralateral) hemisphere in both LO and EBA. The effects of visual imagery, on the other hand, would be expected to result in discriminatory activity in LO and EBA in both hemispheres, given that imagined Grasp-to-Place-Left and Grasp-to-Place-Right movements should activate the corresponding left and right visual fields. On this point, although we do not find complete differentiation of all 3 movements during the Execute epoch in left and right LO and EBA (see Fig. 6), if visual imagery were able to “exclusively” account for the OTC results during planning, then one would expect that at least some level of decoding should arise in right LO and right EBA ROIs prior to movement. Thus, though we clearly cannot exclude the possibility that visual imagery may have had a modulating effect on the present OTC findings, it is, at the same time, unlikely to fully account for them.
With regard to our findings in both the frontoparietal cortex and OTC areas in which all 3 movements could be decoded, it is possible that the activity in these regions, rather than representing the entire sequence of upcoming actions (i.e., reach, grasp, lift, transport, and then place), may only be representing the single differentiable component of the action sequence (i.e., the Grasp-to-Place action). In principle, such encoding could lead to the exact same pattern of effects being observed (as an encoding of cup location “only” could also lead to differentiation of Grasp-to-Place-Left and Grasp-to-Place-Right trials). Although based on the design of the current study we are unable to exclude this possibility, it seems likely that if the observed decoding of Grasp-to-Place-Left versus Grasp-to-Place-Right movements in some areas was solely linked to the retinotopic position(s) of the Cup location, then we might have also expected such effects to be reliably reflected, at least to some extent, in the signal amplitude responses of the region (i.e., higher percentage signal change responses for the Cup location in the contralateral visual field; for review, see Silver and Kastner 2009), which was not the case. Nevertheless, we recognize that future work will be required to fully disentangle representations related to the preparation of entire movement sequences from those related to their component parts (e.g., the final target location only).
Lastly, several lines of work in NHPs have shown that the upcoming behavior or “intention” of an animal (e.g., to move the hand vs. eye or left vs. right limb) can be predicted based on the activity that forms prior to the movement in areas of parietal and premotor cortex (for reviews, see Andersen et al. 1997; Andersen and Buneo 2002; Andersen and Cui 2009). For these and other reasons, frontoparietal cortex is often ascribed a sensorimotor function during movement planning. As such, we expect any descriptions of “intention-related” activity for frontoparietal regions to be largely uncontroversial (though see Bisley and Goldberg 2010). Our descriptions of “intention-related” activity in the OTC, however, may be more controversial because this region is traditionally linked to the cognitive processes of visual perception and object recognition, not those of action (Goodale and Milner 1992). Indeed, in most cases, any sort of task-based modulation of activity in the region is subsumed under the general auspices of “attention-related” processing (for review, see Kastner and Ungerleider 2000; for recent NHP findings, see Gregoriou et al. 2012). New neural evidence in macaque monkeys, however, paints a much more complex picture of the types of signals contained in the ventral visual pathway. Steinmetz and Moore (2014), recording from neurons in ventral visual area V4, report a modulation of visual cortical responses during the preparation of saccadic eye movements that is separate from the visual responses associated with the focus of attention. Notably, this preparatory saccadic activity was qualitatively similar to that of covert attention (e.g., similar increases in neuronal firing rates and similar stimulus selectivity), despite the fact that visual information at the potential saccade target was behaviorally irrelevant. While this intriguing pattern of effects has several possible interpretations, it does suggest that saccade preparation itself (i.e., the “intention” to move one's eyes to a particular location in space) is sufficient to modulate visual cortex activity.
Historically, there exists a long and robust debate as to whether the signals that precede movement onset should be described as reflecting the processes of one's action “intentions” or one's allocation of “attention” (for reviews, see Moore et al. 2003; Andersen and Cui 2009; Bisley and Goldberg 2010). This debate over nomenclature is one that we wish to avoid entirely. The important fact is that we were able to predict upcoming sequences of behaviors from several OTC areas: Whether these representations reflect, for example, “intention” or “motor attention” (Rushworth et al. 2001), or some general attentional “priority map” that is then used to guide sequencing behavior (Ipata et al. 2009), remains highly controversial and will be a topic for future work.
Conclusions and Implications
Much effort is currently directed toward developing cognitive neural prosthetics, robotic devices operable by intention-related brain signals related to movement goals (Andersen et al. 2010). A key question in this field concerns where in human cortex such signals should be recorded. Here, we show that signals specifying complete object-directed action sequences in advance of their movement are represented in several areas of frontoparietal cortex and OTC. This raises the perhaps counterintuitive notion that neural signals, not just from frontoparietal cortex but also from OTC, might be used to operate such devices.
This work was supported by an operating grant from the Canadian Institutes of Health Research (CIHR) awarded to J.R.F., I.S.J., and J.P.G. (MOP126158). J.P.G. was supported by a Banting Postdoctoral fellowship and Ontario Ministry of Research and Innovation Postdoctoral fellowship.
The authors thank Adam McLean for useful discussions and Martin York, Sean Hickman, and Don O'Brien for technical assistance. Conflict of Interest: None declared.