When we move around in the environment, we continually change direction. Much work has examined how the brain extracts instantaneous direction of heading from optic flow but how changes in heading are encoded is unknown. Change could simply be inferred cognitively from successive instantaneous heading values, but we hypothesize that heading change is represented as a low-level signal that feeds into motor control with minimal need for attention or cognition. To test this, we first used functional MRI to measure activity in several predefined visual areas previously associated with processing optic flow (hMST, hV6, pVIP, and CSv) while participants viewed flow that simulated either constant heading or changing heading. We then trained a support vector machine (SVM) to distinguish the multivoxel activity pattern elicited by rightward versus leftward changes in heading direction. Some motion-sensitive visual cortical areas, including hMST, responded well to flow but did not appear to encode heading change. However, visual areas pVIP and, particularly, CSv responded with strong selectivity to changing flow and also allowed direction of heading change to be decoded. This suggests that these areas may construct a representation of heading change from instantaneous heading directions, permitting rapid and accurate preattentive detection and response to change.
The ability to use vision to guide interaction with the environment is fundamental to survival for many animals and a key role of vision is to guide and monitor self-motion. Gibson (1950) suggested that humans use “optic flow” to determine self-motion. For example, moving forward produces radial (expanding) retinal motion, with all movement starting from a single focus of expansion (FoE). Such flow can be used to infer direction of heading.
Natural locomotion produces optic flow that is complicated by movements of the head and eyes, making the recovery of heading more difficult. How the visual system recovers heading direction has been explored extensively. Some behavioral studies suggest that we recover heading using flow in conjunction with extraretinal information such as eye and head position (Royden et al. 1992). Others suggest that we are able to recover heading from the retinal signal alone, especially if it holds information about parallax and reference objects (Li and Warren 2000) but even if it contains only the flow field (Li and Cheng 2011). Extensive physiological studies of visual sensitivity to heading have been made in primates. Two cortical regions in particular, the dorsal middle superior temporal area (MSTd) and the ventral intraparietal area (VIP) have been shown to contain neurons that are sensitive to flow components (Tanaka and Saito 1989; Duffy and Wurtz 1991) and to the location of the FoE during forward motion (Duffy and Wurtz 1995; Bremmer, Duhamel et al. 2002). Electrical stimulation of these regions can influence heading judgments (Britten and van Wezel 2002; Zhang and Britten 2011) suggesting that they contribute directly to perceptual awareness of heading. Some MSTd and VIP cells also receive congruent vestibular input suggesting integration of visual and vestibular cues to heading (Duffy 1998; Gu et al. 2006, 2007; Chen et al. 2011; Fetsch et al. 2012), although incongruent visual–vestibular preferences are also common.
Previous neurophysiological and neuroimaging studies focus on the extraction of instantaneous heading direction. However, in natural locomotion, it is necessary to monitor “changes” in heading. Heading changes can occur frequently and rapidly (e.g., when playing ball games), and it is necessary to monitor these changes to permit accurate locomotor control. Reliance on instantaneous heading signals alone imposes serious limitations, even in quite simple circumstances. For example, when following a curved trajectory, instantaneous heading is tangential to the path and always indicates imminent departure from the intended trajectory. To recognize that one is turning sufficiently to keep to a curved path, it is necessary to extract either the direction and magnitude of change of instantaneous heading or, alternatively, the curvature of locomotor flow lines (Lee and Kalmus 1980; Warren et al. 1991).
Little is known of how changing locomotor direction is registered in the brain. A key unresolved issue is whether perception of changing heading is effected through the monitoring of instantaneous heading followed by a secondary “cognitive” process of tracking changes in heading over time, or whether magnitude and direction of change in heading are encoded at a relatively low level so that they can feed directly into motor control with minimal need for attention or awareness. The notion of a cognitive process is somewhat ill defined but, by analogy, acceleration of visual motion is thought not to be represented at a low level but to be inferred from a changing pattern of speed signals. Acceleration discrimination performance is poor and aftereffects of illusory deceleration cannot be induced by exposure to acceleration, suggesting that neurons that are tuned for specific values of acceleration/deceleration do not exist in the visual system and that we detect visual acceleration merely by noticing that speed has become different from a remembered value. In the case of heading, the question is whether neurons exist that respond to specific directions of heading change or whether we simply infer heading change by noticing a difference between current instantaneous heading and a remembered earlier heading direction.
Given the criticality of locomotor control to survival, the low-level detection of directional change would be optimal to ensure robust control and rapid error correction. The vestibular system, being sensitive to rotation, is potentially suited to this purpose. However, although the vestibular system is acutely sensitive to changes in head position, visual sensitivity to changes in locomotor direction could substantially increase the robustness of locomotor control mechanisms. The visual system also has the advantage of allowing directional changes to be referenced to the positions of external features (e.g., obstacles), information that is not present in vestibular signals. Vestibular signals are most useful for locomotion when integrated with visual signals, and the existence of visual estimates of directional changes would enhance and facilitate that integration.
In macaques, it appears that area MST does not encode the temporal properties of heading direction. At least 2 studies (Paolini et al. 2000; Mineault et al. 2012) have shown that when different optic flow patterns are presented in succession, the combined response is predictable from the component responses. There is evidence for an influence of spatial context on MST responses to flow (e.g., Froehler and Duffy 2002) but we know of no effects of temporal context. In other macaque flow-sensitive regions, such as VIP and V6, we know of no relevant investigations.
If specific visual heading-change signals do indeed exist in the brain, the neural substrate is, to our knowledge, completely unknown, in any species. In this study, we employ univariate and multivariate fMRI to examine sensitivity to heading change in several predefined motion-sensitive cortical regions.
Materials and Methods
To assert that changes in visual heading are encoded in a given cortical region, 2 conditions must be fulfilled. First, it must be shown that specific neural populations selectively respond to changing, as opposed to invariant, optic flow. Second, it must be shown that these neurons selectively respond to particular directions of change of heading, for example, turning to the left rather than to the right. To test for sensitivity to changing heading, we recorded the blood oxygen level–dependent (BOLD) response in healthy human volunteers during visually simulated movement across a ground plane (Experiment 1). Activity was measured in specific cortical regions associated with optic flow processing, namely, human MST (Huk et al. 2002; Wall et al. 2008), pVIP (Bremmer et al. 2001), visual area hV6 (Pitzalis et al. 2006; Cardin and Smith 2010), and the cingulate sulcus visual area (CSv; Wall and Smith 2008). For comparison, primary visual cortex (V1) was also examined. To test for specificity to direction of change of heading, we used multivoxel pattern analysis (MVPA) to assess whether different neural activity patterns are associated with different heading changes (Experiment 2).
Seven healthy volunteers took part (5 females). All had normal or corrected-to-normal vision. They were screened for MRI contraindications according to standard procedures and written consent was obtained. The experimental procedure was in accord with the Declaration of Helsinki and was approved by the appropriate local ethics committee.
Stimuli and Task
Computer-generated visual stimuli were projected by an LCD projector onto a rear-projection screen at the end of the scanner bore and were viewed via a mirror mounted on the head coil giving an image of ∼25 × 20° visual angle. The stimuli were created using a combination of OpenGL, MATLAB (The MathWorks, Inc.), ASF (Schwarzbach 2011), and Psychtoolbox-3 (Brainard 1997; Pelli 1997).
The stimuli are shown diagrammatically in Figure 1. Each motion stimulus lasted for 2 s and simulated a ground plane that filled the lower hemifield and consisted of white dots on a dark background (approximate luminance of dots 700 cd/m², dot density 11 dots/°²). Three types of simulated observer movement were used:
The “No-Change” condition simulated continuous forward linear motion of the observer across the ground plane (no heading change) while fixating a distant point on the horizon. It provided a baseline in terms of the BOLD response to simulated self-motion.
The “Change-FoE” condition (see Fig. 1, left column and Supplementary Movie 1) simulated the same forward motion but with an added sinusoidal right–left component to create a sinusoidal motion path, while the observer again fixated a distant point that was static in the external world. This caused the FoE to move sinusoidally back and forth along the horizon (eccentricity range ±4.7°). The FoE started in the center of the screen and completed one cycle (left, center, right, center) during the 2 s stimulus.
The “Change-Curve” condition (see Fig. 1, center column and Supplementary Movie 2) simulated motion along the same sinusoidal forward path as Change-FoE but now as if the eyes were always aligned with the instantaneous heading direction. In this condition the FoE remained static (at the center of the horizon) and the sinusoidal heading change was conveyed entirely by changing flow elsewhere in the image (dot trajectories of changing curvature). The purpose of this condition was to test whether any sensitivity to changing heading that might be observed in the Change-FoE condition is reliant on encoding heading in terms of the locus of the FoE.
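The FoE trajectory of the Change-FoE condition can be written down directly from the description above. The following is a minimal sketch, not the stimulus code used in the study: the function names, the 60 Hz frame rate, and the sign convention (negative azimuth for leftward) are assumptions introduced for illustration. Only the ±4.7° range, the 2 s duration, and the left, center, right, center order come from the text.

```python
import math

FOE_AMPLITUDE_DEG = 4.7   # eccentricity range of the FoE (from the text)
STIM_DURATION_S = 2.0     # one full cycle per stimulus (from the text)


def foe_azimuth(t):
    """Horizontal FoE position in degrees at time t (s); negative = left (assumed)."""
    # Starts at center, moves left first, then back through center to the right.
    return -FOE_AMPLITUDE_DEG * math.sin(2.0 * math.pi * t / STIM_DURATION_S)


def foe_trace(frame_rate_hz=60.0):
    """FoE position on every frame of one stimulus (frame rate is an assumption)."""
    n_frames = int(round(STIM_DURATION_S * frame_rate_hz))
    return [foe_azimuth(i / frame_rate_hz) for i in range(n_frames)]
```

In the Change-Curve condition the same heading signal would apply, but the FoE stays at the center of the horizon and the heading change is carried by the curvature of the dot trajectories instead.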
Inevitably, these stimuli that differ in terms of global heading cues also differ in terms of local motion. It is essential to establish whether any observed differences in neural activity elicited by the stimuli reflect differences in heading cues or trivial differences in local motion. To control for differences in local motion between the stimuli, 2 further conditions, “Ctrl-FoE” and “Ctrl-Curve” were added. These were based on Change-FoE and Change-Curve, respectively, but the horizontal starting position of each dot was randomized while preserving its 2D motion trajectory relative to its starting position (see Fig. 1, right-hand column and Supplementary Movie 3). This gave an impression of noisy downward global motion (due to the downward mean 2D trajectory) and also a weak impression of forward motion of the observer but with indeterminate heading (there was no FoE). The control stimuli gave no sense of changing heading, yet contained exactly the same set of local dot motions as the changing-heading stimuli. Although different, the 2 controls appeared very similar.
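The logic of the control stimuli can be illustrated with a short sketch. It assumes that each dot's trajectory is stored as an array of screen positions, which is a hypothetical data layout, and the function name and the ±12.5° horizontal extent (half of the ∼25° image width) are likewise illustrative. Edge handling such as wrap-around is omitted.

```python
import numpy as np


def scramble_horizontal_start(trajectories, x_min, x_max, rng):
    """Give every dot a new random horizontal start while keeping its path shape.

    trajectories: array (n_dots, n_frames, 2) of (x, y) positions in degrees.
    """
    scrambled = np.array(trajectories, dtype=float)  # work on a copy
    new_x0 = rng.uniform(x_min, x_max, size=scrambled.shape[0])
    shift = new_x0 - scrambled[:, 0, 0]
    # The same horizontal shift is applied to every frame of a dot, so the
    # frame-to-frame displacements (the local motion) are unchanged.
    scrambled[:, :, 0] += shift[:, np.newaxis]
    return scrambled


rng = np.random.default_rng(0)
original = rng.uniform(-12.5, 12.5, size=(50, 30, 2))  # placeholder trajectories
control = scramble_horizontal_start(original, -12.5, 12.5, rng)
```

Because only a constant horizontal offset is added per dot, the set of local motion vectors is identical in the original and control arrays, while the common origin of the trajectories (the FoE) is destroyed.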
Throughout each scan run, participants fixated just above the midpoint of the simulated horizon. Motion stimuli lasting 2 s were separated by a variable intertrial interval (ITI) with duration drawn from a Poisson probability distribution (Hagberg et al. 2001) with an average ITI of 6 s. During the ITI, the screen was blank apart from a central letter stream (see below). Each of the 5 motion conditions was repeated 8 times in a random order within each run (40 trials per run, duration ∼6 min), and the whole experiment was composed of 8 runs (320 trials total).
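A trial schedule of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the presentation code used in the study: the function name is hypothetical, and the ITI is assumed to be a Poisson variate in whole seconds with mean 6, since the text does not specify the time unit of the distribution.

```python
import numpy as np

CONDITIONS = ["No-Change", "Change-FoE", "Change-Curve", "Ctrl-FoE", "Ctrl-Curve"]
REPEATS_PER_RUN = 8      # from the text
STIM_DURATION_S = 2.0    # from the text
MEAN_ITI_S = 6.0         # from the text


def make_run_schedule(rng):
    """Return a list of (onset_s, condition) pairs for one run of 40 trials."""
    order = rng.permutation(np.repeat(CONDITIONS, REPEATS_PER_RUN))
    # Assumption: ITIs are Poisson-distributed integers, in seconds.
    itis = rng.poisson(MEAN_ITI_S, size=order.size)
    # Each trial is preceded by its ITI and followed by the next trial's ITI.
    onsets = np.cumsum(itis) + STIM_DURATION_S * np.arange(order.size)
    return [(float(t), str(c)) for t, c in zip(onsets, order)]
```

Under this assumption the expected length of the trial sequence is 40 × (2 + 6) = 320 s per run.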
To divert attention from the motion stimuli and maintain a constant attentional state, a demanding letter identification task was carried out at fixation. Throughout each scan run, a random letter stream appeared at fixation, the letter changing at 2 Hz. The participant searched the stream for the occurrence of either of 2 letters, E and F. On seeing F, they incremented a mental count and on seeing E, they decremented it. They reported the final count verbally at the end of the scan run.
Data were acquired using a 3T Siemens Trio MR scanner with an 8-channel array head coil. In Experiment 1, functional images were acquired with T2*-weighted gradient-recalled echo-planar imaging (EPI) sequence (35 axial slices, TR 2500 ms, TE 31 ms, flip angle 85°, resolution 3 × 3 × 3 mm). Structural data were acquired using a T1-weighted 3D anatomical scan (MPRAGE, Siemens; TR 1830 ms, TE 5.56 ms, flip angle 11°, resolution 1 × 1 × 1 mm).
Functional data were analyzed in terms of mean activity across all the voxels within each of a number of visual areas defined on the basis of separate localizer scans (see Fig. 2). Primary visual cortex (V1) was identified by a standard retinotopic mapping procedure (Sereno et al. 1995) employing an 8-Hz counterphasing checkerboard wedge stimulus (a 24° sector) of radius 12°. Check size was scaled by eccentricity in approximate accordance with the cortical magnification factor. The wedge rotated clockwise at a rate of 64 s/cycle, and 8 cycles were presented. This stimulus was presented twice to each participant, and the data from the 2 scan runs were averaged to give the final retinotopic maps.
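With a rotating wedge, each voxel responds periodically at the stimulus frequency, and the phase of that response indicates the polar angle it represents. One simple way to estimate the phase is to fit a sinusoid at the stimulus frequency by least squares, as sketched below. The function name is hypothetical, the repetition time of the localizer scan is not given in the text and must be supplied by the caller, and no correction for hemodynamic delay is included.

```python
import numpy as np

CYCLE_S = 64.0   # wedge rotation period (from the text)


def response_phase(timeseries, tr_s):
    """Phase (radians, 0 to 2*pi) of the response at the wedge frequency.

    timeseries: 1-D array of one voxel's signal; tr_s: repetition time (assumed).
    Model: y = a*cos(w*t) + b*sin(w*t) + c, so that phase = atan2(b, a).
    """
    t = np.arange(timeseries.size) * tr_s
    w = 2.0 * np.pi / CYCLE_S
    design = np.column_stack([np.cos(w * t), np.sin(w * t), np.ones_like(t)])
    a, b, _ = np.linalg.lstsq(design, timeseries, rcond=None)[0]
    return float(np.arctan2(b, a) % (2.0 * np.pi))
```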
hMT and hMST were defined based on a standard method (Dukelow et al. 2001; Huk et al. 2002). A circular patch of dots (8° diameter) was presented with its center placed 10° to the left or right of fixation. Blocks of 15 s in which the dots were static were alternated with blocks of 15 s in which the dots moved alternately inward and outward along the radial axes, creating alternating contraction and expansion. Sixteen blocks (8 static and 8 moving) were presented in each scan run; 1 run was completed with the stimulus on the left and another with it on the right. With this procedure, 2 regions that have been called hMT and hMST can be differentiated in terms of the absence or presence, respectively, of ipsilateral drive. It is likely that “hMST” comprises 2 or more regions that respond to motion and have large receptive fields, but further refinement requires demanding high-resolution mapping techniques (Amano et al. 2009; Kolster et al. 2010) that are beyond the scope of this study.
A third localizer was used to identify areas hV6, pVIP and CSv (Wall and Smith 2008; Cardin and Smith 2010). This consisted of 2 time-varying optic flows (light dots on a dark background). The first was egomotion-compatible optic flow that cycled through spiral space to simulate back-and-forth spiral motion of the observer. The second was an egomotion-incompatible 3 × 3 array of similar spiral motions. Each was presented for 15 s, separated by 15 s with no stimulus (except for a fixation point) and each was repeated 10 times. Participants were continuously engaged in a color counting task at fixation. Contrasting the activity elicited by these 2 stimuli isolates regions (hV6, pVIP, and CSv) that favor egomotion-compatible flow from those that respond well to any flow stimuli. CSv is as originally defined in the human brain (Wall and Smith 2008), and it is unknown whether a counterpart exists in macaque brain. Area hV6 corresponds closely to V6 of Pitzalis et al. (2006), and it is likely that it has similar functions and connectivity to macaque V6. The status of pVIP (putative VIP) is less certain. It appears to be the same region as human VIP of Bremmer et al. (2001), who suggested that it may be homologous with macaque VIP. It is possible that pVIP corresponds to IPS4 of Swisher et al. (2007), although this remains to be determined by direct comparison. The key point here is that it is a region in anterior IPS that is selectively responsive to visually simulated self-motion (Wall and Smith 2008) and the label “pVIP” is intended only to indicate a possible homology.
Data were analyzed using BrainVoyager QX 2.3 (Brain Innovation, The Netherlands), MATLAB (The MathWorks, Inc., USA) and R (R Foundation for Statistical Computing). The first 2 volumes of each run were discarded. Three-dimensional motion correction and slice time correction were performed. The data were temporally high pass filtered at 3 cycles/run (∼0.01 Hz). The preprocessed EPI scans were then coregistered with the anatomy. Finally, both functional and anatomical data were aligned into AC-PC space. The preprocessed data were analyzed within the general linear model (GLM), separately for each participant. For the main experiment, each motion condition was modeled separately, with a regressor formed by convolving the stimulus time-course with a canonical hemodynamic impulse response function and then scaling to unity. Head motion regressors were also included. For the retinotopic mapping data, the temporal phase of the response to the rotating wedge at each voxel was obtained by fitting a model to the time-series. Phases were superimposed as colors on a segmented and flattened representation of the grey matter. Phase was taken as an indicator of visual field position in terms of polar angle and the boundary of V1 was drawn by eye using conventional criteria. The hMST localizer data were analyzed by fitting a model and the results were superimposed on the flattened grey matter representation as a colored t-map. hMST was defined as a cluster of voxels at the expected location that responded significantly to ipsilateral motion (Smith et al. 2006).
For the third localizer, to localize CSv, hV6, and pVIP, separate models were fitted to the blocks of the 2 types (single motion patch versus 9 patches), according to a standard procedure developed in our laboratory (Wall and Smith 2008; Cardin and Smith 2010) in which a cluster of voxels that was significantly more strongly activated by egomotion-compatible than egomotion-incompatible motion was identified in each of 3 expected locations.
Having identified the regions of interest (ROIs), the mean BOLD response magnitudes (β values) corresponding to each condition in the main experiment were calculated by averaging across all voxels in the ROI, independently for each ROI.
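The main steps of the ROI analysis (building a condition regressor, estimating β values, and averaging them across the voxels of an ROI) can be sketched as follows. This is an illustration, not the BrainVoyager pipeline used in the study. The double-gamma response shape and its parameters are assumptions, "scaling to unity" is interpreted here as scaling the regressor to a peak of 1, head motion regressors are omitted, and all function names are hypothetical. The 2.5 s repetition time is taken from the acquisition parameters of Experiment 1.

```python
import math
import numpy as np

TR_S = 2.5   # repetition time of Experiment 1 (from the text)
DT_S = 0.1   # fine time step used for the convolution (assumption)


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Assumed response shape: difference of two gamma densities."""
    def gamma_pdf(x, k):
        return x ** (k - 1) * np.exp(-x) / math.gamma(k)
    return gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)


def make_regressor(onsets_s, duration_s, n_vols):
    """Boxcar at the stimulus times, convolved with the HRF, sampled once per TR."""
    n_fine = int(round(n_vols * TR_S / DT_S))
    t_fine = np.arange(n_fine) * DT_S
    boxcar = np.zeros(n_fine)
    for onset in onsets_s:
        boxcar[(t_fine >= onset) & (t_fine < onset + duration_s)] = 1.0
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, DT_S))
    convolved = np.convolve(boxcar, hrf)[:n_fine]
    regressor = convolved[::int(round(TR_S / DT_S))]
    return regressor / regressor.max()   # peak scaled to 1 (assumed meaning)


def roi_mean_betas(data, design):
    """data: (n_vols, n_voxels); design: (n_vols, n_conditions).

    Returns one mean beta per condition, averaged over all voxels in the ROI.
    """
    X = np.column_stack([design, np.ones(len(design))])   # add a constant term
    betas = np.linalg.lstsq(X, data, rcond=None)[0]
    return betas[:-1].mean(axis=1)
```

Applying `roi_mean_betas` separately to the voxels of each ROI gives the per-condition response magnitudes that are then compared across conditions.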
Eye Movement Recording
Eye position measurements were obtained with an infrared video camera positioned close to the eye (NordicNeuroLab, Norway). The purpose was to establish the extent to which participants tracked the FoE as it moved along the horizon. Pupil position was continuously sampled at 60 Hz with software (Arrington, Inc., USA) that located and tracked the pupil. Blinks were detected and the corresponding samples excluded. Eye position was not recorded in all participants, and data were included only if a clean (low-noise) position trace was obtained. As movement of the FoE occurred only horizontally, only the horizontal component of eye position was analyzed. Eye position was calibrated based on a short calibration run in which the participant successively fixated 4 points on the simulated horizon.
For each 2 s stimulus event, the corresponding 2 s section of the eye trace was extracted. For each event type independently, all traces relating to that event type were grouped together and normalized to the same mean, to remove slow drift (e.g., due to small head movements). The standard deviation of all available horizontal eye positions across all identical trials in all scan runs was then calculated, as an overall index of horizontal eye stability. In addition, traces were extracted from an equivalent number of 2 s periods from intertrial intervals when no optic flow was presented but the central letter task continued, and these were analyzed in the same way.
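The stability index described above amounts to removing each trace's mean and then taking a single standard deviation over all remaining samples. A minimal sketch follows; the function name is hypothetical, and marking excluded blink samples as NaN is an assumed convention rather than a detail from the text.

```python
import numpy as np


def horizontal_stability(traces):
    """Standard deviation (deg) of horizontal eye position for one event type.

    traces: list of 1-D arrays, one 2 s trace per trial, with blink samples
    set to NaN (assumed convention).
    """
    # Normalize every trace to a common mean of zero to remove slow drift.
    centered = [np.asarray(tr, dtype=float) - np.nanmean(tr) for tr in traces]
    # Pool all samples from all trials and runs, ignoring excluded samples.
    return float(np.nanstd(np.concatenate(centered)))
```

The same function can be applied to the traces extracted from the intertrial intervals to obtain the comparison value.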
Seven healthy volunteers participated (5 females). Six of these had previously taken part in Experiment 1.
Stimuli and Task
The stimuli were again dot patterns that simulated forward motion across a ground plane and filled the lower hemifield. They were based on the Change-FoE condition of Experiment 1 but the FoE moved smoothly in one direction (either left or right) rather than sinusoidally back and forth, to provide 2 contrasting stimuli for multivariate pattern decoding. The stimuli were similar to Experiment 1 in terms of dot luminance, size, contrast, and speed. Each dot moved in a straight line along some portion of a path from a FoE to the periphery of the display area. It moved for 500 ms before disappearing and reappearing at a new random location. Dots reaching the edge of the screen were repositioned randomly within 10° of the FoE. Different dots were repositioned in different frames, with 3.3% of the dots being repositioned at each frame update. Over time the FoE itself moved, sweeping either from left to right or from right to left across the horizon (the horizontal meridian). The initial position of the FoE was equal to the largest displacement of the FoE in the Change-FoE condition in Experiment 1, which was 4.7° to either the left or right of the center of the screen. The sweep lasted 1 s, during which time the FoE moved with constant speed (9.4°/s) to the right or left, respectively. At the end of a 1-s FoE sweep between ±4.7° eccentricity, the FoE continued to move (over a further 2.3°), but global motion was smoothly degraded over 0.5 s by randomizing the horizontal positions of a growing proportion of dots until all dots were horizontally randomized, and the image appeared as noisy forward motion with no FoE. New coherent directions were then smoothly reapplied over a further 0.5 s such that the FoE reappeared on the opposite side, for the start of the next sweep. During this 0.5 s, the FoE moved toward the center from 7.0° and reached 100% coherence at 4.7° eccentricity. 
This gave a 2-s cycle that could be repeated to give repeated movement of the FoE in a single direction while avoiding any confound due to abruptly resetting the FoE.
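The 2 s cycle can be summarized as a function that returns the FoE position and the motion coherence at any time within the cycle. The sketch below is illustrative only: the function name and sign convention (positive azimuth for rightward) are assumptions, and the coherence ramps are taken to be linear, whereas the text states only that the degradation and recovery were smooth.

```python
SWEEP_EXTENT_DEG = 4.7   # sweep runs between -4.7 and +4.7 deg (from the text)
RESET_EXTENT_DEG = 7.0   # FoE eccentricity at zero coherence (from the text)
SWEEP_S = 1.0            # duration of the coherent sweep (from the text)
FADE_S = 0.5             # duration of each coherence ramp (from the text)


def foe_state(t, direction=1):
    """Return (foe_azimuth_deg, coherence) at time t (s) within the 2 s cycle.

    direction = +1 for a rightward sweep, -1 for a leftward sweep.
    """
    t = t % (SWEEP_S + 2.0 * FADE_S)
    if t < SWEEP_S:
        # Coherent sweep at constant speed (9.4 deg/s).
        x = -SWEEP_EXTENT_DEG + 2.0 * SWEEP_EXTENT_DEG * t / SWEEP_S
        coherence = 1.0
    elif t < SWEEP_S + FADE_S:
        # FoE continues for a further 2.3 deg while coherence falls to zero.
        frac = (t - SWEEP_S) / FADE_S
        x = SWEEP_EXTENT_DEG + (RESET_EXTENT_DEG - SWEEP_EXTENT_DEG) * frac
        coherence = 1.0 - frac            # linear ramp (assumption)
    else:
        # FoE reappears on the opposite side and moves in toward 4.7 deg.
        frac = (t - SWEEP_S - FADE_S) / FADE_S
        x = -RESET_EXTENT_DEG + (RESET_EXTENT_DEG - SWEEP_EXTENT_DEG) * frac
        coherence = frac                  # linear ramp (assumption)
    return direction * x, coherence
```

The FoE jumps from one side to the other only at the moment when coherence is zero, which is what hides the reset.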
A potential confound arises from differences in local dot direction between leftward and rightward sweeps. In natural conditions, when the locomotor trajectory shifts rightward, with gaze directed at a fixed point on the horizon, not only does the FoE move rightward across the retinal image but also a leftward local translation (of spatially varying magnitude) is added to all points on the image of the ground plane. The direction of this translation reverses with the direction of heading change. If rightward and leftward sweeps are decoded with MVPA, it is therefore possible that it is this difference in local motion that is decoded, rather than the direction of heading change per se. To avoid this confound, the horizontal local translation was removed. Each dot emanated from the currently pertaining FoE in some direction and then continued in a straight line in that direction throughout its lifetime. That is, once in motion, it moved in a straight line as if the FoE were static, without the curvature (the increasing horizontal motion) that would normally occur when the FoE subsequently moves horizontally. As the FoE moved, newly generated dots moved in the straight paths associated with the FoE position pertaining when they were generated. This meant that rightward and leftward sweeps contained identical sets of local motion trajectories, one set simply read out in the reverse order to the other. Thus, there were no differences in local dot motion direction that might form the basis of successful classification performance.
Each 15 s block consisted of a series of 8 sweeps, separated by the smooth resetting of the FoE. The direction of heading change (leftward or rightward) alternated between stimulus blocks. Each run was formed of 10 stimulus blocks, separated by 7.5 s blocks in which no dots were presented, and buffered at the beginning and end by 15 s with no dots. Participants fixated centrally and the letter task of Experiment 1 was performed throughout each run, to encourage good fixation and to maintain attention in a constant state diverted from the motion stimuli.
Data Acquisition and Analysis
Acquisition was the same as Experiment 1 with 3 exceptions. The functional (EPI) voxels were reduced in size to 2.5 mm (isotropic), to provide more voxels for use as features in the MVPA. Each functional scan consisted of 99 volumes. Parallel imaging (GRAPPA, acceleration factor 2) was used.
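The timing figures given for Experiment 2 can be checked against one another with simple arithmetic. The sketch below assumes that the repetition time was unchanged from Experiment 1 (2500 ms), that the 7.5 s blank periods occurred only between stimulus blocks, and that no reset followed the final sweep of a block. These are readings of the description, not details confirmed by the text.

```python
TR_S = 2.5                # repetition time, assumed unchanged from Experiment 1
N_BLOCKS = 10             # stimulus blocks per run (from the text)
BLOCK_S = 15.0            # duration of one stimulus block (from the text)
GAP_S = 7.5               # blank period between blocks (from the text)
BUFFER_S = 15.0           # blank period at the start and end of a run (from the text)
SWEEPS_PER_BLOCK = 8      # sweeps per block (from the text)
SWEEP_S = 1.0             # coherent sweep duration (from the text)
RESET_S = 1.0             # coherence fade-out plus fade-in (from the text)

# 8 sweeps separated by 7 resets fill one block exactly.
block_from_sweeps_s = SWEEPS_PER_BLOCK * SWEEP_S + (SWEEPS_PER_BLOCK - 1) * RESET_S

# Two buffers, ten blocks, and nine gaps between them.
run_duration_s = 2 * BUFFER_S + N_BLOCKS * BLOCK_S + (N_BLOCKS - 1) * GAP_S
n_volumes = run_duration_s / TR_S

print(block_from_sweeps_s)  # 15.0
print(run_duration_s)       # 247.5
print(n_volumes)            # 99.0
```

Under these assumptions the run lasts 247.5 s, which corresponds to the 99 volumes reported.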
Multivariate as well as univariate analysis was used. For each run, the preprocessed time-courses at each voxel were divided into sections that were averaged across similar blocks, to create time-courses with one 15 s response for each class, each response being the average of the 5 similar blocks in the run. MVPA was then performed on these averaged time-courses. Inclusion of voxels as features in the MVPA was based on the ROIs, and a separate analysis was performed for each ROI. A limitation of this approach is that small visual areas such as pVIP and hV6 may contain as few as 15–20 functional voxels, whereas MVPA requires a larger number of features to be efficient. To ameliorate this problem, data were combined across participants prior to MVPA analysis (Brouwer and Heeger 2009). For each visual area, decoding performance was assessed as follows. An estimate of the response at every voxel was obtained by fitting a GLM including a regressor to model the trial response obtained by convolving a box-car function, representing the stimulus timing with a double-gamma hemodynamic response function. Separate regressors modeled leftward and rightward heading change, and the resulting β values were used as trial response values (exemplars) for the 2 stimuli. In addition, voxels within each ROI, combined across participants, were ranked in terms of β (averaged across the 2 conditions). Decoding performance was examined for each ROI as a function of number of features included by progressively including more voxels, starting with those with the largest responses and descending down the ranking. Performance was also evaluated with random voxel selection. For each MVPA analysis, a subset of observations was used to train the classifier, which was a support vector machine (SVM) with a linear kernel. The SVM was trained to identify the optimal separating boundary (hyperplane) between the 2 conditions (sweep toward right, sweep toward left). A “leave-one-out” method was used. 
Of the 10 runs, 9 were used for training, and the 10th was used for testing. This was repeated 10 times, leaving out each run in turn, and the 10 performances were averaged. Finally, for each ROI, we tested the hypothesis that the classification accuracy was different from chance level by comparing it against the test accuracy on the same dataset after having randomly permuted (shuffled) the labels, which should produce chance-level accuracies with a similar variance to the main analysis. One thousand such analyses were performed with different random permutations, employing the same leave-one-out method, giving 10 performance estimates per permutation. The 95th percentile of the distribution of permuted performance results (typically in the range 60–65% correct, see Fig. 4) was taken as a critical value for regarding unpermuted performance values as significantly above chance. GLM analysis was performed with BrainVoyager and all analyses beyond GLM (merging the ROIs, voxel selection, SVM classification) were performed with MATLAB (The MathWorks, USA) using the LIBSVM library for SVMs (Chang and Lin 2011). Further statistical analysis on the test accuracies was performed using R.
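The decoding procedure (voxel ranking, leave-one-run-out classification with a linear SVM, and a permutation-based critical value) can be sketched in Python as follows. This is not the MATLAB/LIBSVM code used in the study: it uses scikit-learn's `SVC`, which is built on LIBSVM, and the function names, the regularization setting `C=1.0`, the synthetic data, and the reduced number of permutations are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC


def rank_voxels(betas):
    """betas: (n_exemplars, n_voxels). Voxel indices, largest mean response first."""
    return np.argsort(betas.mean(axis=0))[::-1]


def loro_accuracy(X, y, runs):
    """Mean test accuracy over folds, leaving out one run at a time."""
    accuracies = []
    for run in np.unique(runs):
        test = runs == run
        clf = SVC(kernel="linear", C=1.0)   # C is an assumed setting
        clf.fit(X[~test], y[~test])
        accuracies.append(clf.score(X[test], y[test]))
    return float(np.mean(accuracies))


def permutation_threshold(X, y, runs, n_perm, rng):
    """95th percentile of accuracies obtained with shuffled labels."""
    null = [loro_accuracy(X, rng.permutation(y), runs) for _ in range(n_perm)]
    return float(np.percentile(null, 95))


# Synthetic example: 10 runs x 2 exemplars (leftward = 0, rightward = 1), 40 voxels.
rng = np.random.default_rng(0)
runs = np.repeat(np.arange(10), 2)
y = np.tile([0, 1], 10)
X = rng.normal(size=(20, 40))
X[y == 1, :10] += 2.0                       # placeholder signal in 10 voxels

top = rank_voxels(X)[:10]                   # keep the 10 most responsive voxels
accuracy = loro_accuracy(X[:, top], y, runs)
threshold = permutation_threshold(X[:, top], y, runs, n_perm=100, rng=rng)
```

Repeating the last three lines for increasing numbers of retained voxels gives decoding performance as a function of feature count. The study used 1000 permutations; 100 are used here only to keep the example fast.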
Eye Movement Recording
In Experiment 2, for each 2 s cycle of the stimulus, a 1 s trace was extracted corresponding to the period when the FoE was moving, and the flow was 100% coherent. These traces were grouped into those arising from leftward or rightward stimulus blocks and normalized to remove slow drift. The traces were then averaged to reveal the direction and magnitude of the mean horizontal eye movement that occurred during each stimulus class. As in Experiment 1, eye position was not recorded in all participants and data were included only if a clean (low-noise) position trace was obtained.
Figure 2 shows, for 2 participants, the locations of the cortical regions examined, based on the results of the various independent localizer scans. Corresponding regions of interest were identified in both hemispheres in all participants except for a few cases where a particular ROI could not be reliably defined in a given hemisphere. Four regions (hMST, pVIP, hV6, and CSv) are examined because they are a priori candidates for the processing of optic flow, while V1 (where no sensitivity to global flow properties is expected) is included for comparison. The locations of all regions are described fully, with Talairach co-ordinates, in our previous work (Wall and Smith 2008; Cardin and Smith 2010), in which we used the same methods for localization.
Responses to Changing Heading
Figure 3A–E shows mean response magnitudes for each optic flow stimulus, averaged across both hemispheres and all participants. The results are shown separately for each cortical region examined and are normalized (independently within each region) to remove variance due to overall BOLD magnitude differences between participants. One of the regions examined, the CSv, showed striking selectivity for the 2 optic flow stimuli that simulated changing heading (Fig. 3A). Both stimuli elicited a response that was about 3 times larger than that to unchanging heading, and the difference was statistically significant in both cases (Change-FoE vs. NoChange: t(6) = 6.26, P < 0.001; Change-Curve vs. NoChange: t(6) = 4.01, P < 0.01). Responses were completely absent during presentation of horizontally scrambled flow, despite the presence of coherent downward global motion. This downward motion is noisy and, having no FoE, is more consistent with object motion (e.g., a waterfall) than with self-motion. Thus, the lack of response is consistent with our earlier claim (Wall and Smith 2008) that CSv is selectively responsive to flow stimuli that are compatible with self-motion. The greatly enhanced response to changing compared with unchanging heading suggests that CSv may have a specific role in signaling changes in heading direction that occur during self-motion.
Two other cortical regions, pVIP and hV6, showed similar trends to CSv but with weaker stimulus selectivity. In pVIP (Fig. 3B), the mean response to changing heading was about twice that to unchanging heading, and the difference reached significance for Change-FoE (t(7) = −2.95, P = 0.021) although not for Change-Curve. In hV6 (Fig. 3C), the response to changing heading was only about 25% greater than that to unchanging heading but this was very consistent across participants (variance was low) and the difference was significant for both Change-FoE (t(6) = 7.36, P < 0.001) and Change-Curve (t(6) = 3.16, P < 0.02). The response to scrambled flow, which appeared as noisy but coherent downward motion, was only slightly reduced compared with unchanging heading in hV6, and not at all in pVIP. These results suggest that hV6 and pVIP may receive information about changing heading but are perhaps less specifically concerned with heading change than CSv (see Discussion section).
Neither hMST nor V1 showed selective responses to changing heading (Fig. 3D,E). In these areas, the response was not significantly greater for changing than unchanging heading and the scrambled flow controls gave strong responses in both regions.
Attention and Eye Movements
A possible explanation of the results in CSv, pVIP, and hV6 might be that changing heading attracts attention more strongly than unchanging heading or scrambled flow and that this increased attention increases response gain. Several factors make such an explanation unlikely. First, the participants engaged in a demanding central fixation task and reported little awareness of the motion stimuli. Second, at least in CSv, it would be hard to create the observed pattern of results even if there were no task and attention were unconstrained. To do so would require 300% enhancement by attention, which (to our knowledge) is greater than has ever been reported in any visual area. Moreover, the complete suppression of responses in the control condition is hard to explain in this way. In pVIP, the enhancement is again too great to be explained by attention. The only region where the effect is small enough to be plausibly attributed to attention is hV6. Third, the lack of a preference for changing flow in hMST militates against an explanation in terms of attention, since attentional modulation has been shown to be strong in human MT+ (O'Craven et al. 1999; Saenz et al. 2002). Taken together, these factors suggest that the task was effective in engaging attention and that the results cannot be explained in terms of differences in attention among stimuli.
Another possible explanation might be that eye movements differed between the conditions. Most obviously, participants might have tracked the FoE in the Change-FoE condition. In the Change-Curve condition, the FoE was static so this explanation does not hold, but other changes might have elicited eye movements. Eye movements are less likely in the No-Change, Ctrl-FoE, and Ctrl-Curve conditions. If greater eye movement occurred during the changing heading conditions, the BOLD response might be enhanced in the 2 Change conditions by eye-movement-related activity. It might also be altered by the change in retinal speed caused by smooth eye movements. To evaluate these possibilities, eye position was monitored. Figure 3F shows that the standard deviation of horizontal eye position was small (<0.2°) and was similar for all 5 conditions and during the ITI. Thus, in visual areas where greater responses occurred during changing than unchanging flow, the enhancement cannot be attributed either to stronger retinal motion stimulation as a result of eye movement or to greater eye-movement-related cortical activity.
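The eye-stability check described above can be illustrated with a minimal sketch. The samples below are invented for illustration and are not the recorded data, the condition labels are taken from the text, and the use of a population standard deviation (rather than a sample standard deviation) is an assumption.

```python
from statistics import pstdev

# Hypothetical horizontal eye-position samples in degrees (made-up values),
# one list per stimulus condition plus the inter-trial interval (ITI).
eye_x = {
    "Change-FoE":   [0.05, -0.10, 0.12, -0.03, 0.00],
    "Change-Curve": [0.02, -0.08, 0.10, -0.05, 0.01],
    "NoChange":     [0.04, -0.06, 0.09, -0.02, -0.03],
    "Ctrl-FoE":     [0.03, -0.09, 0.11, -0.04, 0.02],
    "Ctrl-Curve":   [0.06, -0.07, 0.08, -0.01, -0.02],
    "ITI":          [0.01, -0.05, 0.07, -0.06, 0.03],
}

# Standard deviation of horizontal eye position for each condition.
sd_by_condition = {cond: pstdev(samples) for cond, samples in eye_x.items()}

# Fixation is treated as stable if every condition stays under 0.2 degrees.
stable = all(sd < 0.2 for sd in sd_by_condition.values())

for cond, sd in sd_by_condition.items():
    print(f"{cond}: SD = {sd:.3f} deg")
print("stable fixation in all conditions:", stable)
```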
A more troubling possible interpretation of enhanced BOLD responses to changing flow might be that, over the 2 s stimulation period, multiple neurons with different instantaneous heading preferences (e.g., preferred locations of the FoE) are recruited. If briefly stimulating multiple neuron populations yields greater total activity than stimulating one population continuously, then this could lead to a greater BOLD response for changing than unchanging flow (and potentially for any changing stimulus than its unchanging counterpart). The existence of a representation of changing heading could not then be inferred from our data. It might be expected that the effect of recruiting more neurons would be cancelled by the fact that each neuron is active for a shorter period, but it is unknown and difficult to judge whether this is quantitatively correct. A related possibility is that even if the 2 stimuli initially generate similar levels of activity, greater adaptation might occur during the 2 s presentation of the unchanging flow than during changing flow, explaining the reduced overall response for unchanging flow. This could occur if adapting one set of neurons for a prolonged period resulted in more attenuation of the BOLD response than transiently adapting each of several populations, resulting in a smaller overall response to unchanging flow. Again, whether this is the case is difficult to assess. The converse possibility, that adapting many neurons causes more attenuation rather than less, is also plausible. The effects of adaptation on BOLD amplitude are complex and depend on the specific dynamics of the neuron population in question (see, e.g., Krekelberg et al. 2006).
Such factors might contribute to the observed differences across visual areas as well as to differences between stimuli within areas. If one visual area contains neurons with narrower heading tuning than another area and this affects the level of adaptation, differences in BOLD response will result. Similarly, if areas differ in their inherent susceptibility to adaptation, BOLD differences may result. If the 2 factors interact, adaptation could contribute to the observed interaction between preference for changing flow and visual area.
It seems to us unlikely that these factors explain the strong, region-specific effects evident in Figure 3, but it is hard to discount them completely. We therefore conducted a second experiment to seek converging evidence for a representation of heading change without reliance on differences in univariate BOLD response amplitudes among conditions.
Decoding Direction of Heading Change
To be useful for guiding locomotion, a representation of changing heading must include the direction in which heading is changing. We therefore attempted to train a linear SVM classifier to decode the pattern of neural activity across voxels elicited by optic flow that simulated a smooth change in egomotion direction either from left-to-right or from right-to-left. The analysis was run separately on hV6, CSv, and pVIP, the 3 regions that showed an apparent preference for changing heading in Experiment 1 (Fig. 3).
Classification performance is strongly dependent on the number of “features” (voxels) included in the analysis (Li et al. 2007), and our ROIs varied considerably in size. Results based on all available voxels might therefore be biased in favor of larger visual areas. To assess results independently of this factor, we systematically varied the number of voxels included, between 50 and 300 per ROI. Data were combined across participants prior to running the analysis (see Materials and Methods section). Voxels were selected for inclusion by moving progressively through the voxels in descending rank order of univariate amplitude. A similar pattern of results across areas, but with slightly lower classifier performance levels, was obtained when they were selected randomly.
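The amplitude-ranked voxel selection described above can be sketched as follows. This is an illustration only: the function name `top_voxels_by_amplitude` and the amplitude values are hypothetical, and the real analysis used between 50 and 300 voxels per ROI rather than the handful shown here.

```python
def top_voxels_by_amplitude(amplitudes, n_voxels):
    """Return indices of the n_voxels largest amplitudes, in descending rank order."""
    ranked = sorted(range(len(amplitudes)),
                    key=lambda i: amplitudes[i],
                    reverse=True)
    return ranked[:n_voxels]


# Hypothetical per-voxel univariate response amplitudes for one ROI (made-up numbers).
roi_amplitudes = [0.4, 1.2, 0.1, 0.9, 0.7, 1.5]

# Feature sets are nested: each larger set contains every smaller one, so adding
# voxels can only append progressively less active voxels to the classifier input.
for n in (2, 4, 6):
    print(n, top_voxels_by_amplitude(roi_amplitudes, n))
```

Because the sets are nested, the low end of a performance-versus-voxel-number plot reflects only the most strongly responding voxels in the ROI.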
The results are summarized in Figure 4. In CSv, the classifier was successful in distinguishing the 2 categories (changing heading from left-to-right and from right-to-left). Accuracy increased with the number of features included, up to a maximum of about 85%. In pVIP, accuracy also increased with the number of features, to a similar level. In contrast, hV6 did not show classifier performance that was significantly above chance levels. These results suggest that CSv and pVIP may contain neurons that are sensitive to direction of heading change but they provide no support for such a conclusion in the case of hV6.
Statistical significance was assessed by comparing classifier performance with chance performance as estimated from similar MVPA analyses of the same data conducted with randomly permuted labels (i.e., with no correlation between the leftward/rightward labels assigned to trials and the actual directions of heading change in the trials). One thousand such analyses were conducted with different random permutations and Figure 4 shows, for each cortical area, the mean (asterisk) and 95th percentile (dashed line) of the resulting distribution of 1000 classification performances. Performance in both CSv and pVIP was comfortably above the 95th percentile for larger voxel numbers, indicating statistical significance, whereas this was not the case in hV6.
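The logic of this permutation test can be sketched in a few lines. The sketch rests on several assumptions: a nearest-centroid classifier stands in for the linear SVM used in the study, the 2-voxel patterns are invented, and the 95th percentile is taken with a simple empirical rule. All function and variable names are hypothetical.

```python
import random


def centroid(rows):
    """Mean pattern across a list of equal-length patterns."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(patterns, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i, (test, truth) in enumerate(zip(patterns, labels)):
        train = [(p, l) for j, (p, l) in enumerate(zip(patterns, labels)) if j != i]
        centroids = {
            lab: centroid([p for p, l in train if l == lab])
            for lab in sorted({l for _, l in train})
        }
        guess = min(centroids, key=lambda lab: sq_dist(test, centroids[lab]))
        correct += guess == truth
    return correct / len(patterns)


def permutation_null(patterns, labels, n_perm=1000, seed=0):
    """Accuracies obtained after randomly permuting the trial labels."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(loo_accuracy(patterns, shuffled))
    return null


# Made-up 2-voxel patterns for leftward ("L") and rightward ("R") heading change.
left = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [0.8, 0.2], [1.0, 0.1]]
right = [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1], [0.2, 0.8], [0.1, 1.0]]
patterns = left + right
labels = ["L"] * 5 + ["R"] * 5

observed = loo_accuracy(patterns, labels)
null = sorted(permutation_null(patterns, labels))
null_mean = sum(null) / len(null)
null_p95 = null[int(0.95 * len(null)) - 1]  # simple empirical 95th percentile

print("observed accuracy:", observed)
print("null mean:", null_mean, "null 95th percentile:", null_p95)
print("above 95th percentile:", observed > null_p95)
```

Performance on the correctly labeled data is judged significant when it exceeds the 95th percentile of the permuted-label distribution, which is the comparison plotted for each area in Figure 4.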
In a further analysis of the permutation data, for each of the 1000 iterations of the permutation test, the 10 performance values from the 10 leave-one-out iterations (see Materials and Methods section) based on unpermuted data were tested against the corresponding 10 performances from the permuted data by t-test. Figure 5 shows, for each visual area, the resulting distribution of P values. For CSv and pVIP, the values were clustered at low values, with >600 of 1000 values falling below P = 0.05. In contrast, hV6 showed a wide spread of P values. This analysis supports the assertion that changes in heading direction could be reliably decoded in CSv and pVIP but not in hV6.
Effect of Response Amplitude
Classifier performance is strongly affected by response amplitude (Smith et al. 2011). Because of the presence of noise in BOLD data, a given actual response selectivity in a given cortical region can be detected with MVPA more readily if the responses elicited by the stimuli employed are large than if they are small. Therefore, our results could be biased in favor of CSv and pVIP if the response amplitudes in these regions were greater than in hV6. However, this was not the case. CSv and pVIP both had similar, relatively small responses (see Fig. 4D), while the mean response was substantially larger in hV6 (hV6-CSv: t(259) = 750, P < 0.001; hV6-pVIP: t(259) = 633, P < 0.001). If anything, our classification results are biased against CSv and pVIP when compared with hV6. Interestingly, however, although the overall performance function for hV6 (Fig. 4C) suggests a failure to decode heading change, it can be seen that performance is above chance for low (<100) voxel numbers. As voxels were progressively recruited based on ranked amplitude, these points on the plot reflect the hV6 voxels with the highest response amplitudes. It could be that decoding is supported by a subset of highly active voxels but that performance is diluted when additional, less selective and less active voxels are added in. However, as performance is only just above chance for low voxel numbers, further research would be needed before selectivity in hV6 could be claimed on this basis.
We considered whether the representation of heading change might be lateralized, for example, whether there might be a preference in each hemisphere for contraversive heading change. This was tested and found not to be the case with a univariate (voxelwise) GLM analysis in which response magnitude was estimated separately for the 2 directions of heading change. The statistical contrast between leftward and rightward blocks yielded no significant voxels in any area in any participant. In a more sensitive analysis in which responses were averaged across all voxels in each ROI prior to statistical testing across participants, neither counterpart (left/right hemisphere) of either CSv or pVIP showed any difference between rightward and leftward heading change, and there were no trends that approached significance (left CSv: t(6) = −0.38, P = 0.72; right CSv: t(6) = 0.15, P = 0.89; left pVIP: t(6) = −0.02, P = 0.98; right pVIP: t(6) = −0.38, P = 0.71).
Attention and Eye Movements
A possible interpretation of the results in Figure 4 might be that classifier performance is based on differences in attention between the conditions. As one stimulus is the same as the other apart from the direction in which heading changes, no difference in overall level of attention is expected. Any attentional explanation would therefore have to relate to differences in the direction in which spatial attention moves. If voxels with preferences for different directions of smooth shift of spatial attention existed within CSv and pVIP, this would be of interest in its own right. However, although not inconceivable, this seems an unlikely explanation, especially since, as in Experiment 1, a demanding letter task was employed to engage attention such that attention to the motion stimuli was minimized.
It must also be considered whether classifier performance might reflect differences in eye movements between the 2 conditions. Specifically, tracking the FoE would lead to large differences in direction of eye movements between the 2 conditions that might provide a basis for pattern classification. To be able to test any such interpretation, we measured eye position. Averaged eye traces are shown in Figure 6 for 2 participants. For one participant, eye position was extremely stable, and there was no tendency to follow the FoE. For the other, the eye was somewhat less stable (standard errors are larger) but again there was no following of the FoE. No other participant showed measurable following eye movements. The absence of any significant following of the FoE rules out the possibility that our multivariate analyses decoded direction of eye movement rather than direction of FoE movement. Participants were able to fixate the central letter stream and did not track the FoE.
In summary, the results of Experiment 2 provide direct evidence for the existence of a representation of heading change in CSv and pVIP (but not hV6), and they militate against alternative interpretations of the results of Experiment 1 in terms of differential recruitment of neurons or differential adaptation.
The way we derive instantaneous heading direction from visual cues during self-motion is relatively well understood. There have also been some behavioral studies that have looked at how well humans can predict future heading or control heading when they are on curved trajectories (Warren et al. 1991; Wilkie and Wann 2006; Li et al. 2011). Some studies have also looked at the neural systems engaged in planning and adjusting a curved trajectory (Field et al. 2007; Billington et al. 2010). But what has not been identified is how we are able to monitor and adjust to changes in locomotor direction, which is central to any mechanism to correct errors and maintain control on curved or complex paths. A representation of instantaneous heading could be updated rapidly, and it might be that we rely entirely on such updating for the adjustment of motor commands to maintain the desired locomotion, and rely on conscious inference about heading change when monitoring our overall trajectory and planning ahead. Alternatively, however, it might be advantageous to generate a low-level representation of the direction and magnitude of heading changes. This could simplify the computations needed to maintain a complex trajectory when locomoting at speed and to generate corrective motor signals so as to avoid errors and collisions. It might also facilitate integration of visual heading signals with the vestibular signals that are associated with heading change.
Sensitivity to Heading Change in CSv and pVIP
Our work provides the first evidence that heading change is indeed represented in certain sensory regions of the human brain. We searched for such a representation in all the key cortical areas that have previously been associated with processing optic flow related to self-motion and found a different result in each area. In 2 experiments, clear evidence for a representation of heading change was found in the cingulate sulcus visual area, CSv. Not only can direction of change of heading be predicted from responses in this area (Fig. 4A) but, remarkably, the response to flow is greatly attenuated if heading is invariant, and abolished if there is no clear FoE (Fig. 3A). Compelling evidence for sensitivity to heading change was found also in pVIP, which has previously been implicated in encoding self-motion (Bremmer et al. 2001; Wall and Smith 2008). This increases the likelihood of a homology with macaque VIP, which has been similarly implicated (Bremmer, Duhamel et al. 2002; Zhang et al. 2004; Zhang and Britten 2010, 2011). However, in the case of pVIP, responses were somewhat less attenuated by invariant or indeterminate heading than in CSv (Fig. 3B), suggesting perhaps that changing heading is represented in pVIP but is just one of many attributes represented there.
In hV6, we found a clear enhancement in BOLD response when heading was changing (Fig. 3C) but it proved impossible to decode the direction of heading change (Fig. 4C). This could perhaps be interpreted as indicating the presence of a nondirectional heading-change signal, which would be of little value for establishing the trajectory of self-motion but might, for example, act as a nonspecific trajectory-change warning signal.
We found no evidence for the representation of heading change in hMST, an area that has repeatedly been associated with instantaneous heading (Morrone et al. 2000; Wall et al. 2008) and may be related to areas of the macaque brain, such as MSTd, that show similar properties (Saito et al. 1986; Tanaka and Saito 1989). Indeed, a representation of heading change is not expected, at least in macaque MST, based on current models showing that macaque MST responses to rapidly changing flow stimuli can be well accounted for simply by (nonlinear) combination of instantaneous inputs from sub-units in MT (Mineault et al. 2012; see also Paolini et al. 2000). Thus, the proposed heading-change signals in pVIP and CSv may be derived from instantaneous heading at a higher level than hMST and require an additional computational process that takes account of the nature of temporal variations. Classical accounts place macaque MST and VIP at the same hierarchical level, on the grounds that both receive strong direct inputs from MT (Maunsell and Newsome 1987), but the fact that VIP has, if anything, greater heading sensitivity (Zhang and Britten 2010) and is more strongly multisensory (Bremmer, Klam et al. 2002) is consistent with the idea that it may, at least in some respects, sit at a higher functional level than MST. CSv has not yet been identified in macaques but anatomical tracer studies suggest that its strongest connections, if it exists, are likely to be with medial and lateral parietal cortex (Cavada and Goldman-Rakic 1989; Bakola et al. 2010) rather than occipital cortex or MT/MST. CSv might therefore receive optic flow signals from pVIP, making it the highest area in an MST-VIP-CSv hierarchy, consistent with it having the strongest selectivity for heading change, as well as the strongest selectivity for the egomotion-compatibility of flow (Wall and Smith 2008). 
However, absolute hierarchies of sensory cortical areas are difficult to establish and multiple configurations can be modeled (Hilgetag et al. 1996). Interestingly, CSv has recently been shown to have strong vestibular input (Smith et al. 2012) and is therefore a candidate for visual–vestibular interactions that might facilitate detection of heading change.
Throughout our work, we have taken great care to establish that where our results apparently suggest sensitivity to changing heading, they cannot be explained by other factors. In the univariate study (Experiment 1), we discounted explanations in terms of attention and eye movements but residual alternative explanations remain. Larger responses to changing than unchanging flow (Fig. 3) might reflect recruitment of a larger number of neurons with different heading preferences, even though each is stimulated more transiently. Similarly, smaller responses to unchanging flow might reflect strong habituation of a single subpopulation while more transient habituation occurring in a more diverse population during changing flow might be weaker in total, even though more neurons are affected. We are therefore cautious about these results and interpret them in light of the results of our multivariate study (Experiment 2). In any MVPA study, it is essential to carefully consider whether the classifier is decoding the property the researcher wishes to examine or some other correlated property. In Experiment 2, it is relatively easy to show that we are unlikely to be decoding differences in eye movement or attention; the primary challenge is to show that we are not decoding differences in local motion, given that it is impossible to change global optic flow without changing local motion. To meet this challenge, we used a careful stimulus construction that, at the expense of deviating somewhat from natural flow stimuli, contains identical sets of local motions in both stimulus classes. As a result, we have high confidence that the property decoded by our classifier is the direction in which simulated heading is changing. As only CSv and pVIP survive this test, while hV6 does not, we are confident that CSv and pVIP represent direction of change of heading but it may be that the univariate hV6 result (Fig. 3C) reflects other factors.
Basis of Heading Encoding
It is of interest that in Experiment 1, we obtained similar results whether changing heading was defined in terms of 1) changing FoE location or 2) changing flow curvature with fixed FoE. Our results suggest that human CSv and pVIP represent changing heading equally well when it is defined by either cue. This permits greater flexibility in the heuristics that may be employed to detect heading and control locomotion, particularly on curved trajectories (Wann and Swapp 2000).
The primate literature has focused heavily on the location of the FoE. For example, numerous studies have documented the responses of neurons in MSTd to expanding optic flow when the FoE is eccentric, indicating heading direction at variance with gaze direction, or when a translation is added to the expansion, simulating smooth pursuit during self-motion (Duffy and Wurtz 1995, 1997; Bradley et al. 1996; Gu et al. 2006, 2010; Bremmer et al. 2010). Few neurophysiological studies have been conducted with optic flow stimuli in which heading is specified in other ways. Specifically, we know of no physiological reports of heading sensitivity with stimuli that simulate motion on a curved trajectory with gaze aligned with instantaneous heading direction.
When comparing our results with primate neurophysiological data, there is therefore a firm physiological reference point for our Change-FoE stimulus, if we are willing to assume homologies between species, but there is none for Change-Curve. It is possible to imagine how sensitivity to changing heading as signaled by a shifting FoE could be constructed from macaque MSTd neurons, but harder to know how sensitivity to changing heading indicated by curved flow with a static FoE might be constructed. However, this is not to say that it could not be done based on MSTd efferents. The response properties of MSTd neurons are complex and may well provide versatile information that allows construction of heading through any of several varied cues.
Given the existence of a low-level visual representation of changing heading, an important outstanding issue is whether this representation consists of pure change signals (e.g., trajectory has veered 10° leftward) or whether change signals are referenced to external positions (heading has veered from moving toward point x to moving toward point y). When attempting to reach a given external location, or to take a trajectory through the environment that avoids obstacles, the latter might be more useful. However, there might also be circumstances in which the former information meets needs more directly, for example, detecting deviation from an intended straight path or attempting to make a turn of a given magnitude. Further research will be needed to establish the nature and the reference frame of the change signals implicated by our experiments.
The existence of a visual representation of changing heading in the human brain helps to explain our exquisite ability to negotiate accurately and at high speed through complex environments. We are aware of no primate neurophysiological studies in which evidence of sensitivity to heading change has been found, or even sought, in any brain region. Our work opens up new opportunities, not only for further neuroimaging work but also for neurophysiological studies of heading change in other primates and for computational treatments of the mechanisms involved.
This research was funded by the European Community's Seventh Framework Programme FP7/2007-2013 under grant agreement number 214728-2.
We thank Jonas Larsson, Andrew Welchman, and Zoe Kourtzi for helpful comments on an earlier version of the manuscript. Conflict of Interest: None declared.