How does our brain detect changes in a natural scene? While increments in specific visual attributes, such as contrast or motion coherence, can be signaled by increased neuronal activity in early visual areas, such as the primary visual cortex (V1) or the human middle temporal complex (hMT+), respectively, the mechanisms for signaling changes resulting from decrements in a stimulus attribute are largely unknown. We have discovered opposing patterns of cortical responses to changes in motion coherence: unlike areas hMT+, V3A, and the parieto-occipital complex (V6+), which respond monotonically to changes in the level of motion coherence, human areas V4 (hV4), V3B, and ventral occipital (VO) always respond positively to both transient increments and decrements. This pattern of always responding positively to stimulus changes can emerge in the presence of either coherence-selective neuron populations or neurons that are not tuned to particular coherences but adapt to a particular coherence level in a stimulus-selective manner. Our findings provide evidence that these areas possess physiological properties suited for signaling increments and decrements in a stimulus and may form part of a cortical vigilance system for detecting salient changes in the environment.
Any sudden change in the environment may be of vital significance for survival. In a previous study exploring the effect of adaptation on contrast representation in the early visual cortex, we observed opposing patterns of responses to changes in stimulus contrast: whereas neuronal activity in areas V1–V3 follows the sign, and scales with the magnitude, of transient contrast increments or decrements from an intermediate adapting contrast, human area V4 (hV4) always exhibits positive responses regardless of whether contrast is incremented or decremented (Gardner et al. 2005). That hV4 responds positively to a decrease in stimulus strength (contrast) raised the possibility that this area is responsible for extracting changes toward weaker stimuli in a visual scene, a function of fundamental importance for normal visual processing (Buffalo et al. 2005). This hypothesis is consistent with the results of a previous study showing that the loss of such a neural mechanism, as a consequence of experimental ablation of monkey V4, can lead to severe impairment in the animal's ability to select a singleton whose contrast is lower than that of the flanking distracters (Schiller and Lee 1991).
Two outstanding issues remain to be addressed. First, it is necessary to verify whether this physiological property of hV4, responding positively to both incremental and decremental changes, is limited to contrast or can be generalized to other types of change in the stimulus. Second, it is of obvious interest to know whether other areas along the visual pathways also possess this property. To answer these questions, we conducted a series of functional magnetic resonance imaging (fMRI) experiments targeting the occipito-parietal cortex of human volunteers while they viewed moving dots whose coherence level (defined as the proportion of dots moving in the same direction) was transiently increased or decreased. Moving stimuli are known to elicit cortical responses in both the dorsal and ventral visual streams, including area V4, in both animals (Cheng et al. 1994) and humans (Sunaert et al. 1999; Braddick et al. 2000). Moreover, motion coherence, like contrast, is an ideal stimulus attribute for probing behavioral and physiological responses as a function of stimulus strength (Rees et al. 2000). Unlike other stimulus attributes, such as orientation, spatial frequency, and motion direction and speed, which are differentially coded in segregated cortical modules, motion coherence is represented uniformly over the cortical surface in motion-responsive areas. Our experimental procedure and stimulus thus allow us, potentially, to identify visual areas that signal salient changes in the stimulus irrespective of the sign of the change (increments or decrements in motion coherence in this experiment).
Our results demonstrate that hV4 plays a fundamental role in signaling changes in visual stimuli: not only changes in contrast, as previously revealed, but also changes in motion coherence. Importantly, we also found that this property of hV4 is shared by the ventral occipital (VO) area and V3B. Taken together, these areas may constitute a cortical vigilance system that is primarily engaged in detecting salient changes in the visual scene.
Materials and Methods
All subjects participated in a series of MRI experiments: (1) acquisition of high-resolution anatomical images, (2) four types of functional experiments (Experiments 1 through 4, described in detail below) testing blood oxygenation level-dependent (BOLD) responses to motion coherence, and (3) polar-angle and eccentricity mapping of the visual field for identifying retinotopically defined visual areas.
Each type of experiment was conducted in a separate session, for a total of six sessions per subject. We performed all analyses independently on six hemispheres of three healthy human volunteers (27–33 years of age, one female) with normal or corrected-to-normal vision. Anatomical and functional MRI data were acquired with informed written consent from each subject, in accordance with the protocol approved by the RIKEN fMRI Safety and Ethics Committee and in compliance with national legislation and the Code of Ethical Principles for Medical Research Involving Human Subjects of the World Medical Association (Declaration of Helsinki). Before functional experiments were conducted, a radiologist inspected high-resolution anatomical images of each subject's brain to confirm that there were no structural abnormalities.
Visual Stimulation and Behavioral Task
During functional experiments, the subject viewed visual stimuli via a pair of MRI-compatible goggles equipped with an eye tracker (Avotec Inc., Stuart, FL, USA), connected to a Power Mac G4 computer (Apple Computer Inc., Cupertino, CA, USA). The goggles subtended approximately 24 × 18 degrees of visual angle.
In the motion coherence experiments, single frames of the stimulus were prepared in MATLAB R2009a (The Mathworks Inc., Natick, MA, USA) and presented to the subject at a screen refresh rate of 30 Hz. The stimulus was generated by placing 1333 white dots (3 × 3 pixels in size) on a black background (800 × 600 pixels) in a uniform, pseudo-randomized arrangement, resulting in an average density of three dots per square degree. The brightness of the dots was modulated such that they appeared white at eccentricities between 1.3 and 7.7 degrees and faded linearly to black at eccentricities from 7.7 to 8.8 degrees in the periphery and from 1.3 to 0.5 degrees in the center. No dots were visible at eccentricities smaller than 0.5 degree or beyond 8.8 degrees. On each new frame, all coherently moving dots were replotted along circular trajectories of 8.6 deg diameter at a distance of 0.3 deg (10 pixels, rounded to the nearest coordinates on the display) from their coordinates on the preceding frame, resulting in coherent parallel movements at a speed of 9.0 degrees per second. These dots completed a full circle every 3 s, so that within this interval all motion directions were represented at any location in the visual field. Randomly moving dots had the same speed but moved in random directions. The direction of each random dot was allowed to vary within ±6 degrees of the direction of the same dot on the preceding frame, so that the direction of randomly moving dots varied slightly from frame to frame. Coherent and random dots whose motion implied a new position outside the screen were replotted as if they reentered the screen from the opposite edge. Three levels of motion coherence, 10%, 50%, and 90%, were employed. Supplementary Movie 1, available online, shows an example at 90% motion coherence. A blue fixation cross was placed at the center of the display, where no dots were visible.
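The stimulus parameters above are mutually consistent, and the frame-by-frame update can be sketched compactly. The Python sketch below derives the step size, speed, circle period, and dot density from the stated display geometry, implements the eccentricity-dependent brightness ramps, and advances coherent and random dots frame by frame with wraparound at the screen edges. It is an illustration under assumptions, not the stimulus code used in the study: all helper names are hypothetical, the coherent subset is assumed fixed across frames, all coherent dots are assumed to share one heading that rotates by a constant angle per frame, and rounding to integer pixel coordinates is omitted.

```python
import math
import random

# Display geometry and motion parameters taken from the text.
SCREEN_W_PX, SCREEN_H_PX = 800, 600
FIELD_W_DEG, FIELD_H_DEG = 24.0, 18.0
N_DOTS = 1333
STEP_PX = 10               # displacement per frame
FRAME_RATE_HZ = 30
TRAJ_DIAM_DEG = 8.6        # diameter of the circular trajectories
RANDOM_JITTER_DEG = 6.0    # max frame-to-frame change in a random dot's direction

PX_PER_DEG = SCREEN_W_PX / FIELD_W_DEG
step_deg = STEP_PX / PX_PER_DEG                          # ~0.3 deg per frame
speed_deg_s = step_deg * FRAME_RATE_HZ                   # ~9 deg/s
period_s = math.pi * TRAJ_DIAM_DEG / speed_deg_s         # ~3 s per full circle
density_per_deg2 = N_DOTS / (FIELD_W_DEG * FIELD_H_DEG)  # ~3 dots per deg^2
turn_per_frame = step_deg / (TRAJ_DIAM_DEG / 2)          # heading change (rad) per frame


def eccentricity_deg(x, y):
    """Distance of a pixel position from the screen center, in degrees."""
    return math.hypot(x - SCREEN_W_PX / 2, y - SCREEN_H_PX / 2) / PX_PER_DEG


def brightness(ecc):
    """Dot luminance (0 to 1) as a function of eccentricity, following the stated ramps."""
    if 1.3 <= ecc <= 7.7:
        return 1.0
    if 7.7 < ecc < 8.8:
        return (8.8 - ecc) / (8.8 - 7.7)   # peripheral fade to black
    if 0.5 < ecc < 1.3:
        return (ecc - 0.5) / (1.3 - 0.5)   # central fade to black
    return 0.0


def make_dots(coherence, rng):
    """Hypothetical initial state: uniform positions and a fixed coherent subset."""
    n_coh = round(N_DOTS * coherence)
    return [{"x": rng.uniform(0, SCREEN_W_PX), "y": rng.uniform(0, SCREEN_H_PX),
             "heading": rng.uniform(0, 2 * math.pi), "coherent": i < n_coh}
            for i in range(N_DOTS)]


def step(dots, frame, rng):
    """Advance all dots by one frame, wrapping around at the screen edges."""
    common_heading = frame * turn_per_frame   # shared by all coherent dots
    jitter = math.radians(RANDOM_JITTER_DEG)
    for d in dots:
        if d["coherent"]:
            d["heading"] = common_heading
        else:
            d["heading"] += rng.uniform(-jitter, jitter)
        d["x"] = (d["x"] + STEP_PX * math.cos(d["heading"])) % SCREEN_W_PX
        d["y"] = (d["y"] + STEP_PX * math.sin(d["heading"])) % SCREEN_H_PX


rng = random.Random(0)
dots = make_dots(0.9, rng)
for frame in range(90):   # 90 frames = 3 s at 30 Hz
    step(dots, frame, rng)
```

With these numbers, a 10-pixel step corresponds to 0.3 deg, 30 steps per second give 9 deg/s, and a circle of 8.6 deg diameter (about 27 deg of path) takes about 3 s, matching the values stated above.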
The temporal structure of the stimulus used in the main experiment (Experiment 1) is depicted in Figure 1. The subject was initially presented with a pattern of 50% motion coherence for 60 s (adaptation coherence). The coherence was then repeatedly and transiently changed to either 90% (coherence increase) or 10% (coherence decrease) in a pseudo-randomized order. Each test condition was presented for 3 s, and successive tests were separated by inter-stimulus intervals of 6–12 s, during which the initial adaptation coherence (50%) was presented again as a re-adapter. A similar temporal structure was used in the additional functional experiments. In Experiments 2 and 3, subjects were adapted to either the lowest (10%) or the highest (90%) coherence, respectively, and coherence was transiently increased to either 50% or 90% (Experiment 2) or decreased to either 50% or 10% (Experiment 3), as in the main experiment. In Experiment 4, the 3-s test stimuli were the three levels of coherence (10%, 50%, and 90%), while a black screen was presented during the adaptation period and the inter-stimulus intervals.
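A minimal sketch of how an event sequence with this temporal structure could be generated is shown below. It assumes that inter-stimulus intervals are drawn uniformly from 6–12 s and that each test condition appears equally often in a shuffled order; the function name, the seed, and the repetition count are hypothetical and are not taken from the study.

```python
import random

ADAPT_S = 60.0              # initial adaptation period at the adapting coherence
TEST_S = 3.0                # duration of each test stimulus
ISI_RANGE_S = (6.0, 12.0)   # re-adaptation interval between tests


def make_schedule(conditions, reps_per_condition, seed=0):
    """Return (onset_s, condition) pairs in a balanced, shuffled order.

    Hypothetical scheme: each ISI is drawn uniformly from ISI_RANGE_S, and the
    adapting coherence is shown during the initial period and every ISI.
    """
    rng = random.Random(seed)
    order = [c for c in conditions for _ in range(reps_per_condition)]
    rng.shuffle(order)
    t, schedule = ADAPT_S, []
    for cond in order:
        schedule.append((t, cond))
        t += TEST_S + rng.uniform(*ISI_RANGE_S)
    return schedule


# Experiment 1: adapt to 50%, test at 90% (increase) or 10% (decrease).
sched = make_schedule(conditions=(90, 10), reps_per_condition=30)
gaps = [b[0] - a[0] - TEST_S for a, b in zip(sched, sched[1:])]
```

The jittered intervals matter for the analysis that follows: they decorrelate the overlapping responses to successive tests, which is what makes the deconvolution of event-related responses well posed.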
In the experiments aimed at identifying retinotopic areas, we employed standard visual stimuli as described in the literature (Engel et al. 1994, 1997; Larsson and Heeger 2006; Wandell et al. 2007). Polar-angle mapping of the visual field was achieved using high-contrast, 36-degree-wide wedges that covered the visual field up to 9 degrees of eccentricity (sparing the fovea) and rotated either clockwise or counterclockwise about the fixation point with a period of approximately 25 s. Eccentricity mapping was achieved by means of expanding and contracting ring-shaped stimuli centered at fixation, which systematically covered the same extent of the visual field as in the polar-angle mapping, with the same period. Snapshots of the wedge and ring stimuli used in the retinotopic mapping, as well as example retinotopic-mapping results, are shown in Figure 2B and C. Several areas, including hMT+ and V6+, were identified by their strong responses to the coherent motion stimulus.
Throughout functional experiments, the subject was instructed to fixate the central fixation cross and to report, by pressing a response button, whenever its color was briefly brightened or dimmed. Offline analyses of subjects' eye positions (measured with the eye tracker attached to the goggles) and of their responses to luminance changes indicated that all subjects maintained fixation during functional scans and had no difficulty performing the moderately demanding task.
Imaging Hardware and Parameters
All experiments were conducted at RIKEN Brain Science Institute (Wako, Japan) on an Agilent 4 Tesla whole-body MRI scanner (Agilent Technologies, Santa Clara, CA, USA). Data were processed on a Sun Blade 2005 workstation (Sun Microsystems, Santa Clara, CA, USA) and a Mac Pro computer (Apple Computer Inc., Cupertino, CA, USA).
High-Resolution Three-Dimensional (3D) Anatomical Images
Both T1-weighted and T2*-weighted anatomical images were acquired from each subject with a whole-brain ‘Duyn’ array (Nova Medical Inc., Wilmington, MA, USA) at a spatial resolution of 1 mm³ (isotropic).
Two-Dimensional (2D) Functional and Anatomical Scans
Images were acquired with a 5-inch transmit/receive quadrature radio-frequency (RF) surface coil covering the most posterior part of the subject's head. T2*-weighted functional images were acquired with a two-segment gradient-recalled echo-planar imaging (EPI) pulse sequence, measuring the BOLD signal in the imaged volume (Ogawa et al. 1990). Sixteen or 17 slices (matrix size: 64 × 48) were prescribed pseudo-coronally, perpendicular to the calcarine sulcus. Each slice was 3 mm thick and covered a field of view of 19.2 × 14.4 cm², resulting in a 3-mm isotropic spatial resolution in the imaged volume. The echo time was 20 ms and the repetition time (TR) of each segment was 42.4 ms, resulting in a volume TR of approximately 1.45 s (for 16 slices) or 1.53 s (for 17 slices); the nominal flip angle over the region of interest (ROI) was 45°. The first volume of each scan was acquired without phase encoding and was used to correct for phase errors (Kim et al. 1996), and the first echo in each segment was a navigator echo used to correct inter-segment variations in phase and amplitude (Bruder et al. 1992). Subjects' respiration and heart rate were monitored and recorded throughout all scans with a pressure sensor placed on the abdomen and a pulse oximeter attached to a fingertip. The recorded signals were used for offline correction of physiological artifacts in the functional images (see the Processing and Analysis of Functional Data section below). Each subject underwent 9–11 functional scans for the motion coherence experiments (average scan length 12′16″) and 14 scans for retinotopic mapping (5 with wedges rotating clockwise, 5 with wedges rotating counterclockwise, 2 with expanding rings, and 2 with contracting rings; each scan lasted 4′27″).
At the beginning of each functional session, a set of 2D FLASH T1-weighted images, at an in-plane resolution of 1.5 × 1.5 mm², was also acquired with the same RF coil and slice prescription as the functional scans.
Processing and Analysis of Data
Processing of High-Resolution Anatomical Data and Computation of Flattened Cortical Patches
For each subject, the two sets of high-resolution anatomical images (T1-weighted and T2*-weighted) were first aligned using the 3dAllineate program in AFNI (Cox 1996), and the T1-weighted images were then divided by the corresponding T2*-weighted images to correct for brightness inhomogeneity across the field of view (Van de Moortele et al. 2009). Using the software package FreeSurfer (available for download at http://surfer.nmr.mgh.harvard.edu/), the inner and outer borders of the cortical gray matter were automatically demarcated (Dale et al. 1999). These data were used to obtain flattened representations of the parieto-occipital cortex (Fig. 2) with the mrFlatMesh program (Wandell et al. 2000), included in the software package mrTools (available for free download at http://gru.brain.riken.jp).
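Why the division helps can be shown with a toy example. Assuming that both anatomical images were acquired with the same receive array and therefore share the same smooth, multiplicative receive-field profile, that profile cancels in the voxel-wise ratio. The 1D sketch below is hypothetical (helper name, floor value, and data are ours); the T2*-weighted "tissue" signal is set to be uniform so that the ratio recovers the T1 contrast exactly, whereas in real images the ratio also carries some T2* contrast.

```python
def ratio_image(t1w, t2sw, floor=1e-6):
    """Voxel-wise T1w / T2*w ratio; voxels with a near-zero denominator are set to 0."""
    return [a / b if abs(b) > floor else 0.0 for a, b in zip(t1w, t2sw)]


# Toy 1D profile: the same smooth receive-coil bias multiplies both images.
bias = [0.5 + 0.1 * i for i in range(10)]
t1_contrast = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
t2s_contrast = [1.0] * 10   # uniform here so that the ratio isolates the T1 contrast
t1w = [b * s for b, s in zip(bias, t1_contrast)]
t2sw = [b * s for b, s in zip(bias, t2s_contrast)]
corrected = ratio_image(t1w, t2sw)
```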
Processing and Analysis of Functional Data
Raw EPI images were corrected in k-space for inter-segment phase and amplitude variations (Kim et al. 1996) and for respiratory and cardiac fluctuations (Hu et al. 1995). After EPI image reconstruction, the time series were corrected for rigid-body motion artifacts using 3dVolReg, a 3D motion compensation tool included in AFNI (Cox 1996). A high-pass Fermi filter with a cutoff frequency of 0.02 Hz (retaining the DC component) was then applied to suppress slow signal drifts in the time series without affecting event-related responses, whose frequencies are sufficiently high. Functional data analysis was performed in MATLAB using the software package mrTools and other custom-built functions. Data were aligned to each subject's high-resolution anatomical images and then converted into percent signal change by dividing the time series of individual voxels by their respective mean values.
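A minimal sketch of the two preprocessing steps is given below, assuming a frequency-domain Fermi gain of the form 1/(1 + exp((f_c − |f|)/w)) applied to every non-DC bin. The transition width w = 0.002 Hz is our assumption (only the 0.02 Hz cutoff is given above), and the percent-signal-change step is assumed to be the division by the mean followed by subtracting 1 and scaling to percent. This is not the mrTools implementation, whose details may differ; a direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def fermi_highpass(x, tr_s, cutoff_hz=0.02, width_hz=0.002):
    """High-pass filter with a Fermi-shaped gain; the DC term is kept unchanged.

    gain(f) = 1 / (1 + exp((cutoff_hz - |f|) / width_hz)); width_hz is an assumed value.
    A direct O(n^2) DFT keeps the sketch dependency-free; use an FFT for real data.
    """
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(1, n):                  # bin 0 (DC) is left untouched
        f = min(k, n - k) / (n * tr_s)     # absolute frequency of bin k, in Hz
        spec[k] *= 1.0 / (1.0 + math.exp((cutoff_hz - f) / width_hz))
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def percent_signal_change(x):
    """Divide by the voxel mean and express the result as percent change from that mean."""
    m = sum(x) / len(x)
    return [100.0 * (v / m - 1.0) for v in x]


# Synthetic voxel: baseline 100, slow drift (~0.0034 Hz), faster signal (~0.14 Hz).
n, tr = 200, 1.45
drift = [5.0 * math.cos(2 * math.pi * 1 * t / n) for t in range(n)]
fast = [math.cos(2 * math.pi * 40 * t / n) for t in range(n)]
raw = [100.0 + d + f for d, f in zip(drift, fast)]
filtered = fermi_highpass(raw, tr)
psc = percent_signal_change(filtered)
```

In this example the drift is attenuated to about 0.03% of its amplitude, while the baseline of 100 and the 0.14 Hz component pass essentially unchanged, which is the behavior the text relies on: drifts removed, DC and event-related frequencies retained.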
Data from the multiple scans of each motion coherence experiment (Experiments 1 through 4) were concatenated to form a single time series per experiment. Each concatenated time series contained between 44 and 93 repetitions per stimulus type, depending on the experiment and the subject. The mean hemodynamic response function (HRF) for each stimulus condition was computed using a deconvolution approach (Dale 1999). The following matrix model was assumed: $S H + N = B$, where $B$ is a column vector of dimensions V × 1 representing the BOLD time course of V volumes of a single voxel; $S$ is the stimulus convolution matrix of dimensions V × TP, where T is the number of stimulus types (T = 2 in Experiments 1 through 3, and T = 3 in Experiment 4) and P is the number of time points at which the hemodynamic response is estimated; $H$ is a column vector of dimensions TP × 1 representing the estimated hemodynamic responses; and $N$ is additive, zero-mean Gaussian noise of dimensions V × 1. The matrix structure of the model for T = 2 is illustrated with an example in Figure 3. Hemodynamic responses of approximately 26 s were reconstructed by solving $H = (S^{\mathsf{T}} S)^{-1} S^{\mathsf{T}} B$, with P = 18 for scans with a volume TR of 1.45 s (16 slices) and P = 17 for scans with a volume TR of 1.53 s (17 slices). To assess the goodness of fit of the deconvolution model, the estimated hemodynamic responses were convolved with the stimulus times to form a model time course, and the fraction of variance r² in the original time course accounted for by this model time course was computed. Statistics (P-values) for the r² values were obtained using a permutation analysis described previously (Gardner et al. 2005).
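The deconvolution step can be sketched in a few lines of standard-library Python. The sketch builds the 0/1 stimulus convolution matrix S from onset volumes (column t·P + p is 1 at volume v when a stimulus of type t began at volume v − p), solves the normal equations (SᵀS)H = SᵀB instead of forming the inverse explicitly, and reports r² of the resulting model time course. The function names and the toy onsets are hypothetical, the solver is a plain Gauss-Jordan elimination written for illustration, and a real analysis would use a numerical library (for example `numpy.linalg.lstsq`).

```python
def build_design(n_vols, onsets_by_type, n_lags):
    """Stimulus convolution matrix S (n_vols x T*n_lags) with 0/1 entries.

    Column t*n_lags + p is 1 at volume v when a stimulus of type t began at
    volume v - p. Onsets are given as volume indices.
    """
    n_cols = len(onsets_by_type) * n_lags
    S = [[0.0] * n_cols for _ in range(n_vols)]
    for t, onsets in enumerate(onsets_by_type):
        for o in onsets:
            for p in range(n_lags):
                if o + p < n_vols:
                    S[o + p][t * n_lags + p] = 1.0
    return S


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def deconvolve(S, B):
    """Least-squares HRF estimate H = (S^T S)^-1 S^T B and the r^2 of the model fit."""
    cols = range(len(S[0]))
    StS = [[sum(row[i] * row[j] for row in S) for j in cols] for i in cols]
    StB = [sum(row[i] * b for row, b in zip(S, B)) for i in cols]
    H = solve(StS, StB)   # normal equations, no explicit inverse
    model = [sum(s * h for s, h in zip(row, H)) for row in S]
    mean_b = sum(B) / len(B)
    ss_res = sum((b - m) ** 2 for b, m in zip(B, model))
    ss_tot = sum((b - mean_b) ** 2 for b in B)
    return H, 1.0 - ss_res / ss_tot


# Toy example: two stimulus types (T = 2), P = 5 lags, partly overlapping trials.
onsets = [[5, 17, 31, 44, 60, 73, 88, 101], [8, 24, 35, 52, 63, 80, 92, 108]]
h_true = [0.0, 0.4, 1.0, 0.7, 0.2] + [0.0, -0.2, -0.5, -0.3, -0.1]
S = build_design(120, onsets, 5)
B = [sum(s * h for s, h in zip(row, h_true)) for row in S]
H, r2 = deconvolve(S, B)
```

Because the trials of the two types partly overlap, the recovered responses are separated only through the joint least-squares fit; with noise-free data the sketch returns `h_true` and r² = 1. With the study's parameters, P = 18 lags at a TR of 1.45 s spans about 26 s.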
In short, the r² values were recalculated for all voxels using 10 different matrices S generated with randomized stimulus presentation times, and all resulting r² values from all voxels were combined into a single distribution. This combined distribution was taken as an estimate of the distribution of r² expected by chance. Taking the 99.5th percentile of this randomized distribution as the cutoff r² value allowed us to select only voxels with P < 0.005. The analysis of response amplitudes was performed only on voxels whose r² values exceeded this threshold. For each area in each hemisphere, an average time series was obtained by averaging the time series of these selected, significantly responsive voxels (areas that did not contain any significant voxel were considered unresponsive to the stimulus). To quantify the response amplitude A (in percent signal change) to each stimulus type in each area, each HRF (one per stimulus type) estimated from the resulting average time series was modeled as a gamma function with amplitude A (Boynton et al. 1996). The parameters of each gamma function were chosen to minimize the mean squared difference between the model time course and the actual time course.
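The thresholding and amplitude-fitting steps can be sketched as follows. Two simplifications are ours and are labeled in the code: (1) for brevity, the r² here is the squared correlation with a single boxcar regressor, not the r² of the full deconvolution model; (2) the gamma HRF is written as ((t − δ)/τ)^(n−1) e^(−(t−δ)/τ) with a delay δ and scaled to unit peak, so that A is the peak amplitude. The exact parameterization used in the study is not given here. For each candidate shape, A has a closed-form least-squares solution; the shape parameters are found by grid search over hypothetical ranges.

```python
import math
import random


def r2_boxcar(voxel, onsets, dur):
    """Simplified stand-in for the model r^2: squared correlation with a boxcar."""
    n = len(voxel)
    box = [1.0 if any(o <= v < o + dur for o in onsets) else 0.0 for v in range(n)]
    mx, my = sum(box) / n, sum(voxel) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(box, voxel))
    sxx = sum((a - mx) ** 2 for a in box)
    syy = sum((b - my) ** 2 for b in voxel)
    return sxy * sxy / (sxx * syy) if sxx > 0 and syy > 0 else 0.0


def permutation_cutoff(voxels, n_onsets, dur, n_perm=10, pct=0.995, seed=0):
    """Pool r^2 over all voxels and n_perm randomized onset sets; return the pct cutoff."""
    rng = random.Random(seed)
    n_vols = len(voxels[0])
    null = []
    for _ in range(n_perm):
        fake = rng.sample(range(n_vols - dur), n_onsets)   # randomized stimulus times
        null.extend(r2_boxcar(v, fake, dur) for v in voxels)
    null.sort()
    return null[min(len(null) - 1, math.ceil(pct * len(null)) - 1)]


def gamma_shape(t, delay, tau, n):
    """Gamma-function HRF shape with a delay, scaled so that its peak equals 1."""
    s = t - delay
    if s <= 0:
        return 0.0
    peak = (n - 1) ** (n - 1) * math.exp(-(n - 1))   # value at s = (n - 1) * tau
    return (s / tau) ** (n - 1) * math.exp(-s / tau) / peak


def fit_gamma(times, hrf):
    """Grid search over (delay, tau, n); the amplitude A is solved in closed form."""
    best = None
    for delay in [0.5 * i for i in range(7)]:            # 0 to 3 s
        for tau in [0.8 + 0.1 * i for i in range(13)]:    # 0.8 to 2.0 s
            for n in (2, 3, 4, 5):
                g = [gamma_shape(t, delay, tau, n) for t in times]
                gg = sum(v * v for v in g)
                if gg == 0:
                    continue
                amp = sum(v * y for v, y in zip(g, hrf)) / gg   # least-squares A
                sse = sum((y - amp * v) ** 2 for v, y in zip(g, hrf))
                if best is None or sse < best[0]:
                    best = (sse, amp, delay, tau, n)
    return best[1:]


# Toy data: 5 responsive voxels (true onsets every 20 volumes) and 20 noise voxels.
rng = random.Random(1)
true_onsets = list(range(10, 200, 20))
signal_voxels = [[(2.0 if any(o <= v < o + 3 for o in true_onsets) else 0.0)
                  + rng.gauss(0.0, 0.5) for v in range(200)] for _ in range(5)]
noise_voxels = [[rng.gauss(0.0, 0.5) for _ in range(200)] for _ in range(20)]
cutoff = permutation_cutoff(signal_voxels + noise_voxels, len(true_onsets), 3)

times = [1.45 * p for p in range(18)]   # P = 18 samples at TR = 1.45 s
hrf = [0.8 * gamma_shape(t, 1.0, 1.2, 3) for t in times]
amp, delay, tau, n_shape = fit_gamma(times, hrf)
```

Minimizing the sum of squared errors is equivalent to minimizing the mean squared difference used in the text, because the number of time points is fixed for a given fit.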
Data from Experiment 4 underwent further analysis. For each ROI in each hemisphere, the normalized average of the HRFs to the three stimulus types was used as the canonical HRF of that ROI in a general linear model, in which the regression coefficients β to be estimated were the amplitudes of the responses to individual trials. One-way ANOVA tests were performed on the β values, grouped by motion coherence level, to assess differences among them.
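A sketch of the trial-wise model and the ANOVA statistic follows. Assumptions, all ours: the canonical HRF is the average HRF scaled to unit peak (the text says only "normalized"), each trial gets one regressor consisting of the canonical HRF placed at its onset (in volumes), and the betas are obtained by ordinary least squares. Only the F statistic is computed here; in practice `scipy.stats.f_oneway` returns both F and its P-value.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def canonical_hrf(hrfs):
    """Average the per-condition HRFs and scale the average to unit peak."""
    avg = [sum(vals) / len(vals) for vals in zip(*hrfs)]
    peak = max(avg, key=abs)
    return [v / peak for v in avg]


def trial_betas(y, onsets, hrf):
    """Least-squares amplitude of every trial, one canonical-HRF regressor per trial."""
    n_vols, n_trials = len(y), len(onsets)
    X = [[0.0] * n_trials for _ in range(n_vols)]
    for j, o in enumerate(onsets):
        for p, h in enumerate(hrf):
            if o + p < n_vols:
                X[o + p][j] = h
    cols = range(n_trials)
    XtX = [[sum(r[i] * r[k] for r in X) for k in cols] for i in cols]
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in cols]
    return solve(XtX, Xty)


def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    k, n = len(groups), len(values)
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Toy example with overlapping trials (onsets in volumes).
hrf = canonical_hrf([[0, 0.4, 1.0, 0.6, 0.2], [0, 0.6, 1.0, 0.6, 0.2], [0, 0.5, 1.0, 0.6, 0.2]])
onsets, true_betas = [0, 3, 7, 10], [1.0, 2.0, 0.5, 1.5]
y = [0.0] * 16
for o, b in zip(onsets, true_betas):
    for p, h in enumerate(hrf):
        y[o + p] += b * h
betas = trial_betas(y, onsets, hrf)
F = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

In the analysis described above, the betas would be grouped by coherence level (10%, 50%, 90%) before computing F; a non-significant F, as reported for hV4, VO, and V3B, means the trial amplitudes do not depend detectably on coherence.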
Retinotopic mapping data were analyzed following standard procedures (Engel et al. 1994, 1997; Larsson and Heeger 2006; Wandell et al. 2007). Each scan in our study consisted of 10.5 cycles of rotating wedges or expanding/contracting rings. The data from the first half cycle, acquired before the longitudinal magnetization had reached a steady state, were discarded. The time courses obtained with counterclockwise-rotating wedges were time-reversed and then averaged with those obtained with clockwise-rotating wedges, taking hemodynamic delays into account. The same time reversal and averaging were applied to the time courses obtained with contracting and expanding rings. The two resulting time courses, each consisting of 10 cycles, were used in a voxel-based correlation analysis to map the polar-angle and eccentricity representations of the visual field, respectively (shown as flattened maps in Fig. 2B and C). The borders of visual areas were defined according to standard conventions (Wandell et al. 2007) (Fig. 2A). The region anterior to human V4 (hV4) was labeled VO; it may encompass both area VO-1 and area VO-2 (Brewer et al. 2005) but was not further subdivided in our study. V3A and V3B are the two areas with distinct retinotopic maps dorsolateral to V3d. V3A is anterior to V3B and responds to coherent motion (Fig. 2D) in Experiment 2 (adaptation to 10% motion coherence, with tests at 50% and 90%). The human middle temporal complex (hMT+), the parieto-occipital complex (V6+), and the posterior intraparietal sulcus (area IPS0/V7) were identified by their strong responses to coherent motion. The anatomical locations of these areas are in close agreement with those reported in the literature (Tootell et al. 1995; Dumoulin et al. 2000; Annese et al. 2005; Pitzalis et al. 2006; Malikovic et al. 2007; Swisher et al. 2007; Amano et al. 2009; Pitzalis et al. 2010).
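The time-reversal step can be made concrete with a small synthetic example. Reversing a run flips the sign of the hemodynamic lag, so after reflection the opposite-direction response leads by the delay instead of lagging by it; shifting the reflected series by twice the delay realigns it with the forward run. The sketch below assumes an integer number of samples per cycle and an integer delay in samples, reflects the reversed run about its first sample, and summarizes the combined time course by the phase of its Fourier component at the stimulus frequency. This is a common way of analyzing phase-encoded data; the study's own correlation analysis may differ in detail, and all names and values here are hypothetical.

```python
import math


def drop_first_half_cycle(x, samples_per_cycle):
    """Discard the first half cycle (acquired before steady state was reached)."""
    return x[samples_per_cycle // 2:]


def combine_opposite_runs(forward, reverse, delay):
    """Time-reverse the opposite-direction run, realign it, and average with the forward run.

    Reversal flips the sign of the hemodynamic lag, so the reflected series is
    shifted forward by 2 * delay samples to make both lags point the same way.
    """
    n = len(reverse)
    flipped = [reverse[(-i) % n] for i in range(n)]   # reflect about the first sample
    shift = (2 * delay) % n
    aligned = [flipped[(i - shift) % n] for i in range(n)]
    return [(a + b) / 2 for a, b in zip(forward, aligned)]


def phase_at(x, n_cycles):
    """Phase (0 to 2*pi) of the Fourier component at the stimulus frequency."""
    w = 2 * math.pi * n_cycles / len(x)
    re = sum(v * math.cos(w * i) for i, v in enumerate(x))
    im = sum(v * math.sin(w * i) for i, v in enumerate(x))
    return math.atan2(im, re) % (2 * math.pi)


# Synthetic voxel: 20 samples per cycle, 10 cycles kept, preferred position 4, delay 3.
P, CYCLES, PREF, DELAY = 20, 10, 4, 3
N = P * CYCLES
forward = [math.cos(2 * math.pi * (i - PREF - DELAY) / P) for i in range(N)]
# Opposite direction: the stimulus reaches PREF at sample -PREF, so the response peaks at -PREF + DELAY.
reverse = [math.cos(2 * math.pi * (i + PREF - DELAY) / P) for i in range(N)]
combined = combine_opposite_runs(forward, reverse, DELAY)
phase = phase_at(combined, CYCLES)
preferred = (phase * P / (2 * math.pi) - DELAY) % P
```

After alignment, the two runs agree and the delay-corrected phase recovers the preferred position, which is why averaging opposite directions cancels the unknown hemodynamic lag only if that lag is accounted for before averaging.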
In Experiment 1, the subject was adapted to visual stimuli consisting of white dots moving at 50% coherence, and test stimuli at increased (90%) or decreased (10%) coherence were repeatedly presented in an event-related fashion (Fig. 1). In agreement with the hypothesis that hV4 plays a role in signaling changes in the stimulus, we found that this area indeed responded positively to both transient increments and decrements in motion coherence (Fig. 4A, left, for one representative hemisphere). All 8 significantly responsive hV4 voxels identified in this hemisphere exhibited a similar response profile, indicating that this response pattern, in particular the positive response to the decrease in motion coherence (from 50% to 10%), reflects a common property of hV4. Similar trends were observed in the remaining 5 hemispheres: overall, hV4 responded significantly positively to both increases (BOLD signal change 0.85 ± 0.26%, P < 0.011, one-tailed t-test) and decreases (0.53 ± 0.1%, P < 0.002) in motion coherence in all 6 hemispheres studied (Fig. 4B, left). In contrast to hV4, the human middle temporal complex (hMT+), a region well studied for its involvement in visual motion processing and thought to be at the same hierarchical level as hV4 (Ungerleider and Mishkin 1982), coded motion coherence faithfully (Rees et al. 2000): it exhibited a positive response to increases in motion coherence from 50% to 90% (0.8 ± 0.12%, P < 0.0005) and a negative response to decreases from 50% to 10% (−0.34 ± 0.11%, P < 0.013) (Fig. 4A, right, and B, right).
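The reported statistics appear consistent with one-sample, one-tailed t-tests across hemispheres if the ± values are standard errors of the mean; this is our inference from the numbers, not something stated in the text. For example, the hV4 increase of 0.85 ± 0.26% over 6 hemispheres gives t = 0.85/0.26 ≈ 3.27 with 5 degrees of freedom and a one-tailed P of about 0.011. The sketch below computes this from summary values, integrating the Student t density numerically with Simpson's rule so that only the standard library is needed; the helper names are hypothetical, and in practice `scipy.stats.t.sf` gives the same tail probability.

```python
import math


def one_tailed_p(t, df, steps=4000):
    """P(T > |t|) for Student's t with df degrees of freedom (Simpson's rule on [0, |t|])."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps   # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - s * h / 3


def t_test_from_summary(mean, sem, n):
    """One-sample t statistic and one-tailed P (in the direction of the effect)."""
    t = mean / sem
    return t, one_tailed_p(t, n - 1)


# hV4, coherence increase in Experiment 1: 0.85 +/- 0.26 %, 6 hemispheres
t_hv4, p_hv4 = t_test_from_summary(0.85, 0.26, 6)
# VO, coherence increase: 1.03 +/- 0.35 %, 4 hemispheres
t_vo, p_vo = t_test_from_summary(1.03, 0.35, 4)
```

The same check for VO (4 hemispheres, t ≈ 2.94, 3 degrees of freedom) gives P ≈ 0.03, in line with the reported P < 0.032.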
Across the identified visual areas (Fig. 2), we further discovered several areas that behaved like either hV4 or hMT+: V3B (Press et al. 2001; Zeki et al. 2003) and the portion of the VO cortex abutting hV4 anteriorly (Brewer et al. 2005) always responded positively to both increments (V3B: in all 6 hemispheres, 0.73 ± 0.13%, P < 0.0013; VO: in 4 of 6 hemispheres, 1.03 ± 0.35%, P < 0.032) and decrements (V3B: 0.21 ± 0.06%, P < 0.0096; VO: 0.65 ± 0.2%, P < 0.024) of motion coherence (Fig. 5A), whereas V3A (Tootell et al. 1997; Bartels et al. 2008) and the parieto-occipital complex V6+ (Pitzalis et al. 2010) responded positively to motion coherence increments (V3A: in 5 of 6 hemispheres, 0.51 ± 0.21%, P < 0.036; V6+: in 5 of 6 hemispheres, 0.71 ± 0.17%, P < 0.032) and negatively to decrements (V3A: −0.4 ± 0.05%, P < 0.007; V6+: −0.53 ± 0.03%, P < 0.0001) (Fig. 5B). Along the dorsal stream, whereas the involvement of human V3A and V6+ in processing aspects of visual motion information has been documented previously (Tootell et al. 1997; Bartels et al. 2008; Pitzalis et al. 2010), the finding that V3B can extract salient changes in visual stimuli provides additional evidence that this area is functionally involved in more than extracting object information from motion cues (Zeki et al. 2003).
The behavior that we observed in hV4, VO, and V3B is unlikely to be an artifact caused by the presence of large dural sinuses, in particular the transverse sinus (TS) (Winawer et al. 2010). First, TS is not always located near hV4; as clearly described by Winawer et al., TS abuts hV4 only in some subjects. Because TS causes local magnetic field inhomogeneity (B0 shift), EPI images near it tend to be dark (low signal intensity due to dephasing), and the BOLD signal tends to be out of phase with the stimulus. Neither signature (low signal intensity or out-of-phase responses in retinotopic mapping) was found near hV4 in our subjects. Second, even in the worst case reported by Winawer et al., TS mostly distorts the representation of the lower vertical meridian around the lateral edge of hV4, and thus the completeness of the hemifield representation in this area, while the bulk of hV4 should not be affected. Finally, VO borders hV4 anteromedially and is thus more distant from TS than hV4 is, and V3B is located dorsally, even further away from TS. Hence, the responses in these two areas should not be affected by the presence of TS.
The characteristic always-positive response pattern observed in hV4, VO, and V3B was also confirmed in Experiments 2 and 3, in which the subject was adapted to either the lowest (10%) or the highest (90%) coherence, respectively, and coherence was transiently increased to either 50% or 90% (Experiment 2) or decreased to either 50% or 10% (Experiment 3), in a design otherwise similar to that of the main experiment. In both experiments, these areas responded positively to all test stimuli, both coherence increments and decrements, irrespective of the level of motion coherence to which the subject was adapted (Fig. 6).
Our r²-based analysis also revealed significantly responsive voxels in visual areas V1, V2, V3, and IPS0/V7. The average responses to transient increments and decrements in motion coherence in these areas, however, resembled neither the pattern observed in hV4 nor that in hMT+ (Fig. 7). The response pattern observed in V1 was opposite to that in hMT+: V1 responded negatively to motion coherence increments (−0.32 ± 0.1%, P < 0.011) and positively to decrements (0.17 ± 0.05%, P < 0.008). In V2 and V3, responses to motion coherence increases (V2: −0.21 ± 0.14%, P > 0.12; V3: 0.39 ± 0.28%, P > 0.11) and decreases (V2: 0.05 ± 0.09%, P = 0.3; V3: 0.04 ± 0.16%, P > 0.4) were, on average, not significant. Area IPS0/V7 showed a positive response to motion coherence increases (0.88 ± 0.11%, P < 0.0003) but not to decreases (0.02 ± 0.03%, P > 0.3). The response pattern characterizing V1, which has also been observed previously (Handel et al. 2007), may be related to the structure of our stimulus: changes in motion coherence lead to local changes in dot density. Stimuli with low coherence have higher spatiotemporal variability in dot density than stimuli with high coherence, which leads to stronger evoked responses in V1 to random motion. This pattern is also compatible with predictive coding models that posit a decrease of activity in early visual areas when higher-level areas are able to describe, or ‘explain away’, coherent features in the visual stimulus (Murray et al. 2002).
In an effort to elucidate the mechanisms responsible for the response behavior observed in hV4, VO, and V3B, in Experiment 4 we measured cortical responses to the three levels of motion coherence employed above (10%, 50%, and 90%), with a black screen as baseline. Response amplitudes did not differ significantly across the three stimulus types in hV4 (P > 0.2, one-way ANOVA), VO (P > 0.2), or V3B (P > 0.6), whereas, as expected, they depended strongly on motion coherence in hMT+ (P < 0.0002, Fig. 8).
A System for Signaling Changes in the Stimulus
The results of the present study revealed a previously unknown property of visual areas hV4, VO, and V3B in humans: they always respond positively to both increments and decrements in motion coherence. The BOLD response, whether an increase or a decrease, is positively correlated with excitatory neural activity, at least in the visual cortex (Shmuel et al. 2002; Shmuel et al. 2006), although the correlation is slightly better between the BOLD response and the local field potential than between the BOLD response and spiking activity (Logothetis et al. 2001). In addition, a generally linear relationship between BOLD and neural responses has been observed in the visual cortex in several studies (Boynton et al. 1996; Rees et al. 2000; Logothetis et al. 2001). We therefore assume a positive relationship between BOLD and neural responses in our study. The present observations extend our previous finding that hV4 always responds positively regardless of whether contrast is incremented or decremented (Gardner et al. 2005), and demonstrate that this change-signaling property of hV4 extends to other stimulus attributes, including motion coherence.
The fact that, with a well-established motion stimulus, this property is found in hV4, VO, and V3B but not in hMT+, V3A, and V6+, areas that play crucial roles in motion processing, also strongly suggests that this mechanism is unlikely to be domain specific. Rather, hV4, VO, and V3B may underlie a generic circuit for detecting changes in the retinal input, irrespective of the strength and identity of the input; that is, they may represent visual changes in an abstract, stimulus-independent manner. In addition, as reported for portions of the lateral occipital cortex (Amedi et al. 2001; Pietrini et al. 2004) and the dorsal visual stream (Prather et al. 2004; Poirier et al. 2005; Ricciardi et al. 2006), including hMT+ (Hagen et al. 2002; Poirier et al. 2006; Ricciardi et al. 2007), growing evidence indicates that the extrastriate cortex processes information that is not restricted to the visual modality. Hence, we speculate that these ‘change detectors,’ in particular those in areas hV4 and VO, may not be limited to vision; they may signal changes in other sensory modalities as well.
A Stimulus-Driven Mechanism
The subjects in our study were engaged in a luminance-change detection task at the centrally located fixation cross, where there were no moving dots. Subjects were asked neither to report changes in motion coherence nor to perform any stimulus-driven actions. We thus contend that the physiological behavior observed in hV4, VO, and V3B reflects an automatic neuronal process responsible for signaling temporal discontinuities in a natural scene. This process is different from (but may be closely related to) other processes that automatically detect spatial discontinuities in the visual scene (Mazer and Gallant 2003; Constantinidis and Steinmetz 2005): abrupt changes in motion coherence may trigger involuntary, stimulus-driven mechanisms of attention, whose effects are fast and decay quickly (Nakayama and Mackeben 1989). It is important to note, however, that the effect we observed in hV4, VO, and V3B, and not elsewhere in the visual cortex, is very different from known attentional modulations due to task contingencies and sustained or transient attention, which typically have large effects throughout visual cortical areas (Liu et al. 2005; Pestilli et al. 2011).
Mechanisms for Signaling Changes in Motion Coherence
A very general mechanism that may explain the pattern of always-positive responses in hV4, VO, and V3B is that, when a change occurs in the visual field, subjects involuntarily defocus attention from fixation, so that inhibition in the periphery, where the change takes place, is selectively reduced in these areas. This reduction would be reflected in the positive responses observed in hV4, VO, and V3B, which would thus serve as sentinels of change for the whole visual system.
Mechanisms specific to motion coherence, by contrast, could result from the tuning properties of neurons in hV4, VO, and V3B. The data shown in Figure 8 are compatible with two hypothetical scenarios. In the first, heterogeneous groups of neurons in these areas may be tuned to at least two coherence levels, preferring either higher or lower motion coherences. Our preliminary results indicate that a small proportion of voxels in hV4, VO, and V3B were indeed tuned to different coherence levels (data not shown). The presence of such coherence-tuned neuron populations would be sufficient to explain the positive BOLD response to a decrease in motion coherence observed in hV4, VO, and V3B: the positive response can be the consequence of selective adaptation. Baseline motion at a higher coherence should selectively adapt the neuron group tuned to higher coherence levels, so that any change from this adapted baseline, including a decrement in motion coherence, would produce a positive BOLD response driven by the neuron populations preferring coherences lower than the adapting coherence. In the second scenario, even if neurons are not selective for the level of motion coherence, a positive response could still be elicited through stimulus-selective adaptation. For example, after adaptation to a higher coherence, the response of the entire area to this coherence would decrease, and a positive response would then be evoked when the coherence is transiently changed to a lower level. Such stimulus-selective adaptation has already been described for neurons in the monkey inferotemporal cortex (Sawamura et al. 2006; Liu et al. 2009; De Baene and Vogels 2010).
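The two adaptation scenarios can be made concrete with a toy model. In scenario 1, three hypothetical populations have Gaussian tuning centered at 10%, 50%, and 90% coherence, and adaptation divides each population's gain by 1 + α times its drive by the adapting coherence. In scenario 2, the neurons respond equally to every coherence, but adaptation suppresses responses only to coherences near the adapter. An hMT+-like monotonic code is included for comparison. All tuning widths, gain rules, and parameter values are illustrative assumptions, not estimates from the data; the model only shows that both scenarios yield positive responses to increments and decrements from any adapter, whereas a monotonic code follows the sign of the change.

```python
import math

LEVELS = (0.1, 0.5, 0.9)   # hypothetical preferred coherences of three populations
SIGMA = 0.2                # assumed tuning / adaptation width (coherence units)
ALPHA = 2.0                # assumed adaptation strength


def tuned(c, mu):
    """Gaussian tuning of a population preferring coherence mu."""
    return math.exp(-((c - mu) ** 2) / (2 * SIGMA ** 2))


def scenario_tuned(adapt, test):
    """Scenario 1: each population's gain is reduced according to its drive by the adapter."""
    gains = [1.0 / (1.0 + ALPHA * tuned(adapt, mu)) for mu in LEVELS]

    def area(c):
        return sum(g * tuned(c, mu) for g, mu in zip(gains, LEVELS))

    return area(test) - area(adapt)   # change relative to the adapted baseline


def scenario_untuned(adapt, test):
    """Scenario 2: untuned neurons; adaptation suppresses responses only near the adapter."""

    def area(c):
        return 1.0 - 0.5 * tuned(c, adapt)   # equal drive for all coherences, minus adaptation

    return area(test) - area(adapt)


def monotonic(adapt, test):
    """hMT+-like reference: response grows with coherence, so its sign follows the change."""
    return test - adapt


# Adapter/test pairs of Experiments 1, 2, and 3.
for adapt, tests in ((0.5, (0.9, 0.1)), (0.1, (0.5, 0.9)), (0.9, (0.5, 0.1))):
    for test in tests:
        print(adapt, test, round(scenario_tuned(adapt, test), 3),
              round(scenario_untuned(adapt, test), 3), round(monotonic(adapt, test), 3))
```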
Alternatively, hV4, VO, and V3B may generate an abstract representation of change by virtue of their specific interactions with the areas from which they receive feature-specific inputs. Several recent studies have shown that the response behavior of hV4 differs qualitatively from that of earlier cortical areas (Gardner et al. 2005; Donner et al. 2008; Sligte et al. 2009). The opposing patterns of responses in hV4 and hMT+ to decrements in motion strength are consistent with the opposing response modulations observed during bistable perception, which point to a strong interaction between the dorsal and ventral visual pathways at the level of these two areas (Donner et al. 2008).
Besides contrast and motion coherence, it remains to be studied how hV4 and related areas respond to changes in other stimulus features, such as color saturation and object intactness. However, unlike luminance contrast, to which almost all neurons in V4 (and presumably hV4) respond monotonically (Cheng et al. 1994; Liu and Wandell 2005), only a subset of V4 neurons is selective for colors or shapes (Roe et al. 2012). Thus, the behavior of hV4 is ultimately likely to be determined by the bulk of neurons that are not selective for, but broadly tuned to, these visual properties, which could still evoke positive responses through stimulus-selective adaptation. Future studies are needed to determine whether this change-signaling property of hV4 and related areas generalizes to other stimulus features.
Supplementary material can be found at: http://www.cercor.oxfordjournals.org/
This work was partially supported by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) grants 18020033 (to K.C.) and 20020033 (to K.C.), and the Japan Society for the Promotion of Science (JSPS) grant 20300114 (to K.C.). Funding to pay the Open Access publication charges for this article was provided by the RIKEN Brain Science Institute.
We thank Tobias Donner for inspiring discussion and Danilo Scelfo for technical support. Conflict of Interest: None declared.