Many theories of visual object perception assume the visual system initially extracts borders between objects and their background and then “fills in” color to the resulting object surfaces. We investigated the transformation of chromatic signals across the human ventral visual stream, with particular interest in distinguishing representations of object surface color from representations of chromatic signals reflecting the retinal input. We used fMRI to measure brain activity while participants viewed figure-ground stimuli that differed either in the position or in the color contrast polarity of the foreground object (the figure). Multivariate pattern analysis revealed that classifiers were able to decode information about which color was presented at a particular retinal location from early visual areas, whereas regions further along the ventral stream exhibited biases for representing color as part of an object's surface, irrespective of its position on the retina. Additional analyses showed that although activity in V2 contained strong chromatic contrast information to support the early parsing of objects within a visual scene, activity in this area also signaled information about object surface color. These findings are consistent with the view that mechanisms underlying scene segmentation and the binding of color to object surfaces converge in V2.
For us to perceive “color”, our brains need to transform retinal signals about the relative dominance of short, medium, and long wavelengths of light into representations of colored surfaces. These surface representations are thought to be the critical input for higher-level functions of the visual system, laying the foundation for our ability to recognize objects (Grossberg et al. 1994; Nakayama et al. 1995). In the visual cortex, the computation of object surface color involves distinct levels of processing. Early stages encode differences in the wavelengths of light at the retina, which inform the brain about borders between objects and their backgrounds. Later stages are thought to convert these signals to more complex representations of color, which are generally resistant to changes in illumination within the environment (Land 1977) and are treated as an integral property of an object. Here, we explore this mapping of chromatic signals in the human visual cortex from early chromatic contrast representations underlying the extraction of object shape to representations of color as an object's surface property.
Although the majority of neurons in the primary visual cortex (V1) are color sensitive, their functional role has been predominantly associated with the early extraction of edges and form information within a visual scene (Hubel and Wiesel 1977; Livingstone and Hubel 1984; Hubel and Livingstone 1987; Johnson et al. 2001, 2008; Friedman et al. 2003; Shapley and Hawken 2011). For instance, most color-selective neurons in this region elicit limited responses to the interior of a colored figure but show strong responses to chromatic contrast borders between figures and their backgrounds (Hubel and Wiesel 1968; Hubel and Livingstone 1990; Lennie 1998; Friedman et al. 2003; Shapley and Hawken 2011). Hierarchical models of visual processing propose that these color-selective cells in early retinotopic cortex help to segment visual scenes into figure-ground representations (Hubel and Wiesel 1968, 1977; Lee et al. 1998), whereas cells in regions beyond early visual cortex compute surface properties of the resulting objects, including those that create our perception of surface color (Zeki and Marini 1998; Conway et al. 2007; Bouvier et al. 2008).
Many theories of visual object perception have also suggested that the visual system initially segments objects from the background and later “fills in” or “binds” color to the resulting surfaces (Grossberg and Mingolla 1985; Humphreys et al. 2000; Roelfsema et al. 2007; Shipp et al. 2009). This idea has been supported by findings that surface color perception is highly contingent on object form (Krauskopf 1963; Bloj et al. 1999; Feitosa-Santana et al. 2011), whereas form perception is limited primarily by color contrast, rather than color per se (Kovacs and Julesz 1992; Clifford et al. 2004; Wilson and Switkes 2005; Rentzeperis and Kiper 2010). Furthermore, substantial evidence suggests that the majority of cells in early visual areas respond primarily to polarity-specific changes in color contrast (Hubel and Wiesel 1968; Lennie 1998; Friedman et al. 2003; Shapley and Hawken 2011), whereas responses in mid-tier areas, particularly area V4, reflect perceived color that is typically associated with an object's surface (Zeki 1980, 1983, 1990; Zeki and Marini 1998; Bartels and Zeki 2000; Kusunoki et al. 2006; Bouvier et al. 2008). Surface color judgments are also affected by high-level cognitive factors such as color memory (Hansen et al. 2006), as are the neural responses of areas such as V4 and parts of the inferior temporal cortex (Zeki and Marini 1998; Slotnick 2009; Bannert and Bartels 2013). Overall, then, the representation of object surface color has been predominantly thought to involve mid-tier cortical areas (although see Shapley and Hawken 2011) where color is integrated with other object features, such as object form (Grill-Spector et al. 1998; Zeki and Marini 1998).
In this study, we collected functional magnetic resonance imaging (fMRI) data and used 2 variants of multivoxel pattern analyses (MVPA) to explore the transformation of chromatic information in the human cortex. Specifically, we examined the extent to which each region in the human ventral visual stream represents chromatic signals that reflect the sensory input arriving from the retina versus chromatic signals representing the higher-level encoding of color as a “bound” property of an object's surface. We developed a novel variation of MVPA to test within each visual region the relative weights given to neural coding of color (red vs. green) at a set retinal location (“color contrast polarity”) versus the neural coding of color of an object's surface (red vs. green) independent of its position on the retina (“object surface color”). This allowed us to effectively pit these 2 types of signals against each other and document the biases in each visual area for chromatic mechanisms contributing to the extraction of figure-ground information versus the encoding of object surface color (Fig. 1). Then, using standard MVPA logic, we tested for the presence of this information irrespective of any processing bias.
Methods and Materials
Ten participants (5 male and 5 female) with normal or corrected-to-normal vision took part in this study. They provided written informed consent and were paid for their time. The study was approved by the Macquarie University Human Research Ethics Committee.
Stimuli were presented with Psychtoolbox 3.0.8 (Pelli 1997) running under Matlab 2013b (Mathworks). They were back-projected onto a screen placed at the end of the scanner bore and viewed through a mirror located above the subject's eyes from a viewing distance of 135 cm. Screen resolution was 1280 × 1024 pixels, and the refresh rate of the projector was 60 Hz. All stimuli were presented at the center of the screen, where subjects were asked to fixate.
Main Experimental Stimuli
We generated 4 unique figure-ground stimuli (Fig. 1) by modifying a single red/green square-wave radial grating (6 cycles per revolution; radius 7 degrees of visual angle) that was developed based on a stimulus used by Fang et al. (2009). The first stimulus had the red segments of this grating slightly elongated in the radial direction by 0.5° to induce the perception of a red object on a green background (R1). The second stimulus had similarly elongated green segments to induce the perception of a green object on a red background (G2). We then created 2 additional stimuli by shifting the phase of these stimuli, such that the red and green segments fell at exactly opposite retinal locations (R2 and G1). Prior to scanning, the red and green segments were calibrated for equiluminance inside the scanner bore using the minimum flicker technique (Anstis and Cavanagh 1983). To minimize any small luminance differences that might artifactually drive classification performance, we also added uniform luminance noise increments or decrements (±10% of mean luminance) to the 2 colors in each stimulus, with increments and decrements applied to each color equally often across runs.
Stimulus Localizer Stimuli
We applied a strict control in our experimental design to avoid incorporating stimulus artifacts that could influence a linear classifier's performance in decoding our stimuli. Specifically, since we were interested in dissociating “color contrast” signals reflective of the retinal input from “object surface color” signals, we included 2 stimulus localizer runs at the end of our main experiment that we then used to separate voxels corresponding to the elongated edge regions of our stimuli from those responding to the central portion.
In the localizer runs, similar stimuli were presented to those of the main experiment but restricted to 1 of 2 apertures: an “inner” (1.3–4.0° eccentricity) or an “outer” (4.0–8.0° eccentricity) aperture (Fig. 2). This enabled us to isolate regions of interest (ROI) corresponding exclusively to the “inner” part of our stimuli, thus removing activation elicited by the outer edge regions (see also “Feature (voxel) selection and ROI definition”). Hence, these localizer runs ensured that the voxel activation patterns submitted to our classification analyses corresponded to the central portion of the stimulus where the position and color of the object (i.e., figure) was ambiguous (Fig. 2B).
In addition to these localizer runs, in a separate scan session, we performed detailed retinotopic mapping and localization of human V4 (hV4) and lateral occipital complex (LOC) for each participant (see “Retinotopic mapping and region-of-interest definition”).
fMRI and Analysis
Each participant was scanned on 8 experimental runs followed by 2 stimulus localizer runs. Prior to scanning, participants familiarized themselves with the stimuli and task instructions. During the runs, they fixated on a point in the center of the screen (0.15° in size) and performed a fixation task, pressing a single button as quickly as possible whenever a small luminance change occurred at fixation. Luminance increments and decrements alternated from block to block, occurring approximately every 1.7–2.4 s throughout each run.
For the main experimental runs, stimuli were displayed in a blocked design. Each block lasted 12 s followed by a 1-s randomly generated white pixel noise mask to prevent the formation of an afterimage. Each stimulus condition was presented 4 times per run in a pseudorandom order and was preceded equally often by the other conditions. The stimulus localizer runs employed a similar blocked design to the main experiment and used the same noise masks and fixation task. However, each 12-s stimulus alternated with blank 12-s fixation blocks. The 8 stimulus conditions (i.e., 4 main stimuli seen through 2 different apertures) were repeated twice per run such that each stimulus (e.g., R1) was first viewed through the inner aperture, then a fixation block occurred, and then the same stimulus was viewed through the outer aperture. Stimulus order was pseudorandom across the 2 runs, ensuring each of the 4 stimuli preceded each other equally often.
fMRI Data Acquisition
Data were collected using a 3T Siemens Verio scanner and 32-channel head coil at the Macquarie Medical Imaging facility, Macquarie University Hospital, Sydney, Australia. Blood oxygen level-dependent responses were measured using echoplanar imaging with an ascending sequence and the following parameters: repetition time = 2.5 s; echo time = 32 ms; flip angle = 80°; slice thickness = 2 mm; interslice gap = 0.2 mm; voxel size = 2 × 2 × 2 mm. We collected 33 slices positioned at an orientation parallel to the calcarine sulcus. We collected a high-resolution 3D structural scan (3D MPRAGE; 1 × 1 × 1 mm³ resolution) at the beginning of the scan session.
We carried out minimal preprocessing using Statistical Parametric Mapping (SPM8, Wellcome Department of Imaging Neuroscience, University College London, UK). Data were coregistered in raw AC-PC space and not transformed to any standard coordinate system; all analyses were carried out in individual subject space. We corrected for head motion, made a mean intensity adjustment, and removed low-frequency noise and drift, but no smoothing was applied. Functional data from the main experimental session and the separate retinotopic mapping session (see “Retinotopic mapping and region-of-interest definition”) were co-registered via the structural images collected for each participant in these 2 scan sessions.
Retinotopic Mapping and Region-of-Interest Definition
In a separate scan session using standard protocols (Engel et al. 1997), we performed retinotopic mapping scans. A high-resolution structural image and functional data were collected as per the main experimental scan sequences. Functional data were transformed onto an inflated representation of the cortical surface using Freesurfer (Sereno et al. 1995). Manual delineation of the borders between visual areas V1, V2, V3, and V4 was based on the phase of the responses of each voxel to standard polar angle (wedge) protocols. The human V4 complex (hV4) was defined as those voxels anterior to V3v in the fusiform gyrus representing both upper and lower visual fields (Brewer et al. 2005). We further restricted our definition to exclusively color-responsive voxels by employing an additional V4 localizer scan that contrasted colored Mondrian versus monochrome stimuli (McKeefry and Zeki 1997). Thus, hV4 was defined as all voxels surviving a threshold of P < 0.05 (FDR-corrected) within this prescribed location. Average Talairach coordinates of the peak responding voxel (left hemisphere −27 ± 6, −74 ± 7, 14 ± 4, right hemisphere 26 ± 5, −79 ± 4, −16 ± 5) were consistent with the definition of hV4 by Brewer et al. (2005). As degraded representations of the visual field meridians beyond the anterior border of hV4 were observed in most of our participants, we did not isolate the anterior color-responsive regions, VO-1 and VO-2 (Brewer et al. 2005; Mullen et al. 2007). A scan that contrasted objects versus scrambled objects isolated object-selective voxels within the anatomical region LOC (Malach et al. 1995); the same statistical threshold (P < 0.05, FDR-corrected) was applied. In all cases, bilateral visual regions were combined to form single ROIs.
Feature (Voxel) Selection and ROI Definition
To avoid classification performance being influenced by the elongated edge regions of our stimuli, we used our independent localizer data to define, in each visual region, the voxels that responded to the inner portion of the stimuli, where the position and color of the object were ambiguous (i.e., excluding voxels that responded to the varying elongated edges; see "Stimulus localizer stimuli"). In each participant, feature selection masks were created from these data for each visual area using the general linear model (GLM) as implemented in SPM. We modeled inner and outer conditions from the stimulus localizer runs and isolated those voxels surviving a threshold of P < 0.05 (FDR-corrected) in a contrast between these 2 conditions. We then intersected this mask with each separately mapped visual area (see "Retinotopic mapping and region-of-interest definition") to create ROIs for our classification analyses of the experimental data. We did not create the same feature selection masks for area LOC because, for most subjects, an insufficient number of voxels survived the statistical threshold in this region. Hence, classifiers were presented with voxel activation patterns from LOC that corresponded to the entire stimulus including the elongated edge regions (inner + outer > fixation), surviving a threshold of P < 0.05 (FDR-corrected). Also note that receptive field sizes increase along the ventral visual hierarchy, so fewer voxels can be isolated as responding purely to the inner region of the stimuli in hV4 than in V1. For example, voxel numbers presented to the classifier from V1 ranged from 266 to 594 across subjects, whereas voxel numbers from hV4 ranged from 53 to 74 across subjects.
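The intersection step described above can be sketched as a simple boolean mask operation. This is a toy illustration, not the actual SPM pipeline: the function name, the threshold value, and the 10-voxel arrays are all hypothetical stand-ins for the FDR-corrected GLM contrast and the retinotopically defined ROI.

```python
import numpy as np

def select_inner_voxels(localizer_t, roi_mask, t_threshold):
    """Keep ROI voxels whose localizer contrast (inner > outer)
    exceeds a threshold (toy stand-in for the FDR-corrected
    GLM contrast described in the text)."""
    inner_mask = localizer_t > t_threshold  # voxels preferring the inner aperture
    return inner_mask & roi_mask            # restrict to the retinotopic ROI

# Toy 10-voxel "volume": t-values from the inner > outer contrast
t_vals = np.array([3.1, 0.2, 4.5, -1.0, 2.9, 0.1, 5.2, 2.2, 3.8, -0.5])
# Hypothetical V1 mask from the separate retinotopic mapping session
v1_mask = np.array([1, 1, 1, 1, 1, 0, 0, 0, 1, 1], dtype=bool)
selected = select_inner_voxels(t_vals, v1_mask, t_threshold=2.5)
```

Only voxels that both survive the localizer contrast and fall inside the mapped visual area enter the classification analyses.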
Multivariate Pattern Classification
Multivariate analyses were performed separately on each subject for each ROI. In all analyses, stimulus classes were modeled as regressors in a GLM, and the resulting beta-estimates were used to train and test linear classifiers (Vapnik 1998). We employed the Matlab Decoding Toolbox (Hebart et al. 2014), which implements LibSVM software (http://www.csie.ntu.edu.tw/~cjlin/libsvm), to train support vector machines (SVMs) to classify our data sets. For each ROI of each subject, we calculated mean classification accuracy from cross-validations of classifier performance (see details for each "Analysis 1–3" below). We performed permutation testing on our ROI-based decoding to determine significance at the group level (Schreiber and Krekelberg 2013; Stelzer et al. 2013). We used a balanced block permutation strategy as described by Schreiber and Krekelberg (2013), creating null data sets by randomly shuffling class labels and re-running our classifier analyses. We then calculated an average decoding accuracy across our 10 subjects and repeated this procedure 10 000 times to create a null distribution. The true group-mean decoding accuracy was considered significant (at P < 0.05) if it exceeded the 95th percentile of the null distribution.
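The permutation logic can be sketched as follows. This is a minimal illustration of the null-distribution construction only; `decode_fn`, `dummy_decode`, and the toy numbers are hypothetical, and the actual procedure used the balanced block scheme of Schreiber and Krekelberg (2013) rather than unconstrained shuffling.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_p_value(true_accuracy, decode_fn, labels, n_perm=10_000):
    """Build a null distribution by re-running decoding on shuffled
    class labels, then compare the true group-mean accuracy to it."""
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(labels)  # randomly shuffle class labels
        null[i] = decode_fn(shuffled)       # group-mean accuracy under the null
    # P-value: proportion of null accuracies at or above the true accuracy
    return float(np.mean(null >= true_accuracy))

# Toy usage: a placeholder "decoder" that hovers around chance (50%).
# A real decoder would retrain the SVMs on the shuffled labels.
labels = np.tile([0, 1], 8)
def dummy_decode(shuffled_labels):
    return 0.5 + 0.01 * rng.standard_normal()

p = permutation_p_value(0.75, dummy_decode, labels, n_perm=1000)
```

Because each permutation reruns the full decoding pipeline, the resulting null distribution reflects chance performance under the same analysis as the true data.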
Classifier Analysis 1: Decoding Representational Biases for Color Contrast Polarity Signals versus Object Surface Color
To understand how the brain transforms chromatic information across the visual cortex, we used a modification of MVPA, namely, pitting 2 types of chromatic signals against each other (Seymour et al. 2009). Our first analysis tested in which visual brain areas a classifier would discriminate the stimuli by using voxel activation patterns associated with color (red vs. green) on an object's surface over patterns associated with color contrast polarity, reflective of the early retinal input (red vs. green). All 4 stimuli were modeled as separate regressors in a GLM, and the resulting beta-estimates were used as data for our cross-classification analysis. Specifically, using data from all 8 runs, we trained an SVM classifier to assign a decision boundary to separate activation patterns related to red objects from activation patterns related to green objects presented at retinal position 1 (i.e., R1 vs. G1) and then tested whether this decision boundary could be applied to new data (the other 2 stimuli) from the 8 runs to correctly classify red objects from green objects presented at retinal position 2 (i.e., R2 vs. G2), and vice versa. Note that the train and test phases are conducted on data from viewing different stimuli. As all of our stimuli varied only by whether the red or green segments fell at a particular retinal location and appeared to be part of an object or not, our analysis determined whether distinct visual regions have a bias for encoding chromatic signals reflecting the sensory input versus chromatic signals representing color as a property of an object's surface. If voxels in a given ROI are more sensitive to “object surface color,” the classifier will generalize across the retinal position of the object to produce significant above-chance decoding performance (e.g., labeling R2 as being more similar to R1 than to G1). 
If, however, the ROI predominantly encodes "color contrast polarity," the classifier would systematically classify surface color "incorrectly," which would show up as significantly "below"-chance performance (e.g., labeling R2 as being more similar to G1 than to R1 because of the color at each retinal position). Thus, the classifier can make a correct color classification based on color contrast signals at the expense of an incorrect surface color classification, or vice versa. Mean surface color classification accuracy was computed over the 16 cross-validations (i.e., 8 runs under 2 train/test scenarios). Unlike conventional leave-one-out approaches, this method did not give the classifier any examples from the test stimuli during the training phase (i.e., the classifier was forced to generalize to a new "stimulus" set), making this a strict approach for examining "representational biases" across the visual cortex (Seymour et al. 2009).
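The cross-classification scheme can be illustrated with simulated voxel patterns. This is a sketch using scikit-learn's `LinearSVC` as a stand-in for the LibSVM classifier; the simulated data assume voxels carrying a position-invariant surface-color code, so decoding generalizes above chance.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_runs, n_vox = 8, 50

# Hypothetical voxel pattern for each surface color (0 = red, 1 = green),
# shared across retinal positions to simulate an "object surface color" code
color_pattern = rng.normal(size=(2, n_vox))

def simulate(noise=0.1):
    """One noisy pattern per run and color, at one retinal position."""
    X = np.vstack([color_pattern[c] + noise * rng.normal(size=n_vox)
                   for _ in range(n_runs) for c in (0, 1)])
    y = np.tile([0, 1], n_runs)
    return X, y

X_pos1, y = simulate()   # patterns for R1 vs. G1
X_pos2, _ = simulate()   # patterns for R2 vs. G2

# Train at retinal position 1, test at position 2: above-chance accuracy
# here requires generalization across the object's retinal position
clf = LinearSVC().fit(X_pos1, y)
accuracy = clf.score(X_pos2, y)
```

In a region dominated by color contrast polarity, the simulated patterns would instead follow the color at each retinal location, and the same train/test scheme would yield systematically below-chance surface-color classification.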
Classifier Analysis 2: Decoding (Color-Invariant) Figure-Ground Organisation
The perception of object form defined by color is primarily limited by color contrast boundaries, rather than color per se (Kovacs and Julesz 1992; Clifford et al. 2004; Wilson and Switkes 2005; Rentzeperis and Kiper 2010). Based on this, we conducted a second analysis to test for evidence of figure-ground representations that are invariant to color contrast polarity. Here, we pooled data into 2 data sets, modeling all objects displayed at position 1 (irrespective of their color) as one separate regressor in a GLM, and all objects at position 2 (irrespective of their color) as another separate regressor. SVMs were therefore explicitly forced to generalize across both color contrast polarity and object surface color signals (i.e., R1 + G1 vs. R2 + G2) to successfully learn the "position" at which the object (relative to its background) was presented on the retina. Classification performance was tested using a standard "leave-one-out" validation approach, in which the classifier was trained using all data from 7 of the 8 runs, with the remaining run being used for validation (iterating through train and test examples of the same stimuli until every run had been used as a test set). Above-chance performance would show the "presence" of information about the figure-ground organization of the stimulus (i.e., object position) that was invariant to the specific chromatic input.
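The leave-one-run-out scheme used here (and in Analysis 3) can be sketched as below. The run structure and toy data are hypothetical, and `LinearSVC` again stands in for the LibSVM classifier.

```python
import numpy as np
from sklearn.svm import LinearSVC

def leave_one_run_out_accuracy(X, y, runs):
    """Train on all runs but one, test on the held-out run,
    and iterate until every run has served as the test set."""
    scores = []
    for held_out in np.unique(runs):
        train = runs != held_out
        clf = LinearSVC().fit(X[train], y[train])
        scores.append(clf.score(X[~train], y[~train]))
    return float(np.mean(scores))

# Toy data: 8 runs × 2 conditions (object at position 1 vs. 2), 20 voxels
rng = np.random.default_rng(2)
position_pattern = rng.normal(size=(2, 20))
X = np.vstack([position_pattern[c] + 0.1 * rng.normal(size=20)
               for _ in range(8) for c in (0, 1)])
y = np.tile([0, 1], 8)                 # 0 = position 1, 1 = position 2
runs = np.repeat(np.arange(8), 2)      # run label for each pattern
acc = leave_one_run_out_accuracy(X, y, runs)
```

Holding out whole runs (rather than individual trials) keeps the test data independent of the training data, since fMRI patterns within a run are temporally correlated.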
Classifier Analysis 3: Decoding Object Surface Color
Because our primary analysis aimed to specifically examine representational “biases” for chromatic information in the human visual cortex, we carried out a third, more standard, analysis to investigate which visual areas carried information about surface color irrespective of a relative bias for encoding retinotopic color contrast signals reflecting the early spatial arrangement and content of light at the retina. That is, although some areas could be biased toward encoding color contrast over surface color signals, they might nonetheless still carry information about object surface color that we would miss with our direct comparison (Analysis 1). We again pooled data into 2 data sets and modeled all red objects (irrespective of their position on the retina) as one regressor, and all green objects (irrespective of their position on the retina) as another regressor in a GLM. Thus, in both data sets, SVMs were explicitly forced to ignore chromatic signals at a particular retinal location (i.e., color contrast polarity) to successfully distinguish red objects from green objects on the basis of voxel activation patterns (i.e., R1 + R2 vs. G1 + G2). We measured classification accuracy using a standard “leave-one-out” approach, again testing classifier performance by iterating through train and independent test examples of the same stimuli. Above-chance performance in a given ROI would show the “presence” of object surface color information, independent of the object's retinal position.
Results
Behavioral data showed that subjects followed the task instructions correctly and attended to the fixation task. Across participants, the mean proportion of detected luminance increments and decrements of the fixation point was 72% (SD = 5%) and 70% (SD = 5%), respectively; mean reaction times were 430 ms (SD = 20 ms) and 440 ms (SD = 18 ms), respectively.
Classifier Analysis 1: Decoding Representational Biases for Color Contrast Polarity versus Object Surface Color
There were significant differences in classification biases for color contrast polarity versus object surface color across the visual hierarchy (repeated-measures one-way ANOVA: F(1,4) = 17.73, P < 0.001). Figure 3 shows the mean surface color classification accuracy by ROI. Positive values signify a bias toward surface color, negative values signify a bias toward color contrast, and a value of 0 indicates no bias (i.e., equal information about object surface color and color contrast polarity, or no detectable information about either). In early visual areas (V1, V2, and V3), we found a strong bias for activation patterns to reflect color at a given retinal location. In these areas, the classifier based its decoding on color contrast polarity information (significant in V1: −79%, P = 0.002; and V2: −74%, P = 0.003), rather than on information about surface color. Conversely, activation patterns in higher visual areas resulted in positive classification values indicating a bias for object surface color, which reached significance in the LOC (hV4: 54%, P = 0.148; LOC: 57%, P = 0.031). Note that since there is no a priori baseline for this bias analysis, the absolute classification performance for each region is less informative than differences between regions; differences in the strength of the bias are informative about distinct properties of the ROIs.
Classifier Analysis 2: Decoding (Color-Invariant) Figure-Ground Organisation
The results from Analysis 1 revealed a strong representational bias in early visual areas for polarity-specific color contrast signals. Our second analysis revealed that early areas also contain strong information about figure-ground organization (i.e., object position) that is invariant to color contrast polarity. Figure 4 reports the mean classification accuracy for decoding object position in the absence of useful color information. Here, linear classifiers were trained to generalize across both color contrast polarity and object surface color (i.e., R1 + G1 vs. R2 + G2) to successfully discriminate an object's position on the retina from independent test data sets. Classifiers could learn to discriminate color-invariant object position based on voxel activation patterns in V1 (59%, P = 0.0202), V2 (55%, P = 0.0394), and V3 (57%, P = 0.0340), but not in higher visual areas (hV4: 53%, P = 0.111; LOC: 51%, P = 0.845). Note that because the magnitude of decoding performance can depend both on the level of functional involvement and on functional anatomy, including region size, we avoid making any direct comparisons between ROIs here. Unlike our bias analysis, which is less susceptible to influences of region size (i.e., it indexes the relative weights given to different types of coding within each region), what is of relevance here is whether color-invariant figure-ground information "exists" within specific brain regions.
Classifier Analysis 3: Decoding Object Surface Color
In Analysis 3, we investigated which visual areas hold information about surface color irrespective of relative biases for encoding color contrast polarity (Analysis 1). Figure 5 reports the mean classification accuracy for decoding object surface color. SVM classifiers were trained to generalize across chromatic signals at particular retinal locations (i.e., R1 + R2 vs. G1 + G2) to successfully distinguish object surface color from independent test data sets. Consistent with findings from Analysis 1, SVM classifiers could learn to discriminate object surface color from voxel activation patterns in LOC (67%, P = 0.010), irrespective of the position at which the objects, and therefore colors, were presented on the retina. There was also significant decoding of object surface color in hV4 (62%, P = 0.008). In addition, Analysis 3 revealed significant surface color information present in voxel activation patterns of V2 (56%, P = 0.043), but not in other early visual regions (V1: 54%, P = 0.321; V3: 54%, P = 0.476).
Discussion
We examined the representation of chromatic information across the human visual cortex from the early encoding of color contrast polarity, closely matching the retinal input, to representations of color as an inherent surface property of an object. Consistent with classic hierarchical models of visual processing, retinotopic chromatic signals (color contrast polarity) dominated voxel activation patterns (and thus classification) in early visual areas, whereas regions further along the ventral stream encoded representations of color with respect to the object. Our secondary analyses also showed that although early retinotopic areas were biased toward using chromatic signals to convey strong figure-ground information, brain activity in area V2 also carried information about object surface color. These results support claims that V2 may play an important role in the computation of object surface color (Zhou et al. 2000; Qiu and von der Heydt 2005; Zhaoping 2005; Grossberg and Hong 2006). This also supports current theories suggesting that color vision provides separate input to distinct channels underlying the processing of an object's shape and surface properties (Johnson et al. 2008; Cavina-Pratesi et al. 2010; Gheiratmand and Mullen 2014). Furthermore, our data support evidence for a shared circuitry within V2 that links scene segmentation and feature binding mechanisms to achieve a coherent perception of color "bound" to object surfaces (Grossberg and Mingolla 1985; Humphreys et al. 2000; Zhou et al. 2000; Roelfsema et al. 2007; Bartels 2009; Shipp et al. 2009).
The strong bias of V1 and V2 to represent color contrast polarity information is consistent with the accepted view that early retinotopic regions encode information corresponding to the spatial arrangement of light on the retina (Hubel and Wiesel 1977; Livingstone and Hubel 1984). However, although physiological recordings show that a minority of cells in these regions respond to interior aspects of uniformly colored figures (Thorell et al. 1984; Johnson et al. 2001, 2008), our results likely reflect the majority of cells in these areas that respond to chromatic contrast boundaries between figure and background (Hubel and Wiesel 1977; Livingstone and Hubel 1984; Hubel and Livingstone 1987; Leventhal et al. 1995; Lennie 1998; Zhou et al. 2000; Friedman et al. 2003; Johnson et al. 2008; Shapley and Hawken 2011). Moreover, our second analysis revealed that responses in V1 and V2 signal figure-ground organization (i.e., object position) that is invariant to object color. Thus, our results support findings in the macaque that areas as early as V1 are involved in representing figure-ground information (Lamme 1995; Zipser et al. 1996; Lee et al. 1998; Zhou et al. 2000; Fang et al. 2009), which in this case was based purely on the processing of chromatic boundary signals and blind to the polarity of this input.
The lack of evidence for color-invariant figure-ground information in hV4 is consistent with this region's involvement in analyzing shapes defined specifically by color (Bushnell et al. 2011). Although separable tuning for object shape and color has been consistently observed in V4 neurons (Zeki 1980, 1990; Desimone and Schein 1987; Pasupathy and Connor 1999; Bushnell and Pasupathy 2012), our results suggest that this region also encodes these features as being bound to (and thus inseparable from) one another. Moreover, since surface color information was found in hV4 (Analysis 3), despite the absence of a strong representational bias (Analysis 1), our results indicate that this region might play a pivotal role in transforming early chromatic signals into more complex representations of object surface color. This would be consistent with a number of studies showing V4 activity reflects changes in perceived surface color as opposed to changes in the wavelengths of light hitting the retina (Zeki 1980, 1990; Zeki and Marini 1998; Conway et al. 2007; Bouvier et al. 2008). Moreover, since our data suggest strong surface color representations exist beyond hV4, namely in the LOC (Analysis 1, 3), our findings support mid-tier visual areas representing color as an inherent property of an object's form and identity (Grill-Spector et al. 1998; Zeki and Marini 1998). While VO color regions anterior to hV4 were not examined in this study, their functionally distinct preference for colored stimuli over luminance defined stimuli (Wade et al. 2002; Brewer et al. 2005; Liu and Wandell 2005; Jiang et al. 2007; Mullen et al. 2007) suggests that these regions might also exhibit biases for surface color representations. 
Our failure to observe object position information in hV4 is also consistent with the notion that more anterior regions of the ventral stream encode higher-level “whole” object representations that are tolerant to changes in position (Desimone and Schein 1987; Kobatake and Tanaka 1994; Grill-Spector et al. 1999; Carlson et al. 2011).
Although our main analysis (Analysis 1) demonstrates that early visual areas have a strong representational bias for signaling color contrast polarity, suited to the extraction of object form information, Analysis 3 revealed that area V2 also holds a representation of color associated with an object's surface. This finding fits with physiological evidence of edge-selective cells in V2 that respond preferentially to figures of a specific color, irrespective of the side of the edge on which the figure is presented (i.e., independent of the object's retinal position). Moreover, finding both color contrast and object surface color responses in V2 supports recent suggestions that the mechanisms underlying scene segmentation and object feature binding converge in V2 (Zhou et al. 2000; Bartels 2009; Shipp et al. 2009). It also supports the notion that distinct types of color-responsive neurons in early visual cortex are needed to link color and object form (Zhou et al. 2000; Johnson et al. 2008; Cavina-Pratesi et al. 2010; Gheiratmand and Mullen 2014). Future research testing whether selective attention (Qiu et al. 2007; Fang et al. 2009) and feedback modulations from higher visual areas such as hV4 (Roelfsema et al. 2007; Shipp et al. 2009) provide the necessary “bridge” between these mechanisms (Bartels 2009) will offer important insight into these models.
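The cross-decoding logic underlying this kind of conclusion can be illustrated with a minimal synthetic sketch (not the study's actual analysis pipeline): voxel patterns are simulated as a mixture of a position-invariant "surface color" component and a position-specific component, a simple nearest-centroid classifier (a stand-in for the classifiers typically used in MVPA) is trained to discriminate figure color at one retinal position, and it is then tested at the other position. Above-chance transfer implies a color representation that generalizes across object position. All variable names and signal structure here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_voxels, n_trials = 50, 40
color_axis = rng.standard_normal(n_voxels)     # position-invariant "surface color" pattern
position_axis = rng.standard_normal(n_voxels)  # position-specific pattern

def simulate(color, position):
    # voxel pattern = color signal + position signal + measurement noise
    return (color * color_axis + position * position_axis
            + 0.5 * rng.standard_normal(n_voxels))

# training set: figures at position A (+1), two colors (+1 vs -1)
train_X = np.array([simulate(c, +1) for c in (+1, -1) for _ in range(n_trials)])
train_y = np.repeat([+1, -1], n_trials)

# test set: the same two colors, but figures at position B (-1)
test_X = np.array([simulate(c, -1) for c in (+1, -1) for _ in range(n_trials)])
test_y = np.repeat([+1, -1], n_trials)

# nearest-centroid classification: assign each test pattern to the
# closest class mean estimated from the training data
centroids = {lab: train_X[train_y == lab].mean(axis=0) for lab in (+1, -1)}
pred = np.array([min(centroids, key=lambda lab: np.linalg.norm(x - centroids[lab]))
                 for x in test_X])
accuracy = (pred == test_y).mean()
print(f"cross-position color decoding accuracy: {accuracy:.2f}")
```

Because the simulated color component is shared across positions, the classifier transfers well above the 0.5 chance level; removing `color_axis` from `simulate` drives the transfer accuracy back to chance, which is the signature of a purely retinotopic (position-bound) code.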
To conclude, our data reveal representational biases across the ventral visual stream that shift from chromatic signals closely matching the retinal input to representations of an object's surface color. Our finding that activity in V2 signals chromatic information both with respect to the retina and with respect to the object supports suggestions that area V2 assists in figure-ground segmentation and the binding of color to object surfaces (Grossberg and Mingolla 1985; Nakayama and Shimojo 1992; Humphreys et al. 2000; Von der Heydt and Pierson 2006; Hung et al. 2007; Bartels 2009). Although we did not observe surface color representations in V1 (Komatsu et al. 2000; Sasaki and Watanabe 2004; Meng et al. 2005), our findings are consistent with the emerging view that early levels of visual processing not only signal changes in wavelength at the retina but also play an important role in transforming these signals to represent color as a surface property of an object (Zhou et al. 2000; Shapley and Hawken 2002; Wachtler et al. 2003; Hurlbert and Wolf 2004; Roe et al. 2005; Hung et al. 2007).
This work was supported by an Australian Research Council (ARC) Discovery Project grant awarded to A.N.R. (DP0984494) and an ARC Fellowship awarded to M.A.W. (DP0984919).
We thank Erin Goddard, Johannes Stelzer, and Ciaran Baranasooriya for their helpful comments on this manuscript. Conflict of Interest: None declared.