Zhen Li, Dorita H F Chang, Context-based modulations of 3D vision are expertise dependent, Cerebral Cortex, Volume 33, Issue 11, 1 June 2023, Pages 7136–7147, https://doi.org/10.1093/cercor/bhad026
Abstract
An object’s identity can influence depth-position judgments. The mechanisms underlying this phenomenon are largely unknown. Here, we asked whether context-dependent modulations of stereoscopic depth perception are expertise dependent. In 2 experiments, we tested whether training that attaches meaning (i.e. classification labels) to otherwise novel, stereoscopically presented objects changes observers’ sensitivity for judging their depth position. In Experiment 1, observers were randomly assigned to one of 3 groups: a Greeble-classification training group, an orientation-discrimination training group, or a no-training group, and were tested on their stereoscopic depth sensitivity before and after training. In Experiment 2, participants were tested before and after training while fMRI responses were concurrently imaged. Behaviorally, stereoscopic performance was significantly better following Greeble-classification (but not orientation-discrimination, or no-) training. Using the fMRI data, we trained support vector machines to predict whether the data were from the pre- or post-training sessions. Results indicated that classification accuracies in V4 were higher for the Greeble-classification group as compared with the orientation-discrimination group, for which accuracies were at chance level. Furthermore, classification accuracies in V4 were negatively correlated with response times for Greeble identification. We speculate that V4 is implicated in an expertise-dependent, object-tuning manner that allows it to better guide stereoscopic depth retrieval.
Introduction
The ability to judge the depth position of objects is critical for human survival. Many cues can be used for depth perception, including a variety of both monocular (motion parallax, shading) and binocular (disparity) cues (Snowden et al. 2012). Among them, binocular disparity is perhaps the most salient cue to depth, and reflects the slightly disparate images falling on the 2 eyes because of their horizontal separation. Neurophysiology and imaging work has shown that neurons selective to disparity are widely distributed across visual cortex, and can be found in early visual areas V1 and V2, intermediate dorsal areas V3A and KO, and ventral areas V4 and IT (Hubel and Wiesel 1962; Uka et al. 2000; Hinkle and Connor 2001; Thomas et al. 2002; Preston et al. 2009). Intermediate to higher visual areas, such as dorsal areas V3A and IPS and ventral area IT, appear to be involved in the representation of shape from disparity (Janssen et al. 1999; Georgieva et al. 2009).
Depth perception depends on readouts from these depth-sensitive areas but, curiously, can be further modulated by the type of object encountered (regardless of the irrelevance of this information!). “Contextual” influences on stereovision have been shown, both in work by our group and in work by others, as modulations in stereosensitivity as a function of task type, the stimulus’ geometric plausibility (i.e. plausible vs implausible), and the stimulus’ biological relevance (e.g. face vs random surface) (Wong et al. 2020; Chou et al. 2021). Here, we are interested in addressing contextual effects in stereovision in terms of modulations based on the stimulus’ biological relevance.
Recent work from our laboratory has indicated that stereo-performance on a depth segregation-from-noise task is better when viewing implausible vs plausible objects (Wong et al. 2020). Object-based modulations can occur in other contexts, too: for example, stereoscopic depth retrieval is worse when viewing an upright face vs an inverted face (Chou et al. 2021). This effect, however, is task-dependent. In a depth-in-noise (SNR) task that required judging the depth position (near/far) of a central target relative to a surrounding plane, with the coherence (SNR) of the target varied, depth sensitivity was worse when tested with an upright face than with an inverted face or random shape; but in a fine stereo-discrimination task that required subjects to discriminate which of 2 consecutively presented, fully coherent objects was nearer, sensitivity was better when tested with an upright face than with a random shape (Chou et al. 2021). fMRI results of these studies indicated that these context-based modulations of stereovision are reflected in key nodes along both dorsal (V3A) and ventral [fusiform face area (FFA), lateral occipital complex (LOC)] cortex. These findings suggest there to be a complex interaction between higher-order object areas and depth mechanisms.
Here, we ask about the nature of these curious object-stereo interactions. Particularly, we sought to understand whether the relevant mechanisms are in-built, or acquired in an expertise-dependent manner. To do so, we provided observers with training that attaches meaning (i.e. classification labels) to otherwise novel stereoscopic objects and asked whether this training changes observers’ sensitivity for judging the now-meaningful objects’ depth position. We started with a behavioral experiment (Experiment 1) in which a set of stereoscopic objects (“Greebles”; Gauthier and Tarr 1997) that were novel to participants was presented. Participants were trained to classify these objects (i.e. judge their names and genders) and their depth sensitivities (i.e. ability to judge whether the object is “near” or “far”) were indexed before and after training. We elected to index depth sensitivity here using an SNR-based task as previous work from our laboratory has shown this task to reveal salient differences in performance for unfamiliar vs familiar objects, e.g. non-face better than face (Chou et al. 2021) and geometrically implausible objects better than plausible objects (Wong et al. 2020). In a second experiment (Experiment 2), participants were tested before and after training while fMRI responses were concurrently imaged.
Materials and methods
Experiment 1 (behavior)
Participants
Forty-six observers (age: mean = 24.39, SD = 4.83; 8 males and 38 females) participated in this initial behavioral experiment. Participants were divided into 3 groups according to training condition (see General procedures). Twenty participants (age: mean = 23.85, SD = 4.56; 2 males and 18 females) were tested in group 1 (Greeble-classification training), and 13 participants were tested in each of the 2 control groups: group 2 (orientation training; age: mean = 24.38, SD = 5.24; 3 males and 10 females) and group 3 (no training; age: mean = 25.23, SD = 5.12; 3 males and 10 females). Two participants in group 3 were left-handed and all other participants were right-handed. All participants had normal or corrected-to-normal vision as assessed by a Snellen linear acuity chart and the Butterfly stereo acuity test. Participants provided written informed consent in line with ethical review approved by the Human Research Ethics Committee of The University of Hong Kong. The sample sizes were determined based upon statistical power assessments with effect sizes reported previously by our laboratory and in the relevant depth perception literature pertaining to behavioral, fMRI, and rTMS depth-related effects (Preston et al. 2008; Chang et al. 2014; Wong et al. 2020).
Stimuli and apparatus
Random dot stereograms (RDS) were generated from 3D models of 20 novel objects known as Greebles (Gauthier and Tarr 1997). Each Greeble consists of a main vertical stem part and surrounding limbs. These Greebles can be organized along 2 independent dimensions: gender and family. There are 2 genders (i.e. PLOK or GLIP) depending on the orientation of the limbs (upwards or downwards), and 5 families (denoted by families 1–5) depending on the shape of the stem part (Fig. 1A). Each Greeble belongs to one gender and one family. We selected 2 representative Greebles for each gender-family combination. The 3D models were originally designed by Scott Yu and made available by Michael J. Tarr, Carnegie Mellon University, http://www.tarrlab.org/. In order to ensure that the sensory characteristics of the Greebles were held constant, we further equated the models as follows: we adjusted the height of the stem part to be the same across models, generated gray-scale depth maps with a black background, and normalized the depth maps to keep the intensity distributions of all models constant. RDS stimuli were generated based on these depth maps (with intensity coding for disparity) and presented using custom software written in MATLAB (the MathWorks) using extensions from Psychtoolbox 3 (Brainard 1997).
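To make the depth-map-to-RDS step concrete, the following is a minimal Python sketch of how dot positions for the 2 half-images can be derived from a gray-scale depth map in which intensity codes disparity. The original stimuli were generated in MATLAB/Psychtoolbox; all function and parameter names here (e.g. `make_rds`, `dot_density`) are illustrative assumptions, and noise dots and rendering are omitted.

```python
# Sketch: derive left/right dot coordinates for an RDS from a depth map.
# Assumptions: depth_map values in [0, 1], intensity coding disparity.
import numpy as np

def make_rds(depth_map, dot_density=90, deg_size=13.5, max_disp_arcmin=28):
    """Return (x_left, x_right, y) dot coordinates in degrees."""
    n_dots = int(dot_density * deg_size ** 2)
    rng = np.random.default_rng()
    x = rng.uniform(0, deg_size, n_dots)   # dot positions (deg)
    y = rng.uniform(0, deg_size, n_dots)
    h, w = depth_map.shape
    # Look up each dot's disparity from the depth map (intensity -> deg)
    rows = np.clip((y / deg_size * h).astype(int), 0, h - 1)
    cols = np.clip((x / deg_size * w).astype(int), 0, w - 1)
    disp_deg = depth_map[rows, cols] * max_disp_arcmin / 60.0
    # Split the disparity symmetrically between the 2 half-images
    return x - disp_deg / 2, x + disp_deg / 2, y
```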

Sample stereoscopic “Greeble” stimuli and tasks. A) Greeble depth-maps were generated from 3D models. The depth maps were then used to derive RDS stimuli. Greebles with the same orientation of limbs belong to the same gender; Greebles with the same stem body part are from the same family (Gauthier and Tarr 1997). For example, a and b (also c and d) are Greebles of the same gender from the same family; a, b and c, d are Greebles of different genders from different families. B) SNR depth task. Participants were required to judge whether the Greeble was in front of or behind the reference plane. C) Orientation-discrimination task. Each Greeble was shown tilted to the left and right of vertical and participants judged the orientation of the Greeble.
Stimuli were presented on an ASUS monitor (24 inches) running at 120 Hz with resolution of 1,920 × 1,080 pixels. Participants wore a pair of NVIDIA 3D Vision 2 Wireless Glasses when viewing the stimuli at a distance of 58 cm. Each Greeble was surrounded by a reference square plane (13.5° × 13.5°). The Greebles and reference plane were presented on a mid-gray background. Dot density was 90 dots/deg2, and each dot had a size of 0.023°. Each RDS had a maximum disparity of 28 arcmin and each Greeble had an average disparity of 22 arcmin.
General procedures
The behavioral experiment comprised pretest, training, and posttest sessions conducted on different days. Each session lasted for about 1 h. Participants were randomly assigned to one of 3 groups. Participants in group 1 were trained on Greeble classification; participants in group 2 were trained instead on an orientation discrimination task using identical stimuli (Fig. 1C); participants in group 3 did not receive training. All participants were tested on the SNR depth task before and after 3–5 days of training (see details below).
Tasks
Signal-in-noise depth task
In the signal-to-noise ratio (SNR) depth task (Fig. 1B), the RDS was shown in 1 of 2 depth positions (i.e. nearer or farther than the reference plane), and participants judged the depth position of each Greeble (“far”/up-arrow key; “near”/down-arrow key). The ratio of the number of signal dots (i.e. dots that defined the surface of the Greeble) to the number of noise dots was varied on different trials. Disparity of the noise dots was drawn from a normal distribution centered on 0 arcmin with an SD of 14 arcmin. At 100% signal, all dots defined the Greeble and at 0% signal, only noise dots were present. Stimulus coherence was adjusted using a staircase procedure with a 1-up/2-down rule yielding thresholds at the 70.7% correct level (Treutwein 1995).
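A 1-up/2-down staircase of this kind decreases signal after 2 consecutive correct responses and increases it after each error, converging on the 70.7%-correct point. Below is a minimal Python sketch of the rule; the starting coherence and step size are assumptions for illustration, not the authors' values.

```python
# Sketch of a 1-up/2-down staircase controlling stimulus coherence.
def one_up_two_down(start=1.0, step=0.05, lo=0.0, hi=1.0):
    """Generator yielding the coherence for each trial.

    Prime with next(); then send(True/False) after each response.
    """
    coherence = start
    n_correct = 0
    while True:
        correct = yield coherence
        if correct:
            n_correct += 1
            if n_correct == 2:                 # 2 correct -> harder (less signal)
                coherence = max(lo, coherence - step)
                n_correct = 0
        else:                                  # 1 wrong -> easier (more signal)
            coherence = min(hi, coherence + step)
            n_correct = 0

# Usage:
#   stair = one_up_two_down()
#   level = next(stair)                  # coherence for trial 1
#   level = stair.send(response_correct) # coherence for the next trial
```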
Participants completed 10 runs in total, with each run comprising 70 trials. Trials of all 10 staircases (one for each family-gender combination) were interleaved, and each run consisted of 70 of these interleaved trials in randomized order. On each trial, we first presented a nonius fixation marker (0.4 deg horizontal and vertical nonius lines) at the center of the screen for 500 ms. A stereoscopic Greeble was then shown for 500 ms, followed by a blank screen. The duration of the blank screen was adjusted to preserve the length of each trial at 3 s. Participants were permitted to input their response from stimulus onset until the trial ended. Participants were allotted breaks every 140 trials. The entire task took about 50 min including breaks.
Greeble classification training
For participants assigned to learn Greeble classification, training consisted of 3–5 sessions (depending on performance attainment). Each session lasted about 1 h and sessions were completed on separate days. Participants were trained to associate Greebles with their names and genders. The individual segments of the training procedure generally followed the framework proposed by Gauthier et al. (1998):
(1) Overview. In this segment, all Greebles with their names and genders were shown to participants on 2 successive screens, with each screen showing 10 Greebles for 5 min. Greebles were only shown in the near depth position. Participants were asked to observe the Greebles freely and to try to memorize the name and gender of each Greeble if possible. This segment was only completed once, at the beginning of the first training session.
(2) Gender inspect. This segment focused on training associations with the Greebles’ gender. Greebles with their genders were shown on 2 successive screens, with each screen showing 10 Greebles for 2 min. Greebles were shown in the near depth position. Participants were instructed to memorize the genders of all Greebles.
(3) Gender classification. Greebles were shown one by one in the near depth position without gender information, and participants were asked to complete a gender discrimination task for each, pressing “p” for gender “PLOK” or “g” for gender “GLIP.” If the answer was incorrect, the correct answer was shown for 3 s. The purpose of this segment was to verify whether participants had correctly learned the gender of each Greeble, and to help the subject continue to review the associations.
(4) Individual inspect. Greebles and their names were shown one by one in the near depth position. Participants were required to memorize the name of each Greeble and no response was needed. Greebles were selected from one or more families according to training progression. In the initial training run, Greebles were randomly selected from a single family (e.g. family 2); as the training proceeded, Greebles from 2 families were selected and presented (e.g. families 2 and 3); eventually, Greebles across all families were selected and presented (i.e. families 1–5). Each Greeble was shown for 5 s in a trial, and the number of presentation repetitions was adjusted dynamically to preserve a run-length of 4 min.
(5) Naming with feedback. In each trial of this segment, a Greeble was shown for 5 s without its name, and participants were asked to indicate (type) the first letter of its name. For those participants who were not familiar with the layout of the keyboard, the experimenter helped them press the letter on the keyboard. If the wrong letter was pressed, the correct (full) name was presented for 5 s. This task was designed to assess the subject’s progress in learning Greeble names, as well as to help him/her continue to review the associations. Identical to the Individual inspect segment, Greebles for each run were selected from one or more families according to training progression. The number of presentation repetitions was adjusted dynamically to preserve a run-length of 4 min.
(6) Verification. On each trial, a Greeble, along with a name or gender, was shown for 5 s. The name/gender labels presented could be correct or incorrect for the Greeble presented. Participants were asked to indicate via keypress (y/n) whether the labels presented were correctly associated with the stimulus presented. Half of these trials presented correct labels. Each of the individual Greebles was shown 4 times (once with a correct name label, once with an incorrect name label, once with a correct gender label, and once with an incorrect gender label). All trials were interleaved and no feedback was given. This segment is critical, as it verifies whether Greeble learning has been well established. For the learning criterion to be met, the subject’s response accuracy for name labeling (i.e. the first letter of the name) had to reach the 80%-correct level, and gender accuracy the 90%-correct level.
Training continued until these criteria were met. Notably, this particular segment was also administered to the subject in the first (day 1) test session to ensure that participants had no prior exposure to the fictional Greeble family (that, of course, is now legendary in Psychology). Table 1 presents an illustrative procedure for a participant in group 1. The sequence of families was randomized for each participant.
Table 1. Illustrative training procedure for a participant in group 1.

| Session | Run | Task | Data |
| --- | --- | --- | --- |
| 1 | 1 | SNR task | All Greebles |
|   | 2 | Verification | All Greebles |
| 2 | 1 | Overview | All Greebles |
|   | 2 | Gender inspect | All Greebles |
|   | 3 | Gender categorization | All Greebles |
|   | 4 | Individual inspect | Family 1 |
|   | 5 | Naming with feedback | Family 1 |
|   | 6 | Individual inspect | Family 2 |
|   | 7 | Naming with feedback | Family 2 |
|   | 8 | Naming with feedback | Family 1&2 |
|   | 9 | Gender categorization | All Greebles |
|   | 10 | Naming with feedback | Family 1&2 |
| 3 | 1 | Gender categorization | All Greebles |
|   | 2 | Naming with feedback | Family 1&2 |
|   | 3 | Individual inspect | Family 3 |
|   | 4 | Naming with feedback | Family 3 |
|   | 5 | Individual inspect | Family 4 |
|   | 6 | Naming with feedback | Family 4 |
|   | 7 | Naming with feedback | Family 3&4 |
|   | 8 | Naming with feedback | Family 1–4 |
| 4 | 1 | Gender categorization | All Greebles |
|   | 2 | Individual inspect | Family 5 |
|   | 3 | Naming with feedback | Family 5 |
|   | 4 | Individual inspect | Family 5 |
|   | 5 | Naming with feedback | Family 5 |
|   | 6 | Individual inspect | All Greebles |
|   | 7 | Naming with feedback | All Greebles |
| 5 | 1 | Individual inspect | All Greebles |
|   | 2 | Naming with feedback | All Greebles |
|   | 3 | Verification | All Greebles |
| 6 | 1 | SNR task | All Greebles |
Orientation discrimination training
Similar to the classification-training group, participants in this group completed a 4-day training protocol presenting identical stimuli, but were instead tasked with completing an orientation-discrimination task. Each session lasted about 1 h and sessions were completed on separate days. In the orientation-discrimination task, Greebles (again, presented stereoscopically) were shown tilted to the left or right of vertical. Participants judged the orientation/tilt of each Greeble via keypress. Task difficulty was varied in terms of the angle of tilt from vertical (0–90 deg), and was adjusted using a staircase procedure with a 1-up/2-down rule. All other parameters for the orientation-discrimination task were identical to those described for the depth task.
Experiment 2 (fMRI)
Participants
Thirty-three new observers (age: mean = 22.70, SD = 3.02; 6 males and 27 females) participated in the fMRI experiment. Data of 3 participants were discarded because of excessive movement during scans. Participants were assigned to one of 2 training conditions: Greeble-classification training (group 1, n = 20; mean age = 23.15, SD = 3.18; 4 males and 16 females) or control, orientation-discrimination training (group 2, n = 10; mean age = 21.70, SD = 3.02; 2 males and 8 females). All participants were right-handed, and had normal or corrected-to-normal vision. Participants provided written informed consent in line with ethical review approved by the Human Research Ethics Committee of The University of Hong Kong.
Stimuli and apparatus
This experiment included both in-laboratory and in-scanner sessions. For the in-bore sessions, stimuli were presented using a VPixx PROPixx projector equipped with a circular polarizer. The stimuli were projected onto a translucent screen placed at the back of the bore. Participants wore corresponding polarized glasses and viewed the images through a tilted mirror mounted above the head coil. The screen resolution was set at 1,920 × 1,080 pixels and the refresh rate was set to 120 Hz. Each Greeble was surrounded by a reference square plane (11.6° × 11.6°). The Greebles and reference plane were presented on a mid-gray rectangular background. Dot density of the RDS was 90 dots/deg2, and each dot had a size of 0.023°. Each RDS had a maximum disparity of 24 arcmin and each Greeble had an average disparity of 18 arcmin. For the sessions in the laboratory, the same apparatus was used as in Experiment 1. Stimuli were adjusted in visual angle to match the angular size of the stimuli presented in-bore.
General procedures
Figure 2 shows the general flow of the fMRI experiment. Subjects completed pretest (lab), pre-scan (MRI), training (lab), posttest (lab), and post-scan (MRI) sessions. These sessions were completed on separate days. Each session lasted for about 1 h. Participants were randomly assigned to 2 groups: participants in group 1 were trained on Greeble classification, and participants in group 2 (control) were trained on an orientation-discrimination task (as described for Experiment 1).

Tasks
SNR depth task
Participants performed the SNR depth task in the pre and posttest laboratory sessions as well as in the pre- and post-fMRI sessions (i.e. pre- and post-scans). In the pre and posttest (lab) sessions, all parameters were identical to those described in Experiment 1 except that Greebles from 4 (rather than 5) families were used. For the fMRI sessions, further details are discussed below.
Greeble classification training and orientation discrimination training
The training parameters were identical to those described in Experiment 1 except that Greebles from 4 (rather than 5) families were used. Here, Greeble classification training consisted of 2–5 sessions (depending on performance attainment) and orientation discrimination training consisted of 3 sessions.
fMRI design
fMRI runs were arranged in a block design. There were 9 block types, comprising 8 stimulus types (i.e. 4 families × 2 genders) and a fixation block. Except for the first fixation block, which lasted for 26 s (additional volumes were acquired and then discarded to eliminate startup transients), each block lasted for 16 s. Stimulus blocks were interleaved with fixation blocks. Each stimulus block included 8 trials with an equal number of “far” and “near” trials. On each trial, a stimulus was shown for 0.5 s and was followed by a fixed-duration response period of 1.5 s. Participants judged the depth position of each stimulus (i.e. “near” or “far”). In order to control task difficulty across stimulus types and across participants, before each scanning session (i.e. pre-scan or post-scan session), participants completed a behavioral run in-bore to index thresholds for each stimulus type. For each staircase, SNR values were sampled from a uniform distribution spanning ±1 SD of the mean threshold estimate (+15), computed using the last 30 trials. These values were subsequently used to define stimuli for the fMRI runs.
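A hedged sketch of this difficulty-matching step follows: a per-stimulus threshold estimate is formed from the last 30 staircase trials and in-scanner SNR levels are then drawn uniformly within ±1 SD of that mean. The “+15” offset is reproduced exactly as reported; its units are not fully specified in the text, and all names below are illustrative, not the authors' code.

```python
# Sketch: fix in-scanner SNR levels from pre-scan staircase estimates.
import numpy as np

def sample_scan_snr(last30_levels, n_trials, offset=15.0, seed=None):
    rng = np.random.default_rng(seed)
    m = np.mean(last30_levels) + offset        # mean threshold estimate + offset
    sd = np.std(last30_levels, ddof=1)
    return rng.uniform(m - sd, m + sd, size=n_trials)
```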
Image acquisition
fMRI data acquisition was performed at the MRI unit at The University of Hong Kong using a 3-Tesla GE MRI scanner with a 48-channel multiphase array head coil. A pair of foam pads was used to limit head movement. A high-resolution T1-weighted anatomical image was acquired for each participant [echo time (TE) = 2.8 ms; repetition time (TR) = 7 ms; inversion time = 900 ms; flip angle = 8°; voxel size = 1 × 1 × 1 mm3; field of view = 256 × 256]. For functional runs, BOLD signals were measured using an echo planar imaging sequence (voxel size = 2 × 2 × 2 mm3; TE = 30 ms; TR = 2000 ms; field of view = 240 × 240; flip angle = 90°; number of slices = 58; multiband factor = 2; number of volumes = 141).
Identification of ROIs
We identified ROIs (V1–V3, V3A, V4, LOC, and FFA) for each participant. Retinotopically organized areas V1–V3, V3A, and V4 were identified using rotating wedge stimuli (Sereno et al. 1995). Ventral area LOC, which includes anatomically separated subregions in the lateral occipital cortex, posterior-to-mid parts of the fusiform gyrus, and the collateral sulcus (Grill-Spector et al. 2000), was defined as the set of voxels in lateral occipito-temporal cortex showing higher activation to intact object images compared with the corresponding scrambled images (Kourtzi and Kanwisher 2001). FFA was defined as an area in the fusiform gyrus demonstrating higher activation in response to face images compared with common object images (Kanwisher et al. 1997). For 7 participants whose FFAs could not be adequately identified using the single-run FFA localizer, we defined the FFA using a 5-mm spherical ROI centered at Talairach coordinates of [−37, −42, −16] in the left hemisphere and [39, −40, −16] in the right hemisphere (Grill-Spector et al. 2004).
Imaging data analysis
Preprocessing
MRI data were processed using BrainVoyager 22.0 (Version 22.0.2.4572, 64-bit; Brain Innovation, Maastricht, the Netherlands), along with custom scripts written in MATLAB (the MathWorks, Natick, MA, USA). For each participant, the T1-weighted anatomical image was transformed into Talairach space (Talairach and Tournoux 1988) and used for 3D cortex reconstruction and inflation. For functional runs, images were preprocessed using slice scan time correction, 3D motion correction (using the first volume of the first functional run as a reference), and high-pass temporal filtering (3 cycles). No spatial smoothing was performed. Functional runs were then co-registered with native anatomical images and transformed into Talairach space. The first 10 volumes (i.e. 20 s) of each functional run were discarded in order to eliminate start-up transients.
Univariate analysis
We performed whole-brain and ROI-based GLM analyses on fMRI data obtained in the pre- and post-scans. GLMs were defined using 11 predictors, comprising the 5 experimental conditions (1 fixation; 4 families) and 6 motion parameters (3 translation parameters, in millimeters; and 3 rotation parameters, pitch, roll, and yaw, in degrees). Square-wave (boxcar) predictors for each block were convolved with a gamma function modeling the idealized hemodynamic response. The BOLD signal time series of each voxel was modeled as a linear combination of all predictors, with coefficients (beta weights) minimizing the squared error. Beta weights were then used for contrasts of the different experimental conditions; we contrasted stimulus conditions with the fixation condition. For whole-brain GLM analyses, group-level responses were analyzed using GLM random-effects analyses, and t-values were then compared across pre- vs post-scan sessions. For ROI-based GLM analyses, individual-subject beta weights for each ROI were entered into a 2 (training group) × 2 (pre-scan, post-scan) × 7 (ROI) ANOVA to evaluate differences before and after training.
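For readers less familiar with block-design GLMs, the following Python sketch illustrates the logic: boxcar predictors convolved with a gamma HRF, fit per voxel by ordinary least squares. The analyses themselves were run in BrainVoyager/MATLAB; the HRF parameters and names here are illustrative assumptions.

```python
# Sketch: block-design GLM with gamma-convolved boxcar predictors.
import numpy as np
from scipy.stats import gamma

def gamma_hrf(tr, duration=30.0, shape=6.0, scale=1.0):
    # Idealized hemodynamic response as a gamma density (parameters assumed)
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, a=shape, scale=scale)
    return h / h.sum()

def block_predictor(onsets_s, block_len_s, n_vols, tr=2.0):
    # Boxcar over each 16-s block, convolved with the HRF
    box = np.zeros(n_vols)
    for onset in onsets_s:
        box[int(onset / tr):int((onset + block_len_s) / tr)] = 1.0
    return np.convolve(box, gamma_hrf(tr))[:n_vols]

def fit_glm(y, stim_predictors, motion_params):
    # y: (n_vols,) voxel time series; motion_params: (n_vols, 6)
    X = np.column_stack(stim_predictors + [motion_params, np.ones(len(y))])
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas   # beta weights, used for condition contrasts
```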
ROI-based MVPA
We further performed an ROI-based MVPA using customized scripts written in MATLAB. A linear support vector machine (SVM) implemented in libSVM (Chang and Lin 2011) was used as the classifier. fMRI time courses were shifted by 2 volumes (4 s) in order to account for the hemodynamic delay of the BOLD signal and then converted into z scores. For each ROI, voxels were sorted in descending order according to their t-values for the contrast of “all stimuli vs fixation.” The top 300 voxels from each ROI were used for classifier training and testing. This final voxel number was selected as it was the voxel count at which performance saturated. Leave-one-run-out cross-validation was used to assess the performance of the MVPA. Here, data were partitioned as whole runs; that is, data from one run were used for testing, and data from the other runs were used for training. This was repeated with different partitions of the data. The classification accuracy for each ROI was calculated as the average classification accuracy across all cross-validations. Mean classification accuracies were tested against a permutation baseline, obtained by training and testing 1,000 SVMs with shuffled data labels.
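The following is an illustrative re-implementation of this pipeline in Python/scikit-learn (the authors used libSVM in MATLAB). The data layout is an assumption for the sketch: `X` is samples × voxels, `y` labels each sample as pre-scan (0) or post-scan (1), and `runs` gives each sample's run for the leave-one-run-out partition.

```python
# Sketch: ROI-based MVPA with voxel selection, z-scoring, and
# leave-one-run-out cross-validation, plus a permutation baseline.
import numpy as np
from scipy.stats import zscore
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def roi_mvpa(X, y, runs, t_values, n_voxels=300):
    # Select the top voxels by t-value for "all stimuli vs fixation"
    top = np.argsort(t_values)[::-1][:n_voxels]
    Xz = zscore(X[:, top], axis=0)
    clf = SVC(kernel="linear")
    cv = LeaveOneGroupOut()            # leave-one-run-out partitions
    return cross_val_score(clf, Xz, y, groups=runs, cv=cv).mean()

def permutation_baseline(X, y, runs, t_values, n_perm=1000, seed=0):
    # Baseline accuracy from SVMs trained/tested with shuffled labels
    rng = np.random.default_rng(seed)
    accs = [roi_mvpa(X, rng.permutation(y), runs, t_values)
            for _ in range(n_perm)]
    return np.mean(accs)
```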
Finally, we explored the relationship between the behavioral data and fMRI MVPA accuracies in each ROI by means of Pearson’s correlations (r). Two behavioral indices were used. First, we computed an index reflecting threshold changes for the SNR task measured in the lab before and after training, computed as (threshpre − threshpost)/threshpre. Participants in the Greeble-classification training group and orientation-discrimination training (control) group were considered separately. Second, we computed an index using the average response time (RT) for name judgments in the Verification task. RTs on the verification tasks in the pre- and posttest sessions for the Greeble-classification training group, and in the pretest session for the orientation-discrimination training group (control), were considered.
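As a small worked example of these indices, the sketch below computes the threshold-change measure and a Pearson correlation. The variable names (`thresh_pre`, `thresh_post`, `mean_rt`, `svm_acc_v4`) are assumed per-participant vectors, not the authors' code.

```python
# Sketch: brain-behavior indices and Pearson correlations.
import numpy as np
from scipy.stats import pearsonr

def threshold_change(thresh_pre, thresh_post):
    # (pre - post) / pre: positive values mean lower (better) posttest thresholds
    pre = np.asarray(thresh_pre, dtype=float)
    return (pre - np.asarray(thresh_post, dtype=float)) / pre

# e.g. r, p = pearsonr(threshold_change(thresh_pre, thresh_post), svm_acc_v4)
#      r, p = pearsonr(mean_rt, svm_acc_v4)
```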
Results
Experiment 1 (behavior)
We first examined the effectiveness of training across groups. In group 1, all participants met the learning criteria for Greeble-classification training (mean number of training sessions: 4.25, SD = 0.64). They achieved a mean accuracy of 88.9% (SD = 5.8%) for name judgment, and a mean accuracy of 95.8% (SD = 3.9%) for gender judgment in the Verification segment. In group 2, thresholds for the orientation-discrimination task did not change significantly over sessions [t(12) = 0.852, P = 0.411]; RTs, however, were significantly shorter for the last training session compared with the first training session [t(12) = 3.941, P = 0.002].
Next, we examined performance on the SNR depth task before and after training (Fig. 3). A 2 (pretest or posttest) × 3 (Greeble-training, orientation-training, no-training group) ANOVA showed a significant main effect of test [F(1, 43) = 4.451, P = 0.041, η2 = 0.094], and a significant interaction between group and test [F(2, 43) = 7.296, P = 0.002, η2 = 0.253]. Follow-up paired t-tests showed that depth performance was significantly better (i.e. thresholds lower) after Greeble-classification training [t(19) = 3.874, P = 0.001 (Fig. 3A)]. In contrast, depth performance did not change following orientation training [t(12) = 0.866, P = 0.403 (Fig. 3B)] or no training [t(12) = −1.176, P = 0.262 (Fig. 3C)].

Performance on the SNR depth task in the pre and posttest sessions. A) Data from participants in group 1 (Greeble-classification training); B) data from participants in group 2 (orientation-training); C) data for participants in group 3 (no-training). Error bars represent ±1 standard error of the mean.
Next, we further examined whether changes in depth performance in the pre vs posttests depended on the particular Greeble family presented. Thresholds (Fig. 4) were entered into a 5 (family) × 2 (test) × 3 (group) mixed ANOVA, which indicated a main effect of test [F(1, 43) = 4.451, P = 0.041, η2 = 0.094], a significant interaction between test and group [F(2, 43) = 7.296, P = 0.002, η2 = 0.253], but no effect of family nor any interactions involving family.

Performance on the depth SNR task in the pre and posttests, presented across families. A) Data for participants in group 1 (Greeble training); B) data for participants in group 2 (orientation-training); C) data for participants in group 3 (no-training). Error bars represent ±1 standard error of the mean.
Experiment 2
Behavioral (SNR task)
The behavioral results from Experiment 1 were replicated in Experiment 2. In terms of training effectiveness: in group 1, all participants met the learning criteria for Greeble classification training (mean number of training sessions: 3.37, SD = 0.60). They achieved a mean accuracy of 92.8% (SD = 6.0%) for name judgment, and a mean accuracy of 95.5% (SD = 3.6%) for gender judgment in the Verification segment. For the orientation-training group (group 2), orientation thresholds did not change with training [t(9) = −0.551, P = 0.595]; again, however, RTs were significantly shorter for the last training session compared with the first training session [t(9) = 3.424, P = 0.008].
For the SNR depth thresholds in the pre and posttest (lab) sessions, thresholds were entered into a 2 (pre vs posttest) × 2 (group) mixed ANOVA. Results showed a main effect of test [F(1, 28) = 13.725, P = 0.001, η2 = 0.329], and a significant interaction between group and test [F(1, 28) = 8.856, P = 0.006, η2 = 0.241]. Follow-up paired t-tests indicated that thresholds were significantly lower after Greeble-classification training [t(19) = 6.634, P < 0.001]. In contrast, depth thresholds did not change for the orientation-discrimination training group [t(9) = 0.364, P = 0.724]. We again repeated the analysis breaking the data down by family, entering the data into a 4 (family) × 2 (test) × 2 (group) mixed ANOVA, which indicated a main effect of test [F(1, 28) = 13.725, P = 0.001, η2 = 0.329], a significant interaction between test and group [F(1, 28) = 8.856, P = 0.006, η2 = 0.240], but no effect of family nor any interactions involving family.
Functional magnetic resonance imaging (fMRI)
Univariate analysis
For the whole-brain GLM analysis, we compared activation in the pre- vs post-scan sessions and found no voxels/clusters that were significantly more or less activated after Greeble-classification training than before, with corrections made for multiple comparisons holding false discovery rate q < 0.05.
For the ROI-based GLM analysis, a 2 (group) × 2 (test) × 7 (ROI) ANOVA on extracted beta values indicated a main effect of ROI [F(2.845, 79.649) = 25.93, P < 0.001, η2 = 0.481 (with Greenhouse–Geisser corrections for violations of sphericity)]. The effect of group was not significant [F(1, 28) = 1.006, P = 0.324, η2 = 0.035], and the interaction between ROI and group was not significant [F(2.845, 79.649) = 1.56, P = 0.208, η2 = 0.053]. The effect of test was not significant [F(1, 28) = 0.943, P = 0.34, η2 = 0.033], and the interaction between test and ROI was not significant [F(3.052, 85.462) = 0.518, P = 0.674, η2 = 0.018]. That is, beta values were not significantly different before and after training.
ROI-based MVPA
Next, we examined patterned responses in the 7 ROIs, comparing responses pre vs post. That is, we trained SVMs to predict whether data were obtained in the pre-scan or post-scan. To preserve a balance between the pre- and post-scan data, the same number of runs in the pre- and post-scans were used (e.g. if there were 7 runs in the pre-scan and 8 runs in the post-scan, all 7 runs in the pre-scan session and the first 7 runs in the post-scan session were used). Classification accuracies of all ROIs are presented in Fig. 5. Classification accuracies were tested against a baseline of 52.1% for the Greeble-classification training group or a baseline of 52.9% for the orientation-discrimination training group, with corrections made for multiple comparisons holding false discovery rate q < 0.05. Baselines were obtained via permutation tests (see section ROI-based MVPA).

MVPA classification accuracies for pre- vs post-SVM classifications. The horizontal bars indicate the shuffled baselines. Error bars depict the standard error of the mean.
For the Greeble-classification training group, accuracies were above baseline in early visual areas (V1–V3), dorsal intermediate area V3A, and ventral area V4. For the orientation-discrimination group, SVM accuracies were above baseline in early visual areas (V2, V3) and dorsal intermediate area V3A. SVM accuracies were entered into a 2 (group) × 7 (ROI) ANOVA, which indicated a main effect of ROI [F(6, 168) = 18.468, P < 0.001, η2 = 0.397], a main effect of group [F(1, 28) = 4.271, P = 0.048, η2 = 0.132], and a significant group × ROI interaction [F(3.972, 111.227) = 2.647, P = 0.027, η2 = 0.086]. Follow-up post-hoc t-tests showed that in area V4, classification accuracy was significantly higher in the Greeble-classification group than in the orientation-discrimination group [t(22.77) = 3.007, Bonferroni-adjusted P = 0.042].
Brain-behavior correlations
Finally, we tested 2 sets of correlations: SVM accuracies vs behavioral depth threshold changes [(threshpre − threshpost)/threshpre], and SVM accuracies vs RT. For these analyses, we only considered ROIs in which SVM accuracies were higher than permutation baselines (i.e. V1–V3, V3A, and V4). No significant correlation was found between SVM accuracies and depth threshold changes (pre vs post) in either group. For the correlations between RT and MVPA accuracies: (i) for the posttest session in the Greeble-classification group, RT correlated negatively with MVPA accuracies in area V4 [r(18) = −0.569, Bonferroni-corrected P = 0.045] (Fig. 6); (ii) for the pretest sessions in both groups (i.e. Greeble-classification group, orientation-discrimination group), there was no significant correlation between RT and MVPA accuracies.

Correlations between RT and SVM classification accuracy. P-values shown in this figure are uncorrected. An asterisk denotes a correlation that is significant at the 0.05 level after Bonferroni correction.
Discussion
We investigated whether expertise plays a role in object-based modulations of stereoscopic depth retrieval. Behavioral sensitivity and neural responses to disparity-defined depth were tested before and after training. In Experiment 1, we showed that thresholds were lower after object-level Greeble-classification training, whereas thresholds did not change after orientation-discrimination training or no training. In Experiment 2, we measured fMRI responses while participants performed the SNR depth task in the scanner before and after training. Univariate analysis showed that responses in each ROI were comparable before and after training for participants in both training groups. MVPA results were more interesting, however, showing that for the Greeble-classification group, response patterns were different before and after training in areas V1–V3, V3A, and V4; for the orientation-discrimination group, response patterns were different in areas V2, V3, and V3A. Response patterns in V4, during judgments of the depth of otherwise nonsensical Greeble figures, were distinguishable following Greeble-classification training that attaches labels/meaning to these stimuli, but not following orientation-discrimination training using the same stimuli. We discuss our behavioral and imaging findings in turn.
The effects of object-level training on depth sensitivity
Our data indicate that depth sensitivity can be altered simply by attaching meaning to novel objects. How might this come about?
In high-noise stimuli, perceptual performance is bottlenecked by 2 requirements: (i) segregation of disparity target signals from noise, and (ii) intrinsic limitations associated with feature read-outs. In our experiment, we trained subjects to distinguish Greebles under zero noise, in between 2 tests involving high noise. Doing so may have allowed for target/feature enhancement, which then facilitated subsequent requirements for segmenting targets from noise (when the stimuli were placed back into a high-noise setting). This logic would be consistent with the previous work of Dosher and Lu (2005), in which perceptual learning in an orientation discrimination task under low-noise conditions transferred to the same task using high-noise displays.
An alternative explanation is that a specialized circuit, independent from that used during the perception of novel objects, was involved after Greeble training. Experts in a specific domain tend to recognize objects in that domain at an exemplar-specific level, whereas individuals with less experience tend to recognize objects at a categorical level (Rosch et al. 1976). A special class of stimuli for which such expertise-dependent effects are found are faces. It is well established that face recognition is achieved by specialized modules, which are different from those used for object recognition (Farah 1992). Might such face-specific, or even “Greeble”-specific, modules be implicated post-training? As for the control groups, depth discrimination performance did not change after orientation-discrimination training (or no training). This is important, as it indicates that exposure to Greebles alone cannot improve depth retrieval.
It is worth pausing to consider, in greater depth, the effectiveness of our orientation-discrimination training protocol. Notably, orientation thresholds for the orientation-discrimination task did not change significantly over sessions. Several factors may have precluded improvements in our experiments: (i) the average orientation thresholds were already low in the first training session (Experiment 1: mean = 6.0°, SD = 5.6°; Experiment 2: mean = 4.8°, SD = 3.4°); (ii) perceptual learning involving orientation discrimination is highly position and orientation dependent (Shiu and Pashler 1992; Schoups et al. 1995) and the shapes of Greebles are more complex than the lines or Gabor gratings commonly used in previous orientation-discrimination work (Zhang et al. 2010); (iii) Greebles here were defined solely by binocular disparity, so it is unclear whether this imposes a limit on the perceptual improvements that can be attained. Nevertheless, we observed significant improvements in RTs: RTs were significantly shorter for the last training session compared with the first training session. As such, we deem our orientation-discrimination training to still be effective. Yet, unlike for the Greeble-training group, depth thresholds did not improve following training.
Still, it is possible that one training task may have been more cognitively demanding in nature. As a coarse attempt to equate the difficulty of training attainment across the 2 tasks, we defined learning criteria at the 80%-correct level for Greeble name labeling (along with a 90%-correct-level discrimination requirement for gender identification). We also thresholded orientation stimuli to the 71% correct level using a staircase. Nevertheless, the 2 training tasks are very different in nature (even if we go ahead and assume that difficulty per se is somewhat equivalent). Therefore, it is possible that any changes post-training may result, in part, from the added “cognitive” engagement that the Greeble-classification group had to undergo, although we presume such effects should have come up in prefrontal/fronto-parietal regions, rather than be visual-cortex-specific as they are here. Alternatively, it is likely that completing a Greeble-classification task required subjects to engage with finer details of stimulus features, perhaps amounting to some degree of “fine” stereo training. Such attention to detail would not be necessary for global orientation discrimination. Whether inadvertent fine, feature-level training occurred here, leading to the observed coarse depth improvements, would benefit from future empirical investigation.
fMRI responses before and after training
In terms of the fMRI data, our univariate analysis showed that responses were generally comparable before and after training. Results of the pattern analysis (MVPA) were more revealing: classification accuracy in V4 was significantly higher after Greeble-classification (but not orientation-discrimination) training. V4 is commonly implicated in color (Zeki 1973; Conway et al. 2007), shape (Pasupathy and Connor 2001), and texture (Arcizet et al. 2008) perception.
Expertise-dependent changes in responses of V4 have been demonstrated in earlier animal work. In a study by Yang and Maunsell (2004), macaque monkeys received orientation discrimination training, and neurons in V4 showed stronger responses and narrower orientation tuning curves after training. In a more recent study by Sanayei et al. (2018), macaque monkeys learned a fine contrast categorization task. The ability for neurons in V4 to code for the stimuli increased, and neuronal activity became more predictive of upcoming behavioral choices with training.
The fact that V4 is implicated in stereopsis is of course not new (Uka et al. 2005; Umeda et al. 2007). What is new and surprising here is that V4 can exhibit expertise-dependent plasticity when it comes to retrieving stereoscopic data pertaining to the object. We reason this can be explained by a “matched filter” mechanism, which allows the evolution of filters to match the expected signal during high-noise viewing (Wehner 1987; Warrant 2016). We suspect that the tuning properties of neurons in V4 can be changed to represent known/remembered stimulus features in visual tasks. This idea is generally well-supported by animal work (David et al. 2008; Hayden and Gallant 2013). The mechanism gains efficiency by shifting neuronal tuning toward representing relevant features and away from representing irrelevant noise. This benefits the selection of task-relevant features and helps ignore task-irrelevant features, thereby improving SNR task performance.
Considered together, our findings suggest that V4 reflects modulations of stereoscopic sensitivity based on expertise-dependent object knowledge. Still, we pause once more to consider whether the high SVM accuracies observed in V4 post-training could have resulted from differences in task difficulty between the pre- vs post-scan sessions. We think this is unlikely, as task difficulty in the scanner was held constant across the 2 tests: recall that the SNR of the stimuli was sampled at the 70.7% correct level at the start of each scan. We can infer this control to be effective by verifying behavioral task accuracies in the scanner: a 2 (test) × 2 (group) ANOVA indicated that accuracies did not differ significantly across sessions [F(1, 28) = 1.005, P = 0.325, η2 = 0.035], nor across groups [F(1, 28) = 0.498, P = 0.486, η2 = 0.017], and there was no interaction between session and group [F(1, 28) = 0.113, P = 0.739, η2 = 0.004]. Alternatively, is it possible that the V4 classifier decoded SNR differences between the pre- and post-scan sessions? We can also rule out this possibility, as we found no significant correlations between behavioral threshold changes and neural (SVM) classification accuracies.
Training effects observed in other visual areas
In early visual areas, classification accuracies tended to be higher than chance level, but were not different between the 2 groups. Training effects in early visual areas are controversial (Sasaki et al. 2009). A single-cell study found no perceptual-learning-related changes in V1 and V2 (Ghose et al. 2002), whereas Schoups et al. (2001) reported modest training effects in V1. For other areas, Shibata et al. (2012) reported changes in the decoded tuning function of V3A after motion perceptual learning. We speculate that the discriminative responses in these areas, as they appear in both training groups, result from repeated exposure to the Greebles, and not from expertise/context-dependent changes per se.
For FFA, classification accuracies were around chance level in both groups (and not different between groups). While FFA is regarded as a key node for face processing (Kanwisher et al. 1997), it has been suggested to serve not only face processing but to act as a more generalized visual expertise-based area: Gauthier et al. (2000) showed that the right FFA is more activated in car and bird experts when shown images from their domain of expertise compared with non-expertise images; activation in right-hemisphere face areas also increases as participants gain expertise with novel objects (Gauthier et al. 1999). At first glance, the lack of FFA changes in our work may appear to contradict these previous studies. However, we note that (i) the unique role of FFA is still under debate (Grill-Spector et al. 2004), (ii) our training may not have been adequate for MVPA to reveal subtle pattern differences before and after training, and (iii) as we functionally localized FFA with faces, thereby picking up human-face-specific voxels, it is possible that the portion of the fusiform implicated in acquiring “Greeble expertise” was missed in the analysis.
We note here our choice to design the study around pre- vs post-training comparisons only. We elected to do so, rather than build in within-session (untrained) stimuli to serve as comparisons, in order to increase statistical power. It was important for us to include sufficient representation across Greeble families in order to map neural responses to Greeble expertise-dependent changes, rather than learned representations of any particular Greeble set. However, this raises a concern that comparisons across sessions, as done here, may artificially inflate SVM accuracies because of between-scan differences in noise, for example. We can exclude this possibility here for several reasons: (i) the data fed to the SVM were normalized, with baseline (univariate) responses subtracted; this process would suppress the effect of different noise levels; (ii) perhaps more importantly, even if noise levels differed between scans, the effect should be the same for both groups and across ROIs. There is no reason for scanner-related noise to produce group- or ROI-specific effects as we observed here.

Nevertheless, in an attempt to achieve some level of a within-session comparison, we trained additional SVMs to predict the exact type of each Greeble presented within each of the pre and posttests, corresponding to an 8-class classification (i.e. 4 families × 2 genders). This analysis provides insight into the uniqueness of the patterned representations across the individuated Greebles (or at least their families and genders). Classification accuracies were tested against a baseline of 14.2% for the Greeble-classification training group or a baseline of 14.8% for the orientation-training group; baselines were obtained via permutation tests. Results are shown in Supplementary Fig. S1. No ROI showed accuracies above baseline (holding false discovery rate q < 0.05) across both the pre and posttests for both groups. SVM accuracies were entered into a 2 (group) × 2 (test) × 7 (ROI) ANOVA, which indicated no main effect of group [F(1, 28) = 1.502, P = 0.231, η2 = 0.051], test [F(1, 28) = 1.125, P = 0.298, η2 = 0.039], or ROI [F(4.275, 119.707) = 0.595, P = 0.678, η2 = 0.021], and no interactions involving test. While the grand ANOVA was not significant, we can glean interesting trends from this analysis that are consistent with results presented in our main analyses: SVM accuracies were somewhat higher in V4 [t(19) = 2.269, P = 0.035, uncorrected] and LOC [t(19) = 2.391, P = 0.027, uncorrected] for the Greeble-classification training group post-training as compared with pre-training. This was not evident in the orientation-training group. From this analysis, together with V4’s known role in complex shape representations and its proximity to inferotemporal regions known to represent expertise-level object classes, we speculate (albeit with a great deal of caution) that V4 becomes more attuned to the individual Greebles through training, which allows it to better guide disparity (positional) readouts in our task.
In sum, our findings suggest a key role for V4 in governing context-based modulations of stereosensitivity. These contextual influences are object-level expertise dependent, and do not merely reflect repeated exposure to the stimulus.
Acknowledgments
Original 3D models of the Greeble stimuli courtesy of Michael J. Tarr, Center for the Neural Basis of Cognition, Carnegie Mellon University, http://www.tarrlab.org/.
Author contributions
Zhen Li (Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Writing—original draft, Writing—review & editing) and Dorita H. F. Chang (Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Writing—review & editing)
Data availability
The raw data supporting the conclusions of this article will be made available upon reasonable request to the corresponding author.
Funding
Early Career Scheme Grant (Research Grants Council, Hong Kong; Project Number: 27612119) to D.C.
Conflicts of interest statement: The authors declare no relevant financial or non-financial competing interests.