Abstract

The “sensory recruitment hypothesis” posits that sensory cortices play an essential role in working memory, beyond the well-accepted frontoparietal areas. However, this hypothesis has recently been challenged. In the present study, participants performed a delayed orientation recall task while high-spatial-resolution 3 T functional magnetic resonance imaging (fMRI) signals were measured in posterior cortices. A multivariate inverted encoding model approach was used to decode remembered orientations from blood oxygen level-dependent fMRI signals in visual cortices during the delay period. We found that not only did activity in the contralateral primary visual cortex (V1) retain high-fidelity representations of the visual stimuli, but activity in the ipsilateral V1 also contained such orientation tuning. Moreover, although the encoded tuning faded in the contralateral V1 during the late delay period, tuning information in the ipsilateral V1 remained sustained. Furthermore, the ipsilateral representation was also present in the secondary visual cortex (V2), but not in higher-level visual areas. These results thus support the sensory recruitment hypothesis and extend it to ipsilateral sensory areas, indicating the distributed involvement of visual areas in visual working memory.

Introduction

Working memory (WM) is a critical cognitive function that maintains and manipulates sensory information over short timescales to serve goal-directed behavior. Persistent neuronal activity during the WM delay period was first discovered in the prefrontal cortex (Fuster and Alexander 1971; Funahashi and Bruce 1989) and later in the sensory cortex (Zhou and Fuster 1996). Moreover, the sensory and prefrontal cortices have been proposed to mediate different aspects of WM (Ku et al. 2015a). However, whether information maintained in sensory cortices during the WM delay period is critical for WM performance has recently been debated. Some researchers have argued that representations maintained in the primary visual cortex (V1) are easily disrupted by distractors and are not essential for visual WM (VWM) (Bettencourt and Xu 2016; Xu 2017, 2018), whereas others have supported the “sensory recruitment hypothesis” on the grounds that neural encoding in V1 is associated with VWM performance (Lorenc et al. 2018; Scimeca et al. 2018; Rademaker et al. 2019; Jia et al. 2021; Yu and Postle 2021).

Functional magnetic resonance imaging (fMRI) studies in humans have indicated associations between brain activity in primary sensory cortices and VWM performance (Ester et al. 2013; Sprague et al. 2014; Rademaker et al. 2019). Meanwhile, animal studies have shown that cognitive training can improve the spectral and spatial selectivity of the receptive field (RF) in sensory cortices, which subsequently benefited WM and sustained attention, indicating a relationship between low-level sensory neural activity and the high-level cognitive functions of attention and WM (Mishra et al. 2014). Furthermore, transcranial magnetic stimulation (TMS) studies have demonstrated that stimulating the primary somatosensory cortex (S1) during the WM delay period can affect the accuracy of either tactile unimodal or tactile–visual cross-modal WM performance (Ku et al. 2015b, 2015c; Zhao and Ku 2018; Zhao et al. 2018), implying a causal role of S1 in tactile WM. Interestingly, not only the contralateral S1 but also the ipsilateral S1 appeared to participate in WM processes (Zhao et al. 2018). However, it remains unclear how representations in the ipsilateral sensory cortex are expressed, and during which time periods WM information is encoded in the ipsilateral sensory cortex.

In the current study, we assessed changes in representations over time in both contralateral and ipsilateral sensory cortices during a visual delayed recall task with orientation-modulated stimuli. We collected high-resolution (2 × 2 × 2 mm) fMRI measurements in posterior brain areas. Retinotopic mapping (Dumoulin and Wandell 2008) was first conducted to define each visual area in the visual hierarchy, and an additional functional localizer was used to confirm the selection of V1 voxels that responded to the visual stimuli presented at the different screen locations used in the WM task. We then used an inverted encoding model (IEM) approach (Sprague et al. 2018) to reconstruct orientation tuning curves from activity observed in different cortical regions. Our results indicate that both the contralateral V1 and the ipsilateral V1 were involved in maintaining VWM content with similar representational tuning, but that the tuning in the ipsilateral V1 was sustained longer than that in the contralateral V1.

Materials and Methods

Participants

Eight participants were recruited for the experiment. Two participants were excluded because their head movement exceeded 6 mm in all sessions. The remaining six participants (three females) were 22.46 ± 1.86 years old (mean ± standard deviation; range 20–25 years). A total of 42 h of MRI data were collected from these six participants. All participants were right-handed and had normal or corrected-to-normal vision. No participant reported a personal or family history of mental disorder. This study was approved by the ethics committee of East China Normal University, and all participants signed an informed consent form before the experiment.

Stimuli and Procedure

Experimental stimuli were generated with the Psychophysics Toolbox in MATLAB. Stimuli were presented on an LCD display (resolution 1024 × 768) and projected onto a fabric screen in the MRI scanner. Participants viewed the stimuli via a mirror mounted on the head coil.

Population Receptive Field Experiment

We used a population receptive field (pRF) experiment similar to that used in the HCP 7 T Retinotopy Dataset (Benson et al. 2018) (Fig. 1a). Stimuli in this task were colorful object textures windowed through slowly moving apertures. The apertures were clockwise and counterclockwise rotating wedges and expanding or contracting rings. The stimuli spanned a maximum diameter of 25° of visual angle. A 1.5° × 1.5° fixation dot was shown in the middle of the screen throughout the experiment. The fixation dot changed color every 1–5 s, and participants were instructed to press a button whenever it changed color. Each pRF run lasted 300 s, and participants completed four runs of this task.

Experimental procedures and individual participant results. (a) Schematic of the population receptive field (pRF) experiment (top) and the mapping task (bottom). The polar angle estimates from the pRF experiment were used to define ROIs of early visual areas. Voxels exhibiting maximum BOLD responses to the four potential target locations probed in the mapping task were extracted from each ROI. Note that only voxels extracted from V1 are shown here. (b) Schematic of the working memory task. Participants were required to remember the orientation of one of the sample gratings and then adjust the probed item’s orientation to match the remembered orientation. There were four possible sample configurations (bottom). (c) Schematic of the inverted encoding model approach. The model assumes that the BOLD activity measured by fMRI in each vertex on each trial is approximately a weighted sum of underlying neural populations that can be modeled using a set of basis functions (C1). The weights (W) can be estimated from the assumed C1 and a training set of observed fMRI responses (B1); then, using a separate testing set, the memory representation on a tested trial can be decoded. (d) Polar angle results of the population receptive field estimation and ROI definition for individual participants. The colors on the colormap represent the phase of the fMRI response and therefore the preferred regions of the visual field. Black dashed lines indicate the borders of visual areas. Note that some brain areas are not visible in the figure because of the viewing angle. (e) Histograms of working memory recall errors (i.e., reported orientation minus original orientation) for each participant.
Figure 1

Mapping Task

The purpose of the mapping task was to precisely localize the voxels that respond to the locations of the gratings used in the WM task. The mapping task simply presented gratings at different locations and orientations one at a time (Fig. 1a). As shown in Figure 1a, each grating appeared at a fixed location (10° eccentricity) in one of the four quadrants and stayed on the screen for 4 s. The radius of each grating was 1° of visual angle, and gratings flickered at 3 Hz. Grating orientations were selected from 0°, 20°, 40°, 60°, 80°, 100°, 120°, 140°, and 160° (the same set as in the WM task). The interstimulus interval was 2 s. Each run consisted of 4 (quadrants) × 9 (orientations) = 36 stimulus trials and four blank trials, so every combination of location (quadrant) and orientation appeared once in each 272 s run.
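
The run structure described above can be sketched as a trial list in which each quadrant × orientation combination appears exactly once alongside four blank trials. This is an illustrative sketch only: the text does not say how trial order was randomized, so the uniform shuffle, the quadrant numbering (1 = top right, 2 = top left, 3 = bottom left, 4 = bottom right), and the function name `mapping_run` are assumptions.

```python
import itertools
import random

ORIENTATIONS = [0, 20, 40, 60, 80, 100, 120, 140, 160]  # degrees, shared with the WM task
QUADRANTS = [1, 2, 3, 4]  # assumed numbering: 1 top right, 2 top left, 3 bottom left, 4 bottom right

def mapping_run(seed=None, n_blank=4):
    """Return one shuffled run: 36 (quadrant, orientation) trials plus n_blank blank trials (None)."""
    trials = list(itertools.product(QUADRANTS, ORIENTATIONS))  # 4 x 9 = 36 stimulus trials
    trials += [None] * n_blank  # blank trials: no grating shown
    random.Random(seed).shuffle(trials)  # hypothetical randomization scheme
    return trials
```

Each non-blank entry would then be shown for 4 s, followed by the 2 s interstimulus interval.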

WM Task

We used an orientation-based continuous recall task as the WM task (Fig. 1b). Each trial began with two gratings shown on the screen simultaneously (sample array), flickering at 3 Hz. The physical features and possible locations of the gratings were identical to those used in the mapping task. The spatial configuration of the two gratings in the sample array is illustrated in Figure 1b. The two gratings always appeared in two adjacent quadrants and never diagonally from one another, which kept the center-to-center distance between the two gratings constant throughout the experiment. On each trial, the orientation of each grating was set randomly to 0°, 20°, 40°, 60°, 80°, 100°, 120°, 140°, or 160° plus a small offset drawn from a uniform distribution over [−5°, 5°]. This offset was introduced to reduce practice effects for specific orientations. The two base orientations on each trial were always different; because distinct base orientations differ by at least 20° and each offset is at most 5°, the two orientations on the same screen always differed by at least 10°. After the 1 s sample array, the delay period began, and a cue appeared on the screen indicating which grating’s orientation participants should remember. The cue stayed on the screen throughout the delay period. Finally, a probe array appeared, in which participants adjusted the probed item, presented at a specific location, to the remembered orientation. Adjustments were made with an MRI-compatible computer mouse on a mouse pad placed on the right side of the scanner bed. The initial orientation of the probed item always pointed toward the fixation cross. The probe array lasted 4 s. All participants responded on all trials. Intertrial intervals were pseudo-randomly chosen from 2, 4, or 6 s. Each run contained 16 trials. Combinations of orientation pairs (36 pairs) and quadrant pairs (four pairs, as shown in Fig. 1b) were fully counterbalanced across both runs and participants.
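
A minimal sketch of how one sample array could be generated under the constraints above: two of the nine base orientations, each jittered by a uniform offset in [−5°, 5°], shown in one of four adjacent (never diagonal) quadrant pairs. The quadrant numbering, the helper names, and the use of NumPy's random generator are assumptions for illustration, not the authors' Psychophysics Toolbox code.

```python
import itertools
import numpy as np

BASE = np.arange(0, 180, 20)  # nine base orientations (deg)
ORIENTATION_PAIRS = list(itertools.combinations(BASE.tolist(), 2))  # 36 unordered pairs
# Assumed quadrant numbering: 1 top right, 2 top left, 3 bottom left, 4 bottom right.
ADJACENT_QUADRANT_PAIRS = [(1, 2), (2, 3), (3, 4), (4, 1)]  # diagonal pairs never occur

def sample_array(rng, orientation_pair, quadrant_pair, jitter=5.0):
    """Jitter each base orientation by U(-jitter, +jitter) and wrap into [0, 180)."""
    offsets = rng.uniform(-jitter, jitter, size=2)
    ori = (np.asarray(orientation_pair, float) + offsets) % 180.0
    return {"quadrants": quadrant_pair, "orientations": ori}

def orientation_distance(a, b):
    """Smallest angular difference between two orientations (period 180 deg)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)
```

Because distinct base orientations are at least 20° apart and each offset is at most 5°, `orientation_distance` between the two jittered gratings never drops below 10°.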

General Procedure

Participants were asked to complete one localization session (the pRF experiment) and between two and four sessions of the main task (the mapping and WM tasks). Only one session was conducted per day, and consecutive sessions were at least 2 days apart. The localization session contained one anatomical run (MPRAGE sequence) and four pRF runs (echo planar imaging (EPI) sequence). Each main-task session comprised 10 runs of the mapping task (272 s each) and eight runs of the WM task (356 s each) and lasted about 1.5 h.

MRI Data Acquisition

MRI data were collected using a Siemens 3 T Trio scanner with a 32-channel RF coil at East China Normal University. Whole-brain T1-weighted images were acquired with an MPRAGE sequence: repetition time (TR) 2530 ms, echo time (TE) 2.46 ms, flip angle 7°, matrix size 256 × 256, field of view 210 mm, voxel size 0.8 × 0.8 × 0.8 mm.

Functional images were acquired with an EPI sequence: TR 2000 ms, TE 30 ms, flip angle 90°, matrix size 96 × 96, parallel imaging acceleration (GRAPPA) factor 2, field of view 192 mm, voxel size 2 × 2 × 2 mm, 31 slices. Because we covered only the posterior part of the brain (the entire dorsal visual cortex, part of the ventral visual cortex, and the posterior part of the parietal cortex), the phase-encoding direction was set to left–right to avoid image wraparound.

MRI Data Analyses

Preprocessing of Anatomical and Functional Images

Preprocessing of anatomical data was performed using FreeSurfer. T1-weighted anatomical images were processed with the “recon-all” command and resampled to 1 × 1 × 1 mm. A cortical surface positioned halfway between the pial surface and the gray–white matter boundary was reconstructed.

Functional data were preprocessed with slice-timing correction (aligned to the middle slice) and motion correction (aligned to the first volume of the first run), separately for each session. The mean functional volume of each session was then aligned to the participant’s native anatomical volume using a rigid-body transformation. This alignment was used to sample the functional data onto the cortical surfaces. All subsequent analyses were performed on the surfaces, meaning that the spatial units were vertices rather than voxels; the resolution of a vertex is equivalent to that of a voxel. All data from regions of interest (ROIs) were extracted in each participant’s native space.

ROI Definition

We used pRF estimation methods to define the visual field maps and ROIs (Dumoulin and Wandell 2008; Kay et al. 2013b). The Compressive Spatial Summation model was used to estimate the polar angle, eccentricity, size, and exponent of each vertex’s pRF from the time series data (Kay et al. 2013b). Polar angle estimates were visualized on cortical surface reconstructions and used to define V1, V2, V3, V4, V3a, and V3b (Brewer et al. 2005) (see Fig. 1d for the ROI definition of each participant). We then refined the ROIs using the general linear model (GLM) results of the mapping task (see details in GLM Analysis).
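
As a rough illustration of the Compressive Spatial Summation model (Kay et al. 2013b), the sketch below computes one vertex's predicted response: the stimulus aperture is weighted by an isotropic 2D Gaussian pRF, summed, and passed through a power-law nonlinearity with exponent n (n < 1 gives compressive, subadditive summation); polar angle and eccentricity follow from the pRF center. Assumptions: the Gaussian is normalized to unit sum over the stimulus grid, the visual field spans ±12.5° (the 25° stimulus diameter), HRF convolution and the nonlinear fitting step are omitted, and `css_prediction` and `polar_coords` are hypothetical names, not functions from the authors' toolbox.

```python
import numpy as np

def css_prediction(stim, x0, y0, sigma, n, gain=1.0, extent=12.5):
    """Pre-HRF CSS response of one vertex to a (time x rows x cols) binary aperture movie."""
    T, H, W = stim.shape
    xs = np.linspace(-extent, extent, W)  # degrees, left to right
    ys = np.linspace(extent, -extent, H)  # degrees, top row = upper visual field
    X, Y = np.meshgrid(xs, ys)            # each of shape (H, W)
    g = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()                          # unit-sum Gaussian (assumption)
    drive = stim.reshape(T, -1) @ g.ravel()  # spatial summation within the pRF, per frame
    return gain * drive ** n              # compressive static nonlinearity

def polar_coords(x0, y0):
    """Polar angle (deg, counterclockwise from the right horizontal meridian) and eccentricity (deg)."""
    return np.degrees(np.arctan2(y0, x0)) % 360.0, np.hypot(x0, y0)
```

With n = 0.5, covering half of a centered pRF yields a response of about 0.71 rather than 0.5, which is the subadditivity the model is designed to capture.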

GLM Analysis

In the main experiment sessions, the preprocessed fMRI data were analyzed using GLMdenoise (Kay et al. 2013a) (http://cvnlab.net/GLMdenoise/), a denoising method that estimates correlated noise from the data and includes these estimates as regressors in a GLM analysis.

In the mapping task, we coded each combination of orientation and location as a separate condition, producing 9 (orientations) × 4 (locations) = 36 conditions per run. We then averaged the beta values of the nine orientations within each location to obtain the mean beta value for each location at each vertex. According to retinotopic organization, each subregion (left/right, dorsal/ventral) of the early visual cortex (EVC; V1–V3) represents one quadrant of the visual field (e.g., left dorsal V1–V3 responds only when the visual stimulus is displayed at the bottom right). Moreover, because our stimuli covered only part of the visual field, for each participant we identified in each subregion the 500 surface vertices showing the strongest activation to the corresponding location (e.g., in right dorsal V1 we selected the 500 vertices with the highest blood oxygen level-dependent (BOLD) responses to the bottom-left stimulus, and likewise for the other ROIs). Because V3a, V3b, and V4 each exist only on the dorsal or the ventral part of the visual cortex, for these areas we averaged the beta values of the two locations represented in the left or right hemisphere before selecting vertices. These 500-vertex regions were used as the ROIs in further analyses.
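
The vertex-selection rule can be sketched as follows: average the mapping-task betas over the nine orientations, then keep the 500 vertices with the largest mean response to the location(s) that the subregion represents. The array layout (vertices × orientations × locations), the location indexing, and the option of passing two locations to be averaged (as for V3a, V3b, and V4) are assumptions; `select_top_vertices` is a hypothetical helper.

```python
import numpy as np

def select_top_vertices(betas, locations, n_top=500):
    """Indices of the n_top vertices with the largest orientation-averaged beta.

    betas: array of shape (vertices, orientations, locations) from the mapping-task GLM.
    locations: one location index, or a list of indices whose betas are averaged first.
    """
    mean_by_location = np.asarray(betas, float).mean(axis=1)            # vertices x locations
    score = mean_by_location[:, np.atleast_1d(locations)].mean(axis=1)  # one score per vertex
    order = np.argsort(score, kind="stable")[::-1]                      # strongest first
    return order[:n_top]
```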

For the WM task, we first upsampled the preprocessed time series to one sample per second (the TR was 2 s) using cubic interpolation, to obtain more samples for the subsequent finite impulse response (FIR) analysis and a better fit. We then conducted an FIR analysis to estimate the response time course of each single trial (this analysis included polynomial terms to model low-frequency drift in the time series). Each time course was time-locked to sample onset and lasted 16 s.
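
The upsampling and single-trial FIR steps can be sketched in NumPy. Assumptions: each midpoint between consecutive TRs is filled by four-point cubic (Lagrange) interpolation with linear extrapolation at the edges, which may differ from the authors' cubic routine; onsets are given in 1 s samples; each trial gets 16 one-second lag regressors; and a third-order polynomial models drift (the polynomial order is not stated in the text). Function names are hypothetical.

```python
import numpy as np

def upsample_tr2_to_1s(ts):
    """Double the sampling rate of a TR = 2 s series along its last axis (length n -> 2n - 1).

    Original samples are kept; each midpoint uses (-y[i-1] + 9*y[i] + 9*y[i+1] - y[i+2]) / 16,
    with one linearly extrapolated sample padded at each end.
    """
    ts = np.asarray(ts, float)
    left = 2.0 * ts[..., :1] - ts[..., 1:2]
    right = 2.0 * ts[..., -1:] - ts[..., -2:-1]
    p = np.concatenate([left, ts, right], axis=-1)
    mid = (-p[..., :-3] + 9.0 * p[..., 1:-2] + 9.0 * p[..., 2:-1] - p[..., 3:]) / 16.0
    out = np.empty(ts.shape[:-1] + (2 * ts.shape[-1] - 1,))
    out[..., 0::2] = ts
    out[..., 1::2] = mid
    return out

def single_trial_fir(y, onsets, n_lags=16, poly_order=3):
    """Least-squares single-trial FIR fit on a 1 s series y; returns betas (n_trials, n_lags)."""
    y = np.asarray(y, float)
    n, n_fir = y.shape[0], len(onsets) * n_lags
    X = np.zeros((n, n_fir + poly_order + 1))
    for i, onset in enumerate(onsets):
        for lag in range(n_lags):
            if onset + lag < n:
                X[onset + lag, i * n_lags + lag] = 1.0  # one regressor per trial and lag
    x = np.linspace(-1.0, 1.0, n)
    for p in range(poly_order + 1):
        X[:, n_fir + p] = x ** p  # polynomial drift terms
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[:n_fir].reshape(len(onsets), n_lags)
```

In use, the upsampled series would be passed to `single_trial_fir` together with the sample-onset indices of the run.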

IEM

To reconstruct the orientation information maintained in VWM, we used a multivariate approach termed the “inverted encoding model” (IEM) (Fig. 1c) (Brouwer and Heeger 2009, 2011; Sprague and Serences 2013; Ester et al. 2015b). This linear model assumes that the BOLD activity measured by fMRI in a given vertex is approximately a weighted linear sum of the responses of underlying neural populations, which can be modeled using a set of basis functions. Under this assumption, on each trial the observed BOLD activity (i.e., beta values) B1 (m vertices × n trials) is a weighted (W, m vertices × k channels) linear sum of the predicted channel responses C1 (k channels × n trials) to the orientation presented on that trial. This can be described as follows:

$$B_1 = W C_1$$

W is a weight matrix that links the “information channel space” to the “vertex space.” Following Ester et al. (2015a, 2015b), we used a set of nine orientation channels to model neural responses. We generated nine half-wave cosine functions centered at the same nine orientations used in the experiment (0°, 20°, 40°, etc.) and raised them to the eighth power. Given B1 and C1, the weight matrix W can be estimated using ordinary least-squares regression:

$$\hat{W} = B_1 C_1^{\top} \left( C_1 C_1^{\top} \right)^{-1}$$

We used a “training” dataset to estimate $\hat{W}$ and then, given the observed responses in an independent “test” dataset B2, we obtained a set of estimated channel responses C2:

$$C_2 = \left( \hat{W}^{\top} \hat{W} \right)^{-1} \hat{W}^{\top} B_2$$
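
The IEM steps described above (forward model, least-squares weight estimation on training data, and inversion on held-out data) translate into a few lines of NumPy. In this sketch the half-wave cosine basis is parameterized by wrapping each orientation difference into [−90°, 90°), taking the rectified cosine, and raising it to the eighth power; this parameterization and all function names are assumptions rather than the authors' code. Linear solves replace the explicit matrix inverses, which is algebraically equivalent and numerically more stable.

```python
import numpy as np

CENTERS = np.arange(0, 180, 20)  # nine channel centers (deg): 0, 20, ..., 160

def channel_responses(orientations, centers=CENTERS, power=8):
    """Predicted channel responses C (k channels x n trials) for orientations in degrees."""
    d = (np.asarray(orientations, float)[None, :] - centers[:, None] + 90.0) % 180.0 - 90.0
    return np.maximum(np.cos(np.deg2rad(d)), 0.0) ** power  # rectified cosine, 8th power

def estimate_weights(B1, C1):
    """OLS estimate W_hat = B1 C1^T (C1 C1^T)^-1, shape (m vertices x k channels)."""
    # Solve (C1 C1^T) W^T = C1 B1^T rather than forming the inverse explicitly.
    return np.linalg.solve(C1 @ C1.T, C1 @ B1.T).T

def invert_model(W_hat, B2):
    """Estimated channel responses C2 = (W^T W)^-1 W^T B2, shape (k x n test trials)."""
    return np.linalg.solve(W_hat.T @ W_hat, W_hat.T @ B2)
```

With noise-free simulated data (B1 = W C1), `estimate_weights` recovers W exactly and `invert_model` returns the original channel responses, which is a quick sanity check before applying the model to real betas.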

For each set of estimated channel responses, we calculated Pearson’s correlation between these responses and the nine hypothetical orientation channels. The orientation showing the strongest correlation with the estimated channel responses was taken as the most likely orientation remembered during the VWM task. Estimated channel responses were circularly shifted to a common center (0°) corresponding to the orientation of the sample stimulus and then averaged across trials.

We used leave-one-run-out cross-validation, in which data from all runs but one served as the training dataset B1 to estimate $\hat{W}$, and data from the remaining run served as the test dataset B2 to estimate C2. This process was repeated until every run had been used as the independent test dataset. Note that we used the 20% of vertices in each ROI most sensitive to orientation for the IEM (Sprague and Serences 2013). Thus, this reconstruction analysis was performed separately for each ROI (with 100 vertices) and each scanning session of every participant. Results from the sessions of each participant were averaged at the last step.
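
A minimal sketch of the leave-one-run-out loop with vertex selection inside each fold. The text does not define the orientation-sensitivity criterion, so the score used here (variance across orientation bins of each vertex's mean training beta) is a stand-in assumption, as are the basis parameterization and the function names. In this sketch selection uses only the training runs, which keeps the held-out run independent.

```python
import numpy as np

CENTERS = np.arange(0, 180, 20)  # nine channel centers (deg)

def basis(orientations):
    """Assumed eighth-power cosine channel responses (9 x n) to orientations in degrees."""
    d = (np.asarray(orientations, float)[None, :] - CENTERS[:, None] + 90.0) % 180.0 - 90.0
    return np.maximum(np.cos(np.deg2rad(d)), 0.0) ** 8

def loro_iem(B, orientations, runs, keep_frac=0.2):
    """Leave-one-run-out IEM.

    B: (m vertices x n trials) betas; orientations: (n,) degrees; runs: (n,) run labels.
    Returns (9 x n) channel responses; each column comes from a model fit without its run.
    """
    B = np.asarray(B, float)
    ori = np.asarray(orientations, float)
    runs = np.asarray(runs)
    bins = np.rint(ori / 20.0).astype(int) % CENTERS.size  # nearest base orientation
    n_keep = max(CENTERS.size, int(round(keep_frac * B.shape[0])))  # e.g. 100 of 500
    C_hat = np.zeros((CENTERS.size, B.shape[1]))
    for r in np.unique(runs):
        train, test = runs != r, runs == r
        # Stand-in sensitivity score: variance across orientation bins of mean training betas.
        bin_means = np.stack([B[:, train & (bins == b)].mean(axis=1)
                              for b in np.unique(bins[train])], axis=1)
        keep = np.argsort(bin_means.var(axis=1))[::-1][:n_keep]
        B_train, B_test = B[keep][:, train], B[keep][:, test]
        C1 = basis(ori[train])
        W = np.linalg.solve(C1 @ C1.T, C1 @ B_train.T).T  # W_hat = B1 C1^T (C1 C1^T)^-1
        C_hat[:, test] = np.linalg.solve(W.T @ W, W.T @ B_test)
    return C_hat
```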

We used the center channel response (CCR; the channel response at 0°) to quantify reconstruction fidelity and then used a bootstrapping procedure to estimate the significance of the CCR. For each ROI, participants’ CCRs were randomly sampled with replacement from the pool of all participants’ reconstruction results and then averaged; this procedure was repeated 50 000 times. To test significance, we compared the t-statistic calculated from the reconstructed results against zero. Specifically, the P value was obtained as the proportion of bootstrap iterations in which the averaged CCR was less than zero. This test is one-sided and uncorrected.
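
The bootstrap P value described above can be sketched as follows: resample participants' CCRs with replacement, average each resample, repeat 50 000 times, and report the fraction of bootstrap means below zero (one-sided, uncorrected). The function name and the fixed seed are illustrative assumptions.

```python
import numpy as np

def bootstrap_p(ccr, n_boot=50_000, seed=0):
    """One-sided, uncorrected bootstrap P value for a positive mean CCR.

    ccr: one center channel response per participant.
    Returns (P, bootstrap means), where P is the fraction of bootstrap means below zero.
    """
    ccr = np.asarray(ccr, float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, ccr.size, size=(n_boot, ccr.size))  # resample participants
    boot_means = ccr[idx].mean(axis=1)
    return float(np.mean(boot_means < 0.0)), boot_means
```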

Except for the time course analyses, beta values averaged over 9–12 s after sample onset on each trial were used in the IEM. This time window was regarded as the part of the delay period least affected by interference from the sample and probe arrays.

Time Course Analysis of VWM Representation

To assess how VWM representations evolve over time, we performed a time course analysis in which the IEM analysis was repeated on data averaged within a 4 s sliding time window, moved in 1 s steps.
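
The windowing can be sketched as follows, assuming the FIR betas are stored with time as the last axis at 1 s resolution and that sample index t corresponds to t s after sample onset (this indexing convention is an assumption). Under that convention, the fixed 9–12 s delay window used for the main IEM analyses corresponds to the four samples starting at index 9. Function names are hypothetical.

```python
import numpy as np

def window_average(betas, start, width):
    """Average FIR betas over samples [start, start + width) along the last (time) axis."""
    return np.asarray(betas, float)[..., start:start + width].mean(axis=-1)

def sliding_windows(betas, width=4, step=1):
    """List of (start sample, window-averaged betas) for every window that fits."""
    n_t = np.asarray(betas).shape[-1]
    return [(s, window_average(betas, s, width)) for s in range(0, n_t - width + 1, step)]
```

Each window-averaged array would then be fed to the IEM exactly as the fixed-window betas are.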

Results

Behavior Results of WM Task

WM performance was calculated as the mean absolute difference between participants’ reported orientations and the original orientations displayed in the sample array. The average recall error across participants was 13.72° (standard error 0.61°). Each participant’s recall error distribution was clustered around 0° (Fig. 1e), confirming that participants remembered the orientations accurately, as instructed.
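
Because orientation is periodic over 180°, a natural way to compute the recall error is to wrap each difference into [−90°, 90°) before taking absolute values. The text does not state how wrapping was handled, so this convention and the helper names are assumptions.

```python
import numpy as np

def recall_error(reported, original):
    """Signed recall error (reported minus original), wrapped into [-90, 90) degrees."""
    diff = np.asarray(reported, float) - np.asarray(original, float)
    return (diff + 90.0) % 180.0 - 90.0

def mean_absolute_error(reported, original):
    """Mean absolute recall error in degrees."""
    return float(np.mean(np.abs(recall_error(reported, original))))
```

For example, reporting 175° for a 5° grating counts as a −10° error rather than 170°.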

Reconstruction of VWM Contents in the Visual Cortex

We first investigated whether the visual cortex is involved in maintaining VWM content. If memory content (i.e., grating orientation) is encoded by the visual cortex during the delay period, the reconstructed curve from the IEM analysis should be a tuning curve centered on the memorized orientation (Fig. 2c, dark grey curves). If the visual cortex is not responsible for VWM maintenance, the reconstructed curve should resemble a uniform distribution (Fig. 2c, light grey curves). In this analysis, the probed item was defined as the target item, the ROI corresponding to the location of the target item was defined as the ROI within the RF (contr ROI–within RF), and the remaining ROI(s) in each visual area (i.e., V1, V2, V3, V3a, V3b, V4) were defined as the ROI out of the receptive field (ROI out of RF). Note that each ROI was composed of the top 100 most strongly activated surface vertices obtained from a separate mapping task. As illustrated in Figure 2b, if the target item was shown in the second quadrant (top left), the ROI within RF was the ROI in the right ventral early visual areas (V1, V2, V3) or in right-hemisphere V3a, V3b, and V4.

Orientation reconstruction across the visual hierarchy. (a) Center channel response of the reconstructed tuning function. Error bars indicate the between-participant standard error. *P < 0.05; **P < 0.01; ***P < 0.001. (b) Illustration of the ROI definition. (c) Reconstructed tuning functions across the visual hierarchy. EVC: early visual cortex (combination of V1, V2, V3). Plots show averages across all participants. Shaded areas represent the between-participant standard error.
Figure 2

Consistent with previous findings (Ester et al. 2013, 2015a), we successfully decoded the orientation of the probed item in the early visual areas (V1, V2, V3) using responses from the ROI corresponding to the location of the target item. The representation fidelity (quantified as the CCR) of V1, V2, and V3, as well as of the combination of these three ROIs, was significantly higher than baseline in every case (Ps < 0.02) (Fig. 2a,c). The averaged IEM results of the ROIs out of RF were also significantly higher than baseline for V1 (P = 0.006) and V2 (P < 0.001), whereas the other ROIs showed no difference from baseline (Ps > 0.055).

VWM Representation in the ROIs in Nonstimulated Regions

The above findings demonstrate robust VWM representation in regions of the EVC. However, it remains unknown whether this representation occurs only in the ROI that responds to the target location. We therefore investigated the roles of ROIs whose RFs did not overlap with the target item, dividing ROIs into three types: the contralateral ROI with an RF covering the probed item (contr ROI–within RF), the contralateral ROI out of RF (contr ROI–out of RF), and the ipsilateral ROI (ipsi ROI). For instance, as illustrated in Figure 3b, if the target item was presented at the top left of the screen, the right ventral V1 would be the contralateral ROI within RF and the right dorsal V1 would be the contralateral ROI out of RF, while the two left-hemisphere ROIs (left dorsal and left ventral V1) would be combined as the ipsilateral ROI.
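
The ROI assignment in this example follows from retinotopy (left visual field to right hemisphere, upper field to ventral cortex) and can be written as a small lookup. The quadrant numbering (1 = top right, 2 = top left, 3 = bottom left, 4 = bottom right) and the function name are assumptions for illustration.

```python
def roi_roles(target_quadrant):
    """V1 subregion roles for a target quadrant; subregions are (hemisphere, 'dorsal' | 'ventral').

    Assumed numbering: 1 = top right, 2 = top left, 3 = bottom left, 4 = bottom right.
    """
    if target_quadrant not in (1, 2, 3, 4):
        raise ValueError("quadrant must be 1, 2, 3 or 4")
    hemi = "right" if target_quadrant in (2, 3) else "left"  # contralateral hemisphere
    upper = target_quadrant in (1, 2)                        # upper field maps to ventral cortex
    other = "left" if hemi == "right" else "right"
    return {
        "contr_within_rf": (hemi, "ventral" if upper else "dorsal"),
        "contr_out_of_rf": (hemi, "dorsal" if upper else "ventral"),
        "ipsi": [(other, "dorsal"), (other, "ventral")],  # results averaged across these two
    }
```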

We calculated the orientation reconstruction of the four ROIs separately and then averaged the results of the two ipsilateral ROIs. Comparing the reconstruction outcomes of the three ROI types, we found that the contralateral ROI within RF and the ipsilateral ROIs in V1 showed significant reconstruction of memory content (Ps < 0.01), whereas, surprisingly, the contralateral ROI out of RF showed no evidence of representation in V1 (P = 0.392). The combination of V1, V2, and V3 (constituting the EVC) showed a similar pattern: in addition to the contralateral ROI within RF, the ipsilateral ROI also represented the orientation held in VWM (P < 0.001), whereas the contralateral ROI out of RF did not (Fig. 3). These results imply that the neurons recruited to represent VWM content are more extensive than those recruited for visual perception.

Orientation reconstruction from different ROIs. (a) Center channel response of the reconstructed tuning function. Error bars indicate the between-participant standard error. *P < 0.05; **P < 0.01; ***P < 0.001. (b) Illustration of the ROI definition. (c) Reconstructed tuning functions. Both the contralateral ROIs whose receptive field (RF) overlapped with the probed item and the ipsilateral ROIs successfully reconstructed the orientation tuning functions during memory maintenance. EVC: early visual cortex (combination of V1, V2, V3). Plots show averages across all participants. Shaded areas represent the between-participant standard error.
Figure 3

In our WM experiment, the sample array always comprised two distinct items (gratings). Consequently, when the two items were presented on the same side of the visual field, the contralateral ROI out of RF may have been engaged in representing the nontarget stimulus even though it was not task relevant, which could explain why reconstruction failed in this ROI. To rule out this possibility, we isolated the trials in which the two sample items were shown on opposite sides of the visual field and excluded the ROI corresponding to the nontarget item (Fig. 4b). We excluded this ROI for two reasons: (1) it may have encoded the nontarget item even though that item was task irrelevant, which could contaminate the reconstruction from the ipsilateral ROI; and (2) the ipsilateral ROI was otherwise twice as large as the other two ROI types, which might bias the analyses. We again found that the contralateral ROI out of RF was not involved in VWM representation, even though no visual stimulus was presented in its RF on these trials. Meanwhile, the contralateral ROI within RF and the ipsilateral ROI exhibited robust orientation reconstruction (Fig. 4a,c).
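
The trial and ROI selection for this control analysis can be sketched as follows: keep only trials whose target and nontarget fall in opposite hemifields, drop the ROI representing the nontarget's quadrant, and treat the remaining ipsilateral subregion as the ipsilateral ROI. The quadrant numbering (1 = top right, 2 = top left, 3 = bottom left, 4 = bottom right) and the function name are assumptions, and the sketch covers the four subregions of a single visual area.

```python
def control_rois(target_q, nontarget_q):
    """ROI roles for the control analysis, or None when both items share a hemifield."""
    left_field = {2, 3}
    if (target_q in left_field) == (nontarget_q in left_field):
        return None  # same hemifield: trial excluded from the control analysis

    def roi(q):
        hemi = "right" if q in left_field else "left"     # contralateral hemisphere
        return (hemi, "ventral" if q in (1, 2) else "dorsal")

    def other_bank(region):
        return (region[0], "dorsal" if region[1] == "ventral" else "ventral")

    within = roi(target_q)
    excluded = roi(nontarget_q)  # ROI that may carry the nontarget item
    return {
        "contr_within_rf": within,
        "contr_out_of_rf": other_bank(within),
        "ipsi": other_bank(excluded),  # single ipsilateral subregion, same size as the others
        "excluded": excluded,
    }
```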

Orientation reconstruction from different ROIs–control analysis. (a) Center channel response of the reconstructed tuning function. Error bars indicate the between-participant standard error. ***P < 0.001. (b) Illustration of the ROI definition. (c) Reconstructed tuning functions of early visual areas in a subset of trials and ROIs. EVC: early visual cortex (combination of V1, V2, V3). Plots show averages across all participants. Shaded areas represent the between-participant standard error.
Figure 4

Time Course of Orientation Reconstruction

To examine the temporal dynamics of VWM representations, we performed a sliding-window analysis combined with the IEM analysis. The contralateral ROI within RF represented visual information during the early part of the delay period, whereas reconstruction from fMRI activity in the ipsilateral ROI persisted throughout the whole delay period (Fig. 5).

Center channel response of the reconstructed tuning function over time. †P < 0.06; *P < 0.05; **P < 0.01; ***P < 0.001. The contralateral ROI within the receptive field (RF) exhibited orientation representation only in the early delay period, whereas the ipsilateral ROIs showed representation extending into the late delay period.
Figure 5

Discussion

Using high-resolution fMRI combined with the IEM, we revealed the maintenance of VWM representations in both the contralateral and the ipsilateral V1. First, these results support the “sensory recruitment hypothesis” for VWM and extend it to the involvement of both contralateral and ipsilateral sensory cortices. Second, these results are consistent with a model of distributed VWM representations across cortices, as we observed prominent tuning curves in multiple regions of the visual hierarchy. Third, representations in the ipsilateral V1 were more sustained over time than those in the contralateral V1, suggesting cooperative roles of these areas in VWM processes. Taken together, the sensory recruitment hypothesis has been revisited and extended to the ipsilateral sensory cortex, which helps its contralateral counterpart by sustaining high-fidelity representations during VWM maintenance.

Why might the ipsilateral V1 participate in VWM? The visual cortex is essential for representing detailed visual information, and it has been suggested that population responses in visual cortices have higher dimensionality than those in the prefrontal cortex (Stringer et al. 2019). Indeed, sensory cortices and the prefrontal cortex have been proposed to represent the “quality” and “quantity” of WM, respectively (Ku et al. 2015a). Recently, a univariate fMRI study provided evidence that the visual cortex serves as the neural correlate of the “precision” parameter in VWM (Zhao et al. 2020), adding to the many studies that have successfully decoded VWM contents from sensory cortices (Emrich et al. 2013; Postle 2016; Serences 2016; Sprague et al. 2016; Jia et al. 2021). However, representations in primary visual areas have been shown to be more vulnerable to interference than those in the prefrontal or parietal cortices (Bettencourt and Xu 2016). A recent study showed that VWM traces are lateralized in the prefrontal cortex and can be transferred to the other hemisphere, but that the transferred representation cannot be cross-decoded (Brincat et al. 2021). It is possible that the prefrontal cortex exerts top-down control (Ku et al. 2015a; Zhao et al. 2021), whereas the visual cortex represents the details of the mnemonic contents (Ester et al. 2009; Lorenc et al. 2018), and that this information can shift to the ipsilateral hemisphere. Alternatively, the ipsilateral representation may help the system avoid sensory noise, which impairs sensory processing, memory maintenance, and even speech comprehension (Li et al. 2021). We therefore speculate that ipsilateral V1 may serve as a suitable neural substrate for maintaining high-fidelity visual information while avoiding interference from distractors newly presented at the location of the encoded item.

Previous multivariate studies have shown that memory representations in contralateral visual regions are robust. Given the distinct temporal profiles of the two regions, our findings suggest that ipsilateral visual regions may serve a different function in VWM from contralateral regions. We have previously found evidence for ipsilateral representation in the primary somatosensory cortex (S1) in the tactile domain (Zhao and Ku 2018). It would be interesting to examine the auditory domain, where the primary auditory cortex already processes binaural information.

Where does the information in ipsilateral V1 come from, given that feedforward visual processing at the level of V1 appears largely contralateral? There are three possible pathways. First, information could travel directly from contralateral V1 through the corpus callosum to its ipsilateral counterpart. Second, information could be sent from the thalamus to ipsilateral V1. Third, information could originate in higher-level visual areas and then be transferred to ipsilateral V1 through feedback projections. The thalamus has recently been suggested to regulate cortical representations (Rikhye et al. 2018) and is important for cognitive control (Halassa and Kastner 2017). One study showed that representations decoded from fMRI signals in the lateral geniculate nucleus could be modulated by attention (Ling et al. 2015). Therefore, VWM information may reach ipsilateral V1 via the thalamus. Unfortunately, the current results cannot distinguish the callosal pathway from the thalamic pathway; further animal studies are needed to dissociate them. It is also likely that the signal in ipsilateral V1 originates from feedback projections. Recent 7 T fMRI studies have resolved activity in different layers of V1 and provided evidence that feedback projections to the superficial layers (I/II) support contextual processing (Muckli et al. 2015) and that feedback projections to the deep layers (VI) support top-down modulation (Kok et al. 2016). Notably, in the tactile domain, animal studies have proposed that the secondary somatosensory cortex (S2) and motor areas project memory information back to S1 (Condylis et al. 2020); future studies might further explore signal transmission between the bilateral primary sensory cortices. Interestingly, in the present study, orientation tuning was also prominent in ipsilateral V2, but not in ipsilateral V3 or other higher-level visual areas. The progressively larger RFs along the visual hierarchy could be one reason for this pattern.

Visual imagery has been argued to share neural substrates with VWM and to participate in VWM processes (Albers et al. 2013; Tong 2013). However, there are also discrepancies between visual imagery and VWM. For example, participants with weaker imagery ability do not appear to use an imagery strategy to maintain WM information (Keogh and Pearson 2011). Our results further suggest that the processes engaged in the current VWM paradigm extend beyond imagery. If VWM relied on the same processes as visual imagery, one might expect representations only in contralateral brain areas, because imagery generally reinstates activity in the areas that process the corresponding sensory input, at least in the visual (Farah 1984; Guariglia et al. 1993) and tactile (Yoo et al. 2003) domains. Thus, our results cannot be explained simply by mechanisms of mental imagery.

Besides the quality of VWM representations, quantity is also an important factor influencing VWM. VWM has been suggested to have a capacity limit of about 3–4 slots (Cowan 2001; Zhang and Luck 2008), and the prefrontal cortex in each hemisphere can represent half of the VWM items (Buschman et al. 2011). It would be interesting to further investigate how ipsilateral V1 and the prefrontal cortex interact in representing the quantity of VWM. It should also be noted that the current study offers only correlational evidence. Future transcranial magnetic stimulation (TMS) studies are needed to establish whether the representations in ipsilateral V1 are causally related to WM performance.

In summary, our findings support the sensory recruitment hypothesis of VWM and reveal that, in addition to contralateral V1, ipsilateral V1 is involved in maintaining VWM representations, especially during the late delay period.

Funding

The National Natural Science Foundation of China (32171082), the National Social Science Foundation of China (17ZDA323), the Shanghai Committee of Science and Technology (19ZR1416700), and the Hundred Top Talents Program from Sun Yat-sen University to Y.K. T.Y. is partially supported by grants from the National Natural Science Foundation of China (under contract Nos 61825101, 62088102).

Notes

The authors thank Wenshuo Zhang for his invaluable input on this paper. Conflict of Interest: None declared.

References

Albers AM, Kok P, Toni I, Dijkerman HC, de Lange FP. 2013. Shared representations for working memory and mental imagery in early visual cortex. Curr Biol. 23:1427–1431.

Benson NC, Jamison KW, Arcaro MJ, Vu AT, Glasser MF, Coalson TS, Van Essen DC, Yacoub E, Ugurbil K, Winawer J, et al. 2018. The Human Connectome Project 7 Tesla retinotopy dataset: description and population receptive field analysis. J Vis. 18:23.

Bettencourt KC, Xu Y. 2016. Understanding location- and feature-based processing along the human intraparietal sulcus. J Neurophysiol. 116:1488–1497.

Brewer AA, Liu J, Wade AR, Wandell BA. 2005. Visual field maps and stimulus selectivity in human ventral occipital cortex. Nat Neurosci. 8:1102–1109.

Brincat SL, Donoghue JA, Mahnke MK, Kornblith S, Lundqvist M, Miller EK. 2021. Interhemispheric transfer of working memories. Neuron. 109:1055–1066.e4.

Brouwer GJ, Heeger DJ. 2009. Decoding and reconstructing color from responses in human visual cortex. J Neurosci. 29:13992–14003.

Brouwer GJ, Heeger DJ. 2011. Cross-orientation suppression in human visual cortex. J Neurophysiol. 106:2108–2119.

Buschman TJ, Siegel M, Roy JE, Miller EK. 2011. Neural substrates of cognitive capacity limitations. Proc Natl Acad Sci. 108:11252–11255.

Condylis C, Lowet E, Ni J, Bistrong K, Ouellette T, Josephs N, Chen JL. 2020. Context-dependent sensory processing across primary and secondary somatosensory cortex. Neuron. 106:515–525.e5.

Cowan N. 2001. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav Brain Sci. 24:87–114.

Dumoulin SO, Wandell BA. 2008. Population receptive field estimates in human visual cortex. Neuroimage. 39:647–660.

Emrich SM, Riggall AC, LaRocque JJ, Postle BR. 2013. Distributed patterns of activity in sensory cortex reflect the precision of multiple items maintained in visual short-term memory. J Neurosci. 33:6516–6523.

Ester EF, Anderson DE, Serences JT, Awh E. 2013. A neural measure of precision in visual working memory. J Cogn Neurosci. 25:754–761.

Ester EF, Serences JT, Awh E. 2009. Spatially global representations in human primary visual cortex during working memory maintenance. J Neurosci. 29:15258–15265.

Ester EF, Sprague TC, Serences JT. 2015a. Parietal and frontal cortex encode stimulus-specific mnemonic representations during visual working memory. Neuron. 87:893–905.

Ester EF, Zilber E, Serences JT. 2015b. Substitution and pooling in visual crowding induced by similar and dissimilar distractors. J Vis. 15:4.

Farah MJ. 1984. The neurological basis of mental imagery: a componential analysis. Cognition. 18:245–272.

Funahashi S, Bruce CJ, Goldman-Rakic PS. 1989. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J Neurophysiol. 61:331–349.

Fuster JM, Alexander GE. 1971. Neuron activity related to short-term memory. Science. 173:652–654.

Guariglia C, Padovani A, Pantano P, Pizzamiglio L. 1993. Unilateral neglect restricted to visual imagery. Nature. 364:235–237.

Halassa MM, Kastner S. 2017. Thalamic functions in distributed cognitive control. Nat Neurosci. 20:1669–1679.

Jia K, Li Y, Gong M, Huang H, Wang Y, Li S. 2021. Perceptual learning beyond perception: mnemonic representation in early visual cortex and intraparietal sulcus. J Neurosci. 41:4476–4486.

Kay KN, Rokem A, Winawer J, Dougherty RF, Wandell BA. 2013a. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Front Neurosci. 7:1–15.

Kay KN, Winawer J, Mezer A, Wandell BA. 2013b. Compressive spatial summation in human visual cortex. J Neurophysiol. 110:481–494.

Keogh R, Pearson J. 2011. Mental imagery and visual working memory. PLoS One. 6:e29221.

Kok P, Bains LJ, Van Mourik T, Norris DG, De Lange FP. 2016. Selective activation of the deep layers of the human primary visual cortex by top-down feedback. Curr Biol. 26:371–376.

Ku Y, Bodner M, Zhou Y-D. 2015a. Prefrontal cortex and sensory cortices during working memory: quantity and quality. Neurosci Bull. 31:175–182.

Ku Y, Zhao D, Bodner M, Zhou Y-D. 2015b. Cooperative processing in primary somatosensory cortex and posterior parietal cortex during tactile working memory. Eur J Neurosci. 42:1905–1911.

Ku Y, Zhao D, Hao N, Hu Y, Bodner M, Zhou Y-D. 2015c. Sequential roles of primary somatosensory cortex and posterior parietal cortex in tactile-visual cross-modal working memory: a single-pulse transcranial magnetic stimulation (spTMS) study. Brain Stimul. 8:88–91.

Li Z, Li J, Hong B, Nolte G, Engel AK, Zhang D. 2021. Speaker–listener neural coupling reveals an adaptive mechanism for speech comprehension in a noisy environment. Cereb Cortex (online).

Ling S, Pratte MS, Tong F. 2015. Attention alters orientation processing in the human lateral geniculate nucleus. Nat Neurosci. 18:496–498.

Lorenc ES, Sreenivasan KK, Nee DE, Vandenbroucke ARE, D’Esposito M. 2018. Flexible coding of visual working memory representations during distraction. J Neurosci. 38:5267–5276.

Mishra J, de Villers-Sidani E, Merzenich M, Gazzaley A. 2014. Adaptive training diminishes distractibility in aging across species. Neuron. 84:1091–1103.

Muckli L, De Martino F, Vizioli L, Petro LS, Smith FW, Ugurbil K, Goebel R, Yacoub E. 2015. Contextual feedback to superficial layers of V1. Curr Biol. 25:2690–2695.

Postle BR. 2016. How does the brain keep information “in mind”? Curr Dir Psychol Sci. 25:151–156.

Rademaker RL, Chunharas C, Serences JT. 2019. Coexisting representations of sensory and mnemonic information in human visual cortex. Nat Neurosci. 22:1336–1344.

Rikhye RV, Gilra A, Halassa MM. 2018. Thalamic regulation of switching between cortical representations enables cognitive flexibility. Nat Neurosci. 21:1753–1763.

Scimeca JM, Kiyonaga A, D’Esposito M. 2018. Reaffirming the sensory recruitment account of working memory. Trends Cogn Sci. 22:190–192.

Serences JT. 2016. Neural mechanisms of information storage in visual short-term memory. Vision Res. 128:53–67.

Sprague TC, Adam KCS, Foster JJ, Rahmati M, Sutterer DW, Vo VA. 2018. Inverted encoding models assay population-level stimulus representations, not single-unit neural tuning. eNeuro. 5:1–5.

Sprague TC, Ester EF, Serences JT. 2014. Reconstructions of information in visual spatial working memory degrade with memory load. Curr Biol. 24:2174–2180.

Sprague TC, Ester EF, Serences JT. 2016. Restoring latent visual working memory representations in human cortex. Neuron. 91:694–707.

Sprague TC, Serences JT. 2013. Attention modulates spatial priority maps in the human occipital, parietal and frontal cortices. Nat Neurosci. 16:1879–1887.

Stringer C, Pachitariu M, Steinmetz N, Carandini M, Harris KD. 2019. High-dimensional geometry of population responses in visual cortex. Nature. 571:361–365.

Tong F. 2013. Imagery and visual working memory: one and the same? Trends Cogn Sci. 17:489–490.

Xu Y. 2017. Reevaluating the sensory account of visual working memory storage. Trends Cogn Sci. 21:794–815.

Xu Y. 2018. Sensory cortex is nonessential in working memory storage. Trends Cogn Sci. 22:192–193.

Yoo S-S, Freeman DK, McCarthy JJ, Jolesz FA. 2003. Neural substrates of tactile imagery: a functional MRI study. Neuroreport. 14:581–585.

Yu Q, Postle BR. 2021. The neural codes underlying internally generated representations in visual working memory. J Cogn Neurosci. 33:1142–1157.

Zhang W, Luck SJ. 2008. Discrete fixed-resolution representations in visual working memory. Nature. 453:233–235.

Zhao D, Ku Y. 2018. Dorsolateral prefrontal cortex bridges bilateral primary somatosensory cortices during cross-modal working memory. Behav Brain Res. 350:116–121.

Zhao D, Zhou Y-D, Bodner M, Ku Y. 2018. The causal role of the prefrontal cortex and somatosensory cortex in tactile working memory. Cereb Cortex. 28:3468–3477.

Zhao J, Mo L, Bi R, He Z, Chen Y, Xu F, Xie H, Zhang D. 2021. The VLPFC versus the DLPFC in downregulating social pain using reappraisal and distraction strategies. J Neurosci. 41:1331–1339.

Zhao Y, Kuai S, Zanto TP, Ku Y. 2020. Neural correlates underlying the precision of visual working memory. Neuroscience. 425:301–311.

Zhou Y-D, Fuster JM. 1996. Mnemonic neuronal activity in somatosensory cortex. Proc Natl Acad Sci. 93:10533–10537.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)