Interactions between the neural correlates of short-term memory (STM) and attention have been actively studied in the visual STM domain but much less in the verbal STM domain. Here we show that the same attention mechanisms that have been shown to shape the neural networks of visual STM also shape those of verbal STM. Based on previous research in visual STM, we contrasted the involvement of a dorsal attention network centered on the intraparietal sulcus supporting task-related attention and a ventral attention network centered on the temporoparietal junction supporting stimulus-related attention. We observed that, with increasing STM load, the dorsal attention network was activated while the ventral attention network was deactivated, especially during early maintenance. Importantly, activation in the ventral attention network increased in response to task-irrelevant stimuli briefly presented during the maintenance phase of the STM trials but only during low-load STM conditions, which were associated with the lowest levels of activity in the dorsal attention network during encoding and early maintenance. By demonstrating a trade-off between task-related and stimulus-related attention networks during verbal STM, this study highlights the dynamics of attentional processes involved in verbal STM.
Despite extensive research, the nature of the neural networks supporting our ability to temporarily maintain verbal information remains a matter of controversy. One area, the left inferior parietal lobule, in conjunction with the dorsolateral prefrontal cortex, has been shown to be particularly critical for verbal short-term memory (STM) performance (Ravizza et al. 2004; Buchsbaum and D’Esposito 2008). However, as we will develop below, the precise role of this frontoparietal network, and especially of the left parietal cortex, is being increasingly questioned. By leaning on studies from the visual STM domain, the aim of the present study is to demonstrate that part of the neural substrate of verbal STM is to be explained by the intervention as well as interaction of attention networks.
Initial studies, guided by the modular working memory model (Baddeley 1986), considered that the inferior parietal lobule, and more specifically the left supramarginal gyrus, supports a dedicated verbal short-term storage system, independent from general attentional processes and distinct from other storage systems such as visual STM. Initial neuroimaging studies showed that the inferior parietal lobule was activated to a greater extent during verbal STM as compared with visual STM (Paulesu et al. 1993; Salmon et al. 1996). However, later studies failed to show specificity of this region for verbal STM since this region also responds to phonological information in the absence of STM load (Martin et al. 2003; Buchsbaum and D’Esposito 2008). Most importantly, this region does not respond to STM load (Ravizza et al. 2004). The same studies also showed that a region slightly superior and more posterior, located in the intraparietal sulcus (IPS), is actually sensitive to STM load (Ravizza et al. 2004). However, this region also does not fulfill the assumptions of a dedicated verbal STM store since this region is sensitive to STM load for both verbal and visual STM tasks (Nystrom et al. 2000; Majerus et al. 2010). In the light of these findings, an alternative account has emerged, considering that domain-general attentional processes explain the involvement of the IPS and associated frontoparietal networks in verbal STM rather than modality-specific STM buffers (Majerus et al. 2010; Cowan et al. 2011). Although this view has received increasing and direct support in the field of visual STM, there is currently very limited empirical evidence for this account in the field of verbal STM research. We should note here that we use the term STM to refer to capacity-limited, temporary maintenance of information, as opposed to executive processes as involved in dual tasking or manipulation of information held in STM. 
Some studies use the term working memory for both situations, including some of the studies we will review here. When referring to these studies, we focus on the results relating to maintenance of information and/or exploring the effect of STM load (i.e., the number of stimuli to be maintained).
A first line of evidence for the involvement of attention in visual STM tasks comes from studies showing an overlap between IPS regions engaged during selective spatial attention tasks and those during visuospatial STM tasks (Mitchell and Cusack 2008; Silk et al. 2010). Furthermore, these activations in the IPS present the same capacity-limited pattern of activity in attention and STM conditions, reaching a plateau or declining when processing demands exceed the capacity limits. The most direct evidence for the involvement of attention in visual STM tasks, however, comes from studies that have shown competition between 2 distinct attention networks when performing a visual STM task: the dorsal attention network, involving the IPS and the superior frontal gyrus (SFG), and the ventral attention network, involving the temporoparietal junction (TPJ) and the ventral orbitofrontal cortex (OFC) (Marois et al. 2000; Corbetta and Shulman 2002; Corbetta et al. 2008). The ventral attention network is activated by novel, salient, and unexpected stimuli not being the focus of the ongoing task and has been considered to subtend stimulus-related attention (Marois et al. 2000; Asplund et al. 2010). A number of studies have shown that these 2 attention networks are in competition during visual STM tasks (Todd et al. 2005; Anticevic et al. 2010; Matsuyoshi et al. 2010). Todd and Marois (2004) and Todd et al. (2005) observed that the higher the visual STM load the higher the activation in the IPS and the higher the STM load the greater the deactivation of the TPJ. Furthermore, separate behavioral experiments showed that participants were less likely to report an irrelevant stimulus presented during the maintenance interval of a visual STM task when the number of stimuli to be maintained in the STM task was high (Todd et al. 2005). 
These data suggest that there is a trade-off between 2 attention systems in visual STM: when the recruitment of task-related attention is high, stimulus-related attention is deactivated, allowing for successful STM task performance by preventing distraction from task-irrelevant stimuli (Shulman et al. 2007). A related theoretical account has been proposed by Cowan (1999) and Cowan et al. (2011), who consider that the dorsal network, and especially the IPS, ensures the maintenance of information in the focus of attention, the focus of attention representing information in an active and conscious state. This focus of attention is considered to be of limited capacity (4 ± 1 items; Cowan 1999). This capacity limit is consistent with the findings of Todd and Marois (2004), showing that activity in the IPS reaches an asymptotic level when behavioral limits are exceeded (STM arrays with more than 4 items).
Although task-related and stimulus-related attention as well as the focus of attention are considered to reflect domain-general attention processes, there is currently very limited neuroimaging evidence for the involvement of these processes in verbal STM tasks. Some evidence stems from studies showing strong overlap between frontoparietal networks, and especially activation of the IPS, during verbal and visual STM tasks (Nystrom et al. 2000; Majerus et al. 2010; Cowan et al. 2011). LaBar et al. (1999) further showed that the frontoparietal network observed for verbal STM tasks also overlaps with the frontoparietal networks involved in spatial selective attention tasks, which also involve task-related attention. Finally, Nee and Jonides (2011) observed that during the retrieval stage of a verbal STM task, activity in the IPS was stronger for retrieving items presented toward the end of the STM list than for items presented at the beginning of the STM list. This is consistent with the focus of attention account: given the limited capacity of the focus of attention, only the final items of a 6-item STM list should be in the focus of attention. Hence, only retrieval of these most recent items should be associated with activation of the IPS, the other items being retrieved using long-term memory mechanisms involving the left prefrontal cortex (Brodmann area 45) and the medial temporal lobe, as also shown by Nee and Jonides (2011).
The present study aims at providing direct support for the involvement of attention networks in verbal STM by determining whether the same type of competition between dorsal and ventral attention networks as previously documented in visual STM also occurs in verbal STM. In order to demonstrate the intervention and the competition of these networks in verbal STM tasks, we varied 2 parameters known to differentially recruit these networks. First, we varied STM load, which has been shown to increasingly engage the dorsal network in visual STM tasks and to increasingly de-engage the ventral network in visual STM tasks (Todd and Marois 2004; Todd et al. 2005). STM load was varied by presenting lists of 2, 4, or 6 letters to be maintained over a variable maintenance delay. Second, in order to directly probe the intervention of the ventral network, we presented a distractor stimulus (DS) during the maintenance phase for half of the trials of each STM load condition. The ventral network is known to be sensitive to the appearance of novel, task-irrelevant, and unpredictable stimuli (Todd et al. 2005; Anticevic et al. 2010; Asplund et al. 2010). Most critically, if the ventral network competes with the dorsal network during verbal STM, then its reaction to the DS should be modulated by STM load: the reaction to the DS should be inversely proportional to STM load and to the associated involvement of the dorsal attention network. Note that these predictions are opposite to those that can be derived from the load theory of attention, an influential attention theory developed by Lavie (2005), and which actually predicts higher sensitivity to DS in high-load STM conditions (see general Discussion for further discussion). Finally, according to the focus of attention account, the directionality of STM load effects on dorsal and ventral attention networks may vary according to STM phase. 
During encoding, the high-load STM condition should maximally activate the dorsal network (and deactivate the ventral network), since the capacity limits of the focus of attention (presumably supported by the dorsal network) will be challenged to their maximum. However, given the highly limited capacity of the focus of attention (i.e., 4 ± 1 items), the focus of attention will not be able to hold all the items in the high-load condition (6 letters). Hence, at the moment of retrieval, activation in the dorsal network may be highest in the low-load STM condition, where stimuli are more likely to be still within the focus of attention, in line with the observations of Nee and Jonides (2011).
Twenty-two right-handed native French-speaking young adults (10 males; mean age: 21.86 years; age range: 18–29 years), with no diagnosed psychological or neurological disorders, were recruited from the university community. The study was approved by the Ethics Committee of the Faculty of Medicine of the University of Liège and was performed in accordance with the ethical standards described in the Declaration of Helsinki (2008). All participants gave their written informed consent prior to their inclusion in the study.
The encoding phase consisted of the presentation of a horizontally organized sequence of 2, 4, or 6 consonants (fixed duration: 3250 ms), followed by a maintenance phase indicated by the appearance of a star in the center of the screen (variable duration: random Gaussian distribution centered on a mean duration of 6000 ± 2000 ms) (see Fig. 1). The retrieval phase consisted of an array of lines ordered horizontally; the number of lines was equal to the number of positions of the target sequence. A consonant was displayed in one of these positions indicated by the lines. Participants indicated within 3000 ms if the consonant presented was part of the memory list and had occurred in the indicated position (by pressing the button under the third finger for “yes” responses and the button under the index finger for “no” responses) (see Fig. 1 for further details on stimulus duration and timing). In all conditions, there were an equal number of positive and negative probe trials, probing equally all serial positions. In half of the trials and for each STM load condition, a DS was presented briefly for 60 ms at random time points and at variable locations during the maintenance phase in order to diminish stimulus expectancy; the latter was further reduced by the use of a variable duration of the maintenance phase during which the DS occurred. The stimulus occurred within 9° of fixation. The stimulus was a consonant of a size 50% smaller than the consonants of the memory list, and the font color was gray. The specific size, font color, and duration parameters of the DS were chosen so as to make the stimulus just noticeable but without further possibility for the participant to check or reanalyze the stimulus, based on similar parameters as in Todd et al. (2005). Furthermore, a consonant was chosen since the ventral attention pathway has been shown to react most strongly to unexpected stimuli that share some features with the target stimulus set (Serences et al. 2005; Anticevic et al. 2010). We furthermore ensured that the DS was never part of the current memory set or the subsequent probe stimulus in order to avoid any priming or interference effect between the DS and the target/probe stimuli. We should note here that our paradigm differs from previous studies that have explored stimulus-related attention; most of these studies used the attentional blink paradigm where a surprise stimulus is presented within a train of target stimuli presented at very fast presentation rates, the surprise stimulus occurring less frequently than every 2 trials or occurring even only once across the task (e.g., Asplund et al. 2010). In the present study, using a higher presentation rate potentially diminishing the surprise character of the stimulus, we used the label “DS” rather than “surprise stimulus.”
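The trial timing described above can be illustrated with a minimal sketch. It assumes that "6000 ± 2000 ms" denotes a Gaussian with a mean of 6000 ms and a standard deviation of 2000 ms, that the DS onset is drawn uniformly within the maintenance delay, and that very short delays are rejected below a hypothetical floor of 1000 ms. The function names, the floor, and the default of 26 trials per cell (3 loads × DS/no-DS) are illustrative choices, not the presentation code used in the study, and the plain shuffle applies no ordering constraints.

```python
import random

ENCODING_MS = 3250      # fixed encoding duration (from the text)
DS_DURATION_MS = 60     # distractor stimulus duration (from the text)


def sample_maintenance_ms(rng, mean=6000.0, sd=2000.0, floor=1000.0):
    """Draw a maintenance duration; `floor` is a hypothetical lower bound."""
    while True:
        duration = rng.gauss(mean, sd)
        if duration >= floor:
            return duration


def make_trial(rng, load, with_ds):
    """Build one trial description for a given STM load (2, 4, or 6)."""
    maintenance = sample_maintenance_ms(rng)
    ds_onset = None
    if with_ds:
        # Assumed: onset uniform over the delay, leaving room for the 60 ms DS.
        ds_onset = rng.uniform(0.0, maintenance - DS_DURATION_MS)
    return {"load": load, "maintenance_ms": maintenance, "ds_onset_ms": ds_onset}


def make_trials(seed=0, n_per_cell=26):
    """Cross 3 loads with DS/no-DS and shuffle (no ordering constraints)."""
    rng = random.Random(seed)
    trials = [
        make_trial(rng, load, with_ds)
        for load in (2, 4, 6)
        for with_ds in (True, False)
        for _ in range(n_per_cell)
    ]
    rng.shuffle(trials)
    return trials
```

Calling `make_trials()` returns a list of dictionaries, half of which carry a DS onset time relative to the start of the maintenance phase.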
A baseline condition, controlling for letter identification and motor response and decision processes, consisted of the presentation of a sequence containing 2–6 identical consonants ordered horizontally, followed by a delay interval (a fixation star of variable duration) and a response display showing the same letter in 1 of the 2, 4, or 6 positions. The probe letter was presented in either upper or lower case, and the participants had to decide whether the case was the same as in the target list, pressing the button under the third finger if it was and the button under the index finger if it was not.
The 6 STM conditions and the baseline condition were presented in a single session using an event-related design. There were 26 trials for each STM condition and 30 trials for the baseline condition. The different trials were presented in pseudorandom order, with the restriction that 2 successive trials of the same STM load condition could not be separated by more than 5 trials of a different condition (i.e., by more than 65 s on average) in order to keep blood oxygen level-dependent (BOLD) signals for same condition epochs away from the lowest frequencies in the time series (see below). Before the start of a new trial, an exclamation mark appeared at the center of the screen for 1000 ms, informing the participant about the imminent start of a new trial. The duration of the intertrial interval was also variable (random Gaussian distribution centered on a mean duration of 2000 ± 200 ms) and further varied as a function of the participants’ response times: the probe array disappeared immediately after the response button was pressed and was followed by the presentation of the next trial. If the participant did not respond within 3000 ms, a “no response” was recorded and the next trial began. Both response accuracy and response times were collected. In order to avoid bias due to extreme responses, response times were filtered, retaining values within 2 standard deviations of mean response times for each participant. Finally, a practice session outside the magnetic resonance environment, prior to the start of the experiment, familiarized the participants with the specific task requirements and included the administration of 12 practice trials; no DS was presented during the practice trials.
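The response time filter described above can be sketched as follows. This is a minimal illustration that assumes the cut-offs are computed per participant from the sample standard deviation over all of that participant's responses; the text does not specify whether the sample or population standard deviation was used, nor whether filtering was done per condition. The function name is hypothetical.

```python
from statistics import mean, stdev


def trim_response_times(rts, n_sd=2.0):
    """Keep response times within n_sd standard deviations of the mean.

    `rts` holds the response times (in ms) of a single participant.
    The sample standard deviation is an assumption of this sketch.
    """
    if len(rts) < 2:
        return list(rts)
    centre = mean(rts)
    spread = stdev(rts)
    lower, upper = centre - n_sd * spread, centre + n_sd * spread
    return [rt for rt in rts if lower <= rt <= upper]
```

For example, in a series of responses near 1000 ms with a single 3000 ms response, only the 3000 ms value is dropped.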
Magnetic Resonance Imaging Acquisition
The experiments were carried out on a 3-T head-only scanner (Magnetom Allegra, Siemens Medical Solutions, Erlangen, Germany) operated with the standard transmit–receive quadrature head coil. Functional magnetic resonance imaging data were acquired using a T2*-weighted gradient echo echo-planar imaging (EPI) sequence with the following parameters: repetition time (TR) = 2040 ms, echo time (TE) = 30 ms, field of view (FOV) = 192 × 192 mm², 64 × 64 matrix, 34 axial slices with 3 mm thickness, and 25% interslice gap to cover most of the brain. The 3 initial volumes were discarded to avoid T1 saturation effects. Field maps were generated from a double-echo gradient-recalled sequence (TR = 517 ms, TE = 4.92 and 7.38 ms, FOV = 230 × 230 mm², 64 × 64 matrix, 34 transverse slices with 3 mm thickness and 25% gap, flip angle = 90°, bandwidth = 260 Hz/pixel) and used to correct echo-planar images for geometric distortion due to field inhomogeneities. A high-resolution T1-weighted magnetization-prepared rapid gradient echo image was acquired for anatomical reference (TR = 1960 ms, TE = 4.4 ms, time to inversion = 1100 ms, FOV = 230 × 173 mm², matrix size 256 × 192 × 176, voxel size 0.9 × 0.9 × 0.9 mm³). In each session, between 1253 and 1357 functional volumes were obtained. Head movement was minimized by restraining the subject’s head using a vacuum cushion. Stimuli were displayed on a screen positioned at the rear of the scanner, which the subject could comfortably see through a mirror mounted on the standard head coil.
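As a quick check of what the functional acquisition parameters imply, the following sketch derives the in-plane voxel size, the slab coverage, and the run duration. It assumes that the 25% interslice gap applies between adjacent slices only (one fewer gap than slices); the function and parameter names are illustrative.

```python
def acquisition_summary(fov_mm=192.0, matrix=64, n_slices=34,
                        slice_mm=3.0, gap_fraction=0.25,
                        tr_s=2.04, n_volumes=1253):
    """Derive voxel size (mm), slab coverage (mm), and run duration (min)."""
    in_plane_mm = fov_mm / matrix
    gap_mm = slice_mm * gap_fraction
    # Assumed: gaps only between neighbouring slices, hence n_slices - 1 gaps.
    coverage_mm = n_slices * slice_mm + (n_slices - 1) * gap_mm
    duration_min = n_volumes * tr_s / 60.0
    return in_plane_mm, coverage_mm, duration_min
```

With the reported parameters this gives 3 mm in-plane voxels, a slab of about 127 mm, and runs of roughly 43 to 46 minutes for 1253 to 1357 volumes.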
Functional Magnetic Resonance Imaging Analyses
Data were preprocessed and analyzed using SPM8 software (Wellcome Department of Imaging Neuroscience, http://www.fil.ion.ucl.ac.uk/spm/) implemented in MATLAB (Mathworks Inc., Sherborn, MA). EPI time series were corrected for motion and distortion with “Realign and Unwarp” (Andersson et al. 2001), using the generated field map together with the FieldMap toolbox (Hutton et al. 2002) provided in SPM8. A mean realigned functional image was then calculated by averaging all the realigned and unwarped functional scans, and the structural T1 image was coregistered to this mean functional image (rigid body transformation optimized to maximize the normalized mutual information between the 2 images). The mapping from subject to Montreal Neurological Institute space was estimated from the structural image with the “unified segmentation” approach (Ashburner and Friston 2005). The warping parameters were then separately applied to the functional and structural images to produce normalized images of resolution 2 × 2 × 2 mm³ and 1 × 1 × 1 mm³, respectively. The scans were screened for motion artifacts, and time series with motion exceeding 3 mm (translation) or 3° (rotation) were discarded; this resulted in the removal of the data of 2 participants not presented here. Finally, the warped functional images were spatially smoothed with a Gaussian kernel of 8 mm full-width at half maximum (FWHM).
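The motion screening criterion can be illustrated with a short sketch. It assumes that each volume is described by 6 realignment parameters (3 translations in mm followed by 3 rotations in radians, the layout commonly found in SPM realignment parameter files) and that the thresholds apply to the absolute displacement from the reference volume; the text does not state how displacement was measured, so both points are assumptions, and the function name is hypothetical.

```python
import math


def exceeds_motion_limits(params, max_trans_mm=3.0, max_rot_deg=3.0):
    """Return True if any volume exceeds the translation or rotation limit.

    `params` is a sequence of rows (tx, ty, tz, rx, ry, rz) with
    translations in mm and rotations in radians (assumed layout).
    """
    for tx, ty, tz, rx, ry, rz in params:
        if max(abs(tx), abs(ty), abs(tz)) > max_trans_mm:
            return True
        if max(abs(math.degrees(r)) for r in (rx, ry, rz)) > max_rot_deg:
            return True
    return False
```

A time series flagged by this check would be a candidate for exclusion under the stated 3 mm or 3° rule.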
For each subject, brain responses were estimated at each voxel using a general linear model with epoch and event-related regressors. Two models were designed. A first model assessed sustained activity over the whole STM trials as a function of STM load, and the epoch regressors ranged from the time of the onset of each trial until the participant’s response. This model also included event-related regressors assessing transient activity associated with the presentation of the DS as a function of the STM load of the trial within which the DS occurred. This model was used to explore overall effects of STM load on dorsal and ventral attention networks, independently of the STM phase. A second model used distinct epoch regressors for the early maintenance, late maintenance, and retrieval phases in order to assess the impact of STM load as a function of STM phase, in addition to the event-related regressors for the DS. Rather than starting after the presentation of the STM list, the early maintenance regressor ranged from the start of the presentation of the STM list until 3 s within the maintenance phase; this time frame was chosen given that previous studies have shown that activity in the IPS already starts during presentation of the STM list and is then maintained over the early maintenance period (e.g., Jha and McCarthy 2000; Pessoa et al. 2002; Majerus et al. 2010). The late maintenance regressor covered the remaining duration of the maintenance phase until the onset of the STM probe display. The retrieval regressor covered the period from the onset of the STM probe display until the response of the participant. The variable duration of the late maintenance regressor ensured minimal autocorrelation between the early maintenance and the retrieval regressors (Ollinger et al. 2001; Cairo et al. 2004; Majerus et al. 2006, 2007, 2010). 
Due to unavoidable multicollinearity between the late maintenance phase and the 2 other STM phases, the late maintenance regressor was orthogonalized relative to the other 2 regressors, attributing possible shared variance between the early maintenance phase and the late maintenance phase to the early maintenance regressor and possible shared variance between the late maintenance phase and the retrieval phase to the retrieval regressor (Andrade et al. 1999). For both models, the baseline condition was modeled implicitly, meaning that any activation reported in this study is activation controlled for baseline activation. Boxcar functions representing each regressor were convolved with the canonical hemodynamic response. The design matrix also included the realignment parameters to account for any residual movement-related effect. A high-pass filter was implemented using a cutoff period of 128 s in order to remove the low-frequency drifts from the time series. Serial autocorrelations were estimated with a restricted maximum likelihood algorithm with an autoregressive model of order 1 (+white noise).
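The construction of the phase regressors can be sketched in a few lines. This is an illustration under stated assumptions, not the SPM8 implementation: the hemodynamic response is approximated by a difference of two gamma densities (shapes 6 and 16, the second scaled by 1/6), regressors are sampled on a 0.1 s grid rather than at the TR, a single trial with a 6 s maintenance delay and a 1.2 s response time is used, and the late maintenance regressor is orthogonalized by projecting out the span of the other two regressors. All function names and the example timings are hypothetical.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale, zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(dt, length_s=32.0):
    """Approximate canonical response sampled every dt seconds, unit sum."""
    n = int(round(length_s / dt))
    h = [gamma_pdf(i * dt, 6.0) - gamma_pdf(i * dt, 16.0) / 6.0 for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def boxcar(epochs, n_samples, dt):
    """Boxcar equal to 1 during each (onset_s, duration_s) epoch."""
    x = [0.0] * n_samples
    for onset, duration in epochs:
        start = max(int(round(onset / dt)), 0)
        stop = min(int(round((onset + duration) / dt)), n_samples)
        for i in range(start, stop):
            x[i] = 1.0
    return x


def convolve(x, h):
    """Causal convolution of x with h, truncated to len(x)."""
    y = [0.0] * len(x)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            if i + j < len(y):
                y[i + j] += xi * hj
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthogonalize(target, others):
    """Remove from `target` its projection onto the span of `others`."""
    basis = []
    for v in others:  # Gram-Schmidt: build an orthonormal basis of the span
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    r = list(target)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


def phase_regressors(dt=0.1, total_s=40.0):
    """Early, late, and retrieval regressors for one illustrative trial."""
    n = int(round(total_s / dt))
    hrf = double_gamma_hrf(dt)
    early = convolve(boxcar([(0.0, 6.25)], n, dt), hrf)      # 3.25 s list + 3 s
    late = convolve(boxcar([(6.25, 3.0)], n, dt), hrf)       # rest of the delay
    retrieval = convolve(boxcar([(9.25, 1.2)], n, dt), hrf)  # probe to response
    late_orth = orthogonalize(late, [early, retrieval])
    return early, late, retrieval, late_orth
```

After orthogonalization, the late maintenance regressor is uncorrelated (in the inner-product sense) with the early maintenance and retrieval regressors, so any variance shared with them is credited to those two regressors, as described above.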
For the first model, 3 linear contrasts corresponding to the 3 STM load conditions and 3 linear contrasts corresponding to the DS occurring within each of the 3 STM load conditions were defined. The resulting set of voxel values constituted a map of t statistics (SPM[T]). These contrast images were then smoothed again (6-mm FWHM Gaussian kernel) in order to reduce remaining noise due to intersubject differences in anatomical variability in the individual contrast images. Smoothing by 8 mm (at the first level) and then by 6 mm leads to a single equivalent smoothing kernel of 10 mm (as 10² = 8² + 6²), a value commonly used for multiple-subject analyses. Given the linear nature of the general linear model used here, smoothing can be applied at any stage of processing. The use of a 2-step smoothing procedure was justified by the fact that we used low levels of smoothing for the estimation of the data at the single-subject level; these data were used for the calculation of individual beta parameter estimates reported in Figures 3 and 4. The additional smoothing by 6 mm then allowed us to attain the more common higher levels of smoothing for group-level analyses. The contrast images were then entered in second-level analyses, corresponding to analysis of variance (ANOVA) random effects models. A first ANOVA assessed the main effect of STM load on sustained activation over the 3 STM trials. A second ANOVA assessed the main effect of STM load on transient activation associated with the DS. As a rule, statistical inferences were performed at the voxel level at P < 0.05 corrected for multiple comparisons across the entire brain volume using random field theory (Worsley et al. 1996b). When regions of interest were not significant at this level, a small volume correction (Worsley et al. 1996a) was computed on a 10-mm radius sphere around the averaged coordinates published for the corresponding location of interest (see below).
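The equivalence between 2 successive Gaussian smoothing steps and a single kernel follows from the fact that the variances of convolved Gaussians add, so the FWHM values combine in quadrature. A minimal sketch (the function name is illustrative):

```python
import math


def combined_fwhm(*fwhms_mm):
    """Equivalent FWHM (mm) of successive Gaussian smoothing steps."""
    return math.sqrt(sum(f * f for f in fwhms_mm))
```

For the 8 mm and 6 mm kernels used here, `combined_fwhm(8, 6)` gives 10 mm.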
| Anatomical region | No. voxels | Left/right | x | y | z | BA | SPM (Z) value |
| Inferior frontal gyrus/Insula | 20 | L | −40 | 16 | 6 | 45 | 3.42* |
| Inferior frontal gyrus/Insula | 13 | R | 32 | 20 | 8 | 45 | 3.81* |
| Dorsal attention network |
| Ventral attention network |
Note: If not otherwise stated, all regions are significant at P < 0.05, corrected for whole-brain volume (at the voxel level).
*P < 0.05, small volume corrections (spherical volume with radius of 10 mm).
**P < 0.001, uncorrected.
For the second model, 1 linear contrast for each of the 9 cells resulting from the crossing of the 3 STM conditions and the 3 STM phases and 1 linear contrast for each of the 3 DS conditions were defined. These contrasts were not further smoothed and were used to extract individual STM load and STM phase-specific beta parameter estimates or peristimulus hemodynamic response functions for the regions of interest identified by the group-level analyses via the first model.
A Priori Locations of Interest
Regions of interest included the bilateral anterior IPS but also bilateral premotor, dorsolateral prefrontal, subcortical, and cerebellar regions consistently activated in verbal STM tasks. Regions of interest for the dorsal attention network were the IPS, the SFG, and the middle frontal gyrus (MFG). Regions of interest for the ventral attention network were the bilateral TPJ and the OFC.
Supplementary motor area (SMA) (0, 18, 54) (Majerus et al. 2010), MFG (−50, 26, 32; 46, 36, 22) (Cairo et al. 2004; Ravizza et al. 2004; Majerus et al. 2006, 2007, 2010); SFG (24, 10, 56) (Majerus et al. 2006, 2007); inferior frontal gyrus (40, 19, 13; −48, 19, 7) (Majerus et al. 2006, 2010); anterior IPS (−40, −36, 40; 42, −38, 44) (Majerus et al. 2010); caudate (−10, −4, 24; −12, 20, −8; −26, −31, 22; 24, −32, 12; −20, −42, 14; 8, 4, 22) (Cairo et al. 2004; Ravizza et al. 2004; Majerus et al. 2006, 2007).
Dorsal Attention Network
IPS (24, −56, 46; −25, −57, 46) (Corbetta and Shulman 2002; Serences et al. 2005; Chiu and Yantis 2009; Asplund et al. 2010); SFG (26, −2, 47; −22, −3, 49) (Serences et al. 2005); MFG (48, 7, 34) (Serences et al. 2005; Chiu and Yantis 2009).
Ventral Attention Network
Response accuracy was assessed via a 3 (STM load) by 2 (DS, no-DS) ANOVA, revealing a main effect of STM load, F2,42 = 18.64, MSE = 0.01, P < 0.001, η²p = 0.47, no effect of the DS, F1,21 < 1, MSE = 0.01, P = 0.84, η²p = 0.01, nor any interaction, F2,42 = 2.65, MSE = 0.01, P = 0.08, η²p = 0.11. Response accuracy was overall very high: 0.96 ± 0.06, 0.94 ± 0.08, and 0.91 ± 0.02 for DS trials and 2-load, 4-load, and 6-load conditions, respectively, and 0.96 ± 0.05, 0.97 ± 0.04, and 0.87 ± 0.10 for no-DS trials and 2-load, 4-load, and 6-load conditions, respectively. Response times were submitted to the same analyses, revealing a main effect of STM load, F2,42 = 79.52, MSE = 8066, P < 0.0001, η²p = 0.47, no effect of the DS, F1,21 < 1, MSE = 5329, P = 0.33, η²p = 0.05, but a significant interaction, F2,42 = 3.60, MSE = 2990, P < 0.05, η²p = 0.15. Bonferroni-corrected planned comparisons (P < 0.05) showed that when a DS occurred during the maintenance phase, response times were significantly slowed relative to no-DS trials, but only during STM trials with the lowest STM load (see Fig. 2). This pattern of results is in line with the prediction of a trade-off between task-related attention and stimulus-related attention, the DS capturing attention and interfering with task-related attention but only when the engagement of task-related attention is low.
We furthermore determined whether the effect of the DS was present during the entire task or whether there was a habituation effect; in the latter case, there should be no effect or an attenuated effect of the DS when comparing the first half and the second half of the trials. By running a 3 (STM load) by 2 (DS, no-DS) by 2 (first half and second half) ANOVA on the response times of the same data set, we observed no main effect of time, F1,21 < 1, MSE = 27 856, P = 0.52, η²p = 0.02, but again a main effect of STM load, F2,42 = 78.82, MSE = 16 608, P < 0.001, η²p = 0.79, as well as an interaction between the DS and STM load, F2,42 = 4.13, MSE = 5839, P < 0.05, η²p = 0.16; importantly, there was no additional interaction with the effect of time, F2,42 = 1.36, MSE = 6534, P = 0.27, η²p = 0.06. Planned comparisons further confirmed an effect of the DS on response times in the 2-load condition for response times in both the first and the second halves of the task (first half: 1218 ± 229 ms and 1157.17 ± 205 ms for DS and no-DS trials, respectively; second half: 1192 ± 205 ms vs. 1154 ± 204 ms for DS and no-DS trials, respectively).
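The effect sizes reported alongside each F statistic can be related to the F value and its degrees of freedom. The sketch below assumes that these effect sizes are partial eta squared, computed as (F × df_effect) / (F × df_effect + df_error); the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    explained = f_value * df_effect
    return explained / (explained + df_error)
```

For instance, `partial_eta_squared(18.64, 2, 42)` is approximately 0.47, which matches the value reported for the main effect of STM load on response accuracy.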
STM Activation Patterns
First, an ANOVA assessed the effect of STM load on sustained activation patterns over the entire STM trial. A main effect of STM load was observed in a widespread network containing, on the one hand, the left SFG and the bilateral IPS being part of the dorsal attention network and, on the other hand, the right OFC and the bilateral TPJ defining the ventral attention network (see Table 1). Additional regions included a more inferior portion of the OFC extending into the inferior anterior cingulate cortex, the superior anterior cingulate cortex, the precentral gyrus, anterior IPS, the bilateral insula, and the right superior cerebellum (area CrI). This first analysis shows that dorsal and ventral attention networks show load-dependent activity in the context of a verbal STM task. Next, we determined the directionality of load-dependent activity in these networks by directly contrasting the high- and low-load conditions (6-load and 2-load trials). As expected, a positive effect of STM load on BOLD signal (SPM-T: 6-load vs. 2-load trials) was observed in regions associated with the dorsal attention network (the left SFG and the bilateral IPS) and other regions typically activated in STM tasks (left precentral gyrus, anterior cingulate/SMA, bilateral inferior frontal cortex/insula, anterior IPS, caudate nucleus, and bilateral superior cerebellum) (see Fig. 3). A negative effect of STM load on BOLD signal (SPM-T: 2-load vs. 6-load trials) was observed in regions associated with the ventral attention network (bilateral OFC and bilateral TPJ) and in the posterior cingulate cortex (see Fig. 4). As predicted, regions associated with the dorsal attention network showed an increase of activation as a function of increasing STM load while regions associated with the ventral attention network showed a decrease of activation as a function of increasing STM load.
Next, the activation pattern as a function of both STM load and STM phase was explored by distinguishing between early maintenance, late maintenance, and retrieval stages (see Methods for details). For each load condition and STM phase, β parameter estimates were extracted at the level of each individual participant for the regions significantly activated in the dorsal and ventral attention networks shown by the previous analysis (Table 1). For IPS-related target regions in the dorsal network, we also added the more anterior left IPS (aIPS) activation observed in the present study. Figure 3 shows the β parameter estimates. For regions in the dorsal attention network, a significant effect of STM load was observed as expected for the early maintenance stage, with activation increasing proportionally as a function of STM load (SFG: F2,42 = 53.66, MSE = 0.02, P < 0.0001; aIPS: F2,42 = 100.29, MSE = 0.03, P < 0.0001; left IPS: F2,42 = 81.99, MSE = 0.07, P < 0.0001; right IPS: F2,42 = 47.80, MSE = 0.06, P < 0.0001). For the late maintenance period, and except for the left IPS, no load effect was observed (SFG: F2,42 = 1.18, MSE = 0.03, P = 0.32; aIPS: F2,42 = 1.49, MSE = 0.06, P = 0.24; left IPS: F2,42 = 5.24, MSE = 0.14, P < 0.01; right IPS: F2,42 < 1, MSE = 0.06, P = 0.56), in line with previous studies showing no or reduced load effects for late maintenance delays (Jha and McCarthy 2000). At the same time, as also observed by Jha and McCarthy (2000), activation remained above baseline activity during the late maintenance period (see Fig. 3). 
Finally, as predicted from the focus of attention account, a significant load effect was observed for the retrieval stage but with activation in the dorsal network being highest for the low-load condition where stimuli are most likely to still reside in the focus of attention (SFG: F2,42 = 19.93, MSE = 0.11, P < 0.0001; aIPS: F2,42 = 46.85, MSE = 0.13, P < 0.0001; left IPS: F2,42 = 40.74, MSE = 0.52, P < 0.0001; right IPS: F2,42 = 34.33, MSE = 0.15, P < 0.0001). This is also consistent with the lower activation in the left IPS for the high-load STM condition observed during the late maintenance stage. This pattern of STM phase-specific activation and load dependency was exactly the reverse in the ventral network: a significant load effect was observed during the early maintenance condition, but with activation decreasing proportionally to STM load (left OFC: F2,42 = 40.40, MSE = 0.02, P < 0.0001; right OFC: F2,42 = 32.47, MSE = 0.02, P < 0.0001; left TPJ: F2,42 = 26.50, MSE = 0.08, P < 0.0001; right TPJ: F2,42 = 25.73, MSE = 0.05, P < 0.0001). No load effect was observed during the late maintenance period (left OFC: F2,42 < 1, MSE = 0.04, P = 0.64; right OFC: F2,42 = 2.39, MSE = 0.03, P = 0.10; left TPJ: F2,42 < 1, MSE = 0.07, P = 0.92; right TPJ: F2,42 < 1, MSE = 0.06, P = 0.87), with activation being generally below baseline in all conditions. During retrieval, the deactivations in the regions of the ventral network were most pronounced for the low-load condition (left OFC: F2,42 = 2.88, MSE = 0.10, P = 0.07; right OFC: F2,42 = 3.45, MSE = 0.08, P < 0.05; left TPJ: F2,42 = 13.52, MSE = 0.13, P < 0.0001; right TPJ: F2,42 = 13.15, MSE = 0.11, P < 0.0001). 
These results show that STM load and STM phase affect the dorsal and ventral attention networks in opposing directions: whenever activation in the dorsal network increases, regions in the ventral network are deactivated. These dynamics between dorsal and ventral attention networks are STM load dependent for the early maintenance and retrieval stages, and the direction of the load effect reverses between the encoding and retrieval conditions, as predicted by the focus of attention account of task-related attention.
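The load effects reported above as F2,42 are consistent with a one-way repeated-measures ANOVA over 3 load levels in 22 participants, since (3 − 1) × (22 − 1) = 42. The following Python sketch illustrates that computation on synthetic β values. The function name `rm_anova_oneway`, the simulated numbers, and the random seed are illustrative assumptions; they do not reproduce the data or the software used in the study.

```python
import random


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: list of per-participant lists, one value per condition.
    Returns (F, df_effect, df_error, MSE).
    """
    n = len(data)        # number of participants
    k = len(data[0])     # number of within-participant conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition the total sum of squares into condition, participant, and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    mse = ss_error / df_error
    f_value = (ss_cond / df_effect) / mse
    return f_value, df_effect, df_error, mse


# Synthetic example (NOT the study's data): 22 participants x 3 load levels.
rng = random.Random(0)
synthetic_betas = []
for _ in range(22):
    offset = rng.gauss(0.0, 0.3)  # hypothetical participant-specific baseline
    synthetic_betas.append(
        [offset + 0.2 * level + rng.gauss(0.0, 0.15) for level in range(3)]
    )

f_value, df_effect, df_error, mse = rm_anova_oneway(synthetic_betas)
print(f"F{df_effect},{df_error} = {f_value:.2f}, MSE = {mse:.3f}")
```

Removing the participant sum of squares from the error term is what distinguishes this from a between-participant ANOVA and is why the error degrees of freedom are 42 rather than 63.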
DS Activation Patterns
Next, we determined the impact of the DS on BOLD response and its interaction with STM load. As expected, the main effect of the DS resulted in activation of the ventral attention network including the bilateral TPJ and the left OFC (see Table 2). The critical analysis assessed the impact of STM load on DS-elicited activation by contrasting DS activation across the STM load conditions in which the DS occurred. A first contrast assessed positive effects of STM load on DS activation (i.e., higher activation for the DS in higher STM load trials) and yielded no significant results. The second contrast assessed negative effects of STM load on DS activation and showed activation specifically in the ventral attention network (bilateral TPJ, IFC, and middle temporal gyri) (see Fig. 5). As shown in Figure 5, transient activation in the ventral attention network for the DS was inversely related to STM load. This is further documented by the analysis of the time course of activation for target regions in the OFC and the bilateral TPJ; the time course of activation was obtained by extracting the peristimulus hemodynamic response function for each participant at each target region.
As shown in Figure 5, DS-related activation in the bilateral TPJ was highest for trials with the lowest STM load, while deactivation was still observed for high-load STM trials. This load effect was observed at 2 s (F1,21 = 14.81, MSE = 0.05, P < 0.001), 4 s (F1,21 = 50.70, MSE = 0.06, P < 0.001), and 6 s (F1,21 = 35.68, MSE = 0.03, P < 0.001) after the DS for the left TPJ; at 0 s (F1,21 = 5.42, MSE = 0.03, P < 0.05), 2 s (F1,21 = 16.61, MSE = 0.05, P < 0.001), 4 s (F1,21 = 82.17, MSE = 0.03, P < 0.001), and 6 s (F1,21 = 27.91, MSE = 0.07, P < 0.001) after the DS for the right TPJ; and at 2 s (F1,21 = 11.16, MSE = 0.06, P < 0.01), 4 s (F1,21 = 82.51, MSE = 0.02, P < 0.001), 6 s (F1,21 = 31.66, MSE = 0.05, P < 0.001), and 8 s (F1,21 = 41.29, MSE = 0.01, P < 0.001) after the DS for the left OFC.
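The F1,21 values above are consistent with a two-level within-participant comparison in 22 participants, in which case F equals the square of the paired t statistic. The sketch below runs such a comparison at each peristimulus time point. It assumes that the two levels are the lowest and the highest STM load, which the text does not state explicitly; the function name `paired_f` and all numbers are synthetic placeholders rather than values from the study.

```python
import math
import random


def paired_f(cond_a, cond_b):
    """F(1, n-1) for a two-level within-participant factor, computed as t**2."""
    n = len(cond_a)
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("differences have zero variance; F is undefined")
    t_value = mean_d / math.sqrt(var_d / n)
    return t_value ** 2, 1, n - 1


# Synthetic peristimulus responses (NOT the study's data), 22 participants.
rng = random.Random(1)
time_points_s = [0, 2, 4, 6, 8]  # seconds after DS onset
results = {}
for t_s in time_points_s:
    low_load = [rng.gauss(0.2, 0.2) for _ in range(22)]    # hypothetical values
    high_load = [rng.gauss(-0.1, 0.2) for _ in range(22)]  # hypothetical values
    results[t_s] = paired_f(low_load, high_load)

for t_s, (f_value, df1, df2) in results.items():
    print(f"{t_s} s after DS: F{df1},{df2} = {f_value:.2f}")
```

Testing each time point separately, as in this sketch, shows when the load difference emerges after the DS but does not by itself correct for the number of time points tested.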
Table 1. Columns: anatomical region, no. voxels, left/right, x, y, z, BA, SPM (Z) value; rows grouped into dorsal attention network and ventral attention network (table body not available).
Table 2. Columns: anatomical region, no. voxels, left/right, x, y, z, BA, SPM (Z) value; rows grouped into dorsal attention network and ventral attention network (table body not available).
Note: *P < 0.05, small volume corrections (spherical volume with radius of 10 mm).
Discussion
The aim of this study was to show that the dynamics between dorsal and ventral attention networks shown to determine neural substrates of visual STM are also an important determinant of neural correlates of verbal STM, in contrast to traditional models of STM considering a strict separation between verbal and visual STM systems (Baddeley 2000). We observed STM load-dependent intervention of 2 antagonistic attention networks during a standard verbal STM probe recognition task. A dorsal, task-related attention network including the bilateral IPS and the superior frontal cortex showed load-dependent increase of activation during early maintenance and load-dependent decrease of activation during retrieval. On the other hand, the ventral, stimulus-related attention network encompassing the bilateral TPJ and the ventral frontal cortex showed load-dependent increase of deactivation during early maintenance and load-dependent decrease of deactivation during retrieval. Importantly, further direct evidence for the interaction between STM load and the ventral network was obtained from the pattern of activation elicited by the DS occurring during the maintenance phase of 50% of the STM trials: only when STM load was low did TPJ activity increase in response to the presentation of the DS. This was mirrored by the participants’ response time analyses, which showed that response times for STM recognition were slowed when a DS had occurred during the maintenance interval, but only for the low-load STM trials. In sum, during encoding and maintenance for high-load STM conditions, stimulus-related attention was suppressed, while the engagement of task-related attention was maximized, precluding potential interference by distracting, task-irrelevant stimuli.
The present data provide strong evidence for the intervention of attention networks in verbal STM tasks. Previous studies have suggested that attentional processes partly define the neural underpinnings of verbal STM tasks, especially with respect to the activation of the anterior and posterior IPS (e.g., Ravizza et al. 2004; Majerus et al. 2006, 2007, 2010; Cowan 2011). This suggestion was based on the observation that the IPS is sensitive to STM load across a wide range of STM tasks, involving the maintenance of verbal, visual, spatial, or auditory nonverbal information, and that IPS activation during STM tasks overlaps with IPS activation during spatial attention tasks (e.g., LaBar et al. 1999; Linden et al. 2003; Rämä et al. 2004; Majerus et al. 2010; Cowan et al. 2011). The current study highlights more directly the intervention of attention mechanisms in STM by demonstrating opposing activation dynamics of the dorsal attention network and the ventral attention network during a verbal STM task. In this sense, our results directly mirror results previously obtained by Todd and colleagues in the visual STM domain: in a first study, they showed STM load-dependent activation in the IPS (Todd and Marois 2004), while in a second study, they showed STM load-dependent deactivation of the TPJ (Todd et al. 2005). Together, these results and those obtained in the present study not only provide direct evidence for the intervention of attention networks in STM but also suggest that these attention networks may be domain general and support both verbal and visual STM.
One may argue that the dorsal, task-related network, in addition to reflecting task-related attention, could also be involved in other controlled processes such as rehearsal, especially given the verbal nature of the present study and the opportunity for rehearsal during the maintenance delay. However, the region which has been most strongly associated with rehearsal processes, the pars opercularis of the inferior frontal gyrus (BA44), did not show load-dependent activity in the present study (Paulesu et al. 1993; Salmon et al. 1996). Furthermore, the fact that the dorsal network activated in the present study overlaps with the dorsal network observed in previous studies during visual STM with no opportunity for verbal rehearsal further suggests that rehearsal strategies are not likely to explain the load-dependent activation of the dorsal network (Todd and Marois 2004). At the same time, we cannot exclude the possibility that the load-dependent activation observed in the dorsal network may involve attentional refreshing processes of the information to be maintained, since task-related attention may precisely be achieved via attentional refreshing of information in high-load conditions where the amount of information to be maintained exceeds focus of attention capacity (Awh et al. 1999).
The fact that IPS activation in the dorsal network responds to both verbal and visual STM load and shows the same load-dependent interactions with the ventral network during verbal and visual STM tasks further supports the assumption that the IPS region supports an amodal control function while going against the assumption of a modality-specific buffer function (Todd and Marois 2004; Todd et al. 2005; Majerus et al. 2010; Cowan et al. 2011). Following Cowan (1999) and Cowan et al. (2011), the IPS region may support an attentional pointer function, pointing to representations temporarily activated in modality-specific knowledge bases. Majerus et al. (2006, 2007) indeed showed anterior IPS activation in both verbal and visual STM tasks, but the peak voxels in the IPS were connected to distinct neural substrates: in the verbal STM task, the IPS was connected to superior and middle temporal areas subtending phonological and orthographic language representations, while in the visual STM task, using face stimuli, the IPS showed common activation with right fusiform and medial temporal areas specialized in the representation of face stimuli. This is consistent with an attentional pointer function of the IPS, the IPS connecting with distinct representational substrates, as a function of the type of information to be pointed to.
Hence, despite involving the same dorsal and ventral attention networks, the neural networks of verbal and visual STM can nevertheless be differentiated via the type of modality-specific representational substrates that are recruited to process and represent the items to be maintained. This partial but not complete overlap of verbal and visual STM networks can also explain the well-documented dissociations between verbal and visual STM impairment in brain-damaged patients (e.g., Basso et al. 1982; Vallar et al. 1990). In the case of patients with specific verbal STM impairment, the lesions most typically involve the left posterior temporoparietal area close to language processing areas; most of these patients presented language impairment immediately after brain injury. Hence, at least in patients with specific verbal STM deficits, these deficits appear to be often associated with residual language impairment (for a review, see Majerus 2009). The STM deficit in at least some of these patients is then likely to derive from difficulties to maintain language representations active due to suboptimal activation of the language system. Alternatively, the lesion in the temporoparietal area may alter the connectivity between the language system and the IPS, preventing the IPS from exerting its attentional pointer function toward verbal representations, but not toward visual representations. On the other hand, in case of bilateral damage in the IPS regions, we would predict equal impairment in verbal and visual STM tasks.
With respect to the ventral attention network, although the STM load-dependent deactivation and DS-dependent activation peaks we observed in the present study were situated in areas identical to those observed by Todd et al. (2005), we should, however, note that the activation extent was larger in the left TPJ relative to the right TPJ in the present study. In the visual STM experiment by Todd et al. (2005), the most robust STM load-dependent deactivation was observed in the right TPJ, and deactivation in the left TPJ was observed only when lowering the statistical thresholds. Although initial studies considered that the ventral, stimulus-related attention system was lateralized to the right (e.g., Corbetta and Shulman 2002), later studies have shown a bilateral implication of the TPJ in stimulus-related attention tasks (Asplund et al. 2010), as also observed in the current study. It remains to be determined whether the stronger level of deactivation of the left TPJ in the present study (relative to the study of Todd et al. 2005) is due to the fact that verbal information had to be retained while visual information had to be retained in the study of Todd et al. (2005).
Importantly, this study directly demonstrates the functional relevance of TPJ deactivation as a result of increasing STM load. Previous studies observed deactivation in the TPJ as a function of increasing STM load (Todd et al. 2005; Matsuyoshi et al. 2010) but did not explore whether this deactivation had any direct impact on the efficiency of stimulus-related attention. In separate behavioral experiments, these studies showed that the detection of task-irrelevant unexpected stimuli was reduced when occurring in a high STM condition; however, no direct link between this reduction in stimulus-related attention performance and the level of deactivation in the TPJ could be obtained. The present study bridges the gap between these neuroimaging and behavioral findings by demonstrating that STM load-dependent deactivation in the bilateral TPJ during the encoding and early maintenance stage reduces the reactivity of the TPJ to an unexpected stimulus occurring during the maintenance phase. The strongest DS-related activation increase in the TPJ was indeed observed in the low-load STM trials, and this was accompanied by a slowing of reaction times for later STM decisions at retrieval for those low-load STM trials where a DS had occurred, indicating that the DS had been detected and was interfering with task-related performance. Thus, the present data show that the TPJ is deactivated as a function of increasing STM load, and this STM-induced deactivation alters the reaction of the stimulus-related attention network to unexpected, task-irrelevant stimuli, preventing interference from task-irrelevant stimuli especially in high-load STM conditions.
A further novel finding of the present study is that the activation dynamics of the dorsal and ventral attention networks appear to be STM phase dependent. The STM load-dependent increase of activation in IPS and SFG regions and the STM load-dependent increase of deactivation in the TPJ areas during encoding and early maintenance as well as the absence of STM load effects for later maintenance stages are in line with previous studies (Jha and McCarthy 2000; Todd and Marois 2004; Majerus et al. 2010). Interestingly, a reverse load effect was observed during retrieval stages. The few studies that have focused on retrieval-related activation during verbal STM tasks observed that activation in IPS areas was not associated with retrieval activity as such but rather was limited to retrieval of the most recent items, that is, those that are most likely to reside in the focus of task-related attention, assuming that the capacity of this system is limited to around 4 items (Öztekin et al. 2010; Cowan 2011; Nee and Jonides 2011). In the context of the present study, this means that our high-load STM condition (6 letters) exceeded this capacity. Hence, although the task-related attention network was strongly recruited during encoding and early maintenance to maintain the 6 items, potentially via attentional refreshing as mentioned earlier, the items eventually fell out of the focus of attention, leading to a decrease of activation in the dorsal network over later maintenance stages. At retrieval, these items, being out of the focus of attention, no longer generate activation in the dorsal attention network. According to Nee and Jonides (2011), these items will be retrieved using controlled long-term memory retrieval mechanisms, as they observed left prefrontal cortex activation (BA45; pars triangularis) for retrieval of items being out of the focus of attention (see also Nee and Jonides 2008).
In accordance, in the present study, a follow-up analysis also showed activation of this prefrontal area during the high-load STM condition (x = −44, y = 16, z = 14, Z = 6.20, voxel extent = 316 voxels). On the other hand, for low-load STM trials, initial activation of the dorsal network will be low and slow since items can be easily entered into the focus of attention without any effort or attentional refreshing needs, but activation will increase progressively in order to maintain the items in the focus of attention and will be maximal at retrieval from the focus of attention, especially given that the DS in the low-load STM condition might also enter the focus of attention and will eventually compete with the STM probe stimuli for response selection. This is in line with the progressive increase of activation in the dorsal network over later maintenance and retrieval stages for the low-load STM trials, with the increase of deactivation of the ventral network over the same stages for the low-load STM trials and finally with the increased reaction times for the low-load, DS-positive STM trials observed in the present study. Interestingly, Oberauer and Kliegl (2006) proposed a theory similar to Cowan’s focus of attention account but consider that the focus of attention (the amount of information we can attend to at one time) is actually limited to 1 item. The present results are not inconsistent with this even more restricted view of capacity limitations in the dorsal network, since at retrieval, activity in the dorsal attention network was indeed highest in the 2-load STM condition and not in the 4-load STM condition. If the focus of attention is limited to 4 ± 1 items, as predicted by Cowan (1999), and if activation in the dorsal network during retrieval reflects readout processes from the focus of attention, then we should have expected highest activation in the dorsal network during retrieval of both 2-load and 4-load STM trials (see also Cowan 2011). 
We should, however, note that the present results have been obtained using a probe recognition design, and it remains to be shown whether the same load-dependent effects as observed here equally apply to retrieval-related cerebral activation during recognition as well as recall procedures. In that respect, Chein et al. (2011) used a pointing response recall procedure during complex working memory span tasks, and they observed findings consistent with those reported here: identical activation in parietal and prefrontal cortices was found during verbal and spatial task conditions of the span task, further indicating that these regions subserve a domain-general, task-related attention function, irrespective of recall versus recognition procedures.
Finally, the present results, as well as those obtained by Todd et al. (2005), may appear to contradict another influential theory on interactions between STM/working memory and attention, the load theory of attention (Lavie et al. 2004; Lavie 2005). In controlled attention conditions like in the STM task of the present study, Lavie and colleagues propose the existence of a reversed load effect, the sensitivity to irrelevant distractors (i.e., the DS in the present study) being greatest in high-load rather than low-load STM conditions. Their proposal is based on the observation that during a selective attention task, reaction to irrelevant distractor visual stimuli is enhanced in superior parietal, associative, and primary visual cortical areas when the attention task has to be performed concurrently with a high-load rather than a low-load STM task (see also Kelley and Lavie 2011). We observed exactly the opposite in the present study. There is, however, a major difference between the paradigms used in the present study and those used by Lavie et al. (2004). In the studies by Lavie et al. (2004), attention was explicitly directed toward both the STM task and the incoming stimuli for the selective attention task, which was presented during the maintenance phase of the STM task. Hence, task-related attention had to be divided between internal representations held within the focus of attention for the STM task and incoming stimuli for the selective attention task. It follows that in this situation, task-related attention underlies processing of both types of stimuli, leading to less efficient control of task-related attention for the spatial attention task in high-load STM conditions, making the task-related attention network more sensitive to the appearance of DSs; in the studies by Lavie et al. (2004), the DS indeed had an effect on IPS regions of the dorsal attention network and not on TPJ regions of the ventral attention network as was the case in the present study. In the present study, given that the stimuli during the maintenance delay were outside task-related attention, their detection was tied to the intervention of the stimulus-related system, which was less deactivated during the low-load STM condition and hence more likely to respond to unexpected, novel stimuli. In sum, in the studies by Lavie et al. (2004), the DSs were part of the stimuli within task-related attention and hence were determined by task-related attention control processes, more likely to be challenged in high-load conditions. In the present study, as well as the study by Todd et al. (2005), the DSs were fully outside task-related attention and their detection and interference with ongoing task performance was determined by the intervention of the stimulus-related attention system, more likely to intervene in low-load conditions with reduced task-related attention control.
Our results can also be related to a further element of the load theory of attention, the perceptual load theory (Lavie 2005). At early-stage perceptual processing, Lavie predicts increased sensitivity to irrelevant distractors in low-load perceptual conditions since attentional resources are not entirely consumed and can be captured by task-irrelevant stimuli. Strictly speaking, there was no perceptual task during the maintenance delay in our task, since the stimuli of the STM trial were not physically present anymore and had to be maintained via internal mental representations (late-stage processing). Early perceptual processes were nevertheless involved in the nonintentional detection of the only stimulus potentially occurring during the maintenance delay, the DS. The involvement of earlier perceptual processes is also supported by the fact that, as already noted, it was the TPJ, considered to be involved in the detection of nontarget, unannounced stimuli, that reacted to the DS, and not the IPS, which is involved in controlled task-related attention. The important element here is that these stimulus detection processes nevertheless interact with controlled attention processes, since there appears to be a top-down task-related deactivation of these processes when the involvement of task-related attention is high. More generally, this means that in a single task condition, the effect of STM load on distractor detection appears to follow the predictions of the perceptual load theory. In this sense, the perceptual load theory could be considered to relate to stimulus-related attention and its interaction with task-related attention, while the controlled attention, dual-task experiments described by Lavie et al. (2004) mainly involve the application of task-related attention on 2 tasks simultaneously.
To conclude, the present results strongly support recent models of STM, which consider that controlled, task-related attention processes not only determine executive processing such as updating and manipulation of information to be maintained but also that they are already involved in basic maintenance processes (Barrouillet et al. 2004; Cowan et al. 2005; Majerus 2009; Majerus et al. 2010; Cowan 2011; Nee and Jonides 2011). Importantly, the fact that the antagonistic dynamics of task-related, dorsal attention and stimulus-related, ventral attention networks observed here resemble very closely those observed previously for visual STM suggests that these attention processes are domain general, as assumed by these theoretical accounts. The inversion of load-dependent activation dynamics in these networks during retrieval stages of STM further supports a limited capacity account of attention involvement in STM as instantiated by the focus of attention account, which implies that at retrieval only those items still in the focus of attention, that is, items from low-load STM conditions, recruit the dorsal attention network. Although this account has been highly influential in the domain of visual STM (Cowan 2011), few studies have investigated the neural substrates of verbal STM as reflecting the intervention of attention networks. The present demonstration of a direct and antagonistic intervention of dorsal and ventral attention networks during verbal STM clearly shows the need for STM researchers to consider the attentional foundations of verbal STM networks.
Funding
Fund for Scientific Research FNRS, Belgium (grant 1.5.056.10); Belgian Science Policy department (IUAP-Phase IV research grant P6/29); Ministry for Higher education and scientific research of the French-speaking Community, Belgium (Concerted Research Action ARC grant 06/11-340).
Conflict of Interest: None declared.