Enhanced alpha power compared with a baseline can reflect states of increased cognitive load, for example, when listening to speech in noise. Can knowledge about “when” to listen (temporal expectations) potentially counteract cognitive load and concomitantly reduce alpha? The current magnetoencephalography (MEG) experiment induced cognitive load using an auditory delayed-matching-to-sample task with 2 syllables S1 and S2 presented in speech-shaped noise. Temporal expectation about the occurrence of S1 was manipulated in 3 different cue conditions: “Neutral” (uninformative about foreperiod), “early-cued” (short foreperiod), and “late-cued” (long foreperiod). Alpha power throughout the trial was highest when the cue was uninformative about the onset time of S1 (neutral) and lowest for the late-cued condition. This alpha-reducing effect of late compared with neutral cues was most evident during memory retention in noise and originated primarily in the right insula. Moreover, individual alpha effects during retention accounted best for observed individual performance differences between late-cued and neutral conditions, indicating a tradeoff between allocation of neural resources and the benefits drawn from temporal cues. Overall, the results indicate that temporal expectations can facilitate the encoding of speech in noise, and concomitantly reduce neural markers of cognitive load.
Oscillatory alpha power (8–13 Hz) recorded with magneto- or electroencephalography (M/EEG) is studied extensively in the fields of attention and working memory. In the current study, we were particularly interested in the cognitive load associated with the retention of to-be-remembered items in working memory. In the visual, somatosensory, and auditory domains, increases in alpha power have been associated with performance of working-memory tasks (Jensen et al. 2002; Leiberg et al. 2006; Haegens et al. 2010). Moreover, alpha power parametrically increases with the number of to-be-remembered items (Jensen et al. 2002; Leiberg et al. 2006; Obleser et al. 2012), and this provides further evidence that alpha indexes cognitive load associated with item retention. In particular, within the framework of the “functional inhibition” hypothesis, it has been argued that higher alpha power during item retention in working memory reflects the inhibition of task-irrelevant information (for review see Klimesch 2012) and/or brain regions (for review see Jensen and Mazaheri 2010).
Compatible with the “functional inhibition” framework, a decrease of alpha power can be related to active stimulus processing (e.g., Hanslmayr et al. 2012) and to increased excitability in sensory cortices (e.g., Jensen et al. 2012; Lange et al. 2013). Moreover, controlled inhibition (as reflected by alpha power increases) and active processing (as reflected by alpha power decreases) are likely to play a role in improving the signal-to-noise ratio (SNR) of the relevant information stored in memory (for review see Weisz et al. 2011; Klimesch 2012).
Reasoning from the “functional inhibition” hypothesis, we chose to examine working-memory performance for speech items embedded in noise, where the noise creates cognitive load that raises the need to increase functional inhibition. Degraded speech is hypothesized to increase memory load mainly due to the additional resources and time needed to encode and subsequently process the speech signal (Pichora-Fuller and Singh 2006). In line with this claim, a previous study observed increased alpha power during the retention of degraded speech items in working memory (Obleser et al. 2012). In particular, Obleser et al. noted parametric increases in alpha power both with the number of to-be-remembered items and with the decline in acoustic signal quality, suggesting that cognitive load is increased by task-detrimental acoustic factors as well.
The primary goal of the present study was to explore the potential of temporal cueing (Nobre 2001; Coull and Nobre 2008; Jaramillo and Zador 2011) to improve working-memory performance and concomitantly reduce alpha power. Temporal expectations have been shown to enhance the precision of stimulus encoding (Rohenkohl et al. 2012) as well as to improve behavioral performance (Coull and Nobre 1998). Thus, we hypothesized that temporally cueing participants to the time of occurrence of a to-be-remembered speech item would improve working-memory performance (for a review see Gazzaley and Nobre 2012). Critically, we expected that alpha power would be reduced when to-be-remembered items were temporally cued, reflecting the potentially reduced demand for functional inhibition.
We devised an MEG experiment using an auditory delayed-matching-to-sample task on speech in noise: Retaining a syllable in memory for 2 s introduced memory load. A priori, we provided listeners with potentially facilitating visual cues that contained probabilistic information about the duration of the foreperiod preceding the syllable pair (Nobre 2001; Coull and Nobre 2008; Kaiser et al. 2009; Jaramillo and Zador 2011). The experiment addressed 3 specific questions: First, do temporal expectations reduce cognitive load imposed by the retention of a target stimulus presented in noise, as reflected in a relative alpha-power decrease? Second, what are the underlying neural sources of alpha-power modulations due to temporal cueing? Third, if individuals differ in their ability to behaviorally profit from temporal expectations, is individual alpha power during memory retention predictive of individuals' behavioral performance?
Materials and Methods
Twenty healthy right-handed participants took part in this study. Data of 2 participants were discarded from further analyses because more than 50% of their trials were rejected due to artifacts. This led to the inclusion of data for 18 participants (9 females) ranging in age from 21 to 35. All participants had self-reported normal hearing. Participants were fully debriefed about the nature and goals of this study, and received financial compensation of €7 per hour for their participation. The study was approved by the local ethics committee (University of Leipzig), and written informed consent was obtained from all participants prior to testing.
Experimental Task and Stimuli
The time course of an example trial is depicted in Figure 1A. Each trial began with the simultaneous onset of speech-shaped noise and a fixation cross. The noise lasted throughout the entire trial. A visual cue was presented ∼1 s after noise onset (jittered between 750 and 1250 ms). Cues were presented for 1500 ms, and indicated the approximate onset time of S1. The onset time of S1 was measured from the offset time of the cue. Participants had to retain S1 in memory during a 2-s retention period following S1 offset. Then, a second syllable, S2, was presented, and participants judged whether S2 had the same or different initial consonant as S1. Approximately 1 s (jittered between 900 and 1100 ms) after the presentation of S2, participants were prompted to give a response via button press. Finally, participants indicated their confidence in their “same”/“different” response on a 3-level confidence scale (“not at all confident”: 1, “somewhat confident”: 2, “very confident”: 3). Trials were separated by an inter-trial interval of ∼1 s that was free of stimulation or responses.
Three types of cues were presented: “early,” “late,” and “neutral.” Early and late cues were specific, meaning that cues provided meaningful information about when S1 would occur following cue offset. S1-onset times for early and late cues were randomly drawn from Gaussian distributions (early: µ = 850 ms, σ = 85 ms; late: µ = 1300 ms, σ = 130 ms). On the other hand, neutral cues were unspecific, and S1-onset times were randomly drawn from a uniform distribution ranging between 700 and 1500 ms (see Fig. 1B).
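The foreperiod sampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the original stimulus code: the distribution parameters come from the text, whereas the function name `draw_s1_onset`, the fixed seed, and the absence of any truncation of the Gaussian draws are assumptions.

```python
import random

# Distribution parameters taken from the text (milliseconds after cue offset).
CUE_PARAMS = {
    "early": {"mu": 850.0, "sigma": 85.0},
    "late": {"mu": 1300.0, "sigma": 130.0},
}
NEUTRAL_RANGE = (700.0, 1500.0)


def draw_s1_onset(cue, rng):
    """Return one S1 onset time (ms after cue offset) for the given cue type.

    Hypothetical helper: the text does not say whether Gaussian draws were
    truncated, so no truncation is applied here.
    """
    if cue == "neutral":
        low, high = NEUTRAL_RANGE
        return rng.uniform(low, high)
    params = CUE_PARAMS[cue]
    return rng.gauss(params["mu"], params["sigma"])


rng = random.Random(0)  # fixed seed only so that the sketch is reproducible
onsets = {
    cue: [draw_s1_onset(cue, rng) for _ in range(2000)]
    for cue in ("early", "late", "neutral")
}
means = {cue: sum(values) / len(values) for cue, values in onsets.items()}
```

Note that the neutral range spans both cued means, so a neutral cue leaves the listener uncertain over the full interval in which S1 can occur.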
S1 and S2 stimuli consisted of 4 different syllables: “da,” “de,” “ga,” and “ge.” Syllables were edited from full words beginning with the respective syllable. Two different words and 2 recordings per word were used to create a pool of 4 naturally varying tokens for each syllable (e.g., Obleser et al. 2003). Speech stimuli were recorded by a trained female speaker of German in a sound-proof chamber. Recordings were digitized at 44 100 Hz. All syllables were edited to be of 200-ms final length, including 3-ms onset and 30-ms offset ramps. Sound files were peak normalized to equal decibel full-scale amplitude.
Speech-shaped noise was generated by filtering white noise to approximate the long-term average spectrum of speech (e.g., Peters et al. 1998). Power per frequency band from a concatenated set of 60 German nouns (female speaker) served as input for the filter, which was subsequently applied to white noise. This resulted in noise with an approximately speech-shaped spectral envelope and an approximately flat amplitude envelope (i.e., a nonfluctuating noise masker).
Prior to the MEG measurement, participants completed 3 blocks of an adaptive tracking procedure in order to estimate individual SNRs yielding 70.7% correct responses (i.e., two-down-one-up; Levitt 1971). Participants performed the same task as they did in the experiment proper, with the exception that no cues to the timing of S1 onset were provided. The intensity of the noise was kept constant at 50-dB sensation level, and the relative intensity of the syllables was adjusted in 1-dB steps. Each block terminated after 12 reversals. Thresholds were taken as the arithmetic average of the final 8 reversals in each block, and additionally averaged across blocks.
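The two-down-one-up rule can be illustrated with a short simulation. The sketch below is an assumption-laden illustration rather than the procedure actually run: the step size, the number of reversals, and the averaging of the final 8 reversals follow the text, while the starting SNR, the simulated listener (`make_listener`, a logistic psychometric function with a 50% guessing floor), and the `max_trials` guard are hypothetical.

```python
import math
import random


def two_down_one_up(respond, start_snr_db=0.0, step_db=1.0,
                    n_reversals=12, n_average=8, max_trials=10000):
    """Run one staircase block; `respond(snr_db)` returns True if correct."""
    snr = start_snr_db
    correct_in_a_row = 0
    last_direction = 0  # -1: last step lowered the SNR, +1: last step raised it
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(snr):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue  # two correct responses in a row are needed to step down
            direction = -1
        else:
            direction = +1  # a single error makes the task easier again
        correct_in_a_row = 0
        if last_direction != 0 and direction != last_direction:
            reversals.append(snr)  # SNR at which the track changed direction
        last_direction = direction
        snr += direction * step_db
    if not reversals:
        raise ValueError("no reversals recorded")
    tail = reversals[-n_average:]
    return sum(tail) / len(tail)


def make_listener(midpoint_db=-10.0, slope_db=2.0, seed=0):
    """Hypothetical listener: logistic psychometric function from 50% to 100%."""
    rng = random.Random(seed)

    def respond(snr_db):
        p = 0.5 + 0.5 / (1.0 + math.exp(-(snr_db - midpoint_db) / slope_db))
        return rng.random() < p

    return respond


# Three blocks, averaged across blocks as described in the text.
block_thresholds = [two_down_one_up(make_listener(seed=s)) for s in range(3)]
individual_snr = sum(block_thresholds) / len(block_thresholds)
```

The rule converges on the SNR at which the probability of two consecutive correct responses equals 0.5, that is, a single-trial accuracy of √0.5 ≈ 0.707, which is where the 70.7% target comes from.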
Next, brain activity was recorded with MEG during the performance of 360 trials completed in 18 blocks of 20 trials each. Cue type (early, late, and neutral) was constant within a block, and participants were informed at the start of each block about the type of temporal cue they would receive on each trial. The order of trials within a block and order of blocks were randomized for each participant. Button assignments were counterbalanced across participants, such that half of the participants indicated that S1 and S2 started with the same consonant using the left button, and half did so with the right button.
The testing took ∼2.5 h per subject and was conducted within 1 session. The overall session including adaptive tracking and preparation of the MEG setup took about 3.5 h.
Data Recording and Analysis
Participants were seated in an electromagnetically shielded room (Vacuumschmelze, Hanau, Germany). Magnetic fields were recorded using a 306-sensor Neuromag Vectorview MEG (Elekta, Helsinki, Finland) with 204 orthogonal planar gradiometers and 102 magnetometers at 102 locations. Two electrode pairs recorded a bipolar electrooculogram (EOG) for horizontal and vertical eye movements. The participants' head positions were monitored during the measurement by 5 head position indicator (HPI) coils. Signals were sampled at a rate of 1000 Hz with a bandwidth ranging from direct current (DC) to 330 Hz.
The signal space separation method was applied offline to suppress external interferences in the data and to transform individual data to a default head position that allows statistical analyses across participants in sensor space (Taulu et al. 2004).
Subsequent data analyses were carried out with Matlab (The MathWorks, Inc., MA, USA) and the FieldTrip toolbox (Oostenveld et al. 2011) using only trials to which correct responses were provided (correct trials). Analyses were conducted using only the 204 gradiometer sensors, as they are most sensitive to magnetic fields originating directly underneath the sensor (Hämäläinen et al. 1993). The continuous data were filtered offline with a 0.5-Hz high-pass filter, specifically designed to provide a strong suppression of DC signals in the data (>140 dB at DC, 3493 points, Hamming window; e.g., Ruhnau et al. 2012).
Subsequently, trial epochs ranging from –1 to 3 s time-locked to the onset of S1 were extracted. Additionally, 4-s epochs were extracted from −2.25 to 1.75 s time-locked to noise onset providing the baseline window (−1.0 to −0.25 s) for a remote baseline correction of the time–frequency data. These rather long epochs were extracted to circumvent windowing artifacts in the time–frequency analysis; the intervals analyzed statistically were shorter (see below). Trial and baseline data were low-pass filtered at 150 Hz and subsequently downsampled to 500 Hz. Epochs were rejected when the signal range within one channel exceeded 200 pT/m (gradiometer) or 100 µV (EOG). Additionally, trials for which variance was deemed high relative to all others (per participant, per condition) based on visual inspection were rejected manually.
Time–frequency representations (TFRs) were calculated for each trial and 4-s baseline epoch (with 20-ms time resolution) for frequencies ranging between 2 and 30 Hz (logarithmically spaced, in 15 bins). Time-domain data were convolved with a Hann taper, with an adaptive width of 4 cycles per frequency (Δt = 4/f). An event-free 750-ms interval ranging between −1 and −0.25 s prior to noise onset (i.e., during the intertrial interval) was used as baseline period.
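To make the frequency grid and the adaptive window concrete, the following sketch computes 15 logarithmically spaced center frequencies and the corresponding window lengths Δt = 4/f. It assumes that both endpoints (2 and 30 Hz) are included in the grid, which the text does not state explicitly; the variable names are illustrative.

```python
F_MIN_HZ, F_MAX_HZ, N_BINS = 2.0, 30.0, 15  # values from the text
N_CYCLES = 4                                 # window width in cycles per frequency

# Logarithmic spacing: a constant ratio between neighbouring center frequencies.
ratio = (F_MAX_HZ / F_MIN_HZ) ** (1.0 / (N_BINS - 1))
freqs_hz = [F_MIN_HZ * ratio ** i for i in range(N_BINS)]

# Adaptive window length in seconds: delta_t = N_CYCLES / f.
window_s = [N_CYCLES / f for f in freqs_hz]

# Half a window of data is needed on either side of each estimated time point.
padding_s = [w / 2.0 for w in window_s]
```

Under these assumptions the window at the lowest frequency is 2 s long, so 1 s of data is needed on each side of an estimate, which illustrates why epochs longer than the statistically analyzed interval were extracted.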
For each participant, single-trial relative power changes were calculated with respect to mean baseline power (averaged over trials and time; separately for each condition, sensor, and time–frequency bin). Note that no statistical differences were found between baselines of the different conditions. Power estimates for each trial were baseline-corrected by subtracting and dividing by average baseline power. The average baseline provides a possibility to adjust for block-specific differences. Therefore, the condition-specific baseline correction reflects changes in alpha power during stimulation in contrast to no stimulation and might contain between-trial differences.
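The baseline correction amounts to a relative change, (P − B)/B, where P is single-trial power in one time–frequency bin and B is the mean baseline power for the same condition, sensor, and frequency. A minimal sketch follows; the function name and the flat-list data layout are assumptions made for illustration.

```python
def relative_power_change(trial_power, baseline_power):
    """Express power values as relative change from the mean baseline power.

    trial_power: power values for one sensor and frequency (any number of bins).
    baseline_power: baseline power values pooled over trials and time points.
    """
    mean_baseline = sum(baseline_power) / len(baseline_power)
    return [(p - mean_baseline) / mean_baseline for p in trial_power]


# A value of 1.0 means power doubled relative to baseline; -0.5 means it halved.
example = relative_power_change([2.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```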
Behavioral responses (i.e., proportion correct, PC; and response times, RTs) were analyzed with a one-way repeated-measures ANOVA followed by paired-samples t-tests to resolve differences between individual cueing conditions (early-cued, late-cued, and neutral).
Statistical analyses of the time–frequency data comprised a multilevel approach on alpha-power data (van Dijk et al. 2010; Obleser et al. 2012): On the first (single-subject) level, specific contrasts were conducted using single-trial data to test for alpha-power differences (8–13 Hz) between cueing conditions. Contrasts of all single conditions (neutral vs. late-cued, neutral vs. early-cued, early-cued vs. late-cued) were performed within the framework of Fieldtrip's independent-samples t-test. The contrast of the cued (early-cued and late-cued combined) and neutral conditions was conducted using the Fieldtrip-implemented independent-samples regression t-test with contrast coefficients neutral = 2, early-cued = −1, late-cued = –1. β-Values for all contrasts were obtained for each time–frequency bin at each of the 102 sensor positions. Next, β values were averaged across 8–13 Hz to derive an aggregated alpha-frequency estimate for each time point and sensor. Time points from −0.5 to 2.5 s relative to S1 onset, and each sensor were included in the analyses. For the statistical analyses on the second (group) level, β-values resulting from the single-subject first-level statistics were tested against zero with cluster-based permutation tests (dependent samples t-test, 1000 iterations; Maris and Oostenveld 2007).
The cluster approach protects against inflated type-1 error due to multiple comparisons. A second-level t-statistic was calculated for β-values (derived from alpha-power first-level analysis, see above) for each time-sensor bin. Then, clusters were formed based on combining adjacent time-sensor bins with t-values exceeding a threshold of P < 0.05. Within each cluster, t-values were summed. Using a permutation-based approach, time-sensor values were randomly assigned to two “conditions” without regard for their true condition labels on each of 1000 iterations. On each iteration, clusters were again formed based on combining neighboring bins with statistically significant t-values, and the t-value from the cluster with the largest summed statistic was added to a permutation distribution. Finally, any clusters with t-values exceeding 95% of those from the permutation distribution were considered statistically significant. All cluster tests were two tailed and were thus considered significant when P < 0.025.
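The logic of the cluster test can be sketched in a deliberately simplified form. The version below is not the FieldTrip implementation: it works on a single time axis (no sensor adjacency), tests values against zero by randomly flipping each participant's sign (equivalent to swapping the labels of the data and a zero "condition"), and uses a fixed t threshold. All function names, the simulated data, and the (count + 1)/(n + 1) P-value estimate are assumptions.

```python
import math
import random


def t_values(data):
    """One-sample t-value against zero per time point (rows = participants)."""
    n = len(data)
    out = []
    for column in zip(*data):
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def max_cluster_mass(tvals, threshold):
    """Largest absolute summed t over runs of adjacent supra-threshold bins."""
    best, current, sign = 0.0, 0.0, 0
    for t in tvals:
        s = 1 if t > threshold else -1 if t < -threshold else 0
        if s != 0 and s == sign:
            current += t  # extend the running cluster
        else:
            current, sign = (t, s) if s != 0 else (0.0, 0)
        best = max(best, abs(current))
    return best


def cluster_permutation_p(data, threshold, n_perm=1000, seed=0):
    """Return the observed maximum cluster mass and its permutation P-value."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(data), threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in data:
            flip = rng.choice((-1.0, 1.0))  # random label swap per participant
            flipped.append([flip * x for x in row])
        if max_cluster_mass(t_values(flipped), threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated data: 18 participants, 40 time points, an effect in bins 10-19.
rng = random.Random(1)
data = [[rng.gauss(1.0 if 10 <= t < 20 else 0.0, 1.0) for t in range(40)]
        for _ in range(18)]
# 2.110 is the two-tailed P < 0.05 critical t-value for 17 degrees of freedom.
mass, p_value = cluster_permutation_p(data, threshold=2.110)
```

Because only the largest cluster statistic of each permutation enters the reference distribution, a single test covers all bins, which is what controls the type-1 error across multiple comparisons.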
We also tested for correlations of alpha power with an in-depth measure of behavioral performance. Confidence ratings served to construct receiver-operating characteristic (ROC) curves (Macmillan and Creelman 2005) for each condition that were used to derive Az, a nonparametric performance measure corresponding to the area under the ROC curve (see Fig. 4A). Based on our analyses on alpha power and Az, differences between the late-cued and neutral conditions were the largest (see below). In order to test how the dynamic ranges of Az and of alpha power are related to each other in individual participants, the difference Az_Neutral – Az_Late was correlated with the difference αNeutral – αLate. We used the permutation-cluster approach across time points and sensors to identify clusters of significant alpha-power–behavior correlation.
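As an illustration of how an area under the ROC curve can be obtained from confidence ratings, the sketch below maps each “same”/“different” response and its confidence level onto a 6-point scale, sweeps a criterion across that scale, and integrates the resulting ROC points with the trapezoidal rule. Treating “same” trials as the signal class, the scoring scheme, and the function names are assumptions; the exact computation of Az used in the study may differ.

```python
def rating_score(responded_same, confidence):
    """Map a response plus a 1-3 confidence rating onto a 6-point scale.

    +3 = "same", very confident ... -3 = "different", very confident.
    """
    return confidence if responded_same else -confidence


def roc_area(signal_scores, noise_scores):
    """Trapezoidal area under the ROC built from rating scores.

    signal_scores: scores on trials where S1 and S2 matched (assumed signal).
    noise_scores: scores on trials where they differed.
    """
    criteria = sorted(set(signal_scores) | set(noise_scores), reverse=True)
    points = [(0.0, 0.0)]
    for c in criteria:
        hit = sum(s >= c for s in signal_scores) / len(signal_scores)
        false_alarm = sum(s >= c for s in noise_scores) / len(noise_scores)
        points.append((false_alarm, hit))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ratings for a handful of trials in one condition.
same_trials = [rating_score(True, 3), rating_score(True, 2),
               rating_score(False, 1), rating_score(True, 3)]
different_trials = [rating_score(False, 3), rating_score(False, 2),
                    rating_score(True, 1), rating_score(False, 3)]
az_example = roc_area(same_trials, different_trials)
```

An area of 0.5 corresponds to chance performance and 1.0 to perfect discrimination, which is why the measure can be read similarly to proportion correct.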
On the basis of individual T1-weighted MRI images (3 T Magnetom Trio, Siemens AG, Germany), topographical representations of the cortical surface of each hemisphere were constructed with Freesurfer (http://surfer.nmr.mgh.harvard.edu/).
The MR coordinate system was co-registered with the MEG coordinate system using the HPIs and about 100 additional digitized points on the head surface (Polhemus FASTRAK 3D digitizer). For forward and inverse calculations, boundary element models were computed for each participant using the inner skull surface as volume conductor (using the MNE toolbox; http://www.nmr.mgh.harvard.edu/martinos/userInfo/data/index.php). Individual mid-gray-matter surfaces were used as source model by reducing the ∼150 000 vertices needed to describe single hemispheres to 10 242 vertices.
The FieldTrip-implemented beamformer approach (DICS, dynamic imaging of coherent sources; Gross et al. 2001) was used to project alpha power during the retention of S1 (1.25–2.0 s after S1 onset) to source space, employing the cross-spectral density (CSD) across sensors. The CSD was calculated based on results of the sensor-space analysis: Using a multitaper fast Fourier transform (FFT) applied to single trials, we focused on the alpha frequency band (8–13 Hz). The multitaper FFT was centered at 10.5 Hz (±2.5 Hz smoothing with 3 Slepian tapers; Percival and Walden 1993) and a complex common filter (all conditions and baseline) was calculated (Gross et al. 2001; Schoffelen et al. 2008). Data were then projected through the filter, separately for each retention condition and each condition-specific baseline interval. Then, projections of relative power change per condition averaged over trials were attained (comparable with baseline correction in sensor space). For visualization, the relative power source projection of each condition was morphed onto a common surface (Freesurfer average brain; Fischl et al. 1999).
To illustrate condition effects observed in sensor space at the source level, we contrasted all source-projected conditions against each other by means of vertex-wise t-tests (neutral vs. late-cued, neutral vs. early-cued, early-cued vs. late-cued). The resulting t-values were z-transformed and displayed on the average brain surface. Given that the goal of source reconstruction was to localize the neural generators of sensor-space effects previously identified as significant, z-value maps were displayed with an uncorrected vertex-wise threshold of |z| ≥ 2.5 (Sohoglu et al. 2012).
Additionally, for each condition, we extracted source-projected alpha power (baseline-corrected) from the vertices yielding a |z| ≥ 2.5 (resulting from the neutral greater than late-cued contrast) within the right insula cluster, where z-values showed the greatest condition effects (see more on the insula below). Extracted activity was then averaged across vertices for each condition separately and used for visualization. A repeated-measures ANOVA was conducted across the averaged activity in order to reveal condition differences in this exact area.
Effects of Temporal Cueing on Behavioral Performance
The participants' task was to retain syllable S1 in memory for 2 s and then to decide, after the offset of S2, whether S1 and S2 had the same syllable-initial consonant.
A one-way repeated-measures ANOVA on proportion correct (Fig. 1B) indicated that performance depended on cueing (F2,34 = 4.15, P = 0.024). Participants responded more accurately in the late-cued condition compared with the neutral condition (t(17) = 2.53, P = 0.009). Participants benefited less robustly from early cues, as this condition did not differ significantly from the late-cued (t(17) = −1.49, P = 0.155) or the neutral condition (t(17) = −1.44, P = 0.169).
A repeated-measures ANOVA on response times (measured relative to a response prompt that occurred 1 s after S2 offset) revealed a main effect of condition (F2,34 = 7.30, P = 0.035). Post hoc paired-samples t-tests revealed that responses to late-cued trials were significantly faster than to neutral trials (neutral vs. late: t(17) = 2.58, P = 0.019). Similar to the accuracy results, there were no significant differences between RTs for early-cued and late-cued conditions (t(17) = 0.50, P = 0.622) or the early-cued and neutral conditions (t(17) = 1.95, P = 0.068; Fig. 1B).
Effects of Temporal Cueing on Alpha-Power Changes
As predicted, alpha power increased across the entire trial in all 3 conditions, relative to baseline (see Fig. 2A). We compared the postbaseline interval (−0.5 to 2.5 s relative to S1 onset) and the baseline interval (−1.0 to –0.25 s relative to noise onset), both averaged across time-sensor bins in the alpha range, with t-tests. Each condition in the postbaseline interval presented a significant increase compared with the baseline interval in the alpha range (all P's < 0.05).
Statistical contrasts between conditions revealed a strong difference in alpha power between neutral and late-cued trials. Two significant clusters (1. P = 0.010, –0.42 to 0.98 s; 2. P = 0.018, 1.4–2.5 s) during S1 and S2 encoding, as well as during retention of S1 indicated that alpha power was reduced in late-cued trials relative to neutral trials (see Fig. 2B, upper panel).
Alpha power in the early-cued condition did not differ significantly from the neutral condition. Moreover, no significant clusters were obtained for the contrast between the cued and neutral conditions. However, in an early time window around syllable S1, early-cued trials exhibited larger alpha power than late-cued trials in a right-frontal positive cluster (P = 0.033; –0.42 to 0.32 s; Fig. 2B, lower panel).
In sum, late-cued trials showed reduced alpha power compared with neutral and early-cued trials. The late cue was thus most effective in providing temporal expectations that yielded the hypothesized alpha-power decrease.
Source Localization of Alpha-Power Changes
We tested whether the alpha-power source projections (see Materials and Methods) presented less activity in the late-cued condition than in the neutral condition (Fig. 3), to confirm results from sensor space (Medendorp et al. 2007; Haegens et al. 2010; Obleser et al. 2012). Source space results corroborated the findings in sensor space: late-cued trials led to a reduction of alpha power compared with early-cued and neutral trials. Locations of the alpha-power reduction in the late-cued condition were strongly overlapping across contrasts. In general, the alpha-power differences (|z| ≥ 2.5) were found in the right hemisphere, emerging from the right anterior insular cortex [peak activity at MNI: 28; 23; −6]. A repeated-measures ANOVA (F2,34 = 7.70, P = 0.006) on the condition-wise averaged alpha-power projection in the insula showed that late-cued trials presented significantly less activity than neutral trials (t(17) = −4.21, P = 0.002), whereas the reduction in late-cued trials compared with early-cued trials was significant only at trend level (t(17) = −2.09, P = 0.078). Activity in early-cued and neutral trials did not differ significantly (t(17) = −1.57, P = 0.135; Fig. 3A). Figure 3A depicts the source-projected alpha power in the insula for each condition. Condition-specific power values show the same pattern as our sensor-space analysis, thereby confirming the right insula as the main source of our alpha effects.
Alpha-Power Reduction During Memory Retention Predicts Behavioral Performance
In a final analysis, we aimed to relate the observed modulation of behavioral performance by temporal cueing to the alpha-power differences between cue conditions. Specifically, we contrasted the 2 conditions for which we observed the largest difference in both behavior and alpha power (i.e., the neutral and late-cued conditions). We asked whether the degree to which alpha power was decreased by temporal cueing (indexed by αNeutral – αLate) would predict the degree to which participants were able to profit from the temporal cue (Az_Neutral – Az_Late).
This analysis focused on the performance measure Az, a nonparametric measure derived from the ROC curve (see Materials and Methods and Fig. 4A) which can be interpreted similarly to proportion correct (Fig. 4B). In brief, recall that confidence ratings were collected for “same”/“different” responses on each trial, and these were used to construct ROC curves. Then, ROC curves were tested for asymmetry around the minor diagonal (Henry and McAuley 2013). Linear fits to z-transformed ROCs (zROCs) yielded a slope estimate for each participant. Separately for each condition, zROC slopes were then tested against unity (slope = 1) using a single-sample t-test. Significant deviations from unit slope (as was the case here; all 0.07 > P > 0.01) indicate asymmetric ROC curves and nonindependence of perceptual sensitivity and response bias in parametric performance measures (e.g., PC, d′). Thus, nonparametric performance measures derived from the ROCs themselves, like Az, are considered more accurate performance measures.
Next, we calculated an “alpha-power modulation index” that reflects the difference for each participant between alpha power in the neutral and late-cued conditions (i.e., αNeutral − αLate), and a “behavioral performance modulation index” calculated for the same 2 conditions (Az_Neutral − Az_Late). We then correlated the 2 indices across participants for individual time-sensor bins, again using a cluster-based approach. This revealed a broad positive fronto-central cluster (0.08–2.7 s, P = 0.007) ranging across the entire retention phase including encoding of S1 and S2.
The correlation between the alpha-power differences extracted from this cluster and the behavioral differences (r = 0.51) is shown in Figure 4C.
The present study investigated whether temporal cues improved behavioral performance and decreased alpha power in a delayed-matching-to-sample working-memory task, where to-be-remembered syllables were embedded in masking noise. We observed that “knowing when to listen” facilitated retention of a syllable in ongoing noise, as indexed by higher accuracy and faster response times in the late-cued compared with the neutral condition. This finding is in line with previous research indicating that temporal cues and long foreperiods lead to better stimulus encoding (Correa et al. 2005; Rohenkohl et al. 2012) and behavioral performance (Coull and Nobre 1998). Moreover, we observed that, along with the overall increase of alpha power in all conditions, temporal cues (in particular when coupled with a relatively long foreperiod) caused a reduction of the magnitude of this alpha-power increase, suggesting that knowing when to listen also decreased the necessity to functionally inhibit task-irrelevant information. In particular, largest differences in alpha power between temporal cueing conditions were observed in the right insula. Overall, the reduction of alpha power as well as the increase of behavioral performance implies that temporal expectations (i.e., late-cued condition) are able to reduce the cognitive load elicited by stimuli presented in noise (see Zanto and Gazzaley 2009).
In the following sections, we will put the current findings in context, in particular emphasizing how the facilitatory effects of temporal cues might be realized neurally in terms of alpha-power modulations. The discussion will be structured in 3 parts: 1) How do temporal expectations affect alpha power and cognitive load?; 2) What are the underlying neural sources of alpha-power modulations?; 3) How do alpha-power modulations predict modulations of behavioral performance?
How Do Temporal Expectations Affect Alpha Power and Cognitive Load?
Temporal expectations in the current study led to a decrease of alpha power during syllable retention, relative to when the onset of the syllable pair could not be expected, even though stimulation (syllables, noise, and SNR) was identical across conditions. In particular, we observed the largest differences in terms of both behavioral and alpha-power effects when we contrasted the late-cued with the neutral condition. That is, although early and late cues both provided information about the onset time of the syllable, the late-cued condition was more effective in reducing alpha power than the early-cued condition (see also the behavioral results in Fig. 1B). This effect of foreperiod duration corresponds to previous behavioral results showing that longer foreperiod durations lead to increased encoding precision (Correa et al. 2005) and better stimulus detection (Niemi and Näätänen 1981).
We interpret reduced alpha power for the late-cued relative to the neutral condition to mean that temporal expectations reduced the need for functional inhibition. The reason is that, in all temporal cueing conditions (early-cued, late-cued, and neutral), alpha power was generally increased relative to baseline. Thus, we suggest that alpha power played an inhibiting role in the speech-in-noise working-memory task regardless of temporal cueing condition. For the late-cued condition, alpha power increased less relative to baseline than in the neutral condition, suggesting that alpha power still played an inhibiting role, albeit a less strong one. In particular, we suggest that more specific temporal expectations may have allowed for an a priori suppression of irrelevant, potentially interfering information. Concomitantly, less functional inhibition was needed, which was reflected in reduced alpha power.
Along these lines, previous studies have shown that knowing when to listen enhances stimulus encoding (e.g., Posner 1980; Correa et al. 2005; Rohenkohl et al. 2012; Vangkilde et al. 2012; Cravo et al. 2013). We suggest that improved encoding could have been enabled by the stronger suppression of irrelevant information (e.g., Hillyard et al. 1998) in the late-cued relative to the neutral condition. Moreover, less degraded stimuli elicit less cognitive load and less alpha power during maintenance in working memory (Obleser et al. 2012), suggesting that the beneficial effects of temporal expectations cascaded into the retention interval in the current study, thereby triggering the observed alpha effects.
It is worth pointing out that alpha power serves as an indirect measure, reflecting not active maintenance but functional inhibition of irrelevant information (termed working-memory “protection,” Roux and Uhlhaas 2014), whereas stimulus maintenance in memory per se has previously been associated with gamma oscillations (>30 Hz; e.g., Howard et al. 2003; Jensen et al. 2007; Lisman and Jensen 2013; Roux et al. 2013). Essentially, alpha and gamma are inversely related: brain areas presenting high alpha power are inhibited and present low gamma power because active processing is suppressed, and vice versa (Jokisch and Jensen 2007; see for review Klimesch et al. 2007; Jensen and Mazaheri 2010).
So far, we have discussed reduced alpha power only as reflecting less functional inhibition. However, an alternative (although not mutually exclusive) explanation is that reduced alpha power associated with temporal expectations reflects increased cortical excitability (e.g., Jensen et al. 2012; Lange et al. 2013). The association between reduced alpha and increased excitability comes specifically from studies in which attention was focused either spatially (Weisz et al. 2014; Whitmarsh et al. 2014) or temporally (Rohenkohl et al. 2012), where focused attention also results in improved task performance. With our design, it is not possible to completely disentangle whether reduced alpha power reflects reduced functional inhibition or enhanced cortical excitability. However, our overall alpha effects reflect synchronization (i.e., a power increase compared with baseline; Klimesch 2012) rather than desynchronization (i.e., a power decrease relative to baseline). Moreover, our primary effect localized not to sensory/domain-specific, but rather to domain-general cortex (i.e., the insula, see below). Thus, we suggest that the functional inhibition framework and a relative decrease in the need for such functional inhibition offer the more parsimonious explanation for our observed alpha effects. More generally, the current results fit within the context of an extensive literature relating alpha oscillations to attention and working memory. Studies manipulating selective attention (for a review see Foxe and Snyder 2011), along with studies using comparable delayed-matching-to-sample tasks in the somatosensory (e.g., Haegens et al. 2010; Haegens et al. 2011) and in the auditory domain (Kaiser et al. 2007), imply that increased alpha power effectively inhibits interference from other processes and/or brain sites.
What are the Underlying Neural Sources of Alpha-Power Modulations?
Source analyses of alpha power revealed that effects between conditions were confined mainly to the right insular cortex (see Fig. 3). We suggest that lateralization to the right hemisphere generally reflects inhibition of the hemisphere that is arguably task-irrelevant when it comes to retaining verbal material (i.e., syllables) in working memory (e.g., Smith and Jonides 1998). Previous research supports this proposition. Specifically, right-hemispheric alpha-power effects were observed in another working-memory study making use of syllable material (Leiberg et al. 2006; see also results by Obleser et al. 2012). Although the authors interpreted their alpha effects as reflecting executive processes operating on verbal material, van Dijk et al. (2010) re-interpreted the findings of Leiberg and co-workers as meaning that alpha power was inhibiting the right hemisphere, which (similar to the current study) was task-irrelevant during syllable retention. Conversely, van Dijk et al. (2010) made use of a nonverbal pitch-memory task and found increased alpha power in the left hemisphere. The authors argued that enhanced alpha power reflected a functional inhibition of the hemisphere that was again task-irrelevant, this time during retention of pitch information. Finally, during a working-memory task in the somatosensory domain, Haegens et al. (2010) showed that alpha power increased at sensors ipsilateral to the side of stimulation (i.e., over the task-irrelevant hemisphere).
With respect to localization to the insula more specifically (Fig. 4), several previous fMRI studies have shown that the processing of degraded speech (not unlike the present stimulus setup) is accompanied by increased insular activity reflecting the difficulty of comprehension (Erb et al. 2013; Vaden et al. 2013). Converging evidence for increased insula activity in a difficult listening situation comes from an fMRI study by Sadaghiani et al. (2009), who found that increased prestimulus BOLD activity in the insula was associated with enhanced detection of near-threshold auditory stimuli in a sustained attention task. According to Sadaghiani et al., activity in the insula is a marker of fluctuations of sustained attention. Dosenbach et al. (2007) and Eckert et al. (2009) more generally suggest that the anterior insula not only enhances sustained attention but is also part of a network responsible for sustained task-related cognitive control.
We suggest that the insula plays an active role in functional inhibition, in line with the localization of our alpha effects to this region. A recent fMRI study from our group found upregulation of insula activity associated not only with selective attention to task-relevant information, but also with selective ignoring of task-irrelevant information (Henry et al. 2015). Work using combined EEG/fMRI has typically shown a negative relation between BOLD signal and alpha power in much of cortex (Laufs et al. 2006; Scheeringa et al. 2011). This correlation has been interpreted within the context of alpha as a marker of inhibition (e.g., Jensen and Mazaheri 2010). However, a combined EEG/fMRI resting-state study (Sadaghiani et al. 2010) indicated that the BOLD signal and alpha power, specifically in the right anterior insular cortex, were positively correlated. On these grounds, we propose that the right insula may act as a generator of alpha power and as a neural source of functional inhibition.
How do Alpha-Power Modulations Predict Modulations of Behavioral Performance?
Lastly, a correlation analysis revealed that a listener's “behavioral performance modulation” was predictable from her/his own “alpha-power modulation” between temporal expectation conditions. Performance was in general better for late-cued trials than for neutral trials, but the correlation of the behavioral differences and alpha-power differences between neutral and late-cued conditions (Fig. 4) conveys 1 central conjecture: Participants who had relatively large alpha power in neutral trials performed in these trials as well as, or even better than, in late-cued trials. Thus, both (exogenous) temporal cues and (endogenous) alpha power are means that can lead to the same end, that is, a performance benefit: Either listeners form and utilize specific temporal expectations to reduce cognitive load up front (the arguably more adaptive strategy), or, alternatively, listeners do not utilize the cues as much but nevertheless perform well in neutral trials. This latter strategy then comes at the “cost” of increased alpha power to boost working-memory performance. We propose to label such alpha-power increases a “compensatory” process, as these increases might come at a neural or metabolic cost to the system (see also the previous section on insular activity), but they can be beneficial to performance. This view is supported by a study from Haegens et al. (2010) showing that alpha-power increases were strongest during successful working-memory performance.
We conclude that alpha power is instrumental for performance in cognitively demanding tasks. It can partly make up for (or trade off with) other task-beneficial factors, such as the temporal-expectation cues provided here.
Alpha-power changes are a sensitive neural marker of cognitive load, particularly so when the task requires memorizing and matching auditory syllables in task-detrimental noise. Cues that allow listeners to form a specific temporal expectation about when target syllables will occur can counteract cognitive load and reduce alpha power. Furthermore, the facilitatory or task-beneficial effects of cueing are not limited to sensory encoding but extend to later stages of memory retention. Here, the magnitude of alpha oscillations emerging from right insular cortex scales directly with listeners' performance benefits. Thus, alpha power appears to be a costly but effective neural mechanism to boost performance in difficult listening situations.
Research was supported by the Max Planck Society (Max Planck Research group grant to J.O.).
Yvonne Wolff helped to acquire the data. We thank 2 anonymous reviewers for their helpful comments. Conflict of Interest: None declared.