The ability to selectively attend to one sound and ignore other competing sounds is essential for auditory communication. Subjects in our study detected occasional changes in the frequency of amplitude modulation in sounds presented to one ear while ignoring sounds in the other ear. Neuromagnetic source analysis revealed attention-related activity in a cortical network including primary auditory cortices, posterior superior temporal gyri, inferior parietal lobules (IPLs), inferior frontal gyri (IFG), and medial frontal gyri. Time courses of event-related magnetoencephalography responses were analyzed during the interval between stimulus presentation and behavioral response. Enhanced neural responses to targets and standards in the attended ear indicated early modulation of sensitivity in the attended sensory channel. A subsequent process of discriminative stimulus selection was indexed by a response increase over time for targets and decreasing activity for standards. Enhanced responses to deviants in the unattended ear indicated discriminative processing of unattended inputs as well, though to a lesser extent than for attended stimuli. Superior temporal gyrus, planum temporale, and the IPL were prominently involved in stimulus selection, whereas medial frontal regions were linked to initiation of behavioral responses and sustained activity in IFG suggested a role in attentional control.
In a multispeaker environment, we can direct our attention to one conversation and ignore the voices of other simultaneous speakers. Nevertheless, our attention can still be captured if somebody outside of the focus of attention mentions our name (Wood and Cowan 1995). Although we suppress irrelevant auditory information when selectively listening to one speaker through top-down modulation of sensitivity in the attended channel, a considerable amount of information must therefore still be processed at unattended levels (Winkler et al. 2005). How is the brain able to balance the processing of attended and unattended inputs? We investigated the temporal dynamics of neuromagnetic cortical activity during the several hundred milliseconds between stimulus presentation and behavioral response. We expected that the time courses of neural activity elicited by attended and nonattended stimuli would indicate whether selective attention enhances early perceptual processing or facilitates later stimulus selection. Moreover, we expected that the time courses of magnetoencephalography (MEG) activity would indicate whether interfering target-like deviant stimuli in the nonattended ear would be suppressed by channel-selective attention or enhanced during stimulus selection. The latter question is of great importance for understanding false-positive responses in attention-demanding tasks.
We performed a dichotic listening experiment, in which sounds were presented concurrently to the 2 ears, and the listener discriminated relevant sounds in one ear while ignoring those in the other ear, which is the classical paradigm for studying selective auditory attention (Cherry 1953; Asbjornsen and Hugdahl 1995). In such an experiment, brain responses elicited by identical stimuli are recorded under different attention conditions (listening to one or the other ear), and differences are considered to be effects of top-down control. The N1 component of the auditory evoked potential (AEP) with a latency of about 100 ms is reported to be larger for attended stimuli, indicating enhanced processing of auditory stimuli in the attended ear (Hillyard et al. 1973; Picton and Hillyard 1988). Enhancement of a positive wave at 20–50 ms indicated an even earlier gating based on physical properties of the sound before perceptual analysis has been completed (Woldorff and Hillyard 1991). Selective attention also engenders an additional longer lasting change in endogenous AEP, which has been variously called the negative difference wave, Nd (Hansen and Hillyard 1980), or the processing negativity (Näätänen et al. 1978). Näätänen (1982) described the stimulus selection as a progressive discrimination of the features of the auditory object in relation to a template of the target stimulus that the subject was supposed to detect. Studying the timing of the Nd wave revealed that the most easily accessible feature of the sound was processed first and less discriminable features were selected later (Hansen et al. 1983). This indicated a sequential selection of individual stimulus attributes rather than a final single stage of target selection after all attributes had been processed (Picton and Hillyard 1988). Long-latency components elicited by target sounds were also enhanced, although this effect was diminished for deviant stimuli presented in the unattended channel (Woldorff et al. 1998). Such bottom-up effects may account for false-positive responses in a dichotic listening task (Pollmann and Maertens 2006). Corresponding brain networks involved in attention capture have been identified in a functional magnetic resonance imaging (fMRI) study (Watkins et al. 2007), in which a ventral stream responded to target-like stimuli regardless of the experimental task, and a dorsal stream was activated when attention had been captured by stimuli in the attended channel.
The distinction between attention to an auditory object and attention to location is closely related to the concept of dual pathways for the processing of “what” and “where” of auditory information (Rauschecker and Tian 2000). Positron emission tomography (PET) has shown that posterior auditory areas and posterior parietal cortex are involved in processing spatial aspects of sound (Zatorre et al. 2002), and inferior frontal areas became active when spectral features of sound were analyzed (Zatorre et al. 2004). However, the brain areas specific for spatial or spectral auditory processing may overlap significantly (Zatorre et al. 1999; Arnott et al. 2004). Because the dichotic listening task involves a spatial aspect of sound and analyses of the sound object, we expected that our MEG studies would reveal a similar network of cortical areas involved in selective auditory attention.
Frontal brain areas have been implicated in controlling and monitoring of attention. Whereas the brain areas involved in auditory attention are well documented, particularly in recent fMRI studies (Grady et al. 1997; Petkov et al. 2004; Degerman et al. 2006), less is known about how activity evolves in the several hundred milliseconds between sound onset and the completed perceptual decision. Hemodynamic studies identifying regions of the brain that are active during auditory attention have only limited temporal resolution. Recent developments in the analysis of MEG source activity (“event-related synthetic aperture magnetometry” or SAM), however, enable us to study the precise temporal dynamics of focal brain activation during stimulus processing (Robinson 2004; Cheyne et al. 2006).
We used this new technique to evaluate the MEG activity during a dichotic listening task. The task was to detect a change in the modulation frequency of a tone in one ear and to ignore the other ear. The stimuli lasted longer, and the discrimination of the deviant stimuli took more time than in previous studies of selective auditory attention. This allowed us to evaluate long-lasting components of the auditory response, such as the sustained potential and the steady-state response, and to follow the discrimination of the modulation frequency over several hundred milliseconds. This paper presents our studies of the transient and sustained potentials. A subsequent paper will consider the steady-state responses.
Twelve healthy university students (26.5 years of age, range 22–34 years, 8 female) without any history of neurological or hearing disorders participated in this study. All had normal hearing defined as audiometric pure-tone thresholds below 20 dB hearing level in the frequency range from 250 to 4000 Hz. Participants gave written informed consent to partake in the study, which had been approved by the research ethics board of the faculty of medicine at the University of Münster, Germany.
Amplitude-modulated (AM) tones of 600 ms in duration were presented with an onset asynchrony of 900 to 1100 ms (i.e., an interstimulus interval [ISI] of 300–500 ms) to the left or right ear in random order. Sound waveforms are illustrated in Figure 1A. The modulation depth was 90%, and the modulation began 40 ms after sound onset. In 15% of the stimuli, the modulation frequency was changed from 40 Hz (standard; middle-gray shading in Fig. 1) to 20 Hz (deviant; dark gray shading in Fig. 1). The subjects were instructed to attend either to the right or the left ear sounds and to respond with a right hand button press to detection of a target in the attended ear only. Because unilateral sounds might have diverted attention to the stimulated ear, unmodulated pure tones (fillers; pale-shaded stimuli in Fig. 1) were presented contralaterally to each of the AM stimuli. Thus, 2 continuous streams of stimuli were presented that occurred simultaneously in the 2 ears. The carrier frequency was 400 Hz in one ear and 700 Hz in the other, with the frequencies allocated randomly across the blocks. The modulated sounds evoked a stronger percept than the unmodulated sounds simultaneously presented in the opposite ear. Thus, the listeners perceived the stimuli as a sequence of AM sounds alternating randomly between both ears. However, the additional unmodulated tones filled the temporal gap in the contralateral ear to create continuous streams of sound in both ears. The different tonal frequencies of the 2 streams helped the listeners maintain attention to one ear, and they responded accurately to the targets in this ear. The stimuli were presented at an intensity of 60 dB above individual sensation level through Etymotic ER3A transducers connected with 1.5 m of length-matched plastic tubing and foam earplugs to the subjects' ears.
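The stimulus parameters given above (600-ms duration, 90% modulation depth, modulation starting 40 ms after onset, 40-Hz standards and 20-Hz deviants) can be illustrated with a minimal synthesis sketch. The audio sampling rate, the cosine phase of the modulator, the peak normalization, and the function names are assumptions made for illustration; the original stimulus generation procedure, including any onset and offset ramps and the level calibration, is not specified in the text.

```python
import math

def am_envelope(t, mod_hz, mod_onset_s=0.04, depth=0.9):
    """Envelope at time t (s): flat lead-in, then sinusoidal AM of the given depth."""
    if t < mod_onset_s:
        return 1.0  # unmodulated portion before modulation onset
    phase = 2.0 * math.pi * mod_hz * (t - mod_onset_s)
    # Cosine phase makes the envelope continuous (value 1.0) at modulation onset.
    return (1.0 + depth * math.cos(phase)) / (1.0 + depth)

def am_tone(carrier_hz, mod_hz, fs=44100, duration_s=0.6, depth=0.9):
    """Samples of an AM tone; fs = 44100 Hz is an assumed audio sampling rate."""
    n = int(round(duration_s * fs))
    return [am_envelope(i / fs, mod_hz, depth=depth)
            * math.sin(2.0 * math.pi * carrier_hz * i / fs)
            for i in range(n)]

standard = am_tone(400.0, 40.0)            # 40-Hz modulation
deviant = am_tone(400.0, 20.0)             # 20-Hz modulation
filler = am_tone(700.0, 40.0, depth=0.0)   # depth 0 gives an unmodulated pure tone
```

With this definition of depth m, the envelope varies between 1 and (1 − m)/(1 + m), so a 90% depth leaves only about 5% of the peak amplitude at the envelope minimum.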
The experiment was performed in multiple short blocks of 7 min duration each, and the focus of attention was alternated to one or the other ear between blocks. Each block contained 180 standard and 30 deviant stimuli in each of the attended and unattended ears. During the break of about 1.5 min between blocks, the investigator communicated with the participant in order to minimize carryover effects of attention between blocks. In contrast to the regularly alternated ear to be attended, the stimulus carrier frequencies were randomly chosen between blocks. Eight experimental blocks were performed in one recording session of 70 min duration, and 2 recording sessions were performed with each subject on the same day. Task performance was assessed based on the reaction time and percentage of correctly detected targets and false alarm responses.
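As a consistency check on the block design, the stimulus counts can be related to the block duration. The sketch below assumes that the counts apply to each ear and that one AM stimulus is presented per onset asynchrony, using the 1000-ms midpoint of the 900 to 1100 ms range; the constant and function names are illustrative only.

```python
STANDARDS_PER_EAR = 180
DEVIANTS_PER_EAR = 30
MEAN_SOA_S = 1.0  # assumed midpoint of the 900-1100 ms onset asynchrony

def block_summary():
    """Number of AM stimuli, deviant fraction, and approximate block duration."""
    am_per_ear = STANDARDS_PER_EAR + DEVIANTS_PER_EAR
    am_total = 2 * am_per_ear  # AM stimuli alternate randomly between the 2 ears
    return {
        "am_stimuli": am_total,
        "deviant_fraction": DEVIANTS_PER_EAR / am_per_ear,
        "duration_min": am_total * MEAN_SOA_S / 60.0,
    }

summary = block_summary()
```

Under these assumptions a block contains 420 AM stimuli and lasts 7 min, in agreement with the stated block duration, and the deviant fraction is about 14%, close to the nominal 15%.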
Data Acquisition and Analysis
Structural magnetic resonance images (MRIs) were acquired after the MEG recording on a 3-T scanner (Gyroscan Intera, Philips Medical Systems, Eindhoven, the Netherlands) for the overlay of functional MEG data on the individual brain anatomy. T1-weighted sagittal images with in-plane size of 512 × 512 (0.6 × 0.6 mm resolution) and 320 slices (0.5 mm thickness) were recorded using spoiled gradient echo imaging with a standard head coil.
The MEG was recorded in a magnetically shielded room using a 275-channel whole head neuro-magnetometer (OMEGA, VSM Medtech Inc, Vancouver, Canada) at the University of Münster, Germany. Participants were seated in upright position with their head resting in the helmet-shaped MEG sensor. In order to reduce eye movements, participants kept their eyes focused on a fixation cross in front of them. The neuromagnetic activity was sampled at a rate of 600 Hz after 200 Hz low-pass filtering and was recorded continuously. For offline analysis, the MEG data were reorganized in stimulus-related epochs with length of 1300 ms including a 400-ms prestimulus interval separately for standard and deviant stimuli and attention to the left and right ears. Deviants with incorrect behavioral responses were excluded from further analysis.
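The epoching step can be sketched as follows for a single channel. The sampling rate, epoch length, and prestimulus interval are taken from the text; the function names and the rule of discarding epochs that extend beyond the recording are assumptions for illustration.

```python
FS_HZ = 600        # sampling rate stated in the text
EPOCH_MS = 1300    # total epoch length
PRESTIM_MS = 400   # prestimulus interval included in each epoch

def epoch_bounds(onset_sample):
    """Start (inclusive) and stop (exclusive) sample indices of one epoch."""
    pre = PRESTIM_MS * FS_HZ // 1000    # 240 samples before stimulus onset
    total = EPOCH_MS * FS_HZ // 1000    # 780 samples per epoch
    start = onset_sample - pre
    return start, start + total

def cut_epochs(signal, onset_samples):
    """Cut a continuous single-channel recording into stimulus-related epochs."""
    epochs = []
    for onset in onset_samples:
        start, stop = epoch_bounds(onset)
        if start >= 0 and stop <= len(signal):  # skip epochs that leave the recording
            epochs.append(signal[start:stop])
    return epochs
```

In each epoch, the stimulus onset falls at sample index 240, which is the reference point for any later baseline correction.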
The SAM minimum-variance beamformer algorithm (Van Veen et al. 1997; Robinson and Vrba 1999) was used as a spatial filter to estimate the source activities at all nodes of a lattice of 5 mm spacing across the whole brain volume. The beamformer analysis, using the algorithm as implemented in the VSM software package, was based on individual multisphere models, for which single spheres were locally approximated for each of the 275 MEG sensors to the shape of the cortical surface as extracted from the MRI. The MEG beamformer minimizes the sensitivity for interfering sources as identified by analysis of covariance in the multichannel magnetic field signal while maintaining constant sensitivity for the source location of interest. The covariances were calculated for each individual recording block from all 0–600 ms epochs of data, low-pass filtered at 24 Hz. Before applying the beamformer to each single epoch of magnetic field data, a principal component analysis was performed, and field components larger than 2 pT at any time were subtracted from the data. This procedure effectively removed large artifacts caused by eye blinks (Lagerlund et al. 1997; Kobayashi and Kuriki 1999). Furthermore, the offset in each epoch was corrected with respect to the mean in the 100-ms interval before stimulus onset. Spatial filtering with the beamformer resulted in single epoch waveforms of source activity for each volume element within the brain. The mean and variance across all epochs in one experimental block were calculated for each time point. Dividing the mean by the square root of the variance resulted in time series of z-scores. Those time series were averaged across all repeated measurements for each subject and can be interpreted as time series of signal change in percent of the intrinsic brain activity in each volume element. 
Eight volumetric sets of time series were calculated for each subject, one for each combination of the ear of stimulation (left or right), the stimulus type (standard or deviant), and the attention condition (attended or nonattended).
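The baseline correction and the normalization of source activity to z-scores can be sketched for a single volume element as follows. The use of the unbiased sample variance and the function names are assumptions; the text specifies only that the mean across epochs was divided by the square root of the variance.

```python
import math

def baseline_correct(epoch, onset_index, fs_hz=600, baseline_ms=100):
    """Subtract the mean of the 100-ms interval preceding stimulus onset."""
    n_base = baseline_ms * fs_hz // 1000  # 60 samples at 600 Hz
    base = epoch[onset_index - n_base:onset_index]
    offset = sum(base) / len(base)
    return [x - offset for x in epoch]

def zscore_timeseries(epochs):
    """Across-epoch mean divided by across-epoch standard deviation, per time point."""
    n = len(epochs)
    z = []
    for samples in zip(*epochs):  # iterate over time points
        mean = sum(samples) / n
        var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # assumed: sample variance
        z.append(mean / math.sqrt(var) if var > 0 else 0.0)
    return z
```

A time point at which all epochs have the same value has zero variance; the sketch returns 0.0 there rather than dividing by zero, which is a convention chosen here and not one stated in the text.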
For data reduction, the time series were down-sampled by a factor of 8 (i.e., one sample point every 13 ms) and converted into AFNI format (Cox 1996) for overlay with each participant's anatomical MRI. All anatomical MRIs and volumetric maps of the evoked source activity were spatially normalized to the Talairach coordinate system. Volumetric maps of the effect of attention were calculated as the difference between the grand-averaged responses to attended stimuli and the grand-averaged responses to nonattended stimuli. Local maxima in those maps determined with the 3dmaxima function provided by AFNI identified regions of interest (ROIs) for further analysis.
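The logic of the ROI definition, a difference map followed by a search for local maxima, can be illustrated with a simplified stand-in. The sketch below is not the AFNI 3dmaxima algorithm; it assumes a volume stored as a dictionary from integer voxel indices to values and a 26-neighbour definition of a local maximum, both chosen for illustration.

```python
from itertools import product

def difference_map(attended, unattended):
    """Voxel-wise attention effect: attended minus unattended response."""
    return {voxel: attended[voxel] - unattended[voxel] for voxel in attended}

def local_maxima(volume, threshold=0.0):
    """Voxels above threshold that exceed all of their existing 26 neighbours."""
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    peaks = []
    for (x, y, z), value in volume.items():
        if value <= threshold:
            continue
        neighbours = (volume.get((x + dx, y + dy, z + dz)) for dx, dy, dz in offsets)
        # Neighbours outside the volume (None) are ignored.
        if all(n is None or n < value for n in neighbours):
            peaks.append(((x, y, z), value))
    return sorted(peaks, key=lambda peak: -peak[1])  # strongest peak first
```

Each returned voxel would then serve as the centre of one ROI, after conversion of the voxel indices to coordinates in the normalized space.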
Repeated measures analyses of variance (ANOVAs) were performed for each time point of the 8 time series of source activity in each ROI with the within-group factors of “attention” (attend vs. ignore), “stimulus type” (standard vs. deviant), and “side of stimulation” (right vs. left). Time series of cortical source activity evoked under the various experimental conditions and time series of statistical parameters (F and P values) were visualized for the ROIs.
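For a within-subject factor with 2 levels, the repeated measures F ratio with 1 and n − 1 degrees of freedom equals the squared paired t statistic computed on each subject's responses averaged over the remaining factors. The following sketch applies this identity at every time point; the function names and data layout are assumptions, and the corresponding P values would require the F distribution, which is not included here.

```python
import math

def main_effect_F(level_a, level_b):
    """F(1, n-1) for a two-level within-subject factor.

    level_a[i] and level_b[i] are subject i's responses at the 2 levels,
    each averaged over the levels of the other factors.
    """
    diffs = [a - b for a, b in zip(level_a, level_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)  # paired t statistic with n - 1 degrees of freedom
    return t * t, (1, n - 1)

def f_timecourse(series_a, series_b):
    """F value at each time point; series_x[i][k] is subject i at time point k."""
    n_times = len(series_a[0])
    return [main_effect_F([s[k] for s in series_a],
                          [s[k] for s in series_b])[0]
            for k in range(n_times)]
```

With 12 subjects, this yields the F(1,11) values reported for the main effects; interactions between two 2-level factors can be handled in the same way by testing the per-subject difference of differences.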
The participants in the dichotic listening experiment were able to focus their attention on the tones in the required ear and accurately detected the targets. Responses in the latency interval of 300–1200 ms were analyzed. The group mean rate of correctly detecting a target (hit rate) in the left ear was 73.0%, which was not significantly different from the hit rate of 70.0% for the right ear. Nevertheless, listeners sometimes noticed and occasionally responded to the deviants in the other ear. False-positive responses were made to 3.6% of deviants in the unattended ear, to 0.2% of attended standards, and to 0.06% of unattended standard stimuli. Thus, the subjects' false-positive responses were predominantly a result of interference by the contralateral deviant stimulus. The median reaction time was 663 ms with respect to the stimulus onset, and the lower and upper quartiles were 562 and 782 ms, respectively (Fig. 2). The subjects' reactions were slightly faster for left than right ear stimuli [t(11) = 2.4, P = 0.035].
Auditory Evoked Magnetic Fields
All auditory stimuli elicited magnetic field responses, which were clearly identifiable in the maps of magnetic field distributions and the time series obtained from maximally responding sensors in both hemispheres shown in Figure 3. The auditory evoked magnetic field (AEF) showing the first prominent peak at 65-ms latency corresponds to the initial evoked P1 wave of the auditory ERP and is hence labeled P1m (Fig. 3). Whereas P1m amplitudes were larger in the left compared with the right hemisphere [F(3,33) = 4.71, P = 0.0076], they did not vary as a function of attention and stimulus type [F(1,11) < 1.0 for both], although a slightly smaller P1m amplitude can be seen in the selected posterior channel of group averaged data in Figure 3. The N1m response, corresponding to the N1 of the auditory ERP, was strongly suppressed because of the short interstimulus interval of 300–500 ms. However, a clear deflection with the same polarity as N1m did appear at 200-ms latency in right-hemispheric responses, most strongly expressed in the response to the deviant stimulus. This peak, which likely reflects the N1m change response to the onset of AM, is labeled N1mAM in Figure 3. The ANOVA for the amplitude at 200 ms in the 4 selected MEG sensors revealed a main effect of the stimulus type [F(1,11) = 20.9, P < 0.0009] with larger amplitudes for the deviant.
The most prominent AEF component was a long-lasting deflection with the same polarity as the N1m, with an onset latency of about 150 ms and an offset about 100 ms after stimulus offset, which is termed the sustained magnetic field (SF). The SF amplitudes to the deviants were noticeably larger than those to the standard stimuli [F(1,11) = 29.1, P = 0.0002]. There was also a main effect of sensor site on the SF amplitude [F(3,33) = 11.2, P < 0.0001] with larger amplitudes in the posterior MEG channels. The ANOVA revealed no significant overall effect of attention on the group of selected MEG channels, but there was a significant interaction between sensor site and attention [F(3,33) = 8.55, P = 0.0002]. The SF increased with attention at posterior MEG sites and decreased at anterior sites. The effect of attention on the SF was significant for all 4 selected MEG channels. The observation that attention modulates the SF in opposite directions at anterior and posterior MEG sensor sites indicates a more complex configuration of underlying sources than the magnetic field maps (Fig. 3) may suggest. The maps of magnetic field distribution for the P1m onset response and the SF to both attended and unattended stimuli exhibit a dipolar pattern above the left and right temporal lobes and could be modeled with single equivalent current dipoles in left and right hemispheres. However, such a model would not account for the effect of attention on the SF. The observed magnetic field maps suggest additional sources outside of the auditory cortices. We therefore applied the beamformer approach for estimating neuromagnetic source activity, which does not require a priori knowledge about the source configuration, and studied the volumetric distribution of magnetic source activity and its temporal dynamics. All effects of attention and stimulus types will be discussed based on the observed cortical source activity.
The beamformer analysis revealed sources for the P1m in bilateral Heschl's gyri (HG), the location of primary auditory cortices, which were identified at the peak latency of the response to contralateral standard stimuli averaged across attended and unattended conditions, because the effect of attention on the P1m amplitude was not significant. All other source locations were based on maps of differences between responses under attended and unattended conditions. Local maxima in the volumetric maps were considered as single sources and were found in bilateral planum temporale and the posterior superior temporal gyrus (these sources overlapped and we considered them as one source under the abbreviation STG), bilateral inferior parietal lobules (IPL), left and right inferior frontal gyri (IFG), medial frontal gyrus (MFG) near the location of the supplementary motor area (SMA), and the precentral gyrus (PreC). The Talairach coordinates of source locations are listed in Table 1. Overlays to an MRI atlas are shown in Figure 4 and demonstrate well-defined local maxima in the maps of MEG source activity. The spatial separation of sources by the MEG method is demonstrated with profiles of the right IFG source activity along the x-axis and y-axis (Fig. 4E). The activity in IFG and STG could be approximated by Gaussian functions of 12-mm spatial standard deviation, equivalent to a half-intensity width of 20 mm. The activity profiles in Figure 4E demonstrate especially that the IFG source activity does not contribute to the STG source at its peak location and vice versa, indicating excellent spatial separation of the sources.
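The relation between the 12-mm standard deviation and the 20-mm half-intensity width can be checked numerically. The sketch assumes that "intensity" refers to the squared amplitude of a Gaussian amplitude profile exp(−x²/2σ²); under this reading, the full width at half intensity is 2σ√(ln 2). The function names are illustrative.

```python
import math

def half_power_width(sigma_mm):
    """Full width at which the squared Gaussian amplitude falls to one half."""
    # The squared profile is exp(-x^2 / sigma^2), which equals 0.5 at |x| = sigma * sqrt(ln 2).
    return 2.0 * sigma_mm * math.sqrt(math.log(2.0))

def half_amplitude_width(sigma_mm):
    """Conventional full width at half maximum of the amplitude profile."""
    return 2.0 * sigma_mm * math.sqrt(2.0 * math.log(2.0))

width_power = half_power_width(12.0)          # approximately 20 mm
width_amplitude = half_amplitude_width(12.0)  # approximately 28 mm
```

The half-power reading reproduces the quoted 20 mm, whereas the full width at half maximum of the amplitude profile itself would be about 28 mm.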
A: main effect of attention; S: main effect of stimulus type.

| Source | x (mm) | y (mm) | z (mm) | Effect | Onset latency (ms) | Peak latency (ms) | F(1,11) | P |
|---|---|---|---|---|---|---|---|---|
| Right primary auditory cortex | | | | S | 223 | 277 | 16.5 | 0.0019 |
| Left primary auditory cortex | | | | S | 223 | 330 | 15.7 | 0.0022 |
| Right planum temporale | | | | S | 383 | 570 | 21.4 | 0.0007 |
| Left planum temporale | | | | S | 357 | 517 | 58.0 | <0.0001 |
Note: At the onset latencies, the P value for the main effects was ≤0.05. The F ratios and corresponding P values are reported for the latencies of maximum effect size.
Time Courses of Magnetic Source Activity
The time courses of activity at the selected sources are shown in Figure 5 for standard and deviant stimuli and attended and unattended conditions. The waveforms were averaged across the ear of stimulation because the ANOVA did not reveal interactions between the stimulated ear and attention or the stimulus type. A remarkable characteristic of the waveforms was that the responses for deviants were larger than for the standards at latencies later than 150 ms, whereas the earlier onset response was of consistent size for all stimuli and experimental conditions. The dominance of responses to infrequent deviants was obvious regardless of whether the subject attended to a stimulus or not. However, the effects of attention evolved over time, and responses to attended standard as well as to target stimuli increased over time. For example, in posterior sources in bilateral STG and IPL, we observed an early attention-related increase in the standard response, which reached a maximum at around 400 ms and diminished thereafter, whereas the effect of attention on the response to targets was initially small, increased slowly, and reached a maximum at around 600 ms. The temporal dynamics were different in inferior frontal sources. Here, the standard responses were less modulated by attention than in posterior sources, and the attention effect on the deviant responses was dominant, with earlier onset and maxima than in posterior sources. Finally, the activity observed at the SMA (MFG) revealed the completed selection process as indicated by a large response for the targets in the attended ear and comparable responses to the standard stimuli and the deviants in the unattended ear.
The effects of attention and stimulus type were analyzed with ANOVA for the source activities at time points every 13 ms in the latency interval from 0 to 800 ms. The ANOVA revealed main effects of attention expressed as larger responses to stimuli presented to the attended than unattended ear. The latencies at which the main effects became significant as well as the latencies and F and P values of maximal effects are given in Table 1. Time courses of F values are given in Figure 6 for the main effects of attention (attend vs. ignore), stimulus type (standard vs. deviant), and for the interaction between attention and stimulus type.
Summary of ANOVA Results
A main effect of the side of stimulation resulting in larger responses in the hemisphere contralateral to the stimulated ear was significant in most sources at latencies around 100–200 ms and again near 500 ms. The early effect was significant in bilateral HG, right STG, and bilateral IPL, and the later effect in bilateral HG, right STG, right IPL, and right IFG. Interactions between side of stimulation and stimulus type or attention were not significant, which justified averaging of data for left and right ear stimulation as in Figure 5.
Effects of hemisphere were only borderline. Pairwise comparison between corresponding sources in left and right hemispheres revealed larger right hemispheric responses only in HG in the early latency range of 100–200 ms [F(1,11) = 7.09, P = 0.022]. The similar effect of larger responses in right than left IFG did not reach significance. Attention significantly increased the responses to stimuli in the attended ear, reaching significance levels early after stimulus modulation onset in all selected sources. The effect of attention reached a maximum at about 300-ms latency in bilateral HG and the right IFG and about 100 ms later in more posterior sources in STG and IPL. The effect of attention was most strongly expressed in right STG and bilateral IPL.
Main effects of stimulus type with larger responses to deviant compared with standard stimuli were dominant later than the main effects of spatial attention. Effects of stimulus type reached an early maximum around 300-ms latency in bilateral HG and IFG, between 500 and 600 ms in STG and IPL, and between 600 and 800 ms in HG, IFG, and MFG.
The interaction between the factors “attention” and “stimulus type” was of specific interest because a larger effect of attention for the target than for the standard stimuli would indicate the final process of stimulus selection. The interaction reached significance level at around 300 ms in right HG and left STG and again at around 500 ms in left STG; however, in these cases, the effect of attention was larger for the standard than the target. Interaction between “attention” and “stimulus type” indicating larger attention effects for the deviants became significant after 400 ms in bilateral HG, STG, and IPL, in left IFG, and MFG. The interaction between “attention” and “stimulus type” reached significance in left IPL at 470 ms and a maximum at 543 ms [F(1,11) = 11.1, P = 0.006]. Similarly, the interaction was significant at 463 ms in left IFG and reached a peak at 517 ms [F(1,11) = 6.25, P = 0.025] as well as at 543 ms in right IFG [F(1,11) = 7.58, P = 0.016] before the global maximum at around 750 ms in left IFG and left IPL.
Effects of Attention
We measured the effect of directed attention as the normalized amplitude difference between responses to stimuli in the attended and unattended ears and compared this measure between the stimulus types at the 2 latencies of 400 and 600 ms (Fig. 7A). These latencies were selected to evaluate early and late processing. The hypothesis was that the stimulus selection process would be expressed as an initially large effect of attention for both standards and deviants, but the effect would diminish over time for standards and increase for deviants. The ANOVA of the attention effect with factors “source location,” “stimulus type,” and “time” revealed a main effect of time [F(1,11) = 20.1, P = 0.0009] with a larger attention effect at 600-ms latency and an interaction between “stimulus type” and “time” [F(1,11) = 19.8, P = 0.001] because the attention effect increased for the deviants and decreased for the standards. This interaction is demonstrated in the bar graphs shown in Figure 7A. The effect of “time” was significant for all source locations, whereas the interaction between “stimulus type” and “time” was significant for the posterior sources in IPL and STG, indicating that the posterior sources were strongly involved in the process of stimulus selection.
Activities in bilateral STG and IPL showed strong effects of attention at 400-ms latency with no significant difference between standard and deviant stimuli [STG: t(11) = 0.18, IPL: t(11) = 1.83]. However, at 600 ms, in both areas, the effect of attention was larger for the deviant than standard stimuli [STG: t(11) = 4.54, P = 0.0008, IPL: t(11) = 4.41, P = 0.001]. The difference between the attention effects on standard and deviant responses increased significantly between 400 and 600 ms for STG [t(11) = 4.07, P = 0.0018] and IPL sources [t(11) = 4.62, P = 0.0007]. Unlike the posterior sources, the IFG activity showed a strong contrast between stimuli already at 400 ms [t(11) = 2.72, P = 0.02] and similarly at 600 ms [t(11) = 3.27, P = 0.007] with no significant change [t(11) = 0.67] between the 2 time points.
Responses to Unattended Deviants
One consistent finding was that the deviants presented to the unattended ear elicited larger responses than standard sounds (either unattended or attended). This may reflect a bottom-up capture of attention by the infrequent deviants in the unattended ear. We expressed the effect of stimulus type as the differences between the responses to deviants and standards, normalized to the sum of both responses. The larger this measure was, the more salient was the deviant response. We calculated the salience measures separately for the attended and unattended ears and compared how they changed over time (Fig. 7B). The ANOVA revealed a main effect of “attention” [F(1,11) = 20.7, P = 0.0008] with a larger stimulus effect for the attended than unattended ear and a main effect of “time” [F(1,11) = 68.9, P < 0.0001] with increasing difference between stimuli in the 400- to 600-ms latency interval. However, no interaction between “attention” and “time” was found, indicating that the salience of deviants increased over time for stimuli in both attended and unattended ears. This temporal dynamic was most pronounced in the posterior sources. The salience of the attended target increased between 400- and 600-ms latency in IPL [t(11) = 7.92, P < 0.0001] and STG [t(11) = 2.77, P = 0.029], as did the salience of the unattended deviant [IPL: t(11) = 3.95, P = 0.0023, STG: t(11) = 3.64, P = 0.0039]. Whereas the contrast between attended target and standard responses increased through suppression of responses to standard sounds over the 400- to 600-ms time interval, the responses to unattended deviants even increased.
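The salience measure defined above, the difference between deviant and standard responses normalized to their sum, can be written as a short function. The function name and the response values in the usage lines are hypothetical and serve only to show how the measure changes between the 2 latencies; they are not data from this study.

```python
def normalized_contrast(deviant, standard):
    """(deviant - standard) / (deviant + standard) for one source and latency."""
    total = deviant + standard
    if total == 0:
        raise ValueError("sum of responses is zero; the measure is undefined")
    return (deviant - standard) / total

# Hypothetical response amplitudes (arbitrary units) at 400 and 600 ms.
salience_400 = normalized_contrast(deviant=3.0, standard=2.0)  # 0.2
salience_600 = normalized_contrast(deviant=3.0, standard=1.0)  # 0.5
salience_increase = salience_600 - salience_400
```

For responses of the same sign the measure lies between −1 and 1, and it increases either when the deviant response grows or when the standard response is suppressed, which is why the 2 cases have to be distinguished from the underlying time courses.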
Cortical Areas Involved in Selective Attention
The present MEG recordings identified a network of brain areas involved in selective auditory attention that included the superior temporal, inferior parietal, and inferior frontal regions. These regions had also been identified in neuroimaging studies of auditory selective attention (Pugh et al. 1996), visual attention (Corbetta and Shulman 2002), and processing auditory spatial and object information (Zatorre et al. 1999, 2002, 2004). Evidence for analysis of complex sound patterns in secondary auditory cortices has come from primate physiology showing that these regions have specific sensitivity for modulated sounds (Rauschecker 1998) like those used in this study. fMRI studies in humans showed sound modulation processing in posterior STG, and most importantly, the activity in STG increased during active listening (Hall et al. 2000). In addition, posterior STG has shown specific processing of sound location (Ahveninen et al. 2006). Our observation of strong modulation of STG activity by both selective attention and stimulus type is consistent with those reports.
The role of IPL in controlling the focus of attention was emphasized in an fMRI study in which dichotic listening (different stimuli in each ear) activated parietal areas more than binaural listening to the same stimuli did (Pugh et al. 1996). Focusing attention on one ear involves spatial attention, and the IPL potentially plays a specific role in processing the spatial aspect of attention. However, PET activation in parietal cortices was found to be similar for attention to location and for attention to pitch (Zatorre et al. 1999). Moreover, IPL activation was not consistently different when the listener allocated attention to one ear or to both but rather depended on the demands of the attentional task (Lipschutz et al. 2002). Our finding of a strong attentional modulation of IPL activity supports the view that the IPL forms a part of a bilateral attention control network that processes sensory information in the focus of attention.
The role of frontal cortex in controlling selective attention has been suggested from observations of early activation preceding stimulus selection and related action (Miller and D'Esposito 2005). Although our study did not require shifting attention, we found early IFG activation in addition to sustained and later peaking activity. Instead of controlling the focus of attention in the present task, the IFG could play a role in evaluation of the stimulus. Indications for this role come from a PET study of music imagery showing that IFG holds an auditory image (Halpern and Zatorre 1999), which may serve as a template to be compared with the sensory information to identify the target. The Nd wave associated with attention has both frontal and temporal components (Jemel et al. 2002), although the frontal components are more anterior and superior than the IFG sources in the present study. Störmer et al. (2009) reported a frontal source in the event-related potential to an auditory cue stimulus that was related to directing attention to an upcoming stimulus, but again the source activity was localized more anteriorly than the present IFG sources. Clearly, different areas of the frontal lobe are involved in different types of attentional control. It is also possible that the activity that we recorded in the IFG reflected 2 different processes: maintaining attention to one ear and monitoring the process of discrimination within that ear.
The strongest response to attended targets compared with all other stimuli was found later than 600 ms in MFG, the location of the SMAs. The SMA subdivides into the caudally located SMA proper, which has direct connections to motor areas and is closely involved in motor action, and the rostrally located pre-SMA, which serves in more abstract aspects of motor actions (Picard and Strick 1996, 2001). The vertical line through the anterior commissure (y = 0 plane of the Talairach coordinate system) approximately separates the two parts of the SMA. Thus, the maximum of source activity found in our study at y = −13 points to the SMA proper. The strong dominance of responses to attended targets over all other responses is a vivid demonstration of the completed selection process at around 600 ms after stimulus onset.
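The assignment of the source maximum to the SMA proper follows from a simple rule on the Talairach y coordinate, sketched below. The function name `sma_subdivision` is hypothetical, and the hard threshold at y = 0 is a simplification: the text describes the plane through the anterior commissure only as an approximate boundary.

```python
def sma_subdivision(talairach_y_mm):
    """Label a medial frontal source by its Talairach y coordinate (mm).

    Positive y is anterior to the anterior commissure (rostral, pre-SMA);
    negative y is posterior to it (caudal, SMA proper). The y = 0 plane is
    only an approximate border, so values near zero should be treated as
    ambiguous rather than trusted.
    """
    if talairach_y_mm > 0:
        return "pre-SMA"
    if talairach_y_mm < 0:
        return "SMA proper"
    return "boundary (ambiguous)"


# The source maximum reported in the text lies at y = -13.
label = sma_subdivision(-13)
print(label)
```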
Although we did not perform a formal connectivity analysis based on signal properties like coherence, we interpret the functional modulation of the identified cortical areas as an interacting brain network for selective auditory attention that includes sensory registration in primary auditory cortex (HG), perceptual processing and stimulus selection in STG and IPL, monitoring and control of response selection in IFG, and preparation of the motor response in SMA.
Attention Modulation of Early Sensory Registration
Enhanced response amplitudes under selective attention were first reported for the N1 wave of the auditory evoked response (Hillyard et al. 1973). The N1m onset response in our study was small under all experimental conditions. Fast stimulus repetition could be one reason for the reduced N1m amplitude. Although short duration stimuli evoke a clearly expressed N1 response at a repetition rate of 1/s (Hari et al. 1982; Näätänen and Picton 1987), the response decreases with longer duration sounds (i.e., shorter silent intervals between stimuli) (Hillyard and Picton 1978; Imada et al. 1997). We observed similarly small onset N1m responses to 40-Hz AM tone bursts with short ISI in previous studies (Ross and Pantev 2004; Ross et al. 2005). Moreover, enhanced processing of an attended stimulus may not be represented entirely in larger response amplitudes. Jääskeläinen et al. (2007) proposed that selective attention may sharpen the spatial representation of the stimulus through lateral inhibition. In such a case, the response would be locally enhanced; however, the net effect in mass activity as recorded with electroencephalography (EEG) or MEG may even be a response reduction when compared with the response to the nonattended stimulus. Also, it has been reported that the effects of attention on the N1 amplitude could be observed at central midline electrodes in EEG but were not significant in the source waveform of a single equivalent dipole approximated to the magnetic field of the N1m in MEG (Ahveninen et al. 2003).
In a dichotic listening experiment with speech and pure-tone stimuli of about 500 ms duration, presented at ISI of 2.3 s, the N1m amplitude was not affected by attention or even reduced for word stimuli, whereas large changes had been observed in the sustained response beginning about 150 ms after stimulus onset (Hari, Hämäläinen, Kaukoranta et al. 1989). For short duration stimuli and shorter ISI, the effect of attention on the neuromagnetic evoked response was maximal around 200-ms latency when listeners attended to the duration of the stimuli; however, an earlier onset of the attention effect occurred for dichotic sounds (Rif et al. 1991). Our results are consistent with those studies indicating attention modulation of multiple components of the evoked response with main effects at latencies beyond 200 ms.
A possible explanation for why the early onset evoked responses were not significantly affected by selective attention is that our stimuli contained simultaneous sound onsets in both attended and unattended ears. Any facilitation of the response to the attended channel would therefore have been superimposed on the unfacilitated or even inhibited response to the unattended channel.
However, we found increased activity under attention as early as 143 ms in right primary auditory and inferior frontal sources. At the same latency, we identified a deflection in right hemispheric magnetic field waveforms as the N1m response to the onset of stimulus AM. Thus, attention affected the early response to the task-relevant stimulus feature. The early attention effect in right IFG supports the concept of modulation of sensitivity for the attended stimulus in primary sensory cortex under control of inferior frontal cortex (Knight et al. 1989).
Attention enhanced early responses to standards and targets similarly, especially in posterior STG and IPL sources. The STG source is related to target discrimination, and the IPL is part of the network for auditory spatial information. Thus, the early attention effect increased the sensitivity in a sensory channel equally for all stimulus types. Specific enhancement for the targets was observed after 223 ms in bilateral auditory cortices. Obviously, some time was required to detect the different temporal pattern of AM.
IPL and STG responses to both standard and target stimuli in the attended ear were enhanced maximally at latencies around 400 ms. At this time attention affected standards and targets almost equally. This relation changed remarkably over the following 200 ms, with increased attention effect for the targets and decreased attention effect for the standards. Statistical interaction between attention and stimulus type showed a sharp onset for STG and IPL in the 400- to 500-ms latency interval (200–300 ms after detection of the AM onset), indicating that stimulus selection likely took place in this latency interval. Earlier dissociation between target and standard stimuli in IFG supports the role of IFG in controlling the process of stimulus selection, whereas STG and IPL are the secondary auditory areas in which the stimuli are processed and actively selected.
Enhanced Responses to Deviant Stimuli
Although they accurately detected targets in the attended ear, the participants also sometimes responded incorrectly to the deviant stimuli in the unattended ear. In the natural world, the ability to detect a “novel” or “deviant” sound, even in the presence of competing background noise, is a clear benefit for survival. Thus, during evolution the brain developed a highly efficient system for deviant detection. Our data showed that the enlarged responses to deviant sounds occurred with or without attention but increased further when they were attended.
The infrequent changes in the stimulus modulation frequency constituted a mismatch paradigm, and a deviation-type response such as the mismatch negativity (MMN) could have contributed to the present waveforms. Indeed, larger N2 responses to deviant stimuli with latencies in the 200- to 250-ms range have been found in dichotic listening (Woldorff et al. 1991). The finding that attention enhanced the N2 and the following P3 wave led to controversy about whether detecting stimulus deviation, as reflected in the MMN, is a purely automatic process or is enhanced under cognitive control during attention. One proposal has been that the MMN reflects automatic processing, whereas another N2b component is dependent upon effects of attention (Näätänen and Winkler 1999). However, this does not fit with the MEG findings of Woldorff et al. (1998) showing a supratemporal source for the N2 effect. The effect of deviants on MEG source activity in primary and secondary auditory areas and in IFG in our study likely contains contributions from the same cortical sources that generate the Nd, MMN, or N2b waves of the AEF. This corroborates previous findings that sources in superior temporal and inferior frontal cortices contribute to generation of the MMN as shown from EEG source analysis (Giard et al. 1990; Jemel et al. 2002), fMRI (Rinne et al. 2005), lesion studies (Alain et al. 1998), and optical imaging (Tse and Penney 2008). It has also been suggested that the MMN may be related to selective adaptation of the generators of the N1 wave (Jääskeläinen et al. 2004). One important finding was that detection of deviants was reflected differently in temporal and frontal sources. In a combined EEG and fMRI study, the activity in STG increased with the amount of stimulus deviation, whereas IFG was not affected (Schönwiesner et al. 2007).
Such findings have been discussed as support for a hierarchical model with acoustic change detection in primary auditory cortices, analysis of stimulus change in secondary auditory cortices, and judgment of novelty and eventually reallocation of attention under control of inferior frontal cortices.
An alternative hypothesis would be that the 20-Hz AM sounds elicit larger responses than 40-Hz AM sounds do, and the observed larger responses for deviant stimuli would be related to the different likelihood of occurrence or being in the focus of attention for deviance detection. However, previous literature provided evidence for smaller responses to 20-Hz compared with 40-Hz stimuli. The neuromagnetic steady-state response, with main contribution from primary auditory cortex, is smaller at 20 Hz than at 40 Hz (Hari, Hämäläinen, and Joutsiniemi 1989; Ross et al. 2000). The N1m onset response declines systematically for decreasing rate of periodic stimulation (Forss et al. 1993). However, the onset response in our study was likely less affected by the stimulus rate because the sound onset was the same for all stimuli and the onset of AM was delayed. Different types of activation in primary and secondary sensory cortices have been shown by Forss et al. (2001), in which primary somatosensory cortex showed a sequence of transient responses to a 12-Hz stimulus train, whereas secondary somatosensory cortex showed a sustained amplitude shift for the duration of the click train. For the auditory system, Gutschalk et al. (2002) demonstrated that the sustained response is larger for regular rhythms like the AM in our study compared with irregular temporal structures. Most importantly, the size of the sustained response decreased with decreasing stimulus rhythm and the amplitude of the sustained field was about one-third smaller for a 20-Hz stimulus compared with a 40-Hz stimulus. In summary, a change in modulation frequency from 40 to 20 Hz most likely causes the responses to decline, whereas in our study, the responses to the 20-Hz deviant stimuli were enhanced regardless of presentation in the attended or unattended ear.
Separate Top-Down and Bottom-Up Mechanisms?
One explanation for the concurrent top-down and bottom-up attention processes in our study would be that each process is associated with a separate underlying neural network. First, the task-oriented direction of attention to one ear enhances the sensitivity in the spatial sensory channel, and this sensitivity increases over the time course of several hundred milliseconds. Second, bottom-up sensitization of infrequent stimuli facilitates selection of the targets. Indeed, directed attention in dichotic listening is conceived as affecting perceptual input but not response selection (Treisman and Geffen 1967; Asbjornsen and Hugdahl 1995). In our study, the initial effect of attention on the standard stimuli was diminished after about 600 ms. This supports the proposal of an active stimulus selection mechanism rather than a simple bottom-up discrimination. This process would gradually increase the sensitivity for the relevant target and decrease the sensitivity for the irrelevant standard stimuli in the attended channel. This concept is consistent with previous behavioral results showing that selective attention in dichotic listening improved signal detection in the attended channel to the detriment of detection in the unattended channel, consistent with 2 stages of processing: sound localization and then stimulus discrimination (Hiscock et al. 1999).
Using advanced MEG data analysis approaches, we localized cortical sources underlying attention control during dichotic listening. More importantly, we obtained the time courses of their activation. Directing attention to one ear enhanced the sensitivity in the sensory channel and was associated with increased activation in HG and IFG at latencies of 150–500 ms. A later effect of attention differed between target and standard stimuli and identified IPL and STG as the location of stimulus discrimination during the 400- to 600-ms latency interval. The responses to target-like deviants presented to the unattended ear showed increased activity during the time interval of target identification, indicating that stimulus discrimination still proceeded in the ignored channel despite the higher sensitivity in the selected channel. The time course of early activation in IFG, with the early onset of effects of attention and effects of stimulus type, suggests a role of IFG in monitoring auditory input, maintaining attention to the selected sensory channel, and controlling stimulus discrimination.
Canadian Institutes for Health Research (grant no. 81135) and the Canadian Foundation for Innovations.
Conflict of Interest: None declared.