The posterior visual event-related potential (ERP) component, the N2pc, has been widely used to study lateralized shifts of attention within visual arrays. Recently, Gamble and Luck (2011) reported an auditory analog of this activity (the fronto-central “N2ac”), reflecting the lateralized focusing of attention toward a Target sound among 2 simultaneous auditory stimuli. Here, we directed an electrophysiological approach toward understanding auditory Target search within a more complex auditory environment in which rapidly occurring sounds were distributed across both time and space. Trials consisted of ten 40-ms monaural sounds rapidly presented to the 2 ears: 8 medium-pitch tones and 2 deviant sounds (one high and one low). For each block, one deviant type was designated as the Target, which participants needed to identify within each trial to discriminate its tonal quality. The extracted electrophysiological results included a very early enhancement, starting at approximately 50 ms, of a bilateral negative-polarity auditory brain response to the designated Target Deviant (compared with the Nontarget Deviant), followed at approximately 130 ms by the N2ac activity reflecting the lateralized focusing of attention toward that Target. The results delineate the tightly orchestrated sequence of neural processes underlying the detection of, and focusing of attention toward, Target sounds in complex auditory scenes.
The auditory world is rich with competing and multifaceted stimulus information. The highly studied “Cocktail Party” problem (Cherry 1953) is a classic example of a complex sensory circumstance in which we are able to maintain attentional focus on a particular auditory stream, typically arising from a particular spatial location (e.g., a particular speaker), while ignoring concurrent irrelevant auditory inputs. In contrast, a much less studied problem is how we are able to search a complex auditory scene to detect and discriminate a specific target without prior knowledge of its location, although some recent behavioral studies have begun to explore this type of auditory scene (e.g., Cusack and Carlyon 2003; Dalton and Lavie 2007; Eramudugolla et al. 2008).
Consider, for example, the children's game Marco Polo. A round of the game, usually played in a pool, requires the person who is “It,” with his/her eyes closed, to locate and tag another individual based solely on the other players' voices. When the person who is “It” yells “Marco,” the other players have to respond by exclaiming “Polo,” thereby giving auditory information on their locations. Often, the person who is “It” will try to select a specific individual to tag. So, when the other players give their “Polo” responses, the “It” player will search the array of sounds for the specific voice of the person he or she is trying to locate. Although this is just a children's game, we can easily extrapolate this example to other situations in which one has to search for a particular target sound in a complex auditory scene, such as a mother listening for her child's cry in an acoustically noisy environment like a playground, or a hunter listening for a potential prey's distinctive rustling of leaves in a jungle full of chirping insects and birds. While there are certainly situations in which we need to maintain our attention to a specific auditory source or location, as in the Cocktail Party situation, there are important circumstances in which we do not know ahead of time the location of a target sound, and we must use information we know about the specific qualities of a target sound to search for and locate it. This process of sifting through all of the sounds in a complex auditory scene in order to detect and focus attention toward a particular sound of interest will be referred to as “auditory search” in this paper (also see Dalton and Lavie 2007; Eramudugolla et al. 2008).
In a recent study, Gamble and Luck (2011) investigated auditory Target search among 2 simultaneously presented lateralized sounds, and found that the focusing of attention toward the lateralized Target stimulus resulted in a contralateral negative-polarity event-related potential (ERP) waveform around 200–300 ms poststimulus. These authors termed this contralateral negativity the N2ac because of its functional similarity to, and parallel extraction method with, the visual attention-related ERP component known as the N2pc. The N2pc is a posteriorly distributed ERP component in the N2 latency range that is elicited contralateral to a detected visual Target item embedded in a visual array of multiple distractors (Luck and Hillyard 1994a, 1994b; Woodman and Luck 1999; Mazza et al. 2009). The N2pc is thought to reflect a focusing of attention toward a specific visual Target stimulus in a visual scene and has been used extensively to study different aspects of visual attention (e.g., Eimer and Kiss 2007; McDonald et al. 2009; Dell'Acqua et al. 2010; Sawaki and Luck 2010). The analogous auditory N2ac component could be used in a similar fashion to study the focusing of auditory attention toward an auditory Target during auditory search.
In the present study, we directed such an electrophysiological approach toward understanding the sequence of neural processes underlying auditory Target search and attentional focusing, in particular under the more complex auditory environment in which rapidly occurring sounds are distributed across both time and space. In the Gamble and Luck (2011) study, 2 different sounds of identical onset, duration, and offset were simultaneously presented to the left and right ears, and subjects searched this pair of sounds for a designated Target stimulus. In the real world, however, sounds from different sources and locations rarely occur at precisely the same time (i.e., with identical onsets and offsets), although temporal overlap among stimuli is fairly common. Moreover, in contrast to searches of arrays of visual stimuli, it is rather difficult to selectively find and focus attention on one auditory Target stimulus among multiple sounds that are presented simultaneously. In addition, in a typical auditory environment, it is rare to encounter only 2 sounds occurring close together in time; rather, in many auditory environments, multiple stimuli may be occurring in close temporal proximity. Accordingly, in order to create a paradigm with competing sounds that better simulates a more typical complex auditory environment, we embedded a Target sound in a set of other auditory stimuli that were distributed across a short period of time (a few hundred milliseconds) and were coming from multiple spatial locations. In this way, we were able to include an “array” of 10 sounds that were distributed across both space and time, consisting of 1 Target Deviant, 1 Nontarget Deviant, and 8 repeated “Standards.” This design created an auditory paradigm that was more analogous to the widely studied visual search arrays, while also implementing the varying spatial and temporal distributions of sounds that one tends to encounter in real life.
There were additional benefits to designing the experiment in this way. Presenting a set of stimuli simultaneously makes it virtually impossible to separately extract the electrophysiological responses to the individual stimuli. Having the sounds distributed in time allowed for averages of the evoked brain activity to be selectively extracted for the Standards and for each Deviant type. Thus, this approach not only enabled the extraction of neural activity specifically related to the lateralized focusing of attention toward the designated Target stimulus (e.g., potential N2ac activity), but such an approach also enabled the extraction and comparison of any deviance-related activity for the Target and Nontarget stimuli.
The Gamble and Luck (2011) study suggested that the N2ac, at least when elicited by simultaneously presented stimuli, reflects some stage of attentional processing associated with the detection of, or focusing of attention toward, a relevant lateralized Target. This activity, however, provides only a limited understanding of the cascade of processes underlying auditory search. In the present study, by selectively extracting the specific electrophysiological activations for the various stimulus types occurring in a complex auditory scene, we aimed to delineate the rapid temporal cascade of processes that occur in the brain during the detection and discrimination of a Target sound during auditory search. In particular, we hypothesized that an N2ac would be present for the Targets and absent for the Nontargets in these rapid stimulus streams, indicating a selective focusing of attention toward the Target after its detection. We also hypothesized that our approach would enable us to uncover other, potentially even earlier, neural markers of the selective processing of the relevant Target, including its initial detection prior to the focusing of lateralized attention toward it reflected by the N2ac.
Materials and Methods
Eleven participants (5 females) with an average age of 21 years (range 18–29 years), with self-reported normal hearing and no reported history of neurological disorders or diseases, took part in the experiment. All participants provided informed consent before participating and were paid for their participation. Because the discrimination task was very difficult, potential participants were prescreened behaviorally to determine whether or not they could perform the discrimination. Of the 20 initial participants who partook in the behavioral prescreening, 11 were invited back to participate in the electroencephalography (EEG) experiment.
Stimuli and Task
As shown in Figure 1, each trial consisted of a rapid sequential presentation via headphones (Sony Dynamic Stereoheadphones, MDR-V600) of ten 40-ms sounds, 5 in the left ear and 5 in the right, each with 3-ms rise and fall times and with interstimulus intervals of 10 ms (i.e., a stimulus onset asynchrony of 50 ms), giving a total trial duration of 500 ms. Eight of the 10 sounds were repeated medium-pitch tones (Standards), and the remaining 2 sounds were “Deviant” tones, one high and one low, presented to opposite ears on each trial. The first 2 sounds were always Standards and were presented to different ears. For half of the trials, the first Standard was presented to the left ear, and for the other half of the trials, it was presented to the right ear. For the rest of the presented tones (slots 3–10), the order of the ear of presentation was pseudorandomized, with 4 of the 8 remaining sounds presented to each ear and no more than 3 sounds in a row occurring on the same side. The positions of the Deviants were selected randomly within slots 3–10, the only constraint being that the 2 Deviant types were never presented to the same ear. In addition, there were 2 versions of the low and high Deviant tones with different tonal qualities: a pure-tone version and an amplitude-modulated (AM) version. To create the AM tones, each pure tone was multiplied by a 37.5-Hz envelope waveform.
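To make the sequencing constraints concrete, the following Python sketch generates one trial's ear and sound assignments. It is an illustration under stated assumptions, not the stimulus-presentation code used in the study: the function names are hypothetical, onsets assume the 50-ms stimulus onset asynchrony implied by 40-ms tones and 10-ms gaps, and the rule of no more than 3 consecutive sounds on one side is assumed to apply across all 10 slots rather than only within slots 3–10.

```python
import random

SLOTS = 10
SOA_MS = 50  # 40-ms tone + 10-ms interstimulus interval


def max_run(sides):
    """Length of the longest run of consecutive sounds on the same ear."""
    best = run = 1
    for prev, cur in zip(sides, sides[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def make_trial(first_side, rng=random):
    """Return one trial as a list of (onset_ms, ear, sound) tuples.

    first_side: ear ("L" or "R") of the first Standard; counterbalanced across trials.
    rng: anything with shuffle/random/choice (the random module or a random.Random).
    """
    second_side = "R" if first_side == "L" else "L"
    while True:
        # Slots 3-10 carry 4 sounds per ear in shuffled order.
        rest = ["L"] * 4 + ["R"] * 4
        rng.shuffle(rest)
        sides = [first_side, second_side] + rest
        # Assumption: the "no more than 3 in a row" rule spans all 10 slots.
        if max_run(sides) <= 3:
            break
    # One Deviant per ear, each at a random slot among 3-10 (indices 2-9).
    left = [i for i in range(2, SLOTS) if sides[i] == "L"]
    right = [i for i in range(2, SLOTS) if sides[i] == "R"]
    if rng.random() < 0.5:
        high_slot, low_slot = rng.choice(left), rng.choice(right)
    else:
        high_slot, low_slot = rng.choice(right), rng.choice(left)
    sounds = ["standard"] * SLOTS
    sounds[high_slot], sounds[low_slot] = "high", "low"
    return [(i * SOA_MS, sides[i], sounds[i]) for i in range(SLOTS)]
```

Rejection sampling keeps the sketch short; with only 8 free slots, an acceptable ear order is typically found within a few draws.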
Three different pitches were used for the tones in each trial: 3000, 1396, and 500 Hz, which were chosen to be distinct and discriminable from each other at these stimulus rates and were all measured to be 60 dB SPL. The middle tone, 1396 Hz, served as the repeated Standard tone, and the high-pitched and low-pitched tones were the Deviants. For each block of trials, either the high-pitched or low-pitched Deviant tone was designated as the relevant Target, rendering the other pitch Deviant the Nontarget for that block. Each participant completed a total of 2 blocks, one block of 742 trials for each of the Target designations, with the trials separated by intertrial intervals of 1500 ms. The order of the 2 Target-designation block types was counterbalanced across subjects. The participants' task was to attend to each tone series trial, detect the designated Target tone within it, and discriminate whether it was pure or AM, indicating their decision by pressing one button when it was a pure tone and a different button when it was amplitude modulated (both with their dominant hand).
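A minimal sketch of the tone synthesis is given below. It assumes a 44.1-kHz audio sampling rate, linear 3-ms onset and offset ramps, and a raised-cosine AM envelope with 100% modulation depth; none of these details are specified above, so the envelope shape in particular is a hypothetical choice. Calibration to 60 dB SPL depends on the playback hardware and is not modeled.

```python
import numpy as np

FS = 44100        # assumed audio sampling rate (Hz); not reported in the text
DUR = 0.040       # 40-ms tones
RAMP = 0.003      # 3-ms rise and fall times
AM_RATE = 37.5    # AM envelope frequency (Hz)
PITCHES = {"high": 3000.0, "standard": 1396.0, "low": 500.0}


def make_tone(pitch, am=False, fs=FS):
    """Return one ramped tone (unit peak before ramping) as a float array."""
    n = int(round(DUR * fs))
    t = np.arange(n) / fs
    tone = np.sin(2 * np.pi * PITCHES[pitch] * t)
    if am:
        # Assumption: raised-cosine envelope with 100% modulation depth.
        envelope = 0.5 * (1 - np.cos(2 * np.pi * AM_RATE * t))
        tone = tone * envelope
    # Linear onset and offset ramps (ramp shape is an assumption).
    nr = int(round(RAMP * fs))
    ramp = np.linspace(0.0, 1.0, nr)
    tone[:nr] *= ramp
    tone[-nr:] *= ramp[::-1]
    return tone
```

At 37.5 Hz, the envelope completes 1.5 cycles within each 40-ms tone, which is short enough that the AM and pure versions differ only subtly, consistent with the difficulty of the discrimination.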
One key feature of this design was that participants did not know ahead of time on which side the Target would be presented. To make a Target discrimination, the participant had to attend to the input of both ears, search for and find the designated Target sound within each trial, and then orient attention to that Target to discriminate whether it was AM or pure.
Recording and Analysis
EEG was recorded with a Synamps Neuroscan system (Charlotte, NC, USA) using a customized, extended-coverage, 64-electrode elastic electrode cap (Electro-cap International, Eaton, OH, USA). The EEG data were sampled at 500 Hz, with an online bandpass filter of 0.01–100 Hz. The horizontal electro-oculogram was recorded using 2 electrodes placed at the outer canthi of the two eyes, referenced to each other. The vertical electro-oculogram was recorded from electrodes below the left and right eyes referenced to electrodes placed above the left and right eyes, respectively. The scalp EEG activity was recorded referenced to the right mastoid, but was re-referenced algebraically offline to the average of the 2 mastoids. Independent component analysis was used to identify and correct for eye blinks (Jung et al. 2000). Trials with any additional excessive eye movements or muscle movements were excluded from analysis. In addition, the selectively averaged ERPs were baseline corrected by subtracting the mean amplitude of the baseline (−100 to 0 ms) from the ERP for each time-locked stimulus.
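Because the right mastoid served as the online reference, the recorded left-mastoid channel contains LM − RM, so re-referencing any channel X to the average of the 2 mastoids reduces to subtracting half of that channel: X − (LM + RM)/2 = (X − RM) − (LM − RM)/2. The sketch below applies this step and the −100 to 0 ms baseline correction to NumPy arrays; the (channels, samples) layout and the assumption that epochs begin at −100 ms are illustrative choices, not details of the original pipeline.

```python
import numpy as np

FS = 500          # EEG sampling rate (Hz), as reported
T0 = -0.100       # assumed epoch start relative to stimulus onset (s)


def rereference_to_average_mastoids(eeg, lm_index):
    """Re-reference (channels, samples) data recorded against the right mastoid.

    The left-mastoid row holds LM - RM, and
    X - (LM + RM) / 2 == (X - RM) - (LM - RM) / 2 for every channel X.
    """
    return eeg - 0.5 * eeg[lm_index]


def baseline_correct(erp, fs=FS, t0=T0, window=(-0.100, 0.0)):
    """Subtract each channel's mean over the baseline window (default -100 to 0 ms)."""
    i0 = int(round((window[0] - t0) * fs))
    i1 = int(round((window[1] - t0) * fs))
    return erp - erp[:, i0:i1].mean(axis=1, keepdims=True)
```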
The naming convention for the recording sites of our equidistant-electrode custom cap adheres generally to the standard 10–20/10–10 systems, with some descriptor modifications as necessary. For sites deviating more than a few millimeters from their closest 10–20 or 10–10 analogs, the traditional name of the closest site is used, with a prime (′) symbol added (e.g., O1′, FC3′). We have specifically identified the relevant electrodes in each case on the head figures in the Results section.
As noted above, the Target and Nontarget Deviant tones could occur at any of stimulus time slots 3–10 in each tone series. The ERP responses to the Target tones were averaged together to create a presented-on-left Target waveform and a presented-on-right Target waveform, regardless of when they occurred within the trial. Similarly, Nontargets were also selectively averaged by the side of presentation, regardless of when they occurred in the trial. The first 2 Standards of each tone-series trial were discarded, and the rest of the Standards were averaged together separately by the side of presentation and by whether that side had a Target or Nontarget stimulus on that trial (LStandardTarg, RStandardTarg, LStandardNontarg, and RStandardNontarg). To selectively extract the deviance-related activity, the appropriate Standard-tone ERP was subtracted from the corresponding Deviant-type ERP for that side (e.g., LStandardNontarg was subtracted from the ERP to a Nontarget presented on the left).
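The selective averaging and same-side subtraction described above can be sketched as follows. The per-event metadata arrays (`kind`, `side`, `slot`, `side_deviant`) and the dictionary keys are hypothetical conventions introduced for illustration, and this version performs only the plain Deviant minus Standard subtraction, without the overlap correction described next.

```python
import numpy as np


def selective_averages(epochs, kind, side, slot, side_deviant):
    """Average epochs by stimulus type and ear of presentation.

    epochs: (n_events, n_channels, n_samples), one epoch per sound.
    kind: 'target' | 'nontarget' | 'standard' for each event.
    side: 'L' | 'R' (ear of presentation).
    slot: position 1-10 within the trial.
    side_deviant: which Deviant ('target' or 'nontarget') occurred on that
        event's ear in that trial.
    """
    kind, side, slot, side_deviant = map(np.asarray, (kind, side, slot, side_deviant))
    avg = {}
    for s in ("L", "R"):
        for k, name, label in (("target", "Target", "Targ"),
                               ("nontarget", "Nontarget", "Nontarg")):
            avg[f"{name}_{s}"] = epochs[(kind == k) & (side == s)].mean(axis=0)
            # Standards from slots 3-10 only; the first 2 Standards are discarded.
            std = (kind == "standard") & (side == s) & (slot > 2) & (side_deviant == k)
            avg[f"Standard{label}_{s}"] = epochs[std].mean(axis=0)
    return avg


def deviance_waves(avg):
    """Deviant minus same-side Standard, e.g. Nontarget_L - StandardNontarg_L."""
    return {f"{name}_{s}": avg[f"{name}_{s}"] - avg[f"Standard{label}_{s}"]
            for s in ("L", "R")
            for name, label in (("Target", "Targ"), ("Nontarget", "Nontarg"))}
```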
Due to the very rapid presentation of the 10 stimuli in each trial, the ERP responses from the previous and subsequent stimuli in the series overlapped substantially with the time-locked activity for the Target and Nontarget Deviants of interest (reviewed in Woldorff 1993; also see Luck 2005). The bulk of this distorting overlap on the Deviant responses could be removed by subtracting the Standard-tone waveform from the Target and Nontarget Deviant waveforms for that same side, given that they had similar overlap, which also enabled extraction of the deviance-related activity of interest for the Target and Nontarget Deviants. Although this subtraction removed the majority of the overlap distortion from the trial series on the Deviant-tone responses, the overlap on the Deviants and the corresponding same-side Standards still differed somewhat (e.g., the Standards would have overlap from both the Target Deviant and the Nontarget Deviant in the rapid-series trial, whereas the Target Deviant would have overlap only from the Nontarget Deviant), and thus this subtraction did not completely remove all the overlapping distortion that could be present. Accordingly, in order to improve on this overlap-distortion correction process, we employed a modified version of the ADJAR technique (Woldorff 1993) to first estimate and correct the ERP overlap from the Deviants on the Standard-tone responses before they were subtracted [*See final paragraph of Methods section]. Contrasts between these better estimates of the deviance-related ERP responses could then be performed to selectively extract specific functional activations. Any further subtractions, such as to isolate the N2ac, are described in detail below in the appropriate sections of the Results.
Grand averages were filtered with an additional low-pass Gaussian filter with a half-amplitude cutoff of 50 Hz. These filtered averages were used in all figures and statistical analyses. To determine the onset latency of specific isolated components and the amplitude differences for different factors, the mean amplitudes in successive 10-ms windows were subjected to repeated-measures analyses of variance (ANOVAs), correcting for sphericity deviations with Greenhouse–Geisser P-value corrections. An effect in a given window was considered significant only if it was also significant in each of the next 3 consecutive windows. In addition, an ANOVA on the mean amplitude over a longer, later time window (350–550 ms) was used to test for differences between Target and Nontarget responses in the Late Posterior Contralateral Positivity over occipital cortex (Gamble and Luck 2011).
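For illustration, the sketch below implements a zero-phase Gaussian low-pass filter whose gain falls to 0.5 at 50 Hz, using the fact that a Gaussian kernel with standard deviation σ (in seconds) has gain exp(−2π²σ²f²), so σ = √(ln 2 / 2) / (π · 50 Hz) ≈ 3.75 ms, or about 1.87 samples at 500 Hz. It also implements the onset rule stated above (a window counts only if it and the next 3 windows are all significant). The 0.05 significance level and the zero-padded edge handling are assumptions.

```python
import numpy as np


def gaussian_lowpass(erp, fs=500.0, cutoff=50.0):
    """Zero-phase Gaussian low-pass along the last axis; gain is 0.5 at `cutoff`."""
    sigma = np.sqrt(np.log(2) / 2) / (np.pi * cutoff) * fs   # SD in samples
    half = int(np.ceil(4 * sigma))                           # truncate at ~4 SD
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()                                   # unit gain at 0 Hz
    # Symmetric kernel + mode="same" keeps the output aligned (no phase shift).
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), -1, erp)


def onset_window(pvals, alpha=0.05, n_following=3):
    """Index of the first 10-ms window that is significant and is followed by
    `n_following` consecutive significant windows; None if there is no such run."""
    sig = np.asarray(pvals) < alpha
    for i in range(len(sig) - n_following):
        if sig[i:i + n_following + 1].all():
            return i
    return None
```

With 10-ms windows starting at 0 ms, an index of 5 returned by `onset_window` would correspond to the 50- to 60-ms window.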
[*More specifically, we took the initial best estimate of the large Target responses and convolved it with its adjacent-event stimulus distribution relative to the Standards on that side, thereby providing an estimate of the overlap from the Target responses on the StandardsTarg ERP. Subtracting this Target-overlap estimate from the StandardsTarg ERP yielded a corrected (i.e., “Target-overlap-corrected”) StandardsTarg response. A final Target minus Standard contrast (using this corrected Standard response) then provided a corrected version of the deviance-related activity elicited by the Targets. (Note that because both the Standards and the Targets had Nontarget overlap, that overlap was removed directly by this subtraction.) Similarly, we corrected the StandardsNontarg ERP for Nontarget response overlap. We then subtracted this Nontarget-overlap-corrected StandardsNontarg response from the original Nontarget ERP, yielding the corrected deviance-related activity elicited by the Nontargets. (Again, because both the Standards and the Nontargets had Target overlap, that overlap was removed directly by this subtraction.)]
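The core of this correction is a convolution of a Deviant response with the distribution of that Deviant's onset relative to the Standards in the average. The following single-pass sketch assumes that all waveforms share one epoch definition and that onsets are given in milliseconds; it is not the original ADJAR implementation (Woldorff 1993), which is iterative, and which waveform serves as the “initial best estimate” of the Deviant response is left to the caller. All function names are hypothetical.

```python
import numpy as np


def shift(wave, lag):
    """Delay `wave` by `lag` samples (advance if negative), zero-filling."""
    out = np.zeros_like(wave)
    if abs(lag) >= len(wave):
        return out
    if lag >= 0:
        out[lag:] = wave[:len(wave) - lag]
    else:
        out[:lag] = wave[-lag:]
    return out


def lag_distribution(std_onsets_ms, dev_onsets_ms, fs=500):
    """Relative frequency of (Deviant onset - Standard onset), in samples,
    over all Standards entering the average."""
    diff = np.asarray(dev_onsets_ms, float) - np.asarray(std_onsets_ms, float)
    lags = np.round(diff * fs / 1000).astype(int)
    values, counts = np.unique(lags, return_counts=True)
    return dict(zip(values.tolist(), (counts / counts.sum()).tolist()))


def overlap_estimate(response, distribution):
    """Convolve a Deviant ERP with its lag distribution relative to the Standards."""
    est = np.zeros_like(response, dtype=float)
    for lag, weight in distribution.items():
        est += weight * shift(response, lag)
    return est


def corrected_deviance_wave(deviant_erp, standard_erp, distribution, response_estimate=None):
    """Deviant minus the overlap-corrected same-side Standard."""
    source = deviant_erp if response_estimate is None else response_estimate
    corrected_standard = standard_erp - overlap_estimate(source, distribution)
    return deviant_erp - corrected_standard
```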
Participants correctly discriminated the tonal quality (pure vs. AM) of the Target stimulus 89% of the time, with an average response time (RT) of 557 ms. Considered separately, the mean RT was 563 ms for the high-tone Targets and 551 ms for the low-tone Targets. For the pure Target tones, the mean RT was 565 ms, compared with 549 ms for AM Target tones. 2 × 2 ANOVAs with pitch (high and low) and tone quality (pure and amplitude modulated) as within-subject factors indicated that there were no statistically significant main effects or interactions on RT or accuracy as a function of whether the Target stimulus was a high or low tone, or whether it was amplitude modulated or pure.
The experimental manipulation required participants to identify, focus attention toward, and then make a discrimination of the designated Deviant Target tone. To provide a Target stimulus feature to discriminate, our paradigm included both pure and AM versions of the 2 Deviant tones. Any difference between the processing of these 2 stimulus types, however, was not a focus of this paper. Neither the reaction times (see above) nor the raw ERPs differed between the 2 stimulus types. Accordingly, the ERPs were collapsed across the Pure and AM Deviant tones.
To examine effects on the various stages of processing for the different auditory stimulus types, a series of electrophysiological subtractions was performed. Below, we first describe the extraction of the nonlateralized deviance-related activity and then describe the extraction of the lateralized deviance-related activity. For both of these analyses, ERPs were extracted separately for the Target and Nontarget Deviants, but were collapsed across the hemispheres contralateral to the Deviant in each case (i.e., the activity at left hemisphere sites when the Deviant was presented on the right was combined with that at the corresponding right hemisphere sites when the Deviant was presented on the left) and analogously collapsed across the hemispheres ipsilateral to the Deviant (left hemisphere activity for left-sided Deviants was combined with right hemisphere activity for right-sided Deviants). Finally, in the lateralized analysis, we will discuss the longer-latency contralateral activity that was specific to the Target Deviants (compared with the Nontarget Deviants), which we expected to yield the N2ac component, as well as any other longer-latency activity associated with the lateralized focusing of attention to the Target.
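The hemispheric collapsing can be sketched as follows for any homologous electrode pair. Averaging the two stimulus sides with equal weight is an assumption here (a trial-count-weighted average would also be reasonable), and the dictionary layout is hypothetical.

```python
import numpy as np


def collapse_contra_ipsi(erp_left_stim, erp_right_stim, pairs):
    """Collapse ERPs into contralateral and ipsilateral waveforms.

    erp_left_stim / erp_right_stim: dict site -> waveform for Deviants on the left / right ear.
    pairs: (left_hemisphere_site, right_hemisphere_site) tuples of homologous electrodes.
    """
    contra, ipsi = {}, {}
    for lh, rh in pairs:
        # Contralateral: right-hemisphere site for left Deviants, left-hemisphere site for right Deviants.
        contra[(lh, rh)] = 0.5 * (erp_left_stim[rh] + erp_right_stim[lh])
        # Ipsilateral: the hemisphere on the same side as the stimulated ear.
        ipsi[(lh, rh)] = 0.5 * (erp_left_stim[lh] + erp_right_stim[rh])
    return contra, ipsi
```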
Nonlateralized Deviance-Related Activity
To examine the general activation pattern for deviance-related activity, the data were analyzed (after ADJAR correction; see Materials and Methods) to create Target, StandardsTarg, Nontarget, and StandardsNontarg ERPs (where the subscripts “Targ” and “Nontarg” for the Standard ERPs indicate that the averages were derived from the Standards from the same side on which the Target or Nontarget Deviant tone occurred). These analyses were first performed separately for left and right stimuli, and then the ERPs for the 2 locations were collapsed appropriately to examine the contralateral and ipsilateral activations to the lateralized stimuli. Target minus StandardsTarg and Nontarget minus StandardsNontarg difference waveforms from these analyses are shown in Figure 2A (site Cz), collapsed across the type of Deviant (high/low and AM/pure), and the topographic plots for the corresponding Target versus Nontarget difference are shown in Figure 2B. These data show that there was a robust nonlateralized, fronto-centrally distributed enhancement of early deviance-related activity for the designated Target relative to the Nontarget. More specifically, as can be seen in the topographic plots, regardless of whether the Deviants were on the left or on the right, there was a substantially larger, bilaterally distributed negativity for the Targets in the 60- to 120-ms range. [Additional analyses confirmed that there was no significant contralaterality to this early Target-specific activity, as is further discussed in the next section.]
As can be seen in Figure 2A, this Target-related difference started quite early, with an onset at approximately 50–60 ms. The Target-specific enhancement was confirmed by a series of 2-way ANOVAs (Deviant vs. Standard × Target-side stimuli vs. Nontarget-side stimuli) on the mean amplitudes of the ERP responses in 10-ms windows from 0 to 200 ms. Because of the nonlateralized nature of this early effect, the statistical analyses were focused on the activity at the midline site Cz. [Additional analyses on fronto-central sites isolated to the ipsilateral hemisphere were performed as well. By analyzing the ipsilateral region of interest (ROI), we were able to focus only on activity that was present over both hemispheres (i.e., the bilateral part), thus excluding any additional contribution from the larger, contralateral activity. The results of these analyses did not differ from those performed at Cz.] First, a significant main effect of Deviant versus Standard showed an enhanced early negative-polarity ERP response to the Deviants (both Targets and Nontargets) relative to the Standards, starting in the 50- to 60-ms time range and lasting until 140–150 ms (F-values ranging from 6.28 to 36.9; P-values ranging from 0.03 to 0.0001). Most importantly, however, there was a significant interaction between Deviancy and Targetness, which emerged in the 50- to 60-ms window and lasted until the 120- to 130-ms window (F-values ranging between 5.79 and 15.57; P-values ranging from 0.04 to 0.003). The very early, nonlateralized, Target-specific enhancement of this negative ERP activity (which we will refer to here as the Early Bilateral Negativity, or EBN, effect) to the Deviants as a function of whether they had been designated as the Target or the Nontarget in that block would thus appear to reflect a very early Target-specific deviance detector (see below for further Discussion).
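Because every factor in these ANOVAs has only 2 levels, each F test has 1 numerator degree of freedom and equals the squared one-sample t statistic of a per-subject contrast score; sphericity is then trivially satisfied, so a Greenhouse–Geisser correction does not change these particular tests. The sketch below uses that equivalence to compute F(1, n − 1) for both main effects and the interaction in every 10-ms window at once. The (subjects, 2, 2, windows) array layout is an assumption made for illustration.

```python
import numpy as np


def rm_anova_2x2(cells):
    """F values for a 2 x 2 within-subject design.

    cells: array (n_subjects, 2, 2, ...) of mean amplitudes, e.g. factor A =
    Deviant vs. Standard and factor B = Target-side vs. Nontarget-side stimuli.
    Any trailing axes (such as 10-ms windows) are tested independently.
    Returns ({'A': F, 'B': F, 'AxB': F}, (1, n - 1)).
    """
    c = np.asarray(cells, dtype=float)
    n = c.shape[0]
    contrasts = {
        "A": c[:, 0, 0] + c[:, 0, 1] - c[:, 1, 0] - c[:, 1, 1],
        "B": c[:, 0, 0] - c[:, 0, 1] + c[:, 1, 0] - c[:, 1, 1],
        "AxB": c[:, 0, 0] - c[:, 0, 1] - c[:, 1, 0] + c[:, 1, 1],
    }
    F = {}
    for name, d in contrasts.items():
        # F(1, n-1) = t^2 of a one-sample t test on the contrast scores.
        F[name] = n * d.mean(axis=0) ** 2 / d.var(axis=0, ddof=1)
    return F, (1, n - 1)
```

P-values can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, 1, n - 1)`.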
Lateralized Deviance-Related Activity for Both Targets and Nontargets
Given that the Target and Nontarget stimuli, and indeed all the tones, were laterally presented, we also wanted to examine the lateralized portion of the deviance-related activity. The ERP waveforms for activity contralateral and ipsilateral to the Deviant stimuli are displayed in Figure 3A, separately for the Targets and for the Nontargets. These were extracted from a symmetric pair of lateral fronto-central ROIs, consisting of data collapsed across 3 electrode sites on the left (C3′, C5′, and FC3′) and 3 on the right (C4′, C6′, and FC4′) (see figures for locations). In addition, to further functionally isolate the lateralized deviance-related activity, we also created contralateral minus ipsilateral difference waves, separately for the Target minus StandardTarg waveforms and for the Nontarget minus StandardNontarg waveforms (Fig. 3B). This lateralized deviance-related activity also started very early, appearing as a negative wave with an onset at approximately 60–70 ms. As shown in Figure 3B, and in contrast to the early nonlateralized deviance-related negativity (i.e., the EBN) described above, this lateralized deviance-related negativity did not differ between the Target and Nontarget Deviants in its early phase, showing identical amplitude levels from its onset out to around 130 ms (i.e., for roughly 60 ms after its onset).
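A sketch of the ROI pooling and the contralateral minus ipsilateral difference wave is shown below. It assumes each input is a dictionary mapping site labels to Deviant minus Standard waveforms for Deviants on one ear; the labels follow the primed naming used above, and the equal weighting of the 2 stimulus sides is an assumption.

```python
import numpy as np

LEFT_ROI = ["C3′", "C5′", "FC3′"]    # left lateral fronto-central ROI
RIGHT_ROI = ["C4′", "C6′", "FC4′"]   # right lateral fronto-central ROI


def roi_mean(erp, sites):
    """Average the waveforms of the listed sites."""
    return np.mean([erp[s] for s in sites], axis=0)


def lateralized_wave(dev_left, dev_right, left_roi=LEFT_ROI, right_roi=RIGHT_ROI):
    """Contralateral minus ipsilateral deviance-related activity.

    dev_left / dev_right: dict site -> deviance wave for Deviants on the left / right ear.
    """
    contra = 0.5 * (roi_mean(dev_left, right_roi) + roi_mean(dev_right, left_roi))
    ipsi = 0.5 * (roi_mean(dev_left, left_roi) + roi_mean(dev_right, right_roi))
    return contra - ipsi
```

Applying `lateralized_wave` once to the Target deviance waves and once to the Nontarget deviance waves yields the 2 traces compared in Figure 3B.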
Following the first portion of these contralateral minus ipsilateral waveforms, however, the Nontarget deflection began to return to zero (starting at ∼130–140 ms), while the corresponding Target waveform had a second, substantially longer-lasting phase of negative-polarity activity. This later Target-specific lateralized activity thus appeared to be equivalent to the N2ac effect reported previously by Gamble and Luck (2011) for simultaneous stimuli, reflecting the lateralized attentional focusing toward the detected Target (see below). To determine the onset of the lateralized deviance-related activity for the Target and Nontarget, we ran a series of 2 × 2 ANOVAs ([Contralateral vs. Ipsilateral] × [Target vs. Nontarget]) on the mean amplitudes of these waveforms in 10-ms windows on the lateral fronto-central ROIs noted above. Consistent with the EBN effects described above, there was a main effect of Target vs. Nontarget at these more lateral electrodes starting in the 50- to 60-ms time window and lasting until the 170- to 180-ms window (F-values ranging from 5.39 to 13.4; P-values ranging between 0.04 and 0.004) (Fig. 3A). In addition, there was a main effect of contralaterality starting at around 60–70 ms and lasting until 190–200 ms (F-values ranging from 8.59 to 49.4; P-values ranging from 0.015 to 0.00003), which can clearly be seen in Figure 3A,B, where the contralateral deviance-related activity is more negative than the ipsilateral activity, for both Targets and Nontargets. Importantly, however, there was no significant interaction between these 2 factors (Contra/Ipsi and Target/Nontarget) in the early time range; such an interaction did not appear until much later (i.e., when the N2ac begins, as described further below).
This lack of a significant interaction during the earlier time range is consistent with the observation that the Lateralized Deviance-Related Negativity (contralateral minus ipsilateral difference wave) did not differ for the Targets and Nontargets in this time range (see Fig. 3B). Thus, the early-latency deviance-related activity showed no lateralized difference as a function of whether the Deviant was a Target, whereas the early-latency nonlateralized deviance-related activity (i.e., the EBN) did show a Target-specific enhancement, as described above.
N2ac: Lateralized Attention Effect
In order to more clearly isolate the N2ac from the Lateralized Deviance-Related Negativity, we created another difference wave by subtracting the Contra minus Ipsi Nontarget difference wave from the Contra minus Ipsi Target difference wave (Fig. 3C). Because there was no early-latency difference in the lateralized deviance-related activity for Targets and Nontargets, this derivation extracts the longer-latency, long-lasting, Target-specific lateralized negativity over the lateral fronto-central electrodes, which started at 130 ms and lasted for around 200 ms (see voltage maps Fig. 3C). We again ran 2 × 2 ANOVAs (Contralateral vs. Ipsilateral × Target vs. Nontarget) on the mean amplitudes over successive 10 ms windows, which is essentially a continuation of the analysis used above for the Lateralized Deviance-Related Activity. While there were no significant interactions between contralaterality and Targetness for the first 130 ms, the interaction effect did become significant between 130 and 140 ms and remained so until the 340- to 350-ms time range (F-values between 5.93 and 15.7; P-values between 0.03 and 0.003). This interaction indicates that the laterally presented Deviant tones elicited greater activation contralaterally (vs. ipsilaterally) for the Targets versus the Nontargets starting at 130 ms after the stimulus and continuing until almost 350 ms (i.e., the N2ac).
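The double subtraction is the waveform form of the Contralateral/Ipsilateral × Target/Nontarget interaction contrast. The sketch below computes it per subject, averages it in successive 10-ms windows (5 samples at 500 Hz), and converts each window into an interaction F(1, n − 1) as the squared one-sample t statistic. It assumes the inputs start at stimulus onset and are arranged as (subjects, samples); the function names are hypothetical.

```python
import numpy as np

FS = 500  # EEG sampling rate (Hz)


def window_means(x, fs=FS, width_ms=10):
    """Average consecutive non-overlapping windows along the last axis."""
    step = int(round(width_ms * fs / 1000))          # 5 samples at 500 Hz
    n_win = x.shape[-1] // step
    return x[..., :n_win * step].reshape(*x.shape[:-1], n_win, step).mean(axis=-1)


def n2ac(contra_t, ipsi_t, contra_n, ipsi_n):
    """(Contra - Ipsi) for Targets minus (Contra - Ipsi) for Nontargets."""
    return (contra_t - ipsi_t) - (contra_n - ipsi_n)


def interaction_F(n2ac_per_subject):
    """Per-window F(1, n-1) for the Contra/Ipsi x Target/Nontarget interaction."""
    d = window_means(n2ac_per_subject)
    n = d.shape[0]
    return n * d.mean(axis=0) ** 2 / d.var(axis=0, ddof=1)
```

The grand-average N2ac of Figure 3C corresponds to `n2ac(...).mean(axis=0)` under this layout.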
Late Posterior Contralateral Positivity
In addition to the fronto-central activity associated with the Target identification and discrimination, the Targets also elicited a late positive-polarity contralateral effect over posterior scalp sites, similar to an effect observed by Gamble and Luck (2011). This Late Posterior Contralateral Positivity, although smaller and less robust than the N2ac, started approximately 350 ms after stimulus onset in the present study and lasted for several hundred milliseconds (see Fig. 4A,B). An ANOVA performed on the mean amplitude between 350 and 550 ms for [Target vs. Nontarget] × [Contralateral vs. Ipsilateral] on a left-right pair of posterior ROIs, consisting of electrode sites PO7/PO8, P3′/P4′, and P7′/P8′, showed a significant interaction (F(1,10) = 5.72, P = 0.037), due to the contralateral posterior waveform being more positive than the ipsilateral waveform for the Target stimulus, but not for the Nontarget stimulus. There was also a main effect of Target versus Nontarget (F(1,10) = 43.40, P < 0.0001), due to the presence of nonlateralized long-latency activity, which reflects the well-known P300 response associated with Target detection (reviewed in Polich 2007). Additionally, there was a significant main effect of Contralaterality (F(1,10) = 18.87, P < 0.0015).
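The window-based test of the Late Posterior Contralateral Positivity can be prepared as sketched below: latencies are converted to sample indices (assuming 500-Hz epochs beginning at −100 ms), the posterior sites are pooled into contralateral and ipsilateral ROIs, and the 350–550 ms mean amplitudes are arranged as a (subjects, Target/Nontarget, Contra/Ipsi) array ready for a 2 × 2 repeated-measures ANOVA. The nested-dictionary input format is hypothetical, and whether raw or difference ERPs are passed in is left to the caller.

```python
import numpy as np

FS = 500
EPOCH_START_MS = -100            # assumed epoch start (baseline begins at -100 ms)
LEFT_POST = ["PO7", "P3′", "P7′"]
RIGHT_POST = ["PO8", "P4′", "P8′"]


def ms_to_index(ms, fs=FS, start_ms=EPOCH_START_MS):
    """Sample index of a latency (ms) within the epoch."""
    return int(round((ms - start_ms) * fs / 1000))


def window_mean(wave, lo_ms=350, hi_ms=550):
    """Mean amplitude from lo_ms (inclusive) to hi_ms (exclusive)."""
    return wave[..., ms_to_index(lo_ms):ms_to_index(hi_ms)].mean(axis=-1)


def posterior_cells(subjects):
    """subjects: list of dicts keyed by ('Target'|'Nontarget', 'L'|'R'), each value
    a dict site -> waveform. Returns (n_subjects, 2, 2):
    [Target, Nontarget] x [Contralateral, Ipsilateral] mean amplitudes."""
    cells = np.zeros((len(subjects), 2, 2))
    for i, subj in enumerate(subjects):
        for j, kind in enumerate(("Target", "Nontarget")):
            left, right = subj[(kind, "L")], subj[(kind, "R")]
            contra = [left[s] for s in RIGHT_POST] + [right[s] for s in LEFT_POST]
            ipsi = [left[s] for s in LEFT_POST] + [right[s] for s in RIGHT_POST]
            cells[i, j, 0] = window_mean(np.mean(contra, axis=0))
            cells[i, j, 1] = window_mean(np.mean(ipsi, axis=0))
    return cells
```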
In the present experiment, we looked at the neural cascade of processes invoked when an individual searches for and detects a Target sound in a spatially and temporally distributed array of auditory stimuli, orients attention to the Target, and makes a difficult discrimination concerning its features. Participants were asked to find a specific Target in a rapidly presented auditory scene consisting of 8 Standards and 2 pitch Deviants (one of which was designated as the Target in different blocks of trials). Once the Target was identified, participants had to perform a difficult discrimination of its subtle features, which required focused attention and processing of that stimulus. Owing to the temporally distributed nature of the stimulus presentation in this paradigm, we were able to extract ERPs to all of the individual sounds (e.g., Standards, Target Deviants, and Nontarget Deviants) and perform finely tuned contrasts between their neural responses, thereby enabling a delineation of the neural cascade of processing involved during auditory search.
More specifically, the pitch Deviants (both Targets and Nontargets) elicited an early nonlateralized (bilateral) negativity (EBN) over fronto-central scalp sites, starting at around 50 ms and peaking at around 100 ms after stimulus onset, which was substantially larger for Targets than for Nontargets (see summary panels in Fig. 5). Because this EBN effect was derived from a response contrast between physically identical stimuli that differed only in whether they had been designated as the Target or Nontarget for that block of trials, we interpret it as reflecting a very early, bilateral, template-matching process for the task-relevant Target. Shortly after this, and in parallel with it, starting at around 60–70 ms, both the Target and Nontarget Deviants elicited a lateralized activation, which did not differ between Targets and Nontargets during this early time range. Indeed, the lateralized part of the deviance-related activity did not differentiate between the Targets and the Nontargets until about 130 ms poststimulus, at which point a Target-specific enhanced contralateral negativity, namely the N2ac, began and lasted until about 350 ms. Finally, as in Gamble and Luck (2011), there was a Late Posterior Contralateral Positivity to the Target, beginning around 350 ms and lasting until around 550 ms.
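Onset latencies such as those above are often estimated from a difference wave with a threshold criterion. How the onsets were determined in this study is not stated here, so the following Python sketch is only an illustration of one common approach: it assumes a hypothetical amplitude threshold and a minimum run of consecutive samples, and the function name `onset_latency` is hypothetical.

```python
def onset_latency(diff_wave, times_ms, threshold, min_consecutive):
    """Onset of a negative-going effect in a difference wave.

    Returns the time of the first sample of the earliest run of at least
    `min_consecutive` consecutive samples more negative than -threshold,
    or None if no such run exists. Both criterion parameters are
    illustrative assumptions, not values taken from this study.
    """
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    run = 0
    for i, v in enumerate(diff_wave):
        if v < -threshold:
            run += 1
            if run == min_consecutive:
                # Report the first sample of the qualifying run.
                return times_ms[i - min_consecutive + 1]
        else:
            run = 0
    return None
```

Requiring several consecutive supra-threshold samples is a simple guard against single-sample noise crossings; in practice the threshold is often set relative to baseline variability.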
Early Nonlateralized (Bilateral) Negativity
An advantage of the temporally distributed stimulus presentation used in the current design (compared with presenting 2 or more sounds simultaneously) was the ability to selectively extract the ERP activity associated with the Target and the Nontarget stimuli. Comparing these activations revealed a very robust Target-related effect with a strikingly early latency (starting at 50–60 ms): a bilateral differential negativity for the Target compared with the Nontarget waveforms (i.e., the Target-specific enhancement of the EBN). Because this bilaterally distributed activity was larger for Targets than for Nontargets and occurred so early, its timing and Target specificity suggest that it reflects a process matching the Target to a functionally preset template of the Target.
It is important to consider some possible causes of this EBN for the Deviant tones and whether they could, in turn, account for the observed Target-related difference. First, the bilateral negativity in this latency range could reflect refractoriness of the N1 sensory response; namely, the neural response to the Deviants may have been enhanced because some of the neuronal elements responding to them were stimulated less frequently than those responding to the Standards (see Näätänen and Picton 1987 for review). However, such an effect cannot explain the differential activity between the Target and Nontarget waveforms. Although the N1 response to the infrequent high- and low-pitch Deviants was likely enhanced relative to the Standards by such refractory effects, the Target and Nontarget Deviants were physically identical stimuli, with either the low or the high Deviant designated as the Target on different blocks of trials. Accordingly, the Target and Nontarget Deviants would have been subject to the same amount of any such N1 refractoriness, which eliminates this as a possible explanation for the Target-specific enhancement of the EBN.
Although seemingly too early, this differential activity for the Targets could also reflect a very early part of a deviance-related activation known as the mismatch negativity (MMN). The MMN is a widely studied ERP component elicited by a deviant auditory stimulus within a stream of repeated standard sounds; it manifests as a fronto-central negativity that typically peaks between 130 and 200 ms, although it has recently been shown that it can occur earlier (Sonnadara et al. 2006; Grimm et al. 2011). The MMN is thought to arise from the detection of a Deviant stimulus that violates an auditory template built up by the repetition of the standard sounds, and it is elicited by deviations in any of a number of auditory features, including pitch, sound intensity, duration, and location (Näätänen et al. 1989; Paavilainen et al. 1989; Woldorff et al. 1991; Giard et al. 1995; Schröger and Wolff 1996).
Although the MMN can be elicited by deviant stimuli even when subjects are not attending (Näätänen et al. 1978; Sams et al. 1985; Näätänen et al. 1993), it has also been found to be modulated by strongly focused spatial attention to a specific auditory channel versus an unattended one (Woldorff et al. 1991; Trejo et al. 1995; Woldorff et al. 1998). The attentional focus in those experiments, however, was preset on a specific stimulus input source and location for an entire block of trials. In the present study, the location of the Target was unknown prior to the onset of the trial, such that participants could not attend to a particular spatial channel or stream. Thus, on each trial, attention was necessarily initially divided between, or spread across, the 2 possible locations. Based on the previous literature, a similar MMN would therefore be expected for both the Target and Nontarget Deviants, because attention would be expected to affect the Target and Nontarget deviance responses similarly (cf. Woldorff et al. 1991). This is, indeed, what we observed in the early part of the lateralized deviance-related activity. The expected and observed similarity of the Lateralized Deviance-Related Negativity for the Target and Nontarget would thus tend to rule out channel-specific enhancement of the MMN as an explanation for the Target-specific enhancement of the EBN described above. Moreover, even in previous studies showing attentional enhancement of the MMN, the effects occurred much later (140–200 ms) (Woldorff et al. 1991; Trejo et al. 1995) than the very early effects seen here. Thus, if this effect did reflect an early Target-specific modulation of the MMN, the conceptualization of the MMN would have to change drastically, including its potential timing and its sensitivity to Targetness.
Alternatively, it could be argued that this Target-specific enhancement of the EBN is another attention-related activation known as the processing negativity (Näätänen et al. 1978) or Nd (Hansen and Hillyard 1980). Paradigms examining this attentional effect presented participants with a series of tones that varied randomly in some basic feature, such as between 2 pitches or 2 locations. Participants in such studies were required to attend to one of the repeated tone types (Standards) in order to detect an infrequent Target stimulus of that type (e.g., one that was slightly fainter or differed subtly in some other dimension), while ignoring all stimuli of the other type. The Nd, or processing negativity, is isolated by deriving a difference wave between the responses to the repeated Standard tones when they were attended and the responses to those same tones when they were unattended; it typically manifests as a slow, long-duration, enhanced negativity for the attended stimuli. Although the Nd for focused spatial attention has been found to begin by 60–100 ms or so, partially overlapping the N1 sensory component, for attention to 1 of 2 pitches this enhanced negativity has generally been observed to begin somewhat later, typically at approximately 130–150 ms poststimulus, and to last for a few hundred ms (Hansen and Hillyard 1980; Hansen and Hillyard 1983; Degerman et al. 2008).
There are several key reasons, however, why the current EBN effect is very unlikely to reflect the same mechanism as the Nd. The first is the difference in the paradigms and the corresponding difference in how attention is allocated. More specifically, Nd paradigms generally consist of 2 types of tones that are repeated throughout the experiment (thereby constantly providing a template for each stimulus type), and participants must attend to one of the repeated tone types. In contrast, the current experiment requires a search for a single Target tone in a short stimulus series, not the maintenance of attention to a stream of frequently repeated Standard tones. Secondly, the Nd is isolated by subtracting the responses to the repeated Standard sounds when they were unattended from the responses when they were attended, whereas the EBN effect in the current experiment is the difference between 2 deviance waves: the single Target minus the corresponding Standards on that side in each short tone series, and the single Nontarget minus its corresponding Standards. The contrast between these therefore isolates the differential deviance-related activity for these single Target and Nontarget occurrences, so that any difference between them reflects Targetness. Thirdly, Hansen and Hillyard (1988) showed that the position in a stream was critical for the production of an Nd, and that little to no Nd was present until the Standard-stimulus template had been “built up” over the course of several repetitions. In our experiment, there was only one Target per rapid-stream trial and therefore no opportunity to “build up” a template for it. Finally, the Nd is typically a long, slow negative wave that generally begins later, especially for attention to pitch, where it typically does not start until after 130 ms or so. Although it has been shown that the Nd can sometimes start earlier (Alho et al. 1989), this was true only for the repeated Standard tones, not for the infrequent Targets.
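The Target-versus-Nontarget contrast described above can be sketched as a difference between two deviance waves. This is a minimal illustration under stated assumptions, not the authors' code: each ERP is taken to be a list of voltage samples on a common time base, the “corresponding Standards” are supplied as a list of Standard waveforms from the same side, and the names `deviance_wave` and `targetness_contrast` are hypothetical.

```python
def deviance_wave(deviant, standards):
    """Deviant ERP minus the sample-by-sample average of its Standard ERPs."""
    n = len(deviant)
    avg_std = [sum(s[i] for s in standards) / len(standards) for i in range(n)]
    return [dv - sv for dv, sv in zip(deviant, avg_std)]


def targetness_contrast(target, target_standards, nontarget, nontarget_standards):
    """(Target - its Standards) minus (Nontarget - its Standards).

    Because the Target and Nontarget Deviants are physically identical across
    blocks, any residual difference in this double subtraction is attributed
    to Targetness rather than to stimulus deviance itself.
    """
    t = deviance_wave(target, target_standards)
    nt = deviance_wave(nontarget, nontarget_standards)
    return [a - b for a, b in zip(t, nt)]
```

By contrast, an Nd in the paradigms described above would be a single subtraction of the same repeated Standard tones, attended minus unattended, with no deviance step.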
It is conceivable that, in the current experiment, the presence of a Target tone in each trial stream could enable the maintenance of some level of a selective pitch template for that tone across trial streams and thereby result in some contribution of the Nd to the attentional effect seen here, particularly at longer latencies. However, the structure of the current auditory paradigm, the infrequent nonrepeated nature of the Target occurrences in that paradigm, the way the Nd response is typically isolated, and the typical timing of that component would seem to make it rather unlikely that the Nd mechanism underlies the much earlier, nonlateralized, target-selective enhancement of the EBN seen here.
Eliminating the above possibilities for this early Target-specific enhancement of the EBN requires us to consider a different mechanism. In particular, the behavior of this bilateral activity, which was enhanced for the Target compared with the Nontarget a mere 50–60 ms after stimulus onset, suggests that it reflects a rapid matching of the Target to a preset template of the Target features, irrespective of location. This Target-matching could have been facilitated by the fact that the Target was always a Deviant, so that the template could be set to detect a Deviant stimulus that was also the designated Target. That is, the deviancy of the Target may have acted as a stepping-stone for its identification as the Target. Alternatively, this activity could reflect rapid Target-matching regardless of whether the Target was also a Deviant. Considering that the Target-specific enhancement seemed to build on the EBN that was present for both Deviant types beginning at 50–60 ms poststimulus, we speculate that deviancy likely played a role. Nevertheless, clearly distinguishing between these possible mechanisms will require future studies. Regardless, the presence of this very early, nonlateralized, Target-related response to a lateral Target stimulus, prior to the lateralized Target-specific N2ac, suggests that detection of the Target can occur rapidly through a bilateral, template-based brain process that precedes the lateralized focusing of attention toward the Target.
Given the nature of the experimental setup, with 2 Deviants in a stream of rapidly presented stimuli, it is important to consider whether an attentional-blink-like effect could have occurred and influenced the data. The attentional blink is a phenomenon observed in both visual and auditory studies in which a rapid stimulus sequence contains 2 Targets (T1 and T2) and participants are instructed to identify both. If T2 occurs within a critical range of positions after T1, identification of T2 is greatly diminished (Raymond et al. 1992; Arnell and Jolicoeur 1999; Shen and Mondor 2008). Although there were 2 Deviants in our paradigm, only one of them was relevant on each trial. Thus, no attentional blink would be anticipated, akin to control conditions in attentional-blink paradigms in which T1 is ignored, participants are asked only to detect T2, and no detection deficit is observed (Shen and Mondor 2008). Moreover, additional analyses of Target-detection rates as a function of the relative positions of the 2 Deviants in the series confirmed that Target detection did not vary with the lag relative to the Nontarget Deviant. Thus, it seems very unlikely that any sort of attentional-blink effect was at play in the current study.
Another main goal of this experiment was to establish that the N2ac component, first reported by Gamble and Luck (2011) in a simultaneous paired-stimulus paradigm, would also be elicited in a temporally distributed paradigm, one that includes multiple distracters and more closely mimics a complex auditory scene. The present data indicate that, when searching for a lateralized Target sound in a temporally and spatially distributed array of auditory stimuli, the focusing of attention onto the Target does indeed elicit a crisp and robust contralateral negativity, the N2ac. The present results also show that the initial detection of that Target (the Target-specific enhancement of the EBN) occurs in a nonlateralized way, prior to the focusing of attention onto the lateralized Target stimulus that is reflected by the N2ac. Finally, the N2ac in this temporally distributed experiment and that in the simultaneous-sound experiment of Gamble and Luck (2011) were roughly similar in shape, location, duration, and onset.
Late Posterior Contralateral Positivity
As in Gamble and Luck (2011), the Targets here also elicited a late posterior contralateral positivity. Because the responses to the Targets and the Nontargets could be extracted separately here, however, it could be determined that this activation was present for the former but absent for the latter. Although such a pattern suggests that this activity may reflect some sort of Target-specific contralateral visual processing, it may also reflect a general reorienting effect, as originally suggested by Gamble and Luck (2011). To make the Target discrimination, the participant needed to shift attention to the Target, resulting in the N2ac. To prepare for the subsequent trial, the participant may then have needed to reorient attention so that it was again spread across the auditory field, which is a lateralized attention shift in the opposite direction, potentially explaining the lateralization and polarity of this late positive component following the Target. In the Nontarget condition, attention would not have been shifted toward the Nontarget side, and thus there would have been no need for reorientation. In Gamble and Luck (2011), the use of speakers, which could have served as a cue to the source of the sounds, may have led to a concomitant visual reorientation, consistent with the occipital scalp distribution reported there. In the current study, the use of headphones would seem likely to have eliminated this potential visual cue, making this explanation less likely. On the other hand, given the occipital distribution of this effect, the shift of auditory attention toward a lateralized auditory Target stimulus could automatically invoke a shift of visual attention, regardless of the use of headphones. Future studies will be necessary to determine the functional role of this longer-latency, posteriorly distributed, lateralized response.
In the present study, we examined the sequence of neural processes invoked during auditory Target search, using a novel, temporally distributed, rapid stimulus presentation paradigm with multiple distractors. The results establish that the lateralized N2ac ERP response serves as a general electrophysiological marker of the lateralized focusing of auditory attention toward a detected Target sound in an auditory scene. Moreover, the current paradigm and analytic approach enabled the selective extraction of the responses to the various stimulus types, in turn enabling the determination of the cascade of neural events that occur while searching for, identifying, and then focusing attention toward the Target for further processing (see summary panels in Fig. 5).
First, the rapid detection of a lateralized Target sound appears to be reflected by a Target-specific nonlateralized enhancement of fronto-central brain activity starting at approximately 50–60 ms (here termed the EBN effect), an early single-target detection effect not previously reported. Secondly, approximately 10–20 ms later (i.e., starting at about 60–70 ms), the location of the Deviant stimuli (both Target and Nontarget) is reflected by a contralaterally enhanced negative ERP wave (here termed the “Lateralized Deviance-Related Negativity”) that did not differ between Targets and Nontargets. Starting at about 130–140 ms, however, the lateralized focusing of spatial auditory attention toward the Target for further enhanced processing is initiated, as reflected by the N2ac. These activations are then followed, starting at around 300 ms, by a Target-selective enhanced posterior contralateral positivity over occipital cortex.
The above cascade of neural processes suggests that the brain mechanisms underlying the search for an auditory Target entail a preset bilateral template for the Target sound that enables its very rapid detection (within 60 ms) within a complex auditory scene. Starting 60–80 ms after this detection process, spatial auditory attention can be selectively focused toward the Target stimulus to perform the task-required discrimination of its characteristics. This is then followed by a longer-latency lateralized posterior response that reflects either a generalized reorienting process or a supramodally linked shift of visual spatial attention. This succession of events thus delineates the tightly orchestrated sequence of neural processes that occur during the detection of, and focusing of attention toward, an auditory Target stimulus in a complex auditory scene.
Conflict of Interest: None declared.