Abstract

Goal-directed attention to sound identity (what) and sound location (where) has been associated with increased neural activity in ventral and dorsal brain regions, respectively. In order to ascertain when such segregation occurs, we measured event-related potentials during an n-back (n = 1, 2) working memory task for sound identity or location, where stimuli selected randomly from 3 semantic categories (human, animal, music) were presented at 3 possible virtual locations. Accuracy and reaction times were comparable in the “what” and “where” tasks, albeit worse for the 2-back than for the 1-back condition. The partial least squares analysis of scalp-recorded and source waveform data revealed domain-specific activity beginning at about 200-ms poststimulus onset, which was best expressed as changes in source activity near Heschl's gyrus and in central medial, occipital medial, right frontal, and right parietal cortex. The effect of working memory load emerged at about 400-ms poststimulus and was expressed maximally over the frontocentral scalp region and in sources located in the right temporal, frontal, and parietal cortices. The results show that, for identical sounds, top-down effects on processing “what” and “where” information are observable at about 200 ms after sound onset and involve a widely distributed neural network.

Spatiotemporal Analysis of Auditory “What” and “Where” Working Memory

Much progress has been made in the past decade in characterizing the neural substrates underlying sound identification and sound localization. Evidence from lesion studies has suggested that sound identification depends on anterior temporal regions, whereas damage to parietal regions impairs listeners’ ability to localize and/or remember the location of a particular sound source (Clarke et al. 2000). Functional magnetic resonance imaging (fMRI) studies have also shown that the anterior temporal lobe and inferior prefrontal cortex are active during sound identification, whereas sound location is associated with greater changes in hemodynamic response in the inferior parietal lobule, superior parietal cortex and superior frontal gyrus (Alain et al. 2001; Maeder et al. 2001; Arnott et al. 2004, 2005; Degerman et al. 2006). These studies demonstrate that “what” and “where” processing relies on distributed activity in a number of cortical areas. However, the relatively poor temporal resolution of the fMRI method makes it difficult to determine when this segregation in processing sound identity and sound location is taking place.

Scalp recording of event-related brain potentials (ERPs) can help identify when information processing differs with respect to bottom-up stimulus-related factors or top-down task-related instructions. Few studies have examined the impact of task instructions on the processing of sound identity and sound location. Using the same stimulus set, Alain et al. (2001) found 2 ERP modulations that distinguished the processing of sound identity and sound location during a delayed match to sample task. The first occurred between 300 and 500 ms after the first stimulus (S1) and consisted of greater positivity over the inferior frontotemporal scalp regions during the pitch matching task and greater positivity over the centroparietal scalp regions during the location task. This effect was thought to reflect the maintenance of S1 in working memory for an eventual comparison with the second stimulus (S2). The second modulation occurred 300–400 ms after S2 and showed similar differences in the waveforms as seen during the delay interval. There was no difference in the late positive complex (LPC) associated with making a response (see also Anurova et al. 2005).

However, other studies failed to find ERP differences between auditory “what” and “where” tasks (Rama et al. 2000; Anurova et al. 2005). For instance, Anurova et al. (2005) simultaneously recorded neuroelectric (using electroencephalography, EEG) and neuromagnetic (using magnetoencephalography, MEG) brain activity while participants performed either a spatial or a nonspatial working memory task. On each trial, participants were presented with 6 stimuli and were asked to indicate whether the fourth, fifth, and sixth stimuli matched the first, second, and third stimuli, respectively. ERP amplitudes and latencies recorded over the parieto-occipital region were comparable in the spatial and nonspatial working memory tasks. However, the MEG data did reveal a small difference in source strength, with the posterior source being more active during the spatial working memory task, whereas the temporal source tended to be more active during the nonspatial task. Using an n-back working memory task, Rama et al. (2000) found enhanced amplitude of the sustained potential with increased working memory load but no difference as a function of task, nor an interaction between task and working memory load. Task-specific increases in activation with increased working memory load could indicate that an area is involved in maintaining an internal representation of the task-relevant features. Nonspecific increases in activation with increasing working memory load would be expected from areas involved in higher order attentional demands rather than in task-specific processing. In the studies reviewed above, the small differences, or lack thereof, in brain activity as a function of working memory task may be partly related to the material used (i.e., pure tones as opposed to environmental sounds). Moreover, the analyses also tended to focus on a particular component and subset of electrodes. Hence, it remains possible that task-related differences went undetected because of the choice of electrodes and/or latencies entered in the analysis.

In the present study, the time course of neural activity for processing sound identity and sound location was examined using environmental sounds presented at virtual locations using head-related transfer functions. The experimental design is comparable to that of recent fMRI studies showing distinct patterns of brain activity during auditory working memory tasks involving natural sounds (i.e., musical instrument, animal, and human sounds) presented at various spatial locations (Grady et al. 2007; Alain et al. 2008). We measured scalp-recorded ERPs while participants were asked to selectively process sound identity (what) or sound location (where) during an n-back (n = 1, 2) working memory task. Therefore, the present study provides complementary information regarding the time course of task-specific changes in neural activity involved in processing auditory “what” and “where” information. Moreover, it extends previous ERP work by using more ecologically relevant sounds and by analyzing the entire dataset. As mentioned earlier, task-related differences may have gone unnoticed in previous studies because of a priori decisions regarding the latency and/or electrodes used in the analysis. Here, the ERP data are analyzed using the partial least squares (PLS) technique (Lobaugh et al. 2001). This multivariate technique allows one to consider an entire dataset in a single analysis and has proven helpful in identifying activity patterns from scalp-recorded ERP data that differentiate various perceptual (Itier et al. 2004) and memory tasks (Hay et al. 2002; Duzel et al. 2003). Based on previous fMRI studies using a similar design, we hypothesized that processing sound identity and sound location would result in distinct patterns of neural activity, which could be distinguished from those related to working memory load.

Materials and Methods

Participants

Fifteen young adults participated in the experiment. Data from one participant were discarded because of excessive ocular artifacts, data from a second because of very poor accuracy, and data from a third because of excessive drowsiness and an inability to complete the entire experiment. The final sample was composed of twelve young adults (aged between 21 and 30 years, mean age of 24.67 ± 3.45 years; 6 males; one left-handed). All participants had pure tone thresholds less than or equal to 30 dB HL for frequencies between 250 and 8000 Hz in both ears. Each individual provided informed consent as approved by the University of Toronto Human Subject Review Committee and was paid for his/her participation.

Stimuli and Task

Stimuli consisted of meaningful sounds from 3 semantic categories: animal (e.g., dog bark, bird chirping), human (e.g., cough, laugh), and musical instruments (e.g., flute, clarinet). In each category, 10 different exemplars were presented. Sounds were chosen from a larger databank, and only those that could be unambiguously categorized were included in the study. They were edited to have a duration of 1005 ms. Onsets and offsets were shaped by the rising and falling halves of an 8-ms Kaiser window, respectively. Stimuli were digitally generated with 16-bit resolution and a 12.21 kHz sampling rate and passed through a digital-to-analog RP2 converter (Tucker-Davis Technology, Gainesville, FL). Stimuli were presented at about 77 dB (range of 72–82 dB) sound pressure level (root mean square) via Sennheiser HD 265 linear headphones at 3 possible azimuth locations relative to straight ahead (-45°, 0°, +45°) using head-related transfer functions that replicated the acoustic effects of the head and ears of an average listener (Wenzel et al. 1993).
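For illustration, the following is a minimal sketch of the onset/offset shaping described above; the window split and sampling rate follow the text, whereas the Kaiser beta parameter and the function name are assumptions made for the example.

```python
# Sketch (not the original stimulus-generation code): shape a sound's onset and
# offset with the rising and falling halves of an 8-ms Kaiser window.
import numpy as np
from scipy.signal.windows import kaiser

fs = 12210                          # 12.21 kHz sampling rate, as in the study
n_win = int(round(0.008 * fs))      # full 8-ms window (~98 samples)
n_win -= n_win % 2                  # make it even so it splits into equal halves
win = kaiser(n_win, beta=14.0)      # beta is a hypothetical choice, not from the paper
half = n_win // 2

def shape_edges(sound: np.ndarray) -> np.ndarray:
    """Apply the rising half of the window to the onset and the falling half to the offset."""
    shaped = sound.astype(float).copy()
    shaped[:half] *= win[:half]     # rising edge
    shaped[-half:] *= win[half:]    # falling edge
    return shaped
```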

Participants were tested individually in a double-walled sound-attenuated chamber (IAC model 1204A). Each participant performed 2 versions of an n-back working memory task at 2 levels of difficulty. In the category task, participants responded whenever a semantic category was repeated (e.g., 2 animal sounds), regardless of the sound's location. In the easy (1-back) task, participants were required to compare the current stimulus with that presented on the previous trial and to press a button if the stimulus belonged to the same semantic category. In the harder (2-back) task, participants were required to compare the current stimulus with that presented 2 trials earlier and to press a button if the stimulus belonged to the same semantic category. In the location task, participants indicated by pressing a button whether the incoming stimulus was presented at the same location as in the previous trial (1-back) or, during the 2-back task, whether the current stimulus matched the location of the stimulus presented 2 trials earlier. The stimuli used in the category and location conditions were identical; only the task instructions changed. Although the two tasks may require different levels of processing, because the identity judgment involved 3 categories with various exemplars (e.g., animal sounds) whereas the location judgment involved 3 fixed azimuthal positions, pilot studies showed that the two tasks were comparable in terms of difficulty. The inter-stimulus interval (ISI) in the easier task was 900–1100 ms in 10-ms steps, whereas in the difficult task the ISI was 1900–2100 ms in 10-ms steps. Participants always completed the easier task first, but the order of presentation of the category and location conditions was counterbalanced across subjects. Each block contained 135 trials, 33 of which were targets. Participants completed 2 blocks of each level of each condition for a total of 8 blocks (2 easy category, 2 difficult category, 2 easy location, 2 difficult location). Participants did not receive any feedback regarding their performance.
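As a concrete illustration of the target rule in both tasks, the sketch below labels targets in a trial sequence; the logic is an assumption of how the rule could be coded, not the original presentation script.

```python
# A trial is a target when its task-relevant attribute (semantic category in the
# category task, virtual location in the location task) matches the attribute
# presented n trials back (n = 1 or 2).
from typing import List

def nback_targets(attributes: List[str], n: int) -> List[bool]:
    """Return a target flag for each trial in the sequence."""
    return [i >= n and attributes[i] == attributes[i - n] for i in range(len(attributes))]

categories = ["animal", "human", "animal", "music", "music"]
print(nback_targets(categories, n=1))  # [False, False, False, False, True]
print(nback_targets(categories, n=2))  # [False, False, True, False, False]
```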

ERP Recording and Analysis

Neuroelectric brain activity was digitized continuously from an array of 64 electrodes with a bandpass of 0.05–100 Hz and sampling rate of 500 Hz using NeuroScan Synamps2 (Compumedics, El Paso, TX). During the recording, all electrodes were referenced to the Cz electrode but were re-referenced to an average reference for data analysis. Electrodes placed at the outer canthi and the superior and inferior orbit monitored vertical and horizontal eye movements.
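The re-referencing step can be sketched as follows; the array layout is an assumption, and the snippet simply re-inserts the implicit Cz channel before subtracting the mean across channels at every sample.

```python
import numpy as np

def to_average_reference(eeg_cz_ref: np.ndarray) -> np.ndarray:
    """eeg_cz_ref: (n_channels, n_times) data referenced to Cz (Cz itself not stored).
    Returns the same data re-referenced to the average of all channels."""
    full = np.vstack([eeg_cz_ref, np.zeros((1, eeg_cz_ref.shape[1]))])  # add Cz back as zeros
    return full - full.mean(axis=0, keepdims=True)
```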

All averages were computed using Brain Electrical Source Analysis (BESA, V. 5.1.8) software. The analysis epoch included 200 ms of prestimulus activity and 2000 ms of poststimulus activity. Amplitude thresholds were adjusted on a subject-by-subject basis to include a minimum of 80% of the target stimuli in the average. Thresholds ranged from 100 to 195 μV (average = 152 μV). ERPs were then averaged separately for each condition, stimulus type, and electrode site.
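A minimal sketch of this epoch-rejection and averaging step is given below; the array shapes and the use of a maximum-absolute-amplitude criterion are assumptions for illustration.

```python
import numpy as np

def average_epochs(epochs: np.ndarray, threshold_uv: float) -> np.ndarray:
    """epochs: (n_trials, n_channels, n_times) single-trial data in microvolts.
    Discard trials in which any channel exceeds the per-subject amplitude
    threshold, then average the remaining trials."""
    keep = np.abs(epochs).max(axis=(1, 2)) <= threshold_uv
    return epochs[keep].mean(axis=0)
```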

For each participant, a set of ocular movements was obtained prior to and after the experiment (Picton et al. 2000). From this set, averaged eye movements were calculated for both lateral and vertical eye movements as well as for eye-blinks. A principal component analysis of these averaged recordings provided a set of components that best explained the eye movements. The scalp projections of these components were then subtracted from the experimental ERPs to minimize ocular contamination such as blinks, saccades and lateral eye movements for each individual average. ERPs were then digitally low-pass filtered to attenuate frequencies above 20 Hz.
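The logic of this correction can be sketched as follows (after Picton et al. 2000); the array shapes and the simple least-squares projection are assumptions, not BESA's exact computation.

```python
import numpy as np

def remove_ocular_components(erp: np.ndarray, eye_avgs: np.ndarray, n_comp: int = 3) -> np.ndarray:
    """erp: (n_channels, n_times) experimental ERP; eye_avgs: (n_channels, n_samples)
    averaged blinks and eye movements. Returns the ERP with the leading ocular
    spatial components projected out."""
    centered = eye_avgs - eye_avgs.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    topo = u[:, :n_comp]                                   # scalp projections of ocular activity
    coeffs = np.linalg.lstsq(topo, erp, rcond=None)[0]     # fit their contribution at each sample
    return erp - topo @ coeffs
```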

The electrophysiological data were analyzed using PLS in order to distinguish the different components contributing to “what” and “where” processing, working memory load, and target processing. Use of PLS allows one to consider an entire dataset (i.e., all conditions across all electrodes and latencies) in a single analysis. The main output from a PLS analysis consists of a set of latent variables (LVs). Each LV represents one contrast between experimental conditions (design salience, or task effect) together with the corresponding pattern across electrodes and latencies (electrode saliences), which optimally expresses the contrast. Statistical assessment in PLS relies on resampling procedures. First, the significance of LVs is estimated using permutation tests, which randomly reassign conditions within participants. Second, for each significant LV, the corresponding electrode saliences are further tested for stability across participants by bootstrap resampling of participants. These 2 resampling methods provide complementary information about the statistical strength of the task effects and the reliability of the corresponding regional contributions. For a basic mathematical description of PLS and a detailed discussion of the statistical assessment, please refer to McIntosh and Lobaugh (2004).
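The following sketch illustrates the core of a nonrotated, contrast-driven PLS with permutation and bootstrap assessment, in the spirit of McIntosh and Lobaugh (2004); the array layout, the norm used as the figure of merit, and the thresholds are assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def nonrotated_pls(data: np.ndarray, contrast: np.ndarray,
                   n_perm: int = 500, n_boot: int = 500):
    """data: (n_subjects, n_conditions, n_features), features = electrode x time amplitudes.
    contrast: (n_conditions,) a priori contrast weights."""
    n_sub, n_cond, _ = data.shape

    def saliences(d):
        # Project the condition means onto the contrast: one salience per electrode/time point
        return np.einsum('c,cf->f', contrast, d.mean(axis=0))

    obs = saliences(data)
    obs_strength = np.linalg.norm(obs)

    # Permutation test: randomly reassign conditions within each participant
    perm = np.empty(n_perm)
    for p in range(n_perm):
        shuffled = np.stack([data[s, rng.permutation(n_cond)] for s in range(n_sub)])
        perm[p] = np.linalg.norm(saliences(shuffled))
    p_value = np.mean(perm >= obs_strength)

    # Bootstrap over participants: stability of each electrode/time salience
    boot = np.empty((n_boot, obs.size))
    for b in range(n_boot):
        boot[b] = saliences(data[rng.integers(0, n_sub, n_sub)])
    bootstrap_ratio = obs / boot.std(axis=0)   # |ratio| > 3 is treated as stable in the figures

    return p_value, bootstrap_ratio
```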

Main task effects and their interactions were analyzed using 5 a priori defined, experimentally driven contrasts (nonrotated PLS). For the “what” and “where” and working memory contrasts, the analysis was limited to ERPs elicited by standard sounds because ERPs to deviant sounds comprised fewer trials and increased the variance, thereby reducing the likelihood of finding small differences between the tasks and working memory loads. Each contrast expressed a particular differentiation in brain signal amplitude across the 4 conditions. For example, for the contrast aimed at identifying the time course of neural activity that distinguished the processing related to sound identity and sound location, the ERPs for the category task during the 1-back and 2-back conditions were assigned a value of −1 and all ERPs recorded during the location judgment task were assigned a value of 1. We will also refer to this contrast as “what versus where.” The effect of stimulus type and its possible interactions with task and working memory load were also investigated by including in the PLS analysis the ERPs elicited by the target stimuli. Statistical assessment of task effects was performed using 500 permutations and 500 bootstrap samples.
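For illustration, the task and load contrasts over the four standard-sound conditions could be coded as below; the condition ordering and exact weights are assumptions chosen to match the description above (the stimulus-type contrast would additionally include the target conditions).

```python
conditions = ["category_1back", "category_2back", "location_1back", "location_2back"]

contrasts = {
    "what_vs_where": [-1, -1,  1,  1],   # category tasks vs location tasks
    "easy_vs_hard":  [-1,  1, -1,  1],   # 1-back vs 2-back, collapsed over task
    "task_x_load":   [ 1, -1, -1,  1],   # task by load interaction
}
```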

PLS and Source Waveforms

To gain a better understanding of the spatiotemporal pattern underlying sound identity and sound location processing, we also conducted a PLS analysis on the source waveforms obtained from each participant for each stimulus and task condition. To enhance the signal-to-noise ratio, the model was created using the grand averaged ERPs elicited by standard stimuli regardless of task and memory load. The analysis assumed a 4-shell ellipsoidal head model with relative conductivities of 0.33, 0.33, 0.0042, and 1 for the brain, scalp, bone, and cerebrospinal fluid, respectively, and sizes of 85 mm (radius), 6 mm (thickness), 7 mm (thickness), and 1 mm (thickness). As an initial step, we used a surrogate model from the BESA software (version 5.1.8) designed to model auditory evoked potentials. This model comprises 11 regional sources, each containing 3 orthogonal dipoles to account for all directions of current flow at the source location (tangential, radial, anterior/posterior). Maintaining the symmetry and orthogonality constraints, the orientations of the tangential sources in the left and right auditory cortices were aligned with the N1 peak amplitude (maximum direction of activity between 60 and 160 ms), and the 3 sources in the auditory cortex were converted to single dipoles. Then the remaining regional sources were converted to single dipoles to account for activity between 160 and 800 ms following stimulus onset. In all cases, the source location was kept constant; only the orientation was allowed to change to optimize the signal-to-noise ratio. In each participant, the resulting model was held fixed and used as a spatial filter to derive source waveforms for both stimulus types (standard and target) in all listening conditions (i.e., category-easy, category-hard, location-easy, location-hard). The PLS was performed on the source waveforms using the same task contrasts as before.
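Conceptually, once the dipole locations and orientations are fixed, deriving source waveforms amounts to applying a spatial filter to the scalp data; the pseudo-inverse used below is a simplification of BESA's actual computation, and the array shapes are assumptions.

```python
import numpy as np

def source_waveforms(leadfield: np.ndarray, erp: np.ndarray) -> np.ndarray:
    """leadfield: (n_channels, n_sources) scalp projection of the fixed dipole model.
    erp: (n_channels, n_times) scalp-recorded ERP for one condition.
    Returns (n_sources, n_times) source waveforms."""
    return np.linalg.pinv(leadfield) @ erp
```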

Results

Behavior

Figure 1 shows the group mean accuracy (hits minus false alarms) and response time as a function of working memory load in both category and location tasks. In both tasks, accuracy decreased with increasing working memory load, F(1,11) = 7.06, P < 0.05. Neither the main effect of task nor the interaction between task and working memory load was significant. A similar pattern was found for the response time data, with longer response times during the 2-back than during the 1-back condition, F(1,11) = 16.98, P < 0.005. As for accuracy, neither the main effect of task nor the interaction between task and working memory load was significant.
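For a two-level within-subject factor such as load, the reported F(1,11) statistics are equivalent to squared paired t values; the sketch below illustrates the accuracy measure and this equivalence with made-up numbers.

```python
import numpy as np
from scipy import stats

def accuracy(hit_rate: np.ndarray, false_alarm_rate: np.ndarray) -> np.ndarray:
    """Accuracy as hits minus false alarms."""
    return hit_rate - false_alarm_rate

# hypothetical per-subject accuracies, collapsed over task
acc_1back = np.array([0.90, 0.85, 0.92, 0.88])
acc_2back = np.array([0.78, 0.74, 0.83, 0.80])
t, p = stats.ttest_rel(acc_1back, acc_2back)
print(f"F(1,{len(acc_1back) - 1}) = {t**2:.2f}, P = {p:.3f}")
```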

Figure 1.

(A) Group mean accuracy (hits minus false alarms) and (B) response time as a function of working memory load in both category and location tasks. The error bars indicate standard error of the mean.


ERP Results

Group mean ERPs elicited by the same stimuli during working memory for sound identity and sound location comprised large N1 and P2 waves peaking at about 120 and 200 ms poststimulus, respectively. The N1 and P2 waves were largest at midline frontocentral scalp sites and inverted in polarity at inferior parieto-occipital sites (Fig. 2C), consistent with generators located in the superior temporal gyrus along the planum temporale. In addition to these sensory evoked responses, the ERPs comprised a slow negative wave that lasted several hundred milliseconds over the frontal areas and was followed by small transient responses peaking about 100 and 200 ms after sound offset (i.e., 1005 ms). Processing sound identity and sound location was also paralleled by an LPC that was maximal over the midline parieto-occipital areas.

Figure 2.

“What versus where” contrast. (A) Scalp distributions for the “what” (left) and “where” (right) task averaged over working memory load, with the difference in amplitude distribution at 400 ms shown in (B). Time courses for the 2 conditions at the midline central scalp site (Cz) and right mastoid (TP10) are shown in (C). Color bars on top and bottom signify times of stable difference between the 2 conditions at this electrode (absolute bootstrap ratio > 3). The latency displayed in (A) and (B) was selected based on the bootstrap ratio plot shown in (D). Around 400 ms the contrast is most stably expressed (i.e., large absolute bootstrap ratios) and most widespread (i.e., large number of stable electrodes at a given time).


The effects of task instruction and working memory load on ERPs were quantified using PLS applied to the whole epoch across all electrodes. The effects of task (category vs. location) and working memory load (1-back vs. 2-back) and their potential interaction were examined by computing specific contrasts among the experimental conditions. The interaction contrast between task and memory load was not significant (P = 0.98).

“What” and “Where” Contrast

The “what versus where” contrast was significant (P < 0.01). The difference between the “what” and “where” working memory tasks was expressed maximally at centroparietal sites, beginning at about 200-ms poststimulus and peaking at about 400 ms after stimulus onset (Fig. 2). This task-related difference was also present at bilateral inferior temporal-parietal and cerebellar sites, but with a different polarity, suggesting generators located in the superior temporal gyrus.

The results from the PLS analysis indicate that task-related differences in processing sound identity and sound location emerged after the N1 wave. Although the PLS technique is very sensitive, it remains possible that small differences in N1 went undetected, especially at temporal sites (e.g., electrodes T3 and T4) where the N1 amplitude is smaller. To rule out this possibility, we carried out an ANOVA on the N1 peak amplitude and latency recorded over the left (electrode T3) and right (electrode T4) auditory cortices. The analysis was limited to ERPs elicited by standard sounds because differences in N1 between location and category deviant stimuli can be accounted for by refractoriness rather than by differences related to task instruction. An ANOVA on the N1 latency with task and load as factors yielded a main effect of hemisphere, F(1,11) = 14.37, P < 0.005, with the N1 peaking earlier over the right than the left hemisphere. The interaction between task and hemisphere was also significant, F(1,11) = 7.17, P < 0.05, reflecting shorter N1 latency over the right auditory cortex during the category (M = 94 ms; SE = 3.0 ms) than the location (M = 98 ms; SE = 3.3 ms) task, but longer latency over the left auditory cortex in the category (M = 110 ms; SE = 2.5 ms) compared with the location (M = 108 ms; SE = 2.6 ms) task. For the N1 amplitude, the main effect of working memory load was significant, F(1,11) = 12.96, P < 0.005. Although the interaction between task and working memory load was not significant, F < 1, there was a marginally significant task × load × hemisphere interaction, F(1,11) = 4.56, P = 0.06, which reflected a greater load-related increase in N1 amplitude over the left hemisphere during the category task, whereas the load-related change was greater over the right hemisphere during the location task.
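The peak measures entering this ANOVA can be sketched as follows; the search window mirrors the 60-160 ms window used for the dipole fitting above, and the function itself is an illustrative assumption.

```python
import numpy as np

def n1_peak(erp: np.ndarray, times_ms: np.ndarray, window=(60, 160)):
    """erp: (n_times,) waveform at one electrode (e.g., T3 or T4); times_ms: matching time axis.
    Returns (amplitude, latency_ms) of the most negative point in the search window."""
    mask = (times_ms >= window[0]) & (times_ms <= window[1])
    idx = np.argmin(erp[mask])          # N1 is a negativity
    return erp[mask][idx], times_ms[mask][idx]
```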

Working Memory Load Contrast

The main effect of working memory load (i.e., the “easy vs. hard” contrast) was significant (P < 0.001) and was reliably expressed over 2 different intervals. The first interval encompassed the N1 wave, with larger amplitude during the 2-back than the 1-back condition at central and frontocentral scalp sites. This increase in N1 amplitude as a function of working memory load was, however, partly confounded by the difference in ISI between the 2 conditions, the ISI being longer in the 2-back than in the 1-back condition. The second interval began at about 300-ms poststimulus and was expressed maximally during the 450- to 600-ms poststimulus interval (Fig. 3). This load-related difference was expressed at frontocentral sites on the one hand, and bilaterally at inferior frontotemporal, inferior temporal-parietal, and cerebellar sites on the other, but with a different polarity, suggesting generators located in temporal and/or prefrontal cortex. Moreover, the effect was larger over the left central (e.g., C3) than the right (e.g., C4) scalp region, which may partly reflect motor preparation.

Figure 3.

“Easy versus hard” contrast. (A) Scalp distributions of the “easy” (left) and “hard” (right) conditions averaged over both tasks, with the difference in amplitude distribution at 558 ms shown in (B). Time courses for the 2 conditions at midline frontocentral (FCz) and right mastoid (TP10) sites are shown in (C). Color bars on top and bottom signify times of stable difference between the 2 conditions at this electrode (absolute bootstrap ratio > 3). The latency displayed in (A) and (B) was selected based on the bootstrap ratio plot shown in (D). Looking beyond the early difference at ∼100 ms, which is confounded by the difference in ISI, the contrast is most stably expressed (i.e., large absolute bootstrap ratios) and most widespread (i.e., large number of stable electrodes at a given time) around 558 ms.


Stimulus Type Contrast

The “standard versus target” contrast was significant (P < 0.001). The main effect of stimulus type began as early as 130-ms poststimulus and was expressed maximally over the left central and parietal regions. Increased target-related activity beginning at about 200-ms poststimulus and peaking around 350-ms poststimulus followed this early modulation (Fig. 4). Relative to standard sounds, processing infrequent target stimuli was also associated with changes in ERP amplitude over the left central scalp region, consistent with an fMRI study showing increased activity in the left pre- and/or postcentral gyrus when participants make their responses with their right index finger (Alain et al. 2008). In addition, there was a modulation that encompassed the P2 interval at parietal sites, which was also present bilaterally at temporo-parietal and cerebellar sites, albeit to a greater extent over the right than the left hemisphere.

Figure 4.

“Standard versus target” contrast. (A) Scalp distributions of the “standard” (left) and “target” (right) stimuli averaged over task and working memory load, with the difference in amplitude distribution shown in (B). Time courses for the 2 conditions at left central (C3) and right cerebellar (CB2) sites are shown in (C). Color bars on top and bottom signify times of stable difference between the 2 conditions at this electrode (absolute bootstrap ratio > 3). The latency displayed in (A) and (B) was selected based on the bootstrap ratio plot shown in (D). Around 200 ms the contrast is most stably expressed (i.e., large absolute bootstrap ratios) and most widespread (i.e., large number of stable electrodes at a given time).


Source Waveform Results

To gain a better understanding of the neural network underlying these differences in ERPs as a function of task and working memory load, we performed a PLS analysis on the source waveforms, which reflect the time course of neural activity in the different brain regions that best account for the scalp-recorded data. The scalp-recorded ERPs were modeled with 15 dipoles (Fig. 5), including 6 dipoles in the temporal lobes near Heschl's gyrus (3 in each hemisphere). As for the scalp-recorded data, the effects of task (category vs. location), working memory load (1-back vs. 2-back), stimulus type (standard vs. target), and their potential interaction were examined by computing specific contrasts among the experimental conditions. The interaction contrast between task and memory load was not significant (P = 0.95).

Figure 5.

Dipole source model of the grand averaged ERPs elicited by the standard stimuli in both “what” and “where” tasks regardless of working memory load.


“What” and “Where” Contrast

The “what vs. where” contrast was significant (P < 0.01) and was expressed maximally in the tangential sources near Heschl's gyrus. The difference between auditory “what” and “where” processing emerged at about 200-ms poststimulus and peaked during the 300- to 650-ms poststimulus interval (Fig. 6). Different time courses of activity were also found in central medial (CM), occipital medial (OpM), and right frontal (FR) and parietal (PR) sources. In the right frontal and parietal sources, these differences emerged after those observed in the auditory cortices and were relatively small in amplitude.

Figure 6.

Source waveforms: “What versus Where” contrast. Group mean source waveforms for the standard stimuli as a function of the task averaged over working memory load. FpM = frontopolar median; FM = frontal median; CM = central median; PM = parietal median; OpM = occipital-parietal median; FL = frontal left; ACaL = auditory cortex anterior–posterior tangential left; ACtL = auditory cortex tangential left; ACrL = auditory cortex radial left; PL = parietal left; FR = frontal right; ACaR = auditory cortex anterior–posterior tangential right; ACtR = auditory cortex tangential right; ACrR = auditory cortex radial right; PR = parietal right. The task-related difference in neural activity around 400 ms was best reflected in the tangential auditory sources (ACtL and ACtR) and also in the medial occipital source (OpM).


Working Memory Load Contrast

As for the scalp-recorded data, the main effect of working memory load on source waveforms (i.e., the “easy vs. hard” contrast) was significant (P < 0.001) and was reliably expressed over 2 different intervals. The first interval encompassed the tangential source of the N1 wave, with a larger N1 during the 2-back than the 1-back condition. The second interval began at 400 ms and was expressed maximally at the left tangential and anterior/posterior sources in the temporal cortex (Fig. 7). This load-related difference was also expressed in the right parietal and frontal sources.

Figure 7.

Source waveforms: “Easy versus Hard” contrast. Group mean source waveforms for the standard stimuli as a function of working memory load averaged over task. FpM = frontopolar median; FM = frontal median; CM = central median; PM = parietal median; OpM = occipital–parietal median; FL = frontal left; ACaL = auditory cortex anterior–posterior tangential left; ACtL = auditory cortex tangential left; ACrL = auditory cortex radial left; PL = parietal left; FR = frontal right; ACaR = auditory cortex anterior–posterior tangential right; ACtR = auditory cortex tangential right; ACrR = auditory cortex radial right; PR = parietal right. The effect of working memory load peaking at 560-ms poststimulus was best reflected in the anterior–posterior auditory sources (ACaL and ACaR), frontal median (FM), and right parietal source (PR).


Stimulus Type Contrast

The “standard vs. target” contrast was significant (P < 0.001). The main effect of stimulus type on source waveforms began as early as 130-ms poststimulus and was expressed maximally at the tangential source of the N1 in both the left and right auditory cortex. Increased target-related activity peaking at 600 ms in the left parietal source followed this early modulation (Fig. 8).

Figure 8.

Source waveforms: “Standard versus Target” contrast. Group mean source waveforms for the standard and target stimuli averaged over task and working memory load. FpM = frontopolar median; FM = frontal median; CM = central median; PM = parietal median; OpM = occipital–parietal median; FL = frontal left; ACaL = auditory cortex anterior–posterior tangential left; ACtL = auditory cortex tangential left; ACrL = auditory cortex radial left; PL = parietal left; FR = frontal right; ACaR = auditory cortex anterior–posterior tangential right; ACtR = auditory cortex tangential right; ACrR = auditory cortex radial right; PR = parietal right. The effect detected in electrode space around 200 ms was best reflected in source space mainly in the tangential auditory sources (ACtL and ACtR).


Discussion

This study aimed to clarify when the processing of sound identity and sound location diverges and to test whether this pattern of neural activity could be distinguished from that elicited by the manipulation of memory load. As mentioned earlier, the study was based on recent fMRI studies showing distinct patterns of brain activity during auditory working memory tasks involving natural sounds presented at various spatial locations (Grady et al. 2007; Alain et al. 2008). Hence, this study provides complementary information regarding the time course of task-specific changes in neural activity involved in processing auditory “what” and “where” information and helps substantiate the dual pathway model of auditory processing.

Effects of Task Instruction

Neurophysiological studies in nonhuman primates (Rauschecker et al. 1995; Kaas and Hackett 1998; Romanski, Bates, et al. 1999; Romanski, Tian, et al. 1999), neuroimaging (Alain et al. 2001; Maeder et al. 2001; Arnott et al. 2005; Degerman et al. 2006), and lesion studies (Clarke et al. 1996; Clarke et al. 2000, 2002; Adriani et al. 2003; Clarke and Thiran 2004) in humans have provided converging evidence supporting the dual pathway model of auditory processing, in which ventral and dorsal processing streams are thought to be analogous to the “what” and “where” processing streams in the visual modality (Rauschecker and Tian 2000). In the present study, we show segregation in processing “what” and “where” information that emerges at about 200 ms after sound onset when stimuli are identical and only attention is manipulated via instructions. Although the latency and amplitude distribution of this task-related modulation are consistent with our prior study using a delayed match to sample task (Alain et al. 2001), they differ from more recent studies showing differences in what and where processing as early as 100-ms poststimulus (Ahveninen et al. 2006; De Santis et al. 2007). In the present study the stimulus set was kept constant in both tasks, whereas other studies have compared ERPs from different stimulus sets (De Santis et al. 2007) or measured brain activity as a function of infrequent changes in stimulus attributes (Ahveninen et al. 2006), making it difficult to know whether the differences were specific to processing sound location or identity or whether they simply reflected general differences in stimulus adaptation. In hindsight, the relatively long latency for the “what” and “where” segregation to emerge may come as no surprise given that natural sounds (e.g., a dog barking, a baby crying, or a flute playing) are likely to require some time before they can be identified and/or localized. Hence, the time course of this “what” and “where” segregation could be partly accounted for by the time needed to extract identity and location information from natural sounds such that the appropriate schemata could be retrieved and then compared with representations of previously presented stimuli in working memory. Moreover, it is important to keep in mind that the lack of early differences in ERP amplitude does not mean that no functional segregation takes place earlier, but rather indicates that segregation in processing “what” and “where” information takes more time to emerge when there are no physical differences among the stimuli. In other words, top-down effects on the dual pathways likely take place in more associative areas, consistent with fMRI studies that control for differences in stimulus set (e.g., Alain et al. 2001; Alain et al. 2008).

Evidence from animal studies (e.g., Rauschecker 1998a, 1998b; Rauschecker and Tian 2000) indicates that segregation in processing sound identity and sound location takes place in nonprimary areas (i.e., belt and parabelt areas). Although we cannot exclude modulation of primary areas (i.e., Heschl's gyrus) via efferent connections (Winer and Lee 2007), the latency at which segregation in processing sound identity and sound location takes place suggests that it occurs in associative areas rather than in primary auditory cortex. Our dipole source analysis provides converging evidence supporting domain-specific processing along the superior temporal gyrus near Heschl's gyrus, which appears consistent with activation arising from areas in or near the primary auditory cortex. The top-down effects on processing sound identity and sound location are likely to involve complex neural networks even within the superior temporal gyrus. However, the proximity of these generators and the low spatial resolution of the EEG technique make it difficult to dissociate contributions arising from the auditory areas immediately anterior or posterior to Heschl's gyrus. In our previous fMRI studies using similar stimuli and design (Grady et al. 2007; Alain et al. 2008), we found greater activity in right dorsal brain regions, including the inferior parietal lobule (IPL) and superior frontal sulcus, during the location task than during the category task. Conversely, greater activity was observed in the left superior temporal gyrus and left inferior frontal gyrus during the category task compared with the location task. The results from these fMRI studies suggest a complex network of regions contributing to task-related differences in auditory working memory for sound identity and sound location.

Effects of Task Load

The increase in auditory mnemonic demand modulated the amplitude of slow sustained potentials over the frontocentral scalp region. This effect of working memory load was comparable for the spatial and the nonspatial task. Our findings are consistent with those of an earlier study showing that an increase in the auditory mnemonic load similarly modulates the amplitude of slow potentials over the frontal scalp region during spatial and nonspatial auditory working memory tasks (Rama et al. 2000). Together, these findings suggest that this sustained activity indexes memory demands of the task rather than task-specific activity. In the present study, the increase in working memory load was also associated with a modulation of the N1 amplitude, which can be accounted for by the longer delay interval in the 2-back task.

The amplitude distribution of the working memory load effect is consistent with generators located in prefrontal cortex and/or in the planum temporale along the superior temporal gyrus. The analysis of the source waveform activity also suggests that the right parietal cortex plays an important role in auditory working memory. Prior fMRI studies have shown an increase in fMRI signal in prefrontal brain areas (Rypma and D'Esposito 1999; Rypma et al. 1999; Martinkauppi et al. 2000), temporal cortex (Martinkauppi et al. 2000; Brechmann et al. 2007) and parietal cortices (Martinkauppi et al. 2000) with increasing auditory working memory load. The present analysis of ERP data indicates that increasing working memory load modulates neural activity at about 450 ms after sound onset. This slow wave (SW) activity may index a comparison process between the incoming sound and those stored in working memory. The latency and amplitude of this SW may be related to the duration of this comparison process and/or the number of items maintained in working memory (see also Rama et al. 2000).

Effects of Stimulus Type

Responses to infrequent category or location targets were accompanied by activity over the left central and parietal scalp regions. The enhanced activity over the left central scalp areas that began at about 210-ms poststimulus may reflect activity from the pre- and/or postcentral gyrus, which has been associated with right index finger responses (Alain et al. 2008). Target detection has also been associated with enhanced activity in the inferior parietal lobule (Kiehl et al. 2001; Linden et al. 1999; Mulert et al. 2004; Muller et al. 2003; Stevens et al. 2000; Stevens et al. 2005; Yoshiura et al. 1999). In the present study, activity associated with target detection and motor response differed from that observed for processing sound identity and sound location. This finding provides further evidence that task instructions place a differential load on monitoring sound attributes, which can be separated from sensorimotor integration and motor responses.

Concluding Remarks

These results provide further support for the notion that the processing of auditory information is parceled into nonspatial and spatial domains. Domain-specific auditory processing can be distinguished from working memory load and/or response-related processes (i.e., goal-directed action). Our findings are consistent with prior research in nonhuman primates (Rauschecker et al. 1995; Kaas and Hackett 1998; Romanski, Bates, et al. 1999; Romanski, Tian, et al. 1999) and in humans using fMRI (Alain et al. 2001; Maeder et al. 2001; Arnott et al. 2005; Degerman et al. 2006), and extend it by showing that top-down modulation in processing sound identity or sound location emerges at about 200 ms after sound onset. The time course suggests that both location and identity information are automatically extracted and maintained in working memory, and that “bias” in processing one or the other occurs after sensory registration.

This research was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada, and the James S. McDonnell Foundation. We also thank the Canadian Foundation for Innovation and the Ontario government for infrastructure support. Conflict of Interest: None declared.

References

Adriani M, Maeder P, Meuli R, Thiran AB, Frischknecht R, Villemure JG, Mayer J, Annoni JM, Bogousslavsky J, Fornari E, et al. 2003. Sound recognition and localization in man: specialized cortical networks and effects of acute circumscribed lesions. Exp Brain Res. 153:591-604.
Ahveninen J, Jaaskelainen IP, Raij T, Bonmassar G, Devore S, Hamalainen M, Levanen S, Lin FH, Sams M, Shinn-Cunningham BG, et al. 2006. Task-modulated "what" and "where" pathways in human auditory cortex. Proc Natl Acad Sci USA. 103:14608-14613.
Alain C, Arnott SR, Hevenor S, Graham S, Grady CL. 2001. "What" and "where" in the human auditory system. Proc Natl Acad Sci USA. 98:12301-12306.
Alain C, He Y, Grady CL. 2008. The contribution of the inferior parietal lobe to auditory spatial working memory. J Cogn Neurosci. 20:285-295.
Anurova I, Artchakov D, Korvenoja A, Ilmoniemi RJ, Aronen HJ, Carlson S. 2005. Cortical generators of slow evoked responses elicited by spatial and nonspatial auditory working memory tasks. Clin Neurophysiol. 116:1644-1654.
Arnott SR, Binns MA, Grady CL, Alain C. 2004. Assessing the auditory dual-pathway model in humans. Neuroimage. 22:401-408.
Arnott SR, Grady CL, Hevenor SJ, Graham S, Alain C. 2005. The functional organization of auditory working memory as revealed by fMRI. J Cogn Neurosci. 17:819-831.
Brechmann A, Gaschler-Markefski B, Sohr M, Yoneda K, Kaulisch T, Scheich H. 2007. Working memory-specific activity in auditory cortex: potential correlates of sequential processing and maintenance. Cereb Cortex. 17:2544-2552.
Clarke S, Bellmann A, De Ribaupierre F, Assal G. 1996. Non-verbal auditory recognition in normal subjects and brain-damaged patients: evidence for parallel processing. Neuropsychologia. 34:587-603.
Clarke S, Bellmann A, Meuli RA, Assal G, Steck AJ. 2000. Auditory agnosia and auditory spatial deficits following left hemispheric lesions: evidence for distinct processing pathways. Neuropsychologia. 38:797-807.
Clarke S, Bellmann Thiran A, Maeder P, Adriani M, Vernet O, Regli L, Cuisenaire O, Thiran JP. 2002. What and where in human audition: selective deficits following focal hemispheric lesions. Exp Brain Res. 147:8-15.
Clarke S, Thiran AB. 2004. Auditory neglect: what and where in auditory space. Cortex. 40:291-300.
De Santis L, Clarke S, Murray MM. 2007. Automatic and intrinsic auditory "what" and "where" processing in humans revealed by electrical neuroimaging. Cereb Cortex. 17:9-17.
Degerman A, Rinne T, Salmi J, Salonen O, Alho K. 2006. Selective attention to sound location or pitch studied with fMRI. Brain Res. 1077:123-134.
Duzel E, Habib R, Schott B, Schoenfeld A, Lobaugh N, McIntosh AR, Scholz M, Heinze HJ. 2003. A multivariate, spatiotemporal analysis of electromagnetic time-frequency data of recognition memory. Neuroimage. 18:185-197.
Grady CL, Yu H, Alain C. 2008. Age-related differences in brain activity underlying working memory for spatial and nonspatial auditory information. Cereb Cortex. 18:189-199.
Hay JF, Kane KA, West R, Alain C. 2002. Event-related neural activity associated with habit and recollection. Neuropsychologia. 40:260-270.
Itier RJ, Taylor MJ, Lobaugh NJ. 2004. Spatiotemporal analysis of event-related potentials to upright, inverted, and contrast-reversed faces: effects on encoding and recognition. Psychophysiology. 41:643-653.
Kaas JH, Hackett TA. 1998. Subdivisions of auditory cortex and levels of processing in primates. Audiol Neurootol. 3:73-85.
Kiehl KA, Laurens KR, Duty TL, Forster BB, Liddle PF. 2001. Neural sources involved in auditory target detection and novelty processing: an event-related fMRI study. Psychophysiology. 38:133-142.
Linden DE, Prvulovic D, Formisano E, Vollinger M, Zanella FE, Goebel R, Dierks T. 1999. The functional neuroanatomy of target detection: an fMRI study of visual and auditory oddball tasks. Cereb Cortex. 9:815-823.
Lobaugh NJ, West R, McIntosh AR. 2001. Spatiotemporal analysis of experimental differences in event-related potential data with partial least squares. Psychophysiology. 38:517-530.
Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran JP, Pittet A, Clarke S. 2001. Distinct pathways involved in sound recognition and localization: a human fMRI study. Neuroimage. 14:802-816.
Martinkauppi S, Rama P, Aronen HJ, Korvenoja A, Carlson S. 2000. Working memory of auditory localization. Cereb Cortex. 10:889-898.
McIntosh AR, Lobaugh NJ. 2004. Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage. 23(Suppl. 1):S250-S263.
Mulert C, Jager L, Schmitt R, Bussfeld P, Pogarell O, Moller HJ, Juckel G, Hegerl U. 2004. Integration of fMRI and simultaneous EEG: towards a comprehensive understanding of localization and time-course of brain activity in target detection. Neuroimage. 22:83-94.
Muller BW, Stude P, Nebel K, Wiese H, Ladd ME, Forsting M, Jueptner M. 2003. Sparse imaging of the auditory oddball task with functional MRI. Neuroreport. 14:1597-1601.
Picton TW, van Roon P, Armilio ML, Berg P, Ille N, Scherg M. 2000. The correction of ocular artifacts: a topographic perspective. Clin Neurophysiol. 111:53-65.
Rama P, Paavilainen L, Anourova I, Alho K, Reinikainen K, Sipila S, Carlson S. 2000. Modulation of slow brain potentials by working memory load in spatial and nonspatial auditory tasks. Neuropsychologia. 38:913-922.
Rauschecker JP. 1998a. Cortical processing of complex sounds. Curr Opin Neurobiol. 8:516-521.
Rauschecker JP. 1998b. Parallel processing in the auditory cortex of primates. Audiol Neurootol. 3:86-103.
Rauschecker JP, Tian B. 2000. Mechanisms and streams for processing of "what" and "where" in auditory cortex. Proc Natl Acad Sci USA. 97:11800-11806.
Rauschecker JP, Tian B, Hauser M. 1995. Processing of complex sounds in the macaque nonprimary auditory cortex. Science. 268:111-114.
Romanski LM, Bates JF, Goldman-Rakic PS. 1999. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. J Comp Neurol. 403:141-157.
Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, Rauschecker JP. 1999. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat Neurosci. 2:1131-1136.
Rypma B, D'Esposito M. 1999. The roles of prefrontal brain regions in components of working memory: effects of memory load and individual differences. Proc Natl Acad Sci USA. 96:6558-6563.
Rypma B, Prabhakaran V, Desmond JE, Glover GH, Gabrieli JD. 1999. Load-dependent roles of frontal brain regions in the maintenance of working memory. Neuroimage. 9:216-226.
Stevens AA, Skudlarski P, Gatenby JC, Gore JC. 2000. Event-related fMRI of auditory and visual oddball tasks. Magn Reson Imaging. 18:495-502.
Stevens MC, Calhoun VD, Kiehl KA. 2005. Hemispheric differences in hemodynamics elicited by auditory oddball stimuli. Neuroimage. 26:782-792.
Wenzel EM, Arruda M, Kistler DJ, Wightman FL. 1993. Localization using nonindividualized head-related transfer functions. J Acoust Soc Am. 94:111-123.
Winer JA, Lee CC. 2007. The distributed auditory cortex. Hear Res. 229:3-13.
Yoshiura T, Zhong J, Shibata DK, Kwok WE, Shrier DA, Numaguchi Y. 1999. Functional MRI study of auditory and visual oddball tasks. Neuroreport. 10:1683-1688.