Increasing evidence suggests separate auditory pattern and space processing streams. The present paper describes two magnetoencephalogram studies examining gamma-band activity in response to changes in auditory patterns using consonant–vowel syllables (experiment 1), animal vocalizations and artificial noises (experiment 2). Two samples of each sound type were presented to passively listening subjects in separate oddball paradigms with 80% standards and 20% deviants differing in their spectral composition. Evoked magnetic mismatch fields peaking ~190 ms poststimulus showed a trend for a left-hemisphere advantage for syllables, but no hemispheric differences for the other sounds. Frequency analysis and statistical probability mapping of the differences between deviants and standards revealed increased gamma-band activity above 60 Hz over left anterior temporal/ventrolateral prefrontal cortex for all three types of stimuli. This activity peaked simultaneously with the mismatch responses for animal sounds (180 ms) but was delayed for noises (260 ms) and syllables (320 ms). Our results support the hypothesized role of anterior temporal/ventral prefrontal regions in the processing of auditory pattern change. They extend earlier findings of gamma-band activity over posterior parieto-temporal cortex during auditory spatial processing that supported the putative auditory dorsal stream. Furthermore, earlier gamma-band responses to animal vocalizations may suggest faster processing of fear-relevant information.
Recent research has suggested that auditory pattern and spatial information are processed in separate ventral and dorsal streams, respectively (Rauschecker and Tian, 2000), comparable to those in the visual system (Ungerleider and Mishkin, 1982). Single-unit recordings in non-human primates have shown that neurons in the caudal auditory belt and parabelt respond preferentially to sound characteristics useful for auditory localization (Rauschecker et al., 1997; Kaas et al., 1999; Recanzone et al., 2000; Tian et al., 2001). The auditory dorsal stream originating in these fields is thought to involve the posterior parietal and dorsolateral prefrontal cortex (Rauschecker, 1998a; Hackett et al., 1999; Romanski et al., 1999b). Homologous regions have also been shown to be activated during auditory spatial processing in humans (Griffiths et al., 1998, 2000; Baumgart et al., 1999; Bushara et al., 1999; Griffiths and Green, 1999; Weeks et al., 1999). In contrast, encoding of auditory patterns including species-specific vocalizations appears to involve neurons in rostro-lateral areas of the non-primary auditory cortex (Rauschecker et al., 1995, 1997; Rauschecker, 1998a; Tian et al., 2001). The putative auditory ventral stream projects via anterior temporal areas to the ventral prefrontal cortex (Rauschecker, 1998b; Romanski et al., 1999a,b). In humans, anterior temporal (Belin et al., 2000; Scott et al., 2000) and left prefrontal regions (Binder et al., 1997) have also been found to participate in receptive language tasks, and auditory agnosia, e.g. for environmental sounds, may result from fronto-temporal lesions (Clarke et al., 2000).
In a recent magnetoencephalography (MEG) study (Kaiser et al., 2000b), we investigated auditory spatial processing using fast oscillatory activity as a putative correlate of higher-level cognitive processes. Synchronized neuronal firing in the gamma-band range (>30 Hz) observed in multiunit recordings during visual stimulation (Gray et al., 1989; Eckhorn et al., 1993; Nowak et al., 1997; Friedman-Hill et al., 2000; Maldonado et al., 2000) has been proposed as a mechanism for feature binding (Singer and Gray, 1995) and may represent a prerequisite for perceptual awareness (Engel and Singer, 2001). In humans, electroencephalographic (EEG) studies using a variety of different paradigms have demonstrated that nonphase-locked, induced gamma-band activity (GBA) is correlated with the perception of stimulus coherence or meaningfulness (Lutzenberger et al., 1995; Tallon-Baudry et al., 1996, 1997, 1998; Müller et al., 1997; Keil et al., 1999; Pulvermüller et al., 1999b). GBA has been interpreted as a signature of cortical networks involved in the generation of mental representations (Pulvermüller et al., 1999a; Tallon-Baudry and Bertrand, 1999).
Assessing differences in the processing of rare lateralized compared with frequent midline language stimuli, we found both evoked mismatch fields with generators at the level of the supratemporal plane (Näätänen, 1992; Alho, 1995; Schröger and Wolff, 1997) and enhanced GBA over posterior parieto-temporal cortex (Kaiser et al., 2000b). What may be the significance of increased GBA during mismatch processing? The memory-based, pre-attentive detection of changes in a sequence of sounds is thought to be accompanied by involuntary attention allocation to potentially important changes in the environment (Näätänen and Winkler, 1999) and by a closer analysis of these changes. Consistent with these notions, auditory mismatch responses have proved more sensitive to specific stimulus characteristics than EEG or MEG components not related to difference detection. For example, hemispheric differences have been demonstrated for language versus non-language stimuli (Alho et al., 1998; Shtyrov et al., 1998; Ackermann et al., 1999) and for the processing of binaural sound lateralization (Kaiser et al., 2000a) in mismatch paradigms but not for simple stimulus presentation (Woldorff et al., 1999; Shtyrov et al., 2000). We thus interpreted our findings of spatial mismatch-related GBA increases over posterior parieto-temporal areas as a signature of neuronal assemblies representing the changes in sound-source lateralization that formed the basis of stimulus deviance. This suggests that GBA reflects not only stimulus coherence or meaningfulness but may also be associated with the representation of more specific stimulus characteristics (Pulvermüller et al., 1997). This interpretation would be consistent with the putative role of posterior parieto-temporal regions in the representation of auditory spatial information (Andersen, 1995; Rauschecker and Tian, 2000).
Employing a comparable paradigm and the same methodology as in our previous investigation (Kaiser et al., 2000b), the present study assessed both evoked fields and oscillatory responses in MEG during passive listening to deviations in auditory patterns. The first experiment considered two consonant–vowel syllables, whereas the second experiment applied two versions each of an animal sound and an artificially distorted noise. In all three cases, standards and deviants differed in their spectral composition. These studies were not designed to test differences between the types of stimuli but rather aimed at identifying correlates of auditory pattern mismatch processing that may be common to different types of sounds. We hypothesized that changes in a sound sequence caused by deviations in the stimuli's spectral composition would (a) evoke magnetic mismatch responses originating at the supratemporal plane, and (b) give rise to increased GBA over anterior temporal and/or ventral prefrontal cortex, i.e. regions hypothesized to form parts of the auditory ventral, pattern processing stream (Rauschecker, 1998b; Romanski et al., 1999a,b). A GBA increase with this topography would be expected if GBA reflects the synchronization of cortical networks representing specific stimulus aspects that are processed following the detection of auditory pattern deviance.
Materials and Methods
Ten paid healthy volunteers (two females, eight males, aged 26–40 years) participated in experiment 1, and 16 adults (11 females, five males, aged 21–41 years) took part in experiment 2. None of the participants had a history of psychiatric or neurological illness or drug abuse and all were free of psychotropic medication. Audiometry prior to the experiment showed normal hearing in all subjects with thresholds below 30 dB (hearing level) at 1000 Hz for each ear. All participants had strong right-hand preferences as measured with the Edinburgh Handedness Inventory (Oldfield, 1971). Informed consent was obtained from all subjects. The studies were approved by the ethics committee of the University of Tübingen Medical Faculty.
Stimuli and Procedure
Both experiments relied on a classical oddball design with 20% rare, deviant stimuli interspersed among 80% frequent, standard stimuli. In experiment 1, the synthesized consonant–vowel syllable /da/ served as the standard, and the syllable /ba/ as the deviant. The duration of both syllables was 190 ms (10 ms voice onset time) and their fundamental frequency (F0) amounted to 128 Hz, giving the impression of a male voice. For both stimuli, an initial transient (duration: 35 ms) preceded the steady-state components of the lower formants F1–F3. The onset frequencies of the transients amounted to 300 (F1), 1300 (F2) and 3400 Hz (F3) for /da/, and to 300 (F1), 1000 (F2) and 2000 Hz (F3) for /ba/. The steady-state frequencies of both syllables were 780, 1100 and 2700 Hz for F1–F3, respectively. To obtain more natural sounding stimuli, two additional stationary formants at 3400 (F4) and 4600 Hz (F5) were added, and formant bandwidths were manually adjusted under auditory feedback control. Stimulus synthesis relied on an additive algorithm. Each of the five formants was modeled as an amplitude- and frequency-modulated sinusoid phase-locked to the fundamental frequency. Sound intensity amounted to 80 dB(A).
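The additive synthesis scheme can be illustrated with a minimal sketch. This is not the original synthesis code: the function name, the per-period exponential decay standing in for formant bandwidth, and the omission of the onset transients and manual bandwidth adjustment are all simplifying assumptions.

```python
import numpy as np

def synthesize_syllable_core(formants_hz, f0=128.0, dur=0.190, fs=22050):
    """Toy additive synthesis: each formant is a sinusoid whose phase
    restarts at every glottal period, i.e. phase-locked to F0. The
    exponential decay within each period is an illustrative stand-in
    for the formant bandwidth."""
    n = int(dur * fs)
    t = np.arange(n) / fs
    t_in_period = np.mod(t, 1.0 / f0)        # time within glottal cycle
    signal = np.zeros(n)
    for f in formants_hz:
        decay = np.exp(-60.0 * t_in_period)  # crude bandwidth model
        signal += decay * np.sin(2.0 * np.pi * f * t_in_period)
    return signal / np.max(np.abs(signal))   # normalize to +-1

# steady-state formants F1-F5 of the /da/ standard
da_like = synthesize_syllable_core([780, 1100, 2700, 3400, 4600])
```

Because every formant's phase is reset at each glottal pulse, the resulting waveform is periodic at F0 (128 Hz) regardless of the formant frequencies, which is what "phase-locked to the fundamental" amounts to.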
The second experiment was conducted to assess whether the effects found in the first study were specific to language stimuli or whether they could be generalized to nonverbal natural or artificial complex sounds. In experiment 2, two different standard–deviant combinations were presented in separate blocks. Two different recordings of a barking dog served as standard and deviant natural sounds, respectively. Electronically distorted versions of these two vocalizations were used as standard and deviant artificial, distorted ‘noise’ stimuli in the present study. Subjective ratings on a five-point scale obtained in a different study showed that distorted sounds were associated with significantly weaker visual associations than barking dog sounds [barking dog: 3.4 (SD = 0.3), distorted sound: 2.2 (SD = 0.3), F(1,17) = 13.7, P = 0.002], and none of the participants reported that distorted sounds reminded them of a dog. The distortion was achieved with the ‘flanger’ algorithm of the sound edit program ‘CoolEdit’ that mimics the sound of a low-quality tape recorder. The four sounds differed neither in their duration (155 ms) nor intensity [80 dB(A)]. Digital versions of all sounds in both experiments were sampled at 22 050 Hz and may be obtained on request from the authors.
In both experiments subjects were seated upright in a magnetically shielded chamber (Vakuum-Schmelze, Hanau, Germany). They were instructed to ignore the stimuli while sitting still and keeping their eyes open, looking at a fixation cross in the center of their visual field ~2 m in front of them. While there was only one recording block in experiment 1, experiment 2 comprised two blocks separated by a break where the chamber was opened. The order of the recording blocks was balanced across subjects. In both experiments, the recording blocks comprised 450 stimuli each, 360 of which were standards, and 90 deviants. Stimulus onset interval was 805 ms. Standards and deviants were presented in a pseudo-randomized order within each block, with no more than two consecutive deviants and at least three, but maximally seven consecutive standards.
Cortical magnetic fields were recorded with a whole-head MEG system (CTF Inc., Vancouver, Canada) comprising 151 hardware first-order magnetic gradiometers distributed with an average distance between sensors of 2.5 cm. The amplitude resolution of the CTF system amounts to 0.3 fT, enabling the detection of low-amplitude signal changes. The subject's head position was determined with localization coils fixed at the nasion and the pre-auricular points both at the beginning and end of each recording block. It was ensured that head movements did not exceed 0.5 cm. In experiment 1, the signals were sampled at a rate of 250 Hz. As the first study revealed effects in a fast frequency band, the sampling rate was increased to 312.5 Hz in the second experiment. In both experiments, epoch lengths were 600 ms including 50 ms prestimulus baselines. To minimize eye-blink artifacts, trials with signals exceeding 1.3 pT in a left frontal sensor were rejected.
Only responses to the first of a pair of deviants or to deviants not followed by another deviant, and to standard stimuli that immediately preceded a deviant were analyzed. Since two deviants followed each other consecutively on 10 occasions, this left 80 deviants for analysis: 10 that were followed by a second deviant and 70 that were not. This procedure ensured that equal numbers of trials (80) entered the averages for standards and deviants, respectively.
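This selection rule can be made explicit in a few lines (a sketch; the function name is ours):

```python
def select_trials(seq):
    """Return indices of analyzed deviants (those preceded by a standard,
    i.e. the first of a pair or an isolated deviant) and of the standards
    immediately preceding a deviant. The two lists are equally long, so
    equal trial counts enter the standard and deviant averages."""
    dev_idx = [i for i, s in enumerate(seq)
               if s == 'D' and i > 0 and seq[i - 1] == 'S']
    std_idx = [i for i, s in enumerate(seq)
               if s == 'S' and i + 1 < len(seq) and seq[i + 1] == 'D']
    return dev_idx, std_idx
```

Selecting only deviants preceded by a standard excludes the second member of each deviant pair; since every counted deviant starts a deviant run and every run is preceded by exactly one standard, the counts match by construction.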
Magnetic Mismatch Responses
After baseline correction, MEG data from experiment 1 were digitally bandpass filtered between 0.5 and 40 Hz. One dipole in each hemisphere was fitted to the grand average across subjects of the difference fields between standards and deviants in a time window comprising the mismatch peaks. Symmetrically linked, fixed dipoles were used for a direct comparison of latencies. The time course of source strength of each of these dipoles was then computed. For each subject, amplitudes and latencies were determined by means of an interactive graphical computer program (CTF Dipole Fit program). Statistical analysis for syllables relied on separate ANOVAs for mismatch dipole latencies and amplitudes, with hemisphere (right vs left) as within-subject factor. Mismatch dipole latencies and amplitudes from experiment 2 were analyzed in the same way. Parts of these data have already been reported elsewhere (Kaiser et al., 2000a). Furthermore, average dipole latencies were compared between stimulus types using an ANOVA with stimulus type (syllable vs dog vs noise) as between-subjects factor. Dipole amplitudes could not easily be compared between stimulus types because of differences in dipole localization: dipoles localized more deeply in the head would show larger amplitudes than those localized more closely to the surface.
Frequency analysis of MEG was performed on a single-trial basis within the range of 1–90 Hz over the complete epoch of 50 ms prestimulus to 550 ms poststimulus for each of the deviants and standards. Selecting a 600 ms time window resulted in records of 150 points (188 points in experiment 2) which were zero-padded to obtain 256 points. Thus the effective windows had lengths of 1.02 s and 0.82 s in experiments 1 and 2, respectively. To reduce the frequency leakage for the different frequency bins, the records were multiplied by Welch windows, as recommended by Press et al. (Press et al., 1992). Subsequent Fast Fourier Transform yielded 91 power values (experiment 2: 73 power values) in the frequency range of 1–90 Hz (frequency resolution: 0.98 Hz in experiment 1, 1.22 Hz in experiment 2). Square roots of the power values were computed to obtain more normally distributed spectral amplitude values. These values were averaged across epochs to obtain measures of total spectral activity for each of the deviants and standards. We based our analysis on single-trial data to retain the nonphase-locked, induced oscillatory responses because of the functional significance of this type of activity for higher-level cognitive processes (Pulvermüller et al., 1999a; Tallon-Baudry and Bertrand, 1999).
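The single-trial spectral analysis can be sketched for one sensor as follows. This is a schematic reimplementation, not the original analysis code; the Welch (parabolic) window follows the definition given by Press et al.

```python
import numpy as np

def spectral_amplitude(epochs, fs=250.0, n_fft=256):
    """Total spectral activity for one sensor. epochs: array of shape
    (n_trials, n_samples). Each record is Welch-windowed, zero-padded to
    n_fft and Fourier-transformed; the square root of power (i.e. the
    magnitude) is averaged across epochs, so non-phase-locked activity
    is retained. Returns the 1-90 Hz frequency bins and their mean
    amplitudes."""
    n = epochs.shape[1]
    j = np.arange(n)
    welch = 1.0 - ((j - 0.5 * (n - 1)) / (0.5 * (n + 1))) ** 2
    spec = np.fft.rfft(epochs * welch, n=n_fft, axis=1)  # pads to n_fft
    amplitude = np.abs(spec)                # sqrt of power, per trial
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    keep = (freqs >= 1.0) & (freqs <= 90.0)
    return freqs[keep], amplitude[:, keep].mean(axis=0)
```

With 150-point records at 250 Hz padded to 256 points, the bin spacing is 250/256 ≈ 0.98 Hz and the 1–90 Hz band contains 91 bins, matching the figures given above; averaging the magnitude (rather than the complex spectrum) across trials is what preserves induced, non-phase-locked responses.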
Statistical Probability Mapping.
Differences in spectral amplitude between deviants and standards were assessed with paired, two-sided t-tests for each frequency bin and MEG sensor across the whole subject sample. t-values were converted to P-values. P-values from two adjacent frequency bins had to meet the criterion of P < 0.005 to be considered significant. This criterion served as an approximate safeguard against false positives; moreover, previous studies have consistently shown that effects meeting it were also confirmed by randomization tests (Kaiser et al., 2000b).
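The mapping procedure amounts to the following sketch. For self-containment it thresholds on a precomputed critical t-value rather than converting to explicit P-values; array names and the df = 9 example are our assumptions.

```python
import numpy as np

def probability_map(dev_amp, std_amp, t_crit=3.69):
    """Paired t-statistic per sensor and frequency bin for deviant minus
    standard spectral amplitude (arrays: subjects x sensors x bins).
    A bin is flagged only where it and an adjacent bin both exceed the
    two-sided critical t (3.69 approximates P < 0.005 at df = 9, as for
    the ten subjects of experiment 1)."""
    d = dev_amp - std_amp
    n = d.shape[0]
    t = d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))
    sig = np.abs(t) > t_crit
    pair = sig[:, :-1] & sig[:, 1:]      # criterion met in adjacent bins
    flagged = np.zeros_like(sig)
    flagged[:, :-1] |= pair              # mark both members of each pair
    flagged[:, 1:] |= pair
    return t, flagged
```

Requiring the criterion in two neighboring bins is what makes isolated, single-bin excursions (the most common false positives in mass univariate testing) drop out of the map.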
In addition to the statistical probability mapping procedure described above, confirmatory statistical analyses were conducted based on randomization tests suggested by Blair and Karniski (1993), which were extended to multichannel data. First, the maximum t-value was determined for the observed data across all sensors and across the frequency bins where the frequency analysis had yielded significant effects. Then the sign of the task-related spectral amplitude difference was changed for all recording channels per subject. In experiment 1, this was done for all of the 2^10 possible permutations across subjects. Because of limitations of the analysis program, in the second experiment 10 000 permutations were randomly selected out of the 2^16 possible permutations. For each permutation the maximum t-value was identified. Finally, the significance of the observed maximum t-value was tested relative to the distribution of maximum t-values across all 2^10 tests (10 000 tests in experiment 2). In experiment 2, the robustness of the findings was further evaluated by repeating the tests of spectral amplitude differences in the frequency bands that had given significant effects after excluding the two subjects that showed the largest spectral amplitude difference and the two that showed the smallest (or most negative) difference. This was not done in experiment 1 because of the smaller sample size.
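A sign-flipping max-t randomization test of this kind can be sketched as follows (a sketch with randomly sampled flips, as in experiment 2; with ten subjects one could instead enumerate all 2^10 sign patterns):

```python
import numpy as np

def max_t_randomization(diff, n_perm=2000, seed=0):
    """Randomization test after Blair and Karniski, extended to several
    channels: flip the sign of each subject's deviant-minus-standard
    spectral amplitude difference (array: subjects x sensors), take the
    maximum |t| over sensors for every permutation, and locate the
    observed maximum in that null distribution."""
    rng = np.random.default_rng(seed)
    n = diff.shape[0]

    def max_abs_t(d):
        return np.max(np.abs(d.mean(0) / (d.std(0, ddof=1) / np.sqrt(n))))

    observed = max_abs_t(diff)
    null = np.array([max_abs_t(diff * rng.choice([-1.0, 1.0], (n, 1)))
                     for _ in range(n_perm)])
    return observed, np.mean(null >= observed)
```

Taking the maximum across sensors before building the null distribution controls the family-wise error over all channels in a single step, which is why this test serves as confirmation for the bin-wise mapping above.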
Previous studies applying this statistical probability mapping method have always yielded only single areas of spectral amplitude enhancement at the surface (Kaiser et al., 2000b,c). Such an activation pattern suggests that the generator is not a single dipole but rather a more complex source structure. For example, a spatially extended octopole would produce a strong inner maximum located at sensors over the area circumscribed by the four dipoles, but weak outer fields. Thus, if GBA increases are confined to singular areas, spatially extended, circularly arranged currents would represent a parsimonious model to describe the source structure. This would imply that the sources should be located close to the area below the sensor with the highest GBA. A detailed description of possible source structures has been provided in a previous publication (Kaiser et al., 2000b). An alternative explanation would be that our significance criterion may have been too strict to find the second maximum generated by a simple dipole source. To address this possibility, we explored the observed spectral amplitude enhancements by repeating the statistical probability mapping with a very liberal criterion of P < 0.2 for two adjacent frequency bins. If additional extrema were found, one would have to conclude that the effects were generated by sources located at some distance from the sensors showing the significant spectral amplitude differences with the strict criterion. For the present stimuli, such sources might conceivably be located in the vicinity of the auditory cortex. An absence of further extrema, however, would suggest that the effects were generated by a more complex source structure, like the one described above.
Time Course of Spectral Amplitude Differences.
To explore the time course of the observed spectral amplitude changes, the data records were again padded to obtain 256 points, multiplied with cosine windows at their beginnings and ends and filtered in the frequency range in which the statistical probability mapping had yielded significant effects. Noncausal, Gaussian curve-shaped Gabor filters (width: ±2.5 Hz) in the frequency domain were applied to the signals on a single-epoch basis for each of the deviants and standards. The filtered data were amplitude-demodulated by means of a Hilbert transformation (Clochon et al., 1996) and then averaged across epochs for each of the standards and deviants. Differences in the time course of spectral amplitudes to deviants versus standards were assessed by a statistical mapping procedure similar to the evaluation of spectral amplitude differences described above. This narrow-band analysis was carried out for all sensors using paired, two-sided t-tests that were calculated for amplitudes at every sampling point and sensor. To be considered significant, P-values from three adjacent time points [corresponding to a time window of 12 ms (10 ms in experiment 2)] had to meet the criterion of P < 0.005. In addition, we required that spectral amplitude increases had to amount to at least 0.6 fT to avoid spurious significances. To assess differences in GBA time course to changes in the three types of sounds, we statistically tested differences in spectral amplitudes in the time windows containing the spectral amplitude peaks for each sound, i.e. mean amplitudes were determined for those 20 ms windows where the largest spectral amplitude differences between deviants and standards were observed in the group averages. This method was considered more robust than determining individual peak latencies. 
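The filtering and amplitude-demodulation steps can be sketched in the frequency domain. This schematic reimplementation combines the Gaussian Gabor filter with the analytic-signal (Hilbert) construction in a single FFT pass; default parameters correspond to the barking dog analysis in experiment 2.

```python
import numpy as np

def gabor_envelope(epochs, fs=312.5, f0=63.0, half_width=2.5, n_fft=256):
    """Single-epoch amplitude envelope in the band f0 +- half_width:
    multiply the spectrum by a Gaussian centered at f0, zero the negative
    frequencies and double the positive ones (analytic signal), then take
    the magnitude of the inverse FFT (Hilbert amplitude demodulation).
    epochs: array (..., n_samples); returns envelopes of the same shape."""
    n = epochs.shape[-1]
    spec = np.fft.fft(epochs, n=n_fft, axis=-1)          # zero-padded
    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    gauss = np.exp(-0.5 * ((np.abs(freqs) - f0) / half_width) ** 2)
    analytic = np.where(freqs > 0.0, 2.0, 0.0) * gauss * spec
    return np.abs(np.fft.ifft(analytic, axis=-1))[..., :n]
```

For a pure sinusoid inside the passband the returned envelope approximates the sinusoid's amplitude away from the record edges, which is exactly the narrow-band time course compared between deviants and standards above.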
Amplitude differences were assessed with separate ANOVAs for each pair of sounds with the two-level factors type of sound (within-subject factor for comparison of the two stimuli from experiment 2, between-subject factor for comparisons across experiments) and latency window (within-subject factor).
Topography of MEG Sensor Positions.
To depict the topographic localization of significant spectral amplitude differences for the whole group, the MEG sensor positions of each subject were assigned to the coordinates of one representative subject (‘common coil system’). The sensor positions with respect to the underlying cortical areas were determined using the representative subject's volumetric magnetic resonance image. The error that is introduced by not using individual sensor locations was estimated in previous studies by using a single dipole localization of the first auditory evoked component (N1m). The comparison of individual sensor locations and ‘common coil system’ revealed differences ranging below the spatial resolution determined by the sensor spacing of 2.5 cm (Kaiser et al., 2000b,c). This justified the application of a ‘common coil system’ for the purpose of the present study where no exact source localization was attempted.
Results
Magnetic Mismatch Responses
The comparison of deviant with standard stimuli yielded magnetic mismatch responses. The symmetrically fixed dipoles fitted to the evoked mismatch fields were localized at the level of the supratemporal plane and explained >90% of the variance for syllables and >95% for barking dog sounds and distorted noises. For syllables, there were trends for the mismatch dipole on the left to peak earlier than the dipole on the right [left: 192 ms (SD = 4 ms), right: 201 ms (SD = 4 ms), F(1,9) = 4.0, P = 0.075] and for larger dipole amplitudes on the left than on the right [left: 23.7 fT (SD = 5.0 fT), right: 18.0 fT (SD = 5.2 fT), F(1,9) = 3.5, P = 0.094]. There were no such interhemispheric differences between left and right dipole latencies or amplitudes for barking dog sounds or distorted noises in experiment 2. These data have been reported elsewhere (Kaiser et al., 2000a).
Comparing average mismatch dipole latencies across hemispheres between the different sounds yielded no effect of stimulus type. Mismatch fields peaked at latencies of 196 ms (SD = 3 ms) poststimulus for syllables, at 178 ms (SD = 7 ms) for barking dog sounds and at 189 ms (SD = 5 ms) for distorted noises. As the magnetic mismatch responses did not differ between sound types, Figure 1 depicts only the mismatch fields for one stimulus type (the distorted noise) in superimposed field amplitudes for all sensors (Fig. 1a) and in an isocontour plot at 210 ms poststimulus onset (Fig. 1b).
Frequency Analysis and Statistical Probability Mapping
Figure 2a,c depicts the results of spectral amplitude analysis across the whole recording epoch for syllables (top panel), barking dog sounds (center panel) and distorted noises (bottom panel) between 30 and 90 Hz. Applying the significance criterion of P < 0.005 for two consecutive frequency bins disclosed enhanced spectral amplitudes at ~86 Hz for syllables, at 63 Hz for dog sounds and at 69 Hz for distorted noises. Mapping onto the common coil system showed that these spectral amplitude increases were all localized in sensors over left temporal and frontal areas. Figure 3 further illustrates the differences in spectral amplitude between deviants and standards in the range of 55–90 Hz for the sensors showing significant effects for each of the three types of sounds. Significant differences were restricted to narrow frequency ranges; however, it has to be kept in mind that the present analyses were based on group averages. This does not preclude the possibility that differences in individual subjects were extended across broader frequency ranges. Randomization tests were conducted across all sensors at 85–87 Hz for syllables in experiment 1, at 63 Hz for barking dog sounds and at 69 Hz for distorted noises in experiment 2. These tests confirmed the effects for syllables (P = 0.024), barking dog sounds (P = 0.020) and distorted noises (P = 0.028). Moreover, we tested the robustness of the findings by excluding the two subjects with the largest and the two subjects with the smallest spectral amplitude difference for each stimulus type in experiment 2. This showed that the spectral amplitude differences between deviants and standards were still significant in the same frequency bands and at the same sensors (barking dog: P = 0.02, distorted noise: P = 0.01).
In addition to the effects in the gamma-band range, increases in low-frequency spectral amplitudes were found for syllables (at 3 Hz) and for barking dog sounds (at 7 Hz) but not for distorted noises. The spectral amplitude enhancement for syllables was localized over the left auditory cortex and amounted to 18.4 fT (SD = 4.7 fT). In contrast, the effect for barking dog sounds was observed in a right temporal sensor (13.4 fT, SD = 2.4 fT). Whereas barking dog sounds and distorted noises differed significantly in their spectral amplitude increase at ~7 Hz [F(1,15) = 5.6, P = 0.03], there was no difference between syllables and distorted noises at ~3 Hz. For syllables, a significant reduction in spectral amplitude was found at 77 Hz. However, since we had no hypotheses for GBA reductions, the subsequent analyses were restricted to those frequency ranges where significant spectral amplitude increases were found. There were no significant spectral amplitude differences meeting the present significance criterion in any other frequency band for any of the three types of stimuli.
We explored the question of the source structure of the observed GBA effects by repeating the statistical probability mapping with a very liberal significance criterion of P < 0.2 for two adjacent frequency bins (Fig. 2b,d). This was done to test whether the fact that we only found single sensors with significant effects may have been caused by choosing a significance criterion that was too strict to find both extrema of simple dipole sources. The additional analyses in the frequency domain demonstrated that there were no second extrema. This suggested that the spectral amplitude enhancements were unlikely to be generated by simple dipole sources.
Time Course and Topographic Mapping of Spectral Amplitude Differences
Based on the results of the frequency analysis, signals were subjected to Gabor filtering with the following central frequencies: 86 Hz (syllables), 63 Hz (barking dog) and 69 Hz (distorted noise) and with filter widths of ±2.5 Hz. Subsequent complex demodulation via Hilbert transform allowed the statistical comparison of energy increases in the respective frequency bands for deviant compared with standard stimuli. Results of statistical probability mapping of amplitude differences for each of the three types of sounds are depicted in Figure 4. Deviant compared with standard syllables in experiment 1 gave rise to increased 86 ± 2.5 Hz spectral amplitude in a left prefrontal sensor peaking at ~320 ms poststimulus. Barking dog sounds in experiment 2 elicited 63 ± 2.5 Hz spectral amplitude enhancements in three left temporo-frontal sensors with peaks at ~180, 240 and 280 ms poststimulus, respectively. Distorted noises were accompanied by increased 69 ± 2.5 Hz spectral amplitude in a left prefrontal sensor peaking at ~260 ms. The topography of the effects identified in the analysis of filtered signals was investigated using the superimposition of the MEG sensor map onto a brain surface model derived from a representative subject's magnetic resonance image (Fig. 5, upper panel). The lower panel of Figure 5 shows the position of the sensors showing significant spectral amplitude increases to deviants compared with standards (red, green and pink circles) relative to the major anatomical landmarks. In addition, the time courses of spectral amplitude for each of these sensors are depicted. These sensors were located over the region of the left anterior temporal and ventrolateral prefrontal cortex for all three types of stimuli. For comparison, the left-hemisphere mismatch dipole moment time courses are depicted as blue curves and the dipole locations are displayed as blue circles in the lower panel of Figure 5 for each of the three sounds.
Although lateralized effects had not been explicitly predicted, we conducted separate ANOVAs for each of the three stimulus types to explore differences in the spectral amplitude changes between the left- and homologous right-hemispheric sensors in the respective frequency bands. These analyses demonstrated that the spectral amplitude increases were indeed more pronounced at left than right sensors for syllables [left: 1.7 fT (SD = 0.3 fT), right: –0.1 fT (SD = 0.5 fT), F(1,9) = 8.1, P = 0.02], barking dog sounds [left: 2.6 fT (SD = 0.6 fT), right: –0.7 fT (SD = 1.3 fT), F(1,15) = 4.9, P = 0.043] and distorted noises [left: 2.0 fT (SD = 0.5 fT), right: 1.0 fT (SD = 0.6 fT), F(1,15) = 7.2, P = 0.017].
Figure 4 shows that there were differences in the latencies at which the amplitude effects for each sound first reached significance. These differences in the timing of GBA responses between the three types of sounds were assessed by comparing spectral amplitudes in the respective frequency ranges at the sensors with significant increases [for barking dog sounds, the sensor showing the first significant peak was selected (red sensor in Fig. 4)] in those 20 ms latency intervals where the spectral amplitude differences between deviants and standards peaked for each sound. The following time windows were selected on the basis of the statistical probability mapping (Fig. 4): syllables: 320–340 ms, barking dog: 170–190 ms and distorted noise: 260–280 ms. ANOVA showed a clear type of sound × latency window interaction for the comparison of syllables with barking dog sounds [F(1,24) = 11.0, P = 0.003], with larger amplitudes at the earlier time interval for barking dog sounds than syllables and the opposite picture at the later time interval (see Fig. 6 for means and standard errors). Neither the comparison of syllables with distorted noises [interaction type of sound × latency window: F(1,24) = 2.8, P = 0.110] nor the comparison of barking dog sounds with noises [interaction type of sound × latency window: F(1,15) = 3.6, P = 0.078] yielded significant effects. Based on an individual interaction measure (the difference between spectral amplitude to barking dog sounds minus distorted noises during the first time window minus the same difference during the second time window), more robust analyses were computed excluding the subject with the largest and the one with the smallest (or most negative) interaction measure from experiment 2. 
These analyses showed that the type of sound × latency window interactions were significant both for barking dog versus syllable [F(1,22) = 9.3, P = 0.006] and for barking dog versus distorted noise [F(1,13) = 6.8, P = 0.021], whereas there was no such effect for distorted noise versus syllable [F(1,22) = 1.7, NS]. In summary, deviations in barking dog sounds gave rise to earlier GBA increases than deviations in syllables or distorted noises.
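The individual interaction measure and the trimmed re-analysis described above can be sketched as follows. All subject numbers and amplitude values are simulated for illustration only; they are not the recorded data.

```python
import numpy as np

# Simulated per-subject mean spectral amplitudes (fT) in the two 20 ms
# latency windows; values are illustrative, not the study's recordings.
rng = np.random.default_rng(0)
n_subjects = 16
dog_w1 = rng.normal(2.5, 1.0, n_subjects)    # barking dog, first window
dog_w2 = rng.normal(0.5, 1.0, n_subjects)    # barking dog, second window
noise_w1 = rng.normal(0.5, 1.0, n_subjects)  # distorted noise, first window
noise_w2 = rng.normal(2.0, 1.0, n_subjects)  # distorted noise, second window

# Individual interaction measure: (dog - noise) in the first window
# minus the same difference in the second window.
interaction = (dog_w1 - noise_w1) - (dog_w2 - noise_w2)

# Trimmed sample for the more robust re-analysis: drop the subject with
# the largest and the subject with the smallest (most negative) measure.
keep = np.argsort(interaction)[1:-1]
trimmed = {name: arr[keep] for name, arr in
           [("dog_w1", dog_w1), ("dog_w2", dog_w2),
            ("noise_w1", noise_w1), ("noise_w2", noise_w2)]}
```

The trimmed arrays would then enter the same ANOVA as before, now with two fewer subjects per cell.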
Using an auditory mismatch design, this study investigated human MEG responses to changes in the spectral composition of three types of complex sounds. Deviant stimuli evoked bilateral magnetic mismatch fields (Näätänen, 1992, 2001; Näätänen et al., 1997) with average peak latencies just below 200 ms poststimulus. Trends for shorter mismatch dipole latencies and larger amplitudes in the left than in the right hemisphere for language sounds were in line with previous research suggesting a left-hemispheric predominance in speech processing even at a pre-attentive level (Alho et al., 1998; Shtyrov et al., 1998, 2000; Ackermann et al., 1999; Rinne et al., 1999). There was no such difference for either natural or artificial non-language sounds. Comparing experiment 1 with a previous study in which the same standard syllables were used in a spatial mismatch paradigm (Kaiser et al., 2000b) shows that the peak latencies for the present pattern mismatch paradigm were ~80 ms longer than when deviants were characterized by a different sound-source location. This replicates the observation of longer latencies for spectral compared with spatial mismatch for both barking dog sounds and distorted noises (Kaiser et al., 2000a) and is in keeping with EEG mismatch studies (Schröger, 1995; Schröger and Wolff, 1997).
Passive listening to changes in the spectral composition of complex acoustic stimuli also elicited gamma-band spectral amplitude enhancements. The effects were observed in the high gamma-band range (at ~83, 63 and 69 Hz for syllables, barking dog sounds and distorted noises, respectively). While these frequencies are higher than those more commonly reported in EEG studies (Tallon-Baudry and Bertrand, 1999), recent electrocorticographic data have demonstrated induced gamma-band responses between 80 and 100 Hz during auditory processing in humans (Crone et al., 2001). Conceivably, MEG is better able than EEG to detect fast, low-amplitude oscillatory responses because magnetic fields are less attenuated by the skull and scalp, which act as a natural low-pass filter.
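As an illustration of how a spectral amplitude at a single gamma frequency can be obtained from a 600 ms epoch, the following sketch applies a discrete Fourier transform to a simulated sensor signal. The sampling rate and the 63 Hz component are assumptions for illustration, not the study's recording parameters.

```python
import numpy as np

# Sketch: amplitude spectrum of one simulated MEG sensor over a 600 ms epoch.
fs = 500.0                  # Hz (hypothetical sampling rate)
n = 300                     # 300 samples at 500 Hz = 600 ms epoch
t = np.arange(n) / fs
# A 600 ms window yields a frequency resolution of 1/0.6 s ≈ 1.67 Hz.
f0 = 38 * fs / n            # ≈ 63.33 Hz, chosen to fall on an exact DFT bin
signal = 2.0 * np.sin(2 * np.pi * f0 * t)  # gamma component, amplitude 2 (arb. units)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, 1 / fs)
# Scale so that a pure sinusoid recovers its amplitude.
amplitude = 2 * np.abs(spectrum) / n

peak_freq = freqs[np.argmax(amplitude)]
```

Computing such a spectrum over the whole epoch, rather than in short sliding windows, is what restricts the analysis to sustained amplitude changes.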
Both the present and our previous study (Kaiser et al., 2000b) used a novel approach to analyze multi-channel spectral data based on statistical probability mapping. While a strict Bonferroni-type correction of P-values would have made it impossible to obtain any significant effects, we chose a criterion serving to identify differences that also withstand confirmatory randomization tests. Moreover, our approach is based on frequency analysis across the whole 600 ms epoch, thus identifying only robust, prolonged spectral amplitude changes. Note that this procedure is less sensitive to more transient effects that may be detectable with a wavelet-based approach (Tallon-Baudry et al., 1996; Herrmann et al., 1999). On the other hand, a wavelet analysis would have added the time dimension to the initial analysis, thus further increasing the number of tests. Instead we chose to assess the time course of GBA by subsequent filtering in the selected frequency band. It should also be noted that an important criterion for the reliability of findings, namely their replicability with a different sample (and, in this case, different stimuli), was met across the two experiments described in the present paper.
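The general logic of such a confirmatory randomization test can be sketched as follows, assuming sign-flipping of per-subject deviant-minus-standard amplitude differences and a maximum statistic across sensors to handle multiple comparisons. The subject count, sensor count and simulated effect are illustrative only; this is not the exact procedure used in the paper.

```python
import numpy as np

# Sketch of a sensor-wise randomization test on deviant-minus-standard
# spectral amplitude differences (one value per subject and sensor).
rng = np.random.default_rng(1)
n_subjects, n_sensors = 16, 148
diff = rng.normal(0.0, 1.0, (n_subjects, n_sensors))
diff[:, 0] += 2.0                # one sensor with a genuine GBA increase

observed = diff.mean(axis=0)     # group-mean difference per sensor

# Under the null hypothesis (no deviant-standard difference), the sign of
# each subject's difference is exchangeable: flip signs at random and keep
# the maximum mean across sensors to control for multiple comparisons.
n_perm = 2000
max_null = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
    max_null[i] = (signs * diff).mean(axis=0).max()

# Corrected P-value per sensor: fraction of permutations whose maximum
# statistic meets or exceeds the observed mean difference at that sensor.
p = (max_null[None, :] >= observed[:, None]).mean(axis=1)
```

Because each permutation retains only the maximum across the sensor array, a sensor is declared significant only if its observed effect exceeds what the strongest sensor would show by chance, which is considerably stricter than an uncorrected threshold.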
The GBA enhancements were localized in sensors over left anterior temporal/ventrolateral prefrontal regions for all three types of stimuli. Since MEG signals obtained from axial gradiometers do not provide conclusive information about the topography of their cortical generators, we explored the possible source structure of the present findings by repeating the statistical probability mapping with a very liberal criterion (Fig. 2). This approach served to assess the possible existence of further maxima that may have been obscured by the strict criterion employed in the initial analysis. Our results corroborated the assumption that the enhanced GBA was confined to single areas. The only alternative explanation for the present surface pattern, namely single dipoles located in basal temporal/frontal areas whose second field maximum would not have been covered by the sensor grid, appears highly unlikely. We therefore suggest that the present effects may most parsimoniously be explained by complex, extended sources arranged octopolarly or even circularly. Such a source structure would produce a strong maximum over the area the generators circumscribe, which implies that the sources of the surface GBA patterns should be localized in the vicinity of the sensors showing the strongest effects (Kaiser et al., 2000b). In our opinion, a model of multiple, extended sources also ties in well with the notion of distributed neuronal assemblies showing synchronized activity in the gamma-band range (Pulvermüller et al., 1999a; Tallon-Baudry and Bertrand, 1999).
Based on this model, we interpret the present findings as evidence for an involvement of human anterior temporal/ventrolateral prefrontal cortex in the processing of changes in the spectral composition of complex sounds. This is in keeping with studies using anatomical tract-tracing and electrophysiological recordings in non-human primates suggesting reciprocal connections between prefrontal areas and the more anterior parts of secondary and tertiary auditory cortex (Romanski et al., 1999a,b) that appear to be specialized for the processing of auditory patterns (Rauschecker et al., 1995, 1997; Rauschecker, 1998a). EEG studies on auditory mismatch processing have also identified a right frontal generator in addition to the primary change detectors in the auditory cortex. Whereas the latter may reflect sensory memory mechanisms, the frontal source has been interpreted as indicating an automatic attention-switching process (Giard et al., 1990; Näätänen et al., 2001). While MEG does not detect this radially oriented frontal component, the left-hemispheric topography of the present GBA results makes it unlikely that they reflect processes similar to those underlying the frontal source in EEG. Our results are consistent with human brain imaging work suggesting a participation of anterior temporal and prefrontal areas both in receptive language tasks (Binder et al., 1997; Belin et al., 2000; Scott et al., 2000) and in the categorization of environmental sounds (Engelien et al., 1995).
Taken together with our previous investigation showing increased GBA during the processing of changes in perceived sound-source location over posterior parieto-temporal cortex (Kaiser et al., 2000b), the present report provides further support for the notion of two separate processing streams for auditory information: a dorsal, ‘where’ stream involving posterior temporal and parietal areas, and a ventral, ‘what’ stream recruiting anterior temporal and ventral prefrontal regions (Rauschecker, 1998a; Rauschecker and Tian, 2000).
We had no hypothesis concerning the left-hemispheric predominance in the processing of auditory patterns found here regardless of sound type. In contrast, discrimination of complex tones (series of square waves) has been found to rely predominantly on the right hemisphere (Sidtis, 1984), and pitch working memory tasks yield activation of right prefrontal areas (Zatorre et al., 1992). However, our findings are consistent with studies showing the processing of sounds with fast frequency transitions (e.g. 40 ms) to be lateralized to the left auditory (Belin et al., 1998) and left frontal cortex (Johnsrude et al., 1997). Moreover, the present paradigm involved the pre-attentive, memory-based comparison of auditory patterns (Jacobsen and Schröger, 2001), and left inferior frontal regions are known to play an important role in various memory tasks (Petrides et al., 1995; Braver et al., 1997; Dolan and Fletcher, 1997; Gabrieli et al., 1998; Opitz et al., 2000).
While the topography of the present findings was consistent with other brain imaging methods, the high time resolution of MEG enabled the detection of GBA latency differences between the different types of sounds. The earlier response to barking dog sounds than to syllables or noises cannot be explained by the specific ways in which deviants differed from standards within each stimulus category, because such differences should also have been reflected in the timing of the evoked mismatch fields. Instead it appears that changes in stimuli with little biological relevance, such as meaningless noises or synthetically created syllables, are processed serially: auditory pattern change is detected initially at the level of the supratemporal plane, giving rise to evoked mismatch fields, and subsequently stimulus deviance elicits GBA increases in more anterior regions along the hypothesized auditory ventral stream. While the possibility cannot be ruled out that the anterior temporal/ventral prefrontal networks simply responded faster to the change-detection signal for barking dog sounds, the simultaneous peaks of mismatch response and anterior temporal/ventral prefrontal GBA observed for changes in barking dog sounds might also reflect parallel processing of biologically meaningful and potentially fear-relevant information (Armony et al., 1997), possibly involving a subcortical pathway bypassing the auditory cortex.
In addition to yielding further evidence for an auditory ventral processing stream, this study also demonstrated the potential value of investigating high-frequency cortical oscillatory responses for the study of perceptual processes. This is underscored by the comparison of the present results with our previous investigation of MEG responses during auditory spatial mismatch processing (Kaiser et al., 2000b). The topography of GBA clearly varied as a function of the stimulus attribute producing the deviance (here: its spectral composition), but was relatively independent of the type of stimulus (language, animal vocalization or noise). This supports the putative role of GBA as a signature of cortical networks representing relevant stimulus characteristics (Tallon-Baudry and Bertrand, 1999). In addition, the high time resolution of MEG gamma-band responses may provide unique information about subtle processing differences between different types of stimuli, thus qualifying them as a research tool supplementing other human neuroimaging methods.
We thank B. Wasserka for help in subject recruitment and data acquisition and two anonymous reviewers for helpful comments on an earlier version of this paper. This research was supported by Deutsche Forschungsgemeinschaft (SFB 550/C1).