Oscillations are pervasive in encephalographic signals and supposedly reflect cognitive processes and sensory representations. While the relation between oscillation amplitude (power) and sensory–cognitive variables has been extensively studied, recent work reveals that the dynamic oscillation signature (phase pattern) can carry information about such processes to a greater degree than amplitude. To elucidate the neural correlates of oscillatory phase patterns, we compared the stimulus selectivity of neural firing rates and auditory-driven electroencephalogram (EEG) oscillations. We employed the same naturalistic sound stimuli in 2 experiments, one recording scalp EEGs in humans and one recording intracortical local field potentials (LFPs) and single neurons in macaque auditory cortex. Using stimulus decoding techniques, we show that stimulus selective firing patterns imprint on the phase rather than the amplitude of slow (theta band) oscillations in LFPs and EEG. In particular, we find that stimuli which can be discriminated by firing rates can also be discriminated by phase patterns but not by oscillation amplitude and that stimulus-specific phase patterns also persist in the absence of increases of oscillation power. These findings support a neural basis for stimulus selective and entrained EEG phase patterns and reveal a level of interrelation between encephalographic signals and neural firing beyond simple amplitude covariations in both signals.
Oscillatory signals are pervasive in scalp potentials recorded using the electro- or magnetoencephalogram (EEG, MEG) and supposedly reflect the implementation of cognitive processes, such as sensory representations, the routing of sensory information, or decision making (Varela et al. 2001; Ward 2003; Donner and Siegel 2011; VanRullen et al. 2011). Many studies have demonstrated how the amplitude (power) of oscillatory signals relates to cognitive processes, for example, by reporting correlations between the signal's power and attention, sensory stimulus features, or the level of performance during mental operations (Tallon-Baudry and Bertrand 1999; Hanslmayr et al. 2005; Thut et al. 2006; van Dijk et al. 2008; Hipp et al. 2011). In addition, accumulating evidence demonstrates how the power of encephalographic potentials relates to neural spiking in the cortical regions generating these potentials (Schroeder et al. 1991; Coenen 1995; Nunez 2005; Whittingstall and Logothetis 2009). Insights such as these promote the amplitude of encephalographic oscillations as an important marker for studying brain function in health or disease (Birbaumer et al. 2008; Engel and Fries 2010; Uhlhaas and Singer 2010).
Recent work, however, has started to put additional focus on the dynamic signature of EEG/MEG activity (VanRullen et al. 2011). Several studies have shown that the precise temporal structure of slow encephalographic oscillations, as characterized by temporal patterns of the signal's phase, can also be informative about sensory stimuli or details relating to the cognitive task performed (Busch et al. 2009; Mathewson et al. 2009; Busch and VanRullen 2010; Stefanics et al. 2010; Drewes and VanRullen 2011; Schyns et al. 2011). In the context of acoustic stimuli such as speech tokens, for example, it has been suggested that cortical rhythms commensurate with prominent stimulus-inherent time scales may entrain selective neural representations encoding these sounds (Howard and Poeppel 2010). In particular, behaviorally relevant sounds, such as speech or animal vocalizations, are dominated by slow temporal dynamics (Drullman et al. 1994a, 1994b; Chandrasekaran et al. 2009), and auditory cortex activity seems to specifically reflect these input dynamics (Ahissar et al. 2001). Notably, in some studies on acoustic processing, the phase of slow oscillations proved to be more informative about the presented sounds than the same signal's power (Luo and Poeppel 2007; Howard and Poeppel 2010), suggesting that the sensory input imprints more on the precise dynamics than on the overall amplitude of rhythmic brain activity. Intriguingly, a similar benefit of using phase was found in a visual categorization task (Schyns et al. 2011), suggesting a more widespread sensory cortical phenomenon. Overall, this highlights a potential link between the time scales of cortical oscillations and those of the sensory environment (Schroeder et al. 2008; Panzeri et al. 2010) and suggests that the precise timing of encephalographic oscillations possibly constitutes a powerful index of dynamic neural processing and sensory representations (VanRullen et al. 2011).
However, the neural correlates underlying the information carrying capacity of the phase of slow oscillations remain unclear (VanRullen et al. 2011). Addressing this, we asked whether and to what degree the apparent stimulus selectivity seen in the precise timing of slow oscillations correlates with the selectivity of neural firing rates in areas presumably generating these oscillations. In our experiments, we used the auditory system as model, and we employed the same set of acoustic sounds in 2 experiments, one recording auditory-driven scalp EEG responses in human subjects and one recording intracortical field potentials and single neuron responses in macaque auditory cortex. This permitted us to directly compare the stimulus selectivity of the phase of encephalographic and intracortical oscillations to the selectivity of neural firing.
Materials and Methods
The acoustic stimulus presented during the experiments consisted of a continuous 52 s long sequence composed of natural sounds. This sequence was created by concatenating (without periods of silence) 21 1–4 s long snippets of different animal vocalizations, environmental sounds produced by animals in their natural habitats, conspecific macaque vocalizations, and a brief sample of human speech. A spectral representation of this stimulus is shown in Figure 1A, together with the frequency spectrum of its slow envelope modulations. The latter reveals a 1/f pattern and shows peaks around 2, 4, and 8 Hz, typical for natural sounds and communication signals (Chandrasekaran et al. 2009). The root mean square intensity of this stimulus was set to 65 dB sound pressure level both for EEG and intracranial recordings (as calibrated using a condenser microphone; Bruel & Kjær 2238 Mediator sound level meter). The acoustic stimulus (i.e., full 52 s sound sequence) was presented 60 times for each recording site in auditory cortex and for each human subject in the EEG experiments. Individual repeats of this stimulus were separated by several seconds of silent intertrial intervals.
Human EEG Recordings and Data Preprocessing
Six subjects were paid to participate in the experiment. All reported normal hearing and gave informed consent prior to the involvement. The experiments were approved by the joint ethics committee of the University Clinic and the Max Planck Institute Tübingen. Sixty-four channel EEG signals were continuously recorded using an actiCAP (Brain Products, Germany) with Ag/AgCl electrodes placed according to the standard 10–20 system. The reference electrode was fixed at the nose tip, and the ground electrode was in position AFz. A third electrode was placed over the lower left orbit to register eye movements. Electrode impedance was kept under 10 kΩ. Signals were amplified using BrainAmp amplifiers (Brain Products), and data were acquired at a sampling rate of 500 Hz using a band-pass filter of 0.318–250 Hz. The experiments were conducted in a sound-attenuated room, and the acoustic stimulus was presented using a Sennheiser In-Ear headphone (Model PMX 80). The experiment was divided into 6 blocks consisting of 12 presentations of the 52 s acoustic stimulus described above. Each presentation was subject initiated, triggering a 2-s wait period before stimulus onset. Subjects were performing a “dummy” task requiring their attention to the stimulus: They were asked to detect a single embedded target (sound of a crowd cheering, 1.3 s long) that appeared on average in 1 of 6 stimulus repeats at a random time during the 52 s stimulus sequence. Except for this interspersed target, the stimulus in these trials was identical to the original acoustic stimulus described above. Subjects performed this task at 93 ± 5% correct, and these target trials were discarded from further analysis. Hence, only data recorded during the presentation of the unique acoustic stimulus described above, which was presented in both the human and animal experiments, was used for analysis.
EEG data were analyzed in Matlab using tools from the EEGLAB Matlab toolbox (Delorme and Makeig 2004). Individual trials were rejected as containing artifacts if the amplitude on any of the central channels (FC1-4, C1-4, CP1-4, FCz, CPz, and Cz) exceeded 7 standard deviations (SDs) of the signal for a period longer than 120 ms. The remaining trials (50 ± 11, mean ± SD) were kept and re-referenced to a global average reference. For subsequent analysis, we averaged the signals from those central electrodes to obtain one auditory-driven scalp signal (unless specified otherwise). Individual frequency bands (delta: 1–4 Hz, theta 4–8 Hz, alpha 8–14 Hz, beta 14–20 and 20–30 Hz, gamma 30–50 Hz) were extracted using third-order Butterworth filters, and the phase and power of these narrow-band signals were calculated using the Hilbert transform. Power was defined as the squared absolute value and phase as the phase angle of the Hilbert signal.
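The band extraction described above can be sketched as follows. This is a minimal Python illustration rather than the Matlab/EEGLAB pipeline actually used. The function name `band_phase_power`, the `BANDS` dictionary labels, and the choice of zero-phase (forward-backward) filtering are assumptions; the text specifies only the filter order, the band edges, and the Hilbert-based definitions of phase and power.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

# Band edges in Hz as listed in the text; the dictionary keys are illustrative labels.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 14.0),
    "beta_low": (14.0, 20.0),
    "beta_high": (20.0, 30.0),
    "gamma": (30.0, 50.0),
}


def band_phase_power(signal, fs, band, order=3):
    """Return (phase, power) of a 1-D signal within band = (low, high) in Hz."""
    # Third-order Butterworth band-pass, as stated in the text.
    sos = butter(order, list(band), btype="bandpass", fs=fs, output="sos")
    # Assumption: zero-phase filtering; the text does not say how the filter was applied.
    narrow = sosfiltfilt(sos, np.asarray(signal, dtype=float))
    # Analytic ("Hilbert") signal of the narrow-band trace.
    analytic = hilbert(narrow)
    phase = np.angle(analytic)        # phase angle in radians, within (-pi, pi]
    power = np.abs(analytic) ** 2     # squared absolute value
    return phase, power


# Usage: theta-band phase and power of a synthetic 6 Hz oscillation sampled at 500 Hz.
fs = 500.0
t = np.arange(0, 10, 1 / fs)
x = 2.0 * np.sin(2 * np.pi * 6.0 * t)
theta_phase, theta_power = band_phase_power(x, fs, BANDS["theta"])
```

Second-order sections are used here only for numerical robustness with narrow low-frequency bands; they do not change the filter that is designed.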
Intracranial Recordings and Data Preprocessing
Data were recorded from the auditory cortex of 3 alert male macaque monkeys (Macaca mulatta) as part of a previous study (Kayser et al. 2009). All procedures were approved by local authorities (Regierungspräsidium Tübingen) and were in full compliance with the guidelines of the European Community (EUVD 86/609/EEC). The animals were socially (group-) housed in an enriched environment, and prior to the experiments, form-fitting headposts and recording chambers were implanted under aseptic surgical conditions and general balanced anesthesia (Logothetis et al. 2010); antibiotics (Enrofloxacin; Baytril) and analgesics (Flunixin, Finadyne vet.) were administered for 3–5 days postoperatively. Neural activity was recorded using multiple microelectrodes (1–6 MΩ impedance), high-pass filtered (4 Hz, digital two-pole Butterworth filter), amplified (Alpha Omega system), and digitized at 20.83 kHz. Recordings were performed in a dark and anechoic booth while the animals were passively listening to the acoustic stimuli. Recording sites covered caudal auditory fields (primary field A1 and caudal belt fields caudomedial and caudolateral) as assessed based on stereotaxic coordinates, frequency maps constructed for each animal, and the responsiveness for tone versus band-passed stimuli. While it is difficult to ascertain the particular cortical layers recorded from, common biases in extracellular recordings suggest that most recordings originate from supragranular or infragranular layers. The 52 s acoustic stimulus described above was delivered from 2 calibrated free-field speakers in the left and right hemifields at 70 cm distance from the head and was repeated 60 times for each recording site in distinct trials. Further details of the recording procedures can be found in previous publications (Kayser et al. 2008, 2009).
Spike-sorted activity was extracted using commercial spike-sorting software (Plexon Offline Sorter) after high-pass filtering the raw signal at 500 Hz (third-order Butterworth filter). For the present study, only sites with unit signals with high signal-to-noise ratio (SNR > 8) and less than 2% of spikes with interspike intervals shorter than 2 ms were included. For each recording site, the data from more than one unit were typically recovered, and the unit with highest SNR was chosen for subsequent analysis. Field potentials were extracted from the broadband signal after subsampling the original recordings at 1-ms resolution. Individual frequency bands (theta 4–8 Hz, alpha 8–14 Hz, beta 14–20 and 20–30 Hz, gamma 30–50 Hz) were extracted using the same filters as for the EEG signals.
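The unit inclusion rule above can be illustrated with a short sketch. The function names are hypothetical, the SNR is taken as a precomputed input because its definition is not given here, and the fraction of short intervals is used as a stand-in for the "fraction of spikes" named in the text.

```python
import numpy as np


def isi_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of interspike intervals shorter than refractory_s (seconds)."""
    spikes = np.sort(np.asarray(spike_times_s, dtype=float))
    isis = np.diff(spikes)
    if isis.size == 0:
        return 0.0
    return float(np.mean(isis < refractory_s))


def passes_unit_criteria(spike_times_s, snr, min_snr=8.0, max_violation=0.02):
    """True if SNR > 8 and fewer than 2% of intervals are shorter than 2 ms."""
    return bool(snr > min_snr and isi_violation_fraction(spike_times_s) < max_violation)
```

For example, a perfectly regular 100 Hz spike train has no intervals below 2 ms and would pass whenever its SNR exceeds 8.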
Measure of Phase Coherence
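The trial-by-trial coherence of the oscillatory phase (phase coherence; Fig. 1D) was calculated for each time point t using the magnitude of the complex-valued trial-averaged phase:
$$\mathrm{PC}(t) = \left| \frac{1}{N} \sum_{n=1}^{N} e^{i \varphi_n(t)} \right|,$$
where $\varphi_n(t)$ denotes the phase at time t on trial n and N is the number of trials.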
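As a minimal sketch of this measure, assume the phases are stored as an array of shape (trials, time points) in radians. Each phase is mapped to a unit-length complex number, these are averaged across trials, and the magnitude of the average is taken. The function name is illustrative.

```python
import numpy as np


def phase_coherence(phases):
    """Cross-trial phase coherence for phases of shape (n_trials, n_times).

    Returns one value per time point, between 0 (phases cancel across
    trials) and 1 (identical phase on every trial).
    """
    phases = np.asarray(phases, dtype=float)
    unit_vectors = np.exp(1j * phases)            # one unit phasor per trial and time point
    return np.abs(unit_vectors.mean(axis=0))      # magnitude of the trial average
```

Because only the consistency of the phase across trials enters the result, adding a constant offset to all trials leaves the value unchanged.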
Decoding analysis was used to quantify how well different epochs sampled from the full 52 s acoustic stimulus could be discriminated using EEG or intracranial signals. Specifically, we applied the decoding analysis to sets of “sound epochs” that were randomly sampled from the full acoustic stimulus sequence. Each set consisted of 10 nonoverlapping epochs chosen at random locations within the 52 s sound but excluding the first 1.5 s of this to avoid transient responses following sound onset. The duration of these epochs was varied between 120 and 360 ms to ensure the validity of our results across different time windows. The EEG or intracranial signals during these sound epochs were sampled using 12-ms bins. For each set of sound epochs, we applied the decoding analysis described below. The overall decoding performance and correlation measures between different signals (see below) were obtained by averaging over 300 sets of randomly chosen sound epochs to ensure the generality of the results by thorough sampling from the rich dynamics of the full stimulus sequence. Importantly, the same stimulus sets (epoch positions within the acoustic stimulus) were used for the analysis of EEG and intracortical data permitting a direct comparison between the different signals within a given stimulus context. Note that the epochs used for decoding do not bear a particular relation to the individual sounds snippets that were concatenated to create the 52 s acoustic stimulus sequence presented during the experiments.
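The random selection of epoch sets might look like the sketch below. The text states only that the 10 epochs in a set do not overlap and avoid the first 1.5 s; the rejection-sampling scheme, the function name, and the use of continuous onset times are assumptions.

```python
import numpy as np


def sample_epoch_set(rng, n_epochs=10, duration=0.24, t_min=1.5, t_max=52.0):
    """Draw sorted onset times (s) of n_epochs nonoverlapping epochs in [t_min, t_max]."""
    onsets = []
    while len(onsets) < n_epochs:
        candidate = rng.uniform(t_min, t_max - duration)
        # Accept the candidate only if it overlaps none of the epochs drawn so far.
        if all(abs(candidate - onset) >= duration for onset in onsets):
            onsets.append(candidate)
    return np.sort(np.array(onsets))


# Usage: 300 sets of ten 240-ms epochs, reusable across EEG and intracranial data.
rng = np.random.default_rng(0)
epoch_sets = [sample_epoch_set(rng) for _ in range(300)]
```

Storing the onsets once and applying them to every signal is what allows the same stimulus sets to be compared across recordings.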
Decoding was based on a linear discriminant decoder in conjunction with a leave-one-out cross-validation procedure as used by previous studies on the coding properties of auditory cortex (Russ et al. 2008; Kayser et al. 2010). Decoding was applied to the sets of sound epochs sampled from the acoustic stimulation sequence and for each set was based on the individual repeats (trials) for each epoch (example data in Fig. 1E). Specifically, for each trial of each epoch, we repeated the following: 1) The average responses to all other 9 sound epochs were computed across all repeats of the respective epochs. 2) For the current epoch, the mean response was computed by averaging across all trials but excluding the “test” trial. This generated “template” responses for each of the 10 sound epochs. 3) A measure of distance (see below) was computed between the response on the test trial and all templates. The test trial was decoded as that epoch yielding the minimal distance between its template and the test response. By repeating this procedure for each epoch and trial, we obtained the decoding matrix, which contains the percentage of correctly and wrongly (confused) decoded trials for each epoch (e.g., Fig. 1F).
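Steps 1 to 3 can be sketched as a leave-one-out nearest-template classifier. This is an illustration under stated assumptions, not the authors' implementation: responses are assumed to be stored as an array of shape (epochs, trials, time bins), templates are plain arithmetic means (suitable for power or firing rates), and the default distance is the summed squared difference.

```python
import numpy as np


def squared_euclidean(a, b):
    """Squared difference between two time series, summed over time bins."""
    return float(np.sum((a - b) ** 2))


def decode_epochs(responses, distance=squared_euclidean):
    """Leave-one-out template decoding.

    responses: array (n_epochs, n_trials, n_bins), with n_trials >= 2.
    Returns the decoding matrix in percent: rows are presented epochs,
    columns are decoded epochs.
    """
    responses = np.asarray(responses, dtype=float)
    n_epochs, n_trials, _ = responses.shape
    full_templates = responses.mean(axis=1)          # step 1: averages over all trials
    counts = np.zeros((n_epochs, n_epochs))
    for epoch in range(n_epochs):
        for trial in range(n_trials):
            templates = full_templates.copy()
            # Step 2: the template of the current epoch excludes the test trial.
            keep = np.arange(n_trials) != trial
            templates[epoch] = responses[epoch, keep].mean(axis=0)
            # Step 3: assign the test trial to the closest template.
            test = responses[epoch, trial]
            distances = [distance(test, template) for template in templates]
            counts[epoch, int(np.argmin(distances))] += 1
    return 100.0 * counts / n_trials
```

The diagonal of the returned matrix gives the percentage of correctly decoded trials per epoch, and the off-diagonal entries give the confusions.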
The distance between the single trial and the template in the decoding process was calculated differently when using phase, power, or firing rates. For the last 2, the distance was calculated as the Euclidean distance between both time series: the squared difference between both time series was summed over all time points. For phase, which is a cyclic variable, a different distance measure was used: the magnitude of the circular difference between both time series was summed over all time points, whereby the circular difference is defined as min(|a − b|, 2π − |a − b|).
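The circular distance for phase can be written compactly as below. The function name is illustrative, and the modulo step is an added safeguard (an assumption beyond the text) so that the formula also behaves sensibly if the two phase series are not confined to the same 2π-wide interval.

```python
import numpy as np


def circular_distance(phase_a, phase_b):
    """Sum over time of min(|a - b|, 2*pi - |a - b|) for two phase time series."""
    a = np.asarray(phase_a, dtype=float)
    b = np.asarray(phase_b, dtype=float)
    # Reduce |a - b| to [0, 2*pi) so the min() below is valid for any input range.
    diff = np.abs(a - b) % (2 * np.pi)
    return float(np.sum(np.minimum(diff, 2 * np.pi - diff)))
```

Two phases of 3.0 and −3.0 rad, for instance, are about 0.28 rad apart on the circle even though their plain difference is 6 rad, which is why the Euclidean distance would be misleading here.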
Comparisons between Signals
To relate the stimulus selectivity between local field potential (LFP) phase/power to firing rates, we compared the similarity of the decoding performance at each recording site. Practically, this was done by taking into account only the performance of correctly decoded sound epochs (represented on the diagonal of the decoding matrix) or by also including decoding errors (i.e., the full decoding matrix). We computed the correlation between (either the diagonal or full) decoding matrices derived from both signals and averaged over all 300 sets of sound epochs. Figure 4B,C shows the distribution of these correlation values across recording sites. We also performed a regression analysis for each site to quantify which signal (phase or power) contributes more to predicting the decoding performance of firing rates. The regression model was calculated across all 300 sets of sound epochs, and normalized beta values and F-ratio statistics were used to assess the contribution of phase and power (Sokal and Rohlf 1995). The F-ratio was used to obtain the fraction of variance explained when each signal (phase or power) is omitted from the full model, which provides a statistical assessment of the contribution of each signal. To compare the stimulus selectivity between EEG phase or power and firing rates that were not recorded simultaneously, we exploited the fact that for both signals, the decoding performance was evaluated using the same sets of sound epochs. For each signal, we averaged the decoding performance across all recording sites, neurons, or EEG subjects for each of the 300 sets of sound epochs. We then calculated the correlation of this averaged decoding performance between EEG phase or power and firing rates across the 300 stimulus sets.
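The comparison of decoding matrices could be sketched as follows. Pearson correlation, per-set computation followed by averaging, and the function names are assumptions; the text specifies only that either the diagonal or the full matrix enters the correlation.

```python
import numpy as np


def decoding_similarity(matrix_a, matrix_b, diagonal_only=True):
    """Pearson correlation between two decoding matrices of equal shape.

    diagonal_only=True compares only the correctly decoded percentages;
    diagonal_only=False also includes the confusions (full matrix).
    """
    a = np.asarray(matrix_a, dtype=float)
    b = np.asarray(matrix_b, dtype=float)
    if diagonal_only:
        x, y = np.diag(a), np.diag(b)
    else:
        x, y = a.ravel(), b.ravel()
    return float(np.corrcoef(x, y)[0, 1])


def mean_similarity(matrices_a, matrices_b, diagonal_only=True):
    """Average the per-set similarity over all sets of sound epochs."""
    values = [decoding_similarity(a, b, diagonal_only)
              for a, b in zip(matrices_a, matrices_b)]
    return float(np.mean(values))
```

Note that the correlation is undefined (NaN) when one of the inputs is constant, for example when every epoch in a set is decoded equally well.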
Comparisons between LFP and EEG Phase Time Courses
Comparing the similarity of stimulus-evoked phase patterns across nonsimultaneously recorded signals is difficult as the absolute phase angle of field potential or EEG signals depends on the respective reference signal (Nunez 2005). To still be able to compare the similarity of the time course of the entrained phase pattern across LFP and EEG, we based this comparison not on the actual phase but on a measure of trial-by-trial phase similarity: phase coherence. The values of the phase coherence are independent of the underlying phase and reflect only its consistency across trials. The phase coherence time courses, with unitless values between 0 and 1, can hence be compared across different signals. Practically, we computed the correlation of the phase coherence time courses between LFP and EEG bands across the full stimulus sequence.
The significance of decoding performance was quantified using a randomization test: We calculated the decoding performance by repeating the decoding process 1000 times using data in which the assignment between individual trials and sound epochs was shuffled. From the resulting distribution, which corresponds to the null hypothesis of decoding performance at chance level, we obtained the 99% confidence interval. One common significance level was computed across frequency bands. A similar shuffling approach was used to obtain a significance estimate for the phase coherence in Figure 1D.
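A sketch of such a randomization test is given below. It assumes responses of shape (epochs, trials, time bins) and a user-supplied `performance_fn` that maps such an array to a single performance value (for example, the mean percentage of correctly decoded trials). Both names are hypothetical, and the use of the upper 99th percentile as the significance threshold is one reading of the "99% confidence interval" in the text.

```python
import numpy as np


def shuffled_null(responses, performance_fn, n_shuffles=1000, percentile=99.0, seed=0):
    """Null distribution of decoding performance under shuffled epoch labels.

    Returns (null_values, threshold), where threshold is the requested
    percentile of the null distribution.
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses, dtype=float)
    n_epochs, n_trials, n_bins = responses.shape
    # Pool all single trials, ignoring which epoch they came from.
    pooled = responses.reshape(n_epochs * n_trials, n_bins)
    null_values = np.empty(n_shuffles)
    for i in range(n_shuffles):
        # Randomly reassign trials to epochs, keeping the number of trials per epoch.
        order = rng.permutation(pooled.shape[0])
        shuffled = pooled[order].reshape(n_epochs, n_trials, n_bins)
        null_values[i] = performance_fn(shuffled)
    return null_values, float(np.percentile(null_values, percentile))
```

Observed performance above the returned threshold would then be reported as significant at P < 0.01 under this one-sided reading.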
We recorded auditory cortex responses to a continuous 52 s acoustic stimulus sequence consisting of naturalistic sounds like environmental noises and animal calls. This sound sequence provided a rich and dynamic stimulus whose slow envelope modulations were dominated by low frequencies around 4 and 8 Hz (Fig. 1A) and which evoked a robust and dynamic signature in low-frequency scalp potentials and LFPs and in neural firing rates, as illustrated below. To compare the degree of sound selectivity in oscillatory activity and neural firing rates, we used a framework of stimulus decoding. Such decoding analysis quantifies how well a given set of sensory inputs can be discriminated given the observed single-trial responses and provides a measure of signal selectivity with respect to the sensory input. We applied the decoding analysis to short sound epochs randomly sampled post hoc (for analysis only) from the 52 s acoustic stimulus presented during the actual experiments (Fig. 1B). Specifically, we randomly created 300 sets consisting of 10 sound epochs to evenly sample the rich structure of the long acoustic stimulus presented during the experiments. In the following, we first report on the decoding results from EEG and LFP data separately, and we subsequently show how the selectivity of the phase of slow oscillations relates to the selectivity of neural firing rates.
Stimulus Decoding Using Power and Phase of EEG Oscillations
We recorded EEG activity from 6 volunteers who performed an acoustic target detection task while listening to the acoustic stimulus sequence. Previous studies have shown that, in response to prolonged acoustic stimuli, activity in auditory cortex entrains (i.e., time locks) to the slow components of the acoustic input. This entrainment results in a pattern of oscillatory phase that is consistent across trials (cf. Fig. 1E) and which can be measured using cross-trial phase coherence (Luo and Poeppel 2007; Kayser et al. 2009; Howard and Poeppel 2010). For the present data, phase coherence was highest over central electrodes and strongest in the theta (4–8 Hz) frequency band (median 0.2, randomization test P < 0.01; Fig. 1D). This central localization of sound-entrained oscillations is in good concordance with the known projections of auditory dipoles in human scalp EEG (Nunez 2005; Burkard et al. 2006; Stefanics et al. 2010). Following previous studies, we focused on the signal from these central locations for subsequent analysis, ensuring that the analyzed EEG signals reflect scalp potentials generated by auditory cortex neurons analyzed in the subsequent step.
The stimulus selectivity of slow EEG oscillations was analyzed using decoding analysis, separately for the phase and power in different bands of the EEG signal and separately for a range of time windows. Figure 1E illustrates the power and phase on all trials for one sound epoch sampled from the long acoustic stimulus (240-ms duration). Clearly, the phase is more consistent across trials than power, reflecting the entrainment of the oscillatory dynamics by the acoustic stimulus (visible as consistent color coding across trials). The decoding algorithm was applied to the collection of single-trial responses for each of the 10 sound epochs within each set, was based on a linear classifier and resulted in a decoding matrix that provides a measure of correctly and wrongly decoded (i.e., confused) sound epochs (Fig. 1F).
Decoding performance using EEG signals was strongest in the 4–8 Hz band, while lower (1–4 Hz) and higher (>8 Hz) bands yielded poorer performance (Fig. 2A). Importantly, decoding performance was considerably higher when using phase (median value 4–8 Hz: 15.3% correct) rather than power of individual EEG bands (4–8 Hz: 12.2%; sign-rank test P < 0.05). A randomization test demonstrated the significance for phase (P < 0.01 for 1–4 and 4–8 Hz), while decoding from power did not reach significance in any band (Fig. 2A). Similar results were found when decoding sound epochs of different duration, ranging from 120 to 360 ms (Fig. 2B). Repeating the decoding analysis separately on individual electrodes confirmed that regions of highest selectivity were concentrated over central locations (Fig. 2C), hence on electrodes with strongest cross-trial phase coherence. Decoding performance was comparable when using only electrode locations on the left (median 14.4%), right (14.5%), or both hemispheres (15.3%). This extends previous results derived using speech (Luo and Poeppel 2007; Howard and Poeppel 2010) to more general and complex naturalistic sounds and highlights the significant stimulus selectivity in the precise timing of slow cortical oscillations.
Acoustically entrained slow rhythmic activity is characterized by a dynamic phase pattern and time-varying levels of signal power. In particular, periods during which the signal's power is increased compared with a sound-devoid baseline alternate with periods during which the power differs only little from baseline (cf. Fig. 1C). While increased power is characteristic of traditional evoked components in oscillatory signals such as EEG, phase resetting in the absence of additional increases in power has been considered as one possible generating mechanism for event-related potentials (Makeig et al. 2002; Sauseng et al. 2007). Pure phase resetting has been previously described in auditory cortex both for auditory and crossmodal inputs (Lakatos et al. 2007, 2009), raising the possibility that stimulus-informative oscillatory patterns could occur in the absence of increases in signal power. We performed an additional analysis to directly test whether this was indeed the case. Specifically, we asked whether sound epochs that were well decoded using their low-frequency phase pattern necessarily required an increase of power compared with baseline (a prestimulus period). Figure 2D displays the decoding performance of theta (4–8 Hz) phase versus the normalized power of the same signal for individual epochs across subjects. While power was generally above baseline (red circles denote the mean for each 10% performance range), many epochs were well decoded using their phase pattern despite the power being close to baseline. The overall correlation between power increase and decoding performance was weak (r² = 0.021). This shows that theta-band oscillations can be stimulus entrained and provide a stimulus-specific phase pattern even in the absence of changes in oscillatory power, highlighting the prominence of phase reset as one mechanism by which natural sounds imprint on auditory cortex activity.
Stimulus Decoding Using Intracranial LFPs and Neural Firing Rates
To link the stimulus selectivity of EEG activity to more direct measures of neural activity, we analyzed LFPs and spiking activity recorded using microelectrodes at 36 sites in the caudal auditory cortex of awake macaque monkeys listening to the same acoustic stimulus sequence (see Kayser et al. 2009). Similar to the EEG, LFPs revealed the imprinting of the acoustic stimulus on the temporal profile of slow oscillations (Fig. 3A): low-frequency LFPs exhibited high values of cross-trial phase coherence (median 4–8 Hz: 0.39), revealing a similar frequency dependence of oscillatory phase coherence in EEG and LFPs. In addition, the time courses of stimulus entrainment in theta oscillations were similar between the scalp EEG and the intracranial LFPs. Because the relative phase and amplitude between frequency bands extracted from LFPs and EEG may differ due to tissue filtering and reference selection (Nunez 2005), we quantified the similarity of these signals using the time course of the phase coherence: The subject- (EEG) and site- (LFP) averaged time courses of phase coherence were significantly correlated between EEG and LFP (r = 0.34, P < 10⁻¹⁰) and were of comparable magnitude as correlations between the time courses derived from individual human subjects (0.45 ± 0.015, mean ± standard error of the mean [SEM]) or individual intracranial recording sites (0.28 ± 0.01, mean ± SEM). The acoustic stimulus hence imprinted temporally similar dynamic signatures in slow oscillatory patterns in intracortical and scalp field potentials. Still, correlated patterns of phase coherence do not imply similarity in phase-related stimulus selectivity, which requires further analysis (see below).
Sound epoch decoding from LFPs showed a similar dependency on frequency band and epoch duration as decoding the same epochs from scalp EEG (Fig. 3B,C). Decoding performance using LFPs peaked in low-frequency bands (4–8 Hz) and was significant for both power and phase (randomization test, P < 0.01). As in the case of EEG, decoding performance was significantly higher when using phase rather than power (e.g., 4–8 Hz band: median 27.8% vs. 16.6%, sign-rank test P < 10⁻⁶; Fig. 3B). The firing rates of 36 neurons recorded at the same locations as the LFPs were also stimulus selective (Fig. 3A). The time-binned firing rate of individual neurons permitted a level of sound epoch discrimination comparable to that provided by the low-frequency phase (median 22.4% for 240-ms windows; Fig. 3B,C). Overall, this demonstrates common patterns of stimulus-selective activity in scalp and LFP oscillations with a superiority of phase over power in providing stimulus-specific activity patterns.
Predicting Firing Rate–Based from EEG/LFP-Based Decoding
To relate the selectivity and decoding performance derived from slow oscillatory signals to that of neural firing rates, we performed 2 analyses. The first was based on the similarity of the degree to which individual sound epochs were correctly decoded using firing rates or the theta-band (4–8 Hz) LFP: we computed the correlation of the percentage of trials at which each of the 10 epochs was correctly decoded (the diagonal of the decoding matrix) across all 300 sets of sound epochs. This effectively implements a comparison of “response preference” by quantifying whether sound epochs that can be well separated from others using the LFP can also be separated using firing rates and vice versa. We found the decoding pattern to be more similar between phase and firing rates (240-ms time window: median correlation 0.19; Fig. 4A) than between power and firing rates (median 0.11). Across recording sites, the similarity with the phase was systematically (e.g., 29 of 36 neurons, 80%; Fig. 4B) and significantly higher (sign-rank tests for each time window, P < 10⁻³), and similar results were found for all time windows (Fig. 4A). The second comparison was based on the similarity of the full decoding matrix, hence also taking into account the degree to which both signals “confuse” different epochs during decoding. Again, correlations between LFP phase and firing rates were stronger (median 0.25) than between LFP power and firing rates (median 0.13), a significant effect across sites (120-ms window P < 10⁻³, 240 and 360 ms P < 10⁻⁵; Fig. 4C).
We also performed a multiple regression analysis testing how much the decoding pattern in firing rates can be predicted using those derived from the LFP. The standardized beta values, which quantify the contribution of power and phase to predicting the decoding performance of firing rates, were significantly higher for phase (240-ms window: median value 0.23) than for power (median 0.1; sign-rank tests at least P < 10⁻⁴; Fig. 4D). The conclusion that the phase of slow oscillations is a better predictor for the selectivity of firing rates than the power is also supported by quantifying the contribution of phase and power to the joint performance in predicting decoding patterns in firing rates: The F-ratio for the contribution of phase (median F = 1035) was significantly higher than that for the contribution of power (median F = 336; sign-rank test P < 10⁻⁵). This demonstrates that the degree to which different sound epochs can be decoded from neural firing rates is better predicted by the phase pattern of theta-band oscillations than by their power.
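The regression analysis can be illustrated with the following sketch, which is one standard way to obtain standardized betas and a per-predictor F-ratio and is not necessarily the exact computation used here. It assumes three equally long vectors of decoding performance (phase, power, firing rate) for one site; the function name and the partial F-test formulation (full model versus the model with one predictor omitted) are assumptions.

```python
import numpy as np


def standardized_betas_and_f(phase_perf, power_perf, rate_perf):
    """Regress firing-rate decoding performance on phase and power performance.

    Returns (betas, f_ratios), two dicts keyed by "phase" and "power".
    """
    def zscore(v):
        v = np.asarray(v, dtype=float)
        return (v - v.mean()) / v.std(ddof=1)

    y = zscore(rate_perf)
    X = np.column_stack([zscore(phase_perf), zscore(power_perf)])
    # After z-scoring, the intercept is zero, so it is not fitted explicitly.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)
    df_resid = y.size - 3                      # two slopes plus the implicit intercept

    f_ratios = {}
    for name, kept_column in (("phase", 1), ("power", 0)):
        # Reduced model: omit the named predictor and keep only the other one.
        X_reduced = X[:, [kept_column]]
        beta_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)
        rss_reduced = np.sum((y - X_reduced @ beta_reduced) ** 2)
        f_ratios[name] = float((rss_reduced - rss_full) / (rss_full / df_resid))

    betas = {"phase": float(beta[0]), "power": float(beta[1])}
    return betas, f_ratios
```

A larger F-ratio for a predictor means that omitting it degrades the fit more, which is the sense in which phase is said to contribute more than power.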
Importantly, the pattern of sound discrimination afforded by LFPs was similar to that by encephalographic potentials obtained using EEG. By exploiting a comparison across the sets of sound epochs used for decoding, we were able to directly compare the discrimination performance obtained from intracranial signals and scalp EEG, despite them not being recorded at the same time. We computed the correlation of the overall decoding performance across all 300 epoch sets using the average percentage of correctly identified epochs for each set as discrimination measure. This confirmed the above reported higher correlation between firing rates and LFP phase (r = 0.62, randomization test P < 0.01) than between firing rates and LFP power (r = 0.52, P < 0.01). In addition, this revealed a close correspondence between decoding performance in LFP phase and EEG phase (r = 0.48, P < 0.01) and a weaker correlation between LFP power and EEG power (r = 0.22, P < 0.01). And most importantly, this revealed that discrimination performance was significantly correlated between firing rates and EEG phase (r = 0.25, P < 0.01) but not between firing rates and EEG power (r = 0.08, P > 0.05; Fig. 4E). Stimulus-selective patterns in the phase of slow encephalographic potentials that permit the discrimination of different sound epochs hence co-occur with similarly selective patterns in the underlying neural firing in auditory cortex.
Recent work suggests that the dynamics of encephalographic oscillations reflect stimulus-specific activity patterns and that slow oscillations carry more sensory information in their precise timing (phase) than in their amplitude (power) (Luo and Poeppel 2007; Howard and Poeppel 2010; Luo et al. 2010; Schyns et al. 2011). The phase of slow oscillations can also serve as a proxy for the excitability or attentional state of cortical networks and can predict whether weak sensory stimuli will be perceptually detected (Busch et al. 2009; Mathewson et al. 2009; Stefanics et al. 2010; Drewes and VanRullen 2011). These observations pinpoint the phase of slow oscillations as a powerful indicator for studying sensory processing and cognition (VanRullen et al. 2011). However, it remains unclear whether the stimulus selectivity reported for the phase of EEG signals directly reflects the selectivity of neurons in the underlying sensory areas or whether this observed selectivity rather results from the indirect and aggregate nature of encephalographic signals. Our results speak in favor of the first alternative by demonstrating a direct correlation between the stimulus selectivity in neural firing and EEG phase patterns in the context of naturalistic sounds.
Dynamic Auditory Signatures in the EEG
Dynamic complex stimuli, such as natural sounds or movies, can entrain neural activity at early stages of the respective sensory cortices (Luo and Poeppel 2007; Lakatos et al. 2008; Montemurro et al. 2008; Schroeder and Lakatos 2009). This entrainment is reflected in the dynamics of neural firing and slow field potentials, where it is visible as a consistent alignment of the neural signal relative to the sensory stimulus. The entrainment of large-scale population activity can also be registered from outside the scalp where it results in stimulus-locked oscillatory phase patterns. Our results show that the encephalographic phase patterns reflect stimulus-selective activation patterns evoked by the neural responses of those cortical areas generating the oscillations. Specifically, we found that the time course of oscillatory phase coherence was correlated between scalp EEG and intracranial LFP oscillations, demonstrating similar temporal drive in both signals. In addition, we found that the selectivity of neural firing rates within a specific stimulus context correlates with the selectivity of EEG phase but not EEG power; in other words, stimuli that were well discriminable using firing rates were likely also discriminable using phase patterns and vice versa.
Given the spatial separation between EEG scalp electrodes and auditory cortical neurons, this match of selectivity could not be expected a priori. In particular, auditory-evoked EEG signals were recorded over central scalp locations, which are known to be most sensitive to the electric fields generated by presumed dipole sources located in auditory cortex (Burkard et al. 2006; Stefanics et al. 2010). The auditory cortex itself, however, is buried within a sulcus and separated from the central scalp locations by cortical folds and by a distance of several centimeters, reducing the passive electrical coupling to central scalp locations (Kajikawa and Schroeder 2011). While this spatial separation between auditory cortex neurons and central scalp locations makes the common phase entrainment in EEG and LFP oscillations nontrivial, it may also result in a contribution of additional brain structures outside auditory cortex to the EEG. In general, activity not originating from auditory cortex may nevertheless carry acoustic information, for example, by virtue of cross talk between sensory streams (Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007). Signals not directly related to the experimental sound stimulus may contribute as “noise” to the decoding performance and could possibly reduce the apparent selectivity of the EEG signal. Indeed, we observed lower decoding performance in the same frequency bands derived from EEG than from LFP. While this may well arise from additional components in the EEG that are not stimulus related, there are several reasons why such additional influences are likely to be small. By design of the experiment, the human subjects were only presented with an acoustic stimulus, hence reducing stimulation of other modalities. In addition, subjects were paying attention to this stimulus, as required by the task and shown by their good performance. Attention is known to improve the entrainment of early sensory areas to sensory stimuli and hence strengthens the auditory-driven component of the studied EEG signals (Lakatos et al. 2008). Together with the comparable profile of stimulus selectivity in low-frequency oscillations reported in previous MEG studies (Luo and Poeppel 2007; Howard and Poeppel 2010), this suggests that the EEG signals analyzed here indeed mostly originate from auditory cortex.
Acoustic Entrainment and Its Putative Function in Auditory Processing
Slow temporal envelope modulations are characteristic of naturalistic sounds, such as speech or animal vocalizations (Chandrasekaran et al. 2009). They are important for intelligibility, and in the case of speech, they carry information about syllabic structure (Rosen 1992; Drullman et al. 1994a, 1994b). Neural activity in auditory cortex phase locks to these slow modulations not only at the level of field potentials but also at the level of individual neurons, which exhibit responses that are time locked to the slow stimulus dynamics or are tuned to specific temporal modulation frequencies (Lu et al. 2001; Wang et al. 2008). It remains unclear whether this phase locking of single neurons is the direct generator of stimulus-entrained low-frequency oscillations or whether more complicated circuits involving intracortical or thalamocortical loops are involved. However, a recent study has shown that the firing of auditory cortex neurons is time locked to the phase of low-frequency rhythms, especially during periods when neurons are strongly driven by the sensory input (Kayser et al. 2009). This temporal spike-phase relation creates an intriguing interplay between oscillations and neural firing (Panzeri et al. 2010), which may well be the reason for the reported correlations in stimulus selectivity between theta oscillations and neural firing rates.
It is worth noting that the entrainment of low-frequency rhythms is not restricted to naturalistic stimuli (Lu et al. 2001). It has been known for a while that rapid transitions or changes of amplitude or dominant frequency in acoustic stimuli evoke low-frequency complexes in EEG or MEG responses (Pantev et al. 1986; Hari et al. 1987) and a recent study demonstrated that MEG theta-band phase locking also persists for unintelligible complex sounds. Specifically, this study showed that entrained theta-band rhythms can be described by a periodic signal that is directly driven by low-frequency envelope changes of the acoustic stimulus (Howard and Poeppel 2010). This promotes a view by which entrainment is not specifically evoked by communication or behaviorally relevant sounds but rather reflects the dynamic imprinting of those slow sound envelope modulations that are crucial to distinguish and recognize complex natural sounds (Luo and Poeppel 2007; Schroeder et al. 2008). Entrainment may hence be necessary but not sufficient for intelligibility or comprehension.
Our results show that stimulus-entrained and stimulus-selective phase patterns can occur in the absence of increases in oscillatory power in the same frequency band. The control over oscillatory phase in the absence of changes in the same signal's power is known as phase resetting and may be easily missed when studying only short sound stimuli. The initial transient from silent baseline to acoustic stimulation is often accompanied by evoked responses including an increase in oscillatory power, and the use of longer and continuous stimulation sequences, such as in the present study, is advantageous for detecting periods of pure phase resetting. Together with the above, our findings reinforce the notion that phase resetting plays a central role in auditory encoding, possibly as a means for stimulus selection or segregation (Schroeder and Lakatos 2009).
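The distinction between phase resetting and a power increase can be made concrete with a small simulation. The sketch below is not the analysis pipeline of the study: it assumes a simulated 5-Hz oscillation of constant amplitude whose phase is random across trials before time zero and aligned afterwards, and it estimates phase and power with a single-frequency Fourier projection over windows spanning an integer number of cycles. All names and parameter values are hypothetical.

```python
import numpy as np

fs, f = 200.0, 5.0                    # sampling rate and theta frequency (Hz)
k = np.arange(-200, 200)              # sample index; time zero at k = 0
t = k / fs
n_trials = 200
rng = np.random.default_rng(2)

# Constant-amplitude oscillation: random phase before t = 0, common phase after.
phase0 = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
x = np.where(t < 0, np.cos(2 * np.pi * f * t + phase0), np.cos(2 * np.pi * f * t))
x = x + rng.normal(scale=0.5, size=x.shape)   # additive measurement noise

def itc_and_power(x, t, mask, f):
    """Inter-trial phase coherence and mean power at frequency f in a window."""
    c = (x[:, mask] * np.exp(-2j * np.pi * f * t[mask])).sum(axis=1)
    itc = np.abs(np.mean(c / np.abs(c)))      # 0 = random phase, 1 = identical phase
    power = np.mean(np.abs(c) ** 2)
    return itc, power

pre = (k >= -160) & (k < -40)         # -0.8 to -0.2 s, three full cycles
post = (k >= 40) & (k < 160)          # 0.2 to 0.8 s, three full cycles
itc_pre, pow_pre = itc_and_power(x, t, pre, f)
itc_post, pow_post = itc_and_power(x, t, post, f)
print(f"ITC pre {itc_pre:.2f}, post {itc_post:.2f}; power ratio {pow_post / pow_pre:.2f}")
```

In this simulation, phase coherence across trials rises sharply after the reset while power stays essentially unchanged, which is the signature that an amplitude-only analysis would miss.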
The segmentation of auditory scenes into distinct objects and the formation of representations that are invariant to irrelevant or distracting sounds are 2 of the central functions attributed to (primary) auditory cortex (Nelken 2008; Sharpee et al. 2011). This, however, can also create difficulties when studying auditory cortex in the context of natural sounds, as selectivity to some and invariance to other features can result in comparable responses to distinct stimuli. To deal with this problem and to avoid making specific assumptions about the feature selectivity of auditory neurons or field potentials, we followed previous studies and exploited a decoding-based framework that is feature agnostic (Howard and Poeppel 2010; Luo et al. 2010; Cogan and Poeppel 2011). Instead of relying on predefined stimulus features, we used the identity of individual short epochs as the decoding tag, and to ensure the generality of our results, we compared the decoding performance across multiple selections of sound epochs chosen at random from a rich sound sequence. Still, one may conceive that the observed decoding consistency between firing rates and oscillations results from spurious correlations rather than direct and possibly causal interrelations. For example, neural firing rates and theta oscillations may be selective to distinct stimulus features that simply happen to be correlated within the explored stimulus set. While correlations between complex acoustic features surely prevail in naturalistic sounds in general, such exquisitely patterned correlations seem unlikely given the large number of sound epochs explored and their duration of several hundreds of milliseconds. A more parsimonious explanation for the correlated selectivity in firing rates and phase patterns is that both signals are sensitive to the same acoustic features because the processes generating the oscillatory signal are mechanistically related to firing rates, as evidenced by the phase locking of spikes to slow cortical rhythms (Panzeri et al. 2010).
The Timing of Neural Activity and the Human–Simian Correspondence
The timing of neural responses and oscillations differs between humans and monkeys in general. For example, latencies of evoked responses to sensory stimuli in the monkey brain amount to about 3/5 of those in the larger human brain (Schroeder et al. 1995, 2004). However, this timing difference has been mostly studied in the context of evoked responses, and it remains unclear whether similar differences in time scales also pertain to ongoing or oscillatory activity. One could conceive that the same cortical circuits or neuron types generate signals of somewhat different frequency in humans and animals or that signals of the same frequency originate from different neural structures in these species. Yet, current consensus holds that oscillatory frequency bands and the underlying neural generators are comparable across species (Engel and Fries 2010; Uhlhaas and Singer 2010). In particular, with regard to the prominence of theta-band activity in auditory cortex, there is good agreement between the frequency ranges reported in human MEG and EEG studies (about 2–8 Hz) (Ahissar et al. 2001; Luo and Poeppel 2007; Howard and Poeppel 2010; Luo et al. 2010) and those reported for intracranial signals recorded in the monkey (about 2–9 Hz) (Lakatos et al. 2005, 2007; Kayser et al. 2009; Chandrasekaran et al. 2010), despite the use of different criteria to determine the relevant frequency range and the use of different stimuli in these studies. Hence, while we cannot rule out potential differences in the generators of auditory theta rhythms in humans and monkeys, it seems likely that the same frequency range reflects similar auditory-driven processes in both species. Definite answers to such questions, however, can only be obtained once all neural signals can be recorded from the same subjects, either in an animal model or a human patient.
Interpreting Oscillatory Activity Patterns
Our results demonstrate that if a set of stimuli can be discriminated using the oscillatory phase pattern, they can likely be discriminated by firing rates and vice versa. This, however, does not necessarily imply a direct correlation of particular phase patterns with the strength of neural firing, as a high stimulus discrimination level is independent of whether discrimination of a given stimulus results from high or low firing rates. In addition, our results show that stimulus discrimination from phase patterns does not necessitate increases in power of the same oscillatory frequency band. Specifically, we found that there is no reciprocal relationship of stimulus selectivity between firing rates and oscillatory amplitude (power) and that phase patterns permitting stimulus discrimination can occur in the absence of increases in oscillation power. Thereby, our findings demonstrate a level of interrelation between scalp EEGs and neural activity that goes beyond previously reported correlations between the strength of neural firing and the amplitude of EEG oscillations (Schroeder et al. 1991; Whittingstall and Logothetis 2009). Our findings pertain to similarities in stimulus preference rather than in signal amplitude, and in general, amplitude and selectivity are 2 signal properties that are not necessarily dependent and that may offer complementary insights into the dependencies between different signals of neural activity. Together with other recent studies on encephalographic phase patterns, our findings strengthen the link between the activity of sensory cortical neurons and noninvasively measured field potentials and improve the interpretation of EEG-based studies and their implications for understanding the neural dynamics of sensory perception.
This work was supported by the Max Planck Society and the Bernstein Center for Computational Neuroscience Tübingen, funded by the German Federal Ministry of Education and Research (BMBF; FKZ: 01GQ1002).
We are grateful to Andreas Bartels and Tim Schroeder for advice and help with the EEG experiments. Conflict of Interest: None declared.