Early representations of auditory features often involve neuronal populations whose tuning is substantially wider than behavioral discrimination thresholds. Although behavioral discrimination performance can sometimes be achieved by single neurons when using the appropriate part of their (wide) tuning curves, neurons that encode the resulting high-acuity representations have rarely been described. Here we demonstrate the existence of neurons with extremely narrow tuning for interaural time differences (ITDs), a major physical cue for the azimuth of sound sources. The tuning width of ITD-tuned brainstem neurons is mostly determined by the properties of their acoustic input, and may be 10–100 times wider than behavioral thresholds. In contrast, we show that tuning widths of some neurons in high-frequency primary auditory cortex of the cat (measured using transposed stimuli) can be very sharp and approach behavioral thresholds. Furthermore, while best ITDs of brainstem neurons often lie outside the range of naturally encountered ITDs (the ethological range), the range of best ITDs of the narrowly tuned cortical neurons corresponds well to the ethological range. Thus, our results suggest that the auditory cortex contains a high-resolution representation of ITDs that explicitly decodes the widely tuned brainstem representations.
Interaural time difference (ITD) is a major physical cue for the azimuth of a sound source. The behavioral acuity of mammals in ITD discrimination tasks is remarkable. For pure tones, the just noticeable difference (JND) for ITD around 0 μs is about 1/100 cycle (Henning 1974; McFadden and Pasanen 1976; Nuetzel and Hafter 1981), so that it drops from about 100 μs at 100 Hz to 10 μs at 1000 Hz in humans (Klumpp and Eady 1956; Bernstein and Trahiotis 2002). ITD of pure tones cannot be used by humans above 1.5 kHz (Blauert 1982; Bernstein and Trahiotis 1985), but ITD sensitivity still exists at high frequencies. The JND for the ITD of a high-frequency tone with a 30-Hz sinusoidally amplitude-modulated (SAM) envelope is about 300 μs (Bernstein and Trahiotis 2002), again representing about 1/100 cycle of the envelope.
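The rule of thumb above (a JND of about 1/100 of a cycle) can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `jnd_us` and the fixed fraction of 0.01 are assumptions made for this example, not part of the cited studies.

```python
def jnd_us(frequency_hz, fraction_of_cycle=0.01):
    """Approximate JND in microseconds, assuming it is a fixed fraction of one period."""
    period_us = 1e6 / frequency_hz  # period of the tone or envelope in microseconds
    return fraction_of_cycle * period_us


# 100-Hz and 1000-Hz tones, and a 30-Hz SAM envelope
for f in (100, 1000, 30):
    print(f"{f:>5} Hz -> about {jnd_us(f):.0f} us")
```

Under this assumption the 30-Hz envelope gives roughly 333 μs, in line with the value of about 300 μs quoted above.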
In contrast with the high behavioral acuity, the coding of ITD in neural activity in the brainstem of mammals is done by populations of neurons with wide tuning. In the medial superior olive (MSO), neurons approximately compute a correlation between the signals from the 2 ears. Thus, the widths of the delay functions (neuronal response as a function of ITD) measured with pure tones are determined by tone frequency, and are about half the period of the tone (Goldberg and Brown 1969; Yin and Chan 1990). While the delay functions sharpen above the MSO, frequencies lower than 300 Hz produce delay function widths that are still about 1/3 of the tone period even in the thalamus and in primary auditory cortex (A1) (Reale and Brugge 1990; Fitzpatrick et al. 1997, 2000). The same is true for SAM high-frequency tones (Fitzpatrick et al. 2000, 2002).
To achieve behavioral acuity, the responses of these populations of widely tuned neurons therefore need to be further processed. Two approaches that have been suggested for doing this consist of pooling responses across neuronal populations (Fitzpatrick et al. 1997; Skottun et al. 2001), or using slopes of the delay functions through midline, in particular by comparing the responses of populations of neurons in the 2 hemispheres (Fitzpatrick et al. 1997; McAlpine et al. 2001; Skottun et al. 2001; McAlpine and Grothe 2003; Shackleton et al. 2003; Harper and McAlpine 2004; Gordon et al. 2008). It is an open question whether the results of such decoding schemes are explicitly represented by downstream neurons. A hallmark of such explicit representations would be neuronal tuning that is much narrower than that of the initial sensory representations (e.g., Bitterman et al. 2008).
We studied ITD coding in high-frequency cat A1 using amplitude-modulated stimuli that have been introduced by van de Par and Kohlrausch (1997), called transposed stimuli. Like SAM tones, the transposed stimuli are generated by multiplying a high-frequency carrier by a low-frequency envelope. Instead of using a sinusoidal envelope, the envelope of the transposed stimuli is generated by applying a simple hair cell model (half-wave rectification followed by low-pass filter) to a low-frequency stimulus. The transposed stimuli have been developed in order to make the temporal information that is available in the high-frequency channel centered at the carrier frequency comparable with the temporal information at the low-frequency channel from which the envelope has been derived. By changing the ITD of these stimuli, it is possible to lateralize them (shift their perceived location inside the head) to the same extent as their low-frequency counterparts, unlike SAM stimuli which produce smaller shifts (Bernstein and Trahiotis 2003). Remarkably, behavioral acuity for these stimuli is independent of the envelope modulation rate and is about 100 μs even for an envelope modulation rate of 32 Hz (Bernstein and Trahiotis 2003). These properties raise the question of whether the neural representation of transposed stimuli is really similar to that of their low-frequency counterpart. At least in subcortical stations, the answer seems to be in the affirmative: Griffin et al. (2005) demonstrated that neurons in inferior colliculus (IC) were sensitive to the ITD of transposed stimuli, and similarly to the delay functions of low-frequency tones in low-frequency IC, high-frequency IC neurons had delay functions for transposed stimuli whose widths were about 1/3 of the period of the envelope. Thus, as in the case of low-frequency tones, the tuning width of high-frequency subcortical neurons does not represent explicitly the behavioral acuity that can be achieved with transposed stimuli.
Of course, the information necessary for making fine sensory discriminations is available subcortically, for example through the steep slopes of the delay functions in the ethological range (Griffin et al. 2005).
Our main result is the finding of a neuronal population in auditory cortex with sharply tuned delay functions whose best ITDs span the ethological range of the cat. We interpret this finding as a manifestation of a transformation in the representation of ITD: a population code in subcortical stations is transformed into an explicit single neuron representation for ITD at the level of A1.
Materials and Methods
The data were collected from 9 healthy adult cats. The joint ethics committee (IACUC) of the Hebrew University and Hadassah Medical Center approved the study protocol for animal welfare. The Hebrew University is an AAALAC International accredited institute. Anesthesia was induced with medetomidine (Domitor, Orion Pharma; 0.2 mg, i.m.) and ketamine (100 mg, i.m., Fort Dodge) and maintained with halothane (0.2–1.5% as needed). Extracellular recordings were performed in the left primary auditory cortex. We used up to four individually moveable glass-coated tungsten microelectrodes to record neural activity. Spike waveforms were sampled and stored for offline sorting (AlphaMap, Alpha-Omega). Spikes were sorted online (MSD, Alpha-Omega, Nazareth Illit, Israel: template-based sorting) and, in addition, spikes that were believed to be well separated during the experiment were further sorted offline using a version of the Wave-Clus algorithm (Quiroga et al. 2004) under manual control. Spikes were assigned to 3 quality levels: well-separated spikes representing the activity of single neurons, partially separated spike shapes representing the activity of small clusters of neurons, and multiunit activity. More details on spike sorting and the criteria used to assign quality level to spikes are available in Moshitch et al. (2006).
All stimuli were generated digitally. Pure tones and broadband noise bursts were generated online (AP2, Tucker-Davis Technologies, Alachua, FL, USA), converted to analog voltage (DA3-4, Tucker-Davis), attenuated (PA4, Tucker-Davis), and switched with onset and offset ramps of 10 ms (SW2, Tucker-Davis). The sounds were presented to the animal through sealed earphones (designed by G. Sokolich) that were calibrated in each ear.
Initial characterization of all units was performed using pure tones and broadband noise bursts (BBN). The pure tones were used to measure a frequency response area (FRA) and to determine the characteristic frequency (CF) and minimum threshold of the neurons (as in Moshitch et al. 2006). A noise threshold was also determined from the response to the BBN stimuli. Firing rates were rather high, as previously described in halothane-anesthetized cats: The mean maximal rate at the most responsive tone was 54 ± 36 spikes/s (mean ± SD), and the mean response rate to BBN was 15 ± 20 spikes/s. The parameters and sound level of the transposed stimuli were set to values that usually fitted the most responsive unit among all the electrodes.
The transposed stimuli were generated by multiplying a high-frequency carrier and a low-frequency envelope. To illustrate the ideas behind the construction of the transposed stimuli, Figure 1 shows the various stimuli involved (first 2 columns), the expected firing patterns of auditory nerve fibers in response to these stimuli (third column; generated using the software package aim-mat, Bleeck et al. 2004), the autocorrelation functions of the expected neural patterns (fourth column) and the power spectra of the original sounds (fifth column).
Row A illustrates the processing of a 128-Hz pure tone in the peripheral auditory system. The most obvious transformation that a pure tone undergoes is half-wave rectification (compare second and third columns), leading to substantial off-periods of low or no firing. The ITD detectors of the MSO would cross-correlate 2 such firing patterns arriving from the 2 ears, producing a periodic autocorrelation pattern.
SAM tones (row B) are expected to generate firing patterns with substantially shorter off-periods, because envelopes, which do not have negative values, do not undergo as much half-wave rectification as the pure tones from which they are derived. The expected firing rate of an auditory nerve fiber centered at the carrier frequency, showing the expected reduction in the off-periods, is illustrated in row B, third column. As a result, the autocorrelation has a substantially reduced dynamic range. This reduction has been linked to the weaker lateralization that is evoked by high-frequency SAM tones in humans (Bernstein and Trahiotis 2002, 2003).
To remedy this, van de Par and Kohlrausch (1997) introduced the transposed stimuli, in which the pure tone envelopes undergo half-wave rectification before multiplying the carrier (row C). The transposed stimulus has substantial off-periods (row C, second column) which are reflected in the expected neural firing patterns of an auditory nerve fiber whose best frequency equals the carrier frequency (row C, third column). These patterns mimic much better those of the low-frequency tone from which the envelope was derived (third column, compare rows A and C). As a result, the autocorrelation function of the neural activity shows deeper modulation. Indeed, the JNDs of ITD discrimination (at least in humans) are smaller for the transposed stimulus than for SAM tones with the same modulation frequency, and are even smaller than the JNDs for the ITD of the pure tones used to derive the envelopes. Actual responses of auditory nerve fibers to these stimuli were recorded by Dreyer and Delgutte (2006), generally confirming the results of the simulations shown here.
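The difference between SAM and transposed envelopes can be made concrete with a minimal sketch. It assumes a 128-Hz modulator and a hypothetical 8-kHz carrier, and it omits the low-pass stage of the hair cell model, so it is a simplified stand-in rather than the aim-mat processing used for Figure 1. All function names are illustrative.

```python
import math


def envelopes(mod_hz=128.0, fs=100_000, duration_s=0.0625):
    """Return (SAM envelope, transposed envelope) derived from the same low-frequency tone."""
    n = int(fs * duration_s)  # 0.0625 s holds exactly 8 cycles at 128 Hz
    tone = [math.sin(2 * math.pi * mod_hz * i / fs) for i in range(n)]
    sam = [1.0 + x for x in tone]              # sinusoidal envelope, never negative
    transposed = [max(x, 0.0) for x in tone]   # half-wave rectified tone (no low-pass here)
    return sam, transposed


def apply_carrier(envelope, carrier_hz=8000.0, fs=100_000):
    """Multiply an envelope by a pure-tone carrier (carrier frequency is an assumed value)."""
    return [e * math.sin(2 * math.pi * carrier_hz * i / fs)
            for i, e in enumerate(envelope)]


def off_fraction(envelope, threshold=0.1):
    """Fraction of samples whose envelope is below `threshold` times the envelope peak."""
    peak = max(envelope)
    return sum(e < threshold * peak for e in envelope) / len(envelope)


sam, transposed = envelopes()
stimulus = apply_carrier(transposed)
print(f"off-period fraction, SAM:        {off_fraction(sam):.2f}")
print(f"off-period fraction, transposed: {off_fraction(transposed):.2f}")
```

With these settings the transposed envelope spends more than half of each cycle near zero, whereas the SAM envelope does so for only about a fifth of the cycle, which is the property the text links to deeper modulation of the autocorrelation function.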
Of course, the transposed stimulus has a frequency content that extends beyond the channel centered at the carrier frequency (compare the frequency content of the pure tone with that of the transposed stimulus derived from the same pure tone, rows A and C, fifth column). These additional frequency channels could supply further information about ITD, which would be unavailable when using a low-frequency pure tone.
Specifically for our experiment, the envelopes of the transposed stimulus were derived from narrow noise bands (1/3 or 2/3 octave wide) centered at one of 4 frequencies: 32, 64, 128, or 256 Hz (Fig. 1D,E). These were half-wave rectified and filtered to remove frequencies above 2 kHz using the hair cell model implemented in the software package aim-mat (Bleeck et al. 2004). The carriers were either pure tones at the frequency characterizing the recording location (as in Griffin et al. 2005, Fig. 1D) or noise bands centered at that frequency with a width equal to the same frequency in Hz (Fig. 1E). The use of noise carriers was based on our experience with A1 indicating that neurons in the relevant stations of the auditory pathway often respond better to wideband stimuli than to narrowband stimuli (Nelken et al. 1999; Bar-Yosef et al. 2002). Because the envelope was narrowband, it had an approximate periodicity at its center frequency, allowing auditory nerve fibers to follow the timing information of the stimulus, as in Figure 1D,E. Although the expected firing patterns are not strictly periodic, the expected tuning width of the binaural cross-correlator is determined by the width of the autocorrelation function of the envelope, which is dominated, for narrowband signals such as those used here to derive the envelopes, by its center frequency (see Fig. 1D,E, fourth column, illustrating the expected oscillatory form of the autocorrelation function of the auditory nerve activity and the approximately equal width of the main autocorrelation peak of all the stimuli described here). Thus, the expected tuning width to ITD was calculated in terms of the center frequency of the envelopes.
Stimulus duration was 400 ms. Envelopes of that duration were generated offline and stored as disk files, whereas the carriers were generated online. The noise envelopes were calculated separately for each modulation rate (a total of 4 different noise tokens). For each rate, they were identical across trials and neurons. The noise carriers were generated trial by trial, and shaped in the frequency domain to have the appropriate center frequency and bandwidth before multiplication by the envelope. The sound level was set to 30 dB above minimal threshold at the selected center frequency for the stimuli with a pure-tone carrier [mean ± SD of ca. 70 ± 11 dB sound pressure level (SPL)] and 30 dB above noise threshold for the stimuli with a noise carrier (ca. 65 ± 12 dB SPL). ITD was generated by shifting the whole stimulus (400 ms) in one ear relative to the other. Thus, the stimuli included both onset and on-going disparities. We typically presented 5 repetitions of each of 21 different ITDs in a pseudorandom order (a total of 105 trials) at a presentation rate of about 0.45 Hz. ITD values usually ranged between 1500 μs left-ear and 1500 μs right-ear leading.
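The typical presentation protocol described above can be sketched as follows. Equal ITD spacing, the sign convention (negative for left-ear leading), the seed, and the function name `make_itd_trials` are all assumptions made for illustration; the text states only the number of ITDs, the number of repetitions, and the usual range.

```python
import random


def make_itd_trials(max_itd_us=1500, n_itds=21, n_reps=5, seed=0):
    """Build a pseudorandom trial order for equally spaced ITDs (assumed spacing).

    Negative values stand for left-ear leading, positive for right-ear leading
    (a convention chosen for this sketch only).
    """
    step = 2 * max_itd_us // (n_itds - 1)          # 150 us for the default values
    itds = [-max_itd_us + k * step for k in range(n_itds)]
    trials = itds * n_reps                          # every ITD repeated n_reps times
    random.Random(seed).shuffle(trials)             # reproducible pseudorandom order
    return itds, trials


itds, trials = make_itd_trials()
print(len(itds), "ITDs,", len(trials), "trials, step", itds[1] - itds[0], "us")
```

At a presentation rate of about 0.45 Hz, the 105 trials of one such block would take roughly 4 min.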
All neurons were tested with all combinations of the 2 carriers (pure-tone or noise) and the 4 modulation frequencies (a total of 8 carrier–envelope combinations). The combination of carrier type and modulation of the envelope that showed the strongest ITD sensitivity was selected for a detailed control study which included a repeated presentation of the best transposed stimuli (with 10 repetitions of 11 different ITDs in the same range as the previous protocol) as well as tests of ITD sensitivity evoked by the unmodulated carrier and of the monaural responses. The monaural presentations consisted of the same sounds that have been used in the main test of the responses to the transposed stimuli, except that only one ear was stimulated at a time. The monaural presentations to the left ear alone consisted of 55 repetitions of the same transposed stimulus. The right-ear presentations included the same small shifts in time that were used to create the ITDs of the dichotic presentations, and were included to verify that these small shifts of the stimulus in the right ear did not affect the neuronal responses by themselves.
The significance of the responses to the transposed stimuli was tested for each envelope–carrier combination by a paired t-test between the spike rate during stimulus presentation and the spontaneous spike rate just preceding each stimulus presentation (P < 0.05). The paired t-test was performed over 2 time windows, one between 10 and 50 ms after stimulus onset (onset responses) and the other between 50 and 250 ms after stimulus onset (sustained responses). To be included in the final database, the responses to any combination of a carrier and modulation frequency had to show a stationary response during the recording. Stationarity was checked by testing for possible differences in the responses ordered by presentation time (in groups of 15 successive trials) rather than by stimulus identity. Responses that had a significant effect of time by this criterion were removed from the final database.
We recorded the responses of 446 units, of which 406 (91%) had at least one carrier–envelope combination with a significant response (onset or sustained). Maximal response rates to the best stimulus were 30 ± 23 spikes/s (mean ± SD). After testing for stationary responses, 235 units had at least one carrier–envelope combination that was included in the final database. The final database consisted of 1077/1880 carrier–envelope combinations with stationary responses from these 235 units.
We have previously shown that sustained responses are observed under halothane anesthesia even with pure-tone stimulation, similarly to recordings in awake animals (see Moshitch et al. 2006, for an extensive discussion). Sustained responses were found in this study as well. Only 78/1077 cases showed pure onset responses (defined as significant responses in the 50 ms following sound onset and nonsignificant responses at all other times); the other 999/1077 cases had at least some late responses.
Response onset was defined for each condition that evoked significant responses, as the beginning of the first 5 ms window (following stimulus onset) that showed a significant increase in spike counts relative to the spontaneous rate. This was done by comparing the spike counts in each such window with the Poisson distribution whose expected value was set to the mean number of spontaneous counts in the same window duration (using a conservative significance level of 5 × 10⁻⁵, to correct for the high number of comparisons).
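The onset criterion can be illustrated with a short sketch. It assumes that the counts in each 5-ms window are compared with a one-sided Poisson tail probability and that `spont_mean` is the expected spontaneous count for the same window; how counts were pooled across trials is not stated in the text, so that part is an assumption. The function names are illustrative.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the cumulative sum of the pmf."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # accumulate P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def response_onset_ms(window_counts, spont_mean, window_ms=5, alpha=5e-5):
    """Start time of the first window whose count is significantly above spontaneous."""
    for idx, count in enumerate(window_counts):
        if poisson_sf(count, spont_mean) < alpha:
            return idx * window_ms
    return None  # no window reached significance


# Hypothetical counts in successive 5-ms windows after stimulus onset
counts = [0, 1, 0, 2, 9, 12, 7]
print(response_onset_ms(counts, spont_mean=0.5))
```

In this made-up example the first four windows are compatible with the spontaneous rate, so the onset is placed at the start of the fifth window.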
Selectivity to ITD
The selectivity to ITD was tested separately for each carrier and envelope combination using generalized linear models (GLM, e.g., Nelder and Wedderburn 1972; McCullagh and Nelder 1989). GLMs are a generalization of multiple linear regressions but allow non-normal error distributions (one-parameter exponential families, including Poisson, binomial, and gamma) and a nonlinear transformation between the weighted sum of covariates and the model parameter, encoding for example multiplicative rather than additive effects. The approach and its calibration for this specific dataset have been described in detail (Moshitch and Nelken 2014). For this test, we gathered responses to each 3 consecutive ITDs into one group, ending up with 7 ITD groups each having 15 responses. The models had 2 factors: ITD (with 7 levels) and time after stimulus onset (with 9 bins, corresponding to 9 nonoverlapping 50-ms windows). The response variable was the spike count of individual trials at each time window. The spike counts were assumed to follow the Tweedie distribution (Tweedie 1984) since the Tweedie distribution was found to fit the spike count data better than either Gaussian or Poisson distributions. The Tweedie distribution is an exponential family, and the variance (var) is related to the mean (μ) as var = ϕμ^P with 1 ≤ P ≤ 2, encapsulating the overdispersion of spike counts relative to the Poisson distribution that is often encountered in cortex. The log of the mean response in each window (μ) was modeled as a linear combination of the explanatory variables. In the spirit of analysis of variance (ANOVA), we implemented a sequence of successively more complex statistical models in order to test for the significance of the effects of ITD and its interaction with the dynamics of the response during sound presentation.
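Three ingredients of this model can be sketched without fitting anything: the grouping of ITDs, the variance function, and the log link. The code below is an illustration only; it does not reproduce the fitting procedure of Moshitch and Nelken (2014), and all function names are hypothetical.

```python
import math


def group_itds(itds, group_size=3):
    """Split an ordered list of ITDs into consecutive groups of `group_size`."""
    return [itds[i:i + group_size] for i in range(0, len(itds), group_size)]


def tweedie_variance(mu, phi, p):
    """Variance implied by var = phi * mu**p, restricted to 1 <= p <= 2 as in the text."""
    if not 1.0 <= p <= 2.0:
        raise ValueError("p must lie between 1 and 2")
    return phi * mu ** p


def predicted_mean(intercept, itd_effect, time_effect):
    """Log link: effects add on the log scale, so they multiply on the rate scale."""
    return math.exp(intercept + itd_effect + time_effect)


itds = list(range(-1500, 1501, 150))          # 21 equally spaced ITDs (assumed spacing)
groups = group_itds(itds)
print(len(groups), "groups of", len(groups[0]), "ITDs")
print(tweedie_variance(4.0, 1.5, 1.0), tweedie_variance(4.0, 1.5, 2.0))
```

With P = 1 the variance grows in proportion to the mean (an overdispersed Poisson when ϕ > 1), and with P = 2 it grows with the square of the mean, as for a gamma distribution.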
In particular, we demonstrated (Moshitch and Nelken 2014) that the use of the Tweedie distribution gives rise to more conservative statistical decisions (more rejections of the alternative) than GLMs based on the Poisson distribution.
A significant effect of ITD required the main effect of ITD to be significant when added to a model with a time factor only. While for many neurons, time after response onset had a strong effect on firing rates, interactions between ITD and time were weak. In consequence, firing patterns (at least at the resolution of our analyses) did not carry much information about ITD.
Locking of the responses to envelope fluctuations was analyzed using the cross-correlation (with 1-ms resolution) of the envelope with the peristimulus time histogram averaged over all ITDs. The cross-correlations were calculated only for envelopes at modulation rates of 32, 64, and 128 Hz (804 cases with significant responses and 174 significantly tuned responses). Less than 7% (55/804) of the responses had correlation r > 0.3. Among the significantly tuned responses only 20/174 (11.5%) cases had r > 0.3.
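A minimal sketch of such an envelope-locking measure is given below. It assumes that both the envelope and the PSTH are sampled in 1-ms bins and that r is taken as the largest Pearson correlation over a range of response lags; the lag range and the way r was extracted are assumptions, since the text does not specify them.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def peak_envelope_correlation(envelope, psth, max_lag_ms=50):
    """Largest correlation between envelope and PSTH over response lags of 0..max_lag_ms."""
    best = -1.0
    for lag in range(max_lag_ms + 1):
        n = len(envelope) - lag
        best = max(best, pearson_r(envelope[:n], psth[lag:lag + n]))
    return best


# Toy example: a 32-Hz half-wave rectified envelope and a PSTH that follows it after 12 ms
envelope = [max(math.sin(2 * math.pi * 32 * t / 1000.0), 0.0) for t in range(400)]
locked_psth = [0.0] * 12 + envelope[:388]
print(round(peak_envelope_correlation(envelope, locked_psth), 3))
```

A response counted as locked in the text would be one for which this kind of measure exceeds 0.3.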
Because so few cases showed significant envelope locking, we did not analyze temporal response patterns. For the rest of the paper, and in particular for determining ITD selectivity, we analyzed spike counts and ignored the temporal structure of the responses.
Delay functions were computed by averaging the spike counts over a 450-ms window starting at response onset for each ITD (a total of 21 ITDs, 5 repeats each). To classify the delay functions, we generated a set of templates that varied in the peak location and in their width (as in Las et al. 2008). The templates were Gaussians with standard deviations that varied between 0.85 and 9.4 bins (where one bin corresponds to 0.15 or 0.2 ms), and whose peaks included the whole range of tested ITDs. The template that had a maximal correlation with the delay function was selected for each carrier–envelope combination separately. The modeled delay function was calculated as a*(template) + b with a and b estimated using a linear regression of the measured spike counts and the best template. The modulation depth (MD) was set to a/(a + b). The best ITD and tuning width (at 50% of the template peak height) were calculated from the modeled delay function. The ITD dynamic range of the modeled delay function was defined as the ITD range over which the function dropped from maximal response to 10% above its minimal value. The ITD dynamic range is related to the steepness of the delay function.
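The template procedure can be sketched as a small grid search. The grid of standard deviations, the restriction of peak locations to the tested ITDs, and the analytic full width at half maximum are assumptions of this example; the text gives only the range of standard deviations and the 50% criterion. Function and key names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fit_delay_function(itds_us, rates, sds_bins):
    """Pick the Gaussian template best correlated with `rates`, then fit a*template + b."""
    bin_us = itds_us[1] - itds_us[0]
    best = None
    for sd in sds_bins:                 # template widths, in units of ITD bins
        for peak in itds_us:            # template peaks at each tested ITD (assumption)
            template = [math.exp(-0.5 * ((t - peak) / (sd * bin_us)) ** 2)
                        for t in itds_us]
            r = pearson_r(template, rates)
            if best is None or r > best[0]:
                best = (r, sd, peak, template)
    r, sd, peak, template = best
    n = len(rates)
    mt, mr = sum(template) / n, sum(rates) / n
    # Ordinary least squares for rates ~ a * template + b
    a = (sum((t - mt) * (y - mr) for t, y in zip(template, rates))
         / sum((t - mt) ** 2 for t in template))
    b = mr - a * mt
    return {
        "best_itd_us": peak,
        "sd_bins": sd,
        "a": a,
        "b": b,
        "modulation_depth": a / (a + b),
        # Analytic width at half height of the Gaussian template (assumed definition)
        "fwhm_us": 2.0 * math.sqrt(2.0 * math.log(2.0)) * sd * bin_us,
        "r": r,
    }


# Synthetic delay function: baseline of 2 spikes/s plus a peak of 8 spikes/s at +300 us
itds = list(range(-1500, 1501, 150))
rates = [2.0 + 8.0 * math.exp(-0.5 * ((t - 300) / (1.5 * 150)) ** 2) for t in itds]
fit = fit_delay_function(itds, rates, sds_bins=[0.85, 1.5, 3.0, 6.0, 9.4])
print(fit["best_itd_us"], fit["sd_bins"], round(fit["modulation_depth"], 2))
```

For this noiseless example the search recovers the generating template, and the modulation depth a/(a + b) comes out at 0.8.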
In several studies, ITD sensitivity has been quantified by computing the synchronization coefficient of the responses (as described in Kuwada et al. 1987). For very narrow delay functions, such as those described in this paper, the synchronization coefficient is dominated by the spontaneous rate and therefore does not report the width of the delay function. We therefore avoided the use of the synchronization index in this paper.
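The argument about narrow delay functions can be illustrated numerically. The sketch assumes that the synchronization coefficient is computed as a vector strength over one full envelope period and that the delay function is sampled across that whole period, which differs from the ITD range actually tested here; the rates, peak position, and width are made-up values.

```python
import math


def synchronization_coefficient(itds_us, rates, period_us):
    """Vector strength of a delay function treated as a histogram over one period."""
    c = sum(r * math.cos(2 * math.pi * t / period_us) for t, r in zip(itds_us, rates))
    s = sum(r * math.sin(2 * math.pi * t / period_us) for t, r in zip(itds_us, rates))
    return math.hypot(c, s) / sum(rates)


def narrow_delay_function(itds_us, peak_rate, baseline, best_itd_us=300.0, sd_us=200.0):
    """Narrow Gaussian peak riding on a constant (spontaneous) baseline rate."""
    return [baseline + peak_rate * math.exp(-0.5 * ((t - best_itd_us) / sd_us) ** 2)
            for t in itds_us]


period_us = 31250.0                                   # one cycle of a 32-Hz envelope
itds = [-period_us / 2 + k * 125.0 for k in range(250)]  # 250 samples cover the cycle

no_spont = narrow_delay_function(itds, peak_rate=10.0, baseline=0.0)
with_spont = narrow_delay_function(itds, peak_rate=10.0, baseline=2.0)
print(round(synchronization_coefficient(itds, no_spont, period_us), 3))
print(round(synchronization_coefficient(itds, with_spont, period_us), 3))
```

The two delay functions have the same width, yet adding a modest baseline rate reduces the coefficient from nearly 1 to below 0.1, so the measure reflects the baseline far more than the tuning width.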
Basic Response Properties
Among the 235 units that had significant responses to the transposed stimuli, 188 units (80%) responded significantly to pure tones as well (contributing 938 combinations of carrier and modulation rate with significant responses). These units had a mean maximal rate of 54 ± 36 spikes/s to the most responsive tone/level combination. The CF of these units had a geometric mean of 8.4 kHz and a standard deviation of 1.7 octaves. This reflects the tendency to record from the high-frequency area of A1 in these experiments.
The center frequencies selected for the carrier of the transposed stimulus had a mean of 7.2 kHz and a standard deviation of 0.8 octaves. The mean was not significantly different from the mean CF of the same units (t(187) = 1.82, n.s.). Because we recorded with multiple electrodes, it was impossible to fit the stimulus to the CFs of all recorded units; nevertheless, in many cases (535/938, 57%), the center frequency of the carrier was less than an octave away from the CF. More importantly, because of the relatively wide tuning of cortical neurons under halothane anesthesia (Moshitch et al. 2006), in 833/938 (88%) of the cases, the center frequency of the carrier was within the FRA at the sound level used in the experiment. Thus, in the majority of cases, the transposed stimuli had their spectral content within the FRA, and the delay functions reported here represent neuronal properties measured at the core of the FRA.
The minimum thresholds of the units had a mean of 16 ± 19 dB SPL. The overall energy of the transposed stimuli with a noise carrier had a mean of about 64 ± 12.5 dB SPL and the overall energy of the transposed stimuli with a tone carrier had a mean of about 69 ± 11.3 dB SPL. Thus, the levels used here were not very high but still substantially higher than minimum threshold, reflecting the fact that stimulation was within the core of most FRAs.
Tuning to ITD
Significant tuning to at least one of the carrier–envelope combinations was found in 107/235 (45%) of the units and in 255/1077 (24%) of the envelope–carrier combinations that evoked significant responses. Among the 235 units with significant responses, 57 (24%) were well separated, reflecting the responses of single neurons. Significant tuning to at least one carrier–envelope combination was found in 36/57 (63.2%) of the well-separated units.
Figure 2A displays the responses of 2 units to the transposed stimuli presented dichotically. Both units responded significantly to the dichotically presented stimuli. Furthermore, their responses to the transposed stimuli varied significantly as a function of ITD. The top unit responded most strongly to ITDs within a restricted range with best ITD of 0.6 ms, right-ear leading (denoted R0.6 ms from now on), while the bottom unit responded well to most left-leading ITD values.
In order to conclude that the units were tuned to ITD, a number of controls were required. The stimulus presented to the right ear was slightly different at every ITD value (see Materials and Methods). As a result, an apparent tuning to ITD could result from sensitivity to the slight variations in the stimulus presented to the right ear, without any binaural contribution. Figure 2B shows the responses of the 2 units to the right-ear stimulus, presented monaurally without its left-ear counterpart. The ordinate is marked by the ITD value that would be present had the left-ear stimulus been presented as well (nominal ITD). The top unit had a robust response to the right-ear stimulus presented alone, while the bottom unit showed a large reduction in its responses to the right-ear stimulus presented alone. Most importantly, the responses of neither unit varied significantly as a function of nominal ITD. Only 5% of the units showed a significant tuning to the right-ear stimulus presented alone, and this is comparable with the expected rate of false detections of the statistical test used to detect tuning under the assumption that no tuning to the right ear was present.
To further illustrate the importance of the binaural interactions in generating the responses of these units to the dichotic transposed stimuli, Figure 2C shows their responses (averaged over all ITDs) to the dichotic transposed stimuli (middle plot, average of panel A, average spike rates over all ITDs of 4.88 and 8 spikes/s for the top and bottom units, respectively), to the stimuli presented to the right ear only (bottom plot, average of panel B, average spike rates of 10.75 and 2 spikes/s for the top and bottom units, respectively) and to the left ear alone (top plot). For both units, the responses to the left ear alone were not significant, consistent with the right-ear dominance expected in left auditory cortex. Clearly, in both units, in spite of the null responses to left-ear alone stimulation, the tuned responses to the transposed stimuli reflected binaural interactions.
Among the 132 units that were tested with the control stimuli and had significant responses, 59 responded significantly to both monaural conditions (classified as EE), 50 responded significantly to the right ear alone but did not respond significantly to the left ear alone (EO), and 13 did not have significant monaural responses at all (OO/F). Ten units showed an inhibitory response to the left monaural condition; of these, 4 were excited by the right ear (EI) and 6 did not show a significant response to the right ear alone (OI). Even the response rates of the EE units showed a significant difference between the maximal driven rates under the 3 conditions (2-way ANOVA, F(2,58) = 32, P ≪ 0.01). Post hoc comparisons revealed that the responses to the dichotic stimulus were the strongest on average, followed by the responses to the right ear alone and to the left ear alone (15.1 ± 16.5, 11.7 ± 11.7, and 6.5 ± 7.5 spikes/s, respectively). Thus, the EE units showed substantial binaural facilitation on average. The EO units did not show significant differences between the average responses to the dichotic condition and to the right-ear alone condition (t(49) = 1.4, n.s. with maximal driven rates of 11.4 and 10.56 spikes/s, respectively).
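The classification scheme used above amounts to a lookup on the sign of the two monaural responses. The sketch below encodes each response as +1 (significant excitation), 0 (no significant response), or -1 (significant inhibition); this encoding, the function name, and the "other" label for combinations not mentioned in the text are assumptions of the example.

```python
def classify_binaural(right, left):
    """Map the monaural response signs (right ear, left ear) to a binaural class label."""
    table = {
        (1, 1): "EE",     # excited by each ear alone
        (1, 0): "EO",     # excited by the right ear only
        (0, 0): "OO/F",   # no significant monaural response
        (1, -1): "EI",    # excited by the right ear, inhibited by the left
        (0, -1): "OI",    # inhibited by the left ear only
    }
    return table.get((right, left), "other")  # combinations not described in the text


# Unit counts reported in the text, keyed by (right, left) response signs
reported = {(1, 1): 59, (1, 0): 50, (0, 0): 13, (1, -1): 4, (0, -1): 6}
for signs, count in reported.items():
    print(classify_binaural(*signs), count)
print("total:", sum(reported.values()))
```

The five reported classes add up to the 132 units tested with the control stimuli.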
Finally, we asked whether the complex structure of the transposed stimulus is really necessary for evoking the responses. For that purpose, we presented the carrier alone, without the low-frequency envelope. While we did not expect any ITD sensitivity of these responses, this condition tested for the importance of the envelope in determining the strength of the responses as well as their ITD sensitivity. The responses to the carrier alone as a function of ITD are displayed in Figure 2D for the same 2 units. Both units responded significantly to the unmodulated carrier but with a different temporal pattern than that evoked by the dichotic stimulus, and, importantly, with no selectivity to ITD. Thus, ITD sensitivity for these two units reflected the processing of the amplitude envelopes. There was no significant difference between the strength of the responses evoked by the full stimuli and by the carrier alone (t(86) = 1.27, n.s.), reflecting in part the tendency of cortical neurons under halothane anesthesia to have sustained responses even when responding to pure tones (Moshitch et al. 2006). The carrier alone evoked significant tuning to ITD in a small minority (12%, 10/87) of the tested units. Among these 10 cases, only 2 were in response to tone carriers. The rest of the significantly tuned responses to the carrier alone were evoked by noise carriers. The low levels of selectivity to the carrier alone could be therefore explained by the false detection rate of our tests (5%) and by the possible activation of low-frequency responses at the low-frequency edge of the FRAs by the noise carriers. We conclude that in the majority of cases, tuning to the transposed stimuli in our dataset indeed reflects binaural processing of envelope cues.
Properties of ITD Tuning
Responses to ITD of low-frequency tones often have a tuning width of about 1/3–1/2 cycle of the tone frequency (Reale and Brugge 1990; Fitzpatrick et al. 1997, 2000) and ITD sensitivity to amplitude-modulated sounds displays similar selectivity (Reale and Brugge 1990; Fitzpatrick et al. 1997, 2000; Griffin et al. 2005). Figure 3A displays the responses of a unit with similar properties. These are the responses of a well-separated unit to transposed stimuli with an envelope modulation rate of 256 Hz and a noise carrier centered at 13 kHz. The width of the modeled delay function was 1050 μs, which corresponded to 0.27 of the period of the envelope. Only a minority of cases showed this pattern (11%, 28/255 responses from 24/107 units whose widths were between 0.22 and 0.3 of the period of the envelope, see Fitzpatrick et al. 1997).
For some of the units we verified the approximate response periodicity that was expected from the approximate periodicity of the envelope. Figure 3B is an example of the responses of a unit to transposed stimuli with a modulation rate of 256 Hz (a period of ca. 4000 μs) and a pure-tone carrier at 12 kHz tested with a wide range of ITDs (10 ms right-ear leading to 10 ms left-ear leading, corresponding to about 2.5 cycles of the envelope modulation in each direction). This unit showed a tuning width of about 1000 μs around a best ITD of 300 μs right-ear leading. The responses of this unit had additional peaks with the expected period (4000 μs). The side peaks decreased slowly in size, presumably reflecting the autocorrelation of the envelope (which was derived from a noise band centered at 256 Hz and not from a pure tone, see Fig. 1D).
Many tuned neurons had delay functions that showed substantially narrower tuning to ITD. The responses in Figure 4A–H showed such narrow tuning, and were therefore tested with ITDs spanning less than one cycle of the modulation pattern (as illustrated in the figure). All had a clear peak in their ITD tuning that was shifted away from 0, but was within the ethological range for cats (R300, R150, R150, L300, L150, L150, L400, and R150 μs in Fig. 4A–H, respectively). The unit of Figure 4A had a tuning width of 450 μs for stimuli whose envelope had a center frequency of 32 Hz, and therefore a period of 31 250 μs. The tuning width of this unit represents 0.014 cycles of the envelope center frequency, more than an order of magnitude narrower than the expected 1/3 cycle. The tuning widths of the other units (Fig. 4B–H) were also substantially narrower than expected. In order, they were 750 μs (0.024 cycles of a 32-Hz envelope), 450 μs (0.029 cycles of a 64-Hz envelope), 750 μs (0.048 cycles of a 64-Hz envelope), 450 μs (0.058 cycles of a 128-Hz envelope), 450 μs (0.12 cycles of a 256-Hz envelope), 2200 μs (0.14 cycles of a 64-Hz envelope), and 750 μs (0.19 cycles of a 256-Hz envelope).
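The conversion between tuning width in μs and tuning width in cycles used above is simple arithmetic: the width is divided by the envelope period. The following minimal sketch reproduces the values quoted for Figure 4A–H; the function and variable names are illustrative and are not taken from the analysis code of the study.

```python
def width_in_cycles(width_us, mod_rate_hz):
    """Convert a tuning width in microseconds to cycles of the envelope."""
    period_us = 1e6 / mod_rate_hz  # envelope period in microseconds
    return width_us / period_us


# (tuning width in us, envelope center frequency in Hz) for Fig. 4A-H,
# copied from the text above.
FIG4_UNITS = [(450, 32), (750, 32), (450, 64), (750, 64),
              (450, 128), (450, 256), (2200, 64), (750, 256)]

for panel, (width_us, rate_hz) in zip("ABCDEFGH", FIG4_UNITS):
    cycles = width_in_cycles(width_us, rate_hz)
    print(f"Fig. 4{panel}: {width_us} us at {rate_hz} Hz = {cycles:.3f} cycles")
```

For comparison, a width of 1/3 cycle at 32 Hz would correspond to more than 10 000 μs.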
These units had a variety of temporal response patterns. The units in Figure 4A,C had a nontuned onset response, which was followed by a highly tuned sustained response. The responses of the units in Figure 4B,F consisted of tuned onset responses. The units in Figure 4D,H had a sustained, tuned response. The response of the unit in Figure 4E consisted of intense bursts of spikes, an early one that was evoked by most left-leading ITDs and a somewhat later one that had a narrower tuning, closer to an ITD of 0. The unit in Figure 4G responded robustly to all ITDs, but had a particularly high response rate to a restricted range of ITDs close to L400 μs.
Figure 5A–D presents examples of responses with best ITDs outside the ethological range (best ITDs of L450, R900, R1350, and R1200 μs, respectively). However, these units had ITD dynamic ranges (see Materials and Methods) of 1050, 1650, 2550, and 1000 μs, respectively, extending into the ethological range. Thus, these responses carried useful information for discriminating ITDs within the ethological range as previously suggested (McAlpine et al. 2001; McAlpine and Grothe 2003), with the additional twist that their ITD dynamic range was smaller than expected given the center frequency of the envelopes, potentially leading to better spatial resolution at the single-unit level.
We classified the shapes of the delay functions by comparing them with a set of Gaussian templates with varying standard deviations and means. The template that had the maximal correlation with the delay function was selected for each carrier–envelope combination and was termed the modeled delay function (see Materials and Methods). The best ITD and width of the delay function were estimated by those of the modeled delay function. For each delay function, an MD (see Materials and Methods for definition) was calculated as well. In order to remain conservative, we used for this analysis only those delay functions that had a large enough MD (≥0.55, 169/255 tuned cases from 71/107 units).
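The template-matching step can be illustrated with a minimal sketch. It assumes a grid search over Gaussian means and standard deviations with the Pearson correlation as the matching criterion; the grid values, the sign convention (negative for left-ear leading, positive for right-ear leading), the synthetic data, and all function names are assumptions made for illustration. How the reported width is derived from the selected template is specified in Materials and Methods and is not reproduced here, so the sketch returns the template standard deviation only.

```python
import math


def gaussian(itds_us, mean_us, sd_us):
    """Gaussian template sampled at the tested ITDs."""
    return [math.exp(-0.5 * ((x - mean_us) / sd_us) ** 2) for x in itds_us]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat curve cannot be correlated with a template
    return cov / math.sqrt(var_a * var_b)


def fit_gaussian_template(itds_us, rates, means_us, sds_us):
    """Return (correlation, mean, sd) of the best-correlated template."""
    best = None
    for mean_us in means_us:
        for sd_us in sds_us:
            r = pearson(rates, gaussian(itds_us, mean_us, sd_us))
            if best is None or r > best[0]:
                best = (r, mean_us, sd_us)
    return best


# Synthetic delay function (hypothetical data): a peak at R300 us on top
# of a baseline firing rate.
itds = list(range(-1200, 1201, 150))
rates = [2.0 + 20.0 * g for g in gaussian(itds, 300, 200)]

r, best_itd, sd = fit_gaussian_template(
    itds, rates, means_us=itds, sds_us=[100, 200, 400, 800])
print(f"best ITD = {best_itd} us, template SD = {sd} us, r = {r:.3f}")
```

Because the Pearson correlation is insensitive to the baseline and gain of the response, the selected template reflects only the shape and position of the delay function.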
Figure 6 summarizes the properties of the delay functions as scatter plots of tuning parameters against the best ITD. Three different measures of tuning are displayed in Figure 6: tuning width in ms, tuning width in cycles, and ITD dynamic range (Fig. 6A,C,E, respectively). All of the carrier–envelope combinations are summed together (30, 40, 44, and 55 responses to envelopes with modulation of 32, 64, 128, and 256 Hz, respectively). The ethological range of the cat (approximately ±400 µs, Roth et al. 1980) is marked by the gray patch.
Best ITDs were widely distributed (Fig. 6A,B), with both left- and right-ear leading values, although right (contralateral) shifts were more prevalent, presumably because we recorded from left A1 (average best ITD of R385.8 ± 758 μs, mean ± SD). There was no difference between the mean best ITDs at different modulation rates (2-way ANOVA on best ITD × unit, F3,95 = 1.38, n.s., Fig. 6F).
The average tuning width, measured in ms, was 1480 ± 1172 μs (interquartile range of 450–1950 μs). The average tuning width in cycles was 0.2 ± 0.2 cycles (interquartile range of 0.056–0.27 cycles) and the average ITD dynamic range was 1028 ± 732 μs (interquartile range of 400–1500 μs).
There were clear differences in the distribution of the widths for best ITDs inside the ethological range and outside the ethological range in Figure 6. We therefore considered these groups separately (as summarized in Table 1). The first group, with best ITDs within the ethological range, is illustrated in Figure 4 (46%, 78/169 responses from 35/71 units, 12 well separated). Most of these cases (51/78, 65.4%) had tuning widths smaller than 1000 μs. There was no significant dependence of the width, measured in ms, on the modulation frequency (Fig. 6B; 2-way ANOVA of log(widths) on modulation rate × unit, F3,95 = 2.13, n.s.). Since the fastest modulation frequency tested here was 256 Hz, these units had very narrow delay functions when measured in cycles. Thus, 66/78 (85%) cases from 31/35 (89%) units had tuning widths narrower than 0.22 cycles (see Fitzpatrick et al. 1997 for justification of this value) and 26/78 (44%) cases had tuning widths narrower than 0.05 cycles. The remainder of the responses with best ITDs within the ethological range of the cat had widths >0.22 cycles (15%, 12/78 responses). The fact that the tuning width, measured in ms, was essentially independent of modulation rate implied that the tuning width, measured in cycles, showed a significant dependence on modulation rate. Indeed, a 2-way ANOVA of log(widths) on modulation rate × unit was highly significant (F3,95 = 68.6, P ≪ 0.01, Fig. 6D).
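The relation between the two ANOVA results can be made explicit. Writing w_t for the tuning width in units of time and f_m for the envelope modulation rate (symbols introduced here for illustration only), the width in cycles, w_c, is their product:

```latex
w_c = w_t \, f_m
\quad\Longrightarrow\quad
\log w_c = \log w_t + \log f_m
```

If log w_t does not vary systematically with f_m, then log w_c increases with log f_m with unit slope. Over the range of modulation rates tested here (32 to 256 Hz), this corresponds to a factor of 8 in the width expressed in cycles.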
|Auditory responsive||Sensitive to ITD||Sensitive to ITD and MD >0.55||bITD inside ethological range||bITD inside ethological range and widths <0.22 cycles||bITD inside ethological range and widths <0.05 cycles||Insensitive to ITD|
|Number of cases|
|Number of units|
ITD, interaural time differences; MD, modulation depth; bITD, best ITD.
The second major group of responses (54%, 91/169 from 48/71 units, illustrated in Fig. 5A–D) consisted of responses with best ITDs outside the ethological range. Most of these responses (70%, 63/91) had delay functions that sloped through the ethological range. They showed various ITD dynamic ranges, between 400 and 3200 μs, corresponding to different levels of steepness. About 30% of these responses (28/91) had delay functions that did not slope through the ethological range (the dashed gray lines in Fig. 6A indicate the delay functions with the middle of their ITD dynamic range within the ethological range). It is unclear what this minority of the units is coding.
We demonstrated ITD sensitivity in a large fraction (>45%) of high-frequency auditory cortex neurons in halothane-anesthetized cats. About 30% (31/107) of the tuned units had very sharp ITD sensitivity (with widths narrower than 0.22 cycles and best ITDs within the ethological range). These units corresponded to 13% (31/235 neurons) of the auditory neurons recorded within the high-frequency region of A1. The data presented here constitute, to the best of our knowledge, the first demonstration of ITD hyperacuity, a tuning width that is substantially narrower than that found in brainstem neurons.
The major finding of the current study is therefore the existence of a subset of neurons in cat A1 with remarkably narrow delay functions. In addition to their narrow width, these delay functions also lost 2 other characteristics of subcortical delay functions. First, their best ITD, as well as their tuning width when expressed in ms, were independent of the modulation rate of the envelope, and covered (as a population) the same range of best ITDs for modulation rates between 32 and 256 Hz (compared with McAlpine et al. 2001). Such loss of dependence of the responses in cortex on a physical characteristic of the stimulus was reported also by Fitzpatrick and Kuwada (2001) for pure-tone stimuli. Second, these delay functions had best ITDs within the ethological range of the cat and were often closed, rather than sloping through the ethological range as found in midbrain ITD tuning (Fig. 6). Thus, we suggest that tuning to ITD in A1 becomes more invariant to irrelevant features of the stimuli (rate of envelope fluctuations), and is presumably encoding the ethologically relevant quantity, which is ITD in units of time. Although there are no behavioral studies using transposed stimuli in cats, Bernstein and Trahiotis (2002) demonstrated human behavioral resolution of transposed stimuli that was independent of the envelope modulation rate for frequencies lower than 250 Hz, with a JND of about 100 μs even for an envelope modulation rate of 32 Hz. This result may correspond to our finding that best ITDs and the width of the delay functions are independent of envelope frequency (Fig. 6F).
Subcortical Processing of Pure Tone and Envelope ITD
Responses of neurons in the MSO approximately reflect the correlations between the signals from the two ears. In consequence, the width of their neuronal delay functions measured with pure tones is determined by tone frequency, and is about 1/2 cycle (Goldberg and Brown 1969; Yin and Chan 1990). The delay functions sharpen above the MSO, having widths of about 1/3 cycle in the IC (Stanford et al. 1992; Fitzpatrick et al. 1997, 2002; McAlpine and Grothe 2003) and about 1/5 cycle in the thalamus (Stanford et al. 1992; Fitzpatrick et al. 1997). The sharpening of the delay functions has been found only for tone frequencies higher than 200 Hz: in all of the studies mentioned above, ITD sensitivity to tones was greatly reduced or disappeared for frequencies below 100 Hz. These studies should nevertheless be carefully interpreted because many of them used binaural beats which emphasize on-going, dynamical ITDs, and because some of the responses may have been recorded in the tails of the tuning curves of high-frequency neurons.
Thus, when using pure tones, at least up to the IC, neurons have widely tuned delay functions, which depend on the modulation rate of the stimulus and slope monotonically through the ethological range (Goldberg and Brown 1969; Yin et al. 1986, 1987; Batra et al. 1989; Yin and Chan 1990; Fitzpatrick et al. 1997, 2002; Brand et al. 2002). A number of recent studies have shown that the information required to reach the behavioral resolution of ITD discrimination is available in the responses of single neurons at the level of the IC (Skottun et al. 2001; Shackleton et al. 2003) because of the steep slopes of their delay functions within the ethological range. In consequence, McAlpine and Grothe (2003) made a strong case for the idea that in the brainstem, azimuth is represented by the balance of activation of 2 widely tuned populations of neurons, each of which responds essentially to all ITDs.
ITD selectivity in the MSO and IC has been found for SAM tones down to 25 Hz (Batra et al. 1989; Fitzpatrick et al. 2002). Delay functions for SAM tones in these studies were broader than those for tones, and their widths were about 1/3 cycle even in thalamus and in auditory cortex.
Two important studies tested responses to transposed stimuli in the auditory nerve (Dreyer and Delgutte 2006) and the IC (Griffin et al. 2005). Electrophysiologically, synchronization of auditory nerve fibers to transposed stimuli was stronger than to SAM tones, and was comparable with that of pure tones at least at low enough sound levels (as used in this study, see Materials and Methods) and below 250 Hz (Dreyer and Delgutte 2006). Consistent with a cross-correlation operation in the brainstem, responses of high-frequency IC neurons to transposed stimuli had delay functions that were comparable with those measured from low-frequency units with pure tones, with best ITD and tuning widths that depended on the modulation rate of the envelope (Griffin et al. 2005). Griffin et al. (2005) measured delay functions for SAM tones as well, and found them to be even broader than for the transposed stimuli. Thus, for an envelope modulation of 30 Hz, the width of the delay function to transposed stimuli in the IC is expected to be around 10 000 μs (1/3 cycle), about 30 times wider than the expected tuning width of a neuron that would explicitly reflect the behavioral discrimination limits. In addition, these delay functions are substantially wider than the ethological range (McAlpine et al. 2001).
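The orders of magnitude in this comparison can be checked with a short calculation. The sketch below assumes a delay-function width of 1/3 cycle for the IC, the behavioral JND of about 300 μs quoted in the Introduction for a 30-Hz envelope, and a full ethological range of about 800 μs (±400 μs) for the cat; the function and variable names are illustrative.

```python
def expected_width_us(mod_rate_hz, width_cycles=1.0 / 3.0):
    """Delay-function width in us for a width given in envelope cycles."""
    period_us = 1e6 / mod_rate_hz  # envelope period in microseconds
    return width_cycles * period_us


ic_width_us = expected_width_us(30.0)  # 1/3 cycle of a 30-Hz envelope
behavioral_jnd_us = 300.0              # Bernstein and Trahiotis (2002)
ethological_range_us = 2 * 400.0       # approx. +/-400 us in the cat

print(f"expected IC width: {ic_width_us:.0f} us")
print(f"ratio to behavioral JND: {ic_width_us / behavioral_jnd_us:.0f}")
print(f"ratio to ethological range: {ic_width_us / ethological_range_us:.0f}")
```

Under these assumptions, the expected IC width exceeds both the behavioral threshold and the full ethological range by more than an order of magnitude.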
The responses to transposed stimuli that we demonstrate in auditory cortex are very different: they show substantially narrower tuning and independence from the period of the envelope. We conclude that these responses represent the result of additional processing mechanisms operating above the level of the IC.
Responses to Transposed Stimuli in Auditory Cortex
Only a few studies of ITD sensitivity have been conducted in auditory cortex (Brugge et al. 1969; Reale and Brugge 1990; Kelly and Phillips 1991; Fitzpatrick et al. 2000; Lohuis and Fuzessery 2000). Reale and Brugge (1990), recording in cat A1, indicated that ITD selectivity of neurons using low-frequency pure tones is remarkably similar to that of ITD sensitive neurons in the MSO and IC. They found a monotonically decreasing dependence of best ITD on the CF, at least up to 800 Hz, with many best ITDs outside the ethological range. However, the lowest frequencies they tested were around 100 Hz. Kelly and Phillips (1991), using clicks in anesthetized rats, found ITD dynamic ranges with a mean of 590 µs, although these relatively restricted ITD dynamic ranges likely resulted from the wideband nature of clicks. Fitzpatrick et al. (2000) obtained similar results with pure tones, finding a continuous distribution of best ITDs inside and outside the ethological range in the unanesthetized rabbit A1. In addition to low-frequency pure tones, they used SAM stimuli with high-frequency carriers and modulation rates as low as 25 Hz, which are more comparable with the stimuli used in the current study. The widths of the delay functions depended on the modulation frequency and were about 1/3 cycle.
Lohuis and Fuzessery (2000) studied ITD sensitivity in bat auditory cortex under barbiturate anesthesia using stimuli that compare best with the transposed stimuli used here. Their stimuli were trains of square-wave AM with an MD of 100% and short AM duty cycles of 10–30%. Their square-wave AM signals had modulations of 200–400 Hz on carrier tones at BF and they used static ITDs as we did. Their results are remarkably consistent with ours. Their neurons did not phase lock to the AM sounds. They also found very sharp ITD tuning, with a mean ITD dynamic range of 175 μs (range between 80 and 370 μs). They do not report the values of their best ITDs, but the examples they provide suggest that they found a similar proportion of responses with best ITDs near zero (their step-peaked and peaked responses).
The rates of envelope modulations of the transposed stimuli used in the current study are comparable with the ones used in previous SAM studies (Batra et al. 1989; Fitzpatrick et al. 2000, 2002) as well as in a previous study using transposed stimuli in IC (Griffin et al. 2005). In contrast to these studies, we observed tuning widths of less than a millisecond even for stimuli with envelope modulation rates as slow as 32 Hz. Such very sharp tuning is also implied in Lohuis and Fuzessery (2000), although they mostly used higher modulation rates. Thus, cortical tuning to ITD of transposed stimuli may be substantially sharper than in subcortical stations.
While our results are consistent with those of Lohuis and Fuzessery (2000), extending them to a different species and stimulus conditions, they are at variance with those of Fitzpatrick et al. (2000), who studied cortical responses to SAM tones but failed to observe tuning as narrow as that reported here. Their experiment was different in crucial details from ours, possibly accounting for these differences. Most importantly, Fitzpatrick et al. (2000) used pure tones or SAM tones, while we used transposed stimuli. The perceptual acuity of ITD detection is better with transposed stimuli than with either pure tones or SAM tones, as long as the modulation frequencies are low enough (below 100 Hz) (Bernstein and Trahiotis 2003). We speculate that the wider spectral bandwidth of the transposed stimuli (even with a tone carrier) and the high rates of change of the temporal envelopes at each cycle of the amplitude envelope (see Fig. 1) may supply more precise temporal information to the auditory system than does a pure tone or SAM tone. This information may be used by the auditory system to generate the high degree of selectivity to ITD that we observed. An additional relevant difference between their study and ours is their use of binaural beat stimuli to measure ITD sensitivity. Such stimuli emphasize on-going responses and essentially ignore possible pure ITD sensitivity of the onset responses, which was clearly present in some of our units (e.g., Fig. 4B). Finally, there may be a species difference between cats (used by us) and rabbits (used by Fitzpatrick et al. 2000). Psychophysically, ITD discrimination in rabbits (JNDs for noise stimuli of 500–1500 Hz width are 50–60 μs, Ebert et al. 2008) is less accurate than in cats (30 μs, Wakeford and Robinson 1974; Roth et al. 1980), although the two species have similar head sizes and thus similar ethological ranges.
Effects of Anesthesia
The finding of narrow ITD tuning could be due to anesthesia, which sometimes causes potentiation of inhibitory mechanisms. Thus, the narrow ITD tuning could reflect an iceberg effect involving nonlinear thresholding of upstream responses in subcortical centers. For example, Gaese and Ostwald (2001) argued for such an effect on frequency tuning curves under barbiturate anesthesia, which they found to be narrower than in the awake state. However, under halothane anesthesia, cortical units tend to have wide frequency tuning (Moshitch et al. 2006). This wide bandwidth may enable across-frequency integration that is presumably expressed in the narrow tuning to ITD that we found.
More generally, we have previously shown that under halothane, responses show features that are associated with recordings in awake animals (see Moshitch et al. 2006, for an extensive discussion). In particular, the presence of sustained responses has often been cited as a major difference between anesthetized and awake recordings (Evans and Whitfield 1964; Zurita et al. 1994; Wang et al. 2005). In the data reported here, the majority of the units had responses beyond stimulus onset and the response rates were comparable with the ones reported for awake cats (Mickey and Middlebrooks 2003). As in Wang et al. (2005), some units tended to have sustained responses at their best stimuli and more transient responses for less optimal stimuli (e.g., Fig. 4A,C,G). The weak phase locking shown in our data could be due to anesthesia, but weak phase locking is often found also in the auditory cortex of awake animals (Miller and Schreiner 2000; Lu et al. 2001; Coffey et al. 2006). Thus, halothane anesthesia as used in this study is unlikely to be the origin of the narrowly tuned delay functions described here.
Explicit Representation of Sensory Variables in Auditory Cortex
Hyperacuity refers to behavioral resolution of a physical parameter that is better than the resolution of the initial neural representation of that parameter (Westheimer 1981). While the best known example of hyperacuity is Vernier acuity in vision, hyperacuity is actually widespread in the auditory modality. For example, both frequency and ITD show hyperacuity (with respect to the frequency tuning of hair cells and delay functions of MSO neurons, respectively), and in both cases, behavioral resolution can be 1–2 orders of magnitude better than the initial neural tuning. However, neural representations that reflect explicitly the behavioral resolution are rare in the mammalian auditory system.
Our observations suggest that between IC and cortex, the coding of ITD of the transposed stimuli undergoes a substantial change (suggested also by Coffey et al. 2006, studying the representation of interaural correlations). On the one hand, a representation by widely tuned delay functions which slope through the ethological range, as found in the brainstem, is maintained in auditory cortex (Fig. 5A–D). On the other hand, a subpopulation of neurons represents ITD with narrowly tuned delay functions. Thus, a subset of high-frequency A1 neurons represents ITD of transposed stimuli explicitly with individual neurons representing narrow ITD ranges, instead of the implicit representation by opponent populations as in the brainstem. We suggest that this transformation into an explicit representation is based on integration of the narrowband information represented by most IC neurons across the whole bandwidth of the transposed stimuli. Indeed, cortical neurons in the cat typically have wider frequency tuning (Read et al. 2001; Moshitch et al. 2006) than their thalamic inputs (Miller and Schreiner 2000) or typical IC neurons (Casseday et al. 2002; Griffin et al. 2005). The deep modulations and temporal gaps that characterize the transposed stimulus occur often in nature (Nelken et al. 1999; Bar-Yosef et al. 2002), so that the high-resolution representation specific to transposed stimuli that we describe here may represent a specialization for the processing of natural sounds.
Thus, a dramatic change in the coding of an important sensory parameter, ITD, occurs between subcortical stations (such as the IC) and A1. The subcortical coding of ITD of transposed stimuli at the single neuron level is implicit: delay functions are wide, and many units are likely to be activated by ITDs spanning the whole ethological range. In contrast, by presumably decoding the widely tuned brainstem representations, a subpopulation of high-frequency A1 neurons codes the ITD of transposed stimuli using narrow delay functions whose width may approach human behavioral performance. Producing such explicit representations of ethologically relevant quantities may be part of the specific contribution of primary auditory cortex to hearing (Bitterman et al. 2008).
This study was supported by grants from the German-Israeli Foundation, the Volkswagen Foundation, the Israeli Science Foundation, and the Gatsby Charitable Foundation.
We thank Liora Las, Ayelet Hashachar Shapira, and Nevo Ta’aseh for help in data collection. Conflict of Interest: None declared.