Abstract

Harmonic complex tones produce pitch-height perception corresponding to the fundamental frequency (F0). This study investigates how the spectral cue of F0 is processed in neurons of the primary auditory cortex (A1) with sustained-response properties. We found F0-sensitive and -insensitive cells: the former discriminated between harmonics and noise, while the latter did not. F0-sensitive cells preferred F0s corresponding to the best frequency (BF) and 0.5 × BF. The F0-sensitivity to F0 = 0.5 × BF was preserved for missing F0, but abolished by eliminating both F0 and the second harmonic. The inhibitory subfield of the frequency-receptive field was restricted to the spectral region between the preferred harmonics in F0-sensitive cells, while it was frequency unspecific in F0-insensitive cells. We conclude that (i) A1 is well organized for discrimination between harmonics and noise; (ii) pitch-height is represented along with the tonotopic axis; (iii) all aspects of the sustained neural responses to harmonic and noise stimuli are consequences of spectral filtering; and (iv) although the observed cell behavior explains some psychophysical pitch perception behaviors, such as pitch-chroma (helical pitch perception with frequency elevation), pitch-level tolerance and adaptive behavior, F0-encoding in A1 remains at the incomplete perceptual level (dominance of the third to fifth harmonics for pitch strength is unexplainable by the cell behavior).

Introduction

A pure tone produces the perception of a single pitch-height that corresponds simply to the frequency of the tone (spectral pitch). A harmonic complex tone, composed of integer multiples of the fundamental frequency (F0), also produces perception of a single pitch-height corresponding to the F0 rather than multiple pitch-heights corresponding to the individual component frequencies. When the F0 is missing, we still perceive a pitch-height corresponding to the missing F0, indicating that subjective pitch-height corresponds to a physical frequency that is actually absent (virtual pitch) (Terhardt, 1974). The biological importance of virtual pitch perception is suggested by its presence in multiple animals, including birds (Cynx and Shapiro, 1986), cats (Heffner and Whitfield, 1976) and monkeys (Tomlinson and Schwarz, 1988). Psychophysical experiments have demonstrated that two acoustic cues are involved in pitch-height perception: spectral and temporal cues. The former is a spectral relationship between resolved lower-frequency harmonics (Ritsma, 1967; Terhardt, 1979; Cohen et al., 1995; Renken et al., 2004), while the latter is the temporal periodicity of the sound-wave ‘envelope’ (Bernstein and Oxenham, 2003; Renken et al., 2004), which explains pitch-height perception from the unresolved higher-frequency harmonics and amplitude-modulated noise (Schouten, 1962; Plomp, 1967; Houtsma et al., 1980; Kaernbach and Demany, 1998; Renken et al., 2004). Pitch-height perception is more salient with resolved spectral stimuli than with temporal-periodicity stimuli such as amplitude-modulated wide-band noise (Burns and Viemeister, 1976; Fastl and Stoll, 1979; Houtsma, 1984), suggesting that the spectral cue contributes more to pitch-height formation than the temporal cue.

Pitch-height perception is an area of broad interest, spanning not just psychophysics, but also music and language (Patel, 2003). The neural mechanisms of pitch-height perception are also of interest to neurophysiologists because they represent a complex perceptual construct resulting from how the brain processes sensory information across disparate perceptual cues (temporal and spectral cues). Lesion studies in cats (Whitfield, 1980) and humans (Zatorre, 1988) show that the auditory cortex is crucial for virtual pitch perception of the complex tone. Therefore, the question remains how virtual pitch is represented in the auditory cortex.

Using magnetoencephalography (MEG), several studies in humans have examined the topographical representation of the virtual pitch of harmonic complex tones in relation to the well-known tonotopic map of pure tones in the primary auditory cortex (A1). Two conflicting findings have been reported. Pantev et al. (1989, 1996) showed that the spectral pitch of a pure tone and the virtual pitch of the missing F0 had the same evoked magnetic field location, suggesting that the tonotopic map of A1 reflects the perceived pitch of complex sounds rather than their spectral content. In contrast, Langner et al. (1997) demonstrated an orthogonal arrangement of tonotopic and periodotopic (virtual pitch) gradients.

Fishman et al. (1998, 2000) have noted the technical limitations of non-invasive MEG studies: they suggested that the supposition that magnetic response is generated in A1 with no contribution from adjacent auditory areas is problematic, and consequently, the assumption of a single dipole generator within the superior temporal gyrus may not be justified. Fishman et al. (1998, 2000) have performed experiments in awake monkeys with the use of auditory-evoked potential, multiple-unit activity and current source–density techniques. Laminar response profiles in A1 reflected the spectral content rather than the virtual pitch of compound stimuli (Fishman et al., 1998), and hierarchically organized cortical areas other than A1 were suggested to generate virtual pitch (Fishman et al., 2000).

With the use of amplitude-modulated tones composed of three harmonic components, the virtual-pitch mechanism in A1 was investigated in the single unit and in optical recording studies in gerbils by Schulze and Langner (1997) and Schulze et al. (2002). They investigated the topographical distribution of the best modulation frequency of amplitude-modulated tones, and found that periodic pitch derived from sound envelope periodicity is independently organized from the spectral organization of the tonotopic map in A1. Although Schulze and Langner (1997) provided a compelling explanation for how complex tones with similar spectral locations can have different pitches depending on their F0, the differential organization of periodicity and spectral sensitivity does not explain the key feature of pitch perception — that complex tones and pure tones with the same F0 have the same pitch, as noted by Fishman et al. (2000).

Using a harmonic complex tone composed of eight harmonics, Schwarz and Tomlinson (1990) also performed unitary experiments in alert monkeys, searching without success for A1 neurons that respond to the missing F0. This indicates that the neural responses in A1 were characterized by the spectral location of the harmonics relative to the neuron's frequency receptive field (FRF) but not to the missing F0. Their finding (Schwarz and Tomlinson, 1990) in single-unit studies is in good accordance with the findings in multi-unit studies by Fishman et al. (1998, 2000). However, the neural organization in A1 for decoding pitch-height from the harmonic complex tone remains unresolved.

The purpose of this study was to address how the spectral cue for pitch-height perception is processed in A1, with special attention to functional relevance such as the detection of harmonics, the discrimination between the spectral pitch-height of pure tone and the virtual pitch-height of harmonics, and the identification of pitch-height. For this purpose, we adopted harmonic complex tone stimuli with a wide range of spectral cues but fewer dominant temporal cues. We consider that an A1 cell is sensitive to the F0 of harmonics when the A1 cell has narrow tuning to a specific F0 of harmonic complex tones, excluding the sensitivity to co-varying parameters associated with F0 shift, such as low-edge frequency (LEF), bandwidth, and overall intensity. This study reports a group of F0-sensitive cells in A1 that respond to the missing F0 of the harmonics. The results are discussed in terms of a physiological substrate of the perception of pitch-height of harmonic stimuli.

Materials and Methods

Animal Preparation, Recording and Histology

Experiments were performed in a manner consistent with the Guidelines for Animal Experiments, University of Yamanashi, and the Guiding Principles for the Care and Use of Animals approved by the Council of the Physiological Society of Japan. Animal preparation, recording and histology procedures were as in previous reports (Chimoto et al., 2002; Qin et al., 2004a). Five cats were chronically prepared for single-cell recordings from the auditory cortex. Under pentobarbital sodium anesthesia (initial dose 40 mg/kg) and aseptic conditions, an aluminum cylinder (inner diameter 12 mm) was implanted bilaterally in the temporal bone for microelectrode access, at an angle of 10–20° from the sagittal plane. A metal block was embedded in the dental acrylic cap to immobilize the head. After more than 1 week of postoperative recovery, the cat's body was gently wrapped in a cloth bag and the head was restrained with holding bars for a short period. In successive daily sessions, the period was lengthened, and the cats were familiarized with sitting in an electrically shielded, sound-attenuated chamber. The animals were given food and drink during the sessions and, after each session, they were returned to their home cages. The conditioning procedure lasted at least 2 weeks. By the time the recording experiments began, the cats sat with no sign of discomfort or restlessness. One day before the recording session began, the bone (diameter 1–2 mm) at the bottom of the cylinder was removed under ketamine anesthesia (initial dose 15 mg/kg), leaving the dura intact.

The recording session began the following day. The dura was pierced with a sharpened probe, and a glass microelectrode (tip diameter, 1.8–2.5 μm; resistance, 2–3 MΩ; filled with 2 M NaCl) was advanced into A1 with a remote-controlled micromanipulator (Narishige, MO-951). Tone bursts with variable single frequency (SF) and sound pressure level (SPL) were presented as search stimuli. The extracellular single-spike activity was discriminated using a window discriminator. The spike-occurrence outputs from the window discriminator were captured as digital input, with a time resolution of 2 μs, through a digital-to-digital interface (Cambridge Electronic Design Limited 1401) on a Pentium-based data-input computer, and were stored on hard disk for later analysis.

The cat's face, particularly the eyes, was continuously observed on a monitor connected to a charge-coupled-device camera in front of the cat. In our preliminary experiments, we studied the relationship between the status of the eyes and electroencephalography (EEG). Slow waves in EEG were observed when the eyes were closed, and when the eyes were open but drifting, which were judged as a sleep state. Saccadic eye movements and eye fixation were judged as signs of an awake state. Rapid eye movements of paradoxical sleep, in which slow EEG waves were absent, were easily identified by their characteristic appearance of half-opened eyelids, and were judged as signs of sleep. When drowsiness was suspected, the cat was alerted by gently tapping the body with a remote-controlled tapping tool or by briefly opening the door.

The cats sometimes moved during recording sessions, producing artifacts in the recording. By carefully checking the monitors and the spike train, recordings with artifacts were marked in the recording computer in real time while recordings were in progress. Data with artifacts could therefore be rejected.

Daily recording sessions lasted 3–5 h for 2–6 months in each animal. At the end of each daily recording session, the recording chamber was rinsed with sterile saline and antibiotic fluid, and sealed with Exafine (GC Corporation) and an aluminum cap. The animal was returned to its cage. The animals remained healthy throughout the experimental period. At the termination of the experiment, some of the recording sites were marked with electrolytic lesions (25 μA, 10 s). The animal was deeply anesthetized with sodium pentobarbital and perfused with 10% formalin before the brain was removed. A digital camera was used to photograph the brain surface. The cerebral cortex was cut in transverse sections and stained with neutral red. The section was captured as a digital picture using a digital scanner. Based on the lesion locations and electrode tracks, the recording sites were reconstructed on the picture of the section. The position of the section was projected onto the picture of the brain surface.

Sound Generation and Delivery

Sound generation and delivery were as in a previous report (Qin et al., 2004a). In brief, sound signals were generated using user-written programs under a MATLAB (Mathworks) environment on a Pentium-based computer. Amplitude spectra were constructed, and transformed into time-domain signals by inverse Discrete Fourier Transform. Pseudo-random phase spectra were used to make a flat temporal envelope. The signals were fed through a 12-bit digital-to-analogue converter (National Instruments PCI-MIO-16E-4) at a sampling rate of 100 kHz to an eight-pole Chebyshev filter (NF Electric Instruments, P-86) with a high cut-off frequency of 20 kHz. The output was attenuated and sent to a low-output-impedance power amplifier (Denon, PMA2000III), and tones were then presented from a speaker (AKG, K1000) placed 2 cm away from the auricle contralateral to the recording site. We frequently equalized and calibrated the sound delivery system between 128 and 16,000 Hz in 8 Hz steps, and the output varied by ±1.5 dB. Harmonic distortion was less than −60 dB. Stimuli were 500 ms in duration with a rise/fall time of 5 ms, and were presented at an interstimulus interval of 1640 ms. The intensity of each frequency component was set to 20–80 dB SPL in 10 dB steps.

Pure and Two-tone Stimuli and FRF Analysis

The procedures for evaluating FRF are as reported previously (Qin et al., 2004a). In brief, once a single neuron was found, we presented pure-tone stimuli with variable SF (125 steps) at a given SPL (Fig. 1A). Spike activities were analyzed using user-written programs under a MATLAB (Mathworks) environment. Driven rates (the firing rate during a stimulus period of 0.5 s minus the background firing rate during a pre-stimulus period of 0.5 s) were calculated for each SF to construct the isointensity SF-response function (Fig. 1G), which was smoothed by weighted averaging over each frequency and its four neighbors. The weights of the two lower neighbors, the center frequency and the two upper neighbors were in the ratio 1:2:3:2:1. By comparing the maximum heights of SF-response functions obtained at different stimulus intensities (20–80 dB SPL in 10 dB steps), we defined the ‘best SPL’ as the sound intensity producing the maximum driven rate and the best frequency (BF) as the SF producing the maximum driven rate. We defined the excitatory subfield of FRF at the best SPL as the portion where the SF-response function was positive. We considered the excitatory subfield as significant if the SF-response height was higher than 2 SD (interrupted line in Fig. 1G) of the background firing rates. The excitatory magnitude of FRF was defined as the height of the SF-response function above the baseline.
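
The driven-rate and smoothing steps described above can be sketched in a few lines. This is only an illustrative sketch in Python, not the authors' MATLAB code: the function names are hypothetical, and the treatment of the first and last two frequencies (renormalizing the weights that fall inside the tested range) is an assumption, because the text does not say how the edges were handled.

```python
WEIGHTS = (1, 2, 3, 2, 1)  # two lower neighbors, center, two upper neighbors


def driven_rate(n_stim_spikes, n_pre_spikes, window_s=0.5):
    """Firing rate during the stimulus minus the pre-stimulus background rate."""
    return n_stim_spikes / window_s - n_pre_spikes / window_s


def smooth(rates, weights=WEIGHTS):
    """Weighted moving average over each point and its four neighbors.

    Assumption: near the edges only the weights that fall inside the
    tested range are used, and they are renormalized.
    """
    half = len(weights) // 2
    smoothed = []
    for i in range(len(rates)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(rates):
                num += w * rates[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def best_frequency(freqs_hz, rates):
    """Return (BF, peak height) of the smoothed SF-response function."""
    smoothed = smooth(rates)
    i = max(range(len(smoothed)), key=smoothed.__getitem__)
    return freqs_hz[i], smoothed[i]
```

In practice `best_frequency` would be applied to the SF-response function obtained at each stimulus intensity, and the intensity giving the largest peak would be taken as the best SPL.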

Figure 1.

Temporal and spectral-response properties of a representative F0-sensitive cell in response to 3 different stimulus paradigms. (A–C) Stimulus paradigms. For pure-tone (A), harmonic complex tone (B) and two-tone (C) stimuli, SF, F0 and S2F, respectively, were systematically shifted by 125 steps. The spectral profiles of 5 out of the 125 stimulus frequencies are shown. (D–F) Raster diagrams were constructed by plotting spike trains for stimulus parameters shifted systematically: SF for pure-tone stimuli (D), F0 for harmonic complex tone stimuli (E) and S2F for two-tone stimuli (F). Thick horizontal bar shows the stimulus period. (G–I) Response functions were constructed by plotting the driven rate against SF (G), F0 (H) and S2F (I), respectively. Vertical dotted lines in G, H and I show the frequencies corresponding to BF and 0.5 × BF (one octave below BF). Horizontal interrupted lines in G and H show 2 SD of the background discharge level. The thin solid horizontal line in H shows the half height of the mean (10 repetitions) driven rate to BF tone alone. Thick solid and dotted horizontal lines in I show the mean and −2 SD of the BF driven rate, respectively. Note that (i) the cell shows sustained firing during stimuli (black bars in D, E and F) of 0.5 s; (ii) there are two F0-tuning peaks (H) higher than the half BF driven rate (F0 = 0.5 and 1.0 × BF); and (iii) the S2F-response function (I) shows two inhibitory subfields restricted to two spectral regions (0.5–1.0 and 1.0–1.5 × BF) between harmonics whose F0 is 0.5 × BF.

For two-tone stimuli, two frequency components were presented simultaneously at the best SPL: the first-tone frequency was fixed at the cell's BF, and the second-tone frequency (S2F) was varied systematically in 125 steps (Fig. 1C). The driven rate was calculated for each S2F to construct the S2F-response function (Fig. 1I). We defined the inhibitory subfield of FRF as the portion where the S2F-response function was lower than the mean (10 repetitions) driven rate to BF only (solid line in Fig. 1I). We considered the inhibitory subfield significant if the S2F-response function was lower than the mean −2 SD (dotted horizontal line in Fig. 1I) to BF only. The inhibitory magnitude of FRF was defined as the deviation of the S2F-response function from the mean driven rate to BF tone alone, when the S2F-response function was lower than the mean driven rate to BF.
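
As a minimal sketch of these definitions (hypothetical function and key names, Python rather than the MATLAB used in the study), each point of the S2F-response function can be compared with the mean and the mean − 2 SD of the repeated BF-alone driven rates. The use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def inhibitory_subfield(s2f_rates, bf_alone_rates):
    """Compare each S2F driven rate with the BF-alone driven rate.

    s2f_rates:      driven rates of the S2F-response function
    bf_alone_rates: driven rates from repeated BF-only presentations
                    (10 repetitions in the study)
    """
    m = mean(bf_alone_rates)
    sd = stdev(bf_alone_rates)  # sample SD (assumption)
    points = []
    for rate in s2f_rates:
        points.append({
            "inhibitory": rate < m,            # below the BF-alone mean
            "significant": rate < m - 2 * sd,  # below mean - 2 SD
            "magnitude": max(0.0, m - rate),   # zero when not inhibitory
        })
    return points
```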

To evaluate the summation strength of excitatory and inhibitory subfields of FRF, respectively, corresponding to the harmonics of a given F0, we calculated the summation of excitatory and inhibitory magnitudes of FRF. The excitatory summation (ES) was estimated by sampling the excitatory magnitudes of FRF in a fixed frequency step, then summing the sampled magnitudes. The inhibitory summation (IS) was estimated by applying the same procedure to the inhibitory magnitudes. For each cell, ES and IS were calculated in nine different sampling steps (0.25 × BF, 0.5 × BF, 0.75 × BF,…, 2.25 × BF). In each sampling step, IS was subtracted from ES to give the net summation (NS), which was used to reflect the difference between ES and IS. A positive NS indicated that ES was dominant, while a negative NS indicated a dominant IS. For each cell, we plotted ES, IS and NS against the sampling frequency to predict the change of excitation and inhibition when FRF was sampled in different sampling steps.
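
The sampling-and-summing procedure might look like the sketch below. Several details are assumptions that the text does not specify: the FRF is sampled at integer multiples of the step (mimicking the harmonics of an F0 equal to the step), sampling stops at the highest frequency tested, and each sample takes the nearest measured point. All names are hypothetical.

```python
STEPS = [0.25 * k for k in range(1, 10)]  # 0.25, 0.5, ..., 2.25 x BF


def sample_sum(freqs_norm, magnitudes, step):
    """Sum FRF magnitudes at integer multiples of `step`.

    freqs_norm: tested frequencies in units of BF, in ascending order
    magnitudes: excitatory or inhibitory magnitude at each frequency
    """
    f_max = freqs_norm[-1]
    total = 0.0
    k = 1
    while k * step <= f_max + 1e-9:
        target = k * step
        # Nearest measured frequency to the sampling position (assumption).
        i = min(range(len(freqs_norm)), key=lambda j: abs(freqs_norm[j] - target))
        total += magnitudes[i]
        k += 1
    return total


def net_summation(freqs_norm, excit, inhib, steps=STEPS):
    """Return (step, ES, IS, NS) for each sampling step, with NS = ES - IS."""
    rows = []
    for step in steps:
        es = sample_sum(freqs_norm, excit, step)
        is_ = sample_sum(freqs_norm, inhib, step)
        rows.append((step, es, is_, es - is_))
    return rows
```

With a toy FRF that is excitatory at 0.5, 1.0 and 1.5 × BF and inhibitory at 0.75 and 1.25 × BF, a step of 0.5 × BF lands only on excitatory regions (positive NS), whereas a step of 0.75 × BF lands mainly on an inhibitory region (negative NS).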

To evaluate the excitation–inhibition balance of FRF quantitatively, we defined the balance index (BI) as

\[\mathrm{BI} = \frac{\mathrm{ES} - \mathrm{IS}}{\mathrm{ES} + \mathrm{IS}}\]
where the IS and ES used were obtained in the smallest sampling step (0.25 × BF) to measure the total strength of excitation and inhibition in FRF and compare it across cells. A positive value of BI indicated the dominance of excitation (BI = 1, excitation only); zero indicated the balance of excitation and inhibition; while a negative value indicated the dominance of inhibition (BI = −1, inhibition only).
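
A short numerical sketch of the index follows (hypothetical function name). The guard for a cell with neither excitation nor inhibition is an assumption, since the text does not address that case.

```python
def balance_index(es, is_):
    """BI = (ES - IS) / (ES + IS), with ES and IS taken at the 0.25 x BF step."""
    if es + is_ == 0:
        # Undefined when the FRF has no excitation and no inhibition (assumption).
        raise ValueError("BI is undefined when ES + IS is zero")
    return (es - is_) / (es + is_)
```

For example, ES = 8 and IS = 6 give BI = 2/14 ≈ 0.14 (weak dominance of excitation), while ES = IS gives BI = 0.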

Harmonic Complex Tone and Control Stimuli

Figure 2A shows the spectral profile of a harmonic complex tone, composed of integer multiples of F0 in the spectral range of 128–16000 Hz with equal spectral amplitude at the best SPL. The amplitude spectrum of harmonic complex tone stimuli was transformed into a time-domain signal of 500 ms in duration by inverse Discrete Fourier Transform using a pseudo-random phase spectrum, resulting in less dominant temporal periodicity of the sound-wave envelope corresponding to the period of F0 (Fig. 2B,C). Thus, in this study, harmonic complex tone stimuli have a wide range (128–16000 Hz) of spectral cues but less dominant temporal cues.
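
To illustrate how such a stimulus can be built, the sketch below sums equal-amplitude cosines with pseudo-random starting phases. This direct summation is a swapped-in technique that stands in for the inverse Discrete Fourier Transform of a sparse amplitude spectrum used in the study. It is written in Python rather than MATLAB; the function names, the linear shape of the onset and offset ramps and the fixed seed are assumptions.

```python
import math
import random


def harmonic_frequencies(f0_hz, f_lo=128, f_hi=16000):
    """Integer multiples of F0 that fall within the 128-16000 Hz range."""
    n_max = int(f_hi // f0_hz)
    return [n * f0_hz for n in range(1, n_max + 1) if n * f0_hz >= f_lo]


def synthesize(freqs_hz, dur_s=0.5, fs_hz=100_000, ramp_s=0.005, seed=0):
    """Equal-amplitude components with pseudo-random phases."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs_hz]
    n = int(round(dur_s * fs_hz))
    n_ramp = int(round(ramp_s * fs_hz))
    signal = []
    for i in range(n):
        t = i / fs_hz
        s = sum(math.cos(2.0 * math.pi * f * t + p)
                for f, p in zip(freqs_hz, phases))
        # Linear rise/fall ramps; the ramp shape is an assumption.
        gain = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp) if n_ramp else 1.0
        signal.append(gain * s)
    return signal
```

For F0 = 256 Hz, `harmonic_frequencies(256)` returns the 62 components of Figure 2A. Because the phases are randomized rather than aligned, the summed waveform has no sharp peak once per F0 period, which is why the envelope periodicity is weak.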

Figure 2.

Spectral and temporal profiles of a harmonic complex tone stimulus. (A) Amplitude spectrum of a harmonic stimulus with 256 Hz F0 containing 62 component frequencies in the spectral range of 128–16000 Hz. For clarification of the harmonic interval, only the 10 lowest and highest frequencies are shown. (B, C) The time signal of harmonic stimulus: onset 50 ms of total 500 ms in duration with a rise time of 5 ms (B) and time-expanded version (C). Arrows show a time interval of 3.9 ms corresponding to one period of 256 Hz. Note that the amplitude modulation in the stimulus temporal envelope is less dominant.

In the harmonic complex tone paradigm, stimuli with 125 different F0s were presented in random order. The spectral profiles of five of the 125 harmonic complex tones are presented in Figure 1B. F0 was shifted systematically in the spectral ranges of 128–1120 Hz (step, 8 Hz), 128–2112 Hz (16 Hz) or 128–4096 Hz (32 Hz). One of the three F0 ranges was adopted for a given cell to include the frequencies at least one octave above and below the cell BF. For example, in cells with a BF of 1.0 kHz, an F0 range of 128–2112 Hz was adopted. Thus, in this study, the number of harmonics varied systematically with F0. For example, when the F0 was 128 Hz, the number of harmonics was 125 (component frequencies of 128, 256, 384, 512,…, 16 000 Hz). When the F0 was 4096 Hz, the number of harmonics was three (component frequencies of 4096, 8192 and 12 288 Hz).
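
The three F0 grids can be enumerated directly, and each contains exactly 125 values. The sketch below also includes a hypothetical selection rule (`choose_range`) that picks the narrowest grid spanning one octave below and above BF. That rule and the fallback to the widest grid are assumptions: the text gives only the criterion and the 1.0 kHz example.

```python
F0_RANGES_HZ = [(128, 1120, 8), (128, 2112, 16), (128, 4096, 32)]  # (low, high, step)


def f0_grid(lo, hi, step):
    """All F0 values tested within one range."""
    return list(range(lo, hi + 1, step))


def choose_range(bf_hz):
    """Narrowest range covering BF/2 to 2 x BF (assumed selection rule)."""
    for lo, hi, step in F0_RANGES_HZ:
        if lo <= bf_hz / 2 and 2 * bf_hz <= hi:
            return lo, hi, step
    # No range spans a full octave on both sides: fall back to the widest
    # range (assumption; the text does not cover this case).
    return F0_RANGES_HZ[-1]
```

For a BF of 1.0 kHz this rule returns the 128–2112 Hz range, in agreement with the example in the text.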

A raster diagram of spike trains in response to 125 different F0s was constructed (Fig. 1E). On the ordinate of the raster plot (Fig. 1E), each harmonic tone contained F0 specified by a number (e.g. 1.0 kHz for a harmonic tone of F0 = 1.0 kHz) and, by definition, its multiple integers not indicated on the ordinate. One line of the raster plot (Fig. 1E) showed the response timing of cell spikes in response to one harmonic complex tone at a given F0. The driven rates were plotted against F0, constructing the F0-response function (Fig. 1H).

To generate high-pass noise stimuli, we constructed amplitude spectra composed of a large number of frequencies (range, 128–16 000 Hz; interval between components, 16 or 32 Hz, much narrower than the critical bandwidth for the lowest component frequency of 128 Hz) with equal spectral amplitude at the best SPL. LEF varied systematically (Fig. 5F), constructing the LEF-response function (Fig. 5H).
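
A sketch of how the component frequencies of one high-pass noise stimulus could be listed for a given LEF is shown below. The function name is hypothetical, and anchoring the component grid at 128 Hz (so that the lowest component is the first grid point at or above the LEF) is an assumption.

```python
def highpass_noise_components(lef_hz, spacing_hz=16, f_lo=128, f_hi=16000):
    """Densely spaced, equal-amplitude components from the LEF up to 16 kHz."""
    start = max(f_lo, lef_hz)
    # Round up to the next point of a grid anchored at f_lo (assumption).
    n_steps = -(-(start - f_lo) // spacing_hz)  # ceiling division
    first = f_lo + n_steps * spacing_hz
    return list(range(first, f_hi + 1, spacing_hz))
```

With an LEF of 128 Hz and 16 Hz spacing this yields 993 components, compared with 125 components for a harmonic complex tone of F0 = 128 Hz, while the low-edge frequency and bandwidth of the two stimuli remain matched.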

The missing F0 stimulus was constructed by systematically eliminating the lower harmonics, while preserving the higher harmonics (Fig. 7A). The response height was analyzed in relation to the lowest harmonic number of the stimulus (Fig. 7B,C).
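
The construction can be sketched as follows (hypothetical function name; the values in the usage note are illustrative rather than taken from the study). The point of the example is that the spacing between the remaining components, and hence their greatest common divisor, still equals F0 after the lower harmonics are removed.

```python
import math
from functools import reduce


def missing_f0_components(f0_hz, lowest_harmonic, f_hi=16000):
    """Harmonics of F0 from `lowest_harmonic` upward, up to f_hi."""
    n_max = f_hi // f0_hz
    return [n * f0_hz for n in range(lowest_harmonic, n_max + 1)]


def implied_f0(components_hz):
    """Greatest common divisor of the component frequencies."""
    return reduce(math.gcd, components_hz)
```

For instance, `missing_f0_components(500, 3, f_hi=4000)` gives 1500, 2000, …, 4000 Hz, for which `implied_f0` is still 500 Hz although no energy is present at 500 or 1000 Hz.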

Finally, we constructed two-component harmonic stimuli composed of F0 and the second harmonic (Fig. 7D). Similar to the full-component harmonics, we shifted the F0 of the two-component harmonics systematically (Fig. 7D), constructing F0-response functions (Fig. 7F).

Results

The results reported here are based on 102 single cells recorded from five cats. The recording site was determined with reference to the electric lesion and the electrode track in the brain section, and was projected onto a picture of the brain surface. Histological reconstruction of the recording sites showed that the cells were sampled from the caudal part of the middle ectosylvian gyrus, the banks of the dorsal tip of the posterior ectosylvian sulcus and a small portion of the adjacent posterior ectosylvian gyrus, i.e. the caudal part of A1 (Reale and Imig, 1980), in which the cells had a BF of <3.5 kHz. We used two criteria for distinguishing cells recorded in the A1 and the posterior auditory field (PAF): cell location and BF. The PAF is located ventrally to the A1, and BF increases with the depth of the cell location (Reale and Imig, 1980).

This study analyzed cells with sustained response properties, i.e. cells in which a significant (P < 0.05) increase of the response spike rate above the background level was maintained throughout the stimulus period (0.5 s) of pure-tone stimuli. These cells correspond to the tonic and phasic–tonic cells in the previous classification (Chimoto et al., 2002). A1 in awake cats also has phasic cells characterized by onset/offset responses (Chimoto et al., 2002). We analyzed only the sustained-response cells because our main interest was in spectral- rather than temporal-cue coding. Sustained-response cells have a simple temporal firing pattern throughout the stimulus period (Qin et al., 2003; Qin and Sato, 2004), simplifying the spectral-cue analysis procedures. The sustained-firing property, first observed in pure-tone stimuli, was preserved in the other stimulus paradigms as long as vigorous responses were present (Fig. 1D–F).

During the recording of A1 cells, pure-tone stimuli were presented with variable SF and intensity, constructing SF-response functions (Fig. 1G). We estimated BF (dotted vertical line labeled BF in Fig. 1G) and the best SPL of the cell (see Materials and Methods for definition). We then presented a harmonic complex tone at the best SPL to construct the F0-response functions (Fig. 1H). All 102 recorded cells were tested with the pure-tone and harmonic complex tone paradigms. At least two parameters may characterize the F0-response functions: the maximum response height and the response envelope. The former is the cell responsiveness parameter to the stimuli, while the latter is the parameter of tuning properties to the F0. Based on both parameters at the threshold of the half-driven rate for BF (thin solid line in Fig. 1H), we identified two types of cells: F0-sensitive (48 cells) and F0-insensitive cells (54 cells).

F0-sensitive Cells

F0-sensitive cells were defined by their responsiveness above the threshold and non-monotonic tuning properties to the F0 (the response height at the lowest F0 tested was less than 75% of the maximum height). Figure 1 shows the responses of a representative F0-sensitive cell. The F0-response function of the cell (Fig. 1H) was characterized by two F0 tuning peaks more than the half-driven rate for BF (thin solid line in Fig. 1H), separated by non-responsive regions <2 SD of the background discharge level (interrupted line in Fig. 1H). To investigate the relationship between the cell BF (Fig. 1G) and the peak frequencies of the F0-response function (Fig. 1H), SF and F0 were normalized for BF as 1, constructing normalized SF- and F0-response functions (Fig. 3B,G). Interestingly, two peaks appeared at regions one octave apart: one F0 peak was on the cell BF (F01st), while the other was at a frequency one octave below (F02nd) in this particular cell. F0-response functions of other example cells are also shown in Figure 3F,H. Generally, the F01st peak was present in all F0-sensitive cells, while the amplitude of the F02nd-peak varied from cell to cell, resulting in a single F0 tuning peak (single-F0-tuning cell, 18 cells, Fig. 3F) and double F0 tuning peaks (double-F0-tuning cells, 22 cells, Fig. 3G). Some cells had a third F0-tuning peak (F03rd) in addition to the F01st and F02nd peaks (multi-F0-tuning cells, eight cells, Fig. 3H) in which F03rd tended to be around 0.25 or 0.33 × BF. Figure 4 summarizes the above F0-peak characteristics by showing the distribution of peak frequencies in F0-response functions of single- (Fig. 4A), double- (Fig. 4B) and multi-F0-tuning cells (Fig. 4C).

Figure 3.

F0-sensitive versus F0-insensitive cells. (A–O) Response functions of three F0-sensitive (left three columns) and two F0-insensitive cells (right two columns) for SF (A–E), F0 (F–J) and S2F (K–O). The driven rate was normalized for the BF-response height as 1. SF, F0 and S2F were normalized for the cell BF as 1. For horizontal and vertical lines, see Figure 1 legend. Spectral profiles of harmonic complex tones are shown at the bottom of F–J when the F0 = 1, 0.5 and 0.25 × BF. (P–T) ES-F0 (solid line) and IS-F0 (dotted line) functions of three F0-sensitive (P–R) and two F0-insensitive cells (S–T). Note that (i) F0-sensitive cells have narrow F0-tuning peaks (F–H) and alternating ES-F0 versus IS-F0 patterns between preferred and non-preferred F0s (P–R), while (ii) F0-insensitive cells show a wide F0-tuning (I) and ES dominant pattern at any F0s (S) or no F0-tuning (J) and IS dominant pattern at any F0s (T).

Figure 4.

Distribution of peak frequencies in F0-response functions in F0-sensitive cells. Note that single- (A) double- (B) and multi-F0-tuning cells (C) tend to have peak F0s around cell BF (normalized F0 = 1) and/or one octave below (0.5).

F0-insensitive Cells

F0-insensitive cells have two subtypes: an example cell of the first subtype, an energy-integrator cell (see Discussion for an explanation of the name), is shown in Figure 3I. The cell was broadly tuned to F0, and the response height tended to increase with a decrease in F0, showing the maximum height of the F0-response function to be at low F0. Thus, energy-integrator cells were defined quantitatively to meet the criterion that the response height at the lowest F0 tested (128 Hz) was >75% of the maximum height. We identified 27 cells that meet this criterion. It is not likely that the broadly tuned F0 responses of the energy-integrator cells contribute to the identification of pitch-height.

An example cell of the second subtype, the non-responsive cell, is shown in Figure 3J. It showed no responsiveness to F0 (the response height was less than the threshold of the half-driven rates for BF). We found 27 cells that did not respond to harmonic complex tone stimuli.
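The two quantitative criteria for F0-insensitive cells can be sketched in Python as follows. This is only an illustration, not the analysis code used in the study: the function name, the list layout and the order in which the two tests are applied are assumptions, and the criterion that separates F0-sensitive cells is not reproduced here.

```python
def classify_f0_insensitive(rates, bf_rate):
    """Apply the two F0-insensitive criteria to one F0-response function.

    rates   : driven rates ordered from the lowest F0 tested (128 Hz) upward
    bf_rate : driven rate evoked by a pure tone at the cell BF
    (Both argument names are hypothetical.)
    """
    peak = max(rates)
    # Non-responsive: the response never reaches half of the BF driven rate.
    if peak < 0.5 * bf_rate:
        return "non-responsive"
    # Energy-integrator: the response at the lowest F0 exceeds 75% of the maximum.
    if rates[0] > 0.75 * peak:
        return "energy-integrator"
    # Neither criterion met; such a cell needs the F0-sensitivity analysis.
    return "other"
```

A response function that rises towards 128 Hz is labelled an energy-integrator, whereas one that stays below half of the BF rate at every F0 is labelled non-responsive.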

Control Experiments Ruling Out Co-varying Parameters

In this study, each harmonic complex tone was composed of F0 and a number of integer multiples of F0 within a spectral range of at most 128–16 000 Hz. From the point of view of energy distribution, such a tone can also be regarded as a high-pass tone whose LEF corresponds to F0. An upward shift of F0 therefore results in increased LEF, decreased spectral bandwidth, and a decreased number of harmonics, which is equivalent to a decrease in overall intensity. Thus, three parameters co-vary with the F0 shift of the harmonic complex tones in this study: LEF, bandwidth and overall intensity. It is important to rule out the possibility that the F0-response function reflects sensitivity to those co-varying parameters. To rule out LEF and bandwidth sensitivity, control experiments were designed using high-pass noise stimuli with LEF and bandwidth similar to those of the harmonic stimuli. LEF was varied systematically in the same steps as F0 in the harmonic stimuli (compare Fig. 5A,F) to construct the LEF-response function (Fig. 5H).
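How LEF, bandwidth and the number of harmonics co-vary with F0 can be made concrete with a short Python sketch. It assumes that every harmonic inside the 128–16 000 Hz range is present; the function names and default limits are illustrative and not taken from the study's stimulus software.

```python
import math

def harmonic_components(f0, f_min=128.0, f_max=16000.0):
    """Return the harmonic frequencies n * f0 lying within [f_min, f_max]."""
    n_max = int(f_max // f0)
    return [n * f0 for n in range(1, n_max + 1) if n * f0 >= f_min]

def covarying_parameters(f0):
    """Return (LEF in Hz, bandwidth in octaves, number of harmonics)."""
    comps = harmonic_components(f0)
    lef = comps[0]                               # lowest component, equal to F0 here
    bandwidth_oct = math.log2(comps[-1] / lef)   # spectral bandwidth in octaves
    return lef, bandwidth_oct, len(comps)
```

For example, raising F0 from 1000 to 2000 Hz raises LEF, narrows the bandwidth from 4 to 3 octaves and halves the number of harmonics from 16 to 8.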

Figure 5.

Spectral similarity between harmonic and control noise stimuli and response properties of F0-sensitive and F0-insensitive cells. (A–E) Spectral profiles of 5 out of 125 trials of harmonic complex tone (A), raster display of F0-sensitive cell during 125 harmonic complex tone stimuli (B), and F0-response functions of F0-sensitive (C) and F0-insensitive cells (D and E). (F–J) Spectral profiles of 5 out of 125 trials of high-pass noise (F), raster display of F0-sensitive cell during 125 noise stimuli (G), and LEF-response functions of F0-sensitive (H) and F0-insensitive cells (I and J). The driven rate was normalized for the BF-response height as 1. F0 and LEF were normalized for the cell BF as 1. For horizontal and vertical lines, see Figure 1 legends. Note that (i) F0-sensitive cells show narrow tuning to F0 of the harmonics (C) but not to LEF of the noise (H) in spite of similar cut-off frequencies and bandwidths (compare A with F), and (ii) F0-insensitive cells show similar responses to both stimuli (compare D with I; E with J).

An example F0-sensitive cell responded to harmonic stimuli when the F0 was the cell BF and half BF (Fig. 5B,C). Nevertheless, noise stimuli with similar LEF and bandwidth did not evoke responses of the F0-sensitive cell (Fig. 5G,H), ruling out the possibility that the F0-response functions in F0-sensitive cells show tuning properties to the co-varying parameters of LEF and bandwidth. Similar results were found in all 29 tested F0-sensitive cells.

The overall intensity is the third co-varying parameter with F0 shift. In this study, the F0-response functions of F0-sensitive cells oscillated between the maximum and background discharge level when F0 shifted from F01st to F02nd. This F0 shift indicates double the number of harmonics, resulting in an increase of 3 dB in overall intensity. It is not likely that the activity of A1 cells changed drastically from the maximum to zero with a shift of overall intensity of just 3 dB.
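The 3 dB figure follows from the power summation of equal-amplitude components. A minimal Python check, assuming the harmonics add incoherently so that overall power is proportional to the number of components (the function name is illustrative):

```python
import math

def level_change_db(n_before, n_after):
    """Change in overall level (dB) when the number of equal-power
    components changes from n_before to n_after."""
    return 10 * math.log10(n_after / n_before)

# Halving F0 doubles the number of harmonics in a fixed spectral range.
print(round(level_change_db(8, 16), 2))  # 3.01
```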

Although the response property of F0-insensitive cells to high-pass noise stimuli was not our main interest, it is worthy of mention. The energy-integrator cells (tested in 16 cells) responded to noise stimuli (Fig. 5I) as well as harmonic complex tone stimuli (Fig. 5D). The response envelopes were similar to each other: both the F0- and LEF-response functions showed broadly tuned responses and the response height tended to increase with a decrease in F0/LEF, showing a function peak at low F0/LEF. Thus, it is impossible for energy-integrator cells to discriminate between harmonics and noise with similar bandwidth and energy distribution, that is, energy-integrator cells are insensitive to harmonic structure.

Non-responsive cells (tested in 18 cells) responded to neither the harmonic complex tone (Fig. 5E) nor high-pass noise (Fig. 5J), that is, the non-responsive cells were insensitive to both the harmonic structure and high-pass noise.

Excitatory and Inhibitory Summation Patterns Underlying F0 Sensitivity

This study has so far demonstrated the existence of a neuronal population in A1 that responds vigorously to a harmonic tone only when its F0 is equal to the cell BF or half of BF, and whose responses are suppressed by the addition of non-harmonic frequency components, as exemplified by broadband noise. This suppression raises the possibility that broadband noise itself has an inhibitory effect on the cells, for example because non-harmonic frequency components of the noise fall on the inhibitory subfield of the cell FRF.

To investigate the possible spectral excitation-inhibition mechanism involved in harmonic-component sensitivity and non-harmonic-component suppression, the excitatory and inhibitory subfields of the cell FRF were investigated using two-tone stimuli in addition to pure-tone stimuli. We presented two-tone stimuli to 68 of the 102 cells, constructing S2F-response functions. We noted that the S2F-response function of F0-sensitive cells showed inhibitory subfields restricted to two spectral regions (Fig. 1I, 0.5–1.0 and 1.0–1.5 × BF) between the harmonics (0.5, 1.0 and 1.5 × BF) of preferred F02nd (Fig. 1H). This finding suggests that (i) preferred harmonics with F0 = 0.5 and 1.0 × BF have energy on the excitatory subfields but not on the inhibitory subfields, and thus activate spike generation, while (ii) non-preferred harmonics, such as harmonics with F0 = 0.25, 0.75 and 1.25 × BF, must fall in the inhibitory subfields (0.5–1.0 and 1.0–1.5 × BF), and thus suppress spike generation. It is likely that this characteristic excitation-inhibition pattern generates cell-specific F0 sensitivity.

To examine the above qualitative observation quantitatively, we analyzed the excitatory and inhibitory summation (ES and IS, respectively) of FRF magnitude (see Materials and Methods for details). ES and IS were calculated at different sampling frequencies corresponding to the different F0s of the harmonic tones. We then plotted ES and IS against F0, constructing the ES-F0 function (Fig. 3P–T, solid lines) and the IS-F0 function (Fig. 3P–T, dotted lines), respectively.

F0-sensitive cells had a characteristic pattern of ES versus IS alternating between preferred and non-preferred F0s. For example, in single-F0-tuning cells (Fig. 3F), ES was dominant at only the preferred F0 (F0 = 1 × BF) surrounded by IS dominant regions at non-preferred F0s (Fig. 3P). In double-F0-tuning cells (Fig. 3G), ES was dominant at two preferred F0s (F0 = 0.5 and 1 × BF), alternating with IS dominant regions at non-preferred F0s (F0 = 0.25, 0.75 and 1.25 × BF) (Fig. 3Q). Finally, in multi-F0-tuning cells, similar alternating ES versus IS patterns between preferred and non-preferred F0s were identified (Fig. 3R).

The population data of ES-F0 and IS-F0 functions for 33 F0-sensitive cells are shown in Fig. 6A,D, respectively. To clarify the difference between the ES-F0 and IS-F0 functions in each cell, the IS-F0 function was subtracted from the ES-F0 function, constructing the NS-F0 function (Fig. 6G). A positive NS indicated that ES was dominant, while a negative NS indicated a dominant IS. NS was positive at F0 = BF in all F0-sensitive cells and at F0 = 0.5 × BF in most F0-sensitive cells, and it was negative in other regions (F0 = 0.25, 0.75, 1.25, 1.5, 1.75, 2.0 and 2.25 × BF).
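The NS-F0 construction is a pointwise subtraction, which the following Python sketch illustrates. The function names are hypothetical, and the ES and IS values must come from the summation procedure described in Materials and Methods, which is not reproduced here.

```python
def net_summation(es, is_):
    """NS-F0 function: ES minus IS at each tested F0."""
    return [e - i for e, i in zip(es, is_)]

def dominance(ns):
    """Label each F0 by the sign of NS: positive means ES dominates."""
    return ["ES" if v > 0 else "IS" if v < 0 else "balanced" for v in ns]
```

With made-up values for a double-F0-tuning cell sampled at F0 = 0.25, 0.5, 0.75 and 1 × BF, `dominance(net_summation([0.2, 0.8, 0.1, 1.0], [0.5, 0.3, 0.6, 0.2]))` gives the alternating pattern IS, ES, IS, ES.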

Figure 6.

Excitatory and inhibitory summation patterns. ES-F0 (A–C), IS-F0 (D–F) and NS-F0 (G–I) functions were constructed for 33 F0-sensitive cells (A, D and G), 14 energy-integrator cells (B, E and H) and 21 non-responsive cells (C, F and I). The function height was normalized for the BF-response height as 1. F0 was normalized for the cell BF as 1. Note that all F0-sensitive cells have positive NS at F0 = BF and most F0-sensitive cells have positive NS at F0 = 0.5 × BF, while F0-insensitive cells have positive (H) or negative (I) NS at all F0s.

In contrast, the energy-integrator cells had dominant ES over IS at any F0, as shown in an individual example (Fig. 3S) and population data (Fig. 6B,E,H), while the non-responsive cells had dominant IS at any F0s as shown in the individual example (Fig. 3T) and population data (Fig. 6C,F,I). Collectively, the findings suggest that characteristic excitatory and inhibitory summation patterns alternating between preferred and non-preferred harmonics underlie F0 sensitivity.

Figure 6 also suggests that there is some continuity between the cell classes. The excitatory curves are similar, with relatively large responses at low F0 (when many harmonics lie within the excitatory tuning curve of the neuron) and when F0 = BF. This is particularly clear when comparing Figure 6A with Figure 6C. Therefore, the difference between these classes is mainly due to the strength of the inhibition, which is weak in energy-integrator cells (Fig. 6E), medium in F0-sensitive cells (uncovering the excitatory responses for F0 = BF and F0 = BF/2, Fig. 6D) and strong in non-responsive cells (Fig. 6F). Furthermore, it seems that the main difference between energy-integrator cells and other cells is the width of their excitatory tuning curve, causing low-frequency peaks in Figure 6B to be more prominent. In contrast, F0-sensitive cells seem to have an inhibitory bandwidth of less than about one octave, so the second harmonic at F0 = 0.5 × BF can excite them. Thus, it seems that there is a continuum of response properties between three cell classes.

Characteristics of Excitatory and Inhibitory Subfields of FRF in Different-type Cells

In the previous section, we showed the difference between excitatory and inhibitory summation patterns among different cell types. Related to this, we are interested to see the relationship of F0 sensitivity to these single- and two-tone properties of cells, namely their excitatory and inhibitory half-height bandwidths and a measure of the strength of two-tone inhibition. The characteristics of FRF may contribute to the cell's excitatory and inhibitory summation patterns. For example, the FRF of an F0-sensitive cell has a narrow excitatory peak (Fig. 3A–C) and several narrow inhibitory troughs (Fig. 3K–M). Therefore, a slight difference in F0 could result in a drastic change in summation (Fig. 3P–R). The FRF of an energy-integrator cell has a broad excitatory peak (Fig. 3D) and no apparent inhibitory trough (Fig. 3N). Accordingly, decreased F0 (increasing the density of summation components) could cause a monotonic increase of excitatory summation, while maintaining inhibitory summation at a low level (Fig. 3S). On the other hand, a non-responsive cell has a deep and broad inhibitory trough on FRF (Fig. 3O), meaning that the inhibitory summation is always higher than excitatory summation (Fig. 3T).

To examine this qualitative observation quantitatively, we measured the bandwidths (BWs) of the excitatory peak and inhibitory trough of FRF. The BW of the excitatory peak was measured at the half-height of each cell's SF-response function. The majority of our units (89/102) had a single peak in the SF-response function. Only seven F0-sensitive cells, four energy-integrator cells and two non-responsive cells had two or three separate peaks higher than half the maximum amplitude in the SF-response function. Thus, we evaluated the BW of the main excitatory peak. The BW of the inhibitory trough was measured on the S2F-response function of each tested cell. An inhibitory trough was defined as a contiguous frequency range on the S2F-response function where the function was below the half-height BF response level. As most of our cells, particularly F0-sensitive cells, had multiple separate troughs on the S2F-response function, we recorded the BWs of the three deepest inhibitory troughs for each cell. The mean ± SD of the measured BWs for each cell group is shown in Table 1. As in the representative cells, the BW of the excitatory peak was broad in energy-integrator cells, medium in non-responsive cells and narrow in F0-sensitive cells. The difference in mean BWs between each group pair was statistically significant (P < 0.05). For the BW of inhibitory troughs, the deepest trough was significantly broader in non-responsive cells than in F0-sensitive cells, and significantly narrower in energy-integrator cells than in F0-sensitive cells. The second deepest trough of non-responsive cells was also significantly broader than that of F0-sensitive and energy-integrator cells. Note that of the 14 energy-integrator cells tested with two-tone stimuli, only three had a second trough on their S2F-response function and none had a third trough. The BW of the third inhibitory trough was still significantly broader in non-responsive cells than in F0-sensitive cells. These quantitative results confirmed our qualitative observation in the representative cells.
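A half-height bandwidth in octaves can be computed from a sampled response function roughly as follows. This Python sketch makes simplifying assumptions that are not taken from the paper: the band edges are the outermost sampled frequencies still at or above half height (no interpolation), and only the contiguous region around the largest peak is measured.

```python
import math

def half_height_bandwidth(freqs, resp):
    """Bandwidth (octaves) of the main peak at half of its height.

    freqs : ascending sample frequencies (Hz)
    resp  : response at each frequency (same length as freqs)
    """
    peak = max(range(len(resp)), key=resp.__getitem__)
    half = 0.5 * resp[peak]
    lo = hi = peak
    # Extend the contiguous region around the peak while it stays at or above half height.
    while lo > 0 and resp[lo - 1] >= half:
        lo -= 1
    while hi < len(resp) - 1 and resp[hi + 1] >= half:
        hi += 1
    return math.log2(freqs[hi] / freqs[lo])
```

An inhibitory trough could be measured in the same way by looking for the contiguous region of the S2F-response function that lies below the half-height BF response level.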

Table 1

Excitatory and inhibitory bandwidths and balance index of FRF in different types of cells

Cell type | BW of excitatory peak (octaves) | BW of 1st inhibitory trough (octaves) | BW of 2nd inhibitory trough (octaves) | BW of 3rd inhibitory trough (octaves) | BI
F0-sensitive | 0.45 ± 0.24 (n = 49) | 0.44 ± 0.28 (n = 33) | 0.26 ± 0.17 (n = 31) | 0.14 ± 0.08 (n = 14) | −0.17 ± 0.18 (n = 33)
Energy-integrator | 1.29 ± 0.65* (26) | 0.06 ± 0.09* (14) | 0.08 ± 0.07 (3) | — (0) | 0.50 ± 0.27* (14)
Non-responsive | 0.96 ± 1.02* (27) | 1.80 ± 1.31* (21) | 0.64 ± 0.53* (15) | 0.38 ± 0.17* (8) | −0.49 ± 0.23* (21)

Values are mean ± SD. The value in parentheses is the number of cells for each item.

* Significantly different from F0-sensitive cells (ANOVA followed by Tukey test for pairwise comparisons, or Student t-test; P < 0.05).

significantly different from energy-integrator cells (P<0.05).

We further used an index, BI (see Materials and Methods for the definition), to evaluate the balance of excitatory and inhibitory strengths of FRF. When the excitatory and inhibitory strengths of FRF are balanced, BI is zero. The mean ± SD of BI in each cell group is shown in Table 1. BI in energy-integrator cells was positive, indicating the dominance of excitation. BIs in F0-sensitive and non-responsive cells tended to be negative, with the latter deviating further from zero. The difference in mean BI between each group pair was statistically significant (P < 0.05). It appears that the inhibitory strength of FRF is strong in non-responsive cells, medium in F0-sensitive cells and weak in energy-integrator cells.

Analysis of the Suppression Mechanism by Non-harmonic Frequency Components

Since the previous section established the bandwidths of the excitatory (0.45 octaves) and inhibitory (0.44 octaves) subfields of FRF in F0-sensitive cells, we can now analyze whether adding non-harmonic components, such as 0.75, 1.25 and 1.75 × F0, to a given effective harmonic complex tone decreases the responsiveness of F0-sensitive neurons. This type of interaction was demonstrated indirectly by our data, in which F0-sensitive cells could be driven by stimuli with F0 = 0.5 × BF but not by stimuli with F0 = 0.25 × BF. Figure 3G (bottom) shows that adding one non-harmonic component of 0.75 × BF in the lower inhibitory subfield and another non-harmonic component of 1.25 × BF in the upper inhibitory subfield abolished cell responses to F0 = 0.5 × BF. Similar effects were observed in other cells (Fig. 3, the third column for F0 = 0.5 × BF and the first column for F0 = BF).

The findings suggest that the inhibition mechanism for the addition of the non-harmonic component frequency is very strong: adding only one non-harmonic component with the same intensity as the harmonic component in each of two inhibitory subfields completely abolishes the F0-sensitivity to F0 = 0.5 × BF and F0 = BF.

Spectral Components Essential for F0 Sensitivity

To elucidate the spectral components essential for F0 sensitivity, we performed the missing F0 paradigm, in which the virtual F0 was fixed at the peaks of the F0 sensitivity curve while the lower harmonics, including the physical F0, were systematically eliminated, preserving the higher harmonics (Fig. 7A). Driven rates were analyzed in relation to the lowest harmonic number of the missing F0 stimuli (Fig. 7B,C). In an example cell (Fig. 7B), the F01st responsiveness was abolished when the lowest harmonic number was 2 (F0 was deleted; Fig. 7B, solid line) and the F02nd responsiveness was abolished when the lowest harmonic number was 3 (both F0 and the second harmonic were deleted; Fig. 7B, dotted line). Normalized mean activities in 36 F0-sensitive cells are shown in Figure 7C. Relative to the full-component harmonic stimuli, the response amplitudes decreased to less than half when the lowest harmonic number was 2 at F0 = F01st (Fig. 7C, solid line) and 3 at F0 = F02nd (Fig. 7C, dotted line). The finding suggests that the F0 component is essential for evoking F01st responses, and that F0 and the second harmonic are essential for evoking F02nd responses.

Figure 7.

Spectral components essential for F0 sensitivity. (A) Spectral profiles of the missing F0 paradigm. The bottom trace shows the profile of stimuli when the lowest harmonic number is 1. The upper traces show the profiles of stimuli when the lowest harmonic numbers are 2, 3, 4 and 5. (B, C) Driven rates plotted against the lowest harmonic number in an example double-F0-tuning cell (B) and in population cells (C), when F0s are F01st (solid line) and F02nd (dotted line). The response height at the lowest harmonic number of 1 was used to normalize the mean driven rates in C. Vertical bars in C show SD. (D) Spectral profiles of 5 out of 125 trials of two-component harmonic stimuli. (E, F) F0-response functions of an example cell for full-component harmonics (E) and two-component harmonics (F). Note that (i) the response was abolished by eliminating F0 when F0 was F01st and by eliminating F0 and the second harmonics when F0 was F02nd (B, C), and (ii) the F0-response functions were invariant whether the components were full or at only the lowest two (E, F).

The finding also rules out the possibility that the difference tone (f2–f1) equal to F0 is essential for F0 sensitivity, because the responses to F0 changed with elimination of the lower harmonics in spite of the constant difference tone. It is also not likely that the combination tone (2f1–f2) equal to F0 is essential for F0 sensitivity, because the combination tone for F0 and the second harmonic is theoretically 0 Hz.
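The arithmetic behind both distortion products is easy to verify. In this Python sketch (function name illustrative), f1 and f2 are two adjacent harmonics n × F0 and (n + 1) × F0, so the difference tone always equals F0 whereas the combination tone equals (n − 1) × F0:

```python
def distortion_products(f1, f2):
    """Return the difference tone (f2 - f1) and the combination tone (2*f1 - f2)."""
    return f2 - f1, 2 * f1 - f2

print(distortion_products(440, 880))   # (440, 0): F0 and the second harmonic
print(distortion_products(880, 1320))  # (440, 440): second and third harmonics
```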

The finding that the response disappears after the removal of one or two harmonics is not necessarily surprising, because the resulting stimulus has no energy within the excitatory region of the neuron. This point reinforces the interpretation that spectral integration is the major determinant of the responses to these harmonic stimuli.
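The spectral interpretation can be illustrated by listing which missing F0 stimuli still contain a component at the cell BF. The Python sketch below treats the excitatory region as a single point at BF, which is a deliberate simplification; the BF of 880 Hz, the 16 000 Hz upper limit and the function names are illustrative assumptions.

```python
def missing_f0_components(f0, lowest_harmonic, f_max=16000.0):
    """Harmonics of f0 from `lowest_harmonic` upward, up to f_max (Hz)."""
    n_max = int(f_max // f0)
    return [n * f0 for n in range(lowest_harmonic, n_max + 1)]

def has_component_at(components, bf, tol=1e-6):
    """True if any component coincides with bf (within a relative tolerance)."""
    return any(abs(c - bf) <= tol * bf for c in components)

bf = 880.0  # hypothetical cell BF
for label, f0 in (("F01st", bf), ("F02nd", bf / 2)):
    for lowest in (1, 2, 3):
        comps = missing_f0_components(f0, lowest)
        print(label, lowest, has_component_at(comps, bf))
```

Under these assumptions a component at BF survives only for lowest harmonic number 1 at F01st and for lowest harmonic numbers 1 and 2 at F02nd, which matches the points at which the responses were abolished.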

If F0 and the second harmonic components were essential for F0 sensitivity as indicated by the missing F0 paradigm, the harmonic stimuli with only the two lowest components (F0 and the second harmonic) would generate an F0-response function similar to that by full-component harmonic stimuli. The two-component harmonic paradigm (Fig. 7D) was performed in 5 of the 36 cells tested by the missing F0 paradigm. As shown in Figure 7E,F, the F0-response functions were invariant regardless of the number of the component frequencies (full in Fig. 7E or two in Fig. 7F). The finding suggests that the two lowest frequencies (F0 and the second harmonic) are sufficient for evoking F0-sensitive responses in A1 cells. The finding also rules out the possibility that the F0-response function shows sensitivity to the co-varying parameter of overall SPL, because the amplitude of the F0-response function changed with F0 in spite of the constant overall SPL.

Effects of Sound Level on F0 Sensitivity

We examined the dependence of F0-tuning characteristics on sound level (range, 20–70 dB SPL) in 16 F0-sensitive cells. In all the cells tested, the response characteristics were relatively invariant across sound levels. Figure 8 shows example responses of a representative cell. While changing the sound level resulted in changes of driven rates, peak frequencies in SF-response functions remained largely unchanged (Fig. 8A), and F0-tuning envelopes with peaks corresponding to BF and one octave below, which are specific for F0-sensitive cells at the best SPL, also remained largely unchanged (Fig. 8B) across a wide range of sound levels tested. The finding shows that although changing the sound level may change the peak driven rate, it did not result in significant change of F0-sensitivity characteristics of F0-sensitive cells.

Figure 8.

Effects of sound level on F0 sensitivity. SF- (A) and F0-response functions (B) at six different intensities of 20, 30, 40, 50, 60 and 70 dB SPL were plotted together to illustrate the similarity between the response functions at different sound levels. For horizontal and vertical lines, see Figure 1 legend.

Discussion

In this study, harmonic complex tone stimuli with less prominent temporal periodicity (Fig. 2) were presented during the recording of low-BF single cells in the caudal part of A1, investigating neural processing of the spectral cue of pitch-height. We found F0-sensitive and F0-insensitive cells based on the F0-response functions (Fig. 3F–J). We further investigated the underlying neural mechanism with the use of a two-tone paradigm (Fig. 3K–O). Each cell type had a distinct spectral-inhibition pattern (Fig. 3P–T), suggesting that F0 sensitivity correlates with the spectral inhibition pattern.

F0-sensitive Cells

F0-sensitive cells were sensitive to the harmonics with given F0s but not to noise with similar bandwidth and energy distribution (Fig. 5C,H). They usually preferred two F0s one octave apart: one, corresponding to the cell BF (F01st) and the other, one octave below BF (F02nd) (Fig. 4). It is suggested that (i) the caudal part of A1 is well organized for the detection of harmonics by specific F0-sensitive cells; (ii) F0-sensitive cells do not discriminate between the spectral pitch-height of pure tone and virtual pitch-height of harmonics; and (iii) pitch-height is represented along with the tonotopic axis in A1.

Pitch-height sensitivity may originate from the characteristic FRF pattern alternating between excitation and inhibition (predominant excitation on preferred harmonic frequencies and predominant inhibition on non-preferred frequencies) (Fig. 6A,D,G). This sensitivity seems to be derived, rather than primary, in these cells; that is, the excitatory and inhibitory subfields of the FRF of F0-sensitive cells are specifically organized for detection of the harmonics. The inhibitory subfields are narrow (0.44, 0.26 and 0.14 octaves for the deepest three troughs) and located adjacent to the excitatory subfield (Fig. 3K–M), so that the cell is excited only when the component frequencies fall on the excitatory subfield but not on the inhibitory subfields. Only harmonic complex tones whose F0 is equal to the cell BF or half of BF fit the excitation criterion of the cell. The addition of only one non-harmonic frequency component in each of the two inhibitory subfields completely abolishes the F0 sensitivity to F0 = 0.5 × BF and F0 = BF. It is not surprising that broadband noise, which has a number of additional non-harmonic component frequencies on the inhibitory subfields, fits the inhibition criterion of the cell. Thus, given the excitatory and inhibitory fields of these neurons, it is conceivable to design a special stimulus composed of a number of artificial tonal components, not necessarily harmonically related, that would be more efficient than any harmonic complex. Theoretically, the inhibitory BW of 0.44 octaves can pass the second and the third harmonics falling, respectively, at the low and the high edges of the inhibitory subfield.

F0-insensitive Cells

Energy-integrator cells responded to both harmonic and high-pass noise stimuli as long as sound energy fell on the FRF (Fig. 5D,I). The driven rates tended to increase with the decrease in F0/LEF, which was equivalent to an increase in the number of spectral components on the FRF. These data suggest that energy-integrator cells are not sensitive to pitch-height but respond to static sounds as long as sound energy falls on the FRF. This is why the cells are named energy-integrator cells. The response characteristics of energy-integrator cells may originate from the relatively wide bandwidth of the excitatory subfield of FRF (1.29 octaves) and less dominant spectral inhibition (Fig. 6B,E,H). The absence of inhibitory subfields of FRF (Fig. 3N) allows the energy-integrator cells to integrate the sound energy on the excitatory subfield of FRF.

In contrast to comprehensive energy-integrator cells, non-responsive cells showed responses to neither the harmonic complex tone nor high-pass noise (Fig. 5E,J). This may originate from dominant spectral inhibition on component frequencies of the harmonics and noise (Fig. 6C,F,I); that is, a wideband (1.8 octaves) inhibitory subfield on the higher frequency side of the excitatory subfield (Fig. 3O) does not allow non-responsive cells to respond to the harmonics and high-pass noise. Theoretically, inhibitory BW of 1.80 octaves can pass two frequency components, in which the lower component is on the low edge of the inhibitory subfields and the higher component is >3.5 times higher than the lower component.
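The two "theoretical" statements about inhibitory bandwidths, here and in the F0-sensitive Cells subsection, come down to converting between octaves and frequency ratios. A short Python check (helper names are illustrative):

```python
import math

def octave_span(f_low, f_high):
    """Distance between two frequencies in octaves."""
    return math.log2(f_high / f_low)

def ratio_spanned(bandwidth_octaves):
    """Frequency ratio covered by a band of the given width in octaves."""
    return 2.0 ** bandwidth_octaves

# F0-sensitive cells: the second and third harmonics are log2(3/2) octaves
# apart, which is wider than the 0.44-octave deepest inhibitory trough.
gap_2nd_3rd = octave_span(2.0, 3.0)        # about 0.585 octaves

# Non-responsive cells: a 1.80-octave trough spans a ratio of about 3.5.
ratio_nonresponsive = ratio_spanned(1.80)  # about 3.48
```

Because 0.585 octaves exceeds 0.44 octaves, a trough of that width can sit between the second and third harmonics without being hit by either, whereas two components can only straddle a 1.80-octave trough if their frequency ratio is roughly 3.5 or more.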

Relation of this Study to Previous Studies

In this study, some cells (7 of 49 F0-sensitive cells, 14.3%; 4 of 26 energy-integrator cells, 15.4%; and 2 of 27 non-responsive cells, 7.4%) had multiple peaks higher than half of the BF response height in the SF-response function. Such multi-peaked cells were previously reported by Sutter and Schreiner (1991) and Kadia and Wang (2003). By observing response facilitation in the two-tone paradigm for a harmonically related second tone in multi-peaked neurons, Kadia and Wang (2003) suggested that multi-peaked neurons have functional significance for extracting harmonic components embedded in complex sounds. This study supports the suggestion of Kadia and Wang because the majority (7 of 13) of the multi-peaked neurons recorded in this study were sensitive to the harmonic complex tone.

As mentioned in the introduction, Schwarz and Tomlinson (1990) investigated the responses of A1 cells to the harmonic complex tone in monkeys. They found ‘F0 neurons’ whose frequency tuning for pure tones is similar to that for F0s of harmonic complex tones. However, their ‘F0 neurons’ were regarded as ‘not’ encoding F0 because they did not respond to missing F0 stimuli and responded to both noise and harmonic complex tones; that is, they found no cells that were sensitive to the pitch-height of harmonics. The negative result may be inconclusive, rather than a strong indication of a lack of F0-sensitive cells in monkey A1, because despite studying ‘pitch’, Schwarz and Tomlinson used ‘white noise’ (a sound lacking pitch) as their search stimulus, resulting in a bias against finding F0-sensitive cells. This study has shown that F0-sensitive cells can distinguish between harmonics and noise, that is, they respond to harmonics but not to noise (Fig. 5C,H), suggesting that the noise-search stimulus prevented Schwarz and Tomlinson from recording F0-sensitive A1 cells that did not respond to white noise. This study revealed that (i) A1 has cells sensitive to F0 of the harmonics but not to noise; and (ii) those F0-sensitive cells in A1 respond to harmonics missing F0 when F0 is one octave below the cell BF (dotted lines in Fig. 7B,C).

In this study, we estimated the excitatory and inhibitory subfields of FRF from single- and two-tone stimuli, respectively, and succeeded in explaining the neural responses to the harmonic complex tone on the basis of the excitation-inhibition balance. Our procedures can be justified by the findings of Nelken et al. (1994a,b), who reported in cat A1 that single-tone effects and two-tone interactions are sufficient to explain the A1 neuron responses to multi-tone stimuli, such as four-tone and nine-tone complexes. In this study, A1 cells might act as a spectral filter with pass-band and reject-band, which correspond to the peak and trough in FRF, respectively. Thus, all aspects of the responses to the harmonic series of tones are immediate consequences of the spectral filter, including the narrowing of the F0-response peak when the second harmonic coincides with the cell BF and the differential responses between noise stimuli and harmonic stimuli for F0-sensitive cells, the broadly tuned and sloping F0-responses for energy-integrator cells with no inhibitory sideband and the weakness of F0 response for non-responsive cells with frequency-unspecific inhibitory sidebands. The fundamental point of view that neural activities of A1 can be explained from the spectral integration of stimulus energy on FRF is in good accordance with the previous reports of Schwarz and Tomlinson (1990) and Fishman et al. (1998, 2000) that studied A1 neural responses to harmonics.

More generally, our finding is in accord with previous studies in the sense that spectral filtering mechanisms modify information processing in the central auditory system, including tonal frequency tuning (Katsuki et al., 1958; Greenwood and Maruyama, 1965), duration tuning (Casseday et al., 1994), amplitude-modulation frequency tuning (Liang et al., 2002), frequency-modulation direction selectivity (Zhang et al., 2003), spectral-edge sensitivity (Qin et al., 2004a) and spectral-shape preference (Qin et al., 2004b). This study has provided evidence that the spectral filtering mechanism is also involved in the discrimination between harmonics and noise.

This study shows that pitch-height is represented along the tonotopic axis. Our findings suggest that a harmonic complex tone with a given F0, such as 440 Hz, would activate double-F0-tuning cells in two regions along the tonotopic axis in A1: one at the 880 Hz position and the other one octave below (440 Hz). Furthermore, a harmonic complex tone missing the F0 of 440 Hz would activate double-F0-tuning cells only at the 880 Hz position, and harmonics missing both the F0 and the second harmonic would not activate the double-F0-tuning cells. Thus, our finding of pitch-height representation along the tonotopic axis applies only to harmonic complex tones that include the relatively low harmonics. Our findings neither support nor reject the previous findings (Schulze and Langner, 1997; Schulze et al., 2002) that suggest an independent arrangement of pitch and BF. Those studies focused on the ‘temporal-cue’ mechanism and used amplitude-modulated tones with 100% depth of modulation and with only three spectral components far away from the cell FRF, whereas we focused on the ‘spectral-cue’ mechanism and used harmonic complex tones with less-dominant envelope periodicity and with spectral components near the cell FRF.
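The 440 Hz example can be written out as a small rule-based sketch. It assumes, purely for illustration, an idealized double-F0-tuning cell that is driven whenever the F0 or the second harmonic is physically present within a small tolerance of its BF; the function name, the bank of BFs and the 0.05-octave tolerance are hypothetical.

```python
import math

def driven_cells(cell_bfs_hz, f0_hz, missing=(), tol_oct=0.05):
    """Return the BFs of idealized double-F0-tuning cells driven by a
    harmonic complex. Only harmonics 1 (F0) and 2 can drive a cell, and
    only if they are present in the stimulus (not listed in `missing`)."""
    drivers = [k * f0_hz for k in (1, 2) if k not in missing]
    driven = []
    for bf in cell_bfs_hz:
        # A cell is driven when a present driver lies within tol_oct of its BF.
        if any(abs(math.log2(bf / f)) <= tol_oct for f in drivers):
            driven.append(bf)
    return driven

if __name__ == "__main__":
    bank = [220.0, 440.0, 660.0, 880.0, 1320.0, 1760.0]  # hypothetical BFs
    print(driven_cells(bank, 440.0))                  # complete complex
    print(driven_cells(bank, 440.0, missing=(1,)))    # missing F0
    print(driven_cells(bank, 440.0, missing=(1, 2)))  # missing F0 and 2nd
```

Under this rule the complete complex drives the 440 and 880 Hz positions, the missing-F0 complex drives only the 880 Hz position, and the complex lacking both lowest components drives none, even though its third harmonic (1320 Hz) coincides with a BF in the bank.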

Correlation with Psychophysics

As discussed in the previous section, F0-sensitive cells in A1 were driven as a consequence of the spectral filter function. Nevertheless, the observed behavior of F0-sensitive cells reflects some psychophysical properties of harmonic complexes. First, the octave-response characteristic of F0-sensitive cells explains ‘pitch-chroma’. Pitch has two perceptual dimensions: the ‘vertical dimension’ of pitch-height from low to high and the ‘circular dimension’ of pitch-chroma, which is helical pitch perception returning to a similar pitch perception at a pitch-height one octave above (Warren et al., 2003). In Western music, the same ‘note name’ is given to pitch-heights one octave apart. For example, the note names of both 440 and 880 Hz are ‘A’ in English and American music. This study has shown that most F0-sensitive cells have two F0-tuning peaks: one corresponding to the cell BF and the other to the frequency one octave below (Fig. 4). This ‘octave’-response property of the F0-sensitive cells in A1 may imply that pitch-chroma is created in the neural circuitry of A1, which is well organized to discriminate between harmonic sounds and noise sounds. Given two harmonic complex tones with F0s one octave apart, such as 880 and 440 Hz, and an F0-sensitive A1 cell with a BF at 880 Hz, the two harmonic stimuli would generate similarly vigorous spike activity in the cell, underlying the perception of pitch-chroma. For the first time, we have presented physiological evidence supporting the interpretation that pitch-chroma is related to the neural response characteristics in A1 organized for detection of the harmonic structure of the complex tone. Second, the level invariance of the preferred pitch-height of F0-sensitive cells (Fig. 8) may explain the level invariance of the pitch-height of the harmonic complex tone (Terhardt, 1979). Third, the F0-sensitive cells reported in this study responded to pitch stimuli throughout the stimulus period without prominent adaptation (Fig. 1E). This time–response property explains the psychophysical phenomenon that pitch perception hardly adapts. Thus, we close a knowledge gap between psychophysics and physiology for complex features of pitch perception that are universally found in harmonic sound.
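The octave equivalence behind pitch-chroma can be made concrete with a short calculation. This is a minimal sketch that assumes twelve-tone equal temperament with a 440 Hz reference for ‘A’; the function names are hypothetical and the mapping is a musical convention, not a result of this study.

```python
import math

# Note names in semitone steps upward from the reference 'A'.
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def chroma(freq_hz, ref_hz=440.0):
    """Position within the octave (0 <= chroma < 1): the fractional part
    of the distance from the reference in octaves."""
    return math.log2(freq_hz / ref_hz) % 1.0

def note_name(freq_hz, ref_hz=440.0):
    """Nearest equal-tempered note name, ignoring the octave number."""
    semitones = round(12 * math.log2(freq_hz / ref_hz))
    return NOTE_NAMES[semitones % 12]

if __name__ == "__main__":
    for f in (220.0, 440.0, 660.0, 880.0):
        print(f, note_name(f), round(chroma(f), 3))
```

Frequencies one octave apart (220, 440 and 880 Hz) share the same chroma and note name although their pitch-heights differ, whereas 660 Hz, which lies between them in pitch-height, maps to a different note name.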

However, the response features of F0-sensitive cells in A1 are not always consistent with the psychophysical findings on pitch. The third to fifth harmonics play a dominant role in pitch-height perception (Ritsma, 1967), whereas the two lowest components (F0 and the second harmonic) were essential for generating the responses of F0-sensitive cells of A1 in this study (Fig. 7). Thus, the encoding of pitch information in A1 remains at the spectral-filtering level and does not reach the full perceptual level. We suggest that neural processing for pitch perception is completed in higher-order auditory cortical fields. This should be clarified in future studies.

We thank N. Yaguchi for technical assistance. The work was supported by a grant from the Ministry of Education, Science, Culture, Sports and Technology, Japan.

References

Bernstein JG, Oxenham AJ (2003) Pitch discrimination of diotic and dichotic tone complexes: harmonic resolvability or harmonic number? J Acoust Soc Am 113:3323–3334.
Burns EM, Viemeister NF (1976) Nonspectral pitch. J Acoust Soc Am 60:863–869.
Casseday JH, Ehrlich D, Covey E (1994) Neural tuning for sound duration: role of inhibitory mechanisms in the inferior colliculus. Science 264:847–850.
Chimoto S, Kitama T, Qin L, Sakayori S, Sato Y (2002) Tonal response patterns of primary auditory cortex neurons in alert cats. Brain Res 934:34–42.
Cohen MA, Grossberg S, Wyse LL (1995) A spectral network model of pitch perception. J Acoust Soc Am 98:862–879.
Cynx J, Shapiro M (1986) Perception of missing fundamental by a species of songbird (Sturnus vulgaris). J Comp Psychol 100:356–360.
Fastl H, Stoll G (1979) Scaling of pitch strength. Hear Res 1:293–301.
Fishman YI, Reser DH, Arezzo JC, Steinschneider M (1998) Pitch vs. spectral encoding of harmonic complex tones in primary auditory cortex of the awake monkey. Brain Res 786:18–30.
Fishman YI, Reser DH, Arezzo JC, Steinschneider M (2000) Complex tone processing in primary auditory cortex of the awake monkey. II. Pitch versus critical band representation. J Acoust Soc Am 108:247–262.
Greenwood DD, Maruyama N (1965) Excitatory and inhibitory response areas of auditory neurons in the cochlear nucleus. J Neurophysiol 28:863–892.
Heffner H, Whitfield IC (1976) Perception of the missing fundamental by cats. J Acoust Soc Am 59:915–919.
Houtsma AJM (1984) Pitch salience of various complex sounds. Music Percept 1:296–307.
Houtsma AJ, Wicke RW, Ordubadi A (1980) Pitch of amplitude-modulated low-pass noise and predictions by temporal and spectral theories. J Acoust Soc Am 67:1312–1322.
Kadia SC, Wang X (2003) Spectral integration in A1 of awake primates: neurons with single- and multipeaked tuning characteristics. J Neurophysiol 89:1603–1622.
Kaernbach C, Demany L (1998) Psychophysical evidence against the autocorrelation theory of auditory temporal processing. J Acoust Soc Am 104:2298–2306.
Katsuki Y, Sumi T, Uchiyama H, Watanaba T (1958) Electric responses of auditory neurons in cat to sound stimulation. J Neurophysiol 21:569–588.
Langner G, Sams M, Heil P, Schulze H (1997) Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetoencephalography. J Comp Physiol A 181:665–676.
Liang L, Lu T, Wang X (2002) Neural representations of sinusoidal amplitude and frequency modulations in the primary auditory cortex of awake primates. J Neurophysiol 87:2237–2261.
Nelken I, Prut Y, Vaddia E, Abeles M (1994a) Population responses to multifrequency sounds in the cat auditory cortex: one- and two-parameter families of sounds. Hear Res 72:206–222.
Nelken I, Prut Y, Vaddia E, Abeles M (1994b) Population responses to multifrequency sounds in the cat auditory cortex: four-tone complexes. Hear Res 72:223–236.
Pantev C, Hoke M, Lutkenhoner B, Lehnertz K (1989) Tonotopic organization of the auditory cortex: pitch versus frequency representation. Science 246:486–488.
Pantev C, Elbert T, Ross B, Eulitz C, Terhardt E (1996) Binaural fusion and the representation of virtual pitch in the human auditory cortex. Hear Res 100:164–170.
Patel AD (2003) Language, music, syntax and the brain. Nat Neurosci 6:674–681.
Plomp R (1967) Pitch of complex tones. J Acoust Soc Am 41:1526–1533.
Qin L, Sato Y (2004) Suppression of auditory cortical activities in awake cats by pure tone stimuli. Neurosci Lett 365:190–194.
Qin L, Kitama T, Chimoto S, Sakayori S, Sato Y (2003) Time course of tonal frequency-response-area of primary auditory cortex neurons in alert cats. Neurosci Res 46:145–152.
Qin L, Sakai M, Chimoto S, Sato Y (2004a) Spectral-edge sensitivity of primary auditory cortex neurons in alert cats. Brain Res 1014:1–13.
Qin L, Chimoto S, Sakai M, Sato Y (2004b) Spectral-shape preference of primary auditory cortex neurons in awake cats. Brain Res 1024:167–175.
Reale RA, Imig TJ (1980) Tonotopic organization in auditory cortex of the cat. J Comp Neurol 192:265–291.
Renken R, Wiersinga-Post JE, Tomaskovic S, Duifhuis H (2004) Dominance of missing fundamental versus spectrally cued pitch: individual differences for complex tones with unresolved harmonics. J Acoust Soc Am 115:2257–2263.
Ritsma RJ (1967) Frequencies dominant in the perception of the pitch of complex sounds. J Acoust Soc Am 42:191–198.
Schulze H, Hess A, Ohl FW, Scheich H (2002) Superposition of horseshoe-like periodicity and linear tonotopic maps in auditory cortex of the Mongolian gerbil. Eur J Neurosci 15:1077–1084.
Schulze H, Langner G (1997) Periodicity coding in the primary auditory cortex of the Mongolian gerbil (Meriones unguiculatus): two different coding strategies for pitch and rhythm? J Comp Physiol A 181:651–663.
Schwarz DW, Tomlinson RW (1990) Spectral response patterns of auditory cortex neurons to harmonic complex tones in alert monkey (Macaca mulatta). J Neurophysiol 64:282–298.
Shouten JF (1962) Pitch of the residue. J Acoust Soc Am 34:1418–1424.
Sutter ML, Schreiner CE (1991) Physiology and topography of neurons with multipeaked tuning curves in cat primary auditory cortex. J Neurophysiol 65:1207–1226.
Terhardt E (1974) Pitch, consonance, and harmony. J Acoust Soc Am 55:1061–1069.
Terhardt E (1979) Calculating virtual pitch. Hear Res 1:155–182.
Tomlinson RW, Schwarz DW (1988) Perception of the missing fundamental in nonhuman primates. J Acoust Soc Am 84:560–565.
Warren JD, Uppenkamp S, Patterson RD, Griffiths TD (2003) Separating pitch chroma and pitch height in the human brain. Proc Natl Acad Sci USA 100:10038–10042.
Whitfield IC (1980) Auditory cortex and the pitch of complex tones. J Acoust Soc Am 67:644–647.
Zatorre RJ (1988) Pitch perception of complex tones and human temporal-lobe function. J Acoust Soc Am 84:566–572.
Zhang LI, Tan AY, Schreiner CE, Merzenich MM (2003) Topography and synaptic shaping of direction selectivity in primary auditory cortex. Nature 424:201–205.