Humans and some other species can nonlinguistically operate on the quantities of things or events, including sounds. Whether this ability is restricted to conscious percepts of sounds, which take ∼200 ms to develop, is, however, unclear. To address this question, we recorded the mismatch negativity (MMN) brain response, an index of preperceptual auditory change detection, from adult humans who passively listened to rare sequences of four 50-ms tones (“deviants”) interspersed among frequently repeated sequences (“standards”). Each tone was either 1000 or 1500 Hz in frequency. Deviants differed from standards in the ratio of the tones of the 2 frequencies. MMN was found for deviants by 160 ms from the onset of their largest ratio difference from standards (2:2 vs. 4:0), suggesting some ability of the human brain to operate on the number of sequential sounds of specific frequencies at a preperceptual time scale.
Humans, like some other species, can operate on approximate numbers (as quantities) of things or events without the support of language (Dehaene et al. 1998; Gallistel and Gelman 2000; Gordon 2004). The speed of the related neural computations remains, however, largely unclear. One of the key issues in this respect is whether items must be fully perceived before their quantities can be operated on. Here, using the mismatch negativity (MMN, Näätänen 1990) of the event-related potential (ERP) of the human brain, we tackled this question regarding sequential sounds of specific frequencies.
The percept of a discrete, and thereby easily countable, auditory event takes ∼200 ms to develop (Cowan 1984; Näätänen and Winkler 1999). MMN, in turn, can be recorded at ∼100 to 200 ms following an auditory event (“deviant”) that violates some invariant attribute of previous auditory events (“standards”) (Näätänen 1990; Näätänen et al. 2001). It is held to reflect the preattentive detection, albeit not necessarily conscious perception, of deviants. Consistently, MMN-eliciting events can be composed of a number of tones of specific frequencies sequenced at intervals of just some tens of milliseconds (e.g., Schröger et al. 1992), tones that cannot contribute to MMN as discrete conscious percepts.
We recorded MMN from adult humans, who passively listened to 200-ms sequences of 4 consecutive tones, each of which could be of 1 of the 2 alternative frequencies. Each series of the sequences was divided into rare sequences (deviants) and frequently repeated sequences (standards). In the main series, deviants differed from standards in the ratio of the tones of the 2 frequencies. In the control series, the 2 frequencies themselves distinguished deviants from standards. The control series were simply aimed at demonstrating MMN per se (Näätänen 1990) in order to be able to attribute its possible absence in the main series to numerical rather than elementary aspects of auditory processing.
Materials and Methods
Twelve adults (5 males and 7 females) between 19 and 37 years of age participated in the experiment. Participants provided their informed consent after the nature of the experiment was explained to them. During the experiment, they were comfortably seated in an electrically shielded and acoustically attenuated chamber. The participants were instructed to ignore the sounds and to concentrate on a silent, subtitled movie playing on a video monitor.
Sequences of 4 consecutive tones of 50 ms in duration (including 10-ms onset/offset ramps) and 50 dB (sound pressure level) in intensity were emitted above the subject's hearing level as defined by the staircase method (Fig. 1a). Each tone of a sequence was either 1000 or 1500 Hz in frequency. In each series, 1000 sequences were presented at a stimulus onset asynchrony (SOA) of 500 ms. The order of the series was counterbalanced across the participants. The sequences were divided into those of low (deviants, P = 0.1) and those of high (standards, P = 0.9) sequential probability (Fig. 1b and c).
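The stimulus parameters above imply a few derived quantities that are not stated explicitly. The following minimal Python sketch works them out. The variable names are illustrative, and the deviant count is only the expected value under P = 0.1, since the exact number of deviants per series is not reported.

```python
# Stimulus parameters as given in the text.
TONE_DURATION_MS = 50
TONES_PER_SEQUENCE = 4
SOA_MS = 500
SEQUENCES_PER_SERIES = 1000
P_DEVIANT = 0.1

# Four consecutive 50-ms tones form one 200-ms sequence.
sequence_duration_ms = TONE_DURATION_MS * TONES_PER_SEQUENCE

# Silent interval between the offset of one sequence and the onset of the next.
silent_gap_ms = SOA_MS - sequence_duration_ms

# Duration of one series, in seconds.
series_duration_s = SEQUENCES_PER_SERIES * SOA_MS / 1000

# Expected (not reported) numbers of deviants and standards per series.
expected_deviants = round(SEQUENCES_PER_SERIES * P_DEVIANT)
expected_standards = SEQUENCES_PER_SERIES - expected_deviants

print(sequence_duration_ms, silent_gap_ms, series_duration_s, expected_deviants)
```

Under these parameters, each series lasts 500 s (a little over 8 min), with a 300-ms silent interval between successive sequences.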
Figure 1b illustrates the sequences used in the experiment. There were 3 sets of sequences for 3 different ratios of the tones of the 2 frequencies (4:0, 2:2, and 3:1) from which individual sequences were pseudorandomly (equiprobably) assigned to standards and deviants in the 4 main series (S2:2 D4:0, S3:1 D4:0, S2:2 D3:1, and S3:1 D2:2). The resulting nonsystematic ordering of the tones over their different sequences in the series prevented deviants from being detected as mere alterations in a constant melodic pattern of standards (Schröger et al. 1992). Rather, a given number of tones of a particular sequence had to occur until this detection could be made. More explicitly, for the S2:2 D4:0 series, for example, the first and the second tone of the D4:0 sequence could still belong to the S2:2 sequence (as only 2 identical tones had occurred thus far), but not the third identical tone (which could only match the 3:1 or the 4:0 ratio). In the 2 control series (A and B), 1 sequence (1000 or 1500 Hz) was assigned to standards and the other to deviants.
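The prefix argument above can be checked by enumeration. The sketch below is a hedged illustration rather than the procedure used in the study: it assumes that each ratio set contains every ordering of the tones and both assignments of the 2 frequencies (e.g., 3:1 covers three 1000-Hz tones plus one 1500-Hz tone and the reverse), which Figure 1b may or may not show exactly. The function names are hypothetical.

```python
from itertools import product


def sequences_with_ratio(ratio, length=4):
    """All orderings of L (1000 Hz) and H (1500 Hz) tones whose counts match
    `ratio`, ignoring which frequency is the more numerous one (assumption)."""
    wanted = sorted(ratio)
    return [
        "".join(seq)
        for seq in product("LH", repeat=length)
        if sorted((seq.count("L"), seq.count("H"))) == wanted
    ]


def earliest_deviance(standard_ratio, deviant_ratio):
    """Earliest tone position (1-based) at which some deviant sequence can no
    longer be the beginning of any standard sequence."""
    standards = sequences_with_ratio(standard_ratio)
    positions = []
    for dev in sequences_with_ratio(deviant_ratio):
        for n in range(1, len(dev) + 1):
            if not any(std.startswith(dev[:n]) for std in standards):
                positions.append(n)
                break
    return min(positions)


for name, std, dev in [
    ("S2:2 D4:0", (2, 2), (4, 0)),
    ("S3:1 D4:0", (3, 1), (4, 0)),
    ("S2:2 D3:1", (2, 2), (3, 1)),
    ("S3:1 D2:2", (3, 1), (2, 2)),
]:
    print(name, earliest_deviance(std, dev))
```

Under these assumptions, the earliest distinguishing tone is the third for S2:2 D4:0 and S2:2 D3:1 and the fourth for S3:1 D4:0 and S3:1 D2:2. Note that in the S2:2 D3:1 series this minimum is reached only by deviants that begin with 3 identical tones; the other 3:1 orderings deviate at the fourth tone.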
The electroencephalogram was recorded (NeuroScan software, NeuroScan Co., El Paso, Texas) with Ag-AgCl electrodes from the F3, Fz, F4, C3, Cz, C4, P3, Pz, and P4 scalp sites according to the 10–20 system. Eye-related activity was monitored with electrodes positioned at the outer canthus of and below the right eye. The signals from the electrodes, referenced to the electrode at the tip of the nose, were amplified, band-pass filtered (0.1–40 Hz, 24 dB per octave roll off), and digitized at a 200-Hz sampling rate.
ERPs were averaged for each subject, series, and stimulus type (deviants and those standards that immediately preceded the deviants) and were corrected relative to the baseline (50-ms prestimulus mean amplitude). ERPs exceeding ±150 μV at any recording channel were rejected from the averages. Subsequently, we measured the mean amplitudes during the 60-ms period starting at 100 ms from the onset of the earliest tone of a sequence distinguishing deviants from standards. Depending on the series, this onset was of the first (control series), third (S2:2 D4:0 and S2:2 D3:1), or fourth (S3:1 D4:0 and S3:1 D2:2) tone of a sequence.
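The measurement procedure can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: it assumes that each epoch is a list of amplitudes in μV from one channel, sampled at 200 Hz, and that it begins 50 ms before the onset of the sequence (so that the baseline precedes the first tone). All function names are hypothetical.

```python
SAMPLING_RATE_HZ = 200      # 5 ms per sample
TONE_DURATION_MS = 50
BASELINE_MS = 50            # assumed to precede the onset of the sequence
WINDOW_START_MS = 100       # measured from the onset of the deviance tone
WINDOW_LENGTH_MS = 60
REJECTION_LIMIT_UV = 150


def ms_to_samples(ms):
    return ms * SAMPLING_RATE_HZ // 1000


def measurement_window_ms(deviance_tone):
    """Window limits in ms from sequence onset; `deviance_tone` is 1-based."""
    tone_onset = (deviance_tone - 1) * TONE_DURATION_MS
    start = tone_onset + WINDOW_START_MS
    return start, start + WINDOW_LENGTH_MS


def is_rejected(epoch):
    """True if any sample of this channel exceeds the ±150 μV limit."""
    return any(abs(value) > REJECTION_LIMIT_UV for value in epoch)


def mean_window_amplitude(epoch, deviance_tone):
    """Baseline-corrected mean amplitude in the 60-ms measurement window."""
    n_base = ms_to_samples(BASELINE_MS)
    baseline = sum(epoch[:n_base]) / n_base
    start_ms, end_ms = measurement_window_ms(deviance_tone)
    window = epoch[n_base + ms_to_samples(start_ms):n_base + ms_to_samples(end_ms)]
    return sum(window) / len(window) - baseline


for tone in (1, 3, 4):
    print(tone, measurement_window_ms(tone))
```

Relative to sequence onset, the windows are thus 100–160 ms when the first tone carries the deviance, 200–260 ms for the third tone, and 250–310 ms for the fourth tone. At 200 Hz, each window spans 12 samples and the baseline spans 10 samples.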
The resultant values were analyzed with a repeated measures analysis of variance with stimulus types (standard and deviant), laterality (left, middle, and right, respectively), and anterior–posterior (frontal, central, and parietal, respectively) as factors. An alpha level of 0.05 was used in all analyses. Greenhouse–Geisser–adjusted degrees of freedom for the averaged tests of significance were used whenever the sphericity assumption was violated; P values were reported accordingly.
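For orientation, the unadjusted degrees of freedom of the F tests in such a fully within-subject design follow from the numbers of factor levels and the 12 participants. The short sketch below illustrates this arithmetic only; the function name is hypothetical, and the Greenhouse–Geisser adjustment, which scales both values by an estimated epsilon, is not modeled.

```python
def rm_anova_df(factor_levels, n_subjects):
    """Unadjusted (effect, error) degrees of freedom for a main effect or an
    interaction in a fully within-subject design.

    `factor_levels` lists the number of levels of each factor involved in
    the effect, e.g. [2] for stimuli or [2, 3] for stimuli x laterality.
    """
    effect_df = 1
    for levels in factor_levels:
        effect_df *= levels - 1
    error_df = effect_df * (n_subjects - 1)
    return effect_df, error_df


print(rm_anova_df([2], 12))        # main effect of stimuli
print(rm_anova_df([2, 3], 12))     # stimuli x anterior-posterior (or laterality)
print(rm_anova_df([2, 3, 3], 12))  # three-way interaction
```

With 12 participants, the main effect of stimuli is therefore tested with 1 and 11 degrees of freedom and its two-way interactions with 2 and 22.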
Results
MMN was elicited in the control series (Controls A and B in Fig. 1b). The average ERP amplitude was shifted toward negative polarity for deviants relative to standards between 100 and 160 ms from deviance onset (Fig. 2), as shown by the significant main effect of stimuli (standard and deviant: Control A, F1,11 = 33.70, P < 0.001; Control B, F1,11 = 11.22, P < 0.01). The frontocentral scalp distribution of this shift was further indicated by the significant interactions of the main effect of stimuli with the main effect of anterior–posterior (frontal, central, and parietal: Control A, F2,22 = 13.50, P < 0.01; Control B, F2,22 = 26.11, P < 0.001) and of laterality (left, middle, and right: Control A, F2,22 = 9.60, P < 0.01; Control B, F2,22 = 8.64, P < 0.01). The stimuli × anterior–posterior × laterality interaction was not significant.
MMN was also elicited in one of the main series, in which it was the ratio of the tones of the 2 frequencies that distinguished deviants from standards. Again, the average ERP amplitude was shifted toward negative polarity for the deviant 4:0 ratio relative to the standard 2:2 ratio, especially in frontal leads, between 100 and 160 ms from the onset of deviance (series S2:2 D4:0 in Figs. 1b and 2). This was indicated by a significant stimuli × anterior–posterior interaction (F2,22 = 6.43, P < 0.05), whereas the main effect of stimuli was not significant.
However, no MMN could be found with the deviant 4:0 ratio against the standard 3:1 ratio (series S3:1 D4:0 in Figs. 1b and 2). There was no significant stimuli × anterior–posterior interaction (F2,22 = 0.90, P = 0.4), and there was only a trend for the main effect of stimuli (F1,11 = 3.51, P = 0.088). The same held true with the deviant 2:2 ratio against the standard 3:1 ratio, and vice versa (series S3:1 D2:2 and S2:2 D3:1 in Figs. 1b and 2). In neither case was the main effect of stimuli or its interaction with the other main effects significant.
Discussion
As expected, we found a robust frontocentral ERP deflection of negative polarity at 100–160 ms poststimulus, most likely MMN (Näätänen 1990), for passively heard deviants that differed from standards in their frequency (control series).
However, when deviants differed from standards in the ratio of the tones of the 2 frequencies, MMN was found only when this difference was large (S2:2 D4:0). With the small ratio differences (series S3:1 D4:0, S3:1 D2:2, and S2:2 D3:1), no MMN was observed, presumably because the fast pace of the tones made them difficult to quantify. After all, although one may easily discriminate sets of 4 seen or heard items from the respective sets of 3 (Dehaene et al. 1998; Gallistel and Gelman 2000; Gordon 2004), a fast pace of sounds makes them difficult to count even as general auditory activations (Garner 1951).
The standard 2:2 ratio was of 2 tones of each frequency, so already the third tone of a sequence made the deviant 4:0 ratio different from this ratio. This was because the ratio 2:2 could not include 3 identical tones, only 2. But the third tone alone was insufficient to actually elicit MMN. The next, fourth tone had to play a role also, given that no MMN was obtained for a difference of only one tone of each frequency between the deviant (4:0) and the standard (3:1) ratio.
MMN was observed by 160 ms from the onset of the earliest tone of a sequence distinguishing the deviant 4:0 from the standard 2:2 ratio, namely, the third one. Thus, in this short time period, a whole set of neural tasks was accomplished. The third tone was processed in the context of the 2 preceding tones to establish a neural correlate of the 3 tones of one frequency or of the absence of tones of the other frequency, or both. One or both of these correlates, in turn, were compared with their transient memory (standard) counterparts (for 2 tones of one frequency or one tone of the other frequency, or both), and MMN was initially triggered as a result of a mismatch between the two.
But how were the numbers of the tones of the 2 frequencies per sequence actually reflected in the brain? The involvement of a visual (i.e., nonauditory) object-tracking mechanism (Kahneman et al. 1992) can obviously be excluded. Instead, a so-called accumulator mechanism could have automatically converted these numbers to continuous magnitudes (Meck and Church 1983). This is not the only possible explanation, however. The numbers could also have paralleled a frequency average (Cowan 1984) or a frequency variability (Wolff and Schröger 2001) across the tones of the 2 frequencies at preperceptual auditory storage (Näätänen and Winkler 1999). The frequency average, however, would also have reflected the cumulative durations of these tones that, in addition to their numbers, could have recruited the accumulator mechanism to process them (Meck and Church 1983).
Finally, there was only 1 frequency present in the deviant 4:0 ratio, whereas 2 frequencies were present in the standard 2:2 ratio. This implies that the MMN-eliciting changes might have been in the number of different frequencies per se rather than the number of tones of these frequencies. Further studies should therefore be conducted with higher total numbers of tones in a sequence to clarify this problem.
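To make these alternative cues concrete, the following sketch computes, for each ratio, the frequency average, the frequency variability, and the number of distinct frequencies across the 4 tones. It is an illustration under assumptions: the more numerous tone is taken to be 1000 Hz (the mirrored assignment gives symmetric values), variability is expressed as the population standard deviation, and the function name is hypothetical. Because all tones have the same duration, the plain average equals the duration-weighted one.

```python
from statistics import mean, pstdev

LOW_HZ, HIGH_HZ = 1000, 1500


def cue_summary(n_low, n_high):
    """Candidate cues carried by a 4-tone sequence with the given tone counts."""
    tones = [LOW_HZ] * n_low + [HIGH_HZ] * n_high
    return {
        "mean_hz": mean(tones),
        "sd_hz": pstdev(tones),
        "distinct_frequencies": len(set(tones)),
    }


for ratio in [(4, 0), (3, 1), (2, 2)]:
    print(ratio, cue_summary(*ratio))
```

Under these assumptions, the 2:2 and 4:0 ratios differ by 250 Hz in frequency average, whereas each contrast involving the 3:1 ratio differs by 125 Hz, and only the 4:0 ratio contains a single frequency. With 4-tone sequences, the tone ratio is therefore confounded with the frequency average, the frequency variability, and, for 4:0, the number of distinct frequencies.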
In conclusion, our findings suggest that the human brain possesses some ability to operate on the numbers of sequential sounds of specific frequencies, as yet incomplete in their conscious perceivability and thereby voluntary countability.
We thank Jenni Väre and Vesa Putkinen for their help in data acquisition and analysis. The study was supported by the Academy of Finland. Conflict of Interest: None declared.