Humans are endowed with an intuitive number sense that allows us to perceive and estimate numerosity without relying on language. It is controversial, however, as to whether there is a neural mechanism for direct perception of numerosity or whether numerosity is perceived indirectly via other perceptual properties. In this study, we used a novel regression-based analytic method, which allowed an assessment of the unique contributions of visual properties, including numerosity, to explain visual evoked potentials of participants passively viewing dot arrays. We found that the human brain is uniquely sensitive to numerosity and more sensitive to changes in numerosity than to changes in other visual properties, starting extremely early in the visual stream: 75 ms over a medial occipital site and 180 ms over bilateral occipitoparietal sites. These findings provide strong evidence for the existence of a neural mechanism for rapidly and directly extracting numerosity information in the human visual pathway.
Basic numerical capacities constitute one of the core human knowledge systems upon which novel representations and skills are built (Spelke and Kinzler 2007; Carey 2009). In particular, the ability to approximately estimate numerosity (i.e., the cardinal value of a set of items) without relying on language is thought to be based on an a priori Kantian intuition (Gallistel and Gelman 1992; Dehaene 1999). Nevertheless, the way in which the human cognitive neural system perceives and processes numerosity, which has typically been tested with visually presented dot arrays (see Fig. 1A), remains controversial. On the one hand, it is argued that the primate neural system may be hard-wired to process numerosity directly, just as it would process any other perceptual category (Ross 2003; Burr and Ross 2008; Viswanathan and Nieder 2013; Anobile et al. 2014). An alternate view is that numerosity is perceived indirectly using information input from the perceptual processing of other visual properties (Durgin 2008; Dakin et al. 2011; Gebuis and Reynvoet 2013).
It is not trivial to distinguish these alternative hypotheses, as changes in numerosity covary with changes in various other intensive variables (visual properties pertaining to individual items, such as individual item area [IA]) and extensive variables (visual properties pertaining to the set of items, such as total item area [TA]) (S. Dehaene et al. 2005, unpublished data) (see Fig. 1B). That is, it is not possible to isolate the effect of numerosity while holding all other visual properties constant. Accordingly, it is difficult to interpret whether a neural signature associated with numerosity arises directly from the numerosity itself or indirectly from a combination of other perceptual properties that vary with numerosity (e.g., TA divided by IA).
In the present study, we used a newly developed stimulus design and analytic method paired with the high temporal resolution of event-related potential (ERP) recordings to test whether there is neural evidence for direct perception of numerosity. Specifically, dot arrays were systematically constructed to cover equal ranges of various visual properties. Furthermore, although a number of properties were varied, all of these properties could be represented as linear combinations of 3 orthogonal dimensions (see Fig. 1B and Materials and Methods). A linear model then quantitatively assessed the influence of variations in these visual properties on the ERPs of participants viewing these dot arrays. If numerosity information is encoded indirectly, such as by building off neural analyses of other visual properties (e.g., IA and TA), then the ERPs should be uniquely sensitive to these other visual properties at processing latencies earlier than when they show sensitivity to numerosity. In contrast, if numerosity is encoded directly, or at least in parallel with other visual properties, then the ERPs should be uniquely sensitive to numerosity at the same time as, or before, they show sensitivity to other stimulus properties. The results from 2 independent experiments demonstrate that ERP activity at extremely early latencies is more sensitive to numerosity than to other basic visual properties, suggesting that there is a mechanism by which the human brain can rapidly and directly extract numerosity in the visual stream.
Materials and Methods
A total of 46 participants initially participated in Experiment 1. Two participants were excluded from further analyses as they were unable to stay awake throughout the experiment (eyes closed and head leaned back, as observed by a video camera), and another participant was excluded due to equipment failure. The final sample consisted of 43 participants (all right-handed, 18 male, 18.1–25.0 years old with a mean of 20.2). Participants provided written informed consent to a protocol approved by the Duke University Institutional Review Board.
Visual stimuli were white dot arrays presented on a black background that consisted of 8, 11, 16, 23 or 32 dots (5 equidistant levels of numerosity, N, in a log-base2 scale). A unique dot array was generated for each trial using a custom algorithm, which drew nonoverlapping dots within an invisible circular field. The dot size was homogeneous within each dot array (see Fig. 1A). Individual item area (IA) refers to the area encompassed by a single dot in the array. Total item area (TA) refers to all the area encompassed by all the dots in the array considered together and is simply:
TA = IA × N (1)
The stimulus set was constructed so that these visual properties including numerosity were systematically varied in relation to one another in a log scale (see Fig. 1B). Logarithmic scaling is incorporated in this stimulus parameter space for 2 important reasons. Theoretically, perception is proportional to the logarithmic scaling of stimulus intensity as established by the Weber–Fechner law. Practically, logarithmic scaling allows the iso-numerosity lines (dotted cyan lines in Fig. 1B) to be parallel, which makes it possible to use a linear model to decompose the parameter space. Taking, for example, Equation (1) into a log (base 2) scale:
log(TA) = log(IA) + log(N) (2)
Now considering that TA is orthogonal to IA (i.e., with no constraints on N, both TA and IA can be independently varied), then an orthogonal dimension to log(N), which we call the “size in area (SzA)” dimension, can be defined as follows:
log(SzA) = log(TA) + log(IA) (3)
SzA is the dimension that changes the overall area of the dots when N is held constant (Fig. 1B). That is, when numerosity (N) is held constant and IA is varied by some scaling factor, then TA must be varied by the same scaling factor. This novel dimension representing the size of the dots “independent” of numerosity is SzA. As can be seen above, log scaling provides the practical benefit of linearizing the parameters, thus allowing the definition of this novel dimension orthogonal to number.
Sp is the dimension that changes the overall spacing of the dots when N is held constant (Fig. 1B). That is, when numerosity (N) is held constant and field area (FA), the area of the field within which the dots are drawn, is varied by some scaling factor, then sparsity (Spar) must be varied by the same scaling factor. Thus, this novel dimension, Sp, represents the spacing of the dots “independent” of numerosity.
Note that in our definition of the variables IA, TA, FA, and Spar, the variables that compose SzA (i.e., IA and TA) can be manipulated independently from the variables that compose Sp (i.e., FA and Spar). That is, α and β in Figure 1B can be completely independent. For example, say our goal is to construct an array with 8 dots (N = 8). Once IA is determined as α, TA gets automatically determined as 8α. Nevertheless, we can decide to draw the 8 dots within a very small or very large FA, which will automatically determine the Spar. Thus, log(SzA) and log(Sp) are also orthogonal to each other, resulting in 3 orthogonal dimensions, log(N) ⊥ log(SzA) ⊥ log(Sp), capturing a 3D parameter space. Thus, the axes illustrated in Figure 1B are 2 projection views of a 3D parameter space (see Supplementary Fig. 1 for an illustration of the exemplary dot patterns in the 3D parameter space). For simplicity, in the remainder of the paper and the figures, the stimulus parameters (e.g., N, IA, TA, FA and Spar) refer to their log transformed values unless otherwise noted.
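The relations among these variables can be sketched in a few lines of Python. This is only an illustration, not the stimulus-generation algorithm used in the study. The function name `log_coordinates` and the example values are hypothetical, and the definitions Spar = FA / N and Sp = FA × Spar are assumed here by analogy with TA = IA × N and SzA = TA × IA.

```python
import math

def log_coordinates(n, ia, fa):
    """Return (log2 N, log2 SzA, log2 Sp) for one dot array.

    Assumed definitions (illustrative): TA = IA * N, SzA = TA * IA,
    Spar = FA / N, Sp = FA * Spar.
    """
    ta = ia * n          # total item area
    spar = fa / n        # sparsity: field area per dot (assumption)
    return (math.log2(n), math.log2(ta * ia), math.log2(fa * spar))

base = log_coordinates(8, 100.0, 40000.0)
bigger_dots = log_coordinates(8, 200.0, 40000.0)   # IA doubled; N and FA fixed
wider_field = log_coordinates(8, 100.0, 80000.0)   # FA doubled; N and IA fixed

# Coordinate-wise differences from the baseline array.
delta_size = tuple(b - a for a, b in zip(base, bigger_dots))
delta_spacing = tuple(b - a for a, b in zip(base, wider_field))
```

Under these assumptions, doubling IA at fixed N moves the array only along log(SzA), and doubling FA at fixed N moves it only along log(Sp), which is the sense in which the three dimensions can be manipulated independently.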
Our stimulus set systematically covered all ranges of the given visual parameters, as illustrated by orange points in Figure 1B, with the maximum values of each of these visual properties being 4 times as large as their minimum values. This design yielded a total of 17 different stimulus conditions: 4 subconditions for each of the numerosities 8, 11, 23, and 32, and 1 condition for numerosity 16. The minimum IA was ∼78.5 pixel² encompassing 0.21° visual angle (10 pixels) in diameter, and the maximum IA was ∼314.2 pixel² encompassing 0.42° visual angle (20 pixels) in diameter. The minimum FA was ∼25 447 pixel² encompassing 3.74° visual angle (180 pixels) in diameter, and the maximum FA was ∼101 787 pixel² encompassing 7.48° visual angle (360 pixels) in diameter.
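The stimulus dimensions quoted above follow from simple geometry, which the short check below reproduces. It is an arithmetic sketch only; the helper name `disc_area` is hypothetical, and circular dots and a circular field are assumed, as described in the text.

```python
import math

def disc_area(diameter_px):
    """Area in square pixels of a disc with the given diameter in pixels."""
    return math.pi * (diameter_px / 2) ** 2

min_ia, max_ia = disc_area(10), disc_area(20)     # individual item area
min_fa, max_fa = disc_area(180), disc_area(360)   # field area

# Five numerosity levels equally spaced on a log2 axis between 8 and 32.
levels = [round(2 ** (3 + 0.5 * k)) for k in range(5)]

# 4 subconditions for each of 4 numerosities, plus 1 condition for N = 16.
n_conditions = 4 * 4 + 1
```

Doubling the diameter quadruples the area, which matches the 4-fold range between the minimum and maximum values of each property.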
Task and Procedure
Each participant completed 4 experimental blocks. In each block, participants passively viewed dot arrays presented around the center of the screen for 200 ms, with stimulus onset asynchronies varying between 700 and 900 ms (random selection from a uniform distribution). Each block consisted of 400 unique arrays. To ensure that subjects paid attention to the stimuli, an oddball detection task was employed. Specifically, the participants were instructed to press a button when the dot array was displayed in red (5% of trials). Oddball trials were not analyzed. Participants used their left index finger to respond for 2 blocks and their right to respond for the other 2. The finger order was counterbalanced across participants. A fixation dot appeared on the center of the screen between stimuli. The hit rate for the detection of the oddball target was 97.2% with median reaction time of 436 ms.
Electrophysiological Recording and Analysis
The electroencephalogram (EEG) was recorded continuously from 64 channels mounted in a customized, elastic electrode-cap (Duke64 Waveguard cap layout, Advanced Neuro Technology, the Netherlands) at a sampling rate of 512 Hz, a low-pass filter with a high-frequency cutoff at 138 Hz, and an online averaged reference. Our custom cap is designed such that the electrodes are equally spaced across the cap, while also providing extended coverage of the head from just above the eyebrows anteriorly to below the inion posteriorly (Woldorff et al. 2002). The ground electrode was placed on the left collarbone, and the electrooculogram (EOG) was monitored with electrodes below the left eye, and slightly lateral to each external canthus. Electrode impedances were maintained below 10 kΩ for the EOG channels and 5 kΩ for all other channels.
Event-related potential analyses were carried out using the EEGLAB (Delorme and Makeig 2004) and its ERPLAB toolbox (http://www.erpinfo.org/erplab/erplab-toolbox) in Matlab R2012a. The continuous EEG data were offline band-pass filtered to 0.01–100 Hz. EEG epochs time-locked to the onset of the dot arrays were extracted from 200 ms before to 600 ms after stimulus onset, to which a prestimulus baseline removal was applied. A step-like artifact rejection tool in EEGLAB (threshold = 30 μV; window width = 400 ms; window step = 20 ms) was used to identify any epochs contaminated by eye movements or blinks, which were then removed (on average, 20.4% of trials were rejected). The epochs were then averaged for each stimulus condition. Before the grand average ERPs were computed (see, e.g., Figs 1C and 2), individual ERPs were low-pass-filtered at 30 Hz. No low-pass filter was applied before the subsequent mixed-effect regression analyses (see the section Regression Analyses).
A linear mixed-effect model was used to assess the contribution of each of the visual properties to the neural activity. Each participant's mean ERP amplitudes in a 100-ms time window (a time window of 100 ms was used to minimize potential alpha noise variations) around each given time point for each of the 17 stimulus conditions were extracted from the data, and these were then entered as the response in this model. Then, 3 orthogonal regressors capturing N, SzA and Sp were entered as fixed-effect parameters. A positive (negative) change of one unit in any of these regressors corresponded to a doubling (halving) in the corresponding dimension. Subject was entered as a random effect, which allowed an additive random effect in all the parameters for each subject. This mixed-effect model was fitted to each time window (72 successive 100-ms time windows, the first spanning −200 to −100 ms, advancing in steps of 10 ms) across all 64 channels.
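The fixed-effects part of such a model can be illustrated with ordinary least squares. The sketch below is a simplification and not the mixed-effect model used in the study: it omits the random effect of subject, and the toy 2 × 2 × 2 design, the function name `orthogonal_ols`, and the simulated slopes are all hypothetical. It relies on the fact that, for zero-mean and mutually orthogonal regressors, each slope can be estimated independently of the others.

```python
from itertools import product

def orthogonal_ols(design, response):
    """Least-squares intercept and slopes for zero-mean, mutually orthogonal regressors.

    With such regressors, slope_j = sum(x_j * y) / sum(x_j * x_j),
    and the intercept is the mean of the response.
    """
    n_regressors = len(design[0])
    slopes = []
    for j in range(n_regressors):
        column = [row[j] for row in design]
        numerator = sum(x * y for x, y in zip(column, response))
        denominator = sum(x * x for x in column)
        slopes.append(numerator / denominator)
    intercept = sum(response) / len(response)
    return intercept, slopes

# Toy design: every combination of -1/+1 on (N, SzA, Sp), in log2 units.
design = list(product([-1.0, 1.0], repeat=3))

# Simulated noise-free amplitudes driven mostly by N (hypothetical values).
true_slopes = (0.8, 0.02, -0.09)
response = [1.0 + sum(b * x for b, x in zip(true_slopes, row)) for row in design]

intercept, slopes = orthogonal_ols(design, response)
```

Because one unit on each regressor is a doubling of the corresponding dimension, the three slopes are directly comparable in magnitude.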
For the interpretation of the results, first, the significance of the effects of the overall model was assessed by comparing this full model including a constant and 3 orthogonal regressors (N, SzA, and Sp) against a constant-only null model using a likelihood ratio test. This model comparison resulted in a chi-square statistic at each of the 72 time windows across all the channels (see Fig. 3). Topographic distributions of the fixed-effect parameter estimates of N, SzA, and Sp (βN, βSzA, and βSp, respectively) were also plotted (see Fig. 3). These parameter estimates and their estimated covariance matrix were also used to plot the parameter estimate vector β = (βN, βSzA, βSp) and its confidence regions in the parameter space (Fig. 5).
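The logic of the likelihood ratio test can be sketched for the simpler case of nested Gaussian linear models without random effects. This is an assumption-laden illustration rather than the study's computation: a mixed-effect model has a different likelihood, and the function names below are hypothetical. For 3 added regressors, the statistic is referred to a chi-square distribution with 3 degrees of freedom, whose upper tail has a closed form.

```python
import math

def lr_statistic(rss_null, rss_full, n_obs):
    """Likelihood ratio statistic for nested Gaussian linear models.

    Uses maximum-likelihood variance estimates, which gives
    2 * (loglik_full - loglik_null) = n * ln(RSS_null / RSS_full).
    """
    return n_obs * math.log(rss_null / rss_full)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical residual sums of squares for one channel and time window.
statistic = lr_statistic(rss_null=120.0, rss_full=100.0, n_obs=100)
p_value = chi2_sf_df3(statistic)
```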
One of the key research questions was to determine which of the candidate visual properties (e.g., N, TA, IA, FA, or Spar) best represents the direction of the parameter estimate vector β. Thus, the angle between β and the dimension for each property was computed. Furthermore, in order to test whether β lies statistically closer to one axis than to another axis, a bootstrapping approach was used to derive two-tailed P-values. Specifically, a bootstrapping sample (10 000 repetitions) of β for a particular latency window and site of interest was generated by running the linear mixed-effect model while randomly sampling participants with replacement. The observed angle difference (between the angle formed by β and one axis and the angle formed by β and another axis) was tested against the distribution of the angle difference computed from this null distribution.
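A participant-level bootstrap of an angle difference might look like the sketch below. Several things here are assumptions made for illustration: the group estimate is taken as the mean of per-participant estimate vectors rather than refitting a mixed-effect model, each axis is treated as an undirected line, the P-value is formed from the proportion of bootstrap differences on either side of zero, and the data are simulated. None of the names come from the study's code.

```python
import math
import random

def angle_to_axis_deg(vector, axis):
    """Angle in degrees between a vector and an undirected axis."""
    dot = abs(sum(v * a for v, a in zip(vector, axis)))
    norm = math.sqrt(sum(v * v for v in vector)) * math.sqrt(sum(a * a for a in axis))
    return math.degrees(math.acos(min(1.0, dot / norm)))

def mean_vector(vectors):
    """Component-wise mean of a list of 3D vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(3))

def bootstrap_angle_difference(subject_betas, axis_a, axis_b, n_boot=2000, seed=0):
    """Resample participants with replacement; return the differences and a two-tailed P."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(subject_betas) for _ in subject_betas]
        beta = mean_vector(sample)
        diffs.append(angle_to_axis_deg(beta, axis_a) - angle_to_axis_deg(beta, axis_b))
    below = sum(d < 0 for d in diffs) / n_boot
    return diffs, min(1.0, 2 * min(below, 1 - below))

# Simulated per-participant estimates, dominated by the N component.
rng = random.Random(1)
subject_betas = [(0.8 + rng.gauss(0, 0.3), rng.gauss(0, 0.3), -0.1 + rng.gauss(0, 0.3))
                 for _ in range(43)]
AXIS_N = (1.0, 0.0, 0.0)
AXIS_SPAR = (-1.0, 0.0, 1.0)   # assumed direction of Spar in (N, SzA, Sp) space
diffs, p_value = bootstrap_angle_difference(subject_betas, AXIS_N, AXIS_SPAR)
```

A P-value of exactly 0 from such a procedure should be reported as smaller than 1 divided by the number of repetitions.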
A new group of 52 participants participated in Experiment 2. Four of these participants were excluded from further analyses as they were unable to stay awake throughout the experiment, and another participant was excluded due to equipment failure. The final sample thus consisted of 47 participants (all right-handed, 22 male, 18.0–24.5 years old with a mean of 19.2). Participants provided written informed consent to a protocol approved by the Duke University Institutional Review Board.
The rest of the methods were identical to those of Experiment 1, except for the construction of the stimuli. The stimuli here were systematically sampled from a parameter space based on the perimeter of each dot instead of the area of each dot (Fig. 6A). That is, individual item perimeter (IP) and total perimeter (TP) were manipulated to have a 4-fold change from the smallest to the largest of their values. Note that this manipulation inevitably generates a 16-fold change in total area (TA) and individual area (IA). The minimum IP was ∼22.0 pixels encompassing 0.15° visual angle (7 pixels) in diameter, and the maximum IP was ∼88.0 pixels encompassing 0.58° visual angle (28 pixels) in diameter. This design enabled the construction of a dimension orthogonal to numerosity, which we call size in perimeter (SzP). This novel dimension can be interpreted as the dimension that changes both the individual perimeter and the summed perimeter of the array when N is held constant. Artifact rejection resulted in an average of 21.8% of trials being discarded. The hit rate for the detection of the oddball target was 99.1% with median reaction time of 410 ms.
In Experiment 1, participants passively viewed dot arrays (see Fig. 1A) that consisted of 8, 11, 16, 23, or 32 dots while their brainwaves were recorded. We first evaluated the grand-averaged ERPs collapsed across all the stimuli (Fig. 1C). The corresponding topographic map showed prominent bilateral, positive-polarity, parieto-occipital peaks around 220 ms after stimulus onset. These peaks were at electrodes closest but slightly (∼0.14 radians) inferior to PO7 and PO8 in the standard 10–20 system (henceforth referred to as PO7i and PO8i, respectively). The latency and the location of these peaks much resemble an ERP component previously associated with numerical distances, which has been referred to as the P2p (Dehaene 1996; Temple and Posner 1998; Libertus et al. 2007; Hyde and Spelke 2008). Thus, these bilateral peaks may indicate encoding of some critical dot-array information.
Modulation of the ERPs to Changes in Numerosity
To assess the effect of variations of each of the visual properties on the ERPs, particularly over these bilateral occipital sites at around 220 ms, the brainwaves from channels PO7i and PO8i were sorted as a function of the various stimulus properties (Fig. 2). First, the ERP activity showed a systematic gradient when they were sorted along the numerosity (N) dimension (see Fig. 2A): The lowest numerosity elicited the smallest absolute positive-polarity ERP amplitude in the range between the first negative deflection (N1) and the second positive deflection (P2), whereas the highest numerosity elicited the largest positive-polarity ERP amplitude in that range. The grand average of the linear contrast (contrast coefficients of −2, −1, 0, 1, and 2) along N (the green waveforms in Fig. 2A) showing a positive-polarity peak at around 220 ms confirmed this systematic gradient over both sites. At PO7i, the mean of the linear contrast waves across all subjects around 220 ms (i.e., 195–245 ms) was significantly different from zero (t42 = 10.0, P < 0.001) with a fairly large effect size (Cohen's d = 1.53). At PO8i, the same measure was also significantly different from zero with a large effect size (t42 = 9.06, P < 0.001, d = 1.38).
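The linear contrast and its effect size can be written out explicitly. The sketch below assumes the standard one-sample definitions (Cohen's d as the mean divided by the sample standard deviation, and t = d × √n); the function names and the toy inputs are hypothetical.

```python
import math
from statistics import mean, stdev

CONTRAST = (-2, -1, 0, 1, 2)   # coefficients for the 5 ordered levels

def linear_contrast(level_means):
    """Weighted sum of the mean amplitudes at the 5 ordered levels."""
    return sum(c * m for c, m in zip(CONTRAST, level_means))

def one_sample_t_and_d(values):
    """One-sample t statistic against zero and Cohen's d."""
    d = mean(values) / stdev(values)
    return d * math.sqrt(len(values)), d

# Toy check: amplitudes rising by 1 per level give a contrast of 10.
toy_contrast = linear_contrast([1, 2, 3, 4, 5])
toy_t, toy_d = one_sample_t_and_d([1.0, 2.0, 3.0])

# With 43 participants, a reported t can be converted back to d.
d_left = 10.0 / math.sqrt(43)
d_right = 9.06 / math.sqrt(43)
```

Converting the reported t values in this way gives approximately 1.52 and 1.38, in line with the reported effect sizes up to rounding of t.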
The ERPs were also sorted along the other dimensions. When they were sorted along IA, a similar systematic gradient was also observed near the same latency, with smaller IA eliciting larger absolute positive-polarity ERP amplitudes in this same latency range (Fig. 2B). The linear contrast wave showed a negative deflection at both the left (t42 = −8.02, P < 0.001, d = −1.22) and the right (t42 = −7.26, P < 0.001, d = −1.11) occipital sites. Note that, in the given parameter space (Fig. 1B), greater IA on average yields smaller N, which is why greater IA resulted in smaller absolute ERP amplitudes (see the next section for the quantitative rationale for this argument). Figure 2C shows a similar gradient of the ERPs when they were sorted along TA, with larger TA eliciting larger ERP amplitudes. The linear contrast revealed a positive deflection at both the left (t42 = 8.69, P < 0.001, d = 1.32) and the right (t42 = 9.43, P < 0.001, d = 1.44) sites. Figure 2D shows a similar gradient when the ERPs were sorted along Spar, with smaller Spar eliciting larger ERP amplitudes bilaterally (t42 = −10.3, P < 0.001, d = −1.57 at PO7i; t42 = −9.20, P < 0.001, d = −1.40 at PO8i). Finally, Figure 2E reveals a systematic gradient of the ERPs along FA, with larger FA eliciting larger ERP amplitudes (t42 = 7.64, P < 0.001, d = 1.16 at PO7i; t42 = 6.68, P < 0.001, d = 1.02 at PO8i). Thus, the ERPs around 220 ms at PO7i and PO8i showed evidence of sensitivity to N, IA, TA, FA, and Spar. However, these are not pure tests of the hypothesis that the neural activity is actually modulated specifically by these various dimensions because these variables are not independent.
If the neural activity were purely sensitive to one variable, say TA, it should systematically vary as a function of changes in that dimension. At the same time, we would also expect that neural response to be invariant when that variable, TA, was held constant. In other words, a given ERP pattern can be said to reflect the encoding of a specific stimulus property “if and only if” the ERP is modulated by that specific stimulus property.
If the ERPs around 220 ms over PO7i and PO8i reflect variations in numerosity, then these ERPs should be invariant to changes in other dimensions when numerosity is held constant. Figure 2F,G shows that this is in fact what we observed. On the one hand, while holding numerosity constant, one can vary TA and IA together, the linear combination of which is represented by the novel dimension SzA. When the ERPs were sorted along SzA, as shown in Figure 2F, there was little systematic gradient with nearly flat linear contrast waves. On the other hand, while holding numerosity constant, one can vary FA and Spar together, the linear combination of which is represented by the novel dimension Sp. When the ERPs were sorted along Sp, as shown in Figure 2G, again there was little systematic gradient with nearly flat linear contrast waves. Note that although some of the linear contrast waves deviated from 0 in the sense of statistical significance (along SzA, t42 = 0.920, P = 0.363 at PO7i, and t42 = 2.46, P = 0.018 at PO8i; along Sp, t42 = −3.02, P = 0.004 at PO7i, and t42 = −3.99, P < 0.001 at PO8i), the effect sizes were much smaller (along SzA, d = 0.140 at PO7i, and d = 0.375 at PO8i; along Sp, d = −0.461 at PO7i, and d = −0.609 at PO8i) than those from the linear contrast waves along N. In particular, the linear contrast along N was significantly larger than the linear contrast along SzA or Sp at both the PO7i and PO8i sites (minimum t42 = 6.68, P < 0.001).
This kind of modulation of ERPs along one dimension but not as a function of the orthogonal dimension was not observed for any of the other visual properties (IA, TA, FA, or Spar). For example, the ERPs were modulated by IA (Fig. 2B); however, they were still sensitive to changes in TA (Fig. 2C), which is equivalent to holding IA constant on average. Likewise, the ERPs were modulated by FA (Fig. 2E) but still varied as a function of the orthogonal dimension Spar, which is equivalent to holding FA constant on average (Fig. 2D). The only way to change the cumulative area of an array of dots (TA) without changing the area of the individual dots (IA) is to change the number of dots. Similarly, the only way to change the inter-dot spacing (Spar) without changing the overall area within which the dots are drawn (FA) is to change the number of dots. Thus, these results indicate that the systematic gradient in the ERPs around 220 ms over the bilateral occipital channels reflects changes in numerosity more so than any of the other visual properties tested in the current experiment.
A Novel Analytic Technique Provides Quantitative Evidence for Numerosity Encoding
To mathematically assess the influence of numerosity and other visual properties on neural activity, we developed and applied a novel analytic technique. A linear mixed-effects model was used to characterize and quantify the unique contributions of each of the visual properties to the ERPs (see Materials and Methods). In short, this model explained the ERPs with 3 fixed-effects orthogonal regressors representing changes in N, SzA, and Sp, and a random effect of participant. The fixed-effects parameter estimates for numerosity (βN), area (βSzA), and spacing (βSp) enabled us to assess the unique contributions of each of these dimensions to the neural activity.
Topographic distributions of the overall fit of the model are illustrated in the top of Figure 3. Similar to the topographic map of grand-averaged ERPs (Fig. 1C), the model fit peaked at the same bilateral occipital sites at around 200 ms. More interestingly, another peak was identified much earlier at around 75 ms over the medial occipital channel Oz′ (0.03 radians above Oz in the standard 10–20 system). Topographic maps of βN, βSzA, and βSp (bottom of Fig. 3) suggest that these peaks in the model fit were largely driven by a stronger effect of N than SzA or Sp.
As the earlier effect (75 ms) over the medial occipital site was something novel that was not identified from the grand-average maps across all the stimuli (see Fig. 1C), we further examined the qualitative nature of the ERPs by plotting the brainwaves sorted along N, SzA, and Sp (Fig. 4) similar to what was shown in Figure 2. Interestingly, there was a clear, robust systematic gradient in the ERPs to numerosity (but of negative-polarity) (t42 = −7.05, P < 0.001, d = −1.08) with a much smaller effect for SzA (t42 = −3.48, P = 0.001, d = −0.531) and no significant effect in the case of Sp (t42 = 0.600, P = 0.552, d = 0.0915).
We then took a closer look at the parameter estimates at the 3 peaks identified in the topographic map of the model fit: electrode Oz′ at 75 ms, PO7i at 183 ms, and PO8i at 183 ms (more precisely, 100-ms window around these peak latencies). The parameter estimate vector, β = (βN, βSzA, βSp), was drawn on the 3D parameter space for each of the time points of interest (Fig. 5). The axis that lies the closest to β can be interpreted as the dimension that has the largest contribution to the changes in the ERPs. At Oz′ around 75 ms, β = (−0.590, −0.106, 0.046), and it was closest to the axis for numerosity (denoted as axisN) with 11.1° angle between β and axisN. The second closest axis to β was the axis for total area (axisTA). Importantly, the angle between β and axisN was significantly smaller than the angle between β and axisTA (P = 0.002). At PO7i around 183 ms, β = (0.824, 0.021, −0.094), and the angle between β and axisN was 6.65°, which was significantly smaller than the angle between β and the second closest axis, axisSpar (P < 0.001). At PO8i, β = (0.991, 0.124, −0.181), and the angle between β and axisN was 12.47°, which was significantly smaller than the angle between β and the second closest axis, axisSpar (P < 0.001). These results provide quantitative evidence that, among all the visual properties of interest, changes in numerosity explain more of the variance in the ERPs than do the other variables at both of these key time points (75 and 180 ms).
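The angles quoted above can be recomputed from the rounded parameter estimates. The sketch below assumes that each axis is treated as an undirected line, so that a vector with a negative N component can still lie close to axisN; the function and variable names are illustrative.

```python
import math

def angle_to_axis_deg(beta, axis):
    """Angle in degrees between a parameter vector and an undirected axis."""
    dot = abs(sum(b * a for b, a in zip(beta, axis)))
    norm = math.sqrt(sum(b * b for b in beta)) * math.sqrt(sum(a * a for a in axis))
    return math.degrees(math.acos(min(1.0, dot / norm)))

AXIS_N = (1.0, 0.0, 0.0)   # numerosity axis in (N, SzA, Sp) space

reported = {
    "Oz' (75 ms)": (-0.590, -0.106, 0.046),
    "PO7i (183 ms)": (0.824, 0.021, -0.094),
    "PO8i (183 ms)": (0.991, 0.124, -0.181),
}

angles = {site: angle_to_axis_deg(beta, AXIS_N) for site, beta in reported.items()}
norms = {site: math.sqrt(sum(b * b for b in beta)) for site, beta in reported.items()}
```

The recomputed angles agree with the reported 11.1°, 6.65°, and 12.47° to within about 0.1°, the small differences being attributable to rounding of the published estimates.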
A Second Experiment Further Provides Evidence for the Early Encoding of Numerosity
In Experiment 1, we assessed the influence of numerosity, TA, IA, FA, and sparsity on neural activity. However, another potentially important visual property, the total perimeter of the dot stimuli, was not considered in Experiment 1. Mathematically, individual item perimeter (IP) and total item perimeter (TP) can be represented as a function of SzA and N (see Supplementary Text). One unit change in IP can be represented as a positive change in SzA and a negative change in N by an equal amount, which is identical to what a unit change in IA represents. A unit change in TP can be represented as a positive change in SzA and 3 times as much positive change in N, which makes axisTP much closer to axisN than axisTA is to axisN (see Supplementary Text). Therefore, the β vectors estimated in Experiment 1 may have been as close to axisTP as they were to axisN, raising the possibility that the ERPs may have represented total item perimeter instead of numerosity.
In order to empirically test the validity of this alternative hypothesis, we ran a second experiment with a new group of participants. Experiment 2 was identical to Experiment 1 except total item perimeter (TP) and individual item perimeter (IP), instead of IA and TA, were systematically manipulated (Fig. 6A). The ERPs from the 3 channels identified in Experiment 1 (i.e., Oz′, PO7i, and PO8i) were decomposed into 3 orthogonal axes representing numerosity (N), size in perimeter (SzP), and spacing (Sp) (see Fig. 6B for overall model fit). Similar to SzA, SzP can be understood as a linear combination of IP and TP, given a constant N. Thus, SzP represents the dimension of the size of the dots that are allowed to vary while numerosity is held constant, but scaling with the perimeter rather than the area of the dots. Note an independent experiment was necessary to test this hypothesis because SzP and SzA are not related by a simple scalar (see Supplementary Text), and therefore, the stimulus space was not systematically sampled in Experiment 1 in terms of perimeter.
Figure 6C shows the results of the linear mixed-effects model, expressed in terms of the parameter estimate vector β = (βN, βSzP, βSp). At Oz′ around 75 ms, β = (−0.730, −0.017, 0.066) and the angle between β and axisN was 5.3°, which was significantly smaller than the angle between β and the second closest, axisSpar (P < 0.001). At PO7i around 183 ms, β = (0.649, 0.111, −0.068), and the angle between β and axisN was 12.41°, which was significantly smaller than the angle between β and the second closest, axisSpar (P = 0.011). At PO8i around 173 ms (the overall model fit peaked at 173 ms rather than 183 ms), β = (0.856, 0.220, −0.073), and the angle between β and axisN was 15.20°, which was significantly smaller than the angle between β and the second closest, axisTP (P = 0.032).
Brainwaves sorted along the dimensions of N, SzP, and Sp (Fig. 7) confirm the findings from the linear model. As in Experiment 1, there was a marked systematic gradient in the ERPs as a function of numerosity, but comparatively smaller systematic changes as a function of the 2 other orthogonal dimensions (see Supplementary Fig. 4 for ERPs sorted along all other dimensions). That is, the linear contrast along N was significantly larger than the linear contrast along SzP or Sp in PO7i, PO8i, and Oz′ (minimum t46 = 3.29, P = 0.002). Overall, the results from Experiment 2 further confirmed the findings in the first experiment and suggest that the observed modulations in the ERPs primarily represent changes in numerosity.
An Exploratory Analysis to Search for a Greater ERP Modulation of Other Visual Properties than of Numerosity
We then conducted an exploratory analysis to look for any evidence of neural activity primarily representing other visual properties more so than numerosity. Note that, in the analyses reported thus far, the electrodes and the latencies were selected based on the fit of the linear model (see Figs 3,6B). Interestingly, all of these selections showed a stronger effect of numerosity compared with that of other properties. Yet, it is possible that other properties modulated the neural activity at different time points and locations that did not happen to coincide with the peak of the overall model fit. To evaluate this possibility, we searched for the cases when the angle between the parameter estimate vector β and any of the stimulus dimensions was equal to or smaller than the angle between β and axisN. In Experiment 1, across the 3 time points of interest, we had found that the maximum angle between β and axisN was 12.47° (see Fig. 5) and the minimum Euclidean norm of β was 0.601. Consequently, we tested whether any other dimension differed from β by an angle of 12.47° or less with a norm of β that was at least 75% of 0.601. Note that we used 75% (not 100%) of the norm to capture potentially meaningful modulation of the neural activity at a fairly liberal threshold. No stimulus property other than numerosity met this criterion on any channel at any time point (Fig. 8A).
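The search criterion amounts to a simple predicate on each estimated vector. The sketch below uses the Experiment 1 thresholds stated above; the function name, the treatment of axes as undirected lines, and the candidate axis directions are assumptions made for illustration.

```python
import math

MAX_ANGLE_DEG = 12.47        # largest angle to axisN at the 3 peaks
MIN_NORM = 0.75 * 0.601      # 75% of the smallest norm at the 3 peaks

def meets_criterion(beta, axis, max_angle_deg=MAX_ANGLE_DEG, min_norm=MIN_NORM):
    """True if beta is long enough and lies within max_angle_deg of the axis."""
    norm_beta = math.sqrt(sum(b * b for b in beta))
    if norm_beta < min_norm:
        return False
    norm_axis = math.sqrt(sum(a * a for a in axis))
    cosine = abs(sum(b * a for b, a in zip(beta, axis))) / (norm_beta * norm_axis)
    return math.degrees(math.acos(min(1.0, cosine))) <= max_angle_deg

AXIS_N = (1.0, 0.0, 0.0)
AXIS_IA = (-1.0, 1.0, 0.0)     # assumed direction of IA in (N, SzA, Sp) space
AXIS_SPAR = (-1.0, 0.0, 1.0)   # assumed direction of Spar

aligned_with_ia = meets_criterion((-0.4, 0.4, 0.0), AXIS_IA)
too_weak = meets_criterion((0.2, 0.0, 0.0), AXIS_N)
po7i_vs_spar = meets_criterion((0.824, 0.021, -0.094), AXIS_SPAR)
```

Applying such a predicate to every channel and time window, for every candidate axis, would reproduce the kind of map summarized in Figure 8.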
The same analysis was performed on data from Experiment 2. This time, the maximum angle between the parameter estimate vector β and axisN was 15.20° (see Fig. 6C) whereas the minimum Euclidean norm of β was 0.662. Based on those values, channels in the occipito-parietal region (35, 41, 42, 44) showed sensitivity to total perimeter (TP) from 192 to 230 ms (Fig. 8B). No other stimulus dimension besides numerosity and total perimeter was found to modulate the neural activity at the given criterion. These results nevertheless do not provide any evidence that greater neural sensitivity to other visual properties compared with numerosity occurs prior to the greater neural sensitivity to numerosity compared with other visual properties.
We applied a novel stimulus design and regression-based analytic method to evaluate the unique contributions of visual properties of dot arrays to variations in neural activity. Specifically, using the ERP technique, we evaluated the time course of neural sensitivity to visual properties to assess whether neural sensitivity to numerosity occurs as early as neural sensitivity to other visual properties. In 2 independent experiments, the ERPs indicated neural sensitivity to numerosity information very early in the visual stream, and actually more so than they reflected sensitivity to various other basic visual properties, such as individual and total item area, individual and total item perimeter, field area, and sparsity. These findings suggest that there exists a neural mechanism for direct perceptual processing of numerosity that is not derived from first extracting information related to these other basic properties.
Monotonic Modulation of the ERPs
There are several novel aspects in the present results. First, ERPs were monotonically modulated by numerosity in a passive viewing paradigm requiring no overt responses for numerical information processing. While the P2p has been shown to be modulated by the numerical disparity or ratio between 2 values in explicit judgment tasks or short adaptation paradigms (Dehaene 1996; Temple and Posner 1998; Libertus et al. 2007; Hyde and Spelke 2008), this is the first study to find that ERPs were modulated by the numerical value of the stimulus rather than its relative value to another stimulus (but see Gebuis and Reynvoet 2013). In other words, the modulation we observed was more than a distance effect in that it was not based on a comparison process but was instead a modulation as a function of numerosity based on viewing a single stimulus array. Such a clear monotonic modulation of neural activity suggests that numerosity is likely to be processed via summation coding, represented by a monotonic relationship between numerosity and neural activity (Verguts and Fias 2004; Chen and Verguts 2013) at least in the absence of an explicit task. This observation is consistent with a single neuron recording study that found a monotonic modulation of lateral intraparietal (LIP) neural activity as a function of numerosity in monkeys without an explicit training on numerosity discrimination (Roitman et al. 2007).
Previous single-cell electrophysiology studies (e.g., Nieder et al. 2002) have reported neurons that are selectively tuned to numerosities, hence demonstrating numerosity-selective coding of neural activity. The scalp-EEG employed in the current study, however, reflects an aggregate signal from a very large pool of neuronal activities, and thus, it would be unlikely to reveal neuronal tuning behavior to specific numerosities. Future studies should explore how the current stimulus design in combination with other analytic approaches might be used to uncover the unique contribution of each of the visual properties in explaining numerosity-selective neural properties in a variety of datasets.
One other interesting aspect of not only our results but also previous reports on the P2p (Dehaene 1996; Temple and Posner 1998; Libertus et al. 2007; Hyde and Spelke 2008) is that the scalp location of the bilateral occipital electrodes sensitive to numerosity is relatively posterior to the regions typically associated with numerical processing in fMRI studies (Nieder 2005). An interesting future question would be to ask whether the neural source of the numerosity-sensitive effect (found in the current study) or the P2p effect is in the intraparietal sulcus, as alluded to by a source modeling study (Hyde and Spelke 2012), or whether the neural source is in the bilateral middle occipital gyri (Dehaene 1996) in which previous fMRI studies have shown evidence for summation coding (Santens et al. 2010; Roggeman et al. 2011).
Extremely Early Sensitivity to Numerosity
Our second novel and perhaps more interesting finding was that neural sensitivity to numerosity information was observed much earlier in the visual stream than any prior study has found. Similar to our results around 183 ms over PO7i and PO8i, previous studies have found sensitivity to numerical differences at ∼200 ms post-stimulus (Dehaene 1996; Temple and Posner 1998; Libertus et al. 2007; Hyde and Spelke 2008). In this study, we also found sensitivity to numerosity at around 75 ms focally over medial occipital cortex (channel Oz′), which suggests that information about numerosity is encoded much earlier in the visual processing stream than previously thought. The channel location and latency of this component resemble that found for the C1, suggesting that the source of the ERPs may be from the primary visual cortex (Jeffreys and Axford 1972; Clark et al. 1994; Di Russo et al. 2002). The C1 has the distinctive characteristic that its polarity on the scalp inverts as a function of whether the stimulus presentation is in the upper visual field versus the lower visual field, due to their corresponding representations in primary visual cortex being on the lower or upper bank of the calcarine fissure, respectively. Very often, it does not show a prominent peak but rather rides on the front end of the slightly later and typically larger extrastriate occipital P1, thus making it difficult to identify from the grand-averaged waves. These unique characteristics, in combination with the tendency to look for the effects of numerosity in prominent peaks, may have led to this extremely early neural sensitivity to numerosity going undetected in previous studies (but see Fig. 3 in Gebuis and Reynvoet 2013, for a similar earlier effect of numerosity which, however, did not meet their stated criterion for full analysis).
What might be the functional significance of this extremely early neural signature for numerosity encoding? In a classic computational model of numerosity processing, Dehaene and Changeux (1993) proposed that an input “retina” layer on which objects of various sizes and locations are represented projects to an intermediate “object location and normalization” layer in the model, where the location of objects is represented and size is normalized. The normalization layer in turn projects to a summation layer that sums all outputs from the previous intermediate layer. As mentioned earlier, the monotonic modulation of ERPs in PO7i and PO8i around 180 ms very much resembles activation patterns in the later summation layer in this model (see also Verguts and Fias 2004). Likewise, one possibility is that the extremely early sensitivity to numerosity we observed at Oz′ may also reflect a summation computation (as in the summation layer of the Dehaene and Changeux model), but taking place in early visual cortex. However, the summation process by nature requires an additional integration-processing stage, and the primary visual cortex is probably too early in the visual stream to account for such processes. An alternative possibility is that the extremely early neural sensitivity to numerosity in Oz′ reflects a mechanism for individuation of dots that is similar to the intermediate object location and normalization stage in the model proposed by Dehaene and Changeux (1993). Like this intermediate layer in the model, early visual areas may partially normalize some item information. Pools of neurons may represent the presence of dots within their visual receptive fields. Although no individual neuron (or neural pool) would represent numerosity in this scenario, the larger-scale neural pattern resulting from the cumulative activation of many such neuronal pools, which is in turn reflected by the scalp-recorded ERP, would scale with the number of items in the array.
As such, the extremely early neural activity over Oz′ may be better explained as a neural process underlying “preattentive individuation” of the dot items rather than an extraction process for numerosity itself; nevertheless, it should be noted that the summed output of this preattentive individuation process would appear to correspond to the numerical information of the stimuli.
Effects of Other Visual Properties
Across the 2 experiments, we found that numerosity modulated the ERPs to a much greater extent than any other visual property. It was not until about 192 ms into the stimulus presentation that total perimeter modulated the ERPs (see Fig. 8B), which was considerably later than when the neural activity was first sensitive to numerosity. These findings suggest that numerosity had much greater influence on explaining the variations in the neural activity early on. However, these results are not to say that other visual properties have no influence on the ERPs. In particular, it should be noted that variables orthogonal to numerosity (i.e., size and spacing) did influence the ERPs, albeit significantly less so than did numerosity, at all time points and channels of interest (i.e., Oz′ at 75 ms, PO7i and PO8i at 183 ms). That is, while holding numerosity constant, manipulating item area/perimeter and total area/perimeter together (SzA in Experiment 1 and SzP in Experiment 2) resulted in small but significant modulations in the ERPs. Likewise, while holding numerosity constant, manipulating field area and sparsity together also resulted in small but significant modulations in the ERPs. Note that item area/perimeter, total area/perimeter, field area, and sparsity are functions of numerosity and size or spacing; therefore, significant effects of size or spacing imply significant effects of the other listed visual properties.
Modulation of the ERPs by some of these other visual properties, especially at the Oz′ site at 75 ms, which may be arising from the striate cortex, is not surprising given the receptive field properties of individual neurons in the early visual areas (Hubel and Wiesel 1959, 1968). The amount of light hitting the retina, and therefore propagated to the lateral geniculate nuclei and then to the primary visual cortex, should be directly proportional to total item area. Total item perimeter is equivalent to the amount of the visual scene occupied by contour boundaries, and thus edge-detecting neurons are likely to be activated. Changes in spacing will affect the extent of the V1 retinotopic map activated by the stimulus. Thus, if there exists a neural mechanism for processing continuous magnitude information in the visual cortex, one would expect low-level neural activity from these regions to be sensitive to these variables. In fact, continuous magnitude did affect the early neural activity, which is not surprising given that numerosity perception is influenced by other perceptual properties (Miller and Baker 1968; Ginsburg and Nicholls 1988; Allik and Tuulmets 1993; Sophian and Chu 2008). However, a critical finding from the current study is that the modulation of neural activity by changes in numerosity was much greater than the modulation by changes in other stimulus properties.
There is a prevailing assumption in the field of numerical cognition that luminance is much more salient than number. This assumption is based on the fact that brightness is a primary visual property and that not only perceived brightness but also absolute brightness of the stimulus is encoded in the striate cortex (Rossi et al. 1996; Kinoshita and Komatsu 2001). Likewise, in the ERP literature, the P1 is strongly influenced by basic stimulus parameters such as size and luminance (Luck 2014). However, very little is known about how luminance perception is carried out at the behavioral and the neural level when a set of items rather than a single item is presented, especially when the array size is well beyond the subitizing range. Behavioral studies in infants demonstrate that while infants are quite sensitive to changes in the size of a single item, when presented with sets of items infants are much more sensitive to changes in numerosity than to total item area (Cordes and Brannon 2008) or to individual item area (Cordes and Brannon 2011). These previous behavioral findings, along with our results, suggest that the prevailing assumption that some continuous variables are always more salient than number in dot arrays may require a serious reconsideration. An important area for future research will be to understand how numerosity-extraction processing may take place in early visual areas and how such a process may interact with processing of other visual properties such as total item area.
Previous behavioral studies have also implicated texture density as important during visual perception (Durgin 2008; Dakin et al. 2011). Thus, it may be surprising to find so little effect of sparsity or spacing in the current results. Anobile et al. (2014) have recently shown via numerosity and density comparison tasks that there may be separate mechanisms for estimating numerosity and density. In the numerosity comparison task, they found that the numerosity discrimination threshold increased proportionally with numerosity (which was expected assuming that numerosity perception follows Weber's law), but only when the dot arrays were relatively sparse. When the dot arrays were relatively dense, the discrimination threshold increased with the square root of numerosity, which was similar to the pattern of performance when a density comparison task was given. Their findings explain the predominant effect of texture density in other previous studies (Durgin 2008; Dakin et al. 2011) when relatively dense arrays (often much more than 10 dots per degree², see [Durgin 2008]) were used. In contrast, it has been more customary to use relatively sparse arrays in studies that probe numerical cognition (Nieder et al. 2002; Piazza et al. 2004; Halberda and Feigenson 2008; Park and Brannon 2013). Although the precise psychometric parameters are usually not reported in these studies, the number of dots typically has ranged between 8 and 32, generally not exceeding 64 dots. Also, almost half of the screen area (if not the entire screen area) of a conventional monitor is typically dedicated as the field area. Our stimuli were designed following these conventions in the literature, and on average, the arrays were relatively sparse, ∼0.7 dots per degree². Had we used much denser arrays with larger numerical values, the ERPs may have shown an early sensitivity to density.
Nevertheless, within the current stimulus design, which is similar to the majority of studies investigating numerical cognition, neural responses in visual cortex were found to be much more sensitive to numerosity than to sparsity (or inverse of density).
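The two threshold regimes attributed above to Anobile et al. (2014) can be written compactly. The following is only a restatement of the verbal description; the constants w and k are placeholder symbols introduced here for illustration and are not values reported in the text.

```latex
% Sparse arrays: threshold proportional to numerosity (Weber's law),
% so the relative threshold is constant.
\[
  \Delta N = w\,N \quad\Longrightarrow\quad \frac{\Delta N}{N} = w
\]
% Dense arrays: threshold grows with the square root of numerosity,
% so the relative threshold shrinks as N increases.
\[
  \Delta N = k\,\sqrt{N} \quad\Longrightarrow\quad \frac{\Delta N}{N} = \frac{k}{\sqrt{N}}
\]
```

Under the first relation the relative threshold stays fixed as N grows, whereas under the second it decreases, which is what distinguishes the sparse (numerosity-like) regime from the dense (density-like) regime.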
Collectively, the current results show that although the neural activity was sensitive to other visual properties to a small extent, the neural variations were particularly sensitive to numerosity (see Figs 5,6C). These findings imply that numerosity information is encoded extremely early in the visual stream, and this encoding propagates through the dorsal stream, as captured by ERP signatures over parietal-occipital sites later in latency.
Comparisons to Previous Studies
The findings in the current study were made possible by a novel analytic method for studying numerosity processing. One major difficulty in testing how numerosity influences behavior or neural signals is that many visual properties are intrinsically confounded with numerosity (see Fig. 1B). In many behavioral and neural studies searching for the effect of numerosity, researchers have acknowledged this confound and attempted to circumvent it with a variety of strategies. In many of these studies, dot arrays have been constructed such that numerosity was correlated with one dimension in a subset of trials whereas it was correlated with another dimension in another subset of trials (Ansari and Dhital 2006; Halberda and Feigenson 2008; Park et al. 2014). These attempts were made to eliminate any apparent linear relationship between numerosity and other perceptual properties “on average” across all trials. Nevertheless, in a comparison task using such a design, it is still possible that participants may use different strategies across different trials. Some other researchers have attempted to de-confound the linear relationships between numerosity and other perceptual properties by imposing a strong nonlinear relationship between them, also with very large variance in non-numerical properties but a comparatively smaller range of numerical ratios (Gebuis and Reynvoet 2013). In our study, we found that other visual properties did influence the ERPs, albeit in a substantially weaker way; accordingly, imposing a nonlinear relationship between visual properties with much larger variations in non-numerical properties would have made it difficult to interpret the results. Moreover, this nonlinear approach also suffers from the problem that it is difficult to quantify how multiple visual properties change “all together” as a function of a change in one visual property.
In the current study, instead of attempting to control for non-numerical visual properties, we constructed the stimuli such that numerosity and other visual properties were comparably represented in the parameter space. Each of the properties (N, IA, TA, FA, Spar, SzA, and Sp in Experiment 1; N, IP, TP, FA, Spar, SzP, and Sp in Experiment 2) ranged 4-fold from their minimum value to their maximum value. Then, 3 orthogonal dimensions were used to capture the variation in the neural activity using a linear model. The use of the 3 orthogonal dimensions in the linear mixed-effect model analysis does not bias our results in any way. As a matter of linear algebra, any 3 orthogonal axes spanning the same parameter space could be used in the regression analysis, and the same results would be obtained. While numerosity was the primary property that most strongly modulated neural activity in both experiments, there were small effects of other properties. In particular, the effect of total item perimeter was especially salient in Experiment 2, where area ranged 16-fold as perimeter ranged 4-fold (area scales with the square of perimeter). One potential explanation for this greater effect of total item perimeter is that, given this mismatch in ranges, there could be a context effect whereby greater attention is implicitly given to the visual property with a larger range of sampling. This explanation is consistent with findings in the large body of literature showing that the brain is capable of implicitly picking up statistical regularities of the stimuli (Bonte et al. 2005; Turk-Browne et al. 2010; Yaron et al. 2012). According to this explanation, a particular visual property may appear to modulate the ERPs more than numerosity if the range (and perhaps resolution) of change in that visual property far exceeds that in numerosity, which appears to be the case in some previous studies (Gebuis and Reynvoet 2012, 2013).
It is therefore dangerous to compare the relative influence of numerosity and other variables on neural activity or behavior without appropriate stimulus sampling such as equating ranges. It is also important to note that, in our experiments, numerosity was not given any special treatment, either in stimulus design or in the experimental paradigm (unlike many other previous studies probing numerical cognition). Participants were not verbally instructed to attend to the numerical quantity, and numerosity was just one of the visual properties that was systematically varied.
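The statement above that any 3 orthogonal axes give the same regression results can be checked numerically. The sketch below is a simplified illustration, not the analysis used in the study: it uses ordinary least squares rather than a linear mixed-effects model, and the simulated data, coefficient values, and function name are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 200 trials described by 3 predictors, plus a
# simulated response with arbitrary (assumed) coefficients and noise.
X = rng.normal(size=(200, 3))
y = X @ np.array([0.8, 0.2, 0.1]) + rng.normal(scale=0.5, size=200)

def fit_ols(design, response):
    """Return (intercept, slopes, fitted values) of a least-squares fit."""
    A = np.column_stack([np.ones(len(design)), design])
    coef, *_ = np.linalg.lstsq(A, response, rcond=None)
    return coef[0], coef[1:], A @ coef

# Random orthogonal matrix: its columns define a new set of 3 orthogonal axes.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

# Fit once in the original axes and once with every trial re-expressed
# in the rotated axes.
b0, beta, fitted = fit_ols(X, y)
b0_rot, beta_rot, fitted_rot = fit_ols(X @ Q, y)

# Predictions are identical, and the rotated coefficients describe the same
# vector once mapped back to the original axes.
print(np.allclose(fitted, fitted_rot))
print(np.allclose(Q @ beta_rot, beta))
print(np.isclose(np.linalg.norm(beta), np.linalg.norm(beta_rot)))
```

Because the coefficient vector is the same geometric object in either coordinate system, its norm and its angle to any fixed direction in the parameter space do not depend on which orthogonal axes are chosen for the fit.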
In sum, we have developed a novel stimulus design and analytic method to assess the unique contributions of visual properties of a dot array to the visual evoked potentials measured by EEG. Systematic evaluations of the monotonic ERP modulation revealed that the neural activity was particularly sensitive to changes in numerosity, and, moreover, considerably more so than to changes in other visual properties at both early (180 ms) and extremely early (75 ms) latencies. We propose that this extremely early neural sensitivity indicates the output of a preattentive dot-individuation process. Following this, the neural sensitivity at a later latency (180 ms) indicates the output of a summation computation taking place along the dorsal visual stream. These results suggest that there exists a mechanism for a direct extraction of numerosity in the human visual stream that is minimally influenced by the processing of other low-level features of the stimuli such as individual and total item area, individual and total item perimeter, field area, and sparsity.
This study was supported by Duke Fundamental and Translational Neuroscience Postdoctoral Fellowship to J.P., an NIH R01 grant (R01-MH060415) to M.G.W., and a James McDonnell Scholar Award to E.M.B.
We thank Crystal Chiang, Anchal Sabharwal, Chandra Swanson, and Vanessa Bermudez for their assistance in data collection. Conflict of Interest: None declared.