The emergence of conscious visual perception is assumed to ignite late (∼250 ms) gamma-band oscillations shortly after an initial (∼100 ms) forward sweep of neural sensory (nonconscious) information. However, this neural evidence is not utterly congruent with rich behavioral data which rather point to piecemeal (i.e., graded) perceptual processing. To address the unexplored neural mechanisms of piecemeal ignition of conscious perception, hierarchical script sensitivity of the putative visual word form area (VWFA) was exploited to signal null (i.e., sensory), partial (i.e., letter-level), and full (i.e., word-level) conscious perception. Two magnetoencephalography experiments were conducted in which healthy human participants viewed masked words (Experiment I: active task, Dutch words; Experiment II: passive task, Hebrew words) while high-frequency (broadband gamma) brain activity was measured. Findings revealed that piecemeal conscious perception did not ignite a linear piecemeal increase in oscillations. Instead, whereas late (∼250 ms) gamma-band oscillations signaled full conscious perception (i.e., word-level), partial conscious perception (i.e., letter-level) was signaled via the inhibition of the early (∼100 ms) forward sweep. This inhibition regulates the downstream broadcast to filter out irrelevant (i.e., masks) information. The findings thus highlight a local (VWFA) gatekeeping mechanism for conscious perception, operating by filtering out and in selective percepts.
The question whether the boundary between unconscious and conscious perception is gradual or not is still debated. Neuroimaging studies investigating the neural mechanisms sustaining conscious subjective report provided sound evidence supporting a sharp nonlinear “all-or-none” transition in brain activity (for a review, see Dehaene and Changeux 2011). However, this neural account is partly in conflict with various behavioral data, which point to a rather graded experience of conscious perception, that is, an intermediate-/partial-level separating null and full consciousness (Mangan 2001; Kouider and Dupoux 2004; Overgaard et al. 2006; de Gardelle et al. 2009), and with the notion of hierarchy in representational levels during visual perception (Nakayama et al. 1995; Driver and Baylis 1996; Greene and Oliva 2009; Overgaard and Mogensen 2014; Wu et al. 2015). According to this hierarchical view, lower and higher levels of perception are accessed gradually and independently (Kouider et al. 2010), thereby defining piecemeal conscious perception. What remains unknown is the neural temporal chain of ignition of the different representational levels.
Few studies have explicitly probed the putative underlying neural mechanisms during piecemeal consciousness. For instance, several fMRI studies revealed a linear signal change in several brain regions, correlating with graded subjective reports of perception (Bar et al. 2001; Haynes et al. 2005; Imamoglu et al. 2014). Although these findings are informative, to fully address the question of the neural mechanisms during piecemeal consciousness, it is additionally required to 1) capture the early and fast neural processes sustaining partial access to consciousness (with a millisecond temporal sampling) and 2) to characterize the discrete independent levels of perception (and not solely the visibility of 1 single level). As has been previously illustrated by Kouider and colleagues (2010), the graded organization of words (Selfridge 1959; McClelland and Rumelhart 1981) is particularly suitable for testing partial consciousness. That is, words can be perceived at several levels (e.g., visual components, letters, and whole word), and these levels can be separately represented as neural processes (Levy et al. 2008; Mainy et al. 2008; Vidal et al. 2014). Much previous work has suggested that an area located in the left fusiform gyrus [coined the Visual Word Form Area (VWFA)] houses the neural code for written scripts (for a review, see Dehaene and Cohen 2011). Recently, we conducted a magnetoencephalography (MEG) experiment which aimed at dissociating partial from full conscious perception of masked words (Levy et al. 2013); this dissociation was mirrored as modulation of alpha oscillations in the VWFA. Hence, neural events in the VWFA could be indicative of conscious and selective access of written scripts.
Previous studies revealed a neural ignition of full consciousness after ∼250 ms post-stimulus onset (Sergent et al. 2005; Del Cul et al. 2007; Fisch et al. 2009; Gaillard et al. 2009). However, the early neuronal activity (before 250 ms) may be precisely indicative of the fast and early neural processes, which may underlie partial perception. Although we successfully dissociated between independent levels of perception at late (after 500 ms) processing stages (Levy et al. 2013), the relatively slow alpha rhythm which we monitored is not well suited to capture the early and short-lived neural transitions during the chain of ignition of piecemeal conscious perception. Hence, we reanalyzed the data (Experiment I) in the high-frequency gamma (power and phase-synchrony) band range (40–140 Hz), a rhythm that reliably reflects active states of cortical processing and neuronal communication (Buzsáki and Draguhn 2004; Fries 2005, 2009) and constitutes a sensitive processing marker during single-word recognition (Mainy et al. 2008; Gaillard et al. 2009; Vidal et al. 2014).
Previous studies highlighted late (after ∼250 ms) gamma oscillatory enhancement as a marker of (full) conscious perception (Fisch et al. 2009; Gaillard et al. 2009). Accordingly, to explore the neural mechanisms of piecemeal (null to partial to full) conscious perception, one possible hypothesis would be that piecemeal conscious perception should trigger a piecemeal (i.e., graded) increase in gamma-band oscillations. In contrast, partial conscious perception may operate a different neural mechanism as letter and word processing cortically differ (Levy et al. 2008) and may thereby ignite earlier activity (Tarkiainen et al. 2002). However, early (∼100 ms) gamma activity (power enhancement and stimulus phase-locking) has not been reported as a marker of conscious perception but rather as a marker of bottom–up sensory (Aru et al. 2012; Bosman et al. 2012; Bastos et al. 2014, 2015; Vidal et al. 2015) and nonconscious (Gaillard et al. 2009) processes. Hence, if early gamma band activity is related to sensory and nonselective processing, an alternative hypothesis would be that gradually inhibiting this early activity could act as a selective filter by regulating the efficiency of its downstream broadcast. This early local broadcast control would then act as a gatekeeper toward subsequent full conscious processing. Importantly, VWFA sensitivity to written scripts could be exploited as neural marker to conscious perception (i.e., script processing), compared with null perception (i.e., sensory activity). We therefore assumed that an area tuned to written script processing (i.e., VWFA) should not only process conscious script percepts via late gamma oscillations, but may prior to that inhibit sensory and nonconscious contents (i.e., visual mask processing). This neural pattern is not expected to arise in neighboring occipital areas. 
Hence, if piecemeal consciousness is achieved, it may be signaled via 2 consecutive stages in the VWFA: an initial inhibition of sensory and nonconscious activity (thereby achieving partial perception), followed by enhancement of selective and conscious activity (thereby reaching full perception).
Our analyses revealed early (∼100 ms) inhibition of stimulus-locked gamma-band phase-synchrony and power in the VWFA occurring prior to late (originating at ∼250 ms) enhancement in gamma. A complementary recording session (Experiment II) was conducted to capture a broader spectrum of piecemeal perception (Experiment I: Partial and Full, Experiment II: Null, Partial and Full) and at the same time tested the sensitivity of this neural mechanism to top–down task engagement (Experiment I: active design, Experiment II: passive design) and to bottom–up input (i.e., orthographic script) (Experiment I: Dutch, Experiment II: Hebrew). The 2 experiments showed that early (∼100 ms) gamma-band synchrony and power increase in the VWFA were triggered by null conscious perception of scripts (i.e., general visual processing); then, inhibition of this activity reflected partial conscious script perception followed by later (originating at ∼250 ms) enhancement of gamma oscillations for full-blown conscious perception. Hence, fast neural events in the VWFA selectively “gate-keep” the access to conscious script perception in a piecemeal mechanism.
Materials and Methods
Thirteen healthy right-handed and native-Dutch human subjects (8 males and 5 females, average age 23.92 ± 4.54 years) with normal or corrected-to-normal vision participated in the experiment and received monetary compensation. None of the participants had a history of neurological or psychiatric disorders. The study was approved by the local ethics committee, and written informed consent was obtained from the subjects before the experiment, in accordance with the Declaration of Helsinki.
Stimuli were generated using Presentation software (Neurobehavioral systems, Albany, USA) in a dimly lit room and subtended a horizontal visual angle of ∼2.5°. They were projected on an LCD monitor placed at a viewing distance of 70 cm. Responses were delivered by response pads. Stimuli were Dutch nouns (300 animate, 300 inanimate) between 3 and 9 letters in length with a lemma frequency between 100 and 300 occurrences per million (Baayen et al. 1993). Animate and inanimate words were balanced with respect to their word frequency of occurrence and the number of letters they contained. The animacy character of the selected words was unequivocal. Words were presented only once to prevent automatic stimulus–response learning or repetition-suppression effect.
Each trial began with a fixation cross presented for randomly chosen latency within the interval from 600 to 933 ms, followed by the presentation of a forward mask (67 ms), a word (33 ms), and a backward mask (67 ms). The random variation in fixation controlled for plausible anticipation of stimulus onset timing. In 16.67% of the totality of trials, we replaced the words with blank screens to avoid subject's habituation or anticipation and to control for the orthographic detection task (letters/no-letters). All stimuli were presented randomly. After stimuli presentation, a larger fixation crosshair prompted the subject to respond. Responses indicated trials’ termination and lasted no longer than 2100 ms. A smaller subsequent crosshair spanned 1000 ms and signaled the interval linking the next trial. Participants were allowed to make eye movements or blinks during that intertrial interval. A conservative response criterion was adopted so as to mitigate perceptual illusions (reconstruction) and at the same time possible subliminal (unconscious) perception which both may occur under thresholded stimulation (Kouider et al. 2010). This approach thereby aimed to maximize the dissociation between the 2 levels of perception and focus only on conscious perception (not subliminal). To this end, subjects were discouraged from guessing by 1) explicitly asking them to categorize the word only when they had clearly recognized it, 2) training on the task with 90 training trials (different words than during the experiment) before measurement, and 3) receiving visual feedback on the correctness of their response. Furthermore, in the unlikely event in which the semantic category (animate vs. inanimate) of a given word appeared equivocal, participants were instructed to avoid response and hence exclude such trials from the analysis. After blocks of 30 trials, participants were allowed to pause. They initiated the start of each new trial block by a button press. 
In total, each subject performed 720 trials in a total measurement time of ∼55 min. Reports were given by button press on a response pad. There were 4 response buttons in total (right/left hands and middle/index fingers), and the 8 possible response combinations were counterbalanced across subjects to control for differences in the motor representations of each finger.
To maximize and to balance attentional engagement, the subjects' task was to report 1) full or 2) partial word-form perception by semantically categorizing the word (animate/inanimate) or, if they failed to perceive the word, by orthographically categorizing the percept (letters/none), respectively. Correctness in word detection was probed using the semantic categorization task (animate or inanimate), whereas correctness in letter detection was probed based on the ability to report the presence of letters during word trials (hit) or during blank trials (false alarm). To achieve maximal sensitivity of the neural signal and the statistical tests, the procedure aimed at collecting trials with a 50% distribution for each of the 2 reports. A modified “staircase” version of the threshold estimation procedure described by Levitt (1971) was applied individually in real time. The staircase effect was obtained by varying the contrast of a mask of scrambled characters (see Fig. 1A) every 6 trials based on the given subjective response. Notably, to obtain optimal parameters resulting in a 50/50 distribution of the 2 perceptual tasks, we conducted piloting prior to this study, in which we determined the optimal stimulation/masking 1) latency and 2) illumination, as well as 3) the optimal amplitude (masking intensity) of the staircase steps.
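The block-wise contrast adjustment described above can be illustrated with a minimal sketch. The text does not specify the exact update rule, so everything here is an assumption made for illustration: the function name `update_mask_contrast`, the step size, the contrast range, and the rule that a majority of "full" reports in a block of 6 trials strengthens the mask while a majority of "partial" reports weakens it.

```python
def update_mask_contrast(contrast, reports, step=0.05, lo=0.0, hi=1.0):
    """Hypothetical block-wise staircase update (not the authors' exact rule).

    contrast : current mask contrast, assumed to lie in [lo, hi]
    reports  : subjective reports of the last block, each "full" or "partial"
    """
    n_full = sum(r == "full" for r in reports)
    if 2 * n_full > len(reports):
        # More than half "full" reports: strengthen the mask (assumed direction).
        contrast += step
    elif 2 * n_full < len(reports):
        # More than half "partial" reports: weaken the mask.
        contrast -= step
    # An exact 50/50 split leaves the contrast unchanged.
    return min(max(contrast, lo), hi)
```

Under these assumptions the contrast drifts toward the level at which the 2 reports are equally frequent, which is the 50/50 target stated in the text.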
MEG Recordings and Data Preprocessing
Ongoing brain activity was recorded (sampling rate, 1200 Hz) using a whole-head CTF MEG system with 275 DC SQUID axial gradiometers (VSM MedTech Ltd., Coquitlam, British Columbia, Canada). Head position was monitored using 3 coils that were placed at the subject's left ear, right ear, and nasion. Bipolar EEG channels were used to record horizontal and vertical eye movements as well as the cardiac rhythm for the subsequent artifact rejection. Only correct trials with an identical contrast level were kept for spectral analysis. They consisted in the largest fraction of report trials and were equally distributed. Trials containing eye blinks, saccades, muscle artifacts, and signal jumps were rejected from further analysis using an automatized procedure.
To minimize effects of fluctuations in response time (RT) on our analyses, we performed a post hoc stratification of the data based on the RT values. The goal of this procedure was to ensure that RT variance would not bias the tested contrast (i.e., full vs. partial). We therefore obtained from each of the 2 perceptual reports (partial and full) a subset of trials with an identical distribution of RT values across the subsets in each of the 2 report conditions. This approach was adapted from Roelfsema et al. (1998), and a similar strategy has been successfully applied before to control for EMG fluctuations (Schoffelen et al. 2005). For every participant, we binned the observations in each condition; the bin centers were obtained by dividing the range of all RT values into 4 equally spaced bins, using the same bin centers for both conditions. In this way, each of the observations fell within one of the bins, for which we selected a subset of observations such that across the 2 conditions, the number of observations was identical. From the condition with the lowest number of observations in a given bin, all N observations constituting that particular bin were selected for the stratified sample. From the other condition, a subset of N observations was randomly drawn from the observations constituting that particular bin.
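The stratification steps above can be sketched as follows. This is a minimal illustration, not the original analysis code: the function name `stratify_by_rt`, the seeded random generator, and the handling of the upper bin edge are assumptions.

```python
import random

def stratify_by_rt(rt_a, rt_b, n_bins=4, seed=0):
    """Return trial indices (into rt_a and rt_b) with matched per-bin counts."""
    rng = random.Random(seed)
    # Bins span the pooled RT range of both conditions.
    lo = min(min(rt_a), min(rt_b))
    hi = max(max(rt_a), max(rt_b))
    width = (hi - lo) / n_bins or 1.0  # guard: all RTs identical

    def bin_of(rt):
        # Clamp so that the maximum RT falls into the last bin.
        return min(int((rt - lo) / width), n_bins - 1)

    keep_a, keep_b = [], []
    for b in range(n_bins):
        idx_a = [i for i, rt in enumerate(rt_a) if bin_of(rt) == b]
        idx_b = [i for i, rt in enumerate(rt_b) if bin_of(rt) == b]
        n = min(len(idx_a), len(idx_b))
        # The condition with fewer trials keeps all N of them;
        # N trials are drawn at random from the other condition.
        keep_a += idx_a if len(idx_a) == n else rng.sample(idx_a, n)
        keep_b += idx_b if len(idx_b) == n else rng.sample(idx_b, n)
    return sorted(keep_a), sorted(keep_b)
```

A bin that is empty in one condition contributes no trials from either condition, so the 2 retained subsets always have the same number of trials in every RT bin.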
Twenty-two healthy right-handed and native-Hebrew-speaking human subjects (13 males and 9 females, average age 27.22 ± 6.19 years) with normal or corrected-to-normal vision participated in the experiment and received monetary compensation. Participants were screened before the MEG experiment for piecemeal conscious perception (i.e., dissociating null, partial, and full conscious perception). None of the participants had a history of neurological or psychiatric disorders. The study was approved by the local ethics committee, and a written informed consent was obtained from the subjects before the experiment according to the Institution's Review Board.
Stimuli were generated using the E-prime software (Psychology Software Tools Inc.) in a dimly lit room and presented in the center of the screen in Courier New, size 12 so that the average stimulus subtended a horizontal visual angle of ∼2.7°. Letters were gray on a black background projected through a mirror on an LCD monitor placed at a viewing distance of 50 cm. Responses were delivered by a response pad. A photosensitive diode on the screen recorded the onset time of visual stimuli. Stimuli were 364 Hebrew nouns or gerunds with a mean length of 4.5 letters ± 0.5 and a mean word frequency of occurrence of 28.39 ± 38.37 per million (Frost and Plaut 2005). The words were randomly counterbalanced across conditions and participants, with respect to their word frequency of occurrence and the number of letters they contained. There was no difference in the frequency of occurrence (F3,360 = 1.17, P = 0.31) nor in length (F3,360 = 0.01, P = 0.99) across conditions. A particular emphasis was put upon selecting words with a minimal length variance (only 4 to 5 letter words were used) as this variable was found to significantly correlate with conscious visual perception (Levy et al. 2013). Words were presented only once to prevent automatic stimulus–response learning or repetition-suppression effect.
Prior to MEG acquisition, subjects were familiarized with an orthographic detection (OD) task and with a semantic decision (SD) task. The subjects then began the behavioral part of the experiment and performed 4 sessions, 2 of each task, to predetermine the 4 perceptual levels of interest: null (OD), partial (OD), prelexical (SD), and semantic (SD) perception. In the OD, the subjects' task was to orthographically categorize the percept (letters/non-letters). In the SD, the subjects' task was to semantically categorize words (animate/inanimate). A modified “staircase” version of the threshold estimation procedure described by Levitt (1971) was determined individually by varying the contrast of a mask of scrambled characters (see Fig. 1B) every 6 trials based on the given subjective response. The procedure aimed at collecting trials with a correctness threshold of 10% or less for the null and sublexical levels, and of 90% or more for the partial and full levels.
The participants then re-performed the 4 sessions, this time at the predetermined level, to compute the objective discriminability d′ value separately for each one of the masking levels. We then assessed d′ values at the group level by means of 1-way ANOVA across the 4 levels of perception. The d′ can be computed only for hit rates and false alarm rates which are different than 1 or 0, namely, when signal and noise are not perfectly discriminated. However, several approaches exist for dealing with possible extreme values (1 or 0) (see Stanislaw and Todorov 1999), out of which, the most commonly used is an adjustment consisting in replacing rates of 0 with 0.5/n, and rates of 1 with (n − 0.5)/n, where n is the number of signal or noise trials (Macmillan and Kaplan 1985).
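As a minimal sketch of the computation described above, d′ can be obtained as the difference between the z-transformed hit rate and false alarm rate, with the stated replacement of extreme rates. The function names are hypothetical, and the use of Python's standard-library `NormalDist` is a choice made for this illustration rather than the tool used in the study.

```python
from statistics import NormalDist

def adjusted_rate(count, n):
    """Rate count/n, with 0 replaced by 0.5/n and 1 replaced by (n - 0.5)/n."""
    rate = count / n
    if rate == 0.0:
        return 0.5 / n
    if rate == 1.0:
        return (n - 0.5) / n
    return rate

def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(adjusted_rate(hits, n_signal)) - z(adjusted_rate(false_alarms, n_noise))
```

For example, with 20 signal and 20 noise trials, perfect performance (20 hits, 0 false alarms) is mapped to rates of 0.975 and 0.025, which yields a finite d′ of about 3.92 instead of an undefined value.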
After registration of the head position, the participants began the MEG measurement by performing a simple word detection task (Fig. 1B): They were instructed to passively watch the visual stimuli presented on the screen and to count the occurrence of 2 filler words. During the pauses, they reported the occurrence by pressing the response pad. All words (except for the filler words) were presented once and randomly distributed in one of the 4 perceptual levels by varying masking luminance. In addition to the 4 levels, 2 supplementary conditions were included with identical occurrence frequency, in which the words were replaced by blank screens. This was done to simulate the neuronal response to masks only. The masking luminance of the first corresponded to that of the null perception condition, whereas that of the second corresponded to the full perception condition. The participants were asked to refrain, as much as possible, from moving their head and from blinking during the experiment.
Each trial began with a fixation cross (1133–1613 ms), followed by the presentation of a forward mask (67 ms), a word (17 ms), a backward mask (67 ms), and a blank screen (704–896 ms). The random time controlled for plausible anticipation of stimulus onset timing. In approximately 5% of all trials, we presented one out of 2 target words (which were not included in the experimental word database); during random pauses, participants were required to report the occurrences of those names since the last pause. These names were not analyzed, but instead they were used as filler trials aimed to avoid subject's habituation or anticipation and to maintain a steady alertness level throughout the experiment. All stimuli were presented randomly. Participants were allowed to make eye movements or blinks during that intertrial interval. They initiated the start of each new trial block by a button press. In total, each subject performed 546 trials in a total measurement time of approximately 25 min. At the end of the experiment, head position was recorded again, and the perceptual levels (d′) were again computed using the same semantic and orthographic tasks as at the beginning of the experiment. A within-subject repeated-measures ANOVA across the 4 levels of perception before and after the online experiment tested for a possible adaptation effect to the different levels.
MEG Recordings and Data Preprocessing
Ongoing brain activity was recorded (sampling rate, 1017 Hz, online 1–400 Hz band-pass filter) using a whole-head 248-channel magnetometer array (4-D Neuroimaging, Magnes 3600 WH) in a supine position inside a magnetically shielded room. Reference coils located approximately 30 cm above the head and oriented along the x-, y-, and z-axes were used to remove environmental noise. Five coils were attached to the participant's scalp for recording the head position relative to the sensor. External noise (e.g., power-line, mechanical vibrations) and heartbeat artifacts were removed from the data using a predesigned algorithm for that purpose (Tal and Abeles 2013). The analysis was performed using Matlab 7 (MathWorks, Natick, MA, USA) and the FieldTrip toolbox (Oostenveld et al. 2011). The data were segmented into 1900 ms epochs including a baseline period of 700 ms, and trials containing muscle artifacts and signal jumps were rejected from further analysis by visual inspection. One sensor was excluded from the analysis due to malfunction. The data were then filtered in the 1–200 Hz range with 10 s padding and were then resampled to 400 Hz. Finally, independent component analysis (ICA) was applied to clean eye blinks, eye movements, or any other potential noisy artifacts.
Spectral Analysis (Experiments I and II)
For each subject, a single-shell brain model was built based on a template brain (Montreal Neurological Institute), which was modified to fit each subject's digitized head shape using SPM8 (Wellcome Department of Imaging Neuroscience University College London, www.fil.ion.ucl.ac.uk). In Experiment I, the head shape was computed using the inside of the skull (Nolte 2003) based on each individual's anatomical MRI, which was spatially aligned to the MEG sensors. In Experiment II, the head shape was manually digitized (Polhemus Fastrak digitizer). The subject's brain volume was then divided into a regular grid. The grid positions were obtained by a linear transformation of the grid positions in a canonical 1-cm grid. This procedure facilitates the group analysis, because no spatial interpolation of the volumes of reconstructed activity is required. It should be noted that the spatial precision in Experiment I was superior as subject-specific dipole grids were obtained based on the subject's anatomical MRI. For each grid position, spatial filters were reconstructed in the aim of optimally passing activity from the location of interest, while suppressing activity which was not of interest.
Time series were extracted from the VWFA by applying a linearly constrained minimum variance (LCMV) beamformer. This analysis was followed up by confirming, at the whole-brain level, that the expected local gamma-band response emanates from this region. To this end, adaptive spatial filtering (Gross et al. 2004) was applied, relying on partial canonical correlations: the cross-spectral density (CSD) matrix was computed between all MEG sensor pairs from the Fourier transforms of the tapered data epochs at the 2 frequency bands (high gamma, or HG, and middle gamma, or MG) at the early (0–250 ms) and late (250–500 ms) phases of the response, respectively. Spatial filters were constructed for each grid location, based on the identified frequency bin, and the Fourier transforms of the tapered data epochs were projected through the spatial filters. To compute time–frequency representations (TFRs) of power for each trial, tapers were applied to short sliding time windows, and the Fast Fourier Transform (FFT) was calculated for each window. Data were analyzed in alignment to stimulus onset, and power estimates were then averaged across tapers. To probe gamma-frequency power (40–150 Hz), 5 Slepian multitapers (Percival and Walden 1993) were applied using a fixed window length of 0.2 s, resulting in a frequency smoothing of 15 Hz. Finally, to probe a measure of behavioral acuity for full versus partial perception, we subtracted the d′ values during null perception of words from the d′ values during the complete perception of words. This measure would therefore compute how well the participants were able to recognize words during full perception and at the same time not recognize words during partial perception, thereby reflecting the behavioral acuity for full versus partial perception.
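The multitaper parameters above are mutually consistent under the standard relation K = 2TW − 1 between the number of Slepian tapers K, the window length T, and the spectral half-bandwidth W. The short sketch below checks this arithmetic; reading the reported 15 Hz smoothing as a half-bandwidth (±15 Hz) is an assumption, and the function name is hypothetical.

```python
def n_slepian_tapers(window_s, half_bandwidth_hz):
    """Number of usable Slepian tapers, K = 2*T*W - 1.

    window_s          : window length T in seconds
    half_bandwidth_hz : spectral smoothing W in Hz, assumed to be +/- W
    """
    time_bandwidth = 2 * window_s * half_bandwidth_hz
    # Round before subtracting to avoid floating-point truncation.
    return int(round(time_bandwidth)) - 1
```

With T = 0.2 s and W = 15 Hz, the product 2TW equals 6, which gives the 5 tapers stated in the text.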
Statistical Analysis (Experiments I and II)
Statistical significance of the power values was assessed similarly at the sensor and at the source levels. It was assessed using a randomization procedure (Maris and Oostenveld 2007). This nonparametric permutation approach does take the cross-subject variance into account, because this variance is the basis for the width of the randomization distribution. This approach was chosen as it does not make any assumptions on the underlying distribution, and it is unaffected by partial dependence between neighboring time–frequency pixels. Specifically, the procedure was as follows: t-values representing the contrast between the conditions were computed per subject, channel, frequency, and time. Subsequently, we defined the test statistic by pooling the t-values over all subjects. Here, we searched time–frequency clusters with effects that were significant at the random effects level after correcting for multiple comparisons along the time and the frequency dimensions. Testing the probability of this pooled t-value against the standard normal distribution would correspond to a fixed effect statistic. However, to be able to make statistical inference corresponding to a random effect statistic, we tested the significance of this group-level statistic by means of a randomization procedure: We randomly multiplied each individual t-value by 1 or by −1 and summed it over subjects. Multiplying the individual t-value with 1 or −1 corresponds to permuting the original conditions in that subject.
This random procedure was reiterated 2000 times to obtain the randomization distribution for the group-level statistic. For each randomization, only the maximal and the minimal cluster-level test statistic across all clusters were retained and placed into 2 histograms, which we address as maximum (or minimum, respectively) cluster-level test statistic histograms. We then determined, for each cluster from the observed data, the fraction of the maximum (minimum) cluster-level test statistic histogram that was greater (smaller) than the cluster-level test statistic from the observed cluster. The smaller of the 2 fractions was retained and divided by 2000, giving the multiple comparisons corrected significance thresholds for a 2-sided test. The proportion of values in the randomization distribution exceeding the test statistic defines the Monte Carlo significance probability, which is also called a P-value (Nichols and Holmes 2002; Maris and Oostenveld 2007). This cluster-based procedure allowed us to obtain a correction for multiple comparisons at all sensor and source analyses.
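The core of the randomization step can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: it handles a single pooled statistic (one channel, frequency, and time sample) and omits the clustering and the maximum/minimum cluster histograms described above; the function name and the seeded generator are hypothetical.

```python
import random

def sign_flip_p_value(t_values, n_perm=2000, seed=0):
    """Two-sided Monte Carlo p-value for the sum of per-subject t-values.

    Each randomization multiplies every subject's t-value by 1 or -1,
    which corresponds to permuting the 2 conditions within that subject.
    """
    rng = random.Random(seed)
    observed = sum(t_values)
    exceed = 0
    for _ in range(n_perm):
        permuted = sum(t * rng.choice((1, -1)) for t in t_values)
        if abs(permuted) >= abs(observed):
            exceed += 1
    # Fraction of randomizations at least as extreme as the observed value.
    return exceed / n_perm
```

In the full procedure, the statistic entering each randomization would be the extreme cluster-level sum rather than a single pooled value, which is what provides the correction for multiple comparisons.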
The online adjustment of masking luminance as a function of participant reports yielded 1 critical mask contrast (per participant) under which physical stimulation was constant, but perceptual reports were equally distributed between partial and full conscious perception (Fig. 1A; for more detail on the behavioral findings, see Levy et al. 2013). We proceeded to probe the time–frequency (in the broadband gamma spectrum) representations in the VWFA (Dehaene and Cohen 2011) defining the gradual transition from Null, through Partial toward Full perception. The contrast Full versus Partial was characterized particularly by middle gamma (MG) increase (overall at 55–100 Hz, with a peak in power at 60–80 Hz) at approximately 450–650 ms (P < 0.05 corrected) (Fig. 2A, left panel). Whole-brain source localization on this MG enhancement (Full perception compared with baseline activity) determined its source in the left occipito-temporal junction (Fig. 2B, left panel), thereby confirming that during Full perception MG activity emanated from this region, including the VWFA. This neural effect could be explained by perceptual change and was independent of stimulation level, which was constant in this experiment. We then examined the temporal dynamics within the VWFA for each of the 2 word percepts. At 500–700 ms, MG in Full perception was significantly more robust than in Partial perception (P < 0.001 corrected, Fig. 2C, left panel). Moreover, to check whether piecemeal word perception could influence gamma-band synchrony in the VWFA, Rayleigh z-values were computed to reflect uniformity of synchrony across trials. The contrast Full versus Partial yielded a decrease in synchrony from 90 to 130 ms at 41–53 Hz (P < 0.001 corrected), as well as from 30 to 70 ms at 50–66 Hz (P < 0.001 corrected) (Fig. 3, right panel). Looking at each of the 2 conditions separately (see Supplementary Fig. 4, bottom right panels) illustrated that the 2 inhibitory effects related to 2 bursts of local synchrony during partial perception, which are inhibited during full perception. Hence, Experiment I reveals 2 signatures, in power and phase, within the gamma band, that convey a transition from partial to full conscious perception. We then proceeded with a complementary recording session (Experiment II) so as to pinpoint plausible neural signatures for the earlier transition step during conscious perception, namely from null to partial, altogether probing the spectrum of piecemeal perception (Null, Partial, and Full). Moreover, the net perceptual dynamics were investigated in this supplementary session.
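For readers unfamiliar with the Rayleigh statistic used for the synchrony analysis, a minimal sketch is given here. It uses the common definition z = nR², where R is the mean resultant length of the per-trial phases; whether the original analysis used exactly this form, and how the phases were extracted, is not stated in the text, so this is an assumption. The function name is hypothetical.

```python
import cmath

def rayleigh_z(phases):
    """Rayleigh z = n * R**2 for a list of per-trial phases (in radians).

    R is the mean resultant length: 1 when all trials share one phase,
    close to 0 when phases are spread uniformly around the circle.
    """
    n = len(phases)
    resultant = abs(sum(cmath.exp(1j * p) for p in phases)) / n
    return n * resultant ** 2
```

Larger z-values therefore indicate stronger stimulus-locked phase consistency across trials at a given time and frequency.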
First, participants were screened before the MEG experiment for piecemeal conscious perception (i.e., dissociating null, partial, and full conscious perception). The 4 perceptual levels were determined by means of a staircase procedure, followed by an estimation of perception, assessed as d′ values after the participants completed 2 SD and then 2 OD tasks with the 4 predetermined levels. Out of the 22 participants, 15 were screened for piecemeal conscious perception (Null: no letters, no words; Partial: letters, no words; Full: letters and words), and the remaining 7 were not included in the experiment. Out of the rejected subjects, 6 did not yield the Partial level, and instead both their perception of letters (d′ = 1.08 ± 0.51) and that of words (d′ = 0.88 ± 1.01; P = 0.68) were constrained, and a seventh subject did not yield the Null level [the participant's detection of letters was unimpaired (d′ = 2.11) even after applying the highest masking possible]. Thus, the 15 participants yielding piecemeal conscious perception showed a statistically significant difference between the d′ values across the 4 levels of perception (P = 1.77 × 10^−15). Importantly, the SD tasks revealed that complete perception of words (d′ = 2.87 ± 0.42) was significantly higher (P = 1.15 × 10^−10) than purportedly null perception of words (d′ = 0.51 ± 0.77), whereas the OD tasks revealed that complete perception of letters (d′ = 2.45 ± 0.68) was significantly higher (P = 8.30 × 10^−8) than null perception of letters (d′ = 0.28 ± 0.77). Bridging the gap between word perception and letter perception is not a straightforward issue as it is far from unequivocal whether the 2 can be dissociated.
Here, we chose a parametric approach to attempt the dissociation of the 2: given that the perceptual level ostensibly reflecting null perception of words was more weakly masked than that reflecting complete perception of letters (except for 1 participant for whom the 2 levels were behaviorally dissociable, despite corresponding to an identical masking level), one could assume with relatively high confidence that complete perception of letters conjointly mirrored close-to-null perception of words. The “null words” level was used in the present experiment to better define the “complete letters” level, although at present, “null words” assumes no clear perceptual functioning. Thus, in the following analyses, we will refer to 3 levels of perception: Null (null letters), Partial (complete letters), and Full (complete words) (Fig. 1B). Furthermore, to control for possible modulation of individual perception level throughout the experiment, the perceptual levels were also measured offline at the end of the MEG experiment: complete perception of words (d′ = 2.58 ± 0.33) was significantly higher (P = 9.03 × 10^−7) than purportedly null perception of words (d′ = 0.65 ± 0.65), whereas complete perception of letters (d′ = 2.44 ± 0.72) was significantly higher (P = 1.22 × 10^−6) than null perception of letters (d′ = 0.24 ± 0.42). There was no significant difference between the d′ values before and after the experiment (P = 0.43); hence, one may rule out a possible adaptation effect throughout the online experiment and may thereby assume a reliable estimation of the individual perceptual experience. Finally, during the online experiment itself, the participants performed the name detection task successfully (90.58 ± 6.79%), thereby pointing to a reliable level of attention throughout the experiment.
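For readers less familiar with signal detection theory, the d′ values reported above can be illustrated with a minimal sketch. It assumes the standard definition d′ = z(hit rate) − z(false-alarm rate); the function name, the trial counts, and the log-linear correction for extreme rates are illustrative assumptions and are not taken from the study's own analysis.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    The log-linear correction (adding 0.5 to each cell) is an assumption
    made here to keep both rates strictly between 0 and 1; the study does
    not state which correction, if any, was applied.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: chance-level detection versus clear detection.
chance_level = d_prime(25, 25, 25, 25)
clear_detection = d_prime(45, 5, 5, 45)
```

Under this definition, a participant responding at chance obtains d′ close to 0, whereas reliable detection yields clearly positive values, which is the logic behind separating the Null, Partial, and Full levels.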
To investigate the power modulation across the 3 conditions, we computed an ANOVA revealing a significant difference (P = 0.02 corrected) across all time and frequency samples. Post hoc tests revealed that the contrast Full versus Partial was characterized by an MG (overall at 55–80 Hz; note, however, a later reduced effect at ∼40–50 Hz and 550–650 ms) increase at approximately 150–500 ms (P < 0.01 corrected) (Fig. 2A, middle panel); power enhancement in that window could not be explained simply by mask luminance decrease (P = 0.14). Whole-brain source localization on this MG enhancement (Full perception compared with baseline activity) determined its source in the left occipito-temporal junction (Fig. 2B, right panel), thereby confirming that during Full perception MG activity emanated from this region, including the VWFA. Interestingly, and contrary to our expectation, the contrast Partial versus Null was not accompanied by such a power increase but rather by an early HG (at ∼80–100 Hz) decrease (at ∼0–150 ms) (P = 0.03 corrected) (Fig. 2A, right panel); power suppression in that window could not be explained simply by mask luminance decrease (P = 0.12). Whole-brain source localization revealed that brain activity (Full perception compared with baseline activity) at the early and HG spectrum emanated rather from the bilateral occipital cortex (yet including the VWFA in Experiment II) (see Supplementary Fig. 1), thereby suggesting rather general visual processing at the early and HG oscillations during word perception. This was further strengthened by the relative orthogonality of the effects measured in the VWFA with mask luminance (P > 0.11 uncorrected), compared with those measured in other key areas of the occipito-temporal cortex which were rather driven by mask luminance (P < 0.0005 corrected) (see Results section in Supplementary Information). 
Another indication of the relative orthogonality of the effects measured in the VWFA with mask luminance came from Experiment I, in which no masking modulation was applied. Importantly, the whole-brain analyses further suggest that whereas the early HG effect was more related to visual processing (bilateral occipital activity), the later effect (MG) was rather constrained to the left occipito-temporal junction, thereby pointing to a preferential response to words at that later phase of gamma oscillations. This conjecture was further supported by additional analyses at 3 more coordinates of the occipito-temporal cortex: the right VWFA homolog and the bilateral occipital cortex (see Supplementary Material Results section). In those other coordinates, there were hardly any differences that may reflect conscious perception. Instead, the effects reflected (bottom–up) mask luminance modulation (see Supplementary Fig. 2).
Moreover, to test for exclusivity of the 2 reversed power effects, the 2 nonoverlapping time–frequency windows were tested separately for each contrast: the late MG increase did not correlate with the contrast Partial versus Null (P = 0.31), and conversely, the early HG suppression did not correlate with the contrast Full versus Partial (P = 0.12). Furthermore, we tested whether the 2 steps take place during the contrast Full versus Null, thereby confirming that piecemeal conscious perception (i.e., from Null through Partial to Full) is also characterized by an early HG suppression (P = 0.004), followed by an MG enhancement (P < 10^−4). Looking at each condition separately (see Supplementary Fig. 3) sharpens the interpretation: during conscious perception (particularly conspicuous at the first step, namely partial), the brain inhibits early HG that is present under no conscious perception (either null perception, or masks only, regardless of their luminance level). In summary, these results reveal in both experiments a pattern of late MG power increase from Partial to Full word perception. Interestingly, this mechanism is not the only one at work: a second mechanism consisted of an early high gamma (i.e., HG) suppression, approximately between 80 and 100 Hz, resulting from the contrast Partial versus Null, yet not from the contrast Full versus Partial. These findings suggest a 2-staged mechanism of piecemeal conscious perception: an initial suppression of HG, followed by an enhancement of MG.
The temporal dynamics of MG within the VWFA were then probed: an ANOVA was computed across the 3 conditions (P < 0.005 corrected), and post hoc tests revealed that MG in Full perception was significantly more robust than in Partial perception (P < 0.001 corrected) at 150–600 ms and also more robust than in Null perception (P < 0.01 corrected) at 250–500 ms (Fig. 2C, middle panel). This effect could not be explained by mask luminance modulation, as revealed by testing the masked blank conditions (P = 0.26). Furthermore, MG in Partial perception was significantly weaker than that in Null perception in the 50–200 ms window (P < 0.05 corrected), suggesting an enhancement of early visual processing during Null perception. To test for the hemispheric selectivity that is often assumed for this region, we also tested for differences in the right homolog of the VWFA (see Supplementary Fig. 2A): there were no significant effects across conditions in the 2 experiments (P > 0.2). The analyses above revealed that in both experiments MG power in the VWFA correlated with the contrast Full versus Partial.
Moreover, to check whether piecemeal word perception could influence gamma-band synchrony in the VWFA, we combined all 3 conditions (in Experiment II) and computed their Rayleigh z-values reflecting uniformity of synchrony across trials. The test found a significant (P < 0.05 corrected) early synchrony in the low-gamma range (LG) from 0 until 180 ms at 40–70 Hz (see Supplementary Fig. 4, left upper panel). A linear regression was then computed across the 3 conditions, revealing a significant linear decrease of synchrony from Null through Partial to Full (r = −0.41, P < 0.05). This decrease in synchrony could not be explained by bottom–up masking modulation (P = 0.43). Post hoc tests on this time–frequency window revealed the following: 1) The contrast Partial versus Null was characterized by a burst of gamma synchrony suppression from 85 to 120 ms at 52–70 Hz (P < 0.05 corrected) (Fig. 3, left panel), and this decrease in synchrony could not be explained by bottom–up masking modulation (P = 0.12 uncorrected). 2) Similarly, the contrast Full versus Partial was characterized by a burst of gamma synchrony suppression from 120 to 150 ms at 42–52 Hz (P < 0.05 corrected) (Fig. 3, middle panel). Once again, this decrease in synchrony could not be explained by bottom–up masking modulation (P = 0.12 uncorrected). Furthermore, this early burst did not occur during the contrast Partial versus Null (P = 0.41 uncorrected), yet it did occur during the contrast Full versus Null (P < 0.001 uncorrected), hence confirming that this burst suppression occurs during word perception at the second phase, from Partial to Full. Another, earlier inhibitory effect was found [30–60 ms at 56–66 Hz (Fig. 3, middle panel)], yet with lower statistical power (P = 0.005 uncorrected); it could not be explained by masking strength modulation (P = 0.18 uncorrected). 
Beyond the statistical control for masking strength modulation, Experiment I, in which no masking modulation was applied, provides a second control. Remarkably, the 2 inhibitory effects (Full versus Partial) were almost identical in the 2 experiments (Fig. 3, middle and right panels), thereby pointing to early effects that are largely unchanged by task demands or by bottom–up stimulation. Importantly, these effects were pinpointed in the VWFA, yet not in the right VWFA homolog, nor in the bilateral occipital cortex (see Supplementary Material Results section). Looking at each condition separately (see Supplementary Fig. 4) allowed for a more global outlook: 2 bursts of local synchrony arise and possibly reflect a bottom–up sweep triggered by the 2 masks (and the unidentified word in the null condition). Three inhibitory phases around 50–150 ms reflect piecemeal conscious perception. Hence, early gamma-band synchrony is triggered in the VWFA during null conscious perception of scripts (i.e., by nonrepresentational visual processing). It is through the modulation of this activity that the VWFA selectively gate-keeps the access to conscious script perception.
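The Rayleigh z-value used above as a measure of across-trial synchrony can be sketched as follows. This is a minimal illustration, assuming the standard Rayleigh statistic z = n·R², where R is the mean resultant length of the single-trial phases at a given time–frequency sample; the function name and the phase values are hypothetical, and the sketch does not reproduce the study's actual analysis pipeline.

```python
import cmath
import math


def rayleigh_z(phases):
    """Rayleigh statistic z = n * R**2 for a list of phases (in radians).

    R is the mean resultant length: the magnitude of the average of the
    unit vectors exp(i * phase). z is near 0 when phases are spread
    uniformly over the circle and equals n when all phases are identical.
    """
    n = len(phases)
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / n
    resultant_length = abs(mean_vector)
    return n * resultant_length ** 2


# Hypothetical single-trial phases at one time-frequency sample.
aligned_trials = [0.3] * 20                                 # perfect phase alignment
uniform_trials = [2 * math.pi * k / 8 for k in range(8)]    # evenly spread phases

z_aligned = rayleigh_z(aligned_trials)
z_uniform = rayleigh_z(uniform_trials)
```

In this reading, a suppression of synchrony between two conditions corresponds to a lower z-value in the time–frequency window of interest, that is, to single-trial phases that are less tightly clustered.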
We then tested whether the observed power and synchrony mechanisms could reflect an integrated mechanism. To this end, we measured the correlation between the 2 frequency measures during the 2 piecemeal perceptual phases (i.e., partial and full); power and synchrony were anticorrelated during Full conscious perception in Experiment I (r = −0.65, P < 0.05) (Fig. 4, left panel) as well as in Experiment II (r = −0.55, P < 0.05) (Fig. 4, middle panel), and positively correlated during Partial conscious perception in Experiment II (r = 0.56, P < 0.05) (Fig. 4, right panel). Thus, the more local synchrony decreased in partial and full perception, the more power decreased or increased, respectively. Power and phase were therefore functionally coupled in the gamma band to yield piecemeal conscious perception.
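The sign of the power–synchrony coupling described above can be made concrete with a short sketch of a Pearson correlation across participants. The function name and the per-participant values are hypothetical and serve only to illustrate what an anticorrelation (as in Full perception) versus a positive correlation (as in Partial perception) means; the study's own correlation procedure is not specified here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-participant values (arbitrary units).
synchrony = [0.9, 0.7, 0.6, 0.4, 0.2]
power_full = [1.0, 1.3, 1.5, 1.8, 2.2]     # power rises as synchrony falls
power_partial = [1.1, 0.9, 0.8, 0.5, 0.3]  # power falls as synchrony falls

r_full = pearson_r(synchrony, power_full)        # negative: anticorrelated
r_partial = pearson_r(synchrony, power_partial)  # positive: correlated
```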
Furthermore, we asked whether the 2 distinct stages expressed in the MG and HG (Fig. 2A) were functionally related and could thereby co-evolve and together represent an integrated process. To this end, the MG/HG ratio was computed at the trial level for each of the perceptual conditions. In Experiment I, the MG/HG ratio was significantly higher (P = 0.01) for Full compared with Partial perception (Fig. 5A, left panel). In Experiment II, an ANOVA revealed a significant overall effect (P = 0.01) of the MG/HG ratio across Full, Partial, and Null; specifically, the ratio was also significantly higher for Full compared with Partial perception (P = 0.008), and additionally, it was significantly higher for Partial compared with Null perception (P = 0.03) (Fig. 5A, middle panel). A linear regression across subjects revealed that the ratio increased linearly from Null to Full, which could be seen both at the group (r = 0.78, P = 2.62 × 10^−12) (Fig. 5A, middle panel) and single-subject levels (r = 0.10, P = 1.08 × 10^−6) (Fig. 5A, right panel illustrating an example of regression for a single subject). To rule out any plausible bias of masking luminance modulation, the MG/HG ratio was also computed for the 2 additional control conditions, which both contained blanks instead of words and were masked at the Null and the Full masking levels. There was no significant difference (P = 0.28) between the ratio of the lowest masking contrast (equal to that in the Full condition) and that of the highest masking contrast (equal to that in the Null condition). Hence, a parametric increase in the MG/HG ratio could be observed in both experiments and across all hierarchical levels, irrespective of bottom–up modulation, thereby mirroring a cortical mechanism for piecemeal word perception. Altogether, the 3 neuronal mechanisms (power and synchrony) across the broadband gamma (LG, MG, and HG) define piecemeal conscious perception and are summarized in Figure 5B. 
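The logic of the MG/HG ratio analysis can be sketched in a few lines: compute the ratio per trial, then regress it on the ordinal perceptual level. The coding of the levels (Null = 0, Partial = 1, Full = 2), the function names, and all power values below are assumptions made purely for illustration; they are not the study's data or its exact regression procedure.

```python
def band_ratio(mg_power, hg_power):
    """Trial-wise ratio of mid-gamma (MG) to high-gamma (HG) power."""
    return [mg / hg for mg, hg in zip(mg_power, hg_power)]


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical trials: perceptual level (0 = Null, 1 = Partial, 2 = Full)
# with MG and HG power in arbitrary units.
levels = [0, 0, 1, 1, 2, 2]
mg = [1.0, 1.1, 1.2, 1.3, 1.8, 1.9]
hg = [1.5, 1.4, 1.0, 1.1, 1.0, 0.9]

ratios = band_ratio(mg, hg)
slope, intercept = linear_fit(levels, ratios)  # positive slope: ratio grows with level
```

A positive slope in such a fit corresponds to the parametric increase of the MG/HG ratio from Null through Partial to Full; an early HG suppression and a late MG enhancement both push the ratio in the same direction.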
Finally, we probed whether the extent of synchrony during full perception (we selected the strongest effect, i.e., the later of the 2 inhibitory bursts) could reflect the behavioral acuity of full versus partial perception (see Materials and Methods). In line with our expectation, synchrony and full versus partial acuity were anticorrelated (r = −0.55, P < 0.05), thereby suggesting that the less the LG of participants was synchronized during full perception, the sharper they were in distinguishing full from partial perception (Fig. 5C).
In this study, VWFA sensitivity to written scripts was exploited as a neural marker of conscious (script) perception, thereby revealing piecemeal ignition of conscious perception. Accordingly, the 2 MEG experiments highlight 1) that induced gamma-band oscillations (MG) originating at ∼250 ms in the VWFA reflect access to full conscious perception of words, 2) that early (∼100 ms) inhibition of sensory-driven activity in this area “gate-keeps” the access to conscious script perception via a partial step in the chain of ignition of conscious perception, 3) that this mechanism is largely unaffected by sensory bottom–up stimulation, unlike the neighboring areas in the bilateral occipital cortex which strongly mirror sensory stimulation, 4) that these neural events operate at relatively distinct frequency bands, yet they co-evolve and therefore possibly represent a coordinated neural integration, and 5) that the extent of phase-alignment inhibition predicts the acuity of dissociating full from partial perception.
The first result in the present study, that is, the late (∼250 ms) enhancement of gamma oscillations signaling full conscious perception in both experiments, is congruent with the current state of knowledge regarding conscious perception that is well accommodated by the Global Neuronal Workspace (GNW) model (for a review, see Dehaene and Changeux 2011). It cannot, however, relate to other predictions made by the GNW model nor can it relate to other findings which are explained by the model (e.g., long-distance cortico-cortical synchronization at beta and gamma frequencies and “ignition” of a large-scale prefronto-parietal network) due to the local outlook of the present study. However, the second finding, that is, the early (∼100 ms) inhibition of gamma synchrony and power, is not accommodated by the GNW model which strongly argues that “early bottom–up sensory events, prior to global ignition (∼100 ms) contribute solely to nonconscious percept construction and do not systematically distinguish consciously seen from unseen stimuli” (for a review, see Dehaene and Changeux 2011).
Other authors coined these early events the “feedforward sweep” of information processing due to their spreading from low-level to high-level areas of the visual cortical hierarchy; it is claimed that this early sweep in itself reflects nonconscious processing (Lamme and Roelfsema 2000; Lamme 2006). More recent studies further suggest that both cortical reactivity (Aru et al. 2012; Vidal et al. 2015) and synchrony (Bosman et al. 2012; Bastos et al. 2014, 2015) of early occipital gamma-band responses may functionally mark this sweep. From a local perspective, the increase in phase-synchrony could reflect heightened efficiency of information broadcast (Salinas and Sejnowski 2001). The present study shows that the early increase in gamma power (see Supplementary Fig. 3) and synchrony (see Supplementary Fig. 4) in the VWFA is driven by sensory processing (Null perception), and that during piecemeal conscious (script) perception this local increase is selectively inhibited. Thus, the enhanced gamma-band sweep in the VWFA under strong masking (Null perception) could be interpreted as the broadcasting of sensory information (i.e., masks) to which the VWFA is not tuned. In that case, the VWFA would be using resources for processing sensory information at the expense of processing selective information (i.e., script). Thus, by locally (i.e., in the VWFA) inhibiting the early sweep, irrelevant information could be filtered out by acting on the efficiency of its downstream broadcast. The current study therefore predicts not only that conscious perception is piecemeal, but also that it is achieved via a mechanism operating in opposite ways: first filtering out nonspecific information and then tuning into the targeted percepts. 
Although the current analyses revealed that in both experiments the effects were not modulated by low-level masking parameters, future efforts should probe whether the early inhibition reported here reflects a general mechanism related to conscious perception (also visible under other visual paradigms) or is instead restricted to masking paradigms. Moreover, it would be most interesting to probe the piecemeal hypothesis from the perspective of the large-scale brain network (e.g., the GNW) and of its recurrent loops (Lamme 2006).
Although the main focus of the present study was to probe piecemeal conscious perception, the present findings can also be of interest to the domain of reading. Namely, the assumption of an intermediate conscious level was approached here by exploiting the 2 perceptual levels involved in word reading (i.e., letter perception and whole-word perception) as 2 levels of conscious perception (i.e., partial and full-blown perception, respectively). Hence, the findings relate to the long-lasting debate regarding the processing of words’ subfeatures during reading. Traditionally, it has been assumed that words are recognized as a whole and not letter by letter. The assumption was driven by several observations: letters are more quickly and accurately identified within the context of words (i.e., the word superiority effect) (Cattell 1886; Reicher 1969; Wheeler 1970), and word recognition speed is insensitive to the number of letters in (3–6 letter) words (Nazir et al. 1998). The contrasting approach has stressed the importance of processing letters (McClelland and Rumelhart 1981) and is corroborated by other observations: separately identifying letters constrains word recognition (Pelli et al. 2003); clinical cases show an impairment of letter identification with a preservation of word identification in patients with specific brain lesions (Patterson and Kay 1982); and dyslexic children have difficulty identifying objects (e.g., letters) in a cluster (e.g., a word) (Martelli and Di Filippo 2009), a difficulty which is substantially remediated when letter spacing is increased (Zorzi et al. 2012). Recently, neuroimaging studies contrasted letter strings and whole words and thereby suggested that despite a cortical whole-word selectivity (Glezer et al. 2009) it is highly probable that visual stimuli are first encoded as letters before their combinations are encoded as words (Thesen et al. 2012). 
The present findings provide further support to this assumption by modulating perceptual levels of single words.
Future studies of the VWFA can also benefit from the current results: despite extensive research on the functionality and expertise of the VWFA during reading, very little is known about the functional dynamics of this area, especially due to the scarcity of neuroimaging methods combining high temporal sampling with good spatial resolution. The present findings altogether suggest that orthographic features by themselves trigger a first gamma response (at ca. 200 ms) in the VWFA, regardless of pronounceability, but also that genuine word forms induce a concurrent, yet much stronger response in that area (Fig. 2C). These findings corroborate similar recent intracranial reports of the dynamics of different orthographic categories in the VWFA (Vidal et al. 2010, 2014; Hamame et al. 2013). Furthermore, in both Experiments I and II, we note power enhancement at the descent from the peak during Full perception, although in Experiment II it is mostly masked out by the earlier activation peak (Fig. 2C, middle panel). A similar pattern was also observed in recent intracranial studies (Thesen et al. 2012; Hamame et al. 2013) and may reflect top–down feedback from anterior (lexical and phonological) areas (Song et al. 2012) which are activated earlier than the occipito-temporal cortex during word reading (Pammer et al. 2004; Cornelissen et al. 2009) and object perception (Bar et al. 2006). Finally, this 2-staged scenario may reconcile the 2 major (opposing) views on the role of the occipito-temporal junction during word perception (for a recent review, see Carreiras et al. 2014). Specifically, it confirms an early first bottom–up sweep in this region highly sensitive to word representational units (for a review, see Dehaene and Cohen 2011; for an empirical demonstration, see Hamame et al. 
2013), followed by a top–down phonological/semantic input from higher areas which does not necessarily assume a selective tuning to those representational units (for a review, see Price and Devlin 2011; for an empirical demonstration, see Woodhead et al. 2014). In the future, combining time-resolved approaches with network dynamics should further investigate this novel outlook on word processing in the brain.
The present work was supported by the following funding: I-CORE Program of the Planning and Budgeting Committee, the Israel Science Foundation (grant No. 51/11), the Returning Scientists' Program from the Israeli Ministry of Absorption, the Gonda Centre's postdoctoral allowance, the European Science Foundation's European Young Investigator Award Program and the French Ministry's doctoral research allowance.
We thank Yuval Harpaz for his technical support and advice in Israel, and Bram Daams and Sander Berends for their technical support in the Netherlands. We also thank Jan-Mathijs Schoffelen, Robert Oostenveld, and Stan van Pelt for their advice on the analysis scheme, and Michal Lavidor and Ian FitzPatrick for their linguistic support in Hebrew and Dutch, respectively. Conflict of Interest: None declared.