We investigated the sound-induced flash illusion, an example of the influence of auditory information on visual perception. It consists of the perception of 2 visual stimuli upon the presentation of a single visual stimulus alongside 2 auditory stimuli. We used magnetoencephalography to assess the influence of prestimulus oscillatory activity on the varying perception of invariant stimuli. We compared cortical activity from trials in which subjects perceived the illusion with trials in which they did not, keeping the stimulation fixed. Subjects perceived the illusion in approximately 50% of trials. Prior to the illusion, we found stronger beta-band power in left temporal sensors, localized to the left middle temporal gyrus. Illusory perceptions were preceded by increased beta-band phase synchrony between the left middle temporal gyrus and auditory areas as well as by decreased phase synchrony with visual areas. Alpha-band phase synchrony between visual and temporal, parietal, and frontal cortical areas as well as alpha-band phase synchrony between auditory and visual areas were also modulated. These findings support and extend reports on the influence of brain states prior to stimulation on subsequent perception. We suggest that prestimulus local and network activities form predispositions that determine whether sensory streams will be integrated.
In his seminal book “The Organization of Behavior” (Hebb 1949), Hebb stated that “electrophysiology of the central nervous system indicates in brief that the brain is continuously active, in all its parts, and an afferent excitation must be superimposed on an already existent excitation. It is therefore impossible that the consequence of a sensory event should often be uninfluenced by the existing activity.” It is thus surprising that the brain state at the time of sensory stimulation has only recently become a focus of research. A number of researchers (Hanslmayr et al. 2007; Van Dijk et al. 2008; Romei et al. 2010) have reported on the influence of prestimulus fluctuations on perception. Whereas these studies focus on a relatively simple perceived-versus-not-perceived distinction between visual stimuli influenced by modulations in alpha-band activity in parieto-occipital regions, we recently showed that prestimulus beta-band activity in multisensory integration regions influences the perception of an audiovisual illusion—that is, a condition in which participants are always capable of reporting a percept, albeit with varying content (Keil et al. 2012). Beta-band activity is often reported alongside alpha-band modulations, but has received less attention in the domain of cognition and perception. Laufs et al. (2003) suggested that beta-band oscillations index spontaneous cognitive operations during rest. Modulations in beta-band activity may thus constitute a crucial feature for varying perceptual predispositions. Engel and Fries (2010) recently put forward the idea that beta-band activity is associated with endogenous top-down influences in the cognitive domain. Using the McGurk effect (Keil et al. 2012), we showed that local beta power in the left superior temporal gyrus (STG) as well as a phase coupling to frontal and parietal and also decoupling from auditory and visual primary sensory areas abets audiovisual integration and thus illusory perception. 
A recent study using a tactile–visual, double-flash paradigm (Lange et al. 2011) also reported on modulations in the beta range, whereas a very extensive report by Hipp et al. (2011) tells of beta-band modulations in large-scale cortical networks playing a role in the perception of bistable, ambiguous stimuli (i.e., a stimulus that has no clearly defined percept). The sound-induced flash illusion (SIFI; Shams et al. 2000), however, does not consist of bistable, ambiguous stimuli. In this paradigm, a single visual stimulus is accompanied by 2 auditory stimuli. Both stimuli are salient in isolation and do not produce a high number of false responses. Their combination, however, results in an illusory perception of a second visual stimulus. Previous studies (Shams et al. 2001, 2005; Mishra et al. 2007, 2010) report the rates of illusory perception between approximately 45% and 81% of trials. Recent studies (Shams et al. 2005; Cappe et al. 2010; Mishra et al. 2010) have reported very early modulations of the visual cortex through sound as well as early interactions between audiovisual and auditory and visual stimulation. These studies have concordantly identified the superior temporal regions alongside primary visual and auditory regions as a source of this effect. Moreover, this effect is related more to the perceptual experience of the illusion than to response bias, as has been stated by Mishra et al. (2010). Cappe et al. (2010) are in line with this as they found distinct configurations of cortical sources associated with audiovisual stimuli.
While the aforementioned studies on the SIFI have investigated poststimulus activity patterns, in the present study, we are mainly interested in the prestimulus states that predispose audiovisual integration and subsequent perception using magnetoencephalography (MEG). Apart from prestimulus differences in local synchronization, our analysis also encompasses the investigation of the patterns of interareal synchronization as measured using phase synchronization prior to the stimulus onset. We hypothesize that both aspects of macroscopic neuronal activity crucially determine the flow of information once the stimulus impinges on the system. Based on our previous work on the McGurk illusion (Keil et al. 2012) as well as recent theoretical considerations that view beta-band activity as a possible indicator of large-scale top-down influence (Engel and Fries 2010), a special emphasis was placed on prestimulus beta-band dynamics.
Materials and Methods
Fourteen paid volunteers (1 male, 13 females; mean age 23.6 years) participated in this study. All participants were right-handed and had normal hearing and normal or corrected-to-normal vision. Prior to the experiment, participants received detailed instructions regarding the procedure of the experiment and subsequently gave their written informed consent. The procedures of the experiment were approved by the Ethics Committee of the University of Konstanz.
Experimental Design and Apparatus
The experiment consisted of 600 trials in which we presented small dots and short tones in 4 blocks of 150 trials each. The stimuli were presented via Psyscope X (http://psy.ck.sissa.it/) on a MiniMac (Apple Inc.). Three hundred ninety trials contained the critical mismatching stimuli (1 dot and 2 tones; V1A2). The remaining trials consisted of the presentation of all other combinations of stimuli (V0A1, V0A2, V1A0, V1A1, V2A0, V2A1, and V2A2). We followed the description of Shams et al. (2002; experiment 2) in the presentation of stimuli: A fixation cross was displayed throughout the entire stimulus presentation. Visual stimuli were presented for 17 ms, auditory stimuli for 7 ms, and the second stimuli followed after an interstimulus interval of 50 ms (Fig. 1A). Note that the onset of the first auditory and visual stimulus was synchronous, as the number of trials with multiple perceived flashes increases with smaller stimulus onset asynchronies (Shams et al. 2002; experiment 2). To make the onset of stimulation unpredictable, a random pause between 700 and 1400 ms was inserted between the onset of the fixation cross and the presentation of stimuli. The response screen (“0 1 2”) was displayed 300 ms after stimulus offset without time-out. Using a forced choice task with a 500-ms delay following stimulus offset (i.e., 200 ms after the onset of the response screen), participants had to indicate whether they had perceived 0, 1, or 2 visual stimuli by pressing a button. This additional delay was introduced to avoid accidental or random button presses. The next trial started after the response. Responses were always given with the right hand via 3 buttons of a MEG-compatible response pad (LumiTouch, Photon Control, Inc.). Each trial required a behavioral response; therefore, motor preparation activity can be assumed to be identical in all trials.
An important dependent variable in this investigation was thus the subjectively perceived content of the audiovisual sensation. The visual stimuli were presented on a screen inside the magnetically shielded MEG acquisition room using a video projector (DLA-G11E, JVC, Friedberg, Germany) and a set of mirrors positioned outside the room. The screen was positioned 60 cm above the subject, the fixation cross had a size of 1.5 by 1.5 cm, and visual stimuli were presented 5.5 cm below the fixation cross with a radius of 1 cm. The audio stimuli were 1000-Hz sine wave tones presented with an analogue-to-digital converter (Motu 2408) using amplifiers (Servo 200, Samson) and a 6.1-m long, 4-mm wide tube system (Etymotic Research, ER30).
Data Acquisition and Analysis
MEG recording was conducted using a 148-channel magnetometer (MAGNES 2500 WH, 4D Neuroimaging, San Diego). A subject-specific headframe coordinate reference was defined by means of 5 anatomical landmarks. These head fiducials, 5 coils, and the subject's head shape were digitized using a Polhemus 3Space Fasttrack at the start of each session. The subject's head position relative to the pickup coils and the MEG sensors was estimated before and after each session to ensure that no large movements occurred during data acquisition.
Subjects were lying comfortably in a supine position. They were instructed to remain still during the stimulation and to avoid eye movements and blinks as much as possible. Continuous data sets were recorded with a sampling rate of 678.17 Hz (bandwidth 0.1–200 Hz). A video camera installed inside the MEG chamber allowed the monitoring of subjects' behavior and compliance throughout the experiment.
After data acquisition, epochs of 4 s (±2 s) around stimulus onset were extracted from the raw data. Note that these epochs also contained activation from previous and following events, but were chosen to avoid filter artifacts and to allow reliable time–frequency analyses. Shorter segments around stimulus onset without contaminating activity were used later on. Epochs were visually inspected for eye movement (recorded via electrooculogram) or movement artifacts. Critical trials (1 visual stimulus and 2 auditory stimuli, V1A2) were grouped according to their response into 2 categories: “no illusion” (response “1”) and “illusion” (response “2”). Trials with only one visual (V1A0) or one auditory stimulus (V0A1) were used to identify the unisensory evoked fields and localize their respective sources in primary sensory areas. The numbers of trials for the different categories were equalized for each subject using random omission in order to ensure comparable signal-to-noise ratios (SNRs) for both perceptual categories, as has been done a number of times before (Keil et al. 2010; Müller and Weisz 2011). Resulting epochs were filtered with a 1-Hz high-pass filter (zero-phase, Butterworth) prior to the analysis of oscillatory activity. As the prestimulus activation was the main interest of the study, no baseline was defined and outputs of the sensor- and source-space analysis for the conditions were directly compared. For the analysis of event-related activity, single trials were low-pass filtered using a 30-Hz zero-phase Butterworth filter prior to averaging and were then converted to the root mean square (RMS).
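The equalization of trial numbers by random omission can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `equalize_trials`, the fixed seed, and the representation of trials as list entries are assumptions made for the example.

```python
import random


def equalize_trials(illusion, no_illusion, seed=0):
    """Randomly omit trials so that both categories have the same size.

    `illusion` and `no_illusion` are lists of trials (any objects).
    The seed is a hypothetical choice to make the example reproducible.
    """
    rng = random.Random(seed)
    n_keep = min(len(illusion), len(no_illusion))
    # Draw indices without replacement, then sort to keep the trial order.
    keep_a = sorted(rng.sample(range(len(illusion)), n_keep))
    keep_b = sorted(rng.sample(range(len(no_illusion)), n_keep))
    return [illusion[i] for i in keep_a], [no_illusion[i] for i in keep_b]


# Usage with made-up trial labels: 180 illusion and 150 no-illusion trials.
ill, no_ill = equalize_trials(list(range(180)), list(range(1000, 1150)))
```

Sampling without replacement keeps every retained trial unique, and the smaller category is left unchanged, so only the larger category loses trials.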
For the time–frequency analysis, a wavelet time–frequency transformation with Morlet wavelets was computed (wavelet length = 7 cycles, size of the Gaussian taper = 3). Average event-related activity was subtracted from the single trials before computing the time–frequency transformation in order to remove potential effects of the dominant evoked response that bled into the baseline period. This procedure resulted in single-trial estimates of oscillatory power between 5 and 41 Hz in 2-Hz steps. To exclude the theoretical possibility that event-related field (ERF) subtraction adversely influences the estimation of prestimulus power, the entire analysis was repeated without subtraction. Since the results were virtually identical, in this manuscript, we restrict our description to the original analysis.
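To make the wavelet parameters concrete, the following sketch computes the spectral and temporal resolution implied by a 7-cycle Morlet wavelet at the analyzed frequencies. It assumes the common convention that the spectral standard deviation is f/width, the temporal standard deviation is 1/(2π·σ_f), and the wavelet extends over ±(taper size) temporal standard deviations; this convention and the function name `wavelet_resolution` are assumptions of the example, not statements from the text.

```python
import math

WIDTH_CYCLES = 7  # wavelet length in cycles (from the text)
GAUSS_WIDTH = 3   # size of the Gaussian taper (from the text)


def wavelet_resolution(freq_hz, width=WIDTH_CYCLES, gwidth=GAUSS_WIDTH):
    """Return (sigma_f in Hz, sigma_t in s, total wavelet duration in s).

    Assumed convention: sigma_f = f / width, sigma_t = 1 / (2*pi*sigma_f),
    and the wavelet spans -gwidth*sigma_t .. +gwidth*sigma_t.
    """
    sigma_f = freq_hz / width
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)
    duration = 2.0 * gwidth * sigma_t
    return sigma_f, sigma_t, duration


# Frequencies from 5 to 41 Hz in 2-Hz steps, as described in the text.
frequencies = list(range(5, 42, 2))

for f in (5, 17, 41):
    sigma_f, sigma_t, duration = wavelet_resolution(f)
    print(f"{f:2d} Hz: sigma_f={sigma_f:.2f} Hz, "
          f"sigma_t={sigma_t * 1000:.1f} ms, duration={duration:.3f} s")
```

Under these assumptions, the longest wavelet (at 5 Hz) lasts well over 1 s, which is one reason why epochs considerably longer than the interval of interest are useful for time–frequency analysis.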
To identify the cortical sources of the effects found on the sensor-level, single-trial activity was projected into the source space. Voxel-wise source-space activity from the different response categories was again compared via t-tests. A linearly constrained minimum variance (LCMV) beamformer algorithm (Van Veen et al. 1997) was used to identify the sources of the effects found in the time-series analysis and to localize primary auditory and visual areas for the analysis of connectivity. Source analysis was performed for an activation interval of 265–280 ms after stimulus onset based on the effect identified at the sensor level (see Results) and for the interval of the N1 component (115–135 ms after stimulus onset for the visual domain based on the V1A0 trials and 80–100 ms for the auditory domain based on the V0A1 trials). Spatial filters were computed for the interval between 100 and 400 ms after stimulus onset. Here, the baseline was used from −100 ms until −80 ms before stimulus onset with an interval for the computation of the spatial filter between −400 ms and −100 ms before stimulus onset.
Dynamic imaging of coherent sources (DICS; Gross et al. 2001)—a frequency-domain adaptive spatial filtering algorithm—was used to identify the sources of the effects found in the time–frequency domain. First, illusion and nonillusion trials were combined to one data set, the sensor-level cross-spectral density (CSD) was computed for the time–frequency window identified in the analysis of sensor-level data, and the spatial filter was created using these data. Subsequently, data from both response categories were projected separately into source space using the spatial filter from the previous step. The source analysis was separately conducted on the activity of the 2 conditions, and the difference between the projected sources was computed in the statistical analysis. Source activity was interpolated onto individual anatomical images from magnetic resonance imaging (MRI) and subsequently normalized onto the standard Montreal Neurological Institute (MNI) brain using SPM8 in order to calculate group statistics and for illustrative purposes. With regard to the source-level results, it should be kept in mind that while results are displayed in voxel space (following interpolation onto individual MRIs and normalization onto a MNI volume) with millimeter spacing between voxels, the original spacing between grid points was 1 cm with a total of 1496 cortical sources. Because at relevant points in the results section we report MNI coordinates (in mm), we explicitly point out that the actual resolution of our source analysis does not become higher due to these processes implemented for visualization.
Functional Connectivity Analysis
To understand the role of network processes that influence perception, we analyzed the functional connectivity between cortical sources. The functional connectivity of neuronal activity between cortical regions of interest and the whole-brain volume was analyzed in terms of phase-locking values (PLVs; Lachaux et al. 1999). Phase synchrony was computed for the time and frequency of interest as identified by the sensor-level analysis of the prestimulus effects and for the sources of this effect as identified by the source analysis and primary visual and auditory cortices as identified by the source analysis of the N1 without a priori assumptions. If the phase differences between 2 oscillators are constant, these oscillators are likely either to interact with each other or to share a common driving force. Uniform distributions of phase differences indicate the independence of 2 oscillators. We first computed sensor-level CSD for the time and frequency range identified in the time–frequency analysis (multitaper analysis, DPSS tapers). These CSD values were then projected into source space by multiplying them with the accordant beamformer spatial filters. Spatial filters were constructed from the covariance matrix of the averaged single trials at the sensor level and the respective leadfield by an LCMV beamformer (Van Veen et al. 1997). We subsequently calculated PLVs between the regions of interest and all other sources in order to identify the regions in which functional coupling differentiated between subsequent perceptions.
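The phase-locking value described above can be illustrated with a short sketch. It follows the standard definition (the length of the mean resultant vector of the phase differences across trials) and assumes that single-trial phases of the two sources are already available in radians; the function name `phase_locking_value` is hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV across trials for two sources, given single-trial phases in radians.

    A value near 1 indicates a constant phase difference across trials;
    a value near 0 indicates uniformly distributed phase differences.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two non-empty phase lists of equal length")
    resultant = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(resultant) / len(phases_a)


# Constant phase lag of 0.7 rad across trials: strong phase locking.
source_a = [0.1 * k for k in range(50)]
source_b = [p - 0.7 for p in source_a]
plv_locked = phase_locking_value(source_a, source_b)

# Phase differences spread evenly around the circle: no phase locking.
spread = [2 * math.pi * k / 8 for k in range(8)]
plv_uniform = phase_locking_value(spread, [0.0] * 8)
```

Because only the phase difference enters the exponent, the measure does not depend on the size of the lag itself, only on its consistency across trials.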
In the present study, we were interested in the degree to which functional connectivity is predictive of upcoming percepts. To assess this at the single-trial level, we further examined the functional coupling identified in the phase synchrony analysis. Single-trial complex values for the frequency and time interval identified before were thus projected into source space via frequency-domain adaptive spatial filtering (Gross et al. 2001), and the power and phase for each cortical source were computed from the complex values. For each identified source pair, the single-trial phase difference between the sources was subtracted from the mean of the single-trial phase difference in order to assess the deviance from the mean, thus giving a single-trial index of phase coupling relative to the average coupling. These deviance values from the mean were sorted into subject-individual quartiles, and the proportion of illusion trials was computed for each phase quartile. Figure 2 illustrates this approach, which has been similarly used by Hanslmayr et al. (2007), albeit at the sensor level. To further support this analysis and to assess the stability of the linear trend in the data, we created 2 separate data sets from the single-trial deviance values per subject by randomly assigning each trial to either a “predict” or a “test” data set. Then we performed a binomial 1-way analysis of variance (ANOVA) for the effect of phase deviance on perception to fit a generalized linear model for the first data set (the predict set). This model was then used to predict the effect of phase deviance on perception. The prediction was subsequently compared with the second data set (the test set) using a χ2 test. Furthermore, the angle of phase difference between cortical sources for the trials in the 4 quartiles and power-to-power correlation between cortical sources was computed as a test for spurious synchrony due to local power increase or volume conduction.
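The quartile approach can be sketched as follows for one subject and one source pair. Several details are assumptions of this example rather than statements from the text: the mean phase difference is taken as the circular mean, the deviance is the absolute wrapped difference from that mean, and quartiles are formed by rank. The function names are hypothetical.

```python
import cmath
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def phase_deviance(phase_diffs):
    """Absolute deviation of each single-trial phase difference from the
    circular mean phase difference (assumed definition of 'deviance')."""
    mean_angle = cmath.phase(sum(cmath.exp(1j * d) for d in phase_diffs))
    return [abs(wrap(d - mean_angle)) for d in phase_diffs]


def illusion_rate_per_quartile(deviance, is_illusion):
    """Sort trials by deviance into 4 rank-based quartiles and return the
    proportion of illusion trials in each (smallest deviance first)."""
    n = len(deviance)
    if n < 4 or n != len(is_illusion):
        raise ValueError("need at least 4 trials and matching labels")
    order = sorted(range(n), key=lambda i: deviance[i])
    rates = []
    for q in range(4):
        idx = order[q * n // 4:(q + 1) * n // 4]
        rates.append(sum(is_illusion[i] for i in idx) / len(idx))
    return rates


# Made-up example: 8 trials, illusion reported mostly at small deviance.
diffs = [0.05, -0.05, 0.3, -0.3, 1.0, -1.0, 1.5, -1.5]
labels = [1, 1, 1, 0, 0, 1, 0, 0]
rates = illusion_rate_per_quartile(phase_deviance(diffs), labels)
```

In this toy data set, the proportion of illusion trials falls from the first to the fourth quartile, which is the kind of pattern the quartile analysis is designed to detect.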
One concern regarding the interpretation of phase synchrony is that less power in one condition would lead to lower PLVs merely by reduced SNR, rather than genuine differences in coupling. To exclude such artificial differences with high confidence, we additionally compared oscillatory power between illusion and nonillusion trials at the respective cortical sources used in the functional connectivity analysis using dependent-samples t-tests.
In the analysis of event-related activity, we applied a dependent-samples t-test with Monte-Carlo randomization and Holm correction for multiple comparisons for the interval of 0–500 ms of the RMS transformed time course over all sensors. In the analysis of the time–frequency power, a cluster-based (at least 2 sensors per cluster) dependent-samples t-test with Monte-Carlo randomization was performed on the sensor data for the frequency range of 5–41 Hz and all sensors, separately for the prestimulus interval of −500–0 ms and the poststimulus interval of 0–500 ms (Maris and Oostenveld 2007). This method allows for the identification of clusters of significant difference in 2-dimensional (2D) and 3-dimensional (3D; time, frequency, and space) data, effectively controlling for multiple comparisons. Clusters were defined as significant if the probability of observing larger effects from shuffled data was <5%. The cluster-level test statistic is defined as the sum of the t statistics in 2D or 3D space in the respective cluster. For the identification of the probable neuronal generators of the observed sensor effects, statistical comparisons at the source level were computed using dependent-samples t-tests. Results from the source level were thresholded and corrected for multiple comparisons using AlphaSim (α level = 0.01, critical cluster size = 915 voxels; http://afni.nimh.nih.gov/afni/).
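The logic of a dependent-samples t-test with Monte-Carlo randomization can be illustrated with a deliberately simplified sketch for a single value per subject. It omits the cluster-forming step entirely and uses random sign flips of the within-subject differences, which is equivalent to exchanging the condition labels within subjects. The function names, number of randomizations, and seed are assumptions of the example.

```python
import math
import random


def paired_t(diffs):
    """Dependent-samples t statistic computed from within-subject differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        # Degenerate case: all differences identical.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def sign_flip_p_value(cond_a, cond_b, n_perm=2000, seed=0):
    """Two-sided Monte-Carlo p-value for paired data via random sign flips."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    t_obs = abs(paired_t(diffs))
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(paired_t(flipped)) >= t_obs:
            count += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (count + 1) / (n_perm + 1)


# Made-up data for 14 subjects with a consistent condition difference.
illusion_power = [2.0 + 0.1 * i for i in range(14)]
no_illusion_power = [1.0] * 14
p_effect = sign_flip_p_value(illusion_power, no_illusion_power)
```

In a cluster-based test, the same randomization is applied to the whole sensor-by-frequency-by-time array, and the summed t values of the largest cluster are compared across randomizations instead of a single t value.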
Reaction tendencies were computed as an individual's predisposition toward perceiving the illusion. This relative proportion of illusion reactions (the number of illusion trials divided by the number of all mismatching trials; high numbers indicated a greater tendency toward an illusion percept) in all mismatching trials was correlated with the individual differences (cortical activity or functional connectivity for the “illusion” trials versus “no illusion” trials) at the source level for the time–frequency analyses. Again, we projected the single-trial sensor-level data into source space, and the voxel-wise difference between the response categories was correlated with the behavioral data. This analysis indicated with which neuronal processes the individual's predisposition to perceiving the SIFI was associated.
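The reaction tendency and its correlation with a condition difference at one voxel can be sketched as follows. The use of the Pearson coefficient and all numerical values are assumptions made for illustration; the text does not specify the type of correlation.

```python
import math


def reaction_tendency(n_illusion, n_mismatching):
    """Proportion of illusion responses among all mismatching (V1A2) trials."""
    return n_illusion / n_mismatching


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for 5 subjects: illusion counts out of 100 mismatching trials
# and the illusion-minus-no-illusion power difference at one voxel.
tendencies = [reaction_tendency(k, 100) for k in (20, 35, 47, 60, 78)]
power_diff = [-0.4, 0.1, 0.2, 0.5, 0.9]
r = pearson_r(tendencies, power_diff)
```

Repeating this computation for every voxel yields a correlation map that can then be thresholded in the same way as the other source-level statistics.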
If phase synchrony influences the subsequent perception, analysis of single-trial phase difference should yield additional information to the condition comparison. A repeated-measures ANOVA in R (R Development Core Team 2011) was used to evaluate the effects of the source phase differences, and the Tukey honestly significant difference test was used for the post hoc analyses. A trend in the proportion of illusion trials per source phase difference quartile was evaluated with the Cochran-Armitage test.
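The Cochran-Armitage trend test can be sketched in a few lines. This is the textbook form of the statistic with equally spaced scores for the 4 quartiles and a χ2 distribution with 1 degree of freedom; the choice of scores and the trial counts are assumptions of the example and are not taken from the study.

```python
import math


def cochran_armitage(illusion_counts, trial_counts, scores=None):
    """Cochran-Armitage test for a linear trend in proportions.

    Returns (chi2, p_value), where chi2 has 1 degree of freedom.
    Scores default to 1..k (an assumed, equally spaced coding).
    """
    k = len(illusion_counts)
    if scores is None:
        scores = list(range(1, k + 1))
    total = sum(trial_counts)
    p_bar = sum(illusion_counts) / total
    if p_bar in (0.0, 1.0):
        raise ValueError("trend is undefined when all responses are identical")
    t_stat = sum(s * (r - n * p_bar)
                 for s, r, n in zip(scores, illusion_counts, trial_counts))
    sum_ns = sum(n * s for s, n in zip(scores, trial_counts))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, trial_counts))
    variance = p_bar * (1.0 - p_bar) * (sum_ns2 - sum_ns ** 2 / total)
    chi2 = t_stat ** 2 / variance
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: illusion trials out of 40 trials in each deviance quartile.
chi2, p_value = cochran_armitage([30, 25, 20, 15], [40, 40, 40, 40])
```

With these illustrative counts, the proportion of illusion trials decreases steadily across quartiles and the test indicates a clear linear trend; identical proportions in all quartiles would give a statistic of 0.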
All aspects of offline treatment of the MEG signals were accomplished using FieldTrip (Oostenveld et al. 2011), an open-source signal processing toolbox for MATLAB (www.mathworks.com). Anatomical structures corresponding to the statistical effects are labeled according to the Talairach atlas.
We used the SIFI to elucidate the role of prestimulus oscillatory cortical power and connectivity on the changing perception of invariant stimuli and event-related activity associated with different perceptions. We analyzed cortical activity associated either with the perception of an illusory double flash (response “2”) or with the perception of only a single visual stimulus (response “1”).
In the analysis of the subjects' behavioral responses, we found that participants reported an illusory double-flash perception in 47% (range 20–78%) of the critical (V1A2) trials. This was not significantly different from the number of nonillusory critical trials (53%, range 22–79%) as assessed with an unpaired t-test for unequal variances (Welch's t-test, t = 0.99, P = 0.33; Fig. 1B). Subjects correctly identified the number of visual stimuli in the control trials in which either one visual and one auditory stimulus (V1A1, 92%, range 63–100%) or 2 visual and 2 auditory stimuli (V2A2, 72%, range 27–100%) were presented. The number of reports of a double flash was significantly larger in the control condition with 2 auditory and visual stimuli than that in the critical trials with 2 auditory stimuli and 1 visual stimulus (t = 3.29, P < 0.01). This indicates that the subjects did not make an error when reporting having seen a second visual stimulus in the critical incongruent trials, but that both percepts were equally probable. There was no significant difference in reaction times between the 2 response categories as assessed with a paired t-test (714 vs. 730 ms, t = −0.39, P = 0.70). As the reaction times within the incongruent trials do not differ and there was also no difference from the mean overall reaction time (724 ms), subjects were unlikely to have experienced a perceptual ambiguity or conflict, as suggested by Mishra et al. (2010). Note that we introduced a 500-ms delay between stimulus offset and the response in order to avoid premature button presses.
In the analysis of the event-related activity of the 2 response categories, we found a significant difference between 265 and 280 ms after trial onset in the RMS transformed time course (P < 0.05; Fig. 3A). The topography of the nontransformed time course in the same latency reveals a dipolar pattern with a positive peak in the left hemisphere and a negative peak in the right hemisphere (Fig. 3B), suggesting a medial source. We point out that this effect was derived from the RMS applying the Holm correction, in other words, an approach that considers all sensors together. The ERF effect did not survive the more conservative cluster-based correction for multiple comparisons, perhaps due to its spread onto 2 spatially rather circumscribed clusters. As we introduced a 500-ms delay between stimulus onset and the response screen, this period was not contaminated by motor responses. LCMV source analysis identified the cingulate gyrus as the source of this effect (Fig. 3C); this structure has often been associated with attention, control, error processing, and performance monitoring (Botvinick et al. 2001; Keil et al. 2010), but has also been identified as part of a salience network (Menon and Uddin 2010) involved in the maintenance of task sets and goal-directed behavior (Dosenbach et al. 2007). Unlike earlier studies (Bhattacharya et al. 2002; Shams et al. 2005; Mishra et al. 2007, 2010; Cappe et al. 2010), we did not find evidence for early audiovisual interactions. However, we did not focus on the contrast between combined audiovisual stimulation and separated auditory and visual stimulation, but rather on the effect of different perceptions of identical stimuli. We did not find any significant differences between the 2 response categories in the poststimulus time–frequency analysis. We also checked for differences in motor preparation prior to the behavioral responses and found no significant differences between the different responses.
The focus of the present study was on the role of prestimulus activity. We therefore analyzed oscillatory power in the interval beginning 500 ms before and ending upon stimulus onset using a wavelet time–frequency analysis. Utilizing a cluster-based permutation statistic, we identified one cluster of power (13–21 Hz, −500 ms until −100 ms), which significantly differentiated between subsequently perceived illusions and nonillusion trials (P < 0.05; Fig. 4A). The topography of the significant sensor cluster associated with this effect comprised left temporal sensors (Fig. 4B), although higher beta power can be seen on a large number of sensors. Due to this low spatial acuity of topographic sensor maps, a correct interpretation of the results required the identification of possible cortical generators. Beamformer source analysis (DICS; Gross et al. 2001) suggested the posterior portions of the left middle temporal gyrus (Brodmann area 39, MNI coordinates [−40 −64 18]; Fig. 4C) as the source for this effect. This underscores the role of the putatively multisensory (Calvert et al. 2000) left temporal areas in merging audiovisual information, as indicated by findings on the McGurk effect (Beauchamp et al. 2010; Keil et al. 2012) and by findings on ambiguous audiovisual stimuli (Hipp et al. 2011). A positive correlation between the reaction tendency toward the illusion and the voxel-wise beta power difference values for the comparison between illusion and nonillusion trials was found in the left middle frontal gyrus (BA9, MNI coordinates [−34 9 37], r ∼ 0.76, P < 0.05; Fig. 4D,E). Thus, processes at the level of multisensory areas might be insufficient for explaining an upcoming illusion. Top-down influences from frontal areas might thereby play an important role, as well as activity in a network spanning primary sensory, multisensory, and higher-order areas.
To investigate the influence of network connectivity, we computed PLVs between the region of interest identified in the power analysis (i.e., BA39), the auditory and visual cortices identified via analysis of the N1 location, and the whole-brain volume. The statistical comparison of PLVs between illusion and nonillusion trials revealed a complex pattern of alpha- (9–11 Hz) and beta- (13–21 Hz) phase synchrony associated with the varying perception of invariant stimuli (see Fig. 5 for details). In the alpha band (solid lines in Fig. 5), the right primary auditory cortex was found to be more strongly connected to visual areas in BA18 prior to the illusion trials. The primary visual cortex, however, was more strongly connected to medial frontal and parietal (BA4) areas and less connected to the inferior frontal cortex (BA44) prior to the illusion trials. This points to an important role of ongoing fluctuations in network connectivity in multisensory perception. The left middle temporal gyrus, the source of the beta-band power effect, was found to be relatively more phase locked in the beta band (dashed lines in Fig. 5) in illusion trials to auditory processing areas in the anterior temporal lobe (BA21) and relatively less phase locked in illusion trials to visual processing areas in the occipital lobe (BA18). This indicates that, at the group level, auditory and multisensory areas are more strongly connected prior to audiovisual illusions, whereas visual and multisensory areas are more strongly connected prior to nonillusion trials. To assess the role of ongoing phase on single-trial perception, we computed the deviation from the mean phase difference between the cortical sources identified above. Small deviance values thus indicate a stronger connectedness (i.e., a fixed phase difference). For the connectivity between BA39 and BA21, we found a significant effect for the phase deviance on perception (F3,52 = 6.41, P < 0.001, Fig. 6).
For small phase deviance values (first quartile), the proportion of illusory trials was significantly above the chance level and significantly larger than for the third and fourth quartiles. For large phase deviance values (third and fourth quartiles), this proportion was significantly below the chance level. We found a significant trend (χ2 = 6.54, P < 0.05) in the data indicating a larger proportion of illusion trials for smaller phase differences and vice versa for large phase deviance values. This means that not only do PLVs generally differ between subsequent perceived illusions and nonillusions, but also the extent of synchronization between BA39 and BA21 influences perception on a single-trial level. To further support this finding, we split the single-trial phase deviance data and used the first data set to predict the effect of phase deviance on perception in the second data set. We found a significantly smaller phase deviance for the perception of the illusion (F1,26 = 3.25, P < 0.05, Supplementary Fig. 1A). The prediction for the effect of phase deviance on perception was not significantly different from the observed effect in the second data set (χ2 = 13.36, P = 0.98), thus indicating a correct prediction. One aspect, however, is that the left posterior middle temporal region (Fig. 4C) and BA21 are in relatively close proximity (6.2 cm), necessitating an exact scrutiny of whether the described effect could be due to volume conduction. Indeed, a correlation between beta power in these 2 regions is highly significant (Supplementary Fig. 1A), even though it has to be emphasized that beta power in BA21 did not significantly differentiate between the perceptual categories. An argument against volume conduction based on plausibility assumptions is that, in the case of spurious synchrony, a maximum could be expected around the seeding region and dropping off linearly in all directions. 
Figure 6 shows that this is clearly not the case, with the peak-phase synchrony effect being well separated from the peak beta power effect. Another plausibility check requires the investigation of the angle of the phase differences (not to be confused with our previously described measure of phase deviance, in which the “raw” single-trial phase differences were subtracted from the mean phase difference). Again, in the case of spurious synchrony based on volume conduction, these should equal 0° or 180°. We performed this analysis for all of the single trials in the aforementioned quartiles and found that the phase difference between BA39 and BA21 was significantly larger in the first quartile than in the fourth, which means that stronger synchrony is associated with audiovisual integration, while the phase lag between the cortical sources is actually larger than in trials with weaker synchrony. If stronger beta power had led to spurious phase synchronization effects (i.e., due to volume conduction), an opposite pattern could have been expected (i.e., the first quartile being closer to 0°). Furthermore, the absolute differences were significantly above 0 and less than π in all 4 quartiles, indicating that the activity measured here in fact represented locally independent source activity (Supplementary Fig. 1B) and that we could indeed reveal genuine synchronization effects. Whereas Monto (2012) recently showed that modulations in the PLV are theoretically independent from modulations in amplitude, one concern with the interpretation of phase locking is that less power in one condition would lead to less stable phase estimates and thus less phase locking. We tested this caveat by comparing the averaged single-trial power values at the cortical sources identified above. We found no significant power differences between the conditions in any of the cortical sources (Supplementary Fig. 2).
However, illusion trials were associated with a trend toward weaker alpha-band power in the middle frontal gyrus (MFG) (t(13) = 1.99, P = 0.068, Supplementary Fig. 2A). Illusion trials were also associated with a trend toward stronger beta-band power in the left BA39 (t(13) = −1.87, P = 0.083, Supplementary Fig. 2B), which is not surprising given that this cortical location was identified based on the beta-band power difference. Note that while the trend-level power difference in the MFG indicates reduced alpha power during illusion, the connectivity effect suggests an increase of phase synchrony to V1. Likewise, for the more interesting effects involving BA39, both increases (to BA21) and decreases (to BA18) of phase synchrony were observed, while no differences in power were observed for the latter 2 regions. This makes trivial SNR differences an unlikely explanation for the connectivity results reported in the manuscript.
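The single-trial logic described above (phase locking across trials, the mean phase lag used as a volume-conduction check, and the proportion of illusion trials per quartile of phase deviance) can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in the study: the function names, the toy data, and the choice to define phase deviance as the absolute wrapped distance from the circular mean lag are assumptions made for illustration only.

```python
import numpy as np

def mean_phase_lag(phase_a, phase_b):
    """Circular mean of the single-trial phase differences (radians)."""
    return np.angle(np.mean(np.exp(1j * (phase_a - phase_b))))

def plv(phase_a, phase_b):
    """Phase-locking value across trials: |mean(exp(i * phase difference))|."""
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

def phase_deviance(phase_a, phase_b):
    """Absolute wrapped distance of each trial's phase difference from the
    mean lag (assumed definition; 0 = trial sits exactly at the mean lag)."""
    centered = (phase_a - phase_b) - mean_phase_lag(phase_a, phase_b)
    return np.abs(np.angle(np.exp(1j * centered)))

def illusion_rate_by_quartile(deviance, illusion):
    """Proportion of illusion trials in each quartile of phase deviance."""
    deviance = np.asarray(deviance, dtype=float)
    illusion = np.asarray(illusion, dtype=float)
    edges = np.quantile(deviance, [0.25, 0.5, 0.75])
    quartile = np.digitize(deviance, edges)  # 0 = smallest deviance, 3 = largest
    return np.array([illusion[quartile == q].mean() for q in range(4)])

# Hypothetical toy data: two sources with a true lag of 0.8 rad plus phase jitter.
rng = np.random.default_rng(0)
n_trials = 400
phase_a = rng.uniform(-np.pi, np.pi, n_trials)
phase_b = phase_a - 0.8 + rng.vonmises(0.0, 2.0, n_trials)
dev = phase_deviance(phase_a, phase_b)
# Assumed toy relationship: illusions become less likely as deviance grows.
illusion = rng.random(n_trials) < (0.7 - 0.4 * dev / np.pi)

print("PLV:", plv(phase_a, phase_b))
print("mean phase lag (rad):", mean_phase_lag(phase_a, phase_b))
print("illusion rate per quartile:", illusion_rate_by_quartile(dev, illusion))
```

In such a sketch, a mean phase lag close to 0 or ±π would be the pattern expected under volume conduction, whereas a lag clearly between these values is more consistent with separate sources.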
In an influential review paper on multisensory integration, Senkowski et al. (2008) raised some of the outstanding open issues in this field, in particular whether source-level connectivity could be successfully employed to gain better insights into the neurophysiological nature of multisensory integration and the functional role of local and interareal synchronization in different frequency bands. The present study aimed to elucidate these issues by relating both cortical activity and functional connectivity with varying perception upon invariant stimulation using MEG. We applied the SIFI as a conceptually simple audiovisual task and asked our subjects to indicate the number of perceived visual stimuli. This allowed us to compare instances of illusory perception with nonillusory trials. The main findings of this investigation are: 1) the perception of the SIFI is associated with elevated evoked activity in the cingulate cortex; 2) increased beta-band power in the left temporal areas before the sound onset precedes the perception of the illusion; and 3) audiovisual integration as seen in the SIFI is characterized by a complex pattern of alpha- and beta-band phase synchrony, particularly by increased connectivity between multisensory and secondary auditory areas.
Electrophysiological studies on the influence of ongoing cortical activity have mostly focused on alpha-band power (Van Dijk et al. 2008; Romei et al. 2010) and phase (Busch et al. 2009; Mathewson et al. 2009) in the visual domain. The present study underscores the role of prestimulus activation patterns in upcoming perception and makes several important additions to the aforementioned studies. First, the previous studies reporting alpha effects focused on a simpler perceived-versus-not-perceived distinction, making arousal fluctuations a difficult factor to exclude. In our study, participants always had a conscious percept, which, however, differed in its quality. We show that, unlike the simpler case of perceived-versus-not-perceived, in which alpha differences are putatively localized in the parieto-occipital cortex as shown by Van Dijk et al. (2008), the qualitative difference of conscious percepts in our experiments originated outside the primary sensory regions in areas likely involved in multisensory integration, namely the left BA39, the junction between the temporal, occipital, and parietal cortices. As noted by Hein and Knight (2008), the function of the multisensory superior temporal sulcus varies depending on the nature of network coactivation. The current findings are in line with this, as audiovisual information was either integrated in the case of an illusory percept or not, depending on the connectivity pattern of this multisensory area. Furthermore, our study is one of the few (e.g., Ploner et al. 2010; Keil et al. 2012) focusing on prestimulus functional connectivity and its role for perception. While Hanslmayr et al. (2007) also reported beta- and gamma-band phase coupling influences next to alpha-band power influences on visual perception, their analysis was performed at the electrode level as opposed to our source-level approach. 
Schoffelen and Gross (2009) have demonstrated that problems of volume conduction can be greatly reduced by the approach we chose in this study. Taken together, while the results of the current study are in line with the previous work from our own group as well as others, the present findings improve significantly upon our current understanding of local and network processes in the prediction of upcoming multisensory integration. In particular, we are able to identify both local and interregional synchronization processes in the beta band to signify a predisposition for multisensory integration. Therefore, the present study aids in advancing some of the most pressing issues in the area of multisensory integration as outlined by Senkowski et al. (2008).
The Perception of the SIFI is Associated with Elevated Evoked Activity in the Cingulate Cortex
Studies analyzing audiovisual multisensory interactions (Shams et al. 2005; Mishra et al. 2007; Cappe et al. 2010; Mishra et al. 2010) have so far mostly focused on the difference in activation between audiovisual stimuli and the sum of auditory and visual stimuli [AV − (A + V)]. These studies show early interactions, as audiovisual stimuli engage distinct configurations of cortical sources. In contrast to this approach, we compared the event-related activity within an identical AV stimulation, which resulted, however, in 2 distinctly different percepts. In this analysis, we found a differentiating cortical response 265–280 ms after sound onset in the cingulate cortex. Assuming an approximate duration of 90 ms for the stimulation, this effect occurred approximately 180 ms after stimulus offset. Whereas earlier reports on the SIFI have also found differences at later latencies (Bhattacharya et al. 2002; Shams et al. 2005), the above-mentioned studies largely focus on the early interaction between auditory and visual stimuli. Bhattacharya et al. (2002) attribute the late increase in gamma-band activity to the generation of a coherent percept and, therefore, the propagation of information to higher-order (e.g., decision making) processes, an interpretation that is shared by Shams et al. (2005) in the discussion of the late effects found in their study. Note that we introduced a 500-ms delay between stimulus onset and the response screen; thus, the present effect is not contaminated by a motor response. Numerous reports (Botvinick et al. 2001; Keil et al. 2010) have associated this area as well as processes in this time interval with cognitive control, performance monitoring, and error processing. The process reported here, however, is unlikely to be related to error processing as it occurs prior to the actual response. 
As the cingulate cortex has been associated with a salience network (Menon and Uddin 2010) involved in monitoring internal and extra-personal events as well as in directing behavior through task maintenance (Dosenbach et al. 2007), it is possible that the current results point to an ongoing monitoring process, especially given the functional connectivity to primary sensory areas as identified by Dosenbach et al. (2007).
Increased Beta-Band Power in the Left Temporal Areas Before the Sound Onset Precedes the Perception of the Illusion
A number of studies have been performed on the comparison between matching and mismatching multimodal information (Senkowski et al. 2008) as well as with regard to variations in the perception of invariant multimodal stimuli (Lange et al. 2011). However, little is known about the predictive value of prestimulus oscillatory activity for multimodal perception. It has been suggested that the brain weighs sources of sensory information according to their assumed reliability when producing a unified percept (Bulkin and Groh 2006). Thus, it should be possible to detect a signature of this top-down weighing mechanism in the prestimulus period and, depending on the current state of top-down processing, to find predictive values for perception in this interval. The recent work of our own group (Keil et al. 2012) has pointed to the important role of beta-band power and phase in the perception of the McGurk effect. Audiovisual fusion, as indexed by the perception of the McGurk illusion, depends on the current local power in the left STG and the state of connectivity between this region and frontal and temporal regions. The current study further supports the importance of local beta-band power in the multimodal areas for the fusion of multimodal information. In line with the work cited above, we found increased beta-band power prior to stimulation in the left BA39. This local power differed significantly between subsequent audiovisual fusion and nonfusion. The source of this effect has also been implicated in multimodal processing in both functional MRI (Calvert et al. 2001; Watkins et al. 2006) and electrophysiological studies (Cappe et al. 2010; Mishra et al. 2010). The correlation of source power values with subsequent behavior also indicated an involvement of BA9 in the middle frontal gyrus. 
Thus, not only does local power in the middle temporal gyrus generally influence upcoming perception, but top-down influences from frontal areas also reflect an interindividual predisposition to the illusion. Based on the effects found poststimulus in superior temporal areas, Mishra et al. (2010) have argued that this locus provides the proximal trigger for perceiving the SIFI. We extend this view and argue that this is particularly true for activity prior to stimulation, which sets the pathways for upcoming perception.
Audiovisual Integration as Seen in the SIFI is Characterized by a Complex Pattern of Alpha- and Beta-Band Phase Synchrony
Empirical evidence suggests that perception involves a widespread neuronal network in addition to activations of sensory and association areas (Koch 2004; Dehaene et al. 2006). It is therefore important to consider the states of functional networks in addition to local cortical activity (Buzsáki 2006; Senkowski et al. 2008). Phase synchronization of oscillatory activity (Lachaux et al. 1999; Varela et al. 2001) has been proposed as a plausible mechanism for accomplishing large-scale integration. Recently, Monto (2012) showed that modulations of PLVs are independent from the modulations of amplitude. To test the involvement of a distributed network influencing the subsequent perception, we analyzed source-space PLVs. In line with a recent report by Hipp et al. (2011), we found significant beta-band phase synchrony in a network comprising temporal and occipital areas. The perception of the illusion becomes more likely depending on the state of prestimulus connectivity between the multimodal area BA39 and auditory and visual areas. Moreover, the single-trial phase difference between BA39 and the auditory area BA21 has a predictive value for upcoming perception in individual participants. The more strongly these sources are coupled in each single trial, the more likely the illusion is to occur. The trend we observed in the single-trial phase difference data indicates that the stronger the interaction between these 2 areas, the stronger the influence of auditory information during multimodal perception becomes. Conversely, the less coupled these sources are in each individual trial, the smaller the influence of auditory information and the less likely the occurrence of the illusion, as the proportion of illusion trials is significantly below the chance level. 
Aside from this integration of a multimodal area with sensory areas, which is characterized by beta-band phase synchrony and in line with the notion of beta activity as a marker of top-down processing (Engel and Fries 2010), we also found modulations in alpha-band phase synchrony. If the illusion is subsequently perceived, the primary auditory cortex is phase locked to the higher visual area BA18. The primary visual cortex, however, is connected to medial frontal and parietal areas, but disconnected from inferior frontal areas. Thus, the influence between auditory and visual areas is increased, as well as the influence between the frontal and visual cortices. Mishra et al. (2010) have argued for the influence of attention on the perception of the SIFI. We argue that not only attention modulates upcoming perception, but also the current network architecture, as indicated by alpha- and beta-band phase synchrony. It can thus be speculated that the alpha and beta networks identified here help solve the multisensory-binding problem as described by Senkowski et al. (2008), in which information must be integrated across different cortical regions and also coordinated across sensory channels. Whereas the alpha-band synchrony could work as a top-down guide to modulate activity in primary sensory areas depending on, for example, attentional constraints, the beta-band synchrony could signal a change in the flow of information between uni- and multisensory areas, and thus the coordination between different cortical regions and the input multisensory areas receive from the unisensory regions. Alpha-band effects were generally reported for the relatively simple distinction of aware versus not aware. This may be well accounted for by spontaneous fluctuations of attention (e.g., Busch et al. 2009), which does not make it less interesting. As mentioned above, in our study, stimulation always led to a percept, even though the content of the percept changed. 
This more complex distinction may involve a larger network, the connectivity of which is expressed in beta-band activity. Even though it has recently been shown that PLV is independent from changes in amplitude (Monto 2012), different SNRs could lead to the different levels of phase locking. We therefore tested for the different power levels at the cortical sources, but found no significant differences. On the other hand, it could be argued that these different levels of signal to noise are a correlate of the distinct brain states giving rise to varying percepts on invariant stimuli.
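The SNR concern raised above can be made concrete with a small toy simulation: two sources share a rhythm with a fixed phase lag, and only the amount of additive noise changes. This is a hedged sketch, not part of the reported analysis; the function name `simulated_plv`, the noise model (complex Gaussian noise added to unit-amplitude phasors), and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulated_plv(noise_sd, n_trials=2000, lag=0.8, seed=0):
    """PLV between two sources with a fixed true lag, estimated from noisy phasors.

    The underlying coupling is identical for every noise level; only the
    additive noise (i.e., the SNR of the phase estimates) changes.
    """
    rng = np.random.default_rng(seed)
    # Trial-specific phase of the shared rhythm.
    common = rng.uniform(-np.pi, np.pi, n_trials)

    def noisy_phase(phase):
        # Unit-amplitude oscillation plus complex Gaussian noise (assumed model).
        noise = noise_sd * (rng.standard_normal(n_trials)
                            + 1j * rng.standard_normal(n_trials))
        return np.angle(np.exp(1j * phase) + noise)

    phase_a = noisy_phase(common)
    phase_b = noisy_phase(common - lag)
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

# Lower SNR (larger noise_sd) yields lower PLV estimates despite identical coupling.
for sd in (0.0, 0.2, 0.5, 1.0):
    print(f"noise_sd={sd:.1f}  PLV={simulated_plv(sd):.3f}")
```

In this toy setting, the PLV estimate decreases as noise increases even though the true phase relation never changes, which is why comparing power at the cortical sources between conditions is a useful control.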
Research on the processing of multisensory information has indicated processing differences between male and female participants (Collignon et al. 2010), which, given that only one male took part in the current study, could influence the current results. However, the aforementioned study only deals with emotional stimuli, whereas the tones and dots used in the present study do not convey emotion. We also reanalyzed our data without the single male subject; this, however, did not change the overall pattern of results. Another line of research deals with gender differences regarding the subjects' susceptibility to illusions based on differences in hemispheric processing (Rasmjou et al. 1999). However, we would not expect any differences regarding the current stimulation based on this work. First, although hemispheric differences in males were found, the rate of illusory perception was still approximately 50% in the nondominant right hemisphere. Second, in contrast to the aforementioned study that presented stimuli either to the left or the right hemisphere, we presented the visual stimuli centrally, instructed the subjects to fixate the central fixation cross, and presented the auditory stimuli with equal loudness to both ears.
The present study, in accordance with the work of our group and other researchers, adds evidence for the importance of prestimulus local and network activity for upcoming perception. For the SIFI, 2 prestimulus features appear to be of great importance: first, relatively enhanced beta power in left temporal multisensory integration regions, and second, the strength of coupling between this region and left secondary auditory regions. More research is needed to address the issue of directionality of the network activity. Moreover, studies involving active external perturbations of the current state, for instance with transcranial magnetic stimulation or transcranial direct or alternating current stimulation, are needed in order to evaluate the role of multimodal cortical areas.
The study was funded by the Deutsche Forschungsgemeinschaft.
Conflict of Interest: None declared.