Humans are assumed to predict the sensory consequences of their own actions by means of forward models that enable discrimination between self-produced and external sensory signals. Here we tested whether responses in the human auditory cortex would differ to self-triggered versus externally triggered tones. The responses were recorded with a whole-scalp neuromagnetometer from 12 healthy subjects who either themselves triggered a tone by pressing a button once every 5 s or passively listened to externally triggered tones, presented in an identical sound sequence. Sources of the auditory N100m responses, peaking ∼90 ms after sound onset in the supratemporal auditory cortex, were significantly weaker to self-triggered than to externally triggered sounds (suppressions 24 ± 7% and 18 ± 4% in the left and right hemispheres, respectively). These results support the existence of a forward model that predicts the auditory consequences of the subject's own motor acts on the environment — even with a tool — and thereby enables discrimination between self-produced and external sounds.
A part of the sensory input to the human brain results from the subject's own acts. Therefore, to be able to extract important features from the mixture of self-produced and external sensory information, and to avoid misattribution of the agent of an act, the subject has to discriminate accurately and continuously between self-produced and real external stimuli. One solution to this problem is provided by a predictive forward model that is assumed to internally represent transformations from motor commands to their sensory consequences, and then to compare these predictions with the actual consequences of the actions (Wolpert et al., 1995; Wolpert and Ghahramani, 2000). The forward model thereby predicts the sensory consequences of one's own actions so that more processing capacity can be allocated to external stimuli (Blakemore et al., 2000b; Blakemore and Decety, 2001). A role similar to the forward model has previously been attributed to efferent neural signals, ‘efference copies’ (von Holst and Mittelstaedt, 1950) and ‘corollary discharges’ (Sperry, 1950), which are assumed to be created by central motor networks in parallel with the motor commands and to be used to predict the sensory consequences of the subject's own motor acts.
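The comparator logic described above can be illustrated with a toy sketch. This is not a model from the text: the motor-to-sensory mapping, the loudness units, and the tolerance threshold are all invented for demonstration; only the compare-prediction-to-input structure follows the forward-model account.

```python
# Toy sketch of a forward-model comparator (illustrative only; the
# mapping and threshold below are invented, not taken from the study).

def forward_model(motor_command):
    """Predict the sensory consequence of a motor command.
    Here the 'consequence' is simply an expected loudness value."""
    return 2.0 * motor_command  # hypothetical motor-to-sensory mapping

def attribute_agency(motor_command, observed, tolerance=0.1):
    """Compare the predicted and actual sensory input; a small
    prediction error suggests a self-produced stimulus."""
    error = abs(observed - forward_model(motor_command))
    return "self-produced" if error <= tolerance else "external"
```

In this caricature, a stimulus whose features match the efference-copy prediction is attributed to the subject's own action, and the mismatch case is attributed to the environment.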
Evidence for motor-to-auditory inhibition exists both in monkeys and humans. In monkeys, ∼50% of call-responsive neurons in the auditory cortex are inhibited during vocalization (Müller-Preuss and Ploog, 1981), and the magnetoencephalographic (MEG) responses from the human auditory cortex are smaller to self-uttered speech sounds than to the same utterances replayed to the subject (Numminen et al., 1999; Curio et al., 2000). Thus the brain seems to predict vocalization-related auditory input, a capability important for, for example, self-monitoring of speech (Curio et al., 2000).
Besides speech, simple hand and finger movements — often accompanied by tool use — are everyday producers of various sounds. Similar sounds can, however, also result from the actions of other people, and it is therefore necessary to determine the agent of a sound correctly. Schafer and Marcus (1973) demonstrated that human electroencephalographic vertex responses (‘N1’, peaking at 85–100 ms, as well as the later peaks up to 350 ms) were significantly smaller to self-administered auditory stimuli than to identical machine-delivered stimuli. However, in that study the generation sites of the evoked potentials could not be determined, and the brain areas in which the effects took place therefore remained unresolved.
Utilizing the selectivity of MEG recordings for activity in the supratemporal auditory cortex, we aimed to find out whether auditory cortical responses to self-triggered and externally triggered sounds differ in healthy humans. The subjects pressed a button once every 5 s, thereby creating ∼60 short tones within one session; the same sequence was played back in the subsequent session. We expected forward-model mechanisms to result in sensory prediction and subsequent motor-to-auditory inhibition, reflected as suppressed auditory-cortex responses to self-triggered sounds.
Materials and Methods
We studied 12 healthy volunteers (3 females, 9 males; aged 20–38 years, mean 27); all had normal hearing and no history of neurological or otological disorders. Nine subjects were right-handed, two ambidextrous, and one left-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971). Informed consent was obtained from each subject after full explanation of the study. The experimental procedure was approved by the Hospital District of Helsinki and Uusimaa Ethics Committee.
The experiment consisted of three different conditions. In the motor–auditory (MA) condition the subjects pressed a plastic button with the right index finger at a self-paced rate of about once every 5 s; the subjects were instructed to avoid counting the intervals. The key press triggered a 50 ms, 1000 Hz sinusoid, which was transformed into a binaural tone burst 65 dB above the individual sensation level, i.e. ∼70–85 dB sound pressure level (SPL). The tones were presented to the subjects through plastic tubes and earpieces tightly fitted into the ear canals. The subjects were asked simply to listen to the sounds, which arrived at their ears within a few milliseconds of the button press. The timing signals of the key presses were monitored on-line on a computer screen during the whole experiment, and were recorded to produce a trigger sequence for the later sessions.
In the auditory (A) condition, the subjects listened to an ‘external’ sound sequence created using their own trigger sequence in the MA condition. Thus the A and MA sound sequences were identical both acoustically and in their timing.
In the motor (M) condition, the subjects pressed the key in a similar fashion as in the MA condition, but no sounds were delivered. This sequence was also recorded and later compared with the pressing performance in the MA condition.
In all subjects, the three conditions were repeated once to study the reproducibility of the responses. The order of the conditions was varied between the first and second runs in every subject and between different subjects using all three possible alternatives: MA–A–M, MA–M–A, M–MA–A. The MA condition, obviously, always had to precede the replay, i.e. the A condition.
The cortical responses were recorded with a 306-channel whole-scalp neuromagnetometer (Vectorview, Neuromag Ltd, Helsinki) in a magnetically shielded room. For the analysis, we used signals from the 204 planar gradiometers that measured the two orthogonal derivatives of the radial magnetic field at 102 locations outside the head. These planar gradiometers detect the largest signal just above a local cerebral current source. The subject, seated in a reclined chair with the head resting on the helmet-shaped inner vault of the device, was instructed to fixate a point straight ahead and to minimize blinking and head movements. Four head-position indicator coils were attached to the scalp, and their locations with respect to anatomical landmarks were measured with a three-dimensional digitizer. During the MEG recording, weak currents were led into these coils, and the resulting magnetic fields were measured with the sensor helmet to find the head location with respect to the sensors. This information was used in the alignment of the MEG and magnetic resonance imaging (MRI) coordinate systems. A vertical electro-oculogram (EOG) was recorded to exclude epochs contaminated by blinks and eye movements.
The recorded neuromagnetic signals were bandpass filtered at 0.03–200 Hz and digitized at 609 Hz. For each condition, 60 responses were averaged online.
Sources of the MEG signals were modeled as previously described (Hämäläinen et al., 1993). The averaged responses were first digitally low-pass filtered at 40 Hz. A 200 ms prestimulus baseline was applied for amplitude measurements. The MEG waveforms obtained in the two measurement runs were averaged together.
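The baseline and averaging steps above can be sketched in a few lines. This is a minimal illustration assuming epochs stored as plain sample lists; the function names are hypothetical, and only the sampling rate (609 Hz) and the 200 ms prestimulus baseline come from the text.

```python
# Minimal sketch of baseline correction and response averaging
# (illustrative; function and variable names are hypothetical).

FS = 609.0          # sampling rate from the text, Hz
BASELINE_S = 0.200  # 200 ms prestimulus baseline

def baseline_correct(epoch, fs=FS, baseline_s=BASELINE_S):
    """Subtract the mean of the prestimulus baseline from one epoch."""
    n_base = int(round(fs * baseline_s))  # ~122 samples at 609 Hz
    offset = sum(epoch[:n_base]) / n_base
    return [x - offset for x in epoch]

def average_epochs(epochs):
    """Average equal-length epochs sample by sample, as when the two
    measurement runs (or the 60 single responses) are combined."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]
```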
We focused on the most prominent evoked response deflection, N100m, which usually peaks ∼100 ms after the sound onset (Hari et al., 1980; Hari, 1990). During clearly dipolar field patterns, the sources of N100m were modeled as equivalent current dipoles (ECDs), one in each hemisphere. Spherical head models were optimized on the basis of magnetic resonance images that were available for 9 of the 12 subjects. The ECDs that best explained the most dominant signals were determined by a least-squares search, based on data of 20–30 channels at areas that included the local signal maximum. An 80% goodness-of-fit threshold was used in selecting the ECDs. Finally, the analysis period was extended to the whole measurement time and to all channels by using a two-dipole model in which the strengths of the previously found two ECDs were allowed to vary as a function of time while their locations and orientations were kept fixed (Hämäläinen et al., 1993). The dipoles determined for the A condition were also used to model the responses of the other two conditions.
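The 80% goodness-of-fit criterion used to accept ECDs is conventionally defined as the fraction of measured field variance explained by the modeled field. A minimal sketch under that assumption (the exact formula used in the analysis software is not given in the text):

```python
def goodness_of_fit(measured, modeled):
    """Fraction of measured field variance explained by the dipole
    model: g = 1 - sum((b - b_hat)^2) / sum(b^2)."""
    resid = sum((b - bh) ** 2 for b, bh in zip(measured, modeled))
    power = sum(b ** 2 for b in measured)
    return 1.0 - resid / power

def accept_ecd(measured, modeled, threshold=0.80):
    """Apply the 80% goodness-of-fit acceptance criterion."""
    return goodness_of_fit(measured, modeled) >= threshold
```

Here `measured` and `modeled` would hold the signals of the 20–30 channels around the local field maximum at the N100m peak.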
The N100m peak latencies and strengths were determined from the source waveforms. In the M condition, the source strengths were measured at the latency of N100m in the MA condition; this procedure was used to quantify the possible motor contamination, as is explained in more detail below.
The two-tailed Student's t-test and the binomial test were used for statistical analysis.
Results
Key Pressing Performance
The mean ± SEM (across eight subjects) intervals from the beginning of a key press (trigger on) to the next were practically the same in the MA and M conditions: 4.9 ± 0.3 and 4.9 ± 0.5 s, respectively.
Example of Individual Responses
Figure 1 shows for Subject 1 the sources of the auditory N100m response superimposed on his magnetic resonance image, as well as the corresponding source waveforms; N100m peaked ∼90 ms after sound onset. The current dipoles were located bilaterally in the supratemporal auditory cortices, in good agreement with previous studies (Hari et al., 1980; Hari, 1990; Pantev et al., 1990). N100m followed the tone onset of both self-triggered (MA) and externally triggered (A) sounds but it was, in both hemispheres, weaker in the MA than in the A condition. The peak latencies did not differ between the conditions.
To rule out the possibility that the suppression of N100m in the MA condition was due to contamination from brain signals related to the hand movement per se, we examined the M condition: the gray line in Figure 1 illustrates the strength of the auditory-cortex source as a function of time in this condition. The trace shows no N100m-type deflection, but a small shift of opposite polarity to N100m occurs in both hemispheres. Therefore, in subsequent analyses the M-condition signals were taken into account to obtain conservative estimates of the N100m suppression in the MA condition.
The rationale for this procedure is that even very local source currents produce magnetic fields that can be detected by a large number of sensors. Thus the finger presses could, in principle, produce signals also in those sensors that were used here to model the activity of the auditory cortex. Although the major part of the movement-related cortical activity can often be modeled by sources in the precentral motor cortex, some signals could still contaminate the modeling of the auditory response even if the motor-cortex contribution were extracted from the data. In contrast, the present procedure takes the movement-related contamination into account irrespective of the spatial distribution of the signals. Thus our corrected results provide conservative estimates of the response suppression in the MA condition.
Responses in MA versus A Condition
Adequate dipole models were found for the N100m response in both hemispheres of 8 of the 12 subjects, in the right hemisphere only in one subject, and in the left hemisphere only in two subjects. Figure 2 compares the source strengths of N100m to self-triggered (MA) versus externally triggered (A) sounds in these 11 individuals (10 for the left hemisphere and 9 for the right). The responses were weaker in the MA than in the A condition in 9 of the 10 subjects in the left hemisphere (P = 0.011) and in all 9 subjects in the right hemisphere (P < 0.002). The N100m suppression was on average 39.7 ± 7.2% in the left hemisphere (27.0 ± 3.8 nAm versus 43.9 ± 3.0 nAm; P < 0.001) and 27.0 ± 4.4% in the right hemisphere (45.3 ± 4.4 nAm versus 62.3 ± 5.6 nAm; P < 0.001). At this stage, before the potential motor contamination was taken into account, the suppression (MA < A) was stronger in the left than in the right hemisphere (P < 0.01).
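The reported subject-count P values are consistent with a one-sided binomial (sign) test on the number of subjects showing MA < A. A minimal sketch, assuming that test (the text names only "the binomial test"):

```python
from math import comb

def sign_test_p(k, n):
    """One-sided binomial probability of observing at least k of n
    subjects with MA < A under the null hypothesis that suppression
    and enhancement are equally likely."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 9 of 10 left-hemisphere subjects: P ~ 0.011
# 9 of 9 right-hemisphere subjects: P ~ 0.002
```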
The N100m peak latencies did not differ between A and MA conditions in either hemisphere (91 ± 2 ms for A and 92 ± 2 ms for MA in the left hemisphere; 89 ± 1 ms for A and 89 ± 2 ms for MA in the right hemisphere).
Ruling Out Motor Contamination
To ensure that the response decrease during MA cannot be accounted for by signals that are related to motor activity as such, the sources in the auditory cortex were used to explain the signal patterns during the M condition, as was explained above for Figure 1.
In Figure 3, the third row from the top shows that the source strengths during the M condition differed statistically significantly from zero in both hemispheres, with current polarities opposite to those of N100m (−6.4 ± 0.7 nAm, P < 0.001 in the left hemisphere, and −5.3 ± 1.1 nAm, P < 0.005 in the right). These signals were slightly stronger in the left than the right hemisphere (P = 0.05), probably because all subjects used the right hand for the key presses. It is also evident from Figure 3 (bottom white bars) that this possible contamination from movement-related magnetic fields does not explain the observed suppression of auditory responses during the MA condition: responses are statistically significantly larger during A than MA even when the opposite signals during M are taken into account.
In other words, the conservative estimate for the response suppression during the MA condition is (A − MA) + M. This value was significantly greater than zero in both the left hemisphere (suppression 23.8 ± 7.2%, P = 0.002) and the right hemisphere (suppression 17.9 ± 4.1%, P < 0.005), with no statistically significant difference between the hemispheres.
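The conservative estimate can be sketched numerically from the group-mean source strengths reported above. Note the hedge in the comment: the study averaged per-subject percentages, so this group-level computation only approximates the reported 23.8% and 17.9%.

```python
def conservative_suppression_pct(a, ma, m):
    """Conservative suppression estimate ((A - MA) + M) / A in percent.
    The M-condition source strength is negative (opposite polarity to
    N100m), so adding it shrinks the estimated suppression."""
    return 100.0 * ((a - ma) + m) / a

# Group-mean source strengths (nAm) from the Results; the study averages
# per-subject percentages, so these group-level values only approximate
# the reported 23.8% (left) and 17.9% (right).
left = conservative_suppression_pct(a=43.9, ma=27.0, m=-6.4)   # ~23.9%
right = conservative_suppression_pct(a=62.3, ma=45.3, m=-5.3)  # ~18.8%
```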
Discussion
In the present study, responses of the human auditory cortex were smaller to self-triggered than to externally triggered sounds, and the effect was clear at the peak of the response, ∼90 ms after the sound onset. This finding extends previous MEG results that found smaller auditory-cortex responses to self-produced utterances than to the same utterances replayed to the subject (Curio et al., 2000; Houde et al., 2002). Similarly, activity in the temporal lobes, measured by positron emission tomography, was weaker during self-produced than externally elicited sounds (Blakemore et al., 1998).
One explanation for such results is the existence of a motor-to-sensory forward model (Blakemore et al., 2000b), which here would result in motor-to-auditory inhibition. Such a forward model could play a key role in differentiating self-produced from external stimuli even beyond the overlearned speech-to-hearing association, i.e. when subjects work with their hands and use tools, thereby producing ‘non-biological’ sounds by their own actions. Here the prediction from the forward model needs to be less accurate than during the detailed monitoring of the subject's own speech. In the vertex-potential study of Schafer and Marcus (1973), the suppression of responses to self-triggered sounds persisted even when the sounds followed the key presses with delays of up to a few seconds.
Forward models may be essential for attribution of agency. Accordingly, in psychotic patients who suffer from delusions of control and/or auditory hallucinations, the percepts are not attenuated to self-produced versus externally produced tactile stimuli; this is in contrast to findings in healthy control subjects and in psychotic patients without such symptoms (Blakemore et al., 2000a). Curio et al. (2000) suggested that the suppression of responses to own speech utterances might be disturbed in schizophrenic subjects, and such results have already been reported (Ford et al., 2001). MEG recordings similar to those of the present study could provide a suitable and simple objective test for such patients, and thereby add to the understanding of the mechanisms underlying auditory hallucinations and delusions of control.
Recent intracellular recordings of auditory neurons in the singing cricket (Gryllus bimaculatus) showed modulation by efferent signals from the singing motor network (Poulet and Hedwig, 2002). Moreover, in the auditory cortex of the marmoset, self-initiated vocalizations suppress neural discharges in the majority of neurons, probably via an efference-copy mechanism (Eliades and Wang, 2003). Similar neuronal mechanisms in humans might cause the suppression of auditory responses to both self-produced utterances and other self-produced sounds.
The peak latencies of the auditory-cortex responses did not differ between self-triggered and externally triggered sounds, whereas in the previous electroencephalographic study (Schafer and Marcus, 1973) the latencies were 5% shorter to self-produced than to machine-produced sounds. This discrepancy is most likely due to the different generation mechanisms of the responses quantified in the two studies: the electric vertex potential ‘N1’ has contributors other than the auditory-cortex activations, whereas only the latter are reflected in the N100m responses recorded in the present study (cf. Hari et al., 1982; Hari, 1990).
This study was financially supported by the Academy of Finland and the Sigrid Jusélius Foundation. We thank Hanna Renvall for comments on the manuscript.
Brain Research Unit, Low Temperature Laboratory, Helsinki University of Technology, PO Box 2200, FIN-02015 HUT, Espoo, Finland and Department of Clinical Neurophysiology, Helsinki University Central Hospital, FIN-00290 Helsinki, Finland