Speech production is an extremely rapid and seemingly effortless process with speech errors in normal subjects being rare. Although psycholinguistic models incorporate elaborate monitoring mechanisms to prevent and correct errors, the brain regions involved in their commitment, detection, and correction have remained elusive. Using event-related brain potentials in a task known to elicit spoonerisms representing a special class of sound errors, we show specific brain activity prior to the vocalization of such spoonerisms. Source modeling localized this activity to the supplementary motor area in medial frontal cortex. We propose that this activity reflects the simultaneous activation of 2 competing speech plans on processing levels related to the construction of a rather “phonetic” speech plan contrasting with the traditional view, assuming the substitution of abstract phonological representations as the main source for sound errors.
In speaking aloud, we produce up to 150 words/min. The act of speaking thus requires proceeding from the intention of what to say through semantic, syntactic, phonological, and articulatory processing stages within milliseconds (Levelt 1989). The low incidence of speech errors, amounting to no more than about 1 in every 1000 words of normal speech (Leuninger 1993), demonstrates that the production of speech is a highly skilled behavior with low susceptibility to interference. The capability of speakers to detect and correct some of their errors even before they are produced, as suggested by early interruptions of unintended utterances (Levelt 1989; Blackmer and Mitton 1991), speaks for the existence of mechanisms allowing for the self-monitoring of one's own speech production even before articulation.
Therefore, self-monitoring devices are incorporated in virtually all the current speech production models (Motley and others 1983; Dell 1985, 1986; Levelt and others 1999; Postma 2000)—either as a feedback mechanism via the perceptual system or inherently built into the production cycle (Postma 2000). Self-corrections have been reported to occur in about 50% of all speech errors (Nooteboom 1980). In some cases, such corrections include the interruption of the error as early as after the articulation of the first syllable or phoneme. Moreover, hesitations accompanied by “editing terms” (uh) or repetitions of previous words are believed at least in some cases to signal the occurrence of “covert repairs” where the error has been detected even before articulation, but ongoing speech has to be interrupted in order to covertly correct the error (Levelt 1989).
Monitoring of one's overt speech can explain neither fast interruptions nor covert repair phenomena. Rather, a fast “inner monitoring loop” (Levelt 1983) examining the “inner speech” (Dell and Repka 1992) has to be assumed. The representation targeted by this inner loop has been shown to be phonological on the basis of a phoneme monitoring task (Wheeldon and Levelt 1995).
One way to elicit speech errors in normal subjects is the so-called spoonerisms of laboratory-induced predisposition (SLIP) technique (Motley and Baars 1976) (Fig. 1). Spoonerisms are named after Reverend W.A. Spooner of Oxford, who coined some of the famous examples, such as “You have hissed all my mystery lectures.” In the SLIP technique, word pairs are presented visually at a rate of about 1/s with the task to silently read the words for a subsequent memory test. Every few trials, a “target” pair is marked for overt articulation. The production of these target pairs can be influenced by the phonological makeup of the preceding “inductor pairs” such that the initial phonemes of the target words will be exchanged with a probability of about 10%. As an example, the inductor pairs (ball doze), (bash door), and (bean deck) followed by the target pair (darn bore) could give rise to the potential spoonerism “barn door” or partial spoonerisms like “darn door” and “barn bore.” Speech errors in this task are thought to occur because 2 competing speech plans become activated (Baars 1980), and the subject is unable to inhibit the erroneous plan prior to vocalization.
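The relation between a target pair and its possible spoonerisms can be made concrete with a small sketch. It is illustrative only: the first letter of each word stands in for its initial phoneme, and the function names are hypothetical. It does not represent the scoring procedure used in the study, which was based on recorded vocalizations.

```python
# Illustrative sketch: the first letter of each word is used as a crude
# proxy for its initial phoneme (an assumption made for this example only).

def onset(word):
    """Return the first letter as a stand-in for the initial phoneme."""
    return word[0].lower()

def classify_response(target, response):
    """Classify a spoken word pair relative to the target pair.

    target, response: tuples of two words, e.g. ("darn", "bore").
    Only the word onsets are compared; the rest of each word is ignored.
    """
    t1, t2 = onset(target[0]), onset(target[1])
    r1, r2 = onset(response[0]), onset(response[1])
    if (r1, r2) == (t1, t2):
        return "correct"
    if (r1, r2) == (t2, t1):
        # Both initial phonemes exchanged, e.g. "darn bore" -> "barn door".
        return "complete spoonerism"
    if (r1, r2) in ((t1, t1), (t2, t2)):
        # Only one word changed its onset, e.g. "darn door" or "barn bore".
        return "partial spoonerism"
    return "other error"

target_pair = ("darn", "bore")
for spoken in [("darn", "bore"), ("barn", "door"), ("darn", "door"), ("barn", "bore")]:
    print(spoken, "->", classify_response(target_pair, spoken))
```

Because only onsets are compared, a response that keeps both onsets but alters the remainder of a word would be counted as correct here; a realistic scorer would need a phonemic transcription.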
Whereas speech errors have played a crucial role in speech production research, little is known about the underlying brain processes. To gain an initial insight into these mechanisms, event-related brain potentials (ERPs) were recorded in a variant of the SLIP task in native, neurologically healthy speakers of German.
All procedures of this study were cleared by the Institutional Review Board of the University of Magdeburg.
After giving written informed consent, 34 right-handed native speakers of German (age range 20–25 years) participated in a 3-h recording session. Because reasonable numbers of error trials had to be acquired for subsequent ERP analysis, only those 11 participants showing error rates in the range of 6.9–17.5% (average 9.7%) participated in a second recording session otherwise identical to the first one.
Word pairs were presented for 1000 ms in green against a dark gray background on a video monitor. At the chosen viewing distance, they subtended 0.35 degrees of visual angle in height and between 1.5 and 2.2 degrees in width. Each trial comprised the presentation of 2–7 word pairs with a stimulus onset asynchrony of 1100 ms. A row of 3 small pink stars presented during the interstimulus interval indicated the center of the screen. Subjects were instructed to keep their fixation central. A target pair was signaled by a response cue (the German equivalent of “Respond now!” presented in red) occurring 100 ms after the offset of the target pair and staying on the screen for 650 ms. After the presentation of the response cue, the screen remained dark for 1350 ms until the start of the next trial. The subjects' task was to vocalize, as fast as possible, the word pair immediately preceding the response cue. On some additional “memory” trials, a single word from the preceding series of word pairs was presented in red letters, and the subjects' task was to complete the pair by vocalizing the complement. This was done to ensure reading of all word pairs. Error rates for memory trials ranged from 5% to 10%. Each of 2 experimental sessions comprised 20 experimental blocks of 25 trials each. Each block contained 16 “critical” trials, in which the target was preceded by at least 2 matching inductors, 4 “control” trials, in which the target was preceded by 2 nonmatching inductors, and 5 “memory” trials. Thus, a total of 640 “critical,” 160 “control,” and 200 “memory” trials were shown. The subjects' vocalizations were digitally recorded onto a hard disk and classified off-line as 1) complete spoonerisms, 2) partial spoonerisms (only one word with phoneme change), 3) self-corrected trials (vocalizations started with the articulation of a phoneme that would have led to a spoonerism, but were interrupted and continued with the articulation of the correct pair of words), and 4) other errors.
Only trials with errors of type 1) and 2) were entered into the electrophysiological analysis. Other error trials were discarded.
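The trial totals given above follow directly from the block structure. As a quick check, the arithmetic can be written out as follows; the constant and function names are chosen for this illustration only.

```python
# Block structure as described in the text; names are illustrative.
SESSIONS = 2
BLOCKS_PER_SESSION = 20
TRIALS_PER_BLOCK = {"critical": 16, "control": 4, "memory": 5}

def total_trials():
    """Return the number of trials of each type across both sessions."""
    return {
        kind: per_block * BLOCKS_PER_SESSION * SESSIONS
        for kind, per_block in TRIALS_PER_BLOCK.items()
    }

# Each block should add up to the stated 25 trials.
trials_in_block = sum(TRIALS_PER_BLOCK.values())

totals = total_trials()
print(trials_in_block)  # 25
print(totals)           # {'critical': 640, 'control': 160, 'memory': 200}
```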
Recording and Analysis
ERPs were recorded from the scalp using 29 tin electrodes mounted in an electro cap against a reference electrode placed on the left mastoid process. Biosignals were rereferenced off-line to the mean of the activity at the 2 mastoid processes. Blinks and vertical eye movements were monitored with electrodes placed at the sub- and supraorbital ridge of the left eye. Lateral eye movements were monitored by a bipolar montage using 2 electrodes placed on the right and left external canthus. Eye movements were recorded in order to allow for later off-line rejection, which was carried out by a computer program based on an amplitude criterion (75 μV). All electrode impedances were kept below 5 kOhm. Electrophysiological signals were amplified with a band-pass filter of 0.01–50 Hz and digitized at a rate of 250 Hz (4-ms resolution).
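Two of the preprocessing steps described above lend themselves to a brief sketch: rereferencing to the mean of both mastoids and rejecting trials by an amplitude criterion. The sketch assumes that all channels, including the right mastoid, were recorded against the left mastoid, so that the left mastoid signal is zero by construction. It further assumes that the 75 μV criterion applies to the absolute amplitude of the eye channels; the text does not say whether an absolute or a peak-to-peak measure was used. Function names are hypothetical.

```python
def rereference_to_mean_mastoids(channel, right_mastoid):
    """Rereference one channel to the mean of the 2 mastoids.

    Both inputs are lists of samples (in microvolts) recorded against the
    left mastoid. With the left mastoid at zero, the mean of the two
    mastoids equals half the right mastoid signal.
    """
    return [x - 0.5 * m for x, m in zip(channel, right_mastoid)]

def exceeds_amplitude_criterion(eye_channel, threshold_uv=75.0):
    """Return True if any sample exceeds the threshold in absolute value.

    Using the absolute amplitude is an assumption made for this sketch.
    """
    return max(abs(v) for v in eye_channel) > threshold_uv

# Digitizing at 250 Hz corresponds to one sample every 4 ms.
sample_interval_ms = 1000 / 250

print(rereference_to_mean_mastoids([10.0, 20.0], [4.0, -4.0]))
print(exceeds_amplitude_criterion([12.0, -80.0, 5.0]))
```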
ERPs were pooled across the 2 sessions and time locked either to the onset of the target word pair (1024-ms epochs, −100 to 924 ms) or to the vocalization prompt (−100 to 400 ms). Waveforms were quantified by mean amplitude measures that were entered into analyses of variance, with the Huynh–Feldt epsilon correction applied as necessary.
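A mean amplitude measure is simply the average voltage of a waveform within a time window. The following sketch builds the time axis of a target-locked epoch from the stated sampling rate and epoch length and averages a waveform over a window such as 400 to 600 ms. Treating both window borders as inclusive is an assumption, and the function names are hypothetical.

```python
SAMPLING_RATE_HZ = 250
SAMPLE_INTERVAL_MS = 1000 // SAMPLING_RATE_HZ  # 4 ms per sample

def epoch_times(start_ms, duration_ms):
    """Return the sample times (in ms) of an epoch starting at start_ms."""
    n_samples = duration_ms // SAMPLE_INTERVAL_MS
    return [start_ms + i * SAMPLE_INTERVAL_MS for i in range(n_samples)]

def mean_amplitude(waveform, times, window_start_ms, window_end_ms):
    """Average the waveform over all samples inside the window.

    Both window borders are treated as inclusive (an assumption).
    """
    values = [
        v for v, t in zip(waveform, times)
        if window_start_ms <= t <= window_end_ms
    ]
    return sum(values) / len(values)

# Target-locked epoch: 1024 ms starting 100 ms before stimulus onset.
times = epoch_times(-100, 1024)
print(len(times), times[0], times[-1])
```

With these settings the epoch holds 256 samples, the last of which is taken at 920 ms and covers the interval up to 924 ms.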
Neural generators of the brain activity associated with speech errors were estimated using 2 methods. First, employing brain electric source analysis software (BESA2000, Scherg and others 1999), multiple stationary dipoles located within a 3-shell homogeneous spherical head model with correction factors for brain, skull, and scalp conductivity were used to model the group average difference wave (error minus correct) potential. The dipole solutions were generated by iteratively changing the location and/or orientation of dipoles to yield a least-squares best fit to the ERP surface signal. This solution was projected onto a canonical average brain magnetic resonance image as provided by the Montreal Neurological Institute. Alternatively, the cortical 3-dimensional distribution of current density was computed using the low resolution electromagnetic tomography (LORETA) algorithm (Pascual-Marqui and others 1994), which solves the inverse problem by assuming related orientations and strengths of neighboring neuronal sources without assuming a specific number of generating sources. The “smoothest” of all possible activity distributions is thereby obtained. The version of LORETA employed here (Pizzagalli and others 2002) uses a 3-shell spherical head model registered to standardized stereotactic space (Talairach and Tournoux 1988) and projected onto the Montreal Neurological Institute standard average brain. Computations were restricted to cortical gray matter and hippocampi (spatial resolution of 7 mm, 2394 voxels) as described elsewhere (Pizzagalli and others 2002).
Full and partial spoonerisms occurred in 9.95% (standard deviation [SD] 4.6) of the critical word pairs and 4.0% (SD 1.7) of the control word pairs (t10 = 4.8, P < 0.001), indicating that the experimental manipulation had been successful. The percentage of all speech errors, that is, spoonerisms (full and partial) and other miscellaneous types of errors, was similarly enhanced for the critical pairs (14.1% vs. 8.0%, t10 = 4.14, P < 0.002). Of the spoonerisms, 58% (SD 18) were full spoonerisms with a high variability between subjects (range 20–88% full spoonerisms).
Self-corrections were rare and did not differ between critical and control trials (0.29% vs. 0.56%, t10 = 0.89).
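The degrees of freedom reported above (t10) are consistent with within-subject comparisons across the 11 participants. Assuming that paired t-tests were used, the statistic can be sketched as follows. The function name is hypothetical, and the numbers in the usage example are made up for illustration; they are not data from the study.

```python
import math

def paired_t(condition_a, condition_b):
    """Return the paired t statistic and its degrees of freedom.

    condition_a, condition_b: per-subject values in the same subject order.
    """
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the per-subject differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_value = mean_diff / math.sqrt(variance / n)
    return t_value, n - 1

# Made-up error rates (%) for 3 hypothetical subjects, for illustration only.
t_value, df = paired_t([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
print(round(t_value, 3), df)
```

With 11 subjects, the same function returns 10 degrees of freedom, matching the notation used above.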
In the period prior to the vocalization prompt, brain potentials to the critical trials in which a spoonerism occurred showed an increased negativity between 350 and 600 ms after the onset of the target pair relative to control trials and critical trials without speech errors (Fig. 1). A mean amplitude measure in the 400 to 600-ms time window (6 frontocentral electrode sites) yielded a main effect of trial type (F2,20 = 8.44, P < 0.01). Post hoc tests showed that the error trials differed significantly from both the control trials and the critical trials without errors. The maximum of this increased negativity for error trials was over frontocentral portions of the scalp (Fig. 2A).
To pinpoint the possible underlying neural generators of this effect, 2 different inverse source localization methods, based either on multiple stationary point dipoles (Scherg and others 1999) or on distributed sources (Pascual-Marqui and others 1994; Pizzagalli and others 2002), were used (Fig. 2). In spite of their different assumptions and limitations (Phillips and others 2002), both methods identified a medial frontal generator in (or near) the supplementary motor area (SMA, LORETA coordinates: x = −3, y = −4, z = 57) as the main source of the negativity preceding the erroneous vocalizations. In addition, a secondary left anterior temporal source was found by both techniques (LORETA coordinates: x = −59, y = −18, z = −13, middle temporal gyrus [MTG]). The 2 dipole solution found with BESA explained 93% of the variance in the 400 to 600-ms period.
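The proportion of variance explained by a dipole model is commonly expressed as 1 minus the ratio of the squared residual to the squared measured signal, summed over electrodes and time points. The sketch below implements this common definition; whether the software used here computes exactly this quantity is an assumption, and the function name is hypothetical.

```python
def explained_variance(measured, modeled):
    """Return the fraction of signal power accounted for by the model.

    measured, modeled: flat lists of voltages covering the same electrodes
    and time points. The residual is normalized by the power of the
    measured signal (a common convention, assumed here).
    """
    residual = sum((m - p) ** 2 for m, p in zip(measured, modeled))
    total = sum(m ** 2 for m in measured)
    return 1.0 - residual / total

# A model that reproduces the data perfectly explains all of the variance.
print(explained_variance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Under this definition, a value of 0.93 means that the residual signal carries 7% of the power of the measured difference wave in the fitted interval.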
Brain potentials time locked to the vocalization prompt (Fig. 3A,B) again showed an increased negativity for the error trials. This difference led to a main effect of trial type in the 50 to 150-ms (F2,20 = 8.02, P < 0.01) and 230 to 300-ms time windows (F2,20 = 6.72, P < 0.02; 6 frontocentral electrodes) with post hoc analyses indicating that the error trials differed significantly from both the control trials and the critical trials without errors. Difference potentials were obtained by subtracting the activity in the control trials from the activity of the other 2 trial types. Only the error trials were associated with a negative potential. A source solution computed for the error minus correct difference wave at 250 ms using the LORETA method revealed a mesial frontal generator implicating the SMA.
The experimental manipulation in the present study successfully induced spoonerisms that were preceded by increased negativities following 1) the presentation of a target word pair and 2) the presentation of the vocalization prompt. In both cases, a similar frontocentral scalp distribution was observed.
The main generator of both effects, as revealed by 2 independent source localization methods, was located in medial frontal cortex (SMA). Given the spatial resolution of source localization methods, it is not possible, however, to completely rule out the anterior cingulate region as a locus of the effect. Indeed, the hemodynamic activations reported in error monitoring and conflict studies are not strictly restricted to the anterior cingulate cortex (ACC) region and usually extend to adjacent areas like the pre-SMA and SMA proper.
Electrical SMA stimulation in awake epileptic patients leads to speech arrest or involuntary vocalizations of simple consonant-vowel sequences (like “da-da-da” or “ta-ta-ta”; Brickner 1940; Erickson and Woolsey 1951; Penfield and Welch 1951; Penfield and Jasper 1954; Chauvel 1976; Woolsey and others 1979; Dinner and Lüders 1995). Likewise, damage to the SMA has been associated with involuntary vocalizations (Jonas 1981; Ackermann, Daum, and others 1996), acquired dysfluencies (Ackermann, Hertrich, and others 1996), reduced spontaneous verbal communication, and speech arrest (Krainik and others 2003). These clinical observations fit with the identification of the SMA among the areas most likely involved in phonetic encoding and articulation by Indefrey and Levelt (2004) within a thorough meta-analysis of brain imaging studies of speech production. Enhanced activity within a subregion of the SMA has also recently been associated with higher demands imposed on phonetic encoding during the production of long nonwords compared with the production of words and short nonwords (Alario and others 2006).
In other task domains, activation of the SMA has been associated with response conflict (Carter and others 1998; Hazeltine and others 2000; Liotti and others 2000; MacDonald and others 2000; Ullsperger and von Cramon 2001; Ridderinkhof and others 2004; Yeung and others 2004).
Likewise, it has been proposed that the error-related negativity (ERN), a component reported to arise after the execution of an erroneous response (Gehring and others 1995; Falkenstein and others 2000), might not reflect the output of a feedforward control mechanism (Bernstein and others 1995) but the degree of conflict between 2 coactivated motor channels (Botvinick and others 2001; Yeung and others 2004). Consistent with the SMA source reported here, the ERN has been located within the anterior cingulate cortex/SMA region (Dehaene and others 1994; Luu and Tucker 2001).
Given the evidence just presented, enhanced SMA activity preceding articulations of sound errors in our data would be in line with the assumption of conflicts arising at a processing level related to the phonetic encoding or articulatory planning of speech output. At the same time, these data are also compatible with a role of the SMA in speech production comparable with its function in other domains of motor behavior (see e.g., Crosson and others 2001; Ziegler 2002; Krainik and others 2003; Indefrey and Levelt 2004).
Most prominent models of speech production assume sound errors like the spoonerisms elicited within the SLIP paradigm to result from misallocations of abstract phonological representations within a “prosodic” frame (Shattuck-Hufnagel 1983; Dell 1986; Levelt and others 1999; Berg 2005) on an antecedent level of processing. This view is based on the observation that phonemes constitute the linguistic unit mostly affected in sound errors. Meyer (1992), for example, estimates that 60–90% of all errors can be identified as single-segment misorderings, whereas probably less than 5% of all sound errors can be identified as feature errors. The often observed accommodation of shifted segments to their new position (Fromkin 1971; Garrett 1975; Stemberger 1982, 1983) leading to phonotactic and articulatory well-formedness of such errors has been taken to suggest that the production of sound errors otherwise does not differ from the production of correct utterances.
This view has been challenged by acoustic, electromyographic, and kinematic analyses of speech errors, suggesting that sound errors can also affect the articulatory stage of speech production (Mowrey and MacKay 1990; Pouplier and Hardcastle 2005). Pouplier (submitted), for example, shows that many errors produced within the SLIP paradigm feature the coproduction of the intended and an intruding gesture.
This finding suggests that at least the articulation of some sound errors is preceded by a conflict between competing representations of articulatory gestures in agreement with the SMA activation as the main source for the negativities preceding the production of spoonerisms within the present study.
Yet, it is unclear why such conflict should not arise in the context of phonological priming. Because critical pairs are followed by spoonerisms at a higher rate than control pairs, one would expect conflicts to be more likely during the production of critical word pairs. However, no corresponding brain potential difference was obtained between critical and control pairs followed by correct articulations (Figs 1 and 3). This suggests that in these trials interference from the inductor pairs either did not occur, had been effectively controlled, or did not reach the stage of phonetic encoding.
Interestingly, the current data showed increased activity within the SMA not only after presentation of the target word pair but also immediately after the presentation of the vocalization prompt that was followed by the production of the speech error.
In this sense, the first negativity arising after the presentation of a target word pair is probably reflecting conflict at a phonological/phonetic encoding stage, whereas the negativity observed directly after the presentation of the vocalization prompt might be indexing conflict at a following articulatory motor stage.
In agreement with this, the second negativity differs from the first also in terms of its neural generators, in this particular case, restricted only to the medial prefrontal source.
The source analysis of the first negativity following the presentation of a target word pair also revealed a secondary left temporal source. The anterior left MTG is normally considered to play a role in the retrieval of lexical rather than phonological representations (Indefrey and Levelt 2004). Therefore, it cannot be easily related to the production of sound errors. Given the spatial resolution of source localization methods, it is conceivable that this source instead reflects activity within the adjacent superior temporal gyrus, a structure suggested by Indefrey and Levelt (2004) to participate not only in the processing of the perceived speech of others but also in external and internal self-monitoring of one's own speech (see also Callan and others 2006). Although the time course of the negativity is in line with the possibility of self-monitoring 300 ms after onset of a visual word, as can be derived from the analysis of Indefrey and Levelt (2004), the low rate with which spoonerisms are corrected within the SLIP paradigm even when subjects are instructed to do so (Nooteboom 2005a, 2005b) speaks against a role of the temporal source in the internal detection of errors. Indeed, spoonerisms as the main class of errors produced within the SLIP paradigm are probably hard to detect for the internal self-monitoring system as they constitute correct entries from the mental lexicon. Moreover, within the SLIP paradigm, no context information is available to the speaker by which the appropriateness of an utterance could be assessed.
If an error monitoring account of the temporal activation is unlikely, what could be an alternative explanation for this source? As demonstrated by Wilshire (1998) using a tongue twister task, the lexical status of items to be produced has a strong influence on positional constraints of sound errors and especially the preferential tendency for interactions like anticipations, perseverations, and exchanges to occur between word initial phonemes. Sound exchanges between word initial segments of real words, as they occur in spoonerisms, may therefore be correlated with higher levels of activation of the respective entries from the lexicon, which in turn could explain the MTG activity (cf. Indefrey and Levelt 2004). Activation changes of lexical representations might likewise be influenced by “phonological coactivation” of potential spoonerisms. Phonological coactivation has been proposed to result from direct (Dell 1985, 1986) or indirect (Roelofs 2004) positive feedback from phonological segments of target words to lexical entries of potential errors that share a sufficient number of phonological segments with the target. Because phonological coactivation is by definition restricted to lexical entries representing potential sound errors of real words, it was proposed to be the reason for the “lexical bias” (Dell 1985, 1986), the statistical tendency of sound errors to form real words. In contrast, the “error monitoring account” of the lexical bias assumes that lexical spoonerisms are just harder to detect for the internal self-monitoring system (Levelt and others 1999).
Although the current data show differential brain activity preceding slips of the tongue, it has to be kept in mind that our elicitation method induced errors, which occur very late in the speech production process. Other “earlier” types, such as conceptual (e.g., “We start in the middle with—in the middle of the paper with a blue disc.”), syntactic (e.g., “And when they chew coca, which they chew coca all the day long.”), or lexical (“Left of purple is—uh—of white is purple.”) errors (Postma 2000) might engage different brain regions and will require different methods of elicitation.
We are grateful to Dr Stefan Dilger for sharing his materials. We thank Drs Jane Banfield, Arie van der Lugt, and Niels Schiller for their comments. This work was supported by grants from the Deutsche Forschungsgemeinschaft to TFM, the Spanish Ministry of Science to ARF, the Dutch Science Organization Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) to BMJ, and the German Ministry of Science Bundesministerium für Bildung und Forschung (BMBF) to the Center for Advanced Imaging, Magdeburg. Conflict of Interest: None declared.