Music performance is an extremely rapid process with a low incidence of errors, even at the fast production rates required. This is possible only because of the fast functioning of the self-monitoring system. Surprisingly, no specific data on error monitoring have been published in the music domain. The present study therefore investigated the electrophysiological correlates of executive control mechanisms, in particular error detection, during piano performance. Our aim was to extend previous research on the human action-monitoring system by selecting a highly skilled multimodal task. Pianists had to retrieve memorized pieces of music at a fast tempo in the presence or absence of auditory feedback. Our main interest was the interplay between auditory and sensorimotor information in the processes triggered by an erroneous action, considering only wrong pitches as errors. We found that a negative component is elicited in the event-related potentials around 70 ms prior to errors and is generated by the anterior cingulate cortex. Interestingly, this component was independent of the auditory feedback. However, the auditory information did modulate the processing of errors after their execution, as reflected in a larger error positivity (Pe). Our data are interpreted within the context of feedforward models and auditory–motor coupling.
Music performance entails tight control of motor programs, which has to be fine-tuned through auditory feedback. This implies that executive control mechanisms must be in effect both during the acquisition of musical skills and during performance at a high, professional level (Münte et al. 2002; Zatorre et al. 2007). Surprisingly, however, such mechanisms have not been studied exhaustively in relation to music.
The human action-monitoring system has attracted increasing interest since the early 1990s (Falkenstein et al. 1990; Gehring et al. 1993, 1995; Ullsperger and von Cramon 2001; Rodriguez-Fornells et al. 2002; Kerns et al. 2004). A seminal finding was a negative deflection in the event-related potentials (ERPs), termed the error-related negativity (ERN) or error negativity (Ne), which peaks about 100 ms after the onset of electromyographic activation of the incorrect response agonist, or about 70 ms after the incorrect key press. Its neural generators are located in the anterior cingulate cortex (ACC), the presupplementary motor area (pre-SMA), and the SMA (Dehaene et al. 1994; Carter et al. 1998). Recently, the nucleus accumbens has been shown to be involved in action monitoring, with error-related activity occurring even 40 ms before the scalp ERN (Münte et al. 2008). The ERN has been hypothesized to reflect error-detection processes (Holroyd and Coles 2002) or conflict monitoring (Cohen et al. 2000; Botvinick et al. 2001). The first account assumes that the ERN indexes the error signal of a feedforward control mechanism (Bernstein 1967; Bernstein et al. 1995). This assumption rests on the short latency of the ERN, which makes it implausible that the slow sensory and proprioceptive loops generate such a fast error signal (Wolpert et al. 1995). To the best of our knowledge, apart from the work of Möller et al. (2007), in which slips of the tongue were experimentally induced, no published electrophysiological data have demonstrated that the ERN can be elicited before error onset in highly skilled motor tasks. After the ERN, an error positivity (Pe) with a parietal maximum is elicited between 200 and 500 ms; it reflects subjective, conscious error recognition (Falkenstein et al. 1990; Nieuwenhuis et al. 2001; Van Veen and Carter 2002).
A key issue is whether the ERN is related only to action monitoring or also to the emotional outcomes of action monitoring (Luu and Tucker 2004). Evidence for the latter comes from several studies reporting affective influences on the amplitude of the ERN (Luu et al. 2000a, b; Vidal et al. 2000) and on fMRI activation when errors are committed (Kiehl et al. 2000; Menon et al. 2001; Garavan et al. 2002). More specifically, the affective or emotional significance of errors has been proposed to be reflected in activation of the rostral ACC after erroneous responses (Luu et al. 2003; Taylor et al. 2006). Other studies have implicated the more general affective appraisal network (rostral ACC, insula, and amygdala) in the emotional processing of errors (Menon et al. 2001; Garavan et al. 2003; Polli et al. 2008).
The present study investigated the ERPs associated with error detection in humans during the performance of piano sequences which had to be retrieved from memory at fast tempi. In order to tease out the contributions of the auditory and somatosensory information in error detection, our experimental paradigm consisted of 3 parts: an audiomotor condition (AM), a motor condition (M), and a purely auditory condition (A).
Regarding the importance of auditory feedback in music making, Lashley argued as early as 1951 that the fast production rates in piano performance prove that motor control does not rely on slow auditory feedback. Later findings showed that auditory feedback is essential when learning new musical pieces; once the pieces are learned, however, their retrieval from memory is independent of the presence or absence of auditory feedback (Repp 1999; Finney and Palmer 2003). This is not the case for string instruments, in which the lack of auditory feedback (playing without the bow) severely distorts pitch accuracy (Chen et al. 2008).
Music and speech production are time-based sequential behaviors that require planning by means of a memory representation to prepare events for production (Pfordresher and Palmer 2006). This agrees with the ideas of Lashley (1951), who suggested that, to achieve the temporal precision of fast piano performance, pianists would have to prepare in advance not only for the production of the current event but also for the surrounding events. Recently, a model of the simultaneous co-activation of the current event and its surrounding context in a fast sequence has been proposed and validated empirically in an experiment on piano performance (Pfordresher et al. 2007).
Taking the previous findings into account, our main hypotheses were as follows. First, the presence or absence of auditory feedback while pianists played the musical pieces would not modulate the ERN or the error rate. Second, according to models in which the upcoming nearby positions of a fast sequence are prepared in advance (Lashley 1951; Pfordresher et al. 2007), we expected the subjects to anticipate several notes during motor preparation; in the case of an upcoming error, the action-monitoring system could therefore trigger the ERN even before note onset. Third, the auditory feedback of errors in AM would have a stronger impact on the subjective awareness of errors than in M, yielding a larger Pe. Finally, in the auditory condition, the auditory feedback alone would give rise to a feedback-locked error-related negativity (f-ERN; Miltner et al. 1997; Badgaiyan and Posner 1998; Nieuwenhuis et al. 2002) in the brain responses associated with pitch errors.
Materials and Methods
Nineteen healthy pianists (8 females, age range 20–29 years, mean 22 years), who were students at or had graduated from the University of Music and Drama of Hanover, participated in this study. All participants were professional pianists. Eighteen participants were right-handed and one was left-handed, according to the Edinburgh inventory (Oldfield 1971). All participants reported normal hearing. All subjects gave informed consent to participate in the study, which had been approved by the local Ethics Committee of Hanover. Because of equipment malfunction, one subject was excluded, leaving 18 subjects for the analysis.
Initially, we selected sequences from the right-hand parts of Preludes V, VI, and X of the Well-Tempered Clavier (Part 1) by J. S. Bach and the Piano Sonata No. 52 in E-flat major by J. Haydn. These pieces were chosen because their right-hand parts contain mostly single pitches of the same value (duration), 16th notes, which made our stimulus material homogeneous. The stimuli were 6 sequences extracted from this material (Fig. 1). In piece 5, which was adapted from Bach's Prelude X, we replaced one chord of the original score with a single pitch and one pair of eighth notes with a group of four 16th notes; all stimuli constituted complete musical phrases. The sequences contained 200, 201, 202, 185, 192, and 192 notes, so the stimulus material consisted of 1172 notes in total. The tempo of each piece was chosen so that the interonset interval (IOI, the time between the onsets of 2 subsequent notes) was 125 ms (8 tones/s) in all cases. This fast performance rate was chosen to induce errors. The pieces lasted around 25 s. The last 16–24 notes (last bar) of each sequence were not analyzed, because the ritardando (slowing down) at the cadence changes the tempo and, consequently, the IOI. Most pieces were familiar to all pianists; nevertheless, they were instructed to rehearse and memorize them before the experimental session. We stressed the importance of memorizing the pieces at the corresponding tempo with the help of a metronome. Pianists were further advised to rehearse the pieces before the experiment both with and without auditory feedback and without visually tracking their fingers. At the experimental session, they had to perform all pieces correctly in tempo and pitch without the score. This was the prerequisite for starting the electroencephalography (EEG) recording.
Participants were seated at a digital piano (Wersi Digital Piano CT2, Halsenbach, Germany) in a dimly lit room. They sat comfortably in an armchair with the left forearm resting on the left armrest. The right forearm was supported by a movable armrest attached to a sled-type device that allowed effortless movements of the right hand along the keyboard (see Supplementary Figure). The keyboard and the participant's right hand were covered with a board to prevent visual tracking of hand and finger movements. Instructions were displayed on a TV monitor (visual angle 4°) located above the piano. Before the experiment, we tested whether each pianist was able to perform all musical sequences according to the score and at the desired tempo. Participants were instructed to perform the pieces each time from beginning to end without stopping to correct errors. Playing the correct notes and maintaining accurate timing were stressed. Pianists were unaware of our interest in error-monitoring processes.
The experimental design consisted of 3 conditions (AM, M, A), each comprising 60 trials (around 11 700 notes). The order of the conditions was randomized, with the constraint that AM preceded A, because the performances recorded in AM were played back in A. The 60 trials were randomly drawn from the 6 stimulus sequences. Participants initiated each trial by pressing the left pedal of the MIDI (Musical Instrument Digital Interface) keyboard. In both AM and M, participants played musical stimuli 1–6 from memory, without the score. The only difference between the 2 conditions was that the volume of the MIDI keyboard was set to zero in M, eliminating the auditory feedback. Each trial proceeded as follows. The pianists pressed the left pedal when they were ready. After a randomized silent interval of 500 ± 500 ms, the first 2 bars of the score were presented on the monitor for 4000 ms to indicate which of the 6 sequences was to be played. To control the timing of each piece, we used a synchronization–continuation paradigm: 2500 ms after the onset of the visual cue, a metronome paced the tempo of the piece for 1500 ms and then faded out (after 4 beats at 120 bpm or 5 beats at 160 bpm, depending on the sequence; see Fig. 1). The visual cue vanished together with the last metronome beat. Participants were instructed not to play while the score was displayed, but to wait until a green ellipse appeared on the monitor, 100 ms after the metronome and the visual cue had vanished.
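The stimulus arithmetic can be checked with a few lines of Python. This is only a sanity check of the reported numbers (the 6 sequence lengths and the 125 ms IOI); the nominal durations assume strict tempo and ignore the final ritardando, so they slightly underestimate real performance times.

```python
# Sequence lengths of the 6 stimuli, as reported in the text.
NOTES_PER_SEQUENCE = [200, 201, 202, 185, 192, 192]
IOI_MS = 125  # target interonset interval in milliseconds


def notes_per_second(ioi_ms: float) -> float:
    """Production rate implied by a constant interonset interval."""
    return 1000.0 / ioi_ms


def nominal_duration_s(n_notes: int, ioi_ms: float) -> float:
    """Approximate duration of a sequence played strictly in tempo.

    Ignores the final ritardando, so real performances run slightly longer.
    """
    return n_notes * ioi_ms / 1000.0


total_notes = sum(NOTES_PER_SEQUENCE)
rate = notes_per_second(IOI_MS)
durations = [nominal_duration_s(n, IOI_MS) for n in NOTES_PER_SEQUENCE]

print(total_notes)  # 1172
print(rate)         # 8.0
print(durations)    # [25.0, 25.125, 25.25, 23.125, 24.0, 24.0]
```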
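The trial timeline can be sketched as follows, with all times in milliseconds relative to the pedal press. The uniform draw for the 500 ± 500 ms interval and the names `TrialTimeline` and `build_trial` are our own assumptions. The sketch shows that both metronome settings (4 beats at 120 bpm, 5 beats at 160 bpm) span exactly 1500 ms, so the score disappears 4000 ms after it appears.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class TrialTimeline:
    cue_onset_ms: float         # score excerpt appears
    beat_times_ms: List[float]  # metronome clicks
    cue_offset_ms: float        # score excerpt disappears
    go_signal_ms: float         # green ellipse: start playing


def build_trial(bpm: float, n_beats: int, rng: random.Random) -> TrialTimeline:
    """Event times (ms) relative to the pedal press that starts a trial."""
    # Assumption: the 500 +/- 500 ms silent interval is drawn uniformly.
    cue_onset = rng.uniform(0.0, 1000.0)
    metronome_onset = cue_onset + 2500.0
    beat_interval = 60_000.0 / bpm
    beats = [metronome_onset + k * beat_interval for k in range(n_beats)]
    cue_offset = beats[-1]          # the score vanishes with the last beat
    go_signal = cue_offset + 100.0  # go signal 100 ms later
    return TrialTimeline(cue_onset, beats, cue_offset, go_signal)


rng = random.Random(0)
for bpm, n_beats in [(120, 4), (160, 5)]:
    trial = build_trial(bpm, n_beats, rng)
    span = trial.beat_times_ms[-1] - trial.beat_times_ms[0]
    visible = trial.cue_offset_ms - trial.cue_onset_ms
    print(bpm, round(span, 6), round(visible, 6))  # span 1500.0, visible 4000.0
```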
In A, the pianists listened through loudspeakers to their performances recorded in AM. The volume level was adjusted to their preferences.
Each of the 3 conditions lasted approximately 40 min, during which the pianists produced (or listened to) around 11 700 notes.
EEG Recordings and Preprocessing
Continuous EEG signals were recorded from 35 electrodes placed over the scalp according to the extended 10–20 system (FP1,2, AF7,8, F7,8, F3,4, FT7,8, FC3,4, T7,8, C3,4, TP7,8, CP3,4, P7,8, P3,4, PO7,8, O1,2, AFz, Fz, FCz, Cz, CPz, Pz, and POz), referenced to linked mastoids. The electrooculogram was additionally recorded to monitor blinks and eye movements. Impedances were kept below 5 kΩ. Data were sampled at 500 Hz with an upper cutoff of 100 Hz (software by NeuroScan, Inc., Herndon, VA). Visual trigger stimuli, note onsets, and metronome beats were automatically marked in the continuous EEG file. Performance was additionally recorded as MIDI files using a standard MIDI sequencer program. We used the EEGLAB MATLAB toolbox (Delorme and Makeig 2004) for visualization and filtering. A high-pass filter at 0.5 Hz was applied to remove linear trends, and a notch filter at 50 Hz (49–51 Hz) to eliminate power-line noise. Artifacts such as blinks and eye movements were removed with wavelet-enhanced independent component analysis (wICA; Castellanos and Makarov 2006), after the ICA components had first been computed with the FastICA algorithm (Hyvärinen and Oja 2000). Standard ICA is commonly used to obtain the statistically independent components of raw EEG signals: the user rejects the components that contain artifacts, and the remaining components are projected back into signal space to obtain artifact-free EEG signals. However, rejecting ICA components has been shown to cause a loss of neural activity, because the rejected components do not contain only artifacts; they also contain cerebral activity. This can affect the data analysis and lead to spurious results (Wallstrom et al. 2004; Castellanos and Makarov 2006).
Wavelet-enhanced ICA was designed to reduce the leakage of cerebral activity into rejected components by separating the background neural activity from the isolated artifacts within each ICA component. This is achieved by wavelet thresholding, applied as an intermediate step to the demixed independent components. Wavelet thresholding removes artifacts on the basis of their specific time–frequency properties and leaves the background neural activity “untouched”. Because this procedure can be performed automatically by filtering all independent components rendered by ICA, wICA does not require laborious visual inspection of every component and is therefore faster. After applying wICA, we visually inspected the data and eliminated epochs still containing muscle artifacts.
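As an illustration of the wavelet-thresholding idea behind wICA, the sketch below uses a plain multi-level Haar transform in NumPy. It is not the implementation of Castellanos and Makarov (2006): the Haar wavelet, the number of levels, and the universal threshold σ√(2 ln N), with σ estimated from the finest-scale median absolute deviation, are all assumptions, and the unmixing matrix is assumed to come from an external FastICA run. Large coefficients are treated as artifact and subtracted; small ones are left untouched as background activity.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT; len(x) must be divisible by 2**levels."""
    approx = np.asarray(x, dtype=float)
    if len(approx) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    x = approx
    for d in reversed(details):
        out = np.empty(2 * len(x))
        out[0::2] = (x + d) / np.sqrt(2.0)
        out[1::2] = (x - d) / np.sqrt(2.0)
        x = out
    return x


def wica_component(ic, levels=4):
    """Split one independent component into (cleaned, artifact_estimate)."""
    approx, details = haar_dwt(ic, levels)
    # Assumed threshold rule: universal threshold with a robust noise estimate.
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(ic)))

    def keep_large(c):
        # Supra-threshold coefficients are taken as artifact.
        return np.where(np.abs(c) > thr, c, 0.0)

    artifact = haar_idwt(keep_large(approx), [keep_large(d) for d in details])
    return ic - artifact, artifact


def wica(eeg, unmixing, levels=4):
    """eeg: channels x samples; unmixing: components x channels (e.g. FastICA)."""
    sources = unmixing @ eeg
    cleaned = np.vstack([wica_component(s, levels)[0] for s in sources])
    return np.linalg.pinv(unmixing) @ cleaned


# Toy component: white "background" plus a large blink-like bump.
rng = np.random.default_rng(0)
n = 4096
background = rng.normal(0.0, 1.0, n)
samples = np.arange(n)
blink = 20.0 * np.exp(-0.5 * ((samples - n // 2) / 50.0) ** 2)
cleaned, artifact = wica_component(background + blink)
```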
The data epochs representing single experimental trials time-locked to the onset of the isolated errors (see Data Analysis) and isolated correct notes were extracted from −300 ms to 500 ms, resulting in approximately n = 50−120 artifact-free epochs for errors and n = 500 artifact-free epochs for correct notes per participant.
An error-detection algorithm developed in MATLAB compared each MIDI performance with the pitch content of a template (the score). Following Finney and Palmer (2003), all errors that systematically appeared in at least 7 out of 10 trials of the same sequence, and which could be related to a learning error, were removed from the analysis. Runs of consecutive pitch errors were also removed. Further, only isolated pitch errors, preceded and followed by 3 correct notes, entered the analysis; isolated correct notes were selected according to the same criterion. Two additional constraints were applied to all preselected errors and correct notes to ensure their temporal precision and to avoid overlapping brain responses. First, the interval between MIDI note-on and note-off could not exceed 150 ms. Second, the IOIs before and after the note had to lie between 100 and 300 ms. We did not set a stricter IOI criterion for errors because it would have yielded too few isolated errors for a reliable EEG analysis; furthermore, in the case of posterror slowing, the IOI after errors would necessarily exceed 125 ms. Because there were thousands of notes with correct pitch, the IOI constraint for correct notes was tightened to a minimum of 120 ms and a maximum of 130 ms. This last criterion achieved 2 goals: 1) trials of correct pitches generating brain responses related to timing errors were excluded; and 2) the number of correct trials was reduced from several thousand for further analysis.
We performed the following types of data analysis. First, standard time averaging was used to analyze the ERPs of the brain responses triggered by actions leading to pitch errors (wrong note played) as compared with actions leading to correct pitches. ERPs were derived by averaging the raw epochs for each subject and condition, and the result was baseline-corrected using the interval from 300 to 150 ms prior to the onset of correct notes or errors. The short interstimulus intervals (ISI) of 125 ms between consecutive notes, imposed on the pianists to elicit pitch errors in an ecologically valid paradigm, are realistic for highly skilled music performance. However, short ISIs produce overlapping ERP components of neighboring events (Woldorff 1993). Consequently, as a second analysis, we used a coarse-graining method, symbolic resonance analysis (SRA), to disentangle possibly overlapping brain responses (beim Graben and Kurths 2003) and to validate the ERP analysis. SRA has been shown to detect ERP differences between conditions that cannot be revealed by the traditional voltage average, even though differences in processing are theoretically expected (Frisch and beim Graben 2005; beim Graben et al. 2007). Furthermore, by increasing the signal-to-noise ratio (SNR), this method performs well with small numbers of trials, as in our experiment. Finally, SRA is able to disentangle different contributions to the EEG when the intervals between stimuli are short, as in our case.
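A simplified Python sketch of the note-selection rules, assuming the MIDI performance has already been aligned note by note with the score (the alignment step itself is not shown). `PlayedNote`, `systematic_error_positions`, and `select_isolated` are hypothetical names, and applying the IOI window only to the intervals immediately before and after the target note is our reading of the criterion.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set, Tuple


@dataclass
class PlayedNote:
    score_index: int  # aligned score position (alignment assumed done)
    correct: bool     # True if the played pitch matches the score
    onset_ms: float   # MIDI note-on time
    offset_ms: float  # MIDI note-off time


def systematic_error_positions(trials: Sequence[List[PlayedNote]],
                               min_trials: int = 7) -> Set[int]:
    """Score positions that were wrong in at least `min_trials` trials."""
    counts = Counter()
    for notes in trials:
        counts.update({n.score_index for n in notes if not n.correct})
    return {pos for pos, c in counts.items() if c >= min_trials}


def select_isolated(notes: List[PlayedNote], want_error: bool,
                    ioi_range: Tuple[float, float],
                    excluded: FrozenSet[int] = frozenset(),
                    context: int = 3, max_dur_ms: float = 150.0) -> List[int]:
    """Indices of isolated errors (or correct notes) passing the timing checks."""
    lo, hi = ioi_range
    picked = []
    for i in range(context, len(notes) - context):
        note = notes[i]
        if (not note.correct) != want_error or note.score_index in excluded:
            continue
        neighbours = notes[i - context:i] + notes[i + 1:i + 1 + context]
        if not all(m.correct for m in neighbours):
            continue  # this also discards runs of consecutive errors
        if note.offset_ms - note.onset_ms > max_dur_ms:
            continue
        ioi_before = note.onset_ms - notes[i - 1].onset_ms
        ioi_after = notes[i + 1].onset_ms - note.onset_ms
        if lo <= ioi_before <= hi and lo <= ioi_after <= hi:
            picked.append(i)
    return picked


# 20 notes in strict tempo with a single wrong pitch at position 10.
notes = [PlayedNote(k, k != 10, 125.0 * k, 125.0 * k + 100.0) for k in range(20)]
print(select_isolated(notes, True, (100.0, 300.0)))   # [10]
print(select_isolated(notes, False, (120.0, 130.0)))  # [3, 4, 5, 6, 14, 15, 16]
```

For errors the IOI window would be `(100.0, 300.0)` and for correct notes `(120.0, 130.0)`, with `excluded` set to the positions returned by `systematic_error_positions` for the same sequence.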
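The averaging and baseline correction can be written compactly with NumPy. The sketch assumes epochs stored as trials × channels × samples at 500 Hz, spanning −300 to 500 ms (400 samples), with the baseline taken from −300 to −150 ms as described; the function names are hypothetical.

```python
import numpy as np

FS = 500.0             # sampling rate (Hz)
EPOCH_START_MS = -300  # epochs run from -300 to 500 ms around note onset


def ms_to_sample(t_ms, fs=FS, start_ms=EPOCH_START_MS):
    """Sample index of latency t_ms within an epoch."""
    return int(round((t_ms - start_ms) * fs / 1000.0))


def erp(epochs, baseline_ms=(-300, -150)):
    """Average epochs (trials x channels x samples) and subtract the baseline mean."""
    average = epochs.mean(axis=0)
    b0, b1 = ms_to_sample(baseline_ms[0]), ms_to_sample(baseline_ms[1])
    return average - average[:, b0:b1].mean(axis=1, keepdims=True)


def difference_wave(error_epochs, correct_epochs):
    """Error-minus-correct ERP for one subject and condition."""
    return erp(error_epochs) - erp(correct_epochs)
```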
SRA is an analytic technique for ERPs that exploits the properties of stochastic resonance in threshold systems (Moss et al. 1994). It was inspired by Lehmann (1971), who proposed considering only the positive and negative maximal field values of the EEG voltages; SRA provides a theoretical foundation for such a coarse-graining technique (beim Graben and Kurths 2003). The method maps the EEG time series of single trials onto sequences of 3 symbols using varying encoding thresholds θ, which are voltage levels. More specifically, each sampled value is mapped onto “0” if it is below −θ, onto “2” if it is above +θ, and onto “1” if it lies in between. SRA thus exploits the inherent noise of the EEG to drive the underlying ERP components across the thresholds. From the grand epoch ensemble (including the epochs of all subjects) of 3-symbol sequences, a histogram of the 3-symbol statistics is computed at each time point, giving the relative frequencies of above- and below-threshold crossing events (P2, P0) and of noncrossing events (P1) at each sampling point. The Reversi transformation, which exploits the competition between the mean fields M0 = P0 − P1 and M2 = P2 − P1, flips the “undecided” symbol 1 into a 0 (if there are more 0s than 2s) or into a 2 (if there are more 2s than 0s). The 3-symbol distribution is thereby transformed into a 2-symbol distribution (0, below-threshold events; 2, above-threshold events). This procedure is computed for different encoding thresholds. For example, positive ERP deflections will be associated with a higher probability of the symbol 2 at the optimal encoding threshold.
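A minimal NumPy sketch of the 3-symbol encoding and the Reversi step, assuming the grand epoch ensemble of one ROI is stacked as an epochs × samples array. Ties (equal numbers of 0s and 2s at a time point) are not covered by the description, so resolving them toward 0 is our assumption; the function names are hypothetical.

```python
import numpy as np


def encode(epochs, theta):
    """3-symbol coding: 0 below -theta, 2 above +theta, 1 in between."""
    symbols = np.ones(epochs.shape, dtype=np.int8)
    symbols[epochs < -theta] = 0
    symbols[epochs > theta] = 2
    return symbols


def symbol_probabilities(symbols):
    """Relative frequencies (P0, P1, P2) at each time point across epochs."""
    return tuple((symbols == s).mean(axis=0) for s in (0, 1, 2))


def reversi(symbols):
    """Flip each undecided 1 into the symbol that dominates at that time point."""
    p0, p1, p2 = symbol_probabilities(symbols)
    m0, m2 = p0 - p1, p2 - p1         # mean fields; m2 > m0 exactly when p2 > p0
    winner = np.where(m2 > m0, 2, 0)  # assumption: ties resolved toward 0
    return np.where(symbols == 1, winner, symbols).astype(np.int8)


epochs = np.array([[5.0, -5.0, 0.0],
                   [5.0, 0.0, 0.0],
                   [0.0, -5.0, 5.0]])  # 3 epochs x 3 samples, in microvolts
binary = reversi(encode(epochs, theta=1.0))
print(binary.tolist())  # [[2, 0, 2], [2, 0, 2], [2, 0, 2]]
```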
The cylinder entropies generally decrease within the time range of an ERP. In beim Graben (2001), it was demonstrated that the entropy averaged across the time interval from t_on to t_off of the ERP
In sum, we implemented the SRA algorithm to obtain the largest between-conditions difference in the SNR curves as follows: 1) we selected encoding thresholds (voltage levels) between 0 and 10 μV in steps of 0.1 μV; 2) for each encoding threshold and condition, we computed the grand epoch ensemble (GEE) of 3-symbol sequences, pooling the symbolic sequences of all subjects; 3) we applied the Reversi transformation to obtain binary sequences for each condition; 4) we computed the cylinder entropies; and 5) we integrated them within a time window of interest to obtain the SNR value for each condition and threshold. The SNR curves of the GEE for each condition were then plotted against the encoding thresholds. The optimal threshold, θ#, is the threshold value that maximizes the SNR difference between conditions. The largest between-conditions difference in SNR is thus associated with the optimized amplitude of the difference ERP waveforms and can be related to the maximal separation of the dynamics.
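Steps 4 and 5 and the choice of θ# can be sketched as below, taking binary (0/2) ensembles from the Reversi step as input. The word length of the cylinder sets is not specified here, so `word_len=3` is an assumption. Because the exact entropy-to-SNR formula of beim Graben (2001) is not reproduced, `window_order` uses the negative mean entropy in the window only as a stand-in score (lower entropy means more ordered symbol dynamics); all names are hypothetical.

```python
import numpy as np


def cylinder_entropy(binary, word_len=3):
    """Shannon entropy (bits) of the words s_t ... s_{t+L-1} across epochs.

    binary: epochs x samples array of symbols 0/2 (output of the Reversi step).
    Returns one entropy value per word start t (samples - word_len + 1 values).
    """
    bits = (np.asarray(binary) == 2).astype(np.int64)
    n_epochs, n_samples = bits.shape
    n_words = n_samples - word_len + 1
    codes = np.zeros((n_epochs, n_words), dtype=np.int64)
    for k in range(word_len):  # encode each word as an integer
        codes = codes * 2 + bits[:, k:k + n_words]
    entropy = np.empty(n_words)
    for t in range(n_words):
        counts = np.bincount(codes[:, t], minlength=2 ** word_len)
        p = counts[counts > 0] / n_epochs
        entropy[t] = -(p * np.log2(p)).sum()
    return entropy


def window_order(entropy, t0, t1):
    """Negative mean entropy in samples [t0, t1); stand-in for the SNR (assumption)."""
    return -float(np.mean(entropy[t0:t1]))


def optimal_threshold(thresholds, score_a, score_b):
    """Threshold at which the two conditions' score curves differ most."""
    gap = np.abs(np.asarray(score_a) - np.asarray(score_b))
    return thresholds[int(np.argmax(gap))]


# Encoding thresholds from 0 to 10 uV in steps of 0.1 uV.
thresholds = np.round(np.arange(0.0, 10.0 + 1e-9, 0.1), 1)
```

In a full run, `score_a` and `score_b` would hold one `window_order` value per entry of `thresholds`, each computed from the Reversi output of the errors and correct-note ensembles at that threshold.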
From the MIDI files, we extracted the time between note onsets (IOI) and the loudness of each note (the so-called MIDI velocity). Timing in each playing condition was characterized by the mean IOI and the mean standard deviation of the IOI (mSD-IOI); the latter has previously been reported to be a precise indicator of pianists' motor control (Jabusch et al. 2004). The mean IOI indicated how well the pianists adjusted to the given tempo (125 ms between 2 consecutive note onsets). In addition, we computed the mean overall loudness (mean velocity) of correct notes and of errors in AM and M separately, which indicates whether pianists pressed the keys with different force depending on the presence or absence of auditory feedback. To investigate whether errors were played with a different loudness than correct notes at the same position in the score, we calculated the difference between the loudness of each error and the average loudness of the matching correct notes. Again, this analysis was performed for AM and M separately.
To assess statistical differences in the ERPs, the ERP waveforms were first averaged for each subject and condition across the electrodes of each cluster defined below. Next, for each time point from −200 to 500 ms, the averaged values were analyzed by means of synchronized permutations for a 3 × 2 (Condition × Event type) design (Good 2005). The 3 levels of the factor condition were AM, M, and A; the 2 levels of the factor event type were correct and wrong note. Synchronized permutations are based on the nonparametric pairwise permutation test (Good 2005) and are recommended for obtaining exact tests of hypotheses when multiple factors are involved. They are generated, for instance, by exchanging elements between rows in one column and duplicating these exchanges in all other columns. Synchronized permutations thus provide a clear separation of main effects and interactions.
Selected electrode sites were pooled into 6 topographical clusters (see below), and synchronized permutations were computed in each. Differences were considered significant if P < 0.05. Significance levels for multiple comparisons on the same data pool were obtained by a Bonferroni correction of the 0.05 level.
Six clusters of surface EEG channels were selected on the basis of a priori anatomical and physiological knowledge (Gerloff et al. 1998; Stemmer et al. 2004; Eichele et al. 2008). Following standard notation, we refer to these clusters as regions of interest (ROIs). We chose electrodes covering the lateral premotor cortex and SM1 bilaterally (left: FC3, C3, CP3; right: FC4, C4, CP4) and the mesial frontocentral cortex, including the pre-SMA and SMA (FCz, Cz, CPz). Additionally, electrodes over bilateral prefrontal regions were selected (left: FP1, AF1, F3, F7; right: FP2, AF2, F4, F8) because of the role of the prefrontal cortex in maintaining motivation and effort in tasks requiring retrieval from memory (Eichele et al. 2008). Finally, midline parietal electrodes were pooled to constitute the sixth ROI (CPz, Pz, POz). This selection was based on evidence that parietal regions might be involved in the generation of the ERN (Stemmer et al. 2004) and that they display the maximal activity related to the Pe (Falkenstein et al. 1990; Nieuwenhuis et al. 2001). For the topographic analyses, the Bonferroni-corrected threshold was thus 0.05/6 ≈ 0.0083. All results of the ROI analysis refer to clusters of surface electrodes; although surface activity is certainly related to the underlying neurophysiological sources, we cannot claim a one-to-one correspondence between surface electrodes and intracranial sources.
In the case of a significant interaction between the factors condition and event type, univariate analyses were performed with a nonparametric pairwise permutation test (Good 2005). As stated above, significance levels for multiple comparisons on the same data pool were Bonferroni-corrected.
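The timing and loudness measures can be computed from MIDI onsets and velocities roughly as follows. Pooling all IOIs for the mean, and using the sample SD (ddof=1) within each trial for mSD-IOI, are our assumptions about details the text does not specify; the function names are hypothetical.

```python
import numpy as np


def timing_summary(trial_onsets_ms):
    """Return (mean IOI, mSD-IOI) in ms for a list of per-trial onset arrays."""
    per_trial = [np.diff(np.asarray(t, dtype=float)) for t in trial_onsets_ms]
    mean_ioi = float(np.mean(np.concatenate(per_trial)))          # pooled IOIs (assumption)
    msd_ioi = float(np.mean([x.std(ddof=1) for x in per_trial]))  # mean within-trial SD
    return mean_ioi, msd_ioi


def mean_velocity(velocities):
    """Mean MIDI velocity, used as an index of overall loudness."""
    return float(np.mean(velocities))


def loudness_difference(error_velocity, matching_correct_velocities):
    """Error velocity minus mean velocity of correct notes at the same score position."""
    return float(error_velocity - np.mean(matching_correct_velocities))


mean_ioi, msd_ioi = timing_summary([[0, 125, 250, 375], [0, 120, 250, 370]])
print(round(mean_ioi, 2), round(msd_ioi, 2))  # 124.17 2.89
print(loudness_difference(68, [75, 76, 74]))  # -7.0
```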
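A minimal Monte Carlo sketch of one synchronized permutation test, for the main effect of condition in a subjects × conditions × event types array. Within each subject, condition labels are shuffled and the same shuffle is applied to both event types, so the event-type structure travels with the data. This is a simplified illustration under our own choice of test statistic (the spread of the condition marginal means), not Good's (2005) full procedure for all main effects and the interaction.

```python
import numpy as np


def condition_statistic(data):
    """Spread of the condition marginal means; data is subjects x conditions x events."""
    marginal = data.mean(axis=(0, 2))
    return float(((marginal - marginal.mean()) ** 2).sum())


def synchronized_condition_test(data, n_perm=5000, seed=0):
    """Monte Carlo p-value for the main effect of condition.

    For every subject the condition labels are shuffled, and the same shuffle
    is applied to all event types ("synchronized"), leaving the event-type
    structure intact.
    """
    rng = np.random.default_rng(seed)
    n_subj, n_cond, _ = data.shape
    observed = condition_statistic(data)
    rows = np.arange(n_subj)[:, None]
    hits = 0
    for _ in range(n_perm):
        perms = np.array([rng.permutation(n_cond) for _ in range(n_subj)])
        shuffled = data[rows, perms, :]  # shuffled[s, c, e] = data[s, perms[s, c], e]
        if condition_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Applied per time point and ROI, `data` would have shape (18, 3, 2) for 18 subjects, the conditions AM, M, and A, and the event types correct and wrong note.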
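For paired data (one value per subject and level), the pairwise permutation test can be sketched by randomly flipping the sign of each subject's difference, which is exchangeable under the null hypothesis. The number of permutations and the seed are arbitrary choices, and `ALPHA` shows the Bonferroni-corrected level for 6 ROIs; the names are hypothetical.

```python
import numpy as np


def paired_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on the mean of a - b (one pair per subject)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    # Under H0 each subject's difference is equally likely to have either sign.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    return (int(np.sum(null >= observed)) + 1) / (n_perm + 1)


N_ROIS = 6
ALPHA = 0.05 / N_ROIS  # Bonferroni-corrected level, about 0.0083
```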
The statistical reliability of the SRA can be assessed with a permutation test by 1) generating M = 5000 replicas of the GEE of errors and correct notes, 2) exchanging in each replica around half of the binary epochs randomly between the GEE of errors and correct notes (Beim Graben et al. 2005). We evaluated for each replica the test statistics:
For our selection of 6 ROIs the standard 0.05 significance level was again corrected to 0.0083.
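A sketch of the SRA permutation procedure, leaving the test statistic as a user-supplied function (for example, a difference between window-averaged entropies of the two ensembles). In each replica, epochs from the two grand epoch ensembles are pooled and randomly reassigned to groups of the original sizes, which is our assumption about how the random exchange is implemented; `mean_difference` is only a hypothetical example statistic.

```python
import numpy as np


def ensemble_permutation_test(errors, correct, statistic, n_perm=5000, seed=0):
    """Permutation p-value for a statistic comparing two epoch ensembles.

    errors, correct: epochs x samples arrays (e.g. binary SRA sequences).
    statistic(a, b) -> float; larger values mean a bigger difference.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([errors, correct], axis=0)
    n_err = len(errors)
    observed = statistic(errors, correct)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # random reassignment of epochs
        if statistic(pooled[idx[:n_err]], pooled[idx[n_err:]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def mean_difference(a, b):
    """Example statistic (hypothetical): absolute difference of grand means."""
    return abs(float(a.mean()) - float(b.mean()))
```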
Differences in the behavioral performance data between-conditions or between-event types were also analyzed using a nonparametric pair-wise permutation test.
Results of the performance analysis are presented in Table 1. Pitch errors occurred in 3% (SD 2%) of all played notes in AM and in 3% (SD 1%) in M. Based on the criteria described above, isolated erroneous notes amounted to 0.7% (SD 0.3%) of notes in AM and 0.7% (SD 0.4%) in M. The percentages of total and isolated errors did not differ statistically between conditions (permutation test across subjects, P > 0.05). The mean IOI and its SD indicate how well the pianists adjusted to the given tempo (ideal IOI of 125 ms). The mean IOI was 121 ms (8 ms) in AM and 123 ms (8 ms) in M; this difference was not significant (P > 0.05). These results confirm that pianists performed the sequences with a timing very close to the target IOI, and with similar timing with or without auditory feedback. The mean IOI of the 3 correct notes before an error was larger than 125 ms (190 ms in AM, 170 ms in M), as was that of the 3 notes after the error (240 ms in AM, 200 ms in M), demonstrating pre- and posterror slowing in both playing conditions. A permutation test performed in each condition separately showed that both the pre-error slowing and the posterror slowing differed significantly from the mean IOI of all trials in AM and M (P < 0.05). Moreover, pre- and posterror slowing did not differ statistically from each other in either AM or M (P > 0.05 in both conditions).
| Measure, mean (SD) | Audiomotor condition (AM) | Motor condition (M) |
|---|---|---|
| Percentage of total pitch errors | 3% (2%) | 3% (1%) |
| Percentage of isolated pitch errors | 0.7% (0.3%) | 0.7% (0.3%) |
| Number of total pitch errors | 400 (300) | 400 (200) |
| Number of isolated errors | 80 (30) | 80 (40) |
| IOI of all notes (ms) | 121 (8) | 123 (8) |
| Mean IOI of 3 notes before isolated pitch errors (ms) | 190 (60) | 170 (60) |
| Mean IOI of 3 notes after isolated pitch errors (ms) | 240 (60) | 200 (60) |
| Overall loudness, correct notes (MIDI velocity) | 75 (6) | 76 (7) |
| Overall loudness, errors (MIDI velocity) | 68 (6) | 72 (5) |
| Loudness difference, error − correct at the same score position (MIDI velocity) | −7 (4) | −5 (4) |
The mean overall loudness (mean MIDI velocity) of correct notes was 75 (6) in AM and 76 (7) in M (nonsignificant difference, P > 0.05), confirming that performance with and without auditory feedback was similar in MIDI velocity. Likewise, the mean overall loudness of pitch errors did not differ significantly between the 2 performance conditions: 68 (6) in AM and 72 (6) in M (P > 0.05).
A key question was whether the loudness of errors was reduced compared with that of the corresponding correct notes at the same position in the score. An affirmative answer would indicate that a corrective response had already been initiated by the time the erroneous key was pressed. In AM, the mean difference in loudness between pitch errors and the averaged loudness of the matching correct notes was −7 (4). A permutation test across subjects, with the mean difference between the MIDI velocities of errors and matching correct notes as the test statistic, showed that this difference was significant (P < 0.01). Similarly, the difference between the loudness of errors and matching correct notes in M, −5 (4), was also significant (P < 0.01). The loudness of pitch errors was therefore consistently reduced relative to the corresponding correct notes across performance conditions.
The grand-average waveforms of the note-onset–locked responses in AM are depicted in Figure 2 at electrode positions Fz, FCz, Cz, and CPz. For errors compared with correct notes, a negative deflection is observed at all electrode positions between 70 and 20 ms prior to note onset. Furthermore, a larger positive peak was elicited after the onset of errors than after correct notes, with a latency of 50–100 ms. A final, larger positive deflection between 240 and 280 ms was also observed, resembling the Pe. In M, a negative peak was likewise found in the difference ERP waveforms at all electrode locations (Fig. 3), between 50 and 0 ms prior to note onset. The Pe was also elicited in M, but earlier than in AM: between 180 and 220 ms. In M, no positive components were elicited around 50 ms after the onsets of errors or correct notes. The maximum of the negative deflection prior to errors was located over frontocentral scalp positions in both conditions (Fig. 4). Likewise, the topographic maxima of the Pe in AM and M were located over frontocentral electrode positions (Fig. 5).
In the auditory condition (A), in which participants listened to their performance recorded in AM, a negative-going deflection for errors compared with correct pitches was observed between 200 and 250 ms at midline electrode locations (Fig. 6). This large frontocentral negativity elicited by the auditory feedback of errors may correspond to the feedback-related ERN (f-ERN; see Discussion).
The multivariate statistical analysis performed with synchronized permutations in the 6 selected ROIs returned a main effect of event type (error, correct note) in the time window of 220–260 ms in the mesial frontocentral (Fz, FCz, Cz; P < 0.0083) and centroparietal regions (CPz, Pz, POz; P < 0.0083). In the same time window, a main effect of condition was found over the frontocentral ROI (P < 0.0083). No significant main effects were found from −70 to −20 ms, which is expected given that in A the ERP waveforms were not modulated at this prestimulus latency.
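The ROI-by-window summary underlying these tests, and the significance criterion, can be sketched as follows. We read the threshold P < 0.0083 as a Bonferroni correction of 0.05 over the 6 ROIs (0.05/6 ≈ 0.0083); this reading, the function name `window_mean`, and the array layout are our assumptions, and the synchronized-permutation procedure itself is not reproduced here.

```python
import numpy as np

N_ROIS = 6
ALPHA = 0.05
ALPHA_PER_ROI = ALPHA / N_ROIS  # 0.00833..., i.e. the reported P < 0.0083

def window_mean(erp, times_ms, t_start, t_end, channels, roi):
    """Mean amplitude over the channels of one ROI within a latency window.

    erp      : array (n_channels, n_times), microvolts
    times_ms : array (n_times,), latency relative to note onset
    channels : channel names in the row order of erp
    roi      : channel names of the ROI, e.g. ["Fz", "FCz", "Cz"]
    """
    rows = [channels.index(ch) for ch in roi]
    cols = np.flatnonzero((times_ms >= t_start) & (times_ms <= t_end))
    return float(erp[np.ix_(rows, cols)].mean())
```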
Further, the difference between errors and correct notes in the ERP waveforms of the mesial frontocentral electrodes depended on the task condition in the time intervals from −70 to −20 ms and from 220 to 260 ms (significant event type × condition interaction, P < 0.0083). In the mesial centroparietal region, we also observed a significant event type × condition interaction between 220 and 260 ms (P < 0.0083).
A post hoc univariate permutation test across subjects in AM revealed a significantly enhanced negativity before errors as compared with correct notes (P < 0.0083). This effect was localized at the midline electrodes between −70 and −20 ms and corresponds to what we term the pre-error negativity (pre-ERN). In addition, a significant (P < 0.0083) positive difference was found between 50 and 85 ms over the same medial frontocentral areas. The Pe was significant over both the medial frontocentral and the medial centroparietal areas between 240 and 280 ms. A similar post hoc univariate permutation test in M showed a significant pre-ERN at the mesial frontocentral electrodes between −50 and 0 ms (P < 0.0083) and a significant Pe at the mesial frontocentral and centroparietal electrodes between 180 and 220 ms.
We were also interested in the specific comparison between the responses to errors and correct notes in the auditory condition. Here, the univariate permutation test revealed that the auditory feedback of performance errors elicited a significantly larger negative deflection between 200 and 250 ms than that of correct notes. This effect was significant in all ROIs (P < 0.0083).
Finally, to test our hypothesis that pitch errors with and without auditory feedback would not differ prior to their execution but only in the later Pe, we computed a univariate permutation test across subjects comparing 1) error minus correct trials in AM with 2) error minus correct trials in M (Fig. 7). This comparison reflects how the auditory information present in AM, and absent in M, influences the processing of errors; the motor and somatosensory contributions, which are common to both conditions, cancel out in the subtraction. The permutation test showed that the Pe in AM was significantly larger over frontocentral regions than in M and also peaked later (P < 0.0083 in 250–280 ms). Apart from the larger Pe in AM, no other significant differences were found, not even before note onset. However, a negative peak is visible in the difference waveforms around 200 ms in Figure 7C,F. This effect probably arises from the Pe at about 200 ms in M, which becomes a negative peak when the M difference curve is subtracted from the AM difference curve.
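The double subtraction can be written out explicitly. In this hypothetical sketch (function names and array shapes are assumptions), any component that is identical in the error-minus-correct waves of AM and M, such as a shared motor or somatosensory contribution, cancels exactly, leaving only the feedback-dependent part; with real data, the resulting per-subject waves would then enter an across-subject permutation test.

```python
import numpy as np

def error_effect(err_erp, corr_erp):
    """Difference wave (error minus correct) for one condition."""
    return np.asarray(err_erp, dtype=float) - np.asarray(corr_erp, dtype=float)

def feedback_contrast(err_am, corr_am, err_m, corr_m):
    """(error - correct in AM) - (error - correct in M).

    Whatever is identical in both difference waves cancels here,
    leaving the part of the error effect that depends on auditory feedback.
    """
    return error_effect(err_am, corr_am) - error_effect(err_m, corr_m)
```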
In sum, this last statistical test showed no significant difference in the pre-ERN between conditions in any brain region (P > 0.05 before 0 ms), consistent with a pre-ERN that does not depend on auditory feedback. Additionally, it corroborated that the Pe in AM was larger than in M. This posterror positivity around 200–250 ms may be associated with error awareness (Falkenstein et al. 1990; Nieuwenhuis et al. 2001; see Discussion).
We estimated the neural generators of the brain activity associated with the pre-ERN and Pe in AM and M using the sLORETA inverse model (Pascual-Marqui 2002). sLORETA (standardized low-resolution brain electromagnetic tomography) computes a standardized current density estimate that has zero localization error under ideal, noise-free conditions. sLORETA revealed that the main focus of activity related to the pre-ERN, between −70 and −20 ms in AM and between −50 and 0 ms in M, was located in Brodmann area 32 (BA 32) of the rostral ACC (MNI coordinates: x = −5, y = 35, z = 0). The source of activity related to the Pe in AM and M was found in Brodmann area 24 (BA 24) of the rostral ACC (x = −5, y = 35, z = 5). Figure 8 illustrates these results.
Symbolic Resonance Analysis
The EEG epochs were extracted in a time window beginning 300 ms before and ending 500 ms after note onset. The mean of the prestimulus interval from −300 to −150 ms was subtracted from all EEG epochs as a baseline. For each encoding threshold θ, varied from 0.1 to 10 μV in steps of 0.1 μV, the EEG trials of all subjects were encoded into GEEs of sequences of the 3 symbols (“0”, “1”, “2”) and then transformed into binary sequences (“0”, “2”) by means of the Reversi transformation (see Materials and Methods). From the binary sequences, the SNR curves were computed in each condition for errors and correct notes in the time windows of interest associated with the ERP components. In AM we focused on the time window between −70 and −20 ms, corresponding to the pre-ERN, whereas in M we selected the interval between −50 and 0 ms for the pre-ERN. Figure 9 illustrates the SNR curves of errors and correct notes associated with the pre-ERN in AM and M as a function of the encoding threshold θ at electrode locations FCz and Cz. In AM (Fig. 9A,B), the SNR associated with correct notes is higher than the SNR of errors, particularly around 3.3 μV, which corresponds roughly to the optimal encoding threshold θ# at these electrode positions. The optimal encoding threshold is the one at which the ERPs of errors and correct notes are most clearly separated (error minus correct) with respect to amplitude. More specifically, at the frontocentral and posterior mesial electrodes the optimal encoding thresholds ranged from 3.2 to 3.4 μV; similar values were obtained at most other electrode positions, except FP1 and FP2, where they were higher. Note that the encoding thresholds are positive, reflecting the optimal absolute voltage values crossed by the underlying ERP components.
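The baseline correction, threshold grid, and symbolic encoding described above can be sketched in Python. This is an illustrative sketch under stated assumptions: the mapping of samples to the symbols “0”, “1”, “2” (below −θ, within ±θ, above θ) is our guess at a symmetric encoding, and the binarization function `crossing_events` is a deliberately simplified stand-in, not the Reversi transformation described in Materials and Methods. All function names are hypothetical.

```python
import numpy as np

# Encoding thresholds from the text: 0.1 to 10 uV in 0.1 uV steps (100 values).
THETAS = np.round(np.arange(1, 101) * 0.1, 1)

def baseline_correct(epochs, times_ms, start=-300, end=-150):
    """Subtract each trial's mean over the -300 to -150 ms prestimulus interval."""
    epochs = np.asarray(epochs, dtype=float)
    mask = (times_ms >= start) & (times_ms <= end)
    return epochs - epochs[:, mask].mean(axis=1, keepdims=True)

def encode_three_symbols(epochs, theta):
    """Hypothetical symmetric 3-symbol encoding of (n_trials, n_times) epochs.

    0: sample below -theta; 1: within [-theta, theta]; 2: above theta.
    """
    epochs = np.asarray(epochs, dtype=float)
    symbols = np.ones(epochs.shape, dtype=int)
    symbols[epochs < -theta] = 0
    symbols[epochs > theta] = 2
    return symbols

def crossing_events(symbols):
    """Simplified binarization (NOT the Reversi transformation):
    1 where the sample lies outside [-theta, theta], else 0."""
    return (np.asarray(symbols) != 1).astype(int)
```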
Interestingly, the higher SNR attained at θ# for correct notes indicates that more correct trials than error trials crossed the threshold, which can also be understood as a higher intertrial coherence for correct notes than for errors relative to θ#. By contrast, the smaller SNR for errors indicates that the ERP between −70 and −20 ms was more strongly affected by noise of opposite polarity, so that fewer error trials crossed θ# and, moreover, the ERP of errors had a smaller amplitude relative to that optimal encoding threshold. Figure 2 illustrates this interpretation: in the time window under consideration, the ERPs of both correct notes and errors are positive, but the ERP of errors has an amplitude closer to the baseline, which yields the negative pre-ERN in the difference curve.
In Figure 9A,B we also find a true resonance effect for errors at 3.8 μV, the threshold at which the SNR of errors is maximal. However, at 3.8 μV the SNR of correct notes is identical to that of errors, so the separation of the dynamics is minimal and does not produce a large-amplitude difference ERP.
In contrast to the AM condition, in M the SNR curves reached higher peaks for errors than for correct notes (Fig. 9C,D), particularly at the optimal encoding threshold around 2.7 μV. In this condition, the values of θ# for the frontocentral and posterior mesial electrodes ranged from 2.5 to 3.3 μV, which are smaller than those obtained in AM. This result indicates that in M the maximal difference between the ERP waveforms is obtained at lower threshold values. Furthermore, at the optimal encoding thresholds more error trials than correct trials crossed the threshold, leading to a higher intertrial coherence and, thus, to a larger amplitude relative to that encoding threshold. In agreement with this interpretation, Figure 3 shows that in M the ERP waveform of errors between −50 and 0 ms has a larger negative amplitude, whereas the ERP waveform of correct notes is closer to the baseline.
Interestingly, the SNR curves in Figure 9C,D are bimodal for both correct and wrong notes. Such bimodal SNR curves indicate 2 symbolic resonances within the particular time window. Thus, the ERPs for correct and wrong notes seem to consist of 2 superimposed components of slightly different amplitudes. Although the 2 ERP subcomponents are not easily observed in Figure 3 for each condition separately, the difference curve (error minus correct) clearly shows 2 amplitude modulations in the pre-ERN. Future work should clarify the bimodal nature of the pre-ERN in M. In AM, a bimodal pre-ERN may also be present but overlap with the auditory processing of the previous note.
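One simple way to flag a bimodal SNR curve, that is, 2 symbolic resonances, is to count its interior local maxima. The helper below is hypothetical and not part of the published method; with real SNR curves, some smoothing or a prominence criterion would likely be needed to avoid counting noise peaks, and plateaus are not handled.

```python
import numpy as np

def count_resonances(snr, min_height=0.0):
    """Count interior local maxima of an SNR-vs-threshold curve.

    A point counts as a peak if it is strictly larger than both neighbors
    and above min_height. Returns (number_of_peaks, peak_indices).
    """
    snr = np.asarray(snr, dtype=float)
    left, mid, right = snr[:-2], snr[1:-1], snr[2:]
    peaks = np.flatnonzero((mid > left) & (mid > right) & (mid > min_height)) + 1
    return len(peaks), peaks
```

A curve sampled over the threshold grid that returns 2 peaks would be read as bimodal; a single peak corresponds to one resonance.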
The topographic distribution of the maximal difference SNR (attained at θ#) for the pre-ERN in AM and M is plotted in Figure 10. Panel A shows that in AM there is more signal (and less noise) for correct notes than for errors. In other words, because the waveforms had positive polarity from 70 to 20 ms prior to note onset, the negative difference SNR indicates that, given θ#, fewer error trials than correct trials were above-threshold crossing events. The largest differences between the SNR of errors and correct notes are found at electrodes over the pre-SMA and SMA, but they also seem to extend to recording sites over the left sensorimotor cortex. Panel B shows a dipole-like topographical distribution: the electrode Cz (Fig. 9D) and the parietal mesial electrode positions had higher SNR values for errors than for correct notes, whereas at the frontocentral channels FCz and Fz the effect was reversed and larger in magnitude (−0.04). In Figure 9C, the global maximum of the SNR curve was higher for errors. Nevertheless, the difference SNR curve reached its maximum at a θ# at which the SNR of correct notes was in fact larger.
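The selection of θ# as the threshold giving the greatest separation between errors and correct notes can be illustrated with a threshold sweep. In this sketch, the SNR defined in Materials and Methods is replaced by a plainly simpler proxy, the fraction of window samples whose absolute voltage exceeds θ; this proxy and the function names are assumptions. The sign of the difference at θ# then indicates which event type produced more above-threshold crossings.

```python
import numpy as np

THETAS = np.round(np.arange(1, 101) * 0.1, 1)  # 0.1 ... 10.0 uV

def crossing_rate(epochs, window_mask, theta):
    """Proxy for the symbolic SNR (assumption): fraction of (trial, sample)
    points inside the window whose absolute voltage exceeds theta."""
    seg = np.abs(np.asarray(epochs, dtype=float)[:, window_mask])
    return float((seg > theta).mean())

def optimal_threshold(err_epochs, corr_epochs, window_mask, thetas=THETAS):
    """Return (theta_opt, diff), where diff[i] = rate_err - rate_corr at
    thetas[i] and theta_opt maximizes |diff| (first maximum if tied)."""
    diff = np.array([crossing_rate(err_epochs, window_mask, t)
                     - crossing_rate(corr_epochs, window_mask, t)
                     for t in thetas])
    best = int(np.argmax(np.abs(diff)))
    return float(thetas[best]), diff
```

In a synthetic check with correct notes at a constant 3.3 μV and errors at 0.5 μV, every threshold from 0.5 μV up to, but not including, 3.3 μV separates the two event types completely, and `argmax` returns the lowest of these; the negative difference mirrors the AM pattern of fewer above-threshold error trials.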
The permutation test across symbolically encoded EEG epochs, computed in AM for the 6 ROIs, revealed significant differences between the resonance curves of errors and correct notes only at the midline electrodes (P < 0.0083). In the same ROIs, the permutation test also showed significant differences between the resonance curves of errors and correct notes in M (P < 0.0083).
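A permutation test across symbolically encoded epochs can be sketched by shuffling the error/correct labels of single trials. The statistic passed in is left to the caller (for instance, a difference of above-threshold crossing rates); this choice, the function name, the number of permutations, and the comparison with 0.05/6 are assumptions rather than the study's exact procedure.

```python
import numpy as np

def label_permutation_test(stat, err_trials, corr_trials, n_perm=5000, seed=0):
    """Two-sided permutation test that shuffles error/correct labels across trials.

    stat(a, b) -> float compares two groups of trials (first axis = trials).
    Returns (observed statistic, p-value with add-one correction).
    """
    err_trials = np.asarray(err_trials, dtype=float)
    corr_trials = np.asarray(corr_trials, dtype=float)
    pooled = np.concatenate([err_trials, corr_trials])
    n_err = len(err_trials)
    rng = np.random.default_rng(seed)
    observed = stat(err_trials, corr_trials)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if abs(stat(pooled[idx[:n_err]], pooled[idx[n_err:]])) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```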
In sum, the SRA confirmed that mainly the surface electrodes located over mesial frontocentral areas are active during error-related processing. Because the SRA is robust to small numbers of trials and short ISIs, it strengthens the evidence provided by the ERP analysis.
To our knowledge, this is the first electrophysiological study assessing 1) the time course of error detection and 2) the different contributions of auditory and somatosensory information to error monitoring during natural piano performance, an example of a highly skilled multimodal task (Münte et al. 2002; Zatorre et al. 2007).
Error Detection in Advance is Independent of the Auditory Feedback
The main finding was that a negative deflection, the pre-ERN, was elicited at the mesial frontocentral electrodes (Fz, FCz, Cz) as early as 50–70 ms before the onset of pitch errors, possibly indexing an error signal of the self-monitoring system. The pre-ERN was independent of the presence or absence of auditory feedback and had a behavioral correlate: pitch errors were played more softly than correct notes at the same position in the score.
These results suggest that the pitch accuracy and temporal precision required to produce fast, complex musical sequences are made possible in part by the efficient functioning of feedforward mechanisms in highly skilled pianists. Internal forward models can predict the next state of a system from its current state and the motor command (Bernstein 1967; Wolpert et al. 1995; Desmurget and Grafton 2000). Further, they compare the actual motor outflow (efference copy) with the motor command. In case of a mismatch, an error signal is triggered to cancel the undesired sensory effects of the movement (reafference), and a corrective response is initiated.
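The forward-model logic can be illustrated with a toy linear system. This is a hypothetical sketch, not a model fitted to piano data: the linear dynamics x_next = A·x + B·u, the tolerance, and the comparison of the predicted outcome with an intended outcome (one common formulation of the comparator) are all assumptions. The point is only that a mismatch can be flagged from the prediction, before the movement is completed.

```python
import numpy as np

def forward_model(state, command, A, B):
    """Toy linear forward model: predicted next state = A @ state + B @ command."""
    return A @ np.asarray(state, dtype=float) + B @ np.asarray(command, dtype=float)

def comparator(predicted, intended, tolerance):
    """Compare the predicted outcome with the intended one.

    Returns (error_signal, mismatch): error_signal is True when the
    Euclidean distance between prediction and intention exceeds tolerance.
    """
    mismatch = float(np.linalg.norm(np.asarray(predicted) - np.asarray(intended)))
    return mismatch > tolerance, mismatch

# Hypothetical 1-D example: finger position measured in key units.
A = np.eye(1)   # position persists from one step to the next
B = np.eye(1)   # the command displaces the finger
pred = forward_model([4.0], [2.0], A, B)               # predicted to land on key 6
error_signal, mismatch = comparator(pred, [5.0], 0.5)  # intended key is 5
```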
In our paradigm, the pre-ERN may be the neural correlate of this error signal, and the decreased MIDI velocity of errors might indicate that the self-monitoring system tries to cancel the sensory effects associated with the erroneous action. However, another interpretation, which cannot be ruled out, is that the decreased loudness of errors is due to inhibition of the ongoing motor response or, more generally, to an erroneous ongoing motor pattern in which the wrong key is pressed and, in addition, with the wrong loudness.
The strikingly similar values of the performance measures (e.g., pre- and posterror slowing, loudness of errors) in AM and M provide evidence for our hypothesis that auditory feedback does not mediate the detection of pitch errors prior to their execution in piano performance. This is in agreement with previous literature. In the context of piano performance, Lashley (1951) postulated that auditory feedback could not control the fast motor sequences of piano performance at a high tempo. This view was further strengthened by Finney and Palmer (2003), who demonstrated that the presence or absence of auditory feedback during the retrieval of memorized music sequences did not affect the error rate. Rapid movements must thus be prepared in advance (Schmidt 1975, 2003; Pfordresher and Palmer 2006; Pfordresher et al. 2007). This last point is supported by the pre-error slowing found in the AM and M conditions: preparing upcoming notes in advance might enable musicians to detect an error already during the previous note (125 ms earlier), thus triggering a corrective response and delaying the onset of the error.
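Pre- and posterror slowing can be operationalized from note onsets as differences in inter-onset intervals (IOIs). The definitions below are one plausible operationalization; they, the function names, and any example onset times are assumptions, and the study's exact definitions may differ.

```python
import numpy as np

def _iois(onsets_ms):
    """Inter-onset intervals: ioi[k] = onset[k + 1] - onset[k]."""
    return np.diff(np.asarray(onsets_ms, dtype=float))

def pre_error_slowing(onsets_ms, is_error):
    """Mean IOI leading INTO error notes minus mean IOI leading into correct notes.

    Positive values mean the performer slowed down just before errors.
    Assumes a single melodic line with one onset per note (hypothetical).
    """
    ioi = _iois(onsets_ms)
    into_err = np.asarray(is_error, dtype=bool)[1:]  # note on which each IOI ends
    return ioi[into_err].mean() - ioi[~into_err].mean()

def post_error_slowing(onsets_ms, is_error):
    """Mean IOI leading OUT OF error notes minus mean IOI out of correct notes."""
    ioi = _iois(onsets_ms)
    out_of_err = np.asarray(is_error, dtype=bool)[:-1]  # note on which each IOI starts
    return ioi[out_of_err].mean() - ioi[~out_of_err].mean()
```

For instance, in a hypothetical line played at 125-ms IOIs in which the IOI into an error note stretches to 145 ms, `pre_error_slowing` returns 20 ms.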
These findings, however, probably do not apply to string instruments. For these, the absence of auditory feedback (playing without the bow) has been shown to have a profound impact on the accuracy of pitch performance (Chen et al. 2008), most likely due to the dissociation between the “pitch map” and the “physical map” of the instrument. By contrast, speech production is remarkably stable even in the temporary absence of auditory feedback and therefore offers a useful parallel to piano performance under a temporary lack of auditory feedback. For instance, speech remains intelligible when speakers cannot hear themselves because of masking noise (Lane and Tranel 1971). When speakers become deaf after learning to speak, which might constitute a permanent lack of auditory feedback, speech production is initially well preserved but deteriorates gradually over time (Waldstein 1990; Lane and Webster 1991).
Auditory Feedback Modulates the Expectations of the Sensory Effects and the Emotional Evaluation of Errors
Sound production is the ultimate goal of music performance. What do our data reveal about the role of auditory feedback in monitoring piano performance? When contrasting the difference (error minus correct) ERP waveforms with and without auditory feedback, we found a significant difference over frontocentral regions between 250 and 280 ms: the Pe had a larger amplitude and appeared later in AM than in M. This result suggests that auditory feedback enhances the subjective conscious recognition of errors, or the allocation of attentional resources following errors, as reflected in the Pe (Falkenstein et al. 1990; Nieuwenhuis et al. 2001; Van Veen and Carter 2002).
Further, the larger Pe in AM suggests that the auditory modality enhanced the sensory expectations associated with an erroneous action, so that the sensory outcome led to a keener awareness of the error. In contrast, in M the sensory expectations were modulated only by proprioceptive information, which had a weaker impact on the awareness of the error. As a result, the Pe in M had a smaller amplitude.
These findings can be partially understood within the framework of the ideomotor theory of action control (e.g., Prinz 1997). According to this theory, there is a binding between the motor action and the sensory effects it produces. This link arises after frequently performing the specific action and learning the sensory effects associated with it (Elsner and Hommel 2001) and leads to the strong auditory–motor coupling observed in musicians (Bangert and Altenmüller 2003; Drost et al. 2005a, b; Zatorre et al. 2007).
In the present study, the larger Pe observed in AM between 250 and 280 ms after errors cannot be explained by the violation of the auditory–motor mapping in the performance of the learned sequences. The reason is that incorrect actions produce the corresponding sensory effects (incorrect pitches). We speculate that it reflects an enhanced conscious error recognition possibly due to the following mechanism: in case of an upcoming error, the feedforward mechanisms anticipate and try to cancel the sensory effects of the movement. It may be then that the self-monitoring system expects to successfully correct the sensory effects (as the MIDI velocity data confirm), but the final erroneous auditory feedback increases the impact on the subjective conscious evaluation of the error.
In the case of an “artificial” violation of auditory expectancies coupled with a correct voluntary action, some studies have provided converging evidence for a larger attentional resource allocation, as reflected in a larger positive P300 peak (Nittono 2006; Waszak and Herwig 2007). In that context, the positive deflection was preceded by a negative ERP component reflecting the mismatch between the intended auditory image evoked by motor activity and the actual modified auditory feedback (N210 in Katahira et al. 2008; MMN in Waszak and Herwig 2007). A P300 account of the Pe can be ruled out in our experiment, because the Pe was observed after real self-made errors rather than after “artificial” sensory violations.
At present there is an ongoing debate about the significance of the Pe (Van Veen and Carter 2006). The main proposals are that the Pe could be involved in the subjective emotional evaluation of errors (Falkenstein et al. 2000), in conscious error recognition (Nieuwenhuis et al. 2001), or in posterror compensatory behavior (Nieuwenhuis et al. 2001; Hajcak et al. 2003), but other studies are at odds with these interpretations (Hajcak et al. 2004; Debener et al. 2005).
Our findings are consistent with the previous suggestions of the emotional assessment of errors. Indeed, sLORETA detected the BA 24 in the rostral “affective” ACC as a source of the activity generating the Pe. The rostral ACC has been associated with emotional evaluation after erroneous responses (Luu et al. 2003; Taylor et al. 2006). Further, it has been reported to interact with other paralimbic and limbic regions (e.g., the amygdala and insula) to mediate affective processes (Devinsky et al. 1995; Whalen et al. 1998).
It is not surprising that in music performance the evaluation of errors is emotionally modulated. Expert musical performance requires not only technical motor skills but also the ability to communicate emotions through expressive performance (Sloboda 2000). Emotion and motivation are thus key elements of expertise in music performance (Palmer 1997; Ericsson et al. 2007). However, emotional evaluation of errors is not exclusive to music performance but is also characteristic of other goal-oriented behaviors, such as gambling or problem solving. Empirical evidence for the emotional/affective evaluation of errors has been found following monetary losses in gambling tasks (Gehring and Willoughby 2002; Dunning and Hajcak 2007) and after errors in a difficult mathematical task (Cavanagh and Allen 2008). Even in more general error-monitoring scenarios, mood and personality variables have been correlated with brain responses following errors (Luu et al. 2000a).
The Relevance of Piano Performance as an Example of a Highly Skilled Multimodal Task
The results of the ERP analysis in AM and M were validated by the novel SRA, a method that exploits stochastic resonance to disentangle possibly overlapping brain responses (Beim Graben and Kurths 2003) and that is robust to a low number of trials. The SRA confirmed that the pre-ERN in M and AM was located over mesial frontocentral electrodes. Further, it showed that in the time interval of the pre-ERN, the maximal separation of the ERP dynamics of errors and correct notes, obtained at the optimal encoding threshold, was characterized by a larger SNR and more above-threshold crossing events for correct notes than for errors in AM, whereas the opposite was true in M.
In AM, the larger SNR at θ# for correct notes than for errors reflected that more correct trials (of positive polarity) constituted above-threshold events, which led to a larger intertrial coherence for correct notes. At θ#, more error trials (of positive polarity) were driven away from that threshold by noise, and consequently the SNR was reduced. By contrast, in M the SNR at θ# was larger for errors than for correct notes at most electrode positions. In this case, more error trials (of negative polarity) than correct trials (also of negative polarity) crossed the optimal encoding thresholds, leading to a larger intertrial coherence.
These results are in agreement with the ERP results: 1) the prestimulus ERP waveform of errors in AM had positive polarity but was close to the baseline, whereas for correct notes it had also positive polarity but a larger amplitude; 2) the prestimulus ERP waveform of errors in M had negative polarity and a large amplitude, but for correct notes the ERP was closer to the baseline. Both situations lead to an ERP component of negative polarity in the difference waveforms (error minus correct notes), the pre-ERN.
Finally, it is interesting to note that we obtained bimodal SNR curves for wrong and correct notes in the absence of auditory feedback. Such bimodal curves reflect 2 symbolic resonance effects leading to 2 subcomponents in the ERPs, which could also be observed in the difference ERP waveforms in Figure 3. Future investigations must address 1) the significance of the 2 subcomponents of the pre-ERN in M and 2) their presence or absence in AM.
The specific brain source generating the activity associated with the pre-ERN was BA 32 of the rostral ACC. A large body of studies supports the relevance of the ACC in error monitoring (Dehaene et al. 1994; Tanji 1996; Carter et al. 1998; Holroyd and Coles 2002) and in signaling the need for corrective adjustments (Klein et al. 2006; Ullsperger et al. 2007). Complementing these results, a number of findings point to the engagement of the ACC in detecting motivationally salient negative events, such as errors, monetary losses in a gambling task, or more general negative emotions (Luu et al. 2000a; Gehring and Willoughby 2002; Dunning and Hajcak 2007). As mentioned in the previous section and in the introduction, the rostral ACC is involved in the emotional evaluation of erroneous outcomes (Van Veen and Carter 2002; Luu et al. 2003; Taylor et al. 2006). Our results add to these findings by providing evidence for a more emotional and less mechanistic processing of error detection in music performance. This can be understood if we consider that musical expertise is the product of years of intense practice guided by motivation. Like other experts in motor control, such as elite surgeons or athletes, professional musicians continually analyze during their years of practice what they did wrong, adjust their techniques, and work arduously to correct their errors (Ericsson et al. 2007). Consequently, during musical training the motor programs are optimized to achieve the highest accuracy with a minimum of effort (Parlitz et al. 1998). This efficiency at the performance level is accompanied by an increased efficiency of cortical and subcortical systems for bimanual movement control in musicians (Haslinger et al. 2004). Precisely these high-level motor skills might allow musicians to focus on the expressive aspects of musical performance.
However, given that sLORETA and other inverse solutions are not optimal and may produce variability in the estimated sources across experiments, future studies of error monitoring in music performance are needed to validate the brain source of the pre-ERN.
Several remarks follow from these results. First, a number of studies have investigated what happens before errors and have shown that the disengagement of the error-monitoring system can be detected one trial (error-related positivity; Ridderinkhof et al. 2003; Allain et al. 2004; Hajcak et al. 2005) or even 30 s (Eichele et al. 2008) before the actual execution of an error. The routinely executed repetitive tasks used in these investigations, such as flanker or Stroop tasks, may reduce attention and effort, which makes comparison with piano performance difficult.
Second, studies that use flanker, Stroop, and gambling tasks to elicit errors report an ERN that peaks about 100 ms later than the pre-ERN (Falkenstein et al. 1990; Gehring et al. 1993, 1995; Hewig et al. 2007). This difference may be explained by long-term training, which provides internal information for a faster functioning of the self-monitoring system, as is characteristic of highly skilled multimodal behaviors such as music performance or speech production. In the speech domain, Möller et al. (2007) reported for the first time a negative deflection in the ERPs elicited prior to an erroneous vocalization. In the music domain, our present data are the first to show how far in advance the error signal is triggered in pianists and that auditory feedback plays a role in the goal-directed motor program only at later stages of error processing. In line with this, a recent experiment studying error monitoring in pianists executing musical scales and simple motor patterns also reports an ERN before errors are committed (Maidhof et al., unpublished observations).
Third, most ERN studies use 2 different response hands (Gehring et al. 1993, 1995; Falkenstein et al. 2000). This can create a conflict between the activation of 2 different effectors (i.e., the left and right hand). Indeed, the debate over whether the ERN reflects overt errors based on a comparator process (Gehring et al. 1993; Falkenstein et al. 2000) or the detection of conflict (Carter et al. 1998) has generated an extensive literature (Van Veen and Carter 2002). An important ERP component in the conflict literature is the frontocentral N200, which is elicited in correct trials with high response conflict (Kopp et al. 1996; Wang et al. 2000). According to the conflict theory, the ERN is generated by conflict following the response in error trials, whereas the N200 is generated by conflict prior to the response in correct high-conflict trials (Van Veen and Carter 2002, 2006). Regarding the brain sources of the ERN and the N200, there is evidence that different regions of the ACC are their main generators: the rostral “emotive” ACC for the ERN and the caudal “cognitive” ACC for the N200 (Kiehl et al. 2000; Menon et al. 2001). However, these results seem at odds with a more recent study showing the same generator in the caudal ACC for the ERN and N200 (Van Veen and Carter 2002). Interestingly, to date no published data have reported activity in the ACC prior to the response in correct conflict trials.
Because the pre-ERN in our paradigm is elicited prior to errors, an important question is whether this ERP component is related to error detection, as we propose, or rather to conflict detection. To answer this question, one should first consider whether the erroneous notes were produced as “error note/wrong finger” or “error note/correct finger.” In our study, pianists were free to select the fingering, a condition that already differs from the paradigms mentioned above. Because the fingering was not tracked, we cannot completely rule out conflict between the activation of different fingers for a particular error. Nevertheless, we believe that the isolated erroneous notes were due either to faulty motor preparation and execution, which led to pressing the wrong neighboring key (with the correct finger), or to a serial ordering error. In the latter case, a wrong note event is prepared and activated based on 1) the similarity of its metrical accent strength to that of the current note and 2) its serial proximity to the current note (Pfordresher and Palmer 2006; Pfordresher et al. 2007). Pfordresher and Palmer (2006) addressed the fingering issue in a model they proposed to predict serial error production in piano performance. Their model succeeded in predicting serial errors using metrical similarity as the main predictor, whereas the fingering parameter did not predict the serial errors.
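The idea that intrusions depend on metrical similarity and serial proximity can be illustrated with a toy weighting. The sketch is in the spirit of, but not identical to, the model of Pfordresher and Palmer (2006): the similarity function, the exponential decay with distance, the parameter values, and the example accent pattern are all hypothetical.

```python
import numpy as np

def intrusion_weights(accents, current, decay=0.5, max_dist=3):
    """Hypothetical relative likelihood that each nearby event intrudes at `current`.

    accents : metrical accent strength per score position (e.g. 1..4)
    weight  = accent similarity * decay ** serial distance (assumption)
    Returns {position: normalized weight}, excluding the current position.
    """
    accents = np.asarray(accents, dtype=float)
    weights = {}
    lo, hi = max(0, current - max_dist), min(len(accents), current + max_dist + 1)
    for pos in range(lo, hi):
        if pos == current:
            continue
        similarity = 1.0 / (1.0 + abs(accents[pos] - accents[current]))
        weights[pos] = similarity * decay ** abs(pos - current)
    total = sum(weights.values())
    return {pos: w / total for pos, w in weights.items()}
```

In this toy scheme, of 2 candidates at the same serial distance, the one whose accent strength matches the current note receives the larger weight, and nearer candidates outweigh distant ones with equal accent similarity.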
Another argument against the conflict theory is that the source generating the pre-ERN was found in the anterior part of the ACC and not in the caudal ACC, which has been broadly shown to be related to conflict detection (Kiehl et al. 2000; Menon et al. 2001; Van Veen and Carter 2002). Still, future research is needed to evaluate the validity of the conflict theory in music performance.
Finally, the motor control literature has studied how goal-directed movements are generated and how the presence or absence of visual feedback affects end-point errors (Desmurget et al. 1995, 1998; Vindras et al. 1998). This is relevant in our context because the participants had to execute the musical pieces without visual feedback. Some findings point to systematic biases in the estimation of the initial state of the motor apparatus as responsible for reaching errors when the limb cannot be tracked visually (Vindras et al. 1998). Such a wrong estimate of the initial state would bias the forward model predicting the next state of the system and would consequently initiate imprecise online corrections to reach the target (Desmurget and Grafton 2000). In contrast, precise visual information about the initial position of the hand before the reaching movement increases accuracy (Rossetti et al. 1994; Desmurget et al. 1995). Applied to our paradigm, these findings suggest that the lack of visual feedback during performance might have introduced higher variability in the target movements and thus produced higher error rates. However, these data are challenged by another study demonstrating that corrections to the trajectory of the limb are based on nonvisual feedback loops (Prablanc and Martin 1992). The accuracy of fast movements may therefore be best understood with a dual model that uses both internal forward information, in the form of a motor plan, and sensory feedback loops to make corrections at the end of the trajectory (Meyer et al. 1988; Milner 1992; Plamondon and Alimi 1997). Our findings are consistent with this view: the high precision of performance at a fast tempo without visual feedback shows that internal forward models can produce accurate motor execution.
Moreover, the early detection of errors, as reflected in the pre-ERN and in the reduced loudness of erroneous notes, indicates that the central nervous system can predict movements with remarkable accuracy even without a second source of sensory feedback, namely auditory feedback.
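The argument about initial-state estimation can be made concrete with a toy calculation. The sketch below is purely illustrative and is not a model fitted to the present data: it assumes a simple vector-coding scheme in which the planned movement is the difference between the target and the estimated starting position of the hand, so that any misestimate of the start carries over unchanged into the endpoint. An optional end-of-trajectory correction, controlled by a hypothetical `feedback_gain` parameter, stands in for the sensory feedback loop of a dual model; all function names here are our own illustrative choices.

```python
from typing import Tuple

Vec = Tuple[float, float]  # a 2-D hand position (illustrative units)


def plan_movement(target: Vec, estimated_start: Vec) -> Vec:
    """Movement vector planned from the *estimated* initial hand position."""
    return (target[0] - estimated_start[0], target[1] - estimated_start[1])


def execute(true_start: Vec, movement: Vec) -> Vec:
    """Endpoint reached when the planned vector is applied to the *true* start."""
    return (true_start[0] + movement[0], true_start[1] + movement[1])


def endpoint_error(target: Vec, true_start: Vec, estimated_start: Vec,
                   feedback_gain: float = 0.0) -> Vec:
    """Endpoint minus target after an optional late feedback correction.

    feedback_gain (hypothetical) is the fraction of the remaining error that a
    sensory feedback loop removes near the end of the trajectory; 0 means the
    movement is driven by the motor plan alone.
    """
    endpoint = execute(true_start, plan_movement(target, estimated_start))
    corrected = (endpoint[0] + feedback_gain * (target[0] - endpoint[0]),
                 endpoint[1] + feedback_gain * (target[1] - endpoint[1]))
    return (corrected[0] - target[0], corrected[1] - target[1])


if __name__ == "__main__":
    target, true_start, estimated_start = (10.0, 0.0), (0.0, 0.0), (0.5, -0.2)
    # Plan only: the error equals true_start - estimated_start.
    print(endpoint_error(target, true_start, estimated_start))
    # A late correction removing half of the remaining error halves it.
    print(endpoint_error(target, true_start, estimated_start, feedback_gain=0.5))
```

Under these assumptions the endpoint error without correction is exactly the initial-state estimation error, which is the intuition behind attributing reaching errors without visual tracking to biased initial-state estimates; a nonzero gain shrinks that error proportionally, mirroring the role the dual model assigns to late sensory feedback.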
Listening Passively to Self-Made Errors Elicits a Feedback-Locked ERN
When pianists listened to their performances recorded in AM, the auditory feedback of errors elicited a large negativity between 220 and 260 ms over frontocentral brain regions. We propose that this ERP is a feedback-related ERN (f-ERN). The f-ERN peaks around 200–350 ms after the onset of error feedback (Miltner et al. 1997; Badgaiyan and Posner 1998; Nieuwenhuis et al. 2002) and has been reported to originate in the ACC (Holroyd et al. 2004). It has also been proposed to reflect more general violations of expectancy (Oliveira et al. 2007). Interestingly, Heldmann et al. (2008) demonstrated that whenever internal self-monitoring information about an error is available, an ERN is elicited and additional feedback about the error is redundant, as reflected in the absence of an f-ERN. However, when only external (feedback) information is available, an f-ERN is observed. In light of these findings, the negative ERP in A can be interpreted as an f-ERN: pianists were aware that they were listening to their own performance, and thus to the outcomes of their actions in AM; in this context, where no internal self-monitoring information was available, the external auditory information could accordingly have elicited an f-ERN after erroneous pitches.
On the other hand, several other negative ERP components reported to arise around 200 ms after deviant auditory stimuli could in principle account for this negativity in the present paradigm. For instance, the N2b also has a frontocentral topography but is associated with the conscious detection of task-relevant deviants (Novak et al. 1990), a condition that is not fulfilled here because the errors were task-irrelevant. Another component, the mismatch negativity (MMN), indicates the detection of a deviant event in an otherwise invariant context (Giard et al. 1990; Näätänen 1992; Alho 1995). In our paradigm, the musical materials were highly varied, so even the correct notes would not constitute a homogeneous, invariant context. Moreover, the MMN displays a polarity inversion between frontocentral and posterolateral (mastoid) sites, which was not observed in our data either.
A recent study with musicians focused on the ERP components triggered by deviant sounds that are incongruent with the score (Katahira et al. 2008). The participants passively listened to melodies while following the notes on a score, and the deviant sounds elicited an imagery MMN (iMMN; Yumoto et al. 2005). This result reflects the violation of the auditory image that musicians generate when reading a score. Although this study is relevant to music-related error monitoring, a direct comparison with our results in the auditory condition is difficult: we emphasized to our participants that they should listen carefully to their own performances, which could have engaged the action-monitoring system and triggered the f-ERN. A follow-up study comparing pianists listening to their own performances and to the performances of others would help elucidate the nature of the negative ERP waveform reported here.
Center of Systems Neuroscience, Hanover; and the EU through the Marie Curie Early Stage Training Contract (MEST-CT-2005-021014) to M.H.R.
We are thankful to both Clemens Maidhof and Caroline Palmer for useful discussions and valuable comments. The authors also gratefully acknowledge the help of Marc Bangert in the implementation of the hardware and software to record the MIDI data. Finally, we are grateful to the anonymous reviewers for very helpful suggestions. Conflict of Interest: None declared.