Using functional magnetic resonance imaging, we show for the first time that levels of musical expertise stepwise modulate higher order brain functioning. This suggests that degree of training intensity drives such cerebral plasticity. Participants (non-musicians, amateurs, and expert musicians) listened to a comprehensive set of specifically composed string quartets with hierarchically manipulated endings. In particular, we implemented 2 irregularities at musical closure that differed in salience but were both within the tonality of the piece (in-key). Behavioral sensitivity scores (d′) of both transgressions perfectly separated participants according to their level of musical expertise. By contrasting brain responses to harmonic transgressions against regular endings, functional brain imaging data showed compelling evidence for stepwise modulation of brain responses by both violation strength and expertise level in a fronto-temporal network hosting universal functions of working memory and attention. Additional independent testing evidenced an advantage in visual working memory for the professionals, which could be predicted by musical training intensity. The findings of brain plasticity introduced here demonstrate the progressive impact of musical training on cognitive brain functions that may manifest well beyond the field of music processing.
Although it is a known fact that brain functioning and structure differ between trained musicians and non-musicians (Bever and Chiarello 1974; Schlaug et al. 1995; Schneider et al. 2002; Musacchia et al. 2007; James et al. 2008; Oechslin et al. 2009, 2010; Elmer et al. 2011; James et al. 2011), the origins of these changes remain a matter of debate. Recently, 2 promising longitudinal studies demonstrated that musical training—and thus nurture not nature—may determine functional (Moreno et al. 2009) and structural (Hyde et al. 2009) brain plasticity. Using functional magnetic resonance imaging (fMRI) and behavioral approaches, we investigated, with a multilevel cross-sectional approach, the impact of training history on brain function during musical syntax processing. We hypothesized that such functional brain plasticity exhibits stepwise differences with increasing levels of musical expertise.
Musical phrases in western tonal tradition end with explicit and highly expected harmonic conclusions. Because such expectation is a consequence of musical background, cerebral processing of transgressions of musical grammar may reflect degree of expertise. So far this relationship has been studied with electroencephalography (EEG) opposing musicians to non-musicians (Koelsch, Schmidt, et al. 2002; James et al. 2008). In the present study, we recruited 3 groups of participants with distinct levels of expertise and observed brain responses to expectation violations of varying strength with event-related fMRI.
The neural correlate of musical syntax processing is traditionally examined by establishing a musical prime context and evaluating brain responses to interspersed harmonic transgressions in auditorily presented musical material (Maess et al. 2001; Koelsch, Schmidt, et al. 2002; Tillmann et al. 2006). Musical end formulas or “cadences” in classical western tonal music consist of a particular sequence of chords that always ends on the tonic, a chord built on the root note of the tonality used. People's regular exposure to various genres of western tonal music leads to expectations about the structure of such closures, even if they are non-musicians and absolutely naïve with respect to harmony theory (Bigand and Poulin-Charronnat 2006). Brain responses to structural violations in such musical closures engage a fronto-temporo-parietal network that is considered typical for auditory- and visual-target detection and novelty processing in general (Kiehl et al. 2001; Strobel et al. 2008). Accordingly, previous fMRI research revealed that expectation violations following short chord sequences involve a large neural network in non-musicians comprising the pars opercularis (POp), anterior insula (INS), supramarginal gyrus (SMG), superior temporal gyrus, and middle temporal gyrus (MTG) (Tillmann et al. 2006). Electrophysiological studies shed light on the time course of expectation violation, showing an early brain response to syntactical irregularities in music, described as the early right-anterior negativity (ERAN), with maximum amplitude around 200 ms post-stimulus (Koelsch, Schmidt, et al. 2002). Putative sources of the ERAN were localized in Broca's area (POp in particular) and its right-hemisphere homolog (Maess et al. 2001). Musical expertise leads to increased ERAN amplitude following such syntactical violations (Koelsch, Schmidt, et al. 2002).
To optimally separate brain and behavioral responses as a function of level of expertise, we implemented 2 different refined in-key harmonic irregularities at musical closure in expressive polyphonic music. This is in contrast with most previous research on musical syntax processing (see Experimental Procedure for more details). We anticipated that these challenging musical targets would be selective for the 3 levels of musical expertise. In line with this, recent research using an event-related potential (ERP) source localization technique (James et al. 2008) identified putative sources of an early right ERP component (peak latency approximately 230 ms), occurring exclusively in expert pianists, in right medial temporal structures (INS, hippocampus, and amygdala), in response to subtle in-key harmonic irregularities in expressive music. In sum, a great number of studies contributed to the identification of the anatomical correlates and temporal dynamics of these brain networks involved in musical syntax processing.
The present work focuses on the influence of musical practice intensity on experience driven brain plasticity relative to music syntactic processing. The novel contribution of this research consists in a systematic observation of brain activation patterns as a function of syntactic transgression strength and degree of musical experience within an ecological musical context. Our fMRI design, comprising 3 groups with an increasing level of musical expertise and 2 levels of syntactical transgression, enables us to address the following 3 questions: 1) Which brain areas are modulated by the level of musical expertise irrespective of the strength of harmonic transgressions (main effect of “Expertise”)? 2) Which brain areas are modulated by the strength of harmonic transgression irrespective of the level of musical expertise (main effect of “Transgression”)? 3) Which brain areas are modulated by the level of transgression strength and as a function of musical expertise (interaction between “Transgression” and “Expertise”)?
Materials and Methods
For this study, we recruited 20 expert (professional) pianists (E: Mean age = 24.5 [standard deviation, SD = 4.5], 10 f/10 m, start of musical training at mean age = 6.2 [SD = 1.9], mean training duration [in years] = 18 [SD = 4.2]), 20 amateur pianists (A: Mean age = 22.2 [SD = 3.1], 10 f/10 m, start of musical training at mean age = 7.0 [SD = 1.4], mean training duration [in years] = 14.4 [SD = 4]), and 19 non-musicians (N: Mean age = 24.0 [SD = 4.5], 9 f/10 m) matched by gender and age (1-way analysis of variance (ANOVA) on age, F2,56= 2.2, P= 0.12). One non-musician was excluded from the analysis, since structural data acquisition was corrupted. All subjects were positively tested for right-handedness (“Edinburgh Inventory,” Oldfield 1971) and reported normal hearing. E and A samples did not significantly differ for the age of start of musical training. The inclusion criterion for N was the absence of any extracurricular musical education. The criterion for being part of group A was defined as being still musically active at the moment of participating in this study; however, musical practice should never have exceeded 10 h per week. To control for this, we assessed individual training intensity by a questionnaire focusing on the following age periods (in years): 6–8, 8–10, 10–12, 12–14, 14–16, 16–18, and 18–25. A and E differed significantly in training hours per week already from the age period 10–12. For the early periods (6–8, 8–10), we found no significant differences in training intensity, whereas the subsequent periods revealed a pattern of a consistently increasing training lag of A behind E (Table 1). We are aware of the fact that such retrospective assessments could suffer from a certain lack of precision.
However, from a music-pedagogical point of view, we consider it an interesting finding: Expert musicians report a higher training intensity compared with amateurs within a period of life (10–12 years) preceding the average explicit decision for professional musicianship. Either this effect is due to a retrospective bias, or due to predetermining intrinsic and extrinsic factors. As the case may be, to our knowledge there is no research so far explicitly focusing on early career development of expert and amateur musicians.
|Age period||A (h/week)||E (h/week)||T-values|
|10–12||4 (±2.3)||6.5 (±4.3)||2.7*|
|12–14||4.7 (±2.6)||9 (±5.3)||3.3**|
|14–16||5.3 (±3.2)||14.8 (±7.7)||5.1***|
|16–18||4.7 (±2.2)||19.9 (±9.3)||7.1***|
|18–25||4.4 (±2.9)||30.7 (±8.5)||12.4***|
Notes: Differences were tested by t-tests for independent samples (2-tailed, asterisks indicate the level of significance: *P< 0.05, **P< 0.01, ***P< 0.001).
Task Independent Behavioral Measures
Because musical syntax processing requires keeping track of the passing musical context, it is reasonable to assume that musical practice yields enhanced working memory capacities. Recently published highly relevant data evidenced such effects for non-verbal auditory stimulus material (Schulze et al. 2011). In the context of the present investigation that focuses on higher order brain functioning, we wanted to investigate whether working memory enhancement as a consequence of extensive musical training undergoes cross-modal transfer effects. Complex music behavior strengthens audiovisual binding and multisensory integration including motor actions (Hodges et al. 2005; Lee and Noppeney 2011). To assess working memory capacity beyond auditory processing and detached from the context of music, our participants performed a visual n-back letter task. Here, we applied a shortened version of an n-back paradigm previously published by a research group in our faculty at the University of Geneva (for a detailed description of the paradigm we refer to Ludwig et al. (2008)). During the 3-back letter paradigm, participants indicated by button presses whether a letter (consonant) presented on the screen had also appeared 3 items before, independently of case. Four blocks of 36 trials each were presented; the first block served as training. Maximum score was thus 3 × 36 = 108.
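The scoring rule of the 3-back letter task can be sketched as follows. This is a minimal illustration, not the original task code: the function name `score_n_back` and the example sequence are hypothetical, and it assumes that every trial is scored (trials without a letter 3 items back count as non-targets), which is consistent with a maximum of 36 points per block.

```python
def score_n_back(letters, responses, n=3):
    """Count correct responses in an n-back letter sequence.

    letters   -- sequence of single letters shown on screen
    responses -- sequence of booleans, True if the participant reported a match
    A trial is a target when the current letter equals the letter shown
    n items earlier, ignoring case.
    """
    correct = 0
    for i, letter in enumerate(letters):
        # The first n trials cannot be targets: nothing was shown n items before.
        is_target = i >= n and letter.lower() == letters[i - n].lower()
        if responses[i] == is_target:
            correct += 1
    return correct


# Hypothetical 7-trial sequence: trials 3 ('b' vs 'B') and 4 ('K' vs 'k') are targets.
example_letters = list("BkFbKrF")
perfect_responses = [False, False, False, True, True, False, False]
example_score = score_n_back(example_letters, perfect_responses)
```

Summing this score over the 3 non-training blocks of 36 trials gives the maximum of 108 stated above.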
To assess participants' general capacities of processing musical material, we administered the “Advanced Measures of Music Audiation” (Gordon 1989). This test comprises 30 trials consisting of pairs of musical melodies. The listener judges for each pair whether the melodies are identical or different, either tonally or rhythmically.
Finally, we evaluated all participants' fluid intelligence by Raven's “Advanced Progressive Matrices” (Raven et al. 2003).
All task independent measures were performed during a second appointment about 1 week after the brain imaging experiment. To anticipate any fatigue or training effects, we administered these tests across subjects in a randomized order.
Experimental Procedure (fMRI)
Our stimuli covered a large range of musical styles from baroque to late romanticism, balanced for minor and major tonalities (cf. examples in Supplementary Material). The musically compelling material was specifically composed for our experiment by a professional composer (see Notes). The compositions (n= 30) consisted exclusively of string quartets (for 2 violins, viola and cello), with a mean duration of ∼10 s. On the one hand, it is a very common and well-established musical genre and on the other hand, string quartets facilitate harmonic encoding because each voice is played by a single instrument. Moreover, we avoided a possible “instrument of practice effect” (Shahin et al. 2008) as all our participants were pianists. The music was arranged with the notation software “Sibelius” (Avid Technology, Inc.) and “Logic Pro” (Apple Inc.); instrumental timbres were established using the “Garritan Personal Orchestra” (Garritan).
We implemented carefully voiced harmonic transgressions at musical closure within the tonality of the preceding musical context (in-key) in expressive musical stimuli. Importantly, as such these chords would make perfect sense if the music continued. However, as endings, these chords represent syntactic transgressions according to the western tonal rule system; only the “tonic” in the root position is appropriate. The tonic consists of the first, third, and fifth notes of the scale or the tonality. Differences between the 3 versions of each piece appeared only in the last chord. These terminal chords were either 1) regular endings (R), composed of the first degree or tonic in the root position (I, fundamental or first note of the scale in the bass); 2) subtly transgressed endings (Tsub), composed of the first degree or tonic in first inversion (I6, third note of the scale in the bass); or 3) apparently transgressed endings (Tapp), composed of the fourth degree or “subdominant” in first inversion (IV6). The latter chord is built on the fourth note of the scale; in its first inversion the sixth note of the tonality used in the rest of the piece appears in the bass. R represents the perfect and highly expected ending (cadence) of a piece, whereas Tapp at closure provokes an apparent expectation violation. The subdominant is traditionally used as second from last chord of a cadence. Finally, Tsub exposes a subtle transgression that is much harder to detect, because it contains the same notes as R but arranged in a different order. Figure 1 displays the scores of two musical pieces with their 3 different endings (R, Tsub, Tapp, see Supplementary Stimulus Material for audio examples).
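To make the three terminal chords concrete, the following sketch spells them out as pitch classes. It is an illustration under simplifying assumptions, not a description of the actual stimuli: it uses C major only (the stimuli were balanced for major and minor tonalities), ignores the four-part voicing of the quartets, and the helper `triad` is a hypothetical name.

```python
# Major-scale degrees as semitone offsets from the tonic (degrees 1..7).
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad(degree, inversion=0, tonic=0):
    """Return the note names of the triad on a scale degree, bass note first.

    degree    -- scale degree of the chord root (1 = tonic, 4 = subdominant)
    inversion -- 0 = root position, 1 = first inversion (third in the bass)
    tonic     -- pitch class of the key's tonic (0 = C); major keys only
    """
    # Stack thirds within the scale: root, third, and fifth of the chord.
    members = [(degree - 1 + step) % 7 for step in (0, 2, 4)]
    # Rotate so that the requested chord member sits in the bass.
    members = members[inversion:] + members[:inversion]
    return [NOTE_NAMES[(tonic + MAJOR_SCALE[m]) % 12] for m in members]


endings = {
    "R": triad(1, inversion=0),     # I:   tonic, root position
    "Tsub": triad(1, inversion=1),  # I6:  tonic, first inversion
    "Tapp": triad(4, inversion=1),  # IV6: subdominant, first inversion
}
```

In this C major sketch, R and Tsub contain the same three notes and differ only in the bass note, whereas Tapp shares a single note with R and has the sixth note of the scale in the bass.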
This set-up contrasts with previous research observing musical syntax processing. Most of these studies used chord sequences, not expressive music. Some studies presented highly apparent expectation violations containing out-of-key notes (Patel et al. 1998; Maess et al. 2001; Koelsch et al. 2005). Other studies used more subtle and partially in-key transgressions as well (Tillmann et al. 2003; Koelsch et al. 2007). Investigating listeners' sensitivity for in-key and out-of-key transgressions revealed identical response accuracy (Koelsch et al. 2007). However, with the above outlined manipulations (Tsub, Tapp), we aimed to establish 2 levels of in-key transgressions that would provoke distinct behavioral performances. Our set of expressive stimuli therefore yields a refinement of the musical priming paradigm and challenges the sensitivity for conclusiveness in musical cadences, allowing us to assess its training-related adaptations on the basis of both brain activations and behavioral performance. Finally, the great number and variability of our ecological musical stimuli allow us to draw more general conclusions on music processing.
Using “Adobe Audition” (Adobe Systems), we tagged onsets of terminal chords and used this information to position the stimuli into the scanner paradigm allowing us to keep constant time periods between the onset of last chords and image acquisition (jittering: 4.5, 5.0, 5.5s post-last-chord, Fig. 2).
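The positioning of a stimulus relative to image acquisition amounts to simple arithmetic, sketched below. The sketch rests on assumptions that the text does not state: that the volume of trial k is acquired at (k + 1) × TR with TR = 18.6 s, and that playback of each piece is scheduled backward from that time point. The function name `stimulus_start` is hypothetical.

```python
TR = 18.6                  # repetition time in seconds (sparse temporal sampling)
JITTERS = (4.5, 5.0, 5.5)  # delay in seconds between last-chord onset and acquisition


def stimulus_start(trial_index, last_chord_onset, jitter, tr=TR):
    """Return the playback start time (s) of a piece within the session.

    trial_index      -- 0-based index of the trial
    last_chord_onset -- onset of the terminal chord within the audio file (s)
    jitter           -- desired delay between last-chord onset and acquisition (s)
    Assumes the volume for this trial is acquired at (trial_index + 1) * tr.
    """
    acquisition_time = (trial_index + 1) * tr
    # Work backward: acquisition follows the last chord by `jitter` seconds,
    # and the last chord follows the start of playback by `last_chord_onset`.
    return acquisition_time - jitter - last_chord_onset


# Hypothetical piece whose terminal chord starts 8.0 s into the audio file.
starts = [stimulus_start(0, 8.0, j) for j in JITTERS]
```

Under these assumptions, a larger jitter simply moves the start of playback earlier by the same amount, while the acquisition time stays fixed.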
During the fMRI session, we administered all stimuli with “E-Prime” software (Psychology Software Tools, Inc.), and subjects were instructed to keep their eyes open and focus on a fixation cross. All participants held the response box in their right hand. Pressing the index finger button indicated satisfaction, whereas pressing the middle finger button indicated dissatisfaction.
The main experiment was divided into 2 separate parts, both lasting ∼19 min in total. Stimuli were presented in a pseudo-randomized order. Moreover, to cancel out any sequential effects, half of the subjects were presented with a reversed order of the stimuli. In sum, we presented 108 trials consisting of 30 string quartets with regular endings (R), 30 with subtle transgression at closure (Tsub), 30 with apparent transgression at closure (Tapp), and 18 silence periods (S).
Comprehensive task explanation was given by means of written text before scanning. Participants were instructed to listen carefully to the pieces and respond by means of right hand button presses whether the endings of the pieces sounded satisfactory or not. Once installed in the scanner, participants completed a short training session, identical in all aspects to the main experiment, during which 3 example pieces, one within each experimental condition (see below), were presented. Afterwards, participants could ask questions, and, if necessary, instructions were explained once more.
To evaluate whether the experimental forced choice task revealed group selectivity and to estimate the impact of target salience (Tsub vs. Tapp), we calculated rater sensitivity (d-prime or d′) based on signal detection theory (Macmillan and Creelman 1997) and executed a 2-way ANOVA on the individual d′ scores of N, A, and E for Tsub and Tapp, respectively.
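For readers unfamiliar with signal detection theory, d′ is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below illustrates this for the present task under two assumptions that are ours, not the authors': a hit is a transgressed ending judged unsatisfactory and a false alarm is a regular ending judged unsatisfactory, and extreme rates of 0 or 1 are avoided with a log-linear correction (adding 0.5 to each count and 1 to each total). The correction actually used in the study is not reported.

```python
from statistics import NormalDist


def d_prime(hits, n_signal, false_alarms, n_noise):
    """Return rater sensitivity d' = z(hit rate) - z(false-alarm rate).

    hits         -- transgressed endings judged unsatisfactory
    n_signal     -- number of transgressed endings presented (e.g. 30 Tsub)
    false_alarms -- regular endings judged unsatisfactory
    n_noise      -- number of regular endings presented (e.g. 30 R)
    A log-linear correction (an assumption of this sketch) keeps both rates
    strictly between 0 and 1 so that the z-transform stays finite.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical listener: 24 of 30 transgressions detected, 6 of 30 regular endings rejected.
example_d = d_prime(24, 30, 6, 30)
```

A listener who responds at chance (equal hit and false-alarm rates) obtains d′ = 0, and d′ is computed separately for Tsub and Tapp against the same set of regular endings.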
fMRI Data Acquisition and Analysis
Binaural auditory stimulation was provided via a digital playback system (System: MR-confon mark II [MR confon GmbH]). T2*-weighted echo planar imaging (descending acquisition, echo time [TE] = 30 ms, repetition time [TR] = 18.6 s, flip angle = 90°, field of view [FOV] = 205 × 205 mm, slice thickness = 3.2 mm, voxel size = 3.2 × 3.2 × 3.2 mm, slices per volume = 36, volumes = 56, 2 acquisition sessions) was performed on a 3-T scanner (Siemens TRIO, Erlangen, Germany). A scheme of our experimental paradigm (“silent” fMRI, sparse temporal sampling, single image acquisition) is delineated below in Figure 2. Additionally, we recorded a T1-weighted rapid gradient-echo structural image for each individual (TE = 2.27 ms, TR = 1900 ms, flip angle = 9°, FOV = 256 × 256 mm, slice thickness = 1 mm, inversion time = 900 ms, voxel size = 1 × 1 × 1 mm). The data analysis was performed with statistical parametric mapping software (SPM8, Wellcome Department of Imaging Neuroscience, London, United Kingdom). The preprocessing pipeline comprised spatial realignment of both acquisition sessions, segmentation and normalization of the T1 image to MNI space, normalization of functional images to MNI space by applying individuals' structural normalization parameters, and a smoothing procedure with an 8-mm Gaussian kernel. Silence contrasts were determined as implicit baseline at first-level analysis. Realignment parameters were introduced as regressors to account for head movements. Due to the experimental scan design (Fig. 2), we used a finite impulse response (window length: 2.1 s, first order) as basis function and proceeded with the analysis in an event-related manner. To examine expertise-dependent modulations as a function of musical syntax transgression, we processed for second-level analysis individual contrasts transgression versus regular (subtle: Tsub vs. R and apparent: Tapp vs. R).
This was driven by the aim to isolate brain activation patterns that purely characterize syntax transgression processing. For further group-level analysis, we specified an SPM8 full factorial model via a 3 × 2 ANOVA: Expertise (N [n= 19]/A [n= 20]/E [n= 20]) × Transgression (Tsub vs. R/Tapp vs. R). The main effects (Expertise, Transgression) are reported at the P< 0.0001 level; the interaction effect (“Expertise × Transgression”) is reported at the P< 0.01 level. All maps are corrected for multiple comparisons at cluster level (family-wise error [FWE] corrected, P< 0.05), yielding an extent threshold of 13 voxels per cluster.
In-scanner Behavioral Results
A 2-way ANOVA (Expertise × Transgression) executed on rater sensitivity (d′; Macmillan and Creelman 1997) for both transgressions (Tsub and Tapp; Fig. 1) yielded significant main effects for the factors Expertise and Transgression and a significant interaction Expertise × Transgression (Expertise: F2,56 = 51.4, P < 0.001; Transgression: F1,56 = 210, P < 0.001; Expertise × Transgression: F2,56 = 13.6, P < 0.001). Post hoc t-tests for independent samples (2-tailed, Bonferroni-corrected for multiple comparisons) showed that all groups significantly differed in Tsub sensitivity (N < A < E), whereas N significantly differed from both musician groups in Tapp sensitivity (N < A/E): Tsub: E (mean = 3.0 [SD = 0.9]) versus A (mean = 1.2 [SD = 0.8]), T38= 6.3, P< 0.001; E versus N (mean = 0.4 [SD = 0.7]), T37= 10.1, P< 0.001; A versus N, T37= 3.6, P< 0.01. Tapp: E (mean = 4.2 [SD = 0.6]) versus N (mean = 1.4 [SD = 1.2]), T37= 9.3, P< 0.001; A (mean = 3.5 [SD = 1.2]) versus N, T37= 5.5, P< 0.001; E versus A, T38= 2.4, n.s. Post hoc t-tests for dependent samples (2-tailed, Bonferroni-corrected for multiple comparisons) revealed consistently smaller sensitivity for Tsub than for Tapp within all groups (Tsub vs. Tapp): N: T18= 6.6, P< 0.001, A: T19= 11.2, P< 0.001, E: T19= 6.9, P< 0.001 (within-group effects are not illustrated in Fig. 3).
Performing a 3 × 2 ANOVA (SPM8, full factorial model) comprising the factors Expertise (levels: N, A, E) and Transgression (levels: contrast Tsub vs. R, contrast Tapp vs. R), we found significant activation clusters for both main effects (Expertise, Transgression) and interaction (Expertise × Transgression; Table 2). To illustrate the effects' directionalities, we plotted for each group and both levels of transgression mean β-values of each significant cluster's peak voxel. Beta weights correspond to the fit of the assumed general linear model of the measured brain signal. Consequently, large β-values (either positive or negative) indicate strong cerebral responses (either activation or deactivation) related to the modeled experimental conditions and implicit baseline. Anatomical labels were assigned according to cytoarchitectonic probabilities using the SPM anatomy toolbox (Eickhoff et al. 2007). Additionally, with respect to the identification of premotor subregions, especially regarding the localization of pre-supplementary motor area (pre-SMA), we refer to research based on functional–anatomical assignment and connectivity analyses (Picard and Strick 2001).
|Regions||BA||Voxels||Zmax||Fmax||Coordinates LH||Coordinates RH|
|Transgression × Expertise|
Notes: The main effects of Expertise, Transgression, and Interaction Expertise × Transgression are specified by anatomical labels and assigned Brodmann areas (BA), cluster size (voxel), local peak effects (Z/F value), and the MNI coordinates of the local peak in left (LH) and right (RH) hemispheres, respectively. The main effects are reported by the threshold of P< 0.0001, interaction effects by P< 0.01 (all P-values are FWE-corrected on cluster level).
ACC, anterior cingulum; SMG, supramarginal gyrus; pre-SMA, pre-supplementary motor area; POp, pars opercularis; Prec, Precuneus; PG, postcentral gyrus; INS, insula; MTG, middle temporal gyrus; mSFG, medial superior frontal gyrus.
The main effect of Expertise (Fig. 4, upper panel) revealed a global down modulation of activations in response to harmonic transgressions as a function of increasing musical expertise, irrespective of transgression level. The activation pattern involved exclusively right-hemisphere clusters: Anterior cingulum (ACC, peak effect), SMG, pre-SMA, POp, precuneus (Prec), and postcentral gyrus (PG). The main effect of Transgression (Fig. 4, lower panel) exposed a consistent effect of lower activations in response to apparent transgressions compared with subtle transgressions, irrespective of expertise level. This effect is limited to the right-hemisphere INS (peak effect) and pre-SMA.
The interaction Expertise × Transgression unveiled a complex activation pattern showing that processing of different levels of harmonic transgression is modulated stepwise by the level of musical expertise. Amateur musicians' brain activations were consistently on an intermediate level. This effect manifested in the right MTG (peak effect, Fig. 5 upper panel) that exhibited increasing activations as a function of expertise while processing apparent transgressions (Tapp) and decreasing activations as a function of expertise while processing subtle transgressions (Tsub). Moreover, we found an opposite activation pattern in a frontal network (Fig. 5, lower panel), symmetrically distributed in both hemispheres, comprising both insulae (INS_r, INS_l), bilateral POp, and the right medial superior frontal gyrus (mSFG).
Additional Behavioral Measures
In a 3-back working memory task on visual material (letters), E significantly outperformed N (Fig. 6A). Group comparisons were performed by t-tests for independent samples (1-tailed, Bonferroni-corrected for multiple comparisons): E (mean = 84.4 [SD = 7.1]) versus A (mean = 75.8 [SD = 19]): T38 = 1.9, n.s.; E versus N (mean = 72.1 [SD = 19.7]): T38 = 2.6, P< 0.05. We found no significant difference between A and N. A 1-way ANOVA for the factor Expertise did not reach significance (F2,56= 2.9, P= 0.06). Moreover, we tested the predictability of n-back performance by training intensity between the ages of 18–25 years (Table 1). This certainly appears to be the most important period in life with respect to the dissociation between amateur and professional musicianship. We performed regression analyses separately for A and E (Fig. 6B); the best fits were achieved by quadratic models (A: R2= 0.131, β = 0.369, T16= 1.186, P = 0.255; E: R2= 0.292, β = 0.599, T19= 2.515, P = 0.022).
Analyses of the results of Gordon's test (Fig. 6C) for musical aptitude revealed that E significantly outperformed A and N for both subtests (rhythm, tonal). Within-group comparisons showed that both non-professional groups (N/A) performed significantly better at the rhythm subtest than at the tonal subtest, which was not the case for the professionals (E). Accordingly, a 2-way ANOVA unveiled significant main effects for the factors Expertise and Subtest, as well as a significant interaction Expertise × Subtest (Expertise: F2,56= 10.6, P< 0.001; Subtest: F1,56= 13.5, P< 0.001; Expertise × Subtest: F2,56= 4.6, P< 0.05). Between-group comparisons were performed by t-tests for independent samples (1-tailed, Bonferroni-corrected for multiple comparisons, on Gordon's test maximum tonal/rhythm score 40, Fig. 6C): E (mean = 65.6 [SD = 8]) versus A (mean = 58.5 [SD = 9.7]): T38 = 2.5, P< 0.05; E versus N (mean = 55 [SD = 6.6]): T38= 4.1, P< 0.001. We found no significant performance differences between A and N. To examine within-group performance differences between subtests (tonal vs. rhythm), we carried out paired t-tests (2-tailed): E: no significant difference, A: T19= 4.7, P< 0.001; N: T18= 4.7, P< 0.05. Thus, Gordon's test appears not to be selective for an intermediate level of expertise, namely musical amateurship. Applying the Edinburgh Inventory (Oldfield 1971), all subjects were tested positively for right-handedness. Controlling for fluid intelligence by Raven's Advanced Progressive Matrices (Raven et al. 2003) revealed no significant group differences.
The cardinal result of this study resides in gradual changes in behavior and brain responses relative to musical syntax processing as a function of musical expertise level. We were able to demonstrate for the first time that degree of training intensity induces stepwise changes in brain functioning and behavioral performances.
Behavioral Differences Between Non-Musicians, Amateurs, and Experts
Our in-scanner transgression detection task successfully separated participants' performances according to their level of musical expertise. The distribution of individual d′ (Fig. 3, upper part) considerably overlaps such that the best non-musician's performance reaches the accuracy of the lowest expert's performance, whereas amateurs are well situated in between. This pattern accounts for the assumption that non-musicians possess a certain amount of implicit knowledge about musical structures (Koelsch et al. 2005; Bigand and Poulin-Charronnat 2006). However, post hoc t-tests revealed that d′ scores for Tsub perfectly dissociated all groups hierarchically, whereas for Tapp amateurs and expert musicians did not significantly differ in performance (Fig. 3, lower part). These analyses demonstrate on the one hand that amateur musicians perfectly detect relatively salient transgressions at musical closure consisting of an in-key chord other than the tonic (Tapp). On the other hand, even though some amateurs show remarkable sensitivity for incorrectly built tonics (Tsub), they fall far short of experts' accuracy. Thus, in contrast to the results of “Gordon's test”, which did not dissociate non-musicians from amateurs (Fig. 6C), our behavioral in-scanner task discriminated between musically naïve subjects and musically experienced subjects in both conditions Tsub and Tapp, and moreover, between all 3 levels of musical expertise in condition Tsub (Fig. 3).
The 3-back letter task revealed that E significantly outperformed N, whereas A performed on an intermediate level that did not statistically differ from either N or E (Fig. 6A). Furthermore, regression analyses (Fig. 6B) disclosed that training intensity (hours per week [h/w]) within the most important period for professional musical formation (18–25 years) significantly predicts visual working memory performance. Comparing musicians with non-musicians, previous research reported that musicians show advantages in tonal (Schulze et al. 2011) and verbal auditory working memory, but not in visual working memory tasks (Chan et al. 1998). To the best of our knowledge, these findings represent the first report of visual working memory advantages as a function of musical expertise outside the auditory domain. Taking into account the linkage between musical training intensity and visual working memory performance, we assume that training induced cross-modal plasticity can be attributed to extensive multisensory integration during longtime deliberate practice typical for professional musical behavior (Krampe and Ericsson 1996; Wan and Schlaug 2010; Grahn et al. 2011; Lee and Noppeney 2011). Furthermore, this finding may be attributed to an alteration of modality independent executive control (Baddeley 2003) with putative neural correlates in ventral premotor cortex and ACC (D'Esposito et al. 1995; Osaka et al. 2004). The origins of these advantages may reside in the high attentional demands of on-stage performance.
Brain Responses to Syntax Transgressions are Modulated by the Level of Musical Expertise and Transgression Strength
Studies on structural connectivity support our findings that reported temporal and ventrolateral frontal areas are tied together within a cerebral network responsible for auditory pattern recognition requiring selective attention (Frey et al. 2008; Saur et al. 2008).
In line with earlier findings (61 −24 9, appended coordinates (MNI) refer to peak voxels in discussed activation clusters; Steinbeis and Koelsch 2008), the Expertise × Transgression interaction pattern in right MTG (Fig. 5, upper part) most likely reflects the decoding of terminal chords' frequency distribution relative to the previous musical context (“sound to meaning mapping”). Based on statistical learning, a tonic chord in root position at musical closure is highly expected; both transgressions used here violate this expectation, Tapp much more strongly than Tsub. Our data reveal that such functioning is modulated by the level of musical training and suggest an expert specific signature in neural encoding of harmonic transgressions in music as a function of their perceptual salience (Tapp being more salient than Tsub). This finding reveals in particular, relative to perfect cadences (R), that increasing musical expertise enhances the MTG responsiveness for apparent transgressions (Tapp), but dims the responsiveness for subtle transgressions (Tsub). We assume that this interaction pattern reflects the degree of harmonic knowledge represented by our 3 levels of musical expertise: Harmonically speaking, the subtle transgression (Tsub = first inversion of the fundamental chord, I6) is very similar to the pattern stored (by experts) and known as the perfect cadence (R), whereas the apparent transgression (Tapp = first inversion of the fourth degree, IV6) is very deviant compared with the perfect cadence (although in-key). In summary, the ability of MTG to map sound to meaning is highly accurate, even for harmonic in-key phrasings (Tsub vs. R) and moreover, the level of musical expertise progressively modulates this feature.
An opposite interaction pattern emerged in bilateral ventrolateral premotor cortices (POp), INS, and mSFG (Fig. 5, lower part). Such a down-modulation with expertise for salient transgressions (Tapp) suggests increased efficiency in the processing of these targets, which were easily detected by all pianists. The up-modulation for the inconspicuous (Tsub) transgressions suggests higher task demands for those subjects who are sensitive to these transgressions. The role of the POp (bilaterally, with right-hemisphere preponderance) in the processing of musical syntax is well established (50 6 14, Maess et al. 2001; 52 9 11, Koelsch et al. 2005; 43 15 3, Tillmann et al. 2006), and the POp has been found to respond to deviant musical events in combination with anterior INS (29 23 11/−32 20 15, Koelsch, Gunter, et al. 2002; 34 22 5, Tillmann et al. 2003). Within the context of auditory perception, POp functionality is commonly understood as decoding sequential auditory information (Koelsch, Gunter, et al. 2002; Tillmann et al. 2003, 2006; Koelsch et al. 2005). However, given the SPM contrasts introduced in the factorial design (Tsub/Tapp vs. R), we suggest that the present findings allow us to narrow down the POp functionality to a monitoring unit that is highly sensitive to targets within sequential patterns. This supports the assumption by Janata et al. (2002), who considered POp to contribute to a network involved in general cognitive utilities such as top-down attention and working memory.
The anterior INS cortices are known to be involved in various aspects of auditory processing (review by Bamiou et al. 2003), such as discrimination of language pitch patterns (Wong et al. 2004) and rhythm processing (Platel et al. 1997). The latter involvement was recently specified more precisely as working memory for rhythm processing (Jerde et al. 2011). Additionally, melody perception (Wehrum et al. 2011) and the evaluation of emotional valence in music (Trost et al. 2011) have also been associated with the anterior INS. With respect to the contrasts introduced here in the factorial design (Tsub/Tapp vs. R), our results specify the anterior INS responsiveness more closely: It parallels POp functionality, namely target detection under high attentional demands within the context of sequentially presented stimulus material.
Taking into account the third area that contributes to the network identified by the interaction effect (mSFG), it seems plausible that the present superior/prefrontal ensemble exceeds auditory domain-specific processing. The fact that in visual tasks mSFG activation is strongly driven by attentional task load (−12 9 51, Vickery and Jiang 2009) and working memory (discussed by Klingberg 2006) supports this assumption. Increased bilateral anterior INS activation has been observed in situations of decision under uncertainty (30 26 8/−33 22 12, Grinband et al. 2006). Finally, we would like to put forward the integrative function of these areas, which may apply to both the auditory and the visual perceptual domain (Downar et al. 2002; Eckert et al. 2009; Sterzer and Kleinschmidt 2010) and reflect task- or stimulus-driven perceptual demand.
A different interpretation of the finding discussed here, that musical training induces a progressive modulation of the functional couple POp/INS, rests on the fact that both areas contribute to what is understood as “Broca's area”. This is not least because one of Broca's patients, Leborgne, suffered from a posterior inferior frontal lesion that included, among other areas, POp and anterior INS (Dronkers et al. 2007). The role of the POp in music processing is often linked to the concept of the mirror-neuron system: A population of neurons located in monkeys' Broca homolog (area F5) has been found to be specifically responsive to action-related sounds (Kohler et al. 2002). Following this, it is assumed that musical training strengthens sensory-motor integration, which yields “mirrored” motor activation patterns evoked by associated auditory percepts (Lahav et al. 2007; Zatorre et al. 2007). Accordingly, important longitudinal data revealed that even musical novices show co-activations of auditory and motor areas when listening to previously practiced piano music (Bangert and Altenmüller 2003). However, in the present research we aimed to rule out this linkage by presenting unknown musical stimulus material played on instruments (violin, viola, and violoncello) not identical to the listeners' instrument (piano).
Subtle Transgressions Yield Stronger Activations
Comprising 2 clusters (right anterior INS and pre-SMA), the main effect of Transgression yielded stronger activations in response to Tsub compared with Tapp, irrespective of expertise level. At first glance, it seems counterintuitive that higher activations follow subtle rather than apparent transgressions. However, this finding is perfectly coherent if explained in terms of perceptual decision-making. Our task forced listeners to make categorical decisions, at different levels of certainty, about whether musical pieces ended correctly or incorrectly. All subjects, independent of musical expertise level, consistently manifested lower sensitivity (d′) for Tsub than for Tapp (see In-scanner Behavioral Results). Thus, all subjects coped with higher uncertainty in detecting the inconspicuous Tsub compared with the salient Tapp stimuli. In line with this, categorical decision-making under increased uncertainty in the visual domain has been associated with higher activations in various regions including pre-SMA (4 8 60, Deary et al. 2004) and anterior INS (36 26 8, Grinband et al. 2006). Moreover, in this context pre-SMA (Deary et al. 2004) reactivity is suggested to be modality independent: Increased pre-SMA activations manifested as a consequence of response selection under uncertainty in the context of both visual and auditory stimuli (−2 8 50/6 12 54, Sakai et al. 2000).
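To make the notion of behavioral sensitivity concrete, the following is a minimal sketch of how d′ can be computed from response counts. It assumes the standard equal-variance signal detection definition, d′ = z(hit rate) − z(false-alarm rate), together with a log-linear correction for extreme rates; this section does not state which formula or correction was actually applied, and the function name and the counts in the usage note are purely illustrative, not data from the study.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Equal-variance sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (0.5 added to each cell) is applied
    so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one listener (not taken from the study):
# a salient transgression is detected more often than a subtle one,
# with the same false-alarm counts on regular endings.
d_salient = d_prime(hits=18, misses=2, false_alarms=2, correct_rejections=18)
d_subtle = d_prime(hits=12, misses=8, false_alarms=2, correct_rejections=18)
```

With these illustrative counts, `d_salient` exceeds `d_subtle`, which corresponds to the pattern described above: lower sensitivity, and hence higher decision uncertainty, for Tsub than for Tapp.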
Down-modulation in Specific Brain Networks with Increasing Expertise Irrespective of Transgression Strength
The main effect of Expertise involved a large, consistently right-lateralized brain network (ACC, SMG, pre-SMA, POp, Prec, and PG). First of all, this finding confirms the involvement of certain regions, especially POp, in processing syntactical irregularities in musical material (50 6 14, Maess et al. 2001; 52 9 11, Koelsch et al. 2005; 43 15 3, Tillmann et al. 2006). Moreover, SMG and POp in particular have been identified by fMRI research as expertise-specific regions for musical syntax processing (57 −49 13, Koelsch et al. 2005; 54 −33 27, Wehrum et al. 2011). In this context, the results of Wehrum et al. (2011), who reported a decrease of activation in musicians compared with non-musicians in areas including the SMG during the processing of melodic incongruities, support our findings. As discussed above with respect to the Expertise × Transgression interaction, apart from its role in musical syntax processing, POp is also involved in domain-general cognitive aspects such as attention and working memory. Medial prefrontal regions (i.e. dorsal subdivision of ACC and pre-SMA) appear to be involved in cognitive aspects of error detection (Bush et al. 2000). This was recently confirmed using an attentive visual double dissociation task separating the neural correlates of cognitive and emotional functioning (Mohanty et al. 2007). Accordingly, we found here a stepwise decrease of brain activity as a function of expertise in the same areas, reflecting lower task demands or higher efficiency in transgression detection. Subjects' behavioral performance indirectly supports this, since increasing musical expertise leads to a robust advantage in detecting harmonic transgressions (for d′ values, see Fig. 3). The spatial overlap with a fronto-temporo-parietal network associated with universal attentional sub-functions such as target detection and novelty processing is striking.
The down-modulation of activation with expertise in the Prec, known to be recruited in musical performance, imagining, and listening (Parsons et al. 2005), possibly also reflects greater processing efficiency for musical material. The same effect in the PG may be explained by the fact that in highly automatized actions, like piano playing, sensory feedback is suppressed (Binkofski et al. 2002) to allow smooth virtuoso motor behavior.
In this study, fMRI was used to investigate changes in the cerebral processing of musical syntax as a function of expertise. We found strong evidence for variations in behavior and brain function due to the degree of expertise, transgression strength, and their interactions. The changes in activation level occurred in brain regions linked to universal functions of target detection and novelty processing, which suggests transfer effects of such brain plasticity beyond the auditory domain, especially in the field of attention and memory. Additional independent testing evidenced visual working memory advantages for our musical experts. We conclude that the high attentional and mnesic demands of musical performance may induce widespread developmental advantages that deserve to be studied more extensively in the future.
This work was supported by the Swiss National Science Foundation (100014–125050 to C.E.J.) and is part of a multibrain imaging project entitled “Behavioral, neuro-functional, and neuro-anatomical correlates of experience dependant music perception”.
We are indebted to Andres Posada, Sebastian Rieger, Alexis Hervais-Adelman, and Julien Chanal for assisting in MR data acquisition and for fruitful discussions about fMRI setup and data analysis. We also thank Bruno Pietri for the compelling compositions. Conflict of Interest: None declared.