Musical training has been shown to positively influence linguistic abilities. To follow the developmental dynamics of this transfer effect at the preattentive level, we conducted a longitudinal study over 2 school years with nonmusician children pseudorandomly assigned to music or to painting training. We recorded the mismatch negativity (MMN), a cortical correlate of preattentive mismatch detection, to syllables that differed in vowel frequency, vowel duration, and voice onset time (VOT), using a test–training–retest procedure with 3 testing sessions: before training, after 6 months, and after 12 months of training. While no between-group differences were found before training, the music group alone showed enhanced preattentive processing of syllabic duration and VOT, but not of frequency, as reflected by greater MMN amplitude after 12 months of training. These results demonstrate neuroplasticity in the child brain and suggest that active musical training, rather than innate predispositions for music, yielded the improvements in musically trained children. They also highlight the influence of musical training on duration perception in speech and on the development of phonological representations in normally developing children. Finally, they support the importance of music-based training programs for children's education and open new remediation strategies for children with language-based learning impairments.
Musical training has been shown to enhance preattentive auditory processing when participants, adults or children, passively listen to sound sequences (see Kraus and Chandrasekaran 2010 for review). Analyses of auditory brainstem responses (ABRs; Jewett et al. 1970) and of the mismatch negativity (MMN; Näätänen et al. 1978), a cortical correlate of preattentive mismatch detection (e.g., Woldorff and Hillyard 1991; Näätänen et al. 1993), have revealed more robust ABRs, as well as larger and shorter-latency MMNs to deviants in pure tones, harmonic tones, and musical sounds, in adult musicians compared with nonmusicians (e.g., Tervaniemi et al. 2005; Musacchia et al. 2007). Intriguingly, the positive influence of musical training has also been shown to extend to the processing of speech sounds: both ABRs and MMNs to speech or speech-like stimuli occur earlier and/or are larger with greater musical expertise (Musacchia et al. 2007, 2008; Wong et al. 2007; Bidelman and Krishnan 2009; Chandrasekaran et al. 2009). Although no definitive explanation of the musicians' advantage has yet been provided, several interpretations have been proposed in terms of common processing of acoustic features shared by music and speech and of transfer of training from music to language, possibly as a result of short- and long-term plasticity induced by corticofugal mechanisms (Kraus and Chandrasekaran 2010; Besson et al. 2011).
To our knowledge, no studies have yet examined the impact of musical training on ABRs to speech stimuli in children. However, Strait et al. (2011) recently reported that both musical aptitude and reading abilities in 8- to 13-year-old children correlate with subcortical enhancement to the syllable “Da” presented in predictable compared with variable sequences (i.e., 100% “Da” vs. 13% “Da” intermixed with 7 other syllables). At the cortical level, Milovanov et al. (2009) reported enhanced MMNs to speech duration deviants in 10- to 12-year-old children with high musical aptitude and pronunciation skills compared with children who lacked these skills. In a recent study (Chobert et al. 2011), we compared 9-year-old musician children, with an average of 4 years of musical training, with nonmusician children using a “multi-feature” MMN paradigm (Näätänen et al. 2004). The syllable “Ba” served as the standard, with deviants either close to or far from the standard (small and large deviants) on 3 dimensions: vowel frequency, vowel duration, and voice onset time (VOT). VOT is a phonological parameter acoustically defined as the interval between the noise burst produced at consonant release and the onset of the waveform periodicity associated with vocal cord vibration (Lisker and Abramson 1967). Changes in VOT allow listeners to perceive stop consonants as voiced (e.g., /b/) or voiceless (e.g., /p/). While no between-group differences were found for frequency deviants, MMNs were larger in musician than in nonmusician children for both large and small duration deviants. Moreover, a deviance-size effect (i.e., larger MMNs to large than to small deviants) was found for VOT deviants in musician children but not in nonmusician children. The results of this cross-sectional study therefore suggest that musician children are more sensitive than nonmusicians to acoustic (duration) and phonological (VOT) cues that are known to be of primary importance for speech perception.
However, as correlation does not imply causality, the only way to determine whether these previous findings are causally linked to musical training is to conduct a longitudinal study with nonmusician children (e.g., Schellenberg 2004). This was the aim of the present study. Only a few authors have used a longitudinal design in children to examine the influence of musical training on the processing of musical sounds (e.g., Fujioka et al. 2006; Hyde et al. 2009). For instance, Hyde et al. (2009) showed that 15 months of musical training in 6-year-old children increased performance in auditory discrimination tests (melodic and rhythmic) as well as in a finger motor sequencing test. Correlated structural changes were found in brain structures known to be important for instrumental music performance and auditory processing in adults (i.e., primary motor cortex, right auditory cortex, and corpus callosum; Schneider et al. 2002; Gaser and Schlaug 2003; Schlaug et al. 2005). Even fewer studies have examined transfer effects from musical training to other cognitive abilities, such as general intelligence and speech segmentation in children (Schellenberg 2004; Moreno et al. 2009; François et al. 2013) and social behavior in infants (Trainor et al. 2012). For instance, Moreno et al. (2009) trained 8-year-old nonmusician children with music or painting for 6 months and found enhanced pitch discrimination for both musical sounds and words in sentence contexts in the musically trained children only. This increased pitch sensitivity was reflected by an increase in the amplitude of the N3 and P2/P3 components of the event-related brain potentials (ERPs). Most interestingly, and in line with previous studies showing that musical aptitude is positively correlated with phonological abilities (e.g., Anvari et al. 2002; Overy et al. 2003; Slevc and Miyake 2006), the results also revealed a benefit of musical training on the reading of phonologically complex words. Finally, Moreno et al. (2011) recently tested a large number of children (64) and found improved verbal ability in 90% of the children after only 20 days of musical training but not after visual arts training. Thus, evidence is accumulating that musical training positively influences a range of perceptual and cognitive processes.
The specific aim of the present longitudinal experiment was to test the influence of musical training on preattentive speech perception. To this end, children were pseudorandomly assigned to music or painting training programs on the basis of their results on several tests administered before training (at the first testing session, T0). To examine the developmental dynamics of the training effects, children were tested again after 6 months (T1) and after 12 months (T2) of training. Based on the literature reviewed above, specific hypotheses were tested for the 3 types of deviants used in the experiment. For frequency deviants, Chobert et al. (2011) found no clear MMNs, possibly because the deviants were too close to the standard to be preattentively detected (deviance size: 21 Hz for large deviants and 6 Hz for small deviants). We therefore increased the deviance size for both large (51 Hz) and small (14 Hz) deviants. For duration deviants, based on the results of Milovanov et al. (2009) and Chobert et al. (2011) reviewed above, we predicted larger MMNs after 6 and/or 12 months of training than before training in the musically trained children but not in the children trained with painting.
Finally, and of most interest, are the predictions for VOT deviants. In French, consonants with VOT values shorter than the phonemic boundary (around 0 ms) tend to be classified as voiced (negative VOT, around −100 ms), and consonants with VOT values equal to or longer than 0 ms tend to be classified as voiceless (positive VOT, around +30 ms; Serniclaes 1987). VOT perception has been examined with the MMN in adults (e.g., Sharma and Dorman 1999; Phillips et al. 2000), and results typically showed smaller MMNs for within-category (e.g., different tokens of “Ba”) than for across-category consonant changes (e.g., from “Ba” to “Pa”), even though the size of the acoustic change was the same in both cases. For instance, an across-category change from 30 to 50 ms VOT elicited an MMN in American English adults, whereas a within-category change from 60 to 80 ms VOT did not (Sharma and Dorman 1999).
The standard “Ba” had a VOT of −70 ms (Ba−70 ms), and the VOT deviants were taken from a “Ba”–“Pa” continuum, with the small deviant located in the middle (Ba−40 ms) and the large deviant at the extreme of the continuum (Ba0 ms), very close to “Pa”. Based on the results above and on previous findings in musician children (Chobert et al. 2011), we hypothesized that after 6 and/or 12 months of musical training, children should perceive large deviants as across-category changes and small deviants as within-category changes. Consequently, they should show larger MMNs to large than to small deviants. No such deviance-size effect (or a smaller one) was expected in the painting group.
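The logic of this prediction can be illustrated with a toy categorization rule. The Python sketch below is a hypothetical illustration, not the authors' analysis: it assumes an idealized, hard category boundary at exactly 0 ms VOT, whereas the text places the French boundary only “around 0 ms”, and real identification is graded rather than all-or-none. It computes each deviant's distance from the standard and whether it crosses the boundary.

```python
# Toy illustration (hypothetical): an idealized hard VOT boundary for French
# bilabial stops. ASSUMPTION: the boundary sits exactly at 0 ms; the text only
# says "around 0 ms", and real identification curves are graded.
BOUNDARY_MS = 0.0
STANDARD_VOT_MS = -70.0  # standard "Ba-70 ms"
DEVIANTS = {"small (Ba-40 ms)": -40.0, "large (Ba0 ms)": 0.0}


def category(vot_ms: float) -> str:
    """Voiceless 'Pa' at or above the boundary, voiced 'Ba' below it."""
    return "Pa" if vot_ms >= BOUNDARY_MS else "Ba"


def describe(vot_ms: float) -> tuple:
    """Return (shift from the standard in ms, whether the category changes)."""
    shift = vot_ms - STANDARD_VOT_MS
    crosses = category(vot_ms) != category(STANDARD_VOT_MS)
    return shift, crosses


for name, vot in DEVIANTS.items():
    shift, crosses = describe(vot)
    kind = "across-category" if crosses else "within-category"
    print(f"{name}: shift = {shift:+.0f} ms, {kind}")
# small (Ba-40 ms): shift = +30 ms, within-category
# large (Ba0 ms): shift = +70 ms, across-category
```

Under this simplified rule only the large deviant changes category, which is why a larger MMN to large than to small deviants is read as a signature of categorical perception rather than of acoustic distance alone.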
A total of 37 nonmusician children attending the third grade in 2 elementary schools in Southern France (Aix-en-Provence and Marseille) were enrolled in this experiment, which lasted for 2 school years. Thirteen children were excluded from the final analysis, either because they moved away during the first (5) or the second (3) school year or because of too many artifacts in the electrophysiological recordings (5). The remaining 24 children were native speakers of French with no known deficits, and 19 were right-handed, as determined from a detailed questionnaire that parents filled in prior to the experiment. Children had similar socioeconomic backgrounds (middle-to-low social class), as determined from the parents' professions according to the criteria of the National Institute of Statistics and Economic Studies. Most children were involved in extracurricular activities (mainly sports), but none of the children, and none of their parents, had formal training in music or painting.
Consent was obtained from the inspector of schools, the school directors, the teachers, and the children's parents before the start of the project. This study was conducted in accordance with local norms and guidelines for the protection of human subjects, and parents signed an informed consent form prior to the experiment. They were informed in detail about the procedure (see below) and about the music and painting training. Both types of training were described as challenging, interesting, and rewarding experiences for their children. None of the parents complained that their child followed one type of training rather than the other; rather, they were pleased that their children were given free music and painting lessons at school. Shortly after the start of training, children in the painting group went to an art exhibition and children in the music group went to a concert. At the end of each school year, children from the painting group displayed their artwork at a school exhibition and children from the music group performed a concert. Children were given gifts at the end of each testing session (T0, T1, and T2) to thank them for their participation and to maintain their motivation.
Longitudinal Study: Design and Procedure
To ensure that the groups did not differ before training, children were pseudorandomly assigned to musical training or to painting training (control group) based on age, school level, sex, and socioeconomic background, as well as on results from standardized neuropsychological tests from the WISC-IV (Wechsler 2003), NEPSY (Korkman et al. 1998), and ODEDYS batteries (Jacquier-Roux et al. 2005), administered before training (T0: verbal comprehension, verbal and nonverbal reasoning, working memory, visual and auditory attention, and visuospatial and visuomotor abilities). We also ensured that MMN amplitude did not differ significantly between the two groups before training (T0). The final 2 groups comprised 12 children each: the music group included 3 girls (mean age = 8.3 years, SD = 0.45) and the painting group 4 girls (mean age = 8.2 years, SD = 0.45); the groups did not differ in age (t(22) = 0.97; P = 0.34). Training took place for 6 months per year over 2 school years. The 2 testing sessions T1 and T2 were identical to T0 and took place at the end of the first and second school years.
Two teachers, professionally trained in music and in painting respectively, were specifically hired for this project. Training took place from October to May in 45-min sessions, twice a week during the first school year and once a week during the second school year. Musical training was based on a combination of the Kodály and Orff methods (http://www.iks.hu/; http://www.orff.de/en.html) and included training in rhythm, melody, harmony, and timbre. Painting training was based on the approach developed by Arno Stern (http://www.arnostern.com/) and emphasized the development of visuospatial skills through several components such as light and color, line and perspective, and matter and texture.
MMN Experiment Procedure
MMN was recorded at T0, T1, and T2. Children sat in a comfortable chair 1 m from a computer screen, and the electroencephalogram (EEG) was recorded while they watched a silent subtitled movie on the screen (as in Chobert et al. 2011). Children were told to watch the movie without paying attention to the sounds presented through headphones. Frequency, duration, and VOT deviants, each with 2 levels of deviance size (small and large deviance from the standard), were randomly presented within the auditory sequence with a stimulus onset asynchrony of 600 ms, synchronized with vowel onset. A total of 1200 stimuli were presented, including 432 deviants (72 for each of the 6 deviant types, i.e., a 6% probability per deviant type). All stimuli were presented within a single block lasting 12.2 min. At the end of the experiment, children were asked questions to ensure that they had paid attention to the movie. They were then asked to listen to 20 pseudorandomly presented VOT stimuli (standard syllables and large and small VOT deviants) through headphones and to say aloud, after each stimulus, which syllable they heard (/ba/ or /pa/).
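The composition of the sequence can be checked, and a randomized order generated, with a few lines of Python. This is a sketch under stated assumptions, not the authors' stimulation script: the text says only that deviants were presented randomly, so the constraint used here (no two deviants in a row, a constraint often used in oddball designs) and all names are hypothetical.

```python
import random

N_TOTAL = 1200
N_PER_TYPE = 72
SOA_S = 0.600  # stimulus onset asynchrony (s)
DEVIANT_TYPES = [f"{dim}-{size}" for dim in ("frequency", "duration", "VOT")
                 for size in ("small", "large")]


def make_sequence(seed=None):
    """Randomly interleave 432 deviants with 768 standards.

    ASSUMPTION (hypothetical): no two deviants may be adjacent. The standards
    define n_std + 1 slots; each chosen slot receives exactly one deviant.
    """
    rng = random.Random(seed)
    deviants = [t for t in DEVIANT_TYPES for _ in range(N_PER_TYPE)]
    rng.shuffle(deviants)
    n_std = N_TOTAL - len(deviants)                      # 768 standards
    slots = set(rng.sample(range(n_std + 1), len(deviants)))
    seq, next_deviant = [], iter(deviants)
    for slot in range(n_std + 1):
        if slot in slots:
            seq.append(next(next_deviant))
        if slot < n_std:
            seq.append("standard")
    return seq


seq = make_sequence(seed=1)
print(len(seq), seq.count("standard"))   # 1200 768
print(N_PER_TYPE / N_TOTAL)              # 0.06 (probability of each deviant type)
print(N_TOTAL * SOA_S / 60)              # 12.0 (minutes of stimulation)
```

With 768 standards there are 769 slots around them, so 432 deviants can always be placed without adjacency. The SOA alone accounts for 12.0 min of stimulation, slightly less than the reported 12.2-min block.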
Stimuli were syllables with a consonant–vowel (CV) structure (see Fig. 1). The standard stimulus “Ba” had a fundamental frequency (F0) of 103 Hz, a vowel duration of 208 ms for a total stimulus duration of 278 ms, and a VOT of −70 ms. For frequency deviants, VOT and vowel duration were the same as for the standard, but the F0 of the vowel was increased using the Praat software (Boersma and Weenink 2001). For large deviants, the F0 was increased to 154 Hz (i.e., 51 Hz higher than the standard, a 49% increase) and for small deviants to 117 Hz (i.e., 14 Hz higher than the standard, a 13% increase). For duration deviants, F0 and VOT were the same as for the standard, but vowel duration was shortened using “Adobe Audition” software (Chavez et al. 2003). For large deviants, vowel duration was 128 ms (i.e., 80 ms shorter than the standard, a 38% decrease; total duration of the large deviant = 198 ms) and for small deviants 158 ms (i.e., 50 ms shorter than the standard, a 24% decrease; total duration of the small deviant = 228 ms). For VOT deviants, F0 and vowel duration were the same as for the standard, but VOT changed. Small and large deviants were selected from a “Ba–Pa” continuum comprising 9 sounds. The large deviant was “Ba0 ms” (VOT = 0 ms; i.e., 70 ms shorter than the standard, a 100% decrease) and the small deviant was “Ba−40 ms” (VOT = −40 ms; i.e., 30 ms shorter than the standard, a 42% decrease).
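The deviance sizes quoted above follow directly from the stimulus parameters. The small Python check below recomputes them from values stated in the text; the only assumption is the decomposition of total duration into prevoicing plus vowel, inferred from 278 = 70 + 208 ms.

```python
# Deviance size of each deviant relative to the standard "Ba", using the
# stimulus values given in the Methods text.
STANDARD = {"F0_Hz": 103.0, "vowel_ms": 208.0, "VOT_ms": -70.0}
DEVIANTS = {
    ("F0_Hz", "large"): 154.0, ("F0_Hz", "small"): 117.0,
    ("vowel_ms", "large"): 128.0, ("vowel_ms", "small"): 158.0,
    ("VOT_ms", "large"): 0.0, ("VOT_ms", "small"): -40.0,
}


def deviance(feature: str, value: float) -> tuple:
    """Return (signed change, percent change relative to |standard value|)."""
    ref = STANDARD[feature]
    delta = value - ref
    return delta, 100.0 * abs(delta) / abs(ref)


def total_duration_ms(vowel_ms: float, vot_ms: float = STANDARD["VOT_ms"]) -> float:
    """ASSUMPTION: total = prevoicing (|VOT|) + vowel, inferred from 278 = 70 + 208."""
    return abs(vot_ms) + vowel_ms


for (feature, size), value in DEVIANTS.items():
    delta, pct = deviance(feature, value)
    print(f"{feature:<8} {size:<5} change = {delta:+6.1f}  ({pct:.1f}%)")

print(total_duration_ms(208.0), total_duration_ms(128.0), total_duration_ms(158.0))
# 278.0 198.0 228.0
```

For VOT the signed change is positive because VOT moves from −70 ms toward 0 ms; the text describes the same change as a shortening of prevoicing. The whole-number percentages in the text (49%, 13%, 38%, 24%, 100%, 42%) match these values truncated rather than rounded (e.g., 14/103 ≈ 13.6%, 30/70 ≈ 42.9%).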
ERP Recording and Processing
The EEG was continuously recorded at a sampling rate of 512 Hz using a Biosemi amplifier system (Amsterdam, BioSemi Active 2) from 32 active Ag/AgCl electrodes mounted on a child-sized elastic cap (Biosemi Pintype) at standard positions of the International 10/20 System (Jasper 1958). Electrode impedance was kept below 5 kΩ. The electro-oculogram (EOG) was recorded from flat-type active electrodes placed 1 cm to the left and right of the external canthi and from an electrode beneath the right eye. Three additional electrodes were placed on the left and right mastoids and on the nose. Data were re-referenced off-line to the average of the left and right mastoids and, separately, to the nose, and were band-pass filtered between 1 and 30 Hz (12 dB/oct; as recommended by Kujala et al. 2007). EEG data were analyzed using the Brain Vision Analyzer software (Version 01/04/2002; Brain Products GmbH). Recordings were segmented into 700-ms epochs (from −100 ms to 600 ms after stimulus onset). Epochs with electrical activity exceeding baseline activity by 60 µV were considered artifacts and were automatically rejected from further processing (around 10% of epochs). Further analyses showed that the percentage of rejected trials did not differ between the two groups or across the 3 sessions.
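The epoching and rejection steps can be sketched in plain Python. This is an illustrative reconstruction, not the Brain Vision Analyzer procedure the authors used: it handles a single channel, assumes the 1–30 Hz filtering has already been applied, and reads “exceeding baseline activity by 60 µV” as any sample departing from the pre-stimulus mean by more than 60 µV. At 512 Hz, −100 ms and +600 ms round to 51 and 307 samples, so each epoch spans 358 samples.

```python
# Minimal single-channel sketch of epoching and amplitude-threshold rejection.
# ASSUMPTIONS: one channel, signal in µV as a plain list, onsets given as
# sample indices; the 1-30 Hz band-pass filter is assumed to be applied already.
from typing import List, Sequence, Tuple

FS = 512                       # sampling rate (Hz)
TMIN_S, TMAX_S = -0.100, 0.600
THRESHOLD_UV = 60.0            # maximum allowed deviation from the baseline
N_PRE = -round(TMIN_S * FS)    # 51 baseline samples
N_POST = round(TMAX_S * FS)    # 307 post-onset samples


def epoch(signal: Sequence[float], onset: int) -> List[float]:
    """Cut the -100 to +600 ms window around one stimulus onset."""
    return list(signal[onset - N_PRE:onset + N_POST])


def baseline_correct(ep: List[float]) -> List[float]:
    """Subtract the mean of the pre-stimulus (-100 to 0 ms) interval."""
    base = sum(ep[:N_PRE]) / N_PRE
    return [x - base for x in ep]


def is_artifact(ep: List[float]) -> bool:
    """True if any baseline-corrected sample exceeds +/-60 µV.

    ASSUMPTION: one reading of "exceeding baseline activity by 60 µV".
    """
    return any(abs(x) > THRESHOLD_UV for x in baseline_correct(ep))


def clean_epochs(signal: Sequence[float],
                 onsets: Sequence[int]) -> Tuple[List[List[float]], float]:
    """Return the kept baseline-corrected epochs and the rejection rate."""
    epochs = [epoch(signal, o) for o in onsets]
    kept = [baseline_correct(e) for e in epochs if not is_artifact(e)]
    return kept, 1 - len(kept) / len(epochs)
```

In a multichannel recording the same test would typically be applied to every channel, and the epoch rejected if any channel exceeds the threshold.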
Repeated-measures multivariate analyses of variance (MANOVAs) were used to analyze data from the various neuropsychological and speech tests. They included group (music vs. painting) as a between-subjects factor and session (T0 vs. T1 vs. T2) and test as within-subject factors.
For the syllable identification test, the percentage of “Ba” identifications was computed for each stimulus and each child. Three-way repeated-measures ANOVAs were conducted with group (music vs. painting) as a between-subject factor and session (T0 vs. T1 vs. T2) and VOT stimulus (Ba−70 ms vs. Ba−40 ms vs. Ba0 ms) as within-subject factors.
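The dependent measure for this test can be computed per child as sketched below. The data format, the stimulus labels, and the 8/6/6 split of the 20 trials are hypothetical, since the text does not say how often each stimulus was presented. The resulting per-child, per-session percentages would then enter the group × session × stimulus ANOVA.

```python
# Percentage of "ba" responses per stimulus for one child.
# ASSUMPTION: responses are logged as (stimulus_label, response) pairs with
# response in {"ba", "pa"}; the labels and trial counts below are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def percent_ba(trials: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    n_total: Dict[str, int] = defaultdict(int)
    n_ba: Dict[str, int] = defaultdict(int)
    for stimulus, response in trials:
        n_total[stimulus] += 1
        if response == "ba":
            n_ba[stimulus] += 1
    return {s: 100.0 * n_ba[s] / n_total[s] for s in n_total}


# Hypothetical child: 20 trials split 8/6/6 across the three VOT stimuli.
child = ([("Ba-70", "ba")] * 8
         + [("Ba-40", "ba")] * 5 + [("Ba-40", "pa")]
         + [("Ba0", "pa")] * 5 + [("Ba0", "ba")])
print(percent_ba(child))
```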
Nose-referenced averages were used to verify the typical MMN polarity inversion between Fz/Cz and the mastoid electrodes (Näätänen et al. 2007). However, mastoid-referenced averages were used to compute the difference waveforms (deviant minus standard) and to quantify MMN amplitude, because they typically have a better signal-to-noise ratio than nose-referenced averages (Schröger and Wolff 1998; Kujala et al. 2007). The MMN was identified at Fz as the most negative peak of the grand-average difference waveform in each condition. Mean amplitudes were then measured for each participant and each deviant in 75-ms windows centered on this MMN peak.
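The quantification steps just described (difference wave, grand-average peak at Fz, then a 75-ms mean-amplitude window per participant) can be sketched in plain Python. This is a hypothetical reconstruction for a single electrode and a single deviant type, not the authors' code; the 100–300 ms range used to search for the peak is an assumption.

```python
# Sketch of MMN quantification on single-electrode (Fz) averages.
# ASSUMPTIONS: each participant's average waveform is a list of µV values
# sampled at 512 Hz from -100 ms; the 100-300 ms peak-search range is
# illustrative, since the text only says "most negative peak".
from typing import Dict, List, Sequence, Tuple

FS = 512
EPOCH_START_S = -0.100


def to_index(t_s: float) -> int:
    """Sample index of latency t_s (seconds) within the epoch."""
    return round((t_s - EPOCH_START_S) * FS)


def difference_wave(dev: Sequence[float], std: Sequence[float]) -> List[float]:
    """Deviant-minus-standard difference waveform."""
    return [d - s for d, s in zip(dev, std)]


def grand_average(waves: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean across participants."""
    return [sum(col) / len(col) for col in zip(*waves)]


def peak_latency(grand: Sequence[float],
                 search: Tuple[float, float] = (0.100, 0.300)) -> float:
    """Latency (s) of the most negative grand-average sample in the search range."""
    lo, hi = to_index(search[0]), to_index(search[1])
    i = min(range(lo, hi + 1), key=lambda k: grand[k])
    return EPOCH_START_S + i / FS


def mean_amplitude(wave: Sequence[float], center_s: float,
                   width_s: float = 0.075) -> float:
    """Mean amplitude in a width_s window centered on center_s."""
    seg = wave[to_index(center_s - width_s / 2):to_index(center_s + width_s / 2) + 1]
    return sum(seg) / len(seg)


def quantify(dev: Dict[str, List[float]],
             std: Dict[str, List[float]]) -> Tuple[float, Dict[str, float]]:
    """Peak latency from the grand average, then per-participant mean amplitude."""
    diffs = {p: difference_wave(dev[p], std[p]) for p in dev}
    peak = peak_latency(grand_average(list(diffs.values())))
    return peak, {p: mean_amplitude(w, peak) for p, w in diffs.items()}
```

Centering the window on the grand-average peak rather than on each child's own peak makes the measure less sensitive to noise in individual waveforms, at the cost of a partially misaligned window for children whose MMN peaks earlier or later.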
Five-way repeated-measures ANOVAs were first conducted on MMN amplitude for each dimension separately (frequency, duration, and VOT), with group (music and painting) as a between-subject factor and session (T0, T1, and T2), deviance size (small and large), laterality (left: F3, C3, P3; midline: Fz, Cz, Pz; right: F4, C4, P4), and anterior–posterior locus (frontal, central, and parietal) as within-subject factors. Because MMN effects are larger at frontal sites (Näätänen et al. 2007), 4-way ANOVAs restricted to frontal electrodes were then conducted on MMN amplitude, with group (music and painting) as a between-subject factor and session (T0, T1, and T2), deviance size (small and large), and laterality (left: F3; midline: Fz; right: F4) as within-subject factors. Greenhouse–Geisser corrections were applied when appropriate, and Tukey post hoc tests were conducted to determine the source of significant interactions.
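The Greenhouse–Geisser correction rescales the degrees of freedom of a within-subject effect by an estimated epsilon. A minimal pure-Python sketch of Box's epsilon estimator is shown below; it is the textbook formula for a single within-subject factor, not a reproduction of the statistics software the authors used. For the mixed design used here, the covariance matrix S would typically be the within-group covariance pooled across the music and painting groups.

```python
# Greenhouse-Geisser (Box) epsilon for one within-subject factor with k levels.
from typing import List, Sequence, Tuple


def covariance_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (n - 1 denominator) between the k repeated measures.

    data: one row per participant, one column per level (e.g. T0, T1, T2).
    """
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(k)] for i in range(k)]


def gg_epsilon(S: Sequence[Sequence[float]]) -> float:
    """Box's epsilon: 1 under sphericity, 1/(k - 1) in the worst case."""
    k = len(S)
    grand = sum(map(sum, S)) / k ** 2              # mean of all entries
    diag = sum(S[i][i] for i in range(k)) / k      # mean variance
    rows = [sum(r) / k for r in S]                 # row means
    num = (k * (diag - grand)) ** 2
    den = (k - 1) * (sum(x * x for r in S for x in r)
                     - 2 * k * sum(m * m for m in rows)
                     + k ** 2 * grand ** 2)
    return num / den


def corrected_df(eps: float, df_effect: int, df_error: int) -> Tuple[float, float]:
    """Degrees of freedom after the Greenhouse-Geisser correction."""
    return eps * df_effect, eps * df_error
```

For example, a session effect tested on (2, 44) degrees of freedom would be evaluated on (2ε, 44ε); with ε = 0.5 this gives (1, 22).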
No between-group differences were found on the neuropsychological tests before training. Performance was higher after 6 and after 12 months of training (T1 and T2) than before training (main effect of session: [F2,44 = 39.49; P < 0.001]). This main effect of session can be explained by repetition effects (i.e., the same tests were presented 2 and 3 times) and/or by maturation effects, as children were 7 months older at T1 and 20 months older at T2 than at T0. Most importantly, the improvement did not differ significantly between the music and painting training groups [main effect of group: F < 1; group by session interaction: F < 1].
MMNs always showed the typical polarity inversion between fronto-central and mastoid electrodes (see Kujala et al. 2007 for a review). Results of the ANOVAs are reported in Table 1 and, when appropriate, results of post hoc Tukey tests are given in the text. MMNs are illustrated in Figures 2–5, and ERPs to standards and deviants in Figure 6. MMN mean values are given in Table 2.
|Separate ANOVAs by dimension|
|Dimension|Effect|df|F|P|
|Frequency|Group × session|2,44|<1||
|Duration|Group × session|2,44|3.59|0.04|
|VOT|Group × session|2,44|3.07|0.05|
Significant effects and interactions are highlighted in bold.
As MMNs were always larger over fronto-central than over parietal regions, we focused analyses on frontal sites (main effect of the anterior–posterior factor: frequency: frontal = −1.64 µV, central = −1.37 µV, parietal = −0.69 µV [F2,44 = 24.17; P < 0.001]; duration: frontal = −0.70 µV, central = −0.97 µV, parietal = −0.66 µV [F2,44 = 2.99; P < 0.05]; VOT: frontal = −2.48 µV, central = −2.06 µV, parietal = −0.94 µV [F2,44 = 70.74; P < 0.001]).
Importantly, MMN amplitudes did not differ between the two groups of children before training (at T0) for any of the 3 deviant types (frequency: music = −1.15 µV, painting = −1.05 µV, F < 1; duration: music = −0.42 µV, painting = −0.49 µV, F < 1; VOT: music = −2.14 µV, painting = −2.46 µV, F < 1). Moreover, independently of session and deviant type, MMN amplitude was always larger for large than for small deviants (main effect of deviance size: frequency: large = −2.46 µV, small = −0.82 µV, P < 0.001; duration: large = −1.09 µV, small = −0.36 µV, P < 0.01; VOT: large = −3.20 µV, small = −1.76 µV, P < 0.001).
Turning to the effect of training, results showed no main effect of group for frequency deviants (see Fig. 3; music = −1.68 µV; painting = −1.59 µV, F < 1) but a significant main effect of session (P < 0.01): MMNs were larger after 12 months of training (T2 = −2.45 µV) than both before training (T0 = −1.10 µV; T2 vs. T0: P < 0.001) and after 6 months of training (T1 = −1.37 µV; T2 vs. T1: P < 0.03). However, the increase in MMN amplitude was not different in the music and in the painting training groups (no group by session interaction, F < 1).
For duration deviants (see Fig. 4), neither the main effect of group nor the main effect of session was significant (music = −0.76 µV vs. painting = −0.69 µV, F < 1; T0 = −0.46 µV vs. T1 = −0.64 µV vs. T2 = −1.07 µV, P > 0.13). In contrast, the group-by-session interaction was significant (P < 0.04). In the music group, MMNs were larger at T2 (−1.54 µV) than at T1 (−0.30 µV; P < 0.05), but not larger at T1 than at T0 (−0.42 µV; P > 0.99). No effect of training on MMN amplitude was found in the painting group (T2 = −0.60 µV, T1 = −0.97 µV, T0 = −0.49 µV; all P > 0.95).
For VOT deviants (see Fig. 5), the main effect of group was again not significant (music = −2.83 µV vs. painting = −2.13 µV; P = 0.11), but the main effect of session and the group-by-session interaction were significant (P < 0.02 and P < 0.05, respectively). In the music group, MMNs were larger at T2 (−3.98 µV) than at T0 (−2.14 µV; T2 vs. T0: P < 0.02) and marginally larger at T2 than at T1 (−2.37 µV; T2 vs. T1: P = 0.07), with no significant difference between T1 and T0 (P > 0.99). These effects were not significant in the painting group (T2 = −2.28 µV, T1 = −1.66 µV, T0 = −2.46 µV; all P > 0.88).
No other main effects or interactions were found.
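Because the two groups are of equal size, the marginal means reported above should be simple averages of the group × session cell means. The short check below, an illustrative addition that uses only numbers given in the text, recomputes them. With both cells and marginals rounded to two decimals, agreement within about 0.01 µV is all that can be expected; for example, the recomputed duration mean for the music group is about −0.753 µV against the reported −0.76 µV.

```python
# Group x session cell means (µV) as reported in the Results text; the music
# T0 duration value (-0.42 µV) comes from the pre-training comparison.
CELLS = {
    "duration": {"music": (-0.42, -0.30, -1.54), "painting": (-0.49, -0.97, -0.60)},
    "VOT": {"music": (-2.14, -2.37, -3.98), "painting": (-2.46, -1.66, -2.28)},
}


def marginals(cells):
    """Session (T0, T1, T2) and group marginal means.

    Unweighted averaging is exact here because both groups have 12 children.
    """
    groups = list(cells)
    sessions = [sum(cells[g][t] for g in groups) / len(groups) for t in range(3)]
    by_group = {g: sum(v) / len(v) for g, v in cells.items()}
    return sessions, by_group


for dim, cells in CELLS.items():
    sessions, by_group = marginals(cells)
    print(dim, [f"{s:.3f}" for s in sessions],
          {g: f"{m:.3f}" for g, m in by_group.items()})
```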
Identification Test with VOT Deviants
The standard “Ba−70 ms” was more often identified as “Ba” (98%) than the small deviant “Ba−40 ms” (89%), which in turn was more often identified as “Ba” than the large deviant “Ba0 ms” (25%; [F2,44 = 302.57; P < 0.001]). Neither the main effect of group (P = 0.19) nor that of session (P = 0.66), nor the group-by-session interaction (P = 0.11), was significant.
This longitudinal study over 2 school years aimed at testing the influence of active musical training on the preattentive processing of syllables in 8- to 10-year-old nonmusician children pseudorandomly assigned to music or to painting training. Results showed significant enhancements of MMN amplitude to duration and VOT deviants after 12 months of musical training (T2) but not after painting training. In contrast, no specific effect of musical training was found for frequency deviants. As the 2 groups of children did not differ before training (at T0) for any of the 3 types of deviant, active musical training rather than preexisting differences or innate predispositions for music most likely yielded the enhancements in MMN amplitude in the musically trained group.
Training Effects on MMN Amplitude
Importantly, the MMNs always showed the typical polarity inversion at mastoid electrodes (using the nose reference) as well as the typical MMN fronto-central distribution (Näätänen et al. 2007).
For frequency deviants, Chobert et al. (2011) reported no significant difference in MMN amplitude between musician children with an average of 4 years of musical training and nonmusician children. In line with this result, MMN amplitude was not larger after 2 years of musical training than after 2 years of painting training (no group-by-session interaction; F < 1). It may be that the acoustic difference between small deviants and the standard was too small to be preattentively detected by either group, whereas the acoustic difference for large deviants was large enough to be detected equally well in both groups.
In contrast, in Chobert et al. (2011), musician children outperformed nonmusicians in an active syllable frequency discrimination task. Moreover, Moreno et al. (2009) showed that 6 months of musical training increased the active discrimination of subtle pitch variations on the final words of sentences. This facilitation was associated with an increase in the amplitude of early positive components (P2/P3), taken to reflect enhanced perceptual sensitivity and/or enhanced auditory attention with musical training (Fujioka et al. 2006). These contrasting results possibly reflect differences in stimulus materials (i.e., pitch variations on isolated syllables vs. on words in sentence context) and/or differences in the mechanisms underlying active and passive frequency discrimination of speech sounds. Similar differences have been reported for nonspeech sounds (e.g., Tervaniemi et al. 2005, 2009; Pakarinen et al. 2007). For instance, Tervaniemi et al. (2005) reported enhanced active discrimination of pitch deviants in sequences of harmonic sounds in adult musicians compared with nonmusicians, with no between-group differences in MMN amplitude. Future experiments should aim at directly comparing the effect of musical training on passive versus active frequency discrimination of speech sounds.
In contrast to frequency deviants, and consistent with the previous results of Chobert et al. (2011), MMNs to both duration and VOT deviants were larger after 12 months of musical training than before training. The finding that these effects were not significant at T1 indicates that more than 6 months of musical training is necessary to improve the preattentive processing of duration and VOT deviants. Importantly, none of these differences (between T2 and T0 or between T1 and T0) was significant in children with 12 months of painting training (group-by-session interaction).
How can we account for these effects? Processing temporal structure in music and speech possibly relies on common processing (Besson 1998; Besson et al. 2011) and draws on the same pool of neural resources (Patel 2003, 2008; Kraus and Chandrasekaran 2010). Direct evidence for these interpretations comes from results showing that Brodmann Area 47 of the left inferior frontal gyrus (IFG) and the temporal cortex of both hemispheres are involved in processing temporal structure in both music and speech (e.g., Levitin and Menon 2003; Brown et al. 2006; Tillmann et al. 2006; Abrams et al. 2011), although possibly differently within each domain (Abrams et al. 2011). Indirect evidence comes from results showing that adult musicians are more sensitive to the metric structure of words than nonmusicians (Marie, Magne et al. 2011). Moreover, long-term exposure to a language such as Finnish, in which vowel duration is linguistically relevant, increases both the MMN amplitude and the active discrimination of duration deviants in sequences of harmonic sounds (Marie et al. 2012). Thus, long-term experience with either music or language (or possibly both) seems to influence the processing of acoustic features, such as duration, that are common to both domains.
However, VOT is a feature specific to language (i.e., not shared with music) that plays an important role in the development of phonological representations. VOT processing is therefore unlikely to draw on the same pool of neural resources as that used for musical sounds. How, then, can we explain that musical training enhances the amplitude of the MMN to VOT deviants? Besson et al. (2011) have argued that enhanced sensitivity to acoustic features that are common to music and speech (e.g., duration) allows musicians to construct more elaborate percepts of the speech signal than nonmusicians. This, in turn, facilitates stages of speech processing that are speech-specific (i.e., not common to music and speech). In other words, by increasing sensitivity to an acoustic parameter that may be common to music and speech (duration), musical training may also increase sensitivity to a phonological parameter (VOT) that is specific to speech but is also based on timing. On this view, the positive influence of musical training on the preattentive processing of VOT reflects transfer effects from music to speech rather than common processing (see also Bidelman et al. 2009; Kraus and Chandrasekaran 2010; Besson et al. 2011).
In line with this view, previous results showed that acoustic processing of rapidly changing auditory patterns, which is a prerequisite for rhythmic processing in music and for hearing speech-specific features such as formant transitions (Bidelman and Krishnan 2010), involves the superior temporal gyrus (STG) bilaterally (Griffiths and Warren 2002; Jäncke et al. 2002; Hickok and Poeppel 2007; Zaehle et al. 2008). Thus, musical training may not only shape the activity of brain structures that are necessary for processing music and speech, such as the brainstem, primary auditory cortex, and STG, but may also influence the activity of other brain regions that are more specifically involved in speech processing and auditory working memory such as the superior temporal sulcus (STS; Hickok and Poeppel 2007; Hickok 2012) and the IFG (Gelfand and Bookheimer 2003). Further experiments should aim at directly testing for changes of activity within, and of connectivity between, different brain structures such as the Heschl gyrus, STG, STS, and IFG with musical training.
As previously reported in the literature, MMNs to frequency and duration deviants were larger for large than for small deviants. This is usually taken to reflect the greater difficulty of processing small deviants, which are closer to the standard than large deviants (e.g., Sams et al. 1985; Tiitinen et al. 1994; Novitski et al. 2004). However, this deviance-size effect may also reflect a higher rate of preattentively detected deviants for large than for small deviants, with MMNs of similar amplitude in both cases (Winkler et al. 1993). The deviance-size effect was also significant for VOT deviants. This finding is in line with previous results showing that within-phonemic-category deviants (i.e., small deviants) typically elicit a small MMN or none at all, whereas across-phonemic-category deviants (i.e., similar to the large deviants here) elicit large MMNs (Dehaene-Lambertz 1997; Dehaene-Lambertz and Baillet 1998; Sharma and Dorman 1999; Phillips et al. 2000). Importantly, the deviance-size effect for VOT deviants parallels the results of the syllable identification test: children most often identified the standard syllable (98%) and the small deviants (89%) as “Ba” (within-category), whereas they identified the large deviants as an across-category “Pa” (only 25% “Ba” identification). However, the significant deviance-size effect for VOT deviants in both groups of children (with no group × session × deviance-size interaction) contrasts with previous results by Chobert et al. (2011), who found a VOT deviance-size effect in musician but not in nonmusician children. Note that the stimuli differed between the 2 experiments, with a larger difference between large and small deviants in the present experiment (40 ms) than in Chobert et al. (2011; 28 ms). Moreover, compared with the stimuli used in Chobert et al. (2011), the large deviant “Ba0 ms” was closer to the French “Pa” and the small deviant “Ba−40 ms” was closer to the standard “Ba−70 ms.” The difference between small and large VOT deviants was therefore larger in the present study, which likely accounts for the discrepancy between the 2 experiments.
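The VOT comparison above rests on simple differences between the stimulus values stated in the text. As an illustrative check, the Python sketch below recomputes them. The standard (−70 ms), small deviant (−40 ms), and large deviant (0 ms) values and the 28 ms between-deviant gap in Chobert et al. (2011) are taken from the text; the names `VOT_MS`, `CHOBERT_2011_DEVIANT_GAP_MS`, and `vot_distances` are hypothetical and introduced only for this example. The sketch covers the arithmetic only and does not model perception or MMN generation.

```python
# VOT values (ms) of the syllables as stated in the text; negative values denote prevoicing.
# Names are hypothetical, chosen only for this illustration.
VOT_MS = {
    "standard": -70,       # "Ba-70 ms"
    "small_deviant": -40,  # "Ba-40 ms"
    "large_deviant": 0,    # "Ba0 ms"
}

# Difference between large and small deviants reported for Chobert et al. (2011).
CHOBERT_2011_DEVIANT_GAP_MS = 28


def vot_distances(vot: dict[str, int]) -> dict[str, int]:
    """Return absolute VOT distances (ms) between the stimulus pairs discussed."""
    std = vot["standard"]
    small = vot["small_deviant"]
    large = vot["large_deviant"]
    return {
        "small_vs_standard": abs(small - std),
        "large_vs_standard": abs(large - std),
        "large_vs_small": abs(large - small),
    }


if __name__ == "__main__":
    distances = vot_distances(VOT_MS)
    for pair, ms in distances.items():
        print(f"{pair}: {ms} ms")
    # How much larger the between-deviant gap is here than in Chobert et al. (2011).
    extra = distances["large_vs_small"] - CHOBERT_2011_DEVIANT_GAP_MS
    print(f"Between-deviant gap exceeds Chobert et al. (2011) by {extra} ms")
```

Run as a script, it reports 30 ms between the small deviant and the standard, 70 ms between the large deviant and the standard, and 40 ms between the two deviants, that is, 12 ms more than the 28 ms gap in Chobert et al. (2011).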
MMNs to frequency and VOT deviants were larger at T2 than at T1 and T0 in both groups of children. As children were 22 months older at T2 than at T0, these enhancements are likely to reflect maturational effects. For instance, Jensen and Neff (1993) have shown that frequency processing becomes mature (i.e., reaches adult levels) around 10 years of age. Repetition effects may also account for the MMN increase at T2, since children were listening to the same sequence of sounds for the third time (but see Uwer and von Suchodoletz 2000). In contrast, MMNs to duration deviants did not seem to be modified by maturation or repetition, a finding in line with previous results showing that duration processing develops very slowly from childhood to adulthood (e.g., Uwer and von Suchodoletz 2000; Smith et al. 2011). Two school years, between 8 and 10 years of age, may not be long enough for maturation to significantly affect the preattentive processing of duration deviants.
Repetition and maturation effects are also likely to account for the improved performance after 6 and 12 months of training in most neuropsychological tests (WISC-IV, NEPSY, and ODEDYS). However, these improvements did not differ significantly between the music and painting training groups. These results are in line with those of the longitudinal study by Hyde et al. (2009), which showed that structural changes can develop in auditory and motor areas after 15 months of musical training with no significant improvement on several neuropsychological tests. Similarly, previous results from our group showed increased amplitude of ERP components (N3, P2/P3) related to perceptual and cognitive processes, with no significant improvement in full-scale IQ (Moreno et al. 2009). These results contrast with the findings of Schellenberg (2004), who reported small but significant improvements in general intelligence after 1 year of musical training, and of Moreno et al. (2011), who reported improvements in verbal intelligence after 20 days of intensive computerized musical training. However, differences in sample size between experiments (twice as many children were tested in the last 2 cited studies as in the others), as well as the fact that we used only a subset of the performance and verbal IQ tests to reduce the length of the experiment, possibly account for these different results.
The present results demonstrate neuroplasticity in children's brains, as reflected by enhanced MMN amplitude at T2 compared with T0. These neuroplastic changes, to both duration and VOT deviants, were found only in nonmusician children trained with music. Since no differences were found at T0 between these children and those trained with painting, the observed effects were most likely due to active musical training rather than to genetic predispositions for music. At first sight, this conclusion may seem at odds with results of cross-sectional studies showing positive correlations between musical aptitudes (which are possibly linked with genetic predispositions for music) and the preattentive processing of syllable duration (Milovanov et al. 2009) or different aspects of phonological processing (e.g., Anvari et al. 2002; Overy et al. 2003; Slevc and Miyake 2006; Marques et al. 2007; Huss et al. 2011). However, these influences are not necessarily mutually exclusive: both musical aptitudes, as reported for instance by Milovanov et al. (2009) and Slevc and Miyake (2006), and musical training, as reported here, can enhance the preattentive processing of duration and the phonological processing of speech sounds.
Nevertheless, by controlling for preexisting between-group differences and by using pseudorandom assignment to music or painting training, we showed that the MMN enhancement to duration and VOT deviants is causally linked to musical training and does not result only from genetic predispositions for music. This conclusion is in line with the view according to which the impact of biological predispositions is “probabilistic and context-dependent” rather than “deterministic” (The Royal Society 2011, p. 17). It is also important to note that, although we used musical training, any consistent, interactive auditory training could also lead to the effects reported here. Further longitudinal experiments are needed to compare the effects of different types of auditory and nonauditory training. Finally, our results are also of interest for understanding the developmental dynamics of musical training effects on speech processing. While no difference in MMN amplitude to duration and VOT deviants was found after 6 months of musical training, differences emerged later, between T1 and T2. Thus, at least 6 months of musical training seem necessary to induce these effects. However, as children were trained for only 45 min twice a week during school year 1 and once a week during school year 2, shorter but more intensive musical training may produce similar effects (Bangert et al. 2001; Lappe et al. 2008). Further experiments are needed to test this interesting possibility.
By increasing our understanding of how musical training influences the preattentive processing of syllables, the building blocks of words, the present results should benefit research-based education programs and help develop new methods to improve the abilities of children with abnormal development (Posner and Rothbart 2005; Schlaug et al. 2005; Tallal and Gaab 2006; Santos et al. 2007; Goswami 2011). Importantly, children with language-based learning impairments often show deficits in the perception and encoding of metrical structure in both music (Huss et al. 2011) and speech (Overy 2000; Tallal and Gaab 2006; Gaab et al. 2007; Abrams et al. 2009; Goswami 2011). For instance, Goswami (2011) reviewed convincing evidence for atypical temporal integration windows for syllabic parsing in dyslexia, possibly resulting from atypical basic auditory processing of rise time, a crucial parameter for segmenting continuous speech into syllables (see also François and Schön 2011). Moreover, we have recently shown impaired preattentive processing of vowel duration and VOT, but not of vowel frequency, in children with dyslexia (Chobert et al. 2012). Thus, if acoustic and phonological processes are strongly linked, musical training, by improving the discrimination of the acoustic features of speech sounds, may help children with developmental dyslexia and children with cochlear implants build better phonological representations. This may also facilitate speech understanding in normal or adverse listening conditions (Song et al. 2012). In turn, by helping children with dyslexia develop more robust phonological representations, musical training may also enhance reading ability (e.g., Ziegler and Goswami 2006). This hypothesis has been tested by including children with dyslexia in the longitudinal study presented here; the results are currently being analyzed.
Finally, if strong links exist between the acoustic and the more abstract levels of language representation, musical training may also enhance children's ability to learn foreign languages, particularly languages in which pitch and duration variations are linguistically relevant, such as tone languages (e.g., Mandarin Chinese, Thai, and many African languages) and quantity languages (e.g., Finnish, Japanese). Results of cross-sectional experiments in adults already support this view (Bialystok and DePape 2009; Bidelman et al. 2009; Marie, Delogu et al. 2011; Marie, Magne et al. 2011; McNealy et al. 2011; Sadakata and Sekiyama 2011; Slevc and Miyake 2006; Wong and Perrachione 2007; Wong et al. 2007).
This research was supported by a grant from the ANR-Neuro (#024-01 to M.B.). At the time the study was conducted, J.C. and C.F. were PhD students supported by the ANR-Neuro (#024-01). J.C. is now a post-doctoral researcher supported by a grant from the Fondation de France (#00015167).
We thank Noël N'Guyen and Daniele Schön for helping us prepare the stimuli, the neuropsychologists, Carine Verse, Amandine Dettori-Campus, Emmanuelle Germain, Morgane Pichou, Coralie Durand, and Sandrine Piron for running part of the experiments and Nia Cason for her attentive reading of the manuscript. We also thank the directors of the 2 schools where the children were tested, Mrs Muriel Gaiarsa and Mr Jean-Jacques Gaubert, the teachers of the schools, as well as all the children who participated in this study and their parents. Conflict of Interest: None declared.