Previous work has shown a relationship between brain anatomy and how quickly adults learn to perceive foreign speech sounds. Faster learners have greater asymmetry (left > right) in parietal lobe white matter (WM) volumes and larger WM volumes of left Heschl's gyrus than slower learners. Here, we tested native French speakers who were previously scanned using high-resolution anatomical magnetic resonance imaging. We asked them to pronounce a Persian consonant that does not exist in French but which can easily be distinguished from French speech sounds, the voiced uvular stop. Two judges scored the goodness of the utterances. Voxel-based morphometry revealed that individuals who more accurately pronounce the foreign sound have higher WM density in the left insula/prefrontal cortex and in the inferior parietal cortices bilaterally compared with poorer producers. Results suggest that WM anatomy in brain regions previously implicated in articulation and phonological working memory, or the size/shape of these or adjacent regions, is in part predictive of the accuracy of speech sound pronunciation.
Individuals differ considerably in how easily they learn to perceive foreign speech sounds (Pruitt and others 1990; Polka 1991; Bradlow and others 1997; Golestani and Zatorre 2004). We previously demonstrated a relationship between brain anatomy and how quickly healthy adults learn to hear non-native speech sounds. Faster phonetic learners showed a greater asymmetry (left > right) in the amount of white matter (WM) in parietal regions compared with slower phonetic learners (Golestani and others 2002). More recently, we scanned a new group of 11 fast and 10 slow phonetic learners using structural magnetic resonance imaging (MRI). Voxel-based morphometry (VBM) suggested a higher WM density in the left Heschl's gyrus (HG) in faster compared with slower learners, and manual segmentation of this structure confirmed that the WM volume of left HG is larger in the former compared with the latter group (Golestani and others 2006). We replicated the finding of a larger volume of left HG in faster compared with slower learners in an independent sample of subjects, those from the original study (Golestani and others 2002). HG includes primary auditory cortex, and these findings suggest that left auditory cortex WM anatomy can in part predict individual differences in the perception of foreign speech sounds.
In the current study, we tested 21 native French speakers on their pronunciation of a foreign speech sound, the Farsi voiced uvular stop /q/ (Farsi is a language spoken by Persians in Iran). This non-native phoneme is not perceptually assimilated with the native voiced velar stop /g/ of French; in other words, subjects could clearly distinguish the non-native sound from ones that they use natively. Subjects' productions were recorded and rated for their accuracy by a native Farsi speaker, and the “production scores”, based on goodness ratings across utterances, were then correlated with brain anatomy using VBM. We predicted that we would find gray matter (GM) and/or WM anatomical correlates of non-native speech sound production in regions thought to be involved in speech articulation or articulatory planning such as the left insula (Dronkers 1996; Wise and others 1999; Riecker and others 2000) and possibly in regions thought to store speech sounds in verbal working memory such as the left temporoparietal cortex (Paulesu and others 1993; Jonides and others 1998; Henson and others 2000; Honey and others 2000).
Materials and Methods
Twenty-one native French speakers participated in the study. These were the same individuals who participated in the previous study on the anatomical correlates of phonetic perception (Golestani and others 2006). They had a relatively homogeneous language background; all had learned a second language in school from the age of 11 to 18 and a third language from the age of 13 to 18. The second and third languages were limited to English, Spanish, and German. None spoke a second or third language proficiently, and none had been regularly exposed to a language other than French before the age of 11. All subjects gave informed written consent to participate in the study, which was approved by the regional ethical committee.
In order to control for individual differences in experience with the foreign sound that participants were going to be required to produce, we selected a non-native speech sound that is rare across languages. In addition, we wanted to ensure that our participants could adequately perceive the non-native sound. We therefore selected one that could easily be perceptually distinguished from native sounds. We chose the Farsi voiced uvular stop /q/, which is not employed in any widely used language.
The following non-native utterances were presented to and then repeated by the participants (see below). The word-initial target phoneme /q/ was produced 3 times in each of 12 contexts, yielding a total of 36 non-native utterances per participant. The 12 different contexts were the following: the sound /q/ was presented in the context of 6 different consonant–vowel (CV) syllables (sound /q/ followed by -a, -o, -e, -i, -u, -A) and in the context of 6 different bisyllabic nonwords (Farsi words) (sound /q/ followed by -azA, -orme, -ese, -ise, -ulum, -Ali). The use of a variety of phonological contexts was motivated by previous behavioral studies that have stressed the importance of using different vowel phonological contexts when testing speech sound production (Williams 1979; Lambacher and others 2005). After each non-native utterance, subjects were presented with and required to repeat the same sound (i.e., CV or nonword) but this time starting with a word-initial native voiced velar stop (e.g., ga, go, ge, gi, etc.). This was done in order to 1) emphasize the difference between the native and non-native sounds and 2) facilitate the task somewhat by alternating the more difficult non-native trials with easy native ones.
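The trial structure described above (12 contexts × 3 repetitions, each non-native item followed by its native counterpart) can be sketched as follows. This is only an illustration: the ASCII letters "q" and "g" are placeholders for the two stops, and the presentation order is an assumption, since the text does not state the order in which contexts were presented.

```python
from itertools import product

# Contexts as transcribed in the text. "q" and "g" are ASCII placeholders
# for the non-native uvular and native velar stops, not phonetic symbols.
CV_RIMES = ["a", "o", "e", "i", "u", "A"]
NONWORD_RIMES = ["azA", "orme", "ese", "ise", "ulum", "Ali"]
REPETITIONS = 3


def build_trials(repetitions=REPETITIONS):
    """Return (non_native, native) utterance pairs.

    The ordering (all 12 contexts, repeated block by block) is an
    assumption made for illustration only.
    """
    trials = []
    for _rep, rime in product(range(repetitions), CV_RIMES + NONWORD_RIMES):
        # Each non-native item is immediately followed by its native twin.
        trials.append(("q" + rime, "g" + rime))
    return trials


trials = build_trials()
print(len(trials), "non-native utterances per participant")
```

Each tuple pairs one non-native trial with the easier native trial that follows it, which mirrors the alternation described in the text.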
Subjects were previously (3–4 months earlier) scanned using high-resolution aMRI. They were tested on their production of the Farsi voiced uvular stop using methods similar to ones previously employed in behavioral studies on speech sound production (Williams 1979; Lambacher and others 2005). A native speaker of Farsi (N.G.) produced the non-native and native utterances one at a time, and the subject was instructed to repeat each utterance while trying to reproduce the sound that they heard to the best of their ability. We required subjects to repeat after a native speaker of Farsi rather than to repeat prerecorded utterances in an attempt to minimize shyness and hesitation on the part of participants. Subjects' utterances were recorded using Praat software (www.praat.org).
A native Farsi speaker (N.G.) listened to the recordings and rated the subjects' utterances. During a first rating phase, she listened to all of the subjects' utterances in order to familiarize herself with them and have an idea of the range in performance across subjects. The rater then listened to the utterances once more, this time providing “goodness ratings” for the production accuracy of the utterances, using a scale ranging from 1 (indicating poor exemplars of the Farsi /q/) to 20 (indicating good exemplars of the Farsi /q/). Each subject was given a production score based on the average goodness rating across all of their non-native utterances.
The reliability of the production scores was assessed by having a second native Farsi speaker judge the utterances. The production scores given by the 2 raters were significantly correlated (Pearson's r12 = 0.71, P < 0.001), providing evidence for interrater reliability.
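As a minimal sketch of the scoring and reliability steps, the snippet below averages goodness ratings into one production score per subject and then computes Pearson's r between two raters. The ratings are made up for four hypothetical subjects (three utterances each, not 36) and do not reproduce the study's data; the function names are ours.

```python
import math
from statistics import mean


def production_score(ratings):
    """Average goodness rating across one subject's non-native utterances."""
    return mean(ratings)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with >= 3 items")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings, for illustration only.
rater1 = {"s1": [18, 17, 19], "s2": [5, 6, 4], "s3": [12, 11, 13], "s4": [9, 8, 10]}
rater2 = {"s1": [16, 15, 17], "s2": [7, 6, 8], "s3": [9, 10, 11], "s4": [10, 9, 8]}

subjects = sorted(rater1)
scores1 = [production_score(rater1[s]) for s in subjects]
scores2 = [production_score(rater2[s]) for s in subjects]
r = pearson_r(scores1, scores2)
print("interrater r =", round(r, 2))
```

A high r indicates that the two raters rank subjects similarly, but it does not show that they use the scale identically, since Pearson's r is insensitive to a constant offset between raters.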
MRI Acquisition and Analysis
Scans were obtained on a 1.5-T Signa Horizon Echospeed MRI scanner (General Electric Medical Systems, Milwaukee, WI). High-resolution anatomical T1-weighted images were acquired in the axial plane using a spoiled gradient echo sequence (128 slices, 1.2 mm thick, number of excitations [repetitions] = 2, repetition time = 10 ms, echo time = 2.2 ms, inversion time = 600 ms, field of view = 22 cm, 0.86 × 0.86 × 1.2 mm voxels).
We used VBM (Ashburner and Friston 2000), an exploratory, whole-brain technique, to search for relationships between brain morphology and phonetic production. This method does not rely on the manual identification of anatomical boundaries and thus does not depend on arbitrary or conventional definitions of particular brain structures. We used the optimized VBM method (Ashburner and Friston 2000; Senjem and others 2005), in which the anatomical images were processed in 3 steps: tissue segmentation, spatial normalization, and smoothing at 4 mm. We correlated production scores of the first native rater with the smoothed GM and WM tissue–classified images on a voxel-by-voxel basis. We used a voxelwise significance threshold of P = 0.001 and a cluster extent threshold of P = 0.05, corrected for multiple comparisons.
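The voxel-by-voxel correlation step can be illustrated with a toy example. The sketch below is not the SPM pipeline used for VBM: it skips segmentation, normalization, smoothing, and the cluster-extent correction, and it runs on two synthetic "voxels" rather than real tissue maps. The threshold of 3.58 is approximately the one-tailed critical t for P = 0.001 at 19 degrees of freedom; whether the original test was one-tailed is an assumption.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def voxelwise_t(maps, scores):
    """Return one t value per voxel; maps[v][s] is subject s at voxel v."""
    n = len(scores)
    t_values = []
    for voxel in maps:
        r = pearson_r(voxel, scores)
        r = max(min(r, 0.999999), -0.999999)  # avoid dividing by zero
        # Convert r to a t statistic with n - 2 degrees of freedom.
        t_values.append(r * math.sqrt((n - 2) / (1.0 - r * r)))
    return t_values


# Synthetic data for 21 hypothetical subjects (not the study's data).
scores = [float(i) for i in range(21)]
wiggle = [0.02 * (-1) ** i for i in range(21)]
maps = [
    [0.40 + 0.01 * s + w for s, w in zip(scores, wiggle)],  # tracks the score
    [0.50 + w for w in wiggle],                             # unrelated to it
]

T_THRESHOLD = 3.58  # approx. one-tailed P = 0.001 at df = 19 (assumed)
t_values = voxelwise_t(maps, scores)
suprathreshold = [v for v, t in enumerate(t_values) if t > T_THRESHOLD]
print("voxels above threshold:", suprathreshold)
```

In a real analysis the same test is repeated at every voxel in the brain, which is why a cluster-extent threshold corrected for multiple comparisons is applied on top of the voxelwise threshold.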
All subjects reported easily hearing the difference between the Farsi voiced uvular and native voiced velar stops. There were considerable individual differences in non-native speech sound production; some individuals produced relatively accurate exemplars of the non-native voiced uvular stop from the very first trials, whereas others did not articulate an accurate exemplar even once out of the 36 non-native trials. Production scores ranged from 2 to 20 out of 20, with a mean of 11.24 and a standard deviation of 5.93 for rater 1, and from 0 to 20, with a mean of 9.52 and a standard deviation of 5.68 for rater 2. We also had learning rate measures for non-native phonetic perception from the same subjects from a different study (Golestani and others 2006), where subjects were trained to perceive the Hindi dental–retroflex contrast. We therefore also examined the relationship between behavioral production and perception scores across subjects. Interestingly, the participants who were the fastest phonetic learners in perception were not necessarily the ones who produced the non-native phonemes most accurately. An independent-samples t-test comparing the production scores of the 11 faster and 10 slower learners in perception was not significant (t19 = −0.9, P > 0.05).
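The group comparison reported above can be sketched as a pooled-variance two-sample t-test, which is consistent with the reported 19 degrees of freedom (11 + 10 − 2). The scores below are invented for illustration and are not the study's data; that the pooled-variance form was the one used is our assumption.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical production scores for the two perception-learning groups.
fast = [12, 9, 15, 7, 18, 10, 5, 14, 11, 8, 13]   # 11 faster learners
slow = [13, 6, 16, 9, 20, 11, 4, 15, 12, 10]      # 10 slower learners

t, df = pooled_t(fast, slow)
print("t =", round(t, 2), "with", df, "degrees of freedom")
```

A small absolute t, as in the reported result, means the group means differ little relative to the within-group spread of production scores.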
VBM was performed to test for linear correlations between phonetic production scores and GM and WM tissue–classified maps. Results revealed higher WM density in the left insula/prefrontal cortex (Talairach coordinates: −29, 29, 10, t = 4.78, P < 0.001; see Fig. 1) and in the inferior parietal cortices bilaterally (Talairach coordinates—left: −34, −38, 41, t = 9.17, P < 0.001; right: 44, −30, 38, t = 5.92, P < 0.001; see Fig. 2) in individuals who produced more accurate exemplars of the non-native speech sounds compared with ones who produced poorer exemplars. There were no significant correlations between GM tissue–classified maps and phonetic production measures.
The same subjects participated in a previous study on anatomical correlates of phonetic perception (Golestani and others 2006) and could thus be separated into 2 groups of fast and slow learners. An additional voxelwise analysis of the WM tissue–classified maps was performed using an analysis of covariance with a categorical “group” variable (“fast” vs. “slow”) and a variable for the production scores. This analysis revealed that the main effect of speech production on brain anatomy in the left insula/prefrontal (Talairach coordinates: −28, 30, 10, t = 4.86, P < 0.001) as well as the left (Talairach coordinates: −34, −38, 41, t = 8.85, P < 0.001) and right (Talairach coordinates: 43, −30, 37, t = 5.63, P < 0.001) inferior parietal cortices exists even when accounting for the effect of speech perception performance. Moreover, the effect of production did not differ significantly between the 2 groups.
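For a single voxel, the analysis of covariance described above amounts to a linear model with an intercept, a categorical group regressor, and the production score as a covariate. The sketch below fits such a model by ordinary least squares on made-up numbers; the function names and data are hypothetical, and the group × production interaction that would test whether the effect differs between groups is not modeled.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ancova(density, group, score):
    """Least-squares fit of density = b0 + b1 * group + b2 * score.

    group is coded 1 for "fast" and 0 for "slow" (an arbitrary coding).
    Returns [b0, b1, b2]; b2 is the production effect adjusted for group.
    """
    rows = [[1.0, float(g), float(s)] for g, s in zip(group, score)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, density)) for i in range(3)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


# Noise-free synthetic voxel, so the fit should recover the coefficients.
group = [1, 1, 1, 0, 0, 0]
score = [2, 10, 18, 4, 12, 20]
density = [0.30 + 0.05 * g + 0.01 * s for g, s in zip(group, score)]
b0, b1, b2 = fit_ancova(density, group, score)
print("adjusted production effect:", round(b2, 4))
```

Because the group regressor absorbs any mean difference between fast and slow perceptual learners, a production coefficient that remains reliably nonzero reflects an effect that is not explained by perception group alone.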
Using VBM, we found a relationship between individual differences in non-native speech sound production and the WM probability density in the left insula/prefrontal cortex and in the inferior parietal lobe bilaterally, suggesting that WM anatomy in brain regions previously implicated in articulation, articulatory planning, speech production, and phonological working memory is in part predictive of the ability to accurately pronounce foreign speech sounds. We found no relationship between GM anatomy and phonetic production. The VBM result could be due to greater WM volumes in these regions in better compared with poorer producers, which could in turn be due to differences in the number of WM fibers and/or to differences in myelination, suggesting differences in anatomical connectivity between these regions across subjects. Alternatively (but not exclusively), the VBM result could be due to differences in the size and/or shape of adjacent prefrontal and parietal gyri and/or sulci. For example, if one examines the coronal image in Figure 1, it can be seen that the region showing a difference in WM density appears to be in the depth of the origin of the horizontal ramus of the Sylvian fissure. It appears to extend medially to the insula and possibly to the frontal operculum. The region above the ramus could be Brodmann's area 45, and those below could be Brodmann's areas 47/12. This interpretation, however, is only speculative, in particular given that sulci do not necessarily demarcate boundaries between different cytoarchitectonic regions. If correct, however, this interpretation could suggest that one or several of these regions is bigger in better than in worse phonetic producers and that a volume difference results in a positional displacement of the horizontal ramus across groups (for a similar interpretation of sulcal displacement results, cf., Golestani and others 2002). 
Note that inverse relationships between GM and WM are typically found in brain regions in which GM and WM tissues are in close proximity and, when found near a sulcus, can be due to a positional displacement of the sulcus between groups or conditions (cf., Golestani and others 2002). This, however, is not always the case because the finding of such an inverse relationship likely also depends on factors such as the anatomical position and shape of the finding, as well as on factors such as sulcal thickness.
Lesion and functional imaging work supports the role of the insula in speech articulatory planning (Dronkers 1996; Wise and others 1999) and in articulatory-based phonological analysis (Fiez and Petersen 1998). Activation of the left insula has been shown during nonword production, also supporting its role in encoding and buffering phonetic plans in articulation (Bohland JW, Guenther FH, unpublished data). Keller and others (2003) examined the functional correlates of the “tongue-twister” effect by having participants silently read sentences equated for syntactic structure and lexical frequency of the constituent words but differing in the proportion of words that shared similar initial phonemes. The manipulation affected the amount of activation seen in regions involved in articulatory speech programming or rehearsal such as the inferior frontal gyrus and anterior insula and also in areas associated with phonological processing and storage such as the left inferior parietal cortex (Keller and others 2003). Other studies have shown left insula activation during overt but not covert speech production, supporting its role in the actual coordination of speech production rather than in articulatory planning (Riecker and others 2000; Ackermann and Riecker 2004). The insula is also thought to be involved in a number of language-related and other functions including aspects of phonological perception, phonological working memory, lexical knowledge, and word retrieval/generation (Paulesu and others 1993; Rumsey and others 1997; Ardila 1999; Bamiou and others 2003). For example, Chee and others (2004) showed greater left insular activation during a phonological working memory (PWM) task in proficient compared with less proficient bilinguals. They suggested that more optimal engagement of regions involved in PWM in the former group may be related to greater proficiency in a second language in bilinguals (Chee and others 2004).
More generally, left prefrontal regions including Broca's area (BA 44/45) are classically thought to be involved in the processing and preparation of speech output. There is also evidence for the involvement of these regions during speech sound perception tasks (Zatorre and others 1992, 1996; Burton and others 2000), and it has been suggested that they are recruited when phonetic segments must be extracted and manipulated in relating the phonetic information to articulation (i.e., when phonetic segmentation or working memory processes are required).
Studies examining the functional correlates of phonetic processing using perceptual tasks have also often shown activation in the left temporo-parietal cortex (Démonet and others 1992, 1994; Zatorre and others 1992, 1996; Paulesu and others 1993), supporting the role of the left (Henson and others 2000; Honey and others 2000; Keller and others 2003) or of bilateral (Paulesu and others 1993; Jonides and others 1998) inferior parietal cortices in the storage of phonological information in verbal short-term memory. The location of our left inferior parietal morphological correlate of phonological production is similar to ones reported in previous functional imaging work on the phonological store (Becker and others 1999). Functional imaging and lesion work also support the role of the left inferior parietal cortex when there are greater compared with fewer syllable selection and segmentation demands during speech production (Shuster and Lemieux 2005) and in sublexical production (Martin 2003), respectively. These findings have been interpreted as supporting the role of this region in phonological working memory during speech production.
Relevant to our study are the results of a VBM study showing higher GM density in the left inferior parietal cortex in bilingual compared with monolingual individuals (Mechelli and others 2004). A systematic relationship between GM density and proficiency/age of acquisition was also found, suggesting that the attainment of better skills in a second language or earlier learning of a second language results in structural reorganization in this region. There was also a trend in a similar region in the right hemisphere. The anatomical location of these parietal lobe results is more lateral and posterior to ours, in a region previously shown to be activated during verbal fluency tasks (Poline and others 1996; Warburton and others 1996). Note that the Mechelli study revealed differences in GM probability density across individuals, whereas our findings suggest differences in the probability of WM in the parietal cortex across individuals, suggesting that different anatomical features underlie the attainment of proficiency in a second language versus the ability to accurately pronounce foreign speech sounds.
Anatomical Correlates of Phonological Perception versus Production
As described in the Introduction, we previously found a relationship between phonetic learning in perception and brain anatomy in the same individuals who were tested in the current study. Faster phonetic learners were found to have greater WM volumes in left HG compared with slower learners (Golestani and others 2006), suggesting that the WM anatomy of the left temporal cortex, a region that has previously been shown to subserve speech sound perception (Binder, Rao, Hammeke, Frost, and others 1994; Binder, Rao, Hammeke, Yetkin, and others 1994; Démonet and others 1992, 1994; Zatorre and others 1992; Poeppel and others 1996), in part predicts an important aspect of language perception and learning, one that involves the processing of speech sounds that involve rapid temporal change. We also found that faster phonetic learners have greater asymmetry (left > right) in the amount of WM in parietal regions in a different group of subjects (Golestani and others 2002). We now show a behavioral dissociation between measures of phonological production and perception obtained in the same subjects. Taken together, the previous and current findings suggest that 1) individuals who are faster at learning to perceive foreign speech sounds are not necessarily the ones who are good at correctly pronouncing foreign speech sounds, and vice versa, and that 2) the anatomical differences that predict behavioral measures of phonetic perception and production partially dissociate. Note that it is not necessary to be able to articulate sounds in order to be able to perceive them, but that it is necessary to be able to accurately perceive speech sounds in order to be able to articulate them correctly. 
Our results fit with findings of behavioral dissociations between phonological production and perception deficits in patients with lesions (Praamstra and others 1991; Dronkers 1996), and with functional imaging work showing some similarities and some differences in the neural systems that underlie phonological perception and production. As reviewed briefly above, lesion and functional imaging work suggests that the left insula and prefrontal cortex may subserve aspects of articulation and speech production, that the left or bilateral temporal areas may underlie the perception of speech sounds, and that left or bilateral inferior parietal cortex may subserve phonological working memory during both phonological perception and production. Taken together, the results of our previous and present morphometric studies suggest that anatomical differences in these same brain regions (left insula/prefrontal cortex, left temporal cortex, and inferior parietal cortices bilaterally) may in part predict individual differences in the very sublexical, speech sound–processing functions that they are thought to subserve. The finding that parietal cortex anatomy in part predicts measures of phonological perception and production is consistent with the idea that there is at least partial overlap in the brain regions that are involved in speech production and perception (Liberman and Mattingly 1985; Hickok and others 2003). Work remains to be done on characterizing the nature of the relationship between brain functional and anatomical correlates of language processing across individuals. Such work may help to elucidate the anatomical features that underlie aspects of brain function and thereby help to better understand some of the mechanisms that give rise to certain patterns of activation during the performance of certain tasks.
For example, a better characterization of anatomical connectivity between language regions of the brain using tracking in diffusion tensor imaging may help to predict aspects of functional connectivity between these regions.
Many thanks to Nicolas Molko for help with preprocessing for VBM analyses and to Niloufar Family for help with rating the speech sound productions. This work was presented at Human Brain Mapping (HBM) 2005. We acknowledge the support of the French Ministry of Research through an Actions Concertées Incitatives (ACI) grant. Conflict of Interest: None declared.