The Microstructural Plasticity of the Arcuate Fasciculus Undergirds Improved Speech in Noise Perception in Musicians

Abstract
Musical training is thought to be related to improved language skills, for example, understanding speech in background noise. Although studies have found that musicians and nonmusicians differ in the morphology of the bilateral arcuate fasciculus (AF), none has associated such white matter features with speech-in-noise (SIN) perception. Here, we tested both SIN perception and the diffusivity of bilateral AF segments in musicians and nonmusicians using diffusion tensor imaging. Compared with nonmusicians, musicians had higher fractional anisotropy (FA) in the right direct AF and lower radial diffusivity in the left anterior AF, both of which correlated with SIN performance. The FA-based laterality index showed stronger right lateralization of the direct AF and stronger left lateralization of the posterior AF in musicians than in nonmusicians, with the posterior AF laterality predicting SIN accuracy. Furthermore, hemodynamic activity in the right superior temporal gyrus obtained during a SIN task fully mediated the contribution of right direct AF diffusivity to SIN performance, thereby linking training-related white matter plasticity, brain hemodynamics, and speech perception ability. Our findings provide direct evidence that differential microstructural plasticity of bilateral AF segments may serve as a neural foundation of the cross-domain transfer of musical experience to speech perception amid competing noise.


Introduction
There is a great deal of evidence that musical training has a pervasive positive effect on auditory cognitive functions (e.g., working memory, language skills) and results in widespread structural and functional changes in the human brain (Kraus and Chandrasekaran 2010; Herholz and Zatorre 2012). Surprisingly, however, little is known about the relationship between these behavioral improvements and specific neurobiological changes. Speech perception in noisy environments is one of the critical abilities that has frequently been shown to be improved in musicians (Coffey et al. 2017a). Since speech and music are 2 communication systems fundamental to human social interaction, musicians' advantage in speech-in-noise (SIN) perception represents a valuable model for studying the cross-domain transfer of musical experience. To date, no study has investigated the brain structural correlates of strengthened SIN perception in musicians. Addressing this question is important for understanding the neural resources shared by musical and speech processing, and for applying more targeted music-based therapy to restore speech functions in neurological and developmental disorders or in aging populations (Zendel et al. 2019). SIN perception, or the "cocktail party phenomenon," requires skilled perceptual and cognitive processes (segmenting, grouping, representing, and storing target acoustic signals) to pick out meaningful units (Parbery-Clark et al. 2009). The musician advantage in SIN perception has been associated with more faithful spectral and temporal encoding of speech sounds along the auditory pathway (Kühnis et al. 2013; Coffey et al. 2017b). In addition, enhancement of higher-level cognitive processes, such as auditory attention and working memory, has been correlated with improved SIN perception in musicians (Strait and Kraus 2011; Kraus et al. 2012; Puschmann et al. 2019; Yoo and Bidelman 2019; Zhang et al. 2021).
Musicians also exhibit more robust specificity of phoneme representations in the frontal articulatory system and auditory regions, as well as stronger intra- and interhemispheric functional connectivity between those regions, than nonmusicians in a SIN task (Du and Zatorre 2017). According to the analysis-by-synthesis model, feedforward articulatory predictions are generated to assist the perception of acoustic patterns in noisy and uncertain listening contexts (Poeppel and Monahan 2011). Sensorimotor interplay is ubiquitous in playing music and results in pervasive structural and functional plasticity in the overlapping sensorimotor network implicated in both the production and perception of music and speech (Hickok and Poeppel 2007; Zatorre et al. 2007; Bailey et al. 2014). It is therefore hypothesized that musicians benefit from strengthened sensorimotor integration when processing speech, particularly in challenging listening environments. Indeed, greater task-related (Du and Zatorre 2017) and resting-state (Luo et al. 2012; Palomar-Garcia et al. 2017; Zamorano et al. 2017) functional connectivity in auditory-motor networks has been found in musicians than in nonmusicians. Compared with nonmusicians, musicians also exhibit stronger structural connectivity in sensorimotor circuits, including the arcuate fasciculus/superior longitudinal fasciculus (AF/SLF) (Oechslin et al. 2010; Halwani et al. 2011; Giacosa et al. 2016).
The AF/SLF has been proposed as the anatomical basis of the auditory "dorsal stream" that maps speech phonological information onto articulatory motor representations (Hickok and Poeppel 2007; Saur et al. 2008). Historically, the AF and SLF were viewed as synonymous fiber pathways, but they have more recently been identified as partially overlapping tracts with different termination regions (Frey et al. 2008; Gierhan 2013; Martino et al. 2013; Chang et al. 2015). According to the widely used segmentation approach of Catani et al. (2005), the AF, including the classical AF and most portions of the SLF, has 3 segments: a long direct frontotemporal segment, an anterior indirect frontoparietal segment (overlapping with SLF-II and SLF-III), and a posterior indirect temporoparietal segment (belonging to the temporal part of the SLF, SLF-tp) (Wang et al. 2016). Nonetheless, the exact functional specialization of the 3 AF segments is far from settled.
Based on lesion studies and electrical brain stimulation, the direct AF has been implicated in syntactic and phonological processing, the anterior AF in articulation, and the posterior AF in phonological processing (Duffau 2008; Gierhan 2013; Chang et al. 2015). Using diffusion tensor imaging (DTI), studies have found that the microstructure of the left direct AF was associated with auditory word learning ability (López-Barroso et al. 2013) and phoneme awareness (Vandermosten et al. 2012), that the microstructure of the left anterior AF correlated with speech imitation ability (Vaquero et al. 2017), and that abnormal microstructure of the left posterior AF was related to impaired phonological working memory in patients with reading disability and autism (Lu et al. 2016). Most directly related to the present study, the fractional anisotropy (FA) of the left posterior AF significantly correlated with performance in perceiving sentences in noise in 20 dyslexic and 20 control participants when group, IQ, and a DTI acquisition quality index were controlled (Vandermosten et al. 2012), whereas the mean diffusivity of the left direct AF mediated the effect of aging on syllable-in-noise discrimination sensitivity (Tremblay et al. 2019).
However, the white matter substrates supporting SIN perception in musicians have not been studied. Whereas speech and language functions have been linked to AF microstructure in the left hemisphere, the right AF has been implicated in music processing. For instance, the structure of the right AF predicted melody and rhythm learning speed (Vaquero et al. 2018) and pitch-related musical grammar learning performance in nonmusicians, and abnormal right AF was identified in congenital amusia (Peretz 2016; Chen et al. 2018) and acquired amusia (Sihvonen et al. 2019). Moreover, musical training has been associated with increased tract volume in the right AF and higher FA in the left AF (Halwani et al. 2011), larger F1 (a directional diffusivity measure) values in the right AF/SLF (Giacosa et al. 2016), and greater volume of the right direct AF that reduced the normative leftward asymmetry of the direct AF (Vaquero et al. 2020). Overall, it remains unclear how musical experience modulates the microstructure of bilateral AF segments and, more importantly, whether those features are related to the superior SIN perception of musicians.
In the present DTI study, we aimed to investigate whether and how long-term musical training modulates the white matter diffusivity of bilateral AF segments, and how this diffusivity contributes to SIN perception ability. The corticospinal tract (CST), which is part of the sensorimotor system and connects cortical motor regions with the brainstem, is often modulated by musical experience (Giacosa et al. 2016; Imfeld et al. 2009) but is not expected to be related to audition; it therefore served as a control tract in the current study. The diffusivity values and lateralization pattern of each tract were compared between a group of musicians and a group of nonmusicians who had participated in our previous functional magnetic resonance imaging (fMRI) study (Du and Zatorre 2017), and, more importantly, these values were correlated with SIN perception performance. As reviewed above, the right AF is consistently reported as a key tract showing musical experience-dependent plasticity. We hypothesized that, compared with nonmusicians, musicians might exhibit altered diffusivity in the right direct AF and altered AF asymmetry that would scale with individual SIN perception ability. Moreover, our previous fMRI study showed that blood oxygenation level-dependent (BOLD) activity in auditory areas of the right superior temporal gyrus (STG) predicted SIN perception accuracy in musicians (Du and Zatorre 2017). Since the right STG is a termination region of the right direct AF, we further carried out a mediation analysis to test whether the diffusivity of the right direct AF influences SIN perception performance via BOLD activity in the right STG as a mediator. The rationale for this model is that a long-term structural basis is more likely to influence immediate hemodynamic activity in the target region, which in turn contributes to behavioral performance (structure → function → behavior), than the other way around (function → structure → behavior).
This approach links white matter microstructural reorganization and hemodynamic functional changes in the auditory-motor network with SIN behavior, which could yield new insights into the neural foundations of the speech perception advantage associated with musical expertise.

SIN Perception Task
This task was carried out during a previous fMRI session (for details, see Du and Zatorre 2017). Four 500-ms consonant-vowel syllables (/ba/, /ma/, /da/, and /ta/) at a fixed 85 dB sound pressure level were presented in random order, each with a simultaneous 500-ms white noise segment at one of 5 signal-to-noise ratios (SNRs: −12, −8, −4, 0, and 8 dB). Participants identified each syllable by pressing the corresponding key. The mean accuracy across syllables and SNRs was calculated for each individual and used as the index of SIN perception performance. As previously reported, musicians showed higher accuracy than nonmusicians in the SIN task (musicians: 76.64 ± 3.68%, nonmusicians: 68.29 ± 5.56%, t(26) = 4.69, P < 0.001, Cohen's d = 1.80). No significant correlation was found between SIN accuracy and years of musical training in musicians (r = 0.21, P = 0.48). The lack of correlation may be due to the narrow range of training duration (11–22 years), as no amateur or lifelong musicians were recruited, as well as to the small sample size.
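As a quick consistency check on the group comparison above, the t statistic can be recomputed from the reported means and standard deviations. This sketch assumes 14 participants per group (the group sizes given under Fiber Tractography) and a pooled-variance Student's t-test. It uses the pooled SD as the denominator of Cohen's d, which is only one of several d conventions, so its d need not match the reported value exactly.

```python
import math

def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Student's t (pooled variance) and Cohen's d from group summary statistics."""
    df = n1 + n2 - 2
    # Pooled SD: each group's variance weighted by its degrees of freedom
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)  # standard error of the mean difference
    t = (m1 - m2) / se
    d = (m1 - m2) / sp                    # Cohen's d with the pooled SD as denominator
    return t, df, d

# Reported SIN accuracy (%), mean ± SD; 14 participants per group assumed
t, df, d = pooled_t_and_d(76.64, 3.68, 14, 68.29, 5.56, 14)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # prints: t(26) = 4.69, d = 1.77
```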
PANDA, a fully automated software package for processing brain diffusion images (Cui et al. 2013, http://www.nitrc.org/projects/panda/), was used to process the raw DTI data. The core preprocessing commands come from the Functional MRI of the Brain (FMRIB) Software Library (FSL v5.0, Smith et al. 2004, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki). First, the Brain Extraction Tool (BET) was run on the diffusion images to obtain skull-stripped images. The diffusion images were then registered to the b0 images with an affine transformation to correct for eddy current distortion. After that, DTIFIT was applied to fit tensor models, yielding 3 eigenvalues, λ1, λ2, and λ3. λ1 and the average of λ2 and λ3 were used to characterize water diffusion parallel to the axonal direction (axial diffusivity, AD) and perpendicular to it (radial diffusivity, RD), which help differentiate axonal and myelin changes, respectively (Song et al. 2003). FA, a normalized measure of the dispersion of the 3 eigenvalues that increases as AD grows relative to RD, is considered a global measure reflecting the degree of myelination and axonal properties (Song et al. 2003) and an index of the microstructural ordering and integrity of fibers (Catani et al. 2007). Mean diffusivity (the average of the 3 eigenvalues) was not used here because it is hard to interpret. Compared with FA alone, the combination of FA, AD, and RD enables a better understanding of changes in white matter properties (e.g., axon diameter, myelination) due to axonal sprouting, pruning, or rerouting (Walhovd et al. 2014; Zatorre et al. 2012). The diffusivity images in individual space were then registered to a standard template in MNI space with a voxel size of 2 × 2 × 2 mm³.
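The scalar measures above follow directly from the 3 tensor eigenvalues. DTIFIT already produces these maps, so the sketch below is only an illustration that applies the standard definitions to one voxel. It shows that FA is the normalized dispersion of the eigenvalues rather than a literal AD/RD ratio, even though, for a fixed AD, FA rises as RD falls. The example eigenvalues are illustrative and are not data from this study.

```python
import math

def diffusion_scalars(evals):
    """Return FA, AD, RD, and MD for eigenvalues ordered λ1 ≥ λ2 ≥ λ3."""
    l1, l2, l3 = (float(v) for v in evals)
    ad = l1                    # axial diffusivity: diffusion along the principal axis
    rd = (l2 + l3) / 2.0       # radial diffusivity: mean of the 2 perpendicular axes
    md = (l1 + l2 + l3) / 3.0  # mean diffusivity (computed here, not analyzed in the study)
    # FA: dispersion of eigenvalues around MD, scaled to 0 (isotropic) .. 1 (one nonzero eigenvalue)
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"FA": fa, "AD": ad, "RD": rd, "MD": md}

# Illustrative eigenvalues (10^-3 mm^2/s): same AD, lower RD gives higher FA
print(diffusion_scalars([1.7, 0.4, 0.4]))
print(diffusion_scalars([1.7, 0.3, 0.3]))
```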

Fiber Tractography
Deterministic tractography was performed with Diffusion Toolkit (http://trackvis.org/dtk/) using the FACT tracking algorithm (Mori et al. 1999) to obtain probability maps of the bilateral AF in each participant group. The 3 cortical termination territories used to reconstruct the AF and its 3 segments were defined according to Catani et al. (2005). The direct AF connects the posterior inferior frontal gyrus (IFG) and precentral gyrus (Broca's territory) with the posterior temporal lobe (Wernicke's territory); the anterior AF connects the posterior IFG and precentral gyrus (Broca's territory) with the inferior parietal cortex (Geschwind's territory); and the posterior AF links the inferior parietal cortex (Geschwind's territory) with the posterior temporal lobe (Wernicke's territory) (Catani and Mesulam 2008). Here, using the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al. 2002), Broca's territory comprised the pars opercularis (BA44) of the IFG and the ventral precentral gyrus, Geschwind's territory comprised the angular and supramarginal gyri, and Wernicke's territory comprised the posterior STG and middle temporal gyrus. Although there are several ways to delineate the branches of the AF/SLF (Wakana et al. 2007; Catani and Mesulam 2008; Glasser and Rilling 2008), Catani's well-studied definition of AF segments has achieved consistency across subjects and fiber tracking methods in previous studies (Catani et al. 2007; Chen et al. 2018; Oechslin et al. 2010; Vaquero et al. 2018, 2020; Yeatman et al. 2012). As a control tract, the CST was dissected using the protocol of Wakana et al. (2007), in which the inferior termination territory was placed on the cerebral peduncle at the level of the decussation of the superior cerebellar peduncle, while the superior territory was defined to cover motor fibers reaching the primary motor cortex and central sulcus.
Table 1 note: The numbers of participants with an identified AF segment or CST are indicated in italics for musicians and nonmusicians, respectively. P was estimated by permutation tests. * FDR-corrected P < 0.05. SD, standard deviation.
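The endpoint logic of this segmentation can be written as a small lookup. A streamline is assigned to a segment according to which pair of territories its 2 ends fall in. This is a hypothetical simplification, not the Diffusion Toolkit procedure. The region names are AAL-style labels with the hemisphere suffixes (_L/_R) dropped, and because AAL has no separate ventral precentral or posterior STG parcel, real territory masks would need further cutting.

```python
# Hypothetical territory definitions (AAL-style names, hemisphere suffix dropped)
TERRITORIES = {
    "Broca": {"Frontal_Inf_Oper", "Precentral"},
    "Geschwind": {"Angular", "SupraMarginal"},
    "Wernicke": {"Temporal_Sup", "Temporal_Mid"},
}

# Each AF segment is identified by the unordered pair of territories it links
SEGMENTS = {
    frozenset({"Broca", "Wernicke"}): "direct",
    frozenset({"Broca", "Geschwind"}): "anterior",
    frozenset({"Geschwind", "Wernicke"}): "posterior",
}

def territory_of(region):
    """Return the territory containing an atlas region, or None."""
    for name, regions in TERRITORIES.items():
        if region in regions:
            return name
    return None

def classify_streamline(start_region, end_region):
    """Assign a streamline to an AF segment from the regions of its 2 endpoints."""
    a, b = territory_of(start_region), territory_of(end_region)
    if a is None or b is None or a == b:
        return None  # endpoints outside the territories, or within a single territory
    return SEGMENTS.get(frozenset({a, b}))

print(classify_streamline("Frontal_Inf_Oper", "Temporal_Mid"))  # prints: direct
```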
Before fiber tracking, all termination regions in MNI space were mapped into each individual's native space. Fiber tracking was initiated from 10 randomly selected seeds within each voxel of the termination regions in native diffusion space and was terminated if the angle between 2 consecutive orientations exceeded 45° or if the FA value fell below 0.2. Previous work indicates that a continuous trajectory of the right direct AF cannot be reconstructed with deterministic tracking in nearly half of participants (Catani et al. 2007); for instance, the right direct AF was identified in 20 of 39 subjects in Yeatman et al. (2012), in 16 of 33 and 54 of 100 subjects in Bain et al. (2019), and in 32 of 54 subjects in Chen et al. (2018). Consistent with these findings, the right direct AF was reconstructed here in 8 of 14 musicians and 9 of 14 nonmusicians. In comparison, the left direct AF and the right anterior AF were identified in 13 musicians and 13 nonmusicians, the left anterior AF in 10 musicians and 11 nonmusicians, and the bilateral posterior AF in all participants (see Table 1). Tract detection rates for the 3 segments of the bilateral AF did not differ significantly between groups (P > 0.05, Pearson's chi-squared tests). The fiber trajectories of the 3 segments of the bilateral AF in a typical individual are shown in Figure 1A, and those of the CST in Supplementary Figure S1A.
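The 2 stopping rules above (turning angle > 45°, FA < 0.2) can be illustrated with a simplified deterministic tracker. This is a hypothetical fixed-step sketch in the spirit of FACT, not the Diffusion Toolkit implementation. It steps along the principal eigenvector of the voxel that contains the current point and flips that eigenvector's sign when needed, since eigenvectors have no intrinsic sign. It stops on low FA, on a sharp turn, or on leaving the volume. Voxel i is assumed to span coordinates [i, i + 1).

```python
import numpy as np

def track(seed, directions, fa, step=0.5, max_angle_deg=45.0, fa_min=0.2, max_steps=10_000):
    """Hypothetical simplified tracker: trace one streamline forward from `seed`.

    directions: array (X, Y, Z, 3) of unit principal eigenvectors per voxel
    fa:         array (X, Y, Z) of fractional anisotropy per voxel
    """
    cos_min = np.cos(np.deg2rad(max_angle_deg))
    pos = np.asarray(seed, dtype=float)
    points = [pos.copy()]
    prev = None
    for _ in range(max_steps):
        idx = tuple(int(i) for i in np.floor(pos))  # voxel containing the current point
        if any(i < 0 or i >= n for i, n in zip(idx, fa.shape)):
            break                                    # left the image volume
        if fa[idx] < fa_min:
            break                                    # FA stopping rule
        d = directions[idx]
        if prev is not None:
            if np.dot(d, prev) < 0:
                d = -d                               # align the sign-ambiguous eigenvector
            if np.dot(d, prev) < cos_min:
                break                                # turning-angle stopping rule
        pos = pos + step * d
        points.append(pos.copy())
        prev = d
    return np.array(points)
```

In a fuller version, each seed would be traced in both directions along the eigenvector and the 2 halves joined. That step is omitted here.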
Each tract trajectory was then registered to MNI space, resulting in a binary map. The binary maps of all participants in each group were overlaid to generate a probabilistic map, in which the value of each voxel is the number of subjects with fiber trajectories in that voxel. A group-level threshold was then set at a voxel value >28% of subjects (4 of 14 subjects) and a cluster size >240 voxels (1920 mm³). This threshold balances the need to minimize extraneous fibers, to capture subtle differences in AF morphology between groups and hemispheres, and to enable correlational analysis with the behavioral variable across all subjects. Many studies of the AF/SLF and other language- or music-related tracts have used a similar threshold (25–30%; Fang et al. 2015; Oechslin et al. 2018) or an even lower one (10%; Loui et al. 2011). Figure 1B shows the probabilistic maps of the AF segments in the 2 groups separately, and Supplementary Figure S1B shows those of the CST. The overlapping and group-unique parts of the thresholded group-level probability maps used for extracting diffusivity measures are displayed for the AF segments in Figure 1C and for the CST in Supplementary Figure S1C. The total number of voxels in each tract for the 2 groups is given in Supplementary Table S2.
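The group-level thresholding described above amounts to 3 steps: counting subjects per voxel, applying the >28% rule, and discarding small connected clusters. The sketch below assumes binary maps that are already in MNI space and stacked as (subjects, x, y, z). It defines clusters by face connectivity (SciPy's default), which the text does not specify.

```python
import numpy as np
from scipy import ndimage

def group_tract_mask(binary_maps, min_fraction=0.28, min_cluster=240):
    """Threshold a stack of binary tract maps (subjects, x, y, z) into a group template."""
    stack = np.asarray(binary_maps, dtype=bool)
    n_subjects = stack.shape[0]
    count = stack.sum(axis=0)                 # probabilistic map: subjects per voxel
    mask = count / n_subjects > min_fraction  # with 14 subjects: at least 4 subjects
    labels, _ = ndimage.label(mask)           # default structure = face connectivity
    sizes = np.bincount(labels.ravel())       # voxels per label; label 0 is background
    keep = sizes > min_cluster                # cluster size > 240 voxels (1920 mm^3 at 2 mm)
    keep[0] = False                           # never keep the background label
    return keep[labels]
```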
Lastly, for each individual, the mean value of each diffusion measure (FA, AD, and RD) within each tract was extracted from the diffusivity maps overlaid with the reconstructed tract templates of each group in MNI space. The FA-based LI was then calculated to characterize the hemispheric asymmetry of each AF segment and the CST using the following formula: LI = (left FA − right FA) / (left FA + right FA), so that positive values indicate leftward and negative values rightward asymmetry.
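Extracting a tract's mean diffusion value and its FA-based LI reduces to a masked average followed by the ratio above. The sketch assumes that a subject's FA map and the group's binary tract templates are arrays on the same MNI grid. The function names are illustrative.

```python
import numpy as np

def tract_mean(scalar_map, tract_mask):
    """Mean of a diffusion measure (FA, AD, or RD) within a binary tract template."""
    return float(np.mean(scalar_map[np.asarray(tract_mask, dtype=bool)]))

def laterality_index(fa_left, fa_right):
    """LI = (left FA - right FA) / (left FA + right FA); > 0 means leftward asymmetry."""
    return (fa_left - fa_right) / (fa_left + fa_right)

# Toy example: a 4-voxel map with separate left- and right-tract masks
fa_map = np.array([0.50, 0.60, 0.40, 0.40])
left_mask = np.array([True, True, False, False])
right_mask = ~left_mask
li = laterality_index(tract_mean(fa_map, left_mask), tract_mean(fa_map, right_mask))
print(round(li, 4))  # prints 0.1579
```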

Statistical Analysis
Group differences in age, years of education, pure-tone hearing level, auditory digit span, nonverbal IQ, and SIN perception accuracy were examined using parametric tests (independent-samples t-tests). To obtain robust estimates from small samples without assumptions about the sampling distribution, nonparametric permutation tests were conducted on tract diffusivity and LI values. First, the diffusivity (FA, AD, and RD) and LI values of each tract were compared between groups by 2-sample t-tests to verify musical experience-dependent plasticity of white matter microstructure. The significance of the laterality of each AF segment was additionally tested by a 1-sample t-test in each group separately. Next, to identify white matter tracts involved in SIN perception, partial correlations were computed between the diffusivity or LI values of each tract and SIN performance, with hearing level, working memory (digit span), and nonverbal IQ as covariates. Additionally, Pearson's correlations were used to assess the relationships between musical training time and diffusivity or LI values. For each of these analyses, a null distribution of the test statistic (t or r) was generated by randomly permuting the labels of the diffusivity or LI values 10 000 times. The permutation P value was calculated from the rank of the true statistic in the permuted distribution as P = (rank + 1)/(10 000 + 1). Multiple comparisons were controlled at a false discovery rate (FDR) of q < 0.05 using the Benjamini–Hochberg procedure across 24 measurements (FA, AD, and RD of 8 tracts) or 28 measurements (FA, AD, and RD of 8 tracts plus 4 LIs).
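The permutation P value and the FDR step can be sketched as follows. The sketch makes 3 assumptions: a 2-sample comparison using a pooled-variance t statistic, a 2-sided test on |t|, and "rank" read as the number of permuted statistics at least as extreme as the observed one. The Benjamini–Hochberg adjustment is the standard step-up form, and adjusted values would be compared with q = 0.05.

```python
import numpy as np

def t_stat(x, y):
    """Student's 2-sample t with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(sp2 * (1 / nx + 1 / ny))

def permutation_p(x, y, n_perm=10_000, seed=0):
    """P = (count + 1) / (n_perm + 1), where count = permuted |t| >= observed |t|."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(t_stat(x, y))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle the group labels
        if abs(t_stat(perm[:len(x)], perm[len(x):])) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def benjamini_hochberg(pvals):
    """Benjamini–Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```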

Mediation Analysis
To further investigate how musical experience-related white matter plasticity influenced SIN performance, mediation analysis was conducted in AMOS software (Version 7.0) using maximum likelihood estimation to assess the direct and indirect relationships among the following variables: 1) the FA of the right direct AF, which showed a significant group difference and a significant correlation with SIN performance; 2) the BOLD activity in the right STG cluster (peak Talairach coordinates: 47, −28, −1; BA 22/21; 324 mm³), which was significantly higher in musicians than nonmusicians during the SIN task (family-wise error corrected P < 0.001 for the main effect of group) and predicted SIN performance in musicians (r = 0.53, P = 0.043, N = 15, Pearson's correlation) in our previous fMRI study (Du and Zatorre 2017); and 3) SIN perception accuracy. Based on our hypothesis, a mediation model was tested with hearing level, digit span, and nonverbal IQ as covariates, in which the FA of the right direct AF affected SIN performance via right STG BOLD activity as a mediator (diffusivity → BOLD → SIN). The reverse model, in which white matter diffusivity mediated the effect of BOLD activity on SIN behavior (BOLD → diffusivity → SIN), was also tested as a supplementary check. The bias-corrected bootstrap method with 5000 iterations was used to estimate 95% confidence intervals (CIs). Model fit indices included the chi-square statistic (χ2) with its degrees of freedom and P value, the root mean square error of approximation (RMSEA), the normed fit index (NFI), and the comparative fit index (CFI). P > 0.05 for χ2, RMSEA < 0.07, NFI > 0.95, and CFI > 0.95 indicate an acceptable model fit (Hopper et al. 2008). Although mediation analysis cannot confirm a causal link, it provides a way to infer causal relationships between white matter changes and SIN performance in a statistical sense (Pearl 2012).
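The logic of the indirect effect can be sketched with ordinary least squares and a bias-corrected bootstrap. This is a regression-based approximation, not the AMOS maximum-likelihood model used in the study. It follows the common labeling in which a·b is the indirect effect and c′ is the direct effect of X on Y given M. Standardized coefficients (β) like those reported would be obtained by z-scoring all variables first. The variable and function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def ols_coefs(y, *cols):
    """OLS coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def indirect_effect(x, m, y, covs):
    a = ols_coefs(m, x, *covs)[1]     # path a: X -> M, adjusting for covariates
    b = ols_coefs(y, m, x, *covs)[1]  # path b: M -> Y, adjusting for X and covariates
    return a * b

def bias_corrected_ci(x, m, y, covs, n_boot=5000, alpha=0.05, seed=0):
    """Indirect effect a*b with a bias-corrected bootstrap CI (OLS approximation, not AMOS SEM)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    est = indirect_effect(x, m, y, covs)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)   # resample participants with replacement
        boots[i] = indirect_effect(x[idx], m[idx], y[idx], [c[idx] for c in covs])
    nd = NormalDist()
    prop = float(np.mean(boots < est))
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep inv_cdf finite
    z0 = nd.inv_cdf(prop)             # bias-correction constant
    lo = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    return est, tuple(np.quantile(boots, [lo, hi]))
```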

Group Differences in Diffusivity
As shown in Table 1, compared with nonmusicians, musicians showed significantly higher FA (P = 0.004, FDR-corrected, Cohen's d = 1.20) and higher AD (P < 0.001, FDR-corrected, Cohen's d = 1.68) in the right direct AF, as well as lower RD (P = 0.003, FDR-corrected, Cohen's d = −1.26) and a trend toward higher FA (uncorrected P = 0.021, Cohen's d = 0.92) in the left anterior AF. Higher FA accompanied by higher AD and unchanged RD suggests changes in axonal properties in the right direct AF, whereas higher FA accompanied by lower RD and unchanged AD is compatible with a greater degree of myelination of the left anterior AF in musicians (Song et al. 2005; Wheeler-Kingshott and Cercignani 2009; Zatorre et al. 2012). Notably, years of musical training did not correlate with diffusivity values in the bilateral AF (Supplementary Table S3, FDR-corrected P > 0.05). For the control tract, no group difference in diffusivity was found (FDR-corrected P > 0.05, Table 1), and CST diffusivity did not correlate with years of training (FDR-corrected P > 0.05, Supplementary Table S3).

Correlation Between Diffusivity and SIN Performance
To identify the white matter correlates of the musician benefit in SIN perception robustly, we report only tract diffusivity values that showed both a significant group difference and a significant partial correlation with SIN performance (controlling for hearing level, auditory working memory, and nonverbal IQ) that survived FDR correction. As shown in Figure 2, among tracts with a significant group difference in diffusivity, only the FA of the right direct AF correlated positively with SIN accuracy (P = 0.013, FDR-corrected) and the RD of the left anterior AF correlated negatively with SIN performance (P = 0.006, FDR-corrected) across all participants. In addition, the RD of the left direct AF, the right anterior AF, and the left posterior AF, as well as the AD of the bilateral posterior AF, correlated negatively with SIN perception across subjects (FDR-corrected P < 0.05 for all), although these measures did not differ significantly between groups. All correlation estimates are listed in Supplementary Table S4.
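A partial correlation with covariates is the Pearson correlation between the residuals of both variables after each has been regressed on the covariates (plus an intercept). The sketch below shows that computation, with the covariates (here, hearing level, digit span, and nonverbal IQ) passed as a list of arrays. In the study, the resulting r would then enter the permutation procedure described under Statistical Analysis.

```python
import numpy as np

def residualize(v, covariates):
    """Residuals of v after least-squares regression on an intercept and the covariates."""
    X = np.column_stack([np.ones(len(v)), *covariates])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

def partial_corr(x, y, covariates):
    """Pearson r between x and y with the covariates partialled out of both."""
    rx = residualize(np.asarray(x, dtype=float), covariates)
    ry = residualize(np.asarray(y, dtype=float), covariates)
    return float(np.corrcoef(rx, ry)[0, 1])
```

With a single covariate, this matches the textbook first-order formula (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The residual form generalizes directly to several covariates.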
Notably, the correlation estimates should be treated with caution because of the relatively small sample size. When group was additionally included as a covariate, only the AD of the left posterior AF showed a significant correlation with SIN performance (P = 0.001, FDR-corrected, Supplementary Table S4), which is consistent with a previous finding (Vandermosten et al. 2012). For the control tract, the diffusivity of the bilateral CST did not correlate with SIN accuracy (FDR-corrected P > 0.05 for all, Supplementary Table S4). Therefore, among the many tracts associated with SIN processing or modulated by training, only the right direct AF and the left anterior AF both differed between groups and correlated with SIN performance, making them the 2 core white matter underpinnings of the musician advantage in SIN perception.
Figure 2. AF segments showing a significant group difference and a significant partial correlation with SIN perception accuracy in all participants after controlling for hearing level, digit span, and nonverbal IQ. Bars show the group mean diffusivity, and error bars represent standard errors of the mean. * FDR-corrected P < 0.05 by permutation tests. Due to the relatively small sample size, the correlation estimates should be treated with caution.

Mediation Analysis: Diffusivity, BOLD, and SIN Performance
As the FA of the right direct AF (r = 0.45, P = 0.013) and the BOLD activity in the right STG (r = 0.63, P < 0.001) both correlated with SIN accuracy across the entire sample, mediation analysis was performed to characterize their relationships. As shown in Figure 4, Model A (diffusivity → BOLD → SIN) showed a significant mediation effect and fitted the data well (χ2(9) = 9.77, P = 0.37; RMSEA = 0.56; NFI = 0.71; CFI = 0.96). Model A explained 46.8% of the variance in SIN perception accuracy. Specifically, the FA of the right direct AF significantly predicted BOLD activity in the right STG (a: β = 0.41, P = 0.003, 95% CI = [0.14, 0.61]), which in turn significantly contributed to SIN performance (b: β = 0.50, P = 0.006, 95% CI = [0.20, 0.85]). Thus, the indirect effect of right direct AF FA on SIN performance via right STG BOLD activity was significant (c': β = 0.21, P = 0.005, 95% CI = [0.07, 0.43]), whereas the direct effect of right direct AF FA on SIN accuracy was not significant (c: β = 0.23, P = 0.12, 95% CI = [−0.07, 0.50]). Additionally, to test the possibility that structural differences mediate the effect of brain function on behavioral performance, we tested Model B, in which the FA of the right direct AF mediated the effect of right STG BOLD activity on SIN performance (BOLD → diffusivity → SIN). This model did not show a significant mediation effect (c': β = 0.01, P = 0.076, 95% CI = [−0.12, 0.29]). These results suggest that structural alterations of the right direct AF contribute to enhanced SIN perception in musicians by influencing hemodynamic activity in the right STG.
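RMSEA can be recomputed from the model chi-square, its degrees of freedom, and the sample size using the standard point-estimate formula sqrt(max(χ2 − df, 0) / (df (N − 1))). The sketch assumes N = 28 (all participants) for Model A. Some programs use N rather than N − 1, which changes the result only in the third decimal place.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Model A: chi2(9) = 9.77, assuming N = 28 participants
print(f"{rmsea(9.77, 9, 28):.3f}")  # prints 0.056
```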

Discussion
In the present study, we found higher FA in the right direct AF, lower RD in the left anterior AF, and a more strongly left-lateralized posterior AF in musicians relative to nonmusicians, and these variables predicted better SIN perception across all participants. Thus, the microstructural organization of white matter tracts that connect auditory and frontal motor regions in both hemispheres may serve as a neural foundation of the musician advantage in understanding SIN. This supports the view that musical training strengthens sensorimotor integration, thereby facilitating speech perception in noisy environments. Moreover, mediation analysis revealed an indirect effect of right direct AF FA on SIN accuracy, which was fully mediated by right STG BOLD activity. This finding is consistent with a directional relationship from white matter structure to immediate brain hemodynamic function and behavior in explaining the effect of musical training on speech perception.
Figure caption (fragment): Error bars indicate standard error of the mean. (B) Partial correlations between the LI and SIN accuracy in all participants after controlling for hearing level, digit span, and nonverbal IQ. * FDR-corrected P < 0.05 by permutation tests. Compared with nonmusicians, musicians showed stronger right lateralization of the direct AF and stronger left lateralization of the posterior AF, and stronger left lateralization of the posterior AF correlated with better SIN performance across all subjects. Due to the relatively small sample size, the effect size estimates should be treated with caution.
Musical training has emerged as an efficient framework for exploring brain plasticity underlying sensorimotor interactions (Herholz and Zatorre 2012). Playing music requires precise, coherent, and well-timed motor sequences, which depend on widespread anatomical and functional networks for fine-grained perception and motor control (Zatorre et al. 2007). Feedforward and feedback sensorimotor integration serve to monitor, prevent, and minimize errors during music perception and production. Similarly, sensorimotor integration is involved in speech perception and production (Hickok and Poeppel 2007). Articulatory gestures based on internal models are fed forward to constrain the interpretation of acoustic patterns during speech perception, whereas acoustic features are fed back to monitor the discrepancy between intended and actual articulation during speech production. Indeed, a recent meta-analysis of neuroimaging studies demonstrated a unified sensorimotor integration network for speech production and speech perception (Skipper et al. 2017). Thus, an overlapping neuroanatomical network likely supports sensorimotor interactions in both music and speech processing, allowing a transfer effect from the music domain to the speech domain (Kraus and Chandrasekaran 2010).
As analysis-by-synthesis models propose, speech perception requires articulatory prediction and sensorimotor integration to promote phonetic encoding and compensate for environmental degradation (Poeppel and Monahan 2011). Accordingly, fMRI studies have demonstrated that sensorimotor integration can compensate for impoverished speech representations caused by background noise or aging (Du et al. 2014; Du et al. 2016). Musicians may thus benefit from more efficient motor prediction and sensorimotor integration when perceiving speech in noisy circumstances. As shown in our previous fMRI study, improved SIN perception in musicians relied on stronger recruitment of auditory and frontal speech motor cortices in both hemispheres, as well as on finer phonological representations in, and stronger functional connectivity between, these structures (Du and Zatorre 2017). The present DTI findings support the account that strengthened sensorimotor interaction contributes to the musician advantage in SIN perception, and they shed light on the structural correlates of the sensorimotor integration involved in both musical training and speech perception.
According to the dual-stream model of speech processing, the dorsal stream maps phonological features onto articulatory representations (Hickok and Poeppel 2007). The AF is considered the primary white matter pathway connecting posterior temporal and inferior frontal regions in support of the sensorimotor integration function of the dorsal stream. Here, musicians had significantly higher FA in the right direct AF, which correlated with better SIN performance. This result is consistent with previous findings that musical training was associated with increased tract volume in the right AF (Halwani et al. 2011; Vaquero et al. 2020) and larger F1 (a directional diffusivity measure) values in the right SLF/AF (Giacosa et al. 2016). The architecture of the right AF has been related to musical abilities such as melody and rhythm learning (Vaquero et al. 2018) and pitch-related musical grammar learning, and it was found to be abnormal in congenital amusia (Peretz 2016; Chen et al. 2018) and acquired amusia (Sihvonen et al. 2019). In contrast, the left AF has been strongly implicated in speech processing. In particular, the FA of the left direct AF correlated positively with phoneme awareness (Vandermosten et al. 2012), and the mean diffusivity of the left direct AF mediated the effect of aging on SIN perception (Tremblay et al. 2019).
A new finding here is the involvement of the right direct AF in SIN processing in association with musical training. This suggests that, in musicians, the right AF may extend beyond its musical functions to support stream segregation of speech from background sounds, and that musical expertise may shift the dorsal pathway from left lateralized toward bilaterally symmetric or even right lateralized. Indeed, the FA-based lateralization analysis revealed that the direct AF was right lateralized in musicians but symmetric in nonmusicians. Many studies have described leftward laterality of the whole AF (Lebel and Beaulieu 2009), and of the direct AF in particular (Catani et al. 2007), in the general population using either macrostructural (e.g., streamline count) or microstructural (e.g., FA) measures. However, a symmetric direct AF (López-Barroso et al. 2013) and a right lateralized whole SLF/AF (including the direct and indirect anterior tracts; Oechslin et al. 2010) have also been observed when laterality was defined by FA rather than by streamlines.
A recent study applying different tractography pipelines to 2 large-sample datasets found that the choice of tractography method and the implementation of cortical constraints substantially affect laterality measurements, which may explain these discrepant results (Bain et al. 2019). When deterministic tractography constraining the AF to defined endpoints was performed, as in the present study, no significant laterality of the direct AF was found across the 2 datasets when LI was calculated from voxel or streamline counts, whereas the FA-based LI showed left laterality in the anterior and posterior portions of the direct AF but right laterality in its middle portion. The lack of asymmetry in nonmusicians in the present study may therefore partly reflect heterogeneous microstructural properties along the direct AF trajectory, as well as our use of group- and hemisphere-specific tract templates. Moreover, no consensus has been reached on how musical experience affects AF laterality. Using FA-based laterality of the group-thresholded core fibers of the whole SLF (direct plus indirect anterior AF), Oechslin et al. (2010) found a left asymmetry in musicians with absolute pitch, no asymmetry in musicians with relative pitch, and a right asymmetry in nonmusicians. However, musical training begun in early childhood has also been found to significantly reduce the leftward asymmetry in the volume of the direct AF (Vaquero et al. 2020). Interestingly, although the direct AF was bilaterally symmetric in only 17.5% of subjects, individuals with a more symmetric direct AF performed better at verbal recall (Catani et al. 2007). Thus, a better organized right AF and a bilaterally symmetric or even right lateralized auditory-motor pathway may benefit speech processing in musicians.
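The laterality index (LI) discussed above is, in many DTI studies, computed as (L - R)/(L + R) from a left- and right-hemisphere measure of the same tract, whether that measure is mean FA, voxel count or streamline count. The minimal Python sketch below assumes that formula with positive values denoting leftward laterality; the exact definition and sign convention used in the present study are those given in its Methods and may differ. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def laterality_index(left: float, right: float) -> float:
    """Laterality index (L - R) / (L + R).

    Assumed sign convention: positive = leftward, negative = rightward,
    0 = perfectly symmetric. Applicable to any positive tract measure
    (mean FA, voxel count, streamline count).
    """
    total = left + right
    if total <= 0:
        raise ValueError("left + right must be positive")
    return (left - right) / total


def group_mean_li(left_values: Sequence[float], right_values: Sequence[float]) -> float:
    """Mean of per-subject LIs for one group; inputs are paired by subject."""
    if len(left_values) != len(right_values):
        raise ValueError("left and right measures must be paired by subject")
    return mean(laterality_index(lh, rh) for lh, rh in zip(left_values, right_values))


# Hypothetical mean-FA values of one tract segment for three subjects.
left_fa = [0.44, 0.46, 0.45]
right_fa = [0.47, 0.48, 0.46]
print(group_mean_li(left_fa, right_fa))  # negative value = rightward on average
```

Computing one LI per subject, rather than an LI of group-averaged FA, keeps individual values available for a between-group test or for correlation with SIN accuracy.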
Moreover, our previous fMRI study found that higher activity in the right STG was beneficial for SIN perception in musicians (Du and Zatorre 2017). The mediation analysis showed that the FA of the right direct AF did not affect SIN performance directly; rather, its effect was mediated by the hemodynamic response in the tract's cortical termination, the right auditory cortex. The auditory cortices are thought to be functionally asymmetric: the left and right auditory cortices are specialized for sensitivity to temporal and spectral modulation rates, respectively (Zatorre et al. 2002; Albouy et al. 2020), or for short and long temporal integration windows, respectively (Poeppel 2003). Similarly, musicians have been reported to be more sensitive to temporal (voice-onset time and duration) variations of syllables in left auditory regions and to spectral (vowel) variations of syllables in right auditory regions (Kühnis et al. 2013). Presumably, a more bilaterally organized dorsal stream, accompanied by stronger engagement of the right auditory cortex, would give musicians complementary advantages in processing speech in adverse listening environments, since both temporal and spectral feature processing would be enhanced. The present findings reveal, for the first time, not only the white matter foundation of enhanced SIN perception in musicians but also how it interacts with dynamic neural activity to support this ability.
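The mediation result can be made concrete with a generic simple-mediation model in which X is the FA of the right direct AF, M is the right STG BOLD response and Y is SIN accuracy. The Python sketch below estimates the a (X to M), b (M to Y, controlling for X), total c and direct c' paths by ordinary least squares, plus a percentile-bootstrap confidence interval for the indirect effect a × b. It is an assumed textbook implementation with hypothetical function names, not the authors' software; the covariates, number of bootstrap samples and CI method of the actual analysis may differ. Under this framework, "full mediation" is usually read as a bootstrap CI for a × b that excludes zero together with a non-significant direct path c'.

```python
import numpy as np


def _ols(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...] for y regressed on the predictors."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simple_mediation(x, m, y, n_boot=5000, ci=0.95, seed=0):
    """Simple X -> M -> Y mediation with a percentile-bootstrap CI for a*b."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    a = _ols(m, x)[1]               # path a: X -> M
    c = _ols(y, x)[1]               # total effect: X -> Y
    _, c_prime, b = _ols(y, x, m)   # direct effect c' and path b, each controlling for the other
    rng = np.random.default_rng(seed)
    n = len(x)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        a_i = _ols(m[idx], x[idx])[1]
        b_i = _ols(y[idx], x[idx], m[idx])[2]
        boot[i] = a_i * b_i
    alpha = (1.0 - ci) / 2.0
    lo, hi = np.quantile(boot, [alpha, 1.0 - alpha])
    return {"a": a, "b": b, "c": c, "c_prime": c_prime,
            "indirect": a * b, "ci": (float(lo), float(hi))}
```

With OLS and an intercept in every model, the total effect decomposes exactly as c = c' + a × b, which serves as a convenient sanity check on any implementation.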
For the posterior AF, no significant group difference was found in the diffusivity measures. However, the FA of the posterior AF was more strongly left lateralized in musicians than in nonmusicians, and both stronger leftward laterality of the posterior AF and lower RD of the left posterior AF predicted better SIN performance. This fits well with findings that the left posterior AF contributes to phonological processing and SIN perception (Duffau 2008; Vandermosten et al. 2012). Although nonmusicians generally have a symmetric posterior AF in terms of FA (Catani et al. 2007; López-Barroso et al. 2013), stronger microstructural ordering and integrity of the left posterior AF relative to its right counterpart may enable more efficient encoding and integration of phonological representations of speech signals in musicians.
In addition, the left anterior AF showed decreased RD in musicians, which was related to enhanced SIN perception. Previous studies have shown that electrical stimulation of the anterior segment of the left AF induces dysarthria, suggesting a role in articulation (Duffau 2008; Maldonado et al. 2011). Musical training-related plasticity in the left anterior AF may therefore facilitate the mapping between articulatory gestures and phonological representations during SIN perception. By examining group differences, and correlations with SIN performance, in refined AF segments of both hemispheres, our findings provide insight into the distinct, musical experience-related plasticity of bilateral AF segments and their differential contributions to improved SIN perception.
The current study takes a first step toward revealing the white matter substrates of musicians' advantage in processing speech in adverse environments; however, some limitations exist. First, the musician cohort included different types of instrumentalists as well as singers, making it impossible to separate finer-grained structural plasticity effects arising from the differential involvement of the hands or vocal tract during musical performance, or their transfer effects on speech processing. Second, because the study was cross-sectional, the absence of pretraining measurements of brain structure makes it difficult to distinguish musical training-related plasticity from other environmental influences or genetic regulation of microstructural reorganization. Third, the relatively small sample size may yield unstable group probabilistic tract maps and diffusivity estimates, reducing reproducibility; replications with larger and more homogeneous samples are needed. Moreover, we focused on musicians who started training early and trained for a relatively long time; future studies should include amateur musicians with varying training durations and ages of onset to explore how these factors affect SIN perception and its structural correlates. Finally, advanced acquisition methods such as high angular resolution diffusion-weighted imaging (HARDI; Tournier et al. 2011) and analysis strategies such as fiber orientation distribution functions and fixel-based analysis (Raffelt et al. 2017) could help resolve the crossing-fiber problem, improve tractography quality, and deepen our understanding of musicians' advantage in processing speech.
To sum up, the microstructural organization of the right direct AF and the left anterior AF, together with stronger left lateralization of the posterior AF, may serve as structural correlates of improved speech perception in noisy environments in musicians, providing new insight into the functional role of bilateral sensorimotor circuits in speech perception. In particular, mediation analysis revealed an indirect effect of the FA of the right direct AF on SIN performance that was fully mediated by right STG BOLD activity, highlighting a potential pathway linking white matter plasticity, hemodynamic function, and behavior.

Supplementary Material
Supplementary material can be found at Cerebral Cortex online.

Funding
The National Natural Science Foundation of China (grants 31671172, 31822024); the Strategic Priority Research Program of Chinese Academy of Sciences (grant XDB32010300); the Canadian Institutes of Health Research (Foundation Grant); an infrastructure grant from the Canada Fund for Innovation.