Damage to Broca’s area does not contribute to long-term speech production outcome after stroke

Abstract
Broca’s area in the posterior half of the left inferior frontal gyrus has long been thought to be critical for speech production. The current view is that long-term speech production outcome in patients with Broca’s area damage is best explained by the combination of damage to Broca’s area and neighbouring regions including the underlying white matter, which was also damaged in Paul Broca’s two historic cases. Here, we dissociate the effect of damage to Broca’s area from the effect of damage to surrounding areas by studying long-term speech production outcome in 134 stroke survivors with relatively circumscribed left frontal lobe lesions that spared posterior speech production areas in lateral inferior parietal and superior temporal association cortices. Collectively, these patients had varying degrees of damage to one or more of nine atlas-based grey or white matter regions: Brodmann areas 44 and 45 (together known as Broca’s area), ventral premotor cortex, primary motor cortex, insula, putamen, the anterior segment of the arcuate fasciculus, uncinate fasciculus and frontal aslant tract. Spoken picture description scores from the Comprehensive Aphasia Test were used as the outcome measure. Multiple regression analyses allowed us to tease apart the contribution of other variables influencing speech production abilities such as total lesion volume and time post-stroke. We found that, in our sample of patients with left frontal damage, long-term speech production impairments (lasting beyond 3 months post-stroke) were solely predicted by the degree of damage to white matter, directly above the insula, in the vicinity of the anterior part of the arcuate fasciculus, with no contribution from the degree of damage to Broca’s area (as confirmed with Bayesian statistics).
The effect of white matter damage cannot be explained by a disconnection of Broca’s area, because speech production scores were worse after damage to the anterior arcuate fasciculus with relative sparing of Broca’s area than after damage to Broca’s area with relative sparing of the anterior arcuate fasciculus. Our findings provide evidence for three novel conclusions: (i) Broca’s area damage does not contribute to long-term speech production outcome after left frontal lobe strokes; (ii) persistent speech production impairments after damage to the anterior arcuate fasciculus cannot be explained by a disconnection of Broca’s area; and (iii) the prior association between persistent speech production impairments and Broca’s area damage can be explained by co-occurring white matter damage, above the insula, in the vicinity of the anterior part of the arcuate fasciculus.

independent of the scanner and/or sequence used (because voxel intensities are normalised with respect to those observed in neurologically-intact controls imaged on the same scanners with the same sequences).
In Supplementary Table 1, we replicate our main result (i.e. Model 2 reported in the main text) using the fuzzy (continuous) lesion images rather than the binary lesion images. The fuzzy lesion images provide an unbiased, objective quantification of the degree of structural abnormality across the whole brain relative to neurologically-intact controls. In contrast to their binary counterparts, they do not require an arbitrary threshold to define what is damaged and what is not. To compute the degree of structural abnormality in each atlas-defined region of interest, we averaged the signal indexed by the fuzzy lesion images over all voxels within that region and entered these averages (rather than % damaged) into the regression.
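The difference between the two region-level measures can be illustrated with a minimal sketch. It assumes the fuzzy lesion image and the atlas region are available as arrays of the same shape; the function name `region_summaries`, the toy values and the binarisation threshold of 0.3 are hypothetical and chosen only for illustration, not taken from the analysis pipeline.

```python
import numpy as np

def region_summaries(fuzzy, roi_mask, threshold=0.3):
    """Summarise one region of interest in two ways.

    fuzzy     : array of continuous abnormality values (one per voxel)
    roi_mask  : array of the same shape, non-zero inside the region
    threshold : hypothetical cut-off used only to mimic a binary lesion image
    """
    fuzzy = np.asarray(fuzzy, dtype=float)
    roi = np.asarray(roi_mask, dtype=bool)
    if fuzzy.shape != roi.shape:
        raise ValueError("fuzzy image and region mask must have the same shape")
    values = fuzzy[roi]
    # Threshold-free measure: mean fuzzy signal over all voxels in the region
    mean_abnormality = float(values.mean())
    # Threshold-dependent measure: % of region voxels labelled as damaged
    percent_damaged = float(100.0 * (values > threshold).mean())
    return mean_abnormality, percent_damaged

# Toy one-dimensional "image" with six voxels; the first four form the region
fuzzy = np.array([0.0, 0.2, 0.4, 1.0, 0.9, 0.0])
roi = np.array([1, 1, 1, 1, 0, 0])
mean_abn, pct = region_summaries(fuzzy, roi, threshold=0.3)
```

In this toy case the percentage would change if the threshold were moved across 0.2 or 0.4, whereas the regional mean does not depend on any cut-off.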

Checking the assumptions of multiple regression
For multiple regression to generate valid results that can be generalised beyond the sample at hand, the data must satisfy six core assumptions: (1) independence of observations; (2) linearity of relationship between the outcome variable and each of the explanatory variables; (3) homoscedasticity; (4) no high multicollinearity; (5) lack of significant outliers and influential cases; (6) normally distributed errors. To assess whether our patient data met these assumptions in the context of Model 2 (i.e. the result that guided all other analyses), we used the multiple regression diagnostic statistics and plots (Field, 2018) described below.
A Durbin-Watson test statistic smaller than 1 or larger than 3 signalled violations of the independence of observations assumption. Scatterplots and partial regression plots of the relationship between the outcome variable and each of the explanatory variables were created to check for linearity. Scatterplots of standardised residuals against standardised predicted values and studentised residuals against standardised predicted values were created to check for homoscedasticity. A variance inflation factor (VIF) value for any of the regressors greater than 10 signalled the presence of high multicollinearity. Significant outliers were defined as those that were associated with residuals that departed from the mean by more than 3 standard deviations. Influential cases were defined as those that resulted in a Cook's distance value greater than 1. A P-P plot of standardised residuals was created to check for normally distributed errors.
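The numerical diagnostics above (Durbin-Watson statistic, VIF, standardised residuals and Cook's distance) can be sketched with NumPy as follows. This is an illustrative reimplementation on synthetic data, not the software used for the reported analyses; the function name `ols_diagnostics` and the simulated regressors are assumptions, while the cut-offs are those stated in the text.

```python
import numpy as np

def ols_diagnostics(X, y):
    """Fit OLS with an intercept and return common regression diagnostics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    p = k + 1                                 # number of estimated parameters
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Durbin-Watson: sum of squared successive differences over residual SS
    durbin_watson = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

    # VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the rest
    vif = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        r = X[:, j] - others @ b
        r2 = 1.0 - (r @ r) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2)

    # Leverage (diagonal of the hat matrix), residual variance, Cook's distance
    leverage = np.einsum("ij,jk,ik->i", A, np.linalg.pinv(A.T @ A), A)
    mse = (resid @ resid) / (n - p)
    standardised = resid / np.sqrt(mse)
    cooks = (resid ** 2 / (p * mse)) * leverage / (1.0 - leverage) ** 2

    return {
        "durbin_watson": durbin_watson,
        "vif": vif,
        "standardised_residuals": standardised,
        "cooks_distance": cooks,
        # Flags using the cut-offs given in the text
        "independence_violated": bool(durbin_watson < 1 or durbin_watson > 3),
        "high_multicollinearity": vif > 10,
        "outliers": np.abs(standardised) > 3,
        "influential": cooks > 1,
    }

# Synthetic example: x2 is almost a copy of x1, so both should have a high VIF
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - x3 + rng.normal(size=n)
diag = ols_diagnostics(np.column_stack([x1, x2, x3]), y)
```

The linearity, homoscedasticity and normality checks described above rely on visual inspection of plots and are therefore not covered by this sketch.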
According to these multiple regression diagnostics, our data met all core assumptions outlined above with the exception of "no high multicollinearity". Specifically, two regressors of interest (vPMC and FAT) and one regressor of no interest (total lesion volume) had VIF values greater than 10. Multicollinearity renders the affected regression coefficients and their significance tests unreliable, but it does not compromise the validity of the regression model as a whole. In other words, it makes it difficult to assess the relative importance of the individual regressors affected by multicollinearity.
Since total lesion volume was included as a control variable, we only handled multicollinearity for vPMC and FAT by re-running the regression after removing the regressors of interest with which vPMC or FAT were highly correlated (i.e. r > 0.80). This was achieved in two separate steps: first for vPMC (removing BA44, M1 and FAT) and then for FAT (removing BA44, vPMC and M1), which successfully lowered their VIF values well below the cut-off (to 2.4 and 5.6, respectively). Critically, both these sanity checks confirmed that aAF continued to be the only significant anatomical predictor of speech production scores (see Supplementary Table 1).
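The regressor-removal step can be sketched as below. The helper `drop_correlated`, the `protected` argument (used here to retain aAF) and the simulated damage values are hypothetical; only the r > 0.80 criterion comes from the text. After this step, the regression and its VIF values would be recomputed on the reduced set of regressors.

```python
import numpy as np

def drop_correlated(X, names, target, protected=(), r_cut=0.80):
    """Remove regressors whose Pearson r with `target` exceeds r_cut.

    The target itself and any regressor named in `protected` are always kept.
    Returns the reduced matrix and the list of retained regressor names.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)       # regressors are in columns
    t = names.index(target)
    keep = [
        i for i, name in enumerate(names)
        if i == t or name in protected or corr[t, i] <= r_cut
    ]
    return X[:, keep], [names[i] for i in keep]

# Synthetic example (not patient data): BA44 and M1 closely track vPMC,
# whereas aAF varies independently of it
rng = np.random.default_rng(1)
n = 100
vpmc = rng.normal(size=n)
ba44 = vpmc + 0.1 * rng.normal(size=n)
m1 = vpmc + 0.1 * rng.normal(size=n)
aaf = rng.normal(size=n)
names = ["BA44", "vPMC", "M1", "aAF"]
X = np.column_stack([ba44, vpmc, m1, aaf])

X_reduced, kept = drop_correlated(X, names, target="vPMC", protected=("aAF",))
```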
Finally, although one case (PS0571) yielded a standardised residual greater than 3, it did not exert undue influence over the model (i.e. Cook's distance = 0.11), giving us no cause for concern.

Supplementary Figure 6. Transcripts of the spoken picture description responses for two exemplar patients. The picture that is shown to participants during the spoken picture description task from the CAT is provided along with the transcripts of the spoken responses for two exemplar patients from the BA44 group or aAF group. These patients had >50% damage to the BA44 (PS2856) or aAF (PS0364) regions of interest and similarly sized lesions (62.2 cm³ and 58.2 cm³). In PS0364 (with aAF damage), speech production is severely impaired. In contrast, PS2856 (with BA44 damage) is able to select and combine words to describe the scene in detail but misses an important point of the story (i.e. that the boy is trying to warn the sleeping man that books are about to land on his head).

For all replications, the following regressors of no interest were included: (i) total lesion volume, (ii) months post-stroke, (iii) age at stroke and (iv) scores from the CAT semantic memory task. Regressor affected by multicollinearity (i.e. VIF > 10). Multicollinearity was handled by re-running the regression after removing the regressors of interest (other than aAF) with which the affected regressor was highly correlated (i.e. r > 0.80). All these sanity checks confirmed that aAF continued to be the only significant anatomical predictor of speech production scores.

Supplementary
The table shows how the number of patients (n) in each group drops rapidly with increasing damage thresholds. This empirical observation led us to choose the damage threshold that maximised between-group differences in the degree of damage to BA44 and aAF, while ensuring sufficient statistical power to match total lesion volume across groups and conduct formal statistical comparisons. Lesion volume is expressed in cm³. Lesion load in BA44 and aAF is specified in terms of percentage of damage. Numbers indicate mean (±standard deviation). SPD = spoken picture description T-score. Aphasic score on SPD is 60 or below. The three selected patient groups did not significantly differ in terms of age at stroke, age at scan, months post-stroke and total lesion volume (all p > 0.45).

P-values and standardised beta coefficients for the regressors of no interest: (i) LVol = total lesion volume, (ii) TimePS = time post-stroke in months, (iii) Age = age at stroke, and (iv) SemM = scores from the semantic memory task. See Table 4 for regressors of interest. Regressor affected by multicollinearity (i.e. VIF > 10); see supplementary section entitled "Checking the assumptions of multiple regression" for details.

P-values correspond to the comparison of the BA44 group versus aAF group. Mean task scores that fell within the impaired range for the BA44 group and/or aAF group are highlighted in bold. The cut-off T-score signals the upper bound of the impaired range for that particular task. At an uncorrected statistical threshold of p < 0.05, the aAF group performed worse than the BA44 group on the following tasks: repetition of words, repetition of nonwords and reading words. The BA44 group did not perform worse than the aAF group on any of the tasks.