Tamar I Regev, Hee So Kim, Xuanyi Chen, Josef Affourtit, Abigail E Schipper, Leon Bergen, Kyle Mahowald, Evelina Fedorenko, High-level language brain regions process sublexical regularities, Cerebral Cortex, Volume 34, Issue 3, March 2024, bhae077, https://doi.org/10.1093/cercor/bhae077
Abstract
A network of left frontal and temporal brain regions supports language processing. This “core” language network stores our knowledge of words and constructions as well as constraints on how those combine to form sentences. However, our linguistic knowledge additionally includes information about phonemes and how they combine to form phonemic clusters, syllables, and words. Are phoneme combinatorics also represented in these language regions? Across five functional magnetic resonance imaging experiments, we investigated the sensitivity of high-level language processing brain regions to sublexical linguistic regularities by examining responses to diverse nonwords—sequences of phonemes that do not constitute real words (e.g. punes, silory, flope). We establish robust responses in the language network to visually (experiment 1a, n = 605) and auditorily (experiments 1b, n = 12, and 1c, n = 13) presented nonwords. In experiment 2 (n = 16), we find stronger responses to nonwords that are more well-formed, i.e. obey the phoneme-combinatorial constraints of English. Finally, in experiment 3 (n = 14), we provide suggestive evidence that the responses in experiments 1 and 2 are not due to the activation of real words that share some phonology with the nonwords. The results suggest that sublexical regularities are stored and processed within the same fronto-temporal network that supports lexical and syntactic processes.
Introduction
Languages contain rich statistical patterns across a range of information scales—from inter-word dependencies, to meanings of individual words and morphemes, to patterns of sounds within words—but whether linguistic information at different scales is represented and processed by overlapping or distinct cognitive and neural mechanisms remains debated. Traditionally, a distinction has been drawn between “high-level” linguistic processes, such as syntax and lexical semantics, and phonological processing, which was considered lower level and thus assumed to rely on distinct cognitive and neural machinery (e.g. Chomsky 1965, 1995; Chomsky and Halle 1965; Bromberger and Halle 1989; Pinker 1991; Heinz and Idsardi 2011, 2013; Berwick and Chomsky 2016; see Matchin 2018 for recent implementation-level claims about the separation between phonological and higher-level processes). However, some linguistic theories have suggested a more integrated view of language processing, where the boundaries between our processing of sentence structure, word meanings, and sublexical sound patterns are less sharp (e.g. Gaskell and Marslen-Wilson 1997; Bybee 1999, 2013; Goldberg 2003; Jackendoff 2007; Huettig et al. 2020; Jackendoff and Audring 2020).
In support of this integrated view of language processing, corpus investigations across diverse languages have revealed strong relationships between sound patterns and other aspects of language. For instance, more frequent words tend to be more phonotactically regular, i.e. obeying the phoneme-combinatorial constraints of the language (e.g. Zipf 1936; Landauer and Streeter 1973; Frauenfelder et al. 1993; Mahowald et al. 2018; Pimentel et al. 2020), phonological clustering may be one organizing principle of the lexicon (e.g. Dautriche et al. 2017), and some sounds/sound patterns appear to be associated with aspects of meaning (e.g. Iwasaki et al. 2007; Monaghan et al. 2014; Larsson 2015; Blasi et al. 2016; Winter et al. 2017; Sidhu and Pexman 2018; Pimentel et al. 2019; Vinson et al. 2021). Further, sound patterns can differentiate syntactic categories, like nouns and verbs (e.g. Kelly 1992; Albright 2008; Arciuli and Monaghan 2009; Arciuli et al. 2012). These links between sound patterns and other aspects of linguistic structure and meaning may be particularly important for language acquisition as linguistic input is initially perceived as a meaningless sequence of sounds that the language system attempts to interpret. Indeed, early word learning is facilitated by sound–meaning associations or iconicity (Perry et al. 2018) and by knowledge of phonotactic regularities (Storkel 2001; Coady and Aslin 2004; Dautriche et al. 2015; de Carvalho et al. 2016; Jones et al. 2021); this knowledge continues to facilitate lexical access in adulthood (e.g. Vitevitch et al. 1999; Vitevitch and Luce 1999; Luce and Large 2001).
Does strong integration between sound patterns and lexical or syntactic features mean that—at the implementation level—the system that processes words and sentences (i.e. supports computations related to lexical access, syntactic structure building, and semantic composition) also processes sublexical sound patterns? Past neuroscience research has not provided a clear answer. Prior neuroimaging investigations have reported effects for phonological manipulations in diverse left-hemisphere (or bilateral) brain areas, including superior temporal gyrus (e.g. Paulesu et al. 1993; Price et al. 1997; Okada and Hickok 2006; Graves et al. 2007, 2008; DeWitt and Rauschecker 2012; Gow and Olson 2015; Lopopolo et al. 2017; Scott and Perrachione 2019), supramarginal gyrus (e.g. Paulesu et al. 1993; Celsis et al. 1999; Church et al. 2011; Weiss et al. 2018; Yen et al. 2019), and inferior frontal cortex (e.g. Paulesu et al. 1993; Demonet et al. 1994; Poldrack et al. 1999; Burton 2001; Myers et al. 2009; Vaden et al. 2011; Okada et al. 2017; Xie and Myers 2018). Similarly, lesions in these different brain areas (e.g. Geva et al. 2011; Pillay et al. 2014; Kries et al. 2023), as well as their interruption by electric/magnetic stimulation (e.g. Devlin et al. 2003; Boatman 2004; Hartwigsen et al. 2016), have been shown to lead to impairments on phonological tasks, like rhyme judgments, nonword repetition, or phoneme identification.
Some of the brain areas implicated in phonological processing appear to overlap with the “core” language network—a set of left-lateralized frontal and temporal areas that selectively respond to linguistic input, visual or auditory (e.g. Fedorenko et al. 2011; Monti et al. 2012) and support the processing of word forms and meanings and combinatorial syntactic and semantic processes (e.g. Bozic et al. 2010; Fedorenko et al. 2010, 2020; Bautista and Wilson 2016). However, inferences about shared vs. distinct neural mechanisms based on the similarity of gross anatomical locations across studies are problematic (e.g. Poldrack 2006; Fedorenko 2021). Furthermore, most past studies of phonological processing have employed tasks that differ in their computational demands from those of naturalistic language processing, where the goal is to simply extract meaning from the linguistic input. Some studies have required (overt or covert) speech production and may have therefore recruited the speech articulation system (e.g. Bohland and Guenther 2006; Basilakos et al. 2017), and others have used tasks with executive demands (e.g. rhyme judgments) and may have therefore recruited domain-general executive resources (see, e.g. Diachek et al. 2020; Quillen et al. 2021 for evidence that the executive control system gets engaged when language comprehension is accompanied by extraneous tasks).
To provide a clearer answer about whether the system that supports lexical and word-combinatorial processing is sensitive to sublexical sound patterns, we functionally defined the language network using an established language “localizer” task (Fedorenko et al. 2010) and then examined these brain regions’ responses to nonwords—sequences of phonemes that do not constitute real English words—during relatively naturalistic reading/listening across five fMRI experiments. It is important to note that although we define the language regions as regions that respond more strongly during sentence processing compared to the processing of nonwords (or similar control conditions; Methods), this definition does not entail that the response to nonwords would be negligible. In fact, we have previously observed that the response to nonwords in these language regions is consistently above a low-level fixation baseline (Fedorenko et al. 2010; Blank et al. 2016; Fedorenko and Blank 2020). We here formally investigate this effect.
To foreshadow the key findings, visually and auditorily presented nonwords elicited robust responses across the language network despite their lack of meaning and lack of ability to combine into larger units like phrases. Further, nonwords that were more well-formed elicited stronger responses than less well-formed ones, which suggests that the language network represents and processes phoneme-combinatorial regularities. We further provide suggestive evidence that the response to nonwords in the language network is not merely due to the activation of representations of real words that share some phonology with the nonwords, thus strengthening the claim that sublexical meaningless units are processed by the same system that processes words, phrases, and sentences.
Materials and methods
Participants
In total, 620 individuals (aged 18 to 71, mean 24.9 ± 7.3; 358 [57.7%] females) from the Cambridge/Boston, MA community participated for payment across five fMRI experiments (n = 605 in experiment 1a, n = 12 in experiment 1b, n = 13 in experiment 1c, n = 16 in experiment 2, and n = 14 in experiment 3, for a total of 660 scanning sessions; Table 1). For experiment 1a, we leveraged a large dataset that was collected in our lab across 10+ years (Lipkin et al. 2022). Forty participants overlapped between experiment 1a and other experiments (12, 14, and 14 with experiments 1c, 2, and 3, respectively; Table 1), and 4 participants overlapped between experiments 2 and 3. 558 participants (~90%; see Table 1 for numbers per experiment) were right-handed, as determined by the Edinburgh handedness inventory (Oldfield 1971) or by self-report; the remaining participants were either left-handed (n = 40), ambidextrous (n = 14), or missing handedness information (n = 8; see Willems et al. 2014 for arguments for including left-handers in cognitive neuroscience research). All participants were native English speakers, and all gave written informed consent to participate in our experiments in accordance with the requirements of Massachusetts Institute of Technology (MIT)’s Committee on the Use of Humans as Experimental Subjects (COUHES), which approved the study.
Table 1. Participant information for each experiment.

| Experiment | 1a | 1b | 1c | 2 | 3 |
| --- | --- | --- | --- | --- | --- |
| Participants | 605 | 12 | 13 | 16 | 14 |
| Females | 349 | 7 | 6 | 11 | 11 |
| Age in years, mean (SD) | 24.9 (7.3) | 23.2 (4) | 24.7 (6.7) | 22.2 (7.1) | 25.1 (7.6) |
| Left-handed (ambidextrous) | 36 (13) | 0 (0) | 1 (0) | 2 (0) | 1 (2) |

Participants overlapping between experiments:

| Experiment | 1b | 1c | 2 | 3 |
| --- | --- | --- | --- | --- |
| 1a | 0 | 12 | 14 | 14 |
| 1b | | 0 | 0 | 0 |
| 1c | | | 0 | 0 |
| 2 | | | | 4 |
Design, materials, and procedure
All experiments—overview
In all experiments, we examined responses to nonwords—meaningless sound/letter strings—in the high-level language system. Therefore, in all experiments, each participant completed an fMRI reading-based language network “localizer” task, based on contrasting fMRI responses between reading sentences and reading nonword sequences, as detailed below. This localizer was previously shown to be robust to modality—the same brain regions are found in reading-based and listening-based versions of the localizer (e.g. Fedorenko et al. 2010; Scott et al. 2017; Chen et al. 2021; Malik-Moraleda et al. 2022). Participants also completed one or more tasks, including the critical experimental task (the main task used for each experiment in this study) and, in most cases, other tasks for unrelated studies. The total session duration was typically around 2 h.
The purpose of experiments 1a, 1b, and 1c was to examine the general robustness of responses to nonwords within the language network, across the visual (reading) and auditory (listening) modalities. The nonwords in experiments 1a, 1b, and 1c, as well as in the language localizer, were all constructed to meet the phoneme-combinatorial constraints of English (and thus to sound relatively well-formed) using slightly different methods, as detailed below for each experiment. The purpose of experiments 2 and 3 was to examine how phonological characteristics of the nonwords affected neural responses. In experiment 2, we manipulated nonword well-formedness (which correlates with phonotactic probability), and in experiment 3 we manipulated the neighborhood density of nonwords (i.e. the number of real words that are phonologically similar to the nonword; Vitevitch et al. 1999), as detailed below.
Reading-based language network localizer
This task was originally designed to elicit robust responses in the high-level language network, as described in detail in Fedorenko et al. (2010) and subsequent studies from the Fedorenko lab (and is available for download from https://evlab.mit.edu/funcloc/). In this task, participants read sentences and lists of unconnected pronounceable nonwords in a blocked design and were asked to press a button at the end of each trial, when a special symbol appeared, to maintain alertness. The words or nonwords appeared on the screen one at a time. The vast majority of participants (605 out of 620) performed a version of the localizer where the nonwords were created using the Wuggy software (Keuleers and Brysbaert 2010), to match their phonotactic properties to those of the words used in the sentence condition. See further details in Supplementary Information Section 1 (Timing and details of stimulus presentation). The Sentences > Nonwords contrast targets brain regions that support high-level language comprehension, including lexico-semantic and combinatorial (syntactic and compositional semantic) processes (e.g. Fedorenko et al. 2010, 2020; Fedorenko et al. 2012b; Blank et al. 2016), and has been shown to be robust to changes in modality (visual/auditory), materials, task, timing parameters, and other aspects of the procedure (e.g. Fedorenko et al. 2010; Fedorenko 2014; Mahowald and Fedorenko 2016; Scott et al. 2017; Diachek et al. 2020). As such, this specific contrast is standardly used in our lab as a language localizer contrast, but many similar contrasts work equally well (see Fedorenko et al. in press, for a review).
Experiment 1a (passive reading of lists of nonwords from the language localizer)
To examine the robustness of responses to visually presented nonwords in the language regions, we used the nonwords condition from the reading-based language network localizer. Response magnitudes were estimated using cross-validation across experimental runs to ensure that the data used for the localization of the language regions were independent from the data used to estimate the responses to nonwords in this critical task (e.g. Kriegeskorte et al. 2009). The cross-validation was performed in the following way: First, run 1 was used to define the regions of interest and run 2 to estimate the responses (each participant performed 2 runs of the task); then, run 2 was used to define the regions and run 1 to estimate the responses; finally, the estimates were averaged across the two runs to derive a single estimate per participant per region. As noted above (Reading-based language network localizer), the nonwords were accompanied by a simple button-press task to maintain alertness. Behavioral responses for this and all other experiments are summarized in a supplementary table available at the Open Science Framework (OSF) platform (https://osf.io/6c2y7/). Example items are shown in Fig. 1.
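For concreteness, the two folds of this procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes flattened numpy arrays holding each run's t-map for the localizer contrast and each run's PSC map for the nonwords condition, and the function names are ours.

```python
import numpy as np

def top_voxels(tmap, mask, frac=0.10):
    """Indices of the top `frac` of voxels within `mask`, ranked by localizer t-value."""
    idx = np.flatnonzero(mask)
    n = max(1, int(round(frac * idx.size)))
    return idx[np.argsort(tmap[idx])[-n:]]

def cross_validated_psc(t_run1, t_run2, psc_run1, psc_run2, mask):
    """Define the fROI on one run, estimate the response on the other, average the folds."""
    est_a = psc_run2[top_voxels(t_run1, mask)].mean()  # define on run 1, estimate on run 2
    est_b = psc_run1[top_voxels(t_run2, mask)].mean()  # define on run 2, estimate on run 1
    return 0.5 * (est_a + est_b)
```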

Procedure and example stimuli for all experiments. Color-filled rounded rectangles represent a typical block or trial in a specific condition of each experiment. The color codes match those used in Fig. 2. Experiment numbers and conditions are indicated above each rectangle. Left to right, top to bottom (see Methods for further detail): Exp 1a—passive reading of lists of nonwords from the language localizer. Exp 1b—listening to lists of nonwords followed by a memory probe. Exp 1c—passive listening to lists of nonwords. Exp 2—reading of lists of nonwords parametrically varying in well-formedness, followed by a memory probe. Exp 3—reading of lists of nonwords with a low or high phonological neighborhood, accompanied by repetition detection (the experiments are not ordered due to their ordinal numbers to preserve space in the figure).
Experiment 1b (listening to lists of nonwords followed by a memory probe)
To examine the robustness of responses to auditorily presented nonwords in the language regions, we used the nonwords condition from an auditory language experiment that was published previously (experiment 3 in Fedorenko et al. 2010). Participants listened to recordings of lists of nonwords (and materials from three other conditions that are not relevant to the current study) in a blocked design and, at the end of each trial, judged whether a probe word/nonword appeared in the trial. The nonwords were constructed by recombining the syllables that comprised the words in the real-word conditions of the experiment (to preserve phonotactic well-formedness) and were recorded by a female native English speaker (see Fedorenko et al. 2013b for a detailed acoustic analysis of these materials). Example items are shown in Fig. 1. See further details in Supplementary Information Section 1 (Timing and details of stimulus presentation).
Experiment 1c (passive listening to lists of nonwords)
To replicate and generalize the results from experiment 1b, we used a nonwords condition from another auditory experiment (experiment 4 in Chen et al. 2023). Participants passively listened to recordings of lists of nonwords (and materials from several other conditions that are not relevant to the current study) in a blocked design. The nonword lists were constructed by taking a set of sentences and replacing each word with a nonword that has a similar phonological structure (taking into account consonant–vowel structure, consonant class, vowel class, and rhythmicity) but that does not have any meaning. These “nonword sentences” were recorded by a female and a male native English speaker. In the experiment, half of the trials came from the female speaker, and the other half from the male speaker. Example items are shown in Fig. 1, and the full list of materials is available at OSF (https://osf.io/6c2y7/). See further details in Supplementary Information Section 1 (Timing and details of stimulus presentation).
Experiment 2 (reading lists of nonwords—that vary in their well-formedness—followed by a memory probe)
To test whether more well-formed nonwords would elicit a stronger response in the language regions, participants read lists of real words and nonwords (and materials from five other conditions that are not relevant to the current study) in a blocked design. The nonwords were created from real words via one or multiple letter replacements, as detailed below. The original words and different resulting versions of nonwords were grouped into five conditions based on well-formedness ratings, which were obtained in a behavioral norming study conducted online, with independent participants, as described below. Example items are shown in Fig. 1, and all items are available at OSF (https://osf.io/6c2y7/). See further details in Supplementary Information Section 1 (Timing and details of stimulus presentation).
Construction and norming of the materials
To create the nonwords, a large set of real trisyllabic English words (n = 20,695) was first identified. For each word, 14 versions of nonwords were created by iteratively replacing random letters with other letters, while ensuring that the local trigram context (the letter preceding the critical letter, the critical replaced letter, and the letter following it) is attested in English (i.e. appears in at least one real word). For example, consider the word “BLACKBERRY”; the letter C could be replaced with the letter R because the string “ARK” is attested (e.g. BARK), or with the letter L because the string “ALK” is attested (e.g. ALKALINE), but not with the letter X because the string “AXK” is not attested. This replacement process was repeated on the resulting nonword (e.g. BLARKBERRY in this example, if R was used as the replacement for C) using the same constraints, up to 14 times total. This procedure resulted in a set of 310,425 words and nonwords including the original words and all the resulting nonwords from the 14 letter-replacement iterations done on each word. A subset of these materials (n = 900, sampled ~equally from the 15 “levels” of degradation, i.e. number of replaced letters, between 0 and 14) were presented to participants online via Amazon.com’s Mechanical Turk platform. Participants were presented with one word/nonword at a time and asked to rate each for how well-formed it was (the exact wording was: “How close is this to being an acceptable English word?”), on a scale from 1 (very unacceptable) to 5 (very acceptable). The words/nonwords were then divided into five sets according to the well-formedness ratings, from least to most well-formed: 1 to 1.5, 1.5 to 2, 2 to 3, 3 to 4.5, and 4.5 to 5. The bin sizes were determined by the distribution of ratings, to balance the number of items within each set (condition). Each set consisted of 180 items, except for the most well-formed set, for which there were only 173 items. The most well-formed set consisted of mostly real words, and the other four sets consisted exclusively of nonwords. Fifteen 12-item strings were created from these materials for each of the five conditions for presentation in a blocked design experiment (for the most well-formed condition, seven of the items were used twice, never within the same string).
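The letter-replacement procedure lends itself to a compact implementation. The sketch below is ours, under stated assumptions (an uppercase word list `lexicon`; replacements restricted to word-internal positions, since only interior letters have a full trigram context), and is meant to illustrate the constraint, not reproduce the published materials.

```python
import random
import string

def attested_trigrams(lexicon):
    """All letter trigrams that appear in at least one real word."""
    return {w[i:i + 3] for w in lexicon for i in range(len(w) - 2)}

def degrade(word, n_replacements, trigrams, rng=random):
    """Iteratively replace random letters, keeping each local trigram attested."""
    s = list(word)
    for _ in range(n_replacements):
        candidates = [
            (pos, letter)
            for pos in range(1, len(s) - 1)          # word-internal positions only
            for letter in string.ascii_uppercase
            if letter != s[pos] and s[pos - 1] + letter + s[pos + 1] in trigrams
        ]
        if not candidates:
            break
        pos, letter = rng.choice(candidates)
        s[pos] = letter
    return "".join(s)

# e.g. degrade("BLACKBERRY", 1, attested_trigrams(lexicon)) could return "BLARKBERRY",
# because "ARK" is attested (as in BARK), whereas "AXK" would never be produced.
```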
Participants in experiment 2 also performed a non-linguistic (spatial working memory) task (Fedorenko et al. 2013a). Data from this task were used in a control analysis, as a comparison to the critical nonword reading task (Methods). In this task, participants viewed a grid within which locations were randomly flashed sequentially (one at a time for a total of four locations in the easy condition and two at a time for a total of eight locations in the hard condition). At the end of the trial, participants had to indicate the locations they just saw by selecting one of two options via a button press, followed by feedback on the correctness of their response. The Hard > Easy contrast engages the multiple demand (MD) network, which is robustly distinct from the language network (Fedorenko et al. 2013a). See further details in Supplementary Information Section 1 (Timing and details of stimulus presentation).
Experiment 3 (reading lists of nonwords with a low or high phonological neighborhood, accompanied by repetition detection)
Previous work has shown that phonotactic regularity of nonwords (which correlates with perceived well-formedness, as manipulated in experiment 2) tends to correlate with phonological neighborhood (defined as the number of real words that are one edit away from the nonword) (e.g. Vitevitch et al. 1999; Vitevitch and Luce 1999), but these two factors can be disentangled (e.g. Luce and Large 2001). Furthermore, behavioral investigations have suggested that experimental manipulations of phonotactic regularities target local phoneme-combinatorial pattern processing (i.e. sublexical processing) whereas manipulations of phonological neighborhood emphasize holistic (lexical) recognition of phoneme strings (Vitevitch and Luce 1999; Luce and Large 2001). We therefore wanted to test whether the results obtained in experiment 2 (stronger responses to more well-formed nonwords) could be due to activation of real words that sound similar to the nonwords (neighbors), instead of perception of the phonological structure of the nonwords themselves. Stronger neural responses to nonwords with more phonological neighbors compared to nonwords with fewer neighbors would support this possibility. Participants read lists of nonwords that were matched on phonotactic probability and other phonological characteristics (as described below) but critically varied in their phonological neighborhood size in a blocked design and were instructed to press a button when a nonword repeated in a row. See further details in Supplementary Information Section 1 (Timing and details of stimulus presentation).
Construction of the materials
To construct two sets of nonwords that are matched on phonotactic probability and other phonological characteristics but vary in their phonological neighborhood size, a 3-gram model over phonemes was used, following the generative procedure described in Dautriche et al. (2017). In particular, each phoneme is generated probabilistically, conditioned on the preceding two phonemes. Using this model, a large set of candidate nonwords was sampled without replacement. Then, 80 pairs of nonwords were selected such that they were matched on length in letters and syllables, on the consonant–vowel patterns, and on phonotactic probability, as measured with a pronunciation-based phonotactic (“BLICK”) score (e.g. Hayes and Wilson 2008; Hayes 2012) [a two-sample t-test of BLICK scores between the sets: t(158) = 0.05, P = 0.96], but critically differed maximally in their phonological neighborhood size. Neighborhood size was estimated as the number of real English words that are one edit away from the nonword. For example, phonological neighbors of the nonword “ZAT” include “BAT,” “CAT,” and “ZAP,” among others (although phonological neighborhood size has some limitations as a measure, such as treating all letter positions equally [cf. Marslen-Wilson 1987; Wedel et al. 2019 for evidence of letter-position effects in word recognition], it is standardly used and has been shown to underlie many behavioral effects in word/nonword processing [for a review, see Vitevitch and Luce 2016]). In the high-neighborhood set, each nonword had at least nine neighbors (mean = 11, SD = 2.6), and in the low-neighborhood set, each nonword had at most three neighbors [mean = 1.85, SD = 0.9, two-sample t-test of neighborhood scores between the sets: t(158) = 28.7, P < 0.0001]. Example items are shown in Fig. 1, and all items are available at https://osf.io/6c2y7/.
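Both ingredients of this construction, sampling from a phoneme 3-gram model and counting one-edit neighbors, can be sketched as follows. This is an illustrative reimplementation under our own assumptions (trigram probabilities keyed by the preceding two symbols, "#" as a boundary symbol, phonemes represented as single characters), not the generation code actually used.

```python
import random

def sample_nonword(trigram_probs, lexicon, max_len=10, rng=random):
    """Generate phonemes one at a time, conditioned on the preceding two."""
    seq = ["#", "#"]                                    # start-of-word padding
    while len(seq) - 2 < max_len:
        dist = trigram_probs[(seq[-2], seq[-1])]        # P(next | previous two)
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        if nxt == "#":                                  # end-of-word symbol
            break
        seq.append(nxt)
    cand = "".join(seq[2:])
    return cand if cand and cand not in lexicon else None

def neighborhood_size(nonword, lexicon, alphabet):
    """Real words exactly one substitution, insertion, or deletion away."""
    neighbors = set()
    for i in range(len(nonword)):
        neighbors.add(nonword[:i] + nonword[i + 1:])              # deletion
        for ph in alphabet:
            neighbors.add(nonword[:i] + ph + nonword[i + 1:])     # substitution
    for i in range(len(nonword) + 1):
        for ph in alphabet:
            neighbors.add(nonword[:i] + ph + nonword[i:])         # insertion
    neighbors.discard(nonword)
    return len(neighbors & set(lexicon))
```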
Phonotactic probability and neighborhood size of stimuli in experiments 2 and 3
To investigate which features of the nonword stimuli may contribute to neural responses in the language system, we calculated the phonotactic probability and neighborhood size of the stimuli in a unified manner across experiments 2 and 3. We used the English Lexicon Project (https://elexicon.wustl.edu/, Balota et al. 2007) for both measures. This website allows one to submit lists of written nonwords and outputs a series of characteristics calculated based on an English corpus (Balota et al. 2007). The phonotactic probability measure was computed as the mean bigram frequency, which is the sum of bigram counts (where a bigram is a sequence of two letters like ZA and AT in ZAT) for all the local bigrams within a nonword, divided by the number of bigrams. The neighborhood size measure was computed as the number of real words that can be obtained by changing one letter while preserving the identity and positions of the other letters (i.e. Coltheart’s N; Coltheart et al. 1977). We chose to use orthography-based measures and not phonology-based measures (as we originally did when designing the materials for experiment 3) because (a) the stimuli were visually presented to the participants and (b) the pronunciation of many English nonwords is inherently ambiguous because of the non-transparency of English spelling (e.g. the nonword KLOUGH could be pronounced to rhyme with through, trough, or tough). However, our results are robust to whether we use orthography- or phonology-based measures (e.g. the correlation between orthographic and phonological neighborhood size measures for the nonwords in experiment 3 is r = 0.55, P < 0.001). Having obtained the phonotactic probability and neighborhood size measures for the nonwords in experiments 2 and 3, we calculated the average and standard error across all nonwords in each condition (five conditions varying in well-formedness in experiment 2 and two conditions varying in neighborhood size in experiment 3).
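Both measures are straightforward to compute; below is a minimal sketch, assuming a dictionary of corpus bigram counts and a set of real words standing in for the English Lexicon Project statistics (which we queried via the website rather than recomputing).

```python
def mean_bigram_frequency(s, bigram_counts):
    """Sum of corpus counts of all local bigrams, divided by the number of bigrams."""
    bigrams = [s[i:i + 2] for i in range(len(s) - 1)]
    return sum(bigram_counts.get(b, 0) for b in bigrams) / len(bigrams)

def coltheart_n(s, lexicon, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Coltheart's N: real words differing from `s` by exactly one substitution."""
    return sum(
        1
        for i in range(len(s))
        for letter in alphabet
        if letter != s[i] and s[:i] + letter + s[i + 1:] in lexicon
    )

# e.g. coltheart_n("zat", {"bat", "cat", "zap"}) == 3  (BAT, CAT, ZAP)
```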
fMRI data acquisition
Experiments 1a, 1c, 2, and 3
Whole-brain structural and functional data were collected on a whole-body 3 Tesla Siemens Trio scanner with a 32-channel head coil at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. T1-weighted structural images were collected in 176 axial slices with 1 mm isotropic voxels (repetition time (TR) = 2,530 ms; echo time (TE) = 3.48 ms). Functional, blood oxygenation level-dependent (BOLD) data were acquired using an EPI sequence with a 90° flip angle and using GRAPPA with an acceleration factor of 2; the following parameters were used: thirty-one 4.4 mm thick near-axial slices acquired in an interleaved order (with 10% distance factor), in-plane resolution of 2.1 mm × 2.1 mm, FoV of 200 mm in the phase-encoding anterior-to-posterior (A ≫ P) direction, matrix size 96 × 96 voxels, TR = 2,000 ms, and TE = 30 ms. The first 10 s of each run were excluded to allow for steady-state magnetization.
Experiment 1b
This experiment had distinct data acquisition parameters because it was conducted at an earlier point in time (2008 to 2009). Whole-brain structural and functional data were collected on the whole-body 3 Tesla Siemens Trio scanner at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. T1-weighted structural images were collected in 128 axial slices with 1.33 mm isotropic voxels (TR = 2,000 ms, TE = 3.39 ms). Functional BOLD data were acquired in 3.1 × 3.1 × 4 mm voxels (TR = 2,000 ms, TE = 30 ms) in 32 near-axial slices. The first 4 s of each run were excluded to allow for steady-state magnetization.
fMRI data preprocessing
fMRI data were analyzed using SPM12 (release 7487), the CONN EvLab module (release 19b), and custom MATLAB scripts. Each participant’s functional and structural data were converted from DICOM to NIFTI format. All functional scans were coregistered and resampled using B-spline interpolation to the first scan of the first session (Friston et al. 1995). Potential outlier scans were identified from the resulting subject-motion estimates as well as from BOLD signal indicators using default thresholds in the CONN preprocessing pipeline (5 SD above the mean in global BOLD signal change, or framewise displacement values above 0.9 mm; Nieto-Castañón 2020). Functional and structural data were independently normalized into a common space (the Montreal Neurological Institute [MNI] template; IXI549Space) using the SPM12 unified segmentation and normalization procedure (Ashburner and Friston 2005) with a reference functional image computed as the mean functional data after realignment across all timepoints, omitting outlier scans. The output data were resampled to a common bounding box between MNI-space coordinates (−90, −126, −72) and (90, 90, 108), using 2 mm isotropic voxels and 4th-order spline interpolation for the functional data, and 1 mm isotropic voxels and trilinear interpolation for the structural data. Lastly, the functional data were spatially smoothed with a 4 mm FWHM Gaussian kernel.
fMRI data first-level modeling
Effects were estimated using a general linear model (GLM) in which each experimental condition was modeled with a boxcar function convolved with the canonical hemodynamic response function (HRF) (fixation was modeled implicitly). Temporal autocorrelations in the BOLD signal timeseries were accounted for by a combination of high-pass filtering with a 128 s cutoff, and whitening using an AR(0.2) model (first-order autoregressive model linearized around the coefficient a = 0.2) to approximate the observed covariance of the functional data in the context of restricted maximum likelihood estimation (ReML). In addition to main condition effects, other model parameters in the GLM design included first-order temporal derivatives for each condition, modeling spatial variability in the HRF delays, as well as nuisance regressors controlling for the effect of slow linear drifts, subject-motion parameters, and potential outlier scans on the BOLD signal.
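To illustrate the core of this model, the sketch below builds a single condition regressor as a boxcar convolved with a double-gamma HRF (a common approximation to SPM's canonical HRF; the exact SPM12 basis, temporal derivatives, and nuisance terms are omitted). It is a schematic, not the analysis code.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr=2.0, duration=32.0):
    """Double-gamma HRF sampled at the TR: early peak minus a late undershoot."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()

def condition_regressor(onsets_s, block_dur_s, n_scans, tr=2.0):
    """Boxcar over a condition's blocks, convolved with the canonical HRF."""
    boxcar = np.zeros(n_scans)
    for onset in onsets_s:                     # block onsets in seconds
        start = int(onset / tr)
        boxcar[start:start + int(block_dur_s / tr)] = 1.0
    return np.convolve(boxcar, canonical_hrf(tr))[:n_scans]
```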
Definition of the language functional regions of interest
For each critical experiment, we first defined a set of language functional regions of interest (fROIs) using an established language localizer (Fedorenko et al. 2010), which identifies a set of brain regions that respond strongly and selectively during language processing and form an integrated functional network, which we refer to as the “language regions” or the “language network” (Fedorenko et al. in press). To define the fROIs, we used a group-constrained, subject-specific (GcSS) approach (Fedorenko et al. 2010), where each individual map for the Sentences > Nonwords localizer contrast was intersected with a set of five binary left-hemisphere masks. These masks (Fig. 2; available at http://web.mit.edu/evlab//funcloc/#parcels) were derived from a probabilistic activation overlap map for the same contrast in a large set of participants (n = 220) using watershed parcellation, as described in Fedorenko et al. (2010) for a smaller set of participants. Within each mask, a participant-specific language fROI was defined as the top 10% of voxels with the highest t-values for the localizer contrast (see Lipkin et al. 2022 for evidence that the language fROIs are similar when defined with a fixed statistical threshold). Effect sizes for the critical tasks were then estimated in the language fROIs by averaging across the voxels within each participant-specific fROI [see also Analyses of the critical tasks (all experiments) below]. For completeness, we also defined (i) homotopic right-hemisphere language fROIs using the same voxel selection procedure within the mirrored versions of the LH masks (see Supplementary Information Section 3, Fig. S1), as well as (ii) bilateral language fROIs in the angular gyrus (see Supplementary Information Section 4, Fig. S2). Both of these sets of areas are activated by the language localizer contrast but have been shown to dissociate from the core frontal and temporal LH language areas (e.g. Shain, Paunov, Chen et al. 2023).

Responses of the left-hemisphere language network to nonwords in all experiments. Bar graphs show % BOLD signal change relative to a fixation baseline in individually defined language fROIs averaged across participants in each specific experiment (number of participants specified on abscissa). Here and elsewhere, error bars denote standard errors of the mean by participants, and dots are individual participants. The brain images display the masks that were used to define the fROIs; individual fROIs are 10% of most language-responsive voxels within each mask; the effects are estimated using data that are independent from the data used to define the fROIs. A) Responses in all five language fROIs together. B) Responses in each fROI separately. IFGorb, inferior frontal gyrus orbital, IFG, inferior frontal gyrus, MFG, medial frontal gyrus, AntTemp, anterior temporal, PostTemp, posterior temporal (see Fig. S1 for responses in the right-hemisphere homotopic language fROIs, and Fig. S2 for responses in the language fROIs located in the bilateral angular gyri).
A whole-brain search for areas sensitive to nonword well-formedness (Experiment 2)
In addition to a targeted analysis of the language regions, we searched across the brain for regions that process phoneme-combinatorial regularities and thus exhibit sensitivity to the nonword well-formedness gradient in the critical task in experiment 2. To do so, we used a group-constrained subject-specific (GcSS) analysis, which is more sensitive than a traditional fMRI group analysis given that it takes into account inter-individual variability in the precise locations of functional areas. Using data from experiment 2, we defined a contrast based on the parametric well-formedness manipulation: GradientW = −1*W1 − 0.5*W2 + 0*W3 + 0.5*W4 + 1*W5, where W1 to W5 are the five conditions in experiment 2 (nonwords that vary in well-formedness from low to high; condition W5 almost exclusively consists of real words). Individual activation maps for this contrast were binarized in the following way: voxels that passed the significance threshold of P < 0.01 for the parametric contrast above were denoted as 1 and the rest of the voxels as 0 (the use of a relatively liberal threshold at this stage is acceptable because it is only used to create probabilistic maps; the statistical tests are performed at a later stage with independent data, as explained next). Individual binarized activation maps were overlaid to create a probabilistic activation overlap map, which was then parcellated using a watershed algorithm, as described in Fedorenko et al. (2010) (a custom MATLAB toolbox for this procedure is available at https://evlab.mit.edu/funcloc), to identify areas of common activation across participants. Using the resulting masks, we then defined individual fROIs as the top 10% of voxels responding to the contrast above in each individual (similar to how the language fROIs were defined, using the group-level language masks and the top 10% of voxels responding to Sentences > Nonwords within the masks in each individual). Then, using an across-runs cross-validation procedure (described above, in Definition of the language functional regions of interest), we estimated the responses within these fROIs to the five conditions (W1 to W5) as well as the Sentences and Nonwords conditions from the language localizer. We report the results for fROIs that showed a replicable (across runs) well-formedness gradient (W1 < W2 < W3 < W4 < W5) and an above-baseline response to W5 (see Supplementary Information Section 5 for the full set of results).
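For illustration, the contrast weighting and the binarization/overlap steps of this analysis reduce to a few array operations; the sketch below is our simplification (the watershed parcellation itself is omitted, and inputs are assumed to be per-subject numpy arrays).

```python
import numpy as np

# Parametric contrast weights over the five conditions W1..W5.
GRADIENT_W = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

def gradient_contrast(betas):
    """betas: (5, n_voxels) condition estimates -> (n_voxels,) contrast values."""
    return GRADIENT_W @ betas

def probabilistic_overlap(p_maps, alpha=0.01):
    """Binarize each subject's p-map at P < alpha, then average across subjects."""
    return np.mean([(p < alpha).astype(float) for p in p_maps], axis=0)
```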
Comparison of voxel-level activation patterns between the nonword well-formedness contrast and the language localizer contrast (experiment 2)
To quantify the similarity of the activation patterns between the critical contrast in experiment 2 (sensitivity to nonword well-formedness; GradientW contrast, see above) and the language localizer contrast in a way that is not biased by the use of the functional localization approach, we examined voxel-wise correlations in the contrast values (across the brain as well as within the language masks). The correlations were computed for each individual participant (n = 16), Fisher-transformed, and then averaged across participants. As an additional comparison, we also included a robust contrast from a non-linguistic spatial working memory task (see Experiment 2 (Reading lists of nonwords—that vary in their well-formedness—followed by a memory probe) above).
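A minimal sketch of this correlation analysis, assuming each subject's two contrast maps are flattened numpy arrays restricted to the voxels of interest:

```python
import numpy as np

def mean_spatial_correlation(maps_a, maps_b):
    """Voxel-wise correlation per subject, Fisher-transformed, then averaged."""
    zs = [np.arctanh(np.corrcoef(a, b)[0, 1]) for a, b in zip(maps_a, maps_b)]
    return np.tanh(np.mean(zs))   # back-transform the mean Fisher z to r
```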
Validation of the language fROIs (all experiments)
To ensure that the language fROIs behave as expected (i.e. show a reliably greater response to the sentences condition compared to the nonwords condition), we used an across-runs cross-validation procedure (e.g. Nieto-Castañón and Fedorenko 2012). In this analysis, the first run of the localizer was used to define the fROIs, and the second run to estimate the responses (in percent BOLD signal change, PSC, relative to fixation baseline) to the localizer conditions, ensuring independence (e.g. Kriegeskorte et al. 2009); then the second run was used to define the fROIs, and the first run to estimate the responses; finally, the extracted magnitudes were averaged across the two runs to derive a single response magnitude for each of the localizer conditions. Statistical analyses were performed on these extracted PSC values. Namely, for each of the five left-hemisphere language fROIs identified, we fit a linear mixed-effect (LME) regression model, predicting the level of PSC for sentences relative to nonwords. The model included fixed effects for an intercept and a slope variable encoding the difference between sentences and nonwords on top of the common intercept. This scheme was implemented by coding sentences as a +0.5 factor and nonwords as a −0.5 factor. The model additionally included random terms for both the intercept and the slope variable encoding the difference between sentences and nonwords, both grouped by participant:

PSC ~ 1 + diff_sent_nonwords + (1 + diff_sent_nonwords | participant)

where 1 denotes the intercept, diff_sent_nonwords denotes the slope variable encoding the difference between the sentences and nonwords conditions (coded as described above), and participant denotes a unique identifier for each participant.
In this coding scheme, the intercept estimate reflects the average PSC response for the sentence and nonword conditions together and the slope variable estimate reflects the difference between the sentence and nonwords conditions. Therefore, to test the validity of the language fROIs, we examined the values of the fixed intercept and slope variable estimates. Both of these estimates had to be significantly positive. The results were FDR-corrected for the five fROIs. A similar analysis was performed for the five right-hemisphere fROIs.
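In Python, an equivalent model can be fit with statsmodels (the paper used MATLAB's fitlme; this is a stand-in, and the DataFrame column names are our assumptions, with one PSC value per participant and condition for a given fROI):

```python
import statsmodels.formula.api as smf

# `df` is an assumed pandas DataFrame with columns "psc", "condition"
# ("sentences"/"nonwords"), and "participant".
df["diff_sent_nonwords"] = df["condition"].map({"sentences": 0.5, "nonwords": -0.5})

# PSC ~ 1 + diff_sent_nonwords + (1 + diff_sent_nonwords | participant)
fit = smf.mixedlm("psc ~ diff_sent_nonwords", df,
                  groups=df["participant"],
                  re_formula="~diff_sent_nonwords").fit()
print(fit.summary())  # the fROI is validated if both fixed estimates are reliably positive
```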
Analyses of the critical tasks (all experiments)
To estimate the responses in the language fROIs to the conditions of the critical tasks, in each experiment the data from all the runs of the language localizer were used to define the fROIs, and the responses to each condition were then estimated in these regions (in percent BOLD signal change, PSC, relative to fixation baseline). The critical conditions were as follows (see Design, materials, and procedure above): (i) in experiment 1a: visual nonwords from the language localizer, (ii) in experiment 1b: auditory words/nonwords, (iii) in experiment 1c: auditory nonwords, (iv) in experiment 2: five word/nonword conditions parametrically varying in well-formedness, and (v) in experiment 3: two nonword conditions varying in phonological neighborhood size.
For each experiment, we used LME regression models (using Matlab fitlme routine) to determine the significance of activations of the critical conditions within the language network. We used these models in two ways: (i) to examine the response within the language network as a whole and (ii) to examine the responses in each of the five language fROIs separately. Treating the language network as an integrated system is reasonable given that the regions of this network (a) show similar functional profiles, both with respect to selectivity for language over non-linguistic processes (e.g. Fedorenko et al. 2011) and with respect to their role in lexico-semantic and syntactic processing (e.g. Blank et al. 2016; Fedorenko et al. 2020), and (b) exhibit strong inter-region correlations in both their activity during naturalistic cognition paradigms (e.g. Blank et al. 2014; Paunov et al. 2018; Braga et al. 2020; Malik-Moraleda et al. 2022) and key functional markers, like the strength of response or the extent of activation in response to language stimuli (e.g. Mahowald and Fedorenko 2016; Mineroff et al. 2018; Lipkin et al. 2022). However, because we wanted to allow for the possibility that language regions might differ in their response to nonwords, as well as in order to examine the robustness of the effects across the language fROIs, we supplement the network-wise analyses with the analyses of the five language fROIs separately.
For each of the five language fROIs, we fit a linear mixed-effect regression model, predicting the level of PSC in the target language fROI in the contrasted conditions.
In the case of modeling a condition with a single level, as in experiments 1a, 1b, and 1c, which all contained a single critical condition (nonwords), this condition was modeled as the intercept of the model. The intercept estimate is reported as the response to this condition. The model then included a fixed effect for the intercept and a random intercept grouped by participant.
For the network-level analysis, we additionally included a random intercept grouped by fROI:

PSC ~ 1 + (1 | participant) + (1 | fROI)
For the ROI-level analysis, we ran this model for each fROI:

PSC ~ 1 + (1 | participant)

The P-values (comparing the intercept estimate to 0) were FDR-corrected for the five fROIs.
In the case of modeling a condition with multiple levels, we added a slope variable encoding the effect of the critical condition beyond the common intercept. For experiment 2, we modeled the five levels of well-formedness by coding them on a linear scale (−1, −0.5, 0, 0.5, 1), from low to high well-formedness. In experiment 3, we coded the low phonological neighborhood condition as −0.5 and the high neighborhood condition as 0.5.
In these cases, the model included fixed effects for the intercept and condition (the slope variable coding the critical condition) and potentially correlated random intercepts and slopes grouped by participant. Here, the intercept represents the mean brain activity across all the levels of the critical condition and the condition slope estimate represents the deviation in brain activity due to the different levels of the critical conditions. Therefore, the overall effect of the critical condition was significant if the condition estimates were significantly different from 0.
For the network-level analysis for experiments 2 and 3, we additionally included potentially correlated random intercepts and slopes grouped by fROI:

PSC ~ 1 + condition + (1 + condition | participant) + (1 + condition | fROI)
For the ROI-level analysis, we ran this model for each fROI:

PSC ~ 1 + condition + (1 + condition | participant)
The P-values comparing the condition estimates to 0 were FDR-corrected for the five fROIs.
Parallel analyses were run for the right-hemisphere language fROIs.
A similar procedure was applied to evaluate the effects for the fROIs that were selected based on the word/nonword well-formedness gradient in experiment 2.
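As an illustration of the slope-coded analysis, here is a Python stand-in for the ROI-level model of experiment 2, including the FDR step (again, the paper used MATLAB's fitlme, and the DataFrame column names are our assumptions):

```python
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import fdrcorrection

# Linear coding of the five well-formedness levels, low to high;
# `df` is an assumed DataFrame with columns "psc", "froi", "participant",
# and "well_formedness".
df["condition"] = df["well_formedness"].map(
    {"W1": -1.0, "W2": -0.5, "W3": 0.0, "W4": 0.5, "W5": 1.0})

# PSC ~ 1 + condition + (1 + condition | participant), fit separately per fROI.
pvals = []
for froi, sub in df.groupby("froi"):
    fit = smf.mixedlm("psc ~ condition", sub,
                      groups=sub["participant"],
                      re_formula="~condition").fit()
    pvals.append(fit.pvalues["condition"])

# FDR correction of the condition effect across the five fROIs.
rejected, p_adj = fdrcorrection(pvals, alpha=0.05)
```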
Results
Validation of the language fROIs (all experiments)
Across all experiments, each of the five left-hemisphere fROIs (see Fig. 2B for parcel locations and names) showed a reliably above-baseline response to Sentences (all intercept estimates > 0, Ps < 0.001, FDR-corrected for the five fROIs; full results available at OSF: https://osf.io/6c2y7/), as well as a robust Sentences > Nonwords effect (all slope estimates > 0, Ps < 0.001, FDR-corrected for the five fROIs), consistent with much previous work (e.g. Fedorenko et al. 2010; Mahowald and Fedorenko 2016; Diachek et al. 2020; Lipkin et al. 2022).
Behavioral measures in the fMRI tasks
The behavioral tasks that we included in some of the fMRI paradigms were designed to maintain participants’ alertness throughout the experiment while providing us with quantitative estimates of alertness in the different conditions. In general, the behavioral performance in all the tasks was relatively high (>70%) and revealed no significant differences between experimental conditions. The only exception was experiment 2, where nonword well-formedness had a small effect on the memory probe task performance, with better performance for more well-formed conditions [fixed slope of accuracy as a function of well-formedness in an LME model that includes a fixed effect of condition and random intercepts and slopes for participants: −0.02, t(78) = −2.23, P = 0.025; Fig. 4C]. See Supplementary Information Section 2 for average behavioral performance for all experiments. Raw behavioral data are available at OSF: https://osf.io/6c2y7/.

Brain regions that are sensitive to nonword well-formedness across the brain (experiment 2, n = 16). A–E) Five brain regions that were found using a GcSS approach (Fedorenko et al. 2010). (A–D) are left-hemisphere regions and (E) is a right-hemisphere region. (E) is a small parcel (only four voxels) that was buried inside the superior temporal sulcus and is visible only when plotted on top of an unfolded cortical surface. The brain images display the masks that were used to define the fROIs; individual fROIs are 10% of voxels that are most sensitive to the nonword well-formedness gradient within each mask. Bar graphs show % BOLD signal change relative to a fixation baseline in individually defined fROIs averaged across participants; the effects are estimated using data that are independent from the data used to define the fROIs. Bar graphs, all panels, left to right—sentences (red) and nonwords (white) from the language localizer (similar to Experiment 1a) that was run on these 16 participants, 5 conditions from experiment 2, from high to low well-formedness (shades of blue).
Key result 1: The language fROIs respond robustly to visually and auditorily presented nonwords (experiments 1a to c)
In experiment 1a, visually presented nonwords elicited a robust response relative to the fixation baseline across the language network as a whole, when treating the fROIs as a random effect (P < 0.001; Table S2, Fig. 2A), and in each of the five fROIs individually (Ps < 0.001, FDR-corrected; Table S1, Fig. 2B). Similarly, auditorily presented nonwords elicited a robust response relative to the fixation baseline. This result held for the network as a whole in both experiment 1b and experiment 1c (Ps < 0.01; Table S2, Fig. 2A). In experiment 1b, this effect was also reliable in each of the five language fROIs (Ps < 0.05, FDR-corrected; Table S1, Fig. 2B), and in experiment 1c, this effect was reliable in the two temporal fROIs (AntTemp and PostTemp; Ps < 0.01, FDR-corrected; Table S1, Fig. 2B). Thus, experiments 1a to 1c revealed robust sensitivity to nonwords in the language fROIs across modalities and tasks.
It is worth noting that a language-responsive region in the left angular gyrus was not sensitive to nonwords (Fig. S2). The left AngG language fROI was originally included as part of the language network (Fedorenko et al. 2010) but was subsequently excluded given its functional differentiation from the rest of the language fROIs (e.g. Shain, Paunov, Chen et al. 2023; see Supplementary Information Section 4 for details). This fROI’s lack of sensitivity to nonwords provides yet another piece of evidence for its distinctness from the core frontal and temporal language regions.
Key result 2: The language fROIs respond more strongly to more well-formed nonwords (experiment 2)
The well-formedness manipulation resulted in a gradient of fMRI response strength in the language network such that words and more well-formed nonwords elicited stronger responses than less well-formed nonwords (P < 0.001, Table S1, Fig. 2). This result held both for the network as a whole and in each of the five fROIs individually (all Ps < 0.001, FDR-corrected, Tables S1 and S2). Thus, experiment 2 suggested that the language network is strongly sensitive to the well-formedness of nonwords.
To test whether this effect was restricted to the language network, we performed a whole-brain search for regions that show a reliable gradient response to nonword well-formedness (along with an above-baseline response to the most well-formed condition). This search revealed five brain regions: four in the left hemisphere (regions A to D, Fig. 3) and one in the right hemisphere (region E, Fig. 3). The masks for the left-hemisphere regions roughly coincided with the language masks (regions A to D roughly coincided with the PostTemp, IFG, AntTemp, and IFGorb language masks, respectively, see Figs 2 and 3). The right-hemisphere region was very small (only four voxels) and was buried inside the superior temporal sulcus, roughly coinciding with the RH AntTemp language mask (Fig. 3 and Supplementary Information Section 3, Fig. S1). Importantly, all of these regions showed robust sensitivity to language processing: the responses to the Sentences condition from the language localizer were significantly larger than to the Nonwords condition (all Ps < 0.0001, FDR-corrected for the five regions, full results at OSF: https://osf.io/6c2y7/). This result suggests that the brain regions that are most sensitive to the degree of nonword well-formedness across the whole brain are also sensitive to lexical semantics and syntactic/combinatorial processing, and that by focusing on the language-responsive areas in our main analysis, we did not miss any critical areas outside of the language network.
To evaluate the similarity of the fine-grained activation patterns between the nonword well-formedness contrast and the language localizer contrast, we performed two additional analyses. First, we visually examined individual whole-brain activation maps for the two contrasts, along with an additional control contrast from the spatial working memory task (Supplementary Information Section 6, Fig. S4). In line with the results of the whole-brain search for areas sensitive to nonword well-formedness above, the activations for the nonword well-formedness contrast appear similar to the activations for the language localizer (although the latter is, of course, a broader and more robust contrast, leading to overall stronger activations); in contrast, the control, nonlinguistic spatial working memory task elicits a very different pattern of activations, in line with past work (e.g. Fedorenko et al. 2013a). To quantify this similarity, we computed voxel-wise spatial correlations (Supplementary Information Section 7, Figs. S5 and S6). This analysis asks whether, e.g. the most language-responsive voxels also show the strongest sensitivity to nonword well-formedness. We found a strong correlation between the nonword well-formedness contrast and the language localizer contrast (>0.5, on average across participants, across the whole left hemisphere); in contrast, the correlations between each of the two language contrasts and the spatial working memory contrast are close to zero or negative (Fig. S5). Further, the correlation between the nonword well-formedness contrast and the language localizer contrast was approximately as high as the correlation across the runs of the nonword reading task, representing the noise ceiling (Fig. S6). These results thus strengthen our claim that phonotactic regularities are primarily processed within the language network.
Key result 3: No evidence for lexical “neighbors” driving the language network’s response to nonwords (experiments 2 and 3)
One possible explanation for the results of experiment 2 is that reading nonwords that are well-formed activates the representations of real words that are similar to them (e.g. BRIVERY ➔ BRAVERY). Thus, given the strong sensitivity of the high-level language network to word meanings (e.g. Fedorenko et al. 2012b; Pereira et al. 2018), stronger responses to more well-formed nonwords could be explained on a purely lexical basis, without invoking sublexical/phonological regularities.
We tested this possibility in two ways, focusing on phonological neighborhood measures because nonwords that are similar to (and may therefore activate) real words will have higher neighborhood density: first, in experiment 3, we measured neural responses to two groups of nonwords that were matched on phonotactic probability [two-sample t-test, orthography-based measure: t(172) = 1.1, P = 0.27, Fig. 4D; see Methods for a pronunciation-based measure yielding similar results] but differed in the size of their orthographic and phonological neighborhood [t(172) = 6.9, P < 0.001, Fig. 4E]; and second, we computed the average orthographic neighborhood size of the nonwords in the five conditions of experiment 2 and examined the relationship between this measure and neural response strength.

Fig. 4. Stimulus characteristics and behavioral results, experiments 2 (A–C) and 3 (D–F). A) Phonotactic probability of the materials in experiment 2. The ordinate represents phonotactic probability (Methods), i.e. the mean count, in an English corpus, of all bigrams that occur in a nonword; the abscissa represents the five conditions in experiment 2, ordered by the bin centers of the well-formedness ratings, from most to least well-formed. Note that the most well-formed group (bin center 4.75) mostly contained real words, but the other four groups contained only nonwords. B) Orthographic neighborhood size of the materials in experiment 2. The ordinate represents orthographic neighborhood size (Methods), i.e. the number of real words that are identical to the nonword up to a substitution of a single letter. The abscissa is the same as in (A). C) Behavioral results in experiment 2. The ordinate is the accuracy in the memory probe task (Methods). The abscissa is the same as in (A). D) Phonotactic probability of the materials in experiment 3. The ordinate is the same as in (A). The abscissa represents the two conditions in experiment 3. The graph shows a numerical decrease in phonotactic probability across the two neighborhood conditions, but this effect is not significant (see text). E) Orthographic neighborhood size of the materials in experiment 3. The ordinate is the same as in (B). The abscissa is the same as in (D). F) Behavioral results in experiment 3 (participants were at ceiling in both conditions). The ordinate is the accuracy in the repetition detection task (Methods). The abscissa is the same as in (D).
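To illustrate the two stimulus measures plotted here, the Python sketch below implements their definitions as stated in the caption; the bigram counts and lexicon are hypothetical placeholders, and this is not the code used to construct the materials.

```python
# Minimal sketch of the two stimulus measures defined in the Fig. 4 caption:
# bigram-based phonotactic probability and orthographic neighborhood size.
# The corpus counts and lexicon below are hypothetical placeholders.
from collections import Counter

def phonotactic_probability(nonword: str, bigram_counts: Counter) -> float:
    """Mean corpus count of all letter bigrams occurring in the nonword."""
    bigrams = [nonword[i:i + 2] for i in range(len(nonword) - 1)]
    return sum(bigram_counts[b] for b in bigrams) / len(bigrams)

def orthographic_neighborhood_size(nonword: str, lexicon: set) -> int:
    """Number of real words identical to the nonword up to one letter substitution."""
    return sum(
        1
        for word in lexicon
        if len(word) == len(nonword)
        and sum(a != b for a, b in zip(word, nonword)) == 1
    )

# Toy example (hypothetical mini-lexicon):
# orthographic_neighborhood_size("flope", {"slope", "flame"})  # -> 1 ("slope")
```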
In experiment 3, the high- and low-neighborhood conditions elicited responses that were comparable in magnitude across the language network (pink bars in Fig. 2, experiment 3), with no evidence for stronger responses to high-neighborhood nonwords either in the language network as a whole or in any of the individual fROIs (Ps > 0.1, Tables S1 and S3).
In experiment 2, of greatest relevance are the two conditions with the lowest well-formedness ratings (the two rightmost, light blue bars in Fig. 2, experiment 2). Although these conditions have similarly low orthographic neighborhood sizes [both around 0; two-sample t-test: t(718) = 0.6, P = 0.52, Fig. 4B], they elicited differential brain responses, such that the second-least well-formed condition activated the language network significantly more than the least well-formed condition [a post-hoc LME revealed a small but significant difference in PSC between these two conditions: 0.14, t(158) = 2.4, P = 0.017]. In contrast to their similar orthographic neighborhood size, these conditions differ reliably in phonotactic probability [t(718) = 6.2, P < 0.001; Fig. 4A], largely mirroring the well-formedness ratings.
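For concreteness, a post-hoc comparison of this kind could be set up as in the Python sketch below, using a simple linear mixed-effects model of PSC with a random intercept per participant; the column names are hypothetical, and the exact model specification used in the paper is given in the Methods.

```python
# Illustrative sketch of a post-hoc linear mixed-effects (LME) comparison of
# percent signal change (PSC) between two conditions. Column names and the
# random-effects structure are assumptions, not the paper's exact model.
import pandas as pd
import statsmodels.formula.api as smf

def compare_conditions(df: pd.DataFrame):
    """Fit PSC ~ condition with a random intercept per participant.

    Expects a long-format table with columns: psc (float), condition
    (two levels), and participant (id), one row per measurement.
    """
    model = smf.mixedlm("psc ~ condition", data=df, groups=df["participant"])
    return model.fit()

# result = compare_conditions(df)
# print(result.summary())  # fixed effect of condition: ~0.14 PSC, per the text
```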
In summary, the results of both experiment 3 and the post-hoc analysis of the two least well-formed conditions in experiment 2 suggest that phonotactic probability (likely reflected in the well-formedness ratings in experiment 2) explains neural responses in the language regions better than neighborhood size does. These responses are therefore unlikely to be driven by the activation of lexical representations of neighboring real words.
Discussion
Across five fMRI experiments, we investigated the responses of “high-level” language processing brain regions (Fedorenko et al. in press) to nonwords—meaningless sequences of sounds/letters (e.g. punes, silory, flope)—and found that these regions indeed robustly respond to such stimuli in an abstract (modality- and task-independent) fashion. Moreover, we found that the language regions are highly sensitive to the phonotactic well-formedness of nonwords, which suggests that the regions that extract high-level meaning from language also represent and process sublexical phoneme-combinatorial regularities. In the remainder of the discussion, we situate these findings in the broader theoretical and empirical context and discuss their implications.
The high-level language network responds to nonwords
A network of frontal and temporal brain regions supports language processing. These regions respond during both listening to and reading of linguistic stimuli (e.g. Fedorenko et al. 2010; Vagharchakian et al. 2012; Regev et al. 2013; Scott et al. 2017) across tasks (e.g. Fedorenko et al. 2010; Cheung et al. 2020; Diachek et al. 2020), but show little or no response during diverse non-linguistic tasks (Fedorenko et al. 2011; Monti et al. 2012; Ivanova et al. 2020, 2021; Fedorenko and Blank 2020).
The precise contributions of this network to language processing remain debated (Hickok and Poeppel 2007; Price 2010; Friederici 2011; Indefrey 2011; Hagoort 2013, 2019; Duffau et al. 2014; Pylkkänen 2019; Fedorenko et al. in press). Many have argued that distinct subsets of this network store and process syntactic/combinatorial structure vs. word meanings (e.g. Grodzinsky and Santi 2008; Baggio and Hagoort 2011; Friederici 2011, 2012; Tyler et al. 2011; Duffau et al. 2014; Ullman 2015). However, evidence has been accumulating against this distinction, suggesting that each region of the language network supports both syntactic and lexico-semantic processing (e.g. Dick et al. 2001; Wilson and Saygin 2004; Fedorenko et al. 2010, 2012, 2020; Bautista and Wilson 2016; Blank et al. 2016; Shain, Kean et al. 2023). Other work has implicated the language network in word-internal morphological processing (e.g. Bozic et al. 2010).
The current study establishes that the language network is sensitive to linguistic information at an even shorter scale than syntax, lexical semantics, and morphology—sublexical sound patterns—as evidenced by responses to sequences of phonemes that do not constitute real words. The response to nonwords in the language network is, by definition, lower than the response to sentences because this network is defined by the Sentences > Nonwords contrast (Fedorenko et al. 2010). Nevertheless, nonwords elicit a response that is consistently and reliably higher than the low-level baseline. Above-baseline responses to nonwords in the language network can be observed in prior fMRI (e.g. Fedorenko et al. 2010; Mahowald and Fedorenko 2016; Mollica et al. 2020; Chen et al. 2023) and intracranial (e.g. Fedorenko et al. 2016) reports. Additionally, previous data show that the responses to nonwords are larger than those to many nonlinguistic tasks, including arithmetic, spatial working memory, and music perception (e.g. Mineroff et al. 2018; Fedorenko and Blank 2020; Chen et al. 2023). Sensitivity of the language network to phonological information is also consistent with reliable responses to unfamiliar foreign languages—from which only phonological-level information can be extracted—in the language regions of bilinguals and polyglots (Malik-Moraleda et al. 2022; Malik-Moraleda, Jouravlev et al. 2023) and with robust representations of phonemic information across the language network during naturalistic auditory language comprehension (e.g. Gong et al. 2023). However, this is the first study to systematically investigate the responses in the language network to nonwords and to try to understand what drives them.
The fact that the language regions respond both when participants read nonwords (experiments 1a, 2, and 3) and when they listen to them (experiments 1b and 1c) demonstrates that the representation of nonwords is abstract (unpublished findings from Rebecca Saxe’s lab further show that nonwords in American Sign Language (ASL)—signs similar in form to meaningful ones but lacking meaning—also elicit above-baseline responses in the language areas; based on the data published in Richardson et al. 2020). These results align with previous findings of modality-independent responses of the language network to stories, sentences, and word lists (e.g. Fedorenko et al. 2010; Vagharchakian et al. 2012; Regev et al. 2013), but critically extend them to stimuli that lack meaning.
Similarly, we show that the response to nonwords in the language network generalizes across tasks, including passive reading/listening, processing of nonword strings followed by a memory probe (“did you encounter this nonword in the preceding string?”), and repetition detection. These findings are in line with the task-independence of the language network’s responses to words and sentences (e.g. Cheung et al. 2020; Diachek et al. 2020). Importantly, none of these tasks require selective attention to particular properties of the nonwords, which suggests that this response reflects the intrinsic computations necessary for recognizing and processing sublexical sound patterns. Such computations are presumably critical to language acquisition and processing given that any newly encountered word is, at first, just a sequence of sounds that gradually acquires semantic associations as we learn the word’s meaning (e.g. Davis et al. 2009; Perry et al. 2018; Jones et al. 2021).
Combined with prior studies, our results suggest that the fronto-temporal language network supports the processing not only of words and inter-word dependencies but also of lower-level phonological information, as evidenced by strong responses to sequences of phonemes that obey phoneme-combinatorial constraints but do not correspond to meanings in our lexicon. Reports of phonological impairments following brain lesions that also cause higher-level linguistic deficits, i.e. aphasia (e.g. Geva et al. 2011; Kries et al. 2023), further point to a causal role of these brain areas in phonological processing. Any proposal about the language network’s computations should therefore account for its role in phonological-level processing.
The language network is sensitive to phonotactic regularities
In experiment 2, we found that more well-formed/phonotactically probable nonwords elicit stronger responses in the language network. This modulation of neural activity by nonword well-formedness plausibly reflects a process of matching sound patterns to stored representations extracted from our previous experience with a language, whereby the strength of the response is proportional to how well the stimulus matches stored linguistic regularities and to the amount of matching information (e.g. Hayes and Wilson 2008). This idea is reminiscent of the notion of “phonological schemata” (Jackendoff 2002). Storage of frequent sound/letter n-grams may allow for more efficient processing by enabling representation assembly to proceed in chunks larger than single phonemes/letters (Bybee 1999; Bybee and Hopper 2001; Vitevitch and Luce 2005; O’Donnell 2015).
Might the stronger responses to more well-formed nonwords instead (or additionally) reflect activation of lexical representations of real words that share phonological/sound structure with them? We evaluated this possibility in two ways and did not find support for it. First, in experiment 3, nonwords that differed in the number of real-word neighbors (but were matched on phonotactic probability) elicited similar-magnitude responses in the language network. Second, in experiment 2, we found that although the two least well-formed groups of nonwords both had few or no real-word neighbors, the more well-formed of the two elicited stronger responses in the language network. The language regions thus appear to represent sublexical units, including phoneme sequences that are not associated with a lexical–semantic representation. We suggest that the frequency of these phoneme sequences in our experience with the language is what drives the response to nonwords in the language regions, even when familiar sound patterns do not lead to lexical activation of similar-sounding real words.
A possibility that is more difficult to rule out is that the response to nonwords is, at least in part, driven by the (relatively rare) semantic associations that might be elicited by particular sounds/sound clusters (e.g. Iwasaki et al. 2007; Monaghan et al. 2014; Larsson 2015; Blasi et al. 2016; Winter et al. 2017; Sidhu and Pexman 2018; Pimentel et al. 2019; Vinson et al. 2021) or morphemes/morpheme-like elements that occur in some nonwords (e.g. Bozic et al. 2010). Further research is needed to determine the precise features that make a nonword elicit an above-baseline response in high-level language areas, including whether sublexical semantic associations may be sufficient to explain this response. In addition, developmental investigations, especially during the first few years of life—when most words we encounter do not yet have meaning—could help illuminate the formation of linguistic knowledge representations (e.g. Jones et al. 2021).
Phonological processing outside of the high-level language network?
In a whole-brain search, all the brain regions that showed sensitivity to nonword well-formedness also showed sensitivity to high-level linguistic meaning, suggesting that they fall within the boundaries of the language network. Furthermore, whole-brain voxel-wise activation patterns were highly similar between the nonword well-formedness contrast and the language localizer contrast. These results suggest that not only are the language regions sensitive to phonotactic regularities, but that they also constitute the primary processing system for these regularities. However, it is important to clarify that our core claim is a positive one: the language regions are sensitive to sublexical regularities. We are not making a strong argument about the lack of sensitivity to phonotactic regularities, or about more general contributions to phonological processing, by brain regions outside of the language network. The latter claim would require additional evidence, such as (1) a more comprehensive characterization of the functional profiles of the phonology-sensitive regions we found in the whole-brain search and/or (2) functionally identifying specific brain regions other than the language network in individual participants (some candidate regions are mentioned below) and examining their responses to diverse kinds of nonwords under different task conditions.
Aside from the language regions, where might one expect to find sensitivity to phonotactic well-formedness? One likely candidate is the set of speech perception areas in the superior temporal gyrus. These areas are highly selective for the processing of speech sounds relative to other sounds (e.g. Norman-Haignere et al. 2015), represent the identity of single phonemes (e.g. Mesgarani et al. 2014; Leonard, Gwilliams et al. 2023), and have even been suggested to be sensitive to transitional probabilities in multi-phoneme sequences (Leonard et al. 2015). Importantly, these areas are distinct from the language areas (Fedorenko et al. in press): unlike the language areas, the speech perception areas are not sensitive to linguistic meaning, showing similarly strong responses to meaningful and meaningless speech (e.g. Norman-Haignere et al. 2015; Overath et al. 2015). Given that these areas have relatively short “temporal receptive windows” (e.g. Hasson et al. 2008) of approximately half a second (e.g. Overath et al. 2015; Norman-Haignere et al. 2022), they plausibly process temporally local phonological information and provide input to the language areas, which integrate information across longer scales—syllables and words—and compute linguistic meaning (Lerner et al. 2011; Blank and Fedorenko 2020; Regev et al. 2023). It is therefore possible that we did not see sensitivity to phonotactic well-formedness in the speech perception areas because the nonwords in experiment 2 were locally well-formed (by design). Another possibility is that the speech perception areas are only sensitive to phonotactic regularities in auditorily presented sequences (whereas our stimuli were presented visually). The latter would imply that these areas’ representations and computations are not truly abstract (in contrast to those of the language areas) and are instead tied specifically to the auditory modality. A definitive answer would require a version of experiment 2 with both visual and auditory stimuli (ideally, with manipulations that affect well-formedness at different temporal scales) and an independent functional localizer for the speech perception areas (e.g. Overath et al. 2015).
In addition to the language areas and speech perception areas, past neuroimaging and patient studies of phonological processing have implicated a wide array of cortical, subcortical, and cerebellar areas (e.g. see Vigneau et al. 2006; Price 2012), including studies that, similar to the current study, have used manipulations of phonotactic/orthographic well-formedness and phonological/orthographic neighborhoods (e.g. Okada and Hickok 2006; Vinckier et al. 2007; Vaden et al. 2011; Gow and Nied 2014; Gow and Olson 2015; Woolnough et al. 2020; Avcu et al. 2023). However, several factors make it challenging to interpret these findings and to relate them to the current results. First, most prior studies have used a single set of stimuli and a single task, making it difficult to assess the robustness and generalizability of the reported results. Second, many of the tasks that are commonly used in investigations of phonological processing go beyond the natural “task” of processing linguistic input with the goal of meaning extraction. As a result, these tasks may engage cognitive processes, and associated neural mechanisms, beyond those that support the processing of linguistic input.
For example, tasks like rhyme judgments (e.g. Petersen et al. 1989; Paulesu et al. 1993; Seghier et al. 2004; Geva et al. 2011; Pillay et al. 2014; Yen et al. 2019), nonword repetition (e.g. Fridriksson et al. 2010; Church et al. 2011; Scott and Perrachione 2019), or other tasks that require active maintenance of words/nonwords in working memory (e.g. Paulesu et al. 1993; Awh et al. 1996) may engage the articulation network (e.g. Bohland and Guenther 2006; Guenther 2016; Basilakos et al. 2017, 2018). Similar to the speech perception areas—and in contrast to the language areas—the articulation areas are only sensitive to the surface properties of speech, not to linguistic meaning (Fedorenko et al. in press). Some phonological tasks may instead, or in addition, engage areas of the domain-general multiple-demand (MD) network, which supports task demands across domains (e.g. Duncan 2010, 2013; Fedorenko et al. 2013a; Shashidhara et al. 2019), is robustly distinct from the language network (e.g. Fedorenko et al. 2012a; Fedorenko and Blank 2020), and has been shown to be engaged when linguistic processing is accompanied by extraneous task demands (Diachek et al. 2020; Quillen et al. 2021). Importantly, however, because past work has not relied on functional localizers, interpreting activation in a particular anatomical area as reflecting a particular perceptual, motor, or cognitive process—what is known as a “reverse inference”—is challenging (Poldrack 2006; Fedorenko 2021).
Conclusion
We have presented evidence that meaningless sequences of phonemes, presented auditorily or visually, elicit responses in the language network—a set of brain regions traditionally associated with the processing of word meanings and word combinations. This robust sensitivity of the high-level language regions to sublexical phonemic patterns aligns with views of linguistic knowledge and processing in which the boundaries between different levels of linguistic structure—from phonemes to morphemes to words to constructions and syntactic rules—are not sharp (e.g. Gaskell and Marslen-Wilson 1997; Bybee 1999, 2013; Goldberg 2003; Jackendoff 2007; Huettig et al. 2020; Jackendoff and Audring 2020), and it challenges accounts of the language network, or of its subcomponents, that focus on phrase-structure building, compositional meaning, or prediction at the level of word sequences.
Acknowledgments
We would like to acknowledge the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT, and its support team (Steve Shannon and Atsushi Takahashi). We thank former and current EvLab members for their help with fMRI data collection, Steve Piantadosi for help in creating the materials for experiment 2, Peter Graff for early discussions of phonology in the brain, Alex Paunov for help with some of the analyses, and Ray Jackendoff for helpful comments on this work. T.I.R. also thanks Janet Werker, the audience at the 2020 Neurobiology of Language conference, and Tali Bitan and her lab for helpful discussions.
Author contributions
Conceptualization: T.I.R., L.B., K.M., E.F. Data curation: T.I.R., A.E.S., L.B., K.M., E.F. Investigation: J.A., K.M., E.F. Formal analysis: T.I.R., H.S.K., X.C., J.A., E.F. Visualization: T.I.R. Writing - original draft: T.I.R., K.M., E.F. Writing - review & editing: all authors. Supervision: K.M., E.F.
Funding
T.I.R. was supported by the Zuckerman-CHE STEM Leadership Program and the Poitras Center for Psychiatric Disorders Research. E.F. was supported by the National Institutes of Health (NIH): R00 award HD057522, R01 awards DC016607 and DC016950, and funds from the Brain and Cognitive Sciences department, the McGovern Institute for Brain Research, and the Simons Center for the Social Brain.
Conflict of interest statement: None declared.
Data availability
Processed data and materials are available on the Open Science Framework (OSF) platform (https://osf.io/6c2y7/). Raw data can be made available upon request.
Abbreviations
AntTemp—anterior temporal; PostTemp—posterior temporal; IFG—inferior frontal gyrus; IFGorb—inferior frontal gyrus, orbital portion; MFG—middle frontal gyrus; AngG—angular gyrus; AntTemp-L—anterior temporal, left hemisphere (and similarly for PostTemp-L, IFG-L, IFGorb-L, MFG-L and AngG-L); AntTemp-R—anterior temporal, right hemisphere (and similarly for PostTemp-R, IFG-R, IFGorb-R, MFG-R and AngG-R); GcSS analysis—group-constrained subject-specific analysis; fROI—functional region of interest; fMRI—functional magnetic resonance imaging.
References
Chen X, Affourtit J, Ryskin R, Regev TI, Norman-Haignere S, Jouravlev O, Malik-Moraleda S, Kean H, Varley R, Fedorenko E. The human language system, including its inferior frontal component in “Broca's area,” does not support music perception.
Huettig F, Audring J, Jackendoff R. A parallel architecture perspective on pre-activation and prediction in language processing.
Author notes
Kyle Mahowald and Evelina Fedorenko are co-senior authors.