The distractibility that older adults experience when listening to speech in challenging conditions has been attributed in part to reduced inhibition of irrelevant information within and across sensory systems. Whereas neuroimaging studies have shown that younger adults readily suppress visual cortex activation when listening to auditory stimuli, the extent to which declining inhibition in older adults results in reduced suppression or compensatory engagement of other sensory cortices is unclear. The current functional magnetic resonance imaging study examined the effects of age and stimulus intelligibility in a word listening task. Across all participants, auditory cortex was engaged when listening to words. However, increasing age and declining word intelligibility had independent and spatially similar effects: both were associated with increasing engagement of visual cortex. Visual cortex activation was not explained by age-related differences in vascular reactivity; rather, auditory and visual cortices were functionally connected across word listening conditions. The nature of this correlation changed with age: younger adults deactivated visual cortex when activating auditory cortex, middle-aged adults showed no relation, and older adults synchronously activated both cortices. These results suggest that age and stimulus integrity are additive modulators of crossmodal suppression and activation.
Older adults have difficulty recognizing speech, particularly in challenging listening conditions (Gordon-Salant 1986; Dubno et al. 2002, 2003). This difficulty appears to stem from declines in the peripheral auditory system (Humes 1996) as well as the central auditory system (Zekveld et al. 2006; Eckert, Walczak, et al. 2008). Speech recognition declines may also result from changes in neural systems that direct attention. Indeed, aging is associated with increased distractibility (Connelly et al. 1991; Carlson et al. 1995; May 1999) which, according to the Inhibitory Deficit Hypothesis (Hasher and Zacks 1988; Gazzaley and D'Esposito 2007), reflects reduced inhibitory control within (Gazzaley et al. 2005) and across sensory systems (Laurienti et al. 2006; Peiffer et al. 2007). Thus, age-related declines in speech recognition may be due in part to older adults' diminished ability to inhibit both irrelevant auditory and non-auditory information.
Younger adults have been shown to readily suppress visual cortex activity when attending to auditory stimuli such as tones (Hairston et al. 2008), words (Yoncheva et al. 2010), and short melodies (Johnson and Zatorre 2005, 2006). This suppression follows from work by Haxby et al. (1994) showing that selective attention to a particular sensory modality is associated with decreased activity in regions that are responsive to other sensory modalities. For example, Kawashima et al. (1995) found that young adults suppressed visual cortex activation during a tactile task both when their eyes were kept open and closed. Likewise, visual suppression during an auditory task has been shown to occur whether or not visual stimuli are presented (Johnson and Zatorre 2005), suggesting that younger adults are adept at selectively attending and therefore inhibiting task-irrelevant crossmodal information whether externally or internally represented. Thus, suppression of task-irrelevant visual cortices may prevent increases in effort or declines in performance that are associated with divided attention or may limit distractibility, similar to wanting to close one's eyes while listening to difficult speech. The current study aimed to investigate to what extent task-irrelevant visual cortex suppression is dependent upon age and task difficulty when listening to speech.
Age-related changes in the control of visual cortex activation have been reported in previous research. Atypical activation has been observed when older adults view pictures of faces while ignoring scenes or vice versa (Gazzaley et al. 2005; Rissman et al. 2009) and when listening to tones (Peiffer et al. 2009). However, it is not known if older adults exhibit a crossmodal pattern of suppression during speech recognition like younger adults or if they exhibit atypical visual cortex activation. It is possible that changes in perceptual function (Murphy et al. 1999) and/or changes in task demands modulate these effects.
Increasing task difficulty may require additional attentional resources, thereby limiting the availability of neural resources that typically provide inhibitory control and thus resulting in distraction by disinhibited irrelevant stimuli. Older adults tend to exhibit more difficulty performing tasks than younger adults, especially with speeded responding (Salthouse and Somberg 1982). Young adults exhibit neurobiological effects similar to those of older adults when difficulty is equated between age groups. For example, in a unimodal visual task in which participants attend to images of faces and ignore scenes or vice versa, younger adults under increased task difficulty exhibited reduced suppression of task-irrelevant cortical regions similar to older adults performing under easier task conditions (i.e., increased engagement of the parahippocampal place area when attending to faces or the fusiform face area when attending to scenes; Rissman et al. 2009).
Such age-related changes in visual cortex activation have even been found in a unimodal aural word recognition task, pointing to an interaction between deficits in aging and task difficulty on the degree of crossmodal engagement. For example, Eckert, Walczak, et al. (2008) observed increased frontal activity in older adults when they correctly repeated low-pass filtered words in relatively easy listening conditions. In contrast, younger adults were more likely to engage these regions when they incorrectly repeated these words in more challenging listening conditions. While it was not the focus of the study, a visual cortex region also exhibited this pattern of results. This finding suggests that visual cortex activation is dependent on both age and word intelligibility. The current experiment tests this hypothesis using a word listening task and examines to what extent changes in visual cortex activation are tied to changes in auditory processing.
The aim of this functional magnetic resonance imaging (fMRI) study was to investigate the extent to which listening difficulty and age modulate visual cortex (de)activation. Adults 19–79 years old listened to words that were band-pass filtered to vary intelligibility. We predicted that 1) younger adults would deactivate visual cortex during word listening compared with baseline, and, following from the Inhibitory Deficit Hypothesis, 2) increasing age would be associated with reduced suppression and therefore increasing visual cortex activation. Moreover, we predicted 3) similar effects of word intelligibility such that participants across the age range would demonstrate reduced visual cortex suppression with decreasing intelligibility (increased task difficulty). Because we have not observed differences in the parametric slope of auditory cortex activity with age in previous studies (e.g., Eckert, Walczak, et al. 2008; Harris et al. 2009), we did not predict interaction effects between word intelligibility and age in auditory cortex.
To rule out an alternative explanation that apparent failures of suppression may be explained by age-related changes in vascular reactivity (D'Esposito et al. 2003), participants also performed a respiration task in which their breathing was physiologically monitored as they were scanned (Thomason and Glover 2008). This is crucial in the current study because visual cortical regions are especially susceptible to age differences in vascular reactivity (Ross et al. 1997; Buckner et al. 2000). In addition, the hemodynamic response function (HRF) of older adults can be temporally delayed compared with younger adults (Taoka et al. 1998). A delayed HRF could manifest as an apparent suppression effect given that different points of the HRF curve could be sampled in a sparse sampling design. Thus, to control for physiological age differences in the interpretation of results, a respiration task was used to estimate potential individual variations in HRF amplitude and lag effects on crossmodal function.
Finally, we predicted that 4) activation in auditory and visual areas would be functionally correlated, thereby reflecting the engagement of a crossmodal system that has been observed in a resting state task with young adults (Eckert, Kamdar, et al. 2008). In the current study, we examined to what extent areas associated with auditory and visual processing are synchronously (de)activated and how this relationship changes with age. For younger adults who demonstrate suppression of visual cortex during auditory tasks, we predicted a negative relation between the engagement of auditory and visual cortices. If aging is associated with decreasing visual cortex suppression, this negative connectivity may weaken, degrade entirely, or manifest as a positive functional relation in older adults.
Materials and Methods
Thirty-six adults aged 19–79 years (M = 50.5, standard deviation [SD] = 21.0; 13 females) participated in this study. All were monolingual native speakers of American English. Their average years of education was 18.2 years (SD = 2.9), average socioeconomic status was 52.4 (SD = 10.0; Hollingshead 1975), and degree of handedness on the Edinburgh handedness questionnaire (Oldfield 1971) was 91.4 (SD = 9.6, where 100 is maximally right handed and −100 is maximally left handed). Age was not significantly correlated with any of these measures (all P > 0.05). Three age groups were constructed for a subset of analyses to clarify the pattern of results we find with our continuous age predictor: younger (<40 years old, n = 15), older (>61 years old, n = 15), and middle aged (n = 6).
Each participant completed the Mini Mental State Examination (Folstein et al. 1983) and had 3 or fewer errors, indicating little or no cognitive impairment (Tombaugh and McIntyre 1992). Participants provided written informed consent before participating in this Medical University of South Carolina Institutional Review Board approved study.
Pure-tone thresholds at standard intervals between 250 and 8000 Hz were measured with a Madsen OB922 clinical audiometer calibrated to appropriate ANSI standards (American National Standards Institute 2004) using TDH-39 headphones. Participants age 61 and younger had normal hearing (defined as thresholds ≤ 25 dB HL at 250–8000 Hz). Older participants' pure-tone thresholds varied from normal hearing to moderate high-frequency hearing loss. Mean pure-tone thresholds for 3 groups of adults (young, middle-age, and older) are depicted in Figure 1. As described later, a masking noise and band-pass filtering of speech reduced individual differences in word audibility.
The stimuli for this study included 120 English words selected from a database of consonant-vowel-consonant words, which were spoken by a male Native English speaker (Dirks et al. 2001; see Supplementary Table 1). Previous studies have shown that the properties of words and phonemes can alter the ease with which words may be recognized (e.g., Luce and Pisoni 1998). Because these properties have been shown to differentially impact various age groups (Sommers 1996), we controlled for several lexical and phonemic properties of our stimuli. Specifically, the words presented in each of the 4 filter conditions did not differ significantly (by a one-way analysis of variance [ANOVA] and pairwise t-tests) on any of the following average measures: word familiarity, word frequency (Spoken Word Frequency, Kucera–Francis, and SUBTLEXUS), phonological neighborhood density, mean frequency of phonological neighbors, frequency of biphonemic units averaged across the word (weighted and unweighted by word frequency), articulation duration of each stimulus, mean count of voiced/voiceless phonemes per word, and mean count of stop/continuant phonemes per word. All measures were obtained from 3 databases (except for the last 3 measures, which were calculated by the first author): 1) a database of 400 words used in a prior study with older adults of the neighborhood activation model (Dirks et al. 2001), 2) the Irvine Phonotactic Online Dictionary (IPhod) database (Vaden et al. 2009), and 3) a Spoken Word Frequency Database for American English (Pastizzo and Carbone 2007), all of which pool word and phoneme metrics from various established sources (e.g., Kucera and Francis 1967; Simpson et al. 2002; Brysbaert and New 2009). For a complete listing of means, SDs, and statistical tests, see Supplementary Table 2.
To manipulate word intelligibility, each word was presented in only 1 of 4 band-pass filter conditions in which the upper cutoff frequencies were 400 (most speech information removed), 1000, 1600, and 3150 Hz (least speech information removed). Articulation Index calculations (Studebaker et al. 1993) and previous behavioral testing (Eckert, Walczak, et al. 2008) predicted that these cutoff frequencies would be approximately linearly related to word recognition when the upper cutoff frequency is plotted on a log scale. The lower cutoff frequency was fixed at 200 Hz (Eckert, Walczak, et al. 2008; Harris et al. 2009), which was greater than the average pitch (F0) across words (M = 122 Hz, SD = 27).
Word Recognition Task
The word listening experiment used an event-related design in which each word was presented in 1 of the 4 filter conditions. The order of the conditions was pseudorandom within the presentation list such that no filter condition appeared twice in a row. Each participant received the same presentation list. A condition in which only background noise was presented (described below) was also included and pseudorandomly interspersed with the word trials. Eprime software (Psychology Software Tools, Pittsburgh, PA) and an IFIS-SA control system (Invivo, Orlando, FL) were used to control the presentation and timing of each trial in the scanner.
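As an illustration of the ordering constraint described above, the following Python sketch draws a trial order in which no filter condition occurs on two consecutive trials. The function name, the restart strategy, and the weighting by remaining trial counts are assumptions made for this example; the source does not describe how the presentation list was generated, and the no-word trials are omitted for simplicity.

```python
import random


def constrained_order(conditions, n_per_condition, rng):
    """Return a trial order in which no condition occurs twice in a row.

    Hypothetical helper: trials are drawn one at a time, excluding the
    condition used on the previous trial. If only the previous condition
    is left (a dead end), the whole draw is restarted.
    """
    total = n_per_condition * len(conditions)
    while True:
        remaining = {c: n_per_condition for c in conditions}
        order = []
        while len(order) < total:
            candidates = [
                c for c, n in remaining.items()
                if n > 0 and (not order or c != order[-1])
            ]
            if not candidates:
                break  # dead end, restart the draw
            # Weight by remaining count so conditions run out together.
            weights = [remaining[c] for c in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
            order.append(choice)
            remaining[choice] -= 1
        else:
            return order


# Upper cutoff frequencies (Hz) of the 4 filter conditions, 30 words each.
FILTER_CONDITIONS = [400, 1000, 1600, 3150]
example_order = constrained_order(FILTER_CONDITIONS, 30, random.Random(0))
```

Because every participant received the same presentation list, a fixed seed (as in `random.Random(0)`) would be one way to reproduce a single list across sessions.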
Each 8-s repetition time (TR) began with an image acquisition. During each word-presentation trial, a white crosshair was presented in the center of a projection screen display. Two seconds into the trial, a filtered word was presented (in 1 of the 4 filter conditions) at 75 dB SPL. One second later, the crosshair turned red and the participant responded by pressing the thumb button of an ergonomically designed button response keypad (MRI Devices Corp., Waukesha, WI) if they recognized the word and the index finger button if they did not. Participants were told that some words were more difficult to recognize than others but that they were encouraged to do their best to identify the word. Ten trials in which no word was presented were randomly interspersed throughout the experiment to serve as part of an implicit baseline and reduce the predictability of word onsets in this sparse sampling design. Thus, there was a total of 130 TRs (30 words in each of the 4 filter conditions and 10 no-word trials).
This task reduces participant focus on articulatory planning associated with overt production tasks or on semantic or sublexical processing associated with other types of semantic judgment and lexical decision tasks. Unlike with passive listening, however, this word recognition study ensured that participants were actively listening to stimuli while minimizing the potential for age-related differences in the ability to follow more complicated instructions throughout the experiment. Similar recognition judgment tasks have been successfully used in other studies of speech intelligibility (e.g., Davis and Johnsrude 2003; Obleser et al. 2007).
As described, pure-tone thresholds of older participants ranged from normal to moderately elevated, especially in the higher frequencies (Fig. 1). To reduce confounding effects of differences in word audibility across participants, words were presented in background noise that was spectrally shaped to elevate thresholds of all participants, similar to Dubno et al. (2006). One-third octave band levels of a broadband noise were set to produce estimated elevated thresholds (“masked thresholds”) of 20–25 dB HL from 250 to 2000 Hz and 30 dB HL at 3000 Hz (Hawkins and Stevens 1950). Measured thresholds in quiet for some older participants were higher than the estimated masked thresholds produced by the background noise. This occurred for 2 participants for thresholds between 250 and 1000 Hz, 5 participants for thresholds at 2000 Hz, and 6 participants for thresholds at 3000 Hz. For these participants, speech audibility in the unfiltered region may have been lower than for the other participants. Of course, this would not be a concern when band-pass filtering removed higher frequency speech information for all participants. A second computer was used to present the background noise continuously at 62.5 dB SPL, resulting in a signal-to-noise ratio of +12.5 dB. The words from the first computer were mixed with the broadband noise from the second computer at 2 s into each 8-s trial using an audio mixer and were delivered to 2 MR-compatible headphones (Sensimetrics, Malden, MA), diotically, so that the mixed speech and noise signal was presented to both ears. Signal levels were calibrated using a precision sound level meter (Larson Davis 800B, Provo, UT).
A respiration experiment occurred subsequent to the word recognition task. Following Thomason and Glover (2008), participants viewed a screen that changed color from green to yellow to red. When the screen was green (6 TRs; 12 s), participants were instructed to breathe normally. When the screen turned yellow (1 TR; 2 s), participants prepared to hold their breath and when the screen turned red (6 TRs; 12 s), they held their breath. This sequence cycled 10 times. If participants were unable to hold their breath for this time, they were asked to hold their breath as long as was comfortable and then breathe normally. For this reason, as we discuss in the analysis section, we chose to determine onsets and durations of inhalations based on actual respiration measurements rather than assuming compliance with these specific instructions. Respiration was measured by placing an air cushion on the participant's abdomen and securing it with a Velcro belt. As the participant's abdomen rose with an inhalation, increased pressure was placed on the cushion registering as a positive deflection from baseline. The pressure placed on the cushion (signaling changes in respiration) was sampled at 100 Hz.
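The cue timing of the respiration task can be written out as a short Python sketch. The block lengths, the 2-s TR, and the 10 cycles come from the description above; the function and variable names are hypothetical, and the schedule describes only the instructed cues, not the measured breathing that was used in the analysis.

```python
TR_S = 2.0  # repetition time of the respiration run, in seconds
CYCLE = [("green", 6), ("yellow", 1), ("red", 6)]  # (screen color, TRs)
N_CYCLES = 10


def respiration_schedule(cycle=CYCLE, n_cycles=N_CYCLES, tr_s=TR_S):
    """Return (onset_s, duration_s, color) for every cue block in the run."""
    events = []
    onset = 0.0
    for _ in range(n_cycles):
        for color, n_trs in cycle:
            duration = n_trs * tr_s
            events.append((onset, duration, color))
            onset += duration
    return events


schedule = respiration_schedule()
# Total run length implied by the cue sequence, in seconds.
run_length_s = schedule[-1][0] + schedule[-1][1]
```

Under these values each cycle spans 13 TRs (26 s), so the full cue sequence covers 130 volumes.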
A sparse sampling design was used in which an entire volume was acquired once for each TR. This design provided a means to present each word when the scanner noise was turned off (Eckert, Walczak, et al. 2008; Harris et al. 2009). T2*-weighted functional images were acquired using an 8-channel SENSE head coil on a Philips 3-T scanner. A single shot echo-planar imaging (EPI) sequence covered the entire brain (40 slices with a 64 × 64 matrix, TR = 8 s, echo time [TE] = 30 ms, slice thickness = 3.25 mm, and acquisition time [TA] = 1647 ms).
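The timing logic of the sparse sampling design can be checked with a few lines of Python. The TR, TA, word onset, response cue, and number of trials are taken from the text; the variable names are hypothetical, and the sketch assumes that the scanner is silent for the remainder of each TR once the volume has been acquired.

```python
TR_S = 8.0          # repetition time (one volume per trial)
TA_S = 1.647        # acquisition time of one volume
WORD_ONSET_S = 2.0  # word presented 2 s into the trial
CUE_ONSET_S = 3.0   # crosshair turns red 1 s after word onset
N_TRIALS = 130      # 120 word trials plus 10 no-word trials

# Portion of each trial without scanner noise (assumed silent after TA).
silent_gap_s = TR_S - TA_S


def in_silent_gap(onset_s, ta_s=TA_S, tr_s=TR_S):
    """True if an event onset falls after acquisition and before the next TR."""
    return ta_s <= onset_s < tr_s


word_is_in_quiet = in_silent_gap(WORD_ONSET_S)
total_minutes = N_TRIALS * TR_S / 60.0
```

With these values the word onset falls about 0.35 s after the end of image acquisition, and the run lasts a little over 17 min.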
The respiration experiment utilized a continuous acquisition paradigm. T2*-weighted functional images were acquired once every 2 s. A single shot EPI sequence covered the entire brain (36 slices with a 64 × 64 matrix, TR = 2 s, TE = 30 ms, slice thickness = 3.25 mm, and a TA = 1944 ms).
T1-weighted structural images were also collected to normalize the functional data using higher resolution anatomical information (160 slices with a 256 × 256 matrix, TR = 8.13 ms, TE = 3.7 ms, flip angle = 8°, slice thickness = 1 mm, and no slice gap).
A study-specific structural template was created to ensure that images for all participants were coregistered to a common coordinate space. The participants' structural images were used to normalize the functional data into a common coordinate space, which was derived from diffeomorphic transformation of the structural images. Unified segmentation and diffeomorphic image registration (DARTEL) were performed in SPM5 (Ashburner and Friston 2005; Ashburner 2007). The DARTEL procedure warps each participant's native space gray matter image to a common coordinate space, providing closely aligned coregistration across participants (Eckert et al. 2010). The realigned functional data were coregistered to each participant's T1-weighted image using a mutual information algorithm. The DARTEL normalization parameters for each T1-weighted image were then applied to the anatomically aligned functional data. The normalized functional data were smoothed with an 8 mm Gaussian kernel to ensure that the data were normally distributed for parametric testing.
Functional images were preprocessed using SPM5 (http://www.fil.ion.ucl.ac.uk/spm). The Linear Model of the Global Signal method (Macey et al. 2004) was used to detrend the global mean signal fluctuations from the preprocessed images. Image volumes with significantly deviant signal (more than 2.5 SDs) relative to the global mean and with a significantly deviant number of voxels from their voxelwise mean across the run were identified and modeled as 2 nuisance regressors in the first-level analysis (Vaden et al. 2010). Using this method, 4.65% of the images were identified as significantly deviant. Two motion nuisance regressors, representing 3D translational and rotational motion, were calculated from the 6 realignment parameters generated in SPM via the Pythagorean Theorem.
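A minimal Python sketch of the two nuisance-regressor steps is given below. It assumes that each volume has 6 realignment parameters ordered as 3 translations followed by 3 rotations, that the Pythagorean combination is the root sum of squares within each triplet, and that deviant volumes are flagged by comparing a per-volume summary value with the run mean. The function names, the use of the population SD, and the choice of summary value are assumptions; the source does not give these details.

```python
import math
from statistics import mean, pstdev


def motion_regressors(realignment):
    """Collapse 6 realignment parameters per volume into 2 regressors.

    realignment: sequence of (x, y, z, pitch, roll, yaw) tuples (assumed order).
    Returns (translation, rotation), one value per volume.
    """
    translation = [math.sqrt(sum(v * v for v in row[:3])) for row in realignment]
    rotation = [math.sqrt(sum(v * v for v in row[3:6])) for row in realignment]
    return translation, rotation


def flag_deviant(values, threshold_sd=2.5):
    """Return 1 for volumes more than threshold_sd SDs from the run mean."""
    m = mean(values)
    s = pstdev(values)
    return [1 if s > 0 and abs(v - m) > threshold_sd * s else 0 for v in values]
```

For example, `flag_deviant` applied to the global mean signal of each volume would yield one binary nuisance regressor; the second signal-intensity regressor described in the text would need a per-volume count of deviant voxels as its input.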
A first-level fixed-effects analysis was performed for each individual's images to estimate differences in activity across the 4 filter conditions. The model contained 4 parameters (1 for each filter condition), 4 regressors (2 motion regressors and 2 signal intensity regressors), and a constant vector. The model was convolved with the SPM5 canonical HRF and high-pass filtered at 128 s. Contrasts were derived in the first-level analysis to examine the relative activation of listening to words versus an implicit baseline, which included the silent trials. Additional contrasts examined both linearly increasing and decreasing effects of word intelligibility.
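The contrasts described above can be illustrated with a short Python sketch. The specific weights are an assumption: the text states only that linearly increasing and decreasing effects of intelligibility were tested, so equally spaced, zero-sum weights over the 4 filter conditions are used here, and the words-versus-baseline contrast is written as equal positive weights on the 4 condition parameters. The function names are hypothetical.

```python
def linear_contrast(n_levels):
    """Equally spaced, zero-sum weights for a linear trend over n_levels."""
    return [2 * i - (n_levels - 1) for i in range(n_levels)]


def apply_contrast(weights, betas):
    """Weighted sum of per-condition parameter estimates."""
    return sum(w * b for w, b in zip(weights, betas))


# Conditions ordered from least to most intelligible: 400, 1000, 1600, 3150 Hz.
increasing = linear_contrast(4)            # [-3, -1, 1, 3]
decreasing = [-w for w in increasing]      # [3, 1, -1, -3]
words_vs_baseline = [1, 1, 1, 1]           # assumed form of the main contrast
```

A positive value of `apply_contrast(increasing, betas)` at a voxel indicates that the estimated response grows with intelligibility; the same voxel yields the negated value under the decreasing contrast.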
Second-level random-effects analyses were performed to examine the consistency of the effects across participants. An age covariate was included to model the linear effect of increasing age on cortical activation during word listening and to control for age differences in intelligibility analyses. The peak voxel threshold and cluster extent thresholds were each set at P < 0.01 uncorrected (Poline et al. 1997; Harris et al. 2009) for all reported analyses unless otherwise noted.
Based on the second-level aging results, a functional connectivity analysis was conducted. A functionally defined region of interest (ROI) was created from the intersection of the resulting visual cluster and a 16 mm diameter sphere, centered around a peak voxel within that cluster, to ensure that the sphere did not extend beyond the visual cortex result or the edges of the brain. Using MarsBar (Brett et al. 2002), the time series from this seed region was entered as a regressor into first-level analyses. Two additional nuisance regressors consisted of time series extracted from 20% probability gray and white matter masks to control for whole-brain correlations. The results depicted the areas across the entire brain in which activation was correlated with the seed region time series in a second-level group analysis containing age as a covariate.
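The ROI construction, an intersection of the significant cluster with a sphere around a peak voxel, can be sketched in Python as follows. The sketch assumes that voxels are represented by their coordinates in millimeters and that a voxel belongs to the sphere when its center lies within the radius; the function name and data layout are hypothetical and do not reflect the MarsBar interface.

```python
def sphere_roi(cluster_mm, peak_mm, diameter_mm=16.0):
    """Keep cluster voxels whose centers lie within the sphere around the peak.

    cluster_mm: iterable of (x, y, z) voxel centers in mm from the group result.
    peak_mm: (x, y, z) coordinate of the peak voxel, in mm.
    """
    radius_sq = (diameter_mm / 2.0) ** 2
    roi = set()
    for voxel in cluster_mm:
        dist_sq = sum((a - b) ** 2 for a, b in zip(voxel, peak_mm))
        if dist_sq <= radius_sq:
            roi.add(tuple(voxel))
    return roi
```

Because only voxels already in the cluster can enter the ROI, the result cannot extend beyond the visual cortex cluster or the edge of the brain, which is the stated purpose of taking the intersection.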
Six of the 36 participants were not included in this analysis because they did not complete the respiration task (n = 1; age 52), because physiological data were not recorded (n = 3; ages 19, 25, and 61), or because physiological data were too noisy to identify periods of breathing and holding (n = 2; ages 27 and 57). The functional data (preprocessed as described above) were analyzed by examining the blood oxygen level–dependent (BOLD) signal changes that were time locked to behaviorally defined periods of inhalation. Activation was time locked to respiratory events instead of stimulus events (screen color change) to provide greater sensitivity to increases in oxygenation, to mitigate effects in visual cortex that were solely linked to changes in the visual display, and to reduce the impact of age-related processing slowing that could differentially delay the behavior of older adults relative to the stimulus onsets. Because older adults are likely to have a delayed HRF (Taoka et al. 1998) and because of concerns with variability of the speed of responding to task demands, it was important to model each physiological event rather than contrasting average blocks of purported inhalations with average blocks of purported breath holding.
A custom MATLAB (Mathworks, Natick, MA) script was used to smooth the respiratory data (weighted average across an 800 ms window) and identify inhalation onsets and durations to be entered as parameters into the first-level model for each individual. An inhalation was defined as an increase in pressure on the cushion that persisted for more than 700 ms. On average, each participant had 48 (SD = 18) inhalations lasting 1.35 s (SD = 0.27). Figure 2A depicts the raw (gray line) and smoothed (black line) data from a representative participant (age 27), showing how the chosen smoothing and duration criteria classified periods as inhalations (light gray bands). Figure 2B presents a fast-Fourier transform of these data, which permits examination of how the smoothing (black line) and duration criteria (dotted vertical line) ignore information at frequencies that may have resulted from other physiological events and noise. These figures depict that, even if changes were made to the specific cutoff criteria, the data comprise high-powered low-frequency cycles (respiration) rather than high-frequency noise.
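The inhalation criteria described above can be sketched in Python. This is not the authors' MATLAB script: the triangular weights of the 800 ms smoothing window and the definition of an increase as a run of sample-to-sample rises are assumptions, and the function names are hypothetical. The 100 Hz sampling rate, the 800 ms window, and the 700 ms minimum duration come from the text.

```python
def smooth(signal, fs=100, window_ms=800):
    """Weighted moving average; triangular weights are an assumption."""
    half = int(fs * window_ms / 1000) // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        # Weight falls off linearly with distance from the center sample.
        weights = [half + 1 - abs(j - i) for j in range(lo, hi)]
        total = sum(w * signal[j] for w, j in zip(weights, range(lo, hi)))
        smoothed.append(total / sum(weights))
    return smoothed


def find_inhalations(smoothed, fs=100, min_ms=700):
    """Return (onset_s, duration_s) for pressure rises longer than min_ms."""
    min_samples = int(fs * min_ms / 1000)
    events = []
    start = None
    for i in range(1, len(smoothed) + 1):
        rising = i < len(smoothed) and smoothed[i] > smoothed[i - 1]
        if rising and start is None:
            start = i - 1                      # rise begins at previous sample
        elif not rising and start is not None:
            n_samples = (i - 1) - start        # rise ended at sample i - 1
            if n_samples > min_samples:
                events.append((start / fs, n_samples / fs))
            start = None
    return events
```

The resulting onsets and durations correspond to the values that would be entered as parameters in the first-level model for each participant.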
This inhalation parameter provided a means to assess differences in HRF amplitude across participants. The temporal derivative of the HRF was also specified as a parametric modulator. In this way, deviations in lag from the expected HRF could be compared for each participant. Two motion and 2 signal intensity nuisance regressors were also included in the model according to the procedure previously described.
A second-level analysis examined the extent to which differences in HRF amplitude and lag were related to age. To rule out the possibility that age-related differences in word listening were due to differences in vascular reactivity, the ROIs created from the word listening data were used to extract parameter estimates from the respiration data. These values were then entered as a separate regressor in the second-level word-listening analysis.
Word Intelligibility Analyses
Effects of Word Intelligibility on Word Recognition
On average, participants indicated via button press that they had recognized (vs. not recognized) a word for 71.1% (SD = 17.1%) of responses. Figure 3 depicts reported word recognition for each of the 4 filter conditions, showing increased reported recognition with increasing word intelligibility (as determined by the 4 upper cutoff frequencies) across all ages. A repeated measures ANOVA confirmed that intelligibility had a significant influence on reported word recognition, F(3,105) = 124.30, P < 0.001, with all pairwise comparisons P < 0.001. There were no significant correlations between word recognition and age across or within each condition, all |r| < 0.19, P > 0.29.
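The degrees of freedom reported for the intelligibility effect (3 and 105) follow from the design, as the short Python check below shows. It assumes a one-way repeated measures ANOVA with 4 filter conditions, all 36 participants, and no sphericity correction; the function name is hypothetical.

```python
def rm_anova_df(n_conditions, n_participants):
    """Degrees of freedom for a one-way repeated measures ANOVA.

    Assumes uncorrected degrees of freedom (sphericity assumed).
    """
    df_effect = n_conditions - 1
    df_error = df_effect * (n_participants - 1)
    return df_effect, df_error


df_effect, df_error = rm_anova_df(n_conditions=4, n_participants=36)
```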
Because participants occasionally button pressed during no-word trials (M = 3.81 false alarms, SD = 3.34), we were able to calculate d-prime to estimate their rate of button pressing when a word had been presented (hits) after subtracting their bias to respond when a word had not been presented (false alarms during no-word trials). Age was not significantly correlated with this unbiased word detection measure either across (r = −0.21) or within each filter condition (r = −0.16, −0.23, −0.11, −0.05 for 400, 1000, 1600, and 3150 Hz, respectively), all P > 0.05. The normalized false alarm rate component of the d-prime measure (bias to respond during no-word trials) was also not correlated with age, r = −0.06, P > 0.05.
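For readers unfamiliar with d-prime, the computation can be sketched in Python as the difference between the normalized (z-transformed) hit rate and the normalized false alarm rate. The log-linear correction that keeps rates away from 0 and 1 is an assumption, as is the function name; the source does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, n_word_trials, false_alarms, n_noword_trials):
    """d' = z(hit rate) - z(false alarm rate).

    A log-linear correction (add 0.5 to each count and 1 to each total)
    is applied so that rates of 0 or 1 remain finite. This correction is
    an assumption made for the example.
    """
    hit_rate = (hits + 0.5) / (n_word_trials + 1)
    fa_rate = (false_alarms + 0.5) / (n_noword_trials + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)
```

In this design the false alarm rate would be estimated from the 10 no-word trials, and hits from the word trials, either pooled or within a single filter condition.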
Effects of Word Intelligibility on BOLD Signal
Examination of the functional images during the task revealed that listening to words resulted in significantly increased activation throughout the temporal lobe, particularly in the bilateral superior temporal sulcus (STS) and gyrus (STG) and in primary auditory cortex (Heschl’s gyrus, HG). Activation was also observed in the inferior frontal gyrus (Fig. 4A). With parametrically increasing intelligibility (from the 400 to 3150 Hz filter condition), these regions exhibited increasing activation (Fig. 4B). Left and right regions of the anterolateral STG and STS were particularly responsive to word intelligibility.
Aging and Task Difficulty Analyses
Based on evidence that effects similar to those observed in older adults can be observed in young adults by manipulating task demands (Rissman et al. 2009), we predicted that aging and decreasing intelligibility should have a similar impact on the engagement of visual cortex. To test this hypothesis, we examined the effects of parametrically increasing age and decreasing word intelligibility (while controlling for the other variable) on BOLD signal during word listening. In both cases, there was an increase in the activation of medial occipital cortex, bilateral inferior angular gyrus (iAG), and bilateral collateral sulci (Fig. 5A), and thus, increasing age and listening difficulty were related to increasing visual cortex activation.
When the parametric estimate values from this analysis were extracted from an ROI defined at the medial occipital peak within the visual cortex cluster, younger adults appeared to deactivate this region below an implicit baseline while older adults activated it during word listening. The transition between patterns of deactivation and activation began to emerge in middle age, with a change from visual cortex deactivation to activation occurring within this group for the most difficult listening condition (400 Hz). Figure 5B presents these values across filter condition and 3 age groups: younger (<40 years old, n = 15), older (>61 years old, n = 15), and middle aged (n = 6). Similar patterns were found for ROIs based on the entire visual cluster or other subpeaks within the visual cluster as well as for varying ROI sizes (8, 12, and 16 mm diameter spheres). Figure 5B also shows that the pattern of results in panel A (increasing listening difficulty with increasing visual cortex activation) holds within age groups. Thus, there appear to be effects of both age and word intelligibility on visual cortex activity.
As expected, age was highly correlated with average pure-tone threshold from 250 to 8000 Hz (r = 0.79, P < 0.001). Nevertheless, no significant relationship between average pure-tone threshold and visual cortex activation was observed when controlling for the contribution of age across all conditions (partial r = 0.11, P = 0.54) nor within any of the intelligibility conditions (all r < 0.18, P > 0.31). In contrast, the relationship between age and visual cortex activation persisted even when controlling for the contribution of average pure-tone threshold (partial r = 0.42, P = 0.01), indicating that visual cortex activation was more strongly predicted by age than by hearing loss. This pattern held within each of the 3 most difficult listening conditions (partial r = 0.49, P < 0.005 for 400 Hz; partial r = 0.48, P < 0.005 for 1000 Hz; partial r = 0.34, P < 0.05 for 1600 Hz); however, the relation between age and visual cortex activation was no longer significant when controlling for pure-tone thresholds in the easiest condition (partial r = 0.27, P = 0.12 for 3150 Hz). This suggested that hearing loss may have contributed to visual cortex activation for the condition where the range of participants' thresholds was largest and older participants had more hearing loss (3150 Hz in Fig. 1).
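The partial correlations reported above remove the contribution of a third variable from the relation between two others. A minimal Python sketch of the standard first-order formula is given below; the function names are hypothetical, and the source does not state which software was used for these statistics.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_from_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(x, y, z):
    """First-order partial correlation computed from raw data."""
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))
```

Here `x` could be age, `y` the visual cortex parameter estimate, and `z` the average pure-tone threshold. When `x` and `z` are strongly correlated, as age and threshold are in this sample, the denominator shrinks and the partial correlation can differ substantially from the simple correlation.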
Vascular Reactivity Analyses
The respiration data were analyzed to determine the extent to which the age-related visual cortex findings could be attributed to non-neuronal sources of variance in the BOLD signal, such as the amplitude and delay of the vascular response. A group-level analysis demonstrated age differences within visual cortex only for the HRF lag, not amplitude (Fig. 6A). Moreover, HRF differences in visual cortex were only observed with less conservative peak and cluster extent thresholds, each set to P < 0.05.
These weak age-related differences in the vascular reactivity of visual cortex did not account for the relationship between visual cortex activation and age depicted in Figure 5. The upper graphs of Figure 6B show the standardized parameter estimates extracted from medial occipital cortex and the collateral sulcus for the 30 participants whose word listening and respiration data could be analyzed. These results strengthen the findings depicted in Figure 5A by showing that even with 6 fewer participants, age is significantly correlated with visual cortex activation in medial occipital cortex (r = 0.63, P < 0.01). The same was true for a spot in the left collateral sulcus (r = 0.58, P < 0.01). As shown in Figure 6B, this relation persisted when the degree of individual HRF lag was entered as a predictor in a linear regression model (medial occipital cortex r = 0.57, P < 0.01; left collateral sulcus r = 0.55, P < 0.01). These results demonstrate that vascular reactivity did not drive the age-related changes in visual cortex during the word listening task. A separate analysis confirmed that the significant positive relation between increasing age and decreasing word intelligibility, as depicted in Figure 5A, persisted in this subsample (n = 30).
Crossmodal Connectivity Analysis
We predicted that auditory and visual cortical regions would be functionally correlated during the task if visual cortex activation is not spurious but rather driven by listening to words. Additionally, we tested the prediction that the nature of this correlation may be age dependent. Using the medial occipital cortex region as a seed in a functional connectivity analysis, we examined age-related changes in the synchrony of activation in medial occipital cortex with the entire brain. With increasing age, functional connectivity between medial occipital cortex and STS, STG, and HG increased (Fig. 7A). Highlighting that these areas are sensitive to word recognition within the current study, we found that 30.6% of the volume of this region overlapped with the volume of the age-independent region in left auditory cortex that responded to increasing intelligibility (Fig. 4). Figure 7B further clarifies the significant connectivity effect by showing how the sign of the correlation varied with age. While older adults generally demonstrated a positive correlation between occipital and temporal regions, younger adults were more likely to demonstrate inversely correlated activity between these regions, with middle-aged adults demonstrating no association.
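The logic of the seed-based analysis, in which an age-dependent sign of the occipital-temporal correlation is tested, can be sketched under simplifying assumptions. The example below correlates a seed time series with a target time series for each simulated participant, applies the Fisher z-transform, and then relates the resulting values to age. All time series, the coupling model, and the ages are synthetic and hypothetical, and this is not necessarily the procedure used in the study.

```python
import numpy as np

def seed_connectivity(seed_ts, target_ts):
    """Fisher z-transformed Pearson correlation between two time series."""
    r = np.corrcoef(seed_ts, target_ts)[0, 1]
    return float(np.arctanh(r))

rng = np.random.default_rng(2)
ages = np.linspace(20, 80, 30)   # hypothetical ages for 30 simulated participants
n_timepoints = 200               # assumed number of volumes per participant

z_values = []
for age in ages:
    # Hypothetical coupling: negative in younger adults, positive in older adults
    coupling = (age - 50.0) / 60.0
    seed = rng.normal(size=n_timepoints)                       # medial occipital seed proxy
    target = coupling * seed + rng.normal(size=n_timepoints)   # temporal region proxy
    z_values.append(seed_connectivity(seed, target))
z_values = np.array(z_values)

# Group-level question: does connectivity become more positive with age?
age_effect = float(np.corrcoef(ages, z_values)[0, 1])
```

The Fisher transform is used here because correlation coefficients are bounded, which makes raw r values poorly suited to a second-level linear analysis.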
The results of this study are consistent with evidence that aging is associated with a decreasing ability to inhibit crossmodal processing in a unimodal task. In addition, we observed that declining stimulus integrity had an effect that was independent of, and spatially similar to, that of aging. Together these results demonstrate additive effects of age and word intelligibility on the degree of visual cortex engagement in an auditory word listening task. The relation between temporal and occipital cortex was not driven by age-related differences in vascular reactivity; rather, these regions were functionally correlated across word intelligibility conditions. The nature of this correlation appeared to change with age: younger adults deactivated visual cortex when activating auditory cortex, while older adults synchronously activated both cortices.
In this study, participants listened to words at 4 different levels of intelligibility and reported whether they recognized each word. The results showed the typical pattern of bilateral temporal and left inferior frontal activation when listening to speech and increasing activation in bilateral STS, STG, and HG with increasing intelligibility (Binder et al. 2000; Fridriksson et al. 2006; Scott et al. 2006; Sharp et al. 2006; Obleser et al. 2007; Eckert, Walczak, et al. 2008; Harris et al. 2009). We did not observe age effects in the responsiveness of auditory cortex to degraded stimuli. In the context of the Inhibitory Deficit Hypothesis, this result is consistent with evidence that the difference between younger and older adults lies not in the ability to activate task-specific information (bilateral auditory cortical regions) but rather in the ability to suppress extra-task information (visual cortical regions; Gazzaley et al. 2005).
Indeed, we found that with increasing age, visual cortex activation increased, primarily in regions surrounding the calcarine sulcus, collateral sulcus, and inferior angular gyrus. Younger adults tended to suppress visual cortex activation below the implicit baseline when listening to words, supporting previous results (Johnson and Zatorre 2005, 2006; Hairston et al. 2008; Yoncheva et al. 2010). Consistent with the Inhibitory Deficit Hypothesis (Hasher and Zacks 1988; Gazzaley and D'Esposito 2007), older adults activated this region above baseline. The shift from visual cortex suppression to activation occurred in middle age, notably in the most difficult listening condition. These results suggest an interaction between degradations in auditory signal and aging on the extent to which crossmodal systems are engaged.
Mechanisms for this age-related change could involve declines in top-down inhibitory control (Gazzaley and D'Esposito 2007; Eckert, Walczak, et al. 2008) and slowed processing speed (Eckert et al. 2010) that occur with declining integrity of prefrontal cortex (Eckert 2011), potentially producing a shift in baseline levels of attention (Greicius and Menon 2004). White matter pathology is consistently observed in people with slowed processing speed, particularly within frontal tracts projecting to and from the dorsolateral prefrontal cortex, which contributes to top-down control. Eckert et al. (2010) observed that periventricular white matter hyperintensities were most frequent in older adults with reduced dorsolateral prefrontal cortex gray matter and slow processing speed. Because changes in processing speed are due at least in part to increased distractibility (Lustig et al. 2006), we predict that declines in dorsolateral prefrontal tracts, due to cerebral small vessel disease, for example, limit the top-down control of frontal cortex on posterior sensory cortex. The consequence, based on the results of this study, appears to be failed suppression of visual cortex during a speech recognition task. Indeed, this could be one explanation for why older adults often experience distraction when trying to follow conversation (e.g., Tun et al. 2002).
The association between decreasing word intelligibility and increasing visual cortex activation suggests that degraded auditory representation may exacerbate this distractibility. Specifically, more frontal resources may be required to attend to difficult auditory stimuli and, as a result, inhibitory visual control decreases. Thus, older adults who are least able to suppress crossmodal information due to declining frontal control and degraded auditory representations should be most distracted by irrelevant visual stimuli when trying to understand speech in naturalistic conditions. While the visual information being suppressed may be internally represented (e.g., Johnson and Zatorre 2005), it is also possible that older adults' increased distractibility results in differential attention to the crosshair stimulus that appeared on every word listening trial (Townsend et al. 2006). We believe that this is a testable prediction suggested by our results, which may be best examined by having older and younger adults participate in auditory fMRI tasks while their locus of attention to potentially distracting visual stimuli is monitored via eye tracking.
As an alternative to the interpretation that older adults' distractibility leads to an inhibitory failure of suppression, visual cortex may be activated in difficult listening conditions as a compensatory strategy to sustain optimal performance relative to younger adults. Indeed, we did not observe age differences for reported word recognition. Such a strategy could be unrelated to reported age-related differences in the speed-accuracy trade-off (Salthouse 1979), as response time was also not correlated with age (within and across all conditions, r < 0.21, P > 0.23). While it is possible that the task and the restricted range of reaction times resulted in insensitivity to performance differences in the sample, the fact that reported recognition tracked with the degree of word filtering suggests that participants were responsive to task demands. Also supporting this compensatory explanation, adults with hearing loss are, in fact, more likely to rely on visual cues than normal-hearing adults, regardless of age. For example, Pelson and Prather (1974) found that not only were older adults with hearing loss more successful speechreaders compared with age-matched normal-hearing controls, but they were also most greatly aided by the presentation of a semantically related scene cue during speechreading compared with controls. Older adults with hearing loss are often taught to use visual cues, such as those from articulatory gestures in speechreading and visual speech, to support word recognition (Erber 1975, 1996). In particular, a speech perception training protocol developed by Humes et al. (2009) trains older adults with hearing loss to link the orthographic forms of words to their degraded auditory form to improve word recognition. Work currently underway in our laboratory examines whether training to engage visual cortical areas supports improvements in understanding of speech.
The nature of the visual representation that younger adults suppress and older adults engage is not discernible in the current study, which employed a unimodal auditory word recognition task. The BrainMap application Sleuth (http://www.brainmap.org/sleuth/index.html; Laird et al. 2005) was used to identify studies which reported peak coordinates within a 10 mm³ cube surrounding the visual cortex peaks observed in the current study. Peak coordinates surrounding the medial occipital sulcus or left collateral sulcus peaks reported in Table 1 were observed in studies that involved the visual presentation of shapes and objects (e.g., Kohler et al. 2000; Rowe et al. 2000; Lepage et al. 2001; Gitelman et al. 2002) and orthography (e.g., Petersen et al. 1989; Price et al. 1994; Buckner et al. 1998; Kuo et al. 2001). Studies in which peak activation was observed surrounding the left iAG peak involved the presentation of myriad sensory stimuli, including motion (Decety et al. 1994), tactile (Ricciardi et al. 2006), spatial (Bonda et al. 1995), auditory (Sharp et al. 2010), and visual (shapes, Beason-Held et al. 1998; word forms, Buckner et al. 1996). This diversity points to left iAG's involvement in multisensory integration. Thus, the representations associated with the visual cortex activation observed in this study may be related to imagery of objects or scenes related to the words' semantic content or imagery of the words' orthographic forms. Determining the exact nature of this representation is beyond the scope of the current paper but could be investigated in future studies that manipulate the nature of audiovisual material that is presented to older and younger adults.
|Region|MNI x|MNI y|MNI z|T-score|Cluster size (voxels)|
|---|---|---|---|---|---|
|Areas correlated with decreasing intelligibility||||||
|Left collateral sulcus|−28|−81|−15|4.31|363|
|Right collateral sulcus|23|−70|−23|3.68|142|
|Areas correlated with increasing age||||||
|Left collateral sulcus|−29|−63|−3|4.32||
|Medial occipital cortex|−11|−94|0|4.14||
|Right collateral sulcus|31|−59|−9|3.84|119|
|Left superior sensory cortex|−24|−38|60|3.67|182|
Note: Reported regions have peak voxels exceeding P < 0.01 and cluster extent thresholds P < 0.01 (65 voxels). Italicized entries represent a single visual cluster with 3 significant subpeaks. SG, supramarginal gyrus.
To further demonstrate that the suppression effect in visual cortex stemmed from engaging in the auditory task, thereby extending the results described by Eckert, Walczak, et al. (2008), we presented evidence for a functional link between the activation in left auditory cortex and visual cortex. The correlation between these areas became increasingly positive with age, with younger adults tending to show a negative relation, middle-aged adults showing almost no relation, and older adults showing a positive relation. The negative correlation found for younger adults again supports the interpretation that suppression of visual cortex is tied to listening to words. The positive correlation for older adults may be indicative of the engagement of a crossmodal system as a compensatory mechanism to support failing speech recognition in older adults. Behavioral work by Laurienti et al. (2006) suggested that older adults exhibit increased crossmodal integration compared with younger adults, engaging auditory and visual processing in synchrony (Peiffer et al. 2007).
Other researchers have found that while older and younger adults' speech recognition performance improves equivalently from the presentation of audiovisual compared with unimodal stimuli, older adults show overall reduced performance on audiovisual tasks compared with younger adults (Sommers et al. 2005; Tye-Murray et al. 2008). This suggests that the synchronous visual and auditory cortical activation we observed may not be compensatory. Rather, age-related failures of crossmodal suppression may result in increases in processing information that does not help or even hinders speech recognition performance. In fact, the degree to which increased crossmodal integration may be harmful (i.e., processing extratask information is distracting), helpful (i.e., processing extratask information may provide task-relevant cues), or irrelevant has been shown to depend on the nature of the task and stimuli (May 1999). While we do not distinguish among these alternatives in the current study, it will be important to determine the conditions under which visual cortex upregulation supports or diminishes speech recognition performance.
Hearing loss, which degrades the incoming speech signal, may also contribute to visual cortex activation. It was especially important to examine the impact of audibility on our results given research showing that differences in hearing loss may account for apparent differences in inhibition (Murphy et al. 1999). While controlling for individual differences in average pure-tone thresholds did not eliminate the correlation between age and activation in visual cortex, hearing loss independent of age was most related to visual cortex activation in the easiest listening condition. In this condition, audibility varied the most across participants (Fig. 1). Thus, there may be independent contributions of word intelligibility and age to crossmodal activation, due to both declining inhibitory control and hearing loss.
The visual cortex results, however, could not be attributed to declines in vascular reactivity. Because vascular reactivity has been shown to decrease with age, especially in visual cortex (Ross et al. 1997; Buckner et al. 2000), we utilized the results of a respiration experiment (Thomason and Glover 2008) to ensure that the age effect was not driven by physiological changes. We found a relatively weak increase in the time-to-peak (lag) of the HRF with increasing age in visual cortex, primarily surrounding the fusiform gyrus. These small effects did not account for the age effects on the activation of visual cortex associated with listening to speech.
The current study supports the premise that age and task difficulty are additive modulators of crossmodal suppression and activation. Unlike younger adults, older adults did not inhibit visual cortex activation during word listening. Extending previous work (Rissman et al. 2009), we found that degraded word intelligibility had an effect comparable to aging, suggesting that both factors modulate the level of activation of extratask cortical regions. Additionally, functional connectivity analyses provided a means to show that this visual cortex responsiveness is linked to auditory processing. This relationship appears malleable across the lifespan, highlighting that important changes occur in the understudied middle-aged population when the effects of cerebral small vessel disease and other age-related events begin to occur (Eckert 2011).
This work was supported by the National Institute on Deafness and Other Communication Disorders (P50 DC00422) and the MUSC Center for Advanced Imaging Research. This investigation was conducted in a facility constructed with support from the Research Facilities Improvement Program (C06 RR14516) from the National Center for Research Resources (NCRR), National Institutes of Health (NIH). This project was supported by the South Carolina Clinical & Translational Research (SCTR) Institute, with an academic home at the Medical University of South Carolina, NIH/NCRR (UL1 RR029882).
We thank the study participants. Conflict of Interest: None declared.