Learning perceptual skills is characterized by rapid improvements in performance within the first hour of training (fast perceptual learning) followed by more gradual improvements that take place over several daily practice sessions (slow perceptual learning). Although it is widely accepted that slow perceptual learning is accompanied by enhanced stimulus representation in sensory cortices, there is considerable controversy about the neural substrates underlying early and rapid improvements in learning perceptual skills. Here we measured event-related brain potentials while listeners were presented with 2 phonetically different vowels. Listeners' ability to identify both vowels improved gradually during the first hour of testing and was paralleled by enhancements in an early evoked response (∼130 ms) localized in the right auditory cortex and a late evoked response (∼340 ms) localized in the right anterior superior temporal gyrus and/or inferior prefrontal cortex. These neuroplastic changes depended on listeners' attention and were preserved only if practice was continued; familiarity with the task structure (procedural learning) was not sufficient. We propose that the early increases in cortical responsiveness reflect goal-directed changes in the tuning properties of auditory neurons involved in parsing concurrent speech signals. Importantly, the neuroplastic changes occurred rapidly, demonstrating the flexibility of human speech segregation mechanisms.
Perceptual learning refers to an improvement in sensory discrimination after a period of practice that can vary from days to several weeks (Watson 1980; Karni and Bertini 1997). Animal studies have shown that extended training is accompanied by topographical changes in sensory cortex (Recanzone and others 1993; Blake and others 2002; Bao and others 2004; however, for a failure to find any auditory cortical correlate of perceptual learning in the cat, see Brown and others 2004; Polley and others 2004; Witte and Kipke 2005). In humans, neuroplastic changes in latency and amplitude of scalp-recorded event-related brain potentials (ERPs) have been observed after extensive training (Tremblay and others 1997; Atienza and others 2002; Pantev and others 2003; Reinke and others 2003; Bosnyak and others 2004; Gottselig and others 2004). For instance, the latency of the N1 wave, a negative deflection that peaks at about 100 ms after sound onset over the frontocentral scalp region, peaked earlier after extended training (Reinke and others 2003; Bosnyak and others 2004), whereas the N1c component (∼140 ms) of the auditory evoked response measured at the right temporal scalp site increased in amplitude with practice (Bosnyak and others 2004). In addition, extended training has been found to enhance the amplitude of the P2 wave, a positive deflection that peaks at about 180 ms after sound onset over the frontocentral scalp region (Tremblay and others 2001; Atienza and others 2002; Reinke and others 2003; Bosnyak and others 2004). The training-related P2 enhancement appears after 2 (Atienza and others 2002) or 3 (Bosnyak and others 2004) daily test sessions and may index neuroplastic changes associated with slow perceptual learning. Demonstrating the effects of long-term musical training, Shahin and others (2003) observed a larger N1c amplitude over the right temporal lobe and larger P2 amplitude at central sites in musicians relative to nonmusicians. 
Together, these studies showed learning-related changes in sensory evoked responses (i.e., N1, N1c, and P2) that can be observed after training in a wide range of auditory tasks and between individuals with different amounts of musical expertise.
Although it is well accepted that neuroplasticity occurs following extended training, there is considerable debate about the neural substrates underlying early and rapid improvements in task performance taking place within the first hour of training. The procedural hypothesis posits that rapid improvement in performance is due to changes in observers' strategies that occur during task familiarization (Karni and Bertini 1997). However, evidence from a recent behavioral study suggests that early and rapid improvement in task performance can occur while controlling for learning the response demands of the task (Hawkey and others 2004), suggesting that at least part of the changes underlying improvement in performance takes place in sensory cortices. Additionally, animal research has revealed rapid neuroplastic changes in primary auditory cortex during classical conditioning (Bakin and Weinberger 1990; Edeline and others 1993), instrumental avoidance conditioning (Bakin and others 1996), and auditory discrimination learning (Fritz and others 2003). Although these studies suggest that rapid auditory discrimination learning may be related to neuronal plasticity in sensory cortex, neurophysiological evidence supporting such neuroplastic changes in human observers remains elusive.
Previous ERP studies on the time course of perceptual learning in human observers have observed enhanced amplitude of the mismatch negativity (MMN) wave within a single daily training session (Atienza and others 2002; Gottselig and others 2004). In these studies, the MMN was recorded during passive listening prior to and after a brief training session. Although the MMN is thought to index a preattentive change detection process (Näätänen 1992), evidence from many studies has suggested that attention to auditory stimuli enhances MMN amplitude (Woldorff and others 1991, 1998; Alain and Woods 1997; Arnott and Alain 2002). Therefore, it is unclear whether enhanced MMN amplitude following training was related to learning per se or whether it indexed increased participants' attention to recently learned auditory material.
The present study was therefore designed to directly test, using ERPs, the hypothesis that neuroplastic changes in sensory cortex parallel rapid perceptual learning. We used a difficult speech segregation task in which participants were asked to identify 2 phonetically different vowels presented simultaneously. Such a design allows us to examine whether rapid improvement in task performance is accompanied by ERP changes localized in auditory areas or whether it is associated with changes in more cognitive and strategic processes. One important question is whether task familiarity is sufficient for generating changes in auditory cortical areas. If rapid improvements in performance are due to an increase in task familiarity or other cognitive factors, then rapid training-related changes in ERPs should be observed in late cortical activity following stimulus encoding such as during stimulus classification and categorization. However, if improvement in performance is due to training-related plasticity in sensory cortices, then changes should be seen in the amplitude and/or latency of early sensory evoked responses generated in primary and/or associative auditory cortex.
Materials and Methods
Thirty-two participants provided written informed consent to participate in the study. In Experiment 1, there were 8 women and 8 men aged between 19 and 34 years (M = 24 ± 4.6 years). In Experiment 2, there were 10 women and 6 men (none of whom participated in Experiment 1) aged between 19 and 34 years (M = 24 ± 4.7 years). All participants were right-handed and had pure-tone thresholds within normal limits for frequencies ranging from 250 to 4000 Hz (both ears). English was the first language of all participants. Ethical approval and informed consent were obtained according to the guidelines set out by the Baycrest Centre for Geriatric Care and the University of Toronto.
Stimuli and Task
Stimuli were 5 synthetic steady-state American English vowels: /i/, /a/, /æ/, /u/, /з/ (Assmann and Summerfield 1994). Each vowel was 200 ms in duration (2442 samples at a 12.21-kHz sample rate, 16-bit quantization) with fundamental frequency (f0) and formant frequencies held constant for the entire duration. Onsets and offsets were shaped by 2 halves of an 8-ms Kaiser window, respectively. Double-vowel stimuli were created by adding together the digital waveforms of 2 different vowels and then dividing the sum by 2. Stimuli were examined using an oscilloscope to ensure that there was no “clipping.” The vowels were added in phase and this resulted in smaller amplitude when the 2 vowels differed in f0. Each pair contained one vowel with f0 set at 100 Hz; the other vowel's f0 was set at 100, 101, 103, 106, 112, or 126 Hz. Each vowel was paired with every other vowel, giving 120 different pairs. Stimuli were converted to analog form using a Tucker Davis Technologies (TDT) RP-2 real-time processor (24-bit, 90-kHz bandwidth) under the control of a Dell computer with a Pentium 4 processor. The analog outputs were fed into a Headphone driver (TDT HB-7) and then transduced by a pair of headphones (Sennheiser HD 265). Stimuli were digitally low-pass filtered at 6 kHz and presented binaurally at about 80 dB SPL (Damilar SPL meter model 824). No attempt was made to compensate for absolute level differences among the different vowel pairs.
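The stimulus construction described above can be sketched in a few lines. This is an illustration only, not the authors' synthesis code: the vowel labels, the helper names, and the pure-tone `placeholder_vowel` are assumptions introduced here (the real stimuli were formant-synthesized vowels). One reading that yields the reported 120 pairs is every ordered pair of different vowels (first vowel at 100 Hz) crossed with the 6 f0 values of the second vowel.

```python
import itertools
import math

VOWELS = ["i", "a", "ae", "u", "er"]         # labels standing in for the 5 vowels
F0_SECOND = [100, 101, 103, 106, 112, 126]   # Hz; the other vowel is fixed at 100 Hz
SAMPLE_RATE = 12210                          # Hz (12.21 kHz, from the text)
N_SAMPLES = 2442                             # 200 ms at 12.21 kHz


def placeholder_vowel(f0_hz, n_samples=N_SAMPLES, rate=SAMPLE_RATE):
    """Stand-in waveform: a pure tone at f0, NOT a formant-synthesized vowel."""
    return [math.sin(2 * math.pi * f0_hz * n / rate) for n in range(n_samples)]


def mix(wave_a, wave_b):
    """Add two equal-length waveforms sample by sample and divide the sum by 2."""
    if len(wave_a) != len(wave_b):
        raise ValueError("waveforms must have the same length")
    return [(a + b) / 2 for a, b in zip(wave_a, wave_b)]


# Assumed design: ordered pairs of different vowels crossed with 6 f0 values.
conditions = [
    (vowel_at_100, other_vowel, f0)
    for vowel_at_100, other_vowel in itertools.permutations(VOWELS, 2)
    for f0 in F0_SECOND
]

double_vowel = mix(placeholder_vowel(100), placeholder_vowel(126))
duration_ms = 1000 * N_SAMPLES / SAMPLE_RATE

print(len(conditions), "conditions;", round(duration_ms), "ms per stimulus")
```

Halving the sum keeps the mixture within the range of the two inputs, which is one simple way to avoid the clipping the authors checked for on the oscilloscope.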
At the beginning of Experiment 1, each vowel was presented individually (30 trials, 5 vowels by 6 f0 levels), and participants identified each vowel by pressing a corresponding key on the number pad of a keyboard. None of the participants had any difficulty in identifying the single vowels, and all reached a level of 95% correct or better. For the experiment proper, double-vowel stimuli were randomized in blocks of 120 trials, and 5 blocks of trials were presented, each block lasting about 8 min. On each trial, participants were presented with a double-vowel stimulus and asked to identify both vowels in the pair by sequentially pressing the 2 corresponding keys on a keypad. The interval between the participant's response and the next trial was 1500 ms. No feedback was provided. Participants were provided with a brief pause between blocks (∼2 min).
ERPs were recorded during two 1-h sessions separated by 1 week. Half of the participants practiced the task during the intervening week, whereas the other half served as controls and did not receive additional training. The trained group received 4 sessions of training on the vowel discrimination task on 4 separate days. No ERPs were recorded during these sessions, but otherwise the blocks of trials were identical to those of the ERP sessions. Each training session comprised 3 blocks of trials and lasted about 35 min.
In Experiment 2, the same stimulus set was presented in a single 1-h session while a new group of participants, none of whom took part in Experiment 1, watched a muted subtitled movie of their choice. Because no response was required, the interstimulus interval (i.e., offset to onset) was set at 1500 ms. All other aspects of stimulus presentation were identical to those of the active listening condition.
The data from Experiments 1 and 2 were also used in our previous reports (Reinke and others 2003; Alain, Reinke, He, and others 2005). Here they were subjected to further analysis, and new results are reported.
Electrophysiological Recording and Analysis
The electroencephalogram was digitized continuously (bandpass 0.05–50 Hz; 250-Hz sampling rate) from an array of 64 electrodes. We recorded eye movements with electrodes placed at the outer canthi and at the inferior orbits. During recording, all electrodes were referenced to the Cz electrode; for off-line data analysis, they were re-referenced to an average reference. The analysis epoch included 200 ms of prestimulus activity and 1200 ms of poststimulus activity. Trials contaminated by excessive peak-to-peak deflection (±100 μV) at the channels not adjacent to the eyes were automatically rejected before averaging. ERPs were then averaged separately for each electrode site and stimulus block. For each individual average, we corrected ocular artifacts by means of ocular source components (Berg and Scherg 1994; Picton and others 2000) using Brain Electrical Source Analysis (BESA 3.0). ERPs were digitally filtered to attenuate frequencies above 30 Hz. ERP amplitudes were measured relative to the mean amplitude over the prestimulus interval. The effects of learning were examined for the 100- to 140-ms and the 300- to 400-ms interval. The former was chosen because it encompassed the N1 and N1c, which were previously found to be modulated by extended training (Reinke and others 2003; Bosnyak and others 2004). The latter was chosen because it covered the N2 wave, which is thought to reflect a decision process that controls behavioral responses in sensory discrimination tasks (Ritter and others 1979, 1982). The effects of practice on the N1 and N2 were examined at 9 frontocentral sites (F1, Fz, F2, FC1, FCz, FC2, C1, Cz, C2) and left (T7) and right (T8) temporal sites. Previous research has shown enhanced P2 amplitude after only 2 to 3 recording sessions (Atienza and others 2002; Bosnyak and others 2004). 
In the present study, we examined whether the P2 amplitude would vary within a single test session by comparing mean P2 amplitude (40 ms centered around the group mean latency) recorded over frontocentral, central, and centroparietal scalp electrodes (i.e., FC1, FCz, FC2, C1, Cz, C2, CP1, CPz, CP2).
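The amplitude measure used throughout (mean amplitude in a latency window, relative to the prestimulus mean) can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the BESA pipeline used in the study: the function names are hypothetical, the window is treated as half-open, and the waveform is synthetic. Only the 250-Hz sampling rate, the −200 to 1200 ms epoch, and the two analysis windows come from the text.

```python
SAMPLING_RATE_HZ = 250   # from the recording description
EPOCH_START_MS = -200    # epoch spans -200 to +1200 ms around sound onset


def ms_to_index(t_ms):
    """Convert a latency in ms (relative to sound onset) to a sample index."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the prestimulus interval from every sample."""
    onset = ms_to_index(0)
    baseline = sum(epoch[:onset]) / onset
    return [value - baseline for value in epoch]


def mean_amplitude(epoch, start_ms, end_ms):
    """Mean of the baseline-corrected epoch over [start_ms, end_ms)."""
    corrected = baseline_correct(epoch)
    window = corrected[ms_to_index(start_ms):ms_to_index(end_ms)]
    return sum(window) / len(window)


# Synthetic averaged waveform in microvolts: a constant 2.0 offset everywhere,
# plus a 3.0 deflection confined to the 100- to 140-ms window.
n_samples = ms_to_index(1200)
erp = [2.0] * n_samples
for i in range(ms_to_index(100), ms_to_index(140)):
    erp[i] += 3.0

early = mean_amplitude(erp, 100, 140)   # window covering N1/N1c (Ta at temporal sites)
late = mean_amplitude(erp, 300, 400)    # window covering the N2 latency range
print(early, late)
```

Because the offset is removed by the baseline step, only the deflection inside the window contributes to the measure.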
Source analysis of the group average ERPs was performed using standardized low-resolution brain electromagnetic tomography (sLORETA). This method is based on standardized values of the current density estimates given by the minimum norm solution (Pascual-Marqui 2002). That is, for each location, the squared current density strength is divided by its expected a posteriori variance. Three spherical shells were used as the volume conductor model. sLORETA was computed on a 3-dimensional regular grid with 5 mm spacing using Curry software (version 5.0, Compumedic/Neuroscan, El Paso, TX). In the current study, we used the statistical maps in a qualitative way to identify the most likely sources of learning-related differences, having established that these differences were significant using statistical tests on scalp-recorded data.
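The standardization step can be written compactly. The symbols below are introduced here for illustration and do not appear in the text; the expression follows the general form of the sLORETA statistic as described by Pascual-Marqui (2002), and the exact variance model used by the Curry implementation is not specified in this section.

```latex
% \hat{\jmath}_\ell : minimum-norm current density estimate at grid location \ell (3 components)
% [S_{\hat{\jmath}}]_{\ell\ell} : 3 x 3 block of the estimate's variance at that location
\[
  s_\ell \;=\; \hat{\jmath}_\ell^{\top}\,
  \bigl[S_{\hat{\jmath}}\bigr]_{\ell\ell}^{-1}\,
  \hat{\jmath}_\ell
\]
```

For a source with fixed orientation this reduces to the squared current density divided by its variance, which is the description given above.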
Figure 1 illustrates the group mean accuracy in identifying the dominant and nondominant vowel during the first ERP recording session. When 2 vowels are played together, listeners generally hear one of the vowels as foreground (dominant) and the other as background (nondominant). Across participants and blocks, the percentage of trials in which at least one vowel was correctly identified ranged from 78% to 100% (mean = 94%, standard error [SE] = 1%), whereas the percentage of trials in which both vowels were correctly identified ranged from 14% to 93% (mean = 51%, SE = 6%). (In the present study, the accuracy data represent success in identifying either 1 or 2 vowels regardless of the f0 separation between the 2 vowels. As in previous studies, we found that the likelihood of correctly identifying both vowels increased with larger f0 separation between the 2 vowels. For more details about the effects of f0 separation on accuracy and ERPs, see Alain, Reinke, He, and others 2005.) In both cases, the mean levels of performance for identifying either 1 or 2 vowels were above chance levels of 70% and 10%, respectively.
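The chance levels of 70% and 10% can be checked by enumeration, assuming that a guessing listener picks 2 different vowels out of the 5 with equal probability (an assumption made here; the text does not spell out the guessing model). The vowel labels are placeholders.

```python
from fractions import Fraction
from itertools import combinations

VOWELS = ["i", "a", "ae", "u", "er"]   # labels standing in for the 5 vowels

# Any presented pair gives the same result by symmetry.
target = {"i", "a"}

# All 10 unordered responses consisting of 2 different vowels.
guesses = [set(pair) for pair in combinations(VOWELS, 2)]

both_correct = Fraction(sum(g == target for g in guesses), len(guesses))
at_least_one = Fraction(sum(bool(g & target) for g in guesses), len(guesses))

print(f"both correct: {float(both_correct):.0%}")
print(f"at least one correct: {float(at_least_one):.0%}")
```

Only 3 of the 10 possible responses avoid both presented vowels, which is why guessing already yields 70% on the "one correct" measure and why that measure has little room to improve.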
Listeners' ability to identify the dominant vowel, as reflected by the percentage of trials in which at least one vowel was correctly identified (one correct), improved from the first to the second block of trials (main effect of block, F4,60 = 4.12, P < 0.01; quadratic trend, F1,15 = 6.78, P < 0.02) and remained stable throughout the rest of the experiment (Fig. 1). Pairwise comparisons revealed a significant increase in accuracy but only between the first and the remaining blocks (P < 0.05 in all cases). In contrast, successful sound segregation as measured by identifying both vowels (both correct) gradually improved from the first to the fifth block of trials (main effects of block, F4,60 = 5.07, P < 0.005; linear trend, F1,15 = 12.52, P < 0.005). Pairwise comparisons revealed significant improvement in performance in blocks 3, 4, and 5 relative to the first block (P < 0.05 in all cases). These behavioral differences in the time course of perceptual learning are likely to be related to the greater difficulty in identifying both vowels correctly compared with identifying only the dominant vowel.
The time required for identifying the 2 vowels was also analyzed using a within-subject analysis of variance (ANOVA) with block and button press (first vs. second) as variables. Overall, the mean response time for the first (RT1) and second button presses (RT2) was 1811 and 2385 ms, respectively. These relatively long response times are likely due to the fact that the task was difficult and that we emphasized accuracy over speed in our instructions to the participants. Both the time needed to generate RT1 and RT2 decreased significantly as a function of block (main effects of block, F4,60 = 6.82, P < 0.005; linear trend, F1,15 = 13.39, P < 0.005). Pairwise comparisons revealed significant decreases in RT1 and RT2 in blocks 2, 3, 4, and 5 relative to block 1 (P < 0.05 in all cases). The interaction between block and button presses (RT1, RT2) was not significant.
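The linear and quadratic trend tests reported for the 5 blocks can be illustrated with orthogonal polynomial contrasts. The sketch below uses the standard contrast weights for 5 equally spaced levels and tests each participant's contrast score against zero; for a single within-subject contrast the resulting F(1, n − 1) equals the squared one-sample t. The data and function names are made up for illustration, and the statistical software actually used is not stated in the text.

```python
import math

LINEAR = [-2, -1, 0, 1, 2]       # orthogonal polynomial weights, 5 equally spaced levels
QUADRATIC = [2, -1, -2, -1, 2]


def contrast_scores(data, weights):
    """One contrast score per participant: weighted sum over that participant's blocks."""
    return [sum(w * v for w, v in zip(weights, row)) for row in data]


def trend_f(data, weights):
    """F(1, n - 1) for a within-subject trend: squared one-sample t of the contrast scores."""
    scores = contrast_scores(data, weights)
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return t * t, (1, n - 1)


# Made-up percent-correct scores for 4 hypothetical listeners across 5 blocks.
accuracy = [
    [40, 45, 52, 55, 61],
    [35, 41, 43, 50, 52],
    [50, 52, 58, 57, 63],
    [44, 47, 49, 55, 56],
]

f_linear, df_linear = trend_f(accuracy, LINEAR)
f_quadratic, df_quadratic = trend_f(accuracy, QUADRATIC)
print(f"linear F{df_linear} = {f_linear:.2f}, quadratic F{df_quadratic} = {f_quadratic:.2f}")
```

With 16 participants the same computation gives the 1 and 15 degrees of freedom attached to the trend tests in the text.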
The ERP analyses begin with those obtained during the first training session and included 16 participants. During the processing of double-vowel stimuli, scalp-recorded auditory ERPs revealed well-known N1, P2, N2, and slow-wave responses peaking, respectively, at about 112, 208, 310, and 610 ms after sound onset over the frontocentral scalp regions. There was no difference in N1 and P2 amplitudes recorded over the frontocentral scalp electrodes as a function of block (for N1 and P2, main effect of block, F4,60 < 1.27; linear trend, F1,15 < 1.07).
At temporal sites, there was a positive wave referred to as Ta (Wolpaw and Penry 1975), which was larger over the right hemisphere and inverted in polarity with the N1 wave recorded at frontocentral sites. We identified 2 ERP differences associated with rapid improvement in performance. The first was characterized by an increase in Ta amplitude (100–140 ms) (main effect of block, F4,60 = 7.07, P < 0.001; linear trend, F1,15 = 11.65, P < 0.005), which was greater over the right (T8) than the left (T7) temporal site (Fig. 2A) as evidenced by a block × hemisphere interaction (linear trend, F1,15 = 5.23, P < 0.05). A separate ANOVA on the mean amplitude recorded at T8 yielded a main effect of block (F4,60 = 5.97, P < 0.001; linear trend, F1,15 = 9.91, P < 0.01). Pairwise comparisons revealed significant increases in blocks 4 and 5 relative to the first block (P < 0.01 in all cases). There was no significant difference in ERP amplitude recorded over the left temporal site as a function of block (main effect of block, F4,60 = 1.04; linear trend, F1,15 = 0.25). The training-related enhancement in Ta amplitude was followed by an increased positivity between 300 and 400 ms after sound onset (main effect of block, F4,60 = 4.29, P < 0.01, linear trend, F1,15 = 14.13, P < 0.005). Although this modulation overlapped in latency with the N2 wave recorded at frontocentral sites, it likely reflects a different process because its polarity was inverted at temporal sites, whereas the N2 wave usually does not reverse in polarity at these sites (Näätänen 1992). Like the Ta effect, the right temporal enhancement in ERP amplitude, referred to as the rT350, was greater over the right than the left temporal site as revealed by a significant interaction between hemisphere and block (linear trend, F1,15 = 7.25, P < 0.05). 
For the mean ERP amplitude recorded at T8, the main effect of block was significant (F4,60 = 5.97, P < 0.001; linear trend, F1,15 = 16.79, P < 0.001) with pairwise comparisons revealing significant increases in blocks 3, 4, and 5 relative to the first block (P < 0.001 in all cases). There was no significant difference in ERP amplitude recorded over the left auditory cortex (i.e., T7) as a function of block (F4,60 < 1; linear trend, F1,15 = 2.39, P = 0.14).
The previous analyses suggest a link between changes in neural activity and behavioral improvement. To further investigate this relationship, we first compared the rates of change in ERP amplitude and performance across blocks using orthogonal polynomial decomposition (with a focus on the linear and quadratic trends). For each participant, we standardized the accuracy (i.e., probability of reporting the 2 vowels accurately), response times, and electrophysiological data (i.e., mean amplitude for the 100- to 140-ms and 300- to 400-ms intervals at T8) based on the within-subject range over the blocks of trials (i = 1–5) for each measure, i.e., (block_i − block_min)/(block_max − block_min). (For this analysis as well as the correlation between response time and ERP amplitude, the response times to the first [RT1] and second [RT2] button presses were averaged together because they showed comparable learning rates [see Behavioral Data] and were highly correlated with each other, r = 0.96.) Figure 3 shows the group mean change in accuracy, response time, and brain activity as a function of block. An ANOVA on the standardized values revealed a main effect of block (linear trend, F1,15 = 61.81, P < 0.001), but more importantly there was no block by measure interaction, F < 1, indicating that increases in ERP amplitudes over the right temporal cortex paralleled behavioral improvement.
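The within-subject range standardization places accuracy, response time, and ERP amplitude on a common 0 to 1 scale so that their rates of change can be compared. A minimal sketch, with a hypothetical function name and made-up values for one participant:

```python
def range_standardize(values):
    """Rescale one participant's per-block values to [0, 1] using their own min and max."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("all blocks are equal, so the range is zero")
    return [(v - lowest) / (highest - lowest) for v in values]


# Made-up values for one hypothetical participant across the 5 blocks.
accuracy = [0.40, 0.46, 0.52, 0.55, 0.60]   # proportion of trials with both vowels correct
ta_amplitude = [0.5, 0.7, 1.1, 1.5, 1.3]    # microvolts at T8, 100- to 140-ms window

std_accuracy = range_standardize(accuracy)
std_ta = range_standardize(ta_amplitude)
print(std_accuracy)
print(std_ta)
```

After this step each measure spans exactly 0 to 1 for every participant, so a block by measure interaction tests differences in the shape of the learning curves rather than differences in units.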
In addition, we examined whether this relationship between ERP amplitude and behavioral improvement can also be observed in each participant by computing correlations between performance and change in ERP amplitude for each of the 16 participants and by testing, using one-sample t-tests, whether the group mean correlations differed from 0 (after Alain and others 2001). For the accuracy data, the relationship between learning rate in identifying the 2 concurrent vowels and changes in ERP amplitude varied substantially among the participants and did not differ from 0 (Ta: group mean r = 0.02, range: −0.87 to 0.99; rT350: group mean r = 0.10, range: −0.72 to 0.86). For the response time data, there was a more consistent learning-related decrease in response time that was paralleled by an increase in ERP amplitude in 13 out of 16 participants (Fig. 4). That is, larger Ta and rT350 amplitude were associated with faster response time (Ta: group mean r = −0.43, range: −0.94 to 0.24, t15 = 4.91, P < 0.001; rT350: group mean r = −0.42, range: −0.97 to 0.84, t15 = 3.23, P < 0.006). These individual correlations suggest a more direct link between behavioral changes and the ERP measures and provide further evidence supporting the claim that enhancement in ERP amplitude over the right temporal lobe underlies behavioral improvement.
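The two-step procedure described above (one correlation per participant across blocks, then a one-sample t-test of those correlations against 0) can be sketched as follows. The data are invented for 3 hypothetical participants and the helper names are assumptions; the sketch tests the raw r values, as the text describes, without any transformation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for testing mean(values) against mu."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(variance / n), n - 1


# Made-up data: (response time in ms, ERP amplitude in microvolts) over 5 blocks.
participants = [
    ([2300, 2150, 2050, 2000, 1900], [0.4, 0.7, 0.9, 1.2, 1.3]),
    ([2500, 2400, 2450, 2200, 2100], [0.2, 0.3, 0.2, 0.6, 0.8]),
    ([2100, 2000, 2050, 1950, 1900], [0.9, 0.8, 1.0, 1.1, 1.0]),
]

r_values = [pearson_r(rt, amplitude) for rt, amplitude in participants]
t_stat, df = one_sample_t(r_values)
print([round(r, 2) for r in r_values], round(t_stat, 2), df)
```

Negative correlations here mean that larger amplitudes accompany faster responses, which is the direction reported for Ta and rT350.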
The contour map for the early (130 ms) training-related enhancement in ERP amplitude (Fig. 2B) is consistent with a radial source in the right secondary auditory cortex along the lateral superior temporal surface (Wood and Wolpaw 1982; Picton and others 1999). The amplitude distribution of the late enhancement likely reflects activity from tangential generators located in the anterior portion of the right superior temporal plane (Fig. 2B). However, a generator located in the inferior portion of the prefrontal cortex could also account for the amplitude distribution.
To investigate the sources of these early and rapid changes in cortical responsiveness further, we used sLORETA to compute statistical maps from the difference wave between the first and the fourth block of trials over the 40- to 640-ms interval. (The fourth block of trials was chosen because it showed the largest practice-related changes.) These maps are derived by performing a location-wise weighting of the results of a minimum norm least-squares analysis with their estimated variances. The analysis shows that the dominant intracerebral source for the early evoked response (136 ms) was located in the right middle and lateral temporal lobe (Fig. 5). This right-lateralized effect of learning is not limited to fast perceptual learning but has also been observed following extended training over several days (Reinke and others 2003; Bosnyak and others 2004) and as a result of musical training (Shahin and others 2003). The sLORETA solution also suggested a source in the right inferior prefrontal cortex rather than in the anterior temporal lobe during the late evoked response (336 ms). A contribution of the prefrontal cortex may come as no surprise given the nature of the task, which requires participants to parse, maintain, and manipulate acoustic information. Together, these results suggest that rapid auditory perceptual learning depends on a right frontotemporal network that has been shown to play an important role in sensory memory (Alain and others 1998) as well as in tasks involving pitch comparison and retention (Zatorre and others 1994).
Effects of Extended Training
We found that rapid improvement in an auditory identification task is paralleled by neuroplastic changes in auditory sensory areas followed by changes in neural activity likely generated in the anterior portion of the right temporal lobe and/or prefrontal cortex. However, if these ERP modulations index perceptual learning, then these rapid learning-related changes in cortical responsiveness should be sensitive to prior experience with the task. To test whether or not the Ta and rT350 modulations were affected by prior task experience, we compared ERPs recorded from the same participants 1 week later (Reinke and others 2003). Half of the participants received four 35-min daily practice sessions on the double-vowel task between the 2 ERP recording sessions whereas the other half served as controls and received no practice.
An ANOVA on the participants' ability to accurately identify both vowels during the first and second ERP recording sessions yielded a main effect of session (F1,14 = 18.76, P < 0.001) and a significant group × session interaction (F1,14 = 7.63, P < 0.02). In the trained group, performance across blocks increased from 50% to 71%, whereas in untrained listeners, performance increased only from 51% to 56%. During the second recording session, neither the main effect of block nor the group × block interaction was significant. In the trained group, performance ranged from 68% to 72% across blocks, whereas in the untrained group, performance ranged from 54% to 58% across blocks; in both cases the linear trend was not significant. Lastly, the main effect of block was not significant during the training days in this relatively small sample (N = 8).
As with the accuracy data, the analyses of response time data yielded a main effect of session (F1,14 = 10.29, P < 0.01) and a significant group × session interaction (F1,14 = 7.07, P < 0.02). However, in contrast with accuracy, the analysis of response time recorded during the second ERP session yielded a main effect of block (linear trend, F1,15 = 9.81, P < 0.01), with response times decreasing with practice. The group × block interaction was not significant, F < 1, suggesting a comparable rate of learning in the trained and untrained groups. Neither the main effect of group, F < 1, nor the group × button press (RT1, RT2) interaction was significant. These results suggest that some learning took place during the second ERP recording session and that the learning rate did not differ between the trained and untrained listeners.
Figure 6 shows the group mean ERPs recorded before and after the training week over the midline frontal and right temporal cortex for the trained (Fig. 6A) and untrained (Fig. 6B) groups. For the trained group, there was a marked increase in P2 amplitude after extended training (for a detailed analysis of the effects of extended training on the sensory evoked response, see Reinke and others 2003). More importantly, for the trained group, there was no significant change in ERP amplitude over the right temporal lobe (T8) as a function of block for the 100- to 140-ms interval after training. The interaction between recording session and block was significant (linear trend, F1,7 = 9.67, P < 0.02), indicating early and rapid neuroplastic changes only prior to the weeklong training (Fig. 6A and Fig. 7). A separate ANOVA on the ERP amplitude recorded in the untrained participants in the second session revealed a significant increase as a function of block over the right temporal cortex (main effect of block, F4,28 = 3.73, P < 0.05, linear trend, F1,7 = 7.97, P < 0.05), and these effects were not different from those observed during the first recording session (Figs 6B and 7). A between-group analysis revealed a significant interaction between group and block (F1,14 = 5.31, P < 0.05), reflecting training-related changes during the second ERP recording session but only for those individuals who did not receive extended training between the first and the second ERP session. This indicates that the rapid enhancement in cortical responsiveness during the 100- to 140-ms interval is modulated by prior experience, being present only in individuals that did not have the opportunity to practice the task in the preceding days.
We also examined whether the prior exposure to the task would modulate ERP amplitude recorded over the right temporal site during the 300- to 400-ms interval (Fig. 6 and Fig. 7). Trained participants showed a small, albeit significant, increase in ERP amplitude from the beginning to the end of the second recording session (F1,7 = 10.84, P < 0.02). The interaction between recording session and block tended toward significance (linear trend, F1,7 = 4.28, P = 0.08). A separate ANOVA on the ERP amplitude recorded in the untrained participants also revealed a significant increase as a function of block during the second recording session (main effect of block, F4,28 = 6.69, P < 0.01, linear trend, F1,7 = 12.54, P < 0.01), and these effects were not different from those observed during the first recording session (days × block interaction, F1,7 < 1.0). A between-group analysis on the ERP amplitude recorded during the second recording session revealed a main effect of block (linear trend, F1,14 = 20.50, P < 0.001). The interaction between group and block tended toward significance (linear trend, F1,14 = 4.09, P = 0.063), suggesting greater within-session enhancement for those individuals who did not receive extended training between the first and the second ERP session. Overall, trained participants tended to have larger amplitude than untrained participants (F1,14 = 3.95, P = 0.067).
Experiment 2: Passive Listening
Although attention to stimuli facilitates learning, there is also evidence that mere exposure to sounds improves performance in subsequent recognition and identification tasks (Yonan and Sommers 2000; Clarke and Garrett 2004; Szpunar and others 2004). In Experiment 2, we therefore measured ERPs elicited by the same stimuli as in Experiment 1 in a new group of 16 participants, while they watched a muted subtitled movie of their choice, to assess whether passive listening is sufficient to induce rapid increases in cortical responsiveness. The mean ERP amplitude recorded across the 5 blocks of trials did not differ significantly during the 100- to 140-ms and 300- to 400-ms intervals (F1,15 < 1.10) (Fig. 8), suggesting that active listening is required to generate reliable and rapid neuroplastic changes (Recanzone and others 1993; Fritz and others 2003). These findings rule out the explanation that mere repeated exposure to sounds is the cause of the observed training-related enhancements in ERP amplitude (Sheehan and others 2005) and emphasize the role of top–down mechanisms in rapid perceptual learning. In addition to the ERP amplitude recorded over the temporal sites, we also examined whether the simple repeated exposure to these speech sounds would modulate the amplitude of the N1 and P2 waves recorded over the central scalp regions (i.e., FC1, FCz, FC2, C1, Cz, C2, CP1, CPz, CP2), where they showed the largest amplitude during passive listening. The N1 wave (∼115 ms), measured over the 95- to 135-ms interval, showed little change as a function of block (linear trend, F < 1). However, there was a significant decrease in P2 (170–210 ms) mean amplitude as a function of block (linear trend, F1,15 = 6.08, P < 0.05). This decrement in P2 amplitude may reflect long-term habituation following repeated presentation of the same stimuli (Teismann and others 2004) or repetition priming because previously heard vowels may help identify the vowels in subsequent trials.
This would be consistent with recent functional magnetic resonance imaging studies reporting decreased activity in sensory areas during repetition priming (Bergerbest and others 2004; Dobbins and others 2004).
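For readers unfamiliar with the linear trend statistics reported above, the following is a minimal sketch of how a linear trend across 5 blocks can be tested in a within-subject design. It is an illustration under stated assumptions, not the analysis pipeline used in this study: the function name `linear_trend_F`, the contrast weights, and the synthetic amplitudes are all hypothetical. Each participant's block means are collapsed into one contrast score, and the contrast scores are tested against zero; the squared t statistic equals F with 1 and n − 1 degrees of freedom (1 and 15 for 16 participants).

```python
import math
import random
from statistics import mean, stdev

# Orthogonal polynomial weights for a linear trend over 5 equally spaced blocks.
LINEAR_WEIGHTS = [-2, -1, 0, 1, 2]


def linear_trend_F(amplitudes):
    """Test a linear trend across blocks in a within-subject design.

    amplitudes: one list per participant, holding one mean amplitude per block.
    Returns (F, df1, df2), where F is the squared one-sample t statistic
    of the per-participant contrast scores.
    """
    # Collapse each participant's 5 block means into a single contrast score.
    scores = [sum(w * x for w, x in zip(LINEAR_WEIGHTS, row)) for row in amplitudes]
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t, 1, n - 1


# Synthetic data (NOT the study's data): 16 participants, 5 blocks,
# with an assumed small decrease in amplitude per block plus noise.
rng = random.Random(0)
data = [[2.0 - 0.1 * block + rng.gauss(0, 0.3) for block in range(5)]
        for _ in range(16)]

F, df1, df2 = linear_trend_F(data)
print(f"linear trend: F({df1},{df2}) = {F:.2f}")
```

A contrast-based test of this kind asks only whether amplitude changes monotonically across blocks, which is why it can detect a gradual P2 decrement even when an omnibus comparison of block means would not.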
This study demonstrates that listeners improve very quickly at parsing and identifying concurrent speech sounds. Importantly, we showed that rapid improvement in task performance is accompanied by neuroplastic changes in sensory cortex as evidenced by enhanced amplitude of sensory evoked responses. For both behavioral and ERP data, the first reliable effects of practice in distinguishing both vowels were seen after at least 2 practice blocks. The increase in Ta and rT350 amplitude over the right hemisphere was preserved only if participants continued to practice at the task. “Procedural learning,” conceptualized as a rapidly acquired familiarity with task requirements and stimuli, was therefore not sufficient for generating the changes in ERP amplitude. Furthermore, the learning-related change in Ta amplitude occurred relatively early (∼120 ms after sound onset) and was localized in auditory cortex. This finding is difficult to reconcile with a procedural learning account and seems more in line with animal research showing that rapid changes can take place in the receptive fields of sensory neurons (Edeline and Weinberger 1993; Fritz and others 2003, 2005b).
Our results emphasize the importance of attention in rapid perceptual learning by demonstrating that passive listening is not enough to produce reliable changes in auditory cortex. This is consistent with findings from recent animal studies showing changes in the receptive fields of ferret auditory cortex within minutes of beginning a tone discrimination task that was previously learned (Fritz and others 2003, 2005a, 2005b). These neuroplastic changes were smaller or absent when the animal listened passively to the same sounds. In the present study, selective attention may have sped up receptive field sharpening of neurons in auditory cortex such that spectral analysis of the vowel constituents was more precise, thereby improving vowel separation and identification. Such neuroplastic changes in receptive fields could result in a larger population of neurons responding synchronously to the task-relevant attribute, which in turn could be reflected in the amplitude of the evoked potentials recorded at the surface of the scalp.
As described earlier, when 2 vowels are played together, listeners generally hear one of the vowels as foreground (dominant) and the other as background (nondominant). The difficulty of the task resides in identifying the nondominant vowel, which depends on successfully parsing the incoming vowel mixture into its constituent parts for comparison with templates of the vowels in long-term memory. The auditory system is thought to achieve this separation by first extracting the fundamental frequency of the dominant vowel and then “subtracting” its components in order to facilitate the identification of the nondominant vowel (de Cheveigne 1999). In the present study, the increases in ERP amplitude may reflect changes in the tuning properties of auditory neurons involved in parsing concurrent speech signals. The contour maps as well as the solution from sLORETA suggest that the lateral portion of the right superior and middle temporal gyri plays an important role in learning to parse and identify concurrent speech sounds. Previous research has shown that the right temporal cortex plays an important role in processing voices (Belin and others 2000), is sensitive to acoustic cues that lead to the perception of concurrent auditory events (Hiraumi and others 2005), and may play a dominant role in stream segregation (Snyder and others 2006). In the present study, rapid improvement in speech separation and identification may recruit activity from voice-selective areas (Belin and others 2000) because vowels are produced vocally and provide important tonal information (e.g., fundamental frequency), which may help identify the speaker. Similarly, enhanced ERP amplitude may reflect better neural synchrony in neurons sensitive to those acoustic cues that lead to segregation of concurrent auditory objects.
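The “subtraction” idea mentioned above can be illustrated with a simple time-domain cancellation filter. The sketch below is a simplified illustration, not the model of de Cheveigne (1999) in full nor a description of the stimuli used here: the sample rate, the fundamental frequencies (100 and 125 Hz), the number of harmonics, and all function names are assumptions chosen so that the dominant period is a whole number of samples. Subtracting a copy of the signal delayed by one period of the dominant fundamental removes every harmonic of that fundamental, leaving a residual that comes from the nondominant sound alone.

```python
import math

FS = 8000  # sample rate in Hz (assumed for illustration)


def harmonic_complex(f0, n_harmonics, n_samples, amp=1.0):
    """Sum of equal-amplitude sine harmonics of f0, a crude stand-in for a vowel."""
    return [
        amp * sum(math.sin(2 * math.pi * f0 * h * n / FS)
                  for h in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def cancel_period(signal, period):
    """Cancellation filter y[n] = x[n] - x[n - period].

    Any component that repeats exactly every `period` samples is removed.
    The first `period` samples are dropped because they have no delayed copy.
    """
    return [signal[n] - signal[n - period] for n in range(period, len(signal))]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


# Hypothetical "dominant" and "nondominant" sounds with different fundamentals.
dominant = harmonic_complex(100.0, n_harmonics=5, n_samples=1600, amp=1.0)
nondominant = harmonic_complex(125.0, n_harmonics=3, n_samples=1600, amp=0.5)
mixture = [a + b for a, b in zip(dominant, nondominant)]

period = round(FS / 100.0)  # period of the dominant fundamental, in samples
residual_dominant = cancel_period(dominant, period)
residual_mixture = cancel_period(mixture, period)

print(f"dominant alone after cancellation: rms = {rms(residual_dominant):.2e}")
print(f"mixture after cancellation:        rms = {rms(residual_mixture):.3f}")
```

In this toy case the dominant sound is cancelled almost perfectly, whereas the mixture leaves a clear residual. The filter also reshapes the spectrum of the nondominant sound, so a residual of this kind would still need to be matched against stored vowel templates.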
The contour maps as well as the solution from sLORETA also suggest a source in right inferior prefrontal cortex. Although we cannot exclude a possible contribution from a source in the anterior portion of the right temporal lobe, a right prefrontal cortex generator may be more plausible given that the task used in the present study likely involves a comparison between the incoming sounds and the representation of the vowels in working memory. This is consistent with previous studies showing that the right prefrontal cortex plays an important role in auditory memory (Alain and others 1998; Doeller and others 2003; Schall and others 2003), as well as in processing and remembering speech (Tulving and others 1994; Buckner and others 1996) and musical sentences (Zatorre and others 1994). The putative enhanced activity in prefrontal cortex and auditory areas may also index a frontotemporal network involved in encoding fine acoustic details in sensory memory (Alain and others 1998). Such a model could account for the enhanced MMN amplitude that was previously shown to occur within the first testing session (Atienza and others 2002; Gottselig and others 2004).
The Ta and rT350 training-related enhancements might index increased neural network efficiency during the segregation and identification of concurrently presented vowels. These modulations are consistent with previous studies suggesting that vowel segregation and identification involve a widely distributed neural network that includes the thalamus, primary auditory cortex, and the planum temporale (Alain, Reinke, He, and others 2005; Alain, Reinke, McDonald, and others 2005). The neuroplastic changes reported here differ from short- and long-term habituation effects, which are usually associated with decreased rather than increased ERP amplitude (Näätänen and Picton 1987). Rather, the increased ERP amplitude observed here might reflect enhanced synchrony of neurons tuned to frequencies within the same vowel and/or rapid adjustment of receptive field tuning to maximize the separation of concurrent sounds. Alternatively, the enhanced amplitude could also reflect recruitment of larger populations of neurons associated with learning a novel task. Further research is needed to clarify the relation between neuroplastic changes in scalp-recorded data and the underlying changes in the neural substrates.
The rapid neuroplastic changes observed in the present study differed from those reported following extended daily practice sessions. For instance, the P2 amplitude showed little change within a testing day but rather exhibited enhanced amplitude between recording sessions. This indicates that the P2 effects indexed a relatively slow learning process that may depend on consolidation over several days. Although the early Ta amplitude enhancement showed some similarity in terms of latency, amplitude distribution, and source location with the N1c reported in other studies, it also differed in some respects. Whereas the N1c amplitude continued to grow over 15 training sessions (Bosnyak and others 2004), the intrasession enhancement in Ta amplitude was preserved only if practice was continued. However, if practice was discontinued for a week, then the intrasession enhancement reappeared. Together, these findings suggest that the right auditory cortex, and in particular the lateral portion of the superior temporal lobe, plays an important role in both fast and slow auditory perceptual learning. Further research is needed, however, to clarify the link between the neuroplastic changes observed within the first training day and those observed in the subsequent daily practice sessions.
Prior imaging research using positron emission tomography has shown that associative learning in humans can be paralleled by changes in sensory cortex (Molchan and others 1994; Schreurs and others 1997). Our findings extend those from earlier studies by showing that early and rapid increases in sensory evoked response amplitude can also be observed during auditory perceptual learning. These neural events reflect rapid neuroplastic changes in processing and segregating concurrent vowels, at which participants become more proficient with practice. We suggest that the early enhancement in ERP amplitude recorded over the right temporal lobe indicates changes in auditory cortical receptive fields, which can occur within a few minutes (Edeline and others 1993; Weinberger 2004), whereas the subsequent modulation in ERP amplitude may be related to increased cognitive efficiency in comparing the double-vowel constituents with stored representations. Previous research has shown that listeners can quickly adapt to acoustic and phonetic deviations from native speech (Clarke and Garrett 2004), emphasizing the flexibility and adaptability of the human speech processing system. Our findings further highlight the flexibility of the auditory system in processing speech sounds and reveal a striking degree of plasticity in adult auditory cortex that occurs rapidly to assist in processing vowels occurring simultaneously. Further research is needed to explore the characteristics of this remarkable enhancement in cortical activity and to uncover its boundary conditions. For instance, what are the consequences of rapid learning in parsing concurrent sounds for long-term linguistic representation? What are the effects of age and hearing impairment on these early and rapid changes in cortical responsiveness? Given age-related decline in parsing concurrent vowels (Snyder and Alain 2005), would older adults show rapid improvement in performance paralleled by neuroplastic changes in auditory cortex? Answers to these and related questions will advance our knowledge about learning and cortical plasticity and may have important implications for rehabilitation.
This work was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada, and the McDonnell Foundation. We wish to thank B. J. Dyson, E. E. Hannon, T. W. Picton, B. Ross, and A. Shahin for their comments on the manuscript and valuable discussion. We are particularly indebted to Peter Assmann and Quentin Summerfield for providing the vowel stimuli. Conflict of Interest: None declared.