Abstract

Speech contains prosodic cues such as pauses between different phrases of a sentence. These intonational phrase boundaries (IPBs) elicit a specific component in event-related brain potential studies, the so-called closure positive shift. The aim of the present functional magnetic resonance imaging study is to identify the neural correlates of this prosody-related component in sentences containing segmental and prosodic information (natural speech) and hummed sentences only containing prosodic information. Sentences with 2 IPBs both in normal and hummed speech activated the middle superior temporal gyrus, the rolandic operculum, and the gyrus of Heschl more strongly than sentences with 1 IPB. The results from a region of interest analysis of auditory cortex and auditory association areas suggest that the posterior rolandic operculum, in particular, supports the processing of prosodic information. A comparison of natural speech and hummed sentences revealed a number of left-hemispheric areas within the temporal lobe as well as in the frontal and parietal lobe that were activated more strongly for natural speech than for hummed sentences. These areas constitute the neural network for the processing of natural speech. The finding that no area was activated more strongly for hummed sentences compared with natural speech suggests that prosody is an integrated part of natural speech.

Introduction

The speech melody of an utterance can carry information that is critically important to understand the meaning of a sentence (see, for a review, Friederici and Alter 2004; Frazier et al. 2006). In intonational languages such as German, Dutch, and English, prosodic information on sentence level is mainly conveyed, among others, by the pitch contour of an utterance and the presence of speech pauses. Sentences usually contain one or more major intonational phrases (IPhs; Selkirk 1995) that can be separated by speech pauses. Syntactically relevant speech breaks are also referred to as intonational phrase boundaries (IPBs). In studies using an event-related brain potential (ERP) paradigm, IPBs were observed to give rise to a positive shift in the electroencephalographic (EEG) signal that is referred to as the closure positive shift or CPS component (Steinhauer et al. 1999). This component has been interpreted as being specifically related to the prosodic information contained in IPBs, as it has been observed using sentence materials that lacked semantic and syntactic information, such as pseudoword sentences (Pannekamp et al. 2005), filtered speech materials (Steinhauer and Friederici 2001), or hummed speech (Pannekamp et al. 2005). The present functional magnetic resonance imaging (fMRI) study attempts to identify the brain regions that are involved in the processing of IPBs.

IPBs are often employed by the speaker to clarify the structure of an otherwise syntactically ambiguous sentence. A sentence like “Before Ben starts # the day dreaming has to stop” has a different meaning than “Before Ben starts the day # dreaming has to stop” (# indicating a break). IPBs often separate major IPhs and correspond to major syntactic boundaries (Cooper and Paccia-Cooper 1981). They represent a high level in the so-called prosodic hierarchy (Nespor and Vogel 1983; Selkirk 1995, 2000). In psycholinguistic experiments, IPBs were shown to help resolve late closure ambiguities (Grabe et al. 1994; Schafer et al. 2000).

Experimental evidence suggests that humans can make use of the prosodic information contained in IPBs, that is, in the absence of semantic or syntactic information. Behaviorally, it has been shown that listeners are able to detect major prosodic boundaries in meaningless speech materials, such as, for example, reiterant speech (i.e., a sentence spoken as a string of repeated syllables while preserving the original prosodic contour) (de Rooij 1976), spectrally scrambled and low-pass–filtered speech (de Rooij 1975; Kreiman 1982), and hummed sentences (t'Hart and Cohen 1990). It should be noted that the rationale for using stimuli of this kind is based on the assumption that speech is separable into layers, such as semantics, syntax, and prosody. This assumption, however, may only be an approximation as it has been argued that the possibility to separate these layers is inherently limited (Searle 1969; Austin 1975).

In studies using an ERP paradigm, IPBs were observed to give rise to a positive shift in the EEG signal that is referred to as the CPS component (Steinhauer et al. 1999). Subsequent studies indicated that the CPS is specifically related to the prosodic aspects of an IPB, as it was also observed for sentence materials that were stripped of semantic information, such as pseudoword sentences, and for sentence materials with reduced or absent segmental information, such as hummed sentences (Pannekamp et al. 2005), and filtered speech materials (Steinhauer and Friederici 2001). The CPS component is typically distributed bilaterally with a central maximum with a shift to the right for hummed sentences. However, due to the intrinsic difficulty of source localization in EEG studies, it is unclear which brain regions generate the CPS component.

To our knowledge, only one imaging study has so far investigated the processing of IPBs in speech (Strelnikov et al. 2006). Strelnikov et al. compared sentence materials in Russian that contained an IPB (e.g., “To chop not # to saw,” meaning one should not chop but saw) with sentences that did not contain an IPB (“Father bought him a coat”). When sentences with an IPB (segmented) were compared with sentences without an IPB (not segmented), stronger activation was observed within the right posterior prefrontal cortex and an area within the right cerebellum. In the reverse comparison, stronger activation was observed in the gyrus of Heschl, bilaterally, and the left sylvian sulcus. However, the 2 types of sentence materials were used in 2 different tasks, thus confounding stimulus type and task. Although the comparison between segmented and nonsegmented speech materials very likely yielded brain areas that are relevant for prosody processing, it cannot be excluded that differences due to the tasks contributed to the results.

It should be noted that the term “activation” suggests an absolute value, although it is, in the case of fMRI data, always relative. This is due to 2 reasons. First, the statistical analysis only evaluates differences between conditions. Second, it is generally difficult to define an absolute baseline in brain activation for physiological reasons (Stark and Squire 2001; Tomasi et al. 2006). In the present article, the term “activation” is used only when an experimental condition is compared with a low-level baseline. For comparisons between conditions, either the term “activation difference” is used or the term “activation” is combined with an adjective indicating the direction of the difference (e.g., “stronger”).

Ideally, a study investigating the processing of IPBs should compare conditions that do not differ in any other respect than the presence or absence of IPBs, keeping everything else constant. In the electrophysiological studies reviewed above (e.g., Steinhauer et al. 1999), the same task was used on sentence materials that either had 1 or 2 IPBs. The sentence materials used in the electrophysiological studies reviewed above were carefully constructed as sentence pairs with the same or similar words. It should be noted that the IPBs in these sentences were obligatory, entailing differences with regard to the syntactic and semantic structure between the 2 sentences of such a pair. However, these differences only play a role when the sentence materials are presented naturally spoken but not when their segmental content is removed by filtering or humming.

Hummed speech has the advantage that it preserves the prosody of natural speech while removing major aspects of semantic and syntactic information of the utterance. Human speakers have been shown to be able to selectively preserve the original prosodic contour of an utterance. When asked to produce reiterant speech (i.e., selectively preserving the prosodic contour of a meaningful utterance by repeating a syllable), the resultant utterance preserved the prosodic aspects from normal speech, such as duration and pitch (Larkey 1983), as well as accentuation and boundary marking (Rilliard and Aubergé 1998). When utterances are hummed, the hummed version of an utterance has also been shown to preserve pitch contour and duration of the natural spoken version (Pannekamp et al. 2005). Most importantly, the CPS component Pannekamp et al. observed at IPBs within hummed speech materials was similar to the CPS observed for natural speech, but more lateralized to the right hemisphere. Hummed speech has the additional advantage that it is a familiar human vocalization. Different from speech materials that are rendered unintelligible artificially and sound more unfamiliar (Scott et al. 2000; Meyer et al. 2004), the known unintelligibility of humming effectively prevents participants from any attempt to decipher the original speech content of the signal.

In the present fMRI study, sentence pairs containing 1 or 2 IPBs were presented auditorily to the participants as natural speech and as hummed sentences. Similar to previous electrophysiological studies (Steinhauer et al. 1999; Isel et al. 2005; Pannekamp et al. 2005), the sentence pairs were constructed using the same or equivalent content words, except for 1 or 2 critical words. All sentences were meaningful, syntactically correct, and spoken with natural prosody. To ensure variability in the sentence materials with regard to the position of the additional IPB, 2 types of sentence pairs were constructed, one type with the additional IPB at an early position within the sentence (type A), the other type with the additional IPB at a later position in the sentence (type B; see Table 1 for examples of the sentence materials). Similar to previous electrophysiological studies, the IPBs contained in the sentences were obligatory. We used materials with obligatory IPBs because such materials had been investigated in previous electrophysiological studies. A CPS had been observed for sentences of type A (Steinhauer et al. 1999; Pannekamp et al. 2005) as well as for coordination structures like type B (Steinhauer 2003). In the case of the naturally spoken sentences with obligatory IPBs, observed activation differences might in part be due to the additional semantic and syntactic differences between the sentences, rather than being solely due to the presence of IPBs. To differentiate the processing of prosody from associated syntactic and semantic processing, a hummed version of each sentence was also produced by a trained speaker who was instructed to preserve the natural prosody of the original sentence.

Table 1

Example of the stimulus materials and mean length of the sentences in seconds

| Condition | Sentence (naturally spoken and hummed) | Natural (s) | Hummed (s) |
|---|---|---|---|
| Type A, 1 IPB | Peter verspricht Anna zu arbeiten # und das Büro zu putzen (Peter promises Anna to work and to clean the office) | 4.42 | 4.64 |
| Type A, 2 IPBs | Peter verspricht # Anna zu entlasten # und das Büro zu putzen (Peter promises to support Anna and to clean the office) | 4.74 | 5.06 |
| Type B, 1 IPB | Otto bringt Fleisch, # Ute und Georg kaufen Salat und Säfte für das Grillfest (Otto contributes meat, Ute and Georg buy salad and soft drinks to the barbecue) | 5.73 | 5.76 |
| Type B, 2 IPBs | Otto bringt Fleisch, # Ute kauft Salat # und Georg kauft Säfte für das Grillfest (Otto contributes meat, Ute buys salad and Georg buys soft drinks to the barbecue) | 6.12 | 5.85 |

The aim of the present study was to identify the neural structures involved in the processing of sentence-level prosody by comparing sentences with 2 IPBs to sentences that have only 1 IPB. To differentiate prosodic processing from syntactic and semantic processing, differences between sentences with a different number of IPBs were investigated separately for natural speech and hummed sentences.

With regard to the neural correlates of IPB processing, we hypothesized that the primary auditory cortices and the auditory association areas play an important role. These areas are, among others, involved in the processing of complex auditory signals and speech (see for a review, Griffiths et al. 2004). The superior temporal gyrus (STG) has been observed to be involved in processing of prosody (Doherty et al. 2004; Hesling et al. 2005). Activation in regions outside the temporal lobe has also been reported but appears to vary across different studies. These activations were observed to depend on the specifics of the task (Plante et al. 2002; Tong et al. 2005), the type of prosody involved (e.g., affective vs. linguistic, Wildgruber et al. 2004), and the degree of propositional information contained in the stimulus materials (Gandour et al. 2002, 2004; Hesling et al. 2005; Tong et al. 2005). It is possible that areas outside the temporal lobe are also involved and that the auditory association areas are only part of a more extended processing network. IPBs are realized by variations in the prosody of an utterance. We therefore hypothesized that the STG, among others, will show a modulation, positive or negative, due to the presence of an additional IPB.

Furthermore, given the observed shift of the CPS from a bilateral distribution for natural speech to a more right-hemispheric distribution for hummed sentences, a more right-hemispheric lateralization for prosodic processing may be observed. Evidence from patients with brain lesions inspired a first raw hypothesis about prosody processing, namely, that the right hemisphere plays a dominant role in prosody processing. Patients with lesions within the left hemisphere often suffer from aphasia, whereas nonaphasic patients with right-hemispheric damage seem to have difficulties to perceive or produce the prosodic aspects of speech (see, for a review, Wong 2002; but see Perkins et al. 1996). On the basis of this first raw hypothesis, 2 more sophisticated classes of hypotheses have been developed. According to the acoustic or cue-dependent class of hypotheses, hemispheric dominance is determined solely by the acoustic properties of the auditory signal. Zatorre and Belin (2001), for example, suggested that the left hemisphere is specialized in processing the high-frequency components that generate the vowels and consonants in speech, whereas the right hemisphere is specialized in processing the low-frequency patterns that make up the intonational contour of a syllable or sentence (for a similar view, see Poeppel 2003). According to the class of functional or task-dependent hypotheses, hemispheric dominance depends on the function of prosodic information (van Lancker 1980) or on the attentional focus required, for example, by an experimental task. A shift from the right hemisphere to the left is assumed to occur when the task or function of the prosodic information contained in a speech signal involves language processing rather than prosody processing. A review of the available empirical studies suggests that lateralization is stimulus dependent but can, in addition, vary as a function of task (Friederici and Alter 2004). 
It should be noted, however, that the first raw hypothesis of right-hemispheric dominance of prosody is still under debate. If prosodic processes are mainly subserved by the right hemisphere, the right hemisphere should be activated for both natural and hummed speech. If, moreover, prosody is represented neuronally as a separate layer, a direct comparison of natural speech with hummed speech should reveal more left-hemispheric activation for natural speech.

Methods

Participants

Sixteen healthy, right-handed young adults (8 female; mean age: 26.1 years, standard deviation [SD] = 4.44) took part in the experiment. The data of 2 participants were discarded from the analysis, one because of scanner malfunction, the other because of too many errors. All participants were or had been students of the University of Leipzig. They were native speakers of German with normal hearing and had no history of neurological or psychiatric illness. Volunteers were paid for their cooperation and had given written consent before the experiment. Ethical approval for the present study was provided by the University of Leipzig.

Materials

Forty-eight German sentence pairs were constructed, which either had 1 or 2 IPBs. Ideally, the materials used should only differ with regard to the number of IPBs they contain. If possible, they should consist of the same words to ensure similar lexical retrieval processes. They should also contain the same number of words and syllables, so that they do not differ in length. To ensure some variability in the sentence materials with regard to the position of the additional IPB, 2 types of sentence pairs were constructed. One type (A) had an early additional phrase boundary and one type (B) had an additional phrase boundary at a later position in the sentence (see Table 1 for examples of the sentence materials). The IPBs contained in the sentences were obligatory because such materials had been investigated in previous electrophysiological studies. Sentences of type A had been used in earlier electrophysiological studies where they were observed to elicit the CPS component (Steinhauer et al. 1999). These materials were also found to elicit the CPS component when they were low-pass filtered (Steinhauer and Friederici 2001) or hummed (Pannekamp et al. 2005). In addition, coordination structures like the type B sentences used here have also been investigated in previous electrophysiological studies (Steinhauer 2003), and a CPS was observed. The pairs of 1 and 2 IPB sentences of types A and B were as similar as possible with regard to the words used. In type B, the same content words were used. In type A, the same content words were also used, with the exception of the verb, which was either transitive or intransitive. To ensure comparability, the frequency and number of syllables of the verb were matched within a type A sentence pair. We constructed these sentence materials with the aim of creating conditions that do not differ by more than the aspect under scrutiny, namely the number of IPBs.
It should be noted that the sentence materials constructed for this experiment come close to this ideal but are not perfect. The type A and type B sentences with 1 or 2 IPBs additionally differ from each other with regard to their syntactic structure and, in the case of type B, their word order. One way to circumvent these differences would have been to choose sentence materials with optional IPBs rather than obligatory IPBs as in the materials used here. Although obligatory IPBs correspond to major syntactic boundaries, optional IPBs are on a lower level of the prosodic hierarchy, namely the level of phonological phrases (Selkirk 1995). They do not necessarily correspond to syntactic phrases (Truckenbrodt 2005). Furthermore, optional IPBs depend on several factors such as speech rate and hesitations (filled or unfilled pauses), which might make them more difficult to detect for the listener. Although it is highly probable that optional IPBs also elicit a CPS component, this has not yet been investigated. Our choice of materials was primarily motivated by the aim of ensuring comparability to previous electrophysiological studies where a CPS component was observed. Although not optimal, we think that our materials are sufficiently comparable to allow inferences on prosodic processing.

The sentences were spoken with natural prosody. Additionally, a hummed version of each sentence was recorded. All materials were recorded from a trained female native speaker of German. The speaker was instructed to produce a hummed version of each sentence after its naturally spoken version and to take care to preserve the original prosody, speed, and total number of syllables of the normal sentence. In Figure 1, spectrograms and fundamental frequency contours are given for type A and type B example sentences (hummed and naturally spoken). The recorded materials were digitized (44.1 kHz) and normalized (70%) with regard to amplitude envelope to ensure an equal volume for all stimuli.

Figure 1.

Spectrograms (0–5000 Hz) and fundamental frequency contours (pitch: 0–500 Hz) for sample stimuli of the sentence materials used for all conditions of the experiment. Natural speech is presented on the left, the respective hummed version of the sentence on the right. Spectrograms and pitch contours are given for each pair (1 and 2 IPBs) of sentence materials, for type A sentences in the top half, for type B sentences in the bottom half.

Figure 2.

Trial timing scheme.

Design and Procedure

The task of the participants was to indicate in a 2-alternative forced-choice task whether a probe word presented after the sentence had been contained in the sentence or not. Probe words were chosen for each sentence and belonged to different word classes (e.g., nouns, verbs, determiners). The probes were spoken with final rising pitch, indicating a question. Of the 96 sentences in total, 72 were selected as experimental materials, 18 as probe sentences, and 6 as practice sentences. In each trial, participants had to decide whether the probe word had occurred in the sentence or not. “Yes” (“no”) answers were given with the middle (index) finger of the right hand. Sentences from the experimental materials never contained the probe. The probe sentences were not analyzed. In the case of the naturally spoken probe sentences, the probe word was always contained in the sentence and a “yes” answer was expected. In the case of the hummed experimental sentence materials, the task was very easy. As soon as the hummed sentence presentation finished, participants knew that “no” had to be the answer. The main purpose of the task in the case of the hummed sentence materials was to have participants attend to the presentation of the sentence while it lasted. In the hummed probe sentences, one word of the hummed sentence was naturally spoken. To prevent that the participants suspended attention as soon as they detected a naturally spoken word in the hummed probe sentence, the probe word was identical to the naturally spoken word within the probe sentence in only half of the trials. This ensured that participants had to wait until the presentation of the probe word.

Naturally spoken and hummed sentence materials were presented in alternating blocks. Every block consisted of 2 practice sentences, 24 experimental sentences, 6 probe sentences, and 6 null events (i.e., trials in which no auditory stimulus was presented). The null events were included to increase the efficiency of the design (Liu et al. 2001) and to provide a baseline condition for analysis. The blocks of hummed materials were of the same structure. The total experiment consisted of 6 blocks (3 hummed, 3 normal), that is, 228 trials in total. Trials were randomized with first-order transition probabilities between conditions held constant. Three randomizations were generated in total. Each trial began with the presentation of a hummed or normal sentence. Then a probe word was presented 500 ms after the offset of the sentence. After the presentation of the probe, a waiting time was inserted to ensure a total intertrial interval (ITI) of 11 s (see Fig. 2). With a repetition time (TR) of 3 s and an ITI of 11 s, trial presentation and scanner triggering were synchronous every 3 trials. To ensure synchronization, the beginning of every third trial was synchronized using the scanner trigger. After synchronization, a random waiting time of 0–600 ms was inserted before the presentation of the sentence. Stimulus presentation and response time recording were controlled by a computer outside the scanner using Presentation software (Neurobehavioral Systems Inc., Albany, CA). The experimental materials were presented over earphones within the scanner. The participants were additionally protected from scanner noise by earplugs. They reported that they had no difficulties understanding the sentences over the scanner noise. A button box compatible with the scanner environment was used to record the responses of the participants.
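The trial counts and the synchronization interval stated above follow from simple arithmetic, which the following minimal sketch reproduces. The function and variable names are illustrative and do not stem from the original experiment scripts.

```python
from math import gcd

TR_MS = 3000    # repetition time (from the text)
ITI_MS = 11000  # intertrial interval (from the text)

def trials_until_sync(iti_ms, tr_ms):
    """Number of trials after which trial onset and scanner trigger coincide again."""
    lcm_ms = iti_ms * tr_ms // gcd(iti_ms, tr_ms)  # least common multiple of both periods
    return lcm_ms // iti_ms

# Block structure from the text: practice, experimental, probe, and null trials.
trials_per_block = 2 + 24 + 6 + 6
total_trials = 6 * trials_per_block

print(trials_until_sync(ITI_MS, TR_MS))  # 3 trials, i.e., 33 s or 11 volumes
print(total_trials)                      # 228
```

Because 11 s and 3 s share no common factor, onsets and triggers realign only every 33 s, which is why every third trial could be locked to the scanner trigger.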

fMRI Data Acquisition

All imaging data were acquired using a 3T Bruker 30/100 Medspec MR scanner. A high-resolution structural scan using the 3-dimensional modified driven equilibrium Fourier transform imaging technique was also obtained (128 sagittal slices, 1.5 mm thickness, in-plane resolution: 0.98 × 0.98 mm, field of view [FOV]: 250 mm). Functional images were acquired using a T2* gradient echo-planar imaging sequence. For the functional measurement, a noise-reduced sequence was chosen to ensure that noise would not impair comprehension. Eighteen ascending axial slices per volume were continuously obtained, in the plane of the anterior and posterior commissure (in-plane resolution of 3 × 3 mm, FOV: 192 mm, 5 mm thickness, 1 mm gap, echo time: 40 ms, TR: 3000 ms). The 18 slices covered the temporal lobes and part of the parietal lobes and cerebellum. According to a short questionnaire given to the participants after the experiment, the noise level of the scanner did not critically impair the auditory quality and comprehensibility of the naturally spoken or the hummed stimulus materials.
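As a rough plausibility check of the functional acquisition geometry, the in-plane matrix size and the extent of the slice stack can be derived from the reported parameters. The sketch below assumes that gaps occur only between adjacent slices; the helper names are illustrative.

```python
# Values taken from the acquisition description above.
FOV_MM = 192.0
IN_PLANE_MM = 3.0
N_SLICES = 18
SLICE_THICKNESS_MM = 5.0
SLICE_GAP_MM = 1.0

def matrix_size(fov_mm, voxel_mm):
    """Number of voxels along one in-plane axis."""
    return int(round(fov_mm / voxel_mm))

def slab_coverage_mm(n_slices, thickness_mm, gap_mm):
    """Extent of the slice stack, counting gaps only between slices (assumption)."""
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm

print(matrix_size(FOV_MM, IN_PLANE_MM))                                # 64
print(slab_coverage_mm(N_SLICES, SLICE_THICKNESS_MM, SLICE_GAP_MM))    # 107.0
```

A stack of about 107 mm is consistent with the statement that the slices covered the temporal lobes and only part of the parietal lobes and cerebellum.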

Functional Data Analysis

fMRI data analysis was performed with statistical parametric mapping (SPM2) software (Wellcome Department of Cognitive Neurology, London, UK). The 6 blocks were measured as separate runs with 142 volumes each. The first 2 images of each functional run were discarded to ensure signal stabilization. The functional data of each participant were motion corrected. The structural image of each participant was registered to the time series of functional images and normalized using the T1 template provided by SPM2, corresponding approximately to Talairach and Tournoux space (Talairach and Tournoux 1988). The functional images were normalized using the normalization parameters of the structural image and then smoothed with a full-width half-maximum Gaussian kernel of 12 mm. A statistical analysis on the basis of the general linear model was performed, as implemented in SPM2. Though hummed and normal sentence materials were presented in blocks, an event-related analysis was chosen. This made it possible to compare the sentences to the null events interspersed within the blocks as a baseline, as well as to discard probe trials and error trials from the analysis. The delta function of the trial onsets per condition was convolved with the canonical form of the hemodynamic response function as given in SPM2 and its first and second temporal derivative to generate model time courses for the different conditions. Each trial was modeled in SPM2 using the beginning of the sentence as trial onset. Due to the length of the auditorily presented sentence stimuli, the blood oxygen level–dependent response was modeled with a duration of 5.345 s (average length of the sentences, hummed and natural). Errors and probe sentences were modeled as separate conditions and excluded from the contrasts calculated. The functional time series was high-pass filtered with a frequency cutoff of 1/80 Hz. No global normalization was used. 
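The construction of a model time course can be illustrated by the following sketch, which convolves a 5.345-s boxcar at each trial onset with a double-gamma response function and samples the result once per TR. It is not the SPM2 implementation: the response function parameters (peak shape 6, undershoot shape 16, ratio 1/6), the 0.1-s grid, and all function names are assumptions chosen for illustration, and the temporal derivatives are omitted.

```python
import math

TR = 3.0          # repetition time in seconds (from the text)
DURATION = 5.345  # modeled event duration in seconds (from the text)
DT = 0.1          # fine time grid in seconds (assumption)

def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma response function; parameter values are assumptions."""
    if t < 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit scale.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)

def build_regressor(onsets, n_scans):
    """Convolve event boxcars with the response function and sample once per TR."""
    n_fine = int(round(n_scans * TR / DT))
    boxcar = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / DT))
        stop = int(round((onset + DURATION) / DT))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0

    kernel = [double_gamma_hrf(k * DT) for k in range(int(round(32.0 / DT)))]
    fine = [0.0] * n_fine
    for i, value in enumerate(boxcar):
        if value == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k >= n_fine:
                break
            fine[i + k] += value * h * DT  # discrete approximation of the convolution

    step = int(round(TR / DT))
    return [fine[scan * step] for scan in range(n_scans)]

# Two hypothetical trials, 11 s apart, in a run of 10 volumes.
regressor = build_regressor([0.0, 11.0], 10)
print([round(x, 3) for x in regressor])
```

One such regressor per condition, together with the nuisance regressors, would form the columns of the design matrix.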
Motion parameters and the lengths of the individual sentences per trial were entered into the analysis as parameters of no interest. For the random-effects analysis, the calculated contrast images from the first-level analysis of each participant were entered into a second-level analysis.

Region of interest (ROI) analysis. Eight ROIs were created on the basis of the automated anatomical labeling (AAL) atlas (Tzourio-Mazoyer et al. 2002): the STG, the rolandic operculum, the gyrus of Heschl, and the inferior frontal gyrus (pars triangularis and pars opercularis). The ROI for the STG was divided into 3 parts, anterior (y > −15), middle (−35 < y < −15), and posterior (y < −35), and the rolandic operculum into 2 parts, anterior (y > −15) and posterior (y < −15). A visualization of the temporal ROIs is given in Figure 3. The ROI analysis was performed by averaging the effect sizes (contrast estimates) over all voxels for the left and right ROIs per participant.
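The subdivision of the STG along the y-axis and the averaging of contrast estimates per ROI can be sketched as follows. The data layout (pairs of y-coordinate and contrast estimate) and the function names are hypothetical, and the assignment of voxels lying exactly on a boundary (y = −15 or y = −35) is an assumption, as the text does not specify it.

```python
def stg_subregion(y):
    """Assign an STG voxel to a subregion by its y-coordinate in mm."""
    if y > -15:
        return "anterior"
    if y > -35:
        return "middle"   # a voxel at exactly y = -15 is counted as middle (assumption)
    return "posterior"    # a voxel at exactly y = -35 is counted as posterior (assumption)

def roi_means(voxels):
    """Average contrast estimates over all voxels of each subregion.

    `voxels` is a list of (y, contrast_estimate) pairs for one participant
    and one hemisphere (hypothetical data layout).
    """
    sums, counts = {}, {}
    for y, estimate in voxels:
        region = stg_subregion(y)
        sums[region] = sums.get(region, 0.0) + estimate
        counts[region] = counts.get(region, 0) + 1
    return {region: sums[region] / counts[region] for region in sums}

# Made-up example values, for illustration only.
example = [(-10, 1.0), (-12, 3.0), (-20, 2.0), (-40, 4.0)]
print(roi_means(example))  # {'anterior': 2.0, 'middle': 2.0, 'posterior': 4.0}
```

The resulting per-participant means for the left and right ROIs would then serve as the dependent variable in the statistical comparison between conditions.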

Figure 3.

A visualization of the 6 temporal ROIs as an ascending series of axial slices (shown only for the left hemisphere). ROIs: anterior (y > −15, light blue), middle (−15 > y > −35, yellow), and posterior (y < −35, brown) part of the STG, gyrus of Heschl (dark blue), anterior (y > −35, medium blue), and posterior (y < −35, orange) part of the Rolandic operculum.

Results

Behavioral

Participants were excluded from further analysis if they made more than 30% errors in any condition. This led to the exclusion of one participant in addition to the one participant who was excluded due to scanner malfunction. Analyzing the data of the remaining 14 participants yielded 144 errors (5.7%) in total. Response times were measured from the presentation of the probe word. Only correct responses to experimental trials, not probe trials, within the interval of 200–2000 ms were analyzed. Response times were entered into a repeated-measures analysis of variance (ANOVA) with speech type (natural vs. hummed) and IPB (1 vs. 2 IPBs) as factors. Participants reacted significantly faster to the hummed (473 ms, SD = 46.05) than to the naturally spoken sentence materials (753 ms, SD = 85.36), yielding a significant main effect of speech type (F(1,13) = 30.64, mean squared error [MSE] = 35691, P < 0.001). The number of IPBs did not influence response times; the main effect of IPB and the interaction speech type × IPB were not significant.
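The exclusion criterion and the selection of response times described above can be sketched as follows. The trial records, field names, and condition labels are hypothetical, and treating the 200–2000 ms interval as inclusive is an assumption. The error percentage is reproduced under the further assumption that practice trials and null events were not scored, leaving 6 × (24 + 6) = 180 scored trials per participant.

```python
RT_MIN_MS = 200        # lower bound of the analyzed interval (from the text)
RT_MAX_MS = 2000       # upper bound of the analyzed interval (from the text)
MAX_ERROR_RATE = 0.30  # exclusion criterion per condition (from the text)

def should_exclude(error_rates):
    """True if the error rate exceeds the criterion in any condition."""
    return any(rate > MAX_ERROR_RATE for rate in error_rates.values())

def valid_response_times(trials):
    """Keep correct responses to experimental trials within the RT interval."""
    return [
        trial["rt_ms"]
        for trial in trials
        if trial["kind"] == "experimental"
        and trial["correct"]
        and RT_MIN_MS <= trial["rt_ms"] <= RT_MAX_MS  # inclusive bounds (assumption)
    ]

# Made-up trial records, for illustration only.
trials = [
    {"kind": "experimental", "correct": True, "rt_ms": 450},
    {"kind": "experimental", "correct": False, "rt_ms": 500},
    {"kind": "probe", "correct": True, "rt_ms": 600},
    {"kind": "experimental", "correct": True, "rt_ms": 2500},
    {"kind": "experimental", "correct": True, "rt_ms": 700},
]
print(valid_response_times(trials))  # [450, 700]

# Overall error percentage: 144 errors over 14 participants x 180 scored trials.
scored_trials = 14 * 6 * (24 + 6)
print(round(100 * 144 / scored_trials, 1))  # 5.7
```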

fMRI Data: Whole-Brain Analysis

Comparisons of Sentence Materials with a Different Number of IPBs

For natural speech, sentences with 2 IPBs activated the STG, extending to the rolandic operculum and gyrus of Heschl, bilaterally, more strongly than sentences with 1 IPB (Fig. 4a and Table 2). In the corresponding comparison for hummed sentences, a significant cluster was observed in the left supramarginal gyrus, extending to the left STG and the left gyrus of Heschl. No significant clusters were observed in the reverse comparisons, for natural speech as well as for hummed sentences.

Figure 4.

(a) Comparison of sentences with 2 IPBs to sentences with 1 IPB: 2 IPBs > 1 IPB. Natural speech (left) and hummed sentences (right). (b) Comparison of speech types: natural speech > hummed sentences. Threshold: P < 0.001, uncorrected, showing only clusters with more than 40 voxels, corresponding to a P < 0.05 corrected on cluster level. Left is left in the image.

Table 2

Contrasts are thresholded at P < 0.001 uncorrected and P < 0.05 corrected on cluster level

Side  x y z k Z  x y z k Z 
 Natural speech > baseline Hummed sentences > baseline 
Frontal         
    Left       PrecGa −36 −12 66 4928 4.78 
    Right SMA 15 54 7897 5.36       
    Right Insulaa 36 24 7897 4.77 SMAa −3 66 4928 5.81 
    Right PrecGa 30 −6 30 7897 4.63 MFGa 27 36 30 4928 4.42 
    Right PrecGa 54 48 7897 4.09 PrecG 51 −3 4928 6.40 
Temporal           
    Left STG −60 −24 7897 6.25 STGa −54 −6 −6 4928 5.90 
    Right STGa 66 −18 7897 6.03 STG 51 54 47 4.96 
Parietal           
    Left SPL −30 −57 48 7897 4.68 IPL 27 −39 30 4928 3.63 
Cerebellum           
    Left       Crus1c −42 −54 −36 92 4.00 
    Right       VI 33 −36 −36 93 4.04 
    Right Crus1c 42 −57 −36 7897 4.87 Crus1 45 −57 −36 87 3.82 
 Natural > hummed sentencesb Hummedc > natural speechb 
Frontal         
    Left IFG_tri, IFG_oper −39 30 600 5.13 NS      
    Right SMA 15 54 209 5.42       
    Right IFG_tri 45 27 18 50 4.13       
    Right MFG 33 54 65 4.30       
Temporal           
    Left TP, MTG, STG −57 −24 583 5.01       
    Right MTG, STG 63 −18 150 4.43       
Parietal           
    Left AG, SPL −27 −63 42 83 4.00       
Basal ganglia           
    Left Thalamus, caudate −15 −12 12 347 4.13       
 Natural speech: 2 pauses > 1 pause Hummed sentences: 2 pauses > 1 pause 
Temporoparietal         
    Left HeschlG, STG, ROp −45 −18 12 287 4.51 SupramG, STG, HeschlG −63 −39 24 105 3.61 
    Right STG, ROp, HeschlG 42 −12 242 4.64       
 Natural speech: 1 pause > 2 pauses Hummed sentences: 1 pause > 2 pauses 
 NS      NS      

Note: AG, angular gyrus; HeschlG, gyrus of Heschl; IFG_oper, inferior frontal gyrus, pars opercularis; IFG_tri, inferior frontal gyrus, pars triangularis; IPL, inferior parietal lobule; k, cluster size; MFG, middle frontal gyrus; MTG, middle temporal gyrus; NS, not significant; PrecG, precentral gyrus; ROp, rolandic operculum; SPL, superior parietal lobule; TP, temporal pole; SupramG, supramarginal gyrus. Coordinates are reported as given by SPM2, corresponding only approximately to Talairach–Tournoux space (Talairach and Tournoux 1988; Brett et al. 2001). Anatomical labels are given on the basis of the classification of the AAL (automated anatomical labeling) atlas (Tzourio-Mazoyer et al. 2002).

a

Activation is part of a bigger cluster.

b

Masked with natural speech > baseline (hummed sentences > baseline) at P < 0.05, inclusive.

c

Cerebellar labels within the AAL atlas are based on Schmahmann et al. (1999). The first label denotes the location of the maximum, the following labels denote further areas containing a majority of voxels of the activated cluster.

Comparisons of the 2 Basic Types of Materials (hummed sentences and natural speech) to Baseline

When natural speech was compared with baseline (null events), the strongest activations were observed within the STG, bilaterally, and the supplementary motor area (SMA) (Table 2). Further activations were observed within the right precentral gyrus, the right insula, the left superior parietal lobule, and the cerebellum. When hummed sentences were compared with baseline, activations were observed within the STG, bilaterally, the precentral gyrus, bilaterally, the SMA, and the right middle frontal gyrus, as well as the left inferior parietal lobule. Activations were also observed within the cerebellum, bilaterally.

Comparisons of Hummed Sentences to Natural Speech

Compared with hummed sentences, natural speech more strongly activated the inferior frontal gyrus and the middle temporal gyrus, bilaterally but with a left-hemispheric predominance, as well as the left angular gyrus, the left thalamus, and the left caudate (Fig. 4b and Table 2). No brain area was significantly more activated for hummed sentences than for natural speech.

fMRI Data: ROI Analysis

To further investigate and compare the behavior of the brain areas potentially involved in IPB processing, we conducted an ROI analysis over 8 ROIs (pars opercularis and triangularis of the inferior frontal gyrus, anterior, middle and posterior part of the STG, gyrus of Heschl, and anterior and posterior rolandic operculum). The effect sizes for each condition were averaged over all voxels contained within each ROI per side and participant. The results per ROI, hemisphere, and condition are shown in Figure 5. They were entered into a repeated-measures ANOVA with ROI (8), hemisphere (left vs. right), IPB (1 vs. 2 IPBs), and speech type (naturally spoken vs. hummed) as factors. The left hemisphere was on average more strongly activated than the right hemisphere, which is reflected in a significant main effect of hemisphere (F(1,13) = 49.53, MSE = 2.279, P < 0.001). Natural speech activated the brain areas investigated more strongly than hummed speech, yielding a significant main effect of speech type (F(1,13) = 11.93, MSE = 4.415, P < 0.01). More activation was observed for sentences containing 2 IPBs than for sentences containing 1 IPB, giving a main effect of IPB (F(1,13) = 17.45, MSE = 0.385, P < 0.01). The different overall activation levels within the ROIs yielded a significant main effect of ROI (F(7,91) = 72.71, MSE = 1.435, P < 0.001). Speech type and IPB yielded additive effects, as none of the interactions containing both factors reached significance. The type of speech materials (hummed or natural) influenced lateralization, yielding a significant speech type × hemisphere interaction (F(1,13) = 30.59, MSE = 0.419, P < 0.001), and this influence depended on the ROI, giving a significant triple ROI × speech type × hemisphere interaction (F(7,91) = 3.92, MSE = 0.103, P < 0.001). 
The lateralization, type of materials, and number of IPBs influenced activation differently in different ROIs, giving the significant 2-way interactions ROI × hemisphere (F(7,91) = 19.83, MSE = 0.437, P < 0.001), ROI × speech type (F(7,91) = 6.40, MSE = 0.175, P < 0.001), and ROI × IPB (F(7,91) = 16.00, MSE = 0.018, P < 0.001). The significant triple interactions ROI × hemisphere × IPB (F(7,91) = 6.14, MSE = 0.007, P < 0.001) and ROI × hemisphere × speech type (F(7,91) = 3.92, MSE = 0.103, P < 0.001) indicate that speech type and IPB each interacted with hemisphere differently in different ROIs. The remaining interactions were not significant.
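The averaging step described above (one mean effect size per ROI, hemisphere, and condition for each participant) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the data layout (effect sizes keyed by voxel coordinates, ROIs given as lists of voxel coordinates) and both function names are hypothetical.

```python
from statistics import mean


def roi_mean_effect(effect_by_voxel, roi_voxels):
    """Average the effect sizes of all voxels belonging to one ROI."""
    return mean(effect_by_voxel[v] for v in roi_voxels)


def roi_table(effect_maps, rois):
    """Build one participant's ROI-by-hemisphere-by-condition means.

    effect_maps: {condition: {voxel: effect size}}   (hypothetical layout)
    rois:        {(roi name, hemisphere): [voxel, ...]}
    """
    return {
        (roi, hemi, cond): roi_mean_effect(emap, voxels)
        for (roi, hemi), voxels in rois.items()
        for cond, emap in effect_maps.items()
    }
```

With 8 ROIs, 2 hemispheres, and 4 conditions (2 speech types × 2 IPB levels), this gives 8 × 2 × 4 = 64 values per participant. With 14 participants, the ROI factor has 8 − 1 = 7 numerator and 7 × 13 = 91 denominator degrees of freedom, which matches the degrees of freedom reported for the effects involving ROI.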

Figure 5.

Results of the ROI analysis (bars with black stripes = left hemisphere, white bars = right hemisphere). Effect sizes were averaged over all voxels of each ROI. Error bars represent the standard error of the mean. n1, natural speech with 1 IPB; n2, natural speech with 2 IPBs; h1, hummed sentences with 1 IPB; h2, hummed sentences with 2 IPBs; IFG oper, inferior frontal gyrus, pars opercularis; IFG tri, inferior frontal gyrus, pars triangularis; ant/post RolOp, anterior/posterior part of the rolandic operculum; ant/mid/post STG, anterior/middle/posterior part of the STG; Heschl, gyrus of Heschl.

To identify these differences between ROIs more closely we conducted additional ANOVAs separately for different ROIs. Significant main effects for IPB were observed in all ROIs except for the inferior frontal gyrus ROIs. The main effect speech type was significant in all ROIs but the anterior and posterior rolandic operculum. Significant main effects for hemisphere were observed in all ROIs with the exception of the anterior STG and the anterior rolandic operculum. The hemisphere × speech type interaction was significant in all but 3 ROIs, the anterior and posterior rolandic operculum and the gyrus of Heschl. A hemisphere × IPB interaction was observed in the posterior STG and the gyrus of Heschl. No ROI showed a significant speech type × IPB interaction. The triple interaction hemisphere × speech type × IPB was significant only in the anterior rolandic operculum.

To investigate differences with regard to lateralization for the 2 types of material in every ROI more closely, Newman–Keuls post hoc tests were calculated for the interaction hemisphere × speech type. A stronger activation within the left hemisphere compared with the right was observed for natural speech in nearly all ROIs, except the anterior rolandic operculum and the anterior part of the STG. For hummed speech, a left-hemispheric dominance was observed in only 4 ROIs, the gyrus of Heschl, the middle and posterior part of the STG, and the posterior rolandic operculum. No ROI showed a significantly stronger activation of the right hemisphere for hummed sentences compared with natural speech.

Discussion

The aim of the present study was to identify the brain areas involved in the processing of sentence-level prosody by investigating the processing of IPBs. We will first discuss our results with regard to differences in the processing of sentences with 2 IPBs as compared with sentences with 1 IPB and later turn to the general differences between natural and hummed speech.

Brain Areas Involved in IPB Processing

In the whole-brain analysis, stronger activation was observed for sentences with 2 IPBs than sentences with 1 IPB within the left STG, for natural speech as well as for hummed sentences. An additional focus within the STG on the right side was significant for natural speech but failed to reach significance for hummed speech. This indicates that processing an additional IPB activates the STG similarly for natural speech and hummed sentences.

In the ROI analysis (pars opercularis and triangularis of the inferior frontal gyrus, anterior, middle and posterior part of the STG, gyrus of Heschl, and anterior and posterior rolandic operculum), more activation for 2 IPBs than for 1 IPB was observed only in the temporal ROIs, but not within the inferior frontal gyrus. This indicates that prosody processing mainly involves brain areas related to auditory processing. In addition, all ROIs but one, the posterior rolandic operculum, showed a significant main effect of speech type (natural speech or hummed sentences) or an interaction with it. This result could be taken to indicate that these regions are involved in the processing of the more complex segmental information contained in natural speech. The posterior rolandic operculum (bilaterally), on the other hand, did not show stronger activation for natural speech compared with hummed sentences: there was neither a main effect of speech type nor any interaction with it. This suggests that this region might play a less pronounced role in the processing of the specific spectral composition and the additional linguistic information (semantics, syntax) contained in natural speech. However, as this region was more strongly activated by materials containing 2 IPBs rather than one (main effect of IPB), it could be speculated that it might be specifically involved in the processing of prosodic information.

Interestingly, there was no interaction of IPB and speech type in any of the ROIs investigated. This indicates that the activation elicited by the presence of an additional IPB in the ROIs investigated did not depend on the comprehensibility of the utterance. Although IPBs can aid the understanding of an utterance, we did not observe any evidence for an interaction in the respective ROIs, not even within the left inferior frontal gyrus, a region assumed to be involved in syntactic processing. This finding should not lead to the conclusion, however, that prosody and syntax do not interact. It is possible that syntax and prosody are processed independently from each other in different brain regions and that the interaction between both types of information occurs in higher associative areas within the brain. This is also suggested by a recent ERP study with patients suffering from lesions in the corpus callosum (Friederici et al. 2007).

So far, only one neuroimaging study has investigated the perception of IPBs (Strelnikov et al. 2006). In the condition with IPB, the IPB could be in one of 2 positions, changing the meaning of the sentence. “To chop not # to saw” (“Rubit nelzya, pilit”) means: “not to chop but to saw,” whereas “To chop # not to saw” (“Rubit, nelzya pilit”) means “to chop and not to saw.” This is a possible construction in Russian because, unlike in Germanic languages such as German, Dutch, or English, the negative word “not” may have scope to its left or right depending on the position of an IPB. In this condition, participants were first presented with one of the 2 versions and had to select the appropriate alternative (one needs to: chop/saw). In the condition without IPB, participants were first presented with a simple statement (“Father bought him a coat”) and then had to select the appropriate alternative (“His father bought him: a coat/a watch”). Strelnikov et al. reported stronger activation for sentences containing an IPB (e.g., “To chop not # to saw,” or “To chop # not to saw”) than for sentences without an IPB (e.g., “Father bought him a coat”) within the right posterior prefrontal cortex and an area within the right cerebellum. In contrast to our results, Strelnikov et al. did not observe stronger activation within the STG. However, as already outlined in the introduction, the task used was different for the 2 sentence types. Although the task consisted in both cases of a visually presented question with 2 response alternatives (“Father bought him: a coat/a watch” and “One needs to: saw/chop,” for the 2 examples, respectively), the first type of materials (sentences without IPBs) required a semantic judgement based on the segmental content of the utterance, whereas the second type of materials (sentences with an IPB) required a semantic judgement based on the segmental content as well as the prosodic information (IPB position) of the utterance. 
It is therefore possible that the differences observed by Strelnikov et al. (2006) might not only be due to the presence or absence of IPBs in the materials but also to task variables.

To summarize our results so far, we observed a stronger activation within the STG for sentences containing 2 IPBs compared with sentences containing 1 IPB. The activation extended to the gyrus of Heschl and the rolandic operculum. To find out whether one of these areas might show a specialization with regard to prosody processing as compared with the processing of the segmental content of speech, we also conducted an ROI analysis. In the ROI analysis, the posterior part of the rolandic operculum was the only brain region that showed a modulation of activation due to the presence of an additional IPB independent of the amount of segmental information contained in natural speech as compared with hummed sentences.

The Processing of Hummed Sentences Compared with Natural Speech

Hummed sentences also represent a human vocalization and a complex auditory signal. Comparing the processing of natural speech with hummed sentences can therefore yield information with regard to the brain areas processing the segmental content of speech as compared with brain areas processing the more basic aspects of speech. Comparing the processing of hummed sentences to baseline gives an initial overview of the brain areas involved in the processing of hummed speech, as well as, potentially, in the processing of prosodic aspects of speech. The strongest activations were observed within the STG, extending to the rolandic operculum and the gyrus of Heschl, areas involved in auditory processing. Strong activations were also observed within the SMA, the right precentral gyrus, and the cerebellum. These activations are most likely due to the manual response required by the task. On the basis of the logic of cognitive subtraction, subtracting hummed sentences from natural speech should yield brain areas that are involved in the processing of segmental information, lexico-semantics, and syntax. Stronger activations for natural speech than for hummed sentences were observed in the middle temporal gyrus, bilaterally, and within the left opercular and triangular part of the inferior frontal gyrus, corresponding approximately to Brodmann area 44 and 45. These frontotemporal activations are related to the processing of syntactic and semantic information contained in natural speech. Similar activations have been reported in other studies comparing natural speech to unintelligible speech (Spitsyna et al. 2006). Natural speech also activated the STG more strongly than hummed speech, possibly reflecting the involvement of the auditory cortex in processing the more complex spectral components carrying the segmental information of natural speech. It should be noted that the probe-monitoring task used here might have had some influence on the brain activation patterns observed. 
Our task induced an attentional focus on lexico-semantic processing rather than on pure prosody processing, which might also have influenced the activations we observed in temporal areas for hummed speech. In the following, we will further discuss our findings against the background of the organization of the auditory system and its relation to speech and prosody processing.

The Brain Areas Involved in the Processing of Complex Sounds and Speech

The auditory association cortex seems to represent mainly spectral and temporal properties of auditory stimuli rather than more abstract high-level properties such as auditory objects (see, for reviews, Griffiths and Giraud 2004; Griffiths et al. 2004). The auditory association areas are assumed to be processing more abstract properties than primary auditory cortex. Some of these properties are assumed to be processed in separate regions. Rauschecker and Tian (2000) proposed that anterior regions subserve auditory object identification, whereas posterior regions process spatial information, similar to the ventral (“what”) and dorsal (“where”) pathways of the visual system (Ungerleider and Mishkin 1982). Although such an anterior–posterior distinction is supported in part by single-cell recordings in monkeys (Recanzone 2000; Tian et al. 2001) as well as by lesion (Clarke et al. 2000; Adriani et al. 2003) and brain-imaging studies (Maeder et al. 2001), the distinction is not clear-cut. The functional nature of this anterior–posterior division is therefore still under debate (Zatorre et al. 2002).

With regard to the processing of speech, the dual pathway model (Rauschecker and Tian 2000) suggests that speech should activate regions anterior to the primary auditory cortex as it is a familiar and highly complex auditory signal with no relation to spatial information. When intelligible speech is compared with nonspeech, stronger left-lateralized activations are obtained in anterior and middle parts of the superior temporal sulcus, for single words as well as for sentences (Binder et al. 2000; Scott et al. 2000; Narain et al. 2003; Meyer et al. 2004). However, activation within the posterior temporal and inferior parietal regions is also often observed in imaging studies (Meyer et al. 2002; Spitsyna et al. 2006). Although not directly derivable from the dual pathway model, this finding is not altogether surprising. It has long been known from lesion studies that posterior temporal and inferior parietal regions are critical for supramodal language comprehension (Geschwind 1965).

The stronger activations for natural speech than for hummed sentences observed in the present study within the STG are compatible with previous results. We observed stronger activation for natural than for hummed speech in middle and posterior parts of the STG and the posterior part of the rolandic operculum. With regard to the lack of activation in anterior parts of the superior temporal lobe, it could be speculated that hummed speech is recognized as a familiar human vocalization. This familiarity might explain differences to results from studies using speech materials that are rendered unintelligible artificially and sound more unfamiliar (Scott et al. 2000; Meyer et al. 2004).

It should be noted, however, that the activations observed within the temporal cortex for speech may not be specific for speech. Similar activations have been observed for other complex auditory stimuli such as musical sounds and melodies (Griffiths et al. 1998; Patterson et al. 2002; see, for a review, Price et al. 2005). It is possible that differences between speech and other complex auditory stimuli are subtle and go beyond simple localization. A recent study by Tervaniemi et al. (2006), for example, showed that the superior temporal region reacts differently to changes in pitch or duration for speech syllables than for musical sounds.

The Brain Areas Involved in the Perception of Prosody

Activations within the STG have been regularly observed in imaging studies when speech with or without prosodic modulation is presented and compared with baseline (e.g., Gandour et al. 2003, 2004; Wildgruber et al. 2004; Hesling et al. 2005). Regions outside the temporal lobe have also been observed to show a modulation of activation dependent on prosodic properties, but these appear to vary considerably across different studies. Prosody-related activations were observed to depend on the specifics of the task (Plante et al. 2002; Tong et al. 2005), the type of prosody involved (e.g., affective vs. linguistic, Wildgruber et al. 2004), and the degree of propositional information contained in the stimulus materials (Gandour et al. 2003, 2004; Hesling et al. 2005; Tong et al. 2005). Although the superior temporal region seems to be the main candidate for the perception of prosodic modulation in speech, other areas outside the temporal lobe are likely to be involved, and the auditory association areas are likely to be only part of a more extended processing network for prosodic aspects of speech.

Imaging studies investigating the processing of sentence-level prosody often compared natural speech to speech stimuli either with no segmental information or with no or little prosodic information. One possibility is to remove the segmental information by filtering, preserving the intonational contour of the sentence (Meyer et al. 2002, 2004). The results from these studies point toward a frontotemporal network with a right-hemispheric dominance. The rolandic operculum, in particular in the right hemisphere, has been identified as part of this network. Another approach is to remove the intonational contour of sentences, for example, by high-pass filtering (Hermann et al. 2003; Meyer et al. 2004) or to reduce prosody information by speaking with little prosodic expressiveness (Hesling et al. 2005). The rationale for this approach is to identify prosody-related activations by subtracting low-prosody speech from normal speech, thus ideally subtracting away activations related to the processing of the segmental content of speech while leaving prosody-related activations. The results from these studies, however, are mixed. Meyer et al. (2004) did not observe stronger activations for normal speech than for low-prosody speech. In this study, the task required a prosody comparison between 2 successive sentence stimuli in an experimental setting in which flattened speech (no prosodic information) and degraded speech (no segmental information) were presented in a pseudorandomized order. Hesling et al. (2005) observed different results for intelligible and nonintelligible (i.e., low-pass filtered) speech. In the case of high-prosody intelligible speech, the right STG was more strongly activated than in the case of low-prosody intelligible speech. High-prosody nonintelligible speech did not activate any brain regions more strongly than low-prosody nonintelligible speech. 
These results indicate that the additional prosodic information contained in high-prosodic speech activates the STG, although not very strongly. Finally, a study by Doherty et al. (2004) compared sentences with a rising pitch (question) to sentences with a falling pitch (statement). Although their results might not exclusively reflect prosody processing, they observed, among others, a stronger activation within the STG, bilaterally, for sentences with a rising pitch. These findings also indicate that the STG might play a dominant role in the first stages of prosody processing. The results of the present study, namely, that areas within and around the STG are involved in IPB processing therefore agree well with previous results.

The Lateralization of Prosody Processing

Although a right-hemispheric dominance in prosody processing was initially suggested based on evidence from patients, the results with regard to the perception of sentence-level nonaffective prosody show a mixed picture. Some studies show greater impairment in patients with right hemisphere damage, compared with patients who had lesions in their left hemisphere, and controls (Bryan et al. 1989), whereas others find greater impairment in patients with left hemisphere damage (Perkins et al. 1996; Pell and Baum 1997) or a more complicated pattern of preserved function alongside impairments for both patient groups compared with controls (Baum et al. 1997; Imaizumi et al. 1998; Walker et al. 2001; Baum and Dwivedi 2003). A possible reason for these differences might be that the patient groups comprise patients with nonoverlapping patterns of brain damage, often in frontoparietal areas. Cognitive functioning in these patients might therefore be compromised in a number of domains, such as working memory and drawing inferences, faculties required in complex tasks such as prosodic disambiguation of sentences. Imaging studies investigating the perception of sentence-level prosody with healthy adults circumvent this problem and allow a separate assessment of lateralization for different brain regions.

A number of imaging studies have investigated the lateralization of prosodic processing (Meyer et al. 2002; Kotz et al. 2003; Meyer et al. 2004; Hesling et al. 2005). Meyer et al. (2002, 2004) observed stronger right-hemispheric activations within frontotemporal areas for low-pass–filtered speech than for natural speech, whereas Kotz et al. (2003), using natural speech stimuli, observed predominantly bilateral activations. Hesling et al. (2005) found activation within the right STG when high-prosody intelligible speech was compared with low-prosody intelligible speech. Another way to remove the propositional content of speech while preserving its prosodic contour is to present foreign language materials to listeners with no knowledge of this language. In an experiment by Tong et al. (2005), bilateral activation was observed for English speakers listening to Chinese language materials for 8 of 9 ROIs investigated. A stronger right-hemispheric activation was observed for only one ROI within the middle frontal gyrus. Tong et al. also found mostly bilateral activation (6 of 9 ROIs) for Chinese participants listening to Chinese and a dependence of lateralization on the type of task. This evidence indicates that prosody, when presented together with segmental information, is subserved by a bilaterally distributed network of frontotemporal brain areas.

In the present study, hummed sentence materials were presented as well as natural speech. If the right hemisphere is specialized in the processing of sentence-level prosody, the processing of hummed speech compared with baseline should show right-hemispheric lateralization. In the whole-brain analysis, bilateral activation was observed in the STG and the precentral gyrus, and stronger activation of the right hemisphere was observed within the middle frontal gyrus for hummed sentences. In the ROI analysis, 6 of 8 ROIs showed stronger activation within the left hemisphere for natural speech, whereas only 4 of 8 ROIs showed a left-hemispheric dominance for hummed sentences. This suggests that prosodic processing is organized more bilaterally than the processing of other properties of speech, such as syntax or lexico-semantics. Further insights about the lateralization with regard to the processing of specific linguistic aspects of prosody can be derived from the lateralization of IPB processing. An interaction between the number of IPBs and lateralization was observed only in 2 ROIs (posterior STG, gyrus of Heschl), with both ROIs showing a stronger modulation of activation due to the number of IPBs in the left hemisphere, for natural speech as well as hummed sentences. This could be taken to indicate that these 2 temporal areas show a left-hemispheric dominance for prosody processing.

The left-hemispheric dominance in the temporal regions observed in the present study might be due to some degree to the task employed. Whereas other studies required participants to attend directly to the prosodic information of the speech stimuli (Meyer et al. 2002, 2004; Tong et al. 2005), a probe detection task was used here that required participants, even in the hummed speech condition, to attend to and memorize a potentially appearing naturally spoken word carrying segmental information. The results of the present study are therefore compatible with the functional or task-dependent class of hypotheses. According to this class of hypotheses, the lateralization of prosody processing could shift from the right to the left hemisphere when the task or the stimulus materials promote attention to syntactic–semantic rather than prosodic properties of the materials.

Conclusion

This study aimed to identify the brain areas involved in the processing of sentence-level prosody and, in particular, the processing of IPBs. Sentences with 2 IPBs activated the STG bilaterally more strongly than sentences with 1 IPB. This pattern of activation was very similar for natural speech and hummed sentences. The results from the ROI analysis suggest that the posterior rolandic operculum might play a specific role in the processing of prosodic information because it was the only ROI not showing an influence of the type of speech materials (hummed sentences or natural speech). When comparing natural speech and hummed sentence materials, we found natural speech to activate a number of areas in the left hemisphere more strongly than hummed sentences. The left-hemispheric dominance of temporal activations observed for hummed sentences, however, might be due to the attentional focus on segmental information required by the task employed in the present study.

Funding

Human Frontier Science Program (HFSP RGP5300/2002-C102) to KA.

Conflict of Interest: None declared.

References

Adriani M, Maeder P, Meuli R, Bellmann A, Frischknecht R, Villemure J-G, Mayer J, Annoni J-M, Bogousslavsky J, Fornari E, et al. Sound recognition and localization in man: specialized cortical networks and effects of acute circumscribed lesions. Exp Brain Res, 2003, vol. 153 (pg. 591-604).
Austin J. How to do things with words. The William James Lectures delivered at Harvard University in 1955, 1975. 2nd ed. Cambridge (MA): Harvard University Press.
Baum S, Pell M, Leonard C, Gordon J. The ability of right- and left-hemisphere-damaged individuals to produce and interpret prosodic cues marking phrasal boundaries. Lang Speech, 1997, vol. 40 (pg. 313-330).
Baum SR, Dwivedi VD. Sensitivity to prosodic structure in left- and right-hemisphere-damaged individuals. Brain Lang, 2003, vol. 87 (pg. 278-289).
Binder JR, Frost JA, Hammeke TA, Bellgowan PSF, Springer JA, Kaufman JN, Possing ET. Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex, 2000, vol. 10 (pg. 512-528).
Brett M, Christoff K, Cusack R, Lancaster J. Using the Talairach atlas with the MNI template. Neuroimage, 2001, vol. 13 (pg. S85).
Bryan K. Language prosody and the right hemisphere. Aphasiology, 1989, vol. 3 (pg. 285-299).
Clarke S, Bellman A, Meuli R, Assal G, Steck AJ. Auditory agnosia and auditory spatial deficits following left hemispheric lesions: evidence for distinct processing pathways. Neuropsychologia, 2000, vol. 38 (pg. 797-807).
Cooper WE, Paccia-Cooper J. Syntax and speech, 1981. Cambridge (MA): Harvard University Press.
de Rooij JJ. Prosody and the perception of syntactic boundaries. IPO Annu Prog Rep, 1975, vol. 10 (pg. 36-39).
de Rooij JJ. Perception of prosodic boundaries. IPO Annu Prog Rep, 1976, vol. 11 (pg. 20-24).
Doherty CP, West WC, Dilley LC, Shattuck-Hufnagel S, Caplan D. Question/statement judgments: an fMRI study of intonation processing. Hum Brain Mapp, 2004, vol. 23 (pg. 85-98).
Frazier L, Carlson K, Clifton C. Prosodic phrasing is central to language comprehension. Trends Cogn Sci, 2006, vol. 6 (pg. 244-249).
Friederici AD, Alter K. Lateralization of auditory language functions: a dynamic dual pathway model. Brain Lang, 2004, vol. 89 (pg. 267-276).
Friederici AD, von Cramon DY, Kotz SA. Role of the corpus callosum in speech comprehension: interfacing syntax and prosody. Neuron, 2007, vol. 53 (pg. 135-145).
Gandour J, Dzemidzic M, Wong D, Lowe M, Tong Y, Hsieh L, Satthamnuwong N, Lurito J. Temporal integration of speech prosody is shaped by language experience: an fMRI study. Brain Lang, 2003, vol. 84 (pg. 318-336).
Gandour J, Tong Y, Wong D, Talavage T, Dzemidzic M, Xu Y, Li X, Lowe M. Hemispheric roles in the perception of speech prosody. Neuroimage, 2004, vol. 23 (pg. 344-357).
Gandour J, Wong D, Lowe M, Dzemidzic M, Satthamnuwong N, Tong Y, Li X. A cross-linguistic fMRI study of spectral and temporal cues underlying phonological processing. J Cogn Neurosci, 2002, vol. 14 (pg. 1076-1087).
Geschwind N. Disconnexion syndromes in animals and man: part I. Brain, 1965, vol. 88 (pg. 237-294).
Grabe E, Warren P, Nolan F. Resolving category ambiguities—evidence from stress shift. Speech Commun, 1994, vol. 15 (pg. 101-114).
Griffiths TD, Büchel C, Frackowiak RSJ, Patterson RD. Analysis of temporal structure in sound by the human brain. Nat Neurosci, 1998, vol. 1 (pg. 422-427).
Griffiths TD, Giraud AL. Auditory function. In: Frackowiak RSJ, Friston KJ, Frith CD, Dolan RJ, Price CJ, Zeki S, Ashburner J, Penny W, editors. Human brain function, 2004. 2nd ed. Amsterdam: Elsevier (pg. 61-75).
Griffiths TD, Warren JD, Scott SK, Nelken I, King AJ. Cortical processing of complex sound: a way forward? Trends Neurosci, 2004, vol. 27 (pg. 181-185).
Grosjean F, Hirt C. Using prosody to predict the end of sentences in English and French: normal and brain-damaged subjects. Lang Cogn Process, 1996, vol. 11 (pg. 107-134).
Herrmann CS, Friederici AD, Oertel U, Maess B, Hahne A, Alter K. The brain generates its own sentence melody: a Gestalt phenomenon in speech perception. Brain Lang, 2003, vol. 85 (pg. 396-401).
Hesling I, Clement S, Bordessoules M, Allard M. Cerebral mechanisms of prosodic integration: evidence from connected speech. Neuroimage, 2005, vol. 24 (pg. 937-947).
Imaizumi S, Mori K, Kiritani S, Hiroshi H, Tonoike M. Task-dependent laterality for cue decoding during spoken language processing. Neuroreport, 1998, vol. 9 (pg. 899-903).
Isel F, Alter K, Friederici AD. Influence of prosodic information on the processing of split particles: ERP evidence from spoken German. J Cogn Neurosci, 2005, vol. 17 (pg. 154-167).
Kotz SA, Meyer M, Alter K, Besson M, von Cramon DY, Friederici AD. On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang, 2003, vol. 86 (pg. 366-376).
Kreiman J. Perception of sentence and paragraph boundaries in natural conversation. J Phon, 1982, vol. 10 (pg. 163-175).
Larkey LS. Reiterant speech: an acoustic and perceptual validation. J Acoust Soc Am, 1983, vol. 73 (pg. 1337-1345).
Liu TT, Frank LR, Wong EC, Buxton RB. Detection power, estimation efficiency and predictability in event-related fMRI. Neuroimage, 2001, vol. 13 (pg. 759-773).
Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran J-P, Pittet A, Clarke S. Distinct pathways involved in sound recognition and localization: a human fMRI study. Neuroimage, 2001, vol. 14 (pg. 802-816).
Meyer M, Alter K, Friederici AD, Lohmann G, von Cramon DY. FMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum Brain Mapp, 2002, vol. 17 (pg. 73-88).
Meyer M, Steinhauer K, Alter K, Friederici AD, von Cramon DY. Brain activity varies with modulation of dynamic pitch variance in sentence melody. Brain Lang, 2004, vol. 89 (pg. 277-289).
Narain C, Scott SK, Wise RJS, Rosen S, Leff A, Iversen SD, Matthews PM. Defining a left-lateralized response specific to intelligible speech using fMRI. Cereb Cortex, 2003, vol. 13 (pg. 1362-1368).
Nespor M, Vogel I. Prosodic structure above the word. In: Cutler A, Ladd DR, editors. Prosody: models and measurements, 1983. Berlin (Germany): Springer Verlag (pg. 123-140).
Pannekamp A, Toepel U, Alter K, Hahne A, Friederici AD. Prosody-driven sentence processing: an event-related brain potential study. J Cogn Neurosci, 2005, vol. 17 (pg. 407-421).
Patterson RD, Uppenkamp S, Johnsrude I, Griffiths TD. The processing of temporal pitch and melody information in auditory cortex. Neuron, 2002, vol. 36 (pg. 767-776).
Pell M, Baum S. The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain Lang, 1997, vol. 57 (pg. 80-99).
Perkins JM, Baran JA, Gandour J. Hemispheric specialization in processing intonation contours. Aphasiology, 1996, vol. 10 (pg. 343-362).
Plante E, Creusere M, Sabin C. Dissociating sentential prosody from sentence processing: activation interacts with task demands. Neuroimage, 2002, vol. 17 (pg. 401-410).
Poeppel D. The analysis of speech in different temporal integration windows: cerebral lateralization as asymmetric sampling in time. Speech Commun, 2003, vol. 41 (pg. 245-255).
Price C, Thierry G, Griffiths T. Speech-specific auditory processing: where is it? Trends Cogn Sci, 2005, vol. 9 (pg. 271-276).
Rauschecker JP, Tian B. Mechanisms and streams for processing of ‘what’ and ‘where’ in auditory cortex. Proc Natl Acad Sci USA, 2000, vol. 97 (pg. 11800-11806).
Recanzone GH. Spatial processing in the auditory cortex of the macaque monkey. Proc Natl Acad Sci USA, 2000, vol. 97 (pg. 11829-11835).
Rilliard A, Aubergé V. Reiterant speech for the evaluation of natural vs. synthetic prosody, 1998 (pg. 87-92). Third ESCA/COCOSDA Workshop on Speech Synthesis, Jenolan Caves House, Blue Mountains, Australia, November 26–29, 1998.
Schafer AJ, Speer SR, Warren P, White SD. Intonational disambiguation in sentence production and comprehension. J Psycholing Res, 2000, vol. 29 (pg. 169-182).
Schmahmann JD, Doyon J, McDonald D, Holmes C, Lavoie K, Hurwitza AS, Kabanic N, Toga A, Evans A, Petrides M. Three-dimensional MRI atlas of the human cerebellum in proportional stereotaxic space. Neuroimage, 1999, vol. 10 (pg. 233-260).
Scott SK, Blank CC, Rosen S, Wise RJS. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 2000, vol. 123 (pg. 2400-2406).
Searle J. An essay in the philosophy of language, 1969. Cambridge (MA): Cambridge University Press.
Selkirk E. Sentence prosody: intonation, stress, and phrasing. In: Goldsmith JA, editor. The handbook of phonological theory, 1995. Cambridge (MA): Blackwell (pg. 550-569).
Selkirk E. The interaction of constraints on prosodic phrasing. In: Horne M, editor. Prosody: theory and experiment, 2000. Dordrecht (The Netherlands): Kluwer Academic Publishing (pg. 231-262).
Spitsyna G, Warren JE, Scott SK, Turkheimer FE, Wise RJS. Converging language streams in the human temporal lobe. J Neurosci, 2006, vol. 26 (pg. 7328-7336).
Stark CEL, Squire LR. When zero is not zero: the problem of ambiguous baseline conditions in fMRI. Proc Natl Acad Sci USA, 2001, vol. 98 (pg. 12760-12766).
Steinhauer K. Electrophysiological correlates of prosody and punctuation. Brain Lang, 2003, vol. 86 (pg. 142-164).
Steinhauer K, Alter K, Friederici AD. Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nat Neurosci, 1999, vol. 2 (pg. 191-196).
Steinhauer K, Friederici AD. Prosodic boundaries, comma rules, and brain responses: the closure positive shift in ERPs as a universal marker for prosodic phrasing in listeners and readers. J Psycholing Res, 2001, vol. 30 (pg. 267-295).
Strelnikov KN, Vorobyev VA, Chernigovskaya TV, Medvedeva SV. Prosodic clues to syntactic processing—a PET and ERP study. Neuroimage, 2006, vol. 29 (pg. 1127-1134).
Talairach J, Tournoux P. Co-planar stereotaxic atlas of the human brain, 1988. New York: Thieme.
Tervaniemi M, Szameitat A, Kruck S, Schröger E, Alter K, De Baene W, Friederici AD. From air oscillations to music and speech: functional magnetic resonance imaging evidence for fine-tuned neural networks in audition. J Neurosci, 2006, vol. 23 (pg. 8647-8652).
t'Hart RC, Cohen A. A perceptual study of intonation: an experimental-phonetic approach to speech melody, 1990. Cambridge (UK): Cambridge University Press.
Tian B, Reser D, Durham A, Kustov A, Rauschecker J. Functional specialization in rhesus monkey auditory cortex. Science, 2001, vol. 292 (pg. 290-293).
Tomasi D, Ernst T, Caparelli EC, Chang L. Common deactivation patterns during working memory and visual attention tasks: an intra-subject fMRI study at 4 Tesla. Hum Brain Mapp, 2006, vol. 27 (pg. 694-705).
Tong Y, Gandour J, Talavage T, Wong D, Dzemidzic M, Xu Y, Li X, Lowe M. Neural circuitry underlying sentence-level linguistic prosody. Neuroimage, 2005, vol. 28 (pg. 417-428).
Truckenbrodt H. A short report on intonation phrase boundaries in German. Linguist Ber, 2005, vol. 203 (pg. 273-296).
Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M. Automated anatomical labelling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single subject brain. Neuroimage, 2002, vol. 15 (pg. 273-289).
Ungerleider L, Mishkin M. Two cortical visual systems. In: Ingle DJ, Goodale MA, Mansfield RJW, editors. Analysis of visual behavior, 1982. Cambridge (MA): MIT Press (pg. 549-586).
Van Lancker D. Cerebral lateralization of pitch cues in the linguistic signal. Pap Linguist: Int J Hum Commun, 1980, vol. 13 (pg. 200-277).
Walker JP, Fongemie K, Daigle T. Prosodic facilitation in the resolution of syntactic ambiguities in subjects with left and right hemisphere damage. Brain Lang, 2001, vol. 78 (pg. 169-196).
Wildgruber D, Hertrich I, Riecker A, Erb M, Anders S, Grodd W, Ackermann H. Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation. Cereb Cortex, 2004, vol. 14 (pg. 1384-1389).
Wong PCM. Hemispheric specialization of linguistic pitch patterns. Brain Res Bull, 2002, vol. 59 (pg. 83-95).
Zatorre RJ, Belin P. Spectral and temporal processing in human auditory cortex. Cereb Cortex, 2001, vol. 11 (pg. 946-953).
Zatorre RJ, Bouffard M, Ahad P, Belin P. Where is ‘where’ in the human auditory cortex? Nat Neurosci, 2002, vol. 5 (pg. 905-909).