Human subjects overestimate the change of rising intensity sounds compared with falling intensity sounds. Rising sound intensity has therefore been proposed to be an intrinsic warning cue. In order to test this hypothesis, we presented rising, falling, and constant intensity sounds to healthy humans and gathered psychophysiological and behavioral responses. Brain activity was measured using event-related functional magnetic resonance imaging. We found that rising compared with falling sound intensity facilitates autonomic orienting reflex and phasic alertness to auditory targets. Rising intensity sounds produced neural activity in the amygdala, which was accompanied by activity in intraparietal sulcus, superior temporal sulcus, and temporal plane. Our results indicate that rising sound intensity is an elementary warning cue eliciting adaptive responses by recruiting attentional and physiological resources. Regions involved in cross-modal integration were activated by rising sound intensity, while the right-hemisphere phasic alertness network could not be supported by this study.
In a hazardous environment, it is fundamentally important to successfully evaluate the significance of sounds. An organism's failure to pay attention to an important sound source might shorten its lifespan, while reacting to meaningless acoustic events will dissipate resources otherwise needed. Sudden and intense sounds elicit startle responses, an automated set of motor actions to deal with emergency situations. Less intense sounds, however, can trigger quick adaptive responses as well, among them the autonomic orienting reaction (Sokolov 1963). Such warning cues can be learned as conditioned stimuli or by experimental instruction. There may however be sounds that contain an intrinsic, unconditioned warning value. Apart from the novelty of a sound (the orienting response habituates quickly), it is not known which stimulus characteristics constitute such a warning value.
Neuroimaging studies have addressed complex auditory stimuli indicating salient events in the environment, including language (Zald 2003), scary music (Gosselin et al. 2005), emotional vocalization in language (Phillips et al. 1998; Morris et al. 1999; Sander et al. 2005), and pure emotional vocalization such as laughing and crying (Sander and Scheich 2001; Seifritz et al. 2003). In most studies, such stimuli have been shown to activate the human amygdala, which has been described as a detector of relevant events in the environment (Sander et al. 2003) and reacts to arousing stimuli (Zald 2003; Lewis et al. 2006). Activity in other areas seems to be dependent on stimuli and paradigms and is less consistent.
The signaling properties of such complex sounds are obvious. On a more molecular level, however, it is much less clear which basic auditory cues constitute warning cues.
Rising sound intensity—that is analogous to “auditory looming”—has been proposed as a possible candidate for an elementary auditory warning cue (Neuhoff 1998). This hypothesis is based on the fact that rising sound intensity is overestimated compared with falling sound intensity, a perceptual illusion that might advance the shaping of an adequate response to potentially relevant events. An attentional bias toward rising sound intensity was shown in rhesus monkeys, who orient more often to rising than to falling sound intensity (Ghazanfar et al. 2002). Rising sound intensity can signal an approaching sound source, and consequently, the distance of approaching sound sources is, in general terms, underestimated (Rosenblum et al. 1987; Schiff and Oldak 1990) and, more specifically, estimated to be smaller than that of receding sound sources that have the same objective distance to the listener (Neuhoff 2001). Attentional bias toward rising sound intensity resembles responses to visual looming, that is, seemingly approaching objects (Ball and Tronick 1971). When auditory and visual looming information are combined, rhesus monkeys attend more to coherent visual and auditory signals than to conflicting information (Maier et al. 2004). However, discrepant information of one modality does not affect performance on time-to-arrival judgments compared with coherent information in both modalities (Gordon and Rosenblum 2005). The latter 2 studies confirm that when visual and auditory looming information is accessible, both are integrated.
These data underline the perceptual significance across species of rising sound intensity. Its neurobiological value as an intrinsic warning stimulus, beyond intensity change perception, is however still not understood in detail. We have investigated the apparent auditory motion-alert properties of rising and falling sound intensity using blocked stimulus presentation in a previous experiment (Seifritz et al. 2002). However, in such a paradigm, activity related to repetitive presentation of the same warning signals will quickly habituate and will likely not be detected (Breiter et al. 1996).
Warning cues that are established by associative learning in the experimental setting phasically increase alertness. Reactions to subsequent targets in the respective modality are accelerated when the warning cue appears shortly before target onset. Modality- and task-specific mechanisms have been proposed (Posner 1980), as well as general, modality-unspecific mechanisms (Roberts et al. 2006). Neuroimaging studies have found a supramodal right-hemisphere network for phasic alertness, comprising dorsolateral and ventrolateral frontal cortices, anterior cingulate cortex, inferior temporal gyrus, and thalamus, whereas a left-hemisphere network has been related to intrinsic attention (Sturm and Willmes 2001). These networks are similar to those observed in visual alerting (Sturm and Willmes 2001), although other studies report different networks (Thiel et al. 2004).
We suggest that an intrinsic auditory warning stimulus will enhance activation of early preattentive processes related to stimulus registration, prepare for action, increase phasic alertness, shift attentional resources toward the auditory modality, and activate both a phasic alertness network in the right hemisphere and the amygdala as a detector of intrinsically relevant events in the environment. Therefore, we presented rising, falling, and constant intensity sine tones in an event-related fashion and measured psychophysiological, behavioral, and neuronal responses to these stimuli in 2 experiments. First, we addressed the question of whether rising sound intensity would facilitate the autonomic orienting response (Sokolov 1963). This response is a distinctive subprocess signaling the active orientation of attention toward potentially significant events. We specifically addressed heart rate (HR) deceleration and skin conductance response (SCR). Furthermore, rising and falling sounds were tested as warning stimuli in a phasic alertness paradigm that was also suitable for addressing divided attention between acoustic and visual modalities. Finally, functional magnetic resonance imaging (fMRI) was used to examine whether rising sounds would activate the right-hemisphere networks and the amygdala.
Materials and Methods
We studied healthy volunteers in the psychoacoustic (10 males, 11 females; mean age ± standard deviation 25 ± 5 years) and imaging studies (18 males, 17 females; mean age ± standard deviation 29 ± 6 years). Informed consent was obtained from all subjects. Volunteers who participated in the psychoacoustic experiment were not enrolled in the fMRI study.
All experiments were carried out using pulsed tones of 2 s duration and a 1-kHz carrier frequency that were amplitude-modulated with a smoothed square wave envelope of 5 Hz. The 2-s sound sweeps were multiplied with an exponential function to obtain sound pressure level changes of 15 dB (rising intensity sound, 70–85 dB; falling intensity sound, 85–70 dB; constant intensity sound, 77 dB). Stimuli were bilaterally presented through headphones. A pilot study revealed that these stimuli do not elicit startle eyeblink responses.
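The stimulus construction described above can be sketched as follows. This is a minimal illustration, not the authors' synthesis code: the sampling rate, the raised-cosine edge shape and 10 ms edge duration of the "smoothed square wave", and the 50% duty cycle are assumptions, and levels are expressed relative to the 85 dB peak (0 dB here) rather than in calibrated sound pressure. An exponential amplitude function corresponds to a level that changes linearly in dB.

```python
import math

def level_db(kind, t, duration=2.0):
    """Level at time t in dB relative to the 85 dB peak (so 70 dB -> -15)."""
    if kind == "rising":
        return -15.0 + 15.0 * t / duration
    if kind == "falling":
        return -15.0 * t / duration
    if kind == "constant":
        return -8.0  # 77 dB relative to 85 dB
    raise ValueError(f"unknown stimulus kind: {kind}")

def gate(t, rate=5.0, ramp=0.01):
    """5 Hz on/off envelope with raised-cosine edges (edge shape is assumed)."""
    period = 1.0 / rate
    on = period / 2.0            # assumed 50% duty cycle
    phase = t % period
    if phase < ramp:             # rising edge
        return 0.5 * (1.0 - math.cos(math.pi * phase / ramp))
    if phase < on - ramp:        # plateau
        return 1.0
    if phase < on:               # falling edge
        return 0.5 * (1.0 + math.cos(math.pi * (phase - (on - ramp)) / ramp))
    return 0.0                   # silent half of the cycle

def make_tone(kind, fs=8000, duration=2.0, carrier=1000.0):
    """Return the pulsed 1 kHz tone as a list of samples in [-1, 1]."""
    samples = []
    for i in range(int(fs * duration)):
        t = i / fs
        amplitude = 10.0 ** (level_db(kind, t, duration) / 20.0)
        samples.append(amplitude * gate(t) * math.sin(2.0 * math.pi * carrier * t))
    return samples
```

Under these assumptions, the rising and falling tones are time-reversed versions of the same level trajectory, and the 15 dB change corresponds to an amplitude ratio of about 5.6 between the loudest and softest ends.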
During fMRI, participants were instructed to concentrate on the changes in the auditory signals and to fixate a visual crosspiece to avoid eye movements. They were not asked to carry out any output tasks or to make judgments about intensity, auditory motion, or other sound parameters. This passive listening task was chosen because it was assumed to better resemble a real-life situation, where an immediate reaction to warning cues is not normally required. Also, the task was designed to be comparable to previous studies. Electrooculographic recordings outside the scanner using similar stimuli showed no task-related eye movements (Seifritz et al. 2002).
All experimental sessions were conducted in the morning. Participants were relaxed and sitting. A total of 180 stimuli (60 per category) were presented via headphones in 4 blocks of 45 stimuli with a mean stimulus onset asynchrony of 10 s. In a complex phasic alertness paradigm, a visual (20%, i.e., 36 trials), auditory (20%, i.e., 36 trials), or no target (60%, i.e., 108 trials) was delivered 100 ms after the stimulus had ended.
Auditory targets were delivered via headphones (1200 Hz, 85 dB, 100 ms). Visual targets (100 ms) were delivered using a red light emitting diode located 0.6 m in front of the participant. Reaction times were measured via a right-hand push button with 1 ms temporal resolution.
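The trial structure of the alertness paradigm can be illustrated with a short sketch. The text gives only the overall counts (180 trials, 60 per sound category, 36 visual, 36 auditory, and 108 target-free trials, in 4 blocks of 45); the assumption here, not stated in the text, is that targets were balanced across the three sound categories (12 visual, 12 auditory, 36 none each) and that trial order was fully randomized. The function name and seed are hypothetical.

```python
import random
from collections import Counter

def build_schedule(seed=0):
    """Return 4 blocks of 45 (sound, target) trials under the assumed balancing."""
    rng = random.Random(seed)
    trials = []
    for sound in ("rising", "falling", "constant"):
        # Assumed: each sound category carries the same 20/20/60% target split.
        targets = ["visual"] * 12 + ["auditory"] * 12 + ["none"] * 36
        trials.extend((sound, target) for target in targets)
    rng.shuffle(trials)
    return [trials[i:i + 45] for i in range(0, len(trials), 45)]

blocks = build_schedule()
target_counts = Counter(target for block in blocks for _, target in block)
sound_counts = Counter(sound for block in blocks for sound, _ in block)
```

Only the 108 trials without a target would enter the orienting-response analysis, since trials followed by reaction time cues were excluded there.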
For the analysis of orienting responses, only trials not followed by reaction time cues were included, so as to avoid motor action obscuring autonomic responses. Electrocardiography electrodes were attached according to a standard lead II configuration. SCRs were assessed via 2 Ag/AgCl electrodes at thenar and hypothenar positions of the left hand. Data were digitized at a rate of 1 kHz with 16-bit resolution. An offline artifact control was conducted, and HR data from one participant had to be excluded because of low signal quality. The SCR signal was resampled at 5 Hz. All SCR data were log-transformed and individually z-scored to control for interindividual differences in skin conductance responsiveness. The instantaneous HR signal was calculated offline, linearly interpolated, and resampled at 5 Hz. For statistical analysis, the peak SCR was calculated as mean SCR between 4 and 5 s after stimulus onset, corrected for a 1-s baseline before stimulus onset. In a similar fashion, peak HR deceleration was calculated as mean HR between 2 and 3 s after stimulus onset, corrected for a 1-s baseline before stimulus onset.
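The baseline-corrected window scores described above amount to a difference of two means on the 5 Hz signal. The sketch below is one plausible reading; the function name, the half-open window convention (start included, end excluded), and the use of a sample index for stimulus onset are assumptions.

```python
from statistics import mean

def baseline_corrected(signal, onset, window, fs=5, baseline_s=1.0):
    """Mean of `signal` in `window` (seconds after onset) minus the pre-onset mean.

    signal  -- list of samples at `fs` Hz (5 Hz after resampling)
    onset   -- sample index of stimulus onset
    window  -- (start_s, stop_s) relative to onset, e.g. (4, 5) for SCR
    """
    start = onset + int(round(window[0] * fs))
    stop = onset + int(round(window[1] * fs))
    baseline = signal[onset - int(round(baseline_s * fs)):onset]
    return mean(signal[start:stop]) - mean(baseline)

# Toy trace: flat at 1.0, stepping to 3.0 four seconds after onset (sample 10).
toy = [1.0] * 30 + [3.0] * 10
scr_score = baseline_corrected(toy, onset=10, window=(4, 5))  # SCR window
hr_score = baseline_corrected(toy, onset=10, window=(2, 3))   # HR window
```

On the toy trace, the SCR window captures the step (score 2.0), whereas the earlier HR window still lies on the flat segment (score 0.0).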
Imaging data were acquired on a 1.5-T magnetic resonance scanner (Sonata, Siemens Medical Solutions, Erlangen, Germany) equipped with a circularly polarized head coil. Anatomical T1-weighted volumes were obtained with a 3-dimensional magnetization-prepared rapid acquisition gradient-echo sequence at a voxel size of 1 mm³ (repetition time [TR], 9.7 ms; echo time [TE], 4 ms). The fMRI data were acquired using blood oxygen level–dependent (BOLD) signal-sensitive T2*-weighted gradient-recalled echo-planar imaging (TE, 54 ms; TR, 2675 ms; interslice time, 107 ms). A series of 355 functional whole-brain volumes consisting of 25 contiguous oblique slices 5 mm thick (field of view, 180 × 180 mm²; matrix, 64 × 64 pixels) were acquired. The first 9 volumes were discarded to obtain a steady state regarding longitudinal magnetization and scanner-induced auditory excitation. Fifteen stimuli of 3 categories (rising, falling, and constant) were presented in an event-related design with an average stimulus onset asynchrony of 18.4 s. Acquisition of fMRI data produced a banging background noise peaking at 100 dB; however, noise reduction by headphones (Commander XG, Resonance Technology, Northridge, CA) of approximately 30 dB and the spectral difference between scanner noise and presented sounds allowed clear perception of stimuli.
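As a quick consistency check on the acquisition timing, the numbers above can be combined in integer milliseconds. This sketch assumes that "fifteen stimuli of 3 categories" means 15 stimuli per category (45 events in total); the variable and function names are illustrative.

```python
# All durations in milliseconds, taken from the acquisition parameters.
N_SLICES, INTERSLICE_MS, TR_MS = 25, 107, 2675
N_VOLUMES, N_DISCARDED = 355, 9
N_EVENTS, MEAN_SOA_MS = 45, 18400  # assumed 15 stimuli x 3 categories, 18.4 s SOA

def timing_summary():
    """Return run length, usable length, and stimulation length in seconds."""
    usable_ms = (N_VOLUMES - N_DISCARDED) * TR_MS
    stimulation_ms = N_EVENTS * MEAN_SOA_MS
    return {
        "slices_fill_tr": N_SLICES * INTERSLICE_MS == TR_MS,
        "run_s": N_VOLUMES * TR_MS / 1000,
        "usable_s": usable_ms / 1000,
        "stimulation_s": stimulation_ms / 1000,
        "fits": stimulation_ms <= usable_ms,
    }

summary = timing_summary()
```

Under this reading, 25 slices at 107 ms fill the 2675 ms TR exactly, and the 828 s of stimulation fit within the roughly 926 s of retained volumes.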
Images were postprocessed using Brain Voyager QX 1.6 (Brain Innovation, Maastricht, the Netherlands) and Matlab 6.5 (MathWorks, Natick, MA) software. The fMRI time series were corrected for slice-acquisition time through sinc interpolation, motion-corrected using a Levenberg–Marquardt least squares fit for 6 spatial parameters, corrected voxelwise for linear and nonlinear drifts with a high-pass temporal filter of 0.01 Hz, realigned with their corresponding T1 volumes, warped into Talairach space, resampled into 3 mm isotropic voxels, and smoothed using a 6-mm full-width at half-maximum isotropic Gaussian kernel. To detect magnitude and latency differences in BOLD response across brain regions, stimulus-specific responses to each event type (rising, falling, and constant intensity) were modeled using a canonical hemodynamic response function (double gamma) together with its first- and second-order temporal derivatives (Henson et al. 2000, 2002). The resulting 9 functions, together with a constant term, were used as predictors in a random effects general linear model analysis (Worsley and Friston 1995). The parameter estimates were used to generate statistical parametric maps (t-statistic) for main and differential effects of the stimuli. All t-maps were thresholded at a significance level that protected against false-positive effects at 5% (corrected for multiple comparisons). For the main effects of all responses, a whole-slab Bonferroni-corrected threshold of P < 0.05 was accepted. For differential effects, we used a combined voxel- and cluster-level correction approach based on the 3-dimensional extension of the randomization procedure described previously (Forman et al. 1995; Etkin et al. 2004).
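The predictor set described in this paragraph (3 event types, each modeled by a double-gamma response plus its first and second temporal derivatives, and a constant term) can be sketched as a 10-column design matrix. The gamma shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used defaults and are an assumption here, since the text does not report the values used; the finite-difference derivatives, the 12-sample kernel length, and the placement of onsets on whole volume indices are likewise simplifications.

```python
import math

TR = 2.675  # seconds, from the acquisition parameters

def double_gamma(t, a1=6.0, a2=16.0, ratio=1.0 / 6.0):
    """Double-gamma response at time t (s); shape parameters are assumed defaults."""
    if t <= 0:
        return 0.0
    peak = t ** (a1 - 1) * math.exp(-t) / math.gamma(a1)
    undershoot = t ** (a2 - 1) * math.exp(-t) / math.gamma(a2)
    return peak - ratio * undershoot

def basis_set(n_samples=12, dt=TR, eps=0.1):
    """Sample the response and its 1st/2nd temporal derivatives at TR spacing."""
    hrf, d1, d2 = [], [], []
    for k in range(n_samples):
        t = k * dt
        h0, hp, hm = double_gamma(t), double_gamma(t + eps), double_gamma(t - eps)
        hrf.append(h0)
        d1.append((hp - hm) / (2 * eps))          # central first difference
        d2.append((hp - 2 * h0 + hm) / eps ** 2)  # central second difference
    return [hrf, d1, d2]

def convolve(onsets, kernel, n_vols):
    """Add one copy of the kernel at each onset (onsets are volume indices)."""
    out = [0.0] * n_vols
    for onset in onsets:
        for k, weight in enumerate(kernel):
            if onset + k < n_vols:
                out[onset + k] += weight
    return out

def design_matrix(onsets_by_type, n_vols):
    """Return a list of columns: 3 per event type, then a constant term."""
    columns = []
    for onsets in onsets_by_type.values():
        for kernel in basis_set():
            columns.append(convolve(onsets, kernel, n_vols))
    columns.append([1.0] * n_vols)
    return columns

# Hypothetical onsets, one event per category, for illustration only.
design = design_matrix({"rising": [5], "falling": [20], "constant": [35]}, n_vols=60)
```

Including the derivative columns lets the model absorb modest shifts in response latency and width, which is what allows latency differences between regions to be examined alongside response magnitude.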
For this cluster-level correction, a voxel-level threshold was first set at t = 3.61 (P < 0.001 uncorrected); the thresholded maps were then submitted to a whole-slab correction criterion based on an estimate of the maps' spatial smoothness and on an iterative procedure (Monte Carlo simulation) for estimating cluster-level false-positive rates. After 1000 iterations, the minimum cluster size threshold that yielded a cluster-level false-positive rate (alpha) of 5% was applied to the maps. The generated statistical parametric maps were finally superimposed on anatomical sections of the standardized Montreal Neurological Institute T1-weighted brain template (www.bic.mni.mcgill.ca).
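The logic of the Monte Carlo cluster-size correction can be illustrated with a deliberately reduced toy version. This is not the procedure implemented in the analysis software: it uses a small 2-dimensional grid instead of the 3-dimensional slab, a 3 × 3 box blur instead of the estimated map smoothness, 4-neighbor connectivity, and a lenient voxel threshold of z = 2.0 so that the small grid produces clusters at all. All function names and parameter values are assumptions chosen for illustration.

```python
import random

def smooth(grid):
    """3x3 box blur with wrap-around edges; dividing by 3 keeps unit variance."""
    n = len(grid)
    return [[sum(grid[(i + di) % n][(j + dj) % n]
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 3.0
             for j in range(n)] for i in range(n)]

def max_cluster(mask):
    """Size of the largest 4-connected cluster of True cells in a square mask."""
    n = len(mask)
    seen = [[False] * n for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(n):
            if mask[i][j] and not seen[i][j]:
                stack, size = [(i, j)], 0
                seen[i][j] = True
                while stack:
                    a, b = stack.pop()
                    size += 1
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        x, y = a + da, b + db
                        if 0 <= x < n and 0 <= y < n and mask[x][y] and not seen[x][y]:
                            seen[x][y] = True
                            stack.append((x, y))
                best = max(best, size)
    return best

def cluster_threshold(n=20, z=2.0, iterations=1000, alpha=0.05, seed=0):
    """Smallest cluster size k such that at most `alpha` of null maps contain
    a suprathreshold cluster of size >= k."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(iterations):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        field = smooth(noise)
        maxima.append(max_cluster([[v > z for v in row] for row in field]))
    k = 1
    while sum(m >= k for m in maxima) / iterations > alpha:
        k += 1
    return k
```

Calling `cluster_threshold()` returns the minimum cluster extent for the toy null distribution; in the real analysis the same idea is applied to simulated 3-dimensional maps whose smoothness matches that of the data.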
In experiment 1, we investigated the orienting response and found an enhanced SCR (F1,20 = 19.5; P < 0.001) and an enhanced early deceleration of HR (F1,19 = 4.3; P < 0.05) in response to rising compared with falling intensity sounds (Fig. 1). This pattern of autonomic responses represents the prototypical changes during an enhanced orienting reflex, which constitutes an active central nervous process interrupting ongoing behavior and is dependent on expectation as well as physical properties of the stimulus (Sokolov 1963).
When cues followed rising compared with falling intensity sounds, reaction times were accelerated to acoustic cues and delayed to visual cues (significant a priori contrasts, see Fig. 2; F1,20 [interaction] = 11.8; P < 0.01). There was no significant main effect of rising sound intensity or of modality.
In experiment 2, rising compared with falling sound intensity activated the right amygdala (Figs. 3, 4). The BOLD time course during rising sound intensity peaked about 8 s after stimulus onset. Descriptively, this contrast was due to an increase of the BOLD signal during rising sound intensity as well as a decrease during falling sound intensity. In addition, areas in the left temporal plane and the posterior part of the left superior temporal sulcus (STS) extending to the temporoparietal junction were activated (Figs. 3, 4), as well as an area in the right intraparietal sulcus (IPS) (Figs. 3, 4), where BOLD peaks were observed earlier than in the amygdala. There was no response in Heschl's gyrus (primary auditory cortex) or in any other area.
In order to explore laterality effects in the amygdala, a post hoc analysis was conducted by forming a region of interest in the left amygdala that corresponded to the activated region in the right amygdala. Although activity in the left amygdala could be identified with a rather lenient threshold of P < 0.05 (uncorrected), activity was significantly greater in the right than in the left amygdala (t498 = 55; P < 0.0001).
By using an event-related paradigm, we tested psychophysiological, behavioral, and neuronal responses to rising as compared with falling sound intensity. We found that the physically identical sounds with rising compared with falling sound intensity facilitated autonomic orienting response and accelerated reaction times to subsequent acoustic stimuli, while reaction times to stimuli in the visual modality were slowed down. Time locked to these psychophysiological reactions, a neural network comprising the right amygdala, right IPS, posterior part of the left STS, and left temporal plane was activated. Our data demonstrate that intensity change in a simple auditory stimulus is sufficient to activate the amygdala, trigger autonomic reaction indicative of early preattentive stimulus registration, and reallocate processing resources by selectively increasing phasic alertness for auditory stimuli. It seems therefore reasonable to assume that rising sound intensity is an elementary auditory warning cue. These findings specify the neural underpinnings of the perceptual illusion of overestimated rising sound intensity (Neuhoff 1998) and of the attentional preference of rhesus monkeys for rising sound intensity (Ghazanfar et al. 2002).
Facilitation of the autonomic orienting reflex occurs when significant stimuli are detected (Barry 1987; Siddle 1991). Previous experiments have shown increased phasic alertness after presentation of an experimentally learned auditory warning cue (Roberts et al. 2006). Rising sound intensity elicited similar responses without being learned by conditioning or instruction, thus acting as an intrinsic warning cue.
Specifically, we could show HR deceleration, which is considered to mirror the activity of early preattentive processes of stimulus registration, as well as an increased SCR, which is considered to reflect the mobilization of energetic resources (Barry 1987). Reallocation of resources was shown by increased phasic alertness to auditory targets. This effect was modality specific and did not occur in response to visual targets. A possible reason is the hypothesized specialization of the visual system for identification, while in this model, the acoustic system subserves efficient orientation and direction of the visual system (Guski 1992). In this framework, quick reaction to acoustic targets would be compatible with the acoustic system's functions, whereas quick reaction to visual targets would contradict the visual system's propensity to identify the source of the previous sound. Thus, modality specificity might not be present in real-life situations, where auditory and visual cues typically coincide (Amedi et al. 2005) and multisensory integration can take place (Maier et al. 2004).
Rising Sound Intensity and Amygdala Responses
The amygdala has been shown to respond to a number of related stimulus properties, among them novelty (Zald 2003), arousal (Lewis et al. 2006), relevance (Sander et al. 2003), and ambiguity (Whalen 1998; Rosen and Donley 2006). In light of our behavioral data, which showed that rising sound intensity acts as a salient warning cue, we propose that amygdala activation here is mainly related to the salience of rising sound intensity that might indicate relevant events in the environment. Other explanations have to be taken into account. Although rising sound intensity may be one feature of an approaching object (Hall and Moore 2003), several other acoustic phenomena are known to signal sound source motion. Therefore, the artificial sound stimuli used in this investigation are more ambiguous than natural recordings of approaching and receding sound objects. Amygdala activation could therefore also be linked to ambiguity (Rosen and Donley 2006). Further studies with varying spatial cues in rising sound intensity will help elucidate whether it is salience per se or ambiguity of salience information that correlates with amygdala activity.
Amygdala activity in response to rising sound intensity is also in line with an increased orienting response. Although the amygdala as an anatomical structure is not necessary for eliciting orienting responses (Zald 2003), it exerts influence over other brain structures that trigger both sympathetic and parasympathetic autonomic responses (Davis and Whalen 2001). Thus, a modulating influence of the amygdala on the orienting response is plausible. As an example, orienting responses to fearful faces predict amygdala activity (Williams et al. 2001). In the present study, behavioral and neuroimaging data could not be correlated directly. Within these methodological limitations, however, it seems reasonable to assume a modulatory function of the amygdala on the orienting response to rising sound intensity.
Rising Sound Intensity and the Auditory System
In a quite different methodological approach and on a different timescale, it has been shown that neurons in the primary auditory cortex of awake monkeys respond selectively to ultrashort sounds (duration of a couple of milliseconds) with rising (ramped) and falling (damped) sound intensity (Lu et al. 2001). We did not observe a selective reaction to rising and falling sound intensity in the primary auditory cortex. The drastically different nature of the sounds in the study of Lu et al. and our experiment has however to be taken into account. Our results are consistent with the observation that the primary auditory cortex is not activated in acoustic pattern perception that rather takes place in the temporal plane (Griffiths and Warren 2002).
In contrast to primary auditory areas, we observed left temporal plane activity in response to rising versus falling sound intensity. Temporal plane activity has been shown in auditory spatial analysis in general (Pavani et al. 2002; Warren et al. 2002), and specifically when motion-signaling properties of a sound have to be segregated from intrinsic sound properties (Griffiths and Warren 2002), which is likely to be the case in simple motion cues such as rising and falling sound intensity. It is therefore plausible to assume that the temporal plane serves analysis of rising and falling intensity, which are physically identical in all but one parameter. To explain the differential activity observed in this study, one may invoke the greater significance of rising sound intensity. Although the attentional influence on temporal plane processing has been discussed somewhat equivocally (Griffiths and Warren 2002), this study adds evidence to the assumption that temporal plane processing is influenced by stimulus significance and attentional processes.
Rising and falling sound intensity stimuli in this investigation are characterized by a sudden offset that is more salient in rising than in falling sound intensity. Sound offset—or acoustic edge detection in general—has been related to activity strictly lateralized to right temporal areas (Herdener et al. 2007). These right-hemisphere areas were not found activated in our study. Therefore, it seems likely that rising and falling sound intensity are detected as complex signals rather than simply by different offsets.
Rising Sound Intensity and the Superior Temporal Sulcus/Temporoparietal Junction
The left STS is not commonly involved in salience detection or auditory motion perception (Warren et al. 2002). Its function in cross-modal analysis is well known (Beauchamp 2005), especially for complex or socially relevant stimuli (Barraclough et al. 2005). The adjacent temporoparietal junction has also been described as a multimodal integrator of change detection (Downar et al. 2000). In the monkey brain, the superior temporal polysensory area (STPa) is located in the STS. A majority of monkey STPa neurons respond to visual motion, and more specifically, to looming signals (Anderson and Siegel 1999). Activity in the STS can therefore be interpreted as an attempt to enable cross-modal integration, which is more pronounced after salient rising sound intensity. This is in line with the cited framework of an auditory system that quickly detects salient objects and directs the visual system for further identification (Guski 1992). Another tentative speculation arises from the fact that rising sound intensity is one of the basic components of vocalization (Cowie et al. 2001). Hence, activation of left-hemispheric language-related areas would be consistent with a concept in which rising sound intensity represents a more generalizable meaning beyond motion perception and is evaluated by language-related networks although it does not contain full language or prosody information. Varying the motion-signaling properties of the stimuli seems an interesting approach to clarify this issue.
Rising Sound Intensity and IPS
We observed an activated region in the right IPS. This area corresponds to the monkey ventral intraparietal area (VIP) in that both brain regions process moving stimuli from different modalities and that they especially respond to simultaneous input from 2 or 3 modalities (Bremmer et al. 2001). In addition to their capacity in detecting movement, there are neurons in the monkey VIP that specifically respond to visual looming, that is, objects seemingly on a collision course toward the eye or the body surface (Graziano and Cooke 2006). These neurons have also been described to be bimodal or trimodal with additional tactile and auditory receptive fields. Activity in this area therefore supports the concept of multisensory integration of looming signals that has previously been shown on a behavioral level both in monkeys and humans (Maier et al. 2004; Gordon and Rosenblum 2005).
Rising Sound Intensity and the Right-Hemisphere Alertness Network
In response to rising sound intensity, here we could show an increase in phasic alertness as indicated by accelerated reaction to subsequent auditory targets. As a substrate for phasic alertness elicited by cues that are learned in the experimental setting, a right-hemisphere network has been proposed, comprising dorsolateral and ventrolateral frontal cortices, anterior cingulate cortex, inferior temporal gyrus, and thalamus (Sturm and Willmes 2001). None of these areas were active in response to rising sound intensity. As a reason, it might be speculated that attentional shifts induced by experimentally learned, distinct cues in low-demanding cognitive-motor tasks are operated by different brain networks than alertness induced by complex, intrinsically motivated, and environmentally salient cues. Furthermore, diversity of phasic alertness networks in the visual domain has been described (Sturm and Willmes 2001; Thiel et al. 2004); therefore it seems likely that also in auditory tasks, recruitment of such networks is dependent on stimulus material and paradigm.
In the search for elementary components of sounds that constitute intrinsic warning cues, rising sound intensity has been proposed. Here, we show that rising compared with falling sound intensity leads to facilitated autonomic orienting, modality-specific acceleration of reaction time, and activity of the right amygdala and left temporal areas. This provides direct evidence for the warning properties of rising sound intensity. STS and IPS activity might be indicative of cross-modal integration processes. Activation of the right-hemisphere phasic alertness network could not be shown in this study.
We thank C. Canela and S. Zwiller for help in data acquisition. This work was supported by the Swiss National Science Foundation PP00B-103012/1 and by the American National Institute for Occupational Safety and Health. Conflict of Interest: None declared.