Expectations of the timing and intensity of a stimulus propagate to the auditory periphery through the medial olivocochlear reflex

Abstract Expectations concerning the timing of a stimulus enhance attention at the time at which the event occurs, which confers significant sensory and behavioral benefits. Herein, we show that temporal expectations modulate even the sensory transduction in the auditory periphery via the descending pathway. We measured the medial olivocochlear reflex (MOCR), a sound-activated efferent feedback that controls outer hair cell motility and optimizes the dynamic range of the sensory system. MOCR was noninvasively assessed using otoacoustic emissions. We found that the MOCR was enhanced by a visual cue presented at a fixed interval before a sound but was unaffected if the interval was changing between trials. The MOCR was also observed to be stronger when the learned timing expectation matched the timing of the sound but remained unvaried when these two factors did not match. This implies that the MOCR can be voluntarily controlled in a stimulus- and goal-directed manner. Moreover, we found that the MOCR was enhanced by the expectation of a strong, but not a weak, sound intensity. This asymmetrical enhancement could facilitate antimasking and noise protective effects without disrupting the detection of faint signals. Therefore, the descending pathway conveys temporal and intensity expectations to modulate auditory processing.


Introduction
Humans constantly generate predictions about future events and adaptively optimize neural processing in order to cope with a large amount of information with limited neural resources (Rao and Ballard 1999;Friston 2005;Palmer et al. 2015;Singer et al. 2018). Such predictions not only pertain to what will happen in the sensory environment but also concern the time at which the upcoming event will occur (reviewed in Arnal and Giraud 2012;Nobre and van Ede 2018;Rimmele et al. 2018). Temporal expectations guide the dynamical peak of attention to a particular time point and allow for the allocation of processing resources solely to relevant sensory events, thus enabling faster responses and improved behavioral performance (Niemi and Lehtonen 1982;Barnes and Jones 2000;Correa et al. 2005;Rohenkohl et al. 2012;Cravo et al. 2013).
Recent human studies have started to uncover the brain regions underlying temporal expectation. Functional brain imaging has pointed toward the involvement of multiple neocortical regions responsible for cognitive processing, namely the prefrontal cortex (Coull et al. 2000;Vallesi et al. 2009), left-lateralized parietal cortex (Coull and Nobre 1998;Coull et al. 2008), and ventral premotor cortex (Schubotz 2007;Coull et al. 2008). In addition, the integration of supplementary motor areas and the superior temporal gyrus has been suggested (Cui et al. 2009). Consistently, the increased response to stimuli occurring at expected time points is observed in the P300 wave and the lateralized readiness potential, both of which are linked to response preparation and execution after the appearance of target events (Miniussi et al. 1999;Griffin et al. 2002;Los and Heslenfeld 2005;Correa et al. 2006;Hackley et al. 2007). Sensory processing was also reported to be modulated by temporal expectation in perceptual potentials originating in the auditory and visual cortex (Ghose and Maunsell 2002;Jaramillo and Zador 2011), as well as in auditory subcortical areas (Gorina-Careta et al. 2016). With respect to resource allocation, it is reasonable that such expectation-based optimization begins at early processing stages. In visual pathways, for example, even oculomotor behavior, that is, saccades, ocular drift, and blinks, is reported to be modulated by temporal expectation (Betta and Turatto 2006;Dankner et al. 2017;Amit et al. 2019). Furthermore, the pupil light reflex (PLR), which changes the pupil size and adjusts the amount of light entering the eye to balance sensitivity and visual acuity (Binda and Gamlin 2017), is modulated by the expectation of luminance coming from the point in space toward which gaze is moving (Mathôt et al. 2015). 
This would allow the periphery to prepare for intense luminance and achieve an optimal sensitivity for upcoming stimuli under a limited dynamic range.
As for the auditory pathway, however, the location where the modulation according to temporal expectations first occurs, as well as its role in this modulatory process, is unknown. As in the visual system, the information concerning temporal expectations may modulate the auditory periphery through the top-down control of peripheral receptor activity via the descending pathway. This could be accomplished through two major efferent feedback pathways (Liberman and Guinan 1998), the middle ear muscle reflex (MEMR) and the medial olivocochlear reflex (MOCR), both of which can decrease responses at the auditory periphery. The MEMR acts by stiffening the ossicular chain (Mukerji et al. 2010), whereas the MOCR induces an inhibitory effect on the motility of outer hair cells (OHCs; Guinan 2006). The MOCR and MEMR can be considered the auditory counterparts of the PLR, as many similarities exist between these reflexes, including their brainstem origin, their role in optimizing the sensory dynamic range, and their slow response speed (operating over hundreds of milliseconds) (Liberman and Guinan 1998;Backus and Guinan 2006;Guinan 2006;Binda and Gamlin 2017). Similar to the PLR, which receives descending projections from the cortical area (Ebitz and Moore 2017), there is some evidence of corticofugal projections to MOC neurons via subcortical nuclei and of top-down control of the MOCR (Dragicevic et al. 2015). Specifically, modulation of the MOCR is achieved by orienting attention to a specific laterality (Froehlich et al. 1993), auditory target (Smith et al. 2012), frequency (Maison et al. 2001), working memory task (Marcenaro et al. 2021), or visual task (Delano et al. 2007;Wittekindt et al. 2014). Dragicevic et al. (2019) further provided evidence of the interaction between otoacoustic emissions (OAEs) and low-frequency cortical oscillations during selective attention, which supports the possibility that cognitive processing at cortical levels can modulate the MOCR via the corticofugal pathways. In addition, the stapedius muscle can be activated even without acoustic stimulation during (and in anticipation of) vocalization to reduce self-stimulation (Borg and Zakrisson 1975), and some individuals can voluntarily engage the MEMR. Although this voluntary control of the MEMR is presumed to arise from descending projections from the cerebral cortex to stapedius motoneurons, no direct evidence of this has been provided (Mukerji et al. 2010). Given the abundant evidence for corticofugal projections to MOC neurons, we hypothesized that anticipatory top-down MOCR control is plausible.
Furthermore, analogous to the PLR-mediated changes in eye movements as a result of changes in luminance, expectations about the intensity of upcoming sounds may also be important for the MOCR and may be combined with temporal expectation for optimal responses. The MOCR-induced inhibition of the OHC amplification improves signal detection in noise by preventing auditory nerve adaptation to the noise (Kawase and Liberman 1993;Kumar and Vanaja 2004;Otsuka et al. 2020) and protects cochlear sensory cells from acoustic overexposure (Maison and Liberman 2000;Maison et al. 2013;Wolpert et al. 2014;Otsuka et al. 2016). Stronger MOCR suppression facilitates antimasking and noise protective effects, but excessive suppression disrupts the detection of faint signals. Therefore, adaptive MOCR control based on upcoming sound intensity would be a reasonable solution to balance this trade-off.
In this study, we performed three experiments to evaluate the effect of expectation with respect to the timing (experiments 1 and 2) and intensity (experiment 3) of an upcoming stimulus on the MOCR. In experiment 1, we assessed the effect of stimulus-driven and exogenous, possibly inflexible and automatic, expectation by applying a preceding visual cue presented at a fixed interval before the MOCR elicitor, such that the physical temporal association between the cue and the MOCR elicitor signaled the timing of the upcoming stimulus. In experiment 2, we explored whether the MOCR can be voluntarily controlled in a flexible and dynamic, that is, goal-directed and endogenous, manner by applying a symbolic cue whose appearance indicated the timing of the upcoming MOCR elicitor. In experiment 3, by applying the cued paradigm used in experiment 2, participants were informed about the intensity of the upcoming MOCR elicitor using a visual cue.

Participants
All participants provided informed consent, and the experiments were approved by the Research Ethics Committee of Chiba University (Chiba, Japan).
A total of 24 volunteers (3 males and 21 females) aged 21-32 years participated in experiment 1, and 12 of them (1 male and 11 females) were subjected to timing-unpredictable conditions. A total of 11 volunteers (2 males and 9 females) aged 21-24 years participated in experiment 2, and 12 (3 males and 9 females) participated in experiment 3. Some volunteers were tested in more than one experiment, and in these cases, the order of the experiments was randomized for each participant to minimize the possible confounding effects of learning.

Equipment
Stimuli were digitally synthesized at a sampling rate of 48 kHz and converted to analog signals using a Roland OCTA-CAPTURE audio interface (16 bits; Roland). Analog signals were amplified by a headphone buffer and presented through Etymotic Research ER-2 earphones (Etymotic) connected to an Etymotic Research ER-10B low-noise microphone system (Etymotic). Ear-canal sound pressure was recorded using the ER-10B microphone system inserted into each ear. Prior to the measurements, the outputs from the ER-2 were calibrated using a DB2012 accessory (External Ear Simulator) for the Ear Simulator Type 4257 system (Brüel and Kjaer, Naerum, Denmark). Visual stimuli were displayed on a 10-inch LCD monitor (900 × 600). The viewing distance was ∼120 cm, and the display height was ∼90 cm. Participants were explicitly instructed to sit up straight, to not move away from or toward the display, and to maintain their gaze on the fixation point at the center of the display.

Assessment of MOCR Function
MOCR function was noninvasively evaluated through contralateral suppression of OAEs, which are sounds that originate in the cochlea and reflect OHC motility (Kemp 1978). Contralateral suppression of OAEs refers to a reduction in OAE amplitude induced by contralateral acoustic stimulation. This effect is attributed to alterations in OHC motility mediated by the MOCR, which is induced by contralateral acoustic stimulation (Collet et al. 1990).
For measuring OAEs, click trains were presented to the right ear. The clicks had a duration of 100 μs and were presented at a 60-dB peak-equivalent sound pressure level (SPL) and at a rate of 50 times/s. For eliciting the MOCR, the noise was presented to the left ear and band-pass filtered between 100 and 10 000 Hz with a duration of 500 ms, including a 10-ms raised-cosine ramp. It is known that contralateral acoustic stimulation also induces a MEMR. However, the MEMR is generally induced by high-level sounds (>75 dB SPL). In our experiments, the MOCR elicitor was presented at 60-dB SPL. Hence, the OAE suppression observed in our experiment would be dominated by the MOCR.
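The stimulus parameters above can be turned into sample counts with a short calculation. The following is a minimal sketch, not the authors' stimulus code: the function names are hypothetical, the 100-μs click is rounded to a whole number of samples because it does not fall on the 48-kHz sample grid, and the 10-ms raised-cosine ramp is assumed to be applied symmetrically at onset and offset.

```python
import math

FS = 48_000            # sampling rate in Hz (from the Equipment section)
CLICK_RATE = 50        # clicks per second
CLICK_DUR_S = 100e-6   # click duration in seconds
NOISE_DUR_S = 0.5      # MOCR elicitor duration in seconds
RAMP_DUR_S = 0.010     # raised-cosine ramp duration in seconds


def click_period_samples(fs=FS, rate=CLICK_RATE):
    """Number of samples between successive click onsets."""
    return fs // rate


def click_duration_samples(fs=FS, dur=CLICK_DUR_S):
    """Click length in whole samples (100 us is 4.8 samples at 48 kHz, so it is rounded)."""
    return max(1, round(fs * dur))


def raised_cosine_envelope(fs=FS, dur=NOISE_DUR_S, ramp=RAMP_DUR_S):
    """Amplitude envelope with raised-cosine onset and offset ramps (symmetry is assumed)."""
    n = round(fs * dur)
    n_ramp = round(fs * ramp)
    env = [1.0] * n
    for i in range(n_ramp):
        gain = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        env[i] = gain           # onset ramp rises from 0 toward 1
        env[n - 1 - i] = gain   # offset ramp mirrors the onset ramp
    return env
```

Multiplying a band-limited noise buffer sample by sample with this envelope would give the gated elicitor; the band-pass filtering itself is not sketched here.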

Experiment 1: Temporal Expectation Induced by Visual Cue Presentation
Contralateral noise-induced MOCR was compared with and without a visual cue presented immediately before the MOCR elicitor during a timing-predictable and a timing-unpredictable condition. The visual cue consisted of a 10-cm square cross presented on a display. Participants pressed a button once when they heard the sound without a visual cue and twice when they heard it with the cue. Participants were instructed to press the button slightly after the noise ended to avoid data contamination with artifacts associated with button presses.
In the timing-predictable condition, the interstimulus interval (ISI) between the visual cue and the MOCR elicitor was fixed across trials (250 ms), such that participants could predict the timing of the MOCR elicitor (Fig. 2A). In the timing-unpredictable condition, the ISI changed across trials (randomly chosen between 250, 750, 1250, and 1750 ms), such that participants could not predict the exact timing of the MOCR elicitor (Fig. 2B). In both conditions, the MOCR elicitor was randomly presented 30 times for without-cue trials and 120 times for with-cue trials. The order of the trials within the two conditions was randomized across participants. The between-trial interval was randomized and ranged between 2 and 7 s. To avoid foreperiod effects, the comparison of OAE suppression was performed for data generated for the same time interval, that is, 250 ms.
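The trial structure of experiment 1 can be illustrated with a small schedule generator. This is a sketch under stated assumptions rather than the procedure actually used: the function name and dictionary keys are hypothetical, the four ISIs are assumed to be equally likely in the timing-unpredictable condition, and the between-trial interval is assumed to be drawn uniformly from 2 to 7 s.

```python
import random

ISI_FIXED_MS = 250
ISI_POOL_MS = (250, 750, 1250, 1750)


def build_schedule(predictable, n_cued=120, n_uncued=30, seed=None):
    """Return a shuffled list of trials for one condition of experiment 1."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_cued):
        # Fixed ISI when timing is predictable, otherwise one of four ISIs
        # (equal probability is an assumption).
        isi = ISI_FIXED_MS if predictable else rng.choice(ISI_POOL_MS)
        trials.append({"cue": True, "isi_ms": isi})
    for _ in range(n_uncued):
        trials.append({"cue": False, "isi_ms": None})
    rng.shuffle(trials)
    for trial in trials:
        # Between-trial interval of 2 to 7 s (uniform draw is an assumption).
        trial["gap_s"] = rng.uniform(2.0, 7.0)
    return trials
```

With the default counts, one in five trials carries no cue, which matches the 30 without-cue and 120 with-cue presentations described above.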

Experiment 2: Temporal Expectation Induced by Visual Cue Size
We explored whether the MOCR can be voluntarily controlled, so that attention can be flexibly and dynamically shifted based on stimulus-driven temporal expectations.
The timing of the MOCR elicitor presentation was indicated by the size of the visual cue, whereby the appearance of a big (10 cm) or small (5 cm) cross primed the subjects to expect a long (1250 ms) or short (250 ms) ISI, respectively. Participants pressed the button once when hearing the noise without a visual cue and twice when hearing the noise with a visual cue. Participants were instructed to press the button slightly after the noise ended to avoid data contamination with artifacts associated with button presses. Unexpectedly late and unexpectedly early conditions were tested. In the former, one measurement block comprised 30 expectedly late trials (i.e., the cue accurately predicted the 1250-ms interval before the MOCR elicitor), 30 unexpectedly late trials (i.e., the cue predicted a short interval, but the MOCR elicitor appeared after 1250 ms), and 120 expectedly early trials (i.e., the cue accurately predicted the 250-ms interval before the MOCR elicitor). In the unexpectedly early condition, one measurement block comprised 30 expectedly early and 30 unexpectedly early trials (i.e., the cue predicted late intervals, but the MOCR elicitor appeared after 250 ms) and 120 expectedly late trials. The order of trials within the two conditions was randomized across participants. The between-trial onset interval was 2 s.
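The block composition above fixes how often a cue is violated. The following sketch works through that arithmetic; the function and key names are hypothetical, and the block duration is only an approximation that assumes every trial occupies exactly one 2-s onset interval.

```python
from fractions import Fraction


def block_summary(n_frequent_valid=120, n_rare_valid=30, n_invalid=30,
                  onset_interval_s=2.0):
    """Summarize one measurement block of 120 + 30 + 30 trials."""
    total = n_frequent_valid + n_rare_valid + n_invalid
    return {
        "total_trials": total,
        # Share of trials in which the elicitor violates the cue.
        "invalid_fraction": Fraction(n_invalid, total),
        # The invalid trials use the same cue as the 120 frequent valid trials,
        # so that cue is correct on 120 of its 150 presentations.
        "frequent_cue_validity": Fraction(
            n_frequent_valid, n_frequent_valid + n_invalid),
        # Rough duration, assuming one trial per onset interval.
        "approx_block_duration_s": total * onset_interval_s,
    }
```

Under these counts the unexpected timing occurs on one trial in six, and the cue that is occasionally violated remains valid on four of every five of its presentations.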

Experiment 3: Intensity Expectation Induced by Visual Cue Size
Utilizing a paradigm similar to that employed in experiment 2, we examined whether intensity expectations can modulate the MOCR. The intensity of the MOCR elicitor was indicated by the size of the visual cue, whereby the appearance of a big (10 cm) or small (5 cm) cross primed the subjects to expect a high- or low-intensity sound, respectively. Unexpectedly stronger and unexpectedly weaker conditions were tested. In the former, one measurement block comprised 30 expectedly stronger trials (i.e., the cue accurately predicted a strong sound presented at 60-dB SPL), 30 unexpectedly stronger trials (i.e., the cue predicted a weak sound, which was instead presented at 60-dB SPL), and 120 expectedly weaker trials (i.e., the cue accurately predicted a weak sound presented at 40-dB SPL). In the unexpectedly weaker condition, one measurement block comprised 30 expectedly weaker trials (i.e., the cue accurately predicted a weak sound presented at 60-dB SPL), 30 unexpectedly weaker trials (i.e., the cue predicted a strong sound, which was instead presented at 60-dB SPL), and 120 expectedly stronger trials (i.e., the cue accurately predicted a strong sound presented at 80-dB SPL). The order of trials within the two conditions was randomized across participants. The between-trial onset interval was 2 s.

Data Analysis
The recorded signals were band-pass filtered between 1 and 4 kHz to observe the MOCR-related suppression of click-evoked OAEs. Signals were divided into epochs with duration of 2.5 s, starting and ending 0.5 s before and 1 s after the onset of the MOCR elicitor, respectively. The number of epochs was equalized across the trial types. For each of the trials in each condition, 30 out of 120 epochs were randomly selected. For the with-cue trials in the timing-unpredictable condition, 30 epochs with a 250-ms ISI were selected. To maintain an acceptable signal-to-noise ratio, a lower limit of 25 artifact-free epochs per trial and condition was selected. The selected 25 epochs were averaged across trials for each condition, and a time series composed of 125 OAE waveform samples was obtained from the averaged epoch.
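The epoch counts in this paragraph can be made concrete with a short sketch. It reflects one possible reading of the procedure, not the authors' code: the function names and the `is_artifact_free` callback are hypothetical, 30 epochs are drawn at random, artifact-contaminated ones are discarded, and the first 25 clean epochs are kept so that every trial type contributes the same number.

```python
import random

CLICK_RATE_HZ = 50   # clicks per second (from the OAE measurement description)
EPOCH_DUR_S = 2.5    # epoch duration in seconds


def waveforms_per_epoch(dur_s=EPOCH_DUR_S, rate_hz=CLICK_RATE_HZ):
    """One OAE waveform per click: 2.5 s x 50 clicks/s gives 125 waveforms."""
    return round(dur_s * rate_hz)


def select_epochs(epoch_ids, is_artifact_free, n_select=30, n_min=25, seed=None):
    """Randomly pick epochs, drop contaminated ones, and keep n_min clean epochs."""
    rng = random.Random(seed)
    pool = list(epoch_ids)
    chosen = rng.sample(pool, min(n_select, len(pool)))
    clean = [e for e in chosen if is_artifact_free(e)]
    if len(clean) < n_min:
        raise ValueError("fewer than %d artifact-free epochs" % n_min)
    # Truncating to n_min equalizes the count across trial types (an assumption).
    return clean[:n_min]
```

The 125 waveforms per epoch follow directly from the 2.5-s epoch duration and the 50-per-second click rate.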
To smooth the fluctuations included in the time series, 10 adjacent OAE waveform samples were averaged for each time point. The OAE level was calculated as a root mean square (RMS) value for each waveform sample in the 8-18 ms region of the waveform. Finally, an MOCR time course was obtained by subtracting the baseline level from the time series of OAE levels. The baseline level was defined as the average OAE level in the 1-s period before the onset of the first stimulus in a series. The strength of the MOCR for each time course was defined as the mean suppression between 0.25 and 0.75 s after the onset of the preceding sound. We also calculated the RMS for the 0-4 ms region of each waveform sample that was band-pass filtered between 0.1 and 1 kHz (L 0−4 ms), which is a measurement of MEMR strength. Such early portions of the waveforms reflect the ringing of the click stimulus inside the ear canal, which can be utilized to assess eardrum reflectance and thereby MEMR-induced changes in middle ear transmission (Feeney and Keefe 1999;Schairer et al. 2007).
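The level computation described above can be sketched as follows. This is an illustration under several assumptions, not the analysis code used in the study: the function names are hypothetical, the RMS is converted to decibels with 20·log10 (the reference cancels when the baseline is subtracted), suppression is reported as a positive number when the level falls below baseline, and the caller supplies the time stamp of each smoothed level.

```python
import math

FS = 48_000  # sampling rate in Hz


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def oae_levels_db(waveforms, fs=FS, win_ms=(8.0, 18.0), n_smooth=10):
    """Average n_smooth adjacent waveforms, then take the RMS level (dB) in win_ms."""
    lo = round(fs * win_ms[0] / 1000)
    hi = round(fs * win_ms[1] / 1000)
    levels = []
    for k in range(len(waveforms) - n_smooth + 1):
        group = waveforms[k:k + n_smooth]
        mean_wave = [sum(w[i] for w in group) / n_smooth for i in range(lo, hi)]
        levels.append(20.0 * math.log10(rms(mean_wave)))
    return levels


def mocr_strength(levels_db, times_s, baseline=(-1.0, 0.0), window=(0.25, 0.75)):
    """Mean suppression (baseline minus level, in dB) within the analysis window."""
    base_vals = [l for l, t in zip(levels_db, times_s)
                 if baseline[0] <= t < baseline[1]]
    base = sum(base_vals) / len(base_vals)
    change = [l - base for l, t in zip(levels_db, times_s)
              if window[0] <= t <= window[1]]
    return -sum(change) / len(change)
```

The sliding average returns fewer levels than input waveforms, so the time axis passed to `mocr_strength` has to be aligned with the smoothed series; a silent (all-zero) window would also need separate handling because its level in decibels is undefined.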

Statistical Analysis
In experiments 1 and 3, paired t-tests were used to compare OAE suppression between trial types (with and without the visual cue in experiment 1; expectedly and unexpectedly weaker, and expectedly and unexpectedly stronger, in experiment 3). In experiment 2, a repeated-measures analysis of variance (ANOVA) was performed with the timing of the MOCR elicitor (expectedly early, expectedly late, and unexpectedly late) as a within-subjects factor. In addition, a repeated-measures ANOVA was also used to assess L 0−4 ms. The Ryan-Einot-Gabriel-Welsch F procedure (REGWF) was employed for post-hoc comparisons.
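For reference, the paired t statistic used in experiments 1 and 3 is the mean within-participant difference divided by its standard error, with n − 1 degrees of freedom. The sketch below shows that textbook computation; the function name is hypothetical, and the P value is not computed because it requires the t distribution, which the Python standard library does not provide.

```python
import math


def paired_t(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Unbiased sample variance of the within-participant differences.
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```

Here `a` and `b` would hold one suppression value per participant for the two trial types being compared; identical differences across all participants give zero variance and would need to be handled separately.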

Ear Characteristics
The ears of all participants had normal pure-tone audiometric thresholds (hearing loss < 20 dB) at frequencies from 0.5 to 8 kHz. All ears showed normal tympanogram results; the peak-compensated static compliance was 0.3-2.0 mL, and peak pressure ranged between −100 and +50 daPa. The mean peak-compensated static compliance was 0.84 mL (standard deviation [SD] = 0.39) for the right ear and 0.82 mL (SD = 0.39) for the left ear. The mean peak pressure was −9.7 daPa (SD = 18.6) and −10.6 daPa (SD = 19.3) for the right and left ear, respectively. The audiometry results are shown in Figure 1.

The Effects of Visual Cue Presentation on Temporal Expectation and the MOCR
The participants enrolled in experiment 1 had a mean age of 23.3 years (SD = 3.7). In the timing-predictable condition, visual cue presentation led to a stronger OAE suppression compared with the without-cue condition (T = −3.0, P = 0.0069; Fig. 2C). Mean OAE suppression with and without the visual cue was 0.99 dB (SD = 0.81) and 0.76 dB (SD = 0.90), respectively. L 0−4 ms with and without the visual cue was 0.46 dB (SD = 0.90) and 0.51 dB (SD = 1.85), which were not significantly different from zero (T = 2.45, P = 0.022; T = 1.33, P = 0.19) and did not statistically differ between each other (T = 0.15, P = 0.88).
In contrast, there was no difference in OAE suppression with or without the visual cue in the timing-unpredictable condition (T = −0.58, P = 0.57; Fig. 2D). Mean OAE suppression with and without the visual cue was 1.26 dB (SD = 1.12) and 1.16 dB (SD = 1.23), respectively. L 0−4 ms with and without the visual cue was 0.016 dB (SD = 0.062) and 0.022 dB (SD = 0.062), which were not significantly different from zero (T = 0.83, P = 0.42; T = 1.17, P = 0.27) and did not statistically differ between each other (T = 0.94, P = 0.37).
Figure 2. Visual signal-induced temporal expectation modulates the MOCR, but only when the visual cue predicts stimulus onset timing. (A, B) Schematic representation of the task. Participants maintained their gaze on a fixation point. A 60-dB-SPL noise was contralaterally presented to elicit the MOCR and preceded by a visual cue. (A) In the timing-predictable condition, the ISI between the visual cue and the noise was constant at 250 ms. (B) In the timing-unpredictable condition, the ISI was randomly chosen between 250, 750, 1250, and 1750 ms, such that the subjects could not predict the exact timing of noise presentation. In both conditions, in one of five trials, the noise was presented without a preceding visual cue. The interval between the noise end and the onset of the next visual stimulus varied from 2 to 7 s for every trial. Participants pressed a button once when they heard the noise without visual cue and twice when they heard it with it. Participants were instructed to press the button slightly after the noise ended to avoid data contamination with artifacts associated with button presses. (C, D) Grand average of the time course of OAE suppression induced by the MOCR elicitor (left panels) and maximum OAE suppression with and without a preceding cue (right panels). (C) In the time-predictable condition, maximum OAE suppression was significantly stronger in trials with a visual cue compared with those without a preceding cue. (D) In the timing-unpredictable condition, there was no difference in OAE suppression between conditions with and without visual cue. The comparison between the two conditions was performed for data across the same time interval, that is, 250 ms. The light gray color indicates the duration of the MOCR elicitor. The light-colored area depicts the standard error. Error bars represent the standard error of the mean. **P < 0.01 (paired t-test).

The Effects of Visual Cue Size on Temporal Expectation and the MOCR
The participants enrolled in experiment 2 had a mean age of 21.3 years (SD = 2.4). The results of experiment 2 showed that sounds occurring unexpectedly late induced a weaker MOCR than those occurring expectedly late, whereas sounds occurring earlier than expected induced a strong MOCR that was comparable to that elicited by expectedly early stimuli. Mean OAE suppression in the expectedly early, expectedly late, and unexpectedly late conditions was 1.56 dB (SD = 0.90), 1.68 dB (SD = 0.80), and 1.18 dB (SD = 1.09), respectively. In addition, the timing of the MOCR elicitor had a significant effect on OAE suppression (F(2, 20) = 5.5, P = 0.012; Fig. 3C). Post-hoc comparison showed that unexpectedly late eliciting sound occurrence induced weaker OAE suppression compared with expectedly late (T = 3.2, P = 0.0046 < nominal level of P < 0.01; Fig. 3C) or early (T = 2.4, P = 0.027 < nominal level of P < 0.05; Fig. 3C) sound onset. Mean L 0−4 ms changes in the expectedly early, expectedly late, and unexpectedly late conditions were 0.063 dB (SD = 0.19), 0.14 dB (SD = 0.21), and −0.081 dB (SD = 0.39), which were not significantly different from zero (T = 1.02, P = 0.33; T = 2.13, P = 0.059; T = −0.67, P = 0.52, respectively). In addition, the timing of the MOCR elicitor had no effect on L 0−4 ms (F(2, 20) = 1.80, P = 0.19).

The Effects of Visual Cue Size on Intensity Expectation and the MOCR
The participants enrolled in experiment 3 had a mean age of 22.7 years (SD = 1.7). An unexpectedly stronger eliciting sound induced a weaker OAE suppression than that induced by an expectedly stronger eliciting sound (T = 3.3, P = 0.0074; Fig. 4C). Mean OAE suppression in the expectedly stronger and unexpectedly stronger conditions was 1.45 dB (SD = 1.24) and 1.14 dB (SD = 1.20), respectively. L 0−4 ms in the expectedly stronger and unexpectedly stronger conditions was 0.029 dB (SD = 0.22) and 0.12 dB (SD = 0.31), which were not significantly different from zero (T = 0.45, P = 0.66; T = 1.22, P = 0.25, respectively) and did not statistically differ between each other (T = −1.43, P = 0.18). In contrast, an unexpectedly weaker eliciting sound induced an OAE suppression that was comparable to that induced by an expectedly weaker eliciting sound (T = 0.29, P = 0.77; Fig. 4D). Mean OAE suppression in the expectedly weaker and unexpectedly weaker conditions was 1.09 dB (SD = 1.56) and 1.14 dB (SD = 1.24), respectively. L 0−4 ms in the expectedly weaker and unexpectedly weaker conditions was 0.056 dB (SD = 0.23) and −0.0017 dB (SD = 0.19), which were not significantly different from zero (T = 0.80, P = 0.44; T = −0.029, P = 0.98, respectively) and did not statistically differ between each other (T = 0.53, P = 0.61).
Figure 3. Unexpectedly late sounds induce a weak MOCR, while unexpectedly early sounds induce a strong MOCR that is comparable to that elicited by sounds appearing at the expected moment. (A, B) Schematic representation of the task. Participants maintained their gaze on a fixation point in the center of the screen and were informed that a brief visual cue (either a small or big cross) indicating the ISI length (250 or 1250 ms) would follow. The trial rate for each combination is reported in parentheses. Participants pressed the button once when hearing the noise appearing at an expected timing and twice when hearing the noise appearing at an unexpected timing. Participants were instructed to press the button slightly after the noise ended to avoid data contamination with artifacts associated with button presses. (A) In the unexpectedly late condition, the noise appeared later than expected, once every six trials. (B) In the unexpectedly early condition, the noise appeared earlier than expected, once every six trials. (C, D) Grand average of the time course of OAE suppression induced by the MOCR elicitor (left panels) and maximum OAE suppression for trials with and without the preceding cue (right panels). (C) Sounds appearing unexpectedly late induced a weak MOCR, (D) while those that appeared unexpectedly early induced a strong MOCR, which was comparable to that appearing at the expected moment. Error bars represent the standard error of the mean. *P < 0.05, **P < 0.01 (corrected for multiple comparisons with the REGWF procedure).

Discussion
We found that the MOCR was enhanced by a warning signal presented when the ISI was fixed but not when the interval changed across trials. In addition, a stronger MOCR was observed when the learned timing expectation matched with the timing of the sound but remained unvaried when these two factors did not match. These findings indicate that the MOCR can be voluntarily controlled in a goal-oriented and not only a stimulus-driven manner. By applying a similarly cued paradigm, we further showed that the MOCR is enhanced by the expectation of a stronger, but not of weaker, sound intensity. In contrast, the ringing inside the ear canal (L 0−4 ms ), which would reflect middle ear transmission and thereby MEMR, did not differ. These findings indicate that expectations relevant to the timing and intensity of upcoming sounds can modulate the MOCR, but not the MEMR, under flexible and preparatory control, thereby influencing the sensory transduction phase.
Top-down circuits are ubiquitous in the central nervous system (Elgueda and Delano 2020). The auditory cortex receives and is modulated by descending projections from other cortical areas, such as the frontal and nonauditory cortex, which create an attentional processing loop (Winer and Lee 2007). The auditory descending pathway originates in the auditory cortex and projects to the subcortical nucleus, reaching the cochlea through MOC fibers (Winer 2005; Terreros and Delano 2015). These connections form a feedback loop that initiates and reinforces altered neural sound representations along the central auditory pathway (Suga and Ma 2003). Focal electrical stimulation in the auditory cortex evokes highly specific changes in the frequency, intensity, location, and duration of potentials in subcortical neurons (reviewed in King and Bajo 2013;Schofield and Beebe 2019;Suga and Ma 2003). The cortically driven modulation plays a role in perceptual learning (Bajo et al. 2010) and is hypothesized to mediate attentional modulation of auditory processing (Xiao and Suga 2002).
Figure 4. An unexpectedly stronger stimulus induces a weaker MOCR than a stimulus with an expectedly stronger intensity, whereas unexpectedly weaker sounds induce a weak MOCR, comparable with that appearing at an expectedly weaker intensity. (A, B) Schematic representation of the task. Participants maintained their gaze on a fixation point at the center of the screen and were told that a brief visual cue (either a small or big cross) indicated the intensity of the upcoming stimulus (40- or 60-dB SPL in the unexpectedly stronger condition, 60- or 80-dB SPL in the unexpectedly weaker condition). The ISI between cue and noise was fixed at 250 ms. Participants pressed a button once when they heard the noise appearing at an expected intensity, and twice when they heard the noise appearing at an unexpected intensity. Participants were instructed to press the button slightly after the noise ended to avoid data contamination with artifacts associated with button presses. (A) To examine the effect of an invalid cue on the MOCR, in the unexpectedly stronger condition, a stronger intensity noise appeared after the small cross once every six trials. (B) In contrast, in the unexpectedly weaker condition, a weaker noise appeared after the large cross once every six trials. The trial rates for each combination are reported in parentheses. (C, D) Grand average of the time course of OAE suppression induced by the MOCR elicitor (left panels) and maximum OAE suppression for trials with and without a preceding cue (right panels). (C) The MOCR induced by the unexpectedly stronger condition was weaker than that induced by the expectedly stronger condition, whereas (D) stimuli weaker than expected elicited an MOCR comparable to that elicited by expectedly weaker stimuli. Error bars represent the standard error of the mean. **P < 0.01 (corrected for multiple comparisons with the REGWF procedure).
Concomitantly, previous literature has reported a positive effect of attention on the MOCR, despite some negative results (de Boer and Thornton 2007;Beim et al. 2018); studies measuring OAE-based MOCR in humans and experimental animals have shown that orienting attention to a specific laterality (Froehlich et al. 1993), auditory target (Smith et al. 2012), frequency (Maison et al. 2001), working memory task (Marcenaro et al. 2021), and visual task (Delano et al. 2007;Wittekindt et al. 2014) modulates the MOCR. Auditory training also enhances the MOCR, leading to speech-in-noise perception facilitation (de Boer and Thornton 2008). Anderson and Malmierca (2013) further showed that inactivation of the auditory cortex modulates stimulus-specific adaptation (SSA) of cells in subcortical areas. As SSA is a plausible mechanism underlying predictive coding (Baldeweg 2006;Winkler et al. 2009;Bendixen et al. 2012;Malmierca et al. 2015), the descending pathway may be related to forming or facilitating predictive processing at subcortical levels (Malmierca et al. 2015). In line with these animal studies, Riecke et al. (2020) found that the predictability of the frequency of upcoming tones alters OAE amplitude in a fashion that depends on the behavioral relevance of the tone sequences. In addition, the authors reported a significant correlation between the increase in OAE amplitude and cortical auditory event-related potential in the case of predictability. This correlation provides evidence that auditory predictions concerning the frequency of a sound exert a top-down effect on the sensory processing of the auditory periphery via the corticofugal pathway. However, the present study is the first to show evidence that the descending pathway conveys an expectation about the timing and intensity of an upcoming sound to the first stage of auditory processing, that is, the sensory transduction phase.
One may think that the participants merely detected the visual cue but did not form any expectation concerning the timing of sound occurrence. However, in experiment 1, we found that the warning cue enhanced the MOCR when the ISI between the cue and the MOCR elicitor remained unchanged, which suggests that MOCR enhancement reflects increased preparatory processes that become engaged by timing predictability, rather than enhanced general attentional readiness induced by the preceding visual cue. In addition, the lack of significant MOCR changes in the timing-unpredictable condition implies that the number of button presses did not influence the results. In experiment 2, where the visual stimulus appeared in both the expected and unexpected conditions, the differences in the MOCR can be attributed to the temporal predictability of the elicitor. Therefore, the current study showed that stimuli that occur at expected times induce a stronger MOCR.
Similarly, previous studies have reported increased cortical responses to stimuli appearing at the expected moment, presumably due to expectation-mediated orienting of attention to the event (Ghose and Maunsell 2002;Lange et al. 2003;Doherty et al. 2005;Praamstra et al. 2006;Jaramillo and Zador 2011;Auksztulewicz et al. 2019). The timing-specific increase in neural excitability, or temporal attention, is plausibly underpinned by the entrainment of low-frequency cortical oscillations (<10 Hz, including delta, theta, and low alpha bands) by periodic, and thereby temporally predictable, stimulation (Lakatos et al. 2008;Schroeder and Lakatos 2009;Arnal and Giraud 2012;Auksztulewicz et al. 2019), and by aperiodic stimulation when the timing of the stimulus occurrence is predictable (Morillon et al. 2016;Breska and Deouell 2017;Rimmele et al. 2018). Increased cortical neural entrainment associated with temporal attention could modulate the activity of MOC neurons via the corticofugal pathway, which could thus underlie the enhanced OAE suppression observed in our study. In line with this hypothesis, Dragicevic et al. (2019) provided evidence of an interaction between OAEs and low-frequency cortical oscillations during selective attention, which supports the view that attentional processing at the cortical level can modulate the OHC gain via the corticofugal pathways. In contrast, temporal predictability has also been reported to suppress cortical and brainstem potentials (Lange 2009;Costa-Faidella et al. 2011;Todorovic et al. 2011;Gorina-Careta et al. 2016), which could be evidence for the predictive coding hypothesis, which posits that the neural responses to expected stimuli should be suppressed (Friston 2005). Enhanced MOCR associated with temporal predictability, which leads to increased suppression of the cochlear response, can be understood as a part of a prediction-based inhibition network underlying the predictive coding framework.
The descending pathway from the auditory cortex is not the only pathway that can modulate the MOCR. Studies on animal models have demonstrated the presence of corticofugal projections from the frontal cortex to the inferior colliculus (Olthof et al. 2019;Elgueda and Delano 2020), which could be an alternative circuit to the top-down MOC efferent pathway. In addition, the enhancement of the MOCR can be explained by the increased firing rate of auditory nerves via the lateral olivocochlear (LOC) fibers, which innervate auditory nerves (Guinan 2006). However, the attentional modulation possibly operated by the LOC system has not been examined.
Previous psychological data have shown that the temporal orienting effect on target detection is only significant for invalidly cued targets that appear earlier than expected (Coull and Nobre 1998;Miniussi et al. 1999;Griffin et al. 2001, 2002). A similar dependence of responses on the foreperiod interval length has been observed in multiple brain areas, as revealed by event-related potentials, which reflect the decision-making state or memory trace originating in the prefrontal area (Miniussi et al. 1999;Griffin et al. 2001, 2001), and by single-cell recordings in the auditory cortex (Jaramillo and Zador 2011). A possible explanation for the previous results is that invalidly cued targets appearing later than expected provide enough time for attention to be re-oriented to the later interval as a result of increasing conditional probabilities over time.
However, we found that, when the elicitor is presented earlier than expected, the degree of the MOCR is comparable to that occurring at the expected moment. This result implies that MOCR enhancement starts immediately after cue presentation and lasts until the expected time, disappearing afterward. This discrepancy in the dependence of the response on the length of the foreperiod interval suggests that the MOCR can be voluntarily controlled in a flexible or dynamic manner but is not controlled by the re-orienting effect according to conditional probabilities. Although the neural substrate that modulates MOCR responses according to temporal expectation is likely to correspond to the top-down control of the auditory cortex via corticofugal projections to the subcortical MOCR circuits, its principal operation would differ from that in cortical areas and might be aimed at rapidly updating subsequent expectations after the expected moment. Alternatively, the weaker MOCR observed after the expected time has passed might be attributed to the release of the learned expectation from working memory. Marcenaro et al. (2021) found that larger acoustic suppression of distortion-product OAEs arises during a visual working memory task than during control conditions, in which the same stimuli as those of the working memory task were presented but no task was performed. In a cuing task, as in a working memory task, participants are forced to retain what a cue indicates until the expected timing of event occurrence, which would enhance the MOCR-related OAE suppression. After the expected event occurrence, listeners might release the memory, and therefore, the enhancement of OAE suppression would disappear.
With respect to optimization, our result that the MOCR is enhanced by the expectation of stronger, but not weaker, intensity sounds is reasonable. As mentioned above, the MOCR inhibits OHC motility and protects the sensory system from acoustic overexposure (Maison and Liberman 2000;Otsuka et al. 2016). The suppression induced by the MOCR also improves the detection of signals in noise by preventing the adaptation of auditory nerves to the noise and maintaining their responsiveness to upcoming targets (Kawase and Liberman 1993;Kumar and Vanaja 2004;Otsuka et al. 2020). Stronger suppression facilitates noise protection and antimasking effects, but excessively strong suppression disrupts the detection of faint signals. In this sense, the enhancement of the MOCR, that is, a stronger suppression only for stronger sounds, could be a reasonable solution for balancing this trade-off.

Funding
KAKENHI grants from the Japan Society for the Promotion of Science (grants 18K18066, 21K17757).

Notes
Conflict of Interest: The authors declare no competing interests.