Hallucination Proneness Alters Sensory Feedback Processing in Self-voice Production

Abstract

Background: Sensory suppression occurs when hearing one's self-generated voice, as opposed to passively listening to one's own voice. Quality changes in sensory feedback to the self-generated voice can increase attentional control. These changes affect the self-other voice distinction and might lead to hearing voices in the absence of an external source (ie, auditory verbal hallucinations). However, it is unclear how changes in sensory feedback processing and attention allocation interact and how this interaction might relate to hallucination proneness (HP).

Study Design: Participants varying in HP self-generated (via a button-press) and passively listened to their own voice, which varied in emotional quality and certainty of recognition (100% neutral; 60%-40% neutral-angry; 50%-50% neutral-angry; 40%-60% neutral-angry; 100% angry), during electroencephalography (EEG) recordings.

Study Results: The N1 auditory evoked potential was more suppressed for self-generated than externally generated voices. Increased HP was associated with (1) an increased N1 response to the self- compared with externally generated voices, (2) a reduced N1 response for angry compared with neutral voices, and (3) a reduced N2 response to unexpected voice quality in sensory feedback (60%-40% neutral-angry) compared with neutral voices.

Conclusions: The current study highlights an association between increased HP and systematic changes in the processing of emotional quality and certainty in sensory feedback (N1) and attentional control (N2) in self-voice production in a nonclinical population. Considering that voice hearers also display these changes, these findings support the continuum hypothesis.


Introduction
Sensations arise inevitably and incessantly from various internal and external sources. As we can predict the sensory consequences of self-generated actions, we suppress these sensations. For example, we perceive the sound of our own footsteps as less intense than those of another person. Accordingly, self- and externally generated events differ in how we respond and adjust to them in a dynamic environment. According to the forward model account, 2,3 an internal copy of a motor plan (efference copy) is used to predict the sensory consequences of self-generated actions to prepare the brain for incoming sensory information. The perceived sensory feedback (reafference signal) is processed by comparison to this prediction, resulting either in a match or a mismatch (prediction error). 4,5 Prediction errors, in turn, allow adaptation and updating of predictions to continuously optimize behavior.
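The comparator logic of the forward model can be sketched in a few lines of Python. This is an illustrative toy under our own naming assumptions, not a model of any specific neural implementation:

```python
import numpy as np

# Toy forward-model comparator: an efference copy of the motor plan
# yields a prediction of the sensory feedback; the reafference signal
# is compared against it, and any residual is the prediction error.
def prediction_error(efference_copy_prediction, reafference):
    return reafference - efference_copy_prediction

predicted = np.array([1.0, 0.8, 0.6])   # predicted self-voice feedback
matching = np.array([1.0, 0.8, 0.6])    # feedback as expected -> suppression
altered = np.array([1.0, 0.8, 1.1])     # eg, vocal strain changes feedback

print(prediction_error(predicted, matching))  # zero error: prediction matches
print(prediction_error(predicted, altered))   # nonzero error: drives updating
```

A zero error corresponds to the matched, suppressed case; a nonzero error is the mismatch signal that allows predictions to be updated.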
These processes have been studied in voice production and perception. Neural activity in the auditory cortex is suppressed when we speak and hear our own voice compared with when we listen to someone else's voice. 6 This suppression appears to stem from the comparison between predicted and actual sensory feedback to the self-voice, as suggested by the forward model framework. Electrophysiologically, this phenomenon is captured by the N1 event-related potential (ERP) suppression effect, ie, the difference in the N1 amplitude for self-generated and externally generated voices during real-time talking 7-9 but also when self-voices are "self-generated" via a button-press. 10 Changes in the acoustic properties of the self-generated voice, eg, during a cold or vocal strain, can result in a mismatch between the predicted and the actual sensory feedback to the self-voice. 15-18 Unexpected changes in sensory feedback might evoke a surprise response (increased N1 12,13) that, in turn, can increase error awareness and attentional control (increased N2 19,20).

Hallucination Proneness and Sensory Suppression
While the underlying cognitive and neural mechanisms of AVH seem to overlap in voice hearers with and without a psychotic disorder, 28,29,43,44 differences pertain to the perceived emotional quality, appraisal, controllability, and related distress. 22,45-48 This distinction in emotional voice quality and the potentially resulting distress are linked to deficits in the recognition and appraisal of vocal emotions in voice hearers both with 49-51 and without a psychotic disorder. 52,61-63 This imbalance might lead to the misattribution of a negative meaning to neutral stimuli and the perception of meaningful information (eg, speech) in noise, 60,63-66 ultimately leading to false perceptions, ie, AVH. Taken together, these findings emphasize the interdependence and mutual influence between alterations in sensory perception and predictive processing in voice hearers. Therefore, by manipulating emotional quality and thereby altering the perceptual certainty of self-voice recognition, one can probe both changes in sensory predictive processing and attention allocation in those who are more prone to AVH, highlighting transitions along the HP spectrum.

The Current Study and Rationale
Using a well-validated EEG motor-auditory task and building on our own prior work (figure 1), 10 the current study examined whether systematic changes in sensory feedback processing of the self-voice as a function of HP lead to altered sensory suppression (N1 and P2) and attentional control (N2). The emotional quality of the self-voice was manipulated to change the level of certainty in sensory feedback processing (100% neutral; 60%-40% neutral-angry; 50%-50% neutral-angry; 40%-60% neutral-angry; and 100% angry). For the most certain self-voice (100% neutral and 100% angry), we expected a reduction of the classical N1 suppression effect (self- < externally generated) with higher HP. 10 For the uncertain self-voice (60%-40% neutral-angry; 50%-50% neutral-angry; 40%-60% neutral-angry), we expected a reversed N1 suppression effect (self- > externally generated) with increasing levels of uncertainty regarding sensory feedback, in persons with low compared with high HP. Similar effects were expected for the P2 response, which indicates the conscious detection of self-generated stimuli. 15,67,68 Considering that the presumed alterations are linked to attentional control and error awareness, a reduced or reversed N2 suppression effect (self- > externally generated) was expected for the certain compared with uncertain self-voice with higher HP.

Participants
Twenty-nine adults (age range 18-27 years) were recruited. All participants were first invited for a voice recording, followed by the EEG session. Three participants did not participate in the EEG sessions due to time constraints, whereas 1 participant was excluded from further analysis due to technical issues during the EEG data collection. Therefore, the final sample comprised 25 participants (21 females, mean age = 21.24, SD = 2.49 years; 21 right-, 3 left-handed, and 1 ambidextrous) varying in HP measured with the Launay Slade Hallucinations Scale (LSHS) 69-72 (total scores: mean = 18.56, SD = 10.17, min = 3, max = 42; LSHS AVH scores [sum of items: "In the past, I have had the experience of hearing a person's voice and then found no one was there," "I often hear a voice speaking my thoughts aloud," and "I have been troubled by voices in my head"]: mean = 2.40, SD = 2.62, min = 0, max = 11). All participants provided their written informed consent before the start of the study. They either received financial compensation (vouchers) or study credits for their participation. All participants self-reported normal or corrected-to-normal visual acuity and normal hearing. Participants were excluded (1) if they reported the presence of current or past psychiatric illness, (2) if voice hearing was solely attributed to substance abuse, and (3) if they were unable to recognize and differentiate between their own voice and familiar voices. The study was approved by the Ethics Committee of the Faculty of Psychology and Neuroscience at Maastricht University and performed in accordance with the approved guidelines and the Declaration of Helsinki (ERCPN-176_08_02_2017_S2).

Procedure
All participants underwent 2 study sessions conducted on separate visits. During the first voice recording session, "ah" and "oh" vocalizations from each participant were recorded and morphed (see section A of supplementary material) to create the final (100% neutral; 60%-40% neutral-angry; 50%-50% neutral-angry; 40%-60% neutral-angry; 100% angry) voice morphs for the EEG experiment. During the second session, EEG was recorded while the participants performed the auditory-motor task (figure 1; see section A of supplementary material). The task was programmed and presented using the Presentation software (version 18.3; Neurobehavioral Systems, Inc.). Stimuli were presented via ear inserts. Button presses were recorded via the spacebar button on the keyboard. Participants were given an overview of the procedure and the principles of EEG at the start of the session. They sat comfortably in an electrically shielded soundproof chamber in front of a screen placed about 100 cm away. Participants filled in the LSHS questionnaire while the EEG cap was prepared.
The paradigm was presented in a fully randomized event-related design over 12 runs. Each run consisted of 90 trials (40 auditory only condition [AO], 40 motor auditory condition [MA], and 10 motor only condition [MO]). Each trial started with a fixation cross, after which the presentation (vertical or horizontal) of a cue was jittered between 400 and 1000 ms. The cue was then followed by an auditory stimulus (after 500 ms for AO) or a button-press that either elicited (MA) or did not elicit (MO) an auditory stimulus. Five types of voice morphs consisting of "ah" and "oh" vocalizations, respectively, were presented in the AO and MA conditions. Thus, each run consisted of 4 trials of each of 10 stimulus types ("ah" and "oh" for 5 voice morphs). This yielded 96 trials per voice morph ("ah" and "oh" combined; see supplementary table 1). Participants were given short breaks after each run. To minimize potential influences of lateralized motor activity, participants were asked to switch their response hand every 3 runs. Prior to the experiment, participants were trained to press the button within 500 ± 100 ms after the cue (horizontal bar) to align the presentation of auditory stimuli in the MA and AO conditions and to avoid overlap of cue-elicited and motor activation. Please note that throughout this manuscript, the term "self-generated voice" specifically denotes the self-voice generated by the participant through a button-press during the MA condition.
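As a sanity check, the trial bookkeeping above can be reproduced with a few lines of Python (the variable names are ours, not from the original stimulation code):

```python
runs = 12
trials_per_stimulus_type_per_run = 4   # per condition (AO or MA)
stimulus_types = 10                    # "ah" and "oh" x 5 voice morphs
vocalizations_per_morph = 2            # "ah" and "oh"

# 4 trials x 10 stimulus types = 40 AO and 40 MA trials per run
trials_per_condition_per_run = trials_per_stimulus_type_per_run * stimulus_types
print(trials_per_condition_per_run)    # 40

# per voice morph and per condition, "ah" and "oh" combined, across all runs
trials_per_morph = runs * trials_per_stimulus_type_per_run * vocalizations_per_morph
print(trials_per_morph)                # 96
```

With 10 MO trials added per run, each run thus contains 90 trials in total.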

Stimulus Rating
At the end of the EEG session, participants rated their voices in arousal and valence (see supplementary figure 1). They additionally rated the voices in perceived ownness, ie, how strongly they identified the voice as their own, on a Likert scale (1-10). This was done to ensure that participants recognized their own voice and perceived the emotion expressed by it. Participants were debriefed after the experiment was finished.

EEG Data Acquisition and Preprocessing
EEG data were recorded with BrainVision Recorder (Brain Products, Munich, Germany) using an ActiChamp 128-channel active electrode setup while participants performed the auditory-motor task. Data were acquired with a sampling frequency of 1000 Hz and an electrode impedance below 10 kΩ, using TP10 as an online reference. During the EEG recording, participants were seated in a comfortable chair about 100 cm away from the screen in an acoustically and electrically shielded chamber.
EEG data were preprocessed (see section A of supplementary material) using the Letswave6 toolbox (https://github.com/NOCIONS/letswave6) running on MATLAB 2019a. The grand averaged waveforms revealed 3 ERP components: 2 negative components peaking at approximately 164 and 460 ms, respectively, and 1 positive component peaking at 286 ms. As the latencies of the ERP responses varied significantly (see supplementary table 2), peak amplitudes were chosen as an outcome measure. The N1 peak amplitude was defined as the largest negative peak occurring between 80 and 230 ms, the P2 peak amplitude was defined as the following positive peak between the N1 and 380 ms, and the N2 peak amplitude as the negative peak between the P2 and 600 ms. 73,74 Previous research showed that all ERP components of interest have prominent frontomedial and frontocentral topographies. 6,75,76 Therefore, the N1, P2, and N2 responses were extracted from the same frontocentral region of interest that included 21 electrode locations: AFF1h, AFF2h, F1, Fz, F2, FFC3h, FFC1h, FFC2h, FFC4h, FC3, FC1, FCz, FC2, FC4, FCC3h, FCC1h, FCC2h, FCC4h, C1, Cz, and C2 (see figure 2).
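The sequential peak-picking logic described above (N1 in a fixed window, then P2 after the N1, then N2 after the P2) can be sketched as follows. The synthetic waveform and all names are illustrative assumptions, not the actual Letswave pipeline:

```python
import numpy as np

t = np.arange(-100, 700)           # time in ms relative to voice onset (1000 Hz)

# Synthetic grand-average-like ERP: negative peaks near 164 and 460 ms and a
# positive peak near 286 ms, built from Gaussian components (illustrative only).
def gauss(center, width):
    return np.exp(-0.5 * ((t - center) / width) ** 2)

erp = -3.0 * gauss(164, 25) + 2.5 * gauss(286, 30) - 2.0 * gauss(460, 40)

def peak(signal, time, lo, hi, polarity):
    """Largest peak of the given polarity within [lo, hi] ms."""
    mask = (time >= lo) & (time <= hi)
    seg, ts = signal[mask], time[mask]
    i = seg.argmin() if polarity == "neg" else seg.argmax()
    return seg[i], ts[i]

n1_amp, n1_lat = peak(erp, t, 80, 230, "neg")
p2_amp, p2_lat = peak(erp, t, n1_lat, 380, "pos")   # positive peak after the N1
n2_amp, n2_lat = peak(erp, t, p2_lat, 600, "neg")   # negative peak after the P2
print(n1_lat, p2_lat, n2_lat)
```

Chaining the windows to the previously found peak, rather than using three fixed windows, mirrors the definition of the P2 and N2 relative to the preceding component.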

Statistical Analyses
Statistical analyses on N1, P2, and N2 data were performed in R version 4.2.2, using linear mixed modeling with the lmer and lmerTest packages. 77,78 We used linear mixed modeling to control for the random effects of participants influencing the outcome measure. Additionally, since HP measured by the LSHS is a continuous variable, linear mixed modeling was considered more appropriate than classical ANOVA to analyze the impact of HP on sensory feedback (condition) and voice quality (stimulus type). Amplitude values of the ERPs (N1/P2/N2) were used as outcome measures, participants were included as random effects, and condition (2 levels: motor auditory corrected [MAc] and AO), stimulus type (5 levels: 100% neutral, 60%-40% neutral-angry, 50%-50% neutral-angry, 40%-60% neutral-angry, 100% angry), and LSHS total or LSHS AVH scores (continuous variable) were included as fixed effects in the models. For all models, the Gaussian distribution of model residuals and quantile-quantile plots confirmed their respective adequacy.
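For readers who prefer Python, an equivalent model specification can be sketched with statsmodels. This is a hedged sketch on simulated data; the actual analysis used lmer in R, and every value and column name below is synthetic:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj = 25
conditions = ["MAc", "AO"]
stims = ["100N", "60N40A", "50N50A", "40N60A", "100A"]

# Simulated per-participant mean amplitudes (one LSHS total score per person)
rows = []
for subj in range(n_subj):
    lshs = int(rng.integers(3, 43))
    for cond in conditions:
        for stim in stims:
            rows.append({"ID": subj, "Condition": cond, "StimulusType": stim,
                         "LSHS": lshs,
                         "N1": rng.normal(-2.0 if cond == "AO" else -1.0, 1.0)})
df = pd.DataFrame(rows)

# Random intercept per participant, condition x HP and stimulus x HP fixed
# effects, mirroring N1 ~ Condition*LSHS + StimulusType*LSHS + (1|ID), ML fit.
model = smf.mixedlm("N1 ~ Condition * LSHS + StimulusType * LSHS",
                    df, groups=df["ID"]).fit(reml=False)
print(model.fe_params)
```

With 2 condition levels and 5 stimulus levels, the fixed-effects part comprises 12 terms (intercept, 1 condition dummy, 4 stimulus dummies, LSHS, and the 5 corresponding interaction terms).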

Results
We followed a hypothesis-driven approach to probe changes in voice quality (stimulus type) and sensory prediction (condition) as a function of HP.
N1: To probe the influence of HP (based on LSHS total scores) on condition and stimulus type, we tested the model [m1_N1 <- lmer(N1 ~ Condition * LSHS total + Stimulus Type * LSHS total + (1|ID), data = data, restricted maximum likelihood (REML) = FALSE)] against the null model [m0_N1]. The full model showed the best goodness of fit and yielded a significant difference (χ2(11) = 24.072, P = .01*; Akaike Information Criterion [AIC] = 432.93; see table 1 and figure 3). We thus replicated the N1 sensory suppression effect, showing that externally generated (AO) voices lead to a larger (more negative) N1 response than self-generated (MAc) voices. We also observed an overall decrease (less negative) in the N1 response, independent of condition (AO or MAc), with increased HP (LSHS total scores) for the angry compared with the neutral voice (see table 1 and figure 3).
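The model comparison reported here is a likelihood-ratio test between nested models fitted with maximum likelihood. Its arithmetic can be reproduced directly; the log-likelihood values below are illustrative, chosen only so that the statistic matches the reported χ2(11) = 24.072:

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_null, loglik_full, df_diff):
    """Compare nested models fitted with ML (REML = FALSE)."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2.sf(stat, df_diff)

# Illustrative log-likelihoods reproducing the reported N1 comparison
stat, p = likelihood_ratio_test(-100.0, -87.964, 11)
print(round(stat, 3), round(p, 3))
```

The resulting p-value falls just above .01, consistent with the rounded value reported for the N1 model.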
P2: The analysis of the P2 followed the same procedure as for the N1. However, the results indicated that HP (based on LSHS total or AVH scores) did not significantly affect sensory prediction (condition) or voice quality (stimulus type) (see section B of supplementary material). N2: The model that showed the best goodness of fit [m1_N2 <- lmer(N2 ~ Condition * LSHS total + Stimulus Type * LSHS total + (1|ID), data = data, REML = FALSE)] also yielded a significant difference (χ2(11) = 27.44, P = .003**; AIC = 323.15; see table 2 and figure 3) when compared against the null model [m0_N2; AIC = 328.59]. The N2 for the self-generated 60%-40% neutral-angry self-voice compared with the neutral self-voice decreased (less negative) with an increase in HP (LSHS total score).

Discussion
This EEG study investigated how changes in sensory feedback processing of the self-voice link to HP and might engage attentional resources by manipulating the emotional quality of the self-voice. This manipulation aimed to change the certainty of self-voice recognition. The data analyses focused on the N1, P2, and N2 ERP components elicited by the self- and externally generated self-voice, in certain (ie, unmorphed) and uncertain (ie, morphed) conditions. The results replicated previous findings, 10,67 confirming an N1 suppression effect when comparing sensory feedback processing for the self- and externally generated voice. Critically, this N1 suppression effect was reduced in high HP (based on both LSHS total and AVH scores), confirming a link between HP and altered sensory feedback processing. Moreover, regardless of condition, high HP (based on LSHS total scores) was associated with decreased attention allocation, indicated by a reduced N1 response to the angry compared with the neutral voice, as well as with lower error awareness, reflected in a reduced N2 response to the uncertain (60%-40% neutral-angry morph) compared with the neutral voice. However, HP did not modulate the P2 responses. Overall, these results confirm that HP is associated with sensory feedback processing and suggest that attention allocation for the self-generated voice varies with HP in a group of nonclinical individuals.

Sensory Feedback Processing and Attention Allocation as a Function of HP
Replication of the classical N1 sensory suppression effect 10,14,15,67,68,79 likely indicates that the auditory cortex is prepared for the sensory consequences of the self-generated voice. However, increased HP was associated with an increased N1 response for the self-generated voice, thus reducing the N1 suppression effect. This may indicate altered sensory feedback processing for the self-generated voice as well as increased attentional resource allocation toward sensory feedback processing in high HP individuals. One may consider that this alteration and the need for additional resources stem from a less efficient comparison of expected and actual sensory input and the resultant error signal, which might lead to hyperaccentuation of the self-voice. This perspective is supported by previous studies with voice hearers with 8,9,42,80 and without a psychotic disorder 10 using similar paradigms. Altered responses to the self-generated voice might indicate that subtle changes in self-monitoring may already be present in nonclinical persons with high HP.
Furthermore, regardless of condition (AO or MAc), the N1 response to the angry compared with the neutral self-voice was reduced in high HP participants, likely indicating differences in their response when the emotional quality of their voice becomes (fully) negative. Prior research indicates that high HP persons tend to show reduced sensory processing of negative emotional cues, based on their ability to control attentional bias toward negative cues. 53 Therefore, the current results may point to a link between high HP and reduced appraisal of, and inhibition of attention allocation to, negatively valenced voice input in a nonclinical sample.
Contrary to our expectations, HP did not modulate the P2 in sensory feedback processing of the self-voice. The N1 and P2 have been linked to dissociable effects when attributing a sensory event to one's own action. Whereas the N1 suppression effect seems to reflect the outcome of the comparison of expected and actual sensory input, the P2 was associated with the more conscious realization that a finger tap elicited a concomitant auditory stimulus. 14,15,68 The present task, which involved the pseudorandom interweaving of conditions (MA, AO, and MO) and stimuli (5 voice morphs of "ah" and "oh" vocalizations each), may have precluded sufficient opportunity for the P2 to engage in conscious processing of a button-press eliciting the self-voice.
The N2 was reduced for the 60%-40% neutral-angry compared with the 100% neutral self-voice in high HP individuals regardless of the condition. Prior pilot data showed that anger expressed in "ah" vocalizations was already recognized in the initial morphing steps, ie, the 70%-30% neutral-angry voice in the neutral-angry continuum. It is therefore possible that the 60%-40% neutral-angry self-voice, among the 5 presented voice types, marks a distinct shift from perceiving a voice as neutral to detecting anger, imbuing the perception of an uncertain voice. Consequently, this specific self-generated voice may have yielded the most equivocal outcome regarding the perceptual uncertainty of the self-voice. Functionally, the N2 has been linked to error awareness, attentional control, and conscious processing of perceptual novelty. 81,82 Thus, the reduced N2 to this uncertain self-voice in high HP individuals might suggest an altered response to unexpected change or reduced error awareness. Additionally, the N2 has been linked to heightened emotional reactivity to negative compared with neutral stimuli. 83 Taken together, the reduced N2 in high HP individuals may thus indicate downregulation of negative emotional reactivity, reduced error awareness, and reduced processing of an uncertain self-voice.
Although the N1 suppression effect was observed for the self-generated voice, there was no significant interaction between condition (AO and MAc) and stimulus type (5 types of self-voice). This suggests that the self-voice manipulations were still within the acceptable range of feasible acoustic changes; therefore, we did not find differential suppression effects for the different types of self-voices (see supplementary figure 1). Furthermore, the lack of this interaction in the N1 could be the result of stimulus type probability (2:3 for certain:uncertain). 85,86 Taken together, the unexpected self-voices might not have induced sufficiently different perceptual processing, either because they were presented more frequently or because they did not differ sufficiently in their acoustic profile. Consequently, there was no difference in the N1 suppression effect among the self-voices. Some specificities of the task design should be noted. Unlike the classical ERP suppression paradigm, where different conditions are presented in a blocked design, 10,14,15,68 here all conditions and stimuli were presented in a fully event-related design. Due to the mixing of conditions, a cue was introduced to indicate whether the participant was required to press a button to generate a self-voice or to passively listen to the self-voice. While this cue was removed from the MA condition by subtracting the MO condition for the final analysis, it remained present in the AO condition, resulting in a prestimulus positive potential (see figure 2). Next to the presence of the cue, the duration between the cue and the auditory stimulus was constant (500 ms). Both factors caused the participants to pay close attention and made them anticipate the onset of the voice in the externally generated condition. However, even though the temporal delay was similar in the self- and externally generated conditions, we observed a significant N1 suppression effect (AO > MAc). This could be attributed to a confluence of factors. Studies have reported that sensory suppression is not driven by the motor action per se but by the voluntary intention involving motor planning to self-generate an action (eg, a voice). 87,88,90-93 Together, the performance of a motor action in the self-generated condition may take attention away from listening to the generated stimulus, which differs from a cued listening condition. 94,95 These factors collectively may influence how attentional resources are directed toward diverse sensory input and contribute to the different N1 responses to the self- vs externally generated voice.

Future studies with larger samples are needed to replicate the current findings. These studies should also include samples of nonclinical and clinical voice hearers, in addition to participants varying in HP. This approach should facilitate a comprehensive exploration of alterations in sensory prediction and attentional control across the entire spectrum of voice hearing. To ascertain the specificity of AVH, it is also crucial for future studies to explore the correlation between ERP data during sensory suppression paradigms and non-AVH-related items from the LSHS. Previous investigations focused on how we process uncertainty in sensory feedback, especially when the self-voice changes to someone else's voice, highlighting the relevance of self-identity. 96 The current study, however, marks the first exploration of how uncertainty about one's own emotional self-voice quality, changing from neutral to angry, impacts sensory feedback processing and attentional control as a function of HP. Prior research has also highlighted stronger alterations for negative than positive vocalizations, 53 underscoring the role of emotional valence and refuting a general uncertainty phenomenon. Future studies should therefore aim to uncover similar alterations, concentrating on morphing from neutral to positive emotions, to elucidate whether changes in sensory feedback processing and attentional control are specific to self-voices displaying a change from neutral to angry voice quality or extend to other emotions.
Taken together, the current results link increased HP to changes in sensory feedback processing and attentional engagement to the self-voice in nonclinical participants varying in HP. Specifically, these findings suggest that the processing of the sensory consequences of one's own actions is attenuated, but that this attenuation decreases with an increase in HP. High HP is also associated with reduced attention allocation to the angry compared with the neutral voice, demonstrating the ability of high HP individuals to effectively manage negative content. 53 The current findings thus support the continuity perspective regarding changes in sensory feedback processing and attention allocation previously reported in voice hearers. 8,42,80,97,98 Nevertheless, to strengthen this concept, further investigations involving participants across the psychosis continuum, including nonclinical persons who do not hear voices and voice hearers with and without psychotic disorders, are warranted.

Fig. 1. Graphical representation of the motor-auditory task. Note: AO, auditory only condition; ERPs, event-related potentials; MA, motor auditory condition; MO, motor only condition. Motor activity from the MA condition was removed by subtracting MO from MA to obtain the corrected MA condition (MAc). Statistical analyses were performed with ERPs from the MAc and AO conditions.

Fig. 2. Grand average ERP waveforms ± SE of the mean and topographic maps comparing self-generated (via a button-press) and externally generated voices for the 5 self-voice types over a frontocentral ROI. Note: AO, auditory only condition; ERP, event-related potential; MAc, motor auditory corrected; ROI, region of interest.

Fig. 3. Scatter plots depicting N1 and N2 modulations as a function of HP based on LSHS total scores, for each stimulus type. The N1 response for the self-generated voice increased (more negative) with an increase in HP (see table 1). The N2 response decreased with an increase in HP for the most uncertain self-voice, regardless of the conditions (see table 2). Note: HP, hallucination proneness; LSHS, Launay Slade Hallucinations Scale.

Table 1. Linear Mixed Effects Model for the N1, Including the Effect of HP Based on LSHS Total Scores. Degrees of freedom for fixed effects: df = 225.0 (except intercept: df = 29.03). Note: AO, auditory only condition; HP, hallucination proneness; LSHS, Launay Slade Hallucinations Scale.

Table 2. Linear Mixed Effects Model for the N2, Including the Effect of HP Based on LSHS Total Scores. Degrees of freedom for fixed effects: df = 225.0 (except intercept: df = 29.2785). Note: AO, auditory only condition; HP, hallucination proneness; LSHS, Launay Slade Hallucinations Scale.