Ineke Fengler, Pia-Céline Delfau, Brigitte Röder, Early Sign Language Experience Goes Along with an Increased Cross-modal Gain for Affective Prosodic Recognition in Congenitally Deaf CI Users, The Journal of Deaf Studies and Deaf Education, Volume 23, Issue 2, April 2018, Pages 164–172, https://doi.org/10.1093/deafed/enx051
Abstract
It is as yet unclear whether congenitally deaf cochlear implant (CD CI) users’ visual and multisensory emotion perception is influenced by their history of sign language acquisition. We hypothesized that early-signing CD CI users, relative to late-signing CD CI users and hearing, non-signing controls, show better facial expression recognition and rely more on the facial cues of audio-visual emotional stimuli. Two groups of young adult CD CI users—early signers (ES CI users; n = 11) and late signers (LS CI users; n = 10)—and a group of hearing, non-signing, age-matched controls (n = 12) performed an emotion recognition task with auditory, visual, and cross-modal emotionally congruent and incongruent speech stimuli. On different trials, participants categorized either the facial or the vocal expressions. The ES CI users recognized affective prosody more accurately than the LS CI users in the presence of congruent facial information. Furthermore, the ES CI users, but not the LS CI users, gained more than the controls from congruent visual stimuli when recognizing affective prosody. Both CI groups performed overall worse than the controls in recognizing affective prosody. These results suggest that early sign language experience affects multisensory emotion perception in CD CI users.
Uni- and multisensory emotion perception in CI users
The ability to derive the affective state of a speaker from his or her voice and/or face is important for quick and smooth interpersonal communication. An interesting, yet largely unexplored, question is how congenitally deaf cochlear implant (CD CI) users, who experienced transient auditory deprivation from birth, accomplish this task.
Previous studies have suggested that CI users generally display impaired affective prosodic recognition compared to hearing control participants, irrespective of their age at deafness onset (Chatterjee et al., 2014; Luo, Fu, & Galvin, 2007; Most & Aviner, 2009). In fact, it has been proposed that the CI as a device limits the perception of vocal emotional cues, probably because of the limited range of available pitch information (Gfeller et al., 2007; Marx et al., 2015; Reiss, Turner, Karsten, & Gantz, 2014).
Research on affective facial expression recognition in CI users has revealed rather inconsistent results: Whereas some investigators have failed to find differences in recognition accuracy between early deaf CI users and controls (Hopyan-Misakyan, Gordon, Dennis, & Papsin, 2009; Most & Aviner, 2009; Most & Michaelis, 2012; Ziv, Most, & Cohen, 2013), others have reported relatively lower performance in the early deaf CI users (Wang, Su, Fang, & Zhou, 2011; Wiefferink, Rieffe, Ketelaar, De Raeve, & Frijns, 2013). “Early deafness” is commonly defined as deafness acquired before the age of 3 years. Because performance differences were mostly observed in preschool-aged children, a delayed rather than impaired acquisition of facial expression recognition abilities in early deaf CI users has been proposed (Wang et al., 2011; Wiefferink et al., 2013).
Two studies with 4–6-year-old and 10–17-year-old early deaf CI users, respectively, investigated not only unisensory, but also multisensory emotion recognition (Most & Aviner, 2009; Most & Michaelis, 2012). The results suggested that adolescent, but not preschool-aged, CI users rely more on the facial as compared to the vocal information when presented with audio-visual emotional stimuli: Whereas the adolescents gained from congruent cross-modal information relative to auditory, but not visual signals, the preschoolers displayed a cross-modal congruency gain relative to both unimodal conditions. Age at implantation did not have a significant effect in the adolescent CI users. However, neither study found performance differences between the CI users and hearing controls in the visual-only condition. Rather than a delayed acquisition of typical facial expression recognition abilities (cf. Wang et al., 2011; Wiefferink et al., 2013), these results suggest that the reliance on visual cues in the presence of cross-modal information may increase with age in CI users.
It has to be pointed out that the available evidence on visual and audio-visual emotion recognition comes from CI users with “early” (or “pre-lingual”) deafness onset, a category that includes participants with both congenital and later onset of deafness. Furthermore, the CI users’ history of sign language use has not yet been taken into account. Arguably, it is important to investigate multisensory emotion recognition (a) in a homogeneous group of CD CI users and (b) as a function of sign language experience.
Congenital sensory deprivation has been associated with larger functional changes within the intact modalities than acquired sensory loss (Hensch, 2005; Merabet & Pascual-Leone, 2010). In particular, congenital auditory deprivation has been related to specific visual function improvements, such as enhanced reactivity to visual events and improved visual processing in the periphery (for reviews, see Bavelier, Dye, & Hauser, 2006; Pavani & Bottari, 2012), as well as to more efficient tactile processing (Nava et al., 2014). Important in the present context are findings suggesting that CD individuals outperformed hearing controls in facial discrimination and facial memory (Arnold & Murray, 1998; Bettger, Emmorey, McCullough, & Bellugi, 1997; McCullough & Emmorey, 1997). Probably, both congenital deafness and the availability of a sign language from birth contribute to this superiority: For example, Arnold and Murray (1998) reported that deaf native signers outperformed hearing native signers in facial memory, whereas hearing native signers outperformed hearing non-signers. The authors argued that deafness and sign language experience (i.e., experience with linguistic facial expressions) may have additive effects in enhancing visual skills such as facial discrimination and attention to facial details.
Moreover, the development of emotion perception seems to be facilitated by early language experience. Emotion socialization theories propose that children start learning to comprehend emotions by means of exposure and by observational learning. This process presumably is enhanced by verbal instructions provided by their caregivers (McClure, 2000; Vaccari & Marschark, 1997). Based on these theories, it has been proposed that CD children of deaf parents, who acquire a sign language as a native language, are at an advantage relative to deaf children of hearing parents, who do not necessarily share a natural language with their parents from birth (cf. Gregory, 1976; Vaccari & Marschark, 1997). Moreover, there is evidence that deaf children of deaf parents outperform deaf children of hearing parents in terms of spoken language acquisition after cochlear implantation (Davidson, Lillo-Martin, & Chen Pichler, 2014; Hassanzadeh, 2012).
Based on this evidence, it might be reasoned that, first, congenital deafness as well as early sign language experience result in enhanced facial processing abilities. Second, exposure to a sign language early in life may lead to better emotion comprehension skills in CD individuals. It can thus be speculated that CI users might have an advantage over hearing controls in affective facial expression recognition—in particular when they were exposed to a sign language from birth. It has to be noted that individuals meeting the latter criterion are rare. Although many CD CI users use sign languages to some degree (Lillo-Martin, 1999), only a minority of these individuals are considered native signers, born to deaf parents (Campbell, MacSweeney, & Woll, 2014; Lillo-Martin, 1999). About 95% of deaf children are born to hearing parents, who generally know little or no sign language (Lillo-Martin, 1999; Mitchell & Karchmer, 2004; Vaccari & Marschark, 1997). Moreover, deaf parents of deaf children commonly do not have their children implanted (Mitchiner, 2015).
In the present study, we investigated auditory, visual and multisensory emotion perception in CD CI users with and without sign language exposure from birth (i.e., early-signing and late-signing CI users) and hearing, non-signing controls. We hypothesized that all CI users would perform worse in affective prosodic recognition, but better in facial expression recognition than the hearing controls and that they would rely more on the facial cues when presented with cross-modal emotional stimuli. Moreover, we hypothesized that the expected group effects in visual and multisensory emotion perception would be stronger in the early-signing CI users than in the late-signing CI users, resulting in differences between the two CI groups as well.
Method
Participants
Twenty-one CD CI users took part in this study (see Table 1 for a detailed description of the participants). All CI users reported that they were suspected to have a severe to profound hearing loss at birth, that they were ultimately diagnosed with deafness during the first 3 years of life, and that, although they were initially fitted with hearing aids, they did not benefit from the acoustic amplification for speech understanding. The age at diagnosis appears rather late, but universal newborn hearing screening was not established in Germany before 2009 (Bundesministerium für Gesundheit, 2008), and the average age at diagnosis of deafness had previously been 1.9 years (2.5 years for severe hearing loss) (Finckh-Krämer, Spormann-Lagodzinski, & Gross, 2000).
Table 1 Description of the CD CI participants

| ID | Gender | Age (years) | Age at diagnosis (months) | Age at implantation (first CI, years) | CI experience (years) | Early intervention before CI (years) | Side of CI(s) | Type of CI(s)a |
|---|---|---|---|---|---|---|---|---|
| ES CI users (n = 11) | | | | | | | | |
| 1 | f | 23 | 6 | 22 | 1 | 0 | Left | Cochlear |
| 2 | f | 20 | 18 | 10 | 10 | 0 | Both | Cochlear |
| 3 | f | 23 | 8 | 11 | 12 | 2.5 | Right | Cochlear |
| 4 | m | 31 | 12 | 29 | 2 | 6 | Right | Cochlear |
| 5 | f | 33 | 36 | 27 | 6 | 8 | Both | Cochlear |
| 6 | f | 20 | 11 | 10 | 10 | 7 | Both | Cochlear |
| 7 | m | 22 | 6 | 19 | 3 | 9 | Right | Med-El |
| 8 | f | 20 | 6 | 2 | 18 | 0 | Left | Cochlear |
| 9 | f | 22 | 12 | 3 | 19 | 0 | Right | AB |
| 10 | f | 20 | 12 | 7 | 13 | 1 | Both | Med-El |
| 11 | f | 27 | 24 | 7 | 20 | 2 | Both | AB |
| | | M = 23.73 | M = 13.73 | M = 13.36 | M = 10.36 | M = 3.23 | | |
| | | SD = 4.61 | SD = 9.21 | SD = 9.39 | SD = 6.83 | SD = 3.56 | | |
| LS CI users (n = 10) | | | | | | | | |
| 1 | f | 25 | 24 | 3 | 22 | 1 | Both | Cochlear, AB |
| 2 | m | 27 | 12 | 14 | 13 | 0.5 | Both | Cochlear |
| 3 | f | 33 | 30 | 31 | 2 | 10 | Right | Cochlear |
| 4 | f | 22 | 36 | 4 | 18 | 1 | Both | Cochlear |
| 5 | m | 27 | 24 | 12 | 15 | 0 | Right | Cochlear |
| 6 | m | 19 | 18 | 4 | 15 | 5 | Both | Med-El |
| 7 | f | 23 | 9 | 9 | 14 | 6 | Left | Cochlear |
| 8 | m | 32 | 6 | 24 | 8 | 8 | Right | Cochlear |
| 9 | f | 23 | 36 | 17 | 6 | 6 | Both | Cochlear |
| 10 | f | 27 | 12 | 8 | 19 | 5 | Both | Med-El |
| | | M = 25.80 | M = 20.70 | M = 12.60 | M = 13.20 | M = 4.25 | | |
| | | SD = 4.37 | SD = 11.00 | SD = 9.22 | SD = 6.20 | SD = 3.46 | | |

aListed are the brand names of the cochlear implants (AB = Advanced Bionics).
Eleven CD CI users were defined as “early signers” of German Sign Language (ES CI users; 2 males; mean age: 24 years, age range: 20–33 years; mean age at diagnosis of deafness: 14 months; mean age at implantation: 13 years; mean CI experience: 10 years; mean self-reported age at sign language acquisition: 1.5 years). Six of the ES CI users had two deaf parents, two had one deaf parent, and three had hearing parents with knowledge of German Sign Language. All ES CI users reported having been exposed to sign language from birth, either exclusively (participants with two deaf parents) or concurrently with spoken language (participants with one or two hearing parents). Nevertheless, the majority of ES CI users reported having additionally been enrolled in early intervention programs that focused on the development of spoken language (the traditional approach in Germany; e.g., Hennies, 2010).
Ten CD CI users were defined as “late signers” of German Sign Language. They considered spoken German (i.e., an oral-aural mode of communication), to which their parents had exposed them from birth and which was trained in early intervention programs, to be their first language (LS CI users; 4 males; mean age: 26 years, age range: 19–33 years; mean age at diagnosis of deafness: 21 months; mean age at implantation: 13 years; mean CI experience: 13 years; mean self-reported age at sign language acquisition: 16 years). The LS CI users reported having learned to sign in kindergarten and/or at school, from friends, or in private lessons.
Additionally, a group of 12 healthy, age-matched control participants was included (4 males; mean age: 25 years, age range: 22–31 years). The controls reported normal hearing and no experience with any sign language.
The ES CI users and the LS CI users did not differ in terms of age (t(18) = −1.06, p = .30; two-tailed Welch two-sample t-test), age at diagnosis of deafness (t(17) = −1.57, p = .13), age at (first) implantation (t(18) = 0.19, p = .85), duration of CI experience (t(19) = −1.0, p = .33), and duration of participation in early intervention programs (t(18) = −0.67, p = .51).
In all CI users, auditory abilities and speech recognition skills were assessed by measuring pure-tone hearing thresholds (at 500, 1,000, 2,000, and 4,000 Hz) and free-field, open-set word recognition with and without background noise (using a German standard test: Freiburger Sprachverständlichkeitstest, Hahlbrock, 1953) at 65 dB SPL (noise level: 60 dB SPL). Furthermore, all CI users completed a test of German sign language proficiency (German Sign Language Sentence Repetition Task [DGS-SRT], Rathmann & Kubus, 2015). See Table 2 for group statistics. Since the control participants reported normal hearing and no experience with German Sign Language at all, they were not tested on these tasks. For each CI participant, mean hearing thresholds were calculated by averaging the thresholds measured at the four tested frequencies. For spoken word recognition with and without background noise, respectively, accuracy rates (in percent) were computed. The DGS-SRT score was calculated as the sum of correctly repeated sentences (possible range: 0–30 points).
Table 2 Group means of hearing thresholds, spoken word recognition, sign language proficiency, and self-reported age at spoken and sign language acquisition of the ES and LS CI users
| Measure | ES CI users (n = 11) | LS CI users (n = 10) |
|---|---|---|
| Mean hearing threshold (dB)a | M = 30.80, SD = 3.63 | M = 35.69, SD = 7.53 |
| Word recognition w/o noise (%)b | M = 52.27, SD = 21.72 | M = 43.00, SD = 32.34 |
| Word recognition w/ noise (%)b | M = 25.00, SD = 18.30 | M = 27.00, SD = 20.17 |
| DGS-SRT scorec | M = 15.44, SD = 6.91 | M = 6.80, SD = 4.99 |
| Age at spoken language acquisition (years)d | M = 2.33, SD = 0.82 | M = 2.14, SD = 0.69 |
| Age at sign language acquisition (years)d | M = 1.50, SD = 0.84 | M = 16.00, SD = 7.30 |

aMean calculated across thresholds at 500, 1,000, 2,000, and 4,000 Hz.
bAssessed using the German “Freiburger Sprachverständlichkeitstest.”
cThe German Sign Language Sentence Repetition Task (DGS-SRT) consists of 30 trials. For each correct repetition of a sentence, one point is obtained (possible score: 0–30 points).
dAges at spoken and/or sign language acquisition according to self-report.
The ES and LS CI users did not differ with regard to their mean hearing thresholds (t(11) = 1.79, p = .10; two-tailed Welch two-sample t-test), and their word recognition rates without noise (t(15) = 0.76, p = .46) and with noise (t(18) = −0.24, p = .83). In contrast, the ES CI users obtained significantly higher mean scores in the DGS-SRT than the LS CI users (t(18) = 3.94, p < .001).
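For illustration only, these group comparisons can be reproduced along the following lines in base R (the authors state that R was used but do not publish scripts); the data frame `ci` and all column names are hypothetical.

```r
# Assumed data frame `ci`: one row per CI user (ES and LS groups only), with
# columns group, thr_500, thr_1000, thr_2000, thr_4000, word_rec_quiet,
# word_rec_noise, and dgs_srt (all names hypothetical).

# Per-participant mean hearing threshold across the four tested frequencies
ci$mean_threshold <- rowMeans(ci[, c("thr_500", "thr_1000", "thr_2000", "thr_4000")])

# Two-tailed Welch two-sample t-tests comparing the ES and LS CI groups
for (m in c("mean_threshold", "word_rec_quiet", "word_rec_noise", "dgs_srt")) {
  res <- t.test(ci[[m]] ~ ci$group)  # var.equal = FALSE (Welch) is the default
  cat(sprintf("%s: t(%.1f) = %.2f, p = %.3f\n",
              m, res$parameter, res$statistic, res$p.value))
}
```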
As spoken word recognition is often taken as an indicator of CI success, we additionally examined the individual scores in this regard. “CI proficiency” is commonly defined as at least 70% correct responses in an open-set word recognition test conducted without background noise (Champoux, Lepore, Gagné, & Théoret, 2009; Landry, Bacon, Leybaert, Gagné, & Champoux, 2012; Tremblay, Champoux, Lepore, & Théoret, 2010). According to this definition, all but four CI users in the present study (two ES CI users: IDs 9 and 10; two LS CI users: IDs 4 and 7) were non-proficient CI users.
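As a minimal sketch of this criterion (column names assumed as above):

```r
# Classify each CI user as "proficient" if open-set word recognition without
# background noise reaches at least 70% correct.
ci$proficient <- ci$word_rec_quiet >= 70
table(ci$group, ci$proficient)
```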
One participant in the ES CI group (ID 7) and two participants in the LS CI group (IDs 5 and 7) were bimodal CI users (i.e., they used one CI and a hearing aid on the contralateral side). These individuals were included after confirming that their performance in the different conditions of the emotion recognition task, as well as their hearing thresholds and spoken word recognition, did not deviate (i.e., was within 2 SDs) from their group's means.
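A deviation check of this kind could be sketched as follows (hypothetical helper, not the authors' code):

```r
# TRUE if a bimodal user's score lies within 2 SDs of the group mean for a
# given measure (the inclusion criterion described above).
within_2sd <- function(score, group_scores) {
  abs(score - mean(group_scores)) <= 2 * sd(group_scores)
}
```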
All CI and control participants reported normal or corrected-to-normal vision, and none had known neurological or mental impairments. The three groups did not differ significantly with respect to their years of education (F(2,30) = 1.85, p = .18; between-subjects analysis of variance [ANOVA]).
CI participants were recruited through CI support groups, online platforms, and personal contacts. Control participants were recruited from the University of Hamburg and the local community. All participants received either course credit or monetary compensation. For CI users, travel and/or accommodation costs were additionally covered.
The study was approved by the ethics board of the DGPs (Deutsche Gesellschaft für Psychologie; no. BR 022015). All participants gave written informed consent prior to the start of the experiment. If necessary, the written information was supplemented by explanations in German Sign Language from a signing experimenter.
Stimuli and Procedure
The task was adapted from Föcker, Gondan, and Röder (2011), where details of the stimulus generation and evaluation are described. Visual stimuli consisted of short video clips showing faces mouthing bisyllabic German pseudowords (“lolo,” “tete,” or “gigi”), which were presented on a computer screen at a viewing distance of 60 cm (width = 4° of visual angle; height = 9° of visual angle; Dell Latitude E6530, Dell Inc., Round Rock, TX, www.dell.com [last accessed: 10/10/2017]). The outer facial features (i.e., hair and ears) were covered to maximize attention to the inner facial features (i.e., eyes and mouth). Auditory stimuli consisted of short sound tracks of voices speaking the same bisyllabic pseudowords at a sound level varying between 65 and 72 dB SPL, presented via two loudspeakers (Bose® Companion 2® Series II, Bose Corporation, Framingham, MA, www.bose.com [last accessed: 10/10/2017]), one located on either side of the monitor. The affective facial expressions and vocal prosody used were happy, sad, angry, and neutral. Participants were provided with written instructions and a keyboard for response registration. They were instructed to respond as quickly and as accurately as possible using their dominant hand throughout the experiment. Stimulus presentation and response recording were performed using Presentation® software (Version 15.1, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com [last accessed: 10/10/2017]).
Unimodal visual and auditory stimuli were faces and voices presented alone, respectively. Cross-modal congruent trials presented face-voice pairings in which face and voice displayed the same emotion (e.g., “happy” face and “happy” voice), and cross-modal incongruent trials presented face-voice pairings in which face and voice displayed different emotions (e.g., “happy” face and “angry” voice). Note that for both congruent and incongruent trials, the video stream and audio track originated from independent recordings. This made it possible to compensate for possible minimal temporal misalignments when independent visual and audio streams were combined (see Föcker et al., 2011, for details).
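Purely for illustration, the pairing logic can be enumerated as follows (the stimulus material itself is described in Föcker et al., 2011; object names here are hypothetical):

```r
# All face-voice combinations of the four emotion categories; a pairing is
# congruent when face and voice express the same emotion, incongruent otherwise.
emotions <- c("happy", "sad", "angry", "neutral")
pairings <- expand.grid(face = emotions, voice = emotions, stringsAsFactors = FALSE)
pairings$congruent <- pairings$face == pairings$voice
```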
Each trial began with a 500 ms audio-visual warning signal (a gray circle of 2° of visual angle, combined with multi-talker babble noise) to orient the participants’ attention to the stimuli. After a variable inter-stimulus interval (ISI; 600–700 ms, uniform distribution), participants were presented with a face alone (i.e., a video stream), a voice alone (i.e., an audio track), or a face-voice pair (i.e., video and audio stream). On half of the trials, participants had to identify the facial expression and rate its intensity while ignoring the auditory input (Face task). On the other half of the trials, participants had to identify the affective prosody and rate its intensity while ignoring the visual input (Voice task). While both tasks comprised cross-modal congruent and incongruent trials, the Face task additionally included unimodal visual trials and the Voice task additionally included unimodal auditory trials.
Each stimulus was presented twice, with an ISI of 3 s. Following the first presentation, participants indicated the perceived emotional category (happy, sad, angry, or neutral) by pressing one of four adjacent marked buttons on the keyboard. After the second presentation, participants rated the intensity of the emotional expression. To this end, a visual five-point rating scale (1 = low, 5 = high) was presented on the screen 50 ms after stimulus offset and participants typed in one of the corresponding numbers on the keyboard. Up to 10 practice trials were run to familiarize the participants with the task. The experiment took on average 40 min to complete.
Data Analysis
Emotion recognition accuracy rates (in percent), reaction times (RTs, in ms), and emotion intensity ratings (possible range: 1–5) were calculated for each participant, separately for each condition (i.e., unimodal, cross-modal congruent, cross-modal incongruent) and task (i.e., Voice task, Face task). Each dependent variable was analyzed with a 3 (Group: ES CI users, LS CI users, controls) × 3 (Condition: unimodal, cross-modal congruent, cross-modal incongruent) ANOVA, separately for each sub-task (i.e., Voice task, Face task). Violations of sphericity for the repeated-measures factor Condition were accounted for by Huynh-Feldt corrections. Note that the results of both the accuracy and the RT analyses are presented in this article because congruency and incongruency effects (i.e., the difference of the cross-modal congruent and incongruent conditions, respectively, from the unimodal condition) can be and have been described for both accuracy and RTs. Indeed, due to possible speed-accuracy trade-offs, each is needed to unambiguously interpret the other. Emotion intensity rating results are provided in the Supplementary Material.
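The authors state only that R was used; as one possible implementation of this design, the afex package can fit the 3 × 3 mixed ANOVA with a Huynh-Feldt correction and generalized eta squared. The data frame `face_acc` and its column names below are assumptions for illustration.

```r
library(afex)

# Long-format data: one row per participant x condition, with columns
# participant, group (ES CI users, LS CI users, controls), condition
# (unimodal, congruent, incongruent), and accuracy (percent correct).
aov_ez(
  id          = "participant",
  dv          = "accuracy",              # or "rt" for the reaction-time analysis
  data        = face_acc,
  between     = "group",
  within      = "condition",
  anova_table = list(correction = "HF",  # Huynh-Feldt sphericity correction
                     es = "ges")         # generalized eta squared
)
```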
Post hoc tests on significant main effects were computed with two-tailed Welch two-sample t-tests (for group comparisons) and two-tailed paired t-tests (for comparisons between conditions). Significant Group × Condition interactions were followed up in two ways: condition differences were tested within each group using repeated measures ANOVAs, and group differences were tested within each condition using between-subjects ANOVAs; whenever a sub-ANOVA was significant, post hoc two-tailed paired or Welch two-sample t-tests were computed. Additionally, interaction effects were followed up by comparing the congruency and incongruency effects between the groups with between-subjects ANOVAs (and post hoc two-tailed Welch two-sample t-tests where applicable) to directly assess differences in multisensory emotion processing. The p-values of all post hoc comparisons were corrected for multiple comparisons using the Holm–Bonferroni procedure. Data analysis was performed in R (R Core Team, 2017).
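These follow-up steps can be sketched in base R as follows; `acc_wide` (one row per participant, with per-condition accuracy columns) is again a hypothetical data frame, not the authors' material.

```r
# Congruency and incongruency effects as difference scores relative to the
# unimodal condition.
acc_wide$congruency_effect   <- acc_wide$congruent   - acc_wide$unimodal
acc_wide$incongruency_effect <- acc_wide$incongruent - acc_wide$unimodal

# Between-subjects ANOVA on the congruency effect, followed by pairwise Welch
# t-tests (pool.sd = FALSE) with Holm-Bonferroni corrected p-values.
summary(aov(congruency_effect ~ group, data = acc_wide))
pairwise.t.test(acc_wide$congruency_effect, acc_wide$group,
                p.adjust.method = "holm", pool.sd = FALSE)
```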
Results
Accuracy
Face task
The mixed-effects ANOVA revealed a significant main effect of Condition (F(2,60) = 3.59, p = .04, ηG2 = 0.04) and a significant interaction effect of Group × Condition (F(4,60) = 6.25, p < .001, ηG2 = 0.14).
The repeated measures ANOVAs that were computed separately for each group to follow up on this interaction effect indicated a Condition effect for the LS CI users (Condition: F(2,18) = 4.65, p = .02, ηG2 = .34) and the controls (F(2,22) = 9.02, p < .01, ηG2 = 0.45), but not for the ES CI users (Condition: F(2,20) = 0.30, p = .74). The post hoc two-tailed paired t-tests showed that the LS CI users performed significantly worse in the unimodal condition relative to the incongruent condition (t(9) = −3.00, p = .04, d = 0.95); neither of these conditions differed significantly from the congruent condition (unimodal vs. congruent: t(9) = −1.91, p = .18; incongruent vs. congruent: t(9) = 1.26, p = .24). The controls performed significantly better in the congruent condition as compared to both the unimodal (t(11) = 3.58, p = .01, d = 1.03) and the incongruent (t(11) = 3.41, p = .01, d = 0.99) condition, whereas the unimodal and incongruent conditions did not significantly differ (t(11) = 1.74, p = .11).
The between-subjects ANOVAs that were separately computed for each condition did not yield any significant group effects (all p > .05; see Figure 1).

Figure 1. Accuracy rates (in percent) in the late-signing CI users (n = 10), early-signing CI users (n = 11), and hearing controls (n = 12), separately for task (Face task, Voice task) and condition (unimodal, congruent, incongruent). Error bars denote standard deviations. p-values indicate significant between-group differences.
Congruency and incongruency effects
The between-subjects ANOVAs on the congruency and incongruency effects for the accuracy rates (i.e., the difference values between the unimodal and the congruent and incongruent condition, respectively) indicated a significant Group effect concerning the incongruency effect (Group: F(2,30) = 7.41, p < .01, ηG2 = 0.33). There was no significant Group effect with regard to the congruency effect (Group: F(2,30) = 2.56, p = .23). The post hoc two-tailed Welch two-sample t-tests revealed that the LS CI users displayed a significantly smaller incongruency effect than controls (t(17) = 3.45, p < .01, d = 1.50) and a marginally significantly smaller incongruency effect than the ES CI users (t(15) = 2.18, p = .09). The ES CI users had a marginally significantly smaller incongruency effect than the controls as well (t(20) = 1.81, p = .09). Note that numerically both CI groups had a positive rather than negative incongruency effect (see Figure 2).

Figure 2. Congruency and incongruency effects in recognition accuracy (i.e., accuracy difference between the congruent and incongruent and the unimodal condition, respectively; in percent) in the late-signing CI users (n = 10), early-signing CI users (n = 11), and hearing controls (n = 12), displayed separately for task (Face task, Voice task). Error bars denote standard deviations. (Marginally) significant between-group differences are indicated accordingly.
Voice task
The mixed-effects ANOVA showed a significant main effect of Group (F(2,30) = 21.03, p < .001, ηG2 = 0.48), a significant main effect of Condition (F(2,60) = 97.76, p < .001, ηG2 = 0.53), and a significant interaction effect of Group × Condition (F(4,60) = 3.43, p = .02, ηG2 = 0.07).
The repeated measures ANOVAs which were separately computed for each group showed that all groups displayed a significant effect of Condition (ES CI users, Condition: F(2,20) = 40.71, p < .001, ηG2 = 0.63; LS CI users: F(2,18) = 32.35, p < .001, ηG2 = 0.42; controls, Condition: F(2,22) = 27.09, p < .001, ηG2 = 0.54). The post hoc two-tailed paired t-tests revealed that all groups performed significantly better in the congruent as compared to both the unimodal and the incongruent condition, as well as in the unimodal as compared to the incongruent condition (all p < .05, d = 1.05–2.26).
The between-subjects ANOVAs that were separately computed for each condition indicated a significant Group effect in each condition (unimodal condition, Group: F(2,30) = 22.42, p < .001, ηG2 = 0.60; congruent condition, Group: F(2,30) = 17.47, p < .001, ηG2 = 0.54; incongruent condition, Group: F(2,30) = 10.79, p < .001, ηG2 = 0.42). The post hoc two-tailed Welch two-sample t-tests showed that in the unimodal condition, the controls performed significantly more accurately as compared to both CI groups, which did not significantly differ in accuracy (ES CI users vs. controls: t(14) = −6.75, p < .001, d = 2.89; LS CI users vs. controls: t(10) = −5.60, p < .001, d = 2.60; ES CI users vs. LS CI users: t(14) = 1.13, p = .28; see Figure 1). In the congruent condition, the controls performed significantly more accurately as compared to both CI groups (ES CI users vs. controls: t(16) = −3.13, p = .01, d = 1.33; LS CI users vs. controls: t(11) = −5.40, p < .001, d = 2.47). Additionally, the ES CI users performed significantly more accurately as compared to the LS CI users (t(15) = 2.85, p = .01, d = 1.27; see Figure 1). In the incongruent condition, the controls performed significantly more accurately as compared to both CI groups, which did not significantly differ in accuracy (ES CI users vs. controls: t(18) = −3.77, p < .01, d = 1.59; LS CI users vs. controls: t(18) = −4.36, p < .001, d = 1.88; ES CI users vs. LS CI users: t(18) = 0.24, p = .81; see Figure 1).
Congruency and incongruency effects
The between-subjects ANOVAs on the congruency and incongruency effects for the accuracy rates (i.e., the difference values between the unimodal and the congruent and incongruent condition, respectively) indicated a significant Group effect for the congruency effect (Group: F(2,30) = 7.35, p < .01, ηG2 = 0.34). There was no significant Group effect with regard to the incongruency effect (Group: F(2,30) = 0.42, p = .66). The post hoc two-tailed Welch two-sample t-tests showed that the ES CI users had a significantly larger congruency gain as compared to the controls (t(15) = 4.72, p < .001, d = 2.02). The LS CI users significantly differed from neither the ES CI users (t(16) = 1.08, p = .30) nor the controls (t(11) = 2.17, p = .10; see Figure 2).
Reaction Time
Face task
The mixed-effects ANOVA revealed a significant main effect of Condition (F(2,60) = 4.76, p = .01, ηG2 = 0.01). The post hoc two-tailed paired t-tests showed that participants responded significantly faster in the incongruent as compared to the congruent condition (t(32) = −3.28, p < .01, d = 0.57). They were also marginally significantly faster in the incongruent as compared to the unimodal condition (t(32) = −2.17, p = .07). The congruent and unimodal conditions did not significantly differ (t(32) = 0.53, p = .60; see Figure 3).

Figure 3. Reaction time (RT, in ms) in the late-signing CI users (n = 10), early-signing CI users (n = 11), and hearing controls (n = 12), separately for task (Face task, Voice task) and condition (unimodal, congruent, incongruent). Error bars denote standard deviations. p-values indicate significant differences between conditions, specifying the significant main effect of Condition found in each task. Note that in the Voice task, there was an additional marginally significant main effect of Group (p = .06).
Voice task
The mixed-effects ANOVA revealed a significant main effect of Condition (F(2,60) = 4.11, p = .03, ηG2 = 0.04) and a marginally significant main effect of Group (F(2,30) = 3.30, p = .06). The post hoc two-tailed paired t-tests showed that participants responded significantly faster in the congruent as compared to the incongruent condition (t(32) = −3.40, p < .01, d = 0.59). Neither cross-modal condition significantly differed from the unimodal condition (congruent vs. unimodal: t(32) = −1.69, p = .10; unimodal vs. incongruent: t(32) = −1.16, p = .25; see Figure 3).
Discussion
In the present study, we investigated unisensory and multisensory emotion recognition in adult CD CI users as a function of whether or not they were exposed to a sign language from birth by their parents. Early-signing CD CI users (ES CI users) and late-signing CD CI users (LS CI users) were compared and both CI groups were compared to hearing, non-signing, age-matched controls. Participants performed an emotion recognition task with auditory, visual, and cross-modal emotionally congruent and incongruent speech stimuli. On different trials, they categorized either the affective facial expressions (Face task) or the affective vocal expressions (Voice task).
We hypothesized that all CI users would perform worse in affective prosodic recognition, but better in facial expression recognition than the hearing controls. Furthermore, we hypothesized that the CI users would rely more on the facial cues when presented with cross-modal (congruent and incongruent) emotional stimuli. Finally, we hypothesized that the expected effects in visual and multisensory emotion perception would be stronger in the ES CI users than in the LS CI users, resulting in differences between the two CI groups as well.
Group differences emerged in accuracy of emotion recognition, but not with regard to the reaction times, indicating that the accuracy data cannot be explained by a speed-accuracy trade-off. In contrast to our main hypothesis, the accuracy results did not show any difference between the ES CI users, LS CI users, and controls for unisensory affective facial expression recognition. However, the groups differed concerning multisensory emotion perception in the Face task and both unisensory and multisensory emotion perception in the Voice task.
In the Face task, both CI groups experienced less interference by incongruent voices on their facial expression recognition accuracy as compared to the control group, but there was no difference between the ES CI users and LS CI users regarding this incongruency effect.
The results for facial expression recognition resemble a previous report by Landry and colleagues (2012) showing that speechreading performance is less impaired by incongruent speech sounds in non-proficient CI users than in both proficient CI users and hearing controls. In line with this finding, several other studies have suggested that limited CI proficiency in terms of auditory speech recognition relates to a stronger focus on visual speech information (Champoux et al., 2009; Song et al., 2015; Tremblay et al., 2010). Arguably, limited affective prosodic perception with the CI due to low CI proficiency may analogously relate to a stronger reliance on facial expressions when faced with cross-modal emotion information.
The results of the Voice task indeed showed that both CI groups performed worse than the hearing controls in recognizing affective prosody, which is in accordance with previous studies and our corresponding hypothesis (cf. Chatterjee et al., 2014; Luo et al., 2007; Marx et al., 2015; Most & Aviner, 2009).
The cross-modal conditions of the Voice task revealed additional group differences in multisensory emotion perception: The ES CI users displayed a larger congruency effect than the controls (i.e., gained more from cross-modal congruent relative to unimodal auditory stimuli) and attained higher absolute affective prosodic recognition accuracy in the presence of congruent facial expressions than the LS CI users. Notably, the two CI groups did not differ in their intensity rating of the affective prosody (see Supplementary Material).
The voice recognition data suggest that early sign language experience may benefit the ability to link vocal and congruent facial stimuli in order to boost the accuracy of affective prosodic perception. Considering that we did not find enhanced facial expression recognition capabilities in the early signers, a possible reason might be that early sign language experience is associated with increased attention to the lower part of the face (i.e., the mouth region). In support of this idea are previous studies that investigated gaze patterns during identity and facial emotion judgments in CD native signers and hearing controls (Letourneau & Mitchell, 2011; Mitchell, Letourneau, & Maslin, 2013). The results showed that the CD native signers fixated the lower part of the faces more often and for a longer duration than the controls in both tasks. Mitchell et al. (2013) were, however, not able to determine whether the native sign language experience or the congenital deafness was responsible for the observed changes in gaze patterns. In showing a difference between the ES and LS CI users, the results of the present study point to at least an additional effect of early sign language experience. However, we cannot disentangle the effects of early sign language experience and sign language proficiency, given that our ES CI users also had significantly higher proficiency in German Sign Language than the LS CI users.
Another explanation for the relatively stronger interaction of congruent auditory and visual signals during affective prosodic recognition in ES CI users may be the impact of early language experience per se. As proposed by emotion socialization theories, early language experience is important for the full development of “emotion knowledge,” including emotion understanding, elicitation, expression, and coping (cf. McClure, 2000; Vaccari & Marschark, 1997). According to these theories, infants start learning about emotions through exposure and observational learning. However, as soon as children acquire language, their emotion knowledge is thought to be elaborated mainly through instructions provided by their caregivers (concerning, e.g., attention to salient emotional cues and analyses of social interactions; cf. McClure, 2000; Vaccari & Marschark, 1997). Emotion socialization theories would thus predict an advantage for ES CI users.
Alternatively, it could be argued that the difference we observed between the ES CI users and LS CI users may stem from differences in CI proficiency, that is, from relatively better auditory abilities in the ES CI users. In fact, there is evidence for a beneficial role of early sign language experience for later CI success (cf. Davidson et al., 2014; Hassanzadeh, 2012; Tomasuolo, Valeri, Di Renzo, Pasqualetti, & Volterra, 2013). However, neither the spoken word recognition results nor the measured hearing thresholds of the ES and LS CI users differed, speaking against the idea that CI proficiency mediated the differences we found in multisensory emotion recognition. In fact, the CI groups did not significantly differ on any of the control variables (i.e., age, age at diagnosis of deafness, age at [first] implantation, duration of CI experience, and duration of participation in early intervention programs, as well as spoken word recognition and hearing thresholds). These null results support a grouping based solely on the age at sign language acquisition and suggest that these variables do not account for the reported differences between the CI groups in multisensory emotion recognition. Although there was considerable inter-individual variability in the age at implantation and the duration of CI experience, this was true for both CI groups and thus cannot explain any of the reported group differences either. It has to be noted, however, that the generalizability of our results is limited by the small sample sizes of the groups we investigated (n = 10–12 per group) and the overall small effect sizes of the interaction effects involving Group and Condition (Bakeman, 2005; Cohen, 1988).
In conclusion, our study reports evidence that early sign language experience affects multisensory emotion recognition in CD CI users: higher cross-modal performance gains for the recognition of affective prosody were observed for CD CI users with early sign language experience.
Supplementary Data
Supplementary data are available at The Journal of Deaf Studies and Deaf Education online.
Funding
This work was supported by the European Research Council [ERC-2009-AdG 249425-CriticalBrainChanges to B.R.] and grant “Crossmodal Learning” of the City of Hamburg.
Conflicts of Interest
No conflicts of interest were reported.
Acknowledgments
We thank all our participants for their time and effort in taking part and recommending our study to others. We thank Julia Föcker for providing us with the paradigm. Moreover, we thank Reza Muliawarman for his help in assessing auditory thresholds and word recognition abilities and Christian Rathmann for providing us with the German Sign Language Sentence Repetition Task (DGS-SRT). We are thankful to Lisa Bickelmayer and Antonia Ricke for their evaluation of the DGS-SRT data. Additionally, we thank Jonas Straumann from the online magazine “hearZONE” as well as the administrators of all (social media) support groups for CI users who promoted our study.