Abstract

This paper, the fourth in a series concerned with the level of access afforded to students who use educational interpreters, focuses on the intelligibility of interpreters who use Signing Exact English (SEE). Eight expert receivers of SEE were employed to evaluate the intelligibility of transliterated messages that varied in accuracy and lag time. Results of intelligibility tests showed that, similar to Cued Speech transliterators, (a) accuracy had a large positive effect on transliterator intelligibility, (b) overall intelligibility (69%) was higher than average accuracy (58%), and (c) the likelihood that an utterance reached 70% intelligibility was a somewhat sigmoidal function of accuracy, dropping off fastest for accuracy values <65%. Accuracy alone accounted for 53% of the variance in transliterator intelligibility; mouthing was identified as a secondary factor that explained an additional 11%. Although lag time accounted for just .4% of the remaining variance, utterances produced with lag times between .6 and 1.2 s were most likely to exceed 70% intelligibility. With 36% of the variance still unexplained, other sources of transliterator variability (for example, facial expression, nonmanual markers, and mouth/sign synchronization) may also play a role in intelligibility and should be explored in future research.

This paper is the fourth in a series concerned with the level of access afforded to students who use educational interpreters. The aim of the work is to identify factors affecting clarity of the “visual signal”1 that interpreters produce, in as many communication modes as possible. It is based on the premise that a clear visual signal is a necessary prerequisite for understanding an interpreter’s message (just as a clear auditory/speech signal is a necessary prerequisite for understanding a speaker’s message), and that a clear visual signal depends on two channels in the communication pathway: (a) accuracy, or the percentage of the original message correctly produced by the interpreter, and (b) intelligibility, or the percentage of the original message that can be correctly received by deaf persons who are proficient in the language and communication mode used by the interpreter. The first two papers in the series (Krause & Tessler, 2016; Krause & Lopez, 2017) focused on educational interpreters who use Cued Speech (CS; Cornett, 1967), a visual communication system that uses manual cues (handshapes and placements) near the face in synchrony with the mouth movements of speech to disambiguate phonemes confusable through speechreading alone. The third paper (Krause & Murray, 2019) examined accuracy of interpreters who use Signing Exact English (SEE; Gustason, Pfetzing, & Zawolkow, 1972), and in this article, we examine their intelligibility.

Developed in the early 1970s, SEE is an invented sign system that provides visual access to English morphology through a combination of American Sign Language (ASL) signs, invented signs, and signed representations of English affixes. The goal of SEE is to represent English vocabulary and syntax as literally as possible by establishing a one-to-one mapping between signs and English words. Signed representations of English affixes are used to make English words relating to the same concept (for example, electric, electrical, electrician, electricity, and nonelectrical) visually distinct (Nielsen, Luetke, & Stryker, 2011), and invented signs are used in order to (a) represent English grammatical words that do not exist in ASL (for example, “the”) and (b) differentiate English synonyms that correspond to the same ASL sign. Because of the direct mapping between signs and English words, each English word can be represented with signs in only one way and vice versa, which means that intelligibility can be measured quantitatively, with a high degree of resolution, simply by tabulating the number of differences between the original spoken message and the message received by deaf individuals who are highly proficient in SEE. Using methods analogous to those we have used previously in this series (for example, Krause & Lopez, 2017), such data can then be used to determine the extent to which various factors, such as accuracy and lag time, affect intelligibility of SEE transliterators2, which is the purpose of this paper.
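Because each English word maps to exactly one sign (and vice versa), both accuracy and intelligibility reduce to word-level percent-correct tabulations. The sketch below is a minimal illustration in Python, using hypothetical word lists and a simple in-order alignment; the scoring rules actually used in this series are described under Method.

    # Minimal sketch (not the studies' actual program): word-level percent-correct,
    # using an in-order (longest common subsequence) alignment between the
    # reference word sequence and the received message.
    def percent_correct(reference_words, received_words):
        m, n = len(reference_words), len(received_words)
        # lcs[i][j] = longest common subsequence length for the first i reference
        # words and the first j received words
        lcs = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if reference_words[i - 1].lower() == received_words[j - 1].lower():
                    lcs[i][j] = lcs[i - 1][j - 1] + 1
                else:
                    lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
        return 100.0 * lcs[m][n] / m if m else 0.0

    # Hypothetical example: 7 of 8 original words received correctly -> 87.5%
    original = "all the information is contained within the seed".split()
    received = "all the information is contained in the seed".split()
    print(percent_correct(original, received))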

Intelligibility

Although intelligibility is important for characterizing the level of access afforded to deaf individuals who use interpreters, the factors that affect it have not been widely studied; moreover, intelligibility itself has rarely been measured directly. Instead, most of what is known comes from methods of assessment that focus on the message the interpreter produces (rather than the message the consumer receives). Most notably, the Educational Interpreter Performance Assessment, or EIPA (Schick & Williams, 1994), has proven to be a valid and reliable (Schick, Williams, & Bolster, 1999) research tool for examining the quality of messages produced by educational interpreters who use ASL, Manually Coded English systems3, Pidgin Signed English (PSE)4, and CS (EIPA-CS; Krause, Kegl, & Schick, 2008). Studies using the EIPA have reported data for >2,000 educational interpreters nationwide (for example, Schick, Williams, & Kupermintz, 2006); the results of these studies have provided valuable data regarding accuracy and other aspects of interpreter performance.

In contrast, very little is known about intelligibility of interpreters. To our knowledge, the only data available come from the second paper in this series (Krause & Lopez, 2017), which evaluated factors affecting the intelligibility of 12 CS transliterators. In that study, intelligibility was measured using a percent-correct metric; average intelligibility was 75% and ranged from 52% to 90% across individual transliterators. In this study, we follow the same approach; intelligibility is measured using a percent-correct metric, and the effects of both accuracy and lag time on SEE transliterator intelligibility are evaluated.

Accuracy. Of all aspects of the visual signal, accuracy is perhaps the one most likely to affect intelligibility. In CS transliterators, for example, accuracy (that is, %-correct cues) has been shown to explain 26% of the variance in message intelligibility (Krause & Lopez, 2017). For SEE transliterators, the relationship between accuracy (that is, %-correct signs) and intelligibility has not previously been explored; however, some information is available regarding accuracy itself. In the third paper of this series, Krause & Murray (2019) reported an average accuracy of 42% for 12 SEE transliterators (with varying experience levels, averaged over three different rates of presentation); that is, of the sign sequence that would be expected to represent the original message in SEE, just 42% of the signs were correctly produced, on average, by the transliterators in the study. Omissions were the most frequent type of error (40%), followed by substitutions (10%), misproductions (6%), and attempts to paraphrase (2%). Insertions occurred relatively infrequently, accounting for an extraneous 2% beyond the expected sign sequence.

Although the overall accuracy was quite low, it is worth noting that such accuracy measurements do not directly reflect how accessible each phrase would be to a deaf receiver with fluent English skills who is proficient in SEE. In other words, it cannot be assumed that 42% accuracy corresponds to 42% intelligibility. Indeed, our earlier study of CS transliterators (Krause & Lopez, 2017) found that overall message intelligibility was >10 percentage points higher than average accuracy (72% versus 61%). Thus, it is likely that intelligibility is somewhat higher than accuracy, particularly given that some aspects of the transliterated message (that is, mouthing and paraphrase) could provide information that may help deaf consumers recognize words even when errors are present, at least in some circumstances (Krause & Murray, 2019). Also, for certain types of errors (for example, misproductions and some types of substitutions), it may be possible for deaf consumers to compensate for or adjust to a transliterator’s signing style, much like a listener can adjust to a talker’s accent. One final point worth noting is that the relationship between accuracy and intelligibility is not necessarily linear; in CS, for example, the relationship is sigmoidal (Krause & Lopez, 2017), or S-shaped, similar to what has been observed for speech intelligibility in relation to physical properties of the speech waveform (for example, Wilson & Strouse, 1999). Yet, it cannot be assumed that a similar relationship holds for all types of interpreters and transliterators. Research is needed to determine the nature of the relationship between accuracy and intelligibility for SEE transliterators.

Lag time. Another factor that could affect SEE transliterator intelligibility—at least indirectly—is lag time, or the average delay (in seconds) between the spoken message and the transliterated message. Although the relationship between intelligibility and lag time has not been explored for most types of interpreters, more is known about the relationship between accuracy and lag time. For ASL interpreters, Cokely (1986) observed that longer lag times were associated with increased accuracy. In contrast, both SEE and CS transliterators exhibited the opposite effect; that is, increased lag time was associated with decreases in accuracy (Krause & Murray, 2019; Krause & Tessler, 2016). In both cases, however, the relationship was quite weak, with lag time accounting for just 8% of the variance in accuracy of SEE transliterators and only 3% of the variance in accuracy of CS transliterators. For SEE transliterators, whether this weak inverse relationship is preserved in the relationship between lag time and intelligibility or obscured by other factors is not yet known. In either case, it may be possible to identify a lag time or range of lag times that optimizes SEE transliterator intelligibility. In CS transliterators, for example, no relationship between lag time and intelligibility could be detected, but an optimal range of lag times between .6 and 1.8 s was identified (Krause & Lopez, 2017). It is unknown whether this range of lag times would be optimal for SEE transliterator intelligibility, particularly given that average lag times for SEE transliterators (3.36 s; Krause & Murray, 2019) are considerably longer than those for CS transliterators (1.86 s; Krause & Tessler, 2016). To answer this question, the effect of lag time on intelligibility must be examined for SEE transliterators.

Present Study

In this study, eight highly skilled receivers of SEE were presented with visual stimuli excised from transliterated messages produced for an earlier study (Krause & Murray, 2019) on SEE transliterator accuracy. Receivers were asked to transcribe the stimuli, and intelligibility was measured as the percentage of words correctly received. For each stimulus, two characteristics of the visual signal were derived from previous measurements of the transliterated messages: (a) accuracy (in percent-correct), or the proportion of the target sign sequence correctly produced by the transliterator, and (b) lag time (in seconds), or the average delay between the spoken message and the transliterated message. The goal of the experiment was to determine the effects of accuracy and lag time on intelligibility.

Method

Participants

In order to evaluate the intelligibility of the transliterated messages, eight adults who were highly skilled receivers of SEE were recruited by advertising through The Signing Exact English Center (http://www.seecenter.org/). All (six females and two males; age range: 22–26 years) reported English as a first language, possessed at least a high school education, and had no known visual acuity problems. The definition of a “highly skilled” SEE receiver was consistent with that used in a previous study of CS receivers (Krause & Lopez, 2017). Specifically, each participant (SEE-R01–SEE-R08) was required to meet the following criteria: (a) introduced to SEE before age 10, (b) used SEE receptively (or receptively and expressively) at home (with at least one parent) and at school (through a teacher or SEE transliterator) before age 18, and (c) had at least 10 years of experience using SEE. In addition, participants were required to pass a receptive SEE proficiency screening. The screening consisted of five conversational English sentences obtained from a list of Clarke sentences (Magner, 1972) that were presented in SEE one at a time without repetition. The sentences were signed with 100% accuracy and did not include any audio. Participants were required to transcribe, verbatim, at least 90% of the words correctly in order to pass the screening.

As in Krause & Lopez (2017), participants were also screened for basic proficiency in written English, using the Expressive Written Vocabulary section of the Test of Adolescent and Adult Language Third Edition (TOAL-3; Hammill, Brown, Larsen, & Wiederholt, 1994). Given that the experimental format relied upon participants providing written English responses, participants were required to score within one standard deviation of age-appropriate normative data. Because TOAL-3 normative data do not include deaf and hard-of-hearing individuals, this screening procedure ensured that all participants, regardless of level of hearing loss, possessed written English skills on par with those of typical high school graduates. All eight participants who were recruited for the study met this criterion and also passed the receptive SEE proficiency screening.

Finally, participants were asked to complete a survey regarding communication background and level of hearing loss. The information collected from this survey and the screening results for each of the eight participants are summarized in Table 1.

Table 1

Language, education, and communication background of participants

Participant | SEE-R01 | SEE-R02 | SEE-R03 | SEE-R04 | SEE-R05 | SEE-R06 | SEE-R07 | SEE-R08
Age (years) | 24 | 26 | 25 | 25 | 25 | 22 | 26 | 24
Gender | M | F | F | M | F | F | F | F
Education | Bachelor’s | Some college | Some college | Bachelor’s | Bachelor’s | Bachelor’s | Master’s | Bachelor’s
Hearing loss | Profound | Profound | Profound | Profound | Profound | Profound | Profound | Profound
SEE receptive screening (%) | 92.7 | 97.6 | 97.6 | 97.6 | 90.2 | 100 | 97.6 | 100
TOAL-3 percentile | 84th | 84th | 84th | 84th | 75th | 95th | 50th | 75th
First language | English | English | English | English | English | English | English | English
Age first SEE exposure | 1 year | 6 months | 6 months | 18–20 months | Birth | 1–2 years | 20 months | 2 years
SEE home use (during childhood) | At least one parent, always | Both parents, always | Both parents, always | Both parents (always before age 6, as needed thereafter) | At least one parent, always | Both parents, always | Both parents, daily | Both parents, always
SEE school use (during childhood) | preK-12 | preK-12 | preK-12 | preK-8, some HS | K-12 | K-12 | preK-12 | preK-9
SEE experience (years) | 23 | 25 | 24 | 23 | 20+ | 20–21 | 24 | 18
Preferred communication mode | English/SEE | SEE | ASL and written English | Oral English | PSE | ASL | SEE | English
Fluency, other (age of initial exposure, years) | None | None | ASL (15) | None | ASL (18) | ASL (teen years) | CASE (15) | None

Materials

Intelligibility stimuli were generated from SEE materials recorded for an earlier study on the factors affecting accuracy of 12 SEE transliterators (SEE-T01–SEE-T12; Krause & Murray, 2019). In that study, the 12 participants were asked to transliterate lecture materials presented at three different speaking rates: a slow-conversational rate of 88 words per minute (wpm), a normal-conversational rate of 109 wpm, and a fast-conversational rate of 137 wpm. The audio-only lecture was derived from a 25-min educational film designed for use in a high school setting, entitled Life Cycle of Plants (Films for the Humanities, 1989). The content of the film focuses on key topics in plant growth and reproduction and includes some specialized vocabulary pertaining to plants (for example, names of plant species). A lecture version of the material was created by re-recording the audio narration of the film; a male talker read the film transcript, using a lecture-style delivery with deliberate pauses at phrase boundaries. The entire lecture was then presented to each SEE transliterator in three segments, always beginning with the slowest rate and finishing with the fastest rate. The segments were counterbalanced across speaking rates (that is, at a particular speaking rate, each of the three segments was presented to one-third of the transliterators; see Krause & Murray, 2019, for details). The purpose of counterbalancing was to minimize any accuracy effects that might be due to differences in difficulty associated with transliterating a particular segment of the lecture (rather than differences in speaking rate).

Similar to our previous study on CS transliterator intelligibility (Krause & Lopez, 2017), the goal for the current study was to extract intelligibility stimuli with a wide range of accuracy scores produced at a single speaking rate so that the effect of various levels of accuracy on intelligibility could be examined independent of speaking rate. Accordingly, materials elicited at the slow-conversational speaking rate were used to generate all intelligibility stimuli because SEE transliterators at this speaking rate exhibit a wide range in accuracy, with phrase level scores ranging from 0 to 100% (Krause & Murray, 2019). For each of the three lecture segments, materials were available at this speaking rate from four different SEE transliterators (Segment 1: SEE-T01, SEE-T02, SEE-T07, and SEE-T08; Segment 2: SEE-T03, SEE-T04, SEE-T09, and SEE-T10; Segment 3: SEE-T05, SEE-T06, SEE-T11, and SEE-T12). Thus, materials were available from all 12 SEE transliterators, with any given sentence of the lecture produced by a subset of four transliterators.

Stimulus preparation. Materials appropriate for use in the intelligibility experiment were prepared with the same methods used for a previous intelligibility experiment in this series (Krause & Lopez, 2017); video clips consisting of short utterances were excised from the recordings of each SEE transliterator, using Adobe Premiere Pro 1.5, resulting in roughly 75 video clips per SEE transliterator. The video clips were typically extracted at points in the audio narration where the talker had deliberately paused to signify a natural phrase boundary. In some cases, however, the anticipated break point was not available, either because the SEE transliterator did not pause as expected or because the mouth movements produced by the SEE transliterator led or lagged the corresponding signs to such an extent that they overlapped with signs from the previous or next phrase and obscured the break point. In these cases, two consecutive phrases or short sentences were either combined or divided at alternate break points, provided that the modified utterances remained semantically appropriate and did not contain >12 words. This upper limit on length was imposed to constrain the number of bits of unrelated information in the utterance in order to maximize the likelihood that participants would be able to remember the exact utterance (Miller, 1956) long enough to transcribe it.

In order to facilitate stimulus selection, four properties of the utterance produced in each video clip were documented: (a) lag time, (b) accuracy, (c) average key word accuracy, and (d) percent-mouthed (that is, the percentage of the source message that was mouthed by the transliterator). Lag time, accuracy, and percent-mouthed were all either available or easily derived from data that was previously collected for an earlier study on the relationship between accuracy and lag time (Krause & Murray, 2019); key word accuracy was determined by calculating the average accuracy for each key word and then averaging the word-level accuracies of all key words in the utterance. Key words were identified by a panel of experts in sign transliteration as content words that are required for full comprehension of the meaning of the sentence by deaf consumers (Kile, 2005).
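A rough sketch of how two of these per-utterance properties could be derived is shown below, assuming hypothetical per-word records (sign accuracy, an expert-assigned key word flag, and a mouthing flag) carried over from the earlier accuracy study; the field names and data format are illustrative, not those of the original data set.

    # Sketch under an assumed data format: each word of the source utterance is a
    # dict such as {"word": "seed", "accuracy": 100, "key": True, "mouthed": True}.
    def utterance_properties(words):
        key_words = [w for w in words if w["key"]]
        # (c) key word accuracy: average of the word-level accuracies of all key words
        key_word_accuracy = (sum(w["accuracy"] for w in key_words) / len(key_words)
                             if key_words else None)
        # (d) percent-mouthed: percentage of source words mouthed by the transliterator
        percent_mouthed = 100.0 * sum(1 for w in words if w["mouthed"]) / len(words)
        return key_word_accuracy, percent_mouthed

    # Hypothetical two-word utterance for illustration
    example = [{"word": "the", "accuracy": 100, "key": False, "mouthed": True},
               {"word": "seed", "accuracy": 0, "key": True, "mouthed": True}]
    print(utterance_properties(example))   # (0.0, 100.0)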

Stimulus selection. At the slow-conversational rate, a total of roughly 900 video clips were available for use in this study, consisting of approximately four instances (one per SEE transliterator) of each of the roughly 225 phrases in the audio narration (3 lecture sections × 4 SEE transliterators per section × ~ 75 phrases per section/transliterator). Four unique stimulus sets (that is, groups of video clips) were constructed using as many of these clips as possible, with each set designed in such a way that all phrases from the audio narration could be presented to the receiver in order. The four stimulus sets were counterbalanced across participants in order to minimize any possible stimulus-specific effects on the accuracy–intelligibility relationship.

As in Krause & Lopez (2017), the primary goal in selecting the stimuli for each stimulus set was to obtain a wide range in accuracy scores, with a relatively uniform distribution from 0 to 100%. This distribution allowed the relationship between accuracy and intelligibility to be assessed across a full range of accuracy scores. Similarly, clips were also selected with a variety of lag times, so that the effect of different lag times on intelligibility could be evaluated. Figure 1 and the top panel of Figure 2, respectively, show that these goals were achieved in each of the four stimulus sets. In addition, the lower panel of Figure 2 shows that all 12 SEE transliterators were represented in relatively equal proportions within and across stimulus sets. Lastly, a secondary goal of stimulus selection was to—as much as possible—distribute clips with similar accuracy scores or with similar lag times to different locations in the lecture.

Figure 1. Accuracy and key word accuracy score distribution for stimuli in each of the four stimulus sets. Each data point represents one stimulus item, and one stimulus set is shown per panel (Panel A: Set 1, Panel B: Set 2, Panel C: Set 3, and Panel D: Set 4).

Figure 2. Top panel: distribution of lag time values for stimuli in each of the four stimulus sets, with number of stimuli shown for each .5-s range of lag times (that is, .5 represents lag times greater than zero and ≤.5 s). Lower panel: number of stimuli representing each SEE transliterator across the four stimulus sets.

Because it was necessary for each stimulus set to contain all phrases from the narration in order, there were some limitations on the number of clips available that met particular criteria. Most notably, fewer clips were available at some lag times (especially in the 0–1 s and >6 s ranges) than others. Similarly, the number of available clips for a particular phrase was sometimes reduced because the phrase was omitted by one or more transliterators. In addition, the accuracy and lag time values associated with each SEE transliterator were characteristic properties that could not be manipulated. In order to balance these constraints, it was necessary for some clips to be selected for more than one stimulus set; specifically, 190 clips were used in two stimulus sets, 49 clips were used in three sets, and 13 clips were used in all four stimulus sets. Of the remaining stimuli, 405 were unique to a single stimulus set, which resulted in the intelligibility evaluation of a total of 657 video clips.

Procedures

All procedures were identical to those used previously for a study on the intelligibility of CS transliterators (Krause & Lopez, 2017). Participants were tested individually at a computer in a sound-treated room at the University of South Florida. Presentation sessions were 2 hr in length, with one 15-min and two 10-min breaks per session. Participants were also encouraged to take additional breaks as needed in order to maximize attention and minimize any possible fatigue effects.

At the first session, the English language and receptive SEE screenings were administered, and then instructions were given to participants in written English and in sign (SEE or PSE) as needed. The participants were then presented with a practice stimulus set in order to become familiar with the experimental setup and procedures. Throughout the practice set, participants were given the opportunity to ask procedural questions as needed. When the practice set was finished, the experimental stimuli were presented.

Stimulus presentation. In order to provide context, stimulus items were preceded by short video scenes from the educational film, Life Cycle of Plants (that is, the source material for the transliterated lecture), presented in the same order that they appeared in the film. At the conclusion of each scene, one or more stimulus items were presented containing the audio narration for that scene. The video scene as well as the subsequent stimulus item(s) were visual-only and contained no audio. Participants were asked to transcribe, verbatim, each stimulus item (that is, video clip consisting of one phrase from the transliterated message) by typing into a response box. A customized user interface implemented in MATLAB (Mathworks, 2007) software allowed participants to control the rate of presentation of the stimuli so that they had as much time as needed to complete each response. Repetitions, however, were not permitted; participants viewed each video (that is, each scene from the film or stimulus item) only once.

Due to the goals of stimulus selection, consecutive stimulus items were not necessarily produced by the same SEE transliterator. Yet, each of the three lecture sections was limited to materials from just four SEE transliterators; as a result, participants had a chance to become familiar with each set of four transliterators throughout the course of a lecture section.

Subjective ratings. Upon completion of each lecture section, participants were asked to rate each of the four SEE transliterators from that section by completing a short survey. The survey, which included pictures of the four SEE transliterators for reference, required participants to select the one transliterator that they perceived to be most effective for that section and the one that they perceived to be least effective. In addition, participants were instructed to characterize how they would feel about using each of the SEE transliterators in a real-life situation by choosing from three possible ratings: “Very comfortable,” “OK,” or “Concerned I might miss something.” Finally, participants had a free-response option to comment about anything they particularly liked or disliked about each SEE transliterator.

Scoring. Two types of intelligibility scores were tabulated: original message (OM) intelligibility, or the percentage of the original spoken message that was correctly received by the participant, and transliterated message (TM) intelligibility, or the percentage correctly received of just that portion of the message that was actually produced by the transliterator. The purpose of OM intelligibility was to provide an overall measure of intelligibility that reflects how much access deaf receivers actually receive from a transliterated lecture, while the purpose of TM intelligibility was to describe the intelligibility of an individual SEE transliterator’s message, even if that message differed from the original message (either because the transliterator rephrased the material or omitted words or sequences of words).

To measure intelligibility, percent-correct scores were computed using a computer program that tabulated the proportion of words in agreement between each of the typed responses and the corresponding source message (for OM intelligibility) or the transliterated message (for TM intelligibility). For each response, three types of scores were calculated: (a) original message (OM)—all words, (b) original message (OM)—key words, and (c) transliterator message (TM)—key words. As described above, key words were identified by a panel of experts in sign transliteration as content words that are required for full comprehension of the meaning of the sentence by deaf consumers (Kile, 2005). Each section of the lecture had a total of between 1,475 and 1,483 words, 760 to 766 of which were identified as key words by the panel. For example, in the sentence, “All the information is contained within the seed,” the OM—all word total was eight words, and the OM key word total was the three words that the panel designated as necessary for understanding the sentence (information, contained, and seed). TM—key word totals varied according to how the transliterator signed the sentence and which of these key words were included.
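The three scores differ only in the reference word list against which a response is tallied. A simplified sketch follows, using the example sentence above; the typed response, the assumption that this transliterator omitted "contained," and the crude order-insensitive tally are illustrative stand-ins for the program's actual alignment rules.

    # Simplified sketch: the same tally applied to three different reference lists.
    def fraction_received(reference, response):
        remaining = list(response)
        hits = 0
        for word in reference:            # crude tally; the studies' program aligned
            if word in remaining:         # responses with the reference word order
                hits += 1
                remaining.remove(word)
        return 100.0 * hits / len(reference)

    original_all = "all the information is contained within the seed".split()
    original_key = ["information", "contained", "seed"]   # panel-designated key words
    transliterated_key = ["information", "seed"]          # assume "contained" was omitted
    response = "the information is in the seed".split()   # hypothetical typed response

    om_all_word = fraction_received(original_all, response)        # OM, all words
    om_key_word = fraction_received(original_key, response)        # OM, key words
    tm_key_word = fraction_received(transliterated_key, response)  # TM, key words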

For all three types of scoring, an experimenter reviewed the scoring output of the computer program and gave credit for any obvious spelling and typographical errors as well as homophonous words (for example, “mail” for “male”). For key word grading, credit was also given for morphological errors that involved the wrong affix (for example, “running” for “run” or “ran”) as well as for contractions when both words were target words (for example, “won’t” for “will not”) and vice versa. Lastly, the correctness criteria were loosened slightly for proper nouns and names of plant species, since many of these words were difficult to spell. That is, the response was simply required to sound phonetically similar to the target word in order for credit to be given; for all word scoring, the number of syllables was also required to match that of the target word.
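These leniency rules were applied by an experimenter during review; the sketch below only approximates them. The character-similarity threshold (via Python's standard difflib), the affix list, and the stemming heuristic are assumptions chosen for illustration, and homophone or contraction credit would require additional lookup tables omitted here.

    import difflib

    AFFIXES = ("ing", "est", "ed", "er", "es", "s")   # illustrative, not exhaustive

    def crude_stem(word):
        for affix in AFFIXES:
            if word.endswith(affix) and len(word) > len(affix) + 2:
                return word[: -len(affix)]
        return word

    def counts_as_correct(response_word, target_word, key_word_scoring=False):
        response_word, target_word = response_word.lower(), target_word.lower()
        if response_word == target_word:
            return True
        # credit obvious spelling/typographical errors: high character overlap
        # (the 0.8 threshold is an assumption standing in for experimenter judgment)
        if difflib.SequenceMatcher(None, response_word, target_word).ratio() >= 0.8:
            return True
        if key_word_scoring:
            # credit wrong-affix morphological variants, e.g., "running" for "run"
            # (irregular forms such as "ran" would need a real lemmatizer)
            s1, s2 = crude_stem(response_word), crude_stem(target_word)
            if s1.startswith(s2) or s2.startswith(s1):
                return True
        return False

    print(counts_as_correct("runing", "running"))                       # spelling slip -> True
    print(counts_as_correct("running", "run", key_word_scoring=True))   # affix error -> True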

Results

Table 2 summarizes the accuracy, lag time, and other physical characteristics such as key word accuracy (average accuracy of key words) and percent-mouthed (percentage of words in the original message mouthed by the transliterator) of the stimuli presented in the experiment, as well as the corresponding intelligibility results (OM—all word, OM—key word, and TM—key word) for each participant, on average. This information confirms that the physical characteristics of the stimuli selected for all four stimulus sets (each set presented to two participants5) were well balanced across participants—not only with respect to average accuracy scores and average lag time but also with respect to average key word accuracy scores and average percent-mouthed scores. In other words, the average key word accuracy of the stimuli was roughly the same for each participant, with average key word accuracy between 70% and 73% for each participant’s stimulus set, and the average percent-mouthed scores of the stimulus sets varied by only 2 percentage points (80–82%). In addition, as designed, each participant viewed a stimulus set that averaged 57–59% accuracy across all stimuli presented, with average lag times per stimulus set ranging from 3.45 to 3.57 s, a difference of <3.5%.

Table 2

Stimulus characteristics and intelligibility by participant, averaged over stimulus set, with range for individual stimulus items

Participant | Accuracy^a (%) | Key word accuracy^b (%) | Lag time (s) | Percent-mouthed (%) | OM all word (%) | OM key word (%) | TM key word (%)
SEE-R01 | 58 | 70 | 3.57 | 81 | 67 | 76 | 86
SEE-R02 | 58 | 70 | 3.57 | 81 | 69 | 79 | 91
SEE-R03 | 59 | 73 | 3.54 | 82 | 73 | 83 | 91
SEE-R04 | 59 | 73 | 3.54 | 82 | 72 | 85 | 94
SEE-R05 | 58 | 73 | 3.55 | 80 | 61 | 73 | 82
SEE-R06 | 58 | 73 | 3.55 | 80 | 73 | 83 | 94
SEE-R07 | 58 | 73 | 3.45 | 82 | 67 | 80 | 91
SEE-R08 | 57 | 73 | 3.56 | 82 | 72 | 82 | 92
Grand average | 58 | 72 | 3.54 | 81 | 69 | 80 | 90
Range | 0–100 | 0–100 | .66–10.37 | 0–100 | 0–100 | 0–100 | 0–100

Note. Accuracy, key word accuracy, lag time, and percent-mouthed are characteristics of the stimuli each participant received; OM all word, OM key word, and TM key word are the corresponding intelligibility scores.

^a Frequency of correct signs (%), based on sign sequence expected from original message.

^b Average accuracy (%-correct signs) of key words.


As Table 2 shows, the overall intelligibility for these stimuli (averaged across all SEE transliterators and all participants) was 69% for all words in the original message, 80% for key words in the original message, and 90% for key words in the transliterated message. Although participants had varying absolute performance levels, with scores differing by 12–13 percentage points within conditions (for example, in the OM all word condition, average intelligibility ranged from SEE-R05 at 61% to SEE-R03 at 73%), the relative intelligibility of the conditions remained the same across individual participants. Specifically, intelligibility scores were highest for the TM—key word measure and lowest for the OM—all word measure (TM key word > OM key word > OM all word) for all participants (regardless of their absolute performance levels).

A comparison of the average accuracy of the stimuli presented to each participant with the average intelligibility that the participant obtained also revealed a consistent pattern across participants. The OM—all word intelligibility score obtained by a participant was always greater—substantially greater, in most cases—than the accuracy average of the stimuli; the advantage was 11 points on average (69% versus 58%) and ranged from 3 to 15 points across individual participants. A similar relationship held for key words in that the OM—key word intelligibility score obtained by each participant was equal to or higher than the average key word accuracy of the stimuli; this advantage was 8 points on average (80% versus 72%) and ranged from 6 to 12 points for 7 of the 8 participants. Thus, it appears that the relationship between accuracy and intelligibility is largely unaffected by the type of word in the sentence.

Individual SEE Transliterator Results

As Table 3 shows, the intelligibility scores for individual SEE transliterators were consistent with the overall intelligibility results: TM intelligibility scores were the highest of the three intelligibility measures, followed by OM—key word intelligibility, and then, OM—all word intelligibility. In addition, intelligibility scores in all three conditions (last three columns of Table 3) were higher than accuracy (first data column of Table 3) for all individual SEE transliterators. This difference was substantial, with OM—all word intelligibility scores >10 percentage points higher than accuracy for 9 of 12 SEE transliterators and with the remaining two intelligibility conditions (OM—key word and TM—key word) higher than accuracy for all 12 transliterators (notably, OM—key word intelligibility and TM—key word intelligibility were also higher than key word accuracy—second data column of Table 3—for all 12 transliterators). Interestingly, the only SEE transliterator who was not able to obtain an OM—all word intelligibility score that was at least five points higher than accuracy was the one with the highest overall accuracy (SEE-T05), raising the possibility that transliterator accuracy and intelligibility may converge as a transliterator’s accuracy reaches higher levels.

Table 3

Stimulus characteristics and intelligibility by SEE transliterator, averaged over stimulus set

Transliterator | Accuracy^a (%) | Key word accuracy^b (%) | Lag time (s) | Percent-mouthed (%) | OM all word (%) | OM key word (%) | TM key word (%)
SEE-T01 | 58 | 69 | 4.98 | 86 | 72 | 79 | 91
SEE-T02 | 69 | 78 | 3.90 | 87 | 76 | 83 | 92
SEE-T03 | 60 | 79 | 4.00 | 70 | 65 | 81 | 84
SEE-T04 | 50 | 76 | 3.01 | 74 | 61 | 81 | 88
SEE-T05 | 78 | 84 | 1.89 | 93 | 81 | 88 | 92
SEE-T06 | 33 | 52 | 3.90 | 75 | 54 | 68 | 81
SEE-T07 | 60 | 75 | 2.43 | 89 | 81 | 88 | 91
SEE-T08 | 63 | 67 | 3.85 | 83 | 73 | 79 | 93
SEE-T09 | 58 | 79 | 3.44 | 80 | 71 | 87 | 94
SEE-T10 | 32 | 54 | 3.34 | 63 | 50 | 73 | 88
SEE-T11 | 53 | 64 | 4.98 | 83 | 65 | 71 | 96
SEE-T12 | 65 | 79 | 2.62 | 91 | 79 | 88 | 93

Note. Accuracy, key word accuracy, lag time, and percent-mouthed are characteristics of the selected stimuli; OM all word, OM key word, and TM key word are the corresponding intelligibility scores.

^a Frequency of correct signs (%), based on sign sequence expected from original message.

^b Average accuracy (%-correct signs) of key words.


While the results of individual SEE transliterators followed the same pattern as the overall results, there was a much larger range of intelligibility scores for the individual SEE transliterators than for the individual participants in each condition. In the OM—all word condition, for example, intelligibility scores varied by 31 percentage points, from 50% (SEE-T10) to 81% (SEE-T05). Considering that the accuracy averages also varied widely (Table 3), it is possible to get a rough indication of the relationship between accuracy and intelligibility by examining the individual SEE transliterators’ average accuracy and average intelligibility scores.

As expected, the SEE transliterators’ intelligibility scores followed accuracy; that is, the SEE transliterators with higher accuracy averages also obtained higher intelligibility scores, while those with lower accuracy averages obtained lower intelligibility scores. In order to see this pattern, Table 4 shows the rank order of SEE transliterators based on accuracy averages as well as the rank order of transliterators based on average intelligibility scores. A comparison of these rank orderings reveals that accuracy and intelligibility rankings are very similar. For example, SEE-T05 is ranked highest of the SEE transliterators in both accuracy and intelligibility, while SEE-T10 is ranked lowest in both of these categories. Overall, accuracy rankings and intelligibility rankings were either the same or very similar (within 1 or 2 rankings) for all but two SEE transliterators, SEE-T03 and SEE-T07. Ranked fifth and sixth in accuracy, respectively, these SEE transliterators had nearly the same average accuracy (both rounding to 60%) but markedly different intelligibility ranks. SEE-T03 ranked eighth in intelligibility (three ranks below her accuracy ranking), while SEE-T07 was ranked second, achieving nearly the same intelligibility score as SEE-T05 (intelligibility scores for both rounded to 81%), a SEE transliterator whose accuracy was considerably higher. The reason for this unexpectedly high intelligibility is unknown and suggests that, similar to previous reports for CS transliterators (Krause & Lopez, 2017), there are factors beyond accuracy that can be used to improve SEE transliterator intelligibility.

In addition to the transliterator rankings for accuracy and intelligibility, Table 4 includes the rank order of SEE transliterators by the percentage of words in the source message that were mouthed by the transliterator. It is worth noting that this metric does not capture mouthshape clarity (which depends on many factors that are difficult to quantify); nonetheless, it provides a rough indication of the visual accessibility of the transliterator’s mouthshapes and reveals a possible explanation for the difference between accuracy and intelligibility rankings for SEE-T07 and SEE-T03. That is, SEE-T07, who had an unexpectedly high intelligibility ranking, also had a relatively high mouthing ranking (third), whereas SEE-T03 had a much lower mouthing ranking (11th) that could help to explain her lower intelligibility ranking. Thus, it is possible that the percentage of words mouthed by the SEE transliterator is another factor that may play a role in intelligibility.

The last column of Table 4 shows subjective rankings for each transliterator, derived from receiver ratings. To obtain the subjective rankings, point values were assigned to each participant’s responses (1.25 points for each “Very comfortable” rating, .625 points for each “OK” rating, and 0 points for each “Concerned I might miss something” rating), yielding a composite rating from 0 (when all eight receivers rated the SEE transliterator with “Concerned I might miss something”) to 10 (when all eight receivers rated the SEE transliterator with “Very comfortable”). For most of the SEE transliterators, ranks based on subjective receiver ratings generally agreed with accuracy and intelligibility ranks. The three exceptions to this pattern included SEE-T01 and SEE-T02, who were subjectively ranked much lower than their respective accuracy and intelligibility rankings, and SEE-T11, who was ranked much higher than her accuracy and intelligibility rankings. Taken together, the subjective rankings suggest that receivers highly value the traits of accuracy and intelligibility but may also value (to a lesser extent) traits other than those measured in this study.
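For reference, the composite rating arithmetic can be expressed compactly; the ratings in the example below are hypothetical and serve only to show how the 0–10 scale arises from eight receivers.

    # 1.25 points per "Very comfortable", .625 per "OK", 0 per "Concerned I might
    # miss something", summed over the eight receivers (possible range: 0-10).
    POINTS = {"Very comfortable": 1.25, "OK": 0.625, "Concerned I might miss something": 0.0}

    def composite_rating(ratings):
        return sum(POINTS[r] for r in ratings)

    # Hypothetical example: 5 x 1.25 + 2 x 0.625 + 1 x 0 = 7.5
    example = ["Very comfortable"] * 5 + ["OK"] * 2 + ["Concerned I might miss something"]
    print(composite_rating(example))   # 7.5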

Table 4

SEE transliterator rankings by accuracy, intelligibility, percent-mouthed, and subjective ratings

Ranking | Accuracy | OM all word intelligibility | Percent-mouthed | Subjective ratings
1st | SEE-T05 (78%) | SEE-T05 (81%) | SEE-T05 (93%) | SEE-T05 (10)
2nd | SEE-T02 (69%) | SEE-T07 (81%) | SEE-T12 (91%) | SEE-T08 (8.125)
3rd | SEE-T12 (65%) | SEE-T12 (79%) | SEE-T07 (89%) | SEE-T11 (8.125)
4th | SEE-T08 (63%) | SEE-T02 (76%) | SEE-T02 (87%) | SEE-T07 (7.5)
5th | SEE-T03 (60%) | SEE-T08 (73%) | SEE-T01 (86%) | SEE-T12 (7.1875)
6th | SEE-T07 (60%) | SEE-T01 (72%) | SEE-T08 (83%) | SEE-T09 (6.875)
7th | SEE-T01 (58%) | SEE-T09 (71%) | SEE-T11 (83%) | SEE-T03 (5)
8th | SEE-T09 (58%) | SEE-T03 (65%) | SEE-T09 (80%) | SEE-T04 (4.375)
9th | SEE-T11 (53%) | SEE-T11 (65%) | SEE-T06 (75%) | SEE-T10 (4.375)
10th | SEE-T04 (50%) | SEE-T04 (61%) | SEE-T04 (74%) | SEE-T02 (3.125)
11th | SEE-T06 (33%) | SEE-T06 (54%) | SEE-T03 (70%) | SEE-T01 (1.25)
12th | SEE-T10 (32%) | SEE-T10 (50%) | SEE-T10 (63%) | SEE-T06 (.625)

Effect of Accuracy on Intelligibility

In order to examine more closely the relationship between SEE transliterator accuracy and message intelligibility, scatterplots relating the accuracy and intelligibility of individual stimulus items were examined for three combinations of measures: (a) OM—all word intelligibility versus accuracy, (b) OM—key word intelligibility versus accuracy, and (c) OM—key word intelligibility versus key word accuracy. As expected, there was a positive relationship between accuracy and intelligibility in all three cases; a Spearman’s rank order test of correlation6 confirmed the statistical significance of each relationship. Two of the relationships were moderate in strength (OM—key word intelligibility versus accuracy: Spearman’s rho = .451, P < .001; OM—key word intelligibility versus key word accuracy: Spearman’s rho = .576, P < .001), while the relationship between accuracy and OM—all word intelligibility, shown in Figure 3, was the strongest (Spearman’s rho = .704, P < .001). The strength of the linear relationship between these two variables was higher than expected, with accuracy accounting for 53% of the variation in OM—all word intelligibility scores. Moreover, it was stronger than the corresponding linear relationship previously obtained for CS, where cue accuracy accounted for just 26% of variance in OM—all word intelligibility scores (Krause & Lopez, 2017). Given the strength of the relationship, the linear fitting function plotted in Figure 3 provides a good prediction, on average, of how intelligibility changes with accuracy for SEE transliterators.
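For readers who wish to reproduce this type of analysis, a brief sketch follows using SciPy and NumPy; the arrays are synthetic stand-ins for the per-stimulus accuracy and OM—all word intelligibility scores, so the printed values will not match the reported statistics.

    import numpy as np
    from scipy import stats

    # Synthetic per-stimulus data (stand-ins for the 657 stimulus items)
    rng = np.random.default_rng(0)
    accuracy = rng.uniform(0, 100, size=657)
    intelligibility = np.clip(0.7 * accuracy + 25 + rng.normal(0, 15, size=657), 0, 100)

    rho, p_value = stats.spearmanr(accuracy, intelligibility)    # rank-order correlation
    r = np.corrcoef(accuracy, intelligibility)[0, 1]             # Pearson r for the linear fit
    slope, intercept = np.polyfit(accuracy, intelligibility, 1)  # linear fitting function
    print(f"rho = {rho:.3f} (p = {p_value:.3g}), R^2 = {r**2:.2f}, "
          f"fit: intelligibility = {slope:.2f} * accuracy + {intercept:.1f}")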

Figure 3. Relationship between accuracy and intelligibility of individual stimulus items (each open circle represents one instance of a stimulus item) for OM all word intelligibility.

As we have discussed previously, however, estimating intelligibility from the linear fitting function is not always appropriate for individual stimulus items (Krause & Lopez, 2017). The reason is that the range of intelligibility scores corresponding to a particular accuracy can be quite large; for example, individual stimulus items with 50% accuracy ranged in intelligibility from 0 to 100% (Figure 3). This variation means that the intelligibility of individual stimulus items can, in some cases, be quite different from the intelligibility predicted by the linear fitting function. As we have previously described (Krause & Lopez, 2017), a more practical tool for describing the relationship between accuracy and intelligibility is a likelihood function, or a function that characterizes the likelihood of reaching a particular intelligibility score at a given accuracy.

Figure 4 shows the accuracy–intelligibility likelihood function for an (OM—all word) intelligibility of 70%, calculated over 10-point accuracy intervals. In other words, this function shows the proportion of individual stimulus items within each 10-point accuracy interval that had intelligibility scores ≥70%, thereby approximating the probability that a particular participant would receive 70% or more of the words in a particular stimulus item correctly, for any stimulus item in the interval. As Figure 4 shows, the accuracy–intelligibility likelihood function is somewhat sigmoidal in shape, showing less change in intelligibility likelihood at the ends of the accuracy scale (0–45% accuracy and 65–100% accuracy) and a greater change in the middle of the scale, where the slope is steepest. In this region (45–65% accuracy), changes in accuracy cause more rapid changes in intelligibility likelihood. In other words, the likelihood of reaching 70% intelligibility declines more quickly once accuracy drops below 65%.
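The likelihood function itself is straightforward to compute; the sketch below bins stimulus items by accuracy in 10-point intervals and reports the proportion reaching 70% intelligibility, and the same calculation with a bin width of .6 gives the lag time version used in the next section. The binning convention (lower bound exclusive, upper bound inclusive) and the per-stimulus score arrays are assumptions for illustration.

    import numpy as np

    def likelihood_function(x, intelligibility, bin_width, threshold=70.0):
        """Proportion of stimulus items with intelligibility at or above `threshold`,
        within consecutive (lower, upper] bins of x (accuracy or lag time)."""
        x = np.asarray(x, dtype=float)
        intelligibility = np.asarray(intelligibility, dtype=float)
        edges = np.arange(0.0, x.max() + bin_width, bin_width)
        centers, proportions = [], []
        for lower, upper in zip(edges[:-1], edges[1:]):
            in_bin = (x > lower) & (x <= upper)
            if in_bin.any():
                centers.append((lower + upper) / 2.0)
                proportions.append(float(np.mean(intelligibility[in_bin] >= threshold)))
        return centers, proportions

    # e.g., accuracy version (Figure 4): likelihood_function(accuracy, om_all_word, 10)
    #       lag time version (Figure 5): likelihood_function(lag_time, om_all_word, 0.6)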

Effect of Lag Time on Intelligibility

Figure 5 shows a scatterplot relating the lag time and OM—all word intelligibility of individual stimulus items. A negative relationship between the two variables reached statistical significance (Spearman’s rho = −.273, P < .001), indicating that OM—all word intelligibility decreased as lag time increased. The relationship, however, was weak and explained just 4% of the variance in intelligibility. In order to examine the relationship further, an intelligibility likelihood function is also shown in Figure 5. This function represents the proportion of stimulus items with intelligibility scores >70%, calculated using lag time intervals of .6 s.

Figure 4. Proportion of data with >70% intelligibility as a function of accuracy. Each filled circle represents the proportion of data points that reach 70% or higher intelligibility scores within each 10-point range in accuracy. Dashed line shows the identity function (where proportion of data with >70% intelligibility equals accuracy) for reference. Accuracy and intelligibility of individual stimulus items (open circles) are also plotted for reference.

Figure 5. Likelihood that individual stimulus items have >70% intelligibility (OM all word) as a function of lag time. Each filled circle represents the proportion of data points that reach 70% or higher intelligibility scores within each .6-s range in lag times. Lag time and intelligibility of individual stimulus items (open circles) are also plotted for reference.

The shape of the lag time–intelligibility likelihood function in Figure 5 shows that intelligibility scores of >70% were most likely to occur for lag times between .6 and 1.2 s (that is, lag times associated with the .6-s interval centered on .9 s). A very high proportion (.92) of stimulus items produced with lag times in this range were associated with intelligibility scores of >70%, suggesting that this range of lag times is optimal for SEE transliterators (at least for the materials and speaking rate used in this study). The likelihood of intelligibility scores >70% remained very high (.79) for lag times between 1.2 and 1.8 s (that is, lag times associated with the .6-s interval centered on 1.5 s) and fairly high (.66 to .68) for intermediate lag times ranging from 1.8 to 3.0 s (that is, lag times associated with the .6-s intervals centered on 2.1 and 2.7 s, respectively). As lag time increased beyond 3.0 s, however, intelligibility likelihood decreased substantially; the likelihood of reaching at least 70% intelligibility was just .42 to .47 for stimulus items with lag times between 3.0 and 6.6 s. The lowest intelligibility likelihood, .33, occurred for stimulus items with lag times between 6.6 and 7.2 s (that is, the interval centered on 6.9 s), suggesting that very long lag times are likely to be detrimental to intelligibility.

Role of Mouthing

The percentage of words mouthed by the transliterator was identified as another possible factor in intelligibility, based on the individual transliterator results described above. To examine the possible role of this factor alongside the effects of accuracy and lag time, a stepwise linear regression analysis was performed. The independent variables in the analysis were accuracy, lag time, and percent-mouthed, and the dependent variable was OM—all word intelligibility. The results of the analysis, shown in Table 5, revealed that accuracy was indeed the primary factor in intelligibility, accounting for 53% of the variance (P < .001). Mouthing explained an additional 11% of the variance (P < .001) after accuracy was taken into account, and lag time accounted for just .4% (P = .023) of the remaining variance. Together, the three variables explained 64% of the variance in intelligibility (P < .001).

Table 5

Summary of stepwise multiple regression analysis

Variable           R-square change    Significance
Accuracy           .526               P < .001
Percent-mouthed    .110               P < .001
Lag time           .003               P = .023
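The hierarchical entry of predictors summarized in Table 5 can be illustrated with the short sketch below. It is an illustration only, not the authors’ analysis code: it assumes a hypothetical pandas DataFrame df with columns 'accuracy', 'mouthed', 'lag_time', and 'intelligibility', enters the predictors in the order shown in Table 5, and uses the statsmodels library. For a single added predictor, the reported coefficient p-value is equivalent to the incremental F test on the R-square change.

    import pandas as pd
    import statsmodels.api as sm

    def r_square_changes(df, predictors, outcome):
        """Enter predictors one at a time (in the given order) and report the
        increase in R-square, plus the p-value of the newly added predictor."""
        results, previous_r2 = [], 0.0
        for k in range(1, len(predictors) + 1):
            X = sm.add_constant(df[predictors[:k]])
            fit = sm.OLS(df[outcome], X).fit()
            newest = predictors[k - 1]
            results.append((newest, fit.rsquared - previous_r2, fit.pvalues[newest]))
            previous_r2 = fit.rsquared
        return results

    # Example (entry order mirrors Table 5):
    # for name, delta_r2, p in r_square_changes(
    #         df, ['accuracy', 'mouthed', 'lag_time'], 'intelligibility'):
    #     print(f"{name}: R-square change = {delta_r2:.3f}, p = {p:.3f}")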

Discussion

The results of this study show that average intelligibility of messages selected from 12 SEE transliterators conveying high-school level educational materials to highly skilled SEE receivers was 69% of all words in the original message. This level of intelligibility was considerably higher than message accuracy on average and for all individual transliterators; specifically, 69% of the English words in the spoken English source message were successfully transmitted to the deaf participants on average, even though just 58% of the message was signed correctly. Intelligibility of key words in the original message was higher, 80% on average, suggesting that message content was preserved more effectively than English structure. Finally, intelligibility was highest for key words in the transliterated message, demonstrating that highly skilled SEE receivers can obtain very good reception (90% on average), even when confronted with some errors in sign production (not all words in the transliterated message were signed with 100% accuracy). However, two points are worth noting. First, intelligibility of the transliterated message was still less than the intelligibility level obtained by typical hearing listeners (98% of all words and 99% of key words in the original message; Tope, 2008). Second, although this study was not designed to characterize the average intelligibility of these 12 SEE transliterators or SEE transliterators in general (but rather, to determine if and how factors such as accuracy and lag time are related to intelligibility), these intelligibility values are specific to the stimuli selected for this study, which had an average accuracy of 58%. The average intelligibility of the 12 transliterators overall is likely to be somewhat lower in all three categories because their overall average accuracy was just 42% (Krause & Murray, 2019). In any case, consistent with previously reported results for CS transliterators (Krause & Lopez, 2017), the very high transliterator message intelligibility (90%) is encouraging and suggests that if SEE transliterator accuracy were to increase, communication of the original message could be improved.

That said, the primary purpose of this study was to investigate the effects of two factors, accuracy and lag time, on the intelligibility of messages produced by SEE transliterators. Although both factors showed a statistically significant relationship with intelligibility, lag time had a much weaker effect. The negative relationship between lag time and intelligibility (decreased intelligibility with increased lag time) explained very little of the variance, especially after accounting for the effect of accuracy. However, an examination of the lag time–intelligibility likelihood function reveals a good/better/best pattern for lag times ranging from 0 to 3 s: the likelihood of reaching at least 70% intelligibility was good for 1.8–3.0 s lag times, better for 1.2–1.8 s lag times, and best for .6–1.2 s lag times. This suggests an optimal lag time range for SEE transliterators of .6 to 1.2 s, with lag times of up to 3 s still having a good chance of high intelligibility (that is, >70%).

Notably, this optimal lag time range is fairly similar to the one obtained for CS transliterators presented with the same materials (.6 to 1.8 s; Krause & Lopez, 2017) and considerably shorter than the average lag time reported for two highly accurate ASL interpreters presented with different materials (4 s; Cokely, 1986). Thus, the data so far are consistent with Cokely’s (1986) hypothesis that lag time is likely to be proportional to structural differences between the source and target languages (which are much larger for ASL interpreters working between two languages than for CS and SEE transliterators working between two forms of the same language). However, additional research is needed (for example, using similar source materials for all three groups) in order to confirm this hypothesis because the ASL interpreters in Cokely’s study had different source material (English plenary presentation sessions at a national conference), and source material characteristics such as vocabulary, complexity of language used, and speaking rate are likely to affect optimal lag time. Finally, it is important to note that such characteristics not only vary across different source materials but are also likely to vary (at least to some extent) within one set of materials. Thus, the optimal lag time range reported for SEE transliterators, while optimal on average for the materials used in this study, may not necessarily be optimal for every sentence in the lecture. Rather, an inspection of the individual lag time–intelligibility likelihood functions for each sentence in the lecture is likely to reveal several different optimal lag time ranges for different types of sentences. However, such an analysis is not possible from the data collected in this study because each sentence was presented just once per stimulus block, resulting in too few data points for construction of sentence-level lag time–intelligibility functions.

Although lag time played only a small role in intelligibility, accuracy had a large effect. The positive relationship between accuracy and intelligibility (increased intelligibility with increased accuracy) accounted for 53% of the variance in transliterator intelligibility. Moreover, the strength of the linear relationship between these two variables was greater than expected, given that the same relationship explained just 26% of the variance for CS transliterators (Krause & Lopez, 2017). However, it is possible that some of the difference in strength of the relationship could be explained by how the accuracy measurements are made. Accuracy was measured at the morpheme level for SEE and at the cue level for CS (the basic unit of each system, respectively), while intelligibility was measured at the word level. The result is that both strength-of-relationship estimates are conservative (perhaps to different degrees), since it is possible that more variance would be explained in each case if both accuracy and intelligibility were measured in the same manner. Regardless, it is clear that accuracy has a strong effect on SEE transliterator intelligibility.

Although accuracy is the predominant factor in SEE transliterator message intelligibility, mouthing of words also plays an important role. A stepwise linear regression analysis showed that mouthing accounted for an additional 11% of the variance (P < .001) beyond that explained by accuracy and lag time. This effect can also be seen in the individual SEE transliterator data. In 10 of 12 cases, transliterators’ accuracy and intelligibility ratings were very similar; in the remaining two cases (SEE-T03 and SEE-T07), differences can be explained by the transliterators’ rankings in percentage of words mouthed.

Taken together, a linear combination of accuracy, mouthing, and lag time explains a large portion of the variance (64%) in SEE transliterator intelligibility, but 36% of the variance remains unexplained. This is evident in Figure 4 (and Figure 5) in that a particular accuracy (or a particular lag time) is associated with a wide range of intelligibility scores. For example, stimulus items with 50% accuracy were associated with intelligibility ranging from 0 to 100%, and even 100% accuracy did not guarantee 100% intelligibility. Although the vast majority (94%) of stimulus items produced with 100% accuracy was at least 70% intelligible, just 53% were received by participants with 100% intelligibility. Taking all three dimensions (accuracy, mouthing, and lag time) into account reduces this variability but does not eliminate it altogether. Indeed, considering only “optimal” stimulus items that were produced with 100% accuracy, 100% mouthing, and lag times between .6 and 1.2 s, it is still the case that only 71% were received by participants with 100% intelligibility. This remaining variability has important scientific implications.
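The conditional tabulation described above can be expressed compactly. The sketch below is a hypothetical illustration (not the authors’ code) using the same assumed DataFrame df as in the regression sketch, with accuracy, mouthing, and intelligibility in percent and lag time in seconds.

    import pandas as pd

    def proportion_fully_intelligible(df):
        """Among 'optimal' items (100% accuracy, 100% mouthing, lag time between
        .6 and 1.2 s), return the proportion received with 100% intelligibility."""
        optimal = df[(df['accuracy'] == 100) &
                     (df['mouthed'] == 100) &
                     df['lag_time'].between(0.6, 1.2)]
        return float((optimal['intelligibility'] == 100).mean())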

One implication is that intelligibility cannot be judged or predicted from accuracy alone (or even from a combination of accuracy, mouthing, and lag time); rather, it is more appropriate to estimate intelligibility likelihood. In this study, intelligibility likelihood calculated as a function of accuracy was somewhat sigmoidal in shape, with the likelihood of reaching 70% intelligibility dropping off fastest as accuracy falls <65%. Notably, the same knee point has been reported for CS transliterators (Krause & Lopez, 2017). In addition, the accuracy–intelligibility likelihood functions for SEE and CS have very similar slopes (2:1 change from 45 to 65% accuracy) and left–right shifts (SEE: 55% accuracy; CS: 53% accuracy), indicating similar variability and task difficulty for both groups. However, it must be noted that specific characteristics of the sigmoidal shape (such as slope and left–right shift) are likely to reflect the conditions of the experiment (Krause & Lopez, 2017), at least to the extent that accuracy–intelligibility functions are analogous to psychometric functions that have been used to document the influence of various factors on speech reception (for example, Wilson & Strouse, 1999). Therefore, it should not be assumed that 65% accuracy will always be associated with a relatively good intelligibility likelihood (that is, a relatively good likelihood of reaching at least 70% intelligibility) or that the accuracy–intelligibility likelihood relationship for SEE and CS transliterators will be similar in all circumstances. Rather, the effect of different variables (for example, speaking rate, nature of speech materials, transliterator characteristics, receiver characteristics, etc.) on the accuracy–intelligibility relationship should be assessed, not only for SEE transliterators but also for messages produced by other types of transliterators and interpreters. As a research tool, the accuracy–intelligibility likelihood function has great potential for making such assessments. It could even be used to examine individual differences between transliterators (or interpreters) or receivers, if experiments could be constructed to elicit sufficient numbers of stimulus items across the full accuracy range from individual transliterators.
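As an illustration of how the slope and left–right shift of an accuracy–intelligibility likelihood function might be quantified, the sketch below fits a logistic curve (one plausible sigmoidal form) to binned likelihood data. The arrays and starting values are hypothetical placeholders, and this is only an illustrative approach, not the procedure used in this study or in Krause and Lopez (2017).

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(x, midpoint, slope):
        """Likelihood of >70% intelligibility as a function of accuracy (%)."""
        return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

    # Hypothetical binned data: accuracy bin centers (%) and the proportion of
    # items in each bin with >70% intelligibility.
    # accuracy_pct = np.array([35, 45, 55, 65, 75, 85, 95])
    # p_above_70 = np.array([0.10, 0.25, 0.50, 0.75, 0.85, 0.90, 0.94])
    # (midpoint, slope), _ = curve_fit(logistic, accuracy_pct, p_above_70, p0=[55.0, 0.1])
    # The fitted midpoint characterizes the left-right shift of the function and
    # the slope parameter its steepness near the knee point.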

Lastly, with 36% of the variance in this study still unexplained after accounting for the effects of accuracy, mouthing, and lag time, it is worth considering other sources of variability for their possible role in intelligibility. For example, several sources of variability that may influence intelligibility stem from receiver characteristics, including (but not limited to) communication background, attention, fatigue, experience, and processing strategies, such as inter-subject differences in the degree of reliance on mouth movements and/or sign clarity. Another possible factor is the receiver’s background knowledge regarding the subject material (that is, the plant life cycle). Marschark, Sapere, Convertino, and Pelz (2008) have shown that a priori content knowledge affects deaf individuals’ comprehension of lecture material in academic settings, with more a priori knowledge associated with increased comprehension. While comprehension was not the focus of this study, it is likely that there is a similar effect for intelligibility; that is, receivers with more background knowledge regarding the subject matter are likely to recognize a higher proportion of words in the lecture (specialized vocabulary, for example), resulting in a higher intelligibility score.

In addition to receiver characteristics, other sources of variability affecting intelligibility are most likely linked to differences in transliterators. Some examples of transliterator factors that could explain additional variance in intelligibility include presentation rate (Fischer, Delhorne, & Reed, 1999), facial expression, nonmanual markers, mouth and sign synchronization, prosody (that is, timing and emphasis), intra-transliterator factors (for example, fatigue, attention, and nervousness), and dialect (dialect may be apparent in selection of signs as well as mouth movements). To further our understanding of transliterator intelligibility, it is important to isolate and quantify the contribution of as many of these (receiver and transliterator) factors as possible in future studies.

Implications for Practitioners

Given that receiver characteristics cannot be controlled in real-life interpreting situations, it is transliterator factors that are of primary interest to practitioners. If these factors can be controlled with transliterator training, then it should be possible to improve the overall message intelligibility of SEE transliterators. Based on what is currently known, SEE transliterator training should focus heavily on sign accuracy, while also giving substantial attention to mouthing; lag time need not be heavily emphasized, but unduly long lag times (>3 s) should be avoided when possible. However, it must be acknowledged that intelligibility is affected by more than just these three factors, since even an optimal combination of 100% accuracy, 100% mouthing, and .6–1.2 s lag time is insufficient to guarantee 100% intelligibility in every situation. Therefore, it is also important for some elements of SEE transliterator training to focus on other factors expected to increase intelligibility, such as those discussed above.

For assessment of SEE transliterators, the implications of this study are similar to those previously reported for CS transliterators (Krause & Lopez, 2017). That is, any assessment should include a direct evaluation of intelligibility (using highly skilled SEE receivers), in addition to evaluation of transliterator accuracy and other skills. In addition, materials that are comparable (in speed and difficulty) to materials in the transliterator’s work environment should be used for assessment, since the accuracy–intelligibility likelihood function is expected to change with environment. A final point is that even when intelligibility likelihood is high (because transliterators are operating with near-perfect accuracy and consistent mouthing), intelligibility of difficult material (like the materials used in this study) can still vary from utterance to utterance. Consequently, consumers should be encouraged to request repetition when necessary to maintain message clarity in these situations.

Conclusions

The results of the present study provide valuable quantitative information regarding factors that affect the message intelligibility of SEE transliterators. By using methods similar to the second study in this series (Krause & Lopez, 2017) to examine transliterators’ messages across a wide range of accuracies and lag times, it was established that the primary factor affecting intelligibility is accuracy, which accounted for 53% of the variance. In addition to the strong relationship between the two variables, average intelligibility was higher than average accuracy for all individual transliterators, and 10 or more percentage points higher in the majority of cases. A secondary factor, percent of words mouthed, accounted for an additional 11% of the variance in intelligibility, while lag time had only a weak effect. However, the frequency of intelligibility scores >70% suggested that lag times ranging from .6 to 1.2 s are most likely to be associated with optimal intelligibility, although lag times up to 3 s retain a reasonable likelihood of intelligibility >70%. Although these three variables account for the majority of the variance in intelligibility, 36% remains unexplained. While some portion of this variability may be random or attributable to receiver characteristics (which are difficult to control), the remainder is likely due to transliterator factors. Such factors are particularly important to understand because they can potentially be adjusted by transliterators to improve intelligibility.

Of all the transliterator factors identified in this study, speechreadability is of particular interest because of its relationship to mouthing. As a more refined measurement of transliterator mouthing than the one used in this study, speechreadability carries the potential to explain a larger portion of the variance in intelligibility than mouthing alone and thus warrants further investigation. In addition, other transliterator factors that may influence intelligibility (for example, presentation rate, facial expressions, nonmanual markers, etc.) should be systematically examined in future research in order to evaluate the extent to which each one affects intelligibility. Such research could help determine the relative weighting of each factor’s contribution to intelligibility, which could then be used to improve validity and efficiency of training and assessment. Moreover, as this study has shown, relative weightings can vary considerably depending on communication mode, which underscores the value of examining each communication mode separately. Therefore, it is important to extend this work to educational interpreters who use communication modes such as Conceptually Accurate Signed English (that is, CASE, a method of transliteration that represents English, to the extent possible, with ASL signs alone; no invented signs are employed for English affixes or concepts without ASL signs; see Winston, 1989) as well as ASL in order to increase our understanding of the factors needed to ensure source message accessibility in every modality.

Funding

National Institute on Deafness and Other Communication Disorders (National Institutes of Health grant number 5 R03 DC 007355).

Acknowledgments

The authors wish to thank Kendall Tope Beaudry and John Lum for assistance in stimulus creation, and Nancy Murray, Andrea Smith, and Steven Surrency for donated transliteration services used to develop screening and practice items.

Notes

1 The term “visual signal” used here is a shortened version of Battison’s (1978) observation that sign language is “a manually produced, visually received signal” while speech is “an orally produced, auditorily received signal.”

2 Whereas the function of an interpreter is to translate between two languages (for example, spoken English and ASL), the function of a transliterator is to transfer information between two modes of the same language (for example, spoken English and either signed English or cued English).

3 Manually coded English (MCE) includes any invented sign system designed to represent English visually through the use of signs borrowed from ASL and other invented signs. Examples of MCE systems include SEE, Signed English (Bornstein, 1990), and Seeing Essential English (Luetke-Stahlman & Milburn, 1996).

4 Also known as contact signing, PSE is a form of signing that serves as a middle ground between signers who are fluent in ASL and those who are not (and thus have more English influence in their signing). In contrast to MCE systems, which are invented, PSE occurs naturally in conversation as needed.

5 Due to experimenter error, SEE-R07 was presented a small amount of stimuli (about 10%) from the wrong stimulus set. The remainder of the stimulus set was presented correctly and was identical to the stimulus set presented to SEE-R08.

6 This non-parametric test of correlation was employed because the data are not normally distributed; as is visible on the graphs, the distribution of the accuracy–intelligibility functions is skewed toward 100%, given that a substantial number of stimuli reached the maximum intelligibility values.

References

Battison, R. (1978). Lexical borrowing in American Sign Language. Silver Spring, MD: Linstok Press.

Bornstein, H. (1990). Signed English. In H. Bornstein (Ed.), Manual communication: Implications for education (pp. 128–138). Washington, DC: Gallaudet University Press.

Cokely, D. (1986). The effects of lag time on interpreter errors. Sign Language Studies, 53, 341–375.

Cornett, R. O. (1967). Cued speech. American Annals of the Deaf, 112(1), 3–13.

Films for the Humanities (Producer) (1989). The life cycle of plants [Film]. Available from Films Media Group, PO Box 2053, Princeton, NJ 08543-2053.

Fischer, S. D., Delhorne, L. A., & Reed, C. M. (1999). Effects of rate of presentation on the reception of American Sign Language. Journal of Speech, Language, and Hearing Research, 42, 568–582.

Gustason, G., Pfetzing, D., & Zawolkow, E. (1972). Signing Exact English. Los Alamitos, CA: Modern Sign Press.

Hammil, D. D., Brown, V. L., Larsen, S. C., & Wiederholt, J. L. (1994). Test of adolescent and adult language (3rd ed.). Austin, TX: Pro-Ed.

Kile, S. (2005). An evaluation of CASE transliteration accuracy (Unpublished undergraduate honors thesis). Tampa, FL: University of South Florida.

Krause, J. C., Kegl, J. A., & Schick, B. (2008). Toward extending the Educational Interpreter Performance Assessment to Cued Speech. Journal of Deaf Studies and Deaf Education, 13(3), 432–450.

Krause, J. C., & Lopez, K. A. (2017). Cued Speech transliteration: Effects of accuracy and lag time on message intelligibility. Journal of Deaf Studies and Deaf Education, 22, 378–392.

Krause, J. C., & Murray, N. J. (2019). Signing Exact English transliteration: Effects of speaking rate and lag time on production accuracy. Journal of Deaf Studies and Deaf Education, 24(3), 234–244.

Krause, J. C., & Tessler, M. P. (2016). Cued Speech transliteration: Effects of speaking rate and lag time on production accuracy. Journal of Deaf Studies and Deaf Education, 21, 373–382.

Luetke-Stahlman, B., & Milburn, W. O. (1996). A history of Seeing Essential English (SEE I). American Annals of the Deaf, 141(1), 29–33. doi: https://doi.org/10.1353/aad.2012.0001

Magner, M. E. (1972). A speech intelligibility test for deaf children. Northampton, MA: Clarke School for the Deaf.

Marschark, M., Sapere, P., Convertino, C., & Pelz, J. (2008). Learning via direct and mediated instruction by deaf students. Journal of Deaf Studies and Deaf Education, 13(4), 546–561.

Mathworks (2007). MATLAB [Computer software] (Version 7.6 R29). Natick, MA.

Miller, G. A. (1956). The magical number seven plus or minus two: Some limitations on our capacity for processing information. Psychological Review, 63, 81–97.

Nielsen, D. C., Luetke, B., & Stryker, D. S. (2011). The importance of morphemic awareness to reading achievement and the potential of signing morphemes to supporting reading development. Journal of Deaf Studies and Deaf Education, 16(3), 275–288.

Schick, B., & Williams, K. (1994). The evaluation of educational interpreters. In B. Schick & M. P. Moeller (Eds.), Sign language in the schools: Current issues and controversies (pp. 47–56). Omaha, NE: Boys Town Press.

Schick, B., Williams, K., & Bolster, L. (1999). Skill levels of educational interpreters working in public schools. Journal of Deaf Studies and Deaf Education, 4(2), 144–155.

Schick, B., Williams, K., & Kupermintz, H. (2006). Look who’s being left behind: Educational interpreters and access to education for deaf and hard-of-hearing students. Journal of Deaf Studies and Deaf Education, 11(1), 3–20.

Tope (2008). The effect of bilingualism on L2 speech perception (Unpublished undergraduate honors thesis). Tampa, FL: University of South Florida.

Wilson, R. H., & Strouse, A. L. (1999). Auditory measures with speech signals. In Contemporary perspectives in hearing assessment (pp. 21–99). Needham Heights, MA: Allyn & Bacon.

Winston, E. (1989). Transliteration: What’s the message? In C. Lucas (Ed.), The sociolinguistics of the deaf community (pp. 147–164). San Diego, CA: Academic Press.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)