Abstract

A computer-administered version of the Word Memory Test (WMT) was compared with the orally administered version in two clinical samples to assess equivalency of the two versions. The two samples included inpatients at an epilepsy center (n = 67) and forensic and clinical referrals to a private practice (n = 58). A randomized procedure was used to assign participants to either version of the WMT. Only the results of the WMT primary effort measures were analyzed. Between-group comparisons of the WMT effort measures were conducted using Mann–Whitney nonparametric analysis. No significant differences were found between versions for several diagnostic subgroups. The data generally support equivalency of the orally administered version and the computerized version of the WMT effort measures in a mixed outpatient sample.

The Word Memory Test (WMT; Green, Allen, & Astner, 1996) is a test of verbal learning and memory, administered either orally by an examiner or self-administered by computer, that is designed to detect suboptimal effort during testing (Green, Iverson, & Allen, 1999). In addition to measuring the examinee's ability to learn and remember word pairs, the WMT incorporates several indices of response bias (Wynkoop & Denney, 2005). The Immediate Recognition (IR), Delayed Recognition (DR), and Consistency (CNS) subtests are considered the primary effort measures. In prior studies, the WMT was sensitive to suboptimal effort while appearing relatively insensitive to genuine neuropsychological impairment (Goodrich-Hunsaker & Hopkins, 2009; Green, Flaro, & Courtney, 2009; Green et al., 1999; Green, Lees-Haley, & Allen, 2002).

The WMT was originally developed as an orally administered test with an examiner reading the list of 20 word pairs (Green & Astner, 1995) and later adapted for computerized administration in a DOS environment utilizing visual presentation of the word pairs. The original computerized version was published by CogniSyst (Green et al., 1996). More recently, another computerized version programmed for the Windows environment was published by the test author, Green (2003). The computerized version offers efficiency advantages through self-administration and automated scoring, but the oral version of the WMT may be preferred by some examiners who choose to avoid the use of computers for test administration for a variety of reasons. The availability, cost, and portability of necessary equipment may render computerized administration prohibitive. Although increased portability of computerized versions of traditional paper-and-pencil tests may now be possible via laptop computer, bedside computerized testing of acute medical/neurological patients may remain unwieldy. Computerized administrations of neuropsychological tests typically require that an examinee be physically able to utilize a keyboard or manipulate and click a mouse in order to record responses, which may be severely limiting to individuals with physical or motor impairments. Computers may not be feasible or allowed in certain circumstances, such as forensic testing in a prison or other environment with increased security. Additional potential advantages of an examiner-administered assessment include an increased ability to observe an examinee's test-taking approach and to modify test administration for purposes of “testing the limits” (Agnew, Schwartz, Bolla, Ford, & Bleecker, 1991; Schatz & Browndyke, 2002). Finally, examiners may select traditional paper-and-pencil versions over computerized adaptations out of concern for an examinee's aversion to or lack of exposure to computers.
This latter concern may be particularly salient for computer-adapted symptom validity tests (SVTs); attorneys could attribute failing WMT scores to confusion about the computer interface rather than suboptimal effort.

Computers have been increasingly used for the administration of psychological and neuropsychological measures (Campbell et al., 1999; Mead & Drasgow, 1993; Russell, 2000), and several traditional paper-and-pencil neuropsychological tests have been computerized because the electronic versions provide many advantages over examiner-administered paper-and-pencil tests. These advantages have been discussed elsewhere (Campbell et al., 1999; Fortuny & Heaton, 1996; Kane & Kay, 1992; Lichtenberger, 2006; Schatz & Browndyke, 2002), but some of the inherent benefits of computer-based assessment include more rigorous standardization of administration and scoring, decreased examiner influence, automated comparison with normative databases, automated analysis of response patterns, and transfer of scores to a database for storage. While the relative advantages and disadvantages of traditional examiner- and computer-administered assessments must always be carefully weighed during test selection, with special consideration of the type of data to be collected and the purpose of the evaluation, the examiner should first examine how modification of the traditional format into a computerized version affects the validity of a particular test and consider whether the two versions are interchangeable.

The increasing adaptation of examiner-administered neuropsychological tests to a computer-based format, and the utilization of such computerized versions, necessitate that the equivalency of computerized and noncomputerized versions of the same tests be established. Modification of traditional tests for computerized administration might introduce differences in task demand and response method that could alter the sensitivity of a test to various examinee characteristics (Agnew et al., 1991), thereby differentially impacting neuropsychological test results depending on administration format and affecting the comparability of test versions. If equivalency is not established, one cannot assume that results on one version are comparable to results obtained from the other version. Comparisons of computerized and noncomputerized versions of other neuropsychological tests have not consistently demonstrated equivalence. Research comparing standard and computerized versions of the Category Test has been equivocal, with some studies supporting their equivalence (Choca & Morris, 1992; Mercer, Harrell, Miller, Childs, & Rockers, 1997) and at least one study indicating that the standard version was not equivalent to its computerized counterpart (Berger, Chibnall, & Gfeller, 1994). French and Beaumont's (1990) research supported equivalency for the standard and computer-administered versions of the Mill Hill Vocabulary Test, but not the Standard Progressive Matrices Test; they concluded that the two versions of the latter test could not be used interchangeably, as patients performed significantly worse on the computerized version than on the standard version. Comparison of the traditional oral version of the Serial Digit Learning Test with a computerized version showed a correlation of only .23, compared with a test–retest correlation of .55 for repeated administrations of the oral version (Campbell et al., 1999).
Furthermore, Mead and Drasgow (1993) reported a medium effect size of 0.72 for performance differences between computerized and noncomputerized versions of perceptual speed tests. These studies highlight the need to demonstrate that a computerized version of a test is equivalent to its standard examiner-administered counterpart. While it has long been recognized that adaptation of a traditional paper-and-pencil test to a computerized format may alter the nature of the test and that psychometric equivalency must be assured (Hofer, 1985; Kane & Kay, 1992), the advantages of computerized administration may lead some clinicians to use computer versions in the absence of established psychometric equivalency with the traditional counterpart. As Schatz and Browndyke (2002) pointed out, computerized versions of the Wisconsin Card Sorting Test are widely utilized despite a lack of demonstrated equivalency with the standard manually administered version (Fortuny & Heaton, 1996). Given that the WMT is the most robustly researched SVT and is considered by many neuropsychologists to be one of the most accurate SVTs (Hartman, 2002; Sharland & Gfeller, 2007), demonstration of the equivalency of the standard and computer versions is critically important.

Although test developers have historically cautioned against using computer-administered versions as interchangeable equivalents of the traditional examiner-administered version (Letz & Baker, 1986), most clinicians tend to use the oral and computerized versions of the WMT interchangeably despite the possibility that the two versions may not yield equivalent results. The social demand characteristics of an in-person administration might affect performance on a test sensitive to effort, and performance might differ depending on whether a live person (i.e., the examiner) is sitting in front of the examinee. Situational variables, such as the presence of third-party observers and audio-recording, have been demonstrated to have a potent effect on neuropsychological test performance (Binder & Johnson-Greene, 1995; Constantinou, Ashendorf, & McCaffrey, 2002; Yantz & McCaffrey, 2009). The presence of an examiner may improve or worsen performance depending on the motivation of the examinee. Among undergraduate research volunteers, performance on the computerized Test of Memory Malingering was worse when an examiner was not present to observe, but the same effect did not occur for the computerized Wisconsin Card Sorting Test (Yantz & McCaffrey, 2007). This finding suggests that an examiner's presence and attention to an examinee's performance may affect task performance on computerized SVTs. As previously mentioned, the original and computerized versions of the WMT differ with regard to the presentation modality of the word pairs (i.e., auditory vs. visual, respectively). Thus, it is possible that such differences produce discrepant results due to the different aspects or types of memory function (i.e., verbal memory vs. visual memory, or possible dual encoding of stimuli) being tapped by a particular version.
Another difference between the two versions is that only the computerized version provides feedback regarding accuracy during the primary effort trials. The examiner does not provide feedback in the oral version.

Although the computerized versions of the WMT are widely used and the oral and computerized forms have been described as “appearing equivalent” (Green et al., 2002, p. 100), we were unable to find a single published study that compared the performance on the computerized version with the orally administered version. In an online published study, Allen and Green's (2002) data indicated that the two versions yielded similar results on the primary effort measures of IR, DR, and CNS. There were no significant differences between scores on the two forms, although mean scores were found to be up to 7% higher for the oral WMT version than for the computerized administration in individuals suspected of putting forth suboptimal effort. Given this nonsignificant trend, they postulated that face-to-face oral administration may moderate the degree to which an individual not putting forth full effort performs poorly on the WMT whereas the largely unobserved computerized administration contributes to exaggerated SVT performance in those who are not inclined to put forth full effort. Although Allen and Green concluded that the two versions are equivalent for their overall sample, the presence of a possible “observer effect” for face-to-face versus computerized administration of the WMT led the authors to state, “that SVTs that require face-to-face administration may be inferior to unattended computerized administration for discovering suboptimal effort during neuropsychological examination” (2000, p. 840).

Although it is asserted in the WMT manual (Green, 2005) that the data collected from the oral version are applicable to the computerized version, it is important to empirically demonstrate equivalence of the oral and computer-administered versions of the WMT given the clinical and medicolegal implications of WMT failure. Utilized as an SVT to detect suboptimal effort and symptom exaggeration, failure on the WMT can be interpreted as evidence that the examinee may be feigning memory impairment. Such performance on the WMT leads examiners to question the validity of the additional data obtained during an assessment and often raises the question of malingering. If the oral and computer versions of the WMT are not equivalent, interpretation of failures should include consideration of the version administered.

The present study examined the equivalency of the computerized version of the IR, DR, and CNS scores of the WMT (Green, 2003, 2005) to the same scores derived from the orally administered version of the test in two clinical populations. One sample consisted of inpatients referred for neuropsychological examinations during continuous video-telemetry EEG monitoring for evaluation of medically intractable seizures at the University of Washington (UW) Regional Epilepsy Center. The other sample consisted of clinical and forensic referrals to an outpatient neuropsychological practice in the Portland, Oregon, area.

Materials and Methods

Subjects

The subjects were divided into two groups: (i) inpatients at the UW Regional Epilepsy Center and (ii) examinees at the outpatient private practice of one of the authors (LMB) in Oregon, including both forensic and clinical referrals. There were no significant differences between the two groups with regard to age, t(123) = 1.31; p = .19, and education, t(123) = 1.78; p = .078. All patients were 18 years of age or older, native English speakers, and able to give consent and understand all test instructions.

Inpatient participants

The inpatient sample in this study included a consecutive sample of adult patients referred to UW Regional Epilepsy Center for continuous video-EEG monitoring during a 6-month period of 2005. All patients underwent video-EEG monitoring for a minimum of 24 hr and most completed an extensive battery of neuropsychological tests and underwent an MRI scan of the brain. Some epilepsy patients also received additional diagnostic tests, such as PET or SPECT scans or invasive monitoring (i.e., grid or strip monitoring), in order to more definitively localize or lateralize their events.

All inpatients were diagnosed based on their video-EEG recordings by one of three board-certified electroencephalographers and were given one of five diagnoses based on the following criteria: (a) Epileptic Seizures (ES)—Evidence of definite ictal EEG abnormalities or interictal epileptiform discharges; (b) Psychogenic Non-Epileptic Seizures (PNES)—Episodes of unresponsiveness or behavioral abnormality in the absence of epileptiform EEG changes; (c) Indeterminate Spells (IS)—No spells during monitoring or subjective feelings only, in the absence of EEG abnormality, unresponsiveness, or behavioral abnormality; (d) Co-Occurrence Group (CO)—Evidence of episodes fitting the criteria for both ES and PNES during the same or across multiple monitoring sessions; or (e) Physiological Non-Epileptic Seizures (PhyNES)—Patients with spells resulting from medical conditions other than epilepsy (e.g., syncopal episodes, sleep disturbance). For the purposes of this study, we excluded all patients experiencing acute seizures (<24 hr prior to WMT administration, n = 25), as we have found that post-ictal testing with the WMT differs from baseline in many of these individuals (Williamson et al., 2005). We also excluded those who had undergone previous neurosurgery (n = 6), those who were too severely impaired to be tested in any fashion (e.g., developmentally delayed and completely nonverbal, n = 5), and patients whose primary language was not English. We did not exclude cognitively impaired patients on the basis of any other criteria (e.g., some studies exclude those who are institutionalized or cannot live independently), as we wanted to see how low-functioning individuals perform on the two versions of the test.

The UW inpatient sample consisted of 23 male and 44 female inpatients, 61 Caucasians (91%), 2 African Americans (3%), 2 Hispanics (3%), 1 African American–Caucasian (1.5%), and 1 Native American-Caucasian (1.5%). Diagnostic make-up of the inpatient sample consisted of 31 individuals (46.3%) diagnosed with epilepsy, 30 individuals (44.8%) with PNES, 2 individuals (3%) with co-occurrence of epilepsy and PNES, and 4 individuals (6%) with sleep disturbance. Age ranged between 18 and 83 years (mean = 38.50, SD = 12.05). Completed years of education ranged between 4 and 18 years (mean = 12.70, SD = 2.69).

Outpatient participants

The outpatient sample consisted of 37 men and 21 women: 55 Caucasians (94.8%) and 3 African Americans (5.2%). Diagnostically, the sample consisted of 21 individuals (36.2%) with mild head trauma, 11 individuals (19.0%) with developmental or learning disorders, 9 individuals (15.5%) with psychiatric disorders, 7 individuals (12.1%) with moderate–severe traumatic brain injury, 3 individuals (5.2%) with unknown diagnoses, 2 individuals (3.4%) with dementia, 2 individuals (3.4%) with epilepsy, 1 individual (1.7%) with medically unexplained symptoms, 1 individual (1.7%) with a stroke, and 1 individual (1.7%) with hydrocephalus. Diagnoses were made on a clinical basis from history, neurodiagnostic studies, and medical diagnoses. Age ranged between 17 and 71 years (mean = 42.22, SD = 14.72). Completed years of education ranged between 9 and 20 years (mean = 13.55, SD = 2.64).

Participants in both the inpatient and outpatient samples were randomly assigned to either the computerized or oral administration groups. In the inpatient setting, we had traditionally administered the oral WMT, which was the original format of presentation for this test, but shifted to the computer WMT for 5 weeks for quality control purposes (i.e., before deciding whether or not to switch to the newer, computer version of the WMT, we wanted to be sure that both measures produced equivalent results). All consecutive examinees referred during that 5-week period were administered the computer version of the WMT, whereas examinees referred prior to or subsequent to this period all received the oral version. All inpatients included in this study came from an institutional review board-approved database registry maintained by the UW Regional Epilepsy Center, and all patients included in this database provided written consent to allow their data, including all disease-related and demographic variables and neuropsychological and neuroimaging results, to be stored on an ongoing basis for clinical research. As we requested the data used in the current study retrospectively, all inpatients were blind to any study specifics. In the outpatient setting, randomization was accomplished by listing the order of randomization, and then assigning consecutive examinees to either condition according to the list. All outpatient examinees included in this study provided informed consent to participate in a clinical evaluation, and their data were subsequently de-identified and entered into a research database without any individually identifiable health information.
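The list-based randomization used in the outpatient setting can be sketched as follows. This is a minimal illustration only; the list length, labels, and seed are hypothetical, as the study's actual randomization list is not published.

```python
# Sketch of list-based random assignment: a random order of conditions
# is generated in advance, and consecutive examinees are assigned
# according to that list. All specifics here are hypothetical.
import random

random.seed(0)  # fixed seed only so the sketch is reproducible

conditions = ["oral", "computer"] * 15  # 30 planned assignment slots
random.shuffle(conditions)              # the pre-generated random order

# Consecutive examinees are then assigned according to the list.
assignments = {f"examinee_{i + 1}": cond for i, cond in enumerate(conditions)}
```

Pre-generating the full list (rather than flipping a coin per examinee) guarantees balanced group sizes over the planned enrollment.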

Among the inpatient sample, there were 33 subjects (10 men and 23 women) in the oral administration group with a mean age of 39.91 years (SD = 13.33) and 34 subjects (13 men and 21 women) in the computerized administration group with a mean age of 37.03 years (SD = 10.68). Mean WAIS-III FSIQ for the oral group was 87.82 (SD = 19.07) and 92.39 (SD = 17.36) for the computer group. Within the inpatient sample, there were no significant differences between the groups on age, U = 504.50; p = .48; FSIQ, U = 48; p = .40, or education, U = 466; p = .23. There were similar numbers of men and women in the oral and computer administration groups, χ2 (1) = 0.18; p = .67.

In the outpatient sample, there were 30 subjects (19 men and 11 women) in the oral administration group with a mean age of 40.83 years (SD = 14.77) and 28 subjects (18 men and 10 women) in the computerized administration group with a mean age of 43.71 years (SD = 14.80). Mean WAIS-III FSIQ for the oral group was 94.35 (SD = 15.43) and 102.16 (SD = 15.23) for the computer group. There were no significant differences between the groups on age, U = 357; p = .33, FSIQ, U = 236.50; p = .10, or education, U = 393.50; p = .67. There were similar numbers of men and women in the oral and computer administration groups, Pearson χ2 (1) = 0.001; p = 1.00.

Procedure

All patients completed a comprehensive neuropsychological assessment that included either the oral or the computer version of the WMT. For the inpatient sample, the WMT was administered during the first hour of testing and preceded all other memory measures. Order of test administration was not controlled in the outpatient sample. The inpatient and outpatient samples were analyzed separately, given the potentially disparate characteristics of these two samples. For the purposes of this study, only the data from the WMT primary effort measures (IR, DR, and CNS) were utilized. Both versions of the WMT were administered according to standardized instructions. Green and Astner (1995) provided a description of the oral version and instructions for its administration; Green (2003) did the same for the computer version.

Results

Because the data violated the assumption of normality, between-group comparisons of the WMT effort measures were analyzed using Mann–Whitney nonparametric analyses. There were no significant differences between the two versions on the DR effort measure for either the inpatient sample, DR: U = 482; p = .32, or the outpatient sample, DR: U = 371.50; p = .44. However, there was a nonsignificant trend in the outpatient sample, with those administered the oral version scoring lower on CNS than those administered the computer version, CNS: U = 306.00; p = .07. Inpatients did not significantly differ on CNS, U = 432.50; p = .11. On the IR measure in the inpatient sample, those administered the computer version scored significantly higher, U = 384.50; p = .02, although the effect size of 0.15 (Cohen's d) was small. However, for the outpatient sample, there was no significant difference in IR effort scores between versions, U = 333.00; p = .17. Means and standard deviations for percentages of WMT variables for the inpatient and outpatient samples are listed in Tables 1 and 2.
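The between-group analyses described here can be sketched in Python. The scores below are hypothetical IR percentages, not the study's raw data; the sketch pairs a two-sided Mann–Whitney U test with Cohen's d as a companion effect-size estimate.

```python
# Mann-Whitney U comparison of two independent groups, plus Cohen's d
# from the pooled standard deviation. Scores are hypothetical.
from statistics import mean, stdev

from scipy.stats import mannwhitneyu

oral = [88.0, 95.0, 72.0, 100.0, 85.0, 90.0, 97.0, 80.0, 92.0, 88.0]
computer = [90.0, 98.0, 75.0, 100.0, 88.0, 93.0, 95.0, 85.0, 96.0, 91.0]

u_stat, p_value = mannwhitneyu(oral, computer, alternative="two-sided")

# Cohen's d with a pooled standard deviation
n1, n2 = len(oral), len(computer)
pooled_sd = (((n1 - 1) * stdev(oral) ** 2 + (n2 - 1) * stdev(computer) ** 2)
             / (n1 + n2 - 2)) ** 0.5
d = (mean(computer) - mean(oral)) / pooled_sd

print(f"U = {u_stat:.1f}, p = {p_value:.3f}, d = {d:.2f}")
```

Reporting an effect size alongside the nonparametric p-value, as the authors do for the inpatient IR comparison, distinguishes statistical significance from practical magnitude.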

Table 1.

Means and standard deviations for the WMT effort measures: Inpatient sample

Variable          Oral (n = 33)      Computer (n = 34)
                  M        SD        M        SD
IR percentage     88.11    13.16     90.37    16.25
DR percentage     89.85    12.18     89.12    17.17
CNS percentage    86.82    10.39     88.60    13.54

Note: WMT = Word Memory Test; M = Mean; SD = Standard Deviation; IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency.

Table 2.

Means and standard deviations for the WMT effort measures: outpatient sample

Variable          Oral (n = 30)      Computer (n = 28)
                  M        SD        M        SD
IR percentage     88.25    10.22     88.84    17.49
DR percentage     89.87    11.26     88.30    18.14
CNS percentage    86.08    10.50     89.29    13.59

Note: WMT = Word Memory Test; M = Mean; SD = Standard Deviation; IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency.

The published failure cutoff scores for the WMT effort measures were utilized (Green et al., 1996). The specific cutoff scores that identify poor effort are not listed here but are available from the authors and can be found in the test manual, per Sweet's (1999) recommendation. Failure rates for the inpatient and outpatient samples are presented in Table 3. Among the inpatient sample, 10 subjects (30.3%) in the oral administration group and 9 subjects (26.5%) in the computer administration group failed at least one of the three effort measures. Among the outpatient sample, 12 subjects (40.0%) in the oral administration group and 7 subjects (25.0%) in the computer administration group failed at least one of the three effort measures. Failure rates were not significantly different for the oral and computer administration groups in the inpatient sample, Pearson χ2 (1, N = 67) = 0.01; p = .94, or the outpatient sample, Pearson χ2 (1, N = 58) = 0.88; p = .35. The combined failure rate for both WMT versions was 28.4% for the inpatient sample and 32.8% for the outpatient sample.
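The failure-rate comparisons can be reproduced from the reported counts. The sketch below runs a Pearson chi-square test on the outpatient 2 x 2 table (oral: 12 of 30 failed; computer: 7 of 28 failed); note that SciPy applies the Yates continuity correction to 2 x 2 tables by default, which matches the χ2(1) = .88, p = .35 reported for this comparison.

```python
# Chi-square test of independence on a 2 x 2 pass/fail contingency
# table, using the outpatient counts reported in the text.
from scipy.stats import chi2_contingency

#        fail  pass
table = [[12, 18],   # oral (n = 30)
         [7, 21]]    # computer (n = 28)

# Yates continuity correction is SciPy's default for 2 x 2 tables.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p_value:.2f}")
```

Agreement with the published statistic suggests the authors' chi-square values were continuity-corrected, which is worth knowing when reanalyzing the reported counts.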

Table 3.

Inpatient and outpatient WMT failure rates

Sample        Oral           Computer
Inpatient     10 (30.3%)     9 (26.5%)
Outpatient    12 (40.0%)     7 (25.0%)

Note: Failure rates are shown as the number of subjects scoring below the published cutoffs, with the percentage of the group failing shown in parentheses.

Equivalency was further examined by breaking each sample down by diagnostic subgroups. Among the inpatient sample, there were no significant differences in mean scores between versions for participants diagnosed with epilepsy, IR: U = 91.50; p = .26, DR: U = 117; p = .94, CNS: U = 107; p = .63, or PNES, IR: U = 74; p = .11, DR: U = 89; p = .34, CNS: U = 81; p = .20. However, while the differences in inpatient failure rates did not reach statistical significance, numerical differences were evident; specifically, epilepsy patients failed the computer version more frequently than the oral version (31.3% vs. 14.3%), Pearson χ2 (1, N = 31) = 0.33; p = .57, and PNES patients more frequently failed the oral version (43.8% vs. 28.6%), Pearson χ2 (1, N = 30) = 0.23; p = .63. Means and standard deviations for the epilepsy participants are presented in Table 4. Means and standard deviations for the PNES participants are presented in Table 5. In a follow-up analysis, we found that there was a nonsignificant trend for PNES patients to fail the oral version more than patients with epilepsy (43.8% vs. 14.3%), Pearson χ2 (1, N = 30) = 3.09; p = .08, yet there was virtually no difference in their performance on the computer version (28.6% for PNES vs. 31.3% for epilepsy), Pearson χ2 (1, N = 30) = 0.03; p = .87. Such a difference in discriminatory power based on test version may have significant clinical implications but would need to be replicated given our small sample size.

Table 4.

Means and standard deviations for the WMT effort measures: Epilepsy participants

Variable          Oral (n = 14)      Computer (n = 17)
                  M        SD        M        SD
IR percentage     91.61    11.87     90.00    16.89
DR percentage     91.96    14.15     89.12    17.21
CNS percentage    90.54    10.10     88.53    14.36

Note: WMT = Word Memory Test; M = Mean; SD = Standard Deviation; IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency.

Table 5.

Means and standard deviations for the WMT effort measures: PNES participants

Variable          Oral (n = 16)      Computer (n = 14)
                  M        SD        M        SD
IR percentage     84.38    17.50     89.29    17.50
DR percentage     87.50    11.44     87.50    19.12
CNS percentage    84.06    10.91     87.50    14.18

Note: WMT = Word Memory Test; M = Mean; SD = Standard Deviation; PNES = Psychogenic Non-Epileptic Seizures; IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency.

Among the outpatient sample, there were no significant differences in mean scores between versions for participants with history of mild head trauma, IR: U = 36.50; p = .19, DR: U = 39; p = .25, CNS: U = 35.50; p = .17, developmental/learning disabilities, IR: U = 13.50; p = .78, DR: U = 10; p = .32, CNS: U = 10; p = .35, or psychiatric conditions, IR: U = 8; p = .61, DR: U = 7.50; p = .52, CNS: U = 7.50; p = .52. There were no significant differences in failure rates for any of the diagnostic subgroups within the outpatient sample. For the developmental/learning disability group, none of the six examinees failed the oral version (0.0%) and one of the five examinees failed the computer version (20.0%), Pearson χ2 (1, N = 11) = 0.01; p = .92. Among the mild head trauma group, 5 of 10 examinees failed the oral version (50.0%) and 3 of 11 examinees failed the computer version (27.3%), Pearson χ2 (1, N = 21) = 0.386; p = .53. For the psychiatric group, 2 of 5 examinees failed the oral version (40.0%) and 0 of 4 examinees failed the computer version (0.0%), Pearson χ2 (1, N = 9) = 0.39; p = .53. The remaining outpatient groups (moderate–severe traumatic brain injury, unknown diagnoses, dementia, epilepsy, medically unexplained symptoms, stroke, and hydrocephalus) each had seven or fewer examinees, and their failure rates were not compared across versions. Means and standard deviations for participants diagnosed with mild head injury are presented in Table 6, learning disabilities in Table 7, and psychiatric conditions in Table 8.

Table 6.

Means and standard deviations for the WMT effort measures: Mild head trauma participants

Variable          Oral (n = 10)      Computer (n = 11)
                  M        SD        M        SD
IR percentage     84.25    12.53     87.96    19.23
DR percentage     86.00    12.97     88.41    18.00
CNS percentage    83.00    11.47     88.64    14.72

Note: WMT = Word Memory Test; M = Mean; SD = Standard Deviation; IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency.

Table 7.

Means and standard deviations for the WMT effort measures: Learning disability participants

Variable          Oral (n = 6)       Computer (n = 5)
                  M        SD        M        SD
IR percentage     95.83    3.76      96.00    5.18
DR percentage     97.67    2.27      96.50    7.83
CNS percentage    94.25    3.59      94.50    3.59

Note: WMT = Word Memory Test; M = Mean; SD = Standard Deviation; IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency.

Table 8.

Means and standard deviations for the WMT effort measures: Psychiatric participants

Variable          Oral (n = 5)       Computer (n = 4)
                  M        SD        M        SD
IR percentage     91.50    9.62      96.98    2.39
DR percentage     95.00    5.00      97.50    2.04
CNS percentage    88.00    12.04     95.63    2.39

Note: WMT = Word Memory Test; M = Mean; SD = Standard Deviation; IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency.

Discussion

Demonstration of equivalency of test formats is imperative to confirm that factors specific to computer administration do not modify test performance (Choca & Morris, 1992). The present study examined the equivalency of the computerized version and the orally administered version of the WMT in two clinical populations.

We found negligible differences between the two versions of the WMT in either the outpatient or the inpatient sample when each sample was considered as a whole. No significant differences in failure rates were found for either sample. There were no differences in mean scores on the primary effort measures (IR and DR) in the outpatient sample. The CNS mean score difference in the outpatient sample approached significance, with those administered the oral version scoring lower than those administered the computer version. In the inpatient sample, performance was equivalent on the DR and CNS variables. Although there was a significant difference in the inpatient sample on IR means, the resulting effect size was small; this difference again reflected a slightly lower mean for the oral than for the computer version.
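The between-version comparisons above rest on Mann–Whitney tests of the effort measures. For readers replicating this type of analysis, the following is a minimal pure-Python sketch of the Mann–Whitney U statistic (with midranks for ties); the scores shown are hypothetical illustrations, not data from this study.

```python
def mann_whitney_u(x, y):
    """U statistic for group x versus group y, assigning midranks to ties."""
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give them their average rank.
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    rank_sum_x = sum(ranks[v] for v in x)
    # U = R_x - n_x(n_x + 1)/2
    return rank_sum_x - len(x) * (len(x) + 1) / 2

# Hypothetical CNS percentage scores for two administration groups.
oral = [83.0, 90.0, 95.0, 97.5, 100.0]
computer = [88.0, 92.5, 97.5, 100.0, 100.0]
u = mann_whitney_u(oral, computer)
```

The U value would then be referred to an exact table or a normal approximation for a p-value; in practice a library routine (e.g., a statistics package's Mann–Whitney function) handles that step.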

Although small sample sizes may limit the usefulness of comparisons within specific diagnostic groups, we conducted preliminary subgroup analyses to guide future research. No significant mean score differences on the primary effort measures were found between versions for any of the following diagnostic subgroups: epilepsy, PNES, mild head trauma, learning disability, or psychiatric conditions. However, patients with epilepsy failed the computer version (31.3%) more often than the oral version (14.3%), whereas patients with PNES showed the opposite pattern (i.e., 28.6% failed the computer version but 43.8% failed the oral version). Future research with larger samples should investigate whether the oral version has more discriminatory power for these subgroups than the computer version. We should note that at least two of the lower functioning epilepsy patients (both developmentally delayed) who failed the computer version had clear-cut difficulty with the physical completion of the task; for example, one of them held the space bar down, causing more than one item to pass. According to the author of the WMT, it would have been acceptable for the examiner to enter the responses manually for these patients under these circumstances (P. Green, personal communication, 2007), and this approach might have lessened the rate of failure among low-functioning individuals. Without their inclusion, the failure rates for epilepsy patients would have been more equivalent between the two WMT versions (computer WMT = 21.4%, oral WMT = 14.3%). Moreover, with the current sample sizes, small shifts in the pass–fail distributions could greatly alter the χ2 results. Overall, more research with larger samples is needed to determine whether the two versions are equivalent for various diagnostic subgroups.
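The sensitivity of χ2 to small shifts in these pass–fail counts is easy to illustrate. The sketch below uses hypothetical 2 × 2 cell counts chosen only to match the reported epilepsy-group percentages (5/16 computer and 2/14 oral failures; the exact counts are our assumption for illustration, not reported data).

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]; rows = WMT version, columns = fail/pass."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical epilepsy-group counts matching the reported failure rates:
# computer version 5/16 (31.3%) fail, oral version 2/14 (14.3%) fail.
full_sample = chi2_2x2(5, 11, 2, 12)
# Drop the two motor-impaired computer failures: 3/14 (21.4%) vs. 2/14.
reduced_sample = chi2_2x2(3, 11, 2, 12)
```

Excluding just two participants cuts the χ2 statistic substantially, which is why the subgroup pass–fail comparisons reported here should be treated as preliminary.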

The failure rate within our outpatient sample was higher than those reported by Green (2003) for persons with no external incentive for poor neuropsychological performance. This is important to note, as a recent study examining specificity and sensitivity estimated that 20% of WMT failures may be false positives and that lower cutoff scores improve the specificity of the test (Greve, Ord, Curtis, Bianchini, & Brennan, 2008). In our outpatient sample, for example, the groups with developmental learning and psychiatric problems were largely without financial incentives, yet 3 of 20 in those combined groups failed at least one of the primary effort measures. Such results could suggest that failures are more common among patients with clear-cut brain dysfunction and other forms of pathology than in Green's sample. However, we should also point out that a "genuine memory impairment profile" has been developed for the WMT using additional subtests not administered to the entire outpatient sample (Green, 2005); this profile was not employed in the current study or in the study by Greve and colleagues (2008). The profile is intended to eliminate potential false-positive errors among patients experiencing true impairment, and a preliminary study employing it appears promising (Green et al., 2009). Therefore, a change in cutoff scores appears premature, although further research exploring sensitivity and specificity across various patient subgroups with both versions of the WMT is recommended.

The failure rate observed in the current study for the inpatient epilepsy monitoring sample (23.0% of ES and 36.7% of PNES patients failed the WMT, regardless of version) is generally consistent with our prior work (Drane et al., 2006; Williamson et al., 2004; see Williamson, Drane, & Stroup, 2007 for specificity data). The base rate of WMT failure for ES patients in this study is also consistent with that observed (22%) in a similar patient group on other commonly used SVT measures (Cragar, Berry, Fakhoury, Cibula, & Schmitt, 2006). However, our PNES failure rates for the WMT, in both our current data and our previous work, are somewhat higher than the results Cragar and colleagues obtained with different SVT measures (24%; Cragar et al., 2006). As Cragar and colleagues concluded, this may relate to differences in sensitivity to poor task engagement. This seems particularly plausible because the discrepancy between studies appears for the PNES data but not for the ES data.

We have previously reported a slightly higher rate of failure (28.0%) on the WMT in an unselected epilepsy sample that included patients tested post-ictally and patients who were institutionalized (Williamson et al., 2004). Our exclusion of all post-ictal patients may account for the slightly lower failure rate in the current sample. We have also previously reported slightly higher WMT failure rates (40%–50%) for PNES patients than found in the current PNES sample (Drane et al., 2006; Williamson et al., 2004); however, those higher rates were obtained from samples in which only the oral version was administered. In the current study, it is the slightly lower failure rate on the computer version that appears to be bringing down the overall rate for the PNES sample. One could speculate that the provision of accuracy feedback on the computer version, or the absence of an examiner, alters the performance of PNES patients in a manner not observed in ES patients.

In summary, the mean scores generally support equivalency of the orally administered and computerized versions of the WMT in the outpatient sample; however, equivalency may be lacking in the inpatient sample. Within a mixed outpatient sample of the type described here, computerized administration of the WMT does not appear to alter performance on the effort measures significantly. Failure rates for the two versions should be further explored by subgroup to investigate whether performance patterns differ at this level. Subsequent validity studies are needed that examine the rate of WMT failure in various patient populations, and based on our current experience and on current test publisher recommendations, we recommend that these employ the new genuine memory impairment profile (i.e., using the primary effort indices alone would likely lead to false-positive errors with more impaired patients).

Conflict of Interest

Laurence Binder has used the oral version of the WMT in his forensic practice.

Acknowledgement

We would like to thank the anonymous reviewers for their helpful suggestions and feedback.

References

Agnew, J., Schwartz, B. S., Bolla, K. I., Ford, D. P., & Bleecker, M. L. (1991). Comparison of computerized and examiner-administered neurobehavioral testing techniques. Journal of Occupational Medicine, 33, 1156–1162.

Allen, L. M., & Green, P. (2000). Moderated memory deficit exaggeration in face-to-face administration of the Word Memory Test. Archives of Clinical Neuropsychology, 15, 840–841.

Allen, L. M., & Green, P. (2002). Equivalence of the computerized and orally administered Word Memory Test effort measures. WebPsychEmpiricist. Retrieved July 30, 2010 from http://www.wpe.info/papers_table.html

Berger, S. G., Chibnall, J. T., & Gfeller, J. D. (1994). The Category Test: A comparison of computerized and standard versions. Assessment, 1, 255–258.

Binder, L. M., & Johnson-Greene, D. (1995). Observer effects on neuropsychological performance: A case report. The Clinical Neuropsychologist, 9, 74–78.

Campbell, K. A., Rohlman, D. S., Storzbach, D., Binder, L. M., Anger, W. K., Kovera, C. A., et al. (1999). Test-retest reliability of psychological and neurobehavioral tests self-administered by computer. Assessment, 6, 21–32.

Choca, J., & Morris, J. (1992). Administering the Category Test by computer: Equivalence of results. The Clinical Neuropsychologist, 6, 9–15.

Constantinou, M., Ashendorf, L., & McCaffrey, R. J. (2002). When the third party observer of a neuropsychological evaluation is an audio-recorder. The Clinical Neuropsychologist, 16, 407–412.

Cragar, D. E., Berry, D. T. R., Fakhoury, T. A., Cibula, J. E., & Schmitt, F. A. (2006). Performance of patients with epilepsy or psychogenic non-epileptic seizures on four measures of effort. The Clinical Neuropsychologist, 20, 552–566.

Drane, D. L., Williamson, D. J., Stroup, E. S., Holmes, M. D., Jung, M., Koerner, E., et al. (2006). Cognitive impairment is not equal in patients with epileptic and psychogenic nonepileptic seizures. Epilepsia, 47, 1879–1886.

Fortuny, L. A., & Heaton, R. K. (1996). Standard versus computerized administration of the Wisconsin Card Sorting Test. The Clinical Neuropsychologist, 10, 419–424.

French, C. C., & Beaumont, J. G. (1990). A clinical study of the automated assessment of intelligence by the Mill Hill Vocabulary Test and the Standard Progressive Matrices Test. Journal of Clinical Psychology, 46, 129–140.

Goodrich-Hunsaker, N. J., & Hopkins, R. O. (2009). Word Memory Test performance in amnestic patients with hippocampal damage. Neuropsychology, 23, 529–534.

Green, P. (2003). Green's Word Memory Test for Windows user's manual. Edmonton: Green's Publishing.

Green, P. (2005). Green's Word Memory Test for Windows user's manual – Revised. Edmonton: Green's Publishing.

Green, P., Allen, L., & Astner, K. (1996). The Word Memory Test: A user's guide to the oral and computer-assisted forms, US version 1.1. Durham, NC: CogniSyst.

Green, P., & Astner, K. (1995). Manual: Word Memory Test (Research Form I) oral administration. Durham, NC: CogniSyst.

Green, P., Flaro, L., & Courtney, J. (2009). Examining false positives on the Word Memory Test in adults with mild traumatic brain injury. Brain Injury, 23, 741–750.

Green, P., Iverson, G. L., & Allen, L. (1999). Detecting malingering in head injury litigation with the Word Memory Test. Brain Injury, 13, 813–819.

Green, P., Lees-Haley, P. R., & Allen, L. M. (2002). The Word Memory Test and the validity of neuropsychological test scores. Journal of Forensic Neuropsychology, 2, 97–124.

Greve, K. W., Ord, J., Curtis, K. L., Bianchini, K. J., & Brennan, A. (2008). Detecting malingering in traumatic brain injury and chronic pain: A comparison of three forced-choice symptom validity tests. The Clinical Neuropsychologist, 22, 896–918.

Hartman, D. E. (2002). The unexamined lie is a lie worth fibbing: Neuropsychological malingering and the Word Memory Test. Archives of Clinical Neuropsychology, 17, 709–714.

Hofer, P. J. (1985). Developing standards for computerized psychological testing. Computers in Human Behavior, 1, 305–315.

Kane, R. L., & Kay, G. G. (1992). Computerized assessment in neuropsychology: A review of tests and test batteries. Neuropsychology Review, 3, 1–117.

Letz, R., & Baker, E. L. (1986). Computer-administered neurobehavioral testing in occupational health. Seminars in Occupational Medicine, 1, 197–203.

Lichtenberger, E. O. (2006). Computer utilization and clinical judgment in psychological assessment reports. Journal of Clinical Psychology, 62, 19–32.

Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–458.

Mercer, W. N., Harrell, E. H., Miller, D. C., Childs, H. W., & Rockers, D. M. (1997). Performance of brain-injured versus healthy adults on three versions of the Category Test. The Clinical Neuropsychologist, 11, 174–179.

Russell, E. W., & Vanderploeg, R. D. (2000). The application of computerized scoring programs to neuropsychological assessment. In Clinician's guide to neuropsychological assessment (2nd ed., pp. 483–515). New Jersey: Lawrence Erlbaum Associates.

Schatz, P., & Browndyke, J. (2002). Applications of computer-based neuropsychological assessment. Journal of Head Trauma Rehabilitation, 17, 395–410.

Sharland, M. J., & Gfeller, J. D. (2007). A survey of neuropsychologists' beliefs and practices with respect to the assessment of effort. Archives of Clinical Neuropsychology, 22, 213–223.

Sweet, J. J. (1999). Malingering: Differential diagnosis. In J. J. Sweet (Ed.), Forensic neuropsychology: Fundamentals and practice (pp. 255–285). New York: Swets & Zeitlinger.

Williamson, D. J., Drane, D. L., & Stroup, E. S. (2007). Symptom validity tests in the epilepsy clinic. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 346–365). New York: The Guilford Press.

Williamson, D. J., Drane, D. L., Stroup, E. S., Holmes, M. D., Wilensky, A. J., & Miller, J. W. (2005). Recent seizures may distort the validity of neurocognitive test scores in patients with epilepsy. Epilepsia, 46 (Suppl. 8), 74.

Williamson, D. J., Drane, D. L., Stroup, E. S., Miller, J. W., Holmes, M. D., & Wilensky, A. J. (2004). Detecting cognitive differences between patients with epilepsy and patients with psychogenic nonepileptic seizures: Effort matters. Epilepsia, 45 (Suppl. 7), 179.

Wynkoop, T. F., & Denney, R. L. (2005). Test review: Green's Word Memory Test (WMT) for Windows. Journal of Forensic Neuropsychology, 4, 101–105.

Yantz, C. J., & McCaffrey, R. J. (2007). Social facilitation effect of examiner attention or inattention to computer-administered neuropsychological tests: First sign that the examiner may affect results. The Clinical Neuropsychologist, 21, 663–671.

Yantz, C. L., & McCaffrey, R. J. (2009). Effects of parental presence and child characteristics on children's neuropsychological test performance: Third party observer effect confirmed. The Clinical Neuropsychologist, 23, 118–132.

Author notes

A version of this paper was presented as a poster at the 35th annual meeting of the International Neuropsychological Society in Portland, Oregon.