Abstract

The relative usefulness of two digit span (DS) variables in detecting negative response bias, as defined by below cut-off performance on the Test of Memory Malingering (TOMM), was examined among primarily middle-aged military veteran outpatients who were judged clinically to be at increased risk for displaying negative response bias on cognitive testing. Digit span variables included DS Age Scaled Score (DS Age SS) and Reliable DS. Findings from this retrospective data analysis (N = 46) suggest that DS Age SS is preferable for use over Reliable DS in predicting TOMM failure. Results of the current study suggest that, particularly if the Wechsler scales are an existing part of the neuropsychological assessment, examination of DS Age SS is an efficient means of detecting negative response bias.

Introduction

In the last decade or so, there has been a surge of interest in developing stand-alone measures used to detect negative response bias on neuropsychological testing. Accompanying this interest has been a corollary push to identify validity checks that are embedded in already existing neuropsychological instruments. Some authors have argued that the utilization of such embedded indicators may not only be more efficient and less costly than the use of stand-alone measures, but may also be potentially more valid (for review, see Meyers & Volbrecht, 2003). Specifically, embedded symptom validity tests may be more sensitive to malingering because persons who malinger do not necessarily do so consistently across the test battery (Meyers & Volbrecht, 2003). Instead, they may choose to feign one type of impairment or another. Thus, persons feigning different types of cognitive impairment (e.g., impaired fine motor skill versus attention deficits) may be better identified by techniques specific to the nature of the alleged dysfunction rather than by one global measure of malingering, most of which focus on memory skills.

Because basic measures of attention have been shown to be relatively insensitive to severe neurological injury (Baddeley & Warrington, 1970; Butters & Cermak, 1980; Cermak & Butters, 1972; Warrington & Weiskrantz, 1973; for review, see Suhr & Barrash, 2007), they are a focus of considerable interest as potentially useful malingering detection measures. Reliable DS appears to be the best researched of these attention measures. Reliable DS was introduced by Greiffenstein, Baker and Gola (1994) and is calculated by summing the longest string of digits repeated without error over two trials under both forward and backward conditions of the DS task. Reliable DS was originally derived from the Wechsler Adult Intelligence Scale – Revised (Wechsler, 1981), but it is also easily calculable from the Wechsler Adult Intelligence Scale – Third Edition (WAIS-III; Wechsler, 1997a), the Wechsler Memory Scale – Third Edition (WMS-III; Wechsler, 1997b), and the Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV; Wechsler, 2008).

A review of the literature shows that the use of Reliable DS has resulted in sensitivities ranging from .27 to .89 and specificities ranging from .57 to 1.0 (Table 1). Perhaps in light of the widely varying sensitivities and specificities, recent research has examined the relative usefulness of other DS variables in predicting malingering. Specifically, a few recently published studies have included a comparison of the relative usefulness of Reliable DS and DS Age Scaled Score (DS Age SS) in predicting negative response bias (Table 1). Results of these studies have generally shown that, when using the traditional cut-offs on these variables, Reliable DS appears to be more sensitive (sensitivities ranging from .54 to .65) to negative response bias than DS Age SS (sensitivities ranging from .36 to .42); however, there is a negative trade-off in terms of specificity. That is, Reliable DS tends to produce more false-positive errors (specificities ranging from .77 to .89) than does DS Age SS (specificities ranging from .92 to .93) (Babikian, Boone & Arnold, 2006; Greve et al., 2007; Heinly, Greve, Bianchini, Love & Brennan, 2005). Corresponding positive predictive values (PPVs) for DS Age SS ranged from .69 to 1.0, whereas PPVs for Reliable DS ranged from .58 to 1.0 (Table 1). Negative predictive values (NPVs) for DS Age SS ranged from .68 to .94, whereas NPVs for Reliable DS were slightly lower, ranging from .66 to .90 (Table 1). The association between Reliable DS and excessive false-positive errors is a concern because mislabeling patients, forensic clients, or veteran compensation and pension candidates as displaying negative response bias can have negative emotional, reputational, and, sometimes, financial consequences for them.

Table 1.

Summary of published sensitivity and specificity values for various Digit Span variables

| Measure | N^a | Cut-off | Sensitivity (%) | Specificity (%) | PPV^b | NPV^b | Description of non-malingerers | Description of malingerers | Age of groups (mean years) | Authors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DS Age SS | 60/40 | <5 | 90 | 90–100^c | .86–1.0 | .93–.94 | 1/3 undergraduates, 1/3 federal inmates, 1/3 persons with head injury (80% severe) | Simulated^d: 1/2 students, 1/2 federal inmates | 21–38 | Iverson and Franzen (1994) |
| DS Age SS | 60/40 | ≤3 | 78 | 100 | 1.0 | .87 | Memory impaired, undergraduates, psychiatric patients | Simulated: Undergraduates and psychiatric patients, repeated measures design | 20–42 | Iverson and Franzen (1996) |
| DS Age SS | 51/36 | ≤5 | 36 | 97 | .89 | .69 | Traumatic brain injury, mild traumatic brain injury | Probable^e: Mild head trauma | 38–43 | Axelrod and colleagues (2006) |
| DS Age SS | 88/66 | ≤5 | 42 | 93 | .81 | .70 | Neuropsychology clinic referrals, normal controls | Probable: Nearly all litigation/compensation seeking for cognitive symptoms | 42–75 | Babikian and colleagues (2006)^f |
| DS Age SS | 40/20 | ≤5 | 40 | 100 | 1.0 | .71 | Undergraduate controls, undergraduates experiencing cold pressor-induced pain | Simulated: Undergraduates asked to simulate pain-related memory impairment | Range: 18–23 | Etherton, Bianchini, Ciota, Heinly and Greve (2006) Study 1 |
| DS Age SS | 124/32 | ≤5 | 47 | 85 | .69 | .70 | Clinic pain patients, traumatic brain injury, and memory disorder patients | Definite^g: Pain patients | 32–70 | Etherton and colleagues (2006) Study 2 |
| DS Age SS | 146/71^h | ≤5 | 36 | 93 | .78 | .68 | Traumatic brain injury | Probable/definite: Traumatic brain injury | 40 | Heinly and colleagues (2005)^f |
| DS Age SS | 38/46 | ≤5 | 46 | 92 | .80 | .71 | Toxic exposure | Probable/definite: Toxic exposure | 40–43 | Greve and colleagues (2007)^f |
| Reliable DS | 63/43 | ≤7 | 68 | 89 | .81 | .80 | Persistent postconcussive syndrome and traumatic brain injury | Probable: Persistent postconcussive syndrome | 33–39 | Greiffenstein and colleagues (1994) |
| Reliable DS | 49/47 | ≤7 | 49 | 96 | .89 | .73 | Mild brain injury | Probable: Litigating mild brain injury^i | 36–40 | Meyers & Volbrecht (1998) |
| Reliable DS | 40/34 | ≤7 | 53 | 95 | .88 | .74 | Naïve healthy people, professionals working with head-injured people, persons with head injury not in litigation | Simulated: Naïve healthy people, professionals working with head-injured people, persons with head injury not in litigation | 33–41 | Strauss and colleagues (2002) |
| Reliable DS | 48/44 | ≤7 | 27 | 100 | 1.0 | .66 | Undergraduates with and without head injury | Simulated: Undergraduates with and without head injury | 18–19 | Inman and Berry (2002) |
| Reliable DS | 30/24 | ≤7 | 67 | 93 | .87 | .80 | Traumatic brain injury, 67% severe | Probable/definite: Traumatic brain injury, 75% mild | 34–39 | Mathias, Greve, Bianchini, Houston and Crouch (2002) |
| Reliable DS | 56/68 | ≤7 | 86^j | 57^j | .58 | .85 | Severe brain injury, persistent post-concussive syndrome | Probable: Persistent postconcussive symptoms | 33–38 | Greiffenstein and colleagues (1995) |
| Reliable DS | | | 89^k | 68^k | .66 | .90 | | | | |
| Reliable DS | 134/54 | ≤7 | 68 | 72 | .63 | .76 | Penitentiary sample of pretrial/presentence detainees | Probable: Penitentiary sample of pretrial/presentence detainees | 35–38 | Duncan and Ausborn (2002) |
| Reliable DS | 27/24 | ≤7 | 50 | 94 | .85 | .73 | Moderate/severe closed head injury | Definite: Mostly mild closed head trauma | 35–39 | Larrabee (2003) |
| Reliable DS | 122/35 | ≤7 | 60 | 92 | .84 | .77 | Chronic pain, moderate to severe traumatic brain injury | Definite: Chronic pain | 35–43 | Etherton, Bianchini, Greve and Heinly (2005a) |
| Reliable DS | 40/20 | ≤7 | 65 | 100 | 1.0 | .80 | Undergraduate controls, undergraduates experiencing cold pressor-induced pain | Simulated: Undergraduates asked to simulate pain-related memory impairment | Range: 18–23 | Etherton, Bianchini, Ciota and Greve (2005b) |
| Reliable DS | 88/66 | ≤7 | 62 | 77 | .65 | .74 | Neuropsychology clinic referrals, normal controls | Probable: Nearly all litigation/compensation seeking for cognitive symptoms | 42–75 | Babikian and colleagues (2006)^f |
| Reliable DS | 146/71^h | ≤7 | 65 | 87 | .78 | .78 | Traumatic brain injury | Probable/definite: Traumatic brain injury | 40 | Heinly and colleagues (2005)^f |
| Reliable DS | 38/46 | ≤7 | 54 | 89 | .77 | .74 | Toxic exposure | Probable/definite: Toxic exposure | 40–43 | Greve and colleagues (2007)^f |

Notes: PPV = Positive Predictive Value; NPV = Negative Predictive Value.

^a Reported Ns are for non-malingerers/malingerers.

^b Values calculated using a base rate of .41.

^c 90 in a traumatic brain injury patient sample, 100 in non-malingering students and inmates.

^d Individuals who were instructed not to put forth adequate effort.

^e Individuals who, regardless of motive, did not put forth adequate effort as measured by various tests of motivation.

^f Investigations that provide sensitivity and specificity data for both Reliable DS and DS Age SS.

^g Individuals who demonstrated a statistically below-chance performance on a forced choice symptom validity test.

^h For clarity, only the traumatic brain injury statistics are presented from this study.

^i Suspect malingering group assignment was made solely based on litigation versus non-litigation status, not objective measures.

^j When malingerers were compared with non-malingering persons with traumatic brain injury.

^k When malingerers were compared with non-malingering persons with persistent post-concussive symptoms.

Although research seems to suggest that DS Age SS may be a more specific predictor of negative response bias than Reliable DS, DS Age SS, unlike Reliable DS, does not seem to enjoy widespread use among neuropsychologists. In fact, a recent survey addressing practices of fellows and professional members of the National Academy of Neuropsychology did not include DS Age SS among the 29 measures used by neuropsychologists to assess effort, whereas Reliable DS was the sixth most-often “Always” used measure. Specifically, the neuropsychologists ranked their use of Reliable DS as: Always 13.0%, Often 25.0%, Rarely 24.5%, and Never 37.5% (Sharland & Gfeller, 2007).

Because it is likely that many clinicians are unaware of the research supporting the use of DS Age SS over Reliable DS in measuring effort, we hope to provide additional exposure and support for the use of the former by cross-validating its utility in the veteran population. Specifically, it was the aim of the present study to compare the relative usefulness of DS Age SS and Reliable DS in predicting failure on the Test of Memory Malingering (TOMM; Tombaugh, 1996) among a group of middle-aged veterans judged by the evaluating clinician to be at an increased risk of displaying negative response bias. It was hypothesized that DS Age SS would be a better predictor of TOMM failure than Reliable DS.

Materials and Methods

Participants

Data were collected from the files of 46 outpatients who were referred to the lead author (K.W.) for neuropsychological testing within a 19-month period at a Department of Veterans Affairs (VA) Medical Center. Referral sources included the Psychiatry Ambulatory Care Clinic, primary medical clinics, and the neurology clinic. No patients were diagnosed with mental retardation. Consecutive referrals were reviewed for cases that were administered both the TOMM and DS of the WAIS-III as part of the clinical neuropsychological evaluation.

Consistent with the current practice recommendations (Bush et al., 2005) and assessment guidelines (Slick, Sherman & Iverson, 1999), symptom validity was clinically evaluated in these patients with consideration of evidence provided by the referral context, clinical interview, behavioral observations during test administration, and performance on measures of examinee effort. Not all patients seen by K.W. are routinely administered tests of effort. For example, patients who have an incentive to perform well on testing are typically not administered tests of effort. These patients include those who are being tested for capacity to manage their own funds, suitability for penile implants, suitability for organ transplants and so forth. The reasons for administering effort testing to patients in the present study were often multi-factorial and included self-reported cognitive deficits in a relatively young person (i.e., under the age of 65) who was free of major medical or neurological illness that produced symptoms obvious to the observer; the observation of multiple requests for compensation and pension evaluations in the patient's medical record; referring diagnosis of mild traumatic brain injury, somatization disorder, or toxic exposure; observation of inconsistent self-reports concerning illness or injury in the patient's medical record; and/or suspicions regarding symptom exaggeration as noted by other clinicians.

It should be noted that, in the present study, all patients were primarily referred for neuropsychological evaluation to assess the potential presence of cognitive dysfunction, not primarily to assess for the presence of psychiatric disorder. At the time of this study, participants' medical records were retrospectively reviewed and they were categorized into groups according to whether they passed (n = 26, 56.5% of total sample) or failed (n = 20, 43.5% of total sample) the TOMM. Patient demographic and diagnostic information is listed in Table 2. In both the Pass TOMM (26.9%) and Fail TOMM (30.0%) groups, approximately one-third of patients were referred with symptoms of traumatic brain injury; judging by their self-reports, most of these patients' reported brain injuries would have fallen within the “mild traumatic brain injury” or lesser “possible traumatic brain injury” range (100% in the Fail TOMM group, 57% in the Pass TOMM group) (Malec, Brown, Leibson, Flaada & Mandrekar, 2007).

Table 2.

Patient demographic and diagnostic information

| | Pass TOMM group (N = 26) | Fail TOMM group (N = 20) |
| --- | --- | --- |
| Demographic information | | |
| Age, M (SD) | 50.6 (9.7); range 30–68 | 47.3 (9.8); range 29–62 |
| Highest year of education, M (SD) | 12.0 (2.0); range 7–16 | 11.9 (2.0); range 8–16 |
| Gender | | |
| Female, n (%) | 1 (3.8) | 3 (15.0) |
| Male, n (%) | 25 (96.2) | 17 (85.0) |
| Ethnicity | | |
| Caucasian, n (%) | 21 (80.8) | 18 (90.0) |
| African-American, n (%) | 5 (19.2) | 2 (10.0) |
| Diagnostic information | | |
| Traumatic brain injury,^a n (%) | 7 (26.9) | 6 (30.0) |
| Mild traumatic brain injury, n (%) | 4 (15.4) | 6 (30.0) |
| Moderate-to-severe traumatic brain injury, n (%) | 3 (11.5) | 0 (0.0) |
| Under the age of 65, memory complaints with no obvious neurological dysfunction, n (%) | 19 (73.1) | 14 (70.0) |
| Psychiatric diagnosis only, n (%) | 4 (15.4) | 5 (25.0) |
| Neurologic diagnosis only, n (%) | 2 (7.7) | 1 (5.0) |
| Psychiatric and neurologic diagnoses, n (%) | 13 (50.0) | 8 (40.0) |

Note: ^a Traumatic brain injury categorization was based on patient self-report and is of questionable validity.

The majority of patients in both the Pass TOMM (73.1%) and Fail TOMM (70.0%) groups were referred with memory complaints, and, even though they may have had a neurological and/or psychiatric diagnosis associated with potential brain dysfunction, they had no signs of neurological dysfunction that were obvious to the observer. In the Pass TOMM group, 50% (n = 13/26) of patients had both a neurological and psychiatric diagnosis, 15% (n = 4/26) had only a psychiatric diagnosis (n = 3 depression and/or anxiety, n = 1 schizoaffective disorder), and 8% (n = 2/26) had only a neurological diagnosis, specifically, coronary artery disease. In patients with both a neurological and psychiatric condition in the Pass TOMM group (n = 13/26), the most common neurological condition was hypertension (n = 7/13), while the most common psychiatric condition was depression and/or anxiety (n = 11/13).

In the Fail TOMM group, 40% (n = 8/20) of patients had both a neurological and a psychiatric diagnosis, 25% (n = 5/20) had only a psychiatric diagnosis (n = 5 depression and/or anxiety, n = 3 post-traumatic stress disorder) and 5% (n = 1/20) had only a neurological diagnosis, specifically, small vessel ischemic disease. The most common neurological conditions in persons with both neurological and psychiatric diagnoses in the Fail TOMM group (n = 8/20) were hypertension (n = 2/8) and transient ischemic attack/minor stroke (n = 2/8), while the most common psychiatric condition in this group was depression and/or anxiety (n = 6/8).

Measures and Procedures

The TOMM (Tombaugh, 1996) was administered according to standardized instructions by a testing technician in concert with other selected neuropsychological tests. The TOMM has been validated using numerous populations, including neurological patients, community-dwelling geriatric individuals, college student normal controls and simulators, persons simulating traumatic brain injury, actual brain injury litigants, and persons with depression (Ashendorf, Constantinou & McCaffrey, 2004; Rees, Tombaugh, Gansler & Moczynski, 1998; Tombaugh, 1997). Scores less than the specified cut-offs on any trial of the TOMM raise doubt about the validity of the test-taker's performance. The TOMM is often perceived as being a harder task than it actually is; thus, individuals motivated to exaggerate impairments often perform worse than normative data predict.

In the present study, as suggested in the test manual, patients were considered to have failed the TOMM if they performed below chance on Trial 1 or below the cut-off listed in the test manual on either Trial 2 or the Retention Trial. In the present sample, 20% (n = 4/20) of the individuals in the group who failed the TOMM performed below chance on Trial 1; all of these individuals also failed Trial 2 and the Retention Trial of the TOMM. Eighty percent (n = 16/20) of participants in the group who failed the TOMM performed below the cut-off on Trial 2. Nearly all individuals who performed below the cut-off on Trial 2 also performed below the cut-off on the Retention Trial; there were two exceptions: one individual who performed below the cut-off on Trial 2 but passed the Retention Trial, and one individual who performed below the cut-off on Trial 2 but was inadvertently not administered the Retention Trial. Ninety percent (n = 18/20) of individuals who failed the TOMM performed below the cut-off on the Retention Trial. Four of the 18 individuals who performed below the cut-off on the Retention Trial performed within normal limits on Trials 1 and 2 of the TOMM.

The DS subtest of the Wechsler Adult Intelligence Scale-III (WAIS-III; Wechsler, 1997a, 1997b) was administered to all study participants as a part of their routine clinical care. Administration followed instructions specified in the test manual. From each participant's performance on DS, two scores were derived. These scores were DS Age SS and Reliable DS: DS Age SS was computed by summing the participant's scores on all forward and backward trials of DS and then extracting the corresponding DS Age SS from the tables provided in the test manual. Reliable DS, introduced by Greiffenstein and colleagues (1994), represents the sum of the longest string of digits repeated without error over two trials under both forward and backward conditions. For example, a participant who passed both trials of four digits forward, passed both trials of three digits backward, but failed one trial each of five digits forward and four digits backward, would receive a Reliable DS score of 7.
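The Reliable DS scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring procedure: the dictionary layout (span length mapped to a pair of pass/fail flags) and the function names `longest_reliable_span` and `reliable_digit_span` are assumptions introduced here.

```python
def longest_reliable_span(trials_passed):
    """Return the longest span length at which BOTH trials were passed.

    trials_passed maps span length -> (trial 1 passed, trial 2 passed).
    This layout is a hypothetical convenience, not a test-manual format.
    """
    reliable = [length for length, (t1, t2) in trials_passed.items() if t1 and t2]
    return max(reliable, default=0)


def reliable_digit_span(forward, backward):
    """Sum the longest reliably repeated forward and backward spans."""
    return longest_reliable_span(forward) + longest_reliable_span(backward)


# Worked example from the text: both trials passed at four digits forward and
# three digits backward; one trial failed at five forward and four backward.
forward = {2: (True, True), 3: (True, True), 4: (True, True), 5: (True, False)}
backward = {2: (True, True), 3: (True, True), 4: (False, True)}

example_rds = reliable_digit_span(forward, backward)
print(example_rds)  # 4 forward + 3 backward = 7
```

A span length counts only when both of its trials were repeated without error, so a single correct trial at a longer length does not raise the score.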

Statistical Analyses

Except where indicated, statistical analyses were calculated using SPSS (version 10.1, Chicago, IL). Initial analyses consisted of independent-samples t-tests comparing age, education, DS Age SS, and Reliable DS between persons failing and persons passing the TOMM. Alpha was set at 0.05 for this and all other analyses. Cohen's d effect sizes, along with 95% confidence intervals (CIs), for these group differences were calculated using a computerized program provided by Devilly (2004). The aforementioned analyses all focused on a dichotomous distinction between groups (i.e., TOMM pass or fail). Given the additional variability inherent in continuous TOMM scores, an alternative view of the relationship between the TOMM and the DS variables was created by computing two-tailed Spearman's rho correlations between all trials on the TOMM and the DS variables. Spearman's rho correlations were used in place of Pearson's correlations because TOMM scores were not normally distributed. Two-tailed Pearson correlations were computed to examine the relationship between age and the various DS variables.
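For readers who wish to check an effect size from summary statistics, the following sketch computes Cohen's d with a pooled standard deviation and the corresponding independent-samples t statistic. It is an illustration only: the pooled-SD form of d is assumed here and may differ in detail from the program provided by Devilly (2004), the function names are hypothetical, and the inputs are the rounded DS Age SS group means and standard deviations reported in Table 3.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Standardized mean difference (assumed pooled-SD form of Cohen's d)."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)


def t_from_d(d, n1, n2):
    """Independent-samples t statistic implied by d and the group sizes."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# DS Age SS summary statistics as reported (rounded) in Table 3.
d_age_ss = cohens_d(26, 9.19, 2.02, 20, 7.60, 1.96)
t_age_ss = t_from_d(d_age_ss, 26, 20)

print(round(d_age_ss, 2))  # close to the reported d of .80
print(round(t_age_ss, 2))  # close to the reported t(44); small gaps reflect rounded inputs
```

Because the inputs are rounded to two decimals, the recomputed t statistic can differ from the published value in the second decimal place.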

A receiver operating characteristic (ROC) curve analysis was used to evaluate the usefulness of DS scores in predicting TOMM failure. As part of the ROC analysis, the respective sensitivity and specificity of Reliable DS and DS Age SS at various cut-offs was examined. In the present study, sensitivity was calculated by dividing the number of individuals who failed both the TOMM and the specific DS variable under consideration by the total number of individuals who failed the TOMM. Specificity, on the other hand, was calculated by dividing the number of individuals who passed both the TOMM and the specific DS variable under consideration by the total number of individuals who passed the TOMM.
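The two definitions above reduce to simple proportions, sketched here in Python. The function and argument names are illustrative assumptions. The example counts are back-calculated from the reported DS Age SS values at a cut-off of 6 (sensitivity = .35 with 20 patients failing the TOMM; specificity = .96 with 26 passing) and are not taken from the raw data.

```python
def sensitivity(failed_both, failed_tomm_total):
    """Proportion of TOMM failers who also fall at or below the DS cut-off."""
    return failed_both / failed_tomm_total


def specificity(passed_both, passed_tomm_total):
    """Proportion of TOMM passers who also score above the DS cut-off."""
    return passed_both / passed_tomm_total


# Assumed counts, inferred from the reported proportions (not raw data):
# 7 of 20 TOMM failers at or below the cut-off; 25 of 26 TOMM passers above it.
sens = sensitivity(failed_both=7, failed_tomm_total=20)
spec = specificity(passed_both=25, passed_tomm_total=26)

print(round(sens, 2), round(spec, 2))
```

Repeating the calculation at each candidate cut-off yields the pairs of sensitivity and specificity values that a ROC curve summarizes.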

Following the ROC analysis, PPVs and NPVs were calculated using the formulas presented in O'Bryant and Lucas (2006). In the present study, PPV refers to the likelihood that a patient failed the TOMM, given that they failed a specific DS variable. Negative predictive value refers to the likelihood that a patient passed the TOMM, given that they passed a specific DS variable. An estimated base rate of the condition in question (in this case, negative response bias as evidenced by TOMM failure) is needed to calculate PPV and NPV. The decision was made to use a relatively high base rate (41%) for several reasons. The first reason was that, at the time of their clinical evaluation, participants in the present study were judged to be at an elevated risk of displaying negative response bias for a variety of reasons that were discussed in the Participants section. The second reason was that 22% of the participants were referred with self-reported mild head injury, which is a diagnosis estimated by the American Board of Clinical Neuropsychology membership to be associated with a 41% probability of malingering or symptom exaggeration in persons in litigation or compensation-seeking status (Mittenberg, Patton, Canyock & Condit, 2002). Finally, the third reason a relatively high base rate was chosen was that, in the veteran population, monetary gain in the form of monthly compensation is possible if the veteran (a) is at least 10% disabled as a result of military service and/or (b) served during a time of war, has limited income, and is deemed “permanently and totally disabled” or is at least 65 years old (Veterans Benefits Administration, 2006).
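The dependence of PPV and NPV on the assumed base rate can be illustrated with the standard Bayes-rule expressions below. We assume these are equivalent to the formulas in O'Bryant and Lucas (2006), which are not reproduced in this article; the function names are hypothetical. The inputs are the rounded sensitivity (.35) and specificity (.96) reported for DS Age SS at a cut-off of 6.

```python
def ppv(sens, spec, base_rate):
    """Positive predictive value from sensitivity, specificity and base rate."""
    true_pos = sens * base_rate
    false_pos = (1 - spec) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, base_rate):
    """Negative predictive value from sensitivity, specificity and base rate."""
    true_neg = spec * (1 - base_rate)
    false_neg = (1 - sens) * base_rate
    return true_neg / (true_neg + false_neg)


# DS Age SS at a cut-off of 6, evaluated at the three base rates in Table 5.
for base_rate in (0.20, 0.30, 0.41):
    print(base_rate,
          round(ppv(0.35, 0.96, base_rate), 2),
          round(npv(0.35, 0.96, base_rate), 2))
```

With a fixed sensitivity and specificity, PPV rises and NPV falls as the assumed base rate increases, which is why the choice of base rate matters for interpreting a failed DS score.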

Results

Group Differences (as Shown by T-Tests and Effect Sizes) and Pearson Correlations

In terms of age and education, results of the t-tests showed that there were no significant differences between the group who passed the TOMM (N = 26) and the group who failed the TOMM (N = 20) (Table 2). In terms of performance on DS Age SS, there was a significant difference between the group who passed versus the group who failed the TOMM (Table 3). The effect size for this group difference was d = .80, which is considered large (Lipsey, 1990). There was a trend for group differences on Reliable DS, p < .10. As expected, persons failing the TOMM scored lower, indicating more negative response bias, than those passing the TOMM on both DS Age SS and Reliable DS. Results of Spearman's rho correlations showed that, although DS Age SS and Reliable DS were both significantly correlated with all trials of the TOMM, DS Age SS showed the highest correlations (Table 4). With regard to the relationship between age and DS variables, Reliable DS was significantly correlated with age, r = −.30, p < .05, while DS Age SS was not, r = −.23, p = .13.

Table 3.

Group differences in digit span scores among groups passing and failing TOMM

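| | Pass TOMM (n = 26): M | Pass TOMM (n = 26): SD | Fail TOMM (n = 20): M | Fail TOMM (n = 20): SD | t(44) | p | Cohen's d | 95% CI for d |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DS Age SS | 9.19 | 2.02 | 7.60 | 1.96 | 2.69 | .01 | .80 | .19 to 1.40 |
| Reliable DS | 9.04 | 1.95 | 8.05 | 1.70 | 1.80 | .08 | .54 | −.05 to 1.13 |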

Note: TOMM = Test of Memory Malingering; DS = Digit Span.

Table 4.

Spearman's rho correlations between the TOMM Scales and Digit Span scores

| | DS Age SS | Reliable DS |
| --- | --- | --- |
| TOMM Trial 1 | .50^a | .38^a |
| TOMM Trial 2 | .52^a | .36^b |
| TOMM Retention Trial | .43^a | .39^a |

Notes: TOMM = Test of Memory Malingering; DS = Digit Span; SS = Scaled Score.

^a Correlation is significant at the .01 level.

^b Correlation is significant at the .05 level.

Receiver Operating Characteristic, Sensitivity, Specificity, and Predictive Power Analyses

As shown by the ROC analysis, the area under the curve of .71 (95% CI .56 to .86) suggests that the predictive information captured by DS Age SS was reasonably good and higher than that captured by Reliable DS (area under the curve = .64, 95% CI .48 to .80; Table 5). Using the DS Age SS cut-off of 7, suggested by Axelrod, Fichtenberg, Millis and Wertheimer (2006), resulted in a low specificity of .77, but acceptable sensitivity of .50 (Table 5). A more reasonable false-positive rate (.04) was found when using a slightly lower cut-off score of 6 (specificity = .96). Using a cut-off of 6, only one participant passed the TOMM but failed DS Age SS; at the conclusion of testing, that participant was diagnosed with mild cognitive impairment. At the cut-off score of 6, the sensitivity of DS Age SS was considerably lower (sensitivity = .35) than at a cut-off score of 7 (sensitivity = .50), but was roughly similar to that found by Axelrod and colleagues (2006), who reported a sensitivity value of .36 and a specificity value of .97 when using an even lower cut-off score of 5 in their samples. In the present study, using a cut-off of 6 on DS Age SS and a base rate of .41 for negative response bias resulted in a PPV of .86 and an NPV of .68 (Table 5). See Table 6 for failure rates on the TOMM and DS variables among the various study groups.

Table 5.

Sensitivity and specificity values for Digit Span variables

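| Variable and cut-off | ROC analysis (95% CI) | Sensitivity (%) | Specificity (%) | PPV at 20% base rate | PPV at 30% base rate | PPV at 41% base rate | NPV at 20% base rate | NPV at 30% base rate | NPV at 41% base rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DS Age SS | .71 (.56–.86) | | | | | | | | |
| ≤4 | | | 100 | 1.0 | 1.0 | 1.0 | .81 | .71 | .60 |
| ≤5 | | 15 | 96 | .48 | .62 | .72 | .82 | .72 | .62 |
| ≤6 | | 35 | 96 | .69 | .79 | .86 | .86 | .78 | .68 |
| ≤7 | | 50 | 77 | .35 | .48 | .60 | .86 | .78 | .69 |
| Reliable DS | .64 (.48–.80) | | | | | | | | |
| ≤5 | | | 92 | .14 | .21 | .31 | .79 | .69 | .60 |
| ≤6 | | 25 | 92 | .44 | .57 | .68 | .83 | .74 | .64 |
| ≤7 | | 40 | 85 | .40 | .53 | .65 | .85 | .77 | .67 |
| ≤8 | | 50 | 42 | .40 | .27 | .38 | .85 | .66 | .55 |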

Notes: Prevalence of TOMM failure was .43.

PPV = positive predictive value; NPV = negative predictive value; DS = Digit Span; SS = Scaled Score.

Table 6.

Failure rates on the TOMM and DS variables among study groups

| Diagnostic info/referral reason | % Failing TOMM | % Failing DS Age SS (≤6) | % Failing Reliable Digit Span (≤6) |
|---|---|---|---|
| Traumatic brain injury (n = 13)ᵃ | 46.2 | 15.4 | 7.7 |
|   Mild traumatic brain injury (n = 10) | 60.0 | 20.0 | 10.0 |
|   Moderate-to-severe traumatic brain injury (n = 3) | 0.0 | 0.0 | 0.0 |
| Under age 65 no obvious neurological dysfunction (n = 33) | 42.4 | 18.2 | 18.2 |
|   Psychiatric diagnosis only (n = 9) | 55.6 | 33.3 | 22.2 |
|   Neurologic diagnosis only (n = 3) | 33.3 | 33.3 | 33.3 |
|   Psychiatric and neurological diagnosis (n = 21) | 38.1 | 9.5 | 14.3 |

Note: ᵃTraumatic brain injury categorization was based on patient self-report and is of questionable validity.

Discussion

Findings of the current study provide evidence for the superiority of DS Age SS over Reliable DS in predicting symptom validity test failure among a group of primarily middle-aged outpatient military veterans who were judged clinically to be at increased risk for displaying negative response bias. When DS Age SS and Reliable DS were compared between the groups who failed versus passed the TOMM, only scores on DS Age SS were significantly different between groups. Group means on Reliable DS differed only at the trend level. As expected, the group who failed the TOMM scored lower than the group who passed the TOMM on both DS Age SS and Reliable DS. DS Age SS demonstrated the largest effect size (d = .80) between groups and was the most highly correlated with performances on the various trials of the TOMM. Whereas DS Age SS was not significantly correlated with age, Reliable DS was; as age increased, Reliable DS scores decreased.

Receiver operating characteristic curve analysis suggested that a DS Age SS cut-off of 6 minimized false positives while retaining a reasonable sensitivity: the resulting sensitivity of .35 and specificity of .96 are both fairly typical in the practice of symptom validity testing. These data suggest that, in terms of PPV, using a DS Age SS cut-off of 6 and a negative response bias base rate of .41, a clinician would have an 86% probability of being correct in suspecting a patient of TOMM failure given a below cut-off performance on DS Age SS. In terms of NPV, the same clinician would have a 68% probability of being correct in not suspecting a patient of possible TOMM failure given an above cut-off performance on DS Age SS.

In contrast, when used in settings where the base rate of negative response bias is lower, we would expect PPVs to decrease and NPVs to increase. For example, using the same DS Age SS score cut-off of 6, but decreasing the base rate of negative response bias to 20%, a clinician would only have a 69% probability of being correct (PPV) in suspecting a patient of TOMM failure (Table 5). In terms of NPV, the same clinician would have a higher, 86%, chance of being correct in not suspecting a patient of possible TOMM failure given an above cut-off performance on DS Age SS.
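The dependence of predictive values on base rate can be sketched with Bayes' theorem. The example below is a minimal illustration, not the procedure used to generate Table 5; the function name is hypothetical, and the inputs are the rounded sensitivity (.35) and specificity (.96) reported for the DS Age SS cut-off of 6.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a test applied at a given base rate (Bayes' theorem)."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# DS Age SS cut-off of 6, using the rounded values reported in Table 5.
for base_rate in (0.20, 0.30, 0.41):
    ppv, npv = predictive_values(0.35, 0.96, base_rate)
    print(f"base rate {base_rate:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
# prints:
# base rate 0.20: PPV = 0.69, NPV = 0.86
# base rate 0.30: PPV = 0.79, NPV = 0.78
# base rate 0.41: PPV = 0.86, NPV = 0.68
```

Under these inputs the sketch reproduces the DS Age SS ≤6 row of Table 5, with PPV falling and NPV rising as the base rate decreases.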

With regard to Reliable DS, a cut-off of 6 provided the best balance of sensitivity (25%) and specificity (92%). However, these values were still inferior to those produced when using a DS Age SS cut-off of 6, which produced a sensitivity of 35% and a specificity of 96%. If used in a setting with a low base rate of malingering, a Reliable DS cut-off of 6 produced PPVs at an unacceptable level, falling below 50% (Table 5). Assuming a setting of increased risk for negative response bias, with a base rate of 41%, the PPV improved to .68 and the NPV was .64; however, these values were still inferior to those produced by using a DS Age SS cut-off of 6, which was associated with a .86 PPV and a .68 NPV in the same hypothetical setting.

Thus, the results of the current study are consistent with previous research suggesting that DS Age SS is likely preferable for use over Reliable DS as a measure of negative response bias on cognitive testing. While providing supportive data for the use of DS Age SS in a group at high risk of displaying negative response bias, the study results also highlight the need for clinicians to carefully consider the predictive value of such measures in the specific population they serve. The PPV of all of these measures is substantially reduced among populations at low risk for negative response bias on cognitive testing. It should be emphasized that the diagnosis of invalid presentation, especially if malingering is in question, is a clinical judgment that cannot be made on the results of one test alone, but must be made in consideration of other psychometric, behavioral, and collateral data (Slick et al., 1999).

It should also be noted that evaluating the clinical utility of one symptom validity test by comparing it to another is limited by the validity of the criterion or “gold standard” measure used in the comparison. In this case, the TOMM was used as the gold standard. The finding that the TOMM shows less than a 1% false-positive rate (Gervais, Rohling, Green & Ford, 2004) suggests that, if a person fails the TOMM, she/he can nearly always be correctly classified as someone who is putting forth less than optimal effort. With this in mind, let us consider the group of individuals who failed the TOMM in the current study. Of these 20 individuals, only 7 failed DS Age SS and even fewer, only 5, failed Reliable DS. Thus, even though DS Age SS may be more sensitive than Reliable DS, DS Age SS still only identified approximately 40% of individuals classified as exhibiting negative response bias by the TOMM. This suggests that DS Age SS is a rather insensitive measure, even when compared with the relatively insensitive TOMM. By some reports, the TOMM has been estimated to identify only one out of three persons exhibiting negative response bias (Gervais et al., 2004). Therefore, if DS Age SS identifies only 40% of the persons demonstrating negative response bias as detected by the TOMM, then, theoretically speaking, DS Age SS may only identify 13% of all persons exhibiting negative response bias.
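The arithmetic behind this chained estimate can be sketched as follows. The variable names are hypothetical, and the product rests on the assumption that DS Age SS flags none of the cases the TOMM misses, so it is best read as a rough lower bound rather than a measured sensitivity. The sketch also shows the result when the observed proportion (7 of 20) is used in place of the rounded 40% figure.

```python
tomm_failures = 20        # participants failing the TOMM in the current study
ds_age_ss_hits = 7        # of those, scoring at or below 6 on DS Age SS
tomm_sensitivity = 1 / 3  # "one out of three", per the estimate cited in the text

# Proportion of TOMM failures that DS Age SS also flagged.
observed_proportion = ds_age_ss_hits / tomm_failures

# Chained estimates: share of all cases of negative response bias flagged by DS Age SS,
# assuming (for illustration) that it flags no case the TOMM misses.
estimate_rounded = 0.40 * tomm_sensitivity
estimate_observed = observed_proportion * tomm_sensitivity

print(f"observed proportion: {observed_proportion:.0%}")      # 35%
print(f"estimate from rounded 40%: {estimate_rounded:.0%}")   # 13%
print(f"estimate from observed 7/20: {estimate_observed:.0%}")  # 12%
```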

Keeping the above limitations in mind, future research should continue to investigate the relationships between cognitive symptom validity tests, such as the TOMM, and embedded indicators of negative response bias, such as DS scores. Whereas the current findings suggest that the WAIS-III DS Age SS is preferable for use over Reliable DS in detecting negative response bias on the TOMM, it remains to be seen whether this will hold true when using the WAIS-IV. Although the DS forward and backward trials remain essentially unchanged in the WAIS-IV, resulting in comparable Reliable DS scores, an additional “sequencing” task has been added. The sequencing task requires the examinee to listen to a string of numbers and then repeat them back to the examiner in order, starting with the lowest number. Scores from that task are summed with scores from DS forward and backward in order to determine the DS Age SS. It is possible that the addition of this task may change the accuracy of DS Age SS in predicting negative response bias on testing.

Conflict of Interest

None declared.

References

Ashendorf L., Constantinou M., McCaffrey R. J. The effects of depression and anxiety on the TOMM in community dwelling older adults. Archives of Clinical Neuropsychology, 2004, vol. 19 (pg. 125-130).

Axelrod B. N., Fitchenberg N. L., Millis S. R., Wertheimer J. C. Detecting incomplete effort with digit span from the Wechsler Adult Intelligence Scale – Third Edition. The Clinical Neuropsychologist, 2006, vol. 20 (pg. 513-523).

Babikian T., Boone K. B., Arnold G. Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 2006, vol. 20 (pg. 145-159).

Baddeley A. D., Warrington E. K. Amnesia and the distinction between long- and short-term memory. Journal of Verbal Learning and Verbal Behavior, 1970, vol. 9 (pg. 176-189).

Bush S. S., Ruff R. M., Tröster A. I., Barth J. T., Koffler S. P., Pliskin N. H., et al. Symptom validity assessment: Practice issues and medical necessity (NAN Policy & Planning Committee). Archives of Clinical Neuropsychology, 2005, vol. 20 (pg. 419-426).

Butters N., Cermak L. Alcoholic Korsakoff's syndrome: An information-processing approach, 1980. San Diego: Academic Press.

Cermak L. S., Butters N. The role of interference and encoding in the short-term memory deficits of Korsakoff patients. Neuropsychologia, 1972, vol. 10 (pg. 89-96).

Devilly G. J. Effect size generator for windows: version 2.3, 2004. Australia: Centre for Neuropsychology, Swinburne University.

Duncan S. A., Ausborn D. The use of Reliable Digits to detect malingering in a criminal forensic pretrial population. Assessment, 2002, vol. 9(1) (pg. 56-61).

Etherton J. L., Bianchini K. J., Ciota M. A., Greve K. W. Reliable digit span is unaffected by laboratory-induced pain: Implications for clinical use. Assessment, 2005, vol. 12(1) (pg. 101-106).

Etherton J. L., Bianchini K. J., Greve K. W., Heinly M. T. Sensitivity and specificity of reliable digit span in malingered pain-related disability. Assessment, 2005, vol. 12(2) (pg. 130-136).

Etherton J. L., Bianchini K. J., Ciota M. A., Heinly M. T., Greve K. W. Pain, malingering and the WAIS-III Working Memory Index. The Spine Journal, 2006, vol. 6 (pg. 61-71).

Gervais R. O., Rohling M. L., Green P., Ford W. A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology, 2004, vol. 19 (pg. 475-487).

Greiffenstein M. F., Baker W. J., Gola T. Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 1994, vol. 6(3) (pg. 218-224).

Greiffenstein M. F., Gola T., Baker W. J. MMPI-2 validity scales versus domain specific measures in detection of factitious traumatic brain injury. The Clinical Neuropsychologist, 1995, vol. 9(3) (pg. 230-240).

Greve K. W., Springer S., Bianchini K. J., Black F. W., Heinly M. T., Love J. M., et al. Malingering in toxic exposure: Classification accuracy of Reliable Digit Span and WAIS-III Digit Span scaled score. Assessment, 2007, vol. 14(1) (pg. 12-21).

Heinly M. T., Greve K. W., Bianchini K. J., Love J. M., Brennan A. WAIS Digit Span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 2005, vol. 12(4) (pg. 429-444).

Inman T. H., Berry T. R. Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 2002, vol. 17 (pg. 1-23).

Iverson G. L., Franzen M. D. The Recognition Memory Test, Digit Span, and Knox Cube Test as markers of malingered memory impairment. Assessment, 1994, vol. 1(4) (pg. 323-334).

Iverson G. L., Franzen M. D. Using multiple objective memory procedures to detect simulated malingering. Journal of Clinical and Experimental Neuropsychology, 1996, vol. 18 (pg. 38-51).

Larrabee G. J. Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 2003, vol. 17(3) (pg. 410-425).

Lipsey M. W. Design sensitivity, 1990. Newbury Park, CA: Sage.

Malec J. F., Brown A. W., Leibson C. L., Flaada J. T., Mandrekar J. N. The Mayo classification system for traumatic brain injury severity. Journal of Neurotrauma, 2007, vol. 24 (pg. 1417-1424).

Mathias C. W., Greve K. W., Bianchini K. J., Houston R. J., Crouch J. A. Detecting malingered neurocognitive dysfunction using the reliable digit span in traumatic brain injury. Assessment, 2002, vol. 9(3) (pg. 301-308).

Meyers J. E., Volbrecht M. Validation of Reliable Digits for detection of malingering. Assessment, 1998, vol. 5(3) (pg. 303-307).

Meyers J. E., Volbrecht M. E. A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 2003, vol. 18 (pg. 261-276).

Mittenberg W., Patton C., Canyock E. M., Condit D. C. Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 2002, vol. 24(8) (pg. 1094-1102).

O'Bryant S. E., Lucas J. A. Estimating the predictive value of the Test of Memory Malingering: An illustrative example for clinicians. The Clinical Neuropsychologist, 2006, vol. 20 (pg. 533-540).

Rees L. M., Tombaugh T. N., Gansler D. A., Moczynski N. P. Five validation experiments of the Test of Memory Malingering (TOMM). Psychological Assessment, 1998, vol. 10 (pg. 10-20).

Sharland M. J., Gfeller J. D. A survey of neuropsychologists' beliefs and practices with respect to the assessment of effort. Archives of Clinical Neuropsychology, 2007, vol. 22 (pg. 213-223).

Slick D. J., Sherman E. M. S., Iverson G. L. Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 1999, vol. 13(4) (pg. 545-561).

Strauss E., Slick D. J., Levy-Bencheton J., Hunter M., MacDonald S. W. W., Hultsch D. F. Intraindividual variability as an indicator of malingering in head injury. Archives of Clinical Neuropsychology, 2002, vol. 17 (pg. 423-444).

Suhr J. A., Barrash J. Performance on standard attention, memory, and psychomotor speed tasks as indicators of malingering. In Larrabee G. J. (Ed.), Assessment of malingered neuropsychological deficits, 2007. New York: Oxford University Press (pg. 131-170).

Tombaugh T. N. TOMM: test of memory malingering, 1996. North Tonawanda, NY: Multi-Health Systems, Inc.

Tombaugh T. N. The Test of Memory Malingering (TOMM): Normative data from cognitively intact and cognitively impaired individuals. Psychological Assessment, 1997, vol. 9 (pg. 260-268).

Veterans Benefits Administration. A summary of benefits: VA Pamphlet 21-00-1, 2006. Washington, DC: Author.

Warrington E. K., Weiskrantz L. An analysis of short- and long-term memory deficits in man. In Deutsch J. A. (Ed.), The physiological basis of memory, 1973. New York: Academic Press (pg. 365-395).

Wechsler D. WAIS-R manual, 1981. San Antonio, TX: Psychological Corporation.

Wechsler D. A. Wechsler adult intelligence scale – III, 1997a. New York: Psychological Corporation.

Wechsler D. A. Wechsler memory scale – III, 1997b. New York: Psychological Corporation.

Wechsler D. A. Wechsler adult intelligence scale – IV, 2008. San Antonio, TX: Psychological Corporation.