Abstract

The current study attempted to improve upon the efficiency and accuracy of one of the most frequently administered measures of test validity, the Test of Memory Malingering (TOMM), by utilizing two short forms (TOMM Trial 1, or TOMM1; and errors on the first 10 items of TOMM1, or TOMMe10). In addition, we cross-validated the accuracy of five embedded measures frequently used in malingering research. TOMM1 and TOMMe10 were highly accurate in predicting test validity (area under the curve [AUC] = 92% and 87%, respectively; TOMM1 ≤40 and TOMMe10 ≥1; sensitivities >70% and specificities >90%). A logistic regression of five embedded measures showed better accuracy than any individual embedded measure alone or in combination (AUC = 87%). TOMM1 and TOMMe10 provide evidence of greater sensitivity to invalid test performance compared with the standard TOMM administration, and the use of regression improved the accuracy of the five embedded cognitive measures.

Introduction

Over the past two decades, there has been increased interest in the assessment of test validity/malingering in neuropsychological assessment. There are now comprehensive reviews and assessment compilations available to the neuropsychologist regarding the assessment of test validity and malingering (Boone, 2007; Larrabee, 2007), as well as consensus statements from neuropsychological organizations clearly articulating the importance of validity assessment in clinical, research, and forensic settings (ABCN, 2007; Bush et al., 2005; Heilbronner et al., 2009). Despite the variety of tests developed to assess for invalid test performance, there is a constant need to update and refine these measures as patients become more aware of these methods, thereby putting the validity of these valuable assessment techniques at risk (Bauer & McCaffrey, 2006; Kaufmann, 2009; Morel, 2009).

Both freestanding and embedded measures of effort have been utilized across a variety of settings (Sharland & Gfeller, 2007). A robust relationship reflecting lower cognitive performance across neuropsychological tests has been repeatedly found in those failing validity measures (Constantinou, Bauer, Ashendorf, Fisher, & McCaffrey, 2005; Gervais, Rohling, Green, & Ford, 2004; Green, 2006; Green, Rohling, Lees-Haley, & Allen, 2001; Gunner, Miele, Lynch, & McCaffrey, 2012; Locke, Smigielski, Powell, & Stevens, 2008; Marshall et al., 2010; Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011; Schiehser et al., 2011; Suhr, Hammers, Dobbins-Buckland, Zimak, & Hughes, 2008). The expected relationship between the severity of brain injury pathology and neurocognitive measures is often confounded/reduced when validity measures are failed (Fox, 2011). One drawback of many freestanding measures is that administration time is long, and the end result is typically a binary "yes" or "no" finding regarding the validity of performance on that particular task. Increasing the administration efficiency of freestanding measures (while maintaining high sensitivity to non-credible performance) would be highly valued given the time-limited nature of many neuropsychological evaluations.

In order to assess test validity more efficiently, we will attempt to improve upon the Test of Memory Malingering (TOMM; Tombaugh, 1996), which is already one of the most commonly administered freestanding measures of cognitive test validity (Sharland & Gfeller, 2007). The TOMM has an extensive research base identifying those exaggerating cognitive deficits (Boone, 2007; Larrabee, 2007; Sollman & Berry, 2011; Tombaugh, 1996), with very low false-positive rates in many clinical populations (Greve, Bianchini, Black, et al., 2006; Greve, Ord, Curtis, Bianchini, & Brennan, 2008; Haber & Fichtenberg, 2006; Iverson, Le Page, Koehler, Shojania, & Badii, 2007; Tombaugh, 1996). A recent meta-analytic review of the TOMM by Sollman and Berry (2011) found good accuracy statistics across a range of settings and populations (sensitivity = 65%, specificity = 94%, overall hit rate = 80%). Because the TOMM is often perceived as very easy or as not likely measuring cognitive ability (Tan, Slick, Strauss, & Hultsch, 2002), it may not be as sensitive to poor effort as other freestanding measures (Armistead-Jehle & Gervais, 2011; Bauer, O'Bryant, Lynch, McCaffrey, & Fisher, 2007; Gervais, Rohling, Green, & Ford, 2004; Green, 2007, 2011). Greiffenstein and colleagues (2008) proposed that incorporating all three trials of the TOMM in decision-making (utilizing a cutoff of <45 on Trial 1, Trial 2, or the retention trial) provides equivalent concordance rates with the Word Memory Test (WMT, Green, 2003). However, Greiffenstein and colleagues (2008) noted a 15% false-positive rate using this particular method. Gunner and colleagues (2012) recently described the Albany Consistency Index (ACI) for the TOMM, showing that this index increased the sensitivity of the traditional TOMM cutoffs from 31% to 71% while maintaining adequate specificities (≥90%). This index shows promise, but it requires additional validation given the small sample size (n = 48), a sample consisting primarily of mild traumatic brain injury (mTBI) litigants, and the fact that calculating the index requires all three trials of the TOMM to be given. In an effort to increase the efficiency of administration and possibly increase the sensitivity of the TOMM to exaggerated cognitive deficits (while still maintaining low false-positive rates in good-effort clinical populations), we will build upon previous research that has utilized the score from TOMM Trial 1 (TOMM1) as a measure of test validity.

A summary of TOMM1 cutoff scores extracted from several studies (providing sufficient TOMM1 information) as well as from the TOMM test manual (Tombaugh, 1996) is found in Tables 1 and 2. Several studies only provided specificity scores on TOMM1 based on the standard cutoffs used for the latter trials (≥45), but others provided data on all individuals who eventually passed the full administration of the TOMM (often showing slightly lower cutoff scores to maintain 100% specificity). The cutoffs in Table 1 provide evidence that those performing above a particular score for TOMM1 will go on to pass the full administration of the TOMM 100% of the time (providing some justification for discontinuing the TOMM after TOMM1). When several cutoffs for TOMM1 were available, cut scores in Table 2 were used that provided specificities ≥90%. Four studies and the TOMM test manual also provided data showing that 100% of patients scoring ≤33 (Tombaugh, 1996), ≤31 (Bauer et al., 2007), ≤29 (O'Bryant, Engel, Kleiner, Vasterling, & Black, 2007), ≤27 (Horner, Bedwell, & Duong, 2006), or ≤25 (Hilsabeck, Gordon, Hietpas-Wilson, & Zartman, 2011) on TOMM1 did not pass Trial 2 and/or the retention trial, suggesting that further trials may be unnecessary (i.e., reflecting 100% sensitivity). Overall, prior studies across a variety of populations/settings (e.g., analog malingerers, litigants, TBI, pain, normal undergraduates, and psychiatric) indicate that a cut score of approximately ≤40 (weighted mean sensitivity = 0.77, specificity = 0.92) provides high levels of accuracy in predicting performance on the full administration of the TOMM and/or other freestanding/embedded measures of test validity. In general, samples that included patients with dementia (Hilsabeck et al., 2011; Horner et al., 2006; Tombaugh, 1996) or severe amnestic disorders (Greve, Bianchini, & Doane, 2006) required slightly lower TOMM1 cut scores to maintain specificity rates at 90% or higher.
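The weighted means above are simple N-weighted averages of the study-level accuracy values. As a minimal sketch of that arithmetic (in Python, using an illustrative subset of the Table 2 rows; the variable names are our own):

```python
# N-weighted mean sensitivity/specificity across studies, as in Table 2.
# Only a few illustrative rows are shown; rows missing a statistic would
# simply be omitted from that particular average.
studies = [
    # (N, sensitivity, specificity)
    (75, 0.80, 0.96),   # Armistead-Jehle (2011)
    (105, 0.90, 0.92),  # Bauer (2007)
    (53, 0.75, 1.00),   # Brooks (2012)
    (229, 0.84, 0.91),  # Hilsabeck (2011)
]

def weighted_mean(pairs):
    """N-weighted average of (N, value) pairs."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * v for n, v in pairs) / total_n

sens = weighted_mean([(n, sn) for n, sn, _ in studies])
spec = weighted_mean([(n, sp) for n, _, sp in studies])
print(f"weighted sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```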

Table 1.

TOMM trial 1 cut scores reflecting 100% specificity for passing the TOMM

Study (first author) N Sample TOMM trial 1 cut score
Armistead-Jehle (2011) 75 Active duty military (clinical) ≥32a; ≥45
Ashendorf (2004) 197 Non-clinical elderly (Anx/Depx) ≥40a
Bauer (2007) 105 mTBI litigants ≥44a
Brooks (2012) 53 Pediatric neurology patients ≥36a; ≥45
Etherton (2005) 20 Student controls ≥45a
 20 Acute pain controls ≥45
Gavett (2005) 77 mTBI litigants ≥45
Gierok (2005) 20 Psychiatric inpatients ≥36a
Hilsabeck (2011) 229 Mixed clinical ≥41
Iverson (2007) 54 Fibromyalgia with depression/pain ≥40a
Kirk (2011) 101 Pediatric clinical patients ≥33a
Morgan (2009) 14 Litigantsb ≥39a
Musso (2011) 54 Student controls ≥39a; ≥45
O'Bryant (2008) 306 Non-clinical elderly ≥40a; ≥45
 89 Mixed clinical ≥41a; ≥45
Rees (2001) 26 Depressed inpatients ≥45
Ryan (2010) 72 Student controls ≥42a
Teichner (2004) 21 Clinical-elderly normal ≥40a
Tombaugh (1996) 142 Mixed clinicalc ≥41a; ≥45
Vanderslice-Barr (2011) 96 Student controls ≥40a
Yanez (2006) 20 Controls ≥45

Notes: TOMM = Test of Memory Malingering; mTBI = mild traumatic brain injury; Anx = anxiety, Depx = depression.

aDenotes inclusion of the entire sample.

bSelected case series.

cTOMM administration manual (1996).

Table 2.

Accuracy statistics of TOMM1 in predicting invalid test performance or malingering

Study (first author) N Sample Comparison Cut score SN SP 
Armistead-Jehle (2011) 75 Active military (clinical) TOMM2 ≤38 0.80 0.96 
Bauer (2007) 105 mTBI litigants TOMM ≤38 0.90 0.92 
Brooks (2012) 53 Pediatric neurology patients TOMM2 ≤38 0.75 1.00 
Duncan (2005) 50 Forensic/psychotic Consensusa ≤40 — 0.84 
Greve (2006) 50 Toxic exposure Slick criteria ≤42 0.64 0.94 
Greve (2006) 76 mTBI litigants Slick criteriab ≤42 0.73 0.93 
 41 Mild/severe TBI Slick criteriab ≤38 0.46 0.91 
 22 Memory disordered — ≤34 — 0.91 
Greve, Etherton, et al. (2009) 604 Pain litigants MPRD ≤42 0.53 0.95 
 208 Pain litigants MPRD ≤40 — 0.99 
 20 Pain simulators — ≤40 1.00 — 
Hilsabeck (2011) 229 Mixed clinical TOMM ≤36 0.84 0.91 
Hill (2003) 105 PNES, TLE TOMM ≤44 0.88 0.92 
Horner (2006) 114 Mixed clinical TOMM2 ≤35 0.96 0.92 
Kirkwood (2010) Pediatricc TOMM and MSVT ≤38 1.00 1.00 
Morgan (2009) 14 Litigantsc TOMM and others ≤38 1.00 1.00 
Musso (2011) 108 Students/simulators — ≤41 0.65 0.96 
O'Bryant (2007) 329 80% disability TOMM retention ≤39 0.88 0.90 
   WMT/Slick criteria ≤41 0.79 0.90 
Rees (2001) 26 Depressed inpatients TOMM ≤44 — 0.88 
Schroeder (2011) 40 mTBI litigants Slick criteriad ≤39 0.67 1.00 
Tombaugh (1996) 142 Mixed clinicale TOMM2 ≤37 0.75 0.92 
 121 Non-demented clinicale TOMM2 ≤39 1.00 0.90 
Wisdom (2012) 213 Inpatient epilepsy TOMM ≤40 0.77 0.91 
Weighted average across samples    ≤40 0.77 0.92 

Notes: TOMM = Test of Memory Malingering; SN = sensitivity (failed comparison measure); SP = specificity (passed comparison measure); MMPI-2 = Minnesota Multiphasic Personality Inventory-Second Edition; MPRD = Malingered Pain-Related Disability; MSVT = Medical Symptom Validity Test; mTBI = mild traumatic brain injury; WMT = Word Memory Test; PNES = Psychogenic Non-epileptic Seizure; TLE = Temporal Lobe Epilepsy. Bold values are the weighted average across the entire table.

aInvalid test results were removed based on the review of medical records.

bFailure on Portland Digit Recognition Test or ≥2 embedded measures (Reliable Digit Span; Mittenberg formula for Wechsler Adult Intelligence Scale-Revised/Wechsler Adult Intelligence Scale-III; Millis formula for the California Verbal Learning Test; Wisconsin Card Sorting Test: unique responses, Suhr formula; Minnesota Multiphasic Personality Inventory-Second Edition: F, Fb, Fp, and FBS).

cSelected case series.

dFailure on ≥2 effort measures (Reliable Digit Span, Word Memory Test, Validity Indicator Profile, or MMPI-2: FBS).

eTOMM administration manual (1996).

Compared with the latter two trials of the TOMM, which constitute the standard procedure for determining failure on the test, there is evidence to suggest that TOMM1 shows similar or even slightly greater effect sizes, hit rates, and/or sensitivity to poor effort (Etherton, Bianchini, Greve, & Ciota, 2005; Greve, Bianchini, Black, et al., 2006; Greve, Bianchini, & Doane, 2006; Greve et al., 2008; Greve, Etherton, Ord, Bianchini, & Curtis, 2009; Jasinski et al., 2011; Lindstrom, Lindstrom, Coleman, Nelson, & Gregg, 2009; Marshall et al., 2010; Musso, Barker, Jones, Roid, & Gouvier, 2011; Powell, Gfeller, Hendricks, & Sharland, 2004; Sollman, Ranseen, & Berry, 2010; Tan et al., 2002). TOMM1 has also shown equivalent or slightly higher correlations with other measures of psychiatric/cognitive malingering compared with Trial 2 and/or the retention trial (McCaffrey, O'Bryant, Ashendorf, & Fisher, 2003; Ruocco et al., 2008; Whiteside, Dunbar-Mayer, & Waters, 2009; Whitney, Davis, Shepard, & Herman, 2008).

Extracting additional information from TOMM1 might also be helpful in quickly and efficiently identifying exaggerated memory dysfunction without having to administer Trial 2 or the retention trial. The TOMM initially appears quite difficult, but as the test progresses (as one continues through Trial 1 and is eventually shown the items again during Trial 2), patients, in our experience, have commented on the simplicity of the task. Therefore, those who perform poorly (initially) on TOMM1 may then choose to perform much better on Trial 2 and the retention trial (resulting in an overall passing score). This highly variable pattern is consistent with a simulation study showing the initial trial of the WMT (Immediate Recognition trial = 63% failure rate) to be much more sensitive to suspect effort than the Delayed Recognition trial (18% failure rate; Marshall et al., 2010). In addition, greater failure rates on the Abbreviated Hiscock Forced Choice Procedure were found when the test was administered at the beginning of a test battery rather than at the end (Guilmette, Whelihan, Hart, Sparadeo, & Buongiorno, 1996), suggesting that with experience/exposure to more demanding test measures, individuals begin to realize that these types of freestanding measures are very easy. The National Academy of Neuropsychology statement on effort assessment (Bush et al., 2005) noted that one measure of test validity should be administered early in the test battery, which further emphasizes the importance of assessing test validity (presumably) before the patient becomes more aware of the nature/purpose of these types of tests.

It was our intention to capitalize on the patient's initial misperception of the TOMM as a difficult test. We predicted that if patients were choosing to exaggerate memory deficits on the TOMM, they would do this very early in the administration of TOMM1. In order to capture this particular approach to testing (i.e., poor effort very early in the testing), we tabulated the number of errors a patient committed on the first 10 items of TOMM1 as a measure of test validity (referred to as TOMMe10). To our knowledge, this is the first study to assess this type of error pattern on the TOMM and how well it might predict non-credible performance on other well-validated validity measures.

It has also been recommended that validity testing be carried out throughout the test battery to ensure adequate effort across multiple areas of cognitive and behavioral functioning (Boone, 2009; Heilbronner et al., 2009). This may be very time-consuming if one relies on administering several freestanding measures scattered throughout testing. However, by utilizing embedded measures of test validity (tests already part of one's standard battery), one can not only minimize the time needed to administer extra tests but also monitor the patient's level of effort across several hours of testing. Embedded measures often rely on simple cut points below which most credible, clinically impaired patients do not typically perform, or on more complex statistical calculations that combine a variety of test scores (Larrabee, 2003; Meyers et al., 2011; Schutte, Millis, Axelrod, & VanDyke, 2011; Victor, Boone, Serpa, Beuhler, & Ziegler, 2009; Wolfe et al., 2010). Therefore, another goal of the current project was to take advantage of embedded measures to predict failure on other effort tests using logistic regression analysis (using all embedded measures as continuous variables) or cut scores for each test. It was predicted that a combination of embedded measures (already well validated in previous malingering research) would also provide an accurate prediction of test validity.

The current project will assess the relative accuracy of two short forms of the TOMM (TOMM1 and TOMMe10) and a combination of five embedded measures (encompassing eight individual test scores) in predicting exaggerated cognitive deficits as measured by a well-validated freestanding measure of test validity, the Medical Symptom Validity Test (MSVT, Green, 2004).

Method

Participants

A total of 497 patients referred to an outpatient Veterans Affairs Neuropsychology Clinic in the southern USA were included in this retrospective, anonymous, database study, which was approved by the local Veterans Affairs IRB. Referrals came primarily from Primary Care Clinics, Psychiatry, and Neurology; ∼15% were from Compensation and Pension disability evaluations, mostly for TBI (99% were mTBI). Patients diagnosed with dementia were excluded from the sample, but no other medical, psychiatric, or substance abuse disorders were used as exclusion criteria, in an attempt to generalize to a wide range of outpatient populations. The most common psychiatric diagnoses (based on Diagnostic and Statistical Manual of Mental Disorders-fourth edition-text revision diagnostic criteria) were some type of Depressive Disorder (56%), post-traumatic stress disorder (38%), Anxiety Disorder, not otherwise specified (NOS) (28%), Cognitive Disorder NOS/Mild Cognitive Impairment (28%), Alcohol/Substance Abuse/Dependence (20%), No Diagnosis (5%), Bipolar Disorder (4%), and Psychotic Disorder (3%). The final sample was primarily male (93%), Caucasian (90%), and African American (9%). The most common medical conditions were hypertension (58%), chronic pain (49%), elevated lipids (38%), sleep apnea (25%), diabetes (22%), coronary artery disease (22%), hearing problems (21%), chronic obstructive pulmonary disease (10%), cerebrovascular accident (6%), and seizure disorder (4%).

Procedures

All patients were tested by the author, a psychometrician, or a trained/supervised psychology intern over a 5-year period (2006–2011) in an outpatient setting. Tests administered, order of administration, and the specific measures of interest are found in Table 3. All patients had completed the tests of interest, but other tests were also administered as part of a comprehensive neuropsychological evaluation. The validity measures were typically given during the early and middle portions of the testing session, a placement relatively consistent with recent suggestions that test validity be assessed during the early part of the evaluation and continuously throughout the test battery (Boone, 2009; Bush et al., 2005). The MSVT was used to categorize patients as either "good effort" (passing) or "poor effort" (failing) according to the standard scoring rules found in the test manual across the Immediate Recognition, Delayed Recognition, and Consistency trials.

Table 3.

General order of administration and measures used for data analyses

Test administered Scores used for analysis 
1 Test of Memory Malingering (Trial 1) Raw Score; errors on the first 10 items 
2 Brief Visuospatial Memory Test-Revised (Immediate Recall)
3 Medical Symptom Validity Test (Immediate Recognition) Percent correct 
4 Finger Tapping Test Average of first three trials, dominant hand 
6 WAIS-III: Digit-Symbol Coding Standard score (PSI) 
7 WAIS-III: Symbol Search Standard score (PSI) 
5 Medical Symptom Validity Test (Delayed Recognition) Percent correct 
8 Brief Visuospatial Memory Test-R (Recall/Recognition) Correct hits (recognition trial) 
9 Trail-making Test Parts A, B
10 CVLT-II (Immediate Recall)  
11 WAIS-III: Block Design  
12 WAIS-III: Digit span Standard score (WMI) 
13 WAIS-III: Arithmetic Standard score (WMI) 
14 CVLT-II (Delayed Recall/Yes–No Recognition)  
15 WAIS-III: Letter-Number Sequencing Standard score (WMI) 
16 CVLT-II (Forced Choice) Raw score correct 
17 Stroop Color-Word Test (Golden Version)  
18 Auditory Consonant Trigrams (Stuss Version)  
19 WAIS-III: Similarities  
20 Rey Complex Figure Test: Copy (Meyers Version)  
21 Controlled Oral Word Association Test (FAS, CFL)  
22 Animal Naming  
23 Brief Test of Attention  
24 Boston Naming Test  
25 Wide Range Achievement Test, Third Edition (Reading)  
26 Wechsler Test of Adult Reading  
27 Wisconsin Card Sorting Test (WCST) or modified WCST (Nelson, 48-card version)  

Notes: WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; CVLT-II = California Verbal Learning Test, Second Edition; WMI = Working Memory Index; PSI = Processing Speed Index. Boldface in this table highlights the specific tests that were used in the statistical analyses.

Scores for the two short forms of the TOMM were obtained for each patient: the total number of items correct on TOMM Trial 1 (TOMM1) and the total number of errors on the first 10 items of TOMM1 (TOMMe10). These two scores were assessed separately to predict performance on the MSVT using binomial logistic regression. In addition, it is clinically useful to have a variety of possible cutoff scores depending on the population/clinical setting of interest. This allows the individual clinician to decide what is most critical in making diagnostic decisions using TOMM1 and TOMMe10 (e.g., maximizing sensitivity or specificity). Therefore, analyses will provide a range of sensitivity/specificity values for all possible cut scores, as well as receiver operating characteristic (ROC) analyses to calculate the overall accuracy of the TOMM measures.
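As a concrete illustration of this plan (a sketch, not the study's actual code), the snippet below computes sensitivity and specificity at every possible TOMM1 cutoff and the overall ROC AUC; the arrays `tomm1` and `msvt_fail` are simulated stand-ins for the patient data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Simulated stand-ins: 60 MSVT-pass and 30 MSVT-fail TOMM1 scores.
rng = np.random.default_rng(0)
tomm1 = np.concatenate([rng.integers(43, 51, 60),
                        rng.integers(20, 46, 30)])
msvt_fail = np.concatenate([np.zeros(60, dtype=int), np.ones(30, dtype=int)])

for cut in range(int(tomm1.min()), int(tomm1.max()) + 1):
    flagged = tomm1 <= cut                    # "invalid" if at/below cutoff
    sens = flagged[msvt_fail == 1].mean()     # true-positive rate
    spec = (~flagged)[msvt_fail == 0].mean()  # true-negative rate
    print(f"cut <= {cut}: SN = {sens:.2f}, SP = {spec:.2f}")

# Lower TOMM1 scores should indicate MSVT failure, so negate the score.
print("AUC =", roc_auc_score(msvt_fail, -tomm1))
```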

Four of the five embedded measures chosen for this study have been evaluated previously. The Wechsler Adult Intelligence Scale, Third Edition (WAIS-III; Wechsler, 1997) Processing Speed Index (PSI) and Working Memory Index (WMI) have been shown to reliably identify those providing invalid test performance (Curtis, Greve, & Bianchini, 2009; Etherton, Bianchini, Ciota, Heinly, & Greve, 2006; Etherton, Bianchini, Heinly, & Greve, 2006; Greve et al., 2008). The Finger Tapping Test (FTT) has also been shown to have acceptable accuracy statistics in predicting poor effort on cognitive testing (Arnold et al., 2005; Boone, 2007; Larrabee, 2007). The California Verbal Learning Test, Second Edition (CVLT-II) Forced Choice (FC) trial was initially designed to assess poor effort (Delis, Kramer, Kaplan, & Ober, 2000), and studies have found this measure to be an acceptable indicator of poor effort (Moore & Donders, 2004). The final embedded measure, the Brief Visuospatial Memory Test-Revised (BVMT-R) Recognition Hits trial (Benedict, 1997), has not been studied previously, but given the relatively easy nature of this yes/no delayed recognition of six designs, it was hypothesized that it may be useful in assessing test validity. The number of correct hits out of a possible six was utilized in the current study. This measure of visual recognition memory also complemented the broad areas of cognitive functioning already covered by the four other embedded measures (e.g., auditory attention, psychomotor processing speed and visual attention, fine motor speed, and verbal recognition memory).

What has not been done, to our knowledge, is to combine these specific embedded measures in an effort to increase their accuracy over and above any single embedded measure alone. These five measures will first be assessed for any problems with collinearity that may over-inflate the accuracy of the regression equations. The embedded measures will then be entered into a binary logistic regression (backward stepwise procedure) with MSVT pass/fail as the outcome variable. The accuracy statistics of each individual embedded measure, based on empirically derived cutoffs from the literature, will also be presented as a comparison. The accuracy of the TOMM Trial 1 measures using logistic regression will also be assessed, with pass/fail on the MSVT as the binary outcome variable.
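A minimal sketch of this two-step procedure (ours, not the authors' code), assuming a hypothetical pandas DataFrame `df` with one row per patient and our own column names (FTT, PSI, WMI, CVLT_FC, BVMT_HITS, and a 0/1 msvt_fail):

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

predictors = ["FTT", "PSI", "WMI", "CVLT_FC", "BVMT_HITS"]

def check_vif(df, predictors, limit=5.0):
    """Screen for collinearity: VIF < 5 is treated as acceptable."""
    X = sm.add_constant(df[predictors]).to_numpy()
    for i, name in enumerate(predictors, start=1):  # skip the constant column
        vif = variance_inflation_factor(X, i)
        print(f"{name}: VIF = {vif:.2f}", "(ok)" if vif < limit else "(collinear)")

def fit_backward(df, predictors, alpha=0.05):
    """Simple backward elimination: drop the least significant predictor
    until every remaining predictor is significant at alpha."""
    kept = list(predictors)
    while True:
        X = sm.add_constant(df[kept])
        model = sm.Logit(df["msvt_fail"], X).fit(disp=False)
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha or len(kept) == 1:
            return model, kept
        kept.remove(worst)
```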

Results

General demographic information and differences between groups passing/failing the MSVT are presented in Table 4. The overall failure rate on the MSVT was 33% based on the three easy subtests (Immediate Recognition, Delayed Recognition, or Consistency ≤85%). There were no significant differences in age, educational attainment, or estimated WAIS-III Full-Scale IQ between those passing or failing the MSVT. There was a significant difference in WRAT-3 Reading, with those failing the MSVT scoring slightly lower, but both groups were still within the average range.

Table 4.

Demographics and estimated premorbid functioning of the sample (N = 497)

Measure Mean (SD) Range MSVT Pass (n = 331) Mean (SD) MSVT Faila (n = 166) Mean (SD) t
Age (years) 48.6 (15.6) 21–87 49.9 (16.0) 46.9 (14.5) 1.74 
Education (years) 12.4 (2.5) 1–20 12.6 (2.5) 12.2 (2.3) 1.47 
WRAT-3: Reading (Standard Score)b 95.7 (12.0) 49–120 97.0 (11.8) 92.9 (12.1) 3.52* 
WTAR: Estimated Full-Scale IQc 99.2 (10.0) 75–123 99.9 (10.5) 97.9 (8.9) 1.45 

Notes: WRAT-3 = Wide Range Achievement Test, Third Edition; WTAR = Wechsler Test of Adult Reading (corrected for age, education, gender, ethnicity); MSVT = Medical Symptom Validity Test.

a33% failed the MSVT.

bn = 470.

cn = 231.

*p < .001.

Test performance of those passing or failing the MSVT is presented in Table 5, along with respective effect sizes (Cohen, 1988). All t-test comparisons between those passing or failing the MSVT were significant, with effect sizes ranging from medium to large (d = 0.7 [FTT] to d = 2.0 [TOMM1]). The MSVT showed the greatest effect sizes, but this would be expected given that it was the grouping variable of interest. The two TOMM1 measures showed the largest effect sizes compared with the individual embedded measures. It should be noted that those passing the MSVT performed within the average range across all measures of interest.
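The effect sizes in Table 5 follow the usual pooled-standard-deviation form of Cohen's d. As a quick worked check (a sketch, not the study's code), the reported TOMM1 means/SDs and group sizes yield d ≈ 2.1, close to the tabled 2.0; the exact value depends on the pooling convention and unrounded inputs.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d from summary statistics, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# TOMM1 means/SDs for the MSVT pass (n = 331) and fail (n = 166) groups, Table 5.
print(f"d = {cohens_d(47, 3.6, 331, 35, 8.7, 166):.2f}")
```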

Table 5.

Performance on freestanding and embedded measures of those passing or failing the MSVT

Test measures MSVT Pass Mean (SD) MSVT Fail Mean (SD) t d
TOMM1 47 (3.6) 35 (8.7) 16.8 2.0 
TOMMe10 .3 (.8) 2.7 (1.9) −19.7 1.8 
MSVT 
 Immediate Recognition (% correct) 99 (2.4) 78 (16.2) 15.8 2.3 
 Delayed Recognition (% correct) 98 (3.0) 72 (17.7) 18.0 3.5 
 Consistency (% correct) 98 (3.4) 73 (12.3) 24.4 3.2 
FTT (dominant hand, mean of first three trials) 44 (10.9) 36 (11.5) 5.8 0.7 
WAIS-III 
 Processing Speed Index (Standard Score) 92 (12.7) 80 (10.4) 10.4 1.0 
 Working Memory Index (Standard Score) 99 (12.6) 89 (13.1) 7.9 0.8 
CVLT-II: FC (Raw Score) 15.8 (.6) 14.1 (2.5) 8.3 1.1
BVMT-R (Recognition Hits) 5.5 (.8) 4.4 (1.4) 8.9 1.0 

Notes: TOMM1 = Test of Memory Malingering Trial 1; TOMMe10 = Errors on the first 10 items of TOMM1; MSVT = Medical Symptom Validity Test; FTT = Finger Tapping Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; CVLT-II = California Verbal Learning Test, Second Edition; FC = Forced Choice; BVMT-R = Brief Visuospatial Memory Test, Revised. d = effect size (Cohen's d).

*p < .001 for all comparisons.

TOMM Trial 1

The results of the binary logistic regression analyses for TOMM1 and TOMMe10 are found in Table 6. Sensitivity rates for TOMM1 (cutoff ≤40) and TOMMe10 (cutoff ≥1 error) were quite similar (TOMM1 = 72%, TOMMe10 = 71%). Specificities were above 90% for both measures, overall hit rates were 87% for TOMM1 and 86% for TOMMe10, and likelihood ratios (LRs) were 12 and 11, respectively. The overall models were statistically reliable for both variables in predicting MSVT performance (TOMM1: χ2 = 302.908, p < .001; TOMMe10: χ2 = 248.740, p < .001) and the amount of variance explained by each model was excellent (TOMM1: Nagelkerke R2 = 63.8%; TOMMe10: Nagelkerke R2 = 55.3%). ROC analyses showed "excellent to outstanding" model discrimination (Hosmer & Lemeshow, 2000) across both measures (TOMM1: area under the curve [AUC] = 92%, 95% CI = 89.0–94.4; TOMMe10: AUC = 87%, 95% CI = 83.4–91.0). Tables 7 and 8 show the full range of scores for TOMM1 and TOMMe10 with corresponding sensitivities, specificities, and positive and negative predictive values at various base rates of malingering.
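The predictive values in Tables 7 and 8 follow directly from sensitivity, specificity, and an assumed base rate via Bayes' rule. A short sketch (ours, not the study's code) using the TOMM1 ≤40 row reproduces the tabled values to within rounding:

```python
def predictive_values(sens, spec, base_rate):
    """PPV and NPV from sensitivity, specificity, and base rate."""
    tp = sens * base_rate            # true positives
    fp = (1 - spec) * (1 - base_rate)  # false positives
    tn = spec * (1 - base_rate)      # true negatives
    fn = (1 - sens) * base_rate      # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# TOMM1 <= 40: SN = 0.72, SP = 0.94 (Table 7).
for br in (0.40, 0.25, 0.10):
    ppv, npv = predictive_values(0.72, 0.94, br)
    print(f"base rate {br:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```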

Table 6.

Results of binary logistic regression analysis of TOMM1 and TOMMe10 in predicting passing or failing the MSVT

Variable B SE Wald p-value Exp(B) 95% CI Exp(B) Sen Spec Overall LR
TOMM1a −0.320 0.029 123.26 .000 0.726 0.686–0.768 72.0 94.2 86.8 12 
Constant 12.97 1.253 107.123 .000       
TOMMe10b 1.314 0.120 119.633 .000 3.722 2.941–4.710 71.0 93.3 85.9 11 
Constant −2.202 0.178 153.331 .000       

Notes: TOMM1 = Test of Memory Malingering Trial 1; TOMMe10 = Errors on the first 10 items of TOMM1; Sen = sensitivity; Spec = specificity; Overall = overall hit rate; LR = likelihood ratio.

aTOMM1 cut-off ≤40.

bTOMMe10 cut-off ≥1.

Table 7.

Sensitivity, specificity, PPV, and NPV of TOMM1 in predicting passing or failing the MSVT

Cut score SN SP PPV (base rate 0.40) NPV (base rate 0.40) PPV (base rate 0.25) NPV (base rate 0.25) PPV (base rate 0.10) NPV (base rate 0.10)
9.00 0.01 1.00 1.00 0.60 1.00 0.75 1.00 0.90 
13.00 0.02 1.00 1.00 0.60 1.00 0.75 1.00 0.90 
16.00 0.02 1.00 1.00 0.61 1.00 0.75 1.00 0.90 
17.00 0.04 1.00 1.00 0.61 1.00 0.76 1.00 0.90 
18.00 0.06 1.00 1.00 0.62 1.00 0.76 1.00 0.91 
20.00 0.07 1.00 1.00 0.62 1.00 0.76 1.00 0.91 
21.00 0.09 1.00 1.00 0.62 1.00 0.77 1.00 0.91 
22.00 0.10 1.00 1.00 0.62 1.00 0.77 1.00 0.91 
23.00 0.12 1.00 1.00 0.63 1.00 0.77 1.00 0.91 
24.00 0.13 1.00 1.00 0.63 1.00 0.77 1.00 0.91 
25.00 0.14 1.00 1.00 0.64 1.00 0.78 1.00 0.91 
26.00 0.21 1.00 1.00 0.65 1.00 0.79 1.00 0.92 
27.00 0.24 1.00 1.00 0.66 1.00 0.80 1.00 0.92 
28.00 0.28 1.00 1.00 0.68 1.00 0.81 1.00 0.93 
29.00 0.31 1.00 1.00 0.69 1.00 0.81 1.00 0.93 
30.00 0.33 1.00 1.00 0.69 1.00 0.82 1.00 0.93 
31.00 0.35 0.99 0.98 0.70 0.95 0.82 0.87 0.93 
32.00 0.39 0.99 0.98 0.71 0.96 0.83 0.88 0.94 
33.00 0.42 0.99 0.98 0.72 0.96 0.84 0.88 0.94 
34.00 0.49 0.99 0.97 0.74 0.95 0.85 0.86 0.95 
35.00 0.52 0.99 0.97 0.75 0.95 0.86 0.85 0.95 
36.00 0.57 0.98 0.94 0.77 0.89 0.87 0.73 0.95 
37.00 0.60 0.97 0.93 0.78 0.87 0.88 0.69 0.96 
38.00 0.63 0.96 0.92 0.80 0.85 0.89 0.65 0.96 
39.00 0.70 0.96 0.92 0.82 0.84 0.90 0.64 0.97 
40.00 0.72 0.94 0.89 0.83 0.81 0.91 0.58 0.97 
41.00 0.76 0.91 0.85 0.85 0.74 0.92 0.48 0.97 
42.00 0.79 0.88 0.81 0.86 0.68 0.93 0.42 0.97 
43.00 0.84 0.85 0.79 0.89 0.65 0.94 0.38 0.98 
44.00 0.85 0.82 0.76 0.89 0.62 0.94 0.35 0.98 
45.00 0.88 0.77 0.72 0.91 0.57 0.95 0.30 0.98 
46.00 0.91 0.72 0.68 0.92 0.52 0.96 0.27 0.99 
47.00 0.93 0.63 0.63 0.93 0.46 0.97 0.22 0.99 
48.00 0.96 0.48 0.55 0.95 0.38 0.98 0.17 0.99 
49.00 0.99 0.27 0.47 0.97 0.31 0.99 0.13 1.00 
50.00 1.00 0.00 0.40 1.00 0.25 1.00 0.10 1.00 

Notes: Bold values indicate cut scores with the greatest accuracy based on binary logistic regression analyses; SN = sensitivity; SP = specificity; PPV = positive predictive value; NPV = negative predictive value.

Embedded Measures

The accuracy and LRs of each embedded measure in predicting MSVT performance were based on empirically derived cutoffs from the literature and are presented in Table 9. The CVLT-II: FC was the best overall predictor (sensitivity = 40%, specificity = 95%, hit rate = 77%, LR = 8); sensitivities across all embedded measures ranged from 15% to 45%, with three of five measures showing specificities ≥90%. The relative accuracy of simply adding the number of failed embedded measures to predict performance on the MSVT is also shown in Table 9. Others have viewed this method as relatively straightforward and highly accurate using a variety of embedded measures and methods of grouping good/poor effort groups (Jasinski et al., 2011; Larrabee, 2003, 2008; Meyers & Volbrecht, 2003; Pella, Hill, Shelton, Elliott, & Gouvier, 2012; Schroeder & Marshall, 2011; Victor et al., 2009). Consistent with these studies, we found that failing ≥2 embedded measures provided the greatest accuracy (hit rate = 75%, LR = 6), but sensitivity was low (40%) with acceptable specificity (93%). However, there did not appear to be any significant advantage to adding failed embedded measures, as the CVLT-II: FC trial alone showed slightly better accuracy statistics.
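For concreteness, a small sketch of the "any two failed" tally using the Table 9 cutoffs (the dictionary keys are our own shorthand and the example patient is hypothetical); the positive likelihood ratio for the rule follows as sensitivity/(1 − specificity):

```python
# A score counts as a failure if it falls at or below its cutoff (Table 9).
CUTOFFS = {"FTT": 35, "PSI": 70, "WMI": 75, "CVLT_FC": 14, "BVMT_HITS": 4}

def failed_count(scores):
    """Number of embedded measures at or below their cutoffs."""
    return sum(scores[k] <= cut for k, cut in CUTOFFS.items())

patient = {"FTT": 41, "PSI": 78, "WMI": 74, "CVLT_FC": 13, "BVMT_HITS": 5}
flags = failed_count(patient)
print(f"failed {flags} embedded measures -> invalid? {flags >= 2}")

# Positive likelihood ratio for the >= 2 rule, from the Table 9 accuracy:
sens, spec = 0.398, 0.931
print(f"LR+ = {sens / (1 - spec):.0f}")  # ~6, as reported
```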

Table 8.

Sensitivity, specificity, PPV, and NPV of TOMMe10 in predicting passing or failing the MSVT

Cut score SN SP PPV (base rate 0.40) NPV (base rate 0.40) PPV (base rate 0.25) NPV (base rate 0.25) PPV (base rate 0.10) NPV (base rate 0.10)
0.00 0.85 0.78 0.72 0.89 0.56 0.94 0.30 0.98 
1.00 0.71 0.93 0.87 0.83 0.77 0.91 0.53 0.97 
2.00 0.51 0.97 0.92 0.75 0.85 0.86 0.65 0.95 
3.00 0.35 0.99 0.96 0.70 0.92 0.82 0.80 0.93 
4.00 0.14 0.99 0.90 0.63 0.82 0.78 0.61 0.91 
5.00 0.08 0.99 0.84 0.62 0.73 0.76 0.47 0.91 
6.00 0.03 1.00 1.00 0.61 1.00 0.75 1.00 0.90 
7.00 0.01 1.00 1.00 0.60 1.00 0.75 1.00 0.90 
9.00 0.00 1.00 1.00 0.60 1.00 0.75 1.00 0.90 

Notes: Bold values indicate cut scores with the greatest accuracy based on binary logistic regression analyses; SN = sensitivity; SP = specificity; PPV = positive predictive value; NPV = negative predictive value.

Table 9.

Results of the five embedded measures predicting passing or failing the MSVT based on empirically derived cut scores from the literature

Embedded measure Sensitivity Specificity Hit rate LR Cutoff Reference 
FTT* 42.0 78.8 70.3 2 ≤35 Arnold et al. (2005)
WAIS-III: PSI 14.6 96.7 74.3 4 ≤70 Etherton et al. (2006)
WAIS-III: WMI 20.0 97.3 70.5 7 ≤75 Etherton et al. (2006)
CVLT-II (FC) 39.9 94.9 77.3 8 ≤14 Moore and Donders (2004)
BVMT-R (Hits) 45.1 89.1 74.6 4 ≤4 NA
Any two tests failed 39.8 93.1 75.3 6 ≥2  

Notes: FTT = Finger Tapping Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PSI = Processing Speed Index; WMI = Working Memory Index; CVLT-II (FC) = California Verbal Learning Test, Second Edition, Forced Choice trial; BVMT-R (Hits) = Brief Visuospatial Memory Test-Revised, correct number of hits on Yes/No recognition trial; LR = likelihood ratio.

*FTT based on mean of first three trials of dominant hand.

Table 10 provides the results of the binary logistic regression for the five embedded measures as continuous predictors of the MSVT. There were no significant problems with collinearity given a variance inflation factor of <5.0 (Kutner, Nachtsheim, & Neter, 2004) for each variable based on a linear regression of the five measures with both the MSVT Immediate and Delayed Recognition trials. The overall model was statistically reliable (χ2 = 123.741, p < .001) and the amount of variance explained by the model was excellent (Nagelkerke R2 = 47.0%). Results indicate lower sensitivity but comparable specificity to the TOMM1 variables (sensitivity = 58%, specificity = 94%, overall hit rate = 83%, LR = 9), with model discrimination seen as “excellent” (AUC = 86.9%, 95% CI = 82.7–91.1).

Table 10.

Results of backward step-wise binary logistic regression analysis of the five embedded measures as continuous variables predicting passing or failing the MSVT (N = 307)

Variables B SE Wald p-value Exp(B) 95% CI Exp(B) Sen Spec Overall LR
Model 1       58.1 93.5 82.7 
FTT* −0.023 0.016 2.175 .140 0.977 0.948–1.008     
WAIS-III: PSI −0.050 0.018 7.531 .006 0.951 0.918–0.986     
WAIS-III: WMI −0.026 0.015 3.003 .083 0.975 0.946–1.003     
CVLT-II (FC) −0.542 0.138 15.463 .000 0.581 0.444–0.762     
BVMT-R (Hits) −0.524 0.148 12.493 .000 0.592 0.443–0.792     
Constant 17.83 2.55 48.97 .000       

Notes: FTT = Finger Tapping Test; WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PSI = Processing Speed Index; WMI = Working Memory Index; CVLT-II (FC) = California Verbal Learning Test, Second Edition, Forced Choice trial; BVMT-R (Hits) = Brief Visuospatial Memory Test-Revised, correct number of hits on Yes/No recognition trial; Sen = sensitivity; Spec = specificity; LR = likelihood ratio; Overall = overall hit rate; based on regression equation cut-off ≥0.5037680.

*FTT based on mean of first three trials of dominant hand.
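To illustrate how the fitted equation would be applied (a sketch based on the Table 10 coefficients, not software provided by the study), the predicted probability of MSVT failure is the logistic transform of the weighted sum of the five scores, flagged against the probability cutoff in the table note. The example scores are the passing-group means from Table 5.

```python
import math

def p_fail(ftt, psi, wmi, cvlt_fc, bvmt_hits):
    """Predicted probability of MSVT failure from the Table 10 model."""
    logit = (17.83 - 0.023 * ftt - 0.050 * psi - 0.026 * wmi
             - 0.542 * cvlt_fc - 0.524 * bvmt_hits)
    return 1.0 / (1.0 + math.exp(-logit))

p = p_fail(ftt=44, psi=92, wmi=99, cvlt_fc=15.8, bvmt_hits=5.5)
print(f"p(fail MSVT) = {p:.3f}, invalid = {p >= 0.5037680}")  # low p, as expected
```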

The accuracy statistics noted above utilized a range of patients and performance levels, including those with mild cognitive deficits (denoted as Cognitive Disorder NOS). Therefore, these findings would hopefully generalize to a wide range of populations and settings. However, if one could be confident that the population or individual patient is not suspected of having any significant cognitive decline (e.g., remote mTBI, chronic pain, mild depression/anxiety), then more stringent cutoffs could provide a more accurate assessment of non-credible performance. With this in mind, the above calculations were re-run excluding those with dementia as well as those diagnosed with clearly evident cognitive deficits (Cognitive Disorder NOS diagnoses); the results can be found in Table 11.

Table 11.

Accuracy of TOMM1, TOMMe10, and the five embedded measures regression formula after removing those with a diagnosis of Cognitive Disorder NOS from the sample (N = 344)

Measures Sensitivity Specificity Hit rate AUC LR 
TOMM1a 83.0 93.0 89.0 94.2 12 
TOMMe10b 75.5 94.6 87.3 89.6 14 
Regressionc 78.2 95.3 88.9 92.1 17 
FTT*      
WAIS-III: PSI      
WAIS-III: WMI      
CVLT-II (FC)      
BVMT-R (Hits)      

Notes: WAIS-III = Wechsler Adult Intelligence Scale, Third Edition; PSI = Processing Speed Index; WMI = Working Memory Index; CVLT-II (FC) = California Verbal Learning Test, Second Edition, Forced Choice trial; BVMT-R (Hits) = Brief Visuospatial Memory Test-Revised, correct number of hits on yes/no recognition trial; AUC = area under the curve; LR = likelihood ratio.

aTOMM1 cut-off ≤41.

bTOMMe10 cut-off ≥1 errors.

cRegression equation cut-off ≥0.5010178.

*FTT based on the mean of first three trials of dominant hand.

TOMM1 and TOMMe10 showed modest increases in overall accuracy based on AUC and LRs, but the embedded measures regression improved dramatically (LR increased from 9 to 17). All three predictors showed AUC ≥90%, with the greatest improvements in sensitivity shown by the embedded measures regression formula (sensitivity increased 20 percentage points, from 58% to 78%) and TOMM1 (sensitivity increased 11 percentage points, from 72% to 83%, with the cutoff raised to ≤41). Therefore, this table may be more useful in settings/populations where there is no suspected neurological dysfunction or significant medical/psychiatric history that would tend to significantly impact cognitive functioning.

Discussion

The purpose of this study was to further assess the accuracy of TOMM1 in predicting performance on a well-validated and commonly used measure of cognitive test validity (MSVT). In addition, we explored the accuracy of a novel measure from TOMM1 by tabulating TOMMe10. Finally, given the recommendations to assess the validity of a patient's performance continuously throughout testing (Boone, 2009; Heilbronner et al., 2009), we also assessed the accuracy of a group of embedded measures frequently cited in the malingering research literature.

TOMM Trial 1

Consistent with an informal review of available studies providing TOMM1 accuracy statistics (Table 2), the current study found a similar pattern of results, with the greatest accuracy in our sample at a cutoff of ≤40 (sensitivity = 72%, specificity = 94%, hit rate = 87%, LR = 12) and an overall classification accuracy (AUC) for TOMM1 of 92%. There appears to be converging evidence across multiple studies, as well as in our mixed clinical sample of non-demented veterans, that TOMM1 is a robust measure predicting performance on other freestanding validity measures. Current results also suggest that TOMM1 shows greater sensitivity to invalid test performance compared with the standard TOMM administration as reviewed by Green (2007) and in a recent meta-analysis of the TOMM (Sollman & Berry, 2011). Our results are relatively consistent with the recently proposed Albany Consistency Index (ACI), which increased the sensitivity of the TOMM to poor effort as measured by the WMT (standard TOMM: sensitivity = 33%; ACI = 71%). Given that the standard TOMM requires administration of two or three trials, and analysis of response consistency as described by the ACI requires all three trials to be administered (Gunner et al., 2012), TOMM1 appears to provide a more efficient and more accurate measure of test validity than the latter two trials of the TOMM.

Armistead-Jehle and Gervais (2011) and Green (2011) have recently shown that the standard cutoffs for the TOMM show acceptable specificity rates (>90%), but other freestanding measures such as the Non-verbal MSVT (NV-MSVT; Green, 2008) are approximately twice as sensitive to invalid test performance. However, consistent with our hypotheses, utilizing TOMM1 ≤40 as a cutoff and using pass/fail on the MSVT as the comparison measure (similar to the methods of Armistead-Jehle & Gervais), we found sensitivity rates for TOMM1 to be over twice as high as for the standard TOMM administration in that study (sensitivity = 72% vs. 35%). In fact, our sample, along with previous studies with TOMM1 data (see Table 2: TOMM1 ≤40, sensitivity = 0.77, specificity = 0.92), tends to show greater sensitivity than the NV-MSVT described in the Armistead-Jehle and Gervais study.

Furthermore, when removing those diagnosed with any level of suspected cognitive impairment from our sample (similar to the sample characteristics of Armistead-Jehle & Gervais, 2011: Disability seeking, non-head-injured sample), and increasing the cut score to ≤41, the sensitivity of TOMM1 increased 11% (from 72% to 83%) while still maintaining high specificity (93%, AUC = 94%). Therefore, accuracy statistics in Table 11, and TOMM1 ≤41, may be more appropriate in settings/populations where there is no suspected neurological dysfunction or severe medical/psychiatric disorder that would tend to significantly impact cognitive functioning.

With regard to the TOMMe10, at a cutoff of ≥1 error on the first 10 items, this measure showed accuracy statistics comparable with those of TOMM1 noted above (sensitivity = 71%, specificity = 93%, hit rate = 86%, LR = 11), with TOMMe10 showing an overall classification accuracy of 87%. This is not surprising given the very high correlation between TOMM1 and TOMMe10 (r = −.91, p < .001), and it suggests that administering the entire TOMM1 may actually be redundant and lack significant incremental validity over the TOMMe10. However, TOMMe10 showed only a modest improvement in sensitivity and overall accuracy when those with cognitive impairment were removed from the sample. Nevertheless, the accuracy of TOMMe10 in our sample provides preliminary evidence of a feasible "short" short form of the TOMM that reduces the number of items administered by up to 93% (from 150 items for all three trials to just 10 items). This savings in administration time may have the advantage of not "giving away" too much about the nature of the test, preserving the TOMM's intended purpose if a repeat administration at a later time is warranted. The TOMMe10 could also be used as a very brief screening measure of effort, particularly when time constraints are an issue. Overall, when the assessment of test validity is carried out very early in the testing session with TOMM1 or TOMMe10, sensitivity to poor effort is enhanced (over the standard TOMM administration) while maintaining low rates of false-positive errors.

The original TOMM administration instructions include decision rules based on poor performance on Trial 2 and the retention trial, but only include the interpretation of below-chance performance for TOMM1 (which is quite rare in known malingerers; see Greve, Binder, & Bianchini, 2009; Kim et al., 2010). It is unclear why the TOMM was originally designed to essentially ignore roughly a third to a half of the test data collected (if administering all three trials or just Trials 1 and 2, respectively). We are unaware of any neuropsychological test in frequent use today, let alone the most frequently used in the assessment of malingering/effort (Sharland & Gfeller, 2007), that essentially ignores almost half of the test data collected. We believe that the previously "neglected" TOMM1 scores will be more closely scrutinized by clinicians/researchers in the future as others become more aware of the greater efficiency and improved sensitivity of these measures over the standard TOMM administration.

The current results for TOMM1 could be used immediately by the average clinician with little need for supplemental statistical calculations or alteration in clinical practice. If one uses the TOMM, one already has the scores for TOMM1 as well as the patient's performance across the first 10 items of Trial 1. The TOMM1 and TOMMe10 tables may further assist the practicing clinician by providing a range of cutoffs and sensitivity/specificity rates, allowing clinicians to decide which cutoffs are most appropriate for their particular setting (based on tolerance for false-positive or false-negative errors). Previous reports have raised concerns about the use of TOMM1 as a freestanding measure of test validity in forensic contexts and note that it may only be appropriate as a screening measure in clinical settings (Bauer et al., 2007; Hilsabeck et al., 2011; O'Bryant et al., 2007, 2008). However, based on the current study and a review of TOMM1 accuracy statistics across a range of populations/contexts covering over 2,600 individuals (Table 2), it appears that TOMM1 shows acceptable accuracy statistics in predicting non-credible performance patterns and could be considered an independent measure of test validity in its own right.

The cutoffs proposed in the current study for TOMM1 and TOMMe10 should be used with caution in clinical patient samples where moderate to severe cognitive limitations are commonplace. The current cut scores appear to be most appropriate in patient samples that include those with relatively mild cognitive deficits, but these cutoffs may be too high if the sample includes individuals with dementia (Horner et al., 2006; Greve et al., 2009) or memory disorders (Greve, Bianchini, Black, et al., 2006). Studies reporting lower TOMM1 cutoff scores in the literature (Table 2) frequently included patients diagnosed with dementia, so future research should develop more accurate cut scores for that population.

Other patient groups where the proposed TOMM1 and TOMMe10 cutoffs should be viewed with significant caution include those with mental retardation and those receiving inpatient treatment for psychotic disorders. Those with mental retardation have shown high rates of poor performance on TOMM1, with 30% performing ≤39 in one study (Shandera et al., 2010). Full administration of the TOMM would be recommended in these populations, as the retention trial has shown specificity rates at acceptable levels (>90%; Shandera et al., 2010). Similar cautions for TOMM1 cutoffs are highlighted by a study by Duncan (2005), who screened psychotic forensic inpatients for adequate effort. He found that 24% of those with concentration problems (as measured by an index on the Conners' Continuous Performance Test-II) scored ≤39 on TOMM1, but only 5% of those without concentration problems performed this poorly. Weinborn and colleagues (2003) also found substantial rates of low TOMM1 scores in psychiatric inpatients with or without greater implied incentives to provide poor effort (TOMM1 ≤40: civil commitment = 39%, competency to stand trial = 48%). Future research should address adjusting cutoffs in these specific populations in order to take into account moderate to severe cognitive limitations.

Embedded Measures

Another purpose of the current study was to cross-validate and provide additional information regarding the accuracy of a variety of individual and combined embedded measures of test validity. Most of the embedded measures in the current study have been used by others in malingering research across a range of patient populations. In our sample, individual measures showed sensitivities ranging from 15% to 45%, with specificities above 90% (except FTT = 78% and BVMT-R Hits = 89%). The overall hit rates for individual measures ranged from 70% to 77%. Based on previous work showing that a regression-based approach may improve sensitivity without negatively impacting specificity (Schutte et al., 2011; Victor et al., 2009), we took the same approach by utilizing the embedded measures as continuous variables. We were able to improve sensitivity (58%) while maintaining specificity ≥90%. The hit rate (83%) and overall accuracy (AUC = 87%) of the regression equation of embedded measures were also better than those of any individual embedded indicator based on empirically derived cutoffs.

Others have proposed tallying the number of failed embedded measures to simplify decision-making, increase sensitivity, and reduce false-positive errors (Larrabee, 2003, 2008; Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011; Pella et al., 2012; Victor et al., 2009). Using this method, failure on any two or more measures showed the best balance between sensitivity (40%) and specificity (93%), with a hit rate of 75%. However, this method was less accurate than using the embedded measures as continuous variables, as noted above, which contrasts with Larrabee's (2003) findings, in which any pair-wise combination of failed measures was more accurate. This highlights how the particular set of embedded measures (e.g., the five used by Larrabee), as well as the criterion for group membership (e.g., below-chance performance on the Portland Digit Recognition Test [PDRT] and failure on the PDRT plus one additional measure), can affect the accuracy statistics of various measures. Therefore, caution should be exercised when combining any two failed embedded measures across any test battery, as this may be less sensitive and less accurate than other methods of combining validity indicators.
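
For contrast with the regression approach, a minimal sketch of the failure-counting rule follows. The cutoff values in the dictionary are placeholders chosen for illustration, not validated cut scores from this or any other study.

```python
# Count how many embedded measures fall at or below their cutoffs and flag
# the protocol when two or more fail; cutoff values here are placeholders.
CUTOFFS = {"WMI": 75, "PSI": 75, "CVLT2_FC": 14, "FTT": 35, "BVMTR_hits": 4}

def n_failures(scores: dict) -> int:
    """Number of embedded measures at or below their respective cutoffs."""
    return sum(scores[name] <= cut for name, cut in CUTOFFS.items())

patient = {"WMI": 72, "PSI": 88, "CVLT2_FC": 13, "FTT": 41, "BVMTR_hits": 5}
failures = n_failures(patient)
print(f"{failures} failure(s) -> {'flag for invalidity' if failures >= 2 else 'no flag'}")
```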

Several studies have highlighted the usefulness of combining embedded measures across different tests to predict poor effort. Schutte and colleagues (2011) created a composite of embedded memory measures (using logistic regression techniques very similar to the current study) based on the CVLT-II (Trial 5, FC), Rey Complex Figure Test (RCFT; immediate recall), and Wechsler Memory Scale-III Verbal Paired Associates-2 to predict MSVT and TOMM performance. Their results were strikingly similar to our findings despite the use of (mostly) different measures (sensitivity = 59%, specificity = 95%, AUC = 84%). Victor and colleagues (2009) also used regression techniques, combining four embedded validity indicators (based on the RCFT, Reliable Digit Span, Rey Auditory Verbal Learning Test, and FTT) to predict pass/fail status on two or more freestanding validity measures. The accuracy of their regression analysis was quite impressive (sensitivity = 86%, specificity = 96%, hit rate = 92%), although, contrary to our findings, Victor and colleagues found that failure on any two embedded indicators was just as accurate as the regression equation. Several other studies have also found combining a variety of embedded measures across multiple tests (using a variety of techniques and criteria for test validity) to be highly accurate, and this approach deserves ongoing study (Larrabee, 2003, 2008; Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011; Schroeder & Marshall, 2011; Whiteside, Wald, & Busse, 2011). The current study contributes to this growing literature by providing three methods of evaluating embedded measures: utilizing them as continuous variables in a regression equation, interpreting them individually based on empirically derived cut scores, and combining two or more failures to predict test validity. The regression equation using all five variables (WMI, PSI, CVLT-II FC, FTT, and BVMT-R hits) was more accurate in predicting performance on the MSVT than any individual embedded measure alone or in combination.

There are numerous validity measures that have been derived from embedded indices (Boone, 2007; Larrabee, 2007). Unfortunately, because so many different calculations are possible across multiple tests, the average clinician may be quickly overwhelmed by which ones, and how many, should actually be interpreted when determining whether test results are valid. Given that neuropsychologists often administer multiple tests to their patients, dozens of calculations could be derived, with very little research to guide clinicians toward the combination of measures that is most efficient to administer and score as well as most accurate (and not overly redundant). For example, Miele and colleagues (2012) assessed the utility of 15 embedded measures derived from the WAIS-R and the Halstead-Reitan Neuropsychological Battery. They found that reliable digit span alone (cutoff ≤7) correctly classified 74% of individuals and that no other embedded measure significantly impacted the classification rate; in other words, the other 14 embedded measures were essentially redundant and unnecessary. Additional research analyzing combinations of embedded measures will help determine the most efficient group of test scores to interpret based on patient population, context of evaluation, and specific cutoffs.

Another unique finding from the current study was that the BVMT-R recognition hits trial contributed significantly to the regression equation predicting MSVT performance and was one of the more sensitive individual embedded measures (45%) at a cut score of ≤4 correct hits. It is not surprising that this measure is useful in predicting test validity, as it tends to be a relatively easy task and follows the yes/no forced-choice format that many freestanding validity measures have utilized. In addition, a measure of visual memory complements the cognitive domains assessed by the four other embedded measures of test validity (verbal memory, auditory attention, fine motor speed, and visual attention/motor speed), in line with suggestions to sample a variety of cognitive domains when assessing test validity (Heilbronner et al., 2009). Cross-validation of the newly introduced BVMT-R recognition hits trial as an embedded measure would be helpful, as no other study has utilized a cutoff of ≤4 correct hits to indicate invalid test performance. Likewise, cross-validation of the regression equations and combinations of embedded measures used in our study is warranted given some of the unique features of the current sample (primarily Caucasian male Veterans with high rates of psychiatric diagnoses).

Limitations of Study Findings and Future Directions

One limitation of the current study was the use of only a single freestanding measure of test validity (the MSVT) in determining group status. However, the MSVT has shown impressive sensitivity to poor effort, is slightly easier than similar measures such as the WMT, and has low false-positive rates in various patient populations (Batt, Shores, & Chekaluk, 2008; Carone, 2008; Green, 2007). After excluding those with clear dementia from our sample, the rate of failure on the MSVT (33%) was relatively consistent with effort test failure rates in other recent Veteran samples (Armistead-Jehle, 2010; Nelson et al., 2010; Whitney, Shepard, Williams, Davis, & Adams, 2009), with the rates described by Mittenberg, Patton, Canyock, and Condit (2002), and with the failure rates suggested by Larrabee and colleagues (2009).

Another limitation is that the patient sample was relatively homogeneous (all Veterans, more than 90% male and Caucasian). In addition, because the majority of patients were referred from primary care and psychiatry clinics, relatively few had clear signs of neurological dysfunction or evidence of neurological disorders, although rates of co-morbid psychiatric disorders were high (≥50% diagnosed with depression and/or anxiety). Accuracy rates and cutoffs for the TOMM1 measures and embedded indices may differ in samples with significantly different demographic and clinical characteristics. However, the large sample size and generous inclusion criteria (with only dementia excluded) suggest that the findings should generalize to a wide range of patient samples.

One of the most neglected areas of study in the assessment of test validity is the order of test administration and how it might affect the accuracy of various freestanding and embedded measures. Table 3 presents the order of the tests administered in the current study. Accuracy statistics for the TOMM1 measures and the embedded indices could differ if the order of tests departs from our typical test battery. For example, if the TOMM and/or MSVT were administered after the CVLT-II or other cognitively demanding verbal memory tasks, those intent on exaggerating deficits might be less inclined to perform below the cutoffs on these measures, having recognized how easy the tasks are by comparison. In fact, Guilmette and colleagues (1996) provide evidence supporting this assertion, showing decreased sensitivity of the abbreviated Hiscock procedure when it was administered at the end rather than the beginning of a test battery. Bush and colleagues (2005) also encourage the administration of at least one validity measure early in the test battery, noting the possible decrease in the sensitivity of validity measures as patients become familiar with more challenging cognitive tests. This issue requires further exploration, as many malingering studies provide no evidence of a structured, consistent test order, which may be one factor influencing the accuracy rates of freestanding and embedded measures.

Conclusions

The current findings suggest that two short forms of the TOMM (TOMM1, TOMMe10) are highly accurate in predicting invalid cognitive test performance as measured by the MSVT. Although this is the first study of the TOMMe10 variable and its cutoffs should be replicated in other samples, the TOMM1 cutoffs are consistent with an ever-growing research base showing TOMM1 to be a more sensitive measure of test validity than the standard TOMM administration. Finally, further replication of the regression formula of embedded measures is warranted, but our results provide evidence that simply combining any two failed embedded validity indicators may be less accurate than regression-based procedures in predicting invalid test performance. Continued research creating more accurate and efficient methods of assessing test validity will benefit clinicians in all areas of neuropsychological practice.

Funding

This material is based upon work supported in part by the Department of Veterans Affairs, Veterans Health Administration, and Office of Research and Development.

Conflict of Interest

None declared.

References

American Board of Clinical Neuropsychology. (2007). American Academy of Clinical Neuropsychology (AACN) practice guidelines for neuropsychological assessment and consultation. The Clinical Neuropsychologist, 21, 209–231.
Armistead-Jehle, P. (2010). Symptom validity test performance in U.S. veterans referred for evaluation of mild TBI. Applied Neuropsychology, 17, 52–59.
Armistead-Jehle, P., & Gervais, R. O. (2011). Sensitivity of the Test of Memory Malingering and the Nonverbal Medical Symptom Validity Test: A replication study. Applied Neuropsychology, 18, 284–290.
Armistead-Jehle, P., & Hansen, C. L. (2011). Comparison of the Repeatable Battery for the Assessment of Neuropsychological Status effort index and stand-alone symptom validity tests in a military sample. Archives of Clinical Neuropsychology, 26, 592–601.
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., et al. (2005). Sensitivity and specificity of finger tapping scores for the detection of suspect effort. The Clinical Neuropsychologist, 19, 105–120.
Ashendorf, L., Constantinou, M., & McCaffrey, R. J. (2004). The effect of depression and anxiety on the TOMM in community-dwelling older adults. Archives of Clinical Neuropsychology, 19, 125–130.
Batt, K., Shores, E. A., & Chekaluk, E. (2008). The effect of distraction on the Word Memory Test and Test of Memory Malingering performance in patients with severe brain injury. Journal of the International Neuropsychological Society, 14, 1074–1080.
Bauer, L., & McCaffrey, R. J. (2006). Coverage of the Test of Memory Malingering, Victoria Symptom Validity Test, and Word Memory Test on the internet: Is test security threatened? Archives of Clinical Neuropsychology, 21, 121–126.
Bauer, L., O'Bryant, S. E., Lynch, J. K., McCaffrey, R. J., & Fisher, J. M. (2007). Examining the Test of Memory Malingering Trial 1 and Word Memory Test immediate recognition as screening tools for insufficient effort. Assessment, 14, 215–222.
Benedict, R. H. B. (1997). Brief Visuospatial Memory Test-Revised. Lutz, FL: Psychological Assessment Resources.
Boone, K. B. (Ed.). (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press.
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23, 729–741.
Brooks, B. L., Sherman, E. M. S., & Krol, A. L. (2012). Utility of TOMM Trial 1 as an indicator of effort in children and adolescents. Archives of Clinical Neuropsychology, 27, 23–29.
Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. (2005). Symptom validity assessment: Practice issues and medical necessity. NAN Policy and Planning Committee. Archives of Clinical Neuropsychology, 20, 419–426.
Carone, D. A. (2008). Children with moderate/severe brain damage/dysfunction outperform adults with mild-to-no brain damage on the Medical Symptom Validity Test. Brain Injury, 22, 960–971.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Constantinou, M., Bauer, L., Ashendorf, L., Fisher, J. M., & McCaffrey, R. J. (2005). Is poor performance on recognition memory effort measures indicative of generalized poor performance on neuropsychological tests? Archives of Clinical Neuropsychology, 20, 191–198.
Curtis, K. L., Greve, K. W., & Bianchini, K. J. (2009). The Wechsler Adult Intelligence Scale-III and malingering in traumatic brain injury: Classification accuracy in known groups. Assessment, 16, 401–414.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (2000). California Verbal Learning Test (2nd ed.). San Antonio, TX: The Psychological Corporation.
Duncan, A. (2005). The impact of cognitive and psychiatric impairment of psychotic disorders on the Test of Memory Malingering (TOMM). Assessment, 12, 123–129.
Etherton, J., Bianchini, K., Ciota, M., Heinly, M. T., & Greve, K. (2006). Pain, malingering and the WAIS-III Working Memory Index. The Spine Journal, 6, 61–71.
Etherton, J., Bianchini, K., Greve, K., & Ciota, M. (2005). Test of Memory Malingering is unaffected by laboratory-induced pain. Archives of Clinical Neuropsychology, 20, 375–384.
Etherton, J., Bianchini, K., Heinly, M. T., & Greve, K. (2006). Pain, malingering, and performance on the WAIS-III Processing Speed Index. Journal of Clinical and Experimental Neuropsychology, 28, 1218–1237.
Fox, D. D. (2011). Symptom validity test failure indicates invalidity of neuropsychological tests. The Clinical Neuropsychologist, 25, 488–495.
Gavett, B. R., O'Bryant, S. E., Fisher, J. M., & McCaffrey, R. J. (2005). Hit rates of adequate performance based on the Test of Memory Malingering (TOMM) Trial 1. Applied Neuropsychology, 12, 1–4.
Gervais, R. O., Rohling, M. L., Green, P., & Ford, W. (2004). A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology, 19, 475–487.
Gierok, S. D., Dickson, A. L., & Cole, J. A. (2005). Performance of forensic and non-forensic adult psychiatric inpatients on the Test of Memory Malingering. Archives of Clinical Neuropsychology, 20, 755–760.
Green, P. (2003). Green's Word Memory Test. Edmonton, Canada: Green's Publishing.
Green, P. (2004). Green's Medical Symptom Validity Test (MSVT) for Windows: User's manual. Edmonton, Canada: Green's Publishing.
Green, P. (2006). The pervasive influence of effort on neuropsychological tests. International Journal of Forensic Psychology, 1, 1–21.
Green, P. (2007). Spoiled for choice: Making comparisons between forced-choice effort tests. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press.
Green, P. (2008). Manual for the Nonverbal Medical Symptom Validity Test. Edmonton, Canada: Green's Publishing.
Green, P. (2011). Comparison between the Test of Memory Malingering (TOMM) and the Nonverbal Medical Symptom Validity Test (NV-MSVT) in adults with disability claims. Applied Neuropsychology, 18, 18–26.
Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045–1060.
Greiffenstein, M. F., Greve, K. W., Bianchini, K. J., & Baker, W. J. (2008). Test of Memory Malingering and Word Memory Test: A new comparison of failure concordance rates. Archives of Clinical Neuropsychology, 23, 801–807.
Greve, K. W., Bianchini, K. J., Black, F. W., Heinly, M. T., Love, J. M., Swift, D. A., et al. (2006). Classification accuracy of the Test of Memory Malingering in persons reporting exposure to environmental and industrial toxins: Results of a known-groups analysis. Archives of Clinical Neuropsychology, 21, 439–448.
Greve, K. W., Bianchini, K. J., & Doane, B. M. (2006). Classification accuracy of the Test of Memory Malingering in traumatic brain injury: Results of a known-groups analysis. Journal of Clinical and Experimental Neuropsychology, 28, 1176–1190.
Greve, K. W., Binder, L. M., & Bianchini, K. J. (2009). Rates of below-chance performance in forced-choice symptom validity tests. The Clinical Neuropsychologist, 23, 534–544.
Greve, K. W., Etherton, J. L., Ord, J., Bianchini, K. J., & Curtis, K. L. (2009). Detecting malingered pain-related disability: Classification accuracy of the Test of Memory Malingering. The Clinical Neuropsychologist, 23, 1250–1271.
Greve, K. W., Ord, J., Curtis, K. L., Bianchini, K. J., & Brennan, A. (2008). Detecting malingering in traumatic brain injury and chronic pain: A comparison of three forced-choice symptom validity tests. The Clinical Neuropsychologist, 22, 896–918.
Guilmette, T. J., Whelihan, W. M., Hart, K. J., Sporadeo, F. R., & Buongiorno, G. (1996). Order effects in the administration of a forced-choice procedure for detection of malingering in disability claimants' evaluations. Perceptual and Motor Skills, 83, 1007–1016.
Gunner, J. H., Miele, A. D., Lynch, J. K., & McCaffrey, R. J. (2012). The Albany Consistency Index for the Test of Memory Malingering. Archives of Clinical Neuropsychology, 27, 1–9.
Haber, A. H., & Fichtenberg, N. L. (2006). Replication of the Test of Memory Malingering (TOMM) in a traumatic brain injury and head trauma sample. The Clinical Neuropsychologist, 20, 524–532.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.
Hill, S. K., Ryan, L. R., Kennedy, C. H., & Malamut, B. L. (2003). The relationship between measures of declarative memory and the Test of Memory Malingering in patients with and without temporal lobe dysfunction. Journal of Forensic Neuropsychology, 3, 1–18.
Hilsabeck, R. C., Gordon, S. N., Hietpas-Wilson, T., & Zartman, A. I. (2011). Use of Trial 1 of the Test of Memory Malingering (TOMM) as a screening measure of effort: Suggested discontinuation rules. The Clinical Neuropsychologist, 25, 1228–1238.
Horner, M. D., Bedwell, J. S., & Duong, A. (2006). Abbreviated form of the Test of Memory Malingering. International Journal of Neuroscience, 116, 1181–1186.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. New York: Wiley.
Iverson, G. L., Le Page, J., Koehler, B. E., Shojania, K., & Badii, M. (2007). Test of Memory Malingering (TOMM) scores are not affected by chronic pain or depression in patients with fibromyalgia. The Clinical Neuropsychologist, 21, 532–546.
Jasinski, L. J., Harp, J. P., Berry, D. T. R., Shandera-Ochsner, A. L., Mason, L. H., & Ranseen, J. D. (2011). Using symptom validity tests to detect malingered ADHD in college students. The Clinical Neuropsychologist, 25, 1415–1428.
Kaufmann, P. M. (2009). Protecting raw data and psychological tests from wrongful disclosure: A primer on the law and other persuasive strategies. The Clinical Neuropsychologist, 23, 1130–1159.
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., et al. (2010). The Warrington Recognition Memory Test for Words as a measure of response bias: Total score and response time cutoffs developed on "real world" credible and noncredible subjects. Archives of Clinical Neuropsychology, 25, 60–70.
Kirk, J. W., Harris, B., Hutaff-Lee, C. F., Koelemay, S. W., Dinkins, J. P., & Kirkwood, M. W. (2011). Performance on the Test of Memory Malingering (TOMM) among a large clinic-referred pediatric sample. Child Neuropsychology, 17, 242–254.
Kirkwood, M. W., Kirk, J. W., Blaha, R. Z., & Wilson, P. (2010). Noncredible effort during pediatric neuropsychological exam: A case series and literature review. Child Neuropsychology, 16, 604–618.
Kutner, M., Nachtsheim, C., & Neter, J. (2004). Applied linear regression models (4th ed.). New York: McGraw-Hill/Irwin.
Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.
Larrabee, G. J. (2007). Assessment of malingered neuropsychological deficits. Oxford: Oxford University Press.
Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666–679.
Larrabee, G. J., Millis, S. R., & Meyers, J. E. (2009). 40 plus or minus 10, a new magical number: Reply to Russell. The Clinical Neuropsychologist, 23, 841–849.
Lindstrom, W. A., Lindstrom, J. H., Coleman, C., Nelson, J., & Gregg, N. (2009). The diagnostic accuracy of symptom validity tests when used with postsecondary students with learning disabilities: A preliminary investigation. Archives of Clinical Neuropsychology, 24, 659–669.
Locke, D. E. C., Smigielski, J. S., Powell, M. R., & Stevens, S. R. (2008). Effort issues in post-acute outpatient acquired brain injury rehabilitation seekers. NeuroRehabilitation, 23, 273–281.
Marshall, P., Schroeder, R., O'Brien, J., Fischer, R., Ries, A., Blesi, B., et al. (2010). Effectiveness of symptom validity measures in identifying cognitive and behavioral symptom exaggeration in adult attention deficit hyperactivity disorder. The Clinical Neuropsychologist, 24, 1204–1237.
McCaffrey, R. J., O'Bryant, S. E., Ashendorf, L., & Fisher, J. M. (2003). Correlations among the TOMM, Rey-15, and MMPI-2 validity scales in a sample of TBI litigants. Journal of Forensic Neuropsychology, 3, 45–53.
Meyers, J. E., & Volbrecht, M. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.
Meyers, J. E., Volbrecht, M., Axelrod, B. N., & Reinsch-Boothby, L. (2011). Embedded symptom validity tests and overall neuropsychological test performance. Archives of Clinical Neuropsychology, 26, 8–15.
Miele, A. S., Gunner, J. H., Lynch, J. K., & McCaffrey, R. J. (2012). Are embedded validity indices equivalent to free-standing symptom validity tests? Archives of Clinical Neuropsychology, 27, 10–22.
Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1102.
Moore, B. A., & Donders, J. (2004). Predictors of invalid neuropsychological test performance after traumatic brain injury. Brain Injury, 18, 975–984.
Morel, K. R. (2009). Test security in medicolegal cases: Proposed guidelines for attorneys utilizing neuropsychology practice. Archives of Clinical Neuropsychology, 24, 635–646.
Morgan, J. E., & Sweet, J. J. (2009). Neuropsychology of malingering casebook. New York: Psychology Press.
Musso, M. W., Barker, A. A., Jones, G. N., Roid, G. H., & Gouvier, W. D. (2011). Development and validation of the Stanford-Binet 5 Rarely Missed Items-Nonverbal index for the detection of malingered mental retardation. Archives of Clinical Neuropsychology, 26, 756–767.
Nelson, N. W., Hoelzle, J. B., McGuire, K. A., Ferrier-Auerbach, A. G., Charlesworth, M. J., & Sponheim, S. R. (2010). Evaluation context impacts neuropsychological performance of OIF/OEF veterans with reported combat-related concussion. Archives of Clinical Neuropsychology, 25, 713–723.
O'Bryant, S. E., Engel, L. R., Kleiner, J. S., Vasterling, J. J., & Black, F. W. (2007). Test of Memory Malingering (TOMM) Trial 1 as a screening measure for insufficient effort. The Clinical Neuropsychologist, 21, 511–521.
O'Bryant, S. E., Gavett, B. E., McCaffrey, R. J., O'Jile, J. R., Huerkamp, J. K., Smitherman, T. A., et al. (2008). Clinical utility of Trial 1 of the Test of Memory Malingering (TOMM). Applied Neuropsychology, 15, 113–116.
Pella, R. D., Hill, B. D., Shelton, J. T., Elliott, E., & Gouvier, W. D. (2012). Evaluation of embedded malingering indices in a non-litigating clinical sample using control, clinical, and derived groups. Archives of Clinical Neuropsychology, 27, 45–57.
Powell, M. R., Gfeller, J. D., Hendricks, B. L., & Sharland, M. (2004). Detecting symptom- and test-coached simulators with the Test of Memory Malingering. Archives of Clinical Neuropsychology, 19, 693–702.
Rees, L., Tombaugh, T., & Boulay, L. (2001). Depression and the Test of Memory Malingering. Archives of Clinical Neuropsychology, 16, 501–506.
Ruocco, A. C., Swirsky-Sacchetti, T., Chute, D. L., Mandel, S., Platek, S. M., & Zillmer, E. A. (2008). Distinguishing between neuropsychological malingering and exaggerated psychiatric symptoms in a neuropsychological setting. The Clinical Neuropsychologist, 22, 547–564.
Ryan, J. J., Glass, L. A., Hinds, R. M., & Brown, C. N. (2010). Administration order effects on the Test of Memory Malingering. Applied Neuropsychology, 17, 246–250.
Schiehser, D. M., Delis, D. C., Filoteo, J. V., Delano-Wood, L., Han, S. D., Jak, A. J., et al. (2011). Are self-reported symptoms of executive dysfunction associated with objective executive performance following mild to moderate traumatic brain injury? Journal of Clinical and Experimental Neuropsychology, 33, 704–714.
Schroeder, R. W., Baade, L. E., Peck, C. P., & Heinrichs, R. J. (2011). Use of Test of Memory Malingering Trial 1 as a measure of response bias. The Clinical Neuropsychologist, 26, 564.
Schroeder, R. W., & Marshall, P. S. (2011). Evaluation of the appropriateness of multiple symptom validity indices in psychotic and non-psychotic psychiatric populations. The Clinical Neuropsychologist, 25, 437–453.
Schutte, C., Millis, S., Axelrod, B., & VanDyke, S. (2011). Derivation of a composite measure of embedded symptom validity indices. The Clinical Neuropsychologist, 25, 454–462.
Shandera, A. L., Berry, D. T. R., Clark, J. A., Schipper, L. J., Graue, L. O., & Harp, J. P. (2010). Detection of malingered mental retardation. Psychological Assessment, 22, 50–56.
Sharland, M. J., & Gfeller, J. D. (2007). A survey of neuropsychologists' beliefs and practices with respect to the assessment of effort. Archives of Clinical Neuropsychology, 22, 213–223.
Sollman, M. J., & Berry, D. T. R. (2011). Detection of inadequate effort on neuropsychological testing: A meta-analytic update and extension. Archives of Clinical Neuropsychology, 26, 774–789.
Sollman, M. J., Ranseen, J. D., & Berry, D. T. R. (2010). Detection of feigned ADHD in college students. Psychological Assessment, 22, 325–335.
Suhr, J., Hammers, D., Dobbins-Buckland, K., Zimak, E., & Hughes, C. (2008). The relationship of malingering test failure to self-reported symptoms and neuropsychological findings in adults referred for ADHD evaluation. Archives of Clinical Neuropsychology, 23, 521–530.
Tan, J. E., Slick, D. J., Strauss, E., & Hultsch, D. F. (2002). How'd they do it? Malingering strategies on symptom validity tests. The Clinical Neuropsychologist, 16, 495–505.
Teichner, G., & Wagner, M. T. (2004). The Test of Memory Malingering (TOMM): Normative data from cognitively intact, cognitively impaired, and elderly patients with dementia. Archives of Clinical Neuropsychology, 19, 455–464.
Tombaugh, T. N. (1996). Test of Memory Malingering. North Tonawanda, NY: Multi-Health Systems.
Vanderslice-Barr, J. L., Miele, A. S., Jardin, B., & McCaffrey, R. J. (2011). Comparison of computerized versus booklet versions of the TOMM. Applied Neuropsychology, 18, 34–36.
Victor, T. L., Boone, K. B., Serpa, J. G., Beuhler, J., & Ziegler, E. A. (2009). Interpreting the meaning of multiple symptom validity test failure. The Clinical Neuropsychologist, 23, 297–313.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd ed.). San Antonio, TX: The Psychological Corporation.
Weinborn, M., Orr, T., Woods, S. P., Conover, E., & Feix, J. (2003). A validation of the Test of Memory Malingering in a forensic psychiatric setting. Journal of Clinical and Experimental Neuropsychology, 25, 979–990.
Whiteside, D., Wald, D., & Busse, M. (2011). Classification accuracy of multiple visual spatial measures in the detection of suspect effort. The Clinical Neuropsychologist, 25, 287–301.
Whiteside, D. M., Dunbar-Mayer, P., & Waters, D. P. (2009). Relationship between TOMM performance and the PAI validity scales in a mixed clinical sample. The Clinical Neuropsychologist, 23, 523–533.
Whitney, K. A., Davis, J. J., Shepard, P. H., & Herman, S. M. (2008). Utility of the Response Bias Scale (RBS) and other MMPI-2 validity scales in predicting TOMM performance. Archives of Clinical Neuropsychology, 23, 777–786.
Whitney, K. A., Shepard, P. H., Williams, A. L., Davis, J. J., & Adams, K. M. (2009). The Medical Symptom Validity Test in the evaluation of Operation Iraqi Freedom/Operation Enduring Freedom soldiers: A preliminary study. Archives of Clinical Neuropsychology, 24, 145–152.
Wisdom, N. M., Brown, W. L., Chen, D. K., & Collins, R. L. (2012). The use of all three Test of Memory Malingering trials in establishing the level of effort. Archives of Clinical Neuropsychology, 27, 208–212.
Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., & Sweet, J. J. (2010). Effort indicators within the California Verbal Learning Test-II (CVLT-II). The Clinical Neuropsychologist, 24, 153–168.
Yanez, Y., Fremouw, W., Tennant, J., Strunk, J., & Coker, C. (2006). Effects of severe depression on TOMM performance among disability-seeking outpatients. Archives of Clinical Neuropsychology, 21, 161–165.