Abstract

Symptom validity assessment is an important part of neuropsychological evaluation. There are currently several free-standing symptom validity tests (SVTs), as well as a number of empirically derived embedded validity indices, that have been developed to assess whether an examinee is putting forth an optimal level of effort during testing. The use of embedded validity indices is attractive since they do not increase overall testing time and may also be less vulnerable to coaching. In addition, there are some instances where embedded validity indices are the only tool available to the neuropsychological practitioner for assessing an examinee's level of effort. As with free-standing measures, the sensitivity and specificity of embedded validity indices to suboptimal effort vary. The present study evaluated the diagnostic validity of 17 embedded validity indices by comparing performance on these indices to performance on combinations of free-standing SVTs. Results from the current medico-legal sample revealed that of the embedded validity indices, Reliable Digit Span had the best classification accuracy; however, the findings do not support the use of this embedded validity index in the absence of free-standing SVTs.

Introduction

The importance of objectively determining the validity of neuropsychological test scores has received a great deal of attention in the literature. Two major factors determine whether valid neuropsychological data will be obtained. First, the examiner must carefully adhere to all standardized administration and scoring procedures (Lee, Reynolds, & Willson, 2003), which depends on the training of the practitioner and is difficult to objectively evaluate outside of a supervised test administration. The other factor is dependent on the examinee, whose degree of participation in the assessment determines the validity of the data. Suboptimal performance by an examinee for whatever reason invalidates the test findings (Strauss, Sherman, & Spreen, 2006). It is absolutely essential, therefore, that all neuropsychological evaluations include methods of determining examinee effort throughout the assessment process.

The assessment of effort as a separate domain of importance in clinical neuropsychological examinations appears to have had its beginnings in the late 1970s (Heaton, Smith, Lehman, & Vogt, 1978; Pankratz, 1979). The concern among practitioners regarding the assessment of effort was spurred by the publication in Neurosurgery of the seminal paper entitled Disability Caused by Minor Head Injury by Rimel, Giordani, Barth, Boll, and Jane (1981). The clinical literature on mild traumatic brain injury (mTBI) at that time attributed patients' subjective complaints to undetectable structural changes such as microscopic neuronal shearing and tearing, since computerized tomography scans and other routine diagnostic neurological and neuroradiological procedures were routinely reported as “within normal limits.” Consequently, the identification of objective findings in support of a diagnosis of mTBI relied upon neuropsychological assessment, which brought clinical neuropsychologists into the forensic arena. The clinical neuropsychological assessment was utilized as evidence of the “undetectable” structural changes due to mTBI but, at that point, clinical neuropsychological testing did not include much in the way of performance validity assessment. An analysis of the four editions of Muriel Lezak's classic “Neuropsychological Assessment” (Lezak, 1976, 1983, 1995; Lezak, Howieson, & Loring, 2004) provides a gauge of the emergence of the assessment of effort as a separate domain. Contained in Table 1 is a list of the “effort” tests described in each of the four editions. The 1976 edition included only five effort tests and the 1983 edition reported a total of 12, whereas the 1995 edition cited a total of 22.
The 2004 edition presents a total of 35 effort tests and procedures and also contains a separate chapter entitled “Testing for Response Bias and Incomplete Effort.” This summary represents only a cursory examination of the history of procedures, methods, and tests for the detection of suboptimal effort and is by no means a comprehensive analysis of this extensive literature. Nonetheless, the number of available effort tests has clearly expanded considerably since 1976.

Table 1.

Emergence of the assessment of effort

Measures of efforta Lezak (1976) Lezak (1983) Lezak (1995) Lezak and colleagues (2004) 
Rey 15-Item Test 
Dot Counting Test (Ungrouped Dots) 
Dot Counting Test (Ungrouped Dots and Grouped Dots) 
Word Recognition Test 
Symptom Validity Test 
Bender–Gestalt  
Benton Visual Retention Test  
Halstead–Reitan Battery  
Minnesota Multiphasic Personality Inventory  
Porsch Index of Communication   
Rorschach   
Paced Auditory Serial Addition Test   
Tests in the Wechsler Intelligence Scales   
Auditory Verbal Learning Test    
Complex Figure Test   
Recognition Memory Test   
Continuous Recognition Memory Test       
Continuous Visual Memory Test 
Wechsler Memory Scale     
Wechsler Memory Scale-Revised 
Beck Depression Inventory    
Minnesota Multiphasic Personality Inventory: Profile and Index Analysis   
Symptom Checklist-90-R    
Portland Digit Recognition Test   
Luria–Nebraska Neuropsychological Battery    
California Verbal Learning Test    
Memory Assessment Scales    
Raven's Progressive Matrices    
Wisconsin Card Sorting Test    
Knox Cube Test    
Reaction Time    
Motor-Related Test    
Amsterdam Short-Term Memory Test    
Coin-in-the-Hand Test    
48-Pictures Test    
Hopkins Recall/Recognition Test    
Test of Memory Malingering    
The 21-Item Test    
Validity Profile Indicator    
Personality Assessment inventory    
Autobiographical Memory Interview    
The b-Test    
The Victoria Symptom Validity Test    
Word Memory Test    

aThis is not an exhaustive list of all methods, procedures, tests, and/or a combination of tests presented in each of the four editions. For that the reader should consult each of the primary editions.

In 2005, the Policy and Planning Committee of the National Academy of Neuropsychology issued the National Academy of Neuropsychology (NAN) Position Paper “Symptom Validity Assessment: Practice Issues and Medical Necessity” (Bush et al., 2005). This was the first time that a leading organization in clinical neuropsychology acknowledged the importance of including the assessment of the validity of an examinee's effort as part of any neuropsychological evaluation. Furthermore, this policy statement also established that symptom validity assessment, as part of any evaluation for medical reasons, was considered medically necessary. In 2007, the Board of Directors of the American Academy of Clinical Neuropsychology (AACN) issued a set of “Practice Guidelines for Neuropsychological Assessment and Consultation” that included a section on the assessment of motivation and effort, which was further elaborated in the 2009 publication of the AACN's “Consensus Conference Statement on the Neuropsychological Assessment of Effort, Response Bias and Malingering” (Heilbronner et al., 2009). This Consensus Statement provides a detailed discussion of response bias, under which effort assessment is subsumed and referred to therein as the assessment of “performance validity.” This terminology helps emphasize the distinction between two separate components of symptom validity evaluation: the assessment of test-taking effort (i.e., performance validity) and the assessment of symptom exaggeration. With regard to performance validity assessment, the AACN Consensus Statement recommended the use of multiple measures of effort that tap varying cognitive domains and include a combination of free-standing effort tests and embedded validity indices.

Numerous embedded validity indices, derived from other standardized clinical neuropsychological assessment instruments, have been developed over the past decade, and currently each neuropsychological domain has at least one standard test that includes an embedded validity index. Embedded validity indices have been developed within neuropsychological tests of attention, processing speed, visuoperceptual functions, executive function, motor functioning, and sensory functioning (Boone, 2007; Larrabee, 2007). The use of embedded indices is attractive since they do not increase overall testing time and may also be less vulnerable to the effects of coaching. Although these indices are increasingly available, there is limited information regarding the comparability of free-standing and embedded validity indices in identifying suboptimal effort. Are embedded validity indices more sensitive to suboptimal effort than free-standing symptom validity measures? Or are these indices more likely to result in false-positive errors, that is, to misidentify optimal-effort examinees as having suboptimal effort? The purpose of this study was to investigate these questions; specifically, we evaluated whether embedded validity indices and free-standing symptom validity tests (SVTs) have similar diagnostic validity.

Method

Participants

An IRB-approved archival analysis was carried out on data from 50 examinees seen for neuropsychological evaluation for medico-legal reasons (i.e., compensation-seeking, litigation, or disability claims in a private practice setting). The majority of referrals were based on claims of mild traumatic brain injury (mTBI; n = 44). These 44 examinees met the mTBI criteria from the American Congress of Rehabilitation Medicine (ACRM) Mild Traumatic Brain Injury Committee of the Head Injury Interdisciplinary Special Interest Group (Committee on Mild Traumatic Brain Injury, 1993). There were six examinees with other medical conditions, diagnosed by their treating neurologist. The diagnoses were fibromyalgia (n = 2), transient ischemic attack (n = 1), brain stem stroke and major depressive disorder (n = 1), sarcoidosis (n = 1), and multiple sclerosis (n = 1). Each examinee was administered a battery of neuropsychological tests from which the embedded validity indices were extracted. The neuropsychological test battery, therefore, dictated the embedded indices used in this study.

Materials

The battery included four free-standing SVTs and 17 embedded validity indices (see Table 2 for a list with references and Table 3 for descriptions). The SVTs included the Rey 15-Item Test (Rey-15; Rey, 1964), the Test of Memory Malingering (TOMM; Tombaugh, 1996), the Victoria Symptom Validity Test (VSVT; Slick, Hopp, Strauss, & Thompson, 1997), and the Word Memory Test (WMT; Green, 2003). For a detailed discussion and description of these measures, see Lezak and colleagues (2004) and Strauss and colleagues (2006). The free-standing SVTs are frequently administered in routine clinical neuropsychological practice (Sharland & Gfeller, 2007; Slick, Tan, Strauss, & Hultsch, 2004). The embedded validity indices are also frequently utilized in clinical practice (e.g., Boone, 2007; Larrabee, 2007) and are derived from the Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981) and the Halstead–Reitan Neuropsychological Battery (HRNB) for Adults (Reitan & Wolfson, 1993) and, as stated, were specific to the neuropsychological battery.

Table 2.

Embedded validity indices

Embedded validity indices References (see also Boone, 2007; Larrabee, 2007)
WAIS-R Digit Span WAIS-R Digit Span-Related References 
1. ACSS Greiffenstein, Baker, and Gola (1994); Mittenberg, Theroux-Fichera, Zielinski, and Heilbronner (1995); Mittenberg and colleagues (2001); Iverson and Tulsky (2003); Heinly, Greve, Bianchini, Love, and Brennan (2005); Babikian, Boone, Lu, and Arnold (2006) 
2. Reliable Digit Span 
3. Longest Span Forward 
4. Longest Span Backward 
5. Vocabulary ACSS minus Digit Span ACSS 
HRNB HRNB-Related References 
6. CT Total Errors DiCarlo and Gfeller (2000); Forrest, Allen, and Goldstein (2004); Gfeller and Cradock (1998); Goebel (1983); Heaton, Smith, Lehman, and Vogt (1978); Inman and Berry (2002); Laatsch and Choca (1991); Ross, Putnam, Millis, Adams, and Krukowski (2006); Sweet and King (2002); Trueblood and Schmidt (1993) 
7. CT Total Errors on subtests 1 and 2 
8. CT Total Errors on subtest 7 
9. CT Total Errors Bolter Index 
10. CT Total Errors “Easy” Items 
11. CT Total Errors “Difficult” Items 
12. Seashore Rhythm Test Total Errors 
13. Speech Sounds Perception Total Errors 
14. Tactile Form Recognition Total Errors 
15. Fingertip Number-Writing Total Errors 

Notes: WAIS-R = Wechsler Adult Intelligence Scale-Revised; CT = Category Test; HRNB = Halstead–Reitan Neuropsychological Battery; ACSS = Age-Corrected Scaled Score.

Table 3.

Description of embedded validity indices

WAIS-R (Wechsler, 1981) Description 
1. Digit Span ACSS ACSS is calculated using the raw score and provided norms 
2. Reliable Digit Span Score is calculated by adding the longest span forward (where both trials were correct) and the longest span backward (again where both trials were correct) 
3. Digit Span Forward Longest number of digits recalled forward correctly 
4. Digit Span Backward Longest number of digits recalled backward correctly 
5. Vocabulary ACSS − Digit Span ACSS Difference between the age-scaled scores 

 
HRNB (Reitan & Wolfson, 1993) Description 

 
6. CT Total Errors Total errors across the seven subtests 
7. CT on subtests 1 and 2 Total Errors Errors on these subtests 
8. CT on subtest 7 Total Errors Errors on subtest 7 only 
9. CT Bolter Index Total Errors Total errors on this index (see Bolter, Picano, & Zych, 1985; Tenhula & Sweet, 1994a, 1994b, for items; Tenhula & Sweet, 1996) 
10. CT “Easy” Items Total Errors Total errors on this index (see Tenhula & Sweet, 1994a, 1994b, for items; Tenhula & Sweet, 1996) 
11. CT “Difficult” Items Total Errors Total errors on this index (see Tenhula & Sweet, 1994a, 1994b, for items; Tenhula & Sweet, 1996) 
12. Fingertip Number-Writing Total Errors Total errors 
13. Speech Sounds Perception Total Errors Total errors 
14. Tactile Form Recognition Total Errors Total errors for both hands combined 
15. Seashore Rhythm Test Total Errors Total number of errors 

Notes: WAIS-R = Wechsler Adult Intelligence Scale-Revised; CT = Category Test; ACSS = Age-Corrected Scaled Score; HRNB = Halstead–Reitan Neuropsychological Battery.
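The Reliable Digit Span computation described in Table 3 can be sketched in code; the sketch below is a minimal illustration, and the trial data are hypothetical:

```python
def longest_reliable_span(trials):
    """Longest span length at which BOTH trials of that length were correct.

    `trials` maps span length -> (trial1_correct, trial2_correct).
    """
    return max((span for span, (t1, t2) in trials.items() if t1 and t2),
               default=0)

def reliable_digit_span(forward, backward):
    """RDS = longest reliable span forward + longest reliable span backward."""
    return longest_reliable_span(forward) + longest_reliable_span(backward)

# Hypothetical protocol: both trials correct through span 6 forward, span 4 backward
forward = {3: (True, True), 4: (True, True), 5: (True, True),
           6: (True, True), 7: (True, False)}
backward = {2: (True, True), 3: (True, True), 4: (True, True),
            5: (False, False)}
print(reliable_digit_span(forward, backward))  # 6 + 4 = 10
```

Under the cutoff adopted later in Table 5, a Reliable Digit Span of 7 or below would be scored as a failure on this index.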

Cutoff Selection and Procedure

The cutoff scores for the free-standing SVTs were those routinely used in clinical practice (Table 4). See Table 5 for criteria utilized to determine passing or failing each embedded validity index. Digit Span Age-Corrected Scaled Score (ACSS) and the Seashore Rhythm Test were examined at two different cutoffs because the literature supports the use of both cutoffs with no empirically based superiority of one cutoff over the other.

Table 4.

Criteria for failure on the SVTs

Free-standing SVT Failure determined by 
Rey-15 ≤9 correcta 
TOMM <45 on Trial 2 or Retentionb 
VSVT ≤17 Hard Items correctc 
WMT ≤82.5% correct on Immediate Recall, Delayed Recall or Consistency Indexd,e 

Notes: SVT = symptom validity test; TOMM = Test of Memory Malingering; VSVT = Victoria Symptom Validity Test; WMT = Word Memory Test; Rey-15 = Rey 15-Item Test.

cGrote and colleagues (2000) and Loring, Lee, and Meador (2005).

eAll examinees' profiles were additionally examined using the recommended profile analysis (Green, 2005).

Table 5.

Cutoff scores* for each of the embedded validity indices

Embedded validity indices Cutoff 
WAIS-R  
 Digit Span ACSS ≤5 
<5 
 Reliable Digit Span ≤7 
 Digit Span Longest Span Forward ≤4 
 Digit Span Longest Span Backward ≤2 
 Vocabulary ACSS − Digit Span ACSS >4 
Halstead–Reitan Category Test 
 CT Total Errors >87 
 CT Total Errors on subtests 1 and 2 >1 
 CT Errors on subtest 7 >5 
 CT Total Errors Bolter Index >3 
 CT Total Errors on “Easy” Items >2 
 CT Total Errors on “Difficult” Items >11 
 Total Errors on Seashore Rhythm ≥9 
≥10 
 Errors on Speech Sound Perception ≥13 
 Total Errors on Tactile Finger Recognition >3 
 Total Errors on Fingertip Number Writing >5 

Notes: WAIS-R = Wechsler Adult Intelligence Scale-Revised; CT = Category Test; ACSS = Age-Corrected Scaled Score.

*Cutoffs were taken from the clinical literature (Boone, 2007; Larrabee, 2007).
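The failure rules in Table 5 are simple threshold checks. A few can be sketched as follows; the function names are illustrative, not part of any published scoring program:

```python
# Threshold checks for three of the Table 5 embedded validity indices.
# Function names are illustrative only.

def fails_reliable_digit_span(rds):
    """Fail if Reliable Digit Span is less than or equal to 7."""
    return rds <= 7

def fails_ct_total_errors(errors):
    """Fail if Category Test total errors exceed 87."""
    return errors > 87

def fails_vocab_minus_digit_span(vocab_acss, ds_acss):
    """Fail if Vocabulary ACSS minus Digit Span ACSS exceeds 4."""
    return vocab_acss - ds_acss > 4

print(fails_reliable_digit_span(7))         # True (the cutoff is inclusive)
print(fails_ct_total_errors(87))            # False (the cutoff is strictly greater than)
print(fails_vocab_minus_digit_span(12, 7))  # True (a difference of 5)
```

Note the mix of inclusive (≤, ≥) and strict (<, >) cutoffs across indices in Table 5; the direction of each inequality matters at the boundary score.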

As NAN (Bush et al., 2005) and the AACN (Heilbronner et al., 2009) recommend that clinicians use multiple symptom validity measures, rather than looking at SVT performance individually, a grouping variable (fail ≥2 SVTs) was created as a conservative method for identifying effort level. Examinees failing two or more SVTs were classified as suboptimal effort, while those failing fewer than two were classified as optimal effort.
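As a concrete sketch of this grouping rule (the pass/fail inputs below are hypothetical):

```python
def effort_group(svt_failed):
    """Fail two or more free-standing SVTs -> suboptimal effort.

    `svt_failed` maps SVT name -> True if the examinee failed that test.
    """
    n_failed = sum(svt_failed.values())
    return "suboptimal" if n_failed >= 2 else "optimal"

# Hypothetical examinee who failed the VSVT and the WMT
print(effort_group({"Rey-15": False, "TOMM": False,
                    "VSVT": True, "WMT": True}))    # suboptimal
# Hypothetical examinee who failed only the TOMM
print(effort_group({"Rey-15": False, "TOMM": True,
                    "VSVT": False, "WMT": False}))  # optimal
```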

A single direct logistic regression was computed in order to determine the ability of the 17 embedded validity indices as a group to accurately classify effort according to the dependent variable (fail ≥2 SVTs = suboptimal effort; fail <2 SVTs = optimal effort). Logistic regression is a statistical technique that uses scores on a set of independent variables to distinguish between two or more groups (Tabachnick & Fidell, 2007), which is useful for clinicians because it provides information about the likelihood of group membership. Correlational analyses were also performed. Additionally, 16 embedded validity indices were entered as predictor variables into individual direct logistic regressions to determine their unique ability to accurately classify examinees into optimal or suboptimal effort groups as determined by our dependent variable. Only 16 indices were entered because no examinees failed the Digits Backward embedded validity index.

In addition to using more than one symptom validity measure during a neuropsychological evaluation, the practice guidelines (Bush et al., 2005; Heilbronner et al., 2009) recommend that symptom validity measures cover diverse neuropsychological domains. For this reason, an exploratory factor analysis was utilized to determine the underlying factor structure of the embedded validity indices. Exploratory factor analysis reveals latent variables that can explain covariation among other variables (Tabachnick & Fidell, 2007). Ultimately, results from the exploratory factor analysis and the 16 individual logistic regressions were used to select “the best” embedded validity indices for use in clinical practice. Finally, both the embedded validity indices significant as individual predictors and those representing different factors from the exploratory factor analysis were used in a single direct logistic regression to examine their combined ability to predict effort according to the dependent variable. All statistical analyses were conducted using SPSS version 18.0.

While statistically identifying the embedded validity indices that classify suboptimal and optimal effort is informative, these techniques fail to consider the classification accuracy of the individual embedded indices. Even if identified as a significant predictor, an embedded validity index may still over- or under-identify suboptimal effort. That is, a significant proportion of examinees with suboptimal effort may have “passed” the embedded validity index (false negatives), and a significant proportion of examinees with optimal effort may have “failed” it (false positives). For this reason, classification accuracy statistics were calculated and examined for all embedded validity indices.

In computing the classification accuracy statistics for the embedded validity indices, performance on each embedded index (i.e., the test result) was compared with the effort determination made by the criterion. False positives were defined as passing the criterion (i.e., failing <2 SVTs) but failing the embedded validity index (false positive = 1 − the specificity [SP]). False negatives were defined as failing ≥2 SVTs but obtaining a passing score on the specific embedded validity index (false negative = 1 − the sensitivity [SN]). See Fig. 1 for a summary of classification accuracy calculations.
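These definitions reduce to the standard 2 × 2 contingency calculations. A minimal sketch, with hypothetical cell counts:

```python
def classification_accuracy(tp, fn, fp, tn):
    """Accuracy statistics for an embedded index against the >=2-SVT criterion.

    tp: criterion suboptimal, index failed   fn: criterion suboptimal, index passed
    fp: criterion optimal, index failed      tn: criterion optimal, index passed
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative": 1 - sensitivity,  # 1 - SN, as defined in the text
        "false_positive": 1 - specificity,  # 1 - SP, as defined in the text
        "overall": (tp + tn) / (tp + fn + fp + tn),
    }

# Hypothetical counts: 15 hits, 6 misses, 5 false alarms, 24 correct passes
stats = classification_accuracy(tp=15, fn=6, fp=5, tn=24)
print(round(stats["sensitivity"], 2), round(stats["false_positive"], 2))  # 0.71 0.17
```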

Fig. 1.

Calculating classification accuracy.


Results

Demographics for Entire Sample and Across Effort Groups

Overall, the mean examinee age was 44.92 (SD = 10.26) years. The mean Full-Scale IQ (FSIQ) was 95.06 (SD = 10.92), and the mean years of education was 14.10 (SD = 2.92). The majority of the sample was right-handed (94%), Caucasian (92%), and male (52%). When the sample was divided into optimal and suboptimal effort groups, there were no differences on any of the demographic variables except FSIQ—t(48) = −4.12, p < .001. Suboptimal effort examinees had an average FSIQ score of 88.57 (SD = 9.5), which was ∼11 points lower than optimal effort examinees' average score of 99.76 (SD = 9.5).

SVT Descriptive Results

The original sample consisted of 54 individuals. Four examinees were subsequently eliminated because they had not completed a sufficient number of SVTs for definitive effort classification. The final sample consisted of 50 examinees. Of these, 37 completed all four SVTs and 12 completed three of the SVTs. The remaining examinee had completed only the WMT and the VSVT and passed both. This examinee's data were included in the analyses since no other examinee in our data set who passed both of these SVTs had been classified as suboptimal effort due to failure on the other two SVTs.

Those examinees identified as suboptimal effort failed at least two of the four SVTs. Each examinee who completed only three of the SVTs passed all of them and was identified as optimal effort. See Table 6 for descriptive information regarding examinee performance across the four SVTs as well as for the examinees failing two or more SVTs. The six non-mTBI examinees were all classified as optimal effort and had been administered the VSVT, TOMM, and the WMT; two examinees also completed the Rey-15. All but one non-mTBI examinee passed each SVT administered; the exception was the examinee with a diagnosis of multiple sclerosis who failed the WMT while passing the other three SVTs.

Table 6.

Examinees passing and failing the free-standing SVTs and fail ≥2 SVTs grouping variable

 Pass Fail 
Rey-15 (n = 43) 37 (86%) 6 (14%) 
TOMM (n = 49) 40 (81.6%) 9 (18.4%) 
VSVT (n = 45) 22 (51.8%) 22 (48.9%) 
WMT (n = 49) 27 (55.1%) 22 (44.9%) 
Fail ≥2 SVTs (n = 50) 29 (58%) 21 (42%) 

Notes: SVT = symptom validity test; TOMM = Test of Memory Malingering; VSVT = Victoria Symptom Validity Test; WMT = Word Memory Test.

Performance on SVTs tended to cluster into two groups. Similar percentages of examinees passed and failed the Rey-15 and the TOMM, while a similar percentage also passed and failed the WMT, VSVT, and the fail ≥2 SVTs grouping variable. Of the 21 examinees identified as suboptimal effort by the fail ≥2 SVTs grouping variable, 33.3% failed the Rey-15, 42.9% failed the TOMM, 100% failed the VSVT, and 85.7% failed the WMT. All examinees identified as suboptimal effort by this grouping variable failed either the VSVT or the WMT as one of the two or more failed SVTs. As seen in Table 6, over 40% of examinees failed at least two SVTs, which indicates a 42% base rate of suboptimal effort when using the fail ≥2 SVTs criterion.

Embedded Validity Indices Descriptive Results

Fig. 2 shows the frequency of examinees failing each embedded validity index. The three most frequently failed embedded validity indices were the Category Test (CT) Total Errors on the “Difficult” items (n = 23), Reliable Digit Span (n = 18), and CT Errors on subtest 7 (n = 18). The three least frequently failed embedded validity indices were Digits ACSS at a cutoff of <5 (n = 5), the CT Total Errors on the Bolter Index (n = 4), and Longest Digit Span Backward (n = 0). Although these descriptive data are informative, it is important to consider both inferential statistics and classification accuracy statistics when determining the diagnostic validity of the embedded validity indices to identify suboptimal effort.

Fig. 2.

Frequency of embedded validity index failure.


Correlations and Direct Logistic Regression Analyses

Due to the large number of predictor variables, correlational analyses were carried out for each of the 17 embedded validity indices. Not surprisingly, every embedded validity index was significantly correlated with at least one other embedded validity index, and over half of these correlations were >0.8. The first logistic regression with all embedded validity indices entered as predictors was significant—χ2(16, n = 46) = 34.50, p = .005. The variance in effort accounted for was large (McFadden's D = 0.56). According to the Wald criterion, in this model, only Reliable Digit Span and CT Total Errors on subtests 1 and 2 were significant predictors. The absence of other significant predictors was not unexpected given the significant multicollinearity among the embedded validity indices. For this reason, individual logistic regressions were carried out for each of the 16 embedded validity indices. Results showed that only four embedded indices explained any unique variance in effort as determined by the fail ≥2 SVTs grouping variable: Reliable Digit Span, Speech Sounds Errors, Tactile Finger Recognition Total Errors, and CT Errors on subtest 7. See Table 7 for the results of this analysis. The variance in effort accounted for was small (McFadden's D = 0.16, 0.09, 0.09, and 0.11, respectively, for each of the four significant embedded validity indices). Examinees failing Reliable Digit Span, Speech Sounds Errors, Tactile Finger Recognition Total Errors, and CT Errors on subtest 7 were ∼8, 7, 6, and 5 times more likely to also be classified as suboptimal effort by the dependent variable, respectively. Each of these four embedded validity indices had similar correct classification rates (Reliable Digit Span = 74%, Tactile Finger Recognition Total Errors = 71%, CT Errors on subtest 7 = 69%, and Speech Sounds Errors = 68%).
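For reference, McFadden's statistic is 1 − LL(model)/LL(null), and the “times more likely” figures are the exponentiated regression coefficients reported in Table 7. The log-likelihood values below are hypothetical, chosen only to illustrate the arithmetic:

```python
import math

# McFadden's pseudo-R^2 = 1 - LL(model) / LL(null); the LL values are hypothetical
ll_null, ll_model = -31.4, -13.8
print(round(1 - ll_model / ll_null, 2))  # 0.56

# Odds ratio = exp(beta); Reliable Digit Span beta = 2.05 (Table 7)
print(round(math.exp(2.05), 2))  # 7.77 (the tabled 7.80 reflects the unrounded beta)
```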

Table 7.

Individual direct logistic regression output for 16 embedded validity indices

Variable^a                            β      SE         Wald χ2   p-value   Odds ratio   95% CI for odds ratio
CT Total Errors “Difficult” Index     1.02   0.60       2.86      .091      2.76         0.85–8.97
Reliable Digit Span                   2.05   0.67       9.51      .002      7.80         2.11–28.78
CT Errors on subtest 7                1.54   0.64       5.81      .016      4.67         1.33–16.34
Seashore Rhythm Errors ≥9             0.48   0.61       0.61      .433      1.62         0.49–5.36
Tactile Finger Recognition Errors     1.79   0.70       6.53      .011      6.00         1.52–23.71
Fingertip Number-Writing Errors       0.94   0.65       2.10      .147      2.56         0.72–9.08
Vocabulary ACSS − Digits ACSS         1.08   0.67       2.65      .104      2.95         0.80–10.90
Seashore Rhythm Errors ≥10            0.88   0.68       1.68      .195      2.40         0.64–9.02
CT Total Errors “Easy” Index          0.79   0.68       1.35      .245      2.20         0.58–8.31
Speech Sounds Errors                  1.91   0.87       4.85      .028      6.75         1.24–36.91
CT Total Errors on subtests 1 and 2   1.16   0.78       2.22      .136      3.20         0.69–14.76
CT Total Errors                       0.04   0.82       —         .96       1.04         0.21–5.24
Digits ACSS ≤5                        21.93  15,191.52  —         .999      —            —
Digits Forward                        21.86  16,408.71  —         .999      —            —
Digits ACSS <5                        21.80  17,974.84  —         .999      —            —
CT Total Errors Bolter Index          1.47   1.20       1.51      .220      4.33         0.42–45.06

Notes: ACSS = Age-Corrected Scaled Score; CT = Category Test. Bold values denote exact p-values.

aNo examinees failed Digits Backward so this embedded validity index was not included in this analysis.

To better understand the neuropsychological domains associated with the embedded validity indices, an exploratory factor analysis was performed using the embedded indices as continuous variables. Results indicated a three-factor solution. Tactile Finger Recognition and Fingertip Number Writing were excluded from the exploratory factor analysis because the three factors did not account for a meaningful share of their variance (communalities = 0.432 and 0.394, respectively). The five Digit Span embedded validity indices and Seashore Rhythm Test Errors loaded onto the first factor, which may be conceptualized as auditory attention. CT Total Errors, CT Errors on subtest 7, and the CT Total Errors “Difficult” Index loaded onto a second factor, which may be conceptualized as nonverbal conceptual reasoning. The remaining embedded validity indices (Speech Sounds Errors, CT Total Errors Bolter Index, CT Total Errors “Easy” Index, and CT Total Errors on subtests 1 and 2) loaded onto the third factor; these indices are not easily grouped under a single cognitive domain. The three factors were not highly correlated with one another—r(50) = .171, .292, and .308, respectively. See Table 8 for factor loadings for the 13 embedded validity indices.

Table 8.

Factor loadings for 13 embedded validity indices

Variable                              Factor 1   Factor 2   Factor 3
Digits ACSS                           −1.011     0.007      0.046
Reliable Digit Span                   −0.966     −0.023     0.001
Digits Forward                        −0.894     −0.023     0.047
Digits Backward                       −0.833     −0.195     0.095
Vocabulary ACSS − Digits ACSS         0.722      −0.137     0.046
Seashore Rhythm Total Errors          0.536      −0.055     0.439
CT Total Errors                       −0.038     0.935      0.178
CT Errors on subtest 7                0.118      0.865      0.049
CT Total Errors “Difficult” Index     −0.039     0.696      −0.089
CT Total Errors Bolter Index          0.023      0.169      0.866
CT Total Errors on subtests 1 and 2   −0.113     −0.058     0.685
CT Total Errors “Easy” Index          0.038      0.463      0.598
Speech Sounds Errors                  0.263      0.032      0.455

Notes: Pattern matrix factor loadings from factor analysis with principal axis factoring and oblimin rotation. ACSS = Age-Corrected Scaled Score; CT = Category Test. Bold values denote significant factor loadings.

The utility of failure on Reliable Digit Span, CT Errors on subtest 7, Speech Sounds Errors, and Tactile Finger Recognition Total Errors for classifying examinee effort, as determined by the fail ≥2 SVTs variable, was examined in a final direct logistic regression. Tactile Finger Recognition Total Errors was included despite not loading in the exploratory factor analysis. Results showed that, overall, this model had utility in classifying examinees—χ2(4, n = 46) = 20.95, p < .001. The variance in effort accounted for by the model was moderate (McFadden's D = 0.35), with correct classification of ∼78% of examinees. See Table 9 for the regression coefficients, Wald statistics, odds ratios, and 95% confidence intervals for the odds ratios as determined by the fail ≥2 SVTs grouping variable. Reliable Digit Span emerged as the only significant predictor: examinees failing Reliable Digit Span were over 11 times more likely to be classified as suboptimal effort than examinees who passed this embedded validity index. It is important, however, to consider the predictive power of Reliable Digit Span, which is discussed below.

Table 9.

Direct logistic regression with Reliable Digit Span and other individually significant embedded validity indicators

Variable                            β     SE    Wald χ2   p-value   Odds ratio   95% CI for odds ratio
Reliable Digit Span                 2.41  0.91  7.11      .008      11.18        1.90–65.95
CT Errors on subtest 7              1.60  0.85  3.56      .059      4.97         0.94–26.34
Speech Sounds Errors                1.39  0.99  1.96      .162      4.01         0.57–28.11
Tactile Finger Recognition Errors   0.40  0.90  0.20      .659      1.49         0.26–8.68

Note: CT = Category Test. Bold values denote exact p-values.

Classification Accuracy Statistics

See Table 10 for classification accuracy statistics. In general, sensitivity (SN), specificity (SP), positive predictive power (PPP), and negative predictive power (NPP) varied across the embedded validity indices. The false-positive rate across the 17 embedded validity indices ranged from 0.0 to 0.37, whereas the false-negative rate ranged from 0.38 to 1.0. The PPP for Digits Backward could not be calculated because no examinees failed this index. Four embedded validity indices (Digits ACSS ≤ 5, Digits ACSS < 5, Digits Forward, and Digits Backward) had SP values of 1.0, indicating no false-positive errors. The SN values for these four embedded validity indices, however, ranged from 0.0 to 0.33 (false negatives ranged from 0.67 to 1.0): although all optimal effort examinees passed these four indices, the indices failed to identify a substantial number of suboptimal effort examinees. As seen in Table 10, when PPP and NPP are considered together, Reliable Digit Span, CT Errors on subtest 7, Speech Sounds Errors, and Tactile Finger Recognition Total Errors had good combined classification accuracy. The SN and SP for these four embedded validity indices, however, indicate a significant risk of both false-positive and false-negative errors.

Table 10.

Classification accuracy for embedded validity indices for fail ≥2 SVTs grouping variable

Variable   Sensitivity (false negatives)   Specificity (false positives)   PPP   NPP
Digits ACSS ≤5 0.33 (0.67) 1.0 (0.0) 1.0 0.67 
Digits ACSS <5 0.24 (0.76) 1.0 (0.0) 1.0 0.64 
Digits Forward 0.29 (0.71) 1.0 (0.0) 1.0 0.66 
Digits Backward 0.0 (1.0) 1.0 (0.0) NAa 0.58 
Reliable Digit Span 0.62 (0.38) 0.83 (0.17) 0.72 0.75 
Vocabulary ACSS – Digits ACSS 0.38 (0.62) 0.83 (0.17) 0.62 0.65 
CT Total Errors 0.14 (0.86) 0.87 (0.13) 0.43 0.58 
CT Total Errors on subtests 1 and 2 0.29 (0.71) 0.89 (0.11) 0.67 0.62 
CT Errors on subtest 7 0.57 (0.43) 0.78 (0.22) 0.67 0.70 
CT Total Errors Bolter Index 0.14 (0.86) 0.96 (0.04) 0.75 0.59 
CT Total Errors “Easy” Index 0.33 (0.67) 0.81 (0.19) 0.58 0.61 
CT Total Errors “Difficult” Index 0.62 (0.38) 0.63 (0.37) 0.57 0.68 
Seashore Total Errors ≥9 0.38 (0.62) 0.72 (0.28) 0.50 0.62 
Seashore Total Errors ≥10 0.33 (0.67) 0.83 (0.17) 0.58 0.63 
Speech Sounds Errors 0.33 (0.67) 0.93 (0.07) 0.78 0.66 
Tactile Finger Recognition Total Errors 0.50 (0.50) 0.86 (0.14) 0.71 0.71 
Fingertip Number-Writing Total Errors 0.40 (0.60) 0.79 (0.21) 0.57 0.66 

Notes: SVT = symptom validity test; PPP = positive predictive power; NPP = negative predictive power; ACSS = Age-Corrected Scaled Score; CT = Category Test.

aUnable to calculate because no examinees failed this embedded validity index.
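The four statistics in Table 10 all derive from a single 2 × 2 contingency table. As an illustrative sketch, the cell counts below are reconstructed from the reported Reliable Digit Span rates under the assumption of 21 suboptimal and 29 optimal examinees; they are hypothetical, not taken from the original data:

```python
def classification_stats(tp, fn, fp, tn):
    """Sensitivity, specificity, PPP, and NPP from a 2x2 contingency table.

    tp: suboptimal examinees who failed the index (hits)
    fn: suboptimal examinees who passed the index (misses)
    fp: optimal examinees who failed the index (false alarms)
    tn: optimal examinees who passed the index (correct rejections)
    """
    sn = tp / (tp + fn)   # sensitivity
    sp = tn / (tn + fp)   # specificity
    ppp = tp / (tp + fp)  # positive predictive power
    npp = tn / (tn + fn)  # negative predictive power
    return sn, sp, ppp, npp

# Hypothetical counts implied by the Reliable Digit Span row of Table 10
sn, sp, ppp, npp = classification_stats(tp=13, fn=8, fp=5, tn=24)
print(f"SN = {sn:.2f}, SP = {sp:.2f}, PPP = {ppp:.2f}, NPP = {npp:.2f}")
# reproduces the tabled 0.62, 0.83, 0.72, 0.75
```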

Results Summary

To identify the most useful embedded validity indices, both inferential and descriptive statistics were used, in the form of logistic regression analysis, exploratory factor analysis, and classification accuracy statistics. Logistic regression first showed that, as a group, embedded validity indices have statistical utility in classifying examinees into optimal or suboptimal effort groups, as determined by the fail ≥2 SVTs grouping variable. Individually, only four of the 16 embedded validity indices included in this analysis significantly grouped examinees according to effort: Reliable Digit Span, CT Errors on subtest 7, Speech Sounds Errors, and Tactile Finger Recognition Total Errors. An exploratory factor analysis showed a three-factor underlying structure for 13 of the embedded validity indices. Most interestingly, three of the four statistically significant embedded validity indices identified in the individual logistic regressions (i.e., Reliable Digit Span, CT Errors on subtest 7, and Speech Sounds Errors) fell in different factors. A final logistic regression including the four individually significant embedded validity indices revealed that only Reliable Digit Span uniquely explained a significant amount of variance in the classification of suboptimal effort. This final logistic regression demonstrated that Reliable Digit Span alone, at a cutoff of ≤7, correctly classified ∼74% of examinees according to the fail ≥2 SVTs grouping variable, and that including any additional embedded validity index did not significantly increase the classification rate.

Discussion

The purpose of this study was to examine the diagnostic validity and resulting clinical utility of a number of embedded validity indices. We examined whether several embedded validity indices, developed from the WAIS-R and the HRNB, were comparable with free-standing SVTs as indicators of effort in a medico-legal sample. The classification ability of each embedded validity index was determined using failure of two or more free-standing SVTs as the criterion for suboptimal effort. Through direct logistic regression and an exploratory factor analysis, we determined that Reliable Digit Span was a significantly better predictor of suboptimal effort than the other embedded validity indices included in this analysis. A Reliable Digit Span of ≤7 correctly classified 74% of examinees, a rate consistent with the 76% classification accuracy reported for the same cutoff in a recent meta-analysis (Jasinski, Berry, Shander, & Clark, 2011). The model including Reliable Digit Span, CT Errors on subtest 7, Speech Sounds Errors, and Tactile Finger Recognition Total Errors correctly classified only two more examinees (78%). For this reason, the use of Reliable Digit Span alone provided a more parsimonious method for identifying suboptimal effort; examinees who failed Reliable Digit Span were almost eight times more likely to have failed two or more SVTs. In addition, we calculated the sensitivity, specificity, PPP, and NPP for each embedded validity index. Reliable Digit Span, CT Errors on subtest 7, Speech Sounds Errors, and Tactile Finger Recognition Total Errors had the highest PPP (0.72, 0.67, 0.78, and 0.71, respectively) and NPP (0.75, 0.70, 0.66, and 0.71, respectively), but as reflected in the sensitivity and specificity values, each was associated with elevated false-positive and false-negative rates.

Direct logistic regression at the multivariate level showed that embedded validity indices were useful in classifying examinees into optimal versus suboptimal effort groups; at the univariate level, four embedded validity indices were found to uniquely make this discrimination. An exploratory factor analysis assisted in delineating a common underlying three-factor structure for the embedded validity indices. Interestingly, three of the four significant embedded validity indices were found within these different factors; however, when the individually significant embedded validity indices were entered into a final direct logistic regression, only Reliable Digit Span significantly predicted suboptimal effort. The other three embedded validity indices did not provide significant incremental predictive validity.

Because these analyses do not directly account for false-negative and false-positive errors, the data were considered in conjunction with classification accuracy statistics. After reviewing both the inferential and descriptive data, Reliable Digit Span emerged as the “most robust” embedded validity index for classifying suboptimal effort as defined by failing two or more free-standing SVTs, although there was nearly a 20% chance of over-identifying suboptimal effort (false positives = 0.17) and a nearly 40% chance of missing suboptimal effort (false negatives = 0.38). CT Errors on subtest 7, Speech Sounds Errors, and Tactile Finger Recognition Total Errors were nonsignificant predictors in the final logistic regression that included Reliable Digit Span, likely because of their unacceptable classification accuracy rates.

Four embedded validity indices (Digits ACSS ≤ 5, Digits ACSS < 5, Digits Forward, and Digits Backward) had no false positives, but had false-negative rates ranging from 67% to 100%. The inferential statistics do not support the use of these four embedded validity indices because none emerged as a significant predictor, possibly due to their inability to accurately identify suboptimal effort. In addition, these four embedded validity indices loaded onto the same factor (auditory attention, Factor 1), so evaluating effort using all four is likely redundant. In terms of “worst” clinical utility, Digits ACSS <5, CT Total Errors, CT Total Errors on subtests 1 and 2, and CT Total Errors Bolter Index appear to provide the “least reliable” information regarding examinee effort and were not significant predictors. Notably, ∼80% of the examinees who failed two or more SVTs nevertheless passed these embedded validity indices. The high false-negative rate associated with these embedded validity indices suggests that the cutoff scores may be too lenient and that a passing performance is not necessarily indicative of optimal effort. In addition, Digits Backward provided “no information” about examinee effort because all examinees passed this embedded validity index at the cutoff of ≤2.

The results of this study are based on specific cutoff scores that were selected from the literature. Whether different cutoff scores for these embedded validity indices would have similar results remains an empirical issue. Cutoff selection is difficult and may depend on the population under investigation. Easier cutoffs often have higher false-negative rates, while more difficult cutoffs will likely have higher rates of false positives. An ideal cutoff, therefore, is one where sensitivity and specificity, as well as PPP and NPP, are each approaching 100%. In neuropsychology, “excellent” values are those that approach 80%–90% (Larrabee, 2007). The same principles of cutoff selection may also be influencing the low failure rates on some of the embedded validity indices. For example, since all examinees had a Digits Backward span of two or more, the sensitivity and specificity of a different cutoff for litigating samples warrants further investigation.
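The cutoff trade-off described above can be made concrete with a small sketch. The score distributions below are invented for illustration and do not come from the study sample; the point is only that lowering the failure cutoff trades sensitivity for specificity, and raising it does the reverse:

```python
# Hypothetical Reliable Digit Span scores for two effort groups
suboptimal = [5, 6, 6, 7, 7, 8, 9]     # poor-effort examinees (invented)
optimal = [7, 8, 8, 9, 10, 11, 12]     # good-effort examinees (invented)

def sn_sp(cutoff):
    """Failure = score <= cutoff. Returns (sensitivity, specificity)."""
    sn = sum(s <= cutoff for s in suboptimal) / len(suboptimal)
    sp = sum(s > cutoff for s in optimal) / len(optimal)
    return sn, sp

for c in (5, 7, 9):
    sn, sp = sn_sp(c)
    print(f"cutoff <= {c}: SN = {sn:.2f}, SP = {sp:.2f}")
# the easier cutoff (<=5) misses most poor-effort scores (low SN, high SP);
# the stricter cutoff (<=9) catches them all but flags good-effort scores too
```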

PPP and NPP were utilized in this study, in contrast to prior research that relies heavily on sensitivity and specificity to determine classification accuracy. Because PPP and NPP take the base rate of the sample into account, these statistics provide clinicians with more accurate diagnostic information. Clinically, over-identifying optimal effort examinees as suboptimal effort (false positives) carries more harmful ramifications than under-identifying suboptimal effort examinees as optimal effort (false negatives); such ramifications might include denial of disability claims, workers' compensation, or other occupational accommodations, as well as refusal of future neuropsychological evaluation.

Finally, our results showed that ∼42% of the sample (21 participants) failed two or more SVTs. This rate is relatively consistent with SVT failure rates reported in other medico-legal (forensic) samples (Gervais, Rohling, Green, & Ford, 2004; Mittenberg et al., 2002), both of which approached 40%. Excluding Digits Backward, which no examinee failed, the 16 remaining embedded validity indices had failure rates ranging from 8% to 48%.

Limitations and Future Considerations

Some limitations of this study merit careful consideration. First, our study had a relatively small sample size. In addition, the categorical nature of both our predictor and outcome variables (performance on neuropsychological tests and overall effort) necessitated grouping and dichotomizing the sample accordingly. Logistic regression is considered robust to each of these issues, however.

Second, our sample was heterogeneous with regard to neurological condition, which introduces the possibility of differing predictive validity values for the different neurological groups. A particular concern was that individuals with conditions other than mTBI might be more likely to fail SVTs and embedded validity indices. Analysis revealed, however, that all six non-mTBI individuals were classified as optimal effort by SVTs and did not have a higher false-positive rate on embedded validity indices than the mTBI group; the false-positive rate ranged from 17% to 33% for the non-mTBI group and from 62% to 91% for the mTBI group.

As stated, due to the large number of predictor variables, an exploratory factor analysis was conducted to better understand the underlying factor structure of the embedded validity indices. We used principal axis factoring as the method of extraction, rather than principal component analysis or maximum-likelihood factor extraction, because the embedded validity indices were not normally distributed even as continuous variables; therefore, only the shared variance of the variables was examined. In addition, we used an oblique (oblimin) rotation because of the correlations among the embedded validity indices. This rotation method necessitates interpretation of the pattern matrix (rather than the structure matrix that SPSS also produces), which displays the unique relationship between each embedded validity index and the latent factor.

Despite the numerous potential benefits of using embedded validity indices (e.g., shorter test administration time, less vulnerability to “insider knowledge” and coaching), there is currently limited statistical support for using embedded indices in place of, or even alongside, current free-standing SVTs. As stated, our findings do not support the use of embedded validity indices alone. In this study, however, some emerged as better candidates than others for use in conjunction with free-standing SVTs in litigating settings. Cross-validation of the classification accuracy of these embedded validity indices in other settings (i.e., clinical or academic) is warranted, especially as our base rate of suboptimal effort was ∼40%. For this reason, we recommend that clinicians calculate PPP and NPP using base rates from their own practices before using embedded validity indices; this may be achieved using the sensitivity and specificity values provided in Table 10.
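The recalculation recommended above follows from Bayes' theorem: PPP and NPP for any local base rate can be derived from a test's sensitivity and specificity. A brief sketch using the Reliable Digit Span values from Table 10 (the 15% base rate below is an arbitrary hypothetical, not a recommended value):

```python
def predictive_power(sn, sp, base_rate):
    """PPP and NPP for a given base rate of suboptimal effort (Bayes' theorem)."""
    ppp = (sn * base_rate) / (sn * base_rate + (1 - sp) * (1 - base_rate))
    npp = (sp * (1 - base_rate)) / (sp * (1 - base_rate) + (1 - sn) * base_rate)
    return ppp, npp

# Reliable Digit Span: SN = 0.62, SP = 0.83 (Table 10)
ppp_study, _ = predictive_power(0.62, 0.83, 0.42)   # study base rate (~42%)
ppp_clinic, _ = predictive_power(0.62, 0.83, 0.15)  # hypothetical clinical base rate

print(f"PPP at 42% base rate: {ppp_study:.2f}")   # ~0.73, near the tabled 0.72
print(f"PPP at 15% base rate: {ppp_clinic:.2f}")  # drops substantially
```

The drop at lower base rates is the practical reason for the recommendation: a failing score on the same index is far less diagnostic in a setting where suboptimal effort is rare.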

The major finding of this study is that certain embedded validity indices provide more accurate information about effort than others, although all embedded indices included in this study tended either to over-identify optimal effort or to under-identify suboptimal effort. Clinically, this means that some examinees with optimal effort will fail embedded validity indices and that some examinees with suboptimal effort will pass them. In the current study, Reliable Digit Span had the best diagnostic validity, but still misclassified 20%–40% of examinees.

Conflict of Interest

None declared.

Acknowledgements

The authors are grateful to Cecil R. Reynolds, the former editor of Archives of Clinical Neuropsychology and the current editor of Psychological Assessment, who served as the guest action editor. This manuscript was submitted to the blind peer review process. The authors also wish to acknowledge the statistical consultation received from Richard F. Haase.

References

Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20, 145–159.

Board of Directors of American Academy of Clinical Neuropsychology. (2007). Practice guidelines for neuropsychological assessment and consultation. The Clinical Neuropsychologist, 21, 209–231.

Bolter, J. F., Picano, J. J., & Zych, K. (1985). Item error frequencies on the Halstead Category Test: An index of performance validity. Paper presented at the annual meeting of the National Academy of Neuropsychology, Philadelphia, PA.

Boone, K. B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press.

Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. (2005). Symptom validity assessment: Practice issues and medical necessity. NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 20, 419–426.

Committee on Mild Traumatic Brain Injury, American Congress of Rehabilitation Medicine (ACRM). (1993). Definition of mild traumatic brain injury. Journal of Head Trauma Rehabilitation, 8(3), 86–87.

DiCarlo, M. A., & Gfeller, J. D. (2000). Effects of coaching on detecting feigned cognitive impairment with the Category Test. Archives of Clinical Neuropsychology, 15(5), 399–413.

Forrest, T. J., Allen, D. N., & Goldstein, G. (2004). Malingering indexes for the Halstead Category Test. The Clinical Neuropsychologist, 18(2), 334–347.

Gervais, R. O., Rohling, M. L., Green, P., & Ford, W. (2004). A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology, 19(4), 475–487.

Gfeller, J. D., & Cradock, M. M. (1998). Detecting feigned neuropsychological impairment with the Seashore Rhythm Test. Journal of Clinical Psychology, 54(4), 431–438.

Goebel, R. A. (1983). Detection of faking on the Halstead neuropsychological test battery. Journal of Clinical Psychology, 39(5), 731–742.

Green, P. (2003). Green's Word Memory Test for Microsoft Windows. Edmonton, Alberta: Green's Publishing.

Green, P. (2005). Manual for the Word Memory Test. Edmonton: Green's Publishing.

Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224.

Grote, C. L., Kooker, E. K., Garron, D. C., Nyenhuis, D. L., Smith, C. L., & Mattingly, M. L. (2000). Performance of compensation seeking and non-compensation seeking samples on the Victoria Symptom Validity Test: Cross-validation and extension of a standardization study. Journal of Clinical and Experimental Neuropsychology, 22(6), 709–719.

Heaton, R. K., Smith, H. H., Lehman, R. A. W., & Vogt, A. T. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 46(5), 892–900.

Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. L., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.

Heinly, M. T., Greve, K. W., Bianchini, K. J., Love, J. M., & Brennan, A. (2005). WAIS Digit Span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444.

Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 17, 1–23.

Iverson, G. L., & Tulsky, D. S. (2003). Detecting malingering on the WAIS-III: Unusual digit span performance patterns in the normal population and in clinical groups. Archives of Clinical Neuropsychology, 18, 1–9.

Jasinski, L. J., Berry, D. T. R., Shander, A. I., & Clark, J. A. (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33(3), 300–314.

Laatsch, L., & Choca, J. (1991). Understanding the Halstead Category Test by using item analysis. Journal of Consulting and Clinical Psychology, 3(4), 701–704.

Larrabee, G. L. (2007). Assessment of malingered neuropsychological deficits. Oxford: Oxford University Press.

Lee, D., Reynolds, C. R., & Willson, V. L. (2003). Standardized test administration: Why bother? Journal of Forensic Neuropsychology, 3(3), 55–81.

Lezak, M. D. (1976). Neuropsychological assessment. New York: Oxford University Press.

Lezak, M. D. (1983). Neuropsychological assessment (2nd ed.). New York: Oxford University Press.

Lezak, M. D. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.

Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment (4th ed.). Oxford: Oxford University Press.

Loring, D. W., Lee, G. P., & Meador, K. J. (2005). Victoria Symptom Validity Test performance in non-litigating epilepsy surgery candidates. Journal of Clinical and Experimental Neuropsychology, 27, 610–617.

Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1102.

Mittenberg, W., Theroux, S., Aguila-Puentes, G., Bianchini, K., Greve, K., & Rayls, K. (2001). Identification of malingered head injury on the Wechsler Adult Intelligence Scale (3rd ed.). The Clinical Neuropsychologist, 15, 440–445.

Mittenberg, W., Theroux-Fichera, S., Zielinski, R. E., & Heilbronner, R. L. (1995). Identification of malingered head injury on the Wechsler Adult Intelligence Scale-Revised. Professional Psychology: Research and Practice, 26, 491–498.

Pankratz, L. (1979). Symptom validity testing and symptom retraining: Procedures for the assessment and treatment of functional sensory disorders. Journal of Consulting and Clinical Psychology, 47, 409–410.

Reitan, R. M., & Wolfson, D. (1993). The Halstead-Reitan neuropsychological test battery: Theory and clinical interpretation (2nd ed.). Tucson, AZ: Neuropsychology Press.

Rey, A. (1964). L'examen clinique en psychologie [The clinical examination in psychology]. Paris: Presses Universitaires de France.

Rimel, R. W., Giordani, B., Barth, J. T., Boll, T. J., & Jane, J. A. (1981). Disability caused by minor head injury. Neurosurgery, 9, 221–228.

Ross, S. R., Putnam, S. T., Millis, S. R., Adams, K. M., & Krukowski, R. A. (2006). Detecting insufficient effort using the Seashore Rhythm and Speech Sounds Perception tests in head injury. The Clinical Neuropsychologist, 20(4), 798–815.

Sharland, M. J., & Gfeller, J. D. (2007). A survey of neuropsychologists' beliefs and practices with respect to the assessment of effort. Archives of Clinical Neuropsychology, 22, 213–223.

Slick, D., Hopp, G., Strauss, E., & Thompson, G. B. (1997). Victoria Symptom Validity Test. Odessa, FL: Psychological Assessment Resources.

Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detecting malingering: A survey of experts' practices. Archives of Clinical Neuropsychology, 19, 465–473.

Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). Oxford: Oxford University Press.

Sweet, J. J., & King, J. H. (2002). Category Test validity indicators: Overview and practice recommendations. Journal of Forensic Neuropsychology, 3, 241–271.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson Education.

Tenhula, W. N., & Sweet, J. J. (1994). Category test cut-off scores for identifying malingerers. Paper presented at the annual meeting of the International Neuropsychological Society, Cincinnati, OH.

Tenhula, W. N., & Sweet, J. J. (1994). Category test cut-off scores for identifying malingerers. Paper presented at the annual meeting of the National Academy of Neuropsychology, Orlando, FL.

Tenhula, W. N., & Sweet, J. J. (1996). Double cross-validation of the Booklet Category Test in detecting malingered traumatic brain injury. The Clinical Neuropsychologist, 10(1), 104–116.

Tombaugh, T. N. (1996). The Test of Memory Malingering (TOMM). North Tonawanda, NY: Multi-Health Systems.

Trueblood, W., & Schmidt, M. (1993). Malingering and other validity considerations in the neuropsychological evaluation of mild head injury. Journal of Clinical and Experimental Neuropsychology, 15(4), 578–590.

Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.

Author notes

Special Circumstances: This paper is based upon the first author's Master's thesis. Portions of the paper were presented at the 30th annual meeting of the National Academy of Neuropsychology, Vancouver, BC.