Abstract

The Victoria Symptom Validity Test (VSVT) is one of the most accurate performance validity tests. Previous research has recommended several cutoffs for performance invalidity classification on the VSVT. However, only one of these studies used a known groups design, and no study has investigated these cutoffs in an exclusively mild traumatic brain injury (mTBI) medico-legal sample. The current study used a known groups design to validate VSVT cutoffs among mTBI litigants and explored the best approach for using the multiple recommended cutoffs for this test. Cutoffs of <18 Hard items correct, <41 Total items correct, an Easy–Hard items correct difference >6, and <5 items correct on any block yielded the strongest classification accuracy. Using multiple cutoffs in conjunction reduced classification accuracy. Given convergence across studies, a cutoff of <18 Hard items correct is the most appropriate for use with mTBI litigants.

Introduction

Neuropsychological assessment depends on examinees putting forth a performance that enables valid interpretation of test scores. When performance invalidity is detected, the correlation between objectively defined levels of brain damage and neuropsychological test performance is vitiated, preventing the neuropsychologist from accurately interpreting test data as reflective of brain dysfunction (Fox, 2011; Green, Rohling, Lees-Haley, & Allen, 2001). Base rates of performance invalidity are highly influenced by the presence of external incentives to perform poorly, with prevalence estimates reaching 30%–50% in medico-legal settings where compensation for injury is at issue (Larrabee, 2003; Mittenberg, Patton, Canyock, & Condit, 2002). Recognizing the impact of performance invalidity on test interpretation, both the National Academy of Neuropsychology (NAN; Bush et al., 2005) and the American Academy of Clinical Neuropsychology (AACN; Heilbronner et al., 2009) have issued position papers on performance validity assessment. NAN considers performance validity assessment medically necessary, and AACN advocates the use of multiple measures of performance validity throughout an evaluation.

Free-standing performance validity tests (PVTs) are among the most popular methods of evaluating performance validity (Sharland & Gfeller, 2007). Free-standing PVTs are stand-alone tests that serve no clinical purpose other than to determine the validity of an examinee's performance. Given that false-positive classification errors can result in denial of compensation to examinees with genuine brain dysfunction, emphasis has been placed on limiting these errors. Accordingly, Larrabee (2012) and Victor, Boone, Serpa, Buehler, and Ziegler (2009) have advocated that, to be deemed acceptable for clinical use, a PVT should produce no greater than a 10% false-positive rate, which is equivalent to maintaining a minimum specificity (SP) of 0.90. These authors further recommended that sensitivity (SN), the ability of a PVT to correctly detect performance invalidity when it is present, be maximized only after false-positive rates are limited to <10%.

Among free-standing PVTs, the Victoria Symptom Validity Test (VSVT; Slick, Hopp, Strauss, & Thompson, 1997) is considered one of the most accurate (Sollman & Berry, 2011). The VSVT is a computerized, 48-item, two-alternative forced-choice recognition test that involves presentation of five-digit number strings. Stimuli are classified as Easy or Hard based on the degree of similarity between targets and foils at test. In addition, the length of delay between presentation of digit strings and recognition testing is varied (5, 10, and 15 s delays). Crossing the two levels of item difficulty with the three delay lengths results in six blocks of VSVT trials. Measures of items correct and response latency are collected for Easy, Hard, and Total items. Originally, the VSVT manual (Slick et al., 1997) recommended that interpretation of the VSVT should be “largely based on binomial probability theory” (p. 29). However, subsequent research demonstrated that cutoffs derived from binomial probability theory are too conservative for performance invalidity detection (e.g., Grote et al., 2000).
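To make the conservatism of the binomial approach concrete, the sketch below computes the highest score that falls significantly below chance on a 24-item VSVT subscale. It is a minimal illustration assuming SciPy and a one-tailed .05 criterion, not a reproduction of the manual's exact procedure.

```python
from scipy.stats import binom

# Under pure guessing on a two-alternative test, correct responses follow
# a binomial distribution with n items and p = .5. The VSVT has 24 Easy
# and 24 Hard items.
N = 24

# Largest score that is significantly below chance at one-tailed p < .05
below_chance = max(k for k in range(N + 1) if binom.cdf(k, N, 0.5) < 0.05)
print(below_chance)  # 7
```

Under this criterion, only scores of 7/24 or lower are flagged, far stricter than empirically derived cutoffs such as <18 Hard items correct.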

Several studies have been conducted to validate VSVT cutoffs. These studies have implemented three types of validation designs: (i) simulated malingering designs comparing examinees instructed to feign neuropsychological impairment to those instructed to perform their best, (ii) differential prevalence designs evaluating performance invalidity base rates in samples at-risk for performance invalidity against those who are not at-risk, and (iii) known groups designs examining the ability of the VSVT to discriminate between examinees independently classified as performance valid and performance invalid (Lezak, Howieson, Bigler, & Tranel, 2012). Limitations, however, have been detailed for each of these designs. Specifically, simulated malingering designs often lack external validity and overestimate classification accuracy, especially when non-clinical examinees are included in samples (Haines & Norris, 1995; Rogers, 1997; Rogers, Bagby, & Dickens, 1992; Vickery, Berry, Inman, Harris, & Orey, 2001). In contrast, differential prevalence designs commonly use litigation status as an indicator of risk for performance invalidity and do not include an independent measure of performance validity. These studies often inadvertently include examinees exhibiting performance invalidity in no-risk, clinical samples and examinees exhibiting performance validity in at-risk, litigating samples, and thus lack internal validity (Brennan & Gouvier, 2006). Finally, known groups designs are limited by the accuracy of the independent criteria used to assign examinees to performance validity criterion groups (Bilder, Sugar, & Hellemann, 2014). To overcome this issue, some known groups design studies include only extreme performance validity cases (definitely malingering vs. definitely not malingering) and screen out more subtle cases; however, this strategy inflates estimated classification accuracy and limits external validity (Lezak et al., 2012). Given the limitations of each of these research designs, Rogers (2008) recommends that PVTs be evaluated in all three types of studies in order to establish convergence of results.

Most early validation studies of the VSVT implemented simulated malingering designs. Slick, Hopp, Strauss, and Spellacy (1996) evaluated the VSVT in a sample of non-compensation-seeking neurological outpatients, non-clinical students instructed to perform their best, and non-clinical students instructed to simulate neuropsychological impairment. A cutoff of <16 on Easy or Hard items correct resulted in high SN (0.81) and SP (1.00) and also classified 15% of a separate group of compensation-seeking examinees as performance invalid. Similarly, Strauss and colleagues (2002) reported that a cutoff of <16 Hard items correct discriminated between non-compensation-seeking head injury volunteers and head injury volunteers instructed to simulate neuropsychological impairment with SN of 0.82 and SP of 0.94. In a more recent study, Macciochi, Seel, Alderson, and Goodsall (2006) explored VSVT performance among acute severe TBI neurorehabilitation inpatients and samples from the VSVT manual of healthy subjects instructed to simulate neuropsychological impairment and healthy control subjects. Among severe TBI patients, Macciochi et al. found that only 4% scored <22 Easy items correct, 8% and 17% scored below 41 and 44 Total items correct, respectively, 13% exhibited response latencies on Easy items >4.66 s, and 10% exhibited Hard item response latencies >6.89 s. However, severe TBI inpatients with visual perceptual and verbal initiation impairments were at risk for scoring poorly on VSVT Total items. After excluding severe TBI patients with severe visual perceptual or verbal initiation impairments, Macciochi et al. found that a cutoff of <38 Total items correct correctly classified severe TBI and healthy control subjects as performance valid and healthy simulators as performance invalid with SN of 0.74 and SP of 1.00. Raising this cutoff to <44 Total items correct increased SN to 0.91 while having a negligible impact on SP (0.99).

Several differential prevalence design studies have explored VSVT cutoffs as well. Grote and colleagues (2000) evaluated failure rates on the VSVT among a compensation-seeking group of primarily mild traumatic brain injury (mTBI) examinees and a group of non-compensation-seeking intractable epilepsy patients. Cutoffs of <18 Hard items correct and <21 Hard items correct yielded failure rates of 0% and 6.7%, respectively, among intractable epilepsy patients. Compensation-seeking mTBI patients were much more likely to obtain scores <18 Hard items correct (49.1%) and <21 Hard items correct (64.2%). Easy and Hard item response latency measures also demonstrated promise as measures of performance invalidity. Low failure rates among epilepsy patients were documented at response latency levels of ≥3 s on Easy items (3.3% failure rate) and ≥4 s on Hard items (10% failure rate); however, the Easy item response latency measure identified substantially fewer compensation-seeking examinees as performance invalid (28%) than the Hard item response latency cutoff (56%). In addition, a consistency of response latency measure (i.e., a standard deviation of response latency for Hard items ≥1.9 s) and a consistency of items correct measure (items correct <6/8 on any one block of items) demonstrated low failure rates among epilepsy patients (10% and 3.3%, respectively) and considerably higher failure rates among compensation-seeking examinees (57.7% and 59.2%, respectively).

Other differential prevalence studies examining VSVT performance among intractable epilepsy surgery candidates have reported higher failure rates, raising questions about the SP of proposed VSVT cutoffs in this diagnostic population. Following Grote and colleagues (2000), Loring, Lee, and Meador (2005) examined cutoffs of <18 Hard items correct and <21 Hard items correct in a sample of 120 epilepsy surgery candidates. Both cutoffs resulted in high failure rates, with 13.3% of their sample scoring below 18 Hard items correct and 20% scoring below 21 Hard items correct. Further analyses demonstrated that epilepsy surgery candidates with low intellectual functioning were at particular risk for scoring below cutoffs on VSVT Hard items. However, as noted by Grote (2007), Loring and colleagues (2005) did not directly screen for compensation-seeking status and thus may have inadvertently included examinees who had an external incentive to perform poorly. A more recent study of intractable epilepsy surgery candidates documented VSVT Hard items correct failure rates intermediate between those of Grote and colleagues (2000) and Loring and colleagues (2005) (Keary et al., 2013). In this sample of 404 intractable epilepsy surgery candidates, 5% failed to achieve a score of at least 18 correct Hard items and 13% failed to achieve a score of at least 21 correct Hard items. Epilepsy surgery candidates with low intellectual functioning and poor working memory were at particular risk for scoring below VSVT Hard items correct cutoffs. The authors noted that none of the patients in their sample were known to be involved in litigation at the time of their evaluation and that disability would be more likely to be granted on the basis of seizure severity, not cognitive functioning. As such, compensation-seeking status likely had little to no effect on their results.

Differential prevalence studies have also evaluated cutoffs based on VSVT Hard items correct in other neurological conditions. Loring, Larrabee, Lee, and Meador (2007) examined cutoffs of <18 Hard items correct and <21 Hard items correct in a heterogeneous clinically referred sample of dementia (n = 50), cerebrovascular event (n = 38), multiple sclerosis (n = 19), mixed diagnosis (n = 27), memory complaints (n = 163), and TBI (n = 49) patients. None of these examinees had a known external financial incentive. A cutoff of <18 Hard items correct resulted in an overall failure rate of 16%, with 22% of dementia, 16% of cerebrovascular, 11% of multiple sclerosis, 15% of mixed diagnosis, 15% of memory complaint, and 18% of TBI patients falling below this cutoff. An even higher average failure rate of 27% was documented with a cutoff of <21 Hard items correct. At this cutoff, dementia patients had the highest failure rate (38%) and memory complaint patients the lowest (22%), with cerebrovascular event (29%), TBI (29%), multiple sclerosis (26%), and mixed diagnosis (26%) patients falling between these extremes. A separate group of non-clinically referred TBI (n = 20), electrical injury (n = 3), and cerebrovascular event (n = 2) examinees with an external incentive was also evaluated. These non-clinically referred examinees had much higher failure rates at cutoffs of <18 Hard items correct (44%) and <21 Hard items correct (60%). Thus, while these cutoffs were likely sensitive to performance invalidity among examinees with an external incentive, they led to troublingly high failure rates among clinically referred patients with a range of presenting conditions. However, similar to Loring and colleagues (2005), Loring and colleagues (2007) did not definitively screen out examinees with an external incentive and thus may have retained some compensation-seeking examinees in their clinical sample (Grote, 2007; Jones, 2013a).

Only one study to date has used a known groups design to investigate VSVT cutoffs. Jones (2013a) explored VSVT cutoffs in a sample composed of outpatient active duty veterans. The majority of examinees were seen for evaluation of closed head injury, heat injury, and blast exposure. An additional 10% were evaluated for brain disease (multiple sclerosis, epilepsy, Huntington's disease, etc.). Most, if not all, examinees had an external incentive to perform poorly. Examinees were classified as “non-malingerers,” “probable malingerers,” and “probable-to-definite malingerers” based on their performance on two self-report symptom validity tests (the Symptom Validity Scale and the Response Bias Scale, both from the MMPI-2; Gervais, Ben-Porath, Wygant, & Green, 2007; Lees-Haley, English, & Glenn, 1991) and four PVTs (the Test of Memory Malingering, the Word Memory Test, the Effort Index from the Repeatable Battery for the Assessment of Neuropsychological Status, and Reliable Digit Span; Babikian, Boone, Lu, & Arnold, 2006; Green, 2003; Randolph, 1998; Silverberg, Wertheimer, & Fichtenberg, 2007; Tombaugh, 1996). Examinees failing zero PVTs/symptom validity tests were classified as “non-malingerers,” examinees failing exactly two PVTs/symptom validity tests were classified as “probable malingerers,” and examinees failing three or more PVTs/symptom validity tests were classified as “probable-to-definite malingerers” (Larrabee, 2008; Larrabee, Greiffenstein, Greve, & Bianchini, 2007). Notably, examinees failing a single PVT/symptom validity test were excluded from analyses.

In this study, items correct scores accurately discriminated between “non-malingerers” and “probable-to-definite malingerers” (Jones, 2013a). Cutoffs of <23 Easy items correct (SN = 0.52, SP = 0.95), <20 Hard items correct (SN = 0.91, SP = 0.93), and <44 Total items correct (SN = 0.91, SP = 0.93) yielded maximal classification accuracy. In addition, an Easy–Hard items correct difference score cutoff of >4 resulted in strong SN (0.80) and SP (0.96). Response latency cutoffs demonstrated promise among these examinees, but classification accuracy was generally weaker than with items correct cutoffs. An Easy item response latency cutoff of ≥4 s resulted in SN of 0.30 and SP of 0.96, and a Hard item response latency cutoff of ≥5 s yielded SN of 0.57 and SP of 0.93. Among “non-malingerers” and “probable malingerers,” items correct cutoffs again demonstrated strong classification accuracy. An Easy items correct cutoff of <23 yielded SN of 0.44 and SP of 0.95, a Hard items correct cutoff of <20 resulted in SN of 0.79 and SP of 0.93, a Total items correct cutoff of <44 resulted in SN of 0.85 and SP of 0.90, and an Easy–Hard items correct difference score of >4 resulted in SN of 0.67 and SP of 0.96. An Easy item response latency cutoff of ≥4 s and a Hard item response latency cutoff of ≥5 s again yielded adequate SP (0.96 and 0.93, respectively), but considerably lower SN (0.28 and 0.48, respectively) relative to items correct cutoffs. Thus, Jones (2013a) demonstrated strong classification accuracy for a variety of VSVT cutoff scores. However, as noted, this study excluded examinees who failed a single PVT from further analyses and as a result may have overestimated the classification accuracy of these cutoffs (Lezak et al., 2012).

Given the limited number of VSVT known groups studies, the current study was conducted to provide further converging evidence on the validity of VSVT cutoff scores using a known groups design. There were three specific aims of this study. First, we sought to validate VSVT cutoffs in an exclusively mTBI sample, as these examinees are commonly encountered in medico-legal contexts and no study to date has focused exclusively on this diagnostic group. Secondly, we aimed to adopt a more conservative approach than Jones (2013a) by retaining all examinees in analyses, recognizing that exclusion of examinees from analyses might lead to overestimation of classification accuracy and limit external validity. Thirdly, given that several VSVT cutoffs have been recommended previously, we sought to evaluate the best strategy for using these multiple cutoffs. Some research has suggested that using multiple PVTs can increase false-positive rates substantially (Berthelson, Mulchan, Odland, Miller, & Mittenberg, 2013; Bilder et al., 2014; Silk-Eglit, Stenclik, Miele, Lynch, & McCaffrey, 2015a, 2015b), while other research has argued otherwise (Davis & Millis, 2014; Larrabee, 2014). Thus, we aimed to evaluate the impact of using multiple cutoffs from the VSVT in conjunction on rates of false-positive classification. We predicted that VSVT Hard items correct would yield the strongest classification accuracy, although many other VSVT cutoffs would demonstrate adequate SP for clinical implementation. We also predicted that recommended VSVT cutoffs would be less stringent and that classification accuracy of VSVT cutoffs would be lower than in Jones (2013a) due to our use of a more conservative methodological approach. Lastly, we predicted that using multiple VSVT cutoffs would increase false-positive rates above clinically acceptable levels.

Materials and Methods

Participants

The sample consisted of 134 examinees seen for neuropsychological evaluation for medico-legal purposes in an urban setting. Three inclusion criteria were implemented. Examinees must (i) have met the American Congress of Rehabilitation Medicine Mild Traumatic Brain Injury Committee of the Head Injury Interdisciplinary Special Interest Group criteria for a diagnosis of mTBI (Committee on Mild Traumatic Brain Injury, ACRM, 1993); (ii) have been administered the VSVT and have sufficient data on all VSVT indexes under consideration; and (iii) have been administered at least four of the five PVTs used to establish criterion performance valid and performance invalid groups. Examinees administered only four criterion PVTs were additionally excluded if they failed exactly one PVT. To anticipate what will be discussed further below, examinees were grouped as performance valid and performance invalid using a criterion of failing ≥2 PVTs; implementing this addendum to inclusion Criterion 3 therefore ensured that missing PVT data could never have changed criterion group placement. Eight examinees were excluded for not receiving a diagnosis of mTBI, 33 were excluded for not having been administered the VSVT or for missing data on VSVT indexes, and 1 was excluded for insufficient criterion PVT data. This resulted in a final sample of 92 examinees, all of whom met diagnostic criteria for mTBI and had an external incentive to perform poorly.
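The logic of this addendum can be made explicit in a short sketch. This is a hypothetical helper, not code from the study; it simply encodes the rule that a single failure among four administered criterion PVTs leaves group placement indeterminate under the ≥2-failure criterion.

```python
def meets_criterion_3(n_administered: int, n_failed: int) -> bool:
    """Inclusion addendum to Criterion 3 (hypothetical helper).

    With all five criterion PVTs administered, placement under the
    >= 2-failure rule is always determinate. With four administered,
    exactly one failure is ambiguous: the missing fifth PVT could have
    raised the count to two, so such cases are excluded.
    """
    if n_administered == 5:
        return True
    if n_administered == 4:
        return n_failed != 1
    return False  # fewer than four criterion PVTs: insufficient data
```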

Materials

All examinees were administered a comprehensive neuropsychological battery by a doctoral-level psychologist specializing in neuropsychology that included the Halstead–Reitan Neuropsychological Battery (Reitan & Wolfson, 1993) and several free-standing PVTs, including the VSVT. In keeping with recommendations that multiple free-standing and embedded PVTs be used to classify performance validity during an evaluation (Heilbronner et al., 2009), the criterion PVTs used to group examinees as performance valid and performance invalid included two free-standing PVTs and three embedded PVTs. Criterion PVTs were selected to ensure that their classifications were independent of one another and thus not redundant. This is important because if two PVTs are redundant, then failing both is equivalent to failing only a single PVT. Establishing independence of measures therefore supports the validity of performance validity classifications based on multiple PVTs. Larrabee (2008) has argued that free-standing PVTs are typically independent of one another and that embedded PVTs that tap different cognitive domains also tend to be non-redundant. As such, embedded PVTs were selected from different cognitive domains and included Reliable Digit Span (RDS; an attention measure; Lezak et al., 2012), Finger Tapping both hands combined (a motor measure; Lezak et al., 2012), and Category Test Subtest 7 errors (a concept formation measure; Lezak et al., 2012). Free-standing criterion PVTs consisted of the Word Memory Test (WMT; Green, 2003) and the Test of Memory Malingering (TOMM; Tombaugh, 1996). Embedded PVT cutoffs were ≤7 on RDS (Greiffenstein, Baker, & Gola, 1994; Heinly, Greve, Bianchini, Love, & Brennan, 2005), >5 errors on Subtest 7 of the Category Test (Dicarlo, Gfeller, & Oliveri, 2000), and <63 on Finger Tapping both hands combined (Larrabee, 2003). Green (2005) notes that the WMT consists of six trials, three of which are considered “easy” and taken to reflect performance validity and three of which are “hard” and sensitive to genuine memory abilities. Standard cutoffs for the “easy” trials based on the test manual were used for the WMT, while alternative cutoffs previously validated by Jones (2013b), Greve, Bianchini, and Doane (2006), Greve, Ord, Curtis, Bianchini, and Curtis (2008), and Stenclik, Miele, Silk-Eglit, Lynch, and McCaffrey (2013) were implemented for the TOMM (Trial 1 < 43; Trial 2 < 49; Retention Trial < 49). Importantly, Green, Montijo, and Brockhaus (2011) described a Genuine Memory Impairment Profile (GMIP) based on discrepancies between easy and hard trial performance that is used to determine whether memory impairment on the WMT is genuine. Green, Flaro, Brockhaus, and Montijo (2012) argued that memory impairment sufficient to cause failure on the WMT easy subtests is unlikely in mTBI examinees without a comorbid diagnosis of dementia. Given that none of the examinees in the current study met criteria for dementia, the GMIP was not analyzed.

Procedure

Performance on the WMT, TOMM, RDS, Category Test Subtest 7, and Finger Tapping both hands combined served as the criterion by which examinees were classified as performance valid and performance invalid. Previous research indicates that using a classification threshold of failing ≥2 PVTs in aggregated models of five PVTs results in both high SN and SP (Larrabee, 2003). Moreover, this research suggests that examinees who fail ≥2 PVTs show patterns of performance on neuropsychological tests and PVTs similar to those of examinees who perform below chance on forced-choice recognition measures (Larrabee, 2003). As such, examinees who failed two or more of these PVTs were classified as performance invalid, while all other examinees were classified as performance valid, as sketched below. Analyses sought to validate VSVT indexes previously used for performance validity classification against this criterion. Thus, the current study evaluated the classification accuracy of the following VSVT indexes: Easy items correct, Hard items correct, Total items correct, Easy–Hard items correct, consistency of items correct, Easy items response latency, Hard items response latency, and the standard deviation of Hard items response latency.
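The grouping rule can be summarized in a few lines. The sketch below is illustrative only: the cutoff values are those cited in Materials, the field names are hypothetical, and WMT failure is reduced to a single boolean standing in for the manual's easy-trial cutoffs.

```python
def criterion_failures(scores: dict) -> int:
    """Count failures across the five criterion PVTs (booleans sum as ints)."""
    fails = 0
    fails += scores["rds"] <= 7                    # Reliable Digit Span
    fails += scores["category7_errors"] > 5        # Category Test Subtest 7
    fails += scores["finger_tapping_both"] < 63    # both hands combined
    fails += scores["wmt_failed"]                  # per test-manual easy-trial cutoffs
    fails += (scores["tomm_trial1"] < 43 or
              scores["tomm_trial2"] < 49 or
              scores["tomm_retention"] < 49)
    return fails

def criterion_group(scores: dict) -> str:
    """Failing two or more criterion PVTs places an examinee in the invalid group."""
    return "performance invalid" if criterion_failures(scores) >= 2 else "performance valid"
```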

Data Analytic Plan

Data analyses focused on four main goals. First, we sought to establish quantitatively that our criterion PVTs were independent of one another. Following Jones (2013a), we evaluated whether there was an association between passing or failing each pair of criterion PVTs using χ2 tests. To ensure that failures were represented in every pairwise comparison, we restricted these analyses to examinees who failed at least two criterion PVTs (n = 53).
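A sketch of these pairwise tests follows. It assumes SciPy and NumPy and a hypothetical boolean failure matrix; following the convention noted in Table 2, Fisher's exact test replaces the χ2 test when expected cell counts fall below 10.

```python
from itertools import combinations
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def pairwise_independence(pvt_fail: np.ndarray, labels: list) -> None:
    """pvt_fail: examinees x PVTs boolean failure matrix (hypothetical layout),
    restricted here to examinees failing >= 2 criterion PVTs."""
    for i, j in combinations(range(pvt_fail.shape[1]), 2):
        # 2 x 2 contingency table of pass/fail on PVT i versus PVT j
        table = np.array(
            [[np.sum(~pvt_fail[:, i] & ~pvt_fail[:, j]),
              np.sum(~pvt_fail[:, i] &  pvt_fail[:, j])],
             [np.sum( pvt_fail[:, i] & ~pvt_fail[:, j]),
              np.sum( pvt_fail[:, i] &  pvt_fail[:, j])]])
        expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True) / table.sum()
        if (expected < 10).any():          # small expected counts: Fisher's exact test
            _, p = fisher_exact(table)
        else:
            _, p, _, _ = chi2_contingency(table, correction=False)
        print(f"{labels[i]} x {labels[j]}: p = {p:.3f}")
```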

Secondly, we aimed to document raw score differences between the performance valid and performance invalid groups on each of the VSVT indexes described above. Two effect sizes were reported to facilitate interpretation. Cohen's d measures the mean difference between the performance valid and performance invalid groups relative to the pooled standard deviation of both groups. In contrast, Glass's delta measures the mean difference relative to the standard deviation of the performance valid group only. Because the performance valid group's scores are likely better estimates of population parameters, being free of the extraneous influences present in the performance invalid group, Glass's delta is the more appropriate effect size in this context. Cohen's d is reported to help synthesize findings across studies.
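For reference, minimal sketches of the two effect sizes, computed from the two groups' raw scores (NumPy assumed):

```python
import numpy as np

def cohens_d(valid: np.ndarray, invalid: np.ndarray) -> float:
    """Mean difference scaled by the pooled standard deviation of both groups."""
    n1, n2 = len(valid), len(invalid)
    pooled_sd = np.sqrt(((n1 - 1) * valid.var(ddof=1) +
                         (n2 - 1) * invalid.var(ddof=1)) / (n1 + n2 - 2))
    return abs(valid.mean() - invalid.mean()) / pooled_sd

def glass_delta(valid: np.ndarray, invalid: np.ndarray) -> float:
    """Mean difference scaled by the performance valid group's SD only."""
    return abs(valid.mean() - invalid.mean()) / valid.std(ddof=1)
```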

Our third goal was to investigate the classification accuracy of each of the VSVT indexes described above. We sought to document cutoffs with acceptable classification accuracy in the current sample and to explore the classification accuracy of previously recommended cutoffs. VSVT cutoffs were deemed acceptable if they maintained SP of at least 0.90 (Larrabee, 2012; Victor et al., 2009). When deciding between cutoffs with equivalent SP, cutoffs with higher SN were preferred. To provide a fuller picture of the classification accuracy of specific VSVT cutoffs, positive predictive values (PPV) and negative predictive values (NPV) at specific base rates of performance invalidity and positive likelihood ratios (LR+) and negative likelihood ratios (LR−) were also computed. One shortcoming of PPV and NPV is that they are influenced by the base rate of performance invalidity. In contrast, LR+ and LR− are independent of the base rate of performance invalidity and provide an indication of how much a PVT finding will raise or lower the probability that an examinee is exhibiting performance invalidity, thus making them especially informative. An LR+ of >10 is said to have a large effect on increasing the post-test probability that an examinee is exhibiting performance invalidity, an LR+ from 5 to 10 a moderate effect, and an LR+ from 2 to 5 a small effect. Alternatively, an LR− of <0.1 has a large effect on ruling out that an examinee is exhibiting performance invalidity, an LR− of 0.1–0.2 a moderate effect, and an LR− of 0.2–0.5 a small effect (Grimes & Schulz, 2005; Hayden & Brown, 1999). Lastly, an odds ratio at each cutoff was calculated as a measure of effect size.
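The classification statistics just described are all derivable from a 2 × 2 table of cutoff decisions against criterion group membership. The sketch below (a minimal illustration, not the study's analysis code) computes each statistic, recomputing PPV and NPV at an assumed base rate rather than at the sample's own prevalence:

```python
def classification_stats(tp: int, fn: int, fp: int, tn: int, base_rate: float) -> dict:
    """tp/fn: performance invalid examinees flagged/not flagged by the cutoff;
    fp/tn: performance valid examinees flagged/not flagged."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    lr_pos = sn / (1 - sp)            # base-rate independent
    lr_neg = (1 - sn) / sp
    odds_ratio = (tp * tn) / (fp * fn)
    # Predictive values at an assumed base rate of performance invalidity:
    ppv = sn * base_rate / (sn * base_rate + (1 - sp) * (1 - base_rate))
    npv = sp * (1 - base_rate) / (sp * (1 - base_rate) + (1 - sn) * base_rate)
    return dict(SN=sn, SP=sp, PPV=ppv, NPV=npv,
                LR_pos=lr_pos, LR_neg=lr_neg, OR=odds_ratio)

# Counts back-calculated from the reported group sizes and Table 4's Hard < 18
# row (36/53 invalid and 4/39 valid flagged) reproduce its entries:
# LR+ = 6.62, OR = 18.53, PPV = 0.74 and NPV = 0.87 at a 30% base rate.
print(classification_stats(tp=36, fn=17, fp=4, tn=35, base_rate=0.30))
```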

Fourthly, we wanted to establish the best approach for using the multiple VSVT cutoffs that achieved acceptable classification accuracy in the current sample. There were three chief questions for this goal. First, given that multiple cutoffs are available, what is the result of using all of them together? Secondly, which of these cutoffs should be used together? Thirdly, what is the result of using in conjunction only those VSVT cutoffs that should be used together? To answer the first question, we evaluated the classification accuracy of using all acceptable VSVT cutoffs simultaneously (see the sketch below). For the second question, we determined which cutoffs should be used together by conducting χ2 tests among all pairs of VSVT cutoffs in the entire sample to establish their independence. As mentioned above, establishing independence ensures that performance on one cutoff is not redundant with another. Finally, for the third question, we evaluated the classification accuracy of a variety of pairs of independent VSVT cutoffs.
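A sketch of the multiple-cutoff analysis for the first question: given a boolean matrix of cutoff failures and the criterion grouping, it reports SN and SP at each "fail ≥ k cutoffs" threshold. Variable names and data layout are hypothetical.

```python
import numpy as np

def multi_cutoff_accuracy(cutoff_fail: np.ndarray, invalid: np.ndarray) -> None:
    """cutoff_fail: examinees x cutoffs boolean matrix of VSVT cutoff failures;
    invalid: boolean criterion-group vector (True = performance invalid)."""
    for k in range(1, cutoff_fail.shape[1] + 1):
        flagged = cutoff_fail.sum(axis=1) >= k
        sn = flagged[invalid].mean()       # hit rate in the invalid group
        sp = (~flagged[~invalid]).mean()   # correct rejections in the valid group
        print(f"fail >= {k} cutoffs: SN = {sn:.2f}, SP = {sp:.2f}")
```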

Results

Demographics

In the overall sample, the average age was 45.00 years (SD = 12.42) and the average education was 13.86 years (SD = 2.45). There were 49 women (53%), and 85 examinees were right-handed (92%). Overall, 59 of 91 examinees failed the WMT (64%), 46 of 91 failed the TOMM (51%), 34 of 91 failed Category Test Subtest 7 errors (38%), 26 of 92 failed RDS (28%), and 15 of 89 failed Finger Tapping both hands combined (17%). Table 1 presents demographic variables after dividing the sample into performance valid (n = 39) and performance invalid (n = 53) groups on the basis of these criterion PVTs. No significant group differences emerged for age, t(90) = 1.51, p = .13, education, t(90) = 1.17, p = .34, or gender, χ2(1, n = 92) = 1.91, p = .17. There were more left-handed examinees in the performance invalid group than in the performance valid group, p = .02.

Table 1.

Summary of demographic variables for the performance valid and performance invalid groups

            Performance valid (n = 39)   Performance invalid (n = 53)   Test                    p-value
            Mean (SD)                    Mean (SD)
Age         42.77 (11.54)                46.64 (12.89)                  t(90) = 1.51            .13
Education   14.21 (2.39)                 13.60 (2.49)                   t(90) = 1.17            .34
Gender                                                                  χ2(1, n = 92) = 1.91    .17
 Male       22 (24%)                     21 (23%)
 Female     17 (18%)                     32 (35%)
Handedness                                                                                      .02a,*
 Right      39 (42%)                     45 (49%)
 Left       0 (0%)                       7 (8%)

Notes: Age and Education were run as t-tests; Gender and Handedness were run as χ2 statistics.

SD = standard deviation.

aExpected cell counts were <10; therefore, Fisher's exact test was implemented for this analysis.

*Statistically significant.

Performance Validity Test Independence

Table 2 presents χ2 tests for all pairs of criterion PVTs among examinees who failed ≥2 PVTs (n = 53). No pairwise combination of PVTs resulted in a significant χ2 test. This provides quantitative support that the criterion PVTs are independent of one another and that the formation of our criterion groups is valid.

Table 2.

The χ2 tests of independence among criterion performance validity tests

                 RDS          Finger tapping  Category Test 7  WMT          TOMM
                 Test   p     Test   p        Test   p         Test   p     Test   p
RDS                           0.241  .624     0.047  .828      a      .494  a      1.00
Finger tapping                                a      .086      a      1.00  a      1.00
Category Test 7                                                a      .497  a      .118
WMT                                                                         a      1.00
TOMM

Notes: RDS = Reliable Digit Span (Greiffenstein et al., 1994; Heinly et al., 2005); Finger Tapping = mean number of taps from both hands combined on the Finger Tapping Test (Larrabee, 2003); Category Test 7 = total errors on Subtest 7 of the Category Test (Dicarlo et al., 2000); WMT = Word Memory Test (Green, 2003); TOMM = Test of Memory Malingering (Tombaugh, 1996).

aExpected cell counts were <10; therefore, Fisher's exact test was implemented for these analyses.

Victoria Symptom Validity Test Raw Score Differences

Table 3 presents the mean raw score differences on relevant VSVT indexes across the performance valid and performance invalid groups. A Bonferroni corrected α-level of 0.0063 was implemented due to the use of eight comparisons. Significant differences across the performance valid and performance invalid groups emerged for all indexes under investigation with the exception of Easy items correct. Glass's delta effect sizes for all indexes were large, with the largest effect sizes observed on Hard items correct, Total items correct, Easy–Hard items correct, and average items correct in each block. Cohen's d measures of effect size were generally smaller, but were still consistently in the moderate–large effect size range. Hard items correct, Total items correct, Easy–Hard items correct, and average items correct in each block again resulted in the largest effect sizes when calculated using Cohen's d.

Table 3.

Victoria Symptom Validity Test raw score differences across the performance valid and performance invalid groups

                                Performance valid (n = 39)  Performance invalid (n = 53)  t-test  p-value  Glass's delta  Cohen's d
                                Mean (SD)                   Mean (SD)
Easy items correct              23.69 (0.89)                22.32 (3.41)                  2.80    .0068    1.54           0.55
Hard items correct              21.74 (2.87)                13.55 (6.50)                  8.16    <.0001*  2.85           1.63
Total items correct             45.69 (3.28)                35.87 (8.80)                  7.39    <.0001*  2.99           1.48
Easy–Hard items correct         1.95 (2.85)                 8.77 (5.52)                   7.61    <.0001*  2.39           1.55
Items correct in each block     7.57 (0.53)                 5.98 (1.47)                   7.30    <.0001*  3.00           1.44
Response latency Easy items     2.39 (1.50)                 3.53 (1.93)                   3.11    .0022*   0.76           0.66
Response latency Hard items     3.93 (1.92)                 7.39 (6.09)                   3.92    .0002*   1.80           0.77
Response latency Hard items SD  1.89 (1.46)                 4.89 (7.32)                   2.89    .0052*   2.06           0.57

Notes: df = 90 for all t-tests.

SD = standard deviation.

*Statistically significant at a Bonferroni-corrected α-level of 0.0063.

Classification Accuracy of Victoria Symptom Validity Test Cutoffs

Classification accuracy analyses first explored VSVT cutoffs based on items correct scores. Table 4 provides classification accuracy statistics for VSVT Easy items correct, VSVT Hard items correct, and VSVT Total items correct, and Table 5 presents classification accuracy statistics on Easy–Hard items correct and items correct consistency scores. Adequate SP was achieved at cutoffs of <23 Easy items correct, <18 Hard items correct, <41 Total items correct, an Easy–Hard items correct score of >6, and an items correct consistency score of <5 on any one VSVT block. SN values were considerably lower for <23 Easy items correct (0.32) than for <18 Hard items correct (0.68), <41 Total items correct (0.68), Easy–Hard items correct >6 (0.64), and <5 on any one block (0.70). For these latter four cutoffs, PPV was strong (≥0.73) at higher base rates of performance invalidity, while NPV remained strong (≥0.71) across all prevalence estimates of performance invalidity, suggesting that these cutoffs may be especially accurate in high performance invalidity base rate contexts. LR+ values for these four recommended cutoffs demonstrated that they had a moderate effect on the post-test probability of performance invalidity, while their LR− values indicated that they had small effects on ruling out performance invalidity. In contrast, for <23 Easy items correct, PPV was strong only in high performance invalidity base rate contexts and NPV was strong only at low performance invalidity base rate contexts. The LR+ for <23 Easy items correct indicated a moderate effect of this cutoff on the post-test probability of performance invalidity; however, the LR− value revealed that this cutoff had little to no effect on ruling out performance invalidity.

Table 4.

Classification accuracy of items correct cutoffs

Columns: VSVT cutoff; Sensitivity (95% CI); Specificity (95% CI); PPV at base rates of 10%, 20%, 30%, 40%, 50%; NPV at base rates of 10%, 20%, 30%, 40%, 50%; LR+ (95% CI); LR− (95% CI); Odds ratio (95% CI).
Easy items correct 
 Easy < 24 0.45 (0.33–0.59) 0.82 (0.67–0.91) 0.22 0.39 0.52 0.63 0.72 0.93 0.86 0.78 0.69 0.60 2.52 (1.21–5.25) 0.67 (0.50–0.89) 3.78 (1.42–10.09) 
Easy < 23 0.32 (0.21–0.45) 0.95 (0.83–0.99) 0.41 0.61 0.73 0.81 0.86 0.93 0.85 0.77 0.68 0.58 6.26 (1.53–25.51) 0.72 (0.59–0.87) 8.74 (1.88–40.56) 
 Easy < 22 0.23 (0.13–0.36) 0.97 (0.87–1.00) 0.50 0.69 0.79 0.85 0.90 0.92 0.83 0.75 0.65 0.56 8.83 (1.20–65.09) 0.79 (0.68–0.93) 11.12 (1.38–89.67) 
 Easy < 21 0.13 (0.07–0.25) 0.97 (0.87–1.00) 0.36 0.56 0.69 0.77 0.84 0.91 0.82 0.72 0.63 0.53 5.15 (0.66–40.18) 0.89 (0.79–1.0) 5.78 (0.68–49.09) 
Hard items correct 
 Hard < 22 0.83 (0.71–0.91) 0.67 (0.51–0.79) 0.22 0.38 0.52 0.62 0.71 0.97 0.94 0.90 0.85 0.80 2.49 (1.57–3.95) 0.26 (0.14–0.48) 9.78 (3.68–26.01) 
 Hard < 21 0.79 (0.67–0.88) 0.77 (0.62–0.87) 0.28 0.46 0.60 0.70 0.77 0.97 0.94 0.90 0.85 0.79 3.43 (1.91–6.19) 0.27 (0.16–0.47) 12.73 (4.69–34.52) 
 Hard < 20 0.74 (0.60–0.84) 0.85 (0.70–0.93) 0.35 0.54 0.67 0.76 0.83 0.97 0.93 0.88 0.83 0.76 4.78 (2.25–10.16) 0.31 (0.20–0.50) 15.32 (5.29–44.35) 
 Hard < 19 0.72 (0.58–0.82) 0.85 (0.70–0.93) 0.34 0.54 0.67 0.76 0.82 0.96 0.92 0.87 0.82 0.75 4.66 (2.19–9.92) 0.33 (0.21–0.52) 13.93 (4.85–40.03) 
Hard < 18 0.68 (0.55–0.79) 0.90 (0.76–0.96) 0.42 0.62 0.74 0.82 0.87 0.96 0.92 0.87 0.81 0.74 6.62 (2.57–17.07) 0.36 (0.24–0.54) 18.53 (5.67–60.57) 
 Hard < 17 0.66 (0.53–0.77) 0.92 (0.80–0.97) 0.49 0.68 0.79 0.85 0.90 0.96 0.92 0.86 0.80 0.73 8.59 (2.85–25.90) 0.37 (0.25–0.54) 23.33 (6.31–86.29) 
 Hard < 16 0.64 (0.51–0.76) 0.92 (0.80–0.97) 0.48 0.68 0.78 0.85 0.89 0.96 0.91 0.86 0.79 0.72 8.34 (2.76–25.20) 0.39 (0.27–0.56) 21.47 (5.83–79.17) 
Total items correct 
 Total < 44 0.74 (0.60–0.84) 0.82 (0.67–0.91) 0.31 0.51 0.64 0.73 0.80 0.97 0.93 0.88 0.82 0.76 4.10 (2.06–8.18) 0.32 (0.20–0.52) 12.74 (4.59–35.34) 
 Total < 43 0.72 (0.58–0.82) 0.82 (0.67–0.91) 0.31 0.50 0.63 0.73 0.80 0.96 0.92 0.87 0.81 0.74 4.00 (2.00–7.98) 0.35 (0.22–0.54) 11.58 (4.21–31.89) 
 Total < 42 0.68 (0.55–0.79) 0.87 (0.73–0.94) 0.37 0.57 0.69 0.78 0.84 0.96 0.92 0.86 0.80 0.73 5.30 (2.29–12.26) 0.37 (0.24–0.55) 14.40 (4.79–43.34) 
Total < 41 0.68 (0.55–0.79) 0.90 (0.76–0.96) 0.42 0.62 0.74 0.82 0.87 0.96 0.92 0.87 0.81 0.74 6.62 (2.57–17.07) 0.36 (0.24–0.54) 18.53 (5.67–60.57) 
 Total < 40 0.66 (0.53–0.77) 0.90 (0.76–0.96) 0.42 0.62 0.73 0.81 0.87 0.96 0.91 0.86 0.80 0.73 6.44 (2.50–16.62) 0.38 (0.26–0.56) 17.01 (5.23–55.39) 
 Total < 39 0.62 (0.49–0.74) 0.95 (0.83–0.99) 0.57 0.75 0.84 0.89 0.92 0.96 0.91 0.85 0.79 0.72 12.14 (3.10–47.59) 0.40 (0.28–0.57) 30.53 (6.63–140.61) 

Notes: Cutoffs set flush left (unindented) are those with optimal classification accuracy. VSVT = Victoria Symptom Validity Test; PPV = positive predictive value; NPV = negative predictive value; CI = confidence interval.

Table 5.

Classification accuracy of Easy–Hard items correct and items correct consistency cutoffs

Columns: VSVT cutoff; Sensitivity (95% CI); Specificity (95% CI); PPV at base rates of 10%, 20%, 30%, 40%, 50%; NPV at base rates of 10%, 20%, 30%, 40%, 50%; LR+ (95% CI); LR− (95% CI); Odds ratio (95% CI).
Easy–Hard items correct 
 Easy–Hard > 4 0.72 (0.58–0.82) 0.85 (0.70–0.93) 0.34 0.54 0.67 0.76 0.82 0.96 0.92 0.87 0.82 0.75 4.66 (2.19–9.92) 0.33 (0.21–0.52) 13.93 (4.85–40.03) 
 Easy–Hard > 5 0.68 (0.55–0.79) 0.85 (0.70–0.93) 0.33 0.52 0.65 0.75 0.82 0.96 0.91 0.86 0.80 0.73 4.42 (2.07–9.43) 0.38 (0.25–0.57) 11.65 (4.10–33.07) 
Easy–Hard > 6 0.64 (0.51–0.76) 0.90 (0.76–0.96) 0.41 0.61 0.73 0.81 0.86 0.96 0.91 0.85 0.79 0.71 6.26 (2.42–16.17) 0.40 (0.27–0.58) 15.66 (4.83–50.80) 
 Easy–Hard > 7 0.62 (0.49–0.74) 0.92 (0.80–0.97) 0.47 0.67 0.78 0.84 0.89 0.96 0.91 0.85 0.79 0.71 8.09 (2.68–24.49) 0.41 (0.29–0.58) 19.80 (5.38–72.81) 
Items correct consistency score 
 <7 any block 0.87 (0.75–0.93) 0.69 (0.54–0.81) 0.24 0.41 0.55 0.65 0.74 0.98 0.95 0.92 0.89 0.84 2.82 (1.74–4.57) 0.19 (0.09–0.39) 14.79 (5.19–42.10) 
 <6 any block 0.77 (0.64–0.87) 0.77 (0.62–0.87) 0.27 0.46 0.59 0.69 0.77 0.97 0.93 0.89 0.84 0.77 3.35 (1.86–6.06) 0.29 (0.17–0.50) 11.39 (4.26–30.47) 
<5 any block 0.70 (0.56–0.80) 0.92 (0.80–0.97) 0.50 0.69 0.80 0.86 0.90 0.96 0.92 0.88 0.82 0.75 9.08 (3.02–27.31) 0.33 (0.22–0.50) 27.75 (7.45–103.44) 
 <4 any block 0.58 (0.45–0.71) 0.97 (0.87–1.00) 0.72 0.85 0.91 0.94 0.96 0.95 0.90 0.85 0.78 0.70 22.81 (3.25–160) 0.43 (0.31–0.59) 53.55 (6.83–419.86) 

Notes: Cutoffs set flush left (unindented) are those with optimal classification accuracy. VSVT = Victoria Symptom Validity Test; PPV = positive predictive value; NPV = negative predictive value; CI = confidence interval.

Table 6 provides classification accuracy statistics for response latency-based cutoffs and response latency SD-based cutoffs. Adequate SP was achieved at response latencies of ≥4 s on Easy items (0.92) and ≥7 s on Hard items (0.95), and on response latency SD times of ≥4.5 s on Hard items (0.90). SN for each of these cutoffs was relatively low, however, ranging from 0.32 to 0.40. For each of these cutoffs, PPV was strong only at higher base rates of performance invalidity and NPV was strong only at lower base rates, suggesting that these cutoffs may not be adequate. LR− values indicated little to no effect of these cutoffs in ruling out performance invalidity, while LR+ values were in the small to moderate range for changing the post-test probability of performance invalidity.

Table 6.

Classification accuracy of response latency cutoffs

Columns: VSVT cutoff; Sensitivity (95% CI); Specificity (95% CI); PPV at base rates of 10%, 20%, 30%, 40%, 50%; NPV at base rates of 10%, 20%, 30%, 40%, 50%; LR+ (95% CI); LR− (95% CI); Odds ratio (95% CI).
Easy items response latency 
 Easy ≥ 3 0.49 (0.36–0.62) 0.77 (0.61–0.87) 0.19 0.35 0.48 0.59 0.68 0.93 0.86 0.78 0.69 0.60 2.13 (1.13–4.01) 0.66 (0.48–0.91) 3.21 (1.28–8.05) 
 Easy ≥ 3.5 0.43 (0.31–0.57) 0.87 (0.73–0.94) 0.27 0.46 0.59 0.69 0.77 0.93 0.86 0.78 0.70 0.61 3.39 (1.41–8.11) 0.65 (0.50–0.85) 5.21 (1.76–15.42) 
Easy ≥ 4 0.38 (0.26–0.51) 0.92 (0.80–0.97) 0.35 0.55 0.68 0.77 0.83 0.93 0.86 0.78 0.69 0.60 4.91 (1.57–15.35) 0.68 (0.54–0.85) 7.27 (1.98–26.74) 
 Easy ≥ 4.5 0.26 (0.16–0.40) 0.95 (0.84–0.99) 0.36 0.56 0.69 0.77 0.84 0.92 0.84 0.75 0.66 0.56 5.28 (1.27–21.94) 0.78 (0.65–0.92) 6.82 (1.45–32.05) 
 Easy ≥ 5 0.23 (0.13–0.36) 0.97 (0.87–1.00) 0.50 0.69 0.79 0.85 0.90 0.92 0.83 0.75 0.65 0.56 8.83 (1.20–65.09) 0.79 (0.68–0.93) 11.12 (1.38–89.67) 
Hard items response latency 
 Hard ≥ 4 0.64 (0.51–0.76) 0.67 (0.51–0.79) 0.18 0.32 0.45 0.56 0.66 0.94 0.88 0.81 0.74 0.65 1.93 (1.18–3.13) 0.54 (0.35–0.82) 3.58 (1.50–8.55) 
 Hard ≥ 4.5 0.57 (0.43–0.69) 0.69 (0.54–0.81) 0.17 0.32 0.44 0.55 0.65 0.93 0.86 0.79 0.71 0.61 1.84 (1.09–3.11) 0.63 (0.43–0.91) 2.94 (1.23–7.01) 
 Hard ≥ 5 0.49 (0.36–0.62) 0.79 (0.64–0.89) 0.21 0.37 0.51 0.61 0.71 0.93 0.86 0.78 0.70 0.61 2.39 (1.22–4.70) 0.64 (0.47–0.87) 3.73 (1.45–9.61) 
 Hard ≥ 5.5 0.45 (0.33–0.59) 0.79 (0.64–0.89) 0.20 0.36 0.49 0.60 0.69 0.93 0.85 0.77 0.69 0.59 2.21 (1.11–4.38) 0.69 (0.51–0.92) 3.21 (1.24–8.27) 
 Hard ≥ 6 0.42 (0.29–0.55) 0.85 (0.70–0.93) 0.23 0.40 0.54 0.64 0.73 0.93 0.85 0.77 0.68 0.59 2.70 (1.21–6.02) 0.69 (0.53–0.90) 3.90 (1.40–10.90) 
 Hard ≥ 6.5 0.40 (0.28–0.53) 0.90 (0.76–0.96) 0.30 0.49 0.62 0.72 0.79 0.93 0.86 0.78 0.69 0.60 3.86 (1.44–10.36) 0.67 (0.53–0.86) 5.74 (1.78–18.53) 
Hard ≥ 7 0.40 (0.28–0.53) 0.95 (0.83–0.99) 0.46 0.66 0.77 0.84 0.89 0.93 0.86 0.79 0.70 0.61 7.73 (1.92–31.03) 0.64 (0.51–0.80) 12.14 (2.64–55.82) 
Hard items response latency SD 
 Hard SD ≥ 4 0.34 (0.23–0.47) 0.87 (0.73–0.94) 0.23 0.40 0.53 0.64 0.73 0.92 0.84 0.75 0.66 0.57 2.65 (1.08–6.52) 0.76 (0.60–0.95) 3.50 (1.17–10.48) 
Hard SD ≥ 4.5 0.32 (0.21–0.45) 0.90 (0.76–0.96) 0.26 0.44 0.57 0.68 0.76 0.92 0.84 0.76 0.66 0.57 3.13 (1.14–8.57) 0.76 (0.61–0.94) 4.13 (1.26–13.51) 
 Hard SD ≥ 5 0.30 (0.20–0.44) 0.95 (0.83–0.99) 0.40 0.60 0.72 0.80 0.85 0.92 0.84 0.76 0.67 0.58 5.89 (1.44–24.13) 0.74 (0.61–0.89) 8.00 (1.72–37.28) 
 Hard SD ≥ 5.5 0.25 (0.15–0.38) 0.97 (0.87–1.00) 0.52 0.71 0.80 0.86 0.91 0.92 0.84 0.75 0.66 0.56 9.57 (1.31–70.08) 0.78 (0.66–0.91) 12.35 (1.54–99.04) 

Notes: Cutoffs set flush left (unindented) are those with optimal classification accuracy. VSVT = Victoria Symptom Validity Test; PPV = positive predictive value; NPV = negative predictive value; CI = confidence interval; SD = standard deviation.

Using Multiple Victoria Symptom Validity Test Cutoffs

Initial analyses evaluated the impact of using all eight VSVT cutoffs recommended in the preceding analyses. At a classification threshold of failing ≥1 VSVT cutoff to be classified as performance invalid, using all cutoffs resulted in a dramatic decrease in SP (0.74) and only a modest increase in SN (0.75) relative to the individual cutoffs. Adequate SP was not achieved until examinees were required to fail ≥4 VSVT cutoffs to be classified as performance invalid (SP = 0.97). However, this strategy reduced SN (0.64) relative to the stronger VSVT cutoffs recommended above used on their own.

Subsequent analyses focused on determining which cutoffs should be used in conjunction by evaluating which pairs of VSVT cutoffs are independent of one another. Table 7 presents χ2 tests of all pairs of the eight recommended VSVT cutoffs. A Bonferroni-corrected α-level of 0.0014 was implemented for the 28 χ2 tests. Only 5 of the 28 possible pairs of VSVT cutoffs were deemed independent at this α-level. Table 8 presents the SN and SP values of these five pairs of independent VSVT cutoffs at classification thresholds of failing ≥1 and ≥2 VSVT cutoffs to be classified as performance invalid. At a classification threshold of failing ≥1 VSVT cutoff, all but one pair of independent VSVT cutoffs resulted in unacceptable levels of SP. The one pair with acceptable SP (Hard items response latency ≥7 s and Easy items correct <23) yielded SP of 0.90 but SN of only 0.53. In contrast, all pairs of independent VSVT cutoffs resulted in adequate SP at a classification threshold of failing ≥2 VSVT cutoffs, but SN was exceedingly low with these pairs, ranging from 0.19 to 0.32.

Table 7.

The χ2 tests of independence for pairs of recommended Victoria Symptom Validity Test (VSVT) cutoffs

Cutoff   2        3        4        5        6        7        8
1        18.32*   23.04*   12.11*   23.12*   a,*      a        a,*
2                 80.20*   80.29*   60.31*   9.67     17.04*   13.64*
3                          72.82*   60.31*   9.67     17.04*   10.19
4                                   59.03*   5.98     15.30*   11.86*
5                                            10.80*   18.46*   14.08*
6                                                     35.73*   28.16*
7                                                              49.38*

Notes: 1 = VSVT Easy items correct <23; 2 = VSVT Hard items correct <18; 3 = VSVT Total items correct <41; 4 = VSVT Easy–Hard items correct >6; 5 = VSVT <5 items correct on any block; 6 = VSVT Easy items response latency ≥4 s; 7 = VSVT Hard items response latency ≥7 s; 8 = VSVT Hard items response latency SD ≥4.5 s.

aExpected cell counts were <10; therefore, Fisher's exact test was run for these tests.

*Statistically significant at a Bonferroni-corrected α-level of 0.0014.

Table 8.

Sensitivity (SN) and specificity (SP) values of pairs of independent Victoria Symptom Validity Test (VSVT) cutoffs

Pairs of independent VSVT cutoffs                                    Fail ≥1 cutoff   Fail ≥2 cutoffs
                                                                     SN     SP        SN     SP
Easy items response latency ≥4 s and Hard items correct <18          0.74   0.82      0.32   1.00
Easy items response latency ≥4 s and Total items correct <41         0.74   0.82      0.32   1.00
Easy items response latency ≥4 s and Easy–Hard items correct >6      0.74   0.82      0.28   1.00
Hard items response latency ≥7 s and Easy items correct <23          0.53   0.90      0.19   1.00
Hard items response latency SD ≥4.5 s and Total items correct <41    0.70   0.79      0.30   1.00

Discussion

The current study evaluated the classification accuracy of previously recommended VSVT cutoffs in a medico-legal sample of mTBI examinees. Raw score differences between the performance valid and performance invalid groups were greatest on indexes of Hard items correct, Total items correct, Easy–Hard items correct, and average items correct in each block. Cutoffs based on these indexes also demonstrated the strongest classification accuracy, with cutoffs of <18 Hard items correct, <41 Total items correct, and an Easy–Hard items correct score >6 achieving adequate SP for clinical implementation. In addition, a cutoff based on consistency of items correct scores (<5 on any one block) also demonstrated strong classification accuracy and acceptable SP. Evaluation of the impact of using combinations of VSVT cutoffs revealed that this approach either substantially decreased SP or substantially reduced SN. Instead, the current results suggest that only a single, strong cutoff from the VSVT should be used in clinical practice.

The current findings converge with previous research in some respects and diverge in others. Consistent with Jones (2013a), cutoffs based on Hard items correct, Total items correct, and Easy–Hard items correct demonstrated stronger classification accuracy than cutoffs based on Easy items correct, Easy items response latency, and Hard items response latency. Moreover, cutoffs of up to <18 Hard items correct (Grote et al., 2000; Jones, 2013a; Keary et al., 2013), up to <23 Easy items correct (Jones, 2013a), up to <41 Total items correct (Macciochi et al., 2006), Easy items response latency ≥4 s (Jones, 2013a), and Hard items response latency >6.89 s (Macciochi et al., 2006) all demonstrated adequate SP in the current sample.

Inconsistencies, however, also emerged. Specifically, Jones (2013a) and Grote and colleagues (2000) reported that more stringent cutoffs of <20 and <21 Hard items correct, respectively, yielded adequate SP, whereas in the current study these cutoffs produced unacceptably low SP (0.85 and 0.77, respectively). It should be noted, however, that several other studies (Keary et al., 2013; Loring et al., 2005, 2007) have also raised questions about a Hard items correct cutoff of <21, documenting troublingly high false-positive rates across a range of diagnostic groups. Previous work likewise documented adequate SP for cutoffs of Easy–Hard items correct >4 (Jones, 2013a), Total items correct <44 (Jones, 2013a; Macciochi et al., 2006), and an items correct consistency score of <6 on any VSVT block (Grote et al., 2000). In the current sample, however, each of these cutoffs resulted in inadequate SP, ranging from 0.77 to 0.85.

Cutoffs based on response latency and response latency SD in the current study were also generally discrepant from those of previous studies. Previous research recommended Easy items response latency cutoffs of ≥3 s (Grote et al., 2000), ≥4 s (Jones, 2013a), and >4.66 s (Macciochi et al., 2006); Hard items response latency cutoffs of ≥4 s (Grote et al., 2000), ≥5 s (Jones, 2013a), and >6.89 s (Macciochi et al., 2006); and a Hard items response latency SD cutoff of ≥1.9 s (Grote et al., 2000). In the current study, most of these cutoffs resulted in inadequate SP; the exceptions were the Easy items response latency cutoff of ≥4 s (Jones, 2013a) and the Hard items response latency cutoff of >6.89 s (Macciochi et al., 2006). In the current sample, the strongest classification accuracy with adequate SP emerged for cutoffs of Easy items response latency ≥4 s, Hard items response latency ≥7 s, and Hard items response latency SD ≥4.5 s.

Several factors may account for these inconsistencies. First, the current sample included only examinees meeting diagnostic criteria for mTBI, whereas no previous study has evaluated VSVT cutoffs in an exclusively mTBI sample. This difference may be especially important in accounting for discrepancies with studies that included examinees with more severe cognitive impairment, such as Loring and colleagues (2005, 2007) and Keary and colleagues (2013).

Second, with the exception of Jones (2013a), no prior study has used a known groups design to validate VSVT cutoffs. Instead, previous studies have implemented simulated malingering designs (Slick et al., 1996; Strauss et al., 2002), differential prevalence designs (Grote et al., 2000; Keary et al., 2013; Loring et al., 2005, 2007), or a combination of the two (Macciochi et al., 2006). As noted previously, each of these designs has unique limitations that may bias results. Simulated malingering designs often overestimate classification accuracy, especially when non-clinical examinees are included in samples; this may account for the stronger classification accuracy documented by Slick and colleagues (1996) and Strauss and colleagues (2002) and the more stringent cutoffs proposed by Macciochi and colleagues (2006). Differential prevalence designs, in turn, are limited by their use of litigation status as an indicator of risk for performance invalidity and by their omission of independent measures to objectively classify performance invalidity. Grote (2007) has argued that this failure to screen performance invalid examinees out of clinical samples may explain the much higher failure rates on VSVT cutoffs documented by Loring and colleagues (2005, 2007).

Third, in the one prior study that used a known groups design to validate VSVT cutoffs (Jones, 2013a), key methodological differences may have affected results. Specifically, in an attempt to form purer performance validity criterion groups, Jones (2013a) excluded examinees who failed a single criterion PVT from analyses. By excluding these more ambiguous, harder-to-classify examinees, however, Jones (2013a) likely overestimated the classification accuracy of VSVT cutoffs. Consistent with this claim, when examinees who failed a single PVT are excluded from our sample, the recommended VSVT cutoffs on items correct indexes shift to Easy items correct <23 (SN = 0.32, SP = 0.95), Hard items correct <20 (SN = 0.74, SP = 0.90), Total items correct <42 (SN = 0.68, SP = 0.90), and Easy–Hard items correct >4 (SN = 0.77, SP = 0.90). With the exception of Total items correct, these cutoffs are consistent with those recommended by Jones (2013a).

In contrast, the current study retained examinees who failed a single criterion PVT and took the deliberately conservative approach of classifying them as performance valid. This approach, however, may have underestimated the classification accuracy of VSVT cutoffs, as some examinees who failed a single PVT may in fact have been performing invalidly. Given that no true "gold standard" of performance invalidity exists for known groups designs, we believe these two approaches should be interpreted in conjunction: the results of Jones (2013a) represent the ceiling on the classification accuracy of VSVT cutoffs, and the current findings represent the floor. Given that greater emphasis has been placed on maintaining high SP than on SN (Greve & Bianchini, 2004) and that SN can be substantially improved through the use of multiple PVTs (Iverson & Franzen, 1996; Larrabee, 2003; Meyers & Volbrecht, 2003; Victor et al., 2009), we believe the more conservative cutoffs recommended by the current study should guide clinical practice with examinees diagnosed with mTBI. Serious caution should nevertheless be exercised in interpreting the neuropsychological test data of examinees who pass the VSVT under the current study's cutoffs but fail it under those proposed by Jones (2013a).
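The two criterion-grouping approaches can be expressed schematically. The sketch below is hypothetical: the rule that ≥2 criterion PVT failures defines the performance invalid group is assumed for illustration and is not taken from the study's method section.

import numpy as np

def criterion_groups(pvt_failures, exclude_single_failures):
    # >=2 failures -> performance invalid (assumed illustrative rule).
    # Examinees failing exactly one PVT are either dropped (the Jones,
    # 2013a-style "ceiling" approach) or retained as performance valid
    # (the current study's conservative "floor" approach).
    pvt_failures = np.asarray(pvt_failures)
    invalid = pvt_failures >= 2
    if exclude_single_failures:
        keep = pvt_failures != 1  # drop ambiguous single-failure cases
    else:
        keep = np.ones(pvt_failures.shape, dtype=bool)  # retain all
    return invalid[keep], keep

counts = np.array([0, 0, 1, 2, 3, 1, 0, 2])  # failed criterion PVTs per case
for exclude in (True, False):
    invalid, keep = criterion_groups(counts, exclude)
    print(f"exclude singles = {exclude}: n = {keep.sum()}, "
          f"invalid = {invalid.sum()}")

Cutoffs validated under the first grouping will look more accurate because the ambiguous middle of the distribution has been removed; cutoffs validated under the second will look less accurate because some truly invalid performances are counted as valid.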

Findings from the current study suggest that only a single VSVT cutoff should be analyzed. Although multiple cutoffs are available, flexibly analyzing them substantially reduced classification accuracy in the current sample by decreasing either SN or SP. This danger of flexible analysis has previously been recognized for embedded PVTs, where, as with the multiple cutoffs available on the VSVT, numerous performance validity indicators are available and it is left to the clinician's discretion which and how many to analyze (Silk-Eglit et al., 2015a). While the current study and previous research have documented several VSVT cutoffs with strong classification accuracy, considerably fewer cutoffs have garnered converging support across multiple studies and research designs. Of all recommended cutoffs, <18 Hard items correct has received the greatest converging support (Grote et al., 2000; Jones, 2013a; Keary et al., 2013), suggesting that this cutoff alone should be used with mTBI examinees.
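The arithmetic behind this recommendation is straightforward. Even under the charitable assumption that cutoffs are independent and each maintains the minimum acceptable SP of 0.90, consulting several of them and flagging any failure erodes specificity multiplicatively:

# Assumes independent cutoffs, each with specificity 0.90 (FPR 0.10).
for k in (1, 2, 3, 4):
    aggregate_sp = 0.90 ** k  # probability a valid examinee fails no cutoff
    print(f"{k} cutoff(s) consulted: aggregate SP = {aggregate_sp:.2f}")
# 1 -> 0.90, 2 -> 0.81, 3 -> 0.73, 4 -> 0.66

Dependence among real VSVT cutoffs softens this erosion somewhat, but the direction of the effect is the same, consistent with the reduced SP observed when multiple cutoffs were analyzed in the current sample.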

Future research should continue to explore the classification accuracy of VSVT cutoffs in other diagnostic populations. To date, no known groups study has explored the VSVT outside of samples consisting primarily of mTBI examinees. The generalizability of the current findings to other diagnostic groups thus remains unclear. Moreover, some research has suggested that examinees with low full scale IQ (Keary et al., 2013; Loring et al., 2005), poor working memory (Keary et al., 2013), and impaired visual perceptual and verbal initiation abilities (Macciochi et al., 2006) may be especially prone to false-positive classification on the VSVT. Consequently, future research should continue to explore the cognitive abilities that underlie successful VSVT performance and outline profiles of specific neuropsychological impairments that impact the accuracy of the VSVT.

Conflict of Interest

None declared.

Acknowledgements

The authors are grateful to Cecil R. Reynolds, former editor of Archives of Clinical Neuropsychology and current editor of Psychological Assessment, who served as guest action editor. The manuscript underwent blind peer review.

References

Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. Archives of Clinical Neuropsychology, 26, 592–601.

Berthelson, L., Mulchan, S. S., Odland, A. P., Miller, L. J., & Mittenberg, W. (2013). False positive diagnosis of malingering due to the use of multiple effort tests. Brain Injury, 27, 909–916.

Bilder, R. M., Sugar, C. A., & Hellemann, G. S. (2014). Cumulative false positive rates given multiple performance validity tests: Commentary on Davis and Millis (2014) and Larrabee (2014). The Clinical Neuropsychologist, 28, 1212–1223.

Brennan, A. M., & Gouvier, W. D. (2006). Are we honestly studying malingering? A profile and comparison of simulated and suspected malingerers. Applied Neuropsychology, 13, 1–11.

Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. (2005). Symptom validity assessment: Practice issues and medical necessity (NAN Policy & Planning Committee). Archives of Clinical Neuropsychology, 20, 419–426.

Committee on Mild Traumatic Brain Injury, American Congress of Rehabilitation Medicine (ACRM). (1993). Definition of mild traumatic brain injury. Journal of Head Trauma Rehabilitation, 8, 86–87.

Davis, J. J., & Millis, S. R. (2014). Examination of performance validity test failure in relation to number of tests administered. The Clinical Neuropsychologist, 28, 199–214.

Dicarlo, M. A., Gfeller, J. D., & Oliveri, M. V. (2000). Effects of coaching on detecting feigned cognitive impairment with the Category Test. Archives of Clinical Neuropsychology, 15, 399–413.

Fox, D. (2011). Symptom validity test failure indicates invalidity of neuropsychological tests. The Clinical Neuropsychologist, 25, 488–495.

Gervais, R. O., Ben-Porath, Y. S., Wygant, D. B., & Green, P. (2007). Development and validation of a Response Bias Scale (RBS) for the MMPI-2. Assessment, 14, 196–208.

Green, P. (2003). Green's Word Memory Test for Microsoft Windows. Edmonton, Alberta: Green's Publishing.

Green, P. (2005, revised 2003). Manual for the Word Memory Test. Edmonton: Green's Publishing.

Green, P., Flaro, L., Brockhaus, R., & Montijo, J. (2012). Word Memory Test. In C. R. Reynolds & A. M. Horton, Jr. (Eds.), Detection of malingering during head injury litigation (2nd ed., pp. 201–219). New York: Springer.

Green, P., Montijo, J., & Brockhaus, R. (2011). High specificity of the Word Memory Test and Medical Symptom Validity Test in groups with severe verbal memory impairment. Applied Neuropsychology, 18, 86–94.

Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045–1060.

Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224.

Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cutoffs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19, 533–541.

Greve, K. W., Bianchini, K. J., & Doane, B. M. (2006). Classification accuracy of the Test of Memory Malingering in traumatic brain injury: Results of known-groups analysis. Journal of Clinical and Experimental Neuropsychology, 28, 1176–1190.

Greve, K. W., Ord, J., Curtis, K. L., Bianchini, K. J., & Brennan, A. (2008). Detecting malingering in traumatic brain injury and chronic pain: A comparison of three forced-choice symptom validity tests. The Clinical Neuropsychologist, 22, 896–918.

Grimes, D. A., & Schulz, K. F. (2005). Epidemiology 3: Refining clinical diagnosis with likelihood ratios. Lancet, 365, 1500–1505.

Grote, C. L. (2007). Forced-choice recognition tests of malingering. In G. J. Larrabee (Ed.), Assessment of malingered neurocognitive deficits (pp. 44–79). New York, NY: Oxford University Press.

Grote, C. L., Kooker, E. K., Garron, D. C., Nyenhuis, D. L., Smith, C. L., & Mattingly, M. L. (2000). Performance of compensation seeking and non-compensation seeking samples on the Victoria Symptom Validity Test: Cross-validation and extension of a standardization study. Journal of Clinical and Experimental Neuropsychology, 22(6), 709–719.

Haines, M. E., & Norris, M. P. (1995). Detecting the malingering of cognitive deficits: An update. Neuropsychology Review, 5, 125–148.

Hayden, S. R., & Brown, M. D. (1999). Likelihood ratios: A powerful tool for incorporating the results of diagnostic tests into clinical decision making. Annals of Emergency Medicine, 33, 575–580.

Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.

Heinly, M. T., Greve, K. W., Bianchini, K. J., Love, J. M., & Brennan, A. (2005). WAIS digit span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12, 429–444.

Iverson, G. L., & Franzen, M. D. (1996). Using multiple objective memory procedures to detect simulated malingering. Journal of Clinical and Experimental Neuropsychology, 18, 38–51.

Jones, A. (2013a). Victoria Symptom Validity Test: Cutoff scores for psychometrically defined malingering groups in a military sample. The Clinical Neuropsychologist, 27, 1373–1384.

Jones, A. (2013b). Test of Memory Malingering: Cutoff scores for psychometrically defined malingering groups in a military sample. The Clinical Neuropsychologist, 27, 1043–1059.

Keary, T. A., Frazier, T. W., Belzile, C. J., Chapin, J. S., Naugle, R. I., Najm, I. M., et al. (2013). Working memory and intelligence are associated with Victoria Symptom Validity Test hard item performance in patients with intractable epilepsy. Journal of the International Neuropsychological Society, 19, 314–323.

Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.

Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666–679.

Larrabee, G. J. (2012). Performance validity and symptom validity in neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 625–631.

Larrabee, G. J. (2014). False-positive rates associated with the use of multiple performance and symptom validity tests. Archives of Clinical Neuropsychology, 29, 364–373.

Larrabee, G. J., Greiffenstein, M. F., Greve, K. W., & Bianchini, K. J. (2007). Refining diagnostic criteria for malingering. In G. J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 334–371). New York, NY: Oxford University Press.

Lees-Haley, P. R., English, L. T., & Glenn, W. J. (1991). A Fake Bad Scale on the MMPI-2 for personal injury claimants. Psychological Reports, 68, 203–210.

Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment (5th ed.). Oxford: Oxford University Press.

Loring, D. W., Larrabee, G. J., Lee, G. P., & Meador, K. J. (2007). Victoria Symptom Validity Test performance in a heterogeneous clinical sample. The Clinical Neuropsychologist, 21, 522–531.

Loring, D. W., Lee, G. P., & Meador, K. J. (2005). Victoria Symptom Validity Test performance in non-litigating epilepsy surgery candidates. Journal of Clinical and Experimental Neuropsychology, 27, 610–617.

Macciochi, S. N., Seel, R. T., Alderson, A., & Goodsall, R. (2006). Victoria Symptom Validity Test performance in acute severe traumatic brain injury: Implications for test interpretation. Archives of Clinical Neuropsychology, 21, 395–404.

Meyers, J. E., & Volbrecht, M. E. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.

Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1102.

Randolph, C. (1998). Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). San Antonio, TX: Harcourt, The Psychological Corporation.

Reitan, R. M., & Wolfson, D. (1993). The Halstead–Reitan Neuropsychological Test Battery: Theory and clinical interpretation (2nd ed.). Tucson, AZ: Neuropsychology Press.

Rogers, R. (Ed.). (1997). Clinical assessment of malingering and deception (2nd ed.). New York: Guilford Press.

Rogers, R. (Ed.). (2008). Clinical assessment of malingering and deception (3rd ed.). New York: Guilford Press.

Rogers, R., Bagby, R. M., & Dickens, S. E. (1992). Structured Interview of Reported Symptoms: Professional manual. Odessa, FL: Psychological Assessment Resources.

Sharland, M. J., & Gfeller, J. D. (2007). A survey of neuropsychologists' beliefs and practices with respect to the assessment of effort. Archives of Clinical Neuropsychology, 22, 213–223.

Silk-Eglit, G. M., Stenclik, J. H., Miele, A. S., Lynch, J. K., & McCaffrey, R. J. (2015a). Rates of false positive classification resulting from the analysis of additional embedded performance validity measures. Applied Neuropsychology: Adult. Advance online publication. doi:10.1080/23279095.2014.938809

Silk-Eglit, G. M., Stenclik, J. H., Miele, A. S., Lynch, J. K., & McCaffrey, R. J. (2015b). Performance validity classification accuracy of single-, pairwise-, and triple-failure models using the Halstead–Reitan Neuropsychological Battery for Adults. Applied Neuropsychology: Adult. Advance online publication. doi:10.1080/23279095.2014.921167

Silverberg, N. D., Wertheimer, J. C., & Fichtenberg, N. L. (2007). An effort index for the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). The Clinical Neuropsychologist, 21, 841–854.

Slick, D., Hopp, G., Strauss, E., & Thompson, G. B. (1997). Victoria Symptom Validity Test. Odessa, FL: Psychological Assessment Resources.

Slick, D. J., Hopp, G., Strauss, E., & Spellacy, F. J. (1996). Victoria Symptom Validity Test: Efficiency for detecting feigned memory impairment and relationship to neuropsychological tests and MMPI-2 validity scales. Journal of Clinical and Experimental Neuropsychology, 18, 911–922.

Sollman, M. J., & Berry, D. T. (2011). Detection of inadequate effort on neuropsychological testing: A meta-analytic update and extension. Archives of Clinical Neuropsychology, 26, 774–789.

Stenclik, J. H., Miele, A. S., Silk-Eglit, G. M., Lynch, J. K., & McCaffrey, R. J. (2013). Can the sensitivity and specificity of the TOMM be increased with differential cutoff scores? Applied Neuropsychology: Adult, 20, 243–248.

Strauss, E., Slick, D. J., Levy-Bencheton, J., Hunter, M., MacDonald, S. W. S., & Hultsch, D. F. (2002). Intraindividual variability as an indicator of malingering in head injury. Archives of Clinical Neuropsychology, 17, 423–444.

Tombaugh, T. N. (1996). The Test of Memory Malingering (TOMM). North Tonawanda, NY: Multi-Health Systems.

Vickery, C. D., Berry, D. T., Inman, T. H., Harris, M. J., & Orey, S. A. (2001). Detection of inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures. Archives of Clinical Neuropsychology, 16, 45–73.

Victor, T. L., Boone, K. B., Serpa, J. G., Buehler, J., & Ziegler, E. A. (2009). Interpreting the meaning of multiple symptom validity test failure. The Clinical Neuropsychologist, 23, 297–313.