## Abstract

Retrospective review of a consecutive patient series (n = 115) referred for neuropsychological examinations for traumatic brain injury was undertaken to evaluate an embedded measure of symptom validity for the Continuous Visual Memory Test (CVMT). Performance on the Test of Memory Malingering (TOMM) and Word Memory Test (WMT) was used for classification. Individuals who failed the TOMM or WMT were almost six times more likely to fail the CVMT validity criteria than those who passed the TOMM or WMT. The addition of compensation seeking increased this odds ratio to 9.80. The area under the curve for the latter classification was 0.74. Maximum likelihood ratio optimization of the CVMT validity test cutoff score indicated sensitivity of 0.25 and specificity of 0.99 at a revised cutoff of <12 items. Classification accuracy was 91%. The original cutoff score of <14 items also performed acceptably, with a classification accuracy of 88%. While low sensitivity argues against use in isolation, the proposed measure has utility in conjunction with other established effort measures.

## Introduction

Assessing response validity is crucial for valid measurement of cognitive or emotional pathology, particularly in the context of traumatic brain injury (TBI), where increased risk for symptom magnification and/or test underperformance has been shown in individuals with financial incentives or complicating premorbid factors (Donders & Boonstra, 2007; Green, Rohling, Lees-Haley, & Allen, 2001; Binder & Rohling, 1996). Accordingly, a number of means to assess response validity, also known as response bias or symptom validity tests (SVTs), have been developed. While these measures can be categorized in many ways, one major distinction between response bias indices has to do with whether they are “stand-alone,” meaning that they constitute separate tests that generally measure only response bias and are added to a neuropsychological battery, or “embedded,” meaning that they rely on a secondary analysis of measure(s) that are already being used to assess another cognitive attribute (Iverson & Binder, 2000). Both types of symptom validity measures have received considerable attention, and a number of both have been demonstrated to be efficacious (Larrabee, 2003; Lynch, 2004). Consequently, the assessment of response bias has been indicated as an integral component of a valid and interpretable neuropsychological examination by multiple governing bodies in the field (Bush et al., 2005; Heilbronner, Sweet, Morgan, Larrabee, & Millis, 2009).

Research has shown that the use of multiple measures of symptom validity not only increases the accuracy of detecting invalid performance, but may be necessary for adequate identification of individuals demonstrating intentional underperformance (Larrabee, 2008; Lynch, 2004; Victor, Boone, Serpa, Buehler, & Ziegler, 2009). This is because of the possibility that response validity may be variable over the course of an assessment. Accordingly, it has been recommended that the assessment of response bias happen at multiple time points within the overall evaluation and that concordance of SVT failures be considered in making inferences about the validity of test performance (Bush et al., 2005; Slick, Sherman, & Iverson, 1999). Although multiple stand-alone measures of response bias could potentially be used to meet these requirements, there are some limitations associated with this approach, which have to do with the fact that these measures are inherently self-contained instruments that do not provide information outside of symptom validity (Iverson & Binder, 2000; Larrabee, 2003; Mittenberg, Aquilla-Puentes, et al., 2002). First, to the extent that performance validity is variable over the course of the day or in some idiosyncratic fashion with respect to individual tests administered to a patient, a single formal measure of response bias provides a snapshot of symptom validity at a different “time” and in a different “context” than the actual measures of cognitive functioning for which validity is being inferred. Boone (2009) highlights several scenarios in which an invalid response profile may not be detected if validity testing is conducted only at a single point in time, such as with an isolated test at the beginning of a neuropsychological battery. 
Furthermore, multiple stand-alone tests of response bias also require additional patient and clinician time dedicated to their administration, which may be inefficient when examination time is limited or when there are confounding factors like patient fatigue.

In contrast, embedded measures are a means of assessing symptom validity based on a secondary, alternate analysis of results from an existing instrument. These measures take a variety of forms, including consideration of infrequent errors, unusual response patterns, or levels of impairment that are not seen in well-characterized neurological populations in the absence of poor motivation. Embedded measures can potentially overcome both of the above obstacles seen with stand-alone measures. First, if a test of memory has a valid embedded measure of symptom validity built in, the sample of performance validity is taken at the same time and in the same context as the actual data about the patient's memory performance that this instrument yields. Second, when embedded measures that arise from tests that are already being used to assess cognition are utilized to measure response bias, doing so requires little or no additional patient time, even if a sizable number of tests within a neurocognitive battery have embedded symptom validity measures. These advantages make embedded measures a potentially compelling part of a neuropsychological evaluation, either in the form of multiple embedded measures or with some combination of embedded and stand-alone measures.

It comes as little surprise, therefore, that a number of embedded measures of symptom validity are in common clinical use. These include reliable digit span on the Wechsler intelligence scales, cut-off scores for raw completion time on the Trail Making Test-Part A, analysis of process indices on the Wisconsin Card Sorting Test, and a number of others (Axelrod, Fichtenberg, Millis, & Wertheimer, 2006; Horton & Roberts, 2005; Iverson et al., 2002; Larrabee, 2003). These embedded measures build upon very commonly used tests, and they offer incremental utility in detecting patterns of suboptimal effort. Notably, these measures are embedded in tests that assess a variety of cognitive attributes, meaning that their use also allows for validity inferences across a broader range of cognitive domains, such as attention, processing speed, and problem solving. Another cognitive attribute that is commonly assessed in neuropsychological evaluations is memory. The Continuous Visual Memory Test (CVMT) is a test of visual memory that involves discrimination between previously presented drawings and novel drawings (Trahan & Larrabee, 1988). The CVMT has been shown to have clinical utility in assessing visual memory in patient groups of interest, such as individuals with TBI (Strong & Donders, 2008). Recently, a new embedded SVT method has been proposed for the CVMT (Larrabee, 2009).

The proposed method consists of creating an SVT score for the CVMT by tallying scores on a set of empirically identified items from the test (both during the immediate learning and delayed recognition [DR] phases) that are rarely missed by traditional clinical patients or individuals who otherwise appear to be putting forth valid test performance and are motivated to perform well (Larrabee, 2009). To assess the presence of invalid test performance characteristic of malingering, Larrabee made use of a known-groups research design. A group of individuals whose test performance was likely to be invalid was selected using an established set of criteria for identifying individuals who were found not only to have negative response bias by virtue of invalid test results (e.g., performance below established cut-offs on tests shown to be sensitive to negative response bias, or performance below the chance level on forced-option tests) but also had a known motivating factor for response bias, such as involvement in litigation. This set of criteria is commonly referred to as the “Slick criteria” (Slick, Sherman, & Iverson, 1999). Their CVMT results were then compared with those of individuals who did not have any evidence of invalid test performance or motivation concerns. Larrabee demonstrated that a cut-off score for the SVT could be established, below which the individuals who did not show other evidence of negative response bias rarely performed. A validated embedded method of measuring response bias on this test could potentially serve as a useful complement to other measures of symptom validity already in use. 
Accordingly, the present study sought to validate the method proposed by Larrabee (2009) by (a) determining the prevalence of negative response bias by his definition, in a consecutive series of patients with TBI referred to our clinic, (b) comparing the classification of symptom validity provided by the CVMT to classifications provided by well-established, stand-alone measures of symptom validity that are already in current use in our clinic, and (c) determining correlates of apparent invalid CVMT performance (e.g., performance on other measures of effort, injury severity, premorbid complicating factors, concurrent litigation).

It was decided a priori that, in order for the proposed SVT to be clinically useful, a cut-off would have to be found, such that performance below it would be highly specific to individuals with invalid test performance. A specificity of at least 0.90 has been recommended for SVTs (Heilbronner et al., 2009). At least moderate sensitivity would be valuable as well. Acceptable discrimination, as measured by an area under the curve (AUC) of at least 0.70, would also be required for the measure to be useful (Hosmer & Lemeshow, 2000). However, as Greve and Bianchini (2004) note, instruments with low sensitivity but high specificity still have good positive predictive power and accordingly can be useful. This is particularly true in the case where multiple measures are used together, leading to higher overall sensitivity (e.g., Larrabee, 2003; Victor et al., 2009).

## Methods

### Participants

Participants were selected from a 48-month series of consecutive referrals to our outpatient neuropsychology clinic (January 2006–December 2009) according to the following criteria: (a) diagnosis of TBI through an external force to the head with associated alteration of consciousness, (b) neuropsychological evaluation completed within 1–24 months after injury, (c) age 18–80 years at assessment, (d) absence of any prior history of major neurological (e.g., brain tumor) or developmental (e.g., autism spectrum) condition, and (e) inclusion in the assessment of the CVMT and either Green's Word Memory Test (WMT; Green, 2003) or the Test of Memory Malingering (TOMM; Tombaugh, 1996). These tests were administered routinely to all patients with TBI during the period of review at the organization where this investigation was pursued, unless patient disability precluded them (e.g., significant uncorrected visual impairment). As part of these evaluations, the CVMT had been used clinically to make judgments about visual memory, but it had not been used for the purpose of SVT at the time of the clinical evaluations. Only original evaluations, no repeat evaluations, were used. In order to represent a typical clinical referral stream and to allow the examination of potential correlates of invalid effort, individuals were not excluded for reasons such as a prior psychiatric history or alcohol or substance abuse concerns. Using these criteria, a total of 115 patients were identified for incorporation in the study; their characteristics are summarized in Table 1.

Table 1.

Patient characteristics (n = 115)

| Variable | Mean or percent | Median | SD | Range |
|---|---|---|---|---|
| Age (years, at time of evaluation) | 40.71 | 44 | 13.35 | 20–63 |
| Sex | | | | |
| Men | 66% | — | — | — |
| Women | 34% | — | — | — |
| Ethnicity | | | | |
| Caucasian | 90% | — | — | — |
| African American | 5% | — | — | — |
| Hispanic | 3% | — | — | — |
| Other | 2% | — | — | — |
| Education (years) | 12.86 | 12 | 2.25 | 6–20 |
| Prior psychiatric history (%) | 39% | — | — | — |
| Alcohol abuse | | | | |
| Never | 72% | — | — | — |
| Recovered >6 months | 17% | — | — | — |
| Active | 11% | — | — | — |
| Substance abuse | | | | |
| Never | 90% | — | — | — |
| Recovered >6 months | 7% | — | — | — |
| Active | 3% | — | — | — |
| Seeking compensation | 32% | — | — | — |
| Type of injury | | | | |
| MVA | 65% | — | — | — |
| Falls/recreation | 19% | — | — | — |
| Other | 16% | — | — | — |
| Days since injury | 356 | 189 | 196 | 34–174 |
| Length of coma (days) | 1.39 | | 4.98 | 0–37 |
| <1 day | 81% | — | — | — |
| ≥1 day | 19% | — | — | — |
| Positive imaging findings | 39% | — | — | — |
| Uncomplicated TBIᵃ | 61% | — | — | — |
| Obtained Full-Scale IQ | 96.79 | 97 | 13.19 | 65–137 |

Note: TBI = traumatic brain injury; MVA = motor vehicle accident.

ᵃ Defined as individuals who had negative imaging findings as well as <1 day of coma.

All procedures were performed in compliance with applicable state and federal laws, as well as institutional guidelines. Use of patient information was approved by the institutional review board (IRB) at Mary Free Bed Rehabilitation Hospital prior to the initiation of the study, and a waiver of informed consent was granted for non-identifying retrospective use of patient data. The study was also monitored by the IRB on an ongoing basis. The current sample was largely independent of that used in a previous study in our laboratory pertaining to the criterion validity of the CVMT in persons with complicated mild–severe TBI and with no premorbid (e.g., psychiatric) or comorbid (e.g., litigation) confounding factors and who had all passed SVTs (Strong & Donders, 2008). Only 4 of the current 115 participants (3.48%) had also been included in that previous study.

### Measurements

The CVMT consists of a series of drawings containing a set of target items and a number of novel foils, which are sequentially presented at the rate of one design every 2 s, with the patient indicating whether drawings are new or have been observed previously (Trahan & Larrabee, 1988). Based on a combination of correctly and incorrectly identified items, this yields a Total Score (TS). After a 30-min delay, delayed memory of the target drawings is tested in a recognition paradigm, yielding a DR index.

The TOMM and the WMT are two stand-alone measures of effort in cognitive testing that have been extensively validated and found to be reasonably comparable in detecting suboptimal test performance (e.g., Greiffenstein, Greve, Bianchini, & Baker, 2008; Lynch, 2004). They served as the criterion standard against which the CVMT symptom validity measure could be compared. Some individuals completed both the WMT and the TOMM. The TOMM was more commonly utilized in cases of severe TBI without complicating premorbid or comorbid factors.

### Procedures

Participants were classified as having invalid test performance if they failed either the TOMM or the WMT. In addition, to operationalize the Slick and colleagues (1999) criteria, a simplified version of the “probable malingered neurocognitive disorder” definition was used. Specifically, individuals were considered to meet criteria for probable malingering if they failed either the TOMM or the WMT and also had known financial compensation seeking. The performance of the embedded CVMT symptom validity measure was assessed by comparing its classification of test performance as acceptable versus invalid (according to the criteria of Larrabee, 2009) with the classifications provided by either the TOMM or the WMT, which were used as “gold standards” for the purpose of the study, as well as with the “Slick criteria.” Standard criteria were applied to TOMM and WMT scores, as outlined in the respective test manuals (Green, 2003; Tombaugh, 1996). The χ2 test of association was applied, as well as determination of the κ association statistic, to determine the overall level of agreement between the CVMT and the TOMM/WMT. The overall correct classification rate, sensitivity, specificity, positive predictive power, and negative predictive power based on the present sample were computed. Maximization of the likelihood ratio (MLR, or the ratio of sensitivity to false-positive rate) was used to determine if the CVMT would be able to better classify participants when the SVT item list was applied with either a higher or a lower threshold than proposed previously.
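As a rough illustration of the MLR step, the sketch below computes the likelihood ratio (sensitivity divided by the false-positive rate, i.e., 1 − specificity) for each candidate cut-off and selects the cut-off with the largest ratio. The inputs are the rounded sensitivity and specificity values reported in Table 2 for the probable malingering criterion, so the ratios differ slightly from the tabled LR values, which appear to have been computed from unrounded proportions. The function and variable names are hypothetical and are not part of the original analysis.

```python
# Illustrative sketch only: names are hypothetical, and the inputs are the
# rounded sensitivity/specificity values reported in Table 2.

def likelihood_ratio(sensitivity, specificity):
    """Positive likelihood ratio: sensitivity / false-positive rate."""
    false_positive_rate = 1.0 - specificity
    if false_positive_rate == 0:
        return float("inf")
    return sensitivity / false_positive_rate

# cut-off (fail if SVT score < cut-off) -> (sensitivity, specificity)
reported = {
    11: (0.08, 0.99),
    12: (0.25, 0.99),
    13: (0.33, 0.98),
    14: (0.42, 0.93),
    15: (0.50, 0.88),
    16: (0.67, 0.81),
    17: (0.75, 0.68),
    18: (0.75, 0.43),
    19: (0.83, 0.25),
    20: (1.00, 0.06),
}

# Likelihood ratio at every candidate cut-off
ratios = {cutoff: likelihood_ratio(*values) for cutoff, values in reported.items()}

# The MLR cut-off is the one with the largest likelihood ratio
mlr_cutoff = max(ratios, key=ratios.get)

for cutoff in sorted(ratios):
    print(f"<{cutoff}: LR = {ratios[cutoff]:.2f}")
print("MLR cut-off: <", mlr_cutoff)
```

With these rounded inputs the maximum falls at a cut-off of <12, in line with the result reported for this sample.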

## Results

Of the 115 patients, 81 (70%) completed the WMT, with 25 obtaining “clear fail” scores (31% of patients taking the WMT). Thirty-nine (34%) completed the TOMM, with 3 (8%) obtaining failing scores. Taken together, 27 individuals (23% of the complete sample) met operational criteria for suspect effort (one individual failed both the TOMM and the WMT). When probable malingering by the above-specified definition was considered, 12 of these 27 participants met the criteria.

In terms of performance on the CVMT, patients had TSs ranging from 52 to 88 (M = 74.14, SD = 6.62) and DR scores ranging from 0 to 7 (M = 3.60, SD = 1.41). When the proposed SVT score was computed, patients had values ranging from 8 to 20, with a total of 12 patients (10%) performing below the originally proposed cut-off (<14 of 20 items correct). Lower scores on the proposed SVT were associated with both lower CVMT TS (r = .23, p < .02) and lower DR (r = .34, p < .001). However, this corresponded to only modest, although statistically significant, differences in group means, with individuals failing the SVT demonstrating an average TS of 70.50 (SD = 5.93) compared with 74.56 (SD = 6.59) for those who passed, t(113) = 2.04, p = .044; and an average DR of 2.67 (SD = 1.16) compared with 3.71 (SD = 1.40) for those who passed, t(113) = 2.48, p = .015. In fact, some individuals who failed the SVT performed quite well overall on the CVMT, with TS as high as 80 and DR as high as 5, scores which clearly fall outside the impaired range in comparison with normative data.
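The group comparisons above can be approximately reproduced from the reported summary statistics alone. The sketch below assumes a pooled-variance (Student) t-test, which is consistent with the reported 113 degrees of freedom, and takes the group sizes as 12 (failed the SVT) and 103 (passed). The function name is hypothetical, and because the inputs are rounded means and SDs, small discrepancies from the published t values are expected.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Passed SVT (n = 103) versus failed SVT (n = 12)
t_ts, df_ts = pooled_t(74.56, 6.59, 103, 70.50, 5.93, 12)  # Total Score
t_dr, df_dr = pooled_t(3.71, 1.40, 103, 2.67, 1.16, 12)    # Delayed recognition

print(f"TS: t({df_ts}) = {t_ts:.2f}")
print(f"DR: t({df_dr}) = {t_dr:.2f}")
```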

Next, the CVMT SVT was compared against the results of the TOMM and WMT. The 27 patients who failed either the TOMM or the WMT were substantially more likely to fail the CVMT SVT using the recommended criteria, χ2(1) = 9.06, p = .003; κ = 0.25, p = .015; OR = 5.81, 95% CI = 1.67–20.23. Notably, however, 5 of the 12 individuals who failed the CVMT SVT did not perform in the invalid range on the TOMM or WMT. The patients who met the criteria for probable malingering were substantially more likely to fail the CVMT SVT as well, χ2(1) = 13.98, p < .001; κ = 0.35, p = .010; OR = 9.80, 95% CI = 2.46–38.94. The AUC obtained for predicting whether a patient met the probable malingering criteria from the raw score on the CVMT SVT was 0.742, which is comparable to the AUC of 0.779 reported by Larrabee (2009) when the proposed SVT threshold was cross-validated in a comparison of a probable malingering population to a neurological and psychiatric population.
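The association statistics in this paragraph can be recomputed from 2 × 2 cell counts. For the TOMM/WMT comparison the counts follow directly from the text: 27 of 115 patients failed the TOMM or WMT, 12 failed the CVMT SVT, and 5 of those 12 passed the TOMM/WMT, leaving 7 who failed both. For the probable malingering comparison, the count of 5 patients who met the criteria and also failed the SVT is back-calculated from the reported sensitivity of 0.42 among 12 patients and should be read as an assumption. The sketch below uses the uncorrected Pearson χ2 and a Wald confidence interval for the odds ratio; the original analysis software and its exact methods are not stated, so small differences in the interval bounds are possible.

```python
import math

def two_by_two_stats(a, b, c, d, z=1.96):
    """Association statistics for a 2 x 2 table.

    a: criterion fail, SVT fail    b: criterion fail, SVT pass
    c: criterion pass, SVT fail    d: criterion pass, SVT pass
    """
    n = a + b + c + d
    # Pearson chi-square without continuity correction (1 df)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_observed = (a + d) / n
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Odds ratio with a Wald confidence interval on the log scale
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    return chi2, kappa, odds_ratio, ci

# TOMM/WMT failure versus CVMT SVT failure (<14); counts taken from the text
tomm_wmt = two_by_two_stats(7, 20, 5, 83)
# Probable malingering versus CVMT SVT failure (<14); counts back-calculated
probable = two_by_two_stats(5, 7, 7, 96)

for label, (chi2, kappa, odds_ratio, ci) in (("TOMM/WMT", tomm_wmt),
                                             ("Probable malingering", probable)):
    print(f"{label}: chi2 = {chi2:.2f}, kappa = {kappa:.2f}, "
          f"OR = {odds_ratio:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```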

When sensitivities and specificities at various SVT cut-off scores were considered in combination with the modified Slick criteria, the maximum likelihood ratio was obtained at a cut-off SVT score of <12, or two points lower than the cutoff proposed by Larrabee (2009). At this threshold, a sensitivity of 0.25 and a specificity of 0.99 were obtained. Based on the present sample, positive predictive power was 0.75 and negative predictive power was 0.92. The overall correct classification rate was 91%. At the threshold originally suggested by Larrabee (i.e., a score <14), sensitivity was 0.42 and specificity was 0.93. The positive predictive power was 0.42 and the negative predictive power was 0.93. The overall correct classification rate was 88%. No threshold was found at which sensitivity was >0.90 with acceptable specificity. These findings are summarized in Table 2.

Table 2.

Classification summary

| SVT threshold (score < x) | Cases below threshold | Subset meeting criterion | Sensitivity | Specificity | PPV | NPV | LR | Correct classification rate (%) |
|---|---|---|---|---|---|---|---|---|
| | | | 0.00 | 0.99 | 0.00 | 0.89 | 0.00 | 89 |
| 11 | | | 0.08 | 0.99 | 0.50 | 0.90 | 8.58 | 90 |
| 12 | | | 0.25 | 0.99 | 0.75 | 0.92 | 25.75 | 91 |
| 13 | | | 0.33 | 0.98 | 0.67 | 0.93 | 17.17 | 91 |
| 14 | 12 | | 0.42 | 0.93 | 0.42 | 0.93 | 6.13 | 88 |
| 15 | 18 | | 0.50 | 0.88 | 0.33 | 0.94 | 4.29 | 84 |
| 16 | 28 | | 0.67 | 0.81 | 0.29 | 0.95 | 3.43 | 79 |
| 17 | 42 | | 0.75 | 0.68 | 0.21 | 0.96 | 2.34 | 69 |
| 18 | 68 | | 0.75 | 0.43 | 0.13 | 0.96 | 1.31 | 46 |
| 19 | 87 | 10 | 0.83 | 0.25 | 0.11 | 0.96 | 1.11 | 31 |
| 20 | 109 | 12 | 1.00 | 0.06 | 0.11 | 1.00 | 1.06 | 16 |

Notes: SVT = symptom validity test; PPV = positive predictive value; NPV = negative predictive value; LR = likelihood ratio.
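The classification indices for the two cut-offs discussed in the text can be recovered from the underlying cell counts. The counts used below are not reported directly; they are back-calculated from the reported sensitivities, predictive values, and sample sizes (12 patients meeting the probable malingering criteria, 103 not), and so should be treated as assumptions. The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard indices from a 2 x 2 table (criterion-positive = probable malingering)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
        "lr": sensitivity / (1 - specificity),       # positive likelihood ratio
        "accuracy": (tp + tn) / (tp + fp + fn + tn)  # correct classification rate
    }

# Back-calculated counts (assumptions, see lead-in)
below_12 = classification_metrics(tp=3, fp=1, fn=9, tn=102)
below_14 = classification_metrics(tp=5, fp=7, fn=7, tn=96)

for label, metrics in (("<12", below_12), ("<14", below_14)):
    summary = ", ".join(f"{name} = {value:.2f}" for name, value in metrics.items())
    print(f"{label}: {summary}")
```

Under these counts, the rounded results match the corresponding rows of Table 2, including the likelihood ratios of 25.75 and 6.13.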

At the lower SVT threshold of <12, individuals who failed the TOMM/WMT were more likely to fail the CVMT SVT as well, although the association was less robust, with the κ falling to the trend-significant range, χ2(1) = 6.12, p = .013; κ = 0.14, p = .091; OR = 10.88, 95% CI = 1.08–109.29. The association remained significant using either statistical test for individuals who met the revised Slick criteria, χ2(1) = 18.49, p < .001; κ = 0.34, p = .027; OR = 34.00, 95% CI = 3.20–361.41.

At the cutoff SVT score determined by the MLR method, one potential false positive (an individual who failed the CVMT SVT but passed the TOMM/WMT) was observed. On closer inspection, the clinical evaluation for this patient did suggest mild deficits, which were thought to be multi-factorial, and there was no suggestion of motivating factors, psychogenic overlay, or other issues related to poor performance. At the higher threshold of 14, four additional potential false positives were identified. No clear pattern was seen here. Two of these patients (one with severe TBI and another who was 6 weeks status post-mild TBI) were thought by the clinician to have valid TBI-related deficits. A third actually had a very strong CVMT performance, aside from the SVT, and was found to be essentially intact in terms of cognitive functioning. The final patient passed formal and embedded response bias measures but had variability in cognitive testing results that was thought to reflect a psychogenic overlay. In comparing the five potential false-positive cases with the remainder of the patient sample, the individuals who failed the CVMT but passed the TOMM/WMT did not differ significantly by age, sex, education, measured Full-Scale IQ, or time since injury.

In order to address the remote possibility that apparent SVT failure was actually the result of cognitive deficits associated with a more serious injury, a post hoc linear regression analysis was also completed. In this analysis, the CVMT SVT score was predicted using the following independent variables: presence of positive imaging findings, length of coma ≥1 day, failure of the TOMM/WMT, and known financial compensation seeking. To address the issues of non-normality (e.g., the large skew of length of coma toward zero days), the four predictor variables were all dichotomized. The results of this regression are summarized in Table 3. The overall model was statistically significant, F(4, 110) = 3.33, p < .02, R2 = .11. Variance inflation factors were all <1.62, indicating no concerns about collinearity. As can be seen in Table 3, only failure of the TOMM/WMT reached significance as a predictor variable.

Table 3.

Regression model predicting CVMT SVT score

| Variable | Parameter estimate | Standard error | Standardized estimate | t-value | p |
|---|---|---|---|---|---|
| Positive imaging | −0.53 | 0.54 | −0.11 | 0.99 | 0.33 |
| Coma ≥ 1 day | −1.08 | 0.67 | −0.18 | 1.61 | 0.11 |
| Seeking compensation | −0.70 | 0.46 | −0.14 | 1.53 | 0.13 |
| Failure of TOMM/WMT | −1.35 | 0.50 | −0.25 | 2.72 | 0.008 |

Notes: CVMT = Continuous Visual Memory Test; SVT = symptom validity test; TOMM = Test of Memory Malingering; WMT = Word Memory Test.

## Discussion

The AUC observed for using the proposed SVT item set on the CVMT as a predictor of meeting the revised Slick criteria fell into the range of acceptable discrimination, as defined by Hosmer and Lemeshow (2000). The SVT also had an acceptable specificity (i.e., >0.90) at either the initially proposed cut-off score or the lower cutoff score identified using MLR optimization. This provides an independent replication for the validity of using the proposed SVT item list for the CVMT from Larrabee (2009) to detect invalid test performance. Although the AUC value obtained in the present study was substantially lower than the outstanding discrimination observed when Larrabee (2009) compared a group with definite malingered neurocognitive disorder with a sample of nonmalingering individuals with moderate and severe TBI, it was comparable with the cross-validation using probable malingerers in the same study. The distinction between the two groups lies essentially in the fact that the “definite” characterization requires statistically below chance performance in effort testing, suggesting severe exaggeration of symptoms (Slick et al., 1999). The level of performance required on effort testing for placement in the definite malingered neurocognitive disorder grouping is rarely encountered among our clinical referrals; accordingly, our sample is more comparable with the probable malingering group in the Larrabee (2009) study. Given that our sample was a relatively heterogeneous series of patients with TBI, this comparable result is fairly encouraging, as it suggests that the proposed SVT is reasonably robust.

At the same time, while the proposed CVMT SVT had high specificity, a cut-off could not be determined at which adequate specificity was paired with moderate sensitivity. This suggests that the CVMT SVT is not appropriate for use in isolation, as many individuals who have invalid test performance might be missed. However, the SVT may still be useful as an adjunct to existing tests of symptom validity (e.g., TOMM, WMT, embedded measures on the Wechsler intelligence scales). In this regard, the low rate of false positives is particularly helpful, as this test can be used in cumulative interpretation with other tests with little risk of attributing negative response bias to individuals who may have legitimate neuropsychological profiles. Since concordant findings of invalid test performance from multiple SVTs have been found to dramatically reduce the overall false-positive rate (Larrabee, 2008), the CVMT SVT holds promise for improving diagnostic accuracy in this regard.

Although the threshold value obtained in the current study, using the MLR method, did differ from that previously suggested, it is notable that the sensitivity and specificity computed at either threshold are fairly similar, as was the overall correct classification rate. The use of single, invariant cutoff scores can be problematic, since the underlying variable being measured via an SVT often does not conform to any natural distribution across samples. Rather, as the obtained SVT score worsens, the clinician can be increasingly certain that the results are due to negative response bias and are not simply values at the outer ends of the normal distribution. Clinicians may accordingly wish to start considering invalid test performance at CVMT SVT values <14 and to pay particularly close attention to concordance with other measures of effort for obtained scores of <12. Independent validation of the CVMT SVT in different clinical and medicolegal populations also remains important, as the current study looked at response bias primarily in patients with TBI. Additionally, base rates may vary substantially across populations, which would lead to differences in predictive power (Mittenberg, Patton, et al., 2002; for a discussion of base-rate effects on positive and negative predictive power, see Streiner, 2003).
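To make the base-rate point concrete, the sketch below applies Bayes' rule to the sensitivity (0.42) and specificity (0.93) observed at the <14 cut-off and shows how positive and negative predictive power would shift as the prevalence of invalid performance changes. A base rate of 0.10 approximates the proportion meeting the probable malingering criteria in this sample; the base rates of 0.25 and 0.40 are purely hypothetical values chosen for illustration, as is the function name.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Positive and negative predictive power at a given base rate (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Sensitivity and specificity at the <14 cut-off; 0.25 and 0.40 are hypothetical
results = {rate: predictive_values(0.42, 0.93, rate) for rate in (0.10, 0.25, 0.40)}

for rate, (ppv, npv) in results.items():
    print(f"base rate {rate:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With sensitivity and specificity held fixed, positive predictive power rises and negative predictive power falls as the base rate increases, which is why values derived from one referral stream may not transfer to another.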

The present study does have several limitations. First, in order to maximize the sample size and best reflect our actual clinical practice, a mixture of the TOMM and the WMT was used to establish a “gold standard” for suboptimal performance. Notably, multiple criteria have been established (e.g., Greiffenstein et al., 2008) for symptom validity using either the TOMM or the WMT, and this in turn affects the sensitivity of these measures themselves. In choosing the relatively strict standard criteria, for instance, the present study had a fairly low rate of TOMM failure, which may have been related to the fact that this instrument was more often used when there were no known complicating premorbid or comorbid factors. In the present study, using these criteria, 27 of 115 patients met criteria for response invalidity based on the TOMM or the WMT (23%), with 10% of the sample also meeting operational criteria for probable malingering. These observed base rates of response invalidity and probable malingering in this sample are generally in line with expectations for a clinical population with TBI. For instance, a study of mixed-severity TBI patients in the post-acute setting showed high rates of invalid response profiles in both mild TBI (23.5%) and moderate–severe TBI (18.1%) populations, with no relationship between TBI severity and likelihood of invalid response (Locke et al., 2008). Two studies previously published using completely independent samples of patients from the same referral stream as the present study also showed similar rates of invalid response (Donders & Boonstra, 2007; Moore & Donders, 2004). Similar to the findings in the Locke and colleagues (2008) study, in the Moore and Donders (2004) study, injury severity was not a predictor of response validity. 
Thus, the rate of invalid response profile obtained in this study is generally in keeping with the range of expectations previously reported; that is, there are not substantially more than expected individuals found in this sample who have invalid response profiles. To further address the possibility of false positives, analyses were re-completed using only the individuals who received the WMT, with similar results.

Second, the use of multiple stand-alone measures of response bias for each patient could also serve to strengthen the present findings, but the similar results obtained in comparison with Larrabee's original analysis argue that the CVMT SVT is reliable and valid in clinical practice. Third, although the sample size was adequately powered to find the expected association between the CVMT SVT and the criterion measures, a larger sample would have allowed more fine-grained determination of some classification accuracy variables, particularly specificity, at various cut-offs, because these variables can be strongly influenced by a single individual's performance. Future research may provide more understanding of the improvement in specificity at progressively lower obtained SVT scores.

Finally, the sample of patients with TBI used in this study was somewhat more heterogeneous than in some other studies, including individuals with more severe injuries. Individuals with more significant injuries do sometimes perform poorly on SVTs even when invalid test performance is not suspected (Merten, Bossink, & Schmand, 2007). The WMT has a so-called “dementia profile,” which screens for the possibility that an individual's poor results are due to legitimate cognitive impairment when other evidence suggests a more substantial brain injury or another neurological condition affecting cognition (Green, 2003). In fact, five individuals who failed the CVMT in the current investigation met that WMT criterion. However, only one of these cases involved more than an uncomplicated mild TBI, suggesting that results from the present study were not overly influenced by individuals whose poor results on the CVMT SVT were due to severe brain compromise. Overall, 13 of the 81 participants who completed the WMT met the numerical profile of WMT scores for the dementia profile, but only four of these were associated with positive imaging findings or coma lasting ≥1 day. Of the four, three had known non-neurological complicating factors (one compensation seeking, two with active poly-substance abuse). Thus, there was only one participant with severe TBI who had no premorbid or comorbid complicating factors and who failed the WMT. This would represent a false-positive proportion of 4% (1 of the 25 persons who failed the WMT), a rate that is generally agreed to be acceptably low (Greve & Bianchini, 2004). Thus, the present study does not appear to be overly affected by false positives, either in the sense of individuals who are mischaracterized as having an invalid response profile by the referent instruments (TOMM/WMT) or in the sense of individuals who are mischaracterized by the CVMT SVT.
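The proportions quoted in this section can be reproduced directly from the reported counts. The following is a minimal sketch that performs only that arithmetic; the helper name `percent` is an assumption made for illustration, and the percentage for the WMT dementia profile is derived here from the reported counts rather than stated in the text.

```python
from fractions import Fraction


def percent(count: int, total: int) -> int:
    """Return count / total as a percentage rounded to the nearest whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    # Fraction keeps the arithmetic exact until the final rounding step.
    return round(Fraction(count, total) * 100)


# Counts as reported in the text.
invalid_rate = percent(27, 115)          # failed the TOMM or the WMT
dementia_profile_rate = percent(13, 81)  # WMT completers meeting the dementia profile
false_positive_rate = percent(1, 25)     # WMT failures with severe TBI and no complicating factors

print(f"Response invalidity: {invalid_rate}%")
print(f"WMT dementia profile: {dementia_profile_rate}%")
print(f"False-positive proportion among WMT failures: {false_positive_rate}%")
```

Working from the raw counts rather than the rounded percentages keeps the denominators explicit, which matters here because each proportion is taken over a different group (the full sample, WMT completers, and WMT failures).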

In summary, the present study demonstrates in a new, independent sample of patients that the CVMT SVT paradigm proposed by Larrabee (2009) provides acceptable discrimination, with high specificity, between patients with valid test performance and individuals who perform poorly on well-validated tests of response bias. This supports the conclusion that the CVMT SVT is clinically useful as an embedded measure of negative response bias in neuropsychological assessments. It is best used as an adjunct to other measures of symptom validity, for example in combination with a stand-alone measure and one or more other embedded measures of response bias, together with other clinical information about the patient, to obtain reasonable overall sensitivity.

## Funding

This work was supported by a grant from the Campbell Foundation.

## Conflict of Interest

The authors' clinical practice, as employees of a non-profit hospital, consists of about 15% medicolegal referrals, which are predominantly of defense origin. They do not receive any extra benefits from such referrals as compared to clinical referrals.

## Acknowledgments

The authors would like to thank Dr Glenn Larrabee for his comments on a preliminary version of this manuscript.

## References

Axelrod, B. N., Fichtenberg, N. L., Millis, S. R., Wertheimer, J. C. Detecting incomplete effort with Digit Span from the Wechsler Adult Intelligence Scale-Third Edition. The Clinical Neuropsychologist, 2006, vol. 20 (pg. 513-523).

Binder, L., Rohling, M. Money matters: A meta-analytic review of the effects of financial incentives on recovery after closed-head injury. American Journal of Psychiatry, 1996, vol. 153 (pg. 7-10).

Boone, K. B. The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 2009, vol. 23 (pg. 729-741).

Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. Symptom validity assessment: Practice issues and medical necessity. NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 2005, vol. 20 (pg. 419-426).

Donders, J., Boonstra, T. Correlates of invalid neuropsychological test performance after traumatic brain injury. Brain Injury, 2007, vol. 21 (pg. 319-326).

Green, P. Manual for the Word Memory Test for Windows, 2003. Edmonton: Green's Publishing.

Green, P., Rohling, M. L., Lees-Haley, P. R., Allen, L. M. Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 2001, vol. 15 (pg. 1045-1060).

Greiffenstein, M. F., Greve, K. W., Bianchini, K. J., Baker, W. J. Test of memory malingering and word memory test: A new comparison of failure concordance rates. Archives of Clinical Neuropsychology, 2008, vol. 23 (pg. 801-807).

Greve, K. W., Bianchini, K. J. Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 2004, vol. 19 (pg. 533-541).

Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R. American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 2009, vol. 23 (pg. 1093-1129).

Horton, A. M., Roberts, C. Derived trail making test cutoffs and malingering among substance abusers. International Journal of Neuroscience, 2005, vol. 115 (pg. 1083-1096).

Hosmer, D. W., Lemeshow, S. Applied logistic regression (2nd ed.), 2000. New York: Wiley (pg. 156-164).

Iverson, G. L., Binder, L. M. Detecting exaggeration and malingering in neuropsychological assessment. Journal of Head Trauma Rehabilitation, 2000, vol. 15 (pg. 829-858).

Iverson, G. L., Lange, R. T., Green, P., Franzen, M. D. Detecting exaggeration and malingering with the trail making test. The Clinical Neuropsychologist, 2002, vol. 16 (pg. 398-406).

Larrabee, G. J. Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 2003, vol. 17 (pg. 410-425).

Larrabee, G. J. Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 2008, vol. 22 (pg. 666-679).

Larrabee, G. J. Malingering Scales for the Continuous Recognition Memory Test and the Continuous Visual Memory Test. The Clinical Neuropsychologist, 2009, vol. 23 (pg. 167-180).

Locke, D. E. C., Smigielski, J. S., Powell, M. R., Stevens, S. R. Effort issues in post-acute outpatient acquired brain injury rehabilitation seekers. NeuroRehabilitation, 2008, vol. 23 (pg. 273-281).

Lynch, W. J. Determination of effort level, exaggeration, and malingering in neurocognitive assessment. Journal of Head Trauma Rehabilitation, 2004, vol. 19 (pg. 277-283).

Merten, T., Bossink, L., Schmand, B. On the limits of effort testing: Symptom validity tests and severity of neurocognitive symptoms in nonlitigant patients. Journal of Clinical and Experimental Neuropsychology, 2007, vol. 29 (pg. 308-318).

Mittenberg, W., Aguila-Puentes, G., Patton, C., Canyock, E. M., Heilbronner, R. L. Neuropsychological profiling of symptom exaggeration and malingering. Journal of Forensic Neuropsychology, 2002, vol. 3 (pg. 227-240).

Mittenberg, W., Patton, C., Canyock, E. M., Condit, D. C. Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 2002, vol. 24 (pg. 1094-1102).

Moore, B. A., Donders, J. Predictors of invalid neuropsychological test performance after traumatic brain injury. Brain Injury, 2004, vol. 18 (pg. 975-984).

Slick, D. J., Sherman, E. M., Iverson, G. L. Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 1999, vol. 13 (pg. 545-561).

Streiner, D. L. Diagnosing tests: Using and misusing diagnostic and screening tests. Journal of Personality Assessment, 2003, vol. 81 (pg. 209-219).

Strong, C. A., Donders, J. Validity of the Continuous Visual Memory Test (CVMT) after traumatic brain injury. Journal of Clinical and Experimental Neuropsychology, 2008, vol. 30 (pg. 885-891).

Tombaugh, T. N. Test of Memory Malingering: TOMM, 1996. North Tonawanda, NY: Multi-Health Systems.

Trahan, D. E., Larrabee, G. J. Continuous Visual Memory Test, 1988. Odessa, FL: Psychological Assessment Resources.

Victor, T. L., Boone, K. B., Serpa, J. G., Buehler, J., Ziegler, E. A. Interpreting the meaning of multiple symptom validity test failure. The Clinical Neuropsychologist, 2009, vol. 23 (pg. 297-313).