Continuous performance tests (CPT) provide a useful paradigm to assess vigilance and sustained attention. However, few established methods exist to assess the validity of a given response set. The present study examined embedded validity indicators (EVIs) previously found effective at dissociating valid from invalid performance, in relation to well-established performance validity tests, in 104 adults with TBI referred for neuropsychological testing. Findings suggest that aggregating EVIs improves their signal detection performance. While individual EVIs performed well at their optimal cutoffs, two specific combinations of five indicators (the CVI-5A and CVI-5B) generally produced the best classification accuracy. A CVI-5A ≥3 had a specificity of .92–.95 and a sensitivity of .45–.54. A CVI-5B ≥4 had a specificity of .94–.97 and a sensitivity of .40–.50. The CVI-5s provide a single numerical summary of the cumulative evidence of invalid performance within the CPT-II. Results support the use of a flexible, multivariate approach to performance validity assessment.

There is a growing consensus that establishing the validity of a given neuropsychological profile before interpreting it clinically is an essential part of the assessment process (Heilbronner, Sweet, Morgan, Larrabee, & Millis, 2009). While free-standing performance validity tests (PVTs) are still considered the gold standard measures of cognitive effort, their limitations (time-consuming, do not measure clinically relevant cognitive constructs, easier to identify and hence more susceptible to coaching) prompted a search for alternative solutions. As a result, a variety of embedded validity indicators (EVIs) were developed that unobtrusively measure test taking effort and provide valuable information about the credibility of a given response set.

Employing EVIs instead of relying on external PVTs represents a methodological advancement, as it reduces the level of inference involved in determining the validity of a given test score. Analyzing performance patterns within an instrument to establish the credibility of the response set seems to be a logical approach in performance validity assessment. A number of EVIs have been well established (Greiffenstein, Baker, & Gola, 1994) and research continues to produce novel and clinically useful measures within tests to assess the veracity of a given performance pattern (Johnson, Silverberg, Millis, & Hanks, 2012).

Continuous performance tests (CPTs) are designed to quantify various aspects of an individual's performance during vigilance/sustained attention tasks and have been shown to be sensitive to ADHD (Balint, Czobor, Meszaros, Simon, & Bitter, 2008) as well as to acquired attention deficits resulting from traumatic brain injury (TBI; Larrabee, 2012a). However, recent investigations suggest that CPTs are vulnerable to distortions caused by suboptimal effort. Leark, Dixon, Hoffman, and Huynh (2002) found that, on the Test of Variables of Attention (TOVA), simulators produced excessive omission (OMI) and commission (COM) errors, as well as slower and more variable response times, than those who received standard instructions. Similarly, in a sample of 50 individuals assessed for personal injury and disability status, Henry (2005) found that the group that failed at least one PVT obtained significantly worse scores on all TOVA variables than those who produced a valid neuropsychological profile. Suhr, Hammers, Dobbins-Buckland, Zimak, and Hughes (2008) reported similar findings, with non-credible performance influencing both self-report and objective data. They found that individuals who failed the Word Memory Test (WMT; Green, Allen, & Astner, 1996) were more impaired on the Conners' CPT, second edition (CPT-II), highlighting the importance of independently measuring performance validity before clinically interpreting the results. Likewise, Marshall and colleagues (2010) reported that in a sample of 268 adults referred for assessment of ADHD, individual TOVA cutoffs of >25 OMI errors, >30 COM errors, and response time variability >180 were associated with a specificity >.90 and sensitivity ranging from 0.26 to 0.63 in reference to established PVTs. On the CPT-II, OMI and COM T > 80 individually produced specificities of 0.87 and 1.00, with sensitivities of 0.57 and 0.04, respectively.

More recently, the use of EVIs within CPTs has been extended to the assessment of attention deficits secondary to TBI. Ord, Boettcher, Greve, and Bianchini (2010) studied EVIs in the CPT-II using a sample of 82 patients with TBI. Invalid responding was defined as failing two or more of the following indicators: Test of Memory Malingering and WMT (using standard cutoffs), Portland Digit Recognition Test (East <23, Hard <18; Total <45), reliable digit span (RDS) <7 and MMPI-II (F > 80, FBS > 27, Meyers > 6). They identified OMI and hit reaction time standard error (HRT SE) as the most effective at separating invalid response sets in mild TBI (mTBI) with sensitivity of 0.30 and 0.41, respectively, at a specificity of 0.95. Combining the two indicators improved sensitivity to 0.44.

A similar study by Lange and colleagues (2013), involving 158 military service members undergoing neuropsychological assessment following TBI, found that OMI, COM and perseverations (PER) were the highest performing EVIs. Poor effort was defined as failing the WMT at the standard cutoffs. The single best cutoff was COM T > 68.7, with a sensitivity of 0.26 and specificity of 0.99 within the mTBI subsample. When these measures were combined, there was a small increase in sensitivity (0.29).

Together, these findings suggest that scores on CPTs are influenced by diagnostically non-credible response styles. The technical manual for the CPT-II states that three of the 12 subscales also function as EVIs: Response style (β), OMI and PER. As an index of the speed-accuracy trade-off, extreme values of β are conceptualized as a bias in response style. Conners (2004) suggested T > 60 as the cutoff for an overly cautious (OMI prone) response style and T < 40 as the cutoff for a hypervigilant (COM prone) response style. Likewise, scores of T > 100 on either the OMI or PER subscales are presented as possible signs of invalid response as they may reflect disengagement from the task or random responses, respectively. However, these cutoffs are presented as tentative EVIs as they may in fact reflect serious attention or neurological problems. Moreover, the manual does not offer any data on the signal detection property of these cutoffs in detecting invalid profiles.

Given the need for empirically based detection of exaggerated attention deficit and the lack of consensus on what the best markers of non-credible CPT-II profiles are, the present study was designed to further investigate the potential of EVIs within the CPT-II to detect invalid response sets in adults referred for neuropsychological testing following TBI. The project had two main goals: (i) to cross-validate previous cutoffs reported in the literature and (ii) to advance the methodology of aggregating EVIs to improve the compound signal detection performance pioneered by Ord and colleagues (2010) and Lange and colleagues (2013).

Method

Participants

Neuropsychological test data from 104 adults (55.8% males, 44.2% females, 90.4% right handed, 9.6% left handed) clinically referred for neuropsychological assessment following a TBI were collected at the outpatient neurorehabilitation service of a Midwestern academic medical center. Mean age of the sample was 38.8 years (SD = 16.7, range: 17–74). Mean level of education was 13.7 years (SD = 2.6, range: 7–20). Mean Full Scale IQ (FSIQ) was 92.6 (SD = 15.9, range: 60–130). Age, education, and FSIQ were normally distributed with skew within ±0.50 and kurtosis within ±1.0. The age distribution was slightly bimodal, with one peak ∼20 and another ∼50.

The majority of the sample (n = 78) was classified as mTBI based on available injury parameters [duration of loss of consciousness, self-reported peri-traumatic amnesia, and Glasgow Coma Scale (GCS) score]. The rest of the sample (n = 26) was classified as moderate to severe (mod/sevTBI). The mTBI group had significantly higher GCS scores (M = 14.7, SD = 0.6) than the mod/sevTBI group (M = 7.0, SD = 3.3). Likewise, the proportion of the sample with positive neuroradiological findings was significantly lower [z(102) = 2.66, p < .01] in the mTBI group (60%) than the mod/sevTBI group (88%). The mTBI group scored slightly higher on the Beck Depression Inventory-II (M = 16.5, SD = 11.3) than the mod/sevTBI group (M = 11.8, SD = 11.5), but the contrast was not significant: t(96) = 1.77, p = .08.

Procedure

A fixed battery approach was employed using standardized measures of general intelligence, memory, attention, executive functions, and language and motor skills. Level of emotional distress was assessed with self-report measures. Cognitive effort was measured with a combination of stand-alone and embedded PVTs.

Materials

Each participant was administered the CPT-II. The main stand-alone PVT was the WMT. EVIs from a variety of standard neuropsychological instruments were also used as alternative measures of cognitive effort in an attempt to establish a more comprehensive evaluation of the credibility of each neurocognitive profile.

The Effort Index (EI-7)

Although the WMT has become a de facto gold standard for the detection of suboptimal cognitive effort (Bauer, O'Bryant, Lynch, McCaffrey, & Fisher, 2007; Gervais, Rohling, Green, & Ford, 2004; Hartman, 2002), it has several potential limitations. First, there are reports that it may be prone to false positives (Batt, Shores, & Chekaluk, 2008; Frederick, 2009; Greve, Ord, Curtis, Bianchini, & Brennan, 2008), although this criticism has since been challenged (Green, Montijo, & Brockhaus, 2011). Second, it generates a dichotomous classification system (Pass/Fail). While this is quite practical in settings where a categorical decision is expected, it lacks the detail required for a more nuanced assessment of performance validity. In effect, viewing effort as a dimensional construct could add important information to refine group-based classification models that may be missed using a pass/fail classification system (Feinstein, 1977). Third, there is a growing consensus about the need for a comprehensive assessment of effort in neuropsychological testing instead of relying on a single PVT (Boone, 2009).

To overcome the potential limitations of relying on a single PVT producing a dichotomous classification, a composite based on several independent effort indicators was developed for the purpose of this study (EI-7). Seven empirically supported PVTs were selected and together labeled the EI-7: the WMT; the RDS (Greiffenstein, Baker, & Gola, 1994; Heinly, Greve, Bianchini, Love, & Brennan, 2005; Jasinski, Berry, Shandera, & Clark, 2011; Larrabee, 2003; Mathias, Greve, Bianchini, Houston, & Crouch, 2002); the Word Choice Test (WCT; Pearson, 2008); the California Verbal Learning Test, Second Edition, forced choice recognition (CVLT-II FCR; Moore & Donders, 2004); the Finger Tapping Test raw score of the dominant hand (FTT-DH; Arnold et al., 2005); and the Trail Making Test (TMT A & B raw scores; Iverson, Lange, Green, & Franzen, 2002). The FTT-DH and TMT A and B were chosen because together they tap domains closely related to the target constructs of the CPT-II (simple reaction time, visuomotor speed, inhibition, simultaneous processing of two classes of stimuli).

The resulting aggregate of independent PVTs has the advantage of broadly sampling test-taking behavior over time while monitoring cognitive effort across a wide range of tests. Also, by rescaling several dichotomous PVTs into a composite with an interval scale, the underlying continuity in cognitive effort can be recaptured.

Most PVTs admit several levels of failure, and using a single cutoff (the Pass/Fail approach) may produce an artificial dichotomy that results in a loss of clinical data about the examinee's degree of engagement. To better capture the effort gradient inherent in cognitive testing, each of the widely accepted cutoffs was therefore assigned a separate value. A performance that passed the most liberal cutoff (i.e., very low probability of poor effort) was assigned a value of zero. Failing the most liberal cutoff (high sensitivity, low specificity) available in the literature earned a value of one, the next available cutoff a value of two, and the most conservative cutoff (low sensitivity, high specificity) a value of three, reflecting a gradient of increasing confidence in the presence of poor effort.
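As an illustration, the tiered scoring can be expressed in a few lines of code. This is a sketch for exposition only, not anything used in the study; the CVLT-II FCR tiers are taken from Table 1, and the function names are hypothetical.

```python
# Sketch of the tiered (0-3) scoring scheme described above, using the
# CVLT-II FCR tiers from Table 1 as a worked example. Illustrative only;
# the names are hypothetical, not the authors' scoring code.

def cvlt_fcr_effort_points(raw_score: int) -> int:
    """Map a CVLT-II FCR raw score to its EI-7 contribution (0-3)."""
    if raw_score >= 16:  # passes the most liberal cutoff
        return 0
    if raw_score == 15:  # fails only the most liberal cutoff
        return 1
    if raw_score == 14:  # fails the intermediate cutoff
        return 2
    return 3             # <=13: fails the most conservative cutoff

def ei7_total(points_per_pvt) -> int:
    """The EI-7 is the sum of the 0-3 contributions of the seven PVTs."""
    return sum(points_per_pvt)
```

For example, a protocol with a CVLT-II FCR of 14 and one additional liberal-cutoff failure elsewhere would accumulate an EI-7 of 3 (Borderline under the labels introduced below).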

Table 1 provides the key for transforming dichotomous cutoffs into a continuous scale, as well as the observed base rate (BRobserved) at each cutoff in the sample. The WMT classified the largest proportion of the sample as invalid (39.4%), while the TMT-B had the lowest BRobserved of failure (5.8%). Table 2 provides frequencies, percentages, and cumulative percentages, as well as classification labels, for the first nine levels of the EI-7. Since the frequency of more extreme EI-7 values decreases rapidly, the table was truncated at a value of 8.

Table 1.

The components of the Effort Index (EI-7), base rates for failing each cutoff and cumulative failure rates

| Components of the EI-7 | EI-7 value: 0 | 1 | 2 | 3 | Cumulative failure rate, 1–3 (%) |
| ACS-WCT | 47–50 | 44–46 | 42–43 | 32–41 | |
| Base rate (%) | 79.0 | 5.0 | 4.0 | 12.0 | 21.0 |
| CVLT-II FCR | 16 | 15 | 14 | ≤13 | |
| Base rate (%) | 75.0 | 6.7 | 7.7 | 10.6 | 25.0 |
| FTT (DH tapping) | >35/28 | ≤35/28 | – | – | |
| Base rate (%) | 93.3 | 6.7 | – | – | 6.7 |
| RDS | ≥8 | 7 | 6 | ≤5 | |
| Base rate (%) | 76.0 | 13.5 | 4.8 | 5.8 | 24.0 |
| TMT-A time | <62 | ≥62 | – | – | |
| Base rate (%) | 89.4 | 10.6 | – | – | 10.6 |
| TMT-B time | <200 | ≥200 | – | – | |
| Base rate (%) | 94.2 | 5.8 | – | – | 5.8 |
| WMT failures^a | 0 | 1 | 2 | 3 | |
| Base rate (%) | 60.6 | 4.8 | 11.5 | 23.1 | 39.4 |

Note: ACS-WCT = Advanced Clinical Solutions—Word Choice Test; CVLT-II FCR = California Verbal Learning Test, Second Edition forced choice recognition; FTT = finger tapping test; DH = dominant hand; RDS = reliable digit span; TMT = Trail Making Test; WMT = Green's Word Memory Test.

^a Immediate recall, delayed recall, and consistency ≤82.5%.

Table 2.

Frequencies, percentages, cumulative percentages, and descriptive labels for the first nine levels of the EI-7

| EI-7 value | f | % | Cumulative % | Classification |
| 0 | 47 | 45.2 | 45.2 | PASS |
| 1 | 12 | 11.5 | 56.7 | Pass |
| 2 | 6 | 5.8 | 62.5 | Borderline |
| 3 | 11 | 10.6 | 73.1 | Borderline |
| 4 | 3 | 2.9 | 76.0 | Fail |
| 5 | 11 | 10.6 | 86.5 | Fail |
| 6 | 4 | 3.8 | 90.4 | FAIL |
| 7 | 2 | 1.9 | 92.3 | FAIL |
| 8 | 1 | 1.0 | 93.3 | FAIL |

Note: The shading provides an additional visual cue for the increasing probability that the given response has been accurately classified as invalid.

An EI-7 value of zero means that the individual did not produce a single score on these seven PVTs that reached even the most liberal cutoff. Therefore, this subsample consistently demonstrated good effort and is considered an unequivocal PASS. An EI-7 value of one is also considered a Pass, as failing one of the seven PVTs at the most liberal cutoff can happen for reasons other than poor effort. An EI-7 value of two or three starts to become problematic, as it implies either multiple PVT failures at the most liberal cutoff or failing a single PVT at a more conservative cutoff. Thus, this range is labeled Borderline. An EI-7 value of 4 or above is considered a Fail, with increasing confidence in the classification accuracy as the EI-7 value increases. These gradations of certainty associated with each level of PVT failure are similar to the Slick, Sherman, and Iverson (1999) criteria for malingered neurocognitive dysfunction (MND). However, given the lack of data on external incentive status, the original MND model could not be applied to the current sample.

Although the EI-7 was used as a primary index to establish the validity of a given neuropsychological profile, given its experimental nature, analyses based on it were repeated using the WMT to provide cross-validation with an established instrument. To maximize the purity of the reference groups used to establish cutoffs on the CPT-II subtests, borderline cases (EI-7 values 2–3) were excluded from these analyses, following recommendations by Greve and Bianchini (2004). Thus, participants with EI-7 scores of ≤1 were classified as Pass, while those with EI-7 scores of ≥4 were classified as Fail.

The ≥4 cutoff protects against incidental PVT failures that are commonly considered to be insufficient evidence to label an entire neurocognitive profile as invalid. For example, performing very poorly on a single component of the EI-7 could result in a score of 3, even though technically it represents a single PVT failure, which is classified as a Pass according to broadly accepted forensic standards. However, an EI-7 score of ≥4 means at least two independent PVT failures, and hence, provides the minimum acceptable level of evidence for invalid responding.
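The resulting three-way grouping used to form the reference groups can be summarized as follows (a minimal sketch; the function name is ours, not the authors'):

```python
def classify_ei7(total: int) -> str:
    """Label an EI-7 total using the reference-group thresholds above."""
    if total <= 1:
        return "Pass"        # 0-1: at most one liberal-cutoff failure
    if total <= 3:
        return "Borderline"  # 2-3: excluded from cutoff derivation
    return "Fail"            # >=4: at least two independent PVT failures
```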

Data Analysis

Descriptive statistics (frequency, percentage, and cumulative percentage; mean, median, standard deviation, skew, kurtosis, and range) were computed for the key variables. Significance testing was performed using the F- and t-tests as well as χ2. Effect size was reported using partial η2, Cohen's d, and Ф2. Confidence intervals (CIs) around point estimates were computed using the t distribution.

Signal detection properties (sensitivity, specificity, positive predictive power [PPP], and negative predictive power [NPP]) were computed using the formulas provided by Baldessarini, Finklestein, and Arana (1983), in addition to the overall area under the curve (AUC) statistic. AUC is the overall classification accuracy of the model. Sensitivity (or true positive [TP] rate) is the ratio of TP to the sum of TP and false negatives (FN). Specificity (or true negative [TN] rate) is the ratio of TN to the sum of TN and false positives (FP). In the present study, sensitivity is the probability that an instrument will correctly detect an invalid profile, while specificity is the probability that valid profiles will be identified as such. PPP is the ratio of TP to the sum of TP and FP, whereas NPP is the ratio of TN to the sum of TN and FN. In the context of performance validity assessment, PPP is the probability that an individual's performance is invalid given that he or she failed the PVT. Conversely, NPP is the probability that an individual's performance was valid given that he or she passed the PVT.

Likelihood ratios (LRs) were also reported given their utility in interpreting individual scores in clinical settings. The value of +LR expresses how much more likely an individual with a given condition (i.e., invalid profile) will obtain a positive result (i.e., Fail) compared with an individual without the condition (i.e., valid profile). Essentially, +LR is TP rate over FP rate. Conversely, –LR captures how much less likely an individual with a given condition (i.e., invalid profile) will obtain a negative result (i.e., Pass) compared with an individual without the condition (i.e., valid profile). In other words, –LR is the FN rate over TN rate. It follows from these definitions that higher values for +LR, and lower values for –LR reflect better discriminative power (Grimes & Schultz, 2005).
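The statistics defined in the two paragraphs above follow directly from the four cells of a 2 × 2 classification table. A minimal sketch (the function and dictionary keys are ours, for illustration only):

```python
def signal_detection(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute the classification statistics defined above from the
    four cells of a 2x2 table (TP, FN, TN, FP counts)."""
    sens = tp / (tp + fn)        # true positive rate
    spec = tn / (tn + fp)        # true negative rate
    ppp = tp / (tp + fp)         # positive predictive power
    npp = tn / (tn + fn)         # negative predictive power
    pos_lr = sens / (1 - spec)   # +LR: TP rate over FP rate
    neg_lr = (1 - sens) / spec   # -LR: FN rate over TN rate
    return {"SENS": sens, "SPEC": spec, "PPP": ppp, "NPP": npp,
            "+LR": pos_lr, "-LR": neg_lr}
```

Note that, unlike sensitivity and specificity, PPP and NPP vary with the base rate of invalid performance in the sample.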

Results

A series of t-tests were performed using Pass/Fail status on the EI-7 and WMT as the independent variable and the 12 CPT-II subscales as dependent variables. All contrasts were significant except for β and Hit RT ISI Change. One-sample t-tests against the population mean of T = 50 revealed that the Fail group had significant elevations on all but the same two scales. The Fail group also produced more variable distributions on all CPT-II scales except COM and d′. The largest effects (d > 1.0) were observed on the subscales previously identified in the literature as sensitive to poor effort: OMI, COM, HRT SE, VAR, and PER. A similar pattern, although with smaller effects, was observed using the WMT as the independent variable. A summary of the analyses is provided in Table 3.

Table 3.

Independent t-tests comparing valid and invalid neuropsychological profiles on each of the CPT-II subscales with associated effect size measures, one-sample t-tests comparing each mean to the mean of T = 50 of the normative sample, and F-tests comparing sample variances

| CPT-II subscale (T-score) | PASS (EI-7 ≤ 1), n = 59: M | SD | FAIL (EI-7 ≥ 4), n = 28: M | SD | p | d (EI-7) | d (WMT) | σ1 vs. σ2, p (EI-7) | σ1 vs. σ2, p (WMT) |
| OMI | 50.8 | 18.2 | 114.5^b | 84.0 | <.001 | 1.05 | 1.04 | <.001 | <.001 |
| COM | 49.9 | 9.4 | 60.8^b | 11.4 | <.001 | 1.04 | .61 | .22 | <.01 |
| Hit RT | 48.8 | 10.4 | 57.3^a | 17.7 | <.01 | .59 | .59 | <.001 | <.01 |
| Hit RT SE | 50.9 | 11.6 | 70.4^b | 20.9 | <.001 | 1.15 | .97 | <.001 | <.001 |
| Variability SE | 50.7 | 9.3 | 68.0^b | 17.9 | <.001 | 1.21 | 1.03 | <.001 | <.001 |
| d′ | 50.1 | 9.8 | 56.5^b | 7.9 | <.01 | .72 | .30 | .22 | .30 |
| Response style (β) | 49.1 | 12.0 | 50.9 | 8.2 | .48 | .18 | .39 | <.05 | <.001 |
| Perseverations | 52.6 | 14.7 | 84.6^b | 39.6 | <.001 | 1.07 | .73 | <.001 | <.01 |
| Hit RT BC | 48.7 | 8.9 | 59.1^a | 20.1 | <.05 | .67 | .49 | <.001 | <.01 |
| Hit RT BC SE | 52.1 | 9.5 | 63.3^b | 19.8 | <.01 | .72 | .57 | <.001 | <.001 |
| Hit RT ISI change | 53.3^a | 11.6 | 54.1 | 18.3 | .80 | .05 | .04 | <.001 | <.01 |
| Hit RT ISI change SE | 49.6 | 10.2 | 57.7^a | 18.7 | <.05 | .54 | .43 | <.001 | <.001 |

Note: ^a One-sample t-test against the mean of T = 50, two-tailed, p < .05.

^b One-sample t-test against the mean of T = 50, two-tailed, p < .01.

Next, the signal detection properties of various cutoffs on these five scales were examined in reference to the EI-7 and WMT. On OMI, >60 produced an AUC of 0.74–0.73, with good sensitivity (0.61–0.58) and specificity (0.88–0.89). At >65, AUC was essentially the same, with minimal decreases in sensitivity (0.57–0.55) and increases in specificity (0.91–0.92). At >80, AUC deteriorated (0.70–0.71), with notable losses in sensitivity (0.46–0.48) and modest gains in specificity (0.93–0.95). Raising the cutoff to >100 lowered AUC (0.69), achieving excellent specificity (0.98–1.00) at the cost of a substantial loss in sensitivity (0.39–0.38). Further increasing the cutoff to >110 produced no net gain. At the optimal cutoff of >65, +LR was 5.8, whereas –LR was 0.75.

On COM, a cutoff of >60 produced an AUC of 0.67–0.68, with both sensitivity (0.50–0.43) and specificity (0.88–0.89) in the acceptable range. Raising it to >65 resulted in a decrease in AUC (0.62–0.61), with an asymmetric trade-off: substantial loss in sensitivity (0.29–0.25) and modest increase in specificity (0.95–0.97). Increasing the cutoff to >70 further deflated the AUC (0.56), shrinking sensitivity (0.18–0.15) with no improvement in specificity. At the optimal cutoff of >60, +LR was 4.2, whereas –LR was 0.57.

A score of >60 on HRT SE resulted in an AUC of 0.75–0.68, with robust sensitivity (0.68–0.60) but unacceptably low specificity (0.81–0.77). Raising the cutoff to >65 produced a negligible decrease in AUC (0.73–0.67) and a notable loss in sensitivity (0.54–0.48), but specificity improved substantially (0.92–0.86). At >70, AUC was restored (0.75–0.68), with both sensitivity (0.54–0.45) and specificity (0.97–0.91) in the acceptable range. This cutoff proved to be the point of diminishing returns: at >80 and >90, the AUC (0.68–0.63 and 0.55) and especially the sensitivity (0.36–0.30 and 0.11–0.10) declined precipitously. At the optimal cutoff of >70, +LR was 18.0, whereas –LR was 0.47.

On VAR, a cutoff of >60 produced the highest AUCs (0.75–0.68), trading specificity (0.86–0.84) for sensitivity (0.64–0.53). At >65, the benchmark specificity was reached (0.90–0.89), with a minimal sacrifice in AUC (0.72–0.68) and sensitivity (0.54–0.48). Further increasing the cutoffs (>70 and >75) consolidated the specificity (0.92 and 0.98, respectively) at the expense of both AUC (0.69–0.67 & 0.63–0.64) and sensitivity (0.46–0.43 & 0.29–0.30). At the optimal cutoff of >65, +LR was 5.4, whereas –LR was 0.51.

Finally, on PER, a cutoff of >60 had unacceptably low specificity (0.81–0.77) with good sensitivity (0.61–0.55) and acceptable overall AUC (0.71–0.66). Increasing the cutoff to >70 improved the AUC against the EI-7 (0.73), but not the WMT (0.65) with a loss in sensitivity (0.54–0.45) but a gain in specificity (0.91–0.84). Adjusting the cutoff to >80 resulted in little change in AUC (0.72–0.66), but consolidated the specificity (0.94–0.89) without sacrificing too much sensitivity (0.50–0.43). Further increases to >90 and >100 disproportionately favored specificity (0.98–0.97 in both cases) at the expense of sensitivity (0.36–0.38 and 0.32–0.33, respectively). At these highly conservative cutoffs, the AUCs trended downwards: 0.67 and 0.65, respectively, for both EI-7 and WMT. At the optimal cutoff of >70, +LR was 6.0, whereas –LR was 0.51. Table 4 and Table 6 provide a visual summary of these values.

Table 4.

AUC, SENS, and SPEC at various cutoffs of embedded individual and composite CPT-II validity indicators against the EI-7 and WMT

Cell entries are shown as EI-7/WMT reference values.

| | OMI | COM | HRT SE | VAR | PER | CVI-5A^a | CVI-5B^b |
| Cutoff | >60 | >60 | >60 | >60 | >60 | >1 | >1 |
| AUC | .74/.73 | .69/.68 | .75/.68 | .75/.68 | .71/.66 | .75/.71 | .77/.72 |
| SENS | .61/.58 | .50/.43 | .68/.60 | .64/.53 | .61/.55 | .61/.55 | .71/.65 |
| SPEC | .88/.89 | .88/.89 | .81/.77 | .86/.84 | .81/.77 | .90/.87 | .83/.80 |
| Cutoff | >65 | >65 | >65 | >65 | >70 | >2 | >2 |
| AUC | .74/.74 | .62/.61 | .73/.67 | .72/.68 | .73/.65 | .74/.69 | .75/.72 |
| SENS | .57/.55 | .29/.25 | .54/.48 | .54/.48 | .54/.45 | .54/.45 | .61/.55 |
| SPEC | .91/.92 | .95/.97 | .92/.86 | .90/.89 | .91/.84 | .95/.92 | .85/.89 |
| Cutoff | >80 | >70 | >70 | >70 | >80 | >3 | >3 |
| AUC | .70/.71 | .56/.56 | .75/.68 | .69/.67 | .72/.66 | .68/.64 | .73/.67 |
| SENS | .46/.48 | .18/.15 | .54/.45 | .46/.43 | .50/.43 | .36/.30 | .50/.40 |
| SPEC | .93/.95 | .95/.97 | .97/.91 | .92/.92 | .94/.89 | 1.00/.98 | .97/.94 |
| Cutoff | >100 | – | >80 | >75 | >90 | – | >4 |
| AUC | .69/.69 | – | .68/.63 | .63/.64 | .67/.67 | – | .68/.54 |
| SENS | .39/.38 | – | .36/.30 | .29/.30 | .36/.38 | – | .40/.30 |
| SPEC | .98/1.00 | – | 1.00/.97 | .98/.98 | .98/.97 | – | 1.00/1.00 |
| Cutoff | >110 | – | >90 | – | >100 | – | – |
| AUC | .67/.68 | – | .55/.55 | – | .65/.65 | – | – |
| SENS | .36/.35 | – | .11/.10 | – | .32/.33 | – | – |
| SPEC | .98/1.00 | – | 1.00/1.00 | – | .98/.97 | – | – |

Note: AUC = area under the curve (overall classification accuracy); SENS = sensitivity; SPEC = specificity; EI-7 = Pass (≤1) and Fail (≥4); WMT = Green's Word Memory Test at the conventional Pass/Fail classification; OMI = CPT-II omission errors (T-score); COM = CPT-II commission errors (T-score); HRT SE = CPT-II hit reaction time standard error (T-score); VAR = CPT-II variability standard error (T-score); PER = CPT-II perseverations (T-score); CVI-5 = CPT-II Validity Index-5.

^a OMI >65, COM >65, HRT SE >65, VAR >65, and PER >70.

^b OMI >60, COM >60, HRT SE >60, VAR >60, and PER >60.

Although β failed to show promise as an EVI based on the initial contrasts, given that it has been identified as a potential validity scale in the test manual, it was included in the signal detection analyses. Following Conners’ (2004) guidelines, scores between 40 and 60 were defined as Pass, and scores outside that range as Fail. The scale performed poorly against both the EI-7 and the WMT (AUC: 0.49–0.54; sensitivity: 0.11–0.18; specificity: 0.88–0.91; +LR: 0.9–1.9; –LR: 1.0–0.9).

Replicating the methodology used to develop the EI-7, a composite of these five EVIs was created and labeled the "CPT-II Validity Index-5" (CVI-5). It ranges from 0 (none of the five EVIs reached the cutoff) to 5 (all EVIs were above the cutoff) and provides an index of cumulative EVI failures. Given the inescapable trade-off between the high sensitivity and low specificity of liberal cutoffs and the opposite pattern associated with conservative ones, two different indices were developed, capitalizing on the strengths of each strategy.

The first one, labeled CVI-5A, represents a conservative approach. As such, each of its components reaches or exceeds the benchmark specificity of 0.90 against at least one of the reference PVTs: OMI >65, COM >65, HRT SE >65, VAR >65, and PER >70. The purpose of this aggregate measure was to exploit the potential of a multivariate model of validity assessment to improve signal detection. Specifically, this index was developed to examine whether combining multiple indicators would increase confidence in classifying a profile as invalid.

A CVI-5A value of 0 is an unequivocal PASS. A value of 1 is considered Borderline, given that any of the five EVIs in the composite could individually raise legitimate concerns about the validity of the overall response set. A value of 2 is considered a Fail, with a specificity of 0.90 against the EI-7 and 0.87 against the WMT. Values of ≥3 are considered an unequivocal FAIL, with increasing confidence in the classification accuracy with each additional CVI-5 component in the failing range (specificity: 0.98–0.92 and 1.00–0.98, respectively).

The second one, labeled CVI-5B, is based on the opposite approach: each constituent EVI is calibrated to maximize sensitivity. Therefore, the lowest reasonable cutoff of >60 was used to define failure. As a result, three of the components did not reach the benchmark specificity of 0.90. The goal behind the CVI-5B was to explore the potential of aggregating sub-threshold EVI failures into a single composite with better signal detection properties than any of its individual components.

A CVI-5B value of 0 is an unequivocal PASS. A value of 1 is still considered a Pass, since three of its five components do not meet the minimum requirements to function as independent PVTs (specificity < 0.90). A CVI-5B value of 2 is considered Borderline for the same reason, but it starts to raise questions about the credibility of the profile. A value of 3 is the first level of Fail, given the cumulative evidence of questionable test performance. At ≥4, the composite provides strong evidence of an invalid response pattern (specificity: 0.97–0.94). Therefore, this level is labeled FAIL. A CVI-5B value of 5 had perfect specificity, meaning that all those who failed all of the individual EVIs also failed the reference PVTs.
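Both composites simply count EVI failures at their respective cutoffs. A minimal sketch, assuming T-scores are available for the five scales (the variable names are ours; the cutoffs are those stated in the text):

```python
# Cutoffs as stated in the text: CVI-5A uses the conservative set,
# CVI-5B the liberal (>60) set. Names and keys are illustrative only.
CVI_5A_CUTOFFS = {"OMI": 65, "COM": 65, "HRT SE": 65, "VAR": 65, "PER": 70}
CVI_5B_CUTOFFS = {"OMI": 60, "COM": 60, "HRT SE": 60, "VAR": 60, "PER": 60}

def cvi5(t_scores: dict, cutoffs: dict) -> int:
    """Count how many of the five EVIs exceed their cutoff (0-5)."""
    return sum(t_scores[scale] > cut for scale, cut in cutoffs.items())
```

For instance, a hypothetical profile with OMI = 72, COM = 61, HRT SE = 66, VAR = 58, and PER = 71 would score 3 on the CVI-5A (FAIL) and 4 on the CVI-5B (FAIL).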

Univariate ANOVAs found a comparably strong association between the levels of both the CVI-5A and the CVI-5B and the EI-7 (partial η2 = 0.49, a very large effect). Mean EI-7 values increased more slowly and leveled off sooner across levels of the CVI-5B than across levels of the CVI-5A, which is to be expected given the former's more liberal cutoffs. The only significant difference between the two versions of the CVI-5 was observed at the value of 1. This pattern is predicted by the inner logic of the composites described above. The mean EI-7 values and the surrounding 95% CIs at each level provide further justification for the labeling: at 0, the upper limit of the 95% CI is below 1.5 units, whereas at 5, the lower limit is 2.9 (CVI-5A) and 5.0 (CVI-5B).

A similar pattern was found against the WMT using non-parametric significance tests. A gradual overall increase in failure rate was observed as CVI-5 values increased, with a very large effect (Φ2 = 0.29) for both versions. As observed against the EI-7, at the highest level (all five indicators failed) the CVI-5 produced perfect specificity. While the failure rate at the same CVI-5 value varied between the two versions, the discrepancy never reached statistical significance. Table 5 summarizes these analyses.
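As a sanity check on the reported effect sizes, the phi-squared coefficient can be recovered directly from the chi-square statistic and the sample size (N = 104, per the study description). This is a minimal sketch, not the authors' analysis code:

```python
# Phi-squared (mean-square contingency) is chi-square divided by N.
# With N = 104, the chi-square values reported against the WMT for both
# versions of the CVI-5 reproduce the very large effect of 0.29.

def phi_squared(chi_square, n):
    """Phi-squared effect size for a contingency table."""
    return chi_square / n

round(phi_squared(29.7, 104), 2)  # CVI-5A vs. WMT -> 0.29
round(phi_squared(30.1, 104), 2)  # CVI-5B vs. WMT -> 0.29
```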

Table 5.

Descriptive labels, percentages, means, 95% confidence intervals and failure rates for the six levels of the CVI-5s with corresponding EI-7 and WMT distributions

Level | Classification (CVI-5A/CVI-5B) | % of sample, Aa | % of sample, Bb | EI-7, A: M (95% CI) | EI-7, B: M (95% CI) | WMT failure rate, A | WMT failure rate, B
0 | PASS/PASS | 51.0 | 40.4 | 0.8 (0.4–1.2) | 0.9 (0.4–1.4) | 9/53 | 9/42
1 | Borderline/Pass | 19.2 | 22.1 | 2.7 (1.6–3.8) | 1.5 (0.6–2.4) | 9/20 | 5/23
2 | Fail/Borderline | 6.7 | 9.6 | 2.3 (0.6–4.0) | 2.3 (0.9–3.7) | 4/7 | 4/10
3 | FAIL/Fail | 9.6 | 9.6 | 3.7 (1.3–6.1) | 3.0 (1.3–4.7) | 6/10 | 6/10
4 | FAIL/FAIL | 7.7 | 6.7 | 6.9 (3.8–10) | 4.9 (1.0–8.8) | 7/8 | 4/7
5 | FAIL/FAIL | 4.8 | 11.5 | 9.2 (2.9–16) | 7.8 (5.0–11) | 5/5 | 12/12

ANOVA on EI-7: F = 18.6 (A) and 16.4 (B), p < .001, η2 = 0.49 (both versions). WMT failure rates: χ2 = 29.7 (A) and 30.1 (B), p < .01, Φ2 = 0.29 (both versions).

Note: The shading provides an additional visual cue for the increasing probability that the given response set has been accurately classified as invalid. a OMI >65, COM >65, HRT SE >65, VAR >65, and PER >70.

b OMI >60, COM >60, HRT SE >60, VAR >60, and PER >60.

Although sensitivity and specificity are important features of an instrument, they are nested in group level analyses. In routine clinical practice, while evaluating the credibility of an individual response set, the most relevant test parameters are PPP and NPP (i.e., what a test score predicts in a given examinee). Therefore, those values were computed at the conventional hypothetical population base rates of 10%, 20%, 30%, 40%, and 50%. Results are displayed in Table 6. The expected changes in the confidence in classification accuracy as a function of cutoffs and base rates were observed.
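The base-rate adjustment behind these predictive values follows the standard Bayesian formulas for predictive power. The sketch below (illustrative helper names, not the authors' code) reproduces the OMI >65 row of Table 6 from its sensitivity and specificity:

```python
# Predictive power at a hypothetical base rate, from sensitivity/specificity.

def ppp(sens, spec, base_rate):
    """Positive predictive power: P(invalid | test positive)."""
    true_pos = base_rate * sens
    false_pos = (1 - base_rate) * (1 - spec)
    return true_pos / (true_pos + false_pos)

def npp(sens, spec, base_rate):
    """Negative predictive power: P(valid | test negative)."""
    true_neg = (1 - base_rate) * spec
    false_neg = base_rate * (1 - sens)
    return true_neg / (true_neg + false_neg)

sens, spec = 0.57, 0.91          # OMI >65 against the EI-7 (Table 6)
round(ppp(sens, spec, 0.10), 2)  # 0.41, matching the tabled PPP at a 10% base rate
round(npp(sens, spec, 0.10), 2)  # 0.95, matching the tabled NPP
round(sens / (1 - spec), 1)      # +LR = 6.3
```

The same two functions regenerate every PPP/NPP column of Table 6 by varying `base_rate` from 0.10 to 0.50.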

Table 6.

PPP, NPP, SENS, SPEC, +LR, and –LR at the optimal cutoff of the CVI-5 components and composites at various hypothetical base rates of invalid performance against the EI-7 (Pass ≤ 1; Fail ≥ 4)

Cutoff | | 10% | 20% | 30% | 40% | 50% | AUC | SENS | SPEC | +LR | −LR
OMI >65 | PPP | 0.41 | 0.61 | 0.73 | 0.81 | 0.86 | 0.74 | 0.57 | 0.91 | 6.3 | 0.47
 | NPP | 0.95 | 0.89 | 0.83 | 0.76 | 0.68 | | | | |
COM >60 | PPP | 0.32 | 0.51 | 0.64 | 0.74 | 0.81 | 0.69 | 0.50 | 0.88 | 4.2 | 0.57
 | NPP | 0.94 | 0.88 | 0.80 | 0.73 | 0.64 | | | | |
HRT SE >70 | PPP | 0.67 | 0.82 | 0.89 | 0.92 | 0.95 | 0.75 | 0.54 | 0.97 | 18.0 | 0.47
 | NPP | 0.95 | 0.89 | 0.83 | 0.77 | 0.68 | | | | |
VAR >65 | PPP | 0.38 | 0.57 | 0.70 | 0.78 | 0.84 | 0.72 | 0.54 | 0.90 | 5.4 | 0.51
 | NPP | 0.95 | 0.89 | 0.82 | 0.75 | 0.66 | | | | |
PER >70 | PPP | 0.40 | 0.60 | 0.72 | 0.80 | 0.86 | 0.65 | 0.54 | 0.91 | 6.0 | 0.51
 | NPP | 0.95 | 0.89 | 0.82 | 0.75 | 0.66 | | | | |
CVI-5Aa >2 | PPP | 0.55 | 0.73 | 0.82 | 0.88 | 0.92 | 0.74 | 0.54 | 0.95 | 10.8 | 0.48
 | NPP | 0.95 | 0.89 | 0.83 | 0.76 | 0.67 | | | | |
CVI-5Bb >3 | PPP | 0.65 | 0.81 | 0.88 | 0.92 | 0.94 | 0.73 | 0.50 | 0.97 | 16.7 | 0.52
 | NPP | 0.95 | 0.89 | 0.82 | 0.74 | 0.67 | | | | |

Note: PPP = positive predictive power; NPP = negative predictive power; SENS = sensitivity; SPEC = specificity; +LR = positive likelihood ratio; −LR = negative likelihood ratio; AUC = area under the curve (overall classification accuracy); OMI = CPT-II omission errors (T-score); COM = CPT-II commission errors (T-score); HRT SE = CPT-II hit reaction time standard error (T-score); VAR = CPT-II variability standard error (T-score); PER = CPT-II perseverative errors (T-score); CVI-5 = CPT-II validity indicator.

aOMI >65, COM >65, HRT SE >65, VAR >65, and PER >70.

bOMI >60, COM >60, HRT SE >60, VAR >60, and PER >60.

Besides having better signal detection properties than all but one (HRT SE > 70) of their individual components, the CVI-5s allow the assessor to harvest the compound discriminant power of multiple imperfect EVIs and transform them into a superior single index. Moreover, the two versions were developed to detect different manifestations of an invalid response pattern: the combined effect of multiple failed EVIs that provide evidence of non-credible responding (CVI-5A), or consistent sub-threshold EVI failures that individually contain insufficient evidence to establish the invalidity of a given profile but, repeated across scales, allow the examiner to correctly identify non-credible responding (CVI-5B).

For example, if all of the constituent EVIs fall between a T-score of 61 and 65, they produce a CVI-5A value of 0, which is an unequivocal PASS. The same pattern of performance produces a CVI-5B value of 5, which is an unequivocal FAIL with perfect specificity. Therefore, the clinician can confidently label the response set as invalid. As this example suggests, the two versions of the CVI-5 complement each other: while they overlap significantly, they can provide non-redundant and clinically relevant information about the veracity of a given response set.
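The scoring logic of the two composites, and the discordant example above, can be sketched as follows (a hypothetical helper, not the authors' scoring code; the T-scores in the example profile are illustrative values between 61 and 65):

```python
# Each composite counts how many of the five CPT-II EVIs exceed its cutoffs.

CVI_5A_CUTOFFS = {"OMI": 65, "COM": 65, "HRT_SE": 65, "VAR": 65, "PER": 70}  # conservative
CVI_5B_CUTOFFS = {"OMI": 60, "COM": 60, "HRT_SE": 60, "VAR": 60, "PER": 60}  # liberal

def cvi5(t_scores, cutoffs):
    """Composite value 0-5: number of EVIs with a T-score above their cutoff."""
    return sum(t_scores[scale] > cut for scale, cut in cutoffs.items())

def label_a(value):
    """CVI-5A labels per the text: 0 PASS, 1 Borderline, 2 Fail, >=3 FAIL."""
    return "PASS" if value == 0 else "Borderline" if value == 1 else "Fail" if value == 2 else "FAIL"

def label_b(value):
    """CVI-5B labels per the text: 0-1 Pass, 2 Borderline, 3 Fail, >=4 FAIL."""
    return "PASS" if value <= 1 else "Borderline" if value == 2 else "Fail" if value == 3 else "FAIL"

# Discordant case from the text: every scale between T = 61 and 65.
profile = {"OMI": 63, "COM": 62, "HRT_SE": 64, "VAR": 61, "PER": 65}
a = cvi5(profile, CVI_5A_CUTOFFS)  # 0 -> PASS on the conservative composite
b = cvi5(profile, CVI_5B_CUTOFFS)  # 5 -> FAIL on the liberal composite
```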

Finally, Table 7 illustrates the well-documented paradox that individuals with mTBI fail PVTs at a higher rate than those with mod/sevTBI. Goodness-of-fit tests produced medium effects for the CVI-5s, the EI-7, and the WMT. Data on the EI-7 show that the most extreme form of invalid responding was concentrated in the mTBI group.

Table 7.

Distribution of mTBI and mod/sevTBI patients across levels of CVI-5, EI-7, and WMT

Level | CVI-5A: mild / mod-sev | CVI-5B: mild / mod-sev | EI-7: mild / mod-sev | WMT: mild / mod-sev
0 | 62.3a / 37.7a | 69.0a / 31.0 | 61.7a / 38.3a | Pass: 65.6a / 34.4
1 | 80.0 / 20.0 | 60.9 / 39.1 | 91.7 / 8.3 | Fail: 90.0 / 10.0
2 | 100.0 / 0.0 | 80.0 / 20.0 | 66.7 / 33.3 |
3 | 80.0 / 20.0 | 90.0 / 10.0 | 63.6 / 36.4 |
4 | 100.0 / 0.0 | 85.7 / 14.3 | 100.0 / 0.0 |
5 | 74.8 / 25.2 | 100.0 / 0.0 | 91.9 / 9.1 |
6 | | | 100.0 / 0.0 |
7 | | | 100.0 / 0.0 |
χ2 | 11.6 | 9.0 | 14.3 | 7.80
p | .04 | .11 | .35 | <.001
Φ2 | 0.11 | 0.09 | 0.14 | 0.08

Note: The shading provides an additional visual cue for the increasing probability that the given response set has been accurately classified as invalid.

aχ2(1) goodness of fit against the proportion of mTBI and mod/sevTBI in the sample, p < .05.

Discussion

The present study explored the potential of aggregate EVIs within the CPT-II to improve the assessor's ability to discriminate between valid and invalid response sets in adults referred for neuropsychological assessment following TBI. The results converge on a number of clinically relevant findings. (i) Two of the EVIs at the cutoffs originally proposed by Conners (OMI and PER T > 100) performed reasonably well in the current sample against other PVTs, but tended to sacrifice sensitivity for specificity. In contrast, β had unacceptably poor overall signal detection properties. (ii) Other CPT-II scales previously identified in the literature as potential EVIs at various cutoffs (OMI, COM, PER, and HRT SE) also discriminated well between valid and invalid response sets in the present sample. The single best individual EVI was HRT SE > 70. (iii) The practice of aggregating EVIs advocated by previous research (Lange et al., 2013; Larrabee, 2003, 2012a, 2012b; Ord et al., 2010) also proved effective in our sample. The CVI-5s not only increased overall classification accuracy but also improved the ecological validity of the final decision, as they sample multiple facets of sustained attention to determine the validity of a given response set. (iv) The CPT-II appears to be more sensitive to invalid responding than to acquired attention deficits when used in a TBI population. This finding is consistent with previous reports (Brenner et al., 2009; Skandsen et al., 2010) and has two main implications. On the one hand, the CPT-II could be co-opted as a PVT. On the other hand, given the remarkably intact CPT-II profiles of those who produced valid response sets, it could be useful in ruling out residual deficits in processing speed, vigilance, inhibition and sustained attention following TBI. (v) As commonly reported, individuals with mTBI failed the reference PVTs at significantly higher rates than those with more severe injuries, and they also failed the CVI-5s at higher rates.

An in-depth analysis of the 12 CPT-II scales sheds some light on the success of the OMI and PER scales at identifying invalid response sets. These are the only two scales on which extreme error rates can be easily achieved. Producing a severely impaired score (i.e., a high T-score) on OMI simply requires failing to respond to targets, of which there are many. Likewise, one can obtain an extreme elevation on PER through incessant or random responding: pressing the button often, regardless of the actual stimulus. Again, there are ample opportunities to “score” on this scale during the 14-min administration. Moreover, what constitutes a “failure” on these two parameters is clear. The other 10 scales, however, have built-in physiological or statistical ceilings that are virtually impossible to exceed. In addition, several of these scales are derived measures that may be difficult to conceptualize even for trained professionals who are well versed in the signal detection paradigm. For example, it is quite easy to inflate OMI errors by disengaging from the test, but it is far more challenging to directly manipulate standard deviations, standard errors or discriminability indices. While extreme response patterns on the OMI and PER scales can affect the derived signal detection parameters, the latter are exceedingly difficult to influence deliberately.

Given the inherent trade-off among combinations of cutoffs, a flexible application of the signal detection properties of a list of cutoffs appears to be the most sensible approach in a clinical setting. While individual CPT-II scales at various cutoffs continue to be useful at assessing the validity of the overall profile, there are clear rational and empirical advantages of aggregating EVIs and making the final decision with respect to validity based on a composite score. First, an aggregate measure relies on input from multiple different sources, reducing the influence of random findings and generating a more stable estimate of performance validity. Thus, it offers a more representative index by sampling multiple facets of the target constructs. Second, classification accuracy data show that the CVI-5s performed better than most of the individual cutoffs.

Also, given that the two versions of the CVI-5 were specifically developed to detect different manifestations of invalid responding, they offer complementary detection strategies. The CVI-5A was designed to aggregate the cumulative evidence of more extreme impairment in fewer facets of sustained visual attention. In contrast, the CVI-5B was purposefully calibrated to be overly sensitive. As such, it monitors sub-threshold (i.e., suspect) performance patterns that are not sufficiently extreme to be classified as invalid on their own, but clearly deviate from the centroid of valid responders. Signal detection analyses suggest that even mild impairment (i.e., barely reaching the clinical cutoff) on a certain combination of the five EVIs provides evidence of invalid responding. Although the rationale behind this mechanism may not be immediately apparent, a string of T-scores between 61 and 65 on these scales represents a clinically implausible pattern of performance: equally elevated OMI and COM errors and elevated overall and block-by-block response time variability, in conjunction with sub-threshold perseverative/random responding.

The side-by-side comparison between the EI-7 and the WMT provides further evidence for the enhanced utility of composites over stand-alone PVTs. Although the former incorporates the information contained in the latter, it parcels out the data behind the WMT's monolithic pass/fail approach and adds data from other independent PVTs. As a result, the difference between the valid and invalid subsamples on the CPT-II subtests is generally more pronounced when the criterion PVT is the EI-7 rather than the WMT, as captured by the differences in effect size estimates (Table 3), as well as AUCs, sensitivity, and specificity (Table 4).

Another advantage of the multivariate approach to effort assessment from a practitioner's perspective is the inherent flexibility of the model. Namely, it allows the user to choose a more liberal or a more conservative cutoff to define poor effort, as well as among five different hypothetical base rates of invalid responding, to match the expectations and decision-making criteria of unique assessment settings. This level of customization has the potential to improve classification accuracy at the individual level by applying a more nuanced assessment of performance validity.

The present study has several strengths. It used a large clinical sample with representative FSIQ, age, education and handedness distributions, and standard neuropsychological instruments. It empirically validated cutoffs published in the technical manual and subsequent research studies, introduced alternative combinations of cutoffs, and reported detailed signal detection properties of both the individual EVIs and two new composites (CVI-5A and CVI-5B) within the CPT-II.

The study also has a number of limitations that should be taken into consideration. The participants were referred for neuropsychological assessment in the context of TBI. Hence, the results may not generalize to individuals with other conditions such as ADHD. The sample only contained adults, even though the CPT-II was normed on children as young as 6 years of age and continues to be a widely used instrument in pediatric populations. The effect of administration sequence was not modeled systematically despite previous reports that time-related changes can influence CPT-II scores (Erdodi & Lajiness-O'Neill, 2014; Erdodi, Lajiness-O'Neill, & Saules, 2010). Thus, further research replicating the present methodology in samples with different clinical diagnoses, age ranges and PVTs to cross-validate the EVIs in the CPT-II is necessary to establish the generalizability of the findings.

In summary, aggregating EVIs in the CPT-II improves the overall efficiency of validity assessment on both theoretical and empirical grounds. To maximize the classification accuracy for a given response set, a flexible approach is recommended that takes advantage of the opportunity to refine the signal detection accuracy of these EVIs using different levels and combinations of cutoffs. Future research exploring different combinations of EVIs, cutoffs and reference PVTs in TBI and other clinical populations is needed to increase confidence in the multivariate approach to calibrating EVIs.

References

Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., et al. (2005). Sensitivity and specificity of Finger Tapping Test scores for the detection of suspect effort. Clinical Neuropsychologist, 19(1), 105–120.

Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40(5), 569–573.

Balint, S., Czobor, P., Meszaros, A., Simon, V., & Bitter, I. (2008). Neuropsychological deficits in adult ADHD. Psychiatria Hungarica, 23(5), 324–335.

Batt, K., Shores, A. E., & Chekaluk, E. (2008). The effect of distraction on the Word Memory Test and Test of Memory Malingering performance in patients with a severe brain injury. Journal of the International Neuropsychological Society, 14(6), 1074–1080.

Bauer, L., O'Bryant, S. E., Lynch, J. K., McCaffrey, R. J., & Fisher, J. M. (2007). Examining the Test of Memory Malingering Trial 1 and Word Memory Test Immediate Recognition as screening tools for insufficient effort. Assessment, 14(3), 215–222.

Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examination. Clinical Neuropsychologist, 23(4), 729–741.

Brenner, L. A., Ladley-O'Brien, S. E., Harwood, J. E., Filley, C. M., Kelly, J. P., Homaifar, B. Y., et al. (2009). An exploratory study of neuroimaging, neurologic, and neuropsychological findings in veterans with traumatic brain injury and/or posttraumatic stress disorder. Military Medicine, 174(4), 347–352.

Conners, K. C. (2004). Conners' Continuous Performance Test (CPT II). Version 5 for Windows. Technical guide and software manual. North Tonawanda, NY: Multi-Health Systems.

Erdodi, L. A., & Lajiness-O'Neill, R. (2014). Time related changes in Conners' CPT-II scores: A replication study. Applied Neuropsychology: Adult, 21(1), 43–50.

Erdodi, L. A., Lajiness-O'Neill, R., & Saules, K. K. (2010). Order of Conners' CPT-II administration within a cognitive test battery influences ADHD indices. Journal of Attention Disorders, 14(1), 43–51.

Feinstein, A. R. (1977). On the sensitivity, specificity, and discrimination of diagnostic tests. In Clinical biostatistics. St. Louis, MO: Mosby.

Frederick, R. I. (2009). Evaluating constructs represented by symptom validity tests in forensic neuropsychological assessment of traumatic brain injury. Journal of Head Trauma Rehabilitation, 24(2), 105–122.

Gervais, R. O., Rohling, M. L., Green, P., & Ford, W. (2004). A comparison of WMT, CARB and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology, 19(4), 475–487.

Green, P., Allen, L., & Astner, K. (1996). The Word Memory Test: A user's guide to the oral and computer-administered forms, US Version 1.1. Durham, NC: CogniSyst.

Green, P., Montijo, J., & Brockhaus, R. (2011). High specificity of the Word Memory Test and Medical Symptom Validity Test in groups with severe verbal memory impairment. Applied Neuropsychology, 18(2), 86–94.

Greiffenstein, M. F., Baker, J. W., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6(3), 218–224.

Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19(4), 533–541.

Greve, K. W., Ord, J., Curtis, K. L., Bianchini, K. J., & Brennan, A. (2008). Detecting malingering in traumatic brain injury and chronic pain: A comparison of three forced-choice symptom validity tests. Clinical Neuropsychologist, 22(5), 896–918.

Grimes, D. A., & Schulz, K. F. (2005). Refining clinical diagnosis with likelihood ratios. Lancet, 365(9469), 1500–1505.

Hartmann, D. E. (2002). The unexamined lie is a lie worth fibbing: Neuropsychological malingering and the Word Memory Test. Archives of Clinical Neuropsychology, 17(7), 709–714.

Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S. R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. Clinical Neuropsychologist, 23(7), 1093–1129.

Heinly, M. T., Greve, K. W., Bianchini, K. J., Love, J. M., & Brennan, A. (2005). WAIS Digit Span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444.

Henry, G. (2005). Probable malingering and performance of the Test of Variables of Attention. Clinical Neuropsychologist, 19(1), 121–129.

Iverson, G. L., Lange, R. T., Green, P., & Franzen, M. D. (2002). Detecting exaggeration and malingering with the Trail Making Test. Clinical Neuropsychologist, 16(3), 398–406.

Jasinski, L. J., Berry, D. T. R., Shandera, A. L., & Clark, J. A. (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33(3), 300–314.

Johnson, S. C., Silverberg, N. D., Millis, S. R., & Hanks, R. A. (2012). Symptom validity indicators embedded in the Controlled Oral Word Association Test. Clinical Neuropsychologist, 26(7), 1230–1241.

Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., et al. (2013). Clinical utility of the Conners' Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25(2), 339–352.

Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. Clinical Neuropsychologist, 17(3), 410–425.

Larrabee, G. J. (2012a). Assessment of malingering. In G. J. Larrabee (Ed.), Forensic neuropsychology: A scientific approach (2nd ed.). New York: Oxford University Press.

Larrabee, G. J. (2012b). Performance validity and symptom validity in neuropsychological assessment. Journal of the International Neuropsychological Society, 18(4), 625–631.

Leark, R. A., Dixon, E., Hoffman, T., & Huynh, D. (2002). Fake bad test response bias effects on the Test of Variables of Attention. Archives of Clinical Neuropsychology, 17(4), 335–342.

Marshall, P., Schroeder, R., O'Brien, J., Fischer, R., Ries, A., Blesi, B., et al. (2010). Effectiveness of symptom validity measures in identifying cognitive and behavioral symptom exaggeration in adult attention deficit hyperactivity disorder. Clinical Neuropsychologist, 24, 1204–1237.

Mathias, C. W., Greve, K. W., Bianchini, K. J., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction using the Reliable Digit Span in traumatic brain injury. Assessment, 9(3), 301–308.

Moore, B. A., & Donders, J. (2004). Predictors of invalid neuropsychological test performance after traumatic brain injury. Brain Injury, 18(10), 975–984.

Ord, J. S., Boettcher, A. C., Greve, K. W., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners' Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32(4), 380–387.

Skandsen, T., Finnanger, T. G., Andersson, S., Lydersen, S., Brunner, J. F., & Vik, A. (2010). Cognitive impairment 3 months after moderate and severe traumatic brain injury: A prospective follow-up study. Archives of Physical Medicine and Rehabilitation, 91(12), 1904–1913.

Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. Clinical Neuropsychologist, 13(4), 545–561.

Suhr, J., Hammers, D., Dobbins-Buckland, K., Zimak, E., & Hughes, C. (2008). The relationship of malingering test failure to self-reported symptoms and neuropsychological findings in adults referred for ADHD evaluation. Archives of Clinical Neuropsychology, 23(5), 521–530.