## Abstract

This examination of four embedded validity indices for the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) explores the potential utility of integrating cognitive and self-reported depressive measures. Examined indices include the proposed RBANS Performance Validity Index (RBANS PVI) and the Charleston Revised Index of Effort for the RBANS (CRIER). The CRIER represented the novel integration of cognitive test performance and depression self-report information. The sample included 234 patients without dementia who could be identified as having demonstrated either valid or invalid responding, based on standardized criteria. Sensitivity and specificity for invalid responding varied widely, with the CRIER emerging as the best all-around index (sensitivity = 0.84, specificity = 0.90, AUC = 0.94). Findings support the use of embedded response validity indices, and suggest that the integration of cognitive and self-report depression data may optimize detection of invalid responding among older Veterans.

## Introduction

Neuropsychology as a clinical subdiscipline has increasingly refined strategies for evaluating the validity of measured clinical data (Slick, Tan, Strauss, & Hultsch, 2004), particularly with older adults (Tombaugh, 1997; Ashendorf, O'Bryant, & McCaffrey, 2003; Teichner & Wagner, 2004; Dean, Victor, Boone, Philpott, & Hess, 2009). Such strategies are said to measure either performance validity, which refers to the validity of performance on cognitive measures, or symptom validity, which refers to the validity of subjective symptom report (Larrabee, 2012). Rarely have these domains been integrated in the effort to better identify invalid responding. Two broad strategies for measuring response validity are often used to complement each other. The first employs stand-alone measures such as the Test of Memory Malingering (TOMM; Tombaugh, 1996) or the Word Memory Test (Green, 2003). The second strategy uses embedded measures of response validity, which typically employ combinations of cognitive measures in ways that enable identification of unusual or exaggerative cognitive symptom presentations. The primary advantages of embedded measures, such as Reliable Digit Span (Greiffenstein, Baker, & Gola, 1994), are that they do not require additional time for administration and are not readily identifiable by test takers. Both the covert administration of embedded measures and their complexity relative to recognition paradigms (such as the TOMM) may make them less vulnerable to coaching. For these reasons, embedded response validity indices are useful as supplementary measures to be used alongside stand-alone validity measures. Like measures of any other psychometrically assessed variable, performance validity indices can differ markedly between samples (Birath, MacKillop, & Horner, 2013). This broad variability underscores the need for research evaluating these indices in samples representing the diversity of individuals who are referred for neuropsychological evaluation.
In particular, this study builds on past work identifying a non-ignorable rate (15%) of invalid responding among primarily older adults presenting to a Veterans Affairs (VA) hospital memory disorders clinic (Barker, Horner, & Bachman, 2010).

Recently, Novitski, Steele, Karantzoulis, and Randolph (2012) proposed the RBANS Effort Scale (ES), an embedded performance validity measure, which they examined alongside Silverberg, Wertheimer, and Fichtenberg's (2007) RBANS Effort Index (EI). The RBANS ES attempts to identify invalid responding on the basis of large disparities between recall and recognition. This scale is applied in two steps. First, respondents with combined raw scores on the RBANS digit span and list recognition subtests ≥28, whose responses are likely to be valid representations of their cognitive functioning based on data from the normative sample, are excluded. In the second step, the RBANS ES formula (illustrated in Equation (1)) is applied and invalid responding is identified as scores <12. Novitski and colleagues (2012) reported that the RBANS ES produced an area under the curve (AUC) of 0.908 in a study contrasting responses of patients with mTBI, amnestic mild cognitive impairment, or probable Alzheimer's disease. Preliminary research using a sample of patients with dementia and “coached” and “naïve” simulators produced AUCs ranging from 0.78 to 0.93 (Dunham, Shadi, Sofky, & Denney, 2012), supporting use of the RBANS ES. The study by Dunham and colleagues (2012) was based on a smaller sample, so further research is needed to establish the validity of this measure.

(1)
$\text{RBANS ES} = \bigl(\text{List Recognition} - (\text{List Recall} + \text{Story Recall} + \text{Figure Recall})\bigr) + \text{Digit Span}$

Whereas the RBANS ES identifies invalid responding based largely on the discrepancy between recall and recognition, another approach to delineating embedded measures draws on work identifying consistently low scores across disparate domains of cognition as one correlate of invalid performance (Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011; Schutte, Millis, Axelrod, & VanDyke, 2011). The RBANS EI (Silverberg et al., 2007), which employs this strategy, seeks to identify invalid responding based on low scores on both digit span and list recognition. Scores on these two subtests are converted using a weighting algorithm based on normative score distributions. Converted subtest scores are then added to calculate the RBANS EI score. Silverberg and colleagues (2007) published preliminary findings suggesting good sensitivity (86–96%) and specificity (78–96%) in a mixed sample of individuals with mild traumatic brain injury, “clinical malingerers,” and coached- and uncoached-simulated malingerers. Subsequent work found that this scale offered only modest predictive utility, with specificity of 0.85 and sensitivity ranging from 0.51 to 0.64 based on varying cut-scores (Barker et al., 2010). Similarly, Hook, Marquine, and Hoelzle (2009) reported that the RBANS EI may offer limited utility, particularly with geriatric medical patients.
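The two-step RBANS ES rule described earlier (the screening step followed by Equation (1)) can be sketched in a few lines. This is a minimal illustration only, assuming raw subtest scores as integers; the class and function names are hypothetical and are not taken from Novitski et al. (2012), and the thresholds (≥28 for the screen, <12 for the cutoff) are simply those stated in the text.

```python
from dataclasses import dataclass


@dataclass
class RbansRaw:
    """Raw RBANS subtest scores used by the ES (field names are illustrative)."""
    list_recognition: int
    list_recall: int
    story_recall: int
    figure_recall: int
    digit_span: int


def rbans_es(s: RbansRaw) -> int:
    """Equation (1): list recognition minus summed recall, plus digit span."""
    recall_total = s.list_recall + s.story_recall + s.figure_recall
    return (s.list_recognition - recall_total) + s.digit_span


def es_flags_invalid(s: RbansRaw, screen_min: int = 28, cutoff: int = 12) -> bool:
    """Apply the two-step rule as described in the text (assumed thresholds)."""
    # Step 1: a combined digit span + list recognition score at or above the
    # screen is treated as likely valid, so the ES formula is not applied.
    if s.digit_span + s.list_recognition >= screen_min:
        return False
    # Step 2: ES scores below the cutoff suggest invalid responding.
    return rbans_es(s) < cutoff


example = RbansRaw(list_recognition=12, list_recall=2, story_recall=3,
                   figure_recall=4, digit_span=6)
print(rbans_es(example))          # 9
print(es_flags_invalid(example))  # True
```

Keeping the screening step separate from the formula makes it easy to see which respondents never reach Equation (1).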

Validity of neuropsychological examination results can also be compromised by subjective exaggeration of psychiatric symptomatology. As such, symptom validity is formally evaluated based on a patient's endorsement of atypical or unusually severe psychiatric symptoms using measures such as the Structured Interview of Reported Symptoms-2 (Rogers, Kropp, Bagby, & Dickens, 1992) or the Posttraumatic Symptom Scale (Foa, Riggs, Dancu, & Rothbaum, 1993). Despite the intuitive, logical, and empirically established discrimination between symptom and performance validity (Demakis, Gervais, & Rohling, 2008), significant evidence exists that invalid responding on cognitive measures is often accompanied by over-reporting of psychiatric symptoms (Wygant et al., 2007; Gfeller & Roskos, 2013). While depressed older adults do not score differently on the TOMM than do non-depressed controls (Rees, Tombaugh, & Boulay, 2001; Ashendorf, Constantinou, & McCaffrey, 2004; Iverson, Le Page, Koehler, Shojania, & Badii, 2013), it may be that “hybrid” embedded indices integrating performance validity and self-reported mood symptomatology could better predict clinically exhibited response validity than measures assessing only performance validity. To our knowledge, this hypothesis has not been systematically explored.

The first goal of this paper is to evaluate the RBANS ES (Novitski et al., 2012) and RBANS EI (Silverberg et al., 2007) against two novel embedded validity indices in a sample of older Veterans referred for cognitive evaluation by a multidisciplinary outpatient clinic. The second goal is to examine whether a “hybrid” embedded validity index integrating both cognitive performance and self-reported mood symptoms might better predict invalid responding than a similar index using only scores on RBANS subtests.

## Method

### Participants

The sample was drawn from a larger sample of 830 individuals who were seen at an outpatient memory disorders clinic at a VA hospital in the southeastern United States. This clinic primarily serves geriatric patients, though occasionally younger patients are referred for symptoms suggesting early manifestation of dementia. Patients in this sample completed a brief, standardized neuropsychological screening evaluation as part of a larger, interdisciplinary evaluation. To minimize the risk of confounding the results of response validity testing with severe cognitive deficits (Teichner & Wagner, 2004), individuals who met criteria for dementia were excluded from this study. Thus, patients who met DSM-IV-TR criteria for dementia (including Alzheimer's disease, vascular dementia, Lewy body dementia, Korsakoff's syndrome, etc.; n = 427) were excluded. These diagnoses were based on consensus in the multidisciplinary evaluation, which included patient history, clinical interviews, full neurological examination, laboratory results, neuroimaging, and results of neuropsychological evaluation.

Memory disorders clinic personnel included a clinical neuropsychologist (MDH), a behavioral neurologist (DB), a clinical psychology intern (DP, among others), and one or more residents or geriatric psychiatry fellows. Diagnoses were assigned by interdisciplinary consensus using standardized criteria. The clinical neuropsychologist (MDH) made the primary clinical determination of valid or invalid responding, though other team members informed this decision by broadly commenting on credibility of presenting concerns. Determination of performance validity was made at the time of clinical service, prior to development of the present study's hypotheses. For this study, identification of valid or invalid responding was based on both TOMM Trial 2 scores (using standard cutoff scores) and behavioral observations made by the team. The TOMM has been found to be a useful measure of performance validity among older adults with non-dementia cognitive impairment (Teichner & Wagner, 2004), such as the participants retained in the final sample of this study. Behavior suggestive of invalid responding included neuropsychological test performance that was grossly disproportionate to the observed or reported functional status, unusual errors or error patterns across tests or test items, or impairment on testing that was inconsistent with known patterns of brain functioning. The latter criteria were implemented conservatively, meaning that only conspicuous departures from typical clinical presentations were considered possibly to represent invalid responding. For instance, a percentage of this sample were clearly functioning well independently (e.g., driving, managing finances and medication, and running errands), but performed in the severely impaired range across multiple indices of cognitive functioning. 
This strategy is consistent with both the multi-method approach recommended by Bush and colleagues (2005), and past research (Barker et al., 2010; Benitez, Horner, & Bachman, 2011; Bortnik, Horner, & Bachman, 2013; Horner, VanKirk, Dismuke, Turner, & Muzzy, 2014).

After respondents with incomplete data were excluded (n = 151), three groups were identified from the remaining 257 participants. Respondents in the valid responding group (n = 189, 73.5% of sample) had a passing TOMM performance as described in the test manual, and were judged to have demonstrated valid responding based on the behavioral criteria above. The “invalid response” group included 45 participants (19.3% of sample) who did not pass the TOMM based on the standardized test administration, and were judged to have exhibited invalid responding using the behavioral criteria above. The third group represented participants with indeterminate response validity, and included either patients who were identified as having valid responding based on behavioral evidence, but who did not pass the TOMM (n = 1, 0.4% of sample), or patients who did pass the TOMM but were considered to have demonstrated invalid responding by the above behavioral criteria (n = 17, 6.6% of sample). The relatively infrequent (7%) incongruity between the TOMM and the behavioral criteria of invalid responding listed above would be far greater had this step preceded the exclusion of participants who received dementia diagnoses. Patients with indeterminate response validity were excluded from the final sample. The final sample of 234 participants included 189 patients who demonstrated valid responding and 45 patients who demonstrated invalid responding. The determination of valid or invalid responding was made without knowledge of the RBANS ES, RBANS EI, RBANS PVI, or the CRIER scale, all of which were calculated only in preparation of this manuscript.

### Measures

The TOMM (Tombaugh, 1996) was administered and scored according to its standard instructions. This test consists of two trials and an optional retention trial. In the interests of ensuring routinized administration and preventing missing data, Trial 2 was consistently administered regardless of score on Trial 1. Only 2 patients scored below the author-recommended “below chance” cutoff on Trial 1 (Tombaugh, 1996), and neither of those patients passed Trial 2. The optional Retention trial was not administered to all patients, and was not used in identification of invalid responding in this study. For example, patients who obtained perfect scores on Trials 1 and 2, and who showed no behavioral evidence of invalid responding, were often not administered the Retention Trial. Published statistical norms for the TOMM indicate that the rate of false positives (patients inaccurately identified as responding invalidly) using the recommended clinical cutoff scores ranges from 0% for cognitively intact respondents to 7.2% for those with non-dementia cognitive impairment (Tombaugh, 1996).

The RBANS (Randolph, 1998) was administered and scored as described in its manual. The RBANS is a brief neuropsychological screening measure that includes 12 subscale scores that are clustered into five index scores (Immediate Memory, Delayed Memory, Attention, Language, and Visuospatial/Constructional) and a Total Scale score reflecting overall functioning.

Depressive symptoms were measured using the Geriatric Depression Scale (GDS; Yesavage et al., 1983). This index comprises 30 dichotomously scored items reflecting different aspects of depression. The score is calculated by summing the number of “depressive” responses. Scores between 10 and 19 are interpreted as reflecting mild depressive symptomatology. Scores of 20 or higher are interpreted as suggestive of severe depressive symptomatology (Yesavage et al., 1983).

### Procedures

A brief neuropsychological evaluation was administered by a clinical neuropsychologist, a predoctoral psychology intern, or occasionally by a geropsychologist. The battery included the TOMM, the RBANS, and GDS, among other tests. Results of these evaluations were then discussed at regular meetings of the memory disorders clinic personnel. A clinical neuropsychiatric evaluation was also performed independently of neuropsychological testing that included a Mini-Mental State Exam, clock drawing, a brief mental status examination, and neurologic examination.

The RBANS ES and RBANS EI were calculated as described by Novitski et al. (2012) and Silverberg et al. (2007), respectively. Past research using the same clinic population as in this study identified the optimal RBANS EI cutoff score at 1 (Barker et al., 2010) so that cutoff was employed in this study.

Two additional embedded response validity scales were then identified by capitalizing on quantitative differences observed between respondents with valid and invalid performance. As with some other embedded validity indices (Silverberg et al., 2007; Meyers et al., 2011; Schutte et al., 2011), these two indices were developed using the rationale that patients who demonstrate invalid performance often produce consistently low scores across measures of disparate cognitive domains. A series of independent-samples t-tests was completed to identify RBANS subtests with mean score differences related to response validity. These t-tests contrasted patients who provided valid responses based on the TOMM and behavioral criteria with those who presented with invalid responding on these metrics. Respondents with valid responding performed better on RBANS List Recall, Story Recall, Figure Recall, Digit Span, and List Recognition.

The first of the embedded indices created for this study, the RBANS Performance Validity Index (RBANS PVI), was calculated using only the RBANS subtests found to differ between these groups, as shown in Equation (2).

(2)
$\text{RBANS PVI} = \text{List Recall} + \text{Story Recall} + \text{Figure Recall} + \text{Digit Span} + \text{List Recognition}$

Receiver–operator characteristics, described in Results, indicated that invalid responding was optimally identified by RBANS PVI scores under 42.

Additional independent-samples t-tests found significantly higher scores on the GDS among patients with invalid responding when compared with those who responded validly. The Charleston Revised Index of Effort for the RBANS (CRIER) was calculated by subtracting GDS score from the RBANS PVI score so as to maximize the difference between index scores of patients demonstrating valid and invalid responding. The CRIER is calculated using these indices as illustrated in Equation (3).

(3)
$\text{CRIER} = \text{List Recall} + \text{Story Recall} + \text{Figure Recall} + \text{Digit Span} + \text{List Recognition} - \text{GDS}$

A cutoff score for the CRIER was identified based on the observed distribution of invalid responding with respect to scores of this index. Based on receiver–operator characteristics (described in Results), scores <24 were identified as suggestive of invalid responding.
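For readers who want to compute the two new indices, the following sketch implements Equations (2) and (3) with the cutoffs reported above (RBANS PVI <42, CRIER <24). The function names and the example scores are hypothetical and chosen only for illustration; inputs are assumed to be RBANS raw subtest scores and the 30-item GDS total.

```python
PVI_CUTOFF = 42    # scores below this suggested invalid responding in this sample
CRIER_CUTOFF = 24  # scores below this suggested invalid responding in this sample


def rbans_pvi(list_recall, story_recall, figure_recall, digit_span, list_recognition):
    """Equation (2): sum of the five RBANS raw subtest scores."""
    return list_recall + story_recall + figure_recall + digit_span + list_recognition


def crier(list_recall, story_recall, figure_recall, digit_span, list_recognition, gds):
    """Equation (3): RBANS PVI minus the GDS total score."""
    pvi = rbans_pvi(list_recall, story_recall, figure_recall, digit_span, list_recognition)
    return pvi - gds


# Hypothetical respondent with low cognitive scores and elevated GDS.
pvi_score = rbans_pvi(1, 3, 5, 7, 14)
crier_score = crier(1, 3, 5, 7, 14, gds=15)
print(pvi_score, pvi_score < PVI_CUTOFF)        # 30 True
print(crier_score, crier_score < CRIER_CUTOFF)  # 15 True
```

Because the CRIER is the PVI minus the GDS score, a respondent with the same cognitive scores moves closer to the cutoff as self-reported depressive symptoms increase.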

## Results

In comparison with excluded participants (n = 588), included participants (n = 234) were more likely to identify as represented by an ethnic minority, younger with an earlier age of cognitive symptom onset, and better educated, with higher mean scores across all administered cognitive measures, and higher GDS scores (Table 1). Participants included in this sample were assigned diagnoses of major depressive disorder (12.4%), other mood disorder (12%), mild cognitive impairment (25.2%), or “other diagnosis” (3.4%). Diagnoses were deferred for 15.4% of the sample, and no diagnosis was assigned for 31.6% of the sample. Nonetheless, the RBANS total scores of respondents with valid responding ranged from 52 to 116, distributed essentially normally around the group mean of 85. Patients with invalid responses were, in comparison with those with valid responses, younger with an earlier reported age of cognitive symptom onset, lower mean scores on all administered cognitive measures, and higher mean GDS scores (see Table 2).

Table 1.

Description of participants from sample included and excluded from this study

| Variable | Included (N = 234): Range | Included (N = 234): Mean (SD) or % | Excluded (N = 588): Range | Excluded (N = 588): Mean (SD) or % | Comparison: t or (χ²) |
| --- | --- | --- | --- | --- | --- |
| % Female | | 4.3% | | 1.9% | (4.0*) |
| % Minority | | 25.8% | | 31.7% | (2.8) |
| Age | 41–87 | 68.2 (9.9) | 18–89 | 73.0 (9.7) | 6.3** |
| Age at onset | 18–86 | 63.9 (11.8) | 2–92 | 69.4 (11.7) | 5.8** |
| Years of education | 2–20 | 12.8 (3.1) | 1–20 | 12.0 (3.4) | 3.0** |
| MMSE score | 10–30 | 26.6 (3.3) | 1–30 | 23.3 (5.1) | 10.6** |
| RBANS subtests (raw scores) | | | | | |
| Digit span | 4–16 | 9.6 (2.4) | 0–16 | 8.4 (2.2) | 6.0** |
| List recall | 0–9 | 2.6 (2.3) | 0–10 | 15.2 (5.6) | 11.4 |
| Story delayed recall | 0–12 | 6.0 (3.2) | 7–20 | 3.5 (3.1) | 10.1** |
| Figure recall | 0–20 | 9.7 (4.9) | 0–19 | 5.5 (4.8) | 10.9** |
| List recognition | 6–20 | 17.2 (2.8) | 0–10 | 15.5 (3.2) | 7.3** |
| RBANS Total Score | 43–116 | 80.3 (15.5) | 41–123 | 66.9 (14.3) | 11.3** |
| Geriatric Depression Scale | 0–30 | 11.3 (7.6) | 0–29 | 9.1 (6.7) | 3.8** |

Notes: RBANS = Repeatable Battery for the Assessment of Neuropsychological Status; MMSE = Mini-Mental State Exam.

*p < .05; **p < .001.

Table 2.

Description of participants demonstrating valid and invalid responding, respectively

| Variable | Valid responding (n = 189): Range | Valid responding (n = 189): Mean (SD) or % | Invalid responding (n = 45): Range | Invalid responding (n = 45): Mean (SD) or % | Comparison: t or (χ²) | Cohen's d |
| --- | --- | --- | --- | --- | --- | --- |
| % Female | | 4.2% | | 4.4% | (0.004) | |
| % Minority | | 18.0% | | 59.1% | (31.6)** | |
| Age | 47–87 | 69.2 (9.4) | 41–87 | 64.2 (10.9) | 3.1* | 0.41 |
| Age at onset | 20–86 | 65.2 (10.7) | 18–86 | 59.1 (14.3) | 3.2* | 0.43 |
| Years of education | 2–20 | 12.9 (3.2) | 6–19 | 12.3 (2.9) | 1.1 | 0.15 |
| MMSE score | 21–30 | 27.5 (2.2) | 10–30 | 23.1 (4.9) | 5.7** | 1.71 |
| RBANS subtests (raw scores) | | | | | | |
| Digit span | 5–16 | 10.5 (2.3) | 4–12 | 7.5 (2.1) | 6.8** | 0.89 |
| List recall | 0–9 | 2.9 (2.2) | 0–7 | 1.2 (1.9) | 6.6** | 0.87 |
| Story delayed recall | 0–12 | 6.7 (2.9) | 0–10 | 2.9 (2.8) | 8.0** | 1.05 |
| Figure recall | 0–20 | 10.85 (4.3) | 0–15 | 4.9 (4.4) | 8.3** | 1.09 |
| List recognition | 12–20 | 18.0 (2.0) | 6–18 | 13.87 (3.3) | 8.1** | 2.26 |
| RBANS Total Score | 52–116 | 85.0 (12.2) | 43–80 | 59.9 (10.7) | 12.6** | 1.65 |
| Geriatric Depression Scale | 0–30 | 10.5 (7.1) | 1–28 | 14.7 (8.4) | 3.0* | 0.79 |

Notes: RBANS = Repeatable Battery for the Assessment of Neuropsychological Status; MMSE = Mini-Mental State Exam.

*p < .01; **p < .001.

The RBANS ES was calculated and applied as described above. Using the recommended cutoff score of 12 (Novitski et al., 2012), sensitivity for invalid responding was 0.42 and specificity was 0.71. The AUC of the RBANS ES was 0.71 (Fig. 1A), and the curve did not reveal alternate cut-points with substantively improved sensitivity and specificity. Using a cutoff score of 1 (Silverberg et al., 2007), the RBANS EI produced sensitivity of 0.93, specificity of 0.63, and AUC of 0.883. The RBANS PVI was found to have sensitivity of 0.82, specificity of 0.77, and AUC of 0.90. Finally, the CRIER had sensitivity of 0.84, specificity of 0.90, and AUC of 0.94. Confusion matrices are shown in Table 3 and ROC curves are shown in Fig. 1.

Table 3.

Confusion matrices for (a) the RBANS ES, (b) The RBANS EI, (c) the RBANS PVI, and (d) the Charleston Revised Index of Effort for the RBANS (CRIER)

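| | Responding: Valid | Responding: Invalid | Total | | Responding: Valid | Responding: Invalid | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (a) RBANS ES | | | | (b) RBANS EI | | | |
| Pass | 134 | 26 | 160 | Pass | 119 | 3 | 122 |
| Fail | 55 | 19 | 74 | Fail | 70 | 42 | 112 |
| Total | 189 | 45 | 234 | Total | 189 | 45 | 234 |
| Sen. | 0.42 | | | Sen. | 0.93 | | |
| Spec. | 0.71 | | | Spec. | 0.63 | | |
| PPV | 0.26 | | | PPV | 0.38 | | |
| NPV | 0.84 | | | NPV | 0.98 | | |
| (c) RBANS PVI | | | | (d) CRIER | | | |
| Pass | 145 | 8 | 153 | Pass | 171 | 7 | 178 |
| Fail | 44 | 37 | 81 | Fail | 18 | 38 | 56 |
| Total | 189 | 45 | 234 | Total | 189 | 45 | 234 |
| Sen. | 0.82 | | | Sen. | 0.84 | | |
| Spec. | 0.77 | | | Spec. | 0.90 | | |
| PPV | 0.46 | | | PPV | 0.68 | | |
| NPV | 0.95 | | | NPV | 0.96 | | |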

Notes: Sen. = sensitivity; Spec. = specificity; PPV = positive predictive value; NPV = negative predictive value.
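The summary statistics in Table 3 follow directly from the cell counts. As a minimal sketch (the type and function names are hypothetical), the code below recomputes sensitivity, specificity, PPV, and NPV for the RBANS ES and the CRIER, treating a "Fail" on the index as a positive result for invalid responding. The CRIER count of 7 invalid responders who passed is taken as the row total minus the valid count (178 − 171).

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    pass_valid: int    # true negatives: valid responders who passed the index
    pass_invalid: int  # false negatives: invalid responders who passed
    fail_valid: int    # false positives: valid responders who failed
    fail_invalid: int  # true positives: invalid responders who failed


def validity_metrics(c: Confusion) -> dict:
    """Classification statistics with 'fail' as the positive test result."""
    return {
        "sensitivity": c.fail_invalid / (c.fail_invalid + c.pass_invalid),
        "specificity": c.pass_valid / (c.pass_valid + c.fail_valid),
        "ppv": c.fail_invalid / (c.fail_invalid + c.fail_valid),
        "npv": c.pass_valid / (c.pass_valid + c.pass_invalid),
    }


rbans_es_counts = Confusion(pass_valid=134, pass_invalid=26, fail_valid=55, fail_invalid=19)
crier_counts = Confusion(pass_valid=171, pass_invalid=7, fail_valid=18, fail_invalid=38)

for name, counts in [("RBANS ES", rbans_es_counts), ("CRIER", crier_counts)]:
    stats = validity_metrics(counts)
    print(name, {k: f"{v:.2f}" for k, v in stats.items()})
```

Note that PPV and NPV depend on the base rate of invalid responding (45 of 234 here), so they would not be expected to carry over unchanged to settings with a different base rate.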

Fig. 1.

Receiver–operator characteristic curves for (A) the RBANS ES, (B) the RBANS EI, (C) RBANS PVI, and (D) the Charleston Revised Index of Effort for the RBANS.


Hanley and McNeil's (1983) procedure was used for evaluating the significance of differences between AUCs derived from this single sample. Results of these analyses, illustrated in Table 4, suggest that the RBANS EI, RBANS PVI, and the CRIER all produced better receiver-operator characteristics than did the RBANS ES. Notably, the CRIER was also superior to the RBANS PVI, suggesting that inclusion of the GDS in the CRIER significantly improved its clinical utility.

Table 4.

Quantitative comparisons of AUCs for the RBANS ES, RBANS EI, RBANS PVI, and CRIER

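| Index 1 | Index 2 | AUC1 | AUC2 | SE1 | SE2 | R | Z |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RBANS ES | RBANS EI | 0.71 | 0.88 | 0.03 | 0.03 | −0.44 | −3.23* |
| RBANS ES | CRIER | 0.71 | 0.94 | 0.03 | 0.01 | 0.39 | −7.61** |
| RBANS ES | RBANS PVI | 0.71 | 0.90 | 0.03 | 0.02 | 0.30 | −5.42** |
| RBANS EI | CRIER | 0.88 | 0.94 | 0.03 | 0.01 | −0.68 | −1.44 |
| RBANS EI | RBANS PVI | 0.88 | 0.90 | 0.03 | 0.02 | −0.73 | −0.28 |
| CRIER | RBANS PVI | 0.94 | 0.90 | 0.01 | 0.02 | 0.85 | 3.15* |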

Notes: *p < .01; **p < .001.
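The Z statistic for comparing two AUCs derived from the same cases takes the general form of a difference divided by its standard error, with a correction for the correlation R between the two areas. The sketch below illustrates that form; the function names are hypothetical, R is taken as given (in the original procedure it is obtained from a published lookup table), and the inputs are the rounded values from the CRIER versus RBANS PVI row of Table 4. Because those tabled inputs are rounded, recomputed values should be expected to differ somewhat from the reported Z values.

```python
import math


def correlated_auc_z(auc1, auc2, se1, se2, r):
    """Z for the difference between two correlated AUCs.

    Z = (AUC1 - AUC2) / sqrt(SE1^2 + SE2^2 - 2 * r * SE1 * SE2)
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    return (auc1 - auc2) / se_diff


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# CRIER vs. RBANS PVI, using the rounded values shown in Table 4.
z = correlated_auc_z(auc1=0.94, auc2=0.90, se1=0.01, se2=0.02, r=0.85)
print(f"Z = {z:.2f}, p = {two_sided_p(z):.4f}")  # Z is about 3.16
```

A high positive R, as in this comparison, shrinks the standard error of the difference, which is why a 0.04 difference in AUC can reach significance when both indices are computed on the same patients.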

Though the CRIER's balanced sensitivity and specificity are encouraging, other cutoffs may be preferable in situations demanding even greater specificity. To provide context to the reported ROC curves, a series of cross-tabs were generated examining sensitivity and specificity using varying CRIER cutoff values. Those results (Table 5) suggest a cutoff of <21 does elevate specificity to 0.93 while preserving modest sensitivity (0.69). The use of lower cutoffs reduces measure sensitivity beyond acceptable levels, and is not recommended for this measure. Alternate cutoff values for the RBANS ES (Novitski et al., 2012) and RBANS EI (Silverberg et al., 2007) are discussed in their respective publications. Alternative cutoff values for the RBANS EI were also examined in another study (Barker et al., 2010) using a dataset that largely overlaps with that used in this study.

Table 5.

Sensitivity and specificity values associated with alternate CRIER cutoff values

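| Cutoff | Sensitivity | Specificity |
| --- | --- | --- |
| <24 | 0.84 | 0.90 |
| <23 | 0.80 | 0.92 |
| <22 | 0.73 | 0.92 |
| <21 | 0.69 | 0.93 |
| <20 | 0.69 | 0.93 |
| <19 | 0.53 | 0.94 |
| <18 | 0.44 | 0.95 |
| <17 | 0.36 | 0.96 |
| <16 | 0.33 | 0.97 |
| <15 | 0.33 | 0.98 |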
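A cutoff table of this kind can be produced by sweeping candidate cutoffs over the index scores and cross-tabulating against the criterion groups. The following is a minimal sketch with a hypothetical function name and a small made-up data set; it does not use the study data and will not reproduce the values in Table 5.

```python
def sweep_cutoffs(scores, is_invalid, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each 'score < cutoff' rule."""
    n_invalid = sum(is_invalid)
    n_valid = len(is_invalid) - n_invalid
    rows = []
    for cutoff in cutoffs:
        flagged = [score < cutoff for score in scores]
        # True positives: invalid responders flagged by the cutoff.
        tp = sum(f and inv for f, inv in zip(flagged, is_invalid))
        # True negatives: valid responders not flagged by the cutoff.
        tn = sum((not f) and (not inv) for f, inv in zip(flagged, is_invalid))
        rows.append((cutoff, tp / n_invalid, tn / n_valid))
    return rows


# Made-up index scores and criterion labels, for illustration only.
toy_scores = [10, 18, 22, 25, 30, 35, 40, 45]
toy_invalid = [True, True, True, False, True, False, False, False]

for cutoff, sens, spec in sweep_cutoffs(toy_scores, toy_invalid, [31, 24, 20]):
    print(f"<{cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Lowering the cutoff flags fewer respondents, so sensitivity can only fall or stay the same while specificity can only rise or stay the same, which is the trade-off visible in Table 5.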

## Discussion

In this sample of older adults referred for evaluation of memory disorders, the RBANS ES did not clearly discriminate between valid and invalid responding. However, the CRIER, a new embedded performance validity index utilizing RBANS and GDS scores, quite reliably distinguished valid from invalid responding in our sample. The CRIER and RBANS PVI, both created for this study, are distinguished from each other only by the inclusion (subtraction) of GDS score. Nonetheless, this combination of cognitive subtests previously used in embedded response validity measures (RBANS EI, RBANS ES) and the GDS better predicted assessed performance validity than did the cognitive subtests alone. The CRIER and RBANS EI did not significantly differ with respect to percentage of overall variance in performance validity accounted for. The CRIER is easier to calculate and produced superior specificity in this sample of older adults, though the RBANS EI did produce modestly better sensitivity.

These results underscore the clinical utility of embedded response validity testing as a supplementary strategy for the identification of invalid responding, particularly with older adults. The notably high rate of invalid responding in this sample contradicts the intuitive expectation that, lacking obvious incentives for embellishing the presentation of cognitive complaints and facing limitations on independence imposed by dementia, older adults would complete all cognitive measures to the best of their ability. In fact, these findings identify invalid responding as a common threat to the validity of evaluation for memory disorders. Embedded measures are one method for evaluating performance validity in memory disorders clinics where brief neuropsychological evaluations might be tightly scheduled between evaluations by other professionals.

It is interesting that, in this sample, the RBANS ES so poorly distinguished between valid and invalid responding, though this scale produced far better results in other studies (Dunham et al., 2012; Novitski et al., 2012). One possible reason for this may be sample selection. Individuals identified by Novitski and colleagues (2012) as having invalid responding included neurologically normal adults with a history of a blow to the head but no substantive loss of consciousness. In the present study, the invalid response group was not identified at the time of study enrollment, but rather at the time of clinical evaluation. In contrast to Novitski and colleagues' sample of mild TBI patients (mean age = 49.0 years), the invalidly responding group in this study was much older and, like the subsample with valid responses, was presenting to a memory disorders clinic. Given the known influence of external incentives on patient presentation (Flaro, Green, & Robertson, 2007), it is a strength that this study compares validly and invalidly responding patients with common referral questions and clinical evaluations.

It is possible that the very different populations of patients with invalid responding represented by these samples may present with very different motivations (Rogers, Salekin, Sewell, Goldstein, & Leonard, 1998). While it is unclear what invalid performance reflects among patients in a memory disorders clinic, brief neuropsychological evaluations in this study were not related to the determination of service-connected disability, and were not performed for forensic reasons; there was no obvious financial motivation for invalid responding. For some of these patients, invalid responding may reflect factors such as apathy, sickness behavior, or an effort to ensure that troubling symptoms do not go unrecognized. Whatever the case, invalid responding is a highly complex clinical behavior that can be characterized by atypical patient performance across multiple behavioral domains. Thus, an emerging literature suggests invalid responding may be best identified by the presence of multiple indicators (Slick, Sherman, & Iverson, 1999). For instance, past work has identified variable performance on neurocognitive measures among patients with pain-related malingering (Bianchini, Greve, & Glynn, 2005), and MMPI-2 profile characteristics consistent with mild traumatic brain injury (Greve, Bianchini, Love, Brennan, & Heinly, 2014). Similarly, Russo (2012) has argued that identification of invalid responding among OEF/OIF veterans could be improved by integrating across multiple data sources. In the context of these findings, perhaps it is less surprising that the combination of subjective mood and cognitive variables emerged in this study as predictive of invalid responding. It should also be considered that participants may be reacting to a diagnosis threat created by receiving a memory disorders clinic referral (Suhr & Gunstad, 2002). 
Novitski and colleagues' (2012) comparison group included 69 older adults (mean age = 80.1 years) with either amnestic MCI or probable Alzheimer's disease to whom no formal measures of performance validity were administered. In contrast, the comparison group used in this study included older adults who exhibited valid responding and did not meet clinical criteria for a dementia diagnosis.

The findings suggest an interesting hypothesis about the influence of sample selection on the findings of symptom and performance validity studies. In particular, Novitski and colleagues' (2012) finding that disparities between recall and recognition predict other measures of performance validity may be highly informative for identifying invalid responding among patients with TBI, while memory disorders clinic patients may express their desire for symptom identification in ways that are better characterized by the blending of performance and symptom validity measures, as was done here. In addition to validating this measure across other samples, future research might explore these differences between samples and the patient populations they represent. One strategy may be to characterize the construct of “invalid responding” by identifying psychosocial correlates including perceived access to resources, or the presence of unmet social needs. The absence of such variables is one limitation of this study.

Other limitations of this study include the lack of consensus on how to delineate invalid responding in research samples. In this study, identification of invalid responding was largely based on a single measure (the TOMM), with only moderate sensitivity (Gervais, Rohling, Green, & Ford, 2004). Thus, it remains possible that the valid response group may include some respondents with more severe MCI or misidentified participants who in fact exhibited marginally valid responding. If so, this may explain the broad range of RBANS raw scores among patients identified as having demonstrated valid responding. Nonetheless, the conceptualization of invalid responding used in this study is consistent with evaluation strategies used among practicing clinical neuropsychologists, and with the multi-method approach recommended by Bush et al. (2005). Another limitation of this study is that the sample was composed exclusively of older Veterans, the vast majority of whom were male. These findings may generalize to similar patient populations, but their generalizability to other patient populations is unclear. Future research should evaluate the utility of these embedded indices in patient populations that are more heterogeneous with respect to ethnicity and education, for example. Furthermore, it is unknown how variables such as VA service-connected disability might be related to performance validity in this sample, and future research might address this important question. Another unknown factor is how artificially dichotomizing performance validity into “valid responding” and “invalid responding” groups may exaggerate the apparent difference between these groups. Though exclusion of participants with indeterminate performance validity reduced the sample size by only 7% after other exclusionary criteria were applied, this remains an important question in our ongoing efforts to generalize research on response validity to clinical practice.
Additionally, the identification of invalid responding based on substantive discrepancies between cognitive test performance and daily functioning may artificially dichotomize the construct of response validity, thereby overestimating the relationship between RBANS test scores and assessed performance validity. This risk is minimal in this sample, as only one participant was excluded for demonstrating behavioral evidence of invalid responding despite a passing TOMM score. Nonetheless, these risks should be carefully evaluated in future research. Another limitation of this study is that the CRIER was not evaluated on a second sample. Future research should work to replicate these findings across samples using additional or alternative performance validity measures.
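
The concern about dichotomizing performance validity can be illustrated with a small simulation. The sketch below is not an analysis of the study data: it assumes a hypothetical continuous "invalidity" trait, a hypothetical embedded index equal to that trait plus noise, and an arbitrary exclusion band (`band`) that removes far more cases than the 7% excluded here. Under those assumptions, dropping indeterminate cases raises the apparent AUC of the index.

```python
import random


def auc(pos, neg):
    """Probability that a random case from `pos` scores higher than a
    random case from `neg`; ties count as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def simulate(n=1000, band=0.5, seed=0):
    """Return (AUC in the full sample, AUC after excluding indeterminate cases).

    All quantities are hypothetical: `latent` is an assumed continuous
    invalidity trait and `index` is an assumed embedded index that tracks
    it imperfectly.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)
        index = latent + rng.gauss(0.0, 1.0)
        cases.append((latent, index))

    # Full sample: every case is forced into one of two groups at latent = 0.
    full_invalid = [idx for lat, idx in cases if lat > 0]
    full_valid = [idx for lat, idx in cases if lat <= 0]

    # Trimmed sample: cases with |latent| <= band are treated as
    # indeterminate and excluded before the groups are compared.
    trim_invalid = [idx for lat, idx in cases if lat > band]
    trim_valid = [idx for lat, idx in cases if lat < -band]

    return auc(full_invalid, full_valid), auc(trim_invalid, trim_valid)


if __name__ == "__main__":
    full, trimmed = simulate()
    print(f"AUC, full sample:    {full:.3f}")
    print(f"AUC, trimmed sample: {trimmed:.3f}")
```

Narrowing `band` toward zero shrinks the gap between the two estimates, which is consistent with the expectation that a small exclusion rate carries a correspondingly small risk of inflation.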

In conclusion, this study presents a sensitive, specific, and easy-to-compute validity index that shows promise for clinical application. In combination with the above-cited performance validity research, these findings suggest that neuropsychological characteristics of invalid responding may vary between patient populations.

## References

Ashendorf, L., Constantinou, M., & McCaffrey, R. J. (2004). The effect of depression and anxiety on the TOMM in community-dwelling older adults. Archives of Clinical Neuropsychology, 19, 125-130.

Ashendorf, L., O'Bryant, S. E., & McCaffrey, R. J. (2003). Specificity of malingering detection strategies in older adults using the CVLT and WCST. The Clinical Neuropsychologist, 17(2), 255-262.

Barker, M. D., Horner, M. D., & Bachman, D. L. (2010). Embedded indices of effort in the repeatable battery for the assessment of neuropsychological status (RBANS) in a geriatric sample. The Clinical Neuropsychologist, 24, 1064-1077.

Benitez, A., Horner, M. D., & Bachman, D. L. (2011). Intact cognition in depressed elderly veterans providing adequate effort. Archives of Clinical Neuropsychology, 26, 184-193.

Bianchini, K. J., Greve, K. W., & Glynn, G. (2005). On the diagnosis of malingered pain-related disability: Lessons from cognitive malingering research. The Spine Journal, 5, 404-417.

Birath, J. B., MacKillop, E. A., & Horner, M. D. (2013). Standard Neuropsychological Measures as Embedded Indicators of Effort in a Clinical Sample. Journal of the International Neuropsychological Society, 19(S1), 161-162.

Bortnik, K. E., Horner, M. D., & Bachman, D. L. (2013). Performance on standard indices of effort among patients with dementia. 20(4), 233-242.

Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. (2005). Symptom validity assessment: Practice issues and medical necessity. Archives of Clinical Neuropsychology, 20(4), 419-426.

Dean, A. C., Victor, T. L., Boone, K. B., Philpott, L. M., & Hess, R. A. (2009). Dementia and effort test performance. The Clinical Neuropsychologist, 23(1), 133-152.

Demakis, G. J., Gervais, R. O., & Rohling, M. L. (2008). The effect of failure on cognitive and psychological symptom validity tests in litigants with symptoms of post-traumatic stress disorder. The Clinical Neuropsychologist, 22, 879-895.

Dunham, K. J., S., Sofky, C., & Denney, R. L. (2012). Preliminary look: Comparison of the Effort Index and the Effort Scale on the RBANS. Paper presented at the 32nd Annual Conference of the National Academy of Neuropsychology, Nashville, TN.

Flaro, L., Green, P., & Robertson, E. (2007). Word memory Test failure 23 times higher in mild brain injury than in patients seeking custody: The power of external incentives. Brain Injury, 21(4), 373-383.

Foa, E. B., Riggs, D. S., Dancu, C. V., & Rothbaum, B. O. (1993). Reliability and validity of a brief instrument for assessing post-traumatic stress disorder. Journal of Traumatic Stress, 6(4), 459-473.

Gervais, R. O., Rohling, M. L., Green, P., & Ford, W. (2004). A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology, 19(4), 475-487.

Gfeller, J. D., & Roskos, P. T. (2013). A comparison of insufficient effort rates, neuropsychological functioning, and neuropsychiatric symptom reporting in military veterans and civilians with chronic traumatic brain injury. Behavioral Sciences and the Law, 31(6), 833-849.

Green, P. (2003). Word Memory Test for Windows: Users manual and program. Edmonton: Green's Publishing.

Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6(3), 218-224.

Greve, K. W., Bianchini, K. J., Love, J. M., Brennan, A., & Heinly, M. T. (2014). Sensitivity and specificity of MMPI-2 validity scales and indicators to malingered neurocognitive dysfunction in traumatic brain injury. The Clinical Neuropsychologist, 20, 491-512.

Hanley, J. A., & MacNeil, B. J. (1983). A method of comparing the areas under receiver operating characteristic curves derived from the same cases. 148(3), 839-843.

Hook, J. N., Marquine, M. J., & Hoelzle, J. B. (2009). Repeatable Battery for the Assessment of Neuropsychological Status Effort Index performance in a medically ill geriatric sample. Archives of Clinical Neuropsychology, 24(3), 231-235.

Horner, M. D., VanKirk, K. K., Dismuke, C. E., Turner, T. H., & Muzzy, W. (2014). Inadequate effort on neuropsychological evaluation is associated with increased healthcare utilization. The Clinical Neuropsychologist, 28(5), 703-713.

Iverson, G. L., Le Page, J., Koehler, B. E., Shojania, K., M. (2013). Test of Memory Malingering (TOMM) scores are not affected by chronic pain or depression in patients with fibromyalgia. The Clinical Neuropsychologist, 21, 532-546.

Larrabee, G. J. (2012). Performance validity and symptom validity in neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 1-7.

Meyers, J. E., Volbrecht, M., Axelrod, B. N., & Reinsch-Boothby, L. (2011). Embedded symptom validity tests and overall neuropsychological test performance. Archives of Clinical Neuropsychology, 26(1), 8-15.

Novitski, J., Steele, S., Karantzoulis, S., & Randolph, C. (2012). The Repeatable Battery for the Assessment of Neuropsychological Status Effort Scale. Archives of Clinical Neuropsychology, 27, 190-195.

Randolph, C. (1998). Repeatable battery for the assessment of neuropsychological status manual. San Antonio, TX: The Psychological Corporation.

Rees, L. M., Tombaugh, T. N., & Boulay, L. (2001). Depression and the test of memory malingering. Archives of Clinical Neuropsychology, 16, 501-506.

Rogers, R., Kropp, P. R., Bagby, R. M., & Dickens, S. E. (1992). Faking specific disorders: A study of the structured interview of reported symptoms (SIRS). Journal of Clinical Psychology, 48(5), 643-648.

Rogers, R., Salekin, R. T., Sewell, K. W., Goldstein, A., & Leonard, K. (1998). A comparison of forensic and nonforensic malingerers: A prototypical analysis of explanatory models. Law and Human Behavior, 22(4), 353-367.

Russo, A. C. (2012). Symptom validity test performance and consistency of self-reported memory functioning of Operation Enduring Freedom/Operation Iraqi Freedom Veterans with positive Veteran Health Administration comprehensive traumatic brain injury evaluations. Archives of Clinical Neuropsychology, 27(8), 840-848.

Schutte, C., Millis, S., Axelrod, B., & VanDyke, S. (2011). Derivation of a composite measure of embedded symptom validity indices. The Clinical Neuropsychologist, 25(3), 454-462.

Silverberg, N. D., Wertheimer, J. C., & Fichtenberg, N. L. (2007). An effort index for the repeatable battery for the assessment of neuropsychological status (RBANS). The Clinical Neuropsychologist, 21(5), 841-854.

Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13(4), 545-561.

Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detecting malingering: A survey of experts’ practices. Archives of Clinical Neuropsychology, 19, 465-473.

Suhr, J. A., & Gunstad, J. (2002). “Diagnosis threat:” The effect of negative expectations on cognitive performance in head injury. Journal of Clinical and Experimental Neuropsychology, 24(4), 448-457.

Teichner, G., & Wagner, M. T. (2004). The test of memory malingering (TOMM): Normative data from cognitively intact, cognitively impaired, and elderly patients with dementia. Archives of Clinical Neuropsychology, 19(3), 455-464.

Tombaugh, T. N. (1996). The Test of Memory Malingering. MultiHealth Systems.

Tombaugh, T. N. (1997). The Test of Memory Malingering (TOMM): Normative data from cognitively intact and cognitively impaired individuals. Psychological Assessment, 9(3), 260-268.

Wygant, D. B., Sellbom, M., Ben-Porath, Y. S., Stafford, K., Freeman, D. B., & Heilbronner, R. L. (2007). The relationship between symptom validity testing and MMPI-2 scores as a function of forensic evaluation context. Archives of Clinical Neuropsychology, 22(4), 489-499.

Yesavage, J. A., Brink, T. L., Rose, T. L., Lum, O., Huang, V., M., et al. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17(1), 37-49.