Abstract

The present study provides the first meta-analysis of research on stand-alone neurocognitive feigning tests since publication of the preceding review by Vickery, Berry, Inman, Harris, and Orey (2001). Studies of dedicated neurocognitive feigning test performance in adults appearing in published or unpublished (theses and dissertations) sources through October 2010 were reviewed and subjected to stringent inclusion criteria to maximize the validity of results. Neurocognitive feigning tests were included only if at least three contrasts of criterion-supported honest patient groups and feigners were available. Tests that met criteria for review included the Victoria Symptom Validity Test, used as an anchor for comparison with Vickery and colleagues' results, as well as the Test of Memory Malingering, Word Memory Test, Letter Memory Test, and Medical Symptom Validity Test. Effect sizes and test parameters at published cut scores were compiled and compared. Results reflected large effect sizes for all measures (mean d = 1.55, 95% confidence interval [CI] = 1.48–1.63). Mean specificity was 0.90 (95% CI = 0.85–0.94), and mean sensitivity was 0.69 (95% CI = 0.63–0.75). Several moderators of effect size were identified, with certain manipulations weakening effect sizes. Unexpectedly, warning simulators to feign believably increased effect sizes.

Introduction

A meta-analysis published by Vickery and colleagues (2001) in Archives of Clinical Neuropsychology examined the state of research support for tests dedicated to identifying neurocognitive feigning. The goal of that review was not to crown a “gold standard,” but to provide a quantitative analysis of how well the available measures might meet clinicians' needs. Fourteen feigning tests were identified at that time, of which only a few were well studied. In fact, just five (variations on the Hiscock and Hiscock Digit Memory Test [DMT; Hiscock & Hiscock, 1989], the Portland Digit Recognition Test [PDRT; Binder, 1993; Binder & Willis, 1991], the Dot Counting Test and 15-Item Test [both Lezak, 1976], and the 21-Item Test [Iverson, Franzen, & McCracken, 1991]) supported coding of at least three independent effect sizes, the minimal requirement for inclusion set by Vickery and colleagues. Using rather broad-based criteria (evaluations comparing one or more groups of individuals at least inferred to be responding honestly with another group at least inferred to be exaggerating or fabricating), Vickery and colleagues' analysis demonstrated that three measures were particularly useful at separating the groups. The DMT was found to distinguish probably honest and probably feigning individuals by approximately two pooled standard deviations, the best of the measures examined. The 21-Item Test and PDRT also demonstrated strength, separating groups by almost 1.5 SD each, with a mean effect size across all five tests of 1.15. Although this aspect of the measures' performance can be viewed as “good news,” clinical choice of measures is more directly affected by a test's operating characteristics: sensitivity for detecting a feigning individual, specificity for classifying an honest performer, and overall hit rate. At the time of Vickery and colleagues' assessment, the literature reviewed suggested that, as a group, these measures classified honest individuals with statistically equivalent specificity, at a mean value of 95.7%. Sensitivity was more variable, however (overall mean value = 56.0%), ranging from 22.0% for the 21-Item Test to 83.4% for the DMT. Thus, although clinicians and clients could be confident that effortful performance would only rarely be misclassified using a single instrument, a distinct limitation demonstrated by most of these instruments was modest sensitivity to feigning.

In the decade since completion of the Vickery and colleagues meta-analysis, interest in the development and validation of effort measures has grown, as evidenced by substantial additional publications in the literature. Further, a position paper published by the National Academy of Neuropsychology (Bush et al., 2005) urged that “the assessment of symptom validity is an essential part of a neuropsychological evaluation,” and that “the clinician should be prepared to justify a decision not to assess symptom validity as part of a neuropsychological evaluation” (p. 421). The use of these measures has also been spurred by the 1993 Daubert federal ruling on the admissibility of expert scientific opinion (Daubert v. Merrell Dow Pharmaceuticals Inc., 509 U.S. 579, 1993). Finally, the need for these tests has been heightened by increasing evidence for fairly high base-rate estimates of suboptimal effort in clinical practice, ≥40% in some settings (see Larrabee, 2003; Mittenberg, Patton, Canyock, & Condit, 2002). In fact, these prevalence rates may be underestimates, as most tests for neurocognitive feigning emphasize high specificity at the cost of sensitivity, as demonstrated in the Vickery and colleagues (2001) results detailed above.

The goals of the present study were twofold. The first was to provide clinicians with an updated evaluation of stand-alone effort measures either unavailable or not well-studied enough to be included in the earlier Vickery and colleagues meta-analytic review. As with that study, the meta-analytic technique was employed because it is well supported for use in neuropsychological research (Demakis, 2006) and allows the statistical combination of test results across multiple studies. Combined results possess greater power to detect differences than individual studies. Moreover, the resulting aggregate scores can be statistically adjusted for biasing factors such as small sample size. Further, the effect size estimates and test parameters can be subjected to analyses to determine how well the tests perform relative to each other.

The second goal of this study was to provide clinicians and researchers with a better understanding of the variables that affect test performance across studies. Such factors, called moderators, may include the type of research design employed, the preparation or coaching of feigners, participants' history of clinical “experience,” or even the provision of payments to motivate simulators. Meta-analysis can provide a quantitative description of variables affecting test performance at a group level. Understanding the effect of moderators on effect size is essential for knowing how reflective various studies' results are of a tool's performance in the real world.

Several modifications were made from the Vickery and colleagues meta-analysis in order to reflect changes in malingering research methodologies since completion of that study and to increase the clinical generalizability of findings. Broadly speaking, inclusion criteria were greatly tightened relative to the Vickery and colleagues review. The use of a well-defined exclusion criterion set is important in meta-analysis (see Lipsey & Wilson, 2001) and has been employed by recent researchers in this field (see Nelson, Sweet, & Demakis, 2006, for an example). The “restrictive criteria” procedure mitigates two common criticisms of meta-analysis: the “apples and oranges” criticism (Sharp, 1997), which argues that the data combined may come from studies too dissimilar to yield a valid generalization, and the “garbage in–garbage out” criticism, which suggests that data from methodologically poor studies may contaminate the overall findings. In addition to using more restrictive inclusion criteria, the present review employed a modified list of “methodological (strength) characteristics,” quantitatively combined to shed light on possible contributions to reported results.

Methods

The present review sought “stand-alone” (non-embedded) procedures designed with the primary intent of detecting inadequate motivation to perform well on neuropsychological tests. A search of the PsycInfo database through October 2010, as well as a review of reference sections in obtained sources, was used to identify tests and papers meeting the following criteria:

  1. presented results from a study investigating the performance of at least one stand-alone (non-embedded) test intended to detect neurocognitive feigning;

  2. not reviewed by Vickery and colleagues (2001) except for the Victoria Symptom Validity Test (VSVT; see below);

  3. published in a professional journal or appearing in a thesis or dissertation study in English;

  4. included only adult participants;

  5. included a clinical honest control group of neurological or psychiatric patients;

  6. provided compelling evidence for the accuracy of assignment to honest or feigning groups, such as use of the Slick, Sherman, and Iverson (1999) criteria for known-groups (KG) studies. In the case of a simulated feigning design, objective evidence for the honesty of a patient control group was required;

  7. enough data available to code at least three contrasts of honest versus feigning groups.

Studies Meeting Inclusion Criteria

As noted above, with one exception, tests reviewed in the Vickery and colleagues meta-analysis were not reconsidered here. The exception was the VSVT (Slick, Hopp, Strauss, & Spellacy, 1996), which resembles a computerized version of the original Hiscock and Hiscock DMT evaluated by Vickery and colleagues. In that study, the group of Hiscock variations was noted to have the highest sensitivity for detecting feigning, as well as the largest effect size for separating groups of individuals believed to be honest or feigning. Thus, the VSVT data were intended to serve as an “anchor,” allowing approximate comparisons of the presently reviewed tests with those reviewed previously. It should be noted that, with the exception of the Slick and colleagues (1996) publication, no VSVT data presented in this paper were included in the previous meta-analysis.

Table 1 presents a synopsis of the initial literature review, illustrating the large number of available stand-alone effort measures and those meeting inclusion criteria. In total, 306 papers were reviewed, with contrasts from only 41 meeting the stringent inclusion criteria outlined above. Ten of these provided data for multiple feigning measures included in this paper. The vast majority of studies excluded from this review either provided only specificity data for test performance in new populations, compared performance of groups with uncertain criterion status (e.g., differential prevalence design), or did not provide any control patient data. In addition to the VSVT, tests meeting inclusion criteria included the Letter Memory Test (LMT; Inman et al., 1998), the Medical Symptom Validity Test (MSVT; Green, 2004), the Test of Memory Malingering (TOMM; Tombaugh, 1997), and the Word Memory Test (WMT; Green, 2003).

Table 1.

Measures identified for potential use in meta-analytic review, number of studies meeting inclusion criteria

Measure Total references Meeting criteria 
*Test of Memory Malingering 138 21 
*Word Memory Test 64 
*Medical Symptom Validity Test (Original) (Non-Verbal MSVT) 27 3 (2) 
Computerized Assessment of Response Bias 23 
*Letter Memory Test 17 11 
*Victoria Symptom Validity Test 17 
Validity Indicator Profile 16 
b Test 16 
Word Completion Memory Test 
Rey Word Recognition List 
Colorado Priming Test 
Dyslexia Assessment of Simulation or Honesty 
Word Reading Test 
48-Pictures Test 
Rey II Test 
Rey Memory Complaint Test-II 

Notes: MSVT=Medical Symptom Validity Test. The Validity Indicator Profile was not evaluated due to difficulty in deriving mean scores. Asterisk denotes those included in the review.

Data Extraction

Articles and theses/dissertations meeting the above criteria were coded to obtain several characteristics. Eligible studies were classified as KG designs, in which patients undergoing evaluation were independently and objectively classified as honest or feigning during testing; simulation designs in which patients were instructed to feign deficits or answer honestly (Sim:PTFGN); or simulation designs contrasting normals instructed to feign deficits with patients answering honestly (Sim:NLFGN). Also coded were the patient or normal characteristics used in each comparison (e.g., mild mental retardation, head injury, non-patient students, etc.); group demographic variables (e.g., age, education, gender); the means and standard deviations for each group on the measure of interest; and the sensitivity and specificity of each test at the recommended cutting score. Lastly, several methodological characteristics of each study were coded in order to rate the strength of each study. Methodological characteristics (provided in Supplementary material online, Appendix A) were used to gauge the strength of each study and closely matched those of Vickery and colleagues (2001) while incorporating more recent suggestions for maximizing the reliability and validity of studies' results (see Berry et al., 2002; Berry & Schipper, 2007). A total methodological strength score was obtained and divided by the number of possible points for each design type in order to create a proportion that was independent of research design type and more directly comparable across studies. Higher scores represent methodologically stronger studies.

Following coding of descriptive and test score data, effect sizes for the separation of honest and feigning groups were calculated, using the DSTAT meta-analytic software (Johnson, 1993). Effect size extraction was done in a manner that sought to maximize the number of sample-independent contrasts. When a study provided data from multiple honest and feigning samples (e.g., honest psychiatric patients, honest brain-injured community volunteers, normal coached feigners, and normal naïve feigners), multiple independent effect sizes were extracted. When multiple clinical honest samples were available with only one feigning sample, clinical data were aggregated in order to use all available data. As noted earlier, only contrasts that included feigners and an honest clinical group were included in analyses.
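The reviewed studies report only group-level summary statistics, so aggregating multiple honest clinical samples presumably required pooling those summaries. Below is a minimal Python sketch of one standard way to combine sample sizes, means, and SDs; the function name and example values are illustrative only and are not drawn from any study reviewed here.

    import math

    def pool_groups(groups):
        # groups: list of (n, mean, sd) summary statistics.
        # The total sum of squares decomposes into within-group variance
        # plus the dispersion of group means around the grand mean.
        n_total = sum(n for n, _, _ in groups)
        grand_mean = sum(n * m for n, m, _ in groups) / n_total
        ss = sum((n - 1) * sd ** 2 + n * (m - grand_mean) ** 2
                 for n, m, sd in groups)
        return n_total, grand_mean, math.sqrt(ss / (n_total - 1))

    # Hypothetical example: two honest clinical samples from a single study
    print(pool_groups([(32, 22.6, 1.8), (16, 22.4, 2.9)]))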

Because multiple statistical comparisons could be derived from a single study, consideration of the independence of contrasts is warranted. Here, two points must be noted. First, some studies provided data on multiple tests using the same participant samples. In this case, sample non-independence was allowed, as the value of within-study test comparisons was judged to outweigh the risk of correlated effect sizes. The second point is that several feigning measures include multiple subscales. The VSVT provides three indices, and the percent-correct “Hard” trial was chosen for analysis. Because the TOMM Trial 2 and Retention Trial indices are typically highly correlated, a decision was made to include only Trial 2 data in analyses, as more data were available for that index. For the WMT and the MSVT, the smaller number of eligible studies and variability in combining indices for a determination of individual feigning status dictated the calculation of an aggregate effect size, collapsed across each index within the test.

To ascertain coding accuracy, all effect sizes were calculated twice. Additionally, 15 sources were cross-coded by a second rater. Inter-rater reliabilities were acceptable for demographic characteristics (r = 1.0), test means and standard deviations (r = .98), methodological characteristics (r = .99), and classification rate data (sensitivity [r = .95], specificity [r = 1.0]).

Results

After applying the above-noted criteria to the selection of studies and their contributing data, 49 contrasts were identified. Seven of these came from studies evaluating multiple feigning measures. Sensitivity and specificity data were available for two additional studies meeting inclusion criteria. The tests, contributing studies, and index means for each study are summarized in Table 2.

Table 2.

Methodological characteristics, effect sizes, and classification accuracy for studies included in analyses

First author Des Grp N Sample chars. Age (M [SD]) Educ (M [SD]) Coa Prep Mat $ Incentive Adm Meth Strng Trial Test Perf (M [SD]) d Cut< Sn Sp 
Victoria Symptom Validity Test 
Slick (1996) 43 N St 23.9 (7.3) 14.5 (2.2) CC 0.391 Hard 10.5 (5.9) 2.59 16 0.81 0.97 
  32 nCS Neu Out 35.3 (12.1) 12.6 (3.0)   –    22.6 (1.8)     
Strauss (2002) 11 nCS HI CV 36.2 (8.7)a 13.9 (2.1)a 0.522 Hard 12.0 (5.2) 2.53 16 0.82 0.94 
  16 nCS HI CV        22.4 (2.9)     
Wolfe (2004) 12 w/i US Orth In 42.3 (13.3) 15.5 (2.6) 15 0.783 Hard 9.2 (4.9) 3.61 – – 
            23.0 (1.8)     
 12 w/i US Orth In 44.3 (12.4) 14.4 (3.1) 15 0.739 Hard 5.3 (6.1) 3.07 – – 
            22.1 (4.3)     
Word Memory Test 
Batt (2008) 11 nCS HI Out 39.2 (11.2) 12.1 (11.5) – 0.609 IR 20.7 (6.1) 2.64 – 1.00 0.56 
  25 nCS HI Out 43.6 (11.1) 12.7 (2.5)   –    34.8 (4.8)     
Greve (2008) 27 CS HI Out 44.6 (7.7) 11.7 (1.7) 0.750 IR 65.1 (21.7) 1.52 82.5 0.67 0.56 
  32 CS HI Out 40.0 (12.5) 13.5 (2.2)       91.6 (12.1)     
           Dr 38.5 (12.5) 4.34 82.5 0.67 0.56 
            91.6 (11.7)     
           Con 37.7 (12.9) 3.99 82.5 0.78 0.30 
            88.8 (12.4)     
            Mean  3.07  0.71 0.47 
 58 CS Pain Out 41.5 (9.5) 11.1 (2.5) 0.750 IR 76.7 (18.8) 1.08 82.5 0.52 0.88 
  25 CS Pain Out 43.3 (10.9) 13.0 (2.0)       94.4 (6.5)     
           DR 74.4 (24.2) 1.02 82.5 0.50 0.88 
            95.6 (5.7)     
           Con 75.1 (17.6) 1.14 82.5 0.53 0.83 
            92.8 (8.1)     
            Mean  1.07  0.52 0.86 
Hubbard (2008) 50 w/i Psy In 37.7 (10.7) 11.6 (2.1) 5 + 3BA 0.609 IR 56.9 (17.2) 1.99 82.5 0.93 0.82 
        5 + 3BA    88.1 (13.7)     
            DR 56.0 (18.9) 1.97 82.5 0.90 0.76 
             89.6 (14.7)     
            Con 60.1 (14.5) 1.70 82.5 0.95 0.60 
             85.0 (14.5)     
            Mean  1.90  0.93 0.73 
Lindstrom (2009) 25 N St 19.7 (0.9) 13.0 (1.0) CC 0.261 IR 68.4 (18.1) 2.00    
  25 nCS LD St 19.5 (1.2) 12.5 (1.0)   –    95.9 (6.2)  82.5b 0.92b 0.92b 
            DR 64.4 (16.3) 2.60    
             96.5 (5.5)     
            Con 59.8 (15.9) 2.73    
             94.5 (7.7)     
            Mean  2.45    
Sholtz (2006) 38 N St naïve mode: 21-24 13% ≤HS 100B 0.783 – – – Notec 0.82c  
  29 N St Caution mode: 18-19 35% ≤HS 100B 0.739 – – – Notec 0.83c  
  37 N St Caut/Educ mode: 25+ 27% ≤HS 100B 0.783 – – – Notec 0.49c  
  24 US mHI Out mode: 25+ 33% ≤HS   100B   – –  Notec  0.96c 
Shandera (2010) 25 N CV 33.8 (10.4) 10.2 (1.0) 75 + 25 BA 0.870 IR 69.9 (20.9) 0.47 82.5 0.68 0.42 
  24 MR 36.0 (9.1) 11.2 (1.5)   20    78.4 (13.9)     
           DR 71.3 (24.5) 0.44 82.5 0.60 0.42 
            80.2 (13.8)     
           Con 57.0 (27.8)  82.5 0.80 0.25 
            68.0 (18.1)     
            Mean  0.46  0.69 0.36 
Test of Memory Malingering 
Abramsky (2005) 44 N CV 41.4 (–) 14.4 (–) 15 + 25B 0.609 T2 39.1 (10.3) 1.41 45 0.50 0.98 
  44 US Psy In/Out 42.7 (–) 14.1 (–)   15    49.6 (1.8)     
Batt (2008) 11 nCS HI Out 39.2 (11.2) 12.1 (11.5) – 0.609 T2 22.5 (9.6) 3.25 45 1.00 0.84 
  25 nCS HI Out 43.6 (11.1) 12.7 (2.5)   –    47.2 (6.3)     
Connell (2005) 40 N St 25.4 (9.0) Noted 50B 0.696 T2 39.7 (12.7) 1.12 45 0.50 1.00 
  40 nCS PTSD Out 33.4 (12.2) Noted   25    49.9 (0.4)  48 0.50 1.00 
Farkas (2009) 43 N St 18.9 (0.9) 12.7 (0.8) 20 + 50B 0.609 T2 31.4 (8.5) 2.50 45 0.91 0.84 
  43 nCS mMR Out 49.1 (11.1) Notee   20    47.7 (3.4)     
Graue (2007) 25 N CV 34.2 (15.5) 9.7 (1.2) 75 + 25 BA 0.783 T2 28.7 (7.2) 2.96 45 0.80 0.69 
  26 nCS mMR Out 37.1 (9.5) 11.4 (1.0)   20    45.7 (3.6)     
           Ret 27.6 (7.5) 3.23 45 0.80 0.81 
            46.6 (3.4)     
Greve (2009) 216 US Pain Out 42.3 (8.2) 11.1 (2.4) 0.833 T2 37.0 (8.7) 1.84 45 0.35 1.00 
  118 CS Pain Out 42.8 (9.1) 12.6 (2.3)       49.9 (0.3)  48 0.43 1.00 
            Ret 42.9 (8.9) 1.38 45 0.34 1.00 
             49.8 (0.4)  48 0.43 1.00 
Greve (2008) 27 CS TBI Out 44.6 (7.7) 11.7 (1.7) 0.750 T2 38.5 (12.5) 1.24 45 0.48 0.95 
  32 CS TBI Out 40.0 (12.5) 13.5 (2.2)       49.4 (2.7)     
           Ret 37.7 (12.9) 1.23 45 0.56 0.88 
            49.1 (3.8)     
 58 CS Pain Out 41.5 (9.5) 11.1 (2.5) 0.750 T2 42.1 (11.3) 0.75 45 0.40 0.95 
  25 CS Pain Out 43.3 (10.9) 13.0 (2.0)       49.4 (3.6)     
           Ret 42.0 (11.4) 0.72 45 0.35 0.95 
            49.1 (3.8)     
Greve, Bianchini & Doane (2006) 33 CS Tox Out 43.0 (11.4) 10.8 (2.9) 0.583 T2 40.9 (10.3) 1.07 45 0.55 1.00 
  17 CS Tox Out 41.5 (12.5) 12.8 (2.8)       50.0 (0)  48 0.58 1.00 
  30 same as above same same      Ret 38.5 (11.8) 1.10 45 0.52 1.00 
  12          49.8 (0.6)  48 0.68 1.00 
Greve, Bianchini & Doane (2006) 41 CS HI Out 39.3 (12.9) tot sample 12.3 (3.3) tot sample 0.667 T2 40.2 (11.1) 1.51 45 0.42 0.96 
  82 CS HI Out         49.9 (0.3)  48 0.54 0.91 
            Ret 38.7 (11.9) 1.61 45 0.57 0.97 
             49.8 (0.4)  48 0.61 0.92 
Hubbard (2008) 50 w/i US Psy In 37.7 (10.7) 11.6 (2.1) 5 + 3BA 0.609 T2 28.4 (11.1) 2.21 45 0.93 0.84 
        5 + 3BA    47.3 (4.5)     
           Ret 29.6 (11.5) 1.96 45 0.88 0.84 
            47.3 (5.4)     
Lindstrom (2009) 25 N St 19.7 (0.9) 13.0 (1.0) CC 0.261 T2 41.1 (7.6) 1.63    
  25 CS LD St 19.5 (1.2) 12.5 (1.0)   –    50.0 (0.2)  45b 0.68b 1.00b 
            Ret 38.6 (8.6) 1.83    
             49.9 (0.3)     
Pivovarova (2009) 29 US CV 40.0 (13.5) 12.4 (1.7) 20 0.478 T2 29.1 (19.6) 1.87 45 0.62 0.94 
  81 US Psy Out 40.9 (10.6) 11.7 (3.0)   20    48.6 (3.3)     
Rees (1998) nCS HI Out/CV 44.4 (14.3) 14.0 (2.7) 50B 0.739 T2 32.1 (7.3) 3.44 45 1.00 0.90f 
  10 nCS HI Out/CV 41.8 (14.2) 13.5 (2.5)   50R    49.6 (0.6)     
           Ret 31.6 (8.0) 3.22 45 1.00 1.00 
            49.6 (0.7)     
Rosenfeld (2010) 29 N CV 40.0 (13.5) – 20 0.609 T2 – – 45 0.62 0.94 
  87 US Psy Out 40.8 (10.6) –   20    –     
Samra (2004) 48 N St 20.3(3.4)g 13.4(1.5)g 100B 0.609 T2 41.2 (7.4) 1.33 45 0.56 1.00 
  16 US Dep Out 37.6 (10.7) 14.8 (3.3)   –    49.9 (0.3)  48 0.79 1.00 
           Ret 41.1 (8.3) 1.20 45 0.52 1.00 
            49.9 (0.3)  48 0.75 1.00 
Shandera (2010) 25 N CV 33.8 (10.4) 10.2 (1.0) 75 + 25 BA 0.870 T2 81.3 (23.6) 0.69 45 0.40 0.88 
  24 nCS mMR Out 36.0 (9.1) 11.2 (1.5)   20    94.5 (12.2)     
           Ret 78.8 (25.9) 0.95 45 0.44 0.92 
            97.1 (6.3)     
Sollman (2010) 30 N St 19.1 (1.3) 13.1 (0.9) 45BA CC 0.826 T2 84.5 (17.1) 1.18 45 0.47 0.97 
  29 nCS ADHD St 19.4 (1.2) 13.4 (1.1)   45 or 15CC    99.2 (2.7)     
            Ret 84.9 (16.1) 1.21 45 0.47 0.97 
             99.2 (2.7)     
Tombaugh (1997) 20 N St 22.2 (3.9)h 13.3 (1.0)h CC 0.304 T2 27.9 (7.2) 5.25i 45 1.00 0.94 
  145 nCS Neu Out 57.0 (12.6)g 12.5 (3.0)g      48.4 (3.2)  48 1.00 0.99 
           Ret 26.4 (7.5) 6.29i 45 1.00 0.89 
            49.0 (2.4)  48 1.00 0.95 
Vagnini (2008) 16 N St 32.7 (12.8) 15.7 (2.5) 10/h 0.391 T2 64.0 (16.7) 2.88 45 1.00 1.00 
  15 nCS HI CV 40.5 (11.7) 14.3 (1.9)   10/h + 20    99.6 (1.5)     
           Ret 64.1 (17.7) 2.73 45 1.00 1.00 
            99.8 (0.7)     
Vickery (2004) 23 nCS HI CV 32.5 (10.5) 12.7 (2.0) 75 + 20BA 0.870 T2 42.7 (8.1) 1.21    
  23 nCS HI CV 29.9 (11.3) 12.9 (2.1)   75    49.8 (0.7)  45b 0.52b 1.00b 
           Ret 40.8 (8.8) 1.42    
            49.8 (0.7)     
Wood (2009) 45 US CV 27.9 (10.2) 80% ≤HS 20 + 50B 0.609 T2 34.1 (13.8) 1.46 45 0.67 0.91 
  45 US Psy Out CV 40.4 (10.3) 80% ≤HS   20    48.8 (2.9)     
Letter Memory Test 
Graue (2007) 25 N CV 34.2 (15.5) 9.7 (1.2) 75 + 25 BA 0.783 48.8 (29.1) 1.97 0.88 0.58 0.73 
  26 nCS mMR Out 37.1 (9.5) 11.4 (1.0)   20    91.2 (8.4)     
Greub (2005) 25 US mHI St Notej – 0.565 76.7 (21.1) 1.44 93 0.68 1.00 
  25 US mHI St  –   –    98.9 (3.7)     
Inman (1998) 41 N St coach 19.2 (2.9) 12.6 (1.1) 20B 0.783 65.4 (18.1) 2.46 93 0.84 1.00 
  32 nCS Neu Out 33.2 (10.6) 12.3 (2.9)   10    99.3 (1.4)     
 26 N CV coach 31.7 (10.3) 12.5 (1.5) 20 + 5BA 0.870 68.1 (21.6) 2.06 93 Notek 1.00 
  28 nCS Neu Out 34.7 (14.6) 13.7 (2.5)   10    99.5 (1.0)     
 25 N CV naïve 30.4 (7.3) 13.2 (2.7) 20 + 5BA 0.870 54.8 (28.6) 2.00 93 Notek 1.00 
  18 US Dep Out 31.6 (9.6) 13.1 (2.7)   20 + 5BA    99.4 (1.3)     
 19 CS HI Out 42.2 (15.9) 11.1 (2.5) 0.833 69.4 (12.2) 3.46 93 0.95 1.00 
  21 CS HI Out 34.6 (8.2) 12.3 (3.0)       99.3 (1.4)     
Inman (2002) 21 nCS HI St 18.7 (0.9) 12.5 (1.9) 20BA 0.783 80.4 (15.2) 1.78 93 Notel 1.00 
  24 nCS HI St 18.7 (1.7) 12.3 (0.6)   CC    99.3 (1.7)     
Orey (2000) 26 nCS HI St 19.4 (1.9) 13.0 (1.3) 25 0.783 74.7 (26.4) 1.30 93 0.58 1.00 
  24 nCS HI St 18.8 (1.0) 12.7 (1.0)   25   99.9 (0.4)     
Schipper (2008) 10 CS Neu Out 38.2 (14.0) 11.0 (1.9) 0.833 61.7 (26.8) 0.92 93 0.46 1.00 
  39 nCS Neu Out  13.3 (2.2)       98.1 (41.5)     
Shandera (2010) 25 N CV 33.8 (10.4) 10.2 (1.0) 75 + 25 BA 0.870 77.5 (24.8) 0.52 93 0.60 0.61 
  24 nCS mMR Out 36.0 (9.1) 11.2 (1.5)   20    87.9 (12.5)     
Sollman (2010) 30 N St 19.1 (1.3) 13.1 (0.9) 45BA CC 0.826 85.5 (16.0) 1.03 93 0.52 0.93 
  29 nCS ADHD St 19.4 (1.2) 13.4 (1.1)   45 or 15CC    97.7 (3.4)     
Vagnini (2006) 53 CS Neu Out 40.4 (12.1) 12.1 (2.9) NA 0.833 79.9 (20.5) 1.07 93 0.64 0.95 
  69 CS Neu Out 43.4 (14.8) 12.6 (3.1)      97.2 (11.5)     
Vickery (2004) 23 nCS HI CV 32.5 (10.5) 12.7 (2.0) 75 + 20BA 0.870 73.0 (21.9) 1.46 93 0.87 0.87 
  23 nCS HI CV 29.9 (11.3) 12.9 (2.1)   75    96.8 (5.8)     
Wolfe (2004) 12 w/i US Orth In 42.3 (13.3) 15.5 (2.6) 15 0.783 54.8 (17.4) 3.26 93 – – 
            97.4 (4.0)     
 12 w/i US Orth In 44.3 (12.4) 14.4 (3.1) 15 0.739 31.9 (23.7) 3.75 93 – – 
            97.8 (3.9)     
Medical Symptom Validity Test (MSVT) 
Covert (2010) 25 N St 45.4 (7.2) 13.5 (2.5) 0.217 IR 82.6 (15.6) 0.86 – 0.65 0.79 
  48 HI Out 29.2 (8.5) 15.0 (0.8)   –    94.5 (8.5)     
            DR 87.6 (11.6) 0.55 –   
             93.7 (9.8)     
            Con 80.2 (17.1) 0.88 –   
             93.5 (9.6)     
            Mean  0.78    
Harrison (2009) 21 N St / Incarc 34 (–) – CC / 0 0.348 Mean 45.2 (17.8) 3.26 – 0.95 0.95 
  30 Incarc / N St 35 (–) –    0/ 0    92.2 (11.0)     
Singhal (2009) 10 N CV 36 (10) 17 (2) 0.174 IR 57 (15) 0.92 AB1 0.40 1.00 
  10 Dem IP 81.7 (4.6) 10 (2.9)      70 (12)  AB2 0.60 1.00 
            DR 57 (18) 0.77    
             70 (14)     
            Con 60 (19) 0.16    
             63 (17)     
            Mean  0.60    

Notes: – = data not provided; - = not applicable. Des = research design type (1 = known-groups [KG] contrast; 2 = clinical honest-clinical feigner simulation [Sim:PTFGN] contrast; 3 = clinical honest-normal feigner simulation [Sim:NLFGN] contrast); Grp = criterion group (F = feigning, H = honest); w/i = within-subjects design; Sample Chars = sample characteristics (nCS = non-compensation-seeking; CS = compensation-seeking; US = mixed or unknown compensation-seeking status; ADHD = ADHD diagnosed; Dep = clinically depressed; HI = head injury; mMR = mild mental retardation; Neu = mixed neurological; Orth = orthopedic; Psy = mixed psychiatric; N = non-clinical normal; St = students; CV = community volunteers; In = inpatients; Out = outpatients; coach = symptom- or test-coached group; naïve = no coaching or warning group); Coa = received symptom or test coaching; Prep Mat = resource provided or allowed for preparation and strategy development; $ Incentive = remuneration for participation (B = bonus for “success”; BA = bonus for “success” given to all; R = prize by random drawing; CC = course credit); Adm = admonition to perform believably; Meth Strng = methodological strength score, computed by dividing the number of points earned by the total number possible for either KG or simulation designs; Test Perf = test score; d = Hedges' d effect size metric (g adjusted for sample size); Cut< = cutting score below which performance is classified as “fail,” used to derive operating characteristics; Sn = sensitivity, calculated using the feigning group presented here; Sp = specificity, calculated using the honest group presented here; IR = Immediate Recall; DR = Delayed Recall; Con = Consistency; LD = Learning Disability; Caut = cautioned to feign believably; Educ = educated about symptoms; PTSD = Post-Traumatic Stress Disorder; TBI = Traumatic Brain Injury; Tox = toxic exposure; Incarc = incarcerated; IP = inpatient.

aThe authors provide demographics for the feigning and honest groups combined. No statistical analyses were conducted to assess for differences between these groups.

bThese values are derived using below cut score performance on any subtest.

cThis study used atypical cut scores of <82.5% correct on 2–3 subtests as indicative of failure, and 0–1 subtests <82.5 as passing.

d77.5% of FGN group, and 67.5% of HON group, had “some college or higher.” There was no significant difference in education between groups (p > .05) using χ2 analysis.

e37% of group received special education.

fThe manuscript presents a value of 96%, which is not possible given the malingering group size of 8. The value is assumed to be a typo.

gThese demographics come from the initial pool of participants (N = 96) who were later assigned to feign (the N = 48 included here) or to perform honestly (N = 48, not included in these analyses).

hThese demographics come from the initial pool of participants (N = 41) who were later assigned to feign (the N = 20 included here) or to perform honestly (N = 21, not included in these analyses).

iThe dementia group (Trial 2 N = 37, Retention N = 28) contributed significantly to these values, consistent with research suggesting that this test is not appropriate for severely demented patients. Operating characteristics exclude that group.

jAge is not provided by group. However, the author states that of the four original groups (excluded here are HI feigners naïve [N = 24] and normal honest [N = 28]), there was no significant difference in age—F(3,98) = 0.96, p = .41.

kThe operating characteristics are provided for the depression and neurological groups combined (Sn = 84%, Sp = 100%).

lThe operating characteristics are provided for the head injured and normal undergraduates combined (Sn = 73%, Sp = 100%, HR = 87%).

Calculation of effect sizes was completed using Hedges' d, just as with the Vickery and colleagues meta-analysis. The metric d is an unbiased estimate of Hedges' g that corrects for sample size by using the inverse variance of g (seeHedges & Olkin, 1985; Lipsey & Wilson, 2001; Rosenthal, 1991) and, though often similar in magnitude, is not to be confused with Cohen's d (see any of the previously cited references for a review). As previously noted, the meta-analytic software “DSTAT” (Johnson, 1993) was used for effect size extraction and subsequent analyses. In addition to the information mentioned above, Table 2 also presents d scores along with data describing the study and samples of origin, test data, classification accuracy, and methodological strength score of each study.
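For reference, the small-sample correction applied to Hedges' g takes the following standard form (Hedges & Olkin, 1985); it is presented here for the reader's convenience rather than reproduced from the paper itself:

    g = \frac{\bar{X}_{\mathrm{honest}} - \bar{X}_{\mathrm{feign}}}{s_{\mathrm{pooled}}}, \qquad
    s_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad
    d = \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right) g

With the group means, SDs, and sample sizes reported in Table 2, each contrast's d can be reproduced directly from these quantities.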

The data set was then examined for homogeneity and for effect sizes that might skew the results of subsequent analyses. Using the box-and-whiskers method (see Lipsey and Wilson, 2001), one outlier (Tombaugh, 1997) was identified. This effect size was >3 SD above the mean, suggesting that it was probably not appropriate for inclusion in analyses.

The remaining effect sizes ranged from d = 0.46 to 3.75. A mean d of 1.55 (95% confidence interval [CI] = 1.48–1.63) was found for the 48 remaining independent contrasts. This indicates that honest individuals provided higher percent-correct scores (and thus greater effort), on average, than feigning individuals. It also indicates that, as a group, the LMT, MSVT, TOMM T2, VSVT, and WMT separated participants providing suboptimal effort from those providing adequate effort by about 1.5 pooled standard deviations, a significant effect size that can be considered large. In fact, compared with the mean d score reported by Vickery and colleagues (2001), the present findings are significantly higher (Vickery's d = 1.13, 95% CI = 1.04–1.22). This may be due to an increase in the effectiveness of more recently developed feigning instruments or to the more stringent inclusion criteria used in the present study.

Statistical Considerations

An important issue in meta-analytic reviews involves the “file-drawer problem,” whereby available data may not be representative of the entire set from all studies conducted. It has historically been argued that peer-reviewed journals are more inclined to publish “significant” results; thus, a number of non-significant or small effects may be left in researchers' files and unavailable for meta-analysis. Calculating a fail-safe number (Rosenthal, 1991) provides an estimate of the number of null findings that would be necessary to overturn the results of the combined significance test. The fail-safe N for the subset of independent effects, excluding the earlier noted outlier, resulted in an estimate of 832 unpublished null findings that would be necessary to render results non-significant (p > .05), a seemingly unlikely number.
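One common formulation of Rosenthal's (1991) fail-safe calculation, shown here for clarity rather than taken from the paper itself, is

    N_{fs} = \frac{\left(\sum_{i=1}^{k} Z_i\right)^2}{2.706} - k

where Z_i is the standard normal deviate associated with study i, k is the number of studies, and 2.706 is the squared one-tailed critical value (1.645^2) for p = .05.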

Two other issues are also of interest in evaluating this data set. First to be considered here is whether there is a relationship between methodological quality of the study and the associated d scores. The obtained correlation between methodological strength score noted earlier and d scores was non-significant (r = −.07; p = .34). A second concern is whether there is a difference between data reported in published versus unpublished studies. Although there was a trend toward higher d scores for unpublished studies (M = 2.31, SD = 0.99) than for published studies (M = 1.76, SD = 0.88), this difference did not reach statistical significance—F(1,46) = 3.2, p = .067.

Another potential concern when evaluating meta-analytic summary statistics is how representative the overall effect size is of the underlying data points. This was addressed by calculating the statistic Q, which tests whether all included studies have the same population effect size (Hedges & Olkin, 1985). For the 48 independent effect sizes corrected for sample size, a significant amount of variability was found, Q(47) = 377.93, p < .001. Thus, an overall d-value does not adequately represent the data set as a whole. Further exploration of sources of variability was therefore warranted.
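The homogeneity statistic, in its standard inverse-variance form (Hedges & Olkin, 1985; included here for clarity), is

    Q = \sum_{i=1}^{k} w_i \left(d_i - \bar{d}_{+}\right)^2, \qquad w_i = \frac{1}{\hat{v}_i}, \qquad \bar{d}_{+} = \frac{\sum_i w_i d_i}{\sum_i w_i}

Under the null hypothesis of a single population effect size, Q follows a chi-square distribution with k − 1 degrees of freedom; with the 48 contrasts retained here, that yields the 47 degrees of freedom reported above.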

Comparing Malingering Indices

The first source of variability considered in light of the overall heterogeneity was between-test differences. Separate effect sizes were calculated for each test, as displayed in Table 3. All indices demonstrated very strong mean effects in separating groups of individuals believed to be feigning impairment from those believed to be performing honestly (mean d range = 0.94–2.77). Contrasts were next performed to test for between-measure differences in the ability to separate feigning and honest groups. The QB statistic was first used to test an omnibus difference between all five (non-independent) indices. QB is a between-class goodness-of-fit statistic analogous to the F-test of whether class means are the same (Hedges & Olkin, 1985). It uses the Scheffé method to protect against Type I error. Results revealed significant differences among the five measures, QB(4) = 53.55, p = .001, suggesting reliable differences in effect sizes across the measures. Post hoc comparisons across tests indicated the following hierarchy of effect sizes: the mean d for the VSVT was significantly higher than those of all other measures; the WMT, TOMM, and LMT had comparable effect sizes; and the MSVT had a significantly lower d score than all the other tests.
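In the same framework (Hedges & Olkin, 1985; a standard formulation shown for clarity), the between-class statistic partitions heterogeneity across p classes of studies:

    Q_B = \sum_{j=1}^{p} w_{+j}\left(\bar{d}_{+j} - \bar{d}_{++}\right)^2

where w_{+j} is the sum of the inverse-variance weights in class j, \bar{d}_{+j} is the weighted mean effect size of class j, and \bar{d}_{++} is the overall weighted mean. Q_B is referred to a chi-square distribution with p − 1 degrees of freedom (here, 4 for the five tests).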

Table 3.

Descriptive data: Effect size (d) values by feigning index

 VSVT Hard WMT TOMM Trial 2 LMT MSVT Contrast p-value 
d 2.77H 1.61 1.59a 1.54 0.94 .0000* 
95% CI 2.32–3.22 1.47–1.76 1.48–1.71 1.37–1.71 0.70–1.19  
k 15   

Notes: Combined effect size excluding outlier (d = 1.55, 95% CI = 1.48–1.63). Post hoc contrast was made using the QB statistic: QB (4) = 53.55, p = .0000; VSVT Hard > (WMT = TOMM T2 = LMT) > MSVT, according to post hoc analyses. CI = confidence interval; k = number of studies; d = d effect size metric; VSVT = Victoria Symptom Validity Test; WMT = Word Memory Test; TOMM = Test of Memory Malingering; LMT = Letter Memory Test; MSVT = Medical Symptom Validity Test; H = homogeneous by Q test (see text), p > .05.

aExcludes statistical outlier.

Effect Size Moderators

Having identified variability in index performance, it is appropriate to explore factors that may have contributed to, or moderated, these differences. However, the small number of studies evaluated for the VSVT and MSVT statistically precluded more focused evaluations. Instead, an examination of moderator variables on the sample-independent data set as a whole may provide some additional information about malingering research variables that may contribute to effect sizes in general.

Comparability of Research Designs

Considering possible differences between malingering research designs examined here (Sim:PTFGN, Sim:NLFGN, or KG), questions arose regarding whether the type of design used to derive each effect size affected the results. Specifically, the relationship between simulation and KG effect sizes may shed light on the interpretive utility of simulation research where KG research is absent. Examining the moderating effects of research design also allows exploration of whether experimental manipulation leads to an inflation of effect size. Thus, attention was turned next to evaluating the moderation of d-values by research design.

Effect sizes were found to vary as a function of research design, QB(2) = 41.61, p < .001. Post hoc comparisons showed that the largest mean effect size was obtained from the Sim:PTFGN contrast, which was significantly greater than those of the KG and Sim:NLFGN contrasts. However, there was not a significant difference between the mean Sim:NLFGN and KG contrast effect sizes.

Table 4.

Results of additional categorical effect size moderator analyses

Manipulation Condition k Mean Corrected p-value Net effect of manipulation on d-mean 
Symptom Coaching No 20 d = 1.73 .0000* Reduction 
Yes 18 d = 1.38   
Preparation Materials No 29 d = 1.88 .0000* Reduction 
Yes d = 1.01   
Warning to Fake Believably No d = 1.34 .0014* Increase 
Yes 29 d = 1.67   
Provision of Financial Incentive No  .0000 Reduction 
Yes 29 $48.14   

Note: Analyses exclude Test of Memory Malingering Ret values and statistical outliers.

*p-value significant at .05 level using QB statistic.

Design-Specific Methodologies

A number of specific experimental design moderators were included in the methodological strength ratings (Supplementary material online, Appendix A). Focusing on the simulation designs, these included provision of symptom-specific coaching, allowing the use of outside resources to prepare a feigning strategy, warning simulators not to fake too blatantly (see Viglione et al., 2001; Youngjohn et al., 1999), and the provision of financial compensation for successful feigning (Binder & Willis, 1991), among others. Table 4 presents results that compare d scores for contrasts from simulation designs with and without these characteristics present. In considering these data, it may help to keep in mind that moderators that reduce d scores here reflect a decreased ability to discriminate honest from feigning groups. Coaching participants specifically on symptoms of the disorder they were asked to dissimulate occurred in 20 of 38 simulation contrasts and resulted in significantly reduced effect sizes for studies that coached simulators. Likewise, providing access to outside resources, the Internet, or other preparation materials was associated with a significantly lower effect size for simulation studies with prepared feigners. Warning participants to feign believably, without over-exaggerating and being caught, was surprisingly associated with significantly increased effect sizes relative to studies that did not include this warning. Lastly, provision of financial motivators to “successful feigners” was associated with a trend toward decreased effect size. Looking specifically at those simulation studies that did provide a financial incentive, the amount of money offered was significantly related to effect size (Stouffer's Z = 6.47, p < .001), with higher incentives associated with smaller effect sizes. Of note, the mean total incentive for those provided it was $48.14 (although in many studies this was awarded by lottery to what amounted to a subset of successful feigners).
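Stouffer's Z, in its conventional form (Rosenthal, 1991; shown here for clarity, not reproduced from the paper), combines the standard normal deviates z_i from k individual results as

    Z = \frac{\sum_{i=1}^{k} z_i}{\sqrt{k}}

and is itself evaluated against the standard normal distribution.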

Table 5.

Test-wise contrasts of classification rates (M, SD) at various cutting scores

 Measure
 
Contrast p-value 
 VSVT (k = 2) WMT (k = 7) TOMM T2 (k = 22) LMT (k = 13) MSVT (k = 3) 
Cut score <16 <82.5% <45 <93 Various  
Sensitivity (%) 81.5 75.1 65.4 70.2 70.0 .703 
SD 0.7 20.4 22.6 15.4 22.9  
N 54 223 886 349 56  
Specificity (%) 95.5 69.4* 93.8 93.0 91.3 .001 
SD 2.1 23.5 7.8 12.4 11.0  
N 48 204 1,002 445 88  
Hit rate (%) 88.5 72.3 79.6 81.6 80.7 – 
SD 0.7 13.6 10.5 11.2 12.5  
Ntot 102 427 1,888 794 144  

Notes: VSVT = Victoria Symptom Validity Test; WMT = Word Memory Test; TOMM = Test of Memory Malingering; LMT = Letter Memory Test; MSVT = Medical Symptom Validity Test; k = number of studies, N = total sample size. Sensitivity: F(4,42) = 0.546, Specificity: F(4,42) = 5.53.

*Significantly discrepant from all other values using Tukey post hoc testing.

Accuracy of Tests for Individual Classification Decisions

Having explored the ability of each test to separate groups of honest and feigning individuals, and the within-study factors that may moderate differences in group separation across indices, the accuracy of each test's classification decisions for individual participants was evaluated. All research designs studied here allowed computation of sensitivity, specificity, and overall hit rate values at recommended cutting scores. These can be calculated from the passing rates of all known honest individuals (specificity) and failure rates of all known feigning individuals (sensitivity) in a study. Thus, sensitivity, specificity, and the aggregated hit rate values provide the proportion of those individuals accurately classified by the test at a given cutting score. These values are relevant for clinicians when selecting a test for use, because they suggest how likely it is that the true population of honest or feigning individuals will be correctly classified by a test. In actual clinical application, sensitivity, specificity, and estimated base rate of feigning may be used to determine what proportion of a test's positive (feigning) and negative (honest) determinations are accurate (e.g., positive and negative predictive powers or PPP and NPP). Because PPP and NPP vary with base rates, only the more stable sensitivity and specificity parameters are evaluated here.
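For the reader's convenience, these are the standard identities (not reproduced from the original paper); with feigning base rate BR,

    \mathrm{PPP} = \frac{BR \cdot Sn}{BR \cdot Sn + (1 - BR)(1 - Sp)}, \qquad
    \mathrm{NPP} = \frac{(1 - BR) \cdot Sp}{(1 - BR) \cdot Sp + BR \cdot (1 - Sn)}

As an illustration using the meta-analytic means reported below (Sn = 0.69, Sp = 0.90) and an assumed base rate of 0.40, PPP = (0.40 × 0.69)/(0.40 × 0.69 + 0.60 × 0.10) ≈ .82, and NPP = (0.60 × 0.90)/(0.60 × 0.90 + 0.40 × 0.31) ≈ .81.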

Table 6 provides classification accuracy rates by measure. All 48 of the usable contrasts contributed sensitivity and specificity values, after re-including the outlier for the TOMM noted above. The mean sensitivity value across all five tests was 0.69, with a 95% CI of 0.63–0.75. There were no significant differences across tests for this variable—F(4,42) = 0.556, p = .703. The present mean sensitivity value was higher than that reported by Vickery and colleagues (0.56). However, given that CIs were not reported for this value by Vickery and colleagues, it is not possible to evaluate statistical significance. For specificity, the mean value across tests here was 0.90, with a 95% CI of 0.86–0.94. Specificity was significantly different across tests—F(4,42) = 5.42. In this case, the specificity value of the WMT (0.69) was significantly lower than that of the TOMM and the LMT. For Vickery and colleagues, the mean specificity value was 0.96, although once again a lack of CIs from the previous study precludes determination of a significant difference between the two meta-analyses. The mean hit rate across tests was 0.79, with a 95% CI of 0.76–0.83. There were no significant differences in this parameter across tests—F(4,42) = 1.87, p > .05. The mean hit rate across tests here of 0.79 was comparable with the value reported by Vickery and colleagues of 0.77.

Table 6.

Test-wise contrasts of classification rates (M, 95% CI) at various cutting scores

 Measure
 
ANOVA p-value 
 VSVT (k = 2) WMT (k = 7) TOMM T2 (k = 22) LMT (k = 13) MSVT (k = 3)  
Cut score <16 <82.5% <45 <93 Various  
Sensitivity (%) 81.5 75.1 65.4 70.2 70.0 .703 
 95% CI 75.1–87.9 56.3–94.0 55.4–75.4 60.9–79.5 13.1–1.00  
N 54 223 886 349 56  
Specificity (%) 95.5 69.4 93.8 93.0 91.3 .001 
 95% CI 76.4–1.00 47.7–91.1 90.3–97.2 85.5–1.00 64.1–1.00  
N 48 204 1,002 445 88  
Hit rate (%) 88.5 72.3 79.6 81.6 80.7 .113 
 95% CI 80.3–96.6 56.2–82.2 73.4–84.1 76.2–89.5 52.1–1.00  
Ntot 102 427 1,888 794 144  

Notes: VSVT = Victoria Symptom Validity Test; WMT = Word Memory Test; TOMM = Test of Memory Malingering; LMT = Letter Memory Test; MSVT = Medical Symptom Validity Test; ANOVA = analysis of variance. Includes all data points. Sensitivity ANOVA F(4,42) = 0.546, specificity ANOVA F(4,42) = 5.424, hit rate ANOVA F(4,42) = 1.87. Using Tukey's HSD for pairwise contrasts of specificity, WMT < TOMM, p < .01, WMT < LMT, p < .01. k = number of studies, N = sample size.
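For readers who wish to replicate comparisons of this kind, the following is a minimal sketch of the analysis pattern underlying Table 6: a one-way ANOVA over per-contrast classification rates grouped by test, followed by Tukey's HSD for pairwise follow-up. The specificity values below are hypothetical placeholders, not the coded study data, and the sketch ignores details such as any weighting the original analyses may have applied.

```python
# Sketch of a between-test comparison of per-contrast specificity values,
# mirroring the ANOVA + Tukey HSD pattern reported in Table 6.
# All values are hypothetical placeholders, not meta-analytic data.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

specificity = {  # one proportion per usable contrast (hypothetical)
    "VSVT": [0.91, 1.00],
    "WMT":  [0.55, 0.62, 0.70, 0.68, 0.75, 0.80, 0.76],
    "TOMM": [0.95, 0.92, 0.98, 0.90, 0.94, 0.96],
    "LMT":  [0.96, 0.90, 0.93, 0.92, 0.94],
    "MSVT": [0.88, 0.95, 0.91],
}

# Omnibus test: does mean specificity differ across measures?
f_stat, p_val = f_oneway(*specificity.values())
print(f"ANOVA: F = {f_stat:.3f}, p = {p_val:.3f}")

# Pairwise follow-up with familywise error control (Tukey's HSD).
values = np.concatenate(list(specificity.values()))
labels = np.repeat(list(specificity), [len(v) for v in specificity.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```

Regrouping the same per-contrast values by research design rather than by measure reproduces the structure of the contrasts reported in Table 7.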

Test parameters were also examined by research design (i.e., Sim:NLFGN, Sim:PTFGN, and KG), collapsing across all tests examined here. Results are shown in Table 7. For sensitivity, significant differences were found across designs, F(2,44) = 6.77, p = .003. Pairwise contrasts using Tukey's HSD indicated that Sim:PTFGN designs produced significantly higher sensitivity estimates than KG studies; no other contrasts were statistically significant. Analyses of variance for specificity and hit rate were not significant across research designs.

Table 7.

Contrasts of classification rates (M, 95% CI) by research design

                    Sim:PTFGN    KG          Sim:NLFGN    ANOVA p-value
                    (k = 13)     (k = 10)    (k = 24)
  Sensitivity (%)   82.0         54.8        68.2         .003
    95% CI          71.8–92.3    42.0–67.6   60.6–75.8
    N               303          542         730
  Specificity (%)   88.6         91.4        89.9         .909
    95% CI          80.5–96.7    79.8–100    83.3–96.5
    N               373          460         893
  Hit rate (%)      84.7         72.5        78.7         .060
    95% CI          79.5–89.8    62.4–82.6   73.5–83.9
    Ntot            676          1,002       1,623

Notes: Sim:PTFGN = simulation designs including patients instructed to feign deficits or answer honestly; Sim:NLFGN = simulation designs contrasting normals instructed to feign deficits with patients answering honestly; KG = Known-Groups; ANOVA = analysis of variance. Sensitivity, ANOVA F(2,44) = 6.767; specificity, ANOVA F(2,44) = 0.095; hit rate, ANOVA F(2,44) = 3.01. Using Tukey's HSD for pairwise contrasts of sensitivity, Sim:PTFGN > KG, p < .01. k = number of studies, N = total sample size.

Discussion

This meta-analysis provides a review of selected tests dedicated to the detection of feigned deficits during neuropsychological examinations and is the first update since Vickery and colleagues (2001). Using tightened inclusion criteria, five tests were identified for review. In terms of aggregate effect sizes, the mean d collapsed across tests was 1.55, significantly larger than the value reported by Vickery and colleagues (2001). Although all five tests had large effect sizes, a hierarchy was identified, such that VSVT > WMT = TOMM = LMT > MSVT. Effect sizes were related to study design, with ds for Sim:PTFGN > KG = Sim:NLFGN. Several features of simulation designs moderated effect sizes, with reduced ds associated with symptom-specific coaching of feigners, availability of outside resources for feigners to prepare, and higher contingent monetary bonuses offered for successful feigning. Unexpectedly, inclusion in instructions of a warning to feigners to fake believably was associated with increased effect sizes. Considering classification parameters, mean sensitivity across tests was 0.69, specificity was 0.90, and hit rate was 0.79. There were no significant differences across tests in sensitivity or hit rate; however, the WMT had significantly lower specificity than the TOMM and the LMT. Research design was not related to specificity or hit rate values, although sensitivity was significantly higher for Sim:PTFGN designs than for KG methodologies.

Overall, these findings indicate that there are multiple well-validated tests for detecting neurocognitive feigning during neuropsychological examinations. All the procedures reviewed here had generally strong classification characteristics, with the possible exception of the WMT's specificity. It should be noted, however, that the WMT has robust support from a number of published studies that did not meet the inclusion criteria for the present meta-analysis (Green, 2007).

Effect size moderators were examined only for studies using simulation designs (Sim:PTFGN and Sim:NLFGN). Most of the findings were intuitively clear: effect sizes were reduced when feigners were coached on specific symptoms, when outside resources such as Internet information were made available to feigners, and when higher monetary incentives were offered for successful feigning. One unexpected finding was that effect sizes increased when feigners were warned not to be too blatant in their approach to test-taking. There is no obvious explanation for this finding, and it may be worthwhile to explore the issue further with experimental studies of instructional manipulations in simulated feigning designs.

One longstanding issue in research on the detection of neurocognitive feigning is the extent to which simulation studies produce ecologically valid results. The present meta-analysis provides information on this question, although in a nuanced way. Considering d scores as the dependent variable, simulation designs that included patients instructed to exaggerate their problems as well as patients instructed to answer honestly (Sim:PTFGN) produced significantly higher values than KG designs. However, d scores from KG studies were not significantly different from those of simulation designs contrasting normals instructed to feign deficits with patients independently verified as responding honestly (Sim:NLFGN). This pattern might have occurred because patients instructed to feign, owing to their pathology, have fewer cognitive resources with which to fake realistically. Alternatively, because instruction sets for feigning patients often request that they exaggerate their actual problems, these participants may simply have overdramatized their cognitive deficits during testing, as requested. Overall, results from d score analyses suggest that Sim:PTFGN designs may overestimate effect sizes relative to KG designs, although there was not a significant difference between Sim:NLFGN and KG findings.

The generalizability of results from simulation designs may also be addressed at the level of classification accuracy statistics. Here, too, the results were mixed. No significant differences across research designs were seen for specificity or hit rate values. There was a significant difference across designs for sensitivity, accounted for by a significantly higher sensitivity rate for Sim:PTFGN than for KG studies; sensitivity did not differ significantly between Sim:NLFGN and KG studies. Although more data addressing the generalizability of simulation results would be desirable, at present only the Sim:PTFGN design appears to generate results of questionable generalizability. The available evidence points to adequate ecological validity for Sim:NLFGN designs, which did not produce d scores or classification accuracy estimates significantly different from those found in KG studies.

In considering the research designs used to investigate tests for detecting neurocognitive feigning, it may be helpful to move past the question of which design is “best.” In fact, both simulation and KG designs have significant strengths and weaknesses. KG designs, by virtue of including actual patients in real-world evaluations, clearly have strong external validity. However, because this methodology does not utilize random assignment to groups or an experimental manipulation, internal validity is less robust. For example, differences between those identified as feigning and those classified as honestly responding are typically attributed to the presence or absence of a malingering response set, but the possibility that the two groups differed in important ways prior to their decisions to feign or respond honestly cannot be ruled out as a partial determinant of any findings. Additionally, it is possible that only incompetent malingerers end up classified as feigners in KG studies, which ironically may raise questions about the generalizability of findings. Of course, there are also weaknesses in simulation designs. Even the closest attention to instructions, monetary incentives, tests included in the battery, debriefing, elimination of non-compliant participants, and so forth cannot capture the constellation of experiences and incentives present in real-world settings. However, simulation studies do have high internal validity when appropriate methodological steps are taken. It may be most useful to consider simulation and KG methodologies as complementary, with the highest confidence placed in tests that demonstrate converging support from both types of investigation.

There are important limitations to the present report that should be carefully considered. First, the mean sensitivity and specificity values presented here must be translated into predictive powers by incorporating estimated base rates of feigning in local settings. Second, in the KG results summarized here, symptom validity tests were typically an important component of assigning individuals to feigning or honest groups; although no study that used a test summarized here as both a criterion and a dependent variable was included, some concern about circularity might still be raised. Third, consistent with current practice in this area, all of the KG procedures studied here used a norm-based cutting score to classify individuals as feigning or honest; this is ultimately a probabilistic exercise, meaning that some classification errors are likely present.
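To make the first of these limitations concrete, consider a hypothetical setting with an assumed local base rate of feigning of p = .30 (a figure chosen purely for illustration). Applying the mean parameters reported above (sensitivity = .69, specificity = .90) to the standard predictive-power formulas gives:

```latex
\mathrm{PPP} = \frac{.30 \times .69}{.30 \times .69 + .70 \times .10}
             = \frac{.207}{.277} \approx .75, \qquad
\mathrm{NPP} = \frac{.70 \times .90}{.70 \times .90 + .30 \times .31}
             = \frac{.630}{.723} \approx .87
```

Under these assumptions, roughly one in four positive (feigning) determinations would be in error, which reinforces the recommendation below against relying on any single procedure.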

In summary, this meta-analysis demonstrated that there are a number of well-validated procedures for identifying neurocognitive feigning during evaluations. Given the modest sensitivity of most of the tests reviewed here, clinicians should not rely on a single procedure to address the possibility of feigned deficits. Instead, several procedures tapping different domains and administered across the evaluation should be employed (Boone, 2007). Additionally, given concerns about the possibly compromised validity of existing feigning tests, it will continue to be important to develop and validate novel tests in this area.

Supplementary material

Supplementary material is available at Archives of Clinical Neuropsychology online.

Conflict of Interest

None declared.

Acknowledgements

D.T.R.B. holds the copyright to the Letter Memory Test and it may be ordered from him by email. All proceeds from sale of this measure are donated to the Jesse G. Harris Psychological Services Training Center. Gratitude is expressed to Monica Harris Kern, PhD, for reviewing an earlier draft of this manuscript; Lindsey Schipper, PhD, and Karen Kit, PhD, for assistance with cross-coding; and to those who provided reprints and other materials.

References

Notes: References marked with an asterisk were included in the meta-analysis. Due to space limitations, those not meeting inclusion criteria are not listed here.

*Abramsky, A. (2005). Assessment of test behaviors as a unique construct in the evaluation of malingered depression on the Inventory of Problems: Do test behaviors add significant variance beyond problem endorsement strategies? Dissertation Abstracts International, Section B: The Sciences and Engineering, 66, 1779.

*Batt, K., Shores, E. A., & Chekaluk, E. (2008). The effect of distraction on the Word Memory Test and Test of Memory Malingering performance in patients with a severe brain injury. Journal of the International Neuropsychological Society, 14, 1074–1080.

Berry, D. T. R., Baer, R. A., Rinaldo, J., & Wetter, M. W. (2002). Assessment of malingering. In J. Butcher (Ed.), Clinical personality assessment: Practical approaches (2nd ed., pp. 269–302). New York: Oxford University Press.

Berry, D. T. R., Schipper, L. J., & Larrabee, G. J. (2007). Detection of feigned psychiatric symptoms during forensic neuropsychological evaluations. In G. J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 226–263). New York: Oxford University Press.

Binder, L. M. (1993). Assessment of malingering after mild head trauma with the Portland Digit Recognition Test. Journal of Clinical and Experimental Neuropsychology, 15, 170–182.

Binder, L. M., & Willis, S. C. (1991). Assessment of motivation after financially compensable minor head trauma. Psychological Assessment, 3, 175–181.

Boone, K. B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press.

Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. (2005). Symptom validity assessment: Practical issues and medical necessity. NAN position paper. Archives of Clinical Neuropsychology, 20, 419–426.

*Connell, K. E. (2005). Detecting simulated versus genuine posttraumatic stress disorder (Doctoral dissertation, California School of Professional Psychology, 2004). Dissertation Abstracts International, 65, 5393. (UMI No. 3149303)

*Covert, J. H. (2010). Neurocognitive variables underlying group performance on a measure of effort: The Medical Symptom Validity Test (MSVT). Dissertation Abstracts International, Section B: The Sciences and Engineering, 70, 6544.

Daubert v. Merrell Dow Pharmaceuticals, Inc., 507 U.S. 579, 113 S. Ct. 2786 (1993).

Demakis, G. J. (2006). Meta-analysis in neuropsychology: Basic approaches, findings, and applications. The Clinical Neuropsychologist, 20, 10–26.

*Farkas, M. R. (2009). Ability of malingering measures to differentiate simulated versus genuine mental retardation. Dissertation Abstracts International, Section B: The Sciences and Engineering, 69(11-B), 7136.

*Graue, L. O., Berry, D. T. R., Clark, J. A., Sollman, M. J., Cardi, M., Hopkins, J., et al. (2007). Identification of feigned mental retardation using the new generation of malingering detection instruments: Preliminary findings. The Clinical Neuropsychologist, 21, 929–942.

Green, P. (2003). Green's Word Memory Test for Windows: User's manual. Edmonton, Alberta, Canada: Green's Publishing.

Green, P. (2004). Medical Symptom Validity Test for Windows: User's manual and program. Edmonton, Alberta, Canada: Green's Publishing.

Green, P. (2007). Spoiled for choice: Making comparisons between forced-choice effort tests. In K. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 50–77). New York: Guilford Press.

*Greub, B. L. (2005). The validity of the LMT as a measure of memory malingering: Robustness to coaching (Doctoral dissertation, Ohio University, 2005). Dissertation Abstracts International, 66, 552.

*Greub, B. L., & Suhr, J. A. (2006). The validity of the Letter Memory Test as a measure of memory malingering: Robustness to coaching. Archives of Clinical Neuropsychology, 21, 249–254.

*Greve, K. W., Bianchini, K. J., Black, F. W., Heinly, M. T., Love, J. M., Swift, D. A., et al. (2006). Classification accuracy of the Test of Memory Malingering in persons reporting exposure to environmental and industrial toxins: Results of a known-groups analysis. Archives of Clinical Neuropsychology, 21, 439–448.

*Greve, K. W., Bianchini, K. J., & Doane, B. M. (2006). Classification accuracy of the Test of Memory Malingering in traumatic brain injury: Results of a known-groups analysis. Journal of Clinical and Experimental Neuropsychology, 28, 1176–1190.

*Greve, K. W., Etherton, J. L., Ord, J., Bianchini, K. J., & Curtis, K. L. (2009). Detecting malingered pain-related disability: Classification accuracy of the Test of Memory Malingering. The Clinical Neuropsychologist, 23, 1250–1271.

*Greve, K. W., Ord, J., Curtis, K. L., Bianchini, K. J., & Brennan, A. (2008). Detecting malingering in traumatic brain injury and chronic pain: A comparison of three forced-choice symptom validity tests. The Clinical Neuropsychologist, 22, 896–918.

*Harrison, C. (2009). Assessing malingered responding: Concurrent validation of a forced-choice test using ink blot stimuli for the identification of malingered responses. Dissertation Abstracts International, Section B: The Sciences and Engineering, 70, 3782.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego, CA: Academic Press.

Hiscock, M., & Hiscock, C. L. (1989). Refining the forced-choice method for the detection of malingering. Journal of Clinical and Experimental Neuropsychology, 11, 967–974.

*Hubbard, K. L. (2008). Feigning cognitive deficits among psychiatric inpatients: Validation of three measures of cognitive malingering. Dissertation Abstracts International, Section B: The Sciences and Engineering, 68, 6966.

*Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 17, 1–23.

*Inman, T. H., Vickery, C. D., Berry, D. T. R., Lamb, D. G., Edwards, C. L., & Smith, G. T. (1998). Development and initial validation of a new procedure for evaluating adequacy of effort given during neuropsychological testing: The Letter Memory Test. Psychological Assessment, 10, 128–149.

Iverson, G., Franzen, M. D., & McCracken, L. (1991). Evaluation of an objective assessment technique for the detection of malingered memory deficits. Law and Human Behavior, 15, 667–676.

Johnson, B. T. (1993). DSTAT 1.10 manual. Hillsdale, NJ: Lawrence Erlbaum Associates.

Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.

Lezak, M. D. (1976). Neuropsychological assessment. New York: Oxford University Press.

*Lindstrom, W. A., Lindstrom, J. H., Coleman, C., Nelson, J., & Gregg, N. (2009). The diagnostic accuracy of symptom validity tests when used with postsecondary students with learning disabilities: A preliminary investigation. Archives of Clinical Neuropsychology, 24, 659–669.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.

Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1098.

Nelson, N. W., Sweet, J. J., & Demakis, G. J. (2006). Meta-analysis of the MMPI-2 Fake Bad Scale: Utility in forensic practice. The Clinical Neuropsychologist, 20, 39–58.

*Orey, C., Cragar, D. E., & Berry, D. T. R. (2000). The effects of two motivational manipulations on the neuropsychological performance of mildly head-injured college students. Archives of Clinical Neuropsychology, 15, 335–348.

*Pivovarova, E., Rosenfeld, B., Dole, T., Green, D., & Zapf, P. (2009). Are measures of cognitive effort and motivation useful in differentiating feigned from genuine psychiatric symptoms? The International Journal of Forensic Mental Health, 8, 271–278.

*Rees, L. M., Tombaugh, T. N., Gansler, D. A., & Moczynski, N. P. (1998). Five validation experiments of the Test of Memory Malingering (TOMM). Psychological Assessment, 10, 10–20.

*Rosenfeld, B., Green, D., Pivovarova, E., Dole, T., & Zapf, P. (2010). What to do with contradictory data? Approaches to the integration of multiple malingering measures. The International Journal of Forensic Mental Health, 9, 63–73.

Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.). Thousand Oaks, CA: Sage Publications.

*Samra, J. (2004). The impact of depression on multiple measures of malingering (Doctoral dissertation, Simon Fraser University, 2002). Dissertation Abstracts International, 64, 4061.

*Schipper, L. J., Berry, D. T. R., Coen, E., & Clark, J. A. (2008). Cross-validation of a manual form of the Letter Memory Test using a known-groups methodology. The Clinical Neuropsychologist, 22, 345–349.

*Shandera, A. L., Berry, D. T. R., Clark, J. A., Schipper, L. J., Graue, L. O., & Harp, J. P. (2010). Detection of malingered mental retardation. Psychological Assessment, 22, 50–56.

Sharp, D. (1997). Of apples and oranges, file drawers, and garbage: Why validity issues in meta-analysis will not go away. Clinical Psychology Review, 17, 881–901.

*Sholtz, B. P. (2006). Effects of cautioning and education in the detection of malingered mild traumatic brain injury. Dissertation Abstracts International, Section B: The Sciences and Engineering, 67, 2243.

*Singhal, A., Green, P., Ashaye, K., Shankar, K., & Gill, D. (2009). High specificity of the Medical Symptom Validity Test in patients with very severe memory impairment. Archives of Clinical Neuropsychology, 24, 721–728.

*Slick, D. J. (1996). The Victoria Symptom Validity Test: A new clinical measure of response bias (Doctoral dissertation, University of Victoria, 1996). Dissertation Abstracts International, 59, 6114.

Slick, D., Hopp, G., Strauss, E., & Spellacy, F. (1996). Victoria Symptom Validity Test: Efficiency for detecting feigned memory impairment and relationship to neuropsychological tests and MMPI-2 validity scales. Journal of Clinical and Experimental Neuropsychology, 18, 911–922.

Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 465–473.

*Sollman, M. J., Ranseen, J. R., & Berry, D. T. R. (2010). Detection of feigned ADHD in college students. Psychological Assessment, 22, 325–335.

*Strauss, E., Slick, D. J., Levy-Bencheton, J., Hunter, M., MacDonald, S. W. S., & Hultsch, D. F. (2002). Intraindividual variability as an indicator of malingering in head injury. Archives of Clinical Neuropsychology, 17, 423–444.

*Tombaugh, T. N. (1997). The Test of Memory Malingering (TOMM): Normative data from cognitively intact and cognitively impaired individuals. Psychological Assessment, 9, 260–268.

*Vagnini, V. L., Berry, D. T. R., Clark, J. A., & Jiang, Y. (2008). New measures to detect malingered neurocognitive deficit: Applying reaction time and event-related potentials. Journal of Clinical and Experimental Neuropsychology, 30, 766–776.

*Vagnini, V. L., Sollman, M. J., Berry, D. T. R., Granacher, R. P., Clark, J. A., Burton, R., et al. (2006). Known-groups cross-validation of the Letter Memory Test in a compensation-seeking mixed neurologic sample. The Clinical Neuropsychologist, 20, 289–304.

*Vickery, C. D., Berry, D. T. R., Dearth, C. S., Vagnini, V. L., Baser, R. E., Cragar, D. E., et al. (2004). Head injury and the ability to feign neuropsychological deficits. Archives of Clinical Neuropsychology, 19, 37–48.

Vickery, C. D., Berry, D. T. R., Inman, T. H., Harris, M. J., & Orey, S. A. (2001). Detection of inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures. Archives of Clinical Neuropsychology, 16, 45–73.

Viglione, D. J., Wright, D., Moynihan, N., Dizon, J. E., DuPuis, S., & Pizitz, T. D. (2001). Evading detection on the MMPI-2: Does caution produce more realistic patterns of responding? Assessment, 8, 237–250.

*Wolfe, P. L. (2004). Effects of coaching on simulated malingering performance: Comparison of three symptom validity measures (Doctoral dissertation, Illinois Institute of Technology, 2003). Dissertation Abstracts International, 64, 6346. (UMI No. 3117134)

*Wood, S. M. (2009). Unique contributions of performance and self-report methods in the detection of malingered psychotic symptoms. Dissertation Abstracts International, Section B: The Sciences and Engineering, 70(3-B), 1961.

Youngjohn, J. R., Lees-Haley, P. R., & Binder, L. M. (1999). Comment: Warning malingerers produces more sophisticated malingering. Archives of Clinical Neuropsychology, 14, 511–515.

Author notes

Present address: Department of Neurology, Wake Forest University School of Medicine, Winston-Salem, NC, USA.