A prior report found unusually high rates of performance validity test (PVT) failure in undergraduate research participants (31%–56%). The present study examined 110 undergraduate volunteers in three conditions (positive, neutral, or negative demand characteristics) in either an easy to hard or a hard to easy progression of neuropsychological tests using the Word Memory Test PVT. Neither demand characteristics nor test order had a substantial effect on test performance, and only a 6.4% failure rate was found on the PVT. These results suggest that neuropsychological testing experiments are completed faithfully by the vast majority of college undergraduates, although excluding the small number of participants failing PVTs would strengthen the internal validity of most studies.

Introduction

An, Zakzanis, and Joordens (2012) reported that 30.8%–55.6% of college undergraduates failed at least one of three performance validity tests (PVTs), such as the Test of Memory Malingering, Dot Counting Test, or Victoria Symptom Validity Test, in a study completed with introductory psychology class volunteers who received extra credit for participation. The study included other neuropsychological and psychological measures, such as the Raven's Advanced Progressive Matrices, Delis–Kaplan Executive Function System Color-Word Interference tasks, the Wechsler Memory Scale-Third Edition, and the Beck Depression Inventory (BDI). To our knowledge, this has been the only study investigating suboptimal effort in this population. Effort test failure correlated with poor performance on neuropsychological testing and, though the sample size on repeat testing was small, those who failed effort testing during the first session tended to fail again in the second session. The authors concluded that such high rates of poor effort call into question the validity of research using undergraduate college volunteers.

An and colleagues' (2012) results run counter to work in our laboratory, where over 170 undergraduate volunteers have been tested as controls using either the Word Memory Test (WMT; Green, 2003) or the Medical Symptom Validity Test (MSVT; Green, 2004) over the past 5 years in studies examining simulation of various disorders (e.g., attention deficit hyperactivity disorder, learning disability, and mild traumatic brain injury). In the control groups in these studies, we found an overall failure rate of 2.1%, with a range of 1.9%–2.6% across four studies. As a result, we designed the current study to test failure on the WMT (Green, 2003), which is generally held to be the most sensitive measure of effort currently available (Gervais, Rohling, Green, & Ford, 2004; Green, 2007; O'Bryant & Lucas, 2006; Tan, Slick, Strauss, & Hultsch, 2002). We included different demand characteristics in the experimental situation to examine whether differential treatment of research participants accounted for differences in PVT scores. Additionally, we used a rationally derived easy to hard test progression with half of the participants, and a hard to easy progression with the other half, to examine whether test order influenced the failure rate of undergraduate volunteers. Considering the low rates of insufficient effort (∼2.1%) in undergraduate volunteers tested on the WMT or MSVT in our laboratory, and McCambridge, de Bruin, and Witton's (2012) study on demand characteristics, we hypothesized that the positive and neutral conditions would show similar failure rates and that the negative condition would show a higher failure rate, regardless of test progression.

Methods

A sample of 110 psychology undergraduate volunteers, who received extra credit for participation, was included in the study and divided into three conditions: positive, neutral, and negative. Table 1 displays the participants' demographic information per condition. After signing the informed consent form, participants were asked whether they had any pre-existing neurological and/or psychiatric conditions; those who reported such conditions were excluded, except for participants previously or currently treated for depression, anxiety disorders, and/or attention deficit hyperactivity disorder (ADHD; Table 1). Five participants were excluded in total. There were no significant differences among the three experimental conditions with regard to age, gender, race, education level, or psychiatric disorder. Participants self-registered for time slots through an online experiment management system (SONA Systems, Ltd, Version 2.72; Tallinn, Estonia) that was available only to students enrolled in psychology classes. The available time slots on SONA were predetermined by the research assistants' schedules. Group assignment therefore depended on each participant's self-registered time slot and the research assistants' schedules, although the participants were blind to the conditions. In the positive condition, the research assistant greeted participants on arrival with a smile and an accommodating manner and used optimistic encouragement throughout the testing. In contrast, the negative condition consisted of a brusque, but not rude, manner and a 15-min wait before testing began. The research assistant had an expedient and somewhat harried style of test administration, without anticipatory accommodation to participants' needs. The neutral condition used a professional interpersonal style that was neither positive nor negative in approach.
In addition, half of the participants in each condition were given an easy to hard order of tests and the other half were given tests in the reverse order. The battery of tests included: Beck Anxiety Inventory (BAI; Beck & Steer, 1993); BDI-Second Edition (BDI-II; Beck, Steer, & Brown, 1996); Stroop Word Reading and Color-Word pages (Stroop-A and -C; Golden, 1978); Mini-International Personality Item Pool (Mini-IPIP; Donnellan, Oswald, Baird, & Lucas, 2006); Shipley-2 Abstraction and Vocabulary (Shipley-2-A and -V; Shipley, Gruber, Martin, & Klein, 2009); Trail Making Test Parts A and B (TMT-A and -B; Army Individual Test Battery, 1944); Wide Range Achievement Test-Fourth Edition Word Reading and Sentence Comprehension (WRAT-4-WR and -SC; Wilkinson & Robertson, 2006); and Word Memory Test Immediate Recognition (WMT-IR), Delayed Recognition (WMT-DR), Consistency (WMT-CNS), Multiple Choice (WMT-MC), Paired Associates (WMT-PA), Free Recall (WMT-FR), and Long Delayed Free Recall (WMT-LDFR). The WMT subtests were placed at fixed points in both orders to accommodate the timing of the test's delay subtests. The administration order from easy to hard was: Stroop-A, TMT-A, Shipley-2-V, WMT-IR, BAI, BDI-II, Mini-IPIP, WRAT-4-WR, WRAT-4-SC, WMT-DR, WMT-CNS, WMT-MC, WMT-PA, WMT-FR, Stroop-C, TMT-B, Shipley-2-A, and WMT-LDFR. The administration order from hard to easy was: Shipley-2-A, TMT-B, Stroop-C, WMT-IR, BAI, BDI-II, Mini-IPIP, WRAT-4-SC, WRAT-4-WR, WMT-DR, WMT-CNS, WMT-MC, WMT-PA, WMT-FR, Shipley-2-V, TMT-A, Stroop-A, and WMT-LDFR.
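The two administration orders can be compared programmatically. The following is a minimal sketch in Python, with the lists transcribed from the text above; the variable and function names are illustrative and are not part of the study materials. It checks which tests hold the same position in both orders and whether the remaining tests are simply reversed.

```python
# Test orders transcribed from the Methods text; all names here are illustrative.
EASY_TO_HARD = [
    "Stroop-A", "TMT-A", "Shipley-2-V", "WMT-IR", "BAI", "BDI-II",
    "Mini-IPIP", "WRAT-4-WR", "WRAT-4-SC", "WMT-DR", "WMT-CNS", "WMT-MC",
    "WMT-PA", "WMT-FR", "Stroop-C", "TMT-B", "Shipley-2-A", "WMT-LDFR",
]
HARD_TO_EASY = [
    "Shipley-2-A", "TMT-B", "Stroop-C", "WMT-IR", "BAI", "BDI-II",
    "Mini-IPIP", "WRAT-4-SC", "WRAT-4-WR", "WMT-DR", "WMT-CNS", "WMT-MC",
    "WMT-PA", "WMT-FR", "Shipley-2-V", "TMT-A", "Stroop-A", "WMT-LDFR",
]


def fixed_positions(order_a, order_b):
    """Return {test: index} for tests that occupy the same slot in both orders."""
    return {a: i for i, (a, b) in enumerate(zip(order_a, order_b)) if a == b}


def moved_tests(order_a, order_b):
    """Return the tests of order_a whose slot differs in order_b, in order_a sequence."""
    return [a for a, b in zip(order_a, order_b) if a != b]


fixed = fixed_positions(EASY_TO_HARD, HARD_TO_EASY)
print(sorted(fixed, key=fixed.get))           # tests anchored in both orders
print(moved_tests(EASY_TO_HARD, HARD_TO_EASY))  # tests whose position changes
```

Under this transcription, all seven WMT subtests and the three questionnaires (BAI, BDI-II, Mini-IPIP) sit in the same slots in both orders, and the remaining eight tests appear in exactly reversed sequence.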

Table 1.

Demographics and psychiatric disorders by condition

Demographics            Positive   Neutral   Negative   Total (%)
N                       35         39        36         110 (100%)
Age
 Mean                   22.4       22.6      24.4       23.1
 Range                  18–48      18–53     18–51      18–53
Gender
 Male                                                   23 (21%)
 Female                 28         30        29         87 (79%)
Race
 Asian                                                  11 (10%)
 Black                                                  13 (12%)
 Hispanic                                               9 (8%)
 White                  23         26        23         72 (65%)
 Other                                                  5 (5%)
Education level
 Mean                   14.4       14.2      14.4       14.3
 Range                  12–17      12–17     12–17      12–17
Psychiatric disorders
 Anxiety                                                8 (7%)
 Depression                                             5 (5%)
 ADHD                                                   3 (3%)
 Comorbidity                                            6 (5%)

Notes: Education level is provided in years of education completed. Comorbidity refers to participants who reported having been diagnosed with both depression and anxiety.

Results

Table 2 displays group performance across all measures used in the study. Order of test presentation had no effect on any variable according to multiple one-way ANOVAs (p > .05), so results were collapsed across order. The lone significant difference occurred for the Stroop-C task, where the negative condition performed more poorly than the positive condition (p < .05), while neither differed from the neutral condition. As evident in the table, condition did not affect performance on the WMT, with only 6.4% (n = 7) of participants failing according to the criterion of below cut-off performance on any of the five effort indices (WMT-IR, -DR, -CNS, -MC, and -PA, per the WMT manual; Green, 2003). Failures occurred roughly equally across conditions: two failures each in the positive and neutral conditions, and three failures in the negative condition, with scores as shown in Table 3.
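The failure-rate arithmetic can be reproduced from the reported counts. The following is a minimal sketch in Python using the group sizes from Table 1 and the failure counts stated above; the per-condition percentages it prints are derived here for illustration and are not reported in the study, and the names are illustrative.

```python
# Group sizes from Table 1 and WMT failure counts from the Results text.
GROUP_SIZES = {"positive": 35, "neutral": 39, "negative": 36}
WMT_FAILURES = {"positive": 2, "neutral": 2, "negative": 3}


def failure_rate(failures, n):
    """Return the failure rate as a percentage of n participants."""
    return 100.0 * failures / n


# Overall rate: 7 failures out of 110 participants.
overall = failure_rate(sum(WMT_FAILURES.values()), sum(GROUP_SIZES.values()))

# Per-condition rates (derived for illustration, not reported in the study).
per_condition = {
    cond: round(failure_rate(WMT_FAILURES[cond], GROUP_SIZES[cond]), 1)
    for cond in GROUP_SIZES
}

print(round(overall, 1))  # 6.4
print(per_condition)      # {'positive': 5.7, 'neutral': 5.1, 'negative': 8.3}
```

With only two or three failures per group, a difference of a single participant shifts a condition's rate by roughly three percentage points, so the per-condition figures should be read descriptively.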

Table 2.

Test performance means and standard errors by condition

Test Positive mean (SE) Neutral mean (SE) Negative mean (SE) 
TMT-A 23 (1.4) 23 (1.3) 23 (1.4) 
TMT-B 64 (3.5) 61 (3.3) 62 (3.4) 
Shipley-V 29 (0.8) 27 (0.8) 29 (0.8) 
Shipley-A 14 (0.5) 14 (0.5) 14 (0.5) 
Stroop-A 100 (3.2) 94 (3.1) 101 (3.2) 
Stroop-C 54 (2.1)a 47 (2.0)ab 48 (2.1)b 
WMT-IR 98 (0.9) 99 (0.8) 98 (0.9) 
WMT-DR 98 (1.7) 98 (1.7) 96 (1.7) 
WMT-CNS 97 (1.1) 97 (1.1) 97 (1.1) 
WMT-MC 96 (2.3) 92 (2.3) 93 (2.4) 
WMT-PA 95 (2.0) 93 (2.0) 93 (2.0) 
WMT-FR 69 (2.8) 64 (2.7) 60 (2.8) 
WMT-LDFR 70 (2.8) 66 (2.7) 62 (2.8) 
WRAT-4-WR 45 (0.9) 45 (0.8) 44 (0.9) 
WRAT-4-SC 45 (0.9) 43 (0.8) 44 (0.9) 
BAI 8 (1.3) 10 (1.2) 10 (1.3) 
BDI-II 8 (1.4) 11 (1.3) 9 (1.4) 
Mini-IPIP-E 12 (0.5) 12 (0.4) 11 (0.5) 
Mini-IPIP-A 13 (0.5) 14 (0.5) 13 (0.5) 
Mini-IPIP-C 13 (0.5) 14 (0.5) 13 (0.5) 
Mini-IPIP-N 11 (0.5) 11 (0.4) 11 (0.5) 
Mini-IPIP-O 11 (0.5) 11 (0.5) 11 (0.5) 

Notes: Superscript letters signify statistical significance between scores not sharing the same superscript letters, F(2, 106) = 3.03, p = .05, R² = .05. Group means and standard errors (SEs) are shown for each test. TMT-A = Trail Making, Part A raw seconds; TMT-B = Trail Making, Part B raw seconds; Shipley-V = Shipley-2 Vocabulary raw correct; Shipley-A = Shipley-2 Abstraction raw correct; Stroop-A = Stroop Word Reading page raw number count; Stroop-C = Stroop Color-Word page raw number count; WMT-IR = Word Memory Test Immediate Recognition; WMT-DR = Word Memory Test Delayed Recognition; WMT-CNS = Word Memory Test Consistency; WMT-MC = Word Memory Test Multiple Choice; WMT-PA = Word Memory Test Paired Associates; WMT-FR = Word Memory Test Free Recall; WMT-LDFR = Word Memory Test Long Delayed Free Recall; WRAT-4-WR = Wide Range Achievement Test-Fourth Edition Word Reading; WRAT-4-SC = Wide Range Achievement Test-Fourth Edition Sentence Comprehension; BAI = Beck Anxiety Inventory; BDI-II = Beck Depression Inventory-Second Edition; Mini-IPIP-E = Mini-International Personality Item Pool Extraversion; Mini-IPIP-A = Mini-International Personality Item Pool Agreeableness; Mini-IPIP-C = Mini-International Personality Item Pool Conscientiousness; Mini-IPIP-N = Mini-International Personality Item Pool Neuroticism; and Mini-IPIP-O = Mini-International Personality Item Pool Openness.
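As a consistency check on the Stroop-C statistics in the note above, the proportion of variance explained can be recovered from the F ratio and its degrees of freedom. The following is a minimal sketch in Python; it assumes the reported value is the proportion of variance explained by condition in a one-way model, which the source does not state explicitly, and the function name is illustrative.

```python
def r_squared_from_f(f_value, df_between, df_within):
    """Proportion of variance explained implied by a one-way ANOVA F ratio.

    Uses the identity R^2 = (F * df_between) / (F * df_between + df_within).
    """
    explained = f_value * df_between
    return explained / (explained + df_within)


# Values reported for the Stroop-C comparison: F(2, 106) = 3.03.
r2 = r_squared_from_f(3.03, 2, 106)
print(round(r2, 2))  # 0.05
```

The result, about .054, rounds to the reported .05 under that assumption.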

Table 3.

Scores by condition for those seven subjects who failed the WMT

IR DR CNS MC PA Condition 
58 100 58 85 90 Positive 
80 65 75 35 35 Positive 
90 80 75 60 55 Neutral 
93 88 80 85 75 Neutral 
100 95 95 75 55 Negative 
88 98 85 50 55 Negative 
88 80 78 50 50 Negative 

Notes: IR = Immediate Recognition; DR = Delayed Recognition; CNS = Consistency; MC = Multiple Choice; PA = Paired Associates; FR = Free Recall; LDFR = Long Delayed Free Recall.

Discussion

As noted by McCambridge and colleagues (2012), few studies have examined demand characteristics in non-laboratory settings (the authors found only one experimental and six observational studies). They concluded that this important and well-known effect remains poorly understood in situations such as clinical neuropsychological testing. Contrary to our hypothesis that the negative condition would exhibit a higher rate of effort failure, the present results suggest that PVT failure, as measured by the WMT, is similar across the three experimental conditions (positive, neutral, and negative) and is relatively low, as found in other studies of undergraduate volunteers conducted in our laboratory. Two conclusions are suggested. First, undergraduate participants in our laboratory rarely fail the WMT when given standard instructions. Second, neither experimenter demeanor nor the ordering of tests by difficulty affected WMT performance in the current results. We believe the WMT to be robust to standard clinical conditions in neuropsychological evaluations and research studies.

The present study found a WMT failure rate of only 6.4%, in direct contrast to the enormous failure rates reported by An and colleagues (31%–56%). This large discrepancy is difficult to reconcile through methodological differences between the two studies, and we believe the An and colleagues results to be aberrant. For example, we cannot attribute the difference to the types and numbers of PVTs used, since the WMT has generally been found to be the most sensitive at high levels of specificity across many different studies using a wide range of populations (Gervais et al., 2004; Green, 2003; Green, Montijo, & Brockhaus, 2011; Tan et al., 2002). Reported base rates of malingering for non-litigant cases also fall between 7.10% and 11.56% (Mittenberg, Patton, Canyock, & Condit, 2002), a range much closer to 6% than to the one-third to more than one-half of participants reported by An and colleagues. We therefore recommend that this discrepancy in effort failure be reconciled in further work. Specifically, while demand characteristics and test order do not seem to be substantial moderators of PVT failure according to the present results, further studies exploring these issues in more detail would be useful. Additionally, sampling error may have contributed, since An and colleagues included only 36 participants; future studies should include larger samples. Finally, it is possible that different strategies for feigning deficits are detected by different PVT instruments, several of which were used in the An and colleagues study. Future work should include a variety of PVTs that might better detect differing feigning strategies; however, any PVTs included should have demonstrated high sensitivity when specificity is held at 90% or better. Future work should also address the limitations of the present study, noted below.

Limitations of the present study include the lack of a post-experiment questionnaire to determine participants' perceptions of the positive, negative, and neutral demand characteristics. Likewise, the perceived difficulty of the test orders was not empirically determined.

Acknowledgements

We thank the psychology undergraduate volunteers who participated in the study. Special thanks to our research assistants, Emily Kennedy-Hettwer, Ashten Morth, Elizabeth Peters, Blake Hummer, Abbey Van Boxtel, Erin Giese, and Olivia Harmelink, who helped with data collection and project management.

References

An, K. Y., Zakzanis, K. K., & Joordens, S. (2012). Conducting research with non-clinical healthy undergraduates: Does effort play a role in neuropsychological test performance? Archives of Clinical Neuropsychology, 27(8), 849–857.
Army Individual Test Battery. (1944). Manual of directions and scoring. Washington, DC: War Department, Adjutant General's Office.
Beck, A. T., & Steer, R. A. (1993). Beck Anxiety Inventory manual. San Antonio, TX: The Psychological Corporation.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-II. San Antonio, TX: The Psychological Corporation.
Donnellan, M. B., Oswald, F. L., Baird, B. M., & Lucas, R. E. (2006). The mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment, 18, 192–203.
Gervais, R. O., Rohling, M. L., Green, P., & Ford, W. (2004). A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology, 19, 475–487.
Golden, C. J. (1978). Stroop color and word test: A manual for clinical and experimental uses. Chicago, IL: Stoelting.
Green, P. (2003). Green's word memory test for Microsoft Windows: User's manual. Edmonton, Canada: Green's Publishing Inc.
Green, P. (2004). Green's medical symptom validity test (MSVT) for Microsoft Windows: User's manual. Edmonton, Canada: Green's Publishing Inc.
Green, P. (2007). Spoiled for choice: Making comparisons between forced-choice effort tests. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment (pp. 50–77). New York: Guilford Press.
Green, P., Montijo, J., & Brockhaus, R. (2011). High specificity of the Word Memory Test and Medical Symptom Validity Test in groups with severe verbal memory impairment. Applied Neuropsychology, 18(2), 86–94.
McCambridge, J., de Bruin, M., & Witton, J. (2012). The effects of demand characteristics on research participant behaviours in non-laboratory settings: A systematic review. PLoS ONE, 7(6), e39116.
Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24(8), 1094–1102.
O'Bryant, S. E., & Lucas, J. A. (2006). Estimating the predictive value of the Test of Memory Malingering: An illustrative example for clinicians. The Clinical Neuropsychologist, 20, 533–540.
Shipley, W. C., Gruber, C. P., Martin, T. A., & Klein, A. M. (2009). Shipley-2 manual. Los Angeles: Western Psychological Services.
Tan, J., Slick, D., Strauss, E., & Hultsch, D. F. (2002). Malingering strategies on symptom validity tests. The Clinical Neuropsychologist, 16(4), 495–505.
Wilkinson, G. S., & Robertson, G. J. (2006). Wide range achievement test 4 professional manual. Lutz, FL: Psychological Assessment Resources.