## Abstract

Poor effort by examinees during neuropsychological testing has a profound effect on test performance. Although neuropsychological experiments often utilize healthy undergraduate students, the test-taking effort of this population has not been investigated previously. The purpose of the present study was to determine whether undergraduate students exercise variable effort in neuropsychological testing. During two testing sessions, participants (N = 36) were administered three Symptom Validity Tests (SVTs): the Test of Memory Malingering, the Dot Counting Test, and the Victoria Symptom Validity Test (VSVT), along with various neuropsychological tests. Analyses revealed that 55.6% of participants in Session 1 and 30.8% of participants in Session 2 exerted poor effort on at least one SVT. Poor effort on the SVTs was significantly correlated with poor performance on various neuropsychological tests, and there was support for the temporal stability of effort. These preliminary results suggest that the base rate of suboptimal effort in a healthy undergraduate population is quite high. Accordingly, effort may serve as a source of variance in neuropsychological research when using undergraduate students.

## Introduction

Since the mid-1900s, the use of undergraduate students as research participants has been a popular practice at research universities. In fact, a recent analysis of the literature estimates the prevalence of undergraduate students in psychological research to be 68%, with slightly less than half of this participant pool being first-year introductory psychology students (Gallander Wintre, North, & Sugar, 2001). Furthermore, there has been no significant decrease in this practice over the past few decades, as studies illustrate undergraduate participant pool usage ranging from 70% in 1975 to 67% in 1985 to 68% in 1995 (Gallander Wintre et al., 2001). Although there have been no published studies on the prevalence of undergraduate students as participants in neuropsychological research, one might predict that a substantial portion of neuropsychological studies utilizes this population as well, especially as baseline comparisons to clinical populations.

One issue regarding the use of undergraduate students that has yet to be investigated concerns the test-taking effort exercised by these participants during neuropsychological research experiments. Several studies have investigated the base rate of malingering in clinical and medical populations (e.g., Mittenberg, Patton, Canyock, & Condit, 2002). However, to our knowledge, no study to date has examined the base rate of suboptimal effort in undergraduate research participants. Nevertheless, this is a relevant issue for neuropsychological research, as effort can influence performance on any given neuropsychological measure and, therefore, compromise the validity of findings (Green, Rohling, Lees-Haley, & Allen, 2001; Fox, 2011). In effect, a valid test score on any neuropsychological measure rests on the assumption that the participant has exerted optimal effort such that their performance reflects their true ability.

The need to assess suboptimal effort in undergraduate participants is based on two factors. First, many students may perceive the stakes as being low. For example, many undergraduate students participate in psychology studies to earn extra course credits. In keeping with ethical guidelines, these credits are usually awarded on a non-contingent basis. In other words, participants will be awarded the promised credits regardless of their level of performance during the experiment. Low-stakes testing refers to situations in which there is no personal consequence for poor effort during testing (Wise, 2006), and it describes the circumstances of many university research experiments. Second, the tedious nature of neuropsychological testing may contribute to mental fatigue in student participants. Studies may span several hours and/or consist of neuropsychological tests that may be mentally tiresome to examinees, thereby influencing the effort level of participants over time. Indeed, various studies in the literature have shown that mental fatigue increases as a function of task length (Mackworth, 1968; Kato, Endo, & Kizuka, 2009), raising the possibility that test-taking effort may also decrease with lengthy or repetitive tasks. Thus, many published neuropsychology studies that recruit from such pools may be using undergraduate participants who are not motivated to perform at their best and whose effort levels may be influenced by a variety of factors. Although this may not be unique to undergraduate students and may in fact represent a broader problem spanning various research populations, we hypothesize that this issue is salient in undergraduate participants as they represent a majority of the research participants at universities.

It is also important to note that there has been an increasing interest in test-taking effort and malingering in the past few decades (Morgan & Sweet, 2009). Although various terms exist in the literature to describe underperformance on tests, including non-optimal effort, suboptimal effort, and negative response bias (Iverson, 2006), the term poor effort is used in this context. To this end, we operationally define poor effort as underperformance (i.e., scores lower than expected for persons who give a credible effort) on standardized Symptom Validity Tests (SVTs), which are discussed in further detail below. At the same time, poor effort must be differentiated from malingering, which refers to the “intentional production of false or greatly exaggerated symptoms for the purpose of attaining some identifiable external reward” (American Psychiatric Association, 2000). Note that poor effort may encompass malingering, but the term is broader in the sense that the motivation for underperforming is unclear and may involve a lack of any incentive to perform well on tests.

Numerous types of measures are used in both clinical and research settings to examine poor effort and malingering. The SVT, a term first coined by Pankratz (Bianchini, Mathias, & Greve, 2001), is one such type of measure. These tests, while appearing to be difficult at face value, are actually quite simple and insensitive to most mental health disorders and physical ailments (e.g., Huppert & Piercy, 1976; Martone, Butters, & Trauner, 1986; Seron, Deloche, Ferrand, & Cornet, 1991). Aberrant performance on such tests strongly suggests suboptimal effort and may invalidate other components of the evaluation (e.g., Ruocco et al., 2008; Zakzanis, Gammada, & Jeffay, 2012). Although research on SVTs is often criticized due to the lack of external validity of the simulation studies they are based on (Nies & Sweet, 1994), they are currently the best method for detecting suspect effort levels. Although various SVTs exist for different populations and purposes, they typically utilize a forced-choice paradigm, in which patients must discriminate between two stimuli and choose the one that was presented previously (Bianchini et al., 2001).

One value of these forced-choice tests lies in the fact that there is a known probability of success by guessing alone. When patients score below this probability, it suggests that they actually recognize the correct stimulus but then actively choose the incorrect response (Bianchini et al., 2001). In addition, recognition memory is resilient to brain damage and various other psychiatric and medical disorders (Huppert & Piercy, 1976; Martone, Butters, & Trauner, 1986), making SVTs difficult to fail without intentionally attempting to do so.
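
The below-chance logic can be made concrete with a binomial calculation. The following is a minimal sketch, not part of the study's analysis: the function names, the 50-item example, and the .05 threshold are illustrative assumptions. It computes how likely a score at or below a given value would be if every two-choice item were answered by guessing.

```python
from math import comb


def prob_at_or_below(score: int, n_items: int, p_guess: float = 0.5) -> float:
    """P(X <= score) if every item is answered by pure guessing."""
    if not 0 <= score <= n_items:
        raise ValueError("score must lie between 0 and n_items")
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(score + 1)
    )


def is_below_chance(score: int, n_items: int,
                    p_guess: float = 0.5, alpha: float = 0.05) -> bool:
    """Flag scores that guessing alone would rarely produce (alpha is an assumed threshold)."""
    return prob_at_or_below(score, n_items, p_guess) < alpha


# Hypothetical scores on a hypothetical 50-item, two-choice test.
print(is_below_chance(10, 50))  # far fewer correct than guessing would yield
print(is_below_chance(25, 50))  # exactly what guessing would yield on average
```

A score flagged by this check is improbably low even for someone responding at random, which is why it is read as evidence of deliberately choosing the wrong answer rather than of genuine impairment.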

Beyond forced-choice test measures, other tests have also been developed to examine poor effort and malingering (e.g., Dot Counting Test [DCT]; Boone, Lu, & Herzberg, 2002). These tests are based on overlearned skills and are also deceptively easy (Seron, Deloche, Ferrand, & Cornet, 1991). Poor effort in these cases can be detected by way of aberrant performance compared with the population norms.

Accordingly, the purpose of the present study is to investigate whether undergraduate introductory psychology students exercise variable effort in neuropsychology studies. This study examines three hypotheses. First, we hypothesize that a considerable number of participants who are healthy and representative of the general population of university undergraduate research pools that we ask to engage in our research will exert suboptimal effort. Second, we examine the temporal stability of effort during neuropsychological testing in order to determine whether effort is a transient state or a stable trait. Our hypothesis is that, on serial testing, a smaller portion of the participants who exhibit poor effort on initial testing will demonstrate similarly poor effort on follow-up testing. This would suggest that poor effort might be due to stable dispositional factors rather than varying situational factors. Third, we examine what variables might mediate or predict poor effort in participants. We hypothesize that academic performance, mood, and energy levels might play a moderating role here. Academic performance was investigated because previous research has shown that it is one variable that predicts students' participation in psychological research studies (Padilla-Walker et al., 2005), and thus we hypothesized that it might have a role in students' effort in the actual experiment as well. Similarly, various lines of research have established the effects of mood, sleep deprivation, and fatigue on cognitive performance in both clinical (e.g., Váquez-Justo, Alvarez, & Otero, 2003) and student populations (e.g., Angus, Heslegrave, & Myles, 1985; Potts, Camp, & Coyne, 1989). We, therefore, set out to investigate whether such variables may also moderate effort levels.

## Methods

### Participants

Participants consisted of undergraduate students from an introductory psychology class recruited by way of the university's online recruitment system. Participation in these studies was not mandatory but students were able to receive extra course marks for compensation. Participants were excluded if they achieved a score below a Grade 4 level on the Word Reading and Sentence Comprehension subtests on the Wide Range Achievement Test Fourth Edition (WRAT4; Wilkinson & Robertson, 2006) to ensure that the participants could comprehend test instructions. The University of Toronto Research Ethics Board approved the study protocols, recruitment and use of human subjects.

### Measures

#### Symptom Validity Tests

The Test of Memory Malingering (TOMM; Tombaugh, 1996) is a 50-item forced-choice test of visual recognition memory and consists of two learning trials and a retention trial. After presentation of the 50 pictures during the learning trial, participants were shown 50 two-choice recognition panels one at a time consisting of a previously presented picture and a distractor picture. Participants were instructed to choose the picture presented previously. After each response, the examinee was given immediate feedback (i.e., “Correct” or “No, that's not right. It was this one.”). Participants were administered both learning trials of the TOMM, and the retention trial was administered only if participants received a score of <45 on Trial 2. To differentiate between participants who exert poor effort and those who exert normal effort on the TOMM, a cutoff score of <45 was used for Trial 2 and the Retention Trial, as was suggested in the test manual by Tombaugh. This cutoff score was based on normative data that showed that over 95% of the non-clinical sample obtained a score of 49 or 50 on Trial 2 and most of the clinical sample (except for the dementia group) obtained a perfect score on Trial 2 (Tombaugh, 1996).

We also employed the computerized Victoria Symptom Validity Test (VSVT; Slick, Hopp, & Strauss, 1997), which consists of 48 items employing the forced-choice paradigm. Each item consists of a 5-digit string that is presented on a computer screen, followed by a delay of 5, 10, or 15 s in which the screen remains blank. Following the delay, participants must correctly choose between two 5-digit strings presented on the screen. Participants' response accuracy and response latency were measured. Cutoff scores were calculated for the VSVT based on data of the non-clinical sample as reported in the VSVT manual. Specifically, all cutoffs were derived based on the observation that 95% of the non-clinical sample scored better than the chosen cutoff. Hence, a cutoff of 44 was used for the total raw score, 23 for the easy category raw score, and 20 for the difficult category raw score. In terms of response latency scores, a cutoff of 2.43 s was used for the total response latency, 2.00 s for the easy category, and 2.95 s for the difficult category.

Finally, the DCT (Boone, Lu, & Herzberg, 2002) was also administered to participants to assess effort. This brief and simple test consists of counting grouped and ungrouped dots on a set of 12 cards. The skills involved in this task are well preserved in most patients with brain injury (Seron et al., 1991). Participants' response latency and errors were calculated for a total E-score. A cutoff E-score of 14 for non-clinical groups provides a sensitivity of 88.2% and a specificity of 96.1% (Boone et al., 2002). Hence, any score equal to or above the cutoff score was deemed to be poor effort.

#### Neuropsychological tests and questionnaires

Three commonly employed neuropsychological tests were administered and served to examine whether poor effort on the SVTs would predict poorer performance on these tests. The Raven's Advanced Progressive Matrices (RAPM; Raven, Raven, & Court, 1998), a measure of analytical intelligence, consists of a series of complex patterned matrices that the examinee must complete by conceiving visual analogies. The Delis–Kaplan Executive Function System (D-KEFS) Color-Word Interference Test (Delis, Kaplan, & Kramer, 2001) is a measure of executive function and, more specifically, inhibition and cognitive flexibility. This test consists of four conditions. The Color Naming and Word Reading conditions serve as a baseline to compare the examinee's Inhibition and Inhibition/Switching scores and involve naming colored patches and reading words (“red,” “blue,” or “green”). The Inhibition condition involves incongruent color-word items, and participants are required to name the ink color while ignoring the word. The fourth and final condition further elaborates on the Inhibition condition by requiring participants to switch back and forth between naming the incongruent ink colors and reading the colored word. Finally, the Wechsler Memory Scale Third Edition Letter-Number Sequencing Test (The Psychological Corporation, 1997), a test of working memory using auditory stimuli, consists of a sequence of alternating letters and numbers in which the participant must mentally sort and recall the order of numbers first from lowest to highest and then letters in alphabetical order. These tests were selected based on their demonstrated large effect sizes across an array of neurological and psychiatric disorders (Heinrichs & Zakzanis, 1998; Zakzanis, Leach, & Kaplan, 1998, 1999; Zakzanis, 2000).

We also administered the Beck Depression Inventory (BDI; Beck, Steer, & Brown, 1996), Beck Anxiety Inventory (BAI; Beck & Steer, 1993), and an unstandardized self-report questionnaire that was created for the purpose of this study. The self-report questionnaire included items indexing demographic information, academic grades, present alertness, and relevant neurological and family histories. In addition, the Word Reading and Sentence Comprehension subtests from the WRAT4 (Wilkinson & Robertson, 2006) were administered to screen for participants' reading levels and sentence comprehension as noted previously.

### Procedure

Under the supervision of a board-certified, licensed clinical neuropsychologist, trained undergraduate research assistants explained the testing process to participants and obtained their consent to participate. Participants were informed that this study investigated their cognitive functioning, and no further information about the hypotheses was revealed. As this was a two-part repeated-measures study, participants were aware that they were eligible to register for the follow-up testing (Session 2) any time after 4 weeks from the initial testing (Session 1) and were encouraged to participate. In order to encourage participation in the follow-up, we doubled participation credits as an incentive for attending Session 2 of the study in addition to the credits awarded for Session 1 of the study.

After consent was obtained, participants were asked to complete a series of neuropsychological measures and questionnaires administered by the investigators or trained upper-year undergraduate research assistants. The research assistants were privy to the main study hypothesis. No conflict of interest existed between the undergraduate examiners and the participants. The tests were administered with standardized instructions and were conducted in a quiet, distraction-free testing room. The order of the tests was counterbalanced across participants to control for fatigue effects. Following the testing for Session 1 of the study, participants were further encouraged to continue participation in the follow-up testing after 4 weeks. Procedures for Session 2 were identical to Session 1. Participants were fully debriefed about the study after participation in Session 2.

## Results

Thirty-six (men, n = 5; women, n = 31) participants with an average age of 19.42 years (SD = 2.61; range = 17–29 years) completed Session 1 of the study. The majority (94.4%) was right-handed and half of the participants indicated that English was their primary language. The sample consisted of mostly Asian (77.2%), followed by Other/Mixed (8.6%), White (8.6%), and African-American (5.7%) participants. Post-secondary education was reported for 69.4% of participants' mothers and 83.3% of their fathers. The average family income was variable among the participants, with 33.3% earning more than $60,000 per year, 30.6% earning $35,000–$60,000 per year, and 36.1% earning less than $35,000 per year. None of the participants reported a history of neurological or psychiatric disease.

Out of the 36 participants completing Session 1, only 13 (36.1%) participated in the Session 2 follow-up, with an average duration of 79.77 days (SD = 33.98, range = 29–115) between the two testing sessions. There were no significant differences in Session 1 SVT scores between participants who participated in Session 2 and those who did not. The only variable that significantly differed between those who participated in Session 2 and those who did not was BDI total score, F(1,32) = 5.324, p = .028, d = 0.814. Participants who signed up and participated in Session 2 had a higher BDI score (M = 17.15, SD = 11.022), indicating that they endorsed more symptoms of depression than participants choosing not to participate in Session 2 (M = 10.14, SD = 6.762).

Participants were categorized as either exerting poor effort or normal effort using the cutoff scores for the TOMM, DCT, and VSVT as described in the “Methods” section. The poor effort group consisted of participants who failed the cutoff on at least one SVT. An analysis of the SVTs revealed that 55.6% (n = 20) of the participants in Session 1 met the criteria for the poor effort group. In addition, 30.8% (n = 4) of participants failed to meet the cutoff on at least one effort test in Session 2. The majority of poor effort participants failed the cutoff on the VSVT (65%; n = 13) or the DCT (45%; n = 9) in Session 1, whereas the poor effort participants in Session 2 failed the DCT (75%; n = 3) or the VSVT (25%; n = 1). No participants scored below the cutoff on the TOMM for either Session 1 or Session 2. In addition, there were no participants who scored below chance performance for any of the SVTs. The large number of poor effort participants in Session 1 allowed for meaningful comparison between poor effort and normal effort groups, thus permitting analysis of our other hypotheses.
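
The grouping rule described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' scoring procedure. The cutoff values are those reported in the Methods section, but several details are assumptions: that a VSVT accuracy score strictly below its cutoff, or a latency strictly above its cutoff, counts as a failure; that any single VSVT index is enough to fail the test; and that only TOMM Trial 2 is checked (the Retention Trial is omitted). The `SvtScores` container and function names are hypothetical.

```python
from dataclasses import dataclass

# Cutoffs as reported in the Methods section.
TOMM_CUTOFF = 45          # Trial 2 score below 45 is a failure
DCT_E_SCORE_CUTOFF = 14   # E-score equal to or above 14 is a failure
VSVT_ACCURACY_CUTOFFS = {"total": 44, "easy": 23, "difficult": 20}
VSVT_LATENCY_CUTOFFS = {"total": 2.43, "easy": 2.00, "difficult": 2.95}  # seconds


@dataclass
class SvtScores:
    """Hypothetical container for one participant's SVT results."""
    tomm_trial2: int
    dct_e_score: float
    vsvt_accuracy: dict  # keys: "total", "easy", "difficult"
    vsvt_latency: dict   # keys: "total", "easy", "difficult"


def failed_tests(s: SvtScores) -> list:
    """Return the names of the SVTs on which the participant failed a cutoff."""
    failed = []
    if s.tomm_trial2 < TOMM_CUTOFF:
        failed.append("TOMM")
    if s.dct_e_score >= DCT_E_SCORE_CUTOFF:
        failed.append("DCT")
    # Assumed directions: low accuracy or slow responding counts as failure.
    low_accuracy = any(s.vsvt_accuracy[k] < c for k, c in VSVT_ACCURACY_CUTOFFS.items())
    slow_latency = any(s.vsvt_latency[k] > c for k, c in VSVT_LATENCY_CUTOFFS.items())
    if low_accuracy or slow_latency:
        failed.append("VSVT")
    return failed


def effort_group(s: SvtScores) -> str:
    """'poor' if at least one SVT cutoff was failed, otherwise 'normal'."""
    return "poor" if failed_tests(s) else "normal"
```

Because a single failed index is sufficient, a participant can contribute to more than one test's failure count, which is consistent with the Session 1 VSVT and DCT counts summing to more than the 20 poor effort participants.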

In examining whether a portion of participants who exercised poor effort on initial testing also received the same poor effort scores on follow-up testing, we found that three of the four participants who exerted poor effort in Session 2 also exerted poor effort in Session 1. Four participants who exerted poor effort in Session 1 exerted normal effort in Session 2, and one participant who exerted normal effort in Session 1 exerted poor effort in Session 2. Spearman's correlations were also computed to examine the association between SVT scores in Session 1 and Session 2. Significant correlations were found for all SVTs, most prominently for the DCT and VSVT (Table 1). As can be seen in Table 1, there is generally a high degree of correlation between Session 1 and Session 2 SVT scores, suggesting that participants' performance on the SVTs in Session 2 can be predicted to a considerable degree from their performance in Session 1.

Table 1.

Significant Spearman's correlations between Session 1 and Session 2 effort tests

| Session 1 | Session 2 | r | p (two-tailed) |
| --- | --- | --- | --- |
| TOMM Trial 1 | TOMM Trial 2 | .625 | .022 |
| DCT total errors | DCT total errors | .645 | .017 |
| DCT total errors | DCT mean G time | .624 | .023 |
| DCT total errors | DCT E-score | .613 | .026 |
| DCT mean G time | DCT mean G time | .665 | .013 |
| DCT E-score | DCT mean G time | .652 | .016 |
| DCT E-score | DCT E-score | .657 | .015 |
| VSVT total latency | VSVT total latency | .758 | .011 |
| VSVT total latency | VSVT easy latency | .685 | .029 |
| VSVT total latency | DCT mean UG time | .671 | .017 |
| VSVT total latency | VSVT difficult latency | .758 | .011 |
| VSVT easy latency | VSVT total latency | .758 | .011 |
| VSVT easy latency | DCT mean UG time | .685 | .014 |
| VSVT easy latency | VSVT easy latency | .658 | .029 |
| VSVT easy latency | VSVT difficult latency | .758 | .011 |
| VSVT difficult latency | DCT mean UG time | .704 | .011 |
| VSVT difficult latency | VSVT total latency | .770 | .009 |
| VSVT difficult latency | VSVT easy latency | .721 | .019 |
| VSVT difficult latency | VSVT difficult latency | .770 | .009 |

Notes: TOMM = Test of Memory Malingering; DCT = Dot Counting Test; VSVT = Victoria Symptom Validity Test.
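
For readers unfamiliar with the statistic in Table 1, Spearman's correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a standard-library illustration of that definition, not the software used in the study; the function names and the example values are hypothetical and are not study data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation between paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Session 1 and Session 2 latencies (seconds) for five participants.
session1 = [1.8, 2.1, 2.6, 3.0, 3.4]
session2 = [2.0, 1.9, 2.9, 2.7, 3.5]
print(round(spearman_rho(session1, session2), 3))
```

Because the statistic depends only on rank order, it is less sensitive than Pearson's r to the skewed latency distributions and small samples typical of follow-up sessions such as this one.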

An ANOVA was conducted to examine whether poor effort scores were related to performance on the neuropsychological tests. The Bonferroni correction was used to correct for multiple comparisons. The analysis revealed several measures that significantly differed between poor effort and normal effort participants in Session 1. Specifically, poor effort participants performed significantly worse on the WRAT4 Sentence Comprehension subtest, F(1,34) = 5.630, p = .023, d = 0.796; D-KEFS Color-Word Interference Color-Naming, F(1,34) = 4.750, p = .036, d = 0.731; and D-KEFS Color-Word Interference Inhibition, F(1,34) = 4.457, p = .042, d = 0.707. Table 2 summarizes these findings by displaying the ANOVA results and effect sizes for the various neuropsychological tests in Session 1.

Table 2.

Neuropsychological test and questionnaire results of normal effort (n = 16) and poor effort (n = 20) participants for Session 1

| Variable | Group | M | SD | F(1,34) | d |
| --- | --- | --- | --- | --- | --- |
| Final grades for PSYA01 | Normal effort | 78.69 | 11.603 | 0.322a | 0.196 |
| | Poor effort | 76.44 | 11.408 | | |
| Alertness level (rating 1–5) | Normal effort | 3.50 | 1.033 | 0.209 | 0.153 |
| | Poor effort | 3.65 | .933 | | |
| Beck Anxiety Inventory | Normal effort | 10.56 | 4.912 | 0.649 | 0.27 |
| | Poor effort | 8.90 | 6.980 | | |
| Beck Depression Inventory | Normal effort | 11.38 | 7.544 | 0.751a | 0.297 |
| | Poor effort | 14.11 | 10.431 | | |
| | Poor effort | 11.400 | 2.1755 | | |
| WRAT4 Sentence Comprehension Grade Level | Normal effort | 12.163 | 1.2143 | 5.630* | 0.796 |
| | Poor effort | 10.775 | 2.0675 | | |
| Letter-Number Sequencing Test | Normal effort | 10.69 | 2.822 | 0.242 | 0.166 |
| | Poor effort | 10.20 | 3.054 | | |
| RAPM Total score | Normal effort | 7.19 | 2.762 | 0.545 | 0.249 |
| | Poor effort | 6.45 | 3.137 | | |
| Color-Word Interference | | | | | |
| Color Naming | Normal effort | 10.25 | 2.463 | 4.750* | 0.731 |
| | Poor effort | 8.05 | 3.379 | | |
| Word Reading | Normal effort | 10.88 | 2.872 | 1.783 | 0.45 |
| | Poor effort | 9.65 | 2.621 | | |
| Inhibition | Normal effort | 11.31 | 2.676 | 4.457* | 0.707 |
| | Poor effort | 9.20 | 3.205 | | |
| Inhibition/Switching | Normal effort | 11.31 | 2.414 | 2.758 | 0.556 |
| | Poor effort | 9.80 | 2.931 | | |
| Errors Color Naming | Normal effort | 1.00 | 2.530 | 0.552 | 0.249 |
| | Poor effort | .55 | .887 | | |
| Errors Word Reading | Normal effort | .50 | 1.506 | 0.731 | 0.287 |
| | Poor effort | .20 | .410 | | |
| Errors Inhibition | Normal effort | 9.31 | 3.554 | 0.298 | 0.183 |
| | Poor effort | 8.60 | 4.135 | | |
| Errors Inhibition/Switching | Normal effort | 9.75 | 3.088 | 0.002 | 0.014 |
| | Poor effort | 9.80 | 3.861 | | |

Notes: WRAT4 = Wide Range Achievement Test Fourth Edition; RAPM = Raven's Advanced Progressive Matrices.

*p < .05.
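
The effect sizes in Table 2 can be checked from the reported summary statistics. The sketch below assumes that d is Cohen's d with a pooled standard deviation, which is not stated in the text but reproduces the tabled values; the function names are illustrative. It also uses the identity that, for a two-group one-way ANOVA, F equals the squared t statistic, so d can be recovered from F and the group sizes alone.

```python
from math import sqrt


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d (absolute value) using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return abs(m1 - m2) / sqrt(pooled_var)


def d_from_f(f_value, n1, n2):
    """Recover d from a two-group ANOVA F statistic (F = t squared)."""
    return sqrt(f_value) * sqrt(1 / n1 + 1 / n2)


# WRAT4 Sentence Comprehension, Table 2: normal effort (n = 16) vs. poor effort (n = 20).
print(round(cohens_d(12.163, 1.2143, 16, 10.775, 2.0675, 20), 3))
print(round(d_from_f(5.630, 16, 20), 3))
```

Both routes agree with the tabled value of 0.796 for this measure to within rounding, which supports reading the d column as a pooled-SD Cohen's d.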

A comparison between poor effort participants and normal effort participants in Session 1 was conducted with a one-way ANOVA to investigate whether poor effort may be predicted or mediated by variables such as academic performance, mood, and energy level. The analysis revealed no significant differences between the two groups (Table 2). Because Session 2 had too few participants to have a meaningful analysis, poor effort and normal effort participants in Session 2 were not statistically compared.

## Discussion

We set out to determine the base rate of suboptimal effort exerted by undergraduate research participants during neuropsychological studies, and whether differences in participants' effort levels moderate performance on neuropsychological tests and thus distort the subsequent findings and conclusions of our research studies. We predicted that a considerable number of participants who are healthy and representative of the general population of university undergraduate research pools that we ask to engage in our research would exert suboptimal effort. In addition, we posited that a smaller portion of these participants who exhibit poor effort on initial testing would demonstrate the same poor effort scores on follow-up testing, suggesting that poor effort might be due to stable dispositional factors rather than varying situational factors. Finally, we hypothesized that academic performance, mood, and energy levels might play a moderating role here.

Our data support our first hypothesis, as 55.6% of our sample in Session 1 testing was categorized as exerting poor effort. Although our sample size was smaller in Session 2 testing, we still found that almost one third (30.8%) of participants were exerting poor effort. Notably, most of the poor effort participants in Session 1 were categorized as such because they scored poorly on the VSVT and DCT. In Session 2 testing, the DCT again accounted for the majority of the poor effort participants. Interestingly, none of the poor effort participants were categorized as such because of scoring below the cutoff on the TOMM. This may be expected given the recent literature suggesting that the TOMM may not be as sensitive in detecting suboptimal effort when compared with other SVTs (Armistead-Jehle & Gervais, 2011).

In terms of our second hypothesis regarding the temporal stability of effort, the results suggest that participants who exerted poor effort at one point in time also exerted poor effort at a later time. This could indicate a trait-like component. In addition, statistically significant relationships were found for all SVTs between Session 1 and Session 2, suggesting that participants who performed poorly on the SVTs in Session 1 were more likely to perform poorly on the SVTs in Session 2. This test–retest reliability suggests that effort levels may be due to stable dispositional factors rather than varying situational factors. Alternatively, this finding may suggest that situational demands were similar between Session 1 and Session 2, thereby explaining the stability in behavior. Although our study did not examine personality variables as a potential mediator for effort, the results here show a trend in that direction that warrants further study.

We also examined whether poor performance on SVTs was systematically related to poor performance on other neuropsychological tests. We found that this was indeed characteristic of our sample. Specifically, participants who exerted poor effort on the SVTs also performed worse on some of the neuropsychological tests compared with participants who exerted normal effort.

Finally, we investigated whether poor effort may be predicted or mediated by such factors as academic performance, mood, or energy levels. These questions were deemed to be of practical importance, as information about differences between participants of variable effort would allow one to screen for participants who are highly likely to exert poor effort on testing. Our study found that academic performance, depression, anxiety, and energy levels did not significantly differentiate between poor effort and normal effort participants.

### Limitations and Implications

Our preliminary study on variable effort in undergraduate research participants raises several questions and implications related to neuropsychological research. Our results should, however, be considered alongside the study's limitations. Small sample size, especially in the follow-up testing session, is a basic limitation of the study. Because only a few participants were compared with each other in some instances, the confidence intervals are considerably wider and the estimates therefore reflect the true population less accurately. Given the small number of participants who returned for Session 2, there is limited value in extrapolating findings from the follow-up session. Similarly, our sample consisted of mostly women and Asian students, which affects the generalizability of our study. However, although such a narrow demographic may not reflect other student populations, there exists no previous literature to our knowledge that suggests that effort would vary between different racial groups.
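
To illustrate how sample size affects the precision of the base-rate estimates, the sketch below computes approximate 95% Wilson score intervals for the two observed proportions (20 of 36 in Session 1; 4 of 13 in Session 2). This calculation is not reported in the study; the choice of the Wilson method, the z value of 1.96, and the function name are assumptions made for illustration.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Poor effort counts reported in the Results section.
for label, k, n in [("Session 1", 20, 36), ("Session 2", 4, 13)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.3f}, interval {low:.3f} to {high:.3f}")
```

Under these assumptions the Session 2 interval is noticeably wider than the Session 1 interval, which is the sense in which the follow-up estimate is less precise.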

Furthermore, the design of our study presents some limitations. The research assistants who assessed participants were aware of the main hypothesis of the study and were not blinded. Although every effort was made to ensure impartiality in assessing participants, this knowledge may have influenced the outcomes of our study in the expected direction. It is also worth noting that the research assistants were themselves upper-year undergraduate students, although it is difficult to say whether this biased them in either direction. In addition, we were unable to account for some participant variables. Although participants were asked about their history of neurological and psychiatric disease, we did not ask specifically about a history of ADHD or learning disorders.

Another potential limitation of this study is that non-standardized cutoff scores were used for the VSVT. Because the VSVT manual did not provide cutoffs for a non-clinical sample and relied only on binomial probabilities for the interpretation of effort, cutoff scores were calculated for this study from the statistics for the non-clinical sample provided in the manual. Each cutoff was set at the score that 95% of the non-clinical sample obtained or exceeded. It may be argued that such high cutoff scores increase sensitivity but decrease specificity in detecting poor effort, thereby introducing more false positives into our data. Nevertheless, the use of these VSVT cutoff scores with our sample of non-clinical university students is warranted, given that these students should be expected to perform as well as or better than age- and gender-matched samples with significantly lower educational attainment. In addition, the DCT norm-referenced cutoff score of 14 (sensitivity = 88.2%, specificity = 96.1%) still identified a substantial portion of our sample as giving poor effort. Future research using other established SVTs (e.g., Word Memory Test; Green, 2005) would increase the generalizability of our findings.
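The trade-off between sensitivity and specificity can be illustrated with a short calculation. The sketch below takes the DCT sensitivity and specificity quoted above and shows how the expected proportion of flagged participants, and the share of flags that are true positives, would vary with the true base rate of poor effort. The base rates are hypothetical, and applying the published sensitivity and specificity unchanged to an undergraduate sample is an assumption.

```python
def expected_flag_rate(base_rate, sensitivity, specificity):
    """Expected proportion of a sample flagged as poor effort."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives + false_positives


def positive_predictive_value(base_rate, sensitivity, specificity):
    """Proportion of flagged participants who truly exerted poor effort."""
    true_positives = sensitivity * base_rate
    flagged = expected_flag_rate(base_rate, sensitivity, specificity)
    return true_positives / flagged if flagged > 0 else float("nan")


# DCT values quoted in the text.
SENSITIVITY, SPECIFICITY = 0.882, 0.961

# Hypothetical true base rates of poor effort (illustrative only).
for base_rate in (0.05, 0.20, 0.50):
    flagged = expected_flag_rate(base_rate, SENSITIVITY, SPECIFICITY)
    ppv = positive_predictive_value(base_rate, SENSITIVITY, SPECIFICITY)
    print(f"base rate {base_rate:.2f}: flagged {flagged:.3f}, PPV {ppv:.3f}")
```

Under these assumptions, even a sample with no poor effort at all would be flagged at a rate of about 3.9% (one minus specificity), so an observed flag rate well above that level is difficult to attribute to false positives alone.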

Despite these limitations, and while preliminary, our findings have broad implications for the field of neuropsychology. To begin, although a substantial number of undergraduate students in our study exerted suboptimal effort during neuropsychological testing, it would certainly not be desirable to abandon undergraduate participant pools. Instead, our results emphasize the need to measure and control for poor effort in studies of undergraduate volunteers. The extent of the error in past studies is of unknown magnitude, but our findings suggest that many past studies probably did produce group data that could be described as “invalid.” Hence, we stress that effort may be a source of variance in research using undergraduate participants, not unlike other sources of variance such as gender or socioeconomic status. Researchers therefore need to be cognizant of effort as a potential variable influencing their data and may wish to control or account for it in studies where it is suspected to play a role.

In some sense, the performance of those exerting suboptimal effort could be treated as outliers. An outlier in a data set is a point or a class of points that is considerably dissimilar to or inconsistent with the remainder of the data (Shaari, Bakar, & Hamdan, 2009). Whether outliers should be removed is a matter of discussion that can be found elsewhere (e.g., Zijlstra, van der Ark, & Sijtsma, 2011), although it is noted that outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations that arise from factors such as mechanical faults, changes in system behavior, fraudulent behavior, human error, instrument error, or simply natural deviations in populations (Hodge & Austin, 2004). Our findings suggest that effort is a factor that may produce outlying results.

Consequently, we suggest that it may be important to detect poor effort in any neuropsychological experiment. As we did here, a simple SVT (e.g., VSVT) might suffice as an outlier detection method. SVT scores that differentiate students who exert poor effort in research studies from those who exert normal effort could be used to screen for participants who are highly inclined to exert poor effort on testing. Such participants can be removed before completing the remainder of an (often costly) experiment, or their data can be removed post hoc. Alternatively, researchers might find that those who exert poor effort are of further interest to study. In other words, rather than removing outliers, researchers might treat poor effort as a possible covariate in future studies.
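The two options described above, screening out poor-effort participants or retaining them with effort coded as a covariate, can be sketched as follows. This is a minimal illustration rather than the procedure used in this study: the `Participant` record, the cutoff of 45, the convention that lower SVT scores indicate poorer effort, and all scores are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Participant:
    pid: str
    svt_score: float  # hypothetical SVT score; lower is assumed to mean poorer effort
    outcome: float    # hypothetical score on a neuropsychological test


def failed_svt(p, cutoff):
    """Flag a participant whose SVT score falls below the (hypothetical) cutoff."""
    return p.svt_score < cutoff


def screen(participants, cutoff):
    """Option 1: split the sample into retained and flagged participants."""
    kept = [p for p in participants if not failed_svt(p, cutoff)]
    flagged = [p for p in participants if failed_svt(p, cutoff)]
    return kept, flagged


def with_effort_covariate(participants, cutoff):
    """Option 2: keep everyone and code poor effort as a 0/1 covariate."""
    return [(p.pid, p.outcome, int(failed_svt(p, cutoff))) for p in participants]


# Hypothetical data, NOT taken from the study.
sample = [
    Participant("P1", 50, 60),
    Participant("P2", 35, 41),
    Participant("P3", 48, 57),
    Participant("P4", 30, 38),
    Participant("P5", 47, 55),
]
CUTOFF = 45  # hypothetical cutoff

kept, flagged = screen(sample, CUTOFF)
mean_all = mean(p.outcome for p in sample)
mean_kept = mean(p.outcome for p in kept)
rows = with_effort_covariate(sample, CUTOFF)

print("flagged:", [p.pid for p in flagged])
print(f"mean outcome, full sample: {mean_all:.1f}; after screening: {mean_kept:.1f}")
```

Screening changes the group mean directly, whereas the covariate coding leaves the sample intact and lets the analysis model estimate how much variance is attributable to effort.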

As our results are preliminary, it is important for future research to replicate our findings with different samples and SVTs so as to increase their generalizability and reliability. Moreover, because our results support the temporal stability of effort and suggest a possible role of personality traits as a mediating variable, future research should investigate whether personality variables differentiate participants exerting variable effort in the context of an external incentive. Although much investigation is still needed on this topic, our preliminary results suggest that the base rate of suboptimal effort among undergraduate participants in neuropsychology studies is quite high, and thus call into question the validity of neuropsychological findings from research samples in which poor effort was not controlled. In experiments with non-clinical samples of undergraduate students, detection strategies should be employed to account for effort as a source of variance in neuropsychological test performance.

## Conflict of Interest

None declared.

## References

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.

Angus, R. G., Heslegrave, R. J., & Myles, W. S. (1985). Effects of prolonged sleep deprivation, with and without chronic physical exercise, on mood and performance. Psychophysiology, 22(3), 276-282.

P., & Gervais, R. O. (2011). Sensitivity of the Test of Memory Malingering and the Nonverbal Medical Symptom Validity Test: A replication study. Applied Neuropsychology, 18(4), 284-290.

Beck, A. T., & Steer, R. A. (1993). Beck anxiety inventory. San Antonio, TX: The Psychological Corporation, Harcourt Brace and Company.

Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck Depression Inventory (2nd ed.). San Antonio, TX: The Psychological Corporation.

Bianchini, K. J., Mathias, C. W., & Greve, K. W. (2001). Symptom validity testing: A critical review. The Clinical Neuropsychologist, 15(1), 19-45.

Boone, K. B., Lu, P., & Herzberg, D. S. (2002). The Dot Counting Test manual. Los Angeles, CA: Western Psychological Services.

Delis, D. C., Kaplan, E., & Kramer, J. H. (2001). Delis-Kaplan Executive Function System (D-KEFS): Examiner's manual. San Antonio, TX: The Psychological Corporation.

Fox, D. D. (2011). Symptom validity test failure indicates invalidity of neuropsychological tests. The Clinical Neuropsychologist, 25(3), 488-495.

Gallander Wintre, M., North, C., & Sugar, L. A. (2001). Psychologists’ response to criticisms about research based on undergraduate participants: A developmental perspective. 42(3), 216-225.

Green, P. (2005). Word memory test for windows. User's manual and program. Edmonton, Alberta: Green's Publishing.

Green, P., Rohling, M. L., Lees-Haley, P., & Allen, L. M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15(12), 1045-1060.

Heinrichs, R. W., & Zakzanis, K. K. (1998). Neurocognitive deficit in schizophrenia: A quantitative review of the evidence. Neuropsychology, 12, 426-445.

Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22, 85-126.

Huppert, F. A., & Piercy, M. (1976). Recognition memory in amnesic patients: Effects of temporal context and familiarity of material. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, 12(1), 3-20.

Iverson, G. L. (2006). Ethical issues associated with the assessment of exaggeration, poor effort, and malingering. Applied Neuropsychology. Special Issue: Ethical Controversies in Neuropsychology, 13(2), 77-90.

Kato, Y., Endo, H., & Kizuka, T. (2009). Mental fatigue and impaired response processes: Event-related brain potentials in a Go/NoGo task. International Journal of Psychophysiology, 72(2), 204-211.

Mackworth, J. F. (1968). Vigilance, arousal, and habituation. Psychological Review, 75(4), 308-322.

Martone, M., Butters, N., & Trauner, D. (1986). Some analyses of forgetting of pictorial material in amnesic and demented patients. Journal of Clinical and Experimental Neuropsychology, 8(3), 161-178.

Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24(8), 1094-1102.

Morgan, J. E., & Sweet, J. J. (2009). Neuropsychology of malingering casebook. New York: Psychology Press.

Nies, K. J., & Sweet, J. J. (1994). Neuropsychological assessment and malingering: A critical review of past and present strategies. Archives of Clinical Neuropsychology, 9(6), 501-552.

L. M., Zamboanga, B. L., Thompson, R. A., & Schmersal, L. A. (2005). Extra credit as incentive for voluntary research participation. Teaching of Psychology, 32(3), 150-153.

Potts, R., Camp, C., & Coyne, C. (1989). The relationship between naturally occurring dysphoric moods, elaborative encoding, and recall performance. Cognition and Emotion, 3(3), 197-205.

Raven, J., Raven, J. C., & Court, J. H. (1998). Manual for Raven's progressive matrices and vocabulary scales. Section 4: The advanced progressive matrices. San Antonio, TX: Harcourt Assessment.

Ruocco, A. C., Swirsky-Sacchetti, T., Chute, D., Mandel, S., Platek, S. M., & Zillmer, E. (2008). Distinguishing between neuropsychological malingering and exaggerated psychiatric symptoms in a neuropsychological setting. The Clinical Neuropsychologist, 22, 547-564.

Seron, X., Deloche, G., Ferrand, I., & Cornet, J. (1991). Dot counting by brain damaged subjects. Brain and Cognition. Special Issue: Cognitive and Neuropsychological Aspects of Calculation Disorders, 17(2), 116-137.

Shaari, F., Bakar, A. A., & Hamdan, A. R. (2009). Outlier detection based on rough sets theory. Intelligent Data Analysis, 13, 191-206.

Slick, D., Hopp, G., & Strauss, E. (1997). Victoria symptom validity test. Odessa, FL: Psychological Assessment Resources.

The Psychological Corporation. (1997). San Antonio: Harcourt Brace.

Tombaugh, T. N. (1996). Test of Memory Malingering (TOMM). Toronto, ON: Multi-Health Systems.

Váquez-Justo, E., Alvarez, M. R., & Otero, M. J. F. (2003). Influence of depressed mood on neuropsychologic performance in HIV-seropositive drug users. Psychiatry and Clinical Neurosciences, 57(3), 251-258.

Wilkinson, G. S., & Robertson, G. J. (2006). Wide Range Achievement Test (4th ed.). Lutz, FL: Psychological Assessment Resources.

Wise, S. L. (2006). An investigation of the differential effort received by items on a low-stakes computer-based test. Applied Measurement in Education, 19(2), 95-114.

Zakzanis, K. K. (2000). Distinct neurocognitive profiles in multiple sclerosis subtypes. Archives of Clinical Neuropsychology, 15, 115-136.

Zakzanis, K. K., E., & Jeffay, E. (2012). The predictive utility of neuropsychological symptom validity testing as it relates to psychological presentation. Applied Neuropsychology, 19, 98-107.

Zakzanis, K. K., Leach, L., & Kaplan, E. (1998). On the nature and pattern of neurocognitive function in major depressive disorder. Neuropsychiatry, Neuropsychology, and Behavioral Neurology, 11, 111-119.

Zakzanis, K. K., Leach, L., & Kaplan, E. (1999). Neuropsychological differential diagnosis. Lisse, The Netherlands: Swets & Zeitlinger.

Zijlstra, W. P., van der Ark, L. A., & Sijtsma, K. (2011). Outliers in questionnaire data: Can they be detected and should they be removed? Journal of Educational and Behavioural Statistics, 36, 186-212.