Objective

In addition to manual (i.e., “button press”) metrics, oculomotor metrics demonstrate considerable promise as tools for detecting invalid responding in neurocognitive assessment. This study was conducted to evaluate saccadic and manual metrics from a computerized continuous performance test as embedded indices of performance validity.

Method

Receiver operating characteristic analyses, logistic regressions, and ANOVAs were performed to evaluate saccadic and manual metrics in classification of healthy adults instructed to feign deficits (“Fake Bad” group; n = 24), healthy adults instructed to perform their best (“Best Effort” group; n = 26), and adults with a history of mild traumatic brain injury (TBI) who passed a series of validity indices (“mTBI-Pass” group; n = 19).

Results

Several saccadic and manual metrics achieved outstanding classification accuracy between Fake Bad versus Best Effort and mTBI-Pass groups, including variability (consistency) of saccadic and manual response time (RT), saccadic commission errors, and manual omission errors. Very large effect sizes were obtained between Fake Bad and Best Effort groups (Cohen's d range: 1.89–2.90; r range: .75–.78) as well as between Fake Bad and mTBI-Pass groups (Cohen's d range: 1.32–2.21; r range: .69–.71). The Fake Bad group consistently had higher saccadic and manual RT variability, more saccadic commission errors, and more manual omission errors than the Best Effort and mTBI-Pass groups.

Conclusions

These findings are the first to demonstrate that eye movements can be used to detect invalid responding in neurocognitive assessment. These results also provide compelling evidence that concurrently measured saccadic and manual metrics can detect invalid responding with high levels of sensitivity and specificity.

Introduction

Neuropsychological assessment can provide valuable information for diagnosis, treatment planning, and tracking changes in cognitive function over time. However, it is well established that a person's effort during neurocognitive testing accounts for more variance than demographic variables, etiology of and time since injury, and brain injury severity (Armistead-Jehle, Cooper, & Vanderploeg, 2016; Fox, 2011; Green, 2007; Green, Rohling, Lees-Haley, & Allen, 2001; Lange, Iverson, Brooks, & Rennison, 2010; Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011). Accordingly, neuropsychological assessment depends on examinees to accurately report symptoms and perform at capacity levels throughout the testing process in order to draw accurate conclusions, make proper diagnoses, and recommend appropriate treatments (Iverson & Binder, 2000; Larrabee, 2007). Unfortunately, persons undergoing neurocognitive assessment may not always respond in a valid manner. The validity of one's responses during a neurocognitive assessment can be affected by internal factors, such as pursuit of secondary gain, fatigue, stress, medical conditions, psychiatric conditions, and medications, as well as external factors, such as testing environment (e.g., limited space, excessive ambient noise), examiner skill, unclear assessment instructions, and language/cultural barriers (Arnett, 2013; Millis & Volinsky, 2001). Beyond invalidating test results, invalid responding behaviors can lead to significant individual, economic, and societal consequences, such as undetected neurological problems, improperly awarded financial settlements, and inappropriate compensation for disability claims (Reynolds & Horton, 2012).

For these reasons, both symptom validity and performance validity testing (Bigler, 2012; Larrabee, 2012)—hereafter collectively referred to as response validity testing—are considered essential components of all neuropsychological evaluations, even in contexts where secondary gain may not be overt (Bush et al., 2005; Heilbronner et al., 2009). Response validity tests can be independent, “freestanding” measures (Green, Allen, & Astner, 1996; Tombaugh, 1996) or embedded validity indices (EVIs) derived from existing self-report or neurocognitive measures of attention (Ord, Boettcher, Greve, & Bianchini, 2010), memory (Bauer, Yantz, Ryan, Warden, & McCaffrey, 2005), and psychomotor speed (O'Bryant, Hilsabeck, Fisher, & McCaffrey, 2003). When persons exceed empirically derived cutoff scores for invalid responding on freestanding measures or EVIs, examiners obtain objective evidence that the data may be invalid.

Reaction time metrics, long recognized as useful indicators of brain damage (Stuss et al., 1989; Zihl, Von Cramon, & Mai, 1983), show considerable utility as EVIs. The rise of computerized cognitive testing has led to ample evidence demonstrating that reaction time can detect invalid responding (Hartman, 2008; Schatz & Browndyke, 2002; Vendemia, Buzan, & Simon-Dack, 2005; Willison & Tombaugh, 2006). In surveying the available literature, choice reaction times tend to be slower during invalid versus honest responding (Browndyke, 2013; Johnson, Barnhardt, & Zhu, 2003), suggesting reaction times are delayed when planning and executing an invalid response (Willison & Tombaugh, 2006). Similarly, Willison and Tombaugh (2006) reported that greater reaction time variability can detect simulated traumatic brain injury (TBI) because the formulation and execution of simulation strategies during testing increases response variability.

Omission and commission errors have also proven useful for detecting invalid responding (Busse & Whiteside, 2012; Chafetz, 2011; Erdodi, Roth, Kirsch, Lajiness-O'Neill, & Medoff, 2014; Leark, Dixon, Hoffman, & Huynh, 2002; Nelson et al., 2003; Ord et al., 2010). Omission and commission errors can be used to measure distractibility and impulsivity, respectively (Conners, 2004). These variables are commonly measured in continuous performance tests, which feature multiple trials appearing in rapid succession over a given length of time. Omission errors occur when a subject fails to provide a response (e.g., a button press) on a trial. Conversely, commission errors occur when a subject over-responds (e.g., multiple button presses) or incorrectly responds (e.g., presses a button instead of inhibiting a button press) on a given trial.

In addition to these manual (i.e., “button press”) measures of performance, oculomotor metrics derived from emerging eye tracking technology demonstrate considerable promise for detecting invalid responding (Hannula, Baym, Warren, & Cohen, 2012). Modern eye tracking systems can unobtrusively measure many parameters of eye position and movement (Duchowski, 2007). Similar to manual reaction time, oculomotor metrics like saccadic response time (RT; i.e., the latency to move one's point of gaze from one object to another) fall within a relatively narrow range for most individuals, facilitating the identification of abnormal scores (Exton & Leonard, 2009). Several studies have shown that eye movements are closely related to attention, response inhibition, working memory, processing speed, and executive function (Barnes, 2008; Gooding & Basso, 2008; Hutton, 2008; Müri & Nyffeler, 2008; Olk & Kingstone, 2003; Pierrot-Deseilligny, Milea, & Müri, 2004; Sharpe, 2008), cognitive processes commonly evaluated using continuous performance tests. Additionally, eye movement abnormalities have been demonstrated in a number of neurological conditions, such as TBI, dementia, and Parkinson's disease (Crawford et al., 2005; Ettenhofer & Barry, 2016; Heitger et al., 2009; Maruta, Suh, Niogi, Mukherjee, & Ghajar, 2010; van Stockum, MacAskill, Anderson, & Dalrymple-Alford, 2008). For these reasons, measurements of eye movements are increasingly utilized in brain research, and may eventually become common within clinical settings as well. Potentially, some of the same metrics that have demonstrated utility in detecting invalid responding on “button press” continuous performance tests—reaction time, reaction time variability, commission errors, and omission errors—could also prove to be valuable when measured using eye movements.

This study sought to determine the extent to which oculomotor responses collected during a computerized continuous performance test could discriminate between valid and invalid responders. The study employed a combined groups design consisting of a prospective, single-blinded simulator study and a cross-validation with a clinical comparison group of adults with a history of mild TBI. The simulator study consisted of a control group of valid responders and an experimental group of invalid responders. The clinical comparison group consisted of persons who passed a series of EVIs during a neurocognitive battery. To the best of our knowledge, this study is the first to evaluate saccadic performance as an EVI on a cognitive test. Parallel saccadic and manual metrics were compared to evaluate the potential added value of multimodal assessment in response validity testing. It was hypothesized that several saccadic and manual EVIs would be identified, and that invalid responders would have significantly slower and less consistent RTs, more commission errors, and more omission errors than valid responders on both saccadic and manual metrics. It was also hypothesized that saccadic and manual EVIs identified in the simulator study would demonstrate similar classification accuracy in a clinical comparison group of adults with a history of mild TBI who passed a series of validity checks.

Materials and Methods

Participants

Three groups were used in this study: a “Fake Bad” group of healthy adults instructed to feign deficits (n = 24), a “Best Effort” (i.e., control) group of healthy adults instructed to perform their best (n = 26), and a clinical comparison group of adults with a history of mild TBI who passed a series of eight EVIs (the “mTBI-Pass” group; n = 19). The Fake Bad and Best Effort groups were recruited for a prospective, experimental simulator study; the mTBI-Pass group was drawn from archival study data. Participants for all three groups were recruited using flyers, internet advertisements, and hand-outs; Fake Bad and Best Effort participants were compensated $30 for their involvement in the simulator study, whereas mTBI-Pass participants were compensated $40 in the archival study. The studies were approved by the Institutional Review Board (IRB) at Uniformed Services University of the Health Sciences.

Participants in all three groups had to be 18 years or older and fluent in English. Fake Bad and Best Effort group participants were excluded if they had ever sustained a TBI of any severity throughout their lifetime, including any head injuries that involved an alteration of consciousness (AOC). The mTBI-Pass group participants were excluded if they exceeded one or more previously validated EVI cutoff scores (WAIS-IV Digit Span: Reliable Digit Span [RDS] < 8, Age Corrected Scaled Score [ACSS] < 8; CPT-II: Omissions > 11 raw, Commissions > 21 raw, Hit RT SE > 13 raw, Perseverations > 1 raw; Trail Making Test A Completion Time > 62 sec; Trail Making Test B Completion Time > 199 sec). Participants in all three groups were excluded if they had a medical condition (e.g., thyroid disorder, sickle cell anemia) or were actively taking medication that could impair their cognitive abilities, if they had any visual impairment that could not be corrected by glasses/contacts, or if they had motor impairment or amputation of one or both upper extremities.

In accordance with the Department of Defense/Veterans Affairs clinical practice guidelines (Department of Veterans Affairs & Department of Defense, 2016), mild TBIs were defined as events that involved a sudden movement or a blow to the head the resulted in a loss of consciousness (LOC) ranging from 0 to 30 min or loss of memory (i.e., posttraumatic amnesia [PTA]) less than or equal to 24 hr. The mTBI-Pass group excluded participants who reported any head injury in the moderate-to-severe range (LOC > 30 min or PTA > 1 day) or who only reported head injuries without LOC or PTA. Consistent with other studies using samples with remote history of TBI (Larson, Kondiles, Starr, & Zollman, 2013; Terrio et al., 2009), head injury information was obtained using a comprehensive semi-structured interview that obtained detailed information about injury characteristics, mechanism of injury, and injury sequelae. Using information from this interview, a team of two or more licensed psychologists with post-doctoral fellowship training in clinical neuropsychology classified the individual's head injury group as “no TBI,” “possible mild TBI (AOC only),” “mild TBI,” “moderate TBI,” and “severe TBI.” Participants in the mTBI-Pass group reported a history of at least one mild TBI with median length of time since head injury being 6.9 years (interquartile range, IQR: 2.32–21.6 years), the average LOC being 3.00 min (SD = 4.29 min), and the average length of PTA being 18.1 min (SD = 50.8 min).

Overall, 57 people expressed interest in participating in the prospective, simulator study. Of these, one was ineligible due to history of concussion and five could not participate due to scheduling conflicts. Of the 51 participants enrolled, 50 participants completed the study and 1 was unable to complete the study due to technical difficulties with the laboratory equipment. Of the 30 subjects in the archival study's overall mild TBI group, 2 were excluded for having medical conditions that interfered with neurocognitive testing (i.e., optic nerve tumor, severe diplopia), and 9 were excluded for exceeding 1 or more EVI cutoff scores (5 subjects exceeded cutoff on CPT-II Commissions, 4 subjects exceeded cutoff on CPT-II Perseverations, 2 subjects exceeded cutoff on CPT-II Hit RT SE, 2 subjects exceeded cutoff on CPT-II Omissions, 2 subjects scored below cutoff on Digit Span ACSS, and 2 subjects scored below cutoff on Digit Span RDS).

Measures

All groups completed a semi-structured interview and a neurocognitive assessment battery that included the Wechsler Test of Adult Reading (WTAR; Holdnack, 2001) to estimate intelligence and the Bethesda Eye & Attention Measure (BEAM; Ettenhofer, Hershaw, & Barry, 2016) to assess saccadic and manual performance. The Fake Bad and Best Effort groups additionally completed the Head Injury Knowledge Scale (HIKS; Ono, Ownsworth, & Walters, 2011) to assess knowledge of head injury sequelae.

Bethesda Eye & Attention Measure v.34

Bethesda Eye & Attention Measure v.34 is a multimodal, computer-based cognitive task designed to assess attention and executive function via concurrent saccadic and manual performance (Ettenhofer et al., 2016). It is a 12-min, continuous performance test with a multiple trial format. Examinees are instructed, “At the beginning of each trial, look at the cross in the center of the screen. A target circle will appear above, below, left, or right of the cross. When the target circle appears, look at it and press the button as fast as you can. However, if you see a red arrow, do not look at the target circle and do not press the button.” On each trial, one of six different cue types is presented for 200 ms prior to the appearance of the target circle. The “Nondirectional” cue (NDC; white diamond shape) facilitates performance by providing information about the timing of the upcoming target; the “Directional” cue (DC; white arrow pointing toward the upcoming target) facilitates performance by providing valid spatial information about the upcoming target's location; the “Misdirectional” cue (MDC; white arrow pointing away from the upcoming target) introduces attentional interference by providing invalid spatial information; the “Gap” cue (blank screen replacing the central fixation cross) disengages attention in preparation for the upcoming target; the “No-go” cue (red arrow pointing toward the upcoming target) signals that the participant should inhibit manual and saccadic responses; and in “Uncued” (UC) trials, the target appears without any additional cues. Saccadic and manual commission errors are measured in “No-Go” trials, whereas saccadic and manual RT latency, RT intra-individual variability (i.e., consistency), and omission errors are measured on all other trials. Response time latency was defined as the time of initial fixation on the target circle (saccadic RT latency) or button press (manual RT latency) after the target circle appeared on the screen.

E-Prime 2.0 software was used to present BEAM stimuli. Eye tracking data were collected using an Applied Science Laboratories (ASL) D6 High-Speed Desktop Eye Tracker. A Cedrus RB-530 response pad was used to record manual RT. ASL Results Version 1.0 and custom software were used to process and score eye tracking data. To enhance confidence in the data quality, the scoring software automatically screened the data in the following manner: “(a) Both manual and saccadic RT values were excluded for trials in which the participant was not centrally fixated when the target appeared; (b) saccadic RT was excluded for trials in which the participant completed a saccade to any location other than the target without fixating on the target; (c) saccadic RT was excluded for trials in which optical signal loss during the RT interval exceeded 100 ms; (d) saccadic and manual RTs greater than 1000 ms were considered omissions and were not included in analyses” (Ettenhofer et al., 2016; pp. 101–102). Similar to previous studies (Ettenhofer et al., 2016, Ettenhofer & Barry, 2016), it was determined a priori that BEAM variables on a given cue type—RT, RT variability, commission errors, and omission errors—would only be calculated when data for at least 10 trials of a possible 32 were available. As such, there are some cases where the number of participants subjected to analyses for a given variable was less than the total number of participants in the study (e.g., for participants who made sufficient numbers of errors to prevent calculation of a given RT metric). Additional information about equipment, procedures, psychometrics, and validity for this task has been provided elsewhere (Ettenhofer et al., 2016).

Procedure

The study employed a combined groups design consisting of a prospective, single-blinded simulator study and a clinical comparison group drawn from an archival study. Simulator study participants were assigned to the Best Effort group or Fake Bad group using a randomly permuted block assignment program (Dallal, 2013). Participants were not told of their group assignment in advance of assessment date in order to mitigate potentially confounding effects of test preparation or coaching (Shum, O'Gorman, & Alpar, 2004).

Once the simulator study participant arrived for testing, a study coordinator obtained their informed consent and administered baseline measures to assess demographics, estimated intelligence, and knowledge of head injury sequelae. The study coordinator then presented a specific group assignment script (see

) to the participant and asked the participant not to reveal group membership to the examiner. The group assignment script was adapted from previous studies of invalid responding (Erdal, 2004; Strauss et al., 2002; Tan, Slick, Strauss, & Hultsch, 2002; Weinborn, Woods, Nulsen, & Leighton, 2012) where both Fake Bad and Best Effort groups were told that they were involved in a remote vehicular accident and that they do not feel any lingering cognitive effects. The Fake Bad group was told to exaggerate cognitive problems believably in order to get money from an insurance company (in their meta-analysis of 38 studies of invalid responding, Sollman and Berry [2011] reported that warnings to fake believably significantly increased the effect size of freestanding response validity test score differences between valid and invalid responding groups of healthy simulators). By contrast, the Best Effort group was told to perform their best. A separate examiner blinded to group assignment then administered BEAM and a brief neurocognitive assessment battery. Once the battery ended, a study coordinator administered a debrief script to the participant without the examiner present (see ).

In the archival study, participants were screened for suitability and completed a semi-structured interview, BEAM, and a neurocognitive battery. Archival study participants were made aware that they would be compensated $40 regardless of how they performed on the neurocognitive battery. They were asked to perform their best throughout the battery, and all attempts were made by examiners to maximize the validity of data obtained. Whereas the neurocognitive battery did not include any freestanding response validity tests, eight previously validated EVIs (Axelrod, Fichtenberg, Millis, & Wertheimer, 2006; Babikian, Boone, Lu, & Arnold, 2006; Iverson, Lange, Green, & Franzen, 2002; Larrabee, 2003; Ord et al., 2010; Schutte & Axelrod, 2013) were used to screen participants and exclude those with questionable response validity from this study.

Data Analysis

All statistical analyses were performed using SPSS Version 20. Chi-square analyses and analysis of variances (ANOVAs) were used to compare the three groups on demographic factors, estimated intelligence, and knowledge of head injuries. Receiver operating characteristic (ROC) analyses were used to identify BEAM metrics with the greatest potential to differentiate between the simulator study's Fake Bad and Best Effort groups (DeLong, DeLong, & Clarke-Pearson, 1988; Hanley & McNeil, 1982; Hsiao, Bartko, & Potter, 1989; Mandrekar, 2010; McNeil & Hanley, 1984). Variables that obtained outstanding classification accuracy (area under the curve [AUC] ≥ 0.90) in the simulator study were submitted to additional ROC analyses between mTBI-Pass and Fake Bad groups. Stepwise logistic regression models were then used to identify combinations of individual variables that provided classification accuracy above and beyond individual variables (i.e., joint classification accuracy; Busse & Whiteside, 2012).

All 28 BEAM variables were submitted to ROC analyses, including 2 variables representing overall saccadic and manual RT latency, 10 variables representing saccadic and manual RT latency for individual cue types, 2 variables representing overall saccadic and manual RT variability, 10 variables representing saccadic and manual RT variability for individual cue types, 2 variables representing saccadic and manual commission errors, and 2 variables representing saccadic and manual omission errors. A two-tailed alpha level of 0.05 was used for all ROC analyses. To mitigate the likelihood of making a Type I error in subsequent between-groups analyses, only variables with AUC ≥ 0.90 were submitted to between-group comparisons. Levene's F tests and Shapiro–Wilk tests were performed on these variables to test assumptions of equal variances and normalcy, respectively. One-way ANOVA, Kruskal–Wallis, and Welch's F tests were performed to identify omnibus group differences among the Fake Bad, Best Effort, and mTBI-Pass groups. Tukey HSD, Mann–Whitney U, and Games-Howell post hoc tests were performed to identify group differences and effect sizes between the three groups. All between-groups analyses used a Bonferroni-corrected 0.003 level of significance.

Results

Simulator Study Groups

The Fake Bad and Best Effort groups did not differ significantly on age, sex, race/ethnicity, years of formal education, estimated intelligence, or knowledge of head injury sequelae (see Table 1). As shown in Table 2, ROC analyses identified 12 variables with outstanding classification accuracy (AUC ≥ 0.90) between the Fake Bad and Best Effort groups, including overall saccadic and manual RT variability, saccadic commission errors, manual omission errors, and RT variability for several distinct manual and saccadic cue types. The equal variances assumption was found to be violated for MDC saccadic RT variability, Levene's F(2, 66) = 4.37, p = .016, saccadic commission errors, Levene's F(2, 66) = 5.23, p = .008, and manual omission errors, Levene's F(2, 66) = 15.3, p <.001. Additionally, significant Shapiro–Wilk values were identified in the Best Effort group for saccadic commission (W = .93, p < .001) and manual omission (W = .28, p < .001) error variables, violating the assumption of normal distributions. Owing to these violated assumptions, Welch's F test and Games-Howell post hoc analyses were conducted for MDC saccadic RT variability whereas Kruskal–Wallis and post hoc Mann–Whitney U analyses were conducted for saccadic commission errors and manual omission errors. As shown in Table 3, very large effect sizes were found for all comparisons between the Fake Bad and Best Effort groups (Cohen's d range: 1.89–2.90; r range: .75–.78). Specifically, the Fake Bad group had significantly higher RT variability, higher rates of saccadic commission errors, and higher rates of manual omission errors than the Best Effort group.

Table 1.

Demographic characteristics

 Fake Bad (n = 24) Best Effort (n = 26) mTBI-Pass (n = 19) Fa p 
Mean age in years (SD28.6 (8.9) 28.4 (10.5) 34.3 (13.3) 1.96 .15 
Mean years education (SD16.9 (2.0) 16.7 (1.7) 16.2 (2.4) 0.67 .52 
Estimated intelligence (SD117 (9.3) 116 (7.8) 114 (7.7) 0.77 .47 
    χ2 (df)a  
Gender (%)    0.54 (2) .76 
 Male 9 (37.5) 12 (46.2) 7 (36.8)   
 Female 15 (62.5) 14 (53.8) 12 (63.2)   
Race/ethnicity (%)    9.22 (8) .32 
 Caucasian 16 (66.7) 21 (80.8) 15 (78.9)   
 African-American 1 (4.2) 4 (15.4) 2 (10.5)   
 Hispanic 1 (4.2) 0 (0.0) 1 (5.3)   
 Asian 5 (20.8) 1 (3.8) 1 (5.3)   
Knowledge of head injury sequelae (SD)b 9.48 (2.63) 9.58 (2.23)   .89 
Head injury characteristics      
 Median years since injury (IQR)   6.9 (2.3–21.6)   
 LOC length in minutes (SD  3.00 (4.29)   
 PTA length in minutes (SD  18.1 (50.8)   
 Fake Bad (n = 24) Best Effort (n = 26) mTBI-Pass (n = 19) Fa p 
Mean age in years (SD28.6 (8.9) 28.4 (10.5) 34.3 (13.3) 1.96 .15 
Mean years education (SD16.9 (2.0) 16.7 (1.7) 16.2 (2.4) 0.67 .52 
Estimated intelligence (SD117 (9.3) 116 (7.8) 114 (7.7) 0.77 .47 
    χ2 (df)a  
Gender (%)    0.54 (2) .76 
 Male 9 (37.5) 12 (46.2) 7 (36.8)   
 Female 15 (62.5) 14 (53.8) 12 (63.2)   
Race/ethnicity (%)    9.22 (8) .32 
 Caucasian 16 (66.7) 21 (80.8) 15 (78.9)   
 African-American 1 (4.2) 4 (15.4) 2 (10.5)   
 Hispanic 1 (4.2) 0 (0.0) 1 (5.3)   
 Asian 5 (20.8) 1 (3.8) 1 (5.3)   
Knowledge of head injury sequelae (SD)b 9.48 (2.63) 9.58 (2.23)   .89 
Head injury characteristics      
 Median years since injury (IQR)   6.9 (2.3–21.6)   
 LOC length in minutes (SD  3.00 (4.29)   
 PTA length in minutes (SD  18.1 (50.8)   

Notes: IQR = interquartile range; LOC = loss of consciousness; PTA = posttraumatic amnesia; SD = standard deviation.

aOne-way ANOVA or chi-square with Fake Bad, Best Effort, and mTBI-Pass groups.

bFrom the HIKS

Table 2.

ROC analyses for saccadic and manual variables among simulator study (Fake Bad and Best Effort) and clinical comparison (mTBI-Pass) groups

Variable Fake Bad versus Best Effort Fake Bad versus mTBI-Passa 
 AUC 95% Low 95% High p AUC 95% Low 95% High p 
Saccadic RT latency         
 Overall 0.74 0.60 0.88 .004     
 Directional cue 0.77 0.65 0.90 .001     
 Misdirectional cue 0.81 0.69 0.93 <.001     
 Gap cue 0.61 0.44 0.78 .19     
 Uncued 0.60 0.42 0.77 .25     
 Nondirectional cue 0.71 0.57 0.86 .01     
Manual RT latency         
 Overall 0.89 0.80 0.99 <.001     
 Directional cue 0.84 0.73 0.96 <.001     
 Misdirectional cue 0.86 0.74 0.99 <.001     
 Gap cue 0.89 0.78 0.99 <.001     
 Uncued 0.83 0.70 0.95 <.001     
 Nondirectional cue 0.88 0.78 0.98 <.001     
Saccadic RT variability         
 Overall 0.97 0.93 1.00 <.001 0.91 0.84 0.99 <.001 
 Directional cue 0.93 0.86 1.00 <.001 0.90 0.80 0.99 <.001 
 Misdirectional cue 0.92 0.83 1.00 <.001 0.84 0.72 0.97 <.001 
 Gap cue 0.90 0.80 1.00 <.001 0.84 0.70 0.97 <.001 
 Uncued 0.85 0.75 0.96 <.001     
 Nondirectional cue 0.85 0.74 0.96 <.001     
Manual RT variability         
 Overall 0.97 0.92 1.00 <.001 0.93 0.86 1.00 <.001 
 Directional cue 0.97 0.92 1.00 <.001 0.94 0.87 1.00 <.001 
 Misdirectional cue 0.96 0.90 1.00 <.001 0.93 0.85 1.00 <.001 
 Gap cue 0.93 0.84 1.00 <.001 0.82 0.67 0.97 <.001 
 Uncued 0.92 0.84 0.99 <.001 0.82 0.69 0.95 <.001 
 Nondirectional cue 0.96 0.90 1.00 <.001 0.91 0.81 1.00 <.001 
Saccadic omission error % 0.60 0.44 0.75 .25     
Saccadic commission error % 0.94 0.87 1.00 <.001 0.91 0.82 0.99 <.001 
Manual omission error % 0.94 0.87 1.00 <.001 0.91 0.81 1.00 <.001 
Manual commission error % 0.80 0.66 0.94 <.001     
Variable Fake Bad versus Best Effort Fake Bad versus mTBI-Passa 
 AUC 95% Low 95% High p AUC 95% Low 95% High p 
Saccadic RT latency         
 Overall 0.74 0.60 0.88 .004     
 Directional cue 0.77 0.65 0.90 .001     
 Misdirectional cue 0.81 0.69 0.93 <.001     
 Gap cue 0.61 0.44 0.78 .19     
 Uncued 0.60 0.42 0.77 .25     
 Nondirectional cue 0.71 0.57 0.86 .01     
Manual RT latency         
 Overall 0.89 0.80 0.99 <.001     
 Directional cue 0.84 0.73 0.96 <.001     
 Misdirectional cue 0.86 0.74 0.99 <.001     
 Gap cue 0.89 0.78 0.99 <.001     
 Uncued 0.83 0.70 0.95 <.001     
 Nondirectional cue 0.88 0.78 0.98 <.001     
Saccadic RT variability         
 Overall 0.97 0.93 1.00 <.001 0.91 0.84 0.99 <.001 
 Directional cue 0.93 0.86 1.00 <.001 0.90 0.80 0.99 <.001 
 Misdirectional cue 0.92 0.83 1.00 <.001 0.84 0.72 0.97 <.001 
 Gap cue 0.90 0.80 1.00 <.001 0.84 0.70 0.97 <.001 
 Uncued 0.85 0.75 0.96 <.001     
 Nondirectional cue 0.85 0.74 0.96 <.001     
Manual RT variability         
 Overall 0.97 0.92 1.00 <.001 0.93 0.86 1.00 <.001 
 Directional cue 0.97 0.92 1.00 <.001 0.94 0.87 1.00 <.001 
 Misdirectional cue 0.96 0.90 1.00 <.001 0.93 0.85 1.00 <.001 
 Gap cue 0.93 0.84 1.00 <.001 0.82 0.67 0.97 <.001 
 Uncued 0.92 0.84 0.99 <.001 0.82 0.69 0.95 <.001 
 Nondirectional cue 0.96 0.90 1.00 <.001 0.91 0.81 1.00 <.001 
Saccadic omission error % 0.60 0.44 0.75 .25     
Saccadic commission error % 0.94 0.87 1.00 <.001 0.91 0.82 0.99 <.001 
Manual omission error % 0.94 0.87 1.00 <.001 0.91 0.81 1.00 <.001 
Manual commission error % 0.80 0.66 0.94 <.001     

Notes: Variables derived from the BEAM; RT = response time.

aOnly variables with AUC ≥ 0.90 in the simulator study were analyzed with the clinical comparison group.

Table 3.

Group comparisons of saccadic and manual variables with outstanding classification accuracy (AUC ≥ 0.90)

Measures Best Effort (n = 26) mTBI-Pass (n = 19) Fake Bad (n = 24) One-Way ANOVA Effect size (Cohen's d
M (SDF (dfp Pairwise comparisonsa Best Effort versus mTBI-Pass Best Effort versus Fake Bad mTBI-Pass versus Fake Bad 
Saccadic RT variability          
 Overall (ms) 82.4 (17.8) 94.3 (24.0) 142.3 (26.3) 47.0 (2, 66) <.001 1 & 2 < 3 .56 2.68 1.92 
 Directional cue (ms) 78.3 (23.6) 87.5 (22.6) 138.7 (31.4) 36.6 (2, 66) <.001 1 & 2 < 3 .43 2.20 1.87 
 Misdirectional cue (ms)b 80.0 (22.9) 101.2 (31.8) 160.4 (50.0) 26.0 (2, 37.7) <.001 1 < 2 < 3 .77 2.07 1.41 
 Gap cue (ms) 75.7 (20.5) 88.0 (31.4) 136.8 (38.2) 26.6 (2, 65) <.001 1 & 2 < 3 .45 1.99 1.41 
Manual RT variability          
 Overall (ms) 75.6 (17.9) 85.7 (22.7) 133.8 (21.9) 53.0 (2, 65) <.001 1 & 2 < 3 .48 2.89 2.13 
 Directional cue (ms) 76.1 (20.8) 86.9 (28.3) 150.8 (30.3) 54.6 (2, 65) <.001 1 & 2 < 3 .44 2.90 2.21 
 Misdirectional cue (ms) 77.3 (24.6) 83.8 (25.5) 143.6 (34.3) 33.6 (2, 60) <.001 1 & 2 < 3 .27 2.25 1.98 
 Gap cue (ms) 73.2 (22.0) 87.3 (31.0) 123.1 (22.8) 23.6 (2, 63) <.001 1 & 2 < 3 .52 2.23 1.32 
 Uncued (ms) 86.4 (18.9) 92.1 (24.9) 124.6 (21.5) 21.3 (2, 65) <.001 1 & 2 < 3 .26 1.89 1.40 
 Nondirectional cue (ms) 65.0 (22.2) 78.6 (27.1) 128.4 (27.0) 41.0 (2, 65) <.001 1 & 2 < 3 .57 2.56 1.81 
Measures Best Effort (n = 26) mTBI-Pass (n = 19) Fake Bad (n = 24) One-Way ANOVA Effect size (Cohen's d
M (SDF (dfp Pairwise comparisonsa Best Effort versus mTBI-Pass Best Effort versus Fake Bad mTBI-Pass versus Fake Bad 
Saccadic RT variability          
 Overall (ms) 82.4 (17.8) 94.3 (24.0) 142.3 (26.3) 47.0 (2, 66) <.001 1 & 2 < 3 .56 2.68 1.92 
 Directional cue (ms) 78.3 (23.6) 87.5 (22.6) 138.7 (31.4) 36.6 (2, 66) <.001 1 & 2 < 3 .43 2.20 1.87 
 Misdirectional cue (ms)b 80.0 (22.9) 101.2 (31.8) 160.4 (50.0) 26.0 (2, 37.7) <.001 1 < 2 < 3 .77 2.07 1.41 
 Gap cue (ms) 75.7 (20.5) 88.0 (31.4) 136.8 (38.2) 26.6 (2, 65) <.001 1 & 2 < 3 .45 1.99 1.41 
Manual RT variability          
 Overall (ms) 75.6 (17.9) 85.7 (22.7) 133.8 (21.9) 53.0 (2, 65) <.001 1 & 2 < 3 .48 2.89 2.13 
 Directional cue (ms) 76.1 (20.8) 86.9 (28.3) 150.8 (30.3) 54.6 (2, 65) <.001 1 & 2 < 3 .44 2.90 2.21 
 Misdirectional cue (ms) 77.3 (24.6) 83.8 (25.5) 143.6 (34.3) 33.6 (2, 60) <.001 1 & 2 < 3 .27 2.25 1.98 
 Gap cue (ms) 73.2 (22.0) 87.3 (31.0) 123.1 (22.8) 23.6 (2, 63) <.001 1 & 2 < 3 .52 2.23 1.32 
 Uncued (ms) 86.4 (18.9) 92.1 (24.9) 124.6 (21.5) 21.3 (2, 65) <.001 1 & 2 < 3 .26 1.89 1.40 
 Nondirectional cue (ms) 65.0 (22.2) 78.6 (27.1) 128.4 (27.0) 41.0 (2, 65) <.001 1 & 2 < 3 .57 2.56 1.81 
  Kruskal–Wallis H Effect size (Pearson r
Mdn (IQR) χ2 (dfp Pairwise comparisons Best Effort versus mTBI-Pass Best Effort versus Fake Bad mTBI-Pass versus Fake Bad 
Saccadic commission error % 7.29 (15.2) 8.33 (32.0) 65.5 (34.1) 33.6 (2) <.001 1 & 2 < 3 .03 .75 .69 
Manual omission error % 0.00 (0.63) 0.00 (0.63) 9.65 (22.2) 38.1 (2) <.001 1 & 2 < 3 .05 .78 .71 
  Kruskal–Wallis H Effect size (Pearson r
Mdn (IQR) χ2 (dfp Pairwise comparisons Best Effort versus mTBI-Pass Best Effort versus Fake Bad mTBI-Pass versus Fake Bad 
Saccadic commission error % 7.29 (15.2) 8.33 (32.0) 65.5 (34.1) 33.6 (2) <.001 1 & 2 < 3 .03 .75 .69 
Manual omission error % 0.00 (0.63) 0.00 (0.63) 9.65 (22.2) 38.1 (2) <.001 1 & 2 < 3 .05 .78 .71 

Notes: Variables derived from the BEAM; RT = response time; ms = milliseconds; DC = directional cue; MDC = misdirectional cue; NDC = nondirectional cue; UC = uncued.

aAll group differences significant at p < .001.

bWelch's F test and Games-Howell post hoc test used due to violation of heterogeneity of variance assumption. Degrees of freedom adjusted. Best Effort group significantly less than mTBI-Pass group, p = .049.

Stepwise logistic regression analyses indicated that combining overall saccadic and manual RT variability as a set most reliably differentiated the Fake Bad and Best Effort groups, χ2(2, N = 49) = 53.6, p < .001. Nagelkerke's R2 of .89 indicated a strong relationship between prediction and grouping; the joint variable's group prediction accuracy was 96%. Additional ROC analyses indicated that the combined AUC of overall saccadic and manual RT variability was nearly perfect (AUC = 0.99; 95% confidence interval (CI) = 0.96–1.00), with each variable contributing an additional 0.02 AUC to the combined model above and beyond the AUC of the individual variables. A similar analysis combining saccadic commission errors and manual omission errors obtained outstanding classification accuracy (AUC = 0.95; 95% CI = 0.90–1.00; group prediction accuracy = 88%) in the simulator study. Each variable contributed an additional 0.01 AUC to the combined model above and beyond the AUC of the individual variables.

Clinical Comparison Group

The mTBI-Pass group did not significantly differ from either the Fake Bad or Best Effort group on age, sex, race/ethnicity, years of formal education, estimated premorbid intelligence, or knowledge of head injury sequelae (see Table 1). Consistent with simulator study results, very large effect sizes were found for all comparisons between the Fake Bad and mTBI-Pass groups (Cohen's d range: 1.32–2.21; r range: .69–.71), with the Fake Bad group demonstrating significantly higher RT variability, higher rates of saccadic commission errors, and higher rates of manual omission errors than the mTBI-Pass group (see Table 3). In contrast, the Best Effort and mTBI-Pass groups did not significantly differ on any variable with outstanding classification accuracy. The joint variable representing both overall saccadic and manual RT variability obtained outstanding classification accuracy comparing Fake Bad and mTBI-Pass groups (AUC = 0.93, 95% CI = 0.86–1.00; group prediction accuracy = 81%). The joint variable representing both saccadic commission errors and manual omission errors also obtained outstanding classification accuracy in the clinical comparison group (AUC = 0.91, 95% CI = 0.82–0.99; group prediction accuracy = 81%).

Discussion

This combined groups study demonstrates that eye movements can be used to detect invalid responding in neurocognitive assessment. Results provide evidence that multimodal saccadic and manual indices collected concurrently during a computerized test of attention can be used to identify invalid responding in healthy adult populations. In a well-controlled, prospective simulator study, healthy adults instructed to feign deficits demonstrated significantly greater saccadic and manual RT variability (i.e., less consistent response speed), higher saccadic commission error rates, and higher manual omission error rates than healthy adults who gave their best effort. These results were replicated during a clinical comparison between the Fake Bad group and a group of adults with a history of mild TBI who passed multiple EVIs. Consistent with several meta-analyses of response validity tests, we found very large effect sizes between groups of invalid and valid responders (Jasinski, Berry, Shandera, & Clark, 2011; Sollman & Berry, 2011; Vickery, Berry, Inman, Harris, & Orey, 2001).

Several individual variables achieved outstanding classification accuracy in both experimental and clinical comparisons, including measures of saccadic and manual RT variability, saccadic commission errors, and manual omission errors. Furthermore, combinations of individual variables enhanced detection of invalid responding through joint classification accuracy (Busse & Whiteside, 2012). In the simulator study, a combined variable of overall saccadic and manual RT variability obtained near-perfect classification accuracy. A second joint variable combining saccadic commission errors and manual omission errors also obtained outstanding classification accuracy in the simulator study. These two combined variables maintained outstanding classification accuracy in the clinical comparison group. Collectively, these results demonstrate that combinations of saccadic and manual variables can provide enhanced detection of invalid responding.

Closer inspection of saccadic and manual metrics elicits several interesting trends. Across modalities, RT variability generally outperformed RT latency in classifying invalid responding. These findings are consistent with performance validity research that consistently demonstrates RT variability detects invalid responding better than RT itself (Erdodi et al., 2014; Ord et al., 2010). These findings are also consistent with continuous performance tests that use RT variability and RT latency in complementary ways to identify a clinical condition of interest. For example, RT variability has been described as “the single most important measure” (p. 13) of the Test of Variables of Attention (Greenberg, Kindschi, Dupuy, & Hughes, 2016), accounting “for more than 80% of the variance” in determining clinical attention deficit-hyperactivity disorder (ADHD) thresholds, whereas RT latency accounts for more than 12% of the variance.

Saccadic and manual omission and commission errors also emerged as complementary metrics for identifying invalid performance. Consistent with previous research with continuous performance tests (Busse & Whiteside, 2012; Erdodi et al., 2014; Lange, Iverson, et al., 2012; Ord et al., 2010), our results demonstrated that manual omission errors were sensitive indicators of invalid responding. Furthermore, our findings are the first to demonstrate that saccadic commission errors can be used to detect invalid responding. Notably, commission and omission error rates manifested differently across saccadic and manual modalities; saccadic commissions were more sensitive to invalid responding than manual commissions, and manual omissions were more sensitive than saccadic omissions. As described earlier, a combined variable of saccadic commission errors and manual omission errors resulted in improved classification relative to each individual error type. Extending previous findings that saccadic and manual metrics offer complementary and independent contributions toward the evaluation of cognitive performance (Ettenhofer et al., 2016), results of this study indicate that saccadic and manual commission and omission errors may also offer complementary value for the evaluation of performance validity.

This study benefited from several methodological strengths. The simulator study's tightly controlled, prospective experimental design rendered robust differences in saccadic and manual performance between the Fake Bad and Best Effort groups. The randomized, permuted block assignment with a substantial sample size (n = 50) rendered demographically similar groups. Examiner scoring bias was mitigated with blinding and computer-automated, objective data collection. Collectively, these findings suggest the simulator study's research design effectively isolated the “invalid responding” construct, enhancing the internal validity of the simulator study results.

This study's generalizability was enhanced by the clinical comparison group of adults with a history of mild TBI. Mild TBI populations are known for having high base rates of invalid responding (Carone & Bush, 2013; Greiffenstein, 2013; Mittenberg, Patton, Canyock, & Condit, 2002), making the clinical comparison group in this study particularly useful. Conservative screening methods increased confidence that the mild TBI participants in this study's cross-validation were performing at capacity levels (i.e., valid responding), enabling the study to add a “known group” to its design. As expected, classification accuracy from the simulator study decreased modestly in the clinical comparison group but mostly remained within the “outstanding” range. Whereas classification accuracy would likely have been reduced further in more heterogeneous clinical samples with more recent histories of neurological problems, this known group comparison provides evidence that saccadic eye movements can be used in tandem with manual responses to detect invalid responding in clinical populations. Future studies should evaluate these variables in divergent clinical populations and contexts.

Despite its many methodological strengths, the study would have benefited from larger and more diverse samples in all three groups. Because demographic factors can influence the prevalence and type of invalid responding behavior (Armistead-Jehle & Hansen, 2011), the above average intelligence and years of education of this study's groups may limit the generalizability of the study's findings among persons with lower intellectual ability. It is possible that invalid responding may manifest differently among these individuals, leading to differences in classification accuracy of certain variables. Additionally, the study's generalizability is limited by its use of non-clinical simulators in the Fake Bad group. The study would have benefited from having an additional known group of persons with a history of mild TBI who failed response validity tests. Ideally, the archival study would have included a freestanding validity test such as Word Memory Test (Green, 2003) to split mild TBI group into mTBI-Pass and mTBI-Fail groups (Lange, Iverson, et al., 2012; Lange, Pancholi, Bhagwat, Anderson-Barnes, & French, 2012).

In the absence of freestanding validity test data, multiple EVIs were used in this study to rigorously screen out possible invalid responding. Because EVIs are generally less sensitive than freestanding validity tests (Miele, Gunner, Lynch, & McCaffrey, 2012), a single EVI failure with 90% specificity disqualified a subject from this study's mTBI-Pass group. Whereas false positives using this method may have eliminated a small number of persons with mild TBI who were in fact responding validly, the benefit to this approach was an enhanced confidence in the validity of our clinical comparison group.

Collectively, findings from this study provide compelling evidence that concurrently measured saccadic and manual metrics can detect invalid responding with high levels of sensitivity and specificity. This study's findings support the growing trend of using continuous performance tests and their associated metrics (e.g., RT latency, RT variability, omissions, commissions) to detect invalid responding in neurocognitive assessment (Busse & Whiteside, 2012; Erdodi et al., 2014; Hartman, 2008; Henry, 2005; Lange, Iverson, et al., 2012; Marshall et al., 2010; Ord et al., 2010; Reicker, 2008; Suhr, Sullivan, & Rodriguez, 2011; Tombaugh, Rees, Stormer, Harrison, & Smith, 2007; Willison & Tombaugh, 2006). To the best of our knowledge, these findings are the first to demonstrate that saccadic performance during cognitive assessment can be used to detect invalid responding.

In addition to its performance in tightly controlled experimental conditions, our findings provide compelling preliminary evidence for clinical utility of saccadic variables in neurocognitive assessment. The collective results from the experimental and clinical group analyses suggest that assessment approaches incorporating eye movements could potentially identify invalid responding in broader mild TBI populations while minimizing false positives. Future studies are needed to replicate and cross-validate this project's findings in larger, more heterogeneous groups of persons with and without neurological conditions. Future projects should attempt to incorporate at least four groups to maximize internal and external validity: an unbiased experimental group without a clinical condition, a biased experimental group without a clinical condition, a clinical group that passed a validity threshold, and a clinical group that failed a validity threshold (Rogers, 2008; Stevens & Merten, 2010).

This study also suggests that multimodal tests incorporating saccadic and manual responses may provide enhanced capability to detect invalid responding above and beyond single-modality tests (i.e., manual-only responses) in clinical and non-clinical samples. Subsequent studies will be valuable to replicate and extend this study's findings. Future studies could compare the metrics used in this study to well-researched EVIs in the Conners’ Continuous Performance Test-II (CPT-II; Conners, 2004; Erdodi et al., 2014), Trail Making Test (Reitan & Wolfson, 1985; Suhr & Barrash, 2007), and Digit Span (Greiffenstein, Baker, & Gola, 1994; Jasinski et al., 2011), as well as freestanding validity measures like the Victoria Symptom Validity Test (Slick, Hopp, Strauss, & Thompson, 1997) and the Medical Symptom Validity Test (Green, 2004).

Finally, the multimodal assessment method utilized in this study (in which participants concurrently look at visual stimuli and press a button in response to a single target) may elicit wide-ranging invalid responding strategies, and these strategies should be examined in greater detail. The wide net of individual and joint variables that obtained outstanding classification accuracy in this study suggests that, in addition to single-metric cutoffs, multiple joint classifications and diagnostic algorithms could be developed for detecting invalid responding. These algorithms could incorporate performance across saccadic and manual variables into criteria for possible, probable, and definite invalid responding (Slick, Sherman, & Iverson, 1999; Slick & Sherman, 2012, 2013). Future studies of clinical and performance validity applications of multimodal assessment may identify powerful new metrics for researchers and clinicians in a variety of settings.

Supplementary material

Supplementary material is available at

online.

Funding

This work was supported by Uniformed Services University of the Health Sciences [Grant R027LP to M.L.E.] and the American Psychological Association [dissertation research award to D.M.B.]

Conflict of Interest

The technology described in this manuscript is included in U.S. Application No. 61/779,801 , U.S. Patent Application No. 14/773,987, European Patent Application No. 14780396.9, and International Patent Application No. PCT/US2014/022468 (rights assigned to Uniformed Services University of the Health Sciences).

Acknowledgements

We owe many thanks to the participants in this study and others who provided support and assistance, including Dmitry Mirochnitchenko, Jessica Kegel, Kathy Williams, Andrew Waters, Cara Olsen, Marjan Holloway, Amanda Devane, and Doug Girard. The views and opinions presented in this manuscript are those of the authors and do not necessarily represent the position of USUHS, the Department of Defense, or the United States government.

References

Armistead-Jehle
,
P.
,
Cooper
,
D. B.
, &
Vanderploeg
,
R. D.
(
2016
).
The role of performance validity tests in the assessment of cognitive functioning after military concussion: A replication and extension
.
Applied Neuropsychology
 ,
23
,
264
273
.
Armistead-Jehle
,
P.
, &
Hansen
,
C. L.
(
2011
).
Comparison of the Repeatable Battery for the Assessment of Neuropsychological Status Effort Index and stand-alone symptom validity tests in a military sample
.
Archives of Clinical Neuropsychology
 ,
26
,
592
601
. .
Arnett
,
P.
(
2013
).
Secondary influences on neuropsychological test performance
 .
New York, NY
:
Oxford University Press
.
Axelrod
,
B. N.
,
Fichtenberg
,
N. L.
,
Millis
,
S. R.
, &
Wertheimer
,
J. C.
(
2006
).
Detecting incomplete effort with Digit Span from the Wechsler Adult Intelligence Scale-Third Edition
.
The Clinical Neuropsychologist
 ,
20
,
513
523
.
Babikian
,
T.
,
Boone
,
K. B.
,
Lu
,
P.
, &
Arnold
,
G.
(
2006
).
Sensitivity and specificity of various digit span scores in the detection of suspect effort
.
The Clinical Neuropsychologist
 ,
20
,
145
159
. .
Barnes
,
G.
(
2008
).
Cognitive processes involved in smooth pursuit eye movements
.
Brain and Cognition
 ,
68
,
309
326
.
Bauer
,
L.
,
Yantz
,
C. L.
,
Ryan
,
L. M.
,
Warden
,
D. L.
, &
McCaffrey
,
R. J.
(
2005
).
An examination of the California Verbal Learning Test II to detect incomplete effort in a traumatic brain-injury sample
.
Applied Neuropsychology
 ,
12
,
202
207
.
Bigler
,
E. D.
(
2012
).
Symptom validity testing, effort, and neuropsychological assessment
.
Journal of the International Neuropsychological Society
 ,
18
,
632
640
.
Browndyke
,
J. N.
(
2013
). Functional neuroanatomical bases of deceptive behavior and malingering. In
D. A.
Carone
, &
S. S.
Bush
(Eds.),
Mild traumatic brain injury: Symptom validity assessment and malingering
  (pp.
303
321
).
New York, NY
:
Springer Publishing
.
Bush
,
S. S.
,
Ruff
,
R. M.
,
Tröster
,
A. I.
,
Barth
,
J. T.
,
Koffler
,
S. P.
,
Pliskin
,
N. H.
,
Silver
, &
C. H.
(
2005
).
Symptom validity assessment: Practice issues and medical necessity: NAN Policy & Planning Committee
.
Archives of Clinical Neuropsychology
 ,
20
,
419
426
.
Busse
,
M.
, &
Whiteside
,
D.
(
2012
).
Detecting suboptimal cognitive effort: Classification accuracy of the Conner's Continuous Performance Test-II, Brief Test of Attention, and Trail Making Test
.
The Clinical Neuropsychologist
 ,
26
,
675
687
. .
Carone
,
D. A.
, &
Bush
,
S. S.
(
2013
). Introduction: Historical perspectives on mild traumatic brain injury, symptom validity assessment, and malingering. In
D. A.
Carone
, &
S. S.
Bush
(Eds.),
Mild traumatic brain injury: Symptom validity assessment and malingering
  (pp.
1
29
).
New York, NY
:
Springer Publishing
.
Chafetz
,
M.
(
2011
).
Reducing the probability of false positives in malingering detection of Social Security disability claimants
.
The Clinical Neuropsychologist
 ,
25
,
1239
1252
. .
Conners
,
C. K.
(
2004
).
Conners’ Continuous Performance Test (CPT II)
 .
Toronto, Ontario, Canada
:
Multi-Health Systems
.
Crawford
,
T. J.
,
Higham
,
S.
,
Renvoize
,
T.
,
Patel
,
J.
,
Dale
,
M.
,
Suriya
,
A.
, et al
. (
2005
).
Inhibitory control of saccadic eye movements and cognitive impairment in Alzheimer's disease
.
Biological Psychiatry
 ,
57
,
1052
1060
.
Dallal
,
G. E.
(
2013
).
Randomization plans
 . Retrieved from www.randomization.com
DeLong
,
E. R.
,
DeLong
,
D. M.
, &
Clarke-Pearson
,
D. L.
(
1988
).
Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach
.
Biometrics
 ,
44
,
837
845
.
Department of Veterans Affairs
, &
Department of Defense
. (
2016
).
VA/DoD clinical practice guideline for the management of concussion-mild traumatic brain injury
 . Washington, DC. Retrieved from http://www.healthquality.va.gov/guidelines/Rehab/mtbi/mTBICPGFullCPG50821816.pdf.
Duchowski
,
A. T.
(
2007
).
Eye tracking methodology: Theory and practice
 .
London
:
Springer
.
Erdal
,
K.
(
2004
).
The effects of motivation, coaching, and knowledge of neuropsychology on the simulated malingering of head injury
.
Archives of Clinical Neuropsychology
 ,
19
,
73
88
.
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O'Neill, R., & Medoff, B. (2014). Aggregating validity indicators embedded in Conners' CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29, 456–466.
Ettenhofer, M. L., & Barry, D. M. (2016). Saccadic impairment associated with remote history of mild traumatic brain injury. Journal of Neuropsychiatry and Clinical Neurosciences, 28, 223–231.
Ettenhofer, M. L., Hershaw, J. N., & Barry, D. M. (2016). Multimodal assessment of visual attention using the Bethesda Eye & Attention Measure (BEAM). Journal of Clinical and Experimental Neuropsychology, 38, 96–110.
Exton, C., & Leonard, M. (2009). Eye tracking technology: A fresh approach in delirium assessment. International Review of Psychiatry, 21, 8–14.
Fox, D. D. (2011). Symptom validity test failure indicates invalidity of neuropsychological tests. The Clinical Neuropsychologist, 25, 488–495.
Gooding, D. C., & Basso, M. A. (2008). The tell-tale tasks: A review of saccadic research in psychiatric patient populations. Brain and Cognition, 68, 371–390. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2755089/pdf/nihms83687.pdf
Green, P. (2003). Green's Word Memory Test for Windows: User's manual and program. Edmonton, Alberta, Canada: Author.
Green, P. (2004). Medical Symptom Validity Test for Windows: User's manual and program. Edmonton, Alberta, Canada: Author.
Green, P. (2007). The pervasive influence of effort on neuropsychological tests. Physical Medicine and Rehabilitation Clinics of North America, 18, 43–68.
Green, P., Allen, L. M., & Astner, K. (1996). The Word Memory Test: A user's guide to the oral and computer-administered forms, US version 1.1. Durham, NC: Cognisyst.
Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M., III (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045–1060.
Greenberg, L. M., Kindschi, C. L., Dupuy, T. R., & Hughes, S. J. (2016). Test of Variables of Attention (TOVA) clinical manual. Los Alamitos, CA: The TOVA.
Greiffenstein, M. F. (2013). Foreword. In D. A. Carone & S. S. Bush (Eds.), Mild traumatic brain injury: Symptom validity assessment and malingering (pp. xiii–xiv). New York, NY: Springer Publishing.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.
Hannula, D. E., Baym, C. L., Warren, D. E., & Cohen, N. J. (2012). The eyes know: Eye movements as a veridical index of memory. Psychological Science, 23, 278–287.
Hartman, D. E. (2008). The Computerized Test of Information Processing (CTIP) by Tom Tombaugh. Applied Neuropsychology, 15, 226–227.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S. R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.
Heitger, M. H., Jones, R. D., Macleod, A. D., Snell, D. L., Frampton, C. M., & Anderson, T. J. (2009). Impaired eye movements in post-concussion syndrome indicate suboptimal brain function beyond the influence of depression, malingering or intellectual ability. Brain, 132, 2850–2870.
Henry, G. K. (2005). Probable malingering and performance on the Test of Variables of Attention. The Clinical Neuropsychologist, 19, 121–129.
Holdnack, J. A. (2001). Wechsler Test of Adult Reading: Manual. San Antonio, TX: Pearson.
Hsiao, J. K., Bartko, J. J., & Potter, W. Z. (1989). Diagnosing diagnoses: Receiver operating characteristic methods and psychiatry. Archives of General Psychiatry, 46, 664–667.
Hutton, S. (2008). Cognitive control of saccadic eye movements. Brain and Cognition, 68, 327–340.
Iverson, G. L., & Binder, L. M. (2000). Detecting exaggeration and malingering in neuropsychological assessment. Journal of Head Trauma Rehabilitation, 15, 829–858.
Iverson, G. L., Lange, R. T., Green, P., & Franzen, M. D. (2002). Detecting exaggeration and malingering with the Trail Making Test. The Clinical Neuropsychologist, 16, 398–406.
Jasinski, L. J., Berry, D. T. R., Shandera, A. L., & Clark, J. A. (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33, 300–314.
Johnson, R., Barnhardt, J., & Zhu, J. (2003). The deceptive response: Effects of response conflict and strategic monitoring on the late positive component and episodic memory-related brain activity. Biological Psychology, 64, 217–253.
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., et al. (2012). Clinical utility of the Conners' Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment.
Lange, R. T., Iverson, G. L., Brooks, B. L., & Rennison, V. (2010). Influence of poor effort on self-reported symptoms and neurocognitive test performance following mild traumatic brain injury. Journal of Clinical and Experimental Neuropsychology, 32, 961–972.
Lange, R. T., Pancholi, S., Bhagwat, A., Anderson-Barnes, V., & French, L. M. (2012). Influence of poor effort on neuropsychological test performance in U.S. military personnel following mild traumatic brain injury. Journal of Clinical and Experimental Neuropsychology, 34, 453–466.
Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.
Larrabee, G. J. (2007). Assessment of malingered neuropsychological deficits. New York, NY: Oxford University Press.
Larrabee, G. J. (2012). Performance validity and symptom validity in neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 625–630.
Larson, E. B., Kondiles, B. R., Starr, C. R., & Zollman, F. S. (2013). Postconcussive complaints, cognition, symptom attribution and effort among veterans. Journal of the International Neuropsychological Society, 19, 88–95.
Leark, R. A., Dixon, D., Hoffman, T., & Huynh, D. (2002). Fake bad test response bias effects on the Test of Variables of Attention. Archives of Clinical Neuropsychology, 17, 335–342.
Mandrekar, J. N. (2010). Receiver operating characteristic curve in diagnostic test assessment. Journal of Thoracic Oncology, 5, 1315–1316.
Marshall, P., Schroeder, R., O'Brien, J., Fischer, R., Ries, A., Blesi, B., et al. (2010). Effectiveness of symptom validity measures in identifying cognitive and behavioral symptom exaggeration in adult attention deficit hyperactivity disorder. The Clinical Neuropsychologist, 24, 1204–1237.
Maruta, J., Suh, M., Niogi, S. N., Mukherjee, P., & Ghajar, J. (2010). Visual tracking synchronization as a metric for concussion screening. Journal of Head Trauma Rehabilitation, 25, 293–305.
McNeil, B. J., & Hanley, J. A. (1984). Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Medical Decision Making, 4, 137.
Meyers, J. E., Volbrecht, M., Axelrod, B. N., & Reinsch-Boothby, L. (2011). Embedded symptom validity tests and overall neuropsychological test performance. Archives of Clinical Neuropsychology, 26, 8–15.
Miele, A. S., Gunner, J. H., Lynch, J. K., & McCaffrey, R. J. (2012). Are embedded validity indices equivalent to free-standing symptom validity tests? Archives of Clinical Neuropsychology, 27, 10–22.
Millis, S. R., & Volinsky, C. T. (2001). Assessment of response bias in mild head injury: Beyond malingering tests. Journal of Clinical and Experimental Neuropsychology, 23, 809–828.
Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1102.
Müri, R. M., & Nyffeler, T. (2008). Neurophysiology and neuroanatomy of reflexive and volitional saccades as revealed by lesion studies with neurological patients and transcranial magnetic stimulation (TMS). Brain and Cognition, 68, 284–292.
Nelson, N. W., Boone, K., Dueck, A., Wagener, L., Lu, P., & Grills, C. (2003). Relationships between eight measures of suspect effort. The Clinical Neuropsychologist, 17, 263–272.
O'Bryant, S. E., Hilsabeck, R. C., Fisher, J. M., & McCaffrey, R. J. (2003). Utility of the Trail Making Test in the assessment of malingering in a sample of mild traumatic brain injury litigants. The Clinical Neuropsychologist, 17, 69–74.
Olk, B., & Kingstone, A. (2003). Why are antisaccades slower than prosaccades? A novel finding using a new paradigm. Neuroreport, 14, 151.
Ono, M., Ownsworth, T., & Walters, B. (2011). Preliminary investigation of misconceptions and expectations of the effects of traumatic brain injury and symptom reporting. Brain Injury, 25, 237–249.
Ord, J. S., Boettcher, A. C., Greve, K. W., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners' Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32, 380–387.
Pierrot-Deseilligny, C., Milea, D., & Müri, R. M. (2004). Eye movement control by the cerebral cortex. Current Opinion in Neurology, 17, 17–25.
Reicker, L. I. (2008). The ability of reaction time tests to detect simulation: An investigation of contextual effects and criterion scores. Archives of Clinical Neuropsychology, 23, 419–431.
Reitan, R. M., & Wolfson, D. (1985). The Halstead-Reitan Neuropsychological Test Battery: Theory and clinical interpretation. Tucson, AZ: Neuropsychology Press.
Reynolds, C. R., & Horton, A. M. (2012). Detection of malingering during head injury litigation (2nd ed.). New York, NY: Springer.
Rogers, R. (2008). Researching response styles. In R. Rogers (Ed.), Clinical assessment of malingering and deception (3rd ed., pp. 411–434). New York, NY: Guilford Press.
Schatz, P., & Browndyke, J. (2002). Applications of computer-based neuropsychological assessment. The Journal of Head Trauma Rehabilitation, 17, 395.
Schutte, C., & Axelrod, B. N. (2013). Use of embedded cognitive symptom validity measures in mild traumatic brain injury cases. In D. A. Carone & S. S. Bush (Eds.), Mild traumatic brain injury: Symptom validity assessment and malingering (pp. 159–181). New York, NY: Springer Publishing.
Sharpe, J. A. (2008). Neurophysiology and neuroanatomy of smooth pursuit: Lesion studies. Brain and Cognition, 68, 241–254.
Shum, D. H., O'Gorman, J. G., & Alpar, A. (2004). Effects of incentive and preparation time on performance and classification accuracy of standard and malingering-specific memory tests. Archives of Clinical Neuropsychology, 19, 817–823.
Slick, D. J., Hopp, G., Strauss, E., & Thompson, G. (1997). VSVT, Victoria Symptom Validity Test: Version 1.0, professional manual. Odessa, FL: Psychological Assessment Resources.
Slick, D. J., Sherman, E. M., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561.
Slick, D. J., & Sherman, E. M. S. (2012). Differential diagnosis of malingering and related clinical presentations. In E. M. S. Sherman & B. L. Brooks (Eds.), Pediatric forensic neuropsychology (pp. 113–135). New York, NY: Oxford University Press.
Slick, D. J., & Sherman, E. M. S. (2013). Differential diagnosis of malingering. In D. A. Carone & S. S. Bush (Eds.), Mild traumatic brain injury: Symptom validity assessment and malingering (pp. 57–72). New York, NY: Springer Publishing.
Sollman, M. J., & Berry, D. T. (2011). Detection of inadequate effort on neuropsychological testing: A meta-analytic update and extension. Archives of Clinical Neuropsychology, 26, 774–789.
Stevens, A., & Merten, T. (2010). Psychomotor retardation: Authentic or malingered? A comparative study of subjects with and without traumatic brain injury and experimental simulators. German Journal of Psychiatry, 13, 1–8.
Strauss, E., Slick, D. J., Levy-Bencheton, J., Hunter, M., MacDonald, S. W., & Hultsch, D. F. (2002). Intraindividual variability as an indicator of malingering in head injury. Archives of Clinical Neuropsychology, 17, 423–444.
Stuss, D., Stethem, L., Hugenholtz, H., Picton, T., Pivik, J., & Richard, M. (1989). Reaction time after head injury: Fatigue, divided and focused attention, and consistency of performance. Journal of Neurology, Neurosurgery and Psychiatry, 52, 742–748.
Suhr, J. A., & Barrash, J. (2007). Performance on standard attention, memory, and psychomotor speed tasks as indicators of malingering. In G. J. Larrabee (Ed.), Assessment of malingered neuropsychological deficits (pp. 131–170). New York, NY: Oxford University Press.
Suhr, J. A., Sullivan, B. K., & Rodriguez, J. L. (2011). The relationship of noncredible performance to continuous performance test scores in adults referred for attention-deficit/hyperactivity disorder evaluation. Archives of Clinical Neuropsychology, 26, 1–7.
Tan, J. E., Slick, D. J., Strauss, E., & Hultsch, D. F. (2002). How'd they do it? Malingering strategies on symptom validity tests. The Clinical Neuropsychologist, 16, 495–505.
Terrio, H., Brenner, L. A., Ivins, B. J., Cho, J. M., Helmick, K., Schwab, K., et al. (2009). Traumatic brain injury screening: Preliminary findings in a US Army Brigade Combat Team. Journal of Head Trauma Rehabilitation, 24, 14–23.
Tombaugh, T. N. (1996). Test of Memory Malingering (TOMM). North Tonawanda, NY: Multi Health Systems.
Tombaugh, T. N., Rees, L., Stormer, P., Harrison, A. G., & Smith, A. (2007). The effects of mild and severe traumatic brain injury on speed of information processing as measured by the Computerized Tests of Information Processing (CTIP). Archives of Clinical Neuropsychology, 22, 25–36.
van Stockum, S., MacAskill, M., Anderson, T., & Dalrymple-Alford, J. (2008). Don't look now or look away: Two sources of saccadic disinhibition in Parkinson's disease. Neuropsychologia, 46, 3108–3115.
Vendemia, J. M. C., Buzan, R. F., & Simon-Dack, S. L. (2005). Reaction time of motor responses in two-stimulus paradigms involving deception and congruity with varying levels of difficulty. Behavioural Neurology, 16, 25.
Vickery, C. D., Berry, D. T., Inman, T. H., Harris, M. J., & Orey, S. A. (2001). Detection of inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures. Archives of Clinical Neuropsychology, 16, 45–73.
Weinborn, M., Woods, S. P., Nulsen, C., & Leighton, A. (2012). The effects of coaching on the verbal and nonverbal medical symptom validity tests. The Clinical Neuropsychologist, 26, 832–849.
Willison, J., & Tombaugh, T. N. (2006). Detecting simulation of attention deficits using reaction time tests. Archives of Clinical Neuropsychology, 21, 41–52.
Zihl, J., Von Cramon, D., & Mai, N. (1983). Selective disturbance of movement vision after bilateral brain damage. Brain, 106, 313–340.

Supplementary data