Abstract

Computerized neurocognitive assessment tools (NCATs) are increasingly used for baseline and post-concussion assessments, yet to date NCATs have not demonstrated strong test–retest reliabilities. Most studies have used non-military populations and differing methodologies, complicating any determination of the utility of NCATs in military populations. The test–retest reliability of four NCATs (Automated Neuropsychological Assessment Metrics 4 [ANAM4], CNS-Vital Signs, CogState, and Immediate Post-Concussion Assessment and Cognitive Test [ImPACT]) was investigated in a healthy active duty military sample. Four hundred and nineteen Service Members were randomly assigned to take one NCAT, and 215 returned after approximately 30 days for retest. Participants deemed to have put forth inadequate effort during one or both testing sessions, according to the NCATs' scoring algorithms, were removed from analyses. Each NCAT had at least one reliability score (intraclass correlation) in the "adequate" range (.70–.79), only ImPACT had a score considered "high" (.80–.89), and no scores met "very high" criteria (.90–.99). Overall, the test–retest reliabilities of the four NCATs in this military sample are consistent with those reported in the literature and are lower than desired for clinical decision-making.

Introduction

Research suggests that as many as 23%–32% of Service Members returning from deployment screen positive for a mild traumatic brain injury (mTBI; Terrio et al., 2009). Many individuals sustaining mTBI experience some disruption of cognitive processes, although the exact nature and course of that disruption remain unclear. Cognitive deficits following mTBI are typically transient and resolve quickly, often within 7–10 days (Alexander, 1995; Belanger, Curtiss, Demery, Lebowitz, & Vanderploeg, 2005; Bigler, 2008), with cognitive symptoms rarely lasting longer than 3 months (Belanger et al., 2005; Carroll et al., 2004). However, 15%–25% of individuals sustaining mTBI experience persistent symptoms (i.e., lasting longer than 3 months) that can interfere with daily functioning (Carroll et al., 2004; Ponsford et al., 2000).

According to Department of Defense (DoD) policy, Service Members sustaining mTBI are pulled from duty and required to rest at least 24 h before returning to duty (DoD, 2012). In addition, they must have symptom resolution before being cleared to return to duty. At present, no objective testing is required to determine whether cognitive problems have resolved. Since individuals wishing to rejoin their units may deny symptoms, including cognitive problems, the lack of objective cognitive testing may result in Service Members with persistent cognitive deficits being returned to duty or going undetected, placing them and their fellow Service Members at risk. In an attempt to improve this process, the use of neuropsychological assessment to help determine recovery following mTBI has become an area of increased focus for policy makers (United States House of Representatives. H.R. 4986, 2008).

Neuropsychological assessment can be a valuable tool in the proper treatment of Service Members with TBI-related cognitive impairment (McCrory et al., 2009), as well as in tracking the course of recovery after mTBI (Iverson, Brooks, Collins, & Lovell, 2006). However, most available tests are time intensive and require one-on-one interaction with a neuropsychologist or trained psychometrist. Traditional assessment methods therefore pose logistical challenges, given the large number of Service Members deploying and the difficulty of administering these tests in combat zones during the acute injury period.

In May 2007, a TBI task force was chartered by the Surgeon General of the Army. This TBI Task Force recommended the implementation of pre-deployment (baseline), post-deployment, and post-injury/exposure neurocognitive evaluations using the Automated Neuropsychological Assessment Metrics (ANAM; US Army Traumatic Brain Injury Task Force, 2007).

The ANAM is one of many computerized neurocognitive assessment tools (NCATs) currently available. NCATs have been proposed as an alternative to traditional neuropsychological testing, at least for use during acute injury periods. NCATs have existed for several decades and have been used for the assessment of a wide range of clinical, neurological, and psychiatric issues, including mTBI assessment (Bauer et al., 2012). NCAT batteries can be advantageous as they are often easier, faster, and less expensive to administer, allow rapid availability of results, provide potentially increased sensitivity and precision on time-sensitive tasks, and allow for the testing of large groups of people in one session (Bauer et al., 2012; Collie, Darby, & Maruff, 2001; Gualtieri & Johnson, 2005; Leposavic, Leposavic, & Saula-Marojevic, 2010; Lovell & Collins, 2002; Schatz & Browndyke, 2002; Schatz & Zillmer, 2003). Though large group testing still frequently occurs in military and athletic settings, however, recent research suggests that testing in a group setting may negatively affect scores (Moser, Schatz, Neidzwski, & Ott, 2011).

NCATs are frequently used to establish a preseason cognitive baseline for athletes against which post-concussion clinical examinations can be compared. The implementation of widespread cognitive testing with NCATs by many sports leagues suggests the feasibility of their use in large military populations. Section 1673 of the National Defense Authorization Act of 2008 (United States House of Representatives. H.R. 4986, 2008) required the Secretary of Defense to establish a protocol for the assessment and documentation of cognitive functioning of deploying Service Members to facilitate the assessment of post-deployment cognitive functioning. In response, the Assistant Secretary of Defense for Health Affairs directed the Military Departments to administer the ANAM tool before deployment.

Though research has progressed in the past 10 years, NCATs used for screening mTBI-related cognitive dysfunction remain in the early stages of clinical test development and validation. With the existing research, it is difficult to directly compare the psychometrics of NCATs due to the unique characteristics of each battery: the use of different cognitive tests, different computer user interfaces, different types of stimuli, and composite scores computed from different combinations of subtests. In addition, studies often differ in research methodology, such as sampling from different populations (e.g., high school athletes, college athletes, Service Members), defining mTBI differently, and using different statistical methods. Thus, it is difficult to discern how much of the variation in the psychometric properties reported in the literature reflects test differences versus study design differences. Since many commercially available NCATs exist, it is important to determine their relative psychometric strengths, which will require further study of the psychometric properties of the ANAM and the other NCATs currently used in the assessment of individuals with mTBI.

Validity and test–retest reliability are the fundamental psychometric characteristics that should be established for each NCAT. Validity refers to the ability of a test to accurately measure the domains it purports to measure. Test–retest reliability is the extent to which a test produces consistent results across multiple administrations to the same individual. A test cannot be valid if it is not reliable; therefore, reliability should be established before drawing conclusions about a test's validity. Test–retest reliability is measured as the association between the scores obtained by a single examinee taking the same test on two or more occasions, typically reported as Pearson's r or an intraclass correlation (ICC) coefficient. Though the Pearson correlation coefficient has traditionally been used as a measure of test–retest agreement, it is generally meant for measuring linear relationships between variables of different "classes" and does not account for individual variation in scores. ICC coefficients are better suited for measuring the relationship between different measures of the same construct and do account for individual variability. Test–retest reliability coefficients of 0.90 or greater are considered very high and are preferred for clinical decision-making; 0.80–0.89 are considered high, 0.70–0.79 adequate, 0.60–0.69 marginal, and 0.59 or lower low (Lezak, Howieson, Bigler, & Tranel, 2012). With adequate or higher test–retest reliability, the examiner can be more confident that changes in an examinee's score reflect changes in the trait being measured rather than instability of the test.
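To make the distinction between the two coefficients concrete, the following sketch (in Python; the study's own analyses used SAS, described under Statistical procedures) computes Pearson's r and a one-way random-effects ICC on simulated retest scores that include a systematic practice gain, then labels each coefficient with the bands above. The simulated values are illustrative only.

```python
import numpy as np
from scipy import stats

def icc_oneway(t1, t2):
    """One-way random-effects ICC: between-person variance over total variance."""
    scores = np.column_stack([t1, t2])
    n, k = scores.shape
    ms_between = k * np.sum((scores.mean(axis=1) - scores.mean()) ** 2) / (n - 1)
    ms_within = np.sum((scores - scores.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def reliability_band(coef):
    """Descriptive bands from Lezak, Howieson, Bigler, and Tranel (2012)."""
    if coef >= 0.90: return "very high"
    if coef >= 0.80: return "high"
    if coef >= 0.70: return "adequate"
    if coef >= 0.60: return "marginal"
    return "low"

rng = np.random.default_rng(0)
time1 = rng.normal(100, 15, 50)             # 50 simulated baseline scores
time2 = time1 + rng.normal(0, 8, 50) + 6    # retest: noise plus a constant practice gain
r = stats.pearsonr(time1, time2)[0]         # r ignores the systematic shift...
icc = icc_oneway(time1, time2)              # ...while the one-way ICC is penalized by it
print(f"r = {r:.2f} ({reliability_band(r)}); ICC = {icc:.2f} ({reliability_band(icc)})")
```

A systematic gain between sessions leaves r essentially untouched but inflates within-person variance, lowering the one-way ICC; this is one reason the two coefficients can diverge on retest data.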

Reported test–retest reliability coefficients in the existing NCAT literature vary from the low to high ranges. For example, in test–retest studies of CNS-Vital Signs with a mean 67-day interval between testing (range of 1–282 days), Pearson r coefficients (Spearman's rho for tests not normally distributed) varied between .31 and .87 for the various subtests and index scores of the battery, with measures of speed and reaction time having the highest correlations between time 1 and time 2 (Gualtieri & Johnson, 2006). A test–retest study of the ANAM, based on a mean 166.5-day interval (SD = 75.6), found similar Pearson correlation coefficients of .38–.87 for its subtests (Cernich, Reeves, Sun, & Bleiberg, 2006). ICC coefficients for CogState subtests varied between .69 and .82 for speed scores and between .31 and .51 for accuracy scores after a 1-week delay between testing periods (Collie et al., 2003). Pearson correlation coefficients for the indices of the Immediate Post-Concussion Assessment and Cognitive Test (ImPACT) in a sample of healthy adolescents and young adults, after an average 5.8-day interval (SD = 3.0), ranged from .67 to .86, with Reaction Time and Processing Speed demonstrating the highest reliability (Iverson, Lovell, & Collins, 2003). An approximately 1-year test–retest interval (range of 0.5–2.35 years) in high-school athletes using ImPACT resulted in ICC coefficients ranging from .62 to .85, again with Reaction Time and Processing Speed showing the highest coefficients (Elbin, Schatz, & Covassin, 2011). When the test–retest interval was extended to 2 years (M = 1.9, SD = 0.6) using ImPACT, Schatz (2010) found ICC coefficients between .43 and .74, with Processing Speed scores demonstrating higher reliability. These studies illustrate the variability in test–retest reliability coefficients within and across NCATs, as well as the variability in methods of investigating test–retest reliability, making it difficult to compare NCATs across studies. Single studies of multiple NCATs are needed for adequate comparisons to better inform medical providers and policy-makers deciding which NCATs to use.

The only study to date investigating the test–retest reliability of multiple NCATs is by Broglio, Ferrara, Macciocchi, Baumgartner, and Elliott (2007). They compared the test–retest reliability of three computerized batteries (ImPACT, Concussion Sentinel, and Headminder Concussion Resolution Index) by administering them to healthy college students at three time points: baseline, 45 days, and 50 days post-baseline. Results suggested that the test–retest reliability of many of the indices and subtests from these batteries was below the recommended values for minimally acceptable test–retest reliability. Specifically, ICC coefficients ranged between .15 and .39 (baseline to day 45) and .39 and .61 (days 45–50) for ImPACT, between .23 and .65 (baseline to day 45) and .39 and .66 (days 45–50) for Concussion Sentinel, and between .15 and .66 (baseline to day 45) and .03 and .66 (days 45–50) for Headminder. Though some batteries performed better than others, there was considerable variability in test–retest reliability within batteries, failing to offer clear evidence of a psychometrically superior NCAT. Additionally, Schatz, Kontos, and Elbin (2012) criticized the methodology employed by Broglio and colleagues. Specifically, Schatz and colleagues raised concerns about administering multiple NCATs in close succession: Eonta and colleagues (2011) reported "interference effects" in their test–retest study of the ANAM, in which scores declined in testing sessions immediately following an earlier session. Schatz and colleagues also raised concerns about the number of participants removed for poor effort in the Broglio and colleagues study, as the proportion was significantly higher than in the normative groups. Finally, Broglio and colleagues reported only ICCs; Schatz and colleagues argued that additional measures of association, such as Pearson's r, Reliable Change Indices (RCIs), and regression-based measures, should be reported as well.

The duration between baseline and posttest assessments can be a confounding factor in establishing test–retest reliability. Investigators of ImPACT found that test–retest reliabilities are relatively stable over long periods of time (e.g., 1–2 years) and may be more stable than 1–2-week intervals (Elbin et al., 2011; Schatz, 2010). However, Broglio and colleagues' (2007) findings suggest a shorter duration (i.e., 5 days) results in higher test–retest reliabilities than a longer duration (i.e., 45 days), though practice effects may account for improvements in reliability between the second and third administrations of the tests.

In designing studies of test–retest reliability, both the characteristics of the trait being measured and the intended use of the test dictate the duration between administrations. In the military, Service Members are administered predeployment screenings that may occur up to a year before a potential injury during the subsequent deployment. If an injury occurs, Service Members may be administered the NCAT multiple times within a few weeks until they return to baseline levels of performance, similar to assessment procedures used with athletes. Specific recommendations regarding the duration between test administrations have indicated that waiting as long as a year is likely too long and that a few weeks may be the ideal interval (Marx, Menezes, Horovitza, Jones, & Warren, 2003). This recommendation is consistent with the clinical course of concussion/mTBI, as most symptoms typically resolve within 1–2 weeks (Alexander, 1995; Belanger et al., 2005; Bigler, 2008). Since the military is using NCATs to track recovery following concussion and to assist with return-to-duty decisions, the shorter duration (e.g., a "few weeks") is of interest in the current investigation.

Another possible confounding factor in establishing test–retest reliability is practice effects. Whereas the interference effects described by Eonta and colleagues (2011) involve performance declining with repeated administrations, practice effects involve performance improving with repeated administrations of a test. Practice effects can be estimated, and failing to control for them can result in artificially low estimates of test–retest reliability.

Poor effort on assessments can also be a significant contributing factor to a patient's symptom profile and should be accounted for in evaluations of those symptoms (Lange, Iverson, Brooks, & Rennison, 2010). In assessment of military personnel, an individual may put forth suboptimal effort for secondary gain, such as avoiding hazardous duty or receiving medical benefit pay. Suboptimal effort can influence test–retest reliability and lead to erroneous conclusions regarding the test's psychometric properties. Failure to take effort into consideration could result in misguided decisions regarding a Service Member's fitness for duty. Assessing for the effort of the test taker is a critical component of any neuropsychological evaluation. As such, most NCATs include effort indicators derived from algorithms that determine if the participant's performance was consistent with someone putting forth adequate effort on the tests.

The lack of directly comparable psychometric data in the currently available NCAT literature highlights the need for studies involving multiple NCAT batteries. Furthermore, given the widespread use of NCATs by the military, the psychometric properties of these batteries need to be established in military populations. The current study evaluates the test–retest reliability of four NCATs (ANAM4, CNS-Vital Signs, CogState, and ImPACT) in a homogeneous sample of healthy active duty military personnel, using the same test–retest interval and the same testing procedures for all batteries. This allows more direct comparisons of the reliability of these assessment tools, which will be useful for military health policy-makers and clinicians as well as for those outside the military who need to choose the tool best suited to their specific clinical or research needs.

Methods

Participants

Active duty Service Member volunteers (n = 419) were recruited from replacement units at Fort Bragg to which they are temporarily assigned upon in-processing to the base. Participants were enrolled until a target number of at least 50 Service Members returning for retest for each NCAT was achieved. Fifty cases were required for sufficient power to detect an association of 0.38 or higher between time 1 and time 2 (two-tailed, α = 0.05). Service Members who self-reported head injury in the 6-month period prior to testing or in between time 1 and time 2 testing sessions were not eligible to participate.
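The power analysis is not detailed beyond the figures above; as a rough check under standard assumptions (not the authors' actual computation), the power to detect a correlation of 0.38 with 50 pairs at two-tailed α = 0.05 can be approximated via the Fisher z transformation, as in this sketch:

```python
import math
from scipy.stats import norm

def power_pearson_r(r, n, alpha=0.05):
    """Approximate two-tailed power for detecting a Pearson correlation r
    with n pairs, via the Fisher z transformation."""
    z_r = math.atanh(r)              # Fisher z of the target correlation
    se = 1.0 / math.sqrt(n - 3)      # standard error of the transformed estimate
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - z_r / se) + norm.cdf(-z_crit - z_r / se)

print(f"{power_pearson_r(0.38, 50):.2f}")   # ~0.78, near the conventional .80 target
```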

Testing Sessions

Time 1

Participants were recruited during a briefing conducted by the Defense and Veterans Brain Injury Center regarding concussion education, a component of in-processing. Those indicating an interest in participating were informed of study procedures, then read and signed the consent form approved by Womack Army Medical Center's Institutional Review Board and the U.S. Army Medical Research and Materiel Command's Human Research Protections Office. Participants logged onto computers, where each was assigned a random study ID number and presented with questionnaires collecting demographic, military history, and medical information. After completing these questionnaires, they took the TBI battery of one of four NCATs (CNS-Vital Signs, ANAM4, CogState, or ImPACT), as determined by random assignment upon logon to the computer system (see Table 1 for specific NCAT information). All tests were administered and scored according to their developers' recommendations. Participants were tested in carefully controlled group conditions: computer labs with partitions between computers to minimize distractions and headphones for auditory stimuli. The total time to complete consenting, questionnaires, and the randomly assigned NCAT was approximately 45 min.

Table 1.

A list of the four NCATs, their version, and the scores from each NCAT used in analyses

NCAT (version): scores used in analyses

CNS-Vital Signs (3.2.0.51): Memory; Verbal Memory; Visual Memory; Psychomotor Speed; Reaction Time; Complex Attention; Cognitive Flexibility; Processing Speed; Executive Functioning

Automated Neuropsychological Assessment Metric (ANAM4) (4.3.01): Simple Reaction Time; Simple Reaction Time (Repeated); Procedural Reaction Time; Mathematical Processing; Code Substitution Learning; Code Substitution Memory; Matching to Sample

CogState (5.6): Detection Speed; Identification Speed; One Card Learning Accuracy; One Back Speed

ImPACT (Special Standalone for Army): Verbal Memory; Visual Memory; Reaction Time; Processing Speed; Impulse Control

Notes: NCAT = Computerized neurocognitive assessment tool; ImPACT = Immediate Post-Concussion Assessment and Cognitive Test.

Time 2

Participants were asked to return approximately 30 days after time 1 testing to complete the time 2 session, with a window of 21–42 days allowed for the retest. Though 30 days was the targeted return date, the 21–42 day window accommodated Service Members' busy schedules and maximized retention. Upon their return, participants updated any relevant information on the demographic and medical questionnaires and took the same NCAT they were assigned at time 1. All four NCATs provide an alternate version for follow-up assessments, created through randomized selection of stimuli built in by the NCAT developers.

Statistical procedures

All statistical procedures were conducted by a neutral third-party statistical consulting company working with a de-identified, blinded database (i.e., NCAT names and specific variable labels were replaced with generic variable names) to protect against possible bias from identifying the NCATs during analyses. Chi-square and paired-samples t-tests were used to determine equivalence between individuals failing to return for time 2 and those completing both testing sessions, as well as between the groups taking each NCAT. Pearson correlation coefficients (r) were calculated as an indicator of the linear relationship between measurements at time 1 and time 2 and were used in the reliable change analyses described below. The SAS procedure GLM was used to calculate ICC coefficients estimating the test–retest reliability between NCAT scores at time 1 and time 2. This procedure partitions the variability in the observed scores into within-person and between-person variance, where the ICC coefficient is the ratio of the between-person variability to the total variance. ICC analyses produce values between 0 and 1, with values closer to 1 indicating less error variance and stronger test–retest reliability. Reliabilities were evaluated using the previously described criteria (Lezak et al., 2012). The NCAT scores evaluated in the test–retest reliability analyses were those routinely recommended by each NCAT developer (Table 1). For ImPACT and CNS-Vital Signs, standardized index scores for the primary cognitive domains measured by each battery were used. For CogState, z-scores for each subtest as well as a total composite score were used. For ANAM4, "Throughput" scores, calculated from reaction time and accuracy, were used for each subtest. All NCAT scores were generated automatically by computerized algorithms using proprietary formulas and normative data built into each NCAT by its developer.
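ANAM4's Throughput computation is proprietary, but a common construction for such speed-accuracy composites is correct responses per minute of responding. The following is a sketch of that generic form only, not the vendor's actual formula or its normative scaling:

```python
def throughput(n_correct, n_trials, mean_rt_ms):
    """Generic throughput-style score: correct responses per minute.

    Illustrative only; ANAM4's actual Throughput formula and its
    normative scaling are proprietary to the vendor.
    """
    accuracy = n_correct / n_trials           # proportion correct
    responses_per_min = 60000.0 / mean_rt_ms  # responses per minute at this speed
    return accuracy * responses_per_min

# e.g., 38 of 40 correct with a 520 ms mean reaction time
print(round(throughput(38, 40, 520), 1))      # 109.6 correct responses per minute
```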

RCIs were also calculated for each individual across all NCAT scores from time 1 to time 2. The RCI is a method for determining if changes in scores across test sessions are beyond what would be expected by chance. The modified formula for the RCI presented by Moritz, Iverson, and Woodward (2003) was used for these analyses, as this formula corrects for potential practice effects. A 95% confidence interval was used in determining reliable change, indicating that by chance approximately 5% of all scores were expected to show changes greater than the calculated RCI.
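The Sdiff values reported in Tables 3–6 are reproduced by the standard construction below, in which each session's standard error of measurement is derived from its SD and the test–retest r and the two are combined in quadrature; the practice adjustment subtracts the group mean change before scaling. This sketch follows that general form and should be checked against Moritz and colleagues (2003) for the exact variant used:

```python
import math

def rci_practice_adjusted(x1, x2, sd1, sd2, r12, m1, m2, z_crit=1.96):
    """Practice-adjusted reliable change index (general form).

    x1, x2: an individual's time 1 and time 2 scores.
    sd1, sd2: group SDs at each session; r12: test-retest correlation.
    m1, m2: group means at each session (their difference estimates practice).
    """
    sem1 = sd1 * math.sqrt(1 - r12)        # standard error of measurement, time 1
    sem2 = sd2 * math.sqrt(1 - r12)        # standard error of measurement, time 2
    s_diff = math.sqrt(sem1**2 + sem2**2)  # standard error of the difference
    rci = ((x2 - x1) - (m2 - m1)) / s_diff
    return rci, abs(rci) > z_crit          # flag change outside the 95% interval

# ANAM4 Mathematical Processing values from Table 3 (r = .70) give s_diff = 11.67,
# matching the table; a hypothetical examinee moving from 104 to 130 is flagged.
rci, flagged = rci_practice_adjusted(104, 130, 14.29, 15.81, 0.70, 104.26, 103.66)
print(f"RCI = {rci:.2f}, reliable change: {flagged}")   # RCI = 2.28, True
```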

Participant effort was also assessed using statistical algorithms built into each NCAT by its developer. For each NCAT, a score was generated indicating whether the participant's effort was considered adequate and thereby whether the test results were credible. In all cases, the NCATs have minimum scoring criteria (e.g., minimum accuracy or reaction/response time) that must be met across multiple subtests for performance to be considered effortful. Failure to meet those minimum criteria is deemed to represent poor effort, thereby invalidating the test results. Participants identified as having inadequate effort on either or both testing sessions were removed from analyses.
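The vendors' effort algorithms are proprietary, but the paragraph above describes their general shape: a set of per-subtest minimum-performance rules, all of which must be met. The sketch below illustrates that shape with invented subtest names and thresholds; it does not reproduce any vendor's actual criteria:

```python
# Hypothetical minimum-performance rules; names and thresholds are invented
# for illustration and do not correspond to any NCAT vendor's criteria.
MIN_CRITERIA = {
    "simple_reaction_time": {"min_accuracy": 0.80, "max_mean_rt_ms": 1000},
    "code_substitution":    {"min_accuracy": 0.70, "max_mean_rt_ms": 3000},
}

def adequate_effort(results):
    """results maps subtest name -> {"accuracy": float, "mean_rt_ms": float}.
    Every screened subtest must satisfy every minimum for the session
    to be considered effortful; otherwise the results are flagged."""
    for subtest, rule in MIN_CRITERIA.items():
        perf = results[subtest]
        if (perf["accuracy"] < rule["min_accuracy"]
                or perf["mean_rt_ms"] > rule["max_mean_rt_ms"]):
            return False    # fails a minimum: flag as inadequate effort
    return True

session = {"simple_reaction_time": {"accuracy": 0.95, "mean_rt_ms": 310},
           "code_substitution":    {"accuracy": 0.62, "mean_rt_ms": 1850}}
print(adequate_effort(session))   # False: code substitution accuracy below floor
```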

Results

Of the 419 participants enrolled in the study, 3 withdrew during initial testing before completion due to frustration with the tests (2) and lack of time to complete the testing session (1). These data are not included in analyses. Overall, 215 participants (56 CNS-Vital Signs, 52 ANAM4, 57 CogState, and 50 ImPACT) completed both time 1 and time 2 test sessions (51% retention rate). Recruiting in-processing soldiers proved challenging for retention, as they often became too busy to return or difficult to contact again after integrating into their permanent units. The mean duration between time 1 and time 2 was 32 days (SD = 4.5), median was 31 days, with a range of 21–42 days. The duration between time 1 and time 2 was not significantly different between the four NCATs. No participant sustained a concussion or any other significant health-related event during the study. Seventeen participants were flagged for poor effort on CNS-Vital Signs, two on ANAM4, four on CogState, and six on ImPACT. Data from these participants were not used in subsequent analyses.

The participants with complete data were predominantly men and White, and many had completed either some college coursework or at least a 4-year degree. Approximately half were enlisted Service Members and half were warrant or commissioned officers (Table 2). The group completing both test sessions included more women and more officers and was slightly more educated than the group completing only time 1 testing. Officers likely have more flexibility in their schedules, allowing them to return for the retest session more easily, and as a group officers include proportionally more women and are more educated (i.e., hold college degrees) than enlisted Service Members. The groups of Service Members completing each NCAT were not statistically different from each other.

Table 2.

Demographic characteristics for participants completing testing at time 1 but did not return for time 2 and for participants who completed testing at time 1 and time 2

Characteristic                     Time 1 only     Time 1 and Time 2
Age, mean (min–max)                29 (19–51)      34 (19–59)
Gender, N (%)a
  Men                              191 (95)        177 (82.7)
  Women                            10 (5)          37 (17.3)
Ethnicity, N (%)
  Caucasian                        125 (62.2)      127 (59.4)
  African American                 32 (15.9)       41 (19.2)
  Latino                           24 (11.9)       30 (14)
  Asian                            4 (2)           5 (2.3)
  Other                            16 (8)          11 (5.1)
Education, N (%)a
  GED/no high-school diploma       2 (1)           3 (1.4)
  High-school diploma              44 (21.9)       16 (7.5)
  Some college                     63 (31.3)       62 (29)
  Associate's degree               13 (6.5)        15 (7)
  Bachelor's degree or higher      79 (39.3)       118 (55.1)
Paygrade, N (%)a
  Enlisted                         131 (65.2)      102 (47.7)
  Warrant Officer                  6 (3)           17 (7.9)
  Commissioned Officer             64 (31.8)       95 (44.4)
Years active duty, mean (SD)       6.08 (6.01)     10.10 (7.29)
Past concussions, mean (range)     0.79 (0–10)     0.71 (0–6)

Notes: GED = General Educational Development.

aSignificant differences between participants completing both time 1 and time 2 testing and those only completing time 1 testing.

The means and standard deviations for each domain/composite score of interest at time 1 and time 2, from those with valid data at both sessions, are presented in Table 3 for ANAM4, Table 4 for CNS-Vital Signs, Table 5 for CogState, and Table 6 for ImPACT. The test–retest reliability ICC coefficients for each output score from time 1 to time 2 are also in these tables. Across all NCATs, 26 individual composite or domain scores were examined. Among participants putting forth adequate effort, no scores met the criterion for "very high" test–retest reliability (≥0.90), only one score (from ImPACT) met the criterion for "high" reliability (0.80–0.89), and 10 met the criterion for "adequate" reliability (0.70–0.79). The remaining three NCATs each had at least two scores with adequate reliability, and each NCAT also had at least one score that did not meet the adequate criterion. CogState had the highest proportion of scores meeting at least adequate reliability (80%); CNS-Vital Signs, ANAM4, and ImPACT each had less than one third of their scores meet at least adequate reliability. Of note, ICC coefficients were similar between officers and enlisted Service Members, with the exceptions of Code Substitution on the ANAM4 (officers had a higher ICC coefficient) and Verbal Memory on ImPACT (enlisted had a higher ICC coefficient).

Table 3.

Means (SD), the Pearson correlation coefficient (r), ICC coefficient and associated 95% confidence interval, and Sdiff score, used in RCI calculations, for ANAM4

ANAM4 variable (n = 50)            Time 1, mean (SD)   Time 2, mean (SD)   r     ICC    ICC 95% CI      Sdiffa
Simple Reaction Time               86.88 (11.46)       91.38 (12.76)       .65   .60    [0.39, 0.75]    10.11
Simple Reaction Time (Repeated)    93.80 (9.92)        92.96 (13.32)       .41   .40    [0.14, 0.61]    12.77
Procedural Reaction Time           97.62 (13.10)       103.56 (10.31)      .62   .51    [0.28, 0.69]    12.77
Mathematical Processing            104.26 (14.29)      103.66 (15.81)      .70   .70b   [0.53, 0.82]    11.67
Code Substitution Learning         104.14 (15.92)      103.96 (13.79)      .79   .79b   [0.66, 0.87]    9.56
Code Substitution Memory           97.76 (16.69)       105.98 (16.13)      .68   .59    [0.38, 0.74]    13.07
Matching to Sample                 98.60 (14.92)       101.52 (13.32)      .69   .67    [0.49, 0.80]    11.10

Notes: Only cases with valid data at both time 1 and time 2 test sessions are reported. ANAM = Automated Neuropsychological Assessment Metrics; CI = confidence interval; ICC = intraclass correlation; Sdiff = standard error of the difference.

aCalculated using the Moritz and colleagues (2003) methodology.

bMeets “adequate” or higher reliability criterion, ≥0.7.

Table 4.

Means (SD), the Pearson correlation coefficient (r), ICC coefficient and associated 95% confidence interval, and Sdiff score, used in RCI calculations, for CNS-Vital Signs

CNS-Vital Signs variable (n = 39)   Time 1, mean (SD)   Time 2, mean (SD)   r     ICC    ICC 95% CI      Sdiffa
Memory                              102.23 (15.87)      104.08 (15.11)      .53   .54    [0.28, 0.73]    14.97
Verbal Memory                       94.59 (20.69)       100.54 (14.06)      .34   .29    [0.02, 0.55]    20.37
Visual Memory                       108.87 (13.14)      106.46 (16.59)      .48   .47    [0.19, 0.68]    15.23
Psychomotor Speed                   103.36 (15.27)      109.21 (15.80)      .77   .72b   [0.53, 0.84]    10.43
Reaction Time                       93.87 (13.76)       97.54 (12.88)       .78   .75b   [0.58, 0.86]    8.79
Complex Attention                   97.28 (19.83)       98.62 (18.56)       .79   .79b   [0.64, 0.88]    12.52
Cognitive Flexibility               101.82 (15.42)      109.05 (14.51)      .71   .62    [0.39, 0.78]    11.35
Processing Speed                    103.44 (16.14)      109.85 (17.35)      .68   .63    [0.40, 0.78]    13.39
Executive Functioning               102.44 (15.28)      109.72 (14.46)      .73   .64    [0.42, 0.79]    10.96
Neurocognitive Index                99.74 (10.50)       103.77 (10.17)      .76   .70b   [0.50, 0.83]    7.11

Notes: Only cases with valid data at both time 1 and time 2 test sessions are reported. CI = confidence interval; ICC = intraclass correlation; Sdiff = standard error of the difference.

aCalculated using the Moritz and colleagues (2003) methodology.

bMeets “adequate” or higher reliability criterion, ≥0.7.

Table 5.

Means (SD), the Pearson correlation coefficient, ICC coefficient and associated 95% confidence interval, and Sdiff score, used in RCI calculations, for CogState

CogState variable (n = 53)     Time 1, mean (SD)   Time 2, mean (SD)   r     ICC    ICC 95% CI       Sdiffa
Detection Speed                −0.89 (1.48)        −0.92 (1.52)        .77   .78b   [0.65, 0.87]     0.64
Identification Speed           −0.51 (1.13)        −0.38 (1.12)        .78   .77b   [0.64, 0.86]     1.01
One Card Learning Accuracy     0.39 (0.89)         0.57 (0.62)         .25   .22    [−0.05, 0.46]    0.94
One Back Speed                 −0.83 (1.14)        −0.58 (1.11)        .76   .74b   [0.59, 0.84]     0.78
Composite                      −0.54 (1.02)        −0.39 (0.98)        .80   .79b   [0.66, 0.87]     0.64

Notes: Only cases with valid data at both time 1 and time 2 test sessions are reported. CI = confidence interval; ICC = intraclass correlation; Sdiff = standard error of the difference.

aCalculated using the Moritz and colleagues (2003) methodology.

bMeets “adequate” or higher reliability criterion, ≥0.7.

Table 6.

Means (SD), the Pearson correlation coefficient, ICC coefficient and associated 95% confidence interval, and Sdiff score, used in RCI calculations, for ImPACT

ImPACT variable (n = 44)   Time 1, mean (SD)   Time 2, mean (SD)   r     ICC    ICC 95% CI      Sdiffa
Verbal Memory              87.89 (15.40)       90.36 (13.20)       .61   .60    [0.38, 0.76]    12.66
Visual Memory              72.02 (15.70)       73.59 (16.35)       .49   .50    [0.25, 0.69]    16.17
Reaction Time              0.60 (0.08)         0.61 (0.09)         .53   .53    [0.28, 0.71]    0.08
Visual Motor Speed         27.65 (3.75)        28.50 (3.74)        .86   .83b   [0.71, 0.90]    2.01

Notes: Only cases with valid data at both time 1 and time 2 test sessions are reported. CI = confidence interval; ICC = intraclass correlation; Sdiff = standard error of the difference.

aCalculated using the Moritz and colleagues (2003) methodology.

bMeets “adequate” or higher reliability criterion, ≥0.7.

Of note, of the 11 scores in the adequate or higher range, 8 were from subtests/domains involving response speed (e.g., reaction time, processing speed, or some other speeded aspect). Only six scores involving response speed were not in the adequate or higher range, five of them from ANAM4 (i.e., Throughput scores) and the other from CNS-Vital Signs (Processing Speed).

Pearson's r and the standard error of the difference (Sdiff) score used to calculate the RCI are also reported in Tables 3–6. The Pearson coefficients were very similar to the ICC coefficients. The RCI generally indicated that the expected number of individuals displayed "reliable change" across all subtest/index scores. Specifically, all subtests had between 0 and 5 participants displaying significant change, with most subtests/indices at 5% or fewer participants (i.e., expected chance levels) and none at more than 10%.

Discussion

This study was designed to allow for direct comparisons of the test–retest reliability of four computerized NCATs (ANAM4, CNS-Vital Signs, CogState, and ImPACT) by using the same military population, the same testing environment, and the same clinically relevant test–retest interval. Although the study sample was not truly representative of the demographic make-up of all active duty Service Members, the groups were similar across NCATs. Inadequate effort was also controlled for in the data by computerized algorithms recommended by each NCAT company. The data from the current study reveal a wide range of test–retest reliabilities within and across NCATs, with coefficients ranging from the low range (0.22) to the high range (0.83). This is generally consistent with the existing literature on these tests (Broglio et al., 2007; Cernich et al., 2006; Collie et al., 2003; Elbin et al., 2011; Gualtieri & Johnson, 2006; Iverson et al., 2003; Schatz, 2010). Although the reliabilities varied within and across the NCATs, four of the five scores from CogState had adequate reliability, and less than one third of the scores from CNS-Vital Signs, ANAM4, and ImPACT had adequate or better reliability.

Across all four NCATs, tests of response speed appeared to have higher reliability coefficients than tests of accuracy. There are indications that reaction time data may be the most promising score derived from computer testing (Coldren, Russell, Parish, Dretsch, & Kelly, 2012; Eckner, Kutcher, Broglio, & Richardson, 2013). An advantage of NCATs over traditional tests is the ability to accurately and precisely measure response speed, and this may be reflected in the psychometric properties of timed tasks. However, not all NCATs measure response speed or incorporate response speed in composite scores in the same manner, again contributing to the difficulty in direct comparisons.

Overall, the test–retest reliabilities reported in the literature and in the current study suggest that computerized NCATs are less reliable than is desirable for clinical use. If treatment algorithms required Service Members to return to baseline levels of performance, there would be increased risk of false positives and false negatives: some individuals could be returned to duty prior to full recovery, and conversely, some fully recovered individuals might be unnecessarily held out of duty. Clinicians using these tests should carefully consider the impact of lower-than-desired test–retest reliability. As Broglio and colleagues (2007) recommended, a battery of different neurocognitive tests should be used to offset the weak psychometric properties any one test may have. In addition, clinicians should consider incorporating assessments from other functional domains, such as visual tracking and vestibular functioning/balance testing (Anderson, Heitger, & Macleod, 2006; Galetta et al., 2011; Heitger et al., 2009; McCrea, 2008; Riemann & Guskiewicz, 2000), as well as other symptoms, such as headache and dizziness.

It may be that computerized screening of cognitive domains should not be held to the same psychometric standard as traditional neuropsychological testing. Computerized NCATs can precisely and accurately measure constructs such as reaction time to hundredths of a second and, therefore, may be more sensitive to changes in performance due to state variables (e.g., fatigue, pain, testing environment) than traditional tests are. Though tests involving reaction time had some of the highest reliabilities, they were still only within the adequate to high ranges at best. Given the high level of measurement precision for some constructs, it may be unreasonable to expect reliability coefficients above 0.90 on NCATs. These tests may also be more sensitive to practice effects, resulting in improvements in scores from administration to administration and thus lower test–retest reliability; multiple administrations may be required before test–retest reliabilities level off and improve to clinically acceptable levels (Eonta et al., 2011). Future studies in military populations should account for prior administrations of tests such as ANAM4 or ImPACT that were conducted for military or medical purposes; the lack of this information in the current study is a potential limitation.

Despite lower than desired test–retest reliabilities, NCATs may continue to be clinically meaningful. Recent research suggests that comparisons to clinical norms are as meaningful as, if not more so than, baseline comparisons (Echemendia et al., 2012; Randolph, 2011; Schmidt, Register-Mihalik, Mihalik, Kerr, & Guskiewicz, 2012). Using norms rather than baseline assessments could support better clinical decisions in light of the current test–retest reliabilities of NCATs. Also, Schatz and colleagues (2012) have recommended alternative ways of assessing a test's stability over time, including the RCI reported in this study. RCI analyses indicated generally acceptable rates of reliable change between time 1 and time 2 for all four NCATs. However, no matter how much intraindividual variability exists within a set of repeated measures, the RCI is designed to identify only those individuals whose change in scores between time 1 and time 2 is so large that it is highly unlikely to be due to measurement error, thereby indicating with a high degree of probability that a clinically meaningful amount of change has occurred (Schatz, 2010). This renders the RCI problematic as a measure of test–retest reliability (Mayers & Redick, 2012) and likely accounts for the often contradictory findings between the ICCs and RCIs observed in this study. Therefore, alternative measures such as the RCI may be more useful as clinical tools, when established norms exist, than as measures of reliability.

Future studies should seek to establish test–retest reliabilities in a sample that is truly representative of active duty Service Members by including proportionally more enlisted Service Members and expanding to other branches of the military. Also, a variety of clinically meaningful test–retest intervals should continue to be evaluated. This would include longer intervals (e.g., months to 1–2 years) to simulate predeployment baseline assessments and subsequent injuries while deployed, as well as shorter intervals (e.g., days to weeks) to simulate post-injury assessments. Alternative methods for assessing stability, such as the RCI, should continue to be investigated for their clinical utility in NCATs.

This study also suggests that a deeper investigation of the construction of individual subtests is needed. As discussed above, subtests that are heavily influenced by reaction time and processing speed, and specifically scores that combine reaction time and accuracy, may be more psychometrically sound and clinically meaningful. Aggregate scores, calculated by combining scores across multiple domains or indices, may also prove more stable across time, as they factor out within-test fluctuations that can heavily influence scores on individual subtests and composites. In addition, the test setting should be considered, especially as there is emerging evidence that testing in large groups may negatively affect baseline scores and thus test–retest reliability, particularly because follow-up (i.e., post-injury) testing is often conducted individually (Moser et al., 2011). Finally, some subtests may be improved with changes to the presentation format. For example, test proctors noted several subtests in which participants asked about the task instructions, with such queries often coming midway through the task. One subtest on CNS-Vital Signs in particular seemed to cause the most confusion and may largely account for the larger number of participants flagged for inadequate effort on that battery. Small changes, such as practice blocks with corrective feedback, could eliminate such problems without significant additional burden to test takers or proctors while improving the psychometrics of the test. In all cases, however, consideration should be given to what is gained in test–retest stability versus what may be lost in clinical meaningfulness.

Finally, reliability is only one piece of the psychometric puzzle. A highly reliable test is clinically useless if it is not also valid. In addition to establishing adequate test–retest reliability, or methods of assessing stability that account for testing error, NCATs need to be evaluated for the degree to which they measure the cognitive domains they claim to measure and adequately identify individuals at risk for mTBI-related problems. Thus, validity studies comparing NCATs with traditional neuropsychological assessments in healthy controls and acutely injured participants are needed.

Funding

This research was supported by the Defense and Veterans Brain Injury Center and conducted with the oversight and support of the Henry M. Jackson Foundation and Womack Army Medical Center.

Conflict of Interest

The views expressed herein are those of the authors and do not reflect the official policy of the Department of the Army, Department of Defense, or the U.S. Government.

Acknowledgements

Thank you to Mary Alice Dale, Katie Toll, Wes McGee, A3ITS, and Westat for assistance with data collection and analyses.

References

Alexander, M. P. (1995). Mild traumatic brain injury: Pathophysiology, natural history, and clinical management. Neurology, 45, 1253–1260.

Anderson, T., Heitger, M., & Macleod, A. D. (2006). Concussion and mild head injury. Practical Neurology, 6, 342–357.

Bauer, R. M., Iverson, G. L., Cernich, A. N., Binder, L. M., Ruff, R. M., & Naugle, R. I. (2012). Computerized neuropsychological assessment devices: Joint position paper of the American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology. Archives of Clinical Neuropsychology, 27(3), 362–373.

Belanger, H. G., Curtiss, G., Demery, J. A., Lebowitz, B. K., & Vanderploeg, R. D. (2005). Factors moderating neuropsychological outcomes following mild traumatic brain injury: A meta-analysis. Journal of the International Neuropsychological Society, 11, 215–227.

Bigler, E. D. (2008). Neuropsychology and clinical neuroscience of persistent post-concussive syndrome. Journal of the International Neuropsychological Society, 14(1), 1–22.

Broglio, S. P., Ferrara, M. S., Macciocchi, S. N., Baumgartner, T. A., & Elliott, R. (2007). Test-retest reliability of computerized concussion assessment programs. Journal of Athletic Training, 42(4), 509–514.

Carroll, L. J., Cassidy, J. D., Peloso, P. M., Borg, J., von Holst, H., Holm, L., et al. (2004). Prognosis for mild traumatic brain injury: Results of the WHO Collaborating Centre Task Force on Mild Traumatic Brain Injury. Journal of Rehabilitation Medicine, 43, 84–105.

Cernich, A., Reeves, D., Sun, W., & Bleiberg, J. (2006). Automated Neuropsychological Assessment Metrics sports medicine battery. Archives of Clinical Neuropsychology, 22S, S101–S114.

Coldren, R. L., Russell, M. L., Parish, R. V., Dretsch, M., & Kelly, M. P. (2012). The ANAM lacks utility as a diagnostic or screening tool for concussion more than 10 days following injury. Military Medicine, 177, 179–183.

Collie, A., Darby, D., & Maruff, P. (2001). Computerized cognitive assessment of athletes with sports related head injury. British Journal of Sports Medicine, 35, 297–302.

Collie, A., Maruff, P., Makdissi, M., McCrory, P., McStephen, M., & Darby, D. (2003). CogSport: Reliability and correlation with conventional cognitive tests used in postconcussion medical evaluations. Clinical Journal of Sport Medicine, 13, 28–32.

Department of Defense. (2012, September). DoD policy guidance for management of mild traumatic brain injury/concussion in the deployed setting (DoD Instruction 6490.11). Retrieved from www.dtic.mil/whs/directives/corres/pdf/649011p.pdf

Echemendia, R. J., Bruce, J. M., Bailey, C. M., Sanders, J. F., Arnett, P., & Vargas, G. (2012). The utility of post-concussion neuropsychological data in identifying cognitive change following sports-related MTBI in the absence of baseline data. The Clinical Neuropsychologist, 26(7), 1077–1091.

Eckner, J. T., Kutcher, J. S., Broglio, S. P., & Richardson, J. K. (2013). Effect of sport-related concussion on clinically measured simple reaction time. British Journal of Sports Medicine. [Epub ahead of print].

Elbin, R. J., Schatz, P., & Covassin, T. (2011). One-year test-retest reliability of the online version of ImPACT in high school athletes. American Journal of Sports Medicine, 39(11), 2319–2324.

Eonta, S. E., Carr, W., McArdle, J. J., Kain, J. M., Tate, C., Wesensten, N. J., et al. (2011). Automated neuropsychological assessment metrics: Repeated assessment with two military samples. Aviation, Space, and Environmental Medicine, 82(1), 34–39.

Galetta, K. M., Barrett, J., Allen, M., Madda, F., Delicata, D., Tennant, A. T., et al. (2011). The King-Devick test as a determinant of head trauma and concussion in boxers and MMA fighters. Neurology, 76(1), 1456–1462.

Gualtieri, C. T., & Johnson, L. G. (2005). Neurocognitive testing supports a broader concept of mild cognitive impairment. American Journal of Alzheimer's Disease and Other Dementias, 20(6), 359–366.

Gualtieri, C. T., & Johnson, L. G. (2006). Reliability and validity of a computerized neurocognitive test battery, CNS Vital Signs. Archives of Clinical Neuropsychology, 21, 623–643.

Heitger, M. H., Jones, R. D., Macleod, A. D., Snell, D. L., Frampton, C. M., & Anderson, T. J. (2009). Impaired eye movements in post-concussion syndrome indicate suboptimal brain function beyond the influence of depression, malingering, or intellectual ability. Brain, 132, 2850–2870.

Iverson, G. L., Brooks, B. L., Collins, M. W., & Lovell, M. R. (2006). Tracking neuropsychological recovery following concussion in sport. Brain Injury, 20, 245–252.

Iverson, G. L., Lovell, M. R., & Collins, M. W. (2003). Interpreting change on ImPACT following sports concussion. The Clinical Neuropsychologist, 17(4), 460–467.

Lange, R. T., Iverson, G. L., Brooks, B. L., & Rennison, V. L. A. (2010). Influence of poor effort on self-reported symptoms and neurocognitive test performance following mild traumatic brain injury. Journal of Clinical and Experimental Neuropsychology, 32(9), 961–972.

Leposavic, I., Leposavic, L., & Saula-Marojevic, B. (2010). Neuropsychological assessment: Computerized batteries of standard tests. Psychiatria Danubina, 22(2), 149–152.

Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment (5th ed.). New York: Oxford University Press.

Lovell, M. R., & Collins, M. W. (2002). New developments in the evaluation of sports related concussion. Current Sports Medicine Reports, 1, 287–292.

Marx, R. G., Menezes, A., Horovitza, L., Jones, E. C., & Warren, R. F. (2003). A comparison of two time intervals for test-retest reliability of health status instruments. Journal of Clinical Epidemiology, 56, 730–735.

Mayers, L. B., & Redick, T. S. (2012). Authors' reply to "Response to Mayers and Redick: 'Clinical utility of ImPACT assessment for postconcussion return-to-play counseling: Psychometric issues.'" Journal of Clinical and Experimental Neuropsychology, 34(4), 435–442.

McCrea, M. A. (2008). Mild traumatic brain injury and postconcussion syndrome: The new evidence base for diagnosis and treatment. New York: Oxford University Press.

McCrory, P., Meeuwisse, W., Johnston, K., Dvorak, J., Aubry, M., Molloy, M., et al. (2009). Consensus statement on concussion in sport: The 3rd International Conference on Concussion in Sport held in Zurich, November 2008. Journal of Athletic Training, 44(4), 434–448.

Moritz, S., Iverson, G. L., & Woodward, T. S. (2003). Reliable change indexes for memory performance in schizophrenia as a means to determine drug-induced cognitive decline. Applied Neuropsychology, 10(2), 115–120.

Moser, R. S., Schatz, P., Neidzwski, K., & Ott, S. D. (2011). Group versus individual administration affects baseline neurocognitive test performance. American Journal of Sports Medicine, 39(11), 2325–2330.

Ponsford, J., Willmott, C., Rothwell, A., Cameron, P., Kelly, A., Nelms, R., et al. (2000). Factors influencing outcome following mild traumatic brain injury in adults. Journal of the International Neuropsychological Society, 6(5), 568–579.

Randolph, C. (2011). Baseline neuropsychological testing in managing sport-related concussion: Does it modify risk? Current Sports Medicine Reports, 10(1), 21–26.

Riemann, B. L., & Guskiewicz, K. M. (2000). Effects of mild head injury on postural stability as measured through clinical balance testing. Journal of Athletic Training, 35(1), 19–25.

Schatz, P. (2010). Long-term test-retest reliability of baseline cognitive assessments using ImPACT. American Journal of Sports Medicine, 38(1), 47–53.

Schatz, P., & Browndyke, J. (2002). Applications of computer-based neuropsychological assessment. Journal of Head Trauma Rehabilitation, 17(5), 395–410.

Schatz, P., Kontos, A., & Elbin, R. J. (2012). Response to Mayers and Redick: "Clinical utility of ImPACT assessment for postconcussion return-to-play counseling: Psychometric issues." Journal of Clinical and Experimental Neuropsychology, 34(4), 428–434.

Schatz, P., & Zillmer, E. A. (2003). Computer-based assessment of sports-related concussion. Applied Neuropsychology, 10, 42–47.

Schmidt, J. D., Register-Mihalik, J. K., Mihalik, J. P., Kerr, Z. Y., & Guskiewicz, K. M. (2012). Identifying impairments after concussion: Normative data versus individualized baselines. Medicine and Science in Sports and Exercise, 44(9), 1621–1628.

Terrio, H., Brenner, L. A., Ivins, B. J., Cho, J. M., Helmick, K., Schwab, K., et al. (2009). Traumatic brain injury screening: Preliminary findings in a US Army Brigade Combat Team. Journal of Head Trauma Rehabilitation, 24(1), 14–23.

US Army Traumatic Brain Injury Task Force. (2007). Report to the Surgeon General.

United States House of Representatives. (2008). H.R. 4986, National Defense Authorization Act for Fiscal Year 2008, Sec. 1618: Comprehensive plan on prevention, diagnosis, mitigation, treatment, rehabilitation of, and research on, traumatic brain injury, post traumatic stress disorder, and other mental health conditions in members of the Armed Forces.