Abstract

There is currently no standard criterion for determining abnormal test scores in neuropsychology; thus, a number of different criteria are commonly used. We investigated base rates of abnormal scores in healthy older adults using raw and T-scores from indices of the Wisconsin Card Sorting Test and Stroop Color–Word Test. Abnormal scores were examined cumulatively at seven cutoffs: >1.0, >1.5, >2.0, >2.5, and >3.0 standard deviations (SD) from the mean, as well as below the 10th and 5th percentiles. The number of abnormal scores at each of the seven cutoffs was also examined. For raw scores, ∼15% of individuals obtained scores >1.0 SD from the mean, around 10% fell below the 10th percentile, and 5% fell >1.5 SD from the mean or below the 5th percentile. Using T-scores, approximately 15%–20% and 5%–10% of scores were >1.0 and >1.5 SD from the mean, respectively. Roughly 15% and 5% fell below the 10th and 5th percentiles, respectively. Both raw and T-scores >2.0 SD from the mean were infrequent. Although the presence of a single abnormal score at 1.0 and 1.5 SD from the mean or at the 10th and 5th percentiles was not unusual, the presence of ≥2 abnormal scores using any criterion was uncommon. Consideration of base rate data regarding the percentage of healthy individuals scoring in the abnormal range should help avoid classifying normal variability as neuropsychological impairment.

Introduction

Intra-individual variability in neuropsychological test performance is common among children (Brooks, Iverson, Sherman, & Holdnack, 2009), adults, and older adults (Axelrod & Wall, 2007; Schretlen, Testa, Winicki, Pearlson, & Gordon, 2008). Although this research is not altogether new (Dodrill, 1997), the “diagnostic relevance” of scores that either fall within an “abnormal” range or are greatly discrepant from other test scores is an area of current investigation. Schretlen, Munro, Anthony, and Pearlson (2003) found discrepancies of at least 3.0 SD between the highest and lowest test scores in over two thirds of a sample of healthy older adults. In addition, evidence has accumulated that a few abnormal range test scores within a neuropsychological battery are neither uncommon nor necessarily indicative of impairment (Axelrod & Wall, 2007; Binder, Iverson, & Brooks, 2009; Brooks, Iverson, & White, 2007; Heaton, Miller, Taylor, & Grant, 2004; Palmer, Boone, Lesser, & Wohl, 1998; Reitan & Wolfson, 1993; Schretlen et al., 2008). In fact, the normality of some abnormal test scores in a neuropsychological test battery has been recognized for decades. Developed in the 1950s, the original Halstead Impairment Index from the Halstead–Reitan Neuropsychological Battery required 50% of the 10 subtests comprising the index to fall within the abnormal range before a profile was designated as indicative of neuropsychological impairment (Reitan & Wolfson, 1993). The frequency of abnormal test scores in healthy individuals has been examined in more contemporary batteries as well.

For example, using the Expanded Halstead–Reitan Neuropsychological Battery for Adults, Heaton and colleagues (2004) found that 87% of neurologically normal adults had at least one score that fell 1.0 SD below the mean, 34% had five or more scores that fell at least 1.5 SD below the mean, and 28% had one score at least 2.0 SD below the mean. In addition, Brooks and colleagues (2007) examined memory scores from the Neuropsychological Assessment Battery in a large sample of older, healthy adults and found that 55.5%, 30.8%, and 16.4% had at least one score 1.0, 1.5, and 2.0 SD below the mean, respectively. Similarly, Palmer and colleagues (1998) found that 48% of healthy, older adults performed at or below 1.3 SD from the mean on two or more measures of a flexible battery of neuropsychological tests that yielded 28 test scores and ∼20% scored ≥2.0 SD below the mean on two or more measures. Furthermore, de Rotrou and colleagues (2005) discovered that when analyzed longitudinally, 48% of healthy older adults who scored ≥1.5 SD below the mean on one measure at first testing scored in the normal range when retested using the same battery. Lastly, the frequency of abnormal scores in a neuropsychological profile may also be a function of the number of tests included in the assessment. For example, Schretlen and colleagues (2008) found that 35.7% of healthy participants scored ≥1.0 SD below the mean on at least 2 of 10 tests, but 75.2% performed ≥1.0 SD below the mean on at least 2 of 25 measures.
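The dependence on battery length can be illustrated with a simple binomial calculation. The sketch below is not taken from any of the cited studies: it assumes that test scores are statistically independent and normally distributed, so that each score has about a 15.87% chance of falling ≥1.0 SD below the mean. The function name `prob_at_least` and the constant `P_ONE_SD` are illustrative only.

```python
from math import comb

# Assumed chance that a single normally distributed score falls
# at least 1.0 SD below the mean (an assumption, not a study value).
P_ONE_SD = 0.1587


def prob_at_least(k: int, n_tests: int, p: float) -> float:
    """Probability that at least k of n_tests scores are abnormal.

    Assumes every test is independent with the same probability p of
    an abnormal score, which real test batteries do not satisfy.
    """
    p_fewer = sum(
        comb(n_tests, j) * p**j * (1 - p) ** (n_tests - j) for j in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    for n in (10, 25):
        share = prob_at_least(2, n, P_ONE_SD)
        print(f"{n} tests: P(at least 2 scores >= 1.0 SD below mean) = {share:.3f}")
```

Under these assumptions the probability of two or more low scores is roughly one half for 10 tests and above 90% for 25 tests. The observed figures reported by Schretlen and colleagues (35.7% and 75.2%) are lower, which would be expected if scores within a battery are positively correlated, but the direction of the effect is the same: more tests yield more low scores by chance.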

At present, there is no single standard criterion for designating a test score as abnormal. As illustrated in the literature cited above, researchers and clinicians have traditionally defined abnormal test scores using a certain number of standard deviations from the mean to reflect the presence and magnitude of impaired functioning. Typically, cut-offs for defining abnormal scores have been set at 1.0, 1.5, or 2.0 SD from the mean (Axelrod & Wall, 2007; Heaton et al., 2004; Ingraham & Aiken, 1996; Lezak, Howieson, & Loring, 2004). Others have defined abnormal scores as falling below either the 10th (>1.28 SD) or the 5th percentile (>1.65 SD; Brooks et al., 2007). The Diagnostic and Statistical Manual-5 Work Group has proposed impairment cut-offs of 1.0–2.0 SD below the mean for a diagnosis of Mild Neurocognitive Disorder and more than 2.0 SD below the mean for Major Neurocognitive Disorder (American Psychiatric Association, 2010).
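The correspondence between percentile and SD cut-offs follows from the standard normal distribution. The sketch below assumes normally distributed scores and uses Python's `statistics.NormalDist`; the helper names `z_for_percentile` and `expected_base_rate` are illustrative and do not come from the studies cited above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def z_for_percentile(percentile: float) -> float:
    """z-score below which the given proportion of a normal distribution falls."""
    return STD_NORMAL.inv_cdf(percentile)


def expected_base_rate(k_sd: float) -> float:
    """Proportion of a normal distribution lying more than k_sd SD below the mean."""
    return STD_NORMAL.cdf(-k_sd)


if __name__ == "__main__":
    for pct in (0.10, 0.05):
        print(f"{pct:.0%} percentile: z = {z_for_percentile(pct):.3f}")
    for k in (1.0, 1.5, 2.0, 2.5, 3.0):
        print(f">{k} SD below the mean: {expected_base_rate(k):.1%} expected")
```

The 10th and 5th percentiles correspond to z-scores of about -1.28 and -1.645 (conventionally rounded to 1.65). Under normality, about 16% of healthy individuals would be expected to score more than 1.0 SD below the mean on any single measure, which gives a theoretical reference point for empirical base rates.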

Given the lack of consensus on a psychometric definition of abnormality, and thus the use of a wide range of different cut-offs, it is not altogether unexpected that some test scores from healthy, community-dwelling adults and older adults fall within an “abnormal” range. Binder and colleagues (2009) encourage the investigation of low scores at multiple levels of abnormality in order to better discriminate between normal variability and truly impaired test performance. The purpose of this study was to examine the base rates of abnormal test scores on two commonly used measures of executive functioning, the Wisconsin Card Sorting Test (WCST) and the Stroop Color–Word Test (SCWT), in a group of healthy older adults in order to further examine the frequency of such scores.

Method

Participants

Following IRB approval, an analysis was undertaken on data from 241 community-dwelling older adults who were part of a large-scale epidemiological investigation. Participants underwent a detailed clinical interview as part of the epidemiological study, and only those individuals without any known medical/neurological conditions or psychiatric histories were included in the study. Complete data were obtained from 211 participants (105 men and 106 women). Participants had a mean age of 64.2 years (SD = 8.9). North American Adult Reading Test (Blair & Spreen, 1989) scores suggested a sample estimated mean Full-Scale Intelligence Quotient of 102.6 (SD = 8.2). All 211 participants performed at or above the recommended cut-off scores on the Test of Memory Malingering (Tombaugh, 1996).

Materials and Procedure

Five indices from the WCST (Berg, 1948; Grant & Berg, 1948) were examined: Total Errors, Perseverative Responses, Perseverative Errors, Nonperseverative Errors, and Percent Conceptual Level Responses. All four indices of the SCWT (Trennerry et al., 1989) were examined: the Word score, Color score, Color-Word score, and Interference score. The Word and Color scores are not measures of executive functioning, but were included in the analyses because these scores are necessarily obtained when administering the SCWT. Raw scores and T-scores were examined for both tests. The WCST T-scores, which are age and education corrected, were obtained from the WCST Manual-Revised and Expanded (Heaton, Chelune, Talley, Kay, & Curtiss, 1993). T-scores for the SCWT were obtained from the Stroop Neuropsychological Screening Test Manual using age-corrected norms (Trennerry et al., 1989). We examined seven cut-off scores for abnormality: >1.0, >1.5, >2.0, >2.5, and >3.0 SD from the mean as well as scores below the 10th (>1.28 SD) and 5th (>1.65 SD) percentiles. For the raw score data, the sample means for each index score were first computed, and the seven cut-off scores were then applied to determine the frequency of abnormal test scores.
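The raw-score procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code used in the study: it assumes the sample SD (n − 1 denominator), assumes that "from the mean" refers to the direction of poorer performance (higher for error counts, lower for scores such as Stroop Color-Word), and uses made-up scores. The names `base_rates`, `t_score_threshold`, and `hypothetical_errors` are hypothetical.

```python
from statistics import mean, stdev

# The seven cut-offs, in SD units (1.28 and 1.65 stand in for the
# 10th and 5th percentiles).
CUTOFFS_SD = (1.0, 1.28, 1.5, 1.65, 2.0, 2.5, 3.0)


def base_rates(scores, higher_is_worse, cutoffs=CUTOFFS_SD):
    """Percentage of scores more than k SD from the sample mean,
    counted only in the direction of poorer performance."""
    m, sd = mean(scores), stdev(scores)  # sample SD is an assumption
    rates = {}
    for k in cutoffs:
        if higher_is_worse:
            n_abnormal = sum(s > m + k * sd for s in scores)
        else:
            n_abnormal = sum(s < m - k * sd for s in scores)
        rates[k] = 100.0 * n_abnormal / len(scores)
    return rates


def t_score_threshold(k_sd):
    """T-score lying k_sd SD below the normative mean (T: mean 50, SD 10).
    Assumes lower T-scores indicate poorer performance."""
    return 50.0 - 10.0 * k_sd


if __name__ == "__main__":
    # Made-up error counts for ten hypothetical participants.
    hypothetical_errors = [10, 12, 15, 18, 20, 22, 25, 30, 35, 80]
    for k, pct in base_rates(hypothetical_errors, higher_is_worse=True).items():
        print(f">{k} SD: {pct:.1f}% abnormal (T-score equivalent < {t_score_threshold(k):.1f})")
```

For T-scores no sample statistics are needed, because the cut-offs translate directly into fixed thresholds: a score more than 1.0 SD below the normative mean is a T-score below 40, and more than 1.5 SD below is a T-score below 35.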

Results

The mean raw scores for each of the indices from the WCST and the SCWT are presented in Table 1, together with the base rates of abnormal raw scores and T-scores for each index at the seven cut-off scores. When utilizing raw scores, the frequency of one abnormal score on any index from the WCST or SCWT at the >1.0 SD cut-off was always above 10%. When examining T-scores, the frequency of scores more than 1.0 SD from the mean was above or close to 10% and often approached or exceeded 20%. Raw data at the >1.5 SD cut-off were somewhat more variable, and all but one index had frequencies of abnormal scores >5%. Abnormal scores at this cut-off using normative data were also more variable and occurred at a frequency of close to or exceeding 5% for most indices of the WCST and the SCWT. At >2.0, >2.5, and >3.0 SD from the mean, the frequency of abnormal scores never exceeded 3.4% using raw scores from the WCST and the SCWT. Utilizing T-scores, a similar number of participants performed more than 2.0, 2.5, or 3.0 SD from the mean, with only one index exceeding a frequency of 3.3%. Similar to those >1.0 SD from the mean, raw scores falling below the 10th percentile approached or exceeded 10%, while the frequency of raw scores falling below the 5th percentile was similar to the >1.5 SD cut-off and either approached or exceeded 5%. T-score data for the WCST and the SCWT at the 10th and 5th percentiles also were comparable with the raw data frequencies.

Table 1.

Base rates (%) of abnormal raw scores and T-scores on the WCST and the SCWT

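n = 211. "Mean" and "SD" are raw-score values. Columns prefixed "Raw" give base rates (%) of abnormal raw scores; columns prefixed "T" give base rates (%) of abnormal T-scores.

| Test | Index | Mean | SD | Raw >1.0 SD | Raw >1.28 SD | Raw >1.5 SD | Raw >1.65 SD | Raw >2.0 SD | Raw >2.5 SD | Raw >3.0 SD | T >1.0 SD | T >1.28 SD | T >1.5 SD | T >1.65 SD | T >2.0 SD | T >2.5 SD | T >3.0 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WCST | Total Errors | 39.27 | 21.80 | 19.3 | 10.8 | 8.9 | 6.1 | 2.8 | 0.9 | 0.9 | 21.2 | 16.5 | 9.9 | 8.5 | 2.3 | 1.4 | 0.0 |
| WCST | Perseverative Responses | 22.62 | 17.61 | 10.5 | 6.2 | 3.4 | 3.4 | 2.9 | 2.4 | 2.4 | 10.3 | 7.9 | 5.1 | 4.2 | 1.4 | 1.4 | 0.0 |
| WCST | Perseverative Errors | 20.00 | 13.89 | 14.8 | 8.6 | 5.3 | 3.4 | 2.9 | 2.4 | 1.9 | 9.9 | 8.5 | 5.2 | 4.7 | 1.4 | 1.4 | 0.0 |
| WCST | Nonperseverative Errors | 19.27 | 11.25 | 17.6 | 10.0 | 8.6 | 6.7 | 3.4 | 1.0 | 0.5 | 36.4 | 29.8 | 19.4 | 18.5 | 4.8 | 0.5 | 0.0 |
| WCST | Conceptual Level | 57.94 | 20.92 | 19.9 | 13.7 | 11.3 | 7.5 | 2.8 | 0.9 | 0.0 | 16.9 | 14.1 | 9.4 | 8.5 | 3.3 | 1.4 | 0.0 |
| SCWT | Word | 93.87 | 16.08 | 15.1 | 9.4 | 6.6 | 5.2 | 1.4 | 0.5 | 0.5 | 10.4 | 7.6 | 3.3 | 2.4 | 0.5 | 0.5 | 0.5 |
| SCWT | Color | 66.55 | 13.50 | 16.5 | 9.9 | 7.1 | 5.7 | 2.4 | 0.5 | 0.5 | 23.2 | 16.1 | 6.6 | 6.6 | 1.9 | 0.5 | 0.0 |
| SCWT | Color–Word | 33.18 | 9.34 | 14.2 | 11.8 | 8.0 | 5.2 | 2.4 | 0.0 | 0.0 | 18.0 | 12.8 | 5.7 | 3.8 | 0.5 | 0.0 | 0.0 |
| SCWT | Interference | −0.37 | 7.23 | 14.7 | 10.4 | 5.7 | 4.8 | 2.4 | 1.0 | 0.5 | 7.2 | 4.8 | 2.4 | 1.9 | 0.5 | 0.5 | 0.0 |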

Note: 1.28 and 1.65 SD are equivalent to the 10th and 5th percentiles, respectively.

The cumulative frequency of participants who had at least 1, 2, 3, 4, or 5 abnormal scores at each of the seven cut-offs was also examined (Table 2). When utilizing raw data, at 1.0 SD from the mean, 16.6% of participants scored abnormally on at least one SCWT index but only 2.4% scored abnormally on two indices of the SCWT. Similarly, although 19.0% of the participants scored within the abnormal range on the WCST using the 1.0 SD criterion, only 8.1% scored abnormally on two or more indices of the WCST. When utilizing normative data, similar results were found for the SCWT and the WCST with 15.2% and 17.1% of participants performing in the abnormal range (i.e., 1.0 SD) on at least one index, respectively, whereas only 1.0% of participants scored abnormally on at least two indices of the WCST and only 2.4% of participants scored abnormally on at least two indices of the SCWT at this cut-off. In general, it was rare for a participant to score abnormally on two or more indices of the SCWT or WCST when using either normative or raw data at even the most liberal of the seven cut-offs.

Table 2.

The cumulative frequency of participants with at least 1, 2, 3, 4, or 5 abnormal scores at each of the seven cut-offs

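Cells give the number (and percentage) of participants with at least the stated number of abnormal scores; n = 211 for both tests. The SCWT has four indices, so the ≥5 columns are empty for that test.

| Test | Cut-off | Raw ≥1 | Raw ≥2 | Raw ≥3 | Raw ≥4 | Raw ≥5 | Normative ≥1 | Normative ≥2 | Normative ≥3 | Normative ≥4 | Normative ≥5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WCST | 1.0 SD | 40 (19.0%) | 17 (8.1%) | 8 (3.8%) | 3 (1.4%) | 1 (0.5%) | 36 (17.1%) | 2 (1.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| WCST | 1.28 SD | 19 (9.0%) | 4 (1.9%) | 2 (1.0%) | 0 (0.0%) | 0 (0.0%) | 42 (19.9%) | 13 (6.1%) | 3 (1.4%) | 1 (0.5%) | 0 (0.0%) |
| WCST | 1.5 SD | 16 (7.6%) | 5 (2.4%) | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | 10 (4.7%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| WCST | 1.65 SD | 18 (8.5%) | 7 (3.3%) | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | 37 (17.5%) | 16 (7.6%) | 8 (3.8%) | 4 (1.9%) | 1 (0.5%) |
| WCST | 2.0 SD | 10 (4.7%) | 5 (2.4%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 12 (5.7%) | 2 (1.0%) | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) |
| WCST | 2.5 SD | 4 (1.9%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 4 (1.9%) | 3 (1.4%) | 3 (1.4%) | 3 (1.4%) | 0 (0.0%) |
| WCST | 3.0 SD | 6 (2.8%) | 4 (1.9%) | 2 (1.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| SCWT | 1.0 SD | 35 (16.6%) | 5 (2.4%) | 0 (0.0%) | 0 (0.0%) | | 32 (15.2%) | 5 (2.4%) | 0 (0.0%) | 0 (0.0%) | |
| SCWT | 1.28 SD | 29 (13.7%) | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | | 42 (19.9%) | 7 (3.3%) | 0 (0.0%) | 0 (0.0%) | |
| SCWT | 1.5 SD | 13 (6.2%) | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | | 6 (2.8%) | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | |
| SCWT | 1.65 SD | 21 (10.0%) | 4 (1.9%) | 1 (0.5%) | 0 (0.0%) | | 22 (10.4%) | 2 (1.0%) | 0 (0.0%) | 0 (0.0%) | |
| SCWT | 2.0 SD | 11 (5.2%) | 2 (1.0%) | 1 (0.5%) | 0 (0.0%) | | 4 (1.9%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
| SCWT | 2.5 SD | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | | 2 (1.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
| SCWT | 3.0 SD | 2 (1.0%) | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |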

The base rate of individuals who had a specific number of abnormal scores (either 1, 2, 3, 4, or 5 abnormal scores) at each of the seven cut-offs was also examined (Figs. 1–4). Not surprisingly, these data were highly similar to the cumulative frequencies in Table 2. The percentage of participants with 1, 2, or 3 abnormal scores was almost always greater across all indices of the WCST and the SCWT when using more liberal cut-offs (i.e., 1.0 SD, 10th percentile) compared with more stringent cut-offs. However, very few people scored abnormally on two or more of the indices of the WCST or the SCWT when examining either raw or normative data. Only two (1.0%) participants had 3 abnormal scores (one at the 10th percentile and one at 2.0 SD), and no participants had 4 abnormal scores across any of the cut-offs on any of the 4 indices from the SCWT. On the WCST, depending on the cut-off, 0%–2.4% of participants had 3 abnormal scores, 0.0%–1.4% scored abnormally on 4 indices, and 0.0%–0.5% scored abnormally on all 5 indices.
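The distinction between the exact counts shown in the figures and the cumulative counts in Table 2 can be made concrete with a short tally. The following sketch is illustrative only: the function name `tally_abnormal` and the example flags are hypothetical, and it assumes that each participant's index scores have already been classified as abnormal or not at a single cut-off.

```python
from collections import Counter


def tally_abnormal(flags_per_participant):
    """Summarize how many abnormal scores each participant obtained.

    flags_per_participant holds one list of booleans per participant,
    with one entry per test index (True = abnormal at the chosen cut-off).
    Returns two dicts of percentages keyed by the number of scores:
    exactly k abnormal scores, and at least k abnormal scores.
    """
    counts = [sum(flags) for flags in flags_per_participant]
    n = len(counts)
    n_indices = len(flags_per_participant[0])
    exact = Counter(counts)
    exact_pct = {k: 100.0 * exact.get(k, 0) / n for k in range(1, n_indices + 1)}
    cumulative_pct = {
        k: 100.0 * sum(c >= k for c in counts) / n for k in range(1, n_indices + 1)
    }
    return exact_pct, cumulative_pct


if __name__ == "__main__":
    # Five hypothetical participants, four indices each (as on the SCWT).
    example_flags = [
        [False, False, False, False],
        [True, False, False, False],
        [True, True, False, False],
        [False, False, False, False],
        [True, False, False, False],
    ]
    exact_pct, cumulative_pct = tally_abnormal(example_flags)
    print("Exactly k abnormal scores (%):", exact_pct)
    print("At least k abnormal scores (%):", cumulative_pct)
```

In this made-up example, 40% of participants have exactly one abnormal score but 60% have at least one, because the participant with two abnormal scores is also counted in the cumulative "at least one" group.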

Fig. 1.

Frequency of participants with a specific number of abnormal WCST raw scores at each cut-off.

Fig. 2.

Frequency of participants with a specific number of abnormal WCST T-scores at each cut-off.

Fig. 3.

Frequency of participants with a specific number of abnormal SCWT raw scores at each cut-off.

Fig. 4.

Frequency of participants with a specific number of abnormal SCWT T-scores at each cut-off.

Discussion

This study sought to examine the frequency of abnormal scores when seven cut-offs for abnormality were applied to both raw and normative data from the WCST and the SCWT. Our data support Binder and colleagues' (2009) assertion that abnormal scores on neuropsychological measures, including the WCST and the SCWT, may be “psychometrically normal.”

Reference to the base rate data for both the WCST and the SCWT (Table 1) illustrates the expected increase in the frequency of abnormal test scores as the criterion for abnormality becomes more liberal. The data also show variability in the base rates of abnormal scores across the test indices at each of the criteria. That is, the number of healthy older adults falling in the abnormal range using a specific criterion differed depending on the index examined. In this sample, index scores >1.0 SD below the mean on any of the WCST or SCWT indices were not uncommon, with all base rates approaching or exceeding 10%. Among the normed scores, base rates for this criterion ranged from 7.2% for SCWT Interference and 9.9% for WCST Perseverative Errors up to 36.4% for WCST Nonperseverative Errors. Importantly, although variable, the >1.0 SD cut-off had a high base rate across all indices examined and, therefore, this cut-off appears to have a high false-positive rate for this age group on both measures. More stringent cut-offs have a broader range of base rates, with some too high to consider scores “abnormal” and others low enough to begin considering the presence of impairment. For example, base rates at the <10th percentile and >1.5 SD criteria remained high for several of the indices, including normed scores for WCST Total Errors, WCST Nonperseverative Errors, WCST Conceptual Level, and SCWT Color and Color-Word naming; however, a number of the other indices had relatively low frequencies of abnormal scores using these criteria (i.e., WCST Perseverative Responses, SCWT Interference, SCWT Word). Base rates of abnormal scores for the more stringent criteria (<5th percentile, >2.0, >2.5, and >3.0 SD) were low, with the exception of an 18.5% base rate of normed WCST Nonperseverative Errors scores using the <5th percentile criterion.

In addition, the number of abnormal scores on the SCWT or the WCST appears to have diagnostic significance. The frequency of having two or more abnormal scores on either measure at a specific cut-off was very low in this sample of healthy older adults (Table 2 and Figs. 1–4). These findings were independent of the criterion used to determine abnormality, and obtaining more than two abnormal scores was rare even at liberal cut-offs.

In summary, the data demonstrate that the >1.0 SD cut-off for defining an abnormal test score on the WCST and the SCWT should be used cautiously with older adults, given the high base rate of scores in this range in this sample of healthy older adults. Additionally, it is not unusual for a proportion of healthy older adults who put forth adequate effort to obtain one score below the 10th or 5th percentile, or >1.5 SD below the mean, on some indices from the WCST or the SCWT. Although less likely, a small percentage of healthy participants did obtain abnormal scores even at the most stringent cut-offs (>2.0, >2.5, or >3.0 SD from the mean); however, the base rate is low and these cut-offs result in a decreased risk of false positives. The presence of two scores falling within the abnormal range on either the WCST or the SCWT, however, was uncommon. Thus, having more than one score falling within the abnormal range, even using the most liberal criterion for abnormality, may reflect neuropsychological impairment.

Although each individual had been medically evaluated, neurological status may have varied among the adults included in the study. In addition, the generalizability of our findings to other neuropsychological measures of executive and overall cognitive functioning may be limited. Our data show that some scores in the “abnormal” or outlier range on the WCST and the SCWT may reflect normal variability in aspects of executive functioning (i.e., conceptual reasoning, mental flexibility, and response inhibition) rather than impairment. Since executive functioning is multifaceted, a normal score on the WCST and the SCWT does not exclude the possibility of impairment in other aspects of executive functioning, nor does it exclude the possibility of neuropsychological impairment involving other cognitive domains. The determination of neuropsychological impairment relies upon a more comprehensive neuropsychological assessment.

Symptom validity testing was completed to ensure that the data represented valid performance of these healthy older adults. Therefore, the outlier scores are not due to effort-related issues. As has been found in other age groups, some degree of variability in test scores is common, including the presence of scores that fall within the abnormal range. The findings from this study indicate that a degree of caution is warranted in attributing isolated low scores on the WCST or the SCWT to cognitive decline or impairment in older individuals.

Acknowledgements

The authors are grateful to Cecil R. Reynolds, former editor of Archives of Clinical Neuropsychology and current editor of Psychological Assessment, who served as the guest action editor. This manuscript was submitted to the blind peer review process.

References

American Psychiatric Association. American Psychiatric Association: DSM-5 development, 2010. Retrieved from http://www.dsm5.org/proposedrevision/Pages/NeurocognitiveDisorders.aspx. Last accessed: November 1, 2011.

Axelrod B. N., Wall J. R. Expectancy of impaired neuropsychological test scores in a non-clinical sample. International Journal of Neuroscience, 2007, vol. 117 (pg. 1591-1602).

Berg E. A. A simple objective technique for measuring flexibility in thinking. Journal of General Psychology, 1948, vol. 39 (pg. 15-22).

Binder L. M., Iverson G. L., Brooks B. L. To err is human: “Abnormal” neuropsychological scores and variability are common in healthy adults. Archives of Clinical Neuropsychology, 2009, vol. 24 (pg. 31-46).

Blair J. R., Spreen O. Predicting premorbid IQ: A revision of the National Adult Reading Test. The Clinical Neuropsychologist, 1989, vol. 3 (pg. 129-136).

Brooks B. L., Iverson G. L., Sherman E. M. S., Holdnack J. A. Healthy children and adolescents obtain some low scores across a battery of memory tests. Journal of the International Neuropsychological Society, 2009, vol. 13 (pg. 490-500).

Brooks B. L., Iverson G. L., White T. Substantial risk of “Accidental MCI” in healthy older adults: Base rates of low memory scores in neuropsychological assessment. Journal of the International Neuropsychological Society, 2007, vol. 13 (pg. 490-500).

de Rotrou J., Wenisch E., Chausson C., Dray F., Faucounau V., Rigaud A. S. Accidental MCI in healthy subjects: A prospective longitudinal study. European Journal of Neurology, 2005, vol. 12 (pg. 879-885).

Dodrill C. B. Myths of neuropsychology. The Clinical Neuropsychologist, 1997, vol. 11 (pg. 1-17).

Grant D. A., Berg E. A. A behavioral analysis of degree of reinforcement and ease of shifting to new responses in a Weigl-type card-sorting problem. Journal of Experimental Psychology, 1948, vol. 38 (pg. 404-411).

Heaton R. K., Chelune G. J., Talley J. L., Kay G. G., Curtiss G. Wisconsin Card Sorting Test manual: Revised and expanded, 1993. Lutz, FL: Psychological Assessment Resources.

Heaton R. K., Miller S. W., Taylor M. J., Grant I. Revised comprehensive norms for an expanded Halstead–Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults professional manual, 2004. Lutz, FL: Psychological Assessment Resources.

Ingraham L. J., Aiken C. B. An empirical approach to determining criteria for abnormality in test batteries with multiple measures. Neuropsychology, 1996, vol. 10 (pg. 120-124).

Lezak M. D., Howieson D. B., Loring D. W. Neuropsychological assessment, 2004, 4th ed. Oxford: Oxford University Press.

Palmer B. W., Boone K. B., Lesser I. M., Wohl M. A. Base rates of “impaired” neuropsychological test performance among healthy older adults. Archives of Clinical Neuropsychology, 1998, vol. 13 (pg. 503-511).

Reitan R. M., Wolfson D. The Halstead-Reitan Neuropsychological Test Battery: Theory and clinical interpretation, 1993, 2nd ed. S. Tucson, AZ: Neuropsychological Press.

Schretlen D. J., Munro C. A., Anthony J. C., Pearlson G. D. Examining the range of normal intraindividual variability in neuropsychological test performance. Journal of the International Neuropsychological Society, 2003, vol. 9 (pg. 864-870).

Schretlen D. J., Testa S. M., Winicki J. M., Pearlson G. D., Gordon B. Frequency and bases of abnormal performance by healthy adults on neuropsychological testing. Journal of the International Neuropsychological Society, 2008, vol. 14 (pg. 436-445).

Tombaugh T. N. Test of Memory Malingering (TOMM), 1996. New York: Multi Health Systems.

Trennerry M. R., Crosson B., DeBoe J., Leber W. R. Stroop Neuropsychological Screening Test, 1989. Odessa, FL: Psychological Assessment Resources.

Author notes

Portions of the paper were presented at the 39th annual meeting of the International Neuropsychological Society, Boston, MA, USA.