Abstract

In contrast to expectations, demographic corrections intended to reduce biases against those of advanced age or with few years of education do not universally improve diagnostic classification accuracy. Age corrections may be particularly problematic because age is also a risk factor for a dementia diagnosis. We found that simulating increased risk for dementia based on demographic variables, such as age, reduced the overall classification accuracy of demographically corrected simulated scores relative to the raw, uncorrected test scores. In clinical data with a small magnitude of association between age and dementia diagnosis, we found equivalent overall classification accuracy for demographically corrected and raw test scores. Regardless of the overall classification accuracy results, cutoff comparisons (16th and 9th percentiles) in clinical and simulated data demonstrated that, for the most part, the sensitivity of raw scores was higher than the sensitivity of demographically corrected scores, whereas the specificity of scores corrected with normative data was superior.

Introduction

Demographic variables such as advanced age or few years of formal education are commonly related, at least to some degree, to older adults' neuropsychological (NP) test performance (e.g., Bank, Yochim, MacNeill, & Lichtenberg, 2000; Bravo & Hébert, 1997; Brown, Shinka, Mortimer, & Graves, 2003; Marcopulos, Gripshover, Broshek, McLain, & Brashear, 1999; Marcopulos, McLain, & Guiliano, 1997; Nabors, Vangel, Lichtenberg, & Walsh, 1997; Youngjohn, Larrabee, & Crook, 1993). Individuals with advanced age (e.g., Adams, Boake, & Crain, 1982; Ainslie & Murden, 1993; Belle et al., 1996; Bornstein, 1986; Davies, 1968; Folstein, Folstein, & McHugh, 1975; Jagger, Clarke, Anderson, & Battcock, 1992; Long & Klein, 1990; Mangione et al., 1993; Tombaugh, McDowell, Kristjansson, & Hubley, 1996), few years of formal education (e.g., Adams et al., 1982; Belle et al., 1996; Folstein et al., 1975; Marcopulos et al., 1997; Murden, McRae, Kaner, & Bucknam, 1991; Tombaugh et al., 1996), lower socioeconomic class (e.g., Folstein et al., 1975; Jagger et al., 1992), non-majority ethnicity (e.g., Adams et al., 1982; Murden et al., 1991), or compromised physical (e.g., Jagger et al., 1992; Mangione et al., 1993) or mental health (e.g., Mangione et al., 1993) are more likely to be misclassified as impaired on the basis of poor cognitive test performance. It is logical to assume that corrections for demographic variables such as advanced age and few years of formal education would reduce the error in classifications and, therefore, improve classification accuracy. Consequently, demographically corrected normative data are provided for cognitive tests (e.g., see Mitrushina, Boone, Razani, & D'Elia, 2005; Strauss, Sherman, & Spreen, 2006). Emerging research suggests, however, that the use of such demographic corrections for cognitive test scores may not improve dementia classification accuracy (Belle et al., 1996; Sliwinski, Buschke, Stewart, Masur, & Lipton, 1997) or dementia screening accuracy (Kraemer, Moritz, & Yesavage, 1998; O'Connell, Tuokko, Graves, & Kadlec, 2004).

Sliwinski and colleagues (1997) and Sliwinski, Hofer, Hall, Buschke, and Lipton (2003) suggest that age corrections for tests used to screen for or to diagnose dementia may be inappropriate because advancing age is a risk factor for dementia (e.g., Canadian Study of Health and Aging [CSHA], 1994, 2000). Correcting for age may remove predictive variance for diagnosing dementia; consequently, Sliwinski and colleagues (1997, 2003) differentiate between demographically corrected norms, which are useful for determining clinical strengths and weaknesses (i.e., “comparative norms”), and “diagnostic norms.” “Diagnostic norms” combine uncorrected raw scores with age-based base rate information that reflects age-related increases in dementia prevalence. Sliwinski and colleagues (1997) explored “diagnostic norms” in a prospective study of incident dementia in which they compared the classification accuracy of memory test scores that were demographically corrected (age-adjusted scores; i.e., “comparative norms”) with uncorrected memory test scores combined with participants' age (an age-weighted receiver operating characteristic [ROC] curve; i.e., “diagnostic norms”). The age-weighted ROC curve appeared superior (i.e., qualitatively had a greater area under the curve) to the age-adjusted ROC curve. From this, they concluded that modifying test scores by age adjustment reduced discriminative validity because age was associated with dementia.

These empirical findings and theoretical contributions by Sliwinski and colleagues (1997, 2003) are important. The rationale for correcting demographic influences on test scores is to reduce error in diagnosis, thus increasing the evidence for the criterion validity of these cognitive tests. If, however, demographic variables are also related to diagnostic status, then they no longer constitute a source of error and their effects should not be minimized with demographic corrections. When considering group-based differences, these arguments are cogent, perhaps even intuitive. Take, for example, two overlapping theoretical distributions of raw cognitive test scores: one distribution created by those with no cognitive impairment, and one distribution created by those with dementia diagnoses. If age is a source of error variance, age corrections would pull these two overlapping distributions farther apart, increasing Cohen's d, which translates into larger sensitivity and specificity (Cohen, 1988). Importantly, increased sensitivity and specificity have implications at the individual patient level: When combined with the base rate of disease, sensitivity and specificity can provide the probability that a patient has (viz., positive predictive value) or does not have (viz., negative predictive value) a specific disease. If, on the other hand, age is related to a dementia diagnosis, partialling out age from these test scores will bring these distributions closer together, reducing Cohen's d. A smaller d would mean a smaller percent nonoverlap (U1 in Cohen's terminology), which translates into smaller sensitivity and specificity for the age-partialled test scores. If these group-based comparisons are true and, more importantly, if they do translate into reduced sensitivity and specificity, this would support Sliwinski and colleagues' (1997, 2003) argument that age adjustments should be avoided when using cognitive tests as part of the dementia differential diagnosis. Similarly, the use of education adjustments might also warrant caution because few years of formal education may also be associated with dementia diagnosis (e.g., Chibnall & Eastwood, 1998; Leibovici et al., 1996; Meguro et al., 2001; also see literature reviews by Gatz et al., 2001; Stern, 2002; Tuokko, Garrett, McDowell, Silverberg, & Kristjansson, 2003).
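For illustration, under the simplifying assumption of two equal-variance normal score distributions separated by Cohen's d, with the cutoff placed midway between their means, sensitivity and specificity both equal Φ(d/2): approximately 69% when d = 1.0, but only about 60% when d = 0.5. Shrinking d by partialling out a diagnostically relevant variable therefore erodes both indices directly.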

Quantitative corrections for age are commonplace when interpreting NP test scores (e.g., Strauss et al., 2006), but age corrections may reduce dementia classification accuracy (i.e., criterion validity). This research explored this possibility by testing Sliwinski and colleagues' (1997, 2003) hypothesis that corrections for demographic influences would compromise dementia classification criterion validity when these demographic variables were associated not only with test scores, but also with diagnosis. Analyses tallied individual patients' classification accuracy (i.e., sensitivity and specificity) across all patients (i.e., using ROC analyses).

In our opinion, the best method to test Sliwinski and colleagues' (1997, 2003) hypothesis required manipulating the magnitude of association between demographic variables and diagnosis, while maintaining the magnitude of association between demographic variables and test scores. To us, a simulation-based method appeared to be the only available option to experimentally manipulate these variables. As a crucial adjunct to these simulated data, we explored dementia classification accuracy in archivally collected clinical data. In these clinical data, we report a cutoff comparison of demographically corrected and raw test scores, with impairment defined at the 16th (Heaton, Miller, Taylor, & Grant, 2004) and 9th percentiles, to serve as a tangible aid for understanding the ROC results. We hypothesized that we would find support for Sliwinski and colleagues' (1997, 2003) arguments: In the simulated data, we expected that creating age as a risk factor for a dementia diagnosis would reduce the accuracy of age-corrected scores, and we expected this finding to be replicated in the archival clinical data.

Materials and Methods

Simulation Procedure

A novel Monte Carlo simulation procedure was used based on numbers randomly generated in S-Plus. Simulated scores representing “test” scores, “age,” and “education” were each created from 1,000 normally distributed random numbers. “Age” and “education” were created to be associated with “test” scores at r = |0.30| to approximate a medium magnitude of association (Cohen, 1988). Thirty percent of theoretical participants were assigned a diagnosis of “dementia,” a proportion arbitrarily chosen to be consistent with the largest population-based prevalence estimates (i.e., for those 85+; CSHA Working Group, 1994). Simulated diagnostic status of “dementia” or “no dementia” was assigned based on a relationship with “test” scores (viz., proportions based on accuracy at one cutoff point, the “assignment cutoff”) that created an area under the receiver operating characteristic curve (AUC) of roughly 78% for raw scores. “Test” scores higher than the “assignment cutoff” were labeled as T− and “test” scores lower than this cutoff were labeled as T+. The mathematical relation between cutoff sensitivity, cutoff specificity, and the proportions of T− and T+ was used to determine how many T− and T+ theoretical “participants” should be assigned a simulated diagnosis of “no dementia,” D−, or “dementia,” D+, based on cutoff sensitivity (i.e., “sens”) and cutoff specificity (i.e., “spec”). The proportion of theoretical “participants” who were T+ who should be assigned D+ was calculated as follows:

P(D+ | T+) = [sens × P(D+)] / [sens × P(D+) + (1 − spec) × P(D−)]
The proportion of “participants” who were T− and also needed to be assigned D+ was
P(D+ | T−) = [(1 − sens) × P(D+)] / [(1 − sens) × P(D+) + spec × P(D−)]
D− was assigned for the remainder of the T+ and T− groups (i.e., [T+] − [D+ ∩ T+] = [D− ∩ T+] and [T−] − [D+ ∩ T−] = [D− ∩ T−]).
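For concreteness, a minimal Python sketch of this generation and assignment step is given below; the original simulation was implemented in S-Plus, and the seed, the particular sensitivity and specificity values, and the shared latent factor used to induce the r = |0.30| associations are illustrative assumptions rather than the authors' parameters.

```python
# Minimal sketch of the score-generation and diagnosis-assignment step.
# Illustrative assumptions: the seed, sens/spec values, and the latent-factor
# construction used to induce r of about |0.30|; the original work used S-Plus.
import numpy as np

rng = np.random.default_rng(1)
n, prevalence = 1000, 0.30           # 30% of simulated cases receive a "dementia" (D+) label
sens, spec = 0.80, 0.75              # assumed sensitivity/specificity at the assignment cutoff

latent = rng.standard_normal(n)
test = latent                                                              # simulated "test" scores
age = -0.30 * latent + np.sqrt(1 - 0.30**2) * rng.standard_normal(n)       # r(test, age) ~ -0.30
education = 0.30 * latent + np.sqrt(1 - 0.30**2) * rng.standard_normal(n)  # r(test, education) ~ 0.30

p_T_pos = sens * prevalence + (1 - spec) * (1 - prevalence)  # implied proportion of T+ cases
cutoff = np.quantile(test, p_T_pos)
T_pos = test < cutoff                                        # low "test" scores flagged as T+

# Proportions of T+ and T- cases that must be labelled D+ so that the assignment
# cutoff has the chosen sensitivity and specificity (the relations given above):
p_Dpos_given_Tpos = sens * prevalence / p_T_pos
p_Dpos_given_Tneg = (1 - sens) * prevalence / (1 - p_T_pos)
n_Dpos_Tpos = round(p_Dpos_given_Tpos * T_pos.sum())         # number of T+ cases to label D+
n_Dpos_Tneg = round(p_Dpos_given_Tneg * (~T_pos).sum())      # number of T- cases to label D+
```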

The above assignments were made with reference to the simulated demographic variables (e.g., the first “participant” in the randomized order whose “age” was 85+ and who was also T+ was designated D+) in order to manipulate the magnitude of the associations between demographic variables and diagnosis. The association between demographic variables and diagnosis varied based on effect sizes for correlations (Cohen, 1988): (a) “Trivial Magnitude,” where the correlations of diagnosis with age and with education each approximated zero; (b) “Small Magnitude,” where the correlations of diagnosis with age and with education each approximated |0.10| (i.e., rpb between age and diagnosis ≈ −0.10; rpb between education and diagnosis ≈ 0.10); and (c) “Medium Magnitude,” where the correlations of diagnosis with age and with education each approximated |0.30|. Although the likelihood of a medium magnitude association between demographic variables and dementia diagnosis is unknown, because we only had data for the risk of dementia across a restricted age range (i.e., 65+), we thought it prudent to test the outer limits likely possible across the adult lifespan. These specific magnitudes of association were chosen to reflect the range of possible associations between age and cognitive test scores (i.e., medium magnitude associations have been reported for age and cognitive test scores; e.g., Bank et al., 2000; Marcopulos et al., 1997).
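As an illustration of how such an ordering rule turns “age” into a risk factor, the hedged sketch below labels the oldest eligible cases first; the exact ordering rule, the function name, and the option for random ordering are assumptions for illustration rather than the procedure used in the original S-Plus code.

```python
# Hypothetical sketch: D+ labels are given to the oldest eligible cases first,
# which induces a positive age-diagnosis association; favour_age=False gives a
# trivial association instead. Quotas come from the calculations sketched above.
import numpy as np

def assign_dementia(age, T_pos, n_Dpos_Tpos, n_Dpos_Tneg, favour_age=True, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    D_pos = np.zeros(age.shape[0], dtype=bool)
    for flag, quota in ((True, n_Dpos_Tpos), (False, n_Dpos_Tneg)):
        idx = np.flatnonzero(T_pos == flag)
        if favour_age:
            idx = idx[np.argsort(-age[idx])]   # oldest eligible cases receive D+ first
        else:
            idx = rng.permutation(idx)         # random order: age unrelated to diagnosis
        D_pos[idx[:quota]] = True
    return D_pos
```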

For each experimental condition, demographically corrected “test” scores were created using a regression-based approach. “Test” scores were regressed onto the demographic variables for the group with an assigned diagnosis of “no dementia.” These unstandardized regression coefficients were then used to correct for age and education in the whole sample (of note, no out-of-range corrected scores were included in these simulated data). The overall accuracy (viz., AUC) of the raw “test” scores and the regression-adjusted “test” scores was compared with ROC analyses.
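One common form of such a correction is sketched below; the centring of age and education on the unimpaired group's means is an assumption about the exact form of the adjustment, and only the general strategy (estimate coefficients in the unimpaired group, then apply them to the whole sample) is taken from the text.

```python
# Sketch of a regression-based demographic correction: coefficients are estimated
# in the unimpaired group only and then used to remove the predicted demographic
# contribution from every score (centring on the unimpaired means is an assumption).
import numpy as np

def demographic_correction(test, age, education, unimpaired):
    X = np.column_stack([np.ones(unimpaired.sum()), age[unimpaired], education[unimpaired]])
    intercept, b_age, b_edu = np.linalg.lstsq(X, test[unimpaired], rcond=None)[0]
    adjustment = (b_age * (age - age[unimpaired].mean()) +
                  b_edu * (education - education[unimpaired].mean()))
    return test - adjustment
```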

To account for random number variability, the above procedure was repeated 100 times. For each condition, the AUCs (averaged across the 100 iterations) of the raw and demographically corrected “test” scores were compared using Hanley and McNeil's (1983) procedure.
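The iteration loop could be sketched as follows; `simulate_condition` is a hypothetical helper wrapping the generation and assignment steps sketched above, `demographic_correction` is the function from the previous sketch, and the Hanley and McNeil (1983) test itself (which also requires the correlation between the paired AUCs) is not reproduced here.

```python
# Sketch of averaging AUCs over 100 iterations for one simulated condition.
import numpy as np
from sklearn.metrics import roc_auc_score

auc_raw, auc_adj = [], []
for _ in range(100):
    test, age, education, D_pos = simulate_condition()    # hypothetical helper (see sketches above)
    adjusted = demographic_correction(test, age, education, unimpaired=~D_pos)
    auc_raw.append(roc_auc_score(D_pos, -test))            # lower scores indicate impairment
    auc_adj.append(roc_auc_score(D_pos, -adjusted))

print(np.mean(auc_raw), np.mean(auc_adj))                  # averaged AUCs: raw vs corrected scores
```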

Participants

To investigate the generalizability of the simulation-based results, archival data collected as part of the multisite, ethically approved CSHA (e.g., CSHA Working Group, 1994) were explored. Briefly, data from CSHA-1 consisted of screening information collected for 10,263 randomly selected Canadians who were 65 years of age or older at the time. The study had few exclusionary criteria and included English- or French-speaking community-dwelling older adults in addition to residents of care facilities. A subsample of the 10,263 participated in interdisciplinary diagnostic investigations: Selection for this procedure was based on cognitive screening performance (i.e., all who failed the screen were selected, along with a random sample of those who passed). Diagnoses of dementia, cognitive impairment but no dementia, and no cognitive impairment were made by consensus after thorough medical, nursing, and NP investigations. The CSHA-1 NP tests have been described in detail elsewhere (i.e., Tuokko, Kristjansson, & Miller, 1995), but included abbreviated versions (every second item) of subtests from the Wechsler Adult Intelligence Scale-Revised (i.e., WAIS-R similarities, comprehension, and block design) and the WAIS-R digit symbol subtest (Wechsler, 1981, as cited in Tuokko et al., 1995). In addition, immediate and delayed memory were measured with the 15-noun list-learning Rey Auditory Verbal Learning Test (RAVLT; Lezak, 1983), and immediate recognition was tested for 15 designs with the Benton Visual Retention Test (BVRT; Benton, 1974, as cited in Tuokko et al., 1995). Remote memory was assessed with the information subtest from the Wechsler Memory Scale (Wechsler, 1975, as cited in Tuokko et al., 1995). Span of attention was assessed with the Span of Digits Test (Tuokko et al., 1995), in which participants are asked to repeat a series of digits of increasing length. Language comprehension was assessed with an 11-item version of the Token Test (from Benton & Hamsher, 1989, as cited in Tuokko et al., 1995). Verbal fluency was assessed for words (controlled word association test, COWAT; Spreen & Benton, 1969, as cited in Spreen & Strauss, 1998) and a semantic category (Animal Naming; Rosen, 1980, as cited in Tuokko et al., 1995).

A subsample of participants who had completed “all” of the NP measures and received an interdisciplinary diagnosis was selected. Table 1 describes the overall CSHA-1 sample and the subsample selected for these analyses. As can be seen in Table 1, the demographics of the subsample used for these analyses were similar to those of the overall CSHA-1 sample, with the notable exception that the subsample contained fewer participants with a diagnosis of dementia, as fewer participants with dementia would have been able to complete all measures required for the NP assessment, particularly all trials of the RAVLT (Tuokko et al., 1995).

Table 1.

Descriptive statistics for the CSHA-1 “overall sample” and “subsample” selected for these analyses

 Overall sample Subsample 
Overall N 10,263 1,252 
Age (years) M = 76.53; SD = 7.52; range 64–106; Skewness = 0.3877 M = 79.25; SD = 6.85; range 65–100; Skewness = −0.0559 
Education (years; N = 10,015) M = 9.95; SD = 3.90; range 0–33; Skewness = 0.2974 M = 9.20; SD = 3.93; range 0–25; Skewness = 0.3651 
% Female participants 60.95* (N = 10,263) 60.94 
Self-Identified Ethnic or Cultural Group Canadian 47.7%, United Kingdom 34.4%, Other European 10.3%, Other 7.6% (N = 8,875) Canadian 49.3%, United Kingdom 33.4%, Other European 10.7%, Other 6.5% (N = 908) 
Number diagnosed with dementia 1,132** (39%; N = 2,914) 213 (17%) 
Number diagnosed with either cognitive impairment but not dementia or no cognitive impairment 1,782** (61%; N = 2,914) 1,039 (83%) 
rpb between age and diagnosis 0.2301** (N = 2,914) 0.1210 
rpb between education and diagnosis −0.0252** (N = 2,914) −0.0220 
r between age and education −0.1230* (N = 10,015) 0.0556 

*Complete data were only available for the number shown.

**The N is smaller here because only 2,914 participants received follow-up of their screening performance and received a consensus diagnosis after interdisciplinary investigations, including neuropsychology.

Clinical Data Procedure

Each participant's (N = 1,252) cognitive test scores were corrected in the following manner: his or her education (and age) was multiplied by the unstandardized regression coefficients from models in which (a) age and education simultaneously predicted cognitive test scores and (b) education alone predicted cognitive test scores. The education-alone comparison was completed because, in these data, age was associated with both test scores and dementia, whereas education was only associated with test scores (its association with dementia was trivial). Coefficients were obtained from regression analyses performed for the subgroup with a diagnosis of no cognitive impairment (n = 1,039). The classification accuracy of these corrected scores (Adjusted[age, education], Adjusted[education]) was compared with the classification accuracy of the uncorrected raw test scores using ROC analyses.

Using the same sample (N = 1,252), demographic corrections were also conducted with published normative data that were demographically stratified by age or by education. Sources of published normative data did not include those based solely on CSHA normative samples. From commonly used compendia of normative data for NP tests, demographically stratified norms were examined for the COWAT (age stratified and then education stratified; metanorms provided by Loonstra et al., 2001, as cited in Strauss et al., 2006), the RAVLT A1–A5, Immediate Delay, and True Positive (age stratified; metanorms provided by Schmidt, 1996, as cited in Strauss et al., 2006), Animal Naming (age stratified; metanorms provided by Mitrushina et al., 2005), and the BVRT (education stratified; Miatton et al., 2004, as cited in Strauss et al., 2006).

The published average and standard deviation provided for each group stratified by age or education were used to transform each participant's test scores based on his or her demographic information (i.e., percentile[demographically stratified]). The published average and standard deviation pooled across the demographically stratified groups were used to create a z-score, and subsequent percentile rank, based on the overall sample (i.e., percentile[overall]), which was equivalent to the raw test scores.
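A norm-based transformation of this kind can be sketched as follows; the means and standard deviations shown are placeholders rather than the published stratum values, and the function name is illustrative.

```python
# Sketch of scoring the same raw value against a demographically stratified norm
# versus the norm pooled across strata (means/SDs are placeholders only).
from scipy.stats import norm

def to_percentile(raw_score, norm_mean, norm_sd):
    # z-score the raw value against the normative mean/SD, then return the percentile rank
    return 100 * norm.cdf((raw_score - norm_mean) / norm_sd)

pct_stratified = to_percentile(12, norm_mean=14.0, norm_sd=4.0)  # one age stratum (placeholder values)
pct_overall = to_percentile(12, norm_mean=16.0, norm_sd=4.5)     # pooled across strata (placeholder values)
```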

These clinical data demonstrated a small association between age and dementia diagnosis, which served as a check of the “Small Magnitude” condition in the simulated data. To explore the generalizability of the simulated “Trivial Magnitude” condition, all 213 participants with dementia diagnoses from the clinical sample were matched to participants without a dementia diagnosis, first on age (within 2 years) and then on education (within 5 years). For this demographically matched sample, the association between demographic variables and dementia was trivial, and the overall classification accuracy for dementia diagnoses was compared for percentile[demographically stratified] and percentile[overall] scores. Finally, to more tangibly explore the potential implications of these data for the criterion validity of NP tests, the sensitivity and specificity of the raw and demographically corrected test scores (corrected with published normative data) were compared at two commonly used cutoffs of impairment: the 8/9th percentile (Strauss et al., 2006) and the 15/16th percentile (Heaton et al., 2004).
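The cutoff comparison itself reduces to a simple tabulation, as in the sketch below; it assumes that “impaired” means a percentile rank below the cutoff, and the array names in the commented usage example are hypothetical.

```python
# Sketch of sensitivity/specificity at an impairment cutoff; "impaired" is assumed
# to mean a percentile rank below the cutoff (the 15/16th or 8/9th split).
import numpy as np

def sens_spec(percentiles, has_dementia, cutoff):
    impaired = percentiles < cutoff
    sensitivity = impaired[has_dementia].mean()        # impaired among those with dementia
    specificity = (~impaired)[~has_dementia].mean()    # unimpaired among those without dementia
    return sensitivity, specificity

# Hypothetical usage with arrays of percentile scores and diagnoses:
# for cutoff in (16, 9):
#     print(cutoff, sens_spec(raw_percentiles, dx, cutoff), sens_spec(corrected_percentiles, dx, cutoff))
```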

Results

Simulation Results

In the simulated data, examination of the bivariate correlations suggested that the degree of association between demographic variables and diagnostic status was successfully manipulated while maintaining stability in the associations between test scores and demographic variables. In these data, Sliwinski and colleagues' (1997, 2003) hypothesis was supported: As the association between demographic variables and diagnostic status increased (i.e., increasing demographic variables as risk factors for dementia), demographic corrections compromised classification accuracy. This is shown in Fig. 1. The classification accuracy decrease seen for demographically corrected scores was, however, only statistically significant in the Medium Magnitude condition (with α = 0.017 due to the three statistical tests required).

Fig. 1.

For simulated data: Percent (%) difference in accuracy (raw score AUC minus AUC for demographically corrected scores) plotted by the magnitude of association between the demographic variables and diagnosis.

Clinical Data Results

Replication of the simulation data was explored in archivally collected clinical data where there was a small association between age and diagnosis (rpb = 0.1230, see Table 1), which means that the group with dementia was slightly older (M = 81.08, SD = 6.35, range = 65–95) than the group without a dementia diagnosis (M = 78.87, SD = 6.89, range = 65–100). Table 2 describes the associations in these clinical data between age and education with NP test scores. Table 3 reports the accuracy (viz., overall classification of dementia diagnosis or no dementia diagnosis) of the raw test scores, the test scores adjusted for age and education, and scores regression adjusted for education. The % Difference in accuracy (displayed in the third and fourth columns of Table 3) shows that these differences—both positive (accuracy of raw > demographically corrected) and negative (accuracy of raw < demographically corrected)—were quite small. Although scores adjusted for education alone demonstrated higher accuracy than scores regression adjusted for both age and education, all classification differences were nonsignificant (even if we were willing to tolerate inflated α for multiple comparisons).

Table 2.

Associations between age and education with specific cognitive test scores for the CSHA-1 sample

Cognitive measure r with age r with education 
WMS information −0.1683 0.1619 
RAVLT measures 
 RAVLT A1 −0.1960 0.2021 
 RAVLT A2 −0.2713 0.1851 
 RAVLT A3 −0.2722 0.1931 
 RAVLT A4 −0.2432 0.2125 
 RAVLT A5 −0.2430 0.2218 
 RAVLT B1 −0.1972 0.1928 
 RAVLT Immed Delay −0.2397 0.1685 
 RAVLT True Positive −0.0732 0.0696 
 RAVLT True Negative −0.1196 0.1679 
BVRT Total Correct −0.1734 0.2963 
Span of Digits 0.0124 0.3610 
11-Item Token Test −0.1122 0.3607 
COWAT −0.0001 0.4762 
Animal Naming −0.1522 0.2394 
WAIS-R subtests 
 WAIS-R similarities −0.0809 0.4752 
 WAIS-R comprehension −0.0189 0.4369 
 WAIS-R block design −0.1111 0.3123 
 WAIS-R digit symbol −0.2514 0.4433 

Notes: WMS = Wechsler Memory Scale; RAVLT = Rey Auditory Verbal Learning Test; BVRT = Benton Visual Retention Test; COWAT = controlled word association test; WAIS-R = Wechsler Adult Intelligence Scale-Revised.

Table 3.

CSHA-1 cognitive test score accuracy statistics for raw scores and the percent difference between the AUCs for the raw and adjusted scores for classifying dementia

Cognitive measure Raw AUC (SE) Age, education adjusted % difference Education adjusted % difference 
WMS Information 0.8240 (0.0167) 0.37 0.21 
RAVLT measures 
 RAVLT A1 0.7082 (0.0186) 1.33 −0.05 
 RAVLT A2 0.7838 (0.0153) 1.75 −0.39 
 RAVLT A3 0.8072 (0.0143) 1.92 −0.11 
 RAVLT A4 0.8162 (0.0141) 1.89 −0.06 
 RAVLT A5 0.8276 (0.0133) 1.74 0.03 
 RAVLT B1 0.6914 (0.0182) 2.66 −0.06 
 RAVLT Immed Delay 0.8175 (0.0135) 1.20 −0.12 
 RAVLT True Positive 0.6313 (0.0218) 1.60 0.72 
 RAVLT True Negative 0.7800 (0.0181) 0.90 0.75 
BVRT Total Correct 0.7571 (0.0173) 0.18 −0.60 
Span of Digits 0.5396 (0.0215) 0.89 0.58 
11-item Token Test 0.6238 (0.0216) 0.35 −0.27 
COWAT 0.6568 (0.0199) −0.94 −1.14 
Animal Naming 0.7815 (0.0169) 1.48 0.11 
WAIS-R subtests 
 WAIS-R similarities 0.6739 (0.0185) −1.21 −2.00 
 WAIS-R comprehension 0.6303 (0.0199) −0.39 −0.60 
 WAIS-R block design 0.7318 (0.0183) 1.38 −0.34 
 WAIS-R digit symbol 0.7650 (0.0176) 1.44 −0.88 

Notes: WMS = Wechsler Memory Scale; CSHA = Canadian Study of Health and Aging; AUC = area under the receiver operating characteristic curve; RAVLT = Rey Auditory Verbal Learning Test; BVRT = Benton Visual Retention Test; COWAT = controlled word association test; WAIS-R = Wechsler Adult Intelligence Scale-Revised. Positive % difference: Raw AUC > adjusted AUC. Negative % difference: Raw AUC < adjusted AUC.

Moreover, demographic corrections based on published, demographically stratified normative data produced equivalent results. As can be seen in the third column of Table 4, the classification accuracy of the demographically corrected scores was equivalent to the accuracy of the overall (raw) scores. Similar results were found when participants with dementia were roughly matched for age and education with participants who did not receive a dementia diagnosis (see the fourth column of Table 4).

Table 4.

CSHA-1 cognitive test score accuracy statistics and % difference between the z-score[overall] and z-score[demographically stratified] percentiles for classifying dementia

Cognitive measure AUC (SE) overall normative data Stratified normative data % Difference Age, education matched groups % difference 
RAVLT measures 
 RAVLT A1 0.7082 (0.0186) 0.87* −0.19* 
 RAVLT A2 0.7837 (0.0153) 0.75* 0.14* 
 RAVLT A3 0.8069 (0.0143) 0.60* −0.66* 
 RAVLT A4 0.8160 (0.0141) 0.77* 0.31* 
 RAVLT A5 0.8272 (0.0133) 0.77* −0.05* 
 RAVLT Immed Delay 0.8175 (0.0135) 1.21* −0.46* 
 RAVLT True Positive 0.6319 (0.0218) 0.02* −0.58* 
BVRT Total Correct 0.7471 (0.0171) −1.41** −1.28** 
COWAT 0.6559 (0.0198) 1.46* 0.32* 
  −1.73** −2.35** 
Animal Naming 0.7815 (0.0169) 1.87* −0.17* 

Notes: CSHA = Canadian Study of Health and Aging; AUC = area under the receiver operating characteristic curve; RAVLT = Rey Auditory Verbal Learning Test; BVRT = Benton Visual Retention Test; COWAT = controlled word association test. Positive % difference: Raw AUC > adjusted AUC. Negative % difference: Raw AUC < adjusted AUC.

*Based on age stratified normative data.

**Based on education stratified normative data.

Of interest, comparisons of the sensitivity and specificity at specific cutoffs of impairment (i.e., 8/9th percentile and 15/16th percentile) mirrored the equivalency in overall accuracy, but with some interesting trends. As can be seen in Table 5, for many of the clinical NP test scores, the sensitivity of the raw scores was superior to the sensitivity of the demographically corrected test scores. When examining the ROC curves (see Figs 2 and 3 for two illustrative examples), it appeared that, for the most part, demographic corrections shifted the cutoff for impairment somewhat down and to the left, resulting in reduced sensitivity with increased specificity. This downward-left shift of the impairment cutoff was also seen in the simulated data, including, as can be seen in Fig. 4, the “Medium Magnitude” condition; in this condition, however, the difference in sensitivity was quite marked, hence the large discrepancy in overall classification accuracy (i.e., the AUC difference).

Fig. 2.

For archival clinical data: ROC curve of Animal Naming scores based on “overall normative data” (equivalent to raw score accuracy) versus “demographically corrected normative data” (with age stratified metanorms provided by Mitrushina et al., 2005). Cutoffs for impairment at the 8/9th percentile are indicated with a diamond for “overall normative data” and a square for “demographically corrected normative data.”

Fig. 3.

For archival clinical data: ROC curve for BVRT-total correct scores based on “overall normative data” (equivalent to raw score accuracy) versus “demographically corrected normative data” (with education stratified normative data from Miatton et al., 2004, as cited in Strauss et al., 2006). Cutoffs for impairment at the 8/9th percentile are indicated with a diamond for “overall normative data” and a square for “demographically corrected normative data.”

Fig. 4.

For simulated data: ROC curve of raw and demographically corrected scores for the “Medium Magnitude” condition (accuracy averaged across simulation trials), where age was moderately associated with a dementia diagnosis. Cutoffs for impairment at the 8/9th percentile are indicated with a diamond for the “raw scores” and a square for the “demographically corrected scores.”

Table 5.

For CSHA-1 cognitive test scores, sensitivity and specificity at two cutoffs of impairment (15/16th percentile and 8/9th percentile) for scores based on published normative data: (a) collapsed across demographic groups (“Overall”) or (b) with demographic stratification (“Corrected”)

Cognitive measure and normative source 15/16 Sensitivity (%) 15/16 Specificity (%) 8/9 Sensitivity (%) 8/9 Specificity (%) 
RAVLT measures 
 RAVLT A1     
  Overall 93.4 31.3 74.2 53.3 
  Corrected 74.6 50.3 74.2 53.3 
 RAVLT A2     
  Overall 88.7 53.4 88.7 53.4 
  Corrected 89.2 51.0 74.2 70.4 
 RAVLT A3     
  Overall 93.0 53.5 93.0 53.5 
  Corrected 93.4 51.9 81.7 66.7 
 RAVLT A4     
  Overall 94.4 50.6 94.4 50.6 
  Corrected 94.4 49.6 88.3 60.8 
 RAVLT A5     
  Overall 97.2 45.4 92.0 56.6 
  Corrected 92.5 55.6 89.2 60.8 
 RAVLT Immed Delay     
  Overall 94.8 47.9 90.6 56.3 
  Corrected 93.4 50.8 84.0 64.3 
 RAVLT True Positive     
  Overall 31.0 83.1 31.0 83.1 
  Corrected 27.2 85.2 31.0 88.0 
BVRT Total Correct     
 Overall 85.4 49.7 77.5 61.0 
 Corrected 65.7 71.7 64.8 73.3 
COWAT     
 Overall 67.6 55.3 49.3 73.4 
 Corrected education 67.6 62.6 51.6 75.1 
 Corrected age 62.0 60.3 39.0 76.8 
Animal Naming     
 Overall 81.2 58.7 73.2 69.6 
 Corrected 79.3 58.1 64.3 74.3 

Notes: RAVLT = Rey Auditory Verbal Learning Test; BVRT = Benton Visual Retention Test; COWAT = controlled word association test. Values in bold represent the superior cutoff sensitivity or specificity.

Discussion

In this paper, we described a test of Sliwinski and colleagues' (1997, 2003) hypothesis that corrections for demographic influences would compromise diagnostic accuracy by removing predictive variance when demographic variables were associated not only with test scores, but also with diagnostic status. Sliwinski and colleagues predicted this phenomenon would occur for age corrections because age is an established risk factor for dementia (e.g., Bachman et al., 1993; Braak et al., 1999; CSHA Working Group, 1994, 2000; Shaji, Promodu, Abraham, Roy, & Verchese, 1996). In a Monte Carlo procedure, we simulated age and education as dementia risk factors by manipulating the association between demographic variables and diagnosis, while maintaining the association between test scores and demographic variables (i.e., the degree of bias in test scores due to demographics was constant). We found that simulating age as a risk factor for dementia resulted in reduced AUCs for age-corrected scores, a reduction that was only statistically significant when there was a medium magnitude of association. In clinical data where there was only a small magnitude of association between age and dementia, demographic corrections (whether based on regression or published normative data) did not improve classification accuracy for dementia diagnoses.

Equivalent classification accuracy for demographically corrected versus raw scores has been reported when comparisons are made at the level of overall classification (i.e., ROC AUC; Kraemer et al., 1998; O'Connell et al., 2004). On the other hand, when considering cutoffs of impairment, clear differences between demographically corrected and raw scores exist. Our findings in the clinical data suggest that demographically corrected scores maximize specificity, a finding reported by others with the Halstead Impairment Index (Kiernan & Matthews, 1976), the Comprehensive Norms (Heaton et al., 1996), and the MMSE (Murden et al., 1991). Our data also show that demographically corrected scores compromise sensitivity, another frequently reported finding (Morgan & Caccappolo-van Vliet, 2001; Reitan & Wolfson, 1995, 1996, 1997, 2004, 2005). Although AUC comparisons may be the most appropriate method for comparing accuracy (i.e., Zweig & Campbell, 1993), the overall accuracy data appear somewhat disconnected from the cutoff analyses. This is most apparent in the simulated data condition where there was a large and statistically significant AUC difference in favor of raw scores (i.e., a moderate association between age and dementia diagnosis; Fig. 4), but at specific cutoffs for impairment the specificity of demographically corrected scores remained superior. It appears that the shift of the impairment cutoff down and to the left on the ROC plot resulted in superior specificity at this cutoff despite lower accuracy over all operating conditions. If, in clinical practice, judgments of impaired performance versus performance within normal limits are based on cutoffs (i.e., the 8/9th or 15/16th percentile), then the cutoff analyses may be most clinically relevant. Moreover, use of sensitivity and specificity data at a specific cutoff, when combined with base rate data, allows for the calculation of predictive values, which are clinically relevant because they provide a probability estimate of the likely diagnosis. When calculating predictive values, it would be possible to incorporate Sliwinski and colleagues' (2003) recommendation to use age-based differences in base rates (i.e., dementia prevalence rates based on age), consistent with their proposed “diagnostic” norms.
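As a sketch of that calculation, the base rates below are placeholders rather than CSHA prevalence estimates; the point is only that identical sensitivity and specificity yield very different predictive values once age-specific base rates are applied.

```python
# Sketch of combining cutoff sensitivity/specificity with an age-specific base rate
# to obtain predictive values (base rates shown are placeholders, not CSHA estimates).
def predictive_values(sens, spec, base_rate):
    ppv = sens * base_rate / (sens * base_rate + (1 - spec) * (1 - base_rate))
    npv = spec * (1 - base_rate) / (spec * (1 - base_rate) + (1 - sens) * base_rate)
    return ppv, npv

print(predictive_values(0.80, 0.70, base_rate=0.05))  # lower-prevalence (younger-old) scenario
print(predictive_values(0.80, 0.70, base_rate=0.30))  # higher-prevalence (oldest-old) scenario
```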

Limitations

There are numerous limitations to these data, and replication is required to inform clinical practice. Foremost, the simulation method used to create scores and diagnostic status and the method used to manipulate associations are novel and have not yet been peer reviewed. Moreover, we decided to keep particular variables constant, such as the medium magnitude of association between demographic variables and test scores (r = |0.30|), and it is unclear how these choices affected the results. Simulated data are limited in how well they reflect real clinical data, and one major limitation of these simulated data is the modeling of age as associated solely with individual test scores, as opposed to what appears to occur in real-life normal aging samples, where age is best conceptualized as a higher-order effect common to multiple cognitive test scores that may or may not contribute unique variance to individual test scores (Salthouse, 2009).

The use of clinical data does help mitigate many of the generalizability limitations of the simulated results, but these clinical data are also problematic. Foremost, the neuropsychologist contributing to the interdisciplinary consensus diagnosis would have been aware of each participant's demographically corrected cognitive test performance, thus spuriously inflating the accuracy estimates for the demographically corrected test scores. Clearly, replication in samples without criterion contamination is required. Also, the clinical sample was fairly ethnically homogeneous, thus limiting its generalizability to other populations (although this relative homogeneity does accurately reflect this cohort of older Canadians). Finally, only one set of normative data was examined for each test, and it is unclear whether the same results would be replicated with other normative data sets because classifications of impairment can vary widely depending on the specific normative data used for comparisons (Kalechstein, van Gorp, & Rapport, 1998).

A major limitation of this study common to the simulated and clinical data is the dichotomous characterization of diagnosis, because this captures neither the complicated nature of diagnosis nor the process of diagnostic decision making (which integrates information from patient and informant interviews, multiple test scores, and functional status). Nevertheless, we argue that knowledge of the psychometric properties of NP tests has clinical relevance and speaks to the evidence for criterion validity as measured by dementia classification accuracy. A final limitation of these simulated and clinical data is the use of a restricted age range (i.e., 65+). It is possible that corrections based on a broader age range that captures periods of rapid developmental change would be appropriate. In addition, a larger age range may reveal a larger magnitude of association between age and dementia diagnosis, which these data suggest would differentially affect the sensitivity of age-corrected scores.

Conclusions

Although it seems that correcting for age when age is also a risk factor for dementia reduces the classification accuracy of age-corrected scores when compared over all operating conditions, this appears most relevant when age is moderately associated with dementia risk. Cutoff analyses in the simulated and clinical data demonstrate that sensitivity is reduced for demographically corrected scores, but specificity appears enhanced. Moreover, these findings suggest that this trade-off between sensitivity and specificity may occur both when the difference in AUCs is negligible and when the AUC difference is statistically significant, but replication of these findings is required.

Our findings appear similar to previous work suggesting compromised sensitivity (e.g., Morgan & Caccappolo-van Vliet, 2001; Reitan & Wolfson, 1995, 1996, 1997, 2004, 2005) and enhanced specificity with demographically corrected scores (e.g., Heaton et al., 1996; Kiernan & Matthews, 1976; Murden et al., 1991). The clinical relevance of this phenomenon largely depends on the context of the assessment. In the context of a battery assessment, which includes multiple measures of the multiple domains assessed in a dementia diagnostic evaluation, maximizing sensitivity at cutoffs for impairment is less important. Any reduction in sensitivity associated with the use of demographically corrected scores would likely be offset by the use of multiple tests, which, in itself, increases sensitivity at the cost of specificity (e.g., Sackett, Haynes, Guyatt, & Tugwell, 1991). Demographic corrections appear to increase specificity, which would be needed in a battery approach. Maximizing specificity disproportionately increases the positive predictive value, which is the probability that the individual being assessed actually has the condition of interest, and this may be most important diagnostically (e.g., Smith & Ivnik, 2003). It appears, then, that the reduction in sensitivity associated with demographic corrections, particularly when these demographic variables are related to the diagnostic condition, has less clinical importance in an NP assessment provided a typical battery approach is used. On the other hand, if only one test is used, such as in a screening context, the reduction in sensitivity could be detrimental; in that context, raw scores are recommended over demographically corrected scores to maximize sensitivity.

Funding

M.E.O. was aided in manuscript preparation by a Dr Janet Bews and Alzheimer Society of Canada/CIHR (Institute of Aging) Doctoral Award. A research personnel award from the Canadian Institutes of Health Research, Institute of Aging provided support for H.T. in the preparation of this manuscript as did a Michael Smith Foundation for Health Research award for research unit infrastructure held by the Centre on Aging at the University of Victoria. The Canadian Study of Health and Aging (CSHA) was funded by the Seniors' Independence Research Program through the National Health Research and Development Program (NHRDP) of Health Canada (project 6606-3954-MC(S)). Additional funding for the CSHA was provided by Pfizer Canada Incorporated through the Medical Research Council/Pharmaceutical Manufacturers Association of Canada Health Activity Program, NHRDP (project 6603-1417-302 (R)), Bayer Incorporated, the British Columbia Health Research Foundation (projects 38 (93-2) and 34 (96-1)). The study was coordinated through the University of Ottawa and the Division of Aging and Seniors, Health Canada.

Conflict of Interest

None declared.

References

Adams, R. L., Boake, C., & Crain, C. (1982). Bias in a neuropsychological test classification related to education, age, and ethnicity. Journal of Consulting and Clinical Psychology, 50, 143–145.
Ainslie, N. K., & Murden, R. A. (1993). Effect of education on the clock-drawing dementia screen in non-demented elderly persons. Journal of the American Geriatrics Society, 41, 249–252.
Bachman, D. L., Wolf, P. A., Linn, R. T., Knoefel, J. E., Cobb, J. L., Belanger, A. J., et al. (1993). Incidence of dementia and probable Alzheimer's disease in a general population: The Framingham study. Neurology, 43, 515–519.
Bank, A. L., Yochim, B. P., MacNeill, S. E., & Lichtenberg, P. A. (2000). Expanded normative data for the Mattis Dementia Rating Scale for use with urban, elderly medical patients. The Clinical Neuropsychologist, 14, 149–156.
Belle, S. H., Seaberg, E. C., Ganguli, M., Ratcliff, G., DeKosky, S., & Kuller, L. H. (1996). Effect of education and gender adjustment on the sensitivity and specificity of a cognitive screening battery for dementia: Results from the MoVIES project. Neuroepidemiology, 15, 321–329.
Bornstein, R. A. (1986). Classification rates obtained with "standard" cut-off scores on selected neuropsychological measures. Journal of Clinical and Experimental Neuropsychology, 8, 413–420.
Braak, E., Griffing, K., Arai, K., Bohl, J., Bratzke, H., & Braak, H. (1999). Neuropathology of Alzheimer's disease: What is new since A. Alzheimer? European Archives of Psychiatry and Clinical Neurosciences, 24, 14–22.
Bravo, G., & Hébert, R. (1997). Age- and education-specific reference values for the Mini-Mental and Modified Mini-Mental State Examinations derived from a non-demented elderly population. International Journal of Geriatric Psychiatry, 12, 1008–1018.
Brown, L. M., Shinka, J. A., Mortimer, J. A., & Graves, A. B. (2003). 3MS normative data for the African Americans. Journal of Clinical and Experimental Neuropsychology, 25, 234–241.
Canadian Study of Health and Aging Working Group. (1994). Canadian Study of Health and Aging: Study methods and prevalence of dementia. Canadian Medical Association Journal, 6, 433–440.
Canadian Study of Health and Aging Working Group. (2000). The incidence of dementia in Canada. Neurology, 55, 66–73.
Chibnall, J. T., & Eastwood, R. (1998). Postsecondary education and dementia risk in older Jesuit priests. International Psychogeriatrics, 10, 359–368.
Cohen, J. (1988). Statistical power analysis for the behavioural sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Davies, A. D. M. (1968). The influence of age on Trail Making Test performance. Journal of Clinical Psychology, 24, 96–98.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). "Mini-Mental State": A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Gatz, M., Svedberg, P., Pedersen, N. L., Mortimer, J. A., Berg, S., & Johansson, B. (2001). Education and the risk of Alzheimer's disease: Findings from the study of dementia in Swedish twins. Journal of Gerontology, 56B, 292–300.
Hanley, J. A., & McNeil, B. J. (1983). A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148, 839–843.
Heaton, R. K., Matthews, C. G., Grant, I., & Avitable, N. (1996). Demographic corrections with comprehensive norms: An overzealous attempt or a good start? Journal of Clinical and Experimental Neuropsychology, 18, 449–458.
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an Expanded Halstead-Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: Psychological Assessment Resources.
Jagger, C., Clarke, M., Anderson, J., & Battcock, T. (1992). Misclassification of dementia by the Mini-Mental State Examination: Are education and social class the only factors? Age and Ageing, 21, 404–411.
Kalechstein, A. D., van Gorp, W. G., & Rapport, L. J. (1998). Variability in clinical classification of raw test scores across normative data sets. The Clinical Neuropsychologist, 12, 339–347.
Kiernan, R. J., & Matthews, C. G. (1976). Impairment index versus T-score averaging in neuropsychological assessment. Journal of Consulting and Clinical Psychology, 44, 951–957.
Kraemer, H. C., Moritz, D. J., & Yesavage, J. (1998). Adjusting Mini-Mental State Examination scores for age and education level to screen for dementia: Correcting bias or reducing validity? International Psychogeriatrics, 10, 43–51.
Leibovici, D., Ritchie, K., Ledésert, B., & Touchon, J. (1996). Does education level determine the course of cognitive decline? Age and Ageing, 25, 392–397.
Lezak, M. D. (1983). Neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Long, C. L., & Klein, K. (1990). Decision strategies on neuropsychology II: Determination of age effects on neuropsychological performance. Archives of Clinical Neuropsychology, 5, 335–345.
Mangione, C. M., Seddon, J. M., Cook, E. F., Krug, J. H., Sahagian, C. R., Campion, E. W., et al. (1993). Correlates of cognitive function scores in elderly outpatients. Journal of the American Geriatrics Society, 41, 491–497.
Marcopulos, B. A., Gripshover, D. L., Broshek, D. K., McLain, C. A., & Brashear, H. R. (1999). Neuropsychological assessment of psychogeriatric patients with limited education. The Clinical Neuropsychologist, 13, 147–156.
Marcopulos, B. A., McLain, C. A., & Guiliano, A. J. (1997). Cognitive impairment or inadequate norms? A study of healthy, rural, older adults with limited education. The Clinical Neuropsychologist, 11, 111–131.
Meguro, K., Shimada, M., Yamaguchi, S., Ishizaki, J., Ishii, H., Shimada, Y., et al. (2001). Cognitive function and frontal lobe atrophy in normal elderly adults: Implications for dementia not as aging-related and the reserve hypothesis. Psychiatry and Clinical Neurosciences, 55, 565–572.
Mitrushina, M., Boone, K. B., Razani, J., & D'Elia, L. F. (2005). Handbook of normative data for neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Morgan, M. E., & Caccappolo-van Vliet, E. (2001). Advanced years and low education: The case against the comprehensive norms. Journal of Forensic Neuropsychology, 2, 53–69.
Murden, R. A., McRae, T. D., Kaner, S., & Bucknam, M. E. (1991). Mini-Mental State Exam scores vary with education in Blacks and Whites. Journal of the American Geriatrics Society, 39, 149–155.
Nabors, N. A., Vangel, S. J., Lichtenberg, P. A., & Walsh, P. (1997). Normative and clinical utility of the Hooper Visual Organization Test with geriatric medical inpatients. Journal of Clinical Geropsychology, 3, 191–198.
O'Connell, M. E., Tuokko, H., Graves, R. E., & Kadlec, H. (2004). Correcting the 3MS for bias does not improve accuracy when screening for cognitive impairment or dementia. Journal of Clinical and Experimental Neuropsychology, 26, 970–980.
Reitan, R. M., & Wolfson, D. (1995). Influence of age and education on neuropsychological test results. The Clinical Neuropsychologist, 9, 151–158.
Reitan, R. M., & Wolfson, D. (1996). Relationships of age and education to Wechsler Adult Intelligence Scale IQ values in brain-damaged and non-brain-damaged groups. The Clinical Neuropsychologist, 10, 293–304.
Reitan, R. M., & Wolfson, D. (1997). The influences of age and education on neuropsychological performances of persons with mild head injuries. Applied Neuropsychology, 4, 16–33.
Reitan, R. M., & Wolfson, D. (2004). Clinical and forensic issues regarding age, education, and the validity of neuropsychological test results: A review and presentation of a new study. Journal of Forensic Neuropsychology, 4, 1–32.
Reitan, R. M., & Wolfson, D. (2005). The effect of age and education transformations on neuropsychological test scores of persons with diffuse or bilateral brain damage. Applied Neuropsychology, 12, 181–189.
Sackett, D. L., Haynes, R. B., Guyatt, G. H., & Tugwell, P. (1991). Clinical epidemiology: A basic science for clinical medicine (2nd ed.). New York: Lippincott Williams & Wilkins.
Salthouse, T. A. (2009). Decomposing age correlations on neuropsychological and cognitive variables. Journal of the International Neuropsychological Society, 15, 650–661.
Shaji, S., Promodu, K., Abraham, T., Roy, K. J., & Verchese, A. (1996). An epidemiological study of dementia in a rural community in Kerala, India. British Journal of Psychiatry, 168, 745–749.
Sliwinski, M., Buschke, H., Stewart, W. F., Masur, D., & Lipton, R. D. (1997). The effect of dementia risk factors on comparative and diagnostic selective reminding norms. Journal of the International Neuropsychological Society, 3, 317–326.
Sliwinski, M. J., Hofer, S. M., Hall, C., Buschke, H., & Lipton, R. B. (2003). Optimizing cognitive test norms for detection. In R. Petersen (Ed.), Mild cognitive impairment (pp. 89–104). New York: Oxford University Press.
Smith, G. E., & Ivnik, R. J. (2003). Normative neuropsychology. In R. Petersen (Ed.), Mild cognitive impairment (pp. 63–88). New York: Oxford University Press.
Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests: Administration, norms, and commentary (2nd ed.). New York: Oxford University Press.
Stern, Y. (2002). What is cognitive reserve? Theory and research application of the reserve concept. Journal of the International Neuropsychological Society, 8, 448–460.
Strauss, E., Sherman, E. M., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York: Oxford University Press.
Tombaugh, T. N., McDowell, I., Kristjansson, B., & Hubley, A. M. (1996). Mini-Mental State Examination (MMSE) and the Modified MMSE (3MS): A psychometric comparison and normative data. Psychological Assessment, 8, 48–59.
Tuokko, H., Garrett, D. D., McDowell, I., Silverberg, N., & Kristjansson, B. (2003). Cognitive decline in high-functioning older adults: Reserve or ascertainment bias? Aging and Mental Health, 7, 259–270.
Tuokko, H., Kristjansson, E., & Miller, J. (1995). Neuropsychological detection of dementia: An overview of the neuropsychological component of the Canadian Study of Health and Aging. Journal of Clinical and Experimental Neuropsychology, 17, 352–373.
Youngjohn, J. R., Larrabee, G. L., & Crook, T. H. (1993). New adult age- and education-corrected norms for the Benton Visual Retention Test. The Clinical Neuropsychologist, 7, 155–160.
Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39, 561–577.