Abstract

Researchers who have been responsible for developing test batteries have argued that competent practice requires the use of a “fixed battery” that is co-normed. We tested this assumption with three normative systems: a co-normed system, a system of meta-analytic regression-based norms, and a hybrid of these two methods. We analyzed two samples: 330 referred patients and 99 undergraduate volunteers. The T scores generated for referred patients using the three systems were highly associated with one another and quite similar in magnitude; the Overall Test Battery Means (OTBMs) using the co-normed, hybrid, and meta-regressed scores equaled 43.8, 45.0, and 43.9, respectively. For volunteers, the OTBMs equaled 47.4, 47.5, and 47.1, respectively. The correlations among these OTBMs across systems were all above .90. Differences among OTBMs across normative systems were small and not clinically meaningful. We conclude that co-norming is not necessary for competent clinical practice.

Introduction

Across much of the recent history of clinical neuropsychology, there have been debates regarding the use of fixed versus flexible test batteries (e.g., Bigler, 2007; Hom, 2008; Larrabee, 2008). Recently, the debate seems to be more frequently fought in the forensic arena. For instance, the issue of fixed versus flexible batteries was at the center of a lead neurotoxicity case that came before the New Hampshire Supreme Court (i.e., Baxter v. Temple). An Amicus Brief was filed on behalf of the American Academy of Clinical Neuropsychology (AACN, 2008) in the case, which notes:

The flexible battery approach is generally accepted in mainstream neuropsychological practice. Moreover, test validity lies in individual tests, not in “test batteries.” The flexible battery approach is very similar in structure, scientific merits, and logic to diagnostic testing in clinical medicine. There is no such thing as a fixed medical test battery, yet medical procedures [are] regularly found to be reliable enough to earn legal admissibility (p. 6).

Some may believe that the debate regarding batteries has been resolved to a large degree by market forces. That is, fixed batteries typically take substantially more time to administer and score, and third-party reimbursement agencies have been reducing the amount of time for which they will pay clinicians to complete an assessment. Therefore, flexible batteries have become the dominant choice of clinicians out of economic necessity. This is evident in the data collected by the American Academy of Clinical Neuropsychology every 5 years in its salary survey published in The Clinical Neuropsychologist (Putnam, 1989; Sweet, Meyers, Nelson, & Moberg, 2011). Over the past two decades, the percentage of clinicians who adhere to a fixed battery approach has dwindled significantly, yet the scientific debate remains alive. For example, from 1989 to the present, the proportion of neuropsychologists who report using a flexible battery approach has increased from 54% (Putnam, 1989) to 90% (Sweet et al., 2011). During this same period, the percentage of neuropsychologists who adhere to a fixed battery approach has decreased from 29% (Putnam, 1989) to 5% (Sweet et al., 2011).

Another contentious issue that is integrally entwined with the fixed versus flexible battery debate is the need to have assessment measures co-normed. It has been argued by many that psychologists should use co-normed batteries to generate more reliable and valid interpretations of patients' test scores. A strong advocate of the fixed battery approach, Elbert Russell, has argued in favor of the use of test batteries that are co-normed (e.g., Russell, 2004, 2012; Russell, Russell, & Hill, 2005). Specifically, Russell (2004) noted:

In flexible batteries, when different norms for each test are used, the assumption is that the norms for the different tests are equivalent regardless of their norming sample. Initially, standardization requires a common metric, such as Z scores or T scores. … In order to overcome this limitation in a flexible battery, Lezak et al. (2004) and Mitrushina et al. (1999) have advocated transforming scores to standard scores. … One such method of using Z scores to combine test results is the Rohling Interpretive Method (RIM; Miller & Rohling, 2001). It is an advancement in establishing the equivalence between tests, which could be used in a flexible manner. However, at this point the primary problem with the RIM is that the scores from various tests do not have equivalent norms.

As Russell (2004) suggests, the adequacy of norms cannot be assumed by clinicians; rather, norm equivalence has to be substantiated empirically. Most publishers of assessment instruments consider it a psychometric necessity to co-norm their batteries. Kern and colleagues (2008) similarly described co-norming as a psychometric necessity in reference to the newly developed National Institute of Mental Health (NIMH) cognitive battery for use in research settings involved in the treatment of chronic mental illness. Similar comments were made by Heaton, Miller, Taylor, and Grant (2004) when they published their revised expanded norms for the Halstead-Reitan Battery. However, Heaton and colleagues noted that their concern over co-norming was to allow clinicians to make demographic corrections based on patients' age, race, education, and gender.

We find it ironic that publishers have focused so intently on co-norming their fixed batteries, yet these same batteries are less and less frequently used in their entirety by clinicians. For example, Stern and White (2003) described the need to co-norm the Neuropsychological Assessment Battery (NAB; Stern & White, 2004) during its development. Yet, few users of the NAB administer the entire battery, and they often supplement the NAB with tests that are not part of the battery (e.g., the Wechsler Intelligence Scales). Schretlen, Testa, and Pearlson (2010) developed the Calibrated Neuropsychological Normative System. Its name alone implies the importance these authors placed not just on a system of assessment, but on the need for the battery to be “calibrated” through co-norming. In another example, each revision of the Wechsler tests now involves an expensive and complicated co-norming process that “links” the IQ tests to various other instruments, such as academic achievement (e.g., Wechsler Individual Achievement Test-Third Edition or WIAT-III; Wechsler, 2009a, 2009b), memory assessment (e.g., Wechsler Memory Scale-Fourth Edition or WMS-IV; Wechsler, 2009a, 2009b), and measures of motivation or validity (e.g., Advanced Clinical Solutions or ACS; Holdnack & Drozdick, 2009).

As noted, the degree of equivalence of normative systems should not be assumed (Russell, 2004) but empirically substantiated. Strictly speaking, however, this argument can be applied in either direction: one should not assume that two normative systems are equivalent, but neither should one reflexively dismiss their equivalence. Whether normative systems are equivalent is best answered empirically. Claims regarding the need for co-norming to provide clinicians with the comparative data necessary to produce sound clinical judgments are often made dogmatically and without reference to empirical data. To our knowledge, no one other than the first author (MLR) and his colleagues has empirically examined the equivalence of any two normative systems (Fields et al., 2013; Hill et al., 2013; Rohling, Axelrod, & Wall, 2008).

As cited earlier, Russell (2004) commented that Miller and Rohling (2001) developed a system of neuropsychological test data interpretation entitled the Rohling Interpretive Method (RIM), of which Russell remained skeptical because the RIM relied on normative data whose equivalence had not been substantiated. A similar method was published by Heaton and colleagues (2001), but these authors used their own co-normed system to generate T scores; thus, they were not subjected to the same criticism by Russell as was the RIM. Unlike the method of Heaton and colleagues (2001), however, the RIM applies meta-analytic procedures, commonly used with group data, to the analysis of data obtained during the assessment of a single patient. Further explanation and supportive data for the RIM have been provided by Rohling, Langhinrichsen-Rohling, and Miller (2003), Rohling, Miller, and Langhinrichsen-Rohling (2004), Meyers and Rohling (2004), and Meyers and Rohling (2009). These authors have argued that the norms for tests entered into the RIM do not have to come from the same normative database. Rather, if two normative databases were both collected with proper sampling and standardized administration procedures, then the two databases are likely to be equivalent to what would have been obtained had the tests been co-normed. Rohling and colleagues (2008) found empirical support for this assertion when they examined data collected from a sample of normal college student volunteers. However, what was not clear from their analyses (Rohling et al., 2008) was whether similar equivalence would exist with data collected from patients with cognitive deficits caused by disease or injury. Furthermore, these researchers examined only two normative systems, namely those of Heaton and colleagues (2004) and Mitrushina, Boone, Razani, and D'Elia (2005).

To conduct an RIM analysis, a clinician must first transform all of the patient's raw test scores to a common metric. These transformed scores are then combined into various composite scores for subsequent analysis. One composite that is typically generated in an RIM analysis is an estimate of premorbid general ability (EPGA), which is compared with measures of current ability to gauge how the patient would likely have performed had he or she not suffered any disease or injury. The magnitude of the difference between the EPGA and a composite is standardized by the patient's standard deviation for that composite, yielding a standardized mean difference effect size. Unlike the Halstead-Reitan Battery, which uses a summary statistic (e.g., the Halstead Impairment Index or the Global Deficit Score) and an arbitrary cutoff to determine whether neuropathology exists, the RIM statistically tests for differences between composite scores and the EPGA. The most powerful summary statistic in the RIM is the Overall Test Battery Mean (OTBM). Deficits are judged to exist if a one-sample t-test determines that the effect size is significantly different from zero. The p-value required for this inferential statistic to be considered meaningful can be set conservatively at .05 or more liberally as high as .20. Because injury or disease is not expected to increase a patient's cognitive ability, this is a hypothesis-driven, one-tailed t-test. The appropriate Type I and Type II error rates need to be determined by the examiner, based on the nature of the referral and the desired statistical power of the test.
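To make the arithmetic concrete, the following minimal sketch (in Python) illustrates the kind of calculation described above. The function name and the simplified one-sample t-test against the EPGA are ours for illustration only and are not drawn from the published RIM software; all scores are assumed to already be demographically adjusted T scores.

```python
import numpy as np
from scipy import stats

def rim_like_summary(t_scores, epga_t, alpha=0.05):
    """Illustrative battery summary on the T-score metric (M = 50, SD = 10).

    t_scores : a patient's demographically adjusted T scores across tests
    epga_t   : estimate of premorbid general ability expressed as a T score
    alpha    : one-tailed Type I error rate chosen by the examiner (.05 to .20)
    """
    t_scores = np.asarray(t_scores, dtype=float)
    otbm = t_scores.mean()                         # Overall Test Battery Mean
    d = (otbm - epga_t) / t_scores.std(ddof=1)     # standardized mean difference
    # One-sample t-test of the obtained scores against the premorbid estimate;
    # one-tailed because disease or injury is not expected to raise ability.
    t_stat, p_two_tailed = stats.ttest_1samp(t_scores, epga_t)
    p_one_tailed = p_two_tailed / 2 if t_stat < 0 else 1 - p_two_tailed / 2
    return {"OTBM": otbm, "d": d, "t": t_stat,
            "p": p_one_tailed, "deficit": p_one_tailed < alpha}
```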

As noted previously, Rohling and colleagues (2008) examined two normative systems with a single sample of normal undergraduate volunteers. An additional normative system of interest was developed for use with the Meyers Neuropsychological System (MNS; Meyers, 1995; Meyers & Rohling, 2004; Meyers & Rohling, 2009), which uses the RIM as a basis for analyzing a patient's set of scores. The present study was designed to examine all three of these normative systems and to include, in addition to the volunteers, a sample of patients referred for assessment.

Meyers Neuropsychological System

As the MNS and the norms it uses are likely less well known to readers than other batteries, a brief description is presented here. Throughout this article we refer to the norms used in the MNS as Meyers', and they are designated with the lower case letter “m” in various tables and figures. The MNS battery was developed using discriminant function analyses to determine which tests were able to distinguish patients with traumatic brain injury (TBI) from patients with other conditions (e.g., depression, stroke, and learning disability; Meyers & Rohling, 2004). The MNS was intentionally developed to include the most commonly used neuropsychological instruments (Rabin, Barr, & Burton, 2005). Meyers and Rohling (2004) calculated the reliability of the MNS with a sample of 63 clinically referred patients; the estimate equaled .86, which is within the range of other commonly used global summary scores, such as the Wechsler IQ tests (Wechsler, 2008), and is actually higher than most of the commonly used neuropsychological assessment instruments documented by Calamia, Markon, and Tranel (2014). The MNS tests selected for use in these analyses are listed in Table 1, along with the original normative sample that was used when the normative system was being developed.

Table 1.

A portion of the tests used in the Meyers Neuropsychological Battery (MNB) that were included in these analyses. Listed are each test's original citation and the source of the norms initially used by Meyers for norming

 Test name Abbreviation Original citation Normative sample 
1. Victoria Revision Category Test CT Labreche (1983) Kozel and Meyers (1998) 
2. Controlled Oral Word Association COWA Benton, Hamsher, and Sivan (1994) Spreen and Strauss (1998), pp. 455–458, Table 11-11 through 11–14 
3. Animal Naming Test ANI 
4. Boston Naming Test BNT Kaplan, Goodglass, and Weintraub (1983) Spreen and Strauss (1998), pp. 436–438 Table 11-2 
5. Auditory Verbal Learning – Trial 1 AV1 Rey (1941) Spreen and Strauss (1998), pp. 333–335 Tables 10–16 through 10–18 
6. Auditory Verbal Learning – Trial 5 AV5 
7. Auditory Verbal Learning – Total AVtot 
8. Auditory Verbal Learning – Immed. AVim 
9. Auditory Verbal Learning – Delay AVdel 
10. Trail Making Test – Part A TrA Army Individual Test Battery (1944) Heaton and colleagues (1991) 
11. Trail Making Test – Part B TrB 
12. Finger Tapping – Dominant FTD Halstead (1947) Heaton and colleagues (1991) 
13. Finger Tapping – Nondominant FTND 

The MNS norms were smoothed using the demographic variables of age, education, gender, ethnicity, and handedness, similar to the methods used by Heaton and colleagues (2004). The total sample consisted of 1,727 patients, with a mean age of 45.7 years (SD = 20.7). The mean years of education was 12.3 (SD = 2.7). There were 779 women (45.0%) and 948 men (55.0%); 1,543 participants were right handed (89.3%) and 184 were left handed (10.7%). The sample included 1,617 individuals who self-identified as White (93.6%), 32 as mixed race (1.8%), 27 as Native American (1.6%), 27 as Latino American (1.6%), 22 as African American (1.3%), and 2 as Asian American (0.1%).

Other Normative Systems

Another normative system we examined was developed by Heaton and colleagues (2004). They gathered data over a 25-year period to generate their system. Each participant included had many of the same tests administered to him or her, but not all individuals were administered every test within the battery. The data gathered were converted to scaled scores using a fractional polynomial regression equation that included age, education, sex, and ethnicity. These scaled scores were then converted to T scores, which allowed for demographic adjustments. We refer to these norms throughout this article as Heaton's and they are designated with the lower case letter “h” in tables and figures.

The third normative system we examined was that of Mitrushina and colleagues (2005), who combined scores across several published research samples using meta-analytic regression procedures. Thus, we refer to these norms throughout this article as “meta-analytic regression-based” norms or MA-Reg. They are designated throughout this article with the lower case letter “r” for ease of identification. A description of the methods used to generate these norms is provided by Mitrushina and colleagues (2005) and will not be repeated here. Interested readers are referred to Hasselblad and Hedges (1995), Irwig and colleagues (1994), and Rosenthal (1995) for details regarding the statistical procedures used to generate these MA-Reg norms.

In summary, we compared three normative systems with one another to determine the concordance among the systems, as well as the implications of the varying normative data for patients' summary scores. The relevant difference among the normative systems, which served as our independent variable, was the degree to which each system was co-normed: the Heaton system is highly co-normed, the Meyers system is somewhat co-normed, and the Mitrushina system is not co-normed. Our analyses were designed to quantify the influence co-norming had on patients' normative T scores. In this article, we did not examine batteries per se; we examined normative systems. We selected the tests that the three normative systems had most in common to conduct our analyses. The best test of our hypothesis was our analysis of the OTBM, which we believe is the most relevant dependent variable for determining the influence of co-norming.

The samples we examined came from two sources and represent different populations. The first sample consisted of patients referred for neuropsychological assessment to Dr. John Meyers. The second sample consisted of normal undergraduate volunteers who were attending a private Midwestern urban university. The undergraduates volunteered to receive course credit as part of the Psychology Department's research subject pool. We made no a priori predictions with respect to the influence of sample type.

Methods

Participants

Patients

The patient sample consisted of individuals who were randomly selected from the Meyers patient database, but who were independent of the normative sample. Patients were only eligible for selection if they had passed the symptom validity algorithm and their data were considered to be valid (Meyers, 2007; Meyers & Volbrecht, 2003; Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011). The sample was generated using simple random sampling. Specifically, we used a computerized random number generator, which identified 330 patient ID numbers without replacement for selection. The entire database included 3,520 adults, aged 16 years or older, who had completed the complete MNS (i.e., no missing data). The sample included patients who had been referred for assessment by a healthcare provider due to concerns that they, or their healthcare provider, had regarding cognitive and/or psychosocial functioning. These referrals were based on specific concerns that patients' complaints might have been caused by neurologic dysfunction. Table 2 lists the demographic data for the patient sample. The sample included 297 patients who self-identified as White (90.0%), 4 as mixed race (1.2%), 7 as Native American (2.1%), 12 as Latino American (3.6%), 8 as African American (2.4%), and 2 as Asian American (0.6%). We found no significant difference between the ethnic makeup of the study sample and that of the overall database. The sample was significantly younger than the average for the overall database, most likely because, to be selected, a patient had to have completed the entire MNS battery. Older patients more frequently had missing data, most likely because of fatigue during testing, which resulted in the examiner shortening the battery and thus made these patients ineligible for selection for this study. There were no other significant differences between the selected sample and the total sample on any of the other demographic variables listed. Within the sample, 75 patients (23.0%) were involved in litigation at the time of the assessment. Table 3 lists the various reasons for referral and patients' secondary diagnoses related to mental health disorders. Most patients had more than one diagnosis listed in their medical records when they were referred for assessment. Table 3 lists patients' primary and secondary diagnoses, which may have been neurological, mental health related, or both. Referred patients may or may not have suffered from cognitive deficits.

Table 2.

Demographic variables for the two samples (patients and volunteers) examined

 Patients (n = 330): M, SD, Range | Volunteers (n = 99): M, SD, Range | ES | p 
Age (years) 35.5 14.0 16–79 21.7 5.8 18–44 1.10 <.0001 
Education (years) 12.7 2.4 4–21 12.9 1.0 12–16 −0.10 NS 
Gender (% women) 40.0 — — 87.8 — — 1.32 <.0001 
Ethnicity (% White) 90.0 — — 85.8 — — 0.12 NS 
Handedness (% RH) 92.1 — — 93.9 — — 0.08 NS 
NAART est. of FSIQ 101.1 9.1 80–121 105.7 6.6 86–124 −0.55 <.0001 
WRAT-R est. of FSIQ 99.1 12.1 78–123 102.0 11.0 66–125 −0.24 NS 
Barona est. of FSIQ 99.2 7.6 89–123 103.8 3.3 91–111 −0.67 <.0001 
WTAR est. of FSIQ 96.7 5.6 95–114 106.3 4.5 95–114 −1.81 <.0001 
Mean of four est. FSIQ 99.0 8.9 78–117 104.5 7.0 84–117 −0.66 <.0001 
T score conversion 49.3 5.9 35–61 53.0 4.7 39–61 −0.66 <.0001 
Table 3.

Reason for referral for all 330 subjects. Some individuals had both a primary cognitive component and a behavioral health component

General label for patients' primary diagnoses Frequency % of sample 
Traumatic brain injury—moderate to severe 27 8.2 
Stroke/CVA—right/left/cerebellar 19 5.8 
Developmental (e.g., intellectual disability, cerebral palsy, autism spectrum Dx) 11 3.3 
Dementia (e.g., Alzheimer's, vascular, Huntington's, subcortical) 8 2.4 
Anoxia (e.g., carbon monoxide poisoning, asphyxia, cardiovascular) 7 2.1 
Multiple sclerosis 7 2.1 
Brain tumor w/ or w/o radiation 6 1.8 
Epilepsy/seizure Dx (e.g., foci in left temporal, medial temporal) 5 1.5 
Infectious (e.g., West Nile virus, Lyme disease, lupus) 4 1.2 
Primary Dx—neurologic Dx w/ documented symptoms in the medical record 94 28.5 
Secondary Dx—neurologic Dx without deficits in self-care or lives independently 84 25.5 
No neurologic Dx or minimal neurologic Dx (e.g., mild TBI w/o LOC) 152 46.0 
Total number of patients 330 100.0 
Depression/mood/affect-related (major depressive Dx, bipolar-Dep) 32 9.7 
Psychiatric disorder NOS (multiple/no primary) 26 7.9 
Somatization (conversion, pain, somatoform) 24 7.3 
Anxiety disorder (e.g., general anxiety Dx, panic Dx, PTSD) 15 4.5 
Substance abuse (e.g., alcohol, marijuana, cocaine) 5 1.5 
Personality disorder (e.g., antisocial, borderline) 3 0.9 
Psychosis (e.g., schizophrenia) 2 0.6 
Primary Dx—psychiatric Dx w/ documented symptoms in the medical record 107 32.4 
Secondary Dx—psychiatric Dx without deficits in self-care or lives independently 31 9.4 
No psychiatric Dx or minimal psychiatric Dx (e.g., adjustment Dx, sleep disturbance) 192 58.2 
Normal—number of patients without either a neurologic or a psychiatric Dx 45 13.6 
Dual or multiple DX—number of patients with both a neurological and a psychiatric Dx 35 10.6 
Total number of patients w/ neurological Dx that were expected to show deficits 94 28.5 
Total number of patients w/ Dx considered minimal and were not expected to show deficits 236 71.5 
Total number of patients 330 100.0 

Volunteers

This sample consisted of 99 participants, all undergraduates enrolled in a private Midwestern college, who volunteered for a larger study designed to collect information regarding the base rate of abnormal test scores generated from a commonly administered standardized test battery (Axelrod & Wall, 2007). In exchange for their participation, students were given course credit and a small amount of financial remuneration. Table 2 lists the demographic descriptive statistics of the volunteers. Although some of the volunteers may have suffered from a condition that caused cognitive dysfunction, none had been referred for a neuropsychological assessment by a healthcare provider. Furthermore, all volunteers were living independently. The university's website lists the average ACT score for admitted students as 22. Using equations provided by Koenig, Frey, and Detterman (2008), this mean equates to a Full Scale IQ (FSIQ) on the Wechsler Adult Intelligence Scale-III (WAIS-III) of 103.3 (SD = 12.6).

Design and Data Analysis

Regression analyses were used along with Fisher's z-tests for pairwise comparisons of the correlations among normative systems. These analyses yielded a correlation coefficient (r), the percentage of variance accounted for (r2), and the standard error of the estimate (SEest) for the three normative systems. The correlation coefficient is a numerical index that ranges from −1.00 to +1.00 and indicates the strength and direction of the linear relationship between two continuous variables (Moore, McCabe, & Craig, 2012). The standard error of estimate (Moore et al., 2012) is a measure of error in the regression predictions, with lower values indicating more accurate predictions. According to Tabachnick and Fidell (2013), the Fisher's z-test is best used when one wants to determine the significance of the difference between two correlated coefficients that were computed on the same sample and share a similar dependent variable. For the difference between two correlations to be significant, the Fisher's z statistic must exceed 1.96 for a two-tailed test. Additional paired t-tests were conducted to compare mean scores across normative systems. We tested the slopes of the resulting regression lines to see whether they differed significantly from 1.00, and the intercepts, to see whether they differed significantly from 0.00. We also compared slopes and intercepts with one another to see whether the systems differed in their predictions of scores. Finally, scatter plots of all regression results were visually inspected for the presence of outliers.
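For readers unfamiliar with the procedure, the sketch below shows the standard Fisher r-to-z comparison in its independent-samples form. It is illustrative only; because the correlations compared in this study were computed on the same participants, a dependent-correlation variant (e.g., Steiger's test) would give somewhat different values.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """z statistic for the difference between two correlations based on
    samples of sizes n1 and n2; |z| > 1.96 is significant at the
    two-tailed .05 level."""
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se_diff

# Hypothetical example: comparing r = .96 with r = .93, each based on n = 330
z = compare_correlations(0.96, 330, 0.93, 330)
```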

Materials

Estimates of subjects' FSIQ from the WAIS-III were generated using the North American Adult Reading Test (NAART; Uttle, 2002) and the Reading subtest of the Wide Range Achievement Test-Revised (WRAT-R; Jastak & Wilkinson, 1984). Additional estimates of the subjects' FSIQ were generated using demographic variables that were entered into the Barona equation (Barona, Reynolds, & Chastain, 1984) and the WTAR equation (Holdnack, 2001).

Each test used in the MNS that the three normative systems had in common was included in our analyses. The tests that were examined are listed in Table 1. Each normative system was used to convert the raw scores obtained from subjects into demographically adjusted T scores, based on that normative system's mean and standard deviation for the test in question.
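In generic terms (not specific to any one of the three systems), such a conversion takes the familiar form

\[
T = 50 + 10 \times \frac{X - \mu_{\mathrm{norm}}}{\sigma_{\mathrm{norm}}}
\]

where X is the obtained raw score and μnorm and σnorm are the mean and standard deviation of the appropriate normative reference group. By convention, the sign of the numerator is reversed for measures on which higher raw scores reflect poorer performance (e.g., completion time on the Trail Making Test), so that higher T scores always indicate better performance.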

The tests administered to the volunteers were slightly different from those administered to the patients. Specifically, only 11 tests had sufficient commonality across the normative systems for comparisons to be made, although both samples had far more tests administered to each participant. The volunteers were administered a total of 11 different instruments, which generated 18 dependent variables. The patients were administered 18 different instruments, which yielded 40 dependent variables. Thus, minor differences were evident between the samples with respect to how the OTBM was calculated, as well as which tests were included in the comparisons. The tests that had been administered to both samples included: (i) the Category Test (CT; Labreche, 1983; DeFilippis & McCampbell, 1997); (ii) the Controlled Oral Word Association Test (COWA; using the letters FAS or CFL); a measure of verbal learning (the California Verbal Learning Test or CVLT by Delis, Kramer, Kaplan, & Ober, 1987, or the Auditory Verbal Learning Test or AVLT by Rey, 1941), which included (iii) Trial 1, (iv) Trial 5, (v) Total Score, (vi) Immediate Free Recall, and (vii) Delayed Free Recall; (viii) Trail Making Test Part A; (ix) Trail Making Test Part B; and Finger Tapping with the (x) dominant and (xi) nondominant hands.

Additional Procedures

Statistical adjustments for T-score calculations

Some problems existed in the data that had to be addressed before the comparative analyses could be conducted. Not only did differences exist between the samples in terms of the tests administered, but there were also differences between the samples and the normative samples. For example, the verbal list learning test given to volunteers was the CVLT, whereas patients were given the AVLT. Moreover, the normative systems differed with respect to these two tests: Heaton norms were only available for the CVLT, whereas the MA-Reg and Meyers' norms were only available for the AVLT. Therefore, raw scores from either the AVLT or the CVLT were prorated, rounded to the nearest whole number, and then used for the normative system calculations of T scores. This process was necessary because the total number of stimuli in the AVLT is 15, whereas the total number of stimuli in the CVLT is 16.

A similar issue existed with the COWA. The volunteers had been given the letters C, F, and L, whereas the Heaton, MA-Reg, and Meyers' normative samples were all generated using the letters F, A, and S. Therefore, we scored the results for the letters CFL as though they had been generated from the FAS combination. Finally, the versions of the CT given to the two samples and available in the normative systems were not the same. The Heaton system used an older version, the Halstead Category Test (HCT; Halstead, 1947), which consists of 208 items. The patients were administered the Victoria Revision of the Category Test (VCT), which consists of 81 items, whereas the volunteers were given the Booklet Category Test (BCT), developed by DeFilippis and McCampbell (1997), which consists of 160 items. The Meyers' normative system uses the Victoria Revision of the CT (Kozel & Meyers, 1998), which has only 81 items. Therefore, the raw scores from the various administrations were all prorated so that T scores for the three systems could be generated in a common manner. As is evident, there are several reasons why our comparisons of these normative systems might produce disparate results, not the least of which is the demographic differences among the normative samples from which the T scores were generated. Nevertheless, we believe that the data from these two samples were worthy of comparison, despite the statistical manipulations to which we had to subject the raw scores so that the normative T scores could be compared fairly.
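The published sources do not specify an exact proration formula; a simple linear item-count adjustment consistent with the description above, with a hypothetical function name and example values, might look like this:

```python
def prorate_raw(raw_score, items_administered, items_in_norm):
    """Re-express a raw score from one version of a test on the scale of a
    version with a different number of items, rounding to the nearest whole
    number as described in the text."""
    return round(raw_score * items_in_norm / items_administered)

# Hypothetical examples:
# prorate_raw(12, 16, 15)    # a 16-word CVLT trial score on the 15-word AVLT scale
# prorate_raw(40, 160, 208)  # BCT errors (160 items) on the 208-item HCT scale
```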

Another issue that required some compromise was the coding of ethnicity. The patient and volunteer samples included persons who self-identified as members of various recognized racial minorities within the United States. For patients, 33 individuals or 10.0% self-identified as belonging to a minority. Yet, only 8 of these 33 people identified themselves as African Americans (i.e., 2.4%; 6 men and 2 women). Moreover, for the volunteers, 14 participants or 14.1% identified themselves as members of a minority, but only 9 of these were African Americans (i.e., 9.1%; all women). To allow for the use of as many participants as possible, we coded all minority participants in both samples as African American (AA) and the AA norms of Heaton and colleagues (2004) were used to generate T scores regardless of what participants' ethnicity might have been. Thus, ethnicity was coded as a binary variable (i.e., White or minority/other).

Finally, the patient sample included individuals who were aged 16–19, and the volunteer sample included individuals who were aged 18–19. Neither the Heaton nor the MA-Reg normative system provides data for individuals under the age of 20 years. However, Mitrushina and colleagues (2005) provided the regression equations used to generate the normative data presented in their tables. We used these equations, entering age as the independent variable, to generate exact T scores and standard deviations. We handled this problem slightly differently for the Heaton normative system, because these authors did not publish the regression equations used to generate their normative tables. Therefore, we could not calculate exact scores as had been done with Mitrushina and colleagues. In our two samples, 13% of the patients and 52% of the volunteers were under the age of 20. Thus, we estimated the age-adjusted T scores of these younger subjects based on the youngest available age group provided by Heaton and colleagues (2004), which is 20–34 years old. This might have caused T scores for these younger subjects to be slightly in error relative to T scores generated from a normative sample of the appropriate age.
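Because Heaton and colleagues' equations are unpublished, the exact-score procedure could be applied only to the MA-Reg system. The sketch below conveys the idea with purely hypothetical coefficients standing in for the age-based equations that Mitrushina and colleagues (2005) tabulate for each test:

```python
def t_from_age_regression(raw, age, mean_coefs, sd_coefs):
    """Convert a raw score to a T score using regression equations that
    predict the normative mean and SD of a test at a given age
    (polynomial coefficients listed intercept first; values are hypothetical)."""
    predicted_mean = sum(b * age ** i for i, b in enumerate(mean_coefs))
    predicted_sd = sum(b * age ** i for i, b in enumerate(sd_coefs))
    return 50 + 10 * (raw - predicted_mean) / predicted_sd

# Hypothetical example for an 18-year-old examinee:
t = t_from_age_regression(raw=55, age=18,
                          mean_coefs=[62.0, -0.20],  # mean = 62.0 - 0.20 * age
                          sd_coefs=[10.0, 0.05])     # SD = 10.0 + 0.05 * age
```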

Results

Estimated Premorbid Functioning

Our calculations of premorbid functioning included adjusted NAART, WRAT-R, Barona (Barona, Reynolds, & Chastain, 1984), and WTAR scores that were used to estimate FSIQ scores on the WAIS-III. These scores were averaged to generate a mean EPGA. For patients, the mean EPGA was 99.0 (SD = 8.9). For volunteers, the mean EPGA was 104.5 (SD = 7.0). As noted previously, the volunteers were students enrolled at a university whose admission data indicate that entering freshmen had an estimated FSIQ of 103.3 (SD = 12.6); our EPGA for the volunteers closely matched the university's admission data. The volunteers' mean EPGA was significantly above that generated for patients, with a Cohen's d equal to 0.66. This was expected, as volunteers were college students who had not been referred for any neurological or psychiatric symptoms. Patients, on the other hand, were referred for an assessment because they were having problems coping with their environment. Some were not able to live independently (16%), others were either unemployed or underemployed (28%), and many had either a neurologic (45.0%) or a psychiatric diagnosis (32.0%), or both (9.0%), listed in their medical records. Only 14.0% of patients had no diagnosis listed in their medical records. Many within the patient sample had documented neurologic injuries, abnormal neurologic signs, or evidence of disease or injury based on the results of neuroimaging.

The magnitude of the difference between the EPGAs and the OTBMs, when patients and volunteers were compared, was quite similar. After averaging across normative systems, the standardized mean difference effect size for the EPGAs was −0.66 (p < .0001), with T scores of 49.3 and 53.0, respectively. When the two samples' OTBMs were compared, the difference was also significant, but the magnitude was slightly smaller (d = −0.47; p < .0001), with the patients' OTBM equal to 43.9 and the volunteers' equal to 47.7. The difference between the EPGAs and the OTBMs averaged −5.4 points for the patients and −5.3 points for the volunteers. The reduction in the estimated effect size between the two groups was accounted for by an increase in the patients' OTBM standard deviation relative to that of the volunteers. These changes were consistent across each of the normative systems examined, with no main effect of normative system on these data.

We generated a performance validity algorithm for the volunteers using embedded measures of validity that have previously been investigated: excessively large intraindividual variability (Hill, Rohling, Boettcher, & Meyers, 2013) and excessively low scores on Finger Tapping (Meyers et al., 2011), the Seashore Rhythm Test, and the Speech Sounds Perception Test (Ross, Putnam, Millis, Adams, & Krukowski, 2006). Data from volunteers who failed one or more of these four embedded performance validity measures were considered questionable. Across all volunteers, 25.3% failed at least one performance validity test. Thus, unlike the patient sample, which was screened for invalidity, the volunteers were initially assumed to be healthy and likely to perform to the best of their ability. However, this assumption proved to be false, and there were significant differences on the OTBM within the volunteer sample between questionable and valid performers (43.6 vs. 48.9, respectively, p < .0001, d = −0.95).

Normative Comparisons

We present the results of our regression analyses of the three normative systems and two samples in Tables 4–11. We present graphic illustrations of these results in Figs. 1–8. The comparisons are presented in the tables and figures in the following order: (a) Meyers/Heaton, (b) MA-Reg/Heaton, and (c) MA-Reg/Meyers.

Table 4.

Comparisons of correlations across normative systems: (a) Meyers’ (m) compared with Heaton's (h); (b) MA-Reg's (r) compared with Heaton's (h); and (c) MA-Reg's (r) compared with Meyers’ (m)

 Test scores examined | Norm system | Patients: r, z, p, SEest | Volunteers: r, z, p, SEest 
1a Category Test m-h .982 14.67 <.0001 2.38 .949 1.13 .13 3.98 
1b Category Test r-h .896 — — 6.62 .930 — — 4.59 
1c Category Test r-m .872 1.41 .16 6.24 .984 5.21 <.0001 6.08 
2a Animal Naming m-h .961 3.84 <.0001 3.35 — — — — 
2b Animal Naming r-h .930 — — 5.07 — — — — 
2c Animal Naming r-m .957 3.20 .0014 3.51 — — — — 
3a COWA m-h .968 1.61 .0282 2.59 .900 2.64 .0083 2.71 
3b COWA r-h .959 — — 2.93 .952 — — 2.84 
3c COWA r-m .971 2.25 .0244 2.49 .936 1.03 .30 1.81 
4a Boston Name Test m-h .820 0.91 .37 7.04 — — — — 
4b Boston Name Test r-h .842 — — 8.86 — — — — 
4c Boston Name Test r-m .934 5.89 <.0001 4.38 — — — — 
5a VLT Trial 1 m-h .969 3.25 <.0001 3.11 .939 4.50 <.0001 3.34 
5b VLT Trial 1 r-h .949 — — 3.43 .983 — — 1.80 
5c VLT Trial 1 r-m .960 1.59 .12 3.52 .935 4.73 <.0001 3.69 
6a VLT Trial 5 m-h .943 0.45 .66 4.94 .953 0.94 .35 3.07 
6b VLT Trial 5 r-h .939 — — 4.19 .964 — — 2.73 
6c VLT Trial 5 r-m .960 2.77 .0056 4.16 .964 0.00 1.00 2.56 
7a VLT Total Score m-h .921 0.17 .87 5.84 .945 0.48 .64 3.56 
7b VLT Total Score r-h .919 — — 5.02 .952 — — 3.42 
7c VLT Total Score r-m .936 1.56 .12 5.30 .970 1.66 .10 2.74 
8a VLT Immed. FR m-h .965 1.19 .24 3.75 .968 2.22 .0264 2.46 
8b VLT Immed. FR r-h .958 — — 3.47 .983 — — 1.90 
8c VLT Immed. FR r-m .981 5.15 <.0001 2.81 .879 7.08 <.0001 1.90 
9a VLT Delayed FR m-h .955 1.07 .29 4.12 .939 3.45 .0006 3.81 
9b VLT Delayed FR r-h .947 — — 5.50 .977 — — 2.33 
9c VLT Delayed FR r-m .953 0.79 .43 4.21 .968 1.16 .25 2.36 
10a Trials A m-h .957 4.57 <.0001 3.63 .979 5.73 <.0001 2.59 
10b Trials A r-h .914 — — 6.13 .895 — — 4.99 
10c Trials A r-m .921 0.57 .57 4.85 .870 0.79 .43 6.19 
11a Trials B m-h .968 7.30 <.0001 3.22 .979 6.94 <.0001 1.97 
11b Trials B r-h .903 — — 7.14 .854 — — 4.99 
11c Trials B r-m .901 0.14 .89 5.53 .841 0.32 .75 5.51 
12a Finger Tap Dom m-h .955 2.62 .0088 3.01 .970 1.99 .0466 2.82 
12b Finger Tap Dom r-h .933 — — 5.28 .983 — — 2.12 
12c Finger Tap Dom r-m .928 0.48 .64 3.78 .977 1.06 .29 2.05 
13a Finger Tap Ndom m-h .962 1.79 .0735 3.19 .972 0.26 .80 2.20 
13b Finger Tap Ndom r-h .950 — — 4.79 .974 — — 2.52 
13c Finger Tap Ndom r-m .930 2.22 .0264 4.33 .958 1.69 .10 2.26 
OTBM(s) m-h .967 3.80 <.0001 2.16 .971 1.04 .30 1.38 
OTBM(s) r-h .941 — — 3.14 .961 — — 1.65 
OTBM(s) r-m .981 7.04 <.0001 1.67 .965 0.38 .71 1.47 
Table 5.

Results from patient data for each of the 13 test scores examined, as well as the overall test battery mean (OTBM13). The independent variables in the regression equations are Meyers’ normative T scores and the dependent variables are Heaton's normative T scores

 Patients data set: Test name | Raw scores: M, SD | Independent var. Meyers T scores: M, SD | Dependent var. Heaton T scores: M, SD | Regression results: r, Intercept, Slope | DV scores that correspond to IV scores of T = 30, T = 45, T = 60 | Var. acc.: R2 
Category Test 55.6 32.1 43.1 12.6 42.9 12.5 .990 0.57 0.98 30 45 59 97.7% 
Animal Naming 19.8 6.0 48.1 12.2 44.8 13.1 .961 −4.95 1.03 26 42 57 92.4% 
COWA 33.1 11.9 42.4 10.4 41.0 10.8 .968 −1.72 1.01 29 44 59 93.8% 
Boston Naming Test 51.9 7.2 41.3 12.3 46.1 11.6 .820 14.03 0.78 37 49 61 67.2% 
VLT Trial 1 6.1 1.9 45.9 12.5 45.7 11.4 .969 5.22 0.88 32 45 58 93.8% 
VLT Trial 5 11.6 2.9 45.4 12.1 39.6 12.4 .943 3.55 0.79 27 39 51 88.8% 
VLT Total Score 47.2 11.6 43.8 15.0 46.1 12.1 .921 13.48 0.74 36 47 58 84.9% 
VLT Immed. FR 9.7 3.7 45.4 14.4 44.2 12.7 .965 5.45 0.85 31 44 57 93.2% 
VLT Delayed FR 9.4 3.6 45.1 13.9 42.6 14.2 .955 −1.37 0.98 28 43 57 91.3% 
10 Trails A 33.3 21.6 44.6 12.4 45.5 12.3 .957 3.16 0.95 32 46 60 91.5% 
11 Trails B 81.4 45.4 45.7 12.7 46.1 13.1 .968 0.51 1.00 30 45 60 93.6% 
12 Finger Tap Dom 45.5 9.5 43.5 10.1 42.8 11.1 .955 −2.84 1.05 29 44 60 91.2% 
13 Finger Tap Ndom 41.8 10.2 45.0 11.7 43.5 12.6 .962 −3.13 1.04 28 43 60 92.6% 
— Column Averages — — 44.6 12.6 43.9 12.3 .963 2.46 0.93 30.4 44.3 58.3 92.7% 
— OTBM13 Mean — — 44.6 8.5 43.9 8.1 .967 2.73 0.92 30.4 44.3 58.2 93.5% 
— OTBM13 SD (IIV) — — 10.1 7.9 9.9 7.7 .838 1.52 0.83 5 = 5.6 10 = 9.8 15 = 13.9 70.2% 
Table 6.

Results from patient data for each of the 13 test scores examined, as well as the overall test battery mean (OTBM13). The independent variables in the regression equations are MA-Reg's normative T scores and the dependent variables are Heaton's normative T scores

 Patients data set: Test name | Raw scores: M, SD | Independent var. MA-Reg T scores: M, SD | Dependent var. Heaton T scores: M, SD | Regression results: r, Intercept, Slope | DV scores that correspond to IV scores of T = 30, T = 45, T = 60 | Var. acc.: R2 
Category Test 55.6 32.1 43.2 14.9 42.9 12.5 .896 10.26 0.75 33 44 56 80.2% 
Animal Naming 19.8 6.0 43.0 13.8 44.8 13.1 .930 6.98 0.88 33 47 60 86.6% 
COWA 33.1 11.9 41.1 10.3 41.0 10.8 .959 −0.18 1.00 30 45 60 92.0% 
Boston Naming Test 51.9 7.2 40.0 16.4 46.1 11.6 .842 22.27 0.60 40 49 58 70.9% 
VLT Trial 1 6.1 1.9 45.4 10.9 45.7 11.4 .949 0.61 0.99 30 45 60 90.1% 
VLT Trial 5 11.6 2.9 45.4 12.1 39.6 12.4 .939 −4.04 0.96 25 39 54 88.1% 
VLT Total Score 47.2 11.6 42.7 12.7 46.1 12.1 .919 8.69 0.88 35 48 61 84.5% 
VLT Immed. FR 9.7 3.7 44.1 12.0 44.2 12.7 .958 −0.32 1.01 30 45 60 91.7% 
VLT Delayed FR 9.4 3.6 42.5 17.2 42.6 14.2 .947 9.36 0.78 33 45 56 89.8% 
10 Trails A 33.3 21.6 45.8 15.0 45.5 12.3 .914 11.15 0.75 34 45 56 83.5% 
11 Trails B 81.4 45.4 45.0 16.6 46.1 13.1 .903 14.00 0.71 35 46 57 81.6% 
12 Finger Tap Dom 45.5 9.5 39.4 14.7 42.8 11.1 .933 15.01 0.71 36 47 57 87.1% 
13 Finger Tap Ndom 41.8 10.2 40.8 15.4 43.5 12.6 .950 11.66 0.78 35 47 58 90.3% 
— Column averages — — 42.9 14.4 43.9 12.3 .936 8.11 0.83 33.0 45.5 57.9 87.6% 
— OTBM13 mean — — 42.9 9.3 43.9 8.1 .941 8.68 0.82 33.3 45.6 57.9 88.6% 
— OTBM13 SD (IIV) — — 11.3 9.2 9.9 7.7 .838 1.52 0.83 5 = 5.7 10 = 9.8 15 = 14.0 60.7% 
Table 7.

Results from patient data for each of the 13 test scores examined, as well as the overall test battery mean (OTBM13). The independent variables in the regression equations are MA-Reg's normative T scores and the dependent variables are Meyers’ normative T scores

 Patients data set: Test name | Raw scores: M, SD | Independent var. MA-Reg T scores: M, SD | Dependent var. Meyers T scores: M, SD | Regression results: r, Intercept, Slope | DV scores that correspond to IV scores of T = 30, T = 45, T = 60 | Var. acc.: R2 
Category Test 55.6 32.1 43.2 14.9 43.2 12.7 .872 10.90 0.75 33 45 56 76.1% 
Animal Naming 19.8 6.0 43.0 13.8 48.1 12.2 .957 11.94 0.84 37 50 62 91.7% 
COWA 33.1 11.9 41.1 10.3 42.4 10.4 .971 2.35 0.97 32 46 61 94.2% 
Boston Naming Test 51.9 7.2 40.0 16.4 41.3 12.3 .934 13.41 0.70 34 45 55 87.3% 
VLT Trial 1 6.1 1.9 45.4 10.9 45.9 12.5 .960 −4.17 1.10 29 45 62 92.1% 
VLT Trial 5 11.6 2.9 45.4 12.1 45.4 12.1 .960 −7.56 1.17 27 45 63 92.1% 
VLT Total Score 47.2 11.6 42.7 12.7 43.8 15.0 .936 −3.33 1.11 30 46 63 87.6% 
VLT Immed. FR 9.7 3.7 44.1 12.0 45.4 14.4 .981 −6.15 1.17 31 46 64 96.2% 
VLT Delayed FR 9.4 3.6 42.5 17.2 45.1 13.9 .953 12.32 0.77 35 47 59 90.9% 
10 Trails A 33.3 21.6 45.8 15.0 44.6 12.4 .921 9.70 0.76 33 44 55 84.9% 
11 Trails B 81.4 45.4 45.0 16.6 45.7 12.7 .901 14.65 0.69 35 46 56 81.2% 
12 Finger Tap Dom 45.5 9.5 39.4 14.7 43.5 10.1 .928 18.35 0.64 37 47 57 86.1% 
13 Finger Tap Ndom 41.8 10.2 40.8 15.4 45.0 11.7 .930 16.09 0.71 37 48 59 86.4% 
— Column averages — — 42.9 14.4 44.6 12.6 .945 6.81 0.90 33.1 46.2 59.4 89.3% 
— OTBM13 Mean — — 42.9 9.3 44.6 8.5 .980 6.16 0.90 33.0 46.4 59.9 96.1% 
— OTBM13 SD (IIV) — — 11.3 9.2 10.1 7.9 .779 2.81 0.64 5 = 6.0 10 = 9.2 15 = 12.4 60.7% 
Table 8.

Results from volunteer data for each of the 11 test scores examined, as well as the overall test battery mean (OTBM11). The independent variables in the regression equations are Meyers' normative T scores and the dependent variables are Heaton's normative T scores

 Volunteers data set: Test name | Raw scores: M, SD | Independent var. Meyers T scores: M, SD | Dependent var. Heaton T scores: M, SD | Regression results: r, Intercept, Slope | DV scores that correspond to IV scores of T = 30, T = 45, T = 60 | Var. acc.: R2 
Category Test 32.0 19.7 49.7 12.7 45.6 12.6 .949 −1.35 0.94 27 41 55 90.0% 
COWA 37.9 10.0 43.3 9.0 43.0 9.3 .900 2.90 0.93 31 45 59 81.0% 
VLT Trial 1 7.0 1.7 44.6 10.4 44.1 9.8 .967 2.07 0.89 29 42 56 93.6% 
VLT Trial 5 12.2 1.9 41.4 9.7 45.2 10.2 .953 3.70 1.00 34 49 64 90.8% 
VLT Total Score 51.9 8.8 40.5 11.3 43.7 11.0 .945 6.65 0.91 34 48 61 89.3% 
VLT Immed. FR 11.2 2.4 45.6 9.3 46.3 9.9 .968 −0.54 1.03 30 46 61 93.7% 
VLT Delayed FR 11.3 2.6 45.3 9.4 46.4 10.9 .939 −2.77 1.08 30 46 62 87.7% 
Trails A 21.9 7.9 52.7 12.6 53.0 12.0 .979 3.91 0.93 32 46 60 95.8% 
Trails B 46.4 14.0 55.4 10.2 55.7 9.6 .979 4.58 0.92 32 46 60 95.8% 
10 Finger Tap Dom 48.5 8.3 48.4 9.5 49.4 11.7 .970 −8.18 1.19 28 45 63 94.1% 
11 Finger Tap Ndom 46.0 6.0 53.0 7.9 51.1 9.3 .972 −10.21 1.15 24 42 59 94.4% 
— Column Averages — — 47.3 10.3 47.6 10.6 .959 0.02 1.00 30.0 45.0 60.0 91.5% 
— OTBM11 mean — — 47.3 5.7 47.6 6.0 .971 −0.79 1.02 29.9 45.2 60.6 94.2% 
— OTBM11 SD (IIV) — — 10.1 7.9 10.1 7.9 .905 1.45 0.84 5 = 5.6 10 = 9.8 15 = 14.0 82.0% 
Table 9.

Results from the volunteers' data for each of the 11 test scores examined, as well as the overall test battery mean (OTBM11). The independent variables in the regression equations are MA-Reg's normative T scores and the dependent variables are Heaton's normative T scores

Volunteers data set. Independent variable (IV): MA-Reg T scores; dependent variable (DV): Heaton T scores. Columns give the raw scores, the IV and DV T scores, the regression results, the DV scores that correspond to IV T scores of 30, 45, and 60, and the variance accounted for.
Test name Raw M Raw SD IV M IV SD DV M DV SD r Intercept Slope T = 30 T = 45 T = 60 R2
Category Test 41.7 25.6 43.6 13.8 45.6 12.6 .930 8.51 0.85 34 47 60 86.5% 
COWA 37.9 9.8 45.9 9.0 43.0 9.3 .952 −2.39 0.99 27 42 57 90.7% 
VLT Trial 1 7.0 2.0 47.9 11.3 44.1 9.8 .983 3.40 0.85 29 42 54 96.6% 
VLT Trial 5 12.3 2.0 44.9 13.4 45.2 10.2 .964 12.41 0.73 34 45 56 93.0% 
VLT Total Score 51.8 8.8 50.8 9.0 43.7 11.0 .952 −15.06 1.16 20 37 54 90.7% 
VLT Immed. FR 11.1 2.6 46.5 10.1 46.3 9.9 .983 1.57 0.96 30 45 59 96.5% 
VLT Delayed FR 11.5 2.4 45.2 12.9 46.4 10.9 .977 8.89 0.83 34 46 59 95.4% 
Trails A 21.9 7.8 53.3 10.1 53.0 12.0 .895 −3.52 1.06 28 44 60 80.3% 
Trails B 46.4 14.0 54.4 7.7 55.7 9.6 .854 −2.67 1.07 30 46 62 73.0% 
Finger Tap Dom 48.5 8.3 45.3 17.6 49.4 11.7 .983 19.78 0.66 39 49 59 96.6% 
Finger Tap Ndom 46.0 6.0 48.4 12.4 51.1 9.3 .974 15.82 0.73 38 49 60 94.8% 
— Column Averages — — 47.8 11.9 47.6 10.6 .961 4.35 0.91 31.3 44.7 58.2 90.3% 
— OTBM11 mean — — 47.8 6.4 47.6 6.0 .961 4.69 0.90 31.6 45.0 58.5 92.3% 
— OTBM11 SD (IIV) — — 11.1 9.0 10.1 7.9 .707 3.60 0.58 5 = 6.5 10 = 9.2 15 = 12.3 50.0% 
Table 10.

Results from the volunteers' data for each of the 11 test scores examined, as well as the overall test battery mean (OTBM11). The independent variables in the regression equations are MA-Reg's normative T scores and the dependent variables are Meyers' normative T scores

Volunteers data set. Independent variable (IV): MA-Reg T scores; dependent variable (DV): Meyers T scores. Columns give the raw scores, the IV and DV T scores, the regression results, the DV scores that correspond to IV T scores of 30, 45, and 60, and the variance accounted for.
Test name Raw M Raw SD IV M IV SD DV M DV SD r Intercept Slope T = 30 T = 45 T = 60 R2
Category Test 32.0 19.7 43.6 13.8 49.7 12.7 .894 13.92 0.82 39 51 63 79.9% 
COWA 37.9 10.0 45.9 9.0 43.3 9.0 .936 −0.08 0.94 26 41 55 87.6% 
VLT Trial 1 7.0 1.7 47.9 11.3 44.6 10.4 .935 3.73 0.85 29 42 55 87.4% 
VLT Trial 5 12.2 1.9 44.9 13.4 41.4 9.7 .964 10.34 0.69 31 42 52 93.0% 
VLT Total Score 51.9 8.8 50.8 9.0 40.5 11.3 .970 −21.25 1.22 15 33 52 94.1% 
VLT Immed. FR 11.2 2.4 46.5 10.1 45.6 9.3 .979 3.78 0.90 31 44 58 95.8% 
VLT Delayed FR 11.3 2.6 45.2 12.9 45.3 9.4 .968 13.24 0.71 35 45 56 93.7% 
Trails A 21.9 7.8 53.3 10.1 52.7 12.6 .870 −5.043 1.08 27 44 60 75.7% 
Trails B 46.4 14.0 54.4 7.7 55.4 10.2 .841 −5.54 1.12 28 45 62 70.7% 
Finger Tap Dom 48.5 8.3 45.3 17.6 48.4 9.5 .977 24.36 0.53 40 48 56 96.2% 
Finger Tap Ndom 46.0 6.0 48.4 12.4 53.0 7.9 .958 23.44 0.61 42 51 60 91.8% 
— Column Averages — — 47.8 11.9 47.3 10.3 .948 5.54 0.87 31.2 44.2 57.2 93.7% 
— OTBM11 mean — — 47.8 6.4 47.3 5.7 .965 6.32 0.86 32.0 46.8 57.7 90.9% 
— OTBM11 SD (IIV) — — 11.1 9.0 10.1 7.9 .624 4.02 0.55 5 = 6.8 10 = 9.5 15 = 12.3 38.9% 
Table 11.

Results from the patients' and volunteers' data for the overall test battery means (OTBMs), showing the magnitude of the composites and the split-half reliability of each measure by sample. Results are grouped by (a) normative system and (b) sample (lowercase “p” for patients; lowercase “v” for volunteers)

Patients and volunteers data set. Independent variable (IV): first-half OTBM; dependent variable (DV): second-half OTBM. Columns give the IV and DV means and SDs, the regression results (with Spearman–Brown correction), the DV scores that correspond to IV T scores of 30, 45, and 60, and the variance accounted for.
Measures and norms IV (first half) M DV (second half) M IV (first half) SD DV (second half) SD r Intercept Slope T = 30 T = 45 T = 60 R2
1a pOTBM13 Heaton's norms 43.7 43.9 8.9 8.3 .941 7.24 0.84 32.4 45.0 57.6 89.5% 
2a vOTBM11 Heaton's norms 47.5 47.3 6.6 6.3 .897 10.49 0.78 33.9 45.6 57.3 80.5% 
 Mean for Heaton's norms 44.6 44.7 8.4 7.9 .933 7.99 0.83 33.7 45.1 57.5 87.0% 
3a pOTBM13 MA-Reg's norms 42.7 43.2 9.8 10.0 .934 5.25 0.89 32.0 45.3 58.7 87.2% 
4a vOTBM11 MA-Reg's norms 47.8 47.1 6.5 7.3 .896 8.88 0.83 33.8 46.2 58.7 80.2% 
 Mean for MA-Reg's norms 43.9 44.1 9.1 9.4 .927 6.09 0.88 32.6 45.6 58.7 85.9% 
5a pOTBM13 Meyers' norms 44.8 45.3 8.6 9.4 .922 3.33 0.94 31.5 45.6 59.7 85.0% 
6a vOTBM11 Meyers' norms 46.6 47.6 5.9 6.6 .866 7.85 0.85 33.4 46.1 58.9 74.9% 
Mean for Meyers' norms 46.7 45.2 8.1 8.8 .911 4.37 0.92 31.6 45.7 59.5 83.0% 
1b pOTBM13 Heaton's norms 43.7 43.9 8.9 8.3 .941 7.24 0.84 32.4 45.0 57.6 89.5% 
3b pOTBM13 MA-Reg's norms 42.7 43.2 9.8 10.0 .934 5.25 0.89 32.0 45.3 58.7 87.2% 
5b pOTBM13 Meyers's norms 44.8 45.3 8.6 9.4 .922 3.33 0.94 31.5 45.6 59.7 85.0% 
— Mean for patients' data 43.7 44.1 9.1 9.3 .933 5.27 0.89 32.0 45.3 58.7 89.0% 
2b vOTBM11 Heaton's norms 47.5 47.3 6.6 6.3 .897 10.49 0.78 33.9 45.6 57.3 80.5% 
4b vOTBM11 MA-Reg's norms 47.8 47.1 6.5 7.3 .896 8.88 0.83 33.8 46.2 58.7 80.2% 
6b vOTBM11 Meyers' norms 46.6 47.6 5.9 6.6 .866 7.85 0.85 33.4 46.1 58.9 74.9% 
— Mean for volunteers' data 47.3 47.3 6.3 6.7 .887 9.07 0.82 33.7 46.0 58.3 79.0% 
Fig. 1.

Distributions of scores for the composite overall test battery mean (i.e., OTBM13) generated from 13 measures using the patient sample data (n = 330). Each normative system is shown: Heaton norms as a black dotted line, MA-Reg norms as a light gray solid line, and Meyers norms as a dark gray solid line.

Fig. 2.

Patient sample (n = 330) results, showing the Overall Test Battery Mean generated from 13 scores (i.e., OTBM13) regressed across the various normative systems.

Fig. 3.

Example of results from the patient sample data (n = 330), showing the least variable (i.e., COWA FAS/CFL) and the most variable (i.e., Boston Naming Test) associations.

Fig. 4.

Distributions of scores for the composite Overall Test Battery Mean (i.e., OTBM11) generated from 11 measures using the volunteer sample data (n = 99). Each normative system is shown: Heaton norms as a black dotted line, MA-Reg norms as a light gray solid line, and Meyers norms as a dark gray solid line.

Fig. 5.

Volunteer sample (n = 99) results, showing the Overall Test Battery Mean generated from 11 scores (i.e., OTBM11) regressed across the various normative systems.

Fig. 6.

Example of results from the volunteer sample (n = 99), showing the least variable (i.e., VLTim) and the most variable (i.e., Trails B) associations.

Fig. 7.

Data from the patient sample (n = 330), using multiple regression to predict scores, which resulted in near-perfect prediction, with regression equations that have a slope of 1.00 and an intercept of 0.00. Standardized regression coefficients are also presented for each of the four demographic variables (i.e., age, education, race, and sex); showing the influence the variables have on the regression results.

Fig. 8.

Data from the volunteer sample (n = 99), using multiple regression to predict scores, which resulted in near-perfect prediction, with regression equations that have a slope of 1.00 and an intercept of 0.00. Standardized regression coefficients are also presented for each of the four demographic variables (i.e., age, education, race, and sex); showing the influence the variables have on the regression results.

Standard error of the estimate

Table 4 shows the regression coefficients (r) and standard errors of the estimate (SEest) for each normative comparison: (a) Meyers/Heaton, (b) MA-Reg/Heaton, and (c) MA-Reg/Meyers. Zheng and Agresti (2000) have shown that, in some circumstances, a measure with a smaller estimated bias but a larger SEest is generally a poorer predictor of the population correlation than a measure with a smaller SEest. Consistent with this, the majority of the Meyers/Heaton and MA-Reg/Meyers comparisons showed higher correlations and lower SEest than the MA-Reg/Heaton comparisons. Exceptions included the Boston Naming Test (BNT), which was excluded from the Meyers/Heaton comparison, and the Finger Tapping Non-dominant (FTND) and Category Test scores from the MA-Reg/Meyers comparison, which revealed lower correlations than the MA-Reg/Heaton comparison; even so, their SEest values were lower than those from the MA-Reg/Heaton comparison. Furthermore, the OTBM correlations with the lowest SEest were generated by the Meyers/Heaton and MA-Reg/Meyers regressions, rather than the MA-Reg/Heaton regression.
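
To make these quantities concrete, the following is a minimal sketch (not the authors' code) of how r, the regression line, and the SEest can be computed when one system's T scores are used to predict another's; the simulated arrays are hypothetical stand-ins for, e.g., MA-Reg and Heaton T scores on a single test.

```python
# Minimal sketch (not the authors' code): predicting one normative system's
# T scores from another's and computing r, the regression line, and SEest.
import numpy as np

def regression_summary(iv_t, dv_t):
    """Return r, intercept, slope, and the standard error of the estimate."""
    iv_t, dv_t = np.asarray(iv_t, float), np.asarray(dv_t, float)
    n = len(iv_t)
    r = np.corrcoef(iv_t, dv_t)[0, 1]
    slope = r * dv_t.std(ddof=1) / iv_t.std(ddof=1)
    intercept = dv_t.mean() - slope * iv_t.mean()
    residuals = dv_t - (intercept + slope * iv_t)
    se_est = np.sqrt(np.sum(residuals ** 2) / (n - 2))
    return r, intercept, slope, se_est

# Hypothetical T scores for one test under two normative systems
rng = np.random.default_rng(0)
ma_reg_t = rng.normal(45, 12, size=330)
heaton_t = 5 + 0.9 * ma_reg_t + rng.normal(0, 4, size=330)
print(regression_summary(ma_reg_t, heaton_t))
```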

A comparative analysis was also conducted on the SEest values shown in Table 4. Larger SEest values are associated with more outliers, weaker associations, and thus less consistency. The Meyers system produced the smallest (i.e., best) average SEest of the three systems: its comparisons had the smallest SEest for 50% of the comparisons and the second smallest for 17%, and were the largest (i.e., worst) of the three systems for 33% of the comparisons. In contrast, the MA-Reg system's comparisons were the largest for 37% of the comparisons, second largest for 42%, and smallest for 21%. Finally, the Heaton system's SEest was ranked the largest for only 12% of the comparisons (clearly lower than the other two systems), the second largest for 42%, and the smallest for 46%. Differences in these associations among the systems were also compared using a Kruskal–Wallis test for ranks, which revealed a significant difference, H = 6.42, p = .0403. Pairwise comparisons indicated that the Meyers system's comparisons were very similar in magnitude to those involving the MA-Reg (M rank = 34) and Heaton systems (M rank = 33), whereas the comparisons between the Heaton and MA-Reg systems showed significantly weaker associations (M rank = 47).
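
The ranking-plus-Kruskal–Wallis logic can be illustrated with a minimal sketch (not the authors' code); the `se_est` values below are hypothetical placeholders for the per-test SEest of each pairwise system comparison.

```python
# Minimal sketch (not the authors' code): ranking SEest across the three pairwise
# system comparisons and testing the values with a Kruskal-Wallis test.
import numpy as np
from scipy import stats

se_est = {
    "Meyers/Heaton": [2.1, 2.4, 1.9, 2.8, 2.2],   # hypothetical SEest per test score
    "MA-Reg/Meyers": [2.3, 2.5, 2.0, 2.9, 2.4],
    "MA-Reg/Heaton": [2.9, 3.1, 2.7, 3.4, 3.0],
}

h_stat, p_value = stats.kruskal(*se_est.values())
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")

# Rank the three comparisons within each test score (1 = smallest SEest)
ranks = stats.rankdata(np.array(list(se_est.values())), axis=0)
for name, row in zip(se_est, ranks):
    print(name, row)
```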

Regression coefficients

Comparing the various regression coefficients with one another revealed a pattern similar to, but weaker than, the one found for the SEest. Specifically, the Meyers/Heaton comparisons showed the highest average degree of association (mean r = .959). The coefficients were slightly lower for the MA-Reg/Meyers comparisons (mean r = .953) and smallest for the MA-Reg/Heaton comparisons (mean r = .947). The pattern of these associations varied depending on whether the comparisons were made using the volunteers' or the patients' data. The associations did not differ from one another when only the patients' data were examined. For the volunteers, however, the associations remained significantly larger for the Meyers/Heaton comparisons (mean r = .957), slightly smaller for the MA-Reg/Meyers comparisons (mean r = .949), and smallest for the MA-Reg/Heaton comparisons (mean r = .931). Despite these differences, which were statistically significant between the patient and volunteer samples (p = .0459), all of these coefficients are extremely high, and the magnitude of the differences is quite small and of minimal clinical significance.
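
A conventional way to test a difference between correlations obtained from two independent samples is Fisher's r-to-z comparison; we cannot be certain this is the exact test used here, so the sketch below is offered only as an illustration, with the reported coefficients plugged in as hypothetical inputs.

```python
# Minimal sketch (illustrative only): Fisher r-to-z comparison of correlations
# from two independent samples (e.g., volunteers vs. patients).
import numpy as np
from scipy import stats

def compare_independent_rs(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)        # Fisher z transforms
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of the z difference
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))                  # two-tailed p-value
    return z, p

# Hypothetical inputs in the spirit of the reported coefficients
print(compare_independent_rs(r1=0.957, n1=99, r2=0.947, n2=330))
```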

The associations generated for the Meyers/Heaton comparisons were nearly the same as those generated for the MA-Reg/Meyers comparisons. Yet the MA-Reg system was more similar to the Meyers system than to the Heaton system, making the Meyers system the “middle of the pack.” Some of these associations are presented graphically for the patient sample using the OTBM13 in Fig. 1, as well as in Fig. 2a–c. For the volunteer sample, the associations among the OTBM11 are shown in Figs. 4 and 5a–c. In addition, the patients' data are shown in Fig. 3a–f, illustrating the best (i.e., COWA) and worst (i.e., BNT) associations of the 13 measures examined. Similar data for the volunteers are shown in Fig. 6a–f, again illustrating the best (i.e., Verbal Learning Test Immediate Recall, or VLTim) and worst (i.e., Trails B) associations of the 11 measures examined.

Slopes and intercepts comparisons

Another set of analyses examined the intercepts and slopes of the regression lines across normative systems and samples. Summarizing across many analyses, all of the slopes for both samples and all three normative systems were significantly different from 0.00. The sample whose comparisons had an average slope closest to 1.00 was the volunteers (slope M = 0.919), although this did not differ significantly from the mean slope generated from the patients' data (slope M = 0.878). The comparisons whose average slope was closest to 1.00 were the Meyers/Heaton comparisons (slope M = 0.960). The average slope from the MA-Reg/Meyers associations was significantly lower than the Meyers/Heaton slope (slope M = 0.869; p = .0347), as was the average slope from the MA-Reg/Heaton results (slope M = 0.862; p = .0486). The latter two mean slopes did not differ from one another.
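
The following is a minimal sketch, not the authors' code, of how a slope can be tested against 0 (or against 1.00) and an intercept against 0 using t-tests on ordinary least-squares estimates; the T scores below are simulated, and `scipy.stats.linregress` is assumed to supply the standard errors.

```python
# Minimal sketch (not the authors' code): testing whether a slope differs from
# 0 or from 1.00, and whether an intercept differs from 0, via t-tests on the
# OLS estimates. The T scores below are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
iv_t = rng.normal(45, 10, size=99)                   # hypothetical system A T scores
dv_t = 2 + 0.95 * iv_t + rng.normal(0, 3, size=99)   # hypothetical system B T scores

res = stats.linregress(iv_t, dv_t)
df = len(iv_t) - 2
tests = {
    "slope vs. 0": res.slope / res.stderr,
    "slope vs. 1": (res.slope - 1.0) / res.stderr,
    "intercept vs. 0": res.intercept / res.intercept_stderr,
}
for label, t in tests.items():
    p = 2 * stats.t.sf(abs(t), df)
    print(f"{label}: t = {t:.2f}, p = {p:.4f}")
```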

The majority of the intercepts were significantly different from 0.00 (n = 53 of 72 comparisons, or 74%), with an average p-value of ≤.0001. However, there were twice as many nonsignificant results in the volunteer sample (36%) as in the patient sample (18%). Yet again, the magnitude of the differences was small, averaging 2.5 T-score points. Specifically, the volunteers' intercepts were, on average, closer to 0.00 than were the patients' (3.29 and 5.79, respectively). The Meyers/Heaton intercepts were significantly closer to 0.00 and smaller in magnitude (i.e., 1.36) than the MA-Reg/Heaton and MA-Reg/Meyers intercepts, which again did not differ from one another (i.e., 6.34 and 6.23, respectively).

Multiple regression

The regression analyses reported earlier (i.e., Meyers/Heaton, MA-Reg/Heaton, and MA-Reg/Meyers) were repeated using multiple regression, with the demographic variables of age, education, ethnicity, and sex entered into the analyses as additional independent variables. These results are shown in Fig. 7a–c for the patient sample and in Fig. 8a–c for the volunteer sample. As shown, adding these variables to the regression equations resulted in even smaller amounts of unexplained variance and greater cohesion among the three systems.

The additional variance explained by these multiple regression analyses ranged from a low of 1.7% (patients' data, MA-Reg/Meyers comparison) to a high of 8.5% (patients' data, MA-Reg/Heaton comparison). The average amount of additional variance explained across all comparisons was 4.5%, which was the same for volunteers and patients alike. It is worth noting that once the demographic variables were added to the regression equations, all regression coefficients exceeded .975. In addition, all of the regression equations had an estimated intercept of 0.00 and an estimated slope of 1.00. The variables that explained the most additional variance, as determined from the standardized regression coefficients, were education and race, which explained roughly equal amounts. In most cases, the variance explained by age and sex was not statistically significant, and the variance explained by education and race was twice the magnitude of that explained by age and sex.
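
A minimal sketch of how such incremental variance can be computed under ordinary least squares is shown below; the variable names and simulated data are hypothetical, not the study data.

```python
# Minimal sketch (not the authors' code): incremental R^2 when demographic
# predictors are added to the cross-system regression. All data are simulated.
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (a constant term is added automatically)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(2)
n = 330
iv_t = rng.normal(44, 12, n)                                   # system A T scores
age, educ = rng.normal(40, 15, n), rng.normal(13, 2, n)        # hypothetical demographics
race, sex = rng.integers(0, 2, n), rng.integers(0, 2, n)
dv_t = 3 + 0.9 * iv_t + 0.5 * educ + 2 * race + rng.normal(0, 3, n)  # system B T scores

r2_base = r_squared(iv_t.reshape(-1, 1), dv_t)
r2_full = r_squared(np.column_stack([iv_t, age, educ, race, sex]), dv_t)
print(f"additional variance explained by demographics: {r2_full - r2_base:.3f}")
```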

Repeated-measures ANOVA

Despite the similarities across the various measures based on the regression analyses, a repeated-measures ANOVA revealed significant differences among the three OTBMs, F(2, 428) = 47.49, p < .0001. However, the magnitudes of these differences were relatively small, with the Heaton, Meyers, and MA-Reg OTBMs equal to 44.7, 45.2, and 44.1, respectively. The corresponding effect sizes were −0.06 when the Heaton and Meyers systems were compared, 0.07 when the Heaton and MA-Reg systems were compared, and 0.13 when the Meyers and MA-Reg systems were compared. The average magnitude of these differences was <1.0 T-score point. The clinical meaningfulness of differences of this magnitude is questionable.
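
For readers who wish to reproduce this kind of check, the following is a minimal sketch (not the authors' code) of a one-way repeated-measures ANOVA across each participant's three OTBMs, together with a paired effect size; the simulated OTBM values are hypothetical.

```python
# Minimal sketch (not the authors' code): one-way repeated-measures ANOVA across
# each participant's three OTBMs, plus a paired effect size. Data are simulated.
import numpy as np

def rm_anova_oneway(data):
    """data: (n_subjects, k_conditions) array. Returns F, df1, df2."""
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_subj = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_error = np.sum((data - grand) ** 2) - ss_cond - ss_subj
    df1, df2 = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df1) / (ss_error / df2), df1, df2

rng = np.random.default_rng(3)
base = rng.normal(44.5, 8, size=(330, 1))
otbms = base + rng.normal([0.2, 0.7, -0.4], 1.0, size=(330, 3))  # hypothetical Heaton/Meyers/MA-Reg

F, df1, df2 = rm_anova_oneway(otbms)
print(f"F({df1}, {df2}) = {F:.2f}")

diff = otbms[:, 0] - otbms[:, 1]                  # e.g., Heaton minus Meyers
print(f"paired d = {diff.mean() / diff.std(ddof=1):.2f}")
```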

Discussion

We began by asking the question, “Is co-norming required?” More specifically, how different are individual T scores and/or summary indices when they are generated from co-normed versus independently normed tests? Is the use of a co-normed fixed battery necessary for good clinical practice? As cited earlier, some have claimed (e.g., Russell, 2004; Schretlen et al., 2010; White & Stern, 2003) that valid clinical decision making and treatment planning require clinicians to use a fixed battery of tests and a set of norms generated from the same sample. We cited literature on the long-standing debate within clinical neuropsychology over the use of fixed versus flexible batteries. In the 1980s, the most popular method of neuropsychological assessment was the fixed battery (Reitan & Wolfson, 1985). Embedded in the fixed/flexible battery debate is the belief that an essential benefit of fixed batteries is that they can be co-normed. Flexible batteries, by definition, are an amalgam of various stand-alone tests whose normative samples may not be reasonably similar to one another. Thus, it has been assumed that by mixing various normative samples and averaging scores, one is likely to come to erroneous impressions of patients' actual cognitive status (e.g., see Schneider & McGrew, 2011). The source of this error variance, that is, differences in the characteristics of the normative samples, is presumed to introduce error into the interpretation of the test scores. In fact, Russell (2004), Hom (2008), and Faust (2007) have all asserted that the use of flexible batteries, and particularly the use of tests that have not been co-normed, is of limited clinical value and might even harm patients. We find such bold statements too extreme based on the data we present here. Furthermore, to our knowledge, these researchers have never empirically evaluated their assumptions. Had they done so, they might have found what we found: negligible differences among the normative systems, at least the ones we have examined thus far. It is doubtful that two clinicians examining the same raw data would come to different conclusions based on the use of two different normative systems.

Specifically, our results indicate that the methods used by Mitrushina and colleagues (2005), Meyers (1995), and Meyers and Volbrecht (2003) closely replicate the co-normed system of Heaton and colleagues (1991, 2004). In fact, the differences are so minor that they can hardly be perceived with the naked eye, as evidenced in Figs. 1 and 4, which show the patients' and volunteers' distributions of scores on the OTBM. Furthermore, multiple regression analyses that also included the demographic variables could explain, on average, only an additional 4.5% of the variance among the three systems. It is not possible, however, to conclude definitively that any one system is better than another. It is conceivable that one system may have overcorrected for certain demographic variables, and simply finding a way to bring two systems into greater agreement does not tell us which of them is more accurate. Based on our results, we believe the Meyers system to be more accurate, as it falls between the other two systems on a large number of the metrics used to gauge the quality of the systems; moreover, the other two systems fall further from one another than either does from the Meyers system. However, this is a hypothesis that requires more testing to substantiate. We are working on a follow-up study in which we will present differences in diagnostic outcome, to address the question of how much different methods of determining impairment are influenced by the normative system used.

What we find remarkable about our analyses is the degree of concordance, which emerged despite the many coding problems we faced and the corrections we had to implement to complete the analyses. To reiterate: different versions of tests were administered to subjects and to the normative samples (e.g., AVLT vs. CVLT; HCT vs. VCT vs. BCT; COWA used FAS as well as CFL); different numbers of tests were in common for the two samples (i.e., OTBM13 for patients and OTBM11 for volunteers); some test scores had to be prorated because of differences in the number of test items (e.g., AVLT, CVLT, and BCT); we collapsed various minority participants into a single minority group; and we examined samples that were quite different from one another (i.e., referred patients and normal college students) with respect to age, health status, and premorbid functioning. When our results are placed in this context, it is rather amazing that we obtained the level of agreement we did across the three systems.
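
Proration of this kind is typically a simple linear rescaling; the sketch below illustrates one such rule purely as an assumption, since the exact proration procedure is not spelled out here.

```python
# Minimal sketch of a simple linear proration rule (our assumption; the exact
# proration procedure used in the study is not detailed here).
def prorate(raw_score, items_administered, items_in_normed_version):
    """Rescale a raw total to the item count of the normed version of the test."""
    return raw_score * (items_in_normed_version / items_administered)

# e.g., a 15-item list-learning total mapped onto a 16-item normed version
print(prorate(raw_score=47, items_administered=15, items_in_normed_version=16))
```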

Taking another perspective, one we have not heard others express, the issue of co-norming relates to the concept of statistical power and the benefits of oversampling. By accumulating data from several samples and adjusting scores based on each sample's demographic makeup, the results are not dependent on any one sampling methodology or model of the population. When one encounters extreme scores in a single normative study, as was revealed for the original CVLT (e.g., Weins, Tindall, & Crossen, 1994) and the Stanford-Binet Fourth Edition (e.g., Youngstrom, Glutiing, & Watkins, 2003), the impact on the clinician's score interpretation is much greater than it would be if any one of 30–50 aggregated studies were in error. In meta-regression, such as was used by Mitrushina and colleagues (2005), sampling error in any one study (i.e., noise in the system) is likely to be compensated for by sampling error in another study that runs in the opposite direction, because such errors are likely to be randomly distributed across study samples. Logically, the result is that the noise in each included study tends to cancel out, whereas the signal present in all the study samples tends to be amplified. If one follows this argument, it leads to the opposite conclusion from the one promulgated by enthusiasts of co-norming: be careful about putting too much faith in any single co-normed system without fully appreciating what might go wrong if that system were unrepresentative of the population from which your patients are drawn.
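
A small simulation can illustrate this cancellation argument; the sketch below is purely illustrative (the number of studies, sample sizes, and population parameters are hypothetical).

```python
# Minimal, purely illustrative simulation (not the authors' analysis): sampling
# error in any single normative study is larger, on average, than the error of
# a norm pooled across many studies. All parameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
pop_mean, pop_sd = 50.0, 10.0
n_studies, n_per_study = 40, 60

study_means = [rng.normal(pop_mean, pop_sd, n_per_study).mean() for _ in range(n_studies)]
single_study_error = np.mean([abs(m - pop_mean) for m in study_means])
pooled_error = abs(np.mean(study_means) - pop_mean)

print(f"mean |error| of a single normative study: {single_study_error:.2f}")
print(f"|error| of the pooled (meta) norm:        {pooled_error:.2f}")
```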

Although our results might lead one to conclude that the differences in T scores across the various normative systems are of a magnitude that is unimportant, this remains an empirical question. The importance of a difference between normative systems depends on whether a difference of that magnitude changes the diagnostic outcome; small differences in norms might make for big differences in outcomes. Therefore, as noted, we are in the process of completing follow-up analyses with these and other data, using different diagnostic decision-making models to assess how much the choice of normative system changes the outcome. We are also examining whether the pattern of scores for an individual varies across normative systems in a way that might lead one to incorrectly diagnose a cause of the presumed deficits (i.e., a false positive). To our knowledge, no one has yet examined this empirical question. Thus, despite the high degree of congruence we found across normative systems, we believe the jury is still out as to how much the use of these various systems might lead to differences in diagnosis and concomitant treatment.

In summary, the data we present do not support the demand that the tests used by clinical neuropsychologists in flexible batteries must be co-normed, as is done with fixed batteries. Our data are not entirely conclusive, but they should give pause to those who have called for the use of only fixed batteries and/or those who demand that the tests used in a patient assessment come from co-normed batteries. Rather, consider the claim made by Rohling and colleagues (2004) a decade ago:

We expect, in most cases with real patients assessed by two different clinicians, that the correlation between the two OTBMs would likely be greater than .90. Substantiating this point, in Heaton et al.'s study (Heaton et al., 2001) of patients suffering from schizophrenia, these authors obtained a test–retest reliability coefficient of .97 for their Global Neurological T-Score. Of course, they most likely were comparing the results of two nearly identical test batteries, rather than our worst-case scenario (p. 167).

Thus, what Rohling and colleagues (2004) predicted proved to be true. When we compared the results from the Meyers and MA-Reg normative systems with those generated using the Heaton system, the OTBM regression coefficients were .98 and .94, respectively, with a difference in means of <1.0 T-score point (43.9, 44.6, and 42.9, respectively). These correlations not only exceed the predicted .90, but are equivalent to the test–retest reliability of the Heaton system itself. Such results clearly challenge the assumption that co-normed batteries are necessary for ethical and valid clinical practice. In fact, it appears that co-norming is not required for clinicians to make sound judgments about patients' data, as long as adequate normative systems are used when converting patients' raw test scores to a common metric for subsequent analysis.

Funding

None of the authors received funding from any entity to conduct this research.

Conflict of Interest

None declared.

References

American Academy of Clinical Neuropsychology (AACN). (2008). Amicus brief submitted on behalf of the American Academy of Clinical Neuropsychology and in support of plaintiff, Baxter v. Temple, 949 A.2d 167 (N.H. May 20, 2008).
Army Individual Test Battery. (1944). Manual of directions and scoring. Washington, DC: War Department, Adjutant General's Office.
Axelrod, B. N., & Wall, J. R. (2007). Expectancy of impaired neuropsychological test scores in a non-clinical sample. International Journal of Neuroscience, 117, 1591–1602.
Barona, A., Reynolds, C. R., & Chastain, R. (1984). A demographically based index of premorbid intelligence for the WAIS-R. Journal of Consulting and Clinical Psychology, 52, 885–887.
Benton, A. L., Hamsher, S. K., & Sivan, A. B. (1994). Multilingual aphasia examination manual for instruction (2nd ed.). Iowa City, IA: AJA Associates.
Bigler, E. D. (2007). A motion to exclude and the “fixed” vs. “flexible” battery in forensic neuropsychology: Challenges to the practice of clinical neuropsychology. Archives of Clinical Neuropsychology, 22, 45–51.
Calamia, M., Markon, K., & Tranel, D. (2014). The robust reliability of neuropsychological measures: Meta-analyses of test–retest correlations. The Clinical Neuropsychologist, 27, 1077–1105.
DeFilippis, N. A., & McCampbell, E. (1997). Booklet Category Test (BCT) (2nd ed.). Odessa, FL: Psychological Assessment Resources.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (1987). The California Verbal Learning Test. San Antonio, TX: Psychological Corporation.
Fields, K., Hill, B. D., Corley, E., Russ, K., Boettcher, A., Musso, M., et al. (2013, October). 1, 2, 3, 4… I declare norms war! Comparison of the Mitrushina meta-norms and Heaton and Comprehensive Neuropsychological Normative System demographic norms in a matched African-American and white sample. Poster presented at the 33rd Annual Meeting of the National Academy of Neuropsychology, San Diego, CA.
Halstead, W. (1947). Brain and intelligence. Chicago: University of Chicago Press.
Hasselblad, V., & Hedges, L. V. (1995). Meta-analysis of screening and diagnostic tests. Psychological Bulletin, 117, 167–178.
Heaton, R. K., Gladsjo, J. A., Palmer, B. W., Kick, J., Marcotte, T. D., & Jeste, D. V. (2001). Stability and course of neuropsychological deficits in schizophrenia. Archives of General Psychiatry, 58, 24–32.
Heaton, R. K., Grant, I., & Matthews, C. G. (1991). Comprehensive norms for an expanded Halstead-Reitan Battery: Demographic corrections, research findings, and clinical applications. Odessa, FL: Psychological Assessment Resources.
Heaton, R. K., Miller, W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: Psychological Assessment Resources.
Hill, B. D., Boettcher, A., Cary, B., Kline, J., Womble, M., & Rohling, M. L. (2013, February). Much ado about norming: A comparison of the Heaton demographically adjusted norms and Mitrushina meta-norms. Poster presented at the 41st Annual Meeting of the International Neuropsychological Society, Waikoloa, HI.
Hill, B. D., Rohling, M. L., Boettcher, A. C., & Meyers, J. E. (2013). Cognitive intra-individual variability has a positive association with traumatic brain injury severity and suboptimal effort. Archives of Clinical Neuropsychology, 28, 640–648.
Holdnack, J. A. (2001). Wechsler Test of Adult Reading: WTAR. San Antonio, TX: Pearson.
Holdnack, J. A., & Drozdick, L. W. (2009). Advanced clinical solutions: Clinical and interpretive manual. San Antonio, TX: Pearson.
Hom, J. (2008). Commentary: Response to Bigler (2007). The sky is not falling. Archives of Clinical Neuropsychology, 23, 125–128.
Irwig, L., Tosteson, A. N., Gatsonis, C., Lau, L., Colditz, G., Chalmers, T. C., et al. (1994). Guidelines for meta-analyses evaluating diagnostic tests. Annals of Internal Medicine, 120, 667–676.
Jastak, S. R., & Wilkinson, G. S. (1984). Wide Range Achievement Tests—Revised. Wilmington, DE: Jastak Associates.
Kaplan, E., Goodglass, H., & Weintraub, S. (1983). The Boston Naming Test. Philadelphia, PA: Lea & Febiger.
Kern, R. S., Nuechterlein, K. H., Green, M. F., Baade, L. E., Fenton, W. S., Gold, J. M., et al. (2008). The MATRICS Consensus Cognitive Battery, Part 2: Co-norming and standardization. American Journal of Psychiatry, 165, 214–220.
Koenig, K. A., Frey, M. C., & Detterman, D. K. (2008). ACT and general cognitive ability. Intelligence, 36, 153–160.
Kozel, J. J., & Meyers, J. E. (1998). A cross-validation study of the Victoria Revision of the Category Test. Archives of Clinical Neuropsychology, 13, 327–332.
Labreche, T. M. (1983). The Victoria Revision of the Halstead Category Test. Unpublished doctoral dissertation, University of Victoria, BC, Canada.
Larrabee, G. J. (2008). Commentary: Flexible vs. fixed batteries in forensic neuropsychological assessment: Reply to Bigler and Hom. Archives of Clinical Neuropsychology, 23, 763–776.
Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological evaluation. New York: Oxford University Press.
Meyers, J. E. (1995). Manual for the Meyers Neuropsychological Battery (MNB). Retrieved June 18, 2015, from www.meyersneuropsychological.com.
Meyers, J. E. (2007). Malingering mild traumatic brain injury: Behavioral approaches used by both malingering actors and probable malingerers. In K. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press.
Meyers, J. E., & Rohling, M. L. (2004). Validation of the Meyers Short Battery on mild TBI patients. Archives of Clinical Neuropsychology, 19, 637–651.
Meyers, J. E., & Rohling, M. L. (2009). CT and MRI correlations with neuropsychological tests. Applied Neuropsychology, 16, 237–253.
Meyers, J. E., & Volbrecht, M. E. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.
Meyers, J. E., Volbrecht, M., Axelrod, B. N., & Reinsch-Boothby, L. (2011). Embedded symptom validity tests and overall neuropsychological test performance. Archives of Clinical Neuropsychology, 26, 8–15.
Miller, L. S., & Rohling, M. L. (2001). A statistical interpretive method for neuropsychological test data. Neuropsychology Review, 11, 143–169.
Mitrushina, M., Boone, K. B., Razani, J., & D'Elia, L. F. (2005). Handbook of normative data for neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Moore, D., McCabe, G., & Craig, B. (2012). Introduction to the practice of statistics (7th ed.). New York: W.H. Freeman & Company.
Putnam, S. (1989). The TCN Salary Survey: A salary survey of neuropsychologists. The Clinical Neuropsychologist, 3, 97–115.
Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology, 20, 33–65.
Reitan, R. M., & Wolfson, D. (1985). The Halstead-Reitan Neuropsychological Test Battery: Theory and interpretation. Tucson, AZ: Neuropsychology Press.
Rey, A. (1941). L'examen psychologique dans les cas d'encéphalopathie traumatique. Archives de Psychologie, 28, 21.
Rohling, M. L., Axelrod, B. N., & Wall, J. R. (2008, June). What impact does co-norming have on neuropsychological test scores: Myth vs. reality. Poster presented at the sixth annual meeting of the American Academy of Clinical Neuropsychology, Boston, MA. The Clinical Neuropsychologist, 22, 384–454.
Rohling, M. L., Langhinrichsen-Rohling, J., & Miller, L. S. (2003). Statistical methods for determining malingering: Rohling's interpretive method. In R. Franklin (Ed.), Prediction in forensic and neuropsychology: Sound statistical practices (pp. 171–207). Mahwah, NJ: Lawrence Erlbaum Associates.
Rohling, M. L., Miller, L. S., & Langhinrichsen-Rohling, J. (2004). Rohling's interpretive method for neuropsychological data analysis: A response to critics. Neuropsychology Review, 14, 155–169.
Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin, 118, 183–192.
Ross, S. R., Putnam, S. H., Millis, S. R., Adams, K. M., & Krukowski, R. A. (2006). Detecting insufficient effort using the Seashore Rhythm and Speech-Sounds Perception Test in head injury. The Clinical Neuropsychologist, 20, 789–815.
Russell, E. W. (2004). The operating characteristics of the major HRNES-R measures. Archives of Clinical Neuropsychology, 19, 1043–1061.
Russell, E. W. (2012). Scientific foundations of neuropsychological assessment: With application to forensic evaluation (pp. 297–298). Waltham, MA: Elsevier.
Russell, E. W., Russell, S. L. K., & Hill, B. D. (2005). The fundamental psychometric status of neuropsychological batteries. Archives of Clinical Neuropsychology, 20, 785–794.
Schneider, J., & McGrew, K. (2011). Applied psychometrics 101 – #10: “Just say no” to averaging IQ subtest scores. St. Josephs, MN: Institute of Applied Psychometrics.
Schretlen, D. J., Testa, S. M., & Pearlson, G. D. (2010). Calibrated Neuropsychological Normative System professional manual. Lutz, FL: Psychological Assessment Resources.
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561.
Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests: Administration, norms, and commentary (2nd ed.). New York: Oxford University Press.
Stern, R. A., & White, T. (2003). Neuropsychological Assessment Battery: Administration, scoring, and interpretation manual. Lutz, FL: Psychological Assessment Resources.
Sweet, J. J., Meyers, D. G., Nelson, N. W., & Moberg, P. J. (2011). The TCN/AACN 2010 “Salary Survey”: Professional practices, beliefs, and incomes of U.S. neuropsychologists. The Clinical Neuropsychologist, 25, 12–61.
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Allyn and Bacon.
Uttle, B. (2002). North American Adult Reading Test: Age norms, reliability, and validity. Journal of Clinical and Experimental Neuropsychology, 24, 1123–1137.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale—Fourth Edition. San Antonio, TX: Pearson.
Wechsler, D. (2009a). Wechsler Individual Achievement Test—Third Edition. San Antonio, TX: Pearson.
Wechsler, D. (2009b). Wechsler Memory Scale—Fourth Edition. San Antonio, TX: Pearson.
Weins, A. N., Tindall, A. G., & Crossen, J. R. (1994). California Verbal Learning Test: A normative data study. The Clinical Neuropsychologist, 8, 75–90.
White, T., & Stern, R. A. (2003). Neuropsychological Assessment Battery: Psychometric and technical manual (pp. 1–6). Lutz, FL: Psychological Assessment Resources.
Youngstrom, W. A., Glutiing, J. J., & Watkins, M. W. (2003). Stanford-Binet Intelligence Scale: Fourth Edition (SB4): Evaluating the empirical bases for interpretations. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment: Intelligence, aptitude, and achievement (2nd ed., pp. 217–242). New York: Guilford Press.
Zheng, B., & Agresti, A. (2000). Summarizing the predictive power of a generalized linear model. Statistics in Medicine, 19, 1771–1781.