This joint position paper of the American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology sets forth our position on appropriate standards and conventions for computerized neuropsychological assessment devices (CNADs). In this paper, we first define CNADs and distinguish them from examiner-administered neuropsychological instruments. We then set forth position statements on eight key issues relevant to the development and use of CNADs in the healthcare setting. These statements address (a) device marketing and performance claims made by developers of CNADs; (b) issues involved in identifying appropriate end-users for administration and interpretation of CNADs; (c) technical (hardware/software/firmware) issues; (d) privacy, data security, identity verification, and testing environment; (e) psychometric development issues, especially reliability and validity; (f) cultural, experiential, and disability factors affecting examinee interaction with CNADs; (g) use of computerized testing and reporting services; and (h) the need for checks on response validity and effort in the CNAD environment. This paper is intended to provide guidance for test developers and users of CNADs that will promote accurate and appropriate use of computerized tests in a way that maximizes clinical utility and minimizes risks of misuse. The positions taken in this paper are put forth with an eye toward balancing the need to make validated CNADs accessible to otherwise underserved patients with the need to ensure that such tests are developed and utilized competently, appropriately, and with due concern for patient welfare and quality of care.
The use of computerized neuropsychological assessment devices (CNADs) is receiving increasing attention in clinical practice, research, and clinical trials. There are several potential advantages of computerized testing including: (a) the capacity to test a large number of individuals quickly; (b) ready availability of assessment services without advance notice; (c) the ability to measure performance on time-sensitive tasks, such as reaction time, more precisely; (d) potentially reduced assessment times through the use of adaptive testing protocols; (e) reduced costs relating to test administration and scoring; (f) ease of administering measures in different languages; (g) automated data exporting for research purposes; (h) increased accessibility to patients in areas or settings in which professional neuropsychological services are scarce; and (i) the ability to integrate and automate interpretive algorithms such as decision rules for determining impairment or statistically reliable change.
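As an illustration of point (i), a decision rule for statistically reliable change can be automated in a few lines of code. The sketch below implements a reliable change index in the Jacobson-Truax tradition; the score, reliability, and standard deviation values shown are illustrative placeholders, not parameters from any actual CNAD:

```python
import math

def reliable_change_index(score_1, score_2, sd, r_xx):
    """Jacobson-Truax reliable change index (RCI).

    score_1, score_2 : baseline and retest scores
    sd               : standard deviation of the normative sample
    r_xx             : test-retest reliability coefficient
    """
    sem = sd * math.sqrt(1.0 - r_xx)    # standard error of measurement
    se_diff = math.sqrt(2.0) * sem      # standard error of the difference
    return (score_2 - score_1) / se_diff

# Example: baseline 100, retest 90, normative SD 15, reliability .80.
# |RCI| > 1.96 would flag a statistically reliable change at p < .05.
rci = reliable_change_index(100, 90, 15, 0.80)
print(round(rci, 2))  # -1.05
```

With these placeholder values the 10-point decline does not exceed the 1.96 threshold, so an automated rule would not flag it as reliable change; a CNAD embedding such a rule would substitute its own published reliability and normative values.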
CNADs range from stand-alone computer-administered versions of established examiner-administered tests (e.g., Wisconsin Card Sorting Test) to fully web-integrated testing stations designed for general (e.g., cognitive screening) or specific (e.g., concussion evaluation and management) applications. This has been a highly active research and development area, and new tests and findings are being released continuously (Crook, Kay, & Larrabee, 2009). Researchers have used computerized neuropsychological testing with numerous clinical groups across the lifespan. Examples include children with attention-deficit hyperactivity disorder (Bolfer et al., 2010; Chamberlain et al., 2011; Gualtieri & Johnson, 2006; Polderman, van Dongen, & Boomsma, 2011) or depression (Brooks, Iverson, Sherman, & Roberge, 2010); adults with psychiatric illnesses, such as depression or bipolar disorder (Iverson, Brooks, Langenecker, & Young, 2011; Sweeney, Kmiec, & Kupfer, 2000); and adolescents and young adults who sustain sport-related concussions (Bleiberg, Garmoe, Halpern, Reeves, & Nadler, 1997; Bleiberg et al., 2004; Broglio et al., 2007; Cernich, Reeves, Sun, & Bleiberg, 2007; Collie, Makdissi, Maruff, Bennell, & McCrory, 2006; Collins, Lovell, Iverson, Ide, & Maroon, 2006; Gualtieri & Johnson, 2008; Iverson, Brooks, Collins, & Lovell, 2006; Iverson, Brooks, Lovell, & Collins, 2006; Peterson, Stull, Collins, & Wang, 2009; Van Kampen, Lovell, Pardini, Collins, & Fu, 2006).
CNADs have also been applied to adult epilepsy (Moore, McAuley, Long, & Bornstein, 2002), cardiovascular surgery (Raymond, Hinton-Bayre, Radel, Ray, & Marsh, 2006), neurocognitive problems encountered by active duty military service members and veterans (Anger et al., 1999; Marx et al., 2009; McLay, Spira, & Reeves, 2010; Retzlaff, Callister, & King, 1999; Vasterling et al., 2006), and mild cognitive impairment in older adults (Doniger et al., 2006; Dwolatzky et al., 2004; Gualtieri & Johnson, 2005; Tornatore, Hill, Laboff, & McGann, 2005; Wild, Howieson, Webbe, Seelye, & Kaye, 2008) or dementia (Doniger et al., 2005; Dorion et al., 2002; Wouters, de Koning, et al., 2009; Wouters, Zwinderman, van Gool, Schmand, & Lindeboom, 2009). Computerized tests, sometimes administered as part of a predominantly examiner-administered battery, are also used to identify poor effort within the context of a comprehensive neuropsychological evaluation (Green, Rohling, Lees-Haley, & Allen, 2001; Slick et al., 2003). The potential application of CNADs to other medical and neuropsychiatric conditions seems limited only by available knowledge and recognition of neurocognitive symptoms seen in these disorders. For this reason, clinical application of CNADs is expected to increase in the coming years.
Computerized neuropsychological assessment is currently being used in many mainstream applications to which examiner-administered neuropsychological assessment has been historically applied. This paper describes the position of the American Academy of Clinical Neuropsychology (AACN) and the National Academy of Neuropsychology (NAN) with regard to key issues in the development, dissemination, and implementation of computerized neuropsychological tests in clinical practice.
Nature and Definition of CNADs
We define a “computerized neuropsychological assessment device” as any instrument that utilizes a computer, digital tablet, handheld device, or other digital interface instead of a human examiner to administer, score, or interpret tests of brain function and related factors relevant to questions of neurologic health and illness. Although it is tempting to consider CNADs as directly comparable with examiner-administered tests, there are significant differences between the two approaches. First, it is important to recognize that even when a traditional examiner-administered test is programmed for computer administration, it becomes a new and different test. One obvious difference is in the patient interface. In examiner-centered approaches, the patient interacts with a person who presents stimuli, records verbal, motor, or written responses, and makes note of key behavioral observations. For a CNAD, examinees interact with a computer or tablet testing station through one or more alternative input devices (e.g., keyboard, voice, mouse, or touch screen), in some cases without supervision or observation by a test administrator. Also, some CNADs utilize an “adaptive” assessment approach derived from Item Response Theory (Reise & Waller, 2009; Thomas, 2010), wherein the program adjusts task difficulty or stimulus presentation as a function of task success or failure on the part of the examinee. This does not typically occur in examiner-centered approaches.
Second, whereas most examiner-administered tests require documentation of test user qualifications on purchase, some CNADs are advertised and marketed to end-users with no expertise in neuropsychological assessment or knowledge of psychometric principles. Many contain proprietary algorithms for calculating summary scores or indices from performance data, and some provide the end-user with boilerplate report language derived from the examinee's performance that is intended as an automated form of interpretation based solely on test metrics. Third, the responsible interpretation and reporting of results of CNADs requires an understanding of test utility and accuracy when installed and used in the local clinical setting, which in turn requires familiarity with many technical details regarding their psychometric properties and normative standards. How the installed program interacts with the user's unique software and hardware configuration may affect important parameters including timing accuracy, screen resolution or refresh rate, or the sensitivity of input devices. These and other issues discussed in this paper lead to the conclusion that CNADs are qualitatively and technically different from examiner-administered instruments and require best practices for their competent and safe use.
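As one illustration of how CNAD logic differs procedurally from examiner-administered testing, the item-adaptive behavior described above (adjusting difficulty after each success or failure) can be reduced to a simple staircase rule. This is a hypothetical sketch for exposition only; actual CNADs typically employ more sophisticated IRT-based item selection:

```python
def next_difficulty(current, correct, step=1, floor=1, ceiling=10):
    """One-up/one-down staircase: present a harder item after a
    correct response, an easier item after an error, clamped to
    the available difficulty range."""
    proposed = current + step if correct else current - step
    return max(floor, min(ceiling, proposed))

# A run of responses (True = correct) starting at difficulty 5.
level = 5
for correct in [True, True, False, True, False, False]:
    level = next_difficulty(level, correct)
print(level)  # difficulty oscillates around the examinee's threshold
```

The point of the sketch is simply that the stimulus sequence each examinee receives depends on that examinee's own responses, which is one reason a CNAD cannot be assumed psychometrically equivalent to a fixed-order, examiner-administered test.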
Key Issues and Position Statements on CNAD
(1) Device marketing and performance claims
It is our position that CNADs are subject to, and should meet, the same standards for the development and use of educational, psychological, and neuropsychological tests (American Psychological Association, 1999) as are applied to examiner-administered tests. This position echoes similar statements made over 20 years ago by Kramer (1987) and Matarazzo (1985, 1986). In addition, CNADs likely qualify as “medical devices” according to prevailing definitions in Federal law, and thus will eventually be regulated as such. Once CNADs are regulated as devices, their developers will likely need to provide additional documentation that meets specific labeling standards particular to medical device regulation.
Section 201(h) of the Federal Food, Drug & Cosmetic Act (FD&C; 21 U.S.C. 301) defines a “medical device” as “an instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent, or other similar or related article, including a component part, or accessory which is … intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment, or prevention of disease, in man or other animals … .” This definition would appear to include CNADs. As the field of computerized assessment evolves, it is reasonable to assume that such tools will come under the regulatory authority of the Food and Drug Administration (FDA), the agency that, under federal law, regulates all drugs and medical devices.
When considering the use of a CNAD in a particular setting, end-users need to know the answers to questions such as (a) What does the device claim to do? (b) What does it actually do? (c) How does it do what it claims to do? and (d) Is it safe and effective? Such answers are critical in making informed decisions about the reliability, validity, and clinical utility of CNADs in any particular setting. The claims made by the developer are critical to the ability of the end-user to evaluate the product. Some products may claim to be stand-alone diagnostic tests for specific conditions, whereas others may claim greater technical accuracy than examiner-administered neuropsychological assessment. The marketing claims and the claims made in the documentation for the measure should be evaluated in light of the data presented and the technical information included in the manual, just as they are for examiner-administered neuropsychological measures. In addition, understanding the mechanism by which the measure provides diagnostic information or normative comparisons is critical for users of CNADs. Even if the algorithm is proprietary, the methodology must be understandable and transparent enough that the user can evaluate the validity of the claims made.
Because some CNADs are intended for end-users who have no neuropsychological expertise, claims may be difficult for the potential user to evaluate. When users consider installing a CNAD in their practice or research program, key information about the device, its intended application, and its expected performance is needed. In this regard, it is important to distinguish among (a) “marketing” (how the device is promoted and the healthcare professionals to whom it is targeted), (b) “labeling” (the information about the device that is provided on packaging or accompanying inserts), (c) “use” (the intended application for the device), and (d) “documentation” (information that accompanies the device, including installation instructions, normative data, or information about device utility). Readers who consult device websites will discover broad variation in the extent to which such information is provided by developers of commercially available CNADs.
Regarding safety and efficacy, developers of CNADs should be explicit about how the test should be used in deriving diagnostic or prognostic statements. Some CNADs provide simple, easy-to-use reports that represent performance levels in color-coded fashion, with “red” indicating a cause for concern and “green” suggesting normal results, similar in principle to the results of a metabolic panel. Though this simplifies the use of the device, it can obscure the need to consider other data (e.g., additional laboratory, neuroimaging, or clinical data) needed to establish a complex diagnosis, and it provides little guidance regarding differential diagnostic considerations in the individual case. Test developers should provide sufficient data to allow users to determine whether the CNAD has been previously applied to the problem or condition the user has in mind, so that users can determine the suitability of the CNAD for their settings and needs. At the same time, users are expected to implement CNADs in a responsible and appropriate manner. For example, it would not be appropriate to implement a CNAD developed for concussion assessment and management as a dementia screening device in clinical practice, in the absence of empirical data supporting its efficacy for that use.
(2) End-user issues
Developers of CNADs are expected to provide a clear definition of the intended end-user population, including a description of the competencies and skills necessary for effective and accurate use of the device and the data it provides.
Some devices are specifically intended as self-test instruments (with the examinee as the end-user), whereas others require test-user qualifications similar to those imposed on examiner-administered neuropsychological tests. Still other CNADs are intended for use by healthcare providers who possess varying knowledge of psychometric principles and/or neuropsychological expertise. Although test “administration” is likely to be less affected by this lack of knowledge if appropriate orientation to, and training on, the specific test is undertaken, “interpretation” of the data generated by the measure may be more substantially affected. Depending on the intended use or application of the test, limited knowledge of the measure's psychometric properties, of test behavior, and of the associated medical or behavioral data that support interpretation, combined with limited neuropsychological expertise, may present a specific challenge to the general healthcare provider and create a risk to the patient with whom the test is used.
CNADs can be appropriately administered by paraprofessionals or technically trained staff who may lack the education, training, or experience necessary to integrate or interpret test results. However, unlike examiner-administered testing, many CNADs are intended to support clinical interpretations rendered by practitioners who have little or no expertise, training, or experience in psychometrics or clinical neuropsychology. The safe and competent use of CNADs requires a link to professionals trained in the use of psychometric techniques in the differential diagnostic setting. The appropriate process of test interpretation involves an integration of quantitative test findings with information from medical records, including disease course, functional impairment, comorbid illnesses, history, and other relevant factors. Also, an understanding that multiple factors separate from central nervous system disease or injury (e.g., premorbid abilities, general health, neuropsychiatric and emotional status, medications, fatigue, and effort) can affect performance on cognitive tests is critical to accurate interpretation of test results. Bypassing careful clinical interpretation may lead to potential misuse of the data or failure to consider clinical or methodological issues that could influence the results (American Psychological Association, 1986). This issue is also relevant to the application of CNADs in settings in which a professional is not available to make behavioral observations of the examinee during the testing session. Indeed, some CNADs are designed to be administered to a patient or client with minimal or no direct observation by a trained examiner, and some CNADs that do involve observation by the examiner are intended for use by professionals or paraprofessionals with limited education, training, or experience in neuropsychological assessment. In these instances, behavioral indicators of emotional, motivational, or mental status issues that might complicate test interpretation may be inadvertently missed.
Test developers have taken one of two broad approaches to the user qualifications issue. Some CNADs require appropriate licensure or certification in relevant healthcare fields (e.g., psychology, medicine), thus making the device available to an “expert user.” Another approach has been to develop an interpretive algorithm within the device software that essentially creates an “expert system.” In this approach, the program itself contains clinical actuarial routines that generate clinical findings and recommendations. However, this poses two challenges for end-users: (a) they might lack the knowledge and training to independently evaluate the accuracy of the output and/or the claims the developer makes regarding the test results; and (b) the proprietary interpretive routines might be opaque or ill-defined methodologically, obscuring critical evaluation by the end-user or scientific community. Test developers should make sufficient information available to the end-user (without compromising proprietary information or trade secrets) so that an independent evaluation of the validity of the interpretive report can be made, and so that the utility of the test in actual clinical practice can be independently evaluated.
In considering the broader context in which CNADs are applied today, it is important to distinguish between neuropsychological testing (utilizing cognitive tests to obtain behavioral samples of abilities in memory, concentration, or executive function) and neuropsychological assessment (providing a comprehensive evaluation of an individual that integrates test results with history, symptoms, behavioral observations, physical findings, and other aspects of the examinee's situation to yield interpretive statements about the underlying causes of the patient's performance pattern; Matarazzo, 1990). The interpretation of CNAD results requires the same specialized training and expertise in clinical neuropsychology as does the interpretation of examiner-administered neuropsychological tests. Moreover, the interpretation of computerized test results, like that of their examiner-administered counterparts, occurs in the context of knowledge of relevant information from the social, medical, mental health, educational, and occupational history of the examinee (Matarazzo, 1990). Because of this, the specific interpretive statements generated by CNAD software may not apply to the individual examinee. If applied in practice, these automated reports should be carefully reviewed for accuracy and relevance by someone with expertise in neuropsychological test interpretation in each individual case. Consistent with professional competence, clinicians “do not promote the use of psychological assessment techniques by unqualified persons, except when such use is conducted for training purposes with appropriate supervision” (APA, 2010, Ethical Standard 9.07, Assessment by Unqualified Persons).
(3) Technical (hardware/software/firmware) issues
Test developers should provide users with sufficient technical information to ensure that the local installation of a CNAD will produce data that can be accurately compared with the data in the test's normative database.
As is true of examiner-administered assessment instruments, CNADs are developed within a specific environment that helps define the domains to which the test and its results can be generalized. Technical aspects of the computing environment in which the test was developed may critically affect how the test performs when applied in clinical settings (Cernich, Brennan, Barker, & Bleiberg, 2007). Such aspects include the computer or tablet's operating system, the speed of the central processing unit, the amount of available memory, how the program clock interacts with the system clock (McInnes & Taylor, 2001), the resolution and refresh rate of the display (Gofen & Mackeben, 1997), characteristics of the user interface (Gaver, 1991), and other aspects of the operating system environment (Forster & Forster, 2003; Myors, 1999; Plant, Hammond, & Turner, 2004; Plant & Turner, 2009). Even subtle differences between the performance of the test in standardization and its performance when locally installed on the user's computer, tablet, or handheld device can influence whether the test performs as advertised. For example, performance indices that rely on millisecond distinctions between groups become less discriminative if the operating environment is degraded by operating system interference, security verification, and/or commonly scheduled program updates that interfere with timing resolution (Creeger, Miller, & Paredes, 1990).
CNADs installed on a user's local machine should duplicate with sufficient accuracy the computing environment in which the normative performance data for the test were established. If this fidelity cannot be demonstrated or confirmed, users have reason to doubt the results of the CNAD. Test developers are expected to provide specific guidance to users that will enable them to determine whether their local installation meets certain technical criteria, including a clear description of the necessary hardware and software configuration, and a developer-provided diagnostic that allows the user to determine that the test has been properly installed and will operate with fidelity to the normative installation.
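A developer-provided installation diagnostic of the kind described above might, at a minimum, verify the granularity of the platform's high-resolution clock and the jitter of timed waits. The following sketch illustrates the idea; the 16 ms target and any pass/fail tolerance are illustrative assumptions, not published criteria:

```python
import time

def timer_resolution():
    """Smallest observable increment of the high-resolution clock."""
    t0 = time.perf_counter()
    while True:
        t1 = time.perf_counter()
        if t1 != t0:
            return t1 - t0

def sleep_overshoot(target_s=0.016, trials=20):
    """Mean overshoot when requesting a ~16 ms wait from the OS
    (one display frame at 60 Hz); large values suggest the host
    cannot honor millisecond-level stimulus timing."""
    overshoots = []
    for _ in range(trials):
        start = time.perf_counter()
        time.sleep(target_s)
        overshoots.append(time.perf_counter() - start - target_s)
    return sum(overshoots) / trials

if __name__ == "__main__":
    print(f"clock resolution: {timer_resolution() * 1e6:.1f} microseconds")
    print(f"mean sleep overshoot: {sleep_overshoot() * 1e3:.2f} ms")
    # An installation check might refuse to run if the overshoot
    # exceeds a developer-specified tolerance (e.g., 10 ms).
```

A real diagnostic would go further (display refresh, input-device latency), but even this minimal check can reveal hosts whose timing behavior departs materially from the normative installation.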
(4) Privacy, data security, identity verification, and testing environment
Ultimately, maintaining patient privacy and security is the responsibility of the healthcare professional who collects, stores, and/or transmits personal health information, and users of CNADs should have sufficient knowledge of information technology to assure that patient rights are protected. Meeting this responsibility requires detailed information from the developer about how the CNAD collects and stores patient data. If data are to be transmitted to remote servers or databases for normative referencing or automated report generation, users need to understand how those data are protected from security intrusion, corruption, or other threats to data integrity and privacy. Test developers should also provide a procedure to verify the identity of examinees who complete a CNAD remotely.
Some CNADs store patient data files on a local hard disk, whereas others utilize a “store and forward” web interface in which the patient's data are collected locally and then uploaded via a web connection after testing is completed (Cernich et al., 2007). Users need detailed information about the security precautions in place when such data are transmitted, stored, and accessed, and about the procedures in place in the event of inadvertent data loss. Because users have legal obligations to examinees imposed by the HIPAA Security Rule, civil rights legislation, and ethical guidelines, they also need assurances about the security and privacy of data that are transmitted over the web to remote databases (American Psychological Association, 2007). Prevailing law and best practice require the use of encryption technologies that offer a measure of protection from unauthorized intrusion. Users need to be informed about, and aware of, the unique characteristics of electronic data, how to protect privacy when transmitting data to remote sites, and the challenges that exist in disposing of electronic data.
By design, CNADs can be administered remotely, and identity verification can pose particular challenges for computerized measures used in this way. Though identity verification does not pose particular problems for in-person administration of a CNAD, remote use, especially in situations where there is no proctor to verify identity, presents logistic and ethical issues (Naglieri et al., 2004). In internet-based applications, examinees could be impersonated by accomplices or assisted by others to either feign or enhance their test performance. Even with security protocols that include provision of personal information to verify identity, an informed accomplice could assist. In certain settings (e.g., Department of Defense), systems are being developed wherein the person being assessed is identified by a personal identification verification card or common access card. This is a federal access card that includes password protection and personal information about the individual, including a biometric identifier (fingerprint; Department of Defense, 2008). Though these systems are restricted in nature, they may provide a template by which individuals could be identified for remote testing in a secure and authentic manner. Although it is not suggested that remote CNADs must contain such sophisticated biometric identification routines, it is important that developers address identity verification concerns in a thoughtful way so that users can be reasonably assured that the remotely assessed examinee is who she/he purports to be.
Remote testing, of course, creates special challenges relating to the reliability and validity of the results—given that it can be difficult to control the environment in which the administration occurs. For example, tasks that are dependent on precise presentation of stimuli or that require motor responses may be performed differently if the examinee is lounging on a couch using a laptop than if the examinee is seated in an office environment at a well-lit desk. Test developers should provide general guidance regarding characteristics of the testing environment that are reasonably likely to affect test performance so that users can advise examinees about such environmental considerations.
(5) Psychometric development issues
CNADs are subject to the same standards and conventions of psychometric test development, including descriptions of reliability, validity, and clinical utility (accuracy and diagnostic validity), as are examiner-based measures. Psychometric information should be provided to potential users of the CNAD in a manner that enables the user to ascertain the populations and assessment questions to which the test can be appropriately applied. Test developers should provide psychometric data relevant to the claimed purpose or application of the test. The actual data provided may vary depending on whether the test's claimed purpose is to provide a description of cognitive functions or domains versus assisting with the identification of the cognitive sequelae of specific diseases, injuries, or conditions. When established examiner-administered tests are offered in a computerized version, new psychometric data that describe the CNAD version are required. Information about how the data are scored, transformed, and analyzed to produce the CNAD's output statistics should be provided with sufficient clarity that users understand the meaning of the results they produce.
Prevailing ethical standards (APA, 2010) state that “Psychologists who develop tests and other assessment techniques use appropriate psychometric procedures and current scientific or professional knowledge for test design, standardization, validation, reduction or elimination of bias, and recommendations for use” (Standard 9.05). Although these standards are not binding on non-psychologist developers, the fact remains that, in order to be useful and meaningful in practice, all cognitive tests must meet minimum psychometric standards for reliability and validity. Reliability refers to the consistency of a test's output and pertains to both “test scores” and the “clinical inferences” derived from test scores (cf. Franzen, 1989, 2000). Reliability can be evaluated through several kinds of evidence, including (a) consistency across test items (internal consistency), (b) consistency over time (test-retest reliability or test stability), (c) consistency across alternate forms (alternate-form reliability), and (d) consistency across raters (inter-rater reliability).
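Two of the reliability estimates listed above can be computed directly from raw score data using standard formulas. The sketch below uses small, fabricated score vectors purely for illustration; the functions implement the conventional Pearson correlation (for test-retest reliability) and Cronbach's alpha (for internal consistency):

```python
from statistics import mean, pvariance

def pearson_r(x, y):
    """Test-retest reliability as the Pearson correlation
    between scores from two administrations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(items):
    """Internal consistency; `items` is a list of per-item score
    lists, one inner list per item across the same examinees."""
    k = len(items)
    item_vars = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Toy scores for five examinees at two administrations (illustrative only)
time1 = [10, 12, 9, 15, 11]
time2 = [11, 13, 9, 14, 12]
print(round(pearson_r(time1, time2), 2))  # 0.93
```

Reported reliability coefficients in a CNAD manual should rest on such computations applied to adequately sized, representative samples, not on toy data of this kind.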
Validity refers to the degree to which a test measures the construct(s) it purports to measure. According to classical test theory (cf. Downing & Haladyna, 2006), types of validity include (a) content validity, (b) criterion-related validity (e.g., concurrent and predictive validity), and (c) construct validity (e.g., convergent and discriminant validity). Validity may be defined as the extent to which theory and empirical evidence support the interpretation(s) of test scores when they are used as intended by the test developer or test publisher (American Psychological Association, 1999; Messick, 1989; Pedhazur & Pedhazur Schmelkin, 1991). In other words, validity is a property of the interpretation or meaning attached to a test score within a specific context of test usage, not a property of a given test (Cronbach, 1971, p. 447; cf. Franzen, 1989, 2000; Urbina, 2004).
Reliability and validity are not unitary psychometric constructs. Instead, they are measured in studies in different clinical contexts with diverse populations. Moreover, reliability and validity should be viewed as a matter of degree rather than in absolute terms, and tests must be re-evaluated as populations and testing contexts change over time (Nunnally & Bernstein, 1994). Developers of CNADs are encouraged to update their psychometric studies and their normative databases over time. Working knowledge of reliability and validity, and the factors that impact those psychometric constructs, is a central requirement for responsible and competent test use, whether the measure is used for diagnostic or research purposes.
Neuropsychological tests yield scores that are derived from a comparison of a person's performance to the performance of a healthy normative sample, clinical samples, one's own expected level of performance, or, in the case of symptom validity tests, research participants who had been given specific instructions to perform in a certain manner. The quality and representativeness of normative data can have a major effect on the clinical interpretation of test scores (cf. Mitrushina, Boone, Razani, & D'Elia, 2005). APA (2010) Ethical Standard 9.02 (Use of Assessments), Section (b) states, “Psychologists use assessment instruments whose validity and reliability have been established for use with members of the population tested. When such validity or reliability has not been established, psychologists describe the strengths and limitations of test results and interpretation.”
It cannot be assumed that the normative data obtained for an examiner-administered test apply equally well to a computerized version of the same test, owing to changes in the method of administration and to variations in computer familiarity across patient demographic groups. Studies of the comparability of computerized measures that are adaptations of examiner-administered tests indicate that there are substantive differences in some samples (Berger, Chibnall, & Gfeller, 1997; Campbell et al., 1999; Choca & Morris, 1992; Ozonoff, 1995; Tien et al., 1996), further demonstrating the need for new normative data obtained with the computerized test. As we have indicated above, a computerized test adapted from an examiner-administered test is a new test. As a result, it is essential that new normative data, with adjustments for the pertinent demographic variables, be established for computerized tests. The relevant standard from the APA 2010 Ethics Code, Standard 9.02 (Use of Assessments), Section (a), states that “Psychologists administer, adapt, score, interpret, or use assessment techniques, interviews, tests, or instruments in a manner and for purposes that are appropriate in light of the research on or evidence of the usefulness and proper application of the techniques.”
Prior to using tests diagnostically, it is important to have information relating to their accuracy for that purpose (Newman & Kohn, 2009). Operating characteristics (sensitivity, specificity, positive predictive value, and negative predictive value) should be considered when using a test in a specific clinical setting. Sensitivity is the proportion of people with the condition of interest who are correctly identified by the test; such cases are true-positive results. Specificity is the proportion of people who do not have the condition who are correctly identified; such cases are true-negative results. Positive predictive value (PPV) is the probability that a person has a disease or condition given a positive test result; that is, the proportion of individuals with positive test results who truly have the condition. Negative predictive value (NPV) is the probability that a person does not have a disease or condition given a negative test result; that is, the proportion of individuals with negative test results who truly do not have the condition. PPV and NPV depend on the base rate, or prevalence in a given population, of the condition/disease that one is trying to identify. For example, even a test with high sensitivity and good specificity may yield a low PPV (many false-positive results among the positives) if the prevalence of the condition of interest is low. Research relating to sensitivity, specificity, and diagnostic classification accuracy (Retzlaff & Gibertini, 2000) is an important foundation for the proper use of CNADs, just as it is for examiner-administered neuropsychological tests. Research relating to diagnostic validity must be evaluated with a critical eye toward the actual diagnostic question to which the CNAD will be applied.
For example, it is important to know whether a measure is useful in differentiating patients with dementia from neurologically intact individuals, or whether it can also be useful in making the more difficult distinction between those with dementia and those with mild cognitive impairment.
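The dependence of PPV and NPV on base rates described above can be illustrated with a brief worked calculation. The sketch below uses Bayes' rule with hypothetical figures (a test with 90% sensitivity and 90% specificity); the function name and the prevalence values are illustrative only, not drawn from any specific CNAD:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Compute PPV and NPV from a test's operating characteristics via Bayes' rule."""
    tp = sensitivity * prevalence                # true-positive rate in the population
    fp = (1 - specificity) * (1 - prevalence)    # false-positive rate
    fn = (1 - sensitivity) * prevalence          # false-negative rate
    tn = specificity * (1 - prevalence)          # true-negative rate
    ppv = tp / (tp + fp)   # proportion of positive results that are true positives
    npv = tn / (tn + fn)   # proportion of negative results that are true negatives
    return ppv, npv

# Hypothetical test: 90% sensitivity, 90% specificity.
# At 50% prevalence, PPV and NPV are both 0.90.
ppv_high, npv_high = predictive_values(0.90, 0.90, 0.50)

# At 2% prevalence, most positive results are false positives,
# so PPV collapses even though the test itself is unchanged.
ppv_low, npv_low = predictive_values(0.90, 0.90, 0.02)
```

With prevalence at 2%, PPV falls to roughly 0.16, underscoring why a CNAD validated in a high-prevalence clinic cannot be assumed to classify accurately in a low-prevalence screening context.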
Data included in the technical manual should be appropriate to the use of the test intended by the developer. Depending on the claims made, information on reliability, validity, and diagnostic classification should be included so that the user can evaluate the utility of that specific CNAD in fulfilling its claimed purpose. Test developers are also encouraged to make clear to the user what other information is required in the applied context in order to use the test most appropriately. This can be done, for example, by relating performance on the test to prevailing diagnostic algorithms with proven validity in the clinical literature.
(6) Examinee issues: Cultural, experiential, and disability factors
Test developers should provide appropriate normative information that allows the user to determine whether the CNAD can be given to patients from different racial, ethnic, age, and educational backgrounds. Some patients with cognitive, motor, or sensory disabilities might have difficulty completing a computerized test in the manner intended by the developer. In addition, individual differences in computer use and familiarity may affect how examinees interact with devices, utilize response modalities, or respond to stimuli. Test developers are encouraged to provide documentation that such factors have been accounted for during test standardization and validation and should provide guidance to users with regard to how motor, sensory, or cognitive impairment in targeted patient populations may affect their performance on the test. It is particularly important to specify conditions under which the test should not be used in patients with motor, sensory, or cognitive impairment.
As with examiner-administered neuropsychological assessment, computerized testing has limitations regarding the scope of information that can be obtained and the validity of the data that are collected. One key aspect of this issue is how the physical, psychiatric, or neurologic condition of the patient affects his or her ability to interact with the computer interface. For example, computerized assessment places demands on the examinee's academic skills (e.g., reading), cognitive function (e.g., attention), and sensory and motor functioning (e.g., rapid reaction time) that may influence the results if the examinee has disabilities or limitations in one or more of these areas. If the examinee does not comprehend the task instructions, the results of the test will not be valid; and if the program requires speeded motor responses as a proxy for cognitive processing speed, patients with bradykinesia, tremor, or hemiparesis may be significantly disadvantaged for reasons apart from impairment in the targeted construct. For a hemiparetic patient, the validity or reliability of the measure might be diminished if the non-dominant hand must be used to manipulate the mouse or make a motor response. As with examiner-administered tests, numerous similar examples can be envisioned that complicate the quality of the data that emerge from CNADs (Hitchcock, 2006).
A key limitation of many CNADs is that they may not provide for consistent observation of the examinee by a trained examiner. Therefore, clinically useful information may be missed relating to task engagement, display of emotion, frustration, or a tendency to give up easily when confronted with more challenging test items. Significant individual differences exist in computer use and familiarity (Iverson et al., 2009), and results from computerized versus examiner-administered testing may differ between computer-competent and computer-naïve populations (Feldstein et al., 1999).
Computerized assessment is constrained by the current hardware and software limitations of the field. Consequently, assessment of some important and sensitive aspects of cognitive functioning, such as free recall (vs. recognition) memory, expressive language, visual-constructional skills, and executive functioning may be difficult to incorporate into a CNAD. Clinicians utilizing CNADs in practice are responsible for recognizing the limitations of this testing approach and for appropriately documenting the impact such factors may have upon their findings. In situations involving examinees who require special testing accommodations as a result of sensorimotor limitations, aphasia, dyslexia, confusion, or variable cooperation, or in those instances that require the assessment of individuals who are less facile or comfortable with computers and tablets, examiner-administered testing may be advantageous or preferred.
(7) Use of computerized testing and reporting services
Professionals “select scoring and interpretation services (including automated services) on the basis of evidence of the validity of the program and procedures as well as on other appropriate considerations” (APA, 2010, Ethical Standard 9.09, Test Scoring and Interpretation Services, Section b). Those “who offer assessment or scoring services to other professionals accurately describe the purpose, norms, validity, reliability, and applications of the procedures and any special qualifications applicable to their use” (APA, 2010, Ethical Standard 9.09, Test Scoring and Interpretation Services, Section a). Professionals “retain responsibility for the appropriate application, interpretation, and use of assessment instruments, whether they score and interpret such tests themselves or use automated or other services” (APA, 2010, Ethical Standard 9.09, Test Scoring and Interpretation Services, Section c).
Professionals who lack training and expertise in clinical assessment might be tempted to simply accept the content of automated reports that provide descriptive or interpretive summaries of test results and to incorporate textual output from CNADs into their standard clinical reports. This might occur because clinicians assume, uncritically or without sufficient evidence, that such summaries accurately reflect an individual patient's status and that the scientific bases of such interpretations have been established in the clinical setting. Practitioners are encouraged to evaluate the accuracy and utility of automated clinical reports in light of the total corpus of information available on the patient, including symptom reports, functional abilities, personal and family history, and other relevant factors. Automated reports are best viewed as an important “resource” for knowledgeable professionals, rather than as a “substitute” for necessary and sufficient expertise.
(8) Checks on validity of responses and results
Examinee compliance, cooperation, and sufficient motivation are essential to the process of obtaining valid neuropsychological test data (AACN, 2007; Bush et al., 2005; Heilbronner et al., 2009). Developers of CNADs are encouraged to address these issues during test development and standardization. It is important for test developers to consider carefully the role of motivation and effort when conducting computerized testing. This is particularly true for CNADs intended for use by professionals unfamiliar with the signs and consequences of reduced effort on cognitive test performance. Test developers are encouraged to (a) provide information on how poor effort can be identified by patterns of performance on the CNAD or (b) make specific recommendations about additional tests or procedures that can be concurrently conducted to evaluate examinee effort.
Over the past few decades, research on effort and its effects on the validity of neuropsychological test results has dominated forensic neuropsychology, and significant attention has been devoted to understanding the role of effort and motivation in producing impairments on a wide variety of neuropsychological tests (Boone, 2007; Larrabee, 2007; Sweet, King, Malina, Bergman, & Simmons, 2002). Effort has been shown to substantially influence neurocognitive test scores, and in some studies the variance attributable to effort is greater than that attributable to injury severity or other variables more directly related to underlying pathophysiology (Constantinou, Bauer, Ashendorf, Fisher, & McCaffrey, 2005; Green et al., 2001; Stevens, Friedel, Mehen, & Merten, 2008; West, Curtis, Greve, & Bianchini, 2011). These findings lead to the inescapable conclusion that carefully considering patient motivation and effort is a mainstream part of clinical practice (Boone, 2009). Without some form of assurance that the examinee has put forth adequate effort in completing neuropsychological tests, the clinician cannot interpret low test scores as evidence of impairment in a particular neurocognitive ability.
The assessment of effort requires the use of empirically derived indicators. Behavioral observations made during testing by a trained examiner are also useful, but they suffice only when poor cooperation or effort is reflected in overt behavior. Test developers are encouraged to provide users with procedural guidance on how to identify poor effort on the CNAD. This can be done by documenting a built-in measure of effort that has been appropriately validated within the CNAD or by providing specific recommendations regarding other validated tests of effort that should be administered along with the CNAD.
This position paper is intended to provide guidance for test developers and users of CNADs that will promote accurate and appropriate use of computerized tests in a way that maximizes clinical utility and minimizes risks of misuse. We fully recognize the tension that exists between industry and professional users in bringing CNADs to market. On the one hand, there is substantial need to improve access to neurocognitive testing for underserved patients who, by virtue of economic, socioeconomic, geographical, logistical, or cultural reasons are not referred for, or cannot access, needed services. On the other hand, the development of CNADs is a complex enterprise; the tests themselves measure complex constructs, the technical and information technology issues that ensure appropriate installation of the test in the local environment are nuanced, and the manner in which different patient groups interact with the assessment device introduces important sources of variance that can affect the interpretation of the test results.
Although there are clear differences between stand-alone computerized platforms of common examiner-administered tests (e.g., Wisconsin Card Sorting Test) and full-fledged computerized testing systems, all developers of CNADs are encouraged to provide users with core information regarding (a) test reliability, validity, accuracy, and utility; (b) technical specifications, including how to ensure that the local installation faithfully duplicates the environment in which normative data were collected; (c) methods to protect privacy and data integrity; (d) the minimal qualifications of those who can install, administer, or interpret the test; (e) further requirements regarding utilization of computerized or actuarial reporting services; (f) information on who can and cannot benefit from undergoing assessment; (g) what the test claims to be able to do for the patient and/or professional user; and (h) guidance with regard to how submaximal effort affects test results and how to interpret results when the examinee intentionally or unintentionally underperforms for reasons other than neurocognitive compromise.
CNADs (both individual tests and test batteries) are expected to meet the same psychometric standards of adequate reliability and validity for the intended clinical populations as examiner-administered neuropsychological tests. Adaptation of an examiner-administered test for computers or tablets should be accompanied by the development of normative or equivalency data for the computerized version; a computerized version is a new test and not merely a slightly different format for an existing test. Expertise in the interpretation of computerized tests requires advanced knowledge of testing theory and the complex interaction of multiple factors that can affect performance on cognitive tests, aside from putative or clearly established injury to the brain. Such expertise is typically obtained from specialized education, training, and experience in clinical neuropsychology.
Qualified test users understand that CNAD results must be interpreted in the context of relevant history, other test findings, and data available from other disciplines. For test results to be considered valid, all neuropsychological testing, including computerized testing, requires adequate motivation and cooperation from examinees.
It is clear that the competent use of appropriately developed computerized neuropsychological measures will serve an increasingly important role in the evaluation of a variety of patient populations. The use of CNADs clearly has a role in bringing valid and effective neuropsychological evaluation techniques to underserved populations. However, such application should proceed with an understanding that effective use of such techniques is not “plug and play,” but in fact requires attention to a broad range of factors that determine whether the test will be useful, accurate, and appropriate in the intended setting. Users and consumers of CNADs must be mindful that ethical and clinically useful practice require that such tests meet appropriate quality and efficacy criteria and that those employing CNADs have the education, training, and experience necessary to interpret their results in a manner that will best meet the needs of the patients they serve.
Russell M. Bauer, Ph.D. is supported in part by grant UL1 RR029890 to the University of Florida.
Conflict of Interest
G.L.I. has led or been a member of research teams that have received grant funding from test publishing companies, the pharmaceutical industry, and the Canadian government to study the psychometrics of computerized and traditional neuropsychological tests. These companies include AstraZeneca Canada, Lundbeck Canada, Pfizer Canada, ImPACT Applications, Inc., CNS Vital Signs, Psychological Assessment Resources (PAR, Inc.), and the Canadian Institutes of Health Research. He is also a co-author on a test published by PAR, Inc. A.N.C.'s views are her own and do not necessarily represent the views of the Department of Veterans Affairs (VA). The VA had no role in the writing of the article or the decision to submit it for publication. R.M.R. has published four tests with Psychological Assessment Resources, Inc. L.M.B. is the author and publisher of the Portland Digit Recognition Test.
The authors thank NAN Policy and Planning Committee members Shane Bush, Ph.D., William MacAllister, Ph.D., Thomas Martin, Ph.D., and Michael Stutts, Ph.D. for their review and suggestions regarding this article.