The field of psychology is rife with fallacies and myths (Lilienfeld, Lynn, Ruscio, & Beyerstein, in press), and intellectual, neuropsychological, educational, and personality testing are hardly immune to this phenomenon. Although some misconceptions about testing are limited largely to the academic world, others are even more widespread among members of the general public—including prominent public figures. In an interview in the early 1990s, future Supreme Court Justice Sonia Sotomayor referred to the “cultural biases built into [standardized] testing,” arguing that these biases have “been shown by statistics.” The notion that standardized tests display marked cultural biases is merely one of the untrue “truths” addressed by the contributors to this volume. At its best, Correcting Fallacies About Educational and Psychological Testing functions as an intellectual vaccine, immunizing readers against intuitively plausible yet unsupported notions about testing.
The book comprises an Introduction by the editor (Phelps), seven chapters spanning the fields of intelligence testing, educational achievement testing, psychiatric diagnostic testing, college admissions testing, employment testing, certification and licensure testing, and cognitive diagnostic testing, and an eighth summary chapter co-authored by Phelps and Linda Gottfredson. Useful ancillary features include a glossary of key terms and a website featuring logical arguments for and against intelligence testing.
The opening chapter by Gottfredson is among the most compelling in the book, delineating 13 fallacies concerning intelligence and intelligence testing. For example, she refutes widespread claims that genetic influences on intelligence equal biological influences on intelligence (many biological influences, such as nutrition, are non-genetic), that intelligence tests must be free of error before they can be valid (no psychological test of any construct is free of measurement error), and that intelligence tests are biased against certain minority groups (the evidence demonstrates they are not).
Janet Carlson and Kurt Geisinger take aim at overstated claims that tests designed to assist clinicians with diagnosing psychopathology are overly expensive, culturally biased or irrelevant, and easily faked. Their chapter is a helpful guide to commonplace misunderstandings of diagnostic testing, although their assertion that “substantial evidence supports the treatment utility of assessments” (p. 73) is itself overstated. To the contrary, few randomized studies have demonstrated that providing some clinicians, but not others, with specific assessment information (e.g., MMPI-2 results) enhances treatment outcomes (Lima et al., 2005).
In his chapter, Wayne Camara notes “the harmful, antiscientific practice of characterizing anecdote as fact” (p. 175) in the domain of standardized testing. For example, he observes that popular claims that the SAT and other standardized tests measure little more than socioeconomic status (SES), a view expressed by Richard C. Atkinson, former president of the University of California system (and, ironically, himself a prominent psychologist), are refuted by meta-analytic findings (Arneson, Waters, Sackett, Kuncel, & Cooper, 2006). Such findings show that the correlation between SAT scores and college grades remains virtually unchanged after statistically controlling for SES. Camara also dispels assertions that standardized tests are easily coached: most research demonstrates that enrollment in commercial test preparation courses yields an average boost of only about 20 total points on the SAT, far less than most test preparation companies claim.
Ernest O'Boyle and Michael McDaniel debunk claims that employment tests discriminate against racial minorities, do not display validity generalization across job contexts, and necessarily invade respondents' privacy. They convincingly refute assertions that unstructured interviews provide a great deal of useful information not afforded by standard employment tests, citing meta-analytic evidence (Schmidt & Hunter, 1998) that unstructured employment interviews possess minimal incremental validity beyond measures of intelligence. Regrettably, O'Boyle and McDaniel fall prey to a widespread misconception themselves (p. 188) by equating the clinical versus actuarial debate (which is no longer really a scientific debate; see Grove et al., 2000) with the question of whether “gut feelings” or diagnostic tests are superior predictors of behavior. As Meehl (1954) noted over a half century ago, this debate does not bear on the question of which data (e.g., quantified clinical impressions, structured personality tests) we should use; it bears only on the question of how we should integrate such data once they have been collected.
The most troubling entry of the book is Phelps' chapter on educational achievement testing, which comes across as more of an emotionally charged polemic than a dispassionate review of the literature. Phelps performs a valuable service by dispelling oft-repeated misconceptions about achievement testing, such as claims that testing affords few or no benefits, that high-stakes testing generates artificial test score inflation, and that American students are massively over-tested. Yet much of the chapter consists of summary statements about past research rather than discussions of the methodologies of the studies themselves, leaving readers to accept the author's conclusions largely on faith. Moreover, the chapter assumes an unfortunate and even unprofessional tone when Phelps repeatedly accuses some educational researchers of having “rigged” studies so that only “negative results were possible” (p. 124) and “censored and suppressed” (p. 127) work that is not to their liking, even to the point of suggesting that scholars who are “among the most ambitious and narcissistic” (p. 127) often lack the willingness to conduct comprehensive literature reviews. Phelps may have a point about censorship of data, but he probably underestimates the power of confirmation bias (Nickerson, 1998) as a contributor to differences of opinion in such matters.
There are also a few notable omissions. For instance, there is no explicit discussion of fallacies concerning neuropsychological testing, such as the assertion that neuropsychological assessment is useful only for examining cognitive ability among individuals with brain dysfunction, an assertion that fosters the belief that assessing normally functioning individuals is uninformative. This belief, launched by Dodrill's (1997) paper dispelling six myths of neuropsychology, stemmed from findings that the linear association between full-scale IQ (FSIQ) and neuropsychological performance reaches an asymptote when intellectual functioning is well above average. Following compelling evidence from later studies demonstrating that this association remains linear even at high levels of FSIQ (Bell & Roper, 1998; Tremont, Hoffman, Scott, & Adams, 1998), Dodrill (1999) amended this claim.
In her chapter on large-scale cognitive diagnostic testing, Leighton comments on the limitations of drawing inferences about intellectual ability from a single test score. The problem of reliance on one score, such as FSIQ, to represent overall cognitive ability has also been addressed by neuropsychologists (Lezak, Howieson, & Loring, 2004). Although this problem may seem benign, Folstein (1989) noted that cognitively disabled individuals may be refused benefits on the grounds that their IQ falls above the cutoff for mental retardation (i.e., 70), even though they are disabled in pertinent aspects of cognitive functioning.
Practicing neuropsychologists also often fall prey to the “localization of function” myth: they may base erroneous inferences on the assumption that certain neuropsychological tests measure highly specific brain functions. As one example, because executive functioning is sometimes presumed to be mediated exclusively by the frontal lobes, poor performance on neuropsychological tests of executive functioning can be falsely interpreted as evidence of frontal lobe dysfunction (Reitan & Wolfson, 1995).
Moreover, there is no mention of the “learning styles” movement, which attempts to assess students' preferred learning styles (e.g., visual, auditory) and match these styles to teachers' instructional styles. Despite the remarkable popularity of this movement in educational circles, there is scant evidence that students exhibit reliable learning styles over time or that matching learning styles to instructional styles yields improvements in student learning outcomes (Zhang, 2006).
These significant shortcomings aside, Correcting Fallacies About Educational and Psychological Testing should prove a valuable resource for teachers and students alike. This text is a needed reminder that even psychological myths we have long assumed to be dead often remain alive and well. In the words of Faulkner (1951), “The past is never dead. It's not even past.”