Abstract

The Test of Memory Malingering (TOMM) is a measure of test-taking effort that has traditionally been used with adults but has more recently demonstrated utility with children. The purpose of this study was to investigate whether the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) Digit Span, commonly used in neuropsychological evaluations, can also function as an embedded measure of effort in children with dual diagnoses, a population yet to be investigated. Participants (n = 51) who completed neuropsychological evaluations including the TOMM, WISC-IV, Wisconsin Card Sorting Test, Children's Memory Scale, and Delis–Kaplan Executive Function System were divided into two groups, Optimal Effort and Suboptimal Effort, based on their TOMM Trial 2 scores. The optimal Digit Span cutoff was a scaled score of ≤4, which yielded a specificity of 91% and a sensitivity of 43%. This study supports previous research indicating that the WISC-IV Digit Span has good utility in determining test-taking effort, even in children with dual diagnoses or comorbidities.

Introduction

Research supports the utility of both stand-alone and embedded Symptom Validity Tests (SVTs) for accurately determining test-taking effort. The study of pediatric motivation during cognitive and behavioral evaluations is proving valuable and essential in the field of neuropsychology. However, children who provide suboptimal test-taking effort often do so for less advantageous reasons than have been suggested for adults (Constantinou & McCaffrey, 2003). Reasons include boredom, complex psychosocial circumstances, difficulty with attention or concentration, cognitive limitations, or simply disinterest in testing or refusal to cooperate with the examiner, all of which make it more difficult to measure cognitive functioning (Courtney, Dinkins, Allen, & Kuroski, 2003; Donders, 2005; Kirkwood, Yeates, Randolph, & Kirk, 2012). In addition, a child's effort may wax and wane across an examination depending on these factors. Thus, ongoing assessment across the battery, via both stand-alone SVTs and embedded measures, is helpful. Research regarding effort and motivation has largely focused on stand-alone SVTs (Blaskewitz, Merten, & Kathmann, 2008; Brooks, Sherman, & Krol, 2012; Constantinou, Bauer, Ashendorf, Fisher, & McCaffrey, 2005; Etherton, Bianchini, Greve, & Ciota, 2005; Gervais, Rohling, Green, & Ford, 2004; Inman & Berry, 2002; Kirkwood et al., 2012; O'Bryant & Lucas, 2006), despite the fact that many authors recommend multiple measures of effort as superior to any single measure (Greve, Binder, & Bianchini, 2009; Larrabee, 2008; Nelson et al., 2003).
Kirkwood and colleagues (2012), examining children with mild traumatic brain injury (mTBI; n = 276; age range 8–16 years), and Perna and Loughan (2012), examining children with a mixture of clinical diagnoses (N = 75; age range 6–18 years), reported positive correlations between neuropsychological performance and stand-alone SVTs, supporting the usefulness of SVTs during neuropsychological evaluations. However, making use of multiple indicators of effort, including already administered embedded measures, enhances the economy (dual purpose) of testing, allows for continual assessment of motivation during testing, and helps cross-validate conclusions about an examinee's level of effort or the validity of the data collected (Arnett, Hammeke, & Schwartz, 1995; Meyers & Volbrecht, 2003; Sherman, Boone, Lu, & Razani, 2002). In fact, when testing children, attention span, age, and intellectual level may negatively impact performance on measures of effort, which, if assessed only once during testing, could lead to inaccurate assumptions (Blaskewitz et al., 2008; Irwin-Chase & Burns, 2000). While some research has suggested that multiple effort measures may contradict one another, thus reducing the effectiveness of using a singular measure (Rosenfeld, Sands, & van Gorp, 2000), research by Nelson and colleagues (2003) found that using several effort measures is beneficial, especially if they are based on different neurocognitive domains (e.g., attention, memory).

In adults, the value of embedded indicators is well established. The Digit Span subtest from the Wechsler instruments is one of the most thoroughly investigated of all embedded measures. One method used to quantify the diagnostic ability of a measure is to calculate sensitivity and specificity (Altman & Bland, 1994). Sensitivity is defined as the proportion of true positives (here, examinees giving suboptimal effort) that are correctly identified by a measure, and specificity is defined as the proportion of true negatives (examinees giving optimal effort) that are correctly identified by a measure. Although there is no universally accepted standard for specificity and sensitivity in the field, the literature generally agrees that achieving specificity above 90% combined with the highest attainable sensitivity is optimal. The absence of an ideal appears to be a function of individual differences in tolerance for different kinds of diagnostic errors. Research by Axelrod, Fichtenberg, Millis, and Wertheimer (2006) supports Digit Span as an effective measure of effort that complements SVTs such as the Test of Memory Malingering (TOMM; Axelrod et al., 2006; Nies & Sweet, 1994). Although pediatric populations have been researched less, a recent study by Kirkwood, Hargrave, and Kirk (2011) evaluating school-age children (N = 274) with mTBIs found Digit Span age-corrected scaled scores (ss) of ≤5 to be the most effective cutoff, with a sensitivity of 51% and a specificity of 96%. For Reliable Digit Span, the optimal cutoff score was ≤6, with a sensitivity of 51% and a specificity of 92%. Perna, Loughan, Hertza, and Segraves (2012) investigated a heterogeneous sample of 75 clinically diagnosed children (ages 6–18), finding an effective Digit Span cutoff of ≤4 with 44% sensitivity and 94% specificity.
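
As an illustration of the two definitions above, the following minimal Python sketch computes sensitivity and specificity from a 2 x 2 classification table. The function name and the counts are hypothetical and chosen only for illustration; they are not data from any study cited here. A "positive" is taken to mean an examinee flagged as giving suboptimal effort.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2 x 2 classification counts.

    tp: suboptimal-effort examinees correctly flagged by the measure
    fn: suboptimal-effort examinees the measure missed
    tn: optimal-effort examinees correctly passed by the measure
    fp: optimal-effort examinees incorrectly flagged by the measure
    """
    sensitivity = tp / (tp + fn)  # proportion of true positives identified
    specificity = tn / (tn + fp)  # proportion of true negatives identified
    return sensitivity, specificity


# Hypothetical counts (illustration only): 20 examinees with suboptimal
# effort, 10 of them flagged; 80 with optimal effort, 76 of them passed.
sn, sp = sensitivity_specificity(tp=10, fn=10, tn=76, fp=4)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

With these made-up counts the measure would meet the commonly cited target of specificity above 90% while detecting only half of the suboptimal-effort cases, which is the kind of trade-off discussed above.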

The extensive body of research on the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) Digit Span suggests that individuals with neurologic and developmental disabilities have reduced WISC-IV Digit Span scores when compared with neurotypical peers. Children with developmental disabilities have repeatedly shown reduced Digit Span ss. Coutinho, Mattos, and Malloy-Diniz (2009) reported that children with Attention Deficit Hyperactivity Disorder (ADHD; n = 186) had significantly reduced Digit Span Forward (DSF; ss = 5.73, p = .013) and Backward (DSB; ss = 3.90, p = .01) scores when compared with 80 control children (DSF = 6.20; DSB = 4.39). Similarly, Mayes and Calhoun (2007) suggested that children (aged 6–16) with ADHD (n = 724) and children with autism (n = 118) had significantly reduced (p < .0001) Digit Span scores when compared with their neurotypical peers (n = 149). Empirical research has also highlighted similar negative effects on Digit Span scores in individuals with affective disorders. Gruzelier, Seymour, Wilson, Jolley, and Hirsh (1988) revealed that individuals with schizophrenia (n = 36; ss = 5.85, p < .008), depression (n = 12; ss = 5.00, p < .001), and symptoms of mania (n = 9; ss = 6.11, p < .02) had significantly lower Digit Span scores than their non-diagnosed peers (n = 36; ss = 6.85). Kaslow, Rehm, and Siegel (1984) examined 108 children (age range 6–14 years) and reported that children's depression scores, as measured by the Child Depression Inventory (CDI; Kovacs, 1992), correlated significantly and negatively with performance on Digit Span (r = −.22, p < .01), although the effect was of small-to-medium magnitude.

To the authors' knowledge, no study has investigated the utility of embedded measures, specifically the WISC-IV Digit Span, in children with comorbidities. In recent decades, the field of psychology has broadened its awareness of psychiatric comorbidity and its usefulness in providing an in-depth understanding of the development of psychopathology (Angold, Costello, & Erkanli, 1999). Unfortunately, the literature all too often fails to consider how frequently psychiatric or neurologic conditions co-occur (Caron & Rutter, 1991), even though the rate of comorbidity is high. A brief literature review revealed that in medical populations, 40%–50% of children diagnosed with epilepsy were reported to have comorbid DSM-IV Axis I disorders (Jones et al., 2007; Pellock, 2004), 58% of children with TBIs had psychiatric disorders following injury (Bloom et al., 2001), and very low birth weight children were nearly four times as likely as their normal birth weight peers to be subsequently diagnosed with ADHD (Botting, Powls, & Cooke, 1997). Similar results were evident in developmental populations: a 50%–90% comorbidity rate has been reported in children with ADHD (Cuffe, Moore, & McKeown, 2005; Spencer, Biederman, & Wilens, 1999), 70% of children with autism spectrum disorder have been diagnosed with comorbid psychiatric disorders (Simonoff et al., 2008), 40%–50% of children with math disorders also have a Reading Disorder (RD; Lewis, Hitch, & Walker, 1994; Shalev, Auerbach, Manor, & Gross-Tsur, 2000), and RDs have been correlated with high rates of comorbidity in boys with externalizing disorders including ADHD, oppositional defiant disorder, and conduct disorder (Willcutt & Pennington, 2000).

Further, there is reason to hypothesize that children with comorbidities will also have compromised Digit Span scores. As discussed previously, Digit Span has proven useful as an embedded effort measure in children with mTBI (Kirkwood et al., 2011) and in children with a range of neurologic and developmental disabilities (Perna et al., 2012). However, it remains unclear whether this measure, or the suggested ss cutoff of 4 or 5, will hold when evaluating only children with comorbidities. The literature indicates that children with dual diagnoses also have reduced WISC-IV Digit Span scores when compared with their typical peers or those with single diagnoses. Wu, Anderson, and Castiello (2002) found a significant difference in Digit Span scores when evaluating children (n = 112; age range 7–13 years) with ADHD, ADHD and Learning Disabilities (LDs), and controls. Children with the dual diagnosis (ADHD + LD; ss = 6.92) scored lower than those with ADHD alone (ss = 7.96) or no diagnosis (ss = 8.79). Roodenrys, Koloski, and Grainger's (2001) study revealed similar findings: children with dual diagnoses of ADHD and RD (ss = 4.16) had significantly lower Digit Span scores than peers without multiple diagnoses (ss = 5.20; p < .01) and marginally lower scores than children with RD alone (ss = 4.55; p < .1). This led the authors to believe that the Digit Span cutoff score for children with dual diagnoses may be lower than previously reported by Kirkwood and colleagues (2011) and Perna and colleagues (2012).

Based on the aforementioned research, our inquiry investigates whether the WISC-IV Digit Span is valuable in detecting suboptimal effort during child neuropsychological evaluations in a clinical sample of dually diagnosed children, which to the authors' knowledge has yet to be investigated. Our hypothesis is that this embedded measure, frequently administered during neuropsychological evaluations, will prove useful, yet may reveal a lower cutoff score for suboptimal effort than previous findings: ≤5 (Kirkwood et al., 2011) and ≤4 (Perna et al., 2012).

Methods

Participants

Data were collected on a sample of 51 children (girls = 19, boys = 32) who were referred for neuropsychological evaluation due to academic and behavioral problems (Table 1). All children in the study sample were diagnosed with two or more disorders. Diagnoses included heterogeneous mixtures of affective disorders, mild uncomplicated TBIs, ADHD, LDs, Pervasive Developmental Disabilities (PDDs), and/or Intellectual Disabilities (IDs). While recognizing the potential limits on generalizability of a heterogeneous rather than a homogeneous sample, this sample was chosen because it represents the complex cases frequently seen by neuropsychologists. A cutoff determined from such a sample may be helpful in addressing those cases where diagnosis is not clear-cut and where comorbidities complicate the application of previous studies and cutoffs.

Table 1.

Demographics by group

| Demographic | Whole sample (n = 51) | Optimal group (n = 44) | Suboptimal group (n = 7) |
|---|---|---|---|
| Age, M (SD) | 11.8 (3.5) | 11.5 (3.4) | 8.4 (2.3) |
| Grade level, M (SD) | 5.6 (3.3) | 6.0 (3.2) | 2.6 (2.0) |
| Gender: boys (%) | 63 | 64 | 57 |
| Gender: girls (%) | 37 | 36 | 43 |

Diagnoses were determined and classified based strictly on DSM-IV-Text Revision (TR) (American Psychiatric Association, 2000) diagnostic criteria. Children diagnosed with ADHD had at least six of the nine Inattention or Hyperactive/Impulsive symptoms. Children diagnosed with LDs showed a 1.5–2 SD discrepancy between ability (Full-Scale IQ [FSIQ] or General Ability Index [GAI]) and academic achievement. The GAI can be useful in determining eligibility for classification because it reduces the influence of working memory and processing speed, two domains frequently impacted in children with LDs (Raiford, Weiss, Rolfhus, & Coalson, 2008). For inclusion as having an Affective Disorder (defined as one of the following diagnoses: Disruptive Behavior Disorder, Oppositional Defiant Disorder, Conduct Disorder, Major Depression, or Anxiety Disorder), diagnoses were based on clinical interviews with the child and a parent, review of a DSM-IV-TR diagnostic checklist, records review, the Achenbach Child Behavior Checklist, Youth Self-Report, and Teacher Rating Scale (Achenbach, 1991), the CDI (Kovacs, 1992), and other neuropsychological test data. The classification of PDD was determined following a clinical interview, record history, a DSM-IV diagnostic checklist, and the Gilliam Autism Rating Scale (Gilliam, 1995). Lastly, the classification of ID was determined using a WISC-IV FSIQ below 70 and deficits or impairments in adaptive functioning as measured by the Vineland Adaptive Behavior Scales-II (Sparrow, Cicchetti, & Balla, 2005). The sample's diagnostic subgroups included 32 children diagnosed with an Affective Disorder (63%), 18 diagnosed with mTBI (35%), 38 diagnosed with ADHD (75%), 13 diagnosed with LDs (26%), 5 diagnosed with PDD (10%), and 12 diagnosed with IDs (24%). All human data included in this manuscript were obtained in compliance with the Helsinki Declaration.

The mean age was 11.8 (3.5) years, with an age range of 6–18, and the mean education level was 5.6 (3.3) years. The sample's mean FSIQ was low average (88.1 [17.4]). The mean TOMM scores for the three trials were 44.3, 48.0, and 47.3. None of the children were involved in any reported litigation, had had any recent accidents, or were currently involved in a custody dispute. There was no apparent gain for these children to underperform on testing. None of the children evaluated were blind, deaf, or acutely confused, had significant visual perceptual impairments, or reported chronic pain.

Measures

All participants completed a neuropsychological evaluation consisting of a clinical interview, records review, and tests, including the WISC-IV (Wechsler, 2003), Wisconsin Card Sorting Test (WCST; Heaton, Chelune, Talley, Kay, & Curtiss, 1993), Children's Memory Scale (CMS; Cohen, 1997), the Trail Making subtest of the Delis–Kaplan Executive Function System (DKEFS; Delis, Kaplan, & Kramer, 2001), as well as several other measures that were not entered into this database. Embedded measures used for comparison with TOMM scores included the WISC-IV Digit Span subtest. The data were entered into and analyzed via SPSS.

Results

The mean FSIQ score for the whole sample was 88.1 (SD = 17.4) and ranged from 44 to 127. The verbal comprehension index ranged from 53 to 119 (M = 93.2, SD = 14.9); the perceptual reasoning index ranged from 47 to 145 (M = 93.5, SD = 18.8); the working memory index ranged from 50 to 135 (M = 87.0, SD = 17.8); and the processing speed index ranged from 53 to 125 (M = 84.7, SD = 14.5). The group's means for the WCST were average. The group's DKEFS Trail 2 scores were also average (M = 93.2, SD = 20.4); however, when demands increased during Trail 4, scores decreased to low average (M = 88.3, SD = 24.4). The group's means for CMS verbal memory were average; however, CMS visual memory and recognition were low average (Table 2).

Table 2.

Test score means (SD) for the whole sample, OEG, and SEG

| Neuropsychological Measures | Whole Sample (n = 51; M [SD]) | Optimal Effort Group (n = 44; M [SD]) | Suboptimal Effort Group (n = 7; M [SD]) | Effect Size^a |
|---|---|---|---|---|
| Full-Scale IQ | 88.1 (17.4) | 90.0 (17.7) | 76.7 (11.1) | Large |
| Verbal Comprehension | 93.2 (14.9) | 94.3 (14.9) | 86.1 (13.5) | Large |
| Perceptual Reasoning | 93.5 (18.8) | 96.5 (18.1) | 74.4 (9.4) | Large |
| Working Memory | 87.0 (17.8) | 88.0 (17.9) | 80.9 (17.7) | Small |
| Processing Speed | 84.7 (14.5) | 85.3 (14.9) | 80.7 (10.7) | Small |
| Digit Span | 86.7 (16.4) | 88.1 (16.2) | 78.1 (16.2) | Medium |
| CMS Visual Immediate | 87.1 (15.9) | 87.6 (15.5) | 83.3 (20.9) | Medium |
| CMS Visual Delay | 87.1 (16.5) | 87.9 (16.7) | 81.5 (15.2) | Small |
| CMS Verbal Immediate | 92.5 (20.7) | 93.1 (19.9) | 88.7 (26.4) | Small |
| CMS Verbal Delay | 90.5 (19.4) | 91.9 (18.8) | 82.1 (23.0) | Medium |
| CMS Delayed Recognition | 89.9 (19.5) | 91.1 (18.2) | 91.1 (18.2) | Medium |
| WCST Categories | 94.7 (10.1) | 95.2 (10.3) | 91.8 (9.1) | Small |
| WCST Perseverations | 92.7 (17.3) | 92.9 (18.2) | 91.0 (9.4) | Small |
| WCST Failure to Maintain Set | 95.0 (11.0) | 95.1 (10.5) | 93.7 (9.4) | Small |
| DKEFS Trails 2 | 93.2 (20.4) | 95.1 (19.4) | 77.0 (24.4) | Large |
| DKEFS Trails 4 | 88.3 (24.4) | 88.5 (25.1) | 88.5 (25.1) | Small |
| TOMM Trial 1 | 44.3 (5.5) | 45.48 (4.3) | 37.14 (6.8) | Large |
| TOMM Trial 2 | 48.0 (4.0) | 49.32 (1.3) | 40.00 (6.0) | Large |

Notes: CMS = Children's Memory Scale; WCST = Wisconsin Card Sorting Test; DKEFS = Delis–Kaplan Executive Function System; TOMM = Test of Memory Malingering.

^a Small effect sizes are <0.50, medium effect sizes range from 0.50 to 0.79, and large effect sizes are ≥0.80 (Cohen, 1992).

Scores for the whole sample on Trial 1 of the TOMM ranged from 25 to 50 (M = 44.3, SD = 5.5). Scores on Trial 2 of the TOMM ranged from 28 to 50 (M = 48.0, SD = 4.0). Seven participants (14%) scored below the published adult cutoff criteria. This percentage is higher than the TOMM failure rates previously reported in the literature, including 2% by Constantinou and McCaffrey (2003), 3% by Donders (2005), and 4% by Kirk and colleagues (2011). However, the 14% TOMM failure rate in our sample falls between the 10% reported by MacAllister, Nakhutina, Bender, Karantzoulis, and Carlson (2009) and the 18% reported by Kirkwood and colleagues (2011), and is also below the 18.5% failure rate on the Medical SVT reported by Kirkwood and colleagues (2012). Two groups were created from these scores: the Optimal Effort Group (OEG, n = 44; age range 6–18), with TOMM Trial 2 scores at or above the published adult cutoff criteria, and the Suboptimal Effort Group (SEG, n = 7; age range 6–13), with TOMM Trial 2 scores below the published adult cutoff criteria. Effect sizes were large (Cohen, 1992) for both age (r² = .095) and grade (r² = .136). Although a comparison of the groups should not be performed due to low statistical power, performance profiles of the participants who failed the TOMM (SEG) and those who did not (OEG) are presented in Table 2, including effect sizes. A detailed description of the seven children who failed the TOMM according to the published adult cutoff criteria is provided in Table 3.

Table 3.

Detailed case-by-case description of the SEG (n = 7)

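| | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 | Case 6 | Case 7 |
|---|---|---|---|---|---|---|---|
| Gender | Female | Male | Female | Male | Male | Female | Male |
| Dx | Affective, mTBI and ADHD | Affective and mTBI | Affective, mTBI and ID | Affective and ADHD | mTBI, PDD and ID | Affective, mTBI and ID | Affective, mTBI and ADHD |
| FSIQ | 84 | 87 | 57 | 89 | 74 | 72 | 74 |
| TOMM Trial 1 | 35 | 28 | 25 | 40 | 44 | 44 | 39 |
| TOMM Trial 2 | 36 | 33 | 44 | 42 | 44 | 44 | 42 |

Age: 13 (the only age value that survives in this row; the case it belongs to is not identified).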

Notes: FSIQ = Full-Scale IQ; TOMM = Test of Memory Malingering.

To examine embedded measures, the two groups were compared on their Digit Span scores. Sensitivity and specificity values and their respective Digit Span cutoff ss are presented in Table 4. A cutoff was considered optimal when specificity was 90% or greater while sensitivity was as high as possible. For Digit Span, the optimal cutoff ss was ≤4, resulting in a sensitivity of 43% and a specificity of 91%.
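
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sensitivity and specificity pairs are copied from Table 4, a child is treated as "failing" when the Digit Span ss is at or below the cutoff, and the function and variable names (`pick_cutoff`, `TABLE4_ROWS`) are hypothetical rather than part of the original SPSS analysis.

```python
# (cutoff, sensitivity, specificity) as reported in Table 4.
# A child "fails" the embedded measure when Digit Span ss <= cutoff.
TABLE4_ROWS = [
    (10, 0.71, 0.32),
    (9, 0.71, 0.40),
    (8, 0.57, 0.55),
    (7, 0.57, 0.67),
    (6, 0.57, 0.73),
    (5, 0.57, 0.80),
    (4, 0.43, 0.91),
    (3, 0.29, 0.95),
]


def pick_cutoff(rows, min_specificity=0.90):
    """Return the row with the highest sensitivity among cutoffs whose
    specificity meets the minimum, or None if no cutoff qualifies."""
    eligible = [row for row in rows if row[2] >= min_specificity]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[1])


best = pick_cutoff(TABLE4_ROWS)
print(best)  # (4, 0.43, 0.91)
```

Only the ≤4 and ≤3 cutoffs reach 90% specificity, and of those two the ≤4 cutoff has the higher sensitivity, which matches the cutoff reported above.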

Table 4.

Sensitivity, specificity, PPV, and NPV values for Digit Span cutoffs

| Digit Span cutoff for determining a "pass" (otherwise "fail") | SN | SP | PPV | NPV | PPV (18% BR) | NPV (18% BR) | PPV (10% BR) | NPV (10% BR) | PPV (4% BR) | NPV (4% BR) | PPV (3% BR) | NPV (3% BR) | PPV (2% BR) | NPV (2% BR) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DS >10 | 0.71 | 0.32 | 0.14 | 0.87 | 0.19 | 0.83 | 0.10 | 0.91 | 0.04 | 0.96 | 0.03 | 0.97 | 0.03 | 0.98 |
| DS >9 | 0.71 | 0.40 | 0.16 | 0.89 | 0.21 | 0.86 | 0.12 | 0.92 | 0.05 | 0.97 | 0.04 | 0.98 | 0.02 | 0.99 |
| DS >8 | 0.57 | 0.55 | 0.17 | 0.89 | 0.21 | 0.85 | 0.12 | 0.92 | 0.05 | 0.98 | 0.04 | 0.98 | 0.02 | 0.98 |
| DS >7 | 0.57 | 0.67 | 0.21 | 0.91 | 0.27 | 0.88 | 0.16 | 0.93 | 0.07 | 0.97 | 0.05 | 0.98 | 0.03 | 0.99 |
| DS >6 | 0.57 | 0.73 | 0.25 | 0.91 | 0.31 | 0.89 | 0.19 | 0.94 | 0.08 | 0.98 | 0.06 | 0.98 | 0.04 | 0.99 |
| DS >5 | 0.57 | 0.80 | 0.31 | 0.92 | 0.38 | 0.89 | 0.24 | 0.94 | 0.10 | 0.98 | 0.08 | 0.98 | 0.05 | 0.99 |
| DS >4 | 0.43 | 0.91 | 0.43 | 0.91 | 0.51 | 0.88 | 0.35 | 0.93 | 0.17 | 0.97 | 0.13 | 0.98 | 0.09 | 0.99 |
| DS >3 | 0.29 | 0.95 | 0.50 | 0.89 | 0.58 | 0.86 | 0.39 | 0.92 | 0.19 | 0.97 | 0.15 | 0.98 | 0.11 | 0.99 |

Notes: PPVs and NPVs for Digit Span values at base rates of 18%, 10%, 4%, 3%, and 2%. Overall performance, which is passing or failing the TOMM, is based on the adult cutoff and decisions outlined in the test manual (Tombaugh, 1996). SN = Sensitivity; SP = Specificity; PPV = positive predictive value; NPV = negative predictive value; BR = base rates of failure; TOMM = Test of Memory Malingering; DS = Digit Span. Base rate references: 18% BR (Kirkwood et al., 2011); 10% BR (Loughan & Perna, 2012a; MacAllister et al., 2009); 4% BR (Kirk et al., 2011); 3% BR (Donders, 2005); and 2% BR (Constantinou & McCaffrey, 2003).

The positive predictive value (PPV) is the proportion of participants scoring at or below the cutoff who were in the SEG, whereas the negative predictive value (NPV) is the proportion of participants scoring above the cutoff who were in the OEG. For Digit Span at the ≤4 cutoff, results showed a PPV of 43% and an NPV of 91%, which shows good concordance with the TOMM. PPV and NPV for Digit Span at base rates of failure of 18% (Kirkwood et al., 2011), 10% (Loughan & Perna, 2012a; MacAllister et al., 2009), 4% (Kirk et al., 2011), 3% (Donders, 2005), and 2% (Constantinou & McCaffrey, 2003) are displayed in Table 4.
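
For readers who wish to see how predictive values shift with the base rate, the sketch below applies the standard Bayes' rule formulas for PPV and NPV. It is an illustration, not the authors' SPSS procedure: the function name is hypothetical, and the inputs are the rounded sensitivity (0.43) and specificity (0.91) reported for the ≤4 cutoff.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a given base rate of suboptimal effort."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded values reported for the Digit Span ss <= 4 cutoff.
SN, SP = 0.43, 0.91

for base_rate in (0.18, 0.10, 0.04, 0.03, 0.02):
    ppv, npv = predictive_values(SN, SP, base_rate)
    print(f"BR {base_rate:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Run with these rounded inputs, the loop reproduces the DS >4 row of Table 4 to two decimals (for example, PPV = 0.51 and NPV = 0.88 at an 18% base rate), and it shows how quickly PPV falls as the assumed base rate of suboptimal effort decreases.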

Discussion

The present study sought to investigate whether the WISC-IV Digit Span, commonly administered in pediatric neuropsychological evaluations, could have utility in determining the level of test-taking effort in dually diagnosed children. A group of 51 children referred for neuropsychological evaluations was divided into two groups based on performance on TOMM Trial 2, an already validated measure of effort. Fourteen percent of this sample performed below the cutoff score for TOMM Trial 2. During evaluations, each child was administered a series of neuropsychological measures, including Digit Span as part of the WISC-IV. The pattern of performance on this measure was investigated as a potential embedded determinant of effort, which to the authors' knowledge has yet to be studied or published in comorbid pediatric populations. A heterogeneous sample was chosen because a cutoff determined from such a sample may be helpful in addressing those cases where comorbidities complicate the application of previously determined cutoffs. That being said, the use of a heterogeneous rather than a homogeneous sample is a limitation of this study.

Empirical studies suggest that the Digit Span subtest is a valid embedded determinant of effort in adults (Axelrod et al., 2006; Babikian, Boone, Lu, & Arnold, 2006; Duncan & Ausborn, 2002) and, more recently, in children (Kirkwood et al., 2011; Perna et al., 2012). When comparing Digit Span performance between the OEG and SEG within our sample, no significant differences were found. When assessing Digit Span ss for specificity and sensitivity, an ss of ≤4 yielded optimal specificity at 91% with 43% sensitivity. This aligns with previous research by Kirkwood and colleagues (2011), which suggests a Digit Span ss cutoff of ≤5, and by Perna and colleagues (2012), which suggests an ss cutoff of ≤4.

As suggested by Kirkwood and colleagues (2011), positive and negative predictive power may be more clinically useful than specificity and sensitivity because they allow clinicians to determine whether a specific SVT/effort score is suggestive of suboptimal effort in a client. At the suggested Digit Span cutoff score based on this study's dually diagnosed sample (ss ≤ 4), the PPV was 43% and the NPV was 91%.

Overall, although our results are consistent with previous findings that the WISC-IV Digit Span subtest can be used as an embedded measure of effort, and extend them to children with comorbid diagnoses, they do not support our original hypothesis that the cutoff score would be lower than previously reported. It appears that even with findings suggesting that clinical populations have reduced Digit Span scores (Coutinho et al., 2009; Gruzelier et al., 1988; Kaslow et al., 1984; Mayes & Calhoun, 2007), and even more so when populations with multiple diagnoses are investigated (Roodenrys et al., 2001; Wu et al., 2002), the standard-score cutoffs of ≤70–75 (ss ≤4–5) are maintained in children providing valid effort. These results indicate that Digit Span ss ≤4 are strongly suggestive of suboptimal effort even in children with multiple diagnoses.
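
The correspondence between the scaled-score cutoffs (ss ≤4 or ≤5) and the standard-score cutoffs (≤70–75) follows from the two metrics' conventions: Wechsler scaled scores have a mean of 10 and an SD of 3, and standard scores have a mean of 100 and an SD of 15. The Python sketch below shows this linear, z-score-based equivalence; the function name is hypothetical, and the conversion is a metric equivalence rather than a lookup in the WISC-IV norm tables.

```python
def scaled_to_standard(scaled_score):
    """Convert a scaled score (M = 10, SD = 3) to the standard-score
    metric (M = 100, SD = 15) by matching z-scores.

    One scaled-score point equals 15 / 3 = 5 standard-score points.
    """
    return 100 + 5 * (scaled_score - 10)


for ss in (4, 5):
    print(f"ss = {ss} -> standard score = {scaled_to_standard(ss)}")
```

Under this equivalence, an ss of 4 corresponds to a standard score of 70 and an ss of 5 to a standard score of 75.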

The current study did not attempt to examine the base rate of non-credible effort by determining the rate of false positives; that goes beyond the intent of this article. However, a detailed table was provided of the children in this study who failed the TOMM (n = 7). This should provide additional information regarding the demographics, diagnoses, and cognitive ability of our sample and perhaps help determine the factors that may compromise TOMM performance in children. As found in the literature, questions remain about the utility of effort testing in those with ID (Loughan & Perna, 2012a; MacAllister et al., 2009). Research has revealed that some children who fail SVTs may do so because of reduced cognitive ability rather than true malingering or reduced motivation. In the current study sample, one child had ID (Case 3 FSIQ = 57) and three had borderline cognition (Case 5 FSIQ = 74; Case 6 FSIQ = 72; and Case 7 FSIQ = 74). However, three children had low-average cognitive performance during intelligence testing (Case 1 FSIQ = 84; Case 2 FSIQ = 87; and Case 4 FSIQ = 89). This suggests that reduced intelligence alone cannot explain SVT failures. It should also be noted that within the entire sample, seven children had FSIQ scores ≤70 and were thus in the impaired range. However, of these seven, only one failed the TOMM (Case 3). Not surprisingly, given that Digit Span is part of the IQ test, five of the seven children with impaired intellect had Digit Span ss of ≤4. Future investigations should examine the utility of Digit Span in children with ID. Given past research expressing concerns about the possible inappropriate use of SVTs in children with ID, Digit Span may raise the same concerns, particularly because it is part of the IQ measure. A larger sample is needed to draw these conclusions. Another consideration regarding false positives could be age. In this study, the SEG was almost 3 years younger than the OEG. Donders (2005) and Loughan and Perna (2012b) found that TOMM scores tend to improve as children age, with children aged 6–8 displaying significantly lower scores. This suggests that children may sometimes fail the TOMM for reasons related to cognitive development and that this age group potentially should be screened out when investigating effort using these measures. Further support for this hypothesis is that four of the seven children in our SEG were within this age range. Further investigation is needed in this area.

Limitations of our study include a small sample size, which may limit generalizability. The sample is also widely heterogeneous, including comorbid combinations of many developmental and neurological conditions. We recommend future investigations across homogeneous comorbid populations to determine the most appropriate cutoff score within each dually diagnosed group. Future research should also include a second confirmatory SVT or index of motivation in the battery, in combination with the TOMM, which will help ensure the accuracy of Digit Span patterns for identifying the level of effort.

Conflict of interest

None declared.

References

Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4–18 and 1991 profile. Burlington, VT: Department of Psychiatry, University of Vermont.
Altman, D. G., & Bland, J. M. (1994). Diagnostic tests: Sensitivity and specificity. British Medical Journal, 308, 1552.
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders: Text revision (Rev. 4th ed.). Washington, DC: Author.
Angold, A., Costello, E. J., & Erkanli, A. (1999). Comorbidity. Journal of Child Psychology and Psychiatry, 40(1), 57–87.
Arnett, P. A., Hammeke, T. A., & Schwartz, L. (1995). Quantitative and qualitative performance on Rey's 15-Item Test in neurological patients and dissimulators. The Clinical Neuropsychologist, 9, 17–26.
Axelrod, B., Fichtenberg, N., Millis, S., & Wertheimer, J. (2006). Detecting incomplete effort with digit span from the Wechsler Adult Intelligence Scale-third edition. The Clinical Neuropsychologist, 20, 513–523.
Babikian, T., Boone, K., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20, 145–159.
Blaskewitz, N., Merten, T., & Kathmann, N. (2008). Performance of children on symptom validity tests: TOMM, MSVT, and FIT. Archives of Clinical Neuropsychology, 23, 379–391.
Bloom, D. R., Levin, H. S., Ewing-Cobbs, L., Saunders, A. E., Song, J., & Fletcher, J. M. (2001). Lifetime and novel psychiatric disorders after pediatric traumatic brain injury. Journal of the American Academy of Child and Adolescent Psychiatry, 40(5), 572–579.
Botting, N., Powls, A., & Cooke, R. W. I. (1997). Attention deficit hyperactivity disorders and other psychiatric outcomes in very low birthweight children at 12 years. Journal of Child Psychology and Psychiatry, 38, 931–941.
Brooks, B. L., Sherman, E. M. S., & Krol, A. L. (2012). Utility of the TOMM Trial 1 as an indicator of effort in children and adolescents. Archives of Clinical Neuropsychology, 27, 23–29.
Caron, C., & Rutter, M. (1991). Comorbidity in child psychopathology: Concepts, issues and research strategies. Journal of Child Psychology and Psychiatry, 32(7), 1063–1080.
Cohen, B. (1997). Children's Memory Scale. San Antonio, TX: The Psychological Corporation.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Constantinou, M., Bauer, L., Ashendorf, L., Fisher, J. M., & McCaffrey, R. J. (2005). Is performance on recognition memory effort measures indicative of generalized poor performance on neuropsychological tests? Archives of Clinical Neuropsychology, 20, 191–198.
Constantinou, M., & McCaffrey, R. J. (2003). Using the TOMM for evaluating children's efforts to perform optimally on neuropsychological measures. Child Neuropsychology, 9(2), 81–90.
Courtney, J. C., Dinkins, J. P., Allen, L. M., & Kuroski, K. (2003). Age related effects in children taking the Computerized Assessment of Response Bias and Word Memory Test. Child Neuropsychology, 9, 109–116.
Coutinho, G., Mattos, P., & Malloy-Diniz, L. F. (2009). Neuropsychological differences between attention deficit hyperactivity disorder and control children and adolescents referred for academic impairment. Revista Brasileira de Psiquiatria, 31(2), 141–144.
Cuffe, S. P., Moore, C. G., & McKeown, R. E. (2005). Prevalence and correlates of ADHD symptoms in the national health interview survey. Journal of Attention Disorders, 9(2), 392–401.
Delis, D. C., Kaplan, E., & Kramer, J. H. (2001). The Delis-Kaplan Executive Function System. San Antonio, TX: The Psychological Corporation.
Donders, J. (2005). Performance on the Test of Memory Malingering in a mixed pediatric sample. Child Neuropsychology, 11, 221–227.
Duncan, S., & Ausborn, D. (2002). The use of reliable digits to detect malingering in a criminal forensic pretrial population. Assessment, 9, 56–61.
Etherton, J. L., Bianchini, K. J., Greve, K. W., & Ciota, M. A. (2005). Test of Memory Malingering performance is unaffected by laboratory-induced pain: Implications for clinical use. Archives of Clinical Neuropsychology, 20, 375–384.
Gervais, R. O., Rohling, M. L., Green, P., & Ford, W. (2004). A comparison of WMT, CARB, and TOMM failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology, 19, 475–487.
Gilliam, J. E. (1995). Gilliam Autism Rating Scale. Austin, TX: Pro-Ed.
Greve, K. W., Binder, L. M., & Bianchini, K. J. (2009). Rates of below-chance performance in forced-choice symptom validity tests. The Clinical Neuropsychologist, 23, 534–544.
Gruzelier, J., Seymour, K., Wilson, L., Jolley, A., & Hirsh, S. (1988). Impairments on neuropsychological tests of temporohippocampal and frontohippocampal functions and word fluency in remitting schizophrenia and affective disorders. Archives of General Psychiatry, 45, 623–629.
Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G., & Curtiss, G. (1993). Wisconsin Card Sorting Test manual: Revised and expanded. Odessa, FL: Psychological Assessment Resources.
Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 17, 1–23.
Irwin-Chase, H., & Burns, B. (2000). Developmental changes in children's abilities to share and allocate attention in a dual task. Journal of Experimental Child Psychology, 77, 61–85.
Jones, J. E., Watson, R., Sheth, R., Caplan, R., Koehn, M., & Seidenberg, M. (2007). Psychiatric comorbidity in children with new onset epilepsy. Developmental Medicine and Child Neurology, 49, 493–497.
Kaslow, N. J., Rehm, L. P., & Siegel, A. W. (1984). Social-cognitive and cognitive correlates of depression in children. Journal of Abnormal Child Psychology, 12(4), 605–620.
Kirk, J. W., Harris, B., Hutaff-Lee, C. F., Koelemay, S. W., Dinkins, J. P., & Kirkwood, M. W. (2011). Performance on the Test of Memory Malingering (TOMM) among a large clinic-referred pediatric sample. Child Neuropsychology, 17(3), 242–254.
Kirkwood, M. W., Hargrave, D. D., & Kirk, J. W. (2011). The value of the WISC-IV Digit Span subtest in detecting noncredible performance during pediatric neuropsychological examinations. Archives of Clinical Neuropsychology, 26, 377–384.
Kirkwood, M. W., Yeates, K. O., Randolph, C., & Kirk, J. W. (2012). The implication of symptom validity test failure for ability-based test performance in a pediatric sample. Psychological Assessment, 24(1), 36–45.
Kovacs, M. (1992). Child Depression Inventory. North Tonawanda, NY: Multi-Health Systems.
Larrabee, G. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666–679.
Lewis, C., Hitch, G. J., & Walker, P. (1994). The prevalence of specific arithmetic difficulties and specific reading difficulties in 9 to 10 year old boys and girls. Journal of Child Psychology and Psychiatry, 35(2), 283–292.
Loughan, A. R., & Perna, R. (2012a). Performance and specificity rates in the Test of Memory Malingering: An investigation into pediatric clinical populations. Applied Neuropsychology: Child.
Loughan, A. R., & Perna, R. (2012b). Performance on the Test of Memory Malingering (TOMM) by age in children. Poster presented at the International Neuropsychological Society Conference, Montreal, Canada.
MacAllister, W. S., Nakhutina, L., Bender, H. A., Karantzoulis, S., & Carlson, C. (2009). Assessing effort during neuropsychological evaluation with the TOMM in children and adolescents with epilepsy. Child Neuropsychology, 15, 521–531.
Mayes, S. D., & Calhoun, S. L. (2007). Learning, attention, writing, and processing speed in typical children and children with ADHD, autism, anxiety, depression, and oppositional-defiant disorder. Child Neuropsychology, 13, 469–493.
Meyers, J. E., & Volbrecht, M. E. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.
Nelson, N., Boone, K., Dueck, A., Wagener, L., Lu, P., & Grills, C. (2003). Relationships between eight measures of suspect effort. The Clinical Neuropsychologist, 17, 263–272.
Nies, K. J., & Sweet, J. J. (1994). Neuropsychological assessment and malingering: A critical review of past and present strategies. Archives of Clinical Neuropsychology, 9, 501–552.
O'Bryant, S., & Lucas, J. (2006). Estimating the predictive value of the Test of Memory Malingering: An illustrative example for clinicians. The Clinical Neuropsychologist, 20(3), 533–540.
Pellock, J. M. (2004). Defining the problem: Psychiatric and behavioral comorbidity in children and adolescents with epilepsy. Epilepsy and Behavior, 5(3), 3–9.
Perna, R., & Loughan, A. R. (2012). The influence of effort on neuropsychological performance in children: Is performance on the TOMM indicative of neuropsychological ability? Applied Neuropsychology: Child.
Perna, R., Loughan, A. R., Hertza, J., & Segraves, K. (2012). The value of embedded measures in detecting suboptimal effort in children: An investigation into the WISC-IV Digit Span and CMS Verbal Memory Subtests. Applied Neuropsychology: Child.
Raiford, S. E., Weiss, L. G., Rolfhus, E., & Coalson, D. (2008). General ability index: Technical report #4. San Antonio, TX: The Psychological Corporation.
Roodenrys, S., Koloski, N., & Grainger, J. (2001). Working memory function in attention deficit hyperactivity disordered and reading disabled children. British Journal of Developmental Psychology, 19, 325–337.
Rosenfeld, B., Sands, S. A., & van Gorp, W. G. (2000). Have we forgotten the base rate problem? Methodological issues in the detection of distortion. Archives of Clinical Neuropsychology, 15, 349–359.
Shalev, R. S., Auerbach, J., Manor, O., & Gross-Tsur, V. (2000). Developmental dyscalculia: Prevalence and prognosis. European Child and Adolescent Psychiatry, 9(2), 58–64.
Sherman, D. S., Boone, K. B., Lu, P., & Razani, J. (2002). Re-examination of a Rey Auditory Verbal Learning Test/Rey Complex Figure discriminant function to detect suspect effort. The Clinical Neuropsychologist, 16, 242–250.
Simonoff, E., Pickles, A., Charman, T., Chandler, S., Loucas, T., & Baird, G. (2008). Psychiatric disorders in children with autism spectrum disorders: Prevalence, comorbidity, and associated factors in a population-derived sample. Journal of the American Academy of Child and Adolescent Psychiatry, 47(8), 921–929.
Sparrow, S. S., Cicchetti, D. V., & Balla, D. A. (2005). Vineland Adaptive Behavior Scales II manual. Circle Pines, MN: AGS Publishing.
Spencer, T., Biederman, J., & Wilens, T. (1999). Attention-deficit/hyperactivity disorder and comorbidity. Pediatric Clinics of North America, 46(5), 915–927.
Tombaugh, T. N. (1996). TOMM, Test of Memory Malingering. New York: Multi-Health Systems.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children-Fourth Edition. San Antonio, TX: The Psychological Corporation.
Willcutt, E. G., & Pennington, B. F. (2000). Psychiatric comorbidity in children and adolescents with reading disability. Journal of Child Psychology and Psychiatry, 41(8), 1039–1048.
Wu, K. K., Anderson, V., & Castiello, U. (2002). Neuropsychological evaluation of deficits in executive functioning for ADHD children with or without learning disabilities. Developmental Neuropsychology, 22(2), 501–531.