Abstract

Exploratory factor analyses (EFAs) of the Comprehensive Trail Making Test (CTMT) for the total standardization sample suggested a possible two-factor solution that might reflect the differences between Trails 1–3 and Trails 4 and 5 better than a single Composite Index. The purpose of this study was to conduct a confirmatory factor analysis (CFA) of the two-factor structure with the subset of the standardization sample aged 18 or younger that had completed all five trail tasks. The sample included 251 boys and 306 girls, ages 8–18, with a mean age of 12.76 (SD = 3.07). Data were collected across 16 states with representation from all regions of the USA. Standardized scores on each of the five trail tasks were considered. The results of the CFA using Mplus indicated a good fit for the two-factor model, χ2(4) = 18.686, p = .0009, root mean-square error of approximation = 0.081, comparative fit index = 0.986, and standardized root-mean-squared residual = 0.021. A one-factor model was not supported. As suggested by the EFA in the manual, Trails 1–3 and Trails 4 and 5, while related, appear to differ in subtle ways that may be most meaningful in the evaluation of children with neurodevelopmental differences. Implications and possible explanations for this difference are discussed.

Introduction

Neuropsychological assessment is concerned with a range of functional domains; executive function (EF) is one of these global domains. EF has been conceptualized as comprising a variety of psychological processes, including those necessary for problem-solving; self-regulation; planning; directing, shifting, and maintaining attention; organization; and abstract reasoning (Barkley, 2000; Cicerone, 2005; Zelazo, Craik, & Booth, 2004). In effect, it is believed that intact EF allows an individual to be alert, productive, and creative, and to perform important activities of daily living, whereas deficits in EF result in problems in daily functioning (Baron, 2004; Cicerone, 2005). Many measures have been developed to assess various aspects of EF; perhaps the most common of these is the Trail Making Test (TMT).

The TMT has a long history of use in neuropsychological assessment (Franzen, 2000; Lezak, Howieson, Loring, Hannay, & Fischer, 2004; Strauss, Sherman, & Spreen, 2006). It has been used as a measure of processing speed and cognitive flexibility (Miner & Ferraro, 1998; Wecker, Kramer, Wisniewski, Delis, & Kaplan, 2000), as well as a nonspecific measure of brain dysfunction (Reitan & Wolfson, 1995). The TMT includes two parts, generally designated as TMT-A and TMT-B. In the TMT-A, the individual connects encircled numerals in order; in the TMT-B, the individual connects encircled numerals and letters in order, alternating from numeral to letter. Miner and Ferraro (1998) further argued that the TMT-B may tap inhibitory mechanisms in that the dominant response (i.e., following numerical order alone) must be inhibited. As such, TMT-B results are interpreted as reflecting set switching and cognitive flexibility (Arbuthnott & Frank, 2000).

Although it has been argued that performance on the TMT may reflect difficulty with inhibition or attention (Miner & Ferraro, 1998), there are no stimuli present on the TMT that need to be ignored or that are irrelevant to the task (i.e., no distractor elements). Additionally, the normative data available for the TMT do not meet the current standards (i.e., stratified and reflective of the population). The Comprehensive TMT (CTMT; Reynolds, 2002) was created to address some of the shortcomings of the original TMT. The norms for the CTMT were developed based on U.S. Census information using stratified random sampling. The CTMT was extended to include five separate tasks to increase its utility when measuring executive functioning, including the presence of distractor elements on specific tasks.

Performance on the CTMT improved with age and peaked for individuals in their early to mid-twenties, in a pattern consistent with many studies of the maturation of the frontal lobes (Reynolds & Horton, 2008). The CTMT Composite Index has been found to have acceptable construct and criterion validity for use in the assessment of children with traumatic brain injury (TBI) when used as part of an assessment of executive functioning (e.g., Allen, Haderlie, Kazakov, & Mayfield, 2009). In a study comparing the performance of children who had sustained a TBI to a healthy comparison group, the TBI group performed worse on all five trails and the Composite Index. Notably, Trails 4 and 5 were found to be the most accurate in discriminating between the TBI and control groups (Armstrong, Allen, Donohue, & Mayfield, 2008). Consistent with this, the exploratory factor analysis (EFA) of the standardization sample (ages 8–75 years) suggested the emergence of two different factors on the CTMT: the first comprised Trails 1 through 3, and the second comprised Trails 4 and 5 (Reynolds, 2002). Theoretically, Trails 1 through 3 require the examinee to complete the trail using a single concept with increasing levels of distraction, whereas Trails 4 and 5 require the examinee to shift between two different concepts. This distinction supports the idea that there would be two factors rather than a single factor.

In an earlier study by Atkinson and Ryan (2008), the analysis of three different trail tasks, including the CTMT, yielded a two-factor solution (sequencing and shifting) that provided a better fit than either a unitary or a three-factor model. Moreover, Atkinson and colleagues (2010) again explored the construct validity of three variations of trail tasks, not including the CTMT, and their confirmatory factor analysis (CFA) also yielded a two-factor solution (sequencing and shifting). Interpretation of measures included in neuropsychological assessment should be based on the construct validity of the scores generated; for this reason, it is important to determine whether it is sufficient and appropriate to interpret a composite score, or whether interpretation of the underlying factor structure is more meaningful and appropriate in describing the integrity of brain function. These findings raise the question of whether a single score represents the construct(s) being measured by the five trail tasks that comprise the CTMT. The purpose of this study was to conduct a CFA of a two-factor structure with the subset of the standardization sample aged 18 or younger. It was hypothesized that, similar to the results of Atkinson and Ryan (2008) and Atkinson and colleagues (2010), a two-factor model would be acceptable and perhaps preferable.

Method

Participants

Although the CTMT is used with individuals from 8 to 75 years of age, the sample used for this study included ages 8–18. This subset was selected due to the age-related changes that may be masked by including the full adult population and the many developmental factors that complicate test interpretation for youth during these years, in hopes of adding greater clarity to the CTMT interpretive schema. Standardization data were collected across 16 states with representation from all regions of the USA; standardization data were matched to census information as reported by the U.S. Census Bureau (1998) using stratified random sampling. Additional information on the standardization sample is provided in the test manual. For these analyses, only participants who completed all five tasks that comprise the CTMT were considered; participants representing special groups were excluded. The resulting sample comprised 251 boys and 306 girls with a mean age of 12.76 (SD = 3.07); the majority of the sample was white (81.51%). Additional demographic data are provided in Table 1.

Table 1.

Sample demographics (N = 557)

Variable               N (%)          % of U.S. School Age Population^a
Gender
  Boys                 251 (45.06)    51
  Girls                306 (54.94)    49
Race/Ethnicity
  White                454 (81.51)    80
  African American      48 (8.62)     13
  Hispanic              55 (9.87)
Region of Country
  Northeast             80 (14.36)    18
  Midwest              152 (27.29)    22
  South                210 (37.70)    36
  West                 115 (20.65)    24

Measures

Comprehensive Trail-Making Test (Reynolds, 2002). The CTMT consists of five trails; a practice trial precedes administration of Trails 1, 4, and 5. Trail 1 (T1) is similar to the TMT-A and requires the subject to draw a line connecting circles with the numbers 1–25. Trails 2 (T2) and 3 (T3) add distractors: empty circles (T2) and circles with figures (T3). Trail 4 (T4) introduces set switching, alternating between Arabic numerals and number words. Trail 5 (T5) is comparable to the TMT-B, with alternation between numerals and letters, but also includes empty circles as distractors.

Scoring is based on the time taken to complete each trail. Raw scores are converted into normalized T-scores (M = 50, SD = 10). A Composite Index score is derived from overall performance across the five trails. Reliability of scores from the CTMT has been found acceptable for clinical use, with internal consistency coefficients ranging from 0.70 to 0.77 for the five trails and 0.92 for the Composite Index (Reynolds, 2002). Test–retest reliability coefficients range from 0.70 to 0.78 for the five trails and 0.84 for the Composite Index. Scorer reliability ranged from 0.96 to 0.99 across the five trails and the Composite Index. Validity data provided in the manual suggest that there is a minimal level of association with motor tasks and that CTMT performance improves with age in childhood and the early adult years; additional concurrent and construct validity studies have been conducted independently as well (Allen et al., 2009).
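The T-score metric described above can be illustrated with a short sketch. It assumes the generic definition of a normalized T-score (a percentile rank in the norm group is mapped to a normal deviate and rescaled to M = 50, SD = 10); the actual CTMT norm tables are not reproduced here, and the function name is hypothetical.

```python
from statistics import NormalDist


def normalized_t_score(percentile_rank: float) -> float:
    """Map a percentile rank (0 < p < 1) in a norm group to a T-score.

    Assumption: the generic normalized T-score definition, not the
    CTMT's published conversion tables.
    """
    if not 0.0 < percentile_rank < 1.0:
        raise ValueError("percentile_rank must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(percentile_rank)  # standard normal deviate
    return 50.0 + 10.0 * z                     # rescale to M = 50, SD = 10


# A performance at the median of the norm group maps to the scale mean.
print(round(normalized_t_score(0.50), 1))
```

Because the trails are timed, the percentile rank here would have to describe performance (faster completion ranking higher), not raw seconds.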

At this time, a single composite score is provided, combining all five trail tasks, along with a standard score for each of the five trails; as noted, the EFAs reported in the test manual indicated the possibility of a two-factor structure, with Trails 1–3 potentially loading on one factor and Trails 4 and 5 loading on a second factor (Reynolds, 2002). Allen et al. (2009) examined two subscales based on Trails 1–3 and Trails 4 and 5. Although they did not have the sample size necessary to conduct a CFA, correlations between the two created subscales (termed Simple and Complex, respectively) were .55 for the normal control group and .77 for the clinical group. Although statistically significant, these correlations suggest that, for either group, the two subscales measure partly overlapping and partly distinct constructs.
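One way to read the subscale correlations above is in terms of shared variance (the squared correlation). The sketch below is only an arithmetic illustration of that reading, not an analysis reported by Allen et al.; the function name is hypothetical.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation."""
    return r ** 2


# Correlations between the Simple and Complex subscales quoted in the text.
for group, r in (("normal control", 0.55), ("clinical", 0.77)):
    shared = shared_variance(r)
    print(f"{group}: r = {r:.2f}, shared = {shared:.2f}, not shared = {1 - shared:.2f}")
```

On this reading, the subscales share roughly 30% of their variance in the control group and roughly 59% in the clinical group, leaving a substantial unshared portion in both.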

Results

Two-Factor Model

CFA was conducted to test the hypothesized two-factor structure. The correlation matrix used in the analysis is shown in Table 2. Trails 1–3 were hypothesized to load on one factor, and Trails 4 and 5 were hypothesized to load on another factor. This two-factor model was examined using Mplus version 5.21. To assess the goodness of fit of the model, we examined a variety of fit statistics. First, we examined the chi-square fit statistic as an exact fit test of the model. The chi-square fit statistic was significant, χ2(4) = 18.69, p < .001, and thus did not support the two-factor structure. The chi-square fit statistic, however, is known to be overly sensitive to trivial misspecification in large samples. Thus, several approximate fit indices were also examined: (a) comparative fit index (CFI); (b) standardized root-mean-squared residual (SRMR); and (c) root mean-square error of approximation (RMSEA). Recommended criteria for a good model fit are CFI > 0.95, SRMR < 0.08, and RMSEA < 0.08 (Browne & Cudeck, 1993; Hu & Bentler, 1999). The fit indices indicated a good fit of the hypothesized factor structure: CFI = 0.99, SRMR = 0.02, and RMSEA = 0.08 (0.081 to three decimals, at the margin of the recommended cutoff). The standardized factor loadings are provided in Table 3. All standardized factor loadings were approximately 0.75 or higher (range = .746–.814), indicating that all items are adequately related to their respective factors. The correlation between the two factors was .79.
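The reported p value and RMSEA can be approximately reproduced from the chi-square statistic, its degrees of freedom, and the sample size. The sketch below uses the textbook formulas and is not the Mplus implementation; the N − 1 term in the RMSEA denominator is an assumption (some programs use N), and the function names are hypothetical.

```python
import math


def chi_square_p_even_df(chi_square: float, df: int) -> float:
    """Upper-tail p value of a chi-square statistic, valid for even df only.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = chi_square / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA (assumption: N - 1 in the denominator)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Two-factor model: chi-square(4) = 18.686 with N = 557.
print(f"p = {chi_square_p_even_df(18.686, 4):.4f}")
print(f"RMSEA = {rmsea(18.686, 4, 557):.3f}")
```

With these inputs the sketch gives values in line with the p = .0009 and RMSEA = 0.081 reported in the abstract.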

Table 2.

Correlation matrix of five items on CTMT

          Trail 1   Trail 2   Trail 3   Trail 4   Trail 5
Trail 1
Trail 2   .634
Trail 3   .521      .616
Trail 4   .462      .473      .513
Trail 5   .455      .446      .485      .584

Note: CTMT = Comprehensive Trail Making Test.

Table 3.

Factor loadings for two-factor structure of CTMT

          Factor 1       Factor 2
Trail 1   .746 (.024)
Trail 2   .814 (.022)
Trail 3   .753 (.024)
Trail 4                  .782 (.028)
Trail 5                  .747 (.029)

Notes: CTMT = Comprehensive Trail Making Test. Standardized factor loadings are shown and standard errors are in parentheses.
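As an illustration of how the loadings in Table 3 relate to the correlations in Table 2, the sketch below computes the correlations implied by the two-factor model and the residuals against the observed values. It assumes a standardized solution with no cross-loadings or correlated residuals and uses the rounded factor correlation of .79; the SRMR convention shown (averaging over all 15 unique matrix elements) is one common definition and may differ from the one Mplus applies. All variable and function names are hypothetical.

```python
import math

# Standardized loadings from Table 3 (Trails 1-5) and their factor membership.
LOADINGS = [0.746, 0.814, 0.753, 0.782, 0.747]
FACTOR = [0, 0, 0, 1, 1]
PHI = 0.79  # factor correlation reported in the text (rounded)

# Observed correlations from Table 2, keyed by (row trail, column trail).
OBSERVED = {
    (2, 1): 0.634,
    (3, 1): 0.521, (3, 2): 0.616,
    (4, 1): 0.462, (4, 2): 0.473, (4, 3): 0.513,
    (5, 1): 0.455, (5, 2): 0.446, (5, 3): 0.485, (5, 4): 0.584,
}


def implied_correlation(i: int, j: int) -> float:
    """Model-implied correlation between trails i and j (1-based)."""
    r = LOADINGS[i - 1] * LOADINGS[j - 1]
    if FACTOR[i - 1] != FACTOR[j - 1]:
        r *= PHI  # trails on different factors are linked through PHI
    return r


residuals = {pair: obs - implied_correlation(*pair) for pair, obs in OBSERVED.items()}

# In a standardized solution the diagonal is reproduced exactly, so only the
# 10 off-diagonal residuals contribute; the divisor counts all 15 elements.
n_unique = 5 * (5 + 1) // 2
srmr = math.sqrt(sum(e ** 2 for e in residuals.values()) / n_unique)

for pair, e in sorted(residuals.items()):
    print(f"Trails {pair[1]} and {pair[0]}: residual = {e:+.3f}")
print(f"approximate SRMR = {srmr:.3f}")
```

Under these assumptions the residuals are all small and the approximate SRMR comes out near 0.02, close to the reported value; small discrepancies are to be expected from rounding of the published estimates.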

One-Factor Model

In addition to our main hypothesized model, a single-factor model was examined as a plausible alternative. The single-factor model did not provide a good fit: χ2(5) = 72.19, p < .001, CFI = 0.94, SRMR = 0.04, and RMSEA = 0.16. Although the SRMR value was acceptable, the other three fit statistics did not support a good fit of this alternative model. The standardized factor loadings were 0.73, 0.78, and 0.75 for Trails 1, 2, and 3, respectively, and 0.68 and 0.65 for Trails 4 and 5, respectively. The lower loadings for Trails 4 and 5 also appeared to favor the two-factor model of the CTMT.
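The degrees of freedom of the two models, and the size of the difference between them, can be checked with a short sketch. The parameter count assumes simple-structure models with factor variances fixed to 1 and no correlated residuals. The chi-square difference test shown is not reported in this study; it is included only as an illustration, treating the one-factor model as the two-factor model with the factor correlation fixed to 1, a constraint on the boundary of the parameter space for which the usual reference distribution is only approximate. Function names are hypothetical.

```python
import math


def cfa_df(n_indicators: int, n_factors: int) -> int:
    """Degrees of freedom for a simple-structure CFA of a covariance matrix.

    Assumptions: each indicator loads on one factor, factor variances are
    fixed to 1, factors correlate freely, and residuals are uncorrelated.
    """
    unique_moments = n_indicators * (n_indicators + 1) // 2
    free_parameters = 2 * n_indicators + n_factors * (n_factors - 1) // 2
    return unique_moments - free_parameters


def chi_square_p_df1(chi_square: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))


df_two, df_one = cfa_df(5, 2), cfa_df(5, 1)
delta_chi_square = 72.19 - 18.69  # one-factor minus two-factor
delta_df = df_one - df_two

print(f"df: two-factor = {df_two}, one-factor = {df_one}")
print(f"chi-square difference = {delta_chi_square:.2f} on {delta_df} df")
print(f"p (illustrative) = {chi_square_p_df1(delta_chi_square):.1e}")
```

The counts agree with the reported χ2(4) and χ2(5), and a difference of about 53.5 on 1 df is far larger than would be expected if the two models fit equally well.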

Discussion

The existing research with child and adolescent participants suggests that the CTMT provides useful information for both clinical and nonclinical populations. The increased complexity of Trails 4 and 5, with their set-switching requirements, suggests that additional abilities may be tapped when compared with the TMT. These differences suggest the need for additional score options for subscales that reflect attention/sequencing and set switching/inhibition. The purpose of this study was to test the hypothesis that a two-factor model would provide a better fit for the data than a one-factor model. Using the standardization data for children 8–18 years of age, the one-factor model was not supported; the data better fit a two-factor model. Because of the developmental changes to be considered in the assessment of children and adolescents, this study considered only data for individuals aged 18 or younger. Our results suggest two next steps: the replication of these results with the adult population and, if confirmed therein, the development of standard scores for composites composed of Trails 1–3 and Trails 4 and 5. Further replication with samples that are not part of the standardization sample will be needed once these two steps have been completed, to support the generalization of this factor structure.

These findings and next steps are important, not just from a psychometric perspective, but from a clinical perspective. Interpretation of the scores generated needs to be based on evidence of the underlying constructs, generally involving more than a single subscale score (i.e., some combination of subscale scores). At the same time, it is important that any new scale score created have sufficient internal consistency for interpretation. Based on these results, the Composite Index generated on the CTMT does not fully represent a single construct; as such, information relating to individual functioning may be lost by using this score. Nevertheless, the Composite Index is the most reliable of the scores obtained on the CTMT, correlates with important variables such as severity of injury as reflected in Glasgow Coma Scale (GCS) scores, and remains informative. However, the alternative two-factor structure, representing attention/sequencing and set switching/inhibition, better fits the data and is consistent with prior research on trail making tasks. Use of the two factors for interpretation may be beneficial in identifying the underlying difficulties that lead to an overall lower Composite Index and may provide convergent evidence for identification of relative strengths and weaknesses of the individual. Until composite scores are developed for these two factors, examiners may confidently compare the mean level of performance on Trails 1–3 with the mean level of performance on Trails 4 and 5 as an adjunct interpretive procedure, as suggested in the CTMT Manual.
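The adjunct comparison described above amounts to simple arithmetic on the five trail T-scores. The sketch below shows one way it might be computed; the T-scores are hypothetical, the function name is an assumption, and no cutoff for a clinically meaningful difference is implied, because none is given here.

```python
from statistics import mean


def subscale_means(t_scores: dict) -> tuple:
    """Return (mean of Trails 1-3, mean of Trails 4-5, difference).

    `t_scores` maps trail number (1-5) to that trail's T-score.
    """
    simple = mean(t_scores[trail] for trail in (1, 2, 3))
    complex_ = mean(t_scores[trail] for trail in (4, 5))
    return simple, complex_, simple - complex_


# Hypothetical examinee; the values are invented for illustration only.
example = {1: 52, 2: 48, 3: 50, 4: 41, 5: 39}
simple, complex_, difference = subscale_means(example)
print(f"Trails 1-3 mean = {simple}, Trails 4-5 mean = {complex_}, difference = {difference}")
```

A positive difference indicates relatively weaker performance on the set-switching trails; how large a difference warrants interpretation would have to come from the manual or from future composite norms.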

Conflict of Interest

The third author (CRR) has an economic and commercial interest in the measure of interest.

References

Allen, D. N., Haderlie, M., Kazakov, D., & Mayfield, M. (2009). Construct and criterion validity of the Comprehensive Trail Making Test in children and adolescents with traumatic brain injury. Child Neuropsychology, 15, 543–553. doi:10.1080/092797040902748234

Arbuthnott, K., & Frank, J. (2000). Trail making test, part B as a measure of executive control: Validation using a set-switching paradigm. Journal of Clinical and Experimental Neuropsychology, 22, 518–528.

Armstrong, C., Allen, D. N., Donohue, B., & Mayfield, J. (2008). Sensitivity of the Comprehensive Trail Making Test to traumatic brain injury in adolescents. Archives of Clinical Neuropsychology, 23, 351–358.

Atkinson, T. M., & Ryan, J. P. (2008). The use of variants of the trail making test in serial assessment: A construct validity study. Journal of Psychoeducational Assessment, 26, 42–53.

Atkinson, T. M., Ryan, J. P., Lent, A., Wallis, A., Schachter, H., & Coder, R. (2010). Three trail making tests for use in neuropsychological assessments with brief intertest intervals. Journal of Clinical and Experimental Neuropsychology, 32, 151–158.

Barkley, R. A. (2000). Genetics of childhood disorders: XVII. ADHD, part 1: The executive functions and ADHD. Journal of the American Academy of Child and Adolescent Psychiatry, 39, 1064–1068.

Baron, I. S. (2004). Neuropsychological evaluation of the child. New York: Oxford University Press.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Beverly Hills, CA: Sage.

Cicerone, K. D. (2005). Rehabilitation of executive function impairments. In W. M. High, A. M. Sander, M. A. Struchen, & K. A. Hart (Eds.), Rehabilitation for traumatic brain injury (pp. 71–87). New York: Oxford University Press.

Delis, D., Kaplan, E., & Kramer, J. (2001). Delis-Kaplan Executive Function System. San Antonio, TX: The Psychological Corporation, Harcourt Brace.

Franzen, M. D. (2000). Neuropsychological assessment in traumatic brain injury. Critical Care Nursing Quarterly, 23, 58–64.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.

Lezak, M. D., Howieson, D. B., Loring, D. W., Hannay, H. J., & Fischer, J. S. (2004). Neuropsychological assessment (4th ed.). New York: Oxford University Press.

Miner, T., & Ferraro, F. R. (1998). The role of speed of processing, inhibitory mechanisms, and presentation order in Trail Making Test performance. Brain and Cognition, 38, 246–253.

Reitan, R. M., & Wolfson, D. (1992). The Halstead-Reitan neuropsychological test battery: Theory and clinical interpretation (2nd ed.). Tucson, AZ: Neuropsychology Press.

Reynolds, C. R. (2002). Comprehensive Trail-making Test. Austin, TX: PRO-ED.

Reynolds, C. R., & Horton, A. M., Jr. (2008). Assessing executive functions: A life-span perspective. Psychology in the Schools, 45, 875–892.

Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary. New York: Oxford University Press.

U.S. Census Bureau. (1998). Statistical abstracts of the United States. Washington, DC: Department of Commerce.

Wecker, N. S., Kramer, J. H., Wisniewski, A., Delis, D. C., & Kaplan, E. (2000). Age effects on executive ability. Neuropsychology, 14, 409–414.

Zelazo, P. D., Craik, F. I. M., & Booth, L. (2004). Executive functions across the life span. Acta Psychologica, 115, 167–183.