Abstract

Studies indicate that executive functioning (EF) is a strong predictor of everyday function. However, assessment can be problematic as no single standardized instrument is known to measure all EF domains simultaneously. Thus, the Pillbox Test was developed as a new measure tapping four EF factors through the real-time assessment of medication management, a complex instrumental activity of daily living. The Pillbox Test showed good criterion-related validity and was effective in differentiating graduated levels of executive dysfunction between a mixed neurological group, medical control group, and healthy community control group. This test also showed good convergent validity as it correlated significantly in expected directions with established EF measures in all four selected EF domains and with the Direct Assessment of Functional Status. Finally, a receiver operator characteristic curve analysis yielded a sensitivity of 75% and specificity of 87.5%, suggesting that the Pillbox Test is a promising new ecological measure of EF.

Introduction

Executive function (EF) is a complex construct that is one of the most useful neuropsychological predictors of everyday functioning (Bell-McGinty, Podell, Franzen, Baird, & Williams, 2002; Boyle, Paul, Moser & Cohen, 2004; Goverover, 2004; O'Bryant et al., 2011; Penades et al., 2010; Sadek, Stricker, Adair, & Haaland, 2011). Although many theoretical models of EF exist (Chan, Shum, Toulopoulou, & Chen, 2008), one of the most widely accepted is Lezak's (1995) four-factor model. This model considers EF to be a multidimensional construct composed of purposive action/self-regulation, planning/attention, volition/inhibition, and effective performance/self-monitoring (Lezak, Howieson, & Loring, 2004).

The purposive action/self-regulation factor, otherwise known as complex purposive behavior, refers to one's ability to initiate, maintain, and/or adjust behavior when it does not provide the desired outcome. This skill is particularly taxed when one must produce non-routine behaviors or integrate past knowledge with new learning (Lamar, Zonderman, & Resnick, 2002; Lezak et al., 2004). The planning/attention factor refers to one's ability to sustain attention, consider alternatives, reason sequentially and hierarchically, and demonstrate sound strategy formulation (Duke & Kaszniak, 2000; Lamar et al., 2002; Lezak et al., 2004). The volition/inhibition factor refers to an individual's ability to formulate goals and intentions, with deficits noted when one's intentions no longer guide behavior (Duke & Kaszniak, 2000; Grigsby, Kaye, Baxter, Shetterly, & Hamman, 1998; Lezak et al., 2004). The final factor, effective performance/self-monitoring, also known as meta-cognition, refers to the ability to adequately observe one's own behavior, evaluate task difficulty in relation to personal strengths/ weaknesses, choose appropriate strategies for task completion, and recognize/correct errors in an environmental context (Goverover, 2004; Lamar et al., 2002; Lezak et al., 2004; Malloy, Cohen, & Jenkins, 1998).

Persons with executive dysfunction typically manifest deficits in more than one of these four areas (Bryan & Luszcz, 2000; Strauss, Sherman, & Spreen, 2006). Thorough assessment of these deficits can be problematic as no single standardized instrument is known to measure all four EF domains simultaneously. As such, one must carefully choose several measures to achieve this goal (Brauer-Boone, Ponton, Gorsuch, Gonzalez & Miller, 1998; Lamar et al., 2002; Miyake, Friedman, Emerson, Witzki, & Howerter, 2000; Strauss et al., 2006). This is an important clinical consideration as reliance on one EF measure is unlikely to accurately characterize one's overall EF abilities (Brauer-Boone et al., 1998; Lamar et al., 2002; Miyake et al., 2000; Strauss et al., 2006).

To address the lack of an existing measure assessing all four EF factors, a new process-oriented, performance-based instrument that evaluates medication management skills, the Pillbox Test, was designed. Medication management was chosen as the basis of this new EF measure because it is considered one of the most complex instrumental activities of daily living (IADLs; Burdick et al., 2005; Park, 1999; Stilley, Bender, Dunbar-Jacob, Sereika, & Ryan, 2010) and successful performance requires intact functioning in all four of Lezak's (1995) EF domains. Additionally, poor medication management not only contributes to the therapeutic failure of disease management but also to the wastage of healthcare resources. For example, non-compliance with antihypertensive medication is associated with increased doctor office visits, increased ER visits, and increased hospitalizations with longer stays (Bennett, Sauve, & Shaw, 2005; Salas et al., 2001; Stilley et al., 2010). Furthermore, poor medication management may signal impairment in other cognitive domains, as well as decreased safety for independent living (Burdick et al., 2005).

The cognitive abilities contained within the purposive action/self-regulation domain are most influential on medication adherence as problems with mental flexibility, stimulus-bound behavior, perseveration, suboptimal search strategies, concrete thinking, and the inability to multitask have been associated with incorrect medication administration (Duke & Kaszniak, 2000; Lamar et al., 2002; Lezak et al., 2004; Miyake et al., 2000). As space on medication labels is limited and instructions are generally vague, disjointed, and poorly organized, individuals must rely on making inferences regarding information not explicitly stated (Hayes, 1998; Kendrick & Bayne, 1982; McGraw & Drennan, 2001; Park, Willis, Morrow, Diehl, & Gaines, 1994; Willis et al., 1999). Several studies reveal that both cognitively intact and cognitively impaired individuals have difficulty making inferences from medical information, although those with cognitive impairment make significantly more errors (Hayes, 1998; McGraw & Drennan, 2001; Park et al., 1994; Zwahr, 1999). Kendrick and Bayne (1982), for example, found 29% of their cognitively intact elderly sample demonstrated a different understanding of medication administration than the written label. Additionally, only 22% of their sample demonstrated correct administration knowledge of the instruction “take one tablet every 6 h.” Kripalani and colleagues (2006) showed that 38% of individuals from a large, urban general medical clinic were unable to identify all of their medications in spite of being able to look at the bottle, label, and pills themselves.

Within the planning/attention EF domain, problems are seen in the preplanning component, often due to a disorganized approach to tasks, which has been related to difficulty performing IADLs, especially medication management (Duke & Kaszniak, 2000; Lamar et al., 2002; Lezak et al., 2004). In taking medication, the person must plan to carry out the appropriate behaviors at the correct time, integrate medication schedules, and observe any special instructions such as food restrictions (Gould, 1999; Park, 1999; Park et al., 1994). Within the general relationships noted above, specific to medication management, the planning/attention factor is also considered a mediator for the volition/inhibition factor by influencing the efficiency and completeness with which one gathers medical information. For example, Isaac and Tamblyn (1993) found that the inability to independently complete medication administration and the number of errors were both related to slowed visual scanning speed and poor initial encoding of information. Additional medication administration problems related to volition/inhibition deficits include difficulty inhibiting the automatic and habitual use of old medication directions. This disinhibition leads to decreased effortful processing and results in the perseveration of previously learned material (Isaac & Tamblyn, 1993; Park et al., 1994). This is an important clinical consideration when a prescription is changed or a new medical regimen is implemented.

Finally, impairments in the effective performance/self-monitoring domain, such as increased reliance on routine, resistance to change, poor insight into one's abilities, and environmental dependency/indifference negatively contribute to medication compliance (Gould, 1999; Isaac & Tamblyn, 1993; Park et al., 1994; Willis et al., 1999). For successful medication administration, one must be able to monitor both medication performance and physiological changes that may signal a medical complication. This requires adequate working memory based on feedback received from using (or not using) medication. Additionally, patients must successfully communicate ongoing effectiveness or problems to medical providers and/or family (Gould, 1999; Park, 1999; Park et al., 1994).

Thus, the purpose of the current study was to examine the construct validity of a new measure of EF, the Pillbox Test, which addresses all four of Lezak's (1995) factors through the real-time assessment of medication management. The following three hypotheses were examined: (1) individuals with neurological impairment will perform worse than those with cardiovascular disease who will perform worse than healthy community controls on traditional and performance-based measures of EF, including the Pillbox Test; (2) the Pillbox Test will correlate moderately with variables from widely established EF measures in each of the four domains purported by Lezak; and (3) the Pillbox Test will demonstrate good sensitivity and specificity in individuals with neurologic impairment and healthy controls.

Method

Participants

Potential participants were recruited into one of three groups: a mixed neurological group (NEURO), a medical control group (MED), and a healthy community control group (CON). The NEURO group was referred by physicians from an outpatient memory clinic at a county hospital or neurorehabilitation day program at a for-profit medical center. The MED group was recruited via flyers posted in a cardiovascular rehabilitation clinic, and the CON group was recruited following presentations on aging at community senior centers. Participants were excluded if English was not their primary language (n = 2), they were unable to read items on the Reitan Indiana Aphasia Screening Test (Reitan & Wolfson, 1993; n = 1), or vision and hearing were not adequate for study purposes (n = 2). An additional two people declined to complete testing after an interview. Thus, the final sample consisted of 120 of 127 referrals.

The NEURO group was composed of 40 participants (20 men and 20 women) of which approximately 65% were Caucasian, 30% were African American, and 5% were of other ethnicities. Nine participants were previously diagnosed with Alzheimer's disease (AD), 13 were diagnosed with Vascular Dementia (VaD), and 18 had incurred a stroke within 6 months prior to study participation. Diagnoses of AD and VaD were established by a combination of the referring neurologist's opinion and previous neuropsychological assessment. These diagnoses were not based on tests used in the current study. None of these individuals were on memory-enhancing drugs at the time of testing. The mean age was 68.63 years (SD = 8.08), and the average education was 12.30 years (SD = 2.86). The mean number of medications consumed by this group as verified by the chart review was 6.88 (SD = 5.44).

The MED group also consisted of 40 participants (20 men and 20 women) of which approximately 88% were Caucasian, 10% were African American, and 2% were of other ethnicities. People with cardiovascular disease were chosen as the medical control group, since mild executive dysfunction has been found in patients with cardiovascular disease (Bennett et al., 2005; Newman et al., 2005), which may contribute to medical non-compliance often seen within this population (Barclay, Weiss, Mattis, Bond, & Blass, 1988). All MED participants had either experienced a heart attack within the past year or been diagnosed with a chronic cardiovascular disorder such as congestive heart failure. Additionally, all MED participants denied a prior diagnosis of cognitive impairment or any other conditions that might affect cognition. The mean age was 69.13 years (SD = 8.12), and the average education was 13.91 years (SD = 3.34). This group was consuming an average of 7.08 medications (SD = 4.12) per chart review.

The CON group representing “normal aging” consisted of 40 participants (17 men and 23 women) of which approximately 94% were Caucasian, 3% were African American and 3% were of other ethnicities. All CON participants denied a prior diagnosis of cognitive impairment or any other conditions that might affect cognition and were living independently. The mean age was 64.10 years (SD = 7.32), and the average education was 15.48 years (SD = 2.70). This group was consuming an average of 3.25 medications (SD = 2.64), which was determined by reviewing their medication bottles during the study session. None of these medications were known to significantly impact cognitive functioning.

Materials

Neuropsychological measures

Based on a literature review, measures were chosen to represent each of Lezak's (1995) four factors of EF. Traditional neuropsychological measures in addition to a performance-based measure were utilized and are listed under the factor they represent below, along with references supporting their inclusion under each factor. Raw scores were used as dependent variables to facilitate comparisons, since demographically corrected scores for the Pillbox Test have not been derived. Likewise, given the preliminary nature of this work, individual test/subtest scores were examined rather than domain or factor scores in order to gain a deeper understanding of the characteristics of the Pillbox Test.

Volition/Inhibition Factor (Brauer-Boone et al., 1998; Bryan & Luszcz, 2000; Shimamura, 1994; Strauss et al., 2006): Planning/Attention Factor (Brauer-Boone et al., 1998; Carlson, Fried, Xue, & Brant, 1999; Golden & Freshwater, 2002; Lamar et al., 2002; Ready, Stierman, & Paulsen, 2001; Senanarong et al., 2005; Strauss et al., 2006): Purposive Action/Self-Regulation Factor (Boyle et al., 2004; Hanks, Rapport, Millis, & Deshpande, 1999; Kibby, Schmitter-Edgecombe, & Long, 1998; Lamar et al., 2002; Lezak et al., 2004; Ready et al., 2001; Salthouse, Atkinson, & Berish, 2003): Effective Performance/Self-monitoring Factor (Bryan & Luszcz, 2000; Ferrer-Caja, Crawford, & Bryan, 2002; Hanks et al., 1999; Salthouse et al., 2003; Strauss et al., 2006):

  • Rey–Osterrieth Complex Figure Test (Rey; Osterrieth, 1944) Copy raw score.

  • Rey Executive Planning raw score, which was determined using the organizational scoring system of the Rey Complex Figure by Savage and colleagues (1999). This scoring system qualitatively evaluates the large rectangle, diagonal cross, vertical midline, horizontal midline, and vertex of triangle on the right.

  • Trail Making Test (TMT; Reitan & Wolfson, 1993) Part A total time in seconds.

  • Stroop Word and Color raw scores.

  • Clock Draw Test (CDT) total raw score from the 7-min screen (Solomon et al., 1998).

  • Controlled Oral Word Association Test (COWAT; Benton, 1968) total raw score.

  • Design Fluency Test (DFT; Jones-Gotman & Milner, 1977) novel output raw score.

  • Rey Delayed Recall raw score,

  • TMT Part B total time in seconds,

  • Wechsler Adult Intelligence Scale-Third Edition (WAIS-III; Wechsler, 1997) Similarities subtest raw score,

  • Wisconsin Card Sorting Test (WCST; Grant & Berg, 1948; Heaton, Chelune, Talley, Kay, & Curtis, 1993) total errors and conceptual responses raw scores,

  • Direct Assessment of Functional Status (DAFS; Loewenstein & Bates, 1992) subscale scores. The DAFS is a behaviorally based measure of IADLs developed and validated in a neurologically impaired sample (Loewenstein et al., 1989). This assessment is composed of several subtests in which the participant is asked to complete a particular IADL while the examiner observes. These IADL activities consist of locating a number in a phone book, dialing a telephone, identifying traffic signs, reading a clock, preparing a letter for mailing, counting currency, writing a check, balancing a checkbook, and recording a phone message. Four subscale scores, Communications, Transportation, Financial, and Message, and one Total Scale score are derived.

  • COWAT perseverative responses raw score,

  • DFT perseverative responses raw score,

  • WCST number of categories, perseverative responses, and perseverative errors raw scores.

The Pillbox Test

The Pillbox Test consists of a pillbox and five pill bottles. The pillbox contains four rows labeled as “Breakfast,” “Lunch,” “Dinner,” and “Bedtime” and seven columns labeled for each day of the week, “Sunday” through “Saturday.” The five pill bottles have standardized administration labels and contain colored beads resembling the approximate size of common aspirin or antihypertensive medications as these were the two most commonly prescribed types of medications according to chart reviews of 50 county hospital memory clinic patients (Brose-Zartman & Houtz, 2003). The average age of these patients was 75 years (SD = 6.45). Thirty-eight percent were diagnosed with Dementia of the Alzheimer's type, 25% were diagnosed with VaD, 8% were diagnosed with dementia of other etiologies, 12% were diagnosed with mild cognitive impairment, and 17% were not diagnosed with any cognitive disorder. Chart reviews also revealed that the average number of medications taken was five, which was congruent with findings of Burdick and colleagues (2005).

The five most common directions were also determined from chart reviews and consultation with hospital pharmacy staff. Medication directions printed on pill bottle labels are as follows: “one tablet daily at bedtime” (orange pills), “one tablet daily in the morning” (blue pills), “one tablet three times a day” (yellow pills), “one tablet twice a day with breakfast and dinner” (green pills), and “one tablet every other day” (red pills). These directions require the participant to make inferences based on vague medication labels. Participants were given credit as long as they did not violate the instructions. Omission errors were counted if the pill was missing from the proper container and not better identified as a misplaced movement error. Misplaced movement errors were pills placed in the correct day and quantity but not according to exact direction. For example, someone might place an orange pill (one tablet at bedtime) in the breakfast compartment instead of the bedtime compartment. In reality, the person would still be taking the correct daily dose but at the improper time, which could potentially lead to side effects or other problems. In contrast, the yellow and red pill directions were not time-specific. As long as one yellow pill was placed into three compartments per day, it did not matter whether the compartments were breakfast, lunch, and dinner or breakfast, dinner, and bedtime. Likewise, one red pill could be placed in the breakfast compartment on Monday but in the dinner compartment on Wednesday. Such placements were not scored as errors. Commission errors were counted as excess pills (e.g., four yellow pills per day instead of three).

Participants are first asked to read aloud each medication label to verify adequate reading ability. Participants are then instructed to organize the medication in the pillbox as they normally would for 1 week. No further direction or feedback is provided by the examiner for the remainder of the test. This lack of structure was purposeful as EF is most taxed in ambiguous situations (Lamar et al., 2002; Lezak et al., 2004). Participants are allowed 5 min to complete this task. Throughout the test, the examiner records behavioral observations such as the approach to the task. After the participant completes the test, the examiner records the number and type of beads placed into each pillbox compartment.

This test yields several scores. A straight pass/fail designation is determined, with three or more total errors of any type on the task constituting a failure. This criterion was established through consultation with family practice physicians and gerontologists who indicated that this was the number of medication administration errors that would commonly cause medical complications. Additional scores include:

  • sum of yellow pill commission and omission errors (one tablet three times a day)

  • sum of green pill commission and omission errors (one tablet twice a day with breakfast and dinner)

  • sum of blue pill commission and omission errors (one tablet daily in the morning)

  • sum of orange pill commission and omission errors (one tablet daily at bedtime)

  • sum of red pill commission and omission errors (one tablet every other day)

  • number of misplaced movement errors = the sum of the number of pills placed in incorrect compartments

  • number of total errors = the sum of total omission errors, commission errors, and misplacement errors

  • total time to complete the task

Although Pillbox Test scores do not map perfectly onto Lezak's four EF factors, inferences about these abilities can be made based on successful completion of the task and/or difficulties in specific aspects of performing the task. For the volition/inhibition factor, one must inhibit knowledge of one's own medication regimen and commence the task. For the planning/attention factor, one must employ an efficient strategy to carry out the task while sustaining attention (i.e., time to complete the task). For the effective performance/self-monitoring factor, one must monitor when to stop placing pills into the pillbox based on administration directions (i.e., misplacement errors), and for the purposive action/self-regulation factor, one must make inferences from instructions, reason through alternative solutions, mentally shift between medication instructions, and demonstrate working memory and motor programming (i.e., total errors, omission errors). Behavioral observations, although not quantified, are equally important in identifying executive dysfunction, as well as other cognitive and motor impairments that may affect performance on the Pillbox Test. For example, the complexity of instructions differs and errors on particular pills (e.g., green pill errors) may provide important information about the capacity of the examinee.
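The total-error score and the pass/fail designation described above can be illustrated with a minimal Python sketch. The class and field names are hypothetical and are not part of the published test materials; the sketch assumes only that total errors are the sum of omission, commission, and misplacement errors and that three or more total errors constitute a failure.

```python
from dataclasses import dataclass

# Three or more total errors of any type is treated as a fail.
FAIL_THRESHOLD = 3


@dataclass
class PillboxErrors:
    """Error counts for one administration (field names are illustrative)."""
    omission: int
    commission: int
    misplacement: int

    @property
    def total(self) -> int:
        # Total errors = omission + commission + misplacement errors.
        return self.omission + self.commission + self.misplacement

    @property
    def passed(self) -> bool:
        # Pass only when the total stays below the fail threshold.
        return self.total < FAIL_THRESHOLD


# Hypothetical examinee with one omission and one misplacement error.
example = PillboxErrors(omission=1, commission=0, misplacement=1)
print(example.total, example.passed)
```

Per-color error sums and completion time would be recorded alongside these counts; they are omitted here to keep the sketch short.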

Procedure

IRB approval was obtained at the university and both hospitals from which participants were recruited. Written informed consent was obtained from all participants at the beginning of the evaluation session. All testing sessions were conducted in the morning to control for time of day effects on cognitive ability, as this is a common concern for the elderly and neurological populations (Salthouse et al., 2003). Participants completed an intake interview to collect demographic data during which the Reitan Indiana Aphasia Screening Test (Reitan & Wolfson, 1993) was administered to provide a brief check of inclusion/exclusion criteria (e.g., visual acuity, ability to read). Participants meeting inclusion/exclusion criteria were then administered the above measures in the following order according to the standardized instructions: Rey Copy, TMT, WAIS-III Similarities, CDT, COWAT, Rey Delayed Recall, DFT, Stroop, WCST, GDS-SF, DAFS, and Pillbox Test.

Data Analysis

Chi-squared tests and one-way analyses of variance (ANOVAs) were performed to determine if there were significant group differences in demographic variables. If significant group differences in age, education, gender, and/or ethnicity were identified, then the variables were used as covariates in subsequent analyses. One-way univariate and multivariate ANOVAs were used to test hypothesis 1 that significant group differences would be found on all traditional and performance-based measures. If significant group differences were found, Tukey's LSD post hoc tests were conducted to identify which groups differed significantly from one another. For variables that violated assumptions of normality, results were confirmed using non-parametric analyses, although effects of covariates could not be examined in these cases. Spearman's rho correlation coefficients were utilized to investigate relationships between the Pillbox Test and demographic variables and to test hypothesis 2 that Pillbox Test variables would correlate significantly with existing EF measures which tap each of Lezak's (1995) four factors. Given the large number of variables (i.e., 28), only traditional EF measures that were significantly different among groups and Pillbox variables with p-values of <.001 in differentiating among the groups were analyzed. To test hypothesis 3, a receiver operator characteristic (ROC) curve analysis was conducted to determine the sensitivity and the specificity of the pass/fail criterion by examining the number of total errors made on the Pillbox Test. A p-value of ≤.05 was utilized for all analyses; corrections for multiple comparisons were not employed given the preliminary nature of this work.
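To make the logic of the ROC analysis concrete, the following Python sketch shows how sensitivity and specificity can be computed at candidate total-error cutoffs. It is an illustration only: the scores are hypothetical and are not study data, the function name is an assumption, and a full ROC analysis would typically be run in a statistical package.

```python
def sensitivity_specificity(impaired_scores, control_scores, cutoff):
    """Treat a total-error score >= cutoff as a 'fail' (test positive)."""
    # Sensitivity: proportion of impaired examinees who fail.
    true_positives = sum(score >= cutoff for score in impaired_scores)
    # Specificity: proportion of control examinees who pass.
    true_negatives = sum(score < cutoff for score in control_scores)
    return (true_positives / len(impaired_scores),
            true_negatives / len(control_scores))


# HYPOTHETICAL total-error counts, invented for illustration only.
impaired = [0, 2, 5, 9, 14, 30, 41, 3, 1, 6]
controls = [0, 0, 1, 0, 2, 4, 0, 1, 3, 0]

# Each cutoff gives one point on the ROC curve.
for cutoff in range(1, 6):
    sens, spec = sensitivity_specificity(impaired, controls, cutoff)
    print(f"cutoff >= {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cutoff generally trades sensitivity for specificity, which is why the pass/fail criterion is evaluated across the range of total-error scores rather than at a single value.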

Results

Significant group differences were found in age, F(2,117) = 4.98, p = .008, education, F(2,117) = 11.37, p < .001, ethnicity, χ2(6) = 15.76, p = .02, and the number of medications consumed, F(2,117) = 9.91, p < .001, but not gender, χ2(2) = 0.60, p = .74. As shown in Table 1, the CON group was significantly younger and taking significantly fewer medications than the NEURO and MED groups, which did not differ significantly from each other. The CON and MED groups were significantly more educated and composed of a significantly greater proportion of Caucasian participants than the NEURO group but did not differ significantly from each other.

Table 1.

Demographic characteristics by group

                   NEURO (n = 40)      MED (n = 40)        CON (n = 40)
                   Mean       SD       Mean       SD       Mean       SD       p-value
Age (years)        68.63a     8.08     69.13b     8.12     64.10a,b   7.32     .008
Education (years)  12.30a,b   2.86     13.91b     3.34     15.48a     2.70     ≤.001
# medications      6.68a      5.44     7.08b      4.12     3.25a,b    2.64     ≤.001
% Men              50                  50                  42.5                NS
% Caucasian        65                  87.5                95                  .02

Notes: NEURO = mixed neurological group; MED = medical control group; CON = healthy community control group.

Groups with the same subscript are significantly different from one another.

Controlling for age, education, and ethnicity in the omnibus one-way and multivariate ANOVAs, significant differences were found among the three groups on traditional EF measures including the Rey, F(6,224) = 2.48, p = .02, TMT, F(4,226) = 6.12, p < .001, CDT, F(2) = 5.11, p = .007, Stroop, F(6,224) = 4.18, p = .001, WCST, F(10,220) = 2.52, p = .007, and GDS-SF, F(2) = 8.41, p < .001, but not on the COWAT, F(4,226) = 1.22, p = .30, or DFT, F(4,226) = 0.54, p = .71. As seen in Table 2, post hoc analyses showed that the NEURO group performed significantly worse than the CON group on the Rey Copy, Delayed Recall, and Executive Plan Scores; TMT Parts A and B; CDT total score; Similarities; Stroop Word, Color, and Color/Word; WCST number of categories, total errors, perseverative responses and errors, and conceptual responses; and GDS-SF. The NEURO group also performed significantly worse than the MED group on all of these measures except WCST total errors. The MED group performed significantly worse than the CON group on Rey Delayed Recall, TMT-Part B, Similarities, Stroop Color and Color/Word, and WCST number of categories, total errors, perseverative responses and errors, and conceptual responses.

Table 2.

Means and standard deviations (SD) of EF measures by group

                   NEURO (n = 40)      MED (n = 40)        CON (n = 40)
                   M          SD       M          SD       M          SD       F       p-value
Rey copy^a         26.16a,b   7.99     31.33a     5.29     33.11b     3.79     5.93    .004
Rey recall         10.84a     6.18     14.68a     6.32     17.79a     5.82     4.42    .001
Rey exec^a         11.28a,b   2.57     12.81a     1.60     13.38b     1.00     6.19    .003
Trails A^a         77.38a,b   50.04    47.23a     19.91    35.98b     13.85    10.09   <.001
Trails B^a         175.15a    80.94    119.03a    68.80    82.95a     41.34    8.91    <.001
CDT total          4.93a,b    1.31     5.83a      1.24     5.75b      0.98     5.12    .007
Similarities^a     15.40a     7.03     19.85a     6.18     24.45a     4.09     7.59    .001
COWAT total        23.38      13.73    29.08      9.76     33.47      9.97     1.98    .14
COWAT psv          0.58       1.04     0.48       1.22     0.58       1.24     0.19    .82
DFT novel          4.25       3.14     5.20       3.92     6.20       3.30     1.05    .35
DFT psv            3.12       2.91     3.10       3.32     3.25       3.71     0.11    .90
Stroop word        63.88a,b   22.43    75.90a     17.38    92.40b     14.70    5.41    .006
Stroop color       43.43a     17.00    56.15a     12.74    63.40a     12.66    9.74    <.001
Stroop CW          19.50a     8.84     28.58a     9.28     32.85a     9.95     11.74   <.001
WCST # cat^a       2.85a      2.24     3.73a      2.15     5.18a      1.47     4.87    .009
WCST error^a       53.63a     26.94    44.53b     23.97    24.00a,b   16.65    5.60    .005
WCST psv resp^a    43.63a     35.91    26.60a     19.65    13.40a     10.03    5.01    .008
WCST psv errors^a  37.20a     27.64    23.60a     16.00    12.08a     8.84     6.19    .002
WCST concept^a     46.85a     20.03    57.70a     19.44    67.10a     10.83    5.36    .006
GDS-SF total       9.30a,b    5.71     4.78a      4.42     3.43b      5.77     8.41    <.001

Notes: NEURO = mixed neurological group; MED = medical control group; CON = healthy community control group. Rey = Rey Complex Figure Test; Rey exec = executive plan score; Trails = Trail Making Test; CDT = Clock Draw Test; COWAT = Controlled Oral Word Association Test; COWAT psv = perseverative responses; DFT = Design Fluency Test; DFT novel = novel output score; DFT psv = perseverative responses; Stroop = Stroop Word and Color Test; Stroop CW = Color Word subscale; WCST = Wisconsin Card Sort Test; WCST # cat = number of categories; WCST error = total errors; WCST psv resp = number of perseverative responses; WCST psv errors = number of perseverative errors; WCST concept = conceptual responses; GDS-SF = Geriatric Depression Scale-Short Form. Groups with the same subscript are significantly different from one another.

^a Findings did not change when non-parametric analyses were utilized to account for unequal variances.

On performance-based measures, significant group differences were found on both the DAFS, F(10,220) = 8.92, p < .001, and the Pillbox Test, F(16,214) = 3.36, p < .001, after correcting for age, education, and ethnicity. As shown in Table 3, the NEURO group performed significantly worse than the MED and CON groups on the DAFS Total score and the Communication, Financial, and Message subscale scores, whereas the MED group performed significantly worse than the CON group on the DAFS Message subscale only. On the Pillbox Test, the NEURO group performed significantly worse than the other two groups on total errors, misplacement errors, total time, yellow pill errors, green pill errors, blue pill errors, and orange pill errors. The MED group performed significantly worse than the CON group on green pill errors only. When considering the pass/fail criterion, there were significant differences among the groups, χ2(2) = 29.27, p < .001. A significantly greater portion of the NEURO group failed the Pillbox Test (75%) compared with both the MED (43%) and CON (15%) groups, χ2(1) = 8.72, p = .003, and χ2(1) = 29.09, p < .001, respectively, and a significantly greater portion of the MED group failed compared with the CON group, χ2(1) = 7.34, p = .007.
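The pass/fail comparisons above are Pearson chi-squared tests on group-by-outcome contingency tables. The Python sketch below shows the computation without a continuity correction. The cell counts are an assumption: they are reconstructed from the reported failure rates (75%, 43%, and 15% of n = 40 per group) rather than taken from the raw data, so the statistics it prints should be close to, but need not exactly match, the reported values. The function name is hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and outcome.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# [fail, pass] counts per group, ASSUMED from the reported percentages.
counts = {"NEURO": [30, 10], "MED": [17, 23], "CON": [6, 34]}

omnibus, omnibus_df = pearson_chi_square(list(counts.values()))
print(f"All groups: chi2({omnibus_df}) = {omnibus:.2f}")

neuro_con, pair_df = pearson_chi_square([counts["NEURO"], counts["CON"]])
print(f"NEURO vs CON: chi2({pair_df}) = {neuro_con:.2f}")
```

The same function applies to the other pairwise comparisons by passing the two relevant rows.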

Table 3.

Means and standard deviations (SD) of performance-based measures by group

Measure        NEURO (n = 40)      MED (n = 40)        CON (n = 40)        F       p-value
               M          SD       M          SD       M          SD
DAFS total     55.90a,b   7.94     62.85a     2.66     64.60b     1.57     23.21   <.001
DAFS comm      11.18a,b   2.23     13.65a     0.62     13.68b     0.66     33.14   <.001
DAFS tran      11.68      1.58     12.20      0.97     12.60      0.71     1.09    .34
DAFS fin       15.98a,b   3.03     18.13a     1.14     18.52b     0.82     13.78   <.001
DAFS mes       1.97a      1.19     3.33a      0.89     3.85a      0.36     30.63   <.001
Pill total     29.70a,b   30.25    8.57a      13.48    2.58b      6.18     12.31   <.001
Pill me        13.25a,b   25.32    3.38a      5.68     1.30b      3.16     3.04    .05
Pill time      259.75a,b  62.15    204.35a    77.40    190.55b    44.24    7.54    .001
Pill yellow    10.85a,b   11.50    2.60a      6.86     0.93b      4.61     11.24   ≤.001
Pill green     9.98a      9.75     4.15a      7.08     0.88a      3.24     8.36    ≤.001
Pill blue      2.48a,b    4.99     0.33a      1.33     0.00b      0.00     5.03    .008
Pill orange    3.43a,b    6.10     0.43a      1.50     0.40b      2.22     4.24    .02
Pill red       2.98       5.67     0.93       2.73     0.20       0.69     2.89    .06

Notes: NEURO = mixed neurological group; MED = medical control group; CON = healthy community control group; DAFS = Direct Assessment of Functional Status; DAFS comm = DAFS communication subscale; DAFS tran = DAFS transportation subscale; DAFS fin = DAFS financial subscale; DAFS mes = DAFS message subscale; Pill total = Pillbox total errors; Pill me = Pillbox misplacement errors; Pill time = Pillbox total time; Pill yellow = Pillbox yellow pill errors; Pill green = Pillbox green pill errors; Pill blue = Pillbox blue pill errors; Pill orange = Pillbox orange pill errors; Pill red = Pillbox red pill errors. Groups with the same subscript are significantly different from one another.

Significant correlations were found between the primary Pillbox Test scores and age, education, and ethnicity but not gender. Correlations with age ranged from .20 (number of green pill errors) to .34 (total time to complete task), with education from −.17 (number of yellow pill errors) to −.31 (total time to complete task), and with ethnicity from .21 (total time to complete task) to .38 (total number of errors). The Pillbox variables correlated significantly in the expected direction with most neuropsychological measures, including the DAFS (Table 4). Correlations between the Pillbox variables and established EF measures ranged from −.004 (i.e., number of green pill errors and GDS-SF) to −.60 (i.e., total errors and DAFS Communication). For the volition/inhibition factor, the strongest correlation was seen between total time and Stroop Color/Word (r = −.46). For the planning/attention factor, the strongest correlations were found between total errors and Rey copy (r = −.57), total errors and TMT Part A (r = .57), and the number of yellow pill errors and Rey copy (r = −.56). For the purposive action/self-regulation factor, the strongest correlations occurred between total errors and DAFS Total, Communication, and Message (r = −.57, −.60, and −.55, respectively) and TMT Part B (r = .56), between total time and TMT Part B (r = .54), and between the number of yellow pill errors and DAFS Total, Communication, and Message (r = −.55, −.56, and −.55, respectively). For the final factor, effective performance/self-monitoring, the strongest correlations were found between pass/fail and WCST perseverative errors and responses (r = −.44 and −.43, respectively), between total errors and WCST perseverative errors (r = .43), number of categories (r = −.42), and perseverative responses (r = .40), and between the number of yellow pill errors and perseverative errors (r = .41). 
When correlations were examined within each of the three groups separately, significant correlations were found between Pillbox Test variables and measures representing each of Lezak's four EF factors, except in the CON group, where no Pillbox Test scores correlated significantly with any measures of effective performance/self-monitoring (Supplementary Table S1). These findings were largely congruent with expectations given the level of executive dysfunction within each of these groups and the psychometric properties of each of the standardized EF measures highlighted in this table.
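
The sign pattern in these correlations follows from how each measure is scored. The sketch below computes Pearson's r on made-up numbers to illustrate the point. The data, the variable names, and the coding of pass as 1 and fail as 0 are all assumptions for illustration and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for eight examinees (NOT study data).
pill_errors = [0, 1, 2, 4, 6, 9, 15, 30]          # higher = worse
copy_score = [36, 35, 33, 34, 30, 28, 25, 18]     # higher = better
seconds_b = [60, 70, 85, 80, 110, 150, 200, 290]  # higher = worse

# ASSUMED coding: 1 = pass (fewer than 3 errors), 0 = fail.
passed = [1 if e < 3 else 0 for e in pill_errors]

r_errors_copy = pearson_r(pill_errors, copy_score)  # negative
r_errors_time = pearson_r(pill_errors, seconds_b)   # positive
r_pass_copy = pearson_r(passed, copy_score)         # positive (point-biserial)
```

With this coding, a measure on which higher scores are better correlates negatively with error counts and positively with pass/fail, while a timed measure shows the reverse. This is consistent with the direction of most rows in Table 4.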

Table 4.

Correlations of primary Pillbox Test scores with existing measures of EF

Measure             Total        Total        Yellow pill  Green pill   Pass/Fail    Pass/Fail
                    errors       time         errors       errors       ≥ 3          ≥ 5
Volition/Inhibition EF factor
  Stroop C/W        −0.38***     −0.46***     −0.35***     −0.33***     0.37***      0.40***
  GDS-SF            0.13         0.21*        0.19*        −0.004       −0.28**      −0.25**
Planning and Attention EF factor
  Rey copy          −0.57***     −0.36***     −0.56***     −0.42***     0.47***      0.48***
  Rey exec          −0.48***     −0.32***     −0.51***     −0.33***     0.42***      0.43***
  Trails A          0.57***      0.43***      0.48***      0.43***      −0.46***     −0.43***
  Stroop word       −0.24**      −0.26**      −0.18*       −0.19*       0.11         0.16
  Stroop color      −0.37***     −0.38***     −0.30***     −0.36***     0.35***      0.36***
  CDT total         −0.38***     −0.20*       −0.43***     −0.29***     0.35***      0.33***
Purposive Action and Self-Regulation EF factor
  Rey recall        −0.38**      −0.40***     −0.40***     −0.29***     0.47***      0.42***
  Trails B          0.56***      0.54***      0.49***      0.49***      −0.56***     −0.55***
  Similarities      −0.52***     −0.33***     −0.44***     −0.46***     0.44***      0.48***
  WCST total error  0.42***      0.31***      0.39***      0.38***      −0.43***     −0.39***
  WCST concept      −0.38**      −0.30***     −0.33***     −0.34***     0.26**       0.26**
  DAFS total        −0.57***     −0.25**      −0.55***     −0.47***     0.54***      0.55***
  DAFS comm         −0.60***     −0.32***     −0.56***     −0.45***     0.40***      0.39***
  DAFS financial    −0.52***     −0.17        −0.47***     −0.41***     0.48***      0.48***
  DAFS message      −0.55***     −0.31***     −0.55***     −0.45***     −0.45***     0.50***
Effective Performance/Self-Monitoring EF factor
  WCST # cat        −0.42***     −0.27**      −0.39***     −0.38**      0.37***      0.35***
  WCST psv resp     0.40***      0.28**       0.39***      0.35***      −0.43***     −0.41***
  WCST psv err      0.43***      0.26**       0.41***      0.38***      −0.44***     −0.42***

Notes: Yellow pill direction = one tablet three times a day; Green pill direction = one tablet twice a day with breakfast and dinner; Pass/Fail = three or more errors; EF = executive function; Stroop C/W = Stroop Color/Word score; GDS-SF = Geriatric Depression Scale-Short Form total score; Rey copy = Rey Complex Figure copy score; Rey exec = Rey executive plan score; Trails = Trail Making Test; CDT = Clock Draw Test; WCST = Wisconsin Card Sort Test; WCST concept = number of conceptual responses; WCST # cat = number of categories completed; WCST psv resp = number of perseverative responses; WCST psv err = number of perseverative errors; DAFS = Direct Assessment of Functional Status; DAFS comm = communication subscale score.

*p < .05.

**p < .01.

***p < .001.

An ROC analysis indicated that the area under the curve was significant (i.e., 0.82, 95% CI: 0.73–0.92; p < .001) when using the total number of Pillbox Test errors to predict membership in the NEURO or CON groups (Figure 1). Two or more total errors on the Pillbox Test provided a sensitivity level of 77.5% and a specificity level of 77.5%, ≥3 total errors provided a 75.0% sensitivity level and an 80% specificity level, ≥4 total errors kept the sensitivity level at 75% and increased the specificity level to 85%, and ≥5 total errors also had a 75% sensitivity level and increased the specificity level slightly to 87.5%. Correlations between existing EF measures and cutscores of ≥3 and ≥5 are shown in Table 4 and reveal moderate correlations in the expected directions on most measures.
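
To make the cutoff logic concrete, the sketch below shows how sensitivity and specificity at a given cutoff could be computed from per-participant total error counts. The error lists are invented for illustration (they are not the study data), and `sens_spec` is a hypothetical helper name.

```python
def sens_spec(patient_errors, control_errors, cutoff):
    """Return (sensitivity, specificity) when 'fail' means errors >= cutoff."""
    true_pos = sum(e >= cutoff for e in patient_errors)  # patients flagged
    true_neg = sum(e < cutoff for e in control_errors)   # controls cleared
    return true_pos / len(patient_errors), true_neg / len(control_errors)


# Hypothetical total-error counts (NOT study data).
patients = [0, 1, 6, 8, 12, 20, 31, 45]
controls = [0, 0, 0, 1, 2, 3, 4, 7]

for cutoff in (2, 3, 4, 5):
    sensitivity, specificity = sens_spec(patients, controls, cutoff)
    print(f">= {cutoff} errors: sensitivity={sensitivity:.3f}, "
          f"specificity={specificity:.3f}")
```

Raising the cutoff can only hold sensitivity constant or lower it, and hold specificity constant or raise it. Each cutoff therefore gives one point on the ROC curve.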

Fig. 1.

ROC of group membership (i.e., NEURO or CON) predicted by Pillbox Test total errors.

Discussion

The purpose of the current study was to examine the construct validity of a new performance-based measure of EF, the Pillbox Test, which was hypothesized to address all four of Lezak's (1995) EF factors (i.e., purposive action/self-regulation, planning/attention, volition/inhibition, and effective performance/self-monitoring) through the real-time assessment of medication management. The following three hypotheses were examined and generally supported: (1) individuals with neurological impairment will perform worse than individuals with cardiovascular disease, who in turn will perform worse than healthy community controls, on traditional and performance-based measures of EF, including the Pillbox Test; (2) the Pillbox Test will correlate moderately with variables from widely established EF measures in each of the four domains proposed by Lezak; and (3) the Pillbox Test will demonstrate good sensitivity and specificity in distinguishing individuals with neurologic impairment from healthy controls.

With regard to hypothesis 1, the NEURO group performed significantly worse than the MED and CON groups on seven of eight Pillbox Test measures, most established EF measures, and four of five scores from the DAFS, an existing performance-based measure of IADLs. The MED group performed significantly worse than the CON group on one Pillbox Test variable (i.e., green pill errors), many of the traditional EF measures, and one of the DAFS subscales (i.e., Message). These findings indicate that the Pillbox Test has good criterion-related validity and is effective in differentiating graduated levels of executive dysfunction: the worst performance was seen in the NEURO group, and the MED group demonstrated less difficulty than the NEURO group but more than controls, consistent with the mild executive dysfunction often noted in cardiovascular samples. An exploratory analysis within the NEURO group comparing patients with and without dementia revealed that patients with dementia performed worse on all Pillbox variables, albeit not always significantly so (data not shown), providing further preliminary support for the test's ability to differentiate among levels of cognitive impairment.

The Pillbox Test also showed good convergent validity as it correlated significantly in expected directions with established EF measures in all four of Lezak's EF domains, as well as with DAFS subscales (hypothesis 2). Interestingly, the strongest correlations noted in each factor were related to some of the most commonly used measures in each respective domain: Stroop Color/Word score for the volition/inhibition factor (Strauss et al., 2006), Rey Copy and TMT Part A for the planning/attention factor (Brauer-Boone et al., 1998; Savage et al., 1999), TMT Part B and DAFS Total score for the purposive action/self-regulation factor (Hanks et al., 1999), and WCST perseverative errors, perseverative responses, and number of categories for the effective performance/self-monitoring factor (Kibby et al., 1998). Especially noteworthy are the moderate correlations of Pillbox Test variables with TMT Part B and WCST variables, as the latter two tests are frequently considered the “gold standard” in measuring EF. Pillbox Test scores were significantly correlated with the four most sensitive WCST scores for determining executive dysfunction: raw error score, number of categories completed, perseverative raw response score, and perseverative error score (Crawford, Bryan, Luszcz, Obonsawin, & Stewart, 2000; Ready et al., 2001). Although the WCST has been criticized for its length and difficulty of administration (Lezak, 1995; Lezak et al., 2004; Strauss et al., 2006), the Pillbox Test could provide similar information in a shorter amount of time and may be less frustrating.

The initial pass/fail designation of three or more total errors on the Pillbox Test was based on consultation with family practice physicians and gerontologists, who indicated this as the threshold of errors commonly associated with medical complications. The ROC analysis showed that this criterion yielded a sensitivity of 75% and a specificity of 80% (hypothesis 3). However, increasing the threshold to five or more total errors resulted in the same sensitivity (75%) but increased the specificity to 87.5%, thus lowering the false positive rate to only 12.5%. When five or more errors were applied as the pass/fail criterion to the current study, significant differences among the NEURO, MED, and CON groups remained.
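
The trade-off between these thresholds is simple arithmetic on the reported operating points. The sketch below tabulates the false positive rate (1 − specificity) for each cutoff reported in the Results. The implied head counts assume that the rates refer to all 40 participants in each of the NEURO and CON groups, and the variable names are illustrative.

```python
# (cutoff, sensitivity, specificity) as reported in the ROC analysis.
operating_points = [
    (2, 0.775, 0.775),
    (3, 0.750, 0.800),
    (4, 0.750, 0.850),
    (5, 0.750, 0.875),
]
GROUP_SIZE = 40  # ASSUMED denominator: participants per group (NEURO and CON)

summary = []
for cutoff, sensitivity, specificity in operating_points:
    false_positive_rate = 1.0 - specificity
    summary.append({
        "cutoff": cutoff,
        "false_positive_rate": round(false_positive_rate, 3),
        # Implied head counts under the group-size assumption above.
        "controls_flagged": round(false_positive_rate * GROUP_SIZE),
        "patients_missed": round((1.0 - sensitivity) * GROUP_SIZE),
    })

for row in summary:
    print(row)
```

Under this assumption, moving the cutoff from three to five errors would reclassify three controls as passing (eight flagged versus five) without missing any additional patients.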

Theoretically, the Pillbox Test appears to compensate for documented weaknesses of traditional EF measures in several ways. First, while no single standardized instrument is known to measure all EF domains (Brauer-Boone et al., 1998; Lamar et al., 2002; Miyake et al., 2000; Strauss et al., 2006), the Pillbox Test appears to tap the four EF domains proposed by Lezak. Thus, by considering Pillbox Test performance both quantitatively and qualitatively, one can not only identify the presence of impairment but may also determine which particular EF domains are affected, making it possible to apply specific corrective actions and inform treatment planning.

Other advantages of the Pillbox Test over existing EF measures are its lack of reliance on novelty and the minimal interaction required of the examiner. For these reasons, the Pillbox Test is expected to have higher test–retest reliability than other standardized measures of EF, whose reliability is commonly low because of the benefits of prior exposure (Miyake et al., 2000; Strauss et al., 2006), and greater ecological validity, as the examiner does not function as the patient's “frontal lobes” by employing corrective actions such as those applied during Trails-B (Manchester, Priestley, & Jackson, 2004; Richardson, Nadler, & Malloy, 1995). Additionally, this test requires the patient to make inferences from ambiguous instructions, an ability suggested as most responsible for poor medication management but not readily assessed by other neuropsychological tests (Hayes, 1998; McGraw & Drennan, 2001; Park et al., 1994).

Finally, the utility of traditional EF measures in predicting one's ability to carry out IADLs has been criticized due to the lack of strong correlations between performance on standardized testing and real-life functions within a specific domain (Diehl, Willis, & Schaie, 1995; Duke & Kaszniak, 2000; Grigsby, Kaye, & Robbins, 1995). As the Pillbox Test replicates an IADL task, this criticism does not apply. In fact, in addition to identifying possible executive dysfunction, Pillbox Test performance may provide important information about the level of comprehension patients have regarding their medical regimen and opportunities to implement strategies to improve medication management. In the long term, this may lead to earlier diagnosis and more cost-effective healthcare with increased medication adherence and lower hospitalization rates due to mismanagement of medications. Additionally, while patients' families may have difficulty comprehending how their loved ones' performances on traditional neuropsychological tasks relate to real-world functioning, showing them a concrete example in the form of the patient's difficulties organizing a pillbox can make it easier to understand the provider's concerns. This is especially important as families tend to overestimate patients' abilities to perform supposedly overlearned IADL tasks such as medication management (Park, 1999).

Because the concept of assessing medication management is not new (Farris & Phillips, 2008), it is important to point out how the Pillbox Test differs from existing performance-based medication management measures, such as the Hopkins Medication Schedule (HMS; Carlson et al., 2005), the Medication Management Ability Assessment (MMAA; Patterson et al., 2002), and the Medication Management Test (Gurland et al., 1994). First and foremost, as has been stated above, the Pillbox Test was designed a priori to assess the four factors of EF proposed by Lezak (1995). No other medication management measure was developed to assess EF specifically, although varying degrees of EF are likely required. Toward this end, instructions for the Pillbox Test are very brief and lack structure (i.e., organize this pillbox with these medications for 1 week) so that the examinee's executive abilities are assessed as fully as possible. This is in contrast to other medication management measures, most of which were developed to assess capacity to manage medications and ask specific questions of the examinee (e.g., if you took three pills every day, how many days would the pills last?) or provide specific instructions about what to do (e.g., read and explain five prescription labels, recognize tablet colors). Another difference in administration is that the Pillbox Test asks examinees to fill in compartments of a “weekly” pillbox with “five” medications, while other measures ask examinees to fill the compartments of a “daily” pillbox (e.g., HMS, MMAA) or to fill a weekly pillbox with fewer than five medications (e.g., Standardized Medication Task; Isaac et al., 1993). Because the Pillbox Test provides less structure and requires the organization of more medications over a longer time period than most other measures, it may be more difficult and thus more sensitive to detecting subtle cognitive dysfunction and particularly executive dysfunction.

Findings of this study should be interpreted in the context of study limitations. First, the results of this study were based on a limited sample size (n = 120). Second, due to source location characteristics, collateral information regarding each participant's ability to perform IADLs was not collected. Third, neuroimaging was unavailable for the cardiac group, which may have provided more information concerning the integrity of the anatomy and function of the brain. Fourth, divergent validity of the Pillbox Test was not addressed and convergent validity of Lezak's (1995) four-factor model could not be fully established due to the limited number of existing neuropsychological measures that formally assess each of the four EF factors. For example, few tests have been developed specifically to quantify self-monitoring or self-corrective behavior (i.e., factor four). Thus, perseverative responses on several tests were used as convergent validity measures but likely provide too narrow a representation of this EF factor. Finally, while these initial findings of the Pillbox Test suggest that it may be clinically useful as a measure of medication management, it was designed as an ecological measure of EF and not for assessing all aspects involved in medication adherence such as ability to open pill bottles, remembering to take medications on time, and obtaining refills. Healthcare providers wishing to thoroughly assess capacity to manage medications would be better served administering a measure designed specifically for this purpose (see Farris & Phillips, 2008, for a review of measures).

In summary, the Pillbox Test appears to be a promising measure of EF. Initial findings indicate that it demonstrates good construct validity, tapping all four domains of EF proposed by Lezak; that it does not rely on novelty or on the examiner to act as the patient's frontal lobes; that it involves a concept familiar to most people and thus is likely less threatening; and that it is brief and easy to administer and score, making it a potential measure for cognitive screening, especially in primary care settings. However, additional studies of test–retest reliability and validity in large and diverse samples with a variety of clinical diagnoses, including well-defined groups with specified cognitive impairment, are needed to firmly establish the utility of the Pillbox Test. The proposed cutscores, in particular, will require further validation before being applied more widely to other patient populations with questionable abilities to comply with medication regimens (e.g., moderate-to-severe traumatic brain injury, human immunodeficiency virus). Moreover, research on whether a time limit should be imposed may be helpful, since there is no time limit when organizing a pillbox at home.

Supplementary Material

Supplementary material is available at Archives of Clinical Neuropsychology online.

Funding

There were no sources of financial support for this study.

Conflict of Interest

None declared.

References

Barclay
L. L.
Weiss
E. M.
Mattis
S.
Bond
O.
Blass
J. P.
Unrecognized cognitive impairment in cardiac rehabilitation patients
Journal of the American Geriatrics Society
 , 
1988
, vol. 
36
 (pg. 
22
-
28
)
Bell-McGinty
S.
Podell
K.
Franzen
M.
Baird
A. D.
Williams
M. J.
Standard measures of executive function in predicting instrumental activities of daily living in older adults
International Journal of Geriatric Psychiatry
 , 
2002
, vol. 
17
 (pg. 
828
-
834
)
Bennett
S. J.
Sauve
M. J.
Shaw
R. M.
A conceptual model of cognitive deficits in chronic heart failure
Journal of Nursing Scholarship
 , 
2005
, vol. 
37
 (pg. 
222
-
228
)
Benton
A. L.
Differential behavioral effects in frontal lobe disease
Neuropsychologia
 , 
1968
, vol. 
6
 (pg. 
53
-
60
)
Boyle
P. A.
Paul
R. H.
Moser
D. J.
Cohen
R. A.
Executive impairments predict functional declines in vascular dementia
The Clinical Neuropsychologist
 , 
2004
, vol. 
18
 (pg. 
75
-
82
)
Brauer-Boone
K.
Ponton
M. O.
Gorsuch
R. L.
Gonzalez
J. J.
Miller
B. L.
Factor analysis of four measures of prefrontal lobe functioning
Archives of Clinical Neuropsychology
 , 
1998
, vol. 
13
 
7
(pg. 
585
-
595
)
Brose-Zartman
A.
Houtz
A.
The Pillbox Test as an ecological valid measure of executive functioning in Alzheimer's disease and vascular dementia: A preliminary report
 , 
2003
,
Dallas
Poster session presented at the annual meeting of the National Academy of Neuropsychology
Bryan
J.
Luszcz
M. A.
Measurement of executive function: Considerations for detecting adult age differences
Journal of Clinical and Experimental Neuropsychology
 , 
2000
, vol. 
22
 (pg. 
40
-
55
)
Burdick
D. J.
Rosenblatt
A.
Samus
Q. M.
Steele
C.
Baker
A.
Harper
M.
, et al.  . 
Predictors of functional impairment in residents of assisted-living facilities: The Maryland assisted living study
Journal of Gerontology: Medical Sciences
 , 
2005
, vol. 
60A
 
2
(pg. 
258
-
264
)
Carlson
M. C.
Fried
L. P.
Xue
Q. L. T.
Bandeen-Roche
K.
Zeger
S.
Brandt
J.
Association between executive attention and physical functional performance in community-dwelling older women
Journals of Gerontology, Series B: Psychological Sciences and Social Sciences
 , 
1999
, vol. 
54B
 
5
(pg. 
S262
-
S270
)
Carlson
M. C.
Fried
L. P.
Xue
Q. L. T.
Brant
J.
Validation of the Hopkins Medication Schedule to identify difficulties in taking medications
Journal of Gerontology: Medical Sciences
 , 
2005
, vol. 
60V
 
2
(pg. 
217
-
223
)
Chan
R. C. K.
Shum
D.
Toulopoulou
T.
Chen
E. Y. H.
Assessment of executive functions: Review of instruments and identification of critical issues
Archives of Clinical Neuropsychology
 , 
2008
, vol. 
23
 
2
(pg. 
201
-
216
)
Crawford
J. R.
Bryan
J.
Luszcz
M. A.
Obonsawin
M. C.
Stewart
L.
The executive decline hypothesis of cognitive aging: Do executive deficits qualify as differential deficits and do they mediate age-related memory decline?
Aging, Neuropsychology, and Cognition
 , 
2000
, vol. 
7
 (pg. 
9
-
31
)
Diehl
M.
Willis
S. L.
Schaie
K.
Everyday problem solving in older adults: Observational assessment and cognitive correlates
Psychology and Aging
 , 
1995
, vol. 
10
 
3
(pg. 
478
-
491
)
Duke
L. M.
Kaszniak
A. W.
Executive control functions in degenerative dementias: A comparative review
Neuropsychology Review
 , 
2000
, vol. 
10
 (pg. 
75
-
99
)
Farris
K. B.
Phillips
B. B.
Instruments assessing capacity to manage medications
Annals of Pharmacotherapy
 , 
2008
, vol. 
42
 (pg. 
1026
-
1036
)
Ferrer-Caja
E.
Crawford
J. R.
Bryan
J.
A structural modeling examination of the executive decline hypothesis of cognitive aging through reanalysis of Crawford et al.'s (2000) data
Aging, Neuropsychology, and Cognition
 , 
2002
, vol. 
9
 (pg. 
231
-
249
)
Golden
C. J.
Freshwater
S. M.
Stroop Color and Word Test: A manual for clinical and experimental uses
 , 
2002
Wood Dale, IL
Stoelting
Gould
O. N.
Park
D. C.
Morrell
R. W.
Shifren
K.
Cognition and affect in medication adherence
Processing of medical information in aging patients: Cognitive and human factors perspectives
 , 
1999
Mahwah, NJ
Lawrence Erlbaum Associates
(pg. 
145
-
166
)
Goverover
Y.
Categorization, deductive reasoning, and self-awareness: Association with everyday competence in persons with acute brain injury
Journal of Clinical and Experimental Neuropsychology
 , 
2004
, vol. 
26
 (pg. 
737
-
749
)
Grant
D. A.
Berg
E.
A behavioral analysis of degree of reinforcement and ease of shifting to new responses in a Weigl-type card-sorting problem
Journal of Experimental Psychology
 , 
1948
, vol. 
38
 
4
(pg. 
401
-
411
)
Grigsby
J.
Kaye
K.
Baxter
J.
Shetterly
S. M.
Hamman
R. F.
Executive cognitive abilities and functional status among community-dwelling older persons in the San Luis Valley health and aging study
Journal of the American Geriatrics Society
 , 
1998
, vol. 
46
 (pg. 
590
-
596
)
Grigsby
J.
Kaye
K.
Robbins
L. J.
Behavioral disturbance and impairment of executive functions among the elderly
Archives of Gerontology and Geriatrics
 , 
1995
, vol. 
21
 (pg. 
167
-
177
)
Gurland
B. J.
Cross
P.
Chen
J.
Wilder
D. E.
Pine
Z. M.
Lantigua
R. A.
, et al.  . 
A new performance test of adaptive cognitive functioning: the Medication Management Test (MMT)
International Journal of Geriatric Psychiatry
 , 
1994
, vol. 
9
 (pg. 
875
-
885
)
Hanks
R. A.
Rapport
L. J.
Millis
S. R.
Deshpande
S. A.
Measures of executive functioning ability as predictors of functional ability and social integration in a rehabilitation sample
Archives of Physical Medicine and Rehabilitation
 , 
1999
, vol. 
80
 (pg. 
1030
-
1037
)
Hayes
K. S.
Randomized trial of geragogy-based medication instruction in the emergency department
Nursing Research
 , 
1998
, vol. 
47
 (pg. 
211
-
218
)
Heaton
R. K.
Chelune
G. J.
Talley
J. L.
Kay
G. G.
Curtis
G.
Wisconsin Card Sorting Test manual: Revised and expanded
 , 
1993
Odessa, FL
Psychological Assessment Resources
Isaac
L. M.
Tamblyn
R. M.
and the McGill-Calgary Drug Research Team
Compliance and cognitive function: A methodological approach to measuring unintentional errors in medication compliance in the elderly
The Gerontologist
 , 
1993
, vol. 
33
 (pg. 
772
-
781
)
Jones-Gotman
M.
Milner
B.
Design fluency: The invention of nonsense drawing after focal cortical lesions
Neuropsychologia
 , 
1977
, vol. 
15
 (pg. 
653
-
674
)
Kendrick
R.
Bayne
J. R.
Compliance with prescribed medication by elderly patients
Canadian Medical Association Journal
 , 
1982
, vol. 
127
 (pg. 
961
-
962
)
Kibby
M. Y.
Schmitter-Edgecombe
M.
Long
C. J.
Ecological validity of neuropsychological tests: Focus on the California Verbal Learning Test and the Wisconsin Card Sorting Test
Archives of Clinical Neuropsychology
 , 
1998
, vol. 
13
 (pg. 
523
-
534
)
Kripalani
S.
Henderson
L. E.
Chiu
E. Y.
Robertson
R.
Kolm
P.
Jacobson
T. A.
Predictors of medication self-management skill in a low-literacy population
Journal of General Internal Medicine
 , 
2006
, vol. 
21
 (pg. 
852
-
856
)
Lamar
M.
Zonderman
A. B.
Resnick
S.
Contribution of specific cognitive processes to executive functioning in an aging population
Neuropsychology
 , 
2002
, vol. 
16
 (pg. 
156
-
162
)
Lezak
M. D.
Neuropsychological assessment
 , 
1995
3rd ed.
New York
Oxford Press
Lezak
M. D.
Howieson
D. B.
Loring
D. W.
Neuropsychological assessment
 , 
2004
4th ed.
New York
Oxford Press
Loewenstein
D. A.
Amigo
E.
Duara
R.
Guterman
A.
Hurwitz
D.
Berkowitz
N.
, et al.  . 
A new scale for the assessment of functional status in Alzheimer's disease and related disorders
Journal of Gerontology
 , 
1989
, vol. 
44
 
4
(pg. 
114
-
121
)
Loewenstein
D. A.
Bates
C. B.
The Direct Assessment of Functional Status (DAFS). Manual for administration and scoring
 , 
1992
Miami Beach, FL
Neuropsychological Laboratories, The Wien Center for Alzheimer's Disease and Memory Disorders, Mount Sinai Medical Center
Malloy
P. M.
Cohen
R. A.
Jenkins
M. A.
Snyder
P. J.
Nussbaum
P. D.
Frontal lobe function and dysfunction
Clinical neuropsychology: A pocket handbook for assessment
 , 
1998
Washington, DC
American Psychological Association
(pg. 
573
-
590
)
Manchester
D.
Priestley
N.
Jackson
H.
The assessment of executive functions: Coming out of the office
Brain Injury
 , 
2004
, vol. 
18
 (pg. 
1067
-
1081
)
McGraw
C.
Drennan
V.
Self-administration of medicine and older people
Nursing Standard
 , 
2001
, vol. 
15
 
18
(pg. 
33
-
36
)
Miyake
A.
Friedman
N. P.
Emerson
M. J.
Witzki
A. H.
Howerter
A.
The unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks: A latent variable analysis
Cognitive Psychology
 , 
2000
, vol. 
41
 (pg. 
49
-
100
)
Newman
A. B.
Fitzpatrick
A. L.
Lopez
O.
Jackson
S.
Lyketos
C.
Jagust
W.
, et al.  . 
Dementia and Alzheimer's disease incidence in relationship to cardiovascular disease in the Cardiovascular Health Study cohort
Journal of the American Geriatric Society
 , 
2005
, vol. 
53
 (pg. 
1101
-
1107
)
O'Bryant
S. E.
Falkowski
J.
Hobson
V.
Johnson
L.
Hall
J.
Schrimsher
G. W.
, et al.  . 
Executive functioning mediates the link between other neuropsychological domains and daily functioning: A Project FRONTIER study
International Psychogeriatrics
 , 
2011
, vol. 
23
 (pg. 
107
-
113
)
Osterrieth
P. A.
Le test de copie d'une figure complexe
Archives de Psychologie
 , 
1944
, vol. 
30
 (pg. 
206
-
356
Translated by J. Corwin & F. W. Bylsma (1993), The Clinical Neuropsychologist, 7, 9–15
Park
D. C.
Park
D. C.
Morrell
R. W.
Shifren
K.
Aging and the controlled and automatic processing of medical information and medical intentions
Processing of medical information in aging patients: Cognitive and human factors perspectives
 , 
1999
Mahwah, NJ
Lawrence Erlbaum Associates
(pg. 
3
-
22
)
Park, D. C., Willis, S. L., Morrow, D., Diehl, M., & Gaines, C. L. (1994). Cognitive function and medication usage in older adults. Journal of Applied Gerontology, 13, 39–57.
Patterson, T. L., Lacro, J., McKibbin, C. L., Moscona, S., Hughes, T., & Jeste, D. V. (2002). Medication management ability assessment: Results from a performance-based measure in older outpatients with schizophrenia. Journal of Clinical Psychopharmacology, 22, 11–19.
Penades, R., Catalan, R., Puig, O., Masana, G., Pujol, N., Navarro, V., et al. (2010). Executive function needs to be targeted to improve social functioning with Cognitive Remediation Therapy (CRT) in schizophrenia. Psychiatry Research, 177, 41–45.
Ready, R. E., Stierman, L., & Paulsen, J. S. (2001). Ecological validity of neuropsychological and personality measures of executive function. The Clinical Neuropsychologist, 15, 314–323.
Reitan, R. M., & Wolfson, D. (1993). The Halstead-Reitan Neuropsychological Test Battery: Theory and clinical interpretation (2nd ed.). Tucson, AZ: Neuropsychology Press.
Richardson, E. D., Nadler, J. D., & Malloy, P. F. (1995). Neuropsychologic predictors of performance measures of daily living skills in geriatric patients. Neuropsychology, 9, 565–572.
Sadek, J. R., Stricker, N., Adair, J. C., & Haaland, K. Y. (2011). Performance-based everyday functioning after stroke: Relationship with IADL questionnaire and neurocognitive performance. Journal of the International Neuropsychological Society, 17, 832–840.
Salas, M., Veld, I., van der Linden, P., Hofman, A., Breteler, M., & Stricker, B. H. (2001). Impaired cognitive function and compliance with antihypertensive drugs in the elderly: The Rotterdam study. Clinical Pharmacology & Therapeutics, 70, 561–566.
Salthouse, T. A., Atkinson, T. M., & Berish, D. E. (2003). Executive functioning as a potential mediator of age-related cognitive decline in normal adults. Journal of Experimental Psychology: General, 132, 566–594.
Savage, C. R., Baer, L., Keuthen, N. J., Brown, H. D., Rauch, S. L., & Jenike, M. A. (1999). Organizational strategies mediate nonverbal memory impairment in obsessive-compulsive disorder. Biological Psychiatry, 45, 905–916.
Senanarong, V., Poungvarin, N., Jamjumras, P., Sriboonroung, A., Danchaivijit, C., Udomphanthuruk, S., et al. (2005). Neuropsychiatric symptoms, functional impairment and executive ability in Thai patients with Alzheimer's disease. International Psychogeriatrics, 17, 81–90.
Shimamura, A. P. (1994). Neuropsychological perspectives on memory and cognitive decline in normal human aging. Seminars in the Neurosciences, 16, 387–394.
Solomon, P. R., Hirschoff, A., Kelly, B., Relin, M., Brush, M., DeVeaux, R. D., et al. (1998). A 7 minute neurocognitive screening battery highly sensitive to Alzheimer's disease. Archives of Neurology, 55(3), 349–355.
Stilley, C. S., Bender, C. M., Dunbar-Jacob, J., Sereika, S., & Ryan, C. M. (2010). The impact of cognitive functioning on medication management: Three studies. Health Psychology, 29(1), 50–55.
Strauss, E., Sherman, E., & Spreen, O. (2006). A compendium of neuropsychological tests (3rd ed.). New York: Oxford University Press.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd ed.) administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Willis, S. L., Dolan, M. M., & Bertrand, R. M. (1999). Problem solving on health-related tasks of daily living. In D. C. Park, R. W. Morrell, & K. Shifren (Eds.), Processing of medical information in aging patients: Cognitive and human factors perspectives (pp. 199–219). Mahwah, NJ: Lawrence Erlbaum Associates.
Yesavage, J. A., Brink, T. L., Rose, T. L., Lum, O., Huang, V., Adey, M., et al. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17, 37–49.
Zwahr, M. D. (1999). Cognitive processes and medical decisions. In D. C. Park, R. W. Morrell, & K. Shifren (Eds.), Processing of medical information in aging patients: Cognitive and human factors perspectives (pp. 3–22). Mahwah, NJ: Lawrence Erlbaum Associates.