Reliability and Validity of the Balance Evaluation Systems Test (BESTest) in People With Subacute Stroke

Background. The Balance Evaluation Systems Test (BESTest) is a new clinical balance assessment tool, but it has never been validated in patients with subacute stroke. Objective. The purpose of this study was to examine the reliability and validity of the BESTest in patients with subacute stroke. Design. This was an observational reliability and validity study. Methods. Twelve patients participated in the interrater and intrarater reliability study. Convergent validity was investigated in 70 patients using the Berg Balance Scale (BBS), Postural Assessment Scale for Stroke (PASS), Community Balance and Mobility Scale (CB&M), and Mini-BESTest. The receiver operating characteristic curve was used to calculate the sensitivity, specificity, and accuracy of the BESTest, Mini-BESTest, and BBS in classifying participants into low functional ability (LFA) and high functional ability (HFA) groups based on Fugl-Meyer Assessment motor subscale scores. Results. The BESTest showed excellent intrarater reliability and interrater reliability (intraclass correlation coefficient=.99) and was highly correlated with the BBS (Spearman r=.96), PASS (r=.96), CB&M (r=.91), and Mini-BESTest (r=.96), indicating excellent convergent validity. No floor or ceiling effects were observed with the BESTest. In contrast, the Mini-BESTest and CB&M had a floor effect in the LFA group, and the BBS and PASS demonstrated responsive ceiling effects in the HFA group. In addition, the BESTest was as accurate as the BBS and Mini-BESTest in separating participants into HFA and LFA groups. Limitation. Whether the results are generalizable to patients with chronic stroke is unknown. Conclusion. The BESTest is reliable, valid, sensitive, and specific in assessing balance in people with subacute stroke across all levels of functional disability.

Balance control is a complex integration of multiple systems, including sensory and motor systems, sensorimotor integration, and high-level premotor processing, all of which can be affected following stroke. 1 Laboratory studies have shown that muscle paresis and spasticity can impair postural alignment, base of foot support, and forces necessary to correct postural equilibrium. 2-4 Most patients with stroke also demonstrate decreased ankle proprioception and somatosensation at the feet, which are critical for balance control. 2,4 In addition, patients with stroke have difficulties in central processing for sensory integration and sensory reweighting. 5 They demonstrate more postural sway in the conditions of altered somatosensory information and altered vision, 6 such as walking on uneven surfaces and walking in the dark. Anticipatory postural adjustments and automatic postural responses to restore the body's equilibrium during perturbations are reduced on the paretic side. 7-10 Cognitive impairments can lead to inadequate allocation of attention and to greater fall risk in this population. 2,11 These studies suggest that postural control impairments in patients with stroke arise from diverse mechanisms involved in balance control. Therefore, a balance assessment that could capture all underlying balance impairments is necessary.
At present, balance evaluation for assessing patients with stroke in clinical settings involves ordinal scales to assess a set of functional tasks that require balance while maintaining posture or changing position. 12 Examples of the most commonly used functional scales in patients with stroke include the Berg Balance Scale (BBS) and the Postural Assessment Scale for Stroke Patients (PASS). 13 These functional balance scales are useful in reporting the presence of balance impairments, but they cannot identify the type of impairments underlying balance problems.
Recently, the Balance Evaluation Systems Test (BESTest) has been developed based on a conceptual model of balance control in which 6 different systems or domains contribute to balance control. The BESTest evaluates the following systems: biomechanical constraints, stability limits and verticality, anticipatory postural adjustments, automatic postural responses, sensory organization, and stability in gait. 14 The 36 items of the BESTest are scored from 0 (severe impairment) to 3 (no impairment). The total score is calculated as a percentage of the points scored out of 108 total points. 14 The advantage of the BESTest is that it includes tasks involving many different mechanisms underlying the control of balance so that clinicians can determine the type of balance retraining intervention specific to the underlying causes of balance problems. The BESTest has shown excellent interrater reliability and validity and the potential to differentiate types of balance deficits in patients with Parkinson disease (PD), vestibular loss, and peripheral neuropathy. 14 However, the BESTest has not yet been validated to assess balance performance in patients with stroke.
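The scoring rule described above (36 items, each scored 0 to 3, total expressed as a percentage of 108 points) can be illustrated with a minimal sketch. The function and constant names are hypothetical and are not taken from the BESTest documentation.

```python
# Hypothetical helper names; only the 36-item, 0-3, 108-point rule comes from the text.
BESTEST_ITEMS = 36      # number of BESTest items
MAX_ITEM_SCORE = 3      # 0 = severe impairment, 3 = no impairment


def bestest_percent(item_scores):
    """Return the BESTest total as a percentage of the 108 possible points."""
    if len(item_scores) != BESTEST_ITEMS:
        raise ValueError("expected 36 item scores")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    max_total = BESTEST_ITEMS * MAX_ITEM_SCORE  # 108 points
    return 100.0 * sum(item_scores) / max_total


# Illustrative (made-up) participant: half the items scored 1, half scored 2.
example_total = bestest_percent([1] * 18 + [2] * 18)
```

Expressing the total as a percentage keeps scores comparable even when section totals differ in size.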
The Mini-BESTest, the shorter version of the BESTest, was developed to reduce the assessment time. 15 Item redundancy and the first 2 sections (biomechanical constraints and stability limits with verticality) from the BESTest were removed based on Rasch analysis to yield the assessment of a common construct, "dynamic balance," in the Mini-BESTest. 15 The 14-item Mini-BESTest is scored on a 3-level ordinal scale from 0 (poor balance performance) to 2 (no balance impairment). Several studies have shown that the Mini-BESTest also was reliable and valid 16-18 and useful for fall prediction in patients with PD. 19,20 The use of the Mini-BESTest in community-dwelling people with chronic stroke has recently been reported, with excellent interrater and intrarater reliability and high correlations with other balance measures such as the BBS and the Timed "Up & Go" Test but without the floor and ceiling effects seen in these traditional balance tests. 21 However, the feasibility of the Mini-BESTest for assessing balance across lower levels of subacute stroke impairment severity has not been reported.
The purposes of this study were: (1) to assess the reliability and convergent validity of the BESTest in patients with subacute stroke and (2) to determine whether the BESTest could be used to identify patients with low and high functional ability, as classified with the Fugl-Meyer Stroke Assessment motor subscale (FM-motor). We hypothesized that the BESTest would be reliable and well correlated with other balance scales such as the BBS, PASS, and Mini-BESTest. Furthermore, we predicted that the BESTest would have fewer floor and ceiling effects than the other clinical balance tests, so that the BESTest would be more feasible for people with stroke across a wider range of motor disability. Finally, we hypothesized that the lack of musculoskeletal and limits of stability/verticality domains in the Mini-BESTest may lead to a floor effect in patients who have low functional ability because these domains are commonly affected in patients with subacute stroke.

Method

Participants
Participants were patients with subacute stroke (defined as stroke of less than 4 months since onset 22 ) referred for physical therapist services at Prasart Neurological Institute in Thailand from November 1, 2012, to August 31, 2013. Individuals were included in this study if they passed the screening tests by a physician using the following criteria: diagnosis of cerebral hemorrhage or cerebral infarction with stable medical conditions, age between 25 and 90 years, first unilateral hemispheric stroke, and able to follow instructions to complete the assessment. Individuals were excluded from the study if they had cognitive impairment (defined as having a Mini-Mental State Examination [MMSE] score of less than 24), 23,24 cerebral aneurysm, a lesion at the brain stem involving the sleep-wake and respiratory control center or cerebellum, aphasia as diagnosed by a physician using a Thai adaptation of the Western Aphasia Battery, 25 a neurological disorder other than stroke, or the presence of a major peripheral neuropathy or musculoskeletal problem that was sufficient to disturb balance. All participants signed a consent form approved by the Human Research Protection Committee at Prasart Neurological Institute Research Center, Thailand.

Outcome Measures
To determine whether the balance tools could be used to separate participants into high- and low-functioning groups, the FM-motor was selected to measure functional recovery from stroke. 26 The FM-motor is scored from 0 to 100 points, with each item scored on a 3-point ordinal scale ranging from 0 (cannot perform) to 2 (performs fully). The FM-motor score is used to classify patients with stroke into 4 groups according to level of disability: a score of 0 to 35 represents severe disability, a score of 36 to 55 represents moderately severe disability, a score of 56 to 79 represents moderate disability, and a score of 80 or higher represents mild disability. 27 In this study, we referred to patients with stroke who had FM-motor scores between 0 and 55 as the low functional ability group (LFA group) and to those who had FM-motor scores greater than 55 as the high functional ability group (HFA group).
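The FM-motor cut points described above can be written as a short lookup. This is only an illustrative sketch: the function names are hypothetical, and the thresholds are those stated in the text.

```python
def fm_motor_disability(score):
    """Map an FM-motor score (0-100) to the 4 disability levels given in the text."""
    if not 0 <= score <= 100:
        raise ValueError("FM-motor score must be between 0 and 100")
    if score <= 35:
        return "severe"
    if score <= 55:
        return "moderately severe"
    if score <= 79:
        return "moderate"
    return "mild"


def functional_ability_group(score):
    """Return 'LFA' for FM-motor scores of 0-55 and 'HFA' for scores above 55."""
    level = fm_motor_disability(score)  # also validates the score range
    return "LFA" if level in ("severe", "moderately severe") else "HFA"
```

The LFA group therefore pools the severe and moderately severe levels, and the HFA group pools the moderate and mild levels.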
Five balance evaluation scales (BESTest, Mini-BESTest, BBS, PASS, and Community Balance and Mobility Scale [CB&M]) were administered to each participant. The BBS is a widely used clinical scale for functional balance and is currently considered a reference standard for assessing balance in patients with stroke. The BBS consists of 14 functional balance items that focus on the ability to maintain a position and postural adjustments to voluntary movements. 28 The total possible score on the BBS is 56 points, with ratings ranging from 0 (unable to perform) to 4 (able to complete the task). 13 The PASS is a 12-item test designed to measure balance function in patients with stroke, especially those with very poor motor function. The 4-point items, creating the total score of 36, assess functional balance performance in maintaining or changing lying, sitting, and standing postures. 29 The CB&M consists of 19 tasks, including advanced functional balance and mobility activities aimed at ambulatory or community-dwelling people with stroke. 30 Items are scored on an ordinal scale ranging from 0 (unable to perform) to 5 (able to perform independently).

Procedure
Reliability. Interrater and intrarater reliability were determined by 5 physical therapists. Raters were a convenience sample of 3 physical therapists from the stroke rehabilitation center with stroke rehabilitation experience of 1, 5, and 10 years, and 2 PhD physical therapist students from Srinakharinwirot University and Mahidol University. Prior to the study, all raters participated in 2 training workshops on using the BESTest. The training workshops were carried out by a therapist who received training from the BESTest developer. Each item of the BESTest was demonstrated on individuals who were healthy, and the rating criteria were discussed. All raters also scored each participant's performance from a demonstration video, and rating criteria were discussed again if there was a discrepancy. When there were differences in the rating scores, the raters were asked to watch the videotape with the trainer, who encouraged the discussion until all raters reached agreement.
After the training workshops, a BESTest reliability trial was performed in 12 patients with stroke. The sample size calculation for this study was based on a power of 0.80 and an alpha level of .05. The calculation used a null intraclass correlation coefficient (ICC) of .60 31 and an expected coefficient based on the correlation coefficient of r=.93 found in a previous balance measure reliability study. 32 The BESTest was administered by one rater (B.C.), and the procedures were video recorded. The evaluation was performed in a laboratory setting at Prasart Neurological Institute, and videotape recording was performed in the same view for all participants. The rater used the same testing instructions to standardize performance. Before testing, the vital signs of the participants were evaluated to ensure stable medical status. All participants received the same verbal instructions. Participants were permitted to rest as needed during the test.
All raters scored each participant's performance from videotape on 2 separate occasions. The second scoring was performed 7 days after the first scoring. Intrarater reliability of total scores and section scores was assessed by comparing the score of occasion 1 and score of occasion 2 for each rater. Interrater reliability was determined by comparing the scores from occasion 1 for all 5 raters. Each rater scored the videotape on separate scoring sheets for each occasion and did not discuss scoring among participants and occasions.
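The comparisons described above are summarized with the ICC models named in the Data Analysis section (model 2,k for interrater and model 3,k for intrarater reliability). The following is a minimal sketch of both coefficients in their standard two-way ANOVA mean-squares form. It is not the software used in the study, the function names are hypothetical, and the example ratings are made up. Each row is one participant and each column is one rater (interrater) or one scoring occasion (intrarater).

```python
def _mean_squares(ratings):
    """Two-way ANOVA mean squares for an n-participants x k-columns table."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-participants
    ms_cols = ss_cols / (k - 1)                 # between-raters (or occasions)
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual
    return n, ms_rows, ms_cols, ms_error


def icc_2k(ratings):
    """ICC(2,k): two-way random effects, average of k ratings."""
    n, ms_rows, ms_cols, ms_error = _mean_squares(ratings)
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def icc_3k(ratings):
    """ICC(3,k): two-way mixed effects, average of k ratings."""
    _, ms_rows, _, ms_error = _mean_squares(ratings)
    return (ms_rows - ms_error) / ms_rows


# Made-up ratings: 3 participants scored by 2 raters, the second rater
# always 1 point higher than the first.
example = [[1, 2], [2, 3], [3, 4]]
```

In this made-up example the constant offset between raters lowers ICC(2,k), which treats rater differences as error, but leaves ICC(3,k) unchanged.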
Validity. Prior to the validity study, the raters (B.C., N.C.) received additional training for using the Mini-BESTest, BBS, PASS, CB&M, and FM-motor. The convergent validity sample size was calculated from a null correlation coefficient of .50 33 and acceptable correlation of .80 from a previous study, 32 yielding at least 29 participants per group. After participants signed the approved consent forms, their demographic and clinical information, including Barthel Index (BI) score and FM-motor score, was recorded by one rater (N.C.). Seventy patients, 35 in each group, participated in this study. A balance assessment of each participant was then performed by the second rater (B.C.), who was blinded to the basic characteristics and the FM-motor scores of the participants. The sequence of the balance tests was randomly performed following the order of the starting position: from lying to sitting to standing and walking. Any test item that was duplicated between the tests was performed only once and scored using the criteria for each test. 34 The evaluation was performed in the same laboratory setting, and all participants received the same verbal instructions. Rest was allowed during performance of each item to avoid fatigue. Total assessment time was approximately 1.5 hours, but if the testing could not be completed in 1 day, it was continued on the next day. To ensure the accuracy of the scoring, the rater (B.C.) reviewed and scored each participant's test performance from the videotape again. In case of disagreement between concurrent rating and video rating, the second rater (N.C.) was asked to score those items from the videotape to decide the final score.

Data Analysis
A descriptive statistical analysis of demographic and baseline clinical characteristics of the participants was conducted. An independent-sample t test was used to compare age, time since stroke, BI score, and BBS score, whereas the Mann-Whitney U test was used to compare MMSE score, FM-motor score, and other balance measurement scores between the LFA and HFA groups. Interrater and intrarater reliability values were calculated based on the participants' ICCs. 35 The ICC model 2,k was used for interrater reliability, and the ICC model 3,k was used for intrarater reliability. An ICC value of .80 or higher indicates excellent correlation (good reliability), values of .60 to .80 indicate adequate correlation (moderate reliability), and values of .40 to .60 indicate poor correlation (weak reliability). 31,36 To determine the convergent validity of the BESTest, the strength of the relationship among the BESTest, Mini-BESTest, PASS, BBS, and CB&M was examined using Spearman rank order correlations. Correlation coefficients of .00 to .49 were interpreted as poor, those of .50 to .79 as moderate, and those of .80 or higher as excellent. 33 The floor and ceiling effects were calculated as the percentage of the sample scoring the minimum or maximum possible scores, respectively. Ceiling and floor effects of 20% or greater are considered significant. 37 In this study, we also recorded the "responsive ceiling effect," which represents 20% or more of participants scoring within the top 10% of the test and allows a sufficient number of scores to measure change over time. 34 The percentage of minimum and maximum scores for each item of the BESTest was calculated as the percentage of participants who received a minimum score (0) or a maximum score (3) for each item. These percentages represented the difficulty of the item, such that a high percentage (>80%) of minimum score reflected a difficult item and a high percentage of maximum score indicated an easy item.
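The floor, ceiling, and responsive ceiling calculations described above can be sketched as follows. The function name is hypothetical and the example scores are made up. The sketch also assumes that "within the top 10% of the test" means within the top 10% of the possible score range, which the text does not state explicitly.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=20.0):
    """Percentage of scores at the floor, at the ceiling, and in the top 10% of the range."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100.0 * sum(s == max_score for s in scores) / n
    # Assumption: the top 10% is taken over the possible score range.
    top_cut = max_score - 0.10 * (max_score - min_score)
    responsive_pct = 100.0 * sum(s >= top_cut for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "responsive_ceiling_pct": responsive_pct,
        # An effect is flagged when it reaches the 20% criterion from the text.
        "floor_effect": floor_pct >= threshold,
        "ceiling_effect": ceiling_pct >= threshold,
        "responsive_ceiling_effect": responsive_pct >= threshold,
    }


# Made-up BBS totals (possible range 0 to 56) for 10 participants.
example_bbs = [56, 55, 52, 40, 30, 20, 10, 0, 45, 51]
example_result = floor_ceiling_effects(example_bbs, min_score=0, max_score=56)
```

In this made-up sample only 1 participant reaches the maximum score, yet 4 of 10 score within the top 10% of the range, so only the responsive ceiling criterion is met.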
Receiver operating characteristic (ROC) curves were used to determine the relative performance of the BESTest score, the Mini-BESTest score, and the BBS score for classifying participants into 2 groups based on their FM-motor score: low functional ability (FM-motor score ≤55) and high functional ability (FM-motor score >55). The accuracy of each balance test was assessed using the area under the curve (AUC), which can be interpreted as the probability of correctly identifying participants with low or high function. An AUC value of 0.9 and above indicates high accuracy, 0.7 to 0.9 indicates moderate accuracy, 0.5 to 0.7 indicates low accuracy, and 0.5 and below indicates a result due to chance. 38 The cutoff point value was chosen by selecting the score that provided the best balance between high sensitivity and high specificity. 35 Furthermore, likelihood ratios and accuracy were used to determine posttest probabilities that helped to improve certainty for confirming the diagnosis or abandoning it. A positive likelihood ratio (LR+) was the probability of a participant having a score above the cutoff point, whereas a negative likelihood ratio (LR-) was the probability of a participant having a score below the cutoff point. An LR+ over 5 and an LR- less than 0.2 indicate that the test is useful due to its high probability of correctly classifying patients into functional ability groups, but likelihood ratios close to 1.00 indicate that the tests are useless, as the probability of correctly or incorrectly classifying patients into groups is the same. 35 The accuracy of the selected cutoff point in classifying patients into functional ability groups was calculated as the proportion of the participants in the correctly identified group when using that cutoff point for all participants. 35
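The classification measures described above can be sketched as follows, using the conventional definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. Several choices here are assumptions rather than details from the study: the HFA group is treated as the positive class, a score at or above the cutoff is classified as HFA, the "best balance" cutoff is taken to be the one that maximizes Youden's index (sensitivity + specificity - 1), and the example data are made up. All function names are hypothetical.

```python
def classification_stats(scores, is_hfa, cutoff):
    """Sensitivity, specificity, likelihood ratios, and accuracy at one cutoff."""
    pairs = list(zip(scores, is_hfa))
    tp = sum(1 for s, h in pairs if h and s >= cutoff)
    fn = sum(1 for s, h in pairs if h and s < cutoff)
    tn = sum(1 for s, h in pairs if not h and s < cutoff)
    fp = sum(1 for s, h in pairs if not h and s >= cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(scores, is_hfa):
    """AUC as the share of HFA-LFA pairs in which the HFA score is higher (ties count half)."""
    pos = [s for s, h in zip(scores, is_hfa) if h]
    neg = [s for s, h in zip(scores, is_hfa) if not h]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, is_hfa):
    """Assumed criterion: the observed score that maximizes Youden's index."""
    def youden(c):
        st = classification_stats(scores, is_hfa, c)
        return st["sensitivity"] + st["specificity"] - 1
    return max(sorted(set(scores)), key=youden)


# Made-up balance scores (percent) and group labels (True = HFA).
example_scores = [20, 30, 40, 50, 60, 70, 80, 90]
example_is_hfa = [False, False, False, False, True, False, True, True]
example_cutoff = best_cutoff(example_scores, example_is_hfa)
example_stats = classification_stats(example_scores, example_is_hfa, example_cutoff)
```

Both groups must contain at least one participant for these ratios to be defined.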

Role of the Funding Source
This study was supported by the Thailand Research Fund, the Office of Higher Education Commission, and Srinakharinwirot University (grant no. RSA5580002).

Reliability
Twelve patients with subacute hemorrhagic or ischemic stroke (

Convergent Validity
The eFigure (available at ptjournal.apta.org) depicts the number of eligible participants, included participants, and excluded participants with the reasons for exclusion. One hundred thirteen patients with subacute stroke met the eligibility criteria, and 70 patients consented to participate in the validity testing. Of the 43 patients who were not included in this study, 30 did not require continued rehabilitation, 5 had cognitive impairment or severe motor aphasia, 2 had severe internal carotid artery occlusion that required urgent surgery, and 6 had major musculoskeletal problems, including osteoarthritis and gouty arthritis. The majority of the participants (n=67) completed the validity testing within 1 day. Demographic and clinical characteristics of participants in the validity study are shown in Table 1. Participants in the LFA and HFA groups were similar in age, sex, and cognitive impairment as measured with the MMSE, but those in the LFA group had a longer time since stroke and significantly lower scores on all of the clinical balance tests.

Score Distribution Across Levels of Functional Ability
The distribution of participants' scores for the balance measures across levels of functional ability is shown in Figure 2.

Item Scores of the BESTest
The scores of each item and section of the BESTest compared between the LFA and HFA groups are shown in Table 2. In general, the section scores were significantly lower in the LFA group, suggesting that the BESTest was able to differentiate between the 2 groups. Several items in the BESTest were too hard to perform for the LFA group, as shown by the percentage of minimum score of more than 80%. These items included sit on floor, rise to toe, stand on one leg, alternate stair touching, compensatory postural response in all directions, stand on foam with eyes closed, and all items in the "stability in gait" section, with the exception of change in gait speed. In contrast, there were only 2 items (base of support and sitting verticality) that were shown to be easy (>80% of maximum score) in both LFA and HFA groups.
Comparison of the percentage of minimum and maximum scores between the LFA and HFA groups also suggested the feasibility of each item to be used in all levels of functional ability. For example, 31.4% of the participants in the LFA group compared with 71.4% of the participants in the HFA group received the maximum score on item 6 (lateral lean). A similar trend was evident with the minimum score, where there was a decrease in percentage of minimum score in the HFA group compared with the LFA group. Such a pattern was found for all items of the BESTest, except for 3 items (base of support and sitting verticality, left and right).

Identification of Functional Ability Level
The

BESTest had higher probability of correctly classifying patients according to level of functional ability compared with the BBS.

Discussion
This is the first study to examine whether the BESTest is reliable and valid to assess balance impairments in patients with subacute stroke. Our results confirmed that the BESTest has excellent interrater and intrarater reliability, as well as excellent convergent validity, compared with commonly used clinical balance scales in patients with stroke. The high intrarater and interrater reliability of the BESTest in patients with subacute stroke agreed with previous studies in patients with neurological conditions or healthy elderly people. 14 Although all of the clinical balance measures observed in this study can be used to assess balance impairments in patients with subacute stroke, this study demonstrated that the BESTest may be more desirable than other balance scales due to its lack of floor and ceiling effects. The BESTest was shown to be suitable for patients with subacute stroke across many levels of functional ability, as evidenced by the distribution of BESTest scores that covered the
The BESTest was developed to provide information on which particular balance systems were underlying balance impairments. Results from our study confirmed that balance deficits in patients with subacute stroke covered impairments of all postural control subsystems represented in the BESTest (Tab. 2), but the amount of subsystem deficit varied depending on the level of motor impairment. For example, patients in the LFA group had deficits in all postural control systems (mean score of the BESTest was less than 50%), except the stability limit and verticality subsystems. In contrast, patients in the HFA group showed more deficits in stability in gait. Therefore, the use of the BESTest in patients with subacute stroke can guide clinicians to tailor their intervention to impairments of specific postural control subsystems.
Although we did not study the responsiveness of the BESTest, the differences in the item scores between the LFA and HFA groups indicated the potential of the BESTest to detect change in performance. All items in the BESTest, with the exception of base of support and sitting verticality (left and right), showed higher median scores and a higher percentage of maximum scores in the HFA group compared with the LFA group. Our participants received a daily rehabilitation program that included interventions to maintain joint range of motion and prevent deformity. Therefore, deformity or pain in the feet was rare in our participants, resulting in an initially high score on the base of support item. Inaccurate representation of verticality is often found in patients with stroke who have neglect or pusher syndrome. 42,43 However, none of our participants had been diagnosed with either neglect or pusher syndrome, and they all had perfect scores on the verticality items.
In this study, we also questioned whether the scores from the BESTest, Mini-BESTest, and BBS could classify patients with a high versus low level of functional ability as measured with the FM-motor. This study demonstrated that the scores from all 3 balance scales could be used to classify patients with stroke into 2 functional ability groups. However, the slightly larger LR+ values suggest the BESTest may be preferable to the BBS and Mini-BESTest for functional classification. Although the time for administering the BESTest (30-35 minutes) is more than twice as long as for the Mini-BESTest or the BBS (10-15 minutes), the information gathered with the BESTest is more useful compared with the BBS and Mini-BESTest in customizing balance intervention and more sensitive than the Mini-BESTest to detect changes in balance performance. However, we question whether some items in the BESTest that had maximum scores in the first assessment may have a ceiling effect and could limit item responsiveness in patients with subacute stroke. Therefore, further analysis, such as Rasch analysis, may be required on BESTest data from patients with subacute stroke to eliminate the least sensitive and redundant items, which will shorten the assessment time. Future studies about sensitivity to detect change and minimal detectable changes of the BESTest also are essential for evaluating the effectiveness of treatment programs. Furthermore, our study was carried out only in patients with early stroke; thus, the results cannot be generalized to patients with chronic stroke.

In conclusion, the BESTest is a reliable, valid, and comprehensive measure of balance that can be used in people with subacute stroke throughout different levels of functional ability without floor and ceiling effects.
Ms Chinsongkram, Dr Viriyatharakij, Dr Saengsirisuwan, Dr Horak, and Dr Boonsinsukh provided concept/idea/research design. Ms Chinsongkram, Dr Horak, and Dr Boonsinsukh provided writing. Ms Chinsongkram, Ms Chaikeeree, and Dr Saengsirisuwan provided data collection. Ms Chinsongkram and Dr Boonsinsukh provided data analysis. Dr Boonsinsukh provided project management and fund procurement. Ms Chaikeeree and Dr Boonsinsukh provided facilities/equipment. Dr Viriyatharakij and Dr Horak provided consultation (including review of manuscript before submission).