Keitel Functional Test for patients with rheumatoid arthritis: translation, reliability, validity, and responsiveness.

Background and Purpose: The purpose of this study was to translate the German Keitel Functional Test (KFT) into Danish and test it for reliability, concurrent and predictive validity, and responsiveness in patients with rheumatoid arthritis (RA). Methods: Translation of the KFT was performed according to international recommendations, and the translated version was tested twice by 2 observers for intraobserver and interobserver reliability, with a 1-week interval between assessments, in 20 patients with RA with stable disease activity. Validity was investigated by studying 2 patient groups: (1) 15 patients with long-lasting (median=6 years) active RA, tested before and after 2, 6, and 14 weeks of anti-tumor necrosis factor alpha (TNF-α) inhibitor therapy, and (2) 35 patients with early (median=0.25 year) RA, tested at years 0, 0.5, 1, and 2. Twenty-three patients in the early RA group also were tested at year 7. KFT, conventional clinical and biochemical markers of disease activity, and Health Assessment Questionnaire (HAQ) were used. Results: The translated KFT showed good intraobserver reliability (intraclass correlation coefficients [ICC]=.90 and .95, coefficient of variation [CV]=3.5%) and interobserver reliability (ICC=.99 and .92, CV=3.5%), and the KFT correlated with several measures of disease activity and, most closely, with the HAQ. The KFT was, in contrast to clinical disease activity measures, not sensitive to changes over time. Only baseline KFT data were significantly related to functional changes over a long period of time as measured by the KFT, and only in the early RA group. Discussion and Conclusion: The Danish translation of the KFT showed good reliability, acceptable concurrent validity, very poor responsiveness, and inconclusive results concerning predictive validity. 
The results of this study do not support the use of the KFT for monitoring function in clinical practice, as an outcome measure in clinical trials, or as a predictor of functional changes.

Rheumatoid arthritis (RA) is a common inflammatory joint disease characterized by pain, swelling, and reduced mobility of the joints, followed by various degrees of functional impairment. In all stages of the disease, physical therapy intervention and medication are considered to be important cornerstones of treatment. Despite treatment, limitations in physical functions and restrictions in daily activities and social participation, including paid work, are often seen, and approximately half of the patients leave the labor force within 6 to 10 years of disease onset. 1,2 To optimize interventions aimed at maintaining physical function and minimizing disability, a proper understanding of patients' functioning and health status is needed. 3 The International Classification of Functioning, Disability and Health (ICF), 4,5 a model developed by the World Health Organization (WHO), provides a useful framework for this purpose. This model refers to dimensions in a person's life at different levels (ie, participation, activity, body functions, or body structure), 4,5 and it can be used to assess effects of different treatments at different levels. At the level of participation in society, health status and quality-of-life measures have been developed. At the level of activity, the Health Assessment Questionnaire (HAQ), which is a well-known, validated, self-reporting system for evaluating activities of daily living, 6-8 often is used. The HAQ has been reported to have good predictive value for physical disability 1,9-11 and is sensitive to changes over time, 9 but it is not an appropriate instrument for assessing changes in physical impairment after short-term exercise therapy. 12 At the level of body structure, several measures (ie, measurements of joint swelling, pain, and radiological changes and biochemical markers) are in use. 13
In contrast, there are very few reliable and validated RA-specific measures at the level of body function, which might be useful to show the effects of physical therapy intervention on daily functioning. A German functional performance test, the Keitel Functional Test (KFT), has been developed for use in patients with RA. The KFT is based on range of motion and muscle activity, assessing 24 simple movement patterns for both upper and lower extremities. The 24 items are graded with a scoring system in which an index value of 100 points corresponds to normal functional ability. The test can be performed in 15 to 20 minutes and does not require any special instruments (Appendix). 14 This RA-specific measure of impairment of body functions has for several years been used by physical therapists in Denmark, for both inpatients in hospitals and outpatients in rehabilitation clinics, without prior validation. The KFT has been described as an outcome measure 15 and as a gold standard for evaluation of a new index of hand function. 16 It has been shown to have good concurrent validity, 17 especially when used with the HAQ, 18-20 and it has been reported to be a strong predictor of mortality. 21 Previous studies 22,23 have shown the KFT to be sensitive for detecting changes after 0.5 and 1.5 years of treatment with disease-modifying antirheumatic drugs (DMARDs), although its sensitivity for detecting changes over shorter (weeks to months) and longer (years) periods of time has not been systematically investigated. Changes during treatment with novel RA therapies, such as use of tumor necrosis factor alpha (TNF-α) inhibitors, have not been studied.
Because the KFT provides an overall picture of functional limitation and was developed to detect functional changes over time, 14 it may be a useful measure of impairment of body functions in both clinical practice and research. However, the KFT needs to be sufficiently validated. The purpose of this study was to validate a translated version of the KFT for reliability, concurrent and predictive validity, and responsiveness.

Method
This study of KFT involved 3 main stages: (1) translation, including field testing, (2) assessment of reliability, and (3) assessment of validity and responsiveness.

Translation, Including Field Testing
The test was translated from the original German language into Danish following international recommendations, 24 including a "bilingual panel," a "professional panel," a practical field test, and finally a backtranslation procedure. This was done in 1996.
Field testing. A pilot study with 4 patients with RA and 4 physical therapists was carried out to assess the new translation of the KFT. Four patients with RA at Copenhagen University Hospital at Hvidovre were included (1 male, aged 55 years, and 3 females, aged 35-78 years). The patients were in functional class II to III 25 and showed a variation in disease activity. Four physical therapists from the Department of Physiotherapy, Copenhagen University Hospital at Hvidovre (females, aged 26-55 years), who were not specialized in treating patients with RA and who had no knowledge of the KFT, tested the 4 patients. They were introduced to the KFT by reading and performing all test items once. If doubts in interpretations occurred during testing, the test was returned to the professional panel to adjust the language to produce the final version.
Back translation. The final version was translated back to German by a Danish rheumatologist whose first language was German. The original version and the back-translated version were given to the bilingual panel, who compared the 2 texts to examine whether the meaning was identical for all items. If this was confirmed, the translated KFT was finally approved.

Assessment of Reliability
Procedure. To determine the intraobserver and interobserver reliability of the translated KFT, 20 patients with RA were tested 4 times (by 2 observers at 2 time points, with a 1-week interval between tests, randomized into 4 different sequences). This was carried out in 1997.
Patients. By review of patient files, 40 patients with RA and with unchanged medical therapy during the last 3 months were identified at the Department of Rheumatology, Copenhagen University Hospital at Hvidovre. A random sample of 20 patients (17 women and 3 men, median age=64 years [range=26-78], median disease duration=6 years [range=2-48]) was included and evaluated as described above. Patients were excluded if they had changed therapy or reported a change in disease status during the 1-week interval between tests.
Observers. Two physical therapists from the Department of Physiotherapy, Copenhagen University Hospital at Hvidovre, were the observers. One physical therapist was very experienced in treating patients with RA and was familiar with the KFT. The other physical therapist had never seen or tried testing with the KFT. She first was introduced to the KFT and then administered the test, under supervision, to one patient. The 2 observers were masked to the functional level and previous test results of the patients.
Statistical analysis. The intraobserver and interobserver intraclass correlation coefficients (ICCs) were calculated. In addition, the variation between the 2 tests performed by the same observer and between the 2 tests performed by 2 different observers was calculated using the coefficient of variation (CV).
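As an illustration of the 2 reliability statistics named above, the following minimal Python sketch computes an ICC and a mean CV for paired test scores. The article does not state which ICC form or CV definition was used, so the two-way random-effects, absolute-agreement, single-measure ICC and the per-patient CV (standard deviation of the 2 scores divided by their mean) are assumptions. The function names and the scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def icc_two_way_single(scores):
    """Assumed ICC form: two-way random effects, absolute agreement, single measure.

    scores holds one row per patient and one column per test session or observer.
    """
    n = len(scores)
    k = len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares: patients (rows), sessions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def mean_cv_percent(scores):
    """Mean of the per-patient coefficients of variation, in percent (assumed definition)."""
    return mean(100 * stdev(row) / mean(row) for row in scores)


# Hypothetical KFT scores for 8 patients tested twice (not study data).
test_1 = [62, 70, 55, 81, 90, 47, 66, 73]
test_2 = [65, 68, 59, 80, 86, 52, 63, 77]
scores = list(zip(test_1, test_2))

icc = icc_two_way_single(scores)
cv = mean_cv_percent(scores)
print(f"ICC = {icc:.2f}, mean CV = {cv:.1f}%")
```

The same 2 functions would apply to intraobserver data (2 sessions, same observer) and to interobserver data (2 observers, same session); only the meaning of the columns changes.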
For later evaluation of the ability of the KFT to show changes over time, the smallest detectable difference (SDD) was calculated. The SDD is derived from the limits of agreement method, representing the smallest change in score that can be discriminated from the measurement error of the scoring method. Use of the SDD as the threshold level for a certain increase or decrease in scores of functional changes ensures that changes are not due to random variability or measurement error. The SDD is based on the 95% limits of agreement, as described by Bland and Altman. 26
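The relationship between the limits of agreement and the SDD can be sketched as follows. This is a minimal example under stated assumptions: the SDD is taken as 1.96 times the standard deviation of the paired test-retest differences, which is one common reading of the Bland and Altman approach, and the article does not give its exact formula. The function names and scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Return (bias, lower, upper): the mean difference and its 95% limits of agreement."""
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def smallest_detectable_difference(first, second):
    """Assumed definition: half the width of the 95% limits of agreement."""
    diffs = [b - a for a, b in zip(first, second)]
    return 1.96 * stdev(diffs)


# Hypothetical KFT scores for 8 patients tested twice, 1 week apart (not study data).
test_1 = [62, 70, 55, 81, 90, 47, 66, 73]
test_2 = [65, 68, 59, 80, 86, 52, 63, 77]

bias, lower, upper = limits_of_agreement(test_1, test_2)
sdd = smallest_detectable_difference(test_1, test_2)
print(f"bias = {bias:.2f}, limits = ({lower:.1f}, {upper:.1f}), SDD = {sdd:.1f} points")
```

Under this definition, a later change in an individual patient's score is treated as real only if its absolute value is at least the SDD.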

Assessment of Validity and Responsiveness
Criterion validity is the agreement with concurrent and future standards, defined as the degree to which a measure truly reflects a gold standard. 27 There are 2 types of criterion validity: concurrent validity and predictive validity. Concurrent validity is the degree to which a measure reflects a gold standard applied at the same time (eg, pathologic evidence of joint inflammation and destruction), and predictive validity is the degree to which a measure predicts a future gold standard outcome (eg, functional impairment). 27 Furthermore, we looked at responsiveness or sensitivity to change, which means the ability of a measure to detect clinically important degrees of change. Both variation in measurements over time (eg, treatment induced) and sufficient reproducibility to allow a reliable detection of this change are required. 27,28
Procedure. To investigate concurrent and predictive validity and responsiveness, 2 groups of patients with RA from the outpatient clinic at the Department of Rheumatology, Copenhagen University Hospital at Hvidovre, were recruited and followed.
Patients. One group (the anti-TNF-α group) consisted of 15 patients (14 women and 1 man, median age=45 years, range=23-62) with long-lasting RA (median disease duration=6 years, range=0-36). They were examined before treatment, at week 0, and during treatment with the TNF-α antagonist infliximab after 2, 6, and 14 weeks. This part of the study was carried out in 2005. Another group (the TIRA group) consisted of 35 patients (28 women and 7 men, median age=55 years, range=20-82) with early, relatively mild RA (median disease duration=0.25 years, range=0-2). They were included in the Danish TIRA Group study 29 and treated according to a protocol aimed at maximal inflammatory suppression with nonsteroidal anti-inflammatory drugs, DMARDs, corticosteroids, and, when available and necessary after 2 years, with biological treatment. The patients were examined before and after 0.5, 1, 2, and 7 years of therapy. This part of the study was carried out from 1998 to 2005.
Tests. The KFT was performed at all test sessions using the newly developed Danish KFT manual as described above. Furthermore, conventional clinical and biochemical measurements of disease activity were obtained, as recommended by the European League Against Rheumatism. 13 These measurements consisted of the HAQ score, 8 which is related to the patient's activity level according to the ICF framework, and 6 parameters of disease activity: number of swollen joints, number of tender joints, patient's pain on a visual analog scale (VAS), patient's and physician's global assessments of disease activity on a VAS, and an acute phase reactant, the serum C-reactive protein (CRP). These 6 parameters of disease activity are related to the body structure level according to the ICF framework. In addition, the 28-item Disease Activity Score (DAS-28), a composite measure combining number of tender joints, number of swollen joints, patient's global assessment of disease activity, and CRP, was calculated, 13 and, for the TIRA group patients, the examination program was supplemented with conventional radiography of the hands and wrists, scored using the method described by Larsen et al. 30 These tests were selected because they are related to the body structure and activity levels according to the ICF framework and are most closely related to the body function level, which is measured by the KFT. Because no gold standard at this level exists, the disease activity measures and particularly the HAQ score were the best available comparators for the KFT.
Test procedure. Except for radiographs, all tests of the individual patients were performed on the same day. The physical therapists were masked to previous KFT results and other clinical data.
Observers. Only experienced physical therapists administered the KFT. In the anti-TNF-α group, 1 physical therapist did the testing of 10 patients, and 2 physical therapists tested 5 patients each. In the TIRA group, 4 different physical therapists administered the tests during the first 2 years. The 7-year follow-up tests were administered by the same physical therapist.
Statistical analysis. Because all dependent variables were not normally distributed (Kolmogorov-Smirnov test), nonparametric tests were applied. Medians and ranges were used in the analysis. The KFT results were compared with the results of all conventional methods at all test times using the Spearman coefficient of correlation (ρ) to illustrate the concurrent validity. To assess responsiveness, the standardized response mean (SRM; small=<0.5, medium=0.5-0.8, large=>0.8) 31 was used, and changes from baseline were tested using the Wilcoxon signed rank test. Statistical significance was defined as P≤.05. The predictive value was assessed using a forward stepwise regression analysis, with changes in functional ability, measured by the KFT and the HAQ, as the dependent variable. Baseline values for the KFT, the HAQ, and the other clinical parameters were included in the regression analysis.
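Two of the statistics named above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: the SRM is taken as the mean change divided by the standard deviation of the changes, the Spearman coefficient is computed as a Pearson correlation on average ranks, and the category cutoffs follow the thresholds quoted in the text. The function names and all values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change from baseline divided by the standard deviation of the changes."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def srm_category(srm):
    """Classify an SRM as small (<0.5), medium (0.5-0.8), or large (>0.8)."""
    size = abs(srm)
    if size < 0.5:
        return "small"
    if size <= 0.8:
        return "medium"
    return "large"


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return covariance / spread


# Hypothetical values for 5 patients (not study data).
kft_baseline = [60, 62, 58, 70, 65]
kft_follow_up = [66, 61, 63, 72, 64]
srm = standardized_response_mean(kft_baseline, kft_follow_up)

kft = [40, 55, 62, 70, 85]
haq = [2.5, 2.0, 1.5, 1.75, 0.5]
rho = spearman_rho(kft, haq)

print(f"SRM = {srm:.2f} ({srm_category(srm)}), rho = {rho:.2f}")
```

In this made-up example the coefficient is negative because a higher KFT score means better function, whereas a higher HAQ score means more disability.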

Ethics
The study protocol was approved by the local ethics committee, and the patients provided informed consent after receiving verbal and written information.

Results

Translation
The bilingual panel experienced only a few problems in translating the 24 functional tasks of the KFT. Only 1 task was sent to the professional panel because of alternative translations. The professional panel agreed on the text of this particular task and corrected a few minor linguistic errors.
The results of the field testing showed that 78 of the total of 86 possible answers in the KFT were used, and no problems in understanding the tasks occurred. However, the 4 physical therapists suggested that the differences between the "normal" speed and the "reduced" speed of a task (as seen in 4 tasks for the lower extremities and 2 tasks for transfer) should be explained specifically at the introduction of the test. The back translation and the original text were not identical, but no differences in task performance occurred when instructing by each of the 2 texts.

Reliability
Twenty patients with RA participated in the reliability testing. Twelve patients were classified as being in functional class II, 7 in class III, and 1 in class I. 25 In accordance with the inclusion criteria, no patient had changes in medication, number of painful joints (median=10, range=0-38), or morning stiffness (median=1, range=0.5-2 hours) within the 1-week interval between assessments. Results from the reliability cohort are presented in Table 1.
For intraobserver agreement, the ICCs for observer A and observer B were .95 and .90, respectively. The mean CV was 3.5% (3.4% for observer A and 3.6% for observer B). For interobserver agreement, the ICCs for test times 1 and 2 were .92 and .99, respectively. The mean CV was 3.5% (4.8% for test time 1 and 2.1% for test time 2). The intraobserver and interobserver SDDs were 9.7 and 9.3 points, respectively.

Validity and Responsiveness
In this part of the study, 2 groups of patients participated. In the anti-TNF-α group, 13 of the 15 patients completed the study. Two patients showed lack of efficacy and wanted to discontinue at the follow-up assessment. In the TIRA group, 23 of the 35 patients completed the 7-year follow-up assessment. Reasons for the 12 dropouts were that 5 patients had moved to another area of the country, 3 patients did not respond to the written request, 2 patients were living in residential homes for elderly people, and 2 patients had died. As assessed by the KFT and the HAQ, the functional level of both groups of patients was reduced. Baseline values of the various parameters are presented in Table 2.

Concurrent validity. In the anti-TNF-α group, the highest correlation coefficients were generally found between the KFT and the HAQ and between the KFT and the CRP. The correlation coefficients were generally higher in the TIRA group than in the anti-TNF-α group, with the highest observed value between KFT and HAQ. At baseline, the KFT showed correlation coefficients of at least .50 for the HAQ, patient's pain, physician's global assessment of disease activity, CRP, and DAS-28 (Tab. 3).

Responsiveness. In both patient groups, all clinical parameters of disease activity and the HAQ showed, as expected, significant improvements from baseline. Most SRMs were large (>0.8), indicating good responsiveness (Tab. 2). In contrast, the KFT showed no significant changes from baseline at any time point except after 2 years in the TIRA group, and SRMs were low, except for medium SRMs (0.5-0.8) at 14 weeks in the anti-TNF-α group and at 2 years in the TIRA group, indicating poor responsiveness. Table 4 shows the proportion of patients whose KFT scores changed by at least the SDD of 10 points. In the anti-TNF-α group, this proportion was largest at week 14 (54%). In the TIRA group, it was largest at year 7 (56%).
Predictive validity. In the anti-TNF-α group, the regression analysis showed no significant results (ie, no baseline parameters could predict KFT changes over time). In the TIRA group, the baseline KFT scores could explain 41% of the change in KFT scores after 2 years (P<.001) and 28% of the change in KFT scores after 7 years (P<.05). Baseline KFT scores could not predict changes measured by the HAQ in the 2 groups.

Discussion
Impairment of physical function is an important aspect in the evaluation of RA. There is a lack of observational tests of physical performance designed for use in patients with RA that have been tested for reliability, validity, and responsiveness. In the present study, the KFT, which focuses on detection of functional impairment in the trunk and the extremities, was translated into Danish, and reliability and various aspects of validity were tested. The KFT correlated with a disability measure (ie, the HAQ), with measures of disease activity, and with radiographic assessment of structural joint damage. Regression analysis showed the KFT to be only a predictor of future development of functional changes over long periods of time, but the KFT was not sufficiently sensitive to show changes over time. Thus, the study did not support the use of the KFT for monitoring function in clinical practice, as an outcome measure in clinical trials, or as a predictor of functional changes.
The translation of the KFT into Danish appeared to be successful. Both the bilingual panel and the professional panel experienced only a few problems during the translation procedure. Even though only 4 patients were assessed, the field test revealed good results, as almost all instructions were used without difficulty in performing the tasks. The translation method applied in this study is very extensive, in accordance with recommendations for health-related quality-of-life measures. 24 In the present study, test-retest reliability was assessed in 20 patients, with a 1-week interval between tests. This test-retest interval was used in an earlier study 32 with good results and seems to be adequate for securing 2 identical groups of patients with RA to be studied. The cohort of 20 patients was smaller than cohorts in other studies 8,10,32 but was favored by unchanged medication, number of painful joints, and duration of morning stiffness.
To ensure that the KFT version using the Danish manual was reproducible before using it in the validity study, a reliability study was completed beforehand. Consequently, 2 different cohorts were examined. Optimally, the validity study would have included a repetition of reliability testing.
In accordance with a previous study, 10 the present study had a very comprehensive approach, including both long-term and short-term assessments to ensure that a broad spectrum of disease severity and disease duration was included. In the TIRA group, 12 of the 35 patients dropped out between years 2 and 7. This dropout rate of 34% is higher than in an earlier study with a similar follow-up time. 1 As the results of the 7-year follow-up assessment were similar to the results of the 2-year follow-up assessment with all patients examined, the dropout rate is not expected to have had a major influence on the results.
In both the anti-TNF-α and the TIRA groups, the KFT was compared with internationally recommended parameters for assessment of disease activity, structural joint damage, and disability, 13 but it would have been optimal if a gold standard for assessment of impairment of body function had been included. Neither the present study nor previous studies that tested the KFT 17-20,23,33
However, the apparent unresponsiveness of the KFT could be due to lack of real change in functional ability over time in the patients in the TIRA group, as they had early, relatively mild RA and received effective therapy, which was aimed at controlling disease activity and, consequently, at reducing functional impairment. The poor responsiveness at the group level also could partly have been influenced by the fact that the functional level of the individual patient could change in both directions (improvement or deterioration), as illustrated in Table 4. At the patient level, changes greater than the SDD of 10 points occurred in 56% of the patients in the TIRA group after 7 years, suggesting that the individual patients did change. However, considered as a group, this change was smaller (between 1 and 6 points) (Tab. 2) and not detectable, because some patients improved and others deteriorated. In conclusion, the negative results of the responsiveness assessment can be explained by poor reliability, by changes in opposite directions at the level of the individual patient, and by the patients' functional stability.
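The argument that individual changes in opposite directions can cancel at the group level can be illustrated with a short Python sketch. It assumes a threshold of 10 points, as used in the text, and counts a patient as changed when the absolute change is at least that threshold. The function name and the change scores are hypothetical and are not study data.

```python
from statistics import mean

SDD_POINTS = 10  # threshold used in the text for a detectable individual change


def summarize_changes(changes, sdd=SDD_POINTS):
    """Count patients who improved or deteriorated by at least the SDD, plus the group mean."""
    improved = sum(1 for c in changes if c >= sdd)
    deteriorated = sum(1 for c in changes if c <= -sdd)
    return {
        "improved": improved,
        "deteriorated": deteriorated,
        "stable": len(changes) - improved - deteriorated,
        "percent_changed": 100 * (improved + deteriorated) / len(changes),
        "mean_change": mean(changes),
    }


# Hypothetical changes in KFT score from baseline for 8 patients (not study data).
kft_changes = [12, -11, 15, -14, 3, -2, 10, -9]
summary = summarize_changes(kft_changes)
print(summary)
```

In this made-up example, more than half of the patients change by at least the threshold, yet the mean change of the group is close to zero, which is the pattern described above.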
The predictive value of the KFT was found only in the TIRA group for future functional changes after 2 and 7 years. The predictive value of the KFT has been analyzed in only one previous study, 33 which showed that baseline KFT scores, patient's pain, and patient's global assessment of disease activity were able to predict 25% of the functional changes in 85 patients after 1 year. Such results could not be found in our study.
Our study had some limitations. The study design should determine whether the KFT could be recommended as a test of physical performance. The KFT was compared with conventional clinical and biochemical measures of disease activity related to the body structure and activity level according to the ICF framework. 4,5 However, the KFT relates to the body function level, and we could have added a measure such as the Swedish SOFI at this level as a gold standard for patients with newly diagnosed RA. The first step then should have been a translation of the Swedish SOFI into Danish. 10 This would have been of value in interpreting both validity and responsiveness of the KFT, but was considered too laborious for the purpose of this work.
The numbers of patients in the anti-TNF-α group and in the 7-year follow-up assessment of the TIRA group were small. No formal sample size calculation was done. However, we considered the sample size sufficient to illustrate potential responsiveness of the KFT because available data indicated that it would be sufficient to illustrate changes in the conventionally used measures. This was confirmed.
Another methodological problem may be that we tested the reproducibility of the KFT separately and did not repeat this test in the validity part of the study. However, we considered the good results from the reliability part of the study to be largely transferable to the validity part of the study because the 24 items of the KFT manual were described in detail and the same manual was used throughout the study.

Conclusion
A Danish translation of the KFT was successfully performed, and it showed good intraobserver and interobserver agreement, with acceptable concurrent validity, by comparison with measures of disease activity and of activities of daily living. In this study, the KFT showed very poor responsiveness and inconclusive results concerning predictive validity, indicating that the KFT is not suitable for assessing treatment efficacy, as it could not show changes over time. The present study does not support the use of the KFT for monitoring function in clinical practice, as an outcome measure in clinical trials, or as a predictor of functional changes.
Ms Holm, Dr Jacobsen, and Dr Ostergaard provided concept/idea/research design. Ms Holm, Dr Jacobsen, Dr Hetland, and Dr Ostergaard provided writing. Ms Holm, Dr Skjodt, and Dr Jensen provided data collection. Ms Holm, Dr Jensen, and Dr Ostergaard provided data analysis. Ms Holm provided project management and clerical support. Ms Holm and Dr Hetland provided subjects. Ms Holm and Dr Ostergaard provided facilities/equipment. Dr Holm, Dr Jacobsen, Dr Klarlund, Dr Jensen, Dr Hetland, and Dr Ostergaard provided consultation (including review of manuscript before submission).