Psychometric testing of the British English Workplace Activity Limitations Scale in four rheumatic and musculoskeletal conditions

Abstract Objectives The aims were to validate a British English version of the Workplace Activity Limitations Scale (WALS) linguistically and then test it psychometrically in RA, axial spondyloarthritis (axSpA), OA and FM. Methods The WALS was forward translated and reviewed by an expert panel, and cognitive debriefing interviews were conducted. Participants completed a postal questionnaire booklet. Construct (structural) validity was examined by fit to the Rasch measurement model. Concurrent validity testing included correlating the WALS with the Work Limitations Questionnaire-25 (WLQ-25). Two weeks later, participants were mailed a second questionnaire booklet to assess test–retest reliability. Results Minor wording changes were made to the WALS; 831 employed participants then completed questionnaires: 267 men and 564 women; mean age 53.5 (s.d. 8.9) years; mean condition duration 7.7 (s.d. 8.0) years. The WALS satisfied Rasch model requirements, and a WALS Rasch transformation table was created. Concurrent validity with the WLQ-25 was strong (RA rs = 0.78; axSpA rs = 0.83; OA rs = 0.63; FM rs = 0.64). Internal consistency supported group-level use (α = 0.80–0.87). Test–retest reliability was excellent, with intraclass correlation coefficients (2,1) ≥0.90. Conclusion A reliable, valid British English version of the WALS is now available for use in the UK.


Introduction
Work participation (i.e. paid work) is important for the health and well-being of people with rheumatic and musculoskeletal disorders (RMDs). Yet they have a shorter healthy working life expectancy [1] and are less likely to be employed than those without long-term health conditions [2]. Working people with RMDs can struggle to manage work, leading to presenteeism (i.e. reduced at-work productivity owing to health problems [3]). Presenteeism is an important target for improvement in medical, rehabilitation and vocational interventions, and also from the perspective of people with RMDs [4]. Outcome measures assessing at-work productivity, tested across a range of RMDs, can help to direct and evaluate such interventions.
The OMERACT Work Productivity Group identified two patient-reported outcome measures of at-work productivity suitable for use in RMDs [5,6]. The Work Limitations Questionnaire-25 (WLQ-25) measures the duration of difficulty with work activities (work productivity) [7], and the Workplace Activity Limitations Scale (WALS) measures the amount of difficulty with work activities (work ability) [4,8]. People with RA and OA preferred the WALS over the WLQ-25 as an outcome measure [9].
The WALS was developed and tested psychometrically in Canada and has been used there in studies in inflammatory arthritis [i.e. RA, PsA and axial spondyloarthritis (axSpA)], OA, lupus and scleroderma [10-15]. In RA and OA, it has good content validity, comprehensibility and content relevance [9]; low respondent burden [16]; and concurrent validity with other work measures [6,17]. However, there is only limited evidence for its test-retest reliability [6]. It has potential for clinical and research use in the UK. The WALS was developed in Canadian English. Before use in the UK, it should be validated linguistically (i.e. translated and culturally adapted) into British English (a different form of the same language), then tested psychometrically [18]. Although most Canadian English is understandable in the UK, some words used in the WALS have different meanings; for example, 'subway' means a rapid transport system in North America but an underpass for crossing roads in the UK. The aims of the present study were therefore to validate linguistically, investigate the content validity of, and evaluate the psychometric properties of a British English WALS among employed people with RA, axSpA, lower limb OA and FM in the UK. Testing should include both classical test theory and item response theory (e.g. Rasch analysis) methods to establish psychometric properties (e.g. reliability and validity) [19].

Methods
The study design used cross-cultural adaptation, followed by cross-sectional surveys to establish psychometric properties of the WALS. The COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklists were followed [19,20]. Ethical approval was obtained from the National Research Ethics Service Committee East Midlands, Leicester South (17/EM/0409). All participants provided written, informed consent.

Participants and recruitment
Patients were recruited from the Rheumatology, Orthopaedic or Therapy outpatient clinics of 41 secondary care and six community National Health Service (NHS) Trusts in the UK, with some participants from our research group's Arthritis Volunteer Register. Participants were eligible if they were ≥18 years of age; in paid employment ≥1 day per week; currently working or on <4 weeks sick leave (with participation delayed until they were back at work); and had a primary diagnosis of RA or undifferentiated inflammatory arthritis (UIA), axSpA, OA (knee and/or hip) or FM. Diagnoses were confirmed by a rheumatologist for RA, UIA and axSpA, or by a rheumatologist, orthopaedic surgeon, general practitioner or extended scope physiotherapist for OA and FM. Participants needed to be able to read, write and understand British English and were ineligible if on long-term sick leave, because they would be unable to complete the work measures. Patients were identified by research facilitators or therapists using these criteria and given a short study explanation and an information pack. The pack included a reply form, covering diagnosis, employment and sick leave status, to check eligibility.

Data collection
In phase 1, linguistic validation and cross-cultural adaptation were conducted to ensure that the wording in the WALS was considered comprehensible by participants. Content validity (i.e. the degree to which the content of a patient-reported outcome measure is considered an adequate reflection of what is being measured) was also tested [18,21] (see Supplementary Data S1, available at Rheumatology Advances in Practice online).
In phase 2, for psychometric testing, participants were mailed a paper questionnaire booklet to complete at home [test 1 (T1)]. Two weeks after return of the questionnaire, they were mailed a second questionnaire [test 2 (T2)], to assess test-retest reliability. Participants were sent a reminder letter after 2 weeks, followed at 4 weeks by another reminder and questionnaire booklet, if needed.
The T1 booklet included demographic data: age, sex, living arrangements, education status, condition duration, medication regimen, employment status and job title, to allow coding to job skill level {1 = elementary occupations; 2 = requiring compulsory education/work-related training; 3 = post-compulsory education (sub-degree) or longer work experience; 4 = degree education or equivalent experience [22]}.
The T1 booklet also included the British English WALS, consisting of 12 items scored 0-3 for difficulty in performing work activities (0 = no difficulty; 3 = unable to do; Supplementary Data S2, available at Rheumatology Advances in Practice online). The WALS includes eight physical activity items, three items about managing work and one about concentration at work [12]. The instructions state that respondents should answer about their work performance without help from others and without using special gadgets or equipment, so that their answers are not confounded by workplace behavioural coping strategies [10]. The recall period is not specified. Items answered as 'not applicable to my job' are scored 0. Scoring allows up to three missing items, which can be imputed using the individual's mean or median score (depending on the data distribution). A summed score (0-36) is calculated; scores ≥9 are associated with greater absenteeism, job disruptions and need for work accommodations compared with scores <5 [13].
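The scoring rules above can be sketched in a few lines. This is a minimal illustration, not the developers' official scoring syntax: the function name `score_wals`, the `NOT_APPLICABLE` sentinel and the choice to include 'not applicable' items (already re-scored 0) when computing the person's mean or median are all assumptions.

```python
from statistics import mean, median

N_ITEMS = 12           # WALS items, each scored 0 (no difficulty) to 3 (unable to do)
MAX_MISSING = 3        # scoring allows up to three missing items
NOT_APPLICABLE = "NA"  # hypothetical code for 'not applicable to my job'


def score_wals(responses, impute="mean"):
    """Return the summed WALS score (0-36), or None if it cannot be scored.

    `responses` holds 12 entries: an int 0-3, NOT_APPLICABLE, or None (missing).
    """
    if len(responses) != N_ITEMS:
        raise ValueError("expected 12 WALS item responses")
    # 'Not applicable' items are re-scored as 0 before anything else.
    recoded = [0 if r == NOT_APPLICABLE else r for r in responses]
    answered = [r for r in recoded if r is not None]
    if N_ITEMS - len(answered) > MAX_MISSING:
        return None  # too many missing items to impute
    if len(answered) < N_ITEMS:
        # Assumption: the person's mean (or median) is taken over answered items,
        # including 'not applicable' items already re-scored as 0.
        fill = mean(answered) if impute == "mean" else median(answered)
        recoded = [fill if r is None else r for r in recoded]
    return sum(recoded)
```

For example, a respondent answering 1 to every item scores 12, which lies in the ≥9 range linked to greater absenteeism; with four or more missing items the function returns `None`, mirroring the participants whose WALS could not be scored.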
To test concurrent validity, several work and health measures were included in the T1 questionnaire booklet. Some of these were condition-specific measures, and therefore four separate condition-specific T1 questionnaire booklets were used, with participants completing the booklet relevant for their condition. For all measures, a higher score indicates worse status. Three work measures were included. The WLQ-25 consists of 25 items in four subscales (1-5 scale), indicating the percentage time in the past 2 weeks that physical work, time, mental-interpersonal and output demands were limited [7]. From these, the WLQ Percentage Productivity Loss [7] and Summed scores [23] are created. Two forms of the Work Instability Scale (WIS) were used: the RA-WIS was included in those questionnaires for people with RA, OA or FM, and the AS-WIS for axSpA [24-26]. Both forms measure the degree of mismatch between the respondent's work abilities and their job demands. The RA-WIS includes 23 true/false items and the AS-WIS 20 items. Both have cut-points indicating low, moderate and high work instability (RA-WIS <10 and >17; AS-WIS <11 and >18). The third work measure was the Work Productivity and Activity Impairment (WPAI) (General Health) scale, which includes six items, from which a Percentage overall work impairment due to health (in the past 7 days) score is calculated [27].
For RA, the condition-specific health measures included in the T1 booklet were: the Rheumatoid Arthritis Impact of Disease (RAID) scale, consisting of seven 0-10 numerical rating scales (NRS; e.g. pain, fatigue, function) scored by summing weighted NRS scores [28]; and the HAQ, consisting of 20 daily activities rated 0-3 (0 = not at all difficult; 3 = unable to do) [29]. The HAQ was scored by summing all items (0-20 = mild; 21-40 = moderate; 41-60 = severe disability) without adjustment for using aids and devices [30]. For axSpA, the health measures were: the BASDAI, in which the average score (0-10) is calculated from six 10 cm visual analogue scales (VAS) of symptom severity (e.g. fatigue, spinal pain) [31]; and the BASFI, in which an average score (0-10) is calculated from ten 10 cm VAS of physical function [32]. For OA, two subscales of the WOMAC were included [pain (five items) and physical function (17 items); both scored from 0 = none to 4 = extreme], with total scores for each subscale calculated [33]. Finally, for FM, the Revised Fibromyalgia Impact Questionnaire (FIQR) was included. This consists of three subscales rated on 0-10 NRS [overall impact (two items); symptoms (10 items); and function (nine items)]. Subscale and overall total scores were then calculated [34]. For all four conditions, an additional question asked about perceived health status: 'Considering all the ways that your condition affects you, how have you been over the past month?' (scored from 1 = very good to 5 = very poor).
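The summed HAQ scoring and severity bands described above reduce to a simple rule. The sketch below follows the text's description (20 items, each 0-3, summed without adjustment for aids and devices, giving 0-60); note that this differs from the conventional HAQ Disability Index, which averages the highest item score within each of eight categories. The function name `haq_summed` is hypothetical.

```python
def haq_summed(items):
    """Sum 20 HAQ items (each 0-3) and assign the severity band used in the text."""
    if len(items) != 20 or any(x not in (0, 1, 2, 3) for x in items):
        raise ValueError("expected 20 items scored 0-3")
    total = sum(items)  # range 0-60, no aids/devices adjustment
    if total <= 20:
        band = "mild"
    elif total <= 40:
        band = "moderate"
    else:
        band = "severe"
    return total, band
```

For instance, a respondent scoring 1 on nineteen items and 2 on one item totals 21 and falls in the moderate band.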
At T2, participants completed the WALS, the perceived health status item and an item on perceived change in health status: 'Overall, how much is your arthritis/condition troubling you now compared with when you last completed this questionnaire?' (1 = much less; 3 = about the same; 5 = much more).

Sample size
Given that Rasch analysis was used to assess construct (structural) validity, a minimum of 150 cases is needed within each condition group [35]. We aimed to recruit 250 per condition to ensure a broad spread of responses. At least 79 sets of repeated responses were needed to demonstrate that a test-retest correlation of 0.7 differed from a background correlation (constant) of 0.45, with 90% power at the 1% significance level.
A test-retest reliability correlation of 0.7 is considered a minimum acceptable level [36].

Statistical analyses
Demographic, work and health measures were summarized descriptively, as appropriate. RUMM2030Plus software was used for Rasch analysis [37]. Given that all work and health measures either consisted of ordinal data or were not normally distributed, non-parametric statistical tests were conducted using the Statistical Package for the Social Sciences (SPSS) v.26 [38]. The following psychometric properties were assessed.

Compliance
Compliance (i.e. the amount of missing data) was assessed by identifying the number (percentage) of missing items and of WALS scores that could not be calculated.

Validity
Construct (structural) validity is the degree to which the scores of a patient-reported outcome measure adequately reflect the dimensionality of the construct being measured (e.g. do all scale items measure the same construct, and are items independent of one another?). The first analytical strategy was testing the fit of the WALS for each condition to the Rasch measurement model [39]. The analysis also examined cross-diagnostic validity by testing for invariance, i.e. whether items are interpreted similarly across groups (e.g. across conditions, age groups and sex), so that the scale can be used to assess group differences. Full details of the analysis are given in Supplementary Data S3 and Table S1, available at Rheumatology Advances in Practice online, and described elsewhere [40].
Concurrent validity (i.e. the degree to which scores are consistent with hypotheses, e.g. that scores on other relevant measures are correlated with the WALS) was assessed using Spearman's correlations with work and health measures. We hypothesized that there would be moderate to strong correlations between scores for the WALS and the three work measures, and moderate correlations with perceived health status and condition-specific symptom and physical function scales. Correlations of 0.4-0.59 are considered moderate and ≥0.6 strong [41].
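As a sketch of how such a correlation and its strength label can be computed, the example below calculates Spearman's rs as the Pearson correlation of tie-averaged ranks and applies the bands quoted above. The function names are hypothetical, and the label "below moderate" for values under 0.4 is an assumption, since the text defines only the moderate and strong bands. In practice a statistics package (SPSS in this study) would be used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs as the Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(rs):
    """Bands quoted in the text: 0.4-0.59 moderate, >=0.6 strong."""
    r = abs(rs)
    if r >= 0.6:
        return "strong"
    if r >= 0.4:
        return "moderate"
    return "below moderate"
```

Averaging tied ranks matters here because WALS and WLQ-25 scores are ordinal with many tied values.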
Discriminant validity was assessed by testing the hypothesis that WALS scores would differ significantly between those reporting very poor/poor, fair or good/very good perceived health status. Kruskal-Wallis tests were used, with P ≤ 0.05 considered significant.
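The Kruskal-Wallis comparison of three perceived-health groups can be sketched as follows. This minimal version applies the usual tie correction and uses the chi-square approximation with 2 degrees of freedom, whose survival function is exactly exp(-H/2). The function name `kruskal_wallis_3` is hypothetical, and for very small groups the chi-square approximation is rough.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(poor, fair, good):
    """Tie-corrected H statistic and chi-square(2) p-value for three groups."""
    groups = [poor, fair, good]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    start, rank_term = 0, 0.0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])  # rank sum for this group
        rank_term += r_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical")
    h /= correction
    # Three groups give df = 2, and the chi-square(2) survival function is exp(-x/2).
    return h, math.exp(-h / 2)
```

With fully separated groups such as [1, 2, 3], [4, 5, 6] and [7, 8, 9], H is 7.2 and P is about 0.027, below the 0.05 threshold.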

Reliability
Internal consistency (i.e. the degree of interrelatedness between items within a scale) was assessed using Cronbach's α. Values ≥0.8 were deemed good to excellent; ≥0.9 is consistent with individual use, and >0.7 with group-level use [41].
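Cronbach's α compares the sum of the item variances with the variance of the total score: α = k/(k − 1) × (1 − Σσ²ᵢ/σ²_total) for k items. The sketch below computes it for complete cases and applies the thresholds quoted above. The function names are hypothetical, and listwise handling of missing data is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases only)."""
    k = len(rows[0])
    item_var = sum(variance(col) for col in zip(*rows))  # sum of item variances
    total_var = variance([sum(r) for r in rows])         # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)


def alpha_use(alpha):
    """Interpretation thresholds quoted in the text."""
    if alpha >= 0.9:
        return "individual use"
    if alpha > 0.7:
        return "group use"
    return "below group-level threshold"
```

On this scale, the reported range of 0.80-0.87 falls in the "group use" band.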
Test-retest reliability is the extent to which scores are the same on repeated measurement over time in participants who report that their health has not changed. This was assessed in those reporting perceived health as 'the same' at T2, using Spearman's correlations and the intraclass correlation coefficient (ICC) (2,1): two-way random consistency, average measures model. An ICC ≥0.75 is considered excellent and 0.5-0.74 moderate [42]. Reliability of individual WALS items was calculated using linear weighted κ, with levels of agreement considered as: 0.41-0.60 = moderate; ≥0.61 = good [41].
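The two reliability coefficients can be sketched from their definitions. The ICC function below uses the Shrout and Fleiss ICC(2,1) formula (two-way random effects, absolute agreement, single measure). The text also names a 'consistency, average measures' model, which in SPSS yields a different coefficient, so which variant reproduces the reported values is an assumption. The weighted κ uses linear agreement weights over the four WALS response categories (0-3). Both function names are hypothetical.

```python
def icc_2_1(data):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: n rows (participants) x k columns (occasions, e.g. T1 and T2).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(r) for r in data) / (n * k)
    row_means = [sum(r) / k for r in data]
    col_means = [sum(r[j] for r in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in data for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def linear_weighted_kappa(x, y, n_cat=4):
    """Linear weighted kappa for paired item ratings coded 0..n_cat-1 (WALS: 0-3)."""
    n = len(x)
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for a, b in zip(x, y):
        obs[a][b] += 1 / n  # observed joint proportions
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]

    def weight(i, j):
        # Linear agreement weights: 1 on the diagonal, 0 for maximal disagreement.
        return 1 - abs(i - j) / (n_cat - 1)

    cells = [(i, j) for i in range(n_cat) for j in range(n_cat)]
    p_o = sum(weight(i, j) * obs[i][j] for i, j in cells)
    p_e = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if p_e == 1:
        raise ValueError("kappa undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

A constant shift between occasions (every T2 score one point higher) still gives perfect consistency but lowers the absolute-agreement ICC(2,1), which is why the choice of model matters.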

Responsiveness
Sensitivity to change was assessed by calculating the standard error of measurement (SEM) and the minimal detectable change at the 95% confidence level (MDC95). The SEM represents the S.D. of repeated measurements in one individual. The MDC95 is a statistical estimate of the smallest change in score that reflects a true change in ability rather than measurement error [43,44].
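The text does not state which SEM formula was used. A common choice, assumed here, is SEM = S.D. × √(1 − ICC), with MDC95 = 1.96 × √2 × SEM. The √2 reflects that a change score involves two measurements, each carrying error. The function names and input values below are hypothetical, not study data.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC) (assumed formula)."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem
```

For example, a hypothetical S.D. of 5 points with an ICC of 0.91 gives SEM = 1.5 and MDC95 ≈ 4.16 points. On that basis, an individual's WALS change smaller than about 4 points could not be distinguished from measurement error.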
Floor and ceiling effects were considered present if >15% of participants achieved either the lowest or highest scores in the WALS [45]. If present, these can have a negative effect on the quality of the measure, because responsiveness (i.e. the ability to detect change over time) will be limited.
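The >15% rule can be checked directly on the summed scores. This minimal sketch assumes the WALS range of 0-36; the function name `floor_ceiling` is hypothetical.

```python
def floor_ceiling(scores, lowest=0, highest=36, limit=0.15):
    """Flag floor/ceiling effects: >15% of respondents at the lowest/highest score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == lowest) / n
    ceiling = sum(1 for s in scores if s == highest) / n
    return {
        "floor_prop": floor,
        "ceiling_prop": ceiling,
        "floor_effect": floor > limit,      # strictly greater than 15%
        "ceiling_effect": ceiling > limit,
    }
```

Because the criterion is strictly greater than 15%, a sample with exactly 15% of respondents scoring 0 would not be flagged.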

Results

Phase 1
Linguistic validation, cross-cultural adaptation and content validity results are given in Supplementary Data S1, Tables S2 and S3, available at Rheumatology Advances in Practice online. In cognitive debriefing interviews (n = 48; participant characteristics are in Table 1), all items were considered very relevant and, following expert panel review, only minor changes in wording were needed.

Compliance
Overall, 0.01% of data were missing. WALS scores could not be calculated for three participants in each of the RA, axSpA and OA groups (each missing 5-12 items); these participants were excluded from analyses, reducing the sample size to 822. All FM scores could be calculated. The frequencies of 'not applicable' (re-scored as 0) and missing responses are shown in Supplementary Table S4, available at Rheumatology Advances in Practice online.

Construct (structural) validity
Table 3 displays the detailed analysis of fit to the Rasch model. The scale is unidimensional. The items most easily affirmed (i.e. the transition from no to some difficulty) were 'Lifting, carrying or moving objects' (RA), 'Crouching, bending or kneeling' (axSpA and OA) and 'Concentrating' (FM). The item most difficult to affirm (i.e. the transition from a lot of difficulty to unable to do) was 'Working with your hands' (RA, axSpA, OA and FM). Invariance was confirmed for age, sex, condition, disease duration, educational and work status. Full details of the results are given in Supplementary Data S3, available at Rheumatology Advances in Practice online. A transformation table was created to convert WALS raw scores to interval-level scores, if required (Supplementary Table S5, available at Rheumatology Advances in Practice online). A reference metric was also created to allow test equating of raw WALS scores with raw RA- and AS-WIS scores (Supplementary Table S6, available at Rheumatology Advances in Practice online). Both WIS versions have clinically derived cut-points. Direct comparison with these cut-points suggests that WALS scores of 7 and 14 would indicate thresholds for moderate and high work instability, respectively, in these four RMDs.
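Under a polytomous Rasch model, a raw-to-interval transformation table of the kind described above is built by finding, for each raw score, the person location (in logits) at which the expected total score equals that raw score. The sketch below assumes a partial credit parameterization and uses made-up item thresholds. It does not reproduce the output of the Rasch software used in the study or the published table in Supplementary Table S5. The names `item_moments`, `raw_to_logit` and `ITEM_THRESHOLDS` are hypothetical.

```python
import math


def item_moments(theta, thresholds):
    """Expected score and variance of one item under a partial credit model.

    P(X = x) is proportional to exp(sum_{j<=x} (theta - tau_j)), x = 0..m.
    """
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + theta - tau)
    top = max(log_num)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
    return mean, var


def raw_to_logit(raw, item_thresholds, tol=1e-8, max_iter=200):
    """Person location (logits) whose expected total score equals `raw`."""
    max_score = sum(len(t) for t in item_thresholds)
    if not 0 < raw < max_score:
        raise ValueError("extreme scores have no finite maximum-likelihood estimate")
    theta = 0.0
    for _ in range(max_iter):
        expected = info = 0.0
        for thresholds in item_thresholds:
            m, v = item_moments(theta, thresholds)
            expected += m
            info += v
        # Newton-Raphson step; d(expected)/d(theta) equals the summed item variances.
        step = max(-1.0, min(1.0, (raw - expected) / info))
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("estimate did not converge")


# Hypothetical thresholds: 12 items x 3 thresholds (0-3 scoring), NOT study estimates.
ITEM_THRESHOLDS = [[-1.5, 0.0, 1.5]] * 12
```

Evaluating `raw_to_logit` for raw scores 1-35 yields one row of a conversion table per raw score. The extreme scores (0 and 36) have no finite maximum-likelihood estimate and are usually assigned extrapolated values.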

Discriminant validity
As hypothesized, there were significant differences between the three levels of perceived disease severity for the WALS across all four conditions, with higher perceived disease severity subgroups scoring worse (Supplementary Table S7, available at Rheumatology Advances in Practice online).

Internal consistency
Cronbach's α values across the four conditions were good to excellent, ranging from 0.80 (FM) to 0.87 (RA). All were consistent with group-level use (Table 3).

Test-retest reliability
At T2, 356 of 622 (57%) participants reported that their condition was 'the same' as at T1 and were included in analyses. For all four conditions, correlations between T1 and T2 scores were strong to very strong (rs ≥ 0.80). The ICCs (2,1) were excellent, at ≥0.90 (Table 5). Item reliability was moderate to good (Supplementary Table S8, available at Rheumatology Advances in Practice online).

Sensitivity to change
The MDC95 scores ranged from 3.17 (RA) to 5.08 (OA) in those stating that their health was 'the same' at T2 (Table 5).

Discussion
We ensured linguistic and cross-cultural validity of the WALS by using a standard translation process [21], with the approval of the WALS developer. Example activities were updated in three items: in item 1, to reflect active travel options; and in items 2 and 6, to increase relevance to manual jobs.

The workplace activity limitations scale
Participants considered the WALS comprehensive, comprehensible and easy to complete, indicating good content validity from the perspective of patients with these four RMDs (comparable to findings in RA and OA in Canada [9]). To our knowledge, this is the first study to examine the construct (structural) validity of the WALS in RA, axSpA, OA and FM; it demonstrated fit to the Rasch model and makes available a Rasch transformation table from WALS raw to interval scores. Given that the WALS is unidimensional, either summed or Rasch-transformed interval scores can be used. As hypothesized, the WALS demonstrated good concurrent validity with other work measures, except the WLQ-25 physical demands subscale in FM. Some participants can have difficulty completing this subscale, because its instructions are reversed compared with the other three subscales [6]. Potentially, more participants with FM experience such difficulty, because >50% of people with FM report cognitive deficits, a higher proportion than among people with RA, for example [46,47]. As hypothesized, correlations with physical function, symptom and health scales were moderate in OA and FM, but generally strong (i.e. higher than expected) in RA and axSpA. These findings are comparable to those in RA and OA in Canada [17]. We also demonstrated that the WALS has good discriminant validity in the four RMDs, which had not been tested previously.
Internal consistency was good and comparable to findings in RA and OA in Canada [6], meaning that the WALS can be used for group measurement in the four RMDs. The finding that WALS scores of 7 and 14 equate to the RA- and AS-WIS cut-points for moderate and high work instability indicates that the WALS could help to identify not only patients' work limitations but also who could benefit from work rehabilitation. This study extends the evidence for test-retest reliability and provides specific MDC95 values for each of the four RMDs; these properties had previously been tested in only a small sample of 'workers with arthritis' [5,6].
It is worth noting that across all four RMDs, the phase 2 results showed that those reporting they had 'fair' health status had average WALS scores exceeding the cut-off score for moderate work instability (i.e. 7) and also (apart from axSpA) those with poor/very poor health status had scores exceeding the cut-off for high work instability (i.e. 14). The FM group also had higher average WALS scores than the other three RMDs, despite being younger, with many experiencing high work instability. Health professionals working with employed people with RMDs reporting fair or poor health status, and especially those with FM, are recommended to screen for work problems and provide work advice and support, as relevant.
The WALS tests intrinsic work activity impairment (i.e. capacity in International Classification of Functioning, Disability and Health terms), because the instructions specify reporting difficulty without help from another person or use of gadgets or equipment. It might not therefore reflect the person's real ability to work (performance in International Classification of Functioning, Disability and Health terms; i.e. with ergonomic modifications, help and/or job accommodations). Under the UK Equality Act 2010, employers have a duty to provide these (termed 'reasonable adjustments') to employees with disabilities. Clinically, and in work rehabilitation studies, a WALS omitting the instruction to answer 'without help or gadgets/equipment' could better identify whether improvement occurs following work rehabilitation and the introduction of reasonable adjustments. Modified instructions could focus on how people usually do these activities. Additionally, the instructions specify no time frame. Some work measures (e.g. the WLQ-25) ask about the last 2 weeks. A short time frame has two disadvantages. First, the measure can be completed only by people who have worked ≥1 day during that time; those on sick, annual or other extended leave for >2 weeks cannot complete it. Second, people with RMDs can experience episodic flares or worse health, so completion might coincide with a period of unusually poor or good health. Avoiding a time frame overcomes these problems, because participants can either reflect on their difficulties when last at work or estimate them. This could, however, be problematic for those on long-term sick leave, who might estimate their difficulties incorrectly. Particularly in intervention studies, it might be better to specify a time frame (e.g. 3 months). Future research could psychometrically test a WALS with modified instructions.
A strength of this study was that we had relatively large samples of people with RA, axSpA, OA and FM recruited from a wide variety of NHS outpatient clinics, meaning that the results are representative of people accessing secondary or community care. Limitations were that fewer people with FM had stable self-reported health between T1 and T2 than those with the other conditions, resulting in a smaller sample for test-retest reliability than required. Responsiveness (i.e. longitudinal validity) still needs testing, and minimal clinically important differences need to be established in the UK. Further testing in other RMDs is required.

Conclusions
Overall, psychometric testing of the British English WALS demonstrated good validity and reliability in employed people with RA, axSpA, OA and FM in the UK. The WALS meets most recommendations of the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklists for methodological quality and reporting [19,20]. Accordingly, the British English WALS can be used in the UK for these four RMDs.

Supplementary material
Supplementary material is available at Rheumatology Advances in Practice online.

Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request, following completion of associated studies. The British English WALS is available in the Supplementary Materials.

Contribution statement
A.H., A.T. and Y.P. contributed to the study conception and design. Phase 1: A.H. and Y.P. conducted data collection and analysis. A.H., A.T., M.G., Y.P., S.V. and R.O'B. were members of the Expert Panel. Phase 2: material preparation and data collection were performed by A.H., A.C., and J.P. Analyses were performed by A.T. (Rasch analysis) and A.H. (classical testing). The first draft of the manuscript was prepared by A.H., with A.T. drafting the construct (structural) validity/Rasch analysis sections. All authors contributed to and revised previous versions of the manuscript. All authors read and approved the final manuscript.

Funding
This work was part-supported by the European Alliance of Associations for Rheumatology (EULAR) grant number HPR035. NHS service support costs were secured from the NIHR Comprehensive Local Research Network.
Disclosure statement: The authors have no relevant financial or non-financial interests to disclose.