Estimating the population health burden of musculoskeletal conditions using primary care electronic health records

Abstract Objectives Better indicators from affordable, sustainable data sources are needed to monitor population burden of musculoskeletal conditions. We proposed five indicators of musculoskeletal health and assessed whether routinely available primary care electronic health records (EHR) can estimate population levels in musculoskeletal consulters. Methods We collected validated patient-reported measures of pain experience, function and health status through a local survey of adults (≥35 years) presenting to English general practices over 12 months for low back pain, shoulder pain, osteoarthritis and other regional musculoskeletal disorders. Using EHR data we derived and validated models for estimating population levels of five self-reported indicators: prevalence of high impact chronic pain, overall musculoskeletal health (based on Musculoskeletal Health Questionnaire), quality of life (based on EuroQoL health utility measure), and prevalence of moderate-to-severe low back pain and moderate-to-severe shoulder pain. We applied models to a national EHR database (Clinical Practice Research Datalink) to obtain national estimates of each indicator for three successive years. Results The optimal models included recorded demographics, deprivation, consultation frequency, analgesic and antidepressant prescriptions, and multimorbidity. Applying models to national EHR, we estimated that 31.9% of adults (≥35 years) presenting with non-inflammatory musculoskeletal disorders in England in 2016/17 experienced high impact chronic pain. Estimated population health levels were worse in women, older adults and those in the most deprived neighbourhoods, and changed little over 3 years. Conclusion National and subnational estimates for a range of subjective indicators of non-inflammatory musculoskeletal health conditions can be obtained using information from routine electronic health records.


Introduction
Musculoskeletal conditions such as low back pain (LBP) and osteoarthritis (OA) are extremely common, have proven over decades to be stubbornly resistant to treatment, and represent one of the greatest challenges to healthcare services and population health through their impact on everyday life [1]. Despite such overwhelming evidence of their significance, there is a lack of data that provide estimates of the extent of the impact of musculoskeletal conditions at a population level that can be used to guide interventions and preventative strategies.
Primary care electronic health records (EHR) offer the potential to be an ongoing source of data that can be used for surveillance and drive improvements in healthcare and health [2]. This ongoing collection of information can provide estimates of the number of people who have conditions and the processes of care such as the number that receive joint replacement or are prescribed pain medications and biologic therapies [2][3][4][5], although notably the availability of these data varies depending on geography and source (e.g. prescribed analgesics are well-recorded in primary care settings, joint replacement and biologic therapy are better recorded in secondary care data). However, the reason that people seek health care is not directly linked to the presence of musculoskeletal conditions but more so to the severity of symptoms (e.g. severity of pain) and their impact, in terms of disability and reduced quality of life [5], which drives the need for intervention and preventative strategies.
EHR does not routinely capture information on the severity or impact of musculoskeletal conditions and these data are best collected from patient reports [6,7]. National surveys provide data on impact but have limited space for specific information on musculoskeletal conditions that can help with the prioritization of resources and services [2,8,9]. Combining EHR with patient reported information presents an opportunity to more accurately identify the impact of musculoskeletal conditions and the distribution and inequalities in the population [10]. However, patient reported information on musculoskeletal conditions may not always be available, and if EHR are to be used for ongoing surveillance, their ability to estimate the impact of musculoskeletal conditions must be examined [11].
In this study, the focus is on adults seeking healthcare for common musculoskeletal conditions. We propose five population indicators that can be used for surveillance of musculoskeletal health and to guide intervention strategies. The aim of this study was to examine whether EHR data can estimate the extent of the impact of musculoskeletal conditions in musculoskeletal consulters at a population level.

Design
We conducted our investigation in three stages:
i. A local census survey of all adults aged ≥35 years presenting to selected English general practices in one calendar year for non-inflammatory musculoskeletal conditions.
ii. Using linked primary care EHR data from consenting respondents, we derived and internally validated one model for each of five self-reported indicators, to obtain population-level estimates of: the prevalence of high impact chronic pain; musculoskeletal health [mean Musculoskeletal Health Questionnaire (MSK-HQ) score]; quality of life [mean EuroQoL health utility score (EQ-5D-5L)]; the prevalence of moderate-to-severe chronic LBP among LBP consulters; and the prevalence of moderate-to-severe chronic shoulder pain among shoulder pain consulters.
iii. We applied our models, using harmonized code lists, to an independent national primary care EHR database (Clinical Practice Research Datalink) to obtain national and regional estimates of each indicator for three successive calendar years (2014/15, 2015/16 and 2016/17).

Population and setting
The target population was adults aged ≥35 years presenting to primary care with LBP, neck pain, osteoarthritis, non-specific hip pain, knee pain, shoulder pain or hand/wrist pain.

Musculoskeletal health indicators
Based on a review of national outcome frameworks [11,12], existing indicators [13], proposed indicator sets for musculoskeletal health [14] and input from public contributors, we selected the following five musculoskeletal health indicators for this study:
• Proportion of MSK consulters with high impact chronic pain (HICP), defined as pain on most or all days in the previous 6 months that limited life or work activities on most or all days. This approach is used in the US National Pain Survey [15].
• Mean Musculoskeletal Health Questionnaire (MSK-HQ) score: a 14-item questionnaire that captures key outcomes that patients with musculoskeletal conditions have prioritized as important for use across clinical pathways [16]. Scores range from 0 to 56, higher scores indicating better musculoskeletal health over the past 2 weeks [16].
• Mean EQ-5D-5L health utility score: the EQ-5D-5L self-classifier provides a self-reported description of health-related quality of life, rated on the day of response, according to a five-dimensional classification divided into five levels of perceived problem (no, slight, moderate, severe, unable). It has excellent psychometric properties [17]. We calculated the EQ-5D-5L utility score using the UK crosswalk value set [17], with scores ranging from <0.0 (representing health states worse than death) to 1.0 (full health).
• Proportion of LBP consulters with moderate-to-severe chronic LBP, defined as LBP present on most or all days in the previous 6 months and average intensity ≥5 on a 0-10 numerical rating scale (NRS) [18].
• Proportion of shoulder pain consulters with moderate-to-severe chronic shoulder pain.
We excluded patients with recorded inflammatory disease, spondyloarthropathy or crystal arthropathy. The survey instrument contained recommended items and instruments measuring the nature, severity and impact of MSK conditions, including the five indicators described above [17].
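The two prevalence-type definitions above can be expressed as simple classification rules. The following is a minimal sketch only: the response labels, field names and helper functions are hypothetical (they are not taken from the survey instrument), the intensity threshold is read as "at least 5", and applying the same rule to shoulder pain is an assumption.

```python
from dataclasses import dataclass

# Hypothetical response labels for the "how often" items (assumption).
FREQUENT = {"most days", "every day"}


@dataclass
class SurveyResponse:
    pain_frequency_6m: str        # how often pain was present in the previous 6 months
    limitation_frequency_6m: str  # how often pain limited life or work activities
    average_intensity: int        # average pain intensity on a 0-10 NRS


def has_high_impact_chronic_pain(r: SurveyResponse) -> bool:
    """Pain on most or all days AND activity limitation on most or all days."""
    return r.pain_frequency_6m in FREQUENT and r.limitation_frequency_6m in FREQUENT


def has_moderate_to_severe_chronic_pain(r: SurveyResponse, threshold: int = 5) -> bool:
    """Pain on most or all days AND average intensity at or above the threshold.

    The threshold of 5 is read as 'at least 5' (assumption).
    """
    return r.pain_frequency_6m in FREQUENT and r.average_intensity >= threshold


def prevalence(flags) -> float:
    """Proportion of respondents for whom the indicator is true."""
    flags = list(flags)
    return sum(flags) / len(flags)
```

A proportion-type indicator is then the mean of the per-respondent flags, for example `prevalence(has_high_impact_chronic_pain(r) for r in responses)`.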
At 2 weeks, non-responders were re-sent the survey and offered the option of online completion, and at 4 weeks a minimum data collection survey was mailed to non-responders, again with the option of online completion. Of 8461 mailed, 4528 responded (response rate 54%). Of these, 3828 (85%) consented to link their survey responses to routinely collected primary care EHR data, and 3710 (97%) had completed self-reported musculoskeletal health indicators. The general practices had all previously contributed to the CiPCA (North Staffordshire) primary care EHR database, which included training and assessment in morbidity recording [19], and had previously been shown to give similar annual consultation prevalence rates for musculoskeletal conditions to those from national and international EHR databases [20,21].
Covariates considered for inclusion in the models to estimate each of the five indicator measures were selected based on previous literature, expert opinion (including that of patients), potential association with MSK health status and whether they are routinely recorded within primary care EHR. These included demographic, socioeconomic, lifestyle and comorbidity variables, and musculoskeletal/pain-specific primary care contacts, diagnoses/problem codes, referrals, investigations and treatments (Table 1). A data manager independent from, and blinded to, survey data extracted these candidate covariates from the EHR of consenting respondents using pre-defined code lists (available from the authors) for the period up to 10 years prior to the survey. Details of the definitions of all candidate covariates are presented in Supplementary Table 1, available at Rheumatology online. Briefly, for lifestyle predictors (smoking status, BMI), the most recent record before the index date was used; other candidate covariates were defined as having any record within the 10 years prior to the survey (e.g. the Charlson Comorbidity Index was defined solely by Read codes, without combining other function or evaluation procedures).
These data were then linked to survey data to create the PRELIM Survey-EHR dataset.

Clinical Practice Research Datalink national EHR data
Clinical Practice Research Datalink (CPRD) GOLD contains EHR data from over 10 million patients registered with over 650 UK general practices [22]. For this study we used data from practices (all in England) which consented to linkage to the Index of Multiple Deprivation (IMD) [23]. Based on the patient's residential postcode, IMD is a composite measure of neighbourhood deprivation incorporating domains on income, employment, education, health, housing, crime, and environment. Using code lists for eligibility criteria that were harmonized with those used in PRELIM Survey-EHR, we included adults aged ≥35 years (n = 49 788) consulting for a non-inflammatory musculoskeletal pain condition in July 2016-June 2017 (i.e. as per PRELIM). Using another set of harmonized code lists we extracted information on their covariates recorded in the previous 10 years. We then repeated this process for cases consulting between July 2015 and June 2016 and between July 2014 and June 2015 to evaluate the stability over time of our modelled estimates.
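As an illustration of the cohort windows described above, the sketch below labels a consultation date with its July-to-June study year and computes the start of a look-back window. The function names are hypothetical, and the handling of 29 February is an assumption made here for the sketch, not a rule stated in the study.

```python
from datetime import date


def study_year(consultation: date) -> str:
    """Label a consultation with its July-to-June study year, e.g. '2016/17'."""
    start = consultation.year if consultation.month >= 7 else consultation.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def look_back_start(index_date: date, years: int) -> date:
    """First day of a look-back window ending at the index date."""
    try:
        return index_date.replace(year=index_date.year - years)
    except ValueError:
        # 29 February mapped to 28 February when the target year is not a leap year (assumption).
        return index_date.replace(year=index_date.year - years, day=28)


def in_look_back(record_date: date, index_date: date, years: int) -> bool:
    """True if a coded record falls inside the look-back window before the index date."""
    return look_back_start(index_date, years) <= record_date < index_date
```

Changing the `years` argument (for example from 10 to 2) is all that is needed to rerun covariate extraction under a different look-back period.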

Model performance
For subgroups of the population based on age, gender, Clinical Commissioning Group (CCG) and deprivation, we compared the observed prevalence rates and mean scores (as appropriate) of the indicators from the PRELIM survey with their estimated values derived from the models utilizing the linked EHR. For logistic regression models, performance of the final model was also examined using the C-statistic. For linear regression models, performance was assessed using R² (proportion of the variance in continuous outcomes explained by the included covariates).
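For readers less familiar with these two measures, the sketch below computes them directly from their definitions, using only the Python standard library. It is illustrative and is not the software used in the study; the function names are hypothetical.

```python
def c_statistic(y_true, y_prob):
    """Probability that a randomly chosen case has a higher predicted risk
    than a randomly chosen non-case (ties count as half)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for pc in cases:
        for pn in non_cases:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def r_squared(y_true, y_pred):
    """Proportion of variance in a continuous outcome explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

The pairwise loop is quadratic in the number of respondents, which is acceptable for a survey-sized dataset but would normally be replaced by a rank-based calculation for larger cohorts.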
Final models were applied to 100 bootstrapped samples to examine performance (as described above), and then to the original dataset to test model performance and optimism (the difference in the performance in the bootstrapped and original data). Overall optimism was estimated for all models. The overall optimism-corrected calibration of these models was assessed graphically by plotting agreement between predicted and observed values for each decile of predicted risk.
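A common way to carry out this kind of bootstrap optimism correction is sketched below for a deliberately simple case: a one-covariate linear model assessed by R². This is an assumption-laden illustration of the general technique (refit in each bootstrap sample, compare performance in the bootstrap sample with performance in the original data, and subtract the average difference from the apparent performance); it may differ in detail from the procedure the study used, and all function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r2_of_line(model, xs, ys):
    """R-squared of a fitted line evaluated on the given data."""
    intercept, slope = model
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def optimism_corrected_r2(xs, ys, n_boot=100, seed=0):
    """Return (apparent, optimism, corrected) R-squared."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = r2_of_line(fit_line(xs, ys), xs, ys)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)
        # Optimism: performance in the bootstrap sample minus performance in the original data.
        total += r2_of_line(model, bx, by) - r2_of_line(model, xs, ys)
    optimism = total / n_boot
    return apparent, optimism, apparent - optimism
```

The same loop applies to a logistic model by swapping the fitting function and replacing R² with the C-statistic.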

Application of models to national EHR data
The final parsimonious, optimism-corrected models derived in the PRELIM Survey-EHR data were then applied to the relevant MSK consulter cohorts in the CPRD dataset to estimate the prevalence/mean of each of the five indicators nationally in three consecutive years [...] for EQ-5D-5L in the specific population. We present these estimates overall, and stratified by sex, age (10-year age bands), deprivation (quintiles) and geographical region.
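The step of applying a fitted logistic model to a national cohort can be sketched as follows. The coefficient values and covariate names are hypothetical and chosen purely for illustration (the fitted coefficients are reported in the supplementary material), and taking the estimated prevalence as the mean predicted probability within each stratum is an assumption about how such estimates are typically aggregated.

```python
import math
from collections import defaultdict

# Hypothetical coefficients for illustration only (not the published model).
COEFFICIENTS = {
    "intercept": -1.5,
    "female": 0.2,
    "strong_analgesic": 0.9,
    "antidepressant": 0.5,
}


def predicted_probability(person: dict) -> float:
    """Inverse-logit of the linear predictor for one person's EHR covariates."""
    lp = COEFFICIENTS["intercept"]
    for name, beta in COEFFICIENTS.items():
        if name != "intercept":
            lp += beta * person.get(name, 0)
    return 1.0 / (1.0 + math.exp(-lp))


def estimated_prevalence(cohort, stratify_by=None) -> dict:
    """Mean predicted probability overall, or within each level of a stratifying field."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for person in cohort:
        key = person[stratify_by] if stratify_by else "overall"
        sums[key] += predicted_probability(person)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


cohort = [
    {"female": 1, "strong_analgesic": 1, "antidepressant": 0, "region": "North West"},
    {"female": 0, "strong_analgesic": 0, "antidepressant": 0, "region": "London"},
]
print(estimated_prevalence(cohort))
print(estimated_prevalence(cohort, stratify_by="region"))
```

For the continuous indicators the same aggregation applies, with the linear predictor itself averaged instead of the inverse-logit.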
To explore the sensitivity of our findings to length of look-back period, we repeated all the preceding steps using a 2-year look-back period in the EHR data.

Ethical approval
Ethical approval was obtained for the PRELIM survey and linkage to primary care EHR data from the North West-Greater Manchester East Research Ethics Committee (REC Ref: 15/NW/0735). The use of CPRD was approved by the Independent Scientific Advisory Committee (reference number: 18_014).

Patient and public involvement
Public contributors were involved throughout this study to ensure that the perspectives of patients remained at the centre of the research. Ten public contributors from the Research User Group, Keele University, were involved in the study, as part of advisory groups or study management meetings. They provided patient perspectives on the development of the proposal (particularly on linkage of data from EHR and questionnaires), study materials (participation information sheets, consent forms) and the PRELIM questionnaire. A public co-applicant (S.D.) is a member of the study team and two other public contributors attended the study steering committee.

Model development and apparent performance-PRELIM Survey-EHR
Based on consistently good relative model fit and performance, the 5-year look-back period for identifying covariates recorded in the EHR was selected as optimal for all indicators, although differences between look-back periods were generally small (Supplementary Fig. 1, available at Rheumatology online). Distributions of the covariates over the 5-year period in the PRELIM Survey-EHR cohort are given in Table 1.
After backward elimination, between 7 and 16 covariates were retained in each model (minimum of 14 events per parameter in logistic regression models and 143 subjects per parameter in linear regression models). The coefficients of the models are given in Supplementary Table 2, available at Rheumatology online. Prescription of strong or very strong analgesia was strongly associated with all five indicators, while antidepressant prescriptions, time since MSK consultation and area-level deprivation were strongly associated with four of the five indicators. Any MSK referral and joint injection were associated with moderate-to-severe chronic low back pain and EQ-5D-5L, respectively. MSK X-ray and smoking were associated with moderate-to-severe chronic shoulder pain. The non-linear associations of continuous covariates with indicators are shown in Supplementary Fig. 2, available at Rheumatology online.
Absolute differences between observed and estimated prevalence rates and means when stratified by age, sex, CCG and deprivation are presented in Fig. 1. Estimated prevalence varied from that observed by a maximum of 5% for high impact chronic pain, moderate-to-severe chronic shoulder pain and moderate-to-severe chronic LBP; and mean scores by ±0.2 for MSK-HQ score and ±0.01 for EQ-5D-5L score. The optimism-corrected C-statistics for the three prediction models for binary MSK health indicators ranged from 0.74 to 0.77, while for the two continuous indicators the optimism-corrected R² values were 0.30 and 0.33 (Supplementary Table 3, available at Rheumatology online). The optimism-corrected calibration slopes were all 0.99, with good agreement between observed and estimated prevalence rates and means.
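Agreement between observed and estimated values across groups of predicted risk can be tabulated as in the sketch below, which splits respondents into ten equally sized groups ordered by predicted probability. This is an illustrative standard-library implementation with a hypothetical function name, not the code used to produce the study's calibration plots.

```python
def calibration_by_decile(y_true, y_prob, n_groups=10):
    """Return (mean predicted risk, observed proportion) for each risk group.

    Respondents are ordered by predicted risk and split into n_groups
    groups of (nearly) equal size.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer observations than groups
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed))
    return rows
```

Plotting the second element of each pair against the first gives the calibration plot; points close to the line of equality indicate good agreement.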

National estimates of MSK indicators
Compared with MSK consulters recorded in CPRD, participants in the PRELIM Survey-EHR cohort were older, and more likely to live in deprived neighbourhoods (Table 1). They were also more likely to have previous recorded MSK consultations in the hand and hip and for osteoarthritis, analgesic prescriptions and MSK X-ray. However, the level of recorded prescriptions for NSAIDs, antidepressants, muscle relaxants and sedatives as well as MSK referrals were lower. By applying our final PRELIM-derived models in CPRD, we estimated nationally that 31.9% of adults aged 35 years and over who had consulted for a common non-inflammatory musculoskeletal pain condition in 2016-2017 would be experiencing high impact chronic pain (Table 2). The estimated mean MSK-HQ and EQ-5D-5L scores in these MSK consulters were 33.8 and 0.66, respectively. Among recent LBP consulters, an estimated 26.0% had moderate-to-severe chronic LBP. Of recent shoulder pain consulters, an estimated 27.8% had moderate-to-severe chronic shoulder pain. Across all indicators, MSK health among consulters was worse in women than in men, with older age, and in those living in the most deprived neighbourhoods. Over the three consecutive years from 2014/15 to 2016/17, age-, sex- and deprivation-specific estimates for all indicators showed either no or small improvements, with the greatest increases seen in mean EQ-5D-5L scores in all strata. The sensitivity analysis using a shorter 2-year look-back period for covariates gave similar estimates and patterns, although a slightly lower prevalence of high impact chronic pain (28.9% vs 31.9% in 2016/17) and a slightly higher prevalence of moderate-to-severe chronic LBP (29.2% vs 26.0%) (Supplementary Table 4, available at Rheumatology online).

Summary of main findings
Our study provides evidence that it is feasible to use routinely collected EHR data to estimate the extent of the impact of musculoskeletal conditions in populations to guide interventions and healthcare planning. While the remit of our study was limited to five selected indicators of the severity and impact of common, non-inflammatory musculoskeletal disorders, the methodology is likely to be generalizable to other indicators and other musculoskeletal conditions.

Comparison with previous research
To our knowledge this is the first study to use prediction model methodology based on routine EHR data to estimate the prevalence and distribution of patient-reported severity and impacts of musculoskeletal conditions. Efforts to classify the severity of long-term musculoskeletal conditions from information in the EHR [24] are based on the expectation that severity can be meaningfully inferred from available patterns of coded events and processes. Our approach extends this by directly modelling patient-reported measurement of severity to obtain population-level estimates of health. Primary care EHRs currently contain little systematic measurement of pain severity, functional status, wellbeing and quality of life. As a result, there are few direct comparisons for the estimates provided here. UK and US surveys estimate the prevalence of moderately severely disabling chronic pain/high impact chronic pain in the adult general population at between 10 and 16% [25]. Our estimate of 32% with high impact chronic pain among MSK consulters aged over 35 years reflects the older age range in our study but more crucially the selection of a high-risk group (MSK consulters). Where comparable estimates exist in MSK consulter populations, our estimates appear similar. For example, our estimated mean MSK-HQ and EQ-5D-5L scores of 33.8 and 0.66 among MSK consulters were just slightly higher (indicating better MSK health) than those reported in a study of adult musculoskeletal patients referred to community physiotherapy clinics (30.5 and 0.60, respectively) [26]. Our estimated EQ-5D-5L mean score is higher than that from the General Practice Patient Survey (0.577) [13], which is likely to reflect the fact that the latter is restricted to adults reporting a long-term MSK problem. The current indicator for the prevalence of 'severe back pain' used in the PHE Fingertips tool is also applied to those with a long-term back problem and uses a lower threshold for defining 'severe'.
Our estimates show the expected pattern of worse MSK health in females, older ages, and those living in more deprived neighbourhoods.

Strengths and limitations
Our study illustrates an approach to producing timely, affordable indicators of the non-fatal impacts of musculoskeletal conditions that could be derived from continuous EHR data at national and subnational levels. It highlights the potential benefits of such an approach to inform health system responses to the growing challenge of musculoskeletal conditions, which have historically received less attention than other conditions. We deliberately focused on the subpopulation of adults aged ≥35 years who had a record of a non-inflammatory MSK consultation in the previous year. Our estimates do not therefore cover younger ages or those suffering MSK conditions but not presenting to primary healthcare in a given year of interest. Our survey, designed with the involvement of patients and members of the public, provided rich self-reported information on musculoskeletal health from nearly 4000 adults, with a response rate equivalent to that of the Health Survey for England (HSE) [27], and substantially higher than the national GP Patient Survey [28], both sources currently used to produce national musculoskeletal health indicators in England. A high proportion of respondents consented to EHR linkage in practices with a history of high-quality coding. Our public contributors improved the clarity of the study materials for participants and identified key areas for inclusion in the study questionnaire. Our public co-applicant (and co-author) provided the patient perspective on study decision-making. However, our local sampling frame is known to under-represent black, Asian and minority ethnic groups compared with the national average. Future enriched sampling of these groups or a shift to nationally representative survey sample frames with EHR linkage is needed.
We found that models based on 5 years of continuous retrospective records were generally optimal, but excluding patients and practices with <5 years' prior registration does reduce the sample size and has the potential to introduce selection bias. We used 5 years for all models for consistency. Other indicators or conditions may require fewer years of continuous records. In our study, models requiring only 2 years of retrospective records were only marginally inferior and we have provided these in full in the Supplementary data, available at Rheumatology online. The models rely on consistent coding of the included covariates. Lifestyle information, in particular, can often be missing from these records, but completeness has been improving over recent years. Performance of models could be improved by including information from the unstructured free text within the EHR [29], but access to this is increasingly difficult for researchers in the UK due to information governance restrictions. The prediction models have been derived using retrospective data and are limited in their application at the individual level to identify those at high risk. A prospective cohort design could yield a prediction model with better discrimination and calibration for identifying high-risk individuals.

Implications for research and practice
The need to integrate patient-reported outcomes into EHRs has received considerable attention, but typically from the standpoint of clinical care and organisation of health services. We hope that our study stimulates further research on the harnessing of data within the EHR for population musculoskeletal health indicators and greater attention within health policy and practice to preventing and reducing disability associated with common musculoskeletal conditions in the population. Our national estimates confirm the significant impact of musculoskeletal pain. Future external validation of our models, including research that explores how frequently such models may need to be updated in response to changing patterns of healthcare use and recording, and validation in other geographical areas with health record collation and linkage (such as in Scotland and Wales), are encouraged. Future studies using EHR to estimate the impact of MSK conditions on work ability are also warranted.

Conclusion
Information routinely recorded within English primary care EHR can estimate the prevalence and extent of key patient-reported measures of musculoskeletal health among adult consulters with acceptable accuracy. This approach could provide a sustainable, timely source for a richer array of population musculoskeletal health indicators to inform and support health policy and practice.