Determining cardiovascular risk in patients with unattributed chest pain in UK primary care: an electronic health record study

Abstract

Aims
Most adults presenting in primary care with chest pain symptoms will not receive a diagnosis (‘unattributed’ chest pain) but are at increased risk of cardiovascular events. This study aimed to assess, within patients with unattributed chest pain, risk factors for cardiovascular events and whether those at greatest risk of cardiovascular disease can be ascertained by an existing general population risk prediction model or by development of a new model.


Methods and results
The study used UK primary care electronic health records from the Clinical Practice Research Datalink linked to admitted hospitalizations. The study population was patients aged 18 years or over with unattributed chest pain recorded between 2002 and 2018. Cardiovascular risk prediction models were developed with external validation and comparison of performance to QRISK3, a general population risk prediction model. There were 374 917 patients with unattributed chest pain in the development data set. The strongest risk factors for cardiovascular disease included diabetes, atrial fibrillation, and hypertension. Risk was increased in males, patients of Asian ethnicity, those in more deprived areas, obese patients, and smokers. The final developed model had good predictive performance (external validation c-statistic 0.81, calibration slope 1.02). A model using a subset of key risk factors for cardiovascular disease gave nearly identical performance. QRISK3 underestimated cardiovascular risk.

Conclusion
Patients presenting with unattributed chest pain are at increased risk of cardiovascular events. It is feasible to accurately estimate individual risk using routinely recorded information in the primary care record, focusing on a small number of risk factors. Patients at highest risk could be targeted for preventative measures.

Lay summary
It is known that patients with chest pain without a recognized cause are at increased risk of future cardiovascular events (for example, heart disease) and so this study aimed to find out whether those patients at greatest risk could be determined using information in their health records.
• It is possible to accurately estimate a person's risk of future cardiovascular events using the information entered into their health records, and this risk can be estimated using only a small number of factors.
• Patients at highest risk could now be targeted for management to help prevent future cardiovascular events.

Introduction
Chest pain is a common symptom for patients in primary care. In the UK, around 2% of adults will present in primary care with chest pain symptoms annually. [1][2][3][4] Whilst general practitioners (GPs) may pursue investigations and diagnose angina or a non-cardiac cause (such as gastro-oesophageal disease, musculoskeletal disease, or anxiety 5), many patients do not receive a specific diagnosis. 1,2,6 Patients with such unattributed chest pain have an increased risk of future cardiovascular events compared to those without chest pain, 7,8 and to patients diagnosed with non-coronary causes. 4,9 However, the majority of patients with unattributed chest pain do not receive preventative medication (for example, lipid-lowering drugs), even those at potentially higher risk of cardiovascular disease. 9 Identification of those who have the greatest risk of future cardiovascular events would allow targeting of key modifiable cardiovascular risk factors with preventative management. 6 Cardiovascular disease risk algorithms exist for the general population but may have less validity in other populations. The QRISK score (the most recent being QRISK3 10) is the recommended algorithm for assessing cardiovascular risk by the UK National Institute for Health and Care Excellence (NICE) 11 and was developed and validated for use in UK primary care to estimate the risk of cardiovascular events over ten years in patients known to be currently free of cardiovascular disease and not currently prescribed lipid-lowering medication. However, QRISK3 was developed and validated in the general population and may not be valid for use in patients with unattributed chest pain, some of whom are already being prescribed lipid-lowering medication and who have a higher underlying risk of cardiovascular disease than the general population.
The aim of this study was to assess, within patients presenting to primary care with chest pain for which no diagnosis is given, whether those at greatest risk of cardiovascular disease, and hence for whom early preventative medication and targeting of key risk factors may be most beneficial, can be accurately identified. Setting the study within routinely recorded electronic health records (EHR) ensured that identified risk factors are readily available to GPs. The specific objectives were (i) to assess the performance of a general population risk prediction model (QRISK3) in this population, (ii) to determine key risk factors for cardiovascular disease in patients with unattributed chest pain in UK primary care, and (iii) to derive and validate improved prediction models for cardiovascular disease in these patients.

Setting
The study was set within the Clinical Practice Research Datalink (CPRD). All analyses were performed using the CPRD Aurum database with validation performed in the CPRD GOLD database. Aurum is a UK primary care EHR database containing anonymized information routinely recorded in (as of November 2021) over 1400 general practices (over 40 million patients) which use EMIS Web® software. 12,13 The CPRD GOLD database includes information from over 900 general practices which use Vision® software. 13,14 Practices used in this study were the subgroup of English practices which have consented to linkage to inpatient diagnoses and procedures from Hospital Episode Statistics (HES), cause-specific mortality from the Office for National Statistics (ONS), and neighbourhood deprivation scores. Data linkage is undertaken by a trusted third party (NHS Digital) using an eight-stage deterministic methodology involving the National Health Service (NHS) number, gender, date of birth, and postcode. 15 The majority of patients (for example, 96% for CPRD GOLD to HES in 2018) are matched on exact NHS number, gender, date of birth, and postcode, or exact NHS number, gender, and date of birth. The de-identified linked data are then sent to CPRD, and the relevant requested anonymized data are then sent on to researchers. UK primary care traditionally used Read codes (up to 2018) to electronically record morbidities and symptoms presented by patients, whilst more recently SNOMED-CT has been used. UK secondary care uses ICD-10 and OPCS Classification of Interventions and Procedures codes (OPCS-4) to record morbidities and procedures, respectively. The study followed the PROGRESS framework for prognostic research 16,17 and is reported using TRIPOD guidance. 18

Study population
As described previously, 9 the study population was patients aged ≥18 years presenting to primary care between 2002 and 2018 with incident chest pain with no cause recorded. The date of incident chest pain was defined as the index date. Patients were excluded if they had cardiovascular disease recorded prior to or up to 6 months after their index date, a non-coronary cause (such as costochondritis) recorded for their chest pain at index date or in the 6 months after index date, <2 prior years of registration at their general practice, or <6 months of follow-up data after index date. We allowed 6 months after index date (the 'diagnostic window') for investigations and diagnosis related to initial presentation to occur.
Unattributed chest pain was defined using Read codes recorded in primary care for symptoms not clearly specifying the cause of the pain. This included codes with terms such as 'chest pain not otherwise specified' and 'chest tightness'. Read code lists were derived through consensus work in a previous study 4 and are shown in Supplementary material online, Table S1.
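As an illustration only, the eligibility rules described in this section could be expressed along the following lines. This is a minimal sketch and not the study's code (the analyses were run in Stata). The `Patient` record, its field names, the day counts used for '6 months' and '2 years', and the exact boundary dates of the 2002 to 2018 study period are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed day counts; the text only states "6 months" and "2 prior years".
DIAGNOSTIC_WINDOW = timedelta(days=183)
MIN_PRIOR_REGISTRATION = timedelta(days=730)
# Assumed boundaries for "between 2002 and 2018".
STUDY_START, STUDY_END = date(2002, 1, 1), date(2018, 12, 31)


@dataclass
class Patient:
    """Hypothetical per-patient summary; field names are illustrative."""
    age_at_index: int
    index_date: date                  # date of incident unattributed chest pain
    registration_start: date          # start of registration at the practice
    follow_up_end: date               # last date with follow-up data
    first_cvd_date: Optional[date] = None
    first_noncoronary_cause_date: Optional[date] = None


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion criteria from the text."""
    window_end = p.index_date + DIAGNOSTIC_WINDOW
    if p.age_at_index < 18:
        return False
    if not (STUDY_START <= p.index_date <= STUDY_END):
        return False
    # CVD recorded prior to, or up to 6 months after, the index date.
    if p.first_cvd_date is not None and p.first_cvd_date <= window_end:
        return False
    # Non-coronary cause recorded at index date or within the window.
    cause = p.first_noncoronary_cause_date
    if cause is not None and p.index_date <= cause <= window_end:
        return False
    # Fewer than 2 prior years of registration.
    if p.index_date - p.registration_start < MIN_PRIOR_REGISTRATION:
        return False
    # Fewer than 6 months of follow-up after the index date.
    if p.follow_up_end < window_end:
        return False
    return True
```

A patient whose first cardiovascular record falls after the diagnostic window remains eligible; that record then counts as an outcome rather than as an exclusion.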

Outcomes
The primary outcome was incident cardiovascular disease (CVD) defined as a record of any of fatal or non-fatal acute myocardial infarction, angina, coronary heart disease not otherwise specified, heart failure, ventricular arrhythmia, cardiac arrest, ischaemic stroke, haemorrhagic stroke, stroke type not specified, transient ischaemic attack, peripheral arterial disease, abdominal aortic aneurysm, sudden cardiac death, percutaneous coronary intervention, and coronary artery bypass graft surgery. Outcomes were captured from the primary care, secondary care, and ONS death registry records, using derived and validated algorithms. 19 Patients were followed from end of the 6-month diagnostic window until end of follow-up defined as the earliest of date of death, transfer out of practice, occurrence of outcome, or end of study (31 December 2018).
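The end-of-follow-up rule above amounts to taking the earliest of several dates. A minimal sketch, assuming hypothetical argument names and that any of the three patient-level dates may be absent:

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2018, 12, 31)  # end of study as stated in the text


def end_of_follow_up(death: Optional[date],
                     transfer_out: Optional[date],
                     outcome: Optional[date]) -> Tuple[date, bool]:
    """Return (end date, event flag) for one patient.

    The end date is the earliest of death, transfer out of practice,
    occurrence of the outcome, and end of study. The event flag is True
    only when the outcome itself is what ended follow-up.
    """
    candidates = [STUDY_END] + [d for d in (death, transfer_out, outcome)
                                if d is not None]
    end = min(candidates)
    event = outcome is not None and outcome == end
    return end, event
```

Time at risk would then run from the end of the 6-month diagnostic window to the returned end date; patients without the event flag are treated as censored at that date.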

Risk factors
The potential risk factors were decided by consensus of the study team by consideration of those included in the QRISK3 algorithm, 10 and potential alternative explanations for chest pain and comorbidities previously suggested to be predictive of cardiovascular disease. 20,21 These are listed in Table 1. Comorbidities were measured in the 24 months prior to index date up to end of the 6-month diagnostic window. Prescription-based comorbidities (treated hypertension, corticosteroids) were defined as at least two prescriptions in this 30-month period. Body mass index (BMI, categorized into underweight, normal, overweight, obese, and not recorded) and smoking status (never, current, ex, and not recorded) were based on record nearest, but prior to, the end of the 6-month diagnostic window. Body mass index was categorized to allow use of information captured by Read codes (for example, diagnosis codes for overweight or obese) where no BMI value was recorded. Neighbourhood deprivation was based on the Townsend score and categorized at the quintile scores. As cholesterol values were not recorded comprehensively and unlikely to be missing at random, we imputed total cholesterol/HDL ratio based on the mean value for those in the data set with the same age, gender, and ethnicity.
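The group-mean imputation described for the total cholesterol/HDL ratio can be sketched as follows. This is an illustrative example rather than the study's implementation; the record layout, the key names, and the use of age in whole years as the grouping variable are assumptions.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean(records, value_key, group_keys):
    """Fill missing values with the mean of records sharing the same group.

    `records` is a list of dicts; a missing value is represented by None.
    Returns new dicts and leaves the input unchanged. If no record in a
    group has an observed value, the value stays None.
    """
    observed = defaultdict(list)
    for r in records:
        if r[value_key] is not None:
            observed[tuple(r[k] for k in group_keys)].append(r[value_key])
    group_means = {g: mean(values) for g, values in observed.items()}

    filled = []
    for r in records:
        new = dict(r)
        if new[value_key] is None:
            new[value_key] = group_means.get(tuple(r[k] for k in group_keys))
        filled.append(new)
    return filled
```

For example, calling `impute_group_mean(records, "chol_ratio", ("age", "gender", "ethnicity"))` would give each patient without a recorded ratio the mean ratio of patients with the same age, gender, and ethnicity.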

Performance of QRISK3
The QRISK3 estimated 10-year CVD risk was calculated using the online open access gender-specific algorithms, 22 replicated for use in Stata/MP 15.1 for Windows, and compared for different combinations of risk factors to the estimated risk produced by the online calculator. Determination of QRISK3 score requires actual BMI value; therefore, for patients with a recorded BMI category (underweight, normal, overweight, or obese) but no BMI value recorded, we allocated mean BMI value for those of the same BMI category, age, gender, and ethnicity. If there was no BMI category recorded, they were allocated the mean BMI value for those of the same age, gender, and ethnicity, as this is how missing data are imputed by QRISK3. For smoking, if the record only indicated current smoker and no evidence of level, then they were allocated the most frequent level of smoking (light, moderate, and heavy) for current smokers of their age, gender, and ethnicity. If there was no information on smoking, then they were allocated the most frequent category for people of their age, gender, and ethnicity. Performance of QRISK3 in both Aurum and GOLD was assessed through discrimination and calibration. Discrimination was assessed using Harrell's C-statistic which ranges between 0.5 (even chance) and 1 (perfect discrimination). Calibration was assessed in three ways: (i) ratio of expected and observed probability of CVD, (ii) calibration slope by estimating the beta-coefficient of the linear predictor of the score via a flexible parametric survival model for CVD, and (iii) a calibration plot of observed and expected probabilities for each tenth of predicted risk forming 10 equal sized groups.
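To make the performance measures above concrete, the following is a simplified sketch of Harrell's C-statistic, the expected/observed ratio, and the grouping by tenths of predicted risk. It is not the study's code. Two simplifications are assumptions of this example: pairs with tied follow-up times are ignored in the C-statistic, and the observed probability is taken as a simple proportion of a binary outcome, which ignores censoring.

```python
def harrell_c(times, events, risks):
    """Harrell's C for right-censored data (pairs with tied times ignored).

    A pair (i, j) is usable when patient i had the event and did so before
    patient j's follow-up ended. It is concordant when i has higher risk.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def expected_observed_ratio(predicted, outcomes):
    """Mean predicted risk divided by observed event proportion."""
    return sum(predicted) / sum(outcomes)


def calibration_by_tenths(predicted, outcomes, groups=10):
    """Return (mean predicted, observed proportion) per tenth of risk."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if idx:
            rows.append((sum(predicted[i] for i in idx) / len(idx),
                         sum(outcomes[i] for i in idx) / len(idx)))
    return rows
```

Under these definitions, a ratio below 1 means the model predicts fewer events than were observed, that is, it underestimates risk.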

Determination of risk factors and development of new model
Determination of key risk factors and new model development was performed in Aurum. Unadjusted and adjusted associations between risk factors and time to cardiovascular event were modelled using flexible parametric models with three degrees of freedom. Five models were considered overall, building in complexity. Model 1 included only demographic information (age centred around the mean; gender; ethnicity; and deprivation). Model 2 (reduced model) included risk factors the research team considered to be the key risk factors for cardiovascular disease: age, gender, ethnicity, deprivation, smoking status, type 1 diabetes, type 2 diabetes, family history of coronary event, chronic kidney disease, atrial fibrillation, treated hypertension, and body mass index. Model 3 included all covariates. Model 4 tested fractional polynomials for age and total cholesterol/HDL ratio and then used backwards stepwise selection (based on P < 0.01) of the factors in Model 3, with enforced entry of age, gender, and ethnicity. The full model (Model 5) assessed through backwards stepwise selection interactions of age and gender with the covariates remaining in Model 4.

Internal validation
For each model, the 10-year estimated risk of CVD was calculated for each patient. Discrimination was assessed using the C-statistic, and the D statistic where higher values indicate greater discrimination with an increase of ≥0.1 over other prediction models suggesting improved separation. Calibration was assessed as described for the assessment of QRISK3. The amount of optimism in the models was assessed using van Houwelingen's heuristic shrinkage factor using 83 degrees of freedom. 23 The net reclassification index (NRI) was derived comparing risk categorization on the QRISK3 to risk obtained from the optimal developed model using a risk of ≥10% as the cut-off as this is the level at which QRISK3 defines patients at high risk for CVD.
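The heuristic shrinkage factor is usually written as (model likelihood-ratio chi-square minus degrees of freedom) divided by the model likelihood-ratio chi-square. A minimal sketch of that calculation follows; the chi-square value in the usage note is a made-up number for illustration and is not a result from this study.

```python
def heuristic_shrinkage(lr_chi2: float, df: int) -> float:
    """van Houwelingen's heuristic shrinkage factor.

    lr_chi2: likelihood-ratio chi-square of the fitted model against
             the null model.
    df:      degrees of freedom spent on candidate predictors.
    Values close to 1 indicate little optimism (overfitting).
    """
    if lr_chi2 <= 0:
        raise ValueError("likelihood-ratio chi-square must be positive")
    return (lr_chi2 - df) / lr_chi2
```

With 83 degrees of freedom and a hypothetical chi-square of 8300, the factor would be 0.99. In a development data set of several hundred thousand patients the chi-square is typically very large relative to the degrees of freedom, which is consistent with a factor near 1.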

External validation
External validation of the five estimated risk equations from the above models was assessed in the CPRD GOLD data set. C-statistic, calibration slope, and ratio of expected and observed probability were determined. For the optimal models, discrimination and calibration by gender, geographical region (ten regions), and deprivation category (based on quintile scores) were also assessed.

Reduction in modifiable risk factors
The extent to which risk of CVD could be reduced was determined by assessing the potential impact of a population-level reduction in two modifiable risk factors, using CPRD Aurum. Changes in risk factors considered were (i) move from current smoker to ex-smoker; (ii) move from obese to overweight; and (iii) move from obese to overweight and from current smoker to ex-smoker. The estimated reduction in mean 10-year risk was determined for each of these changes.
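The approach described here can be thought of as recomputing each patient's predicted risk after changing a covariate and then comparing mean risks. The sketch below illustrates that idea only. The `toy_risk` function and its numbers are invented for the example and bear no relation to the study's risk equations, which are given in the supplementary material.

```python
def mean_risk(patients, predict):
    """Mean predicted 10-year risk over a list of patient dicts."""
    return sum(predict(p) for p in patients) / len(patients)


def with_change(patients, change):
    """Return copies of the patient dicts with `change` applied to each."""
    return [change(dict(p)) for p in patients]


def current_to_ex_smoker(p):
    if p["smoking"] == "current":
        p["smoking"] = "ex"
    return p


def obese_to_overweight(p):
    if p["bmi"] == "obese":
        p["bmi"] = "overweight"
    return p


def both_changes(p):
    return obese_to_overweight(current_to_ex_smoker(p))


def toy_risk(p):
    """Purely illustrative risk function with made-up increments."""
    risk = 0.10
    if p["smoking"] == "current":
        risk += 0.05
    if p["bmi"] == "obese":
        risk += 0.03
    return risk
```

For instance, `mean_risk(with_change(patients, both_changes), toy_risk)` gives the mean risk if all obese patients were instead overweight and all current smokers were ex-smokers, which can be compared with `mean_risk(patients, toy_risk)`.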

Sensitivity analyses
The first sensitivity analysis excluded patients prescribed lipid-lowering drugs during the diagnostic window. The second sensitivity analysis imputed missing data for smoking status, BMI, and cholesterol using multiple imputation by chained equations. All covariates plus the optimal fractional polynomials for age identified in Model 4, indicator for cardiovascular event, and time to cardiovascular event were included in the multiple imputation model. A two-stage procedure was conducted to identify the optimal number of imputed data sets required. 24 Ten data sets were first imputed to determine the total number of imputed data sets required to ensure standard errors of hazard ratios could be replicated. In total, 20 imputed data sets were created for analysis. Finally, gender-specific models were developed and compared with the model incorporating gender-covariate interactions.

Patient and public involvement
Three meetings with a patient and public users group were held. These meetings highlighted key risk factors from the patient perspective, discussed interpretations of findings, and considered the potential use by patients of the information resulting from the study.

Results
In the development data set (Aurum), 374 917 patients had a new record of unattributed chest pain, fulfilled the inclusion criteria, and had complete linkage. Mean age was 47.8 (SD 16.5) years and 47% were male. Median follow-up was 6.1 years. There were 226 024 patients in the validation data set (GOLD) with similar mean age (47.3) and percentage who were male (47%), but with a shorter median follow-up of 5.4 years. Baseline characteristics are shown in Table 1 (development data set) and Supplementary material online, Table S2 (validation data set).

QRISK3
In Aurum, although the C-statistic was high (0.79 overall, 0.79 males, 0.80 females), QRISK3 underestimated the risk of CVD, with the amount of underestimation becoming larger in higher-risk groups (Table 2, Supplementary material online, Figure S1). The estimated calibration slope was 0.75 (overall), 0.83 (males), and 0.78 (females) and the ratio of expected and observed probability of CVD was 0.51 (overall), 0.53 (males), and 0.48 (females). Similar performance measures were observed in GOLD (Table 3, Supplementary material online, Figure S1).

Determination of risk factors and model development
The associations with CVD for each of the risk factors across Models 1-4 in the development data set are shown in Interactions of gender with age, ethnicity, type 2 diabetes, atrial fibrillation, treated hypertension, respiratory conditions, and BMI were included in Model 5, suggesting that their impact varies by gender with slightly higher increased risk if a comorbidity is present in females than males, but lower risk related to obesity and in Asian populations. No interactions were observed for age other than with gender. This matched findings from the sensitivity analysis developing gender-specific models which showed consistency in risk factors between males and females, although generally with slightly stronger associations in females for comorbidity (see Supplementary material online, Table S3).
Sensitivity analysis removing patients prescribed lipid-lowering drugs and imputation of missing data yielded similar hazard ratios as the main analysis (data not shown).

Internal validation
Internal validation showed predictive performance was good across the five models and better than QRISK3 (Table 2). C-statistic values ranged between 0.78 and 0.80. The D statistic values suggest Models 2-5 had greater discriminative ability than Model 1, although there was little difference in discriminative ability between these four models. There was close agreement between the observed and predicted probabilities for CVD. There was a negligible amount of optimism, so estimated coefficients were not corrected for this. Performance was good when stratified by gender (Table 2).
Calibration plots showed good agreement between observed and predicted CVD at all levels of risk overall, and by gender (plots for Model 5 shown in Figure 1).

Comparison to QRISK3
Model 5 estimated 53% of patients had a 10-year CVD risk of 10% or more, compared to 29% using QRISK3. Only 188 (0.1%) patients with risk more than 10% on QRISK3 moved to a risk estimate of less than 10% on the new model. Net reclassification index for events (net proportion of patients with events assigned a higher risk category based on ≥10% cut-off) was 0.21 and for non-events (net proportion of patients without events assigned a lower risk category) was −0.24.
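The event and non-event components of the net reclassification index reported above can be illustrated with a short sketch. This is not the study's code; it assumes a binary event indicator (ignoring censoring) and a single 10% cut-off, and the function and argument names are hypothetical.

```python
def net_reclassification(old_risk, new_risk, events, cutoff=0.10):
    """Two-category NRI components for a single risk cut-off.

    Returns (nri_events, nri_non_events):
      nri_events     = P(moved up | event)      - P(moved down | event)
      nri_non_events = P(moved down | no event) - P(moved up | no event)
    """
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for old, new, ev in zip(old_risk, new_risk, events):
        had_event = bool(ev)
        total[had_event] += 1
        old_high, new_high = old >= cutoff, new >= cutoff
        if new_high and not old_high:
            up[had_event] += 1
        elif old_high and not new_high:
            down[had_event] += 1
    nri_events = (up[True] - down[True]) / total[True]
    nri_non_events = (down[False] - up[False]) / total[False]
    return nri_events, nri_non_events
```

Read this way, a positive value for events means that more patients who went on to have an event were moved above the cut-off than below it, and a negative value for non-events means that more patients without an event were moved above the cut-off than below it.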

External validation
The models showed strong predictive performance in the external validation data set, overall and by gender, and were again superior to QRISK3 (Table 3). Model 5 C-statistic was 0.81 and the ratio of expected and observed probabilities for CVD and calibration slopes were close to one. Calibration plots for Model 5 (Figure 2) show good agreement between observed and expected risk at all levels of risk. Stratified by deprivation, Model 5 C-statistics ranged from 0.80 to 0.81 and calibration slopes from 0.91 to 1.09, and stratified by geographical region, C-statistics ranged from 0.79 to 0.82 and calibration slopes from 0.94 to 1.08 (see Supplementary material online, Figures S2 and S3). Model 5 also performed well in those currently not prescribed lipid-lowering drugs (C-statistic 0.81, calibration slope 1.05) and performed as well as gender-specific models in terms of discrimination and calibration (see Supplementary material online, Table S4). The reduced Model 2 based on traditional risk factors also gave good model performance that was similar to the full Model 5 in the validation data set (Table 3).
The risk equations based on Models 2 and 5 are shown in Supplementary material online, Table S5.

Modifiable risk factors
Nearly half of patients with a Model 5 estimated CVD risk of ≥10% were either current smokers or obese. Population level removal of these factors reduced the estimated mean 10-year risk from 17.4% to 16.9% (all obese to overweight), 16.5% (all current to ex-smoker), and 16.0% (all obese to overweight and all current to ex-smoker). The biggest estimated effect is in those who are obese and currently smoke where removal of both factors would reduce estimated risk from 21.7% to 15.6%.

Discussion
This study of over 600 000 patients with unattributed chest pain has identified their key risk factors for future cardiovascular disease recorded in primary care, highlighted that general population algorithms will underestimate CVD risk in this population, derived improved prediction models with high discrimination and calibration, and validated these findings in a second database.
There are several cardiovascular risk prediction algorithms for potential use by clinicians. 25,26 UK primary care guidelines recommend the use of QRISK for prediction of 10-year cardiovascular risk. 10 However, this algorithm was designed and validated for use in the general population, not in those presenting with chest pain who are older, have increased risk of a future cardiovascular event, and may already be prescribed lipid-lowering medication. 9 It is not surprising therefore that QRISK3 underestimates the cardiovascular risk in this population. A third of patients classified below the recommended 10% cut-off for starting preventative medication on QRISK3 were classified as having risk greater than 10% based on our developed model. By contrast, only 0.1% of patients with risk greater than 10% on QRISK3 had risk below that level on the developed model. However, other than a less strong association of chronic kidney disease, there was consistency in the key risk factors identified in this population with those identified by QRISK3 for the UK general population. This includes the higher risk associated with Asian populations and lower risk for Black populations. Whilst there are conflicting findings from other studies relating to risk for Black populations, a reduced risk has also been identified in other UK general population studies 27 and other studies have shown an increased risk of cardiovascular disease in Black patient groups is removed after adjustment for other risk factors such as socioeconomic characteristics. 28,29 Our study indicates risk of cardiovascular disease is higher in males, which has also been shown in patients discharged from hospital with unexplained chest pain. 30 Despite indications in our study from our full model (Model 5) that certain comorbidities (type 2 diabetes, atrial fibrillation, hypertension, and respiratory conditions) confer higher risk in females than males, and being obese a lower risk, the full prediction model including interaction terms with gender did not greatly improve the model compared to models without interaction, and gender-specific models also did not improve performance. Higher risk related to some comorbidity for females seen in the unattributed chest pain population is also generally evident for the general population. 10,31

Implications
There is a high proportion of patients presenting with chest pain that, whilst not typical of ischaemic chest pain, cannot be attributed definitively to another cause, and these patients are at higher risk of future CVD than those with chest pain attributed to a non-coronary cause or without chest pain. 4,7-9 A survey of UK GPs in 2019 found that most respondents are aware of cardiovascular risk prediction tools and QRISK in particular, and use them to guide therapy and to comply with guidelines. 32 However, studies have also shown that preventative medication is not always targeted at those most at risk. [33][34][35] This includes patients presenting with chest pain with an unattributed cause, 9 and the problem will be magnified because, as our current analysis shows, the risk of future CVD is likely to be underestimated by the CVD risk prediction tools recommended for use in primary care to ascertain who should receive preventative measures. Whilst the optimal model developed in those with unattributed chest pain includes a range of covariates and interactions, a simpler model utilizing just traditional risk factors and without interactions can be used without great loss of performance (as found in other populations 36). A small number of key risk factors can hence be used to accurately predict CVD, and it is likely that GPs should focus on these factors as a means of targeting patients for closer surveillance and management. This could include encouragement and initiatives to improve lifestyle behaviours relating to diet, physical activity, and smoking, and prescribing of lipid-lowering and other preventative medication. This study has also shown the potential benefit of changes in lifestyle behaviour relating to smoking and diet, with a potential reduction in mean 10-year CVD risk from 22% to 16% for those who are currently obese and who smoke.

Strengths and limitations
This study utilized two large, nationally representative primary care EHR databases, 12,14 representing different information systems used in UK primary care, with linkage to inpatient, mortality, and deprivation data. The list of potential risk factors was wide and drew on those used in other UK cardiovascular prediction algorithms. Models performed well across genders, deprivation levels, and geographical regions.
A coded primary care record of chest pain with no attribution does not capture any underlying cause that the primary care provider may have suspected. This may be recorded in free (unstructured) text that generally cannot be accessed for research. The coded record should, though, reflect findings from any cardiac diagnostic investigation. We excluded patients with recorded cardiovascular events in the first 6 months as these events were likely to be the underlying reason for the initial presentation of chest pain. It is possible the diagnostic period may be longer than 6 months for some patients prior to a cardiovascular event being diagnosed, particularly if the patient presented with atypical features. However, rapid access chest pain clinics in the UK should ensure that most patients receive a diagnosis within 6 months. Some patients were already being prescribed lipid-lowering drugs and the algorithm may be best used on those not currently offered such preventative medication. However, the magnitude and direction of the risk factor estimates were similar in the subgroup of those not prescribed lipid-lowering medication to those for the overall models. As is common in routine primary care data, there were missing data on smoking, BMI, and cholesterol. A sensitivity analysis using multiple imputation suggested the missing data would not impact on findings. There was a low percentage of patients recorded as non-white in the validation data set. Further research should test the models in different ethnic groups.

Conclusions
Patients presenting to primary care with unattributed chest pain are at increased risk of cardiovascular events, but this study has shown that it is feasible to ascertain those most at risk using routinely recorded information in the primary care record. Consideration of a select number of key risk factors identified here could help target patients at highest risk for preventative measures.