Using big data from health records from four countries to evaluate chronic disease outcomes: a study in 114 364 survivors of myocardial infarction

Abstract
Aims: To assess the international validity of using hospital record data to compare long-term outcomes in heart attack survivors.
Methods and results: We used samples of national, ongoing, unselected record sources to assess three outcomes: all-cause death; a composite of myocardial infarction (MI), stroke, and all-cause death; and hospitalized bleeding. Patients aged 65 years and older entered the study 1 year following the most recent discharge for acute MI in 2002–11 [n = 54 841 (Sweden), 53 909 (USA), 4653 (England), and 961 (France)]. Across each of the four countries, we found consistent associations between 12 baseline prognostic factors and each of the three outcomes. In each country, we observed high 3-year crude cumulative risks of all-cause death [from 19.6% (England) to 30.2% (USA)]; the composite of MI, stroke, or death [from 26.0% (France) to 36.2% (USA)]; and hospitalized bleeding [from 3.1% (France) to 5.3% (USA)]. After adjustments for baseline risk factors, risks were similar across all countries [relative risks (RRs) compared with Sweden not statistically significant], but higher in the USA for all-cause death [RR USA vs. Sweden, 1.14 (95% confidence interval 1.04–1.26)] and hospitalized bleeding [RR USA vs. Sweden, 1.54 (1.21–1.96)].
Conclusion: The validity of using hospital record data is supported by the consistency of estimates across four countries of a high adjusted risk of death, further MI, and stroke in the chronic phase after MI. The possibility that adjusted risks of mortality and bleeding are higher in the USA warrants further study.


Introduction
Health records from different health systems might provide insights into the care of patients with chronic diseases and the long-term outcomes of these conditions, 1,2 but there have been few comparisons across countries. National hospital data are collected and coded in health systems in many countries and such data (compared with voluntary registries or consented studies) may provide samples that are larger, more nationally representative, and not limited to the study of any one disease, or any one stage of its development. 3 However, there are important concerns about the quality and validity of such data.
In coronary disease, most studies of outcomes following myocardial infarction (MI) have focused on the acute phase post-MI, typically up to 1 year. However, given marked improvements over the past decade in short-term and long-term mortality following MI, 4 -6 there is a growing need to characterize the outcomes experienced by patients in whom follow-up begins after the acute phase. By the time of the first anniversary following admission for an acute MI, dual antiplatelet therapy, 7 -10 cardiac rehabilitation, and cardiologist follow-up 11 have commonly ended, and uptake of secondary prevention medication may be declining. 12 Recent clinical guidelines 7 -10 do not directly address the care of patients in this chronic phase of disease, whereas a recent trial found that prolonged dual antiplatelet therapy beyond the first year after an acute MI lowers the risk of cardiovascular death, MI, and stroke. 13 To deliver better long-term care for patients surviving MI, two central questions need addressing. First, what is the risk of major clinical outcomes following the high-risk acute post-MI phase? Nearly all previous studies 14, 15 of MI outcomes start in the acute hospital setting rather than in the community, and it is well known that early events predominate in estimates of long-term risk. Most of the information on long-term outcomes available so far is derived from trials and voluntary registries, whose risks may not extrapolate to the wider patient population. 16 Secondly, how do long-term clinical outcomes vary in different health systems? While international comparisons of cancer outcomes 17 have influenced policy and quality-improvement initiatives, in coronary disease comparisons have been limited to the acute hospital care setting. 5,18,19 To answer these questions, we sought national, unselected, ongoing sources of data provided by the health systems in four countries. 
While these data sources have been used for acute MI outcomes research within countries, 20 their use in evaluations of the chronic phase of disease has been much less common, and the present study is the first to use such data to compare outcomes between the USA and European countries (Sweden, England, and France). Our objective was to test the validity of using such hospital record data to estimate and compare across countries the risk of three prognostic outcomes among MI survivors: all-cause death; composite of MI, stroke, or all-cause death; and hospitalized bleeding.

Health record data sources and study population
We analysed anonymized patient data from national ongoing hospital sources that use the International Classification of Diseases (ICD) coding system. In Sweden, we used nationwide (100% population coverage) administrative linked data (not directly used for reimbursement) obtained from mandatory Swedish national registries: the National Inpatient Register, the Swedish Prescribed Drug Register, and the Cause of Death Register. In the USA, we used an administrative claims database (Medicare) obtained from the Centers for Medicare & Medicaid Services' standard analytic files, which are publicly available; these contain a nationally representative 5% random sample of all Medicare beneficiaries, based on selecting records with 05, 20, 45, 70, or 95 in positions 8 and 9 of the Social Security Number (SSN) (Centers for Medicare & Medicaid Services, Standard Analytical Files. https://www.cms.gov/research-statistics-data-and-systems/files-for-order/limiteddatasets/standardanalyticalfiles.html, accessed 17 December 2015). Patients are linked across the enrolment and eligibility file and service claims files using a unique encrypted SSN. Deaths are determined by linkage to the National Death file. In England, a single primary care electronic health record (EHR) covers >95% of the population and we used a 4% sample available for research. We used the CALIBER research platform of primary care EHRs (Clinical Practice Research Datalink), linked via the unique identifier of the National Health Service number with other record sources [the Myocardial Ischaemia National Audit Project (MINAP), the Hospital Episodes Statistics database, and the nationwide cause-specific mortality database]. The CALIBER data resource has been shown to be representative of the general population, 21 -23 and valid for cardiovascular research. 24 -28 In France, the source data came from the administrative claims insurance database, which covers 95% of the French population.
The sample [Echantillon Généraliste des Bénéficiaires (EGB)] available for researchers was built by randomly selecting patients from their national id check number (97 random possibilities). This permanent 1/97 sample has been shown to be representative in terms of age, sex, social status, and overall medical expenses. 29 -33 The EGB health insurance claims data are linked to hospital discharge summaries and death registry through the unique healthcare identifier number.
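The Medicare 5% sampling rule described above can be sketched in a few lines. This is only an illustration of the stated rule (two-digit values 05, 20, 45, 70, or 95 in positions 8 and 9 of a nine-digit identifier); the function name and input format are hypothetical, and the real files carry encrypted identifiers rather than raw SSNs.

```python
# Two-digit values named in the text for positions 8 and 9 of the identifier.
SAMPLE_DIGITS = {"05", "20", "45", "70", "95"}


def in_five_percent_sample(ssn: str) -> bool:
    """Return True if a nine-digit identifier falls in the 5% sample.

    Hypothetical helper: non-digit characters (e.g. dashes) are ignored.
    """
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("expected a nine-digit identifier")
    # Positions 8 and 9 (1-indexed) are indices 7 and 8 (0-indexed).
    return digits[7:9] in SAMPLE_DIGITS


# Five of the 100 possible two-digit endings are selected, i.e. 5%.
share = sum(f"{i:02d}" in SAMPLE_DIGITS for i in range(100)) / 100
```

Because the rule depends only on fixed digits of the identifier, the same person is selected in every year, which is what makes the sample usable for longitudinal follow-up.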
Our study population was defined by the presence of three characteristics. First, we identified an index acute MI as the patient being admit-  34 Patients had to have continuous registration in the respective data sets for at least 12 months before the index MI (the first MI admission during the study period). Second, we identified those patients who at 12 months after their index acute MI were alive, with no further MI. We defined the study entry date as 12 months after the date of admission for the index MI. Third, we restricted the population to patients aged 65 years and older at study entry with no upper age bound, because Medicare predominantly covers this age group (the USA has no national unselected sources of data in younger patients).
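As a rough sketch, the eligibility rules above (at least 12 months of prior registration, alive and free of further MI at 12 months, and aged 65 years or older at study entry) could be expressed as follows. The function and argument names are hypothetical, and the handling of events falling exactly on the study entry date is an assumption not stated in the text.

```python
from datetime import date
from typing import Optional


def shift_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def study_entry(index_mi: date, birth: date, registered_from: date,
                death: Optional[date] = None,
                next_mi: Optional[date] = None) -> Optional[date]:
    """Return the study entry date, or None if the patient is ineligible."""
    entry = shift_years(index_mi, 1)  # 12 months after index MI admission
    if registered_from > shift_years(index_mi, -1):
        return None  # less than 12 months of registration before index MI
    if death is not None and death <= entry:
        return None  # not alive at study entry (boundary is an assumption)
    if next_mi is not None and next_mi <= entry:
        return None  # further MI during the first 12 months
    had_birthday = (entry.month, entry.day) >= (birth.month, birth.day)
    age = entry.year - birth.year - (0 if had_birthday else 1)
    return entry if age >= 65 else None
```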
The study was approved by the Independent Scientific Advisory Committee of the Medicines and Healthcare products Regulatory Agency (protocol number 13_163) in England, the regional ethics committee in Linköping, Sweden (reference number 2013/294-31), and the Centers for Medicare & Medicaid Services Data Use Agreement in the USA. No ethical approval is required in France for the use of anonymized data.

Baseline risk factors and co-morbidities
We included demographics (age, sex) and cardiovascular and non-cardiovascular co-morbidities (ICD-9 and ICD-10 codes in Supplementary material online, Table S1) appearing as primary or secondary diagnoses in hospital admissions before the study entry date. We considered patients as currently receiving a medication (codes in Supplementary material online, Table S2) if their last active prescription or dispensation ended <60 days before study entry. No prescription data were available in the Medicare data. We included percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) procedures performed from the day of the index MI up to the following 12 months.
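The current-use rule for medications (last prescription or dispensation ending fewer than 60 days before study entry) can be illustrated as below. The names are hypothetical, and treating a supply that is still running at study entry as current use is an assumption.

```python
from datetime import date

CURRENT_USE_WINDOW_DAYS = 60  # window stated in the text


def is_current_user(last_supply_end: date, study_entry: date) -> bool:
    """True if the last supply ended fewer than 60 days before study entry.

    A supply ending on or after study entry gives a non-positive gap and
    therefore also counts as current use (assumption).
    """
    days_since_end = (study_entry - last_supply_end).days
    return days_since_end < CURRENT_USE_WINDOW_DAYS
```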

Endpoints
We studied three outcomes of interest: all-cause death; a composite of death, hospital admission for MI, or hospital admission for stroke; and hospitalized bleeding. The ICD-9/ICD-10 codes used to define these outcomes are shown in Supplementary material online, Table S3. Stroke types included ischaemic, haemorrhagic, and unclassified. Hospitalized bleeding was defined as hospital admission with a bleeding cause as a primary diagnosis. Patients were censored at the earliest of experiencing the event of interest (with censoring specific to that event type), deregistration from the primary care practice (England), or end of study period.
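A minimal sketch of how follow-up time and the event indicator might be derived from these rules is shown below. The function name and arguments are hypothetical; deregistration applies only to the English data, and counting an event that coincides with a censoring date as observed is an assumption.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up(entry: date, study_end: date,
              event: Optional[date] = None,
              deregistration: Optional[date] = None) -> Tuple[int, bool]:
    """Return (days of follow-up, event observed) for one endpoint."""
    candidates = [(study_end, False)]
    if deregistration is not None:
        candidates.append((deregistration, False))
    if event is not None:
        candidates.append((event, True))
    # Earliest date wins; on a tie, prefer the event over censoring.
    stop, had_event = min(candidates, key=lambda c: (c[0], not c[1]))
    return (stop - entry).days, had_event
```

Because censoring is specific to each event type, the function would be called once per endpoint with the corresponding event date.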

Statistics
Data from each of the four countries were analysed independently following a common protocol. We estimated the direct age- and sex-standardized prevalence of co-morbidities in each country using as reference the 2012 World Health Organization world population truncated to ages 65 years and older. For each country and endpoint, we estimated observed (Kaplan-Meier) and predicted risks, adjusted to the average characteristics of the Swedish patients (aged 78 years, with covariate values shown in Supplementary material online, Table S4). We chose Sweden as the reference population because it had the largest sample size. Predicted risks were based on incrementally adjusted Cox models (fitted separately per country): Model 1 included age, sex, and year of index MI; Model 2 included Model 1 covariates plus co-morbidities [history of more than one MI, diabetes, renal disease, heart failure, peripheral arterial disease (PAD), atrial fibrillation, stroke, hospitalized bleeding, chronic obstructive pulmonary disease, and cancer]; Model 3 included Model 2 covariates plus revascularization procedures (CABG or PCI) received in the 12 months following the index MI. Annual risks were estimated as the average annual risks over the first 3 years.
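Direct standardization, as used above for co-morbidity prevalence, weights each stratum-specific prevalence by the share of that stratum in a reference population. The sketch below shows the calculation; the strata, prevalences, and weights are made-up placeholders, not values from the study or from the World Health Organization reference population.

```python
from typing import Dict, Hashable


def direct_standardized_prevalence(
        stratum_prevalence: Dict[Hashable, float],
        reference_weights: Dict[Hashable, float]) -> float:
    """Weighted mean of stratum prevalences using reference-population weights."""
    total_weight = sum(reference_weights[s] for s in stratum_prevalence)
    weighted_sum = sum(stratum_prevalence[s] * reference_weights[s]
                       for s in stratum_prevalence)
    return weighted_sum / total_weight


# Hypothetical age bands (a full version would stratify by age and sex).
example_prevalence = {"65-74": 0.20, "75-84": 0.30, "85+": 0.40}
example_weights = {"65-74": 50.0, "75-84": 35.0, "85+": 15.0}
example_result = direct_standardized_prevalence(example_prevalence, example_weights)
```

Applying the same reference weights to every country removes differences in prevalence that arise only from differing age and sex structures.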
We estimated the relative risks (RRs) for each endpoint in each country and the 95% confidence intervals (CIs) for 3 years of follow-up using as reference the corresponding risks estimated for Sweden. For a time point $t$, the RR for country A vs. country B is $\mathrm{RR}_t = \mathrm{risk}_A(t) / \mathrm{risk}_B(t)$. The overall RR reported is the mean of $\mathrm{RR}_t$ over $t \in \{0, 0.5, \ldots, 3\}$ years. We verified the proportional hazards assumption of the Cox model within countries by plotting the Schoenfeld residuals and confirmed that RRs did not change with time by plotting time-specific RRs estimated for every half year between 0 and 3 years of follow-up (Supplementary material online, Figure S5).
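The overall RR described above, the mean of time-specific risk ratios on a half-yearly grid, can be sketched as follows. The function name is hypothetical, and skipping time points at which the reference risk is zero (such as t = 0, where the ratio is undefined) is an assumption of this sketch rather than something stated in the text.

```python
from typing import Sequence


def mean_relative_risk(risk_a: Sequence[float],
                       risk_b: Sequence[float]) -> float:
    """Mean of time-specific risk ratios risk_a(t) / risk_b(t).

    Both sequences hold cumulative risks on the same time grid
    (e.g. 0, 0.5, ..., 3 years); country B is the reference.
    """
    if len(risk_a) != len(risk_b):
        raise ValueError("risk curves must share the same time grid")
    # Assumption: time points with zero reference risk are skipped.
    ratios = [a / b for a, b in zip(risk_a, risk_b) if b > 0]
    if not ratios:
        raise ValueError("no time point with a positive reference risk")
    return sum(ratios) / len(ratios)
```

If the time-specific ratios are roughly constant, as the authors report checking, the mean is a reasonable single summary.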
We compared the associations of age, sex, co-morbidities, and revascularization treatments with the outcomes across the different countries based on the adjusted hazard ratios (HRs) in Model 3. The overall mean HR for a risk factor was estimated by combining country-specific HRs via random-effects meta-analysis. For France, risk of hospitalized bleeding was adjusted only for Model 1, owing to the small number of events (n = 23). Analyses were performed in R version 15 and SAS version 9.3.
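To illustrate how country-specific HRs might be combined, the sketch below pools log HRs with the DerSimonian-Laird random-effects estimator. The text does not say which estimator was used, so this choice is an assumption, as are the function name and the recovery of standard errors from 95% CIs.

```python
import math
from typing import Sequence, Tuple


def pool_random_effects(hrs: Sequence[float],
                        ci_lows: Sequence[float],
                        ci_highs: Sequence[float]) -> Tuple[float, float]:
    """Return (pooled HR, tau^2) using the DerSimonian-Laird estimator."""
    y = [math.log(h) for h in hrs]
    # Standard error of log HR recovered from the 95% CI width.
    se = [(math.log(hi) - math.log(lo)) / (2 * 1.96)
          for lo, hi in zip(ci_lows, ci_highs)]
    w = [1.0 / s ** 2 for s in se]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if len(y) > 1 else 0.0
    w_star = [1.0 / (s ** 2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return math.exp(pooled), tau2
```

When the country-specific HRs agree closely, tau^2 shrinks to zero and the result coincides with a fixed-effect inverse-variance average.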

Patients
Of the 220 738 patients hospitalized for MI during the study period, 114 364 (54 841 in Sweden, 53 909 in the USA, 4653 in England, and 961 in France) were eligible for inclusion in the analysis (alive, aged 65 years and older, and without subsequent MI at 12-month follow-up; Supplementary material online, Figure S1). Median follow-up ranged from 1.5 years (England) to 3.2 years (USA), during which a total of 37 626 deaths, 45 072 events of MI/stroke/death, and 4697 bleeding hospitalizations were observed in the four countries.

Baseline characteristics
Baseline characteristics of the post-MI survivors from each country are shown in Table 1. Mean age ranged from 77.5 years in England to 78.6 years in the USA. After standardization for age and sex, we found that compared with patients from Sweden, England, and France, US patients had a higher prevalence of diabetes, heart failure, PAD, renal disease, and chronic obstructive pulmonary disease, and were more likely to have undergone CABG (Figure 1).

All-cause death
There were large differences in the unadjusted (Kaplan-Meier) risk of all-cause death across the four countries (Figure 2). These differences were progressively attenuated to not statistically significant (95% CI for the RR vs. Sweden crossing 1) after sequential adjustments for age, sex, year of index MI, co-morbidities, and revascularization treatments, except for the USA where the RR of death compared with Sweden was slightly higher [RR USA vs. Sweden, 1.14 (95% CI, 1.04–1.26)]. Based on the mean covariates in the Swedish sample as per Table 1, the fully adjusted 3-year cumulative risks ranged from 12.8% (England) to 19.5% (USA).

Myocardial infarction, stroke, and all-cause death
There were large differences in the unadjusted (Kaplan-Meier) risk of the composite endpoint MI, stroke, or death across the four countries (Figure 3; Supplementary material online, Figure S2).

Hospitalized bleeding
The observed 3-year cumulative risk of hospitalized bleeding was lowest in France (3.1%) and Sweden (3.2%), higher in England (4.6%), and highest in the USA (5.3%) (Figure 4). The adjusted 3-year risk of hospitalized bleeding ranged from 2.7% (Sweden) to 4.0% (USA and England). Compared with Sweden, the fully adjusted RR of bleeding for French and English patients was close to 1.0 (not statistically significant), but was >50% higher for US patients [RR 1.54 (95% CI, 1.21–1.96)].

Outcome predictors
Each of the three outcomes showed consistent and strong (majority of HRs >1.5) age- and sex-adjusted associations across the four countries for 12 baseline variables assessed, including risk factors and cardiovascular and non-cardiovascular co-morbidities. The strongest associations (approximately two-fold increase in risk) with the composite of MI, stroke, or death (Figure 5) or with all-cause death alone (Supplementary material online, Figure S3) were observed for history of renal disease, heart failure, chronic obstructive pulmonary disease, and cancer. For hospitalized bleeding, the strongest associations were observed with history of previous hospitalized bleeding, renal disease, heart disease, PAD, and atrial fibrillation (Supplementary material online, Figure S4).

Discussion
In one of the first US-European uses of hospital record data to evaluate long-term fatal and non-fatal clinical outcomes in CVD, we present two findings that suggest that such data have useful validity and are informative in CVD outcomes research. First, there was a consistency across all four countries in the high level of risk of further MI, stroke, or death. This occurred in about a third of the patients aged 65 years and above over the next 3 years. This suggests that the high risk is an international phenomenon, rather than a problem with one healthcare system or resulting from the different natures of the underlying record systems. This high risk was considerably higher than that reported in the few smaller previous studies conducted in selected populations, 16 highlighting the value of examining less-selected patient samples.
Second, there was a consistency across all four countries in the magnitudes of association between 12 baseline risk factors and each of the three disease outcomes. These associations were highly consistent with published findings from smaller, consented studies, supporting the validity of our risk adjustment and comparison of outcomes. Thus, as in previous studies in post-MI survivors, 35 -37 we found strong associations between MI, stroke, or death (with heart failure, stroke, PAD, diabetes, renal disease, and chronic obstructive pulmonary disease) and for hospitalized bleeding (with renal disease, history of hospitalized bleeding, and atrial fibrillation). This provides some evidence of the prognostic validity of the hospital record data coded in different healthcare systems, despite the diversity of data collection systems.
Our approach was to use hospital healthcare records that have features of 'big data': being characterized by large sample sizes ('volume'), diverse data sources, collected for different purposes, and using different coding systems ('variety') and lack of researcher control over the meaning of the data ('veracity'). This approach has been widely advocated in understanding and improving the outcomes of disease, 1 but seldom applied in international contexts. 38 The strengths of this approach (compared with voluntary registries or consented studies) lie in direct health system relevance, less bias (larger samples, unselected population-based samples, long-term follow-up with minimal losses), and potential scalability to a wide range of clinical start points and endpoints. 3 Such record data are also more widely accessible to the research community than those from consented studies.
Our study has important limitations, which are largely inherent in these diverse data sources. First, in only one country (Sweden) were nationwide data accessed; the sample of national data available for research in France was particularly small, but it is, nonetheless, representative of the French population. Second, such health record data will inevitably lack relevant data items. For example, MI subtype (STEMI or NSTEMI) was not recorded across all four countries and could not therefore be included in the model adjustments. However, there is strong evidence that, at 1 year following the index MI, STEMI and NSTEMI shared similar mortality, suggesting that MI subclass is unlikely to have influenced our comparisons. 39 Information on younger patients, socioeconomic position, ethnicity, drug use, primary care, and cause-specific death was not simultaneously available in all four countries. It is a challenge to these health systems to improve the coverage, depth, and quality of data as part of efforts to expand international comparisons.
We observed an annual risk of death ranging from 6.5% (England) to 10.0% (USA), more than double those in the general population (Supplementary material online, Table S5). Since 57.9% of deaths are due to CVD (based on Swedish data), our study population is in the high-risk category based on the 2012 American College of Cardiology/American Heart Association guidelines (where high risk is defined as >3% annual risk of cardiovascular death) 40 or the 2013 European Society of Cardiology guidelines (where high risk is defined as >3% annual risk of all-cause death). 41 However, these guidelines are described in the context of the wider population of patients with stable coronary artery disease (many of whom have no history of MI). Also, most of the information comes from meta-analyses of clinical trial data, in which survival is generally higher owing to enrolment of lower-risk populations and better adherence to therapy.
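The high-risk classification follows from simple arithmetic, spelled out below using only the figures quoted in the text. Applying the Swedish cardiovascular share of deaths (57.9%) to the other countries is an assumption that mirrors the reasoning above; the variable names are illustrative.

```python
CVD_SHARE_OF_DEATHS = 0.579      # based on Swedish data, as quoted in the text
HIGH_RISK_THRESHOLD_PCT = 3.0    # guideline threshold for annual risk, in %

# Lowest and highest annual all-cause death risks reported (%).
annual_all_cause_risk_pct = {"England": 6.5, "USA": 10.0}

# Implied annual cardiovascular death risk (%), assuming the Swedish
# cardiovascular share of deaths applies in each country.
annual_cvd_risk_pct = {
    country: risk * CVD_SHARE_OF_DEATHS
    for country, risk in annual_all_cause_risk_pct.items()
}

all_high_risk = all(r > HIGH_RISK_THRESHOLD_PCT
                    for r in annual_cvd_risk_pct.values())
```

Under that assumption, the implied annual cardiovascular death risk is roughly 3.8% at the low end (England) and 5.8% at the high end (USA), so even the lowest-risk country exceeds the 3% threshold.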
Our finding of higher adjusted death rates and hospitalized bleeding rates in the USA than in Sweden could be artefactual but warrants further investigation. The higher death rates are consistent with the lower life expectancy at age 65 years in the USA compared with Europe (Supplementary material online, Table S5). 42 It is possible that the case mix of patients differs in ways that were not included in our adjustments (e.g. related to the substantially higher prevalence of obesity in the general US population). 42 We did find that US patients had higher age- and sex-standardized prevalences of diabetes, heart failure, PAD, renal disease, and chronic obstructive pulmonary disease, but each of these factors was included in the risk adjustment models. The USA might also have a higher proportion of ethnic minorities, which could confound between-country comparisons. It is also possible that care differs. Studies in the USA indicate that previously uninsured populations may delay seeking care before becoming eligible for Medicare, 43,44 and mortality may remain elevated for up to 10 years, compared with those with private insurance. 45 In contrast, European Union study populations would have had continuous access to healthcare before the age of 65 years. 46 It is possible that in the USA compared with Europe secondary prevention medications including dual antiplatelet therapy (aspirin and clopidogrel) are used more or at higher doses; 47 however, evidence of this in unselected populations of MI survivors is lacking. Reported use of other CVD medications in Medicare populations indicates that treatment rates are similar to those observed in the EU study population for β-blockers and calcium channel blockers, but somewhat lower for angiotensin-converting enzyme inhibitors and lipid-lowering therapies. 48 -53 Our findings have clinical implications.
First, our results provide evidence for clinicians and regulators when considering new interventions, and when assessing the generalizability of results from clinical trials. 13,43 The recently reported PEGASUS-TIMI-54 trial results in 1-year MI survivors are the first to demonstrate a role for long-term (i.e. beyond 1 year) dual antiplatelet use. 13 We applied the trial inclusion and exclusion criteria to our real-world patients (Supplementary material online, Figure S1) and demonstrated that the 'trial-like' population represents a large proportion (e.g. 66% in Sweden) of the overall MI survivor population, and identified a population at high risk (Supplementary material online, Figure S6). Second, our findings suggest the value of considering MI in a chronic-disease management framework, e.g. with a 1-year health check after acute MI optimizing behavioural, secondary preventive, and wider health interventions. We found that a substantial proportion of deaths are from non-cardiovascular causes (53% in England and 42% in Sweden), suggesting the importance of a multidisciplinary team approach in primary care. Guidelines need to be developed for this population that recognize the multitude of cardiovascular co-morbidities (atrial fibrillation, heart failure, diabetes, and PAD) and non-cardiovascular co-morbidities (renal disease, chronic obstructive pulmonary disease) that are highly prevalent among long-term survivors of MI.

Figure 5
Age- and sex-adjusted hazard ratios (95% confidence interval) for the association of age, sex, and medical history with the composite of myocardial infarction, stroke, and all-cause death among post-myocardial infarction survivors from Sweden (n = 54 841), USA (n = 53 909), England (n = 4653), and France (n = 961). In the French study, the incidence of PAD was <0.5%; hence, it was not possible to obtain estimates of its association with outcomes. CI, confidence interval; COPD, chronic obstructive pulmonary disease; HR, hazard ratio; MI, myocardial infarction; PAD, peripheral arterial disease.
In conclusion, analysing hospital record data in the USA and three European countries reveals a consistently high adjusted risk of death, further MI, and stroke in the chronic phase after MI. Inherently diverse data produced by different health systems may provide insights that are useful in evaluating and comparing the care of patients with chronic diseases and the long-term outcomes of these conditions.

Supplementary material
Supplementary material is available at European Heart Journal - Quality of Care and Clinical Outcomes online.