Lovedeep S Dhingra, Arya Aminorroaya, Veer Sangha, Aline F Pedroso, Folkert W Asselbergs, Luisa C C Brant, Sandhi M Barreto, Antonio Luiz P Ribeiro, Harlan M Krumholz, Evangelos K Oikonomou, Rohan Khera, Heart failure risk stratification using artificial intelligence applied to electrocardiogram images: a multinational study, European Heart Journal, Volume 46, Issue 11, 14 March 2025, Pages 1044–1053, https://doi.org/10.1093/eurheartj/ehae914
Abstract
Current heart failure (HF) risk stratification strategies require comprehensive clinical evaluation. In this study, artificial intelligence (AI) applied to electrocardiogram (ECG) images was examined as a strategy to predict HF risk.
Across multinational cohorts in the Yale New Haven Health System (YNHHS), UK Biobank (UKB), and Brazilian Longitudinal Study of Adult Health (ELSA-Brasil), individuals without baseline HF were followed for the first HF hospitalization. An AI-ECG model that defines cross-sectional left ventricular systolic dysfunction from 12-lead ECG images was used, and its association with incident HF was evaluated. Discrimination was assessed using Harrell’s C-statistic. Pooled cohort equations to prevent HF (PCP-HF) were used as a comparator.
Among 231 285 YNHHS patients, 4472 had primary HF hospitalizations over 4.5 years (inter-quartile range 2.5–6.6). In UKB and ELSA-Brasil, among 42 141 and 13 454 people, 46 and 31 developed HF over 3.1 (2.1–4.5) and 4.2 (3.7–4.5) years. A positive AI-ECG screen portended a 4- to 24-fold higher risk of new-onset HF [age-, sex-adjusted hazard ratio: YNHHS, 3.88 (95% confidence interval 3.63–4.14); UKB, 12.85 (6.87–24.02); ELSA-Brasil, 23.50 (11.09–49.81)]. The association was consistent after accounting for comorbidities and the competing risk of death. Higher probabilities were associated with progressively higher HF risk. Model discrimination was 0.718 in YNHHS, 0.769 in UKB, and 0.810 in ELSA-Brasil. In YNHHS and ELSA-Brasil, incorporating AI-ECG with PCP-HF yielded a significant improvement in discrimination over PCP-HF alone.
An AI model applied to a single ECG image defined the risk of future HF, representing a digital biomarker for stratifying HF risk.

Among individuals without baseline heart failure (HF), an artificial intelligence (AI) model that detects cross-sectional left ventricular systolic dysfunction from 12-lead electrocardiogram (ECG) images was evaluated for predicting new-onset HF. In multinational cohorts, the AI-ECG model strongly predicted the risk of future HF, representing a digital biomarker for stratifying HF risk using a single ECG image. AI-ECG, artificial intelligence-enhanced electrocardiogram; aHR, adjusted hazard ratio; ELSA-Brasil, Brazilian Longitudinal Study of Adult Health; PCP-HF, pooled cohort equation to prevent heart failure; YNHHS, Yale New Haven Health System
See the editorial comment for this article ‘Using artificial intelligence to spot heart failure from ECGs: is it prime time?’, by C. Antoniades, https://doi.org/10.1093/eurheartj/ehae906.
Introduction
Despite the rising global burden of heart failure (HF) and the availability of evidence-based therapies for preventing and slowing the progression of the disease, there is no broadly accessible approach for identifying individuals at the highest risk of developing HF.1,2 Without an established and accessible screening strategy, patients often experience delayed diagnosis and its consequences, including clinical HF, frequent hospitalizations, and premature mortality.3–5 Identifying individuals most likely to develop future HF can alleviate these risks through early initiation of low-cost medical therapies that, according to clinical practice guidelines, modify the trajectory of the disease, both reducing the risk of incident clinical HF and improving life expectancy.6–9
Several serum assay- and clinical score-based strategies have been proposed to predict incident HF.10–18 While serum assay-based biomarkers such as N-terminal pro-B-type natriuretic peptide (NT-proBNP) and high-sensitivity cardiac troponin I are independently associated with an elevated risk of incident HF,18–21 they are limited by the need for an invasive blood draw and frequent inaccessibility at the point-of-care.15,18 Predictive models based on clinical risk scores often require specialized testing and have varying predictive discrimination and feasibility of deployment.11–13,22,23 Recently, artificial intelligence (AI)-enhanced interpretation of electrocardiograms (ECGs; AI-ECG) has been proposed to detect hidden cardiovascular disease signatures from 12-lead ECGs.24–34 However, these deep learning models have focused on the cross-sectional detection of prevalent systolic dysfunction or HF,29–32,35–38 with limited application in predicting incident HF.39 Moreover, most current approaches use raw ECG voltage data as inputs, which are inaccessible to clinicians and patients at the point-of-care.31,32 Thus, there is an unmet need for practical and non-invasive screening tools that rely on ubiquitous and interoperable data sources to predict the risk of HF.29 In our previous work, we reported an image-based AI-ECG detection approach, a positive screen for which portended a higher risk of subsequently developing left ventricular systolic dysfunction (LVSD) [left ventricular ejection fraction (LVEF) < 40%] in patients with normal LVEF.29 However, an AI-ECG approach for the comprehensive identification of HF risk is essential to realize the goals of HF prevention.
In this study, across three geographically and clinically distinct cohorts, we evaluated the hypothesis that an AI-ECG model developed to detect signatures of left ventricular dysfunction on an ECG image at baseline will identify those at an elevated risk of new-onset HF.
Methods
The Yale Institutional Review Board approved the study protocol and waived the need for informed consent as the study involves secondary analysis of pre-existing data. An online version of the model is publicly available for research use at: https://www.cards-lab.org/ecgvision-lv.
Data sources
We used data from the Yale New Haven Health System (YNHHS), the UK Biobank (UKB) cohort, and the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil) in our study (Figure 1). While the YNHHS represents a large and diverse healthcare system in the USA, the UKB and ELSA-Brasil represent the largest population-based cohorts in the UK and Brazil, respectively, with protocolized baseline evaluation and detailed healthcare data capture. A brief overview of the data sources is included in the Supplementary data online, Supplementary Methods.

Figure 1. Study overview. We examined the use of artificial intelligence applied to electrocardiogram images as a strategy to predict heart failure risk across multinational cohorts in the Yale New Haven Health System, UK Biobank, and Brazilian Longitudinal Study of Adult Health. AI, artificial intelligence; ECG, electrocardiogram; ELSA-Brasil, Brazilian Longitudinal Study of Adult Health; EHR, electronic health records; LVSD, left ventricular systolic dysfunction; YNHHS, Yale New Haven Health System
Study population
Across data sources, we included individuals with no known HF who had undergone a 12-lead ECG and followed them for the development of incident HF. For this, we constructed a cohort of patients seeking care at YNHHS, representing a large integrated electronic health record (EHR)-based cohort along with well-characterized population-based cohort studies of UKB and ELSA-Brasil.
Among YNHHS patients, we identified the first recorded encounter for all patients within the EHR and instituted a 1-year blanking period to define prevalent HF (see Supplementary data online, Supplementary Methods and Figure S1). Among 325 319 patients who had one or more ECGs after this 1-year blanking period, 76 736 patients who had been included in the AI-ECG model development and 15 754 patients with any HF diagnosis code before their ECG were excluded. Moreover, 1544 patients with an echocardiogram with an LVEF < 50% or moderate or severe left ventricular diastolic dysfunction were excluded (see Supplementary data online, Figure S2). In a sensitivity analysis, we further implemented a 3-month blanking period after the ECG, excluding any individuals who had an HF hospitalization within 3 months following the baseline ECG.
In the UKB, we identified all 42 366 participants who had undergone a protocolized ECG as a part of the study procedures. Using the linked national EHR data from the UK, we excluded 225 participants who had been hospitalized with the record indicating an HF diagnosis code before the baseline ECG. Similarly, in ELSA-Brasil, we identified all 13 739 participants with a protocolized ECG at baseline and excluded 227 with HF at baseline and 58 with an LVEF < 50% noted on the baseline echocardiogram (see Supplementary data online, Figure S2).
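The cohort flow described above can be checked with simple arithmetic. The following is a minimal sketch that assumes the exclusions were applied sequentially to non-overlapping groups; the counts are those reported in the text, whereas the dictionary layout and the function name are illustrative only.

```python
# Starting populations and exclusion counts as reported in the text.
# Assumption: exclusions are sequential and mutually exclusive.
cohorts = {
    "YNHHS": {"start": 325_319, "excluded": [76_736, 15_754, 1_544]},
    "UKB": {"start": 42_366, "excluded": [225]},
    "ELSA-Brasil": {"start": 13_739, "excluded": [227, 58]},
}


def final_cohort_size(start, excluded):
    """Subtract every exclusion count from the starting population."""
    return start - sum(excluded)


final_sizes = {
    name: final_cohort_size(c["start"], c["excluded"])
    for name, c in cohorts.items()
}

for name, size in final_sizes.items():
    print(f"{name}: {size}")
```

Under this assumption, the resulting sizes match the analytic cohorts reported in the Results (231 285, 42 141, and 13 454 individuals).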
Study exposure: AI-ECG-based HF risk
The AI-ECG-based risk of HF represented the direct deployment of a previously developed AI-ECG model that detects LVSD (defined as LVEF < 40%) on ECG images,29 without any further development or fine-tuning.
Briefly, the model was developed in the YNHHS using ECG images from patients with paired echocardiograms and validated across six demographically diverse and geographically distinct populations as a cross-sectional association between AI-ECG-based and imaging-based evaluation of LVSD [area under the receiver operating characteristic curve of 0.91, 95% confidence interval (CI) 0.90–0.92]. The model was deployed on ECG images, plotted in standard clinical layout from signal waveform data (examples included in Supplementary data online, Figure S3 and Supplement materials), and three novel ECG layouts (three-rhythm, no-rhythm, and rhythm-on-top formats) that were not encountered during model training (see Supplementary data online, Supplementary Methods and Figure S4). The study exposure was a positive AI-ECG screen, defined as a model output probability >0.1, representing the threshold at which the model achieved a sensitivity of over 90% for the cross-sectional detection of LVSD during internal validation.29 Further information about the application of the model in this study is included in the Supplementary data online, Supplementary Methods.
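The exposure definition can be expressed as a simple thresholding rule. The sketch below is illustrative and is not the study code: the function name and the input validation are assumptions, and only the 0.1 threshold and the strict inequality come from the text.

```python
SCREEN_THRESHOLD = 0.1  # threshold reported in the text


def is_positive_screen(probability, threshold=SCREEN_THRESHOLD):
    """Return True when the model output probability exceeds the threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # The text defines a positive screen as a probability strictly > 0.1.
    return probability > threshold


# Hypothetical model outputs, for illustration only.
example_outputs = [0.02, 0.10, 0.11, 0.64]
screen_status = [is_positive_screen(p) for p in example_outputs]
print(screen_status)
```

A probability exactly equal to 0.1 is treated as a negative screen here because the text specifies a probability greater than 0.1.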
Study outcomes and covariates
The study outcome was new-onset or incident HF. In the YNHHS, this was defined as an inpatient admission with an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) code for HF as the principal hospitalization diagnosis (see Supplementary data online, Table S1). This approach was guided by the over 95% specificity of HF diagnosis codes for a clinical diagnosis of HF, especially when used as the principal discharge diagnosis.40 We pursued the same approach in UKB, where we used linked National Health Service EHR to identify hospitalization records with HF as the principal diagnosis code. In ELSA-Brasil, incident HF was identified by either in-person interview or the annual telephonic surveillance for all hospitalizations, followed by independent medical record review and adjudication of HF hospitalizations by two cardiologists (see Supplementary data online, Supplementary Methods).41
To evaluate the specificity of the HF risk defined by the AI-ECG model, we examined the risk of other cardiovascular conditions, including acute myocardial infarction (AMI), stroke hospitalizations, and all-cause mortality (see Supplementary data online, Table S1). Information about all-cause death was defined by established approaches for each source (see Supplementary data online, Supplementary Methods). A composite outcome of major adverse cardiovascular events (MACE) was defined as any primary HF, AMI, or stroke hospitalization, or death.
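A composite outcome of this kind is usually analysed as the time to the first component event. The sketch below shows one way to derive it; the function name, the input layout (years from the index ECG, `None` when an event did not occur), and the censoring rule are assumptions for illustration rather than the study's implementation.

```python
from typing import Dict, Optional, Tuple


def time_to_composite(
    event_times: Dict[str, Optional[float]], follow_up_end: float
) -> Tuple[float, int]:
    """Return (time, indicator) for the first component event.

    event_times maps each component (e.g. HF, AMI, stroke, death) to its
    time in years from the index ECG, or None if it never occurred.
    Events after follow_up_end are treated as censored.
    """
    observed = [
        t for t in event_times.values()
        if t is not None and t <= follow_up_end
    ]
    if observed:
        return min(observed), 1
    return follow_up_end, 0


# Hypothetical participant: stroke at 1.0 years, AMI at 2.5 years.
mace = time_to_composite(
    {"hf": None, "ami": 2.5, "stroke": 1.0, "death": None}, follow_up_end=4.0
)
print(mace)
```

For this hypothetical participant the composite event occurs at 1.0 years, the time of the earliest component event.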
For all analyses, common demographic covariates were selected across cohorts, including age, sex, race, and ethnicity. Age was defined at the time of the index ECG across all cohorts. We further identified the presence of hypertension and type 2 diabetes mellitus using encounters for these conditions in the YNHHS EHR as well as the EHR records linked with UKB (see Supplementary data online, Table S1). In ELSA-Brasil, information about demographic covariates and baseline hypertension and type 2 diabetes was recorded at the baseline study visit.42 Race, or skin colour, was self-classified based on Brazil’s National Bureau of Statistics definition and classified as White, Black, ‘Pardo’, Asian, or Others.42,43
Study comparator
In YNHHS and ELSA-Brasil, we compared the predictive performance of the AI-ECG model with the pooled cohort equations to prevent HF (PCP-HF), representing sex- and race-specific clinical risk models for estimating incident HF risk, developed and validated using data from seven population-based cohorts.43,44 The PCP-HF risk score includes a combination of several demographic and laboratory-based covariates, including age, body mass index, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, fasting blood glucose, smoking status, antihypertensive medication use, antihyperglycaemic medication use, as well as electrocardiographically defined QRS duration. The PCP-HF input features were defined using the EHR in YNHHS and study visits in ELSA-Brasil (see Supplementary data online, Supplementary Methods).44,45 The laboratory measurements were captured in the first and second study visits in the UKB, whereas the ECGs were recorded in the third and fourth visits. Therefore, the PCP-HF score at the time of the ECG could not be reliably calculated in UKB.
Statistical analysis
Categorical variables were reported as counts and percentages and continuous variables as median and inter-quartile range (IQR). The association of AI-ECG-based risk with incident HF was evaluated in age- and sex-adjusted Cox proportional hazard models with time-to-first HF event as the dependent variable and the AI-ECG-based screen status (positive or negative) as the key independent variable. The proportional hazards assumption was assessed by visually inspecting the Schoenfeld residuals against time, whose symmetric distribution around zero supported the validity of the assumption. Further, to account for the competing risk of death while evaluating incident HF, we used age- and sex-adjusted multi-outcome Fine–Gray subdistribution hazard models.46 We also pursued a pooled cohort analysis, evaluating AI-ECG-based prediction of HF after combining UKB and ELSA-Brasil. The discrimination of AI-ECG and PCP-HF for HF prediction was assessed using Harrell’s C-statistic, which incorporates the time dependence of outcomes and the non-linearity in the association between predictions and time-to-outcomes.47–49 In YNHHS, we further compared the discrimination of AI-ECG against a model that included individual PCP-HF components as covariates. To estimate Harrell’s C-statistic for Cox proportional hazard models that included more than one variable, we performed bootstrapping with 40 repetitions. We also evaluated the net reclassification improvement index (NRI) for AI-ECG over PCP-HF using the 0.1 probability threshold for both risk scores.50 All statistical tests were two-sided, the significance level was set at 0.05, and 95% CIs were reported. All statistical analyses were executed using Python 3.11.2 and R version 4.2.0. Our study follows the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis + AI (TRIPOD + AI) checklist (see Supplementary data online, Table S2).51
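To make the discrimination metric concrete, the following is a minimal pure-Python sketch of Harrell's C-statistic for right-censored data. It is not the study code, which is not shown in the text: the function name is hypothetical, pairs with tied follow-up times are ignored for simplicity, and the toy data are invented for illustration.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter follow-up time than subject j. The pair is concordant when
    subject i also has the higher risk score; tied scores count as 0.5.
    Simplification: pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in years, event indicator, model output.
times = [2.0, 4.0, 5.0, 6.0]
events = [1, 0, 1, 0]
risk_scores = [0.9, 0.2, 0.05, 0.1]

c_statistic = harrell_c(times, events, risk_scores)
print(c_statistic)  # 3 of 4 comparable pairs are concordant -> 0.75
```

In this toy example, the subject with an event at 2.0 years is correctly ranked above all three others, whereas the subject with an event at 5.0 years is ranked below the subject censored at 6.0 years, giving 3 concordant pairs out of 4. A bootstrap confidence interval, as described above, would resample individuals with replacement and recompute the statistic on each resample.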
Results
The YNHHS study cohort comprised 231 285 individuals with a median age of 57 years (IQR 42–70), of whom 130 941 (56.6%) were women and 85 559 (37.0%) were non-White. Over a median follow-up of 4.5 years (IQR 2.5–6.6), 4472 (1.9%) had a primary HF hospitalization, 9645 (4.2%) had a primary HF hospitalization or an LVEF < 50% on a subsequent echocardiogram, and 17 380 (7.5%) died (Table 1).
Characteristic | YNHHS | UKB | ELSA-Brasil |
---|---|---|---|
Number | 231 285 | 42 141 | 13 454 |
Age at ECG, years; median (IQR) | 57.4 [42.1,70.2] | 65 [59,71] | 51 [45,58] |
Female sex, N (%) | 130 941 (56.6) | 21 795 (51.7) | 7348 (54.6) |
Race/ethnicity, N (%) | | | |
White | 145 726 (63.0) | 40 691 (96.6) | 6920 (51.4) |
Black | 36 605 (15.8) | 304 (0.7) | 2130 (15.8) |
Hispanic | 36 298 (15.7) | | |
Asian | 4221 (1.8) | 600 (1.4) | 332 (2.5) |
Other | 2565 (1.1) | 546 (1.3) | 305 (2.3) |
Brazilian ‘Pardo’ | | | 3767 (28.0) |
Missing | 5870 (2.5) | | |
Death, N (%) | 17 380 (7.5) | 346 (0.8) | 229 (1.7) |
Follow-up time, years; median (IQR) | 4.5 [2.5,6.6] | 3.1 [2.1,4.5] | 4.2 [3.7,4.5] |
Positive screens, N (%) | 17 868 (7.7) | 1142 (2.7) | 239 (1.8) |
Hypertension at baseline, N (%) | 110 454 (47.8) | 6126 (14.5) | 4739 (35.3) |
Type 2 diabetes mellitus at baseline, N (%) | 46 607 (20.2) | 1258 (3.0) | 2105 (15.6) |
Obesity at baseline, N (%) | 40 237 (17.4) | 7535 (17.9) | 3045 (22.6) |
Use of antihypertensive drugs at baseline, N (%) | 62 180 (26.9) | 9936 (23.9) | 3640 (27.1) |
Use of antihyperglycaemic drugs at baseline, N (%) | 39 608 (17.1) | 321 (0.8) | 1072 (8.0) |
Primary HF hospitalization during follow-up, N (%) | 4472 (1.9) | 46 (0.1) | 31 (0.2) |
Primary HF hospitalization or an echocardiogram with LVEF < 50% during follow-up, N (%) | 9645 (4.2) | | |
Any HF hospitalization during follow-up, N (%) | 19 004 (8.2) | 231 (0.5) | |
Any HF hospitalization or an echocardiogram with LVEF < 50% during follow-up, N (%) | 21 849 (9.4) | | |
Primary AMI hospitalization during follow-up, N (%) | 288 (0.1) | 208 (0.5) | 60 (0.4) |
Primary stroke hospitalization during follow-up, N (%) | 3688 (1.6) | 210 (0.5) | 59 (0.4) |
Major adverse cardiovascular events during follow-up, N (%) | 24 059 (10.4) | 768 (1.8) | 338 (2.5) |
AMI, acute myocardial infarction; ECG, electrocardiogram; ELSA-Brasil, Brazilian Longitudinal Study of Adult Health; HF, heart failure; IQR, inter-quartile range; UKB, UK Biobank; YNHHS, Yale New Haven Health System.
In UKB, 42 141 included participants had a median age of 65 years (IQR 59–71), 21 795 (51.7%) were women, and 40 691 (96.6%) were of White race. Over a median follow-up of 3.1 years (IQR 2.1–4.5), 46 (0.1%) had an HF hospitalization event and 346 (0.8%) died (Table 1).
In ELSA-Brasil, the median age of the 13 454 included participants was 51 years (IQR 45–58) and 7348 (54.6%) were women. There were 6920 (51.4%) adults identifying as White, 2130 (15.8%) as Black, and 3767 (28.0%) as ‘Pardo’. A total of 31 people developed HF, and 229 died over a median follow-up of 4.2 years (IQR 3.7–4.5).
Predicting the risk of incident HF
At YNHHS, 17 868 (7.7%) patients screened positive based on the AI model applied to ECG images. A positive screen was associated with an over 6.5-fold higher risk of incident HF [hazard ratio (HR) 6.51 (95% CI, 6.11–6.93); Table 2]. Compared with patients with a negative screen, patients with a positive AI-ECG screen had a nearly four-fold higher risk of incident HF after accounting for differences in age and sex [adjusted HR (aHR) 3.88 (95% CI, 3.63–4.14)], and after additionally accounting for the baseline HF risk factors of hypertension and diabetes [aHR 3.73 (95% CI, 3.50–3.99)]. Accounting for the competing risk of death, in addition to age and sex, a positive screen was associated with an aHR of 3.54 (95% CI, 3.30–3.79) for incident HF (Table 2).
Table 2. Artificial intelligence-enhanced electrocardiogram model performance [hazard ratio (95% confidence interval)] for predicting heart failure risk
Model | Covariates | YNHHS | UKB | ELSA-Brasil |
---|---|---|---|---|
Cox proportional hazard model | Positive screen | 6.51 (6.11–6.93) | 18.33 (9.90–33.97) | 32.06 (15.36–66.92) |
Cox proportional hazard model | Positive screen + age + sex | 3.88 (3.63–4.14) | 12.85 (6.87–24.02) | 23.50 (11.09–49.81) |
Cox proportional hazard model | Positive screen + age + sex + HTN + T2DM | 3.73 (3.50–3.99) | 11.36 (6.04–21.36) | 17.36 (8.55–35.26) |
Fine–Gray subdistribution hazard model | Positive screen + age + sex + competing risk of death | 3.54 (3.30–3.79) | 12.70 (6.70–24.07) | 22.79 (10.21–50.89) |
Fine–Gray subdistribution hazard model | Positive screen + age + sex + HTN + T2DM + competing risk of death | 3.41 (3.18–3.65) | 11.14 (5.72–21.70) | 17.96 (8.14–39.61) |
ELSA-Brasil, Brazilian Longitudinal Study of Adult Health; HTN, hypertension; T2DM, type-2 diabetes mellitus; UKB, UK Biobank; YNHHS, Yale New Haven Health System.
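The width of each confidence interval in Table 2 reflects the precision of the underlying estimate. As a hedged illustration, and assuming the reported intervals are Wald-type intervals that are symmetric on the log-hazard scale (the usual form for Cox models, but not stated in the text), the standard error of the log hazard ratio can be recovered from the interval bounds. The function names below are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# Age- and sex-adjusted HRs with 95% CIs, as reported in Table 2:
# (point estimate, lower bound, upper bound)
reported = {
    "YNHHS": (3.88, 3.63, 4.14),
    "UKB": (12.85, 6.87, 24.02),
    "ELSA-Brasil": (23.50, 11.09, 49.81),
}


def log_hr_se(lower, upper, z=Z_95):
    """Standard error of log(HR), assuming a symmetric log-scale interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def geometric_midpoint(lower, upper):
    """Midpoint of the interval on the log scale, back-transformed."""
    return math.sqrt(lower * upper)


summary = {
    name: {
        "se_log_hr": log_hr_se(lower, upper),
        "midpoint": geometric_midpoint(lower, upper),
    }
    for name, (_, lower, upper) in reported.items()
}

for name, (hr, _, _) in reported.items():
    s = summary[name]
    print(f"{name}: HR {hr}, midpoint {s['midpoint']:.2f}, SE {s['se_log_hr']:.3f}")
```

Under this assumption, the back-transformed midpoints agree closely with the reported point estimates, and the larger standard errors in UKB and ELSA-Brasil are consistent with the much smaller number of HF events in those cohorts (46 and 31, compared with 4472 in YNHHS).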
The association of a positive screen with an elevated risk of HF was noted across demographic subgroups of age, sex, race, and ethnicity (see Supplementary data online, Table S3). Notably, a positive screen portended an eight-fold higher risk of incident HF in patients <65 years of age at the time of ECG [aHR 8.00 (95% CI, 7.12–8.99)].
The model performance was consistent in sensitivity analyses with subsets of the population with (i) ≥3 years of follow-up in the EHR [aHR, 3.75 (95% CI, 3.46–4.08)] and (ii) ≥1 encounter every 2 years [aHR, 3.49 (95% CI, 3.25–3.75)]. Further, the patterns were consistent when a 3-month post-ECG blanking period was implemented [aHR, 3.45 (95% CI, 3.21–3.71); Supplementary data online, Table S4 and Figure S5], when a random ECG was chosen instead of the first ECG [aHR, 3.76 (95% CI, 3.38–4.18); Supplementary data online, Figure S6], and across different definitions of HF (see Supplementary data online, Tables S5 and S6). The model output probability was also associated with an increased risk of new-onset HF when evaluated on ECG images plotted in novel formats not encountered during model training [three-rhythm format: aHR 3.20 (95% CI, 3.03–3.41); no-rhythm format: aHR 2.84 (95% CI, 2.67–3.00); and rhythm-on-top format: aHR 3.17 (95% CI, 2.98–3.37); Supplementary data online, Table S7].
In UKB, 1142 (2.7%) participants screened positive with the AI-ECG model. A positive AI-ECG screen portended an 18-fold higher hazard for incident HF [HR 18.33 (95% CI, 9.90–33.97)]. Accounting for age, sex, baseline hypertension, and type 2 diabetes, screen-positive participants had an 11-fold higher risk of HF [aHR 11.36 (95% CI, 6.04–21.36); Table 2]. Further, this risk was even higher in individuals below 65 years of age, with an age- and sex-adjusted HR of 25.63 (95% CI, 6.34–103.61; Supplementary data online, Table S3). After accounting for the competing risk of death, a positive screen was associated with a nearly 13-fold higher risk of incident HF (aHR 12.70; 95% CI, 6.70–24.07).
In ELSA-Brasil, 239 (1.8%) participants had a positive AI-ECG screen, which was associated with a 24-fold higher risk of incident HF [age- and sex-adjusted HR 23.50 (95% CI, 11.09–49.81)] compared with screen-negative participants. This association was consistent even after accounting for the competing risk of death (aHR 22.79; 95% CI, 10.21–50.89; Table 2). In the pooled UKB and ELSA-Brasil cohort, a positive AI-ECG screen portended an over 17-fold higher hazard for new-onset HF [age- and sex-adjusted HR 17.07 (95% CI, 10.54–27.65); Supplementary data online, Table S8 and Figure S7].
Hazard across model probability outputs
In the YNHHS cohort, each 0.1 increment in the model output probability portended a 36% higher hazard of an incident primary HF hospitalization [aHR 1.36 (95% CI, 1.35–1.38)]. Compared with screen-negative patients, screen-positive patients with a model output probability of 0.1–0.5 and of 0.5–1 had an over three-fold and an over seven-fold higher risk of incident HF, respectively [aHR 3.31 (95% CI, 3.08–3.55) and 7.11 (95% CI, 6.42–7.88); Figure 2]. Higher model probabilities were progressively associated with a higher risk of incident HF across various probability bins (see Supplementary data online, Table S9).

Age- and sex-adjusted cumulative hazard curves for new-onset heart failure across bins of model output probability in (A) Yale New Haven Health System, (B) UK Biobank, and (C) ELSA-Brasil. Across all study cohorts, higher model output probabilities are progressively associated with higher hazard for new-onset heart failure. AI-ECG, artificial intelligence-enhanced electrocardiogram; ELSA-Brasil, Brazilian Longitudinal Study of Adult Health; YNHHS, Yale New Haven Health System
These patterns were replicated across both UKB and ELSA-Brasil, with a 0.1 increase in model probability associated with 81% and 93% higher risk of incident HF [aHR 1.81 (95% CI, 1.58–2.07) and aHR 1.93 (95% CI, 1.68–2.21), respectively, Supplementary data online, Table S9]. A higher threshold for defining a screen-positive ECG consistently defined a higher hazard of incident HF (see Supplementary data online, Table S10).
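As a reading aid for the per-increment estimates, the sketch below shows how a hazard ratio reported per 0.1 increase in model probability would scale to other increments under the log-linear assumption of a Cox model. The function name `scale_hazard_ratio` is hypothetical and is not part of the study code. The extrapolation assumes a constant effect on the log-hazard scale across the full probability range, which the binned estimates suggest holds only approximately.

```python
import math


def scale_hazard_ratio(hr_per_step: float, delta: float, step: float = 0.1) -> float:
    """Hazard ratio implied for a change of `delta`, given the HR per `step`.

    Assumption: a Cox model with a linear effect on the log-hazard scale,
    so that HR(delta) = HR(step) ** (delta / step).
    """
    if hr_per_step <= 0 or step <= 0:
        raise ValueError("hr_per_step and step must be positive")
    beta_per_unit = math.log(hr_per_step) / step  # log-hazard per unit of probability
    return math.exp(beta_per_unit * delta)


# Per-0.1 adjusted hazard ratios reported in the text for each cohort.
reported_hr_per_0_1 = {"YNHHS": 1.36, "UKB": 1.81, "ELSA-Brasil": 1.93}

for cohort, hr in reported_hr_per_0_1.items():
    implied = scale_hazard_ratio(hr, delta=0.3)
    print(f"{cohort}: implied HR for a 0.3 increase = {implied:.2f}")
```

Such extrapolated values are illustrative only; the binned hazard ratios reported in the text and in Supplementary Table S9 remain the estimates of record.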
Prediction of other cardiovascular outcomes
A positive AI-ECG screen was also associated with a more modest but still elevated risk of AMI [aHR 1.44 (95% CI, 1.04–2.00)], all-cause death [aHR 1.19 (95% CI, 1.15–1.24)], or MACE, defined by AMI, stroke, HF, or death [aHR 2.10 (95% CI, 2.04–2.17); Supplementary data online, Table S6] in the YNHHS. Similarly, in the UKB, a positive AI-ECG screen was associated with an elevated risk of AMI and stroke [aHRs 3.16 (95% CI, 1.98–5.02) and 2.30 (95% CI, 1.36–3.9), respectively], all-cause death [aHR 2.13 (95% CI, 1.41–3.24)], and MACE [aHR 2.79 (95% CI, 2.17–3.6); Supplementary data online, Table S6]. This pattern was replicated in ELSA-Brasil across all non-HF cardiovascular outcomes of AMI [aHR 3.53 (95% CI, 1.4–8.85)], stroke [aHR 5.74 (95% CI, 2.59–12.72)], death [aHR 3.64 (95% CI, 2.27–5.83)], and MACE [aHR 4.04 (95% CI, 2.77–5.89)], with a comparatively smaller increase in the risk of non-HF cardiovascular events than in the risk of HF.
Comparison with pooled cohort equations to prevent heart failure
In YNHHS, the AI-ECG model had a model discrimination based on Harrell’s C-statistic of 0.718 (0.697–0.738), compared with 0.601 (0.581–0.621) for PCP-HF score (P < .001; Table 3) and 0.688 (0.670–0.707) for a model with all PCP-HF components as covariates (P = .03; Supplementary data online, Table S11). In UKB and ELSA-Brasil, the AI-ECG’s model discrimination for incident HF was 0.769 (95% CI, 0.670–0.867) and 0.810 (95% CI, 0.714–0.907), respectively. In ELSA-Brasil, the discrimination was comparable to that for PCP-HF (P = .89). Across all cohorts, the AI-ECG model discrimination for new-onset HF was consistent when evaluated on novel image formats not seen during model training (see Supplementary data online, Table S12). In both YNHHS and ELSA-Brasil, incorporating AI-ECG model probability with PCP-HF yielded a statistically significant improvement in discrimination over the use of PCP-HF alone [YNHHS: 0.147 (95% CI, 0.124–0.170); ELSA-Brasil: 0.106 (95% CI, 0.030–0.181); Table 3].
Comparison of discrimination [Harrell’s C-statistic (95% confidence interval)] for artificial intelligence-enhanced electrocardiogram model output probability and pooled cohort equations to prevent heart failure for incident heart failure
Covariates | YNHHS: Harrell’s C-statistic | YNHHS: marginal difference over Harrell’s C-statistic for PCP-HF | YNHHS: P-value | UKB: Harrell’s C-statistic | ELSA-Brasil: Harrell’s C-statistic | ELSA-Brasil: marginal difference over Harrell’s C-statistic for PCP-HF | ELSA-Brasil: P-value
---|---|---|---|---|---|---|---
PCP-HF | 0.601 (0.581–0.621) | | | | 0.821 (0.748–0.893) | |
AI-ECG model output probability | 0.718 (0.697–0.738) | 0.117 (0.086–0.147) | <0.001 | 0.769 (0.670–0.867) | 0.810 (0.714–0.907) | −0.010 (−0.149–0.130) | 0.89
AI-ECG model output probability + age + sex | 0.724 (0.705–0.743) | 0.122 (0.098–0.148) | <0.001 | 0.832 (0.756–0.910) | 0.881 (0.820–0.942) | 0.060 (−0.039–0.159) | 0.23
AI-ECG model output probability + PCP-HF | 0.748 (0.730–0.766) | 0.147 (0.124–0.170) | <0.001 | | 0.926 (0.886–0.966) | 0.106 (0.030–0.181) | 0.006
ELSA-Brasil, Brazilian Longitudinal Study of Adult Health; PCP-HF, pooled cohort equations to prevent heart failure; UKB, UK Biobank; YNHHS, Yale New Haven Health System.
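To make the discrimination metric concrete, the following minimal sketch computes Harrell’s C-statistic on a small, made-up data set. It is not the study’s analysis code: the function name `harrell_c` and all data values are hypothetical, and the sketch uses the simplest pairing rule (a pair is comparable when the individual with the shorter follow-up time had an event; tied risk scores count as half-concordant).

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], scores: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the individual with the
    earlier event also has the higher predicted risk score."""
    if not (len(times) == len(events) == len(scores)):
        raise ValueError("times, events, and scores must have equal length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # i had the event while j was still event-free
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, event flag (1 = HF), risk score.
toy_times = [2.0, 4.0, 5.0, 6.0, 7.0]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [0.80, 0.30, 0.40, 0.20, 0.05]

print(f"Harrell's C on toy data: {harrell_c(toy_times, toy_events, toy_scores):.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The confidence intervals and between-model comparisons in the table require additional resampling or variance estimation that this sketch does not attempt.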
Using AI-ECG improved the reclassification of individuals for the risk of new-onset HF across study cohorts. The NRI for AI-ECG compared with the PCP-HF score was 21.9% (95% CI, 18.0–27.0) in YNHHS and 35.6% (95% CI, 7.9–62.1) in ELSA-Brasil (see Supplementary data online, Table S13).
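For readers unfamiliar with the net reclassification improvement (NRI), the sketch below illustrates a simple two-category version on made-up data. The risk categories and the handling of censoring used in the study are not described in this section, so everything here is an assumption for illustration: the function name `two_category_nri`, the single high/low risk split, the binary event status without censoring, and all data values are hypothetical.

```python
from typing import Sequence, Tuple


def two_category_nri(
    old_high: Sequence[int], new_high: Sequence[int], event: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (event NRI, non-event NRI, total NRI) for a high/low risk split.

    Event NRI     = P(moved up | event)       - P(moved down | event)
    Non-event NRI = P(moved down | non-event) - P(moved up | non-event)
    """
    if not (len(old_high) == len(new_high) == len(event)):
        raise ValueError("inputs must have equal length")
    up_ev = down_ev = n_ev = 0
    up_non = down_non = n_non = 0
    for old, new, ev in zip(old_high, new_high, event):
        moved_up = bool(new) and not bool(old)
        moved_down = bool(old) and not bool(new)
        if ev:
            n_ev += 1
            up_ev += moved_up
            down_ev += moved_down
        else:
            n_non += 1
            up_non += moved_up
            down_non += moved_down
    if n_ev == 0 or n_non == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_ev - down_ev) / n_ev
    nri_nonevents = (down_non - up_non) / n_non
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Hypothetical toy data: 4 individuals with events, 6 without.
toy_event = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_old_high = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]  # reference score: high risk?
toy_new_high = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # new score: high risk?

ev_nri, non_nri, total_nri = two_category_nri(toy_old_high, toy_new_high, toy_event)
print(f"event NRI = {ev_nri:.3f}, non-event NRI = {non_nri:.3f}, total = {total_nri:.3f}")
```

A positive total indicates that the new score moves individuals with events towards the higher risk category and those without events towards the lower one more often than the reverse.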
Discussion
Across three clinically and geographically distinct cohorts, a deep learning model developed to define signatures of LVSD on ECG images represents a non-invasive and accessible digital biomarker for predicting HF risk from ECG images as the sole input. A positive AI-ECG screen was associated with a 4- to 24-fold higher hazard of incident HF across different populations than a negative screen, consistent after accounting for competing risk of death. The model was evaluated in demographically and racially diverse cohorts, with high predictive performance across subgroups of age, sex, race, and ethnicity. We observed a progressive increase in risk based on the left ventricular dysfunction probability on baseline ECG, such that graded increments in the model output probability were associated with higher hazards of incident HF. The model performance was consistent when evaluated on ECG image layouts not encountered during training. A positive screen was also relatively specific for HF risk, with a 2-to-6-fold higher risk of HF than the risk of other cardiovascular outcomes, such as MACE and death. In addition to being substantially easier to use than PCP-HF, which uses an ECG-derived measure in addition to other laboratory and clinical testing, the AI-ECG approach improved discrimination of HF risk over PCP-HF (Structured Graphical Abstract). Therefore, deep learning-enhanced interpretation of ECG images can represent a scalable and reliable strategy for risk stratification for incident HF.
The ability to identify individuals at the highest risk of HF is crucial given the high clinical and economic impact of HF, along with the availability of evidence-based risk-mitigating therapies.6–9 Thus, there has been substantial investigation into defining risk of HF.10,18–20 Notably, blood-based biomarkers have been a key focus of investigation, with an NT-proBNP of 125 pg/mL associated with a 2.4-fold higher risk of incident HF.18,51 This is further potentiated with the addition of high-sensitivity cardiac troponin.18,52 However, the application of these serum-based assays for HF risk is limited by the need for blood draws and laboratory testing.15,53 Similarly, PCP-HF and other proposed HF risk scores such as the Atherosclerosis Risk in Communities HF score or the Health ABC model require detailed clinical history, physical examination, and testing for routine and specialized laboratory measures,8,11,16,22,54 such that these are not consistently recommended in clinical practice guidelines.
While prior work in developing neural network-based models has focused on the detection of prevalent systolic dysfunction or HF,29,31 our study suggests the role of an AI-ECG model for the prediction of HF risk. Previous studies have demonstrated that a positive AI-ECG screen for LVSD is associated with future LVSD risk in those with preserved ejection fraction at baseline.5,11,16 However, these studies included a highly selected population of patients with clinical indications for serial echocardiography and performed predictive assessments in the same cohorts in which the models were developed. In contrast, our study included all adults who underwent an ECG without any prior left ventricular dysfunction or HF, representing a less selective population. We also evaluated the model in two geographically distinct cohorts of community-dwelling adults with protocolized ECG assessments and longitudinal follow-up, a scenario representing the future application of AI-ECG-driven HF risk stratification. Further, our study defined new-onset HF as the outcome instead of future LVSD, increasing the generalizability to broader literature where the outcome for prediction and prevention has been clinical HF.54 In other work, we have found that the ECG image-based LVSD model identifies those with subclinical markers of left ventricular dysfunction, including abnormal left ventricular strain and diastolic dysfunction.55 Therefore, we defined our approach using this strong mechanistic foundation hypothesizing that our AI-ECG model for LVSD would predict future HF. 
We posit that this use of a stringent imaging-based clinical definition for model development is more resilient to coding variations than models where the occurrence of HF is used to define cases and controls.39,56 The use of incident HF as an outcome for model development may also be challenged by the heterogeneous aetiology of HF outcomes, the competing risk of other events, and potential confounders encoded as a predictive signal in AI-ECG. Nonetheless, a head-to-head comparison of both development approaches is warranted.
Given that ECGs are most commonly available to clinicians and patients as digital images or printouts, the application of conventional signal-based AI-ECG models has been limited by access to raw input ECG voltage data.29 The image-based approach can use interoperable digital images or smartphone photographs of printouts, independent of the ECG format.14 This represents a scalable strategy for deployment in the health system and community settings, without the need to modify technical infrastructure or clinical workflows. Thus, our AI-based approach can enable opportunistic risk stratification for patients undergoing clinical ECGs and facilitate population-based HF risk assessment by serving as a digital biomarker for HF risk.
The predictive performance of our model was high across demographic subgroups, with the highest predictive power in younger individuals across cohorts. This suggests an opportunity for proactive HF screening in younger individuals, followed by implementing risk-mitigating strategies. Further, the AI-ECG probability was well calibrated to the risk of new-onset HF, such that higher AI-ECG scores were associated with progressively elevated risk of HF. This dose dependence represents ideal characteristics for a predictive biomarker, enabling graded risk stratification and proactive mitigation in those at the highest predicted risk.54 Given this dose-dependent association of AI-ECG probability with predicted risk, future work could explore if longitudinal changes in AI-ECG scores in serial ECGs can capture a dynamic underlying clinical risk profile.55
Our study has certain limitations that merit consideration. First, outcomes in the EHR are prone to inconsistent capture due to site-specific variability in coding practices,57,58 though we opted for a specific definition of HF. The association of a positive AI-ECG screen with the risk of HF across YNHHS and population-based cohorts at UKB and ELSA-Brasil indicates that the model captures a predictive signature of disease across a spectrum of disease phenotypes. Moreover, the patterns were consistent across several sensitivity analyses that defined these conditions differently in YNHHS and UKB. Further, in ELSA-Brasil, HF events were expert-adjudicated using established clinical criteria. Second, despite an integrated health system with broad geographic coverage, some HF and other outcome events may have occurred outside YNHHS, potentially resulting in incomplete capture of outcomes. This may be reflected in the lower risk of HF and other events in YNHHS compared with the UKB and ELSA-Brasil, where the follow-up was consistent and protocolized. Therefore, the risk of HF observed for patients in the YNHHS likely represents the lower bound of their actual HF risk, based on the findings in the prospective cohorts. Moreover, the patients who underwent ECG testing in the YNHHS represent a selected cohort, with the potential for an unmeasured elevated risk profile even among those who underwent a clinical ECG but had a negative AI-ECG screen. This is distinct from the population-based cohorts where ECGs were protocolized. Third, calculating the PCP-HF risk score in UKB was limited by the concurrent availability of laboratory metrics and the ECG-derived QRS duration. This need for extensive clinical evaluation further underscores the impracticality of PCP-HF as a scalable risk stratification strategy for HF. Fourth, given the lack of NT-proBNP assessments in UKB and ELSA-Brasil, we could not evaluate NT-proBNP as a comparator in this study.
In YNHHS, since NT-proBNP is rarely performed as a part of primary prevention and is typically ordered for diagnosis of symptoms suggestive of HF, its use could incorporate substantial selection bias. Despite the non-invasive evaluation of AI-ECG risk on ECG images directly on web-based software representing a convenient strategy over blood-based biomarkers, a future head-to-head assessment of AI-ECG and NT-proBNP as predictors for HF is warranted. Fifth, our study evaluated AI-ECG as a one-time risk stratification tool and did not evaluate whether changes in AI-ECG probability over time define the differential risk of HF. We were limited by the lack of serial ECG assessments in the UKB and ELSA-Brasil and sought not to pursue this analysis in the highly selected group of individuals at YNHHS who underwent serial clinical ECGs. In future work, we will explore the longitudinal relationship of serial AI-ECG scores with a dynamic HF risk profile. Sixth, while the model performance was consistent across novel image formats not seen during model training, prospective evaluation of the model using real-world ECG photographs and scanned images may be necessary before wider implementation in the community. Finally, while the study finds a high risk of subsequent HF, it is unclear whether the risk of HF identified by AI-ECG is modifiable. Nevertheless, these observations suggest focusing on targeted identification and management of known HF risk factors.
Conclusions
An AI model applied to images of 12-lead ECGs can identify those at elevated risk of HF across multinational cohorts. As a digital biomarker of HF risk that requires just an ECG image, this AI-ECG approach can enable scalable and efficient screening for HF risk.
Supplementary data
Supplementary data are available at European Heart Journal online.
Declarations
Disclosure of Interest
R.K. is an Associate Editor of JAMA. R.K. and V.S. are the co-inventors of U.S. Provisional Patent Application No. 63/346,610, ‘Articles and methods for format-independent detection of hidden cardiovascular disease from printed electrocardiographic images using deep learning’ and are co-founders of Ensight-AI. R.K. receives support from the National Heart, Lung, and Blood Institute of the National Institutes of Health (under awards R01HL167858 and K23HL153775) and the Doris Duke Charitable Foundation (under award 2022060). He receives support from the Blavatnik Foundation through the Blavatnik Fund for Innovation at Yale. He also receives research support, through Yale, from Bristol-Myers Squibb, BridgeBio, and Novo Nordisk. In addition to 63/346610, R.K. is a co-inventor of U.S. Provisional Patent Applications 63/177117, 63/428569, and 63/484426. R.K. and E.K.O. are co-founders of Evidence2Health, a precision health platform to improve evidence-based cardiovascular care. E.K.O. is a co-inventor of the US Patent Applications 63/508315 and 63/177117 and has been a consultant to Caristo Diagnostics Ltd (all outside the current work). H.M.K. works under contract with the Centers for Medicare & Medicaid Services to support quality measurement programmes; was a recipient of a research grant from Johnson & Johnson, through Yale University, to support clinical trial data sharing; was a recipient of a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; receives payment from the Arnold & Porter Law Firm for work related to the Sanofi clopidogrel litigation, from the Martin Baughman Law Firm for work related to the Cook Celect IVC filter litigation, and from the Siegfried and Jensen Law Firm for work related to Vioxx litigation; chairs a Cardiac Scientific Advisory Board for United Health; was a member of the IBM Watson Health Life Sciences Board; is a member of the Advisory Board for Element Science, the Advisory Board for Facebook, and the Physician Advisory Board for Aetna; and is the co-founder of Hugo Health, a personal health information platform, and co-founder of Refactor Health, a healthcare AI-augmented data management company. A.L.P.R. was supported in part by the National Council for Scientific and Technological Development—CNPq (grants 465518/2014-1, 310790/2021-2, 409604/2022-4, and 445011/2023-8). F.W.A. was supported by Heart4Data, which received funding from the Dutch Heart Foundation and ZonMw (2021-B015), and UCL Hospitals NIHR Biomedical Research Centre.
Data Availability
The data from the YNHHS represent protected health information. To protect patient privacy, the Yale Institutional Review Board does not allow the sharing of these data. Data from the UKB and the Brazilian Longitudinal Study of Adult Health are available for research to licensed users. The model is publicly accessible, and programming code for generating key results is available from the authors on request.
Funding
R.K. was supported by the National Heart, Lung, and Blood Institute (grant nos. R01HL167858 and K23HL153775), National Institute on Aging (grant no. R01AG089981), and the Doris Duke Charitable Foundation (grant no. 2022060). E.K.O. was supported by the National Heart, Lung, and Blood Institute (grant no. F32HL170592).
Ethical Approval
The Yale Institutional Review Board approved the study protocol and waived the need for informed consent as the study involves secondary analysis of pre-existing data.
Pre-registered Clinical Trial Number
Not applicable.