J Tromp, DN Bauer, BL Claggett, M Frost, MB Iversen, N Prasad, M Petrie, MG Larson, JA Ezekowitz, SD Solomon, A prospective validation of a deep learning-based automated workflow for the interpretation of the echocardiogram, European Heart Journal - Cardiovascular Imaging, Volume 23, Issue Supplement_1, February 2022, jeab289.002, https://doi.org/10.1093/ehjci/jeab289.002
Abstract
Type of funding sources: Private company. Main funding source(s): Us2.ai
Background. Deep learning can automate the interpretation of medical imaging tests. This study aimed to prospectively assess the interchangeability of deep learning algorithms with expert human measurements for interpreting echocardiographic studies, the primary method for assessing cardiac structure and function.
Methods. We compared a deep learning interpretation of 23 echocardiographic parameters—including cardiac volumes, ejection fraction, and Doppler measurements—with three repeated measurements by core lab human experts in a prospective study for submission to the United States Food and Drug Administration (FDA). The primary outcome metric was the individual equivalence coefficient (IEC), which compares the disagreement between deep learning and human readers relative to the disagreement among human readers. The pre-determined non-inferiority criterion was 0.25 for the upper bound of the 95% confidence interval (CI). Secondary outcomes included measures of agreement, including the mean absolute deviation.
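To make the primary metric concrete, the sketch below computes an IEC point estimate for a single parameter. The abstract does not state the formula, so this is an assumption: it uses one form found in the individual-equivalence literature, in which the excess of the mean squared automated-versus-human difference over the mean squared human-versus-human difference is scaled by half of the latter. The function name `iec` and the toy values are hypothetical, and the study's confidence-interval procedure is not reproduced here.

```python
from itertools import combinations


def iec(auto, readers):
    """Illustrative IEC point estimate for one echocardiographic parameter.

    auto    : one automated measurement per study
    readers : per study, the repeated human measurements (three in the abstract)

    Assumed form (not taken from the abstract):
        (MSD_auto_vs_human - MSD_human_vs_human) / (MSD_human_vs_human / 2)
    """
    # Squared differences between the automated value and every human reading
    auto_vs_human = [(a - h) ** 2 for a, hs in zip(auto, readers) for h in hs]
    # Squared differences between every pair of human readings within a study
    human_vs_human = [
        (h1 - h2) ** 2 for hs in readers for h1, h2 in combinations(hs, 2)
    ]
    msd_auto = sum(auto_vs_human) / len(auto_vs_human)
    msd_human = sum(human_vs_human) / len(human_vs_human)
    return (msd_auto - msd_human) / (msd_human / 2)


# Toy ejection-fraction values (%), invented purely for illustration
auto = [60.0, 55.0]
readers = [[58.0, 60.0, 62.0], [53.0, 55.0, 57.0]]
estimate = iec(auto, readers)
print(round(estimate, 3))
```

Under this form, a value below 0 means the automated measurement sits closer to the human readings than the human readings sit to one another. The non-inferiority test described above would additionally require the upper bound of a 95% CI around this estimate to fall below 0.25.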
Results. We included 602 studies from 600 participants (421 with heart failure, 179 controls, 69% women) with a mean age of 57 ± 16 years. The point estimates of IEC were all <0, indicating that the disagreement between the deep learning and human measurements was lower than the disagreement among the three core lab readers, and the upper bounds of the 95% CIs of the IECs all fell below the prespecified success criterion of 0.25. Secondary endpoints showed good agreement of automated with human expert measurements (Figure), with comparable or lower mean absolute deviations between automated and human experts relative to the mean absolute deviation among human experts.

Abstract Figure.