Artificial intelligence in medical imaging: A radiomic guide to precision phenotyping of cardiovascular disease
Evangelos K. Oikonomou, Musib Siddique, Charalambos Antoniades
Cardiovascular Research, Volume 116, Issue 13, 1 November 2020, Pages 2040–2054, https://doi.org/10.1093/cvr/cvaa021
Abstract
Rapid technological advances in non-invasive imaging, coupled with the availability of large data sets and the expansion of computational models and power, have revolutionized the role of imaging in medicine. Non-invasive imaging is the pillar of modern cardiovascular diagnostics, with modalities such as cardiac computed tomography (CT) now recognized as first-line options for cardiovascular risk stratification and the assessment of stable or even unstable patients. To date, cardiovascular imaging has lagged behind other fields, such as oncology, in the clinical translation of artificial intelligence (AI)-based approaches. We hereby review the current status of AI in non-invasive cardiovascular imaging, using cardiac CT as a running example of how novel machine learning (ML)-based radiomic approaches can improve clinical care. The integration of ML, deep learning, and radiomic methods has revealed direct links between tissue imaging phenotyping and tissue biology, with important clinical implications. More specifically, we discuss the current evidence, strengths, limitations, and future directions for AI in cardiac imaging and CT, as well as lessons that can be learned from other areas. Finally, we propose a scientific framework in order to ensure the clinical and scientific validity of future studies in this novel, yet highly promising field. Still in its infancy, AI-based cardiovascular imaging has a lot to offer to both the patients and their doctors as it catalyzes the transition towards a more precise phenotyping of cardiovascular disease.

1. Introduction
Modern medicine is characterized by the generation of a vast amount of data, including data from high-resolution imaging modalities. With the amount of medical information increasing at an unprecedented rate, medical professionals are turning to novel technologies in order to interpret these large volumes of data and maximize efficiency while ensuring patient safety and well-being.1 The arrival of artificial intelligence (AI) and its application in medicine has brought hope that it can improve health outcomes by supplementing human intelligence and by maximizing the diagnostic and prognostic value of existing tests while minimizing physician burden.1
Non-invasive imaging is the pillar of modern cardiovascular diagnostics, with modalities such as cardiac computed tomography (CT) now recognized as first-line options for cardiovascular risk stratification and the assessment of stable and unstable cardiovascular patients.2,3 Among all routinely available diagnostic tests, coronary CT angiography (CCTA) has the highest sensitivity (95–99%) for detection of coronary artery disease (CAD) (defined as stenosis ≥50% on invasive coronary angiography), with a specificity of 64–83%.4 The clinical benefit associated with the use of CCTA to diagnose stable CAD and guide downstream decision-making has been demonstrated in two large clinical trials, namely PROMISE (Prospective Multicenter Imaging Study for Evaluation of Chest Pain) and SCOT-HEART (SCOTtish computed tomography of the HEART).5,6 These randomized controlled trials have been instrumental in establishing CCTA as a first-line diagnostic test, as highlighted in recent national and international guidelines.2,3
Thanks to its role as a first-line diagnostic test, the availability of large data sets and registries, and significant advances in radiomic analysis methods and machine learning (ML) systems, cardiac CT offers an optimal platform to bridge AI with clinical medicine. The basic principle behind these novel ‘radiomic’ approaches is that CT scans are more than images; they are data.7 In other words, the traditional grey-scale images of cardiac CT scans can now be represented using complex mathematical formulae that enable the characterization of features and detection of patterns invisible to the naked eye.
The aim of this review is to provide an overview of AI in modern cardiac CT, and its dual implications in clinical care and scientific research and discovery. We first discuss key terms in the field of AI including ‘big data’, ‘machine learning’ (ML), ‘deep learning’, and ‘radiomics’. Next, we review the current evidence, strengths, limitations, and future directions of AI in non-invasive cardiovascular imaging, using cardiac CT as a running example of the many challenges and opportunities. Finally, we propose a scientific framework in order to ensure the clinical and scientific validity of future studies in this novel, yet highly promising and exciting field.
2. Artificial intelligence, machine learning, and big data
2.1 Artificial intelligence vs. machine learning
The terms ‘artificial intelligence’, ‘machine learning’, and ‘big data’, although distinct, are often mistakenly used interchangeably. Artificial intelligence (AI), also known as ‘machine intelligence’, is a broad term that refers to the ability of a machine or computational programme to execute tasks that are characteristic of human intelligence, such as pattern recognition and problem-solving.8 While the concept of AI is not new,9 modern AI has benefited enormously from an increase in available computational power and large data sets that can be used to train these systems.10 In the field of cardiac CT imaging alone, it is currently estimated that >42 000 cardiac CT scans are performed every year in the UK, a number expected to reach 350 000 if the National Institute for Health and Care Excellence (NICE) guidelines were to be fully implemented.2,11 The process by which an AI system autonomously acquires knowledge by identifying and extracting patterns among a group of observations (a ‘data set’), on the other hand, is called machine learning (ML).12,13
2.2 Big data
‘Big data’ is a term frequently used to describe vast amounts of collected data, whether that is genomic data from large biobanks or numerous CT scans from electronic health record archives and large cohorts or registries.14,15 Even though ML algorithms can be trained using both small and large data sets, the availability of large data sets provides the necessary sample variation to maximize both the internal and external validity (reproducibility) of the trained algorithms.16 It also reduces the risk of overfitting, a state where a trained model is too complex, mirroring the noise in the original training data set.16
Data sets are matrices of data, where rows typically describe a single observation (e.g. patient) and columns describe the values of different features for each observation,17 including labels for a given condition (e.g. ‘dead’ or ‘alive’) to be used for prediction or classification purposes. As the fuel for ML, the quality of the data sets is critical in determining the quality of the final ML models.18 These core attributes are described by the five ‘V’s of big data: Volume, Velocity, Variety, Veracity, and Value. ML algorithms benefit from data sets that are large (Volume), generated and processed rapidly (Velocity), come from different sources (Variety), are trustworthy (Veracity), and above all provide answers to important questions, such as diagnosis or prognosis of a disease (Value).15
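As a toy illustration of this row-and-column structure, the following Python snippet builds a small data set with a few features and one outcome label. It assumes the pandas library, and the column names and values are hypothetical examples rather than variables from any study cited in this review.

```python
# Illustrative only: a toy "data set" in the row/column layout described above.
# Column names (age, cac_score, plaque_volume, dead) are hypothetical examples.
import pandas as pd

dataset = pd.DataFrame(
    {
        "age": [54, 61, 47, 72],                  # feature: years
        "cac_score": [0, 112, 3, 480],            # feature: coronary artery calcium
        "plaque_volume": [0.0, 35.2, 1.4, 88.9],  # feature: mm^3
        "dead": [0, 0, 0, 1],                     # label used for classification
    }
)

X = dataset.drop(columns="dead")  # feature matrix: one row per patient
y = dataset["dead"]               # label vector: the outcome to be predicted
print(X.shape, y.value_counts().to_dict())
```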
2.3 Unsupervised, supervised, and deep learning
Within the field of ML, there are two broad task categories: supervised and unsupervised learning (Figure 1).13 The selection of the right model often relies on the operator’s expertise, the nature of the data set, and the purpose of the final AI system.17

Figure 1. Artificial intelligence and machine learning. While AI describes a programme capable of performing tasks typical of human intelligence, machine learning refers to the process through which an AI system is trained to learn. The two main types of machine learning used in medicine are supervised and unsupervised learning. In the former, the class or value of a given label is predicted using algorithms such as regression, neural networks (more advanced methods reflecting the structure of the human brain), decision trees, support vector machines (which project the data set into a higher-order space to identify optimal separating hyperplanes), or combinations thereof. In unsupervised learning (e.g. clustering), the data set is analysed to identify inherent patterns in the data, often using hierarchical or k-means clustering methods.
Supervised learning is an iterative process which selects (or removes), processes, and assigns appropriate weightings to features in order to predict a given value or class.13 The former task is typically known as regression (e.g. linear or logistic regression with feature selection), whereas the latter is known as classification. Beyond traditional linear regression, newer statistical approaches, such as ‘neural networks’, support vector machines, and decision trees, have emerged to maximize the flexibility of the training algorithms and model complex non-linear relationships between the features.13,17 For instance, neural networks are modelled on the neurons of the human brain, with input and output layers separated by several ‘hidden layers’ connected at nodes, similar to human synapses. In addition, some of these algorithms can be combined (‘bagging’ and ‘ensemble’ algorithms) to generate stronger predictors from a series of weaker predictors.
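As a minimal sketch of supervised classification, the snippet below (assuming scikit-learn and an entirely synthetic data set, not any CT-derived features) trains a linear model and a tree ensemble on the same labelled data and compares their discrimination.

```python
# Minimal supervised-learning sketch on synthetic, labelled data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A linear model (logistic regression) and a tree ensemble (random forest)
# trained to predict the same binary label.
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=0)):
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(type(model).__name__, round(auc, 3))
```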
Based on artificial neural networks, deep learning refers to a particularly powerful ML method often used in imaging for pattern recognition and classification (e.g. diagnosis of melanoma or diabetic retinopathy19,20). It mimics human cognition by using convolutional neural networks (CNNs) and is characterized by the ability to learn based on prior experience, thus simulating human-like decision-making.1
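The layered structure of such networks can be illustrated with a minimal convolutional model. The sketch below assumes PyTorch; the architecture, the 64×64 single-channel input, and the two output classes are arbitrary illustrative choices, not the design of any network discussed in this review.

```python
# A minimal convolutional neural network sketch (PyTorch assumed).
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Two convolutional "hidden layers", each followed by pooling.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)  # for 64x64 inputs

    def forward(self, x):
        x = self.features(x)                   # learned image filters
        return self.classifier(x.flatten(1))   # final classification layer

logits = TinyCNN()(torch.randn(4, 1, 64, 64))  # a batch of four 64x64 "images"
print(logits.shape)                            # torch.Size([4, 2])
```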
As opposed to supervised learning, unsupervised learning does not rely on a label but uses the features of a data set to identify inherent patterns. The most common example is clustering (e.g. ‘k-means’ or hierarchical clustering).13 These approaches analyse the n-dimensional space of a data set to identify clusters of spatially related observations using a ‘distance’ measure. Such approaches are important in identifying previously unrecognized phenotype clusters among patients based on their presentation or imaging features and often challenge perceptions about the homogeneity of a given condition.21
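A minimal k-means sketch (assuming scikit-learn and synthetic, unlabelled data; the choice of three clusters is arbitrary) illustrates how observations are grouped by distance in feature space.

```python
# Minimal unsupervised clustering sketch on synthetic, unlabelled data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)   # distance-based methods need comparable scales

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print(kmeans.labels_[:10])             # cluster assignment of the first 10 observations
print(kmeans.cluster_centers_.shape)   # one centre per cluster in 5-dimensional space
```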
2.4 Performance assessment
Validation is a key step that aims to improve the validity and reproducibility of a given algorithm. This is usually done by randomly splitting the data into a training, a validation, and a testing set.16,22 Where available, an ‘unseen’ external data set from an independent population may be used as the final testing data set to assess the external validity of a model. In classification problems, common metrics are the accuracy (proportion of correct predictions relative to the total number of observations) and the area under the curve (AUC) of the receiver operating characteristic curve, which summarizes the trade-off between the true-positive and false-positive rates across classification thresholds.22 Other metrics include the log-loss, precision, recall, and the Dice coefficient, whereas regression tasks are assessed using different metrics (e.g. the root mean square error).22
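The split-and-evaluate workflow can be sketched as follows (assuming scikit-learn and synthetic data; the 60/20/20 split proportions and the two metrics shown are illustrative choices).

```python
# Train/validation/test split with accuracy and ROC AUC on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # fit on training set only
for name, Xs, ys in [("validation", X_val, y_val), ("test", X_test, y_test)]:
    proba = model.predict_proba(Xs)[:, 1]
    print(name,
          "accuracy", round(accuracy_score(ys, (proba > 0.5).astype(int)), 3),
          "AUC", round(roc_auc_score(ys, proba), 3))
```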
3. Radiomics: a link between CT imaging and machine learning
The term ‘radiomics’ refers to the application of complex mathematical formulae to a given radiological image (e.g. a CT scan) that enable the calculation of a large number of features relating to the shape, attenuation, and ‘texture’ of a given volume of interest (Figure 2).23 As an isotropic imaging modality composed of superimposed numerical matrices [Hounsfield Unit (HU) values],24 CT is a prime candidate for the application of radiomic methods.25

Figure 2. Radiomic characterization of textural features. For a given volume of interest, differences in the underlying histological structure will result in different texture patterns that can be described using higher-order features reflecting the unique spatial arrangement of voxels and their attenuation on computed tomography. Histogram-based first-order features reflect only the voxel attenuation distribution; different texture patterns (the same number of voxels with similar attenuation values but in different locations) may therefore have identical histograms and similar first-order statistics.
The main concept in radiomic analysis is that of the radiomic texture. Statistical texture refers to the stochastic or random properties of the spatial distribution of grey levels within an image using statistical measures, such as marginal probabilities.26 Contrary to shape-related or first-order statistics that are derived directly from an attenuation histogram ignoring the distribution of attenuation values in the three-dimensional space, texture statistics reflect the unique spatial arrangement of voxels.27 These second- or higher-order statistics are derived from grey-level intensity matrices, which can be calculated using different methods (Figure 2). A detailed description of the mathematical formulae behind the calculation of all these features goes beyond the focus of this review article, and relevant information can be found in other articles.28
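For illustration, the sketch below contrasts first-order (histogram-based) statistics with one family of higher-order texture features, the grey-level co-occurrence matrix, computed on a random grey-level patch. It assumes NumPy, SciPy, and a recent scikit-image (older releases spell the functions `greycomatrix`/`greycoprops`); real radiomic pipelines use dedicated toolkits with standardized feature definitions.

```python
# First-order (histogram) vs. higher-order (co-occurrence) features on a toy patch.
import numpy as np
from scipy import stats
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
patch = rng.integers(0, 64, size=(32, 32)).astype(np.uint8)  # 64 grey levels

# First-order statistics: depend only on the histogram of attenuation values.
first_order = {
    "mean": patch.mean(),
    "skewness": stats.skew(patch.ravel()),
    "kurtosis": stats.kurtosis(patch.ravel()),
}

# Higher-order statistics: the grey-level co-occurrence matrix (GLCM) counts how
# often pairs of grey levels occur at a given offset, capturing spatial arrangement.
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=64,
                    symmetric=True, normed=True)
texture = {prop: graycoprops(glcm, prop)[0, 0]
           for prop in ("contrast", "homogeneity", "correlation")}
print(first_order, texture)
```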
Radiomic features can be calculated using both the original images as well as mathematical transformations of the original data, such as wavelet decompositions. Wavelet transformation decomposes the data into high- and low-frequency components, which describe the pattern and rate at which attenuation changes along spatial directions. At high frequency, the wavelets capture discontinuities, ruptures, and singularities in the original data; at low frequency, they characterize the coarse structure of the data and its long-range trends. Thus, wavelet analysis allows extraction of hidden yet significant spatial features of the original data, while improving the signal-to-noise ratio of imaging studies.29,30
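A single-level two-dimensional wavelet decomposition can be sketched as follows (assuming the PyWavelets library and a synthetic image; the Haar wavelet is chosen only for simplicity).

```python
# One-level 2D wavelet decomposition of a synthetic image (PyWavelets assumed).
import numpy as np
import pywt

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))

# Decompose into a low-frequency approximation and three high-frequency detail bands.
approx, (horiz, vert, diag) = pywt.dwt2(image, "haar")
print(approx.shape)   # (32, 32) low-frequency component: coarse structure
print(horiz.shape)    # (32, 32) high-frequency details along one spatial direction
```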
Overall, radiomic approaches bridge the gap between CT scans and the data sets needed to train ML algorithms and thus build AI systems. Any image or volume can be broken down into a range of radiomic features that describe the phenotypic variation in radiodensity/attenuation within the given tissue.26,28 These numbers can then be fed into an ML algorithm for classification or prediction purposes. Furthermore, this provides an unparalleled opportunity for a more personalized assessment model. This has long been demonstrated in the field of cancer imaging, where comprehensive radiomic characterization of lung tumours was found to be superior to traditional tumour, node, and metastasis staging in predicting future mortality,28,31 and was also associated with discrete transcriptional changes in tumour biology.31
4. Machine learning and radiomics in cardiovascular medicine: from electrocardiogram to cardiac CT
To date, AI approaches in cardiology have focused largely on electrocardiogram (ECG) and echocardiogram interpretation, particularly with the use of deep neural networks (DNNs). The availability of these tests provided researchers with vast amounts of data with which to train their algorithms. DNNs have been shown to have high sensitivity (∼93%) and specificity (∼90%) in diagnosing acute myocardial infarction,32 as well as in classifying arrhythmias and electrical conduction abnormalities, with accuracy comparable with that of trained cardiologists.33 The power of big data and AI was demonstrated in a recent landmark study of 180 922 patients with 649 931 normal sinus rhythm ECGs, in which a CNN algorithm was able to reliably detect the presence of atrial fibrillation [AUC of 0.87 (95% confidence interval 0.86–0.88)].34 More recently, with the increasing adoption of cardiac CT as the go-to test for the non-invasive assessment of CAD,11 the focus of AI research has expanded to the analysis and interpretation of cardiac CT scans.
4.1 Image processing, detection, and segmentation
AI and deep learning can improve the speed of the initial steps of image pre-processing, boundary detection, and volume segmentation, which are often time-consuming in the busy clinical setting. An ML algorithm that mapped features related to image quality (i.e. noise, contrast, misregistration scores, and an uninterpretability index), trained on 75 CCTA scans and validated in 50 independent studies, had excellent discriminatory accuracy in identifying low image quality (AUC of 0.96 in the validation set). In an independent set of 172 CCTAs, the agreement between a manually assigned visual image quality score (5-point Likert scale) and the ML algorithm was high [Cohen’s kappa of 0.67 (P < 0.01)].35 Three-dimensional CNNs with subject-specific data set normalization have also been shown to improve the accuracy of coronary artery lumen segmentation compared with traditional methods.36
4.2 Coronary artery calcium
Traditionally measured using ECG-gated non-contrast CT scans of the heart, coronary artery calcium (CAC) provides a simple and quick indirect assessment of the extent of coronary atherosclerosis.37 As a result, CAC is often used in the risk stratification of selected patients in whom a risk-based treatment approach remains uncertain based on traditional risk factors.38 Several ML and DNN approaches have been developed to automate the calculation of CAC from cardiac CT scans. A CNN trained on 4973 cases showed very strong correlation with manual measurements in a testing set of 1000 scans.39 A texture-based radiomic approach has also shown promise in detecting CAC from non-ECG-gated chest CT scans.40 In total, more than 150 studies have been published on automated methods for CAC detection, highlighting the need for more automated tools in the clinical setting.41 Although CAC may fail to detect low-attenuation, non-calcified plaques that are known to be more prone to rupture than calcified lesions,42,43 the diagnostic and prognostic value of CAC in primary prevention is supported by numerous studies. In a recent analysis of 13 054 participants from the CONFIRM registry, a boosted ensemble ML algorithm incorporating clinical variables as well as the CAC score (derived from non-contrast cardiac CT scans) accurately estimated the likelihood of obstructive CAD on CCTA, with an AUC of 0.881 (vs. 0.773 for the clinical ML algorithm alone).44
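For orientation, the snippet below sketches a simplified Agatston-style calcium score for a single axial slice: voxels ≥130 HU are grouped into lesions and each lesion's area is weighted by its peak attenuation. This is a deliberately reduced illustration, not the automated ML or CNN methods of the studies cited above, and it omits steps of the clinical score such as the minimum lesion-area criterion.

```python
# Simplified, illustrative Agatston-style score for one axial slice.
import numpy as np
from scipy import ndimage

def agatston_slice_score(hu_slice: np.ndarray, pixel_area_mm2: float) -> float:
    mask = hu_slice >= 130                       # candidate calcified voxels
    labels, n_lesions = ndimage.label(mask)      # connected components = lesions
    score = 0.0
    for lesion in range(1, n_lesions + 1):
        lesion_mask = labels == lesion
        peak_hu = hu_slice[lesion_mask].max()
        # Conventional density weighting: 130-199 HU -> 1, 200-299 -> 2, 300-399 -> 3, >=400 -> 4.
        weight = 1 if peak_hu < 200 else 2 if peak_hu < 300 else 3 if peak_hu < 400 else 4
        score += lesion_mask.sum() * pixel_area_mm2 * weight
    return score

hu = np.full((64, 64), -50.0)   # synthetic slice of soft tissue
hu[20:24, 30:33] = 250.0        # one small "calcified" lesion
print(agatston_slice_score(hu, pixel_area_mm2=0.25))   # 12 voxels * 0.25 mm^2 * weight 2 = 6.0
```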
4.3 Coronary plaque detection
Clinicians interpreting CCTA scans focus, among other things, on the identification of lesions that may be causing significant narrowing to the coronary lumen. While this is normally based on visual assessment of the reconstructed images, ML algorithms have been shown to be highly accurate in identifying such obstructive lesions. A support vector machine algorithm applied in 42 CCTAs had a sensitivity of 93% and specificity of 95% in identifying coronary artery lesions compared with a human observer.45
4.4 Haemodynamic assessment of coronary lesions
Evaluating the haemodynamic effects of a coronary lesion in a non-invasive manner is a challenging task.46 This is often assessed by estimating the myocardial flow reserve on 13N-Ammonia positron emission tomography (PET) (abnormal if ≤2),47 or the invasive lesion-specific fractional flow reserve (FFR) on cardiac catheterization (abnormal if ≤0.80).48 ML-derived algorithms that incorporate several measures of plaque composition (e.g. stenosis, non-calcified, low-density non-calcified, calcified and total plaque volumes, contrast density difference) were found to be superior to the degree of luminal stenosis alone in detecting haemodynamically significant obstructive lesions (AUC of 0.84 vs. 0.76 for lesion-specific FFR,48 and 0.83 vs. 0.66 for 13N-Ammonia PET in a separate study47). Other comprehensive approaches that rely on ML-based synthesis of several plaque features have demonstrated high accuracy in identifying haemodynamically significant lesions (AUC of 0.89 on a per-lesion level and 0.91 on a per-patient level), which is comparable with that of complex computational fluid dynamic modelling.49,50 Interestingly, processing time for the ML-based approach was significantly shorter than for computational fluid dynamics (40.5 ± 6.3 vs. 43.4 ± 7.1 min; P = 0.042).50 Further to plaque-derived measurements, models trained to characterize resting myocardial CT perfusion (i.e. gradient boosting classifiers) as well as deep learning algorithms analysing the left ventricular myocardium have significantly improved the discriminatory accuracy of diameter stenosis for the detection of ischaemia (FFR ≤ 0.80).51,52
4.5 Coronary plaque phenotyping
Whereas the detection of coronary plaques and their haemodynamic significance relies on the ML-based combination of several common CCTA-derived metrics, a more comprehensive assessment of the plaque microenvironment, histology, and ultimately biology requires a more in-depth radiomic characterization of its phenotype.
High-risk plaque features offer an individualized insight into the vascular biology of each plaque.53,54 For instance, vascular wall remodelling in atherosclerosis is the end-result of complex pathways that converge to cell migration and extracellular matrix remodelling, often due to an imbalance in the relative expression and activity of matrix metalloproteinases and their inhibitors in the plaque microenvironment.55 The resulting outwards vascular remodelling can then be detected on CCTA as a relative increase in vascular diameter around the plaque, captured by the remodelling index.53 Low-attenuation plaque, on the other hand, is associated with a lipid-rich necrotic core, an extracellular mass in the intima induced by necrosis and apoptosis of lipid-laden macrophage foam cells.56 Such a high-risk plaque phenotype composed of a thin fibrotic cap above a necrotic core is often described as a ‘napkin-ring sign’ (NRS) on CCTA, manifesting as a low-attenuation area surrounded by a high-attenuation rim.53,57 Finally, spotty calcification identifies inflamed areas of confluent coronary calcification and microcalcification.53,58 Vascular calcification represents a local response to an inflammatory microenvironment, with a well-defined link between inflammatory cell infiltration and osteoblastic metaplasia.59
Radiomic phenotyping of a given plaque can identify such high-risk plaque features as changes in the attenuation histogram and radiomic texture of a plaque, thus standardizing what is often a subjective and operator-dependent process (Figure 3). In an analysis of 30 NRS lesions and 30 non-NRS plaques with a similar degree of calcification, luminal obstruction, localization, and imaging parameters, Kolossvary et al.60 demonstrated that 916 radiomic features were significantly different between the two groups, with 418 of these features reaching an AUC of >0.80. Texture statistics such as short- and long-run low grey-level emphasis had the highest AUC (0.918 and 0.894, respectively), whereas none of the conventional CCTA metrics discriminated between the two groups. More recently, in an analysis of 44 plaques from 25 patients who underwent multimodality imaging, CCTA-derived radiomic parameters outperformed conventional metrics (e.g. stenosis, plaque volume) in identifying intravascular ultrasound-defined attenuated plaques, optical coherence tomography-detected thin-cap fibroatheroma, and 18F-sodium fluoride (18F-NaF) positivity on PET, a marker of microcalcification and coronary inflammation.61 Finally, in an analysis of 445 cross-sections taken from 21 coronary arteries of seven male hearts imaged ex vivo, a radiomics-based ML model was found to be superior to visual assessment (AUC = 0.73 vs. 0.65; P = 0.04), low attenuation (AUC = 0.55; P = 0.01), and mean HU (AUC = 0.53; P = 0.004) in identifying advanced atheromatous lesions.62

Figure 3. Radiomic phenotyping of coronary lesions. Differences in coronary plaque composition will manifest as different radiomic texture patterns on computed tomography analysis, which can then be quantified using first- and higher-order radiomic features. Changes in these metrics can be used in an automated way to not only detect plaques but also produce a deep characterization of the histology and biology of a given lesion.
4.6 Myocardial tissue characterization
Application of radiomic texture phenotyping to myocardial segmentations has also highlighted the ability of low-dose, thick-slice, non-contrast cardiac CT to detect myocardial pathology, such as scarring and infarction. Myocardial texture mapping showing increased heterogeneity on delayed iodine-enhanced images may enable the detection of scarred myocardial tissue in patients with myocarditis,63,64 as well as assessment of left ventricular dilation and systolic/diastolic function in patients with recurrent ventricular tachycardia.65 Of note, texture analysis of thick-slice CT images of the left ventricle may discriminate infarcted myocardial tissue from healthy areas (AUC of 0.78 and 0.90 in two separate studies).64,66 The latter model included the first-order statistic of kurtosis and the higher-order statistic of short-run high grey-level emphasis, highlighting the complementary nature of first- and higher-order radiomic features in characterizing tissue histology.66
4.7 Adipose tissue characterization
Epicardial and total thoracic adipose tissue depots can be detected on cardiac CT by applying an attenuation-based segmentation approach, which classifies voxels with attenuation values between −190 and −30 HU as adipose tissue/fat. Epicardial adipose tissue (EAT), the layer of fat located between the visceral layer of the pericardium and the myocardium, is involved in cardiovascular disease pathogenesis through direct paracrine interaction with the adjacent coronary artery and myocardial tissue.67–70 In addition, several observational studies have described a positive association between the degree of epicardial obesity and the presence of coronary calcification and CAD, as well as the future incidence of adverse cardiac events.71 However, segmentation of the EAT on CT scans is a labour-intensive process that requires manual editing by an experienced operator. Several groups have now developed automated, ML-derived solutions that track the pericardial layer and segment epicardial fat-containing voxels, including random forest classifiers,72 rotation forest algorithms using a multilayer perceptron regressor,73 and deep learning methods that allow computation in less than 6 s with a strong correlation between manual and automated measurements (e.g. r = 0.924 for EAT).74
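The attenuation-window step of this approach can be sketched in a few lines. The snippet assumes NumPy and a synthetic volume with an illustrative voxel spacing; in practice the mask must also be restricted to the pericardial contour before it can be called epicardial fat.

```python
# Attenuation-window "fat" mask: voxels between -190 and -30 HU, as described above.
import numpy as np

rng = np.random.default_rng(0)
ct_volume = rng.normal(loc=0, scale=150, size=(40, 64, 64))   # fake HU values
voxel_volume_mm3 = 0.68 * 0.68 * 3.0                          # illustrative voxel spacing

fat_mask = (ct_volume >= -190) & (ct_volume <= -30)           # adipose-tissue window
fat_volume_ml = fat_mask.sum() * voxel_volume_mm3 / 1000.0
mean_fat_hu = ct_volume[fat_mask].mean()
print(round(fat_volume_ml, 1), "mL of 'fat', mean attenuation", round(mean_fat_hu, 1), "HU")
```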
4.8 Radiomic phenotyping of perivascular fat
Perivascular adipose tissue (PVAT) plays a key role in regulating vascular homeostasis and disease75 and participates in a complex bidirectional interplay with the adjacent arterial wall.67,68,76,77 In the presence of vascular inflammation, the release of pro-inflammatory mediators into the surrounding PVAT blocks the ability of perivascular pre-adipocytes to differentiate into mature lipid-laden adipocytes. This creates an inflammation-induced gradient in PVAT composition, which can be detected on standard CCTA as spatial changes in the PVAT radiomic texture, quantified by the fat attenuation index (FAI).77 In the CRISP-CT study, FAI radiomic mapping at baseline offered incremental prognostic value for future adverse cardiac events beyond traditional risk factors, extent of coronary atherosclerosis, and presence of high-risk plaque features, highlighting a residual cardiac risk hidden in the PVAT radiome (Figure 4).78 Further work has confirmed that unadjusted or adjusted (FAI) PVAT attenuation is associated with the presence of unstable lesions in acute coronary syndromes,77,79 predicts the future progression of coronary atherosclerosis,80 is reduced in response to anti-inflammatory therapies with novel biologics in patients with psoriasis,81 and is strongly associated with local vascular inflammation, as assessed by 18F-NaF PET-CT imaging.82

Figure 4. Radiomic phenotyping of perivascular fat to detect coronary inflammation. (A) Radiomic characterization of perivascular fat by means of the fat attenuation index to detect vascular effects on the adjacent fat. (B) Prognostic value of perivascular fat attenuation index phenotyping for all-cause and cardiac mortality in the Cardiovascular Risk Prediction using Computed Tomography study. Reproduced with permission from Oikonomou et al.78
In addition to inflammation, dysfunctional adipose tissue remodelling is also characterized by fibrosis and changes in adipose tissue vascularity.75,83 By applying a radiotranscriptomic approach, we recently developed an integrated CT signature of pericoronary fat that links the ‘radiome’ of pericoronary fat not only to inflammatory but also to permanent fibrotic and microvascular changes (Figure 5), thus functioning as a surrogate marker of cumulative coronary injury and ageing. More importantly, when tested in participants of the SCOT-HEART study, this integrated signature (known as the Fat Radiomic Profile) predicted a significant residual cardiac risk not captured by traditional risk factors, CAD, high-risk plaque (HRP) features, or the coronary calcium score (CCS).84

Figure 5. Radiomic phenotyping to detect biological hallmarks of dysfunctional adipose tissue. (A–C) Manhattan plots presenting the strength of association between adipose tissue radiomic features and the relative gene expression of TNFA (inflammation), COL1A1 (fibrosis), and CD31 (endothelial marker, vascularity). (D) Component plot of the three principal components of the adipose tissue radiome. (E) Comparison of nested linear regression models with relative gene expression as the dependent variable and (i) clinical risk factors alone (Model 1: age, sex, hypertension, hypercholesterolaemia, diabetes mellitus, and body mass index); (ii) Model 1 + mean attenuation (Model 2); and (iii) Model 2 + PVAT radiome (first three principal components) as the independent predictors. Imc, informational measure of correlation 2; L/H, low/high wavelet transformation; SALGLE, small area low grey-level emphasis; SDLGLE, small dependence low grey-level emphasis; SRLGLE, short-run low grey-level emphasis. Reproduced with permission from Oikonomou et al.84
4.9 Cardiac risk prediction
Supervised and unsupervised ML approaches have shown promise in identifying patterns with significant prognostic value for future adverse cardiac events in patients undergoing CT. For instance, using a registry of 10 030 patients undergoing CCTA with 25 clinical and 44 CCTA parameters, Motwani et al.85 developed and tested a boosted ensemble ML algorithm that had higher discriminatory accuracy for 5-year mortality than the Framingham Risk Score (FRS) or modified Duke index (DI) alone (AUC of 0.79 vs. 0.61 for FRS and 0.62 for DI; P < 0.001). Similarly, an extreme gradient boosting ML algorithm derived from detailed plaque analysis of the standard 16 coronary segments on CCTA had greater prognostic accuracy for myocardial infarction and death than current CCTA-integrated risk scores (AUC of 0.771 vs. 0.685–0.701; P < 0.001 for other scores such as the DI).86 Finally, in a study of 2924 Framingham Heart Study participants who underwent chest and abdominal CT, measures of valvular/vascular calcification, adiposity, and muscle attenuation were collected and used in an unsupervised manner to identify a cluster of patients with an unfavourable multiorgan phenotype and a 2.6-fold higher prospective mortality risk compared with the favourable phenotype group, independent of CAC, visceral adipose tissue, and FRS.21 These findings highlight the ability of ML to identify patterns in data sets that are of significant diagnostic and prognostic value, yet invisible to the human eye.
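As a schematic of such boosted-ensemble risk models, the snippet below (assuming scikit-learn; the features and the rare binary "event" are synthetic stand-ins for clinical and CCTA variables, not registry data) fits a gradient boosting classifier and reports its discrimination.

```python
# Sketch of a boosted-ensemble risk model on synthetic, imbalanced outcome data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)   # ~10% "events"
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print("AUC:", round(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]), 3))
```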
5. A proposed quality control framework for future studies
AI-powered radiomic phenotyping of patients using cardiac CT can identify signatures for precision diagnosis and prognosis, thus providing an additional powerful tool in modern medicine. However, the great power of AI and ML comes with great responsibility,18 highlighting the need for a standardized approach to ML-based prediction modelling. Five core steps have been described in a radiomics study, namely data selection, medical imaging, feature extraction, exploratory analysis, and modelling.7 Given the flexibility offered to researchers by the wide range of available software, methods, and ML algorithms, the literature is full of competing models, different algorithms, and often contrasting approaches to feature selection, model development, validation, and performance assessment. This in turn introduces bias into these studies and limits their reproducibility and, therefore, their potential clinical value. Based on the work of Lambin et al.,7 we propose a set of guidelines to ensure the high quality of radiomic studies in the field of cardiovascular CT (Table 1). Moving forward, scientific societies should also focus on the standardization of radiomic feature definitions and their extraction methods in order to ensure generalizability and reproducibility. To date, there is no scientific consensus statement on the use of radiomics in cardiovascular imaging.
Table 1. Methodological quality in studies using radiomics in cardiac computed tomography imaging

| Checklist | Description |
|---|---|
| 1. Pre-defined image protocol and registration | Prospective registration of radiomic studies, including pre-defined imaging protocols to be used consistently in all patients and study sites. |
| 2. Segmentation robustness | Intra- and inter-operator levels of agreement for repeated segmentations and calculation of the robustness of radiomic features (e.g. intraclass correlation coefficient). |
| 3. Technical parameters | Assessment of the sensitivity of radiomic features to changes in technical acquisition parameters, across different scanners and vendors. |
| 4. Scan–rescan robustness | Robustness of radiomic features to scan–rescan analysis using the same scanner, parameters, and other settings. |
| 5. Normalization and standardization | Protocol-defined selection of the methods for pre-processing and standardization (e.g. Z-score transformation) of radiomic features. |
| 6. Algorithm selection | The rationale for the selection of a given machine learning algorithm should be clearly described. |
| 7. Multiple comparisons and redundancy | Addressing the potential redundancy of radiomic features (e.g. dimension reduction or feature removal) as well as multiple comparisons (e.g. Bonferroni adjustment). |
| 8. Multivariable models | Radiomic-based models should still be adjusted for traditional risk factors and expected covariates. |
| 9. Associations with known clinical variables | The strength and nature of the association of radiomic-based models with traditional risk factors (e.g. coronary calcium) should be explored and discussed. |
| 10. Risk group identification | Where risk groups are to be defined based on a radiomic signature, the method for cut-off identification should be defined a priori. |
| 11. Discrimination–performance | Measures of performance should be appropriately selected based on the task (classification or regression) and nature of the data (e.g. C-statistic vs. accuracy for unbalanced groups). |
| 12. Calibration | Present appropriate calibration metrics. |
| 13. Validation | Discuss the process for internal (e.g. cross-validation) and external validation (i.e. unseen data). |
| 14. Comparison to clinical ‘gold-standard’ | Where a radiomic model is proposed as a replacement for an established ‘gold-standard’, the change in discrimination and reclassification should be discussed. |
| 15. Clinical utility and cost-effectiveness | If possible, a clinical utility and cost-effectiveness analysis should be performed, including the time saved or wasted for each type of analysis, and the clinical benefit to the patient. |
| 16. Accessibility | Investigators should provide information about the accessibility of their code and availability for use in independent studies. |
Modelled on the work of Lambin et al.7
6. Limitations
Several limitations should be kept in mind when designing an ML-based radiomic study. First, the quality of all AI and ML systems depends on the quality of the raw data and features used to train them in the first place. An accurate data set with minimal missing values and proper parameterization is of paramount importance; however, this is often challenging in ‘big data’ studies that include electronic health records and data from multiple sources.17,18 In particular, the ‘ground truth’ is often hard to determine in cardiac CT imaging studies due to inter-rater variability, although this is easier than in other imaging modalities. Likewise, variations in CT acquisition parameters may also affect the validity of imaging and, in particular, radiomic markers, which are sensitive to changes in slice thickness, scanner type, tube voltage, and pitch.87 Second, many complicated ML systems function as a ‘black box’, providing minimal insight into the logic behind a given algorithm.13 As a result, there have been several concerns regarding the acceptance of such tools by physicians and patients alike. It should be noted that artificial and human intelligence are not competing, and AI systems can reduce the work burden of clinicians, who will still have the final say in deciding patient management.1 Nevertheless, this also raises a series of ethical and regulatory questions regarding the use of AI in patient care rather than in research alone.88 Third, different types of bias, such as selection bias in the patients or CT scanning protocols included in the training phase, will transfer into the derived ML algorithms.89 Finally, the multidimensional nature of radiomic features means that, for a small to moderate data set, there will always be a danger of overfitting.16 The redundancy in these features should be noted and accounted for, and proper validation approaches should be applied to minimize this risk.
7. Conclusions
In an era of increasing digitalization and accumulation of vast amounts of medical information and images, AI and ML provide novel solutions to the old problems of disease diagnosis and risk prediction. The simultaneous development of the field of radiomics now enables the quantitative mapping of routine cardiac CT scans, generating arrays of features that can be fed into ML algorithms for improved cardiovascular disease diagnosis and risk stratification. These novel approaches may transform the structure of modern healthcare, by relieving the physician from time-consuming image processing tasks, and maximizing the diagnostic and prognostic yield of existing images, with important clinical and health economic benefits (Figure 6). Still in its infancy, AI-based cardiovascular imaging has a lot to offer to both the patients and their doctors, as it catalyses the transition towards a more personalized model of care.

Figure 6. Computed tomography radiomics for precision medicine. A proposed workflow for the incorporation of machine learning-powered radiomic analysis of cardiac computed tomography scans in clinical practice. Radiomic analysis can reduce the analysis time and, when integrated with electronic health records, can provide automated recommendations to the physician regarding diagnosis and patient prognosis. At this stage, artificial and human intelligence can converge to enable the physician to select the optimal management plan based on all available data.
Conflict of interest: The methods for analysis of the perivascular fat attenuation index described in this report are subject to patent PCT/GB2015/052359 and patent applications PCT/GB2017/053262, GB2018/1818049.7, GR20180100490, and GR20180100510, licensed through exclusive license to Caristo Diagnostics. C.A. is a founder and shareholder of Caristo Diagnostics Ltd., a CT image analysis company. E.K.O. declares consultancy with Caristo Diagnostics. M.S. is an employee of Caristo Diagnostics.
Funding
The study was funded by the British Heart Foundation (FS/16/15/32047 and TG/16/3/32687 to C.A.) and the National Institute for Health Research Oxford Biomedical Research Centre. C.A. acknowledges support from the Oxford British Heart Foundation Centre of Research Excellence. E.K.O. acknowledges support from the A.G. Leventis Foundation.
References
National Institute for Health and Care Excellence (NICE). Chest pain of recent onset: assessment and diagnosis. Clinical Guideline [CG95]. https://www.nice.org.uk/guidance/cg95?unlid=28903932120171912336 (date last accessed 27 July 2019).