The ‘Digital Twin’ to enable the vision of precision cardiology

Abstract Providing therapies tailored to each patient is the vision of precision medicine, enabled by the increasing ability to capture extensive data about individual patients. In this position paper, we argue that the second enabling pillar towards this vision is the increasing power of computers and algorithms to learn, reason, and build the 'digital twin' of a patient. Computational models are boosting the capacity to establish diagnosis and prognosis, and future treatments will be tailored not only to current health status and data, but also to an accurate, model-based projection of the pathways to restore health. The early steps of the digital twin in the area of cardiovascular medicine are reviewed in this article, together with a discussion of the challenges and opportunities ahead. We emphasise the synergies between mechanistic and statistical models in accelerating cardiovascular research and enabling the vision of precision medicine.


Modelling approaches for image analysis
A natural interplay between mechanistic and statistical models occurs through the images that are used to inform models, and both modelling approaches are driven by progress in imaging technology 9.
Images can be used to generate patient-specific mechanistic models of cardiac function as well as to extract knowledge using statistical models. To date, these tasks have primarily been handled separately, but harnessing the interplay between these modelling types offers opportunities to advance both.
Mechanistic models can benefit from the ability of statistical models to automatically segment and extract landmarks from images. The generation of patient-specific models from imaging data traditionally required considerable effort. Image registration and/or segmentation techniques have become the core engine for this personalisation 10,11, and these processes now benefit from fully automated convolutional neural network-based approaches 12-14.
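To make the pipeline concrete, the sketch below is a minimal illustration, assuming PyTorch: a toy encoder-decoder network (a stand-in for the architectures of refs 12-14, not a reproduction of them) produces a segmentation mask, from which a simple centroid landmark is extracted for model personalisation.
```python
# Minimal sketch: CNN segmentation of a cardiac image, followed by
# landmark extraction for mechanistic-model personalisation.
# The tiny network is an illustrative stand-in, not the model of refs 12-14.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Small encoder-decoder producing a per-pixel foreground logit."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # logits for the myocardium class
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def centroid_landmark(mask):
    """Illustrative landmark: centroid of the segmented region."""
    ys, xs = torch.nonzero(mask, as_tuple=True)
    return xs.float().mean().item(), ys.float().mean().item()

if __name__ == "__main__":
    net = TinySegNet()
    image = torch.randn(1, 1, 64, 64)           # placeholder MR slice
    with torch.no_grad():
        probs = torch.sigmoid(net(image))
    mask = probs[0, 0] > 0.5                    # binary segmentation
    if mask.any():
        print("landmark (x, y):", centroid_landmark(mask))
```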
There is increasing recognition that mechanistic models can also be used to improve the accuracy and reliability of statistical models in image-based analysis, which have experienced an explosion in the medical field over the last decade 15,16. However, the accuracy of these models depends on accurate and reliable annotation of the training data. The high inter-observer and intra-observer variability seen, for example, in common ultrasound measurements 17 calls this reliability into question. Either separate labelling studies must be commissioned to ensure data reliability, or extensive data cleaning processes must be undertaken; both are time-consuming and expensive because of the large amount of training data required.
Mechanistic models can be used here as tools to simulate new sets of training data 18. Large sets of synthetic patients with a known ground truth can be generated by varying parameters in mechanistic models, and the corresponding images can then be synthesised using emulators of the image acquisition process 19,20.
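As a minimal illustration of this workflow, the sketch below uses a toy one-parameter 'mechanistic model' and a blur-plus-noise acquisition emulator as stand-ins for the simulators and emulators of refs 18-20; each synthetic patient is a signal paired with its known ground-truth parameter.
```python
# Minimal sketch: synthetic training data with known ground truth.
# The model and emulator are toy stand-ins for those of refs 18-20.
import numpy as np

rng = np.random.default_rng(0)

def mechanistic_model(contractility, t):
    """Toy stand-in: a pressure-like trace controlled by one parameter."""
    return contractility * np.sin(np.pi * t) ** 2

def acquisition_emulator(signal, noise_sd=0.05):
    """Emulate the measurement process: smoothing plus sensor noise."""
    kernel = np.ones(5) / 5.0
    blurred = np.convolve(signal, kernel, mode="same")
    return blurred + rng.normal(0.0, noise_sd, size=signal.shape)

t = np.linspace(0.0, 1.0, 200)
dataset = []
for _ in range(1000):                        # 1000 synthetic 'patients'
    theta = rng.uniform(0.5, 2.0)            # sampled ground-truth parameter
    clean = mechanistic_model(theta, t)
    measured = acquisition_emulator(clean)   # what a device would record
    dataset.append((measured, theta))        # training pair: signal -> label

print(len(dataset), "synthetic training pairs generated")
```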
The interplay between mechanistic and statistical models can also extend to the inference of cardiac function from images. An existing example is the use of a mechanistic model of cardiac electrical propagation to train a statistical model that infers response to cardiac resynchronisation therapy (CRT) from body surface potential maps, achieving good predictive power when tested on patient data 21.
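The underlying simulate-then-learn pattern can be sketched as follows; this is a minimal illustration with synthetic features standing in for body surface potential maps and an off-the-shelf logistic regression rather than the specific model of ref 21.
```python
# Minimal sketch of the simulate-then-learn pattern: a classifier is
# trained on mechanistically simulated data and applied to new cases.
# Synthetic features stand in for body surface potential maps (ref 21).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def simulate_bspm_features(responder, n=200):
    """Toy electrical-propagation surrogate: responders get a shifted
    activation-time signature in the simulated surface potentials."""
    base = rng.normal(0.0, 1.0, size=(n, 8))
    if responder:
        base[:, :3] += 1.5   # assumed simulated signature of CRT response
    return base

X = np.vstack([simulate_bspm_features(True), simulate_bspm_features(False)])
y = np.array([1] * 200 + [0] * 200)

clf = LogisticRegression().fit(X, y)   # statistical model trained on
                                       # mechanistic simulations
new_patient = rng.normal(0.0, 1.0, size=(1, 8))
print("predicted CRT response probability:",
      clf.predict_proba(new_patient)[0, 1])
```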
The area of image analysis is where industrial translation has generated early success stories, enabled by different degrees of integration between statistical and mechanistic models. For example, CardioAI (Arterys Inc, USA) has implemented deep learning algorithms for segmentation 22, magnetic resonance view projection 23, and data generation 24. EchoMD AutoEF (Bay Labs Inc, USA) assists the acquisition process with scan quality assessment in echocardiography, and EchoGo (Ultromics, UK) provides diagnostic assistance in stress-echo scans 25. The benefits of these products include reduced operating time and the sharing of expertise and data for better model training.

Models to study molecular profiling data
Over the past decades, multiple -omics profiling technologies have emerged and been used in a wide variety of medical fields. In cardiology, numerous studies have demonstrated the complexity of cardiovascular diseases as intricate interactions of many genes, non-coding regions and regulatory proteins 26,27. These technologies are also becoming more affordable, broadening their adoption and enabling the creation of bigger databases that will provide new insights into cardiovascular disease pathophysiology 28.
In a clinical environment, polygenic risk scores are currently used to identify at-risk patients by providing a forecast of cardiovascular events such as myocardial infarction, heart failure and stroke, and to help customise therapy selection. Recent studies have shown promising applications for molecular profiling, such as its combination with imaging to compile a stratified profile of a patient population 29, or its use to help identify new target molecules for drug discovery and repositioning 30. In addition, high-throughput protein profiling methods are increasing the specificity of protein read-outs in multiplexing assays 31,32.
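At its simplest, a polygenic risk score is a weighted sum of risk-allele dosages, with per-variant weights taken from genome-wide association effect sizes; the sketch below uses toy numbers, not a validated score.
```python
# Minimal sketch: a polygenic risk score as a weighted sum of allele
# dosages. Effect sizes and genotypes are toy values, not a real score.
import numpy as np

effect_sizes = np.array([0.12, -0.05, 0.30, 0.08])   # per-variant log odds
dosages = np.array([                                  # 0/1/2 risk alleles
    [2, 0, 1, 1],   # patient A
    [0, 1, 0, 2],   # patient B
])

prs = dosages @ effect_sizes                # PRS_j = sum_i beta_i * x_ij
for name, score in zip(["A", "B"], prs):
    print(f"patient {name}: PRS = {score:+.2f}")
```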
One challenge in these studies is the very large and heterogeneous nature of the data, with many variants to be stored, analysed and used. There is growing evidence of the important role that statistical models could have in cardiovascular disease risk estimation by computing personalised genomic risk scores 33. Several studies have already improved performance by applying machine learning to conventional cardiovascular disease risk factors in large populations 34,35. One example, in abdominal aortic aneurysm, integrated personal genomes and electronic health record data to assess the effectiveness of lifestyle adjustments given personal genome baselines, demonstrating the utility of this integration for personalised health management 36.
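A minimal sketch of such an integration, on synthetic data (the feature set and the logistic regression are illustrative choices, not the pipeline of ref 36): a genomic score enters a risk model alongside conventional record-derived factors.
```python
# Minimal sketch: a risk model combining a genomic score with
# conventional EHR-derived risk factors. All data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 500
genomic_score = rng.normal(0.0, 1.0, n)          # e.g. a PRS
age = rng.uniform(40, 80, n)
systolic_bp = rng.normal(130, 15, n)

# Synthetic outcome: risk rises with all three factors (toy weights).
logit = 0.8 * genomic_score + 0.04 * (age - 60) + 0.02 * (systolic_bp - 130)
events = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([genomic_score, age, systolic_bp])
X_tr, X_te, y_tr, y_te = train_test_split(X, events, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```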
Mechanistic models are also being used to refine our understanding of genomic variants in cardiac disease. Examples include elucidating the role of early somatic mosaicism in life-threatening arrhythmias in infants with long-QT syndrome 37, characterising novel autosomal dominant heterozygous mutations in catecholaminergic polymorphic ventricular tachycardia 38, and explaining why gain-of-function mutations that were expected to cause QT shortening are instead linked to QT prolongation and sudden cardiac death 39. The integration of statistical and mechanistic models holds significant potential for identifying novel genotypes and phenotypes in heterogeneous cardiovascular diseases 40. In the near future, such datasets could be generated from patient samples and become a routine part of cardiovascular care and diagnosis, feeding mechanistic models that provide better insights into human biology.

Models for home monitoring and wearable sensors
With commercial technologies developing at a fast pace, even highly reliable and accurate acquisition devices can now be deployed in ambulatory or domestic care scenarios. Smartwatches and wearable sensors are opening a new dimension for continuous monitoring and, subsequently, for diagnosing and detecting critical health events.
As an example, ambulatory ECG measurements have the potential for early detection of atrial fibrillation, leading to more efficient use of health care resources and more timely initiation of anticoagulant therapy 56; the logic behind this ECG signal analysis is based on a mechanistic understanding of rhythm variability. The opportunity for improvement lies in applying machine learning methods to identify additional signatures in long-duration ECG recordings 57. Other early examples have demonstrated the ability of statistical models to handle and analyse vast datasets from wearable devices in a clinically meaningful way. Patterns extracted from photoplethysmography signals were used to detect atrial fibrillation in an ambulatory setting, showcasing the potential of these methods in early detection 58.
Furthermore, mechanistically driven processing of the photoplethysmography signal has been applied to characterise heart rate variability 59.
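Both applications rest on the same mechanistic observation: atrial fibrillation manifests as irregular inter-beat intervals, and heart rate variability is computed from the same interval series, whether beats are detected from ECG R-peaks or photoplethysmography pulses. The sketch below illustrates this on toy beat times; the flagging threshold is an assumption for illustration, not a clinical criterion.
```python
# Minimal sketch: rhythm-variability analysis of inter-beat intervals
# (from ECG R-peaks or photoplethysmography pulses). Beat times are
# synthetic and the AF threshold is illustrative, not a clinical rule.
import numpy as np

rng = np.random.default_rng(3)

def beat_times(irregular, n_beats=60):
    """Toy beat series: regular sinus-like rhythm vs AF-like jitter."""
    jitter = 0.25 if irregular else 0.02
    intervals = rng.normal(0.8, jitter, n_beats).clip(0.3, 2.0)
    return np.cumsum(intervals)

def rhythm_metrics(times):
    """RMSSD and coefficient of variation of inter-beat intervals."""
    ibi = np.diff(times)
    rmssd = np.sqrt(np.mean(np.diff(ibi) ** 2))
    cv = ibi.std() / ibi.mean()
    return rmssd, cv

for label, irregular in [("sinus-like", False), ("AF-like", True)]:
    rmssd, cv = rhythm_metrics(beat_times(irregular))
    flagged = cv > 0.10          # assumed illustrative threshold
    print(f"{label}: RMSSD={rmssd:.3f}s CV={cv:.2f} flag AF: {flagged}")
```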

Models for population studies
The inherent limitations of shaping treatment guidelines based on large population studies lead to decisions that are based on an "average" patient within a large group, thereby missing the opportunity for personalisation. In this setting, the digital twin may present an opportunity to define targeted patient-specific guidelines based on standardised predictive models taking into account the "individual-specific" factors for treatment 41.
Furthermore, the increasing availability of large cardiological databases in electronic health records, and their integration with imaging, omics and wearable/home data sources, is creating opportunities to improve disease diagnosis and prognosis. Building on these records, information gained from a patient population can be used to individualise care 42,43, or to build risk prediction models using records from different countries 44-46. The clinical adoption of these tools relies on their validity, and thus on the availability of multiple databases with good data recording control to prevent bias and missing data. However, combining databases is challenging in practice: the majority share basic categories of information, such as age or procedures undergone, but may lack specific examination findings, such as ejection fraction or haemoglobin level.
Statistical and mechanistic models can tackle these problems. Sensitivity analyses can determine the most important factors, and data imputation can help to address incomplete records. Models can simulate the missing data 47 and can then be used to assist the personalisation of treatment for individual patients 41,48,49. For example, collaborative filtering techniques can integrate data from multiple sources to provide an estimate where data points are missing 50. Such approaches have already shown high predictive accuracy for both sudden cardiac death and recurrent myocardial infarction 51.
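A minimal sketch of collaborative-filtering-style imputation, fitting a low-rank matrix factorisation to the observed entries of a patient-by-variable table (synthetic data; this illustrates the idea rather than the specific method of ref 50):
```python
# Minimal sketch: imputing missing clinical values by low-rank matrix
# factorisation on observed entries (collaborative-filtering style).
# Data are synthetic; this is not the specific method of ref 50.
import numpy as np

rng = np.random.default_rng(4)
n_patients, n_vars, rank = 50, 8, 3

# Synthetic complete data with low-rank structure, then hide ~30% of it.
truth = rng.normal(size=(n_patients, rank)) @ rng.normal(size=(rank, n_vars))
observed = rng.random(truth.shape) > 0.3

U = rng.normal(scale=0.1, size=(n_patients, rank))
V = rng.normal(scale=0.1, size=(n_vars, rank))
lr = 0.01
for _ in range(2000):                 # gradient descent on observed cells
    residual = np.where(observed, U @ V.T - truth, 0.0)
    U -= lr * residual @ V
    V -= lr * residual.T @ U

imputed = U @ V.T                     # estimates for the missing cells
rmse = np.sqrt(np.mean((imputed - truth)[~observed] ** 2))
print(f"imputation RMSE on held-out entries: {rmse:.3f}")
```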
Other machine learning techniques, such as principal component analysis or kernel learning, can also determine the most relevant dimensions of data sources with hundreds or thousands of dimensions 52,53. This can be used to identify the most relevant parameters for mechanistic models. Similarly, sensitivity and uncertainty analyses have been employed with mechanistic models to identify important parameters in simulations 54,55, thus guiding the choice of the most relevant metrics for population studies.
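As a minimal illustration of the statistical side, the sketch below applies PCA to a synthetic high-dimensional table and reads off how many components carry the variance; the data and dimensions are illustrative assumptions.
```python
# Minimal sketch: using PCA to find the few dimensions that dominate a
# high-dimensional data source. Data are synthetic and illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n_patients, n_features, n_latent = 300, 200, 4

# Synthetic records: 4 latent factors drive 200 observed measurements.
latent = rng.normal(size=(n_patients, n_latent))
loadings = rng.normal(size=(n_latent, n_features))
records = latent @ loadings + 0.1 * rng.normal(size=(n_patients, n_features))

pca = PCA(n_components=10).fit(records)
explained = np.cumsum(pca.explained_variance_ratio_)
print("cumulative variance explained:", np.round(explained, 3))
# The curve plateaus after ~4 components, pointing to the handful of
# dimensions worth carrying into a mechanistic model.
```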