Cardiovascular RNA markers and artificial intelligence may improve COVID-19 outcome: a position paper from the EU-CardioRNA COST Action CA17129

Abstract The coronavirus disease 2019 (COVID-19) pandemic has been as unprecedented as it was unexpected, affecting more than 105 million people worldwide as of 8 February 2021 and causing more than 2.3 million deaths according to the World Health Organization (WHO). Beyond affecting the lungs and provoking acute respiratory distress, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is able to infect multiple cell types including cardiac and vascular cells. Hence a significant proportion of infected patients develop cardiac events, such as arrhythmias and heart failure. Patients with cardiovascular comorbidities are at highest risk of cardiac death. To face the pandemic and limit its burden, health authorities have launched several fast-track calls for research projects aiming to develop rapid strategies to combat the disease, as well as longer-term projects to prepare for the future. Biomarkers have the potential to aid in clinical decision-making and in tailoring healthcare in order to improve patient quality of life. The biomarker potential of circulating RNAs has been recognized in several disease conditions, including cardiovascular disease. RNA biomarkers may be useful in the current COVID-19 situation. The discovery, validation, and marketing of novel biomarkers, including RNA biomarkers, require multi-centre studies by large and interdisciplinary collaborative networks, involving both academia and industry. Here, members of the EU-CardioRNA COST Action CA17129 summarize the current knowledge about the strain that COVID-19 places on the cardiovascular system and discuss how RNA biomarkers can help to limit this burden. They present the benefits and challenges of the discovery of novel RNA biomarkers, the need for networking efforts, and the added value of artificial intelligence to achieve reliable advances.


Introduction: SARS-CoV-2 in 2020
The effect of the coronavirus disease 2019 (COVID-19) pandemic on the cardiovascular system is alarming. More research focusing on the collateral damage associated with COVID-19 infection is needed. COVID-19 causes pneumonia with multi-organ disease.
Infection can be asymptomatic or may cause a wide spectrum of symptoms, from mild upper respiratory tract infection to life-threatening sepsis with generalized endothelial damage, inflammation and thrombosis. COVID-19 first emerged in December 2019 in Wuhan, China, and as of 8 February 2021 has affected people in more than 200 countries, with more than 105 million identified cases and over 2.3 million confirmed deaths (WHO Coronavirus Disease Dashboard). One of the causes of the significant differences in the severity of symptoms and mortality may be patient susceptibility to infection. Moreover, a significant proportion of COVID-19 survivors suffer cardiovascular damage. As such, there is a clinical need for novel biomarkers that would aid in the identification of patients at risk of suffering a severe form of the disease, or of patients prone to develop collateral damage in the vascular, cardiac and cerebrovascular systems that may jeopardize their future wellbeing. We need to investigate and innovate to hold back the next pandemic wave of COVID-related cardiovascular disease.
To face the pandemic and limit its medical, social and economic burden, health authorities have launched several Fast Track calls for research projects aiming to develop rapid strategies to combat the disease, as well as longer-term projects to learn and draw lessons from the current pandemic and prepare for the future 1 .
A myriad of potential biomarkers of COVID-19, for both diagnostic and prognostic purposes, have been highlighted in an extremely high number of articles published within the few months following the beginning of the pandemic. Although it is difficult to identify from all these reports the most relevant biomarkers with serious translational potential, artificial intelligence approaches could constitute a key component of such endeavours. Cardiovascular and blood RNA markers, coupled with artificial intelligence methods, represent a still poorly explored yet rich reservoir of novel biomarkers with some potential to aid in personalizing healthcare of COVID-19 patients. Recent single-cell RNA sequencing experiments support this assumption 2 .

Epidemiology of SARS-CoV-2 and cardiovascular disease
SARS-CoV-2 infection affects mostly the ageing population with pre-existing cardiovascular diseases, such as coronary artery diseases, heart failure or respiratory failure of any origin. Moreover, individuals with pre-existing risk factors for cardiovascular disease or with co-morbidities affecting the cardiovascular system are at high risk of a worse clinical outcome during the infection 3 . Patients with pre-existing heart failure and SARS-CoV-2 infection have a two-fold higher risk of 30-day mortality as compared to patients without pre-existing heart failure and SARS-CoV-2 infection, independently of the category of heart failure (reduced, mid-range, or preserved ejection fraction) 9 . Multi-organ failure due to hypoxia caused by respiratory failure, acute kidney injury, electrolyte disturbances, systemic inflammation and cytokine storm contribute to the cardiac injury in patients with SARS-CoV-2. The cytokine storm seems to contribute to a large extent to cardiac and vascular events.
However, there are reports asking for a more concise definition of the cytokine storm and its real impact in the pathogenesis of the infection [10][11][12] . Altered coagulation may lead to thrombotic complications including microthrombosis, microvascular damage, and generalized thromboembolic disorder. Recent empirical drugs against COVID-19 such as chloroquine, antiviral or anti-rheumatic drugs, monoclonal antibodies or antibiotics may also aggravate cardiovascular symptoms by prolonging the QT interval, leading to arrhythmias, or by causing drug-induced cardiomyopathies or cardiotoxicity 4 . Since SARS-CoV-2 has a strong affinity for the angiotensin-converting enzyme 2 (ACE2) cell receptor, it was plausible to assume that antihypertensive treatment with ACE inhibitors or angiotensin receptor blockers (ARBs) might aggravate the disease. To date, however, no clinical evidence confirms this assumption, and ACE inhibitor and ARB treatments therefore continue to be administered to SARS-CoV-2 positive patients 13,14 . The lockdown regulations and subsequent closure of out-patient clinics have led to major re-organizational efforts in the management of patients with cardiovascular disease. A paradoxical decrease in documented acute myocardial infarction has also been observed, which could be attributed to the lack of preventive control of patients with chest pain and to the self-quarantining of patients fearing the risk of nosocomial infection 15 .

Pathophysiology of SARS-CoV-2 infection phases; effects on the heart
SARS-CoV-2, a member of the family of coronaviruses, is an enveloped, positive-sense, single-stranded RNA virus that is able to infect various host species 16 . Among the virally encoded proteins, the SARS-CoV-2 spike (S) transmembrane glycoprotein protrudes from the viral surface and is essential for target cell binding and infection. ACE2 has been identified as the SARS-CoV-2 receptor 17-20 and is highly expressed in the lung, heart, ileum, kidney and bladder 21 . The majority of adaptive immune cells that invade the infected lung tissue consist of T cells, since a proportional decrease in circulating T cells has been observed in COVID-19 patients. IL-8 and IL-6, recognized chemoattractants for T cells and neutrophils, are produced by SARS-CoV-2-compromised lung epithelial cells (Figure 1a) 22 . As neutrophils function in adaptive immunity but can also provoke further damage to the lung, these cells are regarded as double-edged swords in the context of COVID-19 23 . Circulating monocytes are attracted from the circulation by granulocyte macrophage colony stimulating factor that is produced by local T cells in infected tissue. In addition, elevated CD14+CD16+ inflammatory monocytes producing high levels of IL-6 are found in COVID-19 patients, suggesting that monocytes also actively contribute to the systemic inflammatory response. Finally, thrombosis and pulmonary embolism are commonly observed in severely ill COVID-19 patients (Figure 1b), likely indicating the presence of significant endothelial injury and microvascular permeability, which may further exacerbate viral invasion. The symptoms of COVID-19 patients are heterogeneous, ranging from minimal symptoms to significant hypoxia with acute respiratory distress, shock, coagulation dysfunction, and multi-organ involvement, including acute kidney injury, encephalopathy, myocardial injury and heart failure.
Indeed, epidemiological, clinical and biological evidence shows a clear cardiac involvement in COVID-19 patients, due to direct myocardial infection and injury and/or to indirect mechanisms, linked to the underlying pathophysiology of the disease 24 .
In keeping with a direct effect on heart function of SARS-CoV-2 (Figure 1c), its receptor ACE2 is expressed by cardiomyocytes, fibroblasts, endothelial cells, pericytes, macrophages and the epicardial fat 21 . Moreover, ACE2 levels are increased in failing hearts and its high expression in arterial vascular cells of fibrotic lungs may facilitate the bloodstream spreading of SARS-CoV-2 25 . Cardiomyocytes derived from human induced pluripotent stem cells can be infected efficiently by SARS-CoV-2 26,27 . The SARS-CoV-2 genome has been identified in endomyocardial biopsies of patients with suspected myocarditis 28 . However, while cardiomyocyte damage was present, no viral particles were detected in cardiomyocytes and endothelium, suggesting that the particles were due to infected macrophage migration. Thus, direct myocardial infection may not be the main mechanism of myocardial damage explaining the frequently observed troponin increases.
The release of inflammatory cytokines (Figure 1c), a hallmark of severe COVID-19, can also lead to a form of myocarditis resembling Takotsubo syndrome 29 . Moreover, the prothrombotic state of COVID-19 patients, associated with increased D-dimer levels, may lead to microvascular dysfunction, coronary thrombosis or embolism (Figure 1b) 30 . Along with the pro-coagulant profile of patients with COVID-19 31 , other forms of stress may facilitate the occurrence of cardiomyopathy, such as hypoxemia caused by respiratory dysfunction, endothelial dysfunction leading to small arterial obliteration 28 , and increased metabolic demands (Figure 1d).

Remdesivir
Remdesivir is the first medicinal product for human use for the treatment of COVID-19 to be granted a conditional marketing authorization of the European Parliament and of the Council 32 . It is a nucleotide analogue with broad-spectrum antiviral activity. The European Medicines Agency, specifically the Committee for Medicinal Products for Human Use, has granted a conditional marketing authorization to Veklury (remdesivir) for the treatment of COVID-19 in adults and adolescents with pneumonia who require supplemental oxygen (O2) 33 . The recommendation of remdesivir is mainly based on the results of the Adaptive COVID-19 Treatment Trial (ACTT)-1 sponsored by the US National Institute of Allergy and Infectious Diseases, and on supporting data from other studies on remdesivir 33,34 . According to the ACTT-1 study, patients in the remdesivir group had a shorter time to recovery than patients in the placebo group (median 10 vs. 15 days) 35 . Kaplan-Meier estimates of mortality at day 29 were 11.4% in the remdesivir group and 15.2% in the placebo group (hazard ratio 0.73; 95% CI 0.52-1.03) 35 . The Food and Drug Administration (FDA) issued an emergency use authorization 36 . Remdesivir has been shown to shorten recovery time in severe patients with O2 saturation ≤ 94% and in cases requiring supplemental O2, mechanical ventilation, or extracorporeal membrane oxygenation 37,38 . It is recommended to start the treatment on day 1 with a 200 mg infusion, followed by a 100 mg infusion daily for at least 4 days and a maximum of 9 days 33 . According to the WHO SOLIDARITY trial (results in preprint), the death rate ratio for remdesivir is RR=0.95 (95% CI 0.81-1.11, p=0.50) 39 . Comparative results from other studies are shown in Table 1. Overall, remdesivir, while improving time to recovery in patients with mild symptoms in the ACTT-1 trial, fails to improve mortality.
Interferon (SOLIDARITY trial): three doses over six days of 44 μg subcutaneous interferon-β1a.

Dexamethasone
According to the RECOVERY trial results, the incidence of death in the dexamethasone group was lower than in the usual care group among patients receiving invasive mechanical ventilation or oxygen alone 40 . Based on these results, 6 mg of dexamethasone is recommended once daily for up to 10 days in COVID-19 patients on mechanical ventilation or who require supplemental O2 but who are not on mechanical ventilation 38,41 .

Chloroquine or hydroxychloroquine, lopinavir-ritonavir
Although chloroquine and hydroxychloroquine were among the medications that appeared to show great potential at the beginning of the COVID-19 pandemic, their use has been stopped due to lack of efficacy. Numerous companies donated these medications for treating COVID-19 patients; however, the FDA revoked the emergency use authorization for these drugs. Furthermore, the combined use of hydroxychloroquine and azithromycin is not recommended because of the potential adverse reactions. Lopinavir/ritonavir also did not demonstrate benefit in patients with COVID-19. As reported in Table 1, the interim WHO SOLIDARITY trial results indicate that remdesivir, hydroxychloroquine, lopinavir and interferon treatments had little or no effect on hospitalized COVID-19 patients, as indicated by overall mortality, initiation of ventilation and duration of hospital stay 39 .

Immunomodulatory medications
Several medications used in modulating the immune response, such as interleukin-1 (anakinra) or interleukin-6 (sarilumab, siltuximab, tocilizumab) inhibitors are being used off-label and are being investigated. These medications have been proposed to suppress the cytokine storm 42 .

Convalescent plasma
Convalescent plasma containing antibodies against SARS-CoV-2, collected from recovered COVID-19 patients, is also being widely investigated. A randomised clinical trial with convalescent plasma therapy did not show any statistically significant improvement in clinical status or death rate 43 . However, this trial provided valuable information on the potential benefits of convalescent plasma, which may be useful in combination with antiviral drugs. According to some preliminary research, early administration of high-dose intravenous immunoglobulin therapy may improve the prognosis of critically ill patients 44 . On August 23, 2020, the FDA issued an emergency use authorization for convalescent plasma for the treatment of COVID-19 in hospitalized patients 45 .

Markers of disease evolution: what is available, what is needed
As the world faces the COVID-19 pandemic, markers that enable prediction of the development of severe symptoms after SARS-CoV-2 infection are highly needed.
The presence of cardiovascular risk factors (particularly arterial hypertension, diabetes mellitus and aging) and previous cardiovascular diseases reportedly predisposes to an unfavourable progression of COVID-19 46 . As such, they can already provide an initial and rudimentary model for risk stratification of patients.
Mortality rate after COVID-19 is associated with elevation in the "classic" cardiac damage biomarkers, such as troponin T (TnT) and/or BNP/NT-proBNP 3,47 . In line with that, COVID-19 patients who do not have significantly increased TnT levels show a lower mortality compared to patients without cardiovascular disease 5,48 . This suggests that TnT and BNP/NT-proBNP concentrations should be closely followed in patients with COVID-19, both for diagnostic (cardiac involvement) and prognostic purposes. Elevations of D-dimers have also been associated with poor outcome 49 . The addition of other biomarkers such as the inflammatory cytokine IL-6 and lymphocyte count will also be helpful to determine the individual risk of a patient.
Omics-based approaches recently discovered interesting metabolites in plasma of patients with COVID-19. Using both targeted and untargeted tandem mass spectrometry to profile the plasma lipidome and metabolome of COVID-19 patients with various degrees of severity and healthy controls, a panel of 10 plasma metabolites was found to distinguish COVID-19 patients from healthy controls with an area under the receiver-operating characteristic curve (AUC) of 0.975 50 .
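To make the reported AUC concrete, the sketch below computes it as the probability that a randomly chosen patient receives a higher panel score than a randomly chosen control, which is equivalent to the area under the ROC curve. The function name and the scores are hypothetical and purely illustrative; they are not data from the cited study.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher; tied pairs count as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical panel scores (illustration only, not study data).
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.5, 0.3, 0.2, 0.1]
example_auc = roc_auc(cases, controls)
```

An AUC of 1.0 would mean that every patient scores above every control, whereas 0.5 corresponds to a panel with no discriminative value.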
Biomarkers that might be useful in indicating progression from mild to severe multiorgan complication in COVID-19 patients are summarized in Tables 2 and 3.
The role of the cardiovascular expression/activity of the putative SARS-CoV-2 receptor ACE2, as well as of the use of renin-angiotensin-aldosterone system (RAAS) inhibitors, in SARS-CoV-2 susceptibility and COVID-19 disease severity has been a matter of debate [68][69][70][71] . However, the clear recommendation is to continue administering RAAS inhibitors or blockers in SARS-CoV-2 positive patients with underlying cardiovascular disease. Elevated angiotensin II levels have been found to correlate with lung injury and viral load, suggesting that administration of angiotensin 1-7 and angiotensin 1-9 may help to restore normal functioning of the renin-angiotensin system by antagonizing the effect of abnormally increased angiotensin II 72 .
Circulating RNAs represent a rich source of biomarkers with clinical utility due to their biological relevance, dynamic regulation in response to onset and progression of disease, tissue-specificity, and accessibility for non-invasive analysis using biofluids ("liquid biopsies"). Especially for diseases with diverse symptoms and complications such as [...]
Given the disproportionate impact of COVID-19 in ethnic minorities, it is essential to clarify whether biomarkers are of use in such populations and, if so, how they could be adapted ad hoc. Not only cardiac but also endothelial biomarkers deserve attention 77 . Gender-medicine considerations for COVID-19 cardiovascular risk stratification are also of paramount importance. Women appear to be better protected, as men display higher mortality rates (ranging from 60 to 75 %) 78 . Should this be due to a protective effect of oestrogens, perimenopausal and postmenopausal women without hormonal replacement therapy could be considered at higher risk of cardiovascular death following COVID-19.
Preclinical evidence suggests that sex may influence the expression of the ACE2 receptor 78 . Hence, the examination of sex differences should be an integral part of COVID-19 directed research projects. This is especially crucial as sex-specific RNA biomarkers may help in tailoring future healthcare. Addressing the increasing challenges posed by communicable diseases thus calls for multidisciplinary and multi-centre international cooperation to link available data, tools and expertise, which will otherwise only be sub-optimally exploited at regional or national levels. A truly integrated approach coordinating and facilitating the access to and sharing of biological resources, data, advanced technological facilities and expertise, within a common research roadmap, is needed to exploit the full potential of the various resources. As COVID-19 incidence and clinical outcomes have been shown to be greatly influenced by many biological and environmental factors, the need to integrate data across the various settings worldwide is critical to increase the precision of analyses and to deliver meaningful results.

Networking and coordination efforts for multinational multi-center studies on cardiovascular RNA markers
Through the EU-CardioRNA COST Action 79 , in April 2020, a call was placed to assemble a taskforce of clinicians and translational scientists working with COVID-19 patients to join forces in an international effort. This was communicated internally within the Action network as well as externally on the Action website and professional (social) media (https://cardiorna.eu/news/cost-actions-unite-efforts-in-the-fight-against-covid-19/) 80

Technical challenges and requirements in the RNA-study
The quantitative analysis of RNAs in biological samples faces several technical challenges that must be overcome in order to generate robust and reproducible results.
Specifically, the analysis of circulating RNAs is complicated by a variety of preanalytical settings that impact the analysis as well as the analytical challenge to deal with very low RNA concentrations.
To date, whole blood, serum and plasma are the most widely explored liquid matrices for circulating RNA analysis. Analysis of whole blood can be biased by red blood cells and platelets, which are a rich source of small RNAs despite being anucleate 83 . Thus, protocols for specific depletion of certain types of RNAs have been developed for whole blood that improve sensitivity for other types of RNAs 84 . Serum and plasma as the liquid components of blood can behave quite differently due to the release of RNAs during platelet activation and blood coagulation after which serum is collected 85 . Therefore, results for RNA biomarker analysis are oftentimes not comparable between serum and plasma 86 . In addition, contamination of serum or plasma with cellular RNA derived from red blood cells due to haemolysis 87, 88 , or platelets due to variable pre-analytical processing 89 , can confound the analysis and lead to false-positive or false-negative results 90 .
Currently, only a few studies have attempted to address sources of bias for other types of liquid biopsies. For example, in the case of urine it is known that donor-dependent differences in volume based on hydration status result in highly variable RNA concentrations that require normalization prior to analysis, using for example urinary creatinine levels 91 .
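As a minimal sketch of the urinary normalization mentioned above, the RNA concentration can be expressed per unit of creatinine so that dilution effects cancel out. The function name, units and values are assumptions made for illustration and do not reproduce the protocol of the cited study.

```python
def normalize_to_creatinine(rna_concentration, creatinine_concentration):
    """Express a urinary RNA concentration per unit of creatinine.

    Units are hypothetical (e.g. copies/uL and mg/dL); any consistent
    pair of units gives a comparable ratio across samples.
    """
    if creatinine_concentration <= 0:
        raise ValueError("creatinine concentration must be positive")
    return rna_concentration / creatinine_concentration


# Same hypothetical donor, sampled when well hydrated and when dehydrated.
dilute = normalize_to_creatinine(10.0, 50.0)
concentrated = normalize_to_creatinine(30.0, 150.0)
```

In this illustration the raw RNA concentrations differ three-fold, but the creatinine-normalized values are identical, which is the behaviour the normalization is meant to achieve.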
In biofluids, RNAs are associated with two main types of RNA carriers, which facilitate transport and protect their RNA cargo from degradation: protein complexes and extracellular vesicles (EVs). At least in terms of small RNAs, it is known that the majority of extracellular RNAs in plasma or conditioned media is associated with protein complexes 92,93 . This means that total RNA isolation and analysis from these matrices mainly reflects the protein-associated RNA fraction, and that the separate analysis of RNAs that are selectively released via EVs can reveal different results 94 . It is important to note that RNA analysis in EVs is anything but trivial and requires careful optimization of EV isolation and characterization and reporting according to the MISEV standard developed by the International Society of Extracellular Vesicles 95 .
The analysis of the integrity and abundance of RNA obtained by RNA isolation is hampered by low concentrations. Thus, highly sensitive methods using RNA-specific dyes should be used, and internal process controls such as spike-in oligonucleotides ("spike-ins") can be useful to monitor RNA recovery and analytical variability and to normalize RNA expression data in biofluids in the absence of robust endogenous RNA references.
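The use of spike-ins for normalization can be sketched as follows, assuming that a known amount of a synthetic oligonucleotide is added to every sample before RNA isolation. The function, the key names and the numbers are hypothetical; real workflows typically use several spike-ins and method-specific corrections.

```python
def normalize_by_spike_in(counts, spike_in_name, expected_spike_in):
    """Rescale all measurements so that the spike-in matches the amount
    that was added; the spike-in itself is dropped from the result."""
    observed = counts[spike_in_name]
    if observed <= 0:
        raise ValueError("spike-in not recovered; sample cannot be normalized")
    factor = expected_spike_in / observed
    return {name: value * factor
            for name, value in counts.items() if name != spike_in_name}


# Hypothetical example: identical RNA content, but 100% vs. 50% recovery.
sample_full = {"spike_in": 1000, "rna_a": 200}
sample_half = {"spike_in": 500, "rna_a": 100}
norm_full = normalize_by_spike_in(sample_full, "spike_in", 1000)
norm_half = normalize_by_spike_in(sample_half, "spike_in", 1000)
```

After rescaling, both samples report the same value for the hypothetical target RNA, so the difference in recovery no longer masquerades as a biological difference.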
Analytical methods for circulating RNA quantification must also be highly sensitive to cope with low concentrations. Reverse-transcription quantitative PCR (RT-qPCR) is a gold-standard technology for this purpose. However, low throughput and high cost for using RT-qPCR in genome-wide RNA biomarker discovery have restricted its use to targeted analyses for biomarker validation. This limitation resulted in the uptake of next-generation sequencing (NGS) for untargeted RNA biomarker discovery. Since it was observed early on that the abundance and stability of small RNAs in biofluids were surprisingly high, small RNA sequencing was rapidly adopted for biomarker identification in liquid biopsies 96 .
The challenges for using small RNA NGS for circulating RNA analysis are 1) the extended PCR pre-amplification that is needed to obtain sufficient input material but potentially results in PCR duplicates, 2) adapter-ligation bias leading to over- and under-representation of certain RNAs in the library, and 3) the relative quantification that restricts the main use to cross-sectional comparisons between selected groups. To overcome these challenges, first, unique molecular indices can be included in the adapter sequences to identify and remove PCR duplicates prior to data analysis 97 . Second, sophisticated adapter designs such as randomized ends or single-ligation protocols have been shown to reduce the ligation bias and reduce adapter-dimers 98,99 . Finally, the addition of spike-in calibrators with randomized ends and optimized concentration ranges can be used to normalize small RNA NGS data and achieve absolute quantification that is less sensitive towards changes in the (small) RNA composition of a sample 100 .
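The removal of PCR duplicates with unique molecular indices can be illustrated with a minimal sketch: reads that share both the RNA sequence and the index are assumed to derive from one original molecule and are counted once. The function name and the reads are hypothetical, and the sketch deliberately ignores sequencing errors within the index, which dedicated tools handle.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct molecular indices per sequence.

    `reads` is an iterable of (sequence, molecular_index) pairs; reads
    sharing both values are treated as PCR duplicates of one molecule.
    """
    indices_per_sequence = defaultdict(set)
    for sequence, molecular_index in reads:
        indices_per_sequence[sequence].add(molecular_index)
    return {sequence: len(indices)
            for sequence, indices in indices_per_sequence.items()}


# Hypothetical reads: three copies of each sequence before deduplication.
reads = [
    ("ACGT", "AAA"), ("ACGT", "AAA"), ("ACGT", "AAC"),
    ("TTGA", "GGG"), ("TTGA", "GGG"), ("TTGA", "GGG"),
]
molecule_counts = count_unique_molecules(reads)
```

Both sequences have three reads, yet the first stems from two original molecules and the second from only one, a difference that raw read counts would hide.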
Recently, the application of total RNA sequencing for RNA biomarker discovery in liquid biopsies has also advanced, making it possible to explore the full spectrum of RNAs. A stranded total RNA sequencing kit appeared to be sufficiently robust, accurate and precise to quantify thousands of genes in platelet-rich and platelet-free plasma, urine, and conditioned medium as well as EVs isolated from these matrices 101 . EVs from platelet-free plasma showed a large percentage (>80%) of reads that were too short to be aligned. This was not observed for total RNA from platelet-free plasma and platelet-rich plasma, nor for total RNA and EV-RNA from urine and conditioned medium. This might suggest that RNA released from cells via EVs into the blood stream is fragmented endogenously. In terms of gene biotypes, protein-coding genes made up the majority (>70%) of reads for all matrices except platelet-rich plasma, followed by pseudogenes, long noncoding RNAs (lncRNAs), and miscellaneous RNAs 101 .
Overall, the planning of an ideal RNA biomarker study should, firstly, include a conscious decision on which biological matrix and RNA carrier are most relevant and practical; secondly, implement standardized protocols for sample collection and sample quality control at the study sites; and thirdly, take advantage of a well-characterized, fit-for-purpose validated NGS protocol for genome-wide total RNA and small RNA quantification in low RNA input samples.

Data handling and integration
A data infrastructure that curates, integrates and analyses clinical and experimental data from several COVID-19 cohorts is pivotal to make harmonized data available to research network members in order to unravel cardiovascular RNA markers of SARS-CoV-2 infection. Systematic collection and the application of standards play an important role in managing and handling cohort data and their metadata efficiently. They facilitate FAIR (Findable, Accessible, Interoperable and Reusable) use of the data, which provides a solid foundation for systematically discovering, retrieving, understanding, integrating, disseminating, exchanging and reusing the data, and for reproducing research results and outcomes.

Making data findable, including provisions for metadata
In order to make the data discoverable, the following rules should be observed:
• Data sets need to be assigned a unique identifier within the project. The data management team ensures that the identifier is globally unique.
• Accompanying metadata such as the study protocol, experimental parameters etc. should be provided. This will make it possible for members of research networks to fully grasp the experimental setup and data content.
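The two rules above can be sketched as a small registry that assigns an identifier to each data set and stores the accompanying metadata with it. The use of random UUIDs, the in-memory dictionary and all field names are assumptions made for illustration; a production system would rely on a persistent identifier service and a controlled metadata schema.

```python
import uuid


def register_dataset(registry, metadata):
    """Store metadata under a newly generated identifier and return it.

    A version 4 UUID is used as a stand-in for a globally unique
    identifier; the loop guards against an (extremely unlikely) clash.
    """
    dataset_id = str(uuid.uuid4())
    while dataset_id in registry:
        dataset_id = str(uuid.uuid4())
    registry[dataset_id] = dict(metadata)
    return dataset_id


# Hypothetical metadata records (file names and fields are illustrative).
registry = {}
first_id = register_dataset(registry, {
    "study_protocol": "protocol_v1.pdf",
    "experimental_parameters": {"matrix": "plasma", "method": "small RNA NGS"},
})
second_id = register_dataset(registry, {
    "study_protocol": "protocol_v1.pdf",
    "experimental_parameters": {"matrix": "urine", "method": "RT-qPCR"},
})
```

Keeping the metadata attached to the identifier means that any member of the network who retrieves a data set also retrieves the description of how it was generated.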

Making data accessible
Data should be made available to a broader audience in accordance with the access model that will be defined by participant informed consent and ethics/institutional review board approvals. This should include descriptions and data formats, and be in compliance with legal obligations, in particular the General Data Protection Regulation (GDPR). Data security is of paramount importance for the protection of personally identifiable information.

Making data interoperable
Harmonization of data and metadata by applying standard ontologies, controlled terminologies, and state-of-the-art data models is pivotal for interoperability of the data that will facilitate cross-study analysis. Clinical and phenotype data should be standardised by using state-of-the-art standards such as the CDISC standards: Study Data [...]
In addition to applying the state-of-the-art standards for clinical and omics data, the application of standards in data management to guarantee data security, data privacy and compliance with GDPR and ethical guidelines is necessary. Given the sensitive nature of human data, the data and computing environment must be access-controlled, and input/output data flows should be encrypted, site-restricted and equipped with two-factor authentication wherever needed.

Increase data re-use (through clarifying licences)
The long-term sustainability of the database, analysis portal and related outputs (results, tools, software modules and algorithms) should be planned in advance. For archiving, preservation and long-term usage of the data and software tools/algorithms, research network partners should have the capacity to provide long-term sustainability of translational research data through GDPR-compliant hosting and tools. The process should follow well-defined access criteria and data protection needs. We recommend preparing a sustainability plan that defines the rules for fulfilling the legal processes (including addressing the issue of institutional data access committee responsibility), the governance and the economic viability of the database.

Data integration
A robust and secure data management and analysis platform, for example through a software portal and database, is important for the collection and integration of harmonized clinical and healthcare (electronic health records) data, pre-processed omics (molecular) data, imaging data, real-world sensor/mobile data, biobank sample data and metadata from various COVID-19 projects (Figure 2). Such a data portal should also provide a secure, easy and robust interface for the input and integration of new data from ongoing recruitment of cohort studies. Analytical tools from existing initiatives/packages such as I2B2 106 , tranSMART 107 , SmartR 108 , EGA (European Genome-phenome Archive) 109 , and the eTRIKS platform 110 are very useful to perform integrated data analysis and hypothesis generation. In order to store, process and analyse imaging data, for example chest X-ray images from COVID-19 patients, a dedicated open-source imaging informatics solution such as XNAT 111 should be integrated into the platform instead of only storing the images in a file system. Such a portal will enable researchers to perform cross-study comparisons, slice and dice the cohorts based on certain clinical features and run built-in workflows from the graphical user interface. An application programming interface to enable batch/programmatic interaction with the portal will provide structured and harmonized data to bioinformaticians, statisticians and data scientists working with large amounts of data.

Data analysis, biostatistics, artificial intelligence
After the RNA and clinical data are collected, secured, pre-processed and integrated, the most informative biomarkers to predict major adverse cardiovascular events (MACE) and mortality of COVID-19 patients shall be identified. This identification can rely on biostatistical and machine-learning (ML) methods. Afterwards, ML should be utilized to build a classifier to predict MACE and mortality based on these biomarkers.
For this approach to be used, RNA expression data accompanied by demographic and  112,113 .

Biomarker identification
The most basic approach to identify predictive RNAs is differential expression analysis: RNAs that are significantly over- or under-expressed in patients who experienced a MACE or died, compared to those who did not, are potential biomarkers. Various statistical methods can be used for this 114 . However, this approach is simplistic, mainly in that it does not take into account interactions between the RNAs, so it can only serve as the first step. Two more sophisticated approaches can be explored: Bayesian variable selection (BVS) and feature selection.
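The differential expression step described above can be sketched as follows. This is only a minimal illustration on synthetic data, assuming log-scale expression values, Welch's t-test as the per-RNA test, and Benjamini-Hochberg correction for multiple testing; the function names, the 0.05 threshold and the simulated data are assumptions for illustration, not the method prescribed here.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def differential_expression(expr, event, alpha=0.05):
    """expr: (n_patients, n_rnas) log-expression matrix; event: 1 = MACE/death, 0 = none."""
    event = np.asarray(event).astype(bool)
    # Welch's t-test per RNA (column); it does not assume equal variances.
    _, p_val = stats.ttest_ind(expr[event], expr[~event], axis=0, equal_var=False)
    q_val = benjamini_hochberg(p_val)
    return np.flatnonzero(q_val < alpha), q_val


# Synthetic example (hypothetical data): 200 patients, 50 RNAs,
# of which RNAs 0 and 1 are truly over- and under-expressed in patients with an event.
rng = np.random.default_rng(0)
event = rng.integers(0, 2, size=200)
expr = rng.normal(size=(200, 50))
expr[:, 0] += 1.5 * event
expr[:, 1] -= 1.5 * event

selected, q_values = differential_expression(expr, event)
print("Candidate biomarkers (column indices):", selected)
```

Each RNA is tested in isolation here, which is exactly the limitation noted above: interactions between RNAs are invisible to this procedure.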
Bayesian variable selection is a state-of-the-art statistical approach for selecting informative predictors such as RNA biomarkers 115 . One first picks a class of models, such as linear or logistic regression models, to predict the end-point of interest (e.g. MACE or mortality) based on the predictors (RNA quantities). The goal is to select from this class those models that are able to accurately predict end-points. To do so, prior probability distributions of their parameters need to be set first. The most appropriate strategy for doing this is the subject of ongoing research, but one of the accepted automatic methods can be used. We believe, though, that information on the RNAs' biological function from the NONCODE database 116 , or their overlap with genomic loci related to cardiovascular disease, could yield more informative biomarkers. Based on the models' prior probabilities and the collected data, one computes their posterior probabilities using Bayes' rule; good models are the ones with a high posterior probability. Since the space of models is too large to search exhaustively, Monte Carlo sampling is used, which can relatively quickly identify accurate models.
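To make the posterior computation described above concrete, the following sketch enumerates all subsets of a small number of predictors and scores each linear model by Bayes' rule. Several simplifications are assumptions made for illustration: the end-point is continuous and synthetic, the marginal likelihood is approximated by -BIC/2, each predictor has the same independent prior inclusion probability, and the model space is small enough to enumerate. With realistic numbers of RNAs the enumeration would be replaced by Monte Carlo sampling, as noted above.

```python
import itertools
import numpy as np


def log_evidence_bic(X, y, subset):
    """Approximate the log marginal likelihood of a linear model by -BIC/2."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    k = design.shape[1]  # number of regression coefficients, including intercept
    return -0.5 * (n * np.log(rss / n) + k * np.log(n))


def posterior_inclusion(X, y, prior_inclusion=0.5):
    """Posterior inclusion probability of each predictor and the most probable model."""
    p = X.shape[1]
    models, log_post = [], []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            # Independent Bernoulli prior on the inclusion of each predictor.
            log_prior = (size * np.log(prior_inclusion)
                         + (p - size) * np.log1p(-prior_inclusion))
            models.append(subset)
            log_post.append(log_evidence_bic(X, y, subset) + log_prior)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # Bayes' rule, up to a constant
    post /= post.sum()                        # normalise over all models
    inclusion = np.zeros(p)
    for subset, prob in zip(models, post):
        for j in subset:
            inclusion[j] += prob
    return inclusion, models[int(np.argmax(post))]


# Synthetic example (hypothetical data): 6 candidate RNAs, only 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=150)

inclusion, best_model = posterior_inclusion(X, y)
print("Posterior inclusion probabilities:", np.round(inclusion, 3))
print("Most probable model:", best_model)
```

A more informative prior, for example one raising `prior_inclusion` for RNAs with a known cardiovascular function, would enter through the `log_prior` term.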
Feature selection is an approach that selects informative features (RNA biomarkers) to be used to train ML models that predict the end-point of interest (MACE or mortality) 117 .
There are three main groups of feature-selection methods. Filter methods consider each feature in isolation and are similar to differential expression analysis, so they are rarely the best option. Embedded methods are a part of some ML algorithms. Their quality depends on the quality of the algorithm they are derived from, but they can take into account some interactions between features. Wrapper methods are the most complex ones and are conceptually similar to BVS. They search the space of feature combinations, and evaluate each combination by training a model on it and checking the model's accuracy.
Since the space of feature combinations is again too large to search exhaustively, various types of greedy search are typically used. The main advantage of simple approaches, such as differential expression analysis or filter feature selection, is the clear justification for the selection of each biomarker. The disadvantage is that they can provide redundant biomarkers or fail to identify RNAs having biomarker potential only when combined with others. The advantage of BVS and more advanced feature selection is that they provide sets of biomarkers that perform well in combination. The disadvantages are that they are somewhat opaque and computationally expensive. Wrapper methods appear to be the most flexible and potentially most powerful methods to identify predictive biomarkers.
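A wrapper method with greedy forward search, as described above, might look like the sketch below. It assumes scikit-learn is available and uses logistic regression scored by cross-validated AUC as the wrapped model; the stopping rule (`min_gain`), the cap on the number of features and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def forward_selection(X, y, max_features=5, min_gain=0.005, cv=5):
    """Greedy forward wrapper: repeatedly add the feature that most improves CV AUC."""
    selected, best_score = [], 0.5  # 0.5 = AUC of a random classifier
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            model = LogisticRegression(max_iter=1000)
            # Evaluate the candidate feature combination by training a model on it.
            scores[j] = cross_val_score(
                model, X[:, cols], y, cv=cv, scoring="roc_auc"
            ).mean()
        j_best = max(scores, key=scores.get)
        if scores[j_best] < best_score + min_gain:
            break  # no candidate improves the model enough
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score


# Synthetic example (hypothetical data): 20 RNAs, only 2 and 7 drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
logits = 2.0 * X[:, 2] - 2.0 * X[:, 7]
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

selected, score = forward_selection(X, y)
print("Selected features:", selected, "cross-validated AUC:", round(score, 3))
```

scikit-learn also ships a ready-made `SequentialFeatureSelector` that implements the same idea; the explicit loop is shown here only to expose the search.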
Of the two approaches, Bayesian variable selection and feature selection, we recommend using either the one that results in better risk-prediction models on the validation dataset or a combination of both. They can be combined in sequence (one making the first selection and the other refining it) or in parallel (by using the intersection or union of the biomarkers selected by the two approaches). The best approach depends on the dataset and the outcome to predict, and needs to be determined experimentally.
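The two ways of combining selection approaches can be illustrated with a short sketch. The RNA identifiers, the scores and the two toy selectors below are hypothetical placeholders standing in for the output of BVS and of a feature-selection method.

```python
def combine_parallel(selection_a, selection_b, mode="intersection"):
    """Combine two biomarker sets selected independently (in parallel)."""
    a, b = set(selection_a), set(selection_b)
    if mode == "intersection":
        return sorted(a & b)
    if mode == "union":
        return sorted(a | b)
    raise ValueError("mode must be 'intersection' or 'union'")


def combine_sequential(first_selector, second_selector, candidates):
    """The first selector makes the initial selection; the second refines it."""
    shortlist = first_selector(candidates)
    return second_selector(shortlist)


# Hypothetical selections from two approaches.
bvs_hits = ["RNA_01", "RNA_07", "RNA_12"]
wrapper_hits = ["RNA_07", "RNA_12", "RNA_30"]
strict = combine_parallel(bvs_hits, wrapper_hits, "intersection")
lenient = combine_parallel(bvs_hits, wrapper_hits, "union")

# Hypothetical relevance scores used by two toy selectors.
scores = {"RNA_01": 0.9, "RNA_07": 0.8, "RNA_12": 0.6, "RNA_30": 0.3}
keep_above_half = lambda rnas: [r for r in rnas if scores[r] > 0.5]
keep_top_two = lambda rnas: sorted(rnas, key=scores.get, reverse=True)[:2]
refined = combine_sequential(keep_above_half, keep_top_two, list(scores))

print(strict, lenient, refined)
```

The intersection favours biomarkers supported by both approaches, the union favours coverage, and the sequential variant lets a cheap method shrink the candidate list before a more expensive one is run.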

Cardiovascular/COVID-19 risk prediction
After identifying the most informative RNA biomarkers, these, together with phenotype (demographic and clinical) data, are fed into ML algorithms to build risk-prediction models. Figure 3 depicts the workflow of biomarker identification and COVID-19 risk prediction.
Details of data collection and data management are depicted in Figure 2. The workflow starts with data collection and management. This is followed by biomarker identification using the training dataset (cf. section Biomarker identification) and machine learning model development with the validation dataset. Note that even though biomarker identification can be done independently of phenotype and clinical data, such data are often included in the prediction models. This enables one to analyse their capacity to predict MACE and mortality alongside the RNA biomarkers. Finally, the prediction model is thoroughly evaluated using the test dataset.
The support vector machine (SVM) is another ML algorithm that is able to capture data non-linearity. SVM applies a kernel to map data into a multidimensional space. An SVM model is a hyperplane that splits the classes in this multidimensional space in a way that minimizes the prediction error during data classification. The selection of the kernel function is crucial to the algorithm's performance. Compared to artificial neural networks (ANNs), SVMs tend to be more resistant to overfitting (they better handle noise in the training data) and require less memory.
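The role of the kernel and of the training, validation and test datasets can be illustrated with the sketch below. It assumes scikit-learn and uses purely synthetic data with a deliberately non-linear class boundary; the feature layout, the 60/20/20 split and the candidate settings are illustrative assumptions rather than recommendations.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data (hypothetical): 6 features per patient, standing in for
# selected RNA levels and clinical covariates. The outcome depends
# non-linearly (radially) on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = ((X[:, 0] ** 2 + X[:, 1] ** 2) > 1.4).astype(int)

# Split into training (60%), validation (20%) and test (20%) datasets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# Model development: fit on the training set, choose the kernel and the
# regularisation strength C on the validation set.
best_auc, best_model, best_kernel = -1.0, None, None
for kernel in ("linear", "rbf"):
    for C in (0.1, 1.0, 10.0):
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=C))
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_val, model.decision_function(X_val))
        if auc > best_auc:
            best_auc, best_model, best_kernel = auc, model, kernel

# Final evaluation: the test set is used once, for the chosen model only.
test_auc = roc_auc_score(y_test, best_model.decision_function(X_test))
print("Chosen kernel:", best_kernel, "test AUC:", round(test_auc, 3))
```

On data of this shape a linear kernel cannot separate the classes, whereas a radial basis function kernel can, which is why the choice of kernel matters so much for performance.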
Ensemble methods are a popular approach that has been successfully applied to high-dimensional biomedical datasets with small sample size. The idea behind ensemble methods is to combine several base classifiers so that together they produce better classification results than a single classifier. One of the most successful ensemble methods is random forest. Random forest uses a set of decision trees that form a forest. In order to avoid overfitting, each decision tree in the forest uses a random subset of samples from the training set, and a random subset of features. Classification is then performed based on the majority vote of the trees. For example, the FEELnc tool uses random forest for annotation of lncRNAs and achieves an AUC of 0.97 119 .
Another aspect of the COVID-19 pandemic is the need for cardioprotective strategies to prevent the long-term cardiovascular consequences of the disease. Yet, despite intensive efforts, the development of cardioprotective therapies has been unsuccessful in the last 3 decades 126 . Small non-coding RNA fingerprints of COVID-19 itself, and of the different comorbidities and their co-medications that affect the infection, may provide a useful tool to develop diagnostic and prognostic markers and to discover novel drug targets to prevent and treat COVID-19 and its cardiovascular consequences 127 . Understanding the molecular interactions between SARS-CoV-2 and its host, as well as the influence of cardiovascular risk factors, comorbidities, and medications on clinical outcomes, may significantly speed up the lengthy process of developing diagnostics and therapeutics not only against COVID-19 but also against other diseases 127-129 .

Conclusion and perspectives
COVID-19 has brought about an unexpected and unprecedented historical period worldwide. Despite the tremendous efforts and reactiveness of all stakeholders from the broad healthcare sector (clinicians, healthcare staff, researchers, funding bodies, and regulatory authorities), the burden of COVID-19 is enormous, medically, socially, and economically.
The research field has been very reactive and multiple networks of experts and task forces have been formed to tackle the challenge of finding drugs and biomarkers of COVID-19.
Effective coordination of the large interdisciplinary networks involved in multicentre studies is key to the success of biomarker projects. RNA biomarkers combined with artificial intelligence-based strategies will certainly help in building algorithms to aid in clinical decision making and personalization of healthcare through risk stratification of patients. Efficient academia-industry partnerships are essential for the rapid marketing and clinical use of novel disease biomarkers. Novel tools based on systems biomedicine concepts and artificial intelligence methods are needed to speed up the translational process and clinical application.
While it is obvious that the cardiovascular burden associated with SARS-CoV-2 infection is alarming and deserves great attention during healthcare of COVID-19 patients, it is also important to keep in mind that more than a third of hospitalized COVID-19 patients present psychological distress and neurological manifestations such as headache, ischemic stroke, seizures, and other diverse encephalopathies 130 . SARS-CoV-2 has been detected in the brain and cerebrospinal fluid 131 , and is associated with encephalitis.
Various neurological sequelae have been associated with the Spanish influenza pandemic and other coronaviruses 132 . Therefore, a deeper knowledge of the host-pathogen interactions involving regulatory RNAs 82 in the brain-heart axis 133 may provide novel avenues for discovery of biomarkers and therapeutic pathways to improve healthcare and prepare for future pandemics.