Improving dynamic stroke risk prediction in non-anticoagulated patients with and without atrial fibrillation: comparing common clinical risk scores and machine learning algorithms

Abstract
Aims: Diversified cardiovascular/non-cardiovascular multi-morbid risk and efficient machine learning algorithms may facilitate improvements in stroke risk prediction, especially in newly diagnosed non-anticoagulated atrial fibrillation (AF) patients, in whom an initial decision on stroke prevention is needed. The aims of this article are therefore (i) to study common clinical risk assessment for stroke risk prediction in AF and non-AF cohorts together with cardiovascular/non-cardiovascular multi-morbid conditions; (ii) to improve stroke risk prediction using machine learning approaches; and (iii) to compare the improved clinical prediction rules for multi-morbid conditions with machine learning algorithms.
Methods and results: We used cohort data from two health plans covering 6 457 412 males and females, who contributed 14 188 679 person-years of data. The model inputs consisted of a diversified list of comorbidities, demographic variables, and temporal exposure variables; the outcome was incident stroke. The machine learning algorithms comprised two parametric and two non-parametric techniques. The best prediction model was derived from non-linear formulations using machine learning criteria; the highest c-index was obtained for logistic regression [0.892; 95% confidence interval (CI) 0.886–0.898], with consistent performance on external validation (0.891; 95% CI 0.882–0.9). These values were significantly higher than those of the conventional stroke risk scores (CHADS2: 0.7488, 95% CI 0.746–0.7516; CHA2DS2-VASc: 0.7801, 95% CI 0.7772–0.7831) and the multi-morbid index (0.8508, 95% CI 0.8483–0.8532). The machine learning algorithm had good internal and external calibration and good net benefit values.
Conclusion: In this large cohort of newly diagnosed non-anticoagulated AF and non-AF patients, large improvements in stroke risk prediction were achieved with a cardiovascular/non-cardiovascular multi-morbid index and with a machine learning approach that accounts for dynamic changes in risk factors.


Introduction
Atrial fibrillation (AF) is the commonest cardiac rhythm disorder and confers a five-fold greater risk of stroke. 1 The risk of stroke is not homogeneous and depends on the presence of various clinical risk factors. The more common and validated stroke risk factors have been used to formulate clinical risk scores for stroke risk stratification, but all clinical scores have only modest predictive value for identifying the 'high-risk' patients who actually sustain stroke events, with c-indexes (a statistical measure of discrimination) of 0.6. 1,2 More complicated clinical risk scores or the addition of biomarkers will always statistically improve risk prediction, but the absolute difference in c-index is often modest, and the improvements in clinical utility on decision curve analysis are marginal. 3,4 Indeed, whether adding biomarkers improves the clinical utility of current risk scores remains debated, especially since many biomarkers are non-specific, are affected by non-cardiac conditions, or predict both thrombotic and bleeding events. 4,5 In addition, some complicated clinical risk scores or biomarker-based scores have been derived from highly selected clinical trial cohorts of anticoagulated patients, whether on warfarin or a direct oral anticoagulant (DOAC). [6][7][8] Improvements in risk prediction are particularly needed in newly diagnosed non-anticoagulated patients, in whom stroke prevention with oral anticoagulation (OAC) is being considered. Also, many AF patients have pre-existing conditions that are not accounted for in the existing clinical risk scores, for example valvular heart disease, sleep apnoea, and chronic kidney disease, and these introduce variability that affects tool performance. 9
In addition, existing clinical risk stratification scores mostly rely on baseline factors and use linear terms to calculate stroke risk; however, clinical risk is dynamic, and stroke risk changes with age and incident risk factors. [10][11][12] With the advent of efficient machine learning technologies, it may be possible to develop complex models that combine important clinical and demographic parameters in non-linear formulations to enhance markedly the performance of such rules. If successful, this has the potential to improve the practice of medicine. 13,14 However, without careful selection of the clinical and demographic variables that matter for a specific outcome, machine learning techniques will at best only slightly improve the performance of risk models relative to conventional methods. 15,16 With this in mind, machine learning approaches to stroke risk prediction should take into account the non-linear effects of prior stroke, older age, and multi-morbid conditions.
In the present study, our aim was to perform a comparative assessment of stroke risk prediction in a large non-anticoagulated US cohort using machine learning algorithms, compared with the CHADS2 and CHA2DS2-VASc risk scores. 17,18 Given the association of stroke with multiple comorbidities beyond those in the CHADS2 and CHA2DS2-VASc scores, we also generated a new multi-morbid index accounting for both cardiovascular and non-cardiovascular clinical history and compared it with the conventional stroke risk indices and the machine learning algorithms. Model performance was examined in terms of calibration, discrimination, and clinical utility. [19][20][21]

Methods
We studied a non-anticoagulated (i.e. not exposed to warfarin or a DOAC) population of patients with and without AF. The primary data sources were the Commercial plan, covering the working population and their families (18-64 years), and the Medicare plan, covering the elderly (≥65 years) and people with disabilities (≥18 years).
The Medicare plan data were derived from Medicare Advantage and Medicare-Medicaid Advantage (for dual-eligible Medicare beneficiaries) enrolment. During the study period (1 January 2016 to 30 June 2020), the target population from both plans comprised 2 912 241 males and 3 535 030 females, who contributed 7 656 600 and 6 532 079 person-years of follow-up, respectively. The population had complete coverage for both medical and pharmacy benefits and was identified from the pharmacy and medical claims databases. IRB approval was not required for the extraction of data from the claims databases; however, use of the data must comply with US privacy laws and company governance.

Population identification
The process of identifying the non-anticoagulated AF and non-AF populations consisted of three steps: (a) obtain the pharmacy claims for OAC medications (warfarin/Coumadin, apixaban/Eliquis, dabigatran etexilate/Pradaxa, rivaroxaban/Xarelto) together with the member identification; (b) obtain the medical claims for AF members using ICD-10 codes (I480, I481, I4811, I4819, I482, I4820, I4821, I483, I484, I489, I4891, I4892) and extract the associated identification parameters; and (c) identify the AF and non-AF members who are not on OACs. Each medication was analysed using both NDC (National Drug Code) and GPI (Generic Product Identifier) codes (see Supplementary material online, Table S1 for the NDC codes of each anticoagulant). This was necessary because NDCs can be ambiguous: many codes exist for a single product, which can lead to inaccuracies in identifying dispensed drugs. GPI codes were therefore used to ensure consistency, through a many-to-one mapping from NDC to GPI. The medical claims were obtained for primary and secondary AF and non-AF ICD-10 codes.
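Steps (a) to (c) amount to set operations on member identifiers. The sketch below is a minimal Python illustration, assuming hypothetical claim records that carry only a member ID and an NDC or ICD-10 code, and a hypothetical `ndc_to_gpi` lookup table standing in for the NDC-to-GPI mapping. It simplifies the study design by ignoring the timing of OAC dispensing relative to the index date (which the study handled through censoring).

```python
from dataclasses import dataclass

# Hypothetical, minimal claim records (real claims carry many more fields).
@dataclass(frozen=True)
class PharmacyClaim:
    member_id: str
    ndc: str

@dataclass(frozen=True)
class MedicalClaim:
    member_id: str
    icd10: str

# AF diagnosis codes listed in the text (claims format, no decimal point).
AF_CODES = {"I480", "I481", "I4811", "I4819", "I482", "I4820", "I4821",
            "I483", "I484", "I489", "I4891", "I4892"}

def split_non_anticoagulated(all_members, pharmacy_claims, medical_claims,
                             ndc_to_gpi, oac_gpis):
    """Return (af_ids, non_af_ids) for members with no OAC dispensing."""
    # (a) Any OAC claim; many NDCs collapse onto one GPI per product.
    on_oac = {c.member_id for c in pharmacy_claims
              if ndc_to_gpi.get(c.ndc) in oac_gpis}
    # (b) Any AF diagnosis on a medical claim.
    has_af = {c.member_id for c in medical_claims if c.icd10 in AF_CODES}
    # (c) Keep only members never exposed to an OAC.
    af_ids = has_af - on_oac
    non_af_ids = set(all_members) - has_af - on_oac
    return af_ids, non_af_ids

# Illustrative codes only: these are not real NDC or GPI values.
ndc_to_gpi = {"ndc-1": "gpi-apixaban", "ndc-2": "gpi-apixaban", "ndc-3": "gpi-statin"}
pharmacy = [PharmacyClaim("m1", "ndc-1"), PharmacyClaim("m3", "ndc-3"),
            PharmacyClaim("m4", "ndc-2")]
medical = [MedicalClaim("m1", "I4891"), MedicalClaim("m2", "I480"),
           MedicalClaim("m3", "E119")]
af_ids, non_af_ids = split_non_anticoagulated(
    {"m1", "m2", "m3", "m4"}, pharmacy, medical, ndc_to_gpi, {"gpi-apixaban"})
print(sorted(af_ids), sorted(non_af_ids))
```

Here member m4 is excluded even though its NDC differs from m1's, because both NDCs map to the same GPI; this is the consistency the many-to-one mapping is meant to provide.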

Parameter identification
The index condition for the AF cohort was defined as two or more medical claims for AF between 1 January 2017 and 30 March 2020; the date of the first claim was the index date. The stroke outcome was the first stroke event occurring after the index date and before the end of the study period (30 June 2020). Follow-up ended at the first stroke; in addition, because we intended to determine risk without attenuation by anticoagulant use, AF patients were censored when they initiated OAC.
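The index-date and censoring rules can be expressed as two small functions. This is a sketch under stated assumptions: claim and event dates have already been extracted per patient, both qualifying AF claims must fall inside the identification window, and a stroke on the same day as OAC initiation counts as an event. The function names are illustrative, not part of the study's code.

```python
from datetime import date

AF_WINDOW = (date(2017, 1, 1), date(2020, 3, 30))  # AF identification window
STUDY_END = date(2020, 6, 30)

def af_index_date(af_claim_dates):
    """First AF claim date if >= 2 AF claims fall in the window, else None."""
    start, end = AF_WINDOW
    in_window = sorted(d for d in af_claim_dates if start <= d <= end)
    return in_window[0] if len(in_window) >= 2 else None

def follow_up(index, stroke_dates, oac_start=None):
    """Return (end_date, had_stroke) for one AF patient."""
    after = sorted(d for d in stroke_dates if index < d <= STUDY_END)
    end, had_stroke = (after[0], True) if after else (STUDY_END, False)
    # Assumption: OAC started strictly before the first stroke censors follow-up.
    if oac_start is not None and index < oac_start < end:
        end, had_stroke = oac_start, False
    return end, had_stroke

idx = af_index_date([date(2016, 5, 1), date(2018, 2, 3), date(2018, 6, 1)])
print(idx)                                                   # claim in 2016 is outside the window
print(follow_up(idx, [date(2019, 1, 1)]))                    # stroke observed
print(follow_up(idx, [date(2019, 1, 1)], date(2018, 9, 1)))  # censored at OAC start
```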
Comorbid conditions for the AF cohort were tracked from 1 January 2016 until the day before the index date. The stroke outcome and comorbid conditions were identified from medical claims using primary and/or secondary diagnoses. Supplementary material online, Table S2 lists the ICD-10 codes for the input and outcome conditions. Gender and age were obtained from the medical databases. Age was categorized into four groups (18-54, 55-64, 65-74, and ≥75 years) and was also assessed as a continuous variable.
For the non-AF cohort (none of whom were on anticoagulants), each patient had to have at least 6 months of history, used to ascertain comorbid conditions, upon entry into the study; the first stroke after this period was taken as the outcome. The equivalent of the AF index date for the non-AF cohort was therefore 6 months after entry into the study.
Each comorbid condition and the stroke outcome were coded as binary variables, present ('1') or absent ('0'). Gender was also coded as binary, with '1' for female and '0' for male. Recent research has shown that including AF duration, which has not traditionally been used in modelling, tends to improve the discriminative validity of stroke risk prediction models. In this study, exposure time was assessed in two ways: (i) the duration in days from the AF index date (or, for non-AF patients, the date 6 months after entry into the study) to the end of follow-up or benefits; and (ii) the duration in days from the last prior stroke to the AF/non-AF index date. The CHADS2
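A per-patient feature record with binary comorbidity flags, the gender indicator, and the two exposure durations might look as follows. The helper names are hypothetical; the non-AF index date is taken as six calendar months after enrolment (the exact day-count convention used in the study is not stated), and `None` is an assumed placeholder for patients with no prior stroke.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Calendar-month shift, clamping the day to the target month's length."""
    years, month0 = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def encode_patient(sex, conditions, index, end_of_follow_up,
                   prior_stroke_dates, condition_names):
    """Hypothetical per-patient feature record: binary flags plus exposure days."""
    features = {name: int(name in conditions) for name in condition_names}
    features["female"] = 1 if sex == "F" else 0
    # (i) Exposure from the index date to the end of follow-up or benefits.
    features["exposure_days"] = (end_of_follow_up - index).days
    # (ii) Time from the last prior stroke to the index date (None if no prior stroke).
    prior = [d for d in prior_stroke_dates if d < index]
    features["days_since_prior_stroke"] = (index - max(prior)).days if prior else None
    return features

# Non-AF patient enrolled on 31 Aug 2017: the index date is 6 months later.
index = add_months(date(2017, 8, 31), 6)   # day clamped to 28 Feb 2018
features = encode_patient("F", {"hypertension"}, index, date(2020, 6, 30),
                          [date(2017, 10, 15)], ["hypertension", "diabetes"])
print(features)
```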

Strategy for model development/validation
The large dataset used in this study is drawn from diverse geographical areas across the continental US, so the stroke risk predictions should hold across all geographical areas. With this in mind, the training and validation samples were drawn at random from the primary data sources, with equal representation of outcome events in both samples. Model development was performed on two-thirds of the whole population (training sample). Validation was performed on the remaining one-third (validation sample) once the model developed on the training sample was deemed appropriate on the grounds of clinical meaningfulness and other criteria specific to the predictive algorithm in use.
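A random two-thirds/one-third split that keeps the same proportion of stroke events in both samples can be sketched with the standard library alone. The seed and function name are arbitrary choices for illustration, not the study's settings.

```python
import random

def stratified_split(ids, outcomes, train_frac=2 / 3, seed=2020):
    """Random train/validation split that preserves the event proportion."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    train, valid = [], []
    for label in (0, 1):
        # Shuffle and cut each outcome stratum separately.
        stratum = [i for i, y in zip(ids, outcomes) if y == label]
        rng.shuffle(stratum)
        cut = round(len(stratum) * train_frac)
        train.extend(stratum[:cut])
        valid.extend(stratum[cut:])
    return train, valid

ids = list(range(300))
outcomes = [1 if i % 10 == 0 else 0 for i in ids]  # 30 synthetic stroke events
train, valid = stratified_split(ids, outcomes)
events = {i for i in ids if outcomes[i]}
print(len(train), len(valid), len(events & set(train)), len(events & set(valid)))
```

Splitting within each outcome stratum guarantees a 10% event rate in both samples here, whereas a simple random split would only achieve that on average.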
Model validation was based on calibration (internal and external), discrimination, and clinical utility. [19][20][21][22] Calibration was assessed graphically by comparing predicted and observed outcomes in the training and validation samples after smoothing with regression smoothers such as the locally weighted least squares ('loess') smoother. Discrimination was evaluated using the C-statistic. In addition, external validation was performed using cumulative lift measures. Clinical utility was assessed by decision curve analysis, using the net benefit at a given probability threshold, i.e. the risk at which one is indifferent about the treatment under consideration. The threshold sets the relative weight given to false-positive versus true-positive classifications, and hence the balance between sensitivity and specificity.
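For a binary outcome, the C-statistic equals the probability that a randomly chosen patient with a stroke has a higher predicted risk than one without, and it can be computed from ranks (the Mann-Whitney formulation). This is a minimal sketch that treats the outcome as binary and ignores censoring; a time-to-event concordance index such as Harrell's C handles follow-up time differently, and the text does not state which variant was used.

```python
def c_statistic(y_true, y_score):
    """Concordance (AUC) for a binary outcome via mid-ranks; ties count as 1/2."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores and give each the average (1-based) rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs concordant
```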
In this study, we compared the machine learning-based algorithms against the two common stroke risk scores (i.e. the CHADS2 and CHA2DS2-VASc scores). We also compared the machine learning-based algorithms against a multi-morbid index.
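For reference, the two conventional scores can be computed from their published point assignments: CHADS2 gives 1 point each for congestive heart failure, hypertension, age ≥75 years, and diabetes, and 2 points for prior stroke/TIA; CHA2DS2-VASc instead gives 2 points for age ≥75 years and adds 1 point each for age 65-74 years, vascular disease, and female sex. The sketch takes already-derived boolean flags; how each component was derived from claims codes in this study is not reproduced here.

```python
def chads2(chf, hypertension, age, diabetes, prior_stroke_tia):
    """CHADS2 score (0-6) from boolean risk-factor flags and age in years."""
    return (int(chf) + int(hypertension) + int(age >= 75) + int(diabetes)
            + 2 * int(prior_stroke_tia))

def cha2ds2_vasc(chf, hypertension, age, diabetes, prior_stroke_tia_te,
                 vascular_disease, female):
    """CHA2DS2-VASc score (0-9) from boolean risk-factor flags and age in years."""
    age_points = 2 if age >= 75 else 1 if age >= 65 else 0
    return (int(chf) + int(hypertension) + age_points + int(diabetes)
            + 2 * int(prior_stroke_tia_te) + int(vascular_disease) + int(female))

print(chads2(True, True, 80, False, True))                      # 5
print(cha2ds2_vasc(False, True, 70, True, False, True, True))   # 5
```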
The multi-morbid index was developed by fitting a main-effects logistic regression model on the training sample, with comorbid history, demographic variables (gender and age group, as explained above), and other variables (AF status: 1 for the AF group and 0 for the non-AF group, identified via ICD-10 codes as explained before; Medicare status: 1 for the Medicare plan and 0 for the Commercial plan). On the basis of this analysis, the index was constructed as the sum of points for multi-morbid conditions, gender, and age group: 19 points for prior stroke; 2 points each for hypertension and diabetes mellitus; 1 point each for congestive heart failure, vascular disease, valvular disease, coronary artery disease, chronic kidney disease, sleep apnoea, chronic obstructive pulmonary disease/bronchiectasis, alcohol use or disorders, prior major bleeding, inflammatory disease, and lipid disorders; 0 points for an absent condition; 1 point for female and 0 for male gender; and 0 points for 18-54 years, 4 points for 55-64 years, 8 points for 65-74 years, and 12 points for ≥75 years. In essence, the multi-morbid index enhances the CHA2DS2-VASc score in three ways: it adds further cardiovascular and non-cardiovascular conditions (coronary artery disease, valvular disease, sleep apnoea, chronic kidney disease, chronic obstructive pulmonary disease/bronchiectasis, prior major bleeding, alcohol use or disorders, inflammatory disease, and lipid disorders); it increases the weights for prior stroke (to 19 points), hypertension and diabetes mellitus (to 2 points each), and age; and it adds an age group of 55-64 years. The index ranged from 0 to 44 points.
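The point assignments of the multi-morbid index translate directly into a scoring function. The condition keys below are hypothetical labels standing in for the ICD-10 groupings in Supplementary Table S2; this is an illustrative sketch, not the study's code.

```python
# Point weights as described in the text; the condition keys are hypothetical labels.
PRIOR_STROKE_POINTS = 19
TWO_POINT = {"hypertension", "diabetes"}
ONE_POINT = {"congestive_heart_failure", "vascular_disease", "valvular_disease",
             "coronary_artery_disease", "chronic_kidney_disease", "sleep_apnoea",
             "copd_bronchiectasis", "alcohol_use", "major_bleeding",
             "inflammatory_disease", "lipid_disorders"}

def age_points(age):
    """0 (18-54), 4 (55-64), 8 (65-74) or 12 (>=75 years) points."""
    if age >= 75:
        return 12
    if age >= 65:
        return 8
    if age >= 55:
        return 4
    return 0

def multimorbid_index(conditions, age, female):
    """Sum of condition, gender and age-group points for one patient."""
    conditions = set(conditions)
    score = PRIOR_STROKE_POINTS if "prior_stroke" in conditions else 0
    score += 2 * len(TWO_POINT & conditions)
    score += len(ONE_POINT & conditions)
    return score + int(female) + age_points(age)

# Prior stroke (19) + hypertension (2) + sleep apnoea (1) + female (1) + age 60 (4)
print(multimorbid_index({"prior_stroke", "hypertension", "sleep_apnoea"}, 60, True))
```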
Four machine learning algorithms were employed: two parametric (logistic regression and neural network) and two non-parametric (decision tree and gradient boosting) [23][24][25] (see Supplementary material online, Table S3 for details). The inputs for predicting the stroke outcome were the baseline comorbid history and demographic variables. In addition, we assessed (i) the temporal exposure to AF/non-AF status until the end of follow-up; and (ii) the time from the last prior stroke event in the comorbid history to the AF/non-AF index date. SAS Enterprise Miner 15.1 was used to implement these algorithms on a Java web platform. There was no funding for the study.
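The text does not detail how the non-linear formulations were specified inside SAS Enterprise Miner. As one illustration of the idea, the sketch below fits a plain logistic regression by gradient descent in NumPy after appending all two-factor interaction terms. The learning rate, iteration count, and L2 penalty are arbitrary assumptions, and this is not the study's implementation. The toy outcome is XOR-type, which a main-effects model cannot separate.

```python
import itertools
import numpy as np

def add_pairwise_interactions(X):
    """Append every two-factor product x_i * x_j (i < j) to the design matrix."""
    pairs = itertools.combinations(range(X.shape[1]), 2)
    extra = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + extra) if extra else X

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=2000, l2=1e-3):
    """Batch gradient descent on mean log-loss with an L2 penalty (intercept unpenalised)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (_sigmoid(Xb @ w) - y) / len(y)
        grad[1:] += l2 * w[1:]
        w -= lr * grad
    return w

def predict_proba(w, X):
    return _sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)

# XOR-type toy outcome: risk is raised by either factor alone, but not by both.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]] * 25, dtype=float)
y = np.logical_xor(X[:, 0], X[:, 1]).astype(float)

w_main = fit_logistic(X, y)                     # main effects only
X_int = add_pairwise_interactions(X)
w_int = fit_logistic(X_int, y)                  # adds the x0 * x1 term
print(predict_proba(w_main, X[:4]).round(2))    # stuck at 0.5 for every pattern
print(predict_proba(w_int, X_int[:4]).round(2))
```

In practice a library implementation (with proper convergence checks) would be preferable; the hand-written loop only keeps the example self-contained.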

Results
Our final study cohort consisted of 6 457 412 patients. For the training and validation samples in the AF cohort, hypertension was the most prevalent comorbidity (34.5%), followed by lipid disorders (28.9%), coronary artery disease (15.9%), congestive heart failure (13.7%), valvular disease (12.8%), and chronic obstructive pulmonary disease (12.6%). The comorbidities of the training and validation samples in the non-AF cohort are summarized in Table 1. In general, the non-AF cohort had significantly fewer comorbid conditions than the AF cohort. No patient in either cohort was taking anticoagulants.
Comparative assessment of stroke risk indices
Table 2 shows the incidence rates, in cases per 100 person-years, for the CHADS2 and CHA2DS2-VASc scores and the multi-morbid index scores, for low-, medium-, and high-risk scores in both the AF and non-AF cohorts. In general, the CHADS2 scores had the highest incidence rates, followed by the CHA2DS2-VASc scores and then the multi-morbid index scores, for all three risk levels and for both cohorts. The incidence rates in the AF cohort were significantly higher than those in the non-AF cohort at the 5% level. Table 3 summarizes three sets of models based on the three stroke risk indices for the training samples. Set 1 is based on the stroke risk index alone; sets 2 and 3 are based on (i) the stroke risk index plus AF group status and (ii) the stroke risk index plus AF group status and Medicare group status, respectively. The C-indexes were highest for set 3, followed by set 2 and then set 1. Figure 1 shows similar results on external validation. Figure 2 shows the clinical utility of the three sets of stroke risk-based models. In general, every model had a higher net benefit than the 'treat all' and 'treat none' strategies; these models are therefore clinically useful at the designated probability thresholds. The multi-morbid index-based models had the highest net benefit, in terms of stroke cases per 100 patients adjusted for false positives at a given probability threshold.
48.8% or 15 913 events (predicted using gradient boosting), and 47.6% or 15 525 events (predicted using neural network) of patients with stroke events were reached when 5% of members were targeted. Supplementary material online, Figure S1 shows the incremental improvement in clinical utility of the machine learning-based logistic regression algorithm over the full multi-morbid index-based model.
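Two of the quantities reported in this section are simple ratios: the incidence rate per 100 person-years, and the share of all stroke events captured when the highest-risk 5% of members are targeted, which underlies the cumulative lift. A minimal sketch, assuming per-member predicted risks and binary outcomes are available; ties in predicted risk are broken by input order, and the data are synthetic.

```python
def incidence_per_100py(events, person_years):
    """Crude incidence rate per 100 person-years."""
    return 100.0 * events / person_years

def top_fraction_capture(y_true, y_score, fraction=0.05):
    """Return (share of all events captured, cumulative lift) in the top `fraction` by risk."""
    n = len(y_true)
    n_target = max(1, round(n * fraction))
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    captured = sum(y for _, y in ranked[:n_target])
    total = sum(y_true)
    capture_rate = captured / total
    lift = (captured / n_target) / (total / n)  # targeted event rate / overall event rate
    return capture_rate, lift

y = [0] * 100
for i in (0, 1, 2, 50):
    y[i] = 1                                  # 4 synthetic events
scores = [1.0 - i / 100 for i in range(100)]  # member 0 has the highest risk
capture_rate, lift = top_fraction_capture(y, scores)
print(capture_rate, lift)                     # 3 of 4 events in the top 5 members
print(incidence_per_100py(10, 2000))
```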
At a probability threshold of 3.75%, the machine learning algorithm produced a net benefit of 1.0 true stroke event per 100 patients, compared with 0.94 for the full multi-morbid index-based model. At this threshold, both models are much more clinically useful than the 'treat all' strategy, yet the machine learning algorithm identified a total of 14 441 true stroke events, compared with 12 642 events for the full multi-morbid index-based model. Supplementary material online, Figure S2 shows the importance of the multi-morbid index, AF status, and Medicare status as variables in the machine learning-based algorithms. Where a simple index is preferred, the full multi-morbid index-based model is a reasonable option; however, complex formulations such as machine learning-based algorithms capture more events for severe clinical outcomes such as stroke.
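Net benefit in decision curve analysis weighs true positives against false positives by the odds of the threshold probability p_t: NB = TP/n − (FP/n) × p_t/(1 − p_t). Multiplying by 100 gives the 'per 100 patients' scale used above. The sketch below applies this standard formula to a hypothetical cohort, with 'treat all' as the comparator ('treat none' always has a net benefit of zero).

```python
def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of treating patients with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone; 'treat none' is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical cohort of 100: 5 events, all predicted at 10% risk.
y = [1] * 5 + [0] * 95
p = [0.10] * 5 + [0.05] * 10 + [0.01] * 85
t = 0.0375
nb_model = net_benefit(y, p, t)
nb_all = net_benefit_treat_all(y, t)
print(100 * nb_model, 100 * nb_all)  # per 100 patients
```

At a low threshold such as 3.75%, each false positive is penalised by only about 0.039 of a true positive, which is why 'treat all' can still show a positive net benefit here while the model does better.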

Discussion
In this large contemporary cohort of newly diagnosed non-anticoagulated patients with and without AF, our principal finding is improved stroke risk prediction using cardiovascular and non-cardiovascular multi-morbid conditions and machine learning approaches, compared with the two conventional clinical risk scores, CHADS2 and CHA2DS2-VASc. The highest c-index was obtained for logistic regression (0.892), with consistent performance on external validation (0.891).
The present study, based on a large US cohort, validates the clinical meaningfulness of the CHADS2 and CHA2DS2-VASc scores, 17,18 which were developed on smaller samples approximately 20 and 10 years ago, respectively, and which showed good calibration even in this contemporary cohort. The discriminant validity of these tools was moderately good, with c-indexes >0.7. In an independent PCORI systematic review and evidence appraisal, the CHADS2, CHA2DS2-VASc, and ABC-stroke scores had the best prediction for stroke events. 2 While CHADS2 was simple, it has been superseded by CHA2DS2-VASc in many contemporary guidelines, because the default has shifted to offering stroke prevention (i.e. OAC) unless the patient is 'low risk', and the CHA2DS2-VASc score helps to identify those low-risk patients initially. 3 Nonetheless, both clinical risk scores are simplifications, have modest predictive value, and do not account for the dynamic nature of risk.
This study also constructed a multi-morbid index consisting of cardiovascular and non-cardiovascular comorbid conditions. In general, it had more discriminatory power than the two conventional clinical scores because it explained more of the variance in stroke outcome. The multi-morbid index, which modifies the CHA2DS2-VASc score by adding comorbid conditions, changing the weights for prior stroke, hypertension, and diabetes mellitus, and lowering the age threshold to ≥55 years with larger weights for older age groups, provided the best performance among the stroke risk indices.
In the present study, we obtained a further improvement in stroke risk prediction in AF and non-AF cohorts from the use of various comorbid conditions and their synergistic effects with age, gender, and temporal exposure derived from claims information. This considerable gain was attributable to the non-linear formulation of these variables via machine learning algorithms, which yielded a c-index of 0.892 for logistic regression, with consistent performance on external validation (0.891), and net benefit values better than those of the 'treat all' strategy or the current clinical risk scores. In addition, the machine learning model showed better clinical utility than the three stroke risk indices across probability thresholds, including at the 3.75% risk threshold.
Including AF exposure, measured as the cumulative time without anticoagulant medication from the index date to the end of follow-up or benefits, together with the time from the last prior stroke event to the AF/non-AF index date, explained considerably more of the variability in stroke risk. This emphasizes the importance of temporal exposure to AF in stroke risk prediction, given the dynamic nature of stroke risk and how risk can be influenced by incident risk factors, ageing, and AF progression. [10][11][12]26 The non-linear formulations obtained with machine learning algorithms, in the form of two-factor interactions between prior clinical history (as measured by the multi-morbid index) and demographic variables, also considerably improved stroke risk prediction. 27

Practical implications
Stroke risk can be predicted to a reasonable degree from diversified comorbid history, age and gender information, and simple clinical risk scores. The present study, in a contemporary cohort, extends stroke risk prediction to patients with additional comorbidities by using machine learning techniques. The improvement in predictive precision, which also translated into improved clinical utility, could facilitate dynamic stroke risk assessment. Incorporating machine learning risk prediction into apps and smart mobile health (mHealth) technology would enable 'real time' dynamic assessment of stroke (and possibly bleeding) risk. Indeed, the AF patient pathway (or 'patient journey') requires risk reassessment at intervals: when not on antithrombotic therapy (e.g. when newly diagnosed), while on aspirin (e.g. with background vascular disease), and after starting OAC (whether warfarin or a DOAC). Machine learning could adapt to these treatment changes over time, as well as to incident risk factors; this is the subject of ongoing analyses.
The potential opportunities here are illustrated by our Mobile Health (mHealth) technology to improve optimization of integrated care in patients with Atrial Fibrillation App program (mAFA), which investigated mHealth technology for improved screening and integrated care in patients with AF, facilitating early diagnosis, dynamic (re)assessment of risk profiles, and holistic AF management. 28 In our prospective randomized clinical trial, this integrated care approach significantly reduced the composite outcome of 'ischaemic stroke/systemic thromboembolism, death, and rehospitalization' compared with usual care. 29 Prospective dynamic monitoring and re-assessment of bleeding risk using the HAS-BLED score was associated with fewer major bleeding events, a reduction in modifiable bleeding risk factors, and increased OAC uptake; in contrast, in the 'usual care' arm, bleeding rates were higher and OAC use decreased by 25% between baseline and 12 months. Incorporating a dynamic machine learning model into our mHealth technology would facilitate 'real time' assessment of stroke risk and help mitigate modifiable risk factors (e.g. through blood pressure control).

Limitations
Our study is limited by its observational design, but it currently represents the largest contemporary cohort of non-anticoagulated 'real world' patients for the assessment of stroke risk and the comparison of risk prediction models. As with all observational cohorts, the possibility of residual confounding remains. 30

Data availability
Data are available as presented in the article. According to US laws and corporate agreements, our own approvals to use the Anthem and IngenioRx data sources for the current study do not allow us to distribute or make patient data directly available to other parties.