Development and internal validation of machine learning–based models and external validation of existing risk scores for outcome prediction in patients with ischaemic stroke

Abstract

Aims
We developed new machine learning (ML) models and externally validated existing statistical models [ischaemic stroke predictive risk score (iScore) and totalled health risks in vascular events (THRIVE) scores] for predicting the composite of recurrent stroke or all-cause mortality at 90 days and at 3 years after hospitalization for first acute ischaemic stroke (AIS).

Methods and results
In adults hospitalized with AIS from January 2005 to November 2016, with follow-up until November 2019, we developed three ML models [random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBOOST)] and externally validated the iScore and THRIVE scores for predicting the composite outcomes after AIS hospitalization, using data from 721 patients and 90 potential predictor variables. At 90 days and 3 years, 11% and 34% of patients, respectively, reached the composite outcome. For the 90-day prediction, the area under the receiver operating characteristic curve (AUC) was 0.779 for RF, 0.771 for SVM, 0.772 for XGBOOST, 0.720 for iScore, and 0.664 for THRIVE. For the 3-year prediction, the AUC was 0.743 for RF, 0.777 for SVM, 0.773 for XGBOOST, 0.710 for iScore, and 0.675 for THRIVE.

Conclusion
The study provided three ML-based predictive models that achieved good discrimination and clinical usefulness in outcome prediction after AIS and broadened the application of the iScore and THRIVE scoring systems for long-term outcome prediction. Our findings warrant comparative analyses of ML and existing statistical method–based risk prediction tools for outcome prediction after AIS in new data sets.


Description

iScore
A point-based risk prediction model developed using logistic regression analysis to predict death at 30 days and 1 year in a large cohort of patients with acute ischemic stroke (n=12,262) from the Canadian Stroke Network Registry 1. An integer score is calculated from age, sex, stroke severity assessed with the Canadian Neurological Scale, stroke subtype according to the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) 2, glucose level, comorbid atrial fibrillation, congestive heart failure, cancer, kidney disease, and preadmission dependency; the higher the score, the greater the mortality risk. The original scoring system had a c-statistic of 0.85 at 30 days and 0.82 at 1 year in the development set (n=8,223), 0.85 at 30 days and 0.84 in the internal validation set (n=4,039), and 0.79 at 30 days and 0.78 at 1 year in the external validation cohort (n=3,272 from the Ontario Stroke Audit). The iScore was subsequently externally validated in several studies 1,3-14 and is available as an online calculator 15.

THRIVE
A point-based risk prediction tool with five predictor variables (age, stroke severity, and history of hypertension, diabetes mellitus, and atrial fibrillation), developed using logistic regression analysis to predict functional outcome at 3 months in a cohort of patients (n=305) receiving endovascular treatment for ischemic stroke 16. The score ranges from 0 to 9; the higher the score, the poorer the outcome. The THRIVE scoring system was subsequently externally validated in several studies 5-7,16-24.
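As an illustration of how a point-based score of this kind is computed, the sketch below sums points for the five THRIVE predictors. The point allocations (age 60-79 = 1, age ≥80 = 2; NIHSS 11-20 = 2, NIHSS ≥21 = 4; one point each for hypertension, diabetes mellitus, and atrial fibrillation) are an assumption taken from the commonly published THRIVE definition, since the point table is not legible in this text, and should be checked against the original scoring table before use. The function name `thrive_score` is hypothetical, and Python is used for illustration only.

```python
def thrive_score(age, nihss, hypertension, diabetes, atrial_fibrillation):
    """Return a THRIVE-style score in the range 0-9.

    Point allocations are assumed from the commonly published THRIVE
    definition and are NOT taken from this document's tables.
    """
    if age < 0 or nihss < 0:
        raise ValueError("age and NIHSS must be non-negative")

    # Age: <=59 -> 0 points, 60-79 -> 1 point, >=80 -> 2 points (assumed cut-offs)
    if age <= 59:
        age_points = 0
    elif age <= 79:
        age_points = 1
    else:
        age_points = 2

    # Stroke severity (NIHSS): <=10 -> 0, 11-20 -> 2, >=21 -> 4 (assumed cut-offs)
    if nihss <= 10:
        nihss_points = 0
    elif nihss <= 20:
        nihss_points = 2
    else:
        nihss_points = 4

    # One point for each comorbidity that is present
    comorbidity_points = sum(
        1 for flag in (hypertension, diabetes, atrial_fibrillation) if flag
    )

    return age_points + nihss_points + comorbidity_points
```

Under these assumed cut-offs, the minimum of 0 and maximum of 9 match the score range stated above.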

Supplemental Table 5. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist
Cross-sectional study: Give the eligibility criteria, and the sources and methods of selection of participants.

Data variables
The two primary outcome variables were the occurrence of recurrent stroke or mortality within 90 days and within 3 years of hospital discharge among patients with AIS. A total of 90 independent variables were recorded for each patient, including demographics, length of hospital stay, social indicators, severity of stroke assessed by the National Institutes of Health Stroke Scale (NIHSS), stroke subtypes according to the Trial of Org 10172 in Acute Stroke Treatment (TOAST) classification 2, and comorbidities. Table 1 reports the recorded patient data for the 90-day and 3-year outcomes. Comorbidities included the presence of hypertension, dyslipidemia, depression, heart failure, atrial fibrillation, coronary artery disease, peripheral artery disease, chronic obstructive pulmonary disease, chronic kidney disease, cancer, dementia, obstructive sleep apnea, obesity, or osteoarthritis before initial admission. Vitals and pathology data included systolic and diastolic blood pressure (average of three consecutive systolic and three consecutive diastolic blood pressures on admission), heart rate (average of three heart rates on admission), blood glucose, blood urea nitrogen, creatinine, and hemoglobin levels. Medications on dismissal, discharge disposition, and activities of daily living were also recorded.

Data processing and subsampling for class imbalance
A total of 14 (1.9%) patients had one or more missing values. Missing values were imputed using random forest imputation with the missForest package in R. Imputation was performed independently on the training and test sets to prevent data leakage. A complete list of the variables, along with their descriptive statistics, is provided in Table 1. Baseline characteristics and the status of recurrent stroke or mortality within 90 days and 3 years of discharge were assessed for the cohorts. For continuous variables, the mean (SD) was reported for sufficiently normally distributed variables and the median (IQR) for non-normally distributed variables. For categorical variables, the number (proportion) was reported. Comparisons between the groups for continuous variables were conducted with the one-way analysis of variance (ANOVA) test for sufficiently normally distributed data and the Kruskal-Wallis test for non-normally distributed data. Comparisons between the groups for categorical variables were conducted with the Pearson χ2 test. Differences between the groups were deemed statistically significant at a p-value of 0.05 or lower.
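For readers unfamiliar with the categorical comparison described above, the following minimal sketch computes the Pearson χ2 statistic for a 2×2 table (for example, a comorbidity present or absent against outcome reached or not reached). It is an illustration in Python, not the R code used in the study. The function name `pearson_chi2_2x2` is hypothetical, no continuity correction is applied (some software applies Yates' correction to 2×2 tables by default), and the p-value uses an identity that holds only for 1 degree of freedom.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square test for a 2x2 table of counts [[a, b], [c, d]].

    Returns (chi2, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 is the square of a standard normal
    # variable, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A result would be called statistically significant under the criterion above when the returned p-value is 0.05 or lower.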

Model development
The data were randomly partitioned with a 70/30 split into development and validation sets, respectively. To develop the models to predict 90-day and 3-year recurrent stroke or mortality, we employed ten-times repeated ten-fold cross-validation on the development set. The models were evaluated on the validation set. Figure 2 illustrates the overall model development and evaluation process. The statistical models were based on logistic regression (LR) and least absolute shrinkage and selection operator (LASSO) logistic regression. The machine learning-based models included random forest (RF), support vector machine (SVM), and extreme gradient boosting decision trees (XGBOOST). The full set of variables was used as input for each of the models. Hyperparameters were tuned by a grid search over candidate values for each model hyperparameter during cross-validation on the development set. Optimal hyperparameters were selected based on the minimum Brier score across the cross-validation, and the final model was fit to the development set with the optimal hyperparameters. To compare the overall performance of the models, we calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and model training time in seconds. The 95% confidence intervals for AUC were calculated via bootstrap resampling with 2000 resamples. We reported the performance metrics at the default 0.5 probability threshold. Data were standardized prior to model training and validation. One variable ("psychiatric") had zero variance and was removed during model development because it had no predictive ability. Model development was performed using the caret package in RStudio [10] on a computer with a Ryzen 7 3700X 8-core 4.2 GHz processor.
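The evaluation metrics named above can be made concrete with a short sketch. The Python code below is an illustrative analogue rather than the caret-based R code used in the study: it computes the Brier score, a rank-based AUC, a percentile bootstrap confidence interval for the AUC, and sensitivity, specificity, and PPV at a probability threshold. All function names are hypothetical, outcomes are assumed to be coded 0/1, and the percentile method is an assumption because the text does not state which bootstrap interval was used.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random event is ranked above a random non-event."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # Ties between an event and a non-event count as half a win
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_prob, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, and PPV when p >= threshold predicts an event."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
    }
```

In this sketch the Brier score would be averaged over cross-validation folds to compare hyperparameter settings, while the AUC, its interval, and the threshold metrics would be computed once on the held-out validation set.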
Supplemental Table 6

Supplemental Figure 1. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) flow diagram of final cohort selection.
Patients identified from EMR 2005-2019, n = 1024. 45 patients excluded for non-adult and non-stroke conditions: pediatric, brain tumor, trauma, and TIA.
Patients assessed for eligibility, n = 979. 99 patients excluded: previous stroke and non-ischemic stroke (ICH, SAH, SDH).
Patients identified with acute ischemic stroke, n = 888. 159 excluded for age ≥90 years per Mayo Clinic IRB recommendation.
Final cohort with acute ischemic stroke for data sharing, n = 721. Missing data, n = 14.
Outcomes: 90-day composite outcome; 3-year composite outcome.

Supplemental Figure 2. Schematic representation of the process of the development of machine learning algorithms: datasets, methods of development, and cross-validation. The flow diagram shows the process of data splitting (70:30 into development and validation sets), imputation of missing data (missForest package in R), hyperparameter tuning, cross-validation (10-times repeated 10-fold cross-validation), and sequential validation. Additionally, the figure showcases the process of model fitting, where each machine learning algorithm is fit using cross-validation. The best performing model for each algorithm, with optimal hyperparameters, is subsequently selected.

Figure 2


Figure 6. ROC curves for iScore and THRIVE scores for predicting a 90-day composite outcome.

Description of machine learning models

Supplemental Table 4. Original iScore and THRIVE point-based prediction scores for validation and recalibration in the study cohort.
Abbreviations: iScore, Ischemic Stroke Predictive Risk Score; THRIVE, Totaled Health Risks in Vascular Events.

Components Number of points Points 30-day score 1-year score iScore
Abbreviations: DM, diabetes mellitus; HTN, hypertension; iScore, Ischemic Stroke Predictive Risk Score; NIHSS, National Institutes of Health Stroke Scale; THRIVE, Totaled Health Risks in Vascular Events.

Additional details of machine learning models

Table 1

Table 11. Comparison of iScore and THRIVE scores for predicting a 90-day composite outcome.