Using artificial intelligence to predict mortality in AKI patients: a systematic review/meta-analysis

ABSTRACT Background Acute kidney injury (AKI) is associated with increased morbidity and mortality. With artificial intelligence (AI), more dynamic models for mortality prediction in AKI patients have been developed using machine learning (ML) algorithms. The performance of various ML models was reviewed in terms of their ability to predict in-hospital mortality for AKI patients. Methods A literature search was conducted through the PubMed, Embase and Web of Science databases. Included studies reported variables regarding the efficacy of the AI model [AUC, accuracy, sensitivity, specificity, negative predictive value and positive predictive value]. Only original studies (cross-sectional, prospective and retrospective) were included, while reviews and self-reported outcomes were excluded. There was no restriction on time or geographic location. Results Eight studies with 37 032 AKI patients were included, with a mean age of 65.1 years. The in-hospital mortality rate was 19.8%. The pooled [95% confidence interval (CI)] AUC was highest for the broad learning system (BLS) model [0.852 (0.820–0.883)] and elastic net final (ENF) model [0.852 (0.813–0.891)], and lowest for the proposed clinical model (PCM) [0.765 (0.716–0.814)]. The pooled (95% CI) AUC of BLS and ENF did not differ significantly from the other models except PCM (DeLong's test P = .013). PCM exhibited the highest negative predictive value, which supports this model's use as a possible rule-out tool. Conclusion Our results show that the BLS and ENF models are as effective as other ML models in predicting in-hospital mortality, with variability across all models. Additional studies are needed.


INTRODUCTION
Acute kidney injury (AKI) is a condition characterized by a significant decline in kidney function that impairs the kidneys' ability to properly filter waste products and regulate fluid/electrolyte homeostasis [1]. Beyond the acute phase of AKI, progression to chronic kidney disease, increased risk of cardiovascular complications, recurrent episodic AKI and long-term mortality are common in survivors of AKI [2, 3]. AKI affects approximately 15% of hospitalized patients and 30%–60% of patients in the intensive care unit (ICU), with high morbidity and mortality risk in both settings. Continuous renal replacement therapy (CRRT) is commonly used to provide renal support in ICU patients with severe AKI, yet the mortality rate of AKI patients receiving CRRT remains high at 30%–70% [4, 5]. Improvement in patient outcomes with AKI has proven to be complex due to the condition's wide array of etiologies and complex pathophysiology, which often leads to a delay in diagnosis [2].
Earlier identification of high-risk AKI patients can lead to better-targeted interventions that have the potential to reduce morbidity and mortality, contribute to more efficient resource allocation, decrease length of stay and reduce total hospital expenses for patients [6]. To improve the prediction of mortality in AKI patients, prior studies have evaluated prognostic models that employ both conventional statistical modeling techniques and advanced artificial intelligence (AI) modeling techniques [6].
Machine learning (ML) is a field of AI in which algorithms learn from previous data to make future predictions. These algorithms identify patterns, relationships and inputs from given data and then apply them to novel data [3]. Linear ML models such as logistic regression or linear regression aim to model the relationships between variables and predict outcomes [8]. These linear models may be limited by multicollinearity and by non-linear, more complex relationships [6–10].
Advanced non-linear ML techniques such as random forest (RF), extreme gradient boosting (XGBoost), broad learning system (BLS), support vector machine (SVM), artificial neural network (ANN), multi-layer perceptron (MLP) and other AI models have emerged as promising prediction models due to their ability to factor in more complex variables. It is anticipated that these AI models will enhance prognostic accuracy and overcome the shortcomings of previous models [6]. By efficiently utilizing large datasets such as the Medical Information Mart for Intensive Care III and patient electronic health records, these models can discriminately predict new, complex clinical scenarios, thus reducing mortality and poor patient outcomes in AKI patients [4, 6]. Furthermore, even subjective factors such as sentiment analysis and nursing notes have been shown to be highly useful in these advanced models [4].
The aim of this study was to compare the performance of different ML models and other AI models in predicting in-hospital mortality of patients with established AKI. Statistical analyses and performance metrics were compared to analyze potential differences in predictors and predictive output of such models, as well as the presence of heterogeneity and publication bias.

MATERIALS AND METHODS

Literature search
A literature search was conducted using databases including PubMed and Embase by two independent reviewers. The search strategy used broader terminology targeting AI and ML utilization in AKI, because using "mortality" or synonymous terms as a primary search query yielded very limited results. Thus, primary search queries included "Artificial Intelligence," "Machine Learning," "Deep Learning" and "Acute Kidney Injury." The search was not restricted to any age group. The search was restricted to English, and the selected articles were exported to the citation managing software Rayyan. All literature results were reviewed by two independent reviewers (R.S. and P.N.). Any disagreements regarding data extraction were resolved by a third reviewer (R.R.). A detailed search strategy can be found in the Supplementary data file.

Selection criteria
Studies were appropriate for inclusion when statistical variables regarding the efficacy of the AI model were reported [AUC, accuracy, sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV)]. Additionally, only original studies (cross-sectional, prospective and retrospective studies) were included in the systematic review. Furthermore, there was no restriction on time (although most included studies were from the past decade) or geographic location. Publications classified as review articles, systematic reviews and self-reported outcomes were excluded from this meta-analysis. A detailed PICOS (population, intervention, comparator, outcome, study design) table can be found in the Supplementary data file.

Data extraction
The data extraction of included articles was performed using a standardized data collection tool, which included first author,

Results of literature search
From a total of 623 articles gathered in the initial search across multiple databases, only 8 articles matched our inclusion criteria and were utilized for meta-analysis. Qualitative analysis of the included studies was done using the Newcastle–Ottawa Scale (NOS) tool. The qualitative analysis can be found in the Supplementary data file.

Statistical analysis
The outcomes included model performance metrics (AUC, sensitivity/recall, specificity, PPV/precision, NPV and accuracy) of different ML/AI models in assessing in-hospital mortality among AKI patients. These outcomes and their 95% confidence intervals (95% CIs) were extracted for each included study. The degree of between-study heterogeneity was assessed using the I² statistic, where I² ≥ 50% indicated high heterogeneity. The overall (pooled) estimate was calculated using a random effects model in the presence of high heterogeneity and a fixed effects model for low heterogeneity. The performance of individual models was analyzed. However, to simplify the complexity of comparing each study's implementation of a model with another, the resulting values were also pooled, which presents a more efficient way of comparing overall model performance.
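The pooling procedure described above can be illustrated with a minimal sketch: inverse-variance (fixed effects) pooling, Cochran's Q with the I² statistic, and a DerSimonian–Laird random effects estimate. The function name and the input values below are illustrative, not data from the included studies (the review itself used R).

```python
import math

def pool_auc(aucs, ses):
    """Inverse-variance meta-analysis of effect estimates (here, AUCs).

    Returns fixed- and random-effects pooled estimates with 95% CIs,
    plus the I^2 heterogeneity statistic.
    """
    w = [1 / se**2 for se in ses]                       # fixed-effect weights
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum(w)
    se_fixed = math.sqrt(1 / sum(w))

    # Cochran's Q and the I^2 statistic (% of variance from heterogeneity)
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))
    df = len(aucs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance tau^2
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (se**2 + tau2) for se in ses]           # random-effects weights
    rand = sum(wi * a for wi, a in zip(w_re, aucs)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))

    ci = lambda est, se: (est - 1.96 * se, est + 1.96 * se)
    return {"fixed": (fixed, ci(fixed, se_fixed)),
            "random": (rand, ci(rand, se_re)),
            "I2": i2}

# Hypothetical AUCs and standard errors from three cohorts
print(pool_auc([0.84, 0.86, 0.80], [0.02, 0.03, 0.025]))
```

When I² is low the two estimates nearly coincide; as heterogeneity grows, the random effects weights flatten and the pooled CI widens, which is why the choice of model was tied to the I² threshold above.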
A forest plot was used to visualize these outcomes in each study and the combined estimated outcomes with their 95% CIs. Publication bias was assessed with Egger's test. DeLong's test was used to compare the receiver operating characteristic curves of two models. A P-value ≤ .05 was set as the level of significance. All statistical analyses were performed with R software version 3.1.0.
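Egger's test can likewise be sketched as a weighted regression of the standardized effect on precision, where a non-zero intercept suggests funnel-plot asymmetry. This stand-alone implementation is illustrative only; note that the p-value uses a normal approximation rather than the exact t distribution the R routines apply, which is rough for the small numbers of cohorts pooled here.

```python
import math

def eggers_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE);
    a non-zero intercept suggests small-study effects / publication bias.
    """
    n = len(effects)
    z = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    intercept = zbar - slope * xbar
    # Standard error of the intercept from the residual sum of squares
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_int = math.sqrt((rss / (n - 2)) * (1 / n + xbar**2 / sxx))
    t = intercept / se_int
    # Two-sided p-value via a normal approximation (not the exact t dist)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return intercept, t, p

# Hypothetical per-cohort effects and standard errors
intercept, t, p = eggers_test([0.84, 0.86, 0.80, 0.83], [0.02, 0.03, 0.025, 0.04])
```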

Included studies
A total of eight studies including patients with AKI admitted to an ICU/hospital were included. Of these eight studies, two included AKI patients undergoing CRRT, one included COVID-19 patients with AKI and one included patients with cardiac surgery-associated AKI. The overall sample size across these studies was 37 032 (ranging from 270 to 19 044 across studies). The mean/median age of the patients across these studies was 65 years.

Area under the curve
All eight studies reported data on AUC for 14 different ML/AI models. A meta-analysis was conducted for eight models with data from one or more cohorts, as shown in Table 1. Across these eight models, the pooled (95% CI) AUC was observed to be highest for the BLS model [0.852 (0.820–0.883)] and the elastic net final (ENF) model [0.852 (0.813–0.891)], and lowest for the proposed clinical model (PCM) [0.765 (0.716–0.814)]. This difference was statistically significant (DeLong's test P = .013). However, the AUC of the BLS model was not significantly different from the other models. The pooled (95% CI) AUC of logistic regression was observed to be higher than that of XGBoost, RF, SVM and ANN/MLP, but not significantly different from these models. There was no evidence of publication bias for most of the models based on Egger's test (P > .05), except for ANN/MLP, the ENF fitted model and PCM. Tables 1–6 also provide details on the heterogeneity analysis for each model. Tables 7a–7h and Fig. 1 provide the meta-analysis of AUC for individual ML/AI models across different studies.

Sensitivity/recall, specificity, PPV and NPV
A total of four studies reported the data on these outcome measures for seven different ML/AI models.
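These four metrics, together with accuracy, all derive from the 2×2 confusion matrix of predicted versus observed deaths. A minimal sketch (the counts below are hypothetical, chosen to roughly mirror the ~20% in-hospital mortality observed in the pooled sample, and are not taken from any included study):

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive the reported performance metrics from a 2x2 confusion matrix.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives for the positive class (in-hospital death).
    """
    return {
        "sensitivity": tp / (tp + fn),   # recall: deaths correctly flagged
        "specificity": tn / (tn + fp),   # survivors correctly cleared
        "ppv": tp / (tp + fp),           # precision of a positive prediction
        "npv": tn / (tn + fn),           # reliability of a negative prediction
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical model output for 1000 AKI patients (200 deaths)
print(binary_metrics(tp=150, fp=120, fn=50, tn=680))
```

With these counts the sensitivity is 0.75 and the NPV about 0.93; because deaths are the minority class, NPV tends to be high even for modest models, which is worth remembering when interpreting the PCM's rule-out performance.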

Accuracy
A total of four studies reported data on accuracy for six different ML/AI models. A meta-analysis was conducted for six models with data from one or more cohorts. Across the models with meta-analysis, the pooled (95% CI) accuracy was observed to be highest for the BLS model [0.742 (0.706–0.778)] and lowest for XGBoost [0.666 (0.578–0.754)] (Table 6). Even considering the models with only one cohort, the accuracy of the BLS model was observed to be the highest. There was no evidence of publication bias for the BLS, RF and SVM models based on Egger's test (P > .05), but evidence of publication bias was observed for the other models. Tables 12a–12f and Fig. 6 provide meta-analysis results of accuracy for individual ML/AI models across different studies.

DISCUSSION
Conventional scoring systems and regression-based models have shown limited performance, emphasizing the need for more comprehensive predictive models through the exploration of ML methods [6]. Improved, more accurate and actionable prediction models could enable targeted interventions and better allocation of clinical resources [6].
The results of this study describe the differences in statistical performance between different forms of ML/AI models in predicting in-hospital mortality in patients with AKI. Understanding the performance characteristics of each ML/AI model enables its proper use in guiding clinical practice and future research. The performance metrics emphasized in each study can be attributed to that study's primary objective. The AUC measures a model's capability to discriminate between positive and negative cases. The majority of included studies provided the AUC as a comprehensive statistic to describe each model's performance. Some studies also described other performance metrics, including PPV, NPV and accuracy. Based on the clinical context, each prediction task has different goals and projected actionable interventions. Researchers and clinicians should carefully apply the reported statistics of each ML/AI model to their correct, corresponding potential use.
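The AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (the Mann–Whitney interpretation), which a short sketch makes concrete. The labels and scores below are illustrative, not data from the included studies:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case is scored above a randomly chosen
    negative case, with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two deaths (label 1) and two survivors (label 0) with predicted risks
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because this quantity is rank-based, it is insensitive to calibration; a model can have a high AUC yet poorly calibrated probabilities, which is one reason the included studies also report accuracy, PPV and NPV.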
The study evaluated 14 different ML/AI models to describe their performance. The "best model" depends on various clinical factors, different goals, computational resources, dataset characteristics, model training databases and interpretability requirements. The performance statistics indicate that each ML/AI model exhibited diverse performance outcomes, particularly in relation to the prediction of hospital mortality. Among the evaluated models, the BLS and ENF models had the highest AUC [0.852 (0.820–0.883) and 0.852 (0.813–0.891), respectively]; this indicates significantly better discrimination only in relation to the PCM when predicting hospital mortality in AKI patients. The PCM exhibited the highest NPV, a critical attribute in clinical decision-making. The strengths and weaknesses of the other performance metrics can be applicable to other corresponding clinical situations. The ML/AI models showed varied performance regarding sensitivity, specificity, PPV, NPV and accuracy. However, for the three common ML models (RF, logistic regression and XGBoost), the pooled sensitivity, specificity, PPV and NPV values were within the 95% CIs of each other, indicating no significant difference in performance across these models. This highlights the importance of understanding individual model performance for the corresponding clinical application. Overall, different ML/AI models have a myriad of strengths and limitations, and a comprehensive approach is therefore necessary to choose the most efficient model.
In relation to standard clinical scoring tools such as the SOFA, MOSAIC, APACHE II and SAPS II, the goal of ML/AI models is to offer different avenues of reaching accurate predictability. Instead of relying solely on clinical parameters and physiological measurements, as these clinical scoring tools do, ML/AI models can incorporate complex variables from a wide array of large datasets. In relation to AUC, the three common ML models of RF, logistic regression and XGBoost scored higher in comparison with SOFA, MOSAIC and APACHE II. Further, these three clinical scoring models scored significantly lower in comparison with ENF, SVM and ANN. The ML/AI models additionally showed similar discriminatory performance.

This study assessed the degree of heterogeneity using the I² statistic, with some models showing a high degree, which could be attributed to differences in patient population, data sources or model configurations. More specifically, this may include the limited number of ML model studies and the variety of different deep learning models such as ANN/MLP. This leads to an over-representation bias, a limited comparison and limited applicability to other domains. Additionally, two of the included studies did not include validation cohorts. In examining the performance of any predictive model on unseen data, a validation dataset is crucial to ensure a given model is making accurate predictions. Although a model trained and tested on the same dataset may still create generalizable predictions, the training accuracy of a model is a less reliable metric to determine its feasibility in a deployable environment, especially considering the risk of overfitting. However, it is crucial to acknowledge the prevalence of models that are trained and tested on the same data due to the limited availability of a substantial set of clean and accessible data in the medical field. Due to restrictions in data availability, studies without an external validation set constitute a considerable proportion of work regarding the use of AI in medicine; as such, some studies without validation datasets were also considered within the scope of this study while acknowledging the limitations that come alongside them. Future research should include more diverse ML models and different datasets for each model in order to assess robustness and generalizability.
The findings of this study provide a foundation for further research in the field of AKI mortality prediction using ML/AI models.

Figure 1: Forest plot of the meta-analysis of (a) the logistic regression model, (b) the RF model, (c) the BLS model, (d) the ENF model, (e) the XGBoost model, (f) the SVM model, (g) the ANN/MLP model and (h) the PCM AUC across different studies. The lower diamond in each graph represents the pooled estimate AUC.

Figure 2: Forest plot of the meta-analysis of (a) the logistic regression model, (b) the RF model, (c) the XGBoost model, (d) the BLS model, (e) the SVM model, (f) the PCM and (g) the ANN/MLP model sensitivity across different studies. The lower diamond in each graph represents the pooled estimate sensitivity.

Figure 6: Forest plot of the meta-analysis of (a) the logistic regression model, (b) the RF model, (c) the XGBoost model, (d) the BLS model, (e) the SVM model and (f) the PCM accuracy across different studies. The lower diamond in each graph represents the pooled estimate accuracy.

Table 1: Meta-analysis of AUC for different ML/AI models in predicting in-hospital mortality among AKI patients.
a Fixed effects model; random effects model for the others. b This clinical model was restricted to 14–15 variables that were top features of different ML models and were deemed pragmatic by the investigators given their routine use in/accessibility in clinical practice. No meta-analysis was done when there was only one cohort. D: derivation; V: validation.

Table 4: Meta-analysis of PPV of different ML/AI models in assessing in-hospital mortality among AKI patients.
a This clinical model was restricted to 14–15 variables that were top features of different ML models and were deemed pragmatic by the investigators given their routine use in/accessibility in clinical practice. D: derivation; V: validation.

Table 6: Meta-analysis of accuracy of different ML/AI models in assessing in-hospital mortality among AKI patients.
a Fixed effects model; random effects model for the others. b This clinical model was restricted to 14–15 variables that were top features of different ML models and were deemed pragmatic by the investigators given their routine use in/accessibility in clinical practice. No meta-analysis was done when there was only one cohort. D: derivation; V: validation.

Table 7a : Meta-analysis of the AUC for the logistic regression model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 7b : Meta-analysis of the AUC for the RF model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 7c : Meta-analysis of the AUC for the BLS model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 7d : Meta-analysis of the AUC for the ENF fitted model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 7e : Meta-analysis of the AUC for the XGBoost model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 7f : Meta-analysis of the AUC for the SVM model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 7g : Meta-analysis of the AUC for the ANN/MLP model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 7h : Meta-analysis of the AUC for the PCM across different studies.
D: derivation; V: validation.

Table 8a : Meta-analysis of sensitivity for the logistic regression model across different studies in assessing in-hospital mortality among AKI patients.
Although traditional models have been widely used in medical research and have been shown to perform well in predicting clinical outcomes, recent advances in ML models improve prediction and efficiently handle large and complex datasets. The similar discriminatory efficiency of certain traditional ML models, including the logistic regression model, RF and XGBoost, has been exhibited in a previous study, as quantitatively described by the AUC [9].

Table 8b : Meta-analysis of sensitivity for the RF model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 8c : Meta-analysis of sensitivity for the XGBoost model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 8d : Meta-analysis of sensitivity for the BLS model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 8e : Meta-analysis of sensitivity for the SVM model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 8f : Meta-analysis of sensitivity for the PCM across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 8g : Meta-analysis of sensitivity for the ANN/MLP model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 9a : Meta-analysis of specificity for the logistic regression model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 9b : Meta-analysis of specificity for the RF model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 9c : Meta-analysis of specificity for the XGBoost model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 9d : Meta-analysis of specificity for the BLS model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 9e : Meta-analysis of specificity for the SVM model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 9f : Meta-analysis of specificity for the PCM across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 9g : Meta-analysis of specificity for the ANN/MLP model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 10a : Meta-analysis of PPV for the logistic regression model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 10b : Meta-analysis of PPV for the RF model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 10e : Meta-analysis of PPV for the SVM model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 10f : Meta-analysis of PPV for the PCM across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 10g : Meta-analysis of PPV for the ANN/MLP model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 11b : Meta-analysis of NPV for the RF model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 11c : Meta-analysis of NPV for the XGBoost model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 11d : Meta-analysis of NPV for the BLS model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 11e : Meta-analysis of NPV for the SVM model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 11f : Meta-analysis of NPV for the PCM across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 11g : Meta-analysis of NPV for the ANN/MLP model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 12a : Meta-analysis of accuracy for the logistic regression model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 12b : Meta-analysis of accuracy for the RF model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 12c : Meta-analysis of accuracy for the XGBoost model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 12d : Meta-analysis of accuracy for the BLS model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 12e : Meta-analysis of accuracy for the SVM model across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.

Table 12f : Meta-analysis of accuracy for the PCM across different studies in assessing in-hospital mortality among AKI patients.
D: derivation; V: validation.