Derivation and Internal Validation of a Mortality Prognostication Machine Learning Model in Ebola Virus Disease Based on Iterative Point-of-Care Biomarkers

Abstract Background Although multiple prognostic models exist for Ebola virus disease mortality, few incorporate biomarkers, and none has used longitudinal point-of-care serum testing throughout Ebola treatment center care. Methods This retrospective study evaluated adult patients with Ebola virus disease during the 10th outbreak in the Democratic Republic of Congo. Ebola virus cycle threshold (Ct; based on reverse transcriptase polymerase chain reaction) and point-of-care serum biomarker values were collected throughout Ebola treatment center care. Four iterative machine learning models were created for prognosis of mortality. The base model used age and admission Ct as predictors. Ct and biomarkers from treatment days 1 and 2, days 3 and 4, and days 5 and 6 associated with mortality were iteratively added to the model to yield mortality risk estimates. Receiver operating characteristic curves for each iteration provided period-specific areas under curve with 95% CIs. Results Of 310 cases positive for Ebola virus disease, mortality occurred in 46.5%. Biomarkers predictive of mortality were elevated creatinine kinase, aspartate aminotransferase, blood urea nitrogen (BUN), alanine aminotransferase, and potassium; low albumin during days 1 and 2; elevated C-reactive protein, BUN, and potassium during days 3 and 4; and elevated C-reactive protein and BUN during days 5 and 6. The area under curve substantially improved with each iteration: base model, 0.74 (95% CI, .69–.80); days 1 and 2, 0.84 (95% CI, .73–.94); days 3 and 4, 0.94 (95% CI, .88–1.0); and days 5 and 6, 0.96 (95% CI, .90–1.0). Conclusions This is the first study to utilize iterative point-of-care biomarkers to derive dynamic prognostic mortality models. This novel approach demonstrates that utilizing biomarkers drastically improved prognostication up to 6 days into patient care.

The 2014-2016 West Africa Ebola virus disease (EVD) outbreak was the most geographically widespread and deadliest to date, the scale and duration of which prompted significant research on the natural history and outcomes of the disease [1].Several studies found that Ebola virus cycle threshold (Ct) based on reverse transcriptase polymerase chain reaction (PCR)-a value inversely proportional to viral load-was important in mortality prognostication [2,3].Since then, multiple clinical prognostic models have used Ct, epidemiologic, and symptom-based data to identify risk of death and inform care delivery for patients infected with EVD [4][5][6][7].A small number of studies have reported associations between EVD mortality and elevations in biomarkers such as C-reactive protein (CRP), blood urea nitrogen (BUN), creatinine, aspartate aminotransferase (AST), and alanine aminotransferase (ALT) [3,[8][9][10][11].However, during the West Africa outbreak, laboratory facilities were scarce, and serum biomarker testing was not available at most Ebola treatment centers (ETCs).Therefore, the incorporation of biomarker data into prognostic models has been limited.
The 10th EVD outbreak in the Democratic Republic of Congo (DRC) from 2018 to 2020 was the second deadliest, with 2287 deaths and 3470 confirmed cases [12].Point-of-care (POC) serum testing was more widely available during this outbreak, allowing the integration of biomarkers into prognostic scores for adult and pediatric populations with EVD [13,14].Although these scores have potential utility in informing patient care, the research to date is limited by the use of biomarker data from the first 48 hours of ETC admission.These single-time point approaches are unable to Mortality Prognostication in Ebola Virus Disease • OFID • 1 Open Forum Infectious Diseases M A J O R A R T I C L E integrate changes in clinical states and biomarkers over the patient's treatment course.Given this, the current study aimed to derive a dynamic biomarker-based tool for iterative mortality prognostication by using machine learning (ML) models among persons with EVD, with data collected from the 2018-2020 DRC outbreak.

Study Participants and Setting
This retrospective cohort study included patients aged ≥18 years admitted to International Medical Corps' Mangina ETC with laboratory-confirmed EVD from December 2018 to January 2020.The Mangina ETC, located in the North Kivu province, served the catchment areas of the North Kivu and Ituri provinces.It had a capacity of 102 beds and was staffed by 17 physicians and 43 nurses.Patients were excluded if their survival outcome or Ct was not recorded.

Patient Consent Statement
The project received a human research exemption from the institutional review board at Rhode Island Hospital (Lifespan Health System; No. 1527875).All research was performed in accordance with DRC government regulations.No additional local approval was required.

Clinical Approach and Biomarker Testing
ETC staff screened all persons according to World Health Organization and Médecins Sans Frontières guidelines [15,16].Patients with PCR-confirmed EVD were admitted directly to the confirmed ward.Patients with suspected cases were admitted to a separate ward, where they underwent serologic testing with reverse transcriptase PCR (Xpert Ebola Assay; Cepheid).If results were negative, patients were tested again after 72 hours and then discharged if the second test result was negative.If the result was positive, patients were admitted to the confirmed ward.The Xpert Ebola Assay was also used to determine Ct, the minimum number of PCR cycles required to detect the target RNA.Ct is an inversely proportional proxy for viral load; a cutoff >40 cycles indicated a negative test result.All EVD laboratory testing was conducted by the Institut National de Recherche Biómedicale, DRC.
POC serum biomarker testing was available for use at the discretion of the clinical providers.POC testing was performed with the Piccolo Xpress chemistry analyzer (Abaxis) and Piccolo AmLyte 13 cartridges, which measure ALT, albumin, amylase, AST, BUN, calcium, CRP, creatinine, creatinine kinase, glucose, potassium, sodium, and total bilirubin [17].
Medical records were maintained in hard copy for each patient.At ETC triage, information was collected on demographics, symptoms, symptom onset, and self-reported vaccination status.Patients were evaluated daily by medical staff, and clinical information was documented at each assessment.Ct and POC biomarker testing were documented as utilized.
Clinical records were retrospectively digitalized into a secure server according to previously validated methods [18], deidentified, and manually extracted by trained research staff who were blinded to the study aims.Data quality assurance was assessed by double entering 15% of records.The double-entry audit found 97.3% concordance between the data sets.

Statistical Analysis
Demographic data, self-reported vaccination status, symptoms at triage, and mortality outcome were summarized by median and IQR for continuous variables and frequency and percentage for categorical variables.To ensure sufficient data for modeling, biomarker data were combined into 2-day periods: treatment days 1 and 2 (D 1,2 ), days 3 and 4 (D 3,4 ), days 5 and 6 (D 5,6 ).Treatment day 1 referred to the calendar day of a patient's ETC admission.Modeling beyond treatment day 6 was not possible due to high class imbalance, as most deaths occurred in the first week of care.Biomarker distribution across D 1,2 , D 3,4 , and D 5,6 was summarized by median and IQR.Consistent with prior literature, high viral load was defined as Ct ≤22 [19].As sodium, potassium, calcium, and glucose can have deleterious effects at elevated and reduced values, they were categorized as high, normal, and low based on Piccolo AmLyte 13 reference ranges [17].Biomarkers were compared by mortality outcome according to Pearson chi-square or Fisher exact test for categorical variables and by Mann-Whitney or t test for continuous variables, as appropriate.
An iterative series of logistic regression prognostic ML models for the outcome of mortality during ETC care were created with data from patients with recorded values for candidate predictors for a given period.The ML model cannot hold an infinite number of variables.Therefore, biomarkers to enter the model were chosen a priori according to clinical expertise and statistically significant differences between survivors and nonsurvivors.The serial ML models used the risk profile from the preceding model to provide an updated risk prediction and, in turn, inform subsequent predictions.This approach provided a more temporally dynamic output to account for changes in clinical status over time and addressed the limitations of using data from only initial presentation.Building from prior work, a base model was used with predictors of age and Ct value at ETC admission, which were modeled as continuous variables [6].In addition to the risk prediction from the preceding model, all Ct and POC biomarkers measured within 2 days from the preceding prognostic model were iteratively incorporated to construct updated models for each of the 3 periods.Transformations were applied to certain biomarkers to account for their clinical impacts.For example, potassium was converted into a unimodal variable by subtracting the mean of the normal range and taking the absolute value of this difference, while albumin was coded inversely by subtracting each value from the maximum observed among all samples.
The base model was augmented with biomarkers from D 1,2 , D 3,4 , and D 5,6 and Ct values from D 3,4 , and D 5,6 according to a validated protocol that recalibrated the base model and incorporated additional predictors simultaneously by fitting a new model with the linear predictors of the base model and the additional biomarkers [20].Biomarkers were selected per elastic net regularization, which combines the Lasso and Ridge regression methods to handle multicollinearity [13,21].The threshold for variable inclusion was set at 70% to exclude variables with weak or inconsistent effects.This threshold also ensured that no pair of variables had a Pearson correlation coefficient >0.65, therefore reducing the potential for model issues due to collinearity [22].The updated model that resulted from augmenting the base model with the selected biomarkers from D 1,2 was subsequently augmented with the Ct values and selected biomarkers from D 3,4 , and again augmented with those from D 5,6 .Bootstrap resampling was used for internal validation of each recursive model.Discrimination was evaluated by optimism-corrected area under curve (AUC) of the receiver operating characteristic (ROC) curves, while calibration was assessed through a calibration plot comparing predicted probabilities with observed outcome [23].ROC curves with AUC and associated 95% CIs were derived for each 2-day clinical period of interest.The ROC curves were generated by the pROC package with R Statistical Software (version 4.3.2;R Core Team).
A sensitivity analysis was conducted to assess the iterative prognostic model results without longitudinal Ct data, as Ct values are not a POC biomarker.This analysis followed the same methodology as the primary model but eliminated Ct as a variable from all but the base model.The base model was augmented with D 1,2 biomarkers selected through elastic net regularization.Selected biomarkers from D 3,4 were subsequently incorporated, followed by those from D 5,6 .To evaluate the potential impact of EVD-specific therapeutics, a second sensitivity analysis was performed among the subset of patients who received mAb114 or REGN-EB3 following the same methodology.

Cohort Characteristics
From December 2018 to January 2020, 425 patients with PCR-confirmed EVD were admitted to the ETC.The study included 310 patients after excluding 101 patients <18 years old, 2 with no documented Ct value, and 12 with no documented survival outcome.Biomarker data were available for 100 patients during D 1,2 , 84 during D 3,4 , and 69 during D 5,6 (Figure 1).There were no statistically significant differences in sex, age, self-reported vaccination status, or symptoms at triage (Supplementary Table 2).

POC Biomarker Profile
Summary biomarker data were stratified by mortality across D 1,2 , D 3,4 , and D 5,6 (Figure 2).Hepatic biomarkers, including total bilirubin, AST, and ALT, were significantly higher among nonsurvivors during D 1,2 , and D 3,4 but were not significantly different during D 5,6 .AST showed greater variance vs ALT in survivors and nonsurvivors.Primary renal biomarkers, including BUN and creatinine, were significantly higher among nonsurvivors during all 3 periods.Albumin was significantly lower among nonsurvivors in all periods.CRP was significantly higher in nonsurvivors during all 3 periods and showed an upward trend among nonsurvivors between D 1,2 and D 5,6 .In comparison, CRP in survivors initially increased between D 1,2 and D 3,4 but then decreased between D 3,4 and D 5,6 .

Prognostic Models
Figure 3 displays distributions of the biomarkers associated with mortality identified through elastic net regularization regression.Significant predictors of mortality were elevated creatinine kinase, AST, BUN, ALT, and potassium; hypoalbuminemia during D 1,2 ; elevated CRP, BUN, and potassium during D 3,4 ; and elevated CRP and BUN during D 5,6 .
In the sensitivity analysis, biomarkers associated with mortality remained consistent with the primary model for each period.For D 1,2 , the AUC remained 0.84 (95% CI, .73-.94; Supplementary Figure 2B) since it already incorporated Ct on admission.Augmenting the D 1,2 model with D 3,4 biomarker values improved the AUC to 0.87 (95% CI, .72-1.0;Supplementary Figure 2C), and further incorporating D 5,6 biomarkers values resulted in an AUC of 0.95 (95% CI, .87-1.0;Supplementary Figure 2D).Calibration curves for the sensitivity models are displayed in Supplementary Figure 3. Furthermore, the sensitivity analysis of the subset of patients who received REGN-EB3 or mAb114 revealed AUCs similar to the primary model (Supplementary Figure 4); however, the data were too sparse to perform the D 5,6 model analysis.

DISCUSSION
This study is the first to utilize iterative dynamic biomarker data from an outbreak response setting in a ML model for EVD mortality prognostication.This novel approach demonstrates that the addition of easily attainable POC biomarker data significantly improved the ability to identify patients at higher risk for mortality up to 6 days into patient care.In the era of increased availability of POC testing and continued limited access to EVD therapeutics [24,25], these data suggest that inclusion of POC biomarkers may have clinical utility in identifying patients who are the most vulnerable and informing   EVD response care.As the current analysis is a derivation of this potentially beneficial approach, further research with external validation and implementation assessments is needed to properly inform clinical application.
Although prior prognostic models have been reported, they are limited to biomarkers collected in the first 48 hours of ETC admission and lack data on the temporal variability as it relates to EVD mortality [4][5][6][7].While early identification and prognostication are important, the ability to triage patients and allocate resources throughout their ETC admission would be beneficial and may improve clinical outcomes.The current iterative model is the first to include biomarker data past the first 48 hours, thereby addressing that limitation, and demonstrates an improvement in predictive accuracy over the course of care.
The sensitivity analysis showed that the model is robust and was not substantially affected when Ct from D 3,4 and D 5,6 was eliminated.As PCR is a more intensive assay to complete, the results of the sensitivity analysis further support the pragmatic utility of POC biomarkers.The longitudinal modeling approach provides the advantage of allowing for the reevaluation of a patient's mortality risk over time during EVD treatment, and it suggests that the utilization of iterative ML model architecture could provide clinical prediction models that leverage more objective predictive parameters from biomarker data.Given the complexity of ML modeling over multiple periods, further work to develop digital health interfaces that are accessible and acceptable to health care providers is needed.The temporal variation of biomarkers found to be associated with mortality in the current study is consistent with clinical reasoning and biologic plausibility.The D 1,2 hepatic and renal dysfunction associated with mortality is well supported by prior research [3,[8][9][10].Although AST had a larger variance than ALT in the population studied, the median AST value was more than twice as high as ALT, which has been described in EVD [11].As ALT is primarily present in the hepatic tissue but AST also exists in heart, muscle, kidney, brain, pancreas, and lung tissue, this difference may reflect systemic cellular damage instead of direct hepatotoxicity of Ebola virus.BUN is the only variable that remained a constant predictor during the 3 prognostic periods and may represent a combination of volume contraction as well as systemic protein breakdown in the setting of inflammatory cytokine surge that is seen in EVD [26].While the sensitivity analysis demonstrated stability of the biomarkers associated with mortality during each time point, it is possible that variations in specific prognostic biomarkers could exist according to individual patient characteristics.As such, future research to validate the iterative modeling is needed to confirm the biomarker temporal profile in EVD.
Significant advancements have been made in EVD therapeutics, yet access to these therapies in low-resource settings remains limited such that patient triage and resource management continues to be an essential element of outbreak response [25].The World Health Organization guidelines for EVD therapeutics recommend treatment with mAb114 or REGN-EB3 for patients with PCR-confirmed EVD, as these therapies have shown to reduce EVD mortality [27,28].Nevertheless, aside from approval for compassionate use, neither therapy is approved for therapeutic use in the DRC, and manufacturers of the recommended treatments do not currently have plans to obtain licensure in African countries [24].In this context, the stratification of patients with EVD to most accurately identify those at the highest risk of a poor outcome is crucial for providing optimal care and ensuring the most impactful resource allocations.This derived iterative POC biomarker model provides dynamic risk stratification and may represent a significant step toward improving care for patients with EVD, particularly when supply chains for therapeutics are constrained and dictate the need to focus treatments to those at highest risk for mortality.

Limitations
Certain limitations should be considered when interpreting the current study.The lack of standardized biomarker collection may have introduced selection bias.However, testing was performed as deemed indicated by clinical providers and is representative of actual operating conditions in outbreak response care.Additionally, patients who died early after admission may not have had biomarker samples collected, and this could have introduced survival bias.Yet, one of the strengths of the iterative modeling approach is that patients did not require data from each time point to contribute to the model.The Mangina ETC was a PALM trial research site, and approximately 30% of patients received mAb114 or REGN-EB3.Furthermore, 44.0% self-reported having received the rVSV-ZEBOV vaccine.It was not possible to verify vaccination status, although >70% of patients reported that they had received the vaccine within 2 weeks of presentation to the ETC, which suggests that many may not have had sufficient time to mount protective immunity.Although the model did not account for patient-level pharmacotherapies and their potential impact on biomarker variations [29], none of the PALM trial therapies were shown to significantly affect viral clearance [28].The incorporation of physiologic profiles over time may inherently account for such effects.Last, external validation and implementation research is required to develop digital health interfaces for future application of the prognostic tool in response efforts.

CONCLUSION
The findings from the current study demonstrate that the POC biomarker-based iterative prognostic model yielded enhanced identification of patients with EVD who were at high risk for mortality up to 6 days into patient care.Given the legitimate limitations regarding availability of EVD-specific pharmacologic treatments in response settings, identification of patients' risk profiles over the duration of their treatment course is important to maximize resource allocation and minimize mortality.If future work validates the current findings, inclusion of POC biomarkers may have substantial clinical utility in informing EVD response care.

Figure 4 .
Figure 4. Receiver operating characteristic curves for the mortality prognostic models based on iterative biomarker data: A, base model including age and cycle threshold; B, treatment day 1,2 ; C, treatment day 3,4 ; and D, treatment day 5,6 .Central black line, curve; gray shaded area, 95% CI.AUC, area under the curve.