Prognostic Accuracy of Screening Tools for Clinical Deterioration in Adults With Suspected Sepsis in Northeastern Thailand: A Cohort Validation Study

Abstract
Background. We sought to assess the performance of commonly used clinical scoring systems to predict imminent clinical deterioration in patients hospitalized with suspected infection in rural Thailand.
Methods. Patients with suspected infection were prospectively enrolled within 24 hours of admission to a referral hospital in northeastern Thailand between 2013 and 2017. In patients not requiring intensive medical interventions, multiple enrollment scores were calculated including the National Early Warning Score (NEWS), the Modified Early Warning Score, Between the Flags, and the quick Sequential Organ Failure Assessment score. Scores were tested for predictive accuracy of clinical deterioration, defined as a new requirement of mechanical ventilation, vasoactive medications, intensive care unit admission, and/or death approximately 1 day after enrollment. The association of each score with clinical deterioration was evaluated by means of logistic regression, and discrimination was assessed by generating area under the receiver operating characteristic curve.
Results. Of 4989 enrolled patients, 2680 met criteria for secondary analysis, and 100 of 2680 (4%) experienced clinical deterioration within 1 day after enrollment. NEWS had the highest discrimination for predicting clinical deterioration (area under the receiver operating characteristic curve, 0.78 [95% confidence interval, .74–.83]) compared with the Modified Early Warning Score (0.67 [.63–.73]; P < .001), quick Sequential Organ Failure Assessment (0.65 [.60–.70]; P < .001), and Between the Flags (0.69 [.64–.75]; P < .001). NEWS ≥5 yielded optimal sensitivity and specificity for clinical deterioration prediction.
Conclusions. In patients hospitalized with suspected infection in a resource-limited setting in Southeast Asia, NEWS can identify patients at risk of imminent clinical deterioration with greater accuracy than other clinical scoring systems.

Sepsis, defined as organ dysfunction from infection-related immune dysregulation, is a major cause of global disease and death [1]. Critically, sepsis disproportionately impacts low- and middle-income countries [2]. In tropical Southeast Asia, infection-related hospitalization is common, often progresses to sepsis, and infectious causes are diverse [3].
To rapidly identify hospitalized patients at high risk of clinical deterioration, multiple early warning scores have been developed and are commonly used in healthcare centers in North America, Europe, and Australia; scoring systems include the National Early Warning Score (NEWS) and the Modified Early Warning Score (MEWS) [4,5]. Another early warning system, Between the Flags (BTF), has been developed for hospitalized patients in Australia [6,7]. The quick Sequential Organ Failure Assessment (qSOFA) score, requiring only 3 clinical examination components, was initially developed to specifically help clinicians identify hospitalized patients at risk of sepsis outside of an intensive care setting [8]. Although NEWS and MEWS were not originally designed to identify patients at high risk of progression to sepsis, they have been frequently used as sepsis screening tools in resource-rich settings, and 2021 Surviving Sepsis guidelines recommended against using qSOFA over NEWS or MEWS to screen for sepsis [9,10].
In patients with suspected infection in the United States, multiple studies suggest that NEWS may be superior for predicting death or intensive care unit (ICU) transfer compared with other commonly used scores [11,12]. However, established early warning scores may face limitations in resource-constrained regions, such as barriers to implementation, unique risk factors for sepsis-related outcomes, and low sepsis awareness despite a high burden of disease [1,2,13,14]. For example, a study in Malawi reported reduced sensitivity and specificity of MEWS for predicting early death in hospitalized patients [15]. In addition, the Royal College of Physicians in the United Kingdom initially developed NEWS to identify patients at risk of imminent clinical deterioration, typically within 24 hours of score calculation, though this outcome is not commonly assessed [5,16,17]. Therefore, we sought to validate the accuracy of NEWS for predicting imminent clinical deterioration in patients hospitalized with suspected infection in northeastern Thailand and to compare its performance with that of other commonly used clinical assessment tools. To our knowledge, this is the largest prospective study to assess the prediction accuracy of clinical warning scores for early deterioration in a resource-constrained setting.
Early Warning Tools in Sepsis in Southeast Asia • OFID • Open Forum Infectious Diseases • MAJOR ARTICLE

Study Design and Participants
Patients aged ≥18 years admitted to Sunpasitthiprasong Hospital in Ubon Ratchathani, Thailand, with suspected sepsis were prospectively enrolled from 2013 through 2017. This cohort has been described elsewhere [18]. In brief, patients admitted with suspected or documented infection within the prior 24 hours were eligible. In addition, patients were required to have ≥3 of 20 systemic manifestations of infection proposed as diagnostic criteria for sepsis by the 2012 Surviving Sepsis Campaign (Supplementary Table 1) [19]. Recruitment occurred through screening medical records of patients admitted to the emergency department, medical wards, and medical ICUs. Assessments by the study team, including collection of clinical and laboratory data, occurred at the time of enrollment as well as on the subsequent calendar days. All enrolled patients were contacted 28 days after enrollment to determine 28-day mortality and date of death, if applicable. In this secondary analysis, patients not requiring mechanical ventilation, vasoactive medications, or ICU admission at the time of enrollment were subsequently selected for inclusion in the analysis cohort.

Clinical Definitions
Four scores (NEWS, MEWS, qSOFA, and BTF) were calculated using clinical data available closest to the time of enrollment. The components of each score, relevant variables, and availability of data are listed in Supplementary Tables 2 and 3. A Glasgow Coma Scale (GCS) score was calculated at the time of enrollment by the study team. As some of the calculated scores use an AVPU (alert, voice, pain, unresponsive) rating rather than GCS, a GCS ≤13 was considered equivalent to a V, P, or U rating for NEWS calculation. For MEWS and BTF calculations, a GCS ≤8 was considered equivalent to a P/U rating and a GCS of 9-13 was considered equivalent to a V rating [20]. To convert BTF to a numerical score for model development, patients meeting any yellow zone criteria were assigned 1 point and those meeting any red zone criteria were assigned 2 points [11,21]. Patients not meeting either zone's criteria were assigned a 0 and classified in a "low-risk" category. For consistency with previous methods, when score components were not available, 0 points were assigned [8,22].
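The conversion rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual code: the function names are hypothetical, the thresholds are those stated in the text, and giving red zone criteria precedence when a patient meets both yellow and red criteria is an assumption.

```python
from typing import List, Optional


def avpu_for_news(gcs: Optional[int]) -> Optional[str]:
    """NEWS consciousness category: GCS <= 13 is treated as V, P, or U."""
    if gcs is None:
        return None  # missing; the component later contributes 0 points
    return "VPU" if gcs <= 13 else "A"


def avpu_for_mews_btf(gcs: Optional[int]) -> Optional[str]:
    """MEWS/BTF category: GCS <= 8 is P/U, GCS 9-13 is V, otherwise Alert."""
    if gcs is None:
        return None
    if gcs <= 8:
        return "PU"
    if gcs <= 13:
        return "V"
    return "A"


def btf_numeric(any_yellow: bool, any_red: bool) -> int:
    """BTF as a number: red = 2, yellow = 1, neither = 0 ("low-risk").

    Assumption: red takes precedence when both zones are met.
    """
    if any_red:
        return 2
    if any_yellow:
        return 1
    return 0


def total_score(component_points: List[Optional[int]]) -> int:
    """Sum component points, assigning 0 to unavailable (None) components."""
    return sum(p if p is not None else 0 for p in component_points)
```

For example, `total_score([3, None, 1])` treats the missing component as normal and returns 4, which mirrors the primary analysis rule of assigning 0 points to unavailable components.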

Outcome Measure
The primary outcome measure was imminent clinical deterioration, defined as the new requirement of mechanical ventilation, vasoactive medications, ICU admission, or death within 1 day of enrollment. NEWS originally defined imminent clinical deterioration within 24 hours after score calculation [5]. In our original study, follow-up data were collected prospectively by the study team each calendar day after patient enrollment. To minimize the effect of variable follow-up times in this secondary analysis, only patients with follow-up data obtained during a window of 20-28 hours following enrollment were included. At the study hospital, it is common practice to rapidly discharge both improving patients and those in moribund condition who wish to die at home. In this secondary analysis, patients discharged before the follow-up window were included in the analysis and were considered to meet the outcome criteria if they died within 1 calendar day of enrollment. However, data regarding the other elements of the imminent clinical deterioration definition were not available for patients discharged before the follow-up assessment.
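As a rough illustration of the outcome rule described above, the following Python sketch classifies a single patient. The `FollowUp` record and its field names are hypothetical, and treating the 20-28-hour window as inclusive at both ends is an assumption; the text does not specify boundary handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    # Hours from enrollment to the follow-up assessment (None if not assessed)
    hours_after_enrollment: Optional[float]
    discharged_before_window: bool
    died_within_1_calendar_day: bool
    new_ventilation: bool = False
    new_vasoactives: bool = False
    new_icu_admission: bool = False


def classify(fu: FollowUp, lo: float = 20.0, hi: float = 28.0) -> Optional[bool]:
    """Return True/False for imminent deterioration, or None if excluded."""
    if fu.discharged_before_window:
        # Only death is known for patients discharged before follow-up.
        return fu.died_within_1_calendar_day
    hours = fu.hours_after_enrollment
    if hours is None or not (lo <= hours <= hi):
        return None  # follow-up outside the window: excluded from analysis
    return (
        fu.new_ventilation
        or fu.new_vasoactives
        or fu.new_icu_admission
        or fu.died_within_1_calendar_day
    )
```

Returning `None` for excluded patients keeps exclusion distinct from a negative outcome, which matters when counting the analysis cohort.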

Statistical Analysis
Enrollment clinical data, including calculated scores, were summarized using proportions for discrete variables and medians and interquartile ranges for continuous variables. The association of each score with clinical deterioration was evaluated by logistic regression. Discrimination for predicting clinical deterioration was subsequently evaluated by generating the area under the receiver operating characteristic curve (AUROC). Models were assessed for bias by internal validation using 10-fold cross-validation [23]. Comparisons of AUROCs were made using the roccomp command in Stata software. Cutoff values representing optimal discrimination were determined using Youden's index [24]. Subsequently, the sensitivity, specificity, and negative and positive predictive likelihood ratios with corresponding 95% confidence intervals were calculated. Analyses were performed using Stata/SE software, version 14.2. Reporting guidelines for multivariable prediction model validation were followed [25].
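The discrimination and cutoff steps above can be illustrated with a small, dependency-free Python sketch. It is not the Stata workflow used in the study: it computes the AUROC directly from the raw score as the probability that a randomly chosen patient with the outcome has a higher score than one without (ties counted as one half), and it selects the cutoff that maximizes Youden's index J = sensitivity + specificity - 1. The function names and the toy data are hypothetical.

```python
def auroc(scores, outcomes):
    """AUROC via pairwise comparison of cases and non-cases (ties = 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, outcomes):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A patient is test-positive when score >= cutoff. If several cutoffs
    tie on J, the lowest one is returned.
    """
    n_pos = sum(1 for y in outcomes if y)
    n_neg = len(outcomes) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if y and s >= c)
        tn = sum(1 for s, y in zip(scores, outcomes) if not y and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1:]


# Toy data only; these are not study values.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
print(auroc(toy_scores, toy_outcomes))          # 0.9375
print(youden_cutoff(toy_scores, toy_outcomes))  # (4, 1.0, 0.75)
```

Because the AUROC depends only on the ranking of patients, a logistic model with a single score as its predictor and a positive coefficient yields the same AUROC as the raw score. Confidence intervals, cross-validation, and the between-score comparison are omitted from this sketch.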

Sensitivity Analyses
Multiple sensitivity analyses were performed. In our primary analysis, missing variables were assumed to be normal during score calculation [8,22]. Two sensitivity analyses were therefore performed to address potential bias from this treatment of missing data: (1) a complete case analysis including only patients without any missing variables and (2) an analysis in which the most recent available prior value was used when a variable was missing, similar to prior methods [12]. In addition, patients were often enrolled near the time of hospital admission, and all were enrolled within 24 hours of admission. Therefore, a third sensitivity analysis using the most abnormal values between admission and enrollment was performed, though early warning scores may not be used this way in clinical practice. Finally, as this was a secondary analysis of a prospectively collected data set, the exact follow-up times for enrolled patients were variable. To minimize variability in the time between enrollment score calculation and the outcome measurement, a 24 ± 4-hour window was selected a priori. However, to account for the possibility that excluded patients with follow-up outside this window may be relevant to score prediction accuracy, we performed a fourth sensitivity analysis for follow-up data available during a 24 ± 8-hour window after the time of enrollment.
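The four sensitivity analyses differ mainly in how each patient's inputs are prepared before scoring. The Python sketch below illustrates those preparation steps under stated assumptions: the data layout (a dictionary of variable names to values, with `None` for missing) and all function names are hypothetical, and using component points as the measure of "most abnormal" is an assumption rather than the study's documented method.

```python
from typing import Dict, List, Optional

Obs = Dict[str, Optional[float]]


def is_complete_case(enrollment: Obs, required: List[str]) -> bool:
    """Analysis 1: keep only patients with every required variable present."""
    return all(enrollment.get(v) is not None for v in required)


def fill_from_prior(enrollment: Obs, prior: List[Obs]) -> Obs:
    """Analysis 2: fill missing values with the most recent prior value.

    `prior` is ordered oldest to newest; values still missing stay None.
    """
    filled = dict(enrollment)
    for var, value in enrollment.items():
        if value is None:
            for earlier in reversed(prior):
                if earlier.get(var) is not None:
                    filled[var] = earlier[var]
                    break
    return filled


def worst_points(admission_pts: Optional[int],
                 enrollment_pts: Optional[int]) -> Optional[int]:
    """Analysis 3: take the higher component points of the two time points."""
    candidates = [p for p in (admission_pts, enrollment_pts) if p is not None]
    return max(candidates) if candidates else None


def in_window(hours: float, center: float = 24.0,
              half_width: float = 4.0) -> bool:
    """Primary window is 24 +/- 4 hours; analysis 4 uses half_width=8."""
    return abs(hours - center) <= half_width
```

For instance, `in_window(31.0)` is false under the primary 24 ± 4-hour rule but `in_window(31.0, half_width=8.0)` is true, so that patient would enter only the fourth sensitivity analysis.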

Patient Consent Statement
Written informed consent was obtained from study participants or their representatives before enrollment. The studies were approved by the Sunpasitthiprasong Hospital Ethics Committee (no. 039/2556), the Ethics Committee of the Faculty of Tropical Medicine, Mahidol University (no. MUTM2012-024-01), the University of Washington Institutional Review Board (no. 42988), and the Oxford University Tropical Research Ethics Committee (no. OXTREC172-12).

Patient Characteristics
Of 4989 patients in the original cohort, 2939 did not require ICU admission, mechanical ventilation, or vasoactive medications at enrollment and were retained for this secondary analysis. Of this selected cohort, 2680 had follow-up data within 24 ± 4 hours after enrollment (Figure 1); 259 were excluded from analysis because their follow-up occurred outside the 24 ± 4-hour window. Of the 2680 patients in the final analysis cohort, 138 (5%) were discharged before the follow-up period. The clinical characteristics of the final analysis cohort are listed in Table 1. The median age in this cohort (interquartile range) was 53 (34-68) years, and 49% identified as female. In this cohort, 1661 of 2680 (62%) were referred from 53 different hospitals in the region; 100 of the 2680 patients (4%) met the criteria of clinical deterioration within a day (24 ± 4 hours) of enrollment (outcome distribution provided in Supplementary Figure 1). Clinical scores were calculated at the time of enrollment, and summarized scores are listed in Table 1. The distribution of each score and the related proportion of clinical deterioration were then calculated for each score level. For each of the 4 scores, the proportion of patients experiencing clinical deterioration generally increased with higher scores (Supplementary Figure 2).
We also performed multiple sensitivity analyses: (1) a complete case analysis in 2103 patients with complete variable data at the time of enrollment, (2) an analysis calculating scores using the most recent prior value when a score variable was missing, (3) an analysis calculating the worst score variable within the time from admission to enrollment, and (4) an analysis using enrollment scores but including 2930 patients with follow-up data available during a 24 ± 8-hour window after enrollment. Relative discrimination of clinical deterioration remained similar among all assessed scores (Supplementary Table 4).
Because of the relative strength of NEWS compared with the other assessed scores, we calculated an optimal NEWS threshold using a Youden index and determined the subsequent clinical performance of this threshold in our cohort. An optimal NEWS of ≥5 had a sensitivity of 89% (95% CI, 81%-94%) and specificity of 54% (52%-56%) for clinical deterioration. A NEWS of ≥6 decreased sensitivity to 76% (95% CI, 66%-84%) but improved specificity to 70% (68%-72%). Complete clinical performance characteristics of these NEWS thresholds are listed in Table 3.
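To show how the reported sensitivity and specificity relate to predictive values and likelihood ratios, the following Python sketch reconstructs an approximate 2 × 2 table from the rounded figures above and the 100 events among 2680 patients. The function name is hypothetical, and because the inputs are rounded percentages, the results are approximations that may differ slightly from the values in Table 3.

```python
def performance(sens: float, spec: float, n_events: int, n_total: int) -> dict:
    """Derive predictive values and likelihood ratios from sens/spec."""
    n_non = n_total - n_events
    tp = sens * n_events   # expected true positives
    fn = n_events - tp     # expected false negatives
    tn = spec * n_non      # expected true negatives
    fp = n_non - tn        # expected false positives
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


# NEWS >= 5, using the rounded values reported in the text
m = performance(0.89, 0.54, n_events=100, n_total=2680)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the negative predictive value rounds to 0.99 and the positive predictive value to about 0.07. A high negative predictive value paired with a low positive predictive value is expected when the outcome occurs in only about 4% of patients.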

DISCUSSION
In a large, prospectively enrolled cohort of patients hospitalized with suspected infection in northeastern Thailand, NEWS was superior to 3 other scoring systems for predicting early clinical deterioration. To our knowledge, this study is the largest prospective validation of NEWS undertaken to date in a resource-constrained setting.
NEWS and its successor, NEWS-2, were originally developed in the United Kingdom as simple scoring systems to alert healthcare providers to hospitalized patients with a high risk of clinical decline outside of ICUs [16,17]. Multiple large studies in resource-rich settings have demonstrated that NEWS can outperform other scores in predicting clinical decline. An analysis of electronic health record data from a large healthcare system in the United States found that NEWS was superior to MEWS, qSOFA, systemic inflammatory response syndrome (SIRS), and BTF at predicting ICU admission or death during hospitalization in patients with or without infection [11]. A similar US-based study of hospitalized patients with suspected infection also reported that NEWS performed better than several other scores at identifying patients outside the ICU at risk of in-hospital death or ICU transfer [12].
However, the original evaluations of NEWS focused on its prediction of imminent clinical deterioration, typically within 24 hours after score calculation, and not longer-term outcomes [5,16]. Evaluations of NEWS in resource-constrained settings have been similarly limited by retrospective study design or identification of individuals at risk of different outcomes, such as in-hospital death or sepsis diagnosis [26,27]. In our study, we specifically sought to validate the performance of NEWS in identifying patients at highest risk of imminent clinical deterioration in a large prospectively recruited cohort of patients with suspected infection in northeastern Thailand.
Whether specific scoring systems may be superior in identifying high-risk patients with suspected infection is a matter of debate. qSOFA, for example, was not originally designed as an early warning score but rather to identify patients at risk of sepsis outside of a critical care setting, and it has been implemented as a tool for early sepsis identification in resource-limited settings [8,28]. Indeed, the 2021 Surviving Sepsis Guidelines recommended against using qSOFA over other scores as a single screening tool for sepsis [10]. Nevertheless, due to its simplicity, qSOFA remains an attractive tool for resource-limited settings. We have also reported that augmenting the qSOFA score with a point-of-care lactate concentration may provide 28-day outcome prediction in Thai patients with suspected infection similar to that of the more cumbersome Sequential Organ Failure Assessment (SOFA) [29]. However, in this study, qSOFA had significantly lower accuracy than NEWS for predicting imminent clinical deterioration. MEWS, which uses a method similar to that of NEWS, though without oxygen-related data, has strong prediction of infection-related outcomes in high-resource settings but had markedly worse predictive accuracy of clinical deterioration compared with NEWS in our study [12].
In our cohort, the optimal NEWS threshold was 5, the same threshold for activating an urgent healthcare provider response proposed by the Royal College of Physicians [16]. This similar target across diverse patient populations possibly highlights the strength of a more comprehensive clinical score [19]. However, a more comprehensive scoring system may also not be practical in a resource-limited setting. For example, per the Royal College of Physicians, a NEWS of ≥5 activates an escalation protocol including hourly assessments along with an urgent clinician assessment [17]. While our study suggests that a patient with a NEWS of ≥5 may be at higher risk of deterioration in the next 24 hours, whether an urgent healthcare provider response may be beneficial is unknown, particularly if additional interventions are limited.
While the close monitoring recommended by the Royal College of Physicians may pose challenges in a resource-limited setting, implementation of NEWS may improve patient monitoring and outcomes, even when protocol adherence is poor [30]. In addition, a NEWS threshold of ≥5 in our study had a reasonably high sensitivity of 89% and a negative predictive value of 99%, suggesting that intermittent screening may have value in identifying which patients are at lower risk of deterioration, allowing for triage and resource utilization. However, further research is necessary to understand the utility and potential implementation challenges of NEWS in resource-limited settings.
Our study has several strengths. Data in the parent study were collected prospectively, a rarity in the region. Enrollment was undertaken within 24 hours after admission, and for this study we analyzed patients with dedicated follow-up occurring 20-28 hours after enrollment. This study design resulted in minimal missing data and rigorous capture of cases with early clinical deterioration. Our broad definition of clinical deterioration did not require ICU transfer, as this may not be appropriate in resource-limited settings where critical care interventions are often provided outside of a dedicated ICU facility [22,31]. In addition, the parent study was performed in northeast Thailand, where infectious causes are diverse [3,32]. Finally, we performed numerous sensitivity analyses to account for variability in prospective data collection and patient inclusion.
Our study also has several limitations. Each of the assessed clinical scoring systems was developed in resource-rich settings and may have limitations in its implementation in other settings [13,14]. This was a single-center study of a referral hospital in northeast Thailand. Although 62% of the cohort was transferred from 53 different hospitals, representing a broad catchment of the region, our results may not be representative of other settings or the referring healthcare centers. As study enrollment began in 2013, patients were identified as having suspected sepsis using contemporary criteria and not using current sepsis guidelines, though these may not be appropriate for resource-limited settings [33,34]. We also compared 4 scores widely studied across economic settings, but other novel scores, including those localized to our specific setting, may have superior performance.
The clinical data used to calculate scores were collected prospectively at the time of enrollment, but the predictive performance of sequential scores or scores calculated later in the hospital stay, and the prediction of deterioration beyond 24 hours after score calculation, are unknown. Furthermore, these scores may perform differently outside of a prospective study [35]. Finally, we did not evaluate the performance of NEWS-2 [17]. While NEWS-2 may have additional strengths compared with NEWS, it requires the additional consideration of values for the arterial partial pressure of carbon dioxide. In our study, arterial blood gas measurements were not frequently obtained outside the ICU, limiting calculation of NEWS-2. However, this may reflect the advantages of a score based on readily available clinical data points in resource-limited settings.
In conclusion, we report that NEWS is superior to other early warning scores and to qSOFA in predicting early clinical deterioration in patients hospitalized with suspected infection in a resource-constrained area in Southeast Asia. These findings could have critical implications for clinical care guidelines in similar areas where infection-related hospitalization and sepsis are common.

Figure 1. Study flow chart, showing the analysis of the cohort.
Abbreviations: BTF, Between the Flags; HIV, human immunodeficiency virus; ICU, intensive care unit; IQR, interquartile range; MEWS, Modified Early Warning Score; NEWS, National Early Warning Score; qSOFA, quick Sequential Organ Failure Assessment.
a Data represent no. (%) of patients unless otherwise specified.
b Percentage listed for each clinical deterioration component is the percentage of patients with clinical deterioration.

Figure 2. Area under the receiver operating characteristic curves for predicting clinical deterioration, comparing the National Early Warning Score (NEWS), the Modified Early Warning Score (MEWS), the quick Sequential Organ Failure Assessment (qSOFA) score, and Between the Flags (BTF).

Table 2. Discrimination of Clinical Scores for Predicting Clinical Deterioration
Abbreviations: AUROC, area under the receiver operating characteristic curve; BTF, Between the Flags; CI, confidence interval; MEWS, Modified Early Warning Score; NEWS, National Early Warning Score; qSOFA, quick Sequential Organ Failure Assessment.
a Ten-fold internal cross-validation.